Type

Journal Article

Authors

Gianluca Pollastri
Catherine Mooney

Subjects

Mathematics

Topics
databases protein sequence analysis protein amino acid sequence sequence homology amino acid homology detection secondary structure sequence alignment bioinformatics machine learning alignments homology biology solvent accessibility proteins state of the art computational biology protein structure secondary chemistry

Beyond the twilight zone: automated prediction of structural properties of proteins by recursive neural networks and remote homology information (2009)

Abstract

The prediction of 1D structural properties of proteins is an important step toward the prediction of protein structure and function, not only in the ab initio case but also when homology information to known structures is available. Despite this, the vast majority of 1D predictors do not incorporate homology information into the prediction process. We develop a novel structural alignment method, SAMD, which we use to build alignments of putative remote homologues that we compress into templates of structural frequency profiles. We use these templates as additional input to ensembles of recursive neural networks, which we specialise for the prediction of query sequences that show only remote homology to any Protein Data Bank structure. We predict four 1D structural properties: secondary structure, relative solvent accessibility, backbone structural motifs, and contact density. Secondary structure prediction accuracy, tested by five-fold cross-validation on a large set of proteins allowing less than 25% sequence identity between training and test set and between query sequences and templates, exceeds 82%, outperforming its ab initio counterpart, other state-of-the-art secondary structure predictors (Jpred 3 and PSIPRED) and two other systems based on PSI-BLAST and COMPASS templates. We show that structural information from homologues improves prediction accuracy well beyond the Twilight Zone of sequence similarity, even below 5% sequence identity, for all four structural properties. Significant improvement over the extraction of structural information directly from PDB templates suggests that the combination of sequence and template information is more informative than templates alone.
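
The idea of compressing aligned homologues into a template of structural frequency profiles can be illustrated with a minimal sketch. This is not the authors' SAMD method or their network input encoding. The function name, the three-class secondary structure alphabet, the '-' gap character and the optional per-template weights are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical three-class secondary structure alphabet (assumption):
# H = helix, E = strand, C = coil.
CLASSES = "HEC"


def structural_frequency_profile(aligned_labels, weights=None):
    """Compress aligned template label strings into per-position class frequencies.

    aligned_labels: equal-length strings over CLASSES, with '-' marking gaps.
    weights: optional per-template weights (an assumed similarity score).
    Returns one list of len(CLASSES) frequencies per alignment column;
    columns covered by no template are all zeros.
    """
    if not aligned_labels:
        return []
    length = len(aligned_labels[0])
    if any(len(row) != length for row in aligned_labels):
        raise ValueError("all aligned label strings must have the same length")
    if weights is None:
        weights = [1.0] * len(aligned_labels)

    profile = []
    for col in range(length):
        totals = Counter()
        for row, weight in zip(aligned_labels, weights):
            label = row[col]
            if label in CLASSES:  # gaps and unknown symbols are skipped
                totals[label] += weight
        covered = sum(totals.values())
        profile.append(
            [totals[c] / covered if covered else 0.0 for c in CLASSES]
        )
    return profile


# Three made-up template label strings aligned to a four-residue query.
templates = ["HHEC", "HHE-", "HCEC"]
profile = structural_frequency_profile(templates)
```

In a setup like the one the abstract describes, each row of such a profile would be supplied to the predictor alongside the sequence-derived input for the same residue. How the paper actually weights templates and encodes uncovered positions is not stated in this record.
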

Collections

Ireland -> University College Dublin -> CASL Research Collection
Ireland -> University College Dublin -> Institutes and Centres
Ireland -> University College Dublin -> Complex and Adaptive Systems Laboratory
Ireland -> University College Dublin -> College of Science
Ireland -> University College Dublin -> School of Computer Science
Ireland -> University College Dublin -> Computer Science Research Collection

Full list of authors on original publication

Gianluca Pollastri, Catherine Mooney
