Secondary structure prediction is a set of techniques in bioinformatics that aim to predict the local secondary structures of proteins based only on knowledge of their amino acid sequence.
The secondary structure of proteins is determined by the pattern of hydrogen bonding. A large number of servers and tools are available for secondary structure prediction and analysis.
Protein secondary structure refers to the local conformation of a protein's polypeptide backbone. There are two regular secondary structure states, α-helix (H) and β-strand (E), and one irregular secondary structure type, the coil region (C). Kabsch and Sander developed a secondary structure assignment method, the Dictionary of Secondary Structure of Proteins (DSSP) [3], which automatically assigns secondary structure into eight states (H, E, B, T, S, L, G, and I) according to hydrogen-bonding patterns. These eight states are often further simplified into three states: helix, sheet, and coil. The most widely used convention designates G, H, and I as helix; B and E as sheet; and all other states as coil.
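The eight-to-three state reduction described above can be sketched in a few lines of Python. This is a minimal illustration of the stated convention only; the function name and the example string are made up for this sketch and do not come from DSSP or any particular tool.

```python
# Convention from the text: G, H, I -> helix (H); B, E -> sheet (E);
# every other state -> coil (C).
EIGHT_TO_THREE = {
    "G": "H", "H": "H", "I": "H",
    "B": "E", "E": "E",
}


def reduce_to_three_states(eight_state: str) -> str:
    """Map a string of eight-state labels to three-state labels (H, E, C)."""
    # Any label not listed in the mapping (T, S, L, ...) falls back to coil.
    return "".join(EIGHT_TO_THREE.get(state, "C") for state in eight_state)


# Hypothetical eight-state assignment, invented for illustration.
example = "LHHHHGGGTTSEEEEBL"
print(reduce_to_three_states(example))  # -> CHHHHHHHCCCEEEEEC
```

Other reduction conventions exist, so three-state accuracies are only comparable when the same mapping has been applied.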
One of the first approaches for predicting protein secondary structure uses a combination of statistical and heuristic rules [4,5]. The GOR method [6] formalizes the secondary structure prediction problem within an information-theoretic framework. Position-specific scoring matrices (PSSMs) [7] generated by PSI-BLAST [8] capture evolutionary information and have produced the most significant improvements in protein secondary structure prediction. Many machine learning methods have been developed to predict protein secondary structure, and they perform well by exploiting evolutionary information as well as statistical information about amino acid subsequences [9]. For example, many neural network (NN) [10–14], hidden Markov model (HMM) [15–17], support vector machine (SVM) [18–21], and K-nearest neighbors [22] methods have had substantial success, and Q3 accuracy (the percentage of residues assigned to the correct one of the three states) has reached 80%. Prediction accuracy has improved continuously over the years, especially through hybrid or ensemble methods and through evolutionary information in the form of profiles extracted from alignments of multiple homologous sequences [23]. Recently, SPIDER3 improved the prediction of protein secondary structure by capturing non-local interactions using long short-term memory bidirectional recurrent neural networks [29].
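The Q3 accuracy mentioned above is the percentage of residues whose predicted three-state label (H, E, or C) matches the observed one. A minimal sketch of the calculation follows; the function name and the two example strings are hypothetical and are not taken from any of the cited methods.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the percentage of positions where predicted == observed."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("cannot compute Q3 for an empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Invented three-state strings for a 10-residue fragment.
observed = "CHHHHCEEEC"
predicted = "CHHHCCEEEE"
print(q3_accuracy(predicted, observed))  # 8 of 10 positions agree -> 80.0
```

Q3 is computed per residue, so a single long protein can dominate an average taken over a pooled test set; reporting the per-protein mean as well is a common way to check for that.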
There are two types of protein secondary structure prediction algorithms. A single-sequence algorithm does not use information about other (homologous) proteins, so it should be suitable for a sequence with no similarity to any other protein sequence. Algorithms of the second type explicitly use the sequences of homologous proteins, which often have similar structures. The prediction accuracy of such an algorithm should be higher than that of a single-sequence algorithm because it incorporates additional evolutionary information from multiple alignments.
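The difference between the two types of algorithm shows up in the input each one receives. The sketch below contrasts a one-hot encoding of a single sequence with a per-column residue frequency profile built from a toy alignment. It is an illustration under simplifying assumptions: the function names and the alignment are invented, and the profile is a plain frequency table, not the log-odds PSSM that PSI-BLAST produces.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence: str) -> list[list[float]]:
    """Single-sequence input: one 20-dimensional indicator vector per residue."""
    return [[1.0 if aa == res else 0.0 for aa in AMINO_ACIDS] for res in sequence]


def alignment_profile(aligned: list[str]) -> list[list[float]]:
    """Homology-based input: residue frequencies in each alignment column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        # Gaps and non-standard symbols are ignored when counting.
        residues = [seq[col] for seq in aligned if seq[col] in AMINO_ACIDS]
        total = len(residues)
        profile.append(
            [residues.count(aa) / total if total else 0.0 for aa in AMINO_ACIDS]
        )
    return profile


# Toy alignment, invented for illustration ("-" marks a gap).
alignment = ["MKVLA", "MRVIA", "MKV-A"]
single = one_hot(alignment[0])
profile = alignment_profile(alignment)

k = AMINO_ACIDS.index("K")
print(single[1][k])   # 1.0: the query has K at position 2
print(profile[1][k])  # 2/3: K appears in two of three homologs at that column
```

The profile row for the second column also assigns weight to R, which the single sequence cannot express; this is the extra evolutionary information that the homology-based methods exploit.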
The estimated theoretical limit of the accuracy of secondary structure assignment from experimentally determined 3D structure is 88%. The accuracy of the best current single-sequence prediction methods is below 70%. BSPSS, SIMPA, SOPM, and GOR V are examples of single-sequence prediction algorithms. The current best methods that use evolutionary information (multiple alignments, PSI-BLAST profiles) include PSIPRED, Porter, SSpro, APSSP2, SVMpsi, PHDpsi, JPRED2, and PROF.
Single-sequence algorithms for protein secondary structure prediction are important because a significant percentage of the proteins identified in genome sequencing projects have no detectable sequence similarity to any known protein. In sequenced prokaryotic genomes in particular, about a third of the protein-coding genes are annotated as encoding hypothetical proteins that lack similarity to any protein with a known function. Likewise, of the 25,000 genes believed to be present in the human genome, no more than 40–60% can be assigned a functional role based on similarity to known proteins. For a broader picture, the Pfam database provides information on the distribution of proteins with known functional domains across the three domains of life.
BioinfoLytics Company
Our company, BioinfoLytics, is affiliated with BioCode. It is a project covering genomics and proteomics and their analysis with a wide range of tools, including Sequence Alignment & Analysis, Bioinformatics Scripting & Software Development, Phylogenetic and Phylogenomic Analysis, Functional Analysis, Biological Data Analysis & Visualization, Custom Analysis, Biological Database Analysis, Molecular Docking, Protein Structure Prediction, and Molecular Dynamics. These services are intended for BioCode learners who want to develop their interests further and use the services to meet their requirements and obtain the results they need. We provide a platform where you can learn, have research projects analyzed, and find help and extensive knowledge in molecular, computational, and analytical biology.
We offer a “Secondary Structure Prediction and Analysis” service to our customers, along with tools for studying methods that predict secondary structures of research interest and for analyzing and evaluating the results. We strive for high-quality research that advances science in the domain of Sequence Alignment & Analysis.