
Protein Structure Prediction

BioCodeKb - Bioinformatics Knowledgebase

Protein structure prediction is the inference of the three-dimensional structure of a protein from its amino acid sequence; that is, the prediction of its folding and of its secondary and tertiary structure from its primary structure.

The main experimental techniques used to determine protein 3D structure are X-ray crystallography and nuclear magnetic resonance (NMR). In X-ray crystallography the protein is crystallized and its structure is then determined by X-ray diffraction. Determining a 3D structure this way is not always straightforward and sometimes takes as long as three to five years. NMR is another useful technique for determining protein structure. Its advantage over X-ray crystallography is that the protein can be studied in an aqueous environment that may resemble its physiological state more closely. Its main limitation is that it is generally suitable only for small proteins, typically those with fewer than about 150 amino acids.

The gap between the number of known protein sequences and the number of known protein structures is widening rapidly. There is therefore a need for computational techniques that predict protein structures. Computer-aided prediction of protein conformation and tertiary structure could facilitate:

  1. the prediction of tertiary structures for proteins with known sequences and unknown structures

  2. understanding of protein folding

  3. engineering of proteins so that new functions may be incorporated

  4. drug design.

There are two general approaches to predicting the structure of a protein of interest (the ‘target’): template-based modelling, in which the previously determined structure of a related protein is used to model the unknown structure of the target; and template-free modelling, which does not rely on global similarity to a structure in the Protein Data Bank (PDB) and hence can be applied to proteins with novel folds.

General steps in Prediction

Conformation initialization

The key difference between the “template-based” and “template-free” methods is how the initial conformation is obtained. The template-based method obtains it by searching for solved structures that are homologous or structurally similar to the target protein. The template-free method usually constructs it by fragment assembly. In most cases, a structural template homologous to the target protein can be identified in the PDB by sequence alignment, and an accurate alignment between target and template can be built.

However, there is no guarantee that a satisfactory structural template can be found for every target protein.
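The template search by sequence alignment described above can be illustrated with a minimal sketch. It assumes a tiny in-memory template library and a plain Needleman-Wunsch global alignment with a simple match/mismatch/gap score; the function names, the scoring values and the sequences are hypothetical. Real pipelines search the PDB with profile-based tools and substitution matrices rather than with this toy scoring.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def sequence_identity(aligned_a, aligned_b):
    """Fraction of alignment columns holding the same residue in both rows."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return same / len(aligned_a)


def best_template(target, templates):
    """Return (template_id, identity) for the most similar template sequence."""
    scored = []
    for template_id, sequence in templates.items():
        aligned_target, aligned_template = global_alignment(target, sequence)
        scored.append((sequence_identity(aligned_target, aligned_template), template_id))
    identity, template_id = max(scored)
    return template_id, identity


# Hypothetical target and template library, for illustration only.
library = {"tmplA": "MKTAYIAKQR", "tmplB": "GGGGGGGG"}
print(best_template("MKTAYIAKQR", library))
```

A low best identity is one practical signal that no satisfactory template is available and that a template-free approach may be needed.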

Conformational search

Starting from the initial conformation, a simulation guided by a force field searches step by step for near-native conformations. As a typical biological macromolecule, a protein consists of thousands of atoms and has a huge number of conformational degrees of freedom. A simplified representation of protein conformations is therefore particularly important for speeding up the simulation of the folding process.
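As an illustration of a simplified representation, the sketch below collapses each residue of an all-atom model to its Cα position plus the geometric center of its side-chain atoms. The input layout (a list of `(residue_index, atom_name, (x, y, z))` tuples), the function name and the coordinates are assumptions made for this example; they are not a format prescribed by any particular prediction method.

```python
from collections import defaultdict

# Backbone atom names as used in PDB files; everything else is treated as side chain.
BACKBONE = {"N", "CA", "C", "O"}


def reduce_residues(atoms):
    """Collapse atoms to (residue_index, ca_xyz, side_chain_center) per residue.

    atoms: iterable of (residue_index, atom_name, (x, y, z)).
    The side-chain center is None for residues without side-chain atoms
    (for example glycine when hydrogens are absent).
    """
    ca = {}
    side = defaultdict(list)
    for residue_index, atom_name, xyz in atoms:
        if atom_name == "CA":
            ca[residue_index] = xyz
        elif atom_name not in BACKBONE:
            side[residue_index].append(xyz)

    reduced = []
    for residue_index in sorted(ca):
        points = side.get(residue_index)
        if points:
            # Unweighted mean of the side-chain atom coordinates.
            center = tuple(sum(axis) / len(points) for axis in zip(*points))
        else:
            center = None
        reduced.append((residue_index, ca[residue_index], center))
    return reduced


# Hypothetical two-residue fragment (alanine, then glycine).
fragment = [
    (1, "N", (0.0, 0.0, 0.0)), (1, "CA", (1.0, 0.0, 0.0)),
    (1, "C", (2.0, 0.0, 0.0)), (1, "O", (2.0, -1.0, 0.0)),
    (1, "CB", (2.0, 1.0, 0.0)),
    (2, "N", (3.0, 0.0, 0.0)), (2, "CA", (4.0, 0.0, 0.0)),
    (2, "C", (5.0, 0.0, 0.0)), (2, "O", (5.0, -1.0, 0.0)),
]
print(reduce_residues(fragment))
```

Here nine atoms become two Cα points and one side-chain center, which shows why a reduced model has far fewer degrees of freedom to sample.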

Structure selection

The conformational search generates a large number of structures of the target protein. One unsolved issue in both molecular dynamics and Monte Carlo simulation is that conformations are often trapped in local energy minima. Even when the global minimum is identified, that conformation is not necessarily the one closest to the native state, because of inadequacies in the force field. The common procedure is therefore to output lower-energy intermediate structures at regular intervals during the simulation for subsequent conformational screening.
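The idea of saving low-energy intermediates during a Monte Carlo run can be sketched as follows. This is a toy under stated assumptions: the "conformation" is a single number, `toy_energy` is a hypothetical double-well function standing in for a force field, and all parameter values are arbitrary. It shows the Metropolis acceptance rule and the periodic snapshot bookkeeping, not a real folding simulation.

```python
import math
import random


def toy_energy(x):
    """Hypothetical 1D double-well energy: a lower well near x = -1, a higher one near x = +1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def metropolis_search(energy, x0, steps=5000, step_size=0.2, temperature=0.2,
                      save_every=50, keep=5, seed=0):
    """Metropolis Monte Carlo that keeps the `keep` lowest-energy periodic snapshots."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    snapshots = []  # list of (energy, conformation), lowest energy first
    for step in range(1, steps + 1):
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = energy(x_new)
        # Metropolis criterion: always accept downhill moves, and accept
        # uphill moves with probability exp(-dE / T).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = x_new, e_new
        # Periodically record the current state and keep only the best few.
        if step % save_every == 0:
            snapshots.append((e, x))
            snapshots.sort()
            del snapshots[keep:]
    return snapshots


# Start in the higher well; whether the walk escapes depends on temperature and step size.
for e, x in metropolis_search(toy_energy, x0=1.0):
    print(f"energy={e:.3f}  x={x:.3f}")
```

Keeping several snapshots rather than only the single lowest-energy state reflects the point made above: with an imperfect force field, the lowest-energy conformation is not guaranteed to be the most native-like one.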

All-atom structure reconstruction

The all-atom structure must then be reconstructed from the reduced model. Some prediction methods adopt a representation of “Cα atom” plus “virtual center of side chain”, where the virtual center only assists in determining the positions of the Cα atoms during the conformational search, and the output structure contains only Cα atoms. In that case, reconstruction is usually divided into two separate steps. The first step is to rebuild the remaining backbone atoms (N, C and O) from the positions of the Cα atoms, which is the primary function of many methods developed specifically for all-atom reconstruction. The second step is to rebuild the side chain of every residue.

Structure refinement

Selecting structures by clustering may also introduce local structural problems if the cluster centroid structures are used. It is therefore almost routine to refine the structure further after all-atom reconstruction.
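For readers unfamiliar with selection by clustering, here is a minimal sketch of picking a cluster centroid from a set of candidate structures (decoys). It assumes Cα-only decoys of equal length and uses distance RMSD (dRMSD), which compares intra-structure distances and so needs no superposition. The cutoff value, the function names and the toy coordinates are hypothetical, and production tools generally use superposition-based RMSD and more elaborate clustering.

```python
import math
from itertools import combinations


def distance_profile(coords):
    """All pairwise intra-structure distances, in a fixed order."""
    return [math.dist(p, q) for p, q in combinations(coords, 2)]


def drmsd(profile_a, profile_b):
    """Root-mean-square difference between two distance profiles."""
    squared = sum((a - b) ** 2 for a, b in zip(profile_a, profile_b))
    return math.sqrt(squared / len(profile_a))


def largest_cluster_centroid(decoys, cutoff=1.0):
    """Return (index, members) of the decoy with the most neighbours within cutoff.

    Each decoy counts as its own neighbour; ties go to the earliest decoy.
    """
    profiles = [distance_profile(d) for d in decoys]
    best_index, best_members = None, []
    for i, profile_i in enumerate(profiles):
        members = [j for j, profile_j in enumerate(profiles)
                   if drmsd(profile_i, profile_j) <= cutoff]
        if len(members) > len(best_members):
            best_index, best_members = i, members
    return best_index, best_members


# Hypothetical three-residue decoys: three extended chains and one bent chain.
decoys = [
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.5, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.7, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)],
]
print(largest_cluster_centroid(decoys))
```

This sketch returns an actual member of the cluster. Methods that instead average the coordinates of cluster members can produce distorted local geometry, which is one reason a refinement step follows. Note also that dRMSD cannot distinguish a structure from its mirror image.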

BioinfoLytics Company

Our company, BioinfoLytics, is affiliated with BioCode. Through this project we cover many topics in genomics and proteomics and their analysis with a range of tools, including Sequence Alignment & Analysis, Bioinformatics Scripting & Software Development, Phylogenetic and Phylogenomic Analysis, Functional Analysis, Biological Data Analysis & Visualization, Custom Analysis, Biological Database Analysis, Molecular Docking, Protein Structure Prediction and Molecular Dynamics. These services are intended to help BioCode users develop their interests, meet their requirements and obtain the results they need. The platform offers opportunities to learn, to get help with the analysis of research projects, and to build knowledge in molecular, computational and analytical biology.

We provide a “Protein Structure Prediction” service to the bioinformatics community, using bioinformatics tools that help with sequence similarity searches, multiple sequence alignment, identification and characterization of domains, secondary structure prediction, solvent accessibility prediction, automatic protein-fold recognition, and construction of 3D protein structures in atomic detail.


