
Homology Modeling

BioCodeKb - Bioinformatics Knowledgebase

Homology modeling builds an atomic model of a protein from the experimentally determined structure of another protein that is closely related at the sequence level.


As the name suggests, homology modeling predicts protein structures based on sequence homology with known structures. It is also known as comparative modeling. The principle behind it is that if two proteins share a high enough sequence similarity, they are likely to have very similar three-dimensional structures. Homology modeling produces an all-atom model based on alignment with template proteins.


Template Selection

Template selection involves searching the Protein Data Bank (PDB) for homologous proteins with determined structures. The search can be performed using a heuristic pairwise alignment search program such as BLAST or FASTA. However, dynamic programming based search programs such as SSEARCH or ScanPS can give more sensitive results. Because the structural database is relatively small, the search time of these exhaustive methods remains within reasonable limits while ensuring the best possible similarity hits. As a rule of thumb, a database protein should have at least 30% sequence identity with the query sequence to be selected as a template. Occasionally, a 20% identity level can be used as the threshold, as long as the identity of the sequence pair falls within the “safe zone”. On the other hand, there may be situations in which no highly similar sequences can be found in the structure database. In that case, template selection becomes difficult.
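The 30% rule of thumb can be sketched in a few lines of Python. This is only an illustration: the hit names and aligned sequences are hypothetical, and percent identity is assumed here to be counted over aligned (non-gap) columns, which is one of several conventions in use.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent identity over the non-gap columns of a pairwise alignment."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # gap columns are not counted (an assumed convention)
        aligned += 1
        if q == t:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def select_templates(aligned_hits, threshold=30.0):
    """Return (template_id, identity) pairs at or above threshold, best first."""
    scored = [(tid, percent_identity(q, t)) for tid, (q, t) in aligned_hits.items()]
    kept = [(tid, pid) for tid, pid in scored if pid >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical search hits: template id -> (aligned query, aligned template)
hits = {
    "tmplA": ("MKTAYIAK", "MKTAYIAK"),
    "tmplB": ("MKTAYIAK", "MRTA-LAK"),
    "tmplC": ("MKTAYIAK", "GHSWPLQE"),
}
selected = select_templates(hits)
```

Passing `threshold=20.0` would correspond to the relaxed cutoff mentioned above; whether a pair falls within the “safe zone” also depends on alignment length, which this sketch does not model.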


Sequence Alignment

Once the structure with the highest sequence similarity is identified as a template, the full-length sequences of the template and target proteins need to be realigned using refined alignment algorithms to obtain an optimal alignment. Errors made in the alignment step cannot be corrected in the subsequent modeling steps. Therefore, the best possible multiple alignment algorithms, such as Praline and T-Coffee, should be used for this purpose. Even an alignment from the best alignment program may not be error free. If necessary, the alignment should be refined manually to improve its quality.
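To make the realignment step concrete, here is a minimal sketch of global pairwise alignment by dynamic programming (Needleman-Wunsch). It is a simplified stand-in, not what Praline or T-Coffee do: the match, mismatch, and linear gap scores are illustrative assumptions, whereas real tools use substitution matrices, more elaborate gap penalties, and information from multiple sequences.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align sequences a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# Hypothetical target and template sequences
aligned_target, aligned_template, nw_score = needleman_wunsch("MKTAYIAK", "MKTYIAK")
```

The returned strings have equal length, with `-` marking gaps, so they can be inspected and refined by hand before model building.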


Backbone Model Building

Residues in the aligned regions of the target protein can be assumed to adopt a structure similar to that of the template protein, meaning that the coordinates of the corresponding template residues can simply be copied onto the target protein. If the two aligned residues are identical, the coordinates of the side chain atoms are copied along with the main chain atoms. If the two residues differ, only the backbone atoms can be copied. The side chain atoms are rebuilt in a subsequent procedure.
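The copying rule can be sketched as follows. The data layout (one dictionary of atom name to coordinates per template residue) and the function name `build_core_model` are assumptions made for illustration; real modeling programs read these from PDB or mmCIF files.

```python
BACKBONE_ATOMS = ("N", "CA", "C", "O")


def build_core_model(aligned_target, aligned_template, template_residues):
    """Copy template coordinates onto the target following the alignment.

    template_residues holds one {atom_name: (x, y, z)} dict per template
    residue, in sequence order (a hypothetical in-memory layout).
    """
    model = []
    t_index = 0
    for tgt, tmpl in zip(aligned_target, aligned_template):
        if tmpl == "-":
            # Insertion in the target: no coordinates to copy, left for loop modeling.
            if tgt != "-":
                model.append({"resname": tgt, "atoms": {}})
            continue
        atoms = template_residues[t_index]
        t_index += 1
        if tgt == "-":
            continue  # deletion in the target: template residue is skipped
        if tgt == tmpl:
            copied = dict(atoms)  # identical residue: backbone and side chain
        else:
            copied = {k: v for k, v in atoms.items() if k in BACKBONE_ATOMS}
        model.append({"resname": tgt, "atoms": copied})
    return model


# Hypothetical two-residue template "AS" with made-up coordinates
template = [
    {"N": (0.0, 0.0, 0.0), "CA": (1.5, 0.0, 0.0), "C": (2.0, 1.4, 0.0),
     "O": (1.2, 2.3, 0.0), "CB": (2.0, -0.8, 1.2)},
    {"N": (3.3, 1.6, 0.0), "CA": (3.9, 2.9, 0.0), "C": (5.4, 2.8, 0.0),
     "O": (6.0, 1.7, 0.0), "CB": (3.5, 3.7, 1.2), "OG": (3.9, 5.0, 1.1)},
]
core = build_core_model("AT", "AS", template)
```

Here the first residue keeps its side chain atom `CB`, while the Ser to Thr substitution in the second position keeps only `N`, `CA`, `C`, and `O`.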


Loop Modeling

In most cases, the alignment between the model and template sequences contains gaps. These insertions and deletions can be modeled by allowing some conformational changes to the backbone, although such changes rarely occur within secondary structure elements. It is therefore usually safe to shift the insertions and deletions in the alignment out of helices and strands and to place them in loops or coils.

There are two main ways to overcome this and model the loop region:

Knowledge-based method:

The PDB is searched for known loops whose endpoints match the residues between which the loop has to be inserted, and the conformation of the best-matching loop is simply copied.

Energy-based method:

The quality of a loop is judged with an energy function, and this function is minimized using Monte Carlo or molecular dynamics techniques to find the best loop conformation.
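As a rough illustration of the energy-based approach, the sketch below runs a Metropolis Monte Carlo search over a list of loop torsion angles. The energy function `toy_loop_energy` is a made-up placeholder (each torsion simply prefers -60 degrees), not a real loop scoring function, and the step size, temperature factor `kT`, and step count are arbitrary assumptions.

```python
import math
import random


def toy_loop_energy(torsions):
    """Hypothetical energy: a cosine well per torsion with its minimum at -60 degrees."""
    return sum(1.0 - math.cos(math.radians(t + 60.0)) for t in torsions)


def metropolis_minimize(torsions, energy_fn, steps=5000, step_size=15.0,
                        kT=0.2, seed=0):
    """Return the lowest-energy torsion set seen during a Metropolis walk."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    current = list(torsions)
    e_current = energy_fn(current)
    best, e_best = list(current), e_current
    for _ in range(steps):
        trial = list(current)
        k = rng.randrange(len(trial))
        trial[k] += rng.uniform(-step_size, step_size)  # perturb one torsion
        e_trial = energy_fn(trial)
        delta = e_trial - e_current
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


start = [180.0, 60.0, -120.0, 0.0]  # arbitrary starting torsions in degrees
best_torsions, best_energy = metropolis_minimize(start, toy_loop_energy)
```

In a real loop modeling run the energy would also have to penalize loops that fail to connect the two anchor residues; that closure constraint is left out here to keep the search loop itself visible.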


Side Chain Modeling

Proteins that are structurally similar tend to have similar torsion angles about the Cα-Cβ bond (the χ1 angle) when their side chain conformations are compared. In such cases, copying conserved residues entirely from the template to the model gives higher accuracy than copying only the backbone and re-predicting the side chains. Side chain modeling is partially knowledge based: it uses libraries of rotamers extracted from high-resolution X-ray structures. To build a position-specific rotamer library, one takes high-resolution protein structures and collects all stretches of three to seven residues (method dependent) with a given amino acid at the center.
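The torsion about the Cα-Cβ bond, conventionally called χ1 and defined by the atoms N, Cα, Cβ and the first γ atom, can be computed from four coordinates and then binned, which is the basic bookkeeping behind a rotamer library. The sketch below is an assumption-laden simplification: it bins into three wells centered on -60, 60, and 180 degrees and ignores backbone dependence and the higher χ angles.

```python
import math
from collections import Counter


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points (for chi1: N, CA, CB, CG)."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)
    n2 = _cross(b1, b2)
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))


def nearest_rotamer_well(chi, wells=(-60.0, 60.0, 180.0)):
    """Assign an angle to the closest well, measuring distance on the circle."""
    def circular_distance(w):
        return abs(((chi - w + 180.0) % 360.0) - 180.0)
    return min(wells, key=circular_distance)


# Simple test geometry: central bond along z, end atoms 90 degrees apart
chi_example = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))

# Hypothetical chi1 values (degrees) collected for one residue type
observed = [-65.0, -58.0, 62.0, 175.0, -178.0]
library = Counter(nearest_rotamer_well(c) for c in observed)
```

Dividing the counts in `library` by their total gives rotamer frequencies; a position-specific library would keep a separate counter for each local backbone context.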

Model Optimization

Sometimes rotamers are predicted on an incorrect backbone, or are themselves predicted incorrectly. In such cases, modeling programs restrain the atom positions and/or apply only a few hundred steps of energy minimization. Optimizing a model further requires a more accurate force field, and this accuracy can be pursued in two ways.


Quantum force fields:

To handle large molecules efficiently, a force field must be fast; energies are therefore normally expressed as a function of the positions of the atomic nuclei only.


Self-parametrizing force fields:

The precision of a force field depends to a large extent on its parameters (e.g., van der Waals radii, atomic charges). These parameters are usually obtained from quantum chemical calculations on small molecules and from fitting to experimental data, following elaborate rules. By applying the force field to proteins, one implicitly assumes that a peptide chain is just the sum of its individual small molecule building blocks, the amino acids.
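To show what "energy as a function of nuclear positions" with van der Waals radii and atomic charges as parameters looks like, here is a minimal sketch of the non-bonded part of a classical force field (Lennard-Jones plus Coulomb). The parameter values, the Lorentz-Berthelot combining rules, and the approximate Coulomb constant are assumptions for illustration; bonded terms are omitted, and no specific published force field is reproduced.

```python
import math
from itertools import combinations

# Approximate conversion factor giving kcal/mol for charges in e and distances in Å
COULOMB_CONSTANT = 332.06


def pair_energy(r, epsilon, r_min, q1, q2):
    """Lennard-Jones (minimum -epsilon at r_min) plus Coulomb energy for one pair."""
    ratio6 = (r_min / r) ** 6
    lennard_jones = epsilon * (ratio6 * ratio6 - 2.0 * ratio6)
    coulomb = COULOMB_CONSTANT * q1 * q2 / r
    return lennard_jones + coulomb


def nonbonded_energy(atoms):
    """Sum pair energies over all atom pairs; depends only on positions and parameters."""
    total = 0.0
    for a, b in combinations(atoms, 2):
        r = math.sqrt(sum((p - q) ** 2 for p, q in zip(a["xyz"], b["xyz"])))
        # Lorentz-Berthelot combining rules (an assumed, commonly used choice)
        r_min = a["r_min_half"] + b["r_min_half"]
        epsilon = math.sqrt(a["epsilon"] * b["epsilon"])
        total += pair_energy(r, epsilon, r_min, a["charge"], b["charge"])
    return total


# Two hypothetical neutral atoms placed exactly at their preferred separation
atoms = [
    {"xyz": (0.0, 0.0, 0.0), "charge": 0.0, "r_min_half": 1.9, "epsilon": 0.1},
    {"xyz": (3.8, 0.0, 0.0), "charge": 0.0, "r_min_half": 1.9, "epsilon": 0.1},
]
energy_at_minimum = nonbonded_energy(atoms)
```

Changing `r_min_half`, `epsilon`, or `charge` changes the energy landscape without touching the functional form, which is why the quality of the parameters matters so much.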


Model Validation

The models we obtain may contain errors. The size of these errors depends mainly on two things.

First, the percentage sequence identity between target and template. If it is above 90%, the accuracy of the model can be compared to that of crystallography, except for a few individual side chains. If it lies between 50% and 90%, the r.m.s.d. error in the modeled coordinates can be as large as 1.5 Å, and the model contains considerably more errors. If it is below 25%, the alignment becomes the main difficulty for homology modeling, often leading to much larger errors.
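The r.m.s.d. quoted above is the root-mean-square deviation between corresponding atoms of two structures. A minimal sketch, assuming the two coordinate sets are already superimposed and listed in the same atom order (no fitting step is performed here):

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation in the units of the input coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        squared += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(squared / len(coords_a))


# Hypothetical C-alpha positions (in Å) and a copy displaced by 1.5 Å along x
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(x + 1.5, y, z) for x, y, z in reference]
deviation = rmsd(reference, shifted)
```

In practice the model is first optimally superimposed on the reference structure, since an un-fitted r.m.s.d. also includes any overall shift or rotation between the two.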


Second, errors in a model become less of a problem if they can be localized. Therefore, an essential step in the homology modeling process is the verification of the model. The errors can be estimated by calculating the model’s energy based on a force field. This method checks whether the bond lengths and bond angles fall within their normal ranges.
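A very small version of such a geometry check is sketched below for backbone bond lengths within one residue. The reference lengths are approximate values commonly quoted for protein backbones, and the 0.05 Å tolerance is an arbitrary assumption; real validation tools use residue-specific statistics and also check angles, torsions, and contacts.

```python
import math

# Approximate ideal backbone bond lengths in Å (assumed reference values)
IDEAL_BOND_LENGTHS = {
    ("N", "CA"): 1.458,
    ("CA", "C"): 1.525,
    ("C", "O"): 1.231,
}


def check_bond_lengths(residue_atoms, tolerance=0.05):
    """Return (atom1, atom2, length) for bonds deviating more than tolerance."""
    outliers = []
    for (a1, a2), ideal in IDEAL_BOND_LENGTHS.items():
        if a1 not in residue_atoms or a2 not in residue_atoms:
            continue  # cannot check a bond with a missing atom
        p, q = residue_atoms[a1], residue_atoms[a2]
        length = math.sqrt(sum((u - v) ** 2 for u, v in zip(p, q)))
        if abs(length - ideal) > tolerance:
            outliers.append((a1, a2, length))
    return outliers


# Hypothetical residues: bond lengths are ideal, bond angles are not realistic
good_residue = {
    "N": (0.0, 0.0, 0.0),
    "CA": (1.458, 0.0, 0.0),
    "C": (1.458, 1.525, 0.0),
    "O": (1.458, 1.525, 1.231),
}
bad_residue = dict(good_residue, O=(1.458, 1.525, 1.6))  # stretched C=O bond

good_report = check_bond_lengths(good_residue)
bad_report = check_bond_lengths(bad_residue)
```

Because each outlier names the atoms involved, a check like this also helps localize errors to particular residues rather than only scoring the model as a whole.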
