I-TASSER (Iterative Threading ASSEmbly Refinement) is a hierarchical approach to protein structure and function prediction. It first identifies structural templates from the PDB using the multiple-threading approach LOMETS, then constructs full-length atomic models by iterative template-based fragment assembly simulations. Functional insights into the target are then derived by re-threading the 3D models through the protein function database BioLiP.
I-TASSER has been extended to structure-based protein function prediction, providing annotations of ligand-binding sites, gene ontology terms and enzyme commission numbers by structurally matching models of the target protein to known proteins in protein function databases. An on-line server, built in the Yang Zhang Lab at the University of Michigan, Ann Arbor, allows users to submit sequences and obtain structure and function predictions. A standalone package of I-TASSER is available for download at the I-TASSER website.
The I-TASSER server is an on-line platform that implements the I-TASSER-based algorithms for protein structure and function prediction. It allows academic users to automatically generate high-quality predictions of the 3D structure and biological function of protein molecules from their amino acid sequences.
When a user submits an amino acid sequence, the server first tries to retrieve template proteins of similar folds (or super-secondary structures) from the PDB library using LOMETS, a locally installed meta-threading approach.
In the second step, the continuous fragments excised from the PDB templates are reassembled into full-length models by replica-exchange Monte Carlo simulations, with the threading-unaligned regions (mainly loops) built by ab initio modeling. In cases where no appropriate template is identified by LOMETS, I-TASSER builds the whole structure by ab initio modeling. The low free-energy states are identified by SPICKER through clustering of the simulation decoys.
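The replica-exchange idea mentioned above can be illustrated with a minimal sketch. This is not the I-TASSER implementation: the one-dimensional double-well energy, the move size, and the function names (swap_probability, toy_energy, replica_exchange) are all assumptions chosen for illustration. Only the standard Metropolis swap criterion between replicas at neighbouring temperatures is shown.

```python
import math
import random


def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Standard Metropolis probability of exchanging two replicas.

    beta = 1 / temperature. The swap is always accepted when the colder
    replica currently holds the higher-energy state.
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def toy_energy(x):
    """Hypothetical double-well energy standing in for a force field."""
    return (x * x - 1.0) ** 2


def replica_exchange(temperatures, n_sweeps=200, step=0.5, seed=0):
    """Run a toy replica-exchange Monte Carlo simulation on toy_energy."""
    rng = random.Random(seed)
    betas = [1.0 / t for t in temperatures]
    states = [rng.uniform(-2.0, 2.0) for _ in temperatures]
    for _ in range(n_sweeps):
        # Local Metropolis move within each replica.
        for k, beta in enumerate(betas):
            trial = states[k] + rng.uniform(-step, step)
            d_e = toy_energy(trial) - toy_energy(states[k])
            if d_e <= 0 or rng.random() < math.exp(-beta * d_e):
                states[k] = trial
        # Attempt swaps between neighbouring temperatures.
        for k in range(len(betas) - 1):
            p = swap_probability(
                betas[k], betas[k + 1],
                toy_energy(states[k]), toy_energy(states[k + 1]),
            )
            if rng.random() < p:
                states[k], states[k + 1] = states[k + 1], states[k]
    return states


if __name__ == "__main__":
    print(replica_exchange([0.1, 0.5, 2.0]))
```

The high-temperature replicas cross energy barriers easily, and the swaps let low-energy conformations found there migrate to the low-temperature replicas, which is the general motivation for using replica exchange in conformational searches.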
In the third step, the fragment assembly simulation is performed again starting from the SPICKER cluster centroids, where the spatial restraints collected from both the LOMETS templates and the PDB structures identified by TM-align are used to guide the simulations. The purpose of the second iteration is to remove steric clashes as well as to refine the global topology of the cluster centroids. The decoys generated in the second simulation are then clustered and the lowest-energy structures are selected. The final full-atomic models are obtained by REMO, which builds the atomic details from the selected I-TASSER decoys through the optimization of the hydrogen-bonding network.
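The decoy clustering used in both iterations can be sketched as a greedy neighbour-count procedure. This is a simplified illustration in the spirit of SPICKER, not its actual code: the function name greedy_cluster, the fixed cutoff, and the use of a precomputed distance matrix (standing in for pairwise RMSD between decoys) are assumptions.

```python
def greedy_cluster(distance, cutoff):
    """Greedily cluster decoys from a symmetric pairwise distance matrix.

    Repeatedly picks the remaining decoy with the most remaining
    neighbours within `cutoff`, reports it as a cluster center together
    with those neighbours, and removes them from the pool.
    Returns a list of (center_index, member_indices) tuples, largest
    neighbourhood first.
    """
    remaining = set(range(len(distance)))
    clusters = []
    while remaining:
        # Ties are broken in favour of the smallest index.
        center = max(
            sorted(remaining),
            key=lambda i: sum(1 for j in remaining if distance[i][j] <= cutoff),
        )
        members = sorted(j for j in remaining if distance[center][j] <= cutoff)
        clusters.append((center, members))
        remaining -= set(members)
    return clusters


if __name__ == "__main__":
    # Hypothetical decoys placed on a line; distances stand in for RMSD.
    points = [0.0, 0.1, 0.2, 5.0, 5.1]
    matrix = [[abs(a - b) for b in points] for a in points]
    print(greedy_cluster(matrix, cutoff=0.3))
```

The most populated cluster corresponds to the most frequently visited region of conformational space, which is why clustering serves as a proxy for identifying low free-energy states.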
To predict the biological function of the protein, the I-TASSER server matches the predicted 3D models to the proteins in three independent libraries, which consist of proteins with known enzyme classification (EC) numbers, gene ontology (GO) terms, and ligand-binding sites. The final function predictions are deduced from the consensus of the top structural matches, with function scores calculated from the confidence score of the I-TASSER structural models, the structural similarity between model and templates as evaluated by TM-score, and the sequence identity in the structurally aligned regions.
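The TM-score used above to measure structural similarity can be sketched as follows. The sketch evaluates the published TM-score formula for a single, fixed superposition; the full method also searches for the superposition that maximizes the score, which is omitted here. The function name tm_score, the input format (a list of distances in angstroms between aligned residue pairs), and the constant d0 = 0.5 used for very short chains are assumptions of this example.

```python
def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances (in angstroms) between aligned residue pairs.
    target_length: number of residues in the target protein, used for
    normalization so that unaligned residues lower the score.
    """
    if target_length <= 21:
        # Assumed floor for short chains, where the formula below
        # would give a very small or undefined d0.
        d0 = 0.5
    else:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    # Hypothetical 100-residue target with 80 aligned residues at 2 angstroms.
    print(round(tm_score([2.0] * 80, 100), 3))
```

Because the sum is divided by the target length rather than the number of aligned residues, the score lies between 0 and 1 and reaches 1 only when every residue of the target is aligned at zero distance.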
The pipeline consists of six consecutive steps:
Secondary structure prediction by PSSpred
Template detection by LOMETS
Fragment structure assembly using replica-exchange Monte Carlo simulation
Model selection by clustering structure decoys using SPICKER
Atomic-level structure refinement by fragment-guided molecular dynamics simulation (FG-MD) or ModRefiner
Structure-based biology function annotation by COACH
The output of the I-TASSER server includes:
Up to five full-length atomic models
Estimated accuracy of the predicted models
GIF images of the predicted models
Predicted secondary structures
Predicted solvent accessibility
Top 10 threading alignments from LOMETS
Top 10 proteins in the PDB that are structurally closest to the predicted models
Predicted Enzyme Classification and the confidence score
Predicted GO terms and the confidence score
Predicted ligand-binding sites and the confidence score
An image of the predicted ligand-binding sites