An estimated 25% of all eukaryotic proteins contain repeats, which underlines the importance of duplication for evolving new protein functions. Internal repeats often correspond to structural or functional units in proteins. Methods capable of identifying diverged repeated segments or domains at the sequence level can therefore help in predicting domain structures, inferring hypotheses about function and mechanism, and investigating the evolution of proteins from smaller fragments.
HHrepID is a method for the de novo identification of repeats in protein sequences. It detects the sequence signature of structural repeats in many proteins that were not previously known to possess internal sequence symmetry, such as outer membrane β-barrels. HHrepID uses HMM-HMM comparison to exploit evolutionary information in the form of multiple sequence alignments of homologs. In contrast to a previous method, HHrepID:
(1) generates a multiple alignment of repeats
(2) utilizes the transitive nature of homology through a novel merging procedure with fully probabilistic treatment of alignments
(3) improves alignment quality through an algorithm that maximizes the expected accuracy
(4) identifies different kinds of repeats within complex architectures through a probabilistic domain boundary detection method
(5) improves sensitivity through a new approach to assess statistical significance.
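Feature (3) can be illustrated with a generic maximum-expected-accuracy alignment. The sketch below is not HHrepID's implementation: it assumes a precomputed matrix of posterior match probabilities (in HHrepID these would come from HMM-HMM comparison) and uses plain dynamic programming to find the global alignment with the largest summed posterior. The function name `mea_alignment` and the `penalty` parameter are hypothetical, introduced only for this example.

```python
from typing import List, Tuple


def mea_alignment(
    post: List[List[float]], penalty: float = 0.0
) -> Tuple[float, List[Tuple[int, int]]]:
    """Generic maximum-expected-accuracy alignment (illustrative sketch).

    post[i][j] is an assumed, precomputed posterior probability that
    position i of the first segment is aligned to position j of the second.
    `penalty` is a hypothetical threshold subtracted from each matched pair,
    so that low-confidence pairs are left unaligned.
    """
    n = len(post)
    m = len(post[0]) if n else 0

    # score[i][j]: best summed (posterior - penalty) for prefixes i and j.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + post[i - 1][j - 1] - penalty,  # match
                score[i - 1][j],  # skip position i
                score[i][j - 1],  # skip position j
            )

    # Traceback: prefer skipping on ties, so only pairs that strictly
    # improve the score are reported as aligned.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j]:
            i -= 1
        elif score[i][j] == score[i][j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


if __name__ == "__main__":
    toy = [
        [0.9, 0.1, 0.0],
        [0.1, 0.8, 0.1],
        [0.0, 0.1, 0.7],
    ]
    total, aligned = mea_alignment(toy)
    print(total, aligned)
```

Raising `penalty` trades coverage for precision: pairs whose posterior falls below it no longer contribute positively and are dropped from the alignment.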
HHrepID is demonstrated here on the outer membrane protein A (OmpA). The membrane domain of OmpA consists of a closed, eight-stranded antiparallel β-barrel with short turns at the periplasmic barrel end and long, flexible loops at the extracellular end. It has been speculated that outer membrane β-barrels may have originated by duplication of an ancient β-hairpin motif. Owing to its high sensitivity, HHrepID unambiguously identifies four β-hairpin repeats in OmpA. Overall, these findings provide the first clear evidence for the origin of OmpA, and indeed of all outer membrane β-barrels, by duplication of an ancient β-hairpin module.
HHrepID clearly identifies all four repeat units with the correct boundaries. The dot plot shows the four repeat alignments (blue) as predicted by HHrepID. The position and length of the identified repeat units are depicted above and to the left of the dot plot. In the structure, the detected repeat units are highlighted in their respective colors. Green paths correspond to structural alignments of repeats. Cells in which HHrepID’s alignments agree with the structural alignments are highlighted in red.
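The agreement shown in the dot plot can be expressed as a comparison of two sets of cells. The following is a minimal sketch, assuming each alignment is given as a list of (row, column) residue pairs; the function name `alignment_agreement` and the toy coordinates are hypothetical and are not taken from the OmpA analysis.

```python
from typing import Iterable, Set, Tuple

Cell = Tuple[int, int]


def alignment_agreement(
    predicted: Iterable[Cell], reference: Iterable[Cell]
) -> Tuple[Set[Cell], float, float]:
    """Compare predicted alignment cells with reference (structural) cells.

    Returns the shared cells, the fraction of predicted cells that are also
    in the reference, and the fraction of reference cells that were predicted.
    """
    pred = set(predicted)
    ref = set(reference)
    shared = pred & ref
    frac_of_predicted = len(shared) / len(pred) if pred else 0.0
    frac_of_reference = len(shared) / len(ref) if ref else 0.0
    return shared, frac_of_predicted, frac_of_reference


if __name__ == "__main__":
    # Toy coordinates for illustration only.
    predicted_cells = [(1, 45), (2, 46), (3, 47), (4, 49)]
    structural_cells = [(1, 45), (2, 46), (3, 47), (4, 48), (5, 49)]
    shared, p, r = alignment_agreement(predicted_cells, structural_cells)
    print(sorted(shared), p, r)
```

In the figure's terms, `shared` corresponds to the cells highlighted in red, while the two fractions summarize how much of each alignment is supported by the other.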
The MPI Bioinformatics Toolkit is a free, one-stop web service for protein bioinformatic analysis. It currently offers 34 interconnected external and in-house tools, whose functionality covers sequence similarity searching, alignment construction, detection of sequence features, structure prediction, and sequence classification. This breadth has made the Toolkit an important resource for experimental biology and for teaching bioinformatic inquiry. Recently, the first version of the Toolkit, which was released in 2005 and had served around 2.5 million queries, was replaced with an entirely new version that focuses on improved features for the comprehensive analysis of proteins, as well as on promoting teaching. For instance, the popular remote homology detection server, HHpred, now allows pairwise comparison of two sequences or alignments and offers additional profile HMMs for several model organisms and domain databases.