TCoffee

BioCodeKb - Bioinformatics Knowledgebase

Multiple sequence alignment (MSA) is one of the most widely used techniques in bioinformatics. Indeed, the comparison of multiple homologous sequences has applications in almost all fields of modern biology, from simple data monitoring to sophisticated modeling tasks such as structure prediction and phylogenetic reconstruction.


T-Coffee (Tree-based Consistency Objective Function for Alignment Evaluation) is a consistency-based multiple sequence alignment program that uses a progressive approach. It builds a library of pairwise alignments to guide the multiple sequence alignment. It also has advanced features for evaluating alignment quality and some capacity for identifying motifs.


By default, T-Coffee compares all of the input sequences two by two, producing a global alignment and a series of local alignments (using LAlign) for each pair. T-Coffee can be used to align sequences directly or to combine the output of other alignment methods, such as Clustal, MAFFT, ProbCons and MUSCLE, into a single alignment (the M-Coffee mode).
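
To give a sense of what comparing sequences "two by two" implies, here is a minimal Python sketch that enumerates every unordered pair of sequences. The function name and the sequence names are hypothetical and are not part of T-Coffee itself; the point is only that N sequences produce N(N-1)/2 pairwise comparisons.

```python
from itertools import combinations


def sequence_pairs(names):
    """Return every unordered pair of sequence names, each pair once."""
    return list(combinations(names, 2))


# Hypothetical sequence names, used only for illustration.
names = ["seqA", "seqB", "seqC", "seqD"]
pairs = sequence_pairs(names)

# For N sequences there are N * (N - 1) / 2 pairs to align.
print(len(pairs))
for first, second in pairs:
    print(first, second)
```

Because the number of pairs grows quadratically with the number of sequences, this pre-processing step is one reason run time increases quickly on large data sets.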


T-Coffee provides a dramatic improvement in accuracy with a modest sacrifice in speed as compared to the most commonly used alternatives. The method is broadly based on the popular progressive approach to multiple alignment but avoids the most serious pitfalls caused by the greedy nature of this algorithm. With T-Coffee we pre-process a data set of all pair-wise alignments between the sequences. This provides us with a library of alignment information that can be used to guide the progressive alignment. Intermediate alignments are then based not only on the sequences to be aligned next but also on how all of the sequences align with each other. This alignment information can be derived from heterogeneous sources such as a mixture of alignment programs and structure superposition.
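
The idea of letting every sequence inform each pairwise comparison can be sketched with a toy "library extension" step. This is a simplified illustration, not T-Coffee's actual code: the data layout (residues as `(sequence_name, position)` tuples, weights stored in a dictionary) and the function names are assumptions made for the example. Each residue pair keeps its primary weight and gains, for every intermediate residue that both are aligned to, the smaller of the two connecting weights.

```python
from collections import defaultdict


def pair_key(res_a, res_b):
    """Order-independent key for a pair of residues."""
    return tuple(sorted((res_a, res_b)))


def extend_library(primary):
    """Toy consistency extension of a pairwise library.

    primary maps (residue, residue) -> weight, where a residue is a
    (sequence_name, position) tuple and both residues come from
    different sequences.
    """
    # For each residue, record the residues it is aligned with.
    neighbours = defaultdict(dict)
    for (res_a, res_b), weight in primary.items():
        neighbours[res_a][res_b] = weight
        neighbours[res_b][res_a] = weight

    # Start from the primary weights.
    extended = defaultdict(float)
    for (res_a, res_b), weight in primary.items():
        extended[pair_key(res_a, res_b)] += weight

    # Two residues aligned to the same intermediate residue support
    # each other with the smaller of the two connecting weights.
    for partners in neighbours.values():
        items = sorted(partners.items())
        for i, (res_a, w_a) in enumerate(items):
            for res_b, w_b in items[i + 1:]:
                if res_a[0] == res_b[0]:
                    continue  # residues of one sequence are never aligned together
                extended[pair_key(res_a, res_b)] += min(w_a, w_b)
    return dict(extended)


# Hypothetical primary library for three sequences A, B and C.
primary = {
    (("A", 0), ("B", 0)): 80,  # direct A-B evidence
    (("A", 0), ("C", 0)): 90,
    (("C", 0), ("B", 0)): 70,
    (("A", 0), ("B", 1)): 20,  # competing, weaker A-B pairing
}
extended = extend_library(primary)
print(extended[pair_key(("A", 0), ("B", 0))])  # 80 + min(90, 70)
print(extended[pair_key(("A", 0), ("B", 1))])  # no support through C
```

In this example the pairing of A0 with B0 is reinforced by sequence C, while the competing pairing of A0 with B1 is not, which is how consistency helps a progressive alignment avoid early greedy mistakes.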


Sequences for input can be in GCG, FASTA, EMBL (nucleotide only), GenBank, PIR, NBRF, PHYLIP or UniProtKB/Swiss-Prot (protein only) format. The sequence type may be DNA, RNA or protein. Once the sequences have been entered and the appropriate parameters, such as matrix selection and alignment order, have been set, the ‘Submit’ button at the bottom of the page must be clicked to send the alignment request to the server. An identification number is assigned to each request and used as a unique reference. The alignment process can take from a few seconds up to several minutes, depending on the alignment complexity, its input parameters and the server load. Running a tool is an interactive process: the results are delivered directly to the browser when they become available. It is also possible to be notified by email when the job is finished by ticking the box "Be notified by email". The output may be in ClustalW, Pearson/FASTA, GCG MSF, PHYLIP or HTML format.


The colored graphical output shows the level of consistency between the final alignment and the library used by T-Coffee. The main score is the total consistency value. A value of 100 means full agreement between the considered alignment and its associated primary library. It also means that the library is self-consistent.
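
As a rough illustration of what a consistency value measures, the sketch below computes the percentage of residue pairs in a final alignment that also appear in a library of pairwise alignments. This is a deliberately simplified, unweighted stand-in and should not be read as T-Coffee's actual scoring formula; the function names and the toy data are assumptions made for the example.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Collect residue pairs that share a column in a gapped alignment.

    alignment maps a sequence name to its gapped string; all strings
    must have the same length. Residues are (name, position) tuples,
    where position counts residues only, not gaps.
    """
    names = sorted(alignment)
    counters = {name: 0 for name in names}
    pairs = set()
    for col in range(len(alignment[names[0]])):
        column = []
        for name in names:
            if alignment[name][col] != "-":
                column.append((name, counters[name]))
                counters[name] += 1
        pairs.update(combinations(column, 2))
    return pairs


def consistency_percent(alignment, library_pairs):
    """Percentage of aligned residue pairs that the library supports."""
    pairs = aligned_pairs(alignment)
    if not pairs:
        return 0.0
    library = {tuple(sorted(pair)) for pair in library_pairs}
    supported = sum(1 for pair in pairs if pair in library)
    return 100.0 * supported / len(pairs)


# Hypothetical two-sequence alignment and a library that agrees with it fully.
alignment = {"A": "AC-T", "B": "ACGT"}
library = [
    (("A", 0), ("B", 0)),
    (("A", 1), ("B", 1)),
    (("A", 2), ("B", 3)),
]
print(consistency_percent(alignment, library))
```

Here every aligned pair is found in the library, so the value is 100; removing entries from the library lowers the percentage accordingly.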

