Clustal Omega

BioCodeKb - Bioinformatics Knowledgebase

Clustal Omega is a multiple sequence alignment program that aligns three or more sequences in a computationally efficient and accurate manner. It produces biologically meaningful multiple sequence alignments of divergent sequences. Evolutionary relationships among the aligned sequences can then be viewed as cladograms or phylograms.

Clustal Omega is the latest addition to the Clustal family. It scales far better than previous versions, allowing hundreds of thousands of sequences to be aligned in only a few hours, and it is widely used in bioinformatics. Of all the Clustal tools, Clustal Omega runs on the widest variety of operating systems.

All variations of the Clustal software align sequences using a heuristic that progressively builds a multiple sequence alignment from a series of pairwise alignments. The method first compares the sequences pairwise to generate a distance matrix. A guide tree is then calculated from the scores in the matrix using a clustering method such as UPGMA or neighbor-joining: at each step the two nearest clusters are combined, and this is repeated until the tree is complete. The guide tree is then used to build the multiple sequence alignment by progressively aligning the sequences in order of similarity. In Clustal Omega, this final alignment step uses the HHalign package from HH-suite, which aligns two profile HMMs.
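The guide tree step can be illustrated with a minimal UPGMA sketch. This is a toy version for a handful of sequences, not Clustal Omega's own code: the function name `upgma_guide_tree`, the dictionary-based distance matrix, and the nested-tuple tree are all assumptions made for illustration.

```python
from itertools import combinations


def upgma_guide_tree(names, dist):
    """Build a guide tree as nested tuples using UPGMA (average linkage).

    names: list of sequence labels
    dist:  dict mapping frozenset({a, b}) -> pairwise distance between a and b
    """
    # Each cluster is keyed by its subtree; the value lists its member leaves.
    clusters = {name: [name] for name in names}

    def average_distance(c1, c2):
        # UPGMA distance: mean of all leaf-to-leaf distances between clusters.
        pairs = [(a, b) for a in clusters[c1] for b in clusters[c2]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find and merge the two nearest clusters.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        merged_leaves = clusters.pop(c1) + clusters.pop(c2)
        clusters[(c1, c2)] = merged_leaves

    # The single remaining key is the full tree.
    return next(iter(clusters))


# Hypothetical distances for four sequences; A and B are the closest pair.
example_dist = {
    frozenset("AB"): 2, frozenset("AC"): 6, frozenset("AD"): 10,
    frozenset("BC"): 6, frozenset("BD"): 10, frozenset("CD"): 10,
}
tree = upgma_guide_tree(["A", "B", "C", "D"], example_dist)
print(tree)
```

Reading the nested tuples from the innermost pair outward gives the order in which a progressive aligner would combine sequences: the most similar pair first, then increasingly distant groups.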

ClustalΩ is a fast and scalable program written in C and C++. It uses seeded guide trees and a new HMM engine that focuses on aligning two profiles to generate these alignments. Clustal Omega is consistency-based and is widely viewed as one of the fastest online implementations of any multiple sequence alignment tool, while still ranking high in accuracy among both consistency-based and matrix-based algorithms.

Clustal Omega uses a modified version of mBed, which has a complexity of O(N log N) for N sequences and produces guide trees that are just as accurate as those from conventional methods. The speed and accuracy of guide tree construction in Clustal Omega are attributed to this modified mBed algorithm. It also reduces the computational time and memory required to complete alignments on large datasets and significantly improves sensitivity and alignment quality.
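The general idea behind an mBed-style embedding can be sketched as follows: instead of comparing every sequence with every other one, each sequence is compared only with a small set of seed sequences, and the resulting distance vectors are what gets clustered. The code below is a rough illustration under stated assumptions, not the modified mBed used in Clustal Omega: the k-mer Jaccard distance is a stand-in metric, and `n_seeds` and the longest-first seed choice are hypothetical.

```python
def kmer_set(seq, k=2):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=2):
    """Toy distance: 1 minus the Jaccard similarity of the two k-mer sets."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def embed(sequences, n_seeds):
    """Represent each sequence by its distances to n_seeds seed sequences.

    Seeds are simply the longest sequences here (an arbitrary choice for
    illustration). The cost is len(sequences) * n_seeds comparisons rather
    than one comparison for every pair of sequences.
    """
    seeds = sorted(sequences, key=len, reverse=True)[:n_seeds]
    return [[kmer_distance(s, seed) for seed in seeds] for s in sequences]


vectors = embed(["ACGT", "ACGA", "TTTT"], n_seeds=2)
for vec in vectors:
    print(vec)
```

Sequences whose vectors lie close together are similar to the same seeds, so clustering the vectors approximates clustering the sequences themselves at a fraction of the cost.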

On small numbers of sequences, the accuracy of Clustal Omega is, on average, very similar to that of aligners considered high quality. The difference emerges with large datasets of hundreds of thousands of sequences, where Clustal Omega outperforms other algorithms across the board. It can align 100,000 or more sequences on one processor in a few hours.

In Clustal output format, an optional command-line flag prints residue numbers at the end of each line.
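As a hedged illustration, a run that writes Clustal-format output with residue numbers might be scripted as below. It assumes that the flag in question is `--resno` and that the `clustalo` executable is installed and on the PATH, as in the Clustal Omega 1.2 command-line help; the helper names and file names are hypothetical.

```python
import shutil
import subprocess


def build_clustalo_command(fasta_in, aln_out, residue_numbers=True):
    """Assemble a clustalo command line for Clustal-format output."""
    cmd = ["clustalo", "-i", fasta_in, "-o", aln_out, "--outfmt=clu"]
    if residue_numbers:
        # Assumed flag: print residue numbers at the end of each line.
        cmd.append("--resno")
    return cmd


def run_clustalo(fasta_in, aln_out, residue_numbers=True):
    """Run clustalo, failing early if the executable is not available."""
    if shutil.which("clustalo") is None:
        raise RuntimeError("clustalo executable not found on PATH")
    subprocess.run(
        build_clustalo_command(fasta_in, aln_out, residue_numbers),
        check=True,
    )


print(" ".join(build_clustalo_command("sequences.fasta", "aligned.aln")))
```

Keeping command construction separate from execution makes it easy to inspect the exact command line before running it on a large dataset.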

BAliBASE version 3.0 is used to measure alignment quality and execution times for alignments composed of small numbers of sequences. Other aligners used for comparison include MAFFT and MUSCLE.

In the colored alignment output, magenta marks basic residues, green marks hydroxyl, sulfhydryl and amine residues plus glycine (G), and gray marks the others, such as unusual amino or imino acids. Consensus symbols show the degree of conservation at each position; for example, * (asterisk) marks positions that have a single, fully conserved residue.
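The asterisk rule can be sketched in a few lines. This is a simplified illustration rather than Clustal Omega's own output routine: the function name `conservation_line` is hypothetical, only the fully conserved case is handled, and columns containing gaps are treated as not conserved.

```python
def conservation_line(aligned):
    """Return a marker line with '*' under fully conserved columns.

    aligned: list of equal-length aligned sequences, with '-' for gaps.
    """
    if len({len(seq) for seq in aligned}) > 1:
        raise ValueError("aligned sequences must all have the same length")

    marks = []
    for column in zip(*aligned):
        residues = set(column)
        # Fully conserved: exactly one distinct residue and no gap.
        conserved = len(residues) == 1 and "-" not in residues
        marks.append("*" if conserved else " ")
    return "".join(marks)


alignment = ["MKV-LA", "MKI-LA", "MRVGLA"]
for seq in alignment:
    print(seq)
print(conservation_line(alignment))
```

In this example only the first and the last two columns contain a single residue across all three sequences, so only those positions receive an asterisk.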


