TrimAl is a tool for the automated removal of spurious sequences or poorly aligned regions from a multiple sequence alignment.
Multiple sequence alignments are central to many areas of bioinformatics. It has been shown that the removal of poorly aligned regions from an alignment increases the quality of subsequent analyses. Such an alignment trimming phase is complicated in large-scale phylogenetic analyses that deal with thousands of alignments. trimAl is a tool for automated alignment trimming, which is especially suited for large-scale phylogenetic analyses. trimAl can consider several parameters, alone or in multiple combinations, for selecting the most reliable positions in the alignment. These include the proportion of sequences with a gap, the level of amino acid similarity and, if several alignments for the same set of sequences are provided, the level of consistency across different alignments. Moreover, trimAl can automatically select the parameters to be used in each specific alignment so that the signal-to-noise ratio is optimized.
Moreover, the user can remove spurious sequences from the alignment before applying any method to improve the alignment's quality.
Among trimAl's additional features, trimAl allows getting the complementary alignment (columns that were trimmed), to compute statistics from the alignment, to select the output file format, to get a summary of trimAl's trimming in HTML format, and many other options.
trimAl has been developed in a GNU/Linux environment using C++ programming language and has been tested on various UNIX, Mac and Windows platforms. Moreover, a web server have developed to run trimAl online (http://phylemon2.bioinfo.cipf.es/), which has been included in the Phylemon suite for phylogenetic and phylogenomic tools.
trimAl reads and renders protein or nucleotide alignments in several standard formats. trimAl starts by reading all columns in an alignment and computes a score (Sx) for each of them. This score can be a gap score (Sg), a similarity score (Ss) or a consistency score (Sc). The score for each column can be computed based only on the information from that column or, if a window size of w is specified, it corresponds to the average value of w columns around the position considered.
Once all column scores have been computed trimAl can proceed in two ways. If both a score and a minimum conservation threshold are provided, trimAl renders a trimmed alignment in which only the columns with scores above the score threshold are included, as far as the number of selected columns is above a conservation threshold defined by the user. If this number is below the conservation threshold, trimAl will add more columns to the trimmed alignment in a decreasing order of scores until the conservation threshold is reached. The conservation threshold corresponds to the minimum percentage of columns, from the original alignment, which the user wants to include in the trimmed alignment.