Phylogenetic Tree Construction & Analysis

BioCodeKb - Bioinformatics Knowledgebase

Basic steps in construction of Phylogenetic Trees

Data selection – Amino acid or nucleotide

In the case of a gene phylogeny, we need to decide if we want to work with nucleotide or amino acid data.

We can use either amino acid or nucleotide data to generate a tree.

Some argue that it is better to use amino acid data because the redundancy of the genetic code means we will be able to recover more conserved sites in our alignment. However, any analysis we perform with amino acid data is more time-consuming than its nucleotide counterpart. This is because each site can take 20 possible amino acid states, as opposed to only 4 nucleotide states.

Other scientists prefer to use nucleotide data. As mentioned above, nucleotide analyses are faster. In addition, nucleotide data carry more information about how our sequence has evolved: since 3 nucleotides code for 1 amino acid, changes that leave the amino acid unchanged are still visible at the nucleotide level.
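The trade-off between the two data types can be illustrated with a short Python sketch. The two gene fragments below are hypothetical sequences made up for this example, and the helper names (translate, count_differences) are our own. The sketch translates both fragments with the standard genetic code and counts differences at each level.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino acids."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def count_differences(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical fragments: they differ only by synonymous changes.
gene_a = "ATGGCTCTGAAA"
gene_b = "ATGGCCTTAAAG"

print(translate(gene_a), translate(gene_b))                     # MALK MALK
print(count_differences(gene_a, gene_b))                        # 4
print(count_differences(translate(gene_a), translate(gene_b)))  # 0
```

The four nucleotide differences disappear after translation. This is why amino acid alignments look more conserved, and also why nucleotide data keep signal that amino acid data discard.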


Alignment

Alignment programs shift our data by inserting gaps so that all the homologous (or conserved) sites line up in vertical columns. There are many alignment programs; the most common and well-supported include:



  • Mesquite

It is best to try at least 2 different parameter settings, if not more, and then view our alignments to determine which is better.
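To make the idea of gap insertion and parameter choice concrete, here is a minimal sketch of global pairwise alignment (the Needleman-Wunsch algorithm) in Python. It is not the method used by any particular alignment program, which typically align many sequences at once with more elaborate scoring. The sequences and the scoring values (match, mismatch, gap) are assumptions chosen for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def pair_score(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align two residues
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# Try two gap penalties on the same (hypothetical) pair and compare by eye.
for gap_penalty in (-2, -1):
    top, bottom, total = needleman_wunsch("GATTACA", "GATCA", gap=gap_penalty)
    print(f"gap={gap_penalty} score={total}")
    print(top)
    print(bottom)
```

Running the loop with different gap penalties mirrors the advice above: the same data can align differently under different settings, so it is worth inspecting more than one result.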

Model Selection

Models consist of many parameters that describe the substitution rates of our data. In other words, a model selection program estimates which model best captures the way our data set is evolving or changing. The chosen model is used later to build our tree.
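As a small illustration of what a substitution model does, the sketch below uses the simplest one, Jukes-Cantor (JC69). The text above does not name this model, so treat it as an assumed example. JC69 treats all substitutions as equally likely and corrects the observed proportion of differing sites for multiple changes at the same site. The two sequences are hypothetical.

```python
import math


def p_distance(a, b):
    """Observed proportion of sites that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jc69_distance(p):
    """Jukes-Cantor corrected distance (expected substitutions per site)."""
    if p >= 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical aligned pair: 2 differences over 20 sites.
seq_1 = "ACGTACGTACGTACGTACGT"
seq_2 = "ACGTACGAACGTACGTACCT"

p = p_distance(seq_1, seq_2)
print(f"observed p-distance: {p:.4f}")
print(f"JC69 distance:       {jc69_distance(p):.4f}")
```

The corrected distance is slightly larger than the raw proportion because the model accounts for substitutions that have overwritten earlier ones. Richer models add parameters (for example, unequal base frequencies or different rates for different substitution types), and model selection asks whether those extra parameters are justified by the data.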

Tree building

Maximum likelihood (ML) assumes the best tree is the one under which our observed data are most probable, given a certain model. ML takes into account all the data we have generated so far (the alignment and the selected model) in order to construct our final tree. It is a commonly used tree-building algorithm that will give us a single tree as our output.
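The likelihood principle can be sketched on the smallest possible problem: a single branch joining two sequences. The Python example below assumes the Jukes-Cantor (JC69) model and hypothetical site counts (90 identical sites, 10 differing sites), and finds the branch length that makes those data most probable by a simple grid search. Real ML tree-building programs also search over tree shapes and use far better optimizers, so this is only an illustration of the idea.

```python
import math


def jc69_log_likelihood(d, n_same, n_diff):
    """Log-likelihood of two aligned sequences separated by distance d (JC69)."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay  # probability a site shows the same base
    p_diff = 0.25 - 0.25 * decay  # probability of one specific different base
    # Each site starts from a base drawn with frequency 1/4.
    return (n_same * math.log(0.25 * p_same)
            + n_diff * math.log(0.25 * p_diff))


def ml_distance(n_same, n_diff, max_d=2.0, n_steps=2000):
    """Grid-search the distance that maximizes the JC69 log-likelihood."""
    candidates = [max_d * k / n_steps for k in range(1, n_steps + 1)]
    return max(candidates, key=lambda d: jc69_log_likelihood(d, n_same, n_diff))


# Hypothetical counts from a 100-site pairwise alignment.
n_same, n_diff = 90, 10
best = ml_distance(n_same, n_diff)
analytic = -0.75 * math.log(1.0 - 4.0 * (n_diff / (n_same + n_diff)) / 3.0)

print(f"grid-search ML distance: {best:.3f}")
print(f"analytic JC69 distance:  {analytic:.3f}")
```

The grid search lands on the same value as the closed-form JC69 distance, up to the grid spacing. This shows that "most likely under the model" is a concrete quantity that can be computed and maximized.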

Making it pretty

Once we have created our tree, it is time to make it publication-ready.

If we need to change the taxa names, font, or size, use Adobe Illustrator or a similar image manipulation program. Make sure our taxa names can be clearly read and the bootstrap values are visible above each node.

Not all data will require such robust analysis. But we will not know for certain how much better or different a tree produced from a more robust analysis will be until this analysis is performed.

In general, the output tree of a phylogenetic analysis is an estimate of the character's phylogeny (a gene tree) and not the phylogeny of the taxa (a species tree), though ideally the two should be very close. Gene trees do not necessarily represent the species' evolutionary history accurately: the analysis can be confounded by horizontal gene transfer, hybridization between species, convergent evolution, and conserved sequences. Rates of change also differ within and between sequences:

  • Noncoding regions are more variable than coding regions

  • Some positions in protein-coding genes are more variable than others

  • Some genes evolve faster than others
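The point about variable positions can be checked directly on an alignment. The Python sketch below counts variable columns by codon position. It assumes a gap-free, in-frame coding alignment that starts at the first codon position, and the three sequences are hypothetical.

```python
def variable_sites_by_codon_position(alignment):
    """Count variable columns at codon positions 1, 2 and 3.

    Assumes an in-frame, gap-free alignment starting at codon position 1.
    """
    counts = [0, 0, 0]
    for column_index, column in enumerate(zip(*alignment)):
        if len(set(column)) > 1:          # more than one base in this column
            counts[column_index % 3] += 1
    return counts


# Hypothetical aligned coding sequences.
alignment = [
    "ATGGCTCTGAAA",
    "ATGGCCTTAAAG",
    "ATGGCACTTAAA",
]

first, second, third = variable_sites_by_codon_position(alignment)
print(f"variable sites at codon positions 1/2/3: {first}/{second}/{third}")
# variable sites at codon positions 1/2/3: 1/0/3
```

In this toy alignment most of the variation sits at third codon positions, where changes often leave the amino acid unchanged.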

BioinfoLytics Company

Our company, BioinfoLytics, is a project affiliated with BioCode. We cover many topics in genomics and proteomics and their analysis with a range of tools, including Sequence Alignment & Analysis, Bioinformatics Scripting & Software Development, Phylogenetic and Phylogenomic Analysis, Functional Analysis, Biological Data Analysis & Visualization, Custom Analysis, Biological Database Analysis, Molecular Docking, Protein Structure Prediction, and Molecular Dynamics. These services are offered to BioCode users who want to meet their research requirements and obtain their desired results. The platform offers an opportunity to learn, to have research projects analysed, and to get help and knowledge in molecular, computational, and analytical biology.

We provide a “Phylogenetic Tree Construction and Analysis” service for researchers who want to infer the evolutionary relationships of species and to build genome-wide phylogenetic trees for large groups of species, using large numbers of genes with long nucleotide sequences.



© Copyright 2020 BioCode Ltd. - All rights reserved.