
Pairwise Distance Analysis

BioCodeKb - Bioinformatics Knowledgebase

Distance analysis compares two aligned sequences at a time and builds a matrix covering all possible sequence pairs. The resulting estimates of the difference between each pair of sequences are known as pairwise distances. Pairwise methods evaluate all pairs of sequences and transform the differences between them into a distance. This is essentially a data reduction: a difference that may span many sites and character states is collapsed into a single number. Combining these distances to estimate a tree is therefore expected to be less powerful than the full likelihood approach, which uses all of the data.
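Comparing every pair of aligned sequences and filling a symmetric matrix can be sketched in a few lines of Python. This is a minimal sketch, assuming gap-free aligned sequences of equal length and using the uncorrected p-distance (the proportion of differing sites) as the distance; the names `p_distance` and `distance_matrix` are hypothetical and not taken from any particular package.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ.

    Assumes equal-length aligned sequences with no gaps or ambiguity codes.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def distance_matrix(seqs: dict[str, str]) -> tuple[list[str], list[list[float]]]:
    """Compare every unordered pair once and fill a symmetric matrix."""
    names = list(seqs)
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = p_distance(seqs[names[i]], seqs[names[j]])
        matrix[i][j] = matrix[j][i] = d  # distance is symmetric
    return names, matrix


if __name__ == "__main__":
    alignment = {
        "seqA": "ACGTACGTAC",
        "seqB": "ACGTACGTTC",
        "seqC": "ACGAACGTTA",
    }
    names, m = distance_matrix(alignment)
    for name, row in zip(names, m):
        print(name, [round(x, 2) for x in row])
```

For these toy sequences the matrix holds 0.1 between seqA and seqB, 0.3 between seqA and seqC, and 0.2 between seqB and seqC. A real analysis would usually replace the raw p-distance with a model-corrected distance.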


The phylogenetic Mean Pairwise Distance (MPD) is one of the most popular measures of phylogenetic distance within a given group of species: it is the average phylogenetic distance over all pairs of species in the group. Like other phylogenetic measures, MPD is used as a tool for deciding whether the species of a given group are closely related.
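The MPD calculation itself is simple arithmetic: average the phylogenetic distances over all distinct pairs of species in the group. The sketch below assumes those distances (for example, path lengths between tips of a tree) have already been computed and stored in a dictionary; `mean_pairwise_distance` and the example values are hypothetical and only illustrate the arithmetic.

```python
from itertools import combinations


def mean_pairwise_distance(species, dist):
    """Average phylogenetic distance over all unordered pairs of distinct species.

    species: the group being assessed.
    dist: precomputed distances keyed by species pairs (either order).
    """
    pairs = list(combinations(species, 2))
    if not pairs:
        raise ValueError("MPD needs at least two species")
    total = 0.0
    for a, b in pairs:
        # Accept either key order, since distances are symmetric.
        total += dist[(a, b)] if (a, b) in dist else dist[(b, a)]
    return total / len(pairs)


if __name__ == "__main__":
    # Hypothetical path distances on the tree ((A:1,B:1):2,(C:1.5,D:1.5):1.5).
    dist = {("A", "B"): 2.0, ("A", "C"): 6.0, ("A", "D"): 6.0,
            ("B", "C"): 6.0, ("B", "D"): 6.0, ("C", "D"): 3.0}
    print(mean_pairwise_distance(["A", "B"], dist))       # 2.0
    print(mean_pairwise_distance(["A", "B", "C"], dist))  # 4.666666666666667
```

The closely related pair A and B has a small MPD, while adding the more distant species C raises the group's MPD to 14/3.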


Given a measure of the distance between each pair of species, a simple approach to the phylogeny problem is to find a tree that predicts the observed set of distances as closely as possible. This leaves out some of the information in the data matrix, reducing it to a simple table of pairwise distances. However, in many cases most of the evolutionary information appears to be conveyed by these distances.

When we infer a tree from genetic data using parsimony, we minimize the amount of change along the branches of the tree. When we use the likelihood principle, we instead evaluate the changes conditional on a specific mutation model, and this model is crucial. It can take into account that we do not observe all substitution events, because recent substitutions at a site can hide earlier ones. Parsimony therefore undercounts the number of changes and may produce a tree that is shorter than the true tree. Likelihood does not escape this problem either: the estimated tree is shorter than or the same length as the true tree.

An alternative to likelihood or parsimony is an approach based on evolutionary distances between pairs of sequences, where each distance accounts for unseen events, for example by using the same kinds of mutation models as likelihood. Pairwise distance methods are less popular today because likelihood methods generally outperform them, partly because of the data reduction described above. In addition, an identical distance can be generated from different sequence pairs, and once we analyze only the distance matrix, that difference is lost. Using the number of differing sites as the distance makes this clear: AAAA versus AACC and AAAA versus CCAA both differ at two sites and receive the same distance, although different sites are involved.
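Both points above, correcting for unseen substitutions and losing information when different pairs map to the same number, can be illustrated with a short Python sketch. It assumes the Jukes-Cantor (JC69) model as the example correction (the text does not name a specific model): the observed proportion of differing sites p is converted to an estimate of substitutions per site, d = -3/4 ln(1 - 4p/3). The function names are hypothetical.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites that differ (gap-free sequences assumed)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jc69_distance(p: float) -> float:
    """Jukes-Cantor corrected distance (expected substitutions per site).

    The corrected value grows faster than p, accounting for substitutions
    hidden by later ones at the same site. Undefined when p >= 0.75.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 distance requires 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Two different pairs that each differ at two sites, at different positions.
    pair1 = ("AAAAAAAAAA", "AAAAAAAACC")
    pair2 = ("ACGTACGTAC", "TCGTACGTAG")
    for a, b in (pair1, pair2):
        p = p_distance(a, b)
        print(f"p = {p:.2f}, JC69 d = {jc69_distance(p):.4f}")
```

Both pairs differ at 2 of 10 sites, so they receive the same p-distance (0.20) and the same corrected distance (about 0.233 substitutions per site) even though the differences fall at different positions. Once only the matrix is kept, that distinction is gone.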
Distance methods still have their merits: once the distance matrix has been calculated, tree building can be very fast, and in many circumstances the trees generated by such methods are reasonably good and often identical to the likelihood tree.
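To show how quickly a tree can be assembled once the matrix exists, here is a minimal sketch of UPGMA (average-linkage clustering) in Python. UPGMA is our illustrative choice, since the text does not name a particular tree-building algorithm; it assumes a constant rate of evolution (a molecular clock), which is why neighbor-joining is often preferred in practice. The function `upgma` and its input format are hypothetical, and this naive version is not optimized.

```python
def upgma(names, matrix):
    """Build a rooted tree by UPGMA and return it as a Newick string.

    names: tip labels; matrix: symmetric list of lists of pairwise distances.
    Produces an ultrametric tree, i.e. it assumes a molecular clock.
    """
    if not names:
        raise ValueError("need at least one taxon")
    # Active clusters: id -> (newick text, number of tips, height of the node).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller id, larger id).
    dist = {(i, j): matrix[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Join the closest pair of clusters.
        (i, j), d_ij = min(dist.items(), key=lambda item: item[1])
        newick_i, size_i, height_i = clusters.pop(i)
        newick_j, size_j, height_j = clusters.pop(j)
        height = d_ij / 2.0
        merged = (f"({newick_i}:{height - height_i:g},"
                  f"{newick_j}:{height - height_j:g})")
        # New distance = tip-count-weighted average of the two old distances.
        new_dist = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            new_dist[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        # Drop entries that involve the two joined clusters, then add the new ones.
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        dist.update(new_dist)
        clusters[next_id] = (merged, size_i + size_j, height)
        next_id += 1
    (newick, _, _), = clusters.values()
    return newick + ";"


if __name__ == "__main__":
    m = [[0, 2, 6, 6], [2, 0, 6, 6], [6, 6, 0, 3], [6, 6, 3, 0]]
    print(upgma(["A", "B", "C", "D"], m))  # ((A:1,B:1):2,(C:1.5,D:1.5):1.5);
```

Each step only scans and updates the current matrix, so the whole tree is built without any search over tree space, which is the source of the speed mentioned above.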


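If we could estimate branch lengths on a tree with absolute certainty, all distances on the tree would be additive: the distance between any two tips would equal the sum of the branch lengths along the path connecting them. Ultrametric trees have this additive property and, in addition, satisfy the ultrametric property: every tip is the same distance from the root, so for any three tips the two largest pairwise distances are equal. Least-squares distance methods score a tree by comparing each observed distance D_ij with the corresponding distance d_ij along the tree, summing w_ij |D_ij - d_ij|^α over all pairs, with weights w_ij and exponent α. Minimum evolution sets all weights to 1 and α = 2 (ordinary least squares) to estimate the branch lengths, and then chooses the tree whose total branch length is smallest, rather than minimizing each branch individually.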
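The additive and ultrametric properties, and a least-squares score with weights and an exponent α, can all be checked directly on distance matrices. This is a minimal Python sketch with hypothetical helper names. Additivity is tested with the four-point condition and ultrametricity with the three-point condition, the standard tests for distances that fit a tree exactly. The score assumes the general weighted form Q = sum over pairs i < j of w_ij |D_ij - d_ij|^alpha; that exact form is our assumption about the criterion the paragraph refers to.

```python
from itertools import combinations


def _two_largest_equal(values, tol):
    """True if the two largest numbers in `values` agree within `tol`."""
    largest, second, *_ = sorted(values, reverse=True)
    return abs(largest - second) <= tol


def is_additive(matrix, tol=1e-9):
    """Four-point condition: for every quartet i, j, k, l, the two largest of
    D_ij + D_kl, D_ik + D_jl and D_il + D_jk are equal."""
    for i, j, k, l in combinations(range(len(matrix)), 4):
        sums = (matrix[i][j] + matrix[k][l],
                matrix[i][k] + matrix[j][l],
                matrix[i][l] + matrix[j][k])
        if not _two_largest_equal(sums, tol):
            return False
    return True


def is_ultrametric(matrix, tol=1e-9):
    """Three-point condition: for every triplet, the two largest distances are equal."""
    for i, j, k in combinations(range(len(matrix)), 3):
        if not _two_largest_equal((matrix[i][j], matrix[i][k], matrix[j][k]), tol):
            return False
    return True


def least_squares_score(observed, fitted, weights=None, alpha=2.0):
    """Q = sum over pairs i < j of w_ij * |D_ij - d_ij| ** alpha (assumed form).

    observed: D_ij from the data; fitted: d_ij along a candidate tree.
    weights=None means all w_ij = 1; with alpha = 2 that is ordinary least squares.
    """
    total = 0.0
    for i, j in combinations(range(len(observed)), 2):
        w = 1.0 if weights is None else weights[i][j]
        total += w * abs(observed[i][j] - fitted[i][j]) ** alpha
    return total


if __name__ == "__main__":
    # Path distances on an unrooted tree: additive but not ultrametric.
    tree_d = [[0, 3, 5, 3], [3, 0, 6, 4], [5, 6, 0, 4], [3, 4, 4, 0]]
    print(is_additive(tree_d), is_ultrametric(tree_d))  # True False
```

The example matrix comes from an unrooted tree, so it passes the four-point test, but in the triple of the first three tips the two largest distances (6 and 5) differ, so it is not ultrametric.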


BioinfoLytics Company

Our company, BioinfoLytics, is affiliated with BioCode. Through this project we cover many topics, including Genomics and Proteomics and their analysis with a range of tools in an improved and advanced way, Sequence Alignment & Analysis, Bioinformatics Scripting & Software Development, Phylogenetic and Phylogenomic Analysis, Functional Analysis, Biological Data Analysis & Visualization, Custom Analysis, Biological Database Analysis, Molecular Docking, Protein Structure Prediction and Molecular Dynamics, so that BioCode users can further develop their interests, take part in these services, meet their requirements and obtain the results they need. We provide a platform where one can learn, get help with research project analysis, and gain broad knowledge of molecular, computational and analytical biology.

We provide the “Pairwise Distance Analysis” service to the bioinformatics community, applying our expertise in phylogeny and in non-parametric distance methods to work efficiently in the field of phylogenetics.


Need to learn more about Pairwise Distance Analysis and much more?

To learn bioinformatics, analysis, tools, biological databases, computational biology, and bioinformatics programming in Python & R through interactive video courses and tutorials, join BioCode.
