Clustering Based Method (Tree Construction)

BioCodeKb - Bioinformatics Knowledgebase

The most popular and frequently used methods of tree building can be classified into two major categories: phenetic methods based on distances and cladistic methods based on characters. The former measures the pair-wise distance/dissimilarity between two genes, the actual size of which depends on different definitions, and constructs the tree totally from the resultant distance matrix. The latter evaluate all possible trees and seek for the one that optimizes the evolution.

The construction of graphical phylogenetic tree reveals similarity as well as dissimilarity among organisms [4]. To make a cluster or group of similar organism in the tree, various clustering methods are applied. Such observations also make the appearance of organism in relation to time.

UPGMA Method

This method follows a clustering procedure:

(1) Assume that initially each species is a cluster on its own.

(2) Join closest 2 clusters and recalculate distance of the joint pair by taking the average.

(3) Repeat this process until all species are connected in a single cluster. Strictly speaking, this algorithm is phenetic, which does not aim to show evolutionary descent. It assigns equal weight on the distance and assumes a randomized molecular clock. WPGMA is a similar algorithm but assigns different weight on the distances. UPGMS method is simple, fast.

The basic assumption of the UPGMA method is that all taxa evolve at a constant rate and that they are equally distant from the root, implying that a molecular clock is in effect. However, real data rarely meet this assumption. Thus, UPGMA often produces erroneous tree topologies. However, owing to its fast speed of calculation, it has found extensive usage in clustering analysis of DNA microarray data.

Neighbor Joining Method (NJ)

The neighborjoining (NJ) method can be used, which is somewhat similar to UPGMA in that it builds a tree by using stepwise reduced distance matrices. However, the NJ method does not assume the taxa to be equidistant from the root. It corrects for unequal evolutionary rates between sequences by using a conversion step. This conversion requires the calculations of “r-values”  and “transformedr-values”.

This algorithm does not make the assumption of molecular clock and adjust for the rate variation among branches. It begins with an unresolved star-like tree. Each pair is evaluated for being joined and the sum of all branches length is calculated of the resultant tree. The pair that yields the smallest sum is considered the closest neighbors and is thus joined .A new branch is inserted between them and the rest of the tree and the branch length is recalculated. This process is repeated until only one terminal is present. NJ method is comparatively rapid and generally gives better results than UPGMA method. But it produces only one tree and neglects other possible trees, which might be as good as NJ trees, if not significantly better. Moreover, since errors in distance estimates are exponentially larger for longer distances, under some condition, this method will yield a biased tree.

Generalized Neighbor Joining

One of the disadvantages of the NJ method is that it generates only one tree and does not test other possible tree topologies. To overcome the limitation, a generalized NJ method has been developed, in which multiple NJ trees with different initial taxon groupings are generated. A best tree is then selected from a pool of regular NJ trees that best fit the actual evolutionary distances.

Need to learn more about Clustering Based Method (Tree Construction) and much more?

To learn Bioinformatics, analysis, tools, biological databases, Computational Biology, Bioinformatics Programming in Python & R through interactive video courses and tutorials, Join BioCode.

Get in touch with us

Tel: +92 314 7785980


  • Black Instagram Icon
  • Facebook

© Copyright 2020 BioCode Ltd. - All rights reserved.