# Distance Based Method (Tree Construction)

# BioCodeKb - Bioinformatics Knowledgebase

The most popular and frequently used methods of tree building can be classified into two major categories: phenetic methods based on distances and cladistic methods based on characters. The former measures the pair-wise distance/dissimilarity between two genes, the actual size of which depends on different definitions, and constructs the tree totally from the resultant distance matrix. The latter evaluate all possible trees and find for the one that optimizes the evolution.

**Distance-Based Methods**

The most popular distance-based methods are the unweighted pair group method with arithmetic mean (UPGMA), neighbor joining (NJ) and those that optimize the additivity of a distance tree (FM and ME).

**Clustering Based Methods**

**UPGMA Method**

This method follows a clustering procedure:

(1) Assume that initially each species is a cluster on its own.

(2) Join closest 2 clusters and recalculate distance of the joint pair by taking the average.

(3) Repeat this process until all species are connected in a single cluster. Strictly speaking, this algorithm is phenetic, which does not aim to show evolutionary descent. It assigns equal weight on the distance and assumes a randomized molecular clock. WPGMA is a similar algorithm but assigns different weight on the distances. UPGMS method is simple, fast.

**Neighbor Joining Method (NJ)**

This algorithm does not make the assumption of molecular clock and adjust for the rate variation among branches. It begins with an unresolved star-like tree. Each pair is evaluated for being joined and the sum of all branches length is calculated of the resultant tree. The pair that yields the smallest sum is considered the closest neighbors and is thus joined .A new branch is inserted between them and the rest of the tree and the branch length is recalculated. This process is repeated until only one terminal is present. NJ method is comparatively rapid and generally gives better results than UPGMA method. But it produces only one tree and neglects other possible trees, which might be as good as NJ trees, if not significantly better. Moreover, since errors in distance estimates are exponentially larger for longer distances, under some condition, this method will yield a biased tree.

**Weighted Neighbor-Joining (Weighbor)**

The Weighbor criterion consists of two terms; an additivity term (of external branches) and a positivity term (of internal branches), that quantifies the implications of joining the pair. Weighbor gives less weight to the longer distances in the distance matrix and the resulting trees are less sensitive to specific biases than NJ and relatively immune to the "long branches attraction/distraction" drawbacks observed with other methods.

**Optimality-Based Methods**

**Fitch-Margoliash (FM) and Minimum Evolution (ME)**

Methods Fitch and Margoliash proposed in 1967 a criteria (FM Method) for fitting trees to distance matrices. This method seeks the least squared fit of all observed pair-wise distances to the expected distance of a tree. The ME method also seeks the tree with the minimum sum of branch lengths. But instead of using all the pair-wise distances as FM, it fixed the internal nodes by using the distance to external nodes and then optimizes the internal branch lengths FM and ME methods perform best in the group of distance-based methods, but they work much more slowly than NJ, which generally yield a very close tee to these methods.