# Neighbor Joining Method (NJ)

# BioCodeKb - Bioinformatics Knowledgebase

The neighbor joining (NJ) method can be used, which is somewhat similar to UPGMA in that it builds a tree by using stepwise reduced distance matrices. However, the NJ method does not assume the taxa to be equidistant from the root. It corrects for unequal evolutionary rates between sequences by using a conversion step. This conversion requires the calculations of “r-values” and “transformed r-values”.

This algorithm does not make the assumption of molecular clock and adjust for the rate variation among branches. It begins with an unresolved star-like tree. Each pair is evaluated for being joined and the sum of all branches length is calculated of the resultant tree. The pair that yields the smallest sum is considered the closest neighbors and is thus joined .A new branch is inserted between them and the rest of the tree and the branch length is recalculated. This process is repeated until only one terminal is present. NJ method is comparatively rapid and generally gives better results than UPGMA method. But it produces only one tree and neglects other possible trees, which might be as good as NJ trees, if not significantly better. Moreover, since errors in distance estimates are exponentially larger for longer distances, under some condition, this method will yield a biased tree.

**Algorithm**

Neighbor-joining is a recursive algorithm. Each step in the recursion consists of the following steps:

Based on the current distance matrix calculate a modified distance matrix Q.

Find the least distant pair of nodes in Q (= the closest neighbors = the pair with the lowest distance value). Create a new node on the tree joining the two closest nodes: the two nodes are linked by their common ancestral node.

Calculate the distance of each of the nodes in the pair to their ancestral node.

Calculate the distance of all nodes outside of this pair to their ancestral node.

Start the algorithm again, considering the pair of joined neighbors as a single taxon (the terminal nodes are replaced by their ancestral node and the ancestral node is then treated as a terminal node) and using the distances calculated in the previous step.

**Negative branch lengths**

NJ represents the data in an additive tree. Therefore it can assign negative branch lengths. Usually, branch lengths can be interpreted as an estimate for the substitutions. However, here we have difficulties in doing so.

If this occurs one can set negative branch length to zero and transfer the difference to the adjacent branch length so that the total distance between an adjacent pair of terminal nodes remains unaffected. This does not alter the overall topology of the tree.

**Generalized Neighbor Joining**

One of the disadvantages of the NJ method is that it generates only one tree and does not test other possible tree topologies. To overcome the limitation, a generalized NJ method has been developed, in which multiple NJ trees with different initial taxon groupings are generated. A best tree is then selected from a pool of regular NJ trees that best ﬁt the actual evolutionary distances.