A Pairwise Algorithm is an algorithmic technique with its origins in Dynamic programming. Pairwise algorithms have many uses including comparing a protein profile (a residue scoring matrix for one or more aligned sequences) against the three translation frames of a DNA strand, allowing frame shifting, most remarkable feature of Pairwise.
The Pairwise algorithm is a variant of the Smith–Waterman algorithm best local alignment algorithm. These algorithms all belong to the class known as minimal string edit algorithms. The main differences between Pairwise and other alignment algorithm is that, besides normal penalties such as Gap Opening Penalty (GOP), Gap Extension Penalty (GEP) and Match, Pairwise introduced two new penalties called Frame Opening Penalty (FOP) and Frame Extension Penalty (FEP), which will be incurred when a frame shift is accepted and extended respectively.
Pairwise sequence alignment is one of the most fundamental tools for comparing DNA and protein sequences. It establishes the basis for the interpretation of evolutionary and functional relationships between gene sequences and species. Through their seminal work in 1970, Needleman and Wunsch (1970) provided the catalyst to certain other profound contributions to sequence alignment. To date, the most successful and widely understood algorithm is the Smith–Waterman technique (Smith and Waterman, 1981), which can detect similar segments in genomic sequences with maximal score if the compared sequences are closely related. More recently, there have been a number of modifications to the Smith–Waterman technique as well as novel approaches that are based on structural information, profile information and evolutionary models of insertions and deletions.
Dot plot method
A dot plot is a simple, yet intuitive way of comparing two sequences, either DNA or protein, and is probably the oldest way of comparing two sequences.
Dot plot are two dimensional graphs, showing a comparison of two sequences. The principle used to produce the dot plot is: The top X and the left y axes of a rectangular array are used to represent the two sequences to be compared.
Calculation: Matrix
• Columns = residues of sequence 1
• Rows = residues of sequence 2.
A dot is plotted at every co-ordinate where there is similarity between the bases.
Any region of similarity is revealed by a diagonal row of dots. Isolated dots not on diagonal show random matches.
Detection of matching regions can be improved by filtering out random matches and this can be achieved by using a sliding window. It means that instead of comparing a single sequence position more positions is compared at the same time and dot is printed only if a certain minimal number of matches occur.
Steps in Dynamic programing are;
Initialization Step
First row and first column of the matrix can be initially filled with 0. If the gap score is assumed, the gap score can be added to the previous cell of the row or column.
Matrix Fill Step
It starts from the upper left hand corner of the matrix. To find the maximum score of each cell, it is required to know the neighbouring scores (diagonal, left and right) of the current position. From the assumed values, add the match or mismatch (assumed) score to the diagonal value. Similarly add the gap score to the other neighbouring values. Thus, we can obtain three different values, from that take the maximum among them and fill the ith and jth position with the score obtained.
Trace back Step
The final step for the appropriate alignment is trace backing, firstly, one need to find out the maximum score obtained in the entire matrix for the local alignment of the sequences. It is possible that the maximum scores can be present in more than one cell, in that case there may be possibility of two or more alignments, and the best alignment by scoring it (match =5, mismatch=-1, gap=-2) which may be user defined. By continuing the trace back step by this method, one would reach to the 0th row, 0th column.