Dynamic programming is used for optimal alignment of two sequences. It finds the alignment in a more quantitative way by giving some scores for matches and mismatches (Scoring matrices), rather than only applying dots. It is fundamentally similar to the dot matrix method in that it also creates a two dimensional alignment grid. By searching the highest scores in the matrix, alignment can be accurately obtained. The Dynamic Programming solves the original problem by dividing the problem into smaller independent sub problems. These techniques are used in many different aspects of computer science. Needleman-Wunsch and Smith-Waterman algorithms for sequence alignment are defined by dynamic programming approach.
Smith Waterman algorithm was first proposed by Temple F. Smith and Michael S. Waterman in 1981. The algorithm explains the local sequence alignment, it gives conserved regions between the two sequences, and one can align two partially overlapping sequences, also it’s possible to align the subsequence of the sequence to itself. These are the main advantages of Local Sequence Alignment.
This algorithm mainly differs within two aspects from the Needleman-Wunsch algorithm. First aspect being, local alignment differs by having only a negative score for the mismatch and when the matrix value becomes negative it has to be set to zero. They are also predefined scoring matrices for nucleotide or protein sequences.
Scoring matrices:
Mainly used predefined matrices are PAM and BLOSUM.
PAM Matrices: Margaret Dayhoff was the first one to develop the PAM matrix, PAM stands for Point Accepted Mutations. PAM matrices are calculated by observing the differences in closely related proteins. One PAM unit (PAM1) specifies one accepted point mutation per 100 amino acid residues, i.e. 1% change and 99% remains as such.
BLOSUM: BLOcks SUbstitution Matrix, developed by Henikoff and Henikoff in 1992, used conserved regions. These matrices are actual percentage identity values. Simply to say, they depend on similarity. Blosum 62 means there is 62 % similarity.
Gap score or gap penalty: Dynamic programming algorithms use gap penalties to maximize the biological meaning. Gap penalty is subtracted for each gap that has been introduced. There are different gap penalties such as gap open and gap extension. The gap score defines a penalty given to alignment when we have insertion or deletion. Gap open and gap extension has been introduced when there are continuous gaps (five or more). The open penalty is always applied at the start of the gap, and then the other gaps following it is given with a gap extension penalty which will be less compared to the open penalty.
Steps in Dynamic programing are;
1.Initialization of the matrix with the scores possible
2.Matrix filling
3.Trace back the residues for appropriate alignment
Initialization Step
First row and first column of the matrix can be initially filled with 0. If the gap score is assumed, the gap score can be added to the previous cell of the row or column.
Matrix Fill Step
It starts from the upper left hand corner of the matrix. To find the maximum score of each cell, it is required to know the neighbouring scores (diagonal, left and right) of the current position. From the assumed values, add the match or mismatch (assumed) score to the diagonal value. Similarly add the gap score to the other neighbouring values. Thus, we can obtain three different values, from that take the maximum among them and fill the ith and jth position with the score obtained.
Trace back Step
The final step for the appropriate alignment is trace backing, firstly, one need to find out the maximum score obtained in the entire matrix for the local alignment of the sequences. It is possible that the maximum scores can be present in more than one cell, in that case there may be possibility of two or more alignments, and the best alignment by scoring it (match =5, mismatch=-1, gap=-2) which may be user defined. By continuing the trace back step by this method, one would reach to the 0th row, 0th column.