
Pairwise Sequence Alignment

BioCodeKb - Bioinformatics Knowledgebase

Pairwise sequence alignment is used to identify regions of similarity that may indicate functional, structural or evolutionary relationships between two biological sequences (protein or nucleic acid). A pairwise alignment compares only two sequences at a time, but it is efficient to calculate and is often used in methods that do not require extreme precision (such as searching a database for sequences with high similarity to a query). The three primary methods of producing pairwise alignments are dot-matrix methods, dynamic programming (the Needleman-Wunsch and Smith-Waterman algorithms) and word methods. Although each method has its individual strengths and weaknesses, all three have difficulty with highly repetitive sequences of low information content, especially where the number of repetitions differs between the two sequences to be aligned. One way of quantifying the utility of a given pairwise alignment is the 'maximum unique match' (MUM), or the longest subsequence that occurs in both query sequences. Longer MUM sequences typically reflect closer relatedness.
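As a rough illustration of the dot-matrix idea, the sketch below marks every pair of positions where two sequences share an identical window of residues. The function names, the window parameter and the example sequences are our own assumptions for illustration, not part of any particular tool.

```python
def dot_matrix(seq_a, seq_b, window=1):
    """Return a grid where cell [i][j] is True when the windows of length
    `window` starting at seq_a[i] and seq_b[j] are identical."""
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    return [
        [seq_a[i:i + window] == seq_b[j:j + window] for j in range(cols)]
        for i in range(rows)
    ]


def render(grid):
    """Draw the grid as text: '*' for a matching window, '.' otherwise."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in grid)


# Hypothetical example sequences; similar regions show up as diagonal runs.
print(render(dot_matrix("GATTACA", "GATCACA", window=2)))
```

A larger window suppresses chance matches at the cost of hiding short regions of similarity, which is why repetitive, low-complexity sequences produce cluttered plots.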


Local alignment is an alignment of two sub-regions of a pair of sequences. It is appropriate when we are aligning two segments of genomic DNA that may have local regions of similarity embedded in a background of non-homologous sequence. Global alignment, in contrast, is a sequence alignment over the entire length of two or more nucleic acid or protein sequences. In a global alignment, the sequences are considered to be homologous along their entire length.
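The difference between the two modes can be sketched with two small dynamic-programming routines that return only the optimal score. This is a minimal sketch assuming a simple match/mismatch score and a linear gap penalty; the parameter values and function names are illustrative assumptions.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch style score: both sequences are aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman style score: the best-scoring pair of sub-regions."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # A cell never drops below 0, so an alignment can restart anywhere.
            cell = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Hypothetical sequences: a shared core embedded in unrelated flanks.
a, b = "TTTTGATTACATTTT", "CCCGATTACACCC"
print("global:", global_score(a, b), "local:", local_score(a, b))
```

The local score reflects only the shared core, while the global score is pulled down by the unrelated flanking residues.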


In order to align a pair of sequences, a scoring system is required to score matches and mismatches. The scoring system can be as simple as “+1” for a match and “-1” for a mismatch between the pair of sequences at any given site of comparison. However, substitutions, insertions and deletions occur at different rates over evolutionary time. This variation in rates is the result of a large number of factors, including the mutation process, genetic drift and natural selection. For protein sequences, the relative rates of different substitutions can be determined empirically by comparing a large number of related sequences. These empirical measurements can then form the basis of a scoring system for aligning subsequent sequences. Popular matrices used for protein alignments are the BLOSUM and PAM matrices.


The score of a pairwise alignment is:

matchCount × matchCost + mismatchCount × mismatchCost

For each gap of length n, a penalty of gapOpenPenalty + (n - 1) × gapExtensionPenalty is subtracted from this.

Where

  • gapOpenPenalty = The “gap open penalty” setting in Geneious.

  • gapExtensionPenalty = The “gap extension penalty” setting in Geneious.

  • matchCost = The first number in the Geneious cost matrix.

  • mismatchCost = The second number in the Geneious cost matrix.

  • matchCount = The number of matching residues in the alignment.

  • mismatchCount = The number of mismatched residues in the alignment.
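The formula above can be sketched as a small function that scores an alignment that has already been computed. The two rows are assumed to be equal-length strings with "-" marking gaps, and the numeric values in the example are illustrative only, not a claim about any program's defaults.

```python
def alignment_score(row_a, row_b, match_cost, mismatch_cost,
                    gap_open_penalty, gap_extension_penalty):
    """Score a finished pairwise alignment given as two gapped rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    gap_side = None  # which row holds the gap we are currently inside, if any
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot have a gap in both rows")
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            # The first gap position costs the open penalty, each later one the
            # extension penalty: gapOpenPenalty + (n - 1) * gapExtensionPenalty.
            if side == gap_side:
                score -= gap_extension_penalty
            else:
                score -= gap_open_penalty
            gap_side = side
        else:
            score += match_cost if x == y else mismatch_cost
            gap_side = None
    return score


# 5 matches * 5 = 25, minus one gap of length 2 (12 + 3 = 15), giving 10.
print(alignment_score("GATTACA", "GA--ACA", 5, -4, 12, 3))
```

Note that mismatchCost is normally negative, so mismatches lower the total even though they are added in the formula.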


When doing a Global alignment with free end gaps, gaps at either end of the alignment are not penalized when determining the optimal alignment. This is especially useful if we are aligning sequence fragments that overlap slightly in their starting and ending positions, e.g. when using two slightly different primer pairs to extract related sequence fragments from different samples. We can also do a Local Alignment if we want to allow free end overlaps, rather than just free end gaps in one alignment.
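A minimal sketch of how free end gaps change the dynamic programme: the first row and column are initialised to zero so leading gaps cost nothing, and the final score is taken from the best cell in the last row or last column so trailing gaps cost nothing either. The linear gap penalty, the scoring values and the function name are assumptions for illustration.

```python
def free_end_gap_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global score when gaps at either end of the alignment are free."""
    rows, cols = len(a) + 1, len(b) + 1
    # Row 0 and column 0 stay at 0: leading end gaps are not penalised.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # internal gap in b
                score[i][j - 1] + gap,      # internal gap in a
            )
    # Trailing end gaps are free: stop anywhere in the last row or last column.
    last_row_best = max(score[-1])
    last_col_best = max(row[-1] for row in score)
    return max(last_row_best, last_col_best)


# Hypothetical fragments that overlap only in the shared GATTACA region.
print(free_end_gap_score("AAAAGATTACA", "GATTACATTTT"))
```

Internal gaps are still penalised, which is what distinguishes this from a local alignment.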


If we are aligning nucleotide sequences, we will also have the option of doing our alignment by translation and back. To view the options for translation alignment, click the ‘More Options’ button at the bottom of the alignment dialog. The translation alignment options will appear. We can set the genetic code and translation frame for the translation, as well as the cost matrix, gap open penalty and gap extension penalty for the alignment. If we want to set the alignment type (global or local) or choose to automatically determine the sequences’ direction, we do so in the main section of the dialog.
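The idea behind aligning by translation and back can be sketched in two steps: translate each nucleotide sequence, align the proteins, then thread the original codons back onto the gapped protein rows. The sketch below assumes the standard genetic code and takes the protein alignment as given; the function names are our own and do not describe how any particular program implements this.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nucleotides, frame=0):
    """Translate in the given frame (0, 1 or 2); unknown codons become 'X'."""
    seq = nucleotides.upper()[frame:]
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def thread_codons(aligned_protein, nucleotides, frame=0):
    """Map a gapped protein row back to nucleotides, one codon per residue."""
    seq = nucleotides.upper()[frame:]
    out, pos = [], 0
    for residue in aligned_protein:
        if residue == "-":
            out.append("---")          # a protein gap becomes a three-base gap
        else:
            out.append(seq[pos:pos + 3])
            pos += 3
    return "".join(out)


# Hypothetical input: the protein alignment "MAK" / "M-K" is supplied by hand.
nuc_a, nuc_b = "ATGGCCAAA", "ATGAAA"
print(translate(nuc_a), translate(nuc_b))
print(thread_codons("MAK", nuc_a))
print(thread_codons("M-K", nuc_b))
```

Because gaps are inserted in multiples of three bases, the resulting nucleotide alignment keeps the reading frame intact.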


The most similar third of the alignment pairs have sequence similarity expectation values E() < 10^-10, with an average of 48% identity. The intermediate and most distantly related sequences have 10^-10 < E() < 10^-5 (26.9% identity) and E() > 10^-5 (22.6% identity), respectively.


We need a way to estimate the statistical significance of a given alignment score. For global alignments, there is no adequate theory to predict the distribution of alignment scores from randomly generated sequences. Instead, one can generate scores empirically by aligning sequences that have been randomly shuffled many times. If 100 such shuffles all produce alignment scores that are lower than the observed alignment score, then one can say that the p-value is likely to be less than 0.01.
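The shuffling procedure can be sketched as follows. The compact global scorer, the scoring values, the fixed random seed and the example sequence are all illustrative assumptions. The add-one form of the estimate is our own choice so that the reported p-value is never exactly zero; with 100 shuffles and no score reaching the observed one it gives 1/101, just under 0.01.

```python
import random


def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Compact global (Needleman-Wunsch style) score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def shuffle_p_value(a, b, score_fn=global_score, shuffles=100, seed=0):
    """Estimate how often a shuffled copy of b scores at least as well as b."""
    rng = random.Random(seed)      # fixed seed so the sketch is reproducible
    observed = score_fn(a, b)
    letters = list(b)
    at_least_as_good = 0
    for _ in range(shuffles):
        rng.shuffle(letters)       # keeps composition, destroys residue order
        if score_fn(a, "".join(letters)) >= observed:
            at_least_as_good += 1
    # Add-one estimate (an assumption of this sketch, see the lead-in).
    return observed, (at_least_as_good + 1) / (shuffles + 1)


# Hypothetical protein-like sequence aligned against itself.
seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
print(shuffle_p_value(seq, seq))
```

Shuffling preserves the length and composition of the sequence, so the null distribution reflects the same residue frequencies as the real comparison.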


For local alignments, probability theory predicts that randomly shuffled sequences will produce alignment scores with an extreme value (type I maximum) distribution.
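For a local alignment, the extreme value distribution is commonly written in the Karlin-Altschul form, where the expected number of chance alignments scoring at least S between sequences of lengths m and n is E = K × m × n × e^(-λS), and the p-value is 1 - e^(-E). The sketch below assumes that form. The parameters λ (lam) and K (k) depend on the scoring system, and the values used in the example are placeholders rather than fitted constants.

```python
import math


def expected_chance_hits(score, m, n, lam, k):
    """E-value: expected number of chance local alignments scoring >= score."""
    return k * m * n * math.exp(-lam * score)


def extreme_value_p(score, m, n, lam, k):
    """P-value: probability of at least one chance alignment scoring >= score."""
    e_value = expected_chance_hits(score, m, n, lam, k)
    # -expm1(-E) equals 1 - exp(-E) and stays accurate when E is tiny.
    return -math.expm1(-e_value)


# Placeholder parameters (lam=0.3, k=0.1) for two sequences of length 300.
for s in (30, 50, 70):
    print(s, expected_chance_hits(s, 300, 300, 0.3, 0.1),
          extreme_value_p(s, 300, 300, 0.3, 0.1))
```

For small E the p-value and the E-value are nearly equal, which is why database search tools usually report E-values directly.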


Tools

  • Needle (EMBOSS)

  • Stretcher (EMBOSS)

  • Water (EMBOSS)

  • Matcher (EMBOSS)

  • LALIGN

  • SIM

  • ALION


BioinfoLytics Company

Our company, BioinfoLytics, is affiliated with BioCode. It is a project covering many topics in genomics and proteomics and their analysis with a range of tools, including Sequence Alignment & Analysis, Bioinformatics Scripting & Software Development, Phylogenetic and Phylogenomic Analysis, Functional Analysis, Biological Data Analysis & Visualization, Custom Analysis, Biological Database Analysis, Molecular Docking, Protein Structure Prediction and Molecular Dynamics. These services are offered so that BioCode users can develop their interests, meet their requirements and obtain their desired results. We provide a platform where one can learn, have research projects analysed, and get help and knowledge in molecular, computational and analytical biology.


We provide a “Pairwise Sequence Alignments” service to our customers for studying conservation or differences between two protein or DNA sequences, with the aim of supporting high-quality research and advancing science in the domain of Sequence Alignment & Analysis.


