
Sequence Variation

BioCodeKb - Bioinformatics Knowledgebase

Genetic variation, or sequence variation, refers to the differences in DNA sequence between individual genomes. The most common type is the single nucleotide polymorphism (SNP): each SNP is a difference in a single DNA base (A, C, G or T) in a person's DNA. On average, SNPs occur about once in every 300 bases and are often found in the DNA between genes. Genetic variation arises from several sources:
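To make the idea of a single-base difference concrete, here is a minimal Python sketch that compares two already-aligned sequences of equal length and reports each mismatching position. The function names and the toy sequences are illustrative assumptions, not part of any library; real variant calling works from sequencing reads and is far more involved.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes the two sequences are already aligned and contain no gaps.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(reference.upper(), sample.upper())
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(pairs, start=1)
        if ref_base != alt_base
    ]


def expected_snp_sites(length_bp: int, bases_per_snp: int = 300) -> int:
    """Rough count of SNP sites in a region, using the one-per-300-bases average."""
    return length_bp // bases_per_snp


# Toy sequences (made up for illustration) differing at one base.
snps = find_snps("ACGTTAGCCA", "ACGTCAGCCA")
print(snps)                        # [(5, 'T', 'C')]
print(expected_snp_sites(30_000))  # 100
```

The second function only restates the average quoted above as arithmetic; actual SNP density varies considerably along the genome.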

  • mutation

  • gene flow

  • random mating between organisms

  • random fertilization

  • crossing over (or recombination) between chromatids of homologous chromosomes during meiosis

A mutation is simply a change in the DNA. Mutations are rare events, and those that alter function are more often harmful than beneficial. Because of this, such mutations are usually selected against through evolutionary processes.

The majority of sequence variants in Ensembl are single nucleotide polymorphisms (SNPs) and insertions and deletions (indels) imported from NCBI dbSNP. For human SNPs in particular, Ensembl aims to keep current with dbSNP, updating them with every Ensembl release (every 2-3 months). Projects submitting their variants to dbSNP include individual labs, the 1000 Genomes Project, ExAC and gnomAD. Small sequence variants are mapped onto the reference genome, and their effects on Ensembl transcripts are determined. Larger structural variants (such as copy number variants) imported from DGVa, as well as somatic mutations, can also be viewed on the genomic sequence. In the Ensembl browser, the Variation tab provides a wealth of information about a SNP, insertion, deletion, copy number variant, or somatic mutation.
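As a rough illustration of what "determining the effect on a transcript" means for a single-base change, the sketch below swaps one base in a coding sequence and compares the amino acid before and after. It is a toy under stated assumptions: the coding sequence is in frame and on the forward strand, the function name `snp_consequence` is hypothetical, and this is not how Ensembl itself computes consequences.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def snp_consequence(cds: str, position: int, alt_base: str) -> str:
    """Classify a single-base change at a 1-based position in a coding sequence."""
    cds, alt_base = cds.upper(), alt_base.upper()
    idx = position - 1
    usable_length = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    if not 0 <= idx < usable_length:
        raise ValueError("position is outside the complete codons of the CDS")
    if alt_base not in BASES:
        raise ValueError("alt_base must be one of A, C, G, T")

    offset = idx % 3                  # position of the base within its codon
    codon_start = idx - offset
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]

    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop gained"
    if ref_aa == "*":
        return "stop lost"
    return "missense"


cds = "ATGGAAGTGTAA"  # made-up CDS: Met, Glu, Val, stop
print(snp_consequence(cds, 6, "G"))  # GAA -> GAG, still Glu: synonymous
print(snp_consequence(cds, 5, "T"))  # GAA -> GTA, Glu to Val: missense
print(snp_consequence(cds, 4, "T"))  # GAA -> TAA: stop gained
```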

We will explore:

  • The genomic sequence in the region of a variant

  • Genes and transcripts associated with a SNP of interest

  • Population frequencies

  • Associated diseases and phenotypes

We can get to the Variation tab by clicking on a variant in the gene variant table, in any of the sequence views for the gene or transcript, or in the Region in detail view. We can also search directly for variants using rsIDs, COSMIC IDs or phenotypes.
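Variants can also be looked up by rsID programmatically. The sketch below assumes the public Ensembl REST service at rest.ensembl.org and its `variation/:species/:id` endpoint with the optional `pops` flag for population frequencies. The helper names are hypothetical, and the fields returned in the JSON response should be checked against the current REST documentation rather than taken from this example.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

ENSEMBL_REST = "https://rest.ensembl.org"  # assumed public REST server


def variation_url(rsid: str, species: str = "human",
                  with_populations: bool = False) -> str:
    """Build the lookup URL for one variant identifier such as an rsID."""
    url = f"{ENSEMBL_REST}/variation/{quote(species)}/{quote(rsid)}"
    if with_populations:
        url += "?" + urlencode({"pops": 1})
    return url


def fetch_variant(rsid: str, species: str = "human",
                  with_populations: bool = False, timeout: float = 30.0) -> dict:
    """Fetch the variant record as a dictionary (requires network access)."""
    request = Request(
        variation_url(rsid, species, with_populations),
        headers={"Content-Type": "application/json"},  # ask for JSON output
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the URL needs no network access; rs699 is just an example identifier.
print(variation_url("rs699", with_populations=True))
```

Calling `fetch_variant("rs699")` would then return the record for that identifier as a Python dictionary, which can be inspected with `.keys()` before relying on any particular field.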

Three types of sequence variation are commonly studied:

  • single-nucleotide polymorphisms (SNPs)

  • insertions and deletions (indels)

  • short tandem repeats (STRs)
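The three types listed above can be told apart with very simple rules, sketched here in Python. The sketch assumes alleles are given as plain reference/alternate strings (as in a VCF-style record) and uses a regular expression to spot short tandem repeats in a sequence. Both function names and the thresholds (repeat units of 2 to 6 bases, at least 3 copies) are illustrative assumptions rather than a standard definition.

```python
import re


def classify_variant(ref: str, alt: str) -> str:
    """Classify a variant from its reference and alternate allele strings."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return "no change"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "multi-base substitution"  # same length, more than one base


def find_strs(sequence: str, min_unit: int = 2, max_unit: int = 6,
              min_repeats: int = 3) -> list[tuple[int, str, int]]:
    """Return (1-based start, repeat unit, copy number) for each tandem repeat.

    The lazy quantifier prefers the shortest repeat unit that satisfies the
    minimum copy number at each start position.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    return [
        (match.start() + 1, match.group(1),
         len(match.group(0)) // len(match.group(1)))
        for match in pattern.finditer(sequence.upper())
    ]


print(classify_variant("A", "G"))     # SNP
print(classify_variant("A", "AT"))    # insertion
print(classify_variant("ATG", "A"))   # deletion
print(find_strs("TTCAGCAGCAGCAGTT"))  # [(3, 'CAG', 4)]
```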

Over the past century researchers have identified normal genetic variation and studied that variation in diverse human populations to determine the amounts and distributions of that variation. That information is being used to develop an understanding of the demographic histories of the different populations and the species as a whole, among other studies. With the advent of DNA-based markers in the last quarter century, these studies have accelerated. One of the challenges for the next century is to understand that variation. One component of that understanding will be population genetics.
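One basic quantity in population genetics is the allele frequency at a variable site. As a minimal sketch, assuming diploid genotypes written as two-letter strings such as "AG" (a made-up input format for illustration), the following computes allele frequencies and the expected heterozygosity, 1 minus the sum of squared allele frequencies.

```python
from collections import Counter


def allele_frequencies(genotypes: list[str]) -> dict[str, float]:
    """Return the frequency of each allele across a list of genotypes."""
    counts = Counter(allele for genotype in genotypes for allele in genotype.upper())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotypes supplied")
    return {allele: count / total for allele, count in counts.items()}


def expected_heterozygosity(frequencies: dict[str, float]) -> float:
    """Expected heterozygosity under random mating: 1 - sum(p_i ** 2)."""
    return 1.0 - sum(p * p for p in frequencies.values())


# Four hypothetical individuals typed at one SNP.
freqs = allele_frequencies(["AA", "AA", "AG", "GG"])
print(freqs)                           # {'A': 0.625, 'G': 0.375}
print(expected_heterozygosity(freqs))  # 0.46875
```

Comparing such frequencies between populations is one simple way to describe how variation is distributed among them.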

The large-scale typing of sequence variation in genes and genomic DNA poses new challenges, and it is not clear that current technologies are sufficiently sensitive, robust, or scalable. We assess techniques for discovering and typing variation on a large scale, especially single-nucleotide polymorphisms. The in-depth focus is the DNA chip/array platform, and some of the published large-scale studies are closely examined.


