Genomic Data and Biological Annotation

BioCodeKb - Bioinformatics Knowledgebase

Genomic data are data about the genome and DNA of an organism. In bioinformatics they are collected, stored and processed to study the genomes of living things. Once a genome sequence has been assembled and annotated, the information needs to be stored in a database so that it can be shared with researchers around the world. Genomic data generally require a large amount of storage and purpose-built software, such as genome browsers, to analyze.

These data have also created new challenges related to the development of methods for visualizing and searching information.

Because parts of a genome are shared among biological relatives, genomic data may reveal things about people other than the person from whom they were derived. This makes releasing genomic data more challenging than releasing some other types of data. Genomic data are also relatively static: they remain relevant to the individual to whom they relate over long periods of time, and even across generations.

Genomic data have other important characteristics:

  • Can reveal information about susceptibility to diseases and other physical conditions

  • Contain information about ethnic heritage

  • Can be used alongside other data such as proteomics, metabolomics, imaging and health records

Biological Annotation

DNA annotation, or genome annotation, is the process of identifying the locations of genes and all of the coding regions in a genome and determining what those genes do. A complete annotation of a genome includes information about gene location and organization, the transcripts and products of those genes, and the regulation and control of expression, translation and degradation.

Genome annotation consists of three main steps:

  1. identifying portions of the genome that do not code for proteins

  2. identifying elements on the genome, a process called gene prediction

  3. attaching biological information to these elements
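As a rough illustration of step 2, the sketch below scans the three forward reading frames of a DNA string for open reading frames (ORFs). It is a toy stand-in for gene prediction, not a real gene finder: the function name `find_orfs`, the `min_codons` threshold and the example sequence are assumptions made for this example, and the reverse strand, introns and statistical gene models are all ignored.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    Coordinates are 0-based; `end` is exclusive and includes the stop codon.
    `min_codons` counts the codons from the start codon up to, but not
    including, the stop codon (an assumed filter for this sketch).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open start codon, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and look for the next start
    return orfs

# Hypothetical example sequence with one short ORF (ATG AAA TTT TAG).
example = "CCATGAAATTTTAGGG"
orfs = find_orfs(example)
print(orfs)
```

Each tuple marks a candidate coding region. Step 3 would then attach biological information, such as a predicted function, to these candidates.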

In recent years, a great number of genomes have been sequenced. One of the most important challenges of computational genomics is now the functional annotation of nucleic acid sequences.

Automatic annotation tools attempt to perform these steps via computer analysis, as opposed to manual annotation, which involves human expertise.

A simple method of gene annotation relies on homology-based search tools, such as BLAST, to search for homologous genes in specific databases; the resulting information is then used to annotate genes and genomes. As information is added to the annotation platform, manual annotators become able to resolve discrepancies between genes that have been given the same annotation.
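The homology-based approach can be sketched as a "best hit" annotation transfer. The example below assumes the search has already been run and its results saved in BLAST's standard 12-column tabular format (query ID and subject ID in the first two columns, E-value and bit score in the last two). The function names, the E-value cutoff, the gene and protein identifiers and the descriptions are all hypothetical and chosen only for illustration.

```python
def best_hits(blast_tab_lines, max_evalue=1e-5):
    """Pick the highest-scoring subject per query from tabular BLAST lines."""
    best = {}
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hit is too weak to support an annotation
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best

def transfer_annotations(hits, subject_descriptions):
    """Copy the description of each best-hit subject onto its query gene."""
    return {
        query: subject_descriptions.get(subject, "hypothetical protein")
        for query, (subject, _score) in hits.items()
    }

# Hypothetical search results: geneA has two hits, geneB only a weak one.
lines = [
    "geneA\tP1\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t550",
    "geneA\tP2\t60.0\t280\t100\t4\t5\t284\t10\t289\t1e-40\t180",
    "geneB\tP3\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25",
]
descriptions = {"P1": "DNA polymerase", "P2": "DNA polymerase-like protein"}

annotations = transfer_annotations(best_hits(lines), descriptions)
print(annotations)
```

Queries with no hit below the cutoff, like `geneB` here, are left unannotated. This is the kind of case where manual curation is typically needed.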

Structural annotation consists of identifying genomic elements, such as ORFs and their locations, gene structure, coding regions and the locations of regulatory motifs. Functional annotation consists of attaching biological information to genomic elements, such as biochemical function, biological function and expression.

Reference-based annotation is done with GeneMapper. Protein function annotation is done by homology-based inference. Computational methods for annotation have also been developed at NCBI.

Gene annotation involves taking the raw DNA sequence produced by genome-sequencing projects and adding the layers of analysis and interpretation needed to extract biologically significant information and place it in context. Bioinformatics software exists to perform these complex procedures.
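One way to see how structural and functional annotation fit together is a single record in GFF3, a common tab-separated format with nine columns (seqid, source, type, start, end, score, strand, phase, attributes). The sketch below is only an illustration: the helper name `gff3_line`, the source label and the gene ID are hypothetical, and attribute values are assumed to contain no characters that would need escaping.

```python
def gff3_line(seqid, source, ftype, start, end, strand, attributes,
              score=".", phase="."):
    """Format one feature as a GFF3 line (1-based, inclusive coordinates)."""
    # Column 9 holds key=value pairs separated by semicolons.
    # Assumption: values contain no tabs, ';', '=' or '&' (no escaping done).
    attr_text = ";".join(f"{key}={value}" for key, value in attributes.items())
    columns = [seqid, source, ftype, str(start), str(end),
               score, strand, phase, attr_text]
    return "\t".join(columns)

# Structural annotation: where the feature is (coordinates, strand, type).
# Functional annotation: what it does (here, a 'product' attribute).
record = gff3_line(
    "chr1", "example", "gene", 3, 14, "+",
    {"ID": "gene0001", "product": "hypothetical protein"},
)
print(record)
```

The first eight columns carry the structural information, while the attributes column is where functional information is usually attached.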


