

BioCodeKb - Bioinformatics Knowledgebase

Metagenomics is the study of genetic material recovered directly from environmental samples. The broad field may also be known as environmental genomics, ecogenomics or community genomics.

By revealing the previously hidden diversity of microscopic life, metagenomics offers a powerful lens for viewing the microbial world, one with the potential to transform our understanding of the living world as a whole. As the cost of DNA sequencing continues to fall, metagenomics allows microbial ecology to be investigated at a much greater scale and in far greater detail than before.

The data generated by metagenomics experiments are both enormous and inherently noisy, containing fragmented sequences from as many as 10,000 species. Collecting, curating, and extracting useful biological information from datasets of this size poses significant computational challenges for researchers.
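As a minimal illustration of the curating step, the Python sketch below streams a FASTQ file one record at a time, so memory use stays flat regardless of file size, and keeps only reads that pass a length and mean-quality threshold. It assumes unwrapped four-line FASTQ records and Phred+33 quality encoding; the function names and the default thresholds (mean Phred 20, at least 50 bases) are illustrative choices, not a standard.

```python
import io
from typing import Iterator, TextIO, Tuple

# (header, sequence, quality string)
Record = Tuple[str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield one FASTQ record at a time; assumes unwrapped 4-line records."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\r\n")
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string, assuming Phred+33 encoding."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(handle: TextIO, min_q: float = 20.0,
                 min_len: int = 50) -> Iterator[Record]:
    """Stream reads, keeping those with length >= min_len and mean quality >= min_q."""
    for header, seq, qual in read_fastq(handle):
        if len(seq) >= min_len and mean_phred(qual) >= min_q:
            yield header, seq, qual


if __name__ == "__main__":
    # Two toy reads: 'I' encodes Phred 40, '#' encodes Phred 2.
    demo = io.StringIO(
        "@read1\nACGTACGTAC\n+\nIIIIIIIIII\n"
        "@read2\nACGTACGTAC\n+\n##########\n"
    )
    kept = [h for h, _, _ in filter_reads(demo, min_q=20, min_len=5)]
    print(kept)  # ['@read1']
```

Because `filter_reads` is a generator, it can be chained into later steps (k-mer counting, assembly) without ever holding the full dataset in memory.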

Metagenomic analysis applies bioinformatics tools to the genetic material of uncultured microorganisms recovered from environmental samples. Next-generation sequencing of the 16S rRNA gene allows bacterial diversity to be evaluated and thousands of organisms to be detected. Analysis of metagenomic data involves three major steps:
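Alpha diversity from 16S rRNA data is often summarized with the Shannon index, H' = -Σ p_i ln p_i, where p_i is the fraction of reads assigned to taxon i. The text above does not name a specific metric, so the sketch below shows one common choice, computed from a hypothetical table of per-taxon read counts, together with observed richness (the number of taxa detected).

```python
import math
from typing import Mapping


def shannon_index(counts: Mapping[str, int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i), skipping taxa with zero reads."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


def observed_richness(counts: Mapping[str, int]) -> int:
    """Number of taxa with at least one assigned read."""
    return sum(1 for c in counts.values() if c > 0)


if __name__ == "__main__":
    # Hypothetical 16S read counts per taxon for one sample.
    sample = {"taxon_A": 25, "taxon_B": 25, "taxon_C": 25, "taxon_D": 25, "taxon_E": 0}
    print(observed_richness(sample))        # 4
    print(round(shannon_index(sample), 3))  # 1.386 (ln 4 for 4 equally abundant taxa)
```

A perfectly even community of S taxa gives H' = ln S; the more uneven the counts, the lower the index.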

1) assembly

2) annotation

3) statistical analysis

If the goal is to analyze the genomes of individual microorganisms rather than the community as a whole, short reads must be assembled into longer genomic contigs. Assembly approaches for metagenomic samples fall into two categories: reference-based assembly and de novo assembly. Current metagenomic annotation relies on classifying sequences into known functions or taxonomic units based on homology searches against a database, and the major annotation pipelines also perform taxonomic assignment of reads.
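De novo assemblers for short reads commonly use a de Bruijn graph. The Python sketch below is a toy version of that idea: it splits reads into k-mers, links each k-mer's prefix and suffix (k-1)-mers, and spells out maximal non-branching paths as contigs. It assumes error-free reads from a single strand and ignores reverse complements, coverage and repeat resolution, all of which real metagenome assemblers such as MEGAHIT or metaSPAdes must handle; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_debruijn(reads: List[str], k: int) -> Dict[str, Set[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer adds an edge prefix -> suffix."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def unitigs(graph: Dict[str, Set[str]]) -> List[str]:
    """Spell maximal non-branching paths as contigs (isolated cycles are skipped)."""
    indegree: Dict[str, int] = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1
    nodes = set(graph) | set(indegree)

    def one_in_one_out(node: str) -> bool:
        return indegree[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for start in nodes:
        if one_in_one_out(start):
            continue  # interior node: covered when walking from a path start
        for nxt in graph.get(start, ()):
            contig = start + nxt[-1]
            while one_in_one_out(nxt):
                nxt = next(iter(graph[nxt]))
                contig += nxt[-1]
            contigs.append(contig)
    return contigs


if __name__ == "__main__":
    # Two overlapping error-free reads from the toy sequence ACGTTGCATGG.
    reads = ["ACGTTGCA", "TTGCATGG"]
    print(unitigs(build_debruijn(reads, k=4)))  # ['ACGTTGCATGG']
```

Branch points in the graph, caused by repeats, sequencing errors or closely related strains, end contigs early; this is one reason metagenomic assemblies tend to be fragmented.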


Advantages of metagenomics include:

  • Results that are not biased by the choice of specific genomic loci

  • Independence from taxonomically informative genetic markers

  • Ability to study highly divergent microbes, such as viruses

  • Closer estimates of microbial diversity

  • Estimation of the abundance of microorganisms in various environments

  • Analysis of unculturable microorganisms

  • Information on both the composition and the functional capabilities of an ecosystem

  • Investigation of functional genes and gene clusters
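To make the abundance and composition points concrete, the Python sketch below assigns each read to the reference taxon with which it shares the most exact k-mers and then reports the fraction of reads per taxon. Exact k-mer matching is a simplified stand-in for the homology searches (for example with BLAST) that annotation pipelines rely on; the toy reference sequences, taxon labels and k = 5 are hypothetical, and reads with no shared k-mers or a tied best score are reported as "unclassified".

```python
from collections import Counter
from typing import Dict, List, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    """All distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_read(read: str, ref_kmers: Dict[str, Set[str]], k: int) -> str:
    """Taxon sharing the most k-mers with the read; 'unclassified' on no hit or a tie."""
    read_kmers = kmer_set(read, k)
    scores = {taxon: len(read_kmers & kms) for taxon, kms in ref_kmers.items()}
    best = max(scores.values(), default=0)
    winners = [taxon for taxon, s in scores.items() if s == best]
    if best == 0 or len(winners) > 1:
        return "unclassified"
    return winners[0]


def relative_abundance(reads: List[str], references: Dict[str, str],
                       k: int = 5) -> Dict[str, float]:
    """Fraction of reads assigned to each taxon (including 'unclassified')."""
    ref_kmers = {taxon: kmer_set(seq, k) for taxon, seq in references.items()}
    counts = Counter(classify_read(r, ref_kmers, k) for r in reads)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()} if total else {}


if __name__ == "__main__":
    # Hypothetical toy reference "database" and reads.
    references = {"taxon_A": "AACCGGTTAC", "taxon_B": "TTGGCCAATG"}
    reads = ["AACCGGT", "CCGGTTAC", "GGCCAATG", "CATCATCA"]
    print(relative_abundance(reads, references))
    # {'taxon_A': 0.5, 'taxon_B': 0.25, 'unclassified': 0.25}
```

Read fractions are only a rough proxy for organism abundance: this sketch does not correct for differences in genome length or sequencing depth between samples.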

Functional expression of genes from metagenomic libraries is limited by various factors, including inefficient transcription and/or translation of target genes and improper folding and assembly of the corresponding proteins caused by a lack of appropriate chaperones and cofactors. It is now well accepted that using expression hosts of distinct phylogeny and physiology can dramatically increase the rate of success.

Falling DNA sequencing costs have produced large data sets that must be handled and analyzed, especially for microbial ecosystems, which are characterized by high taxonomic and functional diversity. Assessing the properties of these complex ecosystems requires a conceptual understanding of how NGS technology and bioinformatics analysis are applied to metagenomics.


