
Shotgun metagenomics (sequencing)

BioCodeKb - Bioinformatics Knowledgebase


  • De novo metagenomic assembly of each sample.

  • Gene predictions and their protein translations in FASTA as well as GenBank formats.

  • A non-redundant gene catalogue that is created by clustering the predicted genes, providing an inventory of the gene content.

  • Annotation of the gene catalogue against more than 10 databases and functional units, including KEGG Orthology (KO) groups, modules, and pathways, as well as COGs.

  • Functional profiling (abundance) tables of non-redundant genes and KEGG KOs, modules, pathways, and COGs. Both raw and normalized abundances are provided.

  • Taxonomic relative abundance profiles per sample and per project, in both BIOM and Excel formats.
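
To illustrate the difference between raw and normalized abundances, the sketch below converts a small table of raw read counts into per-sample relative abundances. The sample names, KO identifiers, and counts are made up for the example, and total-sum scaling is only one possible normalization; the text above does not say which one is used.

```python
def relative_abundance(counts):
    """Total-sum scaling: divide each feature count by the sample total."""
    total = sum(counts.values())
    if total == 0:
        # An empty sample has no meaningful proportions; report zeros.
        return {feature: 0.0 for feature in counts}
    return {feature: n / total for feature, n in counts.items()}


# Hypothetical raw counts of reads assigned to KEGG KOs, per sample.
raw = {
    "sample_A": {"K00001": 30, "K00002": 50, "K00003": 20},
    "sample_B": {"K00001": 5, "K00002": 0, "K00003": 15},
}

normalized = {sample: relative_abundance(c) for sample, c in raw.items()}

for sample, profile in normalized.items():
    print(sample, {ko: round(value, 3) for ko, value in profile.items()})
```

Because each sample is scaled by its own total, the two samples become comparable even though sample_A has five times as many assigned reads as sample_B.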

Shotgun metagenomic sequencing allows researchers to comprehensively sample all genes in all organisms present in a given complex sample. The method enables microbiologists to evaluate bacterial diversity and detect the abundance of microbes in various environments. Shotgun metagenomics also provides a means to study unculturable microorganisms that are otherwise difficult or impossible to analyze.

Unlike capillary sequencing or PCR-based approaches, next-generation sequencing (NGS) allows researchers to sequence thousands of organisms in parallel. Because many samples can be combined in a single sequencing run while still obtaining high sequence coverage per sample, NGS-based metagenomic sequencing can detect very low-abundance members of the microbial community that other methods may miss or can identify only at prohibitive cost. 16S rRNA sequencing is another method used in metagenomic studies.

Shotgun metagenomic sequencing

Shotgun metagenomic sequencing is a relatively new environmental sequencing approach used to examine thousands of organisms in parallel and comprehensively sample all genes, providing insight into community biodiversity and function. It also allows low-abundance members of microbial communities to be detected.

A sequencing-based metagenomics project involves several steps: DNA extraction, library preparation, sequencing, assembly, annotation, and statistical analysis.

Sample extraction

A reproducible method for extracting DNA from microbial communities is essential for both survey studies and whole-genome metagenomic analysis. Isolation and extraction must yield high-quality nucleic acid for subsequent library preparation and sequencing. Sampling variation can affect both comparisons between samples and abundance measurements.

Library preparation

One of the biggest considerations when preparing libraries from environmental samples for shotgun metagenomic sequencing is amplification. PCR can amplify some fragments more than others, which confounds abundance and microbial diversity measurements. It is good practice to minimize variability, construct libraries together to reduce batch effects, and keep library preparation steps as consistent as possible between samples.


Sequencing

Shotgun metagenomic sequencing is unusual in that it aims to sequence a large, diverse pool of microbes, each with a different genome size and often mixed with host DNA. Current sequencing technologies offer a wide variety of read lengths and outputs. Illumina technology produces short reads (for example 2x250 or 2x300 bp) at high sequencing depth. Longer reads are preferred where available, because they help avoid short contigs and other difficulties during assembly.
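
The interplay of sequencing depth, genome size, and host DNA can be made concrete with the usual depth estimate: bases sequenced from a genome divided by the size of that genome. The sketch below is a rough illustration only. The run size, host fraction, and community members are hypothetical, and it assumes reads are drawn in proportion to each member's share of the DNA.

```python
def expected_coverage(read_pairs, read_length, read_fraction, genome_size):
    """Mean depth = bases sequenced from this genome / genome size (bp)."""
    bases = read_pairs * 2 * read_length * read_fraction
    return bases / genome_size


# Hypothetical run: 10 million 2x250 bp read pairs, 40% of reads from host DNA.
read_pairs = 10_000_000
read_length = 250
microbial_fraction = 1 - 0.40

# Hypothetical community members: (share of microbial reads, genome size in bp).
members = {
    "abundant_5Mb": (0.50, 5_000_000),
    "rare_2Mb": (0.001, 2_000_000),
}

coverage = {
    name: expected_coverage(read_pairs, read_length, microbial_fraction * share, size)
    for name, (share, size) in members.items()
}

for name, depth in coverage.items():
    print(f"{name}: ~{depth:.1f}x")
```

Under these assumptions the abundant member is covered about 300 times over, while the rare member reaches only about 1.5x, which is why depth matters for low-abundance organisms.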


Assembly

Assembly involves the merging of reads from the same genome into a single contiguous sequence (contig).
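
As a toy illustration of merging overlapping reads into a contig, the sketch below repeatedly joins the two sequences with the longest suffix-prefix overlap. It is a simplified greedy approach chosen for readability, not the method of any particular assembler: it assumes error-free reads from a single strand, ignores reads contained within other reads, and the example reads are invented.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no overlap >= min_len remains."""
    seqs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            return seqs  # whatever is left are the contigs
        merged = seqs[i] + seqs[j][size:]
        seqs = [s for k, s in enumerate(seqs) if k not in (i, j)] + [merged]


# Invented reads that tile one short sequence.
reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]
contigs = greedy_assemble(reads)
print(contigs)
```

Real metagenomic data adds sequencing errors, both DNA strands, repeats, and many genomes at uneven abundance, so production assemblers rely on more elaborate graph-based methods.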


Annotation

Once the reads are assembled, genes can be predicted and functionally annotated. Genes are typically predicted in one of three ways:

1) de novo gene prediction

2) protein family classification

3) fragment recruitment (binning)
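
To give a feel for the first option, de novo gene prediction, the sketch below scans a contig for open reading frames (an ATG start codon followed in-frame by a stop codon) in all six frames and translates them with the standard genetic code. This is a deliberately naive stand-in: dedicated gene finders use statistical models rather than a plain ORF scan. The example contig and the `min_codons` threshold are hypothetical, and the code assumes the sequence contains only A, C, G, and T.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=3):
    """Yield (strand, start, end, protein) for ATG...stop ORFs in six frames.

    Coordinates are 0-based, end-exclusive, and for the "-" strand they
    refer to positions on the reverse-complemented sequence.
    """
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos
                elif start is not None and codon in STOPS:
                    if (pos - start) // 3 >= min_codons:
                        protein = "".join(
                            CODON_TABLE[s[i:i + 3]] for i in range(start, pos, 3)
                        )
                        yield strand, start, pos + 3, protein
                    start = None


# Invented contig containing one short ORF on the forward strand.
contig = "CCATGGCTAAAGGTTAACC"
orfs = list(find_orfs(contig))
for strand, start, end, protein in orfs:
    print(strand, start, end, protein)
```

The predicted protein sequences are what would then be compared against functional databases for annotation.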

Frequently used sequence databases for functional annotation include:

  • SEED

  • KEGG

  • MetaCyc

  • EggNOG

