Comparative metagenomics is a fast-growing field, and novel tools are required to support the comparative analysis of multiple metagenomic datasets.
Shotgun metagenomics and computational analysis are used to compare the taxonomic and functional profiles of microbial communities. Leveraging this approach to understand the roles of microbes in human biology and other environments requires quantitative data summaries whose values are comparable across samples and studies.
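One reason raw read counts are not comparable across samples is that sequencing depth differs from run to run. The sketch below shows two simple normalizations, assuming reads have already been assigned to taxa or genes. Total-sum scaling converts counts to proportions. A length-normalized variant, similar in spirit to TPM, divides by gene length first. It is one common approach rather than a method prescribed here. The function names and the sample data are hypothetical.

```python
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Total-sum scaling: divide each feature's read count by the sample's total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no assigned reads")
    return {feature: n / total for feature, n in counts.items()}


def length_normalized_abundance(
    counts: Dict[str, int], lengths_bp: Dict[str, int]
) -> Dict[str, float]:
    """Reads per kilobase of gene length, rescaled so each sample sums to 1.

    Longer genes attract more reads at the same copy number; dividing by
    length first removes that bias before the per-sample rescaling.
    """
    rpk = {gene: n / (lengths_bp[gene] / 1000) for gene, n in counts.items()}
    total = sum(rpk.values())
    if total == 0:
        raise ValueError("sample contains no assigned reads")
    return {gene: value / total for gene, value in rpk.items()}


# Hypothetical read counts: sample_b was simply sequenced twice as deeply.
sample_a = {"Bacteroides": 600, "Prevotella": 300, "Escherichia": 100}
sample_b = {"Bacteroides": 1200, "Prevotella": 600, "Escherichia": 200}
print(relative_abundance(sample_a))
print(relative_abundance(sample_b))
```

Both calls print the same proportions (0.6, 0.3, 0.1), even though the raw counts differ by a factor of two. Proportions are compositional, so an increase in one taxon necessarily lowers the others. Some studies therefore prefer log-ratio transforms, which this sketch does not cover.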
Comparative visualizations are useful for obtaining a quick impression of how two datasets differ.
Metagenomics is a field of research that aims to study uncultured organisms in order to understand the true diversity of microbes, their functions, cooperation and evolution in environments such as soil, water, the ancient remains of animals, and the digestive systems of animals and humans. The recent development of ultra-high-throughput sequencing technologies, which do not require cloning or PCR amplification and can produce huge numbers of DNA reads at an affordable cost, has boosted the number and scope of metagenomic sequencing projects.
An enormous number of metagenomic sequence datasets have been, and continue to be, generated, covering a huge variety of environmental niches, including several different human body sites. Comparing these metagenomes and identifying their commonalities and differences is challenging. The large amount of data is one reason. Another is that several methodological considerations must be taken into account to ensure that comparisons between datasets are appropriate and sound.
Metagenomics has been successfully applied to investigate microbial diversity, adaptation, evolution, and function. Microbial community profiles can be obtained by high-throughput sequencing of targeted PCR amplicons. For example, variation in microbial communities revealed by 16S rRNA gene sequencing can provide an important baseline understanding of microbial ecology and of the health of marine ecosystems. The metabolism of microbial communities can be investigated by metagenomics without any prior knowledge. Metagenomic results provide important evidence for measuring specific ecological processes. For instance, metagenomic analysis has revealed the metabolic versatility of microorganisms and their roles in biogeochemical cycles, including the nitrogen, carbon, and sulfur cycles. Metagenomics has also been applied to the upper and core regions of the oxygen minimum zones in the Arabian Sea, confirming the genomic potential for an active nitrogen cycle.
Comparative analyses between metagenomes can provide additional insight into the function of complex microbial communities and their role in host health. Pairwise or multiple comparisons between metagenomes can be made at the level of sequence composition (e.g., GC content or genome size), taxonomic diversity, or functional complement. Comparisons of population structure and phylogenetic diversity can be based on 16S rRNA and other phylogenetic marker genes. In low-diversity communities, genomes reconstructed from the metagenomic dataset can also serve this purpose. Functional comparisons between metagenomes may be made by comparing sequences against reference databases such as COG or KEGG. Abundance is then tabulated by category, and any differences are evaluated for statistical significance.
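The last step, tabulating abundance by functional category and testing differences, can be sketched for two samples as follows. The sketch assumes each annotated read already carries a single category label, for example a COG functional-category letter. For each category it builds a 2×2 table (reads in the category versus all other reads, sample A versus sample B) and applies a two-sided Fisher's exact test. It then adjusts across categories with the Benjamini–Hochberg procedure. The text names no particular test, so both choices are assumptions, and the function names and labels are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Dict, Iterable, List, Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table. Integer
    numerators keep the "no more likely" comparison exact.
    """
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def weight(x: int) -> int:
        # Number of ways to place x category reads in row 1 and c1 - x in row 2.
        return comb(r1, x) * comb(r2, c1 - x)

    observed = weight(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    extreme = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return extreme / comb(n, c1)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps q-values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def compare_categories(
    labels_a: Iterable[str], labels_b: Iterable[str]
) -> Dict[str, Tuple[float, float, float, float]]:
    """Return {category: (fraction_a, fraction_b, p_value, q_value)}."""
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    categories = sorted(set(counts_a) | set(counts_b))
    pvals = [
        fisher_exact_two_sided(
            counts_a[k], total_a - counts_a[k], counts_b[k], total_b - counts_b[k]
        )
        for k in categories
    ]
    qvals = benjamini_hochberg(pvals)
    return {
        k: (counts_a[k] / total_a, counts_b[k] / total_b, p, q)
        for k, p, q in zip(categories, pvals, qvals)
    }


# Hypothetical per-read COG category letters for two samples.
sample_a = ["J"] * 30 + ["E"] * 70
sample_b = ["J"] * 10 + ["E"] * 90
for category, (freq_a, freq_b, p, q) in compare_categories(sample_a, sample_b).items():
    print(f"{category}: {freq_a:.2f} vs {freq_b:.2f}  p={p:.3g}  q={q:.3g}")
```

The exact test enumerates every feasible table, so it becomes slow for categories with millions of reads. In that regime a chi-square or G-test approximation, or SciPy's `scipy.stats.fisher_exact`, may be a more practical choice.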
Several studies have also used oligonucleotide usage patterns to identify differences across diverse microbial communities.
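As an illustration of an oligonucleotide usage pattern, the sketch below pools all reads of a sample into a tetranucleotide (k = 4) frequency profile. It counts each k-mer together with its reverse complement, because reads come from both strands. Windows containing ambiguous bases such as N are skipped. Two profiles are then compared with a Euclidean distance. The choice of k, the canonical-k-mer convention, and the distance measure are all assumptions, since the studies referred to above may use different ones. The function names and reads are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt
from typing import Dict, Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmer_profile(reads: Iterable[str], k: int = 4) -> Dict[str, float]:
    """Relative frequency of every canonical k-mer across all reads of a sample."""
    counts: Counter = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i : i + k]
            if set(kmer) <= _VALID:  # skip windows with N or other symbols
                counts[canonical(kmer)] += 1
    total = sum(counts.values())
    # Include every canonical k-mer so profiles from different samples align.
    keys = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    return {key: (counts[key] / total if total else 0.0) for key in keys}


def euclidean_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Euclidean distance between two profiles built with the same k."""
    return sqrt(sum((p[key] - q[key]) ** 2 for key in p))


# Hypothetical reads from two communities.
community_a = ["ATGCGCGCATTAGC", "GGCCGCGCAATT"]
community_b = ["ATATATTTAAATAT", "TTTAAAATATTA"]
print(euclidean_distance(kmer_profile(community_a), kmer_profile(community_b)))
```

With k = 4 each profile has 136 entries, because the 256 tetranucleotides collapse to 136 canonical forms (16 are their own reverse complement). A read and its reverse complement therefore yield identical profiles.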