Metatranscriptomics is the study of gene expression of microbes within natural environments, i.e., the metatranscriptome. It also makes it possible to obtain whole gene expression profiles of complex microbial communities.
Metagenomics allows researchers to access the functional and metabolic diversity of microbial communities, but it cannot show which of these processes are active. The extraction and analysis of metagenomic mRNA (the metatranscriptome) provides information on the regulation and expression profiles of complex communities. Because of the technical difficulties in collecting environmental RNA, there have been relatively few in situ metatranscriptomic studies of microbial communities to date. While originally limited to microarray technology, metatranscriptomic studies have since made use of transcriptomics technologies to measure whole-genome expression and quantification of a microbial community, first employed in the analysis of ammonia oxidation in soils.
Because metatranscriptomics focuses on which genes are expressed, it reveals the active functional profile of the entire microbial community.
Although microarrays can be used to determine the gene expression profiles of some model organisms, next-generation sequencing and third-generation sequencing are the preferred techniques in metatranscriptomics. The protocol used to perform a metatranscriptome analysis may vary depending on the type of sample to be analysed, and many different protocols have been developed for studying the metatranscriptome of microbial samples. Generally, the steps include sample harvesting, RNA extraction, mRNA enrichment, cDNA synthesis and preparation of metatranscriptomic libraries, sequencing, and data processing and analysis. mRNA enrichment is one of the most technically difficult steps.
A typical metatranscriptome analysis pipeline follows one of two strategies:
mapping reads to a reference genome, or
de novo assembly of the reads into transcript contigs and supercontigs
The first strategy maps reads to reference genomes in databases in order to collect the information needed to deduce the relative expression of individual genes. Metatranscriptomic reads are mapped against databases using alignment tools such as Bowtie2, BWA, and BLAST. The results are then annotated using resources such as GO, KEGG, COG, and Swiss-Prot. The final analysis of the results depends on the aim of the study. One of the latest metatranscriptomics techniques is stable isotope probing (SIP), which has been used to retrieve specific targeted transcriptomes of aerobic microbes in lake sediment.
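The step from mapped reads to relative expression can be illustrated with a small sketch. The text does not name a normalization method; the example below assumes transcripts per million (TPM), one common choice, and the gene names, read counts, and gene lengths are hypothetical values chosen for illustration only.

```python
def tpm(counts, lengths_bp):
    """Convert mapped-read counts per gene into transcripts per million.

    counts:     dict mapping gene name -> number of reads mapped to that gene
    lengths_bp: dict mapping gene name -> gene length in base pairs
    TPM is an assumed normalization here, not one prescribed by the text.
    """
    # Reads per base: corrects for longer genes attracting more reads.
    rates = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        # No reads mapped anywhere: report zero expression for every gene.
        return {gene: 0.0 for gene in counts}
    # Scale so that the values across all genes sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


if __name__ == "__main__":
    # Hypothetical counts and lengths for three genes.
    example_counts = {"amoA": 300, "nirK": 100, "rpoB": 600}
    example_lengths = {"amoA": 1000, "nirK": 500, "rpoB": 3000}
    for gene, value in tpm(example_counts, example_lengths).items():
        print(f"{gene}\t{value:.1f}")
```

In this example rpoB has the most mapped reads but is also the longest gene, so after length correction its relative expression equals that of nirK and is lower than that of amoA. This is why raw read counts alone are not compared across genes.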
Metatranscriptomics, a subset of metagenomics, provides valuable information about the whole gene expression profile of the complex microbial communities of an ecosystem. Metagenomic studies mainly focus on the genomic content and the identification of the microbes present within a community, while metatranscriptomics reveals the diversity of the active genes within such a community, their expression profiles, and how expression levels change in response to changing environmental conditions. Metatranscriptomics has been applied to many types of environments, from human microbiomes to those found in plants, animals, soils, and aquatic systems. Because it is based on mRNA isolated from environmental samples, metatranscriptomics is a suitable approach for mining the eukaryotic gene pool for genes of biotechnological relevance. It also requires the development of different bioinformatic pipelines to analyze the data obtained from metatranscriptomic analysis.
Steps
RNA Purification
Library preparation
RNA-seq
Filtering reads
Aligning reads to a reference
De novo assembly
Annotation
Statistical analysis
Applications
Human health
Assessment of microbiome–immune interactions
Studying microbiome small noncoding RNAs
Drug discovery
Agriculture
Ecology
Food industry
An important step to consider is the physical removal or depletion of the highly abundant ribosomal RNA (rRNA) transcripts from the samples. If not removed, they often constitute upward of 90% of all data, and they do not contribute to most downstream analyses, such as finding differentially expressed genes or pathway characterization.
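The effect of rRNA on usable sequencing depth can be made concrete with a short calculation. The sketch below is illustrative only: the run size of 10 million reads and the 5% residual rRNA fraction after depletion are hypothetical values, while the 90% figure is the one given above.

```python
def informative_reads(total_reads, rrna_fraction):
    """Estimate how many reads remain for downstream analysis.

    total_reads:   number of reads produced by the sequencing run
    rrna_fraction: share of those reads derived from rRNA (0.0 to 1.0)
    """
    if not 0.0 <= rrna_fraction <= 1.0:
        raise ValueError("rrna_fraction must be between 0 and 1")
    # Reads that are not rRNA are the ones usable for expression analysis.
    return round(total_reads * (1.0 - rrna_fraction))


if __name__ == "__main__":
    run_size = 10_000_000  # hypothetical sequencing run
    # 0.90 is the undepleted figure from the text; 0.05 is an assumed
    # post-depletion value used only to show the contrast.
    for label, fraction in [("no depletion", 0.90), ("after depletion", 0.05)]:
        usable = informative_reads(run_size, fraction)
        print(f"{label}: {usable} non-rRNA reads")
```

Under these assumptions, only about one read in ten from an undepleted sample would be available for differential expression or pathway analysis.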