
Pan genomics

BioCodeKb - Bioinformatics Knowledgebase

A pan-genome is defined as the set of all unique gene families found in one or more strains of a prokaryotic species. Studies of pan-genomes have become popular due to the easy access to whole-genome sequence data for prokaryotes.


In the fields of molecular biology and genetics, a pan-genome (or supragenome) is the entire set of genes for all strains within a clade. The pan-genome includes: the core genome containing genes present in all strains within the clade, the accessory genome containing 'dispensable' genes present in a subset of the strains, and strain-specific genes. The study of the pan-genome is called pangenomics.
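
As a minimal sketch of this three-way split, assume each strain has already been reduced to a set of gene-family identifiers. The strain and gene names below are made-up toy data, and `partition_pangenome` is a hypothetical helper written for illustration, not part of any published tool.

```python
from collections import Counter


def partition_pangenome(strain_genes):
    """Split gene families into core, accessory and strain-specific sets.

    strain_genes maps a strain name to the set of gene-family IDs found in it.
    """
    n_strains = len(strain_genes)
    # Count in how many strains each gene family occurs.
    counts = Counter(g for genes in strain_genes.values() for g in genes)
    # Core: present in every strain.
    core = {g for g, c in counts.items() if c == n_strains}
    # Strain-specific: present in exactly one strain (needs at least two strains).
    strain_specific = {g for g, c in counts.items() if c == 1 and n_strains > 1}
    # Accessory: everything in between.
    accessory = set(counts) - core - strain_specific
    return core, accessory, strain_specific


# Toy data, invented for illustration only.
strains = {
    "strainA": {"dnaA", "gyrB", "toxX", "pilA"},
    "strainB": {"dnaA", "gyrB", "pilA"},
    "strainC": {"dnaA", "gyrB", "capZ"},
}

core, accessory, unique = partition_pangenome(strains)
print("core:", sorted(core))                # core: ['dnaA', 'gyrB']
print("accessory:", sorted(accessory))      # accessory: ['pilA']
print("strain-specific:", sorted(unique))   # strain-specific: ['capZ', 'toxX']
```

Real analyses first have to cluster genes into families, which is the hard part that the pangenome tools handle; the set arithmetic above only covers the final bookkeeping.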


Some species have open (or extensive) pan-genomes, while others have closed pan-genomes. In a species with a closed pan-genome, very few new genes are added per additional sequenced genome once many strains have been sequenced, so the size of the full pan-genome can be predicted theoretically. In a species with an open pan-genome, each additional sequenced genome keeps adding enough new genes that the size of the full pan-genome cannot be predicted. Population size and niche versatility have been suggested as the most influential factors in determining pan-genome size. The pan-genome can also be broken down into a "core pan-genome" that contains genes present in all individuals, a "shell pan-genome" that contains genes present in two or more strains but not in all of them, and a "cloud pan-genome" that contains genes found in only a single strain.
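
The open versus closed distinction can be illustrated by tracking how many previously unseen gene families each additional genome contributes. The following is a rough sketch under stated assumptions: the genomes are toy sets of invented gene-family IDs, they are added in one fixed order, and `accumulation_curve` is a hypothetical helper name.

```python
def accumulation_curve(genomes):
    """Return (pan_sizes, core_sizes, new_genes) as genomes are added in order.

    genomes is a list of sets of gene-family IDs, one set per sequenced strain.
    """
    pan = set()    # union of all gene families seen so far
    core = None    # intersection of all genomes seen so far
    pan_sizes, core_sizes, new_genes = [], [], []
    for genes in genomes:
        new_genes.append(len(genes - pan))  # families not seen before
        pan |= genes
        core = set(genes) if core is None else core & genes
        pan_sizes.append(len(pan))
        core_sizes.append(len(core))
    return pan_sizes, core_sizes, new_genes


# Toy genomes, invented for illustration only.
genomes = [
    {"g1", "g2", "g3", "g4"},
    {"g1", "g2", "g3", "g5"},
    {"g1", "g2", "g4", "g6"},
    {"g1", "g2", "g3", "g4"},
]

pan_sizes, core_sizes, new_genes = accumulation_curve(genomes)
print("pan-genome size:", pan_sizes)    # pan-genome size: [4, 5, 6, 6]
print("core genome size:", core_sizes)  # core genome size: [4, 3, 2, 2]
print("new genes added:", new_genes)    # new genes added: [4, 1, 1, 0]
```

In this toy run the number of new gene families falls to zero by the fourth genome, which is the pattern expected for a closed pan-genome. In practice such curves are usually averaged over many random genome orderings before any conclusion is drawn.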


Pan-genomes were originally constructed for species of bacteria and archaea, but more recently eukaryotic pan-genomes have been developed, particularly for plant species. Plant studies have shown that pan-genome dynamics are linked to transposable elements. The pan-genome is most significant in an evolutionary context, especially with relevance to metagenomics, but the concept is also used in a broader genomics context.


As sequencing costs have dropped, thousands of new genomes have been sequenced, and scientists have come to realize that a single reference genome is inadequate for many purposes. By sampling a diverse set of individuals, one can begin to assemble a pan-genome: a collection of all the DNA sequences that occur in a species. As scientists continue to survey and catalogue the genomic variation across human populations and begin to assemble a human pan-genome, these efforts will increase our power to connect variation to human diversity, disease and beyond.


Uses

  • characterizing strains by their individual gene set (e.g., detecting virulence factors only present in one particular strain of a species)

  • developing vaccines against pathogenic strains

  • detecting, identifying and tracking new strains in metagenomics samples based on their individual gene subset of the species pangenome

  • studying the evolutionary impact of horizontal gene transfer

  • exploring strain diversity in environmental population genomics studies
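
The first use in the list, characterizing a strain by its individual gene set, can be sketched as follows. The input layout is an assumption: a tab-separated table with gene families in rows, strains in columns, and 1/0 marking presence or absence. The table contents, the strain names, and the helper `read_presence_absence` are all hypothetical.

```python
import csv
import io


def read_presence_absence(handle):
    """Parse a tab-separated 0/1 matrix into {strain: set of gene families}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    strain_names = header[1:]  # first column holds the gene-family ID
    strain_genes = {name: set() for name in strain_names}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, flags = row[0], row[1:]
        for name, flag in zip(strain_names, flags):
            if flag.strip() == "1":
                strain_genes[name].add(gene)
    return strain_genes


# Hypothetical presence/absence table, invented for illustration only.
rows = [
    ["Gene", "strainA", "strainB", "strainC"],
    ["dnaA", "1", "1", "1"],
    ["pilA", "1", "1", "0"],
    ["toxX", "1", "0", "0"],
    ["capZ", "0", "0", "1"],
]
table = "\n".join("\t".join(r) for r in rows) + "\n"

profiles = read_presence_absence(io.StringIO(table))

# Gene families present in the target strain and in no other strain.
target = "strainA"
others = set().union(*(genes for name, genes in profiles.items() if name != target))
only_in_target = profiles[target] - others
print(sorted(only_in_target))  # ['toxX']
```

Whether a gene found this way is actually a virulence factor still has to be established from its annotation or from experiments; the code only shows the presence/absence comparison.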


Pangenome tools

  • Roary: fast tool for extracting complete pangenomes, core gene sets, or differences between reference genomes

  • panX: pangenome analysis and web-based visualization

  • PanOCT: considers both gene homology and conserved gene neighborhoods

  • OrthoMCL: clustering of orthologous gene groups, from which core genomes can be extracted

  • LS-BSR: rapid comparison of the genetic content of large numbers of genomes

  • PanPhlAn: pangenome-based detection of the gene composition of strains in environmental WGS samples
