The number of sequenced plant genomes and associated genomic resources is growing rapidly, driven both by an increased focus on plant genomics from funding agencies and by the application of inexpensive next-generation sequencing. Phytozome has been developed to help users interact with this increasing body of data. Phytozome provides a view of the evolutionary history of every plant gene at the level of sequence, gene structure, gene family and genome organization, while at the same time providing access to the sequences and functional annotations of a growing number (currently 25) of complete plant genomes.
Phytozome, the Plant Comparative Genomics portal of the Department of Energy's Joint Genome Institute (JGI), provides JGI users and the broader plant science community with a hub for accessing, visualizing and analyzing JGI-sequenced plant genomes, as well as selected genomes and datasets that have been sequenced elsewhere. Phytozome hosts 93 assembled and annotated genomes from 82 Viridiplantae species. More than half of these genomes have been sequenced, assembled and/or annotated with JGI Plant Science program resources. By integrating this large collection of plant genomes into a single resource and performing comprehensive and uniform annotation and analyses, Phytozome facilitates accurate and insightful comparative genomics studies.
All gene sets in Phytozome have been annotated with KOG, KEGG, ENZYME, Pathway and the InterPro family of protein analysis tools. Families of related genes showing the modern descendants of putative ancestral genes are constructed at key phylogenetic nodes. These families provide additional insight into clade-specific orthology/paralogy relationships as well as clade-specific novelties and expansions. Search and visualization tools let users quickly find and analyze genes or genomic regions of interest. Query-based data access is provided by Phytozome's InterMine and BioMart instances, while bulk data sets can be accessed via the JGI's Genome Portal. JBrowse genome browsers are available for all genomes.
Gene family construction in Phytozome uses a distance-based method similar to the PhiGs method, the initial proto-family creation step used in TreeFam, with many changes. Family construction is restricted initially to a subset of core genomes, which are considered to have relatively stable assemblies and complete structural annotations, though in some cases genomes with draft assemblies and annotations are used if the species in question is the sole representative of its clade (e.g. Selaginella, Physcomitrella, Mimulus). Using the assumed species tree, gene families are developed at each evolutionary node, starting from the crown nodes and moving backward in evolutionary time. At each bifurcating parent node, pairs of gene families from the two daughter nodes are combined into a parent family if they are joined by a cross-node mutual best hit (MBH). Remaining families from the daughter nodes are added to a parent family as paralogs if they have a hit to the parent family that is stronger than the parent family's best outgroup hit. This process is repeated down to the root node. Multiple sequence alignments (MSAs) from MUSCLE and Hidden Markov Model (HMM) profiles from HMMER3 are created for each core family.
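The merging step at a single bifurcating node can be illustrated with a minimal Python sketch. It is not Phytozome's implementation: the function names (`sim`, `best_hit`, `merge_node`), the symmetric score table, the gene identifiers and the scores are all hypothetical, and the sketch assumes that an unmerged daughter family is compared only against the parent family it hits most strongly, and that a family failing the outgroup test is carried forward unchanged.

```python
def sim(score, a, b):
    """Symmetric similarity lookup; a missing pair counts as no hit (0.0)."""
    return score.get(frozenset((a, b)), 0.0)


def best_hit(gene, candidates, score):
    """Best-scoring candidate for `gene`, or None if nothing scores above 0."""
    best = max(sorted(candidates), key=lambda c: sim(score, gene, c), default=None)
    return best if best is not None and sim(score, gene, best) > 0 else None


def merge_node(fams_a, fams_b, outgroup, score):
    """Build parent-node families from the families of two daughter nodes."""
    genes_a, genes_b = set().union(*fams_a), set().union(*fams_b)
    owner_a = {g: i for i, fam in enumerate(fams_a) for g in fam}
    owner_b = {g: j for j, fam in enumerate(fams_b) for g in fam}

    # Union-find over daughter families, keyed by (side, index).
    parent = {("a", i): ("a", i) for i in range(len(fams_a))}
    parent.update({("b", j): ("b", j) for j in range(len(fams_b))})

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Step 1: join daughter families linked by a cross-node mutual best hit.
    for g in sorted(genes_a):
        h = best_hit(g, genes_b, score)
        if h is not None and best_hit(h, genes_a, score) == g:
            parent[find(("b", owner_b[h]))] = find(("a", owner_a[g]))

    groups = {}
    for side, idx in parent:
        fam = fams_a[idx] if side == "a" else fams_b[idx]
        groups.setdefault(find((side, idx)), []).append(set(fam))
    core = [set().union(*m) for m in groups.values() if len(m) > 1]
    leftover = [m[0] for m in groups.values() if len(m) == 1]

    # Step 2: attach a leftover family as paralogs when its strongest hit to a
    # parent family beats that parent family's best hit to the outgroup.
    result = [set(f) for f in core]
    for fam in leftover:
        best_idx, best_score = None, 0.0
        for idx, pf in enumerate(core):
            s = max((sim(score, g, p) for g in fam for p in pf), default=0.0)
            if s > best_score:
                best_idx, best_score = idx, s
        if best_idx is not None:
            out_best = max(
                (sim(score, p, o) for p in core[best_idx] for o in outgroup),
                default=0.0,
            )
            if best_score > out_best:
                result[best_idx] |= fam
                continue
        result.append(set(fam))  # assumption: unattached families are kept as-is
    return result


# Hypothetical toy data: three families per daughter node and one outgroup gene.
score = {
    frozenset(("a1", "b1")): 90.0,  # cross-node mutual best hit
    frozenset(("a2", "b2")): 80.0,  # cross-node mutual best hit
    frozenset(("a3", "a1")): 60.0,  # beats a1's outgroup hit: paralog
    frozenset(("a1", "o1")): 50.0,
    frozenset(("b3", "a2")): 30.0,  # weaker than a2's outgroup hit: stays apart
    frozenset(("a2", "o1")): 40.0,
}
families = merge_node([{"a1"}, {"a2"}, {"a3"}], [{"b1"}, {"b2"}, {"b3"}], {"o1"}, score)
```

Applying `merge_node` recursively from the crown nodes toward the root, with the outgroup set to the genes outside the current clade, would mirror the node-by-node traversal described above. The scores would in practice come from an all-against-all sequence comparison, which is outside the scope of this sketch.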