A taxonomic database is a database developed to hold information on biological taxa, for example groups of organisms organized by species name or other taxonomic identifier, so that the data can be managed and retrieved efficiently.
Today, taxonomic databases are routinely used for the automated construction of biological checklists such as floras and faunas, both for print publication and online; to underpin the operation of web-based species information systems; as a part of biological collection management (for example in museums and herbaria); and, in some cases, to provide the taxon management component of broader science or biology information systems. They are also a fundamental contribution to the discipline of biodiversity informatics.
The goal of a taxonomic database is to accurately model the characteristics of interest of the organisms that are in scope for the system's intended coverage and usage. For example, databases of fungi, algae, bryophytes and higher plants need to encode conventions from the International Code of Nomenclature for algae, fungi, and plants (formerly the International Code of Botanical Nomenclature), while their counterparts for animals and most protists encode the equivalent rules from the International Code of Zoological Nomenclature.
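As a rough illustration of what modelling these characteristics can involve, the following minimal sketch stores each taxon together with the nomenclatural code that governs its name. The class names, field names and identifiers are hypothetical and are not taken from any particular database; a real schema would carry far more detail (synonyms, type specimens, publication references and so on).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NomenclaturalCode(Enum):
    """Which rule set governs the name (hypothetical two-value enum)."""
    BOTANICAL = "botanical"    # algae, fungi, plants
    ZOOLOGICAL = "zoological"  # animals and most protists


@dataclass(frozen=True)
class Taxon:
    """One record in an illustrative taxon table."""
    taxon_id: int                    # invented local identifier
    scientific_name: str
    rank: str
    code: NomenclaturalCode
    authorship: str = ""             # author citation, formatted per code
    parent_id: Optional[int] = None  # link to the enclosing taxon, if any


def format_name(taxon: Taxon) -> str:
    """Return the name followed by its authorship, if recorded."""
    return f"{taxon.scientific_name} {taxon.authorship}".strip()


# Botanical citations usually abbreviate the author; zoological ones
# usually give the author's name and the year of publication.
oak = Taxon(1, "Quercus robur", "species", NomenclaturalCode.BOTANICAL, "L.")
wolf = Taxon(2, "Canis lupus", "species", NomenclaturalCode.ZOOLOGICAL,
             "Linnaeus, 1758")

print(format_name(oak))
print(format_name(wolf))
```

Keeping the governing code as an explicit field lets validation and formatting rules differ per record without splitting plants and animals into separate tables.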
The NCBI Taxonomy database
The NCBI Taxonomy database is the standard nomenclature and classification repository for the International Nucleotide Sequence Database Collaboration (INSDC), which consists of the GenBank, ENA (EMBL) and DDBJ databases. Created in 1991, it includes organism names and taxonomic lineages for each of the sequences represented in the INSDC’s nucleotide and protein sequence databases. The taxonomy database is manually curated by a small group of scientists at the NCBI, who use the current taxonomic literature to maintain a phylogenetic taxonomy for the source organisms represented in the sequence databases. It is a central organizing hub for many of the resources at the NCBI: it provides a means of clustering elements within other domains of the NCBI web site, of linking internally between domains of the Entrez system, and of linking out to taxon-specific external resources on the web.
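A taxonomic lineage of this kind is commonly represented as a tree in which every taxon points to its parent. The sketch below shows the idea with a toy in-memory table; the identifiers are invented for illustration and are not real NCBI taxonomy IDs, and the `Node` type and `lineage` function are hypothetical names rather than part of any NCBI interface.

```python
from typing import Dict, List, NamedTuple, Optional


class Node(NamedTuple):
    name: str
    rank: str
    parent_id: Optional[int]  # None marks the root of the tree


# Toy table; the identifiers are invented, not real NCBI taxonomy IDs.
NODES: Dict[int, Node] = {
    1: Node("root", "no rank", None),
    10: Node("Eukaryota", "domain", 1),
    20: Node("Hominidae", "family", 10),
    30: Node("Homo", "genus", 20),
    40: Node("Homo sapiens", "species", 30),
}


def lineage(taxon_id: int, nodes: Dict[int, Node]) -> List[str]:
    """Return names from the root down to the given taxon."""
    path: List[str] = []
    seen = set()
    current: Optional[int] = taxon_id
    while current is not None:
        if current in seen:
            # Guard against corrupt data with circular parent links.
            raise ValueError(f"cycle detected at taxon {current}")
        seen.add(current)
        node = nodes[current]
        path.append(node.name)
        current = node.parent_id
    return list(reversed(path))


print(" > ".join(lineage(40, NODES)))
```

Because each record stores only its parent, moving a genus to a different family changes a single link, and every lineage below it is updated automatically the next time it is computed.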
NCBI’s Taxonomy database has now surpassed 300,000 individual records of species with formal scientific names. The majority of these represent eukaryotic organisms, although the database also contains listings for nearly all of the prokaryotes and viruses that have been described.
The taxonomy database maintained by the UniProt group is based on the NCBI taxonomy database, supplemented with data specific to the UniProt Knowledgebase (UniProtKB). While the NCBI taxonomy is updated daily to stay in sync with GenBank/EMBL-Bank/DDBJ, the UniProt taxonomy is updated only at UniProt releases, to stay in sync with UniProtKB. Between UniProt releases, the NCBI taxonomy may therefore contain new taxa that are not yet in UniProt, and UniProt may still list taxa that have already been deleted at the NCBI.
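The effect of the two update schedules can be illustrated by comparing two sets of taxon identifiers, one for each resource. This is only a sketch: the function name and the identifier values are made up, and it assumes that both snapshots have already been loaded as plain sets of integers.

```python
from typing import Set, Tuple


def compare_snapshots(
    ncbi_ids: Set[int], uniprot_ids: Set[int]
) -> Tuple[Set[int], Set[int]]:
    """Return (taxa only in the NCBI set, taxa only in the UniProt set)."""
    return ncbi_ids - uniprot_ids, uniprot_ids - ncbi_ids


# Invented identifiers: 105 was added at the NCBI after the last UniProt
# release, and 103 was deleted at the NCBI but is still present in UniProt.
ncbi_today = {101, 102, 104, 105}
uniprot_release = {101, 102, 103, 104}

new_at_ncbi, still_in_uniprot = compare_snapshots(ncbi_today, uniprot_release)
print(sorted(new_at_ncbi))       # taxa not yet in UniProt
print(sorted(still_in_uniprot))  # taxa deleted at the NCBI
```

Both differences become empty again once the next UniProt release picks up the current NCBI taxonomy.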
Species with manually annotated and reviewed protein sequences in the Swiss-Prot section of UniProtKB are named according to UniProt nomenclature. In particular, UniProt has adopted a systematic convention for naming viral and bacterial strains and isolates.