
UniProtKB/Swiss-Prot is the manually annotated and reviewed section of the UniProt Knowledgebase (UniProtKB). It is a high-quality, non-redundant protein sequence database that gives researchers a combination of experimental results, computed features and scientific conclusions.
Since 2002, it has been maintained by the UniProt consortium, and it is accessible via the UniProt website.
Importance of UniProt Knowledgebase
It is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. In addition to capturing the core data mandatory for each UniProtKB entry (mainly the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and clear indications of the quality of annotation in the form of evidence attribution of experimental and computational data.
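The split between mandatory core data and optional annotation can be pictured as a small record type. The following is only an illustrative sketch: the class name, field names and the `missing_core_fields` helper are hypothetical and do not reflect the actual UniProtKB data model or file format.

```python
from dataclasses import dataclass, field


@dataclass
class ProteinEntry:
    """Hypothetical, simplified stand-in for one knowledgebase entry."""

    accession: str
    # Core data described above as mandatory for each entry.
    sequence: str
    protein_name: str
    taxonomy: str
    citations: list[str]
    # Optional annotation, keyed by topic (e.g. "function", "domain").
    annotations: dict[str, list[str]] = field(default_factory=dict)

    def missing_core_fields(self) -> list[str]:
        """Return the names of core fields that are empty."""
        core = {
            "sequence": self.sequence,
            "protein_name": self.protein_name,
            "taxonomy": self.taxonomy,
            "citations": self.citations,
        }
        return [name for name, value in core.items() if not value]


# Made-up example data, for illustration only.
entry = ProteinEntry(
    accession="X00001",
    sequence="MVLSPADK",
    protein_name="Example protein",
    taxonomy="Homo sapiens",
    citations=[],
)
print(entry.missing_core_fields())
```

Keeping the annotation in a separate, open-ended mapping mirrors the idea that the core data is fixed while as much annotation as possible is layered on top.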
The UniProt Knowledgebase consists of two sections:
UniProtKB/Swiss-Prot
A section which contains manually-annotated records with information extracted from literature and curator-evaluated computational analysis.
UniProtKB/TrEMBL
A section with computationally analyzed records that await full manual annotation.
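In FASTA files distributed by UniProt, the two sections can be told apart from the header line, which starts with `sp` for UniProtKB/Swiss-Prot and `tr` for UniProtKB/TrEMBL. The sketch below assumes that header layout (`>db|accession|entry name ...`); the function name and the returned dictionary keys are illustrative choices, not part of any UniProt tool.

```python
import re

# Header prefix to section name, assuming the standard UniProt FASTA layout.
SECTION_BY_PREFIX = {
    "sp": "UniProtKB/Swiss-Prot",
    "tr": "UniProtKB/TrEMBL",
}

# Matches ">sp|ACCESSION|ENTRY_NAME" or ">tr|ACCESSION|ENTRY_NAME".
HEADER_RE = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)")


def parse_fasta_header(header: str) -> dict:
    """Extract section, accession and entry name from one FASTA header line."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"not a recognised UniProtKB FASTA header: {header!r}")
    prefix, accession, entry_name = match.groups()
    return {
        "section": SECTION_BY_PREFIX[prefix],
        "reviewed": prefix == "sp",  # Swiss-Prot records are manually reviewed
        "accession": accession,
        "entry_name": entry_name,
    }


header = (
    ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha "
    "OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2"
)
print(parse_fasta_header(header))
```

The `PE=` field in the same header carries the protein existence level, which corresponds to the "evidence on protein existence" item listed further down.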
More than 95% of the protein sequences in UniProtKB are derived from the translation of coding sequences (CDS) submitted to the public nucleic acid databases, namely the EMBL-Bank/GenBank/DDBJ databases. All these sequences, as well as the related data submitted by the authors, are automatically integrated into UniProtKB/TrEMBL.
Minimizing redundancy is the first step UniProtKB takes to improve sequence reliability. All protein sequences encoded by the same gene are merged into a single UniProtKB/Swiss-Prot entry. Differences found between the various sequencing reports are analysed and fully described in the feature table. Once a protein entry is in UniProtKB/Swiss-Prot, it is removed from UniProtKB/TrEMBL.
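The merging step can be illustrated with a toy example. This is a deliberately simplified sketch, not the curation procedure itself: it assumes every report for a gene has the same length, treats the first report as the displayed sequence, and records only single-residue differences. The function name, the tuple layout and the example data are all hypothetical.

```python
from collections import defaultdict


def merge_reports_by_gene(reports):
    """Merge sequencing reports into one record per gene.

    `reports` is a list of (gene, source, sequence) tuples. Differences
    between the first report and the others are kept as conflicts of the
    form (position, displayed_residue, reported_residue, source), with
    1-based positions.
    """
    grouped = defaultdict(list)
    for gene, source, sequence in reports:
        grouped[gene].append((source, sequence))

    merged = {}
    for gene, items in grouped.items():
        _, displayed = items[0]  # simplification: first report is displayed
        conflicts = []
        for source, sequence in items[1:]:
            if len(sequence) != len(displayed):
                raise ValueError("this sketch only handles equal-length reports")
            for position, (a, b) in enumerate(zip(displayed, sequence), start=1):
                if a != b:
                    conflicts.append((position, a, b, source))
        merged[gene] = {"sequence": displayed, "conflicts": conflicts}
    return merged


# Made-up reports, for illustration only.
reports = [
    ("GENE1", "report A", "MVLSPADK"),
    ("GENE1", "report B", "MVLSAADK"),
    ("GENE2", "report C", "MALW"),
]
merged = merge_reports_by_gene(reports)
print(merged["GENE1"]["conflicts"])
```

Two reports for the same gene collapse into one record, and the disagreement at position 5 is preserved as a conflict rather than discarded, which is the role the feature table plays in a real entry.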
Manual annotation in UniProtKB consists of a critical review of experimentally proven or computer-predicted data about each protein, including the protein sequences. Data are continuously updated by an expert team of biologists.
Here are some of the features and functions that have made UniProtKB popular among scientists and researchers:
Annotation of data
The sequence data
The citation information
The taxonomic data
Function(s) of the protein
Post-translational modification(s), e.g. carbohydrates, phosphorylation, acetylation and GPI-anchor
Domains and sites, e.g. homeoboxes, SH2 and SH3 domains and kringle
Secondary structure, e.g. alpha helix, beta sheet
Quaternary structure, e.g. homodimer, heterotrimer
Similarities to other proteins
Disease(s) associated with any number of deficiencies in the protein
Sequence conflicts, variants, etc.
Minimal redundancy
Integration with other databases
Documentation
Sequence curation
Sequence analysis
Literature curation
Family-based curation
Evidence attribution
Evidence on protein existence
Quality assurance, integration and update
In UniProtKB, annotation of a protein consists of the description of the following: function(s), enzyme-specific information, biologically relevant domains and sites, post-translational modifications, subcellular location(s), tissue specificity, developmentally specific expression, structure, interactions, splice isoform(s), diseases associated with deficiencies or abnormalities, etc. Another important part of the annotation process involves merging the different reports for a single protein.
Once a protein sequence has been selected for manual annotation on the basis of UniProt's curation priorities, BLAST searches are run against UniProtKB to identify additional sequences from the same gene and to identify homologs.