PROSITE is a protein database. It consists of entries describing protein families, domains and functional sites, together with the amino acid patterns and profiles that identify them. The entries are manually curated by a team at the Swiss Institute of Bioinformatics and are tightly integrated into Swiss-Prot protein annotation. PROSITE was created in 1988 by Amos Bairoch, who directed the group for more than 20 years. Since July 2018, Alan Bridge has been the director of PROSITE and Swiss-Prot.
Uses of PROSITE include identifying possible functions of newly discovered proteins and analyzing known proteins for previously undetermined activities. Properties of well-studied genes can be propagated to biologically related organisms, and for divergent or poorly characterized genes, biochemical functions can be predicted from sequence similarities. PROSITE offers tools for protein sequence analysis and motif detection.
The ProRule database builds on the domain descriptions of PROSITE. It provides additional information about functionally or structurally critical amino acids. The rules describe biologically meaningful residues, such as active sites, substrate- or cofactor-binding sites, post-translational modification sites and disulfide bonds, to help determine protein function. Combined with PROSITE motifs, these rules can be used to generate annotation automatically.
The PROSITE database consists of a large collection of biologically meaningful signatures that are described as patterns or profiles. Each signature is linked to documentation that provides biological information on the protein family, domain or functional site identified by the signature. PROSITE patterns are short sequence motifs, while PROSITE profiles are position-specific score matrices. Profiles characterize protein domains over their entire length, and they are more sensitive than patterns. The two have complementary strengths. Patterns, confined to small regions with high sequence similarity, are often powerful predictors of protein functions such as enzymatic activities. Profiles, which cover complete domains, are better suited to predicting protein structural properties.
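To make the idea of a position-specific score matrix concrete, here is a minimal Python sketch of how such a matrix scores a sequence. It is deliberately simplified: the matrix is gapless, and its scores, the default mismatch penalty, the cutoff and all names (`TOY_PSSM`, `scan_profile`) are invented for illustration. Real PROSITE profiles (generalized profiles) also score insertions and deletions.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: toy, gapless matrix with invented scores; real PROSITE
# profiles also include insertion and deletion scores.
DEFAULT_SCORE = -2  # score for any residue not listed in a column

# One dict per motif position, mapping a residue to its score.
TOY_PSSM: List[Dict[str, int]] = [
    {"C": 5},
    {"G": 3, "A": 2},
    {"H": 4, "Y": 2},
    {"C": 5},
]


def score_window(pssm: List[Dict[str, int]], window: str) -> int:
    """Sum the per-position scores for a window of len(pssm) residues."""
    return sum(col.get(aa, DEFAULT_SCORE) for col, aa in zip(pssm, window))


def scan_profile(pssm: List[Dict[str, int]], sequence: str,
                 cutoff: int) -> List[Tuple[int, int]]:
    """Return (1-based start, score) for every window scoring >= cutoff."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = score_window(pssm, sequence[start:start + width])
        if score >= cutoff:
            hits.append((start + 1, score))
    return hits


if __name__ == "__main__":
    print(scan_profile(TOY_PSSM, "MKCGHCLLCAYCQ", cutoff=10))
```

Here the window `CAYC` still passes the cutoff although it differs from the best-scoring residues at two positions. This tolerance of substitutions is why profiles can pick up divergent domain members that a strict pattern would reject.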
The PROSITE database is composed of two ASCII (text) files. The first file, PROSITE.DAT, is a computer-readable file that contains all the information needed by programs that scan sequences with patterns and/or matrices. The second file, PROSITE.DOC, contains textual information that fully documents each pattern and profile. We strongly urge software developers to build tools that make use of both files, because a list of the patterns or profiles present in a sequence is of little use to biologists without the relevant documentation.
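Because PROSITE.DAT is meant to be read by programs, a parser is usually the first tool built on top of it. The following is a minimal Python sketch that assumes the line-code layout described in the PROSITE user manual: a two-letter code such as `ID`, `AC`, `DE`, `PA` or `DO`, three spaces, then the value, with `//` ending each entry. The function names and the abbreviated sample entry are illustrative, and a production parser should handle every line type. The `DO` line holds the PDOC accession of the matching PROSITE.DOC entry, which is how a tool can attach the documentation to each match.

```python
from typing import Dict, Iterable, Iterator, List

# An entry maps each two-letter line code to the values of all lines with that code.
Entry = Dict[str, List[str]]


def parse_prosite_dat(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one entry per '//'-terminated record that has an ID line.

    ASSUMPTION: data lines look like 'XX   value' (code in columns 1-2,
    value from column 6), following the PROSITE user manual.
    """
    entry: Entry = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            if "ID" in entry:  # skips the file's header block, which has no ID line
                yield entry
            entry = {}
        elif len(line) >= 2:
            entry.setdefault(line[:2], []).append(line[5:].strip())
    if "ID" in entry:  # tolerate a missing final '//'
        yield entry


def summarize(entry: Entry) -> Dict[str, str]:
    """Pull out the fields a scanning tool needs, plus the PROSITE.DOC pointer."""
    name, _, kind = entry["ID"][0].partition(";")
    return {
        "name": name.strip(),
        "type": kind.strip().rstrip("."),             # e.g. PATTERN or MATRIX
        "accession": entry["AC"][0].split(";")[0].strip(),
        "pattern": "".join(entry.get("PA", [])),      # long patterns span several PA lines
        "doc": entry.get("DO", [""])[0].rstrip(";"),  # PDOC accession in PROSITE.DOC
    }


# Abbreviated, illustrative entry modeled on PS00001 (real entries have more line types).
SAMPLE = """CC   File header comments.
//
ID   ASN_GLYCOSYLATION; PATTERN.
AC   PS00001;
DE   N-glycosylation site.
PA   N-{P}-[ST]-{P}.
DO   PDOC00001;
//
"""

if __name__ == "__main__":
    for e in parse_prosite_dat(SAMPLE.splitlines()):
        print(summarize(e))
```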
The PROSITE database is now complemented by a series of rules that give more precise information about specific residues. Over a two-year period, the documentation and the ScanProsite web pages were redesigned to add functionality to PROSITE. At the time of writing, the latest version of PROSITE contained 1329 patterns and 552 profile entries.
The PROSITE website was also redesigned, and new predictive tools were implemented to assign more detailed functional information to scanned proteins. A new version of the ScanProsite web page lets users scan their own proteins against all PROSITE entries, or scan a PROSITE entry against a protein database.
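The two scanning modes can be imitated locally for patterns. The following Python sketch is not ScanProsite's implementation. It assumes the PROSITE pattern conventions (elements separated by `-`, `x` for any residue, `[...]` for allowed and `{...}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for the N- and C-terminus, and a closing period), translates them into Python regular expressions, and reports non-overlapping matches only. The function names and sample sequences are hypothetical; the N-glycosylation (`N-{P}-[ST]-{P}`) and RGD cell-attachment (`R-G-D`) patterns serve only as familiar examples.

```python
import re
from typing import Dict, List, Tuple

# One pattern element: a residue, x, [allowed] or {forbidden}, plus optional (n) or (n,m).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}.' into a Python regex.

    Simplified: '<' and '>' are handled only at the very start or end of the
    pattern, not inside brackets (an element like '[G>]' raises ValueError).
    """
    p = pattern.strip().rstrip(".")
    prefix, suffix = "", ""
    if p.startswith("<"):
        prefix, p = "^", p[1:]
    if p.endswith(">"):
        suffix, p = "$", p[:-1]
    parts = []
    for element in p.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        if hi is not None:
            core += "{" + lo + "," + hi + "}"
        elif lo is not None:
            core += "{" + lo + "}"
        parts.append(core)
    return prefix + "".join(parts) + suffix


def scan_protein(sequence: str, patterns: Dict[str, str]) -> List[Tuple[str, int, int]]:
    """One protein against many patterns; returns (entry, start, end), 1-based inclusive."""
    hits = []
    for entry, pattern in patterns.items():
        for m in re.finditer(prosite_to_regex(pattern), sequence):
            hits.append((entry, m.start() + 1, m.end()))
    return hits


def scan_database(pattern: str, proteins: Dict[str, str]) -> List[Tuple[str, int, int]]:
    """One pattern against many proteins; returns (protein id, start, end), 1-based inclusive."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(pid, m.start() + 1, m.end())
            for pid, seq in proteins.items()
            for m in regex.finditer(seq)]


if __name__ == "__main__":
    patterns = {"ASN_GLYCOSYLATION": "N-{P}-[ST]-{P}.", "CELL_ATTACHMENT": "R-G-D."}
    print(scan_protein("MNKSAGRGDL", patterns))
    print(scan_database("R-G-D.", {"P1": "MNKSAGRGDL", "P2": "MKKLLPA", "P3": "RGDRGD"}))
```

`scan_protein` corresponds to scanning one protein against many entries, and `scan_database` to scanning one entry against many proteins. Because `re.finditer` skips overlapping hits, a fuller tool would need extra handling for patterns that can match at overlapping positions.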