To predict protein function, assign family identity or detect remote homologues, searches against signature databases, also known as secondary databases, are essential. ScanProsite provides a web interface to identify protein matches against signatures from the PROSITE database. The PROSITE database consists of a large collection of biologically meaningful signatures that are described as patterns (regular expressions), used for short motif detection, or generalized profiles (weight matrices) for sensitive detection of larger domains. All signatures are built from manually derived alignments and are provided with extensive manually curated documentation (taxonomic occurrence, function, etc.). ScanProsite was implemented in Perl, and is served through an Apache web server running on a UNIX operating system. Care was taken to ensure that all generated pages are fully standards-compliant.
Program output can either be displayed in the web browser (interactive mode), or sent by email (batch mode). The rich view output mode uses standard DHTML (HTML, CSS, JavaScript), and no additional plugins are required.
At the beginning the user has to choose between three options:
Option 1: Submit PROTEIN sequences to scan them against the PROSITE collection of motifs.
Option 2: Submit MOTIFS to scan them against a PROTEIN sequence database.
Option 3: Submit PROTEIN sequences and MOTIFS to scan them against each other.
Quick Scan
The Quick Scan mode of ScanProsite, available from the PROSITE homepage, is a simplified version of 'Option 1 - Submit PROTEIN sequences to scan them against the PROSITE collection of motifs'.
Enter or paste up to 10 protein sequences in the text area.
Our input sequences will be scanned against all PROSITE motifs. Checking the checkbox below the text area excludes the motifs with a high probability of occurrence; unchecking it includes them.
Once the scan has been carried out, the results are displayed in the 'Graphical view' output format.
Main operations
Submit PROTEIN sequences
We can either enter or paste protein sequences in the text area or submit a protein database.
With 'Option 1' (scan against all PROSITE motifs), we can submit at most 10 sequences. With 'Option 3' (scan against specified motifs), we can enter at most 1,000 sequences if we submit a single motif and at most 50 if we submit a combination of motifs.
To scan against our own sequence database, we either enter a database code or submit a file in FASTA format (max. 16 MB). Once the file has been uploaded, we receive a code that can be used for repeated scans against the database we have just submitted; the database remains on the server for one month.
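The limits on pasted sequences described above can be checked before submission. The following is a minimal sketch, not part of ScanProsite itself: the function names max_pasted_sequences and count_fasta_records are hypothetical, and the sketch assumes that each FASTA record starts with a '>' header line.

```python
def max_pasted_sequences(option: int, n_motifs: int = 0) -> int:
    """Return the sequence limit stated in the text for a given option.

    Hypothetical helper: 'option' is 1 or 3, 'n_motifs' is the number of
    motifs submitted alongside the sequences (only relevant for Option 3).
    """
    if option == 1:
        return 10
    if option == 3:
        if n_motifs < 1:
            raise ValueError("Option 3 needs at least one motif")
        # One motif allows 1,000 sequences; a combination allows 50.
        return 1000 if n_motifs == 1 else 50
    raise ValueError("only Options 1 and 3 accept pasted sequences")


def count_fasta_records(text: str) -> int:
    """Count FASTA records, assuming each one begins with a '>' line."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def check_submission(text: str, option: int, n_motifs: int = 0) -> bool:
    """True if the pasted FASTA text stays within the stated limit."""
    return count_fasta_records(text) <= max_pasted_sequences(option, n_motifs)
```

For example, check_submission(fasta_text, option=3, n_motifs=2) would return False for a text holding more than 50 records.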
Submit MOTIFS (Enter a MOTIF or a combination of MOTIFS)
Enter a motif or a combination of motifs in the text area. The supported input is:
A PROSITE accession e.g. PS50240 or identifier e.g. TRYPSIN_DOM
Our own pattern e.g. P-x(2)-G-E-S-G(2)-[AS]
A combination of PROSITE accessions/identifiers e.g. PS50240 and PS50068, or PS50240 and not ( PS00134 or PS00135 )
A combination of PROSITE accessions/identifiers and our own pattern e.g. PS50240 and P-x(2)-G-E-S-G(2)-[AS]
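Because PROSITE patterns are regular expressions, a user-defined pattern such as the one above can be tried locally before it is submitted. The sketch below is an illustration only, not ScanProsite's implementation: the function prosite_to_regex is hypothetical, and it covers just the common syntax elements (x for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) for repetition, and < or > as sequence-end anchors at the edges of an element).

```python
import re

# One pattern element: a residue specification plus an optional repeat count.
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")  # a trailing period is optional
    pieces = []
    for element in pattern.split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        core = element.lstrip("<").rstrip(">")
        match = ELEMENT.fullmatch(core)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            regex = "."                      # any residue
        elif residue.startswith("{"):
            regex = "[^" + residue[1:-1] + "]"  # excluded residues
        else:
            regex = residue                  # single letter or [..] class
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        pieces.append(prefix + regex + suffix)
    return "".join(pieces)


regex = prosite_to_regex("P-x(2)-G-E-S-G(2)-[AS]")
hit = re.search(regex, "MKPLLGESGGAQ")  # made-up test sequence
print(regex, hit.group(0) if hit else None)
```

The test sequence is invented for illustration. Matches found this way may differ from those reported by ScanProsite, whose scanning options (such as the handling of X characters and the match mode) are not modelled here.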
We can then modify a few default scanning parameters (scanning options):
Minimal number of hits per matched sequence (only in 'Option 2')
Run the scan at high sensitivity (show weak matches for profiles)
Number of X characters in a scanned sequence that can be matched by a conserved position in a pattern
Match mode