The ability to predict a protein's local structural features from its primary sequence is of paramount importance for unravelling its function when no solved structures of the protein or its homologs are available. NetSurfP predicts the most important local structural features with unprecedented accuracy and run-time. It is sequence-based and uses an architecture composed of convolutional and long short-term memory neural networks trained on solved protein structures. Using a single integrated model, NetSurfP predicts solvent accessibility, secondary structure, structural disorder, interface residues and backbone dihedral angles for each residue of the input sequences. In addition to improved prediction accuracy, the processing time has been optimized to allow predicting more than 1,000 proteins in less than 2 hours and complete proteomes in less than 1 day.
For each amino acid in an amino acid sequence, it predicts the surface accessibility, the secondary structure and the phi/psi dihedral angles. The method simultaneously predicts the reliability of each prediction, in the form of a Z-score. The Z-score relates to the surface prediction, not to the secondary structure.
Input
All the input sequences must be in one-letter amino acid code.
The sequences can be input in the following two ways:
Paste a single sequence (just the amino acids) or a number of sequences in FASTA format into the upper window of the main server page.
Select a FASTA file on your local disk, either by typing the file name into the lower window or by browsing the disk.
Pipes
All pipes '|' in the name of a FASTA entry will be replaced with an underscore '_'.
Lowercase letters
All lowercase letters in a sequence will be changed to uppercase letters.
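The two rewrites described under "Pipes" and "Lowercase letters" can be reproduced locally before submission, so that entry names in the results are easy to match back to the input. The following is only a minimal sketch of the documented behaviour, not the server's own code; the function name normalize_fasta and the example entry name are hypothetical.

```python
def normalize_fasta(text: str) -> str:
    """Apply the two documented rewrites to FASTA-formatted text.

    Hypothetical helper: mimics the described behaviour, not the server code.
    """
    out = []
    for line in text.splitlines():
        if line.startswith(">"):
            # Entry name line: every pipe becomes an underscore.
            out.append(line.replace("|", "_"))
        else:
            # Sequence line: lowercase letters become uppercase.
            out.append(line.upper())
    return "\n".join(out)


# Made-up entry name and sequence, for illustration only.
example = ">sp|P00000|HYPOTHETICAL\nmkvlaagG\n"
print(normalize_fasta(example))
```

With the made-up entry above, the name line becomes ">sp_P00000_HYPOTHETICAL" and the sequence line becomes "MKVLAAGG".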
Submit the job
Click on the "Submit" button. The status of your job (either 'queued' or 'running') will be displayed and constantly updated until it terminates and the server output appears in the browser window.
At any time during the wait you may enter your e-mail address and simply leave the window. Your job will continue, and you will be notified by e-mail when it has completed. The e-mail message will contain the URL under which the results are stored; they will remain on the server for 24 hours for you to collect them.
The method consists of two neural network ensembles. The primary networks are trained on sequence profiles and predicted secondary structure and have two outputs, one for buried and one for exposed. The higher output defines the predicted category. The secondary networks take these outputs as input, together with sequence profiles, and have been trained to predict the relative surface exposure of the individual amino acid residues. The proposed reliability prediction method is applied to the secondary networks only.
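The data flow between the two stages can be sketched as follows. This is an illustration under stated assumptions, not the trained networks: the function names are hypothetical, a tie between the two outputs is resolved as "buried" by assumption, and the secondary input is shown for a single residue, ignoring any sequence windowing the real networks may use.

```python
from typing import List, Tuple


def classify_exposure(buried: float, exposed: float) -> str:
    """Pick the category whose primary-network output is higher.

    Assumption: ties are resolved as "buried"; the text does not say.
    """
    return "exposed" if exposed > buried else "buried"


def secondary_input(profile_row: List[float],
                    primary_out: Tuple[float, float]) -> List[float]:
    """Build a per-residue input vector for the secondary networks.

    Assumption: the profile features and the two primary outputs are
    simply concatenated for one residue.
    """
    return list(profile_row) + [primary_out[0], primary_out[1]]


# Made-up numbers, for illustration only.
primary = (0.2, 0.8)                          # (buried, exposed) outputs
print(classify_exposure(*primary))            # category from the higher output
print(secondary_input([0.1, 0.9], primary))   # features passed to the second stage
```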