The recognition of errors in experimental and theoretical models of protein structures is a major problem in structural biology. ProSA-web provides an easy-to-use interface to the program ProSA, which is frequently employed in protein structure validation. ProSA calculates an overall quality score for a specific input structure. If this score falls outside the range characteristic of native proteins, the structure probably contains errors. A plot of local quality scores points to problematic parts of the model, which are also highlighted in a 3D molecule viewer to facilitate their detection.
A particular intention of the ProSA-web application is to encourage structure depositors to validate their structures before submitting them to the PDB, and to use the tool in the early stages of structure determination and refinement. The service requires only Cα atoms, so low-resolution structures and approximate models obtained early in the structure determination process can be evaluated and compared against high-resolution structures. The ProSA-web service returns results quickly: the response time is on the order of seconds, even for large molecules.
ProSA is a diagnostic tool that is based on the statistical analysis of all available protein structures. The potentials of mean force compiled from the database provide a statistical average over the known structures. Structures of soluble globular proteins whose z-scores deviate strongly from the database average are unusual, and such structures frequently turn out to be erroneous. For proteins containing membrane-spanning regions, the significance of deviations from the database average is less clear.
Input
ProSA-web requires the atomic coordinates of the model to be evaluated. Users can supply coordinates either by uploading a file in PDB format or by entering the four-character code of a protein structure available from the PDB. A chain identifier and an NMR model number may be used to specify a particular model. If the entered chain identifier or model number is invalid, a list of the valid values for these parameters is presented to the user. If no chain identifier or model number is supplied, the first chain of the first model found in the PDB file is used for analysis.
ProSA-web uses only the Cα atoms of the input structure; hence it can also be applied to low-resolution structures and approximate models obtained early in the structure determination process.
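The selection rules described above can be illustrated with a short Python sketch. It is not ProSA-web's own code: the function name select_ca_atoms and its parameters are hypothetical, and the parser assumes standard fixed-column PDB ATOM and MODEL records. It keeps only Cα atoms and, when no chain identifier or model number is given, falls back to the first chain of the first model found.

```python
def select_ca_atoms(pdb_text, chain_id=None, model_number=None):
    """Return (residue_number, x, y, z) tuples for the C-alpha atoms of one chain.

    Hypothetical helper, not part of ProSA-web. If chain_id or model_number
    is None, the first chain or model found in the file is used.
    Alternate locations are not filtered in this sketch.
    """
    atoms = []
    current_model = 1  # files without MODEL records are treated as one model
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current_model = int(line[10:14])  # model serial, columns 11-14
            continue
        # Keep only C-alpha atoms from ATOM records (atom name, columns 13-16).
        if record != "ATOM" or line[12:16].strip() != "CA":
            continue
        if model_number is None:
            model_number = current_model  # default: first model found
        if current_model != model_number:
            continue
        if chain_id is None:
            chain_id = line[21]  # default: first chain found (column 22)
        if line[21] != chain_id:
            continue
        atoms.append((
            int(line[22:26]),    # residue sequence number, columns 23-26
            float(line[30:38]),  # x coordinate, columns 31-38
            float(line[38:46]),  # y coordinate, columns 39-46
            float(line[46:54]),  # z coordinate, columns 47-54
        ))
    return atoms
```

Calling select_ca_atoms(text) with no further arguments mirrors the default behaviour described above; passing chain_id or model_number narrows the selection to a specific chain or NMR model.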
Output
Z-score
The z-score indicates overall model quality. Its value is displayed in a plot that contains the z-scores of all experimentally determined protein chains in the current PDB. In the plot, groups of structures from different sources (X-ray, NMR) are distinguished by different colors. The plot can be used to check whether the z-score of the input structure is within the range of scores typically found for native proteins of similar size.
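The visual check against the plot can also be expressed numerically. The following Python sketch is only an illustration under stated assumptions: the function name, the reference list of (chain length, z-score) pairs, and the 10% length tolerance used to define "similar size" are all hypothetical, and a simple minimum-to-maximum range stands in for "scores typically found".

```python
def z_score_in_native_range(z_score, length, reference, tolerance=0.1):
    """Check a z-score against reference chains of similar length.

    Hypothetical helper: `reference` is an iterable of (chain_length, z_score)
    pairs for native chains. "Similar size" is assumed to mean a length within
    +/- `tolerance` (as a fraction) of `length`.
    Returns True or False, or None if no reference chain has a similar length.
    """
    low = length * (1 - tolerance)
    high = length * (1 + tolerance)
    similar = [z for n, z in reference if low <= n <= high]
    if not similar:
        return None  # nothing to compare against
    # Crude range test: inside the min..max interval of similar-sized chains.
    return min(similar) <= z_score <= max(similar)
```

A percentile-based interval would be less sensitive to outliers than the minimum and maximum; the simple range is used here only to keep the sketch short.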
Plot of residue scores
The plot shows local model quality by plotting energies as a function of amino acid sequence position i. In general, positive values correspond to problematic or erroneous parts of the input structure. A plot of single residue energies usually contains large fluctuations and is of limited value for model evaluation. Hence the plot is smoothed by calculating the average energy over each 40-residue fragment s(i,i+39), which is then assigned to the 'central' residue of the fragment at position i+19. A second line with a smaller window size of 10 residues is usually shown in the background of the plot.
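The smoothing step can be sketched in a few lines of Python. This is a minimal illustration rather than ProSA's implementation: the function name smooth_energies is hypothetical, positions are 0-based list indices, and the rule for the central residue of window sizes other than 40 (offset window // 2 - 1) is an assumption generalized from the i+19 case.

```python
def smooth_energies(energies, window=40):
    """Average per-residue energies over every full window of `window` residues.

    Returns (position, mean_energy) pairs. `position` is the 0-based index of
    the 'central' residue, i + window // 2 - 1, which equals i + 19 for the
    40-residue window. The offset for other window sizes is an assumption.
    """
    n = len(energies)
    if window < 1 or window > n:
        return []  # no complete window fits into the sequence
    smoothed = []
    for i in range(n - window + 1):
        fragment = energies[i:i + window]  # residues i .. i + window - 1
        smoothed.append((i + window // 2 - 1, sum(fragment) / window))
    return smoothed
```

Calling smooth_energies(energies, window=10) gives the second, less strongly smoothed curve. Because only complete windows are averaged, residues near both chain ends receive no smoothed value in this sketch.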
Interactive molecule viewer
ProSA-web visualizes the 3D structure of the input protein using the molecule viewer Jmol. Residues are colored from blue to red in order of increasing residue energy.
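One simple way to realize such a coloring is a linear interpolation between blue and red. The Python sketch below is an assumption for illustration only: the function name energy_to_rgb is hypothetical, and the actual color scale used by ProSA-web and Jmol may differ.

```python
def energy_to_rgb(energy, e_min, e_max):
    """Map a residue energy to an (R, G, B) tuple between blue and red.

    Hypothetical helper: the lowest energy maps to pure blue, the highest to
    pure red, with linear interpolation in between. Values outside
    [e_min, e_max] are clamped to the nearest end of the scale.
    """
    if e_max == e_min:
        t = 0.5  # degenerate range: use the midpoint color
    else:
        t = (energy - e_min) / (e_max - e_min)
    t = min(1.0, max(0.0, t))  # clamp to [0, 1]
    return (round(255 * t), 0, round(255 * (1 - t)))
```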