Source:Journal of Molecular Biology, 277:419–448, 1998
The feasibility of predicting the global fold of small proteins by incorporating predicted secondary and tertiary restraints into ab initio folding simulations has been demonstrated on a test set comprised of 20 non-homologous proteins, of which one was a blind prediction of target 42 in the recent CASP2 contest. These proteins contain from 37 to 100 residues and represent all secondary structural classes and a representative variety of global topologies. Secondary structure restraints are provided by the PHD secondary structure prediction algorithm that incorporates multiple sequence information. Predicted tertiary restraints are derived from multiple sequence alignments via a two-step process. First, seed side-chain contacts are identified from correlated mutation analysis, and then a threading-based algorithm is used to expand the number of these seed contacts. A lattice-based reduced protein model and a folding algorithm designed to incorporate these predicted restraints is described. Depending upon fold complexity, it is possible to assemble native-like topologies whose coordinate root-mean-square deviation from native is between 3.0 A and 6.5 A. The requisite level of accuracy in side-chain contact map prediction can be roughly 25% on average, provided that about 60% of the contact predictions are correct within +/-1 residue and 95% of the predictions are correct within +/-4 residues. Precision in tertiary contact prediction is more critical than absolute accuracy. Furthermore, only a subset of the tertiary contacts, on the order of 25% of the total, is sufficient for successful topology assembly. Overall, this study suggests that the use of restraints derived from multiple sequence alignments combined with a fold assembly algorithm holds considerable promise for the prediction of the global topology of small proteins.