%0 Book Section %B Encyclopedia of Molecular Biology %D 1999 %T Contact map %A Andrzej Koliński %A Adam Godzik %A Jeffrey Skolnick %B Encyclopedia of Molecular Biology %I John Wiley & Sons %C New York %P 567–571 %G eng %0 Journal Article %J Protein Science %D 1997 %T Derivation and testing of pair potentials for protein folding. When is the quasichemical approximation correct? %A Jeffrey Skolnick %A Lukasz Jaroszewski %A Andrzej Koliński %A Adam Godzik %K empirical parameter sets %K inverse protein folding %K protein structural database %K protein threading %K quasichemical approximation %X Many existing derivations of knowledge-based statistical pair potentials invoke the quasichemical approximation to estimate the expected side-chain contact frequency if there were no amino acid pair-specific interactions. At first glance, the quasichemical approximation that treats the residues in a protein as being disconnected and expresses the side-chain contact probability as being proportional to the product of the mole fractions of the pair of residues would appear to be rather severe. To investigate the validity of this approximation, we introduce two new reference states in which no specific pair interactions between amino acids are allowed, but in which the connectivity of the protein chain is retained. The first estimates the expected number of side-chain contacts by treating the protein as a Gaussian random coil polymer. The second, more realistic reference state includes the effects of chain connectivity, secondary structure, and chain compactness by estimating the expected side-chain contact probability by placing the sequence of interest in each member of a library of structures of comparable compactness to the native conformation. The side-chain contact maps are not allowed to readjust to the sequence of interest, i.e., the side chains cannot repack. This situation would hold rigorously if all amino acids were the same size. 
Both reference states effectively permit the factorization of the side-chain contact probability into sequence-dependent and structure-dependent terms. Then, because the sequence distribution of amino acids in proteins is random, the quasichemical approximation to each of these reference states is shown to be excellent. Thus, the range of validity of the quasichemical approximation is determined by the magnitude of the side-chain repacking term, which is, at present, unknown. Finally, the performance of these two sets of pair interaction potentials as well as side-chain contact fraction-based interaction scales is assessed by inverse folding tests both without and with allowing for gaps. %B Protein Science %V 6 %P 676–688 %8 mar %G eng %U http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2143667&tool=pmcentrez&rendertype=abstract %R 10.1002/pro.5560060317 %0 Journal Article %J Proteins %D 1997 %T A method for the prediction of surface "U"-turns and transglobular connections in small proteins %A Andrzej Koliński %A Jeffrey Skolnick %A Adam Godzik %A Wei-Ping Hu %K Algorithms %K Amino Acid Sequence %K Animals %K Humans %K Molecular Sequence Data %K Protein Folding %K Protein Structure %K Proteins %K Proteins: chemistry %K Secondary %X A simple method for predicting the location of surface loops/turns that change the overall direction of the chain, that is, "U" turns, and assigning the dominant secondary structure of the intervening transglobular blocks in small, single-domain globular proteins has been developed. Since the emphasis of the method is on the prediction of the major topological elements that comprise the global structure of the protein rather than on a detailed local secondary structure description, this approach is complementary to standard secondary structure prediction schemes. 
Consequently, it may be useful in the early stages of tertiary structure prediction when establishment of the structural class and possible folding topologies is of interest. Application to a set of small proteins of known structure indicates a high level of accuracy. The prediction of the approximate location of the surface turns/loops that are responsible for the change in overall chain direction is correct in more than 95% of the cases. The accuracy for the dominant secondary structure assignment for the linear blocks between such surface turns/loops is in the range of 82%. %B Proteins %V 27 %P 290–308 %8 feb %G eng %U http://www.ncbi.nlm.nih.gov/pubmed/9061792 %0 Conference Proceedings %B Proceeding of I-st Pacific Symposium on Biocomputing %D 1996 %T An algorithm for prediction of structural elements in small proteins %A Andrzej Koliński %A Jeffrey Skolnick %A Adam Godzik %X A method for predicting the location of surface loops/turns and assigning the intervening secondary structure of the transglobular linkers in small, single domain globular proteins has been developed. Application to a set of 10 proteins of known structure indicates a high level of accuracy. The secondary structure assignment in the center of transglobular connections is correct in more than 85% of the cases. A similar error rate is found for loops. Since more global information about the fold is provided, it is complementary to standard secondary structure prediction approaches. Consequently, it may be useful in early stages of tertiary structure prediction when establishment of the structural class and possible folding topologies is of interest. %B Proceeding of I-st Pacific Symposium on Biocomputing %P 446–460 %G eng %U http://helix-web.stanford.edu/psb96/kolinski.pdf %0 Journal Article %J Protein Science: a Publication of the Protein Society %D 1995 %T Are proteins ideal mixtures of amino acids? 
Analysis of energy parameter sets %A Adam Godzik %A Andrzej Koliński %A Jeffrey Skolnick %K Amino Acid Sequence %K Amino Acids %K Crystallography %K Databases %K Factual %K Magnetic Resonance Spectroscopy %K Mathematics %K Models %K Protein Conformation %K Protein Folding %K Proteins %K Proteins: chemistry %K Theoretical %K Thermodynamics %K X-Ray %X Various existing derivations of the effective potentials of mean force for the two-body interactions between amino acid side chains in proteins are reviewed and compared to each other. The differences between different parameter sets can be traced to the reference state used to define the zero of energy. Depending on the reference state, the transfer free energy or other pseudo-one-body contributions can be present to various extents in two-body parameter sets. It is, however, possible to compare various derivations directly by concentrating on the "excess" energy, a term that describes the difference between a real protein and an ideal solution of amino acids. Furthermore, the number of protein structures available for analysis allows one to check the consistency of the derivation and the errors by comparing parameters derived from various subsets of the whole database. It is shown that pair interaction preferences are very consistent throughout the database. Independently derived parameter sets have correlation coefficients on the order of 0.8, with the mean difference between equivalent entries of 0.1 kT. Also, the low-quality (low resolution, little or no refinement) structures show similar regularities. There are, however, large differences between interaction parameters derived on the basis of crystallographic structures and structures obtained by the NMR refinement. The origin of the latter difference is not yet understood. 
%B Protein Science: a Publication of the Protein Society %V 4 %P 2107–2117 %8 oct %G eng %U http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2142984&tool=pmcentrez&rendertype=abstract %R 10.1002/pro.5560041016 %0 Journal Article %J Journal of Computer-Aided Molecular Design %D 1993 %T De novo and inverse folding predictions of protein structure and dynamics %A Adam Godzik %A Andrzej Koliński %A Jeffrey Skolnick %K Inverse folding %K lattice protein models %K Molten globule intermediates %K Protein folding pathways %K tertiary structure prediction %X In the last two years, the use of simplified models has facilitated major progress in the globular protein folding problem, viz., the prediction of the three-dimensional (3D) structure of a globular protein from its amino acid sequence. A number of groups have addressed the inverse folding problem where one examines the compatibility of a given sequence with a given (and already determined) structure. A comparison of extant inverse protein-folding algorithms is presented, and methodologies for identifying sequences likely to adopt identical folding topologies, even when they lack sequence homology, are described. Extension to produce structural templates or fingerprints from idealized structures is discussed, and for eight-membered beta-barrel proteins, it is shown that idealized fingerprints constructed from simple topology diagrams can correctly identify sequences having the appropriate topology. Furthermore, this inverse folding algorithm is generalized to predict elements of supersecondary structure including beta-hairpins, helical hairpins and alpha/beta/alpha fragments. Then, we describe a very high coordination number lattice model that can predict the 3D structure of a number of globular proteins de novo; i.e. using just the amino acid sequence. Applications to sequences designed by DeGrado and co-workers [Biophys. 
J., 61 (1992) A265] predict folding intermediates, native states and relative stabilities in accord with experiment. The methodology has also been applied to the four-helix bundle designed by Richardson and co-workers [Science, 249 (1990) 884] and a redesigned monomeric version of a naturally occurring four-helix dimer, rop. Based on comparison to the rop dimer, the simulations predict conformations with rms values of 3-4 Å from native. Furthermore, the de novo algorithms can assess the stability of the folds predicted from the inverse algorithm, while the inverse folding algorithms can assess the quality of the de novo models. Thus, the synergism of the de novo and inverse folding algorithm approaches provides a set of complementary tools that will facilitate further progress on the protein-folding problem. %B Journal of Computer-Aided Molecular Design %V 7 %P 397–438 %G eng %U http://www.springerlink.com/index/QM35800826224081.pdf %0 Journal Article %J Proceedings of the National Academy of Sciences of the United States of America %D 1993 %T From independent modules to molten globules: observations on the nature of protein folding intermediates %A Jeffrey Skolnick %A Andrzej Koliński %A Adam Godzik %K Binding Sites %K Isomerases %K Isomerases: chemistry %K Protein Disulfide-Isomerases %K Protein Folding %K Protein Structure %K Proteins %K Proteins: chemistry %K Secondary %B Proceedings of the National Academy of Sciences of the United States of America %V 90 %P 2099–100 %G eng %U http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=46030&tool=pmcentrez&rendertype=abstract %0 Journal Article %J The Journal of Chemical Physics %D 1993 %T A general method for the prediction of the three dimensional structure and folding pathway of globular proteins: Application to designed helical proteins %A Andrzej Koliński %A Adam Godzik %A Jeffrey Skolnick %K Amino Acid Sequence %K GLOBULAR proteins %X Starting from amino acid sequence alone, a general approach for 
simulating folding into the molten globule or rigid, native state depending on sequence is described. In particular, the 3D folds of two simple designed proteins have been predicted using a Monte Carlo folding algorithm. The model employs a very flexible hybrid lattice representation of the protein conformation, and fast lattice dynamics. A full rotamer library for side group conformations, and potentials of mean force of short and long range interactions have been extracted from the statistics of a high resolution set of nonhomologous, 3D structures of globular proteins. The simulated folding process starts from an arbitrary random conformation and relatively rapidly assembles a well defined four helix bundle. The very cooperative folding of the model systems is facilitated by the proper definition of the model protein hydrogen bond network, and multibody interactions of the side groups. The two sequences studied exhibit very different behavior. The first one, in excellent agreement with experiment, folds to a thermodynamically very stable four helix bundle that has all the properties postulated for the molten globule state. The second protein, having a more heterogeneous sequence, at lower temperature undergoes a transition from the molten globule state to the unique native state exhibiting a fixed pattern of side group packing. This marks the first time that the ability to predict a molten globule or a unique native state from sequence alone has been achieved. The implications for the general solution of the protein folding problem are briefly discussed. %B The Journal of Chemical Physics %V 98 %P 7420 %G eng %U http://smartech.gatech.edu/handle/1853/26987 http://link.aip.org/link/JCPSA6/v98/i9/p7420/s1&Agg=doi %R 10.1063/1.464706 %0 Journal Article %J Journal of Computational Chemistry %D 1993 %T Lattice representations of globular proteins: How good are they? 
%A Adam Godzik %A Andrzej Koliński %A Jeffrey Skolnick %X Using a number of different lattice models of proteins, the problems introduced by the discretization of a protein backbone are discussed and examples of the most typical errors arising in low coordination number lattices presented. The geometric properties of different lattices used in the literature are compiled, and for all of them the resulting α-carbon models of proteins are described in detail and compared to the original structures obtained from experiment. © John Wiley & Sons, Inc. %B Journal of Computational Chemistry %V 14 %P 1194–1202 %G eng %U http://onlinelibrary.wiley.com/doi/10.1002/jcc.540141009/abstract %0 Journal Article %J Current Biology %D 1993 %T A method for prediction of protein structure from sequence %A Jeffrey Skolnick %A Andrzej Koliński %A Charles L. Brooks III %A Adam Godzik %A Antonio Rey %X BACKGROUND: The ability to predict the native conformation of a globular protein from its amino-acid sequence is an important unsolved problem of molecular biology. We have previously reported a method in which reduced representations of proteins are folded on a lattice by Monte Carlo simulation, using statistically-derived potentials. When applied to sequences designed to fold into four-helix bundles, this method generated predicted conformations closely resembling the real ones. RESULTS: We now report a hierarchical approach to protein-structure prediction, in which two cycles of the above-mentioned lattice method (the second on a finer lattice) are followed by a full-atom molecular dynamics simulation. The end product of the simulations is thus a full-atom representation of the predicted structure. The application of this procedure to the 60 residue, B domain of staphylococcal protein A predicts a three-helix bundle with a backbone root mean square (rms) deviation of 2.25-3 Å from the experimentally determined structure. 
Further application to a designed, 120 residue monomeric protein, mROP, based on the dimeric ROP protein of Escherichia coli, predicts a left turning, four-helix bundle native state. Although the ultimate assessment of the quality of this prediction awaits the experimental determination of the mROP structure, a comparison of this structure with the set of equivalent residues in the ROP dimer crystal structure indicates that they have a rms deviation of approximately 3.6-4.2 Å. CONCLUSION: Thus, for a set of helical proteins that have simple native topologies, the native folds of the proteins can be predicted with reasonable accuracy from their sequences alone. Our approach suggests a direction for future work addressing the protein-folding problem. %B Current Biology %V 3 %P 414–423 %G eng %U http://dx.doi.org/10.1016/0960-9822(93)90348-R %R 10.1016/0960-9822(93)90348-R %0 Journal Article %J Protein Engineering %D 1993 %T Regularities in interaction patterns of globular proteins %A Adam Godzik %A Jeffrey Skolnick %A Andrzej Koliński %X The description of protein structure in the language of side chain contact maps is shown to offer many advantages over more traditional approaches. Because it focuses on side chain interactions, it aids in the discovery, study and classification of similarities between interactions defining particular protein folds and offers new insights into the rules of protein structure. For example, there is a small number of characteristic patterns of interactions between protein supersecondary structural fragments, which can be seen in various non-related proteins. Furthermore, the overlap of the side chain contact maps of two proteins provides a new measure of protein structure similarity. As shown in several examples, alignments based on contact map overlaps are a powerful alternative to other structure-based alignments. 
%B Protein Engineering %V 6 %P 801–810 %G eng %U http://peds.oxfordjournals.org/content/6/8/801.short %R 10.1093/protein/6.8.801 %0 Journal Article %J Proceedings of the National Academy of Sciences of the United States of America %D 1992 %T Simulations of the Folding Pathway of TIM-type α/β Barrel Proteins %A Adam Godzik %A Jeffrey Skolnick %A Andrzej Koliński %X Simulations of the folding pathways of two large alpha/beta proteins, the alpha subunit of tryptophan synthase and triose phosphate isomerase, are reported using the knight's walk lattice model of globular proteins and Monte Carlo dynamics. Starting from randomly generated unfolded states and with no assumptions regarding the nature of the folding intermediates, for the tryptophan synthase subunit these simulations predict, in agreement with experiment, the existence and location of a stable equilibrium intermediate comprised of six beta strands on the amino terminus of the molecule. For the case of triose phosphate isomerase, the simulations predict that both amino- and carboxyl-terminal intermediates should be observed. In a significant modification of previous lattice models, this model includes a full heavy atom side chain description and is capable of representing native conformations at the level of 2.5- to 3-Å rms deviation for the C alpha positions, as compared to the crystal structure. With a well-balanced compromise between accuracy of the protein description and the computer requirements necessary to perform simulations spanning biologically significant amounts of time, the lattice model described here brings the possibility of studying important biological processes to present-day computers. 
%B Proceedings of the National Academy of Sciences of the United States of America %V 89 %P 2629–2633 %G eng %0 Journal Article %J Journal of Molecular Biology %D 1992 %T A Topology Fingerprint Approach to the Inverse Protein Folding Problem %A Adam Godzik %A Jeffrey Skolnick %A Andrzej Koliński %K globin-phycocyanin similarity %K plastocyanin-azurin-immunoglobulin similarity %K protein stability %K protein structure prediction %K TIM barrel similarity %X We describe the most general solution to date of the problem of matching globular protein sequences to the appropriate three-dimensional structures. The screening template, against which sequences are tested, is provided by a protein "structural fingerprint" library based on the contact map and the buried/exposed pattern of residues. Then, a lattice Monte Carlo algorithm validates or dismisses the stability of the proposed fold. Examples of known structural similarities between proteins having weakly or unrelated sequences such as the globins and phycocyanins, the eight-member alpha/beta fold of triose phosphate isomerase and even a close structural equivalence between azurin and immunoglobulins are found. %B Journal of Molecular Biology %V 227 %P 227–238 %G eng %U http://dx.doi.org/10.1016/0022-2836(92)90693-E %R 10.1016/0022-2836(92)90693-E