%0 Book Section
%B Methods in Molecular Biology
%D 2017
%T The GOR Method of Protein Secondary Structure Prediction, and its Application as Protein Aggregation Prediction Tool
%A Maksim Kouza
%A Eshel Faraggi
%A Andrzej Koliński
%A Andrzej Kloczkowski
%X
The GOR method of protein secondary structure prediction is described. The original method was published by Garnier, Osguthorpe and Robson in 1978, and was one of the first successful methods to predict protein secondary structure from amino acid sequence. The method is based on information theory and on the assumption that the information function of a protein chain can be approximated by a sum of information from single residues and pairs of residues. The analysis of frequencies of occurrence of secondary structure for singlets and doublets of residues in a protein database enables prediction of secondary structure for new amino acid sequences. Because of these simple physical assumptions, the GOR method has a conceptual advantage over later methods such as PHD, PSIPRED and others, which are based on machine learning (such as neural networks) and give slightly better predictions but have a “black box” nature. The GOR method has been continuously improved and modified for 30 years, with the last version, GOR V, published in 2002 and the GOR V server developed in 2005. We discuss here the original GOR method, the GOR V program, and the web server. Additionally, we discuss new, highly interesting and important applications of the GOR method to chameleon sequences in protein folding simulations and to the prediction of protein aggregation propensities. Our preliminary studies show that the GOR method is a promising and efficient alternative to other protein aggregation prediction tools. This shows that the GOR method, despite being almost 40 years old, is still important and has significant potential in application to new scientific problems.
%B Methods in Molecular Biology
%V 1484
%P 7-24
%@ 978-1-4939-6404-8
%G eng
%R 10.1007/978-1-4939-6406-2_2

%0 Journal Article
%J Proceedings of the National Academy of Sciences of the United States of America
%D 2012
%T Genomics-aided structure prediction.
%A Joanna I. Sulkowska
%A Morcos, Faruck
%A Weigt, Martin
%A Hwa, Terence
%A Onuchic, José N
%K Amino Acid Sequence
%K Genomics
%K Molecular Dynamics Simulation
%K Molecular Sequence Data
%K Proteins
%K Sequence Homology, Amino Acid
%X We introduce a theoretical framework that exploits the ever-increasing genomic sequence information for protein structure prediction. Structure-based models are modified to incorporate constraints by a large number of non-local contacts estimated from direct coupling analysis (DCA) of co-evolving genomic sequences. A simple hybrid method, called DCA-fold, integrating DCA contacts with an accurate knowledge of local information (e.g., the local secondary structure) is sufficient to fold proteins in the range of 1-3 Å resolution.
%B Proceedings of the National Academy of Sciences of the United States of America
%V 109
%P 10340-5
%8 2012 Jun 26
%G eng
%N 26
%R 10.1073/pnas.1207864109

%0 Journal Article
%J Proteins
%D 2005
%T Generalized protein structure prediction based on combination of fold-recognition with de novo folding and evaluation of models
%A Andrzej Koliński
%A Janusz M. Bujnicki
%K Algorithms
%K Computational Biology
%K Computational Biology: methods
%K Computer Simulation
%K Computers
%K Data Interpretation
%K Databases
%K Dimerization
%K Models
%K Molecular
%K Monte Carlo Method
%K Protein
%K Protein Conformation
%K Protein Folding
%K Protein Structure
%K Proteomics
%K Proteomics: methods
%K Reproducibility of Results
%K Secondary
%K Sequence Alignment
%K Software
%K Statistical
%K Tertiary
%X To predict the tertiary structure of full-length sequences of all targets in CASP6, regardless of their potential category (from easy comparative modeling to fold recognition to apparent new folds) we used a novel combination of two very different approaches developed independently in our laboratories, which ranked quite well in different categories in CASP5. First, the GeneSilico metaserver was used to identify domains, predict secondary structure, and generate fold recognition (FR) alignments, which were converted to full-atom models using the "FRankenstein's Monster" approach for comparative modeling (CM) by recombination of protein fragments. Additional models generated "de novo" by fully automated servers were obtained from the CASP website. All these models were evaluated by VERIFY3D, and residues with scores better than 0.2 were used as a source of spatial restraints. Second, a new implementation of the lattice-based protein modeling tool CABS was used to carry out folding guided by the above-mentioned restraints with the Replica Exchange Monte Carlo sampling technique. Decoys generated in the course of simulation were subject to the average linkage hierarchical clustering. For a representative decoy from each cluster, a full-atom model was rebuilt. Finally, five models were selected for submission based on combination of various criteria, including the size, density, and average energy of the corresponding cluster, and the visual evaluation of the full-atom structures and their relationship to the original templates. The combination of FRankenstein and CABS was one of the best-performing algorithms over all categories in CASP6 (it is important to note that our human intervention was very limited, and all steps in our method can be easily automated). We were able to generate a number of very good models, especially in the Comparative Modeling and New Folds categories. Frequently, the best models were closer to the native structure than any of the templates used. The main problem we encountered was in the ranking of the final models (the only step of significant human intervention), due to the insufficient computational power, which precluded the possibility of full-atom refinement and energy-based evaluation.
%B Proteins
%V 61 Suppl. 7
%P 84–90
%8 jan
%G eng
%U http://www.ncbi.nlm.nih.gov/pubmed/16187348
%R 10.1002/prot.20723

%0 Journal Article
%J Proteins
%D 2001
%T Generalized comparative modeling (GENECOMP): a combination of sequence comparison, threading, and lattice modeling for protein structure prediction and refinement
%A Andrzej Koliński
%A Marcos Betancourt
%A Daisuke Kihara
%A Piotr Rotkiewicz
%A Jeffrey Skolnick
%K Algorithms
%K Chemical
%K Combinatorial Chemistry Techniques
%K Combinatorial Chemistry Techniques: methods
%K Computational Biology
%K Computational Biology: methods
%K Computer Simulation
%K Databases
%K Factual
%K Models
%K Molecular
%K Monte Carlo Method
%K Protein Folding
%K Proteins
%K Proteins: chemistry
%K Sequence Alignment
%K Sequence Alignment: methods
%X An improved generalized comparative modeling method, GENECOMP, for the refinement of threading models is developed and validated on the Fischer database of 68 probe-template pairs, a standard benchmark used to evaluate threading approaches. The basic idea is to perform ab initio folding using a lattice protein model, SICHO, near the template provided by the new threading algorithm PROSPECTOR. PROSPECTOR also provides predicted contacts and secondary structure for the template-aligned regions, and possibly for the unaligned regions by garnering additional information from other top-scoring threaded structures. Since the lowest-energy structure generated by the simulations is not necessarily the best structure, we employed two structure-selection protocols: distance geometry and clustering. In general, clustering is found to generate somewhat better quality structures in 38 of 68 cases. When applied to the Fischer database, the protocol does no harm and in a significant number of cases improves upon the initial threading model, sometimes dramatically. The procedure is readily automated and can be implemented on a genomic scale.
%B Proteins
%V 44
%P 133–149
%8 aug
%G eng
%U http://www.ncbi.nlm.nih.gov/pubmed/11391776

%0 Journal Article
%J The Journal of Chemical Physics
%D 1993
%T A general method for the prediction of the three dimensional structure and folding pathway of globular proteins: Application to designed helical proteins
%A Andrzej Koliński
%A Adam Godzik
%A Jeffrey Skolnick
%K Amino Acid Sequence
%K GLOBULAR proteins
%X Starting from amino acid sequence alone, a general approach for simulating folding into the molten globule or rigid, native state depending on sequence is described. In particular, the 3D folds of two simple designed proteins have been predicted using a Monte Carlo folding algorithm. The model employs a very flexible hybrid lattice representation of the protein conformation, and fast lattice dynamics. A full rotamer library for side group conformations, and potentials of mean force of short and long range interactions have been extracted from the statistics of a high resolution set of nonhomologous, 3D structures of globular proteins. The simulated folding process starts from an arbitrary random conformation and relatively rapidly assembles a well defined four helix bundle. The very cooperative folding of the model systems is facilitated by the proper definition of the model protein hydrogen bond network, and multibody interactions of the side groups. The two sequences studied exhibit very different behavior. The first one, in excellent agreement with experiment, folds to a thermodynamically very stable four helix bundle that has all the properties postulated for the molten globule state. The second protein, having a more heterogeneous sequence, at lower temperature undergoes a transition from the molten globule state to the unique native state exhibiting a fixed pattern of side group packing. This marks the first time that the ability to predict a molten globule or a unique native state from sequence alone has been achieved. The implications for the general solution of the protein folding problem are briefly discussed.
%B The Journal of Chemical Physics
%V 98
%P 7420
%G eng
%U http://smartech.gatech.edu/handle/1853/26987 http://link.aip.org/link/JCPSA6/v98/i9/p7420/s1&Agg=doi
%R 10.1063/1.464706