%0 Journal Article %J BMC Structural Biology %D 2010 %T Modeling of loops in proteins: a multi-method approach %A Michal Jamroz %A Andrzej Koliński %K Databases %K Models %K Molecular %K Protein %K Protein Structure %K Proteins %K Proteins: chemistry %K Software %K Tertiary %X BACKGROUND: Template-target sequence alignment and loop modeling are key components of protein comparative modeling. Short loops can be predicted with high accuracy using structural fragments from other, not necessarily homologous proteins, or by various minimization methods. For longer loops, multiscale approaches employing coarse-grained de novo modeling techniques should be more effective. RESULTS: For a representative set of protein structures of various structural classes, test predictions of loop regions have been performed using MODELLER, ROSETTA, and a CABS coarse-grained de novo modeling tool. Loops of various lengths, from 4 to 25 residues, were modeled assuming an ideal target-template alignment of the remaining portions of the protein. It has been shown that classical modeling with MODELLER is usually better for short loops, while coarse-grained de novo modeling is more effective for longer loops. Even very long missing fragments in protein structures could be effectively modeled. The resolution of such models is usually at the level of 2–6 Å, which could be sufficient for guiding protein engineering. Further improvement of modeling accuracy could be achieved by combining different methods. In particular, we used the 10 top-ranked models from sets of 500 models generated by MODELLER as multiple templates for CABS modeling. On average, the resulting molecular models were better than the models from individual methods. CONCLUSIONS: Accuracy of protein modeling, as demonstrated for the problem of loop modeling, could be improved by combining different modeling techniques.
%B BMC Structural Biology %V 10 %P 5 %8 jan %G eng %U http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2837870&tool=pmcentrez&rendertype=abstract %R 10.1186/1472-6807-10-5

%0 Journal Article %J Journal of Structural and Functional Genomics %D 2009 %T Distance matrix-based approach to protein structure prediction %A Andrzej Kloczkowski %A Robert L. Jernigan %A Zhijun Wu %A Guang Song %A Lei Yang %A Andrzej Koliński %A Piotr Pokarowski %K Binding Sites %K Computer Simulation %K Databases %K Models %K Molecular %K Principal Component Analysis %K Protein %K Protein Conformation %K Proteins %K Proteins: chemistry %X

Much structural information is encoded in the internal distances; a distance matrix-based approach can be used to predict protein structure and dynamics, and for structural refinement. Our approach is based on the square distance matrix $D = [r_{ij}^2]$ containing all squared distances between residues in proteins. This distance matrix contains more information than the contact matrix $C$, which has elements of either 0 or 1 depending on whether the distance $r_{ij}$ is greater or less than a cutoff value $r_{\mathrm{cutoff}}$. We have performed spectral decomposition of the distance matrices, $D = \sum_k \lambda_k v_k v_k^T$, in terms of eigenvalues $\lambda_k$ and the corresponding eigenvectors $v_k$, and found that it contains at most five nonzero terms. A dominant eigenvector is proportional to $r^2$, the squared distance of points from the center of mass, with the next three being the principal components of the system of points. By predicting $r^2$ from the sequence we can approximate a distance matrix of a protein with an expected RMSD value of about 7.3 Å, and by combining it with the prediction of the first principal component we can improve this approximation to 4.0 Å. We can also explain the role of hydrophobic interactions for the protein structure, because $r$ is highly correlated with the hydrophobic profile of the sequence. Moreover, $r$ is highly correlated with several sequence profiles which are useful in protein structure prediction, such as contact number, the residue-wise contact order (RWCO) or mean square fluctuations (i.e. crystallographic temperature factors). We have also shown that the next three components are related to the spatial directionality of the secondary structure elements, and they may also be predicted from the sequence, improving overall structure prediction.
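The "at most five nonzero terms" result follows from expanding each squared distance. With the residue coordinates as the rows of an $N \times 3$ matrix $X$ and $r^2$ the vector of squared norms $|x_i|^2$, we have $r_{ij}^2 = |x_i|^2 + |x_j|^2 - 2\,x_i \cdot x_j$, so $D = r^2 \mathbf{1}^T + \mathbf{1}(r^2)^T - 2XX^T$: a sum of matrices of rank 1, 1 and at most 3, hence $\mathrm{rank}(D) \le 5$. The Python sketch below is a hypothetical illustration, not code from the paper: the helper names and the toy integer coordinates are assumptions standing in for real residue positions. It builds $D$, derives the cutoff-based contact matrix $C$, and checks the rank bound with exact rational arithmetic.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

Point = Tuple[int, int, int]
Matrix = List[List[int]]


def squared_distance_matrix(coords: Sequence[Point]) -> Matrix:
    """D[i][j] = r_ij^2, the squared distance between points i and j."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in coords] for p in coords]


def contact_matrix(d: Matrix, r_cutoff: float) -> Matrix:
    """C[i][j] = 1 if r_ij < r_cutoff, else 0 (compared on squared values)."""
    cutoff_sq = r_cutoff ** 2
    return [[1 if x < cutoff_sq else 0 for x in row] for row in d]


def exact_rank(m: Matrix) -> int:
    """Rank via Gauss-Jordan elimination over the rationals (no rounding error)."""
    a = [[Fraction(x) for x in row] for row in m]
    n_rows, n_cols = len(a), len(a[0])
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, n_rows) if a[r][col] != 0), None)
        if pivot is None:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        # Zero out this column in every other row.
        for r in range(n_rows):
            if r != rank and a[r][col] != 0:
                factor = a[r][col] / a[rank][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[rank])]
        rank += 1
        if rank == n_rows:
            break
    return rank


if __name__ == "__main__":
    # Toy integer coordinates standing in for residue positions (not real data).
    coords = [(0, 0, 0), (3, 1, 0), (5, 4, 1), (6, 7, 3),
              (4, 9, 6), (1, 8, 8), (-1, 5, 9), (0, 2, 7)]
    d = squared_distance_matrix(coords)
    print("rank of D:", exact_rank(d))  # bounded by 5 for points in 3D
    print(contact_matrix(d, r_cutoff=5.0))
```

Exact `Fraction` arithmetic is used only so the rank check cannot be confused by floating-point round-off; with real coordinates one would more likely use a floating-point eigendecomposition with a tolerance.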
We have also shown that the large number of available HIV-1 protease structures provides a remarkable sampling of conformations, which can be viewed as direct structural information about the dynamics. After structure matching, we apply principal component analysis (PCA) to obtain the important apparent motions for both bound and unbound structures. There are significant similarities between the first few key motions and the first few low-frequency normal modes calculated from a static representative structure with an elastic network model (ENM) that is based on the contact matrix C (related to D), strongly suggesting that the variations among the observed structures and the corresponding conformational changes are facilitated by the low-frequency, global motions intrinsic to the structure. Similarities are also found when the approach is applied to an NMR ensemble, as well as to atomic molecular dynamics (MD) trajectories. Thus, a sufficiently large number of experimental structures can directly provide important information about protein dynamics, but ENM can also provide a similar sampling of conformations. Finally, we use distance constraints from databases of known protein structures for structure refinement. We use the distributions of distances of various types in known protein structures to obtain the most probable ranges or the mean-force potentials for the distances. We then impose these constraints on structures to be refined or include the mean-force potentials directly in the energy minimization so that more plausible structural models can be built. This approach has been successfully used by us in 2006 in the CASPR structure refinement (http://predictioncenter.org/caspR).

%B Journal of Structural and Functional Genomics %V 10 %P 67–81 %8 mar %G eng %U http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3018873&tool=pmcentrez&rendertype=abstract %R 10.1007/s10969-009-9062-2

%0 Journal Article %J Nucleic Acids Research %D 2008 %T AAindex: amino acid index database, progress report 2008 %A Shuichi Kawashima %A Piotr Pokarowski %A Maria Pokarowska %A Andrzej Koliński %A Toshiaki Katayama %A Minoru Kanehisa %K Amino Acids %K Amino Acids: chemistry %K Databases %K Internet %K Protein %K Proteins %K Proteins: chemistry %X

AAindex is a database of numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids. We have added a collection of protein contact potentials to the AAindex as a new section. Accordingly, AAindex now consists of three sections: AAindex1 for the amino acid index of 20 numerical values, AAindex2 for the amino acid substitution matrix and AAindex3 for the statistical protein contact potentials. All data are derived from published literature. The database can be accessed through the DBGET/LinkDB system at GenomeNet (http://www.genome.jp/dbget-bin/www_bfind?aaindex) or downloaded by anonymous FTP (ftp://ftp.genome.jp/pub/db/community/aaindex/).

%B Nucleic Acids Research %V 36 %P D202–5 %8 jan %G eng %U http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2238890&tool=pmcentrez&rendertype=abstract %R 10.1093/nar/gkm998

%0 Journal Article %J BMC Structural Biology %D 2008 %T Contact prediction in protein modeling: scoring, folding and refinement of coarse-grained models %A Dorota Latek %A Andrzej Koliński %K Algorithms %K Caspase 6 %K Caspase 6: chemistry %K Caspase 6: genetics %K Computer Simulation %K Databases %K Models %K Molecular %K Protein %K Protein Folding %K Proteins %K Proteins: chemistry %K Proteins: genetics %X

Several different methods for contact prediction succeeded within the Sixth Critical Assessment of Techniques for Protein Structure Prediction (CASP6). The most relevant were non-local contact predictions for targets from the most difficult categories: fold recognition-analogy and new fold. Such contacts could provide valuable structural information in case a template structure cannot be found in the PDB.

%B BMC Structural Biology %V 8 %P 36 %8 jan %G eng %U http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2527566&tool=pmcentrez&rendertype=abstract %R 10.1186/1472-6807-8-36

%0 Journal Article %J Bioinformatics (Oxford, England) %D 2006 %T BioShell–a package of tools for structural biology computations %A Dominik Gront %A Andrzej Koliński %K Chemical %K Computational Biology %K Computational Biology: methods %K Computer Simulation %K Databases %K Models %K Protein %K Protein: methods %K Proteins %K Proteins: analysis %K Proteins: chemistry %K Proteins: classification %K Sequence Alignment %K Sequence Alignment: methods %K Sequence Analysis %K Software %X

SUMMARY: BioShell is a suite of programs performing common tasks accompanying protein structure modeling. BioShell's design builds on the flexibility of the UNIX shell, and it is meant to be used as an extension of that shell. Using BioShell, various molecular modeling procedures can be integrated into a single pipeline. AVAILABILITY: The BioShell package can be downloaded from its website http://biocomp.chem.uw.edu.pl/BioShell, which provides many examples and detailed documentation for the newest version.

%B Bioinformatics (Oxford, England) %V 22 %P 621–622 %8 mar %G eng %U http://www.ncbi.nlm.nih.gov/pubmed/16407320 %R 10.1093/bioinformatics/btk037

%0 Journal Article %J Proteins %D 2005 %T Generalized protein structure prediction based on combination of fold-recognition with de novo folding and evaluation of models %A Andrzej Koliński %A Janusz M. Bujnicki %K Algorithms %K Computational Biology %K Computational Biology: methods %K Computer Simulation %K Computers %K Data Interpretation %K Databases %K Dimerization %K Models %K Molecular %K Monte Carlo Method %K Protein %K Protein Conformation %K Protein Folding %K Protein Structure %K Proteomics %K Proteomics: methods %K Reproducibility of Results %K Secondary %K Sequence Alignment %K Software %K Statistical %K Tertiary %X To predict the tertiary structure of full-length sequences of all targets in CASP6, regardless of their potential category (from easy comparative modeling to fold recognition to apparent new folds), we used a novel combination of two very different approaches developed independently in our laboratories, which ranked quite well in different categories in CASP5. First, the GeneSilico metaserver was used to identify domains, predict secondary structure, and generate fold recognition (FR) alignments, which were converted to full-atom models using the "FRankenstein's Monster" approach for comparative modeling (CM) by recombination of protein fragments. Additional models generated "de novo" by fully automated servers were obtained from the CASP website. All these models were evaluated by VERIFY3D, and residues with scores better than 0.2 were used as a source of spatial restraints. Second, a new implementation of the lattice-based protein modeling tool CABS was used to carry out folding guided by the above-mentioned restraints with the Replica Exchange Monte Carlo sampling technique. Decoys generated in the course of simulation were subjected to average-linkage hierarchical clustering. For a representative decoy from each cluster, a full-atom model was rebuilt. Finally, five models were selected for submission based on a combination of various criteria, including the size, density, and average energy of the corresponding cluster, and the visual evaluation of the full-atom structures and their relationship to the original templates. The combination of FRankenstein and CABS was one of the best-performing algorithms over all categories in CASP6 (it is important to note that our human intervention was very limited, and all steps in our method can be easily automated). We were able to generate a number of very good models, especially in the Comparative Modeling and New Folds categories. Frequently, the best models were closer to the native structure than any of the templates used. The main problem we encountered was in the ranking of the final models (the only step of significant human intervention), due to insufficient computational power, which precluded full-atom refinement and energy-based evaluation. %B Proteins %V 61 Suppl. 7 %P 84–90 %8 jan %G eng %U http://www.ncbi.nlm.nih.gov/pubmed/16187348 %R 10.1002/prot.20723

%0 Journal Article %J Bioinformatics (Oxford, England) %D 2005 %T A new approach to prediction of short-range conformational propensities in proteins %A Dominik Gront %A Andrzej Koliński %K Algorithms %K Amino Acid %K Artificial Intelligence %K Chemical %K Computer Simulation %K Databases %K Gas Chromatography-Mass Spectrometry %K Gas Chromatography-Mass Spectrometry: methods %K Models %K Protein %K Protein Conformation %K Protein: methods %K Proteins %K Proteins: analysis %K Proteins: chemistry %K Sequence Alignment %K Sequence Alignment: methods %K Sequence Analysis %K Sequence Homology %K Structure-Activity Relationship %X

MOTIVATION: Knowledge-based potentials are valuable tools for protein structure modeling and for evaluating the quality of structure predictions obtained by a variety of methods. Potentials of this type could be significantly enhanced by proper exploitation of the evolutionary information encoded in related protein sequences. The new potentials could be valuable components of threading algorithms, ab initio protein structure prediction, comparative modeling and structure modeling based on fragmentary experimental data. RESULTS: A new potential for scoring local protein geometry is designed and evaluated. The approach is based on the similarity of short protein fragments measured by an alignment of their sequence profiles. Sequence specificity of the resulting energy function has been compared with the specificity of simpler potentials using gapless threading and the ability to predict specific geometry of protein fragments. Significant improvement in threading sensitivity and in the ability to generate sequence-specific protein-like conformations has been achieved.

%B Bioinformatics (Oxford, England) %V 21 %P 981–987 %G eng %U http://www.ncbi.nlm.nih.gov/pubmed/15509604 %R 10.1093/bioinformatics/bti080

%0 Journal Article %J Proteins %D 2001 %T Generalized comparative modeling (GENECOMP): a combination of sequence comparison, threading, and lattice modeling for protein structure prediction and refinement %A Andrzej Koliński %A Marcos Betancourt %A Daisuke Kihara %A Piotr Rotkiewicz %A Jeffrey Skolnick %K Algorithms %K Chemical %K Combinatorial Chemistry Techniques %K Combinatorial Chemistry Techniques: methods %K Computational Biology %K Computational Biology: methods %K Computer Simulation %K Databases %K Factual %K Models %K Molecular %K Monte Carlo Method %K Protein Folding %K Proteins %K Proteins: chemistry %K Sequence Alignment %K Sequence Alignment: methods %X An improved generalized comparative modeling method, GENECOMP, for the refinement of threading models is developed and validated on the Fischer database of 68 probe-template pairs, a standard benchmark used to evaluate threading approaches. The basic idea is to perform ab initio folding using a lattice protein model, SICHO, near the template provided by the new threading algorithm PROSPECTOR. PROSPECTOR also provides predicted contacts and secondary structure for the template-aligned regions, and possibly for the unaligned regions by garnering additional information from other top-scoring threaded structures. Since the lowest-energy structure generated by the simulations is not necessarily the best structure, we employed two structure-selection protocols: distance geometry and clustering. In general, clustering is found to generate somewhat better quality structures in 38 of 68 cases. When applied to the Fischer database, the protocol does no harm and in a significant number of cases improves upon the initial threading model, sometimes dramatically. The procedure is readily automated and can be implemented on a genomic scale.
%B Proteins %V 44 %P 133–149 %8 aug %G eng %U http://www.ncbi.nlm.nih.gov/pubmed/11391776

%0 Journal Article %J Nature Biotechnology %D 2000 %T Structural genomics and its importance for gene function analysis %A Jeffrey Skolnick %A Jacquelyn S. Fetrow %A Andrzej Koliński %K Animals %K Computer Simulation %K Databases %K Evolution %K Factual %K Genome %K Humans %K Internet %K Molecular %K Molecular Biology %K Molecular Biology: methods %K Protein Folding %K Structure-Activity Relationship %X Structural genomics projects aim to solve the experimental structures of all possible protein folds. Such projects entail a conceptual shift from traditional structural biology, in which structural information is obtained on known proteins, to one in which the structure of a protein is determined first and the function assigned only later. Whereas the goal of converting protein structure into function can be accomplished by traditional sequence motif-based approaches, recent studies have shown that assignment of a protein's biochemical function can also be achieved by scanning its structure for a match to the geometry and chemical identity of a known active site. Importantly, this approach can use low-resolution structures provided by contemporary structure prediction methods. When applied to genomes, structural information (either experimental or predicted) is likely to play an important role in high-throughput function assignment. %B Nature Biotechnology %V 18 %P 283–287 %8 mar %G eng %U http://www.ncbi.nlm.nih.gov/pubmed/10700142 %R 10.1038/73723

%0 Journal Article %J Protein Science: a Publication of the Protein Society %D 1995 %T Are proteins ideal mixtures of amino acids? Analysis of energy parameter sets %A Adam Godzik %A Andrzej Koliński %A Jeffrey Skolnick %K Amino Acid Sequence %K Amino Acids %K Crystallography %K Databases %K Factual %K Magnetic Resonance Spectroscopy %K Mathematics %K Models %K Protein Conformation %K Protein Folding %K Proteins %K Proteins: chemistry %K Theoretical %K Thermodynamics %K X-Ray %X Various existing derivations of the effective potentials of mean force for the two-body interactions between amino acid side chains in proteins are reviewed and compared to each other. The differences between the parameter sets can be traced to the reference state used to define the zero of energy. Depending on the reference state, the transfer free energy or other pseudo-one-body contributions can be present to various extents in two-body parameter sets. It is, however, possible to compare various derivations directly by concentrating on the "excess" energy, a term that describes the difference between a real protein and an ideal solution of amino acids. Furthermore, the number of protein structures available for analysis allows one to check the consistency of the derivation and the errors by comparing parameters derived from various subsets of the whole database. It is shown that pair interaction preferences are very consistent throughout the database. Independently derived parameter sets have correlation coefficients on the order of 0.8, with a mean difference between equivalent entries of 0.1 kT. Also, the low-quality (low resolution, little or no refinement) structures show similar regularities. There are, however, large differences between interaction parameters derived on the basis of crystallographic structures and structures obtained by NMR refinement. The origin of the latter difference is not yet understood.
%B Protein Science: a Publication of the Protein Society %V 4 %P 2107–2117 %8 oct %G eng %U http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2142984&tool=pmcentrez&rendertype=abstract %R 10.1002/pro.5560041016