%0 Journal Article %J BMC Structural Biology %D 2010 %T Modeling of loops in proteins: a multi-method approach %A Michal Jamroz %A Andrzej Koliński %K Databases %K Models %K Molecular %K Protein %K Protein Structure %K Proteins %K Proteins: chemistry %K Software %K Tertiary %X BACKGROUND: Template-target sequence alignment and loop modeling are key components of protein comparative modeling. Short loops can be predicted with high accuracy using structural fragments from other, not necessarily homologous proteins, or by various minimization methods. For longer loops, multiscale approaches employing coarse-grained de novo modeling techniques should be more effective. RESULTS: For a representative set of protein structures of various structural classes, test predictions of loop regions have been performed using MODELLER, ROSETTA, and a CABS coarse-grained de novo modeling tool. Loops of various lengths, from 4 to 25 residues, were modeled assuming an ideal target-template alignment of the remaining portions of the protein. It has been shown that classical modeling with MODELLER is usually better for short loops, while coarse-grained de novo modeling is more effective for longer loops. Even very long missing fragments in protein structures could be effectively modeled. Resolution of such models is usually at the level of 2-6 Å, which could be sufficient for guiding protein engineering. Further improvement of modeling accuracy could be achieved by the combination of different methods. In particular, we used 10 top ranked models from sets of 500 models generated by MODELLER as multiple templates for CABS modeling. On average, the resulting molecular models were better than the models from individual methods. CONCLUSIONS: Accuracy of protein modeling, as demonstrated for the problem of loop modeling, could be improved by the combinations of different modeling techniques. 
%B BMC Structural Biology %V 10 %P 5 %8 jan %G eng %U http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2837870&tool=pmcentrez&rendertype=abstract %R 10.1186/1472-6807-10-5 %0 Journal Article %J Journal of Structural and Functional Genomics %D 2009 %T Distance matrix-based approach to protein structure prediction %A Andrzej Kloczkowski %A Robert L. Jernigan %A Zhijun Wu %A Guang Song %A Lei Yang %A Andrzej Koliński %A Piotr Pokarowski %K Binding Sites %K Computer Simulation %K Databases %K Models %K Molecular %K Principal Component Analysis %K Protein %K Protein Conformation %K Proteins %K Proteins: chemistry %X

Much structural information is encoded in the internal distances; a distance matrix-based approach can be used to predict protein structure and dynamics, and for structural refinement. Our approach is based on the square distance matrix $D = [r_{ij}^2]$ containing all square distances between residues in proteins. This distance matrix contains more information than the contact matrix $C$, which has elements of either 0 or 1 depending on whether the distance $r_{ij}$ is greater or less than a cutoff value $r_{\text{cutoff}}$. We have performed spectral decomposition of the distance matrices, $D = \sum_k \lambda_k v_k v_k^T$, in terms of eigenvalues $\lambda_k$ and the corresponding eigenvectors $v_k$, and found that it contains at most five nonzero terms. A dominant eigenvector is proportional to $r^2$, the square distance of points from the center of mass, with the next three being the principal components of the system of points. By predicting $r^2$ from the sequence we can approximate a distance matrix of a protein with an expected RMSD value of about 7.3 Å, and by combining it with the prediction of the first principal component we can improve this approximation to 4.0 Å. We can also explain the role of hydrophobic interactions for the protein structure, because $r$ is highly correlated with the hydrophobic profile of the sequence. Moreover, $r$ is highly correlated with several sequence profiles which are useful in protein structure prediction, such as contact number, the residue-wise contact order (RWCO) or mean square fluctuations (i.e. crystallographic temperature factors). We have also shown that the next three components are related to spatial directionality of the secondary structure elements, and they may be also predicted from the sequence, improving overall structure prediction. 
We have also shown that the large number of available HIV-1 protease structures provides a remarkable sampling of conformations, which can be viewed as direct structural information about the dynamics. After structure matching, we apply principal component analysis (PCA) to obtain the important apparent motions for both bound and unbound structures. There are significant similarities between the first few key motions and the first few low-frequency normal modes calculated from a static representative structure with an elastic network model (ENM) that is based on the contact matrix C (related to D), strongly suggesting that the variations among the observed structures and the corresponding conformational changes are facilitated by the low-frequency, global motions intrinsic to the structure. Similarities are also found when the approach is applied to an NMR ensemble, as well as to atomic molecular dynamics (MD) trajectories. Thus, a sufficiently large number of experimental structures can directly provide important information about protein dynamics, but ENM can also provide a similar sampling of conformations. Finally, we use distance constraints from databases of known protein structures for structure refinement. We use the distributions of distances of various types in known protein structures to obtain the most probable ranges or the mean-force potentials for the distances. We then impose these constraints on structures to be refined or include the mean-force potentials directly in the energy minimization so that more plausible structural models can be built. This approach has been successfully used by us in 2006 in the CASPR structure refinement (http://predictioncenter.org/caspR).
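
The "at most five nonzero terms" statement in the abstract above can be checked numerically. The sketch below is not taken from the paper: it assumes random 3D points as a stand-in for residue coordinates, and the function names and the relative tolerance are illustrative choices. It relies on the identity $D = s\,\mathbf{1}^T + \mathbf{1}\,s^T - 2XX^T$ (with $s_i = |x_i|^2$ and $X$ the coordinate matrix), which bounds the rank of $D$ by $1 + 1 + 3 = 5$ for points in three dimensions.

```python
import numpy as np


def squared_distance_matrix(coords):
    """Return D with D[i, j] = squared Euclidean distance between points i and j."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sum(diff * diff, axis=-1)


def count_nonzero_eigenvalues(matrix, rel_tol=1e-9):
    """Count eigenvalues of a symmetric matrix that are non-negligible
    relative to the largest eigenvalue magnitude (rel_tol is an assumption)."""
    eigvals = np.linalg.eigvalsh(matrix)
    scale = np.max(np.abs(eigvals))
    return int(np.sum(np.abs(eigvals) > rel_tol * scale))


# Hypothetical stand-in for residue positions: 50 random points in 3D.
rng = np.random.default_rng(0)
coords = rng.normal(size=(50, 3))

D = squared_distance_matrix(coords)
n_nonzero = count_nonzero_eigenvalues(D)
print("non-negligible eigenvalues:", n_nonzero)
```

With real coordinates (for example, Cα positions read from a structure file) the same count should not exceed five, whatever the chain length.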

%B Journal of Structural and Functional Genomics %V 10 %P 67–81 %8 mar %G eng %U http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3018873&tool=pmcentrez&rendertype=abstract %R 10.1007/s10969-009-9062-2 %0 Journal Article %J Nucleic Acids Research %D 2008 %T AAindex: amino acid index database, progress report 2008 %A Kawashima, Shuichi %A Piotr Pokarowski %A Pokarowska, Maria %A Andrzej Koliński %A Katayama, Toshiaki %A Kanehisa, Minoru %K Amino Acids %K Amino Acids: chemistry %K Databases %K Internet %K Protein %K Proteins %K Proteins: chemistry %X

AAindex is a database of numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids. We have added a collection of protein contact potentials to the AAindex as a new section. Accordingly AAindex consists of three sections now: AAindex1 for the amino acid index of 20 numerical values, AAindex2 for the amino acid substitution matrix and AAindex3 for the statistical protein contact potentials. All data are derived from published literature. The database can be accessed through the DBGET/LinkDB system at GenomeNet (http://www.genome.jp/dbget-bin/www_bfind?aaindex) or downloaded by anonymous FTP (ftp://ftp.genome.jp/pub/db/community/aaindex/).

%B Nucleic Acids Research %V 36 %P D202–5 %8 jan %G eng %U http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2238890&tool=pmcentrez&rendertype=abstract %R 10.1093/nar/gkm998 %0 Journal Article %J BMC Structural Biology %D 2008 %T Contact prediction in protein modeling: scoring, folding and refinement of coarse-grained models %A Dorota Latek %A Andrzej Koliński %K Algorithms %K Caspase 6 %K Caspase 6: chemistry %K Caspase 6: genetics %K Computer Simulation %K Databases %K Models %K Molecular %K Protein %K Protein Folding %K Proteins %K Proteins: chemistry %K Proteins: genetics %X

Several different methods for contact prediction succeeded within the Sixth Critical Assessment of Techniques for Protein Structure Prediction (CASP6). The most relevant were non-local contact predictions for targets from the most difficult categories: fold recognition-analogy and new fold. Such contacts could provide valuable structural information in case a template structure cannot be found in the PDB.

%B BMC Structural Biology %V 8 %P 36 %8 jan %G eng %U http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2527566&tool=pmcentrez&rendertype=abstract %R 10.1186/1472-6807-8-36 %0 Journal Article %J Journal of Computer-Aided Molecular Design %D 2008 %T Fast and accurate methods for predicting short-range constraints in protein models %A Dominik Gront %A Andrzej Koliński %K Algorithms %K Amino Acid Sequence %K Models %K Molecular %K Molecular Sequence Data %K Predictive Value of Tests %K Protein %K Proteins %K Proteins: chemistry %K Proteins: genetics %K Proteins: metabolism %K Sequence Analysis %K Software %X

Protein modeling tools utilize many kinds of structural information that may be predicted from the amino acid sequence of a target protein or obtained from experiments. Such data provide geometrical constraints in a modeling process. The main aim is to generate the best possible consensus structure. The quality of models strictly depends on the imposed conditions. In this work we present an algorithm which predicts short-range distances between Cα atoms as well as a set of short structural fragments that possibly share structural similarity with a query sequence. The only input of the method is a query sequence profile. The algorithm searches for short protein fragments with high sequence similarity. As a result, statistics of the distances observed in the similar fragments are returned. The method can also be used as a scoring function or a short-range knowledge-based potential based on the computed statistics.

%B Journal of Computer-Aided Molecular Design %V 22 %P 783–8 %G eng %U http://www.ncbi.nlm.nih.gov/pubmed/18415023 %R 10.1007/s10822-008-9213-8 %0 Journal Article %J Bioinformatics (Oxford, England) %D 2007 %T Comparative modeling without implicit sequence alignments %A Andrzej Koliński %A Dominik Gront %K Algorithms %K Amino Acid Sequence %K Chemical %K Computer Simulation %K Models %K Molecular %K Molecular Sequence Data %K Protein %K Protein Conformation %K Protein: methods %K Proteins %K Proteins: chemistry %K Proteins: ultrastructure %K Sequence Alignment %K Sequence Alignment: methods %K Sequence Analysis %X

MOTIVATION: The number of known protein sequences is about a thousand times larger than the number of experimentally solved 3D structures. For more than half of the protein sequences a close or distant structural analog could be identified. The key starting point in classical comparative modeling is to generate the best possible sequence alignment with a template or templates. With decreasing sequence similarity, the number of errors in the alignments increases and these errors are the main causes of the decreasing accuracy of the molecular models generated. Here we propose a new approach to comparative modeling, which does not require the implicit alignment: the model building phase explores geometric, evolutionary and physical properties of a template (or templates). RESULTS: The proposed method requires prior identification of a template, although the initial sequence alignment is ignored. The model is built using a very efficient reduced representation search engine, CABS, to find the best possible superposition of the query protein onto the template represented as a 3D multi-featured scaffold. The criteria used include: sequence similarity, predicted secondary structure consistency, local geometric features and hydrophobicity profile. For more difficult cases, the new method qualitatively outperforms existing schemes of comparative modeling. The algorithm unifies de novo modeling, 3D threading and sequence-based methods. The main idea is general and could be easily combined with other efficient modeling tools such as Rosetta, UNRES and others.

%B Bioinformatics (Oxford, England) %V 23 %P 2522–7 %G eng %U http://www.ncbi.nlm.nih.gov/pubmed/17660201 %R 10.1093/bioinformatics/btm380 %0 Journal Article %J BMC Structural Biology %D 2007 %T Type II restriction endonuclease R.Eco29kI is a member of the GIY-YIG nuclease superfamily %A Elena M. Ibryashkina %A Marina V. Zakharova %A Vladimir B. Baskunov %A Ekaterina S. Bogdanova %A Maxim O. Nagornykh %A Marat M Den'mukhamedov %A Bogdan S. Melnik %A Andrzej Koliński %A Dominik Gront %A Marcin Feder %A Alexander S. Solonin %A Janusz M. Bujnicki %K Amino Acid Sequence %K Binding Sites %K Computational Biology %K Computational Biology: methods %K Deoxyribonucleases %K DNA %K DNA Cleavage %K DNA: metabolism %K Electrophoretic Mobility Shift Assay %K Models %K Molecular %K Molecular Sequence Data %K Mutation %K Protein %K Protein Conformation %K Sequence Alignment %K Structural Homology %K Type II Site-Specific %K Type II Site-Specific: chemist %K Type II Site-Specific: metabol %X BACKGROUND: The majority of experimentally determined crystal structures of Type II restriction endonucleases (REases) exhibit a common PD-(D/E)XK fold. Crystal structures have been also determined for single representatives of two other folds: PLD (R.BfiI) and half-pipe (R.PabI), and bioinformatics analyses supported by mutagenesis suggested that some REases belong to the HNH fold. Our previous bioinformatic analysis suggested that REase R.Eco29kI shares sequence similarities with one more unrelated nuclease superfamily, GIY-YIG, however so far no experimental data were available to support this prediction. The determination of a crystal structure of the GIY-YIG domain of homing endonuclease I-TevI provided a template for modeling of R.Eco29kI and prompted us to validate the model experimentally. RESULTS: Using protein fold-recognition methods we generated a new alignment between R.Eco29kI and I-TevI, which suggested a reassignment of one of the putative catalytic residues. 
A theoretical model of R.Eco29kI was constructed to illustrate its predicted three-dimensional fold and organization of the active site, comprising amino acid residues Y49, Y76, R104, H108, E142, and N154. A series of mutants was constructed to generate amino acid substitutions of selected residues (Y49A, R104A, H108F, E142A and N154L) and the mutant proteins were examined for their ability to bind the DNA containing the Eco29kI site 5'-CCGCGG-3' and to catalyze the cleavage reaction. Experimental data reveal that residues Y49, R104, E142, H108, and N154 are important for the nuclease activity of R.Eco29kI, while H108 and N154 are also important for specific DNA binding by this enzyme. CONCLUSION: Substitutions of residues Y49, R104, H108, E142 and N154 predicted by the model to be a part of the active site lead to mutant proteins with strong defects in the REase activity. These results are in very good agreement with the structural model presented in this work and with our prediction that R.Eco29kI belongs to the GIY-YIG superfamily of nucleases. Our study provides the first experimental evidence for a Type IIP REase that does not belong to the PD-(D/E)XK or HNH superfamilies of nucleases, and is instead a member of the unrelated GIY-YIG superfamily. %B BMC Structural Biology %V 7 %P 48 %8 jan %@ 1472680774 %G eng %U http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1952068&tool=pmcentrez&rendertype=abstract %R 10.1186/1472-6807-7-48 %0 Journal Article %J Bioinformatics (Oxford, England) %D 2006 %T BioShell–a package of tools for structural biology computations %A Dominik Gront %A Andrzej Koliński %K Chemical %K Computational Biology %K Computational Biology: methods %K Computer Simulation %K Databases %K Models %K Protein %K Protein: methods %K Proteins %K Proteins: analysis %K Proteins: chemistry %K Proteins: classification %K Sequence Alignment %K Sequence Alignment: methods %K Sequence Analysis %K Software %X

SUMMARY: BioShell is a suite of programs performing common tasks accompanying protein structure modeling. BioShell design is based on UNIX shell flexibility and should be used as its extension. Using BioShell various molecular modeling procedures can be integrated in a single pipeline. AVAILABILITY: BioShell package can be downloaded from its website http://biocomp.chem.uw.edu.pl/BioShell and these pages provide many examples and a detailed documentation for the newest version.

%B Bioinformatics (Oxford, England) %V 22 %P 621–622 %8 mar %G eng %U http://www.ncbi.nlm.nih.gov/pubmed/16407320 %R 10.1093/bioinformatics/btk037 %0 Journal Article %J Journal of Computer-Aided Molecular Design %D 2006 %T Three dimensional model of severe acute respiratory syndrome coronavirus helicase ATPase catalytic domain and molecular design of severe acute respiratory syndrome coronavirus helicase inhibitors %A Marcin Hoffmann %A Krystian Eitner %A Marcin von Grotthuss %A Leszek Rychlewski %A Ewa Banachowicz %A Tomasz Grabarkiewicz %A Tomasz Szkoda %A Andrzej Koliński %K Amino Acid Sequence %K Catalytic Domain %K Conserved Sequence %K DNA Helicases %K DNA Helicases: antagonists & inhibitors %K DNA Helicases: chemistry %K Drug Design %K Enzyme Inhibitors %K Enzyme Inhibitors: pharmacology %K Models %K Molecular %K Molecular Sequence Data %K Protein %K SARS Virus %K SARS Virus: enzymology %K Sequence Alignment %K Structural Homology %K Thermodynamics %X The modeling of the severe acute respiratory syndrome coronavirus helicase ATPase catalytic domain was performed using the protein structure prediction Meta Server and the 3D Jury method for model selection, which resulted in the identification of 1JPR, 1UAA and 1W36 PDB structures as suitable templates for creating a full atom 3D model. This model was further utilized to design small molecules that are expected to block an ATPase catalytic pocket thus inhibit the enzymatic activity. Binding sites for various functional groups were identified in a series of molecular dynamics calculation. Their positions in the catalytic pocket were used as constraints in the Cambridge structural database search for molecules having the pharmacophores that interacted most strongly with the enzyme in a desired position. 
The subsequent MD simulations followed by calculations of binding energies of the designed molecules were compared to ATP identifying the most successful candidates, for likely inhibitors - molecules possessing two phosphonic acid moieties at distal ends of the molecule. %B Journal of Computer-Aided Molecular Design %V 20 %P 305–319 %8 may %G eng %U http://www.ncbi.nlm.nih.gov/pubmed/16972168 %R 10.1007/s10822-006-9057-z %0 Journal Article %J Proteins %D 2005 %T Generalized protein structure prediction based on combination of fold-recognition with de novo folding and evaluation of models %A Andrzej Koliński %A Janusz M. Bujnicki %K Algorithms %K Computational Biology %K Computational Biology: methods %K Computer Simulation %K Computers %K Data Interpretation %K Databases %K Dimerization %K Models %K Molecular %K Monte Carlo Method %K Protein %K Protein Conformation %K Protein Folding %K Protein Structure %K Proteomics %K Proteomics: methods %K Reproducibility of Results %K Secondary %K Sequence Alignment %K Software %K Statistical %K Tertiary %X To predict the tertiary structure of full-length sequences of all targets in CASP6, regardless of their potential category (from easy comparative modeling to fold recognition to apparent new folds) we used a novel combination of two very different approaches developed independently in our laboratories, which ranked quite well in different categories in CASP5. First, the GeneSilico metaserver was used to identify domains, predict secondary structure, and generate fold recognition (FR) alignments, which were converted to full-atom models using the "FRankenstein's Monster" approach for comparative modeling (CM) by recombination of protein fragments. Additional models generated "de novo" by fully automated servers were obtained from the CASP website. All these models were evaluated by VERIFY3D, and residues with scores better than 0.2 were used as a source of spatial restraints. 
Second, a new implementation of the lattice-based protein modeling tool CABS was used to carry out folding guided by the above-mentioned restraints with the Replica Exchange Monte Carlo sampling technique. Decoys generated in the course of simulation were subject to the average linkage hierarchical clustering. For a representative decoy from each cluster, a full-atom model was rebuilt. Finally, five models were selected for submission based on combination of various criteria, including the size, density, and average energy of the corresponding cluster, and the visual evaluation of the full-atom structures and their relationship to the original templates. The combination of FRankenstein and CABS was one of the best-performing algorithms over all categories in CASP6 (it is important to note that our human intervention was very limited, and all steps in our method can be easily automated). We were able to generate a number of very good models, especially in the Comparative Modeling and New Folds categories. Frequently, the best models were closer to the native structure than any of the templates used. The main problem we encountered was in the ranking of the final models (the only step of significant human intervention), due to the insufficient computational power, which precluded the possibility of full-atom refinement and energy-based evaluation.
%B Proteins %V 61 Suppl. 7 %P 84–90 %8 jan %G eng %U http://www.ncbi.nlm.nih.gov/pubmed/16187348 %R 10.1002/prot.20723 %0 Journal Article %J Bioinformatics %D 2005 %T HCPM–program for hierarchical clustering of protein models %A Dominik Gront %A Andrzej Koliński %K Algorithms %K Chemical %K Cluster Analysis %K Computer Simulation %K Internet %K Models %K Molecular %K Protein %K Protein: methods %K Proteins %K Proteins: analysis %K Proteins: chemistry %K Sequence Alignment %K Sequence Alignment: methods %K Sequence Analysis %K Software %K User-Computer Interface %X HCPM is a tool for clustering protein structures from comparative modeling, ab initio structure prediction, etc. A hierarchical clustering algorithm is designed and tested, and a heuristic is provided for an optimal cluster selection. The method has been successfully tested during the CASP6 experiment. %B Bioinformatics %V 21 %P 3179–80 %G eng %U http://www.ncbi.nlm.nih.gov/pubmed/15840705 %R 10.1093/bioinformatics/bti450 %0 Journal Article %J Bioinformatics (Oxford, England) %D 2005 %T A new approach to prediction of short-range conformational propensities in proteins %A Dominik Gront %A Andrzej Koliński %K Algorithms %K Amino Acid %K Artificial Intelligence %K Chemical %K Computer Simulation %K Databases %K Gas Chromatography-Mass Spectrometry %K Gas Chromatography-Mass Spectrometry: methods %K Models %K Protein %K Protein Conformation %K Protein: methods %K Proteins %K Proteins: analysis %K Proteins: chemistry %K Sequence Alignment %K Sequence Alignment: methods %K Sequence Analysis %K Sequence Homology %K Structure-Activity Relationship %X

MOTIVATION: Knowledge-based potentials are valuable tools for protein structure modeling and evaluation of the quality of the structure prediction obtained by a variety of methods. Potentials of such type could be significantly enhanced by a proper exploitation of the evolutionary information encoded in related protein sequences. The new potentials could be valuable components of threading algorithms, ab-initio protein structure prediction, comparative modeling and structure modeling based on fragmentary experimental data. RESULTS: A new potential for scoring local protein geometry is designed and evaluated. The approach is based on the similarity of short protein fragments measured by an alignment of their sequence profiles. Sequence specificity of the resulting energy function has been compared with the specificity of simpler potentials using gapless threading and the ability to predict specific geometry of protein fragments. Significant improvement in threading sensitivity and in the ability to generate sequence-specific protein-like conformations has been achieved.

%B Bioinformatics (Oxford, England) %V 21 %P 981–987 %G eng %U http://www.ncbi.nlm.nih.gov/pubmed/15509604 %R 10.1093/bioinformatics/bti080 %0 Journal Article %J Biophysical Journal %D 2003 %T TOUCHSTONE II: a new approach to ab initio protein structure prediction %A Yang Zhang %A Andrzej Koliński %A Jeffrey Skolnick %K Algorithms %K Amino Acid Sequence %K Computer Simulation %K Crystallography %K Crystallography: methods %K Energy Transfer %K Models %K Molecular %K Molecular Sequence Data %K Protein %K Protein Conformation %K Protein Folding %K Protein Structure %K Protein: methods %K Proteins %K Proteins: chemistry %K Secondary %K Sequence Analysis %K Software %K Static Electricity %K Statistical %X We have developed a new combined approach for ab initio protein structure prediction. The protein conformation is described as a lattice chain connecting Cα atoms, with attached Cβ atoms and side-chain centers of mass. The model force field includes various short-range and long-range knowledge-based potentials derived from a statistical analysis of the regularities of protein structures. The combination of these energy terms is optimized through the maximization of correlation for 30 x 60,000 decoys between the root mean square deviation (RMSD) to native and energies, as well as the energy gap between native and the decoy ensemble. To accelerate the conformational search, a newly developed parallel hyperbolic sampling algorithm with a composite movement set is used in the Monte Carlo simulation processes. We exploit this strategy to successfully fold 41/100 small proteins (36 to 120 residues) with predicted structures having an RMSD from native below 6.5 Å in the top five cluster centroids. To fold larger-size proteins as well as to improve the folding yield of small proteins, we incorporate into the basic force field side-chain contact predictions from our threading program PROSPECTOR, where homologous proteins were excluded from the database. 
With these threading-based restraints, the program can fold 83/125 test proteins (36 to 174 residues) with structures having an RMSD to native below 6.5 Å in the top five cluster centroids. This shows the significant improvement of folding by using predicted tertiary restraints, especially when the accuracy of side-chain contact prediction is >20%. For native fold selection, we introduce quantities dependent on the cluster density and the combination of energy and free energy, which show a higher discriminative power to select the native structure than the previously used cluster energy or cluster size, and which can be used in native structure identification in blind simulations. These procedures are readily automated and are being implemented on a genomic scale. %B Biophysical Journal %V 85 %P 1145–64 %G eng %U http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1303233&tool=pmcentrez&rendertype=abstract %R 10.1016/S0006-3495(03)74551-2