Much structural information is encoded in the internal distances; a distance matrix-based approach can be used to predict protein structure and dynamics, and for structural refinement. Our approach is based on the squared distance matrix $D = [r_{ij}^2]$ containing all squared distances between residues in proteins. This distance matrix contains more information than the contact matrix $C$, which has elements of either 0 or 1 depending on whether the distance $r_{ij}$ is greater or less than a cutoff value $r_{\mathrm{cutoff}}$. We have performed the spectral decomposition of the distance matrices, $D = \sum_k \lambda_k v_k v_k^T$, in terms of eigenvalues $\lambda_k$ and the corresponding eigenvectors $v_k$, and found that it contains at most five nonzero terms. The dominant eigenvector is proportional to $r^2$, the squared distance of points from the center of mass, with the next three being the principal components of the system of points. By predicting $r^2$ from the sequence we can approximate the distance matrix of a protein with an expected RMSD of about 7.3~{\AA}, and by combining it with the prediction of the first principal component we can improve this approximation to 4.0~{\AA}. We can also explain the role of hydrophobic interactions in protein structure, because $r$ is highly correlated with the hydrophobic profile of the sequence. Moreover, $r$ is highly correlated with several sequence profiles that are useful in protein structure prediction, such as the contact number, the residue-wise contact order (RWCO), and mean square fluctuations (i.e., crystallographic temperature factors). We have also shown that the next three components are related to the spatial directionality of the secondary structure elements, and that they may also be predicted from the sequence, improving overall structure prediction. 
We have also shown that the large number of available HIV-1 protease structures provides a remarkable sampling of conformations, which can be viewed as direct structural information about the dynamics. After structure matching, we apply principal component analysis (PCA) to obtain the important apparent motions for both bound and unbound structures. There are significant similarities between the first few key motions and the first few low-frequency normal modes calculated from a static representative structure with an elastic network model (ENM) based on the contact matrix $C$ (which is related to $D$). This strongly suggests that the variations among the observed structures, and the corresponding conformational changes, are facilitated by the low-frequency global motions intrinsic to the structure. Similarities are also found when the approach is applied to an NMR ensemble and to atomic molecular dynamics (MD) trajectories. Thus, a sufficiently large number of experimental structures can directly provide important information about protein dynamics, but ENM can also provide a similar sampling of conformations. Finally, we use distance constraints derived from databases of known protein structures for structure refinement. We use the distributions of distances of various types in known protein structures to obtain the most probable ranges, or the mean-force potentials, for the distances. We then impose these constraints on the structures to be refined, or include the mean-force potentials directly in the energy minimization, so that more plausible structural models can be built. We used this approach successfully in the 2006 CASPR structure refinement experiment (http://predictioncenter.org/caspR).

}, keywords = {Binding Sites, Computer Simulation, Databases, Models, Molecular, Principal Component Analysis, Protein, Protein Conformation, Proteins, Proteins: chemistry}, issn = {1570-0267}, doi = {10.1007/s10969-009-9062-2}, url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3018873\&tool=pmcentrez\&rendertype=abstract}, author = {Andrzej Kloczkowski and Robert L. Jernigan and Zhijun Wu and Guang Song and Lei Yang and Andrzej Koli{\'n}ski and Piotr Pokarowski} } @article {Kawashima2008, title = {AAindex: amino acid index database, progress report 2008}, journal = {Nucleic Acids Research}, volume = {36}, number = {Database issue}, year = {2008}, month = {jan}, pages = {D202{\textendash}5}, abstract = {AAindex is a database of numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids. We have added a collection of protein contact potentials to AAindex as a new section. Accordingly, AAindex now consists of three sections: AAindex1 for the amino acid indices of 20 numerical values, AAindex2 for the amino acid substitution matrices, and AAindex3 for the statistical protein contact potentials. All data are derived from published literature. The database can be accessed through the DBGET/LinkDB system at GenomeNet (http://www.genome.jp/dbget-bin/www\_bfind?aaindex) or downloaded by anonymous FTP (ftp://ftp.genome.jp/pub/db/community/aaindex/).

}, keywords = {Amino Acids, Amino Acids: chemistry, Databases, Internet, Protein, Proteins, Proteins: chemistry}, issn = {1362-4962}, doi = {10.1093/nar/gkm998}, url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2238890\&tool=pmcentrez\&rendertype=abstract}, author = {Kawashima, Shuichi and Piotr Pokarowski and Pokarowska, Maria and Andrzej Koli{\'n}ski and Katayama, Toshiaki and Kanehisa, Minoru} } @article {Pokarowski2007, title = {Ideal amino acid exchange forms for approximating substitution matrices}, journal = {Proteins: Structure, Function, and Bioinformatics}, volume = {69}, year = {2007}, pages = {379{\textendash}393}, abstract = {We have analyzed 29 published substitution matrices (SMs) and, for comparison, five statistical protein contact potentials (CPs). We find that popular, {\textquoteleft}classical{\textquoteright} SMs obtained mainly from sequence alignments of globular proteins are mostly mutually correlated at a level of at least 0.9. BLOSUM62 is the central element of this group. A second group includes SMs derived from alignments of remote homologs or transmembrane proteins. These matrices correlate better with classical SMs (0.8) than among themselves (0.7). A third group consists of intermediate links between SMs and CPs: matrices and potentials that exhibit mutual correlations of at least 0.8. Next, we show that SMs can be approximated with a correlation of 0.9 by expressions c