%0 Journal Article
%J BMC Bioinformatics
%D 2016
%T Protein secondary structure prediction using a small training set (compact model) combined with a Complex-valued neural network approach
%A Shamima Rashid
%A Saras Saraswathi
%A Andrzej Kloczkowski
%A Suresh Sundaram
%A Andrzej Koliński
%X Background: Protein secondary structure prediction (SSP) has been an area of intense research interest. Despite advances in recent methods conducted on large datasets, the estimated upper limit accuracy is yet to be reached. Since the predictions of SSP methods are applied as input to higher-level structure prediction pipelines, even small errors may have large perturbations in final models. Previous works relied on cross validation as an estimate of classifier accuracy. However, training on large numbers of protein chains compromises the classifier ability to generalize to new sequences. This prompts a novel approach to training and an investigation into the possible structural factors that lead to poor predictions. Here, a small group of 55 proteins termed the compact model is selected from the CB513 dataset using a heuristics-based approach. In a prior work, all sequences were represented as probability matrices of residues adopting each of Helix, Sheet and Coil states, based on energy calculations using the C-Alpha, C-Beta, Side-chain (CABS) algorithm. The functional relationship between the conformational energies computed with CABS force-field and residue states is approximated using a classifier termed the Fully Complex-valued Relaxation Network (FCRN). The FCRN is trained with the compact model proteins. Results: The performance of the compact model is compared with traditional cross-validated accuracies and blind-tested on a dataset of G Switch proteins, obtaining accuracies of ∼81 %. The model demonstrates better results when compared to several techniques in the literature.
A comparative case study of the worst performing chain identifies hydrogen bond contacts that lead to Coil ⇔ Sheet misclassifications. Overall, mispredicted Coil residues have a higher propensity to participate in backbone hydrogen bonding than correctly predicted Coils. Conclusions: The implications of these findings are: (i) the choice of training proteins is important in preserving the generalization of a classifier to predict new sequences accurately and (ii) SSP techniques sensitive in distinguishing between backbone hydrogen bonding and side-chain or water-mediated hydrogen bonding might be needed in the reduction of Coil ⇔ Sheet misclassifications.
%B BMC Bioinformatics
%V 17
%8 2016
%G eng
%N 362

%0 Journal Article
%J Journal of Molecular Modeling
%D 2013
%T Distributions of amino acids suggest that certain residue types more effectively determine protein secondary structure
%A Saras Saraswathi
%A J. L. Fernández-Martínez
%A Andrzej Koliński
%A Robert L. Jernigan
%A Andrzej Kloczkowski
%X Exponential growth in the number of available protein sequences is unmatched by the slower growth in the number of structures. As a result, the development of efficient and fast protein secondary structure prediction methods is essential for the broad comprehension of protein structures. Computational methods that can efficiently determine secondary structure can in turn facilitate protein tertiary structure prediction, since most methods rely initially on secondary structure predictions. Recently, we have developed a fast learning optimized prediction methodology (FLOPRED) for predicting protein secondary structure (Saraswathi et al. in JMM 18:4275, 2012). Data are generated by using knowledge-based potentials combined with structure information from the CATH database.
A neural network-based extreme learning machine (ELM) and advanced particle swarm optimization (PSO) are used with this data to obtain better and faster convergence to more accurate secondary structure predicted results. A five-fold cross-validated testing accuracy of 83.8 % and a segment overlap (SOV) score of 78.3 % are obtained in this study. Secondary structure predictions and their accuracy are usually presented for three secondary structure elements: α-helix, β-strand and coil but rarely have the results been analyzed with respect to their constituent amino acids. In this paper, we use the results obtained with FLOPRED to provide detailed behaviors for different amino acid types in the secondary structure prediction. We investigate the influence of the composition, physico-chemical properties and position specific occurrence preferences of amino acids within secondary structure elements. In addition, we identify the correlation between these properties and prediction accuracy. The present detailed results suggest several important ways that secondary structure predictions can be improved in the future that might lead to improved protein design and engineering.
%B Journal of Molecular Modeling
%V 19
%P 4337-48
%8 2013 Oct
%G eng
%N 10
%R 10.1007/s00894-013-1911-z

%0 Journal Article
%J Journal of Molecular Modeling
%D 2012
%T Fast learning optimized prediction methodology (FLOPRED) for protein secondary structure prediction
%A Saras Saraswathi
%A Juan Luis Fernandez Martinez
%A Andrzej Koliński
%A Robert L. Jernigan
%A Andrzej Kloczkowski
%K knowledge-based potentials
%K learning
%K machine
%K neural networks
%K particle swarm optimization
%K protein secondary structure prediction
%X
Computational methods are rapidly gaining importance in the field of structural biology, mostly due to the explosive progress in genome sequencing projects and the large disparity between the number of sequences and the number of structures. There has been an exponential growth in the number of available protein sequences and a slower growth in the number of structures. There is therefore an urgent need to develop computational methods to predict structures and identify their functions from the sequence. Developing methods that will satisfy these needs both efficiently and accurately is of paramount importance for advances in many biomedical fields, including drug development and discovery of biomarkers. A novel method called fast learning optimized prediction methodology (FLOPRED) is proposed for predicting protein secondary structure, using knowledge-based potentials combined with structure information from the CATH database. A neural network-based extreme learning machine (ELM) and advanced particle swarm optimization (PSO) are used with this data that yield better and faster convergence to produce more accurate results. Protein secondary structures are predicted reliably, more efficiently and more accurately using FLOPRED. These techniques yield superior classification of secondary structure elements, with a training accuracy ranging between 83 % and 87 % over a wide range of hidden neurons and a cross-validated testing accuracy ranging between 81 % and 84 % and a segment overlap (SOV) score of 78 % that are obtained with different sets of proteins. These results are comparable to other recently published studies, but are obtained with greater efficiencies, in terms of time and cost.
%B Journal of Molecular Modeling
%V 18
%P 4275–89
%G eng
%U http://www.ncbi.nlm.nih.gov/pubmed/22562230
%R 10.1007/s00894-012-1410-7

%0 Conference Proceedings
%B International Conference on Fuzzy Computation and 2nd International Conference on Neural Computation
%D 2010
%T Protein secondary structure prediction using knowledge-based potentials
%A Saras Saraswathi
%A Robert L. Jernigan
%A Andrzej Kloczkowski
%A Andrzej Koliński
%P 370-375
%8 2010
%G eng