Studies on Combining Sequence and Structure for Protein Classification






The ultimate goal of our research is to develop a better understanding of how proteins evolve different structures and functions. Large-scale protein clustering can provide a useful platform for identifying such principles of protein evolution. Manual classification schemes group homologous proteins accurately, but they are slow and subjective. Automatic protein clustering methods rely largely on sequence information and therefore often miss remote homologies that can be recognized from structural information. We hypothesized that combining evolutionary signals from protein sequence and 3D structure would improve automated protein classification. To test this hypothesis, we clustered proteins into evolutionary groups using both sequence and structure in a fully automated method. We developed a stringent algorithm, the self-consistency grouping (SCG) method, which clusters proteins only if all proteins in a group are more similar to each other than to proteins outside the group. Comparing SCG and other commonly used clustering methods against a widely accepted manual classification scheme, the Structural Classification of Proteins (SCOP), showed that SCG groups reflect the reference classification more closely. In-depth analysis of SCG clusters highlights new, non-trivial evolutionary links between proteins. SCG clustering can be developed further as a reference for the evolutionary classification of proteins. [Keywords: protein classification; protein evolution; fold change; homology; structural similarity; sequence similarity; bioinformatics; computational biology]
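The self-consistency criterion stated in the abstract can be illustrated with a minimal Python sketch. This is not the SCG implementation: the abstract does not say how similarities are computed or how candidate groups are generated, so the sketch makes several assumptions. It assumes a single symmetric similarity score per protein pair (standing in for a combined sequence and structure score), reads "more similar to each other than to proteins outside the group" as "the weakest within-group similarity exceeds the strongest similarity to any outsider", and proposes candidate groups as connected components at each similarity threshold. The function names `is_self_consistent` and `self_consistent_groups` and the toy matrix are hypothetical and invented for illustration.

```python
from itertools import combinations


def is_self_consistent(group, sim):
    """True if every within-group similarity exceeds every similarity
    between a group member and a protein outside the group.
    (One possible reading of the criterion; an assumption.)"""
    members = set(group)
    outsiders = [p for p in range(len(sim)) if p not in members]
    if len(members) < 2 or not outsiders:
        return False  # singletons and the full set are not informative
    weakest_inside = min(sim[a][b] for a, b in combinations(sorted(members), 2))
    strongest_outside = max(sim[a][o] for a in members for o in outsiders)
    return weakest_inside > strongest_outside


def self_consistent_groups(sim):
    """Propose candidate groups as connected components at each similarity
    threshold and keep those that pass the self-consistency test."""
    n = len(sim)
    thresholds = sorted({sim[a][b] for a, b in combinations(range(n), 2)},
                        reverse=True)
    found = set()
    for t in thresholds:
        unseen = set(range(n))
        while unseen:
            start = unseen.pop()
            component, stack = {start}, [start]
            while stack:  # flood fill over pairs with similarity >= t
                a = stack.pop()
                linked = {b for b in unseen if sim[a][b] >= t}
                unseen -= linked
                component |= linked
                stack.extend(linked)
            if is_self_consistent(component, sim):
                found.add(frozenset(component))
    return sorted(sorted(g) for g in found)


# Toy similarity matrix for five hypothetical proteins (made-up values,
# not data from the study).
sim = [
    [1.00, 0.90, 0.80, 0.20, 0.10],
    [0.90, 1.00, 0.85, 0.30, 0.20],
    [0.80, 0.85, 1.00, 0.25, 0.15],
    [0.20, 0.30, 0.25, 1.00, 0.70],
    [0.10, 0.20, 0.15, 0.70, 1.00],
]

print(self_consistent_groups(sim))  # [[0, 1], [0, 1, 2], [3, 4]]
```

Under this reading, self-consistent groups can nest: proteins 0 and 1 form a tight group inside the larger group of proteins 0, 1, and 2, which gives a hierarchy rather than a flat partition. Whether SCG reports nested groups is not stated in the abstract.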