“Cells function as a remarkably synchronized orchestra of finely tuned molecular interactions, and establishing this molecular network has become a major goal of molecular biology,” states the team of Dr. Alfonso Valencia, of the Spanish National Cancer Research Center (CNIO). This is the opening sentence of an article recently published in the journal Proceedings of the National Academy of Sciences, in which this group of researchers showed that it is possible to understand numerous interactions among human proteins by analyzing how collections of homologous molecule sequences have evolved in prokaryote organisms.
Despite their importance in the function of our cells, tissues and systems, and the key role they can play in the onset of certain diseases, only a few thousand of the nearly 200,000 protein interactions estimated to exist in human beings have been characterized at the molecular level. Advances in electron microscopy and crystallography have made it possible to study the molecular details of some of these interactions. However, when that information is missing or incomplete, the recommended predictive technique is comparative modeling of proteins.
The importance of revealing protein-protein interactions
Specifically, as explained in the article, “homology modeling of protein-protein interactions follows a conservation-based approach, in which the quaternary structure of one or more experimentally solved complexes with enough sequence similarity to a target complex is projected onto the target.” Nonetheless, according to a study published in the Journal of Molecular Biology, there is a “twilight zone” in sequence similarity. In this region, sequences may be related, but the relationship cannot be detected through alignment. Yet variability among sequences can become a powerful source of information if the coevolution of residues is analyzed. That is precisely the approach taken in the CNIO team’s study, which used massive sequencing and coevolution to unlock some of the secrets of interactomics.
“The interacting proteins tend to undergo coordinated evolutionary changes that maintain this interaction, despite the accumulation of mutations over time,” says David Juan, of CNIO’s Structural Biology and Biocomputing Program. The group’s research aimed to determine whether strong coevolutionary signals identify highly conserved contacts between proteins, which would make them an appropriate tool for homology-based projection. For their analysis, the researchers took over 4,500 heterodimeric complexes from the Pfam collection whose three-dimensional structure had already been solved. They then developed a protocol that, for every pair of Pfam domains, used the HMMER software to determine whether at least one member of each heterodimer was encoded in a database of over 15,000 prokaryotic genomes. This search identified 559 protein-protein interactions with a high number of nonredundant sequences, which were then aligned to determine their degree of divergence and coevolution.
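To make the idea of a coevolutionary signal concrete, here is a minimal sketch in Python. It is not the method used in the study, whose scoring statistic is not described here. It swaps in plain mutual information between alignment columns as a simple stand-in, and the function names and toy sequences are hypothetical. It assumes that row k of each alignment comes from the same genome, so the two domains are paired sequence by sequence.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must cover the same sequences")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def inter_domain_scores(msa_a, msa_b):
    """Score every column pair (i in domain A, j in domain B), best first.

    msa_a and msa_b are lists of equal-length aligned sequences; row k of
    both alignments is assumed to come from the same genome.
    """
    if len(msa_a) != len(msa_b):
        raise ValueError("alignments must be paired row by row")
    cols_a = list(zip(*msa_a))  # transpose rows into columns
    cols_b = list(zip(*msa_b))
    scores = [
        (mutual_information(ca, cb), i, j)
        for i, ca in enumerate(cols_a)
        for j, cb in enumerate(cols_b)
    ]
    scores.sort(key=lambda item: item[0], reverse=True)
    return scores


# Toy paired alignments (hypothetical data): column 1 of domain A
# changes together with column 0 of domain B; all other columns are constant.
domain_a = ["AKL", "AEL", "AKL", "AEL"]
domain_b = ["DV", "KV", "DV", "KV"]
best_score, best_i, best_j = inter_domain_scores(domain_a, domain_b)[0]
print(best_i, best_j, best_score)
```

In this toy case the highest-scoring pair is the one whose residues change together, which is the kind of position pair that would be checked against contacts in a solved three-dimensional structure. Real analyses need far more sequences and additional corrections, for example for shared ancestry among the genomes, which this sketch leaves out.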
The approach, as described in the article, is based on “the analysis of coevolving residues between domains in 15,271 prokaryotic genomes and their homologous sites in 3D structures of eukaryotic complexes.” The researchers showed that human protein interactions can be inferred with great accuracy from bacterial genome data obtained through massive sequencing. The few errors found in the projections between eukaryotic and prokaryotic structures could be detected by checking the quality of the sequence alignment. The analysis allowed the CNIO scientists to predict over 31,000 experimentally known interactions between human proteins, which account for 15% of the human interactome. According to the authors of the study published in PNAS, this suggests that large-scale prediction of contact and interaction points between eukaryotic molecules is viable.
The researchers demonstrated that massive sequencing and coevolution help reveal molecular details of the interactions between human proteins, through the systematic identification of contacts between prokaryotic residues that are structurally conserved in eukaryotes. Their approach can predict interactions between human proteins about which little or nothing was previously known, using only the mutational patterns drawn from the massive analysis of prokaryotic genomes. Even in cases of high divergence between sequences, the so-called twilight zone in which homology modeling is unreliable, the method implemented by the CNIO can be applied to better understand different aspects of protein-protein interactions.
Although bacteria and human beings are separated by more than 3 billion years of evolution, the computational analysis of large amounts of biological data has yielded information of great interest for both basic and applied research. “Knowing more about these interactions opens the door to the attainment of three-dimensional models that are useful to design drugs that target relevant interactions in a number of tumors,” says David Juan. Advances in massive sequencing allow researchers to gather an ever-increasing amount of data. With these data, it is becoming possible to build more complex statistical models and to provide a more comprehensive view of biological systems.