Sequence and Structure Based Protein Folding Studies With Implications
As the expression of the genetic blueprint, proteins are at the heart of all biological systems. The ever-increasing set of available protein structures has taught us that diversity is the hallmark of their architecture, a fundamental characteristic that enables them to perform the vast array of functions upon which all of life depends. This diversity, however, is central to one of the most challenging problems in molecular biology: how does a folding polypeptide chain navigate through the myriad possible conformations to find its own particular biologically active form? Few overarching structural principles apply to all protein architectures, and the search for a solution to the protein folding problem has yet to produce an algorithm that can explain and reproduce this fundamental biological process. In this thesis, we take a two-pronged approach to investigating the protein folding process. Our initial statistical studies of the distributions of hydrophobic and hydrophilic residues within α-helices and β-sheets suggest (i) that hydrophobicity plays a critical role in helix and sheet formation, and (ii) that the nucleation of these motifs may result in largely unidirectional growth. Most tellingly, an examination of the amino acids found in the smallest β-sheets provides no evidence of a β-nucleating code in the primary protein sequence. Complementing these statistical analyses, we have analyzed the structural environments of progressively wider aspects of protein topology. Our examination of the gaps between strands in the smallest β-sheets reveals a common organizational principle underlying the formation of sheets whose strands are separated by large sequential gaps: with very few exceptions, these large gaps fold into single, compact structural modules, bringing β-strands that are otherwise far apart in the sequence close together in space.
We conclude, therefore, that β-nucleation in the smallest sheets results from the co-location of two strands that are either local in sequence or local in space following prior folding events. A second study of larger β-sheets both corroborates and extends these findings: virtually all large sequential gaps between pairs of β-strands organize themselves into a hierarchical arrangement, yielding a bread-crumb model of go-and-come-back structural organization that ultimately brings two strands of a parental β-structure, far apart in the sequence, into close spatial proximity. In a final study, we have formalized this go-and-come-back notion as the concept of anti-parallel double-strandedness (DS) and measured this property across protein architecture in general. With over 90% of all residues in a large, non-redundant set of protein structures classified as DS, we conclude that DS is a unifying structural principle underpinning all globular proteins. We postulate, moreover, that this one simple principle, anti-parallel double-strandedness, unites protein structure, protein folding, and protein evolution.