CHAPTER 3
Introduction to Proteins: The Primary Level of Protein Structure

We have seen that one class of biopolymers, the nucleic acids, stores and transmits the genetic information of the cell. Much of that information is expressed in another class of biopolymers, the proteins. Proteins play an enormous variety of roles: some carry out the transport and storage of small molecules; others make up a large part of the structural framework of cells and tissues. Muscle contraction, the immune response, and blood clotting are all mediated by proteins. An important class of proteins is the enzymes, the catalysts that promote the tremendous variety of reactions that are required to support the living state. Each type of cell in every organism has several thousand kinds of proteins to serve these many functions.

In keeping with the multiplicity of their functions, proteins are extremely complex molecules. This complexity is illustrated in Figure 3.1, which depicts the molecular structure of myoglobin, a relatively small protein that functions primarily in oxygen binding and storage in animal tissues. In this and the following three chapters, we analyze in detail the structures and functions of a handful of proteins, including myoglobin. We will see that although there are general features of protein structure shared by most proteins, each protein has a distinct structure that is optimally suited to its function. Protein structures may appear at first glance to be hopelessly complex; however, there is an elegant and readily comprehensible logic to protein structure, which we will describe here and in Chapter 4. We begin with a description of the simple “building blocks” that are found in all proteins: the amino acids.

Amino Acids

Structure of the α-Amino Acids

All proteins are polymers, and the monomers that combine to make them are α-amino acids. A general representation of an α-amino acid is shown in Figure 3.2(a).
The amino group is attached to the α-carbon, the carbon next to the carboxylic acid group; hence the name α-amino acid. To the α-carbon of every amino acid are also attached a hydrogen atom and a side chain (“R” group). Different α-amino acids are distinguished by their different side chains.

Protein function is determined by protein structure, which, in turn, is determined by the structures and properties of the various amino acids that make up the protein.

We can write the general structure for an α-amino acid as shown in Figure 3.2(a). This representation, although chemically correct, ignores the conditions in vivo. Most biochemistry occurs in the physiological pH range near neutrality. The pKa values of the carboxylic acid and amino groups of the α-amino acids are about 2 and 10, respectively. Therefore, near neutral pH the carboxylic acid group will have lost a proton, and the amino group will have picked up a proton, to yield the zwitterion form shown in Figure 3.2(b). This is the form in which we will customarily write amino acid structures.

Twenty different kinds of amino acids are commonly incorporated into proteins during the process of translation. The complete structures of these amino acids are shown in Figure 3.3 and other important data are given in Table 3.1. At least two additional amino acids, selenocysteine and pyrrolysine, are encoded genetically and incorporated into proteins; however, they are found in a relatively small number of proteins. For the purposes of this introductory discussion we will focus our attention on the twenty common amino acids shown in Figure 3.3.

a Approximate values found for side chains on the free amino acids.
b To obtain the mass of the amino acid itself, add the mass of a molecule of water, 18.02 daltons. The values given are for neutral side chains; slightly different values will apply at pH values where protons have been gained or lost from the side chains.
c Average for a large number of proteins.
Individual proteins can show large deviations from these values.
Data from Journal of Chemical Information and Modeling 50:690–700, J. M. Otaki, M. Tsutsumi, T. Gotoh, and H. Yamamoto, Secondary structure characterization based on amino acid composition and availability in proteins. © 2010 American Chemical Society. W. P. Jencks and J. Regenstein (1976) Ionization constants of acids and bases, in Handbook of Biochemistry and Molecular Biology, 3rd ed., G. Fasman (ed.), CRC Press, Boca Raton, FL.

Figure 3.3 The 20 common amino acids found in proteins. The 20 common α-amino acids that are incorporated into proteins are shown in Fischer projections and arranged here in the order in which they are discussed in the text. The “side chain” or “R-group” of each amino acid is highlighted in orange. Below each amino acid are its name, its three-letter abbreviation, and its one-letter abbreviation.
ALIPHATIC AMINO ACIDS: Glycine (Gly) G; Alanine (Ala) A; Valine (Val) V; Leucine (Leu) L; Isoleucine (Ile) I
AMINO ACIDS WITH HYDROXYL- OR SULFUR-CONTAINING SIDE CHAINS: Serine (Ser) S; Cysteine (Cys) C; Threonine (Thr) T; Methionine (Met) M
CYCLIC AMINO ACID: Proline (Pro) P
AROMATIC AMINO ACIDS: Phenylalanine (Phe) F; Tyrosine (Tyr) Y; Tryptophan (Trp) W
BASIC AMINO ACIDS: Histidine (His) H; Lysine (Lys) K; Arginine (Arg) R
ACIDIC AMINO ACIDS AND THEIR AMIDES: Aspartic acid (Asp) D; Glutamic acid (Glu) E; Asparagine (Asn) N; Glutamine (Gln) Q

Stereochemistry of the α-Amino Acids

The asymmetry of biomolecules plays a critical role in determining their structures and functions; thus, familiarity with the basic stereochemistry of amino acids is necessary for an understanding of the biochemistry of proteins. The four groups shown in Figure 3.2(a) are bonded to the central α-carbon in a tetrahedral arrangement, as is predicted for an sp3-hybridized carbon atom.
In Figure 3.2 the projection of these groups around the α-carbon (Cα) is represented as follows: the lines represent bonds in the plane of the page, the solid wedges represent bonds projecting forward from the page, and the dashed wedges represent bonds projecting behind the page. When a carbon atom has four different substituents attached to it, it is said to be chiral, or a stereocenter, or, preferably, an asymmetric carbon. In Figure 3.3, the stereochemistry of the amino acids is shown by a convention known as the Fischer projection. In a Fischer projection the bonds are all represented as solid lines, where the horizontal bonds project forward from the page and the vertical bonds project behind the page. To help you visualize the Fischer projection convention, we have drawn in Figure 3.4 the general structure of an amino acid in a ball-and-stick rendering as well as a Fischer projection that includes solid and dashed wedges. Note that the spatial orientation of the four groups bound to the Cα is the same in Figures 3.2 to 3.4. If a molecule contains one asymmetric carbon, two distinguishable stereoisomers exist; these are ...

... advantage over the D-isomers for biological function. Indeed, D-amino acids exist in nature, and some play important biochemical roles (some examples are given in Table 3.2), but they are rarely found in proteins. Many scientists have attempted to provide explanations for this “handedness preference” in biology. Most point to an intrinsic asymmetry in the behavior of subnuclear particles, a kind of asymmetry that gives electrons emitted in β decay a preferential left-hand spin. Such influences are very weak but might, in a competition between primitive organisms using L- or D-proteins, give a slight advantage to one or the other. After billions of generations, even a small advantage can become overwhelming.
Using peptide synthesis methods, it is possible to chemically synthesize proteins using all D-amino acids. These structures are the mirror images of the corresponding natural proteins. One such D-protein synthesized in the laboratory of Stephen Kent is the mirror image of a protease (a protein-cleaving enzyme) from the human immunodeficiency virus, HIV. Whereas its natural L-counterpart cleaves natural L-proteins, this synthetic enzyme will cleave only those containing D-amino acids. The results of this experiment suggest that life would be possible for cells that made proteins from only D-amino acids rather than L-amino acids. The preference for L-amino acids in natural proteins has two important consequences, which we will discuss further in subsequent chapters:

1. The surface of any given protein, which is where the interesting biochemistry occurs, is asymmetric. This asymmetry is the basis for the highly specific molecular recognition of binding targets by proteins.

The stereochemistry of the amino acids plays an important role in the formation of the structure of proteins.

...

The product of this oxidation is given the name cystine. We do not list it among the 20 amino acids because cystine is always formed by oxidation of two cysteine side chains and is not coded for by DNA. Such disulfide bonds often play an important role in stabilizing the structure of a protein.

Aromatic Amino Acids

Three amino acids, phenylalanine, tyrosine, and tryptophan, carry aromatic side chains. Phenylalanine, together with the aliphatic amino acids valine, leucine, and isoleucine, is one of the most hydrophobic amino acids. Tyrosine and tryptophan have some hydrophobic character as well, but it is tempered by the polar groups in their side chains. In addition, tyrosine can ionize at high pH:

Tyrosine side chain: –CH2–C6H4–OH ⇌ –CH2–C6H4–O− + H+ (pKa = 10.1)

The aromatic amino acids, like most highly conjugated compounds, absorb light in the near-ultraviolet region of the spectrum (Figure 3.6). This characteristic is frequently used for the detection and/or quantitation of proteins, by measuring absorption at 280 nm.

Figure 3.6 Absorption spectra of the aromatic amino acids in the near-ultraviolet region. Tryptophan (red; λmax = 278 nm) and tyrosine (blue; λmax = 274 nm) account for most of the UV absorbance by proteins in the region around 280 nm. Phenylalanine (black; λmax = 258 nm) does not absorb at 280 nm. Note that the absorptivity scale is logarithmic. Compared with nucleic acids, amino acids absorb only weakly in the UV; see Figure 2.5 for comparison. (Axes: wavelength, nm; molar absorptivity, M−1·cm−1.) Reprinted from Advances in Protein Chemistry 17:303–390, D. B. Wetlaufer, Ultraviolet spectra of proteins and amino acids. © 1962, with permission from Elsevier.

a Values outside these ranges are observed. For example, side chain carboxyls have been reported with pKa values as high as 7.3.

Basic Amino Acids

Histidine, lysine, and arginine carry basic groups in their side chains. They are represented in Figure 3.3 in the form that predominates at pH 7. Histidine is the least basic of the three, and the imidazole ring in the side chain of the free amino acid loses its proton at about pH 6 (pKa values for the side chains of free amino acids are given in Table 3.1). Lysine and arginine are more basic amino acids, and as their pKa values indicate (Tables 3.1 and 3.3), their side chains are almost always positively charged under physiological conditions. The guanidino group of arginine is a particularly strong base due to the resonance stabilization of the protonated side chain. The basic amino acids are strongly polar, and as a consequence they are usually found on the exterior surfaces of proteins, where they can be hydrated by the surrounding aqueous environment.
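The ionization statements made so far (α-carboxyl and α-amino pKa values of about 2 and 10, a free histidine side chain that loses its proton at about pH 6, and tyrosine ionizing with pKa = 10.1) can be checked numerically. The sketch below is illustrative only. It assumes the Henderson-Hasselbalch relationship, which is not derived in this section, and the function name fraction_protonated and the choice of pH 7.0 are ours rather than the text's.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Fraction of an ionizable group present in its protonated (acid) form.

    Assumes Henderson-Hasselbalch: pH = pKa + log10([base]/[acid]),
    which rearranges to [acid] / ([acid] + [base]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Approximate pKa values quoted in the text for free amino acids.
GROUPS = {
    "alpha-carboxyl": 2.0,
    "alpha-amino": 10.0,
    "histidine imidazole": 6.0,
    "tyrosine hydroxyl": 10.1,
}

if __name__ == "__main__":
    for name, pka in GROUPS.items():
        frac = fraction_protonated(pka, ph=7.0)
        print(f"{name:20s} pKa {pka:4.1f}: {frac:.5f} protonated at pH 7.0")
```

Under these assumptions the α-carboxyl group is almost entirely deprotonated at pH 7 and the α-amino group is about 99.9% protonated, which is consistent with the zwitterion form, whereas a free histidine side chain is only about 9% protonated.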
Acidic Amino Acids and Their Amides

Aspartic acid and glutamic acid typically carry negative charges at pH 7; they are depicted in the anionic forms in Figure 3.3. The pKa values of the acidic amino acids are so low (see Table 3.3) that even when the amino acids are incorporated into proteins, the negative charge on the side chain is typically retained under physiological conditions. Hence, these amino acid residues are often referred to as aspartate and glutamate (i.e., the conjugate bases rather than the acids). Companions to aspartic and glutamic acids are their amides, asparagine and glutamine. Unlike their acidic analogs, asparagine and glutamine have uncharged polar side chains. Like the basic and acidic amino acids, they are hydrophilic and tend to be on the surface of a protein molecule, in contact with the surrounding water.

Rare Genetically Encoded Amino Acids

We have considered the 20 common amino acids that are coded for in DNA and are incorporated directly into proteins by ribosomal synthesis. There are two other amino acids encoded in gene sequences: selenocysteine, which is widely distributed but found in few proteins, and pyrrolysine, which is restricted to a few archaea and eubacteria. Selenocysteine (“Sec”) and pyrrolysine (“Pyl”) are sometimes referred to, respectively, as the 21st and 22nd amino acids (Figure 3.7). Selenocysteine is a structural analog of cysteine in which the sulfur atom is replaced by a selenium atom. In prokaryotes selenoproteins are typically involved in catabolic processes, whereas in eukaryotes the roughly 25 selenoproteins characterized to date appear to be anabolic and/or antioxidant catalysts. Pyrrolysine is a derivative of lysine in which a 4-methyl-pyrroline-5-carboxylic acid forms an amide bond with the ε-amino group of the lysine side chain. Pyrrolysine is found in the active sites of several archaeal enzymes involved in the catabolism of methylamine.

Figure 3.7 Structures of selenocysteine and pyrrolysine. The selenium atom in selenocysteine and the 4-methyl-pyrroline-5-carboxylic acid in pyrrolysine are highlighted in red.

Modified Amino Acids

The repertoire of side chain groups in proteins can be expanded beyond the 20 canonical structures described above by the chemical modification of certain amino acids after they are assembled into proteins. The structures of four such post-translationally modified amino acids are depicted below, with the modifying group shown in red.

Structures: Phosphoserine; 4-Hydroxyproline; δ-Hydroxylysine; γ-Carboxyglutamic acid

We shall consider these again when we encounter specific proteins in which such modification has occurred.

Nonpolar amino acids are typically found in the interiors of soluble proteins, whereas polar and charged amino acids are typically found on the surfaces of proteins.

The amino acids found in proteins are by no means the only ones to occur in living organisms. Many other amino acids play important roles in metabolism. A partial list is given in Table 3.2. Note that not all of them are α-amino acids or L-enantiomers.

Sometimes amino acid side chains are modified after incorporation into a protein, a process called “post-translational modification.”

Peptides and the Peptide Bond

Peptides

Amino acids can be covalently linked together by formation of an amide bond between the α-carboxylic acid group on one amino acid and the α-amino group on another. This bond is often referred to as a peptide bond, and the products formed by such a linkage are called peptides. The formation of a peptide bond between glycine and alanine is shown in Figures 3.8 and 3.9. The product in this case is called a dipeptide because two amino acids have been combined. As illustrated in Figure 3.10, the reaction can be viewed as a simple elimination of a water molecule between the carboxylic acid of one amino acid and the amino group of the other. Note that amide bond formation leaves an H3N+– group available on one end of the dipeptide and a –COO− group on the other; thus, the reaction could in principle be continued by adding, for example, glutamic acid to one end and lysine to the other to yield the tetrapeptide shown in Figure 3.10. As each amino acid is added to the chain, another molecule of water must be eliminated. The portion of each amino acid remaining in the chain is called an amino acid residue.

Oligopeptides and polypeptides are formed by polymerization of amino acids. All proteins are polypeptides.

When specifying an amino acid residue in a peptide, the suffix -yl may be used to replace -ine or -ate in the name of the amino acid (e.g., glycyl for glycine, aspartyl for aspartate; tryptophanyl and cysteinyl are exceptions to this general rule). Thus, the alanyl residue in the tetrapeptide in Figure 3.10 is

–NH–CH(CH3)–CO–

In the structure of a peptide we distinguish the side chains (i.e., the R groups in Figure 3.3) from the main chain (or peptide backbone), which is composed of the atoms that make up the peptide bonds: the α-NH, the Cα, and the α-C=O groups of each amino acid residue in the peptide. The N-terminal amino group and C-terminal carboxylate are also part of the main chain.

Chains containing only a few amino acid residues (like a tetrapeptide) are collectively referred to as oligopeptides. If the chain is longer (>15–20 residues), it is called a polypeptide. Polypeptides greater than ~50 residues are generally referred to as proteins (note that most globular proteins contain 250–600 amino acid residues).
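Because one molecule of water is eliminated for each peptide bond formed, the mass of a linear peptide equals the sum of the free amino acid masses minus 18.02 daltons per bond. The following is a minimal sketch of that bookkeeping, not a method taken from the text. The function name peptide_mass is hypothetical, the free amino acid masses are commonly tabulated average values that do not appear in this section, and the residue order chosen for the tetrapeptide is an assumption made for illustration.

```python
WATER_MASS = 18.02  # daltons; the value the text gives for one water molecule

# Assumed average masses (daltons) of the free amino acids. These are
# illustrative values supplied here, not numbers taken from this section.
FREE_AMINO_ACID_MASS = {
    "Gly": 75.07,
    "Ala": 89.09,
    "Glu": 147.13,
    "Lys": 146.19,
}


def peptide_mass(residues: list[str]) -> float:
    """Mass of a linear peptide: the sum of the free amino acid masses
    minus one water for each of the (n - 1) peptide bonds formed."""
    if not residues:
        raise ValueError("a peptide needs at least one residue")
    total = sum(FREE_AMINO_ACID_MASS[name] for name in residues)
    return total - WATER_MASS * (len(residues) - 1)


if __name__ == "__main__":
    dipeptide = ["Gly", "Ala"]                    # glycylalanine
    tetrapeptide = ["Glu", "Gly", "Ala", "Lys"]   # assumed residue order
    print(f"Gly-Ala: {peptide_mass(dipeptide):.2f} Da")
    print(f"Glu-Gly-Ala-Lys: {peptide_mass(tetrapeptide):.2f} Da")
```

With these assumed masses, glycylalanine comes to about 146.14 daltons (one water lost) and the tetrapeptide to about 403.42 daltons (three waters lost). The same subtraction explains why residue masses are one water lighter than the corresponding free amino acids.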
Figure 3.8 Formation of a dipeptide. Here the dipeptide glycylalanine (Gly–Ala) is depicted as being formed by removal of a water molecule as glycine is linked to alanine (see Figure 3.9).

As shown in Figure 3.10, most oligopeptides and polypeptides retain an unreacted amino group at one end (called the amino terminus or N-terminus) and an unreacted carboxylic acid group at the other end (the carboxyl terminus or C-terminus). Exceptions are certain small cyclic oligopeptides, in which the