Isotopic labeling plays an indispensable role in structure determination of proteins and other biomacromolecules using solid-state NMR. It not only enhances the NMR sensitivity but also allows for site-specific interrogation of structures and intermolecular contacts. This article gives a survey of the different isotopic labeling approaches available today for biological solid-state NMR research.
The simplest and most cost-effective biosynthetic labeling method for protein solid-state NMR is to uniformly label all carbon and nitrogen atoms with 13C and 15N. In this way, a single protein sample can in principle provide all the structural constraints (dihedral angles and distances) for the protein. The labeled precursors are typically uniformly (U) 13C-labeled glucose or glycerol, and 15N-labeled ammonium chloride or ammonium sulfate. These compounds can be readily incorporated into the growth media for protein expression. Uniform 13C, 15N-labeling has seen the most widespread application in the development of new magic-angle-spinning (MAS) multidimensional correlation techniques for full structure determination of proteins. A number of microcrystalline proteins whose structures are known from X-ray crystallography or solution NMR have been used to demonstrate the ability of solid-state NMR to obtain de novo three-dimensional structures. These microcrystalline proteins include ubiquitin1,2, GB13,4, thioredoxin5, and the α-spectrin SH3 domain6. Uniform 13C and 15N labeling has also been used effectively in structure determination of amyloid fibril proteins, such as transthyretin7, the HET-s prion protein8, and a human prion protein9. A common feature of the proteins amenable to this labeling scheme is that they possess sufficient structural order on the nanometer scale to give highly resolved spectra. Without this high conformational homogeneity and the resulting high spectral resolution, uniform 13C labeling is not recommended since it would cause considerable spectral congestion. Various 2D, 3D1,10,11, and 4D12 correlation techniques have been developed to resolve the signals of uniformly 13C, 15N-labeled proteins and to determine internuclear distances and dihedral angles.
Uniform 13C and 15N labeling has also been applied to a handful of membrane proteins, such as potassium ion channels13, seven-transmembrane-helix proteins14,15, light-harvesting complexes16, membrane-bound enzymes17, and bacterial toxins18. Since membrane proteins usually have larger conformational disorder than microcrystalline proteins or fibril-forming proteins, the spectral resolution of membrane proteins is generally lower. Nevertheless, detailed structural information about key regions of these membrane proteins, or about the global topology of membrane proteins in the lipid bilayer, such as their depth of insertion, could still be obtained even using uniformly 13C, 15N-labeled samples.
The main spectroscopic challenges involved in MAS NMR of uniformly 13C-labeled proteins are three-fold: 1) the limited dispersion of 13C isotropic chemical shifts given the inhomogeneous linewidths of the sample; 2) the 13C-13C scalar couplings that contribute to line broadening; and 3) the dipolar truncation effect that makes it difficult to measure long-range 13C-13C distances in the presence of strong one-bond 13C-13C dipolar couplings. Static 15N NMR of oriented membrane peptides and proteins does not face these challenges, since the spectral dispersion is determined by the much larger anisotropic chemical shift range rather than the isotropic chemical shift range, and because there is no 15N-15N scalar coupling nor any sizeable 15N-15N dipolar coupling in proteins. Therefore, uniform 15N labeling entails few complications for orientation determination of membrane proteins and indeed has seen fruitful applications19,20. On the other hand, it is clearly desirable to increase the information content of the aligned sample spectra by including 13C dimensions. New spectroscopic challenges need to be overcome in 13C NMR of oriented membrane proteins. For example, 13C-13C dipolar couplings of U-13C-labeled proteins are no longer removed by MAS in these static samples. Strategies for decoupling the 13C-13C couplings and for correlation experiments under the static condition have been proposed and demonstrated on single-crystal model compounds21. Random fractional 13C labeling, which strikes a compromise between resolution and structural information, has also been proposed22.
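The dipolar truncation problem named above can be made concrete with a rough estimate of how the 13C-13C dipolar coupling constant falls off with distance. The sketch below is illustrative only and is not taken from the cited work: the function name, the rounded physical constants, and the example distances (a typical 1.54 Å C-C bond and a 5 Å long-range contact) are assumptions chosen for the illustration.

```python
import math

# Physical constants (rounded; treat as approximate values).
MU0_OVER_4PI = 1.0e-7        # T*m/A, magnetic constant divided by 4*pi
HBAR = 1.054571817e-34       # J*s, reduced Planck constant
GAMMA_13C = 6.728284e7       # rad s^-1 T^-1, gyromagnetic ratio of 13C


def dipolar_coupling_hz(gamma_i, gamma_j, r_angstrom):
    """Magnitude of the dipolar coupling constant in Hz for two spins
    separated by r_angstrom, using d = (mu0/4pi) * |gi*gj| * hbar / (2*pi*r^3)."""
    r_m = r_angstrom * 1.0e-10  # convert Angstrom to meters
    return MU0_OVER_4PI * abs(gamma_i * gamma_j) * HBAR / (2.0 * math.pi * r_m ** 3)


if __name__ == "__main__":
    # Hypothetical example distances: one-bond C-C versus a long-range contact.
    for r in (1.54, 2.5, 5.0):
        d = dipolar_coupling_hz(GAMMA_13C, GAMMA_13C, r)
        print(f"r = {r:4.2f} A  ->  13C-13C coupling ~ {d:7.1f} Hz")
```

Because the coupling scales as 1/r^3, the one-bond coupling (on the order of 2 kHz) exceeds a 5 Å coupling (tens of Hz) by more than an order of magnitude, which is why the strong directly bonded pair dominates the spin dynamics in a uniformly 13C-labeled protein.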
Two of the three challenges listed above for studying U-13C labeled proteins are nicely addressed by the complementary approach of selective 13C labeling. In this approach, carbon precursors that contain only specific 13C-labeled sites are incorporated into the protein expression media. These labeled sites are converted, through well-known enzymatic pathways23, to predictable positions in the twenty amino acids, resulting in selectively and extensively labeled proteins. All residues of the same amino acid type have the same labeled positions, but different amino acids have different labeled positions due to their distinct enzymatic pathways.
The two main precursors that have been demonstrated are [2-13C] glycerol, which primarily labels the Cα carbons of amino acids, and [1,3-13C] glycerol, which labels the other sites skipped by [2-13C] glycerol. Each precursor tends to label alternating carbons, thus removing any sizeable 13C-13C scalar couplings and the trivial one-bond dipolar couplings. This selective labeling approach was originally proposed by LeMaster and Kushlan for solution NMR studies and subsequently adopted for solid-state NMR24-26. By far the most important application of selective 13C labeling is distance extraction from 13C-13C correlation spectra. Other amino acid precursors can in principle also be exploited, for example, oxaloacetate, α-ketoglutarate, and pyruvate, as has been done in protein solution NMR. In addition, 13C-labeled carbon dioxide has been used for studying plant cell wall proteins27,28.
Another strategy to reduce the spectral congestion without resorting to amino-acid-specific labeling is to combine a labeled general carbon precursor with unlabeled amino acids, so that only a subset of amino acid types will be labeled. For membrane protein structural studies, one version of this strategy is the TEASE (ten-amino-acid-selective-and-extensive) labeling protocol25. In this approach, [2-13C] glycerol and ten unlabeled amino acids serve as the carbon precursors of the expression media. The ten amino acids are Glu, Gln, Pro, Arg, Asp, Asn, Met, Thr, Ile, and Lys, which are products of the citric acid cycle. Normally, the cycle distributes the 13C labels in glucose or glycerol to produce fractionally labeled sites in these amino acids, so that their signals are more difficult to assign in the NMR spectra than amino acids synthesized from the glycolysis pathway. Due to the approximate hydrophobic versus hydrophilic distinction of the amino acids from the glycolysis pathway versus the citric acid cycle, a membrane protein could in principle be TEASE 13C-labeled to selectively detect the transmembrane segments rich in the hydrophobic residues.
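The residue-type selection implied by the TEASE protocol can be sketched in a few lines. The example below is a hypothetical illustration, not part of the published protocol: the function name and the test sequence are invented, and it simply assumes that the ten amino acids supplied in unlabeled form carry no 13C label while the remaining ten standard amino acids acquire label from [2-13C] glycerol.

```python
# The ten amino acids supplied unlabeled in the TEASE medium, as listed in the
# text: Glu, Gln, Pro, Arg, Asp, Asn, Met, Thr, Ile, Lys (one-letter codes).
TEASE_UNLABELED = frozenset("EQPRDNMTIK")

# The twenty standard amino acids (one-letter codes).
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")

# Assumption: every standard amino acid not supplied unlabeled is 13C-labeled.
TEASE_LABELED = STANDARD_AMINO_ACIDS - TEASE_UNLABELED


def tease_labeled_positions(sequence):
    """Return (1-based position, residue) pairs expected to carry 13C label
    under the simplifying assumption stated above."""
    return [
        (index + 1, residue)
        for index, residue in enumerate(sequence.upper())
        if residue in TEASE_LABELED
    ]


if __name__ == "__main__":
    # Hypothetical sequence: a hydrophobic stretch flanked by charged residues.
    example = "KKGALLVAWLGEE"
    labeled = tease_labeled_positions(example)
    print("Labeled residue types:", "".join(sorted(TEASE_LABELED)))
    print("Labeled positions:", labeled)
```

For a sequence like the one above, the flanking Lys and Glu residues drop out while the hydrophobic core remains labeled, which mirrors the idea of selectively detecting transmembrane segments.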
Clearly, this reverse labeling approach is highly flexible and can be adapted for different applications. For example, a U-13C-labeled precursor can be combined with a small set of unlabeled amino acids that are dominant in the protein. Unlabeling of these amino acid types simplifies the NMR spectra considerably14, and does not bring any disadvantages to the protein expression.
Site-specific 13C and 15N labeling continues to provide rich structural information about polypeptides that are too small to be recombinantly expressed or proteins that are too large for uniformly 13C-labeled spectra to be analyzable. For polypeptides shorter than 40 amino acids, chemical synthesis is generally feasible, so 13C, 15N-labeled amino acids in their protected forms can be incorporated into the peptide synthesis for site-specific labeling.
A common site-specific amino acid labeling strategy is the scattered uniform 13C, 15N-labeling of residues. As long as the yield of the peptide synthesis is not prohibitively low, the combination of several samples with different U-13C, 15N-labeled residues can eventually map out the complete structure of the polypeptide of interest. This approach has been used extensively to study amyloid peptides29 and membrane peptides30-32. Non-uniform 13C and 15N labeling of specific amino acid residues has also been applied. The most commonly labeled sites are the 13CO of the polypeptide backbone, and sometimes the sidechain 15N of lysine residues. Applications usually involve distance measurements using heteronuclear REDOR33 or homonuclear 13C recoupling34 experiments.
Since most peptides are synthesized using the Fmoc solid-phase chemistry, site-specific amino acid labeling requires Fmoc-protected amino acids. For hydrophobic amino acids, the Fmoc-protected forms are usually commercially available and can also be synthesized readily from their unprotected forms. On the other hand, polar amino acids require both backbone and sidechain protection, and are thus more costly and difficult to prepare. While Fmoc solid-phase synthesis is the dominant chemistry in peptide synthesis, t-Boc solid-phase synthesis has also been used for interesting structure determination targets35. t-Boc-protected 13C, 15N-labeled amino acids are so far much less common. Therefore, increased commercial production and availability of t-Boc-protected amino acids are desirable.
For large macromolecular complexes such as the cell walls of plants and bacteria, and for membrane proteins bound to ligands or inhibitors, it is often important to increase the diversity of isotopic labeling to enable intermolecular distance measurements. Two isotopes are readily available for this purpose: 2H and 19F. 19F is naturally 100% abundant and has a long history of being incorporated into amino acids36-38 as well as non-peptidic molecules such as lipids and pharmaceutical drugs39. Site-specific 2H labeling is most commonly used for methyl groups of Ala, Leu, and Val, and is an excellent probe of the dynamics of proteins40,41 and DNA42. More recently, perdeuteration of proteins in combination with uniform 13C and 15N labeling has been exploited as a means to obtain high-resolution spectra of proteins, as perdeuteration removes 1H dipolar coupling as a line broadening mechanism. The back-exchanged proteins have 1H spins only at exchangeable positions such as the amide hydrogens and lysine amino groups. These sparse protons can be used as a high-sensitivity detection nucleus. Perdeuterated microcrystalline proteins have been used to study relaxation dynamics of proteins and protein-water interactions43-45.
To produce 13C/15N/2H triply labeled recombinant proteins, one needs to use 2H and 13C labeled glucose, which is commercially available. The main challenge in this type of protein expression is for the cells to tolerate a water-deuterated liquid culture, which usually decreases the protein expression yield.
Isotopic labeling is an essential and versatile tool for NMR structural biology. Creative labeling of NMR-sensitive nuclei (13C, 15N, and 2H), combined with strategic exploitation of naturally 100% abundant nuclei such as 19F and 31P, can advance the structural biology of many insoluble macromolecules important in biology.
For future progress in solid-state NMR structural biology, it will be important to develop a more diverse panel of isotopically labeled compounds and to produce the existing compounds more economically. Since biosynthetically obtained 13C-labeled precursors are ubiquitous and relatively simple to produce, one of the future challenges is a chemical one: to produce a diverse array of specifically labeled amino acids and other small biomolecules with isotopic labels at desired positions.