Introduction to Quantitative Proteomics
Early biochemical proteomics research focused on identifying and understanding the functions of individual proteins or protein complexes. Technological advances in instrumentation, though, have increased the number of proteins that one can analyze in a single sample from hundreds a decade ago to thousands today (1). At this level of analysis, global protein dynamics can be studied on a cellular, tissue or organism level. This type of approach is consistent with the increasingly broad-scope analyses that are being used in other life science fields, including genomics, transcriptomics, metabolomics, and kinomics, to give us a greater understanding of global biological processes and how they respond to different stimuli or change during disease states.
While proteomic analyses can be used to qualitatively identify thousands of proteins in cells or other biological samples, there is also a need to quantitate these proteins. Because of the dynamic and interactive nature of proteins, though, quantitative proteomics is considerably more complex than simply identifying proteins in a sample. But because of the considerable amount of data that one can acquire from quantitative proteomics, this approach is critical for our understanding of global protein kinetics and the molecular mechanisms of biological processes.
Two fundamental approaches to proteomic analyses are currently employed. In top-down proteomics, intact proteins or large fragments are ionized and analyzed by mass spectrometry (MS). Bottom-up proteomics relies on peptides, which are generated by proteolytic digestion of protein samples. Because of the protein size limitation in top-down proteomics (<50 kDa), bottom-up proteomics is more commonly used.
Because of the overwhelming number of proteolytic peptides in a sample, only a small subset of all peptides can be analyzed in a single MS run, which limits the number of proteins in a sample that can be identified. The number of proteins available for quantitation is further limited, because they have to be identified in all samples that are tested in a single experiment. Practically speaking, the linear dynamic range of quantitation is often limited to 10- to 20-fold, depending on the sensitivity of the instrument and the complexity of the sample, which also affects the scope of quantitative proteomics.
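The requirement that a protein be identified in every sample can be pictured as a set intersection. The following is a minimal sketch under that assumption; the function name, sample names and protein IDs are hypothetical and serve only as illustration.

```python
def quantifiable_proteins(identified_by_sample):
    """Return the proteins identified in every sample of an experiment.

    identified_by_sample maps a sample name to the protein IDs found in it.
    """
    id_sets = [set(ids) for ids in identified_by_sample.values()]
    if not id_sets:
        return set()
    return set.intersection(*id_sets)


# Hypothetical identifications from three MS runs.
runs = {
    "control": ["P1", "P2", "P3", "P4"],
    "treated_1h": ["P1", "P2", "P4", "P5"],
    "treated_4h": ["P2", "P4", "P6"],
}
print(sorted(quantifiable_proteins(runs)))  # ['P2', 'P4']
```

Although six distinct proteins are identified across the three runs, only two of them are present in all runs and are therefore available for quantitation in this toy example.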
Sample complexity is a critical factor in peptide quantitation, because identification and quantification rates decline as sample complexity increases. Methods such as affinity purification are often performed to remove high-abundance proteins and reduce sample complexity. In-line liquid chromatography (LC) is also commonly used before MS to chemically separate peptides and further reduce sample complexity.
Quantitative proteomic analyses typically rely on MS to detect and quantitate selected peptides, although tandem mass spectrometry (MS/MS) is required for peptide identification. During the first round of MS (MS1), ionized peptides are sampled to produce a precursor ion spectrum that represents all ionized peptides in the sample. Individual ions are then selected to undergo collision-induced dissociation (CID) and a second round of MS (MS2), which yields a fragment ion spectrum for each precursor ion. These fragment spectra are compared to peptide databases and assigned specific peptide sequences, and the identified peptides are then computationally assembled into the predicted protein sequence.
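To make the comparison of fragment spectra to database peptides more concrete, the sketch below computes the theoretical precursor m/z and the singly charged b- and y-type fragment ions for a candidate sequence. It assumes unmodified peptides and standard monoisotopic residue masses; the function names are hypothetical, and real search engines additionally score the match between theoretical and observed peaks.

```python
# Monoisotopic residue masses in Da for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added to the intact peptide
PROTON = 1.007276   # mass of the charge-carrying proton


def precursor_mz(peptide, charge):
    """m/z of the intact (MS1) peptide ion at the given charge state."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def fragment_ions(peptide):
    """Singly charged b- and y-ion m/z values expected in the MS2 spectrum."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)           # N-terminal fragment
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)  # C-terminal fragment
    return b_ions, y_ions


b, y = fragment_ions("PEPTIDE")
print(round(precursor_mz("PEPTIDE", 1), 4))
print([round(m, 4) for m in b])
print([round(m, 4) for m in y])
```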
Discovery vs. Targeted Proteomics
Strategies to improve the sensitivity and scope of proteomic analysis generally require large sample quantities and multi-dimensional fractionation, which sacrifices throughput. Alternatively, efforts to improve the sensitivity and throughput of protein quantification limit the number of features that can be monitored. For this reason, proteomics research is typically divided into two categories: discovery and targeted proteomics. Discovery proteomics optimizes protein identification by spending more time and effort per sample and reducing the number of samples analyzed. In contrast, targeted proteomics strategies limit the number of features that will be monitored and then optimize the chromatography, instrument tuning and acquisition methods to achieve the highest sensitivity and throughput for hundreds or thousands of samples.
Discovery proteomics experiments are intended to identify as many proteins as possible across a broad dynamic range and often require depletion of highly abundant proteins, enrichment of relevant components (e.g., subcellular compartments or protein complexes) and fractionation to decrease sample complexity (e.g., SDS-PAGE or chromatography). These strategies can reduce the dynamic range between components in a fraction and reduce the competition between proteins or peptides for ionization and MS duty cycle time. Quantitative discovery proteomics experiments add a further challenge, because they seek to identify and quantify protein levels across multiple fractionated samples.
Targeted proteomics experiments are typically designed to quantify fewer than one hundred proteins with very high precision, sensitivity, specificity and throughput. Indeed, this approach typically minimizes the amount of sample preparation to improve precision and throughput. Targeted MS quantitation strategies use specialized workflows and instruments to improve the specificity and quantification of a limited number of features across hundreds or thousands of samples, including directed sequencing by inclusion lists and selected (or multiple) reaction monitoring (SRM or MRM, respectively).
While discovery proteomics is most often used to inventory proteins in a sample or detect differences in the abundance of proteins between multiple samples, targeted quantitative proteomic experiments are increasingly used in pharmaceutical and diagnostic applications to quantify proteins and metabolites in complex samples (1). Additionally, targeted proteomics often follows discovery proteomics to quantitate specific proteins found during discovery screening.
The characteristics of specific mass spectrometers make them more amenable to use with either discovery or targeted proteomic analysis. For example, because discovery proteomics emphasizes identification of all peptides in a limited number of samples, high-resolution instruments, such as Orbitrap, Fourier transform and tandem time-of-flight (TOF/TOF) mass spectrometers, are used to maximize the detection of peptides with minute mass-to-charge ratio (m/z) differences. Conversely, because targeted proteomics emphasizes sensitivity and throughput, instruments including triple quadrupoles, ion traps, quadrupole TOFs (QTOFs) and Q traps are used.
Relative vs. Absolute Quantitation
Mass spectrometry is not inherently quantitative, because proteolytic peptides show great variability in physicochemical properties, which in turn results in variability in mass spectrometric response between runs. Additionally, mass spectrometers only sample a small percentage of the total peptides in a sample (3). Therefore, various approaches have been developed to perform relative and absolute proteomic quantitation.
Relative quantitation strategies compare the levels of individual peptides in a sample to those in an identical, but experimentally modified, sample. One approach for relative quantitation is to analyze samples separately by MS and compare the spectra to determine peptide abundance in one sample relative to another, as in label-free quantitation strategies.
More costly and time-consuming approaches require internal, isotopically labeled standards for the mass spectrometer to distinguish between identical proteins from separate samples. A typical relative quantitation experiment that uses isotopic labels entails labeling proteins or peptides from two experimental samples with isotopically heavy and light atoms (via a labeled amino acid or cell culture component), which makes the peptides in these two samples isotopologues (identical molecules that differ only in isotope composition). After alteration of the proteome in the experimental group through chemical treatment or genetic manipulation, equal amounts of protein from both populations are combined and analyzed by LC-MS or LC-MS/MS analysis. Because the light and heavy forms of individual peptides are chemically identical, they co-elute during LC prefractionation and are therefore detected simultaneously during MS analysis. The peak intensities of the heavy and light peptides are then compared to determine the change in abundance in one sample relative to that of the other sample. Methods to isotopically label proteins or peptides include metabolic labeling of live cells and enzymatic or chemical labeling of extracted proteins or peptides.
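The comparison of heavy and light peak intensities can be sketched as follows. This example assumes that the paired intensities for each peptide have already been extracted from the MS1 spectra, and it summarizes a protein by the median log2 heavy/light ratio of its peptides, which is one common choice rather than a prescribed method. The function name and the intensity values are hypothetical.

```python
import math
from statistics import median


def protein_log2_ratio(peptide_pairs):
    """Median log2(heavy/light) ratio over the peptides of one protein.

    peptide_pairs is a list of (light_intensity, heavy_intensity) tuples.
    Pairs with a missing (zero) intensity are skipped.
    """
    log_ratios = [
        math.log2(heavy / light)
        for light, heavy in peptide_pairs
        if light > 0 and heavy > 0
    ]
    if not log_ratios:
        return None
    return median(log_ratios)


# Three co-eluting light/heavy peptide pairs from one hypothetical protein.
pairs = [(1000.0, 2000.0), (500.0, 1000.0), (200.0, 800.0)]
print(protein_log2_ratio(pairs))  # 1.0, i.e. a two-fold increase in the heavy sample
```

Working on the log2 scale treats two-fold increases and two-fold decreases symmetrically, and the median limits the influence of a single outlying peptide.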
Absolute proteomic quantitation using isotopic peptides entails spiking known concentrations of synthetic, heavy isotopologues of target peptides into an experimental sample and then performing LC-MS/MS. As with relative quantitation using isotopic labels, peptides of equal chemistry co-elute and are analyzed by MS simultaneously. Unlike relative quantitation, though, the abundance of the target peptide in the experimental sample is compared to that of the heavy peptide and back-calculated to the initial concentration of the standard using a pre-determined standard curve to yield the absolute quantitation of the target peptide.
When to Use Relative and Absolute Quantitation Strategies
It may seem obvious that absolute quantitation would be ideal compared to relative quantitation, because the absolute peptide values from different samples could also be compared to determine relative protein changes. Relative proteomic quantitation is used more often than absolute quantitation, though, because costly reagents and time-consuming assay development are required for the absolute quantitation of each protein of interest.
Experimental bias can influence the decision to use relative or absolute quantitation strategies. One source of bias is the mass spectrometer itself, which has a limited capacity to detect low-abundance peptides in samples with a high dynamic range. Additionally, the limited duty cycle of mass spectrometers restricts the number of collisions per unit of time, which may result in an undersampling of complex proteomic samples (2). Another source of bias is variation in sample preparation between experiments or between individual samples in a single experiment. The greater the number of steps between labeling and sample combination, the greater the risk of introducing experimental bias. For example, during metabolic labeling, proteins are labeled in live animals or cells and the samples are then immediately combined. Because all subsequent sample preparation and analysis is performed with the combined samples, metabolic labeling has the lowest risk of experimental variation (3). Conversely, samples that are individually processed and analyzed in label-free quantitation strategies have a greater risk of sample variation and experimental bias.
Label-free methods for both relative and absolute quantitation have been developed as a rapid and low-cost alternative to other quantitative proteomic approaches. These strategies are ideal for large-sample analyses in clinical screening or biomarker discovery experiments (17), and while they are good at measuring large changes in protein expression, they are less reliable for measuring small changes and can have a limited range of linear quantitative measurement (<2 orders of magnitude) (18).
Unlike other quantitation methods, label-free samples are separately collected, prepared and analyzed by LC-MS or LC-MS/MS. Because of this, label-free quantitation experiments need to be more carefully controlled than stable isotope methods to account for any experimental variations. Protein quantitation is performed using either ion peak intensity or spectral counting.
Relative quantitation by ion peak intensity relies on LC-MS only (no MS/MS). The direct MS m/z values for all ions are detected and their signal intensities at a particular time recorded. The signal intensity from electrospray ionization has been reported to highly correlate with ion concentration, and therefore the relative peptide levels between samples can be determined directly from these peak intensities (19,20). Because of the large amount of data collected from these experiments, sensitive computer algorithms are required for automated ion peak alignment and comparison.
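As a rough illustration of quantitation by ion peak intensity, the sketch below integrates the signal of one ion over its elution profile in two separate runs and compares the resulting peak areas. It assumes that peak detection and retention-time alignment between runs have already been done, which is the difficult part handled by the alignment algorithms mentioned above. The function names and the data points are hypothetical.

```python
def peak_area(times, intensities):
    """Area under an extracted ion chromatogram using the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += 0.5 * (intensities[i] + intensities[i - 1]) * width
    return area


def relative_abundance(run_a, run_b):
    """Ratio of the peak area in run_b to the peak area in run_a.

    Each run is a (times, intensities) tuple for the same aligned ion.
    """
    return peak_area(*run_b) / peak_area(*run_a)


# Hypothetical elution profiles (retention time in minutes, arbitrary intensity).
control = ([20.0, 20.1, 20.2, 20.3, 20.4], [0.0, 400.0, 1000.0, 400.0, 0.0])
treated = ([20.0, 20.1, 20.2, 20.3, 20.4], [0.0, 800.0, 2000.0, 800.0, 0.0])
print(round(relative_abundance(control, treated), 3))  # 2.0
```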
Label-free relative quantitation by spectral counts entails comparing the sum of the MS/MS spectra from a given peptide across multiple samples, which has been shown to directly correlate with protein abundance (19). Unlike quantitation by peak intensity, spectral counting does not require special algorithms or other tools, although significant normalization is required (21,22).
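One widely used normalization for spectral counts is the normalized spectral abundance factor (NSAF), in which each protein's spectral count is divided by its length and then by the sum of these length-corrected values over all proteins in the run. The sketch below assumes that spectral counts and protein lengths are already available; the function name and the values are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factors for the proteins in one run.

    spectral_counts maps protein ID -> number of MS/MS spectra assigned to it.
    lengths maps protein ID -> protein length in amino acids.
    """
    # Spectral abundance factor: longer proteins yield more peptides,
    # so counts are corrected for length.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    # Dividing by the run total makes values comparable between runs.
    return {p: value / total for p, value in saf.items()}


counts = {"protA": 10, "protB": 10}
lengths = {"protA": 100, "protB": 200}
for protein, value in nsaf(counts, lengths).items():
    print(protein, round(value, 3))
```

In this toy example both proteins have the same spectral count, but the shorter protein receives twice the abundance factor of the longer one.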
Besides relative quantitation, label-free methods can be used to determine the absolute concentration of proteins in a sample. One method uses the exponentially modified protein abundance index (emPAI), which estimates protein abundance from the number of peptides detected relative to the number of theoretically observable tryptic peptides for each protein, to determine the approximate absolute protein abundance in large-scale proteomic analyses (17,23,24). Another method, absolute protein expression (APEX), is based on spectral counts and uses correction factors to make protein abundance proportional to the number of peptides observed.
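The emPAI calculation itself is short. The protein abundance index (PAI) is the number of observed peptides divided by the number of observable peptides, emPAI is 10 raised to the power of PAI minus one, and the molar fraction of each protein is estimated as its emPAI divided by the sum of all emPAI values. The sketch below follows these definitions; the function names and peptide counts are hypothetical.

```python
def empai(n_observed, n_observable):
    """Exponentially modified protein abundance index: 10**PAI - 1."""
    pai = n_observed / n_observable
    return 10 ** pai - 1


def protein_content_mol_percent(empai_values):
    """Estimated molar percentage of each protein in the mixture."""
    total = sum(empai_values.values())
    return {p: 100.0 * value / total for p, value in empai_values.items()}


# Hypothetical (observed, observable) tryptic peptide counts per protein.
peptide_counts = {"protA": (10, 10), "protB": (5, 10), "protC": (2, 20)}
indices = {p: empai(obs, total) for p, (obs, total) in peptide_counts.items()}
for protein, percent in protein_content_mol_percent(indices).items():
    print(protein, round(indices[protein], 3), round(percent, 1))
```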
There are multiple methods of metabolic (in vivo) labeling, and selection criteria include the extent of labeling required. Metabolic labeling for relative proteomic quantitation was first reported by Oda et al., who uniformly labeled all amino acids in yeast with heavy nitrogen (15N) by growing yeast in culture medium where the only nitrogen source was 15N-labeled ammonium persulfate (4).
This approach was further developed for use in mammalian cell lines by Mann et al., who reported a method for stable isotope labeling by amino acids in cell culture (SILAC), which has become the most common approach for in vivo isotopic labeling (5). Instead of labeling all amino acids with heavy nitrogen, cells are cultured in growth medium that contains 13C6-lysine and/or 13C6-arginine. These amino acids were chosen because trypsin, the predominant enzyme used to generate proteolytic peptides for MS analysis, cleaves at the C-terminus of lysine and arginine. Thus, all tryptic peptides from cultures grown in SILAC media (except for the very C-terminal peptides) have at least one labeled amino acid, which results in a constant mass increment in labeled samples over non-labeled, yet otherwise identical, samples.
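The constant mass increment can be illustrated with a simplified in silico digest. The sketch below cleaves after every lysine and arginine (it ignores missed cleavages and the reduced cleavage before proline) and adds about 6.0201 Da for each 13C6-labeled residue, which is six times the 13C to 12C mass difference of 1.00335 Da. The function names and the protein sequence are hypothetical.

```python
# Mass added by replacing six 12C atoms with 13C (6 x 1.0033548 Da).
HEAVY_SHIFT = {"K": 6.020129, "R": 6.020129}


def tryptic_peptides(protein):
    """Cleave C-terminal to every K or R (simplified trypsin rule)."""
    peptides, current = [], ""
    for aa in protein:
        current += aa
        if aa in "KR":
            peptides.append(current)
            current = ""
    if current:  # the protein's C-terminal peptide may lack K or R
        peptides.append(current)
    return peptides


def silac_mass_shift(peptide):
    """Mass difference in Da between the heavy and light form of a peptide."""
    return sum(HEAVY_SHIFT.get(aa, 0.0) for aa in peptide)


for pep in tryptic_peptides("MAGKLLDRSTE"):
    shift = silac_mass_shift(pep)
    # At charge state 2 the observed m/z spacing is half the mass shift.
    print(pep, round(shift, 4), round(shift / 2, 4))
```

The last peptide in this example carries no lysine or arginine and therefore shows no mass shift, which matches the exception for C-terminal peptides noted above.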
There are many benefits of using metabolic labeling strategies compared to other methods of quantitation. For one, proteins can often attain >90% isotopic incorporation in immortalized cell lines after 6-8 passages (5). Because heavy and light samples are combined before sample preparation for MS analysis, the level of quantitation bias from processing errors is low. This key aspect of metabolic labeling makes this method particularly useful to detect relatively small changes in protein levels or post-translational modifications between experimental conditions.
A limitation of this approach is that some cells convert high concentrations of arginine to proline, which in the case of heavy arginine labeling produces two distinct heavy peak clusters that represent heavy arginine- or proline-labeled peptides. This issue can be addressed by either accounting for the heavy proline in the quantitation calculation or by titrating the heavy arginine concentration in the culture medium to below the threshold at which conversion is detectable. Metabolic labeling may not be amenable to cell lines that are difficult to grow or show extreme sensitivity to changes in culture medium composition. This technique also may influence how the organism functions, as growth conditions are changed to allow incorporation of heavy compounds (6). Finally, the number of experimental conditions per experiment is restricted when using metabolic labeling because of the limited number of heavy isotopes incorporated into lysine and arginine. For example, a maximum of 3 conditions per experiment (unlabeled, 13C6- and 13C6 15N4-labeled amino acids) can be performed with SILAC.
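One simple way to account for heavy proline in the quantitation calculation is to add the signal of the heavy proline satellite peaks back to the main heavy peak before computing the heavy/light ratio. The sketch below assumes that those satellite intensities have already been extracted for the peptide; the function name and the intensities are hypothetical, and other correction schemes exist.

```python
def corrected_ratio(light, heavy, heavy_proline_satellites=()):
    """Heavy/light ratio after restoring signal lost to heavy proline peaks.

    heavy_proline_satellites holds the intensities of the additional heavy
    peak clusters produced when labeled arginine is converted to proline.
    """
    heavy_total = heavy + sum(heavy_proline_satellites)
    return heavy_total / light


# Uncorrected, this peptide appears to decrease in the heavy sample.
print(corrected_ratio(1000.0, 800.0))                 # 0.8
# Including two satellite clusters restores a ratio of 1.0.
print(corrected_ratio(1000.0, 800.0, [150.0, 50.0]))  # 1.0
```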
For samples that are not amenable to metabolic labeling, such as when analyzing clinical samples (e.g., biological fluids, tissue samples) or when experimental time is limited, chemical or enzymatic stable isotopic labeling methods are available for quantitative proteomic analyses. These include strategies to add isotopic atoms or isotope-coded tags to peptides or proteins. While the methods described below do not comprise an exhaustive list of isotopic labeling methods, they do represent commonly used approaches.
Enzymatic labeling with 18O takes advantage of the proteolytic mechanism of trypsin to incorporate two heavy oxygen atoms from 18O-labeled water (H2 18O) at the C-terminus of every newly digested peptide (7). In this labeling scheme, one sample is digested with trypsin and 18O water and another with 16O water, and then the samples are combined for relative proteomic analysis by MS. While this method is simple to execute, a disadvantage is a slow back exchange of 18O and 16O when the two samples are combined, leading to incomplete labeling or peptides labeled with only one heavy oxygen atom. While adding 1-5% formic acid can attenuate this back exchange for up to 24 hours, samples labeled with this method should be processed rapidly (8).
Another isotopic labeling strategy is global internal standard technology (GIST), which uses deuterated (2H) acylating agents such as N-acetoxysuccinimide (NAS) to label primary amino groups on digested peptides (9). Acylation of these groups, though, changes the ionic states of peptides and may affect the ionization efficiency of peptides with C-terminal lysines (10). Additionally, isotopic methods that label with deuterium result in partial separation of heavy and light peptides during LC, because deuterated peptides interact slightly differently with the stationary phase (e.g., C18). This difference can affect the confidence and accuracy of the internal standards, because one of them may co-elute with another peptide that inhibits its ionization.
A rapid and relatively inexpensive method of chemical labeling is stable isotope dimethylation. This approach uses formaldehyde in deuterated water to label primary amines with deuterated methyl groups (10). Unlike GIST, this approach does not change the ionic state of the labeled peptides because of the reductive amination that occurs, so their chemical properties remain the same as those of unlabeled peptides.
A benefit of this approach is that a wide array of sample types is amenable to formaldehyde labeling, which is fast and cheap compared to other labeling reagents. Like some other labeling methods, this method labels peptides globally, which has both pros and cons. While this high level of isotopic labeling is beneficial when other labeling strategies fail, it requires either relatively pure samples or sample preparation that reduces the complexity of biological samples to minimize the number of peaks detected by MS.
Commercial isotopic labeling reagents are also available with a wide range of reactive groups for different labeling specificities and with different heavy labels to suit the application and the isotopologue separation required.
The isotope-coded affinity tag (ICAT) method was developed to reduce sample complexity and identify low-abundance proteins and peptides in complex samples (11). ICAT tags originally consisted of a sulfhydryl-reactive chemical crosslinking group, an 8-fold deuterated (d8; adds 8 Da to the molecular mass of the unlabeled peptide) or light (d0) linker region and a biotin molecule. Because of the sulfhydryl-reactive chemical group, only free thiols on cysteine residues are labeled with this tag. Once peptides are labeled, the sample is passed over immobilized avidin or streptavidin, which binds to the biotin tag and purifies the labeled peptides from the sample. Because not all peptides have cysteine residues, this method does not result in global labeling and thus inherently reduces sample complexity. After purification, heavy (d8) and light (d0) samples are combined and analyzed for relative quantitation by LC-MS.
This method is ideal for complex samples, because only cysteine residues are tagged and labeled peptides are affinity purified, which significantly reduces sample complexity. ICAT labeling does have a bias against proteins and peptides that lack cysteine residues, which is considerable compared to proteins that lack lysine residues. For example, 14% of Escherichia coli (E. coli) open reading frames (ORFs) do not code for cysteines, while only 0.8% do not code for lysine (although half of those could still be tagged because of terminal amines) (12). This difference in amino acid availability should be considered when determining the right isotopic labeling method to use for quantitative proteomic analyses. The group that originally developed ICAT reagents also later developed ICAT tags that contain 13C instead of deuterium to circumvent the issue of partial peak separation during LC (14).
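The amino acid availability of a proteome can be checked before choosing a labeling chemistry by counting how many protein sequences lack the targeted residue. The sketch below is a minimal illustration; the function name is hypothetical and the sequences are toy examples, not real E. coli ORFs.

```python
def fraction_lacking(sequences, residue):
    """Fraction of protein sequences that contain no copy of `residue`."""
    if not sequences:
        return 0.0
    missing = sum(1 for seq in sequences if residue not in seq)
    return missing / len(sequences)


# Toy one-letter protein sequences for illustration only.
toy_proteome = ["MKTAYIAK", "MSCKLV", "MGGLAPK", "MRRCAV"]
print(fraction_lacking(toy_proteome, "C"))  # 0.5, invisible to cysteine tagging
print(fraction_lacking(toy_proteome, "K"))  # 0.25, no lysine side chain to label
```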
Although affinity purification of ICAT-labeled peptides reduces sample complexity by 10-fold, the cysteine-specific labeling method also reduces protein sequence coverage by the same factor (13). Because of this limitation, isotope-coded protein labeling (ICPL) was developed, in which lysine residues and available N-termini on intact proteins are isotopically labeled with a heavy (d4) or light (d0) tag. This approach increases the level of labeling, because significantly more amino groups are available than cysteine residues. Also, ICPL is amenable to a greater level of pre-MS fractionation than other labeling methods, because sample complexity can be reduced at both the protein level (before digestion; electrophoresis or LC) and the peptide level (after digestion; LC). ICPL also allows the simultaneous comparison of three experimental conditions in a single experiment with two heavy tags (d7 and d3) and the d0 light tag. This multiplex capability distinguishes ICPL from ICAT and the other labeling methods listed above.
Unlike isotopic tags that have the potential to separate during LC elution, isobaric tags have identical masses and chemical properties, so differentially labeled peptides co-elute. The tags are then cleaved from the peptides by collision-induced dissociation (CID) during MS/MS, which is required for this type of quantitative proteomic analysis. Indeed, these tags were originally called tandem mass tags to indicate their use with tandem mass spectrometry (6). After CID, the peptide fragment ions are analyzed for sequence assignment and the isobaric tags are quantitated, resulting in concurrent peptide identification and relative quantitation. Additionally, because MS/MS is required to detect the isobaric tags, unlabeled peptides are not quantitated.
A benefit of isobaric mass tags is the multiplex capability and thus the increased throughput potential of this approach. Commercially available isobaric mass tags (e.g., TMT, iTRAQ) allow the simultaneous analysis of 4, 6 or 8 biological samples. While the exact tags used vary depending on the manufacturer, all isobaric mass tag reagents consist of a mass reporter (tag) that has a unique number of 13C substitutions and a mass normalizer that has a unique mass that balances the mass of the tag to make all of the tags equal in mass. Isobaric mass tags also have a reactive moiety that crosslinks to primary amines or cysteines (depending on the product used). These tags are designed so that the mass tag is cleaved at a specific linker region upon high-energy CID (HCD), yielding the different-sized tags that are then quantitated by LC-MS/MS. Isobaric mass tagging has also been adapted for use with protein labeling (similar to ICPL). Some commercially available kits also offer isobaric tags with sulfhydryl reactivity and an anti-TMT antibody for affinity purification of cysteine-tagged peptides prior to LC-MS/MS.
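Once the reporter tags have been measured in each MS/MS spectrum, relative quantitation reduces to comparing reporter intensities across channels. The sketch below assumes that equal amounts of each sample were combined, so that the summed reporter signal should be the same in every channel, and it corrects for loading differences on that basis before computing ratios to a reference channel. The function names, channel labels and intensities are hypothetical.

```python
def normalize_channels(reporter_table):
    """Scale each channel so that all channels have the same total signal.

    reporter_table is a list of dicts (one per MS/MS spectrum) that map a
    channel label to the reporter ion intensity measured in that spectrum.
    """
    channels = list(reporter_table[0])
    totals = {c: sum(row[c] for row in reporter_table) for c in channels}
    mean_total = sum(totals.values()) / len(channels)
    factors = {c: mean_total / totals[c] for c in channels}
    return [{c: row[c] * factors[c] for c in channels} for row in reporter_table]


def ratios_to_reference(row, reference):
    """Relative abundance of each channel versus the reference channel."""
    return {c: value / row[reference] for c, value in row.items()}


# Two spectra from a hypothetical 3-plex experiment; channel "128" was
# loaded at twice the amount of the others.
spectra = [
    {"126": 100.0, "127": 100.0, "128": 200.0},
    {"126": 300.0, "127": 600.0, "128": 600.0},
]
for row in normalize_channels(spectra):
    print({c: round(v, 2) for c, v in ratios_to_reference(row, "126").items()})
```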
Selected reaction monitoring (SRM) or multiple reaction monitoring (MRM) is a method of absolute quantitation (also termed AQUA) in targeted proteomics analyses that is performed by spiking complex samples with stable isotope-labeled synthetic peptides that act as internal standards for specific peptides (15). These heavy peptides are designed to be identical to tryptic peptides generated by sample digestion, so that they co-elute with the target peptide and are concomitantly analyzed by MS/MS (using instrumentation with a large dynamic range). The target peptide concentration is then determined by measuring the observed signal response for the target peptide relative to that of the heavy peptide, the concentration of which is calculated from a pre-determined calibration-response curve. While this method yields absolute peptide concentrations in as few as one sample, calibration curves have to be generated for each target peptide in the sample.
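The back-calculation from a calibration curve can be sketched as follows. This example assumes one common arrangement: a constant amount of heavy peptide is spiked into a dilution series of light peptide at known concentrations, a straight line is fitted to the light/heavy peak area ratios, and the line is then inverted for an unknown sample. The function names, concentrations and peak areas are hypothetical, and real assays also need to verify linearity and limits of quantitation.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_ratio(light_area, heavy_area, slope, intercept):
    """Invert the calibration line to obtain the target peptide concentration."""
    ratio = light_area / heavy_area
    return (ratio - intercept) / slope


# Calibration series: known light peptide concentrations (fmol/uL) and the
# light/heavy area ratios measured with a constant heavy spike.
known_conc = [1.0, 2.0, 4.0, 8.0]
area_ratio = [0.1, 0.2, 0.4, 0.8]
slope, intercept = fit_line(known_conc, area_ratio)

# Unknown sample measured with the same heavy spike.
print(round(concentration_from_ratio(50.0, 100.0, slope, intercept), 3))  # 5.0
```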
Assay development is a significant part of SRM proteomic analyses. Heavy peptides for each of the target peptides must be synthesized, and because proteins yield multiple peptides with varying electrochemical characteristics, the heavy peptide sequences that will yield the optimal results must be identified. Software is used to help predict the ideal tryptic peptide sequences, but the combination of trial-and-error peptide identification and instrumentation optimization makes absolute quantitation using isotopic peptides time-consuming and costly. Once the assay is optimized for a predetermined set of peptides (up to approx. 200 per LC-MS run; 15), though, SRM offers the highest level of reproducibility and sensitivity in detecting these peptides in multiple samples. This approach has been reported to detect proteins with concentrations less than 50 copies per cell in unfractionated lysates (16), demonstrating that it is the quantitative approach that is the least affected by sample complexity (1).
AQUA-grade peptides are costly because of their high quality and purity, and therefore scientists often use low-quality crude peptides during targeted assay development. Entire libraries of different peptide sequences can be commercially synthesized and screened during assay development to identify the optimum peptides, which are then synthesized at the AQUA purity and quality standards for SRM assays.
References
1. Ahrens C. H. et al. (2010) Generating and navigating proteome maps using mass spectrometry. Nat Rev Mol Cell Biol. 11, 789-801.
2. Prakash A. et al. (2007) Assessing bias in experiment design for large scale mass spectrometry-based quantitative proteomics. Mol Cell Proteomics. 6, 1741-8.
3. Bantscheff M. et al. (2007) Quantitative mass spectrometry in proteomics: A critical review. Anal Bioanal Chem. 389, 1017-31.
4. Oda Y. et al. (1999) Accurate quantitation of protein expression and site-specific phosphorylation. Proc Natl Acad Sci U S A. 96, 6591-6.
5. Ong S. E. et al. (2002) Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol Cell Proteomics. 1, 376-86.
6. Thompson A. et al. (2003) Tandem mass tags: A novel quantification strategy for comparative analysis of complex protein mixtures by MS/MS. Anal Chem. 75, 1895-904.
7. Mirgorodskaya O. A. et al. (2000) Quantitation of peptides and proteins by matrix-assisted laser desorption/ionization mass spectrometry using 18O-labeled internal standards. Rapid Commun Mass Spectrom. 14, 1226-32.
8. Stewart I. I. et al. (2001) 18O labeling: A tool for proteomics. Rapid Commun Mass Spectrom. 15, 2456-65.
9. Chakraborty A. and Regnier F. E. (2002) Global internal standard technology for comparative proteomics. J Chromatogr A. 949, 173-84.
10. Hsu J. L. et al. (2003) Stable-isotope dimethyl labeling for quantitative proteomics. Anal Chem. 75, 6843-52.
11. Gygi S. P. et al. (1999) Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat Biotechnol. 17, 994-9.
12. Schmidt A. et al. (2005) A novel strategy for quantitative proteomics using isotope-coded protein labels. Proteomics. 5, 4-15.
13. Gygi S. P. et al. (2002) Proteome analysis of low-abundance proteins using multidimensional chromatography and isotope-coded affinity tags. J Proteome Res. 1, 47-54.
14. Yi E. C. et al. (2005) Increased quantitative proteome coverage with 13C/12C-based, acid-cleavable isotope-coded affinity tag reagent and modified data acquisition scheme. Proteomics. 5, 380-7.
15. Kuhn E. et al. (2004) Quantification of C-reactive protein in the serum of patients with rheumatoid arthritis using multiple reaction monitoring mass spectrometry and 13C-labeled peptide standards. Proteomics. 4, 1175-86.
16. Picotti P. et al. (2009) Full dynamic range proteome analysis of S. cerevisiae by targeted proteomics. Cell. 138, 795-806.
17. Zhu W. et al. (2010) Mass spectrometry-based label-free quantitative proteomics. J Biomed Biotechnol. 2010, 840518.
18. Old W. M. et al. (2005) Comparison of label-free methods for quantifying human proteins by shotgun proteomics. Mol Cell Proteomics. 4, 1487-502.
19. Liu H. et al. (2004) A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Anal Chem. 76, 4193-201.
20. Voyksner R. D. and Lee H. (1999) Investigating the use of an octupole ion guide for ion storage and high-pass mass filtering to improve the quantitative performance of electrospray ion trap mass spectrometry. Rapid Commun Mass Spectrom. 13, 1427-37.
21. Florens L. et al. (2006) Analyzing chromatin remodeling complexes using shotgun proteomics and normalized spectral abundance factors. Methods. 40, 303-11.
22. Zybailov B. et al. (2006) Statistical analysis of membrane proteome expression changes in Saccharomyces cerevisiae. J Proteome Res. 5, 2339-47.
23. Rappsilber J. et al. (2002) Large-scale proteomic analysis of the human spliceosome. Genome Res. 12, 1231-45.
24. Ishihama Y. et al. (2005) Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein. Mol Cell Proteomics. 4, 1265-72.