Mass spectrometry (MS) is fast becoming an important tool for protein research. The method measures the mass-to-charge ratio (m/z) of peptide and protein fragments and compares the resulting MS data with existing database records. From these comparisons, probable sequences and protein abundances within samples are estimated. In their comprehensive review of MS methods in use today, Bruce et al. (2013) discuss how these methods integrate with existing databases and software algorithms to make proteomic research more efficient.1
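As a worked illustration of the m/z quantity, the following minimal Python sketch computes the monoisotopic mass of an unmodified peptide and its m/z at a given positive charge state. It is not taken from the review: the function names and the example sequence PEPTIDE are illustrative assumptions, and the residue masses are the standard monoisotopic values for the 20 common amino acids.

```python
# Standard monoisotopic residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added for the free N- and C-termini
PROTON = 1.007276   # mass of the charge carrier in positive-ion mode


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mz(sequence: str, charge: int) -> float:
    """m/z of the peptide carrying `charge` protons (assumes charge >= 1)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (peptide_mass(sequence) + charge * PROTON) / charge


if __name__ == "__main__":
    for z in (1, 2, 3):
        print(f"PEPTIDE, charge {z}+: m/z = {mz('PEPTIDE', z):.4f}")
```

The same peptide appears at a different m/z for each charge state, which is why search software has to consider charge when comparing measured values with database entries.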
Discovery proteomics is often the first step in analyzing a sample, whether it is tissue or another type of biological material. This stage often involves establishing and validating protocols to enhance protein recovery and processing, so that the data gathered are representative. The authors review two approaches: bottom-up (shotgun) and top-down proteomics. Both produce a protein profile from MS analysis, of peptides generated by enzymatic digestion in the bottom-up approach or of intact proteins from an undigested sample in the top-down approach. Some protocols add liquid chromatography (LC) fractionation prior to MS (LC-MS/MS).
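To make the enzymatic digestion step of the bottom-up approach concrete, here is a minimal Python sketch of an in silico digest. It assumes trypsin as the enzyme (the text above does not name one) and uses the commonly cited rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). The function name and the toy sequence are hypothetical.

```python
from typing import List


def tryptic_digest(sequence: str, missed_cleavages: int = 0) -> List[str]:
    """Return peptides from a simplified in silico tryptic digest.

    Cleaves C-terminal to K or R, except when the next residue is P.
    `missed_cleavages` allows peptides that span up to that many
    uncut sites, as happens when digestion is incomplete.
    """
    if not sequence:
        return []

    # Collect cut positions, including both ends of the protein.
    sites = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))

    peptides = []
    for start in range(len(sites) - 1):
        last_end = min(start + 1 + missed_cleavages, len(sites) - 1)
        for end in range(start + 1, last_end + 1):
            peptides.append(sequence[sites[start]:sites[end]])
    return peptides


if __name__ == "__main__":
    toy_protein = "AAKRPGGRCCK"  # hypothetical sequence, for illustration only
    print(tryptic_digest(toy_protein))
    print(tryptic_digest(toy_protein, missed_cleavages=1))
```

Real digestion software handles further details such as other enzymes, length filters and modifications; this sketch only shows the basic idea of turning a protein sequence into the peptides that MS would measure.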
Search engines such as SEQUEST (Thermo Scientific) compare MS data with existing peptide or spectral databases to generate probable sequences for each peptide mass detected. From these sequences, the proteins originally present are deduced by running the data through protein inference engines. Other computational programs can predict post-translational modifications such as phosphorylation. Since samples are usually run as replicates in LC-MS/MS analysis, vast data sets are generated. Many of these are deposited in repositories such as Human Proteinpedia and PeptideAtlas, making the data available for collaborative projects.
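The two steps described in this paragraph, matching measured masses against a database and then deducing which proteins were present, can be sketched in a few lines of Python. This is a deliberately simplified toy and not how SEQUEST or any specific inference engine works: real search engines score fragment spectra rather than matching precursor masses alone. The peptide labels, protein names, masses and the 10 ppm tolerance are all hypothetical values chosen for illustration, and the greedy parsimony rule is one simple inference strategy among several.

```python
from typing import Dict, List, Set


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_masses(observed_masses: List[float],
                 peptide_masses: Dict[str, float],
                 tol_ppm: float = 10.0) -> Dict[float, List[str]]:
    """For each observed mass, list database peptides within the tolerance."""
    return {
        obs: [pep for pep, mass in peptide_masses.items()
              if abs(ppm_error(obs, mass)) <= tol_ppm]
        for obs in observed_masses
    }


def infer_proteins(identified: Set[str],
                   protein_peptides: Dict[str, Set[str]]) -> List[str]:
    """Greedy parsimony: keep picking the protein that explains the most
    still-unexplained peptides until none are left (ties broken by name)."""
    remaining = set(identified)
    chosen: List[str] = []
    while remaining:
        best = max(sorted(protein_peptides),
                   key=lambda prot: len(protein_peptides[prot] & remaining))
        covered = protein_peptides[best] & remaining
        if not covered:
            break  # leftover peptides map to no protein in the database
        chosen.append(best)
        remaining -= covered
    return chosen


# Hypothetical toy database (illustrative numbers, not real measurements).
PEPTIDE_MASSES = {"pep1": 799.3600, "pep2": 1045.5345,
                  "pep3": 1296.6850, "pep4": 1570.6770}
PROTEIN_PEPTIDES = {"protA": {"pep1", "pep2"},
                    "protB": {"pep2"},
                    "protC": {"pep3"}}

if __name__ == "__main__":
    observed = [799.3632, 1045.5340, 1296.6900, 900.0000]
    hits = match_masses(observed, PEPTIDE_MASSES)
    identified = {pep for peps in hits.values() for pep in peps}
    print(hits)
    print(infer_proteins(identified, PROTEIN_PEPTIDES))
```

In this toy case the peptide shared by protA and protB is attributed to protA, because protA also explains a second peptide. Shared peptides of this kind are the main reason a separate inference step is needed.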
In contrast to discovery proteomics, quantitative proteomics, or protein profiling, is used to determine the absolute or relative concentrations of known proteins within a sample. Samples are usually analyzed in comparison with controls in order to investigate differential expression in disease pathology. Novel biomarkers are often discovered using this approach.
Three approaches are currently in use: labeled (e.g., iTRAQ), absolute and label-free quantification. The first two involve labeling either the analytes themselves or internal standards assayed along with the sample. Label-free quantification relies on the positive correlation between LC-MS/MS signal strength and protein concentration. It is a more sensitive method, but samples must be run in triplicate, which generates vast data sets. Software applications such as SIEVE (Thermo Scientific) are available for analysis.
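A minimal Python sketch of the label-free idea follows. It assumes, as a simplification, that signal intensity is roughly proportional to protein concentration, so the ratio of mean intensities between a test sample and a control approximates relative abundance. The function names and the triplicate intensity values are hypothetical, and this is not a description of how SIEVE works.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def log2_fold_change(case: Sequence[float], control: Sequence[float]) -> float:
    """log2 ratio of mean intensities; positive means higher in `case`."""
    return math.log2(mean(case) / mean(control))


def coefficient_of_variation(replicates: Sequence[float]) -> float:
    """Replicate spread relative to the mean (needs at least two values)."""
    return stdev(replicates) / mean(replicates)


# Hypothetical triplicate intensities for one protein (arbitrary units).
control_runs = [1.0e6, 1.1e6, 0.9e6]
disease_runs = [4.2e6, 3.9e6, 3.9e6]

if __name__ == "__main__":
    lfc = log2_fold_change(disease_runs, control_runs)
    print(f"log2 fold change: {lfc:.2f}")
    print(f"control CV: {coefficient_of_variation(control_runs):.2%}")
    print(f"disease CV: {coefficient_of_variation(disease_runs):.2%}")
```

Running replicates makes it possible to check that the spread within each group is small compared with the difference between groups. A full analysis would add normalization across runs and a statistical test, both omitted here.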
Once proteins have been assayed and their differential expression analyzed statistically, other bioinformatics programs can add information on cellular and chromosomal location, clustering, pathway involvement and interactions. Given the size of the data sets generated at multiple points in the experimental process, it is important that researchers consider the file formats generated. Choosing a multi-purpose format enables efficient data sharing among analytical programs.
The review’s authors conclude that, due to continual development, the computational tools behind data analysis of LC-MS/MS proteomic assays are becoming more powerful. Already, MS is beginning to replace traditional biochemistry as the assay of choice in proteomic research. With advances in software and algorithm development for processing proteomic data sets, the procedure is becoming more efficient and more sensitive, thus increasing its relevance to protein research now and in the future.
1. Bruce, C., et al. (2013) “Proteomics and the analysis of proteomic data: 2013 overview of current protein-profiling technologies,” Curr. Protoc. Bioinformatics, 41 (pp. 13.21.1–13.21.17), doi: 10.1002/0471250953.bi1321s41.