Instead of quantifying differential protein expression by first converting spectral data into protein identities, Suomi et al. (2015) present a Bioconductor package, PECA, that performs the analysis directly on the peptide-level readouts.1 They propose quantifying proteins through peptide-level computation using peak intensities. The authors suggest that this method avoids errors in quantitation and performs better than current label-free workflows when differences between sample groups are small.
Traditional quantitation protocols for estimating differential expression identify peptides from the spectral data recovered during mass spectrometric proteomic analysis. However, unlike genomics, another high-throughput –omics technology, mass spectrometry–based proteomics lacks a comparable breadth of analytical packages and computational statistics tools.
Suomi et al. propose that combining multiple statistical analyses of the original peptide values enables accurate protein estimation. They suggest that treating the peptide as the unit of measurement, even though the protein is the unit of interest, can reduce errors introduced by inconsistencies between peptides.
The research team validated the new analytical workflow for liquid chromatography–tandem mass spectrometry through controlled spike-in studies, which they analyzed on an LTQ Orbitrap Velos mass spectrometer coupled with an EASY-nLC nanoflow liquid chromatography system (both Thermo Scientific). They used the Mascot algorithm in Proteome Discoverer software (Thermo Scientific) to search the UniProtKB/Swiss-Prot database (release 2011_03).
The team digested a universal proteomics standard solution (equimolar; n = 48 proteins) with trypsin before spiking aliquots into 100 ng of yeast proteome digest to give final concentrations of 2 fmol/µl, 4 fmol/µl, 10 fmol/µl, 25 fmol/µl and 50 fmol/µl. They then ran each concentration in triplicate and compared differential expression between each pair of concentrations. Following acquisition, the researchers analyzed the peptide-level results with PECA to quantify proteins. The package combined several processing and statistical steps, including normalization and log2 transformation, before running significance tests for peptide-level differences in expression. These included both paired and unpaired t-tests, where possible, to obtain p-values.
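The peptide-level steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not PECA's actual implementation: the function names, the per-sample median centring used for normalization, and the roll-up of peptide statistics to a protein score by taking their median are all choices made for this example, since the summary does not specify them. The sketch uses only an unpaired (Welch) t-statistic and stops short of converting it to a p-value.

```python
import math
import statistics
from collections import defaultdict


def median_normalize(sample):
    """Log2-transform one sample's peptide intensities, then centre on the sample median.

    `sample` maps peptide id -> raw peak intensity (hypothetical input layout).
    """
    logged = {pep: math.log2(value) for pep, value in sample.items()}
    centre = statistics.median(logged.values())
    return {pep: value - centre for pep, value in logged.items()}


def welch_t(a, b):
    """Unpaired (Welch) t-statistic for group b versus group a (positive = higher in b)."""
    diff = statistics.mean(b) - statistics.mean(a)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0.0:
        # Convention for this sketch only: no spread and no difference scores 0.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def protein_scores(group_a, group_b, peptide_to_protein):
    """Test each peptide separately, then summarise per protein.

    Assumption: the protein score is the median of its peptide t-statistics.
    """
    norm_a = [median_normalize(s) for s in group_a]
    norm_b = [median_normalize(s) for s in group_b]
    per_protein = defaultdict(list)
    for pep, prot in peptide_to_protein.items():
        a = [s[pep] for s in norm_a]
        b = [s[pep] for s in norm_b]
        per_protein[prot].append(welch_t(a, b))
    return {prot: statistics.median(ts) for prot, ts in per_protein.items()}
```

Each group is a list of replicate samples (three per concentration in the study design), and every peptide is assumed to be quantified in every replicate. A real analysis would also need to handle missing values and convert the statistics to p-values.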
Following analysis, the researchers found that PECA achieved good sensitivity and specificity. Receiver operating characteristic curves, which compare true and false positives, showed higher accuracy than the traditional approach, with PECA detecting more true positives. Furthermore, with PECA calculation, peptide-level identities emerged from background noise, and the package showed good sensitivity when comparing the 2 fmol/µl and 4 fmol/µl samples. Suomi et al. obtained better results from PECA than from other, conventional analytical software packages, such as MSstats and InfernoRDN.
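For readers unfamiliar with receiver operating characteristic curves, the comparison can be illustrated with a short Python sketch. In a spike-in design the spiked proteins are known, so every protein can be labelled as a true change or as background, and proteins can then be ranked by any differential-expression score. The function names and inputs below are hypothetical and are not taken from the paper; the sketch also assumes there are no tied scores and that both classes are present.

```python
def roc_points(scores, is_spiked):
    """Return (false positive rate, true positive rate) points.

    Proteins are ranked from highest to lowest score; each step down the
    ranking adds one protein to the set called differentially expressed.
    """
    ranked = sorted(zip(scores, is_spiked), key=lambda pair: pair[0], reverse=True)
    positives = sum(1 for flag in is_spiked if flag)
    negatives = len(is_spiked) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, spiked in ranked:
        if spiked:
            tp += 1  # a spiked protein was called: true positive
        else:
            fp += 1  # a background protein was called: false positive
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

A method that ranks every spiked protein above every background protein gives an area of 1.0, while a random ranking gives about 0.5, so a curve lying closer to the top-left corner indicates better separation of true from false positives.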
By eliminating the conversion step between peptide spectral data and protein identities, the authors show that the PECA workflow results in fewer errors in protein quantitation. They suggest that the method shows good potential and gives reliable quantitation data. Suomi et al. also propose that the approach could work with spectral counts instead of peak intensities, although this needs further testing. Moreover, benchmarking PECA against current methods requires characterization on a larger scale.
1. Suomi, T., et al. (2015) “Using peptide-level proteomics data for detecting differentially expressed proteins,” Journal of Proteome Research, 14(11), pp. 4564–4570, doi: 10.1021/acs.jproteome.5b00363.