One of the basic requirements for obtaining meaningful results from shotgun proteomics data is correctly matching the observed tandem mass spectrometry (MS/MS) spectra to theoretical spectra derived from protein sequence databases. Data processing tools assign probability scores that determine whether an observed spectrum matches a peptide in the database, thereby identifying the peptide and linking it to its theoretical parent protein. Post-search validation of peptide identity is therefore a vital part of the proteomics workflow.
In data-dependent acquisition (DDA), however, this approach confidently identifies fewer than 50% of shotgun MS/MS spectra, even at a false discovery rate (FDR) of 1%. Large numbers of peptide spectrum matches (PSMs) receive low scores and are left orphaned because they do not meet the matching criteria. In other words, more than half of all MS/MS spectra are discarded as uninformative unless post-search validation software such as Percolator or PeptideProphet improves the identification rate.
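To make the 1% FDR threshold more concrete, here is a minimal sketch of target-decoy FDR estimation. The post does not say how the FDR is computed, so the target-decoy approach is an assumption here, as are the function names. Each PSM is assumed to carry a score (higher is better) and a flag marking whether it matched a decoy sequence.

```python
from typing import List, Tuple

# Hypothetical PSM representation: (score, is_decoy); higher scores are better.
PSM = Tuple[float, bool]


def score_threshold_at_fdr(psms: List[PSM], max_fdr: float = 0.01) -> float:
    """Lowest score cutoff whose estimated FDR stays <= max_fdr.

    Estimated FDR at a cutoff = (#decoys >= cutoff) / (#targets >= cutoff).
    Returns float("inf") if no cutoff satisfies the limit.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best_cutoff = float("inf")
    i, n = 0, len(ranked)
    while i < n:
        score = ranked[i][0]
        # Consume every PSM tied at this score before evaluating the cutoff.
        while i < n and ranked[i][0] == score:
            if ranked[i][1]:
                decoys += 1
            else:
                targets += 1
            i += 1
        if targets and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff


def accepted_targets(psms: List[PSM], max_fdr: float = 0.01) -> List[PSM]:
    """Target (non-decoy) PSMs scoring at or above the FDR-controlled cutoff."""
    cutoff = score_threshold_at_fdr(psms, max_fdr)
    return [p for p in psms if not p[1] and p[0] >= cutoff]
```

Every target PSM scoring below the returned cutoff is left unidentified, which is how a strict FDR limit leaves many spectra orphaned.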
Ivanov and colleagues (2014) set out to reduce this loss of data. They developed a new scoring scheme that helps recover identifications from the discarded data and thus improves peptide identification yields.1 Their method is based on a multi-parameter (MP) score that takes into account all the data collected on PSMs, such as retention times (experimental and predicted), precursor ion matches, missed cleavages, charge states and so on. The method is specifically intended for PSMs that fall below the positive identification threshold of an experimental run.
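The idea of combining several PSM descriptors into one score can be illustrated with a simple stand-in. This is not the authors' formula. As a hedged assumption, it fits a Gaussian to each descriptor among confidently identified PSMs and sums the log-densities, treating descriptors as independent. The descriptor names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Tuple

# Hypothetical descriptor names; the paper uses a broader set of PSM descriptors.
DESCRIPTORS = ("precursor_ppm", "rt_error_min", "fragment_ppm")


def fit_reference(confident: List[Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Mean and sample standard deviation of each descriptor among confident PSMs.

    Needs at least two confident PSMs (statistics.stdev requirement).
    """
    return {
        name: (mean(p[name] for p in confident), stdev(p[name] for p in confident))
        for name in DESCRIPTORS
    }


def mp_score(psm: Dict[str, float], reference: Dict[str, Tuple[float, float]]) -> float:
    """Sum of per-descriptor Gaussian log-densities (higher = more typical).

    Assumes descriptors are independent, which is a simplification.
    """
    total = 0.0
    for name, (mu, sigma) in reference.items():
        sigma = max(sigma, 1e-9)  # guard against zero spread
        z = (psm[name] - mu) / sigma
        total += -0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi))
    return total
```

A low-scoring PSM whose descriptors resemble those of confident identifications gets a higher score than one with large mass or retention time errors. This is the kind of extra evidence a multi-parameter score can bring to below-threshold matches.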
To test their proposed scoring system, the research team analyzed two universal standard protein mixtures (48 proteins each) using a Dionex UltiMate 3000 RSLCnano system (Thermo Scientific). They also examined data sets acquired from proteomic analyses of human IVF samples and rat kidney tissue on an Orbitrap Elite hybrid ion trap-Orbitrap mass spectrometer (Thermo Scientific). The data were searched with X! Tandem and then processed with PeptideProphet and Percolator (self-boosted), using the Swiss-Prot human and UniProt rat protein databases.
With the FDR set below 1%, the scientists used PSM descriptors (e.g., the difference between theoretical and observed precursor masses, retention times, and fragment mass errors) to calculate the MP score and the estimated FDR for the peptide identifications obtained with their new workflow.
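Several of the descriptors mentioned here are straightforward to compute. The sketch below shows the usual parts-per-million mass error, a retention time deviation, and a missed-cleavage count. The function names are hypothetical, and the missed-cleavage rule assumes trypsin digestion (cleavage after K or R, except before P), which the post does not state.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass error in parts per million: (observed - theoretical) / theoretical * 1e6."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def rt_error(observed_min: float, predicted_min: float) -> float:
    """Experimental minus predicted retention time, in minutes."""
    return observed_min - predicted_min


def missed_cleavages(peptide: str) -> int:
    """Count internal trypsin sites (K/R not followed by P) left uncut.

    Assumes trypsin; the C-terminal residue is the cleavage site itself,
    so it is not counted.
    """
    return sum(
        1
        for i, aa in enumerate(peptide[:-1])
        if aa in "KR" and peptide[i + 1] != "P"
    )
```

For example, an observed precursor at m/z 500.001 against a theoretical 500.000 gives an error of about 2 ppm, and `missed_cleavages("PEPKTIDER")` counts the internal K as one missed cleavage.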
Compared with PeptideProphet and Percolator for identifying low-scoring PSMs, Ivanov et al. found that their MP scoring scheme performed well, giving comparable or better results. They also noted that protein coverage improved when several algorithms were used together, because combining them identified more peptides. Furthermore, the authors report that their MP scoring scheme is more efficient than PeptideProphet at improving peptide identification rates, and they have made the tool available online.
Reference
1. Ivanov, M.V., et al. (2014) "Empirical multi-dimensional space for scoring peptide spectrum matches in shotgun proteomics," Journal of Proteome Research, 13, 1911–1920. doi:10.1021/pr401026y.
Post Author: Amanda Maxwell. Mixed media artist; blogger and social media communicator; clinical scientist and writer.
A digital space explorer, engaging readers by translating complex theories and subjects creatively into everyday language.