Wang et al. (2014) present an analytical tool, MixGF, which computes spectral probabilities for mixture spectra arising from more than one peptide and thus increases identification rates in large-scale proteomics experiments.¹
Co-fragmentation of multiple peptide precursors happens frequently in large-scale mass spectrometry-based proteomics experiments, and large-scale studies are becoming the norm in this area of research. Although maximizing experimental throughput increases data output, it also makes co-fragmentation a bigger issue. This applies especially to data-independent acquisition, which co-fragments multiple precursors as part of the workflow. The problem arises because researchers must determine which peptide generated the spectrum under examination in order to make a firm identification. According to Wang and coauthors, around 50% of tandem mass spectrometry (MS/MS) data can come from more than one peptide precursor, which reduces peptide identification rates per experimental run.
Currently, analytical tools for examining these data show low sensitivity. Wang et al. present a new software package, MixGF, which runs statistical analysis on mixture spectra to score peptide spectral matches (PSMs) and improve the number of identifications made in each experimental run. In their words, “MixGF determines the probability that a random pair of peptides…will match a given mixture spectrum…” using a scoring system to find the best matches and thus the peptide identities.
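To make the idea of a spectral probability for a peptide pair concrete, here is a minimal toy sketch. It is not MixGF's method: the tool extends the MS-GF generating function, whereas this sketch swaps in plain random sampling. The residue masses, the shared-peak score, and every function name are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical integer residue masses for a toy alphabet (not real amino acid masses).
RESIDUE_MASS = {"A": 71, "G": 57, "S": 87, "V": 99, "L": 113}


def prefix_masses(peptide):
    """Return the set of cumulative prefix masses, a stand-in for fragment ions."""
    total, masses = 0, set()
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        masses.add(total)
    return masses


def pair_score(spectrum, pep_a, pep_b):
    """Count the spectrum peaks explained by either peptide of the pair."""
    return len(spectrum & (prefix_masses(pep_a) | prefix_masses(pep_b)))


def random_peptide(length, rng):
    """Draw a peptide uniformly at random from the toy alphabet."""
    alphabet = sorted(RESIDUE_MASS)
    return "".join(rng.choice(alphabet) for _ in range(length))


def estimate_pair_probability(spectrum, observed_score, length=6, trials=20000, seed=0):
    """Estimate the fraction of random peptide pairs scoring >= observed_score.

    This is a sampling approximation (an assumption of this sketch),
    not the generating-function computation described in the paper.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pep_a = random_peptide(length, rng)
        pep_b = random_peptide(length, rng)
        if pair_score(spectrum, pep_a, pep_b) >= observed_score:
            hits += 1
    return hits / trials


# Build a toy mixture spectrum from two known peptides and score the true pair.
true_a, true_b = "GASVLA", "LVSAGG"
mixture = prefix_masses(true_a) | prefix_masses(true_b)
score = pair_score(mixture, true_a, true_b)
probability = estimate_pair_probability(mixture, score)
print(f"score={score}, estimated spectral probability={probability}")
```

A small probability means that few random pairs explain the spectrum as well as the candidate pair does, which is the intuition behind ranking matches by spectral probability. Sampling cannot resolve very small probabilities, which is one reason an exact computation is preferable in practice.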
To evaluate their tool, the researchers worked with both a simulated mixture spectra data set and data sets generated from MS/MS analysis of complex biological matrices. The simulated data comprised peptide pairs of known identity, so the team could compare MixGF-generated results with known values.
Considering these pairs of peptides, the research team performed the following steps in developing the statistical analyses required, with the goal of distinguishing correct PSMs from those that are only half correct or incorrect:
- Spectral probability for a mixture spectrum
- Scoring function for a mixture spectrum
- Computing spectral probabilities by extending the generating function of the existing tool MS-GF to include conditional and joint probabilities
- Approximating joint probabilities to enable analysis that distinguishes between correct and incorrect matches
- Classification of matches as “no match,” “single-peptide match” or “mixture match”
- Estimation of false discovery rates, using a global standard of 1%
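The final step, filtering at a global 1% error rate, can be sketched as follows. The summary above does not say how MixGF estimates this rate, so the sketch assumes the common target-decoy strategy, in which matches to a decoy database stand in for false positives. The function name and the `(score, is_decoy)` input format are hypothetical.

```python
def score_threshold_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms is a list of (score, is_decoy) tuples, where a higher score is better.
    The FDR at a cutoff is estimated as decoys / targets among all PSMs
    scoring at or above it (target-decoy estimate, an assumption here).
    Returns None if no cutoff satisfies the limit.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    best_cutoff = None
    for index, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Only evaluate once every PSM tied at this score has been counted.
        last_of_tie = index + 1 == len(ranked) or ranked[index + 1][0] != score
        if last_of_tie and targets and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff


# Toy input: four PSMs, one of which matched the decoy database.
example_psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
cutoff = score_threshold_at_fdr(example_psms, max_fdr=0.01)
accepted = [psm for psm in example_psms if cutoff is not None and psm[0] >= cutoff]
print(cutoff, accepted)
```

Every PSM scoring at or above the returned cutoff would be reported. In a mixture setting the same filter could be applied separately to single-peptide and mixture matches, although whether MixGF does so is not stated here.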
Using the simulated data set, Wang et al. tested the statistical steps required to score the data and thus identify peptides. Once satisfied with the methodology, they turned to three real data sets generated from MS-based proteomics analysis of a yeast cell line and two human cell lines. Instrumentation used included LTQ Orbitrap XL and LTQ Orbitrap XL ETD mass spectrometers (both Thermo Scientific). Comparison with existing analytical tools MixDB and ProbIDtree, in addition to benchmarking against M-SPLIT, showed that MixGF identifies 30.1–390% more mixture spectra at comparable computational speeds.
Wang et al. consider that MixGF provides a tool for accurate assessment of mixture spectra that allows identification of multiple peptides within complex biological matrices. Furthermore, its performance surpasses existing tools, giving better identification rates than those achieved conventionally.
1. Wang, J. et al. (2014) “MixGF: spectral probabilities for mixture spectra from more than one peptide”, Molecular and Cellular Proteomics 13, pp. 3688-97, doi: 10.1074/mcp.O113.037218.
Post Author: Amanda Maxwell. Mixed media artist; blogger and social media communicator; clinical scientist and writer. A digital space explorer, engaging readers by translating complex theories and subjects creatively into everyday language.