Optimizing Search Engines and Post-Processing Approaches

Choosing the right combination of database search engine and post-processing method is crucial to obtaining wide proteome coverage. Many combinations are available, so it is important to know which ones interpret the data most accurately. To this end, Tu et al. set out to determine the best approaches using three search engines (SEQUEST, Mascot and MS Amanda) and five filtering approaches (each engine's own score-based filtering, a group-based approach, local false discovery rate (LFDR), PeptideProphet and Percolator).

The researchers obtained eight data sets from various proteomes (e.g., E. coli, yeast and human), acquired on a range of instruments. The following table summarizes the experimental design.

Model system	Mass spectrometer	Mass spectrometry (MS) parameters
yeast	Thermo Scientific LTQ Orbitrap XL	Collision-induced dissociation (CID) with MS2 analysis in the ion trap (XL CID−IT yeast)
human cell line sample (MCF7 cells)	Thermo Scientific Orbitrap Elite	CID with MS2 analysis in IT (Elite CID−IT human)
human cell line sample (HeLa cells)	Thermo Scientific Orbitrap Fusion Tribrid	CID with MS2 analysis in IT (Fusion CID−IT human)
yeast sample	Thermo Scientific Orbitrap Fusion Tribrid	HCD with product-mass-spectra analysis in IT (Fusion HCD−IT yeast)
E. coli	Agilent 6530A	CID with MS2 analysis in TOF, mass tolerance of 0.05 Da (Q-TOF E. coli)
yeast sample	Thermo Scientific LTQ Orbitrap Velos	HCD with product-mass-spectra observation in the orbitrap (Velos HCD−OT yeast)
human cell line sample (HeLa cells)	Thermo Scientific Q Exactive	HCD with product-mass-spectra observation in OT (QE HCD−OT human)
human cell line sample (PANC-1 cells)	Thermo Scientific Orbitrap Fusion Tribrid	HCD with product-mass-spectra analysis in OT (Fusion HCD−OT human)

After acquiring the eight data sets on the various mass spectrometry platforms, the team performed database searches and post-processing filtering. Comparing each combination, they found that data filtered with Percolator outperformed the other four methods. Relative to the naive score-based approach, improvements by Percolator ranged from 55% to 88%, 44% to 85%, and 14% to 39% at the peptide-spectrum match (PSM), distinct peptide and protein group levels, respectively, across the eight data sets. For all of the CID−IT data, the group-based approach and PeptideProphet achieved the second- and third-highest numbers of identifications in all categories. For all HCD−OT data (data sets F−H, the last three rows of the table), LFDR and the group-based approach achieved the second- and third-highest numbers in all categories. For the Fusion HCD−IT yeast and Q-TOF E. coli data sets, the group-based approach and LFDR achieved similar improvements, although both were inferior to Percolator.
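The percentage improvements quoted above read as relative gains in identifications over the score-based baseline. A minimal sketch of that arithmetic follows, assuming improvement is defined as (method count − baseline count) / baseline count. The function name and the counts are hypothetical placeholders for illustration, not values from the study.

```python
def relative_improvement(baseline_count: int, method_count: int) -> float:
    """Percent gain in identifications of a method over a baseline."""
    if baseline_count <= 0:
        raise ValueError("baseline_count must be positive")
    return 100.0 * (method_count - baseline_count) / baseline_count


# Hypothetical PSM counts for one data set (illustration only, not study data).
score_based_psms = 10_000
percolator_psms = 15_500

gain = relative_improvement(score_based_psms, percolator_psms)
print(f"Percolator vs. score-based filtering: +{gain:.0f}% PSMs")
```

The same calculation would apply unchanged at the distinct peptide and protein group levels, with the corresponding counts substituted.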

The team also noted that SEQUEST−Percolator and MS Amanda−Percolator performed slightly better than the other combinations for data sets with low-accuracy MS2 (ion trap, or IT) and high-accuracy MS2 (Orbitrap or TOF), respectively. Among uniquely identified proteins, SEQUEST−Percolator achieved the highest percentage of proteins containing ≥4 peptides.
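The pairing of MS2 accuracy with the best-performing Percolator combination can be written as a small lookup. This is only a sketch of the summary above: the dictionary names, the function `suggest_pipeline` and the analyzer codes are labels invented for this illustration, and the study reports these combinations as only slightly better than the alternatives.

```python
# Combinations reported as slightly better, keyed by MS2 mass accuracy.
# The keys "low" and "high" are labels chosen for this sketch.
BEST_WITH_PERCOLATOR = {
    "low": ("SEQUEST", "Percolator"),     # ion trap (IT) MS2
    "high": ("MS Amanda", "Percolator"),  # Orbitrap (OT) or TOF MS2
}

# MS2 analyzer code -> accuracy class, following the grouping in the text.
MS2_ACCURACY = {"IT": "low", "OT": "high", "TOF": "high"}


def suggest_pipeline(ms2_analyzer: str) -> tuple[str, str]:
    """Return (search engine, post-processor) for an MS2 analyzer code."""
    try:
        accuracy = MS2_ACCURACY[ms2_analyzer.upper()]
    except KeyError:
        raise ValueError(f"unknown MS2 analyzer: {ms2_analyzer!r}") from None
    return BEST_WITH_PERCOLATOR[accuracy]


for analyzer in ("IT", "OT", "TOF"):
    engine, post_processor = suggest_pipeline(analyzer)
    print(f"{analyzer}: {engine} + {post_processor}")
```

Keeping the analyzer-to-accuracy mapping separate from the recommendation table means a new analyzer code only needs one added entry.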

Finally, the team determined that where Percolator was not used, Mascot−LFDR gave more identifications for data sets generated by higher-energy collisional dissociation (HCD) and analyzed in the Orbitrap (HCD−OT) or in the Orbitrap Fusion ion trap (HCD−IT), whereas MS Amanda−Group excelled for the Q-TOF data set and the Orbitrap Velos HCD−OT data set. Taken together, these results offer practical guidance for choosing a search engine and post-processing method for a given type of data.

Reference

Tu, C. et al. (2015) “Optimization of search engines and postprocessing approaches to maximize peptide and protein identification for high-resolution mass data,” Journal of Proteome Research, 14(11) (pp. 4662–73), doi: 10.1021/acs.jproteome.5b00536.