What’s True? What’s False? ProteoStats and the FDR

One of the key steps in harvesting meaningful results from the plethora of spectral data generated by mass spectrometric (MS)-based proteomic analysis is deciding what is real and what is spurious. This is usually done by controlling the false discovery rate, or FDR, a multiple-testing correction that filters results according to how well theoretical spectra match the experimental data. Currently, there is no easy way of doing this for large-scale experimental datasets. It all comes down to statistical probability, and the FDR is the gatekeeper against false-positive results.

ProteoStats aims to fill this gap, with an open-source framework for developers that is platform-independent and can handle various types of data files. Its co-developers, Yadav et al. (2013), have created a highly versatile tool that can be adapted by developers to provide automated data analysis and output.¹

MS-based proteomics generates millions of spectral data points during high-throughput shotgun proteomics workflows. The key to making these data meaningful is to assign an identity to each of the peptides detected. This is done by comparing theoretical with observed spectra, assigning the most probable match to each peptide, and thereby arriving eventually at the parent-protein identity. The authors cite target-decoy search-based FDR estimation as the most common approach.
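To make the target-decoy idea concrete, here is a minimal Python sketch of one common estimator: the number of decoy hits above a score threshold divided by the number of target hits above it. This is an illustration only and not ProteoStats code (which is written in Perl); the function name, the (score, is_decoy) data layout, and the example scores are all assumptions made for this sketch.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate the FDR among matches scoring at or above `threshold`.

    `psms` is a list of (score, is_decoy) pairs, where a higher score
    means a better match. This layout is hypothetical, chosen only for
    illustration; it is not a ProteoStats data structure.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so nothing to call false
    return decoys / targets


# Made-up scores: four target hits and two decoy hits.
example_psms = [(9.1, False), (8.7, False), (8.2, True),
                (7.9, False), (7.5, False), (6.0, True)]

# One decoy and four targets score at least 7.0.
print(fdr_at_threshold(example_psms, 7.0))
```

The decoy hits stand in for the false matches that are presumed to be hiding among the target hits, which is why counting them gives an estimate of the error rate.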

According to Yadav and colleagues, however, the lack of FDR estimation tools is hampering onward development of statistical proteomics and experimental design. The researchers describe ProteoStats, their solution, as “an open-source cross-platform scripting library…[using] several FDR estimation procedures.” Written in Perl, it interfaces easily with other tools. In addition, the authors claim that its user interface allows for easy programming.

The program supports results files from OMSSA, MassWiz, Mascot and X! Tandem. It also handles PepXML, as well as text file formats. ProteoStats can be used for separate searches, where the target and decoy databases remain apart, or for concatenated searches, where the two are combined. To calculate the FDR, it reads and converts the native results files, removes decoy peptides, and creates a target/decoy source array. It then outputs an FDR estimate and a decoy score FDR before calculating a q-value. The authors explain that outputs can be written as .csv or text files for further processing, or aggregated into an Excel spreadsheet, for example. One of the key perks of ProteoStats is the ability to visualize data.
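The step from FDR to q-value can be sketched in a few lines of Python. The code below is a rough illustration, not the ProteoStats implementation or its API: the function name, the (score, is_decoy) layout, and the example scores are assumptions. It uses two widely used estimators, decoys/targets for separate searches and 2 × decoys/(targets + decoys) for concatenated searches, and defines each q-value as the lowest FDR at which that match would still be accepted. Tied scores are not given any special handling here.

```python
def q_values(psms, concatenated=False):
    """Return (score, q_value) for each target match, best score first.

    `psms` is a list of (score, is_decoy) pairs; higher score is better.
    Hypothetical helper for illustration, not a ProteoStats function.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Walk down the ranked list, estimating the FDR at each cut-off.
    targets = decoys = 0
    fdrs = []
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if concatenated:
            fdr = 2 * decoys / (targets + decoys)
        else:
            fdr = decoys / targets if targets else 1.0
        fdrs.append(min(fdr, 1.0))

    # q-value: smallest FDR seen at this cut-off or any more lenient one.
    running_min = 1.0
    qs = []
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()

    # Report target matches only; decoys have served their purpose.
    return [(score, q) for (score, is_decoy), q in zip(ranked, qs)
            if not is_decoy]


# Made-up scores: four target hits and two decoy hits.
example_psms = [(9.1, False), (8.7, False), (8.2, True),
                (7.9, False), (7.5, False), (6.0, True)]

for score, q in q_values(example_psms):
    print(score, q)
```

Taking the running minimum makes the q-values non-decreasing as scores get worse, so a single q-value cut-off cleanly splits the list into accepted and rejected matches.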

As an open-source tool, the authors consider ProteoStats a versatile instrument for results analysis in the field of shotgun proteomics. If you would like to take ProteoStats for a spin yourself, you can find it at the following location: https://sourceforge.net/projects/mssuite/files/ProteoStats/.

Reference

1. Yadav, A.K., et al. (2013, November) “ProteoStats—A library for estimating false discovery rates in proteomics pipelines,” Bioinformatics, 29 (pp. 2799–800), doi: 10.1093/bioinformatics/btt490.

Post Author: Amanda Maxwell. Mixed media artist; blogger and social media communicator; clinical scientist and writer.

A digital space explorer, engaging readers by translating complex theories and subjects creatively into everyday language.
