Proteomics researchers benefit (or suffer, depending on the point of view) from a plethora of file formats in which to store data. Some formats are chosen out of personal preference, others because an instrument manufacturer requires them. Although all of these formats store mass spectrometry data, their incompatibility can prevent exchange or comparison between studies. It also complicates public data sharing and raises barriers to understanding for anyone outside the proteomics bioinformatics community.
Members of the Proteomics Standards Initiative (PSI) within the Human Proteome Organization (HUPO) have released mzTab, a data exchange format that is both compatible with mzML, an XML-based file format developed to store primary MS spectral and chromatogram data, and accessible to other proteomics data analysis tools (1). Developed according to unified, standardized data reporting guidelines as a supplement to XML-based file formats, mzTab enables dissemination of valuable proteomics results to a wider biological community. According to authors Griss et al. (2014), specific knowledge of bioinformatics and computational proteomics is not required to access the data. Converter tools for the format also support uploads to public repositories.
mzTab provides a simple and consistent tabular layout that presents mass spectrometry results flexibly enough for reporting at a number of levels. The format holds identifications, basic quantitative information, and metadata that relates to the experimental design. It can be used alone or in conjunction with mzML, and results held in mzIdentML and mzQuantML files can be exported to it. mzTab data can be imported into various statistical packages, -omics tools and Web-based apps, and the file contents are accessible to scripting languages. Information for developers is available online to facilitate further integration of mzTab into proteomics bioinformatics tools.
There are five sections within an mzTab file: metadata, plus identification data for proteins, peptides, peptide-spectrum matches (PSMs) and small molecules. These sections make two types of information available: identity and quantitation.
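As a rough illustration of the five-section layout, the sketch below groups the lines of an mzTab file by the three-letter code that starts each line. The codes used here (MTD, PRH/PRT, PEH/PEP, PSH/PSM, SMH/SML, COM) reflect the mzTab 1.0 layout as understood at the time of writing and should be checked against the specification. The function name `split_sections` and the tiny sample text are invented for this example and are not part of any official library.

```python
import csv
import io

# Line prefix -> (section name, True if the line is that section's header row).
# Assumed from the mzTab 1.0 layout; verify against the specification.
PREFIXES = {
    "PRH": ("protein", True), "PRT": ("protein", False),
    "PEH": ("peptide", True), "PEP": ("peptide", False),
    "PSH": ("psm", True), "PSM": ("psm", False),
    "SMH": ("small_molecule", True), "SML": ("small_molecule", False),
}


def split_sections(text):
    """Return (metadata, tables) parsed from mzTab-style text.

    metadata: dict of key -> value taken from MTD lines.
    tables: dict of section name -> list of row dicts keyed by column name.
    """
    metadata = {}
    headers = {}
    tables = {name: [] for name, _ in PREFIXES.values()}
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields or fields[0] == "COM":
            continue  # skip blank lines and comment lines
        prefix = fields[0]
        if prefix == "MTD" and len(fields) >= 3:
            metadata[fields[1]] = fields[2]
        elif prefix in PREFIXES:
            section, is_header = PREFIXES[prefix]
            if is_header:
                headers[section] = fields[1:]
            elif section in headers:
                tables[section].append(dict(zip(headers[section], fields[1:])))
    return metadata, tables


# Hypothetical sample, far smaller and simpler than a real mzTab file.
SAMPLE = "\n".join([
    "COM\tIllustrative example only",
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tmzTab-mode\tSummary",
    "MTD\tmzTab-type\tIdentification",
    "",
    "PRH\taccession\tdescription",
    "PRT\tP12345\tExample protein",
    "",
    "PSH\tsequence\taccession\tcharge",
    "PSM\tPEPTIDEK\tP12345\t2",
    "PSM\tSAMPLER\tP12345\t3",
])

metadata, tables = split_sections(SAMPLE)
print(metadata["mzTab-mode"], len(tables["protein"]), len(tables["psm"]))
```

Because every section is plain tab-separated text, the standard `csv` module is enough here. A real reader would also need to handle optional columns and validate the required ones.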
The metadata section within mzTab is customizable enough to allow varying levels of reporting, from summaries to outputs that are more complete. It also maintains the minimum standards for accompanying information suggested by the HUPO PSI MIAPE (Minimum Information about a Proteomics Experiment) guidelines. mzTab also upholds the CIMR (Core Information for Metabolomics Reporting) guidelines for metabolomics data reporting.
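The reporting level can be read straight from the metadata lines. The minimal sketch below assumes the mzTab 1.0 keys `mzTab-mode` (taken here to be `Summary` or `Complete`) and `mzTab-type` (taken here to be `Identification` or `Quantification`); both the keys and the allowed values should be confirmed against the specification. The helper names `read_metadata` and `reporting_level` are hypothetical.

```python
# Allowed values as assumed from mzTab 1.0; confirm against the specification.
ALLOWED_MODES = ("Summary", "Complete")
ALLOWED_TYPES = ("Identification", "Quantification")


def read_metadata(lines):
    """Collect key/value pairs from MTD lines in an iterable of mzTab lines."""
    metadata = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 3 and fields[0] == "MTD":
            metadata[fields[1]] = fields[2]
    return metadata


def reporting_level(metadata):
    """Return (mode, type), raising ValueError if either is missing or unknown."""
    mode = metadata.get("mzTab-mode")
    file_type = metadata.get("mzTab-type")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unexpected mzTab-mode: {mode!r}")
    if file_type not in ALLOWED_TYPES:
        raise ValueError(f"unexpected mzTab-type: {file_type!r}")
    return mode, file_type


# Hypothetical metadata lines for illustration.
example_lines = [
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tmzTab-mode\tSummary",
    "MTD\tmzTab-type\tIdentification",
    "MTD\tdescription\tHypothetical example file",
]
print(reporting_level(read_metadata(example_lines)))
```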
Peptide and protein identifications within mzTab can record every search engine that contributed to a finding, together with the score that each engine assigned to it. In addition to reporting quantitation results, mzTab can also denote the presence and position of modifications within the analytes detected.
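To give a feel for how modification positions might be read from a results row, the sketch below parses the simplest form of a modifications cell: comma-separated `position-identifier` entries such as `3-UNIMOD:35`. That layout, and the use of `null` or `0` for "nothing reported", are assumptions based on the mzTab 1.0 specification. Richer forms, such as ambiguous positions or entries without a position, are deliberately rejected rather than guessed at. The function name `parse_modifications` is hypothetical.

```python
def parse_modifications(value):
    """Parse a simple modifications cell into (position, identifier) pairs.

    Only the plain "position-identifier" form is handled; anything else
    raises ValueError so that unsupported syntax is not silently misread.
    """
    if value in ("null", "0", ""):
        return []  # assumed markers for "no modifications reported"
    result = []
    for entry in value.split(","):
        # Split on the first hyphen only, so identifiers that themselves
        # contain a hyphen (for example a negative mass delta) stay intact.
        position, sep, identifier = entry.strip().partition("-")
        if not sep or not position.isdigit() or not identifier:
            raise ValueError(f"unsupported modification entry: {entry!r}")
        result.append((int(position), identifier))
    return result


print(parse_modifications("3-UNIMOD:35,7-MOD:00412"))
# prints [(3, 'UNIMOD:35'), (7, 'MOD:00412')]
```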
In addition to providing a simplified format for exchanging data, Griss et al. suggest that mzTab could also be valuable for standardizing the supplementary information that authors are required to provide with paper submissions. mzTab is available for download online, along with information for new users.
1. Griss, J., et al. (2014) “The mzTab data exchange format: Communicating mass-spectrometry-based proteomics and metabolomics experimental results to a wider audience,” Molecular & Cellular Proteomics, 13(10), pp. 2765–2775. doi: 10.1074/mcp.O113.036681.