The world of proteomics is humming with the early summer publication of not one, but two draft maps of the human proteome. Both papers draw on proteomics data acquired on Thermo Scientific’s high-resolution LTQ Orbitrap hybrid ion trap-Orbitrap mass spectrometers, but each team approached the project from a slightly different direction.
Bernhard Kuster’s Technische Universität München (TUM) team (first author: Mathias Wilhelm) chose to analyze publicly available data alongside data generated in its own laboratories.1 Using only data from high-resolution Orbitrap instrumentation, the researchers assembled their draft map and made it available through a public in-memory database with sufficient processing power to enable big data analysis.2
Wilhelm et al. (2014) combined the data from 16,857 liquid chromatography–tandem mass spectrometric (LC-MS/MS) experiments, focusing only on results produced by the high-resolution LTQ Orbitrap hybrid ion trap-Orbitrap mass spectrometers (Thermo Scientific) for consistency. The samples analyzed included human tissues, cell lines and biological fluids. The researchers also acquired data from post-translational modification (PTM) studies and from affinity purification enrichments. Data from open-access repositories or contributed by colleagues made up 60% of the total; the researchers generated the remainder themselves by LC-MS/MS analysis of human samples, as previously described.
Once the researchers had obtained all the data required, they reprocessed the files with the protein identification and quantitation software programs MaxQuant and Mascot before importing the results into the new online database, ProteomicsDB, for analysis. At the time of publication, this database held protein evidence for 18,097 of the 19,629 genes described as protein-encoding.
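To put those two gene counts in proportion, the short Python sketch below works out the share of protein-encoding genes with protein evidence. Only the two counts come from the text above; the function and variable names are illustrative.

```python
def coverage_summary(genes_with_evidence: int, protein_coding_genes: int):
    """Return (percent covered, number of genes still lacking protein evidence)."""
    percent = 100.0 * genes_with_evidence / protein_coding_genes
    missing = protein_coding_genes - genes_with_evidence
    return percent, missing


# Counts quoted in the article for ProteomicsDB at the time of publication.
percent, missing = coverage_summary(18_097, 19_629)
print(f"{percent:.1f}% of protein-encoding genes have protein evidence")
print(f"{missing} genes had no protein evidence yet")
```

By this arithmetic, the draft covers roughly 92% of the protein-encoding genes, leaving about 1,500 without protein-level evidence.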
Through ProteomicsDB analysis, the team identified a core proteome of between 10,000 and 12,000 proteins responsible for basic cellular function. Functional and tissue-typing analysis also produced characteristic proteome profiles for 27 tissues and biological fluids.
The analysis also yielded suggestions for alternative proteases to optimize peptide generation; trypsin does not cleave every protein effectively, which limits sequence coverage. ProteomicsDB contains a tool for optimal protease selection that the authors are confident will greatly enhance research into, for example, PTMs. Alongside other PTMs, ProteomicsDB analysis identified 81,721 phosphorylated peptides, indicating that more than half of all human proteins are kinase substrates.
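The idea behind protease selection can be illustrated with a small in silico digest. The Python sketch below is not the ProteomicsDB tool; it is a minimal illustration under stated assumptions: simplified cleavage rules, no missed cleavages, a hypothetical toy sequence, and an assumed window of 7 to 30 residues for peptides that are practical to identify by LC-MS/MS.

```python
import re

# Simplified cleavage rules (assumptions for illustration; real enzymes
# have additional exceptions). Each pattern matches a zero-width cut site.
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",  # after K or R, but not before P
    "lys-c": r"(?<=K)",            # after K
    "glu-c": r"(?<=E)",            # after E
}


def digest(sequence, protease):
    """Return (start, peptide) pairs for a complete digest (no missed cleavages)."""
    pattern = CLEAVAGE_RULES[protease]
    cuts = [m.start() for m in re.finditer(pattern, sequence)
            if 0 < m.start() < len(sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [(s, sequence[s:e]) for s, e in zip(bounds, bounds[1:])]


def coverage(sequence, protease, min_len=7, max_len=30):
    """Fraction of residues that fall in peptides within the assumed length window."""
    covered = sum(len(pep) for _, pep in digest(sequence, protease)
                  if min_len <= len(pep) <= max_len)
    return covered / len(sequence)


def best_protease(sequence):
    """Pick the protease with the highest predicted coverage."""
    return max(CLEAVAGE_RULES, key=lambda p: coverage(sequence, p))


# Hypothetical lysine- and arginine-rich toy sequence, not a real protein.
toy = "GKGRGKGRAEGKGRGKGE"
for name in CLEAVAGE_RULES:
    print(name, coverage(toy, name))
print("best:", best_protease(toy))
```

In this toy case trypsin chops the basic stretch into peptides too short to fall inside the assumed window, while Glu-C leaves two longer peptides that span the whole sequence. A real selection tool would weigh many more factors than peptide length alone.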
Describing the new online resource ProteomicsDB, Wilhelm and co-authors note that it can display and annotate the vast array of data it contains in real time. With 2 TB of RAM and 160 CPUs, the publicly accessible portal can work with all 71 million identified peptide mass spectra uploaded at the time of the draft map’s publication in the May 2014 issue of Nature. The authors consider the online database tool a valuable addition that will facilitate future proteomics research.
Reference and Note
1. Wilhelm, M., et al. (2014) “Mass-spectrometry-based draft of the human proteome,” Nature, 509 (pp. 582–7), doi: 10.1038/nature13319.
2. ProteomicsDB is available at https://www.proteomicsdb.org.
Post Author: Amanda Maxwell. Mixed media artist; blogger and social media communicator; clinical scientist and writer.
A digital space explorer, engaging readers by translating complex theories and subjects creatively into everyday language.