A new proteomics data analysis tool, Novor, performs peptide de novo sequencing in real time, delivering results as the spectra come off the mass spectrometer. As Bin Ma’s recent paper (2015) shows, Novor’s real-time sequencing ability could also integrate well with instrument control programs.1
Real-time sequencing is already available for several genomics strategies, but until now proteomics researchers have had no equivalent tool. Ma’s Novor is a two-stage algorithm: a dynamic programming search finds the highest-scoring sequence, and a refinement stage then improves it. Its scoring combines decision trees with machine learning to improve speed and accuracy over existing methods. With this approach, Novor outperformed a current industry-standard de novo sequencing tool.
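To make the dynamic programming step more concrete, here is a minimal sketch of the textbook spectrum-graph approach to de novo sequencing. It is not Novor’s code. It assumes integer (nominal) residue masses and a hypothetical `prefix_score` dictionary giving the evidence for a fragment break at each prefix mass. In Novor those scores come from its machine-learned decision trees, and real tools work with exact masses and a mass tolerance. The function name `best_sequence` is likewise hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Nominal (integer) residue masses in daltons for a subset of amino acids.
# Assumption for this sketch: real tools use exact masses plus a tolerance.
# Mass-identical residues appear once (L stands for L/I, K for K/Q).
RESIDUE_MASS: Dict[str, int] = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101,
    "L": 113, "N": 114, "D": 115, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def best_sequence(total_mass: int,
                  prefix_score: Dict[int, float]) -> Tuple[str, float]:
    """Return the highest-scoring residue string whose masses sum to total_mass.

    prefix_score (hypothetical input) maps a prefix mass to the evidence for
    a fragment break there; prefix masses without an entry score 0.
    """
    neg = float("-inf")
    best: List[float] = [neg] * (total_mass + 1)  # best score reaching mass m
    back: List[Optional[str]] = [None] * (total_mass + 1)  # last residue used
    best[0] = 0.0
    for m in range(1, total_mass + 1):
        for aa, w in RESIDUE_MASS.items():
            # Extend the best path ending at m - w by one residue of mass w,
            # collecting the evidence for a break at prefix mass m.
            if w <= m and best[m - w] > neg:
                cand = best[m - w] + prefix_score.get(m, 0.0)
                if cand > best[m]:
                    best[m] = cand
                    back[m] = aa
    if best[total_mass] == neg:
        raise ValueError("no residue combination matches total_mass")
    # Trace back from the full mass to recover the residue order.
    seq: List[str] = []
    m = total_mass
    while m > 0:
        aa = back[m]
        seq.append(aa)
        m -= RESIDUE_MASS[aa]
    return "".join(reversed(seq)), best[total_mass]
```

For example, with evidence only at prefix masses 57, 128 and 215, `best_sequence(215, {57: 1.0, 128: 1.0, 215: 1.0})` returns `("GAS", 3.0)`, because glycine (57), alanine (71) and serine (87) are the only residues that step through all three scored masses.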
Ma first tuned the Novor algorithms on a training data set, then validated the approach on four separate data sets. For training and parameter fitting, he used a human spectral library from NIST (National Institute of Standards and Technology) comprising 340,357 spectra acquired on ion trap instruments. Through this machine learning step, the software learned to “fill in the gaps” in peptide sequencing. Together with its speed, this allows sequencing to run alongside spectral data acquisition, in real time.
After calibrating the software on the NIST human spectral library, Ma compared Novor with PEAKS, a popular state-of-the-art de novo sequencing tool. He used four existing data sets: the NIST C. elegans ion trap library; data from an E3 ubiquitin ligase study acquired on an Orbitrap mass spectrometer; a UPS2 protein standards data set acquired on an LTQ Orbitrap hybrid ion trap-Orbitrap mass spectrometer (Thermo Scientific); and data from a U2OS human osteosarcoma study.
Ma compared the two tools using the precision-recall trade-off curves generated for each data set. At the residue level, Novor improved on PEAKS by 37%, 15%, 20% and 7% on the four validation data sets. As Ma discusses, this is probably because Novor can make use of weaker evidence in the spectral data. Thanks to its machine learning stage, Novor fills mass gaps in a sequence with more credible predictions.
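Residue-level precision-recall curves for de novo sequencing are commonly built by ranking predicted residues by confidence and lowering the cutoff step by step. At each cutoff, precision is the fraction of accepted residues that are correct, and recall is the fraction of all reference residues recovered. The sketch below assumes each predicted residue already carries a confidence score and a correct/incorrect label from comparison with a reference identification. The function `residue_pr_curve` and its inputs are hypothetical, and ties in confidence are not treated specially.

```python
from typing import List, Tuple


def residue_pr_curve(predictions: List[Tuple[float, bool]],
                     total_true_residues: int) -> List[Tuple[float, float]]:
    """Return (recall, precision) points as the confidence cutoff is lowered.

    predictions: (confidence, is_correct) for every predicted residue
    (hypothetical input format).
    total_true_residues: residues in the reference peptides, the recall denominator.
    """
    if total_true_residues <= 0:
        raise ValueError("total_true_residues must be positive")
    # Most confident residues are accepted first.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    points: List[Tuple[float, float]] = []
    correct = 0
    for accepted, (_, is_correct) in enumerate(ranked, start=1):
        correct += is_correct  # bool counts as 0 or 1
        points.append((correct / total_true_residues, correct / accepted))
    return points
```

Lowering the cutoff moves along the curve toward higher recall, usually at some cost in precision. A tool whose curve stays higher at the same recall, as Ma reports for Novor, is the more accurate one at that operating point.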
In terms of de novo sequencing speed, Novor sequenced 322 spectra per second (UPS2 data set) on a MacBook Pro laptop. Because Novor runs on both Windows and Mac OS, whereas PEAKS runs only on Windows, Ma also ran the speed tests on a Windows PC for a direct comparison. There, Novor ran 13 times faster than PEAKS. At this speed, Novor outpaces the rate at which a mass spectrometer acquires spectra, showing that real-time sequencing is possible.
Ma concludes that Novor’s speed and accuracy make de novo sequencing affordable and less time-consuming for proteomics researchers. Furthermore, because data interpretation is faster than acquisition, de novo sequencing results could interact with the programs that control the mass spectrometer.
1. Ma, B. (2015) “Novor: Real-time peptide de novo sequencing software,” Journal of the American Society for Mass Spectrometry, doi: 10.1007/s13361-015-1204-0
Researchers can download Novor, with a fully functional free academic license, from www.rapidnovor.org/novor.
Post Author: Amanda Maxwell. Mixed media artist; blogger and social media communicator; clinical scientist and writer. A digital space explorer, engaging readers by translating complex theories and subjects creatively into everyday language.