Thanks to the Human Genome Project and related scientific endeavors, we now know that the human genome contains more than 20,000 protein-coding genes. Many variables influence the expression of each of these genes, including the location of the gene on the chromosome, mutations that affect promoter or transcription factor binding sites, epigenetic factors, and various diseases. The goal of the Chromosome-Centric Human Proteome Project is, first, to build a database of expression patterns for these genes and, second, to map those data to sites on chromosomes, ultimately identifying at least one isoform for each of the expressed protein-coding genes in humans.1 If successful, such a database could be used to profile individual or cell-line abnormalities in expression patterns on specific chromosomes or regions of chromosomes, and could aid in the identification of salient biomarkers for various disease states.
Even though researchers have performed hundreds of proteomic studies on human cell lines and tissues, thousands of proteins predicted from genes in the genome have yet to be detected. This is most likely because gene expression varies widely with tissue type, developmental stage, and external signals that activate or suppress gene expression in individual cells. Nearly all human cells carry the same DNA; the variation in form and function is a consequence of the specific gene regulation programs in those cells and tissues, which result in unique protein expression profiles.
Shiromizu et al. took datasets from five separate proteomic experiments using colorectal cancer samples.2 The samples were analyzed on an LTQ Orbitrap Velos hybrid ion trap-Orbitrap mass spectrometer (Thermo Scientific), either after fractionation or after iTRAQ or SILAC labeling steps that allow relative quantification between diseased and healthy tissue. Across these experiments, the researchers identified 11,278 different proteins. They also subjected protein samples to immobilized metal affinity chromatography (IMAC) using titanium dioxide to enrich phosphorylated peptides, analyzed the resulting fractions on the same mass spectrometer, and identified 28,205 phosphosites on 8,305 proteins. Many proteins appeared in both the phosphorylation-enriched and the unenriched proteome data, but more than 1,700 proteins were identified in the phosphoprotein data set alone.2
Comparing this dataset with the neXtProt database, a repository for human proteomic data, showed that 3,033 previously unidentified proteins were present in either the phospho data set or the unenriched samples. This significantly reduces the number of “missing” human proteins in the neXtProt database. By combining different fractionation and enrichment techniques on parallel samples, the researchers confirmed several thousand previously unidentified proteins in colon tissues. The upstream fractionation step served to simplify the sample. A more complete proteome can be assembled using these data in combination with high-throughput mass spectrometry.2 This is an important step in establishing an accurate baseline proteome for healthy tissue, against which disease-state tissues can be compared. Proper and thorough measurement of protein levels and expression patterns can establish the baseline needed to identify useful disease-associated biomarkers.
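The comparisons described above come down to set operations on protein identifiers: which proteins appear in both the unenriched and phospho-enriched data, which appear only after enrichment, and which of the combined identifications were on a "missing" list. The short Python sketch below illustrates that logic only. It is not the authors' pipeline; the function name, its parameters, and the toy identifiers are hypothetical and chosen purely for illustration.

```python
def summarize_identifications(proteome_ids, phospho_ids, missing_ids):
    """Compare two identification lists against a list of 'missing' proteins.

    All three arguments are iterables of protein identifiers (hypothetical
    inputs; real data would come from search results and a database export).
    """
    proteome = set(proteome_ids)
    phospho = set(phospho_ids)
    missing = set(missing_ids)

    # Everything detected by either approach
    combined = proteome | phospho

    return {
        "shared": proteome & phospho,          # seen with and without enrichment
        "phospho_only": phospho - proteome,    # seen only after enrichment
        "newly_observed": combined & missing,  # formerly missing, now detected
        "still_missing": missing - combined,   # not detected by either approach
    }


# Toy, made-up identifiers (not real accession numbers)
proteome_ids = ["P1", "P2", "P3"]
phospho_ids = ["P3", "P4"]
missing_ids = ["P2", "P4", "P5"]

summary = summarize_identifications(proteome_ids, phospho_ids, missing_ids)
for label, ids in summary.items():
    print(label, sorted(ids))
```

In this toy example, P4 is detected only after enrichment and was also on the missing list, which mirrors how the phosphopeptide enrichment contributed identifications that the unenriched samples did not.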
References
1. Hancock, W., Omenn, G., Legrain, P., and Paik, Y.K. (2011) “Proteomics, human proteome project, and chromosomes,” Journal of Proteome Research, 10(1) (pp. 210–11).
2. Shiromizu, T., et al. (2013) “Identification of missing proteins in the neXtProt database and unregistered phosphopeptides in the PhosphoSitePlus database as part of the Chromosome-centric Human Proteome Project,” Journal of Proteome Research, 12(6) (pp. 2414–21).
Post Author: Adam Humbard.