Thermo Fisher Scientific

Isoelectric Point Estimation, Amino Acid Sequence and Algorithms

Written by Amanda Maxwell | Published: 05.16.2016

The isoelectric point, or pI, is the pH at which a molecule carries no net electrical charge. In proteins, this property governs electrophoretic mobility, and it also plays a role in identifying peptides from mass spectrometry-based proteomics data. pI depends on a number of factors, including amino acid sequence, post-translational modifications (PTMs) and the presence of particular side-chain groups, all of which can alter surface charge and behavior depending on the pH of the environment.

Various methods exist for predicting the pI of denatured proteins, and most base the calculation on the amino acid sequence, using published pKa values for the ionizable groups. Their performance varies, however, and inaccurate predictions can skew downstream results.
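The sequence-based approach described above can be sketched in a few lines: each ionizable group contributes a fractional charge given by the Henderson-Hasselbalch equation, and the pI is the pH at which the summed charge crosses zero. This is a minimal sketch under stated assumptions: a single fixed pKa set (approximately the EMBOSS values, used here only for illustration), no position effects and no PTMs. The names `net_charge` and `isoelectric_point` are hypothetical, and the code is not any of the benchmarked tools.

```python
# Illustrative pKa set (approximately the EMBOSS values); an assumption, not a standard.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Net charge of a denatured peptide at a given pH (Henderson-Hasselbalch)."""
    # Count ionizable side chains plus one free N-terminus and one free C-terminus.
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = 1
    counts["Cterm"] = 1
    # Basic groups: fraction protonated (charge +1).
    positive = sum(
        counts[g] / (1.0 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items()
    )
    # Acidic groups: fraction deprotonated (charge -1).
    negative = sum(
        counts[g] / (1.0 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items()
    )
    return positive - negative


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0, tol: float = 1e-4) -> float:
    """Bisection search for the pH at which the net charge is zero."""
    # Net charge decreases monotonically as pH rises, so bisection converges.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive: the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for peptide in ("DEEDK", "KKRKH", "GLSDGEWQQVLNVWGK"):
        print(peptide, round(isoelectric_point(peptide), 2))
```

Swapping in a different pKa table changes the predicted pI, which is one reason predictions can differ between tools.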

Audain et al. (2015) compared five tools that researchers can use to estimate pI from amino acid sequence.¹ They benchmarked each algorithm against public datasets to show how well these predictive tools performed.

The researchers benchmarked the following tools:

  • Iterative: net charge calculated from the amino acid sequence and a set of pKa values, with the pH adjusted iteratively until the net charge reaches zero
  • Cofactor: calculated with correction factors according to amino acid position and adjacent charged residues
  • Bjellqvist: calculated according to pKa and amino acid position
  • Support Vector Machine (SVM): calculated from amino acid sequence and Amino Acid Index database (AAindex) data
  • Branca: calculated with correction factors for position, the influence of neighboring groups, and statistical corrections for the presence and nature of side-chain groups
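Cofactor, Bjellqvist and Branca all go beyond a single pKa per residue type by adjusting values according to position. The lookup step for terminal groups can be sketched as follows. The numbers are illustrative only: they are modelled on the terminal-residue table in Biopython's `Bio.SeqUtils.IsoelectricPoint` module (which follows Bjellqvist) and should be checked against that source before real use. `terminal_pkas` is a hypothetical helper, not part of any benchmarked tool.

```python
from __future__ import annotations

# Illustrative defaults and terminal-residue overrides, modelled on the Bjellqvist-based
# table in Biopython's IsoelectricPoint module; verify before relying on them.
DEFAULT_NTERM_PKA = 7.5
DEFAULT_CTERM_PKA = 3.55
NTERM_PKA_BY_RESIDUE = {"A": 7.59, "M": 7.0, "S": 6.93, "P": 8.36, "T": 6.82, "V": 7.44, "E": 7.7}
CTERM_PKA_BY_RESIDUE = {"D": 4.55, "E": 4.75}


def terminal_pkas(seq: str) -> tuple[float, float]:
    """Return (N-terminal pKa, C-terminal pKa), chosen by the terminal residues."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    # Fall back to the default terminal pKa when the residue has no override.
    n_pka = NTERM_PKA_BY_RESIDUE.get(seq[0], DEFAULT_NTERM_PKA)
    c_pka = CTERM_PKA_BY_RESIDUE.get(seq[-1], DEFAULT_CTERM_PKA)
    return n_pka, c_pka


print(terminal_pkas("AGK"))  # N-terminal Ala override, default C-terminus
print(terminal_pkas("GKD"))  # default N-terminus, C-terminal Asp override
```

A full position-aware predictor would feed these values into the net-charge sum in place of fixed terminal pKa values.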

Audain et al. note that, to avoid bias in reporting, they did not optimize any of the tools under investigation for the evaluation.

First, the team built an R package (a bundle of functions and data written in the statistical programming language R) as a framework for reproducible analysis in which to examine the performance of the various algorithms. This made it possible to compare the tools directly using correlation and root-mean-square deviation (RMSD). The researchers then calculated pI values with each tool and compared these theoretical values against publicly available experimental data. Audain et al. used two reference datasets. The first, PIP-DB (Protein Isoelectric Point Database), is a comprehensive collection of protein pI data. The second consists of values obtained for the tryptic proteome generated from a cellular fraction of Drosophila Kc167 cells.
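The two comparison metrics are straightforward to compute. This is a minimal sketch, assuming R² is taken as the squared Pearson correlation between predicted and observed pI values (the paper's exact computation may differ); the example values are hypothetical, not data from the study.

```python
from __future__ import annotations

import math


def rmsd(predicted: list[float], observed: list[float]) -> float:
    """Root-mean-square deviation between predicted and observed pI values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)
    )


def r_squared(predicted: list[float], observed: list[float]) -> float:
    """Squared Pearson correlation (assumes neither list is constant)."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov * cov / (var_p * var_o)


# Hypothetical predicted and observed pI values, for illustration only.
predicted = [4.1, 5.0, 6.2, 7.9]
observed = [4.3, 5.1, 5.9, 8.4]
print(round(rmsd(predicted, observed), 3), round(r_squared(predicted, observed), 3))
```

RMSD is in pH units and penalizes large misses, while R² only measures how well predictions track observations linearly, so a tool can score well on one and poorly on the other.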

For proteins, the team first split the data into proteins with variable pIs and those with only one unique pI; this showed that most proteins do not have a unique pI. Comparing observed with theoretical values, the researchers found generally poor performance from all five tools, with R² values ranging from 0.15 to 0.61. The SVM method performed best, with the lowest RMSD (1.28).
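Separating proteins with a single reported pI from those with several is a simple grouping step. This is a minimal sketch, assuming experimental records arrive as (protein ID, pI) pairs and that values are compared after rounding to two decimals; the record format, the rounding rule and the name `split_by_pi_uniqueness` are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def split_by_pi_uniqueness(records, decimals=2):
    """Split protein IDs into those with one distinct pI and those with several."""
    values = defaultdict(set)
    for protein_id, pi in records:
        # Round so trivially different entries (e.g. 5.1 vs 5.10) count as one value.
        values[protein_id].add(round(pi, decimals))
    unique = {pid for pid, pis in values.items() if len(pis) == 1}
    variable = set(values) - unique
    return unique, variable


# Hypothetical records, for illustration only.
records = [("P1", 5.1), ("P1", 5.10), ("P2", 6.0), ("P2", 6.4), ("P3", 9.2)]
unique, variable = split_by_pi_uniqueness(records)
print(sorted(unique), sorted(variable))
```

The rounding tolerance is a design choice: a coarser setting merges near-identical measurements, while a finer one treats them as distinct pIs.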

Performance was much better for peptides, with high correlation between predicted and observed pI values (R² = 0.96). SVM predictions again gave the lowest RMSD (0.21). For peptides carrying PTMs, the best predictions came from algorithms that included the effect of the modification in their theoretical calculation.
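One simple way for a predictor to include the effect of a PTM is to treat the modification as extra ionizable groups in the net-charge sum. The sketch below is a simplified illustration, not the method of any benchmarked tool: it uses an illustrative pKa set (approximately the EMBOSS values), models a phosphate as two acidic groups with placeholder pKa values chosen only for illustration, and ignores any change to the modified residue's own side chain. All names are hypothetical.

```python
# Illustrative pKa set (approximately the EMBOSS values); an assumption.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}

# Placeholder pKa values for one phosphate group (assumption, for illustration only).
PHOSPHATE_PKAS = (1.2, 6.5)


def net_charge(seq, ph, extra_acidic_pkas=()):
    """Net charge at a given pH, including acidic groups added by PTMs."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    charge = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    charge -= sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    # Each PTM-derived acidic group contributes up to -1 charge.
    charge -= sum(1 / (1 + 10 ** (pka - ph)) for pka in extra_acidic_pkas)
    return charge


def isoelectric_point(seq, extra_acidic_pkas=(), tol=1e-4):
    """Bisection search for the pH of zero net charge."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid, extra_acidic_pkas) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Compare an unmodified peptide with the same peptide carrying one phosphate.
print(round(isoelectric_point("AKSPR"), 2), round(isoelectric_point("AKSPR", PHOSPHATE_PKAS), 2))
```

Adding acidic groups lowers the net charge at every pH, so the predicted pI can only shift downward; ignoring the modification leaves the prediction too high.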

Although all five methods performed poorly on the protein benchmark, Audain et al. draw some conclusions from the process:

  1. Some algorithms are suitable for in silico pI prediction
  2. Machine-learning algorithms perform best, although their accuracy depends on how they are trained and on the quality of the training data

Based on their results, the authors also suggest the conditions under which the algorithms perform best, and they have made their software and data freely available for scrutiny.

Reference

1. Audain, E., et al. (2015) “Accurate estimation of isoelectric point of protein and peptide based on amino acid sequences,” Bioinformatics, doi: 10.1093/bioinformatics/btv674. 

Post Author: Amanda Maxwell. Mixed media artist; blogger and social media communicator; clinical scientist and writer. A digital space explorer, engaging readers by translating complex theories and subjects creatively into everyday language.
