Engineer Your Protein Using Directed Evolution Methods

Would you like to evolve your protein of interest but are tired of error-prone PCR approaches? Libraries based on synthetic templates have many advantages and could be the solution for you.

Synthetic biology has paved the way to create genes and libraries that require no natural template at all, so almost any design is possible. Synthetic genes are well suited for optimizing protein expression in a non-natural host or for obtaining genes that are hard to find in natural sources. DNA synthesis is based on the ordered assembly of short, chemically produced oligonucleotides. The same methodology can also be used to synthesize a library that mimics error-prone PCR. During automated oligonucleotide synthesis, a defined frequency of misincorporation of an alternative nucleotide is permitted at selected positions by deliberately adding defined 'impurities' (the alternative nucleotide building blocks) to the corresponding reagent bottle. Based on these ‘spiked’ oligonucleotides, a controlled randomization library can be assembled:

With the statistically expected ratio of transitions and transversions
Where the average number of substitutions can be fine-tuned very accurately
Where the positions of possible substitutions are no longer entirely random, but limited to regions of interest defined solely by the investigator
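
The effect of spiking can be illustrated with a small simulation. This is a minimal sketch, not a description of the actual synthesis chemistry: it assumes every position in the region of interest is mutated independently with the same probability, and that a substituted base is drawn uniformly from the three alternatives (which yields the statistically expected 1:2 ratio of transitions to transversions). The template sequence, the rate, and the function names are hypothetical.

```python
import random

BASES = "ACGT"

def spike_oligo(template, spike_rate, rng):
    """Copy `template`, replacing each base with probability `spike_rate`
    by one of the three other bases, chosen uniformly (model assumption)."""
    out = []
    for base in template:
        if rng.random() < spike_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def count_substitutions(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def expected_substitutions(length, spike_rate):
    """Mean substitutions per variant under the independent-position model."""
    return length * spike_rate

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    template = "ATGGCTAGCAAAGGAGAAGAACTTTTCACT"  # hypothetical 30-nt region
    rate = 0.05                                  # hypothetical spike rate
    variants = [spike_oligo(template, rate, rng) for _ in range(10_000)]
    mean = sum(count_substitutions(template, v) for v in variants) / len(variants)
    print(f"expected mean substitutions:  {expected_substitutions(len(template), rate):.2f}")
    print(f"simulated mean substitutions: {mean:.2f}")
```

Under this model the average number of substitutions is simply the length of the randomized region multiplied by the spike rate, which is why it can be tuned so precisely.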

How many mutations should one variant carry on average?

In general, a good rule of thumb is to limit the complexity of the library so that the wild type gene still occurs in the screened subset. This helps ensure that the examined variants don’t diverge too far from the original protein, which could result in a loss of function across the entire pool.
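
One way to turn this rule of thumb into numbers is sketched below. It assumes, purely for illustration, that each randomized position is substituted independently with the same probability, so the fraction of wild type clones is (1 - rate) raised to the number of positions. The function names and the example figures (90 positions, 1,000 screened clones) are hypothetical.

```python
def wild_type_fraction(n_positions, rate):
    """Probability that a variant carries no substitution at any of
    `n_positions` positions, each mutated independently with `rate`."""
    return (1.0 - rate) ** n_positions

def expected_wild_type_clones(n_positions, rate, n_screened):
    """Expected number of wild type clones among `n_screened` variants."""
    return n_screened * wild_type_fraction(n_positions, rate)

def max_rate_for_wild_type(n_positions, n_screened, min_clones=1.0):
    """Largest per-position rate at which at least `min_clones` wild type
    clones are still expected in the screened subset.
    Solves n_screened * (1 - rate) ** n_positions = min_clones for rate."""
    if not 0 < min_clones <= n_screened:
        raise ValueError("min_clones must be in (0, n_screened]")
    return 1.0 - (min_clones / n_screened) ** (1.0 / n_positions)

if __name__ == "__main__":
    n_positions, n_screened = 90, 1000  # hypothetical project numbers
    rate = max_rate_for_wild_type(n_positions, n_screened)
    print(f"max per-position rate: {rate:.4f}")
    print(f"mean substitutions per variant: {n_positions * rate:.2f}")
    print(f"expected wild type clones: "
          f"{expected_wild_type_clones(n_positions, rate, n_screened):.2f}")
```

The larger the screened subset, the higher the mutational load that still leaves the wild type detectable.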

How should the regions of interest for mutagenesis be defined?

This question can be approached in many different ways, depending on the individual project. If the three-dimensional structure of the investigated protein (or of a highly similar one) is available, the amino acid residues likely to be involved in the protein’s function (for example, the active site of an enzyme) can sometimes be deduced and then confirmed by single point mutations. If these data are available, it is a good idea to build a degenerate library in which randomization is limited to these positions and to those in close three-dimensional proximity. The remaining scaffold of the protein may be left unmodified.

Another approach draws on natural homologous sequences available in protein or DNA databases. These can be aligned according to their similarity to identify regions of conservation and variation, which reflect the natural selection pressure on the particular gene family. Depending on the overall similarity within the family and the scope of the intended protein adaptation, a library design will introduce new mutations either in the conserved or in the variable part of the gene. Antibodies are one example: they have a highly conserved framework but are highly variable within the antigen-binding pockets, and they are best randomized in these regions to identify new binding specificities.
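
Conserved and variable regions in an alignment can be located with a simple per-column score. The sketch below uses Shannon entropy, which is one common choice and not necessarily the measure used in any particular service. It assumes the sequences are already aligned to equal length, treats a gap character as just another symbol (a simplification), and uses a made-up toy alignment and an arbitrary threshold.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of one alignment column.
    0.0 means fully conserved; higher values mean more variable."""
    total = len(column)
    counts = Counter(column)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

def conservation_profile(aligned):
    """Entropy for every column of a list of equal-length aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(col) for col in zip(*aligned)]

def variable_positions(aligned, threshold=0.5):
    """0-based column indices whose entropy exceeds `threshold` bits."""
    profile = conservation_profile(aligned)
    return [i for i, h in enumerate(profile) if h > threshold]

if __name__ == "__main__":
    # Hypothetical toy alignment: only the fourth column varies.
    alignment = ["ACDEF", "ACDKF", "ACDRF", "ACDEF"]
    for i, h in enumerate(conservation_profile(alignment), start=1):
        print(f"column {i}: {h:.2f} bits")
    print("variable columns (0-based):", variable_positions(alignment))
```

Whether the low-entropy or the high-entropy columns are then randomized depends on the goal of the project, as described above.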

If no such data are available, or if they do not correlate with protein performance, the functional topology of the protein can be mapped before evolution experiments are designed. A common tool is an alanine scan, which generates a series of variants in which each amino acid in turn is changed to alanine. Testing these variants for maintenance or loss of function indicates the importance of each protein position and usually identifies clusters of interest for further mutagenesis. The extreme application of this strategy is to change each individual amino acid not only to alanine but to all 19 non–wild type residues. This ‘sequential permutation’ of a 300 amino acid protein involves screening 300 × 19 = 5,700 variants, which is manageable even with low-throughput assays.
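
The variant lists for an alanine scan and for a full sequential permutation are easy to enumerate. The following is a minimal sketch with hypothetical function names; it works on one-letter amino acid sequences, labels variants in the usual wild type/position/new residue style (for example, M1A), and simply skips positions that are already alanine in the alanine scan.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def alanine_scan(protein):
    """Yield (label, sequence) with each non-alanine position changed to A."""
    for i, wt in enumerate(protein):
        if wt != "A":
            yield f"{wt}{i + 1}A", protein[:i] + "A" + protein[i + 1:]

def sequential_permutation(protein):
    """Yield (label, sequence) for every single-site change to each of the
    19 non-wild type residues."""
    for i, wt in enumerate(protein):
        for aa in AMINO_ACIDS:
            if aa != wt:
                yield f"{wt}{i + 1}{aa}", protein[:i] + aa + protein[i + 1:]

if __name__ == "__main__":
    protein = "MKTAYIAKQR"  # hypothetical 10-residue example
    print("alanine scan variants:", len(list(alanine_scan(protein))))
    variants = list(sequential_permutation(protein))
    print("single-site variants:", len(variants))  # 10 positions x 19 residues
    print("first three:", variants[:3])
```

For a 300 amino acid protein, `sequential_permutation` yields the 5,700 variants mentioned above.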

Figure 1. Example workflow for streamlined identification of beneficial mutations within your target protein

The functional analysis of these mutants generates a data matrix showing how important each amino acid position is for overall function and which non–wild type amino acids help adapt the protein to the technical demands. In this type of experiment it is also common to identify beneficial mutations at amino acid positions that were not previously considered to be related to protein performance. With this valuable information at hand, the beneficial single mutations can be combined in a new combinatorial library to screen for synergistic effects. These second-round libraries typically concentrate on no more than 10 sites, each carrying either the wild type or the improving amino acid. Thus, the size of this library is at most 2^10 = 1,024 variants, and it can be fully screened even in complicated assays. As the demand for adapted proteins for industrial and medical applications increases, so will the range of approaches to tailor them.
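
A second-round library of this kind can be enumerated directly. The sketch below assumes exactly one improving residue per selected site, so each site is a binary choice between wild type and mutant; the sequence, the site positions, and the function name are hypothetical.

```python
from itertools import product

def combinatorial_library(wild_type, improving):
    """Yield every sequence in which each selected site carries either the
    wild type residue or its improving residue.

    `improving` maps a 0-based position to the improving amino acid."""
    positions = sorted(improving)
    for choice in product((False, True), repeat=len(positions)):
        seq = list(wild_type)
        for pos, use_mutant in zip(positions, choice):
            if use_mutant:
                seq[pos] = improving[pos]
        yield "".join(seq)

if __name__ == "__main__":
    wild_type = "MKTAYIAKQRQISFVKSHFS"  # hypothetical 20-residue protein
    # Hypothetical improving mutations at 10 sites (0-based positions).
    improving = {1: "R", 2: "S", 4: "F", 5: "V", 7: "R",
                 9: "K", 11: "L", 13: "Y", 15: "R", 17: "N"}
    library = list(combinatorial_library(wild_type, improving))
    print("library size:", len(library))  # 2 ** 10 combinations
    print("wild type included:", wild_type in library)
```

Because the all-wild type combination is part of the enumeration, the original sequence is always present in the library as an internal reference.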

Get more information and case studies on our Invitrogen™ GeneArt™ Directed Evolution services website.

Learn more about our Directed Evolution methods for protein engineering in our recorded webinar, which discusses time-, effort-, and cost-effective alternatives for your protein research. The webinar includes a case study comparing the traditional error-prone PCR method with a rationally designed library.

Watch our webinar: Directed evolution methods for protein engineering

For Research Use Only. Not for use in diagnostic procedures.

Sources:

https://www.thermofisher.com/us/en/home/life-science/cloning/gene-synthesis/directed-evolution.html