Protein expression

Recombinant protein expression technology enables analysis of gene regulation and of protein structure and function. Its uses range widely, from investigating function in vivo to large-scale production for structural studies and biotherapeutic drug discovery. This handbook covers the fundamentals of protein expression, from selecting a host system to creating your protein expression vector, and highlights key tips and products that can be used to optimize recombinant protein production and purification.


Get your copy of the handbook

If you've found this chapter—Protein Expression Overview—useful, you may be interested in getting your own copy of the entire 118-page Protein Expression Handbook in convenient PDF format.


How proteins are made

Proteins are synthesized and regulated depending upon the functional need in the cell. The blueprints for proteins are stored in DNA, which is used as a template by highly regulated transcriptional processes to produce messenger RNA (mRNA). The message coded by an mRNA is then translated into defined sequences of amino acids that form a protein (Figure 1.1). Transcription is the transfer of information from DNA to mRNA, and translation is the synthesis of protein based on an amino acid sequence specified by mRNA.

Figure 1.1. Simple diagram of transcription and translation. This describes the general flow of information from DNA base-pair sequence (gene) to amino acid polypeptide sequence (protein).
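
The flow of information in Figure 1.1 can be illustrated with a minimal Python sketch. It assumes the standard genetic code, takes the coding (sense) strand of the DNA as input, and starts translation at the first AUG it finds; the function names and the example sequence are hypothetical and are not taken from the handbook.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA that matches the DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the message."""
    dna = mrna.upper().replace("U", "T")
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon, so no protein
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical example: two leader bases, then ATG AAA TGA.
mrna = transcribe("GGATGAAATGA")
peptide = translate(mrna)
```

Real transcripts also carry untranslated regions and, in eukaryotes, are spliced and processed before translation; the sketch ignores these steps.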

In prokaryotes, the processes of transcription and translation occur simultaneously. The translation of mRNA starts even before a mature mRNA transcript is fully synthesized. This simultaneous transcription and translation of a gene is termed coupled transcription and translation. In eukaryotes, the processes are spatially separated and occur sequentially, with transcription happening in the nucleus and translation occurring in the cytoplasm. After translation, polypeptides are modified in various ways to complete their structure, designate their location, or regulate their activity within the cell. Posttranslational modifications (PTMs) are various additions or alterations to the chemical structure of the newly synthesized protein and are critical features of the overall cell biology.

In general, proteomics research involves investigating any aspect of a protein, such as structure, function, modifications, localization, or interactions with proteins or other molecules. To investigate how particular proteins regulate biology, researchers usually need a means of producing functional proteins of interest. Given the size and complexity of proteins, de novo chemical synthesis is not a viable option for this endeavor. Instead, living cells or their cellular machinery can be harnessed as factories to build proteins from supplied genetic templates. Unlike proteins, DNA is simple to construct synthetically or in vitro using well-established recombinant DNA techniques. Therefore, DNA sequences of specific genes can be constructed as templates for subsequent protein expression (Figure 1.2). Proteins produced from such DNA templates are called recombinant proteins.

Figure 1.2. A typical protein expression workflow.

Cloning technologies for protein expression

Cloning refers to the process of transferring a DNA fragment, or gene of interest, from one organism to a self-replicating genetic element such as an expression vector (Figure 1.3).

Figure 1.3. Basic cloning vector.

A typical expression vector includes at least 4 key elements:

Additional elements may include:

  • Multiple cloning site (MCS) (i.e., a polylinker)
  • Epitope tags
  • Secretion signal
  • Protease recognition sites
  • Internal ribosome entry site (IRES)

Most vectors contain a promoter for expression by a specific host system; however, some offer the option to add your own promoter. Table 1.1 lists common constitutive and inducible promoters.

Table 1.1. Constitutive and inducible promoters commonly used in recombinant protein expression systems.

| Host | Expression system | Constitutive promoters | Inducible promoters | Inducers |
|---|---|---|---|---|
| Mammalian | In vivo | CMV (cytomegalovirus); EF-1 alpha (human elongation factor 1 alpha); UbC (human ubiquitin C); SV40 (simian virus 40) | Promoter with TetO2 (tetracycline operator); promoter with GAL4-UAS (yeast GAL4 upstream activating sequence) | Tetracycline or doxycycline; mifepristone |
| Mammalian | Cell free (rabbit reticulocyte) | None | NA | NA |
| Mammalian | Cell free (HeLa or CHO) | Requires T7 promoter and T7 RNA polymerase for transcription | NA | NA |
| Insect | In vivo | Ac5 (actin); OpIE1 & 2; PH (polyhedrin); p10 | MT (metallothionein) | Copper |
| Yeast | In vivo | GAP (glyceraldehyde-3-phosphate dehydrogenase) | AOX1 (alcohol oxidase); GAL1 (galactokinase) | Methanol; galactose |
| E. coli | In vivo | Not commonly available | Lac (lactose operon); araBAD (L-arabinose operon) | IPTG; L-arabinose |
| E. coli | Cell free | Requires T7 promoter and T7 RNA polymerase for transcription | NA | NA |

Depending on the host system, another important factor to consider is the inclusion of a Shine-Dalgarno ribosome-binding sequence (for prokaryote systems) or Kozak consensus sequence (for eukaryote systems).
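
As a rough illustration of what these two sequence checks involve, the Python sketch below scores the context around a start codon. It rests on simplifying assumptions that are not from the handbook: the Kozak check looks only at the two positions usually considered most influential (a purine at -3 and a G at +4), and the Shine-Dalgarno check simply searches for the consensus AGGAGG within 20 nucleotides upstream of the start codon. The function names, labels, and window size are hypothetical.

```python
def kozak_context(sequence: str, atg_index: int) -> str:
    """Classify the Kozak context of the ATG at atg_index (simplified heuristic)."""
    seq = sequence.upper().replace("U", "T")
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    minus3 = seq[atg_index - 3] if atg_index >= 3 else ""
    plus4 = seq[atg_index + 3] if atg_index + 3 < len(seq) else ""
    # Count how many of the two key positions match the consensus.
    matches = int(minus3 in {"A", "G"}) + int(plus4 == "G")
    return ["weak", "adequate", "strong"][matches]


def has_shine_dalgarno(sequence: str, atg_index: int, window: int = 20) -> bool:
    """Report whether AGGAGG occurs within `window` nt upstream of the ATG."""
    seq = sequence.upper().replace("U", "T")
    upstream = seq[max(0, atg_index - window):atg_index]
    return "AGGAGG" in upstream


# Hypothetical examples
eukaryotic = "GCCACCATGGCG"           # consensus-like Kozak context
prokaryotic = "TTAGGAGGTAACATATGAAA"  # AGGAGG a few bases upstream of ATG
```

Natural ribosome-binding sites often differ from the consensus and still function, so a negative result from a check like this is a prompt to look closer rather than proof that translation will fail.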

Epitope tags are commonly used to allow for easy detection or rapid purification of your protein of interest by fusing a sequence coding for the tag with your gene. Epitope tags can be either on the N-terminus or C-terminus of your recombinant protein. Table 1.2 offers some basic guidelines to help select an epitope tag.

Table 1.2. Typical applications for various epitope tags.

| Purpose | Description | Examples of tag |
|---|---|---|
| Detect | Well-characterized antibody available against the tag; easily visualized | V5, Xpress, myc, 6XHis, GST, BioEase tag, capTEV tag, GFP, Lumio tag, HA tag, FLAG tag |
| Purify | Resins available to facilitate purification | 6XHis, GST, BioEase tag, capTEV tag |
| Cleave | Protease recognition site (TEV, EK, HRV3C, Factor Xa) to remove tag after expression to get native protein | Any tag with a protease recognition site following the tag (only on N-terminus) |
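
To make the "purify, then cleave" idea concrete, here is a minimal Python sketch at the protein-sequence level. It assumes an N-terminal 6XHis tag followed by a TEV protease site (ENLYFQG, cut between Q and G); the function names and the example protein are hypothetical, and a real construct would be designed at the DNA level with the tag fused in frame.

```python
HIS6 = "HHHHHH"
TEV_SITE = "ENLYFQG"  # TEV protease cuts between Q and G


def add_n_terminal_tag(protein: str, tag: str = HIS6, site: str = TEV_SITE) -> str:
    """Build Met + tag + protease site + protein of interest."""
    return "M" + tag + site + protein


def cleave_tev(fusion: str, site: str = TEV_SITE) -> str:
    """Return the product released by TEV cleavage (the trailing G stays on it)."""
    index = fusion.find(site)
    if index == -1:
        return fusion  # no site present, nothing is cleaved
    return fusion[index + len(site) - 1:]


# Hypothetical protein of interest
protein = "MKTAYIAK"
fusion = add_n_terminal_tag(protein)
released = cleave_tev(fusion)
```

In this sketch the released product keeps one extra N-terminal glycine from the TEV site, so how close the result is to the native sequence depends on the protease chosen and on how the junction is designed.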

Genes and their variants can be prepared via PCR, isolated as a cDNA clone, or synthesized as Invitrogen GeneArt Strings DNA Fragments or Libraries. Alternatively, genes can be synthesized by Invitrogen GeneArt Gene Synthesis or Directed Evolution custom services. Read more about building your gene in our Gene to protein handbook.

Thermo Fisher Scientific offers a variety of unique cloning technologies to shuttle your gene of interest into the right vector, to simplify cloning procedures, and help accelerate protein expression.

  • Restriction enzyme digestion followed by ligation cloning is an industry standard for molecular biologists
  • Invitrogen Gateway technology offers the greatest flexibility in vector choices and downstream applications
  • Invitrogen TOPO cloning provides simple and convenient reactions, typically requiring less than 5 minutes
  • Invitrogen GeneArt seamless cloning and genetic assembly cloning allow for cloning of up to 4 large DNA fragments (up to 40 kb total size) simultaneously into virtually any linearized E. coli vector in a 30-minute room-temperature reaction
  • Invitrogen GeneArt Type IIs Assembly avoids homologous recombination, allowing for the simultaneous cloning of up to 8 homologous or repetitive sequence fragments without scars
  • GeneArt Gene Synthesis offers 100% sequence accuracy and optimization of genes to help maximize protein expression
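
For restriction enzyme cloning in particular, a common first step is to check which enzymes cut inside the insert, because those cannot be used to flank it. The Python sketch below shows one way to do that. The enzyme panel is a small illustrative selection with well-known recognition sequences, and the function names and example insert are hypothetical. All of the listed sites are palindromic, so scanning one strand is enough.

```python
# Recognition sequences (5' to 3') for a few widely used enzymes.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
    "NdeI": "CATATG",
    "NotI": "GCGGCCGC",
}


def find_sites(sequence: str) -> dict:
    """Map each enzyme that cuts the sequence to the 0-based start of every site."""
    seq = sequence.upper()
    hits = {}
    for enzyme, site in RECOGNITION_SITES.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)  # step by 1 to catch overlaps
        if positions:
            hits[enzyme] = positions
    return hits


def non_cutters(sequence: str) -> list:
    """List enzymes with no site in the sequence (candidates for flanking sites)."""
    cut = find_sites(sequence)
    return sorted(enzyme for enzyme in RECOGNITION_SITES if enzyme not in cut)


# Hypothetical insert containing one EcoRI site and one BamHI site
insert = "ATGGAATTCAAAGGATCCTAA"
sites = find_sites(insert)
candidates = non_cutters(insert)
```

A complete check would also confirm that the chosen enzymes are present in the vector's multiple cloning site and absent from the rest of the vector backbone.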


Transformation and plasmid isolation

Once cloning is completed, plasmids are taken up into competent cells (chemically competent or electrocompetent E. coli) for propagation and storage by a process called transformation. Chemically competent cells have been treated with salts that make the membrane and cell wall more permeable. Plasmid DNA is then added to the cells, and a mild heat shock opens pores in the E. coli cells, allowing the plasmid to enter. In contrast, DNA is introduced into electrocompetent cells through transient pores that form in the E. coli membrane and cell wall when short electrical pulses are delivered to the mixture of cells and plasmid DNA. When choosing a competent cell strain to work with, it is important to consider the following factors:

  • Genotype: the list of genetic mutations (deletions, changes, or insertions) in the strain that distinguish it from wild-type E. coli
  • Transformation efficiency: measurement of the amount of supercoiled plasmid (such as pUC19) successfully transformed into a volume of cells; defined as colony forming units per microgram of DNA delivered (cfu/μg); we manufacture competent cells that have efficiencies ranging from >1 x 10^6 to >3 x 10^10 cfu/μg
  • Application: experiment type for which the competent cells are well suited; applications include routine cloning, protein expression, library production, cloning unstable DNA, ssDNA propagation, bacmid creation, and Cre-Lox recombination
  • Kit format: formats include high-throughput (96 well), single-use Invitrogen One Shot vials, standard kits, or bulk format
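
Transformation efficiency, as defined in the list above, can be worked out from a plate count. The Python sketch below shows the arithmetic under the usual assumption that the colonies counted represent only the fraction of the transformation reaction that was actually plated. The function name, parameter names, and example numbers are hypothetical.

```python
def transformation_efficiency(
    colonies: int,
    dna_ng: float,
    total_volume_ul: float,
    plated_volume_ul: float,
    dilution_factor: float = 1.0,
) -> float:
    """Return efficiency in cfu per microgram of plasmid DNA.

    dilution_factor is how many fold the recovery culture was diluted
    before plating (1.0 means it was plated undiluted).
    """
    dna_ug = dna_ng / 1000.0
    # Fraction of the original transformation that ended up on the plate.
    fraction_plated = plated_volume_ul / (total_volume_ul * dilution_factor)
    return colonies / (dna_ug * fraction_plated)


# Hypothetical example: 10 pg (0.01 ng) of pUC19, 1,000 uL recovery volume,
# 100 uL plated undiluted, 100 colonies counted.
efficiency = transformation_efficiency(
    colonies=100, dna_ng=0.01, total_volume_ul=1000, plated_volume_ul=100
)
print(f"{efficiency:.1e} cfu/ug")
```

With these example numbers the plate carries one tenth of 10 pg of DNA, that is, 1 x 10^-6 μg, so 100 colonies correspond to about 1 x 10^8 cfu/μg.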


After the E. coli molecular machinery has replicated the plasmid DNA, a plasmid purification kit can be used to purify the plasmid DNA (containing your gene of interest). We offer two main technologies for plasmid purification:

  • Anion exchange
  • Silica

For a cloned plasmid that will be transfected into a cell line for protein expression, we recommend anion exchange purification for its higher purity and lower endotoxin levels. Silica-based purification is appropriate for cloning-related workflows but is not optimal for plasmids used for transfection, because it leaves higher levels of endotoxins and impurities. Anion exchange columns also produce better results with large plasmids. The Invitrogen PureLink HiPure Expi Plasmid Kits have been developed to give higher yields from large-scale plasmid isolation in less than half the time of typical plasmid DNA isolation methods.


Selecting an expression system

Using the right expression system for your specific application is the key to success. Protein solubility, functionality, purification speed, and yield are often crucial factors to consider, and each system has its own strengths and challenges. We offer 6 unique expression systems: mammalian, insect, yeast, bacterial, algal, and cell-free systems. Table 1.3 summarizes the main characteristics of these expression systems, including the most common applications, advantages, and challenges of each.

Once a system is selected, the method of gene delivery will need to be considered for protein expression. The main methods for gene delivery include transfection and transduction.

Transfection is the process by which nucleic acids are introduced into mammalian and insect cells. Protocols and techniques vary widely and include lipid-mediated, chemical, and physical methods such as electroporation.


For cell types not amenable to lipid-mediated transfection, viral vectors are often employed. Virus-mediated transfection, also known as transduction, offers a means to reach hard-to-transfect cell types for protein overexpression or knockdown, and it is the most commonly used method in clinical research. Adenoviral, oncoretroviral, lentiviral, and baculoviral vectors have been used extensively for gene delivery to mammalian cells, both in cell culture and in vivo.

Cell lysis and protein purification

The next step following protein expression is often to isolate and purify the protein of interest. Protein yield and activity can be maximized by selecting the right lysis reagents and appropriate purification resin. We offer cell lysis formulations that have been optimized for specific host systems, including cultured mammalian, yeast, baculovirus-infected insect, and bacterial cells. Most recombinant proteins are expressed as fusion proteins with short affinity tags, such as polyhistidine or glutathione S-transferase, which allow for selective purification of the protein of interest. Recombinant His-tagged proteins are purified using immobilized metal affinity chromatography (IMAC) resins, and GST-tagged proteins are purified using a reduced glutathione resin.

Table 1.3. Characteristics of various recombinant protein expression systems.

| Expression system | Most common applications | Advantages | Challenges |
|---|---|---|---|
| Mammalian | Functional assays; structural analysis; antibody production; expression of complex proteins; protein interactions; virus production | Highest-level protein processing; can produce proteins either transiently or by stable expression; robust optimized transient systems for rapid, ultrahigh-yield protein production | Gram-per-liter yields only possible in suspension cultures; more demanding culture conditions |
| Insect | Functional assays; structural analysis; expression of intracellular proteins; expression of protein complexes; virus production | Similar to mammalian protein processing; can be used in static or suspension culture | More demanding culture conditions than prokaryotic systems; production of recombinant baculovirus vectors is time consuming |
| Yeast | Structural analysis; antibody generation; functional analysis; protein interactions | Eukaryotic protein processing; scalable up to fermentation (grams per liter); simple media requirements | Fermentation required for very high yields; growth conditions may require optimization |
| Bacterial | Structural analysis; antibody generation; functional assays; protein interactions | Scalable; low cost; simple culture conditions | Protein solubility; may require protein-specific optimization; may be difficult to express some mammalian proteins |
| Algal | Studying photosynthesis, plant biology, lipid metabolism; genetic engineering; biofuel production | Genetic modification and expression systems for photosynthetic microalgae; superb experimental control for biofuel, nutraceuticals, and specialty chemical production; optimized system for robust selection and expression | Nascent technology; less developed compared to other host platforms |
| Cell-free | Toxic proteins; incorporation of unnatural label or amino acids; functional assays; protein interactions; translational inhibitor screening | Open system (able to add unnatural components); fast expression; simple format | Scaling above multimilligram quantities may be costly |

For Research Use Only. Not for use in diagnostic procedures.