Recombinant protein expression technology enables analysis of gene regulation and protein structure and function. Utilization of recombinant protein expression varies widely—from investigation of function in vivo to large-scale production for structural studies and biotherapeutic drug discovery. This handbook will cover the fundamentals of protein expression, from selecting a host system to creating your protein expression vector, as well as highlighting key tips and products that can be used to optimize recombinant protein production and purification.


If you've found this chapter—Protein Expression Overview—useful, you may be interested in getting your own copy of the entire 118-page Protein Expression Handbook in convenient PDF format.


How proteins are made

Proteins are synthesized and regulated depending upon the functional need in the cell. The blueprints for proteins are stored in DNA, which is used as a template by highly regulated transcriptional processes to produce messenger RNA (mRNA). The message coded by an mRNA is then translated into defined sequences of amino acids that form a protein (Figure 1.1). Transcription is the transfer of information from DNA to mRNA, and translation is the synthesis of protein based on an amino acid sequence specified by mRNA.
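To make the flow from DNA to mRNA to protein concrete, here is a minimal Python sketch. It assumes the input is the coding (sense) strand of an intronless gene, read 5' to 3' from the start codon, and uses the standard genetic code. It ignores splicing, regulation, and posttranslational events, and the toy gene is hypothetical.

```python
# Minimal sketch: DNA (coding strand) -> mRNA -> protein, standard genetic code.
BASES = "UCAG"
# Amino acids for codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """mRNA has the same sequence as the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first base until a stop codon (or the end)."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*":  # stop codon: translation terminates
            break
        protein.append(residue)
    return "".join(protein)


gene = "ATGGCTCATCACTAA"  # hypothetical toy gene: Met-Ala-His-His-stop
mrna = transcribe(gene)
print(mrna)             # AUGGCUCAUCACUAA
print(translate(mrna))  # MAHH
```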

Figure 1.1. Simple diagram of transcription and translation. This describes the general flow of information from DNA base-pair sequence (gene) to amino acid polypeptide sequence (protein).

In prokaryotes, the processes of transcription and translation occur simultaneously. The translation of mRNA starts even before a mature mRNA transcript is fully synthesized. This simultaneous transcription and translation of a gene is termed coupled transcription and translation. In eukaryotes, the processes are spatially separated and occur sequentially, with transcription happening in the nucleus and translation occurring in the cytoplasm. After translation, polypeptides are modified in various ways to complete their structure, designate their location, or regulate their activity within the cell. Posttranslational modifications (PTMs) are various additions or alterations to the chemical structure of the newly synthesized protein and are critical features of the overall cell biology.

In general, proteomics research involves investigating any aspect of a protein, such as its structure, function, modifications, localization, or interactions with other proteins or molecules. To investigate how particular proteins regulate biology, researchers usually need a means of producing functional proteins of interest. Given the size and complexity of most proteins, chemical (de novo) synthesis is generally not a practical option for this purpose. Instead, living cells or their cellular machinery can be harnessed as factories that build proteins from supplied genetic templates. Unlike proteins, DNA is simple to construct synthetically or in vitro using well-established recombinant DNA techniques. Therefore, DNA sequences of specific genes can be constructed as templates for subsequent protein expression (Figure 1.2). Proteins produced from such DNA templates are called recombinant proteins.
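Because DNA templates are easy to design, a target protein sequence can be converted back into a coding sequence before the gene is synthesized or cloned. The sketch below back-translates a protein using one illustrative codon per amino acid. The `CODON_CHOICE` table and the function name are assumptions for demonstration only; this is not a codon-optimization algorithm.

```python
# Sketch: back-translate a protein into a DNA coding sequence (5' -> 3').
# One illustrative codon per amino acid (an assumption, not codon optimization).
CODON_CHOICE = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP_CODON = "TAA"


def back_translate(protein: str) -> str:
    """Return a coding sequence for `protein`, terminated by a stop codon."""
    protein = protein.upper()
    if not protein.startswith("M"):
        raise ValueError("expected the protein to start with Met (ATG start codon)")
    return "".join(CODON_CHOICE[aa] for aa in protein) + STOP_CODON


print(back_translate("MAHHHHHH"))  # ATGGCGCATCATCATCATCATCATTAA
```

Any synonymous codon would encode the same protein; real gene design also weighs host codon usage, GC content, and mRNA structure, which this sketch deliberately leaves out.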

Figure 1.2. A typical protein expression workflow.

Cloning technologies for protein expression

Cloning refers to the process of transferring a DNA fragment, or gene of interest, from one organism to a self-replicating genetic element such as an expression vector (Figure 1.3).

Figure 1.3. Basic cloning vector.

A typical expression vector includes at least 4 key elements:

Additional elements may include:

  • Multiple cloning site (MCS) (i.e., a polylinker)
  • Epitope tags
  • Secretion signal
  • Protease recognition sites
  • Internal ribosome entry site (IRES)

Most vectors contain a promoter for expression in a specific host system; however, some offer the option to add your own promoter. Table 1.1 lists common constitutive and inducible promoters.

Table 1.1. Constitutive and inducible promoters commonly used in recombinant protein expression systems.

Host | Expression system | Constitutive promoters | Inducible promoters | Inducers
Mammalian | In vivo | CMV (cytomegalovirus); EF-1 alpha (human elongation factor 1 alpha); UbC (human ubiquitin C); SV40 (simian virus 40) | Promoter with TetO2 (tetracycline operator); promoter with GAL4-UAS (yeast GAL4 upstream activating sequence) | Tetracycline or doxycycline; mifepristone
Mammalian | Cell free (rabbit reticulocyte) | None | NA | NA
Mammalian | Cell free (HeLa or CHO) | Requires T7 promoter and T7 RNA polymerase for transcription | NA | NA
Insect | In vivo | Ac5 (actin); OpIE1 & 2; PH (polyhedrin); p10 | MT (metallothionein) | Copper
Yeast | In vivo | GAP (glyceraldehyde-3-phosphate dehydrogenase) | AOX1 (alcohol oxidase 1); GAL1 (galactokinase) | Methanol; galactose
E. coli | In vivo | Not commonly available | Lac (lactose operon); araBAD (L-arabinose operon) | IPTG; L-arabinose
E. coli | Cell free | Requires T7 promoter and T7 RNA polymerase for transcription | NA | NA

Depending on the host system, another important factor to consider is the inclusion of a Shine-Dalgarno ribosome-binding sequence (for prokaryotic systems) or a Kozak consensus sequence (for eukaryotic systems).
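As a rough illustration of these sequences, the sketch below checks a sequence for a strong Kozak context (a purine at position -3 and a G at +4, numbering the A of ATG as +1, as in the consensus gccRccATGG) and for the Shine-Dalgarno core motif AGGAGG upstream of the start codon. Treating the first ATG as the start codon and the 5 to 15 nt spacing window are simplifying assumptions of this sketch, not design rules from this handbook.

```python
# Simplified initiation-context checks (assumptions, not complete models).

def first_atg(seq: str) -> int:
    """Index of the first ATG (assumed to be the start codon), or -1."""
    return seq.upper().find("ATG")


def kozak_is_strong(seq: str) -> bool:
    """Purine (A/G) at -3 and G at +4, with the A of ATG numbered +1."""
    seq = seq.upper()
    start = first_atg(seq)
    # Need at least 3 nt upstream (-3) and 1 nt after the ATG (+4).
    if start < 3 or start + 3 >= len(seq):
        return False
    return seq[start - 3] in "AG" and seq[start + 3] == "G"


def has_shine_dalgarno(seq: str, motif: str = "AGGAGG",
                       min_gap: int = 5, max_gap: int = 15) -> bool:
    """True if `motif` ends min_gap..max_gap nt before the first ATG."""
    seq = seq.upper()
    start = first_atg(seq)
    if start == -1:
        return False
    # gap = nucleotides between the end of the motif and the A of ATG
    for gap in range(min_gap, max_gap + 1):
        motif_start = start - gap - len(motif)
        if motif_start >= 0 and seq[motif_start:motif_start + len(motif)] == motif:
            return True
    return False


print(kozak_is_strong("GCCACCATGG"))                # True
print(has_shine_dalgarno("AGGAGGTTTAACTTTATGAAA"))  # True
```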

Epitope tags are commonly used for easy detection or rapid purification of your protein of interest; the sequence coding for the tag is fused in frame with your gene. Epitope tags can be placed at either the N-terminus or the C-terminus of your recombinant protein. Table 1.2 offers some basic guidelines for selecting an epitope tag.

Table 1.2. Typical applications for various epitope tags.

Purpose | Description | Examples of tag
Detect | Well-characterized antibody available against the tag; easily visualized | V5, Xpress, myc, 6XHis, GST, BioEase tag, capTEV tag, GFP, Lumio tag, HA tag, FLAG tag
Purify | Resins available to facilitate purification | 6XHis, GST, BioEase tag, capTEV tag
Cleave | Protease recognition site (TEV, EK, HRV3C, Factor Xa) to remove the tag after expression and recover the native protein | Any tag with a protease recognition site following the tag (N-terminal tags only)
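The sketch below shows what a cleavable N-terminal tag fusion looks like at the DNA level: a start codon, a 6XHis tag, a TEV protease recognition site (ENLYFQ/G), and then the gene of interest, all in one reading frame. The codon choices, the helper names, and the decision to drop the gene's own ATG are illustrative assumptions. Note that TEV cleavage leaves the Gly of the site at the new N-terminus.

```python
# Sketch: N-terminal 6XHis tag + TEV site fused in frame to a gene (illustrative).
HIS6 = "CAT" * 6                    # His x 6
TEV_SITE = "GAAAACCTGTATTTTCAGGGC"  # E N L Y F Q G; TEV cleaves between Q and G
STOP_CODONS = {"TAA", "TAG", "TGA"}


def n_terminal_fusion(gene_cds: str) -> str:
    """Prepend ATG + tag + TEV site; the gene's own start codon is dropped."""
    gene_cds = gene_cds.upper()
    if not gene_cds.startswith("ATG") or len(gene_cds) % 3 != 0:
        raise ValueError("gene must start with ATG and be a whole number of codons")
    return "ATG" + HIS6 + TEV_SITE + gene_cds[3:]


def in_frame_without_internal_stop(cds: str) -> bool:
    """Whole codons, a terminal stop codon, and no premature stop codons."""
    if not cds or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return codons[-1] in STOP_CODONS and not any(c in STOP_CODONS for c in codons[:-1])


gene = "ATGAAAGTGTAA"  # hypothetical toy gene: Met-Lys-Val-stop
fusion = n_terminal_fusion(gene)
print(len(fusion), in_frame_without_internal_stop(fusion))  # 51 True
```

The frame check matters because a tag sequence whose length is not a multiple of 3 would shift every downstream codon of the gene.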

Genes and their variants can be prepared via PCR, isolated as a cDNA clone, or synthesized as Invitrogen™ GeneArt™ Strings™ DNA Fragments or Libraries. Alternatively, genes can be synthesized by Invitrogen™ GeneArt™ Gene Synthesis or Directed Evolution custom services. Read more about building your gene in our Gene to protein handbook.

Thermo Fisher Scientific offers a variety of unique cloning technologies to shuttle your gene of interest into the right vector, simplify cloning procedures, and help accelerate protein expression.

  • Restriction enzyme digestion followed by ligation is an industry-standard cloning method for molecular biologists
  • Invitrogen™ Gateway™ technology offers the greatest flexibility in vector choices and downstream applications
  • Invitrogen™ TOPO™ cloning provides simple and convenient reactions, typically requiring less than 5 minutes
  • Invitrogen™ GeneArt™ seamless cloning and genetic assembly allow simultaneous cloning of up to 4 large DNA fragments into virtually any linearized E. coli vector in a 30-minute room-temperature reaction (up to 40 kb total size)
  • Invitrogen™ GeneArt™ Type IIs Assembly avoids homologous recombination, allowing simultaneous, scarless cloning of up to 8 homologous or repetitive sequence fragments
  • GeneArt Gene Synthesis offers 100% sequence accuracy and optimization of genes to help maximize protein expression
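For restriction enzyme and ligation cloning, a common first check is which enzymes in the vector's MCS do not cut within the insert. The sketch below scans an insert for recognition sites. The six recognition sequences are the standard sites for these enzymes, but the MCS enzyme list and the insert in the example are hypothetical and should come from your own vector map and gene.

```python
import re

# Standard recognition sequences (all palindromic, so one strand suffices).
RECOGNITION_SITES = {
    "NdeI": "CATATG",
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
    "NotI": "GCGGCCGC",
}


def cut_positions(seq: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) match of site."""
    return [m.start() for m in re.finditer(f"(?={site})", seq.upper())]


def enzymes_safe_for_insert(insert: str, mcs_enzymes: list[str]) -> list[str]:
    """MCS enzymes whose recognition site does not occur in the insert."""
    return [e for e in mcs_enzymes if not cut_positions(insert, RECOGNITION_SITES[e])]


insert = "ATGAAAGAATTCCTGTAA"             # hypothetical insert with an internal EcoRI site
mcs = ["NdeI", "EcoRI", "BamHI", "XhoI"]  # hypothetical MCS enzymes
print(enzymes_safe_for_insert(insert, mcs))  # ['NdeI', 'BamHI', 'XhoI']
```

Choosing two different safe enzymes, one at each end of the insert, also makes the ligation directional.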


Transformation and plasmid isolation

Once cloning is complete, the plasmids are introduced into competent E. coli cells (chemically competent or electrocompetent) for propagation and storage, a process called transformation. Chemically competent cells have been treated with salts that make the membrane and cell wall more permeable to DNA; after plasmid DNA is added, a brief, mild heat shock allows the plasmid to enter the cells. In contrast, DNA is introduced into electrocompetent cells through transient pores that form in the E. coli membrane and cell wall when short electrical pulses are delivered to the mixture of cells and plasmid DNA. When choosing a competent cell strain to work with, consider the following factors:

  • Genotype—the list of genetic mutations (deletions, changes, or insertions) in the strain that distinguish it from wild-type E. coli
  • Transformation efficiency—a measure of how much supercoiled plasmid (such as pUC19) is successfully transformed into a given volume of cells, expressed as colony forming units per microgram of DNA delivered (cfu/μg); we manufacture competent cells with efficiencies ranging from >1 × 10⁶ to >3 × 10¹⁰ cfu/μg
  • Application—experiment type for which the competent cells are well suited; applications include routine cloning, protein expression, library production, cloning unstable DNA, ssDNA propagation, bacmid creation, and Cre-Lox recombination
  • Kit format—formats include high-throughput (96 well), single-use Invitrogen™ One Shot™ vials, standard kits, or bulk format
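Because transformation efficiency is expressed per microgram of DNA delivered, only the fraction of the transformation that is actually spread on the plate counts. A minimal sketch of the calculation, using hypothetical example numbers (100 pg of plasmid, 50 μL plated out of a 300 μL transformation, 150 colonies); the function name is illustrative:

```python
def transformation_efficiency(colonies: int, dna_ng: float,
                              volume_plated_ul: float, total_volume_ul: float) -> float:
    """Transformation efficiency in cfu per microgram of DNA plated."""
    if colonies < 0 or dna_ng <= 0 or not 0 < volume_plated_ul <= total_volume_ul:
        raise ValueError("invalid inputs")
    # Convert ng to ug, then scale by the fraction of the transformation plated.
    dna_plated_ug = (dna_ng / 1000.0) * (volume_plated_ul / total_volume_ul)
    return colonies / dna_plated_ug


eff = transformation_efficiency(colonies=150, dna_ng=0.1,
                                volume_plated_ul=50, total_volume_ul=300)
print(f"{eff:.1e} cfu/ug")  # 9.0e+06 cfu/ug
```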


After the plasmid DNA has been replicated by the E. coli molecular machinery, a plasmid purification kit can be used to purify the plasmid DNA (containing your gene of interest). We offer two main technologies for plasmid purification:

  • Anion exchange
  • Silica

For purifying a cloned plasmid that will be transfected into a cell line for protein expression, we recommend anion exchange purification for its higher purity and lower endotoxin levels. Silica-based purification is appropriate for cloning-related workflows but is not optimal for plasmids used for transfection, because it leaves higher levels of endotoxins and impurities. Anion exchange columns also give better results with large plasmids. The Invitrogen™ PureLink™ HiPure Expi Plasmid Kits have been developed to give higher yields from large-scale plasmid isolation in less than half the time of typical plasmid DNA isolation methods.


Selecting an expression system

Choosing the right expression system for your specific application is key to success. Protein solubility, functionality, purification speed, and yield are often crucial factors in this choice, and each system has its own strengths and challenges. We offer 6 unique expression systems: mammalian, insect, yeast, bacterial, algal, and cell-free. Table 1.3 summarizes the main characteristics of these systems, including the most common applications, advantages, and challenges of each.

Once a system is selected, the method of gene delivery will need to be considered for protein expression. The main methods for gene delivery include transfection and transduction.

Transfection is the process by which nucleic acids are introduced into mammalian and insect cells. Protocols and techniques vary widely and include lipid-mediated, chemical, and physical methods such as electroporation.


For cell types not amenable to lipid-mediated transfection, viral vectors are often used. Virus-mediated gene delivery, known as transduction, offers a means to reach hard-to-transfect cell types for protein overexpression or knockdown, and it is the most commonly used method in clinical research. Adenoviral, oncoretroviral, lentiviral, and baculoviral vectors have been used extensively for gene delivery to mammalian cells, both in cell culture and in vivo.

Cell lysis and protein purification

The next step after protein expression is often to isolate and purify the protein of interest. Protein yield and activity can be maximized by selecting the right lysis reagents and an appropriate purification resin. We offer cell lysis formulations optimized for specific host systems, including cultured mammalian, yeast, baculovirus-infected insect, and bacterial cells. Many recombinant proteins are expressed as fusion proteins with affinity tags, such as polyhistidine (His) or glutathione S-transferase (GST), which allow selective purification of the protein of interest. His-tagged proteins are purified using immobilized metal affinity chromatography (IMAC) resins, and GST-tagged proteins are purified using a reduced glutathione resin.

Table 1.3. Characteristics of various recombinant protein expression systems.

Expression system | Most common applications | Advantages | Challenges
Mammalian | Functional assays; structural analysis; antibody production; expression of complex proteins; protein interactions; virus production | Highest-level protein processing; can produce proteins either transiently or by stable expression; robust optimized transient systems for rapid, ultrahigh-yield protein production | Gram-per-liter yields only possible in suspension cultures; more demanding culture conditions
Insect | Functional assays; structural analysis; expression of intracellular proteins; expression of protein complexes; virus production | Similar to mammalian protein processing; can be used in static or suspension culture | More demanding culture conditions than prokaryotic systems; production of recombinant baculovirus vectors is time consuming
Yeast | Structural analysis; antibody generation; functional analysis; protein interactions | Eukaryotic protein processing; scalable up to fermentation (grams per liter); simple media requirements | Fermentation required for very high yields; growth conditions may require optimization
Bacterial | Structural analysis; antibody generation; functional assays; protein interactions | Scalable; low cost; simple culture conditions | Protein solubility; may require protein-specific optimization; may be difficult to express some mammalian proteins
Algal | Studying photosynthesis, plant biology, and lipid metabolism; genetic engineering; biofuel production | Genetic modification and expression systems for photosynthetic microalgae; superb experimental control for biofuel, nutraceutical, and specialty chemical production; optimized system for robust selection and expression | Nascent technology; less developed compared to other host platforms
Cell-free | Toxic proteins; incorporation of unnatural labels or amino acids; functional assays; protein interactions; translational inhibitor screening | Open system, able to add unnatural components; fast expression; simple format | Scaling above multimilligram quantities may be costly