Protein Expression Overview
Recombinant protein expression technology enables analysis of gene regulation and protein structure and function. Utilization of recombinant protein expression varies widely—from investigation of function in vivo to large-scale production for structural studies and biotherapeutic drug discovery. This handbook will cover the fundamentals of protein expression, from selecting a host system to creating your protein expression vector, as well as highlighting key tips and products that can be used to optimize recombinant protein production and purification.
Get your copy of the handbook
If you've found this chapter—Protein Expression Overview—useful, you may be interested in getting your own copy of the entire 118-page Protein Expression Handbook in convenient PDF format.
How proteins are made
Proteins are synthesized and regulated depending upon the functional need in the cell. The blueprints for proteins are stored in DNA, which is used as a template by highly regulated transcriptional processes to produce messenger RNA (mRNA). The message coded by an mRNA is then translated into defined sequences of amino acids that form a protein (Figure 1.1). Transcription is the transfer of information from DNA to mRNA, and translation is the synthesis of protein based on an amino acid sequence specified by mRNA.
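The two steps described above can be illustrated with a minimal sketch. It assumes the standard genetic code, takes the coding (sense) strand of the DNA as input, and ignores regulation, splicing, and untranslated regions. The function names `transcribe` and `translate` are illustrative only.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching the DNA coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the message."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so no protein
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed by DNA codons
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCAAATAA")
print(mrna)             # AUGGCCAAAUAA
print(translate(mrna))  # MAK
```

The output is a one-letter amino acid string. Real transcripts also carry untranslated regions and, in eukaryotes, introns that are removed before translation.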
In prokaryotes, the processes of transcription and translation occur simultaneously. The translation of mRNA starts even before a mature mRNA transcript is fully synthesized. This simultaneous transcription and translation of a gene is termed coupled transcription and translation. In eukaryotes, the processes are spatially separated and occur sequentially, with transcription happening in the nucleus and translation occurring in the cytoplasm. After translation, polypeptides are modified in various ways to complete their structure, designate their location, or regulate their activity within the cell. Posttranslational modifications (PTMs) are various additions or alterations to the chemical structure of the newly synthesized protein and are critical features of the overall cell biology.
In general, proteomics research involves investigating any aspect of a protein, such as structure, function, modifications, localization, or interactions with proteins or other molecules. To investigate how particular proteins regulate biology, researchers usually require a means of producing (manufacturing) functional proteins of interest. Given the size and complexity of proteins, de novo synthesis is not a viable option for this endeavor. Instead, living cells or their cellular machinery can be harnessed as factories to build and construct proteins based on supplied genetic templates. Unlike proteins, DNA is simple to construct synthetically or in vitro using well-established recombinant DNA techniques. Therefore, DNA sequences of specific genes can be constructed as templates for subsequent protein expression (Figure 1.2). Proteins produced from such DNA templates are called recombinant proteins.
Cloning technologies for protein expression
Cloning refers to the process of transferring a DNA fragment, or gene of interest, from one organism to a self-replicating genetic element such as an expression vector (Figure 1.3).
A typical expression vector includes at least 4 key elements:
- The gene of interest (GOI) expression cassette (including a promoter and a gene-termination or poly(A) signal)
- Bacterial origin of replication (ori)
- Antibiotic selection cassette for the particular host (e.g., conferring resistance to blasticidin S, Geneticin Selective Antibiotic, hygromycin B, mycophenolic acid, puromycin, Zeocin Selection Reagent)
- Antibiotic selection cassette for E. coli (e.g., conferring resistance to ampicillin, blasticidin S, carbenicillin, Zeocin Selection Reagent)
Additional elements may include:
- Multiple cloning site (MCS) (i.e., a polylinker)
- Epitope tags
- Secretion signal
- Protease recognition sites
- Internal ribosome entry site (IRES)
Most vectors contain a promoter for expression in a specific host system; however, some offer the option to add your own promoter. Table 1.1 lists common constitutive and inducible promoters.
Table 1.1. Constitutive and inducible promoters commonly used in recombinant protein expression systems.
|Host||Expression system||Constitutive promoters||Inducible promoters||Inducers|
|Mammalian||In vivo||CMV (cytomegalovirus); EF-1 alpha (human elongation factor 1 alpha); UbC (human ubiquitin C); SV40 (simian virus 40)||Promoter with TetO2 (tetracycline operator); promoter with GAL4-UAS (yeast GAL4 upstream activating sequence)||Tetracycline or doxycycline; mifepristone|
|Mammalian||Cell free (rabbit reticulocyte)||None||NA||NA|
|Mammalian||Cell free (HeLa or CHO)||Requires T7 promoter and T7 RNA polymerase for transcription||NA||NA|
|Insect||In vivo||Ac5 (actin); OpIE1 and OpIE2; PH (polyhedrin); p10||MT (metallothionein)||Copper|
|Yeast||In vivo||GAP (glyceraldehyde-3-phosphate dehydrogenase)||AOX1 (alcohol oxidase 1); GAL1 (galactokinase)||Methanol; galactose|
|E. coli||In vivo||Not commonly available||Lac (lactose operon); araBAD (L-arabinose operon)||IPTG; L-arabinose|
|E. coli||Cell free||Requires T7 promoter and T7 RNA polymerase for transcription||NA||NA|
Depending on the host system, another important factor to consider is the inclusion of a Shine-Dalgarno ribosome-binding sequence (for prokaryote systems) or Kozak consensus sequence (for eukaryote systems).
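As a rough illustration of checking these translation initiation signals, the sketch below makes two simplifying assumptions. A Kozak context is scored only by the two most influential positions (a purine at -3 and a G at +4 relative to the A of the start codon). A Shine-Dalgarno sequence is taken to be the motif AGGAGG within 20 nucleotides upstream of the start codon. The function names, the scoring labels, and the 20-nucleotide window are illustrative choices, not part of any vendor tool.

```python
def kozak_strength(seq: str, atg_index: int) -> str:
    """Classify the Kozak context of the ATG that starts at atg_index."""
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("atg_index does not point at an ATG")
    if atg_index < 3 or atg_index + 3 >= len(seq):
        raise ValueError("need 3 nt upstream and 1 nt downstream of the ATG")
    purine_at_minus3 = seq[atg_index - 3] in "AG"
    g_at_plus4 = seq[atg_index + 3] == "G"
    matches = purine_at_minus3 + g_at_plus4  # booleans add up to 0, 1, or 2
    return ("weak", "adequate", "strong")[matches]


def has_shine_dalgarno(seq: str, atg_index: int,
                       motif: str = "AGGAGG", max_upstream: int = 20) -> bool:
    """Report whether the motif occurs in the window upstream of the ATG."""
    seq = seq.upper()
    upstream = seq[max(0, atg_index - max_upstream):atg_index]
    return motif in upstream


print(kozak_strength("GCCACCATGGCG", 6))                  # strong
print(has_shine_dalgarno("TTAGGAGGTTTTTTATGAAA", 14))     # True
```

Real ribosome-binding sites vary in both sequence and spacing, so a check like this is only a first-pass screen of a construct design.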
Epitope tags are commonly used to allow for easy detection or rapid purification of your protein of interest by fusing a sequence coding for the tag with your gene. Epitope tags can be either on the N-terminus or C-terminus of your recombinant protein. Table 1.2 offers some basic guidelines to help select an epitope tag.
Table 1.2. Typical applications for various epitope tags.
|Purpose||Description||Examples of tag|
|Detect||Well-characterized antibody available against the tag; easily visualized||V5, Xpress, myc, 6XHis, GST, BioEase tag, capTEV tag, GFP, Lumio tag, HA tag, FLAG tag|
|Purify||Resins available to facilitate purification||6XHis, GST, BioEase tag, capTEV tag|
|Cleave||Protease recognition site (TEV, EK, HRV3C, Factor Xa) to remove tag after expression to get native protein||Any tag with a protease recognition site following the tag (only on N-terminus)|
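Fusing a tag to a gene amounts to joining the tag's coding sequence to the open reading frame (ORF) in the same reading frame. The sketch below shows this for a 6XHis tag, with a TEV protease site (amino acids ENLYFQG) placed after the N-terminal tag. The codon choices and the helper names `add_n_terminal_tag` and `add_c_terminal_tag` are assumptions for illustration. A real design would also consider linkers and codon usage for the host.

```python
HIS6 = "CATCACCATCACCATCAC"         # six histidine codons (CAT/CAC)
TEV_SITE = "GAAAACCTGTATTTTCAGGGC"  # encodes ENLYFQG
STOP_CODONS = {"TAA", "TAG", "TGA"}


def _check_orf(orf: str) -> str:
    """Require a complete ORF: ATG start, in-frame length, stop codon at the end."""
    orf = orf.upper()
    if len(orf) % 3 != 0 or not orf.startswith("ATG") or orf[-3:] not in STOP_CODONS:
        raise ValueError("expected an in-frame ORF from ATG to a stop codon")
    return orf


def add_n_terminal_tag(orf: str, tag: str = HIS6, cleavage_site: str = TEV_SITE) -> str:
    """Place start codon, tag, and protease site ahead of the ORF (native ATG dropped)."""
    orf = _check_orf(orf)
    return "ATG" + tag + cleavage_site + orf[3:]


def add_c_terminal_tag(orf: str, tag: str = HIS6) -> str:
    """Insert the tag between the last sense codon and the stop codon."""
    orf = _check_orf(orf)
    return orf[:-3] + tag + orf[-3:]


gene = "ATGGCCAAATAA"  # toy ORF encoding M-A-K
print(add_n_terminal_tag(gene))
print(add_c_terminal_tag(gene))
```

Both functions preserve the reading frame because the tag and the cleavage site are whole numbers of codons.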
Genes and their variants can be prepared via PCR, isolated as a cDNA clone, or synthesized as Invitrogen GeneArt Strings DNA Fragments or Libraries. Alternatively, genes can be synthesized by Invitrogen GeneArt Gene Synthesis or Directed Evolution custom services. Read more about building your gene in our Gene to protein handbook.
Thermo Fisher Scientific offers a variety of unique cloning technologies to shuttle your gene of interest into the right vector, simplify cloning procedures, and help accelerate protein expression.
- Restriction enzyme digestion followed by ligation cloning is an industry standard for molecular biologists
- Invitrogen Gateway technology offers the greatest flexibility in vector choices and downstream applications
- Invitrogen TOPO cloning provides simple and convenient reactions, typically requiring less than 5 minutes
- Invitrogen GeneArt Seamless Cloning and Genetic Assembly technology allows cloning of up to 4 large DNA fragments (up to 40 kb total size) simultaneously into virtually any linearized E. coli vector in a 30-minute room-temperature reaction
- Invitrogen GeneArt Type IIs Assembly avoids homologous recombination, allowing for the simultaneous cloning of up to 8 homologous or repetitive sequence fragments without scars
- GeneArt Gene Synthesis offers 100% sequence accuracy and optimization of genes to help maximize protein expression
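For restriction enzyme cloning, a common first check is whether the enzymes chosen for the vector also cut inside the insert. The sketch below scans a sequence for the recognition sites of four widely used enzymes. Because these sites are palindromic, scanning one strand is sufficient. The function names and the small enzyme table are illustrative, not an exhaustive resource.

```python
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
}


def find_sites(seq: str, site: str) -> list[int]:
    """Return the 0-based start positions of every occurrence of site in seq."""
    seq = seq.upper()
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i)
        i = seq.find(site, i + 1)  # step by 1 so overlapping sites are found
    return positions


def non_cutters(insert: str, enzymes: dict[str, str] = RECOGNITION_SITES) -> list[str]:
    """List enzymes with no recognition site in the insert (safe for flanking sites)."""
    return sorted(name for name, site in enzymes.items() if not find_sites(insert, site))


insert = "AAGAATTCTTGGATCCAA"
print(find_sites(insert, RECOGNITION_SITES["EcoRI"]))  # [2]
print(non_cutters(insert))                             # ['HindIII', 'XhoI']
```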
Transformation and plasmid isolation
Once cloning is completed, plasmids are taken up into competent cells (chemically competent or electrocompetent E. coli) for propagation and storage, by a process called transformation. Chemically competent cells are cells treated with salts to open up the pores in the membrane and cell wall. Plasmid DNA is then added to the cells and a mild heat shock opens pores in the E. coli cells, allowing for entry of the plasmid. In contrast, DNA is introduced into electrocompetent cells through transient pores that are formed in the E. coli membrane and cell wall when short electrical pulses are delivered to the cell and plasmid DNA mixture. When choosing a competent cell strain to work with, it is important to consider the following factors:
- Genotype—the list of genetic mutations (deletions, changes, or insertions) in the strain that distinguish it from wild-type E. coli
- Transformation efficiency—measurement of amount of supercoiled plasmid (such as pUC19) successfully transformed into a volume of cells; defined as colony-forming units per microgram of DNA delivered (cfu/μg); we manufacture competent cells that have efficiencies ranging from >1 x 10^6 to >3 x 10^10 cfu/μg
- Application—experiment type for which the competent cells are well suited; applications include routine cloning, protein expression, library production, cloning unstable DNA, ssDNA propagation, bacmid creation, and Cre-Lox recombination
- Kit format—formats include high-throughput (96 well), single-use Invitrogen One Shot vials, standard kits, or bulk format
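Transformation efficiency, as defined in the list above, can be worked out from a plate count. The sketch below assumes that a known amount of plasmid was added to the cells, the recovery culture had a known total volume, and only part of it (optionally diluted) was plated. The function and parameter names are illustrative.

```python
def transformation_efficiency(colonies: int, dna_ng: float,
                              total_volume_ul: float, plated_volume_ul: float,
                              dilution_factor: float = 1.0) -> float:
    """Return colony-forming units per microgram of DNA (cfu/ug)."""
    if min(dna_ng, total_volume_ul, plated_volume_ul, dilution_factor) <= 0:
        raise ValueError("amounts, volumes, and dilution factor must be positive")
    dna_ug = dna_ng / 1000.0                            # ng -> ug
    fraction_plated = plated_volume_ul / total_volume_ul
    dna_plated_ug = dna_ug * fraction_plated / dilution_factor
    return colonies / dna_plated_ug


# Example: 10 pg (0.01 ng) of pUC19, 1,000 uL recovery culture, 100 uL plated, 250 colonies
efficiency = transformation_efficiency(250, dna_ng=0.01,
                                       total_volume_ul=1000, plated_volume_ul=100)
print(f"{efficiency:.2e} cfu/ug")
```

In this example only 1 pg of DNA reaches the plate, so 250 colonies correspond to about 2.5 x 10^8 cfu/μg.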
After E. coli's molecular machinery has replicated the plasmid, a plasmid purification kit can be used to purify the plasmid DNA (containing your gene of interest). We offer two main technologies for plasmid purification:
- Silica
- Anion exchange
For purification of a cloned plasmid that will be used to transfect a cell line for protein expression, we recommend anion exchange purification for its higher purity and lower endotoxin levels. Silica-based purification is appropriate for cloning-related workflows but is not optimal for plasmids used for transfection, because it leaves higher levels of endotoxins and impurities. Anion exchange columns also produce better results with large plasmids. The Invitrogen PureLink HiPure Expi Plasmid Kits have been developed to give higher yields from large-scale plasmid isolation, in less than half the time of typical plasmid DNA isolation methods.
Selecting an expression system
Using the right expression system for your specific application is the key to success. Protein solubility, functionality, purification speed, and yield are often crucial factors to consider, and each system has its own strengths and challenges. We offer 6 unique expression systems: mammalian, insect, yeast, bacterial, algal, and cell-free systems. Table 1.3 summarizes the main characteristics of these expression systems, including the most common applications, advantages, and challenges of each system.
Once a system is selected, the method of gene delivery will need to be considered for protein expression. The main methods for gene delivery include transfection and transduction.
Transfection is the process by which nucleic acids are introduced into mammalian and insect cells. Protocols and techniques vary widely and include lipid-mediated, chemical, and physical methods such as electroporation.
For cell types not amenable to lipid-mediated transfection, viral vectors are often employed. Virus-mediated transfection, also known as transduction, offers a means to reach hard-to-transfect cell types for protein overexpression or knockdown, and it is the most commonly used method in clinical research. Adenoviral, oncoretroviral, lentiviral, and baculoviral vectors have been used extensively for gene delivery to mammalian cells, both in cell culture and in vivo.
Cell lysis and protein purification
The next step following protein expression is often to isolate and purify the protein of interest. Protein yield and activity can be maximized by selecting the right lysis reagents and appropriate purification resin. We offer cell lysis formulations that have been optimized for specific host systems, including cultured mammalian, yeast, baculovirus-infected insect, and bacterial cells. Most recombinant proteins are expressed as fusion proteins with short affinity tags, such as polyhistidine or glutathione S-transferase, which allow for selective purification of the protein of interest. Recombinant His-tagged proteins are purified using immobilized metal affinity chromatography (IMAC) resins, and GST-tagged proteins are purified using a reduced glutathione resin.
Table 1.3. Characteristics of various recombinant protein expression systems.
|Expression system||Most common application||Advantages||Challenges|
For Research Use Only. Not for use in diagnostic procedures.