Biowire Fall 2011 — Screening — microRNA Target Identification
One of the main problems miRNA researchers face today is global identification of gene targets that are functionally regulated by their miRNA of interest. Unfortunately, bioinformatic predictions are unreliable and there is little agreement between different algorithms. Many experimental strategies only look at mRNA degradation, not translational repression. Methods are needed that will enable global identification of all biologically relevant RNA targets regulated by miRNAs.
To assist discovery and identification of human microRNA (miRNA) targets, we developed the MISSION® Target ID Library, a library of cDNA cloned after a dual-selection fusion protein. Dual selection allows the user to transfect cells and screen the entire library at once, selecting first for stable transformants and second, after introducing a miRNA of interest, for cDNAs containing the miRNA’s targets. Cells containing cDNA constructs targeted by the miRNA survive the second selection, and selected cDNAs can be identified by sequencing (Figure 1). The cDNA library was prepared from a mixture of total RNAs from multiple human tissues and cell lines to give broad coverage of the human transcriptome. In addition, the cDNA was normalized to reduce representation of abundant mRNAs and enrich for rare mRNAs, and size-fractionated to clone long and short cDNA separately and preserve the longer constructs. Illumina® sequencing shows that 16,922 of the 21,518 unique genes in UCSC RefGene (79%) are represented in the Target ID Library, and 14,000 genes (66%) are represented by 10 or more reads. Sequence results confirm normalization, PCR with insert-flanking primers demonstrates that >95% of clones contain inserts, and restriction digestion shows an average insert size of approximately 1.2 kb. We demonstrate performance of the Target ID Library by screening for targets of miR-373, an important regulator in cancer metastasis and in stem cell reprogramming and maintenance. The screen in MCF-7 cells yielded nearly 3,000 unique, high-confidence gene hits, 10 of which are previously published miR-373 targets. Work is in progress to analyze results further and validate additional targets.
Figure 1. Target ID Library Workflow. Each step is illustrated and described in detail in Figures 2 and 3.
Figure 2. Creating a Cell Line Expressing the Target ID Library. The Target ID Library is a pool of plasmids (A), each with a human cDNA inserted into the 3'UTR after a thymidine kinase-Zeocin fusion protein (TK/Zeo; Figure 6). Cells are transfected with Target ID Library (B) and allowed to recover for 3–5 days. Constructs can integrate into the genome during this recovery period (C) and express the encoded transcript (D). After recovery, cells are exposed to Zeocin (E). Cells expressing the TK/Zeo fusion protein from stably integrated Target ID constructs survive Zeocin selection (F). Untransfected cells die (G). In addition, any cells containing a construct that is a target for an endogenous miRNA or any other factor expressed in the cell that inhibits expression of TK/Zeo from the Target ID construct will die (H).
Figure 3. Expressing miRNA and Selecting Targets. Cells containing the Target ID Library (i.e., Zeocin-selected cells) are transfected with a selectable microRNA expression construct (A). During recovery, the miRNA expression construct can integrate (B) and express the selectable marker encoded on the miRNA construct. After selection for stable integration (C) and cell expansion, cells are treated with ganciclovir (D). Cells producing thymidine kinase (TK) in the presence of ganciclovir (i.e., cells expressing TK/Zeo constructs not targeted by the miRNA) will die (E). On the other hand, cells containing Library constructs with miRNA target sites will not produce TK, and therefore will survive ganciclovir selection (F). Surviving cells can be grown, gDNA isolated, and the cDNA containing miRNA target sites PCR-amplified using the Target ID Amplification primers. PCR products may be sequenced and aligned with the human genome to identify miRNA targets.
microRNAs are 20–24 nucleotide RNAs that regulate gene expression post-transcriptionally by inhibiting mRNA translation and, frequently, destabilizing the targeted mRNA (1). A single miRNA may regulate several hundred mRNAs to control a cell’s response to developmental and environmental signals. Identifying and validating target mRNAs is essential in determining a miRNA’s role and function in these pathways. However, target identification is not straightforward because, in animals, miRNAs and their target sites are not fully complementary. The “seed” region, bases 2 through 7 from the 5'-end of the miRNA, is usually complementary to its targets. However, there are many exceptions to the seed rule, and downstream base pairing can compensate for an imperfect seed match. A number of computer algorithms have been developed to predict miRNA targets based on seed matching and downstream compensation, target structure and position, sequence conservation, and various other parameters that have been observed for experimentally validated targets (2). While in silico predictions are convenient and do identify many valid miRNA targets, most predicted genes fail experimental validation tests, and many actual targets are not predicted. Furthermore, since computer algorithms are based on previously determined target features, they do not allow discovery of targets that deviate from what is already known.
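As a rough illustration of the seed-matching step that prediction algorithms build on, the Python sketch below scans a sequence for sites perfectly complementary to miRNA bases 2 through 7. The let-7a sequence is the commonly cited mature sequence; the example UTR string and the function names are hypothetical and chosen only for illustration. Real predictors add the compensation, structure, and conservation criteria described above, so this is a sketch of the idea rather than a usable prediction tool.

```python
# Minimal seed-match scan; illustrative only, not a target-prediction tool.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str) -> str:
    """Return the mRNA sequence (5'->3') complementary to miRNA bases 2-7."""
    seed = mirna.upper().replace("T", "U")[1:7]  # bases 2 through 7
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_matches(mirna: str, utr: str) -> list[int]:
    """Return 0-based start positions of perfect seed matches in the UTR."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    return [
        i
        for i in range(len(utr) - len(site) + 1)
        if utr[i:i + len(site)] == site
    ]


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"  # mature let-7a
EXAMPLE_UTR = "AAGCUACCUCAGGAUUUACCUCGG"  # hypothetical sequence, not a real UTR

matches = find_seed_matches(LET7A, EXAMPLE_UTR)
print(seed_site(LET7A), matches)
```

Because only perfect matches are reported, a scan like this would miss the imperfect-seed targets noted above, which is one reason an experimental screen is useful.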
A number of experimental systems have been used successfully to identify or discover functional miRNA targets in living cells. These global screening methods include microarrays and RNA sequencing, RNA co-immunoprecipitation (RIP), and Stable Isotope Labeling with Amino acids in Cell culture (SILAC), a proteomic method (2). Each method has advantages and disadvantages. Since miRNA targeting often destabilizes an mRNA and leads to its degradation, loss or gain in mRNA level after introducing an miRNA mimic or inhibitor, respectively, can identify miRNA targets. This loss or gain of mRNA is readily detected by microarray or deep sequencing. While mRNA detection methods are simpler and more sensitive than protein detection methods, mRNA detection will miss any miRNA targets that are not degraded. Recent reports from the Bartel lab comparing miRNA target results from RNA detection with those from SILAC (3) or ribosome profiling (4) indicate that mammalian miRNAs predominantly act by reducing target mRNA levels. However, many other labs working with individual miRNA targets have detected change in protein but no change in mRNA level. Reports of what appear to be translational regulation with no detectable mRNA loss include: miR-10b on HOXD10 (5), miR-221/222 on p27kip1 (6), miR-21 on Pdcd4 (7), miR-126 on p85β (8), miR-34 on SIRT1 (9), miR-21 on PTEN (10), miR-302d on Arid4b (11), miR-200c on JAG1 (12), and most recently, miR-299, 297, 567, and 609 on VEGFA (13). Furthermore, Clancy et al. recently reported that translational regulation by let-7 was only detected when individual mRNA isoforms that contain the let-7 target site were detected selectively (14). The latter suggests that, at least in some cases, translational regulation may be overlooked in a composite profile (that is, when all mRNA isoforms are detected as one mRNA), as obtained with many microarray and sequence analyses.
Co-immunoprecipitation of miRNA-mRNA complexes via Argonaute (usually Ago2) or another associated protein (RNA immunoprecipitation, or RIP) will isolate miRNA targets regardless of regulation mechanism. In addition, RIP detects endogenous, and presumably biologically relevant, interactions. However, it is not known whether all miRNA-mRNA interactions are functional, and RIP would miss any mRNA targets that associate only transiently or are rapidly degraded. Furthermore, specific miRNA-mRNA partners must be inferred bioinformatically because all miRNA-mRNA pairs coprecipitate. Finally, SILAC directly identifies the final product of miRNA regulation, the protein itself, but is insensitive and therefore misses rare proteins and small fold changes in protein levels.
In light of the limitations with current miRNA target identification methods, we felt an additional global assay to identify functional miRNA targets by an alternative mechanism was needed. To fulfill this need, we licensed a technology invented by Dr. Joop Gäken and Dr. Azim Mohamedali of King’s College London. Their invention is a dual selection fusion protein, specifically, a thymidine kinase-Zeocin™ fusion, regulated by a cDNA library of potential miRNA target sequences. Cells stably transfected with and expressing TK/Zeo-cDNA constructs can be selected with Zeocin, as illustrated in Figure 2. After expressing a miRNA of interest, that miRNA’s targets can be selected with ganciclovir, as illustrated in Figure 3. Ganciclovir kills cells expressing thymidine kinase, that is, any cells expressing TK/Zeo-cDNA lacking a target for the miRNA of interest. Targeted cDNA can be isolated by PCR amplification of DNA from cells that survive ganciclovir using primers that flank the cDNA, and PCR products can be sequenced to identify targets. Gäken et al. demonstrated proof of concept with a library prepared from human brain cDNA (15). They identified one known and discovered several new targets of miR-130a. Notably, for all three of the new targets tested, they detected significantly less protein in cells transfected with miR-130a mimic, but did not detect change in any of the three mRNA levels.
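The logic of the two selection steps can be summarized in a small toy model. The Python sketch below is an idealized illustration, not a description of the actual protocol: the class, field, and function names are hypothetical, killing is treated as all-or-nothing, and selection for the miRNA expression construct itself is ignored.

```python
# Toy model of Zeocin-then-ganciclovir selection; all names are hypothetical.
from dataclasses import dataclass


@dataclass
class Cell:
    label: str
    has_construct: bool            # stably integrated TK/Zeo-cDNA construct
    repressed_endogenously: bool   # cDNA targeted by a factor already in the cell
    targeted_by_mirna: bool        # cDNA carries a site for the miRNA of interest


def expresses_tk_zeo(cell: Cell, mirna_present: bool) -> bool:
    """True if the cell makes the TK/Zeo fusion protein under these conditions."""
    if not cell.has_construct or cell.repressed_endogenously:
        return False
    return not (mirna_present and cell.targeted_by_mirna)


def dual_selection(cells: list[Cell]) -> list[Cell]:
    """Apply both selections and return the cells expected to survive."""
    # Step 1: Zeocin, before the miRNA is introduced; TK/Zeo is required to survive.
    after_zeocin = [c for c in cells if expresses_tk_zeo(c, mirna_present=False)]
    # Step 2: ganciclovir, after the miRNA is introduced; TK expression is lethal.
    return [c for c in after_zeocin if not expresses_tk_zeo(c, mirna_present=True)]


population = [
    Cell("untransfected", False, False, False),
    Cell("endogenous target", True, True, False),
    Cell("non-target cDNA", True, False, False),
    Cell("miRNA target cDNA", True, False, True),
]

survivors = [c.label for c in dual_selection(population)]
print(survivors)
```

In this idealized model only cells carrying a cDNA targeted by the introduced miRNA pass both steps, which is why the surviving population can be sequenced to identify candidate targets.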
To create the MISSION® Target ID Library, we prepared a more comprehensive cDNA library, which was normalized to partially balance the representation of abundant and rare mRNAs. Here we present library design strategy, development, characterization, and preliminary performance results.
cDNA Library Construction
To generate a comprehensive library that contains as many miRNA targets as possible for global miRNA target discovery, we started with the following requirements:
To include as much of the human transcriptome as possible in our cDNA, we started with equal proportions of RNA from Stratagene’s and Clontech’s Universal Human Reference RNAs. The former is a pool of total RNA from 10 human cancer cell lines, and therefore includes RNA from transformed, de-differentiated cells. The latter is a mixture of total RNA from a collection of normal human tissues, confirmed to give broad coverage of the normal human transcriptome. Results from Illumina® sequencing (Table 1) show that the final library includes 16,922 of the 21,518 unique genes in UCSC RefGene (79%), or 14,000 genes with 10 or more reads (66%).
mRNA was isolated from the total RNA mixture, and cDNA was prepared and normalized as outlined in Materials and Methods below. Normalization is an attempt to balance the levels of abundant and rare mRNAs so that rare mRNAs are more readily detected in a screen. When hybridizing mRNA with biotinylated first strand cDNA prepared from the same mRNA, abundant mRNAs encounter their complementary sequence more frequently, and therefore, hybridize more quickly than rare mRNAs. After removing the biotinylated first strand cDNA-mRNA hybrids, cDNA is prepared from the remaining mRNA. Successful normalization of our mRNA was verified by semiquantitative PCR from cDNA before cloning. As shown in Figure 4A, GAPDH, an abundant mRNA, was detected at PCR cycle 21 before normalization but not until cycle 24 after normalization. This 3-cycle change in detection indicates an approximately 8-fold reduction in GAPDH mRNA. On the other hand, TGF-β, a rare mRNA, was detected at least 7 cycles earlier after normalization, indicating an approximately 1,000-fold enrichment (Figure 4A). Illumina sequencing results confirm successful normalization. As shown in Figure 4B, 498 reads aligned with GAPDH, whereas a total of 709 reads aligned with TGF-β isoforms 1–3.
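The fold-change estimates above follow from the usual assumption that PCR product roughly doubles each cycle, so a shift of n cycles corresponds to about a 2^n-fold change in template. The Python sketch below makes that arithmetic explicit; the function names and the efficiency parameter are illustrative assumptions. Under perfect doubling, the 3-cycle GAPDH shift gives 8-fold, while a 7-cycle shift gives 128-fold, so "at least 7 cycles" sets a lower bound and an approximately 1,000-fold enrichment would correspond to a shift of about 10 cycles. Because PCR was sampled at only four cycle numbers, these are coarse estimates.

```python
# Fold change implied by a shift in detection cycle (semi-quantitative PCR).
import math


def fold_change_from_cycles(cycle_shift: float, efficiency: float = 1.0) -> float:
    """Fold change for a given cycle shift.

    efficiency=1.0 assumes perfect doubling per cycle (an idealization).
    """
    return (1.0 + efficiency) ** cycle_shift


def cycles_for_fold(fold: float, efficiency: float = 1.0) -> float:
    """Cycle shift that would correspond to a given fold change."""
    return math.log(fold, 1.0 + efficiency)


gapdh_reduction = fold_change_from_cycles(24 - 21)   # detected at cycle 24 vs 21
tgfb_lower_bound = fold_change_from_cycles(7)        # detected at least 7 cycles earlier
cycles_for_1000x = cycles_for_fold(1000)

print(gapdh_reduction, tgfb_lower_bound, round(cycles_for_1000x, 1))
```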
Figure 4. cDNA Normalization Results. A) Semi-quantitative RT-PCR before and after normalization of mRNA. RT-PCR was performed to detect the abundant GAPDH mRNA or the rare TGF-β mRNA in the mRNA pool before and after normalization. PCR was run for 18, 21, 24, or 28 cycles, and the PCR products evaluated by agarose gel electrophoresis. B) Relative abundance of genes represented in the Target ID Library. Illumina® sequencing results for reads that mapped to the human genome were plotted as number of reads per gene; genes with 10 or more reads are indicated on the x-axis. Positions of GAPDH and TGF-β are marked by arrows, with number of reads in parentheses.
To avoid preferential cloning of shorter cDNAs, the cDNA was size-fractionated (Figure 5A), and the long (>1.5 kb) and short (0.4 to 1.5 kb) cDNAs were ligated to the Target ID Library vector (Figure 6) and transformed into E. coli separately. The two transformed E. coli pools, containing long or short cDNA constructs, were combined and amplified in semi-solid agarose. Growth in semi-solid agarose, that is, culture medium containing a low amount of agarose so that each transformant grows as a separate colony, allows E. coli containing larger constructs to grow with minimal competition from those with smaller constructs. This reduces the loss of larger constructs that occurs in liquid medium. Agarose gel fractionation of the plasmid library after restriction digest to release inserts shows cDNA inserts range from 0.75 to at least 2 kb, with an average size of 1.2 kb (Figure 5B). PCR amplification from 192 individual clones using insert-flanking primers indicates that 69% of the cDNA inserts are >1 kb (Figure 5C).
Figure 5. cDNA Size Selection and Results. A) SfiI-digested and size-selected cDNA before cloning. Aliquots from 2 preparations of both short (0.4–1.5 kb) and long (>1.5 kb) cDNA were fractionated on a 1% agarose gel along with undigested and SfiI-digested Target ID Vector. B) Digestion of the final Library to show size range of cDNA inserts. Three separate digests were performed and fractionated on a 1% agarose gel. C) Individual cDNA insert sizes by PCR. E. coli containing the Library were spread on LB ampicillin plates, and 192 colonies randomly selected and tested by colony PCR with primers flanking the cDNA insert. PCR products were fractionated on 1% agarose gels.
Figure 6. Target ID Library Vector. Features include a CMV promoter, the TK/Zeo fusion protein encoding construct, two different SfiI sites in the 3'UTR region for introducing cDNA, and a polyadenylation signal.
After plasmid purification, the final Target ID Library was deep sequenced on an Illumina® Genome Analyzer IIX. As shown in Table 1, 95% of the reads aligned with either the vector (84%) or the human genome (cDNA insert reads; 11%). Only 2% failed to map to either but did have a match in NCBI’s non-redundant nucleotide database (i.e., with NR hit), and only 0.2% aligned with rRNA, indicating minimal contamination. The remaining 3% did not map to any of the reference sequences or to NCBI’s non-redundant nucleotide database, which is within the normal 1–5% range for Illumina sequence results. A few of the genes represented in the Library and their relative abundance are illustrated in Figure 4B. All represented genes and their relative abundance are listed on our website.
Library Screen for miR-373 Targets
To evaluate performance of the MISSION® Target ID Library, miR-373 targets were selected from MCF-7 cells expressing the library. MCF-7 was chosen because it expresses little or no detectable miR-373 (data not shown). miR-373 was chosen for its biological interest. Huang et al. demonstrated that miR-373 expression promotes tumor invasion and metastasis in MCF-7, normally a non-metastatic cell line (16). Furthermore, Lichner et al. showed that miR-373 orthologues from the mouse miR-290 cluster are involved in mouse embryonic stem cell maintenance (17), and Subramanyam et al. demonstrated that a miR-373 family member, miR-372, promotes fibroblast reprogramming to induced pluripotent stem cells (18). Finally, we previously generated a cell line from MCF-7 that expresses miR-373 using zinc finger nucleases (ZFNs) to insert a PGK promoter-miR-373 expression construct into the AAVS1 site, and generated a list of potential miR-373 targets by RNA microarray analysis (data not shown).
The Target ID Library was transfected into MCF-7 cells, and a stable population of cells was selected and amplified in Zeocin-containing medium. The resulting cells (MCF-7 Library) were stably transfected with a miR-373 expressing construct and selected in ganciclovir-containing medium to enrich for cells expressing miR-373 targets. Negative control MCF-7 Library cells without miR-373 did not detach and become rounded as expected for dead cells. However, they did stop growing, which was readily detected because the phenol red-containing medium did not turn orange-yellow as it did for cells expressing miR-373 (Figure 7). Cells were expanded in ganciclovir-containing medium, and target sequences were isolated by PCR with primers flanking the Target ID Library cDNA inserts and DNA prepared from the surviving cells. PCR products were Illumina® sequenced.
Figure 7. Ganciclovir Effect on Cell Growth. 24-well plates were seeded with 200,000 MCF-7 cells or MCF-7 cells containing the Target ID Library. 24 hours later, the medium was replaced with medium containing 0, 8, or 16 μM ganciclovir. After 15 days, the plate was photographed (6 upper wells), then the wells washed with HBSS and stained with Brilliant Blue R Staining Solution (B6529) (6 lower wells). Note that ganciclovir has no effect on MCF-7 cells without Library because they do not express thymidine kinase (TK). On the other hand, MCF-7 Library cells do express TK but are not completely killed by ganciclovir at 8 or 16 μM, as shown by the live cells taking up Brilliant Blue. However, the Library cells do stop growing in ganciclovir, as evidenced by color of the phenol red dye in their medium. The arrested cells do not acidify their medium, and the color remains reddish instead of changing to orange-yellow.
As shown in Table 2, we obtained 17,740,719 cDNA reads, which mapped to 11,076 unique genes. Of these unique genes, 2,898 were detected with more than 40 reads, and therefore were considered reliable hits. The 13,106,469 vector reads were expected because the PCR primers used to amplify cDNA inserts are 40 and 300 bases away from the insertion site. 14% did not map to either vector or cDNA but did align with NCBI’s non-redundant nucleotide database (i.e., with NR hit), and only 1% had no cDNA insert.
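The read-count cutoff described above amounts to a simple filter on per-gene read totals. The following Python sketch shows the idea under stated assumptions: the gene names and counts are placeholders rather than screen data, and the function name is hypothetical.

```python
# Keep genes supported by more than a minimum number of reads.
MIN_READS = 40  # cutoff stated in the text: "more than 40 reads"


def reliable_hits(read_counts: dict[str, int], min_reads: int = MIN_READS) -> list[str]:
    """Return gene names with strictly more than min_reads reads, sorted by name."""
    return sorted(gene for gene, n in read_counts.items() if n > min_reads)


# Placeholder counts for illustration only; not results from the screen.
example_counts = {"GENE_A": 512, "GENE_B": 40, "GENE_C": 41, "GENE_D": 3}

hits = reliable_hits(example_counts)
print(hits)
```

A gene with exactly 40 reads is excluded here because the text specifies more than 40; a different cutoff can be passed as min_reads.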
Although data analysis is ongoing, an initial comparison found that 10 of the unique genes identified with the Target ID Library are also in the list of previously identified miR-373 targets in TarBase (Table 3). These 10 miR-373 targets were previously identified by Lim et al. by microarray with RNA from HeLa cells transiently transfected with a synthetic miR-373 mimic (19). Furthermore, we detected these same 10 genes down-regulated in MCF-7 cells expressing miR-373 (data not shown).
Therefore, these 10 are likely valid targets of miR-373. Work is in progress to characterize the list of potential targets and attempt to validate selected hits experimentally by RT-qPCR, Western blot, and luciferase reporter assay.
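The comparison with previously reported targets is, in essence, a set intersection between the screen's hit list and a reference list. The Python sketch below illustrates this with placeholder gene names; neither list reflects the actual screen results or TarBase contents, and the function name is hypothetical.

```python
# Compare screen hits against a list of previously reported targets.

def compare_hits(screen_hits: set[str], known_targets: set[str]) -> dict[str, list[str]]:
    """Split hits into those already reported and those that are new candidates."""
    return {
        "previously_reported": sorted(screen_hits & known_targets),
        "new_candidates": sorted(screen_hits - known_targets),
        "reported_but_not_recovered": sorted(known_targets - screen_hits),
    }


# Placeholder names for illustration only.
screen_hits = {"GENE_A", "GENE_C", "GENE_E"}
known_targets = {"GENE_C", "GENE_E", "GENE_F"}

summary = compare_hits(screen_hits, known_targets)
print(summary)
```

Hits in the "new_candidates" group are the ones that would still need experimental validation, for example by the RT-qPCR, Western blot, and luciferase reporter assays mentioned above.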
In conclusion, we have developed a new tool for global identification and discovery of functional human miRNA targets — the MISSION® Target ID Library. With the Target ID Library, users can isolate miRNA targets by a series of mammalian cell culture transfection and drug selection steps. The Library is comprehensive, and contains 66–79% of human genes. Initial results indicate that both previously discovered as well as new targets can be isolated from the MISSION® Target ID Library.
All reagents were from our catalog of products unless otherwise noted.
cDNA Preparation and Cloning
cDNA was prepared and normalized by Rx Biosciences, Ltd. Briefly, we provided Rx Biosciences with 400 μg of both Stratagene Universal Human Reference RNA (Cat. No. 740000) and Clontech Human Universal Reference Total RNA (Cat. No. 636538). These were pooled and mRNA isolated using oligo dT cellulose. First strand cDNA was synthesized from purified mRNA using a biotinylated oligo dT primer. The biotinylated first strand was hybridized with more of the purified mRNA at a ratio of 5:1, biotinylated cDNA-mRNA hybrids removed with streptavidin beads, and unhybridized mRNA recovered from the supernatant. Note that abundant mRNAs will encounter their complementary first strand and hybridize more rapidly than rare mRNA, leaving rare mRNAs enriched in the single-stranded product. cDNA was prepared from the normalized mRNA using the following oligo dT primer: 5'-ATTCTAGAGGCCGAGGCGGCCGGCCGAGGCGGCCGACATG(T30)-3'. After second strand synthesis, the following adapter was ligated to the 3'-end of the cDNA:
The cDNA was digested with SfiI, separated into 0.4–1.5 kb and >1.5 kb fractions on low melting agarose, and the two fractions cloned separately between the SfiI sites in our Target ID Library vector. E. coli were transformed with the large and small cDNA libraries separately, and transformed E. coli were grown in semi-solid agarose medium. 10% of the E. coli recovered was re-amplified in semi-solid agarose medium, and glycerol stocks prepared from the rest and stored at -80 °C for future batches of Target ID Library. Target ID Library plasmid was purified directly from colonies that grew in the second batch of semi-solid medium using Qiagen’s EndoFree Plasmid Giga Kit (Cat. No. 12391), and 100 μg aliquots prepared and stored at -20 °C. One aliquot was submitted to Cofactor Genomics for sequencing on an Illumina® Genome Analyzer IIX and data analysis.
Library Screen with miR-373
MCF-7 cells (ATCC HTB-22) were transfected with the Target ID Library and screened for miR-373 targets as described in the Product User Guide. Specifically, 12 aliquots of 4 × 10⁶ MCF-7 cells were each transfected with 1.6 μg Target ID Library using Amaxa® Cell Line Nucleofector® Kit V and Program P-020 on an Amaxa Nucleofector II instrument. Medium (RPMI-1640 with 10% fetal bovine serum, R8758 and F2442, respectively) was replaced after two days, and cells were allowed to continue recovering for a total of five days, at which time the medium was replaced with medium containing 0.5 mg/mL Zeocin™ (Invitrogen R250-01). Zeocin medium was replaced after two days to remove dead cells. Surviving cells were expanded with Zeocin selection for approximately two weeks to generate MCF-7 Library cells. Freezer stocks were prepared and stored in liquid nitrogen.
A miR-373 expression plasmid was constructed by cloning miR-373 along with 100–200 bp on either side of the miR-373 hairpin from human genomic DNA, as described by Huang et al. (16), and inserting it into pBABE-puro (Addgene plasmid 1764). Ten aliquots of 2 × 10⁶ cells were each transfected with 2 μg miR-373 expression plasmid by Nucleofection as above. Medium was replaced after two days. Five days post-transfection, medium was replaced with medium containing 0.5 mg/mL puromycin (P9620). Surviving cells were expanded for approximately one week with puromycin selection, after which medium was replaced with medium containing 8 μM ganciclovir (G2536). Surviving cells were expanded with ganciclovir selection for about a month. Freezer stocks were prepared and DNA was isolated using GenElute™ Mammalian Genomic DNA Isolation Kit (G1N). PCR was performed as described in the User Guide for MISSION® Target ID Library, and PCR products submitted to Cofactor Genomics for Illumina® sequencing and data analysis.
We thank the entire MISSION® Target ID Library development team, especially Kevin Gutshall for finding the technology and negotiating license terms, and Heather Holemon for her support and participation in fruitful troubleshooting discussions. We also thank Dr. Joop Gäken of King’s College London for freely sharing unpublished results and suggestions, and Qazi Hamid of Rx Biosciences for preparing a great batch of cDNA and his persistence in getting it cloned. Finally, we thank Nan Lin and Scott Bahr for performing RNA microarrays and data analysis.