Transcriptome analysis in maritime pine using laser capture microdissection and 454 pyrosequencing

11
© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected] Tree Physiology 34, 1278–1288 doi:10.1093/treephys/tpt113 Transcriptome analysis in maritime pine using laser capture microdissection and 454 pyrosequencing Rafael A. Cañas, Javier Canales, Josefa Gómez-Maldonado, Concepción Ávila and Francisco M. Cánovas 1 Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Instituto Andaluz de Biotecnología, Universidad de Málaga, Campus Universitario de Teatinos s/n, Málaga 29071, Spain; 1 Corresponding author ([email protected]) Received July 30, 2013; accepted November 19, 2013; published online January 3, 2014; handling Editor Ron Sederoff Maritime pine (Pinus pinaster Aiton) is one of the most advanced conifer models for genomics research. Conifer genomes are extremely large and major advances have recently been made in the characterization of transcriptomes. The combination of laser capture microdissection (LCM) and next-generation sequencing is a powerful tool with which to resolve the entire tran- scriptome of specific cell types and tissues. In the current work, we have developed a protocol for transcriptomic analyses of conifer tissue types using LCM and 454 pyrosequencing. Tissue sections were isolated using non-fixed flash-frozen samples processed by LCM. Complementary DNA synthesis and amplification from tiny amounts of total RNA from LCM samples was performed using an adapted protocol for Conifer RNA Amplification (CRA+). The cDNA amplification yield and cDNA quality provided by CRA+ were adequate for 454 pyrosequencing. Furthermore, read length and quality results of the 454 runs were near the optimal parameters considered by Roche for transcriptome sequencing. Using the CRA+ protocol, non-specific amplifications were prevented, problems derived from poly(A:T) tails in the 454 sequencing technology were reduced, and read length and read number considerably enhanced. This technical approach will facilitate global gene expression analysis in individual tissues of conifers and may also be applied to other plant species. Keywords: cDNA amplification, conifers, Pinus pinaster , trees. Introduction Laser capture microdissection (LCM) is a technique that allows the isolation of DNA, RNA, proteins (Decarlo et al. 2011 ) or metabolites (Hölscher and Schneider 2007) from specific cell types or tissues. Thanks to LCM, it is possible to study compart- mentalization of processes and to deepen our understanding of rare or specifically localized processes. The combination of LCM with high-throughput techniques is a very powerful tool for global gene expression analysis in plants and facilitates our understand- ing of complex processes localized in specialized plant cells. Trees are complex organisms with long life cycles, and many dif- ferent tissues, physiological processes and symbiotic interac- tions, which require the expression of determinate genes in a very specific form. The combination of LCM with high-throughput expression analysis are is an excellent tool to identify transcrip- tional networks and key regulatory genes particularly associated with unique tissues and cell types in woody plants. Next-generation sequencing (NGS) technology is one of the most important new techniques to obtain massive data (Mutz et al. 2013). Next-generation sequencing generates millions of DNA fragments in a single sequencing run. There are several approaches used in NGS technologies yielding different read length and read number per run, but the most often used plat- forms are Roche 454, Illumina-Solexa and SOLiD (Mutz et al. 2013). As the total RNA amount isolated from LCM samples is very small, it is necessary to amplify the cDNA to obtain enough for NGS. There are at least two major strategies for cDNA Technical note by guest on May 21, 2016 http://treephys.oxfordjournals.org/ Downloaded from

Transcript of Transcriptome analysis in maritime pine using laser capture microdissection and 454 pyrosequencing

© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]

Tree Physiology 34, 1278–1288doi:10.1093/treephys/tpt113

Transcriptome analysis in maritime pine using laser capture microdissection and 454 pyrosequencing

Rafael A. Cañas, Javier Canales, Josefa Gómez-Maldonado, Concepción Ávila and Francisco M. Cánovas1

Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Instituto Andaluz de Biotecnología, Universidad de Málaga, Campus Universitario de Teatinos s/n, Málaga 29071, Spain; 1Corresponding author ([email protected])

Received July 30, 2013; accepted November 19, 2013; published online January 3, 2014; handling Editor Ron Sederoff

Maritime pine (Pinus pinaster Aiton) is one of the most advanced conifer models for genomics research. Conifer genomes are extremely large and major advances have recently been made in the characterization of transcriptomes. The combination of laser capture microdissection (LCM) and next-generation sequencing is a powerful tool with which to resolve the entire tran-scriptome of specific cell types and tissues. In the current work, we have developed a protocol for transcriptomic analyses of conifer tissue types using LCM and 454 pyrosequencing. Tissue sections were isolated using non-fixed flash-frozen samples processed by LCM. Complementary DNA synthesis and amplification from tiny amounts of total RNA from LCM samples was performed using an adapted protocol for Conifer RNA Amplification (CRA+). The cDNA amplification yield and cDNA quality provided by CRA+ were adequate for 454 pyrosequencing. Furthermore, read length and quality results of the 454 runs were near the optimal parameters considered by Roche for transcriptome sequencing. Using the CRA+ protocol, non-specific amplifications were prevented, problems derived from poly(A:T) tails in the 454 sequencing technology were reduced, and read length and read number considerably enhanced. This technical approach will facilitate global gene expression analysis in individual tissues of conifers and may also be applied to other plant species.

Keywords: cDNA amplification, conifers, Pinus pinaster, trees.

Introduction

Laser capture microdissection (LCM) is a technique that allows the isolation of DNA, RNA, proteins (Decarlo et al. 2011) or metabolites (Hölscher and Schneider 2007) from specific cell types or tissues. Thanks to LCM, it is possible to study compart-mentalization of processes and to deepen our understanding of rare or specifically localized processes. The combination of LCM with high-throughput techniques is a very powerful tool for global gene expression analysis in plants and facilitates our understand-ing of complex processes localized in specialized plant cells. Trees are complex organisms with long life cycles, and many dif-ferent tissues, physiological processes and symbiotic interac-tions, which require the expression of determinate genes in a very specific form. The combination of LCM with high-throughput

expression analysis are is an excellent tool to identify transcrip-tional networks and key regulatory genes particularly associated with unique tissues and cell types in woody plants.

Next-generation sequencing (NGS) technology is one of the most important new techniques to obtain massive data (Mutz et al. 2013). Next-generation sequencing generates millions of DNA fragments in a single sequencing run. There are several approaches used in NGS technologies yielding different read length and read number per run, but the most often used plat-forms are Roche 454, Illumina-Solexa and SOLiD (Mutz et al. 2013).

As the total RNA amount isolated from LCM samples is very small, it is necessary to amplify the cDNA to obtain enough for NGS. There are at least two major strategies for cDNA

Technical note

by guest on May 21, 2016

http://treephys.oxfordjournals.org/D

ownloaded from

Tree Physiology Online at http://www.treephys.oxfordjournals.org

amplification: ‘exponential amplification’ based on polymerase chain reaction (PCR) and multiple rounds of isothermal ‘linear amplification’ based on in vitro transcription with T7 RNA poly-merase. Most reports that combine LCM with NGS have used ‘linear amplification’ methods (Emrich et al. 2007, Ohtsu et al. 2007, Scanlon et al. 2009, Li et al. 2010, Takacs et al. 2012, Thiel et al. 2012, Torti et al. 2012). The advantages and incon-veniences of the different methods have been previously dis-cussed (Kurimoto and Saitou 2010).

The amplification methods are based on oligo-dT adapters which maintain the poly(A:T) tails, and consequently generate a bias of sequence alignments toward the 3′ end of the tran-scripts (Emrich et al. 2007, Li et al. 2010, Takacs et al. 2012). Poly(A:T) tails generate serious problems in Roche 454 sequencing, diminishing the number of acceptable reads (Emrich et al. 2007, Ohtsu et al. 2007).

In the Roche 454 (pyrosequencing) system, the light pro-duced by the firefly luciferase during the release of pyrophos-phate in the incorporation of a nucleotide by DNA polymerase is used for sequencing (Mardis 2008). Roche 454 cannot properly interpret long stretches (>6 bp), such as polyA tails, leading to errors in the read sequences (Mardis 2008). The Illumina system utilizes a sequencing-by-synthesis approach that incorporates fluorescent nucleotides into template strands. The Roche 454 GS FLX+ System is able to yield read up to 1000 bp against the 150 bp of Illumina HiSeq 2500, although Roche 454 yields up to 1 million reads per run and the Illumina HiSeq 2500 up to 3 billion reads per run.

Our laboratory is interested in molecular and functional stud-ies of maritime pine (Pinus pinaster Aiton), a coniferous tree. Advances in the genomics of conifers have been hampered by the large size of their genomes. Conifer genomes are extremely large in size ranging from 20 to 40 Gb, which is on average more than 200-fold the Arabidopsis genome and nearly seven-fold the human genome (Mackay et al. 2012, Ritland 2012). As the conifer genomes are extremely large, major research efforts have been concentrated in the characterization of transcrip-tomes (Fernández-Pozo et al. 2011, Rigault et al. 2011, Chen et al. 2012). Recently, massive parallel sequencing data have been used for de novo assembling of the maritime pine tran-scriptome (Canales et al. 2013). Several initiatives have been launched to sequence the pine genome and the genome drafts of Picea glauca (Moench) Voss and Picea abies (L.) Karst have been recently published (Birol et al. 2013, Nystedt et al. 2013).

In the current work, we have developed an improved protocol for transcriptomic analyses of conifer tissue types isolated by LCM. The tissue sections were isolated using non-fixed flash-frozen samples processed by LCM. The amount of total RNA from LCM samples was extremely limited, but we were able to perform cDNA synthesis and amplification from tiny amounts of total RNA, modifying the conditions of a commercial kit from Evrogen (Moscow, Russia). This protocol is based on a 5′

RACE-PCR designed to synthesize full-length-enriched double-stranded (ds) cDNA from total or poly(A)-enriched RNA, and it has been adapted to the main NSG platforms including Roche 454 sequencing (Babik et al. 2010, Kumar and Blaxter 2010, Xiong et al. 2011, Ekblom et al. 2012). The cDNA amplification yield and quality were optimal for 454 pyrosequencing. Using our improved protocol, the read length and quality results of the Roche 454 runs were near the optimal parameters considered by Roche for transcriptome sequencing. This technical approach will facilitate global gene expression analysis in individual tissues of conifers and may also be applied to other plant species.

Materials and methods

Laser capture microdissection of pine tissue types

Pinus pinaster seeds from the Oria provenance were germi-nated and grown at 20/24 °C with a 16/8 h photoperiod. Germinating seeds were watered twice a week with distilled water. One-month-old seedlings were sampled and 0.5-cm tis-sue sections were processed for LCM. Tissue sections were treated for paraffin embedding or cryosectioning.

The paraffin-embedded samples were fixed in ethanol : acetic acid (75 : 25 v/v) with vacuum pulses. The samples were then sequentially treated with 75% ethanol for 1 h, 100% ethanol for 1 h, ethanol : Histolemon (1:1 v/v) for 1 h and pure Histolemon (Carlo Erba, Val-de-Reuil, France) for 1 h. Five to six pearls of Paraplast X-tra (Leica, Wetzlar, Germany) were added to the Histolemon and incubated for 1 h at room temperature. Later on, an equal volume of molten Paraplast X-tra was added to the samples at 58 °C and incubated for 2 h. Finally, the Histolemon : Paraplast X-tra mix was replaced by pure liquid Paraplast X-tra at 58 °C. The liquid Paraplast X-tra was replaced four times before forming the blocks over the course of 1 day. The embedded samples were stored at 4 °C before sectioning. The samples were cut with a rotary microtome and the sections (10 µm thick) were mounted on polyethylene terephthalate (PET)-membrane 1.4 µm steel frames (Leica) and dried at 37 °C for 1–2 h. Dry slides were deparaffinized twice for 5 min each in Histolemon. Subsequently, the samples were incubated for 5 min in 100% ethanol and air dried. The microdissection was made using a laser microdissector (LMD700, Leica). The microdissection samples were placed into the caps of 0.5 ml tubes containing 10 µl lysis buffer from the RNAqueous-Micro RNA Isolation Kit (Ambion, Austin, TX, USA). Samples used after were stored at −80 °C. All the RNA extractions from the microdissection of par-affin-embedded samples were made using the LCM protocol of the RNAqueous-Micro RNA Isolation Kit (Ambion). RNA quality was assessed using the RNA Pico Assay for the 2100 Bioanalyzer (Agilent, Santa Clara, CA, USA).

The tissue sections for cryosectioning were mounted in a specimen holder with embedding medium optimal cutting temperature (OCT) (Tissue-Tek, Torrance, CA, USA) and

LCM of pine tissues and transcriptome sequencing 1279

by guest on May 21, 2016

http://treephys.oxfordjournals.org/D

ownloaded from

Tree Physiology Volume 34, 2014

snap-frozen in liquid nitrogen for cryostat sectioning. The fro-zen samples were stored at −80 °C until use. One day before the cryostat sectioning, the samples were tempered at −20 °C. Twenty-micrometer-thick sections were made using a cryostat (HM 525, Thermo, USA) at −20 °C and mounted on PET-membrane 1.4-µm steel frames (Leica) with the help of a plexiglass FrameSupport (Leica). The steel frames containing the samples were used immediately or stored at −80 °C until use. Before the microdissection, the samples were fixed in cold 100% ethanol for 10 s, the OCT was removed in diethyl-pyrocarbonate-treated water for 2 min and refixed in 100% ethanol for 1 min. Subsequently, the samples were air dried and used for microdissection. The microdissection was car-ried out in a laser microdissector (LMD700, Leica) as before. The microdissection samples were placed into the caps of 0.5 ml tubes containing 10 µl lysis buffer from the RNAqueous-Micro RNA Isolation Kit (Ambion). Before RNA extraction, all the samples were placed at −80 °C. RNA extractions from the microdissection samples and RNA quality assessment were as described above. The RNA samples with an RNA integrity number (RIN) >7 were used for the subsequent mRNA ampli-fication and 454 pyrosequencing.

The LCM protocol was routinely used to separate a range of tissues types from roots, hypocotyls and cotyledons in 1-month-old P. pinaster seedlings.

Complementary DNA amplification

In a first group of samples, the cDNA amplification was made using the Mint-2 cDNA synthesis kit (Evrogen) protocol (Evrogen Mint2 cDNA synthesis manual, http://www.evrogen.com/kit-user-manuals/Mint-2.pdf, 17 December 2013, date last accessed) for Roche 454 sequencing (Conifer RNA Amplification, CRA protocol). In the rest of samples, the cDNA amplification was made using the modified CRA+ protocol.

In the CRA+ cDNA synthesis protocol, the first-strand cDNA synthesis was carried out following the indications of Pinto and Lindblad (2010). For the first-strand cDNA synthesis, 2.5–10 ng of total RNA were added to a mix with 1 µl dNTP (10 mM each) and 1 µl polTdeg primer (10 µM) in a final volume of 7 µl. The mixture was incubated in a thermal cycler at 65 °C for 5 min (using a heated lid). After this incubation, the tem-perature was decreased to 50 °C keeping the tubes in the thermal cycler at 50 °C while the RT master mix was prepared (between 1 and 3 min). The RT master mix for each reaction tube included 2 µl of 5× first-strand buffer, 0.5 µl MgCl2 (50 mM) and 0.5 µl Mint RT (Evrogen) in a final volume of 3 µl. The 3 µl of RT master mix was added to each reaction tube with the RNA. The reaction mix was gently pipetted. The reaction tubes were maintained in the thermal cycler except for the time necessary to add the RT master mix. Incubation pro-ceeded at 50 °C for 60 min. After this time 10 µl of a new RT mix was added to each tube. This RT mix included 2 µl of 5×

first-strand buffer, 5.1 µl nuclease-free water, 0.5 µl MgCl2 (50 mM), 0.6 µl MnCl2 (100 mM), 1.3 µl PlugOligo-Adapter (15 µM) and 0.5 µl Mint RT (Evrogen) in a final volume of 10 µl. The 10 µl mixture was added to each tube and mixed by gently pipetting. The reaction tubes were maintained in the thermal cycler except for the time necessary to add the RT master mix. The tubes were incubated at 42 °C for 1.5 h.

The synthesized single-stranded (ss) cDNA was purified with the NucleoSpin® Gel and PCR Clean-up kit (Macheray-Nagel, Düren, Germany) according to the manufacturer’s man-ual. The ss cDNA was eluted from the columns with 25 µl of preheated water at 70 °C. The column stood for 2 min with the caps open and subsequently the tubes were closed and centri-fuged at 14,000 rpm for 1 min to elute the sample. This opera-tion was repeated once again.

To establish the optimal PCR conditions for the ds cDNA syn-thesis and amplification four PCR of 25, 28, 31 and 34 cycles were set. Each PCR tube contained 11.5 µl of purified ss cDNA, 4 µl M1 primer (10 µM) and 12.5 µl iProof™ HF Master Mix (BioRad, Hercules, CA, USA) in a total volume of 25 µl. The reaction parameters were a denaturing step at 95 °C for 1 min, followed by 25, 28, 31 or 34 cycles of 95 °C for 15 s, 65 °C for 20 s and 72 °C for 3 min. Ten microliters of each reaction were run in a 1.2% agarose gel to verify the PCR conditions in order to carry out the subsequent amplification. The selected PCRs had 25 or 28 cycles. The PCRs with the proper appearance were diluted 10 times in nuclease-free water and used for the amplification.

The ds cDNA amplification was carried out by nested PCR. Five reactions of 50 µl were prepared including the following components: 22 µl nuclease-free water, 25 µl iProof™ HF Master Mix, 2 µl 454-PCR primer mix and 1 µl diluted (1 : 10) ds cDNA from the first PCR step in a final volume of 50 µl. The reaction parameters were a denaturing step at 95 °C for 1 min, followed by three cycles of 95 °C for 15 s, 50 °C for 20 s and 72 °C for 3 min plus 11 cycles of 95 °C for 15 s, 63 °C for 20 s and 72 °C for 3 min. Finally, the five reactions were pooled and purified with the NucleoSpin® Gel and PCR Clean-up kit (Macheray-Nagel) according to the manufacturer’s manual. The ds cDNA was eluted from the columns with 25 µl of preheated water at 70 °C. The column stood for 2 min with the caps open and, subsequently, the tubes were closed and centrifuged at 14,000 rpm for 1 min to elute the sample. This operation was repeated once more.

The quantity of the amplified ds cDNA was determined using the Quant-iT™ PicoGreen® dsDNA Kit (Invitrogen, Paisley, UK). The quality of the amplified ds cDNA was determined using the Agilent 7500 DNA Kit in the 2100 Bioanalyzer (Agilent).

454 pyrosequencing

Transcriptome sequencing was performed at the Universidad de Málaga ultrasequencing facility using the GS-FLX+ platform with

1280 Cañas et al.

by guest on May 21, 2016

http://treephys.oxfordjournals.org/D

ownloaded from

Tree Physiology Online at http://www.treephys.oxfordjournals.org

a GS-FLX Titanium kit, Roche Applied Sciences (Indianapolis, IN, USA). Each sample was run in one-half of a 454 PicoTiterPlate following the manufacturer’s sequencing protocol.

The quantity of the cDNA libraries was determined using the Quant-iT™ PicoGreen® dsDNA Kit (Invitrogen). The quality of the cDNA libraries was determined using the Agilent High Sensitivity DNA Kit in the 2100 Bioanalyzer (Agilent).

The runs were analysed using the Roche GS-FLX+ software. The resulting reads were trimmed and analysed using the SeqTrimNext software (Falgueras et al. 2010).

Quantitative PCR

The expression analyses were carried out by quantitative PCR (qPCR) in LCM samples. Total RNA (2.5 ng) was used to syn-thesize cDNA using the iScript™ Reverse Transcription Supermix for RT-qPCR (BioRad), which is able to work with a broad linear dynamic range of total RNA input (1 µg–1 pg). Following the instruction manual, the reverse transcription con-ditions were: 5 min at 25 °C, 30 min at 42 °C and finally 5 min at 85 °C. The reverse transcription was carried out in a thermal cycler DNA Engine® (BioRad).

The primers used for the qPCRs were as follows:Actin-Forward (5′ATCTCTCAGCACATTCCAACAG3′)Actin-Reverse (5′GCTTGAGATGATGGTGGCAATA3′)EF1A-Forward (5′TGCTGTTGGAGTCATCAAGG3′)EF1A-Reverse (5′CTCGTGCATCAGAATCAGACA3′)GS1a-Forward (5′ATCGAGGAGCTTCAGTTAGAGTGG3′)GS1a-Reverse (5′TGGTCGTCTCAGCAATCATAGAAGT3′)GS1b-Forward (5′CAGCTTCCAACATGGACCCCTAT3′)GS1b-Reverse (5′TGGGGATCCCTTCGCAATTTGAC3′)AS1-Forward (5′TACACAGAAATGGTTGGG3′)AS1-Reverse (5′CCCATCCAAATAAAACCG3′)ASPG-Forward (5′AGTAATGGAGACGTTGCTGC3′)ASPG-Reverse (5′TCAAACAAGCACAGAGGGTG3′)The qPCR was carried out using SsoFast™ EvaGreen®

Supermix (BioRad). The reaction mix was prepared as follows: 5 µl of 2× SsoFast™ EvaGreen® Supermix, 0.5 pmol of Forward primer, 0.5 pmol of Reverse primer and 1 µl of the diluted one-half cDNA (125 pg from the original total RNA sample). The qPCR were carried out in a thermal cycler CFX384 (BioRad) under the following conditions: 3 min at 95 °C (1 cycle), 1 s at 95 °C and 5 s at 60 °C (50 cycles) and a melting curve from 60 to 95 °C.

The raw fluorescence data from each reaction were fitted to the MAK2 model, which requires no assumptions about the amplification efficiency of a qPCR assay (Boggy and Woolf 2010). The initial target concentrations (D0 parameter) for each gene were deduced from the MAK2 model using the qpcR package for the R environment (Ritz and Spiess 2008) and normalized to the geometric mean of two reference genes (actin and EF1A). For the qPCR analysis, three biological repli-cates and three technical replicates per sample were made.

Read alignment and read count

The read obtained in the 454 sequencing were mapping against the SustainpineDB contigs as reference (Canales et al. 2013, http://www.scbi.uma.es/sustainpinedb/home_page) using the maximal exact matches (MEM) option of the Burrows–Wheeler Aligner (BWA) software for long reads (Li and Durbin 2010; H. Li and R. Durbin, unpublished). The scripts used for BWA-MEM analysis were as follows:

bwa index -p sustainpine_v3.0_unigenes -a bwtsw sustain-pine_v3.0_unigenes.fastabwa mem sustainpine_v3.0_unigenes sample.fastq > sam-ple.sam

The number of reads for each gene was counted using the Tablet assembly visualization software (Milne et al. 2013). The contigs considered for counting the reads of the genes were the following: for actin sp_v3.0_unigene17257 and sp_v3.0_unigene18113; for EF1A sp_v3.0_unigene1012; for GS1a sp_v3.0_unigene11788 and sp_v3.0_unigene10612; for GS1b sp_v3.0_unigene10503; for AS1 sp_v3.0_unigene14147; and for ASPG sp_v3.0_unigene4029.

Results and discussion

Setting up a working protocol for LCM in pine

We determined the most appropriate method to preserve RNA integrity for LCM. With regard to conifers, a single report is found in the literature describing the isolation of tissue- specific RNA by LCM from Picea glauca (Abbott et al. 2010). We initially started to assay LCM from paraffin-embedded samples (32 RNA extrac-tions) and also from flash-frozen samples embedded in OCT medium for cryostats (>50 RNA extractions) (Figure 1a and b). The total RNA obtained from paraffin- embedded sections had in all cases a suboptimal quality for 454 pyrosequencing (Figure 1c) with an RIN <7, and even in many cases <5, which is consid-ered to be the minimum RIN value to quantify transcript levels using real-time PCR (qPCR) (Fleige and Pfaffl 2006, Abbott et al. 2010). In contrast, the RIN values observed in the flash-frozen samples were always >5, and in most cases >7 (Figure 1d). Although the paraffin- embedded samples were fixed with etha-nol : acetic acid (75 : 25 v/v), they displayed problems by RNase re-activation after microtome cutting and/or in the time exposed to the air during the LCM process. This could be related to the ambient humidity (60–70% in our laboratory), as was previously observed for paraffin-embedded samples (Ordway et al. 2010). In contrast, the flash-frozen samples were not affected by the ambient humidity although the tissue sections were only fixed in absolute ethanol after cryostat cutting. In both treatments, the amount of isolated total RNA depended primarily on the number of sections used. However, the amount of total RNA, ranging between 15 and 60 ng, was insufficient for 454 pyrosequencing. Owing to low RNA yield, it was necessary to synthesize and amplify cDNA from all tissue-type samples. According to these

LCM of pine tissues and transcriptome sequencing 1281

by guest on May 21, 2016

http://treephys.oxfordjournals.org/D

ownloaded from

Tree Physiology Volume 34, 2014

1282 Cañas et al.

Figure 1. Laser capture microdissection and RNA isolation from maritime pine tissues. Images of LCM in root tips (a and b). In (a), the microdis-section areas are selected by lines. (b) The microdissection areas are cut. Black scale bars in the pictures represent 400 µm. Analysis of isolated RNA from LCM samples (c and d). Virtual electropherograms from Agilent RNA 6000 Pico chip runs performed in an Agilent’s 2100 Bioanalyzer. (c) RNA sample isolated from LCM samples using the paraffin-embedded protocol. (d) Total RNA was isolated from LCM samples using the cryo-sectioning protocol. The RIN is shown in the image.

by guest on May 21, 2016

http://treephys.oxfordjournals.org/D

ownloaded from

Tree Physiology Online at http://www.treephys.oxfordjournals.org

results, we decided to select the flash- freezing protocol as the best source for LCM, in close agreement with the data previously reported in Picea glauca (Abbott et al. 2010).

Complementary DNA synthesis and amplification: evaluation of pre-existing and improved protocols

The synthesis and amplification of cDNA from small quantities of total RNA or poly(A)+ RNA is the usual method in molecular biol-ogy and the lower limits of input RNA depend on the method chosen but can be just a few picograms (Kurimoto and Saitou 2010). To amplify pine RNA isolated from LCM, we decided to use the Mint2 kit from Evrogen because it is a 5′-RACE-PCR-based method able to amplify full-length mRNA from total RNA and poly(A)+ RNA (Matz et al. 1999, Schmidt and Mueller 1999). The main reason for using this method in our laboratory is that it is especially well adapted for Roche 454 sequencing as we previ-ously confirmed with conventional RNA samples (see Table S1 available as Supplementary Data at Tree Physiology Online). This method is able to reduce and modify the poly(A:T) tails favouring the performance of 454 sequencing. This protocol for conifer RNA amplification and 454 sequencing was named CRA.

In the current work, the CRA protocol was adapted for the amplification of total RNA samples isolated by LCM from tissue types of maritime pine, and the improved protocol was named CRA+. In the present work we show data from the following sam-ples: cotyledon parenchyma (CP), hypocotyl pith (HM.1 and HM.2) and root vascular bundles (RV). The total RNA yield of our samples was very low, ranging from 0.5 to 2 ng/µl. In the CRA+ protocol, the volume of total RNA used for the amplification was 5 µl corresponding to 2.5–10 ng of total RNA, which is much lower than the minimal total RNA amount accepted by the CRA original protocol (250 ng) (Evrogen Mint2 cDNA synthesis man-ual, http://www.evrogen.com/kit-user-manuals/Mint-2.pdf, 17 December 2013, date last accessed). In the CRA protocol, the ss cDNA synthesis was developed at 42 °C and in the CRA+ at 50 °C, and we divided the first-strand synthesis and the PlugOligo-Adapter incorporation into two steps, following the rec-ommendations by Pinto and Lindblad (2010). Although the yield of ds cDNA synthesis from ss cDNA in the samples at 42 °C was higher (Figure 2, CRA lanes) than in the samples incubated at 50 °C (Figure 2, CRA+ lanes), the increased temperature in the first-strand cDNA synthesis reduced the problems related to the non-specific ss cDNA synthesis (Table 2).

Another parameter modified in the CRA protocol was the number of PCR cycles used for ds cDNA synthesis. The recom-mended number of cycles (from 15 to 24) in the CRA protocol was insufficient to obtain a visible amplification. We found that the optimal number of cycles to amplify our samples was between 25 and 34 cycles, and 25 or 28 cycles were usually selected in CRA+ (Figure 2). However, the samples incubated at 50 °C during the first-strand synthesis had very low yields in this step even if we included in the PCR mix all the ss cDNA gener-ated in the previous step. This is a critical point because it can result in the need to begin the protocol again due to poor ss cDNA or ds cDNA synthesis. DNA polymerase was also changed in CRA+ to improve the protocol. The high-fidelity iProof™ HF enzyme supplied by BioRad is faster (2 versus 1 kb min−1) and able to amplify longer sequences (37 versus 15 kb) than the Encyclo polymerase mix. The characteristics of the iProof™ HF allow the protocol to diminish the non-specific amplifications in reactions with a supraoptimal number of cycles (Figure 2). As in regular PCR amplifications, the products with a smaller length are promoted against the higher-length products in the course of the reaction.

LCM of pine tissues and transcriptome sequencing 1283

Table 1. Summary of the sample 454 sequencing.

Sample cDNA protocol Library length average (bp) Reads number Read length average (bp) Median read length (bp) Total bases

HM.1 CRA 1355 141,862 300 308 42,552,106RV CRA 1540 448,120 521 546 233,457,069Mean CRA 1448 294,991 411 427 138,004,588HM.2 CRA + 1083 631,059 632 712 399,055,868CP CRA + 1274 776,974 653 737 507,257,184Mean CRA + 1179 704,017 643 725 453,156,526

Figure 2. Double-stranded cDNA first-round amplification from total RNA of LCM tissue samples. Double-stranded cDNA amplifications per-formed with the original CRA and modified CRA+ protocols. Lane num-bers 25, 28, 31 and 34 indicate the PCR cycle numbers. Ten microliters of PCR product were loaded per lane. Lane L corresponds to 0.5 µg of a 1 kb ladder (GeneRuler™ 1 kb Plus, Fermentas). DNA electrophore-sis was carried out in a 1.2% agarose/EtBr gel run in 1× TAE buffer.

by guest on May 21, 2016

http://treephys.oxfordjournals.org/D

ownloaded from

Tree Physiology Volume 34, 2014

The amplification of cDNA was performed in the CRA+ pro-tocol after the ds cDNA synthesis without the purification per-formed in CRA. This step was omitted to avoid the loss of cDNA in the samples. In both cases, the PCRs were run with a precycling step of three cycles and a second cycling step of 11 cycles. The number of cycles of this second cycling can be modified through a test PCR as carried out for the synthesis of ds cDNA with a number of optimal cycles from 11 to 17. The cDNA amplification was performed from a 250 µl PCR mix divided into five tubes and the products were purified together for subsequent use. Depending on the obtained yield, it is pos-sible to increase the volume of PCR mix to reach the appropriate cDNA amount. The final cDNA yield was > 1 µg.

The quality of the amplified cDNA with CRA and CRA+ pro-tocols was examined and the observed profiles were quite dif-ferent (Figure 3). Samples amplified with the CRA protocol showed DNA smearing with a maximal intensity < 1000 bp and very intense discrete bands (Figure 3a, HM.1, RV). In con-trast, the samples amplified with the CRA+ protocol showed a homogeneous DNA smearing with a maximal intensity ~1400 bp (Figure 3a, HM.2, CP). The isolated peaks observed in the CRA protocol samples suggest that the 42 °C ss cDNA synthesis promotes the non-specific reverse transcription of abundant RNAs (Figure 3b), while the ss cDNA synthesis at 50 °C is much more restrictive and lineal (Figure 3c). All these differences are important to determine whether the cDNA syn-thesis and cDNA amplification were appropriate for GS-FLX+ sequencing.

GS-FLX+ sequencing

In the 454 sequencing, the conventional Roche protocols were followed without changes. Sequencing runs were carried out with samples generated with CRA and CRA+ protocols. The ds

1284 Cañas et al.

Figure 3. Double-stranded cDNA second-round amplification from total RNA of LCM tissue samples. Virtual gel images and electropherograms from Agilent 7500 DNA chip runs in an Agilent 2100 Bioanalyzer. Gel image summary of the ds cDNA products of the second-round PCR ampli-fication (a). Lanes HM.1 and RV correspond to ds cDNA amplifications carried out with the CRA protocol; lanes HM.2 and CP correspond to the same samples amplified with the CRA+ protocol. Electropherogram sum-mary of the samples amplified with the CRA protocol (b). Electropherogram summary of the samples amplified with the CRA+ protocol (c).

Table 2. Quality analysis of the 454 sequencing by SeqTrimNext.

HM.1 RV HM.2 CP

cDNA protocol CRA CRA CRA + CRA +

Input sequences 14,1112 44,8120 63,1059 776,974Output sequences 3599 3616 459,228 591,958Rejected sequences 137,200 444,235 135,809 132,935Low complexity sequences

313 269 36,022 52,081

Mean input reads 694 800 916 962Mean output reads 163 272 368 372Low quality, % 1.83 0.33 22.06 23.38Contaminants, % 0.88 0.58 4.45 3.22Adapters, % 3.94 1.89 3.56 3.15Indeterminations, % 2.08 0.75 1.16 1.09Poly(A:T), % 0.03 0.01 0.89 0.80Low complexity, % 0.04 0.01 0.52 0.64Nucleotide useful, % 0.65 0.30 31.55 32.04Contaminant, % 1.54 0.86 6.93 4.75Low complexity, % 0.01 0.00 2.47 3.02Short inserts, % 76.81 51.17 0.49 0.43Repeated sequences, % 6.20 43.00 11.51 8.79Indeterminations, % 0.21 0.04 0.03 0.04Empty inserts, % 1.08 0.02 0.02 0.02No valid inserts, % 11.38 4.05 0.06 0.04Rejected sequences, % 97.23 99.13 21.52 17.11

by guest on May 21, 2016

http://treephys.oxfordjournals.org/D

ownloaded from

Tree Physiology Online at http://www.treephys.oxfordjournals.org

cDNA samples were nebulized before the generation of the 454 FLX+ libraries due to the length of the amplified cDNAs (Figure 3). All the constructed libraries had a similar appearance independent of whether the cDNA preparations were obtained using CRA or CRA+ protocols (Figure 4). In all the cases the libraries showed a wide band with a maximal peak between 1083 and 1540 bp (Table 1). Consequently, the lengths and profiles observed in the libraries were not useful parameters to determine whether the samples were suitable or not for 454 sequencing. We would recommend therefore to carefully exam-ine the quality of the amplified cDNA rather than the profile of the library prior to 454 sequencing.

Table 1 shows the raw data of 454 sequencing runs from two LCM samples amplified with CRA (HM.1 and RV) and CRA+ (HM.2 and CP) protocols. Table 1 also shows the accu-mulative data obtained using both protocols. Each library was sequenced in one-half of a 454 PicoTiterPlate. The average number of reads was much higher in the sequencing of CRA+ libraries (704,017) compared with CRA libraries (294,991). The same was also true for other parameters such as read-length average and total number of sequenced bases. Thus, the read-length average of CRA+ libraries was 643 bp, whereas a much lower value was observed in CRA libraries (411 bp). The total number of sequenced bases using the CRA+ protocol was 453,156,526, whereas this value was significantly reduced when using the CRA protocol. Taken together, all these sequencing data clearly indicate that the number of reads, the mean length and the median length of reads were higher in the

samples amplified using the CRA+ protocol, and subsequently also the total number of sequenced bases (Table 1). The observed values of the above-mentioned parameters using CRA+ were in the range of the optimal values recommended for 454 sequencing. This suggests that the CRA+ protocol pro-motes the linear synthesis of ss cDNA, avoiding the preferential synthesis of small abundant and non-specific sequences, improving the cDNA quality that is essential for accurate and high-quality 454 sequencing.

This conclusion is further supported by the frequency analysis of the distribution profiles of read lengths resulting from GS-FLX Titanium pyrosequencing (Figure 5). In the CRA libraries, the number of reads was not distributed homogenously throughout the profile, with a large number of single peaks (Figure 5a). In the CRA+ libraries the number of reads was distributed homog-enously through the profiles without single peaks of read accu-mulation. Figure 5b clearly shows that the frequency of distribution of read lengths in the CRA+ libraries is close to a normal distribution, between 600 and 1000 bp lengths, with a long tail in the low read lengths, between 1 and 600 bp. These profiles are characteristic of the optimal 454 sequencing.

Quality of the 454 sequencing

Table 2 shows the quality analyses of 454 sequencing runs performed with two LCM samples amplified using CRA and CRA+ protocols. For these analyses, the trimming software SeqTrimNext (Falgueras et al. 2010) was used. The percentage of rejected sequences was higher in the CRA (98.18%) than in

LCM of pine tissues and transcriptome sequencing 1285

Figure 4. Electropherograms of the 454 libraries synthesized from the amplified ds cDNA. Virtual gel images and electropherogram from Agilent High Sensitivity DNA chip runs in an Agilent 2100 Bioanalyzer. Libraries constructed with ds cDNA amplified using the CRA protocol (CRA panel). Libraries constructed with ds cDNA amplified using the CRA+ protocol (CRA+ panel). Library length averages of the samples are indicated in parentheses.

by guest on May 21, 2016

http://treephys.oxfordjournals.org/D

ownloaded from

Tree Physiology Volume 34, 2014

the CRA+ samples (19.32%). Consequently, the percentages of useful nucleotides were much lower in the CRA (0.48%) than in the CRA+ (31.80%) samples. The mean length of the input reads in the CRA samples (747 bp) was lower than that in the CRA+ samples (939 bp). Consequently, the mean lengths of the output sequences were much lower in CRA than in CRA+ sam-ples. These data show that the modifications introduced to the CRA+ protocol improved the quality of cDNA for subsequent 454 pyrosequencing. When we analyse the quality indexes of the reads, we observe that the CRA protocol was unable to avoid reverse transcription of non-specific RNA sequences (Table 2). The percentage level of no valid inserts, short inserts and repeated reads were usually lower in the CRA+ samples. Likely, this problem is due to the low amount of input mRNA that promotes the non-specific reverse transcription of abundant RNA as observed in the CRA samples. Non-specific amplifica-tion was prevented in the CRA+ samples by increasing the incu-bation temperature in the synthesis of first-strand cDNA.

The percentage of the poly(A:T) nucleotide regions elimi-nated was very low in all the sequenced samples (Table 2).

This shows that the RNA amplification protocols used here are able to effectively modify the poly(A:T) tails. This result supports the choice of the current protocol to process total RNA samples from LCM for 454 sequencing, avoiding the problems found in other studies that combine LCM and 454 sequencing (Emrich et al. 2007, Ohtsu et al. 2007). Another important problem of cDNA synthesis and amplification kits based on poly(T) primers/adapters is the bias in the 3′ end transcript sequences. In our case we observed this tendency, with sequence alignments towards the 3′ end of the transcripts in all the samples, no matter which protocol was employed (R. Cañas, L. Sterck and F.M. Cánovas, unpublished data), as has previously been described (Emrich et al. 2007, Babik et al. 2010, Li et al. 2010, Takacs et al. 2012).

Validation of 454 sequencing results by qPCR

With the aim of confirming the robustness of the 454 sequenc-ing data obtained from the CRA+ samples, we performed qPCR analysis to compare the expression of different known genes. We analysed the expression of four genes that have been pre-viously studied in our laboratory (glutamine synthetase 1a, GS1a; glutamine synthetase 1b, GS1b; asparagine synthetase 1, AS1; asparaginase, ASPG) and two reference genes regularly

1286 Cañas et al.

Figure 5. Frequency distribution of 454 sequencing read lengths. Graphics correspond to runs with RNA samples amplified using the CRA (a) and CRA+ (b) protocols.

Figure 6. Validation of selected candidate genes using qPCR. (a) qPCR analysis of LCM samples (CP, HM and RV) for the genes: actin, EF1A, GS1a, GS1b, AS1 and ASPG. The results are the mean of three biological replicates and the standard deviations are shown. (b) Normalized read counts of the genes assayed by qPCR in the 454 sequenced samples (CP, HM.2, RV and HM.1). The read counts for each gene are divided by the total read count of the libraries.

by guest on May 21, 2016

http://treephys.oxfordjournals.org/D

ownloaded from

Tree Physiology Online at http://www.treephys.oxfordjournals.org

used in our laboratory (actin; elongation factor 1α, EF1A). Complementary DNA was synthesized and qPCR data (Figure 6a) were compared with the normalized number of reads for each gene in the 454 sequencing (Figure 6b). The qPCR anal-ysis supports that the CRA+ is able to avoid gene bias in the cDNA amplification despite the high number of PCR cycles used in the protocol. Considering the number of useful reads in the 454 sequencing (HM.2, 459,228 reads; CP, 591,958 reads), qPCR- and CRA+-derived data showed high similarity. The observed differences could be explained by the lack of biological replicates in the 454 sequencing. The analysis is much more difficult in the CRA samples with a low number of useful reads (HM.1, 3599 reads; RV, 3616 reads). The reads for each gene were normalized, dividing by the total number of reads in the sequencing run. Although the normalization was able to recuperate some data for the expression analysis, the low number of reads in the CRA samples indicates that the CRA protocol is not suitable for expression analyses.

Conclusions

In the current work, we have successfully adapted pre-existing protocols for LCM and 454 transcriptome sequencing in the conifer tree P. pinaster. We have been able to obtain RNA prepa-rations with good quality indeces from isolated tissue sections by LCM employing snap-frozen samples. Using an improved protocol for CRA+ , we were able to obtain enough high-quality cDNA for 454 sequencing. The 454 runs generated quality parameters very similar to those obtained in runs performed with conventional RNA samples. Using our CRA+ protocol, we mainly avoid non-specific amplifications, reduction of read length and read number, and finally problems derived from poly(A:T) tails in 454 pyrosequencing.

Supplementary data

Supplementary data for this article are available at Tree Physiology Online.

Acknowledgments

We thank Prof. Jose María Pomares for the use of the cryostat HM 525. We thank Zhen Li from VIB (Ghent, Belgium) for the analysis of the bias in the 3′ end transcript sequences.

Conflict of interest

None declared.

Funding

This work was supported by the European Commission Seventh Framework grant PROCOGEN (FP7-KBBE-2011-5).

References

Abbott E, Hall D, Hamberger B, Bohlmann J (2010) Laser microdissec-tion of conifer stem tissues: isolation and analysis of high quality RNA, terpene synthase enzyme activity and terpenoid metabolites from resin ducts and cambial zone tissue of white spruce (Picea glauca). BMC Plant Biol 10:106.

Babik W, Stuglik M, Qi W et al. (2010) Heart transcriptome of the bank vole (Myodes glareolus): towards understanding the evolutionary variation in metabolic rate. BMC Genomics 11:390.

Birol I, Raymond A, Jackman SD et al. (2013) Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data. Bioinformatics 29:1492–1497.

Boggy GJ, Woolf PJ (2010) A mechanistic model of PCR for accurate quantification of quantitative PCR data. PLoS One 5:e12355.

Canales J, Bautista R, Label P et al. (2013) De novo assembly of mari-time pine transcriptome: implications for forest breeding and bio-technology. Plant Biotechnol J (in press) doi:10.1111/pbi.12136

Chen J, Uebbing S, Gyllenstrand N et al. (2012) Sequencing of the needle transcriptome from Norway spruce (Picea abies Karst L.) reveals lower substitution rates, but similar selective constraints in gymnosperms and angiosperms. BMC Genomics 13:589.

Decarlo K, Emley A, Dadzie OE, Mahalingam M (2011) Laser capture microdissection: methods and applications. Methods Mol Biol 755:1–15.

Ekblom R, Slate J, Horsburgh GJ et al. (2012) Comparison between normalised and unnormalised 454-sequencing libraries for small-scale RNA-seq studies. Comp Funct Genomics 2012:281693.

Emrich SJ, Barbazuk WB, Li L, Schnable PS (2007) Gene discovery and annotation using LCM-454 transcriptome sequencing. Genome Res 69–73.

Evrogen Mint-2 cDNA synthesis kit manual. http://www.evrogen.com/kit-user-manuals/Mint-2.pdf (17 December 2013, date last accessed).

Falgueras J, Lara AJ, Fernández-Pozo N et al. (2010) SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read. BMC Bioinformatics 11:38.

Fernández-Pozo N, Canales J, Guerrero-Fernández D et al. (2011) EuroPineDB: a high-coverage web database for maritime pine tran-scriptome. BMC Genomics 12:366.

Fleige S, Pfaffl MW (2006) RNA integrity and the effect on the real-time qRT-PCR performance. Mol Asp Med 27:126–139.

Hölscher D, Schneider B (2007) Laser microdissection and cryogenic nuclear magnetic resonance spectroscopy: an alliance for cell type-specific metabolite profiling. Planta 225:763–770.

Kumar S, Blaxter ML (2010) Comparing de novo assemblers for 454 transcriptome data. BMC Genomics 11:571.

Kurimoto K, Saitou M (2010) Single-cell cDNA microarray profiling of complex biological processes of differentiation. Curr Opin Genet Dev 20:470–477.

Li H, Durbin R (2010) Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26:589–595.

Li P, Ponnala L, Gandotra N et al. (2010) The developmental dynamics of the maize leaf transcriptome. Nat Genet 42:1060–1067.

Mackay J, Dean JFD, Plomion C et al. (2012) Towards decoding the coni-fer giga-genome. Plant Mol Biol 80:555–569.

Mardis ER (2008) Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet 9:387–402.

Matz M, Shagin D, Bogdanova E et al. (1999) Amplification of cDNA ends based on template-switching effect and step-out PCR. Nucleic Acids Res 27:1558–1560.

Milne I, Stephen G, Bayer M et al. (2013) Using Tablet for visual exploration of second-generation sequencing data. Brief Bioinform 14:193–202.

Mutz K-O, Heilkenbrinker A, Lönne M et al. (2013) Transcriptome analysis using next-generation sequencing. Curr Opin Biotechnol 24:22–30.

LCM of pine tissues and transcriptome sequencing 1287

by guest on May 21, 2016

http://treephys.oxfordjournals.org/D

ownloaded from

Tree Physiology Volume 34, 2014

Nystedt B, Street NR, Wetterbom A et al. (2013) The Norway spruce genome sequence and conifer genome evolution. Nature 497:579–584.

Ohtsu K, Takahashi H, Schnable PS (2007) Cell type-specific gene expression profiling in plants by using a combination of laser micro-dissection and high-throughput technologies. Plant Cell 48:3–7.

Ordway GA, Szebeni A, Duffourc MM et al. (2010) NIH public access. J Neurosci 87:2430–2438.

Pinto FL, Lindblad P (2010) A guide for in-house design of template-switch-based 5 0 rapid amplification of cDNA ends systems. Anal Biochem 397:227–232.

Rigault P, Boyle B, Lepage P et al. (2011) A white spruce gene catalog for conifer. Plant Physiol 157:14–28.

Ritland K (2012) Genomics of a phylum distant from flowering plants: conifers. Tree Genet Genomes 8:573–582.

Ritz C, Spiess A-N (2008) qpcR: an R package for sigmoidal model selection in quantitative real-time polymerase chain reaction analy-sis. Bioinformatics 24:1549–1551.

Scanlon MJ, Ohtsu K, Timmermans MCP, Schnable PS (2009) Laser microdissection-mediated isolation and in vitro transcriptional ampli-fication of plant RNA. Curr Protoc Mol Biol 87:25A.3.1–25A.3.15.

Schmidt WM, Mueller MW (1999) CapSelect: a highly sensitive method for 5′ CAP-dependent enrichment of full-length cDNA in PCR-mediated analysis of mRNAs. Nucleic Acids Res 27:e31.

Takacs EM, Li J, Du C et al. (2012) Ontogeny of the maize shoot apical meristem. Plant Cell 24:3219–3234.

Thiel J, Hollmann J, Rutten T et al. (2012) 454 Transcriptome sequencing suggests a role for two-component signalling in cellularization and dif-ferentiation of barley endosperm transfer cells. PloS One 7:e41867.

Torti S, Fornara F, Vincent C et al. (2012) Analysis of the Arabidopsis shoot meristem transcriptome during floral transition identifies dis-tinct regulatory patterns and a leucine-rich repeat protein that pro-motes flowering. Plant Cell 24:444–462.

Xiong Y, Li Q-B, Kang B-H, Chourey PS (2011) Discovery of genes expressed in basal endosperm transfer cells in maize using 454 tran-scriptome sequencing. Plant Mol Biol Rep 29:835–847.

1288 Cañas et al.

by guest on May 21, 2016

http://treephys.oxfordjournals.org/D

ownloaded from