Genome-wide high throughput analysis of DNA methylation in eukaryotes

Methods 47 (2009) 142–150

1046-2023/$ - see front matter © 2008 Elsevier Inc. All rights reserved.

doi:10.1016/j.ymeth.2008.09.022

Contents lists available at ScienceDirect

Methods

journal homepage: www.elsevier.com/locate/ymeth

Kyle R. Pomraning, Kristina M. Smith, Michael Freitag*

Center for Genome Research and Biocomputing and Department of Biochemistry and Biophysics, Oregon State University, 2011 ALS Building, Corvallis, OR 97331-7305, USA

* Corresponding author. Fax: +1 541 737 0481. E-mail address: [email protected] (M. Freitag).

Article info

Article history: Accepted 29 September 2008; Available online 23 October 2008

Keywords: Cytosine DNA methylation; High throughput sequencing; 5-methylcytosine; MeDIP; Methylome; Epigenome; Epigenetics; Bisulfite sequencing; Genomic sequencing; Neurospora crassa

Abstract

Cytosine methylation is the quintessential epigenetic mark. Two well-established methods, bisulfite sequencing and methyl-DNA immunoprecipitation (MeDIP), lend themselves to the genome-wide analysis of DNA methylation by high throughput sequencing. Here we provide an overview and brief review of these methods. We summarize our experience with MeDIP followed by high throughput Illumina/Solexa sequencing, exemplified by the analysis of the methylated fraction of the Neurospora crassa genome (“methylome”). We provide detailed methods for DNA isolation, processing and the generation of in vitro libraries for Illumina/Solexa sequencing. We discuss potential problems in the generation of sequencing libraries. Finally, we provide an overview of software that is appropriate for the analysis of high throughput sequencing data generated by Illumina/Solexa-type sequencing by synthesis, with a special emphasis on approaches and applications that can generate more accurate depictions of sequence reads that fall in repeated regions of a chosen reference genome.

1. Introduction

Site-specific DNA methylation is an epigenetic mark found in organisms across all domains of life. Archaea and eubacteria have a variety of methylated nucleotides including N4-methylcytosine (4mC), 5-methylcytosine (5mC) and N6-methyladenine (6mA) [1,2]. The 6mA found at GATC sites in γ-proteobacteria is likely the best studied DNA modification in single celled organisms and is involved in genome defense, DNA mismatch repair, DNA replication and control of gene expression [3]. While some eukaryotes (e.g. the genus Tetrahymena) contain 6mA, 5mC is the most prevalent, most studied and best understood DNA modification in eukaryotes [4]. Here we describe genome-wide approaches to study the distribution of 5mC in eukaryotes.

In many eukaryotes, DNA methylation is thought to control gene expression by modulation of DNA–protein interactions [5–7]. Nevertheless, DNA methylation is not essential in all eukaryotes. The yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe and the nematode Caenorhabditis elegans appear to have lost the DNA methylation machinery as no DNA methyltransferase genes are present in their genomes and no DNA methylation has been detected by various methods [8–10]. Very low levels of DNA methylation have been reported during development in Drosophila melanogaster [11]. Within the filamentous fungi, DNA methylation has been detected in many species but it is not universally present, e.g. Aspergillus nidulans has no detectable DNA methylation while very low levels have been reported from the closely related A. flavus [12]. Two fungi that have been used extensively for DNA methylation research, Neurospora crassa and Ascobolus immersus, show heterogeneous distribution of 5mC in all possible sequence contexts. In these species, DNA methylation is thought to be involved in “genome defense”, e.g. by silencing invading transposable elements [13], but is not essential for survival. In plants, e.g. Arabidopsis thaliana, DNA methylation similarly silences the expression of transposable elements. Upon loss of DNA methylation, transposons reactivate and integrate in various regions of the genome, causing pleiotropic effects on development as has been demonstrated by use of ddm1 mutants [14]. Arabidopsis contains an abundance of 5mC sites [15] in symmetrical and non-symmetrical contexts at CpG, CpHpG and CpHpH sites, where H represents A, C or T [16,17]. In vertebrates, where DNA methylation plays a role in chromatin packaging [18,19] and transcription [20–24], 5mC occurs almost exclusively in the CpG context [25].

Many methods have been used to assess DNA methylation [26]. HPLC is a conceptually simple method to determine the overall content of 5mC in genomic DNA [27], and has been used in combination with thin-layer chromatography to demonstrate very minor amounts of DNA methylation in D. melanogaster and A. flavus [11,12]. Nevertheless, establishing absence or presence of DNA methylation and determining the percentage of 5mC in a specific genome is only the first step to elucidate methylation patterns that may control gene expression. Current methods are either used for “typing” a specific known region that may or may not be methylated, or for “profiling”, where parts or all of a given genome are investigated and no a priori knowledge of methylation status and/or sequence is required [26].

One of the earliest typing approaches relies on the comparison of digestion patterns caused by methylation-sensitive and methylation-insensitive isoschizomers of restriction endonucleases [28,29]. This approach has proven useful to establish methylation patterns for many vertebrate, plant and fungal genes or intergenic regions. In fungi, analysis with isoschizomers that require symmetrical recognition sites resulted in the underestimation of 5mC content [30]; later studies showed that 5mC can occur in non-symmetrical sites in Neurospora [31]. Other methylation typing approaches involve semi-quantitative and quantitative PCR methods that are useful for detection and quantification of methylation at certain loci by gene amplification after restriction digest [32,33]. In most cases, bisulfite conversion is the most informative way to analyze gene- or region-specific DNA methylation patterns. Unmodified cytosines are converted to uracil (U), while 5mC remains unconverted. The converted DNA is amplified by PCR, thus introducing C to T changes in the PCR products, which are sequenced and compared to untreated DNA controls to yield the exact position of 5mC within a sequence [34]. The most common pitfalls of this method are related to incomplete C–U conversion and the inability to find appropriate primer pair combinations in species where 5mC is not found in symmetrical contexts. Many combinations of the techniques described above were applied before the widespread use of microarrays and are reviewed elsewhere [35,36].
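The logic of bisulfite typing can be illustrated with a small in silico sketch. The Python example below is an illustration only and not part of the published protocol: the function names `bisulfite_convert` and `call_methylation` and the 12-nt test sequence are hypothetical, only the top strand is modeled, and conversion is assumed to be complete.

```python
def bisulfite_convert(seq, methylated):
    """Return the top-strand read expected after bisulfite treatment and PCR.

    seq        -- untreated top-strand sequence (A/C/G/T)
    methylated -- set of 0-based positions that carry 5mC
    Unmethylated C is read as T (C -> U -> T); 5mC is still read as C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, converted):
    """Compare a converted read with the untreated reference, base by base."""
    if len(reference) != len(converted):
        raise ValueError("sequences must have equal length")
    methylated, unmethylated = [], []
    for i, (ref, obs) in enumerate(zip(reference, converted)):
        if ref != "C":
            continue  # only reference cytosines are informative
        if obs == "C":
            methylated.append(i)      # protected from conversion
        elif obs == "T":
            unmethylated.append(i)    # converted
    return methylated, unmethylated


# Hypothetical example: cytosines at positions 1 and 5 are methylated.
reference = "ACGTCCGATCGA"
read = bisulfite_convert(reference, methylated={1, 5})
calls = call_methylation(reference, read)
print(read)
print(calls)
```

Because an unconverted cytosine is indistinguishable from a methylated one in this comparison, incomplete conversion appears directly as false-positive methylation calls.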

Restriction landmark genomic scanning (RLGS) was the first truly genome-wide method available for analyzing methylation by comparing restriction fragments after end-labeling of digested DNA [37]. Another early genome-wide approach relied on purification of methylated DNA by column chromatography on resins that were coupled to the methyl-binding domain (MBD) of MeCP2 [13,38,39]. More recently, microarray technology has been developed as a powerful tool to analyze the methylation state of an organism [40–43]. Methylated DNA immunoprecipitation (MeDIP) [44] is used in conjunction with tiling microarrays for the evaluation of specific genetic loci and disease states [41,45–48]. Many microarray variants have also been used in conjunction with bisulfite conversion to analyze the methylation status of specific loci of interest during disease [22,49–52] or whole genomes [53,54]. Currently, tiling array design and production are the main drawbacks to this technique and are confounded by inaccurate hybridization signals. Moreover, in most genomes relatively heavily methylated regions of repeat DNA have not been sequenced or assembled and are thus lacking from current microarray designs.

As a result, microarray technology is being replaced by high throughput short read sequencing as the method of choice. Here we describe two techniques as carried out with the filamentous fungus Neurospora crassa. MeDIP sequencing (MeDIP-Seq) combines immunoprecipitation of 5mC residues with sequencing to create moderate resolution methylation profiles, while traditional bisulfite conversion is used with high throughput sequencing (bisulfite sequencing, BS-Seq; formerly called “genomic sequencing”) to generate single base resolution methylation profiles (Fig. 1). These techniques can be used in tandem to provide independent methods for validation of whole genome methylation profiling.
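How aligned MeDIP-Seq reads translate into a moderate resolution profile can be sketched as follows. This is a minimal illustration, not the analysis used in this study. It assumes that reads have already been aligned and reduced to start coordinates on a single chromosome, and that a sequenced input sample is available for normalization. The window size, the pseudocount and the function names are hypothetical choices.

```python
from collections import Counter


def window_counts(read_starts, window=500):
    """Count aligned reads per fixed-size window, keyed by window index."""
    return Counter(pos // window for pos in read_starts)


def enrichment(ip_starts, input_starts, window=500, pseudocount=1.0):
    """Ratio of IP to input read fractions for every window with any reads.

    Counts are divided by the library size so that samples sequenced to
    different depths are comparable; the pseudocount avoids division by zero.
    """
    ip = window_counts(ip_starts, window)
    inp = window_counts(input_starts, window)
    ip_total = max(len(ip_starts), 1)
    in_total = max(len(input_starts), 1)
    profile = {}
    for w in sorted(set(ip) | set(inp)):
        ip_frac = (ip[w] + pseudocount) / ip_total
        in_frac = (inp[w] + pseudocount) / in_total
        profile[w] = ip_frac / in_frac
    return profile


# Hypothetical read start positions (bp) for an IP and an input sample.
ip_reads = [10, 120, 480, 490, 1600]
input_reads = [50, 700, 1300, 1700, 2200]
profile = enrichment(ip_reads, input_reads)
for w, ratio in profile.items():
    print(f"window {w * 500}-{(w + 1) * 500 - 1}: {ratio:.2f}")
```

Windows with a ratio well above 1 are candidates for methylated regions. The resolution of such a profile is limited by the window size and by the length of the sheared fragments.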

2. DNA isolation and fragmentation

2.1. Introduction

Prior to methylation analysis genomic DNA must be purified and processed. Isolation of genomic DNA and separation of the methylated from the non-methylated component of the genome are the first critical steps in determining genome-wide methylation patterns. We found that commonly used methods for the isolation of genomic DNA prove successful for subsequent MeDIP, as well as validation by region-specific PCR or Southern blotting. This includes many commercially available kits for the isolation of plant DNA. Here we include a detailed description of a widely used method for the preparation of genomic DNA from filamentous fungi.

Fig. 1. Methods for whole genome methylation analysis. Genomic DNA is purified and sheared. For BS-Seq (left pathway), adaptors are ligated to the DNA prior to treatment with bisulfite, which converts unmethylated cytosines (C) into uracils (U). Methylated cytosines (M) remain unmodified. The converted DNA is amplified by PCR and digested. For MeDIP-Seq (right pathway), methylated DNA is immunoprecipitated with 5mC-specific antibodies. In both cases, the methylated fraction of the genome is purified, assembled into in vitro libraries and sequenced on high throughput sequencing machines. The resulting millions of short sequence “reads” are aligned to a reference genome to map enrichment of methylated DNA in specific chromosomal regions.

2.2. DNA isolation from filamentous fungi

We grow Neurospora as shaking cultures in 10 ml of Vogel’s minimal medium [55] with 1.5% sucrose at 32 °C for three days in 125 ml Erlenmeyer flasks or 22 × 150 mm culture tubes with Kim Caps. Tissue is harvested with wooden applicator sticks, briefly dried on paper towels, transferred to 15 ml polypropylene tubes and frozen at −20 °C or −80 °C. Solid tissue is lyophilized overnight and pulverized by vortexing with a metal spatula for ~30 s.

Samples are suspended in salt–detergent solution. To prepare this solution, dissolve 2 g of Na-deoxycholate (Sigma, D6750), 5 g of Brij 58/polyoxyethylene 20 cetyl ether (Sigma, P5884) and 58.44 g of NaCl in 350 ml sterile distilled water, adjust to 500 ml with water and store at 4 °C; a precipitate may appear over time, but after re-mixing the solution can be used without ill effects. We add sufficient salt–detergent solution (usually ~600 µl) to the pulverized tissue to make a thick suspension after vortexing at high speed for 20 s. Samples are incubated at room temperature for ~20 min and vortexed once or twice. Tissues will become viscous over time. At this point samples can be transferred to Eppendorf tubes for further processing. Samples are centrifuged either in the large tube in a Sorvall SS-34 rotor at 8,000 rpm for 10 min, or at full speed in a 1.5 or 2 ml Eppendorf tube in a benchtop microcentrifuge. This typically yields 400–600 µl of supernatant, which is transferred to a fresh Eppendorf tube.

The tube is filled with 4.5 M TCA:EtOH (1:1 v/v) solution. To make 4.5 M TCA:EtOH solution, dissolve 417 g Na-TCA salt (not free acid; Crescent Chemical Co., #AV17004) in 200 ml of water, use low heat to get the TCA salt into solution, adjust to 500 ml, add 500 ml of 95% or 100% ethanol and store at 4 °C. A precipitate will appear and settle, usually overnight; this precipitate should be avoided. At least 3 volumes of TCA:EtOH solution should be added for efficient precipitation. Samples are mixed by inversion and nucleic acids and proteins precipitated at −20 °C for at least 2 h. Samples may be stored at this stage for up to two days; upon further storage the resulting genomic DNA does not digest well with most restriction endonucleases. Nucleic acids are pelleted by centrifugation at full speed for 1 min in a microfuge and the supernatant aspirated or poured off the pellets.

The pellets are resuspended in 200 µl of 10 mM NH4OAc + 0.3 µg/ml RNase A (a pre-mixed solution that can be stored at 4 °C for several months), detached from tube walls by striking several times across a plastic tube rack and vortexed briefly. RNA is digested at 50 °C for 40 min; we vortex briefly every 10 min to help resuspend the pellet. We add 200 µl of chloroform, vortex briefly, centrifuge in a microfuge for 5 min at full speed and transfer the now clear supernatant to a fresh 1.5 ml tube. We add 900 µl of isopropanol/NH4OAc (for a stock that can be kept at room temperature for several weeks, mix 42.5 ml isopropanol + 7.5 ml of 5 M NH4OAc), mix well by inversion, centrifuge immediately in a microfuge for 1 min at full speed, aspirate the supernatant, wash the pellet with 300 µl of 70% EtOH, aspirate the supernatant and air-dry for 15 min. The pellet is resuspended in 100 µl TE buffer overnight at 4 °C, typically yielding ~200 ng/µl or ~25 µg of genomic DNA per culture.
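The reagent recipes in the isolation protocol can be checked with simple arithmetic. The sketch below is an illustration, not part of the protocol. It assumes standard molar masses that are not stated in the text (58.44 g/mol for NaCl and about 185.37 g/mol for sodium trichloroacetate), and the helper name `molarity` is hypothetical.

```python
# Molar masses in g/mol; standard values, not given in the protocol text.
MOLAR_MASS = {
    "NaCl": 58.44,
    "Na-TCA": 185.37,  # sodium trichloroacetate, C2Cl3NaO2
}


def molarity(grams, compound, final_volume_ml):
    """Concentration in mol/l of a solute brought to a final volume."""
    moles = grams / MOLAR_MASS[compound]
    return moles / (final_volume_ml / 1000.0)


# Salt-detergent solution: 58.44 g NaCl adjusted to 500 ml.
nacl = molarity(58.44, "NaCl", 500)

# TCA stock: 417 g Na-TCA adjusted to 500 ml, before ethanol is added.
tca = molarity(417, "Na-TCA", 500)

print(f"NaCl in salt-detergent solution: {nacl:.2f} M")
print(f"Na-TCA stock before ethanol:     {tca:.2f} M")
# Mixing 1:1 with ethanol roughly halves the concentration
# (volume contraction of ethanol-water mixtures is ignored here).
print(f"Na-TCA after 1:1 ethanol:        {tca / 2:.2f} M")
```

Under these assumptions the salt–detergent solution contains 2 M NaCl, and the "4.5 M" in the name of the TCA:EtOH solution refers to the aqueous Na-TCA stock before it is mixed 1:1 with ethanol.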

During purification we include an RNase step, which reduces the size of contaminating RNA in the DNA preparation. An additional gel purification step after sonication but before MeDIP removes the remaining RNA and may be beneficial to increase recovery of DNA from the MeDIP, because the 5-methylcytidine antibody used for MeDIP also recognizes methylated RNA.

2.3. DNA processing

Genomic DNA must be processed into small fragments prior to MeDIP-Seq or BS-Seq. Fragment ends need to be repaired and adaptors need to be ligated to the short fragments for both BS-Seq (see Section 4.2) and high throughput sequencing methods (see Section 5). Historically, cleavage into short fragments has been achieved using either restriction endonucleases, which generate non-random fragments based on DNA composition, or a variety of mechanical shearing techniques. Like others [44,56,57] (Z.A. Lewis, S. Honda, T. Khlafallah, J.K. Jeffress, M. Freitag, F. Mohn, D. Schübeler and E.U. Selker, submitted), we found that sonication is a relatively quick method to shear the DNA while avoiding the bias of restriction enzymes. End-repair and adaptor ligation protocols resemble those that have been used to generate probes for microarray analyses [56,58]. For Neurospora, this MeDIP-Seq protocol works very well as there is sufficient time to allow reannealing of single stranded, immunoprecipitated DNA to generate double-stranded DNA, which serves as substrate for the polishing and adaptor ligation reactions. Alternatively, the polishing and adaptor ligation reactions are carried out prior to the immunoprecipitation steps [16,17].

To shear DNA, we dilute 5–50 µg of genomic DNA into 400 µl of TE buffer in a microcentrifuge tube. We have successfully used 5 µg of DNA to MeDIP the ~42 Mb Neurospora genome, where 5 µg represent ~100 million copies of the genome. For sonication we use a Branson sonifier 450 equipped with a microtip, set to a duty cycle of 80% and output control of 1.2 [59,60]. For other models specific sonication conditions need to be established on a case-by-case basis. We sonicate the DNA five times for 10 s each with 30 s rest on ice between cycles and run ~250 ng of sheared DNA on a 1% agarose gel to verify the size distribution of the sonicated DNA. We typically aim for a smear of sheared DNA between 300 and 1000 bp in length. If the smear of DNA is too large one can sonicate the remaining sample for additional cycles and re-check the fragment size distribution. Prior to working with a new organism we recommend running a standard from two to twelve cycles of sonication to determine how many cycles are necessary to achieve the desired result. With Neurospora DNA, even just two cycles of sonication reduce the average size of the genomic DNA to ~1.2 kb.
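The estimate that 5 µg of DNA corresponds to roughly 100 million copies of a 42 Mb genome can be reproduced with a short calculation. The sketch below assumes an average molar mass of 650 g/mol per base pair, a common approximation that is not stated in the text; the function name `genome_copies` is hypothetical.

```python
AVOGADRO = 6.022e23    # molecules per mole
BP_MOLAR_MASS = 650.0  # g/mol per base pair; common approximation (assumption)


def genome_copies(dna_ug, genome_bp):
    """Approximate number of haploid genome copies in a given mass of DNA."""
    grams = dna_ug * 1e-6
    grams_per_genome = genome_bp * BP_MOLAR_MASS / AVOGADRO
    return grams / grams_per_genome


# 5 ug of DNA from the ~42 Mb Neurospora genome.
copies = genome_copies(5, 42e6)
print(f"{copies:.2e} genome copies")
```

With these assumptions the result is about 1.1 × 10^8 copies, consistent with the figure quoted above. The same function can be used to judge whether a given amount of DNA from a larger genome still provides a comparable number of copies.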

The size of the sheared DNA directly affects the precision achievable when mapping MeDIP-Seq reads. Because the 5-methylcytidine antibody requires more than just a single 5mC to efficiently bind [63], shorter DNA fragments will allow more precise mapping to a reference genome but may not enrich for regions that have only a few methylated cytosines. Conversely, longer fragments will make mapping less precise but will increase sensitivity for detection of regions with lower levels of 5mC. The size of the sheared fragments will also affect PCR validation of the results as small DNA fragments will require closely spaced PCR primer pairs. Like others [44] and (Z.A. Lewis, S. Honda, T. Khlafallah, J.K. Jeffress, M. Freitag, F. Mohn, D. Schübeler and E.U. Selker, submitted), we found fragments sheared to ~500 bp to be the most useful. This allows us to design PCR primer pairs that amplify control products of ~400 bp along with test products of ~200 bp (see Section 7). One method to increase the accuracy associated with read mapping from high throughput sequencing is to decrease the size range of the input DNA by excising a narrow, well-defined band (450–550 bp), purifying the DNA with a commercially available gel extraction kit and using this as the input DNA for MeDIP.

3. MeDIP protocol

Our protocol is almost identical to the original MeDIP method described by Weber et al. [44] and was first used with Neurospora by Selker and co-workers (Z.A. Lewis, S. Honda, T. Khlafallah, J.K. Jeffress, M. Freitag, F. Mohn, D. Schübeler and E.U. Selker, submitted). We save a quarter of the sonicated input DNA as control and use the rest for the MeDIP. At least 5 µg of sheared DNA is diluted into 450 µl of TE buffer, denatured in a 100 °C heat block for 10 min and snap-cooled on ice for 5 min. We add 50 µl of 10× immunoprecipitation buffer [100 mM Na-phosphate pH 7.0, 1.4 M NaCl, 0.5% Triton X-100] and 1 µl of the 5mC antibody (Diagenode, #MAb-5MECYT-100, 1 µg/µl) to the DNA solution, and incubate for 2 h on an orbital rotator at 4 °C. While the antibodies and DNA are incubating, we pre-wash 40 µl of magnetic Dynabeads (Invitrogen, M-280 sheep anti-mouse IgG) with 1 ml of PBS + 0.1% BSA for 5 min at room temperature with shaking. The beads are collected by use of a stick magnet attached to a pipet tip rack, and the wash is repeated once. Beads are resuspended in 40 µl of 1× immunoprecipitation buffer, added to the DNA sample and incubated on an end-over-end rotator at 4 °C for 2–16 h. Dynabeads are collected with a stick magnet as above and the supernatant with unbound DNA is removed. Beads are washed three times with 1 ml of 1× immunoprecipitation buffer for 10 min at room temperature with shaking, resuspended in 250 µl proteinase K digestion buffer [5 mM Tris pH 8.0, 1 mM EDTA pH 8.0, 0.05% SDS] with 7 µl of 10 mg/ml proteinase K and incubated for 3 h on an end-over-end rotator at 50 °C to digest the antibodies and release the 5mC-containing DNA. DNA is extracted once with 250 µl phenol, once with 250 µl chloroform and precipitated by adding 500 µl ethanol with 400 mM NaCl. To improve recovery, 1 µl glycogen (20 mg/ml) is added. DNA pellets are resuspended in 50 µl TE buffer and stored at −20 °C.
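The working concentrations in the binding reaction follow from diluting 50 µl of 10× immunoprecipitation buffer into 450 µl of DNA solution. The sketch below spells out this arithmetic. It is an illustration only: the helper name `dilute` is hypothetical, and the 1 µl of antibody is ignored as negligible.

```python
# 10x immunoprecipitation buffer as listed in the protocol.
IP_BUFFER_10X = {
    "Na-phosphate pH 7.0 (mM)": 100.0,
    "NaCl (mM)": 1400.0,
    "Triton X-100 (%)": 0.5,
}


def dilute(stock, stock_volume_ul, total_volume_ul):
    """Final concentration of each component after dilution."""
    return {
        name: conc * stock_volume_ul / total_volume_ul
        for name, conc in stock.items()
    }


# 50 ul of 10x buffer added to 450 ul of denatured DNA in TE buffer.
working = dilute(IP_BUFFER_10X, stock_volume_ul=50, total_volume_ul=450 + 50)
for name, conc in working.items():
    print(f"{name}: {conc:g}")
```

The binding reaction therefore contains 10 mM Na-phosphate, 140 mM NaCl and 0.05% Triton X-100, the same 1× composition that is used for the bead washes.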

A convenient alternative to precipitation with 5mC antibodies is the use of commercially available MeDIP kits that rely on the interaction of the methyl-binding domain (MBD) of MBD2 or MeCP2 with 5mC [61,62]. While MeCP2 has a natural histidine (His) tag because its protein sequence in mice, rats and humans contains seven consecutive histidine residues [62,63], the MBD of MBD2 has been expressed with a His tag to allow purification with magnetic nickel beads (Active Motif, Carlsbad, CA). These types of methylated-CpG island recovery assays (“MIRA”) have been used in several studies [64]. Different MBDs have slightly different affinities for the density and number of consecutive CpGs (i.e., for the MBD2 interaction to be efficient, several CpGs need to be in close proximity, whereas fewer, more widely spaced CpGs are sufficient for interaction with MeCP2) [65].

4. Bisulfite sequencing

Bisulfite (or “genomic”) sequencing is useful to determine the methylation status of cytosines at the single nucleotide level. Briefly, single-stranded DNA is treated with bisulfite, which sulfonates cytosine but leaves 5mC unaffected. The cytosine is then deaminated and desulfonated to uracil [66]. Converted DNA is amplified by PCR with convenient primer pairs and PCR products are directly sequenced and aligned to unconverted DNA, thus revealing exactly which cytosines were methylated. Two parameters affect the success of this technique: (1) complete conversion of unmethylated cytosines to uracil and (2) potential degradation of DNA during the conversion reaction by high temperature and low pH. Incomplete conversion mainly occurs because bisulfite only attacks cytosines in single-stranded DNA. In areas of the genome with high GC content the DNA may not denature completely, which results in patches of unmodified cytosines [67]. If a reasonably complete genome sequence is available, these false-positive regions can be screened for in an organism-specific manner. In a study to define the methylated fraction of the Arabidopsis genome (the “methylome”), a high proportion of false-positive sequences were removed from further analysis by screening for reads with three methylated CpHpH sites in a row [16].
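A screen of this kind can be sketched in a few lines. The example below is a simplified illustration and not the published filter from [16]. It assumes that each read is already aligned without gaps to a top-strand reference segment of the same length, that the sequences contain only A, C, G and T, and that "in a row" means consecutive CpHpH sites along the read. The function names are hypothetical.

```python
def cytosine_context(seq, i):
    """Classify the cytosine at seq[i] as 'CpG', 'CpHpG' or 'CpHpH'.

    H stands for A, C or T. Returns None if the read end prevents
    classification.
    """
    nxt = seq[i + 1 : i + 3]
    if nxt.startswith("G"):
        return "CpG"
    if len(nxt) < 2:
        return None
    return "CpHpG" if nxt[1] == "G" else "CpHpH"


def likely_unconverted(reference, read, run_length=3):
    """True if the read shows `run_length` consecutive methylated CpHpH calls."""
    run = 0
    for i, (ref, obs) in enumerate(zip(reference, read)):
        if ref != "C" or cytosine_context(reference, i) != "CpHpH":
            continue
        # A retained C counts as "methylated"; anything else resets the run.
        run = run + 1 if obs == "C" else 0
        if run >= run_length:
            return True
    return False


# Hypothetical reference with three CpHpH sites (0, 3, 6) and one CpG (10).
reference = "CATCTACAAGCGT"
print(likely_unconverted(reference, "CATCTACAAGCGT"))  # no C converted
print(likely_unconverted(reference, "TATTTATAAGCGT"))  # only the CpG retained
```

The first read retains every cytosine and is flagged as a probable conversion failure; the second read, in which only the CpG cytosine is retained, passes the screen. Such a filter is only appropriate for organisms in which dense CpHpH methylation is not expected.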

4.1. End-repair of sheared DNA for bisulfite sequencing

Ligation of the sheared DNA to adaptors prior to bisulfite treatment should ensure unbiased PCR amplification of completely bisulfite-converted DNA. The end-repair of sheared DNA follows essentially the same protocol as used for the preparation of microarray probes [56] or Illumina/Solexa sequencing libraries (see details in Section 5). Polished DNA is mixed with Klenow polymerase (exo−) and dATP to generate a 3′ A-overhang, purified on MinElute PCR purification columns (Qiagen, Valencia, CA) and eluted with 10 µl of the supplied elution buffer.

4.2. Design of PCR primers and adaptors for bisulfite conversion

A linker ligation step coupled to PCR after bisulfite treatment selects for sequences that have undergone complete cytosine to uracil conversion; because only 18 PCR cycles are used, this allows for unbiased amplification [68]. Linker sequences that contain unmethylated cytosines and a restriction site are annealed to template (Fig. 2). After bisulfite treatment, primers designed to amplify only completely bisulfite-converted linker sequences are used for PCR amplification. The restriction site (in Fig. 2 it is AluI) should be as close to the template sequence as possible to minimize the amount of linker sequenced. To ligate double-stranded bisulfite sequencing adaptors, mix the adaptors and polished DNA (10:1 molar ratio), add 5 µl T4 DNA ligase (1 U/µl), 25 µl DNA ligase buffer and water to bring the reaction volume to 50 µl. Different commercial DNA ligase preparations result in similar efficiencies. Incubate the reaction at room temperature for 15 min. To remove unligated adaptors, separate the ligated DNA on a 1% agarose gel, excise the 300–500 bp fraction and extract the DNA on a commercially available PCR purification column (Qiagen, Valencia, CA; Epoch Biolabs, Houston, TX); elute with 32 µl of elution buffer (10 mM Tris–HCl, pH 8.5).
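Setting up a 10:1 molar ratio of adaptors to polished DNA requires converting the mass of sheared DNA into moles of fragments. The sketch below shows one way to do this, assuming an average of 650 g/mol per base pair (a common approximation that is not stated in the text). The function names and the example input of 1 µg of 500 bp fragments are hypothetical.

```python
BP_MOLAR_MASS = 650.0  # g/mol per base pair; common approximation (assumption)


def pmol_dsdna(mass_ng, length_bp):
    """Picomoles of double-stranded DNA fragments in a given mass."""
    # ng -> g is 1e-9, mol -> pmol is 1e12, hence the factor of 1e3.
    return mass_ng * 1e3 / (length_bp * BP_MOLAR_MASS)


def adaptor_pmol(insert_ng, insert_bp, ratio=10.0):
    """Picomoles of adaptor needed for a given adaptor:insert molar ratio."""
    return ratio * pmol_dsdna(insert_ng, insert_bp)


# Hypothetical input: 1 ug of polished DNA with a mean length of 500 bp.
insert = pmol_dsdna(1000, 500)
adaptor = adaptor_pmol(1000, 500)
print(f"insert:  {insert:.2f} pmol")
print(f"adaptor: {adaptor:.1f} pmol for a 10:1 ratio")
```

Because the number of fragments per unit mass rises as fragments get shorter, the amount of adaptor should be recalculated whenever the shearing conditions change.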

4.3. Bisulfite conversion of DNA

By now, a number of kits for bisulfite sequencing are commercially available, all of which promise to minimize degradation while maximizing the C to U conversion rate. Recently, the “CpGenome DNA modification kit” (Chemicon, Temecula, CA) and the “EpiTect kit” (Qiagen, Valencia, CA) were both shown to produce C to U conversion levels of greater than 99% with only modest DNA degradation [16,17]. Other techniques have been shown to achieve even more complete conversion rates but at the expense of DNA integrity [17,66]. After bisulfite conversion, bisulfite-converted DNA is amplified by 18 PCR cycles with primers that are specific for bisulfite sequencing adaptors (Fig. 2). The DNA is digested with the restriction enzyme whose recognition site was included on the adaptors, purified on commercially available PCR purification columns (Qiagen, Valencia, CA; Epoch Biolabs, Houston, TX) and used to generate in vitro sequencing libraries.

Fig. 2. Preparation of sheared DNA for bisulfite sequencing. (a) Double-stranded adaptor sequences (black) are ligated to sheared end-repaired DNA (red). The DNA is treated with bisulfite, which converts unmethylated cytosine to uracil. (b) A linker-specific PCR primer (blue) is used to selectively amplify top-strand sequences that have undergone complete bisulfite conversion. (c) In the second round of PCR, a second conversion-specific primer amplifies bottom-strand sequences. (d) After 18 rounds of PCR, the products will consist primarily of bisulfite-converted DNA in which all U/G mismatches have been converted to T/A. The adaptors are removed at the indicated sites by digestion with restriction endonucleases, e.g. AluI.

146 K.R. Pomraning et al. / Methods 47 (2009) 142–150

5. Preparation of DNA libraries for Illumina/Solexa sequencing

A number of high throughput sequencing technologies are currently available, including pyrosequencing (Roche/454), sequencing by ligation (ABI SOLiD), and reversible terminator sequencing (Illumina/Solexa). These technologies make it possible to sequence very large amounts of DNA quickly and inexpensively, but all have the drawback of generating short (~400 nt) or very short (36–50 nt) sequence reads. We make use of Solexa sequencing, which is currently able to generate ~1.5 Gb of sequence data in ~3 days from a single flow cell with eight channels.

Prior to Solexa sequencing, DNA samples from MeDIP or bisulfite conversions must be processed to generate in vitro sequencing libraries. Illumina/Solexa recommends use of their genomic or chromatin immunoprecipitation (ChIP) sample preparation kits. Nevertheless, except for the adaptor and PCR primers, the reagents supplied with the kit do not differ from the enzymes typically available in most molecular biology labs. Illumina supplies modified primers as separate “primer-only” kits (the type and extent of modification are considered proprietary information). We found that unmodified HPLC-purified or non-purified primers can work as well as Illumina-supplied primer kits; this reduces the sample preparation costs by about threefold.

5.1. End-repair of sheared DNA for Illumina/Solexa sequencing

When sequencing MeDIP samples, we typically start with 200–400 ng of DNA. Sheared DNA fragments are repaired with T4 DNA polymerase to fill in 5′ overhangs, Klenow polymerase to remove the 3′ overhangs, and T4 polynucleotide kinase to phosphorylate the 5′-OH. As suggested in the Illumina sample preparation protocols, we mix 30 µl of the DNA to be sequenced with 45 µl H2O, 10 µl of 10× T4 DNA ligase buffer with 10 mM ATP, 2 µl dNTP mix (20 mM of each dNTP), 5 µl T4 DNA polymerase (3 U/µl), 1 µl Klenow polymerase (5 U/µl) and 5 µl T4 polynucleotide kinase (10 U/µl). In lieu of Illumina sample preparation kits, enzymes from other manufacturers can be substituted with good success when used at the concentrations given above. The polishing reaction proceeds at 20 °C for 30 min. Polished DNA is purified on commercially available purification columns (e.g. Qiagen, Valencia, CA; Epoch Biolabs, Houston, TX), and eluted with 32 µl of 10 mM Tris–HCl (pH 8.5). To allow ligation to the Illumina/Solexa adaptor primers, an adenine needs to be added to the 3′ ends of DNA fragments by using Klenow polymerase that lacks the 3′ to 5′ exonuclease activity. The polished DNA (32 µl) is mixed with 5 µl of 10× Klenow polymerase buffer, 10 µl of 1 mM dATP and 3 µl Klenow (exo−) polymerase (5 U/µl). The reaction is incubated at 37 °C for 30 min. We purify the DNA on MinElute PCR purification columns (Qiagen, Valencia, CA) and elute with 10 µl of 10 mM Tris–HCl (pH 8.5).

Fig. 3. DNA methylation of linkage group (LG) VII in Neurospora crassa. (a) Unique (blue) and non-unique reads (red) were aligned to the current N. crassa reference sequence for LG VII by BLAT. Methylation peaks were visualized in the Argo browser as histograms in 50 bp windows, generated by our “BLATmapper” script. Repetitive DNA (gold) was mapped by counting how many times each possible 32-mer occurs in the Neurospora genome. This track provides a visual guide for repetitiveness of non-unique read mapping. Read map height was determined by counting the number of times a read was sequenced and dividing by the number of times that read occurs in the genome. (b) DNA methylation peaks (red) in a syntenic region that flanks the LG VII centromeres of N. crassa (top) and the related species N. discreta (bottom). DNA methylation in both species occurs almost exclusively in intergenic regions. The ORF for the heavily methylated “predicted protein” on the right of the N. crassa track appears to be a pseudogene.

5.2. Adaptor ligation and PCR amplification for Illumina/Solexa sequencing

The adaptor primers provided by Solexa are denatured by heating to 98 °C for 5 min and cooled to room temperature. Alternatively, equal amounts of lab-designed primers (at 10 µM) are mixed, denatured by heating to 98 °C for 5 min and cooled to room temperature. Annealed adaptors are stable at −20 °C for at least one year. Annealed adaptors are ligated to the DNA by mixing them with polished DNA in a molar ratio of 10:1. Mix the adaptors, DNA, 5 µl T4 DNA ligase (1 U/µl), ligase buffer, and water (final volume of 50 µl). Incubate at room temperature for 15 min. Separate ligated DNA from unligated adaptors on a 2% NuSieve agarose gel and select a DNA size for Solexa sequencing. We typically excise DNA between 200 and 500 bp. We purify this DNA with a commercially available gel extraction kit (e.g. Qiagen or Epoch Biolabs). The in vitro library is amplified by PCR and thus enriched for DNA fragments that are flanked by ligated adaptors. We use 1 µl of the gel-purified DNA with 25 µl Phusion DNA polymerase Master Mix (Finnzymes, NEB), 1 µl PCR primer 1.1 (Illumina), 1 µl PCR primer 2.1 (Illumina) and 22 µl of water. After initial denaturation (30 s at 98 °C), only 18–24 cycles of PCR are used (10 s at 98 °C, 30 s at 65 °C, 30 s at 72 °C) to avoid selective amplification of specific regions. This is followed by a 5 min extension at 72 °C. Amplicons are purified on PCR purification columns and eluted with 30–50 µl of 10 mM Tris–HCl (pH 8.5). DNA concentration is measured by absorbance at 260 nm (e.g. on a Nanodrop spectrophotometer) and ~10% of the reaction is separated on a 1–2% agarose gel to monitor amplification.
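When several libraries are amplified in parallel, the 50 µl library PCR described above is conveniently prepared as a master mix. The Python sketch below simply scales the volumes given in the text; the 10% pipetting excess and all names are assumptions made for illustration, and the template is left out of the master mix because it differs for each library.

```python
# Per-reaction volumes (µl) for the library amplification, as listed in the text.
PCR_REACTION_UL = {
    "template": 1.0,
    "phusion_master_mix": 25.0,
    "primer_1.1": 1.0,
    "primer_2.1": 1.0,
    "water": 22.0,
}


def master_mix(n_reactions: int, excess: float = 0.10) -> dict:
    """Volumes (µl) of shared reagents for n reactions plus a pipetting excess.

    The template is excluded because it is added to each tube individually.
    """
    factor = n_reactions * (1.0 + excess)
    return {
        name: round(volume * factor, 2)
        for name, volume in PCR_REACTION_UL.items()
        if name != "template"
    }
```

Each tube then receives 49 µl of master mix and 1 µl of gel-purified library.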

Sequencing libraries are diluted to the specifications required by the Illumina/Solexa sequencing machine in use. In most situations, cluster generation and the actual sequencing are handled by personnel who are part of a core sequencing facility. As these manipulations do not lend themselves to changes or specific adaptations by individual labs, these steps are not discussed here any further. Manuals and protocols that describe these steps are available from Illumina/Solexa.

6. Data analysis and visualization

6.1. Introduction

High throughput sequencing, e.g. with the Illumina/Solexa 1G genome analyzer, generates nearly a terabyte of image files during a single run. Image files are analyzed and converted into sequence reads. Prior to undertaking any high throughput sequencing project it is essential to design and test a data analysis pipeline. Dealing with the large amount of data that is generated from even a single run is not a trivial task and in most lab environments this will require investment in additional computing resources. Additionally, it is recommended that genome browsers (e.g. Gmod/GBrowse or Argo) are installed and functional to allow visualization of mapped sequence reads in a user-friendly manner.

Until recently, the read length of Illumina/Solexa sequencing was 36 bp. Because of a drastic increase of sequencing errors at the end of the reads, most mapping programs only consider the first 32 bp; the usable read length has increased significantly with the release of the new version of the Illumina/Solexa sequencer. After a complete list of reads for all eight lanes or channels on a single flow cell is compiled, individual reads are mapped onto the reference genome. For that purpose, Illumina provides the ELAND algorithm, which is able to efficiently map reads of 32 bp to a reference genome. This program yields six data sets. The first three data sets comprise reads that map to a unique location in the reference genome with zero, one, or two mismatches (U0, U1, U2, respectively). The other three data sets contain reads that map to more than one place in the reference genome with zero, one, or two mismatches (R0, R1, R2, respectively). In many cases, unique reads are used for most if not all of the data analyses, as ELAND provides map coordinates for the unique data sets. A drawback of the ELAND algorithm is that it does not give coordinates for non-unique reads. To overcome this, non-unique hits must be mapped back to the genome using a program such as BLAT [69]; traditional BLAST algorithms are not recommended for the analysis of high throughput data.
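The six categories can be illustrated with a brute-force toy classifier. This Python sketch is not the ELAND algorithm: it scans only the forward strand of a short string, it is far too slow for real genomes, and it assumes that the label reflects the best mismatch level at which a read is found and whether that best hit is unique. The function names and the "NM" label for unmapped reads are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def classify_read(read: str, genome: str, max_mismatches: int = 2) -> str:
    """Return 'U0'..'U2' or 'R0'..'R2' for the best hits, or 'NM' if none."""
    k = len(read)
    best = None   # lowest mismatch count seen so far
    hits = 0      # number of positions reaching that count
    for start in range(len(genome) - k + 1):
        distance = hamming(read, genome[start:start + k])
        if distance > max_mismatches:
            continue
        if best is None or distance < best:
            best, hits = distance, 1
        elif distance == best:
            hits += 1
    if best is None:
        return "NM"
    return ("U" if hits == 1 else "R") + str(best)
```

For example, with the toy genome `"ACGTACGTTTGACCA"`, the read `"ACGT"` occurs twice without mismatches and is therefore labeled as a repeat hit, whereas `"TTGACC"` occurs once.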

Efficient alternative methods are now available to map non-unique reads. Two of these programs have been developed at Oregon State University, CashX (J. Cumbie, C. Sullivan, K. Kasschau and J. Carrington, in preparation) and RGA (“reference-guided assembly”; R. Shen and T. Mockler, in preparation). We use both programs, along with BLAT, to map repetitive DNA sequences that are methylated. CashX uses tabulated and tallied reads to map them precisely to the reference genome while keeping track of read tally and all map positions in the genome. Output data are transferred into files that can be used as track input for commonly used genome browsers (e.g. in GFF3 format). RGA uses a BLAT-type mapping approach, resulting in a simple table or histogram format that can be plotted with several programs, e.g. “R”. SOAP (“short oligonucleotide alignment program”) is another useful mapping program for Illumina/Solexa data [70]. In all three applications, a variable number of mismatches and gaps are allowed to facilitate mapping.

No matter how non-unique reads are mapped back to a reference genome, it is critical to represent data in an unbiased manner and validate calls for non-unique reads. Various normalization methods have been proposed [71]. The problem lies in not knowing precisely where a specific non-unique read originated. Several ways to solve this conundrum can be imagined. Reads can be mapped to all possible locations and “normalized” by simply dividing by the number of times a read hits the genome. To make this approach more sophisticated, reads can be analyzed by nearest-neighbor approaches to evaluate if one specific region in the genome is more likely than others to have produced a certain percentage of non-unique reads. For example, if only one out of five potential regions shows unique reads around a small number of non-unique reads, it is more plausible that the non-unique reads originated from this single region. Dividing the number of non-unique reads by the number of genomic locations would generate a read count for this special region that would be too low.
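One possible reading of the nearest-neighbor idea is sketched below in Python. It is an assumption-laden illustration rather than a published normalization method: each candidate location of a non-unique read is weighted by the number of unique reads that start within a fixed flank, and the weights fall back to an equal split when no candidate has unique support. The 500 bp flank, the single-chromosome coordinates and the function name are hypothetical.

```python
def neighbor_weights(candidates, unique_starts, flank=500):
    """Weight candidate map positions of a non-unique read by nearby unique reads.

    candidates    : positions where the non-unique read maps
    unique_starts : start positions of uniquely mapped reads (same chromosome)
    flank         : distance (bp) within which a unique read counts as support
    """
    support = [
        sum(1 for u in unique_starts if abs(u - position) <= flank)
        for position in candidates
    ]
    total = sum(support)
    if total == 0:
        # No unique support anywhere: divide evenly among all locations.
        return [1.0 / len(candidates)] * len(candidates)
    return [s / total for s in support]
```

In the example from the text, where only one of five candidate regions is surrounded by unique reads, this weighting assigns the non-unique reads entirely to that region instead of one fifth to each.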

Fig. 4. Analysis of reads from high throughput bisulfite sequencing. To map sequence reads derived from BS-Seq, two additional reference genomes are prepared from a current reference genome (0). The first (1) is the reference genome with all cytosines changed to thymines. The second (2) is the complement sequence to the genome with all guanines changed to adenines. After bisulfite conversion the DNA is subjected to PCR amplification resulting in two main products from any given sequence. The products are sequenced and aligned to their best hits in either of the two converted reference genomes. C/T mismatches (in the C to T converted reference sequence) and G/A mismatches (in the G to A converted complement reference sequence) indicate the position of a methylated cytosine. Methylated cytosines are red while uracils derived from converted unmethylated cytosines are shown in green.

In addition to the published normalization approaches, assignment of a confidence value to each hit provides one method for evaluating the mapping of non-unique sequences (Fig. 3a). This approach can be used prior to validation of mapping data by other methods (e.g. region-specific quantitative PCR). We assign a “read value” (i.e., number of times a particular oligonucleotide is sequenced divided by the number of places it maps in the genome) and a “confidence value” (i.e., 1/number of places a read maps to the genome). Thus, a read that maps to a unique location in the genome will be given a confidence value of 1, while a read that maps to 60 different places will be given a confidence value of 1/60 (≈0.0167). To further facilitate viewing, the read value can be used to assign the height in a histogram while the confidence value can be simultaneously mapped as a heat map (e.g. a read with a high confidence value is mapped as red while a low confidence value is mapped as blue on a spectrum).
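The two quantities defined above are simple ratios, as the following Python sketch shows. The function names are hypothetical, and the linear blue-to-red gradient is only one assumed way to render the confidence value as a heat map color; the text does not prescribe a particular color scale.

```python
def read_value(times_sequenced: int, n_locations: int) -> float:
    """Times a read was sequenced divided by the number of places it maps."""
    return times_sequenced / n_locations


def confidence_value(n_locations: int) -> float:
    """1 / number of places a read maps in the genome."""
    return 1.0 / n_locations


def confidence_to_rgb(confidence: float) -> tuple:
    """Map confidence 0..1 onto a blue (low) to red (high) gradient."""
    confidence = min(max(confidence, 0.0), 1.0)
    return (round(255 * confidence), 0, round(255 * (1.0 - confidence)))
```

A read sequenced 120 times that maps to 60 locations would thus contribute a height of 2 at each location, drawn close to the blue end of the gradient.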

All mapping approaches require the use of genome browsers to efficiently view mapped reads in relation to other genome annotations. In most situations, this requires investment in computational resources to allow the use of Gmod/Gbrowse, which needs to be set up specifically for each organism that is to be analyzed [72]. The most current information on this open source tool is available at http://gmod.org/wiki/index.php/Gbrowse. Other browsers can fulfill useful roles for specific requirements. To view the relatively small (4–11 Mb) chromosomes of filamentous fungi in combination with MeDIP-seq or ChIP-seq data tracks we frequently use Argo (available at http://www.broad.mit.edu/annotation/argo/) because Gbrowse appears too cumbersome to display large segments of a genome in a single screen shot.

6.2. MeDIP sequencing

Currently, reads generated during an Illumina/Solexa sequencing run are 36 bp in length but represent fragments of DNA that were anywhere between 200 and 500 bp in length, depending on the size selected when gel purifying the adaptor-ligated library. Each start position of a read mapped to the genome represents the 5′ end of one of these sheared pieces. Therefore, each read that is aligned to the genome comes with some amount of uncertainty as to the exact location of the methylated cytosines. For example, if fragments of ~200–500 bp were selected, in theory the methylated cytosines are located anywhere within a 1000 bp window with the read start position at the center. With good coverage, reads will stack to yield a good prediction of the methylated regions as shown in Fig. 3. Methylated regions from two closely related Neurospora species, N. crassa and N. discreta, coincide almost perfectly in some regions, e.g. the segment of linkage group VII around the single histone H2A and histone H2B genes (Fig. 3b).
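The positional uncertainty of a single MeDIP read can be written down directly. The Python sketch below returns the window in which the methylated cytosines may lie, centered on the read start as described above; 1-based coordinates, the clipping at chromosome ends and the function name are assumptions made for illustration.

```python
def methylation_window(read_start: int, max_fragment_bp: int = 500,
                       chrom_length=None) -> tuple:
    """Interval (start, end) that may contain the methylated cytosines.

    The window extends max_fragment_bp on either side of the read start
    and is clipped to the chromosome when its length is known.
    """
    start = max(1, read_start - max_fragment_bp)
    end = read_start + max_fragment_bp
    if chrom_length is not None:
        end = min(end, chrom_length)
    return start, end
```

With the default of 500 bp, a read starting at position 10,000 yields the window 9500–10,500, i.e. the 1000 bp window mentioned in the text.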

6.3. Bisulfite sequencing

Aligning bisulfite-converted reads to a reference genome is essentially the same as aligning non-converted reads, with the caveat of not knowing whether a thymine base call is actually a thymine or a bisulfite-converted cytosine. However, if high conversion rates of unmethylated cytosines were observed, >99% of the remaining cytosines can be considered to be methylated. Additionally, since the complementary strand of the DNA will also be altered during PCR amplification, it is not known whether an adenine is actually an adenine or whether it originated from the guanine complementary to a converted cytosine. A straightforward method to overcome this problem was used by Lister and co-workers to map methylation in the Arabidopsis genome [17], while a similar method is described in detail by Cokus and co-workers [16].

Three reference genomes must be used for alignment of bisulfite-converted reads (Fig. 4). The first is the unconverted genome. The second is a genome in which all cytosines are converted to thymines to represent complete conversion of unmethylated cytosines, and the third is a complement genome in which the reference genome has been complemented and the guanines are converted to adenines to represent PCR conversion of the complementary DNA. Methylated cytosines can be identified when a read aligns to a converted genome with a mismatch. If the mismatched base is a cytosine aligning to a thymine or a guanine aligning to an adenine, and the position was originally a cytosine or guanine in the unconverted genome, then the mismatch represents a methylated cytosine.
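The construction of the converted references and the mismatch rule can be sketched in a few lines of Python. This follows the description above literally (complement, not reverse complement) and handles only reads already placed at a known offset on the C to T converted reference; read orientation, alignment itself and the function names are outside the text and are assumptions of this sketch.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_references(genome: str) -> tuple:
    """Return (unconverted, C-to-T converted, G-to-A converted complement)."""
    c_to_t = genome.replace("C", "T")
    complement = genome.translate(COMPLEMENT)
    comp_g_to_a = complement.replace("G", "A")
    return genome, c_to_t, comp_g_to_a


def call_methylated(read: str, offset: int, genome: str, c_to_t: str) -> list:
    """Positions where a read C aligns to a T that was a C before conversion."""
    calls = []
    for i, base in enumerate(read):
        position = offset + i
        if base == "C" and c_to_t[position] == "T" and genome[position] == "C":
            calls.append(position)
    return calls
```

Note that, built this way, the G to A converted complement is simply the complement of the C to T converted genome, which is what PCR synthesizes from a fully converted top strand.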

When mapping reads to the converted genomes it is important to keep in mind that reads with higher concentrations of methylated cytosines will map poorly. For example, the ELAND mapping program normally allows up to two mismatches during alignment, and a mismatch occurs whenever a methylated cytosine is mapped onto the converted genomes. However, if an area contains very high levels of methylation (greater than two methylated cytosines in a 36 bp read) it will not map to the converted genome but may map better to the unconverted genome if there are fewer than two unmethylated cytosines in the bisulfite-converted read.
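The effect of the mismatch limit can be checked on a toy sequence. In the Python sketch below, a top-strand read is simulated from a reference segment and a set of methylated positions and then compared with the converted and the unconverted segment; the two-mismatch limit mirrors the ELAND default mentioned above, while the sequence and function names are hypothetical.

```python
def bisulfite_read(seq: str, methylated: set) -> str:
    """Read obtained after conversion and PCR: unmethylated C appears as T."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def maps_to_converted(seq: str, methylated: set, limit: int = 2) -> bool:
    """True if the read stays within the mismatch limit on the C-to-T reference."""
    return mismatches(bisulfite_read(seq, methylated), seq.replace("C", "T")) <= limit


def maps_to_unconverted(seq: str, methylated: set, limit: int = 2) -> bool:
    """True if the read stays within the mismatch limit on the original reference."""
    return mismatches(bisulfite_read(seq, methylated), seq) <= limit
```

Against the converted reference the number of mismatches equals the number of methylated cytosines in the read, and against the unconverted reference it equals the number of unmethylated cytosines.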

7. Validation of results

Validation of results is crucial for high throughput applications and provides an estimate of the data’s accuracy. Quantitative and semi-quantitative PCR are straightforward methods for validation of methylation levels at specific loci. In either case, the level of a genomic sequence of interest and a control sequence are compared between control and MeDIP samples, with enrichment in the MeDIP sample indicating cytosine methylation. Southern analysis may also be used to look for the presence of a specific DNA fragment. We also have mapped 5mC in the related organisms N. crassa and N. discreta (Fig. 3b) and compared similar regions of their genomes to verify broad methylation patterns.

We use primarily semi-quantitative PCR with radioactive labeling for validation of MeDIP and ChIP-sequencing (Fig. 5). Real-time PCR is a convenient alternative if available. We use primers for a known euchromatic sequence without DNA methylation (e.g. part of the Neurospora histone H4 gene, hH4) as a negative control. DNA sequences known to be methylated serve as positive controls. Several primer sets for sequences of interest are designed. Typically we design the negative and positive control product to be ~400 bp and the test product to be ~200 bp. This allows co-amplification in one PCR reaction and facilitates separation and quantification after running PAGE gels and exposing phosphorimager cassettes or film to the dried gels.

Fig. 5. Validation of DNA methylation by semi-quantitative region-specific PCR. Primers were designed for two regions known to be methylated in Neurospora (8:F10 and 1d21) and for an unmethylated control region, the histone H4 gene (hH4). To quantify the amount of DNA methylation in these regions, the ratio of the band intensities between the MeDIP and control samples is calculated. In the wild type (wt) MeDIP both methylated regions showed enrichment when compared to the sonicated wt control. In contrast, a DNA methyltransferase mutant (dim-2) that lacks all DNA methylation known in N. crassa showed no enrichment in the MeDIP sample.


For semi-quantitative PCR, mix 12.65 µl H2O, 0.5 µl polymerase (2 U/µl), 0.25 µl dNTP mix (20 mM), 2.5 µl 10× polymerase buffer, 0.1 µl α-32P-labeled dCTP (20 mM), 0.5 µl template DNA, 2 µl control primers (5 mM each), 2 µl test primers (5 mM each) and amplify the DNA by PCR (90 s at 94 °C; 24 cycles of 30 s at 94 °C, 30 s at 55 °C and 30 s at 72 °C; and 5 min at 72 °C) [60]. We separate 10 µl of each PCR product by PAGE through a 5% gel, record the band intensities by phosphorimaging and calculate the relative intensity of the test band divided by the control band for each sample. The relative intensities from the MeDIP and control samples are compared to find enrichment in the MeDIP sample (Fig. 5).
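The quantification step amounts to a ratio of ratios, sketched here in Python. The function names are hypothetical, and expressing the comparison between MeDIP and control samples as a single fold enrichment is an assumption; the text only states that the relative intensities are compared.

```python
def relative_intensity(test_band: float, control_band: float) -> float:
    """Test band intensity divided by the co-amplified control band intensity."""
    return test_band / control_band


def medip_enrichment(medip_test: float, medip_control: float,
                     input_test: float, input_control: float) -> float:
    """Relative intensity in the MeDIP sample over that in the control sample."""
    return (relative_intensity(medip_test, medip_control)
            / relative_intensity(input_test, input_control))
```

Values well above 1 would indicate enrichment of the test region in the MeDIP sample, whereas a strain without DNA methylation such as dim-2 is expected to give no enrichment.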

8. Concluding remarks

High throughput sequencing has revolutionized the analysis of methylated DNA. In non-repeat regions, 5mC can now be mapped with exquisite precision by genome-wide bisulfite sequencing. The relatively short read lengths generated by most high throughput sequencing approaches continue to stifle attempts to provide the complete “methylome” of most higher eukaryotes, simply because much 5mC resides in regions that consist of repeated DNA. Nevertheless, both MeDIP-Seq and BS-Seq have now been used to map methylation patterns in the non-repetitive regions of fungal, plant and animal genomes. Genetic studies combined with high throughput sequencing are carried out both in wild type and mutant strains. In mutants the capacity to methylate DNA can either be grossly or only slightly altered, thus revealing differential methylation patterns that help to uncover the mechanisms of DNA methylation. These and future studies will provide valuable insights into how eukaryotic genomes are organized, how gene clusters or regulons are coordinately regulated by epigenetic mechanisms, and how DNA methylation affects the expression of specific genes.

Acknowledgments

Our laboratory is supported by a grant from the American Cancer Society (RSG-08-030-01-CCG, to M.F.). We are indebted to Eric Selker for the original Neurospora MeDIP protocol. We thank Zack Lewis and Eric Selker for sharing of unpublished data, materials and stimulating discussions. High throughput sequencing and initial data analyses were carried out in the OSU CGRB core lab by Mark Dasenko, Steve Drake and Chris Sullivan. We are grateful for gifts of strains and DNA oligomers from the Fungal Genetics Stock Center (FGSC, University of Kansas, Kansas City, MS) and the Neurospora Genome Program Project grant from the NIH (P01 GM068087), respectively.

References

[1] M. Ehrlich, M.A. Gama-Sosa, L.H. Carreira, L.G. Ljungdahl, K.C. Kuo, C.W. Gehrke, Nucleic Acids Res. 13 (1985) 1399–1412.

[2] D. Lodwick, H.N. Ross, J.E. Harris, J.W. Almond, W.D. Grant, J. Gen. Microbiol. 132 (1986) 3055–3059.

[3] M.G. Marinus, Annu. Rev. Genet. 21 (1987) 113–131.

[4] D. Ratel, J.L. Ravanat, F. Berger, D. Wion, Bioessays 28 (2006) 309–315.

[5] J.D. Lewis, R.R. Meehan, W.J. Henzel, I. Maurer-Fogy, P. Jeppesen, F. Klein, A. Bird, Cell 69 (1992) 905–914.

[6] J.H. Lee, D.G. Skalnik, J. Biol. Chem. 277 (2002) 42259–42267.

[7] P.H. Tate, A.P. Bird, Curr. Opin. Genet. Dev. 3 (1993) 226–231.

[8] J.H. Proffitt, J.R. Davie, D. Swinton, S. Hattman, Mol. Cell Biol. 4 (1984) 985–988.

[9] V.J. Simpson, T.E. Johnson, R.F. Hammen, Nucleic Acids Res. 14 (1986) 6711–6719.

[10] C.R. Wilkinson, R. Bartlett, P. Nurse, A.P. Bird, Nucleic Acids Res. 23 (1995) 203–210.

[11] H. Gowher, O. Leismann, A. Jeltsch, EMBO J. 19 (2000) 6918–6923.

[12] H. Gowher, K.C. Ehrlich, A. Jeltsch, FEMS Microbiol. Lett. 205 (2001) 151–155.

[13] E.U. Selker, N.A. Tountas, S.H. Cross, B.S. Margolin, J.G. Murphy, A.P. Bird, M. Freitag, Nature 422 (2003) 893–897.

[14] A. Miura, S. Yonebayashi, K. Watanabe, T. Toyama, H. Shimada, T. Kakutani, Nature 411 (2001) 212–214.

[15] R.E. Pruitt, E.M. Meyerowitz, J. Mol. Biol. 187 (1986) 169–183.

[16] S.J. Cokus, S. Feng, X. Zhang, Z. Chen, B. Merriman, C.D. Haudenschild, S. Pradhan, S.F. Nelson, M. Pellegrini, S.E. Jacobsen, Nature 452 (2008) 215–219.

[17] R. Lister, R.C. O’Malley, J. Tonti-Filippini, B.D. Gregory, C.C. Berry, A.H. Millar, J.R. Ecker, Cell 133 (2008) 523–536.

[18] M. Weber, I. Hellmann, M.B. Stadler, L. Ramos, S. Paabo, M. Rebhan, D. Schubeler, Nat. Genet. 39 (2007) 457–466.

[19] J. Lewis, A. Bird, FEBS Lett. 285 (1991) 155–159.

[20] R. Metivier, R. Gallais, C. Tiffoche, C. Le Peron, R.Z. Jurkowska, R.P. Carmouche, D. Ibberson, P. Barath, F. Demay, G. Reid, V. Benes, A. Jeltsch, F. Gannon, G. Salbert, Nature 452 (2008) 45–50.

[21] G. Gokul, B. Gautami, S. Malathi, A.P. Sowjanya, U.R. Poli, M. Jain, G. Ramakrishna, S. Khosla, Epigenetics 2 (2007) 80–85.

[22] Z. Zhang, P.C. Huettner, L. Nguyen, M. Bidder, M.C. Funk, J. Li, J.S. Rader, Oncogene 25 (2006) 5436–5445.

[23] Z. Wang, Y. Zhang, B. Ramsahoye, D. Bowen, S.H. Lim, Br. J. Cancer 91 (2004) 1597–1603.

[24] N. Allaman-Pillet, A. Djemai, C. Bonny, D.F. Schorderet, Gene Expr. 7 (1998) 61–73.

[25] M.J. Browne, R.H. Burdon, Nucleic Acids Res. 4 (1977) 1025–1037.

[26] S. Beck, V.K. Rakyan, Trends Genet. 24 (2008) 231–237.

[27] K.C. Kuo, R.A. McCune, C.W. Gehrke, R. Midgett, M. Ehrlich, Nucleic Acids Res. 8 (1980) 4763–4776.

[28] M. Nelson, E. Raschke, M. McClelland, Nucleic Acids Res. 21 (1993) 3139–3154.

[29] A.P. Bird, E.M. Southern, J. Mol. Biol. 118 (1978) 27–47.

[30] F. Antequera, M. Tamame, J.R. Villanueva, T. Santos, J. Biol. Chem. 259 (1984) 8033–8036.

[31] E.U. Selker, J.N. Stevens, Proc. Natl. Acad. Sci. USA 82 (1985) 8114–8118.

[32] M. Heiskanen, A.C. Syvanen, H. Siitari, S. Laine, A. Palotie, PCR Methods Appl. 4 (1994) 26–30.

[33] J. Singer-Sam, J.M. LeBon, R.L. Tanguay, A.D. Riggs, Nucleic Acids Res. 18 (1990) 687.

[34] M. Frommer, L.E. McDonald, D.S. Millar, C.M. Collis, F. Watt, G.W. Grigg, P.L. Molloy, C.L. Paul, Proc. Natl. Acad. Sci. USA 89 (1992) 1827–1831.

[35] S. Derks, M.H. Lentjes, D.M. Hellebrekers, A.P. de Bruine, J.G. Herman, M. van Engeland, Cell Oncol. 26 (2004) 291–299.

[36] M.F. Fraga, M. Esteller, Biotechniques 33 (2002) 636–649.

[37] H. Cedar, A. Solage, G. Glaser, A. Razin, Nucleic Acids Res. 6 (1979) 2125–2132.

[38] S.H. Cross, J.A. Charlton, X. Nan, A.P. Bird, Nat. Genet. 6 (1994) 236–244.

[39] M. Shiraishi, A. Sekiguchi, Y.H. Chuu, T. Sekiya, Biol. Chem. 380 (1999) 1127–1131.

[40] M.R. Estecio, P.S. Yan, A.E. Ibrahim, C.S. Tellez, L. Shen, T.H. Huang, J.P. Issa, Genome Res. 17 (2007) 1529–1536.

[41] E. Schilling, M. Rehli, Genomics 90 (2007) 314–323.

[42] P.S. Yan, C.M. Chen, H. Shi, F. Rahmatpanah, S.H. Wei, C.W. Caldwell, T.H. Huang, Cancer Res. 61 (2001) 8375–8380.

[43] S.E. Brown, M.F. Fraga, I.C. Weaver, M. Berdasco, M. Szyf, Epigenetics 2 (2007) 54–65.

[44] M. Weber, J.J. Davies, D. Wittig, E.J. Oakeley, M. Haase, W.L. Lam, D. Schubeler, Nat. Genet. 37 (2005) 853–862.

[45] A.S. Cheng, A.C. Culhane, M.W. Chan, C.R. Venkataramu, M. Ehrich, A. Nasir, B.A. Rodriguez, J. Liu, P.S. Yan, J. Quackenbush, K.P. Nephew, T.J. Yeatman, T.H. Huang, Cancer Res. 68 (2008) 1786–1796.

[46] X.Q. Tian, D.F. Sun, Y.J. Zhang, J.Y. Fang, Yi Chuan 30 (2008) 295–303.

[47] F.V. Jacinto, E. Ballestar, S. Ropero, M. Esteller, Cancer Res. 67 (2007) 11481–11486.

[48] C. Gebhard, L. Schwarzfischer, T.H. Pham, E. Schilling, M. Klug, R. Andreesen, M. Rehli, Cancer Res. 66 (2006) 6118–6128.

[49] K. So, G. Tamura, T. Honda, N. Homma, T. Waki, N. Togawa, S. Nishizuka, T. Motoyama, Cancer Sci. 97 (2006) 1155–1158.

[50] N. Omura, C.P. Li, A. Li, S.M. Hong, K. Walter, A. Jimeno, M. Hidalgo, M. Goggins, Cancer Biol. Ther. 7 (2008).

[51] M.O. Hoque, M.S. Kim, K.L. Ostrow, J. Liu, G.B. Wisman, H.L. Park, M.L. Poeta, C. Jeronimo, R. Henrique, A. Lendvai, E. Schuuring, S. Begum, E. Rosenbaum, M. Ongenaert, K. Yamashita, J. Califano, W. Westra, A.G. van der Zee, W. Van Criekinge, D. Sidransky, Cancer Res. 68 (2008) 2661–2670.

[52] D. Zhou, W. Qiao, Y. Wan, Z. Lu, J. Biochem. Biophys. Methods 66 (2006) 33–43.

[53] J. Penterman, D. Zilberman, J.H. Huh, T. Ballinger, S. Henikoff, R.L. Fischer, Proc. Natl. Acad. Sci. USA 104 (2007) 6752–6757.

[54] H. Hayashi, G. Nagae, S. Tsutsumi, K. Kaneshiro, T. Kozaki, A. Kaneda, H. Sugisaki, H. Aburatani, Hum. Genet. 120 (2007) 701–711.

[55] R.H. Davis, Oxford University Press, 2000, 333.

[56] M.R. Miller, T.S. Atwood, B.F. Eames, J.K. Eberhart, Y.L. Yan, J.H. Postlethwait, E.A. Johnson, Genome Biol. 8 (2007) R105.

[57] Z.A. Lewis, A.L. Shiver, N. Stiffler, M.R. Miller, E.A. Johnson, E.U. Selker, Genetics 177 (2007) 1163–1171.

[58] J.L. DeRisi, V.R. Iyer, P.O. Brown, Science 278 (1997) 680–686.

[59] J. Nakayama, A.J. Klar, S.I. Grewal, Cell 101 (2000) 307–317.

[60] H. Tamaru, X. Zhang, D. McMillen, P.B. Singh, J. Nakayama, S.I. Grewal, C.D. Allis, X. Cheng, E.U. Selker, Nat. Genet. 34 (2003) 75–79.

[61] B. Hendrich, A. Bird, Mol. Cell Biol. 18 (1998) 6538–6547.

[62] X. Nan, R.R. Meehan, A. Bird, Nucleic Acids Res. 21 (1993) 4886–4892.

[63] R.R. Meehan, J.D. Lewis, A.P. Bird, Nucleic Acids Res. 20 (1992) 5085–5092.

[64] T. Rauch, H. Li, X. Wu, G.P. Pfeifer, Cancer Res. 66 (2006) 7939–7947.

[65] M.F. Fraga, E. Ballestar, G. Montoya, P. Taysavang, P.A. Wade, M. Esteller, Nucleic Acids Res. 31 (2003) 1765–1774.


[66] S.J. Clark, A. Statham, C. Stirzaker, P.L. Molloy, M. Frommer, Nat. Protoc. 1 (2006) 2353–2364.

[67] P.M. Warnecke, C. Stirzaker, J. Song, C. Grunau, J.R. Melki, S.J. Clark, Methods 27 (2002) 101–107.

[68] A. Meissner, A. Gnirke, G.W. Bell, B. Ramsahoye, E.S. Lander, R. Jaenisch, Nucleic Acids Res. 33 (2005) 5868–5877.

[69] W.J. Kent, Genome Res. 12 (2002) 656–664.

[70] R. Li, Y. Li, K. Kristiansen, J. Wang, Bioinformatics 24 (2008) 713–714.

[71] K.D. Kasschau, N. Fahlgren, E.J. Chapman, C.M. Sullivan, J.S. Cumbie, S.A. Givan, J.C. Carrington, PLoS Biol. 5 (2007) e57.

[72] M.J. Donlin, Curr. Protoc. Bioinformatics Chapter 9 (2007) Unit 9.9.