MethylSig: a whole genome DNA methylation analysis pipeline

Bioinformatics Advance Access published May 16, 2014. Associate Editor: Dr. Michael Brudno. © The Author (2014). Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]

Yongseok Park 1,†,*, Maria E. Figueroa 2, Laura S. Rozek 3,4, and Maureen A. Sartor 1,5,*

1 Department of Computational Medicine and Bioinformatics, 2 Pathology Department, 3 Department of Environmental Health Sciences, 4 Department of Otolaryngology, 5 Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
† Current location: Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA

ABSTRACT

Motivation: DNA methylation plays critical roles in gene regulation and cellular specification without altering DNA sequences. The wide application of reduced representation bisulfite sequencing (RRBS) and whole genome bisulfite sequencing (bis-seq) opens the door to study DNA methylation at single CpG site resolution. One challenging question is how best to test for significant methylation differences between groups of biological samples in order to minimize false positive findings.

Results: We present a statistical analysis package, methylSig, to analyze genome-wide methylation differences between samples from different treatments or disease groups. MethylSig takes into account both read coverage and biological variation by utilizing a beta-binomial approach across biological samples for a CpG site or region, and identifies relevant differences in CpG methylation. It can also incorporate local information to improve group methylation level and/or variance estimation for experiments with small sample size. A permutation study based on data from enhanced RRBS samples shows that methylSig maintains a well-calibrated type-I error when the number of samples is three or more per group. Our simulations show that methylSig has higher sensitivity compared to several alternative methods. The use of methylSig is illustrated with a comparison of different subtypes of acute leukemia and normal bone marrow samples.

Availability: methylSig is available as an R package at http://sartorlab.ccmb.med.umich.edu/software.

Contact: [email protected]

* To whom correspondence should be addressed.

1 INTRODUCTION

DNA methylation (5-methylcytosine) is the most intensively studied and one of the best understood epigenetic marks in mammalian cells, having important roles in imprinting, genome stability, and regulation of gene expression without altering the DNA sequence itself (Kulis & Esteller, 2010; Yang et al., 2010). In mammalian cells, cytosine methylation occurs almost exclusively at CpG dinucleotides, with the exception of embryonic stem cells, where non-CpG methylation is a frequent occurrence (Lister et al., 2009). DNA methylation is essential for normal development and cell differentiation due to its roles in regulating gene expression. For example, unmethylated CpGs in promoter regions can allow binding of specific transcription factors while methylated CpGs in these regions can prevent binding (Lim & Maher, 2010). Furthermore, extensive cross-talk occurs between DNA methylation and chromatin-modifying histone marks (Vaissière et al., 2008; Shen & Laird, 2013; Izzo & Schneider, 2010; Kassner et al., 2013). Dysregulation of DNA methylation is a hallmark of cancer, with overall genomic demethylation and gene-specific hypermethylation, most notably in oncogenes and tumor suppressor genes (Sharma et al., 2010).

Treatment of DNA with sodium bisulfite deaminates unmethylated cytosines to uracil while methylated cytosines are resistant to this conversion, thus allowing for sequence-specific discrimination between methylated and unmethylated CpG sites (Clark et al., 2006). Sodium bisulfite pre-treatment of DNA coupled with next-generation sequencing has made it possible to study DNA methylation quantitatively and genome-wide at single cytosine site resolution (Lister & Ecker, 2009; Gu et al., 2010; Laird, 2010). The high cost of whole-genome bisulfite sequencing (bis-seq) and the uneven distribution of CpG sites in the genome motivated the development of modified approaches such as reduced representation bisulfite sequencing (RRBS) (Meissner et al., 2005; Jeddeloh et al., 2008; Gu et al., 2011) and enhanced RRBS (ERRBS) (Akalin et al., 2012a). These methods have the advantage of requiring fewer sequencing reads by enriching for CpG-dense regions of the genome. Bis-seq, RRBS, and ERRBS facilitate the study of DNA methylation patterns across the genome in multiple samples and between sample groups.

Recently, ERRBS was used to identify and describe distinct DNA methylation patterns associated with specific forms of acute myeloid leukemias (AML) (Akalin et al., 2012a). AML is a highly heterogeneous disease both from the clinical and molecular standpoints, with many distinct molecular subtypes defined by genetic abnormalities; several of these target key epigenetic regulators. An estimated 20%-25% of all AMLs are associated with heterozygous somatic mutations of isocitrate dehydrogenase 1 or 2 (IDH1 or 2), or ten-eleven translocation 2 (TET2) (Patel et al., 2011). Any one of these mutations results in an impairment of DNA demethylation pathways and leads to the establishment of a DNA hypermethylation phenotype (Figueroa et al., 2010a). A separate class of AMLs, constituting approximately 15% of all AML cases, is identified by the presence of the t(8;21) translocation giving rise to the AML1/ETO fusion oncoprotein (Petrie & Zelent, 2007).

Several methods have been published for analyzing genome-wide methylation levels. BSmooth (Hansen et al., 2012), which is targeted to analyze bis-seq data, uses a smoothing approach across the genome within each sample to increase the accuracy of the estimated methylation level for a single CpG site. BSmooth identifies differentially methylated regions (DMRs) by combining top ranked differentially methylated cytosines (DMCs) found using a t-statistic approach with either a quantile or direct t-statistic cutoff. MethylKit (Akalin et al., 2012b) offers useful annotation features and provides a statistical test by pooling sequencing reads among the individuals in each group. When multiple samples are present, logistic regression with a binary predictor is used, which can be formulated as a binomial-based test. However, methylation levels often vary significantly across individuals, as observed in cancer samples (Hansen et al., 2011). Pooling of reads among individuals may result in inflated type-I error rates when testing for group differences. Non-parametric tests such as the Wilcoxon rank-sum test have also been used to test for DMCs (Nordlund et al., 2013; Wang et al., 2013); however, these tests suffer from a lack of statistical power for experiments with small sample sizes.

To overcome current limitations, we developed methylSig, a genome-wide DNA methylation analysis pipeline for use with bis-seq, RRBS, or ERRBS data. MethylSig utilizes a beta-binomial model across the samples in each defined group for each CpG site or region to identify either DMCs or DMRs. It can also incorporate local information across a chromosome to improve estimates of variances and/or methylation levels. MethylSig offers annotation functions to map DMCs/DMRs to gene structures and a unique data visualization approach to view several aspects of the data simultaneously across the genome, providing a platform from which to process, analyze and visualize genome-wide methylation data.

2 METHODS

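2.1 Beta-Binomial Approach

If $X$ follows a binomial distribution with number of trials $n$ and probability of "success" $p$, the probability mass function of $X$ is

$$P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}, \quad x = 0, 1, \ldots, n.$$

In the case that $p$ varies between sets of trials and $p \sim \mathrm{Beta}(\alpha, \beta)$, we say $X$ has a beta-binomial distribution with probability mass function

$$P(X = x) = \binom{n}{x} \frac{\Gamma(\alpha + x)\,\Gamma(\beta + n - x)\,\Gamma(\alpha + \beta)}{\Gamma(n + \alpha + \beta)\,\Gamma(\alpha)\,\Gamma(\beta)} = \binom{n}{x} \frac{\Gamma(\mu\theta + x)\,\Gamma((1 - \mu)\theta + n - x)\,\Gamma(\theta)}{\Gamma(n + \theta)\,\Gamma(\mu\theta)\,\Gamma((1 - \mu)\theta)},$$

where $\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\,dt$. The mean and variance of $X$ are $n\mu$ and $n\mu(1 - \mu)\{1 + (n - 1)\phi\}$, where the mean parameter is $\mu = \alpha/(\alpha + \beta)$. Let $\theta = \alpha + \beta$; then the over-dispersion parameter is $\phi = 1/(1 + \theta)$. To simplify the estimation process, we use $\theta$ instead of $\phi$.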
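The parametrization in terms of $\mu$ and $\theta$ can be checked numerically. The following is a minimal Python sketch, not code from the methylSig package; the function name `betabinom_logpmf` and the example values of `n`, `mu`, and `theta` are illustrative assumptions. It evaluates the probability mass function on the log scale and compares the empirical mean and variance with $n\mu$ and $n\mu(1-\mu)\{1+(n-1)\phi\}$.

```python
import math

def betabinom_logpmf(x, n, mu, theta):
    """Log P(X = x) for a beta-binomial with mean parameter mu and theta = alpha + beta."""
    a = mu * theta            # alpha
    b = (1.0 - mu) * theta    # beta
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return (log_choose
            + math.lgamma(a + x) + math.lgamma(b + n - x) + math.lgamma(theta)
            - math.lgamma(n + theta) - math.lgamma(a) - math.lgamma(b))

# Illustrative values (assumptions, not taken from the paper).
n, mu, theta = 20, 0.3, 5.0
pmf = [math.exp(betabinom_logpmf(x, n, mu, theta)) for x in range(n + 1)]

total = sum(pmf)
mean = sum(x * p for x, p in enumerate(pmf))
var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))

phi = 1.0 / (1.0 + theta)                               # over-dispersion parameter
expected_var = n * mu * (1 - mu) * (1 + (n - 1) * phi)  # variance formula from the text
```

Working on the log scale with `math.lgamma` avoids overflow of the gamma function at realistic sequencing coverages.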

Let $X_{igj} \sim \text{Beta-Bin}(\mu_{ig}\theta_i, (1 - \mu_{ig})\theta_i, n_{igj})$, $g = 1, 2$, $j = 1, 2, \ldots, J_g$, where $X_{igj}$ and $n_{igj}$ are the number of methylated cytosine reads and the coverage at site $i$ of the $j$th sample in group $g$, respectively. Our goal is to test the hypothesis $H_{i0}: \mu_{i1} = \mu_{i2}$ against $H_{i1}: \mu_{i1} \neq \mu_{i2}$.

There are several methods to estimate the parameters $\mu_{ig}$ and $\theta_i$, such as the method of moments and the maximum likelihood estimator (MLE) (Griffiths, 1973), and an alternative method discussed by Tripathi et al. (1994). Although simple, the method-of-moments estimator does not maximize the likelihood; thus, it is not useful when applying the likelihood ratio test. Because there is no closed form for the MLE, obtaining estimates of $\mu_{ig}$ and $\theta_i$ simultaneously would be computationally demanding. We propose an approximation method.

The parameter $\theta_i$ is a simple function of the over-dispersion parameter. Since our goal is to compare means from different groups, $\theta_i$ is a nuisance parameter. Based on the Wilks phenomenon results for the generalized likelihood ratio test (Fan et al., 2001), we can first estimate the nuisance parameter $\theta_i$ and then use the estimated $\theta_i$ to conduct the likelihood ratio test.

Let $\ell(\mu_{i1}, \mu_{i2}, \theta_i)$ be the log joint likelihood function. We estimate $\theta_i$ by setting $d\ell(\cdot)/d\theta_i = 0$ (see supplement for formulas) while using the observed group mean $\sum_{j=1}^{J_g} x_{igj} / \sum_{j=1}^{J_g} n_{igj}$ as the estimate for $\mu_{ig}$. Here we restrict $\theta_i \geq 0$.

Given $\theta_i = \hat{\theta}_i$, the MLE for $\mu_{ig}$, $g = 1, 2$, is the solution of the equation

$$\frac{d\ell}{d\mu_{ig}}(\mu_{i1}, \mu_{i2}, \theta_i = \hat{\theta}_i) = 0, \quad g = 1, 2.$$

Under the null hypothesis $H_{i0}: \mu_{i1} = \mu_{i2} = \mu_i$, the MLE for $\mu_i$ is the solution of the equation

$$\frac{d\ell}{d\mu_i}(\mu_{i1} = \mu_i, \mu_{i2} = \mu_i, \theta_i = \hat{\theta}_i) = 0.$$

The likelihood ratio test statistic then becomes

$$D = 2\{\ell(\mu_{ig} = \hat{\mu}_{ig}, \theta_i = \hat{\theta}_i) - \ell(\mu_{ig} = \hat{\mu}_i, \theta_i = \hat{\theta}_i)\}.$$

The distribution of $D$ asymptotically approaches a $\chi_1^2$ distribution as sample size increases. However, the distribution of $D$ has significantly heavier tails than $\chi_1^2$ for small sample sizes, since it is based on the estimated nuisance parameter $\hat{\theta}$. Motivated by the two-sample t-test, which can be considered a likelihood ratio test conditional on the estimated variance, we propose the approximation $D \sim t_p^2$, where $t_p$ is the Student t distribution with $p$ degrees of freedom. Our permutation results in section 3.2 show empirically that this is a close approximation, with the p-values from our proposed method closely matching the expected p-values under the null. In contrast, the results are noticeably anti-conservative using the $\chi_1^2$ distribution (Figure S1).
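The two-stage procedure can be illustrated with a short Python sketch under several stated assumptions; it is not the methylSig implementation. A bounded golden-section search stands in for solving the score equations, $\theta$ is searched on a log scale over an arbitrary range, the degrees of freedom are taken as $J_1 + J_2 - 2$ for the case without local information, and the $t_p^2$ tail probability is obtained by Simpson integration of the Student t density. All function names and the example read counts are hypothetical.

```python
import math

def bb_logpmf(x, n, mu, theta):
    a, b = mu * theta, (1.0 - mu) * theta
    return (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
            + math.lgamma(a + x) + math.lgamma(b + n - x) + math.lgamma(theta)
            - math.lgamma(n + theta) - math.lgamma(a) - math.lgamma(b))

def loglik(group, mu, theta):
    """Log-likelihood of one group's (methylated, coverage) pairs at a site."""
    return sum(bb_logpmf(x, n, mu, theta) for x, n in group)

def argmax_1d(f, lo, hi, iters=100):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    for _ in range(iters):
        if f(c) > f(d):
            hi, d = d, c
            c = hi - g * (hi - lo)
        else:
            lo, c = c, d
            d = lo + g * (hi - lo)
    return (lo + hi) / 2.0

def lrt_statistic(g1, g2, eps=1e-6):
    clip = lambda m: min(max(m, eps), 1.0 - eps)
    # Stage 1: fix each group mean at the pooled proportion and estimate theta.
    m1 = clip(sum(x for x, _ in g1) / sum(n for _, n in g1))
    m2 = clip(sum(x for x, _ in g2) / sum(n for _, n in g2))
    log_theta = argmax_1d(
        lambda t: loglik(g1, m1, math.exp(t)) + loglik(g2, m2, math.exp(t)),
        math.log(1e-3), math.log(1e4))   # search range is an assumption
    theta = math.exp(log_theta)
    # Stage 2: given theta_hat, maximize over mu under H1 and under H0.
    mu1 = argmax_1d(lambda m: loglik(g1, m, theta), eps, 1.0 - eps)
    mu2 = argmax_1d(lambda m: loglik(g2, m, theta), eps, 1.0 - eps)
    mu0 = argmax_1d(lambda m: loglik(g1, m, theta) + loglik(g2, m, theta), eps, 1.0 - eps)
    ll1 = loglik(g1, mu1, theta) + loglik(g2, mu2, theta)
    ll0 = loglik(g1, mu0, theta) + loglik(g2, mu0, theta)
    return max(0.0, 2.0 * (ll1 - ll0)), theta

def chi2_1_pvalue(d):
    """P(chi-square with 1 df > d)."""
    return math.erfc(math.sqrt(d / 2.0))

def t_squared_pvalue(d, df, steps=2000):
    """P(t_df^2 > d) via Simpson integration of the t density on [0, sqrt(d)]."""
    c = math.exp(math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)) / math.sqrt(df * math.pi)
    pdf = lambda t: c * (1.0 + t * t / df) ** (-(df + 1) / 2.0)
    b = math.sqrt(d)
    h = b / steps
    s = pdf(0.0) + pdf(b) + sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, steps))
    return max(0.0, 1.0 - 2.0 * s * h / 3.0)

# Hypothetical (methylated reads, coverage) pairs for one CpG site.
group1 = [(5, 20), (14, 25), (6, 18), (15, 30)]
group2 = [(17, 20), (15, 26), (16, 19), (19, 28)]
D, theta_hat = lrt_statistic(group1, group2)
df = len(group1) + len(group2) - 2       # assumed degrees of freedom
p_t = t_squared_pvalue(D, df)
p_chi2 = chi2_1_pvalue(D)
```

Because the t distribution has heavier tails than the normal, `p_t` is never smaller than `p_chi2` for the same `D`, which is the direction of the correction described above. The numerical integration loses relative accuracy for extremely small p-values, so a dedicated statistics library would be preferable in practice.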

2.2 Simulations

To assess the ability of methylSig to identify true positive DMCs, we conducted a set of simulations, and compared three versions of methylSig with four other methods: the binomial-based test of MethylKit, BSmooth, a standard t-test, and the Wilcoxon rank-sum test. Using ERRBS data from AML samples (Table S1), we generated data based on the properties of data from chromosome 1 of the five IDH1/2 mutated (cancer) and four normal bone marrow (NBM) samples. This resulted in 65,284 CpG sites for which at least three samples in each group satisfied the required coverage level. When generating data, we preserved the actual CpG locations and coverage levels for each sample. To produce data similar to a real situation, we first estimated the site-specific dispersion parameters using the beta-binomial approach and the group mean methylation levels for the normal group using BSmooth. We then used a beta-binomial distribution with the estimated dispersion parameters and group methylation levels to generate data for the normal samples and for the non-DMCs of the cancer samples. For cancer samples at DMC sites, we randomly chose methylation differences uniformly between 15-30% (weaker signal) or 25-40% (stronger signal). We also simulated three levels of clustering of DMCs into regions, resulting in six scenarios. We performed 100 simulations for each scenario. We used the percent estimates of methylation to perform the standard t-test and the Wilcoxon rank-sum test. For BSmooth, we selected the top ranked CpG sites based on the t-statistics it provides after smoothing.
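The data-generating step for a single CpG site can be sketched as follows. This is an illustrative Python sketch rather than the simulation code used in the study: the coverages, the normal-group methylation level, and the value of $\theta$ are made-up inputs, and the shift is applied in the hypermethylation direction only as an assumption. A beta-binomial draw is produced by first drawing a sample-specific methylation probability from a beta distribution and then drawing reads from a binomial distribution.

```python
import random

def simulate_counts(coverages, mu, theta, rng):
    """Methylated read counts for one site: p ~ Beta(mu*theta, (1-mu)*theta), X ~ Binomial(n, p)."""
    counts = []
    for n in coverages:
        p = rng.betavariate(mu * theta, (1.0 - mu) * theta)
        counts.append(sum(rng.random() < p for _ in range(n)))
    return counts

rng = random.Random(1)
normal_cov = [25, 40, 18, 60]        # hypothetical coverages, four normal samples
cancer_cov = [30, 22, 55, 35, 28]    # hypothetical coverages, five cancer samples
mu_normal, theta = 0.40, 20.0        # hypothetical site-level estimates

# "Weaker signal" scenario from the text: difference drawn uniformly from 15-30%.
delta = rng.uniform(0.15, 0.30)
normal = simulate_counts(normal_cov, mu_normal, theta, rng)
cancer = simulate_counts(cancer_cov, mu_normal + delta, theta, rng)
```

Keeping the observed coverages fixed and simulating only the methylated counts mirrors the statement that actual CpG locations and coverage levels were preserved.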

2.3 Options for combining local information

To incorporate local information when estimating group mean methylation levels and the dispersion parameters, we also provide a local MLE using triangular kernel weights. This method is particularly useful for small sample sizes and when the group methylation levels or dispersion parameters are locally similar or highly correlated, as we observed they are for up to 200-300 bp windows (Figure S2).

For CpG site $i$, let $S = \{k : -R \leq k - i \leq R\}$, where $R$ is a predefined range in base pairs to combine local information. For CpG sites $k$ in $S$, the weight function is $H((k - i)/(R + 1))$. The default function is $H(u) = (1 - u^2)^3$, which can be redefined.

The estimate $\hat{\theta}_i$ is the solution of the equation

$$\sum_{k \in S} H(k - i)\,\frac{d\ell}{d\theta_i}(\mu_{k1}, \mu_{k2}, \theta_i) = 0.$$

The degrees of freedom are $p = \sum_{k \in S} H(k - i)(J_{k1} + J_{k2} - 2)$ when using the squared t-distribution approximation for the likelihood ratio statistic. Note that only sites with data for at least two samples in each group are used. Similarly, $\hat{\mu}_{ig}$ and $\hat{\mu}_i$ can be obtained using

$$\sum_{k \in S} H(k - i)\,\frac{d\ell}{d\mu_{ig}}(\mu_{i1}, \mu_{i2}, \theta_i = \hat{\theta}_i) = 0,$$

and

$$\sum_{k \in S} H(k - i)\,\frac{d\ell}{d\mu_i}(\mu_{i1} = \mu_i, \mu_{i2} = \mu_i, \theta_i = \hat{\theta}_i) = 0.$$
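The kernel weights and the weighted degrees of freedom can be computed directly from the definitions above. The sketch below is illustrative Python with hypothetical function names, positions, and per-site sample counts; it assumes that $H(k - i)$ in the estimating equations stands for the scaled weight $H((k - i)/(R + 1))$.

```python
def kernel(u):
    """Default weight function from the text, H(u) = (1 - u^2)^3 for |u| <= 1."""
    return (1.0 - u * u) ** 3 if abs(u) <= 1.0 else 0.0

def local_weights(positions, i_pos, R):
    """Weights H((k - i)/(R + 1)) for sites k within R base pairs of site i."""
    return {k: kernel((k - i_pos) / (R + 1)) for k in positions if abs(k - i_pos) <= R}

def weighted_df(weights, n_samples):
    """p = sum_k H_k (J_k1 + J_k2 - 2), skipping sites with < 2 samples in either group."""
    return sum(w * (n_samples[k][0] + n_samples[k][1] - 2)
               for k, w in weights.items()
               if min(n_samples[k]) >= 2)

# Hypothetical CpG positions (bp) and numbers of covered samples (J_k1, J_k2).
positions = [1000, 1050, 1180, 1200, 1420]
n_samples = {1000: (4, 5), 1050: (4, 4), 1180: (3, 5), 1200: (1, 5), 1420: (4, 5)}

w = local_weights(positions, 1050, R=200)
p = weighted_df(w, n_samples)
```

The focal site receives weight 1, weights decay smoothly with distance, and the site at position 1420 falls outside the window. The site at 1200 lies inside the window but is excluded from the degrees of freedom because one group has a single covered sample.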

2.4 Identifying enriched or differentially methylated transcription factors

For each transcription factor (TF) in a given database such as ENCODE uniform TF (http://genome.ucsc.edu/ENCODE/), our goal is to identify which TFs are enriched, that is, which TFs' binding sites have a significantly larger proportion of DMCs than the overall proportion of DMCs. Let $N_{\text{total}}$ be the total number of compared CpG sites that can be annotated into TFs and, among these, let $N_{\text{dmc}}$ be the total number of identified DMCs. For TF $i$, let $n_i^T$ be the number of CpG sites and $n_i^D$ be the number of DMCs within this TF. We test $H_{i0}: p_i^T = p_i^D$ vs. $H_{i1}: p_i^T \neq p_i^D$, where $p_i^T = n_i^T / N_{\text{total}}$ and $p_i^D = n_i^D / N_{\text{dmc}}$. Since $N_{\text{total}} \gg N_{\text{dmc}}$, we use a likelihood ratio test based on the binomial distribution for $p_i^D$ and treat $p_i^T$ as the known true proportion.
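A binomial likelihood ratio test of this form can be sketched in a few lines of Python. The function name and the example counts are hypothetical, and the use of a $\chi_1^2$ reference distribution for the statistic is an assumption based on standard large-sample theory; the text does not state which reference distribution is used.

```python
import math

def tf_enrichment_lrt(n_tf_sites, n_tf_dmcs, n_total, n_dmc):
    """LRT of H0: p_D = p_T, with p_T = n_tf_sites / n_total treated as known."""
    p0 = n_tf_sites / n_total      # p^T, known proportion under H0
    p_hat = n_tf_dmcs / n_dmc      # p^D, binomial MLE

    def ll(p):
        # Binomial log-likelihood up to a constant; terms with zero count are skipped.
        out = 0.0
        if n_tf_dmcs > 0:
            out += n_tf_dmcs * math.log(p)
        if n_dmc - n_tf_dmcs > 0:
            out += (n_dmc - n_tf_dmcs) * math.log(1.0 - p)
        return out

    stat = max(0.0, 2.0 * (ll(p_hat) - ll(p0)))
    pval = math.erfc(math.sqrt(stat / 2.0))   # upper tail of chi-square with 1 df
    return p_hat, p0, stat, pval

# Hypothetical counts: 1,500 of 100,000 annotated CpGs lie in this TF's sites,
# and 60 of the 2,000 DMCs fall within them.
p_hat, p0, stat, pval = tf_enrichment_lrt(1500, 60, 100000, 2000)
```

In this example the DMC proportion within the TF (3%) is twice the background proportion (1.5%), and the test is two-sided, so the direction of the difference has to be read from `p_hat` and `p0`.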

Alternatively, we can ask which TFs have a significant level of hypermethylation or hypomethylation across their binding sites, which could indicate whether the TF is having a weaker or stronger regulatory effect, respectively. To address this, for each sample we first tile all reads from regions to which a particular TF is predicted to bind. We then apply our beta-binomial model to the data for each TF to identify TFs with hyper- or hypo-methylated binding sites. This performs a self-contained hypothesis test, in that the level of differential methylation is compared to the null hypothesis of no differential methylation, as opposed to the level of differential methylation outside of the TF binding sites.
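The tiling step amounts to summing methylated reads and coverage over all CpG sites assigned to a TF, separately for each sample. The following Python sketch shows one way to do this; the data layout, function name, and TF labels are assumptions made for illustration and do not reflect the package's internal data structures.

```python
def tile_by_tf(site_counts, tf_sites):
    """Sum (methylated, coverage) counts over the CpG sites assigned to each TF, per sample.

    site_counts: {sample: {position: (methylated_reads, coverage)}}
    tf_sites:    {tf_name: set of CpG positions inside predicted binding sites}
    """
    tiled = {}
    for tf, positions in tf_sites.items():
        tiled[tf] = {}
        for sample, counts in site_counts.items():
            x = sum(counts[p][0] for p in positions if p in counts)
            n = sum(counts[p][1] for p in positions if p in counts)
            if n > 0:                      # keep only samples with coverage for this TF
                tiled[tf][sample] = (x, n)
    return tiled

# Hypothetical per-site counts for two samples and three TFs.
site_counts = {
    "s1": {100: (5, 10), 150: (8, 12), 900: (1, 20)},
    "s2": {100: (2, 10), 900: (3, 15)},
}
tf_sites = {"TF_A": {100, 150}, "TF_B": {900}, "TF_C": {5000}}
tiled = tile_by_tf(site_counts, tf_sites)
```

Each tiled pair can then be treated like a single site's methylated count and coverage when fitting the beta-binomial model per TF.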

3 RESULTS

3.1 Overview of MethylSig

The methylSig workflow proceeds through several steps, from reading in data to annotating and visualizing results (Figure 1). It accepts methylation data defining the number of Cs and Ts at each CpG site for multiple samples that are assigned to one of two or more groups (e.g., myCpG methylation call files from Bismark software, Krueger & Andrews 2011). If coverage is low or the aim is to examine methylation trends, the data may be tiled within regions of specified width or local information may be incorporated to increase power. MethylSig utilizes a beta-binomial model to compare methylation levels at each CpG site or tiled region between two groups of samples, as defined by the user. Parameter estimation includes two stages. In the first stage, the dispersion parameter is estimated at each CpG site or region; this parameter accounts for the biological variation among samples within the same group. A weighted likelihood can be used to incorporate information from nearby CpG sites or regions. In the second stage, the group methylation level at each CpG site or region is calculated using the estimated dispersion parameter; again information can be incorporated from nearby CpG sites. A statistic based on the likelihood ratio test is used to evaluate the significance level of the difference in methylation. P-values and q-values are calculated based on either a $t_p^2$ (default) or a $\chi_1^2$ approximation (recommended for large sample sizes). Finally, the methylSig package provides data visualization and annotation functions as well as functions to identify enriched TFs.

[Figure 1: workflow diagram. Recoverable node labels: Read data; Tile data?; Local information?; Combine local data; Common dispersion; Group mean % methylation; Likelihood ratio test; $t_p^2$ or $\chi_1^2$ approximation; p-values; False discovery rate; q-values; Annotations; Enrichment test (TFs); Data visualization.]

Fig. 1: Workflow of methylSig package.
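The step from p-values to q-values can be illustrated with the Benjamini-Hochberg adjustment. This Python sketch is an assumption about the kind of false discovery rate procedure involved, since the text does not name the specific method; the function name and example p-values are illustrative.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):           # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q

pvals = [0.001, 0.02, 0.03, 0.5]           # illustrative p-values
qvals = bh_qvalues(pvals)
```

The running minimum enforces monotonicity, so a site with a smaller p-value never receives a larger q-value than a site with a larger p-value.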

3.2 Evaluation of type-I error using permutations

To evaluate the type-I error rate of methylSig under the null hypothesis of no signal, and compare it with that of the binomial model, we permuted 21 AML samples (Table S1) and tested for DMCs. Sample libraries were prepared using the ERRBS method (Akalin et al., 2012a), and CpG sites with ≥10X but ≤500X coverage were included. At each CpG site, there were ≤21 pairs of percent methylation level and coverage, since some sites were not covered in some samples; these pairs were permuted and randomly assigned to samples in two groups (11 "treatment" and 10 "control" samples) to generate groups lacking meaningful methylation differences.

The binomial and beta-binomial tests were applied to calculate p-values, which were compared to the expected p-values under the null hypothesis of no differential methylation. We analyzed CpG sites covered by at least two samples in both groups. Results are displayed in QQ-plots of the -log10(p-values) (Figure 2). For sites covered by at least six samples, methylSig closely follows the expected p-value distribution, resulting in a well-calibrated type-I error rate; however, the p-values from the binomial model are anti-conservative (Figure S3), with p-values as low as $1.25 \times 10^{-177}$ with four covered samples and $2.7 \times 10^{-285}$ with 21 samples, and 41% of CpG sites satisfying FDR < 0.05. In contrast, the minimum FDR for methylSig was 0.066 among the 2.46 million CpG sites tested.
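One way to realize the per-site permutation is to shuffle the group labels across the 21 sample slots and keep only the covered samples in each group. The Python sketch below follows that reading; the exact permutation scheme used in the study may differ, and the function name and generated example data are hypothetical.

```python
import random

def permute_labels(site_pairs, n_treatment, rng):
    """Randomly assign one site's samples to 'treatment' and 'control' groups.

    site_pairs: list over samples of (percent_methylation, coverage),
                or None where the site is not covered in that sample.
    """
    labels = ["treatment"] * n_treatment + ["control"] * (len(site_pairs) - n_treatment)
    rng.shuffle(labels)
    groups = {"treatment": [], "control": []}
    for label, pair in zip(labels, site_pairs):
        if pair is not None:
            groups[label].append(pair)
    return groups

rng = random.Random(7)
# Hypothetical site: 21 samples, coverage within the 10X-500X filter, some missing.
site_pairs = [(rng.uniform(0, 100), rng.randint(10, 500)) if rng.random() < 0.8 else None
              for _ in range(21)]
groups = permute_labels(site_pairs, 11, rng)

# Sites are analyzed only if covered by at least two samples in both groups.
testable = min(len(groups["treatment"]), len(groups["control"])) >= 2
```

Because group membership is assigned at random, any apparent group difference at a site arises by chance, which is what makes the resulting p-values usable as a null reference.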

3.3 Simulation

To assess methylSig's ability to identify true positive DMCs, we conducted simulations comparing methylSig with four other methods: a standard t-test, the Wilcoxon rank-sum test, the binomial test, and BSmooth. Data were generated to mimic the observed properties of ERRBS data, including coverage levels and variation in methylation levels among samples (see Methods for details).

Simulations were performed varying two main parameters, the


0 1 2 3 4 5

02

46

●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

0 1 2 3 4 5

01

23

45

●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

0 1 2 3 4 5

02

46

●●●

●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

0 1 2 3 4 5

01

23

45

6

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

# samples: 5 # samples: 6

# samples: 7 # samples: 8

# samples: 9 # samples: 10 # samples: 11 # samples: 12

# samples: 13 # samples: 14 # samples: 15 # samples: 16

# samples: 18# samples: 17

# samples: 20# samples: 19

# samples: 4

# samples: 21

●●●● ● ●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

0 1 2 3 4

02

46

8

− log10(E[pvalue])

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

−log

10(p

valu

e)

●●

●●●●

●●

●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●●

●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●

0 1 2 3 4 5

02

46

8

− log10(E[pvalue])

−log

10(p

valu

e)

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

Binomial Model

Beta-binomial Model

Expected

Binomial Model

Beta-binomial Model

Expected

Fig. 2: QQ-plots of observed versus expected log10(p-values) for thebinomial and beta-binomial models show that methylSig has a well-cali-brated type-I error rate when each group has ≥3 samples (#samples = 6).Results from the binomial model consistently show an inflated type-I errorrate (gray; shown for four and 21 samples). Data from 21 samples werepermuted among groups for each CpG site.

range of differences in percent methylation and the level of DMC clustering into DMRs. The difference in percent methylation for DMCs or DMRs in a simulation was randomly chosen to be either between 15-30% (to simulate weaker signals) or between 25-40% (stronger signals). We simulated three levels of DMC clustering. First, we simulated DMCs independently, randomly choosing 5% of the CpG sites to be DMCs. This simulates the situation in which DMCs are uncorrelated with neighboring sites. Next, DMCs were simulated to be clustered into regions (DMRs) by dividing the chromosome into R regions and randomly selecting 5% of the regions to be DMRs. At the lower level of clustering, DMCs were assigned to 200 different regions (R = 4,000) with ∼16 CpG sites in each DMR. At the higher level, DMCs were assigned to 50 different regions (R = 1,000) with ∼65 CpG sites in each DMR. In both cases, the methylation difference was set to be equal for all CpG sites within a DMR.
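The selection of DMCs described above can be sketched roughly as follows. This is a minimal illustration only, not the simulation code used for the paper: the function name `choose_dmcs`, the splitting of sites into blocks of consecutive sites (rather than by genomic coordinate), and the uniform draw of one difference per DMC or DMR are all assumptions made here for concreteness.

```python
import random

def choose_dmcs(n_sites, n_regions=None, frac=0.05,
                diff_range=(0.15, 0.30), seed=0):
    """Return {site_index: methylation_difference} for the simulated DMCs.

    n_regions=None: independent DMCs, `frac` of all sites picked at random.
    n_regions=R:    sites are split into R blocks of consecutive sites,
                    `frac` of the blocks become DMRs, and every site in a
                    DMR receives the same methylation difference.
    """
    rng = random.Random(seed)
    lo, hi = diff_range
    if n_regions is None:
        picked = rng.sample(range(n_sites), round(frac * n_sites))
        # Assumption: each independent DMC draws its own difference.
        return {i: rng.uniform(lo, hi) for i in picked}
    # Block r covers site indices bounds[r] .. bounds[r + 1] - 1.
    bounds = [r * n_sites // n_regions for r in range(n_regions + 1)]
    dmcs = {}
    for r in rng.sample(range(n_regions), round(frac * n_regions)):
        diff = rng.uniform(lo, hi)  # one shared difference per DMR
        for i in range(bounds[r], bounds[r + 1]):
            dmcs[i] = diff
    return dmcs

low = choose_dmcs(65284, n_regions=4000)   # 200 DMRs of 16-17 sites each
high = choose_dmcs(65284, n_regions=1000)  # 50 DMRs of 65-66 sites each
```

With 65,284 sites, R = 4,000 gives blocks of 16 or 17 sites and R = 1,000 gives blocks of 65 or 66 sites, which matches the ∼16 and ∼65 CpG sites per DMR quoted in the text.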

We compared seven methods: BSmooth, a standard t-test, the binomial-based test, the Wilcoxon rank test, our site-specific beta-binomial test, and our beta-binomial test incorporating local information (dispersion and methylation levels separately) (Figure 3).

[Figure 3: six panels (a)-(f); x-axis: # top statistics (0 to 3000); y-axis: % true DMCs (0 to 100); curves labeled A to G, with the legend listing, in order: methylSig, Sig−LocDisp, Sig−LocMeth, Binomial, BSmooth, T−Test, Wilcoxon.]

Fig. 3: Simulation study comparing methylSig to alternative methods. Each plot shows the percent of true DMCs among increasing numbers of top ranked DMCs. The x-axis value at which a line falls below 0.95 provides the number of sites resulting in a 5% false positive rate. Left column: 15-30% methylation differences. Right column: 25-40% methylation differences. (a and b) Uncorrelated DMCs. (c and d) DMCs with a low level of clustering (∼16 CpG sites per DMR). (e and f) DMCs with a high level of clustering (∼65 CpG sites per DMR). A total of 65,284 CpG sites were used, and 100 simulations were conducted for each of the six cases.

Because not all of the methods being compared have an appropriate type-I error rate, and researchers are often interested in the top ranked declared DMCs, we compared the proportion of true DMCs among the total declared DMCs at varying cut-offs for each method. In all cases, our methods performed as well as or better than the alternatives, with methylSig using local information for dispersion estimation outperforming methylSig without local information. This is not surprising given the observed high correlation of dispersion parameters within a 200-300 bp range. MethylSig using local information to estimate methylation levels outperformed methylSig without local information for both low and high levels of DMC clustering (narrow and broad DMRs) (Figure 3(c)-(f)), but not for independent DMCs (Figure 3(a)-(b)). For DMCs that occur independently across CpG sites, BSmooth has strikingly low detection power (Figure 3(a)-(b)). The performance of BSmooth improves dramatically for clustered DMCs, becoming similar to that of the site-specific version of methylSig for broad DMRs with 25-40% differences in methylation (Figure 3(f)). In practice, however, our site-specific test is potentially more robust since it does not depend on the assumption of correlated dispersion parameters.
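The comparison metric used here, the percent of true DMCs among the top ranked sites at a given cut-off, can be computed along the following lines. This is an illustrative sketch only: the function name is hypothetical, and ranking by p-value (smallest first) is an assumption, since a method may rank by another statistic.

```python
def percent_true_among_top(scores, is_true, cutoffs):
    """For each cut-off k, return the % of true DMCs among the k top-ranked sites.

    scores:  one value per site, smaller meaning more significant
             (assumed here to be p-values).
    is_true: one boolean per site, True if the site was simulated as a DMC.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    wanted = set(cutoffs)
    result = {}
    hits = 0
    for rank, i in enumerate(order, start=1):
        hits += bool(is_true[i])
        if rank in wanted:
            result[rank] = 100.0 * hits / rank
    return result

pvals = [0.001, 0.2, 0.01, 0.03, 0.5]
truth = [True, False, True, False, True]
curve = percent_true_among_top(pvals, truth, cutoffs=[1, 2, 3, 5])
```

Plotting `curve` against the cut-off for each method gives one line per method, as in the panels of Figure 3.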

Downloaded from http://bioinformatics.oxfordjournals.org/ at University of Michigan on May 22, 2014

3.4 Site specific versus tiled analysis

Although a change in methylation at a single CpG site may disrupt gene regulation, in some experiments the average coverage per cytosine site may be low, making tiling the data a desirable option to increase power. Moreover, much biological regulation occurs over a wider region covering multiple CpGs. Thus, tiling nearby data may provide complementary insights about the effect of DNA methylation on biological phenotypes. Using the 21 AML samples and four NBM controls, we assessed the auto-correlation among nearby sites using a weighted Pearson correlation coefficient to account for the varying levels of coverage. We found that, on average, adjacent CpG sites had high correlation (R > 0.9), with the correlation dropping rapidly once the distance exceeded 200-300 bp (Figure S2(a)). Currently, each significant tiled region is considered a separate DMR, and 25 bp is the default size of the tiles. While we observed this to work well for RRBS data, users will likely want to increase the tiling width for bis-seq data.
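A weighted Pearson correlation coefficient of the kind mentioned above can be written as below. This is a generic sketch rather than the exact computation behind Figure S2: the text does not state how the weights were derived from coverage, so the weights are left as an input, and the function name is hypothetical.

```python
import math

def weighted_pearson(x, y, w):
    """Pearson correlation of x and y with non-negative observation weights w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / sw
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / sw
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / sw
    return cov / math.sqrt(vx * vy)

# Methylation levels at two neighboring sites across three observations,
# with weights standing in for read coverage (illustrative values).
r = weighted_pearson([0.1, 0.5, 0.9], [0.2, 0.6, 1.0], [10, 20, 30])
```

With equal weights the function reduces to the ordinary Pearson coefficient, so low-coverage observations only matter less when the weights actually differ.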

3.5 Basing variance estimation on normal samples only

DNA methylation levels are often more heterogeneous among cancer samples than among normal samples of the same tissue (Hansen et al., 2011). This potentially reduces the power to identify methylation differences between diseased and normal samples when the variance is estimated using data from both groups. An alternative approach, with a slightly different interpretation of results, is to calculate variances from normal samples only. This is particularly useful when relevant DNA methylation changes occur in only a subset of the disease samples. Such a subset of cancer samples may harbor a common mutation or have a similar clinical outcome. Thus, we offer the ability to base variance estimates on just one of the groups in methylSig. However, since the number of normal samples is sometimes relatively small, as in our AML example, this approach also increases the uncertainty of the variance estimates. Therefore, in these cases and in our implementation below, we recommend combining information from nearby CpG sites to obtain more robust variance estimation. This is also motivated by the observation that not only methylation levels, but also dispersion estimates are correlated among nearby CpG sites (R = 0.68 on average for adjacent CpG sites within 25 bp of each other) (Figure S2(b)).

3.6 Data visualization

MethylSig offers a unique two-tiered visualization of the methylation data depending on the zoom level. For narrow regions (recommended for < 100 kbp) where at most 500 CpG sites have data reads, users can visualize sample-specific coverage levels and % methylation at each site, together with group averages, significance levels and a number of genomic annotations (Figure 4a). To assess the potential biological impact of differential methylation events, we also annotate these CpG sites to their genomic context, including CpG islands and shores, RefGene information (promoter, UTR, intron, exon, noncoding RNA and intergenic regions), and TFs. This is important when interpreting DNA methylation differences, since the presence of methylcytosine may have different effects on gene regulation depending on the context of the genomic region (van Vlodrop et al., 2011). For broad regions, the visualization is simplified to the locations of multiple genomic features (e.g. CpG islands, enhancer regions, etc.) along with log10-scale significance levels of the DMCs/DMRs (Figure 4b).

[Figure 4: (a) chr1, positions 2,600,000 to 3,400,000; tracks: CpG, refGene, −log10(pvalue) (0 to 10), tfbs, base(chr1); legend entries: CpG Island, CpG Shore, CpG Shelf, intergenic, CDS, DMCs, Exon, Promoter, UTR. (b) chr1, positions 3,150,000 to 3,160,000 (PRDM16); y-axis: methylation level (0 to 100), one letter per sample, with treatment group estimates and control group estimates; tracks: CpG, refGene, −log10(pvalue) (0 to 10), tfbs (K562Ifng6hCmyc, K562Max, Hek293Pol2, Hepg2Hnf4aForskln, Hepg2CebpbForskln, Hepg2Pol2Forskln, Hepg2Srebp1aInsln, Hepg2Srebp1a), base(chr1).]

Fig. 4: Data visualization. (a) When the chromosome range is large (> 1 million bp), the visualization function does not show data. It is helpful to identify regions where DMCs or DMRs exist, and then zoom in to a particular region of interest. (b) When zoomed in to a small region (< 1 mbp or at most 500 CpG sites with data), users can visualize the coverage levels, percent methylation levels and group mean methylation estimates at each CpG site. Different letters represent different samples, and the size of each letter represents the read coverage at that CpG site for the related sample. Colored letters (consistent with the CpG island annotation) represent the treatment group (IDH1/2 mutation) and black letters the control group (normal). Red and black dots and lines represent the group estimates for the treatment and control groups, respectively. CpG islands and shores, intergenic regions, −log10(p-values), and protein-DNA binding sites are shown in the annotation tracks below the methylation data.

3.7 Implementation of methylSig with ERRBS data from multiple subtypes of AML

3.7.1 Subtypes of AML Among the 21 ERRBS samples used in our permutation analysis, we classified 13 samples into the following three subgroups: IDH1/2 mutated (n = 5), TET2 mutated (n = 4), and AML1/ETO fusion (n = 4). An additional four samples correspond to NBM specimens obtained from healthy donors. Figueroa et al. (2010a) previously showed that these AML subtypes display distinct DNA methylation patterns.

We applied the beta-binomial and binomial models to identify DMCs or DMRs (using 25 bp windows) characterizing the AML subtypes and normal samples, for CpG sites covered by at least three samples in each group. Significant DMCs/DMRs were defined as those CpG sites or regions with FDR ≤ 0.05 and an estimated methylation difference ≥ 25%.
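The two-part significance rule above (FDR ≤ 0.05 and methylation difference ≥ 25%) can be illustrated as follows. The text does not name the FDR procedure, so the Benjamini-Hochberg adjustment used here is an assumption, and both function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def call_significant(pvalues, meth_diff, fdr=0.05, min_diff=25.0):
    """Flag sites with adjusted p <= fdr and |difference in % methylation| >= min_diff."""
    adjusted = benjamini_hochberg(pvalues)
    return [q <= fdr and abs(d) >= min_diff
            for q, d in zip(adjusted, meth_diff)]

pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
diffs = [30.0, -10.0, 40.0, 26.0, 50.0]   # treatment minus control, in %
calls = call_significant(pvals, diffs)
```

In this toy input only the first site passes both criteria: the second fails the 25% difference threshold and the remaining three fail the FDR threshold.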


3.7.2 Site specific analysis Both the beta-binomial and binomial models found substantial numbers of DMCs. However, as expected based on the anti-conservative nature of the binomial model, we found substantially more DMCs using the binomial model than the beta-binomial model (Figure 5(a)). With the beta-binomial model we identified markedly different numbers of DMCs when comparing IDH1/2 mutation, TET2 mutation, or AML1/ETO gene fusion to NBM samples. Samples harboring IDH1/2 mutations have previously been reported to have broader and more severe effects on DNA methylation compared to AMLs with TET2 mutations (Figueroa et al., 2010a). Indeed, we identified ∼6X more DMCs in the IDH1/2 group than in the TET2 group with methylSig. In contrast, the binomial model identified more (∼1.7X) DMCs in the TET2 group. Both groups have been observed to have a strong bias towards hypermethylation. Using methylSig, 98% of IDH1/2 and 88% of TET2 DMCs were hypermethylated compared to NBM. Strong effects on DNA methylation were also observed in samples harboring the AML1/ETO gene fusion, although approximately equal numbers of hyper- and hypo-methylated sites were observed in this subtype (Figueroa et al., 2010a) (Figure 5(a)). For both of these AML subtypes, methylSig identified several thousand DMCs. In contrast, the binomial model identified more similar numbers of DMCs between each of the AML subtypes and NBM samples, with the fewest DMCs found for IDH1/2, and only 87% and 63% of the DMCs were hypermethylated in IDH1/2 and TET2 samples, respectively. The beta-binomial model also tends to identify DMCs with a larger % methylation change than the binomial model, such that a small p-value is more strongly associated with a large % methylation change under the beta-binomial model than under the binomial model (Figure S4). For example, removing the minimum methylation difference criterion of 25% increases the number (%) of DMCs for IDH1/2 from 116,605 (7.8%) to 607,745 (40.4%) using the binomial model, but only from 8,521 (0.57%) to 11,018 (0.73%) using our beta-binomial method.
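The fold changes and percentages quoted in this paragraph can be checked against the hyper- and hypo-methylated counts shown in Figure 5(a). The short script below is only an arithmetic check; the dictionary layout and function names are illustrative.

```python
# (hyper, hypo) DMC counts versus NBM, as shown in Figure 5(a).
counts = {
    "beta-binomial": {"IDH1/2": (8327, 194), "TET2": (1311, 172)},
    "binomial": {"IDH1/2": (101329, 15276), "TET2": (128080, 74610)},
}

def total(model, subtype):
    hyper, hypo = counts[model][subtype]
    return hyper + hypo

def percent_hyper(model, subtype):
    hyper, _ = counts[model][subtype]
    return 100.0 * hyper / total(model, subtype)

for model in counts:
    for subtype in counts[model]:
        print(model, subtype, total(model, subtype),
              f"{percent_hyper(model, subtype):.1f}% hypermethylated")

# IDH1/2 versus TET2 with the beta-binomial model (the "~6X" in the text).
fold_beta_binomial = total("beta-binomial", "IDH1/2") / total("beta-binomial", "TET2")
# TET2 versus IDH1/2 with the binomial model (the "~1.7X" in the text).
fold_binomial = total("binomial", "TET2") / total("binomial", "IDH1/2")
print(round(fold_beta_binomial, 1), round(fold_binomial, 1))
```

The totals reproduce the 8,521 and 116,605 IDH1/2 DMCs cited in the text, and the rounded percentages agree with the 98%, 88%, 87% and 63% figures.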

We mapped CpG sites to CpG islands and gene bodies using RefGene models. As can be seen for IDH1/2 compared to NBM in Figure 6(a), about 49% of CpG sites covered by the ERRBS assay are annotated to a CpG island; this percentage increased to 62% among identified DMCs. Conversely, the percentage of CpG sites from inter CpG island regions decreased from 33% of those covered by ERRBS overall to 20% among DMCs. CpG shores and shelves were neither enriched nor depleted for DMCs. This result is consistent with previous observations regarding the patterns of DMCs in IDH-mutant AMLs (Akalin et al., 2012a). We did not observe enrichment of DMCs in any regions defined by gene structure, including promoter, 5' or 3' UTR, CDS (intron and exon), and noncoding RNA, when using RefGene models (Figure 6b). Using methylSig to test for TFs with potentially enriched DMCs in their binding sites, we identified an extremely strong enrichment of Suz12 sites (p-value: 2.4 × 10^−190) (Figure 6c). Suz12 is a component of the Polycomb Repressive Complex 2 (PRC2), which trimethylates lysine 27 of histone 3. DNA regions normally harboring this histone mark in embryonic stem cells have been observed to be hypermethylated in multiple types of cancer (Widschwendter et al., 2006). This suggests that PRC2 target sites also have a strong tendency toward hypermethylation in IDH1/2 mutant AML samples. An enrichment of Suz12 sites was also observed for the TET2 and AML1/ETO subtypes (p-values: 4.7 × 10^−9 and 1.7 × 10^−102, respectively), and for Brf2 and Pol3 sites in the AML1/ETO subtype (Figure S5).

[Figure 5: numbers of hypermethylated, hypomethylated and total DMCs or DMRs (FDR < 0.05, methylation difference > 25%) for each AML subtype compared to Normal; percentages in parentheses are those shown in the figure.]

(a) # DMCs
Beta-binomial model. IDH1/2: hyper 8,327, hypo 194, total 8,521 (0.57%). TET2: hyper 1,311, hypo 172, total 1,483 (0.07%). AML1/ETO: hyper 11,951, hypo 12,788, total 24,739 (1.49%).
Binomial model. IDH1/2: hyper 101,329, hypo 15,276, total 116,605 (7.75%). TET2: hyper 128,080, hypo 74,610, total 202,690 (10.1%). AML1/ETO: hyper 73,258, hypo 79,750, total 153,008 (9.2%).

(b) # DMRs (tiled 25 bp)
Beta-binomial model. IDH1/2: hyper 6,555, hypo 209, total 6,764 (0.75%). TET2: hyper 1,566, hypo 280, total 1,846 (0.16%). AML1/ETO: hyper 9,682, hypo 12,764, total 22,446 (2.40%).
Binomial model. IDH1/2: hyper 56,755, hypo 10,299, total 67,054 (7.48%). TET2: hyper 64,549, hypo 54,653, total 119,202 (10.4%). AML1/ETO: hyper 37,906, hypo 57,376, total 95,282 (10.2%).

(c) # DMCs
Beta-binomial model using both groups and local information for variance. IDH1/2: hyper 48,880, hypo 3,054, total 51,934 (3.45%). TET2: hyper 13,606, hypo 3,697, total 17,303 (0.86%). AML1/ETO: hyper 43,435, hypo 41,744, total 85,179 (5.12%).
Beta-binomial model using normal only and local information for variance. IDH1/2: hyper 55,088, hypo 3,197, total 58,285 (3.87%). TET2: hyper 76,361, hypo 23,800, total 100,161 (5.01%). AML1/ETO: hyper 43,079, hypo 37,419, total 80,498 (4.84%).

Fig. 5: Number of DMCs identified using methylSig (beta-binomial model) and methylKit (Binomial model) when comparing AML subtypes with IDH1/2 mutation, TET2 mutation, another gene mutation, and AML1/ETO to NBM samples respectively. (a) Site specific analysis (DMCs). (b) Tiled analysis (DMRs). (c) Analysis using local information to estimate variances.

3.7.3 Tiled analysis For the tiled analysis, we divided the genome into contiguous non-overlapping 25 bp windows and tiled (pooled) the data within each window. Similar to the site-specific analysis, both the binomial and beta-binomial methods identified substantial numbers of DMRs. As expected due to the increase in statistical power from pooling data from nearby sites, methylSig (Figure 5(b)) identified more DMRs for all subtypes of AMLs compared to the site-specific analyses. Again, the great majority, 97% and 88.0%, of DMRs were hypermethylated for IDH1/2 and TET2 compared to NBM, respectively. In contrast, fewer significant changes were identified using tiled analysis with the binomial model, with the IDH1/2 subtype again having the fewest significant changes, and 85% and 64% of DMRs hypermethylated for IDH1/2 and TET2 subtypes, respectively.
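Pooling read counts into non-overlapping windows, as described above, amounts to summing methylated and total reads per window. The sketch below is a minimal illustration for one sample on one chromosome; the function name, the input tuple layout and the choice to align windows to multiples of the width are assumptions, not a description of the methylSig implementation.

```python
from collections import defaultdict

def tile_counts(sites, width=25):
    """Pool per-site read counts into non-overlapping windows.

    sites: iterable of (position, methylated_reads, total_reads).
    Returns {window_start: (methylated_reads, total_reads)}, with windows
    starting at multiples of `width` (an assumption of this sketch).
    """
    tiles = defaultdict(lambda: [0, 0])
    for position, methylated, total in sites:
        start = (position // width) * width
        tiles[start][0] += methylated
        tiles[start][1] += total
    return {start: tuple(c) for start, c in sorted(tiles.items())}

sample = [(100, 8, 10), (110, 3, 10), (124, 5, 5), (125, 1, 4)]
tiled = tile_counts(sample)   # windows starting at 100 and 125
```

Each window can then be tested in the same way as a single CpG site, using the pooled counts in place of the per-site counts.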

3.7.4 Basing variance estimation on normal samples only and using local information When estimating variance from normal samples, many CpG sites with severe heterogeneities among cancer patients are identified as DMCs. To show the utility of this approach, we combined the 13 AML samples, simulating a situation where there is no prior knowledge about molecular subtype, and compared this heterogeneous group to the NBM samples using both


[Figure 6: (a) CpG island annotation. ALL: CpG Island 49%, CpG Shore 13%, CpG Shelf 4%, InterCGI 33%. DMC: CpG Island 62%, CpG Shore 15%, CpG Shelf 3%, InterCGI 21%. (b) Gene structure annotation, with categories Intergenic, Intron, 3'UTR, 5'UTR, Noncoding, CDS and Promoter; bar labels, in order: 29%, 17%, 1%, 3%, 1%, 7%, 43% for ALL and 30%, 18%, 1%, 2%, 1%, 9%, 39% for DMC.]

Fig. 6: Annotation results from comparing IDH1/2 mutant to NBM samples. (a) When annotating CpG sites to CpG islands, 13% more CpG sites (from 49% to 62%) are annotated to CpG islands among DMCs, while the percentage of CpG sites from inter CpG island regions decreased by 13% (from 33% to 20%). (b) No noticeable change is observed when annotating CpG sites to gene bodies using the refGene model. (c) When using ENCODE uniform TF information, Suz12 is the only TF that is significantly enriched in DMCs compared to all CpG sites used to identify DMCs.

variance estimation methods (all samples and normal-only samples) with methylSig. We then determined which sites were identified as DMCs in the “normal-only” analysis to help understand and explore the possibility of cancer subtypes. Clustering the 13 AMLs and four normal samples based on these DMCs separated the AML subtypes and normal samples, with only one IDH1/2 outlier (Figure 7). Clustering the samples based on DMCs using all samples to estimate variance was not able to separate the subtypes (Figure S6).
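The Figure 7 caption gives the distance measure for this clustering as 1 − pairwise correlation. A minimal sketch of building that distance matrix from per-sample methylation levels at the selected DMCs is shown below; the function names and toy values are illustrative, and the Ward linkage step itself is not implemented here.

```python
import math

def pearson(x, y):
    """Ordinary Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def correlation_distance_matrix(samples):
    """samples: {sample_name: [% methylation at each selected DMC]}.

    Returns (names, matrix) where matrix[i][j] = 1 - correlation(i, j).
    """
    names = list(samples)
    matrix = [[1.0 - pearson(samples[a], samples[b]) for b in names]
              for a in names]
    return names, matrix

toy = {
    "Normal_1": [10, 20, 30, 40],
    "Normal_2": [12, 22, 29, 41],
    "AML_1": [40, 30, 20, 10],
}
names, dist = correlation_distance_matrix(toy)
```

The resulting matrix can be passed to any hierarchical clustering routine that accepts precomputed distances.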

When estimating variance with local information, as expected, many more DMCs were identified (3.5-12X, Figures 5(a) and (c), bottom results). This option of using local information is useful when nearby CpG sites have similar variation across samples within each group. We suggest using a window of 200-300 bp in each direction based on our observations (Figure S2). The patterns of the results are similar with or without local information. We found that most DMCs are hypermethylated when comparing the IDH1/2 or TET2 mutation group to normal samples, while there was no significant difference between the % hyper- and hypo-methylated when comparing AML1/ETO samples to normal samples. Interestingly, when combining local information with the option of using only normal samples to estimate variance, the number of identified DMCs for the TET2 versus normal comparison changed from the fewest to the most (Figure 5(c)). This suggests possibly stronger methylation heterogeneity across samples in the TET2 mutation group.
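Selecting the neighboring CpG sites that fall within a fixed window in each direction, the first step of any local-information estimate, can be sketched as follows. How methylSig weights or combines the neighbors is not shown; the function name and the default window of 200 bp are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right

def neighbors_within(positions, index, window=200):
    """Return indices of sites within +/- window bp of positions[index].

    positions must be sorted in ascending order (one chromosome);
    the returned list includes `index` itself.
    """
    center = positions[index]
    first = bisect_left(positions, center - window)
    last = bisect_right(positions, center + window)
    return list(range(first, last))

cpg_positions = [100, 150, 320, 400, 900]
local = neighbors_within(cpg_positions, index=2)   # sites near position 320
```

Because the positions are sorted, each lookup costs two binary searches, which keeps the neighbor search cheap even for millions of CpG sites.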

4 DISCUSSION

Statistical analysis of genome-wide bisulfite sequencing experiments with multiple biological samples per group is challenging

[Figure 7: clustered heatmap of the AML and NBM samples; row and column labels: IDH1/2, TET2, ETO, Normal; color key: Value, 0 to 0.5.]

Fig. 7: Clustering AML and NBM samples on the CpG sites found significant when using only NBM samples to estimate the variance is able to separate the molecular subtypes without prior information. The distance measure used was 1 − pairwise correlation. The hierarchical clustering method was “ward”.

due to heterogeneous read coverage and methylation levels, rela-tively small sample sizes, and the large number of tests required fora site-specific analysis. Appropriately applying a statistical methodto adjust for biological variation, and using information from nearbyCpG sites to improve estimation, can significantly reduce the falsepositive findings while maximizing power. Here, we introducedmethylSig, a genome-wide DNA methylation analysis pipeline forbis-seq or RRBS/ERRBS data that estimates biological variationusing a beta-binomial approach and maintains a well-calibratedtype-I error rate. MethylSig is able to incorporate local informationto improve the estimation of variances and group methylation levels.Our simulation results show that in terms of sensitivity, methyl-Sig outperforms both standard statistical tests such as the t-test andWilcoxon rank test, as well as BSmooth and the binomial test ofMethylKit under a variety of realistic scenarios. Our method reducesfalse positive findings by filtering out sites that have a high differe-nce in % methylation, but are too heterogeneous or have very low ofcoverage in many samples to be of biological relevance.
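The role of the beta-binomial model can be illustrated with a small numerical sketch. Assuming the methylation level at a site varies across biological samples according to a Beta(alpha, beta) distribution, the number of methylated reads out of n follows a beta-binomial distribution, whose variance exceeds the binomial variance at the same mean. The (alpha, beta) parameterization, the function names, and the numbers below are illustrative assumptions, not the parameterization or code used by methylSig.

```python
from math import exp, lgamma

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_choose(n, k):
    """Logarithm of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def betabinom_logpmf(k, n, alpha, beta):
    """log P(k methylated reads out of n) under a beta-binomial model."""
    return (log_choose(n, k)
            + log_beta(k + alpha, n - k + beta)
            - log_beta(alpha, beta))

def betabinom_variance(n, alpha, beta):
    """Variance of the methylated read count under the beta-binomial model."""
    mu = alpha / (alpha + beta)
    return n * mu * (1 - mu) * (alpha + beta + n) / (alpha + beta + 1)

n_reads = 20             # hypothetical coverage at one CpG site
alpha, beta = 2.0, 2.0   # hypothetical; mean methylation level 0.5

pmf = [exp(betabinom_logpmf(k, n_reads, alpha, beta)) for k in range(n_reads + 1)]
binomial_var = n_reads * 0.5 * 0.5                   # no biological variation
bb_var = betabinom_variance(n_reads, alpha, beta)    # with biological variation
print(round(sum(pmf), 6), binomial_var, bb_var)
```

A test that assumes the smaller binomial variance when the data are in fact overdispersed will tend to overstate significance, which is the motivation given in the text for modeling biological variation explicitly.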

Our analyses using ERRBS methylation data from AML and NBM samples showed that methylSig can lead to statistically and biologically relevant results. One subgroup of AML samples had mutations in either IDH1 or IDH2. Nearly all of the DMCs in cancers with neomorphic IDH1/2 mutations are expected to be hypermethylated, since they lead to abnormally high levels of 2-hydroxyglutarate, which is a direct inhibitor of the TET DNA demethylases required to convert 5mC to 5hmC (Xu et al., 2011). Similarly, loss-of-function mutations in TET2 result in impaired DNA demethylation (Figueroa et al., 2010b; Ko et al., 2010). While most of the DMCs in the TET2 mutant subgroup are also expected to be hypermethylated, this occurs to a lesser degree than for the IDH1/2 group due to the possible partial compensation by other TET proteins. The great majority of DMCs identified by methylSig in these two subgroups were hypermethylated.

Compared to MethylKit, which uses a binomial model without adjusting for biological variation, we demonstrated that our approach performs favorably in terms of type-I error rate, and correlation between significance and % methylation change. Recently, more sophisticated methods based on Bayesian hierarchical models have been published (Sun et al., 2014; Feng et al., 2014). Although these methods also make use of a beta-binomial model, they differ from methylSig by using priors for the methylation levels (Sun et al., 2014) or dispersion parameters (Feng et al., 2014) based on the entire data; methylSig, in contrast, incorporates local information for methylation levels and/or dispersion parameters.

For sites that are highly heterogeneous in both groups, methylSig may not be able to identify regions that are important for a subgroup of samples. This is similar to other traditional statistical methods that test for significant group differences. In this case, a test for a significant subgroup, such as Cancer Outlier Profile Analysis (COPA) used for expression data and implemented in Oncomine (Tomlins et al., 2005), may be more appropriate. In cases where one of the groups (e.g., controls) is more homogeneous, users may estimate variance based only on that group in order to identify regions potentially important in a subset of the diseased samples, which are likely more heterogeneous overall.
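For readers unfamiliar with COPA, the sketch below shows the general idea as we understand it from its use on expression data: values are centered on the median, scaled by the median absolute deviation (MAD), and a high percentile of the transformed values serves as an outlier score. Applying it to methylation fractions, the nearest-rank percentile rule, the function names, and the toy values are assumptions made for illustration; this is not part of methylSig.

```python
from math import ceil
from statistics import median

def copa_transform(values):
    """Center on the median and scale by the median absolute deviation."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; the site has no spread to scale by")
    return [(v - med) / mad for v in values]

def nearest_rank_percentile(values, q):
    """Nearest-rank percentile for a fraction q in (0, 1]."""
    ordered = sorted(values)
    rank = max(1, ceil(q * len(ordered)))
    return ordered[rank - 1]

def copa_score(values, q=0.90):
    """Outlier score: the q-th percentile of the COPA-transformed values."""
    return nearest_rank_percentile(copa_transform(values), q)

# Hypothetical methylation fractions across eight samples at two sites.
outlier_site = [0.10, 0.12, 0.11, 0.13, 0.09, 0.10, 0.85, 0.90]  # two outliers
uniform_site = [0.10, 0.12, 0.11, 0.13, 0.09, 0.10, 0.12, 0.11]  # no outliers

outlier_score = copa_score(outlier_site)
uniform_score = copa_score(uniform_site)
print(round(outlier_score, 1), round(uniform_score, 1))
```

A site where only a subset of samples is strongly methylated receives a much larger score than a uniformly low site, even though a test of group means could miss it.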

MethylSig can also be used to analyze alternative cytosine methylation contexts. Finally, the methylSig R package provides a unique visualization approach and useful annotation functions.

ACKNOWLEDGEMENT

Funding: Funding for this work was provided by NCI grant # R01CA158286-01A1 and NIEHS grant # P30 ES017885-01A1.

REFERENCES

Akalin, A., Garrett-Bakelman, F. E., Kormaksson, M., Busuttil, J., Zhang, L., Khrebtukova, I., Milne, T. A., Huang, Y., Biswas, D., Hess, J. L. et al. (2012a) Base-pair resolution DNA methylation sequencing reveals profoundly divergent epigenetic landscapes in acute myeloid leukemia. PLoS Genetics, 8 (6), e1002781.

Akalin, A., Kormaksson, M., Li, S., Garrett-Bakelman, F. E., Figueroa, M. E., Melnick, A., Mason, C. E. et al. (2012b) methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biology, 13 (10), R87.

Clark, S. J., Statham, A., Stirzaker, C., Molloy, P. L. & Frommer, M. (2006) DNA methylation: bisulphite modification and analysis. Nature Protocols, 1 (5), 2353–2364.

Fan, J., Zhang, C. & Zhang, J. (2001) Generalized likelihood ratio statistics and Wilks phenomenon. The Annals of Statistics, 29 (1), 153–193.

Feng, H., Conneely, K. N. & Wu, H. (2014) A Bayesian hierarchical model to detect differentially methylated loci from single nucleotide resolution sequencing data. Nucleic Acids Research, doi:10.1093/nar/gku154.

Figueroa, M. E., Abdel-Wahab, O., Lu, C., Ward, P. S., Patel, J., Shih, A., Li, Y., Bhagwat, N., Vasanthakumar, A., Fernandez, H. F. et al. (2010a) Leukemic IDH1 and IDH2 mutations result in a hypermethylation phenotype, disrupt TET2 function, and impair hematopoietic differentiation. Cancer Cell, 18 (6), 553–567.

Figueroa, M. E., Lugthart, S., Li, Y., Erpelinck-Verschueren, C., Deng, X., Christos, P. J., Schifano, E., Booth, J., van Putten, W., Skrabanek, L. et al. (2010b) DNA methylation signatures identify biologically distinct subtypes in acute myeloid leukemia. Cancer Cell, 17 (1), 13–27.

Griffiths, D. A. (1973) Maximum likelihood estimation for the beta-binomial distribution and an application to the household distribution of the total number of cases of a disease. Biometrics, 29, 637–648.

Gu, H., Bock, C., Mikkelsen, T. S., Jager, N., Smith, Z. D., Tomazou, E., Gnirke, A., Lander, E. S. & Meissner, A. (2010) Genome-scale DNA methylation mapping of clinical samples at single-nucleotide resolution. Nature Methods, 7 (2), 133–136.

Gu, H., Smith, Z. D., Bock, C., Boyle, P., Gnirke, A. & Meissner, A. (2011) Preparation of reduced representation bisulfite sequencing libraries for genome-scale DNA methylation profiling. Nature Protocols, 6 (4), 468–481.

Hansen, K. D., Langmead, B., Irizarry, R. A. et al. (2012) BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions. Genome Biology, 13 (10), R83.

Hansen, K. D., Timp, W., Bravo, H. C., Sabunciyan, S., Langmead, B., McDonald, O. G., Wen, B., Wu, H., Liu, Y., Diep, D. et al. (2011) Increased methylation variation in epigenetic domains across cancer types. Nature Genetics, 43 (8), 768–775.

Izzo, A. & Schneider, R. (2010) Chatting histone modifications in mammals. Briefings in Functional Genomics, 9 (5-6), 429–443.

Jeddeloh, J., Greally, J. & Rando, O. (2008) Reduced-representation methylation mapping. Genome Biology, 9 (8), 231.

Kassner, I., Barandun, M., Fey, M., Rosenthal, F., Hottiger, M. O. et al. (2013) Crosstalk between SET7/9-dependent methylation and ARTD1-mediated ADP-ribosylation of histone H1.4. Epigenetics & Chromatin, 6 (1), 1.

Ko, M., Huang, Y., Jankowska, A. M., Pape, U. J., Tahiliani, M., Bandukwala, H. S., An, J., Lamperti, E. D., Koh, K. P., Ganetzky, R. et al. (2010) Impaired hydroxylation of 5-methylcytosine in myeloid cancers with mutant TET2. Nature, 468 (7325), 839–843.

Krueger, F. & Andrews, S. R. (2011) Bismark: a flexible aligner and methylation caller for Bisulfite-seq applications. Bioinformatics, 27 (11), 1571–1572.

Kulis, M. & Esteller, M. (2010) DNA methylation and cancer. Adv Genet, 70, 27–56.

Laird, P. W. (2010) Principles and challenges of genome-wide DNA methylation analysis. Nature Reviews Genetics, 11 (3), 191–203.

Lim, D. H. & Maher, E. R. (2010) DNA methylation: a form of epigenetic control of gene expression. The Obstetrician & Gynaecologist, 12 (1), 37–42.

Lister, R. & Ecker, J. R. (2009) Finding the fifth base: genome-wide sequencing of cytosine methylation. Genome Research, 19 (6), 959–966.

Lister, R., Pelizzola, M., Dowen, R. H., Hawkins, R. D., Hon, G., Tonti-Filippini, J., Nery, J. R., Lee, L., Ye, Z., Ngo, Q.-M. et al. (2009) Human DNA methylomes at base resolution show widespread epigenomic differences. Nature, 462 (7271), 315–322.

Meissner, A., Gnirke, A., Bell, G. W., Ramsahoye, B., Lander, E. S. & Jaenisch, R. (2005) Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis. Nucleic Acids Research, 33 (18), 5868–5877.

Nordlund, J., Backlin, C. L., Wahlberg, P., Busche, S., Berglund, E. C., Eloranta, M.-L., Flaegstad, T., Forestier, E., Frost, B.-M., Harila-Saari, A. et al. (2013) Genome-wide signatures of differential DNA methylation in pediatric acute lymphoblastic leukemia. Genome Biology, 14 (9), r105.

Patel, K. P., Ravandi, F., Ma, D., Paladugu, A., Barkoh, B. A., Medeiros, L. J. & Luthra, R. (2011) Acute myeloid leukemia with IDH1 or IDH2 mutation frequency and clinicopathologic features. American Journal of Clinical Pathology, 135 (1), 35–45.

Petrie, K. & Zelent, A. (2007) AML1/ETO, a promiscuous fusion oncoprotein. Blood, 109 (10), 4109–4110.

Sharma, S., Kelly, T. K. & Jones, P. A. (2010) Epigenetics in cancer. Carcinogenesis, 31 (1), 27–36.

Shen, H. & Laird, P. W. (2013) Interplay between the cancer genome and epigenome. Cell, 153 (1), 38–55.

Sun, D., Xi, Y., Rodriguez, B., Park, H. J., Tong, P., Meong, M., Goodell, M. A. & Li, W. (2014) MOABS: model based analysis of bisulfite sequencing data. Genome Biology, 15 (2), R38.

Tomlins, S. A., Rhodes, D. R., Perner, S., Dhanasekaran, S. M., Mehra, R., Sun, X.-W., Varambally, S., Cao, X., Tchinda, J., Kuefer, R. et al. (2005) Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science, 310 (5748), 644–648.

Tripathi, R. C., Gupta, R. C. & Gurland, J. (1994) Estimation of parameters in the beta binomial model. Annals of the Institute of Statistical Mathematics, 46 (2), 317–331.

Vaissière, T., Sawan, C. & Herceg, Z. (2008) Epigenetic interplay between histone modifications and DNA methylation in gene silencing. Mutation Research/Reviews in Mutation Research, 659 (1), 40–48.

van Vlodrop, I. J. H., Niessen, H. E. C., Derks, S., Baldewijns, M. M. L. L., van Criekinge, W., Herman, J. G. & van Engeland, M. (2011) Analysis of promoter CpG island hypermethylation in cancer: location, location, location! Clinical Cancer Research, 17 (13), 4225–4231.

Wang, T., Liu, Q., Li, X., Wang, X., Li, J., Zhu, X., Sun, Z. S. & Wu, J. (2013) RRBS-Analyser: a comprehensive web server for reduced representation bisulfite sequencing data analysis. Human Mutation, 34 (12), 1606–1610.

Widschwendter, M., Fiegl, H., Egle, D., Mueller-Holzner, E., Spizzo, G., Marth, C., Weisenberger, D. J., Campan, M., Young, J., Jacobs, I. et al. (2006) Epigenetic stem cell signature in cancer. Nature Genetics, 39 (2), 157–158.

Xu, W., Yang, H., Liu, Y., Yang, Y., Wang, P., Kim, S.-H., Ito, S., Yang, C., Wang, P., Xiao, M.-T. et al. (2011) Oncometabolite 2-hydroxyglutarate is a competitive inhibitor of α-ketoglutarate-dependent dioxygenases. Cancer Cell, 19 (1), 17–30.

Yang, X., Lay, F., Han, H. & Jones, P. A. (2010) Targeting DNA methylation for epigenetic therapy. Trends in Pharmacological Sciences, 31 (11), 536–546.
