Laura Poliseno (ed.), Pseudogenes: Functions and Protocols, Methods in Molecular Biology, vol. 1167, DOI 10.1007/978-1-4939-0835-6_18, © Springer Science+Business Media New York 2014

Chapter 18

Discrimination of Pseudogene and Parental Gene DNA Methylation Using Allelic Bisulfite Sequencing

Luke B. Hesson and Robyn L. Ward

Abstract

Determining the methylation status of genes with pseudogenes can be technically challenging due to sequence homology. High sequence homology can result in the amplification of both pseudogene and parental gene alleles, potentially leading to data misinterpretation. Allelic bisulfite sequencing allows for detection of the methylation status of individual alleles at nucleotide resolution and represents the most reliable method for discriminating pseudogene and parental gene sequences. Here, we discuss important points that should be considered when investigating pseudogene and parental gene methylation status and we describe the method of allelic bisulfite sequencing, including assay design.

Key words Pseudogene, Epigenetics, Methylation, Bisulfite, Sequencing

1 Introduction

Pseudogenes are ancestral nonfunctional copies of protein coding genes that have lost the potential to give rise to a protein product [1]. Pseudogenes that arise by genomic duplication are called unprocessed pseudogenes, whereas those formed by retrotransposition through reverse transcription of an mRNA intermediate and reintegration into the genome are known as processed pseudogenes. Pseudogenes are not restricted by the same selective pressures as functional parental genes and accumulate deleterious sequence changes over time, usually resulting in stop codons that render the “open reading frame” nonfunctional. Pseudogenes are ubiquitous in the human genome, with current estimates indicating that there are over 17,000 pseudogenes [2].

Given the close similarity between pseudogenes and almost all coding genes, it is challenging to develop molecular analyses that are specific for the gene of interest rather than pseudogenes [3]. Most notably, amplification of DNA sequences using PCR can be problematic if the target region is not unique in the genome. Pseudogenes with high sequence homology can therefore be a source of “nonspecific” amplification when investigating gene expression, mutations, or DNA methylation [4, 5].

Both processed and unprocessed pseudogenes often show high sequence homology with the promoter regions of parental genes. Promoter regions are usually the focus of attention when assaying for methylation changes. In a recent study of the methylation status of pseudogene–parental gene pairs, Cortese et al. found that the majority of pseudogenes were methylated in different tissues compared with parental genes [6]. Therefore, when investigating the promoter methylation status of genes with pseudogenes, it is essential that the assays used are able to reliably discriminate pseudogene and parental gene sequences in order to avoid data misinterpretation.

Recently, we have demonstrated the technical challenges associated with analyzing the methylation status of the PTEN CpG island promoter, which shows >95 % sequence homology with the 5′ region of the PTENP1 pseudogene [7]. Using allelic bisulfite sequencing, we were able to unequivocally demonstrate that methylation of the PTEN CpG island is a rare event in cancer cell lines and that apparent methylation in fact originates from homologous regions of the PTENP1 pseudogene [7]. Allelic bisulfite sequencing involves bisulfite PCR, bacterial cloning of PCR amplicons, and fluorescent automated DNA sequencing of individual alleles.

Here, we describe a methodological approach to determine the methylation status of genes with pseudogenes or regions sharing high sequence similarity, using allelic bisulfite sequencing.

2 Materials

2.1 In Silico Characterization

1. Genome browser [8].
2. Sequence alignment tool such as BLAT [8] or BLAST [9].

2.2 Sodium Bisulfite DNA Modification

1. EZ DNA Methylation-Gold™ Kit (Zymo Research).

2.3 Bisulfite PCR and PCR Purification

1. Thermocycler.
2. Platinum® Taq DNA polymerase complete with 10× PCR buffer and 50 mM MgCl2 (Invitrogen).
3. dNTP mixture.
4. Primers for amplification of region of interest from bisulfite modified DNA.
5. PCR purification kit or gel extraction kit.

2.4 Ligation and Transformation of PCR Products

1. pCR®2.1-TOPO® TA Cloning vector kit (Invitrogen).
2. Luria–Bertani (LB) agar plates supplemented with 50 μg/mL carbenicillin, 80 μg/mL 5-bromo-4-chloro-3-indolyl-β-d-galactopyranoside (X-β-gal), and 500 μM isopropyl-β-d-thiogalactopyranoside (IPTG, BDH).
3. Water bath set to 42 °C.
4. Super Optimal broth with Catabolite repression (SOC) media: 2 % (w/v) Bacto-Tryptone, 0.5 % (w/v) yeast extract, 10 mM NaCl, 2.5 mM KCl, 10 mM MgCl2, and 20 mM glucose.
5. Ice box.
6. Orbital shaker.
7. Incubator set to 37 °C.
8. Chemically competent DH5α E. coli.

2.5 Colony PCR

1. Standard PCR reagents and equipment as listed in Subheading 2.3, items 1–3.
2. M13 sequencing primers (5′-GTTTTCCCAGTCACGAC-3′ and 5′-CAGGAAACAGCTATGAC-3′).
3. 96-well PCR reaction plates.

2.6 Phosphatase and Exonuclease Treatment

1. Thermocycler.
2. Antarctic phosphatase, 5,000 U/mL complete with 10× Antarctic phosphatase buffer.
3. Exonuclease I, 20,000 U/mL.

2.7 Fluorescent Automated DNA Sequencing

1. Thermocycler.
2. BigDye® Terminator v3.1 Cycle Sequencing Kit complete with 5× reaction buffer (Applied Biosystems).
3. Ethanol (95 % and 70 % (v/v)).
4. 3 M Sodium Acetate (pH 5.2).
5. Refrigerated centrifuge with a plate spinning rotor capable of 2,235 RCF.
6. ABI3730 DNA analyzer (Applied Biosystems).

2.8 Data Interpretation

1. DNA sequence viewing software.
2. “CpGviewer” interactive bisulfite DNA sequencing analysis tool [10] or equivalent bisulfite DNA sequencing analysis software.

3 Methods

3.1 In Silico Characterization

1. Obtain the sequence of the CpG island promoter or of other regions of interest.
2. Using a sequence alignment tool, search for regions of homology in the genome (Fig. 1a).
3. Identify the sequence differences between the pseudogene and the parental gene (Fig. 1b).
4. Design bisulfite PCR primers that amplify regions that contain informative sequence differences between the parental gene and the pseudogene (see Notes 1–4 for hints on bisulfite PCR primer design).
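Steps 3 and 4 can be prototyped in silico before any primers are ordered. The sketch below is a minimal illustration in Python, assuming the parental gene and pseudogene sequences have already been aligned to equal length (with `-` marking gaps). The function name `informative_differences` and the toy sequences are hypothetical and are not part of the published protocol. The code applies the rule given in Note 1: a C/T mismatch is lost on the strand being read when the C is unmethylated, whereas the same position read from the opposite strand (where it is a G/A mismatch) persists. Methylated cytosines are not modelled.

```python
def informative_differences(parental, pseudogene):
    """List aligned mismatches and whether each survives bisulfite conversion.

    Both inputs are top-strand sequences of equal length ('-' marks a gap).
    Assumes every cytosine at a mismatch position is unmethylated.
    """
    if len(parental) != len(pseudogene):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for pos, (a, b) in enumerate(zip(parental.upper(), pseudogene.upper())):
        if a == b:
            continue
        pair = {a, b}
        results.append({
            "position": pos,
            "parental": a,
            "pseudogene": b,
            # Top strand: C reads as T after conversion, so C/T collapses.
            "top_strand": pair != {"C", "T"},
            # Bottom strand: a top-strand G/A is C/T there, so it collapses.
            "bottom_strand": pair != {"G", "A"},
        })
    return results


if __name__ == "__main__":
    # Hypothetical aligned toy sequences (not PTEN/PTENP1).
    parental = "ACGTCAGGA-TC"
    pseudogene = "ATGTCAAGACTG"
    for diff in informative_differences(parental, pseudogene):
        print(diff)
```

Differences flagged as informative on only one strand indicate which strand the bisulfite PCR primers should target; indels and transversions remain informative on both strands.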

Fig. 1 Regions of homology between PTEN and the PTENP1 pseudogene and identification of discriminating sequence differences. (a) The PTEN CpG island (green bar) is a bidirectional promoter encompassing the 5′UTR of the KLLN and PTEN genes (blue bars). The fragmented black bar indicates the region that shares high sequence homology with the PTENP1 pseudogene and the degree of similarity. Thin vertical red lines indicate single nucleotide differences, whilst gaps indicate larger regions of sequence variation between the PTEN and PTENP1 genes across this region. The bracket indicates the region amplified using bisulfite PCR primers as described in Bennett et al. [11]. (b) Shown is the region of the PTENP1 processed pseudogene (light blue bar) that is amplified by the bisulfite PCR described in (a). Thin vertical red lines and gaps within the black bar indicate the locations of sequence differences (bottom) that can be used to distinguish PTEN alleles from PTENP1 alleles

3.2 Sodium Bisulfite DNA Modification

Extract genomic DNA from the cell line or tissue of interest using phenol-chloroform DNA extraction or a commercially available kit. Extracted genomic DNA is then bisulfite modified, which involves the selective chemical conversion of cytosine to uracil, whereas 5-methylcytosine remains refractory to this conversion.

1. Dilute 1 μg of genomic DNA to 20 μL in nuclease-free water.
2. Prepare the “CT conversion reagent” (provided in the EZ DNA Methylation-Gold™ kit) according to the manufacturer’s instructions and add 130 μL to the DNA.
3. Place the reaction tube in a thermocycler and incubate at 98 °C for 10 min followed by 53 °C for 18 h. This extended incubation time ensures the complete modification of DNA.
4. Recover the modified DNA using the DNA binding columns provided in the EZ DNA Methylation-Gold™ kit according to the manufacturer’s instructions. Elute the modified DNA in 50 μL nuclease-free water to obtain ~20 ng/μL bisulfite modified DNA.
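The chemistry underlying this subheading can be illustrated with a short simulation. The following Python sketch is only a toy model, assuming complete conversion and reading the converted top strand after PCR (uracil is amplified as thymine). The function `bisulfite_convert` and the example sequence are hypothetical illustrations rather than part of the protocol.

```python
def bisulfite_convert(sequence, methylated_positions=()):
    """Return the top-strand sequence as read after bisulfite PCR.

    Unmethylated C is converted to U and amplified as T; a C whose
    0-based index is in methylated_positions (5-methylcytosine) stays C.
    Assumes 100 % conversion efficiency.
    """
    methylated = set(methylated_positions)
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


# Hypothetical template with CpG cytosines at indices 1, 5, and 9.
genomic = "ACGTTCGACCGA"
print(bisulfite_convert(genomic))
print(bisulfite_convert(genomic, methylated_positions=[1, 5, 9]))
```

In this toy example the unmethylated template reads as ATGTTTGATTGA, whereas the template methylated at all three CpG sites reads as ACGTTCGATCGA: only the non-CpG cytosine is converted, which is the signal that sequencing later detects.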

3.3 Bisulfite PCR and PCR Purification

1. The optimum conditions for each PCR must be determined empirically. We commonly use the following conditions when optimizing bisulfite PCRs: 0.2 mM dNTPs, 0.4–1 μM each primer, 0.5–1 U Platinum® Taq DNA polymerase, 2 mM MgCl2, and 40–100 ng bisulfite modified DNA. The thermocycle consists of 5 min at 95 °C, 35–40 cycles of 1 min at 94 °C, 1 min at the calculated annealing temperature of the primers and 2 min at 74 °C, followed by 10 min at 72 °C.
2. Purify PCR amplicons using a PCR purification or gel extraction kit (see Note 5).

3.4 Ligation and Transformation of PCR Products

1. Perform ligation into the pCR®2.1-TOPO® TA cloning vector according to the manufacturer’s instructions. Ligation is performed for 30 min at room temperature.
2. Thaw DH5α E. coli on ice. Aliquot 50 μL into a fresh 1.5 mL tube and add 2 μL of ligation reaction. Gently mix with the pipette tip. Do not vortex or pipette up and down. Incubate on ice for 30 min.
3. Transform the bacteria by heat shock in a 42 °C water bath for 30 s. Incubate on ice for 2 min.
4. Add 450 μL room temperature SOC media.
5. Incubate in an orbital shaker for 90 min at 37 °C, shaking at 250 rpm.
6. Evenly spread 100 μL of each transformation mixture onto the surface of a pre-warmed LB agar plate (see Note 6).
7. Incubate at 37 °C for 20 h.

3.5 Colony PCR

1. Remove the LB agar plate from the incubator and mark the desired number of white colonies for colony PCR.
2. Place 5 μL nuclease-free water into the bottom of the desired number of wells within a 96-well PCR plate, plus one additional well for a negative control (see Note 7).
3. To inoculate the water, gently touch a white colony using a pipette tip and place it in a well within the 96-well plate. Leave the tip in the well and continue picking colonies until the desired number is reached (see Note 8).
4. Gently shake the plate to agitate the tips. This will disperse the bacterial cells into the water. To remove the tips without cross-contaminating wells, invert the plate over a waste disposal bin.
5. Prepare a PCR master mix containing 0.25 mM dNTPs, 2 mM MgCl2, 1 U Platinum® Taq DNA polymerase, and M13 sequencing primers (0.4 μM each primer, see Note 9).
6. Place 20 μL PCR master mix into each well to obtain a final volume of 25 μL. Seal the plate and place it in a thermocycler.
7. Incubate for 5 min at 95 °C followed by 30 cycles of 30 s at 95 °C, 30 s at 50 °C, 30 s at 72 °C, and a final incubation for 10 min at 72 °C (see Note 10).
8. To identify reactions ready for sequencing, load 5 μL of each reaction into a 2 % agarose gel (see Note 11).
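When many colonies are screened, the colony PCR master mix volumes are easiest to scale programmatically. The Python sketch below applies C1V1 = C2V2 to the final concentrations given for the colony PCR master mix, for a 25 μL reaction (20 μL master mix plus the 5 μL of inoculated water). The stock concentrations for dNTPs, primers, and polymerase, the 10 % pipetting excess, and the function name are assumptions made for illustration; only the 10× buffer and 50 mM MgCl2 stocks come from Subheading 2.3. Substitute the values for your own reagents.

```python
def colony_pcr_master_mix(n_reactions, excess=0.10):
    """Return per-component volumes (in microlitres) for the master mix.

    Final concentrations follow the colony PCR step; stock concentrations
    marked "assumed" are illustrative and must be checked against your tubes.
    """
    final_volume = 25.0  # per well, including 5 uL inoculated water
    mix_volume = 20.0    # master mix dispensed per well
    per_reaction = {
        "10x PCR buffer": final_volume * 1 / 10,                   # 1x final
        "50 mM MgCl2": final_volume * 2 / 50,                      # 2 mM final
        "10 mM dNTPs (assumed stock)": final_volume * 0.25 / 10,   # 0.25 mM
        "10 uM M13 forward (assumed stock)": final_volume * 0.4 / 10,
        "10 uM M13 reverse (assumed stock)": final_volume * 0.4 / 10,
        "5 U/uL Platinum Taq (assumed stock)": 1 / 5,              # 1 U
    }
    # Water makes each aliquot of master mix up to 20 uL.
    per_reaction["nuclease-free water"] = mix_volume - sum(per_reaction.values())
    scale = n_reactions * (1 + excess)
    return {name: round(vol * scale, 2) for name, vol in per_reaction.items()}


if __name__ == "__main__":
    # 23 colonies plus one negative control.
    for name, volume in colony_pcr_master_mix(24).items():
        print(f"{name}: {volume} uL")
```

Computing water by difference keeps the dispensed volume at 20 μL per well regardless of which stock concentrations are substituted.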

3.6 Phosphatase and Exonuclease Treatment

1. Add 0.5 μL (2.5 U) Antarctic phosphatase, 0.5 μL (10 U) exonuclease I, 2.5 μL 10× Antarctic phosphatase buffer, and 1 μL nuclease-free water (final volume ~25 μL, see Note 12) to each reaction.
2. Incubate in a thermocycler for 30 min at 37 °C followed by 20 min at 80 °C.

3.7 Fluorescent Automated DNA Sequencing

1. Remove 4 μL of each reaction and place it in a fresh PCR plate.
2. Add 0.5 μL BigDye® Terminator v3.1 Cycle Sequencing reagent, 2 μL 5× BigDye sequencing reaction buffer, 1 μL of 3.2 μM primer (see Note 13), and 2.5 μL nuclease-free water (final volume 10 μL). Place the reaction in a thermocycler.
3. Incubate for 25 cycles of 20 s at 94 °C, 20 s at 50 °C and 4 min at 60 °C. Store reactions at 4 °C and protect from light.
4. Add 25 μL 95 % (v/v) ethanol (chilled on ice) and 1 μL 3 M Sodium Acetate (pH 5.2) to each well.
5. Seal the plate and place it in a 4 °C centrifuge.
6. Centrifuge at 2,235 RCF for 20 min.
7. Remove ethanol and add 50 μL 70 % (v/v) ethanol. Repeat steps 5 and 6.
8. Remove the 70 % (v/v) ethanol and air-dry for 10 min.
9. Sequence using an ABI3730 DNA analyzer.

3.8 Data Interpretation

1. Using DNA sequence viewing software, separate pseudogene and parental gene alleles based on sequence differences identified in step 3 in Subheading 3.1.
2. Determine the methylation status of each individual CpG dinucleotide using the “CpGviewer” software [10] (Fig. 2). To use this software, the genomic sequences of the regions analyzed (both the pseudogene and the parental gene) are required in plain text (.txt format), as well as the electropherogram for each allele (.ab1 format).

Fig. 2 Allelic bisulfite sequencing shows DNA methylation is specifically associated with PTENP1 and not the PTEN CpG island. (a) Allelic bisulfite sequencing data showing the methylation status of PTEN and PTENP1-derived alleles in the hematological cancer cell line Raji (taken from Hesson et al. [7]). Each line represents a single allele. Circles indicate the positions of CpG dinucleotides; black circles indicate methylated CpG dinucleotides, white circles indicate unmethylated CpG dinucleotides; yellow diamonds indicate the positions of nucleotide variations within PTENP1 alleles used to discriminate between PTEN and PTENP1 alleles; black diamonds indicate the positions of additional CpG dinucleotides specific to PTENP1 alleles that were also methylated. (b) Representative electropherograms showing an unmethylated PTEN allele and a methylated PTENP1 allele. Indicated by the black arrow is the position of a nucleotide variation used to discriminate the PTEN and PTENP1 alleles
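The logic of the two data interpretation steps (assigning each cloned allele to its locus, then scoring each CpG) can be sketched in a few lines of Python. This is an illustrative sketch only, not a replacement for CpGviewer: it assumes each cloned allele has already been base-called, trimmed, and aligned to the reference so that positions correspond one-to-one, and that the top strand was amplified. The sequences, the discriminating position, and the function names are invented for the example.

```python
def cpg_positions(reference):
    """0-based indices of the C in every CpG of the unconverted reference."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def classify_allele(read, discriminators):
    """Assign a read to a locus using post-conversion discriminating bases.

    discriminators maps position -> {"parental": base, "pseudogene": base}.
    Returns "parental", "pseudogene", or "ambiguous".
    """
    votes = {"parental": 0, "pseudogene": 0}
    for pos, expected in discriminators.items():
        for locus, base in expected.items():
            if read[pos].upper() == base:
                votes[locus] += 1
    if votes["parental"] == votes["pseudogene"]:
        return "ambiguous"
    return max(votes, key=votes.get)


def call_methylation(read, reference):
    """Map each reference CpG position to True (C retained) or False (T)."""
    calls = {}
    for pos in cpg_positions(reference):
        base = read[pos].upper()
        if base in ("C", "T"):  # skip miscalled or ambiguous bases
            calls[pos] = base == "C"
    return calls


if __name__ == "__main__":
    # Hypothetical references differing by a G/A change at index 6.
    references = {"parental": "TCACGGAGTCGAT", "pseudogene": "TCACGGGGTCGAT"}
    discriminators = {6: {"parental": "A", "pseudogene": "G"}}
    for read in ("TTATGGAGTTGAT", "TTACGGGGTCGAT"):
        locus = classify_allele(read, discriminators)
        print(locus, call_methylation(read, references[locus]))
```

The discriminating position is deliberately a G/A difference, which is unaffected by conversion of the top strand (see Note 1). Reads classified as ambiguous should be inspected manually in the electropherogram rather than assigned automatically.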

4 Notes

1. Choice of the DNA strand. Following bisulfite modification, the two previously complementary DNA strands become non-complementary single-stranded DNAs that can be amplified separately using strand-specific PCR primers. Once the region of interest has been identified, it is crucial to choose the most appropriate DNA strand, so that the sequence differences between the parental gene and the pseudogene remain informative even after the treatment with sodium bisulfite. For example, C/T mismatches between the parental gene and the pseudogene (with the C unmethylated) are lost following bisulfite conversion, but the corresponding G/A mismatches present on the other strand do persist.

2. Basic principles of bisulfite PCR primer design. It is essential that bisulfite PCR primers are specific for bisulfite modified DNA. To achieve this goal, we design them so that they include thymines (originally cytosines) at critical positions, such as a thymine (originally cytosine) at the most 3′ base or a short stretch of thymines (originally cytosines) in the central part of the primer.

Bisulfite modification reduces the complexity of the DNA sequence, making it more difficult to design primers with a low rate of off-target binding. It is therefore important to incorporate a mix of the three remaining bases A, T, and G, whenever possible. In this respect, increasing primer length (between 25 and 40 nt) increases primer specificity and also compensates for reduced primer annealing temperature due to the loss of cytosine bases.

For allelic bisulfite sequencing, amplification of modified DNA with no bias towards original methylation status is also crucial. This is achieved by avoiding CpG dinucleotides within primer binding sites.

3. Amplicon size and nested PCRs. Generally bisulfite PCR works well with small amplicon sizes (up to ~500 bp). Nested PCRs can improve the amplification of particularly large amplicons or regions for which it is difficult to obtain specific amplicons. Nested bisulfite PCRs are split into two reactions, the first of which involves a limited number of cycles (~20). A small amount of this reaction is then transferred to a second PCR reaction, which includes nested primers.


4. When analyzing the methylation status of any gene with a pseudogene that shares high sequence homology, we advise against the use of other techniques such as bisulfite pyrosequencing, methylation-specific PCR (MSP) or combined bisulfite restriction analysis (COBRA). These techniques may not be informative for the proportion of the amplicon that is pseudogene-derived and/or may not allow for discrimination of whether methylation originates from the pseudogene or the parental gene.

5. Ligation efficiency is improved by PCR purification. This can be done using PCR column purification or gel extraction. Gel extraction is desirable if the reaction contains significant primer dimer or nonspecific PCR products.

6. If low numbers of transformants are expected, then the entire transformation mixture can be plated following collection of cells by centrifugation and resuspension in 100 μL SOC media.

7. Each white colony should contain a single allele of the region of interest. The number of colonies to be screened depends largely on the application. Sequencing of a greater number of alleles will give a more accurate representation of the methylation status of a region across a population of cells, as well as of the proportion of methylated pseudogene and parental gene-derived sequences.

8. When picking the colonies, avoid scraping the entire colony as this will overload the PCR reaction with too much DNA. If a plasmid miniprep containing a cloned allele is required, the remainder of the colony can be used to inoculate LB broth containing 50 μg/mL carbenicillin.

9. The use of M13 primers not only standardizes colony PCR conditions but also ensures that different primers can be used for colony PCR and sequencing reactions, which reduces background in sequencing reactions.

10. The initial incubation for 5 min at 95 °C is essential to activate Platinum®Taq DNA polymerase and also releases plasmid DNA from bacteria.

11. Colony PCR product size is defined by the size of the insert ligated into the pCR®2.1-TOPO® TA cloning vector plus ~200 bp of flanking vector sequence.

12. Prior to sequencing, unincorporated dNTPs and primers must be removed from the PCR reaction. This can be done enzymatically or through purification columns. We recommend enzymatic treatment using Antarctic phosphatase and exonuclease I, which dephosphorylate dNTPs and remove single-stranded DNA primers, respectively. These enzymes are then heat inactivated, thereby preventing interference with subsequent sequencing. The use of these enzymes allows for a more convenient high-throughput sequencing of individual alleles in a 96-well format.

13. Add only one primer from the set used to obtain the original PCR amplicon.

References

1. Zheng D, Frankish A, Baertsch R et al (2007) Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution. Genome Res 17:839–851

2. Karro JE, Yan Y, Zheng D et al (2007) Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation. Nucleic Acids Res 35:D55–D60

3. Kalyana-Sundaram S, Kumar-Sinha C, Shankar S et al (2012) Expressed pseudogenes in the transcriptional landscape of human cancers. Cell 149:1622–1634

4. Whang YE, Wu X, Sawyers CL (1998) Identification of a pseudogene that can masquerade as a mutant allele of the PTEN/MMAC1 tumor suppressor gene. J Natl Cancer Inst 90:859–861

5. Zysman MA, Chapman WB, Bapat B (2002) Considerations when analyzing the methylation status of PTEN tumor suppressor gene. Am J Pathol 160:795–800

6. Cortese R, Krispin M, Weiss G et al (2008) DNA methylation profiling of pseudogene-parental gene pairs and two gene families. Genomics 91:492–502

7. Hesson LB, Packham D, Pontzer E et al (2012) A reinvestigation of somatic hypermethylation at the PTEN CpG island in cancer cell lines. Biol Proced Online 14:5

8. Meyer LR, Zweig AS, Hinrichs AS et al (2012) The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Res 41(Database issue):D64–D69

9. Altschul SF, Gish W, Miller W et al (1990) Basic local alignment search tool. J Mol Biol 215:403–410

10. Carr IM, Valleley EM, Cordery SF et al (2007) Sequence analysis and editing for bisulphite genomic sequencing projects. Nucleic Acids Res 35:e79

11. Bennett KL, Mester J, Eng C (2010) Germline epigenetic regulation of KILLIN in Cowden and Cowden-like syndrome. JAMA 304:2724–2731
