Three-dimensional organisation of the Runx1 locus
Amarni Thomas
A thesis submitted for the degree of
Doctor of Philosophy
Department of Pathology
University of Otago
Dunedin, New Zealand
April 2021
Abstract
The proper regulation of gene expression is crucial for cell-fate and lineage
determination of haematopoietic stem cells (HSCs). Dysregulated expression of genes that
drive haematopoietic differentiation can cause blood disorders, including leukaemia.
Connecting haematopoietic genes with their regulatory elements is central to correct gene
regulation, and is orchestrated by chromatin modifiers, including the genome-organising
protein complex, cohesin.
The gene encoding the developmental transcription factor RUNX1/AML1 is a well-known
leukaemia-associated gene. Runx1 is an important regulator of haematopoiesis in
vertebrates; it is crucial for early myeloid differentiation, and plays a vital role in adult
blood development. Genetic disruptions to the RUNX1 gene are frequently associated
with Acute Myeloid Leukaemia (AML).
Despite the well-established role of RUNX1 in haematopoiesis and disease
contribution, the cis-regulatory mechanisms that modulate RUNX1 require further
elucidation. Gene regulatory elements, such as enhancers, located in non-coding DNA
are likely to be important for Runx1 transcription.
My PhD studies provided insight into mouse Runx1 enhancer functions and
highlighted the features of conserved human enhancers. I also used circular chromosome
conformation capture sequencing (4C-seq) to identify DNA interactions with the
previously identified +24 RUNX1 enhancer, which is essential for HSC expression of
RUNX1. Using cohesin-deficient K562 leukaemia cells, my work shows that cohesin is
required for the +24 enhancer (also known as eR1) to anchor an insulated neighbourhood
that shields RUNX1 from outside influences. Striking results from this work show that
the +24 Runx1 enhancer is a significant cohesin-dependent organiser of Chromosome 21
during megakaryocyte differentiation.
In summary, my PhD studies identified a bouquet of enhancer-anchored
connections that regulate RUNX1 transcription in leukaemia cells at steady state and
during differentiation, and explain why cohesin loss dysregulates the expression of this
important leukaemogenic gene.
Acknowledgements
First and foremost, I would like to thank my supervisor, Prof. Julia Horsfield, for
her encouragement and guidance throughout the course of my PhD, as well as her
valuable career advice and support. Many thanks to the Chromosome Structure &
Development Lab Group for friendship, collegiality and making lab work enjoyable:
Alice, Amy, Anastasia, Bridget, Bryony, Caitlin, Chi, Jisha, Judith, Lisa, Megan,
Michael, Nabila, Nima, Noel, Sarada, Stephanie, Trent, Tanushree, Ute. Thanks to all the
wonderful writing group friends for the engaging discussions and enjoyable days
together.
I would also like to thank those who have helped with this PhD project: Dr.
William Schierding for his advice and bioinformatics training; Dr. Aaron Stevens for
guidance on G-quadruplex characterisation; Ute Zellhuber-McMillan for helping me
breed and characterise the F1 zebrafish lines; Noel Jhinku for making sure all the
zebrafish are well cared for; Dr. Robert Day for running my samples on the MiSeq;
Dr. Judith Marsman for guidance with the 4C experiment; and Dr. Jisha Antony for
providing counsel with lab experiments.
To my committee members, Dr. Jisha Antony, Dr. Judith Marsman, Prof. Julia
Horsfield, Prof. Justin O’Sullivan, Prof. Ian Morison, and my convenor Prof. Stephen
Robertson, for supporting me during my studies and providing exciting discussions.
Thanks to Prof. Motomi Osato for providing some of the Runx1 enhancer DNA essential
to this project, and for hosting me along with Prof. Hitoshi Takizawa at the International
Research Centre for Medical Sciences Research, Kumamoto University, Japan. Thank
you to the Liggins Institute, University of Auckland, Prof. Justin O’Sullivan and Dr.
William Schierding for providing a space for me to learn bioinformatics and analyse my
data.
I gratefully acknowledge the funding I have received during the course of my
PhD: Cancer Research Trust New Zealand for awarding me a scholarship; the University of
Otago Pūtea Tautoko - Student Fund and Health Research South - Dunedin School of
Medicine for additional fees and stipend support; the Marsden Fund (administered by the
Royal Society of New Zealand), the Health Research Council of New Zealand and
Leukaemia and Blood Cancer New Zealand for supporting this project; the International
Research Centre for Medical Sciences Research, Kumamoto University, Japan, for
awarding me an internship to further develop molecular techniques; and the Lorne Genome
Conference and Leukaemia and Blood Cancer New Zealand for providing travel
funding.
Thank you to the various workplaces that have made my PhD journey a unique
and fulfilling experience: Pathology Department, Research and Enterprise, Te Huka
Mātauraka, Disability Information and Support, Genetics Teaching Programme and
Genetics Otago.
Finally, the biggest thanks to my Mum, Dad, my brother Blake and my friends
for their enthusiasm, support and for brightening every day.
Table of Contents
Abstract .......................................................................................................................... ii
Acknowledgements ........................................................................................................ iv
Table of Contents ........................................................................................................... vi
List of Figures ............................................................................................................ xvii
List of Tables ................................................................................................................ xxi
List of symbols and abbreviations............................................................................ xxiv
Nomenclature .......................................................................................................... xxviii
1 Chapter 1: Introduction ........................................................................................... 1
1.1 Haematopoiesis .................................................................................................. 1
1.1.1 Primitive and definitive haematopoiesis .................................................... 1
1.1.2 Transcription factors involved in haematopoiesis ...................................... 5
1.2 Haematological disorders .................................................................................. 6
1.2.1 Familial platelet disorder ............................................................................ 7
1.2.2 Acute Myeloid Leukaemia ......................................................................... 7
1.2.3 Leukaemogenesis ....................................................................................... 8
1.2.4 AML patient survival and treatment......................................................... 10
1.2.5 Cohesin mutation contributes to AML ..................................................... 12
1.3 RUNX1 is a leukaemia gene............................................................................. 16
1.3.1 Role of RUNX1 in normal embryonic and adult developmental processes ......... 18
1.3.2 Regulation of Runx1 ................................................................................. 18
1.3.3 Role of RUNX1 in AML progression ....................................................... 21
1.4 Three-dimensional genome structure............................................................... 24
1.4.1 Topologically associated domains (TADs) .............................................. 25
1.5 Regulators of gene transcription ...................................................................... 27
1.5.1 Cohesin acts with other regulators to determine gene expression ............ 28
1.5.2 Cohesin and Runx1 ................................................................................... 29
1.5.3 Regulatory elements ................................................................................. 32
1.5.3.1 Enhancers .............................................................................................. 32
1.5.3.2 Silencers and insulators ........................................................................ 34
1.5.4 Techniques to determine three-dimensional conformation ...................... 35
1.5.5 Genome structure and disease .................................................................. 36
1.5.6 Regulatory regions of Runx1 .................................................................... 37
1.6 Zebrafish as a model organism ........................................................................ 39
1.6.1 Haematopoiesis in zebrafish ..................................................................... 42
1.7 Overall hypothesis and aims of this PhD ......................................................... 43
2 Chapter 2: Methods ................................................................................................ 46
2.1 Ethics consent and protocol approval .............................................................. 46
2.1.1 Zebrafish ................................................................................................... 46
2.1.2 Escherichia coli (E. coli) .......................................................................... 46
2.1.3 Cell culture ............................................................................................... 46
2.2 Materials .......................................................................................................... 46
2.3 Molecular biology techniques .......................................................................... 46
2.3.1 Plasmids and reporter constructs .............................................................. 46
2.3.2 Nanodrop .................................................................................................. 47
2.3.3 Agarose gel electrophoresis ...................................................................... 48
2.3.4 Primer design ............................................................................................ 48
2.3.5 Sequence confirmation and alignment ..................................................... 48
2.3.6 Measuring Runx1 +24 enhancer G-quadruplexes..................................... 49
2.3.6.1 Runx1 +24 enhancer G-quadruplexes Oligo design ............................. 49
2.3.6.2 CD Spectrophotometer protocol ........................................................... 49
2.3.7 Site-Directed Mutagenesis of Runx1 +24 enhancer ................................. 50
2.3.7.1 Runx1 +24 enhancer Site-Directed Mutagenesis primer design ........... 50
2.3.7.2 Mutant Strand Synthesis Reaction (Thermal Cycling) ......................... 50
2.3.7.3 DpnI Digestion of the Mutant Strand Amplification Products ............. 51
2.3.8 Transformation of plasmids in competent cells........................................ 51
2.3.9 Plasmid purification preparation .............................................................. 52
2.3.10 Tol2 transposase mRNA synthesis ........................................................... 52
2.4 In vivo Zebrafish approaches ........................................................................... 53
2.4.1 Zebrafish strains ....................................................................................... 53
2.4.2 Maintenance ............................................................................................. 53
2.4.3 Feeding ..................................................................................................... 53
2.4.4 Breeding and embryo collection ............................................................... 54
2.4.5 Growing embryos ..................................................................................... 54
2.5 Zebrafish microinjection .................................................................................. 54
2.5.1 Needle pulling and calibration.................................................................. 54
2.5.2 Microinjection .......................................................................................... 56
2.5.3 Microscopy ............................................................................................... 56
2.6 Transgenic zebrafish analysis .......................................................................... 56
2.6.1 Zebrafish enhancer assay .......................................................................... 57
2.7 In vitro Cell culture and maintenance of cell lines .......................................... 57
2.7.1 Cell lines ................................................................................................... 57
2.7.2 Reviving cells from liquid nitrogen stocks ............................................... 58
2.7.3 Splitting cells for new passage ................................................................. 59
2.7.3.1 Adherent cells (PNX, BHK/MKL, NIH/3T3) ...................................... 59
2.7.3.2 Suspension cells (K562, HPC7)............................................................ 59
2.7.3.3 K562 STAG2 R614-/- mutant lines ...................................................... 59
2.7.4 Freezing cells for liquid nitrogen storage ................................................. 60
2.8 Expression analysis .......................................................................................... 61
2.8.1 Total RNA purification............................................................................. 61
2.8.2 cDNA synthesis ........................................................................................ 61
2.8.3 Quantitative reverse-transcriptase PCR (qRT-PCR) ................................ 61
2.8.4 pGL4.23 Transfection............................................................................... 63
2.8.5 pGL4.23 Transfection analysis ................................................................. 63
2.8.6 Statistical analysis of luciferase assays .................................................... 64
2.9 In silico approaches for Enhancer analysis ...................................................... 64
2.9.1 Epigenetic analyses using ENCODE data ................................................ 64
2.9.2 Comparative analysis of cancer genomic datasets ................................... 66
2.9.3 Single nucleotide polymorphism (SNP) analysis ..................................... 67
2.10 Circular Chromatin Conformation Capture (4C) ............................................. 67
2.10.1 4C Design ................................................................................................. 68
2.10.2 Cross-linking of K562 cells ...................................................................... 68
2.10.3 Isolation of nuclei ..................................................................................... 69
2.10.4 First Digestion .......................................................................................... 69
2.10.5 First restriction enzyme inactivation ........................................................ 70
2.10.6 Ligation..................................................................................................... 70
2.10.7 Reverse cross-linking and DNA purification ........................................... 71
2.10.8 Second digestion ....................................................................................... 71
2.10.9 Removal of second restriction enzyme..................................................... 72
2.10.10 Second Ligation .................................................................................... 72
2.10.11 DNA purification with linear polyacrylamide (LPA) ........................... 73
2.10.12 4C Polymerase Chain Reaction ............................................................ 73
2.11 4C-seq Library preparation .............................................................................. 74
2.11.1 PCR purification ....................................................................................... 74
2.11.2 Adaptor ligation ........................................................................................ 74
2.11.3 Library purification .................................................................................. 76
2.11.4 Equal mixing of baits................................................................................ 77
2.11.4.1 MiSeq library .................................................................................... 77
2.11.4.2 HiSeq Library .................................................................................... 77
2.11.5 Next generation sequencing ..................................................................... 78
2.12 Processing of sequenced reads ......................................................................... 78
2.12.1 Generating read counts and data quality reports ...................................... 78
2.12.2 Demultiplexing of libraries based on indexed adaptors ........................... 78
2.12.3 Trimming of adaptors ............................................................................... 78
2.12.4 Demultiplexing of baits based on 4C primers .......................................... 78
2.12.4.1 HiSeq ................................................................................................. 79
2.12.4.2 MiSeq ................................................................................................ 79
2.13 Mapping of sequenced reads............................................................................ 80
2.13.1 Generating restriction enzyme digestion of Human Genome (hg19) ....... 80
2.13.2 Generate Bowtie2 reference files of RE digested genome. ...................... 81
2.13.3 Mapping of reads ...................................................................................... 81
2.13.4 Creating a counts file from mapped data .................................................. 81
2.13.5 Removing self-ligated and undigested reads ............................................ 82
2.13.6 Removal of background noise .................................................................. 82
2.14 Analysing 4C interactions ................................................................................ 82
2.14.1 Identification of statistically significant interactions per condition ......... 83
2.14.2 Generation of ggplots ............................................................................... 84
2.14.3 Differential interactions using DESeq2 .................................................... 84
2.14.4 Overlap of significant interactions with genes ......................................... 84
2.14.5 Overlap of significant interactions with protein binding sites and enhancer
characteristics .......................................................................................................... 84
3 Chapter 3: Functional analysis of Runx1 enhancers ........................................... 85
3.1 Runx1 enhancers .............................................................................................. 85
3.1.1 Previously identified enhancers of Runx1 ................................................ 85
3.1.2 Previously identified long-range interactions with Runx1 promoters ...... 92
3.1.3 Functional enhancer characterisation assays ............................................ 97
3.2 Hypothesis ....................................................................................................... 98
3.3 Aims ................................................................................................................. 98
3.4 Characteristics of previously identified regulatory regions ............................. 99
3.4.1 Epigenetic and conservation analyses using ENCODE data .................... 99
Single nucleotide polymorphism (SNP) analysis .................................................. 106
3.4.2 Comparative analysis of cancer genomic datasets ................................. 111
3.5 Zebrafish enhancer reporter assays ................................................................ 114
3.5.1 Zebrafish Enhancer assay F0: ................................................................. 116
3.5.2 Zebrafish Enhancer assay F1: ................................................................. 120
3.6 Genomic characteristics of 4C identified functional novel enhancers .......... 121
3.6.1 Conservation and epigenetic analysis ..................................................... 122
3.6.2 Single nucleotide polymorphism (SNP) analysis ................................... 122
3.6.3 Comparative analysis of cancer genomic datasets ................................. 123
3.7 Chapter summary ........................................................................................... 124
4 Chapter 4: Three-dimensional (3D) connections of Runx1 +24 ....................... 127
4.1 Introduction .................................................................................................... 127
4.1.1 Secondary structure of DNA .................................................................. 129
4.1.1.1 Secondary structure of Runx1 +24 ..................................................... 129
4.2 Hypothesis ..................................................................................................... 132
4.3 Aims ............................................................................................................... 132
4.4 Enhancer characteristics of RUNX1 +24 ....................................................... 133
4.4.1 Epigenetic analysis ................................................................................. 133
4.4.2 Single nucleotide polymorphism (SNP) analysis ................................... 133
4.4.3 Comparative analysis of cancer genomic datasets ................................. 133
4.5 Characterising Runx1 +24 G-quadruplex structure ....................................... 134
4.5.1 CD spectroscopy ..................................................................................... 134
4.5.2 Runx1 +24 G-quadruplex Luciferase reporter assay .............................. 136
4.6 Circular Chromatin Conformation Capture (4C) of Human RUNX1 +24 ..... 140
4.6.1 Quality check of 4C data ........................................................................ 141
4.6.2 Detection of significant interactions ....................................................... 143
4.7 Identification and Genomic Characteristics of Human RUNX1 +24 3D
Connections .............................................................................................................. 145
4.7.1 Visualisation of significant cis interactions ............................................ 145
4.7.2 Overview of significant near-bait interactions ....................................... 148
4.7.3 Significant interactions with known RUNX1 regulators ........................ 151
4.7.4 Significant interactions with other genes ............................................... 152
4.7.5 Overall significant RUNX1 +24 cis and near-bait interactions .............. 154
4.7.6 Trans interactions ................................................................................... 157
4.8 Chapter summary ........................................................................................... 159
4.8.1 RUNX1 +24 characteristics..................................................................... 159
4.8.2 RUNX1 +24 interactions ......................................................................... 161
5 Chapter 5: Cohesin insulates 3D connections of RUNX1 +24 during
megakaryocyte differentiation .................................................................................. 164
5.1 Introduction .................................................................................................... 164
5.2 Hypothesis ..................................................................................................... 166
5.3 Aims ............................................................................................................... 166
5.4 RUNX1 +24 significant interactions during normal megakaryocyte
differentiation............................................................................................................ 167
5.4.1 Visualisation of significant interactions during normal megakaryocyte
differentiation. ....................................................................................................... 167
5.4.2 Overview of significant changes in interactions during normal
megakaryocyte differentiation. ............................................................................. 168
5.4.3 Visualisation of significant near-bait interactions during normal
megakaryocyte differentiation. ............................................................................. 172
5.4.4 Overview of significant changes in near-bait interactions during normal
megakaryocyte differentiation. ............................................................................. 173
5.4.5 Significant interactions with other genes ............................................... 176
5.4.6 Overall significant RUNX1 +24 and near-bait interactions in K562 cells
during megakaryocyte differentiation ................................................................... 177
5.5 The effect of STAG2 deficiency on RUNX1 +24 significant interactions during
differentiation of K562 cells ..................................................................................... 180
5.5.1 Visualisation of significant interactions during cohesin impaired
“differentiation” .................................................................................................... 180
5.5.2 The effect of STAG2 deficiency on interactions during differentiation of
K562 cells ............................................................................................................. 181
5.5.3 The effect of STAG2 deficiency on near-bait interactions during
differentiation of K562 cells ................................................................................. 190
5.5.4 Significant changes in near-bait interactions caused by STAG2 deficiency ......... 191
5.5.5 Significant interactions with other genes ............................................... 196
5.5.6 Overall changes in and near-bait interactions mediated by RUNX1 +24
interactions during differentiation of K562 cells, WT versus STAG2-deficient .. 197
5.6 Chapter summary ........................................................................................... 200
5.6.1 RUNX1 +24 enhancer connections during K562 differentiation to
megakaryocytes ..................................................................................................... 200
5.6.1.1 Distant connections ............................................................................. 200
5.6.1.2 Connections within the RUNX1 TAD ................................................. 201
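5.6.2 RUNX1 +24 enhancer connections during PMA treatment in STAG2
mutants ................................................................................................................ 201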
5.6.2.1 Baseline conditions: ............................................................................ 201
5.6.2.2 Mid-differentiation.............................................................................. 202
5.6.2.3 Post-differentiation ............................................................................. 202
6 Chapter 6: Discussions and Conclusions ............................................................ 204
6.1 Regulators of Runx1....................................................................................... 205
6.1.1 Mouse Runx1 enhancers ......................................................................... 205
6.1.2 Conserved human RUNX1 regulators ..................................................... 206
6.1.3 G-quadruplex .......................................................................................... 208
6.1.4 Regulatory RNAs ................................................................................... 209
6.1.4.1 Enhancer RNA (eRNA) ...................................................................... 209
6.1.4.2 Long non-coding RNAs ...................................................................... 210
6.1.4.3 G-Quadruplex RNAs .......................................................................... 210
6.1.5 Could mutational load in regulatory elements also contribute to disease
progression? .......................................................................................................... 211
6.2 Organisational role of the RUNX1 +24 enhancer .......................................... 217
6.3 RUNX1 +24 enhancer connections during megakaryocyte differentiation ... 218
6.4 Cohesin/STAG2 dependency of RUNX1 +24 enhancer connections in K562
cells ....................................................................................................................... 221
6.4.1 K562 Limitations .................................................................................... 224
6.5 Future directions ............................................................................................ 224
6.5.1 RUNX1 +24 G-quadruplex ..................................................................... 225
6.5.2 Functional analysis of RUNX1 regulatory regions ................................. 225
6.5.3 RUNX1 +24 connections with genes ...................................................... 226
6.5.4 Roles of non-coding RNA in RUNX1 regulation ................................... 227
6.5.5 Further questions proposed by this research .......................................... 228
6.5.6 Clinical significance ............................................................................... 229
6.6 Overall conclusions........................................................................................ 229
References ................................................................................................................... 231
Appendix I: Materials ................................................................................................ 260
Chemical reagents and consumables used in this project ..................................... 260
Solutions and buffers ................................................................................................ 262
Appendix II: Plasmid maps ....................................................................................... 264
Zebrafish enhancer detector vector (ZED) ............................................................ 264
pCS-TP .................................................................................................................. 265
pGL4.23-GW ........................................................................................................ 266
pRL-TK ................................................................................................................. 267
pUC19 ................................................................................................................... 268
Appendix III: Primer list ........................................................................................... 269
Appendix IV: Prediction of G-quadruplexes ........................................................... 271
Appendix V: Sequencing report ................................................................................ 273
Appendix VI: Publication .......................................................................................... 275
Appendix VII: RUNX1 regulatory region Hg38 coordinates ................................. 287
Appendix VIII: 4C data processing .......................................................................... 288
RUNX1 +24 .......................................................................................................... 289
Appendix IX: WT 12 DMSO interaction peaks....................................................... 290
WT 12 DMSO cis significant interactions ............................................................ 290
WT 12 DMSO near-bait significant interactions .................................................. 292
Appendix X: Trans interaction peaks....................................................................... 297
Appendix XI: List of significant changes (cis) ......................................................... 301
WT cis significant changes during differentiation ................................................ 301
WT and STAG2 cis comparisons .......................................................................... 302
Appendix XII: List of significant changes (near-bait) ............................................ 305
WT near-bait significant changes during differentiation ...................................... 305
WT and STAG2 near-bait comparisons ................................................................ 307
Appendix XIII: List of gene connections .................................................................. 321
Appendix XIV: List of significant changes STAG2 12 DMSO compared STAG2 72
PMA ............................................................................................................................. 327
List of Figures
Figure 1.1: Schematic of haematopoiesis cell generation locations during human
development. .................................................................................................................... 3
Figure 1.2: Vertebrate primitive and definitive haematopoiesis cell development
pathways. .......................................................................................................................... 5
Figure 1.3: Structure of cohesin complex. ...................................................................... 13
Figure 1.4: The Runx protein. ........................................................................................ 17
Figure 1.5: Comparative genomic organisation of the Runx1 locus in human, mouse and
zebrafish. ........................................................................................................................ 20
Figure 1.6: RUNX1 gene common isoforms and leukaemic fusion isoforms. .............. 22
Figure 1.7: RUNX1 TAD Hi-C contact map of RUNX1 region in K562 cells. .............. 27
Figure 1.8: Cohesin mediated cis-regulatory DNA elements. ........................................ 34
Figure 1.9: Identified Runx1 regulatory regions............................................................. 39
Figure 1.10: Diagram of normal Danio rerio (zebrafish) development. ......................... 41
Figure 1.11: Schematic of haematopoiesis cell generation locations during zebrafish
development. .................................................................................................................. 43
Figure 2.1: Calibration slide with close up of 1 mm scale, each division is 0.01 mm. .. 55
Figure 2.2: Haemocytometer slide used for counting cells, each shaded grid is 1 mm². 60
Figure 2.3: Quantitative PCR primer locations at human RUNX1. ................................ 62
Figure 2.4: Schematic outline of 4C process. ................................................................. 68
Figure 3.1: Schematic overview of the mouse Runx1 locus (chromosome 16:93,220,074-
92,589,466). .................................................................................................................... 96
Figure 3.2: Schematic overview of human RUNX1 locus (chromosome 21: 36,148,773-
36,872,777). .................................................................................................................... 97
Figure 3.3: Proportions of cancerous primary site mutations of regulatory regions. ... 113
Figure 3.4: Schematic representation of ZED assay vector. ........................................ 115
Figure 3.5: Putative mouse enhancers are active in hematopoietic regions in zebrafish.
...................................................................................................................................... 118
Figure 3.6: Quantitative summary of GFP expression patterns generated by putative
mouse enhancers in zebrafish. ...................................................................................... 119
Figure 4.1: Hi-C contact map of the RUNX1 region in K562 cells. ............................. 128
Figure 4.2: Schematic representation of mouse +24 individual G-quadruplex. ........... 130
Figure 4.3: Predicted representation of Runx1 +24 mCRE G-quadruplex and mutated +24
mCRE. .......................................................................................................................... 134
Figure 4.4: CD spectroscopy of wild-type and mutant +24 G4s. ................................. 135
Figure 4.5: Schematic representation of the predicted Runx1+24 mCRE G-quadruplex in
parallel conformation and when base substitutions are made at the +24 mCRE. ......... 136
Figure 4.6: Diagram of Luciferase reporter construct pGL4.23-GW. .......................... 137
Figure 4.7: Schematic representation of pGL4 Runx1+24 mCRE G-quadruplex in parallel
conformation and mutated +24 mCREs. ...................................................................... 138
Figure 4.8: Luciferase activity in response to the various +24 mCREs in the NIH/3T3
cell line. ........................................................................................................................ 139
Figure 4.9: RUNX1 +24 Mapped read count. ............................................................... 141
Figure 4.10: RUNX1 +24 4C quality evaluation. ......................................................... 143
Figure 4.11: RUNX1 +24 cis 4C profile (chr21) in K562 cells (WT 12 DMSO)......... 146
Figure 4.12: RUNX1 +24 significant cis interactions. .................................................. 147
Figure 4.13: Profile of RUNX1 +24 significant cis interactions. .................................. 148
Figure 4.14: RUNX1 +24 near-bait 4C profile. ............................................................ 149
Figure 4.15: RUNX1 +24 significant near-bait interactions (chr21:34999764-37572609).
...................................................................................................................................... 151
Figure 4.16: RUNX1 +24 chromatin hub. ..................................................................... 155
Figure 5.1: Megakaryocyte differentiation with PMA stimulation RUNX1 +24 WT cis
4C profiles. ................................................................................................................... 168
Figure 5.2: Differentiation-induced changes to the RUNX1 +24 WT cis interactions. 170
Figure 5.3: Differentiation-induced changes in the number of RUNX1 +24 significant cis
interactions with super-enhancers in WT K562 cells. .................................................. 171
Figure 5.4: Megakaryocyte differentiation with PMA stimulation RUNX1 +24 WT near-
bait 4C profile. .............................................................................................................. 172
Figure 5.5: Differentiation-induced changes to the RUNX1 +24 WT near-bait
interactions. .................................................................................................................. 174
Figure 5.6: Changes to the RUNX1 +24-anchored chromatin hub during megakaryocyte
differentiation of K562 cells. ........................................................................................ 179
Figure 5.7: Time course of PMA stimulation of all RUNX1 +24 cis 4C profiles. ....... 181
Figure 5.8: Significant changes to the RUNX1 +24 cis baseline (12 DMSO) interactions
due to cohesin deficiency. ............................................................................................ 183
Figure 5.9: Features of RUNX1 +24 significant cis interactions in WT baseline and
STAG2 baseline cells (12 DMSO). .............................................................................. 184
Figure 5.10: Significant changes to the RUNX1 +24 12h PMA treatment cis interactions
due to cohesin deficiency. ............................................................................................ 186
Figure 5.11: Significant changes to the RUNX1 +24 72h PMA treatment cis interactions
due to cohesin deficiency. ............................................................................................ 188
Figure 5.12: Profile of RUNX1 +24 significant cis interactions with super-enhancers
during megakaryocyte differentiation with PMA stimulation in WT and STAG2 mutant
cells. .............................................................................................................................. 189
Figure 5.13: Time course PMA stimulation of all RUNX1 +24 near-bait 4C profiles. 191
Figure 5.14: Significant changes to the RUNX1 +24 baseline (12 DMSO) of near-bait
interactions due to cohesin deficiency. ......................................................................... 193
Figure 5.15: Significant changes to the RUNX1 +24 near-bait interactions after 12h PMA
treatment due to cohesin deficiency. ............................................................................ 194
Figure 5.16: Significant changes to the RUNX1 +24 near-bait interactions after 72h PMA
treatment due to cohesin deficiency. ............................................................................ 196
Figure 5.17: The effect of cohesin deficiency on the RUNX1 +24 enhancer-anchored
chromatin hub in baseline (12 DMSO) conditions in K562 cells. ............................... 198
Figure 5.18: The effect of cohesin deficiency on the RUNX1 +24 enhancer-anchored
chromatin hub in mid-differentiation (12 PMA) conditions in K562 cells. ................. 199
Figure 5.19: The effect of cohesin deficiency on the RUNX1 +24 enhancer-anchored
chromatin hub in post-differentiation (72 PMA) conditions in K562 cells.................. 200
Figure 6.1: Schematic overview of human RUNX1 locus (hg19: chromosome 21:
36,148,773-36,872,777). .............................................................................................. 207
Appendix VIII Figure 1: RUNX1 +24 demultiplexed read count................................. 289
Appendix VIII Figure 2: RUNX1 +24 Mapped read count. ......................................... 289
Appendix VIII Figure 3: RUNX1 +24 4C quality evaluation. ...................................... 290
Appendix X Figure 1: RUNX1 +24 trans chromosome 1 reads. .................................. 297
Appendix X Figure 2: RUNX1 +24 chr 1 trans interactions. ....................................... 298
Appendix X Figure 3: RUNX1 +24 chr 3 trans interactions. ....................................... 299
Appendix X Figure 4: RUNX1 +24 chr 9 trans interactions. ....................................... 300
List of Tables
Table 2.1: Plasmids used in this study............................................................................ 47
Table 2.2: Runx1 +24 enhancer G-quadruplexes Oligo ................................................. 49
Table 2.3: Mutant strand synthesis reaction ................................................................... 51
Table 2.4: Cell lines used in this study ........................................................................... 58
Table 2.5: qRT-PCR reaction mix .................................................................................. 62
Table 2.6: Transfection components per well ................................................................ 63
Table 2.7: Analysed cell lines ........................................................................................ 66
Table 2.8: Analysed ENCODE annotations ................................................................... 66
Table 2.9: 4C-PCR reagents .......................................................................................... 74
Table 2.10: Optimised PCR conditions for 4C-PCR primers........................................ 74
Table 2.11: 4C-Seq Library and sequencing adapters ................................................... 76
Table 2.12: Primer sequences and lengths used for demultiplexing of 4C baits ........... 80
Table 2.13: Length of sequenced reads after trimming ................................................. 81
Table 3.1: Runx1 +24 regulatory region ........................................................................ 86
Table 3.2: Identified mouse Runx1 regulatory regions with available classifications ... 88
Table 3.3: Identified human RUNX1 regulatory regions with available classifications . 91
Table 3.4: Identified long range interactions with putative mouse Runx1 regulatory
regions. ........................................................................................................................... 94
Table 3.5: Analysed ENCODE annotations ................................................................. 101
Table 3.6: Conservation and annotations of previously identified Runx1 regulators ... 101
Table 3.7: Cell types with regulatory region RNA expression .................................... 105
Table 3.8: SNP analysis of previously identified Runx1 regulators ............................. 107
Table 3.9: Spread of 237 Runx1 regulatory region mutations that were found in ICGC
dataset represented in a range of primary tissue sites. .................................................. 112
Table 3.10: Conservation and annotations of novel Runx1 regulators ......................... 122
Table 3.11: SNP analysis of novel m-371 enhancer ..................................................... 123
Table 4.1: Conservation and annotations of RUNX1 +24 ............................................ 133
Table 4.2: Type and name of genes that interact with RUNX1 +24 ............................. 153
Table 4.3: Name of genes that interact with Runx1 +24 in HPC7s .............................. 154
Table 4.4: Annotations of identified putative regulatory region connections .............. 157
Table 4.5: Annotations of identified trans interactions ................................................ 158
Table 5.1: Annotations of identified putative regulatory region connections .............. 176
Table 5.2: Number of genes connected to RUNX1 +24 in WT cells ............................ 177
Table 5.3: Number of genes connected to RUNX1 +24 in WT cells ............................ 197
Table 6.1: Summary of conserved RUNX1 regulatory region analysis ........................ 215
Appendix III Table 1: Details of the primers used in this project ............................... 269
Appendix VII Table 1: Hg38 Genome coordinates for regulatory regions ................. 287
Appendix VIII Table 1: Read counts before and after primer demultiplexing ........... 288
Appendix IX Table 1: WT 12 DMSO cis interactions ................................................. 290
Appendix IX Table 2: WT 12 DMSO near-bait interactions ....................................... 292
Appendix XI Table 1: Differentially decreased cis interactions in WT 12 h PMA cells
compared to WT 12 h DMSO K562 cells .................................................................... 301
Appendix XI Table 2: Differentially decreased cis interactions in WT 72 h PMA cells
compared to WT 12 h DMSO K562 cells .................................................................... 301
Appendix XI Table 3: Differentially decreased cis interactions in STAG2 12h DMSO
cells compared to WT 12h DMSO K562 cells ............................................................. 302
Appendix XI Table 4: Differentially decreased cis interactions in STAG2 12 h PMA cells
compared to WT 12 h PMA K562 cells ....................................................................... 303
Appendix XI Table 5: Differentially decreased cis interactions in STAG2 72 h PMA cells
compared to WT 72 h PMA K562 cells ....................................................................... 303
Appendix XII Table 1: Differentially connected near-bait interactions in WT 12 h PMA
cells compared to WT 12 h DMSO K562 cells ............................................................ 305
Appendix XII Table 2: Differentially connected near-bait interactions in WT 72 h PMA
cells compared to WT 12 h DMSO K562 cells ............................................................ 305
Appendix XII Table 3: Differentially connected near-bait interactions in STAG2 12 h
DMSO cells compared to WT 12 h DMSO K562 cells ............................................... 307
Appendix XII Table 4: Differentially connected near-bait interactions in STAG2 12 h
PMA cells compared to WT 12 h PMA K562 cells ..................................................... 316
Appendix XII Table 5: Differentially connected near-bait interactions in STAG2 72 h
PMA cells compared to WT 72 h PMA K562 cells ..................................................... 316
Appendix XIII Table 1: List of gene connecting to RUNX1 +24 connections in each
condition ....................................................................................................................... 321
Appendix XIV Table 1: Differential cis interactions in STAG2 72 h PMA cells compared
to STAG2 12 h DMSO K562 cells ............................................................................... 327
Appendix XIV Table 2: Differential near-bait interactions in STAG2 72 h PMA cells
compared to STAG2 12 h DMSO K562 cells .............................................................. 336
List of symbols and abbreviations
°C Degrees Celsius
3C Chromosome Conformation Capture
3D Three-dimensional
4C Circularised Chromosome Conformation Capture
4C-seq 4C Sequencing
5C Chromosome Conformation Capture Carbon Copy
AGM Aorta-Gonad Mesonephros
AML Acute Myeloid Leukaemia
AML1 Acute Myeloid Leukemia 1 Protein
ANOVA One-Way Analysis of Variance
ATAC-seq Assay for Transposase-Accessible Chromatin Using Sequencing
ATP Adenosine Triphosphate
BET Bromodomain And Extra-Terminal Motif
BLAT Blast-Like Alignment Tool
bp Base Pair
BS Binding Sites
CAGE Cap-Analysis Gene Expression
CD Circular Dichroism
cDNA Complementary DNA
ChIA-PET Chromatin Interaction Analysis Using Paired-End Sequencing
ChIP Chromatin Immunoprecipitation
ChIP-seq Chromatin Immunoprecipitation and Sequencing
chr Chromosome
ChromHMM Chromatin State Predictions
CML Chronic Myelogenous Leukaemia
CMML Chronic Myelomonocytic Leukaemia
CRE cis-Regulatory Element
CTCF CCCTC Binding Factor
DA Dorsal Aorta
DMEM Dulbecco’s Modified Eagle’s Medium
DMSO Dimethyl Sulfoxide
DNA Deoxyribonucleic Acid
DNase I HS DNase I Hypersensitive Sites
DNase-seq DNase I Hypersensitive Sites Sequencing
dpf Days Post Fertilisation
DS-AMKL Down’s Syndrome Acute Megakaryoblastic Leukaemia
dsDNA Double-Stranded DNA
ENCODE Encyclopedia of DNA Elements
eQTL Expression Quantitative Trait Locus
E. coli Escherichia coli
eRNA Enhancer RNA
FACs Fluorescence-Activated Cell Sorting
FBS Fetal Bovine Serum
FISH Fluorescent In Situ Hybridization
FPD Familial Platelet Disorder
G4 G-Quadruplex
GFP Green Fluorescent Protein
GTEx Genotype-Tissue Expression
GWAS Genome-Wide Association Studies
h Hours
H3K27ac Histone H3 Lysine 27 Acetylation
H3K27me3 Histone H3 Lysine 27 Trimethylation
H3K36me1 Histone H3 Lysine 36 Monomethylation
H3K4me1 Histone H3 Lysine 4 Monomethylation
H3K4me2 Histone H3 Lysine 4 Dimethylation
H3K4me3 Histone H3 Lysine 4 Trimethylation
H3K9ac Histone H3 Lysine 9 Acetylation
HE Hemogenic Endothelium
Hi-C Genome-Wide Chromatin Conformation Capture
HL-60 Human Myelocytic Leukaemia Cell Line
HPC7 Mouse Haematopoietic Progenitor Cells
hpf Hours Post Fertilisation
HS Hypersensitivity Sites
HSCs Haematopoietic Stem Cells
HSPC Hematopoietic Stem/Progenitor Cells
IAC Intra-Arterial Clusters
ICGC International Cancer Genome Consortium
ICM Inner Cell Mass
IMDM Iscove’s Modified Dulbecco’s Media
INS Insulator Assay Vector
inv Inversion of Chromosome
K562 Chronic Myelogenous Leukaemia Cells
kb Kilobase
L Litre
LB Luria Broth
LD Linkage Disequilibrium
lncRNA Long Non-Coding RNA
LPA Linear Polyacrylamide
m Mouse
MAF Minor Allele Frequency
mb Megabase
MDS Myelodysplastic Syndrome
µg Microgram
mg Milligram
µL Microlitre
mL Millilitre
mm Millimetre
mM Millimolar
MPN Myeloproliferative Neoplasms
mRNA Messenger Ribonucleic Acid
µS Microsiemens
MTG Monothioglycerol
ng Nanogram
NIH/3T3 Mouse Embryonic Fibroblast Cells
nl Nanolitre
® Registered Trademark
™ Unregistered Trademark
P Passage
P1 Promoter 1
P2 Promoter 2
P3 Promoter 3
PBS Phosphate Buffered Saline
pc Post Conception
pCarA Cardiac Actin Promoter
PCR Polymerase Chain Reaction
pf Post Fertilisation
PMA Phorbol 12-Myristate 13-Acetate
PRC2 Polycomb Repressive Complex 2
PRO-seq Precision Run-On Sequencing
PS Penicillin Streptomycin
QC Quality Control
qRT-PCR Quantitative Reverse-Transcriptase PCR
R3C RNA-Guided Chromatin Conformation Capture
RACE Rapid Amplification Of cDNA Ends
RE Restriction Enzyme
RFP Red Fluorescent Protein
RLU Relative Luciferase Units
RNA Ribonucleic Acid
RNA Pol II RNA Polymerase II
RNA-seq RNA Sequencing
Rpm Revolutions Per Minute
Runx Runt-Related
Runx1 Runt-Related Transcription Factor 1
Runx2 Runt-Related Transcription Factor 2
Runx3 Runt-Related Transcription Factor 3
S.E.M Standard Error of Mean
SAM Sequence Alignment/Map
sc Single Cell
SCF Stem Cell Factor
SDS Sodium Dodecyl Sulfate
SELEX Systematic Evolution of Ligands By Exponential Enrichment
SINE Short Interspersed Element
siRNA Small Interfering RNA
SNP Single Nucleotide Polymorphism
t Translocation of Chromosome
TAD Topologically Associated Domain
TCGA The Cancer Genome Atlas
TFBS Transcription Factor Binding Sites
TFs Transcription Factors
TL Tübingen long fin
TSS Transcription Start Site
UCSC University Of California, Santa Cruz
UV Ultraviolet
V Volt
v/v Volume Per Volume
WT Wild Type
ZED Zebrafish Enhancer Detector
Nomenclature
Specific gene and protein nomenclature is applied to each species. All gene
names are italicised, while protein names are not.
Species            Gene     Protein
Zebrafish          runx1    runx1
Mouse              Runx1    Runx1
Human              RUNX1    RUNX1
Multiple species   Runx1    Runx1
Names of previously identified Runx1 enhancers used throughout the thesis
Name used in thesis    Also known as
+24                    +23 / +24 / eR1 / RE1
m-371                  -371
m-368                  -368
m-354                  -354
m-328                  -328
m-327                  -327
m-322                  -322
m-321                  -321
m-303                  -303
m-101                  -101
m-59                   -59
m-58                   -58
m-48                   -48
m-43                   -43
m-42                   -42
m+1                    +1
m+3                    +3
m+110                  +110
m+181                  +181
m+204                  +204
-139                   -139 / E1
-188                   -188 / E4
-250                   -250 / E6
+62                    +62
intron 5.2             Intron 5.2 / RE2
Chapter 1: Introduction
1.1 Haematopoiesis
Haematopoiesis is the formation, development and differentiation of all cellular
components of the blood, and is fundamentally conserved throughout vertebrate evolution
(Davidson et al., 2004). Haematopoietic stem cells (HSCs) are the source of all blood cells that
are required throughout the lifetime of an organism. HSCs are capable of self-renewal and
differentiation into progenitor cells, which generate specialised haematopoietic cell lineages
(Dobrzycki et al., 2020). Haematopoiesis begins in the early stages of embryo development
(primitive) and carries on into adulthood (definitive) to maintain a steady supply of
haematopoietic cells over the lifespan of an organism (Orkin et al., 2008).
Correct spatiotemporal transcription of haematopoietic genes is essential in the
initiation of embryonic haematopoiesis, as well as in the maintenance of haematopoiesis in
adults (Dobrzycki et al., 2020). Combinations of regulatory factors (transcription factors,
enhancers, mediators, cohesins, insulators, silencers) help coordinate differentiation and
proliferation, ensure survival of haematopoietic cells, and maintain correct levels of
haematopoietic gene expression throughout life (Peng et al., 2018). Misexpression of crucial
haematopoietic genes can lead to failure to produce HSCs, or haematopoietic disorders that can
lead to leukaemia (Wilson et al., 2011). Therefore, understanding the drivers of normal
haematopoiesis can provide opportunities for intervention when haematopoiesis is
dysregulated.
1.1.1 Primitive and definitive haematopoiesis
In vertebrate embryos, haematopoiesis occurs in two sequential waves: “primitive” and
“definitive” (Orkin et al., 2008). Each wave occurs in anatomically distinct locations that vary
throughout development (Figure 1.1). The initial wave, known as primitive haematopoiesis,
specifies endothelial to haematopoietic precursors and takes place in the mammalian yolk sac
(Figure 1.1A) (Davidson et al., 2004; Dzierzak et al., 1998; Ivanovs et al., 2017; Julien et al.,
2016). The primary function of this wave is the production of red blood cells (erythrocytes),
which enable tissue oxygenation as the embryo undergoes rapid growth (Orkin et al., 2008).
The primitive wave is transient and rapidly superseded by the adult-type definitive
haematopoiesis. Definitive haematopoiesis produces multilineage self-renewing HSCs and
allows for the life-long production of all blood cells (Ciau-Uitz et al., 2014) (Figure 1.2). HSC
specification begins in cells from an endothelial precursor, termed the hemogenic endothelium
(HE) (Figure 1.2) (Bonkhofer et al., 2019; Gritz et al., 2016). Expression of the haematopoietic
gene Runx1 defines the distinct HE population (Gritz et al., 2016; Kalev-Zylinska et al., 2002).
These cells appear from HE in the aorta-gonad-mesonephros (AGM) region, then subsequently
colonise the fetal liver (Figure 1.1B) (Bonkhofer et al., 2019; Gritz et al., 2016). Finally,
HSCs permanently seed the bone marrow (Figure 1.1C) (Julien et al., 2016;
Martin et al., 2011). In mammals, initial production of B- and T-lymphocytes occurs in bone
marrow, and then cells migrate to other sites to complete development (Germain, 2002; Loder
et al., 1999) (Figure 1.1C).
Figure 1.1: Schematic of haematopoiesis cell generation locations during human development.
A) Primitive haematopoiesis occurs in the yolk sac blood island (blue) 16-17 days post conception (pc). B) Definitive haematopoiesis begins first in the Aorta-gonad-mesonephros (purple) 4 weeks pc, then in the foetal liver (orange) 5 weeks pc. C) Bone marrow (pink) is the lifelong source of adult definitive haematopoiesis from 12 weeks pc. The thymus (yellow) is the site of T lymphocyte development, whereas the spleen (brown) develops B lymphocytes. Figure is from Thomas (2015).
HSCs are required throughout an organism’s life to replenish precursors of multiple
lineages, as mature haematopoietic cells are predominantly short lived (Orkin et al., 2008).
HSCs must self-renew to maintain their own population, as well as produce differentiated blood
cells. In adult humans (Homo sapiens) and mice (Mus musculus), a limited number of HSCs
reside in the bone marrow and commence the cascade of cell differentiation (Albacker et al.,
2009). HSCs are at the root of progenitor cells which gradually produce mature haematopoietic
cells; this haematopoietic model is described as a multi-step process with cells becoming more
lineage-specific at each step (Figure 1.2). In this model, self-renewing HSCs at the top give
rise to common progenitors/stem cells for the myeloid and lymphoid lineages. Following this
first step in lineage-restriction, myeloid stem cells give rise to proerythroblasts,
megakaryoblasts, myeloblasts and monoblasts; lymphoid stem cells, on the other hand, give
rise to B- and T-lymphocytes and large granular lymphocytes (Orkin et al., 2008).
Proerythroblasts differentiate into erythrocytes; these cells contain haemoglobin and
transport oxygen from the lungs to different sites of the body. Erythrocytes make up 40-45%
of blood volume (Popel, 1989).
White blood cells or leukocytes are generated from both myeloid and lymphoid cell
types and contribute about 1% of blood volume. Leukocytes are crucial in protection against
infection as they form the major part of the innate immune system (Lay et al., 1968; Trinchieri,
1989). Megakaryoblasts produce megakaryocytes and platelets; during thrombopoiesis,
megakaryocytes break into fragments to make platelets. Platelets are fundamental to blood
coagulation and wound healing (Opneja et al., 2019). Myeloblasts generate granulocytes:
basophils, neutrophils and eosinophils. Neutrophils are transient immediate-responders to
general infection, while basophils and eosinophils are predominantly connected to allergies
and inflammation (Ginhoux et al., 2014; van Furth et al., 1972). Monoblasts differentiate into
circulating monocytes and tissue-resident macrophages (Ginhoux et al., 2014; van Furth et al.,
1972).
Lymphoid-derived T-lymphocytes and B-lymphocytes are crucial to adaptive
immunity. Large granular lymphocytes are innate cytotoxic cells that target virally
infected or cancerous cells; they recognise these compromised cells through
immunoglobulin or major histocompatibility complex class I receptors (Lay et al., 1968;
Trinchieri, 1989).
Figure 1.2: Vertebrate primitive and definitive haematopoiesis cell development pathways.
Primitive haematopoiesis is the initial wave of blood production, which is transient and rapidly replaced by the adult-type definitive haematopoiesis (highlighted in grey). Definitive haematopoiesis is the life-long production of all blood cells. Haematopoietic stem cells reside in the bone marrow in adult mammals and yield blood precursors devoted to unilineage differentiation and production of mature blood cells. HSCs are also capable of self-renewal.
1.1.2 Transcription factors involved in haematopoiesis
It is well established that a number of transcription factors are required to regulate the
fate determination of haematopoietic stem cell populations. These transcription factors include
(but are not limited to) Tal1, Runx1, Gata1, Gata2, Lmo2, Spi1, Notch and c-Myb in most
vertebrates (Okuda et al., 2001).
Primitive haematopoiesis is principally regulated by two factors: Gata1 (erythroid) and
Spi1 (myeloid) (Cantor et al., 2002). Mesodermal haematopoietic cell-fate specification is
facilitated by the basic-helix-loop-helix factors (Tal1 and Lmo2) in both haematopoietic waves
(Kim et al., 2007).
Erg and Gata2 are required for HSC survival and self-renewal (Ezoe et al., 2002;
Taoudi et al., 2011). Maintenance of HSCs is also co-ordinated through Notch, Wnt and Hox
signalling pathway factors. Runx1 is crucial for generating HSCs in the AGM; subsequently,
in definitive haematopoiesis, Runx1 protein is required for correct megakaryocyte and
lymphocyte production (Ichikawa et al., 2004; North et al., 1999; North et al., 2004).
Dysregulation of normal haematopoiesis leads to a diverse range of diseases collectively called
haematological malignancies.
1.2 Haematological disorders
Misexpression of transcription factors can lead to failure to produce HSCs, or
haematopoietic disorders that can lead to leukaemia (Dobrzycki et al., 2020). There are many
different types of haematological disorders, including chronic myeloid leukaemia (CML),
acute myeloid leukaemia, familial platelet disorder, and B- or T-cell lymphoblastic leukaemia.
This thesis will focus on myeloid dysplasias (acute myeloid leukaemia and familial platelet
disorder) as RUNX1 has a central role in disease progression.
Disruption of Runx1 leads to the complete loss of definitive haematopoiesis in all
lineages (Kalev-Zylinska et al., 2002; Okuda et al., 1996; Wang et al., 1996). Somatic
mutations in the RUNX1 gene are among the most frequent mutations identified in patients
with myeloid malignancies (Harada et al., 2005; Harada et al., 2009; Hayashi et al., 2017).
RUNX1 (runt-related transcription factor 1) is also known as AML1 (acute myeloid leukaemia
1 protein), as it was initially cloned from a chromosomal breakpoint normally associated with
acute myeloid leukaemia (Meyers et al., 1993; Miyoshi et al., 1991; Okuda et al., 2001).
RUNX1 mutations can result in familial platelet disorder, an inheritable blood disease, which
predisposes individuals to acute myeloid leukaemia (Ito et al., 2015).
1.2.1 Familial platelet disorder
Familial platelet disorder (FPD) is characterised by inherited platelet defects caused by
germline mutations in RUNX1 (Luddy et al., 1978; Song et al., 1999). Clinically, patients
present with mild to moderate thrombocytopenia, platelet dysfunction, bleeding, and an
increased predisposition to developing acute myeloid leukaemia (around 35% of patients)
(Owen et al., 2008). Many RUNX1 variants are reported only in individual families; more than
70 families with inherited RUNX1 mutations have been identified (Sood et al., 2017). FPD can
almost exclusively be attributed to RUNX1 mutation or downregulation. RUNX1 mutations in
FPD generate null, hypomorphic, or dominant-negative alleles (Churpek et al., 2015; Krutein
et al., 2021; Latger-Cannard et al., 2016; Matheny et al., 2007; Song et al., 1999).
1.2.2 Acute Myeloid Leukaemia
Leukaemia is a common name given to cancers that develop in the bone marrow and
blood cells. Leukaemia occurs due to the dysregulation of a normal cell, which then enables
one cell type to proliferate in excess. Leukaemia is normally classified according to the original
cell lineage in which the cancer occurs, i.e., myeloid or lymphoid (Figure 1.2). The leukaemia
is further distinguished into two types: acute leukaemia, which progresses at a fast pace and
chronic leukaemia, which develops over a longer time period (Teittinen et al., 2012).
Therefore, acute myeloid leukaemia (AML) is a fast-developing leukaemia that
occurs in the myeloid cell lineage. Whilst found predominantly in individuals over 65, 15-20%
of AML cases occur in childhood (O'Donnell et al., 2012).
Chromosomal abnormalities such as t(9;21), t(9;11), t(8;21), inv(16), t(15;17) are
clinically distinctive in AMLs and indicate a genetic basis to leukaemogenesis. However, major
chromosome abnormalities account for only 45% of AML cases (Betz et al., 2010). Unlike other
adult cancers, AML genomes have fewer somatic mutations, with a total of 23 genes
significantly mutated (CGARN, 2013). Molecular classification of AMLs can offer prognostic
information (Betz et al., 2010). Somatic driver mutations in genes have been identified in
~80% of patients across various myeloid malignancy subtypes (Shallis et al., 2018). Within
the myeloid malignancies, a median of three somatic mutations per patient was observed
(Shallis et al., 2018). Several studies have detected at least one driver gene mutation in AML
patients (86%) (Bullinger et al., 2010; CGARN, 2013; Hou et al., 2020a; Papaemmanuil et al.,
2016; Yu et al., 2020). RUNX1 and other mutations (TP53, EZH2, ETV6, ASXL1, SRSF2)
predict poor overall survival (Yu et al., 2020). The prognostic significance of these mutations
appears to be maintained irrespective of whether these mutations are early or late events in
disease progression (Yu et al., 2020). Understanding the mutational profile of malignant cells
helps define which treatments will be successful and the likelihood of relapse or achieving
remission (Assi et al., 2019; Conway O’Brien et al., 2014; Papaemmanuil et al., 2016; Shallis
et al., 2018; Tyner et al., 2018).
1.2.3 Leukaemogenesis
Gilliland et al. (2002) suggested a conventional model of leukaemogenesis, in which
they proposed that two different types of genetic mutations were necessary for malignant
transformation of the myeloid precursor (Gilliland et al., 2002; Jordan, 2002). Class I mutations
are understood to lead to unrestrained cellular proliferation and evasion of apoptosis, whereas
Class II mutations are associated with preventing differentiation. It is therefore suggested that
alone, Class I mutations lead to myeloproliferative diseases, whereas Class II mutations lead
to the development of myelodysplastic syndromes (Conway O’Brien et al., 2014).
Mutations in epigenetic modifier genes, including those involved in DNA methylation and histone
modification, have been found in a significant proportion of AML (Jaiswal et al., 2019; Shih et
al., 2012); these mutations do not fit into Class I or II. It has also been proposed that mutations
have to occur at a particular time in cell development and in a certain order to progress to
leukaemia (Murati et al., 2012). This evidence indicates that Gilliland’s conventional model is
an oversimplification of a more complex process. Each novel mutation and its potential roles
need to be thoroughly evaluated so that updated models of AML development can be defined.
As healthy individuals age, the haematopoietic cells become burdened with mutations
typically seen in AML patients (Young et al., 2016; Zink et al., 2017). Abnormal and excessive
growth of myeloid cells (myeloid neoplasms) develops due to a gradual gain of somatic
mutations in haematopoietic cells (Kim et al., 2016). Expansion of these myeloid neoplasms
may lead to a pre-leukaemic state. Pre-leukaemic clones are dynamic and can have several
founder mutations (Kim et al., 2016); even after treatment, these mutations can persist through
to complete remission and then lead to relapse with additional secondary mutations (Shlush et
al., 2014).
Founder haematopoietic mutations are largely characterised by mutations in genes
related to epigenetic control of the genome; these mutations are frequently observed in pre-
leukaemic clones (Desai et al., 2018). Chromatin and RNA splicing-associated factor
mutations can arise later, or alongside epigenetic gene mutations. Additionally, mutations in
transcription factors, and genes encoding signalling proteins occur as late events leading to
AML transformation (Papaemmanuil et al., 2016). Desai et al. (2018) studied 188 patients and
revealed that mutations in epigenetic genes and splicing factors were evident up to 9.6 years before
AML onset; however, mutations in transcription factors were found to be late events tied to the
onset of AML (Desai et al., 2018). Therefore, AML is characterised by combinatorial
mutations that are attained in a hierarchical manner.
1.2.4 AML patient survival and treatment
The genetic and epigenetic profile of malignant cells influences the threat of relapse
and the likelihood of achieving remission (Conway O’Brien et al., 2014). Despite the high
variability in AML subtypes, the outcome for most is unfavourable. Patients aged 60 and under
achieve remission with chemotherapy in approximately 70-80% of cases, whereas for those
over 60 the chance of remission is 50% (Bower et al., 2016; Burnett et al., 2011; Sweeney et al.,
2019). Despite achieving remission, without ongoing treatment most patients subsequently
relapse, typically within 6 months, and relapse is coupled with poor prognosis (Goyal et al.,
2016; Sweeney et al., 2019).
A small number of therapies targeted to mutational events have been developed for
patients with AML, although the current standard of care remains largely unchanged over the
past 30–40 years (Tyner et al., 2018). Rigorous chemotherapy is the conventional treatment
for AML and depending on the cancer’s genetic profile, bone marrow transplantation may also
be used (Ferrara et al., 2008). The combination of cytotoxic chemotherapy drugs used varies
depending on the patient’s age and karyotype (Ferrara et al., 2013). Chemotherapy drugs are
designed to target proliferating cells; however, other cells rapidly divide including those in
bone marrow, gastrointestinal mucosa, hair follicles and gonads, which results in adverse side
effects (Corrie, 2008).
Advanced age in most patients affects their ability to survive the side effects of
chemotherapy (Ferrara et al., 2008). In addition to improved antibiotic, antifungal, and antiviral
drugs to reduce infections, developments to improve supportive care include high quality
platelet transfusions and elimination of post transfusion hepatitis (Ferrara et al., 2013).
Increased understanding of the different disease mechanisms may detect potential therapeutic
targets and refine treatment strategies, and consequently improve survival rates.
Some targeted therapies for leukaemias exist, such as imatinib mesylate (Gleevec),
which is a very effective treatment for Philadelphia chromosome-positive chronic myeloid
leukaemia (Kantarjian et al., 2002a; Kantarjian et al., 2002b). AML treatment has expanded to
include inhibitors for gene mutation-targeted therapies. For example, patients with FLT3-ITD–
mutated AML who were treated with FLT3-inhibitor therapy at diagnosis have demonstrated
significant survival benefit (Daver et al., 2015; DiNardo et al., 2016; Röllig et al., 2015; Stein,
2015; Stone et al., 2015).
AML with mutated transcription factor RUNX1 is considered a biologically distinct
subgroup due to its distinctive clinico-pathologic features and inferior prognosis (Gaidzik et
al., 2016). Currently no targeted or specific therapeutic options exist for RUNX1-mutated
patients, or patients with other high-risk genetic lesions such as chromatin-spliceosome
(including cohesin subunits) mutations, TP53 mutations, or complex cytogenetics (DiNardo et
al., 2016).
Papaemmanuil et al. (2016) found that 4% of AML patients sequenced contained no
driver mutations (Papaemmanuil et al., 2016). The molecular events that cause the
development and pathology of this leukaemia have so far not been completely defined.
Difficulties arise for clinicians when providing prognoses to AML patients without well-known
mutations. Mutations in non-coding regulatory regions of known leukaemia genes, which could
alter gene regulation, rather than mutations in protein-coding sequence,
may explain why not all instances of leukaemia harbour mutations in known genes. The
cell-fate of haematopoietic cells from a common progenitor relies upon the differential expression
of key haematopoietic genes. Mutations in these key genes, such as RUNX1, or changes to gene
regulation, could consequently alter normal blood development, and lead to the development
of leukaemia.
1.2.5 Cohesin mutation contributes to AML
Cohesin is a chromosome-associated multiprotein ring complex made up of four major
subunits: two structural maintenance subunits, Smc1a and Smc3, which form a closed ring
together with Rad21 and Stag1 or Stag2 (Figure 1.3) (Costantino et al., 2020; Feeney et al.,
2010; Gruber et al., 2003; Nasmyth et al., 2009). This multi-subunit protein complex was
originally characterised for its crucial role in sister chromatid cohesion, as it ensures accurate
sister chromatid segregation during mitosis (Downes et al., 1991; Michaelis et al., 1997).
Cohesin also plays a role in DNA double strand break repair, chromatin loop extrusion and
regulation of gene expression. Complete loss of cohesin function stops mitosis and thus results
in cell death. Reductions in cohesin function alter the development of multiple organ systems
in humans, indicating that the correct expression of developmental genes is sensitive to
cohesin dosage (Cuartero et al., 2018b; Deardorff et al., 2020).
Somatic mutations to cohesin subunits are typically heterozygous, mutually exclusive,
and result in loss of function (17% frameshift, 70% missense) (CGARN, 2013). Mutations in
RAD21, SMC3, and STAG1 subunits are always heterozygous, whereas mutations in the X
chromosome-located genes SMC1A and STAG2 can result in complete loss of function due to
male hemizygosity, or wild-type silencing during X-inactivation in females (Kon et al., 2013;
Thota et al., 2014; Tsai et al., 2017). STAG2 and STAG1 have redundant roles in cell division;
therefore, complete loss of STAG2 is tolerated due to partial compensation by STAG1 (Tsai et
al., 2017). Nevertheless, STAG1 and STAG2 have non-redundant roles in facilitating
three-dimensional (3D) genome organisation (Kojic et al., 2018). Combined loss of both STAG2 and
STAG1 is lethal to cells (Benedetti et al., 2017; Van Der Lelij et al., 2017).
Cohesin proteins are mutated, have reduced function, or are overexpressed in multiple
cancer types including brain, breast, prostate, bladder, pancreatic, colorectal, Ewing sarcoma,
and myeloid leukaemia (Balbás-Martínez et al., 2013; Brennan et al., 2013; Brohl et al., 2014;
Evers et al., 2014; Guo et al., 2013; Jones et al., 2012; Rhodes et al., 2011; Solomon et al.,
2013; Stephens et al., 2012; Taylor et al., 2014; Welch et al., 2012). STAG2 is one of 12 genes
that are mutated at significant frequencies in at least four different tumour types (Lawrence et
al., 2014).
Figure 1.3: Structure of cohesin complex.
Smc1 (blue), Smc3 (purple), Rad21 (pink) and Stag1 or Stag2 (yellow) form a tripartite ring. Figure is from Thomas (2015).
Multiple studies observed cohesin mutations in myeloid malignancies, providing
evidence of a role for cohesin in pathogenesis (Au et al., 2016; CGARN, 2013; Kon et al.,
2013; Leeke et al., 2014; Lindsley et al., 2015; Papaemmanuil et al., 2016; Rocquain et al.,
2010; Ruffalo et al., 2015; Thol et al., 2014; Wang et al., 2016; Welch et al., 2012; Welch et
al., 2016). Cohesin insufficiency has been shown to impair differentiation and alter overall
chromatin accessibility, as well as the transcription of RUNX1 (Mazumdar et al., 2015;
Mullenders et al., 2015; Viny et al., 2015).
Cohesin mutant AML is rarely aneuploid, with over half of AML patients exhibiting a
normal karyotype, demonstrating that mitotic defects are not likely to be involved in
leukemogenesis (Kon et al., 2013). AML patients with cohesin mutations do not typically have
a higher observed presence of chromosomal translocations (Fisher et al., 2017a; Heimbruch et
al., 2021). An exception is RAD21 mutations, which have been reported to co-occur with t(8;21) in
AML patients (Tsai et al., 2017; Wilde et al., 2019).
Deep sequencing established that mutations in subunits or regulators of the cohesin
complex occur in 20% of AML samples (CGARN, 2013; Fisher et al., 2017a; Heimbruch et
al., 2021; Leeke et al., 2014; Losada, 2014). Somatic cohesin defects are found in 12% of
myeloid malignancy patients and a further 15% of patients had low cohesin expression (Thota
et al., 2014). Previous research has highlighted that cohesin mutations occur early in
leukaemogenesis and mainly arise in pre-leukaemic cells (CGARN, 2013; Corces et al., 2017;
Kon et al., 2013; Thol et al., 2014; Welch et al., 2012).
Mutations in cohesin subunits occur in other myeloid malignancies, including
myelodysplastic syndrome (MDS), Down Syndrome acute megakaryoblastic leukaemia (DS-
AMKL), myeloproliferative neoplasms (MPN), chronic myelogenous leukaemia (CML), and
chronic myelomonocytic leukaemia (CMML) (Kon et al., 2013; Thota et al., 2014; Yoshida et
al., 2013). Cohesin mutations are recurrent in myeloid malignancies but not in lymphoid
malignancies, consistent with evidence showing cohesin-deficient cells have myeloid-skewed
differentiation (Chen et al., 2019; Cuartero et al., 2018b; Viny et al., 2019). Cohesin-mutated
myeloid leukaemia cells have reduced levels of chromatin-bound cohesin components,
implying a loss of function mechanism (Kon et al., 2013). Cohesin mutations found in patients
are mutually exclusive, illustrating that a mutation in just one member of the complex is
sufficient to decrease cohesin activity (Kon et al., 2013; Thota et al., 2014). Patients with
cohesin mutations have a lower 5-year survival rate (20%) than cohesin-WT
AML patients (48%) (Heimbruch et al., 2021). Cohesin mutations co-occur with other
leukaemogenic mutations, indicating cooperation between pathways (Welch et al., 2012).
Cohesin mutations have been shown to impair the differentiation of HSPCs in humans
and mice (Mazumdar et al., 2015; Mullenders et al., 2015; Viny et al., 2019; Viny et al., 2015).
HSPCs with Stag2 deletion had altered haematopoietic function, increased self-renewal, and
impaired differentiation (Viny et al., 2019). Typically, increased inflammatory signals in
HSPCs promote myeloid differentiation. Chen et al. (2019) demonstrated that HSPCs with
impaired cohesin function had reduced inflammatory gene expression and increased resistance
to differentiation-inducing inflammatory stimuli (Chen et al., 2019). STAG2 is frequently
mutated in myeloid malignancies and shows co-mutation patterns with other drivers, including
RUNX1. Combined loss of Runx1 and Stag2 leads to myeloid-skewed expansion of HSPCs and
MDS in mice (Ochi et al., 2020). Concurrent loss of Stag1 and Stag2 in HSPCs is lethal to
cells. Alone, Stag2 loss decreases chromatin accessibility and the transcription of
differentiation genes (Viny et al., 2019). Cohesin subunits have dose-dependent consequences which
alter chromatin structure and HSC function in mice (Viny et al., 2015). Complete loss of Smc3
induced bone marrow aplasia and mitotic defects, whereas Smc3 haploinsufficiency reduced
transcriptional capacity (Viny et al., 2015). Haploinsufficiency of Smc3 increased self-renewal
of cells and when combined with Flt3-ITD mutation could induce acute leukaemia in mice
(Viny et al., 2015). Chin et al. (2020) highlighted that cohesin mutations are synthetic lethal
with Wnt pathway stimulation, and suggested that cohesin mutant cells could progress to
oncogenesis by enhancing their Wnt signalling (Chin et al., 2020). Depletion of Rad21
enhanced self-renewal in HSPCs by derepression of Polycomb Repressive Complex 2 (PRC2)
target genes (Fisher et al., 2017b). Fisher et al. (2017b) showed that cohesin function is
crucial for proper PRC2-mediated gene silencing, and dysregulation of this epigenetic
mechanism may underlie leukaemogenesis in AML patients with cohesin mutations (Fisher et
al., 2017b).
Together these studies provide evidence that cohesin mutations are a causal event in
acute myeloid leukaemogenesis. Overall, cohesin mutations are implicated as a putative genetic
precursor for development of AML, but how the disease develops from these mutations has
not been resolved. One possibility is that impairment of cohesin’s regulatory role leads to
dysregulation of other important leukaemogenic genes.
1.3 RUNX1 is a leukaemia gene
RUNX1 belongs to the family of Runx transcription factors (TFs), which are essential
for lineage-specific gene expression in major developmental pathways.
Runx genes encode transcription factor proteins defined by the highly-conserved Runt
DNA-binding domain (Figure 1.4). The conserved Runt domain is a fundamental functional
component of Runx proteins (Rennert et al., 2003). The Runt domain is required for DNA
binding and interacting with the conserved non-DNA-binding partner CBFβ (Figure 1.4). Runx
family members all have important roles during lineage-specifying steps in both embryonic
and adult developmental processes; they appear to be involved in cell quiescence, proliferation,
lineage commitment and cell-fate determination in mammals and invertebrates (Coffman,
2003, 2009; Groner et al., 2017).
Figure 1.4: The Runx protein.
The Runt domain (dark blue) of the Runx protein (blue) directs binding to the Runx DNA-motif (grey box) near the promoter of target genes (black right-angled arrow). It also binds with core-binding factor-β (CBF-β) protein (purple). The Runx proteins either activate or repress gene transcription through interactions with other transcription factors (TFs) (orange) and co-activators (arrows), or co-repressors (blocked line).
Runx1 is essential for definitive haematopoiesis (Levanon et al., 2004; Li et al., 2002).
Runx2 is crucial for osteoblast development and proper bone formation (Komori et al., 1997).
Runx3 is needed for the development of dorsal root ganglia proprioceptive neurons and proper
control of gastric epithelium growth (Levanon et al., 2002; Li et al., 2002). Because each Runx
protein has a particular biological function, mutations in Runx genes are often the cause of
specific diseases. RUNX1 mutations can cause FPD (Ito et al., 2015). Point mutations in the
runt domain of RUNX2 are reported in cleidocranial dysplasia, in which patients exhibit
skeletal deformities of the collarbone, teeth and cranium (Ito et al., 2015; Otto et al., 2002).
Point mutations of RUNX3 are associated with stomach and bladder cancers (Ito et al., 2015).
Each Runx gene contains two promoters, P1 (distal) and P2 (proximal), that generate
alternative Runx isoforms (Levanon et al., 2004; Rennert et al., 2003). A dual promoter
structure is a characteristic feature of vertebrate Runx genes, as invertebrates only have P2
(Rennert et al., 2003). Since all vertebrate Runx genes have this conserved dual promoter
structure, it is likely to be crucial for correct regulation (Rennert et al., 2003).
1.3.1 Role of RUNX1 in normal embryonic and adult developmental processes
Runx1 expression is crucial in early myeloid differentiation, as well as playing an
essential role in adult blood development (Okuda et al., 2001). Runx1-/- mice fail to generate
haematopoietic cell clusters from the vessel wall and consequently cannot generate
haematopoietic stem cells (HSCs), leading to mid-gestation lethality (Okuda et al., 1996;
Yokomizo et al., 2001). The Runx1 protein initiates transcription of genes that help control the
development of blood cells. In particular, it plays an important role in the development of
pluripotent haematopoietic stem cells that have the potential to develop into all types of mature
blood cells (Ichikawa et al., 2004; North et al., 1999; North et al., 2004).
As well as haematopoiesis, RUNX1 plays a role in development of dorsal root ganglia,
cranial sensory neurons, myofibroblasts and mesenchymal stem cells, regulation of hair follicle
stem cells, mammary stem cells, gastric stem cells, neural crest stem cells, and some possible
roles in intestinal stem cells and oral epithelial stem cells (Groner et al., 2017; Inoue et al.,
2008; Kim et al., 2014; Kramer et al., 2006; Theriault et al., 2004).
1.3.2 Regulation of Runx1
The dual promoter structure of the Runx genes allows for many different isoforms of
the gene to be produced (Figure 1.5) (Levanon et al., 2004). The regulatory machinery of these
genes and the spatiotemporal regulation of Runx are poorly understood. It is possible that the
spatiotemporal regulation is driven by different 3D arrangements of the chromosome in
different cell types.
In mammals, the dual promoter structure of Runx1 allows for the transcription of at
least three diverse isoforms (Levanon et al., 2004). Transcription from P2 produces Runx1a
and Runx1b, whereas transcription from P1 generates Runx1c (Figure 1.5) (Bee et al., 2009b).
P1 and P2 expression alone is not consistent with Runx1 tissue specificity, suggesting
additional enhancer elements are required for correct initiation of runx1 (Marsman et al., 2014).
The 5’UTR of Runx1c is conserved between mammals and zebrafish, and may be important
for the regulation of Runx1 expression (Marsman et al., 2014). Lam et al. (2009) showed that the
first exon of the P1 transcript was not needed for definitive or primitive haematopoiesis in
zebrafish (Lam et al., 2009).
Expression of Runx1-P1 and -P2 isoforms is tightly controlled during haematopoiesis.
In early mouse haematopoiesis, before the generation of HSCs, expression of the P2 isoforms
is prominent (Bee et al., 2010; Pozner et al., 2007). Runx1-P1 is expressed shortly after P2;
its expression is co-ordinated with the generation of HSCs (Bee et al., 2009b; Bee et al., 2010).
The main isoform in the mouse fetal liver, the main site of definitive HSPC development,
is P1-driven (Bee et al., 2009b).
Horsfield et al. (2007) showed that homozygous rad21 cohesin mutants lack cell-type-specific
expression of runx1 and runx3, highlighting that cohesin is necessary for correct runx
expression. Cohesin may provide an additional mechanism by which Runx expression can be
coordinated with cell cycle, allowing cell proliferation and differentiation (Horsfield et al.,
2007).
Figure 1.5: Comparative genomic organisation of the Runx1 locus in human, mouse and zebrafish.
The exons (grey) are numbered according to Marsman et al. (2014). Untranslated regions are shown as white boxes. Distances between promoters (black right-angled arrows) are indicated in kilobases (kb). Figure is from Thomas (2015).
1.3.3 Role of RUNX1 in AML progression
The developmental transcription factor RUNX1/AML1 has been identified as a classic
leukaemia gene (Okuda et al., 2001). Germline mutations to the Runt binding domain result in
FPD with patients having a predisposition to acute myelogenous leukaemia (Ito et al., 2015).
Disruptions such as truncations and translocations to RUNX1 are frequently associated with
AML (Betz et al., 2010; Tighe et al., 1993).
RUNX1 is important for the differentiation of cells of all haematopoietic lineages, and
there are two categories of genetic changes: translocations and somatic mutations (Bellissimo
et al., 2017). In AML, RUNX1 mutations that occur due to chromosomal translocations have a
favourable prognosis, compared to somatic mutations which are associated with a poorer
prognosis (Byrd et al., 2002; Grimwade et al., 1998; Schlenk et al., 2003).
RUNX1 is altered by chromosomal translocations; these disruptions can generate
oncogenic fusion proteins like RUNX1-ETO t(8;21) and ETV6-RUNX1 t(12;21) (Figure 1.6).
10-15% of all AML cases are characterised by the most common translocation t(8;21)
(Reikvam et al., 2011; Tighe et al., 1993). This translocation creates a chimeric gene of the 5'-
region of the RUNX1 gene fused with the 3'-region of the ETO gene (RUNX1-ETO) (Erickson
et al., 1992; Miyoshi et al., 1993). RUNX1-ETO protein is predicted to bind to several target
genes and dysregulate haematopoiesis which results in AML (Ajore et al., 2010; Gardini et al.,
2008). The breakpoints in characterised RUNX1 translocations are often downstream of the
Runt homology domain (De Braekeleer et al., 2011). ETV6-RUNX1 translocation is the most
frequent gene fusion (~25%) in paediatric acute lymphoblastic leukaemia (ALL) (Sun et al.,
2017). It is important to note that RUNX1 changes can lead to AML as well as to ALL (Sood
et al., 2017). This is not often the case for other genes involved in haematological malignancies.
Translocations can also create truncated RUNX1 proteins that interfere with the wildtype protein
product (Cheng et al., 2018). Chromosomal translocations in RUNX1 often result in
haematopoietic malignancies, though the molecular mechanisms are not often described.
Figure 1.6: RUNX1 gene common isoforms and leukaemic fusion isoforms.
The exons are depicted as grey and blue; the runt binding domain is blue. Untranslated regions are shown as white boxes. Distances between the two promoters P1 and P2 (black right-angled arrows) are indicated in kilobases (kb). Straight arrows indicate common chromosomal breakpoints. Isoforms and common leukaemic fusion isoforms are depicted.
Cheng et al. (2018) characterised a translocation that disrupts a regulatory element
between RUNX1 P1 and P2. This regulatory element was shown to “normally” silence RUNX1
P2 (Cheng et al., 2018). They proposed that the translocation increased P2 transcription and
may activate P1 expression via a positive feedback loop (Cheng et al., 2018). RUNX1a isoform
is transcribed from P2 and its overexpression has been previously noted in AML (Liu et al.,
2009). RUNX1a overexpression has also been observed in patients with myelodysplastic
syndromes and myeloproliferative neoplasms; interestingly, levels of expression increased as
the disease progressed (Sakurai et al., 2017).
Studies have identified breakpoints in similar regions between the distal promoters in
cases of paediatric AML, B-lineage acute lymphoblastic leukaemia, and FPD (Buijs et al.,
2012; Cheng et al., 2018; Gandemer et al., 2007). The occurrence of DNA double strand breaks,
which underpin translocations, has been linked to 3D genome organisation (Cuartero et al.,
2018a).
RUNX1 mutations are significantly associated with increasing age and higher platelet counts.
3% of paediatric and 15% of adult de novo AML patients have somatic RUNX1 point mutations
(Sood et al., 2017). Further mutations are usually required on top of RUNX1 alteration for the
progression to leukaemia (Asou, 2003). Molecular mechanisms for haematopoietic
malignancies include disruption of intron 1, which deregulates promoters and causes RUNX1
overexpression (Gandemer et al., 2007). Aberrant regulation of RUNX1 promoters was seen in
the human myeloid leukaemia (HL-60) cell line, implying that promoter dysregulation may
lead to leukaemogenesis (Marsman et al., 2014).
These previous studies indicate that progression to AML can occur by disruption of
RUNX1 regulation. This disruption can be a translocation which alters the genome structure or
somatic mutations which alter RUNX1 transcription. Cheng et al. (2018) showed that
dysregulation of regulatory elements can produce aberrant RUNX1 expression and promote
leukaemogenesis.
The 3D genome influences the regulation of genes at the transcriptional level
(Holwerda et al., 2012; Lieberman-Aiden et al., 2009). For instance, 3D folding of chromatin
provides a mechanism for distal enhancers to interact with target promoters (Deng et al., 2010).
In cases where no mutation is found in the coding sequence, there may instead be mutations in
regulatory sequences, such as enhancers (Ng et al., 2010).
1.4 Three-dimensional genome structure
Though the genome is often represented as linear in order to view the annotations of its
features, the genome is in reality arranged in three dimensions. The eukaryotic genome is
tightly packaged to sit within the spatial limits of a cell nucleus; yet it must permit movement,
remain flexible and allow for access of transcription and regulatory factors to the DNA
sequence (Deng et al., 2010).
Genome organisation relies on two mechanisms: compartmentalisation and chromatin
loop formation (Nuebler et al., 2018). Nuebler et al. (2018) previously showed that the active
process of loop extrusion counteracts compartmental segregation, which indicates that these
two mechanisms are mutually exclusive (Nuebler et al., 2018). Genome-wide chromatin
conformation capture (Hi-C) is a technique used to systematically interrogate interaction
frequencies between genomic regions. The development of Hi-C technology has demonstrated
the presence of spatially insulated genomic regions: topologically associated domains (TADs),
organised by chromatin loop extrusion (Dixon et al., 2012). Hi-C also revealed that each
genome territory separates into two chromosome compartments (compartmentalisation),
euchromatic (active) and heterochromatic (inactive) (Lieberman-Aiden et al., 2009).
The chromatin loop extrusion model proposes that the cohesin complex and CCCTC-binding factor
(CTCF) form stable DNA loops, which act as functional barriers between active and inactive
chromatin. Loop extrusion by cohesin and CTCF is important for the formation of TADs, and
cohesin and CTCF are enriched at TAD boundaries (Dixon et al., 2012; Rao et al., 2014). Loss
of CTCF relaxes the global domain structures, which results in increased contacts between
domains; loss of cohesin mostly affects interactions within domains (Li et al., 2013b; Zuin et
al., 2014). These observations have led to the chromatin loop extrusion model to describe how
TADs are formed: in this model, cohesin begins to extrude a chromatin loop until it encounters
an obstruction associated with the DNA (CTCF). Think of the DNA as a thread being pulled
through a needle’s eye (cohesin) to make a loop. CTCF only functions as a barrier when
it is bound to the DNA in a specific orientation (Rao et al., 2014); directionality of CTCF sites
is important for loop formation, as sites that anchor loops are convergent (Merkenschlager et
al., 2016).
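The orientation rule described above can be illustrated with a minimal sketch. It is a toy one-dimensional model, not a published simulation: the function name `extrude_loop`, the site coordinates, and the simplification that each side of cohesin halts only at a CTCF motif pointing towards it are all assumptions made for illustration.

```python
from typing import List, Tuple

# Each CTCF site is (position, orientation): '>' is a forward motif, '<' a reverse motif.
# Positions are arbitrary toy coordinates, not real genomic data.
CtcfSite = Tuple[int, str]


def extrude_loop(load_pos: int, ctcf_sites: List[CtcfSite], chrom_len: int) -> Tuple[int, int]:
    """Return the (left, right) loop anchors reached from a cohesin loading position.

    Toy rule (an assumption of this sketch): the left-moving side of cohesin halts
    only at a forward ('>') site and the right-moving side only at a reverse ('<')
    site, so that stable loops are anchored by convergent sites. If no barrier is
    met, extrusion runs to the end of the toy chromosome.
    """
    left_barriers = [pos for pos, orient in ctcf_sites if orient == ">" and pos <= load_pos]
    right_barriers = [pos for pos, orient in ctcf_sites if orient == "<" and pos >= load_pos]
    left = max(left_barriers) if left_barriers else 0
    right = min(right_barriers) if right_barriers else chrom_len
    return left, right


sites = [(100, ">"), (300, "<"), (400, ">"), (450, ">"), (700, "<")]
loop_a = extrude_loop(200, sites, chrom_len=1000)  # anchored by the convergent pair 100/300
loop_b = extrude_loop(500, sites, chrom_len=1000)  # anchored by the convergent pair 450/700
loop_c = extrude_loop(350, sites, chrom_len=1000)  # passes sites facing the wrong way
```

In this sketch, cohesin loaded at position 350 passes the reverse site at 300 and the forward sites at 400 and 450, because none of them faces the approaching side of the complex; the loop is only stabilised at the convergent pair 100/700.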
Compartmentalisation describes the process in which proteins arrange themselves into
condensates and act as membrane-less organelles that cluster specific molecules (Hyman et al.,
2014; Shin et al., 2018). In these condensates there is often co-localisation of similar
transcription factors or polycomb proteins, suggesting that like attracts like. This may explain
why there are euchromatic and heterochromatic compartments (Stadhouders et al., 2019).
1.4.1 Topologically associated domains (TADs)
TADs are characterised by an elevated number of chromatin interactions within a
genomic locality, with fewer interactions to outside regions (Lieberman-Aiden et al., 2009;
Rao et al., 2014). Genes positioned within the same TAD often have the same histone
modifications and are co-regulated (Rao et al., 2014). TADs are present in a wide range of
different metazoan phyla, suggesting that they are ancient features of the genome (Harmston
et al., 2017). TADs have been under intense selective pressure, and act on the same functional
subset of genes, at least back to the common ancestor of chordates and arthropods (Harmston
et al., 2017).
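The defining property of a TAD, more contacts inside the domain than across its borders, can be expressed as a simple ratio. The following is a minimal sketch on an invented toy contact matrix; the function name `intra_inter_ratio` and the domain coordinates are hypothetical, and the sketch ignores the distance-dependent decay of contact frequency that real Hi-C analyses must correct for.

```python
from typing import Dict, List, Tuple


def intra_inter_ratio(matrix: List[List[float]], domains: List[Tuple[int, int]]) -> float:
    """Ratio of the mean contact frequency within domains to that between domains.

    `domains` holds half-open bin ranges (start, end). Only bin pairs where both
    bins fall inside an annotated domain are considered.
    """
    domain_of: Dict[int, int] = {}
    for idx, (start, end) in enumerate(domains):
        for b in range(start, end):
            domain_of[b] = idx

    intra: List[float] = []
    inter: List[float] = []
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only; the matrix is symmetric
            if i in domain_of and j in domain_of:
                target = intra if domain_of[i] == domain_of[j] else inter
                target.append(matrix[i][j])

    if not intra or not inter:
        raise ValueError("need at least one intra-domain and one inter-domain bin pair")
    return (sum(intra) / len(intra)) / (sum(inter) / len(inter))


# Toy 6-bin matrix with two 3-bin domains: 10 contacts within a domain, 2 between domains.
toy = [[10.0 if (i < 3) == (j < 3) else 2.0 for j in range(6)] for i in range(6)]
ratio = intra_inter_ratio(toy, [(0, 3), (3, 6)])
```

A ratio well above 1 indicates that the annotated domains are insulated from one another; for the toy matrix above the ratio is 5.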
Harmston et al., (2017) propose that chromatin structure is mainly defined by the need
to regulate important developmental genes. TADs often contain the regulatory elements
(enhancers, silencers and insulators) required to regulate the genes within that TAD. TADs
containing clusters of regulatory regions are enriched with conserved CTCF sites when
compared to TADs without clusters of regulatory regions (Harmston et al., 2017). This
suggests that these regions are isolated from neighbouring domains. Gómez-Marín et al., (2015)
highlighted that diverging CTCF sites are a signature of TAD borders that are maintained
between species, suggesting that specific 3D organisation is highly conserved and under
intense selection pressure (Gómez-Marín et al., 2015).
Disruption of TAD boundaries allows erroneous contacts with regulatory elements of
neighbouring domains, which leads to misexpression of genes, including the activation of
oncogenes driving cancer progression (Bohlander, 2000; Kloetgen et al., 2019; Look, 1997;
Pope et al., 2014; Rawat et al., 2004; Taub et al., 1982; Zhan et al., 2017).
Taub et al., (1982) described t(8;14) translocations in Burkitt's lymphoma which alter
gene expression through enhancer hijacking (Taub et al., 1982). Rawat et al., (2004) showed
that in the t(12;13) translocation, overexpression of the CDX2 gene is leukaemogenic rather than
the ETV6-CDX2 fusion (Rawat et al., 2004). Chromosomal translocations like RUNX1-ETO
disrupt the well-established and conserved RUNX1 TAD (Figure 1.7).
Figure 1.7: RUNX1 TAD Hi-C contact map of RUNX1 region in K562 cells.
Hi-C data generated by Rao et al., (2014) were visualised in the 3D Genome Browser at 25-kb resolution (Wang et al., 2018). Every square is coloured according to the frequency with which two regions interact. The alternating yellow and blue bars are predicted TADs. RUNX1 is annotated according to its transcription direction (reverse strand). RUNX1 sits within its own isolated TAD.
1.5 Regulators of gene transcription
Most genes display individual characteristics that determine the site, level, and timing
of expression throughout development and the cell cycle (Kadauke et al., 2009). Accurate
spatiotemporal and quantitative gene expression is crucial for normal development and, in
many cases, is achieved by the interaction of promoters with cis-regulatory elements. In
addition, the chromatin must be ‘open’ for external factors to facilitate regulatory functions of
the DNA (Gilbert et al., 2004).
Genomic regions have been shown to be highly conserved across a diverse set of
vertebrate species (Abbasi et al., 2007). While some conserved regions correspond to coding
genes and non-coding RNAs, the remainder are unlikely to produce a functional transcript
(Rebolledo-Jaramillo et al., 2014). Non-coding DNA can play crucial roles in the regulation of
gene expression, as some conserved non-coding regions can contain transcriptional regulatory
modules, such as enhancers, insulators or silencers (Deng et al., 2010).
Highly conserved developmental genes are surrounded by clusters of conserved
regulatory elements. Expression patterns driven by regulatory regions often correlate with the
function of the closest genes; however, this correspondence is not always consistent, as regulatory
elements can control the expression of genes further afield (Marsman et al., 2014; Smemo et al., 2014;
Spieler et al., 2014). Long-range chromatin interactions between promoters and regulatory
elements can be mediated by scaffolding proteins and transcription factors to regulate gene
expression (Phillips-Cremins et al., 2013).
The syntenic organisation of clusters of regulatory regions supports the idea that there
are combinations of regulatory elements controlling the complex expression patterns of
developmental genes. Synteny is required to retain the regulatory regions in cis with the genes
under long range interaction (Ritter et al., 2010).
1.5.1 Cohesin acts with other regulators to determine gene expression
Cohesin controls a large number of genes, as evidenced by altered expression profiles
when the multi-subunit protein is knocked down in mouse embryonic stem cells and Drosophila
neuronal cells (Kagey et al., 2010; Schaaf et al., 2009). One of the potential mechanisms by
which cohesin regulates cell-type specific gene expression is by facilitating long-range
interactions through the formation of DNA loops. These chromatin loops could connect
regulatory elements and gene promoters over large distances (Strom et al., 2012).
Depending on the nature of the regulatory elements, formation of chromatin loops may
either facilitate or inhibit transcription from the interacting promoter. Cohesin and CTCF act
together to form regions that isolate enhancers from interacting with inappropriate gene
promoters (Canela et al., 2017; Phillips-Cremins et al., 2013); this can lead to cell-type and
gene-type specific transcriptional programmes. Simonis et al., (2006) showed that the mouse
β-globin locus is active in the fetal liver and inactive in the fetal brain due to CTCF and cohesin
mediated regulation. CTCF binding motifs sit within the β-globin locus and CTCF binding
varies depending on the cell type; different CTCF binding patterns change the 3D structure of
the β-globin locus, consequently altering regulation in a cell-specific manner (Phillips et al.,
2009; Simonis et al., 2006). In human cells, both CTCF and cohesin were shown to be
necessary for long-range loop formations that are cell-type specific (Hou et al., 2010).
Disease-causing mutations are also found in cellular machinery that maintains 3D
structure, such as cohesin and CTCF. Hnisz et al., (2016) reported leukaemia cell genomes
with recurrent microdeletions that modified CTCF boundary sites. These mutations were
sufficient to activate proto-oncogenes (Hnisz et al., 2016). Schierding et al., (2020) highlighted
that several genes that are coregulated with cohesin loci are themselves intolerant to loss-of-
function mutations. These results emphasised the significance of correct regulation of cohesin
genes and indicated new pathways of cohesin related disease progression (Schierding et al.,
2020). Mutations in cohesin could inhibit cohesin binding to regulatory elements, thus varying
interactions with promoters and consequent gene expression. Alternatively, mutations in
regulatory elements to which cohesin binds could affect the binding of regulatory proteins and
modify transcription of the regulatory element’s gene target. Katainen et al., (2015) reported
recurrent mutations to CTCF and cohesin binding sites in cancer samples (Katainen et al.,
2015).
1.5.2 Cohesin and Runx1
Tissue-specific regulation of runx1 transcription in zebrafish is reliant on cohesin
(Horsfield et al., 2007). Horsfield et al., (2007) observed that a null mutation in the rad21
subunit of cohesin prevented runx1 expression in the haematopoietic mesoderm, but not in
Rohon-Beard neurons in the developing zebrafish embryo (Horsfield et al., 2007). This study
presented the first evidence that the transcriptional role of cohesin is tissue-specific in
haematopoietic precursors.
Subsequently, it was established that zebrafish runx1 is directly bound by cohesin and
CTCF at the promoters and contains an intronic regulatory element between runx1 promoters
(Marsman et al., 2014). Marsman et al., (2014) showed that depletion of cohesin in zebrafish
decreased the total amount of runx1 transcript produced overall, but the relative expression of
P1-regulated runx1 increased. Cohesin depletion in the human myelocytic leukaemia cell line
HL-60 enhanced RUNX1 transcription, implying that cohesin’s transcriptional role could be
conserved in humans (Marsman et al., 2014).
Down’s syndrome patients with acute megakaryoblastic leukaemia (DS-AMKL)
carry three copies of the RUNX1 gene (owing to trisomy 21), as well as a high frequency of
cohesin mutation (53%) (Yoshida et al., 2013). As normal Runx1 expression requires cohesin,
the elevated rate of cohesin mutations in DS-AMKL patients indicates the strong possibility
that cohesin mutation is actively contributing to leukaemia progression (Leeke et al., 2014).
AML patients with mutations in genes regulating RNA splicing, chromatin (cohesin)
or in the RUNX1 gene have poor prognoses, with the various mutations contributing
independently and additively to the disease (Papaemmanuil et al., 2016). RUNX1 and cohesin
mutations co-occur or are enriched in 27-52% of AML samples (Heimbruch et al., 2021).
Mazumdar et al., (2015) showed that cohesin mutant cell lines with impaired differentiation
had increased chromatin accessibility and binding of ERG, GATA2 and RUNX1 (Mazumdar
et al., 2015).
Recently, Antony et al., (2020) showed that mutation of the STAG2 subunit of cohesin
in K562 cells disrupted RUNX1 expression during megakaryocyte differentiation (Antony et
al., 2020). In the STAG2 mutant, RUNX1 showed increased chromatin accessibility. This
research also established that in wildtype K562 cells, RUNX1-P1 had a gradual increase in
transcription during phorbol 12-myristate 13-acetate (PMA)-stimulated megakaryocyte
differentiation. However, STAG2 mutant cells showed an aberrant spike in transcription of
RUNX1-P2 6-12h post-PMA stimulation (Antony et al., 2020). After 48h stimulation the
aberrant RUNX1 transcription observed in STAG2 mutant cells had returned to baseline. This
implies that the initial response to PMA stimulation in cohesin mutant cells (which have increased
chromatin accessibility at RUNX1) leads to unrestrained transcription (Antony et al., 2020).
Antony et al., (2020) decreased BRD4 binding and suppressed enhancer-driven
transcription by using the bromodomain and extra-terminal motif (BET) inhibitor
JQ1 (Antony et al., 2020). BRD4 is a bromodomain-containing protein that interacts with
active enhancers (Bhagwat et al., 2016). When cells were treated with JQ1 concurrently
with PMA, RUNX1-P2 and ERG expression was reduced in parental cells, and the
aberrant expression peaks in STAG2 mutant cells were dramatically dampened. RUNX1-P1
expression is essential for megakaryocyte development and its transcription was completely
blocked by JQ1 in both parental and STAG2 mutant cells. These results suggest that STAG2
depletion deconstrains the chromatin surrounding RUNX1, which produces aberrant enhancer-
amplified transcription in response to a differentiation signal. This study also highlighted that
the correct expression of RUNX1 is enhancer-driven.
Ochi et al., (2020) observed reduced enhancer–promoter loops in Runx1 and Stag2
double knockout mouse models. These double knockouts led to myeloid-skewed expansion
of hematopoietic stem/progenitor cells (HSPC) and myelodysplastic syndromes (MDS) in
mice. The reduced loops were associated with genes important for regulation of HSPCs; these
same genes were downregulated in cohesin-mutated leukaemia samples (Ochi et al., 2020). In
the absence of Stag2 in HSPCs, Stag1 can still maintain TAD boundaries; however, Stag1
cannot retain intra-TAD interactions at differentiation genes (Viny et al., 2019).
The occurrence of RUNX1 and cohesin dysregulation in AML subtypes and in vitro
models provides further evidence that these two factors are likely to play important roles in
normal and abnormal haematopoiesis. Additionally, these studies support previous work that
suggests cohesin mediates interactions between cis-regulatory elements and the promoters of
Runx1 (Antony et al., 2020; Marsman, 2016; Marsman et al., 2014; Ochi et al., 2020).
1.5.3 Regulatory elements
Crucial to the development of multicellular organisms is the ability of cells to acquire
new cell-fates. Cell-fate decisions are driven by environmental cues that activate signal
transduction to the nucleus; these cues activate the transcription factors that define
the specific gene-expression program for each cell type. Transcription factors take effect by
binding to defined DNA motifs within gene regulatory elements to control the rates of
transcription (Griffiths et al., 1999). There are two main types of protein regulated transcription
elements: enhancers and insulators (Figure 1.8).
1.5.3.1 Enhancers
Enhancers are characteristically found in non-coding DNA and
positively regulate promoters to stimulate transcription by recruiting specific transcription
factors, chromatin remodelling proteins, and RNA polymerase II (Marsman et al., 2012).
Enhancers are generally identified by distinct histone modifications, including histone H3
lysine 4 monomethylation (H3K4me1), histone H3 lysine 4 dimethylation (H3K4me2), histone
H3 lysine 4 trimethylation (H3K4me3), histone H3 lysine 9 acetylation (H3K9ac), histone H3
lysine 36 monomethylation (H3K36me1) and histone H3 lysine 27 acetylation (H3K27ac)
(Ong et al., 2012; Zentner et al., 2011). These histone marks determine which regions of DNA
are accessible for transcription factor binding (Gates et al., 2017; Lawrence et al., 2016). The
pattern of histone marks suggests whether a region is active, inactive, or poised, and
accordingly whether the enhancer status correlates with gene activity. Active transcription is
associated with histone marks such as H3K4me3 and H3K9ac at active promoters, H3K4me1
and H3K4me2 at active enhancers, and H3K27ac at both active promoters and enhancers.
Heterochromatic regions are zones of gene inactivation and are associated with H3K27me3
(Heintzman et al., 2007; Kolovos et al., 2012; Pokholok et al., 2005). Active enhancer
sequences are characterised by bound RNA polymerase II (RNA Pol II) and can produce
enhancer RNAs (eRNA) that contribute to gene regulation (Peng et al., 2018). These regions
are often identifiable by the binding of transcription factors, the acetyltransferase p300, cohesin, and/or
CTCF (Zentner et al., 2011). In addition, active enhancers usually lack nucleosomes and are
sensitive to DNase I digestion (Kolovos et al., 2012). Enhancers regulate spatiotemporal gene
expression, and therefore direct developmental programmes (Ong et al., 2012).
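The mark-to-state associations summarised in this paragraph can be written as a small rule table. This is a minimal sketch only: the function name `classify_region`, the state labels, and the order in which the rules are applied (a repressive mark overriding active marks, promoter marks taking precedence over enhancer marks) are assumptions made for illustration, and the rules do not attempt to define a poised state.

```python
from typing import Dict, Set


def classify_region(marks: Set[str]) -> str:
    """Assign a coarse chromatin state from the histone marks detected at a region."""
    # Assumption of this sketch: H3K27me3 overrides any active mark.
    if "H3K27me3" in marks:
        return "inactive"
    # H3K4me3 and H3K9ac are associated with active promoters.
    if marks & {"H3K4me3", "H3K9ac"}:
        return "active promoter"
    # H3K4me1 and H3K4me2 are associated with active enhancers.
    if marks & {"H3K4me1", "H3K4me2"}:
        return "active enhancer"
    # H3K27ac alone marks activity but does not separate promoters from enhancers.
    if "H3K27ac" in marks:
        return "active (promoter or enhancer)"
    return "unclassified"


# Hypothetical regions with invented mark combinations.
examples: Dict[str, Set[str]] = {
    "region_1": {"H3K4me3", "H3K9ac", "H3K27ac"},
    "region_2": {"H3K4me1", "H3K27ac"},
    "region_3": {"H3K27me3"},
    "region_4": set(),
}
states = {name: classify_region(marks) for name, marks in examples.items()}
```

In practice, chromatin states are usually inferred from quantitative ChIP-seq signal rather than from the presence or absence of a mark, so a rule table of this kind is only a conceptual summary.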
The ability of an enhancer to interact with a promoter is independent of its location
or orientation. Enhancers have been traditionally identified based on their ability to drive
transcription in reporter gene assays (Peng et al., 2018). Enhancers can also be predicted by
computational and genome-wide methods, including: computational analysis of conserved non-coding sequence
and transcription factor binding motifs (Pennacchio et al., 2006; Visel et al., 2008; Woolfe et
al., 2004); analysis of chromatin immunoprecipitation and sequencing (ChIP-seq) to locate
binding of transcription factors, mediators, and histone modifications at the regions of interest
(Murakawa et al., 2016; Visel et al., 2009); chromatin accessibility assays to identify areas of
open chromatin (DNase-seq, ATAC-seq) (Buenrostro et al., 2013; Thurman et al., 2012);
detection of enhancer RNAs using methods such as cap analysis of gene expression (CAGE) (Andersson et al.,
2014); revealing promoter-enhancer interactions using chromosome conformation capture (3C,
R3C, 4C, 5C, Hi-C, Promoter capture) (Dekker et al., 2002; Dostie et al., 2006; Lieberman-
Aiden et al., 2009; Simonis et al., 2006); and chromatin interaction analysis using paired-end
sequencing (ChIA-PET) (Fullwood et al., 2009).
1.5.3.2 Silencers and insulators
Conversely, suppression of gene expression during differentiation and cell cycle
progression is facilitated by silencers and insulators. Silencers prevent gene expression by
producing non-coding transcripts that recruit polycomb repressor complexes to target genes
(Boyer et al., 2006).
Insulators block the interaction between enhancers and promoters by confining regions
within repressed or active chromatin boundaries; insulators establish chromatin domains
(Kolovos et al., 2012; Levine, 2010; Splinter et al., 2006; West et al., 2002). Mutation of
insulators changes the pattern of gene expression and can lead to developmental defects
(Feinberg, 2007).
Figure 1.8: Cohesin mediated cis-regulatory DNA elements.
A) An enhancer (green) is recruited to activate gene expression through a physical interaction with the gene promoter (black arrow). Transcription factors (blue) are often present for binding. B) An insulator (orange) excludes the enhancer from interacting with the promoter, which prevents expression. CTCF (red) can demarcate domains of active and inactive chromatin by binding with cohesin (multi-coloured ring) and forming long-range loop formations. Figure is from Thomas (2015).
The three-dimensional genome conformation is crucial for efficient gene regulation.
CTCF and cohesin help mediate enhancer-promoter, insulator-insulator and insulator-promoter
interactions (Raab et al., 2010).
1.5.4 Techniques to determine three-dimensional conformation
A significant portion of our understanding of nuclear organisation has been discovered
through microscopic study of the nucleus, using techniques like fluorescence in situ
hybridisation (FISH) (Dekker et al., 2015). However, as the resolution of these techniques is
low, it is difficult to reliably distinguish between a looped or linear conformation of two regions
(Williamson et al., 2014).
The majority of recent discoveries of sequence-specific chromatin structure have been
made using chromatin capture techniques which map physical genome interactions (Sanyal et
al., 2011; Splinter et al., 2011). 3C, 4C, 5C, and Hi-C involve chemical cross-linking of the
genome to capture regions that are in close proximity, fragmenting chromatin (by digestion
with restriction enzymes), religating cross-linked regions, and identifying the interacting
regions by DNA sequencing (Hakim et al., 2012; Sanyal et al., 2011; Splinter et al., 2011).
Most of these methods allow the unbiased and genome-wide mapping of interactions without
prior knowledge of partners, thus permitting the discovery of new interactions.
Chromosome conformation capture (3C) investigates previously selected regions and
determines if there is an interaction between them (one versus one) (Dekker et al., 2002).
Circular chromosome conformation capture (4C) is used to investigate all interactions with
a chosen region, or ‘bait’, which are detected through next-generation sequencing
(one versus many) (Simonis et al., 2006; Zhao et al., 2006). 4C is typically used to identify all
genomic loci that interact with a particular gene or regulatory element. Chromosome
conformation capture carbon copy (5C) detects all interactions within a genomic region (many
versus many) (Dostie et al., 2006). Hi-C is not restricted to a chosen viewpoint and is used
to analyse all interactions within a genome (Lieberman-Aiden et al., 2009). Hi-C allows for
high throughput analysis and genome interaction maps to be created, which is valuable for
determining genomic structure. However, detection of specific local interactions by Hi-C is
limited by its resolution; local interactions are better detected using 4C.
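The difference between the one versus one (3C) and one versus many (4C) designs can be sketched on a list of ligation junctions. The fragment names, the invented toy data, and the helper functions `pair_count` and `bait_profile` are hypothetical and serve only to illustrate the two viewpoints; they are not part of any published 3C or 4C analysis pipeline.

```python
from collections import Counter
from typing import List, Tuple

# Each ligation junction joins two restriction fragments that were cross-linked together.
Ligation = Tuple[str, str]


def pair_count(ligations: List[Ligation], frag_a: str, frag_b: str) -> int:
    """3C-style question (one versus one): how often were two chosen fragments ligated?"""
    return sum(1 for a, b in ligations if {a, b} == {frag_a, frag_b})


def bait_profile(ligations: List[Ligation], bait: str) -> Counter:
    """4C-style question (one versus many): count every partner ligated to the bait."""
    counts: Counter = Counter()
    for a, b in ligations:
        if a == bait and b != bait:
            counts[b] += 1
        elif b == bait and a != bait:
            counts[a] += 1
    return counts


# Invented toy junctions; 'bait' stands in for a chosen viewpoint fragment.
ligations = [
    ("bait", "f1"), ("f1", "bait"), ("bait", "f2"),
    ("f2", "f3"), ("f3", "f1"), ("bait", "f1"),
]
profile = bait_profile(ligations, "bait")
n_f2_f3 = pair_count(ligations, "f2", "f3")
```

The 4C-style profile requires no prior choice of partner fragment, which is why the approach can reveal previously unknown contacts of a single viewpoint.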
1.5.5 Genome structure and disease
Recent research shows that disrupting the boundaries of TADs severely alters the
spatial organisation of a locus, which allows ectopic enhancer-promoter interactions and leads
to aberrant expression of target genes (Gómez-Marín et al., 2015; Lupiáñez et al., 2015).
Gómez-Marín et al., (2015) demonstrated that vertebrate Six genes share an evolutionarily-
conserved 3D chromatin architecture. When the conserved intergenic region containing the
TAD border between six3 and six2 was removed, there was a dramatic change to the gene
expression profile. This research demonstrates that evolutionarily-conserved TAD borders are
crucial to prevent competition between promoters for enhancers in opposing regulatory
landscapes (Acemel et al., 2017). This concept has been challenged by Williamson et al.,
(2019) who demonstrated that perturbations to the sonic hedgehog (Shh) TAD structure had
little or no detectable effect on Shh expression patterns or levels of Shh expression during
development (Williamson et al., 2019).
Multiple studies have shown the 3D context of the genome is a key determinant in the
development and progression of cancer. The probability of 3D contact between two loci
influences the extent of somatic copy number alterations in cancer cells (Engreitz et al., 2012;
Fudenberg et al., 2011; Zhang et al., 2012). Physical proximity is a major determinant of the
genomic rearrangements that result in translocations (Chiarle et al., 2011; Klein et al., 2011).
Mutations to cohesin and CTCF have been shown to dysregulate 3D organisation of the
genome and perturb normal development (Anania et al., 2020; Hnisz et al., 2016).
Hypermethylation of specific CTCF sequences resulted in a loss of CTCF at a domain
boundary, which caused aberrant expression of an oncogene (Flavahan et al., 2016).
Northcott et al., (2014) and Weischenfeldt et al., (2017) describe somatic variants that
disrupt genome organisation and result in enhancer hijacking in cancer cells (Northcott et al.,
2014; Weischenfeldt et al., 2017).
Disease-associated variants are found in regulatory elements and in genes with equal
frequency, making these regions highly relevant to cancer (Ernst et al., 2011). More than 95%
of genome-wide association studies (GWAS) single nucleotide polymorphisms (SNPs) are
located in intergenic regions, of which over 75% are associated with open chromatin (DNase I
HS sites) implying a strong association to regulatory elements (Maurano et al., 2012). A variant
in a regulatory region may cause differential regulation of a cancer gene, and could be
functionally equivalent to a mutation in the coding sequence of the cancer gene itself. Recent
studies have reported alterations to regulatory regions in cancers (Zhang et al., 2020).
It is therefore possible that mutations in enhancers or insulators may change their function, and
subsequently could lead to altered regulation of key blood development gene(s) and the onset
of leukaemia (Stranger et al., 2007).
Understanding the connection between the 3D organisation of the genome and its
underlying biological function will allow a better interpretation of human pathogenesis.
1.5.6 Regulatory regions of Runx1
Both the specific haematopoietic and non-haematopoietic expression of Runx1 appear
to rely on enhancer elements, as the Runx1 promoters by themselves in vivo are unable to drive
any Runx1-specific activity (Bee et al., 2009a). Antony et al., (2020) highlighted the
enhancer-regulated activity of RUNX1 in aberrant megakaryocytic differentiation (Antony et al., 2020).
Understanding Runx1 enhancers and how they contribute to Runx1 organisation is imperative
to understanding the regulatory landscape of Runx1.
Numerous groups have recognised an intronic enhancer of Runx1 specific to haematopoietic
cells. This mouse regulatory region is variously termed +23/+24/eR1/RE1 (Bee et al., 2009a;
Koh et al., 2015; Markova et al., 2011; Ng et al., 2010; Nottingham et al., 2007). Significantly,
this enhancer was shown to be active in haemogenic endothelial cells and haematopoietic cell
clusters in the AGM equivalent region in both mice and zebrafish (Markova et al., 2011; Ng et
al., 2010; Schütte et al., 2016). Bee et al., (2009) showed the enhancer worked with both
promoters to drive HSC-specific gene expression (Bee et al., 2009a). Interestingly, this well-
known enhancer was also described as a silencer in HEK293 cells (Markova et al., 2011).
Other putative regulatory elements have been identified upstream of Runx1 and
between the P1 and P2 promoters in humans and mice (Figure 1.9) (Cheng et al., 2018; Gunnell
et al., 2016; Markova et al., 2011; Marsman, 2016; Marsman et al., 2017; Ng et al., 2010;
Schütte et al., 2016; Wilson et al., 2016).
Schütte et al., (2016) and Ng et al., (2010) identified 12 possible regulators of Runx1
in mice, with some regions functionally validated (Ng et al., 2010; Schütte et al., 2016). My
previous Master’s project investigated the functions of the putative regulatory regions described
by Ng et al., (2010) (Thomas, 2015). Nine further putative regulators were identified because
they physically interacted with Runx1 P1, P2 or +24 enhancer in mouse hematopoietic
progenitor cells (Marsman, 2016) (Figure 1.9A).
In humans, interactions between the RUNX1 promoters and two regulatory regions were
identified using chromosome conformation capture (Cheng et al., 2018;
Markova et al., 2011). Gunnell et al., (2016) characterised the functions of three potential
RUNX1 regulatory elements (Gunnell et al., 2016) (Figure 1.9B).
Figure 1.9: Identified Runx1 regulatory regions.
Regulatory elements are annotated with orange circles; grey boxes annotate exons. The blue boxes show the exons encoding the Runt binding domain. The two Runx1 promoters are represented by black right-angled arrows. A) Identified human RUNX1 regulatory elements. B) Identified mouse Runx1 regulatory regions.
However, not all of these elements have been tested for regulatory functionality, nor
examined for Runx1 promoter-enhancer interactions using chromosome conformation capture.
Interaction between regulatory elements and promoters of the Runx1 gene cannot be predicted,
as fewer than 50% of enhancers contact the nearest gene promoter. Previous research has used
zebrafish to characterise the functional roles of regulatory elements in vivo and demonstrated
the value of this animal model for studies of non-coding DNA (Bessa et al., 2009; Marsman et
al., 2014).
The spatiotemporal regulation of Runx1 is beginning to be understood; it is possible
that mutations in regulatory regions may explain AML progression in patients who have no
known protein-coding mutations, and perhaps in patients who have cohesin mutations.
1.6 Zebrafish as a model organism
The small tropical zebrafish (Danio rerio) is an important model organism for a variety
of studies. As developmental genes are highly conserved amongst vertebrates, development of
zebrafish can be compared to that of mice and humans (Bopp et al., 2006; Kari et al., 2007;
Lieschke et al., 2007). Advantages of using zebrafish as an animal model for research include
the zebrafish’s rapid development and ability to lay hundreds of eggs in each mating, allowing
for large sample sizes. Additionally, their small size means they are simple and cost-efficient
to maintain and breed. In contrast to mammals, zebrafish eggs are externally fertilised and the
resulting embryos develop outside of the mother. The embryos are transparent, which allows
development to be easily visualised through a microscope (Teittinen et al., 2012).
Zebrafish develop rapidly, with fish reaching sexual maturity by 3-4 months post-
fertilisation. All major organs are formed by 5 days post-fertilisation (dpf) (Haffter et al., 1996)
(Figure 1.10). The zygote remains at the one-cell stage for approximately 40 minutes, allowing
for injections of reporter constructs, which can reveal gene expression in the developing
embryo. At ten hours post-fertilisation (hpf), somites and organs begin to form (Albacker et
al., 2009). The zebrafish begin to hatch from their clear chorions at 48 hpf, and they are
swimming freely by three dpf (Bopp et al., 2006).
Figure 1.10: Diagram of normal Danio rerio (zebrafish) development.
A) 1-cell stage embryo approximately 20 minutes post fertilisation (pf). B) 2-cell stage embryo approximately 40 minutes pf. C) Zebrafish embryo in clear chorion approximately 36 hours pf. D) Zebrafish embryo hatched from chorion approximately 48 hours pf. E) Zebrafish embryo 3-5 days pf. F) Zebrafish reach adulthood at approximately 3-4 months pf. Figure is from Thomas (2015).
Although the fish genome is immensely rearranged compared to the human genome,
several areas of local synteny and larger chromosomal regions are preserved (Martin et al.,
2011). The entire zebrafish genome has been sequenced and 70% of human genes have at least
one obvious zebrafish orthologue (Howe et al., 2013).
1.6.1 Haematopoiesis in zebrafish
The combination of visualisation, manipulation, and large-scale capabilities of
zebrafish allows for novel genome-wide studies. Zebrafish are particularly informative for the
analysis of haematopoietic development as blood circulation begins around 23-26 hpf within
the transparent embryos (Albacker et al., 2009). Comparable to other vertebrates,
haematopoiesis in zebrafish occurs in primitive and definitive waves (Paik et al., 2010;
Rasighaemi et al., 2015). The zebrafish primitive wave of haematopoiesis arises within the
intermediate cell mass (ICM) and generates primitive erythrocytes and limited myeloid
populations (Figure 1.11A). The ICM is functionally analogous to the embryonic yolk sac in
mammals (Davidson et al., 2004). These blood cells are seen in circulation until about four
dpf. The first definitive HSC markers are seen at 26 hpf, in the ventral wall of the dorsal aorta
(Figure 1.11B) (Kulkeaw et al., 2012; Orkin et al., 2008). The ventral wall of the dorsal aorta
of zebrafish is the equivalent of the AGM in mammals (Martin et al., 2011). HSCs emerge by
endothelial-to-haematopoietic transition in the dorsal aorta in zebrafish (Julien et al., 2016).
Finally, in zebrafish the permanent niche of HSCs is the kidney marrow (Figure 1.11C),
whereas in mammals HSCs permanently reside in the bone marrow
(Albacker et al., 2009).
Figure 1.11: Schematic of the sites of haematopoietic cell generation during zebrafish development.
A) Primitive haematopoiesis occurs in the intermediate cell mass (blue) 12 hours post fertilisation (pf). B) Definitive haematopoiesis begins first in the ventral wall of the dorsal aorta (purple) 26 hours pf, then in the caudal haematopoietic tissue (orange) 2 days pf and finally in the kidney marrow (pink) 4 days pf. C) Kidney marrow (pink) is the life-long source of adult definitive haematopoiesis. Figure is from Thomas (2015).
1.7 Overall hypothesis and aims of this PhD
Runx1 has differing isoforms which are crucial to distinctive developmental and cell
cycle processes. Runx1 is essential for definitive haematopoiesis and dysregulation of RUNX1
is frequently associated with AML progression. Despite the well-established role of RUNX1
in haematopoiesis and disease contribution, there is limited understanding of the regulatory
mechanisms that drive cell-specific expression.
The previous identification of the +24 Runx1 regulatory element as an enhancer in
haematopoietic cells in zebrafish and mice, together with the dysregulation of a RUNX1 silencer
element in a leukaemia patient, has highlighted the importance of regulatory elements in driving
correct RUNX1 expression. RUNX1-P1 expression is essential for megakaryocyte development
and is driven by enhancers. Investigating the 3D structure of RUNX1 and the functions of possible
regulatory elements may reveal important regulatory connections and provide a rationale for
disease progression.
Cohesin is crucial for correct genome organisation, gene regulation and
enhancer-promoter interactions. Cohesin mutations are frequently seen in myeloid malignancies and in
several solid tumour types. The occurrence of RUNX1 and cohesin dysregulation in AML
subtypes and in vitro models suggests these two factors are likely to play important roles in
normal and abnormal haematopoiesis.
The overall aim of my study is to explore the potential of regulatory regions and cohesin
to modulate the transcription of Runx1.
I hypothesise that regions that interact with the Runx1 promoters and the Runx1 +24 enhancer
may have enhancer capabilities, and that some may be conserved in humans. Furthermore, I predict
that RUNX1 +24 is an important regulator of RUNX1 expression during megakaryocyte
differentiation, and that when cohesin is reduced the correct genomic organisation cannot be
maintained.
To examine this hypothesis, first, I analysed the putative Runx1 regions
bioinformatically, to establish genomic characteristics and conservation, in order to determine
enhancer potential. I subsequently tested regions identified by 4C for functional ability and
cell-type specific regulation using zebrafish reporter assays (Chapter 3).
Second, I investigated RUNX1 +24 structural and genomic characteristics. I then
identified the significant 3D connections of RUNX1 +24 using chromosome conformation
capture (Chapter 4).
Third, to elucidate important connections of RUNX1 +24 during normal megakaryocyte
differentiation, I used chromosome conformation capture of megakaryocyte differentiation
time points and compared them to the baseline RUNX1 +24 interactions. To understand how
cohesin and 3D chromosome organisation influence RUNX1 transcription and impact disease
progression, I compared the wild type differentiation interaction profiles to those of STAG2 mutant
cells at the same time points of PMA stimulation (Chapter 5).
2 Chapter 2: Methods
2.1 Ethics consent and protocol approval
2.1.1 Zebrafish
The University of Otago Animal Ethics Committee approved all zebrafish work (AEC
#95/14). University of Otago Animal Welfare Office Animal Ethics Training was undertaken
and any manipulations involved larvae no older than 30 days. Relevant work with transgenic
Zebrafish was approved by IBSC/EPA (GMC100225, GMD101727).
2.1.2 Escherichia coli (E. coli)
All E. coli work was undertaken under the approval codes GMC100216, GMC100219,
GMD101715, GMD101717.
2.1.3 Cell culture
All work with human cell lines was carried out under approval codes GMC100228,
GMD101730. Animal cell line work was performed under approval codes GMC100227,
GMD101729.
2.2 Materials
All materials, solutions and buffers used in this study are listed in Appendix I.
2.3 Molecular biology techniques
The mouse regulatory elements were previously identified by Ng et al. (2010) and in
Judith Marsman’s PhD thesis (Marsman, 2016).
2.3.1 Plasmids and reporter constructs
Plasmids used in this project are listed in Table 2.1.
To determine the regulatory potential of selected mouse regulatory regions in vivo,
Zebrafish enhancer detector vector (ZED) and pGL4.23-GW reporter constructs were used.
Mouse regulatory elements were previously cloned into ZED and pGL4.23-GW (Marsman,
2016; Thomas, 2015).
Maps of all plasmids are shown in Appendix II.
Table 2.1: Plasmids used in this study
Plasmid | Relevant characteristics | Antibiotic selection | Source
Zebrafish enhancer reporter plasmids
Zebrafish enhancer detector vector (ZED) | RFP expression across somites as internal control. Cell-specific GFP expression when an enhancer is cloned into vector. | Ampicillin | José Bessa lab, Institute for Molecular and Cell Biology, Portugal (Bessa et al., 2009)
Zebrafish mRNA plasmid
pCS-TP | Tol2 mRNA plasmid for integration of enhancer DNA into zebrafish | Ampicillin | (Kawakami, 2005)
Cell culture enhancer reporter plasmid
pGL4.23-GW | Luciferase assay reporter construct | Ampicillin | Jorge Ferrer lab, Imperial Centre for Translational and Experimental Medicine, Imperial College London (Pasquali et al., 2014)
Cell culture luciferase control plasmid
pRL-TK | Renilla control plasmid for luciferase assay | Ampicillin | Promega
Control plasmid
pUC19 | Control plasmid for transformation and transfection | Ampicillin | Addgene
2.3.2 Nanodrop
Quantity and quality of plasmid DNA, genomic DNA or RNA were measured by
spectrophotometry, using a Nanodrop ND-1000, version 3.81 Spectrophotometer. 1 µL of
DNA or RNA was added to the well to measure sample purity and concentration from the
absorbance spectrum of UV-visible light.
2.3.3 Agarose gel electrophoresis
To determine if the production of PCR, mRNA or a restriction digest was successful,
products were run on an agarose gel. Unless otherwise stated 2 µL of PCR, mRNA or restriction
digest product was mixed with 8 µL of 1x loading dye and loaded onto a 60 mL 1% agarose
gel, with 1x SYBR Green I (ThermoFisher) added before the gel set. A 1 kb plus DNA ladder
(Invitrogen) was used as a marker of band sizes. Gels were run at 100 Volts for 1 hour, then
visualised and photographed using a BioRad UV light transilluminator.
2.3.4 Primer design
Primers were designed using the online program Primer3 (Koressaar
et al., 2007; Untergasser et al., 2012). Primers were selected based on optimised efficiency (i.e.
Primer GC%, 18-13 bp length) and ordered from Integrated DNA Technologies.
2.3.5 Sequence confirmation and alignment
All sequence confirmations were carried out by Genetic Analysis Services (Department
of Anatomy, University of Otago) according to their protocol using a capillary ABI 3730xl
DNA Analyser. 200 ng of plasmid DNA was supplied for sequencing. Primer list is provided
in Appendix III.
To align sequences to the appropriate reference genome, the BLAST-like
alignment tool (BLAT) was used (Kent, 2002). Sequences were viewed using 4Peaks
(Griekspoor et al., 2005). Ambiguous base calls were examined by eye and manually corrected
if necessary.
2.3.6 Measuring Runx1 +24 enhancer G-quadruplexes
To test whether the Runx1 +24 enhancer contains G-quadruplex structures, a circular dichroism
(CD) spectrophotometer was used.
2.3.6.1 Runx1 +24 enhancer G-quadruplexes Oligo design
The G-quadruplex oligos were designed directly from the mouse Runx1 +24 predicted
G-quadruplexes (Appendix IV). A negative control Runx1 +24 oligo was also designed. The
negative oligo should not be able to function as a G-quadruplex.
Oligos (100 nM) were commercially synthesised (Integrated DNA Technologies,
Singapore), and reconstituted in nuclease-free H2O.
Table 2.2: Runx1 +24 enhancer G-quadruplex oligos
Oligo | Sequence
Runx1 +24 G-quadruplex oligo | GGTGGGGGTGGG
Runx1 +24 negative oligo | GGTGTGTGTGGG
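As an illustration only, the contrast between the two oligos in Table 2.2 can be checked against a simple two-tetrad motif: four runs of at least two guanines separated by loops of 1-7 nucleotides. The pattern and the function name below are assumptions made for this sketch; they are not the prediction method used to identify the Runx1 +24 G-quadruplexes (Appendix IV).

```python
import re

# Assumed illustrative motif (not the thesis prediction method):
# four G-runs of >= 2 guanines, separated by loops of 1-7 nucleotides.
G4_PATTERN = re.compile(r"(?:G{2,}[ACGT]{1,7}?){3}G{2,}")


def has_g4_motif(sequence: str) -> bool:
    """Return True if the sequence contains the assumed two-tetrad G4 motif."""
    return G4_PATTERN.search(sequence.upper()) is not None


# Oligo sequences taken from Table 2.2.
oligos = {
    "Runx1 +24 G-quadruplex oligo": "GGTGGGGGTGGG",
    "Runx1 +24 negative oligo": "GGTGTGTGTGGG",
}

for name, seq in oligos.items():
    print(f"{name}: motif present = {has_g4_motif(seq)}")
```

Under this assumed motif the G-quadruplex oligo matches and the negative oligo does not, because the negative oligo retains only two guanine runs of length two or more.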
2.3.6.2 CD Spectrophotometer protocol
To determine whether the predicted Runx1 +24 enhancer G-quadruplexes form in vitro,
ellipticity was measured by CD spectroscopy.
Each oligonucleotide (4 µM) was heated to 95 °C for 10 minutes, then left to cool
slowly in the heating block to room temperature (22 °C) overnight for CD analysis the
following day. CD experiments were carried out in different buffers: 100 mM KCl pH 7
(positive control buffers), 100 mM NaCl, and 100 mM LiCl (negative control buffer). The
desired buffer pH was adjusted using the relative quantities of NaH2PO4/Na2HPO4.H2O and
confirmed using a pH meter.
CD measurements were performed on a CD Spectrophotometer (Department of
Biochemistry, University of Otago) with a 1 mm path length quartz cuvette (approx. 8000
µL). CD spectra were collected across 340 nm to 220 nm in 1 nm increments (continuous
scanning) and the reported spectra corresponded to the average of at least two scans. An
appropriate buffer blank correction was used for all spectra. The scanning speed of 200
nm/minute was used for preliminary investigations, with a response of 1 second and a band
width of 1 nm.
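The scan averaging and buffer blank correction described above can be sketched as follows. This is a minimal illustration that assumes each scan is a list of ellipticity values on the same 340-220 nm grid; the function names and the example numbers are hypothetical and are not instrument output.

```python
from statistics import mean

# Wavelength grid described in the text: 340 nm down to 220 nm in 1 nm steps.
wavelengths_nm = list(range(340, 219, -1))


def average_scans(scans):
    """Average repeated scans point by point (one value per wavelength)."""
    return [mean(points) for points in zip(*scans)]


def blank_corrected(sample_scans, blank_scans):
    """Subtract the averaged buffer blank from the averaged sample spectrum."""
    sample = average_scans(sample_scans)
    blank = average_scans(blank_scans)
    return [s - b for s, b in zip(sample, blank)]


# Hypothetical two-point example (real spectra would have 121 points each).
example = blank_corrected(
    sample_scans=[[1.0, 2.0], [3.0, 4.0]],
    blank_scans=[[0.5, 0.5], [0.5, 0.5]],
)
print(len(wavelengths_nm), example)
```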
CD melting studies were performed to measure whether ellipticity changed as a
function of temperature across all wavelengths. Temperature was increased from 25 °C to 95
°C at a rate of 0.25 °C min-1.
2.3.7 Site-Directed Mutagenesis of Runx1 +24 enhancer
To test whether the G-quadruplexes were functionally important for Runx1 +24
enhancer capability, the G-quadruplex sites in pGL4.23-Runx1+24 were mutated. The QuikChange II
Site-Directed Mutagenesis Kit (Agilent Technologies) was used to mutagenise the Runx1 +24
enhancer within pGL4.23-Runx1+24.
2.3.7.1 Runx1 +24 enhancer Site-Directed Mutagenesis primer design
Primers were designed using the online program QuikChange Primer Design
(https://www.agilent.com/store/primerDesignProgram.jsp) to ensure that the desired mutation
(deletion or insertion) was in the middle of the primer, with ~10-15 bases of correct sequence
on both sides, and that the primer had a minimum GC content of 40%. The list of primers is available in
Appendix III.
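The design criteria above (mutation centred, roughly 10-15 correct bases on each side, GC content of at least 40%) can be expressed as a simple check. This sketch is not the QuikChange Primer Design program; the function names and the example sequences are hypothetical, and the approximate "~10-15" range is treated here as a strict limit for simplicity.

```python
def gc_percent(sequence: str) -> float:
    """Percentage of G and C bases in a sequence."""
    sequence = sequence.upper()
    return 100.0 * sum(base in "GC" for base in sequence) / len(sequence)


def check_mutagenic_primer(left_flank, mutation, right_flank,
                           min_flank=10, max_flank=15, min_gc=40.0):
    """Check flank lengths and GC content for a candidate mutagenic primer.

    For a deletion, pass an empty string as `mutation`.
    """
    primer = left_flank + mutation + right_flank
    flanks_ok = all(min_flank <= len(flank) <= max_flank
                    for flank in (left_flank, right_flank))
    return {
        "primer": primer,
        "flanks_ok": flanks_ok,
        "gc_ok": gc_percent(primer) >= min_gc,
    }


# Hypothetical sequences, for illustration only.
print(check_mutagenic_primer("ACGTACGTACGT", "T", "GGCCGGCCGGCC"))
```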
2.3.7.2 Mutant Strand Synthesis Reaction (Thermal Cycling)
To generate point mutations in pGL4.23-Runx1+24, the mutant strand synthesis
reaction (Table 2.3) was thermal cycled. Thermal cycling was performed in an Applied Biosystems
Thermal Cycler 2.08 as follows: 95 °C for 30 seconds; 12 cycles of 95 °C for 30 seconds, 55
°C for 1 minute and 68 °C for 4 minutes; and finally 72 °C for 2 minutes. Following temperature
cycling, the reaction was cooled on ice for 2 minutes.
Table 2.3: Mutant strand synthesis reaction
Reaction components | Amount per 50 µL reaction volume
PfuUltra HF DNA polymerase reaction buffer (10x) | 5 µL
pGL4.23-Runx1+24 | 50 ng
Oligonucleotide primer 1 | 125 ng
Oligonucleotide primer 2 | 125 ng
dNTP mix | 1 µL
Nuclease-free H2O | to 50 µL
PfuUltra HF DNA polymerase (2.5 U/µL) | 1 µL
2.3.7.3 DpnI Digestion of the Mutant Strand Amplification Products
1 µL DpnI (10U/µL) was added to the amplification reactions and incubated at 37 °C for 1
hour to digest any nonmutated supercoiled template dsDNA. Plasmids were then transformed
into competent cells (Section 2.3.8).
2.3.8 Transformation of plasmids in competent cells
All bacterial work was carried out using standard aseptic techniques.
Competent Escherichia coli (E. coli) STABLE3 (One Shot) and XL1 Blue (One Shot)
cells were used for transformation procedures. STABLE3 and XL1 Blue competent cells were
prepared in house.
pUC19 was used as a positive control for all transformations. All plasmid
transformations were carried out with a LacZ recombination positive control and MilliQ H2O
negative recombination control.
1 µL plasmid DNA was added to a vial of E. coli competent cells and mixed gently and
incubated on ice for 15 minutes. Samples were transferred to a water bath at 42°C for 30
seconds before placing on ice. 250 µL prewarmed Luria Broth (LB) was added and cells were
incubated at 37 °C for 1 hour with shaking (200 rpm).
After transformation, cells were spread on agar plates containing appropriate antibiotic,
Ampicillin (Sigma-Aldrich). The plates for the LacZ transformation control contained 50 µL
of 20 mg/mL X-gal (Sigma-Aldrich). 50 µL of competent cell mixture was spread on the first
half of the prewarmed plate, before cells were pelleted by centrifugation. Most of the
supernatant was discarded, and all remaining cells were resuspended in the remaining
supernatant and spread on the other half of the selective plate. Plates were incubated at 37 °C
overnight.
Recombination efficiency was identified by the blue/white colony ratio on the LacZ
plate. 2-4 individual colonies from successful transformations were picked with a wire loop.
Bacteria were then incubated at 37 °C overnight with shaking (200 rpm) in starter culture (4
mL LB containing 100 µg/mL of the appropriate antibiotic).
To prepare a large amount of plasmid, 1 mL of starter culture was added to 99 mL of
LB containing 100 µg/mL of the appropriate antibiotic and incubated at 37 °C overnight
with shaking (200 rpm).
To establish glycerol stocks of each successful transformation, 750 µL of the starter
culture was combined with 250 µL of glycerol (VWR Prolabo) and stored at -80 °C.
2.3.9 Plasmid purification preparation
After glycerol stocks were made, the remaining bacterial culture was centrifuged at
13,300 g for 1 minute and the supernatant was discarded. Plasmids were prepared using the
Qiagen plasmid Miniprep kit for initial restriction enzyme digestion or sequencing
confirmation. Preparation of plasmids for experiments used the Macherey-Nagel maxi prep kit;
both were executed according to manufacturer’s manual. Plasmid DNA was eluted in 40 µL or
500 µL of elution buffer (5 mM Tris-HCl pH 8.5) respectively and stored at -20 °C.
2.3.10 Tol2 transposase mRNA synthesis
Tol2 sites permit transposon-mediated integration into the zebrafish genome with high
efficiency in the presence of Tol2 transposase mRNA (Kawakami, 2005). In order to
successfully integrate ZED reporter construct into zebrafish host genome, Tol2 mRNA is
required (Bessa et al., 2009). Ambion mMESSAGE mMACHINE® kit was used for in vitro
synthesis of large amounts of capped mRNA. To confirm successful mRNA synthesis the
product was run on a 1% agarose gel according to section 2.3.3.
2.4 In vivo Zebrafish approaches
Feeding, maintenance and breeding were performed according to the Otago Zebrafish
Facility Standard Operating Procedures.
2.4.1 Zebrafish strains
The strains of Zebrafish used in this project were wild-type (WT) AB/PS, and (WT)
WIK. AB/PS is an in-house line, previously created by outcrossing an AB fish with a pet-shop
fish. Transgenic lines created in this project were outcrossed onto Tübingen long fin (TL) fish
for easy differentiation between the transgenic and WT lines.
2.4.2 Maintenance
Zebrafish (Danio rerio) were housed in the Otago Zebrafish Facility (OZF), Pathology
Department, University of Otago, Dunedin. Each tank was kept at a temperature of 24-30 °C,
pH of 6.5-8.5 and a conductivity of 200-1000 µS. Water in the facility was purified using
a ZebTEC system (Tecniplast) of biological filters and UV light. Stocking density was
maintained at or below the maximum number of fish per 3.5 L tank; 15-20 adults or 20-30
younger fish (5-40 days old).
2.4.3 Feeding
Zebrafish were fed three times a day; two dry feedings and one live feeding. The dry
feed (ZM Ltd) was given morning and evening. The live feed was given to adult Zebrafish in
the afternoon and consisted of artemia (brine shrimp). Younger fish were given rotifers, as they
are smaller than artemia.
2.4.4 Breeding and embryo collection
Wild type Zebrafish breeding pairs were set up in specialised breeding tanks the
afternoon prior to embryo collection. The tanks contain a false bottom and a removable barrier
to separate males and females. The false bottom allows the eggs to fall through once they are
laid to prevent the parents consuming them. The natural internal breeding cue is light. The OZF
has automated lighting; from 10 pm until 8 am it is dark. Removal of barriers allows the
zebrafish to mate. To regulate breeding, barriers are removed from the tank just prior to
collection.
Embryos are collected through a fine sieve 10 minutes after laying, washed twice with
system water and transferred to a petri dish with E3 embryo medium. E3 medium promotes
growth and survival. Methylene blue (Sigma-Aldrich) is also added as it acts as an anti-fungal
agent, as zebrafish eggs are prone to fungal infection.
2.4.5 Growing embryos
Embryos are grown in a petri dish in E3 medium at 28.5 °C for normal development or
22.5 °C for slower development. The embryos hatch naturally at 2-3 days post fertilisation or
the chorion can be removed earlier using #55 Dumont Forceps. When the chorions are
discarded the media is changed to E3 without methylene blue.
2.5 Zebrafish microinjection
2.5.1 Needle pulling and calibration
A 1.0 mm OD borosilicate glass capillary (Harvard Apparatus) was pulled to form 2
needles using pulling programme 6 (P= 500, Heat= 600, Pull= 50, Vel = 75, Time= 120) on a
Sutter P-97 Flaming/Brown glass micropipette puller.
Injection solutions were well mixed and 2 µL loaded with a microloader pipette into
the needle. The needle was placed in the needle holder of an air-pressure MPPI3 system. 10
µL of Halocarbon oil 27 (Sigma-Aldrich) was pipetted onto a stage micrometer calibration
slide. Pressing the injection pedal released a droplet of injection solution, which floated on the
oil. The droplet was measured to ensure it was 1 nL (Figure 2.1). Clean forceps were used to trim
the needle tip, and pulse duration was adjusted to change droplet size when necessary. The pulse
duration of the pressure injector was set between 4.5 and 6.5 to minimise needle breakage and
embryo damage.
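The droplet diameter that corresponds to 1 nL on the calibration slide can be estimated from the volume of a sphere. This is a minimal sketch that assumes the droplet floating in oil is approximately spherical; the function name is hypothetical, and the 0.01 mm division size is taken from the calibration slide description.

```python
import math


def droplet_diameter_mm(volume_nl: float) -> float:
    """Diameter (mm) of a sphere with the given volume in nanolitres.

    Uses V = (4/3) * pi * r^3 and the identity 1 nL = 0.001 mm^3.
    """
    volume_mm3 = volume_nl * 1e-3
    radius_mm = (3.0 * volume_mm3 / (4.0 * math.pi)) ** (1.0 / 3.0)
    return 2.0 * radius_mm


diameter = droplet_diameter_mm(1.0)
divisions = diameter / 0.01  # each micrometer division is 0.01 mm
print(f"1 nL droplet: {diameter:.3f} mm across, about {divisions:.1f} divisions")
```

Under the spherical assumption, a 1 nL droplet spans roughly 0.124 mm, or about 12 divisions of the 0.01 mm scale.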
Figure 2.1: Calibration slide with close up of 1 mm scale, each division is 0.01 mm.
The grey circle represents the size of a 1 nL droplet.
2.5.2 Microinjection
Injections were achieved by lining embryos up on a ridged plate and, under
magnification of a Leica M80 microscope, guiding the needle tip through the chorion to pierce
the yolk sac.
ZED injections were optimised, and the following injection mix was used: 30 ng/µL
ZED vector DNA and 120 ng/µL Tol2 transposase mRNA in Ultrapure DNase – RNase free
distilled H2O.
The vector solutions were directly injected into the yolk sac at the one cell stage, to
ensure the dividing cell could take up the plasmid and integrate it into the host genome. For each
injected group, uninjected controls of the same batch were kept to observe normal development
and death rates.
2.5.3 Microscopy
In order to screen embryos for fluorescence, embryos were anaesthetised in 0.5 ng/µL
tricaine and placed on a microscope slide. GFP3 and dsRED2 filters were applied to view
fluorescence.
If fluorescence was observed, samples were removed from E3 tricaine solution and
embryos were mounted in 3% methylcellulose (Sigma-Aldrich). The dechorionated embryos
were orientated with anterior on the left and dorsal at the top. Embryos were first imaged with
bright field, and then filters were employed to photograph the fluorescence.
Images were taken using a Leica M205 fluorescence microscope on the Leica
Application software.
2.6 Transgenic zebrafish analysis
To determine the function of the putative enhancers identified by Dr. Judith Marsman,
ZED zebrafish reporter assays were used. DNA sequences of the putative enhancers were
previously cloned into ZED reporter constructs, then injected into zebrafish
embryos at the one-cell stage. The enhancer test vectors contain Tol2 transposon sites, which allow the reporter gene
constructs to be integrated into the gDNA. Transient expression can be analysed in
F0 embryos, but F0 expression is mosaic. The F1 generation carries a germ line integration of
the constructs, which avoids the mosaic expression pattern seen in F0.
2.6.1 Zebrafish enhancer assay
ZED F0 embryos were dechorionated, then imaged at 20-24 hours post fertilisation (hpf)
or 40 hpf using a Leica M205FA stereomicroscope with a DFC490 camera
and LAS software (Leica Microsystems). Images were processed using Adobe Photoshop.
ZED F1 embryos were also viewed at 20-24 hpf, 40 hpf or 72 hpf and
examined for GFP.
2.7 In vitro Cell culture and maintenance of cell lines
2.7.1 Cell lines
Cell lines were stored in a Thermo Scientific liquid nitrogen dewar. All cells were
maintained at 37 °C in a humidified incubator with 5% CO2.
The HPC7 cell line was cultured in Iscove’s Modified Dulbecco’s Medium (IMDM)
containing 10% fetal bovine serum (FBS) (Gibco, Life Technology), 10% stem cell factor
(SCF) conditioned media, 1% Penicillin Streptomycin (P/S) and 1.5 x 10^-4 M monothioglycerol
(MTG).
BHK/MKL was cultured in Dulbecco’s Modified Eagle’s Medium (DMEM) containing
10% FBS. These cells contain a mouse SCF expression vector and the media collected from
these cell lines was used to produce SCF used for HPC7 media.
PNX and NIH/3T3 were cultured in DMEM containing 10% FBS.
K562 and K562 cell line variations were cultured in IMDM containing 10% FBS.
Differentiation to megakaryocytes was induced using 30 nM phorbol 12-myristate
13-acetate (PMA, Sigma-Aldrich) (Kuvardina et al., 2015). Control treatment of K562 cell line
variations used dimethyl sulfoxide (DMSO) (Sigma-Aldrich), which was prepared in the same
way as PMA.
Table 2.4: Cell lines used in this study
Cell line | Media | Remarks | References
K562 (Chronic Myelogenous Leukaemia cells) | IMDM + 10% fetal bovine serum (FBS) | Sourced originally from ATCC (p10) | (Lozzio et al., 1975)
K562 STAG2 R614* -/- | IMDM + 10% FBS | Generated by Dr. Jisha Antony, a Postdoc in the Horsfield lab (p20) | (Antony et al., 2020)
PNX (Phoenix cells) | DMEM + 10% FBS | Sourced originally from Braithwaite lab (Pathology department, University of Otago, Dunedin, New Zealand) | (Swift et al., 1999)
HPC7 (mouse haematopoietic progenitor cells) | IMDM + 10% SCF + 10% FBS + 1% P/S + 1.5 x 10^-4 M MTG | Sourced originally from Kathy Knezevic (Lowy Cancer Research centre, Sydney, Australia) | (Kolterud et al., 1998)
BHK/MKL (Baby Hamster Kidney cells) | DMEM + 10% FBS | Sourced originally from Kathy Knezevic (Lowy Cancer Research centre, Sydney, Australia) | (Yusta et al., 1999)
NIH/3T3 (Mouse embryonic fibroblast cells) | DMEM + 10% FBS | Sourced originally from Braithwaite lab (p15) (Pathology department, University of Otago, Dunedin, New Zealand) | (Todaro et al., 1963)
2.7.2 Reviving cells from liquid nitrogen stocks
Cell vials were removed from liquid nitrogen stocks and transferred on ice to a 37 °C
water bath. Vials were rapidly thawed and the cell suspension was added to 10 mL of pre-warmed
media in a 15 mL Falcon tube. Cells were pelleted by centrifugation (250 g for 5 minutes) and the
supernatant was aspirated. Cells were resuspended in 10 mL of fresh media and plated in a
medium culture flask.
2.7.3 Splitting cells for new passage
2.7.3.1 Adherent cells (PNX, BHK/MKL, NIH/3T3)
The cells were allowed to form a monolayer and reach 70-90% confluence before
splitting for a new passage. The media was carefully aspirated and adherent cells were washed
with phosphate buffered saline (PBS). 0.005% trypsin (Life Technologies) was added and flask
was then incubated at 37 °C for 3-5 minutes. Media was added to flask to deactivate trypsin.
Cells were gently pipetted up and down to create a single cell suspension. Cells were spun at
250 g for 5 minutes and media was removed. The pellet was resuspended in fresh media to
create single cell suspension. 1 mL of cell suspension was added to a new culture flask with 10
mL fresh media to make a new passage. 1 mL of suspension was collected for cell counting
(Figure 2.2). Number of cells/mL was determined using a haemocytometer slide on a
microscope stage or Luna automatic cell counter.
2.7.3.2 Suspension cells (K562, HPC7)
Cells were observed every day for signs of growth. 1 mL of cell suspension was counted
each day to determine growth patterns, and when the cell density reached 0.7-0.8 x 10^6 cells/mL
the culture was split to approximately 0.4 x 10^6 cells/mL with fresh media. Number of cells/mL
was determined using a haemocytometer slide on a microscope stage. Cells were spun at 250
g for 5 minutes, the media was removed and the pellet was resuspended in fresh media.
2.7.3.3 K562 STAG2 R614* -/- mutant lines
Cells were semi-adherent so were grown in suspension cell flasks and treated as
suspension cells (Section 2.7.3.2).
Figure 2.2: Haemocytometer slide used for counting cells; each shaded grid is 1 mm2.
The black circles represent cells in the grid. The number of cells in each shaded grid is counted. The average is determined by dividing the total by 5. The number of cells/mL is calculated as grid average x dilution factor (2) x 10 000.
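The haemocytometer calculation in Figure 2.2 can be written out as a short worked example. The grid counts below are made up for illustration, and the function name is hypothetical; the dilution factor (2) and the 10 000 multiplier are taken from the figure legend.

```python
def cells_per_ml(grid_counts, dilution_factor=2, chamber_factor=10_000):
    """Cells/mL = average count per shaded grid x dilution factor x 10 000."""
    average = sum(grid_counts) / len(grid_counts)
    return average * dilution_factor * chamber_factor


# Hypothetical counts from the five shaded grids.
counts = [40, 35, 38, 42, 45]
density = cells_per_ml(counts)
print(f"{density:.2e} cells/mL")
```

With these example counts the average is 40 cells per grid, giving 8 x 10^5 cells/mL.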
2.7.4 Freezing cells for liquid nitrogen storage
Cells in single cell suspension (Section 2.7.3) were centrifuged for 5 minutes at 250 g and the
supernatant removed. The pellet was gently suspended in equal parts media and cryoprotectant
(20% DMSO in FBS), 500 µL of each per intended vial of frozen cells, except for K562 and
K562 cell line variations, which were suspended in media with 5% DMSO, 1 mL per intended
vial. The cell suspension was aliquoted into cryovials and then transferred to a Nalgene Mr
Frosty. The Mr Frosty was stored at -80 °C overnight before transfer to the liquid nitrogen dewar.
2.8 Expression analysis
2.8.1 Total RNA purification
To examine gene expression in cell lines, RNA was extracted from K562 and K562
STAG2 R614* mutant cell lines. Cells (5 x 10^5) were collected by centrifugation. The extraction
procedure was carried out in accordance with the RNA purification method described in
Macherey-Nagel user manual. Concentration and purity of RNA was determined by Nanodrop
technique described in section 2.3.2. RNA was eluted in 40 µL Ultrapure DNase – RNase
free distilled H2O (Life Technology) and stored at -80 °C until required.
2.8.2 cDNA synthesis
Complementary DNA (cDNA) was synthesised from the RNA extracted in section
2.8.1 using 4 µL of qScript cDNA SuperMix (Quantabio), 500 ng of total RNA and RNase-
free H2O in a 20 µL reaction. Reverse transcription was performed in an Applied Biosystem
Thermal Cycler 2.08, at 25°C for 5 minutes, 42 °C for 30 minutes and finally 85 °C for 5
minutes. cDNA products were stored at -20 °C until required.
2.8.3 Quantitative reverse-transcriptase PCR (qRT-PCR)
qRT-PCR is a technique that quantitatively measures RNA following cDNA
conversion. This is achieved through real-time PCR. qRT-PCR was used to measure
endogenous ERG, KLF1, P1-/P2- generated, and total RUNX1 expression in K562 and K562
STAG2 R614* mutant cell lines.
Figure 2.3: Quantitative PCR primer locations at human RUNX1.
Schematic of human RUNX1 gene and transcripts are shown with exons in boxes and the 5’ and 3’ untranslated regions in white. P1 and P2 promoters are indicated with a black hooked arrow. Primers used for analysis of P1, P2 and total RUNX1 transcript expression by quantitative PCR are indicated with black arrowheads.
Relative gene expression was normalised to reference genes H-CYCLOPHILIN and
GAPDH. Dr. Jisha Antony previously validated each primer for 80-100% efficiency, with an
external standard curve, generated by serial dilutions of the cDNA. A list of primers is available
in Appendix III.
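Primer efficiency from a serial dilution standard curve is commonly estimated from the slope of Ct against log10 input, using E = 10^(-1/slope) - 1. The sketch below illustrates that general calculation; it is not a record of how the primers in this study were validated, and the function names and the dilution data are hypothetical.

```python
import math


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (x, y) points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def amplification_efficiency_percent(log10_input, ct_values):
    """Efficiency (%) from the standard-curve slope: (10^(-1/slope) - 1) x 100."""
    slope = least_squares_slope(log10_input, ct_values)
    return (10.0 ** (-1.0 / slope) - 1.0) * 100.0


# Hypothetical 10-fold dilution series with perfect doubling per cycle:
# each 10-fold dilution delays Ct by log2(10) cycles.
log10_input = [0.0, -1.0, -2.0, -3.0]
ct_values = [20.0 - x * math.log2(10) for x in log10_input]
print(f"{amplification_efficiency_percent(log10_input, ct_values):.1f}% efficiency")
```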
qRT-PCR mix was made for each primer set, in each cell line (Table 2.5).
Table 2.5: qRT-PCR reaction mix
Reagent | Amount per 20 µL reaction volume
cDNA (500 ng) | 1 µL
3 µM Forward primer | 1 µL
3 µM Reverse primer | 1 µL
SYBR Green (2x) | 10 µL
Ultrapure DNase – RNase free distilled H2O | 7 µL
qRT-PCR was performed using LightCycler® 480 Instrument II (Roche Life Science)
using the following programme: 95 °C for 30 seconds and 40 cycles of 95 °C for 5 seconds, 60
°C for 30 seconds.
Analysis was carried out using qBase software, which utilised the ∆∆CT method. Graphs were
created using Prism 7 (GraphPad).
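For orientation, the textbook form of the ∆∆CT calculation with two reference genes can be sketched as below. This assumes 100% amplification efficiency for every primer pair and uses the arithmetic mean of the reference Ct values; qBase applies its own model, so this is an illustration of the principle rather than a reimplementation. The function name and the Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target_sample, ct_refs_sample,
                        ct_target_control, ct_refs_control):
    """Fold change by 2^-ddCt, assuming 100% amplification efficiency.

    Averaging reference-gene Ct values is equivalent to taking the
    geometric mean of the reference-gene quantities.
    """
    delta_sample = ct_target_sample - mean(ct_refs_sample)
    delta_control = ct_target_control - mean(ct_refs_control)
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: target gene and two reference genes per condition.
fold = relative_expression(
    ct_target_sample=24.0, ct_refs_sample=[18.0, 20.0],
    ct_target_control=26.0, ct_refs_control=[18.0, 20.0],
)
print(f"fold change = {fold}")
```

In this example the target Ct is two cycles lower in the sample than in the control after normalisation, which corresponds to a four-fold increase.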
2.8.4 pGL4.23 Transfection
To determine the ability of the Runx1 +24 G-quadruplex to act as an enhancer in mouse
cell lines, NIH/3T3 cells were transfected with pGL4.23 +24, pGL4.23 +24 g->t or
pGL4.23 +24 t->a. Cultures were plated in a 96-well plate at 8 x 10^3 cells per well, and cells
were 30-50% confluent at time of transfection. Cells were transfected using Lipofectamine
3000 (Invitrogen) according to the manufacturer’s instructions. Plates were incubated at 37 °C
for 48 hours. A Renilla luciferase reporter construct was co-transfected with the mouse
regulatory elements pGL4 reporter constructs. Firefly and Renilla luciferases have distinct
evolutionary origins and dissimilar enzyme structures and substrate requirements. Renilla acts
as a transfection control.
Table 2.6: Transfection components per well
Component | Amount per well
Adherent cells | 8 x 10^3 cells
Lipofectamine 3000 dilution
Opti-MEM Media (Gibco, Life Technology) | 5 µL
Lipofectamine 3000 | 0.2 µL
Master Mix
Opti-MEM Media | 5 µL
Plasmid DNA | 50 ng
P3000 Reagent | 0.2 µL
Renilla | 20 ng
2.8.5 pGL4.23 Transfection analysis
If pGL4.23-GW contains an enhancer there should be an increase in expression of
luciferase, which can be determined with a luminometer. Dual-Glo Luciferase Assay System
(Invitrogen) was used according to manufacturer’s instructions to measure luminescence.
Dual-Glo Reagent was added in equal parts to the volume of culture medium and mixed well.
Firefly luminescence was measured by Victor X3 plate reader after 10 minutes. Dual-Glo
Stop & Glo Reagent, equal in volume to the original culture medium, was added to each well and mixed.
After 10 minutes Renilla luminescence was measured by Victor X3 plate reader.
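Because Renilla acts as the transfection control, firefly readings are typically expressed relative to Renilla in the same well before constructs are compared. The sketch below shows one common way to do this; the comparison against a control construct, the function names and the luminescence values are assumptions for illustration and are not data from this study.

```python
from statistics import mean


def normalised_activity(firefly: float, renilla: float) -> float:
    """Firefly signal relative to the Renilla transfection control."""
    return firefly / renilla


def fold_over_control(test_wells, control_wells):
    """Mean normalised activity of test wells relative to control wells.

    Each well is a (firefly, renilla) pair of luminescence readings.
    """
    test = mean(normalised_activity(f, r) for f, r in test_wells)
    control = mean(normalised_activity(f, r) for f, r in control_wells)
    return test / control


# Hypothetical replicate wells: (firefly, renilla) luminescence counts.
test_wells = [(9000.0, 1000.0), (11000.0, 1000.0)]
control_wells = [(2000.0, 1000.0), (3000.0, 1500.0)]
print(f"fold over control = {fold_over_control(test_wells, control_wells)}")
```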
2.8.6 Statistical analysis of luciferase assays
All analyses were performed in Prism 6. Significance of the differences observed was
assessed using ANOVA.
2.9 In silico approaches for Enhancer analysis
Genomic sequences surrounding the Runx1 locus for human (Homo sapiens) and mouse
(Mus musculus) were obtained from publicly available genome assemblies [human
(GRCh38/hg38 and GRCh37/hg19) assembly, and mouse (mm9 and mm10) assembly] on the
UCSC genome browser (Kent et al., 2002; Waterston et al., 2002). All genome alignments
were performed in accordance with the previous alignments of Ng et al. (2010).
2.9.1 Epigenetic analyses using ENCODE data
ENCODE is the Encyclopaedia of DNA elements, a publicly available database of
functional protein, RNA and DNA elements in the human and mouse genomes, including
epigenetic marks and regulatory factors. Numerous other genomes are available for analysis
(Rosenbloom et al., 2013), but were not used in this study. To identify any patterns among
regulatory elements functionally validated as enhancers, various haematopoietic cell lines were
selected for analysis (Table 2.7). The ENCODE datasets were used to detect the presence of
various histone modifications, DNase I hypersensitivity sites, transcription factors and cohesin
binding sites (Table 2.8) (Barski et al., 2007; Raney et al., 2014; Rosenbloom et al., 2013;
Wilson et al., 2016). Analysis of ENCODE annotations was carried out using data submitted
by Stanford, Yale, University of Washington (UW) and Ludwig Institute for Cancer Research
(LICR).
In order to visualise the combined ENCODE cell line data, all sequences were
annotated using Geneious, program version R8.0.4 (Biomatters) (Kearse et al., 2012).
To determine the enhancer locations in different genome assemblies, the UCSC ‘liftOver’
tool was used to convert coordinates within a species (Rosenbloom et al., 2013).
To determine the human genome co-ordinates for functionally validated Runx1
enhancers, the identified enhancer sequences were searched using the UCSC BLAT tool and the NCBI
BLAST tool (Altschul et al., 1990; Kent, 2002). Further genomic features of the conserved
enhancer regions were deduced by analysing the human coordinates using FANTOM 5
(Andersson et al., 2014; Lizio et al., 2015) and UCSC Genome Browser (Table 2.8).
ENCODE also has Chromatin State predictions (ChromHMM) for K562 cells (Kellis
et al., 2017). ChromHMM annotates the data based on epigenetic information. The predicted
chromatin states are active promoter, promoter flanking, inactive promoter, candidate strong
enhancer, candidate weak enhancer, distal CTCF/candidate insulator, transcription associated,
low activity proximal to active states, polycomb repressed, and heterochromatin /repetitive
/copy number variation. Transcription associated refers to transcriptional transition or
elongation (Ernst et al., 2010; Ernst et al., 2011).
Further regulatory markers for K562 cells were uploaded into the UCSC genome browser,
including super-enhancers (Khan et al., 2016), silencers (Pang et al., 2020) and cohesin-mediated
chromatin accessibility (Antony et al., 2020).
Table 2.7: Analysed cell lines
Cell line | Origin
K562 | Human Chronic Myelogenous Leukaemia cells
Table 2.8: Analysed ENCODE annotations
Regulatory association | ENCODE annotation
Open chromatin | DNase I hypersensitivity (HS)
Transcription initiation | RNA Pol II (phosphorylated at serine 5)
Cohesin (subunit) | RAD21, SMC3
CTCF binding sites | CTCF
Histone acetyltransferase that is recruited to transcriptionally active enhancers (an increase in the expression of target genes) | p300
H3K4me1/2 demethylase (mediates transcriptional repression) | LSD1
Super-enhancer previously annotated by Khan et al. (2016) | K562 Super-enhancers
Silencer previously annotated by Pang et al. (2020) | Silencers
Cohesin mediated chromatin accessibility previously annotated by Antony et al. (2020) | Chromatin sites that show increased accessibility in the absence of cohesin
Chromatin State | ChromHMM
Enhancer (enhancer-associated histone methylation or acetylation marks) | H3K27ac, H3K4me1, H3K4me2
Active promoters | H3K27ac, H3K9ac, H3K4me3
Repressed promoters | H3K9me3, H3K27me3
Haematopoiesis transcription factors
Erythroid development | GATA1
Haematopoietic | SPI1
Blood/stem progenitors | TAL1, GATA2
2.9.2 Comparative analysis of cancer genomic datasets
To assess whether any AML patients had mutations in the conserved regulatory
element regions, publicly available cancer genomic datasets were explored. Online tools were
used to categorise publicly available patient information into different tumour types for
analysis. Sequence data at the RUNX1 locus were downloaded from all available AML cases.
The large-scale cancer genomics datasets were retrieved from the International Cancer
Genome Consortium (ICGC) Data Portal [human Feb. 2009 (GRCh37/hg19) assembly]
(Hudson et al., 2014; Zhang et al., 2011), the cBioPortal for Cancer Genomics [human Feb.
2009 (GRCh37/hg19) assembly] (Cerami et al., 2012; Gao et al., 2013) and The Cancer
Genome Atlas (TCGA) Data Portal [human Mar. 2006 (NCBI36/hg18) assembly] (CGARN,
2013).
2.9.3 Single nucleotide polymorphism (SNP) analysis
To determine whether there were any reported SNPs in each region, the GWAS Catalog
(GRCh38/hg38) (Buniello et al., 2019) and HaploReg v4.1 (GRCh37/hg19) (Ward et al., 2016)
were used. The GWAS Catalog shows the chromosomal locations of SNPs and highlights any publications
that report on the function of each SNP. HaploReg v4.1 displays the genome characteristics and
transcription factor binding data for each SNP and predicts regulatory motif changes.
To establish whether any SNPs had a minor allele frequency (MAF) of >1%, the Genotype-Tissue
Expression (GTEx) portal v8 (GRCh38/hg38) (https://gtexportal.org/home/) was used. The
gnomAD browser (Karczewski et al., 2020) v2.1.1 (GRCh37/hg19) and v3 (GRCh38/hg38) was used
to review SNPs with MAF >1% for any ClinVar associations and/or publications.
To identify which genes a SNP physically interacts with, the CoDeS3D pipeline
(Fadason et al., 2017) was used to interrogate Hi-C chromatin interaction libraries. This
pipeline identifies interactions across the genome with the SNPs, and then identifies SNPs
where the genetic variant correlates with the expression level of the spatially associated partner
gene [the SNP is an expression quantitative trait locus (eQTL)].
2.10 Circular Chromatin Conformation Capture (4C)
Circular chromatin conformation capture (4C) was used to identify all interactions
with a chosen region or ‘bait’ (Figure 2.4). 4C sequencing libraries were prepared from 4C
material, each containing PCR products amplified by bait-specific primer pairs. Chromatin
connections with the bait were detected through the use of next-generation sequencing and data
processing.
Figure 2.4: Schematic outline of 4C process.
Chromatin in close contact is cross-linked using formaldehyde (1), then digested with the 1st Restriction Enzyme (RE) (2). Chromatin is ligated to fuse the RE cut ends of DNA fragments (3). Chromatin is reverse cross-linked using heat (4), then digested by a 2nd RE (5). DNA fragments are then ligated to form small circles, then regions of interest are PCR amplified using bait specific primers (6). Regions of interest are then deep-sequenced (7).
2.10.1 4C Design
To determine the best primers for the chosen 4C bait regions (P1-RUNX1, P2-RUNX1,
RUNX1+24, ERG+85), the sequences for each bait were searched for possible restriction
enzyme digest sites. An appropriate primary restriction enzyme would leave a bait fragment of
>500 bp for 4-bp cutters or >1500 bp for 6-bp cutters. An appropriate secondary enzyme would
cut the bait fragment to leave a fragment ideally >250 bp long. Once appropriate enzymes were selected, the online
program Primer3 was employed (Koressaar et al., 2007; Untergasser et al., 2012). All primers
were aligned to the genome using BLAST and BLAT. Primers were selected based on
optimised efficiency (i.e. Primer GC%, 18-13 bp length, shortest length to the restriction
enzyme site) and ordered from Integrated DNA Technologies. Primers are listed in Appendix
III.
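The fragment-size criterion for the primary enzyme can also be checked in silico. The following Python sketch is illustrative only and is not the procedure used in this thesis: the function names and the toy sequence are hypothetical, and GATC (the DpnII recognition site) is used because DpnII was the primary enzyme in Section 2.10.4.

```python
def find_sites(seq, motif):
    """Return the 0-based start position of every occurrence of motif in seq."""
    sites = []
    start = seq.find(motif)
    while start != -1:
        sites.append(start)
        start = seq.find(motif, start + 1)
    return sites


def bait_fragment(seq, viewpoint, primary="GATC"):
    """Return (start, end) of the primary-enzyme fragment spanning viewpoint.

    The fragment runs from the nearest site at or before the viewpoint to the
    end of the nearest site after it. Returns None if either site is missing.
    """
    sites = find_sites(seq, primary)
    left = [s for s in sites if s <= viewpoint]
    right = [s for s in sites if s > viewpoint]
    if not left or not right:
        return None
    return left[-1], right[0] + len(primary)


def passes_primary_criterion(fragment, min_len=500):
    """True if the bait fragment is longer than min_len bp (500 for a 4-bp cutter)."""
    return fragment is not None and (fragment[1] - fragment[0]) > min_len


# Hypothetical toy locus: two GATC sites separated by 600 bp of filler.
toy_locus = "GATC" + "A" * 600 + "GATC"
toy_fragment = bait_fragment(toy_locus, viewpoint=300)
```

For a 6-bp cutter, the same check would be run with the corresponding recognition site and `min_len=1500`.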
2.10.2 Cross-linking of K562 cells
37% formaldehyde was heated to 65 °C for 15 minutes, with vortexing every five
minutes, then left to cool to room temperature. Two million cells in a 20 mL petri dish were
cross-linked in 2% formaldehyde by addition of 1.1 mL of 37% formaldehyde to the media, mixed
into the media with gentle swirling. Cells were then incubated with gentle rocking in a hybridisation oven
at 21 °C for 10 minutes. To stop cross-linking activity, 1 mL of 2.5 M glycine was added to
media with gentle swirling and then incubated for 5 minutes on ice with occasional swirling.
Cells were scraped off the dish and into a 50 mL falcon tube. Cells were spun at 250 g for 8
minutes at 4 °C. The supernatant was decanted, cells were washed (not resuspended) with 10
mL ice-cold 1x PBS and centrifuged at 250 g for 8 minutes at 4 °C. The supernatant was
decanted, cells were resuspended with 10 mL ice-cold 1x PBS and centrifuged at 250 g for 8
minutes at 4 °C. The supernatant was removed with a 10 mL pipette and the remaining volume
was removed with a p200 pipette. Pellets were stored at -80 °C.
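As a check on the cross-linking and quenching concentrations, the dilution arithmetic can be written out as below. This is a worked sketch under one assumption not stated explicitly in the protocol: that the dish held 20 mL of media before formaldehyde was added. The function name is hypothetical.

```python
def final_concentration(stock_conc, stock_vol_ml, start_vol_ml):
    """Concentration after adding stock_vol_ml of stock to start_vol_ml (C1*V1 = C2*V2)."""
    return stock_conc * stock_vol_ml / (start_vol_ml + stock_vol_ml)


# Assumption: 20 mL of media in the dish before any additions.
media_ml = 20.0

# 1.1 mL of 37% formaldehyde added to the media.
formaldehyde_percent = final_concentration(37.0, 1.1, media_ml)

# 1 mL of 2.5 M glycine added to the 21.1 mL already in the dish.
glycine_molar = final_concentration(2.5, 1.0, media_ml + 1.1)

print(f"formaldehyde: {formaldehyde_percent:.2f} %")
print(f"glycine: {glycine_molar:.3f} M")
```

Under this assumption the formaldehyde concentration works out to roughly 1.9%, consistent with the nominal 2% stated above.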
2.10.3 Isolation of nuclei
Frozen pellets were resuspended in 10 mL lysis buffer (Appendix I) (1 mL first) and
incubated on ice for 30 minutes. Cells were Dounce homogenised 10 times up and down, then
centrifuged at 600 g for 5 minutes at 4 °C. The supernatant was removed and the pellet was
resuspended in 1 mL of ice-cold PBS and transferred to 1.5 mL Eppendorf tube. Cells were
centrifuged at 600 g for 5 minutes at 4 °C. The supernatant was decanted and the remaining
volume removed using a p200 pipette.
2.10.4 First Digestion
The pellets from Section 2.10.3 each contain 2 million cells. Pellets of the same samples were
combined for a total of 4 million cells. Four million nuclei were resuspended in 76 µL
Ultrapure DNase – RNase free distilled H2O and 4 µL 10% sodium dodecyl sulfate (SDS)
and incubated at 60 °C for 10 minutes. SDS was neutralised by addition of 26.72 µL 15%
Triton X-100 and 245.28 µL Ultrapure DNase – RNase free distilled H2O, then incubated at
37 °C for 20 minutes. Any aggregate was broken up every 5 minutes. 16 µL of sample was
taken out as undigested control and stored at -20 °C. 80 µL restriction buffer and 14 µL DpnII
restriction enzyme (50,000U/mL) (for 4 million cells; increased proportionally for more cells)
were added and the total volume was made up to 800 µL with Ultrapure DNase – RNase free
distilled H2O. Samples were digested with tubes placed horizontally at 37 °C for 4 hours without
shaking. Every 10 minutes for the first hour and every 30 minutes after that samples were
gently mixed by inverting the tube. Lids were sealed with parafilm to prevent evaporation.
After digestion, 32 µL of sample was taken out as the digested control and stored at -20°C.
To analyse digestion success, undigested and digested controls were treated with
Proteinase K (Invitrogen). The undigested sample had 2.5 µL of 10 mg/mL Proteinase K and
70 µL of 10 mM Tris-HCl pH 7.5 added. The digested sample had 5 µL of 10 mg/mL Proteinase
K and 58 µL of 10 mM Tris-HCl pH 7.5 added, and both samples were incubated at 65 °C for
1 hour. 20 µL of each sample was run on 0.6% agarose gel for 1 hour at 100 Volts.
2.10.5 First restriction enzyme inactivation
The first restriction enzyme was heat inactivated at 65 °C for 20 minutes.
2.10.6 Ligation
Two 15 mL Falcon tubes were prepared for ligation: one for the ligated sample and one for the
non-ligated sample. To both samples, 2336 µL of Ultrapure DNase – RNase free distilled
H2O, 800 µL ligase buffer, 40 µL adenosine triphosphate (ATP) and 784 µL of sample were added.
For the ligated sample, 40 µL Invitrogen T4 DNA ligase (500 U at 1 U/µL) was added. For the non-ligated
sample, 40 µL of Ultrapure DNase – RNase free distilled H2O was added. Samples were
incubated at 16 °C overnight while shaking vertically at 30 rpm. After incubation at 16 °C,
samples were incubated at room temperature for 30 minutes. After ligation, 90 µL of sample
was taken out as ligated and non-ligated controls and stored at -20 °C.
To check ligation efficiency, ligated and non-ligated controls were treated with 5 µL
of 10mg/mL Proteinase K at 65 °C for 1 hour. 40 µL of ligated and non-ligated controls were
run alongside 20 µL of digested control on 0.6% agarose gel for 1 hour at 100 Volts.
2.10.7 Reverse cross-linking and DNA purification
To reverse crosslink samples, 75 µg (7.5 µL of 10 mg/mL) of Proteinase K was added
to ligated and non-ligated samples and incubated overnight at 65 °C. For samples that were
treated with PMA for 72 hours, 150 µg of Proteinase K was added and samples were incubated
overnight at 65 °C.
RNA was degraded by addition of 75 µg (7.5 µL of 10 mg/mL) RNaseA to samples
and incubation at 37 °C for 45 minutes.
DNA was purified by phenol/chloroform extraction. An equal volume of
Phenol:Chloroform:Isoamyl Alcohol 25:24:1 (Sigma-Aldrich) was added to the samples.
Samples were vortexed for 30 seconds and spun at 4000 g (maximum speed) for 15 minutes at
room temperature. The aqueous phase was transferred to 15 mL Falcon tubes.
DNA was ethanol precipitated by adding 3 M NaOAc pH 5.6 to a concentration of 240
mM and 2.5 volumes absolute ethanol (Scharlab). Samples were vortexed and incubated at -80
°C for 1.5 hours or until frozen solid. Samples were centrifuged at 4000 g for 1 hour at 4 °C.
The supernatant was discarded and 5 mL of 70% ethanol was added. Samples were spun at
4000 g for 15 minutes at room temperature. The supernatant was discarded and tubes were re-
spun to collect any residual solution. Residual supernatant was removed using a p200 pipette.
Pellets were air-dried for around 10 minutes at room temperature and resuspended in 400 µL
of 10 mM Tris-HCl pH 7.5.
To check DNA purification of the undigested and digested sample, 60 µL of the sample
was taken out, 20 µL of which was run on 0.6% agarose gel for 1 hour at 100 Volts.
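The volume of 3 M sodium acetate needed to reach 240 mM in the ethanol precipitation step follows from a simple dilution balance. The sketch below is illustrative and rests on two assumptions that the protocol does not spell out: that 240 mM refers to the concentration before ethanol is added, and that the ethanol volume is calculated on the sample plus salt. The 1000 µL sample volume is a hypothetical example.

```python
def stock_volume_for_final(sample_ul, stock_molar, final_molar):
    """Volume of stock (µL) to add so that stock_molar * v / (sample + v) = final_molar."""
    if final_molar >= stock_molar:
        raise ValueError("final concentration must be below the stock concentration")
    return sample_ul * final_molar / (stock_molar - final_molar)


# Hypothetical example: 1000 µL of aqueous phase after phenol/chloroform extraction.
sample_ul = 1000.0
naoac_ul = stock_volume_for_final(sample_ul, stock_molar=3.0, final_molar=0.240)

# Assumption: 2.5 volumes of ethanol relative to sample plus salt.
ethanol_ul = 2.5 * (sample_ul + naoac_ul)

print(f"3 M NaOAc to add: {naoac_ul:.1f} µL")
print(f"absolute ethanol to add: {ethanol_ul:.1f} µL")
```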
2.10.8 Second digestion
Samples were transferred to 1.5 mL tubes for digestion with CviQI. Each reaction contained 40 µL
of 10x Buffer, 340 µL of sample, 80 U (8 µL) of 10000 U/mL CviQI and 12 µL of Ultrapure
DNase – RNase free distilled H2O (for a total of 400 µL). Reactions were digested overnight at 25 °C
with tubes placed horizontally and shaking at 30 rpm. Lids were sealed with parafilm to
prevent evaporation. After digestion, 10 µL of each sample was taken out as digested control
and combined with 100 µL 10 mM Tris-HCl pH 7.5 and stored at -20 °C.
To analyse digestion success, 20 μL of digested samples were run on a 0.6% agarose
gel for 1 hour at 100 Volts alongside purified ligated samples (Section 2.10.6).
2.10.9 Removal of second restriction enzyme
As CviQI cannot be heat inactivated, phenol:chloroform extraction was used to remove the
restriction enzyme. Total volume was made up to 400 µL with Ultrapure DNase – RNase
free distilled H2O, then an equal volume of phenol:chloroform was added to samples. Samples
were vortexed for 30 seconds, then centrifuged at 15000 rpm (max speed) for 10 minutes at
room temperature. The aqueous phase was transferred to a 2 mL tube.
DNA was ethanol precipitated by adding 3 M NaOAc pH 5.6 to a concentration of 240
mM and 2.5 volumes absolute ethanol. Samples were vortexed and incubated at -80 °C for 1.5
hours or until frozen solid. Samples were centrifuged at 15000 rpm for 20 minutes at 4 °C.
Supernatant was removed and 1 mL of 70% ethanol was added. Samples were spun at 15000
rpm for 5 minutes at room temperature. The supernatant was removed and tubes were re-spun
to collect any residual solution. Residual supernatant was removed using a p200 pipette. Pellets
were air-dried for around 5-10 minutes at room temperature and resuspended in 400 µL of 10
mM Tris-HCl pH 7.5.
To check DNA purification, 10 µL of the sample was taken out, of which 2.5 µL was
run on 0.6% agarose gel for 1 hour at 100 Volts with the digested sample.
2.10.10 Second Ligation
Samples were transferred to 50 mL Falcon tubes for ligation. To each sample, 1120 µL
ligation buffer, 56 µL of ATP and 40 µL ligase were added and the total volume was made up
to 5600 µL with Ultrapure DNase – RNase free distilled H2O. Samples were incubated at
16 °C overnight at 30 rpm.
To check ligation, 80 µL of sample was taken out. 20 µL was run on 0.6% agarose gel
for 1 hour at 100 Volts with the purified digested sample.
2.10.11 DNA purification with linear polyacrylamide (LPA)
DNA was purified by adding 22.4 µL of 20 µg/mL linear polyacrylamide (LPA), NaOAc
to a final concentration of 240 mM, and 2 volumes of ethanol to the samples. Samples were
incubated at -80 °C for 1.5 hours. Samples were then centrifuged at 4000 rpm (max speed) for
1 hour at 4 °C. The supernatant was removed and 5 mL 70% ethanol was added. Samples were
centrifuged at 4000 rpm for 15 minutes at room temperature. The supernatant was removed
and tubes were re-spun to collect any residual solution. Residual supernatant was removed
using a p200 pipette. Pellets were air-dried for around 5-10 minutes at room temperature and
resuspended in 160 µL of 10 mM Tris-HCl pH 7.5.
The concentration of samples was measured using the Qubit® 3.0 Fluorometer (Life
Technologies) and Qubit® double-stranded DNA (dsDNA) High Sensitivity Assay kit (Life
Technologies) according to the manufacturer’s instructions. Samples were stored at -20 °C.
2.10.12 4C Polymerase Chain Reaction
A total of 344 ng of DNA per bait was amplified by Polymerase Chain Reaction (PCR)
using Q5® Hot Start High-Fidelity 2x Master mix (New England Biolabs). For the final 4C-
PCR reactions a no-template control was used to control for DNA contamination. PCR reaction
conditions are listed below. Optimised amplification conditions, per primer set, are listed in
Table 2.10. Primer sequences are listed in Appendix III. Amplified products were run on 1.5%
agarose gel for 45 minutes.
Table 2.9: 4C-PCR reagents
Reagent | Amount per 25 µL reaction volume
Q5® High-Fidelity 2x Master Mix | 12.5 µL
10 µM Forward primer | 1.25 µL
10 µM Reverse primer | 1.25 µL
Template DNA | 86 ng
Nuclease-free H2O | to 25 µL
The PCR protocol was optimised by varying the temperature of the annealing phase (63.2
°C-67.8 °C). Amplification was performed in an Applied Biosystems Thermal Cycler 2.08 as
follows: 98 °C for 30 seconds; 30 cycles of 98 °C for 10 seconds, 63.2 °C-67.8 °C for 15
seconds and 72 °C for 2 minutes; a final extension at 72 °C for 2 minutes; and a hold at 4 °C.
Table 2.10: Optimised PCR conditions for 4C-PCR primers
Primer set | Annealing temperature (°C)
RUNX1 P1 | 64.7
RUNX1 P2 | 67.8
RUNX1 +24 | 63.2
ERG +85 | 64.2
2.11 4C-seq Library preparation
2.11.1 PCR purification
PCR products were cleaned up using the QIAquick PCR purification kit (Qiagen)
according to the manufacturer’s instructions. For each 4C-PCR, one column was used (the
maximum binding capacity of a column is 10 µg). DNA was eluted from the column in a total of
30 µL elution buffer (5 mM Tris-HCl pH 8.5).
The concentration of samples was measured using the Qubit® 3.0 Fluorometer (Life
Technologies) and Qubit® double-stranded DNA (dsDNA) High Sensitivity Assay kit (Life
Technologies) according to the manufacturer’s instructions and samples were stored at -20 °C.
2.11.2 Adaptor ligation
Adapter ligation was performed by Dr. Jisha Antony (Horsfield lab, University of Otago).
Libraries were prepared using TruSeq® DNA PCR-Free preparation kit (Illumina)
according to the manufacturer’s instructions for Low Sample preparation. Based on the lowest
DNA concentration, 29 ng from each bait were pooled according to sample and the total
volume was made up to 50 µL with Ultrapure DNase – RNase free distilled H2O.
Purified amplicon concentration was measured by Qubit® Fluorometer using dsDNA
High Sensitivity assay kit and average size of PCR products was measured on a 2100
Bioanalyser (Agilent Technologies) using a High Sensitivity DNA Kit (Agilent Technologies)
according to the manufacturer’s instructions. Bioanalyser data was analysed using 2100 Expert
software (Agilent Technologies).
Table 2.11: 4C-Seq Library and sequencing adapters
Library | TruSeq® i7 index | i7 index sequence | TruSeq® i5 index | i5 index sequence
WT_12_DMSO_1 | UDI0001 | CCGCGGTT | UDI0001 | AGCGCTAG
WT_12_DMSO_2 | UDI0002 | TTATAACC | UDI0002 | GATATCGA
WT_12_DMSO_3 | UDI0003 | GGACTTGG | UDI0003 | CGCAGACG
WT_12_DMSO_no_ligase | UDI0004 | AAGTCCAA | UDI0004 | TATGAGTA
WT_12_PMA_1 | UDI0005 | ATCCACTG | UDI0005 | AGGTGCGT
WT_12_PMA_2 | UDI0006 | GCTTGTCA | UDI0006 | GAACATAC
WT_12_PMA_3 | UDI0007 | CAAGCTAG | UDI0007 | ACATAGCG
WT_72_PMA_1 | UDI0024 | ATGAGGCC | UDI0024 | GTTAATTG
WT_72_PMA_2 | UDI0009 | AGTTCAGG | UDI0009 | CCAACAGA
WT_72_PMA_3 | UDI0010 | GACCTGAA | UDI0010 | TTGGTGAG
WT_72_PMA_no_ligase | UDI0012 | CTCTCGTC | UDI0012 | TATAACCT
ST37_12_DMSO_1 | UDI0021 | GGTACCTT | UDI0021 | AAGACGTC
ST37_12_DMSO_2 | UDI0022 | AACGTTCC | UDI0022 | GGAGTACT
ST37_12_DMSO_3 | UDI0015 | GGCTTAAG | UDI0015 | TCGTGACC
ST37_12_DMSO_no_ligase | UDI0016 | AATCCGGA | UDI0016 | CTACAGTT
ST37_12_PMA_1 | UDI0017 | TAATACAG | UDI0017 | ATATTCAC
ST37_12_PMA_2 | UDI0018 | CGGCGTGA | UDI0018 | GCGCCTGT
ST37_12_PMA_3 | UDI0019 | ATGTAAGT | UDI0019 | ACTCTATG
ST37_72_PMA_1 | UDI0020 | GCACGGAC | UDI0020 | GTCTCGCA
ST37_72_PMA_2 | UDI0013 | CCAAGTCT | UDI0013 | AAGGATGA
ST37_72_PMA_3 | UDI0014 | TTGGACTC | UDI0014 | GGAAGCAG
ST37_72_PMA_no_ligase | UDI0023 | GCAGAATT | UDI0023 | ACCGGCCA
2.11.3 Library purification
Library purification was performed by Dr. Jisha Antony (Horsfield lab, University of Otago).
Libraries were purified by removing fragments above 1 kb using Agencourt®
AMPure® XP beads (Beckman Coulter). The total volume was made up to 50 µL with
Ultrapure DNase – RNase free distilled H2O. Beads were equilibrated at room temperature for at least 30
minutes and vortexed briefly before use. A 0.5x volume of magnetic beads was added per sample and
mixed at 1800 rpm at room temperature on the thermomixer for 1 minute and then incubated
at room temperature for further 5 minutes. Samples were spun at 300 g for 1 minute at room
temperature. Beads were separated using a magnet until solution was clear (around 5 minutes)
and supernatant was removed. Beads were washed twice with 200 µL of 70-80% ethanol
(freshly made). Tubes were air dried at room temperature for 5 minutes. The magnet was
removed and 20 µL of TruSeq® DNA PCR-Free Resuspension buffer was added and mixed
until homogeneous. Solutions were incubated for 5 minutes at room temperature. The magnet
was applied until solution was clear (1-2 minutes). Eluates were transferred to DNA low-bind
0.2 mL tubes (Eppendorf).
2.11.4 Equal mixing of baits
Purified amplicon concentration was measured by Qubit® Fluorometer using dsDNA
High Sensitivity assay kit and average size of PCR products was measured on a 2100
Bioanalyser (Agilent Technologies) using a High Sensitivity DNA Kit (Agilent Technologies)
according to the manufacturer’s instructions. Bioanalyser data was analysed using 2100 Expert
software (Agilent Technologies).
2.11.4.1 MiSeq library
Samples were pooled by Dr. Jisha Antony (Horsfield lab, University of Otago) based
on average fragment size: 1 µL for libraries with average size of <1.5 kb and 2 µL for libraries
with average size of >1.5 kb. After pooling, samples were vacuum dried to reduce volume to
10 µL. 1 µL was run on the Bioanalyser.
2.11.4.2 HiSeq Library
Baits were mixed in equimolar amounts based on concentration and average fragment size, and
the mix was then adjusted based on the percentage of demultiplexed 4C baits obtained from an initial
MiSeq run.
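Equimolar mixing from a mass concentration and an average fragment size can be illustrated with the standard conversion for double-stranded DNA (approximately 660 g/mol per base pair). The sketch below is not the calculation used for these libraries; the function names, library names and input values are hypothetical.

```python
def molarity_nM(conc_ng_per_ul, mean_size_bp):
    """Convert a dsDNA concentration (ng/µL) to nM, assuming ~660 g/mol per bp."""
    return conc_ng_per_ul / (660.0 * mean_size_bp) * 1e6


def equimolar_volumes(libraries, target_fmol):
    """Return the volume (µL) of each library that contains target_fmol of DNA.

    libraries maps a name to (concentration in ng/µL, mean fragment size in bp).
    1 nM equals 1 fmol/µL, so volume = target_fmol / molarity.
    """
    return {
        name: target_fmol / molarity_nM(conc, size)
        for name, (conc, size) in libraries.items()
    }


# Hypothetical Qubit and Bioanalyser readings for two bait amplicon pools.
example_libraries = {
    "bait_A": (10.0, 500),   # 10 ng/µL, mean size 500 bp
    "bait_B": (5.0, 1000),   # 5 ng/µL, mean size 1000 bp
}
example_volumes = equimolar_volumes(example_libraries, target_fmol=30.0)
```

In this toy example bait_B is four times less concentrated in molar terms than bait_A, so four times the volume is pooled.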
2.11.5 Next generation sequencing
Final libraries were sequenced as 100 bp single-reads on an Illumina™ HiSeq 2500 by
Otago Genomics Facility (OGF). Because the 4C-seq libraries have low diversity, they were
combined with 15% PhiX control (v/v). A trial library (PCR comparison) was sequenced as
151 bp reads on an Illumina™ MiSeq by Dr. Rob Day (Cancer Genetics, Department of
Biochemistry, University of Otago).
2.12 Processing of sequenced reads
All code used and generated for this project can be found on GitHub:
(https://github.com/thoam810/4C_analysis).
2.12.1 Generating read counts and data quality reports
Quality and number of initial reads and demultiplexed reads were checked using:
https://github.com/thoam810/4C_analysis/blob/master/FASTQC_analysis_bash. Quality of
reads was assessed using FastQC software (Andrews, 2010), for each bait within the libraries
before and after demultiplexing.
2.12.2 Demultiplexing of libraries based on indexed adaptors
Libraries were demultiplexed based on 8 base index barcodes (see Table 2.11). For the
HiSeq run, libraries were demultiplexed by OGF (see Appendix V for report); for MiSeq runs,
libraries were demultiplexed automatically after the run.
2.12.3 Trimming of adaptors
Adaptor sequences were removed from bait demultiplexed reads by the sequencing
providers (HiSeq: OGF; MiSeq: Dr. Rob Day).
2.12.4 Demultiplexing of baits based on 4C primers
Baits were demultiplexed by Dr. William Schierding (University of Auckland).
2.12.4.1 HiSeq
Samples were sequenced as 100 bp single-end reads on two lanes of a HiSeq 2500. To
join reads from the two lanes, sample files were concatenated
(https://github.com/thoam810/4C_analysis/blob/master/Concatenate_seq_bash). Baits were
demultiplexed based on 4C primers up to and including the restriction site, as listed in Table
2.12 (https://github.com/thoam810/4C_analysis/blob/master/demux_4C_hiseq.sl). Only reads
that had the forward and reverse primer and bait sequences in the correct orientation were
selected. No mismatches were allowed. Primer sequences were not removed from the reads.
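The principle of bait demultiplexing by exact primer match can be sketched as follows. This is a simplified illustration, not the demux_4C_hiseq.sl script linked above: it only tests whether a read begins with the full primer sequence (including the restriction site) and ignores orientation handling. The function names are hypothetical; the primer sequences are a subset of those in Table 2.12.

```python
# Subset of the 4C primers listed in Table 2.12 (sequence up to and including the RE site).
BAIT_PRIMERS = {
    "RUNX1_P1_F": "AACCAAGTCTGTCAGTCTCCTGTAC",
    "RUNX1_P2_R": "AGGATGTGCGTGCGTGTGTAACCCGAGCCGCCCGATC",
    "RUNX1_+24_R": "CAGTTACTTCGCCCATTGCTTTTTCTAAAGGATC",
    "ERG_+85_R": "TGGGGGAGACTGACAAATAGATC",
}


def assign_bait(read, primers=BAIT_PRIMERS):
    """Return the primer whose full sequence starts the read exactly, else None."""
    for name, primer in primers.items():
        if read.startswith(primer):  # exact match: no mismatches allowed
            return name
    return None


def demultiplex(reads, primers=BAIT_PRIMERS):
    """Bin reads by primer; the primer sequence is left on the read."""
    bins = {name: [] for name in primers}
    bins["unassigned"] = []
    for read in reads:
        bins[assign_bait(read, primers) or "unassigned"].append(read)
    return bins
```

A read carrying a single mismatch within the primer falls into the `unassigned` bin, mirroring the zero-mismatch rule described above.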
2.12.4.2 MiSeq
Sequences used for demultiplexing of baits are 4C primers up to and including the
restriction site (no mismatches were allowed) as listed in Table 2.12.
Table 2.12: Primer sequences and lengths used for demultiplexing of 4C baits
Primer | Sequence | Length (bp)
RUNX1 P1 F | AACCAAGTCTGTCAGTCTCCTGTAC | 25
RUNX1 P1 R | GCTAACAAGCAGAGGAAGCAATATTTGTATAGAATAACAGAAAAAAAAGATC | 52
RUNX1 P2 F | GCCGGCTCCTGGAATTGGCCCGCGCGCCCCCGCCGCCGCGCCGCGCGCTACTGTAC | 56
RUNX1 P2 R | AGGATGTGCGTGCGTGTGTAACCCGAGCCGCCCGATC | 37
RUNX1 +24 F | ACGTGTCCACATTTTTCAGAAGTTGAATGGTTTTGTGTAATGGTAC | 46
RUNX1 +24 R | CAGTTACTTCGCCCATTGCTTTTTCTAAAGGATC | 34
ERG +85 F | CTCGCTGCTTAGGTGTAGTCCTTAGGAATGAAAATCTGAACGTAGTAC | 48
ERG +85 R | TGGGGGAGACTGACAAATAGATC | 23
2.13 Mapping of sequenced reads
Mapping of sequenced reads was carried out on a Linux server according to 4Cker
(Raviram et al., 2016).
2.13.1 Generating restriction enzyme digestion of Human Genome (hg19)
4C-Seq reads are usually mapped to a reduced genome which consists of unique
sequence fragments adjacent to the primary restriction enzyme sites in the genome. Mapping
to a reduced genome helps to remove inauthentic ligation events that do not result from
crosslinking.
The 4Cker script created a custom reduced genome using oligoMatch (Kent et al.,
2002) and fastaFromBed (Bedtools).
The fragment length should be longer than the length of the sequenced read (100 bp)
minus the size of the adapter barcode (8 bp) and the primer including the restriction enzyme (RE)
sequence (length varies by primer; see Tables 2.11 and 2.12). As the longest trimmed read was 69 bp
(Table 2.13), a fragment length of 70 bp was chosen for digestion.
In order to map reads to regions of the Homo sapiens hg19 reference genome, the human
genome file (Broad Institute) was digested into 70 bp fragments using an edited 4Cker script
and the primary RE DpnII file (4Cker)
(https://github.com/thoam810/4C_analysis/blob/master/RE_digest_genome).
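The idea of a reduced genome can be illustrated with a short Python sketch that lists the 70 bp windows flanking each DpnII (GATC) site in a sequence. This is a conceptual illustration only and is not the 4Cker script, which uses oligoMatch and fastaFromBed; in particular, the exclusion of the recognition site from the windows and the absence of any filtering for unique fragments are simplifying assumptions, and the function name is hypothetical.

```python
def reduced_genome_intervals(chrom, seq, site="GATC", frag_len=70):
    """Return (chrom, start, end) windows of up to frag_len bp flanking each site.

    Coordinates are 0-based and half-open, as in BED files. Windows are
    truncated at the ends of the sequence.
    """
    intervals = []
    pos = seq.find(site)
    while pos != -1:
        # Window immediately upstream of the site.
        up_start = max(0, pos - frag_len)
        if pos > up_start:
            intervals.append((chrom, up_start, pos))
        # Window immediately downstream of the site.
        down_start = pos + len(site)
        down_end = min(len(seq), down_start + frag_len)
        if down_end > down_start:
            intervals.append((chrom, down_start, down_end))
        pos = seq.find(site, pos + 1)
    return intervals


# Hypothetical toy chromosome with a single GATC site at position 100.
toy_intervals = reduced_genome_intervals("chrToy", "A" * 100 + "GATC" + "C" * 100)
```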
Table 2.13: Length of sequenced reads after trimming
Primer | Sequenced read length after trimming (bp)
RUNX1 P1 F | 67
RUNX1 P1 R | 40
RUNX1 P2 F | 36
RUNX1 P2 R | 55
RUNX1 +24 F | 46
RUNX1 +24 R | 58
ERG +85 F | 44
ERG +85 R | 69
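The values in Table 2.13 follow from subtracting the 8 bp index and the primer length (Table 2.12) from the 100 bp read. The short Python check below reproduces that arithmetic; the variable names are hypothetical, and the primer lengths are taken from Table 2.12.

```python
READ_LENGTH = 100   # bp, HiSeq single-end read
INDEX_LENGTH = 8    # bp, adapter barcode

# Primer lengths (bp) including the restriction site, from Table 2.12.
PRIMER_LENGTH = {
    "RUNX1 P1 F": 25, "RUNX1 P1 R": 52,
    "RUNX1 P2 F": 56, "RUNX1 P2 R": 37,
    "RUNX1 +24 F": 46, "RUNX1 +24 R": 34,
    "ERG +85 F": 48, "ERG +85 R": 23,
}

# Read length remaining after removing the index and the primer.
trimmed_length = {
    primer: READ_LENGTH - INDEX_LENGTH - length
    for primer, length in PRIMER_LENGTH.items()
}

# The reduced-genome fragment length must exceed the longest trimmed read.
longest_trimmed = max(trimmed_length.values())
min_fragment_length = longest_trimmed + 1
```

The longest trimmed read is 69 bp (ERG +85 R), which is why a 70 bp fragment length was chosen.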
2.13.2 Generate Bowtie2 reference files of RE digested genome.
The DpnII reduced genome was used to map 4C-Seq single-end reads using Bowtie2
(Langmead et al., 2012).
Bowtie2 requires specific index files which were created using the reduced genome
files: https://github.com/thoam810/4C_analysis/blob/master/bowtie2_build.
2.13.3 Mapping of reads
The 4C-seq single-end reads were aligned to the Bowtie2 DpnII reduced genome using:
https://github.com/thoam810/4C_analysis/blob/master/Alignment_bowtie. This generates
Sequence Alignment/Map (SAM) files of aligned and unaligned reads.
The sequenced reads were trimmed from the 5’ end based on the adapter sequence
length and bait primer sequence including the RE length. The 3’ end was trimmed by 5bp based
on the demultiplexed quality reports.
2.13.4 Creating a counts file from mapped data
The SAM output from Bowtie2 (Section 2.13.3) was used to create a counts file
(.bedGraph) with the number of reads per RE fragment using:
https://github.com/thoam810/4C_analysis/blob/master/mapped_data_counts_bedGraph. This
generates a tabulated bedGraph file with 4 columns: Chromosome number, location start of
fragment, location end of fragment, count.
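The conversion from alignments to a four-column counts table can be sketched in Python as below. This is an illustration rather than the linked script. It assumes that each reference sequence in the reduced genome is named `chrom:start-end` (the default naming produced by fastaFromBed) and that unaligned reads carry SAM flag bit 4; the function name is hypothetical.

```python
from collections import Counter


def sam_to_bedgraph(sam_lines):
    """Count aligned reads per reduced-genome fragment.

    Returns sorted (chrom, start, end, count) rows. Header lines and
    unaligned reads are skipped.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & 4 or rname == "*":      # read did not align
            continue
        counts[rname] += 1

    rows = []
    for rname, n in counts.items():
        # Assumed fragment naming: chrom:start-end
        chrom, span = rname.rsplit(":", 1)
        start, end = span.split("-")
        rows.append((chrom, int(start), int(end), n))
    return sorted(rows)
```

Each returned row corresponds to one line of the bedGraph file described above.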
2.13.5 Removing self-ligated and undigested reads
The self-ligated and undigested fragments were removed from the count files before
analysing 4C interactions using:
https://github.com/thoam810/4C_analysis/blob/master/remove_self_ligated_undigested.
To remove the self-ligated and undigested fragments for each bait, the coordinates of
the primer sequences (not including the RE sequence) were found in the human reference
genome. The fragments immediately before and after the primer sequence were then identified. The
fragments that match the coordinates of the fragments before, after and including the primer
were removed.
2.13.6 Removal of background noise
Fragments that had reads in controls are considered background noise and these reads
were removed from the statistically significant interactions using:
https://github.com/thoam810/4C_analysis/blob/master/remove_controls.
Reads at fragments 10 kb up- and downstream of the bait were removed using:
https://github.com/thoam810/4C_analysis/blob/master/remove_bait_10kb.
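Both filtering steps can be illustrated on bedGraph-style rows of the form (chromosome, start, end, count). The sketch below is a minimal illustration, not the linked scripts; the function names and the coordinates in the usage example are hypothetical.

```python
def remove_control_fragments(rows, control_rows):
    """Drop fragments that have any reads in a control (for example, no-ligase) sample."""
    in_control = {row[:3] for row in control_rows if row[3] > 0}
    return [row for row in rows if row[:3] not in in_control]


def remove_near_bait(rows, bait_chrom, bait_pos, flank=10_000):
    """Drop fragments overlapping the region within flank bp either side of the bait."""
    lower, upper = bait_pos - flank, bait_pos + flank
    return [
        row for row in rows
        if not (row[0] == bait_chrom and row[2] > lower and row[1] < upper)
    ]


# Hypothetical example rows and bait position.
example_rows = [
    ("chr21", 1_000, 1_070, 5),
    ("chr21", 50_000, 50_070, 3),
    ("chr1", 1_000, 1_070, 2),
]
filtered = remove_near_bait(example_rows, bait_chrom="chr21", bait_pos=5_000)
filtered = remove_control_fragments(filtered, [("chr1", 1_000, 1_070, 1)])
```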
2.14 Analysing 4C interactions
4C interaction analyses were done in R on a Linux server.
The interactions of a bait across the genome differ depending on proximity to the bait:
interactions close to the bait (near-bait) occur at the highest frequency, followed by cis interactions,
while trans interactions occur at the lowest frequency. 4Cker
provides different analyses for near-bait, cis and trans interactions.
4Cker interaction analysis checks whether each sample passes quality control (QC) criteria.
To pass 4Cker QC, samples must have more than 1 million reads, more than 40% of reads on
the cis chromosome and more than 40% coverage in the 200 kb region near the bait. However,
50,000-100,000 reads are reported to be sufficient to generate 4C profiles that correlate with
the 4C profile of 1 million reads (Krijger et al., 2020). For this 4C analysis, samples with more than
400,000 reads, more than 40% of reads in cis and >30% of reads near the bait were allowed
in the analysis.
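The two sets of thresholds can be expressed as a simple check. The Python sketch below is illustrative only: the function and parameter names are hypothetical, the near-bait measure is passed in as a pre-computed fraction, and the example sample values are invented.

```python
def passes_qc(total_reads, cis_fraction, near_bait_fraction,
              min_reads=400_000, min_cis=0.40, min_near_bait=0.30):
    """Apply the relaxed QC thresholds used in this thesis (defaults)."""
    return (
        total_reads > min_reads
        and cis_fraction > min_cis
        and near_bait_fraction > min_near_bait
    )


def passes_4cker_qc(total_reads, cis_fraction, near_bait_fraction):
    """Apply the stricter 4Cker thresholds described above."""
    return passes_qc(total_reads, cis_fraction, near_bait_fraction,
                     min_reads=1_000_000, min_cis=0.40, min_near_bait=0.40)


# Hypothetical sample: 600,000 reads, 55% in cis, 35% near the bait.
relaxed_ok = passes_qc(600_000, 0.55, 0.35)
strict_ok = passes_4cker_qc(600_000, 0.55, 0.35)
```

In this invented example the sample would be retained under the relaxed thresholds but rejected under the 4Cker defaults.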
The RUNX1 P1, RUNX1 P2 and ERG +85 baits did not pass the quality
control criteria, so reads from these baits were not processed further.
2.14.1 Identification of statistically significant interactions per condition
To determine significant interactions, 4Cker reduces bias from PCR artifacts by trimming the
counts of fragments that exceed the 75th quantile within a given window to that value.
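The trimming step can be illustrated for a single window as follows. This Python sketch is not the 4Cker implementation: the quantile is computed by linear interpolation between order statistics, which is an assumption, and the function names and example counts are hypothetical.

```python
def quantile(values, q):
    """Quantile by linear interpolation between order statistics (assumed method)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("quantile of an empty window is undefined")
    h = (len(ordered) - 1) * q
    lower = int(h)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (h - lower) * (ordered[upper] - ordered[lower])


def clip_to_quantile(window_counts, q=0.75):
    """Cap every count in the window at the window's q-th quantile."""
    cap = quantile(window_counts, q)
    return [min(count, cap) for count in window_counts]


# Hypothetical window in which one fragment is strongly over-amplified.
example_window = [1, 2, 3, 4, 100]
clipped_window = clip_to_quantile(example_window)
```

In the example, the over-amplified fragment (100 reads) is reduced to the 75th quantile of the window, while the other counts are unchanged.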
As 4C signal is generally higher around the bait region and decreases in far-cis and
trans, different adaptive window sizes were used for read normalisation and for fitting the smooth
spline in each data set.
Adaptive windows were used to obtain an analogous number of observed fragments for
each window. Window size is determined by the amount of signal detected in the region. This
results in small windows near the bait and regions where there is high coverage and larger
windows further away from the bait where there is low coverage.
The smooth spline (with a smoothing parameter of 0.75) is fitted to the window sizes
separately for each analysis. Significant interactions are domains that are called in all
replicates and whose read counts are greater than the 90th quantile of the adaptive
window.
The value of 'k' determines the size of the adaptive windows. The following k values
were optimised based on comparison of raw reads to the normalised reads and analysed reads:
k = 3 for the near-bait analysis, k = 3 for the analysis of the entire bait chromosome and
k = 15 for the trans analysis.
Significant 4C interactions for each condition were determined using:
https://github.com/thoam810/4C_analysis/blob/master/Analysing_4C_interactions.
2.14.2 Generation of ggplots
To plot the 4C profiles for all the conditions ggplot was utilised
(https://github.com/thoam810/4C_analysis/blob/master/4C_ggplots).
2.14.3 Differential interactions using DESeq2
Differential analysis was carried out between two conditions on the windows that are
interacting with the bait in at least 2 conditions. 4Cker has optimised this analysis for near-bait
and cis interactions. Significantly different domains were determined with a p-value of 0.05
using (https://github.com/thoam810/4C_analysis/blob/master/DESeq_diff_analysis).
2.14.4 Overlap of significant interactions with genes
Significant interactions were overlapped with UCSC K562 reference gene list using
(https://github.com/thoam810/4C_analysis/blob/master/map_gene_list).
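The overlap itself is a standard interval intersection. The Python sketch below illustrates the logic and is not the linked map_gene_list script; the function name, gene names and coordinates are hypothetical placeholders, and intervals are treated as 0-based and half-open.

```python
def overlapping_genes(interactions, genes):
    """Map each interaction (chrom, start, end) to the names of overlapping genes.

    genes is a list of (chrom, start, end, name). Two half-open intervals
    overlap when each starts before the other ends.
    """
    hits = {}
    for chrom, start, end in interactions:
        hits[(chrom, start, end)] = sorted(
            name
            for gene_chrom, gene_start, gene_end, name in genes
            if gene_chrom == chrom and gene_start < end and start < gene_end
        )
    return hits


# Hypothetical interaction domains and gene annotations.
example_interactions = [("chr21", 1_000, 2_000), ("chr21", 9_000, 9_500)]
example_genes = [
    ("chr21", 1_500, 3_000, "GENE_A"),
    ("chr21", 500, 1_000, "GENE_B"),    # ends where the first domain starts: no overlap
    ("chr1", 1_200, 1_800, "GENE_C"),   # different chromosome
]
example_hits = overlapping_genes(example_interactions, example_genes)
```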
2.14.5 Overlap of significant interactions with protein binding sites and
enhancer characteristics
See section 2.9.1
3 Chapter 3: Functional analysis of Runx1 enhancers
Some of the results from this chapter contributed to Marsman, J., Thomas, A., Osato,
M., O'Sullivan, J. M., & Horsfield, J. A. (2017). A DNA contact map for the mouse Runx1
gene identifies novel haematopoietic enhancers. Scientific Reports, 7, 13347 (Appendix VI).
3.1 Runx1 enhancers
The precise spatiotemporal expression of Runx1 requires the activity of cis-regulatory
elements, the majority of which have not been defined. In AML and other myeloid
malignancies where no mutation is found in the coding sequence, there may instead be
mutations in regulatory sequences that affect gene function (Cornish et al., 2019).
Understanding the regulatory landscape of RUNX1 will further increase understanding
of haematopoiesis as well as identify potential new regions for driver mutations in myeloid
malignancies. Recent studies have identified recurrent mutations in regulatory elements of
genes associated with oncogenesis, such as the TAL1, ETV1, and PAX5 enhancers
(Mansour et al., 2014; Orlando et al., 2018; Puente et al., 2015), highlighting the importance
of determining the functionality of non-coding elements.
Regulatory regions can be highly tissue-specific, are often spread across the
chromatin, and only a small proportion of distal enhancers target the nearest transcript (Javierre
et al., 2016; Mifsud et al., 2015).
3.1.1 Previously identified enhancers of Runx1
Cis-regulatory elements (CREs), such as enhancers, are able to control the expression
of genes by long-range chromatin interactions (Marsman et al., 2012).
Several research groups identified an important intronic Runx1 enhancer in mice:
Runx1 +23/+24/eR1/RE1 mouse conserved non-coding element (Bee et al., 2009a; Ng et al.,
2010; Nottingham et al., 2007) (Table 3.1). The name of the element refers to its distance in kilobases
(kb) downstream of each group’s reference point. The naming is inconsistent
between groups because Ng et al., (2010) used the transcriptional start site (TSS), +1, as the
reference point, whilst the other groups placed this enhancer 23.5 kb downstream of
the ATG of exon 1. The term eR1, proposed by Koh et al., (2015), stands for enhancer of Runx1
(Koh et al., 2015), whereas Tang et al., (2019) named it regulatory element 1 (RE1) (Tang et
al., 2018). Throughout this thesis, this previously described intronic enhancer will be referred
to as +24.
The +24 enhancer is responsible for the activation of Runx1 expression in
hematopoietic stem/progenitor cells (HSPCs) (Ng et al., 2010; Nottingham et al., 2007). Most
enhancers evolve rapidly and are rarely conserved at the DNA sequence level among species
due to positive evolutionary selection (Villar et al., 2015). By contrast, Runx1 +24 is highly conserved in
eukaryotes, indicating an essential role in gene regulation. The discovery of the +24 regulatory
element as a haemogenic endothelial cell-specific enhancer in mouse and zebrafish embryos
provided an insight into a mechanism for the targeted transcriptional regulation of Runx1
(Markova et al., 2011; Ng et al., 2010; Schütte et al., 2016).
Table 3.1: Runx1 +24 regulatory region
Regulatory region location (kb) | Name in this thesis | Identification methods | Functional validation
Mouse and human Runx1 relative to P1: +23/+24/eR1/RE1 | +24 | In silico conservation (interspecies DNA sequence homology) and DNase I hypersensitive sites (Markova et al., 2011; Ng et al., 2010; Nottingham et al., 2007). | Haematopoietic enhancer activity during mouse and zebrafish development. Luciferase enhancer activity in 416B, K562 and Jurkat haematopoietic cell lines. Thought to be a silencer in HEK293 cells (Markova et al., 2011; Marsman et al., 2017; Ng et al., 2010; Schütte et al., 2016).
This chapter will focus on the other candidate regulatory regions for Runx1, while
Chapter 4 provides a detailed analysis of the Runx1 +24 enhancer.
In this thesis, enhancers are labelled according to sequence origin and location relative
to the Runx1 transcriptional start site (TSS): + means downstream of the Runx1 TSS at the P1
promoter, whereas – means upstream; e.g. +62 is 62 kilobases (kb) downstream of the Runx1
TSS. If the sequence for the Runx1 regulatory element was discovered in mice, the prefix m is used
before the location label, whereas human elements are unlabelled; e.g. the +62 sequence information is from humans
and m+110 is from mice.
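The labelling convention can be written as a small helper. This is only an illustrative sketch: the function name `region_label` and its arguments are hypothetical and are not part of any analysis pipeline used in this thesis.

```python
def region_label(offset_kb: int, species: str) -> str:
    """Build a region label from its distance (kb) to the Runx1 P1 TSS.

    Positive offsets are downstream of the TSS, negative offsets upstream.
    Mouse-derived sequences carry the prefix "m"; human ones are unlabelled.
    """
    if species not in ("mouse", "human"):
        raise ValueError("species must be 'mouse' or 'human'")
    sign = "+" if offset_kb >= 0 else "-"
    prefix = "m" if species == "mouse" else ""
    return f"{prefix}{sign}{abs(offset_kb)}"


print(region_label(62, "human"))    # +62
print(region_label(110, "mouse"))   # m+110
print(region_label(-368, "mouse"))  # m-368
```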
In mice, Schütte et al., (2016) and Ng et al., (2010) used a combination of techniques
to identify 12 possible regulators of Runx1, with some functional characterisation (Table 3.2). My
previous Master's project investigated the functions of the putative mouse regulatory regions
described by Ng et al., (2010).
Markova et al., (2011), Gunnell et al., (2016), Wang et al., (2014), Cheng et al., (2018),
and Nie et al., (2019) characterised the functions of potential human RUNX1 regulatory
elements. Some of these discoveries included identifying long-range interactions with RUNX1
promoters (Table 3.3).
Enhancers were predicted by several different methods, including: computational
analysis of non-coding sequence for inter-species conservation and predicted transcription factor
binding motifs; analysis of chromatin immunoprecipitation sequencing (ChIP-seq) data to locate
binding of transcription factors, mediators, and histone modifications at the regions of interest;
and detection of promoter-enhancer interactions using chromosome conformation capture (4C, Hi-C)
and chromatin interaction analysis by paired-end tag sequencing (ChIA-PET).
In this Chapter I describe the conservation and features of previously identified and
novel Runx1 CREs, and identify the functional enhancer capability of regions previously
identified by 4C. This information will provide a deeper understanding of the regulation of
Runx1 transcription, with implications for causality and treatment of leukaemias with RUNX1
dysregulation.
Table 3.2: Identified mouse Runx1 regulatory regions with available classifications
Regulatory region location (kb) | Name in this thesis | Identification methods | Functional validation
-368 | m-368 | In silico conservation (interspecies DNA sequence homology) and transcription factor binding motif analysis of sequences (Ng et al., 2010). | Potential enhancer activity within hematopoietic cells/tissues in 20-24 hpf zebrafish embryos (Thomas, 2015).
-328 | m-328 | ChIP-seq analysis of haematopoietic TFs (ERG, FLI1, GATA2, GFI1B, LYL1, MEIS1, SPI1, RUNX1 and TAL1) and H3K27ac (Schütte et al., 2016). | No enhancer activity observed in mice (Schütte et al., 2016).
-322 | m-322 | ChIP-seq analysis of haematopoietic TFs (ERG, FLI1, GATA2, GFI1B, LYL1, MEIS1, SPI1, RUNX1 and TAL1) and H3K27ac (Schütte et al., 2016). | Identified, but not tested for enhancer activity (Schütte et al., 2016).
-101 | m-101 | In silico conservation (interspecies DNA sequence homology) and transcription factor binding motif analysis of sequences (Ng et al., 2010). | Potential enhancer activity within hematopoietic cells in 20-24 hpf zebrafish embryos (Thomas, 2015).
-59 | m-59 | ChIP-seq analysis of haematopoietic TFs (ERG, FLI1, GATA2, GFI1B, LYL1, MEIS1, SPI1, RUNX1 and TAL1) and H3K27ac (Schütte et al., 2016). | Haematopoietic enhancer activity in E11.5 transgenic mice and luciferase enhancer activity in 416B cells (Schütte et al., 2016).
-43 | m-43 | ChIP-seq analysis of haematopoietic TFs (ERG, FLI1, GATA2, GFI1B, LYL1, MEIS1, SPI1, RUNX1 and TAL1) and H3K27ac (Schütte et al., 2016). | No enhancer activity observed in mice (Schütte et al., 2016).
-42 | m-42 | ChIP-seq analysis of haematopoietic TFs (ERG, FLI1, GATA2, GFI1B, LYL1, MEIS1, SPI1, RUNX1 and TAL1) and H3K27ac (Schütte et al., 2016). | No enhancer activity observed in mice (Schütte et al., 2016).
+1 | m+1 | ChIP-seq analysis of haematopoietic TFs (ERG, FLI1, GATA2, GFI1B, LYL1, MEIS1, SPI1, RUNX1 and TAL1) and H3K27ac (Schütte et al., 2016). | Identified, but not tested for enhancer activity (Schütte et al., 2016).
+3 | m+3 | ChIP-seq analysis of haematopoietic TFs (ERG, FLI1, GATA2, GFI1B, LYL1, MEIS1, SPI1, RUNX1 and TAL1) and H3K27ac (Schütte et al., 2016). | Haematopoietic enhancer activity in E11.5 transgenic mice and luciferase enhancer activity in 416B cells (Schütte et al., 2016).
+110 | m+110 | In silico conservation (interspecies DNA sequence homology) and transcription factor binding motif analysis of sequences (Ng et al., 2010). | Haematopoietic enhancer activity in E11.5 transgenic mice, potential enhancer activity within hematopoietic cells in 20-24 hpf zebrafish embryos and luciferase enhancer activity in 416B and Hela cell lines (Schütte et al., 2016; Thomas, 2015).
+181 | m+181 | ChIP-seq analysis of haematopoietic TFs (ERG, FLI1, GATA2, GFI1B, LYL1, MEIS1, SPI1, RUNX1 and TAL1) and H3K27ac (Schütte et al., 2016). | No enhancer activity observed in mice (Schütte et al., 2016).
+204 | m+204 | ChIP-seq analysis of haematopoietic TFs (ERG, FLI1, GATA2, GFI1B, LYL1, MEIS1, SPI1, RUNX1 and TAL1) and H3K27ac (Schütte et al., 2016). | Haematopoietic enhancer activity in E11.5 transgenic mice and luciferase enhancer activity in 416B cells (Schütte et al., 2016).
Table 3.3: Identified human RUNX1 regulatory regions with available classifications
Regulatory region location (kb) | Name in this thesis | Identification methods | Functional validation
-139/E1 | -139 | ChIP-seq analysis of EBNA2 (Gunnell et al., 2016). | Luciferase enhancer activity in GM12878 and EBV positive Burkitt’s Lymphoma cell lines (Gunnell et al., 2016).
-188/E4 | -188 | ChIP-seq analysis of EBNA2 (Gunnell et al., 2016). | Alone drives a 4.3-fold increase in luciferase enhancer activity in GM12878 and EBV positive Burkitt’s Lymphoma cell lines. When combined with -139 and -250 it drives a five-fold increase in enhancer activity (Gunnell et al., 2016).
-250/E6 | -250 | ChIP-seq analysis of EBNA2 (Gunnell et al., 2016). | Luciferase enhancer activity in GM12878 and EBV positive Burkitt’s Lymphoma cell lines (Gunnell et al., 2016).
+62 | +62 | ChIA-PET (RNA polymerase II), DNase I hypersensitive sites and conserved binding sites for the SNAG repressors GFI1/GFI1B and SNAI1 (Cheng et al., 2018). | Silencer activity in K562, OCI-AML3 and U937 cell lines (Cheng et al., 2018).
Intron 5.2/RE2 | intron 5.2 | In silico conservation (interspecies DNA sequence homology) and DNase I hypersensitive sites (Markova et al., 2011). | No enhancer or silencer activity observed in K562, Jurkat and HEK293 cell lines (Markova et al., 2011).
3.1.2 Previously identified long-range interactions with Runx1 promoters
Interaction between predicted CREs and promoters of the Runx1 gene cannot be
assumed, as fewer than 50% of enhancers contact the nearest gene promoter (Li et al., 2012).
In order to determine the long-range interaction between Runx1 promoters and its CREs, Dr.
Judith Marsman performed circular chromosome conformation capture (4C) (Marsman, 2016)
in the mouse haematopoietic progenitor cell line, HPC7. 4C detects interactions with chosen
regions or “baits” through use of next-generation sequencing. HPC7 cells resemble early
haematopoietic progenitors (Kolterud et al., 1998) and were chosen because Runx1 expression
is crucial for the development of definitive haematopoietic stem- and progenitor cells (Cai et
al., 2011; Growney et al., 2005; Ichikawa et al., 2004; Okuda et al., 1996; Wang et al., 1996).
Quantitative PCR (qPCR) analysis showed that in HPC7 cells, the Runx1 P1 isoform is highly
expressed, whereas the Runx1 P2 isoform is not (Marsman, 2016).
Dr. Marsman selected baits for Runx1 4C situated at the Runx1 P1 and Runx1 P2
promoters and at the Runx1 +24 enhancer. For each of these regions, one bait was located at
the core region, and another bait was located at a nearby (1.5-2 kb away) cohesin/CTCF binding
site.
Cohesin/CTCF binding sites upstream of Runx1 P1 and downstream of the Runx1 +24
enhancer are present in CH12, MEL and HPC7 cells, whereas cohesin/CTCF binding sites
upstream of Runx1 P2 are not present in HPC7 cells (Wilson et al., 2016). 4C analysis with
baits at the Runx1 P1 and P2 promoters allows for comparison of interactions between the
active Runx1 P1 promoter and the inactive P2 promoter.
Runx1 P1 interacted with seven regions, as well as with the Runx1 +24 enhancer (Table
3.4). Each regulatory element recruits multiple blood transcription factors in HPC7 cells,
suggesting they function as haematopoietic-specific enhancers. They are named according to
their distance from the Runx1 P1 transcriptional start site: -371, -354, -327, -321, -303, -58,
-48, and +110.
Of these putative enhancers, two had been previously described: the conserved Runx1
m-368 and m+110 (Ng et al., 2010; Thomas, 2015).
Table 3.4: Identified long range interactions with putative mouse Runx1 regulatory regions.
Regulatory region location (kb) | Name in this thesis | Identification methods | Interactions identified | Functional validation
-371 | m-371 | 3D interaction with Runx1 Promoters described by 4C and transcription factor binding motif analysis (Marsman, 2016). | Interacts with +24 and P1 in HPC7 cells identified by 4C (Marsman, 2016). | Not investigated.
-368 | m-368 | Conservation and transcription factor binding motif analysis (Ng et al., 2010). | Interacts with +24 and P1 in HPC7 identified by 4C (Marsman, 2016). Interacts with P1 in HPC-7 cells identified by Capture Hi-C (Wilson et al., 2016). | Potential enhancer activity within hematopoietic cells/tissue in 20-24 hpf zebrafish embryos (Thomas, 2015).
-354 | m-354 | 3D interaction with Runx1 Promoters described by 4C and transcription factor binding motif analysis (Marsman, 2016). | Interacts with +24 and P1 in HPC7 cells identified by 4C (Marsman, 2016). Interacts with P1 in HPC-7 cells identified by Capture Hi-C (Wilson et al., 2016). | Not investigated.
-327 | m-327 | 3D interaction with Runx1 Promoters described by 4C and transcription factor binding motif analysis (Marsman, 2016). | Interacts with +24 and P1 in HPC7 cells identified by 4C (Marsman, 2016). Interacts with P1 in HPC-7 cells identified by Capture Hi-C (Wilson et al., 2016). | Not investigated.
-321 | m-321 | 3D interaction with Runx1 Promoters described by 4C and transcription factor binding motif analysis (Marsman, 2016). | Interacts with +24 and P1 in HPC7 cells identified by 4C (Marsman, 2016). Interacts with P1 in HPC-7 cells identified by Capture Hi-C (Wilson et al., 2016). | Not investigated.
-303 | m-303 | 3D interaction with Runx1 Promoters described by 4C and transcription factor binding motif analysis (Marsman, 2016). | Interacts with +24 and P2 in HPC7 cells identified by 4C (Marsman, 2016). Interacts with P1 in HPC-7 cells identified by Capture Hi-C (Wilson et al., 2016). | Not investigated.
-58 | m-58 | 3D interaction with Runx1 Promoters described by 4C and transcription factor binding motif analysis (Marsman, 2016). | Interacts with +24 and P1 in HPC7 cells identified by 4C (Marsman, 2016). | Not investigated.
-48 | m-48 | 3D interaction with Runx1 Promoters described by 4C and transcription factor binding motif analysis (Marsman, 2016). | Interacts with +24 and P1 in HPC7 cells identified by 4C (Marsman, 2016). | Not investigated.
+110 | m+110 | Conservation and transcription factor binding motif analysis (Ng et al., 2010). | Interacts with P1 in HPC7 cells identified by 4C (Marsman, 2016). | Haematopoietic enhancer activity in E11.5 transgenic mice, potential enhancer activity in hematopoietic sites in 20-24 hpf zebrafish embryos and luciferase enhancer activity in 416B and Hela cell lines (Schütte et al., 2016; Thomas, 2015).
4C data identified new cis elements bound by TFs that could have regulatory functions.
The Runx1 +24 enhancer interacted with all of the putative elements, except m+110. Runx1 P1
connected to all the identified regions excluding m-303. Conversely, the Runx1 P2 promoter
showed fewer interactions with enhancer clusters, but did interact with the putative regulatory
sites m-303, m-48 and m+24.
Figure 3.1: Schematic overview of the mouse Runx1 locus (chromosome 16:93,220,074-92,589,466).
Previously identified regulatory regions are annotated with orange circles, novel regions detected by 4C (Marsman, 2016) are represented by yellow circles, grey boxes annotate exons. The blue boxes show the exons encoding the Runt binding domain. The two Runx1 promoters are represented by black right angled arrows.
In humans, the interactions between RUNX1 promoters and regulatory regions were
identified using 3C for two regions, +62 and intron 5.2. +62 interacts with RUNX1 P2 in K562
and OCI-AML3 cell lines (Cheng et al., 2018). Intron 5.2 interacts with RUNX1 P1 and P2 in
K562, Jurkat and HEK293 cell lines (Markova et al., 2011).
Figure 3.2: Schematic overview of human RUNX1 locus (chromosome 21: 36,148,773-36,872,777).
Each human RUNX1 regulatory region is annotated with orange circles, grey boxes annotate exons. The blue boxes show the exons encoding the Runt binding domain. The two RUNX1 promoters are represented by black right angled arrows.
3.1.3 Functional enhancer characterisation assays
In order to determine the enhancer function of these identified regions, enhancer assays
can be used. Luciferase assays show the ability of a regulatory element to drive expression of
the firefly luciferase reporter gene Luc2 in cell lines. If the luciferase vector contains an
enhancer, there should be an increase in expression of luciferase, which can be determined with
a luminometer.
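As an illustration of how such luminometer readings might be summarised, the sketch below expresses the signal from a candidate construct as a fold change over a control vector. The readings are made up for the example, and the function name `fold_change` and the choice of an empty vector as the control are assumptions rather than a description of the protocol used in this thesis.

```python
from statistics import mean


def fold_change(construct_readings, control_readings):
    """Mean luminescence of the test construct relative to the control vector."""
    control_mean = mean(control_readings)
    if control_mean <= 0:
        raise ValueError("control luminescence must be positive")
    return mean(construct_readings) / control_mean


# Made-up luminometer readings (arbitrary units), for illustration only.
empty_vector = [100.0, 110.0, 90.0]
candidate = [300.0, 330.0, 270.0]

print(round(fold_change(candidate, empty_vector), 2))  # 3.0
```

A fold change above 1 would be consistent with enhancer activity, whereas a value below 1 would point towards silencer activity.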
Zebrafish reporter assays can identify cell-type specific gene expression driven by
putative enhancers. If the zebrafish vector contains an enhancer there should be expression of
green fluorescent protein (GFP) in the cells that can respond to that enhancer. This is
determined by viewing zebrafish embryos during development under a fluorescent microscope.
The zebrafish enhancer assay vectors contain integration sites so that the reporter gene
constructs can be incorporated into the host genome. The Horsfield lab currently utilises three
reporter assay vectors: ISceI-pBSII SK+, which integrates using I-Sce I sites, and Tol2Gata2 and ZED,
which use Tol2 sites. Transient expression can be analysed in the F0 generation; however, the F1 generation
carries a germline integration of the constructs, which avoids the mosaic expression
pattern seen in F0.
By using zebrafish enhancer assays I can define whether these putative regulatory
elements influence any cell- and/or stage-specific Runx1 haematopoietic expression. It is
possible that mutations or defects in the regulation of such elements may explain AML
progression in patients who have no known protein-coding mutations or perhaps patients who
have cohesin mutations. It is also necessary to determine whether the identified mouse
enhancers are conserved in humans.
3.2 Hypothesis
I hypothesise that the regions identified by Runx1 4C have enhancer capabilities and
some of them may be conserved in humans. As most enhancers rapidly evolve and are rarely
conserved at a sequence level between species, any regions that are conserved may have
essential roles in RUNX1 gene regulation. Understanding the regulatory mechanisms that drive
cell-specific expression of RUNX1 in haematopoiesis will have implications for causality and
treatment of leukaemias with RUNX1 dysregulation.
3.3 Aims
1) Elucidate conservation and genomic characteristics of previously identified and
novel Runx1 cis-regulatory elements using bioinformatic tools.
2) Identify the enhancer ability of putative cis-regulatory elements previously
identified by 4C, using zebrafish reporter assays.
3.4 Characteristics of previously identified regulatory regions
In order to understand the overall contributions of regulatory regions to Runx1
haematopoietic expression and their potential to be dysregulated in myeloid malignancies, in
silico analyses were carried out. Bioinformatic tools can be used to determine the conservation of
regulatory regions, the cells in which these regions are active, and which transcription factors bind
to them.
3.4.1 Epigenetic and conservation analyses using ENCODE data
ENCODE project data from RUNX1-dependent Human Chronic Myelogenous
Leukaemia cells (K562) were analysed at each conserved locus (Barski et al., 2007; Raney et
al., 2014; Rosenbloom et al., 2013; Wilson et al., 2016) using the UCSC Genome Browser.
Elements included in the search for regulatory features were transcription factor
binding sites (TFBS), open chromatin (DNase I hypersensitivity), histone modifications, CTCF
binding sites, RNA polymerase II (Pol II), and cohesin (RAD21 and SMC3 subunits) binding sites
(Table 3.5).
ENCODE also has Chromatin State predictions (ChromHMM) for K562 cells (Kellis
et al., 2017). ChromHMM annotates the data based on epigenetic information. The predicted
chromatin states are active promoter, promoter flanking, inactive promoter, candidate strong
enhancer, candidate weak enhancer, distal CTCF/candidate insulator, transcription associated,
polycomb repressed and heterochromatin/repetitive/copy number variation. Transcription
associated refers to transcriptional transition or elongation (Ernst et al., 2010; Ernst et al.,
2011).
Further gene regulatory markers for K562 cells were uploaded into the UCSC Genome
Browser, including super-enhancers from dbSUPER (Khan et al., 2016), silencers annotated
by Pang et al., (2020), and cohesin-mediated chromatin accessibility (Antony
et al., 2020).
Conservation analysis identifies whether these regions could be important for human
haematopoiesis. The UCSC BLAT tool (Kent, 2002) and NCBI BLAST tool (Altschul et al.,
1990) were used to determine the human co-ordinates of mouse (m) Runx1 -368, -327, -321,
-58, -43, -42, +3, +1, +110, +181, +204. Mouse Runx1 m-368, m-327, m-58, m-43,
m+3, m+110 and m+204 were all found to be conserved in humans (Table 3.6). Runx1 m+204
sequence sits within the previously described RUNX1 intron 5.2.
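The containment of m+204 within intron 5.2 can be checked directly from the Hg19 coordinates listed in Table 3.6. The sketch below is illustrative only; the helper names `parse_region` and `contains` are hypothetical.

```python
import re


def parse_region(text):
    """Parse a string such as 'chr21:36179311-36181581' into (chrom, start, end)."""
    match = re.fullmatch(r"(chr\w+):(\d+)-(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised region: {text!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def contains(outer, inner):
    """True if `inner` lies entirely within `outer` on the same chromosome."""
    o_chrom, o_start, o_end = parse_region(outer)
    i_chrom, i_start, i_end = parse_region(inner)
    return o_chrom == i_chrom and o_start <= i_start and i_end <= o_end


# Hg19 coordinates as listed in Table 3.6.
intron_5_2 = "chr21:36179311-36181581"
m204 = "chr21:36179925-36180456"

print(contains(intron_5_2, m204))  # True
```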
When gathering sequence information about these regulatory regions I noted that three
regions identified by Dr. Judith Marsman corresponded to the regions described by Schütte et
al., (2016); Marsman’s m-327 is Schütte’s m-328, Marsman’s m-321 is Schütte’s m-322 and
Marsman’s m-58 is Schütte’s m-59. These will be referred to from now on as m-327, m-321 and
m-58.
To convert coordinates between genome assemblies within a species, the UCSC ‘liftOver’ tool was used (Rosenbloom
et al., 2013). Appendix VII contains the Hg38 locations of Runx1 regulatory regions.
Table 3.5: Analysed ENCODE annotations
Regulatory association | ENCODE annotation
Open chromatin | DNase I hypersensitivity (HS)
Transcription initiation | RNA Pol II (phosphorylated at serine 5)
Cohesin (subunit) | RAD21, SMC3
CTCF binding sites | CTCF
Histone acetyltransferase that is recruited to transcriptionally active enhancers (an increase in the expression of target genes) | p300
H3K4me1/2 demethylase (mediates transcriptional repression) | LSD1
Super-enhancer previously annotated by Khan et al., (2016) | K562 Super-enhancers
Silencer previously annotated by Pang et al., (2020) | Silencers
Cohesin mediated chromatin accessibility previously annotated by Antony et al., (2020) | Chromatin sites that show increased accessibility in the absence of cohesin (STAG2)
Chromatin State | ChromHMM
Enhancer (enhancer-associated histone methylation or acetylation marks) | H3K27ac, H3K4me1, H3K4me2
Active promoters | H3K27ac, H3K9ac, H3K4me3
Repressed promoters | H3K9me3, H3K27me3
Haematopoiesis transcription factors: erythroid development | GATA1
Haematopoiesis transcription factors: haematopoietic | SPI1
Haematopoiesis transcription factors: blood/stem progenitors | TAL1, GATA2
Table 3.6: Conservation and annotations of previously identified Runx1 regulators
Regulatory region location relative to P1 (kb) | Genome coordinates (Hg19) | ENCODE annotation | ChromHMM annotation | Cohesin-mediated chromatin accessibility
m-368 | chr21:36849117-36849289 | RNA Pol II, p300, LSD1, H3K27ac, H3K4me1, H3K4me3, H3K9ac, H3K9me3, H3K27me3 | Strong Enhancer | No change in accessibility in K562 STAG2 mutants
m-327 | chr21:36800068-36801396 | p300, H3K4me1, H3K9ac, H3K9me3, H3K27me3 | Repressed | No change in accessibility in K562 STAG2 mutants
m-321 | Not conserved between humans and mice | N/A | N/A | N/A
-250 | chr21:36669712-36670621 | RNA Pol II, p300, LSD1, H3K9me3, H3K27me3 | Repressed | No change in accessibility in K562 STAG2 mutants
-188 | chr21:36608329-36608806 | p300, H3K4me3, H3K9me3, H3K27me3, SPI1 | Transcription Associated | No change in accessibility in K562 STAG2 mutants
-139 | chr21:36561619-36562555 | p300, H3K4me1, H3K4me3, H3K9ac, H3K9me3, SPI1 | Weak Enhancer, Transcription Associated | No change in accessibility in K562 STAG2 mutants
m-101 | chr21:36542872-36543055 | RNA Pol II, p300, LSD1, H3K27ac, H3K4me1, H3K4me3, H3K9me3 | Transcription Associated | No change in accessibility in K562 STAG2 mutants
m-58 | chr21:36478706-36478906 | DNase I HS, RNA Pol II, CTCF, p300, LSD1, H3K27ac, H3K4me1, H3K4me3, H3K9ac, H3K9me3, GATA1, TAL1, GATA2 | Strong Enhancer | No change in accessibility in K562 STAG2 mutants
m-43 | chr21:36464084-36464260 | RNA Pol II, CTCF, p300, LSD1, H3K4me3, H3K9ac, H3K9me3 | Weak Enhancer | No change in accessibility in K562 STAG2 mutants
m-42 | Not conserved between humans and mice | N/A | N/A | N/A
m+1 | Not conserved between humans and mice | N/A | N/A | N/A
m+3 | chr21:36418472-36418744 | RNA Pol II, CTCF, p300, LSD1, H3K4me1, H3K4me3, H3K9ac, H3K9me3 | Transcription Associated | No change in accessibility in K562 STAG2 mutants
+62 | chr21:36358933-36359953 | DNase I HS, RNA Pol II, CTCF, p300, LSD1, H3K4me1, H3K4me3, H3K9ac, H3K9me3, H3K27me3 | Weak Enhancer | No change in accessibility in K562 STAG2 mutants
m+110 | chr21:36280710-36281200 | DNase I HS, RNA Pol II, RAD21, p300, LSD1, H3K27ac, H3K4me1, H3K4me3, H3K9ac, H3K9me3, H3K27me3, GATA1, TAL1, GATA2 | Strong Enhancer | Increased accessibility in K562 STAG2 mutants
m+181 | Not conserved between humans and mice | N/A | N/A | N/A
Intron 5.2 containing m+204 | chr21:36179311-36181581 | DNase I HS, RNA Pol II, CTCF, p300, LSD1, H3K27ac, H3K4me1, H3K9ac, H3K9me3, H3K27me3 | Transcription Associated | Increased accessibility in K562 STAG2 mutants
m+204 (referred to from now on as intron 5.2 in this thesis) | chr21:36179925-36180456 | DNase I HS, RNA Pol II, CTCF, p300, LSD1, H3K27ac, H3K4me1, H3K9ac, H3K9me3, H3K27me3 | Transcription Associated | Increased accessibility in K562 STAG2 mutants
Based on the ChromHMM data, it appears that in the RUNX1-dependent K562 line, m-368,
m-58 and m+110 are all operating as strong enhancers, whilst -139, m-43 and +62 are
functioning as weak enhancers. Remarkably, +62, a known silencer in K562, also has
enhancer-associated histone methylation marks. Intron 5.2, m-101 and m+3 show the same regulatory
associations as the enhancers; however, they are labelled as transcription associated.
ChromHMM also reported -139 and -188 as transcription associated. The transcription associated
annotation could be due to their close proximity to exons when compared to the regions
annotated as enhancers, or it may indicate that these regions are producing RNA, such as
enhancer RNA (eRNA). Intron 5.2 and m+110 had increased accessibility in cohesin mutant
K562 cells. Antony et al., (2020) found that sites of increased accessibility were enriched for
motifs of enhancer-regulating bZIP or AP-1 factors. These regulatory
regions did not overlap with any known K562 super-enhancers or silencers. Interestingly, m-368
and m-327 are both within LOC100506403, an uncharacterised non-coding RNA, and may
therefore play a role in its regulation as well.
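The grouping of regions by chromatin state can be reproduced from the ChromHMM column of Table 3.6. The sketch below is illustrative only: the dictionary is a hand transcription of that column for the conserved regions, and the variable names are arbitrary.

```python
from collections import defaultdict

# ChromHMM annotations transcribed from Table 3.6 (conserved regions only).
chromhmm = {
    "m-368": ["Strong Enhancer"],
    "m-327": ["Repressed"],
    "-250": ["Repressed"],
    "-188": ["Transcription Associated"],
    "-139": ["Weak Enhancer", "Transcription Associated"],
    "m-101": ["Transcription Associated"],
    "m-58": ["Strong Enhancer"],
    "m-43": ["Weak Enhancer"],
    "m+3": ["Transcription Associated"],
    "+62": ["Weak Enhancer"],
    "m+110": ["Strong Enhancer"],
    "Intron 5.2": ["Transcription Associated"],
}

# Invert the mapping: chromatin state -> regions carrying that annotation.
by_state = defaultdict(list)
for region, states in chromhmm.items():
    for state in states:
        by_state[state].append(region)

for state, regions in by_state.items():
    print(f"{state}: {', '.join(regions)}")
```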
m-368, -250, m-101, m-58, m-43, m+3, +62, m+110 and intron 5.2 were all bound by
LSD1, a mediator of transcriptional repression usually seen in silencers. SPI1, a haematopoietic
transcription factor, was only bound to -139 and -188. The blood stem/progenitor transcription
factors TAL1 and GATA2, as well as GATA1, a marker of erythroid development, were found
to bind m+110 and m-58.
These annotations provide a good overview of the likelihood that a region is regulatory.
However, the analysis is limited to the types of cell lines that have been used for ChIP-seq by
previous groups, as well as which transcription factors were chosen. This may explain why no
consistent expression patterns could be observed in the available K562 data.
To further determine enhancer expression, FANTOM 5 data was considered. Rather
than assessing enhancers via chromatin accessibility and transcription factor binding,
FANTOM 5 detects enhancer RNAs using cap-analysis gene expression (CAGE). The
FANTOM 5 dataset is a compilation of many cell types and RNA transcription across the
genome. It can predict enhancer activity status based on whether short segments of RNA are
transcribed from the regulatory region. By understanding which cell types enhancers are
transcribed in, a pattern of enhancer arrangement may become apparent.
The FANTOM 5 data showed that not all regulatory regions produced enhancer RNA
(eRNA) in the cell types available for analysis. Regions m-368, -188, m-58 and m-101 did not
have any eRNA expression. Although eRNA is typically used for enhancer identification, not
all active enhancers produce detectable eRNA (Mikhaylichenko et al., 2018; Sartorelli et al.,
2020). It remains unclear whether these enhancers produce low levels of eRNAs that are not
easily detectable with current methods.
All other regions expressed RNA in haematopoietic cells; however, there were no
consistent expression patterns in other tissues (Table 3.7).
Table 3.7: Cell types with regulatory region RNA expression
Regulatory region location relative to P1 (kb) | Cell type expressing regulatory region RNA
m-368 | Not expressed
m-327 | Skin, Liver, Brain, Lung, Haematopoiesis, Stomach, Oesophagus, Colon, Gall Bladder, Mesenchymal
-250 | Prostate, Brain, Haematopoiesis, Colon, Gall Bladder, Mesenchymal
-188 | Not expressed
-139 | Liver, Breast, Brain, Lung, Haematopoiesis, Oesophagus
m-101 | Not expressed
m-58 | Not expressed
m-43 | Brain, Lung, Haematopoiesis
m+3 | Haematopoiesis, Mesenchymal, Kidney
+62 | Skin, Prostate, Liver, Breast, Brain, Lung, Haematopoiesis, Ovary, Oesophagus, Colon, Mesenchymal, Kidney
m+110 | Prostate, Haematopoiesis
Intron 5.2 | Skin, Prostate, Brain, Lung, Haematopoiesis, Stomach, Colon, Mesenchymal, Kidney
Single nucleotide polymorphism (SNP) analysis
Ernst et al., (2011) highlighted that disease variants, such as single nucleotide polymorphisms
(SNPs) in genome-wide association studies (GWAS) frequently overlapped with enhancer
elements relevant to the disease cell type (Ernst et al., 2011). To determine if there were any
SNPs in each regulatory region, GWAS catalog (Buniello et al., 2019) and HaploReg v4.1
(Ward et al., 2016) were used. SNPs were also analysed to ensure they had a minor allele
frequency (MAF) of >1 % using the Genotype-Tissue Expression (GTEx) portal v8
(https://gtexportal.org/home/). gnomAD browser v2.1.1 and v3 (Koch, 2020) were used to
review SNPs with MAF >1 % for any ClinVar associations and/or publications. Some of the
SNPs identified in this research (Table 3.8) were linked to SNP variants with r2 values of >=
0.8 and r2 >= 1. As this was a preliminary investigation these linked SNPs were not investigated
further.
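The minor allele frequency filter described above can be sketched as follows. The identifiers and frequencies are placeholders invented for the example, not values from this analysis, and the names `Snp` and `common_snps` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    rsid: str
    region: str
    maf: float  # minor allele frequency as a fraction (0.01 corresponds to 1 %)


def common_snps(snps, min_maf=0.01):
    """Keep SNPs whose minor allele frequency exceeds `min_maf`."""
    return [snp for snp in snps if snp.maf > min_maf]


# Placeholder identifiers and frequencies, invented for illustration.
candidates = [
    Snp("rs_example_1", "region_A", 0.150),
    Snp("rs_example_2", "region_A", 0.004),
    Snp("rs_example_3", "region_B", 0.020),
]

kept = common_snps(candidates)
print([snp.rsid for snp in kept])  # ['rs_example_1', 'rs_example_3']
```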
Multiple SNPs were identified in all of the regulatory regions except m-101. m-368,
m+110 and intron 5.2 contained SNPs that have functional effects in haematopoietic cells
(Table 3.8). Therefore, mutations in these regulatory regions have the potential to alter TF
binding sites and affect the regulatory activity of that region of DNA. Some of the other SNPs
may be functional, but have not yet been investigated.
Table 3.8: SNP analysis of previously identified Runx1 regulators
Regulatory region
Number of SNPs (MAF>=1%)
SNP name Predicted changes SNP function
m-368 3 rs116951441
not predicted to change any motifs
Not investigated
rs16993221 A to T change of rs16993221 alters 2 regulatory motifs BATF and IRF which work together in immune response.
Related to white blood cell count a crucial marker which contributes to chronic inflammation (p value: 2E-8)(Kong et al., 2013).
rs909143 The A to G change in rs909143 is predicted to affect 3 regulatory motifs
Has an eQTL with oesophagus and is related to gene expression of Kynurenine 3-monooxygenase gene in peripheral blood monocytes (p value: 1.84E-06)(Zeller et al., 2010).
m-327 2 rs4817723 not predicted to change any motifs.
Not investigated
rs12106380 T to G change of rs12106380 alters 6 regulatory motifs including GATA binding sites.
Not investigated
-250 3 rs57911917 not predicted to change any motifs
Not investigated
rs2834885 C to A change of rs2834885 alters 1 regulatory motif, PEBP.
Not investigated
-250 | 3 | rs61607093 | T to C change of rs61607093 alters 20 binding motifs including GATA3, Nanog, SPI1, Pax5 and p300 | Not investigated
-188 | 1 | rs35526434 | C to T change alters 12 regulatory motifs including SPI1, PAX5 and p300 | Not investigated
-139 | 3 | rs189789980 | T to G change alters 3 regulatory motifs | Not investigated
-139 | 3 | rs140039393 | TTA to T change alters 4 binding motifs including Sox5 | Not investigated
-139 | 3 | rs2834825 | G to A change alters 19 motifs including Pou2f2 | Not investigated
m-101 | 0 | Not applicable | Not applicable | Not applicable
m-58 | 13 | rs2834768 | G to A change of rs2834768 alters 13 regulatory motifs including MYC, TAL1 and multiple GATA sites | Not investigated
m-58 | 13 | rs2834769 | C to T alters 3 binding motifs | Not investigated
m-58 | 13 | rs13049322 | Not predicted to change any motifs | Not investigated
m-58 | 13 | rs73374626 | G to A change alters 5 motifs including MYC | Not investigated
m-58 | 13 | rs9984842 | C to T is predicted to alter 12 motifs including KLF4, KLF7 and SP1 | Not investigated
m-58 | 13 | rs113221662 | C to T has 7 predicted motif binding alterations including GATA3 and SP1 | Not investigated
m-58 | 13 | rs66840558 | 2 altered motifs | Not investigated
m-58 | 13 | rs144641305 | Contains a change from AAG to A and is predicted to affect 10 binding sites including p300, PAX5, NANOG | Not investigated
m-58 | 13 | rs1883063 | Not predicted to change any motifs | Not investigated
m-58 | 13 | rs8129889 | Not predicted to change any motifs | Not investigated
m-58 | 13 | rs2834770 | A to G change alters CEBPβ site and Hdx (brain related) | Not investigated
m-58 | 13 | rs7283199 | C to T change which predicts Sox17 (which modulates WNT3A and represses RUNX1 expression) and HDAC2 motif changes | Has eQTLs reported with skin and thyroid
m-58 | 13 | rs41360844 | G to C shows 2 predicted regulatory motif changes | Not investigated
m-43 | 2 | rs73902837 | C to G predicted to change 2 regulatory regions | Not investigated
m-43 | 2 | rs2834756 | T to C change alters 4 regulatory motifs | Not investigated
m+3 | 1 | rs9978978 | T to C change is predicted to change 1 regulatory motif | Not investigated
+62 | 2 | rs933131 | G to A change alters 4 regulatory motifs including Pax5 and Pax8 | Not investigated
+62 | 2 | rs2834716 | Not predicted to alter any regulatory motifs | Not investigated
m+110 | 2 | rs73900579 | T to C change alters 2 regulatory motifs, CEBPα and Pax4 | Has an eQTL with brain related to red cell distribution width (Kichaev et al., 2019)
m+110 | 2 | rs201708857 | A to AG change is predicted to alter 20 regulatory motifs including CEBPA, SP1 and Pou2f2 | Not investigated
Intron 5.2 | 3 | rs2268276 | (G major - A minor) alters 2 regulatory motifs | These two SNPs (rs2268276 and rs2249650) were found to be associated with acute myeloid leukaemia susceptibility. The SNPs are in LD and change the ability of the enhancer. The A (rs2249650) G (rs2268276) combination and AA have more enhancer capability than GA and GG; GA has the most reduced enhancer capability. The SNPs also affect SPI1 binding capability: AG has strong SPI1 binding, GA has weak SPI1 binding, whereas AA and GG have medium SPI1 binding capability (Xu et al., 2016)
Intron 5.2 | 3 | rs2249650 | (G major - A minor) alters 4 binding motifs including CEBPA and Pou2f2 | Shared entry with rs2268276 (Xu et al., 2016)
Intron 5.2 | 3 | rs8130772 | A to G alters 5 binding motifs including SP1 and FLI1 | Not investigated
To determine a potential regulatory relationship, Dr. William Schierding and I
identified which genes each SNP physically interacts with and whether that SNP-gene interaction
is supported by an allele-specific change in gene expression, using the CoDeS3D pipeline (Fadason et
al., 2018). First, CoDeS3D interrogates 70 Hi-C chromatin interaction libraries (from a diverse
set of human cell lines) to identify interactions between the SNPs and genes across the genome. Next, each
SNP is tested for an allele-specific association with the expression level of the spatially associated
partner gene, that is, whether the SNP is an expression quantitative trait locus (eQTL).
Despite finding many spatial interactions between RUNX1 regulatory region
SNPs and RUNX1, none of them was supported by a significant eQTL (FDR <
0.05). One reason for this lack of significance may be that the CoDeS3D pipeline draws its eQTL data from the GTEx
portal, which covers only 54 tissues sampled from healthy older individuals.
GTEx also measures only whole blood, rather than individual blood cell types, so cell type- or
disease-specific effects may not be detected.
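The FDR threshold applied to the eQTL tests can be illustrated with a short sketch. The Python example below is not the CoDeS3D implementation: it assumes a Benjamini-Hochberg style correction (the text states only that FDR < 0.05 was required), and the SNP names, gene names and p-values are hypothetical placeholders.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_eqtls(tests, fdr=0.05):
    """Keep SNP-gene tests whose adjusted p-value is below `fdr`."""
    adjusted = benjamini_hochberg([t["p"] for t in tests])
    return [dict(t, q=q) for t, q in zip(tests, adjusted) if q < fdr]


# Hypothetical SNP-gene pairs and p-values, for illustration only.
example_tests = [
    {"snp": "snpA", "gene": "GENE1", "p": 0.001},
    {"snp": "snpB", "gene": "GENE1", "p": 0.04},
    {"snp": "snpC", "gene": "GENE2", "p": 0.03},
    {"snp": "snpD", "gene": "GENE2", "p": 0.20},
]
hits = significant_eqtls(example_tests)
```

In this toy input only the first pair passes the threshold: nominal p-values of 0.03 and 0.04 no longer qualify once the four tests are corrected together, which is the kind of attrition that can leave spatially supported SNP-gene pairs without a significant eQTL.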
3.4.2 Comparative analysis of cancer genomic datasets
To assess whether any AML patients had mutations in the conserved regulatory regions,
publicly available cancer genomic datasets were explored. Online tools were used to categorise
publicly available patient information into different tumour types for analysis.
The large-scale cancer genomics datasets were retrieved from the International Cancer
Genome Consortium (ICGC) Data Portal (Hudson et al., 2014; Zhang et al., 2011), the
cBioPortal for Cancer Genomics (Cerami et al., 2012; Gao et al., 2013) and The Cancer Genome
Atlas (TCGA) Data Portal (CGARN, 2013).
The ICGC Data Portal was the only database that allowed
analysis of non-coding regions. RUNX1 regulatory regions (as defined in Table 3.6) were
analysed for mutations in the cancer datasets, and 237 mutations were found (Table 3.9). The proportion
of these mutations that occurred in haematopoietic-related
cancers was relatively low (6.75%). The largest share of mutations in these regions occurred in
skin-related cancers (39.24%) (Table 3.9). However, it is worth noting that even normal skin
has a high mutational load relative to other tissues (García-Nieto et al., 2019; Martincorena et
al., 2015).
Table 3.9: Distribution of the 237 Runx1 regulatory region mutations found in the ICGC dataset across
primary tissue sites.
Primary site of mutation | Representation of found mutations (%)
Skin | 39.24%
Liver | 10.97%
Brain | 8.44%
Oesophagus | 8.02%
Haematopoietic | 6.75%
Prostate | 4.22%
Breast | 3.80%
Pancreas | 3.80%
Lung | 3.38%
Ovary | 2.53%
Gall Bladder | 2.53%
Mesenchymal | 1.69%
Head and Neck | 1.69%
Stomach | 1.27%
Kidney | 0.84%
Colorectal | 0.42%
Uterus | 0.42%
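Because Table 3.9 reports percentages of a known total (237 mutations), the approximate number of mutations per primary site can be back-calculated. The Python sketch below does this; the counts it produces are implied by rounding the tabulated percentages and are not values taken directly from the ICGC data.

```python
TOTAL_MUTATIONS = 237  # total number of mutations reported for Table 3.9

# Percentages as listed in Table 3.9.
site_percent = {
    "Skin": 39.24, "Liver": 10.97, "Brain": 8.44, "Oesophagus": 8.02,
    "Haematopoietic": 6.75, "Prostate": 4.22, "Breast": 3.80,
    "Pancreas": 3.80, "Lung": 3.38, "Ovary": 2.53, "Gall Bladder": 2.53,
    "Mesenchymal": 1.69, "Head and Neck": 1.69, "Stomach": 1.27,
    "Kidney": 0.84, "Colorectal": 0.42, "Uterus": 0.42,
}

# Back-calculate the implied number of mutations for each primary site.
site_count = {
    site: round(pct * TOTAL_MUTATIONS / 100)
    for site, pct in site_percent.items()
}

for site, count in site_count.items():
    print(f"{site:15s} {count:4d}")
print("Total:", sum(site_count.values()))
```

Under this rounding the implied counts sum to 237, with 93 mutations in skin and 16 in haematopoietic cancers, which is a quick consistency check on the tabulated percentages.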
The regulatory regions showed different frequencies of mutations in this dataset (Figure
3.3). m-58 had the highest frequency of mutations, with 81 incidences, followed by intron 5.2
(41) and then m-327 (27). m-58 also showed the highest incidence of mutation in
haematopoietic cancers (4). Each enhancer had a relatively diverse profile of primary mutation
sites (Figure 3.3).
Figure 3.3: Proportions of cancerous primary site mutations of regulatory regions.
Each of the regulatory regions displayed diverse cancerous primary site profiles. Primary site profiles are colour coded. Key is at the bottom of the figure. Total number of mutations excluding skin in each regulatory region shown at top of each bar. As normal skin has a high mutational load relative to other tissues it was not included in this diagram.
Some of the regulatory region SNPs (from Section 3.4.2) were reported in patients: an
oesophageal adenocarcinoma patient had the m-327 T>G rs12106380 SNP, and another
oesophageal adenocarcinoma sample had the same G to A change reported for the m-58 rs8129889
SNP. Three patients had the -250 SNP rs2834885, two with acute myeloid leukaemia and one with
oesophageal adenocarcinoma. Four patients with malignant lymphoma had the m-58 mutations
chr21:g.36480715C>A (rs199811665), chr21:g.36480761A>C and chr21:g.36481470C>T
(rs1440069314). None of these SNPs were reported in the GTEx portal. Two SNPs in +62,
rs933131 and rs2834716, were found in patients with acute myeloid leukaemia. Interestingly, the
patient with rs2834716 also had the intron 5.2 SNPs (rs2268276 and rs2249650), which have
been previously associated with acute myeloid leukaemia susceptibility (Xu et al., 2016).
Intron 5.2 SNP rs2268276 was also found in a patient with lung cancer. The intron 5.2 SNP rs2249650
was also reported in three other patients: two with lung cancer and one with a paediatric brain
tumour.
The genomic data in the ICGC Data Portal contain multiple mutations per patient. The
analyses of these regions show that some of the reported SNPs are found in AML patients;
however, it is unknown whether many of these SNPs have a functional impact in these patients.
As was done for the intron 5.2 SNPs (Xu et al., 2016), each of these SNPs needs to be analysed
for functional capability. Interestingly, the majority of cancers reported
with mutations in RUNX1 regulatory regions were not blood-specific, instead reflecting the
roles of RUNX1 in other tissues (Groner et al., 2017). The tissues highlighted as expressing
regulatory regions in FANTOM 5 data (Table 3.7) were the same primary sites of mutation in
ICGC data (Table 3.9, Figure 3.3).
3.5 Zebrafish enhancer reporter assays
To understand the tissue-specific activity of Runx1 regions that directly interact with Runx1
P1, P2 and +24 (Marsman, 2016), zebrafish reporter assays were used.
DNA sequences of the putative enhancers were amplified from mouse gDNA then
cloned into zebrafish enhancer detector assay reporter constructs (Figure 3.4), and injected into
one-cell zebrafish embryos.
Figure 3.4: Schematic representation of ZED assay vector.
This vector contains two cassettes. The first cassette consists of the cardiac actin promoter (blue) upstream of Red Fluorescent Protein (red). The second cassette contains the gateway cassette (yellow), which is immediately downstream of the gata2 promoter (blue), downstream of the green fluorescent protein (green). The second cassette is protected by insulator sequences (purple). The purple dotted line indicates the insulated sequence, which prevents positional effects from surrounding DNA following integration. Tol2 sites (orange) allow for integration of the vector into the zebrafish genome. The vector contains two excision cassettes, one targeted by Flp recombinase (FRT sites) and the other by Cre recombinase (LoxP sites). Figure from Thomas (2015).
The zebrafish enhancer detector (ZED) assay vector is designed to have two different
cassettes: one to test the enhancer activity of the regulatory element in a cell type-specific manner,
and a second that functions as an internal control for assessing injection efficiency
and as a transgenic marker. If the regulatory region acts as an enhancer, it will cooperate with the
gata2 promoter to drive cell-specific GFP expression. The enhancer detection cassette is
bordered by two insulator sequences, which prevent interaction between the two cassettes. The
two insulator sequences also prevent position effects from the surrounding DNA following
integration, which reduces false positives. LoxP and FRT sites allow for deletion of either cassette
using Cre or Flp (flippase) recombinase, respectively. The internal control cassette utilises the cardiac actin
promoter (pCarA) to drive Red Fluorescent Protein (RFP) expression in the somites, which
serves as a marker for successful transgene integration. The enhancer detection cassette, designed to test
the cell type-specific activity of a putative enhancer, contains a gateway entry site for
the putative enhancer, a gata2 minimal promoter and the enhanced GFP reporter gene.
Tol2 sites permit transposon-mediated integration into the zebrafish genome with high
efficiency in the presence of Tol2 transposase mRNA (Kawakami, 2005). Germline integration
into the host genome allows for expression of the ZED cassette in subsequent generations. If
the vector is successfully integrated, non-mosaic expression patterns are seen. Unsuccessful or
partial integration can lead to inconsistent and mosaic expression.
3.5.1 Zebrafish Enhancer assay F0:
All putative mouse Runx1 enhancer regions identified by Dr. Marsman’s 4C (ZED
m+110, m-48, m-58, m-303, m-321, m-327, m-354, m-368 and m-371) and ZED +24 were
tested for enhancer activity in a zebrafish enhancer assay (Figure 3.4).
Runx1 +24 was used as a positive control and the empty ZED assay vector was used as
a negative control (Figure 3.5 A). Runx1 +24 drove GFP expression specifically in the
intermediate cell mass (ICM) and posterior blood island, which are sites of hematopoietic
progenitor cell production, at 20-24 hours post-fertilisation (hpf) (de Jong et al., 2005).
Eight of the nine putative enhancer regions (ZED m+110, m-48, m-58, m-303, m-321,
m-354, m-368 and m-371) showed similar expression patterns to +24 (Figure 3.5 B). These eight
regions functioned as hematopoietic-specific enhancers in zebrafish, indicating that their
enhancer activity is conserved.
ZED m-303 also showed expression in keratinocytes at 20-24 hpf and 48 hpf,
as well as haematopoietic activity at 20-24 hpf (Figure 3.5 C).
The empty ZED vector and ZED m-327 had no observed enhancer activity at these timepoints (Figure 3.5
A,B).
Further analysis of these Runx1 enhancers highlighted that each region had an individual
expression pattern (Figure 3.6). +24 and m-368 had similar patterns,
with the highest expression in the dorsal aorta (DA), followed by expression in the
ICM. The ICM is the site of the primitive wave of haematopoiesis and generates
primitive erythrocytes and limited myeloid populations. The DA is a main axial vessel, which
carries blood away from the heart, towards the tail. The first wave of definitive haematopoiesis
arises in the ventral wall of the DA.
The m-371, m-354, m-321, m-58 and m-48 enhancers drove the highest expression in
the ICM. m+110 produced even expression in the DA and ICM as well as in the tail bud of the
zebrafish. m-303 produced the highest expression in keratinocytes in the tail lining, followed by
DA expression. High DA expression suggests m-371, m-368, m-354 and m+110 have the
potential to act as blood cell-specific enhancers, and may be involved in the initial wave of
definitive haematopoiesis.
Figure 3.5: Putative mouse enhancers are active in hematopoietic regions in zebrafish.
Whole mount representative lateral images of zebrafish embryos (20-24 hpf [A, B] and ~48 hpf [C]) that were injected with enhancer-GFP plasmids at the one-cell stage; left hand panels are merges with bright field, right hand panels are GFP fluorescence. (A) The negative control ZED plasmid shows zero (image) or non-specific fluorescence, while the positive control (+24) shows GFP expression in hematopoietic cells of the posterior blood island (red arrows) and intermediate cell mass (yellow arrows). (B) Enhancer activity of m+110, m-48, m-58, m-303, m-321, m-327, m-354, m-368 and m-371. (C) The m-303 enhancer also expressed GFP in keratinocytes (white arrow). The numbers in the right hand panels represent the number of embryos exhibiting the representative phenotype out of the total number of fluorescent embryos analysed. Figure originally used in Marsman et al. (2017). This image is used with permission under a Creative Commons CC BY license.
Figure 3.6: Quantitative summary of GFP expression patterns generated by putative mouse enhancers in
zebrafish.
For each enhancer construct (m+110, m-48, m-58, m-303, m-321, m-327, m-354, m-368, m-371) and the negative (ZED) and positive (+24) controls, GFP expression data in different tissues are shown. Diagrams represent pooled expression from at least 5 photographed embryos per putative enhancer (20-24 hpf), displayed on illustrations of a 24 hpf zebrafish embryo. Categories of expression zones are colour-coded: key is at the bottom of the Figure. Bar graphs display the percentage of the total number of GFP-expressing embryos that show expression in each tissue category for each putative enhancer. The total number of expressing embryos analysed per construct is displayed in the top right corner of each graph. Figure originally used in Marsman et al. (2017). This image is used with permission under a Creative Commons CC BY license.
Enhancer reporter expression in the injected generation of zebrafish (F0) is mosaic
(not expressed in every cell); however, it still provides information about the tissue-specific
activities of the identified enhancers. F1 fish can be grown and used for further analysis.
3.5.2 Zebrafish Enhancer assay F1:
Although strong enhancers were detected using transient GFP expression in zebrafish
reporter assays, it is more informative to analyse F1 fluorescence, as each embryo should
have complete integration of the construct in all cells (Bessa et al., 2009). Cell-specific GFP
expression through development would reveal the tissue-specific spatiotemporal
activity of the enhancer more precisely.
F0 embryos (the injected generation) that presented complete fluorescence in the somites
were selected to produce transgenic lines. When the F0 fish were sexually mature, they
were out-crossed with wild type fish to determine whether the assay construct had been
successfully integrated into the gametes of each individual.
Ute Zellhuber-McMillan and I bred all the fish lines to try to establish stable F1 lines.
Together we bred 11 different fish enhancer lines, from which over 140 fish were paired for
breeding and 116 fish produced embryos. These fish produced 15,314 embryos, which were
screened for green fluorescence (20-24 hpf) and red fluorescence (48 hpf); however, no
embryos showed any control somite RFP or enhancer GFP.
The lack of fluorescence does not appear to be a Tol2 mRNA issue, as other lines in the
lab have been successfully established as F1 lines using the same Tol2 mRNA (Ketharnathan et al., 2018). It is
unlikely that the lack of fluorescence is due to injection technique, as I clearly observed
fluorescence in F0 and have completed successful double injections for other projects during
my PhD (Meier et al., 2018). This could be a blood enhancer-specific issue, as none of the
different plasmids I have used (ZED, SCE and Tol2Gata2) have produced successfully integrated
F1 lines. Personal communication with the Zon lab indicated that other
researchers have been unsuccessful in making haematopoietic-specific enhancer F1 lines using
assays that contained the zebrafish gata2 promoter. It has been proposed that, due to the delicate
nature of haematopoiesis, plasmid DNA that integrates into the zebrafish genome is
actively repressed to maintain correct haematopoiesis. As the plasmid insert is repressed,
no RFP or GFP is expressed in F1 fish.
3.6 Genomic characteristics of 4C identified functional novel
enhancers
To understand whether there was a rationale for the cell type patterns observed in the zebrafish
enhancer assays (Figure 3.6), Dr. Judith Marsman analysed the Runx1 4C-identified regulatory
regions (Table 3.4) (Runx1 m+24, m+110, m-48, m-58, m-303, m-321, m-327, m-354, m-368
and m-371) in the available mouse haematopoietic progenitor cell (HPC7) ENCODE data. No
consistent patterns could be observed. Runx1 m-327 had similar transcription factor binding to
the other enhancers but no observed enhancer activity in zebrafish embryos. Interestingly,
Runx1 m-368 was not bound by any transcription factors and showed no sign of open chromatin
(DNase I HS site), and therefore does not appear to have a regulatory function in developing
haematopoietic cells.
Of the Runx1 4C-identified regulatory regions, m-371, m-354, m-303 and m-48 were
novel. As noted in Section 3.1.1, m-368, m-327, m-321, m-58 and m+110 were identified by
other groups (Ng et al., 2010; Schütte et al., 2016).
In order to further understand the novel Runx1 enhancers identified by Judith Marsman
(m-371, m-354, m-303 and m-48), I carried out the same in silico analysis used to define the
previously identified Runx1 regions (Section 3.4).
3.6.1 Conservation and epigenetic analysis
The only novel Runx1 enhancer that is conserved in humans is m-371 (Table 3.10).
This region shows transcription factor binding sites and chromatin modifications similar to those
found at m-58 and m+110. ChromHMM annotates this region as a strong enhancer, like m-368,
m-58 and m+110. The m-371 enhancer region sits within the same uncharacterised non-coding
RNA, LOC100506403, as m-368 and m-327.
m-354, m-303 and m-48 are mouse-specific novel Runx1 enhancers.
Table 3.10: Conservation and annotations of novel Runx1 regulators
Regulatory region location relative to P1 (kb) | Genome coordinates (Hg19) | ENCODE annotation | ChromHMM annotation | Cohesin mediated chromatin accessibility
m-371 | chr21:36855111-36856085 | DNase I HS, RNA Pol II, CTCF, p300, LSD1, H3K27ac, H3K4me1, H3K4me3, H3K9ac, H3K9me3, H3K27me3, GATA1, TAL1, GATA2 | Strong Enhancer | Increased accessibility in K562 cohesin mutants
m-354 | Not conserved between humans and mice | N/A | N/A | N/A
m-303 | Not conserved between humans and mice | N/A | N/A | N/A
m-48 | Not conserved between humans and mice | N/A | N/A | N/A
3.6.2 Single nucleotide polymorphism (SNP) analysis
The FANTOM 5 data showed that m-371 RNA was actively expressed in skin and
prostate cells. SNP analysis of m-371 showed that one of the two SNPs in this region (rs2834945) is related
to Kynurenine 3-monooxygenase (KMO) expression in peripheral blood monocytes (Table
3.11), as is the m-368 SNP rs909143. KMO is typically researched in a neurological context,
as high KMO expression leads to neurodegeneration (Breton et al., 2000).
The 3D spatial interactions found by the CoDeS3D pipeline between RUNX1 m-371 SNPs
and RUNX1 were not supported by any significant eQTL interactions.
Table 3.11: SNP analysis of novel m-371 enhancer
Regulatory region | Number of SNPs (MAF>=1%) | SNP name | Predicted changes | SNP function
m-371 | 2 | rs2834944 | The T to C change alters 4 regulatory motifs | Not investigated
m-371 | 2 | rs2834945 | The T to C change is predicted to affect 11 regulatory motifs including GATA2, PAX3 and PAX5 | Has an eQTL with oesophagus and is related to gene expression of KMO in peripheral blood monocytes (p value: 1.36E-06) (Zeller et al., 2010)
3.6.3 Comparative analysis of cancer genomic datasets
The largest share of m-371 mutations (n=23) presented with skin as the primary site
(40.90%), followed by prostate, liver and breast primary sites (13.63% each). m-371 mutations also
presented in oesophagus (9.09%), pancreas (4.54%) and ovary (4.54%).
The FANTOM5 expression of m-371 in skin and prostate cells overlaps with the
primary tissues that contain m-371 mutations. This suggests that m-371 is an enhancer in skin
and prostate cells, as it is producing eRNA. Mutations in this enhancer in skin and prostate
tissues may alter normal cell development.
3.7 Chapter summary
This work identified conservation of 9 out of the 16 mouse Runx1 regulatory regions
from mouse to human (m-371, m-368, m-327, m-101, m-58, m-43, m+3, m+110 and m+204).
m-354, m-321, m-303, m-48, m-42, m+1 and m+181 are not conserved in human.
In K562 cells, ChromHMM annotates m-368, m-58, m+110, -139, m-43 and +62 as
enhancers. Intron 5.2, m-101 and m+3 show the same regulatory associations as the enhancers;
however, they are instead labelled as transcription associated. This could be due to their close
proximity to exons compared with the regions annotated as enhancers. ChromHMM also
reported -139 and -188 as transcription associated. m-371, m-368 and m-327 all sit within
LOC100506403, an uncharacterised non-coding RNA.
m-368, -250, m-101, m-58, m-43, m+3, +62, m+110 and intron 5.2 were all bound by
LSD1, a mediator of transcriptional repression usually seen in silencers. SPI1, a haematopoietic
transcription factor, was bound only to -139 and -188. The blood stem and progenitor cell transcription
factors TAL1 and GATA2, as well as GATA1, a marker of erythroid development, were found
to bind m-371, m+110 and m-58.
Cheng et al. (2018) characterised +62 as a silencer of RUNX1 P2 in K562, OCI-AML3
and U937 cells. However, in K562 cells +62 has ENCODE and
ChromHMM annotations associated with enhancer activity.
Intron 5.2, m+110 and m-371 were annotated as differentially open
chromatin in the ATAC-seq of the cohesin mutant (Antony et al., 2020). This suggests that these
enhancers’ interactions could be dependent on cohesin. In this chapter, I observed
binding of the cohesin subunit Rad21 in the absence of CTCF at the +110 enhancer,
and CTCF binding at m-371, m-58, m-43, m+3, +62 and intron 5.2. This suggests that CTCF
and cohesin could independently mediate a subset of enhancer-promoter looping in
combination with transcription factors.
Nine of the 13 conserved RUNX1 regulatory regions produced eRNA in human
haematopoietic cells. SNP analysis identified SNPs in each of the conserved regulatory regions
except m-101. Of the 35 SNPs, 5 had previously reported functional effects in haematopoietic
cells. The available AML genomic data is relatively limited because the majority of analyses
were done with exome sequencing data rather than whole genome sequencing data.
I functionally characterised the novel Runx1 enhancers m-371, m-354, m-303 and m-48
using zebrafish enhancer assays. I also characterised the previously identified m+110, m-58, m-321,
m-327 and m-368 regions, showing that all of these have enhancer capability except m-327. Despite the lack of
activity seen in embryos injected with m-327, this region spatially connects to the active promoter
Runx1 P1 and to +24 in HPC7 cells, so it may have another regulatory role in this hub. My analysis
showed enhancer activity in haematopoietic tissues of developing zebrafish embryos,
highlighting that these enhancers appear to be important in both primitive and definitive waves of
haematopoiesis.
After I functionally analysed the mouse enhancers using zebrafish reporter assays, Zhu
et al. (2020) further characterised m-371 using single-cell (sc) ATAC-seq and scRNA-seq
from developing mice. m-371 is only accessible in pre-haemogenic endothelial (HE) cells and
in the intra-arterial clusters (IACs) in mice. This suggested it could initiate Runx1 P1 in
pre-HEs and contribute to expression of P1 in IACs (Zhu et al., 2020).
Of the enhancers functionally identified, all except +110 interacted with Runx1 +24
(Marsman, 2016), suggesting that +24 could be anchoring an enhancer hub. These CREs are
conserved between mice and humans, suggesting they are fundamental for the correct regulation
of RUNX1.
This chapter has provided an insight into mouse Runx1 enhancer functions and
highlighted the features of conserved human enhancers. As shown in Dr. Judith Marsman’s 4C
analysis, +24 is a central hub that contacts most active Runx1 enhancers in HPC7 cells. +24 may also
be the organiser of correct enhancer-promoter interactions for Runx1 regulation.
4 Chapter 4: Three-dimensional (3D) connections of
Runx1 +24
4.1 Introduction
The three-dimensional (3D) organisation of the genome is non-random (Holwerda et
al., 2012; Lieberman-Aiden et al., 2009). The genome is organised into topologically
associated domains (TADs), formed by an elevated number of chromatin interactions within a
genomic locality, with fewer interactions to outside regions (Lieberman-Aiden et al., 2009;
Rao et al., 2014). Genes located within the same TAD often have the same histone
modifications and are co-regulated (Rao et al., 2014). The borders of domains are enriched for
cohesin and CTCF binding sites, housekeeping genes, short interspersed element (SINE)
retrotransposons and transfer RNAs (Dixon et al., 2012). Loss of CTCF relaxes the global domain
structures, which increases contacts between domains, whereas loss of cohesin mostly affects
interactions within domains (Li et al., 2013b; Zuin et al., 2014). 3D connections of the genome
are predominantly detected by Hi-C and Promoter Capture sequencing, allowing for high
throughput analysis and subsequently the creation of genome interaction maps (example shown
in Figure 4.1).
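The definition of a TAD given above (more contacts within a genomic locality than with outside regions) can be made concrete with a toy calculation. The Python sketch below compares the mean contact frequency within and between two domains; the 6-bin contact matrix and the domain boundaries are hypothetical values chosen for illustration, and the calculation ignores the decay of contact frequency with genomic distance that real Hi-C analyses account for.

```python
def mean_contacts(matrix, domains):
    """Return (mean within-domain, mean between-domain) contact frequency.

    `matrix` is a symmetric list of lists of contact counts between bins;
    `domains` is a list of (start, end) bin ranges with `end` exclusive.
    """
    domain_of = {}
    for label, (start, end) in enumerate(domains):
        for bin_index in range(start, end):
            domain_of[bin_index] = label

    within, between = [], []
    n_bins = len(matrix)
    for i in range(n_bins):
        for j in range(i + 1, n_bins):  # upper triangle, diagonal excluded
            same_domain = (
                i in domain_of and j in domain_of
                and domain_of[i] == domain_of[j]
            )
            (within if same_domain else between).append(matrix[i][j])
    return sum(within) / len(within), sum(between) / len(between)


# Hypothetical 6-bin contact matrix containing two 3-bin domains.
toy_matrix = [
    [0, 9, 8, 1, 1, 0],
    [9, 0, 9, 2, 1, 1],
    [8, 9, 0, 2, 2, 1],
    [1, 2, 2, 0, 9, 8],
    [1, 1, 2, 9, 0, 9],
    [0, 1, 1, 8, 9, 0],
]
toy_domains = [(0, 3), (3, 6)]
intra, inter = mean_contacts(toy_matrix, toy_domains)
print(f"within-domain mean: {intra:.2f}, between-domain mean: {inter:.2f}")
```

A within-domain mean well above the between-domain mean is the pattern that appears as squares along the diagonal of a Hi-C contact map.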
Figure 4.1: Hi-C contact map of the RUNX1 region in K562 cells.
Hi-C data generated by Rao et al., (2014) from chromosome 21 was visualised in 3D genome browser (http://3dgenome.fsm.northwestern.edu/view.php) at a 25-kb resolution (Wang et al., 2018). Every square is coloured according to the frequency of two regions interacting. The alternating yellow and blue bars are predicted TADs. The dark red bars are DNase I hypersensitive sites (DHS) in the same cell type. Locations of genes are annotated according to their transcription direction, forward strand (black) and reverse strand (blue).
In Chapter 3 I described the characterisation and function of Runx1 regulatory regions.
The majority of functional haematopoietic-specific Runx1 enhancers interacted with +24 in
mouse hematopoietic progenitor (HPC7) cells (Marsman, 2016). Runx1 +24, a well-characterised
enhancer, interacts with the Runx1 promoters (P1 and P2) (Markova et al., 2011;
Marsman, 2016). I hypothesise that Runx1 +24 may nucleate an enhancer hub, thereby
controlling Runx1 connections and contributing to the overall Runx1 organisation and
expression.
Previous work from Dr. Antony described RUNX1 transcription in normal and cohesin
deficient megakaryocytic differentiation (Antony et al., 2020). Runx1 +24 is crucial to
hemogenic endothelial cells in mouse and zebrafish (Markova et al., 2011; Marsman et al.,
2017; Ng et al., 2010; Schütte et al., 2016). Interestingly, this well-known enhancer was also
described as a silencer in the non-haematopoietic cell line HEK293 (Markova et al., 2011).
Understanding where in the genome Runx1 +24 connects and how it contributes to Runx1
organisation is imperative to understanding the transcriptional regulatory landscape of Runx1.
4.1.1 Secondary structure of DNA
G-quadruplexes are secondary structures that are often present in telomeres and also
gene promoters; however, their function remains elusive. Previous studies on G-quadruplexes
indicate possible roles for them in controlling gene expression (Du et al., 2007; Eddy et al.,
2008; Zhao et al., 2007).
4.1.1.1 Secondary structure of Runx1 +24
In my previous Master’s work, a possible G-quadruplex (G4) structure was identified
in the Runx1 +24 mouse cis-regulatory element (mCRE). Despite multiple attempts with different
primers, the mCRE could never be sequenced the full way through using Sanger sequencing. When
the reads were compared with the mouse (mm9) +24 mCRE sequence sent by Professor Motomi Osato, it
was noted that the sequences stopped in the same place every time. The online tool QGRS
Mapper was used to predict whether a G-quadruplex structure was present (Kikin et al., 2006).
Two predicted quadruplex structures sit either side of the +24 region that could not be
sequenced. One was in the sense direction and the other was in the antisense direction.
Sequences can be found in Appendix IV. Each predicted structure has the potential to form a
four-stranded structure stabilised by G-quartets (Figure 4.2). G-quadruplex structures derive
stability from hydrogen bonding between guanines and stacking of G-quartets (Maizels et al.,
2013). Due to time constraints, I have not investigated whether this structure is predicted in the
human sequence.
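The idea behind predicting G-quadruplexes on the sense and antisense strands can be sketched with a simple pattern search. The Python example below is not the QGRS Mapper algorithm, which applies its own search rules and scoring; it assumes a commonly used simplified motif (four runs of at least three guanines separated by loops of one to seven bases), and the test sequence is an artificial construct, not the Runx1 +24 sequence.

```python
import re

# Simplified motif: four runs of >=3 guanines separated by 1-7 base loops.
G4_SENSE = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)
# A G4 on the opposite strand appears as C-runs on the strand being read.
G4_ANTISENSE = re.compile(
    r"C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}"
)


def find_g4_candidates(sequence):
    """Return (strand, start, end, motif) tuples sorted by start position."""
    seq = sequence.upper()
    hits = []
    for strand, pattern in (("sense", G4_SENSE), ("antisense", G4_ANTISENSE)):
        for match in pattern.finditer(seq):
            hits.append((strand, match.start(), match.end(), match.group()))
    return sorted(hits, key=lambda hit: hit[1])


# Artificial test sequence: one G-rich and one C-rich motif with spacers.
toy_sequence = (
    "ATAT"
    + "GGGTTAGGGTTAGGGTTAGGG"
    + "ATATAT"
    + "CCCTAACCCTAACCCTAACCC"
    + "ATAT"
)
for strand, start, end, motif in find_g4_candidates(toy_sequence):
    print(strand, start, end, motif)
```

Searching the given strand for C-runs is equivalent to searching the reverse complement for G-runs, which is why one motif can be reported in the sense direction and another in the antisense direction.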
Figure 4.2: Schematic representation of mouse +24 individual G-quadruplex.
Four guanines form a G-quartet, a planar ring around a central channel (represented by each grey rectangle). G-quadruplex structures derive stability from hydrogen bonding between guanines and stacking of G-quartets.
In the work described in this Chapter, I sought to understand how Runx1 +24 influences
the 3D structure and connectivity of the Runx1 locus. To understand how the formation of the
G-quadruplexes influences transcription, I assessed the structural integrity of the predicted mouse
Runx1 G-quadruplex using a Circular Dichroism (CD) spectrometer. I then determined
whether the G-quadruplex structure affects enhancer capability by using luciferase assays in
mouse fibroblast (NIH/3T3) cells.
I used circular chromosome conformation capture (4C) to identify DNA-DNA interactions
with the RUNX1 +24 enhancer region. 4C is an unbiased approach that detects genome-wide
interactions with a chosen region, or “bait”, by next generation sequencing of captured
fragments (van de Werken et al., 2012a). Identification of RUNX1 +24-anchored interactions
has the potential to further elucidate the role of this enhancer by measuring its genomic
connectivity. 4C was performed in the K562 (chronic myelogenous leukaemia) cell line. In
K562 cells, RUNX1 P1 expression was previously shown to have a key role in megakaryocyte
lineage determination (Draper et al., 2017). K562 cells can be induced to differentiate to the
megakaryocytic lineage (Kuvardina et al., 2015; Pencovich et al., 2011).
The majority of 3D chromatin mapping has been done in steady-state cells. Cohesin
depletion has been shown to alter TADs and, in some cases, to result in their complete collapse
(Waldman, 2020). Cuartero et al. (2018b) and Antony et al. (2020) have recently shown that, upon
stimulation of differentiation, gene expression changes are more distinct in cohesin-depleted
cells. To determine the influence of cohesin and
enhancer-specific connections in lineage-specific regulation, the chromatin architecture should
be investigated before, during and after the differentiation process.
To understand the role of cohesin in enhancer connections and leukaemic gene
regulation, K562 cells with CRISPR-Cas9-engineered STAG2 mutations were used to
determine the connectivity of RUNX1 +24. K562 cells were induced to differentiate to the
megakaryocytic lineage using phorbol 12-myristate 13-acetate (PMA), so that the role of RUNX1
+24 could be further understood. K562 lines with the patient-specific STAG2 R614* mutation
(-/-) were previously generated by Dr. Antony.
Antony et al., (2020) highlighted the altered phenotype of K562 cells with STAG2
mutation using RNA-seq. Using ATAC-seq, Dr. Antony also noted altered
chromatin accessibility in K562 cells with STAG2 mutation. The RNA-seq and ATAC-seq data
revealed that the observed gene expression changes are associated with altered chromatin
accessibility at regulatory regions and super-enhancers in STAG2 mutants (Antony et al.,
2020).
K562 cells are labelled wild-type (WT), with the treatment condition (PMA or
DMSO) and duration also included in the nomenclature. STAG2 refers to the STAG2-mutant
K562 cells that were CRISPR-edited to contain the R614* mutation in the STAG2 subunit (Antony
et al., 2020). The baseline condition was collected after 12 hours (h) of DMSO
treatment (12 DMSO). Cells were also collected after 12 h of PMA treatment (12 PMA) and 72 h
of PMA treatment (72 PMA). The 12 h PMA time point was chosen because of the aberrant RUNX1
and ERG expression spikes seen in STAG2 mutants at this time point (Antony et al., 2020), and
72 h is the time point at which the aberrant RUNX1 expression spikes seen in STAG2 mutants
return to non-PMA-stimulated levels (Antony et al., 2020).
I chose RUNX1 +24 as a ‘bait’ to capture chromatin interactions because it is a strong
and well-characterised enhancer of RUNX1 transcription and therefore is likely to make
important regulatory contacts during haematopoietic cell differentiation. Identifying the +24
contacts in leukaemia cells may enhance our understanding of haematopoiesis and the
development of leukaemia.
4.2 Hypothesis
I hypothesise that RUNX1 +24 makes 3D genomic contacts that are important in the
regulation of RUNX1 expression.
4.3 Aims
1) Identify enhancer characteristics of RUNX1 +24 using bioinformatic tools.
2) Characterise a possible G-quadruplex structure in the mouse Runx1 +24 enhancer.
3) Identify the 3D connections of human RUNX1 +24 using 4C in K562 cells.
4) Determine genomic characteristics of significant chromatin interactions using
bioinformatic tools.
4.4 Enhancer characteristics of RUNX1 +24
RUNX1 +24 was analysed in the same manner as the other Runx1 regulatory regions
(Section 3.5).
4.4.1 Epigenetic analysis
The +24 region shows similar haematopoietic transcription factor binding sites and
chromatin modifications to m-371, m-58 and m+110. However, +24 has additional cohesin
(SMC3) and SPI1 binding. ChromHMM annotates this region as a strong enhancer, similar to
m-371, m-368, m-58 and m+110. Interestingly, +24 showed no change in chromatin
accessibility in K562 cohesin-mutant cells relative to wild-type K562.
Table 4.1: Conservation and annotations of RUNX1 +24
Enhancer location relative to P1 (kb): +24
Genome coordinates (Hg19): chr21:36399106-36399322
ENCODE annotation: DNase I HS, RNA Pol II, SMC3, CTCF, p300, LSD1, H3K27ac, H3K4me1, H3K4me3, H3K9ac, H3K9me3, GATA1, SPI1, TAL1, GATA2
ChromHMM annotation: Strong Enhancer
Cohesin-mediated chromatin accessibility: No change in accessibility in K562 cohesin mutants
4.4.2 Single nucleotide polymorphism (SNP) analysis
FANTOM5 data showed that +24 eRNA is actively expressed in haematopoietic cells.
No SNPs with a minor allele frequency (MAF) of >1% were detected in this region.
Consequently, no 3D spatial interactions were detected by the CoDeS3D pipeline, as there
were no SNPs to analyse.
4.4.3 Comparative analysis of cancer genomic datasets
The International Cancer Genome Consortium (ICGC) reports 5 mutations in this
region from 5 donors. In the ICGC database, the primary sites of the +24 mutations were evenly
distributed among ovary, prostate, breast, lung and haematopoietic cells (20% each). There is
only one haematopoietic mutation, chr21:g.36399191A>T, reported in a malignant lymphoma. The
FANTOM5 expression of +24 in haematopoietic cells overlaps with the primary ICGC tissues
that contained +24 mutations. The low number of mutations and lack of SNPs may suggest that
this area is important for haematopoiesis and that a mutation may prevent normal development.
4.5 Characterising Runx1 +24 G-quadruplex structure
4.5.1 CD spectroscopy
G-quadruplexes (G4s) are generally classified according to strand orientation, loop
location and the number of strands involved in formation. To verify whether the predicted G4s
can form in vitro, preliminary CD spectrometer readings were taken.
The predicted G4 sequence, GGTGGGGGTGGG, within the Runx1 +24 sequence is
made up of two G-quartets (Appendix IV). A mutated G4, GGTGTGTGTGGG, was generated as
a negative control, with two G-to-T substitutions so that only one quartet can form. One quartet
should be insufficient for this structure to function as a G-quadruplex (Figure 4.3).
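The difference between the two sequences can be illustrated with a short Python sketch. This is not the prediction tool used in this work. It assumes a common heuristic motif for a two-quartet G4 (four runs of at least two guanines separated by loops of 1 to 7 nucleotides); the loop limits, the pattern and the function names are illustrative assumptions.

```python
import re

# Heuristic motif (an assumption, not the tool used in the thesis):
# four runs of >=2 guanines separated by loops of 1-7 nucleotides.
G4_TWO_QUARTET = re.compile(r"(?:G{2,}[ACGT]{1,7}){3}G{2,}")

WILD_TYPE = "GGTGGGGGTGGG"  # predicted +24 G4 sequence quoted in the text
MUTANT = "GGTGTGTGTGGG"     # negative-control sequence quoted in the text


def has_g4_motif(sequence: str) -> bool:
    """Return True if the sequence contains the heuristic two-quartet motif."""
    return G4_TWO_QUARTET.search(sequence.upper()) is not None


def substitutions(reference: str, variant: str) -> list:
    """List (1-based position, reference base, variant base) differences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        (i, ref, var)
        for i, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]
```

Under this heuristic the wild-type sequence matches the motif and the mutant does not, and `substitutions(WILD_TYPE, MUTANT)` reports the two G-to-T changes at positions 5 and 7.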
Figure 4.3: Predicted representation of Runx1 +24 mCRE G-quadruplex and mutated +24 mCRE.
A) Predicted Runx1 +24 mCRE G-quadruplex structure with two G-quartets (Grey rectangles). B) Predicted mutated Runx1 +24 mCRE structure. Blue Ts show the mutated base pairs so only one quartet can form.
CD experiments were carried out in different buffers: 100 mM KCl pH 7 (positive control
buffer); 100 mM NaCl; and 100 mM LiCl (negative control buffer). G4s are expected
to form in the KCl buffer because it provides the K+ cation required to stabilise the
quartet. The cations provided by NaCl and LiCl stabilise the quartet less well, so the G4 should
be less prominent; at 95 °C, NaCl and LiCl should not support formation of the G4 structure.
Preliminary scans showed that, at 25 °C, the predicted +24 mCRE G-quadruplex
formed a parallel-stranded G4 (Figure 4.4A-C and Figure 4.5A). The mutated
G-quadruplex sequence was unable to form this secondary structure, because the substitutions
prevent the formation of parallel quartets (Figure 4.5B). Increasing the temperature to 95 °C
significantly decreased structural stability in all buffers except for KCl (Figure 4.4D-F).
Figure 4.4: CD spectroscopy of wild-type and mutant +24 G4s.
Molar ellipticity (deg·cm²·dmol⁻¹) is on the vertical axis and wavelength (nm) is on the horizontal axis. The blue line represents the CD spectra for the wild-type sequence and the orange line represents CD spectra for the mutant sequence. CD spectra of +24 G4 and mutant at A) 25 °C in the presence of 100 mM KCl. B) 25 °C in the presence of 100 mM NaCl. C) 25 °C in the presence of 100 mM LiCl. D) 95 °C in the presence of 100 mM KCl. E) 95 °C in the presence of 100 mM NaCl. F) 95 °C in the presence of 100 mM LiCl.
These preliminary scans indicate that a parallel G4 forms in vitro for the native
sequence. Based on the CD spectra, the +24 G4 presents a positive peak at 260 nm and a
negative peak at 240 nm. This specific spectroscopic signature is representative of a parallel
G4, which suggests the strands are orientated in the same direction (Gattuso et al., 2016).
Figure 4.5 shows a schematic representation of the normal and mutated G4.
Figure 4.5: Schematic representation of the predicted Runx1 +24 mCRE G-quadruplex in parallel
conformation and when base substitutions are made at the +24 mCRE.
A) Predicted Runx1 +24 mCRE G-quadruplex structure in parallel formation with two G-quartets, based on CD spectrometer results (grey rectangles). Red arrows indicate 5’-3’ DNA arrangement. B) Predicted mutated Runx1 +24 mCRE structure. Blue Ts show the mutated base pairs, preventing the formation of parallel quartets.
4.5.2 Runx1 +24 G-quadruplex Luciferase reporter assay
To determine whether an intact G-quadruplex is required for the
enhancer activity of the Runx1 +24 enhancer, luciferase reporter assays with mutated and
normal G-quadruplexes were used. The relative ability of constructs to drive expression of the
firefly (Photinus pyralis) luciferase reporter gene Luc2 was determined. Luc2 was previously
cloned into the eukaryotic reporter gene vector pGL4.23-GW (Figure 4.6) (Pasquali et al.,
2014). A functional enhancer cloned into the Gateway site of pGL4.23-GW will increase
expression of Luc2 when transfected into a responsive cell line, which can be measured with
a luminometer.
Figure 4.6: Diagram of Luciferase reporter construct pGL4.23-GW.
This 4283 bp vector contains a Gateway cassette (yellow) immediately upstream of the minimal promoter (blue). The minimal promoter (minP), a TATA-box promoter element, drives low basal expression, which allows for sensitive responses to promoterless regions of interest. The minP sits immediately upstream of the Luciferase reporter gene Luc2 (green) (Paguio et al., 2005). Figure from Thomas et al., (2015).
Runx1 +24 and modified +24 were cloned into the pGL4.23-GW (pGL4) vector (Figure
4.7). The empty pGL4 plasmid was included to monitor background luciferase activity of the
reporter construct and acted as a negative control.
Runx1 +24 pGL4 has previously shown enhancer activity (Thomas, 2015). If the
G-quadruplex influences enhancer activity, the G-to-T mutation (g->t) in Runx1
+24 pGL4 would be expected to alter luciferase activity (Figure 4.7B). This g->t mutation is the
same as the G4 mutant analysed by CD, in which the G4 was unable to form. A control mutant,
Runx1 +24 pGL4 t->a (Figure 4.7C), which alters base pairs in the loop region without affecting
the overall G4 structure, was also generated.
I generated Runx1 +24 pGL4 g->t and t->a G4 mutants using site-directed mutagenesis.
This allowed for targeted base pairs to be altered efficiently. Mutations were confirmed by
Sanger sequencing.
Figure 4.7: Schematic representation of pGL4 Runx1 +24 mCRE G-quadruplex in parallel conformation
and mutated +24 mCREs.
A) Predicted Runx1 +24 mCRE G-quadruplex structure in parallel formation with two G-quartets (grey rectangles). B) Mutated g->t Runx1 +24 mCRE structure. Blue Ts show the mutated base pairs and no quartets form in parallel formation. C) Mutated t->a Runx1 +24 mCRE structure. Blue As show the mutated base pairs and two quartets still form in parallel formation.
A Renilla luciferase reporter construct was co-transfected with the Runx1 +24 or +24
modified pGL4 reporter constructs. Firefly and Renilla luciferases have distinct evolutionary
origins, dissimilar enzyme structures and substrate requirements. These features allow these
two distinct luciferases to be used as dual reporters when co-transfected in a single well (Sherf
et al., 1996). Renilla was used as an internal control, to which the measurement of the pGL4
construct variants could be normalised.
Luminescence was measured 48 h after transfection. To calculate the
relative luciferase units (RLU) of each reporter construct, the pGL4 luciferase value was
normalised to the Renilla value to account for differences in transfection efficiency
between samples. Results were expressed as the Luciferase/Renilla ratio of each vector
carrying a putative enhancer relative to the ratio of the empty pGL4.23 vector (control).
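The normalisation described above can be sketched in Python as follows. This is an illustration of the arithmetic only, not the analysis that was run (the data were analysed in GraphPad Prism); the function names and the example readings in the usage note are hypothetical.

```python
from statistics import mean


def luciferase_ratio(firefly: float, renilla: float) -> float:
    """Firefly (pGL4) signal normalised to the co-transfected Renilla signal."""
    if renilla <= 0:
        raise ValueError("Renilla reading must be positive")
    return firefly / renilla


def relative_activity(construct_wells, empty_vector_wells) -> float:
    """Mean Luciferase/Renilla ratio of a construct relative to empty vector.

    Each argument is a list of (firefly, renilla) readings, one pair per well.
    """
    construct = mean(luciferase_ratio(f, r) for f, r in construct_wells)
    control = mean(luciferase_ratio(f, r) for f, r in empty_vector_wells)
    return construct / control
```

For example, with made-up readings, `relative_activity([(900.0, 60.0), (600.0, 40.0)], [(100.0, 50.0), (120.0, 60.0)])` gives a 7.5-fold activity over the empty vector, because both enhancer wells have a ratio of 15 and both control wells a ratio of 2.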
In my previous Master’s project, Runx1 +24 significantly increased reporter activity in
NIH/3T3 (mouse fibroblast) cells, which indicates that +24 operates as an enhancer in this cell
line. NIH/3T3 cells express Runx1 P1, Runx1 P2 and total Runx1 at similar levels to the reference
genes (Thomas, 2015).
NIH/3T3 cells were co-transfected with the normal and mutated +24 pGL4
constructs (Figure 4.7) and Renilla, then assayed for their relative luciferase activity. Six
individual replicates were performed for each construct, and three biological repeats were
carried out (Figure 4.8).
Figure 4.8: Luciferase activity in response to the various +24 mCREs in the NIH/3T3 cell line.
The data show enhancer activity of normal and mutated +24 pGL4 constructs compared to the empty pGL4.23-GW plasmid in NIH/3T3 cells; p value <0.05 (*), <0.001 (****). The mutated G-quadruplex g->t had significantly less enhancer activity than the mutated t->a +24. However, there was no significant difference in enhancer activity between the unaltered Runx1 +24 and either mutated +24. The data were analysed by ANOVA using GraphPad Prism 6 and are represented as mean +/- standard error of the mean (S.E.M.) of at least three experiments (n=3). The pGL4 luciferase value was normalised to the Renilla value and expressed as relative luciferase units (RLU).
In the NIH/3T3 cell line, the Runx1 +24 mCRE significantly enhanced luciferase
activity (p <0.001) (Figure 4.8). The mutated Runx1 +24 G-quadruplexes
showed similar enhancement profiles to the normal Runx1 +24 (p <0.001). The
non-G4-forming mutant (g->t) showed significantly reduced activity when compared to the
control G4 mutant (t->a) (p <0.05). There was no significant difference between the unaltered
Runx1 +24 and either mutated Runx1 +24. Runx1 +24 g->t exhibited a minor decrease in enhancer
activity, whereas Runx1 +24 t->a showed a slight increase in enhancer activity when
compared to the unaltered Runx1 +24. This suggests that alterations to the loop regions of the
G4 could alter enhancer capability. However, the altered enhancement shown in the mutants is
modest, and was not statistically significant when compared to the normal Runx1 +24 G4.
G-quadruplexes are also capable of producing their own RNA, which has functional
roles (Beltran et al., 2019; Wang et al., 2017). This was not investigated in this research project.
4.6 Circular Chromatin Conformation Capture (4C) of Human RUNX1 +24
Circular chromatin conformation capture (4C) was used to elucidate the connectivity
of the RUNX1 +24 enhancer region. The 4C sequencing libraries were prepared from 4C
material, each containing PCR products amplified by bait-specific primer pairs. The processing
of the sequencing data is available in Appendix VIII.
The guideline for the number of mapped reads required for successful calling of 4C
data is 1 million, but interactions can be identified with fewer than 300,000 reads
(van de Werken et al., 2012a). A recent study has shown that 50,000–100,000 cis-mapped 4C
reads are sufficient to generate reproducible 4C profiles (Krijger et al., 2020). The number of
mapped reads for the RUNX1 +24 bait in each K562 and STAG2-mutant library is shown
in Figure 4.9.
WT 12 PMA replicate 2 and WT 72 PMA replicate 2 had very few mapped reads (1172
and 58721, respectively); however, the remaining libraries each had over 370,000 reads, which is
sufficient for successful interaction identification. Controls were cells that went through the 4C
process without the ligation steps.
Figure 4.9: RUNX1 +24 Mapped read count.
The number of mapped reads (millions) is shown for the individual conditions in replicate 1, 2, 3 or control.
4.6.1 Quality check of 4C data
Mapped reads were assigned to restriction fragments resulting in a read count per
fragment. To determine whether my RUNX1 +24 4C data is of good quality and comparable to
other published 4C studies, I assessed the distribution of reads across fragments and correlation
of read counts between replicates.
Quality is measured by the proportion of reads on the same chromosome as the bait
(cis) and the proportion of reads around the bait viewpoint (near-bait: 1 megabase either side of
the bait) (van de Werken et al., 2012a). The majority of the reads should be in close proximity
to the bait (van de Werken et al., 2012a). A high
number of reads on different chromosomes (trans) would indicate that the data originated from
random ligation events and is therefore of low quality. As a general guideline, good-quality
4C data should have approximately 40% of reads in cis and at least 40% coverage in
the near-bait (Raviram et al., 2016; van de Werken et al., 2012a). For each condition, the
coverage of reads in cis and near-bait was calculated for all 3 replicates (Figure 4.10). The
majority of libraries had over 40% of reads in cis and over 40% of fragments covered in the
near-bait. STAG2 72 PMA replicate 1 had more than 40% coverage of fragments in the
near-bait, but its percentage of reads in cis was lower than 40% (31%). A low proportion of reads
in cis can indicate that the fixing conditions of the chromatin were not optimal, as more trans
interactions are detected, or that the bait forms a large number of trans interactions for
structural or regulatory purposes. Dr. Marsman previously allowed the +24 bait to be analysed in
HPC7 cells with lower than 40% coverage in cis. The same parameter was extended to this analysis.
WT 12 PMA replicate 2 and WT 72 PMA replicate 2 fell below 40% in the near-bait
(14% and 39%, respectively). This, combined with their low read counts, meant they were excluded
from further analysis.
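The two quality metrics can be sketched in Python as below. This is a simplified illustration and not the 4Cker or van de Werken implementation: the `Fragment` type, the function names, the rule that a fragment counts as covered when it has at least one read, and the default near-bait window are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    chrom: str
    position: int  # fragment coordinate in bp
    reads: int     # mapped reads assigned to this fragment


def qc_metrics(fragments, bait_chrom, bait_pos, near_bait_bp=1_000_000):
    """Return (% of reads in cis, % of near-bait fragments with >=1 read)."""
    total_reads = sum(f.reads for f in fragments)
    if total_reads == 0:
        raise ValueError("library has no mapped reads")
    cis = [f for f in fragments if f.chrom == bait_chrom]
    cis_pct = 100.0 * sum(f.reads for f in cis) / total_reads
    near = [f for f in cis if abs(f.position - bait_pos) <= near_bait_bp]
    if not near:
        return cis_pct, 0.0
    covered = sum(1 for f in near if f.reads > 0)
    return cis_pct, 100.0 * covered / len(near)


def passes_guideline(cis_pct, coverage_pct, cis_min=40.0, coverage_min=40.0):
    """Apply the 40% / 40% guideline quoted in the text."""
    return cis_pct >= cis_min and coverage_pct >= coverage_min
```

The thresholds are arguments rather than constants because, as described above, a library with fewer than 40% of reads in cis was still retained in this analysis.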
Figure 4.10: RUNX1 +24 4C quality evaluation.
The coverage of fragments that had more than one read present within 0.1 megabases (Mb) on either side of the bait was plotted against the percentage of reads in cis over total. 4C data of WT 12 DMSO (light blue), WT 12 PMA (blue), WT 72 PMA (dark blue), STAG2 12 DMSO (yellow), STAG2 12 PMA (orange), STAG2 72 PMA (red) is plotted. Three points, corresponding to replicate 1, 2 and 3 are presented for each condition. Grey dotted lines indicate cut-off point for good quality data.
All samples that passed quality control were analysed using 4Cker, a bioinformatics
pipeline developed to analyse 4C and determine significant interactions (Raviram et al., 2016).
4.6.2 Detection of significant interactions
The frequency of interactions between a bait and the rest of the genome depends on proximity
to the bait: the closest interactions (near-bait) occur most frequently, followed by cis
interactions, with trans interactions occurring least frequently. 4Cker
provides different analyses for near-bait, cis and trans interactions.
Near-bait analysis was carried out for 1 Mb either side of the +24 bait (within the
RUNX1 domain). Cis analysis was carried out across chromosome 21, excluding the near-bait
region. Trans analysis was carried out across all chromosomes except 21.
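The three analysis classes can be expressed as a simple rule, sketched here in Python. The function name is hypothetical, and the default bait position is an assumption: it uses the start of the +24 coordinates listed in Table 4.1 as a stand-in for the 4C bait coordinate, which is not given in this section.

```python
def interaction_class(chrom, position, bait_chrom="chr21",
                      bait_pos=36_399_106, near_bait_bp=1_000_000):
    """Classify a fragment as 'near-bait', 'cis' or 'trans'.

    bait_pos is an assumed stand-in for the bait coordinate; near_bait_bp
    is the 1 Mb either side of the bait described in the text.
    """
    if chrom != bait_chrom:
        return "trans"
    if abs(position - bait_pos) <= near_bait_bp:
        return "near-bait"
    return "cis"
```

With these defaults, a fragment at chr21:36,900,000 is classed as near-bait, one at chr21:45,000,000 as cis, and any fragment on another chromosome as trans.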
Statistically significant interactions were detected using the 4Cker package (Raviram
et al., 2016) with an adaptive window size of 3 fragments. Because the 4C signal is generally
higher around the bait region and decreases in far-cis and trans, different adaptive window
sizes were used for read normalisation and for creating a smooth spline in each data set.
Adaptive windows were used to obtain a comparable number of observed fragments in each window.
Window size is determined by the amount of signal detected in the region.
Fragments in close proximity to a 4C bait have higher read counts than
fragments further away, so if near-bait interactions were examined with the cis 4C analysis
they would appear statistically significantly enriched. For this reason, cis interactions and
near-bait interactions were analysed separately, with the near-bait analysis requiring a higher
p-value to be determined statistically significant by the 4Cker programme than the cis
analysis (Raviram et al., 2016). Significant interactions were visualised in the UCSC
browser.
To reduce errors due to mis-priming, fragments that had one or more reads in the control
samples were removed from the matching cell type and condition. Additionally, fragments
located within 10 kb either side of RUNX1 +24 were removed. Fragments within 10 kb of the bait
are not informative because bait-proximal fragments have a high read count regardless of whether
a true interaction is present. 4Cker combined the processed read counts per
fragment by normalising to the total number of reads in each replicate (Raviram et al., 2016).
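The filtering and normalisation steps can be sketched in Python as follows. This is a simplified illustration rather than 4Cker's own code: the function names are hypothetical, fragments are keyed by position on the bait chromosome only, and scaling to reads per million of the remaining reads is an assumed convention.

```python
def filter_fragments(sample, control, bait_pos, exclude_bp=10_000):
    """Remove fragments seen in the no-ligation control or near the bait.

    sample and control map fragment position (bp) to read count.
    """
    return {
        pos: reads
        for pos, reads in sample.items()
        if control.get(pos, 0) == 0 and abs(pos - bait_pos) > exclude_bp
    }


def normalise(counts, scale=1_000_000):
    """Scale read counts to reads per `scale` reads in the replicate."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads left after filtering")
    return {pos: reads * scale / total for pos, reads in counts.items()}
```

Normalising each replicate to its own total makes libraries of very different depth comparable before they are combined.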
Differential analysis was carried out between pairs of conditions on the
windows that interact with the bait in at least 2 conditions. 4Cker has optimised this
analysis for near-bait and cis interactions. Significantly different domains were determined
using a p-value threshold of 0.05.
This Chapter will interpret the RUNX1 +24 connections in normal K562 cells (WT 12
DMSO). Analysis of the other conditions is described in further detail in Chapter 5.
4.7 Identification and Genomic Characteristics of Human RUNX1 +24 3D Connections
4.7.1 Visualisation of significant cis interactions
The mean normalised read count of all three WT 12 h DMSO replicates was plotted
across chromosome 21 using ggplot (Wickham et al., 2007) to generate the cis profile
of RUNX1 +24 connections in baseline K562 cells (Figure 4.11). Read coverage was high
around +24 (the 5 megabases (Mb) surrounding the bait). Figure 4.11 reveals that RUNX1 +24
makes DNA-DNA contacts across the entire chromosome 21. The large normalised read peak
around position 45,000,000 (*) reflects an interaction with PDE9A and BC033260.
Phosphodiesterase 9A (PDE9A) mRNA is detected throughout the body, but predominantly in
the brain, whereas BC033260 is an uncharacterised long non-coding RNA (Harms et al., 2019;
Patel et al., 2018).
Figure 4.11: RUNX1 +24 cis 4C profile (chr21) in K562 cells (WT 12 DMSO).
Ggplot of the cis 4C profile. Normalised read count of all WT 12 DMSO replicates is on the vertical axis and position on chromosome 21 is on the horizontal axis. * shows the large normalised read peak around 45,000,000 with PDE9A and BC033260.
WT 12 DMSO showed 115 significant cis interactions across the long arm of
chromosome 21. Once those overlapping the near-bait region are removed, 31 significant
cis interactions remain (Appendix IX) (Figure 4.12). These interactions extended across the
known TADs and ranged from chr21:16193679 to chr21:45164775. They were further
categorised based on their chromatin characteristics using K562 ChromHMM and dbSUPER
enhancer annotations.
Figure 4.12: RUNX1 +24 significant cis interactions.
Chromosome 21 with RUNX1 +24 interactions represented by black arches. Each interaction is colour-coded according to its ChromHMM profile; the key is at the bottom left of the figure. Super-enhancers were determined from dbSUPER. The chromosome is aligned to the K562 Hi-C contact map. Yellow and blue bands show the predicted K562 TADs.
The majority of the interactions were with ChromHMM-annotated regulatory elements
(enhancers and insulators) (Figure 4.13). Some regions that interact with RUNX1 +24 carry
multiple ChromHMM annotations; for example, chr21:16193679-16290033
contained both insulator and strong enhancer regions. This region was also identified as
containing super-enhancers, as determined by dbSUPER (Khan et al., 2016). The largest
proportion of RUNX1 +24 interactions were with enhancer regions (58%), followed by
interactions with active promoters and insulators (both 32%). RUNX1 +24 also connects with some
super-enhancers, so it may be facilitating or restricting their 3D connections. Interestingly, the
+24 enhancer also interacted with regions of polycomb-repressed chromatin (29%). This indicates
that +24 is not just acting as an enhancer hub (as previously described in Section 3.7); it may
also be holding regions of chromosome 21 to ensure that the chromatin is correctly organised
into inactive and active domains.
Figure 4.13: Profile of RUNX1 +24 significant cis interactions.
Percentage of interactions is on the vertical axis. Horizontal axis shows the ChromHMM annotations. Individual interactions can have multiple annotations.
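Because one interaction can carry several ChromHMM annotations, the percentages in Figure 4.13 need not sum to 100%. A minimal Python sketch of this kind of multi-label tally is given below; the function name is hypothetical and the input used in the usage note is a toy example, not the thesis data.

```python
from collections import Counter


def annotation_percentages(interactions):
    """Percentage of interactions carrying each annotation label.

    interactions is a list of sets of labels. An interaction with several
    labels is counted once per label, so the totals can exceed 100%.
    """
    if not interactions:
        return {}
    counts = Counter(label for labels in interactions for label in set(labels))
    return {label: 100.0 * n / len(interactions) for label, n in counts.items()}
```

For the toy input `[{"enhancer", "insulator"}, {"enhancer"}, {"promoter"}, {"polycomb", "enhancer"}]`, enhancers account for 75% of interactions and each other label for 25%, giving a combined total of 150%.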
4.7.2 Overview of significant near-bait interactions
Near-bait analysis considers interactions within 1 Mb either side of the bait (Raviram et al.,
2016). The mean normalised read count of all three WT 12 h DMSO replicates was plotted
across the chromosome 21 near-bait region using ggplot (Wickham et al., 2007) (Figure 4.14).
Domains are genomic regions in which chromatin interactions occur at a high frequency within
the region and at a lower frequency with outside regions. Domain boundaries can be predicted by
a decrease in read count between two neighbouring regions. The TAD boundaries have already been
reported for K562 cells and were determined by Hi-C (Rao et al., 2014). In most human cell lines,
the RUNX1 P1 and P2 promoters form TAD boundaries (Marsman, 2016).
When the near-bait normalised read counts are visualised, the majority of reads are located
within the 5 Mb surrounding the bait (Figure 4.14). Figure 4.14 shows the near-bait genomic
interactions in which RUNX1 +24 is involved.
Figure 4.14: RUNX1 +24 near-bait 4C profile.
Ggplot of the near-bait 4C profile. Normalised read count is on the vertical axis and position on chromosome 21 is on the horizontal axis. K562 domain boundaries, obtained from Rao et al., (2014), are shown as grey dotted lines. Black arrows represent the RUNX1 promoters. RUNX1 is transcribed telomere to centromere, as indicated by the direction of the black arrows. * shows the large normalised read peak at the human equivalent region of RUNX1 m -368.
When the K562 TAD boundaries are overlaid on the normalised WT 12
DMSO +24 reads, RUNX1 +24 connections are seen to be primarily contained within the TAD.
However, the interactions are not constrained by the upstream boundary, as the read counts
of the two neighbouring regions are similar.
Interestingly, the large normalised read peak (*) between 36,500,000 and 37,000,000
reflects an interaction with the human equivalent region of RUNX1 m -368.
WT 12 DMSO showed 143 significant near-bait interactions (Appendix IX) (Figure
4.16). The majority of these interactions were upstream of RUNX1 P1. Similar to the cis
connections, most interactions connect to regions within genes or in intronic regions, implying
that RUNX1 +24 connects with regulatory regions.
Genes within domains are potentially co-expressed and co-regulated with RUNX1. Dr.
Marsman defined the genes at the domain boundaries of K562 cells as CBR1, CLIC6, KCNE2
and SMIM11 (Marsman, 2016). These genes were determined to be involved in cellular
homeostasis and are likely to have uniform expression in most cell types. Dixon et al., (2012)
hypothesised that genes with consistent transcription are frequently present at domain
boundaries and that their high expression may contribute to boundary formation.
Figure 4.15: RUNX1 +24 significant near-bait interactions (chr21:34999764-37572609).
Chromosome 21, with the positional zoom shown by the black box. RUNX1 is transcribed telomere to centromere, as indicated by the direction of the black arrows at the RUNX1 promoters. Within the near-bait black box, RUNX1 +24 interactions are represented by black arches. RUNX1 +24 is represented by the yellow line. RUNX1 promoters are represented by blue lines and interactions with these promoters are shown in blue. RUNX1 regulatory regions are represented by orange lines and interactions with these regions are shown in orange. The chromosome is aligned to the UCSC K562 gene list and annotations. Dotted grey lines show the Rao et al., (2014) K562 domain boundaries.
4.7.3 Significant interactions with known RUNX1 regulators
As demonstrated in the previous chapter, RUNX1 has multiple regulatory
regions that lie in the intronic and intergenic regions of RUNX1. I investigated whether RUNX1
+24 interacted with any of the characterised RUNX1 regulatory regions in K562 cells. Based
on the steady increase of RUNX1 P1 transcription after PMA stimulation (megakaryocyte
differentiation) in K562 cells (Antony et al., 2020), it was predicted that RUNX1 +24 would
connect to multiple enhancers, as well as to P1. Curiously, the only RUNX1
promoter that +24 connected to in WT 12 DMSO was P2. RUNX1 +24 interacts with some of
the characterised human RUNX1 regulators examined in the previous chapter: m-368, m-101,
m-58, +62 and m+110. RUNX1 +24 also interacted with the gene encoding the lncRNA RUNXOR.
Of note, many identified RUNX1 regulatory regions were not connected
to RUNX1 +24, which differs from the Runx1 +24 connections seen in HPC7 cells. This
further supports the notion that different RUNX1 enhancers are used depending on cell type. It
is important to note that this 4C only captures interactions from the +24 viewpoint. Other
enhancers may play important roles in RUNX1 regulation and connect directly with the RUNX1
promoters, without a RUNX1 +24 interaction.
4.7.4 Significant interactions with other genes
To investigate which genes may be co-regulated with RUNX1, interactions occurring
with other genes were identified. Across chromosome 21, RUNX1 +24 interacted with 63 genes
(Table 4.2). Gene interaction analysis identifies a connection that overlaps the
genomic region of a gene, and does not necessarily reflect an interaction with a promoter or
exon of that gene. As seen in the ChromHMM analysis, the majority of interactions were with
regulatory regions (enhancers and insulators) across the chromosome (Figure 4.13).
The largest category of genes that +24 interacts with is protein-coding genes (43%). However,
the other 57% of connections are to pseudogenes and non-coding RNAs. A large proportion of
these RNAs are uncharacterised.
Non-coding RNAs have been shown to be involved in many biological processes,
including regulating transcription, cell-fate programming and maintaining genome 3D
organisation (Flynn et al., 2014; Geisler et al., 2013; Holoch et al., 2015; Li et al., 2019;
Nozawa et al., 2019; Sun et al., 2018). In particular, long non-coding RNAs have been shown
to activate or suppress transcriptional activity by interacting with chromatin modulating
proteins (Fan et al., 2015; Han et al., 2014; Jain et al., 2016; Prensner et al., 2013; Scarola et
al., 2015). In WT 12 DMSO, the connections to these RNAs may be important in maintaining
the chromatin states, to ensure correct gene regulation and overall maintenance of this cell type.
Table 4.2: Type and name of genes that interact with RUNX1 +24
Type: Name
Protein-coding genes: AP000322.53, BACE2, BRWD1, CBR1, CLIC6, DSCR3, DSCR4, ERG, GRIK1, HLCS, HMGN1, IFNAR2, IL10RB, LCA5L, PDE9A, PDXK, PKNOX1, PRDM15, PSMG1, RCAN1, RIPK4, SCAF4, SETD4, SMIM11A, SMIM11B, USP25, WRB
Pseudogene: GAPDHP16, METTL21AP1, POLR2CP1, RAD23BLP, RBMX2P1, RBPMSLP, RNF6P1, RPL34P3, TIMM9P2
Antisense RNA: AL773572.7, AP000688.11, AP001412.1, AP001626.2, AP001630.5, BRWD1-AS1, BRWD1-AS2, GRIK1-AS1, IL10RB-AS1, PLAC4
Long intergenic non-coding RNAs (LincRNA): AF127936.3, AF127936.5, AF127936.7, AF015720.3, AP000318.2, AP000255.6, LINC00159, LINC01671, LINC01426, LOC100506403
Micro RNA (miRNA): MIR3197, MIR5692B
Miscellaneous RNA (miscRNA): Y_RNA
Sense intronic RNA: AP000688.14
Sense overlapping RNA: BRWD1-IT1, RUNXOR
Small nuclear RNA (snRNA): RNU6-992P
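The proportions quoted for Table 4.2 can be checked with a few lines of Python. The counts below were obtained by counting the entries in each row of the table; the dictionary and function names are illustrative.

```python
# Number of entries in each row of Table 4.2 (counted from the table).
GENE_TYPE_COUNTS = {
    "protein-coding": 27,
    "pseudogene": 9,
    "antisense RNA": 10,
    "lincRNA": 10,
    "miRNA": 2,
    "miscRNA": 1,
    "sense intronic RNA": 1,
    "sense overlapping RNA": 2,
    "snRNA": 1,
}


def rounded_percentages(counts):
    """Return each category as a whole-number percentage of the total."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}
```

The rows sum to 63 genes, and the 27 protein-coding genes correspond to 43% of that total, which leaves 57% for the remaining categories.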
When the expression of the genes in this list was compared with the RNA-seq
expression analysis of K562 cells by Antony et al., (2020), there was no consistent pattern
(expressed or not expressed) among the connected genes in WT. Notably, this supports the theory
that RUNX1 +24 is helping to organise different domains across the chromosome into active and
inactive domains.
Gene ontology analysis of genes that interact with RUNX1 +24 was not appropriate, as
these genes were not consistently on or off. However, there was an exciting connection to ERG
Promoter 3. ERG is crucial to haematopoietic cell-specific gene regulation and is inactive
in WT 12 DMSO. ERG is a transcription factor involved in the differentiation and maturation
of megakaryocyte cells. ERG dysregulation is associated with several haematopoietic tumour
types, including AML (Reddy et al., 1987; Shimizu et al., 1993). Dr. Marsman also found that
Runx1 +24 interacted with ERG in HPC7 cells (Marsman, 2016) (Table 4.3).
Altered +24 activity or connections may therefore contribute to haematopoietic
dysregulation.
Connections between +24 and the genes Brwd1, Clic6, Dscr3, Erg, Hlcs and Hmgn1 are
seen in both HPC7 and K562 WT 12 DMSO cells (Table 4.3). Unlike in HPC7 cells, +24 does
not interact with TIAM1 in WT 12 DMSO cells (Table 4.3). TIAM1 is involved
in cell adhesion and migration and is dysregulated in haematopoietic cancers (Habets et al.,
1995; Ives et al., 1998).
Table 4.3: Name of genes that interact with Runx1 +24 in HPC7s
Gene name
Interactions observed with both HPC7 Runx1 +24 4C and K562 RUNX1 +24 4C
Brwd1, Clic6, Dscr3, Erg, Hlcs, Hmgn1
Interactions only observed in HPC7 Runx1 +24 4C
Atp5o, B3galt5, Cbr3, Chaf1b, Cldn14, Cryzl1, Dopey2, Gm10785, Itsn1, Kcnj6, Kcnj15, Morc3, Mrps6, Slc5a3, Son, Tiam1, Tmem50b, Ttc3, 1600002D24Rik
4.7.5 Overall significant RUNX1 +24 cis and near-bait interactions
The cis connections showed significant interactions with ERG P3 and with super-enhancer
regulatory regions between the SAMSN1 and NRIP1 genes (Figure 4.17). In WT 12 DMSO,
RUNX1 +24 connects with the known enhancers m-368, m-101, m-58, m+62 and m+110. It also
interacts with the known RUNX1 regulators RUNXOR (a long non-coding RNA) and RUNX1 P2.
Within the near-bait connections, RUNX1 +24 connected with RUNX1 Exon 3. RUNX1 +24 also
connected to three previously undescribed regions: +89, -96, and a regulatory region near
SLC5A3 Exon 2 and an MRPS6 intron (Figure 4.16).
Figure 4.16: RUNX1 +24 chromatin hub.
Yellow boxes represent UCSC annotated gene promoters and/or exons. Blue boxes show described RUNX1 regulators. Orange boxes annotate novel putative regulators.
The novel connections were further analysed using the same in silico approach (Chapter
3) to describe other regulatory regions. These annotations are described in Table 4.4.
ERG P3 is described by ChromHMM as an insulator and transcription-associated
region. Transcription-associated refers to transcriptional transition or elongation (Ernst et al.,
2010; Ernst et al., 2011). This region has both cohesin and CTCF binding as well as
haematopoietic transcription factor binding (SPI1, TAL1, GATA2). This region shows no
change in chromatin accessibility in cohesin mutant K562 cells.
The novel RUNX1 regulatory regions -96 and +89 are both annotated as weak enhancers
with transcription association. Intriguingly, neither site appears to be in open chromatin, as
neither overlaps a DNase I hypersensitive (HS) site. +89 binds RNA Pol II, whereas -96 does
not. These regions show no change in chromatin accessibility in K562 cohesin mutant cells.
The novel regulatory region near SLC5A3 Exon 2 and an MRPS6 intron was annotated
by ChromHMM as containing both strong and weak enhancer regions. These regions showed
transcription factor binding sites and chromatin modifications comparable to those of the known
RUNX1 enhancers m-371, m-58 and m+110. This region also had increased accessibility in the
cohesin mutant K562 cells.
The cis connection and super-enhancer regulatory region between SAMSN1 and NRIP1
was annotated as containing strong enhancers, weak enhancers and insulators. This region has
both cohesin and CTCF binding as well as haematopoietic transcription factor binding of SPI1
and TAL1. This region showed decreased chromatin accessibility in cohesin mutant K562
cells. This region also contained uncharacterised super-enhancers (Hnisz et al., 2013; Khan et
al., 2016).
Table 4.4: Annotations of identified putative regulatory region connections
Feature/Regulatory region location relative to RUNX1 P1 (kb) | Genome coordinates (Hg19) | ENCODE annotation in K562 cells | ChromHMM annotation in K562 cells | Cohesin mediated chromatin accessibility
ERG P3 | chr21:39852633-39906157 | DNase I HS, RNA Pol II, RAD21, SMC3, CTCF, p300, LSD1, H3K4me1, H3K27me3, SPI1, TAL1, GATA2 | Insulator, Transcription Associated | No change in accessibility in K562 cohesin mutants
-96 | chr21:36517085-36518628 | p300, LSD1, H3K27ac, H3K4me1, H3K4me3, H3K9me3, TAL1 | Weak Enhancer, Transcription Associated | No change in accessibility in K562 cohesin mutants
+89 | chr21:36332447-36333601 | RNA Pol II, CTCF, p300, LSD1, H3K4me1, H3K4me3, H3K9ac, H3K9me3 | Weak Enhancer, Transcription Associated | No change in accessibility in K562 cohesin mutants
SLC5A3 Exon 2 MRPS6 intron | chr21:35474191-35488395 | DNase I HS, RNA Pol II, CTCF, p300, LSD1, H3K27ac, H3K4me1, H3K4me3, H3K9ac, H3K9me3, GATA1, TAL1, GATA2 | Strong Enhancer, Weak Enhancer | Increased accessibility in K562 cohesin mutants
Region between SAMSN1 and NRIP1 (cis interaction) | chr21:16193679-16290033 | DNase I HS, RAD21, SMC3, CTCF, p300, H3K27ac, H3K4me1, H3K4me3, H3K9ac, H3K9me3, SPI1, TAL1 | Strong Enhancer, Weak Enhancer, Insulator | Decreased accessibility in K562 cohesin mutants
4.7.6 Trans interactions
The trans interactions of RUNX1 +24 could not be reliably assessed using
4Cker. The 4Cker analysis mapped the trans interactions to broad areas across various
chromosomes (Appendix X). This may be because the software calls a region significant
based on the average number of reads across the input window. As read numbers were low
across most chromosomes, the software pooled reads over long stretches of each
chromosome. As a result, large (10,000 bp+) “interactions” were called that spanned
the different chromosomes. This gave no clear picture of sequencing read peaks, so the
trans interactions could not be interpreted.
This 4C experiment was designed to capture cis and near-bait interactions and therefore
used a 4 bp cutter. Raviram et al., (2016) report that 6 bp cutters are optimal for detecting
trans interactions. The genome is cut more frequently by 4 bp cutters, which increases
resolution by over tenfold compared to 6 bp cutters (Raviram et al., 2016;
Van De Werken et al., 2012b).
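The resolution argument can be illustrated with a back-of-the-envelope calculation. The sketch below assumes an idealised random genome with equal base frequencies, which is a simplification and not a model used in this thesis. The function name is illustrative only.

```python
def expected_site_spacing(site_length: int) -> int:
    """Expected spacing (bp) between recognition sites of a given length,
    assuming equal base frequencies and independent positions."""
    return 4 ** site_length


four_cutter = expected_site_spacing(4)
six_cutter = expected_site_spacing(6)
fold_more_cuts = six_cutter // four_cutter

print(f"4 bp cutter: one site every ~{four_cutter} bp")
print(f"6 bp cutter: one site every ~{six_cutter} bp")
print(f"4 bp cutter cuts ~{fold_more_cuts}x more often")
```

Under this idealised model a 4 bp cutter is expected to cut about 16 times more often than a 6 bp cutter, in line with the "over tenfold" figure. Real genomes deviate from this because of base composition and sequence bias.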
To understand which trans interactions could be detected, I examined the raw read
peaks across the genome after control reads were removed (Appendix X). Based on peaks that
contained reads in two or more replicates, I detected three trans interactions (Table 4.5). All
peaks were in intergenic regions and carried minimal ENCODE annotations.
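The replicate criterion described above can be sketched as a simple filter. This is a minimal illustration, not the procedure used for the thesis: the replicate names and peak identifiers are hypothetical, and peaks are matched by exact identifier, whereas real peaks would need overlap-based matching.

```python
from collections import defaultdict


def reproducible_peaks(peaks_by_replicate, min_replicates=2):
    """Return peak identifiers seen in at least `min_replicates` replicates.

    `peaks_by_replicate` maps a replicate name to the peak identifiers
    (hypothetical genomic windows) that contain reads in that replicate.
    """
    counts = defaultdict(int)
    for peaks in peaks_by_replicate.values():
        for peak in set(peaks):  # count each peak once per replicate
            counts[peak] += 1
    return sorted(p for p, n in counts.items() if n >= min_replicates)


# Hypothetical example data, not taken from the thesis.
example = {
    "rep1": {"chrA:100-200", "chrB:500-600"},
    "rep2": {"chrA:100-200", "chrC:900-1000"},
    "rep3": {"chrB:500-600", "chrA:100-200"},
}
kept = reproducible_peaks(example)
print(kept)
```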
Table 4.5: Annotations of identified trans interactions
Closest annotated feature/gene | Genome coordinates (Hg19) | ENCODE annotation in K562 cells | ChromHMM annotation in K562 cells | Cohesin mediated chromatin accessibility
Region between KCNA2 and KCNA10 | chr1:111,112,479-111,112,575 | p300, H3K27me3 | Polycomb repressed | No change in accessibility in K562 cohesin mutants
NLGN1 intergenic region | chr3:173,646,070-173,646,151 | No binding | Heterochromatin | No change in accessibility in K562 cohesin mutants
Region between DDX58 and ACO1 | chr9:32,452,626-32,452,711 | p300 | Transcription Associated | No change in accessibility in K562 cohesin mutants
In HPC7 cells, Runx1 +24 had one trans interaction, with the overlapping
mitochondrial genes Gu332589 and Ak018753; however, this was not seen in K562 cells.
RUNX1 is often translocated in AML, usually with partners on chromosomes 3 (EVI1), 8
(ETO), 16 (CBFB) and 12 (ETV6) (Sood et al., 2017). The RUNX1 +24 trans interactions do
not overlap with these frequent RUNX1 translocation points. The trans interaction on
chromosome 3 is over 4,782,000 bp away from EVI1.
It should be noted, however, that K562 cells are unlikely to have a completely normal
karyotype and already carry the BCR-ABL translocation. The BCR gene is normally located on
chromosome 22 and the ABL gene on chromosome 9 (McGahon et al.,
1997). Because the trans interactions could not be subjected to the same stringency as the
cis and near-bait interactions, analysis of these interactions was not pursued further.
4.8 Chapter summary
4.8.1 RUNX1 +24 characteristics
RUNX1 +24 showed strong enhancer characteristics and active binding of
haematopoietic transcription factors. The FANTOM5 data showed that there is active
production of the +24 enhancer RNA (eRNA) in haematopoietic cell types.
Several studies have reported functional roles of eRNAs that foster enhancer-promoter
loop formation, promote transcription factor and co-regulator binding and support target
gene transcription (Bose et al., 2017; Hou et al., 2020b; Lai et al., 2013; Lam et al., 2013; Li
et al., 2013a; Melo et al., 2013; Panigrahi et al., 2018; Rahnamoun et al., 2018; Sigova et al.,
2015; Tsai et al., 2018). Enhancer RNA can also assist pause-release of RNA Pol II which
facilitates transcription elongation (Schaukowitch et al., 2014).
No SNPs with a MAF of >1% are reported in the RUNX1 +24 region. This could be
an underestimate, due to the limitations in identifying significant SNPs in haematopoietic
cells reported in the previous chapter (Section 3.4.2). The available AML genomic data
reported a low number (5) of mutations.
CRISPR/Cas9 editing of RUNX1 +24 (by targeting the whole intron) showed reduced
RUNX1 expression in OCI-AML5 cells (Mill et al., 2019). The majority of edited cells were
eradicated via apoptosis, while the viable edited cells had significantly decreased RUNX1
expression and inhibited growth (Mill et al., 2019). This research shows the importance of the
P1-P2 intron, which contains multiple RUNX1 regulatory elements.
The CD analysis showed that the G-quadruplex-forming sequences in the mouse +24
element are capable of forming parallel G-quadruplexes. G-quadruplex RNA is capable of
targeting and removing Polycomb repressive complex 2 from genes, allowing RNA-mediated
regulation (Beltran et al., 2019; Wang et al., 2017).
Luciferase assays showed that mutation of the G-quadruplex or of mouse Runx1 +24 had
no significant effect on enhancer function. Given the overall organisational role of RUNX1 +24
seen in the 4C and the possibility of G-quadruplex RNA, this assay is restricted in its ability
to assess the functional nature of this G-quadruplex, because it does not take
non-transcriptional/structural roles into account. Therefore, luciferase assays may not capture
the functional changes that occur when the +24 G-quadruplex is mutated. Furthermore, the
G-quadruplex analysis was specific to the mouse sequence, so it may not recapitulate the
function of the human sequence. If the human sequence contains a G-quadruplex, CD analysis
should be repeated to determine the type of G-quadruplex, and its functional capacity should
be assessed.
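A first-pass check of whether a sequence contains a candidate G-quadruplex can be sketched with a regular expression. The pattern below is the widely used putative-quadruplex motif (four runs of at least three guanines separated by loops of 1-7 nucleotides). It is an assumption that this motif is a suitable screen here, as the prediction method used in this thesis is not restated in this section. The test sequence is invented and is not the +24 element.

```python
import re

# Four runs of >=3 guanines separated by loops of 1-7 nucleotides.
# Forward strand only; a C-rich match on this strand would indicate
# a candidate on the opposite strand and is not searched for here.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def find_putative_g4(sequence: str):
    """Return (start, end, motif) for each non-overlapping match
    (0-based start, end-exclusive)."""
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


# Hypothetical test sequence, not the Runx1 +24 element.
toy = "ATCGGGTAGGGCTGGGAAGGGTTA"
hits = find_putative_g4(toy)
print(hits)
```

A sequence match only indicates potential to fold. Whether a G-quadruplex actually forms, and with which topology, still has to be determined experimentally, for example by CD.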
It would be interesting to further investigate the possible functional roles of RUNX1
+24 eRNA and +24 G-quadruplex RNA. This could be done using precision run-on sequencing
(PRO-seq) and rapid amplification of cDNA ends (RACE). PRO-seq can define the position,
amount, and orientation of transcriptionally active RNA polymerases in the genome at
single-nucleotide resolution (Hou et al., 2020b). RACE can map eRNAs by identifying the 5′
and 3′ ends of RNA, and RT-qPCR can determine the expression levels of these eRNAs (Hou et
al., 2020b). The eRNAs could also be knocked down using small interfering RNA (siRNA) or
mutated using CRISPR/Cas9 (Hou et al., 2020b).
If the G-quadruplex sequence is transcribed into RNA, it would be fascinating to
investigate its targets or its ability to bind to Runx1 protein. Fukunaga et al., (2020) showed
that a G-quadruplex RNA can bind to ETO and dissociate the leukaemic RUNX1-ETO
fusion protein from DNA. This suggests that some G-quadruplex RNAs
may serve as potential therapeutic agents for leukaemia.
4.8.2 RUNX1 +24 interactions
In K562 cells, RUNX1 +24 interactions occurred across the whole of chromosome 21.
Previously, Hi-C and promoter capture-C showed that promoter interactions are largely
confined within TADs. Unexpectedly, the +24 enhancer interacted across the whole chromosome
and was not limited to the reported domain structure.
The majority of RUNX1 +24 interactions were with regulatory regions; however, +24
connected to both active and inactive chromatin. This unexpected finding suggests that RUNX1
+24 plays a role in the organisation of chromosome 21 in K562 cells, rather than a traditional
enhancer role.
The RUNX1 +24 connections were predominantly with non-coding RNAs, whose
expression profiles cannot be determined from the existing RNA-seq data. The Antony et al.,
(2020) RNA-seq data were unstranded and poly A+ enriched. Non-coding RNAs often overlap
with genes; if this applies to these non-coding RNAs, they would not be resolved by this data
set. The connections from RUNX1 +24 to non-coding RNAs imply that RUNX1 +24 may be
providing connections across the chromatin to ensure correct transcription and repression for
maintenance of K562 cells. As non-coding RNAs are important for correct enhancer-promoter
looping, 3D genome organisation, and cell-fate processes, the +24 may be safeguarding the
chromatin structure and gene expression.
In HPC7 cells, Runx1 +24 connected to the active P1 and inactive P2 Runx1 promoters
and to eight Runx1 regulatory regions (m-371, m-368, m-354, m-327, m-321, m-303,
m-58, m-48). In K562 cells, RUNX1 +24 connected to only one RUNX1 promoter (P2), which
is active, as well as to a small number of identified RUNX1 regulatory regions (m-368, m-101,
m-58, +62 and m+110). This reinforces the notion that different enhancers are recruited to
produce different cell types and to generate different RUNX1 isoforms.
Some novel interactions were identified: RUNXOR, ERG P3 and the putative
regulatory regions -96, +89, SLC5A3 Exon 2 MRPS6 intron and a super-enhancer region
between SAMSN1 and NRIP1. These putative regulatory regions could be investigated further
for functional roles in K562 cells using enhancer and insulator assays, including analysis of
the combinatorial roles of the regions.
RUNX1 +24 could be important for the overall organisation and 3D structure of
chromosome 21. RUNX1 +24 interacts with coding genes, non-coding RNA, super-enhancers
and regulatory elements, and may be maintaining these connections to prevent transcription,
rather than acting in the expected enhancer role. Further investigation to define the RNA-DNA
interactions proposed by this research could determine the importance of the RUNX1 +24
eRNAs, possible G-quadruplex RNA and non-coding RNA connections seen in K562 cells. This
could be carried out using previously generated RNA-DNA interactomes for K562 cells, such as
Red-C (Gavrilov et al., 2020).
Some other questions arising from this research include: are there other well-established
enhancers that connect across the chromosome? If so, are they significant for genome
organisation? Are there equivalents of RUNX1 +24 on different chromosomes? These could be
further discovered by using enhancers as baits for 4C experiments or establishing an equivalent
of Capture-C for enhancers.
This chapter describes my work determining how RUNX1 +24 interacts with other
DNA regions in steady-state K562 cells. However, enhancers become active when cells are
subjected to various stimuli, such as in differentiation. Cohesin insufficiency has been shown
to impair differentiation and to alter both overall chromatin accessibility and the transcription
of RUNX1 (Mazumdar et al., 2015; Mullenders et al., 2015; Viny et al., 2015). RUNX1 P1 is
crucial for megakaryocyte differentiation and it is predicted that enhancers play key roles in
inducing the correct RUNX1 promoters. The following chapter describes experiments in which
the K562 cells were induced to differentiate to the megakaryocytic lineage, so the role of
RUNX1 +24 could be further understood. It also details how cohesin insufficiency affects the
RUNX1 +24 connections.
5 Chapter 5: Cohesin insulates 3D connections of RUNX1
+24 during megakaryocyte differentiation
5.1 Introduction
The previous chapter established the connectivity of the RUNX1 +24 enhancer and
suggested that this region connects to multiple DNA regions spanning chromosome 21 in K562
cells. RUNX1 P1 is crucial for megakaryocyte differentiation, and it is predicted that enhancers
play key roles in selecting and inducing the correct RUNX1 promoters. Dr. Antony discovered
that enhancer activity is altered during megakaryocytic differentiation in cohesin-mutant cells
(Antony et al., 2020). Cohesin insufficiency impairs HSC differentiation and alters both overall
chromatin accessibility and the transcription of RUNX1 (Mazumdar et al., 2015;
Mullenders et al., 2015; Viny et al., 2015).
Overall, cohesin mutations are implicated as a putative genetic pathway for
development of AML, but how the disease arises from these mutations has not been
determined. Cohesin mutations co-occur with other mutations in leukaemogenic genes
indicating cooperation between pathways (Welch et al., 2012). One possibility is that
impairment of cohesin’s regulatory role leads to dysregulation of other important
leukaemogenic genes. The gene encoding cohesin subunit STAG2 is the most frequently
mutated of the cohesin genes (Van Der Lelij et al., 2017). STAG2 mutation can lead to complete
loss of STAG2 function due to mosaic inactivation of the X-linked wild type allele. Several
studies have shown STAG2 preferentially occupies enhancers and promoters, and promotes
cell-type-specific transcription, a function that is not compensated by cohesin-STAG1 (Arruda
et al., 2020; Cuadrado et al., 2019; De Koninck et al., 2020; Kojic et al., 2018; Viny et al.,
2019).
K562 cells with a patient-specific STAG2 R614* null mutation were previously
generated by Dr. Antony. Wild type (WT) and STAG2 mutant R614* K562 cells were
previously subjected to RNA-seq, revealing genome-wide gene expression changes. Using
ATAC-seq, Dr. Antony also noted altered chromatin accessibility in K562 cells with STAG2
mutation. The RNA-seq and ATAC-seq revealed that the gene expression changes observed
are associated with altered chromatin accessibility at regulatory regions and super-enhancers
in STAG2 mutants (Antony et al., 2020).
RUNX1 and ERG are important for HSC differentiation and both have intergenic
regions that are distinguished as super-enhancers in CD34+ cells (Khan et al., 2016). In Dr.
Antony’s dataset, ERG was the top differentially upregulated super-enhancer-associated gene
in the STAG2-/- mutant, while RUNX1 was one of the genes that showed differential chromatin
accessibility.
K562 cells can be induced to differentiate towards the megakaryocyte lineage using
phorbol 12-myristate 13-acetate (PMA). Antony et al., (2020) used quantitative PCR (qPCR)
to measure specific gene expression changes over 72 hours (h) of PMA stimulation. WT cells
undergoing normal megakaryocyte differentiation showed a gradual increase in transcription of
ERG and of RUNX1 from its P1 promoter (RUNX1-P1). However, STAG2-deficient cells showed
an aberrant spike in transcription of ERG and of RUNX1 from its P2 promoter (RUNX1-P2)
6-12 h post-PMA stimulation (Antony et al., 2020).
Dr. Antony showed that by 48 h post-stimulation, aberrant RUNX1 and ERG
transcription had returned to baseline in STAG2-deficient cells. This implies that PMA
stimulation of STAG2-deficient cells, which at baseline have increased chromatin accessibility
at RUNX1 and ERG, leads to transient unrestrained transcription (Antony et al., 2020).
Antony et al., (2020) used the bromodomain and extra-terminal motif (BET) inhibitor
JQ1 to decrease BRD4 binding and suppress enhancer-driven transcription.
BRD4 is a bromodomain-containing protein that interacts with active
enhancers (Bhagwat et al., 2016). When K562 cells were treated with JQ1 concurrently with
PMA, RUNX1-P2 and ERG expression was reduced in WT cells, and the aberrant expression
peaks in STAG2-deficient cells were dramatically dampened (Antony et al., 2020). RUNX1-P1
transcription was completely blocked by JQ1 in both WT and STAG2-deficient cells
(Antony et al., 2020). These results suggest that STAG2 depletion de-constrains the chromatin
surrounding RUNX1 and ERG, which produces aberrant enhancer-amplified transcription in
response to a differentiation signal.
This chapter describes the results of my research on: 1) the role of RUNX1 +24 in
megakaryocyte differentiation and 2) how cohesin insufficiency affects spatial connections
initiated by the RUNX1 +24 enhancer. For the first objective, K562 cells were induced with
PMA to differentiate towards the megakaryocytic lineage, so that the spatial connections
initiated by RUNX1 +24 during the differentiation process could be determined. For the second
objective, K562 cells with the STAG2 R614* null mutation were differentiated in parallel to
understand how cohesin deficiency affects +24 enhancer connections.
This experiment highlighted how connectivity changes upon cohesin depletion and
may provide a mechanism for how cohesin mutations lead to dysregulated RUNX1 expression
in haematopoietic cells.
5.2 Hypothesis
I hypothesise that RUNX1 +24 connections will alter during megakaryocyte
differentiation, and that without STAG2 these connections will not be maintained.
5.3 Aims
3) Elucidate connections of RUNX1 +24 during normal megakaryocyte differentiation.
4) Determine the change in significant chromatin interactions in STAG2-deficient
cells after PMA stimulation.
5.4 RUNX1 +24 significant interactions during normal
megakaryocyte differentiation.
The conditions chosen for this 4C were based on the gene expression analyses shown
by Antony et al., (2020). Differentiation to megakaryocytes was induced using PMA
(Kuvardina et al., 2015). As a control, K562 (WT) and STAG2-deficient cell lines were treated
with dimethyl sulfoxide (DMSO), the solvent, in place of PMA. The 12 h timepoint was chosen
based on the abnormal transcription observed at this time in STAG2-deficient mutants. An end
timepoint of 72 h was chosen because, by this time, WT cells displayed differentiation markers
and gene transcription profiles consistent with megakaryocytes (Antony et al., 2020).
4Cker differential analysis, which uses DESeq2, was carried out on the windows that
interact with the RUNX1 +24 bait in at least two conditions (Love et al., 2014; Raviram et al.,
2016). This analysis only works for near-bait and cis interactions. A plot of significant
interactions was generated with a p value of 0.05; however, to tease out true differences in
interactions, a more stringent significance cut-off of an adjusted p value of 0.01 was used.
Hereafter, all p values refer to each interaction’s adjusted p value.
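The effect of moving from a raw p value of 0.05 to an adjusted p value of 0.01 can be illustrated with a small sketch. It assumes the adjusted p values are Benjamini-Hochberg values, which is DESeq2's default adjustment; the raw p values below are invented for illustration and are not taken from this analysis.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values for five interaction windows.
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
padj = benjamini_hochberg(raw)

lenient = [i for i, p in enumerate(raw) if p < 0.05]    # raw p < 0.05
stringent = [i for i, p in enumerate(padj) if p < 0.01]  # adjusted p < 0.01
print(len(lenient), "windows pass raw p < 0.05")
print(len(stringent), "windows pass adjusted p < 0.01")
```

In this toy example, four windows pass the lenient raw cut-off but only one survives the adjusted cut-off, which is the kind of reduction the more stringent threshold is intended to produce.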
Significantly altered interactions and their log fold changes were visualised in the
UCSC browser.
5.4.1 Visualisation of significant interactions during normal
megakaryocyte differentiation.
The normalised read count for all WT conditions (WT 12 DMSO (control), WT 12
PMA, WT 72 PMA) was plotted across chromosome 21 using ggplot (Wickham et al., 2007)
to generate the cis profile of RUNX1 +24 connections in unedited K562 cells (Figure 5.1). Read
coverage was high around the +24 bait. The profile remains relatively similar during
megakaryocyte differentiation; however, the number of normalised reads was reduced
post-stimulation. The large normalised read peak around position 45,000,000 (*), which shows
an interaction with PDE9A and BC033260 in WT 12 DMSO, decreased dramatically during
differentiation.
Figure 5.1: Megakaryocyte differentiation with PMA stimulation RUNX1 +24 WT cis 4C profiles.
ggplots of the cis 4C profiles for WT DMSO 12 h, WT PMA 12 h and WT PMA 72 h. Normalised read count is on the vertical axis and position on chromosome 21 is on the horizontal axis. * Large peak at PDE9A and BC033260 in WT 12 DMSO.
5.4.2 Overview of significant changes in interactions during normal
megakaryocyte differentiation.
As in WT 12 DMSO, the WT 12 PMA and WT 72 PMA profiles showed significant cis
interactions spanning the long arm of chromosome 21 (Figure 5.2). A list of all the
statistically significant changes between conditions is available in
Appendix XI.
Over the differentiation time course, RUNX1 +24 loses connections with a number of
regions across the chromatin. The significant cis differences between WT 12 DMSO and WT
12 PMA were loss of the connection to ERG Promoter 3 (P3) (Log2 fold change of -7.84,
p value = 0.0097) and loss of the connection to the regulatory region between SAMSN1 and
NRIP1 (Log2 fold change of -8.01, p value = 0.0049) (Figure 5.2).
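To give a sense of scale, a Log2 fold change can be converted back to a linear ratio. The sketch below does this for the two values reported above; the function name is illustrative, and the interpretation as a ratio of normalised signal between conditions is the usual reading of a DESeq2 log2 fold change rather than something restated in this section.

```python
def fold_reduction(log2_fold_change: float) -> float:
    """Convert a negative log2 fold change into a linear fold reduction (>1)."""
    return 2 ** (-log2_fold_change)


reported = {"ERG P3": -7.84, "SAMSN1-NRIP1 region": -8.01}
for region, lfc in reported.items():
    print(f"{region}: log2FC {lfc} is roughly a {fold_reduction(lfc):.0f}-fold reduction")
```

Both values correspond to reductions of more than 200-fold. Changes of this size typically arise when the signal in one condition is close to zero, which is consistent with describing these connections as lost.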
ERG is repressed in WT 12 DMSO control K562 cells. However, during differentiation
ERG expression slowly increases (Antony et al., 2020). ERG has three known promoters, and
can positively regulate its own expression (Thoms et al., 2011). The data suggest that the
RUNX1 +24 connection in WT 12 DMSO control K562 cells might constrain ERG expression,
and upon PMA treatment, ERG’s P3 is released to facilitate the expression of ERG.
Comparing WT 12 DMSO with WT 72 PMA, 12 interactions are lost
(Log2 fold change ranges from -5.78 to -8.6, p values range from 0.0017 to 0.0081), including
the connections to ERG P3 and the regulatory region between SAMSN1 and NRIP1 (see
Appendix XI). Overall, megakaryocyte differentiation of K562 cells causes the RUNX1 +24
enhancer to lose cis interactions along chromosome 21.
Figure 5.2: Differentiation-induced changes to the RUNX1 +24 WT cis interactions.
Cis 4C profiles for WT DMSO 12 h, WT PMA 12 h and WT PMA 72 h. Chromosome 21 with WT DMSO 12 h RUNX1 +24 interactions represented by grey arches. Green arches show gained interactions compared to baseline (WT DMSO 12 h). Blue arches underneath the chromosome sketches highlight lost interactions compared to baseline. Each interaction is colour-coded according to its ChromHMM profile: key is at the bottom left of the figure. Super-enhancers were determined from dbSUPER.
The ChromHMM annotations do not necessarily reflect the nature of these elements
during differentiation. The annotations were determined in steady-state K562 cells and
represent the baseline (WT 12 DMSO); therefore, the chromatin composition could be altered
after PMA stimulation.
The super-enhancers were identified regardless of chromatin status, as they were
taken from the collection of cell types in dbSUPER (Khan et al., 2016). In WT 12 DMSO, 16%
(n = 5/31) of the cis connections were with super-enhancers. In Chapter 4 I noted that RUNX1
+24 interactions with super-enhancers may be facilitating or restricting their 3D connections.
The proportion of cis connections with super-enhancers decreases to 10.3% (n = 3/29) at WT
12 PMA and then increases to 13.6% (n = 3/22) after megakaryocyte differentiation at WT 72
PMA (Figure 5.3). The number of super-enhancer connections is the same (3) at both PMA
timepoints, so this increase reflects the smaller total number of cis connections at 72 h.
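The percentages above follow directly from the reported counts, as the short check below shows. The counts are those given in the text; the variable names are illustrative.

```python
# (super-enhancer connections, total significant cis connections) per condition,
# as reported in the text.
counts = {
    "WT 12 DMSO": (5, 31),
    "WT 12 PMA": (3, 29),
    "WT 72 PMA": (3, 22),
}

percentages = {
    condition: round(100 * with_se / total, 1)
    for condition, (with_se, total) in counts.items()
}

for condition, pct in percentages.items():
    print(f"{condition}: {pct}% of cis connections are with super-enhancers")
```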
Figure 5.3: Differentiation-induced changes in the number of RUNX1 +24 significant cis interactions with
super-enhancers in WT K562 cells.
Percentage of interactions is on the vertical axis. Horizontal axis shows the treatment conditions WT DMSO (D) 12h, WT PMA (P) 12h and WT PMA 72h
5.4.3 Visualisation of significant near-bait interactions during normal
megakaryocyte differentiation.
The normalised read counts of all three time points in WT cells were plotted across
chromosome 21 near-bait locations using ggplot (Wickham et al., 2007) (Figure 5.4). The
majority of reads are located within specific regions surrounding RUNX1 +24 in WT 12
DMSO. As in the cis WT profiles, the number of normalised reads decreases as the cells
differentiate. Interestingly, the large normalised read peak that highlighted an interaction
with RUNX1 m-368 (*) and the surrounding area decreases in comparison to the baseline WT
12 DMSO connections (Figure 5.4). Overall, megakaryocyte differentiation of K562 cells
causes the RUNX1 +24 enhancer to lose interactions within the RUNX1 TAD.
Figure 5.4: Megakaryocyte differentiation with PMA stimulation RUNX1 +24 WT near-bait 4C profile.
ggplot of the near-bait 4C profiles for WT DMSO 12 h, WT PMA 12 h and WT PMA 72 h. Normalised read count is on the vertical axis and position on chromosome 21 is on the horizontal axis. The asterisk shows the large normalised read peak at the human equivalent region of RUNX1 m-368.
5.4.4 Overview of significant changes in near-bait interactions during
normal megakaryocyte differentiation.
Similar to the cis results, RUNX1 +24 progressively lost connections during
megakaryocyte differentiation. However, unlike the cis interactions, the WT 12 PMA and WT
72 PMA near-bait interactions also showed some significantly increased connections (Figure
5.5). The statistically significant changes between conditions are listed in Appendix XII.
Figure 5.5: Differentiation-induced changes to the RUNX1 +24 WT near-bait interactions.
Near-bait 4C profiles for WT DMSO 12 h, WT PMA 12 h and WT PMA 72 h. Chromosome 21 with WT DMSO 12 h RUNX1 +24 interactions represented by grey arches. Green arches show gained interactions compared to baseline (WT DMSO 12 h). Blue arches underneath the diagrams highlight lost interactions compared to baseline. RUNX1 +24 is represented by the yellow line. RUNX1 promoters are represented by blue lines and interactions to these promoters are shown in blue. RUNX1 regulatory regions are represented by orange lines and interactions to these regions are shown in orange. Near-bait region is aligned to the UCSC K562 gene list and annotations. Dotted grey lines show Rao et al., (2014) K562 domain boundaries.
The previous chapter demonstrated the multiple near-bait connections surrounding
RUNX1 +24. The near-bait interactions observed during megakaryocyte differentiation reflect
the overall trend of a decrease in connections to RUNX1 +24. Between WT 12 DMSO and WT
12 PMA, three connections previously described in Chapter 4 were lost: -96, +89 and the
SLC5A3 Exon 2 MRPS6 intron region (Log2 fold changes of -7.81, -7.62 and -8.94 and p
values of 0.0034, 0.0015 and 0.0014, respectively).
Despite the overall loss of connections, after 12 h of PMA treatment there were some
significantly increased interactions with RUNX1 +24 (Log2 fold change ranges from 6.77 to
8.25, p values range from 5.3E-7 to 0.0020; see Appendix XII). These include some possible
regulatory regions (+14, +54, +68 and +142) between RUNX1 P1 and P2 (Table 5.1).
RUNX1-IT1, RUNX1 promoter 1, and RUNX1 exons 1 and 2 are also recruited to the RUNX1
+24 hub (Figure 5.6). RUNX1-IT1 is a long non-coding RNA that has previously been
correlated with RUNX1 (Liu et al., 2020). Expression of RUNX1-IT1 enhanced RUNX1
transcription in pancreatic cancer cells (Liu et al., 2020). Connections with m-368, m-58,
RUNXOR, m+3, +62, m+11, RUNX1 P2 and exon 3 remain stable with RUNX1 +24 during
megakaryocyte differentiation.
The RUNX1 protein is capable of regulating its own gene transcription (Martinez et al.,
2016). RUNX1 P1-driven expression is important for megakaryocyte differentiation (Draper et
al., 2017). During megakaryocyte differentiation, RUNX1 P1-driven expression of RUNX1
gradually increases (Antony et al., 2020). In baseline K562 cells (WT 12 DMSO), RUNX1 P1
is not connected to RUNX1 +24 and RUNX1 P1-driven expression is repressed (Chapter 4).
However, after PMA stimulation, RUNX1 P1 is recruited to RUNX1 +24 along with other
possible enhancers (Figure 5.6). In contrast, the RUNX1 P2 transcript is stably expressed
throughout differentiation (Antony et al., 2020). Consistent with this, the connection between
RUNX1 P2 and RUNX1 +24 remains unchanged throughout differentiation.
After differentiation to megakaryocytes (WT 72 PMA), more near-bait connections are
lost (Log2 fold changes range from -3.93 to -12.36, p values range from 1.4E-35 to 0.0095).
Contacts with some of the possible regulatory regions that were significant at 12 h after
PMA treatment (+14, +54, +68 and +142) are lost in differentiated cells. Interactions with RUNXOR,
RUNX1-IT1, RUNX1 P1, and RUNX1 exons 1 and 2 are also lost. The only connection gained at 72
h of differentiation was with m-371 (Log2 fold change of 5.95, p value of 0.0069).
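To give a sense of scale for the Log2 fold changes reported above, the short sketch below converts them to linear fold changes. It is an illustration only: the helper name linear_fold_change is an assumption introduced here, and the values are simply those quoted in the text. A Log2 fold change of 5.95 corresponds to roughly a 60-fold increase in normalised interaction signal.

```python
# Convert reported Log2 fold changes into linear fold changes.
# The values come from the text; the helper name is illustrative only.

def linear_fold_change(log2_fc: float) -> float:
    """Return the linear fold change for a given log2 fold change."""
    return 2.0 ** log2_fc


reported = {
    "m-371 gained at 72 h PMA": 5.95,
    "largest near-bait loss at 72 h PMA": -12.36,
    "smallest near-bait loss at 72 h PMA": -3.93,
}

for label, log2_fc in reported.items():
    fold = linear_fold_change(log2_fc)
    direction = "increase" if log2_fc > 0 else "decrease"
    print(f"{label}: log2FC {log2_fc:+.2f} = {fold:.4g}-fold ({direction})")
```

Negative values give fold changes below 1, so a Log2 fold change of -3.93 means the signal fell to about 7% of its reference level.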
Chromatin status at the near-bait regions with significantly increased connections was
analysed using ENCODE data as for Chapters 3 and 4 (Table 5.1).
Table 5.1: Annotations of identified putative regulatory region connections
Regulatory region location relative to P1 (kb) | Genome coordinates (Hg19) | ENCODE annotation in K562 cells | ChromHMM annotation in K562 cells
+14 | chr21:36406427-36407599 | RNA Pol II, CTCF, p300, LSD1, H3K27ac, H3K4me1, H3K4me3, H3K9me3, H3K27me3 | Strong enhancer, Weak enhancer
+54 | chr21:36366550-36367612 | DNase I HS, RNA Pol II, p300, LSD1, H3K27ac, H3K4me1, H3K4me3, H3K9ac, H3K9me3, H3K27me3 | Strong enhancer, Weak enhancer
+68 | chr21:36352605-36353690 | RNA Pol II, p300, LSD1, H3K4me1, H3K4me3, H3K9ac, H3K9me3 | Weak enhancer, weak transcription
+142 | chr21:36278644-36280049 | DNase I HS, CTCF, p300, LSD1, H3K27ac, H3K4me1, H3K4me3, H3K9ac, H3K9me3 | Strong enhancer
5.4.5 Significant interactions with other genes
Connections with genes were identified using the same methods as Section 4.7.4 and
are listed in Appendix XIII. The number of gene connections changes during differentiation
(Table 5.2). Interestingly, the number of connected genes increased from 64 in the baseline to
154 (12 PMA) and then decreased to 140 (72 PMA). This is opposite to the overall trend in
connections, which decrease. These connections need further investigation to determine gene type,
establish their expression level, and ascertain whether there are any patterns in the type of gene.
However, as the gene interaction reflects an overlap with the region of a gene rather than a
promoter or exon of a gene, it may not be appropriate to assume the gene’s expression is
controlled by RUNX1 +24.
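The distinction drawn above, between a connection that overlaps a gene body and one that overlaps a promoter, can be illustrated with a minimal interval-overlap sketch. Everything in it is an assumption made for illustration: the coordinates are hypothetical, the promoter window sizes (1000 bp upstream, 500 bp downstream of the transcription start site) are arbitrary, and this is not the method used to generate Appendix XIII.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A genomic interval with half-open coordinates [start, end)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def promoter_window(gene: Interval, strand: str,
                    upstream: int = 1000, downstream: int = 500) -> Interval:
    """Window around the transcription start site (window sizes are assumptions)."""
    if strand == "+":
        tss = gene.start
        return Interval(gene.chrom, max(0, tss - upstream), tss + downstream)
    tss = gene.end
    return Interval(gene.chrom, max(0, tss - downstream), tss + upstream)


# Hypothetical coordinates, chosen only to illustrate the distinction.
gene = Interval("chr21", 1_000_000, 1_100_000)      # plus-strand gene body
fragment = Interval("chr21", 1_050_000, 1_051_000)  # interacting 4C fragment
promoter = promoter_window(gene, "+")

print("overlaps gene body:", overlaps(fragment, gene))
print("overlaps promoter: ", overlaps(fragment, promoter))
```

In this hypothetical case the fragment would be counted as a gene connection even though it lies about 50 kb from the promoter, which is why a gene-level overlap alone does not imply regulation by RUNX1 +24.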
Table 5.2: Number of genes connected to RUNX1 +24 in WT cells
Condition | WT 12 DMSO | WT 12 PMA | WT 72 PMA
Number of genes connected to RUNX1 +24 | 64 | 154 | 140
5.4.6 Overall significant RUNX1 +24 and near-bait interactions in K562
cells during megakaryocyte differentiation
During PMA treatment there is an overall loss of connections to RUNX1 +24; however,
as the cells differentiate there are some significant changes in the near-bait region. RUNX1 +24
maintains connections with the known enhancers m-368, m-58, m+3, +62 and m+110 during
differentiation (Figure 5.6). It also sustains connections with RUNX1 P2 and RUNX1 exon 3.
However, the novel connections determined in Chapter 4 to -96, +89, the SLC5A3 Exon 2 MRPS6
intron region and the super-enhancer regulatory regions between SAMSN1 and NRIP1 are lost during
differentiation. The connections to the known RUNX1 regulator RUNXOR (a long non-coding RNA)
and to ERG P3 are also lost during differentiation (Figure 5.6).
During differentiation (12 h post PMA treatment), the +24 enhancer makes some novel
connections to the putative regulatory elements +14, +54, +68 and +142 (Figure 5.6, Table 5.1).
RUNX1 +24 also gains connections with RUNX1 promoter 1 and exons 1 and 2.
After differentiation, RUNX1 +24 loses the novel connections gained at 12 h of PMA
treatment (+14, +54, +68, +142), as well as those with the long non-coding RNA RUNX1-IT1, RUNX1 P1,
and exons 1 and 2. At 72 h, once differentiation has occurred, there is a significant increase in the
interaction between RUNX1 +24 and m-371. m-371 is an important haematopoietic enhancer
in developing zebrafish and mice.
Figure 5.6: Changes to the RUNX1 +24-anchored chromatin hub during megakaryocyte differentiation of
K562 cells.
Yellow boxes represent UCSC annotated gene promoters and/or exons. Blue boxes show described RUNX1 regulators. Orange boxes annotate novel putative RUNX1 regulators.
5.5 The effect of STAG2 deficiency on RUNX1 +24 significant
interactions during differentiation of K562 cells
The 4C data for the STAG2 mutants were prepared in the same way as for WT (STAG2 12 DMSO,
STAG2 12 PMA, STAG2 72 PMA). Each STAG2 mutant sample was compared to the WT sample at the
equivalent condition and timepoint.
5.5.1 Visualisation of significant interactions during cohesin impaired
“differentiation”
The normalised read counts of all WT conditions (WT 12 DMSO, WT 12 PMA, WT 72
PMA) and STAG2 mutant conditions (STAG2 12 DMSO, STAG2 12 PMA, STAG2 72 PMA)
were plotted across chromosome 21 using ggplot (Wickham et al., 2007) to generate the cis
profiles of RUNX1 +24 connections in baseline K562 cells and cohesin mutants (Figure 5.7). The
STAG2 mutant cells show a dramatic increase in connections across chromosome 21. Read
coverage was high around the +24 bait.
A comparison of the +24-mediated connections in WT 12 DMSO versus STAG2 12
DMSO shows a similarly high number of reads around the RUNX1 +24 bait. However, the read
coverage across chromosome 21 is higher in STAG2 12 DMSO cells.
During PMA-induced differentiation, WT K562 cells lose connections overall. In
contrast, STAG2 mutants generally retained, or even gained, connections following stimulation
with PMA. After 12 h of PMA stimulation the pattern of connections in STAG2 mutants
approximates the wild type, but with a higher read count and more distant connections. After 72 h of
PMA stimulation there was a dramatic increase in connections across chromosome 21 in
STAG2 72 PMA compared to WT 72 PMA.
Figure 5.7: Time course of PMA stimulation of all RUNX1 +24 cis 4C profiles.
ggplots of the cis 4C profiles for WT DMSO 12 h, WT PMA 12 h, WT PMA 72 h, STAG2 DMSO 12 h, STAG2 PMA 12 h and STAG2 PMA 72 h. Normalised read count is on the vertical axis and position on chromosome 21 is on the horizontal axis.
5.5.2 The effect of STAG2 deficiency on interactions during
differentiation of K562 cells
Similar to the baseline WT cells, the baseline STAG2 mutant cells (STAG2 12 DMSO)
had significant cis interactions spanning the long arm of chromosome 21 (Figure 5.8). A list of
all the statistically significant changes between conditions is available in Appendix XI.
When comparing the WT 12 DMSO and STAG2 12 DMSO, there are 4 interactions
that are not present in the cohesin mutant baseline (Log2 fold change ranges from -6.66 to 5.15,
p values range from 0.0003 to 0.0078) including the super-enhancer regulatory region between
SAMSN1 and NRIP1 (Log2 fold change of -5.64, p value of 0.0047) (Appendix XI). However,
there are 20 regions interacting with RUNX1 +24 in STAG2 12 DMSO (Log2 fold change
ranges from 3.85 to 6.04, p values range from 0.0002 to 0.0097) that are not present in WT 12
DMSO (Figure 5.8). Overall, STAG2 deficiency causes the RUNX1 +24 enhancer to make more
connections than in the WT cells.
Figure 5.8: Significant changes to the RUNX1 +24 cis baseline (12 DMSO) interactions due to cohesin
deficiency.
Cis 4C profiles for K562 STAG2 mutant DMSO 12h cells on Chromosome 21 with K562 WT DMSO 12h RUNX1 +24 interactions represented by grey arches. Green arches show a gained interaction compared to WT. Blue arches underneath the diagrams highlight lost interactions in STAG2 mutant cells compared to WT. Each interaction is colour-coded according to its ChromHMM profile: key is at the bottom left of the figure. Super-enhancers were determined from dbSuper.
These interactions were further categorised based on the K562 chromatin
characteristics using ChromHMM and dbSUPER annotations. Some regions that interact with
RUNX1 +24 exhibit multiple ChromHMM annotations (Figure 5.9).
Compared to WT cells, RUNX1 +24 in STAG2 mutant cells has slightly fewer
interactions with enhancer regions (WT = 58%, STAG2 = 50%), active promoters (WT = 32%,
STAG2 = 28%), heterochromatin (WT = 10%, STAG2 = 7%), and polycomb-repressed regions (WT
= 29%, STAG2 = 28%). Interactions between RUNX1 +24 and insulators increased
(WT = 32%, STAG2 = 48%). Poised promoters (WT = 16%, STAG2 = 22%) and super-enhancers
(WT = 16%, STAG2 = 24%) also had an increased number of connections with
RUNX1 +24.
Figure 5.9: Features of RUNX1 +24 significant cis interactions in WT baseline and STAG2 baseline cells
(12 DMSO).
Percentage of interactions is on the vertical axis. Horizontal axis shows the ChromHMM annotations. Individual interactions can have multiple annotations.
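Because an individual interaction can carry several ChromHMM annotations, the category percentages need not sum to 100% (the seven WT values quoted above sum to 193%). The sketch below illustrates this counting scheme. The region names and their labels are hypothetical and were chosen only to show the arithmetic; they are not taken from the 4C data.

```python
from collections import Counter

# Hypothetical interactions, each with one or more ChromHMM-style labels.
interactions = {
    "region_A": {"enhancer", "insulator"},
    "region_B": {"enhancer", "active promoter"},
    "region_C": {"polycomb repressed"},
    "region_D": {"enhancer", "super-enhancer", "insulator"},
}


def annotation_percentages(regions: dict) -> dict:
    """Percentage of interactions that carry each annotation label."""
    total = len(regions)
    counts = Counter(label for labels in regions.values() for label in labels)
    return {label: 100.0 * n / total for label, n in counts.items()}


percentages = annotation_percentages(interactions)
for label, pct in sorted(percentages.items()):
    print(f"{label}: {pct:.0f}%")

# Each interaction is counted once per label it carries, so the total
# exceeds 100% whenever any interaction has more than one annotation.
print(f"sum of percentages: {sum(percentages.values()):.0f}%")
```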
During megakaryocyte differentiation of WT K562 cells (12 h post PMA stimulation),
the RUNX1 +24 enhancer lost two cis connections: to ERG Promoter 3 (P3) and to the regulatory
region between SAMSN1 and NRIP1. STAG2 12 PMA also lost the connection to ERG P3.
The connection to the regulatory region between SAMSN1 and NRIP1 was not present in STAG2
12 DMSO, and it remained absent in STAG2 12 PMA (Figure 5.10). Most connections
observed in WT cells were conserved in the STAG2 cohesin mutant. However, one additional
cis connection was observed in STAG2 12 PMA (Log2 fold change of 8.83, p value of 0.0023)
(Appendix XI).
The overall similarity between RUNX1 +24 enhancer-mediated connections formed in
WT 12 PMA and STAG2 12 PMA was somewhat surprising based on the presence of
numerous new connections seen in the STAG2 baseline cells (Figure 5.8).
Figure 5.10: Significant changes to the RUNX1 +24 12h PMA treatment cis interactions due to cohesin
deficiency.
Cis 4C profiles for STAG2 mutant PMA 12h Chromosome 21 with time matched K562 WT PMA 12h RUNX1 +24 interactions represented by grey arches (WT 12 PMA). Green arches show a gained interaction compared to WT. Blue arches underneath the diagrams highlight lost interactions in STAG2 mutant cells compared to WT baseline. Each interaction is colour-coded according to its ChromHMM profile: key is at the bottom left of the figure. Super-enhancers were determined from dbSuper.
After 72 h of PMA treatment, the WT cells significantly lost connections across the
chromosome. In contrast, 72 h of PMA stimulation in STAG2 mutant cells led to the RUNX1
+24 enhancer dramatically gaining cis connections when compared to the WT treatment
condition (Figure 5.11).
There are 35 significantly increased cis connections in STAG2 72 h PMA cells (Log2
fold changes range from 4.75 to 10.09, p values range from 3.5E-5 to 0.0094; see Appendix
XI). This correlates with the normalised read profile seen in Figure 5.7.
Figure 5.11: Significant changes to the RUNX1 +24 72h PMA treatment cis interactions due to cohesin
deficiency.
Cis 4C profiles for STAG2 mutant PMA 72h Chromosome 21 with time matched K562 WT PMA 72h RUNX1 +24 interactions represented by grey arches (WT 72 PMA). Green arches show an interaction that has been significantly gained compared to time matched WT condition. WT blue arches underneath the diagram show lost interactions compared to WT baseline. The STAG2 blue arch underneath the diagram shows an interaction that has been lost when compared to WT 72 PMA. Each interaction is colour-coded according to its ChromHMM profile: key is at the bottom left of the figure. Super-enhancers were determined from dbSuper.
All interactions observed across PMA treatment and cell line were further categorised
using super-enhancer annotations derived from dbSUPER (Khan et al., 2016). dbSUPER
annotations extend across multiple different cell-types. Compared to WT cells, in STAG2
mutant cells the RUNX1 +24 enhancer forms more interactions with super-enhancer regions
(Figure 5.12).
At the WT baseline (WT 12 DMSO), 16% (n = 5/31) of the connections RUNX1 +24
makes are with super-enhancers; this is increased in STAG2-deficient cells (STAG2 12
DMSO) to 24% (n = 11/46). During PMA-induced differentiation (12 h), the RUNX1 +24
enhancer forms similar interactions in both WT and STAG2 mutant cells; 10% of the
connections are with super-enhancers (n = 3/29 and n = 3/30 respectively). At 72 h after PMA
treatment, 14% (n = 3/22) of the connections to RUNX1 +24 are with super-enhancers in WT
cells. In contrast, in STAG2 mutant cells the percentage of interactions between RUNX1 +24
and super-enhancers is 22% (n = 12/55) (Figure 5.12).
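The percentages quoted above follow directly from the reported counts. As a simple check, the sketch below recomputes them from the n values given in the text; the dictionary layout and the helper name percent are illustrative choices made here.

```python
# (super-enhancer connections, all significant cis connections),
# as reported in the text for each condition.
counts = {
    "WT 12 DMSO": (5, 31),
    "STAG2 12 DMSO": (11, 46),
    "WT 12 PMA": (3, 29),
    "STAG2 12 PMA": (3, 30),
    "WT 72 PMA": (3, 22),
    "STAG2 72 PMA": (12, 55),
}


def percent(part: int, whole: int) -> int:
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100 * part / whole)


for condition, (with_se, total) in counts.items():
    print(f"{condition}: {with_se}/{total} = {percent(with_se, total)}%")
```

Note that the absolute number of super-enhancer connections is identical in WT at 12 h and 72 h of PMA treatment (3 in each case); the difference in percentage reflects the smaller total number of connections at 72 h.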
Figure 5.12: Profile of RUNX1 +24 significant cis interactions with super-enhancers during megakaryocyte
differentiation with PMA stimulation in WT and STAG2 mutant cells.
Percentage of interactions is on the vertical axis. Horizontal axis shows the K562 cells treatment conditions WT DMSO (D) 12h, WT PMA (P) 12h and WT PMA 72h in green and STAG2 mutant STAG2 DMSO (D) 12h, STAG2 PMA (P) 12h and STAG2 PMA 72h in blue.
5.5.3 The effect of STAG2 deficiency on near-bait interactions during
differentiation of K562 cells
The normalised read counts of all WT conditions (WT 12 DMSO, WT 12 PMA, WT 72
PMA) and STAG2 mutant conditions (STAG2 12 DMSO, STAG2 12 PMA, STAG2 72 PMA)
were plotted across chromosome 21 near-bait locations using ggplot (Wickham et al., 2007)
(Figure 5.13). During differentiation of WT cells, the majority of reads are located within
specific regions surrounding the RUNX1 +24 enhancer, and read count decreases as the cells
differentiate. However, RUNX1 +24-mediated interactions in STAG2 mutant cells give rise to
distinctly different interaction profiles.
Compared with WT 12 DMSO, STAG2 12 DMSO lacks the peak around RUNX1 +24 and
has a higher level of background reads. During differentiation,
after 12 h of PMA treatment, the RUNX1 +24-mediated interactions in STAG2 12 PMA
resemble the pattern of normalised reads seen in WT 12 PMA. After 72 h there is a
dramatic increase in connections across the near-bait region in STAG2 72 PMA compared to
WT 72 PMA. The normalised read count profiles reflect the gain of connections in STAG2
mutants seen in the cis profiles.
Figure 5.13: Time course PMA stimulation of all RUNX1 +24 near-bait 4C profiles.
ggplot of the near-bait 4C profiles for WT DMSO 12 h, WT PMA 12 h, WT PMA 72 h, STAG2 DMSO 12 h, STAG2 PMA 12 h and STAG2 PMA 72 h. Normalised read count is on the vertical axis and position on chromosome 21 is on the horizontal axis.
5.5.4 Significant changes in near-bait interactions caused by
STAG2 deficiency
When comparing WT 12 DMSO and STAG2 12 DMSO, there are 176 interactions
present in WT cells that are not present in the cohesin mutant baseline (Log2 fold changes range
from -10.69 to -2.62, p values range from 2.9E-17 to 0.0087). The statistically significant
changes between conditions are listed in Appendix XII. These are within the 5 Mb region
surrounding the bait (Figure 5.14). The connections that decreased include RUNXOR, m+3,
+62, m+110, RUNX1 exon 3 and RUNX1 P2. However, in STAG2 12 DMSO there are 83
regions interacting with RUNX1 +24 that are not present in WT 12 DMSO (Log2 fold changes
range from 3.31 to 7.40, p values range from 4.4E-6 to 0.0096).
Figure 5.14: Significant changes to the RUNX1 +24 baseline (12 DMSO) of near-bait interactions due to
cohesin deficiency.
Near-bait 4C profiles for STAG2 mutant DMSO 12h Chromosome 21 with time matched K562 WT DMSO 12h RUNX1 +24 interactions represented by grey arches. Green arches show a gained interaction compared to time matched WT condition. Blue arches underneath the diagram represent lost interactions when compared to WT DMSO 12h. RUNX1 +24 is represented by the yellow line. RUNX1 promoters are represented by blue lines and interactions to these promoters are shown in blue. RUNX1 regulatory regions are represented by orange lines and interactions to these regions are shown in orange. Near-bait region is aligned to the UCSC K562 gene list and annotations. Dotted grey lines show Rao et al., (2014) K562 domain boundaries.
During megakaryocyte differentiation (12 h post PMA stimulation), RUNX1 +24 in WT cells
significantly lost connections to -96, +89 and the SLC5A3 Exon 2 MRPS6 intron
region. STAG2 12 PMA maintains the connections with -96 and the SLC5A3 Exon 2 MRPS6
intron region (Log2 fold changes of 8.07 and 8.888, p values of 0.0053 and 0.0003
respectively). STAG2 12 PMA has one decreased connection when compared to WT 12 PMA
(Log2 fold change of -7.10, p value of 0.0033; see Appendix XII). In contrast to the DMSO
baseline, the cohesin mutant STAG2 12 PMA showed very similar connections to WT (Figure
5.15).
Figure 5.15: Significant changes to the RUNX1 +24 near-bait interactions after 12h PMA treatment due to
cohesin deficiency.
Near-bait 4C profiles for STAG2 mutant PMA 12h Chromosome 21 with time matched K562 WT PMA 12h RUNX1 +24 interactions represented by grey arches. WT green arches show gained interactions compared to WT DMSO 12h. The STAG2 green arch shows an interaction that has been significantly gained compared to time matched WT condition. WT blue arches beneath the diagram highlight lost interactions compared to WT baseline. The STAG2 blue arch beneath the diagram shows a lost interaction compared to WT 12 PMA. RUNX1 +24 is represented by the yellow line. RUNX1 promoters are represented by blue lines and interactions to these promoters are shown in blue. RUNX1 regulatory regions are represented by orange lines and interactions to these regions are shown in orange. Near-bait region is aligned to the UCSC K562 gene list and annotations. Dotted grey lines show Rao et al., (2014) K562 domain boundaries.
In WT 72 PMA, RUNX1 +24 had decreased interactions with the +14, +54, +68 and +142 regions,
RUNXOR, RUNX1 P1, and RUNX1 exons 1 and 2, and an increased connection with m-371.
In STAG2 72 PMA, RUNX1 +24 also lost connections with the +14, +54, +68 and +142 regions,
RUNX1 P1, and RUNX1 exons 1 and 2. However, in STAG2 72 PMA, RUNX1 +24 maintained connections with
RUNXOR and RUNX1-IT1 (Log2 fold changes of 8.28 and 4.15, p values of 2.0E-5 and 0.0096
respectively). RUNX1 +24 had decreased connections with RUNX1 exon 3, m+3 and m-371
when compared with its WT counterpart (Log2 fold changes range from -6.05 to -7.37, p
values range from 3.6E-6 to 0.0003). There was a dramatic increase in the number of
connections to RUNX1 +24; however, these regions require further investigation (Figure 5.16).
The connections differ between the STAG2 12 DMSO and STAG2 72
PMA conditions, as shown in Appendix XIV. These were not analysed beyond the adjusted
p value and Log2 fold change. Although these conditions have different profiles, they do show
a similar trend: when compared to the equivalent WT condition, there is a decrease in the near-bait
connections within 5 Mb either side of the bait and a dramatic increase in connections to RUNX1
+24 from regions 5 Mb to 10 Mb away from the bait.
Figure 5.16: Significant changes to the RUNX1 +24 near-bait interactions after 72h PMA treatment due to
cohesin deficiency.
Near-bait 4C profiles for STAG2 mutant PMA 72 h Chromosome 21 with time matched K562 WT PMA 72h RUNX1 +24 interactions represented by grey arches. WT green arches show gained interactions compared to WT DMSO 12 h. STAG2 green arches show gained interactions compared to time matched WT condition. WT blue arches beneath the diagram highlight lost interactions compared to WT baseline. STAG2 blue arches beneath the diagram highlight lost interactions compared to WT 72 PMA. RUNX1 +24 is represented by the yellow line. RUNX1 promoters are represented by blue lines and interactions to these promoters are shown in blue. RUNX1 regulatory regions are represented by orange lines and interactions to these regions are shown in orange. Near-bait region is aligned to the UCSC K562 gene list and annotations. Dotted grey lines show Rao et al., (2014) K562 domain boundaries.
5.5.5 Significant interactions with other genes
Connections with genes were identified using the same methods as Section 5.4.5 and
the resulting output is listed in Appendix XIII. The number of gene connections in STAG2
mutants differs from that in WT cells. STAG2 mutant cells have almost double the number of gene
connections (121) at baseline compared to WT (64). The number of associated genes decreased
as the cells were treated with PMA (Table 5.3). Interestingly, despite the similar connection
profiles of +24 in WT and STAG2 mutant cells after 12 h of PMA treatment, the number of
gene connections varies between mutant and wild type, with fewer genes connected in the
STAG2 mutant cells. Surprisingly, even though the overall numbers of connections increased
in STAG2 mutants during differentiation compared with WT, the number of connections that
were associated with genes decreased. This may be because the gain in connections is to
intronic and intergenic regions, and/or there are multiple connections within the same gene. It
may also reflect the inability of STAG2 mutant cells to differentiate, as genes involved in
differentiation are not being recruited. These results are inconclusive without further
investigation as mentioned in Section 5.4.5.
Table 5.3: Number of genes connected to RUNX1 +24 in WT and STAG2 mutant cells
Condition | WT 12 DMSO | WT 12 PMA | WT 72 PMA | STAG2 12 DMSO | STAG2 12 PMA | STAG2 72 PMA
Number of genes connected to RUNX1 +24 | 64 | 154 | 140 | 121 | 37 | 47
5.5.6 Overall changes in cis and near-bait interactions mediated by RUNX1
+24 during differentiation of K562 cells, WT versus
STAG2-deficient
After 72 h PMA treatment, WT K562 cells have features consistent with
megakaryocyte identity. During megakaryocyte differentiation in WT K562 cells, there is an
overall loss of connections to the RUNX1 +24 enhancer, but a gain in connected genes. In
contrast, the RUNX1 +24 enhancer gains extra connections in STAG2 mutants, but loses
connections to other genes.
Figure 5.17 compares the baseline (12 DMSO) WT and STAG2 RUNX1 +24
connections with regions that may be important for megakaryocyte differentiation. WT 12
DMSO maintains connections with m-368, -96, m-58, m+3, m+62, +89, m+110, SLC5A3 Exon
2 MRPS6 intron, RUNXOR, RUNX1 P2, RUNX1 exon 3, ERG P3 and super-enhancer
regulatory regions between SAMSN1 and NRIP1. STAG2 12 DMSO maintains connections
with m-368, -96, m-58, +89, SLC5A3 Exon 2 MRPS6 intron, RUNX1 P2 and ERG P3.
However, it loses connections with m+3, m+62, m+110, RUNXOR, RUNX1 exon 3 and super-
enhancer regulatory regions between SAMSN1 and NRIP1.
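The comparison above amounts to a set difference between the two baseline connection lists. The sketch below reproduces it using the region names given in the text; the shortened label for the super-enhancer region and the variable names are illustrative choices made here. It considers only the regions tracked in Figure 5.17, not the additional connections gained in STAG2 mutant cells.

```python
# Regions connected to RUNX1 +24 at baseline, as listed in the text.
wt_12_dmso = {
    "m-368", "-96", "m-58", "m+3", "m+62", "+89", "m+110",
    "SLC5A3 Exon 2 MRPS6 intron", "RUNXOR", "RUNX1 P2", "RUNX1 exon 3",
    "ERG P3", "SAMSN1-NRIP1 super-enhancer region",
}
stag2_12_dmso = {
    "m-368", "-96", "m-58", "+89",
    "SLC5A3 Exon 2 MRPS6 intron", "RUNX1 P2", "ERG P3",
}

maintained = wt_12_dmso & stag2_12_dmso   # present in both conditions
lost_in_stag2 = wt_12_dmso - stag2_12_dmso  # present in WT only

print(f"maintained ({len(maintained)}):", sorted(maintained))
print(f"lost in STAG2 ({len(lost_in_stag2)}):", sorted(lost_in_stag2))
```

Of the 13 tracked WT baseline connections, 7 are maintained and 6 are lost in STAG2 12 DMSO, matching the lists given in the text.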
Figure 5.17: The effect of cohesin deficiency on the RUNX1 +24 enhancer-anchored chromatin hub in
baseline (12 DMSO) conditions in K562 cells.
Yellow boxes represent UCSC annotated gene promoters and/or exons. Blue boxes show described RUNX1 regulators. Orange boxes annotate novel putative RUNX1 regulators.
In WT K562 cells, during differentiation (12 h PMA) the RUNX1 +24 enhancer gains
connections to +14, +54, +68, +142, RUNX1-IT1, RUNX1 promoter 1, exon 1 and 2. RUNX1
+24 maintains connections seen in WT 12 DMSO with m-368, m-58, m+3, m+62, m+110,
RUNXOR, RUNX1 P2, RUNX1 exon 3. In WT K562 cells, RUNX1 +24 decreased interactions
with -96, +89, SLC5A3 Exon 2 MRPS6 intron region, ERG P3 and super-enhancer regulatory
regions between SAMSN1 and NRIP1. The RUNX1 +24 enhancer in STAG2-deficient cells (12
h PMA) has a similar interaction landscape to WT 12 PMA; the obvious difference is that
in STAG2 12 PMA, RUNX1 +24 maintains the interactions with -96 and the SLC5A3 Exon 2
MRPS6 intron region, which are lost in the WT (Figure 5.18).
Figure 5.18: The effect of cohesin deficiency on the RUNX1 +24 enhancer-anchored chromatin hub in mid-
differentiation (12 PMA) conditions in K562 cells.
Yellow boxes represent UCSC annotated gene promoters and/or exons. Blue boxes show described RUNX1 regulators. Orange boxes annotate novel putative RUNX1 regulators.
In WT K562 cells, after differentiation (72 h PMA), the RUNX1 +24 enhancer interacts
with m-371, m-368, m-58, m+3, +62, m+110, RUNX1 P2 and exon 3. The RUNX1 +24
enhancer loses connections with +14, +54, +68, +142, RUNX1-IT1, RUNXOR, RUNX1
promoter 1, and exons 1 and 2. In STAG2-deficient cells (72 h PMA), RUNX1 +24 interacts with
m-368, m-58, +62, m+110, RUNX1 P2, RUNX1-IT1 and RUNXOR. The RUNX1 +24 enhancer loses
connections with m-371, m+3, and RUNX1 exon 3 when compared to the WT (Figure 5.19).
Figure 5.19: The effect of cohesin deficiency on the RUNX1 +24 enhancer-anchored chromatin hub in post-
differentiation (72 PMA) conditions in K562 cells.
Yellow boxes represent UCSC annotated gene promoters and/or exons. Blue boxes show described RUNX1 regulators. Orange boxes annotate novel putative RUNX1 regulators.
Figures 5.17, 5.18 and 5.19 give an overview of the changes to the WT-identified
connections mediated by the RUNX1 +24 enhancer. However, these summaries do not illustrate
the gained connections in STAG2 mutant cells. The regions that have increased interactions
with the RUNX1 +24 enhancer in STAG2 mutant cells require further analysis to determine
their functional relevance.
5.6 Chapter summary
5.6.1 RUNX1 +24 enhancer connections during K562 differentiation to
megakaryocytes
5.6.1.1 Distant connections
The RUNX1 +24 enhancer makes connections spanning chromosome 21 during
megakaryocyte differentiation. As differentiation occurs, there is a steady decrease of
interactions across the chromosome, including loss of the connection between RUNX1 +24 and
ERG P3. Because ERG expression slowly increases during differentiation, it is possible that
the interaction between the RUNX1 +24 enhancer and ERG P3 inhibits transcription from this
promoter in baseline conditions. During the onset of differentiation, the loss of the ERG P3
connection with RUNX1 +24 may in turn allow ERG P3 to initiate expression of ERG.
5.6.1.2 Connections within the RUNX1 TAD
During megakaryocyte differentiation, the RUNX1 +24 enhancer displays a dynamic
interaction landscape in WT K562 cells. Upon PMA treatment, RUNX1 +24 connects with
the possible regulatory elements -96, +14, +54, +68, +89 and +142 and the known regulator m-371.
Interactions between RUNX1 +24 and +14, +54, +68 and +142 were only observed during
differentiation (12 h PMA). All of these regions were annotated by ChromHMM as enhancers
in steady-state K562 cells. RUNX1 +24 maintains contact with a cluster of regulatory regions
m-368, m-58, m+3, +62 and m+110. Previously m-368, m-58, m+3 and m+110 had only been
described as enhancers in mice but were shown to be conserved in humans in Chapter 3.
Upon stimulation with PMA, K562 cells start to transcribe RUNX1 from the P1
promoter. P1-driven transcription of RUNX1 is essential for megakaryocyte differentiation.
Consistent with the idea that the interaction between RUNX1 +24 and RUNX1 P1 stimulates
P1-driven transcription, the connection between RUNX1 +24 and RUNX1 P1 increases during
differentiation (12 h PMA). In contrast, the RUNX1 P2 transcript is minimally expressed
throughout differentiation. The connection between RUNX1 P2 and RUNX1 +24 remains
unchanged throughout differentiation. After differentiation there is an increase in the
connection between RUNX1 +24 and m-371.
5.6.2 RUNX1 +24 enhancer connections during PMA treatment in
STAG2 mutants
5.6.2.1 Baseline conditions
In contrast to WT K562 cells, in STAG2-deficient cells the RUNX1 +24 enhancer
makes many more connections across chromosome 21. Increased connections corresponded
with regions that are annotated as insulators and poised promoters. Despite the overall increase
in interactions, some connections that are present in WT K562 cells are lost. RUNX1 +24 loses
connections with m+3, m+62, m+110, RUNXOR, RUNX1 exon 3 and super-enhancer
regulatory regions between SAMSN1 and NRIP1.
5.6.2.2 Mid-differentiation
In STAG2-deficient K562 cells, transcription of RUNX1 from the P1 promoter fails to
be activated by PMA stimulation, and instead, there is ectopic expression from the P2
promoter. From this finding (and from the baseline connectivity) the expectation was that after
12 h of PMA treatment the RUNX1 +24 enhancer would have a vastly different interaction
profile in STAG2 mutants compared to WT. Surprisingly, the RUNX1 +24 enhancer forms
similar connections after 12 h PMA treatment in both WT and STAG2-deficient cells. In STAG2
mutants, the RUNX1 +24 enhancer maintains the interactions with -96 and the SLC5A3 Exon 2 MRPS6
intron region, which are lost in the WT. Although ERG P3 is released upon PMA
stimulation in STAG2 mutant cells, just as it is in WT, expression of ERG is inappropriately
sensitised in STAG2 mutant cells. Further investigation is required to determine how ERG
becomes ectopically activated.
5.6.2.3 Post-differentiation
STAG2 mutant cells retained stem cell-like properties and failed to activate
megakaryocyte markers (Antony et al., 2020). After 72 h PMA treatment of STAG2 mutant
cells, interactions mediated by the RUNX1 +24 enhancer no longer resemble the WT profile.
The RUNX1 +24 enhancer mediates a vastly increased number of connections, and the profiles
of normalised read counts are chaotic when compared to the profiles of the WT RUNX1 +24
connections (Figures 5.7 and 5.14). After 72 h PMA treatment of STAG2 mutant cells, the
RUNX1 +24 enhancer maintains connections with RUNX1-IT1 and RUNXOR, which are lost in
WT K562. The RUNX1 +24 enhancer lost interactions with m-371, m+3, and RUNX1 exon 3 in
STAG2 mutant cells; these connections are maintained in WT K562 (Figure 5.19).
The RUNX1 +24 enhancer contacted more super-enhancer regions in STAG2 mutant
cells compared to WT K562.
6 Chapter 6: Discussions and Conclusions
Runx1 is a transcription factor that is crucial for definitive haematopoiesis (Okuda et
al., 1996; Wang et al., 1996). RUNX1 has a significant role in the development of myeloid
malignancies, especially in Acute Myeloid Leukaemia (AML) (Ichikawa et al., 2013; Okuda
et al., 1996). runx1 expression in haematopoietic progenitor cells in zebrafish is reliant on
cohesin (Horsfield et al., 2007). Cohesin mutations have been recurrently identified in AML
and other myeloid malignancies (Leeke et al., 2014). Because karyotypically normal cells are
found with cohesin mutations, the mechanism by which cohesin mutation leads to leukaemia
is thought to be by dysregulation of target genes rather than mitotic defects (Thol et al., 2014).
Cohesin insufficiency impairs differentiation, alters overall chromatin accessibility, as well as
the transcription of RUNX1 (Mazumdar et al., 2015; Mullenders et al., 2015; Viny et al., 2015).
Cohesin regulates Runx1 expression, and RUNX1 plays a central role in the development of
myeloid malignancies; therefore, cohesin-dependent RUNX1 expression may represent an
underlying mechanism that leads to myeloid malignancies (Leeke et al., 2014). The precise
spatiotemporal expression of Runx1 requires cis-regulatory elements, the majority of which
have not been defined.
In my PhD project, I set out to investigate the regulatory potential of Runx1 4C-
identified regions, and determine conservation of mouse Runx1 regulators in humans. This
work has provided an insight into mouse Runx1 enhancer functions and highlighted the features
of conserved human enhancers. Runx1 +24 was identified as an important intronic Runx1
enhancer in mice, and is crucial for the activation of Runx1 expression in hematopoietic
stem/progenitor cells (HSPCs) (Ng et al., 2010; Nottingham et al., 2007). Runx1 +24 is highly
conserved in eukaryotes, indicating an essential role in gene regulation. Runx1 +24 binds most
active Runx1 enhancers in mouse HPCs and may anchor a regulatory hub that organises the
correct enhancer-promoter interactions for Runx1 regulation. Suppression of enhancers by BET
protein inhibitors alters the expression of RUNX1 in haematopoietic cells (Antony et al., 2020).
I hypothesised that the RUNX1 +24 enhancer anchors an enhancer hub in human cells and is
important for correct RUNX1 expression, which may be disrupted in AML.
Chromatin conformation capture techniques were used in this project to understand the
role of RUNX1 +24 enhancer in 3D chromatin connections, and as a hub for functional
regulatory regions. The RUNX1 +24 enhancer connected across chromosome 21, defying
known TADs. It interacted with coding genes, non-coding RNA loci, super-enhancers and
regulatory elements, and may be maintaining these connections to prevent transcription, rather
than having a traditional enhancer role. The data suggest that the RUNX1 +24 enhancer could
be important for overall organisation and 3D structure of chromosome 21.
The results from my project highlighted the dynamic changes in the RUNX1 +24 hub
connections with RUNX1 and ERG, genes central in correct megakaryocyte development,
during differentiation of the chronic myeloid leukaemia cell line K562. Chromatin interaction
data identified putative RUNX1 regulators and demonstrated how cohesin insufficiency affects
the RUNX1 +24 enhancer connections. In cohesin-deficient cells, the number of RUNX1 +24
connections dramatically increased across chromosome 21. This suggests that cohesin STAG2
mutation leads to a de-constrained chromatin environment. Loss of insulation of chromatin
domains in STAG2 mutants enables aberrant enhancer-amplified transcription and may
contribute to the pathogenesis of AML.
6.1 Regulators of Runx1
6.1.1 Mouse Runx1 enhancers
This thesis describes the functional characterisation of novel Runx1 enhancers m-371,
m-354, m-303 and m-48, as well as the previously identified m+110, m-58, m-321, m-327, m-
368. The data show that all these elements have enhancer capability except m-327. Despite the
lack of activity seen in zebrafish embryos expressing m-327, this putative regulatory region
spatially connects to the active (P1) promoter of Runx1 and the Runx1 +24 enhancer in mouse
HPC7 cells, so it may have another regulatory role in this hub. All enhancers except m+110
interacted with the Runx1 +24 enhancer (Marsman, 2016), suggesting that the Runx1 +24
enhancer could be anchoring an enhancer hub in mouse HPC7 cells.
My analysis of these enhancers in haematopoietic tissues of developing zebrafish
embryos shows that they appear to be active during the primitive wave of haematopoiesis. A
complete summary of the spatiotemporal activity of the mouse Runx1 regulatory regions in
zebrafish is difficult to interpret, because zebrafish embryos were not continuously observed
throughout development. Some of the putative enhancer regions may drive cell-specific gene
expression at developmental periods not observed.
Recent research has demonstrated the potential of zebrafish models to assess the
function of human and mouse cis-regulatory elements, regardless of their sequence
conservation in zebrafish (Bhatia et al., 2013; Bhatia et al., 2014; Chahal et al., 2019;
Ketharnathan et al., 2018; Rainger et al., 2014; Ravi et al., 2013; Yuan et al., 2018). The
transcription factors (TFs) required by cis-regulatory elements for their function are still
present and utilised in the appropriate cell and tissue types (Bessa et al., 2009; Mann et al.,
2019). Enhancer assays highlight the value of the zebrafish model as a way to determine
functionality of regulatory sequences from other species.
6.1.2 Conserved human RUNX1 regulators
I found that 9 out of the 16 mouse Runx1 regulatory regions are conserved in humans
(based on sequence analysis and ENCODE annotations), increasing the number of ‘known’
putative human RUNX1 regulatory regions from 6 to 14 (Figure 6.1). Conservation between
mice and humans suggests that these regulatory regions are fundamental for the correct
regulation of RUNX1. m-354, m-321, m-303, m-48, m-42, m+1 and m+181 were not conserved
in humans.
Figure 6.1: Schematic overview of human RUNX1 locus (hg19: chromosome 21: 36,148,773-36,872,777).
Black text annotates the previously identified human RUNX1 regulatory regions. Blue text highlights the conserved mouse regulatory regions. Each regulatory region is annotated with orange circles, grey boxes annotate exons. The blue boxes show the exons encoding the Runt binding domain. The two RUNX1 promoters are represented by black right angled arrows.
Bioinformatic analysis of RUNX1 regulatory regions showed that not all of them appear
to be active in K562 cells (Table 6.1). ChromHMM annotates m-327 and -250 as repressed,
and m-368, m-58, m+110, -139, m-43, +24 and +62 as active enhancers. Intron 5.2, m-101 and
m+3 show chromatin features of enhancers; however, they are labelled as
transcription-associated alongside -139 and -188. Transcription-associated annotation may be
due to their close proximity to exons when compared to the regions annotated as enhancers, or
it may indicate that these regions are producing RNA, for instance enhancer RNA (eRNA).
The RUNX1 +24 enhancer has strong enhancer characteristics and recruits
haematopoietic transcription factors SPI1, TAL1, GATA1 and GATA2. SPI1 also localised to
-139 and -188. Haematopoietic transcription factors TAL1, GATA1 and GATA2 bind to
m-371, m-58 and m+110. These regions showed transcription factor binding sites and chromatin
modifications similar to those found at the RUNX1 +24 enhancer.
The RUNX1 +24 enhancer also recruits LSD1, a mediator of transcriptional repression
usually seen in silencers. Interestingly, nine other regulatory regions (m-371, m-368, -250,
m-101, m-58, m+3, +62, m+110 and intron 5.2) had LSD1 binding in K562 cells. Cheng et al.,
(2018) treated haematopoietic cells (chronic myelogenous leukaemia (K562), acute myeloid
leukaemia (OCI-AML3) and histiocytic lymphoma (U937)) with an LSD1 inhibitor and
observed upregulation of RUNX1, which implicates LSD1 in the repression of RUNX1
(Cheng et al., 2018). Kerenyi et al., (2013) previously showed that Lsd1 is a crucial epigenetic
regulator of haematopoiesis in mice, because it represses key haematopoietic stem cell genes
including Runx1 (Kerenyi et al., 2013).
Cheng et al., (2018) characterised +62 as a silencer of RUNX1 P2 by measuring the
repressive effect of various deletion/mutant +62 luciferase constructs on transcription from the
P2 promoter. This functional assay was carried out in K562, OCI-AML3 and U937 cells
(Cheng et al., 2018). However, in K562 cells +62 has ENCODE and ChromHMM annotations
associated with enhancer activity. These regions with enhancer and silencer annotations may
have multiple roles depending on combinations of connections being made with the region
(promoters and other regulatory regions) and the cell type.
ATAC-seq (Antony et al., 2020) showed that intron 5.2, m+110 and m-371 regions
were in differentially open chromatin regions in STAG2 mutant K562 cells. This suggests that
cohesin influences the accessibility of these enhancers. I observed binding of the cohesin
subunit Rad21 in the absence of CTCF at the m+110 enhancer, and CTCF binding at
m-371, m-58, m-43, m+3, +62 and intron 5.2. This suggests that CTCF and cohesin could
independently mediate a subset of enhancer-promoter looping in combination with
transcription factors.
6.1.3 G-quadruplex
The Circular Dichroism (CD) analysis showed that the predicted G-quadruplexes
observed in the mouse +24 element are capable of forming parallel G-quadruplexes in vitro.
The G-quadruplex analysis was specific to the mouse sequence so it may not reflect the human
sequence, the type of G-quadruplex, or its functional capacity. Luciferase assays showed that
the mutations of G-quadruplex in mouse Runx1 +24 had no significant effect on enhancer
function when tested in isolation in mouse fibroblast (NIH/3T3) cells.
6.1.4 Regulatory RNAs
Non-coding RNAs are involved in many biological processes, including transcription
regulation, cell-fate programming and genome 3D organisation (Beltran et al., 2019; Flynn et
al., 2014; Geisler et al., 2013; Holoch et al., 2015; Li et al., 2019; Nozawa et al., 2019; Sun et
al., 2018; Wang et al., 2017). Types of non-coding RNA include enhancer RNA (eRNA), long
non-coding RNAs (lncRNAs) and G-quadruplex RNA.
6.1.4.1 Enhancer RNA (eRNA)
eRNA has been shown to have several functional roles, including assisting enhancer-promoter
loop formation, supporting pause-release of RNA Pol II (which facilitates
transcription elongation), promoting transcription factor and co-regulator binding, and aiding
target gene transcription (Bose et al., 2017; Hou et al., 2020b; Lai et al., 2013; Lam et al.,
2013; Li et al., 2013a; Melo et al., 2013; Panigrahi et al., 2018; Rahnamoun et al., 2018;
Schaukowitch et al., 2014; Sigova et al., 2015; Tsai et al., 2018). FANTOM5 data showed ten
of the 14 RUNX1 regulatory regions (m-371, m-327, -250, -139, m-43, m+3, +24, +62, m+110
and intron 5.2) produced eRNA in haematopoietic cells. Although eRNA is typically used for
enhancer identification, not all active enhancers produce detectable eRNA (Mikhaylichenko et
al., 2018; Sartorelli et al., 2020). It remains unclear whether these enhancers produce low levels
of eRNAs that are not easily detectable with current methods. ChromHMM labelled -188, -139,
m-101, m+3 and intron 5.2 as transcription-associated. This may be attributed to enhancer
RNA transcription, as -139, m+3 and intron 5.2 all produce eRNAs. It remains to be
investigated whether the transcription association observed at -188 and m-101 may also be due
to enhancer RNA production.
6.1.4.2 Long non-coding RNAs
Long non-coding RNAs (lncRNAs) have been shown to activate or suppress
transcriptional activity by interacting with chromatin modulating proteins (Fan et al., 2015;
Han et al., 2014; Jain et al., 2016; Prensner et al., 2013; Scarola et al., 2015). RUNX1-IT1 and
RUNXOR lncRNAs have been shown to regulate RUNX1. Relatively little is known about
RUNX1-IT1, a long non-coding RNA expressed from between m+3 and RUNX1 +24.
However, Liu et al., (2020) found RUNX1-IT1 enhanced the transcription of the RUNX1 gene
in pancreatic cancer cells (Liu et al., 2020).
Previous studies show that RUNXOR, a long non-coding RNA located 3.8 kb upstream of RUNX1,
can regulate the RUNX1 P1 promoter (Nie et al., 2019; Wang et al., 2014).
RUNXOR activates P1 and causes a shift to P1 RUNX1c isoform expression. The binding of
RUNXOR lncRNA elicits DNA demethylation and induces active histone modification marks
in the P1 CpG island. Therefore, RUNXOR could be essential for the P1/P2 switch during
normal haematopoiesis and potential tumorigenesis (Nie et al., 2019).
m-371, m-368 and m-327 all sit within an uncharacterised non-coding RNA
LOC100506403. Like RUNX1-IT1 and RUNXOR, this long non-coding RNA could also play
a role in regulating RUNX1, with these regions helping to determine its expression. My 4C data
showed that in baseline conditions (12 DMSO), wild type K562 cells had a dramatic peak of
normalised reads that indicates a connection to m-368, which may indicate an essential role
for LOC100506403 in WT 12 DMSO cells.
6.1.4.3 G-Quadruplex RNAs
G-quadruplex RNA is capable of targeting and removing the Polycomb repressive
complex 2 from genes and allowing RNA-mediated regulation (Beltran et al., 2019; Wang et
al., 2017). Because part of the mouse Runx1 +24 enhancer is capable of forming a G-
quadruplex, it is possible that the Runx1 +24 enhancer RNA may also form a G-quadruplex.
Previously, Fukunaga et al., (2020) showed that G-quadruplex RNA binds the ETO protein and
dissociates the leukaemic RUNX1-ETO fusion protein from DNA (Fukunaga et al., 2020). This
suggests that some G-quadruplex RNAs may have potential as therapeutic agents for
leukaemia.
6.1.5 Could mutational load in regulatory elements also contribute to
disease progression?
Single nucleotide polymorphisms (SNPs) are present in each of the human RUNX1 conserved
regulatory regions, except +24 and m-101. m-371, m-368, m-58, m+110 and intron 5.2 SNPs
were reported in haematopoietic, brain, thyroid, skin and oesophagus cells. SNP frequency
could be an underestimate because of the scarcity of whole-genome sequence data from individual
haematopoietic cell types, AML, and other myeloid malignancy samples; this limits the ability
to identify significant SNPs and mutations in haematopoietic cells. The GTEx portal annotates
SNPs with MAF >1%, as well as eQTL outputs. The GTEx portal covers 54 tissues sampled
from healthy older individuals, and measures only whole blood rather than individual blood cell
types, so cell type- or disease-specific effects may not be detected.
There were no SNPs with MAF of >1% present in the RUNX1 +24 enhancer. This
suggests that there is strong selection pressure against mutation of the RUNX1 +24 enhancer.
Consistent with this, functional studies suggest that the region containing RUNX1 +24 enhancer
is functionally important in haematopoietic cells. To determine the importance of the RUNX1
+24 enhancer for RUNX1 expression and function, Mill et al., (2019) used CRISPR/Cas9 to delete
the whole intron between RUNX1 P1 and P2 in OCI-AML5 cells (Mill et al., 2019). The majority
of the edited cells were eradicated via apoptosis, and the viable edited cells had significantly
decreased RUNX1 expression and slower growth (Mill et al., 2019). It is not clear whether the
observed phenotypes correspond uniquely to deletion of the RUNX1 +24 enhancer since the
entire P1-P2 intron was removed. Rather, this research shows the P1-P2 intronic region, which
contains multiple RUNX1 regulatory elements, is important for cell survival and RUNX1
expression.
Of the 35 SNPs identified in the conserved RUNX1 regulatory regions, 6 had previously
reported functional effects in haematopoietic cells. m-371 contains rs2834945, a T to C change
that is predicted to affect 11 regulatory motifs including GATA2. rs2834945 is related to
expression of Kynurenine 3-monooxygenase (KMO) in peripheral blood monocytes (Zeller et
al., 2010). KMO encodes a mitochondrial outer membrane protein that catalyses the
hydroxylation of the L-tryptophan metabolite L-kynurenine to form L-3-hydroxykynurenine
(Alberati-Giani et al., 1997). KMO is typically investigated in a neurological context, as high
KMO expression leads to neurodegeneration (Breton et al., 2000).
m-368 harbours two functional SNPs: rs16993221 and rs909143. rs16993221 is associated
with white blood cell count, a crucial marker of chronic inflammation (Kong
et al., 2013). The A to T change in rs16993221 alters two regulatory motifs, BATF and IRF,
which work together in immune response. The A to G change in rs909143 is also related to
expression of KMO in peripheral blood monocytes (Zeller et al., 2010) and is predicted to
affect two regulatory motifs (NR3C1, POU2F2).
Within m+110 sits rs73900579, a T to C change that alters two regulatory motifs and
is related to red cell distribution width (Kichaev et al., 2019).
Intron 5.2 contains two functional SNPs, rs2249650 and rs2268276; both are associated
with AML susceptibility. The two SNPs are in linkage disequilibrium (LD) and change
the ability of intron 5.2 to act as an enhancer. The allele combinations A (rs2249650) G
(rs2268276) and AA have more enhancer capability than GA and GG; GA has the lowest enhancer
capability. The SNPs also affect SPI1 binding capability; AG has strong SPI1 binding, GA has
weak SPI1 binding, whereas AA and GG have medium SPI1 binding capability (Xu et al., 2016).
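The qualitative allele-combination effects described above can be collected into a small lookup table. The Python sketch below is illustrative only: the labels are a paraphrase of the text (no measured values from Xu et al., 2016 are encoded), and the names `HAPLOTYPE_EFFECTS` and `combinations_with` are hypothetical.

```python
# Allele combinations are written as (rs2249650, rs2268276), as in the text.
# Labels are qualitative summaries only; they are an assumed encoding, not data.
HAPLOTYPE_EFFECTS = {
    "AG": {"enhancer": "higher", "spi1_binding": "strong"},
    "AA": {"enhancer": "higher", "spi1_binding": "medium"},
    "GG": {"enhancer": "lower", "spi1_binding": "medium"},
    "GA": {"enhancer": "lowest", "spi1_binding": "weak"},
}


def combinations_with(feature, label):
    """Return the allele combinations whose `feature` carries the given label."""
    return sorted(
        combo
        for combo, effects in HAPLOTYPE_EFFECTS.items()
        if effects[feature] == label
    )


medium_binders = combinations_with("spi1_binding", "medium")
lowest_enhancer = combinations_with("enhancer", "lowest")
```

Laid out this way, it is easier to see that enhancer capability and SPI1 binding do not rank the four combinations identically (AA has higher enhancer capability but only medium SPI1 binding).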
Although not all SNPs found in this project have been functionally analysed, some of
the regulatory region SNPs were reported in patients with acute myeloid leukaemia. Two AML
patients had -250 SNP rs2834885, and two SNPs in +62 (rs933131 and rs2834716) were also
found in patients with AML. Interestingly, the patient with +62 rs2834716 also had the intron
5.2 SNPs (rs2268276 and rs2249650) which were previously associated with AML
susceptibility (Xu et al., 2016).
Mutations within regulatory regions m-58 and +24 that are not reported as SNPs with a MAF of
>1% were found in five patients with malignant lymphoma. Four patients had m-58
mutations chr21:g.36480715C>A (rs199811665), chr21:g.36480761A>C and
chr21:g.36481470C>T (rs1440069314). One case of malignant myeloma had a mutation
within +24, chr21:g.36399191A>T.
The available AML genomic data are relatively limited because the majority of analyses
were performed with exome sequencing data, rather than whole genome sequencing data. Lack
of whole genome data and individual SNP significance could lead to an underestimate of the
functional contribution of enhancer-region mutations to myeloproliferative disorders.
The analysis by Xu et al., (2016), which showed that intron 5.2 SNPs are functionally
relevant to leukaemia, suggests a need to analyse each of the SNPs within regulatory regions
for functional capability. Determining and categorising SNPs may help us understand each
patient’s disease progression, individual responses to drugs, or susceptibility to relapse. Some
patients had SNPs with previously described functional effects on haematopoietic cells.
However, the SNP data for myeloproliferative disorders lack significance because the
majority of SNP eQTL information comes from analysis of whole blood, rather than each
specific cell type.
There is good evidence suggesting that RUNX1 regulatory regions intron 5.2 and +62
are directly involved in disease progression, so further investigation of these regulatory regions
is necessary. Intron 5.2 contains SNPs that are associated with AML susceptibility, and in
addition, this regulatory region is the site of the t(8;21) translocation found in AML
patients, in which the RUNX1 and ETO genes recombine. Schnake et al., (2019) identified a long
non-coding RNA in intron 5.2, at a site of these frequent chromosomal translocations. This long
non-coding RNA results in a more relaxed chromatin organisation at intron 5.2 (at the location
of the breakpoint). Consequently, it is possible that this feature increases the
probability of double-strand breaks in myeloid cells, resulting in translocations (Schnake
et al., 2019).
The discovery of RUNX1 silencer +62 resulted from characterisation of a novel
t(5;21)(q13;q22) translocation involving RUNX1 that was acquired during the progression of
myelodysplastic syndrome to AML in a paediatric patient (Cheng et al., 2018). This
translocation did not generate a RUNX1 fusion, but rather it aberrantly upregulated the P2
isoform of RUNX1. The authors state that the translocation facilitated upregulation of RUNX1
P2 because the silencer at +62 was removed.
Several groups have described chromatin rearrangements that lead to “enhancer
hijacking”, whereby enhancers from somewhere else in the genome are rearranged next to
genes leading to overexpression (Bohlander, 2000; Gröschel et al., 2014; Look, 1997;
Northcott et al., 2014; Peifer et al., 2015; Taub et al., 1982; Zhang et al., 2020). Taub et al.,
(1982) described t(8;14) translocations in Burkitt's lymphoma which alter gene expression
through enhancer hijacking (Taub et al., 1982). Rawat et al., (2004) showed that in the t(12;13)
translocation, overexpression of the CDX2 gene is leukaemogenic rather than the ETV6-CDX2
fusion (Rawat et al., 2004).
The majority of translocations occur between RUNX1 P1 and P2; while the disruption
caused to enhancers has not yet been carefully categorised, these may be examples of enhancer
hijacking.
Table 6.1: Summary of conserved RUNX1 regulatory region analysis

| Regulatory region | ChromHMM annotation in K562 cells | FANTOM5 eRNA | Functional enhancer | Cohesin binding K562 | CTCF binding K562 | Accessibility of chromatin in STAG2 mutant K562 | SNPs investigated | Functional SNPs | Mutations seen in haematopoietic cells in cancer patients |
|---|---|---|---|---|---|---|---|---|---|
| m-371 | Strong enhancer | ✓ | ✓ | ✗ | ✓ | Increased | 2 | 1 | ✗ |
| m-368 | Strong enhancer | ✗ | ✓ | ✗ | ✗ | No change | 3 | 2 | ✓ |
| m-327 | Repressed | ✓ | ✗ | ✗ | ✗ | No change | 2 | 0 | ✗ |
| -250 | Repressed | ✓ | ✓ | ✗ | ✗ | No change | 3 | 0 | ✓ |
| -188 | Transcription associated | ✗ | ✓ | ✗ | ✗ | No change | 1 | 0 | ✗ |
| -139 | Weak enhancer, transcription associated | ✓ | ✓ | ✗ | ✗ | No change | 3 | 0 | ✗ |
| m-101 | Transcription associated | ✗ | No available data | ✗ | ✗ | No change | 0 | 0 | ✗ |
| m-58 | Strong enhancer | ✗ | ✓ | ✗ | ✓ | No change | 13 | 0 | ✓ |
| m-43 | Weak enhancer | ✓ | No available data | ✗ | ✓ | No change | 2 | 0 | ✗ |
| m+3 | Transcription associated | ✓ | No available data | ✗ | ✓ | No change | 1 | 0 | ✗ |
| +24 | Strong enhancer | ✓ | ✓ | ✓ | ✓ | No change | 0 | 0 | ✓ |
| +62 | Weak enhancer | ✓ | No available data | ✗ | ✓ | No change | 2 | 0 | ✓ |
| m+110 | Strong enhancer | ✓ | ✓ | ✓ | ✗ | Increased | 2 | 1 | ✓ |
| Intron 5.2 containing m+204 | Transcription associated | ✓ | ✓ | ✗ | ✓ | Increased | 3 | 2 | ✓ |
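The cohesin and CTCF columns of Table 6.1 can be cross-tabulated to show which regions carry one factor without the other. The following Python sketch is only an illustration of that reading of the table: the `Region` type and the `select` helper are hypothetical names, and only two of the table's columns are encoded.

```python
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    cohesin: bool  # "Cohesin binding K562" column of Table 6.1
    ctcf: bool     # "CTCF binding K562" column of Table 6.1


# Values transcribed from Table 6.1 (True = tick, False = cross).
REGIONS = [
    Region("m-371", False, True),
    Region("m-368", False, False),
    Region("m-327", False, False),
    Region("-250", False, False),
    Region("-188", False, False),
    Region("-139", False, False),
    Region("m-101", False, False),
    Region("m-58", False, True),
    Region("m-43", False, True),
    Region("m+3", False, True),
    Region("+24", True, True),
    Region("+62", False, True),
    Region("m+110", True, False),
    Region("intron 5.2", False, True),
]


def select(cohesin, ctcf):
    """Names of regions with the requested cohesin/CTCF binding pattern."""
    return [r.name for r in REGIONS if r.cohesin == cohesin and r.ctcf == ctcf]


cohesin_without_ctcf = select(True, False)
ctcf_without_cohesin = select(False, True)
cohesin_and_ctcf = select(True, True)
```

Under this encoding, m+110 is the only region with cohesin but no CTCF, and +24 is the only region bound by both, which matches the pattern discussed in Section 6.1.2.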
6.2 Organisational role of the RUNX1 +24 enhancer
The mouse Runx1 +24 enhancer binds most other active Runx1 enhancers in murine
HPCs, and enhancer suppression in K562 results in altered RUNX1 expression. Therefore, I
hypothesised that the RUNX1 +24 enhancer organises a chromatin hub in human cells that is
crucial for correct RUNX1 expression. Circular chromosome conformation capture (4C) is a
technique that can be used to determine 3D chromatin interactions from a given viewpoint, or
‘bait’ (van de Werken et al., 2012a). I used the RUNX1 +24 enhancer as ‘bait’ in a 4C
experiment to determine the interacting DNA partners of this enhancer in K562 cells.
The majority of RUNX1 +24 interactions were with other regulatory regions that
include both active and inactive chromatin. The connections between the RUNX1 +24 enhancer
and other genes were predominantly with loci encoding non-coding RNAs.
Within the RUNX1 TAD, the RUNX1 +24 enhancer connects with other RUNX1
regulatory regions (m-368, m-101, m-58, and m+110) that had previously only been described
in mice, but have been shown in this thesis to be conserved in humans. RUNX1 +24 connected
to only one RUNX1 promoter (P2) and the previously identified +62 enhancer. RUNX1 +24
does not connect to all RUNX1 regulatory regions, consistent with the idea that different
enhancer recruitment occurs in diverse cell types to generate different RUNX1 isoforms. Some
novel interactions were also identified: RUNXOR, ERG P3 and putative regulatory regions -96,
+89, SLC5A3 Exon 2 MRPS6 intron and a super-enhancer region between SAMSN1 and
NRIP1.
My 4C analysis shows that in K562 cells, the RUNX1 +24 enhancer makes multiple
connections within the RUNX1 TAD (‘near-bait’) and further afield on chromosome 21 (‘cis’).
The chromatin conformation methods, Hi-C and promoter Capture-C, show that gene
promoters usually form interactions within their own TADs. Due to the high resolution of 4C
interactions, it is apparent that the RUNX1 +24 enhancer was not constrained to within-TAD
interactions, but also formed connections across the whole of chromosome 21. These
interactions outside of the TAD also altered with differentiation, so they are unlikely to be
artefacts of the 4C technique. This surprising finding suggests that the RUNX1 +24 enhancer may
organise a chromatin hub in K562 cells that involves much of chromosome 21, and that this hub
could help to ensure correct regulation of transcription of several chromosome 21-linked loci
in K562 cells.
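The distinction drawn above between ‘near-bait’ interactions (within the RUNX1 TAD) and ‘cis’ interactions (elsewhere on chromosome 21) amounts to a simple interval test. The Python sketch below illustrates that classification under stated assumptions: the function name `classify_interactions` is hypothetical, the coordinates are made-up placeholders rather than values from the 4C data, and real 4C pipelines may define the near-bait window differently.

```python
def classify_interactions(peaks, tad_start, tad_end):
    """Split 4C peak positions (bp, on the bait chromosome) into two groups.

    'near_bait': peaks inside the TAD that contains the bait.
    'cis': peaks on the same chromosome but outside that TAD.
    """
    result = {"near_bait": [], "cis": []}
    for position in peaks:
        key = "near_bait" if tad_start <= position <= tad_end else "cis"
        result[key].append(position)
    return result


# Placeholder coordinates for illustration only (not taken from the thesis data).
example = classify_interactions([150, 420, 900, 2500], tad_start=100, tad_end=1000)
```

Comparing the size of the `cis` group between conditions is one simple way to express whether a viewpoint stays constrained within its TAD.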
6.3 RUNX1 +24 enhancer connections during megakaryocyte
differentiation
My 4C analysis in K562 cells shows that the RUNX1 +24 enhancer makes multiple
connections across chromosome 21. During megakaryocyte differentiation, transcription
factors are induced to initiate gene expression consistent with megakaryocyte development. Do
these transcription factors alter connections made by the RUNX1 +24 enhancer? To determine
differences in connections, I used the RUNX1 +24 enhancer as ‘bait’ in a 4C experiment in
K562 cells undergoing PMA-induced differentiation, sampled at mid-point (12 h) and end-
point (72 h) stages. I found that as differentiation occurs, there is a steady decrease in
interactions mediated by the RUNX1 +24 enhancer across chromosome 21, including a
decrease in strength of connection between RUNX1 +24 and ERG P3.
ERG expression slowly increases during megakaryocyte differentiation. ERG P3 is
constrained in the baseline K562 cells (12 DMSO) and shows insulator annotations. Following
PMA treatment, ERG P3 has a weakened connection with RUNX1 +24. This may be to allow
ERG P3 to activate and facilitate the expression of ERG. In steady-state K562 cells, the RUNX1
+24 enhancer could be safeguarding chromatin structure, including the insulation of regions to
prevent transcription e.g., at ERG P3 or at loci encoding non-coding RNA. Once the cell has
begun the differentiation process, RUNX1 +24 “releases” these regions to allow for their
domain specific regulation and/or expression as the cells differentiate to megakaryocytes.
The dynamic interaction landscape of the RUNX1 +24 enhancer during differentiation
of K562 cells highlights the need to further understand the roles of regulatory regions in
different cell types. During PMA treatment, the RUNX1 +24 enhancer forms new connections
with possible regulatory elements +14, +54, +68, +142 and known regulator m-371.
Interactions between the RUNX1 +24 enhancer and +14, +54, +68, +142 and RUNX1-IT1
were observed only during differentiation (12 h PMA), and not
in steady-state K562 cells. Therefore, these regulatory regions may be important for RUNX1
regulation during megakaryocyte differentiation. All of these regions were annotated by
ChromHMM as enhancers in steady-state K562 cells, but this chromatin signature may not
reflect the nature of these elements during differentiation. The RUNX1 +24 enhancer maintains
existing contacts with regulatory regions m-368, m-58, m+3, +62 and m+110 throughout
differentiation.
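The gained and maintained contacts described in this paragraph can be expressed as set comparisons between conditions. The Python sketch below uses only the contacts named in the paragraph above, not the full 4C peak lists, so the sets are deliberately incomplete; the function name `compare_contacts` is hypothetical.

```python
# Partial contact sets: only regions named in the preceding paragraph are included.
steady_state = {"m-368", "m-58", "m+3", "+62", "m+110"}
pma_12h = steady_state | {"+14", "+54", "+68", "+142", "RUNX1-IT1"}


def compare_contacts(before, after):
    """Classify contacts as gained, lost or maintained between two conditions."""
    return {
        "gained": after - before,
        "lost": before - after,
        "maintained": before & after,
    }


changes = compare_contacts(steady_state, pma_12h)
```

The same comparison could be repeated for each pair of time points (for example 12 h versus 72 h PMA) once complete peak lists are available.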
The RUNX1 P2 transcript is minimally expressed throughout differentiation. The
connection between RUNX1 P2 and the RUNX1 +24 enhancer remains unchanged throughout
differentiation. Previously, +62 was shown to be a silencer of RUNX1 P2. RUNX1 +24 could
act to mediate the silencer interaction between P2 and +62. In contrast, interaction between the
RUNX1 +24 enhancer and RUNX1 P1 is increased during differentiation (12 h PMA).
Transcription of RUNX1 from its P1 promoter is crucial for megakaryocyte differentiation
(Draper et al., 2017). Consistent with this, K562 cells had a gradual increase in transcription
of the P1 isoform of RUNX1 during PMA-induced differentiation (Antony et al., 2020).
In the presence of enhancer suppressor JQ1, RUNX1 P1 expression is suppressed,
suggesting that P1 expression in these cells is enhancer-driven (Antony et al., 2020). Since the
RUNX1 +24 enhancer recruits P1 during differentiation, its enhancer activity could be the target
of JQ1-mediated suppression of RUNX1 transcription.
Connections between the RUNX1 +24 enhancer and RUNX1 P1, RUNXOR and RUNX1-IT1 (the latter
two being known RUNX1 regulatory RNAs) all decrease after differentiation. I speculate that,
once the RUNX1 P1 transcript has been activated during differentiation, the RUNX1 +24
enhancer releases connections to regions that are crucial for the RUNX1 P1 switch because
they are no longer required.
After differentiation (72 h PMA), there is an increase in frequency of the connection
between RUNX1 +24 and m-371. This region was shown to act as a primitive haematopoietic-
specific enhancer in developing zebrafish (Chapter 3) and activates RUNX1 P1 in developing
mice (Zhu et al., 2020). The increase in connection of the RUNX1 +24 enhancer with m-371
in differentiated K562 megakaryocyte cells may indicate a role for m-371 in megakaryocyte
identity. m-371 also sits within LOC100506403 – this long non-coding RNA could also play a
similar role to RUNX1-IT1 and RUNXOR in regulating RUNX1, with the connection to m-371
helping to determine its expression.
Although the RUNX1 promoters dynamically interact with the RUNX1 +24 enhancer,
the 4C analysis is unable to determine what other regulatory elements the promoters connect
to. This is because the 4C analysis used only the bait within the RUNX1 +24 enhancer as the
viewpoint for 3D interactions. The combination of regulators used to control transcription of
RUNX1 appears to be very dynamic, and also appears to be cell type-dependent. For example,
despite previous evidence of intron 5.2 and the -250, -188 and -139 cluster acting as RUNX1
regulators, these regions did not interact with the RUNX1 +24 enhancer upon megakaryocyte
differentiation. They could instead interact directly with the promoter(s) or act in another cell
type.
Vukadin et al., (2020) describe SON protein as a negative regulator of RUNX1 and
inhibitor of megakaryocyte differentiation in chronic myeloid leukaemia cells. They showed
that SON obstructs megakaryocytic differentiation by directly binding to RUNX1 P2 and two
enhancer regions (the RUNX1 +24 enhancer and a “novel” +139 enhancer) (Vukadin et al.,
2020). +139 overlaps with m+110, suggesting that they represent the same human RUNX1
enhancer. This research supports the idea that the maintained connection of m+110/+139 with
the RUNX1 +24 enhancer during megakaryocyte differentiation is crucial for correct RUNX1
expression.
6.4 Cohesin/STAG2 dependency of RUNX1 +24 enhancer
connections in K562 cells
The cohesin complex mediates tissue-specific enhancer-promoter and enhancer-
enhancer communication (Canela et al., 2017; Phillips-Cremins et al., 2013; Strom et al.,
2012). Furthermore, evidence from our group shows convincingly that cohesin is required for
the normal transcription of Runx1 in developmental contexts (Antony et al., 2020; Horsfield et
al., 2007; Marsman et al., 2014). However, the exact mechanism by which cohesin regulates
Runx1 transcription is still unknown. Cohesin may influence the connections
between the Runx1 gene and its regulatory elements, and regulate Runx1 transcription in this
manner. To uncover potential mechanisms, I sought to determine how cohesin STAG2 mutation
affects connections formed by the RUNX1 +24 enhancer in steady-state K562 cells, and those
responding to a signal to undergo megakaryocyte differentiation.
In steady-state K562 cells, STAG2 mutation dramatically increased the number of
connections formed by the RUNX1 +24 enhancer in comparison to unedited wild type cells.
The new connections formed by the RUNX1 +24 enhancer in STAG2-deficient cells included
regions annotated by ChromHMM and dbSUPER as insulators, poised promoters and super-
enhancers; however, it is possible that these regions do not have the same chromatin features
in STAG2 mutant cells as they do in wild type. STAG2 mutant K562 cells have differentially
altered chromatin accessibility, including increased accessibility at super-enhancers and
decreased accessibility at CTCF sites (Antony et al., 2020).
ChromHMM annotates a region as an insulator based on a high frequency of CTCF
protein binding sites, CTCF motifs, and/or a higher frequency of H3K4me1 (Ernst et al., 2010).
CTCF sites are frequently found in TAD boundary regions or within super-enhancers (Shin,
2019). The STAG2 paralogue, STAG1, preferentially locates to CTCF sites and TAD
boundaries (Arruda et al., 2020), while STAG2 is normally found at enhancers, promoters, and
Polycomb-repressed domains (Arruda et al., 2020). Absence of STAG2 is tolerated because
STAG1 can substitute for many of STAG2’s functions in the cohesin complex (Benedetti et
al., 2017; Van Der Lelij et al., 2017). Increase in interactions of the RUNX1 +24 enhancer with
insulators and super-enhancers upon STAG2-deficiency raises the possibility that STAG1-
cohesin mediates new contacts at CTCF sites.
Chromatin organisation relies on two mechanisms: compartmentalisation and
chromatin loop extrusion, which are often considered to be counteracting each other (Nuebler
et al., 2018). In the chromatin loop extrusion model, cohesin and CTCF form stable loops that
act as functional barriers between active and inactive chromatin. Loop extrusion by cohesin is
crucial for the formation of TADs, and correct transcription relies on the insulated
neighbourhoods of TADs, including the constraint of enhancers to their gene targets (Dixon et
al., 2012; Nuebler et al., 2018; Rao et al., 2014). Compartmentalisation describes the process
in which proteins (including chromatin proteins) self-organise into phase-separated
condensates and act as membrane-less organelles that cluster specific molecules.
Co-localisation of transcription factors or polycomb group proteins is frequently observed,
suggesting that like attracts like (Hyman et al., 2014; Shin et al., 2018; Stadhouders et al., 2019).
These membrane-less organelles have been suggested to carry out crucial roles in a range of
cellular processes (Boeynaems et al., 2018).
I hypothesise that increased interactions in STAG2-deficient K562 cells reflect a
scenario where compromised loop extrusion may not properly balance compartmentalisation.
Without STAG2, cohesin’s functions in domain organisation are compromised. As a result,
TAD boundaries may be destabilised, allowing the RUNX1 +24 enhancer to connect more
broadly across chromosome 21. Upon STAG2 depletion, compartmentalisation may become the dominant
mechanism organising chromatin, which may explain the increase in connections from the
RUNX1 +24 enhancer to super-enhancers.
Super-enhancers are described as regions that specify cell identity and are densely bound
by mediators, chromatin factors and lineage-regulating master TFs, such as GATA1 and TAL1
in K562 cells (Hnisz et al., 2013; Huang et al., 2018; Khan et al., 2018; Parker et al., 2013;
Wang et al., 2019; Whyte et al., 2013). Super-enhancers show a significantly higher
frequency of chromatin interactions than regular enhancers (Huang et al., 2018; Wang
et al., 2019). The RUNX1 +24 enhancer shares these characteristics, which suggests that it acts
as a super-enhancer in K562 cells.
Wang et al. (2019) highlighted the capability of super-enhancers to attract the formation of
phase-separated condensates, and their capacity to generate 3D genome structures that specifically
initiate transcription of the super-enhancer’s target genes. Sabari et al. (2018) provided
evidence that transcriptional coactivators such as BRD4 are components of phase-separated
condensates, and that these condensates co-localise with super-enhancers.
The BRD4 recruitment required for transcription is known to occur independently of
enhancer-promoter connections (Crump et al., 2021). Together, these studies provide an explanation for
the surprisingly similar connections that form during differentiation (12 PMA) in STAG2
mutant and wild-type cells.
The transcription factors that respond to PMA induction may be able to mediate correct
RUNX1 +24 connections regardless of cohesin deficiency. Transcription-specific 3D structure
could reflect the formation of phase-separated condensates, which are used to specifically initiate
transcription of the super-enhancer’s target genes. However, when transcription induction stops (72 PMA),
connections revert to being abnormal. This is consistent with the observations of Antony et al.
(2020), who found that using the BET inhibitor JQ1 to reduce BRD4 binding and dampen
super-enhancer-driven transcription prevented the aberrant signal-induced transcription of
RUNX1 and ERG in STAG2 mutant cells.
6.4.1 K562 Limitations
This thesis used K562 cells to determine RUNX1 +24 connectivity. K562 is a chronic
myelogenous leukaemia (CML) cell line that expresses the BCR-ABL fusion gene
(McGahon et al., 1997). Previous research has highlighted that cohesin mutations occur early
in leukaemogenesis and mainly arise in pre-leukaemic cells (CGARN, 2013; Corces et al.,
2017; Kon et al., 2013; Thol et al., 2014; Welch et al., 2012). The CRISPR/Cas9-edited STAG2
mutation is an additional mutation that was introduced after leukaemogenesis; as such, the
changes observed may be due to a compounding effect of the existing K562 mutational profile.
Cohesin mutations are more frequently observed in AML than in CML samples, so the interactions
observed in K562 cells may not reflect those in AML samples.
Observing RUNX1 +24 connections in other cell lines would be needed to see which of
the extensive connections are replicated. However, the K562 and STAG2 mutant K562 cell
lines were effective for understanding dynamic 3D connections during differentiation, and
highlighting possible cis-regulatory regions.
6.5 Future directions
The hypothesis of my study was that Runx1 expression is regulated by regulatory
elements that connect with super-enhancer Runx1 +24, and that these connections are
dependent on cohesin. Although my results did identify some functional enhancers and their
conservation in the human genome, it would be worthwhile investigating the other functional roles
of these regions, their combinatorial effects, and which other cell types utilise these regions.
Some putative regulatory regions had dynamic interactions with the RUNX1 +24 enhancer
depending on the stage of differentiation and presence of STAG2-cohesin. However, these
regions have no functional data so require further analysis.
6.5.1 RUNX1 +24 G-quadruplex
The Runx1 +24 G-quadruplex analysis was specific to the mouse sequence so it may
not reflect the human sequence, the type of G-quadruplex, or its functional capacity. The CD
analysis should be repeated with the human RUNX1 +24 G-quadruplexes.
G-quadruplexes can be selectively targeted by small-molecule ligands that can disrupt
translation (Bugaut et al., 2012). This has initiated a new approach in RNA-directed drug
design. If the G-quadruplex is present and functional in the human +24 region, further research
is required to validate it as a drug target. If there is dysregulation of the RUNX1 +24 enhancer
region, targeting of the G-quadruplexes may prove effective.
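As a first in silico step before repeating the CD analysis on the human sequence, candidate G-quadruplex motifs could be located with a simple pattern search. The sketch below is illustrative only. It assumes the commonly used heuristic of four runs of at least three guanines separated by loops of one to seven nucleotides, and it scans both strands. The function name find_pqs and the test sequence are hypothetical and are not taken from the RUNX1 +24 region. A match indicates only the potential to fold, not that a quadruplex forms or is functional.

```python
import re

# Heuristic putative quadruplex sequence (PQS) pattern (an assumption of this
# sketch): four G-runs of >= 3 bases separated by loops of 1-7 nucleotides.
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_pqs(seq):
    """Find non-overlapping putative G4 motifs on both strands.

    Returns a list of (strand, start, end, motif) tuples. Coordinates are
    0-based, end-exclusive, and always given relative to the input (+) strand.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for match in G4_PATTERN.finditer(seq):
        hits.append(("+", match.start(), match.end(), match.group(0)))
    # A G-rich motif on the minus strand appears as a C-rich run on the plus
    # strand, so search the reverse complement and convert the coordinates.
    for match in G4_PATTERN.finditer(reverse_complement(seq)):
        hits.append(("-", n - match.end(), n - match.start(), match.group(0)))
    return hits


if __name__ == "__main__":
    # Invented example sequence, not the RUNX1 +24 enhancer.
    for hit in find_pqs("TTGGGAGGGTGGGAGGGTT"):
        print(hit)
```

Any motif found this way would still need biophysical confirmation, for example by the CD analysis described above.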
6.5.2 Functional analysis of RUNX1 regulatory regions
I only tested the functional enhancer capability of mouse Runx1 regulatory regions
identified in HPC7 cells by 4C. However, it would be worthwhile to further characterise the
enhancer capabilities of the previously identified RUNX1 regulators (-250, -188, -139, m-101,
m-43 and +62) and the 4C-identified regions (-96, +14, +54, +68, +89, +142, SLC5A3 Exon 2 MRPS6
intron, and the super-enhancer region between SAMSN1 and NRIP1).
When determining the enhancer potential of regulatory regions, it would be useful to
investigate other developmental time-points in more detail to analyse the different waves of
haematopoiesis. In zebrafish, primitive haematopoiesis occurs at 12 hpf; definitive
haematopoiesis begins first in the ventral wall of the dorsal aorta from 36 hpf, then in the caudal
haematopoietic tissue at 2 dpf, and finally in the kidney marrow at 4 dpf. Viewing embryos during
these important time-points may provide further insight into Runx1 regulation. A semi-automated
imaging technique could be used to efficiently analyse more time-points (Romano
et al., 2014).
To determine the cell type-specific enhancer properties, an alternative cell-specific
enhancer reporter without a gata2 promoter could be used, so that the plasmids can be
successfully integrated into F1 lines. As the enhancer reporter plasmids contain fluorescent
markers, fluorescence-activated cell sorting (FACS) can be used to find which zebrafish
haematopoietic cells show GFP expression driven by the putative enhancers. This method
can be used to determine which haematopoietic cells have transcriptional machinery that
recognises the enhancers, based upon the fluorescent characteristics of each cell.
Expression patterns of these enhancers show no distinct patterns of collaboration, or
independence of roles between the enhancers. Enhancers could be working in combinations or
may be functionally redundant. It would be interesting to analyse enhancer function further
with combinations of candidates and/or CRISPR/Cas9 edited candidates in reporter assays.
Interestingly, regulatory regions (m-368, -250, m-101, -96, m-58, m+3, +14, +54, +62,
+68, +89, m+110, +142, intron 5.2 and SLC5A3 Exon 2 MRPS6 intron) showed histone
demethylase LSD1 binding in K562 cells, indicating they have potential to be epigenetically
regulated (repressed or activated). I have tested their enhancer potential, but it would be useful
to functionally test all regions for silencer activity to provide further clarity about the
regulatory capability of these regions. Silencer activity could be determined by measuring the
repressive effect of various deletion/mutant regulatory region luciferase constructs on RUNX1
promoters. Silencer activity could also be measured by using an insulator assay vector (INS)
in zebrafish; this vector expresses GFP across somites as an internal control, and GFP
expression varies when an insulator is cloned into the vector (Bessa et al., 2009; Marsman et
al., 2014).
6.5.3 RUNX1 +24 connections with genes
Although RUNX1 P1 and P2 promoters dynamically interact with the RUNX1 +24
enhancer, the 4C analysis could not determine what other DNA regions the P1 and P2
promoters independently connect to during megakaryocyte differentiation. Another 4C that
uses RUNX1 P1 and P2 as baits throughout differentiation would determine what connections
these promoters make independently of the RUNX1 +24 enhancer.
Future RNA-seq to determine gene expression levels during megakaryocyte
differentiation would be informative, and subsequent integration with the current 4C data would
be useful for determining whether the RUNX1 +24 enhancer connections to genes are the
scaffolds for transcription.
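One simple way to approach such an integration is sketched below. It is a minimal illustration under stated assumptions, not an analysis performed in this thesis: it assumes that each gene has a normalised 4C contact signal and a normalised expression value at two time-points, and it labels a gene as concordant when contact and expression change in the same direction by at least two-fold. The function names, the pseudocount, the threshold and the gene labels are all hypothetical.

```python
import math


def log2_fold_change(before, after, pseudocount=1.0):
    """log2 ratio of after/before; the pseudocount avoids division by zero."""
    return math.log2((after + pseudocount) / (before + pseudocount))


def classify_genes(contacts, expression, threshold=1.0):
    """Compare the direction of change in 4C contact and in expression.

    contacts and expression map gene -> (value_before, value_after).
    Only genes present in both inputs are classified. A gene is 'unchanged'
    if either absolute log2 fold change is below the threshold.
    """
    labels = {}
    for gene in sorted(contacts.keys() & expression.keys()):
        contact_change = log2_fold_change(*contacts[gene])
        expression_change = log2_fold_change(*expression[gene])
        if abs(contact_change) < threshold or abs(expression_change) < threshold:
            labels[gene] = "unchanged"
        elif (contact_change > 0) == (expression_change > 0):
            labels[gene] = "concordant"
        else:
            labels[gene] = "discordant"
    return labels


if __name__ == "__main__":
    # Invented placeholder values, not measurements from this study.
    contacts = {"geneA": (10, 50), "geneB": (40, 5), "geneC": (20, 22)}
    expression = {"geneA": (5, 40), "geneB": (3, 30), "geneC": (100, 400)}
    print(classify_genes(contacts, expression))
```

Concordant genes would be candidates for connections that act as transcriptional scaffolds, whereas discordant genes could point to connections that restrain transcription. Either interpretation would require functional follow-up.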
The connection and subsequent release of silenced ERG P3 may be crucial for correct
ERG transcription regulation. The role of the P3 promoter in ERG expression and isoform-
specific transcription could be analysed further to understand the dynamic role of the RUNX1
+24 enhancer in ERG regulation.
6.5.4 Roles of non-coding RNA in RUNX1 regulation
It is possible that non-coding RNA elements including long non-coding RNA, eRNA,
and G-quadruplex RNA could influence RUNX1 regulation. However, more research is
required to determine the individual functional contributions of these RNAs in RUNX1
expression.
Chapter 3 described evidence that eRNA is expressed at most RUNX1 regulatory
regions. It would be informative to analyse the functional capability of these eRNAs, their
interactions, and their influence on RUNX1 regulation. It would be especially exciting to further
investigate the possible functional roles of the RUNX1 +24 enhancer eRNA and possible +24
G-quadruplex RNA. This could be done using precision run-on sequencing (PRO-seq) and
rapid amplification of cDNA ends (RACE). PRO-seq can define the position, amount, and
orientation of transcriptionally active RNA polymerases in the genome at single-nucleotide
resolution (Hou et al., 2020b). RACE can map eRNAs by identifying the 5′ and 3′ ends of
RNA, and RT-qPCR can determine the expression levels of these eRNAs (Hou et al., 2020b).
eRNA targets could be analysed using RNA‐guided Chromatin Conformation Capture (R3C)
as demonstrated by Wang et al. (2014). Functionality could be analysed by eRNA knockdown
using small interfering RNA (siRNA) or by mutation using CRISPR/Cas9 (Hou et al., 2020b).
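For the RT-qPCR step, eRNA levels after knockdown could be expressed relative to a control using the comparative Ct calculation. The sketch below is one possible way to do this and is not a method reported in this thesis. It assumes a single reference transcript, technical replicates that are averaged, and an amplification efficiency close to 100% for both amplicons. The function name and the Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(target_sample, reference_sample,
                        target_control, reference_control):
    """Relative expression by the comparative Ct (2^-ddCt) calculation.

    Each argument is a list of Ct values from technical replicates.
    Assumes roughly 100% amplification efficiency for both amplicons.
    """
    # Normalise the target (e.g. an eRNA) to the reference transcript.
    delta_sample = mean(target_sample) - mean(reference_sample)
    delta_control = mean(target_control) - mean(reference_control)
    # Compare the treated sample with the control condition.
    delta_delta_ct = delta_sample - delta_control
    return 2.0 ** -delta_delta_ct


if __name__ == "__main__":
    # Invented Ct values for illustration only.
    fold = relative_expression(
        target_sample=[28.0, 28.0],      # eRNA, siRNA-treated cells
        reference_sample=[18.0, 18.0],   # reference transcript, treated
        target_control=[26.0, 26.0],     # eRNA, control cells
        reference_control=[18.0, 18.0],  # reference transcript, control
    )
    print(fold)
```

A value below 1 would indicate reduced eRNA abundance relative to the control, which could then be compared with any change in RUNX1 expression.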
If the G-quadruplex is capable of forming RNA, it would be fascinating to investigate
its targets or ability to bind to RUNX1 protein. This could be carried out by isolating sequences
that bind to given targets with high affinity and specificity, using systematic evolution of
ligands by exponential enrichment (SELEX) (Ellington et al., 1990; Tuerk et al., 1990).
The results of this research also indicate a possible role for uncharacterised non-coding
RNA LOC10050640 in RUNX1 regulation. Regulatory regions m-371, m-368 and m-327 all
sit within LOC10050640 and may be involved in the regulation of this long non-coding RNA.
The role of RUNX1–IT1 in haematopoietic cells is currently unknown. More broadly, research
is also needed to determine the roles of other uncharacterised non-coding RNAs with which the
RUNX1 +24 enhancer makes connections. Further investigation to define the RNA-DNA
interactions proposed by this research could determine the importance of the RUNX1 +24
enhancer eRNAs, possible G-quadruplex RNA and non-coding RNA connections seen in K562
cells. This could be carried out by utilising previously generated RNA-DNA interactomes for
K562 cells, such as Red-C (Gavrilov et al., 2020).
6.5.5 Further questions proposed by this research
Several questions from this research remain unanswered, for example: are there other
strong enhancers that, like the RUNX1 +24 enhancer, connect across an entire chromosome? If
so, are they significant for genome organisation? Are there equivalents of the RUNX1 +24
enhancer on different chromosomes? These could be further discovered by using enhancers as
baits for 4C experiments, or by establishing an equivalent of Capture-C for enhancers.
6.5.6 Clinical significance
Tumour sequencing studies are helpful, but to benefit the patient, the
mechanisms of disease must be understood beyond a statistical correlation. Analysis of
available SNP and cancer mutation databases identified SNPs and mutations in RUNX1-
associated regulatory regions. It would be informative to carry out the same analysis on the
putative regulatory regions identified by the 4C.
Other than +62 and intron 5.2, the relevance of mutations in these regions to the
progression of AML is unknown. My study sets the scene for functional analyses to precisely
determine how RUNX1 is regulated, including CRISPR/Cas9-mediated interference with
enhancer activity. The results of my research also provide a rationale for screening patients
with mutations in enhancer regions. By analysing deep sequencing of AML patient samples
that have no identifiable mutations in the RUNX1 coding region or in other leukaemia genes,
mutations in regulatory regions that lead to dysregulated RUNX1 expression may be
discovered. Sequencing analyses that identify enhancer mutations could not only explain a
patient’s AML progression, but could also provide future therapeutic targets.
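The screening step described above amounts to intersecting patient variant positions with the coordinates of candidate regulatory regions. The sketch below shows the idea under stated assumptions: regions are given as 0-based, end-exclusive intervals, variants as (chromosome, position) pairs in the same coordinate system and genome build, and all names and coordinates are invented placeholders rather than real RUNX1 enhancer locations. For genome-scale data, an established interval tool would be preferable to this linear scan.

```python
from collections import namedtuple

# Hypothetical record type: 0-based start, end-exclusive interval.
Region = namedtuple("Region", ["name", "chrom", "start", "end"])


def variants_in_regions(variants, regions):
    """Group variants by the regulatory region they fall within.

    variants: iterable of (chrom, pos) with 0-based positions.
    regions: iterable of Region records.
    Returns a dict mapping region name -> list of overlapping variants.
    Regions without any variant are omitted.
    """
    hits = {}
    for chrom, pos in variants:
        for region in regions:
            if region.chrom == chrom and region.start <= pos < region.end:
                hits.setdefault(region.name, []).append((chrom, pos))
    return hits


if __name__ == "__main__":
    # Invented placeholder coordinates for illustration only.
    regions = [
        Region("enhancer_A", "chr21", 1000, 2000),
        Region("enhancer_B", "chr21", 5000, 5500),
    ]
    variants = [("chr21", 1500), ("chr21", 2000), ("chr21", 5499), ("chr1", 1500)]
    print(variants_in_regions(variants, regions))
```

Variants flagged in this way would only be candidates; their effect on RUNX1 expression would still need to be tested functionally.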
6.6 Overall conclusions
The precise spatiotemporal expression of Runx1 requires cis-regulatory elements, the
majority of which have not been previously defined. In this PhD thesis, I describe research that
provides insight into mouse Runx1 enhancer functions, and highlights the features of conserved
human enhancers. My study identifies the regulatory regions that interact with the RUNX1 +24
enhancer and that are potentially crucial for megakaryocyte differentiation.
One of the more exciting findings to emerge from this study is that the RUNX1 +24
enhancer forms DNA contacts across chromosome 21, defying the boundaries of known TADs.
The RUNX1 +24 enhancer interacts with coding genes, non-coding RNA, super-enhancers and
regulatory elements, and may also maintain connections that prevent transcription as well as
enhance it. The findings suggest that the RUNX1 +24 enhancer could be important for overall
organisation and 3D structure of chromosome 21.
The results of my investigation show that depletion of STAG2-cohesin de-constrains the
chromatin, allowing increased connections to the RUNX1 +24 enhancer hub and increased
super-enhancer interactions. This may result in aberrant enhancer-amplified transcription in
response to differentiation signals.
Overall, my study identified novel mechanisms for RUNX1 regulation, and demonstrated
the role of the RUNX1 +24 enhancer in megakaryocyte differentiation and the effect of cohesin
depletion on genome connectivity during differentiation.
References
Abbasi, A. A., Paparidis, Z., Malik, S., Goode, D. K., Callaway, H., Elgar, G., & Grzeschik, K.-H. (2007). Human GLI3 intragenic conserved non-coding sequences are tissue-specific enhancers. PLoS One, 2(4), e366. Acemel, R. D., Maeso, I., & Gómez‐Skarmeta, J. L. (2017). Topologically associated domains: a successful scaffold for the evolution of gene regulation in animals. Wiley Interdisciplinary Reviews: Developmental Biology, 6(3). Ajore, R., Dhanda, R. S., Gullberg, U., & Olsson, I. (2010). The leukemia associated ETO nuclear repressor gene is regulated by the GATA-1 transcription factor in erythroid/megakaryocytic cells. BMC molecular biology, 11(1), 38. Albacker, C. E., & Zon, L. I. (2009). Use of zebrafish to dissect gene programs regulating hematopoietic stem cells. In Regulatory Networks in Stem Cells (pp. 101-110): Humana Press. Alberati-Giani, D., Cesura, A. M., Broger, C., Warren, W. D., Röver, S., & Malherbe, P. (1997). Cloning and functional expression of human kynurenine 3‐monooxygenase. FEBS letters, 410(2-3), 407-412. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of molecular biology, 215(3), 403-410. Anania, C., & Lupiáñez, D. G. (2020). Order and disorder: abnormal 3D chromatin organization in human disease. Briefings in functional genomics, 19(2), 128-138. Andersson, R., Gebhard, C., Miguel-Escalada, I., Hoof, I., Bornholdt, J., Boyd, M., . . . Suzuki, T. (2014). An atlas of active enhancers across human cell types and tissues. Nature, 507(7493), 455-461. Andrews, S. (2010). FastQC: a quality control tool for high throughput sequence data. Antony, J., Gimenez, G., Taylor, T., Khatoon, U., Day, R., Morison, I. M., & Horsfield, J. A. (2020). BET inhibition prevents aberrant RUNX1 and ERG transcription in STAG2 mutant leukaemia cells. Journal of Molecular Cell Biology, 12(5), 397-399. doi:10.1093/jmcb/mjz114 Arruda, N. L., Carico, Z. M., Justice, M., Liu, Y. 
F., Zhou, J., Stefan, H. C., & Dowen, J. M. (2020). Distinct and overlapping roles of STAG1 and STAG2 in cohesin localization and gene expression in embryonic stem cells. Epigenetics & chromatin, 13(1), 1-17. Asou, N. (2003). The role of a Runt domain transcription factor AML1/RUNX1 in leukemogenesis and its clinical implications. Critical reviews in oncology/hematology, 45(2), 129-150. Assi, S. A., Imperato, M. R., Coleman, D. J., Pickin, A., Potluri, S., Ptasinska, A., . . . James, S. R. (2019). Subtype-specific regulatory network rewiring in acute myeloid leukemia. Nature genetics, 51(1), 151-162. Au, C. H., Wa, A., Ho, D. N., Chan, T. L., & Ma, E. S. (2016). Clinical evaluation of panel testing by next-generation sequencing (NGS) for gene mutations in myeloid neoplasms. Diagnostic pathology, 11(1), 1-12.
Balbás-Martínez, C., Sagrera, A., Carrillo-de-Santa-Pau, E., Earl, J., Márquez, M., Vazquez, M., . . . Bayés, M. (2013). Recurrent inactivation of STAG2 in bladder cancer is not associated with aneuploidy. Nature genetics, 45(12), 1464. Barski, A., Cuddapah, S., Cui, K., Roh, T.-Y., Schones, D. E., Wang, Z., . . . Zhao, K. (2007). High-resolution profiling of histone methylations in the human genome. Cell, 129(4), 823-837. Bee, T., Ashley, E. L., Bickley, S. R., Jarratt, A., Li, P.-S., Sloane-Stanley, J., . . . de Bruijn, M. F. (2009a). The mouse Runx1+ 23 hematopoietic stem cell enhancer confers hematopoietic specificity to both Runx1 promoters. Blood, 113(21), 5121-5124. Bee, T., Liddiard, K., Swiers, G., Bickley, S. R., Vink, C. S., Jarratt, A., . . . de Bruijn, M. F. (2009b). Alternative Runx1 promoter usage in mouse developmental hematopoiesis. Blood Cells, Molecules, and Diseases, 43(1), 35-42. Bee, T., Swiers, G., Muroi, S., Pozner, A., Nottingham, W., Santos, A. C., . . . de Bruijn, M. F. (2010). Nonredundant roles for Runx1 alternative promoters reflect their activity at discrete stages of developmental hematopoiesis. Blood, 115(15), 3042-3050. Bellissimo, D. C., & Speck, N. A. (2017). RUNX1 Mutations in Inherited and Sporadic Leukemia. Frontiers in Cell and Developmental Biology, 5, 111. doi:10.3389/fcell.2017.00111 Beltran, M., Tavares, M., Justin, N., Khandelwal, G., Ambrose, J., Foster, B. M., . . . Herrero, J. (2019). G-tract RNA removes Polycomb repressive complex 2 from genes. Nature structural & molecular biology, 26(10), 899-909. Benedetti, L., Cereda, M., Monteverde, L., Desai, N., & Ciccarelli, F. D. (2017). Synthetic lethal interaction between the tumour suppressor STAG2 and its paralog STAG1. Oncotarget, 8(23), 37619. Bessa, J., Tena, J. J., de la Calle‐Mustienes, E., Fernández‐Miñán, A., Naranjo, S., Fernández, A., . . . Casares, F. (2009). 
Zebrafish enhancer detection (ZED) vector: A new tool to facilitate transgenesis and the functional analysis of cis‐regulatory regions in zebrafish. Developmental Dynamics, 238(9), 2409-2417. Betz, B. L., & Hess, J. L. (2010). Acute myeloid leukemia diagnosis in the 21st century. Archives of pathology & laboratory medicine, 134(10), 1427-1433. Bhagwat, A. S., Roe, J.-S., Mok, B. Y., Hohmann, A. F., Shi, J., & Vakoc, C. R. (2016). BET bromodomain inhibition releases the mediator complex from select cis-regulatory elements. Cell reports, 15(3), 519-530. Bhatia, S., Bengani, H., Fish, M., Brown, A., Divizia, M. T., De Marco, R., . . . Kleinjan, D. A. (2013). Disruption of autoregulatory feedback by a mutation in a remote, ultraconserved PAX6 enhancer causes aniridia. The American Journal of Human Genetics, 93(6), 1126-1134. Bhatia, S., Monahan, J., Ravi, V., Gautier, P., Murdoch, E., Brenner, S., . . . Kleinjan, D. A. (2014). A survey of ancient conserved non-coding elements in the PAX6 locus reveals a landscape of interdigitated cis-regulatory archipelagos. Developmental biology, 387(2), 214-228. Boeynaems, S., Alberti, S., Fawzi, N. L., Mittag, T., Polymenidou, M., Rousseau, F., . . . Van Den Bosch, L. (2018). Protein phase separation: a new phase in cell biology. Trends in cell biology, 28(6), 420-435.
Bohlander, S. (2000). Fusion genes in leukemia: an emerging network. Cytogenetic and Genome Research, 91(1-4), 52-56. Bonkhofer, F., Rispoli, R., Pinheiro, P., Krecsmarik, M., Schneider-Swales, J., Tsang, I. H. C., . . . Patient, R. (2019). Blood stem cell-forming haemogenic endothelium in zebrafish derives from arterial endothelium. Nature communications, 10(1), 1-14. Bopp, S. K., Minuzzo, M., & Lettieri, T. (2006). The zebrafish (Danio rerio): an emerging model organism in the environmental field. Luxembourg: Institute for Environment and Sustainability, Joint Research Center, European Commission, European Communities. Bose, D. A., Donahue, G., Reinberg, D., Shiekhattar, R., Bonasio, R., & Berger, S. L. (2017). RNA binding to CBP stimulates histone acetylation and transcription. Cell, 168(1-2), 135-149. e122. Bower, H., Andersson, T. M., Björkholm, M., Dickman, P., Lambert, P. C., & Derolf, Å. R. (2016). Continued improvement in survival of acute myeloid leukemia patients: an application of the loss in expectation of life. Blood cancer journal, 6(2), e390-e390. Boyer, L. A., Plath, K., Zeitlinger, J., Brambrink, T., Medeiros, L. A., Lee, T. I., . . . Ray, M. K. (2006). Polycomb complexes repress developmental regulators in murine embryonic stem cells. Nature, 441(7091), 349-353. Brennan, C. W., Verhaak, R. G., McKenna, A., Campos, B., Noushmehr, H., Salama, S. R., . . . Berman, S. H. (2013). The somatic genomic landscape of glioblastoma. Cell, 155(2), 462-477. Breton, J., Avanzi, N., Magagnin, S., Covini, N., Magistrelli, G., Cozzi, L., & Isacchi, A. (2000). Functional characterization and mechanism of action of recombinant human kynurenine 3‐hydroxylase. European Journal of Biochemistry, 267(4), 1092-1099. Brohl, A. S., Solomon, D. A., Chang, W., Wang, J., Song, Y., Sindiri, S., . . . Shern, J. F. (2014). The genomic landscape of the Ewing Sarcoma family of tumors reveals recurrent STAG2 mutation. PLoS Genet, 10(7), e1004475. Buenrostro, J. D., Giresi, P. 
G., Zaba, L. C., Chang, H. Y., & Greenleaf, W. J. (2013). Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nature methods, 10(12), 1213. Bugaut, A., & Balasubramanian, S. (2012). 5′-UTR RNA G-quadruplexes: translation regulation and targeting. Nucleic Acids Research, 40(11), 4727-4741. Buijs, A., Poot, M., van der Crabben, S., van der Zwaag, B., van Binsbergen, E., & van Roosmalen, M. J. (2012). Elucidation of a novel pathogenomic mechanism using genome-wide long mate-pair sequencing of a congenital t(16;21) in a series of three RUNX1-mutated FPD/AML pedigrees. Leukemia, 26. doi:10.1038/leu.2012.79 Bullinger, L., Krönke, J., Schön, C., Radtke, I., Urlbauer, K., Botzenhardt, U., . . . Schlenk, R. (2010). Identification of acquired copy number alterations and uniparental disomies in cytogenetically normal acute myeloid leukemia using high-resolution single-nucleotide polymorphism analysis. Leukemia, 24(2), 438-449. Buniello, A., MacArthur, J. A. L., Cerezo, M., Harris, L. W., Hayhurst, J., Malangone, C., . . . Sollis, E. (2019). The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Research, 47(D1), D1005-D1012.
Burnett, A., Wetzler, M., & Lowenberg, B. (2011). Therapeutic advances in acute myeloid leukemia. J Clin Oncol, 29(5), 487-494. Byrd, J. C., Mrózek, K., Dodge, R. K., Carroll, A. J., Edwards, C. G., Arthur, D. C., . . . Watson, M. S. (2002). Pretreatment cytogenetic abnormalities are predictive of induction success, cumulative incidence of relapse, and overall survival in adult patients with de novo acute myeloid leukemia: results from Cancer and Leukemia Group B (CALGB 8461). Blood, The Journal of the American Society of Hematology, 100(13), 4325-4336. Cai, X., Gaudet, J. J., Mangan, J. K., Chen, M. J., De Obaldia, M. E., Oo, Z., . . . Speck, N. A. (2011). Runx1 loss minimally impacts long-term hematopoietic stem cells. PLoS One, 6(12), e28430. Canela, A., Maman, Y., Jung, S., Wong, N., Callen, E., Day, A., . . . Rao, S. S. (2017). Genome Organization Drives Chromosome Fragility. Cell. Cantor, A. B., & Orkin, S. H. (2002). Transcriptional regulation of erythropoiesis: an affair involving multiple partners. Oncogene, 21(21), 3368-3376. doi:10.1038/sj.onc.1205326 Cerami, E., Gao, J., Dogrusoz, U., Gross, B. E., Sumer, S. O., Aksoy, B. A., . . . Schultz, N. (2012). The cBio Cancer Genomics Portal: An open platform for exploring multidimensional cancer genomics data. Cancer discovery, 2(5), 401-404. doi:10.1158/2159-8290.CD-12-0095 CGARN. (2013). Cancer Genome Atlas Research Network (CGARN): Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. The New England journal of medicine, 368(22), 2059-2074. Chahal, G., Tyagi, S., & Ramialison, M. (2019). Navigating the non-coding genome in heart development and Congenital Heart Disease. Differentiation, 107, 11-23. Chen, Z., Amro, E. M., Becker, F., Hölzer, M., Rasa, S. M. M., Njeru, S. N., . . . Tang, D. (2019). 
Cohesin-mediated NF-κB signaling limits hematopoietic stem cell self-renewal in aging and inflammation. Journal of Experimental Medicine, 216(1), 152-175. Cheng, C.-K., Wong, T. H. Y., Wan, T. S. K., Wang, A. Z., Chan, N. P. H., Chan, N. C. N., . . . Ng, M. H. L. (2018). RUNX1 upregulation via disruption of long-range transcriptional control by a novel t(5;21)(q13;q22) translocation in acute myeloid leukemia. Molecular Cancer, 17(1), 133. doi:10.1186/s12943-018-0881-2 Chiarle, R., Zhang, Y., Frock, R. L., Lewis, S. M., Molinie, B., Ho, Y.-J., . . . Malkin, D. J. (2011). Genome-wide translocation sequencing reveals mechanisms of chromosome breaks and rearrangements in B cells. Cell, 147(1), 107-119. Chin, C. V., Antony, J., Ketharnathan, S., Labudina, A., Gimenez, G., Parsons, K. M., . . . Musio, A. (2020). Cohesin mutations are synthetic lethal with stimulation of WNT signaling. elife, 9, e61405. Churpek, J. E., Pyrtel, K., Kanchi, K.-L., Shao, J., Koboldt, D., Miller, C. A., . . . Fronick, C. (2015). Genomic analysis of germ line and somatic variants in familial myelodysplasia/acute myeloid leukemia. Blood, 126(22), 2484-2490. Ciau-Uitz, A., Monteiro, R., Kirmizitas, A., & Patient, R. (2014). Developmental hematopoiesis: ontogeny, genetic programming and conservation. Experimental hematology, 42(8), 669-683.
Coffman, J. A. (2003). Runx transcription factors and the developmental balance between cell proliferation and differentiation. Cell biology international, 27(4), 315-324. Coffman, J. A. (2009). Is Runx a linchpin for developmental signaling in metazoans? Journal of cellular biochemistry, 107(2), 194-202. Conway O’Brien, E., Prideaux, S., & Chevassut, T. (2014). The epigenetic landscape of acute myeloid leukemia. Advances in hematology, 2014. Corces, M. R., Chang, H. Y., & Majeti, R. (2017). Preleukemic Hematopoietic Stem Cells in Human Acute Myeloid Leukemia. Frontiers in oncology, 7, 263-263. doi:10.3389/fonc.2017.00263 Cornish, A. J., Hoang, P. H., Dobbins, S. E., Law, P. J., Chubb, D., Orlando, G., & Houlston, R. S. (2019). Identification of recurrent noncoding mutations in B-cell lymphoma using capture Hi-C. Blood Advances, 3(1), 21-32. Corrie, P. G. (2008). Cytotoxic chemotherapy: clinical aspects. Medicine, 36(1), 24-28. Costantino, L., Hsieh, T.-H. S., Lamothe, R., Darzacq, X., & Koshland, D. (2020). Cohesin residency determines chromatin loop patterns. elife, 9, e59889. Crump, N. T., Ballabio, E., Godfrey, L., Thorne, R., Repapi, E., Kerry, J., . . . Filippakopoulos, P. (2021). BET inhibition disrupts transcription but retains enhancer-promoter contact. Nature communications, 12(1), 1-15. Cuadrado, A., Giménez-Llorente, D., Kojic, A., Rodríguez-Corsino, M., Cuartero, Y., Martín-Serrano, G., . . . Losada, A. (2019). Specific contributions of cohesin-SA1 and cohesin-SA2 to TADs and polycomb domains in embryonic stem cells. Cell reports, 27(12), 3500-3510. e3504. Cuartero, S., & Merkenschlager, M. (2018a). Three-dimensional genome organization in normal and malignant haematopoiesis. Current opinion in hematology, 25(4), 323-328. Cuartero, S., Weiss, F. D., Dharmalingam, G., Guo, Y., Ing-Simmons, E., Masella, S., . . . Barozzi, I. (2018b). Control of inducible gene expression links cohesin to hematopoietic progenitor self-renewal and differentiation. 
Nature immunology, 19(9), 932-941. Daver, N., Cortes, J., Ravandi, F., Patel, K. P., Burger, J. A., Konopleva, M., & Kantarjian, H. (2015). Secondary mutations as mediators of resistance to targeted therapy in leukemia. Blood, The Journal of the American Society of Hematology, 125(21), 3236-3245. Davidson, A. J., & Zon, L. I. (2004). The ‘definitive’(and ‘primitive’) guide to zebrafish hematopoiesis. Oncogene, 23(43), 7233-7246. De Braekeleer, E., Douet-Guilbert, N., Morel, F., Le Bris, M. J., Férec, C., & De Braekeleer, M. (2011). RUNX1 translocations and fusion genes in malignant hemopathies. Future Oncol, 7. doi:10.2217/fon.10.158 de Jong, J. L., & Zon, L. I. (2005). Use of the zebrafish system to study primitive and definitive hematopoiesis. Annu. Rev. Genet., 39, 481-501. De Koninck, M., Lapi, E., Badía-Careaga, C., Cossío, I., Giménez-Llorente, D., Rodríguez-Corsino, M., . . . Real, F. X. (2020). Essential Roles of Cohesin STAG2 in Mouse Embryonic Development and Adult Tissue Homeostasis. Cell reports, 32(6), 108014. Deardorff, M. A., Noon, S. E., & Krantz, I. D. (2020). Cornelia de Lange syndrome. GeneReviews®[Internet].
Dekker, J., & Misteli, T. (2015). Long-range chromatin interactions. Cold Spring Harbor perspectives in biology, 7(10), a019356. Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. science, 295(5558), 1306-1311. Deng, W., & Blobel, G. A. (2010). Do chromatin loops provide epigenetic gene expression states? Current opinion in genetics & development, 20(5), 548-554. Desai, P., Mencia-Trinchant, N., Savenkov, O., Simon, M. S., Cheang, G., Lee, S., . . . Hassane, D. C. (2018). Somatic mutations precede acute myeloid leukemia years before diagnosis. Nature Medicine, 24(7), 1015-1023. doi:10.1038/s41591-018-0081-z DiNardo, C. D., & Cortes, J. E. (2016). Mutations in AML: prognostic and therapeutic implications. Hematology 2014, the American Society of Hematology Education Program Book, 2016(1), 348-355. Dixon, J. R., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., . . . Ren, B. (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature, 485(7398), 376. Dobrzycki, T., Mahony, C. B., Krecsmarik, M., Koyunlar, C., Rispoli, R., Peulen-Zink, J., . . . Patient, R. (2020). Deletion of a conserved Gata2 enhancer impairs haemogenic endothelium programming and adult Zebrafish haematopoiesis. Communications biology, 3(1), 1-14. Dostie, J., Richmond, T. A., Arnaout, R. A., Selzer, R. R., Lee, W. L., Honan, T. A., . . . Nusbaum, C. (2006). Chromosome Conformation Capture Carbon Copy (5C): a massively parallel solution for mapping interactions between genomic elements. Genome Research, 16(10), 1299-1309. Downes, C. S., Mullinger, A. M., & Johnson, R. T. (1991). Inhibitors of DNA topoisomerase II prevent chromatid separation in mammalian cells but do not prevent exit from mitosis. Proceedings of the National Academy of Sciences, 88(20), 8895-8899. Draper, J. E., Sroczynska, P., Leong, H. S., Fadlullah, M. Z., Miller, C., Kouskoff, V., & Lacaud, G. (2017). 
Mouse RUNX1C regulates premegakaryocytic/erythroid output and maintains survival of megakaryocyte progenitors. Blood, The Journal of the American Society of Hematology, 130(3), 271-284. Du, Z., Kong, P., Gao, Y., & Li, N. (2007). Enrichment of G4 DNA motif in transcriptional regulatory region of chicken genome. Biochemical and biophysical research communications, 354(4), 1067-1070. Dzierzak, E., Medvinsky, A., & de Bruijn, M. (1998). Qualitative and quantitative aspects of haematopoietic cell development in the mammalian embryo. Immunology Today, 19(5), 228-236. Eddy, J., & Maizels, N. (2008). Conserved elements with potential to form polymorphic G-quadruplex structures in the first intron of human genes. Nucleic Acids Research, 36(4), 1321-1333. Ellington, A. D., & Szostak, J. W. (1990). In vitro selection of RNA molecules that bind specific ligands. Nature, 346(6287), 818-822. Engreitz, J. M., Agarwala, V., & Mirny, L. A. (2012). Three-dimensional genome architecture influences partner selection for chromosomal translocations in human disease. PLoS One, 7(9), e44196.
Erickson, P., Gao, J., Chang, K., Look, T., Whisenant, E., Raimondi, S., . . . Drabkin, H. (1992). Identification of breakpoints in t(8; 21) acute myelogenous leukemia and isolation of a fusion transcript, AML1/ETO, with similarity to Drosophila segmentation gene, runt. Blood, 80(7), 1825-1831. Ernst, J., & Kellis, M. (2010). Discovery and characterization of chromatin states for systematic annotation of the human genome. Nature biotechnology, 28(8), 817-825. Ernst, J., Kheradpour, P., Mikkelsen, T. S., Shoresh, N., Ward, L. D., Epstein, C. B., . . . Coyne, M. (2011). Mapping and analysis of chromatin state dynamics in nine human cell types. Nature, 473(7345), 43-49. Evers, L., Perez-Mancera, P. A., Lenkiewicz, E., Tang, N., Aust, D., Knösel, T., . . . Aziz, M. (2014). STAG2 is a clinically relevant tumor suppressor in pancreatic ductal adenocarcinoma. Genome Medicine, 6(1), 1-12. Ezoe, S., Matsumura, I., Nakata, S., Gale, K., Ishihara, K., Minegishi, N., . . . Kanakura, Y. (2002). GATA-2/estrogen receptor chimera regulates cytokine-dependent growth of hematopoietic cells through accumulation of p21(WAF1) and p27(Kip1) proteins. Blood, 100(10), 3512-3520. doi:10.1182/blood-2002-04-1177 Fadason, T., Ekblad, C., Ingram, J. R., Schierding, W. S., & O'Sullivan, J. M. (2017). Physical interactions and expression quantitative traits loci identify regulatory connections for obesity and type 2 diabetes associated SNPs. Frontiers in genetics, 8, 150. Fadason, T., Schierding, W., Lumley, T., & O’Sullivan, J. M. (2018). Chromatin interactions and expression quantitative trait loci reveal genetic drivers of multimorbidities. Nature communications, 9(1), 1-13. Fan, J., Xing, Y., Wen, X., Jia, R., Ni, H., He, J., . . . Ge, S. (2015). Long non-coding RNA ROR decoys gene-specific histone methylation to promote tumorigenesis. Genome Biology, 16(1), 1-17. Feeney, K., Wasson, C., & Parish, J. (2010). Cohesin: a regulator of genome integrity and gene expression. 
Biochemical Journal, 428, 147-161. Feinberg, A. P. (2007). Phenotypic plasticity and the epigenetics of human disease. Nature, 447(7143), 433-440. Ferrara, F., Palmieri, S., & Leoni, F. (2008). Clinically useful prognostic factors in acute myeloid leukemia. Critical reviews in oncology/hematology, 66(3), 181-193. Ferrara, F., & Schiffer, C. A. (2013). Acute myeloid leukaemia in adults. The Lancet, 381(9865), 484-495. Fisher, J. B., McNulty, M., Burke, M. J., Crispino, J. D., & Rao, S. (2017a). Cohesin mutations in myeloid malignancies. Trends in cancer, 3(4), 282-293. Fisher, J. B., Peterson, J., Reimer, M., Stelloh, C., Pulakanti, K., Gerbec, Z. J., . . . McNulty, M. (2017b). The cohesin subunit Rad21 is a negative regulator of hematopoietic self-renewal through epigenetic repression of Hoxa7 and Hoxa9. Leukemia, 31(3), 712-719. Flavahan, W. A., Drier, Y., Liau, B. B., Gillespie, S. M., Venteicher, A. S., Stemmer-Rachamimov, A. O., . . . Bernstein, B. E. (2016). Insulator dysfunction and oncogene activation in IDH mutant gliomas. Nature, 529(7584), 110-114. Flynn, R. A., & Chang, H. Y. (2014). Long noncoding RNAs in cell-fate programming and reprogramming. Cell Stem Cell, 14(6), 752-761.
Fudenberg, G., Getz, G., Meyerson, M., & Mirny, L. A. (2011). High order chromatin architecture shapes the landscape of chromosomal alterations in cancer. Nature biotechnology, 29(12), 1109-1113. Fukunaga, J., Nomura, Y., Tanaka, Y., Torigoe, H., Nakamura, Y., Sakamoto, T., & Kozu, T. (2020). A G‐quadruplex‐forming RNA aptamer binds to the MTG8 TAFH domain and dissociates the leukemic AML1‐MTG8 fusion protein from DNA. FEBS letters, 594(21), 3477-3489. Fullwood, M. J., Liu, M. H., Pan, Y. F., Liu, J., Xu, H., Mohamed, Y. B., . . . Mei, P. H. (2009). An oestrogen-receptor-α-bound human chromatin interactome. Nature, 462(7269), 58-64. Gaidzik, V., Teleanu, V., Papaemmanuil, E., Weber, D., Paschka, P., Hahn, J., . . . Horst, H. (2016). RUNX1 mutations in acute myeloid leukemia are associated with distinct clinico-pathologic and genetic features. Leukemia, 30(11), 2160-2168. Gandemer, V., Rio, A. G., de Tayrac, M., Sibut, V., Mottier, S., & Ly Sunnaram, B. (2007). Five distinct biological processes and 14 differentially expressed genes characterize TEL/AML1-positive leukemia. BMC genomics, 8. doi:10.1186/1471-2164-8-385 Gao, J., Aksoy, B. A., Dogrusoz, U., Dresdner, G., Gross, B., Sumer, S. O., . . . Schultz, N. (2013). Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Science Signaling, 6(269), pl1-pl1. doi:10.1126/scisignal.2004088 García-Nieto, P. E., Morrison, A. J., & Fraser, H. B. (2019). The somatic mutation landscape of the human body. Genome Biology, 20(1), 1-20. Gardini, A., Cesaroni, M., Luzi, L., Okumura, A. J., Biggs, J. R., Minardi, S. P., . . . Alcalay, M. (2008). AML1/ETO oncoprotein is directed to AML1 binding regions and co-localizes with AML1 and HEB on its targets. PLoS genetics, 4(11), e1000275. Gates, L. A., Foulds, C. E., & O'Malley, B. W. (2017). Histone Marks in the 'Driver's Seat': Functional Roles in Steering the Transcription Cycle. Trends in biochemical sciences, 42(12), 977-989. 
doi:10.1016/j.tibs.2017.10.004 Gattuso, H., Spinello, A., Terenzi, A., Assfeld, X., Barone, G., & Monari, A. (2016). Circular Dichroism of DNA G-Quadruplexes: Combining Modeling and Spectroscopy To Unravel Complex Structures. The Journal of Physical Chemistry B, 120(12), 3113-3121. doi:10.1021/acs.jpcb.6b00634 Gavrilov, A. A., Zharikova, A. A., Galitsyna, A. A., Luzhin, A. V., Rubanova, N. M., Golov, A. K., . . . Ulianov, S. V. (2020). Studying RNA–DNA interactome by Red-C identifies noncoding RNAs associated with various chromatin types and reveals transcription dynamics. Nucleic Acids Research. Geisler, S., & Coller, J. (2013). RNA in unexpected places: long non-coding RNA functions in diverse cellular contexts. Nature reviews Molecular cell biology, 14(11), 699-712. Germain, R. N. (2002). T-cell development and the CD4–CD8 lineage decision. Nature Reviews Immunology, 2(5), 309-322. Gilbert, N., Boyle, S., Fiegler, H., Woodfine, K., Carter, N. P., & Bickmore, W. A. (2004). Chromatin architecture of the human genome: gene-rich domains are enriched in open chromatin fibers. Cell, 118(5), 555-566. Gilliland, D. G., & Griffin, J. D. (2002). The roles of FLT3 in hematopoiesis and leukemia. Blood, The Journal of the American Society of Hematology, 100(5), 1532-1542.
Ginhoux, F., & Jung, S. (2014). Monocytes and macrophages: developmental pathways and tissue homeostasis. Nature Reviews Immunology, 14, 392. doi:10.1038/nri3671 Gómez-Marín, C., Tena, J. J., Acemel, R. D., López-Mayorga, M., Naranjo, S., de la Calle-Mustienes, E., . . . Vielmas, E. (2015). Evolutionary comparison reveals that diverging CTCF sites are signatures of ancestral topological associating domains borders. Proceedings of the National Academy of Sciences, 112(24), 7542-7547. Goyal, G., Gundabolu, K., Vallabhajosyula, S., Silberstein, P. T., & Bhatt, V. R. (2016). Reduced-intensity conditioning allogeneic hematopoietic-cell transplantation for older patients with acute myeloid leukemia. Therapeutic advances in hematology, 7(3), 131-141. Griekspoor, A., & Groothuis, T. (2005). 4Peaks: a program that helps molecular biologists to visualize and edit their DNA sequence files v1. 7. In. Griffiths, A. J. F., Miller, J. H., Suzuki, D. T., Lewontin, R. C., & Gelbart, W. M. (1999). An Introduction to Genetic Analysis: W H Freeman & Company. Grimwade, D., Walker, H., Oliver, F., Wheatley, K., Harrison, C., Harrison, G., . . . Burnett, A. (1998). The importance of diagnostic cytogenetics on outcome in AML: analysis of 1,612 patients entered into the MRC AML 10 trial. Blood, The Journal of the American Society of Hematology, 92(7), 2322-2333. Gritz, E., & Hirschi, K. K. (2016). Specification and function of hemogenic endothelium during embryogenesis. Cellular and Molecular Life Sciences, 73(8), 1547-1567. Groner, Y., Ito, Y., Liu, P., Neil, J. C., Speck, N. A., & van Wijnen, A. (2017). RUNX Proteins in Development and Cancer. In: Springer. Gröschel, S., Sanders, M. A., Hoogenboezem, R., de Wit, E., Bouwman, B. A., Erpelinck, C., . . . van Lom, K. (2014). A single oncogenic enhancer rearrangement causes concomitant EVI1 and GATA2 deregulation in leukemia. Cell, 157(2), 369-381. Growney, J. D., Shigematsu, H., Li, Z., Lee, B. H., Adelsperger, J., Rowan, R., . . . 
Williams, I. R. (2005). Loss of Runx1 perturbs adult hematopoiesis and is associated with a myeloproliferative phenotype. Blood, 106(2), 494-504. Gruber, S., Haering, C. H., & Nasmyth, K. (2003). Chromosomal cohesin forms a ring. Cell, 112(6), 765-777. Gunnell, A., Webb, H. M., Wood, C. D., McClellan, M. J., Wichaidit, B., Kempkes, B., . . . West, M. J. (2016). RUNX super-enhancer control through the Notch pathway by Epstein-Barr virus transcription factors regulates B cell growth. Nucleic Acids Research, 44(10), 4636-4650. Guo, G., Sun, X., Chen, C., Wu, S., Huang, P., Li, Z., . . . Zhou, Q. (2013). Whole-genome and whole-exome sequencing of bladder cancer identifies frequent alterations in genes involved in sister chromatid cohesion and segregation. Nature genetics, 45(12), 1459-1463. Habets, G., Van der Kammen, R., Stam, J. C., Michiels, F., & Collard, J. G. (1995). Sequence of the human invasion-inducing TIAM1 gene, its conservation in evolution and its expression in tumor cell lines of different tissue origin. Oncogene, 10(7), 1371-1376. Haffter, P., Granato, M., Brand, M., Mullins, M. C., Hammerschmidt, M., Kane, D. A., . . . Nusslein-Volhard, C. (1996). The identification of genes with unique and essential functions in the development of the zebrafish, Danio rerio. Development, 123(1), 1-36. Hakim, O., & Misteli, T. (2012). Chromosome conformation techniques. Cell, 148, 1068.
Han, P., Li, W., Lin, C.-H., Yang, J., Shang, C., Nurnberg, S. T., . . . Lin, C.-J. (2014). A long noncoding RNA protects the heart from pathological hypertrophy. Nature, 514(7520), 102-106. Harada, H., & Harada, Y. (2005). Point mutations in the AML1/RUNX1 gene associated with myelodysplastic syndrome. Critical Reviews™ in Eukaryotic Gene Expression, 15(3). Harada, Y., & Harada, H. (2009). Molecular pathways mediating MDS/AML with focus on AML1/RUNX1 point mutations. Journal of cellular physiology, 220(1), 16-20. Harms, J. F., Menniti, F. S., & Schmidt, C. J. (2019). Phosphodiesterase 9A in brain regulates cGMP signaling independent of nitric-oxide. Frontiers in neuroscience, 13, 837. Harmston, N., Ing-Simmons, E., Tan, G., Perry, M., Merkenschlager, M., & Lenhard, B. (2017). Topologically associating domains are ancient features that coincide with Metazoan clusters of extreme noncoding conservation. Nature communications, 8(1), 441-441. doi:10.1038/s41467-017-00524-5 Hayashi, Y., Harada, Y., Huang, G., & Harada, H. (2017). Myeloid neoplasms with germ line RUNX1 mutation. International journal of hematology, 106(2), 183-188. Heimbruch, K. E., Meyer, A. E., Agrawal, P., Viny, A. D., & Rao, S. (2021). A cohesive look at leukemogenesis: The cohesin complex and other driving mutations in AML. Neoplasia, 23(3), 337-347. Heintzman, N. D., Stuart, R. K., Hon, G., Fu, Y., Ching, C. W., Hawkins, R. D., . . . Ren, B. (2007). Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat Genet, 39(3), 311-318. doi:10.1038/ng1966 Hnisz, D., Abraham, B. J., Lee, T. I., Lau, A., Saint-André, V., Sigova, A. A., . . . Young, R. A. (2013). Super-enhancers in the control of cell identity and disease. Cell, 155(4), 934-947. Hnisz, D., Weintraub, A. S., Day, D. S., Valton, A.-L., Bak, R. O., Li, C. H., . . . Young, R. A. (2016). Activation of proto-oncogenes by disruption of chromosome neighborhoods. science, 351(6280), 1454-1458. 
doi:10.1126/science.aad9024 Holoch, D., & Moazed, D. (2015). RNA-mediated epigenetic regulation of gene expression. Nature Reviews Genetics, 16(2), 71-84. Holwerda, S., & De Laat, W. (2012). Chromatin loops, gene positioning, and gene expression. Frontiers in genetics, 3, 217. Horsfield, J. A., Anagnostou, S. H., Hu, J. K.-H., Cho, K. H. Y., Geisler, R., Lieschke, G., . . . Crosier, P. S. (2007). Cohesin-dependent regulation of Runx genes. Development, 134(14), 2639-2649. Hou, C., Dale, R., & Dean, A. (2010). Cell type specificity of chromatin organization mediated by CTCF and cohesin. Proceedings of the National Academy of Sciences, 107(8), 3651-3656. Hou, H.-A., & Tien, H.-F. (2020a). Genomic landscape in acute myeloid leukemia and its implications in risk classification and targeted therapies. Journal of Biomedical Science, 27(1), 1-13. Hou, T. Y., & Kraus, W. L. (2020b). Spirits in the Material World: Enhancer RNAs in Transcriptional Regulation. Trends in biochemical sciences.
Howe, K., Clark, M. D., Torroja, C. F., Torrance, J., Berthelot, C., Muffato, M., . . . Matthews, L. (2013). The zebrafish reference genome sequence and its relationship to the human genome. Nature, 496(7446), 498-503. Huang, J., Li, K., Cai, W., Liu, X., Zhang, Y., Orkin, S. H., . . . Yuan, G.-C. (2018). Dissecting super-enhancer hierarchy based on chromatin interactions. Nature communications, 9(1), 1-12. Hudson, T. J., & Jennings, J. L. (2014). Abstract 4278: International Cancer Genome Consortium (ICGC). Cancer Research, 74(19 Supplement), 4278. doi:10.1158/1538-7445.am2014-4278 Hyman, A. A., Weber, C. A., & Jülicher, F. (2014). Liquid-liquid phase separation in biology. Annual review of cell and developmental biology, 30, 39-58. Ichikawa, M., Asai, T., Saito, T., Seo, S., Yamazaki, I., Yamagata, T., . . . Hirai, H. (2004). AML-1 is required for megakaryocytic maturation and lymphocytic differentiation, but not for maintenance of hematopoietic stem cells in adult hematopoiesis. Nat Med, 10(3), 299-304. doi:10.1038/nm997 Ichikawa, M., Yoshimi, A., Nakagawa, M., Nishimoto, N., Watanabe-Okochi, N., & Kurokawa, M. (2013). A role for RUNX1 in hematopoiesis and myeloid leukemia. International journal of hematology, 97(6), 726-734. Inoue, K.-i., Shiga, T., & Ito, Y. (2008). Runx transcription factors in neuronal development. Neural Development, 3(20), 8104-8103. Ito, Y., Bae, S.-C., & Chuang, L. S. H. (2015). The RUNX family: developmental regulators in cancer. Nature Reviews Cancer, 15(2), 81-95. Ivanovs, A., Rybtsov, S., Ng, E. S., Stanley, E. G., Elefanty, A. G., & Medvinsky, A. (2017). Human haematopoietic stem cell development: from the embryo to the dish. Development, 144(13), 2323-2337. doi:10.1242/dev.134866 Ives, J. H., Dagna‐Bricarelli, F., Basso, G., Antonarakis, S. E., Jee, R., Cotter, F., & Nižetić, D. (1998). 
Increased levels of a chromosome 21‐encoded tumour invasion and metastasis factor (TIAM1) mRNA in bone marrow of down syndrome children during the acute phase of AML (M7). Genes, Chromosomes and Cancer, 23(1), 61-66. Jain, A. K., Xi, Y., McCarthy, R., Allton, K., Akdemir, K. C., Patel, L. R., . . . Yang, L. (2016). LncPRESS1 is a p53-regulated LncRNA that safeguards pluripotency by disrupting SIRT6-mediated de-acetylation of histone H3K56. Molecular cell, 64(5), 967-981. Jaiswal, S., & Ebert, B. L. (2019). Clonal hematopoiesis in human aging and disease. science, 366(6465). Javierre, B. M., Burren, O. S., Wilder, S. P., Kreuzhuber, R., Hill, S. M., Sewitz, S., . . . Thiecke, M. J. (2016). Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell, 167(5), 1369-1384. e1319. Jones, D. T., Jäger, N., Kool, M., Zichner, T., Hutter, B., Sultan, M., . . . Stütz, A. M. (2012). Dissecting the genomic complexity underlying medulloblastoma. Nature, 488(7409), 100-105. Jordan, C. (2002). Unique molecular and cellular features of acute myelogenous leukemia stem cells. Leukemia, 16(4), 559-562. Julien, E., El Omar, R., & Tavian, M. (2016). Origin of the hematopoietic system in the human embryo. FEBS letters, 590(22), 3987-4001.
Kadauke, S., & Blobel, G. A. (2009). Chromatin loops in gene regulation. Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms, 1789(1), 17-25. Kagey, M. H., Newman, J. J., Bilodeau, S., Zhan, Y., Orlando, D. A., van Berkum, N. L., . . . Levine, S. S. (2010). Mediator and cohesin connect gene expression and chromatin architecture. Nature, 467(7314), 430-435. Kalev-Zylinska, M. L., Horsfield, J. A., Flores, M. V. C., Postlethwait, J. H., Vitas, M. R., Baas, A. M., . . . Crosier, K. E. (2002). Runx1 is required for zebrafish blood and vessel development and expression of a human RUNX1-CBF2T1 transgene advances a model for studies of leukemogenesis. Development, 129(8), 2015-2030. Kantarjian, H., Sawyers, C., Hochhaus, A., Guilhot, F., Schiffer, C., Gambacorti-Passerini, C., . . . Zoellner, U. (2002a). Hematologic and cytogenetic responses to imatinib mesylate in chronic myelogenous leukemia. New England Journal of Medicine, 346(9), 645-652. Kantarjian, H. M., Cortes, J., O'Brien, S., Giles, F. J., Albitar, M., Rios, M. B., . . . Thomas, D. A. (2002b). Imatinib mesylate (STI571) therapy for Philadelphia chromosome–positive chronic myelogenous leukemia in blast phase. Blood, The Journal of the American Society of Hematology, 99(10), 3547-3553. Karczewski, K. J., Francioli, L. C., Tiao, G., Cummings, B. B., Alföldi, J., Wang, Q., . . . Birnbaum, D. P. (2020). The mutational constraint spectrum quantified from variation in 141,456 humans. Nature, 581(7809), 434-443. Kari, G., Rodeck, U., & Dicker, A. P. (2007). Zebrafish: An emerging model system for human disease and drug discovery. Clinical Pharmacology & Therapeutics, 82(1), 70-80. doi:10.1038/sj.clpt.6100223 Katainen, R., Dave, K., Pitkänen, E., Palin, K., Kivioja, T., Välimäki, N., . . . Cajuso, T. (2015). CTCF/cohesin-binding sites are frequently mutated in cancer. Nature genetics, 47(7), 818-821. Kawakami, K. (2005). Transposon tools and methods in zebrafish. Developmental Dynamics, 234(2), 244-254. 
Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., . . . Duran, C. (2012). Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics, 28(12), 1647-1649. Kellis, M., & Ernst, J. (2017). Chromatin-state discovery and genome annotation with ChromHMM. Nature Protocols, 12(12), 2478. Kent, W. J. (2002). BLAT—The BLAST-Like Alignment Tool. Genome Research, 12(4), 656-664. doi:10.1101/gr.229202 Kent, W. J., Sugnet, C. W., Furey, T. S., Roskin, K. M., Pringle, T. H., Zahler, A. M., & Haussler, D. (2002). The Human Genome Browser at UCSC. Genome Research, 12(6), 996-1006. doi:10.1101/gr.229102 Kerenyi, M. A., Shao, Z., Hsu, Y.-J., Guo, G., Luc, S., O'Brien, K., . . . Orkin, S. H. (2013). Histone demethylase Lsd1 represses hematopoietic stem and progenitor cell signatures during blood cell maturation. elife, 2, e00633. Ketharnathan, S., Leask, M., Boocock, J., Phipps-Green, A. J., Antony, J., O’Sullivan, J. M., . . . Horsfield, J. A. (2018). A non-coding genetic variant maximally associated with serum urate levels is functionally linked to HNF4A-dependent PDZK1 expression. Human molecular genetics, 27(22), 3964-3973.
Khan, A., Mathelier, A., & Zhang, X. (2018). Super-enhancers are transcriptionally more active and cell type-specific than stretch enhancers. Epigenetics, 13(9), 910-922. Khan, A., & Zhang, X. (2016). dbSUPER: a database of super-enhancers in mouse and human genome. Nucleic Acids Research, 44(D1), D164-D171. doi:10.1093/nar/gkv1002 Kichaev, G., Bhatia, G., Loh, P.-R., Gazal, S., Burch, K., Freund, M. K., . . . Price, A. L. (2019). Leveraging Polygenic Functional Enrichment to Improve GWAS Power. The American Journal of Human Genetics, 104(1), 65-75. doi:10.1016/j.ajhg.2018.11.008 Kikin, O., D'Antonio, L., & Bagga, P. S. (2006). QGRS Mapper: a web-based server for predicting G-quadruplexes in nucleotide sequences. Nucleic Acids Research, 34(suppl 2), W676-W682. Kim, S. I., & Bresnick, E. H. (2007). Transcriptional control of erythropoiesis: emerging mechanisms and principles. Oncogene, 26, 6777. doi:10.1038/sj.onc.1210761 Kim, T., Yoshida, K., Kim, Y. K., Tyndel, M. S., Park, H. J., Choi, S. H., . . . Kim, D. D. (2016). Clonal dynamics in a single AML case tracked for 9 years reveals the complexity of leukemia progression. Leukemia, 30(2), 295-302. doi:10.1038/leu.2015.264 Kim, W., Barron, D. A., San Martin, R., Chan, K. S., Tran, L. L., Yang, F., . . . Rowley, D. R. (2014). RUNX1 is essential for mesenchymal stem cell proliferation and myofibroblast differentiation. Proceedings of the National Academy of Sciences, 111(46), 16389-16394. Klein, I. A., Resch, W., Jankovic, M., Oliveira, T., Yamane, A., Nakahashi, H., . . . Robbiani, D. F. (2011). Translocation-capture sequencing reveals the extent and nature of chromosomal rearrangements in B lymphocytes. Cell, 147(1), 95-106. Kloetgen, A., Thandapani, P., Tsirigos, A., & Aifantis, I. (2019). 3D Chromosomal Landscapes in Hematopoiesis and Immunity. Trends in Immunology, 40(9), 809-824. Koch, L. (2020). Exploring human genomic diversity with gnomAD. Nature Reviews Genetics, 21(8), 448-448. Koh, C. 
P., Ng, C., Nah, G., Wang, C. Q., Tergaonkar, V., Matsumura, T., . . . Osato, M. (2015). Hematopoietic stem cell enhancer: a powerful tool in stem cell biology. Histology and histopathology, 30(6), 661-672. Kojic, A., Cuadrado, A., De Koninck, M., Giménez-Llorente, D., Rodriguez-Corsino, M., Gómez-López, G., . . . Losada, A. (2018). Distinct roles of cohesin-SA1 and cohesin-SA2 in 3D chromosome organization. Nature structural & molecular biology, 25(6), 496-504. Kolovos, P., Knoch, T. A., Grosveld, F. G., Cook, P. R., & Papantonis, A. (2012). Enhancers and silencers: an integrated and simple model for their function. Epigenetics & chromatin, 5(1), 1-8. Kolterud, Å., & Carlsson, L. (1998). Expression of the LIM‐homeobox gene LH2 generates immortalized steel factor‐dependent multipotent hematopoietic precursors. The EMBO journal, 17(19), 5744-5756. Komori, T., Yagi, H., Nomura, S., Yamaguchi, A., Sasaki, K., Deguchi, K., . . . Kishimoto, T. (1997). Targeted Disruption of Cbfa1 Results in a Complete Lack of Bone Formation owing to Maturational Arrest of Osteoblasts. Cell, 89(5), 755-764. Kon, A., Shih, L.-Y., Minamino, M., Sanada, M., Shiraishi, Y., Nagata, Y., . . . Nakato, R. (2013). Recurrent mutations in multiple components of the cohesin complex in myeloid neoplasms. Nature genetics, 45(10), 1232-1237.
Kong, M., & Lee, C. (2013). Genetic associations with C-reactive protein level and white blood cell count in the KARE study. International Journal of Immunogenetics, 40(2), 120-125. doi:10.1111/j.1744-313X.2012.01141.x Koressaar, T., & Remm, M. (2007). Enhancements and modifications of primer design program Primer3. Bioinformatics, 23(10), 1289-1291. Kramer, I., Sigrist, M., de Nooij, J. C., Taniuchi, I., Jessell, T. M., & Arber, S. (2006). A role for Runx transcription factor signaling in dorsal root ganglion sensory neuron diversification. Neuron, 49(3), 379-393. Krijger, P. H., Geeven, G., Bianchi, V., Hilvering, C. R., & de Laat, W. (2020). 4C-seq from beginning to end: A detailed protocol for sample preparation and data analysis. Methods, 170, 17-32. Krutein, M. C., Hart, M. R., Anderson, D. J., Jeffery, J., Kotini, A. G., Dai, J., . . . Maguire, J. A. (2021). Restoring RUNX1 deficiency in RUNX1 familial platelet disorder by inhibiting its degradation. Blood Advances, 5(3), 687-699. Kulkeaw, K., & Sugiyama, D. (2012). Zebrafish erythropoiesis and the utility of fish as models of anemia. Stem cell research & therapy, 3(6), 1-11. Kuvardina, O. N., Herglotz, J., Kolodziej, S., Kohrs, N., Herkt, S., Wojcik, B., . . . Kumar, A. (2015). RUNX1 represses the erythroid gene expression program during megakaryocytic differentiation. Blood, blood-2014-2011-610519. Lai, F., Orom, U. A., Cesaroni, M., Beringer, M., Taatjes, D. J., Blobel, G. A., & Shiekhattar, R. (2013). Activating RNAs associate with Mediator to enhance chromatin architecture and transcription. Nature, 494(7438), 497-501. Lam, E. Y. N., Chau, J. Y., Kalev-Zylinska, M. L., Fountaine, T. M., Mead, R. S., Hall, C. J., . . . Flores, M. V. (2009). Zebrafish runx1 promoter-EGFP transgenics mark discrete sites of definitive blood progenitors. Blood, 113(6), 1241-1249. Lam, M. T., Cho, H., Lesch, H. P., Gosselin, D., Heinz, S., Tanaka-Oishi, Y., . . . Kosaka, M. (2013). 
Rev-Erbs repress macrophage gene expression by inhibiting enhancer-directed transcription. Nature, 498(7455), 511-515. Langmead, B., & Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nature methods, 9(4), 357. Latger-Cannard, V., Philippe, C., Bouquet, A., Baccini, V., Alessi, M.-C., Ankri, A., . . . Daliphard, S. (2016). Haematological spectrum and genotype-phenotype correlations in nine unrelated families with RUNX1 mutations from the French network on inherited platelet disorders. Orphanet journal of rare diseases, 11(1), 1-15. Lawrence, M., Daujat, S., & Schneider, R. (2016). Lateral Thinking: How Histone Modifications Regulate Gene Expression. Trends Genet, 32(1), 42-56. doi:10.1016/j.tig.2015.10.007 Lawrence, M. S., Stojanov, P., Mermel, C. H., Robinson, J. T., Garraway, L. A., Golub, T. R., . . . Getz, G. (2014). Discovery and saturation analysis of cancer genes across 21 tumour types. Nature, 505(7484), 495-501. Lay, W. H., & Nussenzweig, V. (1968). RECEPTORS FOR COMPLEMENT ON LEUKOCYTES. The Journal of Experimental Medicine, 128(5), 991-1009. doi:10.1084/jem.128.5.991
Leeke, B., Marsman, J., O’Sullivan, J. M., & Horsfield, J. A. (2014). Cohesin mutations in myeloid malignancies: underlying mechanisms. Experimental Hematology & Oncology, 3, 13. Levanon, D., Bettoun, D., Harris‐Cerruti, C., Woolf, E., Negreanu, V., Eilam, R., . . . Fliegauf, M. (2002). The Runx3 transcription factor regulates development and survival of TrkC dorsal root ganglia neurons. The EMBO journal, 21(13), 3454-3463. Levanon, D., & Groner, Y. (2004). Structure and regulated expression of mammalian RUNX genes. Oncogene, 23(24), 4211-4219. Levine, M. (2010). Transcriptional enhancers in animal development and evolution. Current Biology, 20(17), R754-R763. Li, G., Ruan, X., Auerbach, R. K., Sandhu, K. S., Zheng, M., Wang, P., . . . Zhang, J. (2012). Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation. Cell, 148(1), 84-98. Li, Q.-L., Ito, K., Sakakura, C., Fukamachi, H., Inoue, K.-i., Chi, X.-Z., . . . Han, S.-B. (2002). Causal relationship between the loss of RUNX3 expression and gastric cancer. Cell, 109(1), 113-124. Li, W., Notani, D., Ma, Q., Tanasa, B., Nunez, E., Chen, A. Y., . . . Song, X. (2013a). Functional roles of enhancer RNAs for oestrogen-dependent transcriptional activation. Nature, 498(7455), 516-520. Li, X., & Fu, X.-D. (2019). Chromatin-associated RNAs as facilitators of functional genomic interactions. Nature Reviews Genetics, 20(9), 503-519. Li, Y., Huang, W., Niu, L., Umbach, D. M., Covo, S., & Li, L. (2013b). Characterization of constitutive CTCF/cohesin loci: a possible role in establishing topological domains in mammalian genomes. BMC genomics, 14(1), 553. Lieberman-Aiden, E., Van Berkum, N. L., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., . . . Dorschner, M. O. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. science, 326(5950), 289-293. Lieschke, G. J., & Currie, P. D. (2007). 
Animal models of human disease: zebrafish swim into view. Nature Reviews Genetics, 8(5), 353-367. Lindsley, R. C., Mar, B. G., Mazzola, E., Grauman, P. V., Shareef, S., Allen, S. L., . . . Erba, H. P. (2015). Acute myeloid leukemia ontogeny is defined by distinct somatic mutations. Blood, 125(9), 1367-1376. Liu, S., Zhang, J., Yin, L., Wang, X., Zheng, Y., Zhang, Y., . . . Zheng, P. (2020). The lncRNA RUNX1-IT1 regulates C-FOS transcription by interacting with RUNX1 in the process of pancreatic cancer proliferation, migration and invasion. Cell death & disease, 11(6), 1-17. Liu, X., Zhang, Q., Zhang, D. E., Zhou, C., Xing, H., Tian, Z., . . . Wang, J. (2009). Overexpression of an isoform of AML1 in acute leukemia and its potential role in leukemogenesis. Leukemia, 23, 739. doi:10.1038/leu.2008.350 Lizio, M., Harshbarger, J., Shimoji, H., Severin, J., Kasukawa, T., Sahin, S., . . . Ishikawa-Kato, S. (2015). Gateways to the FANTOM5 promoter level mammalian expression atlas. Genome Biology, 16(1), 22. Loder, F., Mutschler, B., Ray, R. J., Paige, C. J., Sideras, P., Torres, R., . . . Carsetti, R. (1999). B cell development in the spleen takes place in discrete steps and is determined by
the quality of B cell receptor–derived signals. The Journal of experimental medicine, 190(1), 75-90. Look, A. T. (1997). Oncogenic transcription factors in the human acute leukemias. science, 278(5340), 1059-1064. Losada, A. (2014). Cohesin in cancer: chromosome segregation and beyond. Nature Reviews Cancer, 14(6), 389-393. Love, M. I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12), 1-21. Lozzio, C. B., & Lozzio, B. B. (1975). Human chronic myelogenous leukemia cell-line with positive Philadelphia chromosome. Luddy, R. E., Champion, L. A., & Schwartz, A. D. (1978). A fatal myeloproliferative syndrome in a family with thrombocytopenia and platelet dysfunction. Cancer, 41(5), 1959-1963. Lupiáñez, D. G., Kraft, K., Heinrich, V., Krawitz, P., Brancati, F., Klopocki, E., . . . Laxova, R. (2015). Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. Cell, 161(5), 1012-1025. Maizels, N., & Gray, L. T. (2013). The G4 genome. PLoS genetics, 9(4), e1003468. Mann, A., & Bhatia, S. (2019). Zebrafish: A powerful model for understanding the functional relevance of noncoding region mutations in human genetic diseases. Biomedicines, 7(3), 71. Mansour, M. R., Abraham, B. J., Anders, L., Berezovskaya, A., Gutierrez, A., Durbin, A. D., . . . Silverman, L. B. (2014). An oncogenic super-enhancer formed through somatic mutation of a noncoding intergenic element. science, 346(6215), 1373-1377. Markova, E. N., Kantidze, O. L., & Razin, S. V. (2011). Transcriptional regulation and spatial organisation of the human AML1/RUNX1 gene. Journal of cellular biochemistry, 112(8), 1997-2005. Marsman, J. (2016). Transcriptional regulation of Runx1 gene. Pathology PhD thesis, Dunedin: University of Otago. Marsman, J., & Horsfield, J. A. (2012). Long distance relationships: enhancer–promoter communication and dynamic gene transcription. 
Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms, 1819(11), 1217-1227. Marsman, J., O'Neill, A. C., Kao, B. R., Rhodes, J. M., Meier, M., Antony, J., . . . Horsfield, J. A. (2014). Cohesin and CTCF differentially regulate spatiotemporal runx1 expression during zebrafish development. Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms, 1839(1), 50-61. doi:10.1016/j.bbagrm.2013.11.007 Marsman, J., Thomas, A., Osato, M., O'Sullivan, J. M., & Horsfield, J. A. (2017). A DNA Contact Map for the Mouse Runx1 Gene Identifies Novel Hematopoietic Enhancers. bioRxiv. doi:10.1101/147793 Martin, C. S., Moriyama, A., & Zon, L. I. (2011). Hematopoietic stem cells, hematopoiesis and disease: lessons from the zebrafish model. Genome Medicine, 3(12), 83. Martincorena, I., Roshan, A., Gerstung, M., Ellis, P., Van Loo, P., McLaren, S., . . . Tubio, J. M. (2015). High burden and pervasive positive selection of somatic mutations in normal human skin. science, 348(6237), 880-886.
Martinez, M., Hinojosa, M., Trombly, D., Morin, V., Stein, J., & Stein, G. (2016). Transcriptional Auto-Regulation of RUNX1 P1 Promoter. PLoS One, 11. doi:10.1371/journal.pone.0149119 Matheny, C. J., Speck, M. E., Cushing, P. R., Zhou, Y., Corpora, T., Regan, M., . . . Gu, T. L. (2007). Disease mutations in RUNX1 and RUNX2 create nonfunctional, dominant‐negative, or hypomorphic alleles. The EMBO journal, 26(4), 1163-1175. Maurano, M. T., Humbert, R., Rynes, E., Thurman, R. E., Haugen, E., Wang, H., . . . Brody, J. (2012). Systematic localization of common disease-associated variation in regulatory DNA. science, 337(6099), 1190-1195. Mazumdar, C., Shen, Y., Xavy, S., Zhao, F., Reinisch, A., Li, R., . . . Chan, S. M. (2015). Leukemia-associated cohesin mutants dominantly enforce stem cell programs and impair human hematopoietic progenitor differentiation. Cell Stem Cell, 17(6), 675-688. McGahon, A. J., Brown, D. G., Martin, S. J., Amarante-Mendes, G. P., Cotter, T. G., Cohen, G. M., & Green, D. R. (1997). Downregulation of Bcr-Abl in K562 cells restores susceptibility to apoptosis: characterization of the apoptotic death. Cell Death & Differentiation, 4(2), 95-104. Meier, M., Grant, J., Dowdle, A., Thomas, A., Gerton, J., Collas, P., . . . Horsfield, J. A. (2018). Cohesin facilitates zygotic genome activation in zebrafish. Development, 145(1). doi:10.1242/dev.156521 Melo, C. A., Drost, J., Wijchers, P. J., van de Werken, H., de Wit, E., Vrielink, J. A. O., . . . Kalluri, R. (2013). eRNAs are required for p53-dependent enhancer activity and gene transcription. Molecular cell, 49(3), 524-535. Merkenschlager, M., & Nora, E. P. (2016). CTCF and cohesin in genome folding and transcriptional gene regulation. Annual review of genomics and human genetics, 17, 17-43. Meyers, S., Downing, J., & Hiebert, S. (1993). 
Identification of AML-1 and the (8; 21) translocation protein (AML-1/ETO) as sequence-specific DNA-binding proteins: the runt homology domain is required for DNA binding and protein-protein interactions. Molecular and cellular biology, 13(10), 6336-6345. Michaelis, C., Ciosk, R., & Nasmyth, K. (1997). Cohesins: chromosomal proteins that prevent premature separation of sister chromatids. Cell, 91(1), 35-45. Mifsud, B., Tavares-Cadete, F., Young, A. N., Sugar, R., Schoenfelder, S., Ferreira, L., . . . Ewels, P. A. (2015). Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nature genetics, 47(6), 598. Mikhaylichenko, O., Bondarenko, V., Harnett, D., Schor, I. E., Males, M., Viales, R. R., & Furlong, E. E. (2018). The degree of enhancer or promoter activity is reflected by the levels and directionality of eRNA transcription. Genes & development, 32(1), 42-57. Mill, C. P., Fiskus, W., DiNardo, C. D., Qian, Y., Raina, K., Rajapakshe, K., . . . Khoury, J. D. (2019). RUNX1-targeted therapy for AML expressing somatic or germline mutation in RUNX1. Blood, 134(1), 59-73. Miyoshi, H., Kozu, T., Shimizu, K., Enomoto, K., Maseki, N., Kaneko, Y., . . . Ohki, M. (1993). The t(8; 21) translocation in acute myeloid leukemia results in production of an AML1-MTG8 fusion transcript. The EMBO journal, 12(7), 2715. Miyoshi, H., Shimizu, K., Kozu, T., Maseki, N., Kaneko, Y., & Ohki, M. (1991). t (8; 21) breakpoints on chromosome 21 in acute myeloid leukemia are clustered within a limited
region of a single gene, AML1. Proceedings of the National Academy of Sciences, 88(23), 10431-10434. Mullenders, J., Aranda-Orgilles, B., Lhoumaud, P., Keller, M., Pae, J., Wang, K., . . . Gong, Y. (2015). Cohesin loss alters adult hematopoietic stem cell homeostasis, leading to myeloproliferative neoplasms. Journal of Experimental Medicine, 212(11), 1833-1850. Murakawa, Y., Yoshihara, M., Kawaji, H., Nishikawa, M., Zayed, H., Suzuki, H., . . . Consortium, F. (2016). Enhanced identification of transcriptional enhancers provides mechanistic insights into diseases. Trends in Genetics, 32(2), 76-88. Murati, A., Brecqueville, M., Devillier, R., Mozziconacci, M.-J., Gelsi-Boyer, V., & Birnbaum, D. (2012). Myeloid malignancies: mutations, models and management. BMC cancer, 12(1), 304. Nasmyth, K., & Haering, C. H. (2009). Cohesin: its roles and mechanisms. Annual review of genetics, 43. Ng, C. E. L., Yokomizo, T., Yamashita, N., Cirovic, B., Jin, H., Wen, Z., . . . Osato, M. (2010). A Runx1 intronic enhancer marks hemogenic endothelial cells and hematopoietic stem cells. Stem Cells, 28(10), 1869-1881. Nie, Y., Zhou, L., Wang, H., Chen, N., Jia, L., Wang, C., . . . Niu, C. (2019). Profiling the epigenetic interplay of lncRNA RUNXOR and oncogenic RUNX1 in breast cancer cells by gene in situ cis-activation. American journal of cancer research, 9(8), 1635. North, T., Gu, T. L., Stacy, T., Wang, Q., Howard, L., Binder, M., . . . Speck, N. A. (1999). Cbfa2 is required for the formation of intra-aortic hematopoietic clusters. Development, 126(11), 2563-2575. North, T. E., Stacy, T., Matheny, C. J., Speck, N. A., & de Bruijn, M. F. (2004). Runx1 is expressed in adult mouse hematopoietic stem cells and differentiating myeloid and lymphoid cells, but not in maturing erythroid cells. Stem Cells, 22(2), 158-168. doi:10.1634/stemcells.22-2-158 Northcott, P. A., Lee, C., Zichner, T., Stütz, A. M., Erkek, S., Kawauchi, D., . . . Sturm, D. (2014). 
Enhancer hijacking activates GFI1 family oncogenes in medulloblastoma. Nature, 511(7510), 428-434. Nottingham, W. T., Jarratt, A., Burgess, M., Speck, C. L., Cheng, J.-F., Prabhakar, S., . . . Kong-a-San, J. (2007). Runx1-mediated hematopoietic stem-cell emergence is controlled by a Gata/Ets/SCL-regulated enhancer. Blood, 110(13), 4188-4197. Nozawa, R.-S., & Gilbert, N. (2019). RNA: nuclear glue for folding the genome. Trends in cell biology, 29(3), 201-211. Nuebler, J., Fudenberg, G., Imakaev, M., Abdennur, N., & Mirny, L. A. (2018). Chromatin organization by an interplay of loop extrusion and compartmental segregation. Proceedings of the National Academy of Sciences, 115(29), E6697-E6706. O'Donnell, M. R., Abboud, C. N., Altman, J., Appelbaum, F. R., Arber, D. A., Attar, E., . . . Goorha, S. (2012). Acute myeloid leukemia. Journal of the National Comprehensive Cancer Network, 10(8), 984-1021. Ochi, Y., Kon, A., Sakata, T., Nakagawa, M. M., Nakazawa, N., Kakuta, M., . . . Morishita, D. (2020). Combined Cohesin–RUNX1 Deficiency Synergistically Perturbs Chromatin Looping and Causes Myelodysplastic Syndromes. Cancer discovery, 10(6), 836-853.
Okuda, T., Nishimura, M., Nakao, M., & Fujita, Y. (2001). RUNX1/AML1: a central player in hematopoiesis. International journal of hematology, 74(3), 252-257. Okuda, T., Van Deursen, J., Hiebert, S. W., Grosveld, G., & Downing, J. R. (1996). AML1, the target of multiple chromosomal translocations in human leukemia, is essential for normal fetal liver hematopoiesis. Cell, 84(2), 321-330. Ong, C. T., & Corces, V. G. (2012). Enhancers: emerging roles in cell fate specification. EMBO reports, 13(5), 423-430. Opneja, A., Kapoor, S., & Stavrou, E. X. (2019). Contribution of platelets, the coagulation and fibrinolytic systems to cutaneous wound healing. Thrombosis research, 179, 56-63. Orkin, S. H., & Zon, L. I. (2008). Hematopoiesis: an evolving paradigm for stem cell biology. Cell, 132(4), 631-644. Orlando, G., Law, P. J., Cornish, A. J., Dobbins, S. E., Chubb, D., Broderick, P., . . . Osborne, C. S. (2018). Promoter capture Hi-C-based identification of recurrent noncoding mutations in colorectal cancer. Nature genetics, 50(10), 1375-1380. Otto, F., Kanegane, H., & Mundlos, S. (2002). Mutations in the RUNX2 gene in patients with cleidocranial dysplasia. Human mutation, 19(3), 209-216. Owen, C. J., Toze, C. L., Koochin, A., Forrest, D. L., Smith, C. A., Stevens, J. M., . . . Leber, B. (2008). Five new pedigrees with inherited RUNX1 mutations causing familial platelet disorder with propensity to myeloid malignancy. Blood, The Journal of the American Society of Hematology, 112(12), 4639-4645. Paguio, A., Almond, B., Fan, F., Stecha, P., Garvin, D., Wood, M., & Wood, K. (2005). pGL4 Vectors: A new generation of luciferase reporter vectors. Promega Notes, 89(4). Paik, E. J., & Zon, L. I. (2010). Hematopoietic development in the zebrafish. Int J Dev Biol, 54(6-7), 1127-1137. doi:10.1387/ijdb.093042ep Pang, B., & Snyder, M. P. (2020). Systematic identification of silencers in human cells. Nature genetics, 52(3), 254-263. Panigrahi, A. K., Foulds, C. E., Lanz, R. 
B., Hamilton, R. A., Yi, P., Lonard, D. M., . . . O’Malley, B. W. (2018). SRC-3 coactivator governs dynamic estrogen-induced chromatin looping interactions during transcription. Molecular cell, 70(4), 679-694. e677. Papaemmanuil, E., Gerstung, M., Bullinger, L., Gaidzik, V. I., Paschka, P., Roberts, N. D., . . . Campbell, P. J. (2016). Genomic Classification and Prognosis in Acute Myeloid Leukemia. N Engl J Med, 374(23), 2209-2221. doi:10.1056/NEJMoa1516192 Parker, S. C., Stitzel, M. L., Taylor, D. L., Orozco, J. M., Erdos, M. R., Akiyama, J. A., . . . Black, B. L. (2013). Chromatin stretch enhancer states drive cell-specific gene regulation and harbor human disease risk variants. Proceedings of the National Academy of Sciences, 110(44), 17921-17926. Pasquali, L., Gaulton, K. J., Rodríguez-Seguí, S. A., Mularoni, L., Miguel-Escalada, I., Akerman, İ., . . . van de Bunt, M. (2014). Pancreatic islet enhancer clusters enriched in type 2 diabetes risk-associated variants. Nature genetics, 46(2), 136-143. Patel, N. S., Klett, J., Pilarzyk, K., ik Lee, D., Kass, D., Menniti, F. S., & Kelly, M. P. (2018). Identification of new PDE9A isoforms and how their expression and subcellular compartmentalization in the brain change across the life span. Neurobiology of aging, 65, 217-234.
Peifer, M., Hertwig, F., Roels, F., Dreidax, D., Gartlgruber, M., Menon, R., . . . Heuckmann, J. M. (2015). Telomerase activation by genomic rearrangements in high-risk neuroblastoma. Nature, 526(7575), 700-704. Pencovich, N., Jaschek, R., Tanay, A., & Groner, Y. (2011). Dynamic combinatorial interactions of RUNX1 and cooperating partners regulates megakaryocytic differentiation in cell line models. Blood, 117(1), e1-e14. Peng, Y., & Zhang, Y. (2018). Enhancer and super‐enhancer: Positive regulators in gene transcription. Animal models and experimental medicine, 1(3), 169-179. Pennacchio, L. A., Ahituv, N., Moses, A. M., Prabhakar, S., Nobrega, M. A., Shoukry, M., . . . Lewis, K. D. (2006). In vivo enhancer analysis of human conserved non-coding sequences. Nature, 444(7118), 499-502. Phillips, J. E., & Corces, V. G. (2009). CTCF: master weaver of the genome. Cell, 137(7), 1194-1211. Phillips-Cremins, J. E., & Corces, V. G. (2013). Chromatin insulators: linking genome organization to cellular function. Molecular cell, 50(4), 461-474. Pokholok, D. K., Harbison, C. T., Levine, S., Cole, M., Hannett, N. M., Lee, T. I., . . . Young, R. A. (2005). Genome-wide map of nucleosome acetylation and methylation in yeast. Cell, 122(4), 517-527. doi:10.1016/j.cell.2005.06.026 Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., . . . Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402-405. doi:10.1038/nature13986 Popel, A. S. (1989). Theory of oxygen transport to tissue. Crit Rev Biomed Eng, 17(3), 257-321. Pozner, A., Lotem, J., Xiao, C., Goldenberg, D., Brenner, O., Negreanu, V., . . . Groner, Y. (2007). Developmentally regulated promoter-switch transcriptionally controls Runx1 function during embryonic hematopoiesis. BMC developmental biology, 7(1), 1-19. Prensner, J. R., Iyer, M. K., Sahu, A., Asangani, I. A., Cao, Q., Patel, L., . . . Ghadessi, M. (2013). 
The long noncoding RNA SChLAP1 promotes aggressive prostate cancer and antagonizes the SWI/SNF complex. Nature genetics, 45(11), 1392. Puente, X. S., Beà, S., Valdés-Mas, R., Villamor, N., Gutiérrez-Abril, J., Martín-Subero, J. I., . . . Aymerich, M. (2015). Non-coding recurrent mutations in chronic lymphocytic leukaemia. Nature, 526(7574), 519-524. Raab, J. R., & Kamakaka, R. T. (2010). Insulators and promoters: closer than we think. Nature Reviews Genetics, 11(6), 439-446. Rahnamoun, H., Lee, J., Sun, Z., Lu, H., Ramsey, K. M., Komives, E. A., & Lauberth, S. M. (2018). RNAs interact with BRD4 to promote enhanced chromatin engagement and transcription activation. Nature structural & molecular biology, 25(8), 687-697. Rainger, J. K., Bhatia, S., Bengani, H., Gautier, P., Rainger, J., Pearson, M., . . . Palinkasova, B. (2014). Disruption of SATB2 or its long-range cis-regulation by SOX9 causes a syndromic form of Pierre Robin sequence. Human molecular genetics, 23(10), 2569-2579. Raney, B. J., Dreszer, T. R., Barber, G. P., Clawson, H., Fujita, P. A., Wang, T., . . . Kent, W. J. (2014). Track data hubs enable visualization of user-defined genome-wide annotations on the UCSC Genome Browser. Bioinformatics, 30(7), 1003-1005. doi:10.1093/bioinformatics/btt637
Rao, S. S., Huntley, M. H., Durand, N. C., Stamenova, E. K., Bochkov, I. D., Robinson, J. T., . . . Lander, E. S. (2014). A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell, 159(7), 1665-1680. Rasighaemi, P., Basheer, F., Liongue, C., & Ward, A. C. (2015). Zebrafish as a model for leukemia and other hematopoietic disorders. Journal of hematology & oncology, 8, 29-29. doi:10.1186/s13045-015-0126-4 Ravi, V., Bhatia, S., Gautier, P., Loosli, F., Tay, B.-H., Tay, A., . . . Brenner, S. (2013). Sequencing of Pax6 loci from the elephant shark reveals a family of Pax6 genes in vertebrate genomes, forged by ancient duplications and divergences. PLoS Genet, 9(1), e1003177. Raviram, R., Rocha, P. P., Müller, C. L., Miraldi, E. R., Badri, S., Fu, Y., . . . Bonneau, R. (2016). 4C-ker: a method to reproducibly identify genome-wide interactions captured by 4C-Seq experiments. PLoS Computational Biology, 12(3), e1004780. Rawat, V. P., Cusan, M., Deshpande, A., Hiddemann, W., Quintanilla-Martinez, L., Humphries, R. K., . . . Buske, C. (2004). Ectopic expression of the homeobox gene Cdx2 is the transforming event in a mouse model of t (12; 13)(p13; q12) acute myeloid leukemia. Proceedings of the National Academy of Sciences, 101(3), 817-822. Rebolledo-Jaramillo, B., Alarcon, R. A., Fernandez, V. I., & Gutierrez, S. E. (2014). Cis-regulatory elements are harbored in Intron5 of the RUNX1 gene. BMC genomics, 15(1), 225. Reddy, E., Rao, V. N., & Papas, T. S. (1987). The erg gene: a human gene related to the ets oncogene. Proceedings of the National Academy of Sciences, 84(17), 6131-6135. Reikvam, H., Hatfield, K. J., Kittang, A. O., Hovland, R., & Bruserud, Ø. (2011). Acute myeloid leukemia with the t (8; 21) translocation: clinical consequences and biological implications. Journal of Biomedicine and Biotechnology, 2011. Rennert, J., Coffman, J. A., Mushegian, A. R., & Robertson, A. J. (2003). The evolution of Runx genes I. 
A comparative study of sequences from phylogenetically diverse model organisms. BMC evolutionary biology, 3(1), 4. Rhodes, J. M., McEwan, M., & Horsfield, J. A. (2011). Gene regulation by cohesin in cancer: is the ring an unexpected party to proliferation? Molecular Cancer Research, 9(12), 1587-1607. Ritter, D. I., Li, Q., Kostka, D., Pollard, K. S., Guo, S., & Chuang, J. H. (2010). The importance of being cis: evolution of orthologous fish and mammalian enhancer activity. Molecular biology and evolution, 27(10), 2322-2332. Rocquain, J., Gelsi‐Boyer, V., Adélaïde, J., Murati, A., Carbuccia, N., Vey, N., . . . Chaffanet, M. (2010). Alteration of cohesin genes in myeloid diseases. American journal of hematology, 85(9), 717-719. Röllig, C., Serve, H., Hüttmann, A., Noppeney, R., Müller-Tidow, C., Krug, U., . . . Einsele, H. (2015). Addition of sorafenib versus placebo to standard therapy in patients aged 60 years or younger with newly diagnosed acute myeloid leukaemia (SORAML): a multicentre, phase 2, randomised controlled trial. The lancet oncology, 16(16), 1691-1699. Romano, S. N., & Gorelick, D. A. (2014). Semi-automated Imaging of Tissue-specific Fluorescence in Zebrafish Embryos. Journal of Visualized Experiments(87), e51533-e51533. Rosenbloom, K. R., Sloan, C. A., Malladi, V. S., Dreszer, T. R., Learned, K., Kirkup, V. M., . . . Kent, W. J. (2013). ENCODE Data in the UCSC Genome Browser: year 5 update. Nucleic Acids Research, 41(D1), D56-D63. doi:10.1093/nar/gks1172
Ruffalo, M., Husseinzadeh, H., Makishima, H., Przychodzen, B., Ashkar, M., Koyutürk, M., . . . LaFramboise, T. (2015). Whole-exome sequencing enhances prognostic classification of myeloid malignancies. Journal of biomedical informatics, 58, 104-113. Sabari, B. R., Dall’Agnese, A., Boija, A., Klein, I. A., Coffey, E. L., Shrinivas, K., . . . Manteiga, J. C. (2018). Coactivator condensation at super-enhancers links phase separation and gene control. science, 361(6400). Sakurai, H., Harada, Y., Ogata, Y., Kagiyama, Y., Shingai, N., & Doki, N. (2017). Overexpression of RUNX1 short isoform has an important role in the development of myelodysplastic/myeloproliferative neoplasms. Blood Adv, 1. doi:10.1182/bloodadvances.2016002725 Sanyal, A., Baù, D., Martí-Renom, M. A., & Dekker, J. (2011). Chromatin globules: a common motif of higher order chromosome structure? Current opinion in cell biology, 23(3), 325-331. Sartorelli, V., & Lauberth, S. M. (2020). Enhancer RNAs are an important regulatory layer of the epigenome. Nature structural & molecular biology, 1-8. Scarola, M., Comisso, E., Pascolo, R., Chiaradia, R., Marion, R. M., Schneider, C., . . . Benetti, R. (2015). Epigenetic silencing of Oct4 by a complex containing SUV39H1 and Oct4 pseudogene lncRNA. Nature communications, 6(1), 1-13. Schaaf, C. A., Misulovin, Z., Sahota, G., Siddiqui, A. M., Schwartz, Y. B., Kahn, T. G., . . . Dorsett, D. (2009). Regulation of the Drosophila Enhancer of split and invected-engrailed gene complexes by sister chromatid cohesion proteins. PLoS One, 4(7), e6202. Schaukowitch, K., Joo, J.-Y., Liu, X., Watts, J. K., Martinez, C., & Kim, T.-K. (2014). Enhancer RNA facilitates NELF release from immediate early genes. Molecular cell, 56(1), 29-42. Schierding, W., Horsfield, J. A., & O'Sullivan, J. M. (2020). Low tolerance for transcriptional variation at cohesin genes is accompanied by functional links to disease-relevant pathways. Journal of medical genetics. 
Schlenk, R., Benner, A., Hartmann, F., Del Valle, F., Weber, C., Pralle, H., . . . Weber, W. (2003). Risk-adapted postremission therapy in acute myeloid leukemia: results of the German multicenter AML HD93 treatment trial. Leukemia, 17(8), 1521-1528. Schnake, N., Hinojosa, M., & Gutiérrez, S. (2019). Identification of a novel long non-coding RNA within RUNX1 intron 5. Human Genomics, 13(1), 33. Schütte, J., Wang, H., Antoniou, S., Jarratt, A., Wilson, N. K., Riepsaame, J., . . . Kinston, S. J. (2016). An experimentally validated network of nine haematopoietic transcription factors reveals mechanisms of cell state stability. Shallis, R. M., Ahmad, R., & Zeidan, A. M. (2018). The genetic and molecular pathogenesis of myelodysplastic syndromes. European journal of haematology, 101(3), 260-271. Sherf, B. A., Navarro, S. L., Hannah, R. R., & Wood, K. V. (1996). Dual-Luciferase® reporter assay: An advanced co-reporter technology integrating firefly and Renilla luciferase assays. Promega Notes, 57(2). Shih, A. H., Abdel-Wahab, O., Patel, J. P., & Levine, R. L. (2012). The role of mutations in epigenetic regulators in myeloid malignancies. Nature Reviews Cancer, 12(9), 599-612. Shimizu, K., Ichikawa, H., Tojo, A., Kaneko, Y., Maseki, N., Hayashi, Y., . . . Ohki, M. (1993). An ets-related gene, ERG, is rearranged in human myeloid leukemia with t (16; 21)
chromosomal translocation. Proceedings of the National Academy of Sciences, 90(21), 10280-10284. Shin, H. Y. (2019). The structural and functional roles of CTCF in the regulation of cell type-specific and human disease-associated super-enhancers. Genes & genomics, 41(3), 257-265. Shin, Y., Chang, Y.-C., Lee, D. S., Berry, J., Sanders, D. W., Ronceray, P., . . . Brangwynne, C. P. (2018). Liquid nuclear condensates mechanically sense and restructure the genome. Cell, 175(6), 1481-1491. e1413. Shlush, L. I., Zandi, S., Mitchell, A., Chen, W. C., Brandwein, J. M., Gupta, V., . . . Dick, J. E. (2014). Identification of pre-leukaemic haematopoietic stem cells in acute leukaemia. Nature, 506(7488), 328-333. doi:10.1038/nature13038 Sigova, A. A., Abraham, B. J., Ji, X., Molinie, B., Hannett, N. M., Guo, Y. E., . . . Young, R. A. (2015). Transcription factor trapping by RNA in gene regulatory elements. science, 350(6263), 978-981. Simonis, M., Klous, P., Splinter, E., Moshkin, Y., Willemsen, R., de Wit, E., . . . de Laat, W. (2006). Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture–on-chip (4C). Nature genetics, 38(11), 1348-1354. Smemo, S., Tena, J. J., Kim, K.-H., Gamazon, E. R., Sakabe, N. J., Gomez-Marin, C., . . . Nobrega, M. A. (2014). Obesity-associated variants within FTO form long-range functional connections with IRX3. Nature, 507(7492), 371-375. doi:10.1038/nature13138 Solomon, D. A., Kim, J.-S., Bondaruk, J., Shariat, S. F., Wang, Z.-F., Elkahloun, A. G., . . . Zhang, S. (2013). Frequent truncating mutations of STAG2 in bladder cancer. Nature genetics, 45(12), 1428-1430. Song, W.-J., Sullivan, M. G., Legare, R. D., Hutchings, S., Tan, X., Kufrin, D., . . . Hock, R. (1999). Haploinsufficiency of CBFA2 causes familial thrombocytopenia with propensity to develop acute myelogenous leukaemia. Nature genetics, 23(2), 166-175. Sood, R., Kamikubo, Y., & Liu, P. (2017). Role of RUNX1 in hematological malignancies. 
Blood, 129(15), 2070-2082. Spieler, D., Kaffe, M., Knauf, F., Bessa, J., Tena, J. J., Giesert, F., . . . Horsch, M. (2014). Restless legs syndrome-associated intronic common variant in Meis1 alters enhancer function in the developing telencephalon. Genome Research, 24(4), 592-603. Splinter, E., & De Laat, W. (2011). The complex transcription regulatory landscape of our genome: control in three dimensions. The EMBO journal, 30(21), 4345-4355. Splinter, E., Heath, H., Kooren, J., Palstra, R.-J., Klous, P., Grosveld, F., . . . de Laat, W. (2006). CTCF mediates long-range chromatin looping and local histone modification in the β-globin locus. Genes & development, 20(17), 2349-2354. Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345-354. Stein, E. M. (2015). Molecularly targeted therapies for acute myeloid leukemia. Hematology, 2015(1), 579-583. Stephens, P. J., Tarpey, P. S., Davies, H., Van Loo, P., Greenman, C., Wedge, D. C., . . . Bignell, G. R. (2012). The landscape of cancer genes and mutational processes in breast cancer. Nature, 486(7403), 400-404.
Stone, R. M., Mandrekar, S., Sanford, B. L., Geyer, S., Bloomfield, C. D., Dohner, K., . . . Klisovic, R. B. (2015). The multi-kinase inhibitor midostaurin (M) prolongs survival compared with placebo (P) in combination with daunorubicin (D)/cytarabine (C) induction (ind), high-dose C consolidation (consol), and as maintenance (maint) therapy in newly diagnosed acute myeloid leukemia (AML) patients (pts) age 18-60 with FLT3 mutations (muts): an international prospective randomized (rand) P-controlled double-blind trial (CALGB 10603/RATIFY [Alliance]). In: American Society of Hematology Washington, DC. Stranger, B. E., Nica, A. C., Forrest, M. S., Dimas, A., Bird, C. P., Beazley, C., . . . Koller, D. (2007). Population genomics of human gene expression. Nature genetics, 39(10), 1217-1224. Strom, L., & Dorsett, D. (2012). The ancient and evolving roles of cohesin in gene expression and DNA repair. Current Biology, 22(7), R240-250. Sun, C., Chang, L., & Zhu, X. (2017). Pathogenesis of ETV6/RUNX1-positive childhood acute lymphoblastic leukemia and mechanisms underlying its relapse. Oncotarget, 8(21), 35445. Sun, Q., Hao, Q., & Prasanth, K. V. (2018). Nuclear long noncoding RNAs: key regulators of gene expression. Trends in Genetics, 34(2), 142-157. Sweeney, C., & Vyas, P. (2019). The graft-versus-leukemia effect in AML. Frontiers in oncology, 9, 1217. Swift, S., Lorens, J., Achacoso, P., & Nolan, G. P. (1999). Rapid Production of Retroviruses for Efficient Gene Delivery to Mammalian Cells Using 293T Cell–Based Systems. Current Protocols in Immunology, 31(1), 10.17.14-10.17.29. doi:10.1002/0471142735.im1017cs31 Tang, X., Sun, L., Wang, G., Chen, B., & Luo, F. (2018). RUNX1: A Regulator of NF-κB Signaling in Pulmonary Diseases. Current Protein and Peptide Science, 19(2), 172-178. Taoudi, S., Bee, T., Hilton, A., Knezevic, K., Scott, J., Willson, T. A., . . . Hilton, D. J. (2011). 
ERG dependence distinguishes developmental control of hematopoietic stem cell maintenance from hematopoietic specification. Genes Dev, 25(3), 251-262. doi:10.1101/gad.2009211 Taub, R., Kirsch, I., Morton, C., Lenoir, G., Swan, D., Tronick, S., . . . Leder, P. (1982). Translocation of the c-myc gene into the immunoglobulin heavy chain locus in human Burkitt lymphoma and murine plasmacytoma cells. Proceedings of the National Academy of Sciences, 79(24), 7837-7841. Taylor, C. F., Platt, F. M., Hurst, C. D., Thygesen, H. H., & Knowles, M. A. (2014). Frequent inactivating mutations of STAG2 in bladder cancer are associated with low tumour grade and stage and inversely related to chromosomal copy number changes. Human molecular genetics, 23(8), 1964-1974. Teittinen, K. J., Grönroos, T., Parikka, M., Rämet, M., & Lohi, O. (2012). The zebrafish as a tool in leukemia research. Leukemia research, 36(9), 1082-1088. Theriault, F. M., Roy, P., & Stifani, S. (2004). AML1/Runx1 is important for the development of hindbrain cholinergic branchiovisceral motor neurons and selected cranial sensory neurons. Proceedings of the National Academy of Sciences, 101(28), 10343-10348. Thol, F., Bollin, R., Gehlhaar, M., Walter, C., Dugas, M., Suchanek, K. J., . . . Wichmann, M. (2014). Mutations in the cohesin complex in acute myeloid leukemia: clinical and prognostic implications. Blood, 123(6), 914-920.
Thomas, A. (2015). Investigating the function of cis-regulatory elements for Runx1. Master of Science (Genetics) Thesis, Dunedin: University of Otago. Thoms, J. A., Birger, Y., Foster, S., Knezevic, K., Kirschenbaum, Y., Chandrakanthan, V., . . . Oram, S. H. (2011). ERG promotes T-acute lymphoblastic leukemia and is transcriptionally regulated in leukemic cells by a stem cell enhancer. Blood, 117(26), 7079-7089. Thota, S., Viny, A. D., Makishima, H., Spitzer, B., Radivoyevitch, T., Przychodzen, B., . . . Maciejewski, J. P. (2014). Genetic alterations of the cohesin complex genes in myeloid malignancies. Blood, 124(11), 1790-1798. doi:10.1182/blood-2014-04-567057 Thurman, R. E., Rynes, E., Humbert, R., Vierstra, J., Maurano, M. T., Haugen, E., . . . Vernot, B. (2012). The accessible chromatin landscape of the human genome. Nature, 489(7414), 75-82. Tighe, J. E., Daga, A., & Calabi, F. (1993). Translocation breakpoints are clustered on both chromosome 8 and chromosome 21 in the t (8; 21) of acute myeloid leukemia. Blood, 81(3), 592-596. Todaro, G. J., & Green, H. (1963). Quantitative studies of the growth of mouse embryo cells in culture and their development into established lines. Journal of Cell Biology, 17, 299-313. Trinchieri, G. (1989). Biology of Natural Killer Cells. In F. J. Dixon (Ed.), Advances in Immunology (Vol. 47, pp. 187-376): Academic Press. Tsai, C.-H., Hou, H.-A., Tang, J.-L., Kuo, Y.-Y., Chiu, Y.-C., Lin, C.-C., . . . Liu, M.-C. (2017). Prognostic impacts and dynamic changes of cohesin complex gene mutations in de novo acute myeloid leukemia. Blood cancer journal, 7(12), 1-7. Tsai, P.-F., Dell’Orso, S., Rodriguez, J., Vivanco, K. O., Ko, K.-D., Jiang, K., . . . Tran, M. (2018). A muscle-specific enhancer RNA mediates cohesin recruitment and regulates transcription in trans. Molecular cell, 71(1), 129-141. e128. Tuerk, C., & Gold, L. (1990). Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. 
science, 249(4968), 505-510. Tyner, J. W., Tognon, C. E., Bottomly, D., Wilmot, B., Kurtz, S. E., Savage, S. L., . . . Abel, M. (2018). Functional genomic landscape of acute myeloid leukaemia. Nature, 562(7728), 526-531. Untergasser, A., Cutcutache, I., Koressaar, T., Ye, J., Faircloth, B. C., Remm, M., & Rozen, S. G. (2012). Primer3 - new capabilities and interfaces. Nucleic Acids Research, 40(15), e115-e115. van de Werken, H. J., de Vree, P. J., Splinter, E., Holwerda, S. J., Klous, P., de Wit, E., & de Laat, W. (2012a). 4C technology: protocols and data analysis. In Methods in enzymology (Vol. 513, pp. 89-112): Elsevier. Van De Werken, H. J., Landan, G., Holwerda, S. J., Hoichman, M., Klous, P., Chachik, R., . . . Bouwman, B. A. (2012b). Robust 4C-seq data analysis to screen for regulatory DNA interactions. Nature methods, 9(10), 969-972. Van Der Lelij, P., Lieb, S., Jude, J., Wutz, G., Santos, C. P., Falkenberg, K., . . . Hoffmann, T. (2017). Synthetic lethality between the cohesin subunits STAG1 and STAG2 in diverse cancer contexts. elife, 6, e26980. van Furth, R., Cohn, Z. A., Hirsch, J. G., Humphrey, J. H., Spector, W. G., & Langevoort, H. L. (1972). The mononuclear phagocyte system: a new classification of macrophages,
monocytes, and their precursor cells. Bulletin of the World Health Organization, 46(6), 845-852. Villar, D., Berthelot, C., Aldridge, S., Rayner, T. F., Lukk, M., Pignatelli, M., . . . Jasinska, A. J. (2015). Enhancer evolution across 20 mammalian species. Cell, 160(3), 554-566. Viny, A. D., Bowman, R. L., Liu, Y., Lavallée, V.-P., Eisman, S. E., Xiao, W., . . . Braunstein, S. (2019). Cohesin members Stag1 and Stag2 display distinct roles in chromatin accessibility and topological control of HSC self-renewal and differentiation. Cell Stem Cell, 25(5), 682-696. e688. Viny, A. D., Ott, C. J., Spitzer, B., Rivas, M., Meydan, C., Papalexi, E., . . . Chiu, A. (2015). Dose-dependent role of the cohesin complex in normal and malignant hematopoiesis. Journal of Experimental Medicine, 212(11), 1819-1832. Visel, A., Blow, M. J., Li, Z., Zhang, T., Akiyama, J. A., Holt, A., . . . Chen, F. (2009). ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature, 457(7231), 854-858. Visel, A., Prabhakar, S., Akiyama, J. A., Shoukry, M., Lewis, K. D., Holt, A., . . . Pennacchio, L. A. (2008). Ultraconservation identifies a small subset of extremely constrained developmental enhancers. Nature genetics, 40(2), 158-160. Vukadin, L., Kim, J.-H., Park, E. Y., Stone, J. K., Ungerleider, N., Baddoo, M. C., . . . Giannini, H. (2020). SON inhibits megakaryocytic differentiation via repressing RUNX1 and the megakaryocytic gene expression program in acute megakaryoblastic leukemia. Cancer Gene Therapy, 1-16. Waldman, T. (2020). Emerging themes in cohesin cancer biology. Nature Reviews Cancer, 20(9), 504-515. Wang, B., Liu, Y., Hou, G., Wang, L., Lv, N., Xu, Y., . . . Jing, Y. (2016). Mutational spectrum and risk stratification of intermediate-risk acute myeloid leukemia patients based on next-generation sequencing. Oncotarget, 7(22), 32065. Wang, H., Li, W., Guo, R., Sun, J., Cui, J., Wang, G., . . . Hu, J. F. (2014). 
An intragenic long noncoding RNA interacts epigenetically with the RUNX1 promoter and enhancer chromatin DNA in hematopoietic malignancies. International journal of cancer, 135(12), 2783-2794. Wang, Q., Stacy, T., Binder, M., Marin-Padilla, M., Sharpe, A. H., & Speck, N. A. (1996). Disruption of the Cbfa2 gene causes necrosis and hemorrhaging in the central nervous system and blocks definitive hematopoiesis. Proceedings of the National Academy of Sciences, 93(8), 3444-3449. Wang, X., Cairns, M. J., & Yan, J. (2019). Super-enhancers in transcriptional regulation and genome organization. Nucleic Acids Research, 47(22), 11481-11496. Wang, X., Goodrich, K. J., Gooding, A. R., Naeem, H., Archer, S., Paucek, R. D., . . . Davidovich, C. (2017). Targeting of polycomb repressive complex 2 to RNA by short repeats of consecutive guanines. Molecular cell, 65(6), 1056-1067. e1055. Wang, Y., Song, F., Zhang, B., Zhang, L., Xu, J., Kuang, D., . . . Yue, F. (2018). The 3D Genome Browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions. Genome Biology, 19(1), 151. doi:10.1186/s13059-018-1519-9 Ward, L. D., & Kellis, M. (2016). HaploReg v4: systematic mining of putative causal variants, cell types, regulators and target genes for human complex traits and disease. Nucleic Acids Research, 44(D1), D877-D881.
Waterston, R. H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J. F., Agarwal, P., . . . Lander, E. S. (2002). Initial sequencing and comparative analysis of the mouse genome. Nature, 420(6915), 520-562. doi:10.1038/nature01262
Weischenfeldt, J., Dubash, T., Drainas, A. P., Mardin, B. R., Chen, Y., Stütz, A. M., . . . Raeder, B. (2017). Pan-cancer analysis of somatic copy-number alterations implicates IRS4 and IGF2 in enhancer hijacking. Nature genetics, 49(1), 65-74.
Welch, J. S., Ley, T. J., Link, D. C., Miller, C. A., Larson, D. E., Koboldt, D. C., . . . Xia, J. (2012). The origin and evolution of mutations in acute myeloid leukemia. Cell, 150(2), 264-278.
Welch, J. S., Petti, A. A., Miller, C. A., Fronick, C. C., O’Laughlin, M., Fulton, R. S., . . . Tandon, B. (2016). TP53 and decitabine in acute myeloid leukemia and myelodysplastic syndromes. New England Journal of Medicine, 375(21), 2023-2036.
West, A. G., Gaszner, M., & Felsenfeld, G. (2002). Insulators: many functions, many mechanisms. Genes & development, 16(3), 271-288.
Whyte, W. A., Orlando, D. A., Hnisz, D., Abraham, B. J., Lin, C. Y., Kagey, M. H., . . . Young, R. A. (2013). Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell, 153(2), 307-319.
Wickham, H., & Wickham, M. H. (2007). The ggplot package. In.
Wilde, L., Cooper, J., Wang, Z.-X., & Liu, J. (2019). Clinical, Cytogenetic, and Molecular Findings in Two Cases of Variant t (8; 21) Acute Myeloid Leukemia (AML). Frontiers in oncology, 9, 1016.
Williamson, I., Berlivet, S., Eskeland, R., Boyle, S., Illingworth, R. S., Paquette, D., . . . Bickmore, W. A. (2014). Spatial genome organization: contrasting views from chromosome conformation capture and fluorescence in situ hybridization. Genes & Development, 28(24), 2778-2791. doi:10.1101/gad.251694.114
Williamson, I., Kane, L., Devenney, P. S., Flyamer, I. M., Anderson, E., Kilanowski, F., . . . Lettice, L. A. (2019). Developmentally regulated Shh expression is robust to TAD perturbations. Development, 146(19).
Wilson, N. K., Calero-Nieto, F. J., Ferreira, R., & Göttgens, B. (2011). Transcriptional regulation of haematopoietic transcription factors. Stem cell research & therapy, 2(1), 1-8.
Wilson, N. K., Schoenfelder, S., Hannah, R., Castillo, M. S., Schütte, J., Ladopoulos, V., . . . Moignard, V. (2016). Integrated genome-scale analysis of the transcriptional regulatory landscape in a blood stem/progenitor cell model. Blood, 127(13), e12-e23.
Woolfe, A., Goodson, M., Goode, D. K., Snell, P., McEwen, G. K., Vavouri, T., . . . Kelly, K. (2004). Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol, 3(1), e7.
Xu, X., Ren, X., Wang, H., Zhao, Y., Yi, Z., Wang, K., . . . Hu, Z. (2016). Identification and functional analysis of acute myeloid leukemia susceptibility associated single nucleotide polymorphisms at non-protein coding regions of RUNX1. Leukemia & lymphoma, 57(6), 1442-1449.
Yokomizo, T., Ogawa, M., Osato, M., Kanno, T., Yoshida, H., Fujimoto, T., . . . Ito, Y. (2001). Requirement of Runx1/AML1/PEBP2alphaB for the generation of haematopoietic cells from endothelial cells. Genes Cells, 6(1), 13-23.
Yoshida, K., Toki, T., Okuno, Y., Kanezaki, R., Shiraishi, Y., Sato-Otsubo, A., . . . Suzuki, H. (2013). The landscape of somatic mutations in Down syndrome-related myeloid disorders. Nature genetics, 45(11), 1293-1299.
Young, A. L., Challen, G. A., Birmann, B. M., & Druley, T. E. (2016). Clonal haematopoiesis harbouring AML-associated mutations is ubiquitous in healthy adults. Nat Commun, 7, 12484. doi:10.1038/ncomms12484
Yu, J., Li, Y., Li, T., Li, Y., Xing, H., Sun, H., . . . Xie, X. (2020). Gene mutational analysis by NGS and its clinical significance in patients with myelodysplastic syndrome and acute myeloid leukemia. Experimental Hematology & Oncology, 9(1), 1-11.
Yuan, X., Song, M., Devine, P., Bruneau, B. G., Scott, I. C., & Wilson, M. D. (2018). Heart enhancers with deeply conserved regulatory activity are established early in zebrafish development. Nature communications, 9(1), 1-14.
Yusta, B., Somwar, R., Wang, F., Munroe, D., Grinstein, S., Klip, A., & Drucker, D. J. (1999). Identification of glucagon-like peptide-2 (GLP-2)-activated signaling pathways in baby hamster kidney fibroblasts expressing the rat GLP-2 receptor. Journal of Biological Chemistry, 274(43), 30459-30467.
Zeller, T., Wild, P., Szymczak, S., Rotival, M., Schillert, A., Castagne, R., . . . Cambien, F. (2010). Genetics and Beyond – The Transcriptome of Human Monocytes and Disease Susceptibility. PLoS One, 5(5), e10693. doi:10.1371/journal.pone.0010693
Zentner, G. E., Tesar, P. J., & Scacheri, P. C. (2011). Epigenetic signatures distinguish multiple classes of enhancers with distinct cellular functions. Genome Research, 21(8), 1273-1283.
Zhan, Y., Mariani, L., Barozzi, I., Schulz, E. G., Blüthgen, N., Stadler, M., . . . Giorgetti, L. (2017). Reciprocal insulation analysis of Hi-C data shows that TADs represent a functionally but not structurally privileged scale in the hierarchical folding of chromosomes. Genome Res, 27(3), 479-490. doi:10.1101/gr.212803.116
Zhang, J., Baran, J., Cros, A., Guberman, J. M., Haider, S., Hsu, J., . . . Kasprzyk, A. (2011). International Cancer Genome Consortium Data Portal—a one-stop shop for cancer genomics data. Database (Oxford), 2011. doi:10.1093/database/bar026
Zhang, Y., Chen, F., Fonseca, N. A., He, Y., Fujita, M., Nakagawa, H., . . . Creighton, C. J. (2020). High-coverage whole-genome analysis of 1220 cancers reveals hundreds of genes deregulated by rearrangement-mediated cis-regulatory alterations. Nature communications, 11(1), 1-14.
Zhang, Y., McCord, R. P., Ho, Y.-J., Lajoie, B. R., Hildebrand, D. G., Simon, A. C., . . . Dekker, J. (2012). Spatial organization of the mouse genome and its role in recurrent chromosomal translocations. Cell, 148(5), 908-921.
Zhao, Y., Du, Z., & Li, N. (2007). Extensive selection for the enrichment of G4 DNA motifs in transcriptional regulatory regions of warm blooded animals. FEBS letters, 581(10), 1951-1956.
Zhao, Z., Tavoosidana, G., Sjölinder, M., Göndör, A., Mariano, P., Wang, S., . . . Singh, U. (2006). Circular chromosome conformation capture (4C) uncovers extensive networks of epigenetically regulated intra-and interchromosomal interactions. Nature genetics, 38(11), 1341-1347.
Zhu, Q., Gao, P., Tober, J., Bennett, L., Chen, C., Uzun, Y., . . . Yu, W. (2020). Developmental trajectory of prehematopoietic stem cell formation from endothelium. Blood, The Journal of the American Society of Hematology, 136(7), 845-856.
Zink, F., Stacey, S. N., Norddahl, G. L., Frigge, M. L., Magnusson, O. T., Jonsdottir, I., . . . Stefansson, K. (2017). Clonal hematopoiesis, with and without candidate driver mutations, is common in the elderly. Blood, 130(6), 742-752. doi:10.1182/blood-2017-02-769869
Zuin, J., Dixon, J. R., van der Reijden, M. I., Ye, Z., Kolovos, P., Brouwer, R. W., . . . van IJcken, W. F. (2014). Cohesin and CTCF differentially affect chromatin architecture and gene expression in human cells. Proceedings of the National Academy of Sciences, 111(3), 996-1001.
Appendix I: Materials
All solutions were made using sterile milli-Q water unless otherwise stated. pH
adjustments were made using 1 M hydrochloric acid (HCl) (BDH Prolabo) or sodium
hydroxide (NaOH) (VMR Prolabo). All primers were synthesised by Integrated DNA
Technologies (IDT; Custom Science, NZ) unless otherwise specified.
Chemical reagents and consumables used in this project
Chemical (Abbreviation) Manufacturer
Agar Bacteriological Oxoid Ltd, UK
Ampicillin Sigma-Aldrich, USA
Adenosine Triphosphate (ATP) Sigma-Aldrich, USA
Agencourt® AMPure® XP beads Beckman Coulter
Bacto Tryptone Becton Dickinson and Co, USA
Bgl II restriction enzyme mix New England Biolabs, UK
Calcium Chloride VMR Prolabo, UK
Chloramphenicol Sigma-Aldrich, USA
CviQI restriction enzyme New England Biolabs, UK
CviQI reaction 10x Buffer New England Biolabs, UK
D-(+)-Glucose Sigma-Aldrich, USA
Dimethyl sulfoxide (DMSO) Sigma-Aldrich, USA
DNase Roche Applied Science, Germany
DMEM Gibco, Life Technology, USA
DpnI restriction enzyme New England Biolabs, UK
DpnI reaction 10x buffer New England Biolabs, UK
DpnII restriction enzyme New England Biolabs, UK
DpnII reaction 10x buffer New England Biolabs, UK
Dual-Glo® Luciferase Assay System Invitrogen, USA
Eco RI restriction enzyme New England Biolabs, UK
Eco RI reaction 10x buffer New England Biolabs, UK
Ethanol Scharlab, Spain
Ethyl 3-aminobenzoate methanesulfonate salt Sigma-Aldrich, USA
Ethylenediaminetetraacetic acid (EDTA) Fisher Scientific, USA
Fetal bovine serum (FBS), NZ origin Gibco, Life Technology, USA
Formaldehyde Sigma-Aldrich, USA
Gateway LR Clonase II enzyme mix Invitrogen, USA
Glacial acetic acid Analar Normapur, NZ
Glycerol VMR Prolabo, UK
Glycine Sigma-Aldrich, USA
Halocarbon oil 27 Sigma-Aldrich, USA
Hydrochloric acid BDH Prolabo, UK
IMDM Gibco, Life Technology, USA
T4 DNA ligase Invitrogen, USA
Ligase Buffer Invitrogen, USA
Linear Polyacrylamide (LPA) Invitrogen, USA
Lipofectamine 3000 Invitrogen, USA
Lithium Chloride Sigma-Aldrich, USA
Magnesium Sulfate Sigma-Aldrich, USA
Meganuclease I-Sce I enzyme mix Roche Applied Science, Germany
Methylcellulose Sigma-Aldrich, USA
Methylene Blue Sigma-Aldrich, USA
Na2HPO4.H2O Merck, Germany
NaH2PO4 Merck, Germany
OPTI-MEM reduced serum media Gibco, Life Technology, USA
Phosphate-buffered saline (PBS) solution Gibco, Life Technology, USA
pCR8/GW/TOPO TA Cloning kit Life Technology, USA
Penicillin Streptomycin Gibco, Life Technology, USA
PfuUltra HF DNA polymerase Agilent Technologies, USA
PfuUltra HF DNA polymerase reaction 10x buffer Agilent Technologies, USA
Phenol:Chloroform:Isoamyl Alcohol 25:24:1 Sigma-Aldrich, USA
Platinum Taq Polymerase Life Technology, USA
PMA Sigma-Aldrich, USA
Potassium Chloride BDH Prolabo, UK
Proteinase K Invitrogen, USA
P3000 Life Technology, USA
qScript cDNA SuperMix Quanta BioSciences, USA
Q5® Hot Start High-Fidelity 2x Master mix New England Biolabs
RNaseA ThermoFisher, USA
Sodium Bicarbonate Sigma-Aldrich, USA
Sodium Chloride VMR Prolabo, UK
Sodium Dodecyl Sulfate Sigma-Aldrich, USA
Sodium Hydroxide VMR Prolabo, UK
Spectinomycin Sigma-Aldrich, USA
SYBR® Green ThermoFisher, USA
Triton X-100 Sigma-Aldrich, USA
Trypan blue stain Sigma-Aldrich, USA
Trypsin-EDTA (0.5%), No phenol red Gibco, Life Technology, USA
Ultrapure Agarose Life Technology, USA
Ultrapure DNase – RNase free distilled water Life Technology, USA
X-gal Sigma-Aldrich, USA
Yeast extract Oxoid Ltd, UK
1 kb Plus DNA ladder Invitrogen, USA
1 M Tris (pH 8) J.T Baker, USA
10x Transcription Buffer Roche Applied Science, Germany
10 mM dNTPs Invitrogen, USA
Solutions and buffers
Luria Broth (LB) medium: 10 g Bacto Tryptone (Becton Dickinson and Co), 5 g Yeast extract (Oxoid Ltd), 10 g Sodium Chloride (NaCl) (VMR Prolabo); make up to 1 litre with milli-Q water. Sterilise by autoclaving.
LB agar: LB medium with 1.5 % Agar Bacteriological (Oxoid Ltd), added before autoclaving. Sterilise by autoclaving.
70 % Ethanol: Absolute ethanol diluted in ddH2O to a final concentration of 70 %.
1x Tris-Acetate-EDTA (TAE) buffer: 1 M Tris (pH 8) (J.T Baker), glacial acetic acid (Analar Normapur) and 0.5 M ethylenediaminetetraacetic acid (EDTA) (pH 8) (Fisher Scientific) added to milli-Q H2O at final concentrations of 40 mM, 20 mM and 1 mM, respectively.
1 % Agarose gel: 1 % Ultrapure Agarose (Life Technology) in 1x TAE. For a 60 ml gel
1x E3: 5 mM Sodium Chloride (NaCl) (VMR Prolabo), 0.17 mM Potassium Chloride (KCl) (BDH Prolabo), 0.33 mM CaCl2 (VMR Prolabo), 0.33 mM Magnesium Sulfate (MgSO4) (Sigma-Aldrich), 10 % (v/v) Methylene Blue (Sigma-Aldrich).
Tricaine: 400 mg Tricaine powder (Ethyl 3-aminobenzoate methanesulfonate salt) (Sigma-Aldrich), 97.9 ml ddH2O, ~2.1 ml 1 M Tris pH 9. Adjust to pH 7. Store in freezer.
3 % Methylcellulose: 3 % Methylcellulose (Sigma-Aldrich) in 1x E3 solution.
Cryoprotectant solution: 20 % DMSO in FBS.
Lysis buffer (10 mL): 100 µL of 1 M Tris-HCl (pH 8.0), 20 µL of 5 M NaCl, 200 µL of 20 % NP-40, 1 tablet of protease inhibitors (Roche Diagnostics).
3 M NaOAc pH 5.6: 246.1 g sodium acetate (NaOAc); make up to 1 litre with milli-Q water. Adjust to pH 5.6. Filter sterilise the solution.
2.5 M glycine: 75 g glycine; make up to 1 litre with milli-Q water. Filter sterilise the solution.
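Several of the recipes above can be cross-checked with two standard relations: mass = molarity × volume × molar mass, and C1V1 = C2V2 for dilutions from a stock. The Python sketch below is illustrative only. The function names are hypothetical, the molar mass of anhydrous sodium acetate (82.03 g/mol) is an assumed value that is not stated in this appendix, and the lysis buffer calculation assumes the components are made up to the stated 10 mL.

```python
def grams_needed(molarity_mol_per_l, volume_l, molar_mass_g_per_mol):
    """Mass of solute (g) required for a solution of the given molarity and volume."""
    return molarity_mol_per_l * volume_l * molar_mass_g_per_mol


def stock_volume(stock_conc, final_conc, final_volume):
    """Volume of stock needed (C1V1 = C2V2); result is in the units of final_volume."""
    return final_conc * final_volume / stock_conc


def final_concentration(stock_conc, added_volume, total_volume):
    """Concentration after diluting added_volume of stock into total_volume."""
    return stock_conc * added_volume / total_volume


# Assumed molar mass of anhydrous sodium acetate (not given in the appendix).
SODIUM_ACETATE_G_PER_MOL = 82.03

# 3 M NaOAc, 1 litre: about 246.1 g, matching the recipe above.
print(round(grams_needed(3.0, 1.0, SODIUM_ACETATE_G_PER_MOL), 1))

# 1x TAE, per litre: mL of 1 M Tris stock for 40 mM, mL of 0.5 M EDTA stock for 1 mM.
print(stock_volume(1000, 40, 1000), stock_volume(500, 1, 1000))

# Lysis buffer, assuming a 10 mL (10,000 uL) final volume: mM Tris-HCl and mM NaCl.
print(final_concentration(1000, 100, 10_000), final_concentration(5000, 20, 10_000))
```

All concentrations passed to the dilution helpers are in mM and must share units within a single call.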
Appendix II: Plasmid maps
Plasmid maps of the constructs used in this study are presented below. The map of the Zebrafish
enhancer detector vector (ZED) was obtained from Bessa et al. (2009).
The map of PCS-TP was obtained from the Kawakami Lab website
(https://ztrap.nig.ac.jp/map2.html). pGL4.23-GW was obtained from Addgene
(https://www.addgene.org/60323/). The pRL-TK map is from the Promega pRL Renilla
Luciferase Control Reporter Vectors website
(https://worldwide.promega.com/products/luciferase-assays/genetic-reporter-vectors-and-cell-lines/prl-renilla-luciferase-control-reporter-vectors/?catNum=E2231).
The map of pUC19 was obtained from Addgene (https://www.addgene.org/browse/sequence/74677/).
Zebrafish enhancer detector vector (ZED)
Appendix III: Primer list
Appendix III Table 1: Details of the primers used in this project
Primer name            Direction  Sequence
ZED insert confirmation
mCNE1                  forward    AAAGGGAACAAAAGCTGGAG
mCNE2                  reverse    AGTCGCTTCTCTTCGGTTGA
pGL4.23-GW insert confirmation
RVprimer3              forward    CTAGCAAAATAGGCTGTCCC
Runx1 +24 Site-Directed Mutagenesis
Runx1 +24 g->t         forward    CGTGTCAGGAAGGGTGTGTGTGGAACTTACACCTC
Runx1 +24 g->t         reverse    GAGGTGTAAGTTCCACACACACCCTTCCTGACACG
Runx1 +24 t->a         forward    AGGTGTAAGTTCCTCCCCCTCCCTTCCTGACAC
Runx1 +24 t->a         reverse    GTGTCAGGAAGGGAGGGGGAGGAACTTACACCT
qRT-PCR primers
Human GAPDH            forward    TGCACCACCAACTGCTTAGC
Human GAPDH            reverse    GGCATGGACTGTGGTCATGAG
Human H-CYCLOPHILIN    forward    ACGGCGAGCCCTTGG
Human H-CYCLOPHILIN    reverse    TTTCTGCTGTCTTTGGGACCT
Human RUNX1 P1         forward    TTTTCAGGAGGAAGCGATGG
Human RUNX1 P1         reverse    TGGCATCGTGGACGTCTCTA
Human RUNX1 P2         forward    TTGTGATGCGTATCCCCGTA
Human RUNX1 P2         reverse    TCATCATTGCCAGCCATCAC
Human RUNX1 total      forward    AGACCCTGCCCATCGCTTTC
Human RUNX1 total      reverse    GGTTCTTCATGGCTGCGGTA
Human ERG              forward    CGCAGAGTTATCGTGCCAGCAGAT
Human ERG              reverse    CCATATTCTTTCACCGCCCACTCC
Human GATA1            forward    TTGCCACATCCCCAAGGCGG
Human GATA1            reverse    GGGGGAGGGGCTCTGAGGTC
Human KLF1             forward    TGACTTCCTCAAGTGGTGGC
Human KLF1             reverse    GGTGAGGAGGAGATCCAGGT
4C bait primers
Human RUNX1 P1         forward    AACCAAGTCTGTCAGTCTCCTG
Human RUNX1 P1         reverse    GCTAACAAGCAGAGGAAGCAA
Human RUNX1 +24        forward    CACGTGTCCACATTTTTCAGA
Human RUNX1 +24        reverse    CAGTTACTTCGCCCATTGCT
Human RUNX1 P2         forward    GCCGGCTCCTGGAATTGG
Human RUNX1 P2         reverse    AGGATGTGCGTGCGTGTGTA
Human ERG +85          forward    CTCGCTGCTTAGGTGTAGTCC
Human ERG +85          reverse    TGGGGGAGACTGACAAATAGA
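Each site-directed mutagenesis pair in the table consists of a forward primer and a reverse primer that are exact reverse complements of one another, and this can be confirmed programmatically. The following Python sketch is an illustrative check only; the helper name and the dictionary layout are hypothetical and were not part of the original workflow.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Site-directed mutagenesis primer pairs copied from Appendix III Table 1.
SDM_PRIMERS = {
    "Runx1 +24 g->t": (
        "CGTGTCAGGAAGGGTGTGTGTGGAACTTACACCTC",
        "GAGGTGTAAGTTCCACACACACCCTTCCTGACACG",
    ),
    "Runx1 +24 t->a": (
        "AGGTGTAAGTTCCTCCCCCTCCCTTCCTGACAC",
        "GTGTCAGGAAGGGAGGGGGAGGAACTTACACCT",
    ),
}

for name, (forward, reverse) in SDM_PRIMERS.items():
    # True when the reverse primer is the exact reverse complement of the forward primer.
    print(name, reverse_complement(forward) == reverse)
```

The same helper can be applied to any other sequence in the table, for example to orient a reverse primer on the forward strand before aligning it.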
Appendix IV: Prediction of G-quadruplexes
During my master’s work, the +24 Runx1 mouse regulatory region was sequenced using
forward (AAAGGGAACAAAAGCTGGAG) and reverse (AGTCGCTTCTCTTCGGTTGA) primers.
Despite multiple attempts with different primers, the region could never be sequenced
all the way through. When the reads were compared with the mm9 +24 regulatory
element sequence provided by Dr. Motomi Osato, they were found to stop at the
same position every time. The online tool QGRS Mapper was therefore used to predict
whether the region contains a G-quadruplex structure (Kikin et al., 2006). The Osato +24
mouse regulatory region sequence is shown below. The area that could not be sequenced is
underlined, and the probable G-quadruplex is highlighted in yellow. The most likely
G-quadruplex is GGTGGGGGTGG (reverse complement CCACCCCCACC). The C-rich
stretches could also form i-motifs; however, these are less likely to be biologically
relevant because their formation depends on the cell’s pH.
Forward sequence
TCAGCTCTGCACTGCACTAAGGAGTTTTTTATGGAAAGCAATGGTCCCAGT
GGCTGAAGGGTCCAGCAAGTGGCTGGTAGAGCCAGCAAGGAGCGATGGAGGGA
TGGTGTGAGGAGGAGACAGGAAGGGAGGCGGTGACACAGGCCTTCACAGGGCC
GCTGCAGCTAGGAACTGCTGCAAAAGCAAGCTGCCCACGTTATCAGTGGCGCGC
AGGCCTGCGGTTTTCTCGCTCTTGCAACCCGGCTTCAACTGCCGGTTTATTTTTCG
ACAAACAGGATGCCTCCATCTGAGGCTGTCAGCATCCCAGGCTCTTTGAGAAGA
AAAGAGGTAGTGAGGGCCCCACCCTAGAGCCCAGCCGGGAGGGGTGGGTGAGA
GGTCCTGTTTCTAGATGCTTCCCAGAGAAGTGAGAACCTAGCAGGTGCAGTGGCC
AGGGTTCAGGAGCGTGTCAGGAAGGGTGGGGGTGGAACTTACACCTCCCACCCC
CACCAGACTTAAATGGACTCCCCACCTCTGCATGGGTCCCAAGGCT
Reverse sequence
AGCCTTGGGACCCATGCAGAGGTGGGGAGTCCATTTAAGTCTGGTGGGGG
TGGGAGGTGTAAGTTCCACCCCCACCCTTCCTGACACGCTCCTGAACCCTGGCCA
CTGCACCTGCTAGGTTCTCACTTCTCTGGGAAGCATCTAGAAACAGGACCTCTCA
CCCACCCCTCCCGGCTGGGCTCTAGGGTGGGGCCCTCACTACCTCTTTTCTTCTCA
AAGAGCCTGGGATGCTGACAGCCTCAGATGGAGGCATCCTGTTTGTCGAAAAAT
AAACCGGCAGTTGAAGCCGGGTTGCAAGAGCGAGAAAACCGCAGGCCTGCGCG
CCACTGATAACGTGGGCAGCTTGCTTTTGCAGCAGTTCCTAGCTGCAGCGGCCCT
GTGAAGGCCTGTGTCACCGCCTCCCTTCCTGTCTCCTCCTCACACCATCCCTCCAT
CGCTCCTTGCTGGCTCTACCAGCCACTTGCTGGACCCTTCAGCCACTGGGACCAT
TGCTTTCCATAAAAAACTCCTTAGTGCAGTGCAGAGCTGA
When the overlapping sequence data were examined, a small stretch of sequence could
not be read (underlined area). Two predicted quadruplex structures sit on either side of the
region that could not be sequenced.
AGCCTTGGGACCCATGCAGAGGTGGGGAGTCCATTTAAGTCTGGTGGGGG
TGGGAGGTGTAAGTTCCACCCCCACCCTTCCTGACACGCTCCTGAACCCTGGCCA
CTGCACCTGCTAGGTTCTCACTTCTCTGGGAAGCATCTAGAAACAGGACCTCTCA
CCCACCCCTCCCGGCTGGGCTCTAGGGTGGGGCCCTCACTACCTCTTTTCTTCTCA
AAGAGCCTGGGATGCTGACAGCCTCAGATGGAGGCATCCTGTTTGTCGAAAAAT
AAACCGGCAGTTGAAGCCGGGTTGCAAGAGCGAGAAAACCGCAGGCCTGCGCG
CCACTGATAACGTGGGCAGCTTGCTTTTGCAGCAGTTCCTAGCTGCAGCGGCCCT
GTGAAGGCCTGTGTCACCGCCTCCCTTCCTGTCTCCTCCTCACACCATCCCTCCAT
CGCTCCTTGCTGGCTCTACCAGCCACTTGCTGGACCCTTCAGCCACTGGGACCAT
TGCTTTCCATAAAAAACTCCTTAGTGCAGTGCAGAGCTG
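As a rough illustration of how candidate G-quadruplex motifs can be located in sequences such as those above, the Python sketch below scans both strands with a simplified regular expression: four runs of at least two guanines separated by loops of 1 to 7 nucleotides. This is not the QGRS Mapper algorithm and performs no scoring; the pattern, the loop limits and the function names are assumptions chosen for illustration.

```python
import re

# Simplified G4 pattern (assumption): four G-runs (>= 2 Gs) separated by 1-7 nt loops.
G4_PATTERN = re.compile(r"(?:G{2,}[ACGT]{1,7}?){3}G{2,}")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_g4_candidates(seq: str):
    """Return (strand, start, motif) for non-overlapping pattern matches on both strands.

    Start positions are 0-based and relative to the strand that was scanned.
    """
    seq = seq.upper()
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in G4_PATTERN.finditer(strand_seq):
            hits.append((strand, match.start(), match.group()))
    return hits


# Short fragment of the forward sequence spanning the predicted quadruplex.
fragment = "GTCAGGAAGGGTGGGGGTGGAACTTACACC"
for strand, start, motif in find_g4_candidates(fragment):
    print(strand, start, motif)
```

Because C-rich stretches on one strand correspond to G-rich stretches on the other, scanning the reverse complement also flags the regions where i-motifs could form.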
Appendix V: Sequencing report
Libraries were demultiplexed by Otago Genomics Facility (OGF). A summary of the
sequencing for this project is below. The sequencing metrics are within Illumina's performance
specifications for low diversity libraries.
Appendix VI: Publication
Some of the results from Chapter 3 contributed to Marsman, J., Thomas, A., Osato, M.,
O'Sullivan, J. M., & Horsfield, J. A. (2017). A DNA contact map for the mouse Runx1 gene
identifies novel haematopoietic enhancers. Scientific Reports, 7, 13347
Scientific Reports | 7: 13347 | DOI:10.1038/s41598-017-13748-8
www.nature.com/scientificreports
A DNA Contact Map for the Mouse Runx1 Gene Identifies Novel Haematopoietic Enhancers
Judith Marsman1, Amarni Thomas1, Motomi Osato2,3, Justin M. O’Sullivan4,5 & Julia A. Horsfield1,5
The transcription factor Runx1 is essential for definitive haematopoiesis, and the RUNX1 gene is frequently translocated or mutated in leukaemia. Runx1 is transcribed from two promoters, P1 and P2, to give rise to different protein isoforms. Although the expression of Runx1 must be tightly regulated for normal blood development, the mechanisms that regulate Runx1 isoform expression during haematopoiesis remain poorly understood. Gene regulatory elements located in non-coding DNA are likely to be important for Runx1 transcription. Here we use circular chromosome conformation capture sequencing to identify DNA interactions with the P1 and P2 promoters of Runx1, and the previously identified +24 enhancer, in the mouse multipotent haematopoietic progenitor cell line HPC-7. The active promoter, P1, interacts with nine non-coding regions that are occupied by transcription factors within a 1 Mb topologically associated domain. Eight of nine regions function as blood-specific enhancers in zebrafish, of which two were previously shown to harbour blood-specific enhancer activity in mice. Interestingly, the +24 enhancer interacted with multiple distant regions on chromosome 16, suggesting it may regulate the expression of additional genes. The Runx1 DNA contact map identifies connections with multiple novel and known haematopoietic enhancers that are likely to be involved in regulating Runx1 expression in haematopoietic progenitor cells.
Runx1 is a key regulator of haematopoietic development. Deletion of Runx1 in mouse embryos is lethal in embryonic stage (E) 12.5 due to the complete absence of definitive blood cell progenitors accompanied by extensive haemorrhaging [1,2]. Runx1 is crucial for haematopoietic stem cell (HSC) emergence and maintenance during development [3], since conditional ablation of Runx1 in adult mice results in HSC exhaustion [4]. In acute myeloid leukaemia (AML) and myelodysplastic syndrome, RUNX1 function is frequently altered through mutations or translocations [5], resulting in dysregulation of its target genes. While mutations directly affecting the RUNX1 protein are common in leukaemia, mutations in regulatory elements that affect RUNX1 expression remain enigmatic. As yet unidentified mutations in regulatory elements, such as enhancers, could alter Runx1 expression, resulting in abnormal haematopoiesis.
Runx1 is transcribed from two promoters, P1 and P2, to give rise to different protein isoforms [6]. Expression of these isoforms is tightly controlled during haematopoiesis. At the onset of mouse haematopoiesis (E7.5), preceding the generation of HSCs, expression of the P2 isoform(s) is predominant [7,8]. P1 is expressed soon after P2, and its expression is synchronised with the generation of HSCs [7,9]. P1 expression is predominant in the mouse fetal liver, the main site of definitive haematopoietic stem/progenitor cell (HSPC) development from E12.5 onward [9].
Regulatory elements, such as enhancers, can control the expression of genes via long-range chromatin interactions [10]. One previously identified Runx1 enhancer is located 24 kb downstream of the P1 transcriptional start site [11,12]. The +24 enhancer (also known as the +23 [13], or the +23.5 [12] enhancer) is active in HSCs that express Runx1 during mouse embryogenesis [11–13]. The human equivalent of the +24 enhancer (+32 kb downstream of P1) directly contacts the promoters of RUNX1 in leukaemia cell lines [14]. In addition to the +24 enhancer, putative regulatory elements for RUNX1 have been identified upstream of RUNX1-P1 and between P1 and P2; however, whether they directly contact the RUNX1 promoters has not been investigated [15,16].
1Department of Pathology, Dunedin School of Medicine, University of Otago, Dunedin, New Zealand. 2Cancer Science Institute, National University of Singapore, Singapore, 117599, Singapore. 3International Research Center for Medical Sciences, Kumamoto University, Kumamoto, Japan. 4Liggins Institute, The University of Auckland, Private Bag, 92019, Auckland, New Zealand. 5Maurice Wilkins Centre for Molecular Biodiscovery, The University of Auckland, Private Bag, 92019, Auckland, New Zealand. Correspondence and requests for materials should be addressed to J.A.H. (email: [email protected])
Received: 17 March 2017
Accepted: 2 October 2017
Published: xx xx xxxx
Here we used circular chromosome conformation capture sequencing (4C-seq) to identify regulatory elements that interact with an active Runx1 P1 promoter, versus an inactive P2 promoter. While Hi-C provides genome-wide contact profiles, and Capture Hi-C enriches for interactions with preselected genomic features (usually promoters), 4C-seq can generate very high resolution contact profiles from ‘baits’ of particular interest [17]. Therefore, 4C-seq can yield richer information about a selected genomic region than Hi-C or Capture Hi-C. HPC-7 is a well characterised mouse HSPC line with genomic annotations, including transcription factor (TF) binding, histone modifications and chromatin accessibility [18–20]. 4C-seq in HPC-7 cells identified nine haematopoietic enhancers that interact with the P1 promoter and +24 enhancer, and that are occupied by haematopoietic TFs. Four of these were previously identified, three of which were functionally tested and two out of the three showed activity during mouse haematopoiesis [15]. Here we assessed the activity of all nine enhancers in zebrafish, and show that eight are active in zebrafish haematopoiesis. Further, the +24 enhancer was highly interactive both within a topologically associated domain (TAD) harbouring Runx1, as well as with loci outside the TAD. Collectively, our results point to the formation of a local ‘active chromatin hub’ controlling Runx1 expression in haematopoietic cells.
Results
We first confirmed that P1 is actively expressed in HPC-7 cells, while P2 is silent (Supplementary Fig. S1). 4C-seq in HPC-7 cells using the P1 and P2 promoters and the +24 enhancer as ‘baits’ identified genomic interactions at Runx1 (Fig. 1a). Bait locations were designed taking into account cohesin and CTCF binding sites near both promoters and the +24 enhancer. 4C baits were designed to regions of interest (P1, +24, P2), allowing for comparison of interactions between the active P1 promoter and inactive P2 promoter, with secondary baits located at nearby cohesin/CTCF (cc) binding sites (P1cc, +24cc, P2cc) (Fig. 1a).
4C-seq in HPC-7 cells confirms the presence of a 1.1 Mb domain harbouring Runx1. For each bait, two replicate 4C-seq libraries and one control library were sequenced. Reads were predominantly located within a 1.1 Mb region surrounding the Runx1 gene (Fig. 1b,c), representing a TAD harbouring Runx1. From visual inspection of the contact profile (Fig. 1b,c), domain boundaries appear to be present at the Cbr1/Setd4 genes upstream of Runx1, and Clic6 downstream (Fig. 1b). A comparison with existing Hi-C data from mouse embryonic stem cells, mouse CH12 (erythroleukaemia) cells, human GM12878 (lymphoblastoid), human K562 (myeloid leukaemia) and human IMR90 (foetal lung) cells revealed that TAD boundaries are conserved, and that our 4C data is consistent with existing Hi-C data (Supplementary Fig. S2). Most P1 and +24 interactions take place upstream of the Runx1 gene (Fig. 1b,c). In contrast, there are fewer upstream interactions from P2, while downstream interactions are retained (Fig. 1b,c). There are no other coding genes within the Runx1 domain; however, three inactive non-coding genes [19], Mir802 and the long non-coding RNAs 1810053B23Rik and 1700029J03Rik, are located near the upstream border of the TAD.
Interactions from ‘cc’ baits (cohesin/CTCF binding sites) were similar to their nearby corresponding baits at P1, +24 and P2 (Fig. 1b). We investigated whether significant interactions for the ‘cc’ baits have more overlap with other cohesin or CTCF binding sites than for the non ‘cc’ baits, but did not find any difference (data not shown). The ‘cc’ baits may be too close to the corresponding P1, P2 and +24 baits to resolve unique interactions, although they are separated by at least one restriction fragment.
Chromatin interactions anchored by the Runx1 P1 and P2 promoters and +24 enhancer. Most of the significant interactions for P1, P2 and the +24 enhancer occur within the 1.1 Mb domain (Fig. 1b). There are ~40–50 significant interactions for P1cc/P1 baits, ~60–75 for +24/+24cc baits, and ~30 for P2/P2cc baits (Fig. 1d). Strikingly, the +24 enhancer in particular forms many significant interactions outside the Runx1 domain (Fig. 1b,d). Many of these long-range interactions are with other genes or gene promoters on mouse chromosome 16 (Fig. 2). Expression levels of these genes in human tissues according to Genotype-Tissue Expression (GTEx, https://www.gtexportal.org/home/) [21] did not reveal obvious tissue-specific expression patterns. However, among the genes contacted by +24 are Erg, a haematopoietic TF, and Tiam1 (T-cell lymphoma invasion and metastasis 1), involved in cell adhesion and cell migration. These distant connections suggest that +24 may also regulate other haematopoietic genes in adjacent domains.
Identification of haematopoietic enhancers. We hypothesised that DNA connections formed from the active P1 promoter and the +24 enhancer may correspond to haematopoietic enhancers that regulate Runx1 expression. To identify putative enhancers, we aligned significantly interacting sites with the occupancy of thirteen TFs involved in haematopoietic progenitor cell production; enhancer histone modifications and DNase I hypersensitivity sites [18–20]; and conserved non-coding elements (CNE), which were identified based on comparative genomics alignment and retroviral integration site mapping to indicate potential DNase I hypersensitive sites [11]. We note that the +24 enhancer binds all thirteen haematopoietic progenitor TFs in HPC-7 cells (Fig. 3).
We selected putative enhancers based on the binding of at least six TFs involved in haematopoietic progenitor cell development (Fig. 3), and the presence of a significant interaction directly at, or within 2 kb, of the TF binding cluster. Based on these criteria, we found eight other potential enhancers within the Runx1 domain that form connections to the baits. These enhancers were named according to their distance from the P1 transcriptional start site: −371, −354, −327, −321, −303, −58, −48, and +110 (Fig. 3b). An additional interacting region at −368, a CNE, was located adjacent to a cluster of haematopoietic TFs (Fig. 3b). The +24 and +110 are also CNEs [11]. The putative enhancers either form distinct blocks upstream of Runx1-P1 (from −371 to −303 and −48 to −58), or fall between the P1 and P2 promoters (+24 and +110). Four out of nine of the identified putative enhancers (−327, −321, −58 and +110) were identified previously based on the binding of at least three blood TFs and the presence of a H3K27ac peak (denoting active chromatin) from a subset of the same HPC-7 datasets that we have used here [15].
Figure 1. Runx1 interaction profile reflects the activity of the P1 and P2 promoters and the +24 enhancer. (a) Location of 4C baits at the mouse Runx1 gene (mm10). Baits are P1cc, P1, +24, +24cc, P2cc and P2; ‘cc’: cohesin and CTCF binding site. ChIP-seq data of Rad21 (blue) and CTCF (pink) in MEL and CH12 cells were obtained from ENCODE [51] and in HPC-7 cells from refs [18,19]. Exons are in grey and the +24 enhancer is in green. (b) 4C-seq profile (reads per million normalised running mean of nine successive DpnII digestion fragments) for one replicate per bait with significant interactions (the top fifth percentile of interactions with a FDR of <0.01) that overlap in both replicates shown in a ~2.5 Mb region surrounding the Runx1 gene. The highest significant interactions (see Methods) are in red and other significant interactions in orange. Red dotted lines indicate the location of estimated domain boundaries. (c) Read distribution for both replicates of each bait. Line graphs represent the cumulative percentage of reads on chromosome 16 (y-axis) versus the distance from the bait (x-axis), shown in genomic coordinates on chromosome 16 (genome version mm10). The 0-value of each line indicates the location of each bait. (d) The number of significant interactions that overlap in both replicates inside (light grey) and outside (dark grey) the domain for each bait.
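The Figure 1b caption describes the 4C-seq profile as a reads-per-million normalised running mean over nine successive DpnII fragments. The Python sketch below illustrates that kind of smoothing for a list of per-fragment read counts in genomic order. It is not the pipeline used in the study: normalising to the total of the supplied counts and shortening the window at the ends of the list are assumptions made for the example, and the function names are hypothetical.

```python
def reads_per_million(counts):
    """Scale raw per-fragment read counts to reads per million (RPM).

    Assumption: the library size is the sum of the supplied counts.
    """
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [count * 1_000_000 / total for count in counts]


def running_mean(values, window=9):
    """Centred running mean over `window` successive fragments.

    Assumption: windows are truncated at the ends of the list rather than padded.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical read counts for consecutive restriction fragments.
raw_counts = [0, 3, 12, 40, 7, 0, 1, 25, 60, 18, 2, 0]
profile = running_mean(reads_per_million(raw_counts), window=9)
print([round(value) for value in profile])
```

Smoothing over several fragments reduces the fragment-to-fragment noise that is typical of 4C data, at the cost of some spatial resolution.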
Eight out of nine putative enhancers (−371, −368, −354, −327, −321, −58, −48, and +110) form long-range interactions with the P1 promoter, the +24 enhancer, or both (Fig. 3b and Supplementary Table S1). The excep-tion was −303, which does not interact with P1, but instead with the +24 and the P2 promoter. The +24 enhancer interacts promiscuously within the whole domain. Filtering at maximum stringency (see Methods) showed that the +24 enhancer connects to all putative enhancers and both Runx1 promoters (Fig. 3b). In contrast, the inac-tive P2 promoter connects only to −303 and +24 (Fig. 3b and Supplementary Table S1). A specific interaction between +110 and P2 could not be resolved owing to a contiguous block of significant interactions throughout the short region between +110 and P2. A model of Runx1 interactions based on the 4C-seq results is shown in Fig. 4.
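The selection criteria used in this section (binding of at least six haematopoietic TFs, plus a significant interaction at or within 2 kb of the TF binding cluster) amount to a simple interval filter. The Python sketch below illustrates the logic with hypothetical coordinates and data structures; none of the values are taken from the study, and the actual analysis may have been implemented differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open genomic interval [start, end) on a single chromosome."""
    start: int
    end: int


def gap(a: Interval, b: Interval) -> int:
    """Distance in bp between two intervals; 0 if they overlap or touch."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def select_putative_enhancers(tf_clusters, interactions, min_tfs=6, max_gap=2000):
    """Keep TF clusters bound by >= min_tfs TFs with a significant interaction within max_gap bp.

    tf_clusters: iterable of (Interval, number_of_bound_TFs)
    interactions: iterable of Interval (significant 4C interactions)
    """
    interactions = list(interactions)
    selected = []
    for region, tf_count in tf_clusters:
        if tf_count < min_tfs:
            continue
        if any(gap(region, hit) <= max_gap for hit in interactions):
            selected.append(region)
    return selected


# Hypothetical example data (coordinates are arbitrary).
clusters = [
    (Interval(1_000, 1_500), 7),     # enough TFs, interaction 1.5 kb away: kept
    (Interval(10_000, 10_400), 4),   # overlaps an interaction, but too few TFs
    (Interval(50_000, 50_500), 9),   # enough TFs, nearest interaction 9.5 kb away
]
hits = [Interval(3_000, 3_300), Interval(10_100, 10_200), Interval(60_000, 60_500)]
print(select_putative_enhancers(clusters, hits))
```

For genome-scale inputs, the pairwise comparison would normally be replaced with a sorted or indexed interval search.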
Long range chromatin interactions at haematopoietic enhancers. Long-range chromatin inter-actions can be mediated by cohesin and CTCF22, and cohesin is involved in transcription regulation at active genes23. We found that four of the nine enhancer loci (in addition to the +24 enhancer) coincide with Rad21 (cohesin) binding in the absence of CTCF (Fig. 3b and Supplementary Table S1). This is consistent with the idea that cohesin (but not necessarily CTCF) mediates local DNA-DNA interactions within TADs22. All Rad21 binding sites interacted with at least one ‘cc’ bait, therefore cohesin could mediate at least a subset of enhancer-promoter communication events in HPC-7 cells.
We compared the 4C interactions identified with recently published Capture Hi-C data in HPC-7 cells18 (Supplementary Fig. S3). Capture Hi-C data was only available for interactions anchored at P1, and has a lower coverage and resolution than our 4C-seq study (an average of ~18,000 reads per promoter for Capture Hi-C with a 6-cutter, and over 1 million reads per bait for 4C-seq with a 4-cutter). The Capture Hi-C study in HPC-7 cells identified 15 P1-interacting regions that were reproduced in our study (Supplementary Fig. S3). All of these are upstream of Runx1-P1, and most are within the −368 to −303 enhancer cluster (Supplementary Fig. S3). Therefore, our study provides additional Runx1-anchored interactions that were not previously described as connected to Runx1 promoters (enhancers −371, −48, −58 and +110).
Figure 2. Interactions of the Runx1 locus with other genes. Significant interactions that overlap with gene bodies or gene promoters (starting from 2 kb upstream of the gene transcriptional start site) are shown for the P1cc/P1 (a), +24/ +24cc (b) and P2cc/P2 baits (c) (assembly mm10).
In vivo characterization of haematopoietic enhancers. Enhancer regions interacting with Runx1 recruit haematopoietic TFs in HPC-7 cells; we therefore tested whether these regions act as enhancers in vivo. Each of the putative enhancers was tested for the ability to drive tissue-specific GFP expression in zebrafish embryos. Eight out of nine drove GFP expression specifically in the intermediate cell mass and posterior blood island, which are sites of haematopoietic progenitor cell production at 20–24 hours post-fertilisation (hpf)24 (Fig. 5a,b and Supplementary Fig. S4). Two of these (−58 and +110) were previously shown to be active during mouse haematopoiesis15. The −303 enhancer also expressed GFP in keratinocytes, particularly after 24 hpf (Figs 5c, S4). Interestingly, −303 was the only enhancer identified that interacted with the P2 promoter of Runx1, rather than P1. Despite the occupancy of multiple haematopoietic TFs in HPC-7 cells, and the presence of similar TF binding motifs compared to the other enhancers (Fig. 3 and Supplementary Tables 1 and 2), we did not observe enhancer activity for −327, consistent with a previous study in mice15.
In summary, we have assigned in vivo function to multiple putative enhancers upstream of Runx1 that were previously identified in silico18. These regions not only drive haematopoietic expression, but also physically connect with the Runx1-P1 promoter, lending confidence to the concept that they are bona fide regulators of Runx1 transcription.
Figure 3. The active P1 promoter and +24 enhancer interact with clusters of haematopoietic transcription factor binding sites. Significant interactions within the domain for each bait are shown together with the reference genes (mm9), CNEs11, TF binding sites (green), Rad21 and CTCF binding sites (black), H3K27ac, H3K4me3 and DNaseI hypersensitivity sites in HPC-7 cells18–20. A 3 Mb region (a) and two zoomed-in views (b) are shown. Locations of TF binding clusters containing at least six TF binding sites at the same location, and a significant interaction directly at or within 2 kb of each cluster, are indicated by arrows and named according to their distance from the P1 transcriptional start site. See also Supplementary Fig. S3.
Discussion
4C analysis in HPC-7 cells generated a high-resolution connectivity map of the genomic region harbouring Runx1, and confirmed previously identified upstream connections from P1 to putative regulatory elements18. Runx1 appears to be contained within a ~1 Mb chromatin domain, consistent with Hi-C analyses in other cell types25,26. We observed multiple connections both up- and downstream from all 4C-seq baits (Runx1-P1, Runx1-P2 and +24 enhancer). Significantly, there were many more upstream connections anchored by the active elements (Runx1-P1 and +24).
Surprisingly, the +24 enhancer forms many up- and downstream connections outside of the Runx1 TAD. These comprise up to one-third of all connections formed and include other haematopoietic genes, such as Tiam1 downstream, and Erg upstream. Erg and Tiam1 dysregulation is associated with several tumor types, including AML and B- and T-cell lymphomas27,28. Strikingly, these genes are located over 2 Mb away from Runx1. These findings raise the interesting possibility that the +24 enhancer acts as a scaffold to recruit multiple promoters, enhancers and TFs over long distances in cis. Connections to +24 identified by our 4C-seq may point to the identity of some of these genes.
Our 4C-seq data identified multiple chromatin connections that intersect with some of the strongest indicators of upstream enhancers characterised by the binding of clusters of TFs and epigenetic modifications18,20. Eight out of nine of these DNA regions are able to drive haematopoietic expression in zebrafish, strongly suggesting an in vivo function. Although previous studies had identified transcriptional activity for a subset of these enhancers15, or an interaction with the P1 promoter for an overlapping subset18, no single prior study has shown that enhancers both interact with Runx1 promoters (and +24), and have transcriptional activity. Table 1 provides an overview of findings from these previous studies together with additional evidence from our study that connects spatial proximity with function. Two enhancers, termed −59 and +110, were previously shown to drive LacZ expression in mice15; these are equivalent to the −58 and +110 enhancers identified here. Our data show in addition that these regions are in contact with P1 and the +24 enhancer (Table 1). While the −327 enhancer has similar characteristics to the other enhancers identified here, neither a previous study in mice15 nor our study identified any enhancer activity for it (Table 1). The −368, −354, −327, −321 and −303 enhancers were found to interact with P1 in a Capture Hi-C study in HPC-7 cells (Table 1). Our study confirms the presence of these interactions, and in addition, shows that these enhancers also interact with either the +24 enhancer or P2 (Table 1).
The −303 enhancer interacts with P2 rather than P1 and, in addition to being active in haematopoietic sites, it drives expression in keratinocytes, indicating that it could act in a complex that keeps P2 silent; and/or that it regulates expression of Runx1 in other tissues. Interestingly, Runx1 is actively expressed in mouse keratinocytes where it is important for hair follicle development29. Neuronal cells also express Runx130, and there may be additional regulatory elements that control Runx1 expression in a manner distinct from haematopoietic expression. In support of this idea, we previously determined that cohesin and CTCF influence runx1 expression in haematopoietic, but not neuronal cells, in zebrafish31,32.
Cohesin and CTCF organise chromatin structure and when present in combination, they appear to negatively correlate with the HSPC TFs in HPC-7 cells18. However, we observed a coincidence of cohesin subunit Rad21 binding in the absence of CTCF with four out of nine identified enhancers, as well as the +24 enhancer. This suggests that CTCF-independent cohesin mediates a subset of enhancer-promoter looping in combination with TFs. This interpretation is consistent with previously identified CTCF-independent functions for cohesin in genome organization and transcription33. Importantly, cohesin mutations are prevalent in AML and other myeloid malignancies34, and are categorised together with RUNX1 and spliceosome mutations in a genetic category that confers poor prognosis in AML5. Cohesin mutations led to increased chromatin accessibility of Runx1 as measured by ATAC-seq35, raising the possibility that spatiotemporal regulation of Runx1 is cohesin-dependent in mouse and human, as was previously observed in zebrafish31.
Figure 4. Model of chromatin interactions based on Runx1 4C-seq. 4C-seq baits are shown in red and identified interacting enhancers in blue. P1 is active, and P2 is inactive. P1 interacts with −371, −368, −354, −327, −321, −58, −48, +24 and +110; +24 interacts with −371, −368, −354, −327, −321, −303, −58, −48, +110 and P2; and P2 interacts with −303 and +24.
4C-seq in HPC-7 cells has provided new high resolution connectivity data that sheds light on the genomic organization of Runx1, an important haematopoietic transcription factor. These data confirm and extend previous analyses (Table 1), and furthermore, provide insight into the function of enhancers that have potential to regulate Runx1 expression. The data presented here set the scene for functional analyses to precisely determine how Runx1 is regulated, including CRISPR/Cas9-mediated interference with enhancer activity. They also provide a rationale for screening patients with myeloproliferative disorders for mutations in enhancer regions.
Figure 5. Putative mouse enhancers are active in haematopoietic regions in zebrafish. Whole mount representative lateral images of zebrafish embryos (20–24 hpf [a,b] and ~48 hpf [c]) that were injected with enhancer-GFP plasmids at the one-cell stage; left hand panels are merges with bright field, right hand panels are GFP fluorescence. (a) Negative control ZED plasmid shows zero (image) or non-specific fluorescence, while the positive control (+24) shows GFP expression in haematopoietic cells of the posterior blood island (red arrows) and intermediate cell mass (yellow arrows), which represent primitive blood cells. (b) Enhancer activity of +110, −48, −58, −303, −321, −327, −354, −368, −371. GFP expression was observed in primitive haematopoietic cells of the posterior blood island (red arrows) and intermediate cell mass (yellow arrows). (c) The −303 enhancer also expressed GFP in keratinocytes (white arrows in b,c). The numbers in the right hand panels represent the number exhibiting the representative phenotype out of the total number of fluorescent embryos analysed. See also Supplementary Fig. S4.
Methods
Cell culture. HPC-7 cells were maintained at a density of 1–10 × 10^5 cells/mL in Iscove’s modified Dulbecco’s media (Gibco®) supplemented with 3.024 g/L sodium bicarbonate, 10% fetal bovine serum (Moregate, New Zealand), 10% stem cell factor conditioned media and 0.15 mM monothioglycerol (Sigma-Aldrich) as previously described36. SCF-conditioned media was obtained from culturing BHK-MKL cells maintained in Dulbecco’s modified Eagle media (Sigma-Aldrich) supplemented with 3.5 g/L glucose, 3.7 g/L sodium bicarbonate and 10% FBS.
4C-seq library preparation. 4C library preparation was performed as previously described37 with modifications. Three libraries were generated, two replicates (passage 8 and passage 10 cells) and one control (a 1:1 mix of both replicates for which the first ligation step was omitted). Cells were cross-linked in 2% formaldehyde, 5% FBS and 1x PBS for 10 minutes at room temperature while rotating. Formaldehyde was quenched with a final concentration of 125 mM glycine for 5 minutes on ice while inverting several times. Cell pellets were washed twice with ice-cold 1x PBS.
Nuclei were harvested by lysing the cell pellets in ice-cold lysis buffer (10 mM Tris pH 8.0, 10 mM NaCl, 0.2% NP-40, protease inhibitors) for 10 minutes on ice. Nuclei were then resuspended in 1.2x DpnII restriction buffer (New England Biolabs) and 0.3% SDS and incubated for 1 hour at 37 °C while shaking. Triton X-100 was then added to a final concentration of 1.8% and the reaction was left at 37 °C while shaking for another hour. Chromatin was digested with 800 U of DpnII overnight at 37 °C while shaking. DpnII was inactivated by adding SDS to a final concentration of 1.3% and incubating at 65 °C for 20 minutes. Nuclei were diluted into a volume of 7 mL containing 1.01x T4 DNA ligase buffer (Life Technologies) and Triton-X100 at a final concentration of 1% and incubated at 37 °C for 1 hour. Ligations were carried out with 100 U of T4 ligase (Life Technologies) for 4.5 hours at 16 °C and 30 minutes at room temperature while shaking. For control libraries, ligase was omitted. Samples were proteinase K treated and reverse-crosslinked overnight at 65 °C. Samples were then treated with RNase A at 37 °C for 30 minutes. DNA was purified by phenol/chloroform extraction and ethanol precipitated.
A second digestion was performed with 25 U of BfaI (New England Biolabs) for P1 and P2 baits or MseI (New England Biolabs) for +24 baits in restriction buffer overnight at 37 °C while shaking. Restriction was inactivated by adding SDS to a final concentration of 1.3% and incubating at 65 °C for 20 minutes. A second ligation was performed in the same way as the first ligation, except that ligations were incubated overnight. DNA was purified with two phenol/chloroform extractions and one chloroform extraction followed by ethanol precipitation. DNA concentrations were measured using a Qubit® 3.0 Fluorometer (Life Technologies) and Qubit® double-stranded DNA (dsDNA) High Sensitivity Assay kit (Life Technologies).

| Enhancer location relative to P1 (kb) | Interactions identified by other studies | Previous functional validation | Findings from our study |
|---|---|---|---|
| −371 | - | None | Interacts with +24 and P1. Enhancer activity in hematopoietic sites in 20–24 hpf zebrafish embryos. |
| −368 | Interacts with P1 in HPC-7 cells (Capture Hi-C by Wilson et al.)18 | None | Interacts with +24 and P1. Enhancer activity in hematopoietic sites in 20–24 hpf zebrafish embryos. |
| −354 | Interacts with P1 in HPC-7 cells (Capture Hi-C by Wilson et al.)18 | None | Interacts with +24 and P1. Enhancer activity in hematopoietic sites in 20–24 hpf zebrafish embryos. |
| −327 | Interacts with P1 in HPC-7 cells (Capture Hi-C by Wilson et al.)18 | No enhancer activity in mice (Schutte et al.)15 | Interacts with +24 and P1. No enhancer activity in zebrafish. |
| −321 | Interacts with P1 in HPC-7 cells (Capture Hi-C by Wilson et al.)18 | Identified, but not tested for enhancer activity (Schutte et al.)15 | Interacts with +24 and P1. Enhancer activity in hematopoietic sites in 20–24 hpf zebrafish embryos. |
| −303 | Interacts with P1 in HPC-7 cells (Capture Hi-C by Wilson et al.)18 | None | Interacts with +24 and P2. Enhancer activity in hematopoietic sites in 20–24 hpf zebrafish embryos and in keratinocytes from 20 hpf. |
| −58 | - | Haematopoietic enhancer activity in E11.5 transgenic mice and luciferase enhancer activity in 416B cells (Schutte et al.)15 | Interacts with +24 and P1. Enhancer activity in hematopoietic sites in 20–24 hpf zebrafish embryos. |
| −48 | - | None | Interacts with +24 and P1. Enhancer activity in hematopoietic sites in 20–24 hpf zebrafish embryos. |
| +24 | Interacts with P1 and P2 in human leukaemia cell lines (3C by Markova et al.)14 | Haematopoietic enhancer activity in mouse and zebrafish development and luciferase enhancer activity in 416B cells (Nottingham et al., 2009, Ng et al., 2010, Schutte et al.)11,12,15 | Interacts with P1 and P2. Enhancer activity in hematopoietic sites in 20–24 hpf zebrafish embryos. |
| +110 | - | Haematopoietic enhancer activity in E11.5 transgenic mice and luciferase enhancer activity in 416B cells (Schutte et al.)15 | Interacts with P1. Enhancer activity in hematopoietic sites in 20–24 hpf zebrafish embryos. |

Table 1. Previous findings on enhancers and new findings from this study.
For each bait, a total of 1 μg of DNA was amplified by PCR using Q5® High-Fidelity DNA Polymerase (New England Biolabs). Bait primer sequences are listed in Supplementary Table S3. PCR products were purified using the QIAquick PCR Purification kit (QIAGEN). DNA concentrations were measured using a Qubit® 3.0 Fluorometer and average fragment size by the 2100 Bioanalyser (Agilent Technologies) using a High Sensitivity DNA Kit (Agilent Technologies). Amplicons from the six different baits were mixed equally based on the concentration, average fragment size and ratio of demultiplexed 4C baits obtained from an initial MiSeq run. Libraries were prepared with Prep2Seq™ DNA Library Prep Kit from Illumina™ (Affymetrix) and TruSeq® adaptors (Illumina). Libraries were mixed equimolarly and sequenced as 125 bp paired-end reads on two Illumina™ HiSeq 2500 lanes by New Zealand Genomics Limited.
4C-seq data analysis. 4C-seq data was analysed on the command line and in the R statistical environment38, and visualised using the University of California, Santa Cruz (UCSC) genome browser (http://genome.ucsc.edu/) with mouse assemblies mm9 or mm1039,40, or with the R package ggplot241. Baits were demultiplexed based on bait primer sequences up to and including the digestion site using a custom awk script, allowing 0 mismatches. Only read pairs that had the forward and reverse bait sequences in the correct orientation were selected. Adapter sequences, bait sequences up to but excluding the digestion site, and bases with a Phred quality score under 20 were then trimmed from the reads, using the fastq-multx, fastq-mcf and cleanadaptors v1.24 tools42,43. Quality of reads was assessed using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/)44.
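The demultiplexing rule described above (exact primer match, forward and reverse bait agreeing) can be illustrated with a minimal Python sketch. This is not the custom awk script used in the study; the function names and the primer sequences below are hypothetical placeholders, not the sequences from Supplementary Table S3.

```python
def assign_bait(read_seq, bait_primers):
    """Return the bait whose primer exactly prefixes the read (0 mismatches), else None."""
    for bait, primer in bait_primers.items():
        if read_seq.startswith(primer):
            return bait
    return None


def demultiplex_pairs(read_pairs, fwd_primers, rev_primers):
    """Bin read pairs by bait, keeping only pairs whose two mates name the same bait."""
    bins = {bait: [] for bait in fwd_primers}
    for fwd_read, rev_read in read_pairs:
        bait_fwd = assign_bait(fwd_read, fwd_primers)
        bait_rev = assign_bait(rev_read, rev_primers)
        if bait_fwd is not None and bait_fwd == bait_rev:
            bins[bait_fwd].append((fwd_read, rev_read))
    return bins


# Hypothetical primers, for illustration only.
FWD = {"P1": "ACGTGATC", "+24": "TTGCGATC"}
REV = {"P1": "GGCATTAA", "+24": "CCATCTAG"}
PAIRS = [
    ("ACGTGATCAAAA", "GGCATTAACCCC"),  # both mates match P1: kept
    ("TTGCGATCAAAA", "GGCATTAACCCC"),  # mates disagree: discarded
    ("ACGTGTTCAAAA", "GGCATTAACCCC"),  # one mismatch in the forward primer: discarded
]
BINS = demultiplex_pairs(PAIRS, FWD, REV)
```

Requiring both mates to agree is a simplified stand-in for the orientation check described in the text.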
Reads with a minimum length of 30 bp were mapped to the mm10 reference genome using Bowtie145, allowing 0 mismatches. Mapped reads were assigned to DpnII digestion fragments using fourSig46. The following reads were removed from the files: 1) self-ligated reads, 2) uncut reads (fragments adjacent to baits), and 3) reads at fragments that have at least 1 read in the control (non-ligated) library. The running mean was calculated from the sum of read counts from nine successive fragments, which was obtained using fourSig46, and was normalised to reads per million.
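As a rough illustration of the smoothing step, the sketch below computes a running mean over nine successive fragments and scales it to reads per million. It is not the fourSig implementation; the function name is hypothetical, and the choice to normalise by the total of the supplied counts is an assumption made for the example.

```python
def running_mean_rpm(counts, window=9):
    """Running mean of per-fragment read counts over `window` successive fragments,
    scaled to reads per million of the total supplied counts (an assumption)."""
    if len(counts) < window:
        return []
    total = sum(counts)
    if total == 0:
        return [0.0] * (len(counts) - window + 1)
    scale = 1_000_000 / total
    window_sum = sum(counts[:window])
    means = [window_sum / window * scale]
    for i in range(window, len(counts)):
        # Slide the window by one fragment: add the new count, drop the oldest.
        window_sum += counts[i] - counts[i - window]
        means.append(window_sum / window * scale)
    return means
```

A list of n fragment counts yields n − 8 smoothed values, one per full window of nine fragments.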
Significant interaction calling was performed using the R package fourSig with the following settings: window size of 3, 1000 iterations, fdr of 0.01, fdr.prob of 0.05 (which selects the top fifth percentile of interactions with a FDR of <0.01), and only included mappable fragments46. Significant interactions were called for two regions: 1) the whole of chr16, and 2) from chr16:92,250,000-93,635,000 (within the domain). Significant interactions in both replicates were overlapped using the bedIntersect tool from UCSC47.
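The replicate-overlap step can be sketched as a simple interval intersection. The code below is an illustrative stand-in, not the UCSC bedIntersect tool, and it assumes half-open (start, end) intervals on a single chromosome that do not overlap within a replicate; the function name is hypothetical.

```python
def intersect_intervals(rep1, rep2):
    """Return the regions shared by two lists of half-open (start, end) intervals."""
    a, b = sorted(rep1), sorted(rep2)
    i = j = 0
    shared = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            shared.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared
```

Only regions called significant in both replicates survive this step, which is the reproducibility filter described above.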
In the fourSig package, significant interactions can be categorised into three categories: 1) interactions that are significant after the reads from the fragment with the highest read count are removed, 2) interactions that are significant when the fragment with the highest read count is averaged to the read counts of the neighbouring fragments, and 3) interactions that are significant only when all fragment read counts are included46. For this study, only category 1 and 2 interactions that overlap between both replicates were included, as they are more likely to represent true interactions (because they span multiple fragments), and were previously shown to be more reproducible between replicates than single-fragment interactions46. Furthermore, we distinguished category 1 interactions that overlap between both replicates from other category 1 and 2 interactions by colouring them red and orange, respectively, to visualise the most significant interactions. For conversion from assembly mm10 to mm9, the liftOver tool from UCSC was used (http://genome.ucsc.edu/)47. Gene annotations used in Figures are UCSC reference genes.
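One reading of the category filter and colour scheme is sketched below. It assumes that each overlapping interaction carries a fourSig category (1, 2 or 3) from each replicate and that "red" requires category 1 in both replicates; this interpretation and the function name are assumptions, not taken from the fourSig package.

```python
def colour_interaction(category_rep1, category_rep2):
    """Colour for an interaction found in both replicates, or None if it is excluded."""
    # Category 3 (single-fragment) interactions are excluded in either replicate.
    if category_rep1 not in (1, 2) or category_rep2 not in (1, 2):
        return None
    # Highest-confidence class: category 1 in both replicates.
    if category_rep1 == 1 and category_rep2 == 1:
        return "red"
    # Remaining category 1 and 2 overlaps.
    return "orange"
```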
Zebrafish enhancer assay. Runx1 regulatory regions were amplified from HPC-7 gDNA or from I-SceI-zhsp70 plasmid containing the −368, +24 and +110 sequences11 (primer sequences are in Supplementary Table S3), and cloned into the zebrafish enhancer detection vector48. Plasmid purifications were performed with the NucleoSpin® Plasmid or NucleoBond® Xtra Midi prep kits (Macherey-Nagel). Primers amplified the TF binding peak +/− 200 bp, except for −321 which (due to a repetitive region) did not have a 200-bp extension on the 3′ end. The −368, +24 and +110 fragments are 471, 529 and 579 bp, respectively11. A mixture of 30 pg vector DNA and 120 pg Tol2 transposase mRNA49 was injected into 1-cell zebrafish embryos. Embryos were imaged at 20–24 or ~48 hpf using a Leica M205FA stereomicroscope with a DFC490 camera and LAS software (Leica Microsystems); images were processed using Adobe Photoshop. Zebrafish were maintained as described previously50 and zebrafish handling and procedures were carried out in accordance with the Otago Zebrafish Facility Standard Operating Procedures. The University of Otago Animal Ethics Committee approved all zebrafish research under approval AEC 48/11.
Identification of conserved non-coding elements. Mouse conserved non-coding elements (mCNEs) were identified as described previously11.
ChIP-seq, Capture Hi-C and DNase I hypersensitivity data. Occupancy of the transcription factors Erg, Fli1, Scl, Runx1, Gata2, E2A, Ldb1, Lyl1, Lmo2, Gfi1b, Meis1, Myb, phospho-Stat1, Pu.1, Stat3, Eto2, Cebp-α, Cebp-β, Elf1, Nfe2, p53, cMyc, Egr1, E2f4, cFos, Mac and Jun; Rad21 and CTCF; H3K27ac and H3K4me3; DNase I hypersensitivity sites; and Capture Hi-C data in HPC-7 cells was obtained from previously published data18–20. Rad21, Smc3 and CTCF chromatin immunoprecipitation sequencing (ChIP-seq) data in MEL and CH12 cells were obtained from ENCODE51.
Availability of data. The 4C-seq dataset is accessible through GEO Series accession number GSE86994 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE86994).
References
1. Okuda, T., van Deursen, J., Hiebert, S. W., Grosveld, G. & Downing, J. R. AML1, the target of multiple chromosomal translocations in human leukemia, is essential for normal fetal liver hematopoiesis. Cell 84, 321–30 (1996).
2. Wang, Q. et al. Disruption of the Cbfa2 gene causes necrosis and hemorrhaging in the central nervous system and blocks definitive hematopoiesis. Proc Natl Acad Sci USA 93, 3444–9 (1996).
3. Swiers, G., de Bruijn, M. & Speck, N. A. Hematopoietic stem cell emergence in the conceptus and the role of Runx1. The International journal of developmental biology 54, 1151–63 (2010).
4. Growney, J. D. et al. Loss of Runx1 perturbs adult hematopoiesis and is associated with a myeloproliferative phenotype. Blood 106, 494–504 (2005).
5. Papaemmanuil, E. et al. Genomic Classification and Prognosis in Acute Myeloid Leukemia. N Engl J Med 374, 2209–21 (2016).
6. Levanon, D. & Groner, Y. Structure and regulated expression of mammalian RUNX genes. Oncogene 23, 4211–9 (2004).
7. Bee, T. et al. Nonredundant roles for Runx1 alternative promoters reflect their activity at discrete stages of developmental hematopoiesis. Blood 115, 3042–50 (2010).
8. Pozner, A. et al. Developmentally regulated promoter-switch transcriptionally controls Runx1 function during embryonic hematopoiesis. BMC developmental biology 7, 84 (2007).
9. Bee, T. et al. Alternative Runx1 promoter usage in mouse developmental hematopoiesis. Blood cells, molecules & diseases 43, 35–42 (2009).
10. Marsman, J. & Horsfield, J. A. Long distance relationships: enhancer-promoter communication and dynamic gene transcription. Biochimica et biophysica acta 1819, 1217–27 (2012).
11. Ng, C. E. et al. A Runx1 intronic enhancer marks hemogenic endothelial cells and hematopoietic stem cells. Stem Cells 28, 1869–81 (2010).
12. Nottingham, W. T. et al. Runx1-mediated hematopoietic stem-cell emergence is controlled by a Gata/Ets/SCL-regulated enhancer. Blood 110, 4188–97 (2007).
13. Bee, T. et al. The mouse Runx1 +23 hematopoietic stem cell enhancer confers hematopoietic specificity to both Runx1 promoters. Blood 113, 5121–4 (2009).
14. Markova, E. N., Kantidze, O. L. & Razin, S. V. Transcriptional regulation and spatial organisation of the human AML1/RUNX1 gene. J Cell Biochem 112, 1997–2005 (2011).
15. Schütte, J. et al. An experimentally validated network of nine haematopoietic transcription factors reveals mechanisms of cell state stability. eLife 5, e11469 (2016).
16. Gunnell, A. et al. RUNX super-enhancer control through the Notch pathway by Epstein-Barr virus transcription factors regulates B cell growth. Nucleic Acids Res 44, 4636–50 (2016).
17. Sati, S. & Cavalli, G. Chromosome conformation capture technologies and their impact in understanding genome function. Chromosoma (2016).
18. Wilson, N. K. et al. Integrated genome-scale analysis of the transcriptional regulatory landscape in a blood stem/progenitor cell model. Blood 127, e12–e23 (2016).
19. Calero-Nieto, F. J. et al. Key regulators control distinct transcriptional programmes in blood progenitor and mast cells. EMBO J 33, 1212–26 (2014).
20. Wilson, N. K. et al. Combinatorial transcriptional control in blood stem/progenitor cells: genome-wide analysis of ten major transcriptional regulators. Cell Stem Cell 7, 532–44 (2010).
21. Consortium, G. T. The Genotype-Tissue Expression (GTEx) project. Nat Genet 45, 580–5 (2013).
22. Merkenschlager, M. & Nora, E. P. CTCF and Cohesin in Genome Folding and Transcriptional Gene Regulation. Annu Rev Genomics Hum Genet 31, 17–43 (2016).
23. Dorsett, D. & Merkenschlager, M. Cohesin at active genes: a unifying theme for cohesin and gene expression from model organisms to humans. Curr Opin Cell Biol 25, 327–33 (2013).
24. de Jong, J. L. & Zon, L. I. Use of the zebrafish system to study primitive and definitive hematopoiesis. Annu Rev Genet 39, 481–501 (2005).
25. Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–80 (2014).
26. Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–80 (2012).
27. Ives, J. H. et al. Increased levels of a chromosome 21-encoded tumour invasion and metastasis factor (TIAM1) mRNA in bone marrow of Down syndrome children during the acute phase of AML(M7). Genes, chromosomes & cancer 23, 61–6 (1998).
28. Shimizu, K. et al. An ets-related gene, ERG, is rearranged in human myeloid leukemia with t(16;21) chromosomal translocation. Proceedings of the National Academy of Sciences of the United States of America 90, 10280–4 (1993).
29. Ortt, K., Raveh, E., Gat, U. & Sinha, S. A chromatin immunoprecipitation screen in mouse keratinocytes reveals Runx1 as a direct transcriptional target of DeltaNp63. J Cell Biochem 104, 1204–19 (2008).
30. Inoue, K., Shiga, T. & Ito, Y. Runx transcription factors in neuronal development. Neural development 3, 20 (2008).
31. Horsfield, J. A. et al. Cohesin-dependent regulation of Runx genes. Development 134, 2639–49 (2007).
32. Marsman, J. et al. Cohesin and CTCF differentially regulate spatiotemporal runx1 expression during zebrafish development. Biochim Biophys Acta 1839, 50–61 (2014).
33. Zuin, J. et al. Cohesin and CTCF differentially affect chromatin architecture and gene expression in human cells. Proc Natl Acad Sci USA 111, 996–1001 (2014).
34. Leeke, B., Marsman, J., O’Sullivan, J. M. & Horsfield, J. A. Cohesin mutations in myeloid malignancies: underlying mechanisms. Exp Hematol Oncol 3, 13 (2014).
35. Mazumdar, C. et al. Leukemia-Associated Cohesin Mutants Dominantly Enforce Stem Cell Programs and Impair Human Hematopoietic Progenitor Differentiation. Cell Stem Cell 17, 675–688 (2015).
36. Pinto do, O. P., Kolterud, A. & Carlsson, L. Expression of the LIM-homeobox gene LH2 generates immortalized steel factor-dependent multipotent hematopoietic precursors. EMBO J 17, 5744–56 (1998).
37. van de Werken, H. J. et al. 4C technology: protocols and data analysis. Methods Enzymol 513, 89–112 (2012).
38. R Core Team. R: A Language and Environment for Statistical Computing. (R Foundation for Statistical Computing, Vienna, Austria, 2016).
39. Kent, W. J. et al. The human genome browser at UCSC. Genome Res 12, 996–1006 (2002).
40. Mouse Genome Sequencing, C. et al. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–62 (2002).
41. Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer-Verlag New York, 2009).
42. Aronesty, E. Ea-utils: Command-line tools for processing biological sequencing data. (ExpressionAnalysis, Durham, NC, 2011).
43. Chatterjee, A., Stockwell, P. A., Rodger, E. J. & Morison, I. M. Comparison of alignment software for genome-wide bisulphite sequence data. Nucleic Acids Res 40, e79 (2012).
44. Andrews, S. FastQC A Quality Control tool for High Throughput Sequence Data.
45. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10, R25 (2009).
46. Williams, R. L. Jr et al. fourSig: a method for determining chromosomal interactions in 4C-Seq data. Nucleic Acids Res 42, e68 (2014).
47. Kent, W. J. et al. The human genome browser at UCSC. Genome research 12, 996–1006 (2002).
48. Bessa, J. et al. Zebrafish enhancer detection (ZED) vector: a new tool to facilitate transgenesis and the functional analysis of cis-regulatory regions in zebrafish. Developmental dynamics: an official publication of the American Association of Anatomists 238, 2409–17 (2009).
49. Kawakami, K. Transposon tools and methods in zebrafish. Dev Dyn 234, 244–54 (2005).
50. Westerfield, M. The Zebrafish Book. A guide for the laboratory use of zebrafish (Brachydanio rerio). (University of Oregon Press, Eugene, Oregon, 1995).
51. Rosenbloom, K. R. et al. ENCODE data in the UCSC Genome Browser: year 5 update. Nucleic Acids Res 41, D56–63 (2013).
Acknowledgements
HPC-7 and BHK/MKL cells were kindly provided by Kathy Knezevic and John Pimanda. TruSeq® adaptors were kindly provided by Ian Morison. The authors would like to thank Anita Dunbier and Sofie Van Huffel for help with development of 4C protocols, and Noel Jhinku for expert management of the Otago Zebrafish Facility. This research was funded by the Royal Society of NZ Marsden Fund [grant number 11-UOO-027 to JAH] and the Health Research Council of NZ [grant number 15/229 to JAH].
Author Contributions
J.M., A.T., M.O., J.M.O. and J.A.H. designed experiments; J.M. and A.T. performed experiments; J.M., A.T., M.O., J.M.O. and J.A.H. analysed data; A.T. produced the graphic for Fig. S4; J.M. and J.A.H. wrote the paper with input from the other authors.
Additional Information
Supplementary information accompanies this paper at https://doi.org/10.1038/s41598-017-13748-8.
Competing Interests: The authors declare that they have no competing interests.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. © The Author(s) 2017
Appendix VII: RUNX1 regulatory region Hg38 coordinates
Appendix VII Table 1: Hg38 Genome coordinates for regulatory regions
Regulatory region location relative to P1 (kb)    Genome coordinates (Hg38)
m-371                                             chr21:35482813-35483787
m-368                                             chr21:35476819-35476991
m-327                                             chr21:35428582-35428982
m-321                                             Not conserved between humans and mice
-250                                              chr21:35297414-35298323
-188                                              chr21:35236032-35236509
-139                                              chr21:35189322-35190258
m-101                                             chr21:35170398-35170666
m-58                                              chr21:35106408-35106609
m-43                                              chr21:35091787-35091963
m+24                                              chr21:35026809-35027025
m-42                                              Not conserved between humans and mice
m+1                                               Not conserved between humans and mice
m+3                                               chr21:35046175-35046447
+62                                               chr21:34986636-34987656
m+110                                             chr21:34908536-34908897
m+181                                             Not conserved between humans and mice
Intron 5.2 containing m+204                       chr21:34807014-34809284
Appendix VIII: 4C data processing
The 4C sequencing libraries were prepared from 4C material, each containing PCR
products amplified by bait specific primer pairs. The read counts were determined from the
original sequencing library counts given by Otago Genomics Facility (Appendix VIII Table 1).
The baits were demultiplexed by Dr. William Schierding (University of Auckland). The
number of reads that demultiplexed to baits within each library was also determined (Appendix
VIII Table 1).
Appendix VIII Table 1: Read counts before and after primer demultiplexing
Library                            Original Read Count    Demultiplexed Read Count
4533_16 K562 WT DMSO 1             11,747,923             3,973,768
4533_17 K562 WT DMSO 2             6,124,022              3,218,192
4533_18 K562 WT DMSO 3             4,231,900              2,511,463
4533_19 Control K562 WT DMSO       2,854,280              1,395,817
4533_20 K562 WT 12h PMA 1          4,568,245              3,389,303
4533_21 K562 WT 12h PMA 2          6,851,198              1,672,071
4533_22 K562 WT 12h PMA 3          4,223,579              1,999,157
4533_23 K562 WT 72h PMA 1          6,349,712              4,105,424
4533_24 K562 WT 72h PMA 2          2,934,889              2,094,222
4533_25 K562 WT 72h PMA 3          4,357,508              2,707,894
4533_26 Control K562 WT 72 PMA     3,087,335              1,844,770
4533_27 K562 -/- DMSO 1            3,948,820              2,712,143
4533_28 K562 -/- DMSO 2            4,173,877              2,528,795
4533_29 K562 -/- DMSO 3            7,181,545              3,739,103
4533_30 Control K562 -/- DMSO      31,255,638             92,067
4533_31 K562 -/- 12h PMA 1         7,077,762              3,115,038
4533_32 K562 -/- 12h PMA 2         8,657,941              3,624,719
4533_33 K562 -/- 12h PMA 3         3,914,040              2,136,044
4533_34 K562 -/- 72h PMA 1         6,648,572              3,795,374
4533_35 K562 -/- 72h PMA 2         6,463,080              3,398,746
4533_36 K562 -/- 72h PMA 3         23,085,885             3,480,265
4533_37 Control K562 -/- 72 PMA    4,041,648              2,265,435
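The proportion of reads retained after demultiplexing can be read directly off the two count columns. The following is a minimal illustrative sketch in Python, not the procedure used to generate the table; the function name `demultiplexed_fraction` is hypothetical and only three rows of Appendix VIII Table 1 are copied in as an example.

```python
# Illustrative only: three rows copied from Appendix VIII Table 1 as
# (original read count, demultiplexed read count).
read_counts = {
    "4533_16 K562 WT DMSO 1": (11_747_923, 3_973_768),
    "4533_17 K562 WT DMSO 2": (6_124_022, 3_218_192),
    "4533_30 Control K562 -/- DMSO": (31_255_638, 92_067),
}


def demultiplexed_fraction(original: int, demultiplexed: int) -> float:
    """Return the fraction of original reads that were assigned to a bait."""
    if original <= 0:
        raise ValueError("original read count must be positive")
    return demultiplexed / original


# Fraction retained per library (hypothetical helper, not part of any pipeline).
fractions = {
    library: demultiplexed_fraction(original, demultiplexed)
    for library, (original, demultiplexed) in read_counts.items()
}

for library, fraction in fractions.items():
    print(f"{library}: {fraction:.1%} of reads demultiplexed")
```

A library with an unusually low retained fraction stands out immediately in such a summary and may warrant a closer look before downstream analysis.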
After demultiplexing of primers, the reverse-sequenced reads were mapped to a reduced
genome based on the primary restriction enzyme (RE) used in the first digestion (DpnII). The
majority of RUNX1 +24 reads aligned to the reduced genome: the demultiplexed and mapped read
counts are very similar (Appendix VIII Figures 1 and 2). The quality evaluation compares the
read distributions to determine whether a replicate is of high enough quality to be included
in the 4C interaction analysis (Appendix VIII Figure 3).
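As an illustration of what a reduced genome based on the primary RE can look like, the sketch below collects the sequence immediately flanking each DpnII recognition site (GATC) in a DNA string. This is a simplified reading of the idea and an assumption on my part: the function name `flanking_fragments`, the `flank` length and the toy sequence are hypothetical and are not taken from the pipeline used in this thesis.

```python
import re
from typing import List, Tuple


def flanking_fragments(
    sequence: str, site: str = "GATC", flank: int = 25
) -> List[Tuple[int, str, str]]:
    """Return (site position, upstream flank, downstream flank) for every site.

    Positions are 0-based offsets of the first base of the recognition site.
    Flanks are truncated at the ends of the sequence.
    """
    sequence = sequence.upper()
    fragments = []
    # A lookahead is used so that overlapping occurrences would also be found.
    for match in re.finditer(f"(?={re.escape(site)})", sequence):
        pos = match.start()
        upstream = sequence[max(0, pos - flank):pos]
        downstream = sequence[pos + len(site):pos + len(site) + flank]
        fragments.append((pos, upstream, downstream))
    return fragments


# Toy sequence (hypothetical) containing two GATC sites.
toy_fragments = flanking_fragments("AAAGATCTTTTGATCCC", flank=3)
print(toy_fragments)
```

Restricting alignment to such site-adjacent sequences means reads can only map where a 4C ligation junction could plausibly occur, which is the motivation for using a reduced genome.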
Appendix VIII Figure 1: RUNX1 +24 demultiplexed read count.
The number of demultiplexed reads (millions) is shown for the individual conditions in replicate 1, 2, 3 or
control.
Appendix VIII Figure 2: RUNX1 +24 Mapped read count.
The number of mapped reads (millions) is shown for the individual conditions in replicate 1, 2, 3 or control.
Appendix VIII Figure 3: RUNX1 +24 4C quality evaluation.
The coverage of fragments that had more than one read present within 0.1 Mb on either side of the bait was plotted against the percentage of reads in cis over total. 4C data for WT 12 DMSO (light blue), WT 12 PMA (blue), WT 72 PMA (dark blue), STAG2 12 DMSO (yellow), STAG2 12 PMA (orange) and STAG2 72 PMA (red) are plotted. Three points, corresponding to replicates 1, 2 and 3, are presented for each condition. Grey dotted lines indicate the cut-off points for good-quality data.
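The two quantities plotted in this quality evaluation can be sketched as follows. This is a minimal Python illustration under stated assumptions: each fragment is represented as a (chromosome, position, read count) tuple, the function name `qc_metrics` is hypothetical, and the toy data and bait position are invented for the example. No cut-off values are encoded, since they are shown only graphically in the figure.

```python
from typing import Iterable, Tuple

# (chromosome, fragment position, read count); a hypothetical representation.
Fragment = Tuple[str, int, int]


def qc_metrics(
    fragments: Iterable[Fragment],
    bait_chrom: str,
    bait_pos: int,
    window: int = 100_000,  # 0.1 Mb on either side of the bait
) -> Tuple[float, float]:
    """Return (% of reads in cis, % of near-bait fragments with more than one read)."""
    fragments = list(fragments)
    total_reads = sum(count for _, _, count in fragments)
    cis_reads = sum(count for chrom, _, count in fragments if chrom == bait_chrom)
    near_bait = [
        count
        for chrom, pos, count in fragments
        if chrom == bait_chrom and abs(pos - bait_pos) <= window
    ]
    covered = sum(1 for count in near_bait if count > 1)
    pct_cis = 100.0 * cis_reads / total_reads if total_reads else 0.0
    pct_covered = 100.0 * covered / len(near_bait) if near_bait else 0.0
    return pct_cis, pct_covered


# Toy data (hypothetical): three near-bait fragments, one distal cis, one trans.
toy = [
    ("chr21", 1_000_000, 5),
    ("chr21", 1_050_000, 1),
    ("chr21", 1_090_000, 3),
    ("chr21", 5_000_000, 2),
    ("chr1", 200_000, 4),
]
pct_cis, pct_covered = qc_metrics(toy, bait_chrom="chr21", bait_pos=1_000_000)
print(f"cis/total: {pct_cis:.1f}%  near-bait coverage: {pct_covered:.1f}%")
```

Each replicate then contributes one point (near-bait coverage against percentage of reads in cis) to a plot of the kind described in the caption.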
Appendix IX: WT 12 DMSO interaction peaks
Table of all the statistically significant interactions with RUNX1 +24 in WT 12 DMSO
cells
WT 12 DMSO cis significant interactions
Appendix IX Table 1: WT 12 DMSO cis interactions
Chromosome location (Chr21)
Start End
16102849 16290033
17054388 17167830
18444178 18546244
22870115 23124047
24350542 24638310
27615547 27687280
28569260 28782031
31067134 31287024
33094489 33125735
33479993 33507790
34626170 34659645
34861779 34886331
35064167 35082568
37435929 37444466
37450343 37453283
37471800 37474965
38064789 38088359
38205868 38217764
38326452 38338678
38630433 38645058
39302387 39354604
39457439 39483268
39837197 39941694
40549637 40807167
41078121 41274056
42531449 42597547
43171342 43223493
43820942 43905318
44021469 44133209
44350268 44425222
45126172 45164937
WT 12 DMSO near-bait significant interactions
Appendix IX Table 2: WT 12 DMSO near-bait interactions
Chromosome location (Chr21)
Start End
35474191 35488395
36120292 36122672
36123862 36126228
36125047 36127409
36156784 36159142
36157963 36160322
36243806 36245595
36253381 36255044
36264639 36266171
36265409 36266933
36271428 36272893
36277225 36278641
36277936 36279346
36279349 36280749
36280051 36281446
36280751 36282141
36281449 36282833
36282836 36284211
36288291 36289632
36290969 36292295
36293617 36294930
36294275 36295584
36305857 36307115
36313951 36315177
36314565 36315788
36316400 36317616
36317009 36318223
36317617 36318829
36324230 36325417
36326010 36327189
36328367 36329537
36328953 36330121
36329538 36330704
36331867 36333024
36332447 36333601
36333602 36334752
36335327 36336470
36335900 36337041
36337042 36338179
36337611 36338746
36338180 36339312
36338747 36339877
36339878 36341004
36340442 36341565
36341005 36342126
36341566 36342686
36343803 36344915
36346578 36347680
36347130 36348230
36347681 36348780
36348231 36349329
36348781 36349876
36353148 36354232
36353690 36354773
36357470 36358544
36358545 36359617
36361223 36362291
36361757 36362824
36363358 36364423
36367613 36368674
36368143 36369205
36370796 36371857
36371327 36372388
36375572 36376634
36376103 36377166
36377697 36378762
36379295 36380362
36379828 36380896
36380361 36381430
36380895 36381965
36382500 36383573
36383036 36384111
36384110 36385187
36386806 36387890
36387347 36388434
36409375 36410564
36409969 36411162
36412365 36413572
36412969 36414179
36414790 36416010
36415400 36416624
36416012 36417240
36416626 36417857
36417242 36418476
36417859 36419097
36418478 36419719
36425395 36426671
36426033 36427312
36426673 36427955
36427314 36428599
36427956 36429244
36428600 36429891
36429246 36430539
36432493 36433799
36433146 36434455
36433800 36435112
36441738 36443074
36443075 36444413
36447775 36449123
36449124 36450475
36449799 36451151
36455222 36456582
36455902 36457263
36464790 36466165
36465477 36466854
36468234 36469615
36468924 36470307
36469616 36471000
36470308 36471694
36471001 36472388
36474478 36475874
36475176 36476573
36479385 36480795
36482213 36483633
36482923 36484346
36483634 36485060
36484347 36485775
36485061 36486492
36489374 36490822
36499642 36501131
36501880 36503377
36507905 36509423
36508664 36510185
36517085 36518628
36547820 36549434
36548627 36550243
36549435 36551053
36551054 36552676
36569979 36571652
36575859 36577549
36598256 36600012
36599134 36600892
36613306 36615093
36689136 36690961
36690048 36691870
36692776 36694589
36699984 36701773
36700879 36702664
36707985 36709750
36716767 36718514
36769828 36771700
36770764 36772642
36845802 36848109
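Several adjacent near-bait windows in Appendix IX Table 2 overlap one another (for example 36123862-36126228 and 36125047-36127409). When a non-redundant list of interacting regions is wanted, overlapping windows can be collapsed into contiguous intervals. The sketch below is an illustrative Python example only: the function name `merge_overlapping` is hypothetical, only the first five rows of the table are used, and windows that share a coordinate are assumed to overlap.

```python
from typing import Iterable, List, Tuple

Window = Tuple[int, int]  # (start, end) on Chr21


def merge_overlapping(windows: Iterable[Window]) -> List[Window]:
    """Collapse overlapping (start, end) windows into contiguous intervals."""
    merged: List[List[int]] = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it if necessary.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


# First five rows of Appendix IX Table 2.
near_bait_windows = [
    (36120292, 36122672),
    (36123862, 36126228),
    (36125047, 36127409),
    (36156784, 36159142),
    (36157963, 36160322),
]
merged_windows = merge_overlapping(near_bait_windows)
for start, end in merged_windows:
    print(f"chr21:{start}-{end}")
```

Sorting first keeps the merge to a single pass, so the same approach scales to the full table.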
Appendix X: Trans interaction peaks
The trans interactions of RUNX1 +24 could not be meaningfully investigated using 4Cker.
The 4Cker analysis showed bands that spanned large areas of the various chromosomes. To
identify the trans interactions that could be detected, I looked at the raw read peaks across
the genome (Appendix X Figure 1). Based on peaks that contained reads in two or more
replicates, I detected three trans interactions (Appendix X Figures 2-4). All peaks were in
intergenic regions and contained minimal ENCODE annotations.
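The replicate criterion described above (a peak must contain reads in two or more replicates) can be expressed as a simple support count. The following Python sketch is illustrative only: here a peak is represented as a (chromosome, window index) pair, and the function name `replicated_windows` and the toy replicate sets are hypothetical rather than the data or code used for this analysis.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Window = Tuple[str, int]  # (chromosome, window index); a hypothetical binning


def replicated_windows(
    replicates: Iterable[Set[Window]], min_replicates: int = 2
) -> List[Window]:
    """Return windows that contain reads in at least `min_replicates` replicates."""
    support: Counter = Counter()
    for windows in replicates:
        # Each replicate contributes at most once per window.
        support.update(set(windows))
    return sorted(w for w, n in support.items() if n >= min_replicates)


# Toy example: windows with reads in each of three replicates.
rep1 = {("chr1", 10), ("chr3", 4), ("chr9", 7)}
rep2 = {("chr1", 10), ("chr9", 7), ("chr12", 2)}
rep3 = {("chr3", 4), ("chr5", 1)}
candidate_peaks = replicated_windows([rep1, rep2, rep3])
print(candidate_peaks)
```

Windows seen in a single replicate are discarded, which reduces the chance of reporting random ligation events as trans interactions.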
Appendix X Figure 1: RUNX1 +24 trans chromosome 1 reads.
UCSC genome browser snapshot of normalised read count showing chromosome (chr) 1 reads from all three replicates.
Appendix X Figure 2: RUNX1 +24 chr 1 trans interactions.
UCSC genome browser snapshot of normalised read count showing reads from all three replicates highlighting the possible chr 1 trans interaction.
Appendix X Figure 3: RUNX1 +24 chr 3 trans interactions.
UCSC genome browser snapshot of normalised read count showing reads from all three replicates highlighting the possible chr 3 trans interaction.
Appendix X Figure 4: RUNX1 +24 chr 9 trans interactions.
UCSC genome browser snapshot of normalised read count showing reads from all three replicates highlighting the possible chr 9 trans interaction.
Appendix XI: List of significant changes (cis)
Lists of all the statistically significant cis changes between conditions
WT cis significant changes during differentiation
Appendix XI Table 1: Differentially decreased cis interactions in WT 12 h PMA cells compared to WT 12 h DMSO K562 cells
Chromosome location (Chr21) log2FoldChange Adjusted p value
Start End
16193679 16290033 -8.0131141 0.00497879
39852633 39906157 -7.8384879 0.00978024
Appendix XI Table 2: Differentially decreased cis interactions in WT 72 h PMA cells compared to WT 12 h DMSO K562 cells
Chromosome location (Chr21) log2FoldChange Adjusted p value
Start End
22876545 23117188 -7.4296378 0.00362255
24372136 24630993 -5.7785282 0.00383509
28570440 28782508 -7.895935 0.00817325
31223207 31350841 -8.8622479 0.00193125
34626170 34659645 -6.4021928 0.00554615
34861883 34886153 -6.2869926 0.00584545
38065170 38088150 -7.50213 0.00211092
39302647 39354164 -7.8347453 0.00178676
43171342 43273433 -6.5989095 0.00645605
43821554 43904818 -8.3977348 0.00050987
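The log2FoldChange values in these tables can be converted to linear fold changes to make their magnitude easier to interpret. The sketch below is a minimal Python illustration; the function name `describe_change` is hypothetical, and it assumes that a negative value denotes a reduction in the first-named condition relative to the second, which is consistent with the "decreased" captions of Tables 1 and 2.

```python
from typing import Tuple


def describe_change(log2_fold_change: float) -> Tuple[str, float]:
    """Return (direction, linear fold magnitude) for a log2 fold change.

    The magnitude is always >= 1, so -1.0 is reported as a 2-fold decrease.
    """
    fold = 2.0 ** log2_fold_change
    if log2_fold_change > 0:
        return "increased", fold
    if log2_fold_change < 0:
        return "decreased", 1.0 / fold
    return "unchanged", 1.0


# First row of Appendix XI Table 1 (Chr21:16193679-16290033).
direction, magnitude = describe_change(-8.0131141)
print(f"{direction} by roughly {magnitude:.0f}-fold")
```

A log2FoldChange of about -8 therefore corresponds to a reduction of more than two hundred fold in normalised signal, not an eight-fold reduction.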
WT and STAG2 cis comparisons
Appendix XI Table 3: Differentially connected cis interactions in STAG2 12 h DMSO cells compared to WT 12 h DMSO K562 cells
Chromosome location (Chr21) log2FoldChange Adjusted p value
Start End
15789630 15905266 4.93016724 0.00656758
16193679 16290033 -5.6375836 0.00465524
16337170 16447658 4.39988775 0.00144947
26749583 26912507 6.04072918 0.00276931
30766929 30848918 6.02534263 0.00029361
30975222 31069884 3.84968539 0.00943295
34232784 34274588 5.95696286 0.00150594
34738802 34767851 3.99116934 0.00827214
34965396 34986172 4.06506145 0.0074603
35064266 35082455 -5.5186585 0.00780053
38065170 38088150 -5.1459938 0.00711804
38099808 38123104 3.88469303 0.00595205
38111456 38134838 5.55176215 0.00821735
38790145 38825524 4.39693366 0.00587422
39225824 39276819 4.59711214 0.00883934
39251322 39302562 4.68523129 0.00971659
39379992 39431602 5.13275693 0.0034974
39561223 39613465 5.48301289 0.00504078
40092657 40145586 5.52330128 0.00367745
40383991 40437879 4.6125032 0.00692396
41082235 41192923 -6.659979 0.00036723
41137579 41253995 -6.518779 0.00213839
42327921 42467118 4.93843987 0.00575986
44546192 44629845 4.3041222 0.00671446
44588018 44672916 6.00732878 0.00022353
Appendix XI Table 4: Differentially connected cis interactions in STAG2 12 h PMA cells compared to WT 12 h PMA K562 cells
Chromosome location (Chr21) log2FoldChange Adjusted p value
Start End
36516987 36518381 8.83443992 0.0023261
Appendix XI Table 5: Differentially connected cis interactions in STAG2 72 h PMA cells compared to WT 72 h PMA K562 cells
Chromosome location (Chr21) log2FoldChange Adjusted p value
Start End
17658804 17929142 8.77960408 3.54E-05
17795376 18062908 5.99772569 0.00353314
18653456 18833421 5.86735877 0.00232339
22762987 22996867 6.37260654 0.00782695
22876545 23117188 10.0927403 1.21E-05
26749583 26912507 8.04883755 0.00207552
27308690 27428828 7.25476493 0.00509219
27367527 27490129 7.74087843 0.00509219
27426968 27553290 8.28178011 0.00214456
28570440 28782508 8.32963928 0.00275066
28997265 29212346 6.8981528 0.00358326
29427589 29629288 6.36801489 0.00379672
30646160 30726999 6.42810586 0.00190139
31826262 31999444 4.97135124 0.00939023
32513939 32620098 5.65073304 0.00866759
33094489 33156981 4.74896145 0.00866759
34169753 34211768 7.7060914 0.00015066
34943785 34965232 7.39162623 0.0010998
34954679 34975784 6.62312295 0.00399488
37516547 37524058 5.84397019 0.00095948
37520302 37527914 5.92987926 0.00232339
37535840 37543879 8.92597012 0.00015066
37720265 37734202 8.49654733 0.00315786
38755847 38789797 5.92670113 0.00296832
39431609 39483268 5.62911118 0.00855271
40198325 40251084 5.08928081 0.00207552
41812481 41960561 6.81246138 0.00351648
43821554 43904818 7.13610776 0.00388488
44313418 44387118 6.5161103 0.00156909
44927619 45009217 6.82470303 0.00866759
46067813 46187451 6.81361389 0.00263408
46127632 46249867 7.68983358 0.00052257
47058514 47214415 6.88001354 0.00124792
47136465 47289912 5.62989411 0.00233245
47432325 47569238 6.40367744 0.00142847
Appendix XII: List of significant changes (near-bait)
Lists of all the statistically significant near-bait changes between conditions
WT near-bait significant changes during differentiation
Appendix XII Table 1: Differentially connected near-bait interactions in WT 12 h PMA cells compared to WT 12 h DMSO K562 cells
Chromosome location (Chr21) log2FoldChange Adjusted p value
Start End
35474191 35488395 -8.9377799 0.00013656
36278644 36280049 6.80655514 0.00194086
36332447 36333601 -7.6163228 0.00154466
36352605 36353690 8.07237223 0.00044141
36366550 36367612 6.76625148 0.00046604
36406427 36407599 8.25342937 5.34E-07
36410566 36411762 7.97117671 1.17E-05
36420971 36422856 7.47613931 0.00204346
36517085 36518628 -7.8123099 0.00337409
Appendix XII Table 2: Differentially connected near-bait interactions in WT 72 h PMA cells compared to WT 12 h DMSO K562 cells
Chromosome location (Chr21) log2FoldChange Adjusted p value
Start End
36278644 36280049 -8.4916179 0.00011101
36324230 36325417 -9.7177426 8.72E-05
36326010 36327189 -9.9312191 1.98E-06
36349329 36350423 -5.0850069 0.00025395
36351516 36352604 -6.4430378 0.00243444
36352061 36353147 -6.4430378 0.00243444
36352605 36353690 -9.8028067 2.51E-05
36353690 36354773 -10.007951 8.07E-06
36357470 36358544 -11.333388 2.69E-13
36366550 36367612 -8.4053431 2.59E-05
36375572 36376634 -11.252854 2.29E-13
36377697 36378762 -10.717736 6.58E-10
36379295 36380362 -12.360236 5.70E-15
36379828 36380896 -4.8535823 2.15E-05
36380361 36381430 -3.9254633 0.00055942
36380895 36381965 -4.0813855 0.00262689
36383573 36384648 -3.5709246 0.00168646
36384110 36385187 -12.034618 6.48E-15
36384648 36385726 -10.183231 3.33E-06
36387347 36388434 -5.9110333 8.45E-07
36390068 36391163 -6.9307593 1.69E-11
36390614 36391712 -3.8287182 0.00154049
36391711 36392813 -13.697569 9.54E-21
36392261 36393366 -13.781015 9.54E-21
36393365 36394474 -10.689796 1.86E-12
36393918 36395030 -11.706035 8.92E-16
36394473 36395587 -11.160013 7.62E-14
36395028 36396145 -8.4361378 1.20E-05
36395586 36396705 -12.292107 8.92E-16
36396144 36397266 -12.317332 3.42E-16
36397827 36398958 -17.523158 1.40E-35
36399523 36400657 -12.06068 1.38E-18
36400090 36401227 -11.753278 5.96E-16
36400658 36401798 -3.7399723 0.00119426
36401228 36402371 -12.495422 8.92E-16
36401799 36402945 -13.775696 1.32E-20
36402372 36403521 -8.2326817 6.06E-12
36403523 36404678 -4.7434083 5.04E-06
36404100 36405259 -3.3574145 0.00265467
36405261 36406426 -5.5279129 2.50E-06
36405843 36407012 -11.666767 7.09E-14
36406427 36407599 -9.8620529 3.89E-08
36407013 36408189 -9.4757263 8.22E-07
36410566 36411762 -9.6137529 7.05E-07
36412365 36413572 -10.271704 1.98E-06
36412969 36414179 -5.4552266 0.00954996
36420971 36422225 -9.2193533 0.00014574
36421598 36422856 -9.2193533 0.00014574
36427314 36428599 -10.593799 3.32E-09
36465477 36466854 -9.4216804 8.20E-06
36479385 36480795 -4.1398039 0.00622221
36508664 36510185 -8.5117676 0.00151198
36598256 36600012 -5.3745211 0.00237202
37108487 37112215 5.94585879 0.0068571
WT and STAG2 near-bait comparisons
Appendix XII Table 3: Differentially connected near-bait interactions in STAG2 12 h DMSO cells compared to WT 12 h DMSO K562 cells
Chromosome location (Chr21) log2FoldChange Adjusted p value
Start End
35442448 35454754 5.14667346 0.00604543
35488228 35503117 7.20660582 6.37E-05
35510591 35526316 5.13712908 0.00958697
35600268 35616837 5.63045291 0.00190404
35608619 35625055 5.39764041 0.00313695
35686013 35699118 5.05439121 0.00336181
35692775 35705461 6.00241513 0.00036696
35699325 35711598 5.92593227 0.00511254
35734347 35744470 4.69026964 0.00330434
35739543 35749398 4.87776795 0.00739212
35903813 35913177 6.16813376 0.0003288
35922480 35931468 5.15510497 0.00202299
35979513 35986245 4.15306452 0.00272369
36120292 36122672 -5.0846932 0.00165723
36123862 36126228 -6.3081783 0.00204766
36125047 36127409 -6.3378073 0.00217929
36126230 36128589 5.37656029 0.00055046
36127411 36129767 5.44025738 0.00532178
36156784 36159142 -5.9130815 0.00347325
36157963 36160322 -4.9202346 0.00236422
36215926 36218078 5.72381578 0.00143193
36238267 36240133 5.10258129 0.00313695
36243806 36245595 -7.2643134 2.92E-05
36253381 36255044 -5.5051643 0.00051481
36261516 36263082 5.54363525 0.00201202
36264639 36266171 -4.6884026 0.00198188
36265409 36266933 -4.6884026 0.00198188
36266175 36267691 7.33095815 2.50E-05
36271428 36272893 -6.5493485 0.00012837
36277225 36278641 -8.5036482 6.15E-07
36277936 36279346 -8.5019255 7.48E-09
36279349 36280749 -6.1583448 0.00133455
36280051 36281446 -7.2873161 5.50E-06
36280751 36282141 -8.5801776 1.12E-09
36281449 36282833 -8.9300293 1.04E-10
36282836 36284211 -5.7549271 0.00062342
36288291 36289632 -6.526985 8.70E-05
36290969 36292295 -4.543678 0.00052385
36292958 36294274 5.09197601 0.00514143
36293617 36294930 -6.0870697 8.83E-06
36294275 36295584 -6.0372376 0.00018546
36305857 36307115 -7.33992 5.14E-05
36313951 36315177 -7.1939107 1.38E-05
36314565 36315788 -8.4854587 3.55E-10
36316400 36317616 -6.0665073 0.00091451
36317009 36318223 -7.5011631 7.10E-06
36317617 36318829 -6.6905219 0.00020166
36324230 36325417 -6.4434308 0.00219053
36326010 36327189 -6.7090075 0.00020387
36328367 36329537 -6.1060016 0.00101977
36328953 36330121 -6.8940376 0.00025188
36329538 36330704 -5.8264151 0.00098554
36331867 36333024 -6.3782038 0.00173938
36333602 36334752 -7.5083462 3.30E-05
36335327 36336470 -7.6558644 5.77E-06
36335900 36337041 -7.9329818 1.75E-09
36337042 36338179 -6.6785373 0.00186816
36337611 36338746 -5.3144725 0.00281319
36338180 36339312 -4.119617 0.00606764
36338747 36339877 -5.0564483 0.00087849
36339878 36341004 -5.4894373 0.00023521
36340442 36341565 -5.1490454 0.00109411
36341005 36342126 -5.5925002 0.00243223
36341566 36342686 -5.4010668 0.00219053
36343803 36344915 -6.9091289 0.00063124
36346578 36347680 -5.7559427 2.67E-05
36347130 36348230 -5.6191847 0.00011139
36347681 36348780 -5.9721228 7.62E-05
36348231 36349329 -6.7553451 5.15E-06
36348781 36349876 -8.3451442 5.28E-08
36349329 36350423 -7.2373471 1.21E-08
36353148 36354232 -7.6190587 2.98E-08
36353690 36354773 -6.7603357 0.00052385
36357470 36358544 -8.2201229 5.59E-10
36358545 36359617 -5.0586849 0.00053423
36361223 36362291 -6.3245385 0.00018379
36361757 36362824 -7.7342382 5.85E-07
36363358 36364423 -4.2037648 0.00842511
36367613 36368674 -5.3934605 0.00532178
36368143 36369205 -7.6240911 1.65E-07
36368674 36369735 -7.8172476 1.57E-08
36369205 36370266 -6.0627492 0.0004902
36370796 36371857 -7.1648961 7.03E-07
36371327 36372388 -7.1668825 6.26E-07
36375572 36376634 -8.1453593 5.00E-10
36376103 36377166 -9.3578651 5.39E-14
36376634 36377698 -8.6138615 3.31E-10
36377166 36378230 -7.2504001 8.36E-07
36377697 36378762 -7.5580953 4.60E-07
36378230 36379295 -8.4133492 1.87E-09
36378762 36379828 -9.1784314 2.31E-12
36379295 36380362 -9.2348118 1.02E-11
36379828 36380896 -9.6763744 6.47E-14
36380361 36381430 -2.6218818 0.00652073
36380895 36381965 -3.3237615 0.00243029
36381430 36382501 -5.9783881 0.00249267
36382500 36383573 -5.7246547 0.00453991
36383036 36384111 -8.4451345 2.92E-11
36383573 36384648 -9.2083949 6.67E-14
36384110 36385187 -8.9215142 1.54E-11
36384648 36385726 -6.9420433 0.00025483
36386806 36387890 -8.5415748 3.55E-10
36387347 36388434 -9.0233909 1.55E-11
36387890 36388978 -8.8873084 1.85E-11
36388433 36389523 -7.9683448 1.95E-08
36388977 36390069 -6.9327362 2.49E-06
36389522 36390615 -8.2426883 3.30E-09
36390068 36391163 -9.9377838 7.34E-15
36390614 36391712 -9.6751239 3.60E-14
36391162 36392262 -9.0775585 9.77E-12
36391711 36392813 -10.61135 2.95E-17
36392261 36393366 -10.689834 2.95E-17
36393365 36394474 -7.5904273 4.51E-09
36393918 36395030 -8.6248651 2.98E-12
36394473 36395587 -8.0668095 2.29E-10
36395028 36396145 -5.2569823 0.0015807
36395586 36396705 -9.1849292 2.02E-12
36396144 36397266 -9.2193345 6.15E-13
36396704 36397829 -9.0894818 8.09E-14
36397265 36398393 -8.967531 6.11E-12
36400090 36401227 -8.675502 1.80E-12
36400658 36401798 -9.2692792 7.93E-14
36401228 36402371 -9.378327 2.02E-12
36401799 36402945 -5.038744 7.61E-08
36402372 36403521 -3.5759767 0.00187333
36403523 36404678 -9.3643406 4.72E-14
36404100 36405259 -10.122268 6.08E-17
36404680 36405841 -9.1409659 2.10E-11
36405261 36406426 -8.2309386 4.38E-10
36405843 36407012 -8.5509922 1.38E-10
36407013 36408189 3.99025471 0.00241178
36408190 36409373 -7.4028371 1.83E-08
36409375 36410564 -5.8642315 0.00036696
36409969 36411162 -6.3031778 5.49E-05
36412365 36413572 -7.0352108 0.00016797
36412969 36414179 -7.7426722 1.46E-05
36414790 36416010 -4.6024336 0.00552198
36415400 36416624 -7.9606583 3.87E-08
36416012 36417240 -8.2026183 5.28E-08
36416626 36417857 -6.4008285 1.11E-06
36417242 36418476 -7.2248679 3.94E-07
36417859 36419097 -8.6052284 3.53E-10
36418478 36419719 -9.0721955 2.10E-11
36425395 36426671 -6.1071262 0.00045891
36426033 36427312 -3.2468646 0.00427006
36426673 36427955 -5.5419205 1.03E-07
36427314 36428599 -7.4220256 1.58E-06
36427956 36429244 -7.2037424 6.20E-07
36428600 36429891 -7.6476059 5.36E-09
36429246 36430539 -6.7768232 2.50E-05
36432493 36433799 -5.4958363 0.00874461
36433146 36434455 -6.6494338 0.00223817
36433800 36435112 -6.8541375 0.00085553
36441738 36443074 -5.8566927 0.00652073
36443075 36444413 -5.9206517 0.00249267
36446428 36447774 7.39577926 2.96E-05
36447775 36449123 -6.6516338 4.41E-06
36449124 36450475 -5.6525322 0.00666041
36449799 36451151 -5.6819926 0.00326723
36455222 36456582 -5.9937473 0.00190552
36455902 36457263 -4.5299829 0.00652073
36464790 36466165 -6.4190745 0.00010023
36465477 36466854 -6.1987382 0.00070554
36468234 36469615 -5.8512261 0.00088366
36468924 36470307 -6.8216637 1.71E-06
36469616 36471000 -6.8430583 7.69E-07
36470308 36471694 -7.453335 1.17E-07
36471001 36472388 -7.582741 2.40E-06
36471694 36473083 -6.7916012 0.0001583
36474478 36475874 -9.0531798 5.21E-13
36475176 36476573 -7.7620541 1.05E-06
36476574 36477976 4.5685569 0.0094656
36482213 36483633 -6.0956116 0.00219053
36482923 36484346 -6.1011021 0.00196216
36483634 36485060 -7.5770118 3.59E-05
36484347 36485775 -8.8197034 1.55E-11
36485061 36486492 -4.1300057 0.00056245
36489374 36490822 -4.6474085 0.00292553
36499642 36501131 -5.0309972 0.00810328
36501880 36503377 -5.8853815 0.00109411
36507905 36509423 -5.2677341 0.00433536
36510186 36511710 4.20340063 0.00745852
36547820 36549434 -5.0367016 0.00268999
36548627 36550243 -6.1675094 3.01E-05
36549435 36551053 -5.6311423 0.00690644
36551054 36552676 -6.0681017 0.0046105
36569979 36571652 -5.8495268 0.00299595
36575859 36577549 -8.1808458 8.61E-10
36586087 36587810 5.45802724 0.00225476
36599134 36600892 -7.70849 0.00071704
36613306 36615093 -6.6973562 1.05E-06
36639644 36641497 4.98883571 0.00388645
36643358 36645221 6.65546515 0.00076291
36644289 36646155 4.44734443 0.00452901
36675275 36677141 6.26116299 0.00199485
36678070 36679930 4.12581365 0.00954415
36681781 36683630 5.27931924 0.00840389
36689136 36690961 -6.0643028 0.0065149
36690048 36691870 -6.2983822 0.00056685
36692776 36694589 -6.5063716 0.00100762
36699984 36701773 -5.0619114 0.00491749
36700879 36702664 -5.3418691 0.00359391
36707985 36709750 -5.6312265 0.00214273
36716767 36718514 -6.0836102 0.00242154
36720257 36721999 4.76772143 0.00383356
36760629 36762445 6.69185046 6.37E-05
36769828 36771700 -5.4890296 0.00539627
36770764 36772642 -5.0261707 0.00831542
36780318 36782271 5.18649867 0.00453307
36817180 36819432 5.41118856 0.00788438
36820565 36822832 4.62864398 0.0055187
36830832 36833128 3.91908454 0.00316182
36833129 36835428 6.36933438 0.00084076
36834279 36836579 7.26272323 8.89E-05
36838884 36841188 4.91703132 0.00840389
36845802 36848109 -5.5795055 0.00055954
36861965 36864276 5.4450722 0.00057203
36863120 36865432 5.94395026 0.00075854
36880462 36882777 5.11807537 0.00104606
36895580 36897926 3.31146263 0.00889445
36896753 36899103 4.04485602 0.00281761
36905029 36907416 4.68161925 0.00606764
36915906 36918366 4.73789359 0.0032947
36917136 36919606 4.96756688 0.00784981
36958414 36961325 4.74411509 0.00074215
36959870 36962799 5.84583263 0.00204766
36981249 36984462 4.96011295 0.00056245
36982856 36986093 5.03870752 0.00056245
36991078 36994449 4.9324713 0.00491749
37008653 37012362 4.85873578 0.00382578
37028134 37032277 6.1739032 0.00104737
37030206 37034396 5.79386359 0.00225476
37032301 37036541 6.74526785 0.00074215
37034421 37038710 5.82107512 0.00222416
37036565 37040905 5.76476449 0.00064035
37038735 37043124 5.90300992 0.00011139
37043150 37047640 4.45955355 0.00235656
37052283 37056975 5.01655284 0.00666041
37095061 37100368 5.23676744 0.00434717
37103038 37108370 5.31556159 0.00195862
37105704 37111038 5.79920985 0.00054327
37159176 37163807 4.10587535 0.00840389
37170498 37174877 7.33275151 4.41E-06
37172687 37177018 5.49103364 0.00612315
37174853 37179137 5.24234056 0.00313695
37181212 37185366 5.85259436 0.00105398
37195362 37199276 4.48212577 0.00190552
37222017 37225778 5.30380422 0.00634469
37237127 37240932 6.31057776 0.00142198
37256273 37260121 6.2641119 0.00062633
37258197 37262045 6.06703571 0.00217929
37260121 37263969 6.10036762 0.00109411
37317455 37321316 5.98073692 0.00022105
37319385 37323257 5.83117255 0.00190552
37325210 37329115 6.6395281 5.82E-05
37356988 37360994 5.84564246 0.00074215
37358991 37362993 5.94765868 0.00165723
37382680 37386536 5.32308698 0.00195583
37384608 37388447 5.56009357 0.00199325
Appendix XII Table 4: Differentially connected near-bait interactions in STAG2 12 h PMA cells compared to WT 12 h PMA K562 cells
Chromosome location (Chr21) log2FoldChange Adjusted p value
Start End
35474191 35488395 8.88363712 0.00039423
36329538 36330704 -7.104548 0.00325246
36517085 36518628 8.07291617 0.00534208
Appendix XII Table 5: Differentially connected near-bait interactions in STAG2 72 h PMA cells compared to WT 72 h PMA K562 cells
Chromosome location (Chr21) log2FoldChange Adjusted p value
Start End
35616930 35633181 6.39080745 0.00248597
35820274 35829140 5.5848525 0.00097742
35824675 35833604 5.60497619 0.002314
35861078 35870515 5.67484009 0.00573638
35865777 35875254 6.38202372 0.00294739
36037368 36041588 5.93157339 0.00653349
36085729 36088522 5.19853978 2.60E-05
36087139 36089905 3.82433892 0.00644468
36095322 36097948 5.91682845 0.00492169
36103113 36105633 5.99928094 0.00569621
36126230 36128589 5.74903532 0.00151667
36212646 36214832 6.07280288 0.00314296
36230511 36232484 4.39631305 0.00082114
36231504 36233463 5.13758583 0.0092545
36245601 36247365 4.08172618 0.00950164
36246489 36248241 4.13413137 0.00769454
36247371 36249111 3.68908087 0.00950164
36255049 36256690 4.35292358 0.00644468
36255874 36257506 4.11377844 0.00961065
36264639 36266171 -6.0507818 0.00031558
36265409 36266933 -6.0507818 0.00031558
36277936 36279346 -6.3021965 0.00037603
36278644 36280049 11.6729562 1.33E-09
36280751 36282141 3.78712369 0.0070179
36281449 36282833 3.48142505 0.00992013
36290302 36291632 2.88412113 0.00779809
36293617 36294930 4.04452087 0.0011826
36294275 36295584 4.6285964 0.00333127
36296890 36298187 -5.8193993 0.0048044
36321841 36323037 3.32492625 0.00601396
36326010 36327189 9.70851274 6.56E-06
36329538 36330704 -5.3848816 0.00813909
36353690 36354773 9.46576918 4.07E-05
36357470 36358544 10.7800466 6.46E-11
36358008 36359081 -6.1100997 0.0006696
36358545 36359617 -5.509932 0.00471123
36359081 36360153 4.41661147 0.00242101
36359617 36360688 3.72348938 0.00011918
36368143 36369205 -5.3552475 0.00242101
36375572 36376634 10.5041155 7.68E-11
36378230 36379295 -5.9803434 0.00038571
36378762 36379828 -6.2605281 4.97E-05
36379828 36380896 -4.7731068 0.00242101
36380895 36381965 3.59887114 0.00569621
36384110 36385187 10.3846842 1.70E-10
36386806 36387890 -5.2050704 0.00189496
36387890 36388978 -7.26707 3.12E-06
36388433 36389523 -7.0197525 2.31E-05
36389522 36390615 3.71566016 0.00693617
36390614 36391712 -5.8030608 0.00010992
36391162 36392262 -5.3553286 0.00091074
36396704 36397829 -6.1154249 1.69E-05
36397265 36398393 -5.9315806 0.00011918
36400658 36401798 -5.4946701 0.00018284
36401799 36402945 9.67500252 8.95E-10
36403523 36404678 -4.5884313 0.002314
36404100 36405259 -6.7406599 1.60E-06
36405261 36406426 4.97333288 4.59E-05
36408190 36409373 -6.8682933 6.56E-06
36409375 36410564 4.14902314 0.00961065
36412365 36413572 9.90821104 8.81E-06
36412969 36414179 5.19516829 0.006363
36415400 36416624 -5.3966426 0.002314
36417242 36418476 3.55214792 0.00964533
36418478 36419719 -7.3674023 3.66E-06
36427314 36428599 8.28077489 2.02E-05
36438413 36439740 6.19923091 0.00355447
36439076 36440405 5.34938019 0.00781592
36447101 36448448 3.42138535 0.00964533
36447775 36449123 3.57663208 0.00973886
36462046 36463416 5.37629124 0.00091074
36466854 36468233 4.03332205 0.00678425
36467544 36468924 4.03332205 0.00678425
36468924 36470307 -5.051629 0.00303796
36470308 36471694 3.53892576 0.00955617
36473780 36475175 4.22259507 0.00195782
36474478 36475874 3.02786348 0.00781592
36476574 36477976 7.73871874 0.00086778
36483634 36485060 -5.6762586 0.00949249
36525620 36527182 -5.0371467 0.0098543
36548627 36550243 3.83613999 0.00674913
36564157 36565814 -5.8356448 0.00350161
36564985 36566644 -5.6011714 0.002314
36574173 36575858 3.91851685 0.0048044
36576704 36578397 4.09421663 0.00569621
36581802 36583511 4.57197733 0.00781592
36583512 36585226 -5.8699908 0.00294739
36598256 36600012 6.96899254 1.69E-05
36599134 36600892 8.45991798 0.0005704
36613306 36615093 4.45512151 0.00042992
36624088 36625898 4.88082738 0.00813909
36624993 36626806 5.09223372 0.00189172
36626807 36628624 3.59833853 0.00779809
36627716 36629535 3.92259217 0.0098543
36638718 36640569 5.23900589 0.00950164
36639644 36641497 5.30117871 0.00781592
36654596 36656478 6.9292274 0.0008546
36728951 36730687 4.62761545 0.00314296
36729819 36731555 4.37422906 0.00775817
36735029 36736767 9.11645074 0.0001077
36739379 36741122 4.39342681 0.00346245
36756124 36757918 5.238339 0.00779809
36777412 36779341 3.42447573 0.00798702
36786243 36788249 3.67964767 0.00678425
36787246 36789261 3.86273218 0.00632354
36810491 36812704 5.61215239 0.002314
36845802 36848109 5.854747 0.0001077
36846955 36849263 5.16856943 3.66E-06
36848109 36850417 3.85732002 0.00035811
36853880 36856189 -5.2477179 0.00973886
36885096 36887417 5.06153769 0.00964533
36895580 36897926 4.68289553 0.00156111
36896753 36899103 4.72151154 0.00314296
36911032 36913456 4.01547479 0.00709354
36933626 36936251 4.86453523 0.00272994
36934938 36937577 5.12703258 0.00248597
37056999 37061790 5.78297835 0.00747781
37059395 37064233 6.08828716 0.00242101
37061814 37066698 5.31314356 0.00813909
37084541 37089766 6.46491345 0.002314
37087154 37092404 6.25238487 0.00091074
37113699 37119014 5.88738966 0.00303796
37116357 37121658 5.63425786 0.00967932
37170498 37174877 4.73204438 0.00882689
37187383 37191422 4.94111543 0.00781592
37199264 37203129 7.52992906 0.00015582
37205031 37208841 4.95470397 0.00758695
37212613 37216380 5.08057016 6.35E-05
37214497 37218259 5.81850888 0.00082114
37246671 37250505 9.29052153 1.69E-05
37263969 37267814 6.15339897 0.00314296
37283130 37286944 5.57851522 0.00272994
37302154 37305960 5.87814043 0.00346245
37305962 37309776 3.96493041 0.00604611
37307869 37311688 4.63176966 0.00011918
37309778 37313604 4.43627766 0.00426435
37313608 37317450 5.33377368 0.00398626
37315529 37319381 5.18200797 0.00781592
37346971 37350974 7.54451589 0.00069152
37348973 37352979 6.87174577 0.00729703
37374884 37378805 5.53180004 0.00346245
Appendix XIII: List of gene connections
To investigate which genes may be co-regulated with RUNX1, interactions with other genes
were identified. The analysis reports a gene as connected when an interaction overlaps the
genomic region of that gene; a connection therefore does not necessarily reflect an interaction
with the promoter or an exon of that gene.
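As an illustration only, the overlap criterion described above can be sketched in Python. The gene names and gene coordinates below are hypothetical placeholders rather than real annotation, the two interaction windows are borrowed from the Chr21 tables in these appendices purely as example inputs, and the function names are assumptions that do not correspond to the pipeline used in this work.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A named genomic interval on a single chromosome (closed coordinates)."""
    name: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True when two closed intervals share at least one base."""
    return a.start <= b.end and b.start <= a.end


def genes_connected(windows: list[Interval], genes: list[Interval]) -> list[str]:
    """List genes whose region overlaps any interaction window.

    Overlap is tested against the whole gene body, so a hit does not
    imply contact with a promoter or an exon.
    """
    hits = set()
    for window in windows:
        for gene in genes:
            if overlaps(window, gene):
                hits.add(gene.name)
    return sorted(hits)


# Example interaction windows (Chr21 coordinates used only as sample input).
windows = [
    Interval("window_1", 36483634, 36485060),
    Interval("window_2", 36525620, 36527182),
]

# Hypothetical genes: names and coordinates are invented for illustration.
genes = [
    Interval("GENE_A", 36480000, 36484000),
    Interval("GENE_B", 36490000, 36500000),
    Interval("GENE_C", 36527000, 36530000),
]

connected = genes_connected(windows, genes)
print(connected)
```

Under these assumed inputs, GENE_A and GENE_C are reported as connected because each shares at least one base with a window, whereas GENE_B lies between the two windows and is not reported.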
Appendix XIII Table 1: List of genes connecting to the RUNX1 +24 enhancer in each condition
WT 12 D WT 12 P WT 72 P STAG2 12 D
STAG2 12 P STAG2 72 P
AF127936.3 AF015262.2 AF064858.10
AF064858.10
AF240627.2 AF064858.6
AF127936.5 AF015720.3 AF064858.11
AF127577.11
AP000255.6 AP000253.1
AF127936.7 AF064858.6 AF064858.6 AIRE AP000688.14
AP000254.8
AL773572.7 AF127577.12
AF064858.8 AP000255.6 BRWD1 AP000962.1
AP000255.6 AF127577.13
AF064860.5 AP000320.6 BRWD1-AS1
AP001628.6
AP000688.11
AF129075.5 AF064860.7 AP000563.2 BRWD1-AS2
AP001628.7
AP000688.14
AL773572.7 AF124730.4 AP000569.9 CBR3-AS1 AP001630.5
AP001412.1 AP000223.42
AF127936.3 AP000688.11
CLIC6 BX322557.10
AP001626.2 AP000233.4 AF127936.5 AP000688.14
DOPEY2 C21orf62-AS1
AP001630.5 AP000282.3 AF129075.5 AP000688.15
HLCS CLIC6
BACE2 AP000320.6 AF246928.1 AP000705.6 HMGN1 DSCAM BRWD1 AP000474.1 AGPAT3 AP000705.7 IL10RB DSCR4 BRWD1-AS1
AP000563.2 AJ006998.2 AP000705.8 ITSN1 ERVH48-1
BRWD1-AS2
AP000688.11
AL109761.5 AP000959.2 KCNE2 H2AFZP1
BRWD1-IT1 AP000688.14
AP000255.6 AP001136.2 LINC00160 HMGN1
CBR1 AP000688.29
AP000318.2 AP001412.1 LINC01426 KCNE1
CLIC6 AP000695.4 AP000320.6 AP001628.6 LINC01436 KCNE1B DSCR3 AP000696.2 AP000322.53 AP003900.6 MIR548X LINC00160
DSCR4 AP000705.6 AP000477.2 ATP5J MIR548XHG
LINC00205
GAPDHP16 AP000705.7 AP000477.3 BAGE2 MORC3 LINC00315 GRIK1 AP000705.8 AP000569.9 bP-2171C21.3 PAXBP1 LINC00334
GRIK1-AS1 AP001042.1 AP000688.14
bP-2171C21.4
PAXBP1-AS1
LINC01426
HLCS AP001043.1 AP000688.15
bP-2189O9.2
SCAF4 LINC01549
HMGN1 AP001412.1 AP000688.29
BRWD1 SETD4 LINC01668
IFNAR2 AP001610.5 AP000695.6 C21orf62-AS1
SYNJ1 MIR5692B
IL10RB AP001619.2 AP001042.1 CBR3-AS1 Y_RNA MIR99A IL10RB-AS1
AP001619.3 AP001437.1 CLIC6 LOC100506403
MIR99AHG
LCA5L AP003900.6 AP001469.5 CRYZL1 RUNXOR MIRLET7C LINC00159 ATP5J AP001469.7 CYCSP41 AF015720.3 NDUFV3 LINC01671 BACE2 AP001469.9 DNMT3L AP000322.53 PDE9A
METTL21AP1
BACH1 AP001625.4 DOPEY2 CH507-396I9.3
POFUT2
MIR3197 BAGE2 AP001625.5 DSCR3 CMP21-97G8.2
RNU6-113P
MIR5692B BRWD1 AP001625.6 DSCR4 KCNE1 SCAF4 PDE9A BTF3L4P1 ATP5J DSCR8 KCNE1B SETD4 PDXK BTF3P6 BACH1 DYRK1A RCAN1 SNORA81 PKNOX1 BX322557.10 BRWD1 ERG RUNX1-IT1 SOD1
PLAC4 C2CD2 BRWD1-IT1 EZH2P1
TRAPPC10 POLR2CP1 CBR1 BTF3L4P1 FAM207A
WDR4
PRDM15 CCT8 C21orf91 FAM3B
Y_RNA PSMG1 CHAF1B C21orf91-OT1 FDX1P2
LOC100506403
RAD23BLP CLDN14 C2CD2 GABPA
RUNXOR RBMX2P1 CLDN17 CBR1 GART
AF015720.3
RBPMSLP CLIC6 CBR3-AS1 GRIK1
AP000322.53
RIPK4 CXADR CCT8 HLCS
CH507-396I9.3
RNF6P1 CYCSP41 CH507-396I9.3
HUNK
CMP21-97G8.2
RNU6-992P DOPEY2 CLDN14 IL10RB
RUNX1-IT1 RUNX1 DSCAM CLIC6 ITGB2
SCAF4 DSCAM-IT1 CMP21-97G8.1
ITGB2-AS1
SETD4 DSCR3 CXADR ITSN1
TIMM9P2 DYRK1A DSCAM JAM2
USP25 ERG DYRK1A KCNE2
WRB EVA1C ERG KRTAP10-10
Y_RNA EZH2P1 ETS2 KRTAP10-11
AF015720.3 FAM3B EVA1C KRTAP10-6
AP000318.2 FDX1P2 EXOSC3P1 KRTAP10-7
AP000322.53
GABPA FDX1P2 KRTAP10-8
LINC01426 GART GABPA KRTAP10-9
RCAN1 GRIK1 GAPDHP14 LCA5L
RPL34P3 HLCS GAPDHP16 LINC00114
SMIM11A HLCS-IT1 GRIK1 LINC00160
SMIM11B HUNK GRIK1-AS1 LINC00649
LOC100506403
IMMTP1 HLCS LINC01424
RUNXOR ITSN1 HMGN1 LINC01547
JAM2 HUNK LINC01684
KCNE2 IGSF5 LL21NC02-1C16.1
KCNJ6 ITSN1 LL21NC02-1C16.2
KCNJ6-AS1 JAM2 MAPK6PS2
KRTAP10-12
KCNE1 MIR3648-2
KRTAP10-13P
KCNE2 MIR3687-2
KRTAP12-1 KCNJ15 MIR6508
KRTAP12-2 KRTAP10-1 MX2
KRTAP12-3 KRTAP10-10 NCAM2
KRTAP12-4 KRTAP10-11 NDUFV3
LINC00160 KRTAP10-2 NRIP1
LINC00205 KRTAP10-3 PFKL
LINC00310 KRTAP10-4 PIGP
LINC00315 KRTAP10-5 PRDM15
LINC00334 KRTAP10-6 PSMG1
LINC00649 KRTAP10-7 PTTG1IP
LINC01426 KRTAP10-8 RCAN1
LINC01436 KRTAP10-9 RIPK4
LINC01684 LINC00160 RNF6P1
LINC01700 LINC00189 RNU4-45P
LLPHP2 LINC00310 RP1-
100J12.1
LTN1 LINC00649 RP11-
313P22.1
MAP3K7CL LINC01426 RP11-
589M2.1
MIR155HG LINC01436 RPL3P1
MIR3197 LSS RRP1B
MORC3 LTN1 RSPH1
MRPL39 MAP3K7CL RWDD2B
MRPS6 MCM3AP SAMSN1
MX1 MCM3AP-AS1
SCAF4
MX2 MEMO1P1 SETD4
MYL6P1 METTL21AP1
SH3BGR
NRIP1 MRPS6 SMIM11A
OLIG2 OLIG2 SMIM11B
PIGP PAXBP1 SNRPGP13
PLAC4 PAXBP1-AS1
SON
POFUT2 PIGP SUMO3
PRDM15 POLR2CP1 TIAM1
PSMG1 RCAN1 TPTE
RN7SL163P RIMKLBP1 TSPEAR
RNA5SP489 RN7SL163P TTC3
RNGTTP1 RNF6P1 UBASH3A
RNU4-45P RP1-100J12.1 UBE2G2
RNU6-1149P
RP11-313P22.1
USP16
RNU6-696P RPL12P9 VN1R7P
RNU6-992P RPL23P2 WDR4
RP1-100J12.1
RPL34P3 WRB
RP1-225L15.1
RPL39P40 LOC100506403
RP11-313P22.1
RSPH1 AF015720.3
RP11-589M2.1
RWDD2B AP000318.2
RPL23P2 SAMSN1 KCNE1
RPL34P3 SCAF4 LINC00310
RPL39P40 SETD4 LINC01426
RPSAP64 SLC37A1 MIR802
RWDD2B SMIM11A MRPS6
SAMSN1 SMIM11B RPS20P1
SAMSN1-AS1
SYNJ1 RPS5P2
SETD4 TIMM9P2 SLC5A3
SMIM11A TSPEAR
SMIM11B TTC3
SNORA3 U3
SNORA62 USP16
SNORA80A WRB
SRSF9P1 Y_RNA
TIAM1 ZBTB21
TMPRSS2 ZNF295-AS1
TPTE LOC100506403
TSPEAR AF015720.3
TTC3 C21orf140
UBASH3A CH507-396I9.7
UMODL1 CMP21-97G8.2
UMODL1-AS1
KCNE1B
URB1 MIR802
URB1-AS1 RPS20P1
USP16 RUNX1
VN1R7P SLC5A3
Y_RNA SNORA11
LOC100506403
RUNXOR
AP000318.2
AP000322.53
C21orf140
CH507-396I9.3
CH507-396I9.7
CMP21-97G8.1
CMP21-97G8.2
KCNE1
RCAN1
RPS5P2
SLC5A3
RUNX1-IT1
Appendix XIV: List of significant changes in STAG2 12 h DMSO compared to STAG2 72 h PMA
Appendix XIV Table 1: Differential cis interactions in STAG2 72 h PMA cells compared to STAG2 12 h DMSO K562 cells
Chromosome location (Chr21) log2FoldChange Adjusted p value
Start End
16193679 16290033 5.86391632 0.00342781
16287686 16392414 -4.8616436 0.00409359
16337170 16447658 -6.2817559 0.00012778
22535850 22765112 -5.3708546 0.00321359
22876545 23117188 5.43274265 0.00202892
23369288 23626457 -5.6528021 0.00375885
23878355 24124540 5.05342004 0.00364158
24761799 25024643 6.15905653 0.00200644
32389417 32510405 4.90346805 0.00474697
32513939 32620098 4.80628151 0.00831305
33218357 33277752 -4.894004 0.00644409
33999313 34042381 4.99233029 0.0072539
34253734 34295441 -4.5073628 0.00819366
34643232 34676058 -5.6689901 0.00199124
35143611 35160271 -5.7105858 0.00205897
35152010 35168532 -5.8140152 0.00203572
35160337 35176727 -5.8371813 0.00382979
35232484 35247888 -5.8832819 0.00186227
35240230 35255545 -5.9020399 0.00219482
35300676 35315394 -6.3841202 0.00031727
35308065 35322722 -7.0977311 0.00049094
35436900 35450845 -5.4020436 0.00279414
35443887 35457803 -5.7110624 0.00808936
35478603 35492364 -6.1583276 0.00397153
35485500 35499228 -7.3015989 0.00041831
35506094 35519714 -5.6341086 0.00993049
35784920 35794816 4.30548294 0.00509029
35823560 35832710 4.42665524 0.00753341
35832755 35841723 5.20170137 0.00913565
35859255 35867690 8.36421118 6.47E-06
35880124 35888128 -5.3281192 0.00847348
35903755 35911258 -5.2503345 0.00664818
36021891 36026806 -6.535467 0.0020892
36052207 36056497 6.36072775 0.00183374
36074896 36078750 6.26158833 0.00049999
36078768 36082551 6.79553391 0.00253267
36097095 36100555 -5.3974861 0.00744211
36103984 36107328 4.6414395 0.00806999
36105670 36108987 4.80922853 0.00076676
36107342 36110631 4.92457287 0.00813439
36109000 36112262 3.82601073 0.00829254
36112275 36115485 5.15725287 0.00445428
36113893 36117077 6.39617859 0.00122301
36115497 36118656 6.28087387 0.00227957
36117089 36120223 5.47469903 0.0007658
36118668 36121778 5.74373025 0.00023888
36120235 36123321 4.29131238 0.00397123
36124863 36127878 7.69970523 0.00031408
36157164 36159733 5.31664623 0.00466991
36178077 36180394 -6.4238107 0.00442351
36182697 36184962 6.33079209 0.0010642
36183836 36186089 4.35662179 0.00424197
36194887 36197021 6.2781405 0.00149473
36200196 36202277 4.9283002 0.00932299
36213393 36215346 8.73987292 5.24E-06
36216323 36218250 -6.0057821 0.00371672
36243603 36245303 7.23413186 0.00012512
36250339 36251990 6.41840825 0.00232423
36253634 36255261 4.29466569 0.00520545
36255264 36256880 6.54826795 3.05E-06
36256075 36257685 6.7786123 1.28E-06
36257688 36259287 -5.7516519 0.0022048
36266370 36267913 -5.4044968 0.00522296
36267144 36268682 -5.5749242 0.00318493
36269452 36270976 -6.1952272 0.00042122
36270979 36272494 5.24954146 8.02E-05
36271738 36273249 5.05029119 0.00466991
36278479 36279951 9.11879907 4.07E-08
36279217 36280685 9.10568924 1.27E-08
36280687 36282148 9.96475177 2.53E-11
36281420 36282876 10.0030672 2.11E-11
36282150 36283603 7.2543056 5.17E-05
36282878 36284327 7.267999 2.71E-05
36287208 36288634 5.4507541 0.0009367
36287923 36289346 5.48484313 0.00103209
36289348 36290764 -5.3562441 0.00474829
36290765 36292175 6.16887781 7.97E-07
36291472 36292878 8.49998497 9.96E-10
36293581 36294976 9.48666016 1.69E-11
36294280 36295673 9.42356641 4.13E-09
36305263 36306608 7.78796024 0.00024295
36309954 36311281 7.00839478 7.71E-05
36311282 36312604 3.874062 0.00903138
36311944 36313264 9.62390448 2.18E-10
36313924 36315237 4.9712562 0.00731705
36316548 36317852 6.15266037 0.00170019
36317201 36318502 9.11561781 1.50E-09
36317853 36319152 9.31437377 2.64E-07
36321741 36323028 3.48271698 0.00296189
36322386 36323671 5.0790562 0.0032143
36324953 36326231 4.94922112 0.00085693
36331937 36333195 6.78039204 0.0016612
36333196 36334450 5.39157606 0.00498829
36335078 36336327 9.16501077 1.40E-07
36335703 36336952 8.22164842 4.77E-09
36336952 36338198 8.75800874 1.10E-05
36337576 36338819 8.33783655 6.32E-09
36338198 36339441 3.92381229 0.00396733
36338820 36340061 5.05953517 0.00189014
36340062 36341300 6.38840711 6.78E-06
36340682 36341919 6.09958357 0.00069069
36341301 36342536 5.71437787 0.00202855
36343771 36345001 7.75787409 0.00011951
36346844 36348068 7.40707966 1.11E-07
36347456 36348680 6.98617274 8.03E-07
36348069 36349291 8.01430446 1.16E-07
36348680 36349902 7.63758264 3.05E-06
36349292 36350512 -4.1354402 0.00625147
36352947 36354161 9.36897263 1.97E-10
36353555 36354768 6.77679685 0.0008522
36357190 36358398 8.10083758 6.32E-09
36357795 36359002 -5.1152283 0.0021679
36359002 36360208 6.20858216 9.77E-07
36360811 36362014 5.65592073 0.00093715
36361413 36362615 7.08985193 3.05E-06
36363217 36364417 5.6998418 0.00062424
36368612 36369807 7.96066213 7.07E-08
36369210 36370405 7.79227784 5.41E-06
36370405 36371599 6.98964564 2.43E-05
36371003 36372196 8.48952557 4.77E-09
36372196 36373389 -4.7126163 0.00681217
36375178 36376369 7.97514 1.05E-08
36375774 36376965 8.46543677 4.33E-10
36376369 36377560 6.87164047 3.99E-06
36376965 36378155 6.49272082 1.46E-05
36377560 36378750 6.51577197 0.00051865
36379940 36381129 7.76836353 1.83E-09
36382913 36384101 8.19486754 1.83E-09
36383507 36384695 8.70214754 7.00E-11
36384101 36385289 7.88858675 3.34E-08
36388853 36390041 8.66212113 5.90E-09
36389447 36390636 8.76037028 9.96E-10
36400743 36401934 7.95025266 4.13E-09
36401339 36402530 5.74612908 0.00069069
36403126 36404318 -6.0357879 0.00025132
36404915 36406108 8.17048154 8.37E-09
36405511 36406704 8.2239719 4.77E-09
36407899 36409094 7.28607373 8.75E-06
36409692 36410889 8.8601203 2.25E-08
36410290 36411487 9.3885889 8.84E-08
36412086 36413284 7.17619303 0.00012778
36412685 36413884 8.01748321 9.23E-06
36414484 36415685 5.86499505 0.0007024
36416286 36417488 6.88736529 2.17E-05
36416887 36418089 10.1733878 5.63E-13
36417488 36418691 9.59154074 1.79E-10
36425940 36427152 9.46629636 2.11E-11
36426546 36427760 6.31447739 2.88E-08
36427153 36428367 3.93077012 0.00822862
36427760 36428975 6.59679415 6.53E-05
36428367 36429583 7.65349876 1.40E-07
36428975 36430192 7.26378089 1.10E-06
36438757 36439987 6.91199385 0.00065793
36439372 36440603 6.14925001 0.00924672
36444930 36446170 4.71329953 0.00943458
36447413 36448657 9.30726002 7.08E-10
36448035 36449280 7.82083118 3.06E-05
36454913 36456170 5.79710857 0.00306027
36455542 36456800 6.12998107 0.00253267
36460592 36461860 -6.8333284 0.00091342
36469524 36470810 7.30008641 1.92E-07
36470167 36471454 8.5845074 5.10E-09
36470810 36472099 8.68900968 2.08E-07
36474685 36475982 9.99750841 5.63E-13
36475334 36476631 6.07165637 0.00049999
36477933 36479237 6.63647668 0.0002694
36484482 36485800 8.23693073 5.18E-06
36485141 36486460 4.67621314 0.00031408
36493775 36495114 -5.8752384 0.00266934
36507315 36508686 5.44741229 0.00466335
36508000 36509373 7.91544296 2.06E-05
36525409 36526824 -5.4466584 0.00614928
36526116 36527533 -5.4517141 0.00176164
36527534 36528954 6.18594195 0.00418337
36547749 36549219 6.62979028 0.00219296
36548484 36549956 9.16765578 9.96E-10
36552172 36553653 4.78145992 0.00044474
36562621 36564128 6.40969238 0.00313258
36564884 36566396 -4.6857336 0.00779839
36567155 36568672 9.22163454 1.40E-07
36567914 36569433 9.05790068 8.03E-07
36568673 36570195 7.16735245 0.0015729
36571722 36573251 -4.8063868 0.00855029
36574785 36576321 3.51398395 0.00253267
36575553 36577091 8.58721717 1.94E-08
36576322 36577862 9.07635381 9.96E-10
36577092 36578634 4.08026932 0.00253267
36580956 36582508 4.44860024 0.00118968
36581732 36583285 5.43783997 0.00051611
36584065 36585624 5.19288271 0.00393797
36594271 36595854 6.15128015 0.00160141
36598238 36599831 5.36914918 0.00046146
36599035 36600629 9.99934587 1.32E-06
36609474 36611093 -4.3698555 0.00462971
36612719 36614346 5.11568005 0.00397123
36613532 36615161 9.58710989 1.69E-11
36620074 36621719 5.03147673 0.00847348
36625022 36626677 5.05492652 0.00065793
36625849 36627507 4.03410268 0.00869538
36629171 36630836 -7.003066 0.00091276
36630003 36631671 -6.9765212 0.0007058
36631672 36633343 -5.2309112 0.00349047
36632508 36634181 -5.0187545 0.0068581
36635020 36636700 3.92357996 0.00346738
36635860 36637541 4.70853391 0.00589323
36644310 36646011 -5.1397395 0.00534331
36678230 36680011 -6.1696126 0.00065238
36680012 36681798 -5.4258578 0.00152795
36680905 36682693 -5.8357132 0.00152115
36692611 36694428 8.50167259 2.29E-05
36699911 36701747 5.08698912 0.00891689
36700829 36702667 5.57415448 0.00397999
36701748 36703588 5.27879921 0.00574901
36708215 36710072 7.88117736 1.56E-05
36711005 36712870 7.86184739 0.00085693
36721336 36723230 9.67340601 1.30E-06
36722283 36724180 7.52574543 0.00011951
36737632 36739576 5.64719404 0.0033521
36740553 36742506 8.81583236 1.11E-07
36755374 36757378 5.10121704 0.00235956
36760401 36762422 -6.4717096 0.0009367
36761411 36763436 -6.7162771 6.46E-06
36769564 36771619 6.72228717 0.00399923
36785192 36787309 -5.2462769 0.00187615
36793723 36795876 -6.4362647 0.00016942
36803497 36805693 6.06936737 0.00025205
36809012 36811232 9.17706758 4.08E-06
36810122 36812348 8.86807383 5.75E-06
36819097 36821364 -5.9229886 0.00024525
36829396 36831713 -6.6250797 0.00085578
36847072 36849476 3.19423968 0.00602858
36853113 36855549 5.81392578 0.0073898
36862945 36865433 -6.4243639 0.00175456
36880669 36883253 -5.789861 0.00360003
36915511 36918300 -5.4894569 0.00108972
36938338 36941266 6.49251367 0.00022981
36947191 36950174 6.6451659 0.00209437
36951679 36954690 8.74416556 2.80E-07
36957728 36960776 -4.267691 0.00693789
36962314 36965390 5.81521445 0.00010327
36963852 36966937 6.6754402 3.30E-05
36973177 36976319 5.71456785 0.00410491
36990714 36993959 -5.8539048 0.00393797
36992337 36995591 -5.6809478 0.00545778
37008816 37012162 -5.3731915 0.00642581
37036116 37039601 -7.0225175 0.00051611
37106625 37110350 -6.2762985 0.00160141
37157395 37161183 5.53000647 0.00410395
37187773 37191580 5.65172598 0.00093715
37212576 37216405 4.03696126 0.0009367
37245306 37249188 6.70840535 0.00035383
37247247 37251133 6.7562262 0.00031408
37376917 37381656 6.39608048 0.00034011
37430352 37435904 -6.2005704 0.00083817
37456313 37462375 -6.1134865 0.00932299
37539859 37548012 -5.5442088 0.00334075
37592907 37602638 6.7819362 0.0009367
37634048 37645088 -4.2859502 0.00785999
37674649 37687034 -5.2094426 0.00642581
37865474 37884227 7.43317842 0.00011951
38182100 38205843 6.36973826 0.00315036
38426343 38451967 -6.075009 0.00175456
38439155 38464954 -4.9213404 0.00473672
38601369 38630247 -5.4469672 0.00837855
38790145 38825524 -4.5339137 0.00832257
38844384 38882175 -4.3926606 0.0076947
39379992 39431602 -6.0661361 0.00202892
39959632 40013021 7.35958173 0.00038437
40383991 40437879 -5.9831322 0.00186227
40521163 40578110 -6.1239067 0.00160141
40549637 40607654 -7.4662199 3.51E-05
41082235 41192923 5.30969732 0.00720145
41960980 42110232 5.70041547 0.00511626
42327921 42467118 -5.3236343 0.0072539
44313418 44387118 7.77763181 3.70E-06
44588018 44672916 -4.6105679 0.00815263
45203237 45280637 5.66002205 0.00410491
Appendix XIV Table 2: Differential near-bait interactions in STAG2 72 h PMA cells compared to STAG2 12 h DMSO K562 cells
Chromosome location (Chr21) log2FoldChange Adjusted p value
Start End
35436685 35448601 -6.0857456 0.00059104
35442448 35454754 -6.2731354 0.00225731
35448407 35461102 -5.3307194 0.00758347
35460911 35474374 -5.4099975 0.00396781
35474191 35488395 -6.6574472 0.00021416
35481117 35495673 -6.5801675 0.00070013
35488228 35503117 -7.5036046 0.00015947
35510591 35526316 -5.794641 0.00682029
35758987 35768060 -5.2165839 0.00436163
35820274 35829140 3.92392103 0.00762037
35903813 35913177 -5.522952 0.00277216
35917872 35926974 -5.7170437 0.00574955
35922480 35931468 -6.113011 0.00110525
36018477 36023451 -5.7857828 0.00392621
36021017 36025886 -6.7927842 0.00064853
36039518 36043657 -6.1093502 0.00451839
36053474 36057119 5.74975949 0.00063965
36067476 36070695 8.09423417 3.10E-05
36073869 36076922 5.75049924 0.00332039
36078448 36081394 6.09584481 0.00765344
36097958 36100545 -5.8280405 0.00249966
36106893 36109371 8.88047564 6.43E-07
36108138 36110604 8.41167605 7.90E-06
36111836 36114270 5.79605247 0.0008147
36113058 36115482 4.50555696 0.00858253
36114274 36116689 5.70466809 0.00362316
36117896 36120289 4.91532515 0.00054761
36119096 36121482 5.09296712 0.0005213
36120292 36122672 8.16806929 9.78E-08
36121485 36123860 3.52660994 0.00623036
36123862 36126228 6.83826906 0.00081453
36125047 36127409 7.17074017 0.00050799
36157963 36160322 4.70290259 0.00407907
36178044 36180403 -6.5938745 0.00186164
36182762 36185114 5.77498867 0.00250278
36183939 36186289 3.78948919 0.0093649
36209319 36211537 6.65970543 0.00332039
36212646 36214832 5.17999525 0.00343304
36213745 36215920 4.60367943 0.00623036
36215926 36218078 -6.2287521 0.00162394
36243806 36245595 6.03604449 0.00077209
36253381 36255044 6.4051837 4.04E-05
36255049 36256690 5.40118095 5.45E-05
36255874 36257506 6.13577976 4.78E-06
36257511 36259122 -6.071433 0.0005773
36261516 36263082 -5.5745924 0.00440467
36266175 36267691 -5.693073 0.00186164
36266936 36268445 -5.8929383 0.00092378
36269199 36270685 -4.7459043 0.00396781
36271428 36272893 7.39629365 1.32E-05
36278644 36280049 8.47626445 2.67E-07
36279349 36280749 5.88273561 0.00229365
36280751 36282141 9.32271791 9.28E-11
36281449 36282833 9.35755075 5.57E-11
36282143 36283523 6.57986117 0.00012543
36282836 36284211 6.59308126 7.17E-05
36288291 36289632 8.41409446 2.34E-07
36290969 36292295 4.93437188 0.00015032
36293617 36294930 8.87598018 5.57E-11
36294275 36295584 8.73704516 2.01E-08
36296239 36297539 -4.9893355 0.00789699
36296890 36298187 -4.8207939 0.00789699
36314565 36315788 7.49896603 7.50E-08
36316400 36317616 5.93787377 0.00130598
36317009 36318223 7.64886122 5.92E-06
36317617 36318829 8.6586326 7.12E-07
36321841 36323037 2.85880135 0.00530882
36322440 36323634 4.47423164 0.00624517
36326010 36327189 6.48630118 0.00040072
36331867 36333024 6.10489364 0.00279432
36333602 36334752 5.22177302 0.00589405
36335327 36336470 8.61424774 3.23E-07
36335900 36337041 8.53667475 1.96E-10
36337042 36338179 8.08807553 0.00012543
36337611 36338746 5.3259128 0.00297428
36339878 36341004 6.67090109 5.25E-06
36340442 36341565 4.41760746 0.00624517
36341005 36342126 5.29524101 0.00458437
36341566 36342686 5.04193019 0.00472812
36343803 36344915 6.01427329 0.00340964
36346578 36347680 6.91611197 3.18E-07
36347130 36348230 6.51125992 5.99E-06
36348231 36349329 7.41012673 6.09E-07
36348781 36349876 7.07821381 8.52E-06
36353148 36354232 8.71656398 3.22E-10
36353690 36354773 6.2181543 0.00160582
36357470 36358544 7.66678156 1.93E-08
36358008 36359081 -5.798246 0.00022941
36359081 36360153 5.45257467 2.37E-05
36361223 36362291 5.91568037 0.00057599
36361757 36362824 5.96298918 0.00024763
36368674 36369735 6.1916237 2.00E-05
36369205 36370266 7.16297674 2.76E-05
36370796 36371857 7.9559838 4.25E-08
36371327 36372388 7.95799941 3.91E-08
36372388 36373449 -5.0616276 0.00231887
36375572 36376634 7.39662037 4.22E-08
36376103 36377166 7.9155326 4.23E-10
36376634 36377698 7.20737095 3.50E-07
36377166 36378230 6.00247042 9.26E-05
36381430 36382501 6.25125031 0.00167117
36383036 36384111 7.58060482 7.62E-09
36383573 36384648 8.19239228 9.17E-11
36384110 36385187 7.27158077 1.01E-07
36388977 36390069 8.04050851 4.58E-08
36389522 36390615 8.18851773 9.50E-09
36402372 36403521 -3.3526677 0.00615
36402947 36404099 -6.4052802 5.56E-05
36404680 36405841 7.07708111 6.41E-07
36405261 36406426 7.67635857 1.65E-08
36407013 36408189 -6.0414894 0.00012123
36409375 36410564 8.16295853 2.63E-07
36409969 36411162 8.58969704 1.89E-08
36412365 36413572 6.67171768 0.00045194
36412969 36414179 7.48261385 3.87E-05
36414790 36416010 5.19805826 0.00169526
36416012 36417240 6.47370689 4.25E-05
36416626 36417857 8.32922355 1.96E-10
36417242 36418476 8.93449959 3.22E-10
36417859 36419097 8.70673298 4.23E-10
36425395 36426671 6.48656042 0.0001889
36426033 36427312 3.57574031 0.00163044
36426673 36427955 5.26153509 7.12E-07
36427314 36428599 5.10900159 0.00187829
36427956 36429244 7.04581518 1.63E-06
36428600 36429891 7.15848213 1.01E-07
36429892 36431189 -5.9821829 0.00141003
36438413 36439740 5.99517147 0.00297428
36439076 36440405 6.23445205 0.00130598
36446428 36447774 -5.7569719 0.00197578
36447101 36448448 4.89204718 1.31E-05
36447775 36449123 8.79337306 7.24E-10
36455222 36456582 5.4545644 0.00504532
36460677 36462045 -7.0332119 0.0002511
36469616 36471000 6.74924793 1.63E-06
36470308 36471694 7.98815314 2.01E-08
36471001 36472388 8.00746203 7.12E-07
36474478 36475874 9.60736861 1.17E-13
36475176 36476573 5.68780928 0.0007157
36481504 36482922 -5.8163104 0.00162394
36484347 36485775 8.59528198 1.52E-10
36485061 36486492 3.8841857 0.00132208
36493739 36495204 -6.1089012 0.00074837
36507148 36508663 5.26009344 0.00630545
36507905 36509423 7.34675317 4.36E-05
36513242 36514775 -5.3173362 0.00395153
36516314 36517855 -5.5017113 0.00467387
36520951 36522503 -5.0549633 0.00674828
36525620 36527182 -5.9381324 0.00050799
36529533 36531103 -5.2801418 0.00162394
36537423 36539010 -5.510789 0.00049537
36538217 36539806 -5.2311225 0.00605158
36547820 36549434 6.90041921 2.56E-05
36548627 36550243 8.56415831 3.08E-09
36549435 36551053 8.4001037 3.10E-05
36551865 36553489 4.07906958 0.00112378
36564157 36565814 -5.3641166 0.00220175
36564985 36566644 -5.5645765 0.00050915
36566645 36568309 8.87079792 3.00E-06
36567477 36569143 8.44738811 3.57E-06
36571653 36573330 -5.0946442 0.00277375
36575015 36576703 2.89255552 0.00485284
36575859 36577549 9.23549274 3.18E-11
36576704 36578397 3.36271086 0.00769729
36580949 36582655 3.66835445 0.00352294
36581802 36583511 4.64640089 0.00160582
36598256 36600012 4.60675253 0.00126352
36599134 36600892 9.27003353 3.57E-05
36609742 36611522 -5.0781173 0.00030496
36612414 36614199 -6.4072496 0.00079017
36613306 36615093 9.02751731 5.57E-11
36615989 36617781 -5.1702582 0.00498024
36617782 36619578 5.98150753 0.00140614
36624993 36626806 4.22364429 0.00234029
36628625 36630448 -7.2845038 0.00034806
36629536 36631361 -7.3250371 0.00023556
36631362 36633192 -5.386783 0.00028369
36632277 36634110 -5.2554689 0.00229365
36635950 36637793 3.72955132 0.00432525
36644289 36646155 -5.4874679 0.00201106
36645222 36647089 -5.7016239 0.00455515
36663073 36664956 -6.1681355 0.00220175
36678070 36679930 -5.9626629 0.00093922
36679928 36681783 -6.3506823 6.52E-05
36680855 36682707 -6.4303147 0.0002511
36681781 36683630 -5.8688247 0.00575549
36692776 36694589 7.82142793 5.81E-05
36693683 36695492 -4.2375019 0.0092254
36700879 36702664 4.90280398 0.00861558
36710630 36712389 7.35556765 0.00067607
36711509 36713267 7.26403527 0.00171581
36721128 36722869 9.0055117 2.52E-06
36734160 36735898 -4.5910204 0.00809055
36749016 36750781 -6.8938244 5.59E-05
36755229 36757019 4.44296764 0.00348182
36756124 36757918 8.00306112 1.79E-05
36760629 36762445 -6.8698039 0.00022941
36761537 36763358 -7.1788108 2.34E-07
36764276 36766112 5.80709303 0.00407907
36769828 36771700 6.11400191 0.00186684
36770764 36772642 5.66022215 0.00272098
36774541 36776448 4.68783611 0.00156114
36785244 36787241 -5.7724043 0.00021416
36793363 36795435 -6.8419092 3.79E-06
36794399 36796481 -6.1091525 0.00200109
36801784 36803931 5.77163822 0.00433589
36803935 36806100 5.34462276 0.00182957
36809388 36811594 8.14513954 6.21E-05
36810491 36812704 4.52197815 0.00407907
36811597 36813817 -4.3276266 0.00634438
36829685 36831979 -7.1834139 5.23E-05
36838884 36841188 -5.5325229 0.00652333
36845802 36848109 10.2038466 5.57E-11
36846955 36849263 2.65570785 0.00758347
36863120 36865432 -6.7298954 0.00057648
36867743 36870055 5.59197015 0.00506037
36876991 36879304 -6.1757115 0.00017592
36880462 36882777 -6.1215596 0.00056329
36902651 36905026 -6.0752014 0.00498024
36912244 36914677 -6.6823463 0.00103827
36915906 36918366 -5.6577655 0.00187829
36947079 36949855 5.96104556 0.00452362
36948467 36951259 6.78019674 0.0004733
36958414 36961325 -5.2806253 0.00051959
36962808 36965773 5.01921642 0.00058573
36964290 36967274 5.59505943 0.00045611
36973386 36976488 7.34270572 6.52E-05
36991078 36994449 -6.363725 0.00112452
36992763 36996164 -5.6514451 0.0034993
37008653 37012362 -5.7851037 0.0020801
37043150 37047640 -6.4945969 0.00013199
37045395 37049936 -4.1547449 0.00943078
37097715 37103034 -4.5778842 0.00451839
37105704 37111038 -4.6751958 0.00650289
37134665 37139771 6.31946179 0.00049537
37187383 37191422 4.83341234 0.00257763
37212613 37216380 3.51274477 0.00163945
37229550 37233329 6.17766054 0.00200687