Genetics of distal hereditary motor neuropathies

306
GENETICS OF DISTAL HEREDITARY MOTOR NEUROPATHIES By alexander peter drew A thesis submitted for the Degree of Doctor of Philosophy Supervised by Professor Garth A. Nicholson Dr. Ian P. Blair Faculty of Medicine University of Sydney 2012

Transcript of Genetics of distal hereditary motor neuropathies

G E N E T I C S O F D I S TA L H E R E D I TA RYM O T O R N E U R O PAT H I E S

By

alexander peter drew

A thesis submitted for the Degree ofDoctor of Philosophy

Supervised byProfessor Garth A. Nicholson

Dr. Ian P. Blair

Faculty of MedicineUniversity of Sydney

2012

statement

No part of the work described in this thesis has been submitted in fulfilment ofthe requirements for any other academic degree or qualification. Except wheredue acknowledgement has been made, all experimental work was performedby the author.

Alexander Peter Drew

C O N T E N T S

acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . isummary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iilist of figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vlist of tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiacronyms and abbreviations . . . . . . . . . . . . . . . . . . . . . xipublications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv

1 literature review . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Molecular genetics and mechanisms of disease in Distal

Hereditary Motor Neuropathies . . . . . . . . . . . . . . . . . . . 11.1.1 Small heat shock protein family . . . . . . . . . . . . . . 21.1.2 Dynactin 1 (DCTN1) . . . . . . . . . . . . . . . . . . . . . 91.1.3 Immunoglobulin mu binding protein 2 gene (IGHMBP2) 111.1.4 Senataxin (SETX) . . . . . . . . . . . . . . . . . . . . . . . 141.1.5 Glycyl-tRNA synthase (GARS) . . . . . . . . . . . . . . . 161.1.6 Berardinelli-Seip congenital lipodystrophy 2 (SEIPIN)

gene (BSCL2) . . . . . . . . . . . . . . . . . . . . . . . . . 181.1.7 ATPase, Cu2+-transporting, alpha polypeptide gene (ATP7A) 201.1.8 Pleckstrin homology domain-containing protein, G5 gene

(PLEKHG5) . . . . . . . . . . . . . . . . . . . . . . . . . . . 211.1.9 Transient receptor potential cation channel, V4 gene (TRPV4) 221.1.10 DYNC1H1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 231.1.11 Clinical variability in dHMN . . . . . . . . . . . . . . . . 241.1.12 Common disease mechanisms in dHMN . . . . . . . . . 29

2 general materials and methods . . . . . . . . . . . . . . . . . 322.1 General materials and reagents . . . . . . . . . . . . . . . . . . . 32

2.1.1 Reagents and Enzymes . . . . . . . . . . . . . . . . . . . . 322.1.2 Equipment . . . . . . . . . . . . . . . . . . . . . . . . . . . 332.1.3 Plasticware . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

2.2 Study participants . . . . . . . . . . . . . . . . . . . . . . . . . . . 342.3 DNA methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2.3.1 DNA extraction from blood . . . . . . . . . . . . . . . . . 342.3.2 DNA oligonucleotides . . . . . . . . . . . . . . . . . . . . 352.3.3 PCR protocol . . . . . . . . . . . . . . . . . . . . . . . . . 352.3.4 Agarose gel electrophoresis . . . . . . . . . . . . . . . . . 362.3.5 DNA purification from PCR . . . . . . . . . . . . . . . . . 362.3.6 DNA sequencing . . . . . . . . . . . . . . . . . . . . . . . 37

3 genetic mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

3.1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 393.1.2 Homologous recombination and linkage disequilibrium 393.1.3 Genetic linkage analysis . . . . . . . . . . . . . . . . . . . 413.1.4 Penetrance in dHMN . . . . . . . . . . . . . . . . . . . . . 443.1.5 Characterisation of dHMN1 in a large Australian family 443.1.6 Previous genetic studies in dHMN1 at 7q34-q36 . . . . . 443.1.7 Hypothesis and aims . . . . . . . . . . . . . . . . . . . . . 48

3.2 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . 503.2.1 Microsatellite analysis . . . . . . . . . . . . . . . . . . . . 503.2.2 Microsatellite markers . . . . . . . . . . . . . . . . . . . . 513.2.3 Haplotype analysis . . . . . . . . . . . . . . . . . . . . . . 513.2.4 Genetic linkage analysis . . . . . . . . . . . . . . . . . . . 523.2.5 Association analysis . . . . . . . . . . . . . . . . . . . . . 53

3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543.3.1 Fine mapping of family CMT54 . . . . . . . . . . . . . . . 543.3.2 Linkage analysis using new markers in CMT54 . . . . . 573.3.3 Identification of additional dHMN families . . . . . . . . 603.3.4 Linkage power analysis . . . . . . . . . . . . . . . . . . . 623.3.5 Genetic linkage analysis in additional dHMN families . 653.3.6 Suggestive linkage in family CMT44 . . . . . . . . . . . . 653.3.7 Uninformative families . . . . . . . . . . . . . . . . . . . . 663.3.8 dHMN families excluded from 7q34-q36 . . . . . . . . . 743.3.9 Haplotype association analysis . . . . . . . . . . . . . . . 75

3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 763.4.1 Linkage analysis in CMT54 . . . . . . . . . . . . . . . . . 763.4.2 Linkage analysis in the extended dHMN cohort . . . . . 793.4.3 Proportion of dHMN caused by a pathogenic gene at

7q34-q36 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 824 analysis of candidate genes . . . . . . . . . . . . . . . . . . . . 84

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 844.1.1 Chapter overview . . . . . . . . . . . . . . . . . . . . . . . 844.1.2 Strategy for gene identification . . . . . . . . . . . . . . . 844.1.3 Identifying genes within a disease interval . . . . . . . . 854.1.4 Selection of candidate genes . . . . . . . . . . . . . . . . . 864.1.5 Gene screening methods . . . . . . . . . . . . . . . . . . . 88

4.2 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . 904.2.1 In-silico candidate gene prioritisation . . . . . . . . . . . 904.2.2 PCR and sequencing for gene screening . . . . . . . . . . 904.2.3 Analysis of effect of variants on exon splicing . . . . . . 914.2.4 RNA studies of splicing variants . . . . . . . . . . . . . . 924.2.5 RNA folding prediction . . . . . . . . . . . . . . . . . . . 93

4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 944.3.1 Positional candidate genes . . . . . . . . . . . . . . . . . . 94

4.3.2 Functional candidate gene selection . . . . . . . . . . . . 964.3.3 Screening candidate genes . . . . . . . . . . . . . . . . . . 994.3.4 Analysis of the interval defined by significantly associ-

ated haplotypes . . . . . . . . . . . . . . . . . . . . . . . . 1064.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

4.4.1 Gene screening summary . . . . . . . . . . . . . . . . . . 1074.4.2 Was the mutation missed? . . . . . . . . . . . . . . . . . . 108

5 copy number and structural variation . . . . . . . . . . . . 1095.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

5.1.1 CNV and structural variation . . . . . . . . . . . . . . . . 1095.1.2 CNV in dHMN and related peripheral neuropathies . . 1125.1.3 Methods of detecting CNV . . . . . . . . . . . . . . . . . 1135.1.4 Hypothesis and Aim . . . . . . . . . . . . . . . . . . . . . 115

5.2 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . 1155.2.1 Custom CGH Microarray . . . . . . . . . . . . . . . . . . 1155.2.2 Array processing . . . . . . . . . . . . . . . . . . . . . . . 1165.2.3 Software Nexus 5 . . . . . . . . . . . . . . . . . . . . . . . 1175.2.4 PCR confirmation of deletions . . . . . . . . . . . . . . . 118

5.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1185.3.1 Cytogenetic analysis . . . . . . . . . . . . . . . . . . . . . 1185.3.2 CNV analysis at 7q34-q36 using aCGH . . . . . . . . . . 119

5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1305.4.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1305.4.2 Comparison of methods for CNV analysis . . . . . . . . 1305.4.3 Limitations of array-based CGH . . . . . . . . . . . . . . 133

6 next generation sequencing of cmt54 . . . . . . . . . . . . . 1346.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

6.1.1 NGS technology . . . . . . . . . . . . . . . . . . . . . . . . 1346.1.2 NGS of large genomes . . . . . . . . . . . . . . . . . . . . 1386.1.3 Applications of NGS to Mendelian disease . . . . . . . . 139

6.2 Materials and methods . . . . . . . . . . . . . . . . . . . . . . . . 1416.2.1 NimbleGen array-based targeted sequence capture . . . 1416.2.2 454 GS FLX sequencing . . . . . . . . . . . . . . . . . . . 1426.2.3 NimbleGen array-based exome capture and Solexa se-

quencing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1436.2.4 Confirmation of sequence variants using Sanger sequencing1446.2.5 Sequence variant analysis using Galaxy Browser . . . . . 1446.2.6 NGS assembly and annotation using Bowtie and SAMtools1446.2.7 Unequal coverage comparison . . . . . . . . . . . . . . . 1456.2.8 In silico prediction of functional impacts of sequence vari-

ants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1456.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

6.3.1 Chromosome 7q34-q36 target region sequencing . . . . . 1466.3.2 7q34-q36 sequence variant analysis . . . . . . . . . . . . . 1476.3.3 Off-target and repeat mapping reads . . . . . . . . . . . . 154

6.3.4 Exome sequencing . . . . . . . . . . . . . . . . . . . . . . 1556.3.5 Further sequencing analysis using unequal coverage anal-

ysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1586.3.6 Exome wide analysis . . . . . . . . . . . . . . . . . . . . . 163

6.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1646.4.1 Limitations of the NGS analysis . . . . . . . . . . . . . . 166

7 exome sequencing of cmt44 . . . . . . . . . . . . . . . . . . . . 1697.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1697.2 Materials and methods . . . . . . . . . . . . . . . . . . . . . . . . 170

7.2.1 Exome sequencing . . . . . . . . . . . . . . . . . . . . . . 1707.2.2 NGS variant analysis . . . . . . . . . . . . . . . . . . . . . 1717.2.3 HRM analysis . . . . . . . . . . . . . . . . . . . . . . . . . 172

7.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1727.3.1 Exome sequencing in CMT44 . . . . . . . . . . . . . . . . 1727.3.2 Clinical detail of CMT44 . . . . . . . . . . . . . . . . . . . 1737.3.3 MFN2 genotyping in CMT44 using high resolution melt

analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1777.3.4 Exome wide sequence variant analysis in CMT44 . . . . 1777.3.5 Comparison of SNPs between CMT44 and CMT54 . . . 179

7.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1798 general discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 185

8.1 Genetic studies in family CMT54 . . . . . . . . . . . . . . . . . . 1858.2 Genetic linkage analysis in additional dHMN families . . . . . 1908.3 Genetic studies in family CMT44 . . . . . . . . . . . . . . . . . . 1918.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

a custom gnu bash scripts . . . . . . . . . . . . . . . . . . . . . . 194a.1 Linkage analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194a.2 NGS coverage visualisation . . . . . . . . . . . . . . . . . . . . . 197a.3 NGS variant comparison with fold coverage analysis . . . . . . 198

b pcr primers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205c pedigrees for additional dhmn families . . . . . . . . . . . 216d two point lod scores for additional

dhmn families . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230e multi-point lod scores for additional

dhmn families . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240f 7q34-q36 control haplotypes . . . . . . . . . . . . . . . . . . . 249g 7q34-q36 candidate genes and reported

gene expression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251h next generation sequencing . . . . . . . . . . . . . . . . . . . . 255i cnv identified with array cgh . . . . . . . . . . . . . . . . . . 257

bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261

A C K N O W L E D G E M E N T S

It is a pleasure to thank the people who made this PhD project possible. My

warmest thanks to my supervisors Professor Garth Nicholson and Dr Ian

Blair for your support, encouragement and the opportunities presented to

me throughout this project. To Garth, your expertise with the clinical aspects

of peripheral neuropathies as well as your academic experience have proved

invaluable. To Ian, thank you for your guidance in the lab, the opportunities

and resources I was given and the assistance with preparation of my thesis.

I am extremely grateful to all participating dHMN family members, without

whom this research would not be possible.

To my colleagues, in particular, Kelly Williams, for assistance with the

targeted sequencing of candidate genes and Jenn Solski, for assistance with

sequencing variants identified with NGS. To all the staff and students in the

Northcott Neuroscience and Molecular Medicine laboratories, thank you for

your friendly assistance over these years. The help of Annette Berryman in the

Molecular Medicine office and the procedural guidance of Tracey Dent and

Annet Doss in the ANZAC research institute office has been most helpful.

The support and encouragement of my friends and family has been essential.

My parents, Paul and Josephina Drew, have been a constant source of support,

both emotionally and financially during my postgraduate years. Finally, my

warmest thanks go to my girlfriend Victoria, without her loving support and

patience this thesis would never have been possible. You always bring me joy.

This work was supported in part by an Australian Postgraduate Award and

funding from the National Health and Medical Research Council of Australia

(511941).

i

S U M M A RY

The distal hereditary motor neuropathies (dHMN) are a clinically and geneti-

cally heterogeneous group of disorders that primarily affect motor neurons,

without significant sensory involvement. Using genome wide linkage analy-

sis in a large Australian family (CMT54), a form of dHMN was previously

mapped by this laboratory, to a 12.98 Mb interval on chromosome 7q34-q36.

The axonal neuropathy seen in this family was classified as dHMN1; with

autosomal dominant inheritance, early but variable age of onset, and muscle

weakness and wasting affecting the lower limbs.

In this project, genetic linkage analysis of the chromosome 7q34-q36 disease

interval was carried out in the original family (CMT54) and 20 smaller families

from an Australian dHMN cohort. Fine mapping in family CMT54, including

unaffected individuals suggested a minimum probable candidate interval

of 6.92 Mb, flanked by markers D7S615 and D7S2546 within the 12.98 Mb

critical disease interval. Of the additional dHMN families, one (family CMT44)

achieved suggestive linkage to the chromosome 7q34-q36 disease locus with a

LOD score of 2.02.

Mutation screening was carried out in family CMT54 at the chromosome

7q34-q36 locus. The 12.9 Mb disease interval contains 89 annotated protein-

coding genes, of which 60 lay within the prioritised 6.92 Mb interval. A combi-

nation of methods was used to screen these genes for a putative pathogenic

mutation. Functional candidate genes were identified via a literature and

database search. The coding exons of 35 prioritised candidate genes were

sequenced and no pathogenic mutation was identified. Cytogenetic analysis

excluded large scale chromosomal abnormalities. Array based comparative

ii

genomic hybridisation of the 7q34-q36 interval in patients did not identify any

pathogenic duplications or deletions.

Next generation sequencing (NGS) techniques were used to identify se-

quence variants within the remaining genes within the 7q34-q36 interval and

elsewhere in the genome. Two NGS based approaches were applied to muta-

tion screening in family CMT54. Initially, the chromosome 7q34-q36 disease

interval was analysed in one affected individual using a custom designed DNA

capture microarray and 454 GS FLX (Roche) sequencing. Approximately 80%

of patient coding exons were captured, sequenced and no pathogenic muta-

tions were identified. The chromosome 7q34-q36 target captured DNA sample

was also re-sequenced along with an additional two affected individuals and

one unaffected parent using exome capture and Solexa (Illumina) sequencing.

Combined, 99.5% of coding exons were sequenced in the chromosome 7q34-

q36 interval and all sequence variants that were identified were excluded from

a pathogenic role. Sequence variants identified elsewhere in the exome were

also excluded from a pathogenic role.

Exome sequencing of dHMN family CMT44 did not identify any putative

pathogenic mutation at the chromosome 7q34-q36 locus. The exomes of four

affected and one unaffected individuals were sequenced. Exome wide analysis

identified a potential digenic inheritance in CMT44 of a previously published

MFN2 mutation causing a mild CMT2 phenotype and a second mutation

causing a dHMN phenotype. Potential candidate mutations for dHMN were

identified in two genes, PCDHGA4 and DNAH11. PCDHGA4, was previously

shown to function in the brain and spinal cord, and deletion of PCDHG genes

in a mouse model causes a severe neurodegenerative phenotype.

The gene mutation causing dHMN that maps to chromosome 7q34-q36

remains to be identified. The disease mutation may lie in a coding region

not captured by current exome platforms, a non-coding region, or the muta-

tion may cause disease through an alternate mechanism not detected by the

iii

methods employed in this thesis.Future studies should concentrate on tran-

scriptome analysis by next-gen RNA sequencing, which may identify unknown

transcripts and exons that map to chromosome 7q34-q36 or highlight sequence

variants located in regulatory elements.

Identification of new gene mutations is critical to further understanding the

biochemical and cellular processes underlying dHMN. Although the causative

mutation for dHMN on 7q34-q36 was not identified, a significant proportion

of the disease interval has been excluded using a combination of traditional

and new technologies.

The purpose of this thesis is to identify new gene mutations causing dHMN.

The genetic and functional data presented here suggest this will be a difficult

task; the genetic heterogeneity complicates genetic analysis and the multiple

molecular mechanisms implicated to date make it difficult to pinpoint specific

candidate genes. The identification of additional genes and genetic modifiers

is necessary to increase our understanding of the disease mechanisms caus-

ing dHMN and related neuropathies. This will directly aid in the diagnosis

and classification of these neurodegenerative diseases and may lead to new

therapeutics and treatment strategies.

iv

L I S T O F F I G U R E S

Figure 1.1 Eleven causative genes described to date for dHMN. . . 4Figure 3.1 Original pedigree CMT54 and haplotype analysis at chr7q34-

q36 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46Figure 3.2 Physical map of the chr7q34-q36 disease locus . . . . . . 49Figure 3.3 Fine mapping of the proximal end of the minimum prob-

able candidate interval in CMT54. . . . . . . . . . . . . . 55Figure 3.4 Detail of fine mapping of the distal end of minimum

probable candidate interval in family CMT54 . . . . . . . 57Figure 3.5 Updated pedigree CMT54 and haplotype analysis . . . . 59Figure 3.6 CMT54 multi-point LOD plot at 7q34-q36 . . . . . . . . . 61Figure 3.7 Pedigree CMT44 haplotype analysis . . . . . . . . . . . . 67Figure 3.8 CMT44 multi-point LOD plot. . . . . . . . . . . . . . . . . 69Figure 4.1 Genes mapping to the 7q34-q36 disease interval . . . . . 95Figure 4.2 Location of variant CNTNAP2:c.3476-15C>A and adja-

cent exons. . . . . . . . . . . . . . . . . . . . . . . . . . . . 104Figure 4.3 PCR amplification of cDNA flanking the CNTNAP2:c3476-

15C>A variant . . . . . . . . . . . . . . . . . . . . . . . . . 105Figure 4.4 Genes in the interval defined by significantly associated

haplotypes . . . . . . . . . . . . . . . . . . . . . . . . . . . 106Figure 5.1 Forms of CNV and structural variation at a chromosomal

or molecular scale. . . . . . . . . . . . . . . . . . . . . . . 111Figure 5.2 Array based CGH analysis. . . . . . . . . . . . . . . . . . 115Figure 5.3 Direct method for aCGH. . . . . . . . . . . . . . . . . . . 117Figure 5.4 The proportion of CNV calls within each size range. . . 120Figure 5.5 Array CGH analysis in CMT54. . . . . . . . . . . . . . . . 121Figure 5.6 CNV probe plot around deletion A. . . . . . . . . . . . . 122Figure 5.7 UCSC genome browser views of deletion A. . . . . . . . 123Figure 5.8 CNV probe plot around deletion B. . . . . . . . . . . . . 126Figure 5.9 UCSC genome browser views of deletion B. . . . . . . . 127Figure 5.10 CNV probe plot around deletion C. . . . . . . . . . . . . 128Figure 5.11 UCSC genome browser views of deletion C. . . . . . . . 129Figure 5.12 PCR Genotyping of control individuals for deletion C. . 129Figure 5.13 CNV probe plot around deletion D. . . . . . . . . . . . . 131Figure 5.14 UCSC genome browser views of deletion D. . . . . . . . 132Figure 6.1 Clonal amplification of immobilised template DNA in NGS136Figure 6.2 Filtering strategy for variants identified by targeted cap-

ture sequencing of 7q34-q36. . . . . . . . . . . . . . . . . 148Figure 6.3 454 sequencing contigs at the ACCN3 gene. . . . . . . . . 152Figure 6.4 7q34-q36 sequencing coverage AGRF sequencing . . . . 153

v

Figure 6.5 Recent segmental duplications at the chromosome 7q34-q36 disease locus. . . . . . . . . . . . . . . . . . . . . . . . 155

Figure 6.6 Sequencing coverage comparison between three exomecapture and one target region capture samples. . . . . . 160

Figure 7.1 Updated pedigree for family CMT44 . . . . . . . . . . . . 175Figure 7.2 HRM genotyping analysis of the MFN2 mutation identi-

fied in CMT44. . . . . . . . . . . . . . . . . . . . . . . . . 178Figure 7.3 Updated comparison of haplotypes between CMT44 and

CMT54 using SNP data from NGS. . . . . . . . . . . . . . 180Figure 7.4 Organisation of the protocadherin gene cluster on chro-

mosome 5p31. . . . . . . . . . . . . . . . . . . . . . . . . . 183Figure C.1 Pedigree CMT484 with haplotype analysis at 7q34-q36 . 217Figure C.2 Pedigree CMT274 with haplotype analysis at 7q34-q36 . 217Figure C.3 Pedigree CMT609 with haplotype analysis at 7q34-q36 . 218Figure C.4 Pedigree CMT677 with haplotype analysis at 7q34-q36 . 219Figure C.5 Pedigree CMT703 with haplotype analysis at 7q34-q36 . 219Figure C.6 Pedigree CMT131 with haplotype analysis at 7q34-q36 . 220Figure C.7 Pedigree CMT273 with haplotype analysis at 7q34-q36 . 220Figure C.8 Pedigree CMT333 with haplotype analysis at 7q34-q36 . 221Figure C.9 Pedigree CMT701 with haplotype analysis at 7q34-q36 . 222Figure C.10 Pedigree CMT260 with haplotype analysis at 7q34-q36 . 223Figure C.11 Pedigree CMT618 with haplotype analysis at 7q34-q36 . 224Figure C.12 Pedigree CMT477 with haplotype analysis at 7q34-q36 . 225Figure C.13 Pedigree CMT748 with haplotype analysis at 7q34-q36 . 226Figure C.14 Pedigree CMT779 with haplotype analysis at 7q34-q36 . 226Figure C.15 Pedigree CMT808 with haplotype analysis at 7q34-q36 . 226Figure C.16 Pedigree CMT375 with haplotype analysis at 7q34-q36 . 227Figure C.17 Pedigree CMT102 with haplotype analysis at 7q34-q36 . 227Figure C.18 Pedigree CMT724 with haplotype analysis at 7q34-q36 . 228Figure C.19 Pedigree CMT331 with haplotype analysis at 7q34-q36 . 229Figure E.1 Multi-point LOD plot for family CMT609. . . . . . . . . 240Figure E.2 Multi-point LOD plot for family CMT274. . . . . . . . . 241Figure E.3 Multi-point LOD plot for family CMT677. . . . . . . . . 241Figure E.4 Multi-point LOD plot for family CMT484. . . . . . . . . 242Figure E.5 Multi-point LOD plot for family CMT703. . . . . . . . . 242Figure E.6 Multi-point LOD plot for family CMT131. . . . . . . . . 243Figure E.7 Multi-point LOD plot for family CMT333. . . . . . . . . 243Figure E.8 Multi-point LOD plot for family CMT273. . . . . . . . . 244Figure E.9 Multi-point LOD plot for family CMT260. . . . . . . . . 244Figure E.10 Multi-point LOD plot for family CMT701 . . . . . . . . . 245Figure E.11 Multi-point LOD plot for family CMT618. . . . . . . . . 245Figure E.12 Multi-point LOD plot for family CMT724. . . . . . . . . 246Figure E.13 Multi-point LOD plot for family CMT331. . . . . . . . . 246Figure E.14 Multi-point LOD plot for family CMT477. . . . . . . . . 247Figure E.15 Multi-point LOD plot for family CMT779. . . . . . . . . 247

vi

Figure E.16 Multi-point LOD plot for family CMT748. . . . . . . . . 248Figure E.17 Multi-point LOD plot for family CMT808. . . . . . . . . 248

vii

L I S T O F TA B L E S

Table 1.1 OMIM classification of dHMN . . . . . . . . . . . . . . . 25Table 3.1 Previous two-point LOD scores in CMT54 . . . . . . . . 47Table 3.2 New STR markers for fine mapping . . . . . . . . . . . . 56Table 3.3 CMT54 two-point LOD scores between the dHMN locus

and microsatellite markers at 7q34-q36 . . . . . . . . . . 57Table 3.4 Sample and clinical information for 20 additional dHMN

families. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63Table 3.5 Simulated two-point LOD scores for additional dHMN

families at θ = 0.00 . . . . . . . . . . . . . . . . . . . . . . 64Table 3.6 CMT44 two-point LOD scores between the dHMN locus

and microsatellite markers on 7q34-q36 . . . . . . . . . . 68Table 3.7 Comparison of haplotypes between CMT54 and addi-

tional dHMN families . . . . . . . . . . . . . . . . . . . . 77Table 3.8 Haplotypes used in association analysis. . . . . . . . . . 78Table 3.9 Association analysis of haplotypes with dHMN using

Fisher’s exact test . . . . . . . . . . . . . . . . . . . . . . . 79Table 4.1 Candidate gene prioritisation ranking using GeneWan-

derer and Endeavour. . . . . . . . . . . . . . . . . . . . . . 99Table 4.2 DNA variants identified in genes screened. All mutations

found in affected individuals sequenced including knownSNPs. Allele frequency. When both controls are homozy-gous, allele freq will be 0% or 100%. 100% indicates theyare homozygous for the alternate SNP allele. Where rsnumbers are indicated, alleles identified in sequencingmatched those reported in dbSNP. . . . . . . . . . . . . . 101

Table 6.1 Summary statistics for targeted sequencing of CMT54family member III:14. . . . . . . . . . . . . . . . . . . . . . 147

Table 6.2 Novel non-synonymous variants identified by targetedcapture and sequencing of 7q34-q36. . . . . . . . . . . . . 149

Table 6.3 Novel sequence variants possibly affecting splicing iden-tified within the 7q34-q36 interval. . . . . . . . . . . . . . 150

Table 6.4 Novel variants within the 5’ and 3’ UTR of known genes. 151Table 6.5 Summary of exome and chromosome 7q34-q36 targeted

sequencing using Solexa platform. . . . . . . . . . . . . . 157Table 6.6 Sequence variants identified by exome capture and BGI

sequencing . . . . . . . . . . . . . . . . . . . . . . . . . . . 158Table 6.7 Novel sequence variants identified with exome sequencing.158Table 6.8 Segregation analysis of exome sequencing identified vari-

ants assuming unequal sequencing coverage. . . . . . . . 162

viii

Table 6.9 Markers with two-point LOD scores > 1.5 in a genomewide linkage analysis. . . . . . . . . . . . . . . . . . . . . 163

Table 6.10 Sequence variants predicted to be deleterious or affectingconserved sequences. . . . . . . . . . . . . . . . . . . . . . 164

Table 7.1 Summary of exome sequencing in CMT44. . . . . . . . . 174Table 7.2 Sequence variants identified by exome sequencing in

CMT44 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174Table B.1 PCR primers used in gene screening. . . . . . . . . . . . 205Table B.2 PCR primers for microsatellite markers. . . . . . . . . . . 214Table B.3 PCR primers for RT-PCR. . . . . . . . . . . . . . . . . . . 214Table B.4 PCR primers for HRM analysis. . . . . . . . . . . . . . . . 214Table B.5 PCR primers for validation of CNV. . . . . . . . . . . . . 214Table B.6 PCR primers for validation of variants identified through

NGS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215Table D.1 Two-point linkage analysis in family CMT274 for 7q34-

q36 markers. . . . . . . . . . . . . . . . . . . . . . . . . . . 230Table D.2 Two-point linkage analysis in family CMT609 for 7q34-

q36 markers. . . . . . . . . . . . . . . . . . . . . . . . . . . 231Table D.3 Two-point linkage analysis in family CMT333 for 7q34-

q36 markers. . . . . . . . . . . . . . . . . . . . . . . . . . . 231Table D.4 Two-point linkage analysis in family CMT484 for 7q34-

q36 markers. . . . . . . . . . . . . . . . . . . . . . . . . . . 232Table D.5 Pedigree CMT677 two-point LOD scores between the

dHMN locus and microsatellite markers on chromosome7q34-q36. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232

Table D.6 Two-point linkage analysis in family CMT703 for 7q34-q36 markers. . . . . . . . . . . . . . . . . . . . . . . . . . . 233

Table D.7 Two-point linkage analysis in family CMT131 for 7q34-q36 markers. . . . . . . . . . . . . . . . . . . . . . . . . . . 233

Table D.8 Two-point linkage analysis in family CMT273 for 7q34-q36 markers. . . . . . . . . . . . . . . . . . . . . . . . . . . 234

Table D.9 Two-point linkage analysis in family CMT808 for 7q34-q36 markers. . . . . . . . . . . . . . . . . . . . . . . . . . . 235

Table D.10 Two-point linkage analysis in family CMT260 for 7q34-q36 markers. . . . . . . . . . . . . . . . . . . . . . . . . . . 235

Table D.11 Two-point linkage analysis in family CMT779 for 7q34-q36 markers. . . . . . . . . . . . . . . . . . . . . . . . . . . 236

Table D.12 Two-point linkage analysis in family CMT477 for 7q34-q36 markers. . . . . . . . . . . . . . . . . . . . . . . . . . . 236

Table D.13 Two-point linkage analysis in family CMT331 for 7q34-q36 markers. . . . . . . . . . . . . . . . . . . . . . . . . . . 237

Table D.14 Two-point linkage analysis in family CMT724 for 7q34-q36 markers. . . . . . . . . . . . . . . . . . . . . . . . . . . 237

Table D.15 Two-point linkage analysis in family CMT618 for 7q34-q36 markers. . . . . . . . . . . . . . . . . . . . . . . . . . . 238

ix

Table D.16 Two-point linkage analysis in family CMT748 for 7q34-q36 markers. . . . . . . . . . . . . . . . . . . . . . . . . . . 238

Table D.17 Two-point linkage analysis in family CMT701 for 7q34-q36 markers. . . . . . . . . . . . . . . . . . . . . . . . . . . 239

Table D.18 Two-point linkage analysis in family CMT102 for 7q34-q36 markers. . . . . . . . . . . . . . . . . . . . . . . . . . . 239

Table D.19 Two-point linkage analysis in family CMT375 for 7q34-q36 markers. . . . . . . . . . . . . . . . . . . . . . . . . . . 239

Table F.1 Healthy control haplotypes for association analysis . . . 249Table G.1 Candidate genes within the minimum probable candidate

interval on 7q34-q36 . . . . . . . . . . . . . . . . . . . . . 251Table G.2 Candidate genes outside the minimum probable candi-

date interval on 7q34-q36 . . . . . . . . . . . . . . . . . . 253Table G.3 Expression pattern of genes within the chromosome 7q34-

q36 dHMN1 disease region. . . . . . . . . . . . . . . . . . 254Table H.1 Sequencing gaps in 454 sequencing of 7q34-q36 . . . . . 255Table H.2 Analysis of variants within chromosomal translocations

on 7q34-q36 . . . . . . . . . . . . . . . . . . . . . . . . . . 256Table I.1 Gains and losses identified in control individual II:7 using

aCGH. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257Table I.2 Gains and losses identified in affected individual III:9

using aCGH. . . . . . . . . . . . . . . . . . . . . . . . . . . 258Table I.3 Gains and losses identified in affected individual III:12

using aCGH. . . . . . . . . . . . . . . . . . . . . . . . . . . 259Table I.4 Gains and losses identified in affected individual II:1

using aCGH. . . . . . . . . . . . . . . . . . . . . . . . . . . 260

x

A C R O N Y M S A N D A B B R E V I AT I O N S

ACRF Australian Cancer Research Foundation

aCGH array comparative genomic hybridisation

AD autosomal dominant

ADLD autosomal dominant leukodystrophy

AGE agarose gel electrophoresis

AGRF Australian Genome Research Foundation

ALS amyotrophic lateral sclerosis

AR autosomal recessive

ASSP Alternative Splice Site Predictor

BAsH GNU Bourne-Again SHell

bp base pair

CCDS consensus coding sequence

CHOP Children’s Hospital of Philadelphia

cM centimorgan

CMT Charcot-Marie-Tooth

CMT1A CMT type 1A

CNV copy number variation

ddNTP dideoxynucleotide

dHMN distal hereditary motor neuropathy

DNA deoxyribonucleic acid

DSMA distal spinal muscular atrophy

ESTs expressed sequence tags

FAM 6-carboxyfluorescein

FTD frontotemporal dementia

GWAS genome-wide association studies

xi

HNPP hereditary neuropathy with liability to pressure palsies

HSF Human Splicing Finder

IBD identical by descent

IBS identical by state

indel insertion or deletion

kb kilobase

LOD logarithm of the odds

MAF minor allele frequency

Mb megabase

microsatellite short tandem repeat

NCS nerve conduction studies

NGS next generation sequencing

NNSplice Splice Site Prediction by Neural Network

NS non-synonymous

OMIM Online Mendelian Inheritance in Man

PCR polymerase chain reaction

PCDHγ protocadherin-gamma

PPI protein-protein interaction

QC quality control

Rw rump white

SCA spinocerebellar ataxia

SD segmental duplication

small HSP small heat shock protein

SMA spinal muscular atrophy

SNP single nucleotide polymorphism

snRNP small nuclear ribonucleoprotein

SNV single nucleotide variation

SS splice site

xii

ssDNA single-stranded DNA

STS sequence-tagged site

WGA whole-genome association

Wt wild type

xiii

P U B L I C AT I O N S

papers

• Drew AP, Blair IP, Nicholson GA. “Molecular genetics and mechanisms of

disease in distal hereditary motor neuropathies: insights directing future genetic

studies”. Curr Mol Med. 2011 Nov;11(8):650-65.

abstracts

Presentations

• Drew AP, “Identifying a gene for distal motor neuron disease”, Concord

clinical week 2008, Seminar series finalist.

Posters

• Williams K, Solski J, Drew A, Albulym O, Durnall J, Thoeng A, Thomas

V, Warraich S, Crawford J, Rouleau G, Nicholson G, Blair I, “Exome

sequencing in ALS and other motor neuron diseases”, 22nd International

Symposium on ALS/MND, Sydney 2011.

• Williams K, Drew A, Solski J, Durnall J, Thoeng A, Albulym O, Warraich

S, Thomas V, Crawford J, Rouleau G, Nicholson G, Blair I, “Investigating

the genetic basis of ALS and other motor neuron diseases: Analysis of known

genes and search for new loci”, 21st International Symposium on ALS/MND,

Orlando 2010.

• Drew AP, Blair IP , Williams KL Nicholson GA, “Genetic analysis of

distal motor neuron disease”, ANS satellite meeting motor neuron disease

symposium 2010.

• Drew AP, Blair IP , Williams KL Nicholson GA, “Positional cloning of a

gene for distal motor neuron disease on 7q34-q36”, ASHG annual conference

Honolulu Oct 2009.

xiv

• Drew AP, Blair IP , Williams KL Nicholson GA, “Molecular genetic studies

of distal motor neuron disease”, ASMR NSW scientific meeting 2009.

• Drew AP, Blair IP , Williams KL Nicholson GA, “Molecular genetic studies

of distal motor neuron disease”, From cell to society 6 2008.

• Drew AP, Blair IP, Durnall JC, Gopinath S, Kennerson ML, Nicholson GA,

“Finding new genes causing motor neuron disease”, Lorne genome conference

2008.

xv

1L I T E R AT U R E R E V I E W

1.1 molecular genetics and mechanisms of disease in distal

hereditary motor neuropathies

The distal hereditary motor neuropathies (dHMNs) are a clinically and genet-

ically heterogeneous group of disorders that superficially resemble Charcot-

Marie-Tooth (CMT) neuropathy, where motor neurons are primarily affected

with an absence of significant sensory involvement. Distal HMN is also known

as distal spinal muscular atrophy (DSMA) or the spinal form of CMT (spinal

CMT). Distal HMN is distinct from the other peroneal muscular atrophies,

the combined motor and sensory neuropathies (hereditary motor and sensory

neuropathy; HMSN including CMT1 and CMT2) or exclusively sensory neu-

ropathy (hereditary sensory neuropathy; HSN). Symptoms caused by motor

neuron loss include weakness and wasting of muscles controlled by the af-

fected nerves, leading to loss of function of those muscles. The nerves affected

varies between types of dHMN. Sensory neuron loss in related diseases results

in loss of sensation and proprioception, while pain nerves generally remain

unaffected. This can lead to over use of affected limbs. The original classi-

fication of dHMN was based on the system proposed by Harding [1] with

seven subtypes defined according to; mode of inheritance, age at onset and

additional complicating features. Several additional types of dHMN have been

described, necessitating additional classifications (see Table 1.1). The molecular

genetic characterisation of dHMN has further delineated the classification of

these disorders and has highlighted some overlap between phenotypic forms

1

1 literature review 2

of dHMN. There are also rare sporadic cases which can be attributed, at least

in part, to de-novo mutations in dominant dHMN genes.

Several new genes with dHMN mutations have recently been identified,

confirming the genetic heterogeneity of dHMN. As shown in Table 1.1, there

are now 12 causative genes described for dHMN, and an additional five genetic

loci where the gene remains to be identified. Some of these mutated genes

are common between dHMN and CMT2. This review examines the growing

number of identified dHMN genes, discusses recent insights into the functions

of these genes and possible pathogenic mechanisms, and looks at the increasing

overlap between dHMN and other neuropathies including CMT2 and spinal

muscular atrophy (SMA).

1.1.1 Small heat shock protein family

Missense mutations in three genes from the small heat shock protein (small

HSP) family, heat shock 27kDa protein 1 (HSPB1), heat shock 22kDa protein

8 (HSPB8) and heat shock 27kDa protein 3 (HSPB3), cause both dHMN2 and

CMT2 [2–4]. Classically, dHMN2 is characterised by adult onset weakness and

wasting, predominantly in the distal muscles of the lower limbs, with a rapid

progression resulting in paralysis of the lower extremities within 5-10 years

[5]. However, there can be a broad clinical phenotype including some cases

with later bilateral hand weakness and minor sensory involvement. Also, a

more severe earlier onset phenotype, similar to amyotrophic lateral sclerosis

4 (ALS4; Section 1.1.4), has been observed with a specific HSPB1 mutation

[6]. Unrelated dHMN2 and CMT2 families have been reported with identical

mutations in HSPB1 or HSPB8, demonstrating that there is significant genetic

overlap between these disorders [2, 7–9].

1 literature review 3

The small HSPs are encoded by eleven genes, HSPB1 to HSPB11 [10]. The

three small HSPs involved in dHMN, HSPB1, HSPB8 and HSPB3 all contain

the highly conserved α-crystallin domain, characteristic of small HSPs [11].

The α-crystallin domain is flanked by a variable N-terminal and in HSPB1 and

HSPB8 a short C-terminal. The expression of the small HSP genes is variable

between tissue types, with expression of HSPB1, HSPB5, HSPB6, HSPB7 and

HSPB8 in spinal cord [12–14]. Both HSPB1 and HSPB8 are expressed in motor

neurons, with HSPB1 showing high expression [3, 15, 16].

In dHMN2, two missense mutations with autosomal dominant (AD) inheri-

tance have been identified in HSPB8, both affecting the same amino acid [3].

Ten mutations have been described in HSPB1 in dHMN and CMT2F families,

nine of which show AD inheritance and one with homozygous autosomal

recessive (AR) inheritance [2, 17, 18]. Most recently a mutation in HSPB3

was identified in a dHMN family [4]. The majority of the HSPB1 mutations

and both HSPB8 mutations are located in the conserved α-crystallin domain

(HSPB1; L99M, R127W, S135F, R136W, R140G and T151I. HSPB8; K141N and

K141E), with two mutations in the C-terminal of HSPB1 affecting the same

amino acid (P182L and P182S) and three mutations in the N-terminal region

(HSPB1; P39L and G84R. HSPB3; R7S). The G84R mutation is adjacent to the

α-crystallin domain and lies in a conserved phosphorylation recognition motif

(Figure 1.1 A).

Under normal cellular conditions, small HSPs freely interact to form large

oligomeric complexes containing several small HSP family members, leaving

only low levels of un-phosphorylated monomer [23, 24]. Under conditions

of stress, small HSPs are phosphorylated, triggering the large oligomers to

dissociate into smaller oligomers, dimers, and monomers [25, 26]. These phos-

phorylated active proteins act as molecular chaperones to help stabilise cellular

proteins, preventing protein aggregation and facilitating substrate refolding

in conjunction with other molecular chaperones [27]. In addition to their role

Figure 1.1: Eleven causative genes described to-date for dHMN. Protein domainsare drawn to scale. Mutations are numbered relative to the start ATG (codon 1)and the location is shown as vertical lines. The scale for each gene is indicated. (A)Small HSP family members HSPB1, HSPB8 and HSPB3. Most mutations lie withinthe α-crystallin domain (blue). (B) DCTN1, the G59S mutation lies within the CAP-gly domain (microtubule-binding) (orange). (C) IGHMBP2 and SETX are the twoUPF1-like helicases identified in dHMN. Most missense mutations in IGHMBP2 arelocated in the Superfamily I DNA and RNA helicase domain (green), whereas nullmutants (not shown) are distributed across the gene. Of the three reported SETXmutations, one lies in the Superfamily I DNA and RNA helicase domain and theremaining two lie in a putative N-terminal protein-protein interaction domain (aqua)[19]. (D) GARS mutations E71G, L129P, G240R, I280F, H418R D500N and G526R affectthe catalytic domain (red). S581L and G598L lie within the tRNA binding domain(blue). E71G also interacts with the tRNA binding domain [20]. (E) BSCL2 mutationsN88S and S90L both affect an N-glycosylation motif. (F) ATP7A mutations are locatedwithin or adjacent to transmembrane domains (purple). (G) The PLEKHG5 mutationis located within the pleckstrin-homology domain (yellow). (H) All TRPV4 mutationsare located within the ankyrn repeat domain (brown) as part of ANK4 and ANK5 [21].(I) DYNC1H1 mutations are localised to the dynein binding residues (brown). Figuregenerated with FancyGene [22].

1 literature review 4

3'UTR

200bp

ankyrn repeat domaintransmembrane domains

pore

R269HR269C

R315WR316C

TRPV4ANK4 ANK5

1000bp

motor domain

H306R Y970CI584LK671E

DYNC1H1stem domain

ATPaseATPase ATPase ATPase ATPaseATPase ATPasedimerisation residues

PLEKHG55'UTR

200bp

RhoGEF PH domain coiled coil

F647S

ATP7ACu#3Cu#2Cu#1 Cu#4 Cu#5 Cu#6

Copper binding sites

Phosphatase Phosphorylation ATP binding

3'UTR

T994I P1386S500bp

BSCL25'UTR

transmembrane transmembraneN-glycosylation motif

N88SS90L

100bp

3'UTR

GARS tRNA binding domainWHEP-TRS Catalytic Catalytic

200bp E71G L129P I280FG240R H418R D500NG526R

S581LG598A

3'UTR5'UTR

SETX

3'UTR

Superfamily I DNA and RNA helicase domain

T3I L389S R2136H1000bp

IGHMBP2

I Ia II III IV V VImotifs

L17P G86_A355del

Putative protein interaction domain

A

B

C

D

E

F

G

H

I

D974ED565NL472PL426P

W386RL364PL361P

E334K

P216LH213R

Q196A

Superfamily I DNA and RNA helicase domain

500bp

R3H Znf-AN1I Ia II III IV V VI

L192PC241RL125P

T221A

E382KG354S

H445PE514K

T493IR603C

R637C

R603HK572del L577P V580I R581S R583I G586C

NLS

DCTN1microtubule-binding domain dynein-binding domain actin-related protein-binding domain

3'UTR

G59S500bp

5'UTR

3'UTR

P39L G84R L99M R127WS135FR136W

T151L

R140G

P182LP182S

K141NK141E

R7S

HSPB1

HSPB8

HSPB3

5'UTR

3'UTR5'UTR

5'UTR

phosphorylation

3'UTR

100bp

α-crystallin domain

motifs

1 literature review 5

as chaperones, small HSPs are also involved in a diverse range of cellular

processes including, suppression of apoptosis, regulation of cytoskeleton dy-

namics, protection from oxidative stress, modulation of inflammation and

RNA processing [28–33].

With the diverse range of cellular activities of small HSPs, the definitive

mechanism through which mutations in HSPB1, HSPB3 and HSPB8 cause

dHMN remains unclear. The variability of clinical phenotypes caused by

these mutations further confuses their role. However, recent observations of

mutant small HSPs in disease models and characterisation of the function

of these proteins are providing insight into the processes involved in these

neurodegenerative disorders.

Protein aggregates are seen in many neurodegenerative diseases but it is

unclear whether these are a cause or consequence of the disease [34]. In

neuronal cell culture, expression of HSPB1 and HSPB8 carrying dHMN2

mutations leads to formation of insoluble aggregates in the cell body [2, 3, 35].

Expression of mutant HSPB8 leads to formation of aggregates containing

both HSPB1 and HSPB8 with increased interaction between the two proteins

[3]. In addition to aggregation in the cell body, the HSPB1 P182L mutation

prevented the correct transportation of both mutant and wild type (Wt) HSPB1

down neuronal processes [35]. Furthermore, aggregates were associated with

disrupted neurofilament assembly. These aggregates contained p150 dynactin

and neurofilament middle chain subunit (NFM) but not other cellular cargoes

actively transported in axons [2, 35]. Disrupted assembly and aggregation

of NF is also seen in CMT2E caused by mutations in the neurofilament-

light subunit (NFL) [36, 37]. In cultured primary mouse motor neurons, Zhai

et al [38] showed that both NFL and HSPB1 mutations caused progressive

degeneration and loss of motor neurons. Co-expression of mutant NFL with

Wt HSPB1 reduced the NF disruption and aggregation caused by mutant

NFL. Furthermore, in NFL-null motor neurons expressing mutant HSPB1,

1 literature review 6

no aggregation of NFL or NFM is observed, indicating that HSPB1 interacts

specifically with NFL [38].

In contrast to the cell culture models of the HSPB1, HSPB8 and NFL mutants,

analysis of mutant HSPB8 in primary motor neuron cultures did not result

in the formation of aggregates, but still resulted in a disruption of axonal

transport. The cultured primary motor-neurons had shortened neurites with

spheroids and end bulbs, resembling Wallerian degeneration, likely a result of

axon retraction rather than lack of outgrowth [39]. Spheroids and endbulbs

are filled with organelles and disorganised cytoskeleton components [40]. The

increase in apoptotic activity seen in injured neurons [41] was not observed

in the HSPB1 mutants, therefore, it is unlikely that the anti-apoptotic role of

HSPB1 influences the initial pathogenesis of dHMN [39].

HSPB1 also interacts with actin microfilaments both as a chaperone un-

der stress conditions and in the regulation of actin filament dynamics. As a

chaperone, phosphorylated HSPB1 monomers coat actin filaments, prevent-

ing the action of actin severing proteins activated by the stress response and

aiding the recovery of actin filaments after stress [27, 42, 43]. Under normal

conditions HSPB1 functions in a regulatory role, with un-phosphorylated

HSPB1 monomers preventing the polymerisation of actin, promoting a pool

of soluble actin subunits [44]. In motor neurons, actin microfilaments are the

main components of the plasma membrane cytoskeleton and are enriched in

presynaptic terminals, dendritic spines and growth cones [45]. The interaction

of mutant HSPB1 with actin microfilaments in motor neuron disease remains

to be investigated.

These studies suggest that disruption of the neurofilament network is a com-

mon triggering event for motor neuron degeneration in dHMN2 and CMT2E/F,

perhaps by disruption of axonal transport rather than direct neurotoxic effects

of protein aggregates.

1 literature review 7

As small HSPs interact with the cytoskeleton as chaperones under stress

conditions and a loss of chaperone activity is associated with other chaper-

onopathies [46], the influence of the mutations on the chaperone activity of

HSPB1 and HSPB8 have been investigated. In vivo analysis of HSPB1 mutations

in the α-crystallin and C-terminal domains showed there was no reduction in

the chaperone activity of HSPB1, some mutations even resulted in an increase

in chaperone activity. This was associated with increased levels of dimer to

monomer conversion of mutant proteins, with normal levels of HSPB1 dimer

maintained by Wt protein. All HSPB1 mutants examined showed enhanced

binding to target proteins, overall suggesting increased protein interaction

activity is involved in dHMN and CMT2 [26]. This supports previous analysis

showing decreased HSPB1 self-association correlates with increased chaperone

function [27]. Conversely, the chaperone activity of K141E HSPB8 mutant was

decreased, with a significant reduction seen at lower HSPB8 concentrations.

Interestingly, with specific substrates, mutant HSPB8 was active earlier than

Wt, but with a lower overall activity [47]. The K141N/E HSPB8 mutations

affect the equivalent residue mutated in HSPB4 in AD congenital cataract

[3, 48]. Mutation of this residue resulted in a similar reduction of chaperone

activity [49]. Unfortunately, comparisons are difficult because the analysis of

chaperone activity for HSPB1 and HSPB8 used different assays. The concentra-

tion of protein, the time point examined and the substrate selected all influence

the measured chaperone activity.

Recently, both HSPB1 and HSPB8 have been shown to interact in RNA

processing pathways. As RNA processing defects are postulated to be a major

factor in neurodegeneration [50, 51] these functions of HSPB1 and HSPB8

should be considered as a possible part of dHMN pathogenesis. Both disease

causing HSPB8 mutants lead to abnormally increased binding of HSPB8 to the

RNA helicase Ddx20 (gemin-3), which interacts with the survival-of-motor-

neurons protein (SMN) [52]. Mutations in the SMN1 gene encoding SMN

1 literature review 8

protein cause SMA type 1 (SMA1; MIM#253300) [53]. With reduced SMN

mRNA and protein in patients with SMA1, it is postulated that insufficient

levels of SMN in motor neurons results in SMA1 [54, 55]. SMN is part of

a multi-protein complex (including gemin-3), involved in the assembly and

regeneration of small nuclear ribonucleoproteins (snRNPs) (RNA-protein com-

plexes that combine with unmodified pre-mRNA and various other proteins to

form a spliceosome) [56, 57] and microtubule associated transport of RNA [58].

The role of mutant SMN in SMA is not well understood, however in SMA mice,

SMN deficiency results in reduced axon growth in motor-neurons, correlating

with reduced β-actin mRNA and protein in distal axons and growth cones

[59], as well as a reduction of snRNP assembly and expression of a subset of

Gemin proteins in spinal cord [60]. The HSPB8/Ddx20 interaction involves

RNA suggesting a role for HSPB8 in ribonucleoprotein processing in conjunc-

tion with SMN [52]. Recently, HSPB1 was identified as a critical subunit of the

AUF1- and signal transduction- regulated complex (ASTRC) and is an AU-rich

element (ARE) binding protein [33]. AU-rich element mRNA degradation is the

process whereby mRNAs with AREs in their 3’ UTR are specifically targeted

for degradation [61, 62]. The association of HSPB1 with ASTRC may provide a

sensing mechanism for ARE mRNA degradation involved in the regulation

of transiently expressed proteins, including those involved in inflammatory

responses, cell proliferation and intracellular signalling [33].

In addition to missense mutations, a mutation in the highly conserved heat

shock element promoter region of HSPB1 was identified in a sporadic ALS

patient with atypically long disease duration. This mutation lead to in vitro

reduction of expression of HSPB1 under normal conditions and elimination

of the heat shock response in neuronal cells, suggesting an increase in cellu-

lar damage and toxicity resulting from a lack of heat-shock response, may

contribute to the pathogenesis of motor neuron degeneration [63].

1 literature review 9

In summary, it is likely that dominant toxic interactions involving mutant

HSPB1 and HSPB8 cause dHMN through disruption of axonal transport, a

pathogenic mechanism to which motor neurons are particularly vulnerable.

1.1.2 Dynactin 1 (DCTN1)

A missense mutation in the DCTN1 gene encoding the p150Glued subunit of

dynactin (p150Glued) was described in a dHMN7B (MIM#607641) family [64].

Distal HMN7B, also known as distal spinal and bulbar muscular atrophy is an

AD lower motor neuron disease presenting in the second or third decade of life,

with vocal fold paralysis leading to breathing difficulties, progressive facial

weakness, weakness and muscle atrophy of the hands and later involvement of

the lower extremities [64, 65]. Rare DCTN1 mutations have also been identified

in ALS and ALS with FTD [65–67]

Dynactin is a microtubule motor protein involved in mitosis and specialised

subcellular movement [68]. Dynactin is a large protein complex necessary for

dynein and kinesinII based microtubule movement. The p150Glued subunit is

particularly important for its role in binding dynactin to microtubules through

its microtubule binding domain. Dynein and other molecular motors bind

to dynactin on the arm of p150Glued adjacent to the microtubule binding

domain, with the remaining rod like structure involved in binding a variety of

membrane and protein structures [68]. The p150Glued subunit is ubiquitously

expressed and is highly enriched in neurons of the central nervous system

[69].

The G59S mutation reported in dHMN7B is located in the conserved CAP-

gly domain of p150Glued [64] (Figure 1.1 B). Mutations in the CAP-gly domain

of p150Glued were also identified in Perry syndrome (MIM#168605), a rapidly

progressive AD neurological disorder involving parkinsonism, weight loss,

1 literature review 10

depression with suicidal attempts and later nocturnal hypoventilation leading

to respiratory insufficiency and death within 2-10 years [70]. With no motor

neuron involvement, Perry syndrome and dHMN7B are distinct disorders.

Unlike the mutations in dHMN7B and Perry syndrome, the ALS DCTN1

mutations are outside the CAP-gly domain.

The G59S p150Glued dHMN mutant shows decreased binding to microtubules

and does not disrupt the binding of Wt p150Glued [64, 71]. Dynactin was

disrupted in a late onset progressive motor neuron disease phenotype mouse

model through over expression of the dynamitin (p50) subunit of dynactin,

causing inhibition of retrograde axonal transport [72]. This was reflected in the

immunohistochemical staining of a patient with dHMN7B (74yo) after autopsy

which showed a change in distribution of dynactin, dyenin and related proteins

from diffuse fine granular staining throughout the cell body and axons to more

intense, coarser and irregularly shaped granules and larger accumulations

within the cell body, proximal axon and distal neurites. Normal neurofilament

staining in motor neuron cell bodies and axons indicated that the defect was

with dynactin [73]. The G59S p150Glued dHMN mutant leads to formation of

inclusion bodies, and it has been suggested that a loss of stability promotes

aggregation [71].

The G59S mutation is located in a fold of the CAP-gly domain and is

predicted to distort the folding of the microtubule domain [64]. Recent experi-

ments with yeast mutants analogous to the G59S human mutation have helped

understand the mechanism underlying the dysfunction of dynactin-mediated

transport in motor neuron disease. The CAP-gly domain has a critical role

in the initiation and persistence of dynein-dependant movement, but is not

essential for dyenin based movement. It was speculated that the GAP-Gly

activity is necessary for overcoming high force/load thresholds of movement

[74].

1 literature review 11

Analysis of the p150Glued mutants in Perry’s syndrome indicate the mutated

CAP-gly domain is folded correctly but is less stable than Wt p150Glued. The

domains containing the Perry syndrome mutations bind to microtubules

but fail to bind to EB1, a microtubule plus-end tracking protein, unlike a

G59A mutant which retained EB1 binding ability. The G59A mutant was

analysed instead of the G59S mutant, which was not correctly expressed in

this model [75]. Levy et al. [71] showed that EB1 binding was reduced with

the G59S mutation but the stability of the dynactin complex was not affected.

Ahmed et al. [75] inferred that the steric clashes stemming from the G59S

mutation prevent correct protein folding under physiological conditions and

that p150Glued is more unstable in dHMN7B than in Perry’s syndrome.

A reduction in the stability of CAP-gly domain of p150Glued supports the

model of motor neuron specific degradation proposed by Levy et al [71] that

mutant DCTN1 causes decreased binding of p150Glued to microtubules and EB1

resulting in disrupted axonal transport. An aberrant self-association leading

to aggregates, specifically in neurons, leads to further impairment of axonal

transport.

1.1.3 Immunoglobulin mu binding protein 2 gene (IGHMBP2)

Mutations in the IGHMBP2 gene on chromosome 11q13 result in dHMN6, also

known as distal spinal muscular atrophy 1 (DSMA1; MIM#604320) [76]. To date,

over 50 novel AR mutations have been described, with both homozygous and

compound heterozygous inheritance patterns, including; missense, nonsense,

frameshift, in-frame deletions and intronic splice mutations [76–86]. In addition,

a large deletion and a complex mRNA rearrangement of IGHMBP2 have been

identified together with missense mutations in two DSMA1 cases [78, 79].

In DSMA1 there is a progressive degeneration of spinal motor and sensory

1 literature review 12

neurons, with axonal degeneration, abnormal myelin formation, and motor

end-plate degeneration leading to muscle paralysis and wasting [87]. Patients

present with rapidly progressive, life threatening respiratory distress caused by

paralysis of the diaphragm, and distal muscle weakness and wasting with foot

and hand deformities [77, 88]. Sensory nerves are also affected in later stages

of the disease. Presentation within the first 6 months is strongly indicative

of IGHMBP2 mutations, with only few cases with later juvenile onset of

respiratory distress identified with IGHMBP2 mutations [84, 85].

The IGHMBP2 gene encodes Immunoglobulin mu binding protein 2 (IGHMBP2)

a ubiquitously expressed protein, predominantly present in the cytoplasm,

axons and growth cones with a smaller nuclear localised fraction in spinal

motor neurons [89, 90]. IGHMBP2 is a UPF1-like helicase, a member of the

superfamily I (SF1) DNA/RNA helicases [91], containing the seven helicase

motifs distinctive of SF1 helicases [92], including the conserved Walker A and

B ATPases [93] (Figure 1.1 C). IGHMBP2 also contains a R3H motif [94] and

zinc finger AN1-like domain [95]. Mutations in a second UPF1-like helicase,

senataxin, cause the dHMN, ALS4 [96]. Recent insight into the cellular func-

tion of IGHMBP2 has indicated it is a ribosome associated protein with 5’-3’

helicase activity [97]. The catalytic activity of IGHMBP2 requires both helicase

binding and ATP hydrolysis [97]. More specifically, it is involved in transla-

tion of mRNA[98]. IGHMBP2 is co-localised with RNA processing machinery

[99] and interacts with related helicases, Reptin, Pontin and Abt1 as well as

TFIIIC220 a RNA polymerase III transcription factor involved in tRNA gene

transcription, ribosomal maturation and pre-ribosomal RNA processing and

maturation [98].

The missense mutations in the helicase domain affect the catalytic activity of

IGHMBP2 either directly through disruption of the helicase binding activity

or indirectly by loss of the ATPase activity powering the helicase [97]. Rarer

missense mutations outside the helicase domain do not affect the helicase

1 literature review 13

binding or ATP hydrolysis activities of IGHMBP2, but rather result in reduced

protein levels (50% reduction by the T493I allele), by a disruption of protein

stability or reduced transcription of the normal mRNA levels [85]. Similarly, in

a splicing mutant where exon 8 is skipped there is partial expression resulting

in 24.4 ± 6.9% of Wt IGHMBP2 mRNA levels [84]. In this case, reduction of the

exon 8 skipped mRNA was probably the result of nonsense mediated mRNA

decay. A second nonsense mutation (C496X) leads to a complete loss of mutant

mRNA rather than truncated protein [80].

Some evidence suggests a possible correlation between IGHMBP2 protein

levels and age of onset of respiratory symptoms, with infantile onset having

a lower IGHMBP2 protein level than patients with juvenile onset [85]. Impor-

tantly, 50-75% Wt protein levels were shown in carriers, therefore the critical

level of IGHMBP2 for not developing symptoms lies in the range of 25-50%

[85]. This is supported by studies of the nmd mouse, a model of SMA resulting

from a homozygous splicing mutation in IGHMBP2 that express ∼20% of

normal IGHMBP2 mRNA levels in all tissues. The Nmd mouse model is not a

100% null mutant but rather a hypomorphic allele [100].

The recessive inheritance of DSMA1 indicates that there is a loss of function

of IGHMBP2 involved in the disease. This is either through missense mutations

in the helicase domain affecting the catalytic activity of IGHMBP2 or through

reduced protein levels. Protein levels are not the only factor influencing the

disease outcome in the nmd mouse model. A protective allele has been mapped

to the Mnm locus in the nmd mouse which rescues the nmd phenotype [100].

The modifying factor does not restore IGHMBP2 protein levels, rather it

possibly compensates for lost IGHMBP2 function. Interestingly, the Mnm

locus fails to rescue the dilated cardiomyopathy phenotype seen later in nmd

mice after the neurodegeneration phenotype is rescued [101]. In addition,

dilated cardiomyopathy is also seen in nmd mice after neuronal restoration of

1 literature review 14

IGHMBP2 levels and rescue of nmd phenotype, suggesting a motor neuron

specific activity of the Mnm locus [101].

In summary, DSMA1 involves the loss of function of IGHMBP2 helicase

activity through disruption of protein activity by missense mutations or overall

reduced protein levels. The level of active protein is likely to correlate with

the rate of disease progression and age of onset. As IGHMBP2 protein levels

are equivalently reduced in neurons and other cell types in DSMA1, motor

neurons are apparently more sensitive to the reduction of IGHMBP2 than other

cell types, possibly due to their large size, high metabolic demands and precise

connectivity requirements [85].

1.1.4 Senataxin (SETX)

Mutations in the SETX gene encoding senataxin cause the rare AD distal

hereditary motor neuropathy with pyramidal features known as juvenile

amyotrophic lateral sclerosis type 4 (ALS4; MIM#602433). ALS4 is a juvenile

or adolescent onset, slowly progressive disease that involves limb weakness

and muscle wasting with pyramidal signs as a result of degeneration of motor

neurons in the brain and spinal cord. Sensory involvement is not seen and

there is sparing of bulbar and respiratory muscles [102]. Three ALS4 families

were reported with senataxin mutations (T3I, L389S and R2136H) [96] with a

fourth ALS4 family later reclassified as dHMN1, with linkage to chromosome

7q34-q36 [103]. A large number of SETX mutations have also been reported

in several other disorders including the AR neurological disorder, ataxia with

ocular apraxia type 2 (AOA2) [104–117], a rare AR ataxia with peripheral

neuropathy [114] and an AD cerebellar ataxia/tremor syndrome [118]. While

the involvement of peripheral neuropathy suggests some clinical overlaps

between ALS4 and AOA2, they are clinically distinct [96, 114].

1 literature review 15

Senataxin is ubiquitously expressed, present primarily in the nucleus but also

present in the cytoplasm [19, 109]. Senataxin has a conserved SF1 RNA/DNA

helicase domain in the C-terminus with strong homology to the helicase

domains of UPF1 and IGHMBP2 [96, 104]. As such, senataxin is the second

UPF1-like helicase identified in a dHMN. The helicase domain of senataxin

is similarly conserved with the yeast orthologue Sen1p [96, 119]. Senataxin

also contains a putative N-terminal protein-protein interaction domain and a

C-terminal nuclear localisation signal [19, 104]. The SETX missense mutations

cluster in the helicase and protein-protein interaction domains suggesting

a role for these domains in disease (Figure 1.1 C) [19]. Although the direct

mechanisms through which mutations in SETX cause ALS4 are unknown,

recent studies have shown senataxin is involved in transcriptional regulation

of genes including RNA polymerase II termination factor and in pre-mRNA

processing affecting splicing efficiency and splice site selection [120].

The function of senataxin is similar to Sen1p in yeast, where it is a RNA

helicase acting on a wide range of RNA classes, including tRNAs, rRNAs and

small nuclear and nucleolar RNA [121], tRNA splicing [122], small nuclear

RNA synthesis [123], transcription and transcription-coupled DNA repair

[124] and as a RNA polymerase II transcription regulator [125]. An amino

acid substitution in the helicase domain of Sen1p altered the genome wide

distribution of RNA-polymerase II in both non-coding and protein coding

genes [125]. With similar mutations, Steinmetz [125] suggested a role for altered

gene transcription in ALS4 and AOA2.

In AOA2, senataxin mutations result in altered response to oxidative stress

[109, 117]. Both the knockdown of SETX expression and an AOA2 SETX

null mutant, showed a decrease in expression of genes involved in oxidative

stress, including SOD1, IMPDH2, CYC and RPL36 [120]. Mutant senataxin

also showed a similar toxic effect, however, knockdown of mutant senataxin

1 literature review 16

ameliorated the toxic effect of the mutant form. This resulted in normal levels

of oxidative stress response [117].

Mutant senataxin showed normal cellular distribution with no apparent

aggregation in cell culture and a normal molecular weight seen in Western

immunoblotting. It was therefore suggested that mislocalisation or generation

of abnormal protein isomers and aggregates do not play a role in ALS4 [19].

While it appears there is a possible role for decreased oxidative stress response

in AOA2, the mechanism of mutant senataxin in ALS4 has not been determined.

An error in transcriptional regulation of coding genes or an error in pre-mRNA

processing, affecting splice site selection or splicing efficiency are still plausible

possibilities in ALS4.

1.1.5 Glycyl-tRNA synthase (GARS)

Mutations of the GARS gene encoding glycyl-tRNA synthetase (GlyRS) were

initially reported in dHMN5 (MIM#600794) and CMT2D (MIM#601472) [126].

Distal HMN5, also known as distal spinal muscular atrophy type 5, is an

AD inherited peripheral neuropathy, characterized by adolescent onset of

weakness and atrophy of muscles in hands and feet, predominantly of the

upper limbs, with slow progression to involve the lower limbs in most cases

[127]. Distal HMN5 and CMT2D are allelic variants, with sensory involvement

distinguishing CMT2D from dHMN5 [126–128]. Atypical dHMN5 phenotypes

have also been reported with GARS mutations, a variable phenotype showing

lower or upper limb onset (G562R) [129], predominant lower limb involvement

(P244L) [130, 131] and early childhood onset with predominantly lower limb

involvement (G598A) [132]. Three GARS mutations were reported in distinct

dHMN5 and CMT2D families (dHMN5; L129P and G526R, CMT2D; G240R)

and one family with both CMT2D and dHMN5 individuals (E71G) [126].

1 literature review 17

Further GARS mutations have been described in dHMN5 (I280F and A57V)

and CMT2 (S581L, H418R, and P244L) (Figure 1.1 D) [127, 131–133].

Two mouse models of CMT2D have been described with mutations in the

mouse Gars gene, the orthologue of human GARS [134, 135]. These two Gars

mutations cause neuropathy through a dominant, gain-of-function mechanism

[135].

The aminoacyl-tRNA synthetases (ARS) (including GARS) are a family of

enzymes that catalyse the covalent bonding (charging) of an amino acid to its

corresponding transfer RNA (tRNA). The charged tRNA can then deliver the

amino acid to a growing polypeptide on a ribosome [136]. ARS are ubiquitously

expressed and highly conserved across organisms, reflecting their fundamental

role in protein synthesis. GlyRS is a homodimer, essential for the charging of

glycine to tRNAgly [137], in both the cytoplasm and mitochondria.

Antonellis and Green [20] proposed four mechanisms whereby mutant ARS

could cause neurodegeneration. (1) Impaired function of ARS enzymes ei-

ther through loss of tRNA charging function or altered cellular localization

(compartment specific loss of function) or (2) a toxic gain of function caused

by aggregation of mutant GARS or incorrectly synthesized proteins or (3)

disruption of a neuronal-specific secondary function of these enzymes or (4)

mitochondrial specific disruption of ARS function through loss of ARS trans-

port into mitochondria. While the pathologic mechanisms involving mutant

GlyRS in dHMN5 remain unclear, Motley et al. [138] suggested that a loss of

enzyme function through altered dimerisation or impaired axonal transport of

GARS with local protein synthesis defects, were likely pathogenic mechanisms

of GARS in CMT2D and dHMN5. Three out of four ARS genes mutated in

CMT (lysyl-, alanyl- and tyrosyl-tRNA) involve loss of function through re-

duction in tRNA aminoacylation [126, 139, 140] or altered axonal distribution

[141]. A dominant-negative loss of function of GlyRS would agree with the

loss of function observed by the other mutant ARS in CMT. However, other

1 literature review 18

possible pathogenic mechanisms include an altered non-canonical function of

GARS, toxic protein interactions or mitochondrial toxicity [138].

1.1.6 Berardinelli-Seip congenital lipodystrophy 2 (SEIPIN) gene (BSCL2)

In addition to GARS mutations, dHMN5 can also be caused by mutations in

the Berardinelli-Seip congenital lipodystrophy 2 (seipin) gene (BSCL2). These

mutations are also found in the allelic disorder spastic paraplegia 17 (SPG17;

MIM#270685) also known as Silver syndrome [142]. Two BSCL2 mutations

have been identified in dHMN5 and SPG17, N88S and S90L (Figure 1.1 E).

These affect a conserved N-glycosylation motif [142]. This glycosylation site

is a mutational hot-spot, with these mutations identified independently in

several families and as de-novo mutations in sporadic cases [6].

SPG17 is a rare AD neurodegenerative disorder, characterised by weakness

and wasting of the small muscles in the hand and marked spasticity in the

lower limbs [143]. BSCL2 mutations result in heterogeneous phenotypes, with

clinical variability often seen within individual families. While the clinical phe-

notypes are predominantly dHMN5 and SPG17, atypical phenotypes include,

subclinical neurological damage, variant SPG17 and dHMN5 with lower limb

or both lower and upper limb involvement and CMT2 [143–146].

Unlike the missense BSCL2 mutations in dHMN5 and SPG17, null mutations

cause congenital generalized lipodystrophy type 2 (CGL2; MIM #269700), a rare

AR disease of lipid metabolism [147]. CGL2 is characterised by a near absence

of adipose tissue from birth or infancy, severe insulin resistance and mental

retardation. Lipodystrophy is not evident in dHMN5 and SPG17, suggesting

that BSCL2 mutants in dHMN5 and SPG17 confer a dominant toxic gain of

function [144].

1 literature review 19

Three alternative transcripts of BSCL2 have been identified. Two are ubiq-

uitously expressed and one is specifically expressed in motor neurons in the

spinal cord, cortical neurons in the cerebral cortex, and in the testis and pi-

tuitary gland [142, 148]. The 288 aa central region of BSCL2 is functionally

conserved across species [149]. This region of BSCL2 contains two transmem-

brane domains, with the N and C-terminals in the cytoplasm. The conserved

N-glycosylation motif mutated in dHMN and SPG17 is located between these

two transmembrane domains [150].

Ito and Suzuki [151] demonstrated that mutations in seipin disrupt N-

glycosylation leading to incorrect protein folding in the endoplasmic reticulum

(ER). They proposed a mechanism for neurodegeneration, whereby some

misfolded seipin is eliminated by the ubiquitin-proteosome system, however

this is insufficient to prevent mutant seipin from accumulating in the ER,

leading to cell death through ER stress [152]. Similar accumulation of mutant

BSCL2 was seen in cell cultures expressing the N88S and S90L mutations,

with aggreosome like structures containing mutant BSCL2 along with altered

cellular localisation [142]. Further evidence for a toxic gain of function comes

from functional studies in yeast. BSCL2 proteins containing the N88S or S90L

mutations were able to rescue a lipid droplet morphology in the same way

as Wt BSCL2 and truncated BSCL2 containing only the conserved 288 amino

acid region. This contrasts with BSCL2 containing a CGL2 null-mutant, which

resulted in an altered lipid droplet morphology, indicating the N88S and S90L

mutants still possess some normal BSCL2 function [149].

In a dHMN family carrying a N88S BSCL2 mutation a second disease locus

was mapped to chromosome 16p [146]. This family had a variable dHMN

phenotype, including either lower or upper or both lower and upper limb

involvement, while some family members also showed pyramidal features.

All twelve patients with dHMN carried the N88S BSCL2 mutation and the

16p locus. One individual carried only the 16p locus and showed sub-clinical

1 literature review 20

neurological damage. This digenic inheritance of the 16p locus may have an

additive pathologic effect or be sufficient for an attenuated form of dHMN

[146].

While the number of studies of mutant BSCL2 is small, and no post-mortem

analysis has been reported, the proposed toxic gain of function leading to

neurodegeneration through ER stress remains a likely central mechanism in

dHMN5 and SPG17.

1.1.7 ATPase, Cu2+-transporting, alpha polypeptide gene (ATP7A)

Mutations causing X-linked dHMN (DSMAX; MIM#300489) have been reported

in the gene encoding the copper-transporting ATPase, ATP7A (Figure 1.1 F)

[153]. This X-linked dHMN syndrome is characterised by slowly progressive

distal weakness, with minor sensory involvement. The mutations occurred in

families of Brazilian [154] and North American [155] ancestry. An early age of

onset was observed in the Brazilian family with the North American family

presenting in the third or fourth decades. Similar motor-neuron disease like

symptoms can also result from acquired copper deficiency [156]. Mutations in

ATP7A have previously been described in the severe conditions, Menkes dis-

ease (MD) [157–159] and occipital horn syndrome (OHS) [160]. However, these

syndromes are phenotypically distinct from X-linked dHMN [153]. Analysis

of the copper transport function and cellular localisation of ATP7A indicate

the X-linked dHMN mutations are causing a protein trafficking defect, where

ATP7A doesn’t move away from the trans Golgi network in response to ele-

vated copper [153]. It is unclear how this defect in copper transport results in

a motor-neuron specific phenotype.

1 literature review 21

1.1.8 Pleckstrin homology domain-containing protein, G5 gene (PLEKHG5)

A mutation in the pleckstrin homology domain containing protein, family

G member 5 (PLEKHG5) gene was reported in large consanguineous pedi-

gree with an AR lower motor neuron disease [161, 162]. Classified as distal

spinal muscular atrophy 4 (DSMA4; MIM#611067) or dHMN4, this severe

juvenile onset, lower motor neuropathy presents with distal muscle wasting of

both upper and lower limbs, rapidly progressing to a general paralysis and

impaired respiration, but sparing cranial nerves [161]. A milder phenotype

was identified in one family member, with delayed onset, slower course and

moderate generalised weakness.

PLEKHG5 is a member of the Db1 protein family, with characteristic cat-

alytic Db1-homology domain and pleckstrin homology domain (Figure 1.1 G).

PLEKHG5 is a low abundance protein predominantly expressed in periph-

eral nerve, spinal cord and brain, with lower expression in heart and skeletal

muscle. [162, 163]. PLEKHG5 is a direct activator of RhoA in vivo, and is a

suppressor of neuronal differentiation [163]. PLEKHG5 is also an activator of

the NFκB signalling pathway [162, 164]. In neurons, NFκB activation stimu-

lates pro-survival genes, including anti-apoptotic proteins, antioxidants and

neurotrophic factors [165].

The reported mutation in PLEKHG5 leads to protein instability and a fail-

ure in activation of the NFκB pathway. Furthermore, aggregation of mutant

PLEKHG5 was seen when over expressed in a motor-neuron like cell line [162].

The demonstrated instability of mutant PLEKHG5, and its inability to activate

NFκB signalling, support a homozygous loss of function mechanism. Maystadt

et al. [162] proposed that PLECKHG5 may play a protective role in motor

neurons and that a loss of NFκB stimulating activity leads to neuronal-specific

1 literature review 22

cell death. Further investigation of PLEKHG5 aggregation is needed to support

a role in neurodegeration.

1.1.9 Transient receptor potential cation channel, V4 gene (TRPV4)

Two mutations in the vanilloid transient receptor potential cation-channel gene

(TRPV4) were recently identified in congenital DSMA linked to chromosome

12q23-q24 (MIM#600175) [21]. The congenital DSMA TRPV4 mutations were

concurrently reported with TRVP4 mutations in hereditary motor sensory

neuropathy type 2C (HMSN2C; MIM#606071, or CMT2C) and scapuloperoneal

spinal muscular atrophy (SPSMA; MIM#181405) [21, 166, 167], with overlap

of all the TRVP4 mutations between the three conditions [168]. In all, four

mutations were reported; two mutations affecting the same amino acid, R269H

and R269C, and two mutations affecting adjacent amino acids, R315W and

R316C (Figure 1.1 H). Both congenital DSMA families showed intra-familial

clinical variation. The phenotype of the original congenital DSMA family

ranged from mild distal lower limb involvement, through to a more severe

phenotype involving pelvic girdle weakness and subsequent scoliosis [169, 170].

The second family reported further clinical variation including mild and severe

congenital DSMA, SPSMA and HMSN2C [21]. Although congenital DSMA

and SPSMA were originally considered to be distinct [170], they have since

been confirmed to be allelic variants of mutant TRVP4 along with HMSN2C

[21, 166–168].

TRVP4 forms a homo-tetramer that acts as a plasma membrane calcium ion

channel, activated by heat, mechanical stress and extracellular hypotonicity

[171]. The four unique mutations in TRVP4 are all located in the cytosolic

N-terminal ankyrin repeats, a putative regulatory region involved in TRVP4

tetramerization or interaction with regulatory proteins [171].

1 literature review 23

The function of mutant TRVP4 in peripheral nerves is unclear, with conflict-

ing disease mechanisms reported [168]. Auer-Grumbach et al. [21] reported

reduced cell surface expression of TRPV4 calcium channels and reduced cal-

cium influx, whereas two other studies reported marked cellular toxicity, with

increased calcium channel activity [166, 167]. These differences have been

attributed to the use of different experimental techniques [168] and further

research is required to clarify the functional effects of TRPV4 mutations in

neurodegeneration.

1.1.10 DYNC1H1

During the preparation of this thesis, mutations in the dynein cytoplas-

mic 1 heavy chain 1(DYNC1H1) gene were identified in three families with

spinal muscular atrophy with lower extremity predominance (SMA-LED,

MIM#158600)[172] and one with CMT2O(MIM#614228)[173]. Both the SMA-

LED and CMT2O families with DYNC1H1 mutations showed a lower limb

predominance of motor neuron disease. The affected patients from SMA-LED

families, showed a pure motor neurone disease, manifesting in early childhood

as delayed motor development and lower limb weakness affecting walking and

running, without sensory involvement. The affected patients from the CMT2O

family also showed delayed motor development and early-onset muscle weak-

ness affecting walking. However, slow disease progression lead to distal lower

limb weakness and wasting with pes cavus deformity and variable sensory

involvement in some family members, including reduced vibration sense, loss

of proprioception and fine touch and lower limb pains. Some of the affected

patients from the CMT2O family had similar clinical descriptions to SMA-LED

indicating a clinical overlap between the disorders.

1 literature review 24

The four mutations are located in the tail domain of DYNC1H1, involved

in dimerisation of dyenin/dynactin complex motor units (Figure 1.1 I). Three

mouse models with motor neuron defects and muscle wasting have been

described with mutations in the mouse Dync1h1 gene [174, 175]. These three

mutations are also located in the tail domain of Dync1h1.

The discovery of DYNC1H1 mutations adds to previous evidence from

DCTN1 (Section 1.1.2) that disruption to retrograde transport in motor neurons

is a critical disease mechanism in dHMN. Other genes involved in these

processes are likely disease candidates in additional unsolved dHMN families.

1.1.11 Clinical variability in dHMN

The dHMN genes identified to date have a diverse range of functions, sug-

gesting motor neuron degeneration is a multifactorial process or that there are

numerous ways to disrupt these highly metabolic, terminally differentiated

cells. While distal motor neurons are primarily affected, the involvement of

other neuronal cell types, including upper motor neurons and sensory neu-

rons is apparent in some dHMN. This variability is both intrafamilial, with

individuals having different diagnoses within a single family as well as in-

terfamilial, where separate families carrying the same mutation, or different

mutations in the same gene, have differing diagnosis. In the case of sensory

involvement in dHMN, individuals without sensory signs are diagnosed as

dHMN, while individuals with clear sensory signs are diagnosed as CMT2.

The clinical variability in dHMN makes accurate diagnosis a complex and

difficult task. As the involvement of sensory and pyramidal tract signs are

prominent in several forms of dHMN they should be considered in diagnostic

and gene screening paradigms [6, 185]. This further complicates genetic studies

which rely on accurate diagnosis of family members for linkage analysis in

1 literature review 25

Tabl

e1.

1:O

MIM

clas

sific

atio

nof

dHM

N

Dis

ease

Locu

sG

ene

Inhe

rita

nce/

phen

otyp

e†R

efer

ence

s

dHM

N1

(MIM

%18

2960

)7q

34-q

36U

nkno

wn

AD

juve

nile

onse

tlo

wer

limb

pred

omin

ant

dist

alw

eakn

ess

and

was

ting

[103

]

dHM

N2A

(MIM

#158

590)

12q2

4.3

HSP

B8A

Dad

ult

onse

tlo

wer

limb

pred

omin

ant

dist

alw

eakn

ess

and

was

ting

[3]

dHM

N2B

(MIM

#608

634)

7q11

-q21

HSP

B1[2

]dH

MN

2C(M

IM#6

1337

6)5q

11.2

HSP

B3[4

]

dHM

N3

(MIM

%60

7088

)‡11

q13.

3U

nkno

wn

AR

adul

ton

set

dist

alw

eakn

ess

and

was

ting

[176

,177

]

dHM

N4

(MIM

%60

7088

)‡11

q13.

3U

nkno

wn

AR

juve

nile

onse

tsev

ere

mus

cle

wea

knes

san

dw

astin

gan

dpa

raly

sis

ofth

edi

aphr

agm

dHM

N5

(MIM

#600

794)

7p15

GA

RS

AD

uppe

rlim

bpr

edom

inan

tdi

stal

mus

cle

wea

knes

san

dw

asti

ng[1

26]

11q1

2-q1

4BS

CL2

[142

]

dHM

N6

(DSM

A1;

MIM

#604

320)

11q1

3.2

IGH

MBP

2A

Rin

fant

ileon

set

wit

hre

spir

ator

ydi

stre

ss[7

6]

dHM

N7A

(MIM

%15

8580

)2q

14U

nkno

wn

AD

adul

ton

set

wit

hvo

calc

ord

para

lysi

s[1

78]

dHM

N7B

(MIM

#607

641)

2p13

DC

TN1

AD

adul

ton

set

brea

thin

gdi

fficu

ltie

sdu

eto

voca

lcor

dpa

raly

sis,

faci

alw

eakn

ess

and

hand

mus

cle

atro

phy

[64]

X-L

inke

ddH

MN

(DSM

AX

;MIM

#300

489)

Xq1

3-q2

1A

TP7A

XL

juve

nile

onse

tdi

stal

wea

knes

san

dw

asti

ng[1

53]

DSM

A4

(MIM

#611

067)

1p35

PLEK

HG

5A

Rch

ildho

odon

set

mus

cle

wea

knes

san

dse

vere

lyde

crea

sed

resp

irat

ory

func

tion

[162

]

Con

geni

tald

ista

lSM

A(M

IM#6

0017

5)12

q23-

q24

TRPV

4A

Dco

ngen

ital

,non

prog

ress

ive

low

erlim

bw

eakn

ess

and

para

lysi

sw

ith

cont

ract

ures

[21,

169,

170,

179,

180]

dHM

N-A

LS4

(MIM

#602

433)

9q34

SETX

AD

juve

nile

onse

tdi

stal

wea

knes

san

dw

asti

ngw

ith

pyra

mid

altr

act

sign

s[9

6]

dHM

Nw

ith

pyra

mid

alfe

atur

es4q

34.3

-q35

.2U

nkno

wn

AD

adul

ton

set

low

erlim

bdi

stal

wea

knes

san

dw

asti

ngw

ith

pyra

mid

altr

act

sign

s[1

81]

dHM

N-J

(MIM

%60

5726

)9p

21.1

-p12

Unk

now

nA

Rju

veni

leon

set

dist

alw

eakn

ess

and

was

ting

wit

hpy

ram

idal

trac

tsi

gns

[182

,183

]

SMA

-LED

(MIM

#158

600)

14q3

2.31

DY

NC

1H1

AD

juve

nile

dela

yed

mot

orde

velo

pmen

tan

dlo

wer

limb

wea

knes

s[1

72]

†A

D,a

utos

omal

dom

inan

t,A

R,a

utos

omal

rece

ssiv

e,X

L,X

-chr

omos

ome

linke

d.‡

Afa

mily

map

ping

to11

q13.

3ha

sbo

thdH

MN

3an

ddH

MN

4ph

enot

ypes

[177

],Ir

obie

tal

[184

]su

gges

ted

that

they

are

the

sam

edi

sord

er.

1 literature review 26

such pedigrees. This adds to the limitations on linkage analysis caused by the

genetic heterogeneity of dHMN which prevents independent pedigrees from

being analysed together. In addition, the functional variability between known

genetic causes of dHMN limits the application of pure functional cloning based

gene identification (Section 4.1.2).

Sensory involvement is sometimes seen with mutations in dHMN genes,

highlighting the overlap between dHMN and CMT2. Interfamilial variability

is seen with unique HSPB1 mutations reported in unrelated dHMN2 and

CMT2 families. The S135F HSPB1 mutation results in both dHMN2 and CMT2

in unrelated families [2, 18]. As such, the differences in sensory symptoms

between these families cannot only be attributed to the different HSPB1 muta-

tions having varying influence between sensory and motor neurons, and the

disease mechanism must be the same between dHMN2/CMT2 with HSPB1

mutations. Similarly the K141N HSPB8 mutation causes both dHMN2 and

CMT2L in unrelated families with mild to moderate sensory involvement the

distinguishing feature between these conditions [3, 8]. Furthermore, GARS

mutations cause both dHMN5 and CMT2 in unrelated families, as well as in a

single pedigree carrying the E71G GARS mutation [126, 127, 131–133].

Additional interfamilial variability in dHMN further supports the overlap

of dHMN and CMT2. Nicholson et al. [185] identified several dHMN families

with variable sensory signs and individuals from CMTX families lacking

sensory symptoms. The X-linked dHMN families with ATP7A mutations have

minor sensory involvement in some family members [153]. The dHMN1 family

linked to chromosome 7q34-q36 has two individuals with minor sensory signs

[103] and the dHMN with pyramidal signs linked to chromosome 4q34.3-q35.2

has one individual with sensory signs [181]. Later sensory symptoms are seen

in DSMA1 with IGHMBP2 mutations, confirmed by autopsy results which

showed progressive motor and sensory neuron degeneration [87]. Generally

no sensory involvement is seen in with BSCL2 mutations, however in the large

1 literature review 27

family reported by Auer-Grumbach et al. [143], 20% of affected members had

a motor and sensory neuropathy consistent with CMT2.

Clinical variability in dHMN extends to upper motor neuron involvement

in some dHMN families. Upper motor neuron involvement has been reported,

again with intra- and interfamilial variability. Pyramidal signs are seen with

SETX mutations [6, 102], and the rare HSPB1 P182L mutation [6]. Pyramidal

signs have also been reported in dHMN1 mapped to chromosome 7q34-q36

[103], and dHMN with pyramidal features mapped to chromosome 4q34.3-

q35.2 [181]. Significant variability is also seen with BSCL2 mutations. Generally,

the N88S mutation causes a dHMN5 phenotype, with atrophy and weakness

of hand muscles whereas the S90L mutation causes the more severe SPG17

phenotype, with spasticity in the lower limbs and additional upper motor

neuron signs in addition to the muscle atrophy in the upper limbs [6]. The

unique de novo G598A GARS mutation causes an infantile SMA contrasting to

the usual dHMNV/CMT2 [132].

It is still unclear why variability of affected tissues and age of onset exists

between closely related individuals, however this implicates additional mod-

ifying factors, environmental or genetic, influencing the phenotype. Genetic

modifiers are other polymorphisms or mutations which ameliorate or intensify

the phenotype created by the primary disease mutation. Two genetic modifiers

have been mapped in a dHMN5 family and the nmd mouse model of DSMA1.

A digenic inheritance of the N88S BSCL2 mutation and a second locus mapping

to chromosome 16p was reported in a dHMN5 family with variable presen-

tation and additional pyramidal features in some family members [146]. The

chromosome 16p locus was also identified in two individuals with sub-clinical

motor neuron damage and is suggested to be a modifier locus, responsible for

some of the clinical variability within the family. The LITAF1 gene associated

with CMT1C is located in this region, however no mutation, copy number

variation or altered expression was identified. A further 49 genes remain in

1 literature review 28

the 16p locus, including other good candidate genes [146]. As previously de-

scribed, the Mnm locus in the nmd mouse model of DSMA1 with IGHMBP2

mutations, ameliorates the motor-neuron disease phenotype [100]. The Mnm

locus contains several tRNA genes and the ribosomal associated factor Abt1,

which associates with IGHMBP2; however, no polymorphisms were identified

in these genes to account for the compensatory effect of the Mnm locus [98].

While the two mapped dHMN genetic modifiers have not yet been identified,

the genetic modifiers influencing the severity of SMA caused by mutations in

the SMN1 gene have been characterised. The phenotype of SMN related SMA

is highly variable and ranges from the fatal infantile onset type 1 SMA (SMA1),

through the chronic infantile onset SMA type 2 (SMA2), juvenile type 3 SMA

(SMA3) and adult onset, type 4 SMA (SMA4). Mutations in SMN1 involve

deletions or exon skipping resulting in a loss of SMN protein. Heterozygous

SMN1 mutations cause the least severe phenotype SMA4, with loss of both

copies of SMN1 causing the more severe earlier onset forms of SMA. There

are several other factors which influence the severity of the disorder, the first

is the second copy of the SMN gene (SMN2) located on a large duplication

on chromosome 5. The SMN2 gene has a silent mutation which causes exon

7 skipping leading to an inactive protein. However SMN2 transcription still

produces about 10% of normal SMN mRNA. SMN2 copy number influences

the phenotype of SMA, with higher SMN2 copy number correlating with less

severe diagnosis [186]. SMN1 mutations that maintain partial transcription

levels similar to SMN2, lead to a less severe phenotype than a complete loss

of SMN1 [187]. Secondly, high expression levels of the PLS3 gene, encoding

Plastin3, have been shown to be protective against SMN1 mutations [188].

Plastin3 over expression rescued axon defects caused by SMN1 deletion. The

PLS3 protective effect is gender-specific to females and shows less than 100%

penetrance suggesting an unknown regulatory mechanism is responsible for

the altered expression levels of PLS3[188]. A third modifying factor is the

1 literature review 29

neuronal apoptosis inhibitory protein (NAIP) [189]. Partial deletions of NAIP

are a polymorphism found in 2% of non-SMA chromosomes. When NAIP

deletions are inherited with SMN1 mutations, the more severe SMA1 pheno-

type results. Deletions in the NAIP gene are found in a high proportion of

patients with SMA1, including 67% of European and 37% of Chinese SMA1

cohorts [189, 190]. NAIP is an inhibitor of cell death and regulator of neuronal

outgrowth [191]. These functions are lost with the deletions of NAIP. Similar

modifiers may play a role in the variability of dHMN.

1.1.12 Common disease mechanisms in dHMN

While the genes associated with dHMN encompass a diverse range of func-

tions, some common mechanisms have emerged. Disrupted axonal transport

has been implicated in dHMN with DCTN1, HSPB1 and HSPB8 mutations.

As a core component of microtubule based transport, the disruption of the

motor function of Dynactin 1 directly disrupts axonal transport. Disruption of

components of microtubule based movement, including cargo adapters, micro-

tubule tracks and motor proteins are seen in other neurodegenerative diseases

including ALS, Huntington’s disease, Parkinson’s disease and Alzheimer’s

disease [192]. The mechanism whereby HSPB1 and HSPB8 mutations disrupt

axonal transport is less clear, although a disruption of the axonal cytoskeleton

or protein aggregation are possible mechanisms leading to axon blockage.

Active transport along axons is crucial for the survival of motor neurons.

Transport is required for delivery of proteins, lipids and mitochondria to the

distal synapse, long distance signalling and the clearing of damaged proteins.

These requirements make neurons particularly susceptible to disruption of

axonal transport.

1 literature review 30

RNA processing defects are also emerging as a significant factor in mo-

tor neuron disease [193]. It is apparent that motor neurons are particularly

sensitive to disruptions in RNA processing pathways, which include, RNA

transcription, pre-mRNA splicing, editing, transportation, translation and

degradation [51]. Several genes mutated in dHMN have roles in RNA pro-

cessing including IGHMBP2, SETX and GARS which have been shown to be

disrupted. While HSPB8 and HSPB1 have functions involved in RNA pro-

cessing, whether these functions are involved in the pathogenesis of dHMN

has not been investigated. Recently, mutations in TAR DNA binding protein

(TDP-43) [194] and fused in sarcoma (FUS) [195] have been shown to cause

ALS. Both FUS and TDP-43 have roles in RNA processing including regulation

of splicing, transcription and transport. An understanding of the effects of

mutations in these RNA processing proteins should provide insight into the

disease mechanisms of dHMN and other motor neuron disease.

Protein aggregation and inclusion body formation are recognised as common

molecular mechanisms leading to neurodegeneration [34]. Protein aggregation

may result from a failure of the proteasome to clear misfolded proteins, which

aggregate causing further toxic effects, including blocked access to axons, toxic

interactions with additional proteins or triggering cell death pathways. In

dHMN2 with HSPB1 and HSPB8 mutations, protein aggregates were seen

in vitro, with aggregates containing other proteins including the cytoskele-

ton protein NFL. This was thought to contribute to disruption of the axonal

cytoskeleton and axonal transport. Mutant BSCL2 was shown to form aggreo-

some like structures and accumulate in the ER, leading to ER stress and cell

death [152].

It is unclear how mutations in ubiquitously expressed genes result in a motor

neuron predominant phenotype. Motor neurons may be more susceptible due

to their unique features such as their large size and long axons, and heavy

reliance on active transport and high energy demands, which make them

1 literature review 31

most vulnerable to the effects of these mutations. On the other hand, some

genes mutated in dHMN are specifically expressed or expressed at higher

levels in the CNS than other tissues, thereby accounting for much of the

motor neuron specific disease phenotype. Mutations in the PLEKHG5 gene,

which is specifically expressed in motor neurons, results in a more severe

dHMN phenotype than many of the other ubiquitously expressed dHMN

genes. PLEKHG5 mutations are thought to lead to cell death through loss of

NFkb signalling, resulting in a motor neuron specific phenotype due to its

limited expression.

Future genetic studies including those searching for additional disease

genes can use the information from known disease genes to identify likely

candidate genes in related genes and common pathways. As additional genes

are described in dHMN, it is likely the common disease mechanisms and

pathways will become clearer. These common disease pathways will provide

targets to develop disease treatments or preventative strategies.

The purpose of this thesis is to identify new gene mutations causing dHMN.

The genetic and functional data presented here suggest this will be a difficult

task; the genetic heterogeneity complicates genetic analysis and the multiple

molecular mechanisms implicated to date make it difficult to pinpoint specific

candidate genes. The identification of additional genes and genetic modifiers

is necessary to increase our understanding of the disease mechanisms caus-

ing dHMN and related neuropathies. This will directly aid in the diagnosis

and classification of these neurodegenerative diseases and may lead to new

therapeutics and treatment strategies.

2G E N E R A L M AT E R I A L S A N D M E T H O D S

This chapter details the general materials and methods used throughout this

project. Protocols specific to individual experiments are detailed in the relevant

chapters.

2.1 general materials and reagents

All solutions were prepared using Type 1, reagent-grade water, purified using a

Milli-Q Academic (Millipore) water purification system. Solutions, plasticware

and glassware were sterilised by autoclaving.

2.1.1 Reagents and Enzymes

The following reagents and enzymes were used: 10 mM dNTP Mix (Bioline),

Blue Perpetual Taq DNA polymerase (Vivantis) and supplied buffers, Chromo

AtTaq (Vivantis) and supplied Buffer A, ImmoMix Red (Bioline), 10× PCRx

Enhancer Solution (Invitrogen), HRM Master Mix (Idaho Technology), LC

Green Plus (Idaho Technology), agarose (Vivantis), 25× TAE buffer (Amresco),

SYBR safe (Invitrogen), HyperLadder I (Bioline), HyperLadder IV (Bioline),

100% ethanol (Ajax Finechem).

32

2 general materials and methods 33

2.1.2 Equipment

Pipetting was carried out using the following pipettes: Nichipet 100-1000 µL,

20-200 µL, and 2-20 µL (Nichiryo); Pipetman P2 (Gilson); multichannel 20-200

µL (Socorex). Disposable tips used for pipetting were: Ultratip (Greiner Bio-

One), Gilson yellow 200 and Gilson blue 1000; ART Aerosol Resistant Tips

(Molecular BioProducts), ART 10 and ART 20p; and Aerosol Barrier Tips (Inter-

path Services), 1000 µL, 200 µL, 20 µL and 10 µL. Thermal cycling was carried

out using either the GeneAmp PCR System 9700 (Applied Biosystems), Master-

cycler ep gradient S (Eppendorf), or Mastercycler pro S (Eppendorf). Agarose

gel electrophoresis was carried out in horizontal submarine units (Cleaver Sci-

entific and Amersham Bioscience) using a 3000Xi Electrophoresis power supply

(Bio-Rad). DNA concentration was measured using a BioPhotometer 6131 (Ep-

pendorf) with UVette cuvettes (Eppendorf) and a 10 mm optical path length.

Centrifugation was carried out using: PMC-860 Capsulefuge (Tomy), 5417C

centrifuge (Eppendorf), Biofuge pico (Heraeus), GS6R centrifuge (Beckman) or

Megafuge 1.0 (Heraeus).

2.1.3 Plasticware

The following plasticware were used: 1.5 mL microcentrifuge tubes (Astral

Scientific), 8-strip 200 µL PCR tubes (Scientific Specialities Inc.), 96-well un-

skirted PCR plates (Bioplastics), 96-well black-shell/white-well skirted PCR

plates (BioRad), 15 mL and 50 mL centrifuge tubes (Greiner Bio-One), and

Vacuette R 9 mL K3E K3EDTA (Greiner Bio-one) blood collection tubes.

2 general materials and methods 34

2.2 study participants

Individuals participating in this study gave informed consent in accordance

to protocols approved by the Sydney South West Area Health Service Ethics

Review Committee (Sydney Australia; Reference number: CH62/6/2006 - 050 -

G.Nicholson).

2.3 dna methods

2.3.1 DNA extraction from blood

DNA samples were extracted by the Molecular Medicine Laboratory, Concord

Repatriation General Hospital (Sydney, Australia). Genomic DNA was ex-

tracted from peripheral blood using the PUREGENE DNA isolation kit (Gentra

Systems) according to the manufacturers protocol. In brief, RBC lysis solution

was added to 3 mL of whole blood to a total volume of 12 mL and incubated

at room temperature for 10 min. White cells were pelleted by centrifugation for

10 min at 2000 × g and the supernatant discarded. Cells were lysed by adding

cell lysis solution (3 mL) and mixing by vortexing. Protein was precipitated by

addition of protein precipitation solution (1 mL) followed by vortexing and

centrifugation for 10 min at 2000 × g. Supernatant was recovered into a 15

mL centrifuge tube. DNA was precipitated by addition of 100% isopropanol (3

mL), mixing by inversion, then pelleted by centrifugation for 3 min at 2000 ×

g. The DNA pellet was washed in 70% ethanol (3 mL) and centrifuged for 1

min at 2000 × g. The supernatant was discarded and the DNA pellet was air

dried for 15 min. The DNA was re-suspended in 250 µL of DNA hydration

solution and stored at 4°C.

2 general materials and methods 35

DNA samples older than the year 2000 were extracted using an alternate

phenol chloroform based DNA extraction protocol as described by Sambrook

et al. [196].

2.3.2 DNA oligonucleotides

DNA oligonucleotides for PCR primers were designed using either the Exon-

Primer software through the UCSC genome browser [197], or LightScanner

Primer Design software version 1.0.R.84 (Idaho Technology). Primer sequences

are listed in Appendix B. Primers were synthesised by Invitrogen or Sigma.

2.3.3 PCR protocol

PCR reactions were carried out in a 10 µL reaction volume containing 10

ng template DNA, 1 U of Taq polymerase, 1 × reaction buffer, 100 µM each

dNTP, 1.5 mM MgCl2 and 4 pmol each of forward and reverse primers. If

required, PCRx Enhancer solution or LC Green Plus solutions were added at

1 × concentration during optimisation. MgCl2 concentration was increased

for difficult to optimise amplicons using a gradient from 1.5 mM to 2.5 mM.

PCR reaction thermal cycling involved: initial denaturation at 95°C for 10 min;

30 cycles of 95°C for 30 sec, optimal annealing temperature for 30 sec, and

72°C for 30 sec; and a final extension at 72°C for 5 min. The optimal annealing

temperature was determined using the Mastercycler ep gradient S (Eppendorf)

with a temperature gradient of 55 to 65°C. Thermal cyclers used are listed

in Section 2.1.2. Alternatively, a touchdown PCR protocol was used to avoid

optimisation of annealing temperature. The touchdown PCR protocol used:

initial denaturation at 95°C for 10 min; 10 cycles of 95°C for 30 sec, 72°C for

30 sec with a 1.5°C decrease in annealing temperature per cycle and 72°C for

2 general materials and methods 36

30 sec; 25 cycles of 95°C for 30 sec, 55°C for 30 sec and 72°C for 30 sec; and a

final extension at 72°C for 5 min.

2.3.4 Agarose gel electrophoresis

PCR amplicons were size fractionated using agarose gel electrophoresis. 1-

2% (w/v) agarose gels were prepared by disolving agarose into 1 × TAE

buffer with heating using a microwave oven. Molten agarose was allowed

to cool to 65°C before adding CYBR Safe (Invitrogen) DNA gel stain, then

poured into horizontal gel trays and allowed to solidify. PCR samples were

electrophoresed at 40 Vcm-1 for 40 min. DNA was visualised using a Safe

Imager Transilluminator (Invitrogen) and imaged using a Canon PhotoShot

S5-IS digital camera with Hoya O(G) filter. DNA size standards used included

HyperLadder I (200 bp to 10 kb) or HyperLadder IV (100 bp to 1 kb).

2.3.5 DNA purification from PCR

Purification of PCR products was carried out using alternate protocols de-

pending on the sequencing service used. The The Montage PCR96 Cleanup

Kit (Millipore) was used for PCR amplicons sequenced at Garvan sequencing

centre (Section 2.3.6.1) and ExoSAP PCR purification system was used for PCR

amplicons sequenced using the Macrogen sequencing service (Section 2.3.6.2).

The Montage PCR96 Clean-up Kit (Millipore) was used following the man-

ufacturers semi-automated protocol using a vacuum manifold (Millipore),

Pascal type 2005C vacuum pump (Alcatel) and high-speed microplate shaker

(Illumina). For PCR cleanup using ExoSAP, a master mix was prepared on ice

containing (per sample) 0.3 µL Exonuclease I (New England Biolabs), 1 µL

Shrimp Alkaline Phosphatase (Promega) and H2O to a volume of 5 µL. For a

2 general materials and methods 37

10 µL PCR reaction volume, 5 µL of ExoSAP master mix was added to each

completed PCR reaction and incubated for 30 min at 37°C then 15 min at 80°C.

2.3.6 DNA sequencing

Sequencing was carried out using the sequencing service at Australian Cancer

Research Foundation (ACRF) Facility GARVAN institute (Sydney, Australia)

and the sequencing service at Macrogen (Korea).

2.3.6.1 GARVAN sequencing

PCR amplicons were cleaned up using the The Montage PCR96 Clean-up

Kit (Millipore). Sequencing reactions were carried out using the Big Dye

terminator kit (Applied Biosystems) following the manufacturers protocol. In

brief, sequencing reactions were carried out in a half reaction volume of 10

µL containing: BigDye Sequencing Buffer (1 ×), ready reaction premix (1 ×),

sequencing primer (3.2 pmol), and PCR product DNA template (3-10 ng). Cycle

sequencing was carried out using a Mastercycler ep gradient S (Eppendorf)

with a modified protocol of: 96°C for 1 min; 40 cycles of 96°C for 10 sec, 50°C

5 sec, 60°C 4 min; then held at 4°C. Sequencing products were forwarded

to the sequencing service at Australian Cancer Research Foundation (ACRF)

Facility GARVAN institute (Sydney, Australia) for sequencing using an ABI

3100 Genetic Analyser (Applied Biosystems).

2.3.6.2 Macrogen sequencing

PCR products were cleaned up using ExoSAP. Cleaned up PCR products were

forwarded to Macrogen (Korea) for their standard-sequencing single or plate

service using Big Dye Terminator cycle sequencing (Applied Biosystems) and

3730xl DNA Analyzer (Applied Biosystems).

2 general materials and methods 38

2.3.6.3 Sequence analysis

Sequencing traces were downloaded from sequencing service providers in

.abi format. Sequences were aligned to the Human Mar. 2006 (NCBI36/hg18)

assembly reference sequence for the target gene using the software SeqMan

II version 5.08 (DNASTAR). Affected and control sequences were compared

to identify sequence variants. Correct alignment, exon coverage and sequence

variant annotation of aligned sequences for each PCR amplicon were checked

using the BLAT tool from the UCSC genome browser [198].

3G E N E T I C M A P P I N G

3.1 introduction

3.1.1 Overview

The focus of this genetic study is the dHMN1 disease locus on chromosome

7q34-q36 in a large Australian family, CMT54 [103]. This disease locus is a

12.9 megabase (Mb) region flanked by the markers D7S2513 and D7S637, with

a minimum probable candidate interval of 7.6 Mb flanked by D7S2511 and

D7S2546, suggested by recombination events in two unaffected individuals

[103]. There remains potential for fine mapping to refine the minimum probable

candidate interval in family CMT54. Further refinement of the disease interval

requires additional dHMN families linked to 7q34-q36 or expansion of the

original family. This chapter includes the fine mapping of the minimum

probable candidate interval in CMT54 and an analysis of the 7q34-q36 locus in

a cohort of dHMN families that are negative for known dHMN genes.

3.1.2 Homologous recombination and linkage disequilibrium

Homologous recombination is a mechanism whereby sections of DNA are

exchanged between pairs of homologous chromosomes. Homologous recombi-

nation results in a chromosomal crossover, creating two chromosomes with

a new combination of alleles. While genes located on separate chromosomes

39

3 genetic mapping 40

assort independently during meiosis, genes located on the same chromosome

only assort independently if homologous recombination occurs between them.

The further apart genes are on a chromosome the more likely this will occur.

Two genetic loci that do not assort independently are deemed linked. The

recombination fraction (θ) is a measure of the frequency of recombination

between two loci, the genetic distance between them. Recombination fractions

of < 0.5 indicate some degree of linkage, less offspring are recombinant than

would be expected for random assortment. A recombination fraction of 0.5

indicates that two loci are unlinked, that is 50% of offspring are recombinant.

The map unit of genetic distance is the centimorgan (cM), with one map unit

defined as the distance between two loci such that 1% of meiosis is recombinant

(θ = 0.01). Unless located closely together, two loci on the same chromosome

will be rapidly redistributed in a population through genetic recombination

and assume random assortment. Because closely located loci are less likely

to be recombined, they redistribute slowly and do not assort randomly in a

population. This non-random association of alleles at linked loci is termed

linkage disequilibrium or allelic association [199].

Genetic distance is a measure of the rate of recombination, not a physical

distance. An approximation of 1 cM Mb−1 is often used as a conversion from

cM to Mb [199], however the true rate of recombination varies across the

genome. The rate of recombination varies: between chromosomes, at different

regions across each chromosome, between individuals and between males

and females [200, 201]. The average male to female ratio of recombination

rate is 1.65. When examined at a Mb scale resolution, recombination rate

generally varies from 0-2 cM Mb−1, with localised peaks of high recombination

of 3-7 cM Mb−1 [200, 201]. Fine scale mapping at a kilobase (kb) resolution

has identified recombination hot spots approximately every 200 kb, generally

located outside genes, with higher densities of such hot-spots in regions of

higher recombination [202]. Recently, motifs and sequence contexts have been

3 genetic mapping 41

shown to play a role in recombination site selection [203]. In particular, the

PRDM9 reignition sequence is a major determinant of recombination site

selection [204].

3.1.3 Genetic linkage analysis

Genetic linkage analysis is a method for mapping a gene or genetic variant to

a specific chromosomal locus by identifying linked genetic markers. In human

genetics, linkage analysis can be an important first step in the identification

of disease genes. Recombination events within a pedigree are used to define

the locus harbouring the disease allele. Fine mapping is the process of refining

the location of these recombination events by analysing markers at greater

resolution around recombination sites.

3.1.3.1 DNA markers

DNA markers are characterised genetic variations which are polymorphic

within a study population. Markers are deemed informative in an individual

when allele phase can be identified, that is, both alleles can be assigned

a parental origin. Highly polymorphic markers are those with the greatest

heterozygosity, and therefore, are more likely to be informative within a

pedigree. Maximum heterozygosity is achieved when individuals within a

pedigree carry different alleles for a marker. This allows marker phase to

be assigned, and recombinant and non-recombinant meiosis to be identified,

giving the maximum possible linkage power for a pedigree. While several

types of polymorphic markers exist, short tandem repeat (microsatellite) and

single nucleotide polymorphism (SNP) markers are the most commonly used

in current genetic linkage analysis techniques.

3 genetic mapping 42

Microsatellites are short repeat sequences of 1-6 base pairs of DNA, with

the length of the microsatellite used to identify alleles. Di- and tri-nucleotide

sequences repeating 20-30 times are traditionally used as markers, as they are

highly polymorphic and suitable for PCR based amplification and sizing. Mark-

ers with 10 or more alleles are not uncommon, therefore microsatellite markers

are generally highly informative and allow correct phase to be assigned.

SNPs are single point mutations, insertions or deletions, common at a popu-

lation level, typically of 1% or greater. The informativeness of SNP markers is

lower compared with microsatellites, limited by the possible number of alleles

and the population incidence of the SNP. However, the ability to assay large

numbers cheaply and quickly with array based techniques and the higher

density across the genome, means SNP based genotyping can be suitably

informative when analysed using multi-point linkage techniques. However,

such analysis is generally limited to small pedigrees or genomic loci due to

software and computational limitations.

3.1.3.2 Two point linkage analysis

The odds of linkage between two loci is quantified as the ratio of the likelihood

of the two alternate assumptions; that the loci are linked (0 < θ < 0.5) or not

linked (θ = 0.5) [205]. In positional cloning, one locus is the marker being

tested and the second is the disease locus. The logarithm of the odds (LOD)

score (Z) is used to evaluate linkage in pedigrees [206]. The LOD score at a

selected value for θ is determined by the equation:

Z = log10

[θR(1− θ)NR

(0.5)R(0.5)NR

]

Where R = number of recombinants and NR = number of non recombinants.

Z is calculated at a range of values for θ from 0 to 0.5, with best estimate

of the distance between the two loci given by the value of θ where Z reaches

3 genetic mapping 43

its maximum positive value. This allows the localisation of a disease locus

within a genetic map. Calculations of LOD scores used in this thesis were

implemented in computer software; SuperLink, FastSLink and Genehunter

(Section 3.2.4).

To demonstrate significant linkage, the LOD score must be powerful enough

to reject the null hypothesis of no-linkage. By convention, a LOD score of +3

gives sufficient power, with 1000:1 odds in favour of genome wide linkage.

Conversely, a LOD score of -2 is considered sufficient to exclude the marker lo-

cus from linkage. LOD scores between these limits are uninformative, however

LOD scores > 1 may be considered suggestive of linkage and warrant further

investigation because the rates of false positive linkage with such scores are

low [205].

The strength of Z is influenced by the number of informative meioses within

a pedigree and the marker allele frequencies. To achieve a LOD score of +3

generally requires at least 10 informative meioses. As such, smaller pedigrees

cannot independently achieve significant LOD scores. In situations where the

disease in multiple pedigrees is thought to be due to mutations in a common

gene, LOD scores can be added to achieve a greater power to identify a disease

locus. LOD scores of -2 can be achieved with small pedigrees when alleles

do not segregate among two or more affected individuals. This is useful for

excluding linkage to known disease loci. Combining linkage data between

pedigrees in heterogeneous diseases such as dHMN, is usually ineffective

because each gene is only responsible for a small proportion of cases.

3.1.3.3 Multi-point Linkage analysis

Multi-point linkage analysis combines data from multiple adjacent markers in

a single analysis and as such, is useful for markers with limited informative-

ness. Multi-point linkage analysis is necessary for SNP based linkage analysis

as individually these markers are insufficiently informative. For highly infor-

3 genetic mapping 44

mative microsatellite markers this is less of a concern. A plot of the multi-point

LOD score across genetic distance is useful for visualising linkage or exclusion

across a genetic locus.

3.1.3.4 Haplotype analysis

A haplotype is a series of linked alleles at a chromosomal locus which can

be treated as a single unit of transmission. To identify a haplotype requires

informative markers, from which the phase of the alleles in a chromosomal

segment is determined. The segregation of a haplotype within a pedigree

is useful for visualising recombination events in individuals. The minimum

haplotype segregating with affected individuals in a pedigree defines the

disease interval. Fine mapping involves genotyping additional markers around

recombination sites to minimise the size of the disease interval.

3.1.4 Penetrance in dHMN

The penetrance of a disease allele is the proportion of individuals carrying the

mutation who exhibit clinical symptoms. Distal HMN is generally a highly

penetrant disorder, however there is some variability between different types

of dHMN. Mutations causing dHMN2A [3], dHMN2B [2] and ALS4 [96],

have complete penetrance. Similarly, dHMN5 and CMT2D caused by GARS

mutations showed complete penetrance in the first identified families [126],

however a subsequent dHMN5 family with a GARS mutation showed reduced

penetrance [129]. Distal HMN5 caused by BSCL2 mutations also shows reduced

penetrance in addition to clinical variability [142]. Until the mutation causing

dHMN mapped to 7q34-q36 is identified, the penetrance of dHMN1 in family

CMT54 cannot be confirmed. The possibility of reduced penetrance in dHMN

limits the use of recombination events in unaffected individuals for refining

3 genetic mapping 45

the disease locus, especially where the recombinant chromosome has not been

inherited further by unaffected offspring. Nevertheless, recombinant unaffected

individuals, not at risk of becoming affected, help prioritise candidate gene

selection in mutation screening analysis.

3.1.5 Characterisation of dHMN1 in a large Australian family

Genetic linkage analysis was carried out in a large Australian dHMN family

(CMT54) using microsatellite markers [103]. The disease in this family was

classified as dHMN1, showing autosomal dominant inheritance with 10 af-

fected individuals (Figure 3.1). The disease in this family is slowly progressive,

with an early but variable age of onset (range 3-40 years, median 10 years),

presenting with difficulties walking and running due to lower limb weakness.

Neurological examination identified predominant foot deformities including

pes cavus in all patients, and hammer toes with callus formation in all but

one patent. Increased muscle tone was observed in six patients. Power was

reduced in ankle extensors and intrinsic foot muscles in all but one patient.

Plantar responses were extensor in five patients. Reduced hand-grip in two

patients was the only upper limb weakness demonstrated. The only sensory

abnormalities identified were reduced vibration in the feet of four patients.

Sural nerve biopsy in one patient at age 43 years was consistent with chronic

axonal neuropathy.

3.1.6 Previous genetic studies in dHMN1 at 7q34-q36

A genome-wide two-point linkage analysis in family CMT54 showed signifi-

cant evidence for linkage at D7S636 on 7q36.1 with a LOD score of 3.22. An

Figu

re3.

1:P

revi

ousl

ypu

blis

hed

ped

igre

eof

CM

T54

and

hapl

otyp

ean

alys

isat

7q34

-q36

[103

].Fe

mal

esan

dm

ales

are

repr

esen

ted

byci

rcle

san

dsq

uare

sre

spec

tive

ly.B

lack

ened

sym

bols

indi

cate

affe

cted

indi

vidu

als.

Mar

ker

alle

les

are

liste

dbe

low

indi

vidu

als,

orde

red

from

cent

rom

ere

tote

lom

ere.

Hap

loty

pes

are

ind

icat

edby

bars

,wit

hth

eha

plo

typ

ese

greg

atin

gw

ith

the

dis

ease

blac

kene

d.T

hena

rrow

bar

inin

div

idu

alII

:7in

dica

tes

unin

form

ativ

em

arke

rs.T

here

com

bina

tion

even

tsin

indi

vidu

als

II:1

0an

dII

I:13

defin

eth

edi

seas

ege

neca

ndid

ate

inte

rval

flank

edby

D7S

2513

and

D7S

637.

Rec

ombi

natio

nev

ents

inun

affe

cted

indi

vidu

als

II:7

and

III:3

sugg

esta

min

imum

prob

able

cand

idat

ein

terv

al,fl

anke

dby

D7S

2511

and

D7S

2546

.Ind

ivid

uals

id’s

have

been

upda

ted

tom

atch

thos

ein

Figu

re3.

5.

3 genetic mapping 46

3 genetic mapping 47

Table 3.1: Previous two-point LOD scores between chromosome 7q34-q36 markersand the dHMN1 disease locus in CMT54 [103].

Marker LOD score at recombination fraction0.00 0.01 0.05 0.10 0.20 0.30 0.40

D7S1518 -5.09 0.86 1.36 1.41 1.20 0.70 0.25D7S2513 -0.05 0.04 0.22 0.30 0.30 0.20 0.06D7S2473 2.99 2.94 2.75 2.51 2.00 1.30 0.62D7S661 3.73 3.68 3.43 3.11 2.40 1.70 0.78D7S498 0.58 0.59 0.60 0.57 0.44 0.25 0.08

D7S2511 2.59 2.57 2.47 2.30 1.84 1.26 0.58D7S688 1.67 1.61 1.40 1.12 0.53 -0.05 -0.33

D7S2426 2.69 2.64 2.46 2.21 1.69 1.11 0.49D7S505 3.29 3.23 3.00 2.70 2.03 1.29 0.48D7S636 3.22 3.16 2.95 2.68 2.10 1.40 0.67

D7S2439 1.72 1.70 1.59 1.46 1.16 0.83 0.45D7S3070 3.59 3.53 3.30 2.99 2.32 1.57 0.74D7S483 3.29 3.24 3.03 2.76 2.15 1.46 0.69D7S798 1.76 1.74 1.63 1.49 1.20 0.90 0.46

D7S2462 2.17 2.14 2.00 1.82 1.45 1.02 0.55D7S2546 2.48 2.49 2.45 2.32 1.90 1.30 0.65D7S637 -2.23 0.71 1.31 1.45 1.30 1.00 0.47

3 genetic mapping 48

additional 16 markers flanking D7S636 were analysed in family CMT54, with

four additional markers providing significant evidence of linkage (Table 3.1).

Multi-point linkage analysis using the linked markers achieved a maximum

multi-point LOD score of 3.74 at D7S661. Recombinant haplotype analysis of

affected individuals defined the disease locus as the 21.78 cM interval flanked

by the markers D7S2513 and D7S637. Recombinant haplotype analysis includ-

ing unaffected individuals > 25 years of age suggests a minimum probable

candidate interval spanning 16.7 cM flanked by the markers D7S2511 and

D7S2546. Physical mapping of the flanking markers localised the disease inter-

val to chromosome 7q34-q36 (Figure 3.2). Transcript mapping and mutation

analysis of the 7q34-q36 dHMN1 locus is described in Chapter 4.

Two other dHMN genes and one spinocerebellar ataxia (SCA) gene have

also been mapped to chromosome 7 (Figure 3.2). The gene for dHMN2B,

HSPB1 is located proximal to the dHMN1 locus at 7q11.23 and the gene for

dHMN5, GARS, is located at 7p15.1. The gene for SCA18 (MIM ID %607458), an

autosomal dominant sensorimotor neuropathy with ataxia, has been localised

to 7q22-q32 [207]. A mutation in the interferon-related developmental regulator

1 (IFRD1) gene was identified as the probable pathogenic mutation for SCA18,

however it is yet to be replicated in additional families [208]. Mutations in

these genes and other known dHMN genes have been previously excluded in

CMT54 [209].

3.1.7 Hypothesis and aims

It is hypothesised that dHMN seen in family CMT54 is caused by a gene

mutation within the linked region on chromosome 7q34-q36. The aims of this

chapter are to:

3 genetic mapping 49

Figure 3.2: (Left) Ideogram of human chromosome 7, indicating the known dHMN andother neurodegenerative disease genes and loci mapped to chromosome 7. Horizontalbars indicate the location of disease genes and vertical bars indicate the extent ofeach disease locus. (Right) Physical map of the dHMN1 chromosome 7q34-q36 locus.Original flanking markers are shown in bold. The original 7 Mb minimum probablecandidate interval is flanked by the markers D7S2511 and D7S2546.

3 genetic mapping 50

• refine the minimum probable candidate interval in CMT54 through fine

mapping.

• identify additional dHMN families linked to 7q34-q36 to further refine

the disease interval and confirm the disease locus.

3.2 materials and methods

3.2.1 Microsatellite analysis

Microsatellite markers were PCR amplified from genomic DNA incorporating

a 6-carboxyfluorescein (FAM) label that was present on each forward primer

(Section 2.3.3). Primer details are included in Appendix B. Size separation of

Microsatellite marker alleles was performed using the microsatellite analysis

service from the Australian Cancer Research Foundation (ACRF) at the Garvan

Institute of Medical Research (Darlinghurst, NSW Australia). In brief, FAM

labelled PCR products were analysed using an ABI PRISM® 3100 Genetic

Analyzer (Applied Biosystems Ltd) using a GeneScan™ 600 LIZ size standard

(Applied Biosystems Ltd).

Scoring of microsatellite alleles was performed using the software, Gene-

Marker version 1.51 (SoftGenetics LLC). Allele sets were scored by size to allow

comparison of alleles between families. The software Cyrillic, version 2.1.3

(Cherwell Scientific Publishing Ltd) was used to input and manage pedigrees

and genotype data. Pedigree and marker data were exported from Cyrillic for

linkage analysis using the MLINK format.

3 genetic mapping 51

3.2.2 Microsatellite markers

The microsatellite markers used in fine mapping of the dHMN1 chromosome

7q34-q36 disease locus in family CMT54 were selected from the Marshfield

genetic map through the STS Markers track of the UCSC genome browser

(http://genome.ucsc.edu/)[210], based on their physical location between

candidate genes. When no suitable markers were available, additional new

microsatellite markers were designed by selecting repeat sequences from the

UCSC genome browser microsatellite track. The microsatellite track displays di-

nucleotide and tri-nucleotide repeats, with 100% identity, no indels of at least

15-mer, identified with the software Tandem Repeats Finder [211]. Flanking

PCR primers were designed using the software, LightScanner® Primer Design

version 1.0.R.84 (Idaho Technology Inc.) using DNA sequences from the hg18

(2006) assembly, downloaded using the UCSC genome browser. The Marshfield

STRP map used in genetic linkage analysis was modified to include these

additional markers. The physical position of the microsatellites in the hg18

human assembly was used to correctly order markers in the genetic map. No

genetic distances were available for these new markers, therefore the genetic

distances were approximated by averaging the genetic position of the flanking

markers in the sex-averaged Marshfield STRP map.

3.2.3 Haplotype analysis

Haplotypes were generated from genotype data using parental genotypes to

assign chromosomal phase of alleles where possible. Otherwise, allele phase

was inferred from the haplotypes of offspring and siblings. Recombinant hap-

lotypes were assigned by minimising the number of individuals with the new

recombinant haplotype. Pedigrees were analysed for co-inheritance of haplo-

3 genetic mapping 52

types with disease state. Pedigrees with non-segregation of haplotypes among

affected individuals and pedigrees in which an unaffected spouse carried the

disease associated haplotype were excluded. Families where unaffected at-risk

individuals carried the disease haplotype were not excluded due to possible

non-penetrance in dHMN.

3.2.4 Genetic linkage analysis

The easyLINKAGEplus version 5.02 software package [212] was used for

genetic linkage analysis, including; simulation of two-point LOD scores (Fast-

SLink), two-point linkage analysis (SuperLink), and multi-point linkage analy-

sis (GeneHunter). A modified version of the sex-averaged Marshfield STRP

genetic map including new microsatellite markers, was used in all linkage

analysis (Section 3.2.2). A script pedmake.sh (Appendix A), was written for

GNU Bourne-Again SHell (BAsH), version 4.1.5 (Free Software Foundation,

Inc) to convert the MLINK format files generated by Cyrillic into the pedigree

and marker files required for EasyLINKAGEplus.

3.2.4.1 FastSLink

The software FastSLink, version 2.51 [213, 214] was used for simulation of

maximum LOD scores for dHMN pedigrees. FastSLink parameters were;

dominant inheritance model, using microsatellite markers with six marker

alleles, disease allele frequency of 0.0001, penetrance of wt/mt and mt/mt

genotypes of 0.95, and 1000 replicates.

3.2.4.2 SuperLink

The software SuperLink, version 1.5 was used for two-point parametric linkage

analysis. SuperLink parameters were; single locus analysis type, autosomal-

3 genetic mapping 53

dominant trait with a penetrance vector of 0.95 for wt/mt and mt/mt alleles,

and disease allele frequency of 0.0001.

3.2.4.3 GeneHunter

The software GeneHunter, version 2.1.R5 was used for multi-point linkage

analysis. GeneHunter parameters were; autosomal-dominant trait with a pen-

etrance vector of 0.95 for wt/mt and mt/mt genotypes, no liability classes

applied, and 100 markers per set.

3.2.5 Association analysis

Common haplotypes of three or more alleles were identified by observation.

Each haplotype was treated as a single unit and tested for association with

the disease. The Fisher’s exact test of independence was used to test for

association of haplotypes with dHMN. A 2x2 contingency table was used, with

the first row the dHMN group haplotypes, and the second row the healthy

control group haplotypes. Column 1 counted chromosomes with a common

haplotype and column 2 counted chromosomes with a different haplotype. The

Fishers exact test calculation was carried out using the software, QuickCalcs -

online calculators for scientists (GraphPad Software, Inc, www.graphpad.com/

quickcalcs/index.cfm). Control haplotypes with incomplete matches due to

missing allele information were counted as matching to achieve the highest

stringency. The null hypothesis tested is: The relative proportion of each

haplotype is independent of the disease.

3 genetic mapping 54

3.3 results

3.3.1 Fine mapping of family CMT54

Using fine mapping there was potential to refine the minimum probable candi-

date interval defined by genetic recombination events in unaffected individuals

in pedigree CMT54. Markers were selected to saturate the flanking regions

of the minimum probable candidate interval, with one marker between each

gene located in these areas. This density of markers achieved the maximum

resolution for excluding genes from the minimum probable candidate interval.

3.3.1.1 Minimum probable candidate interval proximal flanking marker

Under the original analysis, haplotypes show a genetic recombination in in-

dividual CMT54-III:3, which defined the proximal flanking marker for the

minimum probable candidate interval as D7S2511.The physical distance be-

tween D7S2511 and the next marker, D7S688 is 2.00 Mb and contains half of the

gene CNTNAP2, the genes CUL1 and EZH2, and the uncharacterised transcript

C7orf33. The additional microsatellite markers D7S615, D7S2442 and D7S2419

were selected in the region between the previous markers D7S2511 and D7S688

(Figure 3.3). All three additional markers were informative in individual III:3.

Haplotype analysis localised the recombination event in individual III:3 to the

1.06 Mb interval between the markers D7S615 and D7S2442. Therefore, the

new proximal flanking marker for the minimum probable candidate interval is

marker D7S615. This excluded a further 0.74 Mb of CNTNAP2, a particularly

large gene occupying just under 1/5 of the critical disease interval.

3.3.1.2 Minimum probable candidate interval distal flanking marker

The genetic recombination in individual CMT54-II:7 (Figure 3.1), defined

the proximal flanking marker for the minimum probable candidate interval

3 genetic mapping 55

Figure 3.3: Detail of fine mapping of the proximal end of the minimum probablecandidate interval defined in pedigree CMT54. The recombination boundary liesbetween D7S615 and D7S2442. The new proximal flanking marker is D7S615. Thisexcludes a further 0.74 Mb of the gene CNTNAP2 from the minimum probablecandidate interval. Markers indicated with an asterisk are those used in the previoushaplotyping. Uninformative markers are underlined. The black bar represents thehaplotype inherited with the disease and the white bar indicates the alternate, non-disease haplotype. Allele scores are shown as numbers in the haplotype bars. Genesin the intervals are shown as bars, with exons shown as wide sections on the barsand introns the narrow sections, the direction of the arrow indicates gene orientation.Haplotype bars are expanded on the right.

3 genetic mapping 56

as D7S2546. Two critical markers, D7S2462 and D7S798 are uninformative in

individual II:7. The distance between D7S2546 and the next informative marker,

D7S483 is 2.01 Mb and contains the genes XRCC2, ACTR3B and the start of

DPP6. The additional marker D7S1815 was selected in this region but was also

uninformative in individual II:7.

Four new markers AD0, AD1, AD2 and AD3, were designed using repeat se-

quences identified from the microsatellites track of the UCSC genome Browser

(Table 3.2). Microsatellites AD0, AD1 and AD3 were genotyped and shown to

be polymorphic in CMT54, with 5, 7 and 4 alleles identified respectively. Mi-

crosatelite AD2 showed significant PCR banding stutter, making similar sized

alleles indistinguishable. Therefore AD2 was unsuitable as a microsatellite

marker.

Markers AD0, AD1 and AD3 were informative in individual II:7. The recom-

bination event in individual II:7 was localised to the 1.05 Mb interval between

the markers AD3 and D7S2546 (Figure 3.4). Therefore the proximal flanking

marker for the minimum probable candidate interval remains D7S2546 and no

further genes were excluded from the minimum probable candidate interval.

3.3.1.3 Updated pedigree for CMT54

The pedigree of family CMT54 was updated to include two unaffected children

of individual II:7; individuals III:7 and III:8 (Figure 3.5). Both individuals are

Table 3.2: New STR markers for fine mapping of the minimum probablecandidate interval in pedigree CMT54. The number of alleles indicated includesalleles observed in all dHMN families examined.

Marker Repeat† Physical position‡ Genomic band # alleles

AD0 (GT)23 chr7:151,940,317-151,940,362 7q36.1 5AD1 (TA)28 chr7:152,255,637-152,255,692 7q36.2 7AD2 (TA)30 chr7:152,497,434-152,497,494 7q36.2 -AD3 (CA)23 chr7:152,787,257-152,787,302 7q36.2 4† (nucleotides)number of repeats‡ Human Mar. 2006 (NCBI36/hg18) assembly.

3 genetic mapping 57

Figure 3.4: Detail of fine mapping of the distal end of minimum probable candidate in-terval in family CMT54. The recombination boundary lies between AD3 and D7S2546.Markers indicated with an asterisk are those used in the previous haplotyping. Unin-formative markers are underlined. The black bar represents the haplotype inheritedwith the disease and the white bar indicates the alternate, non-disease haplotype.Allele scores are shown as numbers in the haplotype bars. Genes in the intervals areshown as bars, with exons shown as wide sections on the bars and introns the narrowsections, the direction of the arrow indicates gene orientation. Haplotype bars areexpanded on the right.

3 genetic mapping 58

Table 3.3: Pedigree CMT54 two-point LOD scores between the dHMN locus andadditional microsatellite markers on chromosome 7q34-q36. LOD scores for D7s2442,D7s2419 and AD0 reach significance. Markers AD1 and AD3 were not genotypedthrough the entire pedigree, limiting their LOD scores.

Marker LOD score at recombination fraction0.00 0.05 0.10 0.20 0.30 0.40

D7S615 2.0383 1.8863 1.7262 1.3782 0.9848 0.5323D7S2442 3.1630 2.9259 2.6765 2.1347 1.5235 0.8223D7S2419 3.7168 3.3267 2.9174 2.0417 1.1145 0.2948

AD0 4.9420 4.5508 4.1388 3.2432 2.2324 1.0944D7S1815 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

AD1 0.5961 0.5316 0.4664 0.3360 0.2105 0.0967AD3 0.8971 0.8087 0.7160 0.5191 0.3150 0.1285

recombinant, with neither inheriting the disease haplotype from individual II:7.

Therefore, these two individuals do not reduce the possibility of incomplete

penetrance in individual II:7.

3.3.2 Linkage analysis using new markers in CMT54

3.3.2.1 Two-point linkage analysis

Two-point LOD scores were calculated for the new markers in pedigree CMT54

(Table 3.3). Significant linkage was achieved for the markers D7S2442, D7S2419

and AD0 with LOD scores of 3.16, 3.71 and 4.94 at θ=0.00 respectively. The

LOD score for the marker D7S615 did not reach a significant LOD score due to

limited heterozygosity. The marker D7S1815 was uninformative in all meiosis

in family CMT54, with only one allele observed. The markers AD1 and AD3

were only genotyped in a subset of individuals in family CMT54 limiting their

LOD scores which are not representative of the power of these markers.

Figu

re3.

5:U

pdat

edpe

dig

ree

offa

mily

CM

T54

,sho

win

gha

plot

ypes

for

sele

cted

mar

kers

onch

rom

osom

e7q

34-q

36.F

emal

esan

dm

ales

are

rep

rese

nted

byci

rcle

san

dsq

uare

sre

spec

tive

ly.B

lack

ened

sym

bols

ind

icat

eaf

fect

edin

div

idu

als.

Mar

ker

alle

les

are

liste

dbe

low

ind

ivid

ual

s,or

der

edfr

omce

ntro

mer

eto

telo

mer

e.H

aplo

typ

esar

ein

dic

ated

byba

rs,

wit

hth

eha

plo

typ

ese

greg

atin

gw

ith

the

dis

ease

blac

kene

d.T

here

com

bina

tion

even

tsin

indi

vidu

als

II:1

0an

dII

I:13

defin

eth

edi

seas

ege

neca

ndid

ate

inte

rval

flank

edby

D7S

2513

and

D7S

637.

Rec

ombi

natio

nev

ents

inun

affe

cted

indi

vidu

als

II:7

and

III:3

sugg

est

am

inim

umpr

obab

leca

ndid

ate

inte

rval

,flan

ked

byD

7S61

5an

dD

7S25

46.

3 genetic mapping 59

3 genetic mapping 60

3.3.2.2 Multi-point linkage analysis in CMT54

Multi-point linkage analysis in CMT54 achieved a maximum LOD score of

3.5 in the interval between D7S2442 and D7S2462 corresponding with the

minimum probable candidate interval (Figure 3.6).

3.3.2.3 The minimum probable candidate interval in CMT54

Fine mapping in family CMT54 using haplotype and linkage analysis has

refined the minimum probable candidate interval by 0.74 Mb, to the 6.92

Mb interval flanked by D7S615 and D7S2546. This excluded region is located

within the CNTNAP2 gene which encompasses almost 1.5% of chromosome

7. CNTNAP2 spans the recombination in unaffected individual II:7, which

defines the proximal end of the minimum probable candidate interval.

3.3.3 Identification of additional dHMN families

Further refinement of the 7q34-q36 interval may also be achieved by identifying

additional dHMN families linked to 7q34-q36 and the analysis of recombinant

individuals within these families. An analysis of the 7q34-q36 disease locus

was carried out in a cohort of additional dHMN families to help refine the

locus. Families from the Molecular Medicine (Concord Hospital) peripheral

neuropathies database, indexed as dHMN, distal SMA or CMT2 with mild or

no sensory involvement were evaluated. Families were selected if the famil-

ial disease was inherited in an autosomal-dominant fashion, with unknown

genetic cause, and symptoms were similar to those shown by CMT54.

Fifty-eight additional dHMN families were identified from the database

with clinical features consistent with dHMN or CMT2 with no or minor

sensory involvement. Twenty of these families had sufficient DNA samples

available to enable haplotypes to be generated and were selected for genetic

3 genetic mapping 61

Multi-po

intL

OD

score

D7S684

D7S2202

D7S1518

D7S2513

D7S2473

D7S661

D7S2511

D7S615

D7S2442

D7S2426

D7S505

D7S636

D7S2439

D7S3070

D7S483

AD0

D7S1815

AD1

D7S798

AD3

D7S2462

D7S2546

D7S637

D7S2447

D7S1823

-4.0

-3.0

-2.0

-1.0

0.0

1.0

2.0

3.0

4.0

D7S2419

D7S688

D7S498

0 5 10 15 20 25

Genetic distance (cM)Figure 3.6: Multi-point LOD score plot for family CMT54, including the new markersfor fine mapping the minimum probable candidate interval.

3 genetic mapping 62

linkage analysis (Table 3.4). The remaining families had only one affected

DNA sample available or were otherwise unsuitable for linkage and haplotype

analysis and were indexed for future screening of any new dHMN gene.

Selected families were previously screened for mutations in known autosomal-

dominant dHMN genes; HSPB1, HSPB8, BSCL2, DCTN1, TRPV4, and SETX

(O.Albulym, personal communication). Families indicated as possible dHMN5

were screened for mutations in GARS, and YARS (A.Antonellis, personal

communication).

3.3.4 Linkage power analysis

The power of each family to detect linkage was determined using FastSLink as

part of the easy linkage software package. The theoretical LOD scores obtained

from simulated linkage analysis were comparable to the size of the families

(Table 3.5). No pedigrees reached a maximum simulated LOD score of greater

than 3.0, required for significant linkage. Pedigree CMT44 achieved the highest

simulated maximum LOD score of 2.02, sufficient for suggestive linkage. Pedi-

grees CMT260, CMT701, CMT375 and CMT618 achieved simulated maximum

LOD scores of > 1.5. The remaining pedigrees are small nuclear families. While

not powerful enough to achieve significant linkage on their own, these pedi-

grees may be informative if ancestral links between families can be identified.

Simulated negative LOD scores give an estimation of the exclusion power

of a pedigree. Two families CMT331 and CMT779 achieved simulated LOD

scores that reach significance for exclusion at θ = 0.00 (Table 3.5). At θ = 0.05,

CMT331 and CMT779 achieved simulated negative LOD scores of -1.05 and

-2.00 respectively. As dHMN is genetically heterogeneous, it is unlikely that

all these families share mutations in a common gene. Therefore, the combined

3 genetic mapping 63

Table 3.4: Sample and clinical information of the 20 additional dHMN families selectedfor genetic linkage analysis.

Family # Affectedsamples1

# Unaffectedsamples2,3

Condition Complicating features4

CMT44 5 7 dHMN2CMT102 3 1 dHMNCMT131 3 1 dMHNCMT260 4 (1) 3 dHMN or

CMTVCMT273 3 0 dHMNCMT274 3 1 dHMN or

CMTVCMT331 4 0 dHMN variable minor/mild

symptomsCMT333 3 1 dHMNCMT375 4 4 dHMNCMT477 2 3 dHMNCMT484 3 1 dHMN or

CMTVCMT609 4 4 (1) dHMN or

CMTVminor/mild sensory and

pyramidal signsCMT618 4 3 dHMN or

CMTVNS

CMT677 2 4 dHMN vocal cord paralysisCMT701 6 1 dHMN pyramidal signs + GOR

coughCMT703 3 0 dHMN5CMT724 4 4 CMT2D minor/mild sensoryCMT748 3 3 (1) dHMN pyramidal signsCMT779 3 (1) dHMNCMT808 3 3 dHMN pyramidal signs1 Carriers with affected children are indicated in ( ) in the affected samples column.2 Individuals with unknown clinical status are indicated in ( ) in the normal samples column.3 Number of normal samples includes married in parents.4 NS indicates normal sensory function in CMT families, GOR cough indicates gastro-

oesophageal reflux and chronic cough.

3 genetic mapping 64

Table 3.5: Simulated two-point LOD scores for additional dHMN families at θ = 0.00

Pedigree Average StdDev Min MaxCMT44 1.413277 0.672809 -1.880637 2.022451

CMT260 1.227786 0.616653 -1.079144 1.861581CMT701 1.108423 0.453187 -0.339436 1.747461CMT375 1.169244 0.619372 -1.793023 1.721420CMT618 1.351277 0.417707 0.079144 1.630130CMT724 0.917044 0.602872 -1.160476 1.441581CMT609 0.827045 0.341491 -0.719797 1.183100CMT331 0.617530 0.407886 -2.082981 1.181784CMT477 0.214345 0.384159 -1.027830 0.713644CMT779 0.376384 0.305761 -4.951016 0.602058CMT748 0.408205 0.325215 -1.021184 0.580868CMT808 0.412215 0.322585 -1.021187 0.580868CMT274 0.378580 0.328142 -1.021107 0.580862CMT677 0.192213 0.303921 -0.866278 0.442662CMT484 0.241424 0.119957 -0.000000 0.301028CMT703 0.197789 0.176466 -0.602437 0.301028CMT273 0.180616 0.147473 -0.000261 0.301028CMT131 0.183623 0.146825 -0.000261 0.301023CMT333 0.193699 0.076322 -0.124957 0.234141CMT102 0.223685 0.022229 0.176054 0.234128

3 genetic mapping 65

linkage power of this set of families was not calculated and families were

instead analysed for linkage/exclusion independently.

3.3.5 Genetic linkage analysis in additional dHMN families

Tests for a common founding mutation(s) were carried out in the 20 additional

dHMN families large enough for haplotype analysis. Family members were

genotyped for microsatellite markers at 7q34-q36. Haplotype analysis, two-

and multi-point linkage analysis was carried out on each family separately.

Nine families showed a haplotype segregating with affected individuals only.

Of these nine families, one achieved LOD scores suggestive of linkage to

7q34-q36 and eight families were uninformative. Ten families were excluded

from linkage to 7q34-q36 and in one family the marker phase could not

be determined. These results are detailed in Section 3.3.6, Section 3.3.7 and

Section 3.3.8.

3.3.6 Suggestive linkage in family CMT44

Family CMT44 is a multi-generational dHMN family with samples available

from five affected individuals across three generations (Figure 3.7). Genotypes

for markers across the 7q34-q36 interval were analysed in CMT44 using hap-

lotype analysis, two-point and multi-point linkage analysis. CMT44 has a

haplotype at 7q34-q36 which segregates with the five affected individuals and

one recombinant unaffected individual who carries the proximal end of the

disease haplotype. This haplotype is not present in three additional unaffected

individuals. One affected individual (III:7) shows a recombination event with

an alternate haplotype for markers at the distal end of the disease interval.

This recombination defines D7S637 as the distal flanking marker for CMT44.

3 genetic mapping 66

The proximal end of the disease interval lies beyond the most proximal marker

analysed in CMT44 (D7S1518), this is beyond the proximal limits of the dis-

ease interval defined by CMT54. However, the recombination event in the

unaffected individual III:6 suggests the probable proximal flanking marker

is D7S498. These two recombination events do not further refine the disease

interval beyond family CMT54; the marker D7S637 is the flanking marker

previously defined in family CMT54, and D7S498 lies beyond the minimum

probable candidate interval.

Family CMT44 achieved a two-point LOD score of 2.02 at D7S2511, D7S688,

D73070 and D7S483 (Table 3.6). The distal end of the disease interval is ex-

cluded with significantly negative LOD scores at D7S637 and D7S2447 due

to the recombination event in affected individual II:7. Multi-point linkage

analysis achieved a maximum LOD score of 2.02 across the interval between

D7S2511 and D7S2462 (Figure 3.8). This reaches the maximum possible power

for this pedigree at this locus and indicates the lower two-point LOD scores

within this interval are due to lower marker informativeness. These results

support suggestive linkage to chromosome 7q34-q36 in pedigree CMT44, with

the results reaching the limit of power in this pedigree.

3.3.7 Uninformative families

Analysis of genotypes for markers across the 7q34-q36 interval using hap-

lotype analysis, two-point and multi-point linkage analysis, indicated fami-

lies CMT609, CMT274, CMT677, CMT484, CMT703, CMT131, CMT333, and

CMT273 were uninformative at this locus. These families were not excluded

from the 7q34-q36 locus. The pedigrees, two-point LOD scores and multi-point

LOD plots for these families are included in Appendix C, Appendix D and

Appendix E respectively.

3 genetic mapping 67

Figure 3.7: Haplotype analysis of markers from the dHMN locus at 7q34-q36 in familyCMT44. Females and males are represented by circles and squares respectively. Blackenedsymbols indicate affected individuals. Marker alleles are listed below individuals,ordered from centromere to telomere. Haplotypes are indicated by bars, with the hap-lotype segregating with the disease blackened. A recombination in affected individualIII:7 defines D7S637 as the distal flanking marker in CMT44. The recombination inunaffected individual III:6 suggests the proximal flanking marker is D7S498.

3 genetic mapping 68

Table 3.6: Pedigree CMT44 two-point LOD scores between the dHMN locus andmicrosatellite markers on chromosome 7q34-q36. The maximum LOD score achievedis 2.0225 at the markers D7s2511, D7s688, D7s3070 and D7s483. These scores reach themaximum simulated LOD score of this pedigree.

Marker lod score at recombination fraction0.00 0.05 0.10 0.20 0.30 0.40

D7S1518 1.1405 1.0549 0.9647 0.7691 0.5486 0.2959D7S661 1.1405 0.9885 0.8304 0.5070 0.2184 0.0439D7S498 0.7214 0.8596 0.8744 0.7628 0.5583 0.3002

D7S2511 2.0225 1.8488 1.6661 1.2708 0.8358 0.3858D7S615 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

D7S2442 1.1406 1.0549 0.9647 0.7691 0.5486 0.2959D7S2419 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170D7S688 2.0225 1.8488 1.6661 1.2708 0.8358 0.3858

D7S2439 1.1405 1.0549 0.9647 0.7691 0.5486 0.2959D7S3070 2.0225 1.8488 1.6661 1.2708 0.8358 0.3858D7S483 2.0225 1.8488 1.6661 1.2708 0.8358 0.3858

AD0AD0 0.5597 0.5174 0.4730 0.3766 0.2683 0.1445D7S1815 1.1405 1.0549 0.9647 0.7691 0.5486 0.2959AD3AD3 1.1405 1.0549 0.9647 0.7691 0.5486 0.2959D7S2462 0.8819 0.7939 0.7014 0.5017 0.2872 0.0899D7S637 -4.8289 0.5724 0.7194 0.6977 0.5316 0.2936

D7S2447 -5.4186 -0.4825 -0.2453 -0.0714 -0.0170 -0.0023

3 genetic mapping 69

Mu

lti-

po

int

LO

D s

core

Genetic distance (cM)

D7S1518

D7S661

D7S498

D7S2442

D7S2439

D7S3070 D

7S483A

D0

D7S1815

AD

3D

7S2462

D7S637

D7S2447

-5.0

-4.0

-3.0

-2.0

-1.0

0.0

1.0

2.0

3.0

D7S2511 D

7S615

D7S2419 D

7S688

5 10 15 20 250

Figure 3.8: Multi-point LOD plot for family CMT44. The LOD score between themarkers D7s2511 and D7s2462 reaches the maximum simulated two point LOD scoreof 2.02.

3 genetic mapping 70

3.3.7.1 CMT609

CMT609 is a multi-generational extended family, initially diagnosed as per-

oneal muscular atrophy with pyramidal features, with minor sensory involve-

ment [215]. The age of onset of dHMN in this family is lower than CMT54, with

individual IV:2 clinically affected when examined at age six. CMT609 shows a

haplotype segregating with four affected individuals across three generations

(Figure C.3). A genetic recombination event has occurred within the 7q34-q36

interval, however due to missing DNA this recombination could be assigned

to either individual II:2, II:4 or III:3. The recombinant haplotype has been

drawn as occurring in individual III:3 for simplicity (Figure C.3), however both

possible family founder haplotypes are considered in haplotype comparisons

(Section 3.3.9). Nevertheless, this recombination event excludes the interval

centromeric of D7S2462 with significantly negative LOD scores of < -3 at θ

= 0.00 (Table D.2 and Figure E.1). The remaining interval is uninformative

achieving a maximum two-point LOD score of 0.72 at θ = 0.00 for D7S2462. The

power of this family has potential to be increased as unaffected individual IV:3

also carries the disease haplotype. However this at-risk individual was aged

only three when clinically examined and therefore of unknown phenotype and

not included in linkage analysis. The recombination event in CMT609 would

refine the critical disease interval if this family were linked to 7q34-q36.

3.3.7.2 CMT274

CMT274 is a nuclear family including one affected parent with two affected

and one unaffected children. This pedigree has a haplotype spanning 7q34-

q36 which segregates with all three affected family members and not the

unaffected child, with no recombination events observed (Figure C.2). This

pedigree achieved a maximum two-point LOD score of 0.58 at θ = 0.00 for 5

markers across the 7q34-q36 interval (Table D.1). The multi-point LOD scores

3 genetic mapping 71

were similar, between 0.5 and 0.6 across the 7q34-q36 interval (Figure E.2).

These LOD scores reach the maximum power for this pedigree.

3.3.7.3 CMT677

CMT677 is a three-generation extended family. Generation II includes an

affected sibling pair and generation III includes three unaffected siblings and

one affected cousin. DNA samples were unavailable for generation I, the

affected parent II:2 and unaffected child III:3 (Figure C.4). The haplotype

for affected individual II:2 was partly inferred using the haplotypes of the

available family members. This pedigree has a haplotype spanning the 7q34-

q36 interval which segregates with the three affected individuals and not

unaffected individual III:2. Unaffected individual III:1 is recombinant and

carries the telomeric end of the chr.7q34-q36 interval. This recombination event

suggests the distal flanking marker is D7S3070 in CMT677. However, being in

an unaffected individual, without 100% penetrance this recombination does

not produce significantly negative LOD scores in the region affected. The

two-point and multi-point LOD scores were uninformative across the 7q34-q36

interval (LOD < 0.5; Table D.5 and Figure E.3). The multi-point LOD scores

achieved the maximum power for this pedigree in the interval not affected by

the recombination event.

3.3.7.4 CMT484

CMT484 is a nuclear family with one affected parent, two affected and one un-

affected offspring. The affected parent has a further family history of dHMN,

with another possibly affected family member not included in this study

(Figure C.1). No DNA was available for the unaffected individual III:2. The

pedigree for CMT484 shows a haplotype spanning the 7q34-q36 interval segre-

gating with the three affected individuals with no recombinations observed.

The affected siblings have alternate haplotypes from the unaffected parent

3 genetic mapping 72

confirming the correct assignment of the disease haplotype. Linkage analysis

was uninformative across 7q34-q36 with a maximum LOD score of 0.30 at θ

= 0.00 achieved at 9 markers (Table D.4). The multi-point LOD scores were

0.30 across the 7q34-q36 interval (Figure E.4). These LOD scores reach the

maximum power for this pedigree.

3.3.7.5 CMT703

CMT703 is a nuclear family with one affected parent, with two affected and

two unaffected children. The affected parent has a family history of dHMN.

DNA was not available from the two unaffected siblings III:3 and III:4 (Fig-

ure C.5). The pedigree for CMT703 has a haplotype spanning 7q34-q36 which

segregates with all three affected family members, however one affected child

has a recombination event between the markers D7S2511 and D7S2442. This

recombination event can be assigned to either III:1 or III:2, therefore there are

two possible parental haplotypes for CMT703. Haplotype analysis indicates

the distal flanking marker is D7S2442, excluding the distal end of the 7q34-q36

interval. Due to limited marker informativeness in the region affected by the

recombination event, linkage analysis only achieved a significant negative

two-point LOD score of -4.40 at θ = 0.00 for D7S3070 (Table D.6 and Figure E.5).

Two-point and multi-point LOD score were uninformative in the region not

excluded by the recombination event, reaching a maximum of 0.30 at θ = 0.00.

This reaches the maximum power for this pedigree. The genetic recombina-

tion in CMT703 would refine the disease interval if the family was expanded

sufficiently to achieve significant linkage.

3.3.7.6 CMT131

CMT131 is a three-generation family with four affected individuals (Figure C.6).

DNA was available for one affected individual in each generation. The family

founder has three unaffected siblings and no prior family history of dHMN.

3 genetic mapping 73

The pedigree for CMT131 has a haplotype shared by all three affected individ-

uals. Affected individual IV:1 shows a recombination event, carrying only the

proximal end of the disease haplotype. Haplotype analysis defined D7S3070 as

the proximal flanking marker in CMT131. Significant negative LOD scores ex-

clude the proximal end of the 7q34-q36 interval (Table D.7 and Figure E.6). The

remaining interval was uninformative in two-point and multi-point linkage

analysis, achieving a LOD score of 0.30, the maximum power for this pedigree.

The genetic recombination in CMT131 would refine the disease interval if the

family was expanded sufficiently to achieve significant linkage.

3.3.7.7 CMT333

CMT333 is a three-generation family with two affected and one unaffected sib-

lings in generation II and one affected individual in generation III (Figure C.8).

No disease was indicated in generation I, however they were not clinically

examined. DNA was unavailable for the unaffected sibling II:3 or from genera-

tion I. A haplotype at the 7q34-q36 interval segregates with all three affected

individuals, with no recombination events. Two-point and multi-point LOD

scores were uninformative across the interval, with multi-point LOD scores

reaching the maximum power for this pedigree (Table D.3 and Figure E.7).

3.3.7.8 CMT273

CMT273 is a four-generation family with one affected individual in each

generation. No DNA was available for the affected individual in generation I

or the four unaffected family members. A haplotype across 7q34-q36 segregates

with the affected individuals in generation II and III (Figure C.7). Affected

individual IV:1. carries the alternate haplotype which may exclude this interval.

However, in individual IV:1, the marker phase cannot be determined for

7 markers at the proximal end of the 7q34-q36 interval (D7S1518, D7S661,

D7S498, D7S2511, D7S615, D7S2442 and D7S2419) and two markers on the

3 genetic mapping 74

distal end of the interval (D7S637 and D7S2447). There is the possibility of

a recombination event within either of these regions. Two-point and multi-

point LOD scores reached -4.94 at θ = 0.00 for informative markers, significant

for exclusion. However, this family is insufficiently powerful to exclude the

flanking regions of uninformative markers, with two-point and multi-point

LOD scores uninformative at θ > 0.00 (Table D.8 and Figure E.8). Additional

markers are required to clarify if CMT273 is excluded from 7q34-q36 or if there

has been a genetic recombination in individual IV:1.

3.3.8 dHMN families excluded from 7q34-q36

Analysis of genotypes for markers across the 7q34-q36 interval using hap-

lotype analysis, two-point and multi-point linkage analysis, indicated fami-

lies CMT701, CMT260, CMT618, CMT477, CMT748, CMT779, CMT331 and

CMT808 were excluded from this interval. These families do not have a haplo-

type segregating with affected individuals and achieved significant negative

LOD scores for informative markers across the disease interval. Initial linkage

analysis using two markers in CMT375 and CMT102 identified non-segregation

of haplotypes and negative two-point LOD scores sufficient to exclude linkage.

The pedigrees, two-point LOD scores and multi-point LOD plots for these

families are included in Appendix C, Appendix D and Appendix E respectively.

CMT724 is an extended family which has a mild and variable dHMN phe-

notype with no sensory involvement. As this phenotype is mild, reduced

penetrance is more likely in this pedigree. All four affected individuals share a

haplotype for the 7q34-q36 locus (Figure C.18). Affected individual III:5 has

a recombination event, excluding the region distal to the marker D7S2439.

Two unaffected individuals also carry the disease haplotype. Due to the re-

combination event in individual III:5, markers distal to this location generate

3 genetic mapping 75

significantly negative two-point and multi-point LOD scores (Table D.14 and

Figure E.12). The LOD scores in the remaining interval are uninformative, how-

ever the occurrence of the disease haplotype in multiple unaffected individuals

suggests CMT724 is not linked to 7q34-q36.

The two largest families ascertained, CMT260 and CMT701 are good candi-

dates for further genetic analysis. Future clinical evaluation of at-risk individu-

als may identify newly affected individuals. This has the potential to increase

the linkage power in these families to a significant level in a genome wide

analysis.

3.3.9 Haplotype association analysis

The haplotypes of the eight additional dHMN families that were not excluded

were compared with CMT54 to identify any shared alleles (Table 3.7). The

minimal possible haplotypes defined by recombination events in affected indi-

viduals were considered for families CMT44, CMT131, CMT609 and CMT703.

Six haplotype blocks of three or more alleles common between CMT54 and

at least one additional family were identified (Table 3.8). CMT54 and CMT44

have two separate allele groups shared; haplotype A (1-4-3) for the mark-

ers D7S615, D7S2442, and D7S2419; and haplotype B (4-4-1) for the markers

D7S2439, D7S3070 and D7S483. CMT333 and CMT54 share six alleles; haplo-

type C (6-1-4-3-4-4) for markers D7S2511, D7S615, D7S2442, D7S2419, D7S688

and D7S2439. CMT54 and CMT484 share five alleles; haplotype D (7-1-6-1-

4) for markers D7S661, D7S498, D7S2511, D7S615 and D7S2442. CMT703,

CMT484 and CMT54 share three alleles; haplotype E (1-6-1) for markers

D7S161, D7S2511 and D7S615. Families CMT54, CMT333 and CMT484 share

three alleles in overlapping haplotype F (6-1-4), for markers D7S2511, D7S615

and D7S2442.

3 genetic mapping 76

Haplotypes for 56 healthy control chromosomes for markers across the 7q34-

q36 interval were determined by genotyping unaffected spouses in dHMN

families (Appendix F). The six haplotypes were tested for association with

the disease using a two tailed Fisher’s exact test of independence. The two

groups defined for this test were: the dHMN group, consisting of nine dHMN

families, including CMT54 and the additional dHMN families not excluded

from the 7q34-q36 locus and the normal control group, consisting of the 56

healthy control chromosomes from unaffected spouses. Statistically significant

evidence of association was demonstrated for four haplotypes: B (p = 0.0173),

C (p = 0.0173), D (p = 0.0481) and F (p = 0.0099) (Table 3.9). Haplotypes A (p =

0.1010) and E (p = 0.0735) did not reach significance, found in 6 and 5 of 56

control haplotypes respectively.

These results suggest the shared haplotypes B, C, D and F may be derived

from a common ancestor. However, the region covered by haplotype B is

mutually exclusive from haplotypes C, D and F. A larger haplotype of seven

markers spanning A and B in CMT44 is identical to CMT54 with the exception

of the allele for D7S688.

3.4 discussion

3.4.1 Linkage analysis in CMT54

Linkage analysis in family CMT54 refined the minimum probable candidate

interval by 0.74 Mb to the 6.92 Mb interval flanked by D7S615 and D7S2546.

This excluded region was within the CNTNAP2 gene which spans the recombi-

nation in unaffected individual II:7 defining the proximal end of the minimum

probable candidate interval. The critical disease interval based on affected

individuals alone is the 12.98 Mb interval flanked by D7S2513 and D7S637.

3 genetic mapping 77

Tabl

e3.

7:C

ompa

riso

nof

hapl

otyp

esbe

twee

nC

MT5

4an

dei

ght

addi

tiona

ldH

MN

fam

ilies

.Hap

loty

pes

are

shad

edgr

ay,w

ithal

lele

sco

mm

onto

CM

T54

colo

ured

blue

.Hap

loty

pes

reco

mbi

nant

inaf

fect

edin

divi

dual

sw

ithi

nfa

mili

esar

eun

shad

ed.B

oth

poss

ible

hapl

otyp

esfo

rC

MT6

09an

dC

MT

703

are

show

n.

Family

D7S1518

D7S661

D7S498

D7S2511

D7S615

D7S2442

D7S2419

D7S688

D7S2439

D7S3070

D7S483

AD0

D7S1815

AD3

D7S2462

D7S637

D7S2447

Ori

gina

lhap

loty

peC

MT5

44

71

61

43

44

41

11

25

61

Add

itio

nalH

aplo

type

sC

MT4

47

14

11

43

24

41

41

22

52

CM

T333

83

26

14

34

46

5.

1.

64

1

CM

T484

97

16

14

23

33

2.

2.

16

2

CM

T274

23

33

12

63

44

3.

3.

10.

4

CM

T677

3/8

91

42

5/4

32

34

2.

..

4.

5

CM

T703

H1

64

16

14

113

13

9.

2.

22/

51

CM

T703

H2

64

16

15

34

16

9.

1.

12/

51

CM

T131

53

24

1/2

33

.1

23

22/

31

63/

51/

3

CM

T609

H1

82

75

23

23

34

5.

1.

35

1C

MT6

09H

27

31

12

27

24

72

.2

.3

51

3 genetic mapping 78

Tabl

e3.

8:H

aplo

type

sus

edin

asso

ciat

ion

anal

ysis

are

boxe

d.Th

esi

xha

plot

ype

bloc

kste

sted

are

labe

lled

Ato

F.Ea

chfa

mily

ishi

ghlig

hted

aun

ique

colo

r.H

aplo

type

sex

clud

edby

reco

mbi

nati

onin

affe

cted

indi

vidu

als

are

shad

edgr

ay.

Haplotype

Family

D7S1518

D7S661

D7S498

D7S2511

D7S615

D7S2442

D7S2419

D7S688

D7S2439

D7S3070

D7S483

AD0

D7S1815

AD3

D7S2462

D7S637

D7S2447

# alleles

Ori

gina

lhap

loty

peC

MT

544

71

61

43

44

41

11

25

61

Add

itio

nalH

aplo

type

sC

MT

447

14

11

43

24

41

41

22

52

AC

MT

333

83

26

14

34

46

5.

1.

64

13

BC

MT

447

14

11

43

24

41

41

22

52

3

CC

MT

333

83

26

14

34

46

5.

1.

64

16

DC

MT

484

97

16

14

23

33

2.

2.

16

25

CM

T70

36

41

61

53

41

69

.1

.1

2/5

1E

CM

T48

49

71

61

42

33

32

.2

.1

62

3

CM

T33

38

32

61

43

44

65

.1

.6

41

FC

MT

484

97

16

14

23

33

2.

2.

16

2

3

3 genetic mapping 79

Table 3.9: Association analysis of haplotypes with dHMN using Fisher’s exact test ofindependence. Significance scores (p value) are shown for each haplotype identified inTable 3.8. In total, 9 haplotypes were counted in the dHMN group and 56 haplotypesin the healthy control group.

Haplotype block Freq. dHMN Freq. controls SignificanceA 3 6 0.1010B 2 0 0.0173C 2 0 0.0173D 2 1 0.0481E 3 5 0.0735F 3 4 0.0496

Because of reduced disease penetrance in dHMN, recombination events in

unaffected family members cannot be used to conclusively exclude regions.

The updated pedigree for CMT54 includes two additional unaffected children

of the unaffected individual used to define the proximal flanking marker for

the minimum probable candidate interval. Neither of these individuals carry

the recombinant haplotype and therefore it is not possible to discount the

possibility of non-penetrance in individual II:7. An accurate determination

of disease penetrance in CMT54 requires the identification of the pathogenic

7q34-q36 gene mutation.

3.4.2 Linkage analysis in the extended dHMN cohort

Linkage analysis in the larger dHMN cohort of 20 families identified suggestive

linkage to 7q34-q36 in family CMT44. CMT44 achieved a two-point LOD score

of 2.02 which reaches the maximum power for this pedigree. This strongly

supports the linkage of family CMT44 to the same locus as CMT54. Seven

additional families also had a haplotype segregating with affected individuals,

however they were uninformative in linkage analysis.

3 genetic mapping 80

The recombination event in the affected individual in CMT44 defines the

distal flanking marker in this family as D7S637, with a recombination in

an unaffected family member suggesting the proximal flanking marker is

D7S498. This interval does not refine the interval defined in CMT54, with

marker D7S637 previously defined as the distal flanking marker in CMT54 and

D7S498 beyond the proximal flanking marker defining the minimum probable

candidate interval in CMT54.

Association analysis of 7q34-q36 haplotypes from dHMN families was per-

formed in an attempt to determine the most likely location of the disease gene

and to establish whether a founding mutation was evident among families

with evidence of linkage to 7q34-q36. Haplotypes from four families showed

significant association with the disease compared to a healthy control cohort

using the Fisher’s two tailed test of independence. This association suggests

the haplotype shared by dHMN families is a result of a common disease

allele within an ancestral haplotype [216]. An association between unlinked

families cannot be used as evidence for linkage in these smaller families. To

identify with certainty families that have a haplotype that is IBD requires

a known pedigree linking the two families. Founder effects have previously

been identified in peripheral neuropathies. [217] identified a founder effect

in hereditary sensory neuropathy type I, with six families of English descent

sharing the p.T399G SPTLC1 mutation and associated surrounding haplotype

on chromosome 9. Similarly, a founder effect was demonstrated in a Gipsy

population with CMT type 4C for the SH3TC2 p.R1109X mutation and associ-

ated surrounding haplotype on chromosome 5q23-q33 [218]. A founder effect

was described in a Dutch population with CMT4C at the 5q23-q33 locus which

helped narrow the disease interval prior to identification of SH3TC2 as the

disease gene [219, 220]. A similar founder effect was identified in CMTX3 [221],

where the disease gene is yet to be elucidated.

3 genetic mapping 81

Families CMT54, CMT44, CMT333 and CMT484 have 7q34-q36 haplotypes

that are significantly associated with dHMN. However the haplotype shared

by CMT44 and CMT54 spans an interval mutually exclusive to the haplotype

shared by CMT54, CMT333 and CMT484. A founder effect is not possible if this

is the case as the disease allele cannot be located in both areas. This suggests

a false positive association. A larger haplotype of 7 markers is identical in

CMT44 and CMT54 apart from the allele for D7S688. CMT54 has a (4) allele

for D7S688 and CMT44 has a (2) allele. This larger haplotype in CMT44 would

span the separate haplotypes independently associated with dHMN if it were

a real ancestral haplotype. If this larger haplotype is a true ancestral haplotype

shared between CMT44 and CMT54, then the difference in the D7S688 marker

may be due to a change of the microsatellite repeat size, a genotyping error

or a tight double-recombinant. The (2) allele was identified in all five affected

individuals in CMT44, therefore a genotyping error is unlikely. A tight double-

recombinant is also highly unlikely as the genomic distance between the two

markers flanking D7S688 is only 0.002 Cm. Change in microsatellite repeat

number generally occurs as strand-slippage during replication, with gene-

conversion during recombination or DNA repair less common [222, 223]. Most

slip mutations involve a shift of a single-repeat (∼90%), with shifts of two-

and three-repeats rarer [222, 224]. Microsatellites have a mutation rate which

is proportional to the number of repeats, with higher repeats more likely to

be mutated [200]. D7S688 is a CA repeat with a nominal size of 22 repeats,

and therefore within the highly polymorphic size range (> 10 repeats). The

two alleles in CMT54 and CMT44 differ by three CA repeats (143 to 149 bp),

which is within the normal range of a single sequence-slip mutation during

replication. It is possible that the differing allele at D7S688 between CMT54

and CMT44 is due to a mutation of the microsatellite repeat and these two

families share a common founder. In further support for a founder effect in

these four families with evidence of haplotype association, they have a similar

3 genetic mapping 82

phenotype and all have British ancestry, however any localised geographic

links are unknown.

The convergence of the significantly associated haplotypes in CMT54, CMT333

and CMT484 suggests the disease allele lies in the region flanked by the

markers D7S498 and D7S2419. This spans the proximal end of the minimum

probable candidate interval in CMT54. Using the proximal flanking marker

for the minimum probable candidate interval, the most likely position of the

disease allele is within the 1.14 Mb interval flanked by the markers D7S615

and D7S2419.

3.4.3 Proportion of dHMN caused by a pathogenic gene at 7q34-q36

This study suggests that the incidence of dHMN with 7q34-q36 mutations is

low. In the screen of 20 dHMN families with unidentified genetic cause, good

evidence for linkage was shown in one family, CMT44, with an additional

7 families uninformative. Association analysis suggested possible ancestral

links between CMT54, CMT44 and two additional families. With these families

joined, then only one additional family is not excluded from the 7q34-q36 locus.

This suggests that the 7q34-q36 locus may be responsible for ∼5% of dHMN

within this cohort. In a review of 119 autosomal-dominant dHMN families

from the peripheral neuropathies database (from which the families in this

study were selected), mutations in known genes were responsible for ∼5% of

dHMN with the remaining ∼95% having unknown genetic cause (O.Albulym,

personal communication, June 2011). This is slightly lower than found in

another autosomal-dominant dHMN cohort where known genes account for

∼15% of dHMN [6]. In both cohorts, each gene individually accounts for

between ∼1.5 and ∼7% of of dHMN. The identification of the 7q34-q36 disease

3 genetic mapping 83

gene will likely lead to identification of additional families and clarify the

proportion of dHMN caused by mutations at 7q34-q36.

4A N A LY S I S O F C A N D I D AT E G E N E S

4.1 introduction

4.1.1 Chapter overview

This chapter describes the application of a positional candidate based approach

to the identification of the disease gene within the 7q34-q36 interval. The

strategy employed here is targeted towards the identification of a mutation

within the protein-coding exons and flanking intronic sequences of candidate

genes within the 7q34-q36 interval. This analysis was conducted concurrently

with the analysis of the 7q34-q36 locus in the extended dHMN cohort described

in Chapter 3.

4.1.2 Strategy for gene identification

Several contrasting strategies for disease gene identification have been de-

veloped to make best use of available experimental resources and charac-

teristics of the disease under study. These methods include functional and

positional cloning of disease genes [225] and more recently, next generation

sequencing (NGS) strategies (Chapter 6). Functional cloning identifies a disease

gene through a detailed understanding of the biochemical defect involved

in the disease, which implicates a specific gene. Where a disease gene is not

immediately apparent, functional cloning can be expanded to a functional

84

4 analysis of candidate genes 85

candidate approach. In such an approach, a series of informed decisions are

used to identify more likely candidate genes which are then screened for

mutations. In contrast, positional cloning identifies a disease gene based on

its location in a genetic map. This is possible if a narrow genomic interval can

be defined through linkage analysis and fine mapping to a locus containing

only one gene. When a genomic interval linked to a disease contains many

genes, positional cloning is combined with functional candidate gene selection

to prioritise gene screening. This is known as a positional candidate approach

[225].

As dHMN is functionally poorly understood, pure functional cloning is

not possible and positional cloning and candidate gene strategies are applied.

Most of the known dHMN genes have been identified through linkage analysis

and positional candidate gene screening in affected families. One dHMN gene,

HSPB3 was identified through a candidate gene based screen of all small HSPs

following the identification of two other small HSP genes in dHMN through

positional cloning (HSPB1 and HSPB8) [2–4]. Due to the genetic heterogeneity

of dHMN, individual genes account for only a few percent of overall disease;

any candidate based approaches require large cohorts to identify a new gene

even in a single family [226].

4.1.3 Identifying genes within a disease interval

With the completion of the human genome project most genes have been

reliably identified through automated gene annotation complemented with

manual gene annotation [227, 228], and genetic maps are well integrated

into the physical sequence based maps. The identification of genes within

a candidate interval has generally been reduced to a computer exercise

[229]. Genomic intervals can be examined with online tools such as NCBI

4 analysis of candidate genes 86

map viewer (http://www.ncbi.nlm.nih.gov/mapview) and the UCSC genome

browser (http://genome.ucsc.edu, [210]).

Without a comprehensive genome reference, identifying the genes within

a disease locus was previously the rate limiting step in positional cloning,

requiring the identification of expressed sequences from the linked genomic

region [reviewed in 230]. This meant that much smaller disease intervals were

required to achieve a reasonable gene identification workload [231]. Early

gene identification was greatly assisted by the development of genetic maps

generated using sequence-tagged site (STS) markers [232] and expression maps

using expressed sequence tags (ESTs) [233].

4.1.4 Selection of candidate genes

Disease intervals identified though linkage analysis can contain hundreds of

genes. In a positional candidate based approach, any functional annotation

suggesting a disease role is utilised to prioritise candidate genes with a higher

likelihood of carrying the disease mutation. Common gene annotation sources

include; comparative genomics, epidemiological studies, functional candidates,

in-silico analysis of annotation databases, and gene expression patterns.

4.1.4.1 Association analysis

Genetic association analysis is used to identify genes that may have a role in

disease aetiology by identifying genetic markers that are overrepresented in

disease population compared to a healthy control population, beyond what

should occur by chance [234]. Disease associated markers may be in linkage

disequilibrium with other functional changes, or may directly cause a change in

a protein or its expression [235]. In whole-genome association (WGA) analysis,

SNP markers are genotyped across all (or most) genes in the genome. Due

4 analysis of candidate genes 87

to the large number of SNPs examined, false positives are likely. Therefore,

any initial associations identified need to be confirmed with additional studies.

WGA has been applied to the related disorder amyotrophic lateral sclerosis

(ALS) with limited success, possibly due to the heterogeneous nature of ALS

[236].

4.1.4.2 Comparative genomics

Comparative genomics is based on the principle that biological features of

gene orthologs are evolutionarily conserved among a group of organisms [237].

This provides good functional information about candidate genes which have

been characterised in animal models. This was the case for the identification

of IGHMBP2 in DSMA1; mutations in the mouse orthologue, Igmbp2 were

previously shown to be responsible for the neurodegenerative phenotype in

nmd mice [238]. As such, IGHMBP2 was the most promising candidate gene

[76] (See Section 1.1.3).

4.1.4.3 Function dependant

Study of the known dHMN genes has highlighted some common pathways and

protein functions. These include: disrupted axonal transport, RNA processing

defects, protein aggregation and inclusion body formation (See Section 1.1.11).

Candidate genes with similar functions to these known dHMN genes are

good candidates as they possibly influence the same molecular pathways.

However, as the range of functions of known dHMN genes is broad, many

candidate genes can be hypothetically linked to these pathways and suggested

as functional candidates.

4.1.4.4 In-silico prioritisation of disease genes

In-silico gene prioritisation utilises software to analyse publicly available

databases of gene ontology to prioritise candidate genes within a disease

4 analysis of candidate genes 88

locus based on interactions and homology with known disease genes [239].

While the application of in-silico prioritisation has been successful in some

gene discoveries, its application relies on the known biology of the disease

being studied and requires that the gene being targeted is not acting in a novel

disease pathway.

4.1.4.5 Gene expression analysis

Generally, gene expression patterns matching the tissues affected in a disease

can be used as a guide for selecting candidate genes. However, ubiquitously

expressed genes can still cause disease in specific tissues. This is the case in

dHMN, where some of the known disease genes are ubiquitously expressed

(see Section 1.1.11).

4.1.5 Gene screening methods

Gene screening is the process of identifying the DNA variant (mutation)

responsible for the disease. To fully characterise the nature of a DNA variant

requires the determination of the DNA sequence. DNA sequencing by chain

termination, known as ’Sanger’ or ’dideoxy’ sequencing, is commonly used to

determine DNA sequence. Sanger sequencing can determine the sequence of

DNA fragments of up to 800 bp on average, in a single reaction [240], suitable

for sequencing individual exons of genes.

As a gene screening project may need to examine hundreds of exons, several

methods have been developed for the detection of single base pair changes in

DNA without sequencing, aiming to achieve faster and cheaper gene screening.

These methods detect altered properties of the variant DNA strand including:

mobility shift of variant DNA during electrophoresis (single-stranded confor-

mational analysis and denaturing gradient gel electrophoresis), detection of

4 analysis of candidate genes 89

heteroduplexes, altered digestion of variant DNA by enzymes or chemical

cleavage (RNase A cleavage and chemical mismatch cleavage) [reviewed in

241]. These methods all require the subsequent application of direct sequencing

to define precisely the nature of any sequence variant identified.

Advances in Sanger sequencing mean it is now rapid, accurate, and afford-

able enough to allow the direct sequencing of candidate genes without the

need for prior screening using alternate methods [242]. Recently, the use of

Sanger sequencing itself has begun to be replaced by NGS technologies (par-

ticularly exome sequencing) as the primary sequencing method in positional

cloning. NGS allows the sequencing of much larger targets in a single reaction.

During the course of this project, as NGS technologies have become available

they have been adopted where relevant. This chapter details the application of

a positional candidate approach to gene screening using Sanger sequencing of

candidate genes. The application of NGS to gene screening within the 7q34-q36

disease interval is detailed in Chapter 6.

4.1.5.1 Hypothesis and aims

It is hypothesised that dHMN seen in family CMT54 is caused by a single gene

mutation within the linked region on chromosome 7q34-q36. The aims of this

chapter are to:

• Identify and prioritise candidate genes in the 7q34-q36 disease interval.

• Sequence coding sequences and flanking intronic sequences of candidate

genes to identify the pathogenic mutation.

4 analysis of candidate genes 90

4.2 materials and methods

4.2.1 In-silico candidate gene prioritisation

Two web based in-silico candidate gene prioritisation programs were used;

GeneWanderer and Endeavour. The same set of reference genes was used

to generate comparisons in both programs. This set included the known

dHMN genes: HSPB1, HSPB8, GARS, BSCL2, IGHMBP2, DCTN1, SETX, and

PLEKHG5.

4.2.1.1 GeneWanderer

The web-based interface for the GeneWanderer software (http://compbio.

charite.de/genewanderer/GeneWanderer [243] was used with the parameters:

linkage interval was set as chr7:141,053,887-154,034,713, analysis method was

set to Random walk, and all interactions were selected.

4.2.1.2 Endeavour

The web-based interface for the Endeavour software (http://www.esat.kuleuven.

be/endeavourweb [244, 245] was used with the following parameters: the

species selected was Homo sapiens, all data sources, and the candidate gene

interval was chr7:141,053,887-154,034,713.

4.2.2 PCR and sequencing for gene screening

PCR amplification of exons was carried out as described in Section 2.3.3, with

primer sets detailed in Appendix B. Sequencing was carried out as described

in Section 2.3.6.

4 analysis of candidate genes 91

4.2.2.1 Sequence analysis

Sequence traces in .abi format were aligned using SeqMan II 5.08 (DNAStar),

and sequence variants were identified by visual inspection of sequence traces.

The consensus sequence from the affected individuals was aligned to the

Human Mar. 2006 (NCBI36/hg18) Assembly using the BLAT tool from the

UCSC genome browser [198]. Correct sequence identity, exon coverage, and

characterisation of identified variants was carried out following the BLAT

alignment.

4.2.3 Analysis of effect of variants on exon splicing

The effect of mutations on exon splicing can be predicted using a combination

of several splicing prediction programs and a modest threshold for change in

splicing scores [246]. The guidelines for validation of splicing variants [247]

were followed, with at least three programs used to predict the effect of each

novel variant and variants with two positive in-silico results further analysed

with RNA studies.

4.2.3.1 In-silico splicing prediction

The splicing prediction programs used were NetGene2 [248], Splice Site Pre-

diction by Neural Network (NNSplice) [249], Human Splicing Finder (HSF)

[250], GeneSplicer [251] and Alternative Splice Site Predictor (ASSP) [252].

Default options were used for each software. Where programs required DNA

sequences as input, DNA sequence was downloaded from the UCSC genome

browser Human Mar. 2006 (NCBI36/hg18) assembly then manually edited to

include DNA variants being tested.

4 analysis of candidate genes 92

4.2.4 RNA studies of splicing variants

4.2.4.1 Isolation of total RNA

Total RNA was extracted from lymphocytes purified from peripheral blood

using density gradient centrifugation. Approximately 30 ml of blood was

collected into 9 ml VACUETTE® K3EDTA tubes (Greiner bio-one). The blood

sample was made up to a volume of 35 ml by adding cold RPMI-1640 medium

(Sigma). The Blood/RPMI was overlaid onto 15 ml of Ficoll-paque® PLUS (GE

healthcare Bio-sciences AB, Sweden) and centrifuged at 1700 × rpm for 40 min

without brakes using a Megafuge® 1.0 (Heraeus® Instruments) type centrifuge.

The buffy coat was recovered by pipetting then washed twice by suspending

cells in 35-50 ml RPMI media followed by centrifugation at 1000 × rpm for 5

min.

RNA extraction was carried out using the total RNA isolation NucleoSpin®

RNA II kit (Machery-Nagel) as per the protocol Total RNA purification from

cultured cells and tissue with NucleoSpin® RNA II, version May 2003/Rev.02.

In brief, buffy coat cells were lysed using the supplied lysis buffer, with me-

chanical disruption by vortexing then passing the sample through a 21 g needle

three times. The cell lysate was filtered, RNA was bound to a NucleoSpin®

RNA II column, desalted, DNAse digested, and washed three times. RNA was

eluted using 60 µL of RNAse free water, then reapplied to the column and

re-eluted. All filtration, wash and elution centrifugation steps were carried out

using a 5417 C bench-top centrifuge (Eppendorf). Eluted RNA was stored at

-70 °C.

4 analysis of candidate genes 93

4.2.4.2 cDNA synthesis and RT-PCR

All cDNA synthesis was carried out using the iScript cDNA synthesis kit

(BioRAD), using the supplied protocol, in a 2 × reaction volume (40 µL). Each

reaction was set up as follows:

Component Volume

5× iScript Reaction Mix 8 µL

iScript Reverse Transcriptase 2 µL

RNA template up to 1 µg total RNA

Nuclease-free water to a total volume of 40 µL

The reaction mix was incubated for 5 minutes at 25°C, 30 minutes at 42°C,

and 5 minutes at 85°C. Synthesised cDNA was stored at -20°C. RT-PCR was

carried out using standard PCR protocol described in Section 2.3.3. Nucleotide

primers for RT-PCR are detailed in Appendix B.

4.2.5 RNA folding prediction

Sequence variants within the 3’UTR of genes may affect RNA secondary

structure. The software Mfold version 4.4 [253, 254] was used to predict RNA

secondary structure. Required nucleotide sequences were downloaded from

the UCSC genome browser, Human Mar. 2006 (NCBI36/hg18) assembly, then

manually edited to include the sequence variants being tested.

4 analysis of candidate genes 94

4.3 results

4.3.1 Positional candidate genes

The critical disease interval defined by recombination events in affected in-

dividuals in CMT54 spans 12.98 Mb of DNA at 7q34-q36 (NCBI36/hg18

assembly) and is contained in two sequenced map contigs: NT_007914.10

and NT_034885.1 (Figure 4.1) [255]. The sequence gap between these two

contigs has an estimated size of 99-100 kb based on fluorescence in situ hy-

bridisation analysis of human DNA and comparison to the analogous region

in the mouse genome (gap report HG-740: http://www.ncbi.nlm.nih.gov/

projects/genome/assembly/grc) [255]. The gap is located on the telomeric

end of the critical disease interval. The minimum probable candidate interval

defined by recombination events in unaffected individuals in CMT54 is located

entirely within sequenced map contig NT_007914.10. Positional candidate

genes were identified from within the critical disease interval by retrieving

records for known genes from the UCSC genes track using the UCSC table

browser tool [256]. The UCSC genes track contains gene records combined

from RefSeq [257, 258], Genbank [259], the consensus coding sequence (CCDS)

project [260] and UniProt [261] databases. The critical disease interval con-

tained 245 gene records after removing duplicates. These records correspond

to:

• 89 protein coding genes (not including taste, olfactory or T-cell receptor

transcripts).

• 10 taste receptor gene transcripts.

• 14 olfactory receptor gene transcripts in the olfactory receptor gene

cluster [262].

4 analysis of candidate genes 95

• 86 T-cell receptor beta-chain gene transcripts localised to the T-cell recep-

tor beta-chain gene complex at 7q35 [263].

• 32 putative non-coding transcripts

• 14 pseudogenes

The 89 coding genes are the target for this candidate gene screening approach

(Figure 4.1). The minimum probable candidate interval contains 60 gene tran-

scripts (Table G.1), with 28 gene transcripts in the remaining 7q34-q36 critical

disease interval (Table G.2). To ensure all possible exons of candidate genes

were screened, transcripts for splicing variants or alternate protein isoforms

were identified from the UCSC genes and RefSeq tracks from the UCSC

genome browser as well as the Uniprot database. Unique exons were added to

the candidate genes list as alternate isoforms.

TR

YX

3,

PR

SS1

, PR

SS

2EPH

B6

, TR

PV

6,

TR

PV

5C

7orf

34

, K

EL,

PIP

GSTK

1,

TM

EM

13

9C

ASP2

, C

LCN

1FA

M1

31

B,

ZYX

, EPH

A1

FAM

11

5C

, FA

M1

15

A

CN

TN

AP2

C7

orf

33

, C

UL1

, EZ

H2

PD

IA4

ZN

F78

6,

ZN

F42

5Z

NF3

98

, Z

NF2

82

ZN

F21

2,

ZN

F78

3Z

NF7

77

, Z

NF7

46

ZN

F76

7,

KR

BA

1,

ZN

F4

67

SSPO

, ATP6

V0

E2

AC

TR

3C

, LR

RC

61

C7

orf

29

, R

AR

RES2

REPIN

1,

ZN

F77

5G

IMA

P8

, G

IMA

P7

, G

IMA

P4

GIM

AP6

, G

IMA

P2

GIM

AP1

, G

IMA

P5

TM

EM

17

6B

, TM

EM

17

6A

AB

P1

, KC

NH

2N

OS3

, ATG

9B

AB

CB

8,

AC

CN

3C

DK

5,

SLC

4A

2,

FASTK

TM

UB

1,

AG

AP3

, G

BX

1A

SB

10

, A

BC

F2,

CSG

LCA

-TSM

AR

CD

3,

NU

B1

, W

DR

86

CR

YG

N,

RH

EB

, PR

KA

G2

G

ALN

TL5

, G

ALN

T1

1M

LL3

, C

CT8

L1X

RC

C2

, A

CTR

3B

DPP6

7q34 7q35 7q36.1 7q36.2

31.121.11 33 35q34

AR

HG

EF3

5,

AR

HG

EF5

NO

BO

X,

TPK

1

Olfact

ory

rece

pto

r g

enes

WEE2

, SSB

P1

PR

SS3

7,

CLE

C5

AM

GA

M,

LOC

93

43

2T-

cell

rece

pto

r B

gen

es

CDI

MPCI

Cytogenetic map

Genes

Known genes

Chromosome 7

5Mb

Map contigs NT_007914

NT_034885

Figure 4.1: Genes mapping to the 7q34-q36 disease interval. Top panel: chromosome7 ideogram depicting the location of the 7q34-q36 disease locus. Lower panel: detailof the 7q34-q36 disease locus. Figure labels indicate the following: critical diseaseinterval (blue bar), minimum probable candidate interval (green bar), cytogenetic mapand known genes.

4 analysis of candidate genes 96

4.3.2 Functional candidate gene selection

A literature and database search was carried out to identify functional can-

didate genes from the 89 positional candidate genes. This search included;

neurodegenerative disease role or association, comparative genomics, func-

tional candidates, gene expression patterns and in-silico prioritisation.

4.3.2.1 Relationship of candidate genes with neurodegenerative diseases

Using the Online Mendelian Inheritance in Man (OMIM) database (http:

//www.ncbi.nlm.nih.gov/omim), positional candidate genes were examined

for any known function in, or published association with neurodegenerative or

neurological diseases. Six genes were identified as potential candidate genes:

DPP6, ACCN3, CUL1, CNTNAP2, KCNH2 and NOS3. The gene DPP6 is asso-

ciated with susceptibility to sporadic ALS in different populations of European

ancestry [264]. The same association was reproduced in homogeneous Irish

[265] and Italian populations [266]. A second gene, ACCN1 was shown to be

associated with sporadic ALS in one study [267]. The positional candidate

ACCN3 forms a functional complex with ACCN1, which has a functional role

in the peripheral nervous system [268, 269]. Cullin1, coded by the candidate

gene CUL1, interacts with Parkin [270], mutations in which cause autosomal

recessive juvenile Parkinson’s disease (PARK2 or PDJ: MIM#600116) [271].

CNTNAP2 mutations cause several neurological diseases: recessive frameshift

mutations in CNTNAP2 cause cortical dysplasia-focal epilepsy (CDFE) syn-

drome [272]: CNTNAP2 haploinsufficiency causes schizophrenia and epilepsy

[273]: and CNTNAP2 is disrupted through a chromosomal insertion/translo-

cation in familial Gilles de la Tourette syndrome and obsessive compulsive

disorder (motor and vocal ticks) [274]. In addition, SNPs within CNTNAP2 are

associated with increased risk of developing autism [275]. KCNH2 is associated

with schizophrenia, and has a functional role in the brain [276]. Genetic vari-

4 analysis of candidate genes 97

ants in NOS3 infer a slight increase in risk for late-onset Alzheimer’s disease

[277, 278].

4.3.2.2 Identification of candidate genes through comparative genomics

Using comparative genomics, the gene CASP2 was identified as a good candi-

date gene. Mutant CASP2 causes accelerated motor neuron cell death during

development in a mouse model [279]. CDK5 was previously screened for CDS

mutations in this family [103] due to the observed deregulation of CDK5 in

a SOD1 ALS mouse model [280]. The candidate gene DPP6 is disrupted by

a chromosomal re-arrangement in the rump white (Rw) mouse which is em-

bryonic lethal in homozygotes [281]. However, complementation with DPP6

deletion mutants indicated that DPP6 disruption is not the embryonic lethal

factor in the Rw chromosomal rearrangement [282]. Therefore, this does not

add further support for DPP6 as a candidate gene.

4.3.2.3 Functional candidate genes

The function of candidate genes were examined to identify genes with roles

in tissues of the peripheral nervous system or in pathways affected in dHMN

(Section 1.1.11). Seven genes were identified as potential candidate genes: CUL1,

FASTK, SMARCD3, SSBP1, CNTNAP2, TRPV6 and TRPV5. CUL1 is a core

protein of the complex responsible for ubiquination of components of NFκB

complexes [283, 284]. Cell death through loss of NFκB signalling is caused by

PLEKHG5 mutations in DSMA4 [162] (see Section 1.1.8). FASTK is a negative

modulator of neurite outgrowth and a positive modulator of neurite retraction,

with a likely signalling role in cytoskeleton maintenance [285]. SMARCD3 is

involved in chromatin remodelling and transcriptional regulation [286]. SSBP1

is involved in genome stability as part of the single-stranded DNA (ssDNA)-

binding complex and is a core protein of mitochondrial nucleoid complexes

responsible for replication and translation of mtDNA [287, 288]. CNTNAP2 is

4 analysis of candidate genes 98

thought to have a role in the localisation of ion channels at the juxtaparanodal

region of myelin required for proper conduction of the nerve impulse [289].

TRPV6 and TRPV5 form calcium-permeable channels, which participate in

neurotransmission, muscle contraction, and exocytosis by providing calcium

as an intracellular second messenger. TRPV6 and TRPV5 are members of

the same subgroup of the transient receptor potential protein superfamily as

the recently identified dHMN gene TRPV4. However, TRPV6 and TRPV5 are

calcium selective channels as opposed to TRPV1-4 which are thermosensitive

non-selective cation channels. Nevertheless, they remain good candidate genes.

4.3.2.4 Prioritisation based on gene expression pattern

The gene expression pattern of candidate genes in brain, spinal cord and

muscle was examined using the expression section of the GeneCards® page

for each gene [290, www.genecards.org]. The GeneCards® expression figures

are generated using data from Genenote [291, 292] and GNF BioGPS [293, 294].

Five genes, CDK5, CNTNAP2, DPP6, WDR86 and ATP6V0E2 had high relative

expression levels in brain or spinal cord. For two genes NOBOX and ACTR3C,

there was no expression data available for some or all of the tissues. The

remaining genes were all expressed in the selected tissues (Appendix G,

Table G.3).

4.3.2.5 In silico prioritisation of candidate genes

Two separate programs, GeneWanderer and Endeavour, were used to prioritise

candidate genes. The web-based software GeneWanderer, investigates the

interaction of candidate genes with known disease genes through a protein-

protein interaction (PPI) network based on experimental data from several

species [243]. The second program, Endeavour, also investigates the interaction

of candidate genes with known disease genes, however uses a greater number

of relationships including; literature, functional annotation, gene expression,

4 analysis of candidate genes 99

Table 4.1: Candidate gene prioritisation ranking using GeneWanderer and Endeavoursoftware.

Rank GeneWanderer EndeavourGene Symbol Score Gene Symbol Score

1 CASP2 0.8695 CDK5 0.0002472 GSTK1 0.8504 CASP2 0.002673 TRPV6 0.03027 ACTR3B 0.003874 NOS3 0.03019 ARHGEF5 0.003955 CDK5 0.02678 CENTG3 0.007146 CUL1 0.02558 SSBP1 0.008377 MGAM 0.02010 ABP1 0.01058 KEL 0.01449 PDIA4 0.01679 CNTNAP2 0.01380 GIMAP4 0.0205

10 ABCF2 0.01109 ZYX 0.0225

protein domains, PPI, pathway membership, gene regulation, transcriptional

motifs, and sequence similarity [244, 245]. Two genes, CASP2 and CDK5 were

ranked highly by both programs, with the remainder unique to each list

(Table 4.1). Both these genes have previously been identified as good candidates.

The gene GSTK1 also scored highly with GeneWanderer.

4.3.3 Screening candidate genes

The protein-coding exons and flanking intronic regions of candidate genes were

sequenced using direct sequencing, in two affected and two normal individu-

als from CMT54. Functional candidate genes within the minimum probable

candidate interval were prioritised for sequencing, followed by some candi-

date genes beyond this interval. The remaining genes screened were generally

selected based on their location in the minimum probable candidate interval.

A total of 32 genes were sequenced (comprising 475 exons): CUL1, ACCN3,

DPP6, SMARCD3, KCNH2, NOS3, CASP2, FASTK, TRPV6, GSTK1, ABCF2,

CENTG3, ABP1, PDIA4, KEL, ZYX, ABCB8, RARRES2, CLCN1, TMUB1, NUB1,

4 analysis of candidate genes 100

EZH2, REPIN1, CRYGN, RHEB, ZNF775, PRKAG2, TMEM176A, TMEM176B,

GALNT11, SLC4A2, and XRCC2. One gene, MLL3 was partly sequenced with

40 of 59 exons sequenced. Prior to this PhD project, the protein-coding exons

and exon-intron boundaries for two positional candidate genes, CDK5 and

CNTNAP2, were sequenced in an affected individual from CMT54, with no

pathogenic variants identified. The sequencing traces for CNTNAP2 and CDK5

were reviewed to check for variants affecting splicing. In total, 31 sequence

variants were identified in both affected individuals screened (Table 4.2). 14 of

these variants were not present in the two unaffected individuals sequenced.

Of these variants, nine were reported validated SNPs in dbSNP 1.29 or 1.30.

Five variants were examined further as possible pathogenic mutations.

4.3.3.1 ACCN3:c.915C>T

The variant ACCN3:c.915C>T, is a non-synonymous base change predicted to

cause the amino acid substitution Pro289Ser. Examination of the conservation

track of the UCSC genome browser indicated Pro289 is 100% conserved across

mammals, however the Stickleback is Wt for Ser in this location. Sequencing

of CMT54 family members showed this variant segregated with all affected

individuals tested (9 individuals) and was not present in any unaffected

individuals. Sequencing of a healthy control cohort identified this variant in 9

of 269 individuals, with a minor allele frequency (MAF) of 1.67%. Therefore,

this variant was considered a non-pathogenic SNP.

4.3.3.2 DPP6:c.*347A>G

The variant DPP6:c.*347A>G, lies in the 3’UTR of a short isoform of the DPP6

gene. This base is conserved across primates. Using the software mfold version

4.4, the variant was predicted to cause altered mRNA secondary structure

(data not shown). Sequencing of all CMT54 family members showed that this

variant did not segregate with affected individuals, found only in 4 affected

4 analysis of candidate genes 101

Tabl

e4.

2:D

NA

vari

ants

iden

tifi

edin

gene

ssc

reen

ed.A

llm

utat

ions

foun

din

affe

cted

ind

ivid

uals

sequ

ence

din

clud

ing

know

nSN

Ps.

Alle

lefr

equ

ency

.Whe

nbo

thco

ntro

lsar

eho

moz

ygou

s,al

lele

freq

will

be0%

or10

0%.1

00%

ind

icat

esth

eyar

eho

moz

ygou

sfo

rth

eal

tern

ate

SNP

alle

le.W

here

rsnu

mbe

rsar

ein

dica

ted,

alle

les

iden

tifie

din

sequ

enci

ngm

atch

edth

ose

repo

rted

indb

SNP.

Gen

eSy

mbo

lR

efSe

qac

cess

ion

Var

iant

dbSN

PTy

pe#

Aff

ecte

d#

cont

rols

freq

uenc

yco

ntro

lsC

NTN

AP2

NM

_014

141

c.15

28+1

17in

sAA

AC

rs34

6158

41in

tron

1-

-C

NTN

AP2

NM

_014

141

c.34

76-1

5C>

Ain

tron

1-

-C

NTN

AP2

NM

_014

141

c.37

90-6

delC

Mul

tipl

eSN

Psin

tron

2-

-C

UL1

NM

_003

592

c.11

91+8

C>

Trs

1027

1133

intr

on2

225

%fo

rT

AC

CN

3N

M_0

0476

9c.

915C>

T-

mis

sens

e(P

289S

)9

269

3.2%

for

T

ABP

1N

M_0

0109

1c.

1856

+15G>

Ars

1198

1217

intr

on2

20%

for

AC

LCN

1N

M_0

0008

3c.

261C>

Trs

6962

852

cds-

syno

n2

20%

for

TC

LCN

1N

M_0

0008

3c.

1402

-9C>

Trs

2272

252

intr

on2

20%

for

TC

LCN

1N

M_0

0008

3c.

2154

C>

Trs

2272

251

cds-

syno

n2

210

0%fo

rT

CLC

N1

NM

_000

083

c.21

80C>

Trs

1343

8232

mis

sens

e(P

727L

)2

20%

for

T

CLC

N1

NM

_000

083

c.22

84+3

3C>

Grs

5668

0997

intr

on2

20%

for

GD

PP6

prot

ein

Q8I

YG

9‡c.

*347

A>

G-

3’U

TR4

of10

244.

1%fo

rG

DPP

6N

M_1

3079

7c.

1666

+47T>

A-

intr

on3

150

%fo

rA

DPP

6N

M_1

3079

7c.

358+

30C>

Trs

2291

538

intr

on3

10%

for

TD

PP6

NM

_130

797

c.45

7+32

G>

A†

rs37

5004

2in

tron

31

50%

for

AD

PP6

isof

orm

2N

M_0

0193

6c.

303G>

A†

cds-

syno

n

Tabl

e4.

2-c

ontin

ued

onne

xtpa

ge

4 analysis of candidate genes 102

Tabl

e4.

2-c

ontin

ued

from

prev

ious

page

Gen

eSy

mbo

lR

efSe

qac

cess

ion

Var

iant

dbSN

PTy

pe#

Aff

ecte

d#

cont

rols

freq

uenc

yco

ntro

lsEZ

H2

NM

_004

456

c.21

11-4

6C>

T-

intr

on2

20%

for

TEZ

H2

NM

_004

456

c.21

10+6

T>

Grs

4127

7434

intr

on2

225

%fo

rg

EZH

2N

M_0

0445

6c.

2276

+6T>

Crs

4127

7434

intr

on2

210

0%fo

rC

TMEM

176B

NM

_014

020

c.28

0A>

Crs

3173

833

mis

sens

e(S

94R

)2

250

%fo

rC

TMEM

176B

NM

_014

020

c.40

0G>

Ars

2072

443

mis

sens

e(A

134T

)2

250

%fo

rA

TMEM

176B

NM

_014

020

c.53

8C>

Trs

1725

6042

mis

sens

e(R

180W

)2

20%

for

TTM

EM17

6BN

M_0

1402

0c.

*3T>

Crs

2302

479

3’U

TR2

225

%fo

rC

NO

S3N

M_0

0060

3c.

270+

42G>

Ars

1800

781

intr

on2

225

%fo

rA

NO

S3N

M_0

0060

3c.

774T>

Crs

1549

758

cds-

syno

n2

225

%fo

rC

NO

S3N

M_0

0060

3c.

1998

C>

Grs

2566

514

cds-

syno

n2

225

%fo

rG

NU

B1N

M_0

6118

c.59

8+7A>

Grs

4468

86in

tron

22

75%

for

GK

CN

H2

NM

_172

057

c.16

39A>

Crs

1805

123

mis

sens

e(K

557T

)1

--

ABC

B8N

M_0

0718

8c.

659+

12C>

Trs

2303

924

intr

on2

20%

for

TA

BCB8

NM

_007

188

c.98

4T>

Ars

2303

926

cds-

syno

n2

225

%fo

rA

CRY

GN

NM

_144

727

c.41

1C>

Trs

2075

001

cds-

syno

n2

20%

for

TG

ALN

T11

NM

_022

087

c.71

2+50

C>

T-

intr

on2

225

%fo

rT

†O

neva

rian

tha

sm

ult

iple

affe

cts

inD

PP6

isof

orm

s.B

oth

the

DPP

6:c.

457+

32G>

Aan

dD

PP6

isof

orm

2:c.

303G>

Are

fer

toth

esa

me

mut

atio

n.‡

Uni

Prot

KB/

TrEM

BLac

cess

ion

num

ber.

4 analysis of candidate genes 103

individuals. Furthermore, this variant was present in 1 of 24 unrelated control

individuals (MAF = 2.1%). This variant was considered a non-pathogenic SNP.

4.3.3.3 EZH2:c.2111-46C>T

The variant EZH2:c.2111-46C>T, is an intronic variant located upstream of

exon 19. Examination of the conservation track of the UCSC genome browser

indicated this nucleotide is not conserved across species, with the T allele

present in one of six species at this position. Using four different splicing

prediction programs (NetGene2, NNSplice, HSF and GeneSplicer), this variant

was predicted to not affect splicing or create any additional splice motifs.

Therefore, this variant was excluded. Segregation analysis in the larger CMT54

pedigree was not carried out.

4.3.3.4 CNTNAP2:c.3476-15C>A

The variant CNTNAP2:c.3476-15C>A, is a non-coding variant that was con-

sidered to potentially affect both the long and short isoforms of CNTNAP2,

located 15 bp upstream of exon 21 and exon 1 respectively (Figure 4.2). Ex-

amination of the conservation track of the UCSC genome browser indicated

this nucleotide is not conserved across species with C and T nucleotides at

this location in eight species. Sequencing of CMT54 family members showed

this variant segregated with all affected individuals and was not present in

any unaffected individuals. The effect of the CNTNAP2:c.3476-15C>A variant

was simulated using the splicing prediction programs NetGene2, NNSplice

and HSF. This variant was predicted to reduce the power of the adjacent splice

site by -1% by NNSplice, and -5.24% by HSF with no change predicted by Net-

Gene2. In addition, the formation of a new enhancer motif site was predicted

by the HSF software. These scores suggested the CNTNAP2:c.3476-15C>A vari-

ant may affect splicing. The effect of this variant on splicing was investigated

by RT-PCR using three PCR primer sets designed to detect any aberrantly

4 analysis of candidate genes 104

CNTNAP2 short isoform

CNTNAP2

Ex. 20 Ex. 21 Exon 22 Ex. 23 Exon 24

3'UTR5' UTR

3'UTR

PCR-A (479bp)

PCR-B (390bp)

PCR-C (370bp)

Exon 1 Ex. 2 Exon 3C>A

C>A

20F 21F

shortF 23R 24R

23R 24R

100 bp

Figure 4.2: Location of variant CNTNAP2:c.3476-15C>A and adjacent exons. Thisvariant possibly affects both isoforms of CNTNAP2, with the variant located in thethree PCRs overlapping the splice site variant. Exon 22 is 240 bp and Exon 23 is 81 bp

spliced variants of CNTNAP2 (Figure 4.2). These amplicons may identify exon

skipping of exon 21 or criptic splice site use altering the length of the transcript.

No PCR products indicating aberrant exon splicing were detected in affected

individuals from CMT54 (Figure 4.3). However, this cannot rule out the pos-

sibility of a neuron-specific splicing defect. As neuronal tissue from affected

patients is unavailable in this study, this can not be readily tested. Sequencing

of a healthy control cohort identified a heterozygous C>A change in 6 of 84

control individuals sequenced (MAF = 3.57%). Therefore, this variant was

considered a non-pathogenic SNP. This variant was subsequently identified as

SNP rs77706740, included in the dbSNP build 132 (release 9/11/2010).

4.3.3.5 CNTNAP2:c.3790-6delC

The variant CNTNAP2:c.3790-6delC is a deletion that lies upstream of exon 22.

Examination of the conservation track of the UCSC genome browser indicated

this nucleotide is not conserved across 10 species, with variable gaps of up to

9 bp present at this location. There are seven SNPs reporting small deletions,

4 analysis of candidate genes 105

Figure 4.3: PCR amplification of cDNAflanking the CNTNAP2:c3476-15C>Avariant. PCR-A and PCR-B amplify cDNAfrom the full length CNTNAP2 transcriptwhereas PCR-C amplifies the short iso-form transcript. Lanes marked (1) and (2)are affected individuals, (+) unaffectedcontrol, (-) PCR negative control, and (M)DNA size marker HyperLadder IV 100-1000bp DNA ladder (Bioline).

2 -

600500400

M +I

A---

I 2 -

300

500400

M

-

--

+

B

+ -

300

500400

M

-

--

I 2

C

4 analysis of candidate genes 106

Cytogenetic map

Genes

Chromosome 7 21.11 31.1 33 q34 35CNTNAP2 short isoform LOC392145 C7orf33 CUL1CNTNAP2

7q35 7q36.1

500 kb

Figure 4.4: Genes in the interval defined by significantly associated haplotypes (Sec-tion 3.3.9). Top panel: chromosome 7 ideogram depicting the location of the smallerlocus. Lower panel: detail of the smaller locus. Figure labels indicate the following:cytogenetic map and known genes. The 5’ end of CNTNAP2 is excluded by the proxi-mal end of the interval. Only the 5’-UTR of CUL1 is included at the distal end of theinterval.

of two to four bases, affecting this base or adjacent bases, with this nucleotide

deleted in the SNPs, rs71188981, rs72268642 and rs72031591 (dbSNP build

130). Analysis of this variant with splicing prediction software indicated no

change with NetGene2 (confidence = 0.97) and a slight reduction in confidence

with ASSP (12.537 > 12.209). With the reported SNPs at this location and the

neutral scores in splicing prediction software, this variant was considered a

non-pathogenic SNP.

4.3.4 Analysis of the interval defined by significantly associated haplotypes

The 1.14 Mb interval defined by the overlapping haplotypes associated with

dHMN (Section 3.3.9) was examined in greater detail. This interval, at the

proximal end of the minimum probable candidate interval contains the 3’

end of CNTNAP2, the 5’ end of CUL1, transcript C7orf33 and pseudogene

LOC392145 (Figure 4.4). Both CNTNAP2 and CUL1 were good candidate genes

and protein-coding exons of both genes have been excluded. The CNTNAP2

3’UTR, CUL1 5’UTR, transcript C7orf33 and pseudogene LOC392145 were

examined using NGS detailed in Chapter 6.

4 analysis of candidate genes 107

4.4 discussion

4.4.1 Gene screening summary

The protein-coding exons and exon-intron boundaries of 35 genes have been

sequenced and no putative pathogenic mutation was identified. Thirty-one

sequence variants were identified in two affected individuals from CMT54,

however, all were excluded as non-pathogenic through: non-segregation with

all affected members of CMT54, being identified in unaffected spouses or

healthy controls, reported as SNPs in dbSNP, or without a putative pathogenic

function. No further genes were sequenced as this candidate based approach

was replaced with a NGS based screen of genes within the 7q34-q36 interval

(Chapter 6).

This application of a positional candidate based approach for gene screening

has been ineffective in identifying the pathogenic mutation. This may be due

to the disease gene having a novel function unrelated to the known dHMN

disease genes, a novel uncharacterised function of a candidate gene, or the

mutation confers a gain of new function to a gene not prioritised. Also, the

possibility of a mutation acting through a regulatory mechanism or being a

CNV need to be considered. These possibilities are considered in Chapter 6

and Chapter 5. Nevertheless, the examination of the protein-coding exons and

flanking intronic sequences for functional candidate genes is the most logical

place to start a gene screen such as this.

The two dHMN genes identified recently, ATP7A and TRPV4 are both novel

genes in dHMN. Both genes had no obvious relationship with the other

known dHMN genes and therefore were only identified after sequencing

numerous other candidate genes. The X-linked dHMN, DSMAX (Section 1.1.7),

was mapped to an interval containing 71 candidate genes, 33 of which were

4 analysis of candidate genes 108

screened prior to the identification of the pathogenic mutation in ATP7A.

ATP7A is a membrane bound copper transporter and ATP7A mutations had

been shown previously shown to cause Menkes disease and OHS, was not

selected as a priority candidate. Similarly, the gene for congenital DSMA

(Section 1.1.9), TRPV4 was not selected as a primary candidate, with the

protein-coding exons and exon-intron boundaries of 19 genes sequenced prior

to the identification of the pathogenic mutation in TRPV4. These recent gene

discoveries have added to the genetic heterogeneity of dHMN.

4.4.2 Was the mutation missed?

The possibility that the pathogenic mutation was missed is unlikely. Dideoxy

sequencing was used in this screen which has a very high accuracy. Only

sequence traces of high quality were used and two affected individuals were

sequenced for most exons. Sequencing can detect any molecular level change

within the size limits of the PCR amplicon sequenced. In addition, all exons

from alternate transcripts of the candidate genes were included. However, this

approach cannot readily identify CNV involving the entire PCR amplicon

or detect allele specific amplification due to additional SNPs within primer

binding sites. An analysis of CNV within the dHMN 7q34-q36 disease interval

was carried out in Chapter 5.

Most candidate genes were focused within the minimum probable candidate

interval, with only a handful of genes outside this interval sequenced. This

interval is defined by recombination events in unaffected individuals in CMT54,

as such the possibility of reduced penetrance means that the defined interval

may be incorrect, however with the information at hand this is the most likely

area. An examination of the remaining genes in the 7q34-q36 interval was

carried out using NGS, detailed in Chapter 6.

5C O P Y N U M B E R A N D S T R U C T U R A L VA R I AT I O N

5.1 introduction

Copy number variation (CNV) has recently been recognised as a major source

of genetic diversity in humans, including variation leading to disease [295].

It is estimated that copy number variation (CNV) affects 5% of the sequence

of the human genome [296], a much greater amount than the 1% affected by

SNPs [295]. This chapter explores CNV at the 7q34-q36 disease interval as a

potential disease mechanism in dHMN.

5.1.1 CNV and structural variation

By the classical definition, CNV refers to deoxyribonucleic acid (DNA) seg-

ments that are present in a genome at an abnormal number when compared

to a reference genome. CNV can result from a change in chromosome number

or unbalanced structural variation within chromosomes. Balanced structural

variation, including inversions or insertion/deletions, where the copy number

remains unchanged, are also a source of genetic variation which must be con-

sidered in disease gene identification strategies. Structural variation can occur

at a scale ranging from large chromosomal regions through to a molecular

level of a single base pair. While there is no set definition of the minimum size

of a CNV, functional definitions have been based on the limit of resolution

achievable with each new technology. With current technology reaching base-

109

5 copy number and structural variation 110

pair resolution, the functional minimum size definition for structural variation

is one exon or 50-100 bp of DNA [295, 297]. Similarly, the mean size of CNV

detected in the genome has decreased with the increasing resolution of new

techniques, with smaller CNV more common [297, Box 2].

5.1.1.1 Chromosomal CNV

Gain or loss of an entire autosomal chromosome has serious clinical conse-

quences. Generally, loss of one or two copies of an autosomal chromosome

(nullisomy and monosomy respectively) are pre-implantation or embryonic lethal

[199]. The gain of an additional autosomal chromosome (trisomy) is also usu-

ally embryonic lethal, however, some forms are survivable but with congenital

abnormalities. Gain or loss of a sex chromosome is clinically less severe, with

affected individuals generally showing only minor abnormalities.

5.1.1.2 Structural variation

Structural variations are rearrangements of DNA segments within homologous

or non-homologous chromosomes. Such rearrangements include: deletions, a

loss of a DNA segment; duplications, a gain of a DNA segment, either in tandem

or dispersed; inversion, the reversal of direction of a DNA segment; insertion,

a gain of a foreign DNA segment from a homologous or non-homologous

chromosome; translocation, the exchange of a DNA segments between non-

homologous chromosomes (Figure 5.1) [297]. Insertions can be either novel

sequences or mobile-elements (transposons). Structural variations can be unbal-

anced, where the copy number is altered, or balanced, where there is no change

in copy number. Complex structural variations, involving a combination of

several types of structural variation, can arise from multiple independent

rearrangements at the same loci or complex rearrangements occurring in one

event.

5 copy number and structural variation 111

The molecular mechanisms by which CNV can cause disease include: gene

dosage, gene interruption, gene fusion, position effects, unmasking of reces-

sive alleles or functional polymorphisms, and transvection effects [295]. Gene

dosage effects are seen when a dosage-sensitive gene is duplicated or deleted

which leads to over expression or haploinsufficiency respectively. Gene in-

terruption occurs when the breakpoint of a CNV or structural variation lies

within a gene, leading to loss of function. Gene fusion occurs when separate

genes or their regulatory elements are brought together by structural variation.

Positional effects involve alteration to the regulatory elements of genes. Thus,

expression of genes located beyond the region involved in a CNV can be af-

fected. Unmasking recessive alleles or functional polymorphisms occurs when

the good copy of a gene is deleted, thus allowing the effect of the recessive

allele to be observed. Transvection is the cross regulation of gene expression

by gene pairs on homologous chromosomes. Loss of one allele or regulatory

elements disrupts this regulation, affecting gene expression.

Most of the known CNV is located in genomic regions with no obvious

phenotypic effect [295]. In several genome-wide CNV studies, up to half of the

CNV identified in each study are present in multiple healthy individuals indi-

cating they are polymorphic [298–304]. Recent genome-wide high resolution

CNV analysis at a population scale has defined both common polymorphic

Deletion Duplication Inversion Insertion

Chr A

Chr B

DerivativeChr B

Translocation

DerivativeChr A

DerivativeChr B

Figure 5.1: Forms of CNV or structural variation at chromosomal or molecular scale.This includes deletion, duplication, inversion, insertion and translocation.

5 copy number and structural variation 112

CNVs (MAF > 5%) and population specific polymorphic CNVs [302, 304]. The

most common structural variation identified is sub-microscopic deletion.

5.1.2 CNV in dHMN and related peripheral neuropathies

Recently, the role of CNV in diseases of the nervous system, including: neu-

rodevelopmental, neurodegenerative, and psychiatric disorders has greatly

expanded [reviewed by 305]. The majority of disease genes in dHMN identified

to date involve point mutations or small deletions affecting only a few bp (Fig-

ure 1.1). An analysis of CNV within 35 peripheral neuropathy genes (including

the dHMN genes: HSPB1, HSPB8, DCTN1, SETX, GARS and BSCL2), did not

identify any putative pathogenic CNV other than known PMP22 duplications

in CMT type 1A (CMT1A) [306].

In rare cases, structural variation has been identified as the cause of dHMN.

In autosomal recessive dHMN6, where over 50 point mutations or small

deletions have been described in IGHMBP2, two cases were observed to carry

structural variations of IGHMBP2, heterozygous with point mutations [78, 79].

The first case involved a deletion of 18.5 kb of DNA spanning exons 3-7 and

the second case involved a complex structural variation with two deletions

and one inversion affecting exons 9-15, creating a premature stop codon.

Both of these CNVs lead to disruption of IGHMBP2 and unmasking of the

recessive allele on the alternate chromosome. Other important examples of

neuropathies caused by structural variation include: CMT1A caused by tandem

duplication of PMP22 [307]; hereditary neuropathy with liability to pressure

palsies (HNPP) caused by the deletion of the same 1.5 Mb region [308]; SMA

caused by deletion or gene conversion of SMN1 (see section Section 1.1.11);

and autosomal dominant leukodystrophy (ADLD) caused by duplication of

LMNB1 [309].

5 copy number and structural variation 113

5.1.3 Methods of detecting CNV

CNVs were first identified by cytogenetic analysis of chromosomal number and

structure at a microscopic resolution. Such techniques developed to achieve

a resolution of approximately 3 Mb. With the advent of sequencing, CNV at

specific targeted loci could be characterised to a base pair resolution. How-

ever, it wasn’t until recently that comprehensive genome-wide analysis of

CNV, at high resolution, has become possible. The technologies that have al-

lowed this include clone and oligonucleotide-based array comparative genomic

hybridisation (aCGH), SNP genotyping arrays, and NGS based CNV analysis.

For reviews on these techniques see Feuk et al. [310] and Alkan et al. [297].

No single technique, as yet, can correctly identify all structural variation

in a genome-wide scan. Karyotyping can robustly identify all the types of

structural variation but is limited by its resolution and cannot identify sub-

microscopic scale structural variation which account for the majority of known

structural variation. Array based CGH, based on relative amounts of probe

signals can only detect CNV, with balanced structural variation not detected

and translocations incorrectly identified as a duplication of the DNA segment

at the donor site. However, the resolution of aCGH is limited only by the

array probe densities, which have become increasingly high. SNP genotyping

arrays are similarity restricted to the identification of CNV and also achieve

a high resolution. Recently, the use of NGS in structural variation analysis

has shown great potential, however the computational and bioinformatics

overheads required still present a significant challenge beyond large research

groups. Structural variation analysis using NGS can be grouped into four

strategies: read-pair methods, which identify structural variation break points

by locating paired-end sequencing reads that are inconsistent with a reference

genome; read-depth methods, which identify CNV based on read depth of

5 copy number and structural variation 114

sequence alignments, similar to PCR based methods; split-read approaches,

which identify structural variation break points by identifying breaks in se-

quence alignment due to multiple reads; and sequence assembly comparisons,

where de novo assembled genome sequences are aligned to identify structural

variation [reviewed by, 297]. These approaches combined have the potential to

identify all types of structural variation, however are limited by the sequencing

read-length, sequence fragment sizes, library construction and uniqueness of

the genome region, factors that limit the effective detectable size of structural

variation to less than 1 kb.

5.1.3.1 Analysis of CNV at 7q34-q36 using aCGH

To examine the 7q34-q36 disease interval for pathogenic structural variation

two analysis were carried out in this thesis; karyotyping to detect microscopic

structural variation and high-density oligonucleotide aCGH targeted to the

7q34-q36 disease interval to detect deletions and duplications. To confirm

a structural variation as pathogenic would require: The structural variation

segregates with disease in CMT54; it is a novel structural variation, not reported

in the normal population or healthy controls; the structural variation has a

putative functional mechanism disrupting a gene or regulatory element; and

the structural variation must be validated by a second method.

Array CGH is a flexible technology that allows the design of custom microar-

rays targeted to specific regions of the genome. In this study an oligonucleotide

array was designed to target the 7q34-q36 disease interval. To detect CNV,

aCGH measures the relative amounts of test and reference DNA which bind

competitively to the array (Figure 5.2). While not providing a characterisation

of all structural variation, high-density aCGH combined with karyotyping

should identify the majority of structural variation within the 7q34-q36 interval.

5 copy number and structural variation 115

Sample DNA

Reference DNA

Digest and label DNA

Cy:5

Cy:3

Hybridise to array

Duplication Deletion

Detect and quantify Cy3:Cy5 signal ratios

Cy3

:Cy5

rati

o

Genomic position

Figure 5.2: aCGH analysis. Genomic DNA from the test sample is digested andlabelled with a contrasting flurophore to a reference DNA mixture. Both labelleddigested DNAs are hybridised to an array and the relative amounts of each DNAfragment are quantified. A plot of the log ration of test over reference identifiesintervals that are duplicated and deleted.

5.1.4 Hypothesis and Aim

It is hypothesised that dHMN seen in family CMT54 is caused by a single gene

mutation within the linked region on chromosome 7q34-q36. This mutation

may be a CNV or structural variation. The aims of this chapter are to:

• Identify microscopic CNV or structural variation using karyotyping of

chromosomal spreads.

• Identify any molecular level CNV using aCGH.

5.2 materials and methods

5.2.1 Custom CGH Microarray

CNV analysis was carried out using the Agilent 60-mer oligonucleotide mi-

croarray CGH for genomic DNA. An Agilent SurePrint G3 custom CGH

microarray 4x 180K was designed using Agilent Technologies Earray software

(release 6.5; https://earray.chem.agilent.com). The 7q34-q36 disease inter-

val and 100 Kb of flanking sequence was targeted with all available probes

5 copy number and structural variation 116

included. The design included 117636 probes to the chromosome 7q34-q36

disease interval, the Human CGH 3k Agilent Normalization Probe Group and

5 × the Human CGH 1k Agilent Replicate Probe Group. The average distance

between probes in the chromosome 7q34-q36 interval was 109 bp.

5.2.2 Array processing

Genomic DNA (gDNA) from the three clinically affected and one married

in control (section 6.2.2) as well as the Agilent SurePrint G3 custom CGH

microarray, was forwarded to the Department of Cytogenetics, the Children’s

Hospital at Westmead (Artur Darmanian, Senior Hospital Scientist) for aCGH

analysis. Arrays were processed following the Agilent Oligonucleotide Array-

Based CGH for Genomic DNA Analysis Protocol Version 5.0, (June 2007), using

the direct method for sample preparation (Figure 5.3). Briefly, for each sample

analysed, an equal amount of control human gDNA (Female 100ug (Promega)

Cat: G1521, Male 100ug (Promega) Cat: G1471) was prepared. Genomic DNA

was digested with AluI and RsaI restriction enzymes. Digested sample and

control DNA were alternatively labelled with random primers and Cyanine

3-dUTP (Cy3) or Cyanine 5-dUTP (Cy5). Following clean-up of labelled gDNA,

samples were prepared for hybridisation by mixing Cy3 and Cy5 labelled

gDNA with blocking and hybridisation buffers. Samples and arrays were

hybridised to the arrays for 19 hours at 65 °C. After washing, slides were

scanned using Agilent Microarray Scanner, model G2505B. Feature extraction

was carried out using the software, Genomic Workbench Standard Edition

version 5.0.14 (Agilent Technologies).

5 copy number and structural variation 117

Experimental sample gDNA

?Restriction digestion of gDNA

?Labelling of gDNA

?Purification of labelled gDNA

?Preparation before hybridisation

?19-hour hybridisation (65 °C)

?Microarray washing

?Microarray scanning

?

Feature extraction

Experimental sample gDNA

?Restriction digestion of gDNA

?Labelling of gDNA

?Purification of labelled gDNA

Figure 5.3: Workflow for Agilent aCGH using the Direct method.

5.2.3 Software Nexus 5

Feature extracted aCGH data was segmented with Nexus 5.0 (BioDiscovery)

using the rank segmentation algorithm. The significance threshold for segmen-

tation was set at 10-6 and the minimal number of probes required per segment

was 3. Copy number gains and loss were defined with a log2 Cy3/Cy5 ratio

of 0.2 and -0.2 respectively. High gains and homozygous losses were defined

with a ratio of 0.6 and -0.7 respectively.

5.2.3.1 UCSC genome browser

Copy number variation coordinates were exported from the Nexus 5 software

in bedGraph file format using the Human Mar. 2006 (NCBI36/hg18) assembly

reference then uploaded to the UCSC Genome browser [http://genome.ucsc.

edu, 210]. CNVs were compared to the following UCSC genome browser tracks:

5 copy number and structural variation 118

UCSC Genes, Database of Genomic Variants: Structural Variation and Segmental

Duplications.

5.2.3.2 CHOP Copy Number Variation project

Copy number variations identified with Nexux 5 software were checked against

the Copy Number Variation project at the Children’s Hospital of Philadelphia

(CHOP) (CNV.CHOP.org), a database of CNVs from over 2000 healthy children.

The UCSC liftover tool (http://genome.ucsc.edu) [210] was used to convert

the coordinates of CNVs from NCBI36/hg18 format generated by the Nexux

5 software, to the NCBI35/hg17 format used by the CHOP Copy Number

Variation project.

5.2.4 PCR confirmation of deletions

PCR amplicons were designed to bridge identified CNV based on the coordi-

nates generated from Nexus 5 software with primers designed to sequences

flanking each end of the identified CNV. PCR and sequencing protocols are

described in Chapter 2. Sequence generated from PCR bridging deletions was

aligned to the Human Mar. 2006 (NCBI36/hg18) assembly reference using the

BLAT alignment tool from the UCSC genome browser [198].

5.3 results

5.3.1 Cytogenetic analysis

Cytogenetic analysis was carried out to investigate if microscopic structural

variations are present in affected individuals in CMT54. Peripheral blood was

collected from clinically affected individual III:14 (Figure 3.5) and forwarded

5 copy number and structural variation 119

to the Department of Cytogenetics, the Children’s Hospital at Westmead (Dr

C Rudduck, Cytogeneticist). Five Trypsin (GTL) banded metaphases were

counted from 10 out of a total of 15 cells. Trypsin (GTL) banding showed a

female karyotype (46, XX) with no abnormalities detected.

5.3.2 CNV analysis at 7q34-q36 using aCGH

The 7q34-q36 dHMN1 disease interval was investigated in detail for deletions

and duplications using Agilent 60-mer oligonucleotide microarray CGH for

genomic DNA analysis. Three affected individuals and one married in control

individual from pedigree CMT54 were analysed. Based on the haplotype

analysis at 7q34-q36 carried out previously, the most genetically distant family

members were selected. The three affected individuals (Figure 3.5: II:1, III:9,

III:12) carry the disease chromosome and a unique alternate chromosome

at 7q34-q36. The healthy control individual (Figure 3.5: II:7) is the married

in parent of affected individual III:9 and therefore shares the non-disease

chromosome at 7q34-q36.

Using the Nexus 5 software a total of 89 unique CNVs were identified

within the 7q34-q36 interval screened at high resolution (Appendix I). These

CNVs included 40 duplications and 49 deletions. These CNVs had a mean

size of 3847 bp, with a range from 4 bp to 110312 bp. The size distribution of

identified CNV, showed most CNVs were sub-microscopic (< 3 kb) with the

largest proportion less than 500 bp (Figure 5.4). This matched the expected

distribution, with smaller CNVs more common [297].

The location of the CNVs were plotted against the 7q34-q36 interval for

the three affected individuals and the control individual (Figure 5.5). Two

heterozygous duplications and six heterozygous deletions were common to

the three affected individuals. The coordinates of the ends of the CNV were

5 copy number and structural variation 120

Perc

en

tag

e o

f C

NV

calls

0

5

10

15

20

25

30

35

40

45

50

CNV size (bp)50

0.0

1.5k

2.5k

3.5k

4.5k

5.5k

6.5k

7.5k

8.5k

9.5k

10k

30k

50k

70k

90k

110k

Figure 5.4: The proportion of CNV calls within each size range. The CNVs werecounted in 500 bp blocks up to 10 kb, then counted in 10 kb blocks. The size distributionof the CNVs identified matches the expected distribution, with smaller CNVs morecommon.

checked to ensure identity. CNVs that were not present in all three affected

individuals were excluded. Of the eight CNVs identified in the three affected

individuals, both duplications and two deletions were also present in the

control individual and excluded. The remaining four deletions, labelled A, B,

C and D were examined further (Figure 5.5).

5.3.2.1 Deletion A

Deletion A involves the loss of up to 178 kb of DNA from within the olfactory

receptor gene cluster at 7q35. The size of the deletion is different between

the three affected individuals; affected individual III:9 shows a single large

deletion of 178 kb, whereas affected individuals II:1 and III:12 have two smaller

deletions of 55.6 kb and 110.3 kb separated by a 12.1 kb gap, maintaining the

same overall size as the deletion in III:9. The probe plot generated with the

Nexux 5 software shows different Log2 cy3/cy5 ratios for the two deletion

patterns in affected individuals (Figure 5.6). Healthy control individual II:7

shows slight gain in Log2 cy3/cy5 ratio across this region, but is below the

threshold for a duplication call.

5 copy number and structural variation 121

123

123

CNVAffectedSamples

CNVHealthySample

7q34 7q35 7q36.1 7q36.2Cytogenetic map

Genes

Chromosome 7

5Mb

DGV Struct Var

MPCI

Del A Del B Del C Del D

Figure 5.5: Array CGH analysis in CMT54. Top: Chromosome 7 ideogram depicting thelocation of the 7q34-q36 disease locus analysed using aCGH (UCSC genome browser).Middle: detail of the 7q34-q36 disease locus. Figure labels indicate the following:cytogenetic map, minimum probable candidate interval (green bar), known genes,and known CNV from DGV. Bottom: Detail of CNV identified with Nexux5 softwarefor three affected individuals and one healthy married in parent from CMT54. Theupper graph shows number of affected individuals carrying deletions (red bars) andduplications (green bars). The lower graph shows CNVs carried by the healthy marriedin parent. Four deletions labelled A-D are carried by all three affected individualsand not by the control individual. Two additional deletions and two duplications arecarried by all three affected individuals as well as the control individual.

The Nexux 5 genome browser indicated that there is a reported CNV at this

location. Examination of reported CNV from DGV using the UCSC genome

browser showed several reported duplications, deletions and inversions at this

location (Figure 5.7). In particular, two CNV reports match the coordinates of

the 178 kb deletion seen in affected individual III:9 (DGV; variation_0726 and

variation_2110). These reports identified both duplication and deletion of this

interval. In addition, two deletions (DGV; variation_94938 and variation_94944)

and two duplications (DGV; variation_82238 and variation_82244) matched the

55.6 kb and 110.3 kb deletions identified in affected individuals II:1 and III:12.

Further complicating the situation, two small duplications were reported which

match the size of the 12.1 kb gap (DGV: variation_70060 and variation_82243).

It is possible that affected individuals II:1 and III:12 carry a duplication of

the 12.1 kb interval in addition to the larger deletion A. However, additional

5 copy number and structural variation 122

55.6 110.3 kb

178 kb

55.6 110.3 kb

Figure 5.6: CNV probeplot around deletionA. Top: Genomic loca-tion, probe design andreported CNV’s. Mainpanels: CNV probe plotsgenerated with Nexussoftware around dele-tion A. The log2 test-over-reference ratios ofhybridisation signals isplotted against genomicposition. Each probe isrepresented by a bluedot and the size of dele-tion is indicated by thered scale bar. The upperthree panels shown af-fected individuals II:1,III:9, and III:12, andthe lower panel showshealthy control individ-ual II:7. Affected in-dividual III:9 shows a178 kb deletion and af-fected individuals II:1and III:12 show twosmaller deletions of55.6 kb and 110.3 kb, in-dicated by the red scalebar below each probeplot. Control individ-ual II:7 has a slightgain across this regionwhich does not reachsignificance.

5 copy number and structural variation 123

chr7 (q35) 13 21.11 31.1 33 q34 35Scalechr7: 100 kb143550000 143600000 143650000 143700000----UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative GenomicsDatabase of Genomic Variants: Structural Variation (CNV, Inversion, In/del)CTAGE4FLJ43692FLJ43692 OR2A1 FKSG35OR2A9PBC040701OR2A7LOC728377CTAGE4 FKSG35FKSG35OR2A9P OR2A42 ARHGEF5ARHGEF5ARHGEF537083296237282949409493682237822359493459622700586497631398457145729493976127006137567 822397006294941 21100726822408224194937649753951194935822428223694942949438223894938 59845372837006082243 94944822447006337284 700578224594945 45737005670059 64974Affected (II:1)Affected (III:12)Affected (III:9)Control (II:7)Figure 5.7: UCSC genome browser views of deletion A. Detail of deletion A andreported CNVs from DGV. The deletions in the affected individuals are shown asorange bars. No CNV is shown for the healthy control individual. UCSC genes andDGV structural variation tracks are shown. Colour in the DGV structural variationtrack indicates the type of variation relative to the reference: purple indicates inversions,red indicates a gain in size, blue indicates loss in size, and green indicates both a lossand a gain in size.

5 copy number and structural variation 124

evidence would be required to confirm this. These size matches are likely due

to the same array platform and probe set being used in the experiments which

reported these CNVs as used in this experiment [311–313]. Numerous other

reported CNVs overlap deletion A.

To confirm the break points of deletion A, three PCR primer sets were de-

signed from the Nexuxs 5 software coordinates. Despite a robust optimisation

protocol, no PCR products were amplified. This suggests deletion A may be

more complex than indicated, or the deletion co-ordinates may be too far from

the true break point to allow PCR amplification.

The interval affected by deletion A contains several olfactory receptor genes

and one candidate gene ARHGEF5, located at the distal end of the deletion

(Figure 5.7). Three alternate transcripts are shown for ARHGEF5, two of which

are partly deleted and one entirely deleted. The deletion or duplication of

ARHGEF5 has been identified in numerous healthy individuals, from different

populations, in at least three studies [311–313]. This suggests that ARHGEF5

CNV is not involved in the pathogenesis of dHMN linked to 7q34-q36 and

deletion A is highly likely to be a common population variant.

5.3.2.2 Deletion B

Deletion B involves the loss of 5.4 kb of DNA from 7q35. The size of this

deletion is between 5.4 kb and 5.2 kb for the three affected individuals. The

probe plot generated with the Nexus 5 software shows the Log2 cy3/cy5 ration

varies between the three affected individuals (Figure 5.8). The healthy control

individual also shows a reduction in the Log2 cy3/cy5 ratio which is below the

threshold for a deletion call. The Nexus 5 genome browser shows a reported

CNV in this location (Figure 5.9 top panel). Examination of reported CNV

from DGV using the UCSC genome browser showed two deletions of similar

size to deletion B (Figure 5.9). Furthermore, no genes are located within or

5 copy number and structural variation 125

nearby this deletion. Therefore deletion B is likely to be a normal population

variant and non-pathogenic in CMT54.

5.3.2.3 Deletion C

Deletion C involves the loss of intronic sequence from CNTNAP2, 14Kb down-

stream and 80Kb upstream of the closest two exons. While CNTNAP2 spans

the proximal end of the minimum probable candidate interval in CMT54, this

deletion is located outside the minimum probable candidate interval. The size

of the deletion is 1.4-1.5 kb in the three affected individuals and is absent from

the control individual. The probe plot generated with the Nexux 5 software

shows an equal reduction in the Log2 cy3/cy5 ratio between the three affected

individuals and no reduction in the control individual. No CNV was reported

in this location by the Nexus genome browser, however, a single deletion at

this location was reported in DGV (DGV; variation_64982) (Figure 5.11).

To check for segregation in CMT54 and screen additional healthy controls,

PCR primers were designed flanking deletion C based on the coordinates from

the Nexus 5 software. The PCR primers were located outside the deletion, with

an expected amplicon size of ∼2 Kb for normal chromosomes and ∼600 bp for

chromosomes with the deletion. Amplification from affected individuals in

CMT54 produced only the 600 bp PCR product. The 2 kb PCR product was not

seen in any affected individuals from CMT54 or 36 control samples indicating

the 2 kb product was not amplified by the PCR as opposed to the deletion being

homozygous. To confirm the specificity of the PCR, the 600 bp PCR product

from an affected individual was sequenced and aligned to the Human Mar.

2006 (NCBI36/hg18) assembly using the BLAT tool from the UCSC genome

browser. The sequence aligned with 100% identity to the flanking sequence

around deletion C (Figure 5.11). This alignment confirmed the size of the

deletion as 1098 bp, smaller than indicated by the Nexux 5 software, possibly

due to a lower number of probes at the distal end of the deletion (Figure 5.11).

5 copy number and structural variation 126

5.4 kb

5.2 kb

5.4 kb

Figure 5.8: CNV probeplot around deletionB. Top: Genomic loca-tion, probe design andreported CNV’s. Mainpanels: CNV probe plotsgenerated with Nexussoftware around dele-tion B. The log2 test-over-reference ratios ofhybridisation signals isplotted against genomicposition. Each probe isrepresented by a bluedot and the size of dele-tion is indicated by thered scale bar. The upperthree panels shown af-fected individuals II:1,III:9, and III:12, andthe lower panel showshealthy control individ-ual II:7. The three af-fected individuals showa common deletion of5.2 - 5.4 kb. Control in-dividual II:7 shows adecrease in the Log2cy3/cy5 ratio which isbelow the threshold fora deletion call. Theprobe coverage in thisregion is shown in theprobe design above.

5 copy number and structural variation 127

chr7 (q35) 33 34 35Scalechr7: 2 kb144548000 144549000 144550000 144551000 144552000 144553000----UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative GenomicsDatabase of Genomic Variants: Structural Variation (CNV, Inversion, In/del)657864979Affected (II:1)Affected (III:12)Affected (III:9)Control (II:7)Figure 5.9: UCSC genome browser views of deletion B. Detail of deletion B andreported CNVs from DGV. The deletions in the affected individuals are shown asorange bars. No CNV is shown for the healthy control individual. UCSC genes andDGV structural variation tracks are shown. Colour in the DGV structural variationtrack indicates the type of variation relative to the reference: blue indicates loss in size.

Using this PCR as a test, the deletion was present in all 10 affected individuals

in CMT54 and the unaffected individual III:3 who carries the proximal end

of the disease haplotype after a genetic recombination event. None of the

remaining unaffected family members or spouses from CMT54 carried the

deletion. A screen of 36 unrelated healthy control individuals identified three

individuals with deletion C (Figure 5.12). With the deletion reported in DGV

and present in three of 36 healthy controls, deletion C was excluded as a

normal population variant.

5.3.2.4 Deletion D

Deletion D involves a loss of sequence from an intragenic region within the

minimum probable candidate interval in CMT54. The closest gene is ACTR3C

∼93 Kb downstream. The size of this deletion ranges from 1.9 kb to 2.6 kb

in the three affected individuals. The probe plot generated with the Nexus 5

software shows a variable Log2 cy3/cy5 ratio for the deletion between the three

affected individuals (Figure 5.13). In addition, the healthy control individual

also shows a reduction in the Log2 cy3/cy5 ratio, but is below the threshold

for a deletion call. In this location, the Nexus 5 genome browser showed a

known CNV and DGV contained 5 deletions, 1 insertion and 1 insertion and

5 copy number and structural variation 128

1.4 kb

1.5 kb

1.4 kb

Figure 5.10: CNV probeplot around deletionC. Top: Genomic loca-tion, probe design andreported CNV’s. Mainpanels: CNV probe plotsgenerated with Nexuxsoftware around dele-tion C. Each blue dotrepresents a probe. TheY-axis represents thelog2 test-over-referenceratios of hybridisationsignals. The X-axis rep-resents the genomic po-sition. The upper threepanels shown affectedindividuals II:1, III:9,and III:12, and thelower panel shows con-trol individual II:7. Af-fected individuals showa 1.4-1.5 kb deletion, in-dicated by the red scalebar below each probeplot. Control individ-ual II:7 does not showany CNV in this region.The probe coverage inthis region is shown inthe probe design above.There are no reportedCNVs in this region.

5 copy number and structural variation 129

chr7 (q35) 33 34 35

Your Sequence from Blat Search

A.

B.

Scalechr7:

Figure 5.11: (A) UCSC genome browser views of deletion C. Detail of deletion C andreported CNVs from DGV. The deletions in the affected individuals are shown asorange bars. No CNV is shown for the healthy control individual. UCSC genes andDGV structural variation tracks are shown. Colour in the DGV structural variationtrack indicates the type of variation relative to the reference: blue indicates loss in size.(B) Characterisation of break points of deletion C. PCR was targeted to the deletionallele and sequenced in two affected individuals. A BLAT search on the Human Mar.2006 (NCBI36/hg18) assembly18 aligned the sequence to the position of deletion C.The deleted area is the arrowed line joining the two sequence blocks. All three CNVresults from the NEXUS CNV analysis over reported the true size of the deletion(1098bp) due to a low density of probes at the 5’ end of the deletion.

M 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

M 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38

Figure 5.12: Genotyping of control individuals for deletion C. Lanes marked M areHyperLadder IV DNA size marker, lanes 1 and 2 are CMT54 positive controls, lane3 is the PCR negative control, Lanes 4-37 are healthy control samples, and lane 38is samples 29-37 pooled before PCR. Normal controls 21, 23 and 34 are positive fordeletion C, indicating it is a normal population variant.

5 copy number and structural variation 130

deletion (Figure 5.14). Therefore, deletion D is a normal population variant

and non-pathogenic in CMT54.

5.4 discussion

5.4.1 Summary

While CNV is a known mechanism in dHMN and the related peripheral neu-

ropathy CMT1A, the evidence presented here suggests that CNV is not the

cause of disease in this instance. Cytogenetic analysis at a microscopic reso-

lution did not indicate any deletion, duplication or other structural variation

present at the 7q34-q36 interval. CNV analysis at a sub-microscopic resolution

using aCGH identified 89 unique duplications and deletions. Four deletions

were present in the three affected family members and not the married in

healthy control. However, no putative pathogenic link could be established for

these deletions and all were excluded. These results are in agreement with the

heterozygosity observed for microsatellite markers across the 7q34-q36 inter-

val (Chapter 3). Microsatellite markers can detect duplications by observing

increased allele dosage or number of alleles, whereas deletions are observed

by loss of heterozygosity. No unexplained homozygosity or additional alleles

were observed during microsatellite analysis.

5.4.2 Comparison of methods for CNV analysis

None of the CNV identified using aCGH were reported by the cytogenetic

analysis. The majority of the CNV identified by aCGH was sub-microscopic,

with the majority of CNV below 1.5 kb. The larger CNV identified ranged

in size from 12-110 kb, well below the 1-10 Mb resolution observable by

5 copy number and structural variation 131

1.9 kb

2.5 kb

2.6 kb

Figure 5.13: CNV probeplot around deletionD. Top: Genomic loca-tion, probe design andreported CNV’s. Mainpanels: CNV probe plotsgenerated with Nexuxsoftware around dele-tion D. Each blue dotrepresents a probe. TheY-axis represents thelog2 test-over-referenceratios of hybridisationsignals. The X-axis rep-resents the genomic po-sition. The upper threepanels shown affectedindividuals II:1, III:9,and III:12, and thelower panel shows con-trol individual II:7. Af-fected individuals showa 1.9-2.6 kb deletion, in-dicated by the red scalebar below each probeplot. Control individualII:7 shows a decreasein the Log2 ratio, how-ever, this does not reachthe cut-off, but doessuggest this individualmay carry this deletion.The probe coverage inthis region is shown inthe probe design above.

5 copy number and structural variation 132

chr7 (q36.1) 33 34 35Scalechr7:chr7:153033636 5 kb 149505000 149510000----UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative GenomicsDatabase of Genomic Variants: Structural Variation (CNV, Inversion, In/del)Duplications of >1000 Bases of Non-RepeatMasked Sequence3709271636647649849495932964 94960Affected (II:1)Affected (III:12)Affected (III:9)Control (II:7)Figure 5.14: UCSC genome browser views of deletion D. Detail of deletion D andreported CNVs from DGV. The deletions in the affected individuals are shown asorange bars. No CNV is shown for the healthy control individual. UCSC genes andDGV structural variation tracks are shown. Colour in the DGV structural variationtrack indicates the type of variation relative to the reference: red indicates a gain insize, blue indicates loss in size, and green indicates both a loss and a gain in size. Theyellow bar shows a segmental duplication on 7q36.1.

cytogenetic analysis. Therefore the karyotyping and CNV analysis are both in

agreement.

The methods used here sufficiently overlap in resolution to conclude that

microscopic structural variation and all scale of insertions and deletions are

unlikely to be responsible. There remains the possibility that sub-microscopic

balanced structural variation may be present within the 7q34-q36 interval. The

resolution of the custom oligonucleotide based aCGH used here was high

enough in most places to identify CNV and overlaps with resolution to detect

indels (CNV) with direct sequencing of exons.

The NGS data generated by targeted sequence capture was not amenable

to structural variation analysis with the computer resources available. The

analysis of this data for evidence of structural variation including inversions

and insertions or the characterisation of the breakpoints of the identified

structural variation to a base pair resolution should be perused if sufficiently

powerful computing resources become available to the research group.

5 copy number and structural variation 133

5.4.3 Limitations of array-based CGH

The CNVs reported in databases have been identified from individuals con-

sidered to be clinically normal. It is possible that apparently normal/healthy

individuals will later develop disease. The larger the cohorts the greater the

chance of including asymptomatic carriers of disease alleles. This has occurred

in the case of the duplication that causes CMT1A. This duplication has been

reported in a population based study of CNV [314] where the deletion was

found in apparently normal individuals. This is also likely to be a concern

with the CHOP variation database which has used healthy children as a sam-

ple group. This limits the usefulness of these as controls to exclude CNV as

pathogenic. To use reported CNVs as controls for excluding pathogenic CNV,

the CNV needs to be either; identified in multiple individuals, at a rate above

the frequency of the disease in the general population; or from a control group

that has been clinically examined and is not at risk of developing the disease.

The prevalence of autosomal dominant dHMN is much lower than CMT1A

(20 per 100 000) and CMT2 (2 per 100 000) [315], so reported CNV are suitable

for filtering CNVs in dHMN.

The size of reported CNVs also needs to be considered when filtering po-

tential CNV mutations. The size of reported CNVs is generally over-estimated

due to the limited resolution of the techniques used to create initial CNV

maps [295]. The large CNVs reported from platforms with limited resolution

may contain several smaller CNVs. The sequencing of deletion C here, shows

that with a high probe density, the size of the deletion was only slightly over

represented by the aCGH results. Thus, the size of CNVs within 7q34-q36 is

relatively accurate and can be compared to reported CNVs identified with

similar platforms.

6N E X T G E N E R AT I O N S E Q U E N C I N G O F C M T 5 4

6.1 introduction

The recent development of new sequencing technologies has revolutionised

the strategies for the identification of genetic defects underlying disease. These

new sequencing technologies are collectively referred to as next generation

sequencing (NGS) and encompass; DNA template preparation, sequencing,

data handling and bioinformatics [316]. NGS can generate very large amounts

of sequence data. This has led to a shift in focus from data generation (template

preparation and sequencing) to data analysis and validation. NGS is particular-

ity useful for the identification of disease genes and for targeted re-sequencing

of disease intervals defined by linkage analysis or genome-wide association

studies (GWAS). The development of NGS continues to rapidly advance and

associated costs have dropped so that NGS has become increasingly accessible.

This chapter details the application of NGS for the identification of genetic

variation in family CMT54 at both the 7q34-q36 dHMN disease interval and

elsewhere in the genome.

6.1.1 NGS technology

Multiple NGS platforms were used in this study. As described below, NGS

platforms vary in the methods and biochemistry used in each step. For detailed

134

6 next generation sequencing of cmt54 135

descriptions of currently available NGS platforms see reviews by Metzker [316]

and Shendure and Ji [317].

6.1.1.1 Template preparation

For the NGS platforms used here, template preparation, or library genera-

tion, involves the fragmentation of template DNA followed by immobilisation

on solid supports. The immobilisation of template fragments is the key to

achieving the massive parallel sequencing required to achieve high sequencing

throughput and low costs. For the detection of fluorescent sequencing sig-

nals, most instruments require amplification of the DNA template fragments.

Template amplification is generally achieved by emulsion or solid phase PCR.

Both approaches use an adapter ligated shotgun library and a single-primer

set, multi-template PCR. Physical separation of template fragments is used

to maintain single template specificity in PCRs, otherwise known as clonal

amplification.

For example, the 454 sequencing system (Roche) uses a water in oil emulsion

PCR where the template fragment concentration is controlled to achieve, at

most, a single template molecule in each emulsion reaction compartment [318].

PCR products are captured to micron-scale beads included in the emulsion.

After breaking the emulsion each bead contains PCR products from a single

template fragment (Figure 6.1 a). Beads are immobilised by chemical attach-

ment to a glass slide or picotitre plate. Alternatively, the Solexa sequencing

system (Illumina) uses bridge PCR, with PCR primers fixed to a solid support.

Adapter ligated template fragments are bound to complementary probes on

a sequencing slide. The concentration of template fragments is controlled to

achieve physical separation of initial template fragments. After PCR amplifica-

tion, template clusters contain copies of a single template fragment (Figure 6.1

b). Increased instrument sensitivity allows the Helicos sequencing systems

(Helicos Biosciences) to sequence from single-molecule templates, avoiding the

6 next generation sequencing of cmt54 136

Figure 6.1: Clonal amplification of immobilised template DNA in NGS. (a) Water-in-oil emulsion PCR. An adapter-ligated shotgun library is PCR amplified withinemulsion compartments. One PCR primer is attached to the surface of micron scalebeads included in the reaction. The concentration of template and beads is adjustedto achieve one template molecule per compartment. After breaking the emulsion,beads contain only one type of template molecule which are then fixed to a slideor picotitre plate. (b) Bridge PCR. Both PCR primers densely coat the surface of thesolid support slide. The initial concentration of the adapter-linked shotgun library iscontrolled to achieve physical separation between template molecules. PCR productsremain tethered near the original template fragment resulting in clonal clusters ofPCR products from single templates. Figure adapted from [317].

need for template amplification. As no amplification is needed, templates are

bound directly to supports with a primer or other adapter probe.

6.1.1.2 Sequencing Biochemistries

The NGS platforms used in this thesis are based on pyrosequencing and

reversible terminator sequencing biochemistries. Several other sequencing

biochemistries are currently available or under development and include

sequencing by ligation [319], semiconductor based sequencing [320] and real-

time sequencing [321].

Pyrosequencing is based on the detection of the pyrophosphate released

upon nucleotide incorporation during strand extension by DNA polymerase

[322, 323]. In brief, one of the four dNTP types is added to the sequencing

reaction and pyrophosphate released stoichiometrically is detected by chemi-

luminescence through a a set of enzymatic reactions using sulfurylase and

luciferase. In this reaction, pyrophosphate is converted to ATP by sulfurylase,

6 next generation sequencing of cmt54 137

which in turn is used by luciferase in the conversion of luciferin to oxyluciferin,

generating light in proportion to the amount of available ATP. Nucleotides are

sequentially cycled to generate a fluorescence flow diagram from which the

sequence can be read [324].

Reversible terminator based sequencing utilises cyclic addition of dNTPs

modified with a reversible terminator capping the 3’-OH and fluorescent

tag used in a similar way to dideoxynucleotides (ddNTPs) used in Sanger

sequencing [316]. In brief, after each nucleotide extension and imaging, excess

dNTPs are washed off and the fluorescent label and the strand terminator are

cleaved, restoring the 3’-OH, allowing extension in the subsequent extension

cycle. NGS systems using reversible terminators have been developed using

both four color individually tagged dNTPs (used here) [325] or single color

tags with sequential extension [326].

6.1.1.3 Data handling and bioinformatics

The current NGS instruments produce short sequence reads from the the end(s)

of longer DNA fragments making up the sequencing library. Read lengths

produced by current instruments include ’short-reads’ of 25-100 base pair (bp)

or ’long-reads’ of 250-800 bp [316]. Reads can be made from a single end of a

DNA fragment or can be read from both ends as ’paired-end’ reads. Paired-end

reads have advantages in assembly accuracy over single reads, in particular for

repeat and duplicated sequences [327]. This is a greater limitation for short-

reads than long-reads. Sequence reads are aligned to form longer sequence

contigs by mapping to a reference genome [328] or de novo assembly [329]. For

disease gene discovery, further bioinformatics may include, sequence variant

calling, annotation (genes, exons, known SNPs), copy number variation, and

segregation/inheritance analysis.

6 next generation sequencing of cmt54 138

6.1.2 NGS of large genomes

NGS is ideally suited to whole genome sequencing, particularly for smaller

prokaryote genomes [318]. While this is the ultimate approach for identifying

sequence variation within an individual, this remains costly in large genomes,

especially where higher read depths are required for accurate identification

of heterozygous variants [330]. To overcome this, several non-cloning based

approaches to genomic partitioning have been developed to enrich for genomic

regions of interest suitable for NGS. These include pooled PCR products,

multiplex PCR, and hybridisation sequence capture [331]. PCR based targeted

enrichment still requires a large preparation workload and is best suited to

small numbers of targets.

Hybridisation based sequence capture, either array-based or solution-based,

has allowed the generation of DNA libraries targeting specific genomic in-

tervals at an effective speed and cost [332]. Hybridisation sequence capture

is based on the construction of a small-fragment DNA library of defined se-

quences of interest to which sample DNA is hybridised; thus allowing for

uniform recovery of sequences of interest [333]. This was demonstrated by the

targeted capture of all known coding exons of the human genome [332]. The

protein coding portion of the genome is referred to as the exome. Using larger

standardised arrays has reduced the cost of exome capture and sequencing

[334]. In addition, as most known disease causing mutations are located in

coding regions, the exome is a high yield target for research [324]. Exome

sequencing has recently become a popular approach for the identification of

Mendelian disease mutations [335].

6 next generation sequencing of cmt54 139

6.1.3 Applications of NGS to Mendelian disease

With many human genomes now sequenced, the number of known single bp

sequence variants or small indels is over ∼3.4 million [336]. An examining of

only the protein coding portion of the genome identified an average of 17 272

coding variants per individual [334]. As mutations causing rare Mendelian

diseases are expected to be highly penetrant, SNPs that are common can be

excluded. The majority of sequence variants within an individual are common

SNPs. Studies have shown that around 92% of coding variants identified or 82%

of all sequence variants, were previously reported in SNP databases [334, 336,

337]. The identification of a single highly-penetrant disease causing variant

from several thousand variants requires a comparison of several individuals

and filtering strategies to reduce the number of variants to be validated.

The first successful application of NGS to the identification of a mutation in

a Mendelian disease was in the recessive disease, Miller syndrome [338]. Four

affected individuals including two siblings and two other affected individuals

from three families were exome sequenced. Under an autosomal recessive

model, only one candidate gene was shared between individuals from the

three families. Compound heterozygous mutations were identified in the gene

DHODH in all affected individuals. Confirmation with Sanger sequencing

showed that the mutations were inherited in a autosomal recessive manner,

and parents were heterozygous carriers of the mutations. DHODH mutations

were subsequently identified in an additional three Miller syndrome families

using Sanger sequencing.

Combining sibling pairs and unrelated families is effective in recessive

diseases with low genetic heterogeneity. Autosomal dominant diseases are

generally more genetically heterogeneous, including dHMN and other neu-

ropathies where each gene mutation is only responsible for a small proportion

6 next generation sequencing of cmt54 140

of cases. As such, unrelated individuals are less likely to have mutations in the

same gene. Multiple individuals from a single pedigree are therefore required

for comparison. In the analysis of Miller syndrome by Ng et al. [338], both

autosomal recessive and dominant modes of inheritance were considered;

using a dominant model increased the number of potential gene variants by a

factor of 14-25 × compared to the recessive model. Due to the shared genetic

background of individuals in a single pedigree, there is a greater number of

genetic variants segregating with the disease by chance. In a practical sense, an

autosomal dominant disease requires the comparison of four related affected

individuals to achieve a similar number of candidate variants as a recessive

disease comparing three individuals [334, 338]. The number of potential vari-

ants does not significantly decrease by adding additional affected individuals

above 4 or 3 for dominant and recessive modes of inheritance respectively.

In the Miller syndrome analysis, comparison to eight unrelated control

individuals was a more effective way of reducing the total number of sequence

variants [338]. Under the autosomal dominant disease model, comparing two

affected individuals and 8 unrelated controls achieved a similar number of

novel variants as comparing only the four affected individuals. However, with

rapidly expanding public sequence variation databases, the effect of comparing

additional control individuals will be less.

Additional strategies for filtering out SNPs unrelated to disease have been

employed. Various software can predict the effect of a sequence variant on

protein function and implicate those likely to be damaging. However, there is

a risk of falsely excluding a true mutation due to inaccurate predictions [338].

Where the disease in a large pedigree has been mapped to a chromosomal locus,

combining NGS with mapping information can greatly reduce the number

of novel candidate variants. Sequencing of fewer individuals is required to

identify a disease mutation [339]. Based on the reported rates of variation

and linkage disequilibrium, ∼4 novel non-synonymous, splice site or indel

6 next generation sequencing of cmt54 141

(NS/SS/Indel) variants would be expected within a disease interval containing

100 genes when sequencing one affected individual [339].

6.1.3.1 Hypothesis and aims

It is hypothesised that dHMN seen in family CMT54 is caused by a single gene

mutation within the linked region on chromosome 7q34-q36. With a defined

disease locus containing a large number of candidate genes, the use of NGS

has the potential to identify the disease gene mutation in CMT54. The aim of

this chapter is to:

• Capture and sequence genomic regions of interest in family CMT54 using

NGS in order to identify the genetic variant causing dHMN.

6.2 materials and methods

6.2.1 NimbleGen array-based targeted sequence capture

6.2.1.1 NimbleGen sequence capture array design

Two custom 385K 1-plex arrays were designed and manufactured by Roche

NimbleGen to capture genomic sequence from the 7q34-q36 disease interval.

The 12.98 Mb interval was defined by genetic recombinations in affected

individuals in CMT54 (NCBI36/hg18 assembly coordinates; chr7:141 054 291-

154 034 394). Unique probes were designed to 13 348 tiled regions covering 9.03

Mb of sequence within the target interval. Probe tiling excluded repetitive

DNA unsuitable for capture.

6.2.1.2 NimbleGen DNA sequence capture

Genomic DNA was purified from peripheral blood following the protocol

described previously (Section 2.3.1). DNA was tested for purity by UV spec-

6 next generation sequencing of cmt54 142

trophotometer and degradation by agarose gel electrophoresis according to

NimbleGen quality control (QC) requirements. DNA (∼ 40 µg) was forwarded

to Roche Diagnostics (Asia Pacific) for sequence capture using the custom

385K arrays, following standard manufacturer protocol (Roche NimbleGen).

In brief, approximately 20 µg of genomic DNA was fragmented to a size range

of 300-500 bp then purified. Adapters were ligated to the DNA fragments

using T4 DNA Ligase and small fragments (< 100 bp) removed. The library

was hybridized to the custom 385K arrays. Un-bound DNA fragments were

removed by washing followed by elution of target DNA fragments. Eluted

target DNA fragments were PCR amplified using primers complementary to

the adapter sequence. The amplified DNA library was used in subsequent

sequencing reactions.

6.2.2 454 GS FLX sequencing

The Roche NimbleGen target captured DNA was forwarded to Australian

Genome Research Facility (AGRF) for sequencing using the Roche GS FLX

(454) sequencing platform according to the manufacturer’s protocol. Sequence

reads were aligned to the Human Mar. 2006 (NCBI36/hg18) assembly using

gsMapper software (Roche). Identification of sequence variants required at

least two non-duplicate reads with at least 5 bp of sequence flanking the variant.

Sequence variants were defined as ’high-confidence’ if they were identified in

three non-duplicate reads and both forward and reverse directions. Variants

were annotated to the SNP database, dbSNP build 129.

6 next generation sequencing of cmt54 143

6.2.3 NimbleGen array-based exome capture and Solexa sequencing

Genomic DNA from two affected individuals and one ’married in’ unaffected

parent was forwarded to BGI (Hong Kong) for exome capture and sequenc-

ing. Exome capture was carried out using the NimbleGen Sequence Capture

Human Exome 2.1M Array following the manufacturers protocol (Roche Nim-

bleGen). In brief, genomic DNA was randomly fragmented to an average size

of 500 bp then purified. Adapters were ligated to the DNA fragments using

T4 DNA Ligase and small fragments (< 100 bp) removed. The library was

hybridized to a NimbleGen 2.1M Human exome array. Unbound DNA frag-

ments were removed by washing followed by elution of target DNA fragments.

Eluted target DNA fragments were randomly fragmented and Illumina com-

patible adapters ligated. PCR amplification using primers complementary to

the Illumina adapters was used to enrich the material for downstream Solexa

sequencing.

Sequencing was carried out using an Illumina Genome analyser according

to the manufacturers protocol. Adapter sequences within reads were removed

and the remaining read sequence used only if longer than 29 bp (BGI local

programming algorithm). Sequence reads were aligned to the Human Mar.

2006 (NCBI36/hg18) assembly using SOAP aligner software using one read

end of paired end reads [340]. A maximum of 6 sequence variants were allowed

per read. Consensus sequence and sequence variant calling was carried out

using SOAPsnp software. Sequence variants were annotated to dbSNP build

129.

6 next generation sequencing of cmt54 144

6.2.4 Confirmation of sequence variants using Sanger sequencing

Sequence variants were confirmed by PCR amplification of selected exons and

sequencing by Macrogen (Korea) as described in Section 2.3.6.

6.2.5 Sequence variant analysis using Galaxy Browser

Sequence variants were analysed using Galaxy Browser software (http://

galaxyproject.org/) [341, 342]. Sequence variants were annotated using ge-

nomic coordinates for UCSC genes (UCSC genes track) downloaded from the

UCSC table browser via Galaxy. Public sequence variation databases used

included dbSNP 129, dbSNP 130, and dbSNP 132 downloaded via the UCSC

table browser.

6.2.6 NGS assembly and annotation using Bowtie and SAMtools

Solexa sequencing reads were aligned to the Human Mar. 2006 (NCBI36/hg18)

assembly using Bowtie software version 0.12.5 in SAM mode. SAMtools soft-

ware version 0.1.8 was used to convert the Bowtie alignment to a pileup

format which describes the base-pair information at each chromosomal posi-

tion including SNV/indel calling, read-depth and quality scores. The Bowtie

alignment was converted from bam to sam format, sorted, then indexed and

output to a pileup formatted file. This version of Bowtie does not support

indel calling.

6.2.6.1 Sequence coverage visualisation

A custom BAsH script was written to extract read-depth coverage data from

SAMtools pileup format and create a bar graph in bedgraph format that

6 next generation sequencing of cmt54 145

can be visualised in the UCSC genome browser. This software allows the

visualisation of sequencing coverage at a single base level in an annotated

graphical environment. This script is shown in Appendix A.

6.2.7 Unequal coverage comparison

A custom BAsH script was written to check segregation of variants between

samples using an autosomal dominant model of inheritance. The script inspects

the read-depth at a chromosomal position before excluding variants. Only

variants with 10 × coverage are used to exclude variants. Comparisons are

based on the annotated position of variants as used in the UCSC genome

browser and requires all samples to use the same genome reference. First,

the software collates the reported variants from all samples. Segregation of

each variant is checked between the samples. If the variant is not identified

in a sample, the read-depth at the corresponding position is checked from

the pileup file. Where the sequence read-depth is >= 10 × and no sequence

variant is identified, the individual is considered as wt! (wt!) or equal to the

annotation reference. Where read-depth at the position of a variant is below 10

× the variant is flagged as uninformative. This script is shown in Appendix A.

6.2.8 In silico prediction of functional impacts of sequence variants

The software tools used for prediction of functional impacts of sequence

variants were: multiple sequence alignment (MSA) to identify conserved nu-

cleotides; the sequence based prediction tools SIFT (http://blocks.fhcrc.

org/sift/SIFT.html ) [343] and PANTHER (http://www.pantherdb.org/tools/

csnpScoreForm.jsp) [344]; the Sequence and structure-based prediction tools

SNAP (http://www.rostlab.org/services/SNAP/) [345] and Polyphen (http:

6 next generation sequencing of cmt54 146

//genetics.bwh.harvard.edu/pph/) [346, 347]. Web interfaces were used for

all tools.

6.3 results

6.3.1 Chromosome 7q34-q36 target region sequencing

The chromosome 7q34-q36 disease interval was sequenced in one affected

individual using targeted sequence capture and 454 sequencing. Peripheral

blood DNA from affected individual CMT54-III:14 (Figure 3.5), met quality

control standards (A260/280 = 1.83 with a high molecular weight by agarose

gel electrophoresis (AGE); and was captured using two 385K 1-plex arrays

(NimbleGen). Capture probes were designed against all non repeat-masked

sequences within the 7q34-q36 interval, covering 9.03 Mb of the 12.98 Mb

region (69.63%).

Captured DNA was forwarded to Australian Genome Research Foundation

(AGRF) Ltd. (Brisbane, Australia) for 454 sequencing and read alignment. A

total of 1 063 758 QC passed sequence reads were generated as single-end reads

with an average read length of 362 bp (Table 6.1). These reads were mapped

against the Human Mar. 2006 (NCBI36/hg18) assembly. Approximately 46%

of reads mapped to the 7q34-q36 interval, ∼53% mapped to regions outside

7q34-q36 or mapped to multiple locations (repeat-mapped) and 0.54% were un-

mapped. The repeat-mapped and unmapped reads were not used for sequence

variant identification. Reads mapping to the 7q34-q36 interval generated 5186

sequence contigs covering 9 501 821 bp (73.19% of the 7q34-q36 interval) with

an average of 18.84 × coverage. As DNA fragments and sequencing reads

are longer than the capture probes, sequence contigs cover 3.56% more of the

7q34-q36 disease interval than the area tiled by array capture probes. A total

6 next generation sequencing of cmt54 147

Table 6.1: Summary statistics for targeted sequencing of CMT54 family memberIII:14.

Sequencing reads 7q34-q36 coverageTotalreads∗

Uniquelymapped

7q34-q36mapped

Calledbases Depth† 7q34-q36

coverage‡

1 063 758 757 413 494 387 178 999 610 18.84 × 71.73%∗ Reads uniquely mapping to the Human Mar. 2006 (NCBI36/hg18) assembly (chr7 =

558261 reads and other chromosomes = 199152 reads).† Mean sequence depth across all sequence contigs mapping to 7q34-q36.‡ Percentage of 7q34-q36 covered by sequence contigs.

of 13 550 high confidence variants and 22 680 lower confidence variants were

identified within the 7q34-q36 interval.

6.3.2 7q34-q36 sequence variant analysis

The 13 550 high confidence sequence variants identified within the 7q34-q36

interval were filtered to identify putative pathogenic variants (Figure 6.2). After

excluding all variants annotated as SNPs (dbSNP 129), there were 3193 novel

variants. Novel variants were filtered using Galaxy browser software [341, 342]

against genomic intervals from the Human Mar. 2006 (NCBI36/hg18) assembly

to identify non-synonymous, splice site, 5’ UTR and 3’ UTR variants. Sequence

variants identified included single nucleotide variations (SNVs) and small

insertion or deletions (indels).

6.3.2.1 Novel non-synonymous variants

Twenty-four novel non-synonymous changes were identified (Table 6.2). Two

variants were reported in an updated dbSNP database (dbSNP 130) and

excluded. To identify false positives arising from reads mapping to repeat

sequences, 20 bp of consensus sequence flanking each variant was aligned to

the Human Mar. 2006 (NCBI36/hg18) assembly using the BLAT alignment

tool [198]. Six variants were identified as part of chromosomal translocations,

6 next generation sequencing of cmt54 148

High confidence variants13550

?

dbSNP 129 filtered variants3193

?

Exons40

?

Non-synonymous24

?

Synonymous16

?

Splice site0

?

5’ UTR7

?

3’ UTR9

?

Intronic splicing13

? ?

Regulatory13

Figure 6.2: Filtering strategy for variants identified by targeted capture sequencing of7q34-q36. The number of sequence variants at each filter step is indicated.

with the variant present on the alternate chromosome as either the normal

sequence or as a SNP at the corresponding position (Appendix H, Table H.2).

These variants were excluded as false positives. The remaining 16 variants

were re-sequenced using Sanger sequencing in all CMT54 family members:

Four variants did not segregate with affected individuals and 12 variants were

not present, indicating they were false positives from NGS.

6.3.2.2 Splicing variants

Sequence variants possibly affecting exon splicing were defined as: intronic

variants within 50 bp, or exonic changes within 5 bp of intron-exon bound-

aries. This covered variants affecting the core splice site motif, intronic splice

enhancers and the exonic splicing motif. Thirteen novel intronic variants, and

no exonic variants were identified (Table 6.3). One variant, CNTNAP2:c.3476-

15C>A, was previously excluded by Sanger sequencing of candidate genes

(Section 4.3.3). A second variant was reported in an updated dbSNP database

(dbSNP 130). These 12 variants were re-sequenced using Sanger sequencing in

6 next generation sequencing of cmt54 149

Table 6.2: Novel non-synonymous variants identified by targeted capture and se-quencing of the 7q34-q36 disease interval.

Gene Variant AAchange

Seq.depth

Variantfreq.

Validationresult†

PRSS2 c.416C>G S139C 5 0.6 FPPRSS2 c.449C>T S150F 6 0.67 FP

ZNF398 c.732C>A P244Q 21 0.14 FPKRBA1 c.307G>A E103K 28 0.11 FPSSPO c.2286G>T L762F 16 0.62 rs73168055SSPO c.9212_9213insC C3071R 11 1 rs71194663SSPO c.14297C>G H4766D 10 0.6 SNP ns

GIMAP5 c.914A>G H305R 28 0.11 SNP nsABCF2 c.154_155insTT G52fs 15 0.2 FPABCF2 c.151A>G T51G 15 0.27 FPABCF2 c.147G>C E49D 14 0.29 FPABCF2 c.143G>A R48K 13 0.31 FPABCF2 c.126_129delCTTC K43fs 13 0.31 FPABCF2 c.96T>C V32A 11 0.36 FPABCF2 c.65G>A R22Q 10 0.4 FPABCF2 c.59G>T R20L 10 0.4 FPMLL3 c.2689C>T R897X 3 1 Chr dupMLL3 c.2674G>A G892R 3 1 Chr dupMLL3 c.177_178delGA K60fs 12 0.25 SNP ns

CCT8L1 c.532A>G N178D 4 0.75 Chr dupCCT8L1 c.619A>G T207A 4 0.75 Chr dupCCT8L1 c.626A>C H209P 4 0.75 Chr dupCCT8L1 c.700G>A A234T 4 0.75 Chr dup

DPP6 iso2‡ c.283T>C S95P 27 0.11 SNP ns† Validation result indicates: SNP ns, the variant was validated in additional family mem-

bers and did not segregate with the disease; FP, false positive, not reproduced indicateSNPs not identified when re-sequencing using Sanger sequencing of the affected individ-ual sequenced by next generation sequencing; Chr dup, the SNP is part of a chromosomalduplication as detailed in section 5.3.1.3; rs#, the variant is a validated SNP reported indbSNP build 130.‡ This change is reported in a short transcript of DPP6 isoform 2 (UCSC gene: uc0101qh.1).

The transcript contains exons 1-3 with an additional 24 amino acids and stop codonfollowing exon 3.

6 next generation sequencing of cmt54 150

Table 6.3: Novel sequence variants possibly affecting splicing identified within the7q34-q36 interval by targeted DNA capture and 454 sequencing.

Gene Variant Type Seq.depth

Variantfreq.

Validationresult†

TRPV5 c.909+36T>A donor 18 0.17 FPCNTNAP2 c.1499-13T>C acceptor 28 0.11 FPCNTNAP2 c.3476-15C>A acceptor 20 0.45 +‡

ZNF398 c.776-18T>A acceptor 18 0.17 SNP nsGIMAP1 c.46-51_46-50delAG acceptor 26 0.12 FPPRKAG2 c.1437+46T>C donor 10 0.4 SNP ns,

rs73156279MLL3 c.1185-29C>G acceptor 6 0.67 FPMLL3 c.1185-9A>G acceptor 8 0.5 FPMLL3 c.2653-17G>A acceptor 3 1 FPMLL3 c.2653-37G>A acceptor 3 1 FPMLL3 c.3500-46C>T acceptor 4 0.75 FPMLL3 c.3500-23G>A acceptor 4 0.75 FPDPP6 c.457+13T>C donor 27 0.11 SNP ns

† Validation result indicates: +, variant segregates with three affected individuals and notthe control sample sequenced; SNP ns, variant was identified but did not segregatewith the three affected and not the control individual. FP, false positive, variant was notidentified in re-sequencing. rs#, the variant is a validated SNP reported in dbSNP build 130.The affected individual sequenced using NGS was one of the three affected individualsre-sequenced.‡ Variant is a novel SNP characterised in (Chapter 4).

three affected and one unaffected CMT54 family members. Three variants did

not segregate with the disease and nine variants were false positives.

6.3.2.3 5’ and 3’ UTR variants

Seven and nine novel variants were identified within the 5’-UTR and 3’-UTR

of known genes respectively (Table 6.4). One variant, DPP6:c.*347A>G, was

previously excluded by Sanger sequencing of candidate genes (Section 4.3.3).

Re-sequencing of the remaining 15 variants using Sanger sequencing in three

affected and one unaffected CMT54 family members excluded 4 variants which

did not segregate with the disease and 10 variants as false positives. The variant

6 next generation sequencing of cmt54 151

Table 6.4: Novel variants within the 5’ and 3’ UTR of known genes in the 7q34-q36disease interval.

Gene Variant Seq.depth

Variantfreq.

Validation result

5’ UTR sequence variantsGIMAP7 c.-32G>A 18 0.33 SNP ns

TMEM176B c.-170A>G 5 0.6 +, rs114949263ABCF2 c.-18C>A 11 0.36 FPABCF2 c.-31C>T 11 0.36 FP

CSGLCA-T c.-429T>C 16 0.19 FPPRKAG2 c.-346_357del 21 0.43 FPPRKAG2 c.-414T>C 24 0.42 +, rs117690714

3’ UTR sequence variantsCLEC5A c.*815_*816delTA 20 0.15 FPZNF783 c.*2008T>G 23 0.13 FPABCF2 c.*214G>A 22 0.27 FPABCF2 c.*184G>A 21 0.29 FPABCF2 c.*181C>T 22 0.27 FPABCF2 c.*178G>A 22 0.27 FPABCF2 c.*166G>A 7 0.57 SNP ns

GALNT11 c.*329T>A 14 0.21 +, rs117690714DPP6 c.*347A>G 23 0.57 + ‡

† Validation result indicates: SNP ns, the variant was validated in additional family membersand did not segregate with the disease in CMT54; +, variant segregates with the diseasein CMT54; FP, false positive, variant not identified when re-sequenced using Sangersequencing; rs#, the variant is a validated SNP reported in dbSNP build 132.‡ Variant is a novel SNP characterised in (Chapter 4).

GALNT11:c.T>A, which segregated with affected individuals from CMT54,

was identified in an updated dbSNP database (dbSNP 132).

6.3.2.4 454 sequencing quality

No putative pathogenic mutation was identified from the novel sequence

variants identified within the 7q34-q36 interval. Of the three novel variants pre-

viously identified in affected individual III:14 using Sanger sequencing (Chap-

ter 4), two were identified here (CNTNAP2:c.3476-15C>A and DPP6:c.*374A>G)

and one variant was missed (ACCN3:c.915C>T). The ACCN3 variant was also

6 next generation sequencing of cmt54 152

Contigs created by Roche NimbleGen 454 sequencing

UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative Genomics

Contig 324Contig 325

Chr7

Figure 6.3: 454 sequencing contigs at the ACCN3 gene. Top: Chromosome 7 ideograph:The region indicated by the red bar is expanded below. Lower: UCSC Genome Browserimage detailing detailing the two 454 sequence contigs that span the ACCN3 gene.The known SNV within exon 5 was missed, as both exons 5 and 6 are not covered byeither contig 324 or 325.

not identified in the lower confidence variant report. The ACCN3 gene was

covered by two sequence contigs, ACCN3 with exons 5 and 6 in the gap be-

tween these two contigs (Figure 6.3). The known SNV is located within exon

5, and hence, was not identified in the variant report. The low confidence

variants were briefly examined for changes within good candidate genes with

previous Sanger sequencing coverage. However, due to the high false positive

rate encountered with these and the high confidence changes previously, this

analysis was stopped. This sample was later re-sequenced with a higher quality

(Section 6.3.4), minimising the potential that the true variant may have been

reported here and missed.

Sequence variations within 7q34-q36 which were missed may be either in

the gaps between sequence contigs or within contigs with low read-depth. The

majority of gaps between the 5186 sequence contigs consist predominantly of

short sequence gaps < 10 kb in length. There were 18 large sequence gaps >

10 Kb totalling 467 kb, 3.6% of the disease interval (Table H.1). Examination of

the 18 large sequence gaps identified four gaps containing candidate genes:

gap 4 contained the entire FAM115C gene and gaps 11, 15, 17 and 18 contained

individual exons of the genes ARHGEF5, ACTR3B and DPP6 (Figure 6.4). The

remaining large sequence gaps were intergenic.

To confirm that genomic intervals within sequence contigs are free of se-

quence variants requires a minimum read-depth of 10 ×. This limits the

6 next generation sequencing of cmt54 153

0

50

10

Dep

th (

x)

Sequencing depth at reported variants

RefSeq Genes

7q34-q36 sequencing gaps of > 10 kb

Scalechr7:

GAP1GAP2GAP3GAP4GAP5

GAP8GAP9GAP10GAP11

GAP12GAP13GAP14

GAP15

GAP18GAP16

GAP17

GAP6GAP7

5Mb

20

30

40

chr7

Sequencing contigs

Figure 6.4: UCSC genome browser view of the 7q34-q36 region showing sequencingcoverage AGRF sequencing (NimbleGen). Top: Chromosome 7 ideograph: The 7q34-q36 target region indicated in the red box is expanded below. Lower): Detail of the7q34-q36 target region. Gaps greater than 10 Kb are shown as black bars labelled 1to 18. The sequencing read-depth at each reported variation is shown as a purple bar.Sequence contigs generated with 454 sequencing are shown in the black bar. RefSeqgenes are shown compacted in dark blue. The read-depth analysis indicates not allcontigs are sequenced at sufficient read-depth to rule out missed sequence variants.

possibility that all reads derive from a single chromosome, i.e. missing a se-

quence variation on the alternate chromosome. The read-depth at identified

variants ranged from 2 to 62× with an average of 15×. This is lower than

the average sequence contig read-depth of 18.8× and below the sequencing

target of 20×. As an indication of the variation seen in sequencing depth, the

read-depth at the identified variants was plotted across the 7q34-q36 interval

and visualised using the UCSC genome browser (Figure 6.4). This identified

several regions with low sequencing read-depth.It is likely that additional

sequence variants were missed due to contig gaps and localised intervals of

low sequence read-depth. Therefore, this analysis can only report the identified

variants. It is unable to determine if a genomic interval is free from variation.

6 next generation sequencing of cmt54 154

6.3.3 Off-target and repeat mapping reads

Repeat-mapped reads are those that map equally well to more than one ge-

nomic location. Repeat-mapped reads and reads mapping uniquely to genomic

regions outside the 7q34-q36 interval accounted for 53% of all reads. A fur-

ther 18.7% of reads mapped uniquely to other chromosomes, 6% mapped

uniquely to other areas on chromosome 7, and 29.66% were repeat-mapped. A

proportion of the reads mapping outside the 7q34-q36 interval and to other

chromosomes were included in the DNA capture design for QC purposes. Of

the repeat-mapped reads, 95% mapped to multiple locations on chromosome

7 and 5% mapped to other chromosomes. Some of these off-target reads and

repeat mapped reads may be the result of non-specific DNA sequence capture

or capturing DNA fragments from recent segmental duplication (SD) into or

out-of the 7q34-q36 interval.

A large amount of recent SD had been previously reported in the original

finished reference sequence for chr7 [255]. Recent SD within the 7q34-q36

disease interval in the Human Mar. 2006 (NCBI36/hg18) assembly reference

sequence was examined using the segmental duplications track (sequences of

>= 1 kb and >= 90% identity) [348, 349] of the UCSC table browser [256] and

Genome Pixelizer software [350]. Recent SDs at 7q34-q36 accounted for 2.53

Mb of DNA sequence, including: 86 intrachromosomal SDs totalling 1.97 Mb

within the 7q34-q36 interval, 3 intrachromosomal SDs (8.04 kb) of chromosome

7 outside the 7q34-q36 interval, and 86 interchromosomal SDs totalling 0.56

Mb (Figure 6.5). Therefore recent SD is likely to be responsible for some of the

off-target and repeat mapped reads obtained with targeted capture and 454

sequencing.

6 next generation sequencing of cmt54 155

1

10 11 12 13 13-R 15 16 17 19

2

21 22

3 4 5 6 7 8

9 X Y

7q34-q36

Figure 6.5: Recent segmental duplications of >= 1 kb and >= 90% identity at thechromosome 7q34-q36 disease locus in the Human Mar. 2006 (NCBI36/hg18) assemblyreference genome. Intrachromosomal segmental duplications within the 7q34-q36interval (blue), outside the 7q34-q36 interval (purple) and interchromosomal segmentalduplications (red) are shown. The 7q34-q36 interval is magnified 100 × compared tochromosomes. Details of recent segmental duplications were downloaded from thesegmental duplications track [348, 349] using the UCSC table browser [256]. The chartwas generated using Genome Pixelizer [350].

6.3.4 Exome sequencing

Because substantial areas of the 7q34-q36 target region were not covered by

targeted sequencing (above), the 7q34-q36 interval was re-examined in CMT54

using newly available exome sequencing. DNA from two affected individuals

(II:2 and III:9, Figure 3.5) and an unaffected ’married-in’ parent (CMT54-II:8),

was forwarded to BGI (Hong Kong) for exome capture and sequencing using

the NimbleGen 2.1M Human Exome Array (Roche) and Solexa sequencing

system (Illumina). The exome capture design targets 180 000 coding exons

from the consensus coding sequence (CCDS) database [260] and ∼550 miRNA

exons from miRBase [351]. Additionally, the NimbleGen target captured DNA

sequenced previously (affected individual III:14) was resequenced by BGI

using the Solexa sequencing system.

An average of 24.9 Gb of sequence was generated as paired-end, 74 bp reads

for each sequenced sample. Sequences were aligned by BGI (Hong Kong)

as single-end reads only. For the exome capture samples, this achieved an

average of 43× coverage for 94% of the CCDS exons targeted (Table 6.5). For

6 next generation sequencing of cmt54 156

the 7q34-q36 targeted capture (sequenced as two samples) this achieved an

average of 227× coverage for 99.5% of CCDS and miRNA exons within the

7q34-q36 interval.

6.3.4.1 Sequence variant analysis

An average of 30 526 sequence variants were identified within each exome

sequenced in family CMT54 (Table 6.6). These variants were annotated and

filtered using the online Galaxy Browser software. An average of 144 variants

within the linked interval on 7q34-q36 were identified in the exome sequenced

samples and 415 variants in the target captured sample. These variants were

compared to the dbSNP 130 database to remove variants present in the popu-

lation. Non-synonymous variants, were identified using open reading frame

co-ordinates from the CCDS database. After filtering, 4-6 non-synonymous

variants remained for each exome sequenced sample and 133 non-synonymous

variants for the target region captured sample.

With an autosomal dominant model of inheritance, all affected individuals

are expected to carry the same mutation, and be absent from ’married in’

unaffected individuals. Applying this to the data from the three affected and

one unaffected samples analysed, no non-synonymous variants were identified

within the 7q34-q36 interval. Extending this to all novel variants within the

7q34-q36 interval identified three intronic variants (Table 6.7). One variant was

excluded with previous Sanger sequencing (Chapter 4) and the remaining

two variants did not segregate with all affected individuals in CMT54 when

re-sequenced using Sanger sequencing.

Of the three sequence variants previously shown to segregate with the

disease in CMT54, only one was identified in this filtering approach (CNT-

NAP2:c.3476-15C>A). The SNV ACCN3:c.915C>T was identified within the

7q34-q36 target capture sequenced sample only. The variant DPP6:c.*374A>G

was not reported in any of the samples as this variant is within an alternate

6 next generation sequencing of cmt54 157

Tabl

e6.

5:Su

mm

ary

ofex

ome

and

chro

mos

ome

7q34

-q36

targ

eted

sequ

enci

ngus

ing

Sole

xapl

atfo

rm.

Sam

ple

Tota

lSe

quen

cing

base

sC

over

age

read

sTo

tal

Map

ped

Exom

em

appe

dD

epth

Exom

eII

:217

262

146

255

479

760

82

406

209

599

151

724

953

244

.48×

94.1

8%II

I:916

390

674

242

581

975

22

262

780

840

143

525

463

042

.08×

93.2

8%II

:816

903

507

250

171

903

62

358

641

852

147

501

204

143

.24×

94.2

8%II

I:14

a†17

743

070

262

597

436

01

814

973

684

1017

569

322

6.91×

99.4

1%II

I:14

b†15

865

300

234

806

440

01

646

449

085

1722

40.

38×

23.0

3%†

Cap

ture

dD

NA

from

the

cust

omca

ptur

ear

rays

wer

ese

quen

ced

sepa

rate

ly.T

heex

ome

cove

rage

does

not

incl

ude

sequ

ence

din

terv

als

outs

ide

CC

DS

exon

s.

6 next generation sequencing of cmt54 158

Table 6.6: Sequence variants identified by exome capture and BGI sequencing

Filter II:2 III:9 II:8 III:14(whole exome/7q34-q36 locus)

All Variants 30 676/142 30 655/154 30 248/135 na/415All Variants & not indbSNP 130

22 258/16 22 277/16 21 809/16 na/268

CDS Variants 12 542/65 12 570/71 12 227/51 na/242NS Variants 5464/29 5497/36 5353/22 na/159NS Variants & not indbSNP 130

4006/6 4024/6 3847/4 na/133

Table 6.7: Novel sequence variants identified with exome and targetedinterval sequencing with segregation analysis in CMT54.

Position Variant Gene Validationchr7:147 711 659 C>A CNTNAP2 healthy controls∗

chr7:148 137 226 G>A EZH2 no segregation in CMT54chr7:151 204 513 G>A PRKAG2 no segregation in CMT54∗ CNTNAP2 sequencing detailed in Chapter 4

DPP6 isoform not present in the CCDS reference database and therefore, not

targeted by the exome capture and BGI sequencing-bioinformatics pipeline.

As the pathogenic mutation was not identified a more detailed examination of

the sequencing data was undertaken.

6.3.5 Further sequencing analysis using unequal coverage analysis

The sequence read-depths and sequence contig sizes were compared between

the four sequenced samples to assess uniformity. Sequence contig coordinates

and single-base read-depth were not reported in the sequence variant report

by BGI, therefore, sequencing bioinformatics were repeated locally for the BGI

sequenced samples. Sequence reads were aligned to the Human Mar. 2006

(NCBI36/hg18) assembly using Bowtie software and sequence alignments

were converted to pileup format Using SAMtools software. Sequence contigs

6 next generation sequencing of cmt54 159

and per base read-depth were compared between samples using the custom

BAsH script PileupToBgr.sh and the UCSC genome browser (Appendix A).

The sequencing coverage was examined for the ACCN3 gene due to the

known variant present and the incomplete sequencing coverage. The three

exome captured samples had a similar coverage of sequence contigs, however,

the read-depth varied at the ends of target intervals and across exons 4-6 with

poor coverage (Figure 6.6). ACCN3 was sequenced to a greater extent in the

target captured DNA sample than the exome sequenced samples. Targeted

DNA capture included introns and exons at this location as well as the higher

number of sequencing reads. Overall, the same exons were covered poorly by

both DNA capture platforms. Additional genes were examined and all showed

similar variation in sequence coverage at the ends of target intervals or specific

exons (data not shown). This confirmed that there is variation in read-depth

and contig size between samples in the intervals sequenced.Differences in

sequence coverage need to be considered when comparing sequence variants

between samples to avoid false negatives from one sample excluding true

variants identified in another sample.

6.3.5.1 Sequence variant analysis considering unequal coverage

With an autosomal dominant model of inheritance, all affected individuals

should carry the same mutation (assuming there are no phenocopies), which

should be absent from ’married-in’ unaffected spouses. With unequal sequenc-

ing coverage, the per-base read-depth needs to be considered before excluding

a sequence variant that is not reported in an individual. A custom script was

written for the BAsH unix shell for variant segregation analysis with unequal

sequence coverage (CompSNPchr7.sh; Appendix A). The script checks for

segregation of identified sequence variants among the four CMT54 family

members. However, when a variant is not reported in an individual, the se-

quencing coverage at that position is checked. A sequence depth of at least 10

6 next generation sequencing of cmt54 160

chr7 (q36.1) 34

Scalechr7:

1 kb hg18150377000 150378000 150379000 150380000

UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative GenomicsACCN3ACCN3ACCN3

II:1

50 _

0 _

10 -

II:8

50 _

0 _

10 -

III:9

50 _

0 _

10 -

III:14

50 _

0 _

10 -

Figure 6.6: Sequencing coverage comparison between three exome capture and onetarget region capture samples. Top Chromosome 7 ideograph. The region boundedby the red box is expanded below. Lower panel Comparison of sequencing depth andcoverage for Illumina sequenced samples in the region of the candidate gene ACCN3.Control II:8, affected II:2 and affected III:9 were exome captured (green); targeted DNAcapture of the chr7q34-q36 dHMN1 disease interval was used for affected individualIII:14 (light blue). Sequencing depth greater than 50 fold is indicated by a pink line.Three isoforms of ACCN3 are shown in dark blue.

6 next generation sequencing of cmt54 161

reads at the selected DNA base position was required to exclude the presence

of a variant. Sequence variants were annotated against the dbSNP 130 database

to identify novel variants. The validation status of SNPs was considered before

excluding reported SNPs. Any SNPs from dbSNP 130 with an ’unknown’

validation were manually examined before excluding them.

Using unequal coverage analysis, the three novel variants that were previ-

ously shown to segregate with the disease in the three affected individuals from

CMT54 were also identified here (Section 6.3.4.1, Table 6.7). Four additional

variants were identified in regions where sequence coverage was insufficient to

exclude the possibility of a missed variant call (Table 6.8). These four variants

were subsequently excluded with Sanger sequencing of CMT54 family mem-

bers. One of these four variants was the ACCN3 mutation described previously.

The ACCN3 SNP was only identified in affected individual III:14, and not

in the two affected individuals or control individual sequenced with exome

sequencing with a with a read-depth of 3-7×. An additional five sequence

variants were identified that were reported SNPs with unknown validation

status (Table 6.8). Of these, four SNPs were excluded with updated annotation

information and one was confirmed as a false positive.

No sequence variation with a putative pathogenic role was identified in the

CCDS exons and flanking intronic regions from the 7q34-q36 disease interval

in CMT54. The sequencing coverage of these exons is high with 99.5% of

CCDS exons covered in the target captured sample. This includes the interval

that was common between families CMT54 and CMT44 which was identified

through haplotype comparisons.

6 next generation sequencing of cmt54 162

Tabl

e6.

8:Se

greg

atio

nan

alys

isof

exom

ese

quen

cing

iden

tified

vari

ants

assu

min

gun

equa

lseq

uenc

ing

cove

rage

.V

aria

nts

wit

hfo

ldco

vera

ge<

10fo

ld,t

heth

resh

old

for

aho

moz

ygou

sca

ll(n

ova

rian

tid

enti

fied)

.

Posi

tion

Var

iant

Gen

ety

peII

:2II

I:9II

I:14

II:8

Val

idat

ion

Nov

elva

rian

tsch

r7:1

4269

557

7A>

CC

ASP

25’

UTR

Y5

60

nose

greg

atio

nch

r7:1

4812

576

0G>

AC

UL1

intr

on8

3Y

9no

segr

egat

ion

chr7

:150

275

881

T>A

KC

NH

2N

S1

4Y

1no

segr

egat

ion

chr7

:150

378

829

C>

TA

CC

N3

NS

36

Y7

heal

thy

cont

rols

SNPs

wit

hun

know

nva

lidat

ion

stat

us(d

bSN

P130

)ch

r7:1

4128

200

7-/

AC

LEC

5Ain

tron

5Y

Y5

rs58

8802

3ch

r7:1

4833

389

1A

/CPD

IA4

intr

on5

5Y

16rs

1233

4067

,rs1

2333

828∗

chr7

:148

333

892

C/A

PDIA

4in

tron

54

Y16

rs12

3338

28∗

chr7

:148

952

576

G/A

ZN

F767

NS

3Y

07

rs71

5327

86ch

r7:1

5056

802

5-/

TSM

AR

CD

3in

tron

23

Y4

rs61

3561

47∗

Both

thes

eva

rian

tsin

corr

ectl

yre

port

asi

ngle

base

dele

tion

atpo

siti

on14

833

389

1in

are

peti

tive

sequ

ence

.

6 next generation sequencing of cmt54 163

Table 6.9: Markers with two-point LOD scores > 1.5 in a genome wide linkage analysisby Gopinath et al. [103].

Locus Marker LOD score2q36 D2S396 2.07 at θ = 07q36.1 D7S661 3.19 at θ = 07q36.1 D7S636 3.19 at θ = 07q36.2 D7S798 1.76 at θ = 08q21 D8S270 2.02 at θ = 018q12 D18S1102 1.76 at θ = 0

6.3.6 Exome wide analysis

As no putative disease causing mutation was identified within the chromosome

7q34-q36 disease locus, non-synonymous variants identified elsewhere in the

exome were examined. Sequence variants from the three exome sequenced

samples were examined. A total of 87 novel non-synonymous variants were

present in both affected individuals and absent in the unaffected control. The

location of these variants was compared to a 10 cM genome wide microsatellite

linkage analysis carried out previously as described by Gopinath et al. [103].

With the exception of the 7q34-q36 locus, there were three markers with a

two-point LOD score > 1.5 (Table 6.9). One variant in the TSPYL5 gene was

located near the marker D8S270 with a two-point LOD score of 2.02. Using

Sanger sequencing of two additional affected and two unaffected individuals

from CMT54, this variant did not segregate with the disease in the larger

CMT54 pedigree.

To identify NS variants with a likely functional impact from amongst the

remaining variants, a combination of five software tools were used to assess se-

quence conservation or effect on protein function. Variants that were predicted

to be deleterious by two or more of the software tools were considered can-

didates. Twenty-six potentially deleterious NS variants were re-sequenced in

additional CMT54 family members and control individuals. All were excluded

6 next generation sequencing of cmt54 164

Table 6.10: Sequence variants predicted to be deleterious or affectingconserved sequences.

MSA SNAP Polyphen SIFT Panther46/32/9 28/43/16 13/17/44/13 21/51/15 11/24/12/40

results indicate the number of variants in each group: MSA priority - High-/Medium/Low priority; SNAP - Not neutral/neutral/unknown; Polyphen -Probably damaging/possibly damaging/benign/unknown; SIFT - Affect/tol-erate/unknown; Panther - Probably damaging/possibly damaging/not dam-aging/unknown.

through: lack of segregation with affected individuals, presence in control

individuals or identified as homozygous SNPs.

6.4 discussion

NGS was used to screen all known genes within the chromosome 7q34-q36

disease interval. No putative pathogenic mutation was identified. Sequence

variations that were identified and analysed included non-synonymous SNV,

splice site, 5’ UTR and 3’ UTR sequence variants. A screen of non-synonymous

variants from elsewhere in the exome, including those predicted to be dele-

terious, also did not identify a putative pathogenic mutation. This analysis

considered variants that segregate with an autosomal dominant mode of in-

heritance, and took into account the differences in sequence coverage between

samples.

While a pathogenic mutation was not identified here, a similar exome

sequencing approach previously identified the disease gene TGM6 causing

spinocerebellar ataxia [339]. In this case four patients from a single pedigree

were sequenced and compared to identify the disease mutation as the sole

candidate sequence variant. This mutation lay within the previously linked

interval on chromosome 20p13-p12.2. Meta analysis showed that any two

exomes combined with the linked disease locus would have successfully

6 next generation sequencing of cmt54 165

identified the TGM6 mutation as the sole putative pathogenic candidate for

this autosomal dominant disorder.

By considering variation in sequencing coverage between samples, the se-

quence variants known to segregate with the disease in CMT54 were identified.

This identified several additional sequence variants which would have other-

wise been missed because of false negatives in individual samples caused by

poor or no sequencing coverage. This unequal coverage analysis is a novel way

of comparing sequencing variants between samples to account for variations in

sequencing coverage between samples. Other approaches to address differences

in sequencing coverage between a number (n) of samples have used a relaxed

model of segregation requiring variants to be present in n-1 or n-2 samples

only [338]. However, this application does not account for low read-depth and

would not have identified the ACCN3 SNP described here. Another approach

for the identification of variants within low coverage intervals is to combine

sequence reads from multiple affected individuals from a pedigree prior to

assembly and analyse them as a single ’virtual’ individual. By combining reads

from three affected individuals, Hedges et al. [352] demonstrated an increased

yield of sequence variants of 11%, compared to assembling three individuals

independently. This approach increases the chance of variants being reported

from areas of low sequencing coverage. It is necessary to consider unequal se-

quencing coverage if genes are to be excluded (i.e. free of putative pathogenic

sequence variation) as low read-depth will have a high false positive rate.

The addition of one sample with poor sequencing quality can falsely exclude

positive variants. This is also necessary when considering the differences in

sequence capture and sequencing platforms [353].

6 next generation sequencing of cmt54 166

6.4.1 Limitations of the NGS analysis

Despite the comprehensive analysis of NGS data, the chromosome 7q34-q36

mutation was not identified. The read-depth and sequencing coverage for one

affected individual was high after targeted sequence capture of the disease

interval. Due to the high average read-depth of 226×, most of the 99.5% of

CCDS exons were covered by sequence contigs at sufficient read-depth to be

confident that a sequence variant was not missed in these regions. However

there are several important limitations to this analysis.

An important limitation of sequence analysis based on a reference genome

is that gene annotation is not 100% complete with ongoing refinement and

addition of alternate transcripts. This is important for the definition of coding

exons used in the standard exome captures. The exome capture does not

sequence all exons within the genome because of incomplete annotation of the

reference genes and because not all exons are suitable targets for capture. This

includes coding regions within repetitive sequences and areas that may not

amplify effectively with PCR required for the sequence capture and enrichment.

In this project, a known variation in the 3’-UTR of DPP6, previously identified

by targeted Sanger sequencing of candidate genes, was not identified by exome

sequencing. In this case the DPP6 gene was not targeted in the NimbleGen

Sequence Capture 2.1M Human Exome Array used (2.1M Human Exome

Array annotation files). However, it was identified with the use of the custom

sequence capture microarray targeting the 7q34-q36 interval. In a comparison

of three current exome capture platforms, differences in the genes targeted

have been demonstrated; approximately 1100 additional genes were targeted

by the Agilent SureSelect platform compared to the NimbleGen exome array

[354].

6 next generation sequencing of cmt54 167

A further limitation is that exome sequencing does not capture regulatory

regions that may be involved in disease. These may include promoter regions,

splicing enhancers, cryptic splice sites and other remote regulatory elements.

Relatively few genes have well characterised regulatory elements beyond their

core promoter region. Similarly, the effect of synonymous variants on exon

splicing is difficult to assess. In rare cases, synonymous sequence variants

can cause exon skipping or other splicing errors [355]. These variants have

been reported to create leaky expression, with the incorrect transcript making

up only a small percentage of overall gene expression. As with promoter

predictions, splice site software is generally restricted to predicting the effect

of variants on well characterised core splicing and enhancer motifs.

With 20 000 to 24 000 SNVs identified on average in a human exome it is

essential to filter out known SNPs [356]. However, as the number of individuals

reported in sequence variant databases continues to grow, there is an increased

risk of undiagnosed individuals inadvertently being included. This is more

problematic for late onset diseases or those with subtle phenotypes. While the

true prevalence of autosomal dominant dHMN is unknown, the most common

form of peripheral neuropathy, CMT1, has a prevalence of approximately

20 cases per 100 000 and CMT2 2 per 100 000 [315]. Clinical dHMN is seen

less often than CMT2 in the Neurogenetics clinic at Concord Hospital and

assumed to have a lower prevalence (G. Nicholson, personal communication).

The genetic heterogeneity of dHMN decreases this further, as mutations in

a particular disease gene will account for a small percentage of all dHMN

[6]. With mutations being so rare, it is unlikely that undiagnosed dHMN

individuals will be present in the dbSNP cohorts. Nonetheless, a simple

filtering restriction by not using SNPs with a MAF rarer than a reasonable

threshold can easily overcome this. Using a MAF threshold of 0.001 for SNPs

has been suggested [356], however, for rare mendelian diseases this threshold

can be 0.000 01 or greater.

6 next generation sequencing of cmt54 168

A further limitation of the NGS strategy used here is the poor capacity to

identify indel mutations. While both SNV and indels were considered in the

original analysis using 454 sequencing and gsMapper software (Roche), no

putative pathogenic indels indels were identified. However, the performance

of this early 454 sequencing and gsMapper software in detecting indels was

shown to be limited, with a similar application missing 50% of the indels

identified by Sanger sequencing [353]. Indels were not called by the SOAP1

or Bowtie software used to assemble the reads from the Solexa sequencing

used in exome sequencing. The use of paired-end reads and the long reads

in a de novo assembly may better identity indels as well as translocations and

inversions. However the bioinformatics and computer resources required for

this assembly exceed those available for this project. Indel identification and

analysis using NGS is still in its infancy and currently has a much higher false

positive rate than SNV calling.

In this project, in silico predictions of disease effect were used to prioritise

non-synonymous variants from the exome analysis. However, excluding se-

quence variants based on predictions is a risky approach, as true pathogenic

variants may be missed. This was demonstrated with the analysis by Bamshad

et al. [356] in identifying DHODH mutations in Miller syndrome where in

silico predictions incorrectly excluded the true mutation under the recessive

model of inheritance tested. The filtering strategies employed here were less

stringent, requiring only two positive results from five software tools to screen

variants further.

These limitations provide possible explanations for not finding a pathogenic

variant. The pathogenic variant may be a variant in: a gene that is not yet

recognised; a gene that is not targeted by the exome capture; a repetitive

sequence that was not sequenced; a sequence variant in a regulatory element;

or an indel or other structural variation not identified within the limits of the

sequence assembly used here.

7E X O M E S E Q U E N C I N G O F C M T 4 4

7.1 introduction

A second dHMN family CMT44, with suggestive linkage to the chromosome

7q34-q36 interval, was examined by exome sequencing. Previous two-point

and multi-point linkage analysis in CMT44 achieved a maximum LOD score of

2.02 at the 7q34-q36 locus. This reached the theoretical maximum LOD-score

for this family. The haplotype comparisons carried out previously (Chapter 3)

indicate that families CMT44 and CMT54 possibly share a common ancestor.

If this is the case, CMT44 may share the same mutation with CMT54; and the

same difficulties will arise in mutation discovery. However, while the haplotype

association between these two families is statistically significant, there remains

the possibility that the families are unrelated. Recent analysis by Brewer [357]

has shown statistically significant haplotype associations to be incorrect when

a larger set of markers were included in the analysis. In addition, the more

recent sequencing platform used in analysis of CMT44 had the potential to

achieve greater coverage in difficult to sequence exons.

7.1.0.1 Hypothesis and aims

It is hypothesised that dHMN seen in family CMT44 is caused by a single gene

mutation within the linked region on chromosome 7q34-q36. The aim of this

chapter is to:

169

7 exome sequencing of cmt44 170

• Capture and sequence the exomes of individuals from family CMT44

using NGS in order to identify the genetic variant causing dHMN.

7.2 materials and methods

7.2.1 Exome sequencing

Genomic DNA from peripheral blood (3 µg) was forwarded to Axeq Tech-

nologies (Seoul, Korea) for exome sequencing. An Illumina sequencing library

was prepared for each sample which was enriched using the Illumina Exome

Enrichment protocol and sequenced using an Illumina HiSeq 2000 Sequencer.

7.2.1.1 Sequencing library preparation

Sequencing libraries were prepared using the Illumina TruSeq DNA Sample

Prep system following manufacturers protocols. In brief, 1 µg of genomic

DNA was fragmented by nebulization, 3’ ends of fragmented DNA were

repaired, then Illumina adapters were ligated to the fragments. Ligated DNA

fragments were size selected in the range of 350-400 bp. Size-selected DNA

fragments were PCR amplified to enrich for adapter ligated DNA fragments.

Correct library fragment size and concentration were assayed using an Agilent

Bioanalyzer.

7.2.1.2 Exome capture and enrichment

Exome capture and enrichment was carried out using the Illumina TruSeq

Exome Enrichment system following manufacturers protocols. In brief, DNA

libraries were indexed and pooled prior to enrichment. DNA libraries were

hybridised to capture probes bound to streptavidin beads. Bound libraries were

washed three times to remove non-specific binding fragments. The enriched

7 exome sequencing of cmt44 171

library was eluted and subjected to a second hybridization and three wash

steps. The enriched library was eluted from the capture probes and prepared

for sequencing.

7.2.1.3 Sequencing

The DNA libraries were PCR amplified to further enrich for adapter ligated

DNA fragments. Exome capture and enrichment was validated using Axeq

Technologies internal quality control procedures. Automated sequencing was

carried out using an Illumina HiSeq 2000 instrument.

7.2.1.4 NGS read mapping and variant detection

Sequence read-mapping and variant annotation was carried out by Axeq

Technologies (Seoul, Korea). Sequence reads were mapped to the Human Feb.

2009 (GRCh37/hg19) assembly using BWA software [358] using a read quality

threshold of 20. Sequence variants, including single-base substitutions and

indels, were called using SAMtools software [359] using a maximum read

depth of 1000× and a quality threshold of 20 for both SNVs and indels. SNPs

were annotated using dbSNP 131, dbSNP 132, and 1000 genomes SNP call

release (20101109, 628 individuals).

7.2.2 NGS variant analysis

Sequence variants were analysed using Galaxy Browser software (http://

galaxyproject.org/) [341, 342, 360]. Novel variants were annotated using

the dbSNP 134 and 1000 Genomes Phase 1 Integrated Variant Call Set. Pub-

lic data downloaded sites were; dbSNP 134 (ftp://ftp.ncbi.nih.gov/snp/),

the 1000 Genomes Phase 1 Integrated Variant Call Set (ftp://ftp-trace.

ncbi.nih.gov/1000genomes/ftp/) and the NHLBI Exome Sequencing Project

7 exome sequencing of cmt44 172

database (http://evs.gs.washington.edu/EVS/). Known peripheral neuropa-

thy genes were downloaded from the Mutation Database of Inherited Pe-

ripheral Neuropathies (http://www.molgen.ua.ac.be/cmtmutations/). Novel

variants were screened in a panel of 32 unrelated control exomes of unaffected

spouses from unrelated ALS and CMT families using Integrative Genomics

Viewer software [361].

7.2.3 HRM analysis

Genotyping of sequence variants was carried out with high resolution melt

analysis using the LightScanner System (Idaho Technology Inc.). Target exons

were PCR amplified in a 10 µL reaction volume with LightScanner master

mix (1×; Idaho Technology Inc.), PCR Enhancer (1×; Invitrogen), forward and

reverse primers (0.4 pmol µL-1 each) and 10 ng template DNA. Fluorescent melt

curves were acquired using the LightScanner instrument between 65°Cand

98°C.

7.3 results

7.3.1 Exome sequencing in CMT44

Exome sequencing was performed using four affected and one unaffected

individuals from CMT44 (individuals III:1, III:5, IV:1 and II:2; Figure 7.1). An

average of 6.5 billion bases of sequence per individual was generated as paired-

end reads with an average length of 105 bp (Table 7.1). This achieved a mean

read-depth of 47.72× across the five exomes. Exome coverage at 1× read-depth

ranged from 94.9% to 93.0%, however at 10× read-depth the exome coverage

was lower, ranging from 77.3% to 87.0% of target exons. An average of 65 236

7 exome sequencing of cmt44 173

SNVs and 13 385 indels were identified in each sample, of which 11.6% of

SNVs and 4.6% of indels, resulted in non-synonymous or acceptor/donor

splice site changes (including non-frameshift indels) (Table 7.2).

The non-synonymous and splice site variants (SNV and indels) that were

common to the four affected individuals were compared with the database of

44 known peripheral neuropathy genes (Mutation Database of Inherited Pe-

ripheral Neuropathies) and two additional dHMN genes (ATP7A and HSPB3).

A mutation in the MFN2 gene was identified in all four affected individuals.

This missense mutation MFN2:c.C368T (p.P123L), was previously reported in

autosomal dominant CMT2 by Verhoeven et al. [362]. However, exome data

showed that unaffected individual II:2 also carries this mutation (Figure 7.1).

CMT44 had previously been screened for mutations in MFN2, using individual

II:1 as the index patient (O.Albulym, personal communication). No mutation

was identified in indiviudal II:1. This indicated that the MFN2 mutation, had

been inherited from individual II:2. This warranted a clinical re-examination

of CMT44 family members.

7.3.2 Clinical detail of CMT44

Approximately seven years had elapsed since the previous clinical neurological

examination of CMT44 family members. A further neurological examination

of three individuals from CMT44 (II:1, II:2 and III:1) was carried out at the

Neurology clinic, Concord Hospital.

Individual II:1, aged 84, has a mild motor neuropathy, with minor muscle

wasting in the lower legs and pes cavus. Pes cavus was apparent since child-

hood, with a family history of similar feet from a young age. A mild axonal

motor neuropathy with no sensory loss was confirmed by nerve conduction

studies (NCS).

7 exome sequencing of cmt44 174

Tabl

e7.

1:Su

mm

ary

ofex

ome

sequ

enci

ngin

CM

T44.

Sam

ple

Tota

lSe

quen

cing

base

sEx

ome

cove

rage∗

read

sTo

tal

Map

ped

Exom

em

appe

dat

dept

hat

10×

dept

hM

ean

dept

hII

I:188

020

920

942

123

951

47

679

947

663

419

758

166

394

.4%

85.7

%67

.6×

III:5

6964

633

07

034

279

330

601

213

387

03

485

101

401

94.7

%87

.0%

56.1×

III:7

6586

149

86

652

011

298

516

713

531

02

922

467

830

94.9

%87

.2%

47.1×

IV:1

4132

401

64

422

761

250

362

223

206

22

015

291

144

93.0

%77

.3%

32.5×

II:2

4622

317

24

947

525

468

403

622

067

92

188

857

813

93.2

%78

.4%

35.3×

∗Ba

sed

onan

exom

esi

zeof

6208

528

6bp

.

Tabl

e7.

2:Se

quen

ceva

rian

tsid

enti

fied

byex

ome

sequ

enci

ngin

CM

T44

Filt

erII

I:1II

I:5II

I:7IV

:1II

:2(S

NV

s/in

dels

)A

llV

aria

nts

6896

3/14

923

6808

7/14

092

6720

5/12

703

6063

6/12

446

6128

9/12

762

CD

SV

aria

nts

1935

4/84

519

200/

651

1916

5/48

617

954/

568

1790

7/61

7N

S/SS

Var

iant

s96

75/8

3699

31/6

4693

06/4

8189

42/5

6090

98/6

09N

S/SS

Var

iant

s&

not

inpu

blic

data

base

s∗

1037

/628

624/

414

309/

255

677/

367

686/

413

∗P

ubl

icSN

Pd

atab

ases

use

d:

dbS

NP

131,

dbS

NP

132

and

1000

geno

mes

SNP

call

rele

ase

(201

0110

9).

7 exome sequencing of cmt44 175

I:1 I:2

DNA

II:1

DNA

II:2

DNA

III:1

DNA

III:2 III:3 III:4

DNA

III:5

DNA

III:6 III:7

DNA

III:8III:9

DNA

IV:1

DNA

IV:2 IV:3 IV:5 IV:6IV:8

?

IV:4

II:5II:3

IV:7

III:10

IV:12 IV:13

II:4

IV:9 IV:10 IV:11

P123L/wtMFN2 - P123L/wt P123L/wt

MFN2 - P123L/wt

P123L/wtMFN2 - wt/wt

DNAH11 - V1692I/wtPCDHGA4 - Y33S/wt

V1692I/wtY33S/wt

V1692I/wtY33S/wt

DNAH11 - V1692I/wtPCDHGA4 - Y33S/wt

DNAH11 - V1692I/wtPCDHGA4 - Y33S/wt

wt/wtwt/wt

DNA DNA

P123L/wt

P123L/wt

wt/wt wt/wt

wt/wt wt/wt

DNA

Figure 7.1: Updated pedigree for family CMT44. Females and males are represented bycircles and squares respectively. solid coloured symbols indicate affected individuals, withorange indicating the dHMN phenotype, blue the mild motor and sensory neuropathyphenotype and black the more severe CMT2 phenotype. Bars indicate the haplotype at7q34-q36 described previously Figure 3.7, with blackened bars indicating the haplotypesegregating with the disease. The genotype for the MFN2, DNAH11 and PCDHGA4mutations are indicated below each individual.

7 exome sequencing of cmt44 176

Individual II:2, previously listed as unaffected (Chapter 3), now aged 84,

shows distal sensory signs with reduced vibration sensation in the legs. No

muscle wasting was apparent. NCS indicated a mild motor and sensory neu-

ropathy. Since the previous clinical examination, clinical symptoms had pro-

gressed slightly, the patient now required the use of a walking stick. The

mild clinical signs would not have brought this individual to the attention

of clinicians, if it were not for the affected offspring. Based on the previous

clinical description, this individual was initially ruled out as the carrier of the

gene mutation in this family.

Affected individual III:1, aged 52, has a severe motor and sensory neuropathy,

with gross wasting and weakness of muscles in the lower legs and hands. Pes

cavus was apparent since childhood. This is a more severe phenotype than

either parent. At 44 years of age a dHMN phenotype was present, however, no

sensory defect was apparent from NCS. The first sensory symptoms started at

49 years of age, confirmed by NCS.

Other family members were not re-examined, however previous clinical de-

scriptions indicate the more severe motor and sensory neuropathy phenotype

described for individual III:1, is seen in affected individuals III:5, III:7 and

IV:1, with pes cavus and leg problems from childhood and motor and sensory

symptoms developing from adulthood. Individuals III:3, III:6, and all other

individuals from generation IV were clinically normal.

The phenotype in generations III and IV is generally more severe than either

parental phenotype from generation II. This suggests that two peripheral

neuropathy genes are combined in this family; the MFN2 P123L mutation

inherited from individual II:2 who has a mild CMT2, and a second gene from

individual II:1 with a pure motor neuropathy. This second gene may be located

within the 7q34-q36 interval or elsewhere in the genome.

7 exome sequencing of cmt44 177

7.3.3 MFN2 genotyping in CMT44 using high resolution melt analysis

Genotyping of the MFN2 P123L mutation in CMT44 family members was car-

ried out using high resolution melt analysis. DNA samples from 12 individuals

from CMT44 were analysed in duplicate (Figure 7.2). These included the two

founder individuals with dHMN and mild CMT2 respectively, four individu-

als with more severe CMT2, four clinically normal individuals (at risk) and

two unaffected ‘married in’ spouses (Figure 7.1). The MFN2 P123L mutation

was present in the five individuals with CMT2, as identified previously using

exome sequencing. Two clinically normal at risk individuals also carry the

MFN2 P123L mutation. The individual with dHMN (II:1), two other unaffected

family members and the two unaffected spouses did not carry the mutation.

The MFN2 P123L mutation shows reduced or age dependant penetrance in

CMT44.

7.3.4 Exome wide sequence variant analysis in CMT44

Given that both parents (II:1 and II:2) showed clinical features consistant

with peripheral neuropathy, it is possible that two mutations are segregating

in CMT44. In addition to individual II:2, segregation analysis of sequence

variants was also carried out with affected individual II:1 as the founder. A

total of 5155 non-synonymous (NS), splice site (SS) and indel variants were

common to the four affected individuals. Excluding those variants that were

common to individual II:2 established that 608 NS/SS variants were inherited

from individual II:1. Of these variants, 12 SNVs and 5 indels were novel, not

reported in dbSNP 132 or the 1000 genomes project (SNP release 20101109).

These variants were further filtered to exclude rare SNPs by examining a panel

of 30 unrelated control exomes, dbSNP 134, the 1000 Genomes project Phase 1

7 exome sequencing of cmt44 178

85 86 87 88 89 90 91 92-0.1

-0.08

-0.06

-0.04

-0.02

0

0.08

0.06

0.04

0.02

0.08

0.06

0.04

0.02

0

1

85 86 87 88 89 90 91 92

Temperature °C

Nor

mal

ised

Fluo

resc

ence

Temperature °C

4Fl

uore

scen

ce

Figure 7.2: HRM genotyping analysis of the MFN2 mutation identified in CMT44.Normalised melt curves (above) and difference curves (below) are displayed for MFN2exon 5. The curves are color coded, indicating: grey for normal individuals not carryingthe mutation, including two married in spouses; red and orange for affected individualswith the mutation; and green for two unaffected at risk individuals carrying themutation. The orange curve differs due to the presence of a second SNP in the exon 5amplicon.

7 exome sequencing of cmt44 179

Integrated Variant Call Set (October 2011), and the NHLBI Exome Sequencing

Project database. After filtering, two novel non-synonymous SNVs remained

in the genes; Axonemal dynein heavy chain 11, (DNAH11:c.G5074A:p.V1692I)

and Protocadherin gamma-A4 (PCDHGA4:c.A98C:p.Y33S). There were no

putative pathogenic mutations identified within the 7q34-q36 disease interval

in CMT44. Furthermore, no novel sequence variants were identified within

these two genes in previous exome sequencing data from family CMT54.

7.3.5 Comparison of SNPs between CMT44 and CMT54

To examine whether the chromosome 7q34-q36 haplotype shared by CMT44

and CMT54 (Chapter 3) represented identical by descent (IBD) or identical

by state (IBS), a set of 30 SNPs from 7q34-q36 were selected from exome

sequencing data (Figure 7.3). Of these SNPs, 11 were homozygous for different

alleles in each family. The remaining alleles were homozygous and identical

between the two families (9 SNPs) or heterozygous with a possible match

(10 SNPs). Therefore, any SNP based haplotype would have at least 11/30

alleles not shared between the two families. It is therefore unlikely that CMT44

and CMT54 share a common ancestor for the 7q34-q36 disease interval as

suggested previously (Chapter 3).

7.4 discussion

We have identified a possible digenic inheritance of dHMN and CMT2 in

family CMT44. Both parents in generation II (Figure 7.1) show clinical features

of peripheral neuropathy. Parent II:1 has a purely motor neuropathy caused

by a novel gene mutation and parent II:2 has a mild CMT2 caused by the

MFN2 P123L mutation. Four additional affected individuals appear to carry

7 exome sequencing of cmt44 180

D7S2511rs2692185SNP1rs2286127rs2074715D7S615rs2538962rs2373284rs10240503rs3801976rs77706740rs9648691rs987456rs2530312rs3194rs1062072rs2530311rs2717829rs2530310rs6962115rs6962292D7S2442D7S2419rs1014095rs6956231rs10271133rs2072406rs933436rs2007404rs3217095SNP2rs2072407rs10274535rs3214332D7S688

146.5

147.2

148.3148.4

148.5

6AAAA/G1ATAG/ACGAACGGCTT/GT/C43CGTATC-GGG-2

1TA/CA/GG1C/ACA/GAC/AACGAAAGCT/GT/C43CGCATC-

G/AGG-4

CMT44 CMT54Position (Mb) Marker

Min

imum

pro

babl

e ca

ndid

ate

inte

rval

CM

T54

Min

imal

regi

on s

ugge

sted

by

com

mon

hap

loty

pes

Figure 7.3: Updated comparison of haplotypes between CMT44 and CMT54 usingSNP data from NGS. The minimal region suggested by a founder effect in CMT54 andadditional dHMN families (Chapter 3) is flanked by the markers D7S615 and D7S2419.Physical map of the interval shows the position of markers, with coordinates based onthe Human Feb. 2009 (GRCh37/hg19) assembly. Markers include five microsatellites(Chapter 3) and 30 SNPs from NGS. The marker alleles are shown for CMT44 andCMT54, with alleles differing between the two families coloured red. Microsatellitemarker alleles are numbered and SNP markers alleles show nucleotides (ATGC) or 1bp deletions (-).

7 exome sequencing of cmt44 181

both the MFN2 mutation and the second disease gene and have a substantially

more severe, earlier onset, CMT2 phenotype than either parent. The MFN2

mutation shows reduced or age dependant penetrance, with two unaffected at

risk individuals also carrying the mutation. The second gene mutation causing

the founder dHMN phenotype is possibly the DNAH11 V1692I variant or the

PCDHGA4 Y33S variant identified from exome analysis. The two mutations

segregating in CMT44 may have an additive effect creating the more severe

phenotype seen in individuals carrying both. This may also explain the lower

penetrance seen in generation IV of CMT44, where fewer individuals are likely

to carry both mutations.

A digenic inheritance has been previously described in a three generation

dHMN family with both a BSCL2 mutation and a second disease locus on

chromosome 16p [146]. In this family, 12 affected individuals inherited both

the BSCL2 mutation and the 16p locus, and one individual, with sub-clinical

motor neuron damage, carried only the 16p locus. The 16p locus was thought

to contribute to the dHMN phenotype seen in this family. Similarly, a family

with a CMT1 phenotype, has been described with mutations in the SH3TC2

gene [336]. Normally, compound heterozygous SH3TC2 mutations cause the

severe recessive disorder CMT4C through loss of function. In this family,

individuals carrying a single missense mutation Y169H, were diagnosed with

a mild axonal neuropathy. A second SH3TC2 mutation, R945X, was identified

in the family, which on its own did not cause disease. However, this second

mutation had an additive effect to the Y169H mutation, with affected patients

who were compound heterozygous for the SH3TC2 mutations having a more

severe CMT1 phenotype.

Neither of the two candidate dHMN genes DNAH11 or PCDHGA4 in CMT44

have been previously associated with a neurodegenerative disease. DNAH11

is an axonemal dynein, which are molecular motors that drive the beating

of cilia and flagella. These are distinct to the cytoplasmic dyneins involved

7 exome sequencing of cmt44 182

in retrograde axonal transport in neurons. Homozygous and compound het-

erozygous mutations in DNAH11 cause the recessive disorder, primary ciliary

dyskinesia 7 (PCD7; MIM#611884), caused by immotile cilia [363, 364]. A role

for cytoplasmic dyenins has been suggested in several neurodegenerative dis-

eases [365, 366], with rare mutations of DCTN1 in dHMN7B [64], ALS and

ALS with FTD [65–67]. The DNAH11 V1692I mutation is a highly conserved

amino acid in the stem domain, which joins the cargo binding domains to

the stalk and microtuble binding domains. This is a different region to the

mutations described in PCD7 which all cause truncation of the conserved C

terminal domain [363, 364] or the DCTN1 mutation in dHMN7 which disrupts

the microtubule binding domain [64].

PCDHGA4 is part of the protocadherin-gamma (PCDHγ) gene cluster, one

of three protocadherin gene clusters on chromosome 5q31 [367] (Figure 7.4-A).

Protocadherins are cell-adhesion molecules variably expressed in synaptic junc-

tions in brain and spinal cord [368, 369]. The PCDHγ gene cluster consists of 22

’variable region’ exons which are individually spliced to three ’constant region’

exons (Figure 7.4-B). Transcription of the 22 variable region genes is controlled

by individual promoters with selective cis-splicing removing any intermediate

variable region exons [370, 371]. Each variable exon encodes extracellular and

transmembrane domains of the protocadherin protein, and the constant region

exons encode the intracellular domain [372]. The extracellular domain contains

six highly conserved cadherin domains which mediate cell-cell interactions

when bound to calcium [367] (Figure 7.4-C). The Y33S mutation is located at a

highly conserved amino acid in the first cadherin domain. This amino acid is

100% conserved across the protocadherin genes on 5q31.

A mouse model with dramatic neurodegeration leading to neonatal death

has been described lacking all 22 Pcdhγ genes [368]. Neonatal mice are nearly

immobile and have reduced spinal cord synapses. In culture, neurons differ-

entiate with normal axonal outgrowth and form synapses, but neurons die

7 exome sequencing of cmt44 183

200 kb5q31.3

PCDHGA1PCDHGA2

PCDHGA3PCDHGB1

PCDHGA4PCDHGB2

PCDHGA5PCDHGB3

PCDHGA6PCDHGA7

PCDHGB4PCDHGA8

PCDHGB5PCDHGA9

PCDHGB6PCDHGA10

PCDHGB7

PCDHGA11PCDH-psi3

PCDHGA12PCDHGC3

PCDHGC4PCDHGC5

chr5Scale

PCDHα PCDHβ PCDHγ

A.

B.

C. PCDHGA4Y33S

CA1CA2CA3CA4CA5CA6

Figure 7.4: Organisation of the protocadherin gene cluster on chromosome 5p31. (A)Chromosome 5 ideograph with the location of the protocadherin gene cluster indicatedby a red bar. This locus, expanded below contains three gene families; PCDHα, β andγ. (B) The PDGHγ family contains 22 genes. The variable exons (blue) are tandemlyarranged upstream of the three common exons (orange). The PCDHGA4 variable exonis highlighted red. (C) The PCDHGA4 gene structure is shown, overlaid with theposition of the six cadherin protein domains (CA1-6;blue) transmembrane domain (red)and intracellular domain (yellow). The Y33S variant is located in the first CA domain.Protein domain coordinates were downloaded from the UniProt Knowledgebase [376].

soon afterwards. The neuronal die-back was shown to be cased by fewer and

weaker synapses after the deletion of Pcdhγ genes [373]. The PCDHγ locus is

unrelated to the CMT4C locus on 5q32 with mutations in the SH3TC2 gene

[220, 374, 375].

Only suggestive evidence of linkage to chromosome 7q34-q36 (LOD score

2.02) was identified in CMT44. With a genome wide linkage analysis it would

be expected that several loci reach a suggestive LOD score of 2 with five

affected individuals. This highlights the limits of genetic analysis using smaller

single families that are unable to generate significant linkage scores. In this

7 exome sequencing of cmt44 184

analysis we identified 17 rare SNVs or indels that segregated with five affected

individuals in family CMT44. It is likely that suitably informative markers at

these 17 loci would generate LOD scores approaching the maximum theoretical

value of 2 for this pedigree. The SNP genotype results indicate that there is

no common chromosome 7 haplotype between CMT54 and CMT44 and they

should be considered to be unrelated. A similar haplotype analysis linking

two CMTX3 families was recently shown to be incorrect when considering

additional SNP markers across the haplotype [357]. The SNPs used as markers

here were all common, with a population incidence of > 0.1 (mean 0.4). As

such they are unsuitable for haplotype comparison between families due to the

high probability of identity by state. However, for homozygous SNPs where

nucleotides are unique to each pedigree, the haplotypes can be effectively

shown to be different.

Analysis of exome data indicated that chromosome 7q34-q36 is unlikely to

play a role in the disease in CMT44. The two genes with non-synonymous

SNVs described here, in particular PCDHGA4, are good candidates for further

analysis in additional dHMN and CMT families. Unfortunately this analysis

was beyond the time-frame of this candidature. To confirm a pathogenic role for

either of these genes will require the identification of additional mutations in

pedigrees with a similar phenotype and further functional analysis. Functional

analysis of these genes including cell and animal models, may shed light on

the mechanisms that underlie neurodegenerative disease.

8G E N E R A L D I S C U S S I O N

Distal HMN is a disorder of the peripheral nervous system, predominantly

affecting motor neurons. It clinically resembles the related hereditary motor

and sensory neuropathies including CMT. This PhD project aimed to identify

the defective gene on chromosome 7q34-q36 that is responsible for autosomal

dominant dHMN1. A combination of traditional and new technologies were

applied including linkage analysis, positional cloning and NGS. In addition

to analysis of the original linked family CMT54, linkage analysis was carried

out in an additional 20 families from an Australian dHMN cohort, with exome

sequencing in one of these families with evidence of suggestive linkage to the

dHMN1 7q34-q36 disease locus.

8.1 genetic studies in family cmt54

Linkage and haplotype analysis in family CMT54 was used to refine the

minimum probable candidate interval containing the dHMN disease gene, to

the 6.92 Mb interval flanked by D7S615 and D7S2546. This interval is defined

by recombination events in clinically unaffected individuals in CMT54. This is

only considered a ‘probable’ candidate region due the possibility of reduced

penetrance in dHMN. The critical disease interval defined by recombinations in

affected individuals is the 12.98 Mb interval flanked by D7S2513 and D7S637.

The 12.98 Mb interval on 7q34-q36 contains 89 known candidate genes (not

including transcripts for taste, olfactory or T-cell receptors), with 56 candidate

genes within the minimum probable candidate interval. A positional candidate

185

8 general discussion 186

based approach was initially applied to screen selected candidate genes using

Sanger sequencing of exons and flanking intronic regions. With the possibility

of reduced penetrance, the minimum probable candidate interval was used

as a guide for prioritising candidate genes for sequencing. No pathogenic

mutation was identified in 32 genes screened (and 1 partially screened gene)

in CMT54.

Sanger sequencing can identify small scale sequence changes including point

mutations and small indels, but cannot identify larger scale structural variation.

Therefore, two complementary approaches were used to identify structural

variation in the 7q34-q36 interval in CMT54: karyotyping of chromosomal

spreads and aCGH. No microscopic structural variation or chromosomal

abnormalities were identified using karyotyping. Array based CGH identified

89 unique CNVs ranging from 4 bp to 110 312 bp, with a mean size of 3847

bp. Four deletions segregated with the disease when just three affected and

one unaffected ‘married in’ spouse were compared, however all were shown to

be non-pathogenic, being intragenic sequences or identified in normal control

populations. This excluded structural variation at a microscopic resolution and

CNV from an approximate resolution of 500 bp and greater as a source of the

causative disease mechanism in CMT54.

Using NGS, the 7q34-q36 disease interval was initially sequenced in one af-

fected individual using targeted DNA capture of the 7q34-q36 interval and 454

sequencing. This achieved coverage of ∼73% of the critical disease interval with

an average read-depth of 18.84 ×. No putative pathogenic sequence variant

was identified from all exons, flanking intronic intervals and 5’ and 3’ UTRs.

Using exome sequencing of two additional affected and one family control in-

dividual, and re-sequencing of the DNA target capture individual, the coding

exons from the 7q34-q36 interval were again examined with greater sequenc-

ing coverage. Differences in sequencing coverage were demonstrated between

patients, which limited the identification of confirmed known sequence vari-

8 general discussion 187

ants segregating with the affected individuals. To address this, a method was

developed to examine the segregation of all identified variants considering the

sequence read-depth in order to eliminate false negatives in affected individu-

als. Using this method, 99.4% of exons defined by the CCDS database within

the 7q34-q36 interval were sequenced with an average read depth of 227 ×. As

no putative pathogenic mutation was identified in CMT54 within the 7q34-q36

interval, novel non-synonymous changes from the remainder of the exome

were examined. After filtering using in silico based predictions of the effect of

sequence variants and sequencing in additional CMT54 family members, all

remaining non-synonymous changes were excluded.

While the sequencing of genes within the 7q34-q36 interval was compre-

hensive, including both the targeted Sanger sequencing and NGS based ap-

proaches, several possibilities still exist for the identification of the disease

mutation. These include; mutations in genomic regions not included in the cur-

rent analysis; or mutations that cause disease through a alternate mechanism

not detected by these methods.

Considering genomic regions not covered by the current analysis, it is

possible that the mutation may be present in: an unknown gene; an unknown

isoform of a known gene; or a regulatory region. In the positional candidate

based approach using Sanger sequencing, all alternate isoforms and known

exons were considered for each candidate gene. This was achieved by using

annotations from the UCSC genes track (UCSC genome browser) which uses

data from RefSeq, Genbank, CCDS and UniProt databases. Given the large

number of alternate transcripts for known genes in the candidate region,

it is reasonable to assume that other isoforms are yet to be discovered.The

sequencing approach using targeted DNA capture and NGS of the 7q34-q36

interval also applied the gene annotations from the UCSC genes track. On

the other hand, exome sequencing only targeted CCDS genes, as used in the

NimbleGen 2.1M exome capture arrays. The CCDS genes database includes a

8 general discussion 188

core set of protein coding regions that are consistently annotated and of high

quality. However, not all genes and alternate transcripts are included [354, 377].

While the exome capture array targets most genes and exons within the 7q34-

q36 interval, there are some notable exceptions, including DPP6, screened

previously as a positional candidate using Sanger sequencing. Additionally,

while most genes have been effectively identified since the completion of

the human genome project [227, 228], annotations continue to be updated.

An example of this is the AGAP3 gene (formerly CENTG3), within the 7q34-

q36 interval, which has been revised during the course of this project with

additional exons and isoforms annotated. While sequencing concentrated on

coding sequences and intronic splice site intervals, regulatory regions including

the UTR exons and core promoter were also examined. However, this analysis

was limited to basic annotation such as upstream start codons or mutated stop

codons. Several other regulatory mechanisms were not examined such as RNA

binding sites, metal response element binding sites, secondary structure, and

upstream non-core promoter elements, due to the difficulty in identifying these

regions for most genes in silico. Sequence coverage for UTRs was variable

between the platforms used. While all UTRs were targeted by the NimbleGen

7q34-q36 DNA capture arrays, these regions have been shown to be poorly

covered by NimbleGen exome capture platforms, only sequenced by over-run

of sequencing from coding sequences [227].

Considering structural variation as a disease mechanism at the 7q34-q36 in-

terval, there remains a gap in completeness of structural variation analysis due

to the limitations of resolution for the methods used in this thesis. Karyotyping

is able to assess all structural variation within the genome but is limited by

its microscopic resolution of approximately 3 Mb. Array based CGH used in

this thesis achieves a much higher resolution, with a very high probe density,

and was able to report on duplications and deletions down to a resolution of

only a few bp. However, the probe density varied across the 7q34-q36 interval

8 general discussion 189

with variable suitability of the sequence to probe design. The average probe

density achieved a resolution to detect a CNV of approximately 500 bp or

greater across the 7q34-q36 interval. This analysis is limited to duplications

and deletions of unique sequences; it does not explore inversions or balanced

translocations or known repeat elements such as microsatellites. Small indels

within genes in the 7q34-q36 interval were identified with targeted Sanger

sequencing and NGS analysis (454 sequencing and gsMapper software), how-

ever a gap remains in the resolution between the sequencing and aCGH based

platforms.

Recently a non-coding GGGGCC hexanucleotide repeat in the gene C9ORF72

was identified as a genetic cause of ALS and frontotemporal dementia (FTD)

by two studies [378, 379]. The size of this repeat ranges from 2-23 repeats in

controls to approximately 700 to 1600 repeats in patients [378]. The disease

associated allele is too large to amplify by PCR due to the large number of GC

rich repeats. Hence, in one study, this repeat expansion was identified by an

aberrant segregation pattern of alleles in a pedigree. In the second study, the

presence of the expanded repeat was identified by manual alignment of NGS

reads around a cluster of SNPs at this location. A similar repeat would not have

been identified by any of the methods employed in this project. With expansion

of short tandem repeats now demonstrated as a major disease mechanism in

ALS, similar repeats within genes in the 7q34-q36 interval in CMT54 are good

candidates for further gene screening.

A potential location for repeat elements is the gap in the human genome

reference sequence located within the intronic region of the DPP6 gene. This is

a non-bridged sequence gap with a nominal size of 99 kb [255] (Section 4.3.1).

Almost all such remaining gaps contain tandem repeat sequences [227]. This

region of DPP6 is excluded from the disease locus by a genetic recombination

in an unaffected individual by one informative marker (D7S2546). However, if

this unaffected individual were non-penetrant, then the disease gene based on

8 general discussion 190

positional cloning would be DPP6 alone. DPP6 was initially selected as a good

positional candidate gene due to its association with susceptibility to sporadic

ALS in several European populations [264–266] This association could not be

explained by coding sequence mutations in DPP6 [380]. DPP6 was sequenced

in this thesis, with no mutations identified. The potential size of the gap in the

genome reference sequence assembly makes it difficult to quantify using PCR

based methods.

The initial genome wide linkage analysis carried out in CMT54 identified

7q34-q36 as the only locus with significant linkage to the disease. While three

other loci achieved two-point LOD scores above 1.5, all non-synonymous

sequence variants in these intervals identified by exome sequencing were

excluded. Therefore, it remains likely that the causative disease mutation is

located within the 7q34-q36 locus and is yet to be identified.

8.2 genetic linkage analysis in additional dhmn families

Using linkage analysis, the 7q34-q36 interval was examined in a cohort of

20 families with dHMN or CMT with predominant motor phenotype. One

family (CMT44) was identified with suggestive linkage to 7q34-q36 (two-point

LOD score of 2.02) reaching the the maximum theoretical power for this

pedigree. Seven additional families had a haplotype segregating with affected

individuals at 7q34-q36, however, they were uninformative in linkage analysis.

Linkage to 7q34-q36 was excluded in the remaining 12 families.

Comparison of haplotypes between the families not excluded and a control

group of 56 haplotypes indicated CMT54 and three other families (including

CMT44) had haplotypes that statistically were significantly associated with the

disease. This implied that the haplotypes were IBD and the families share a

common ancestor for the 7q34-q36 interval. The common interval shared by

8 general discussion 191

these haplotypes suggested a smaller disease interval of 1.14 Mb. However,

subsequent analysis comparing CMT54 and CMT44 haplotypes using a larger

set of SNP markers showed the haplotypes are different. The initial false

haplotype association may have been caused by chance or due to differences

in ethnicity between the disease and control groups used in the analysis. The

families with the common haplotypes all had similar ethnic backgrounds.

With no additional families conclusively linked to 7q34-q36 from the larger

dHMN cohort it is likely that the gene on 7q34-q36 is a rare cause of dHMN.

This reflects the genetic heterogeneity in dHMN overall, with most genes

accounting for only a small percentage of disease [6]. If the disease gene on

7q34-q36 has a similar incidence to other dHMN genes we can expect few

of the 20 families to be linked to 7q34-q36. However, this does not exclude

the small families that were uninformative or gave suggestive LOD scores

at 7q34-q36. Many of these families achieved the maximum theoretical LOD

scores for markers at this locus.

8.3 genetic studies in family cmt44

In dHMN family, CMT44, exome sequencing identified the potential digenic

inheritance of the known CMT gene MFN2 and an unknown mutation in a

second gene. In this pedigree, the founder dHMN individual does not carry

the MFN2 mutation, which is instead transmitted from his spouse who has a

mild CMT phenotype. The four other affected individuals have a more severe

combined dHMN/CMT2 predominant motor phenotype and carry the MFN2

mutation and the unknown mutation. The MFN2 mutation shows reduced or

age dependant penetrance. The founder CMT individual has mild symptoms

and onset during advanced adulthood. Two clinically unaffected but at risk

individuals also carry the MFN2 mutation. A second mutation is likely to

8 general discussion 192

be inherited from the founder dHMN parent causing the dHMN phenotype.

A haplotype at the 7q34-q36 disease interval was shown to segregate with

individuals in CMT44 with a dHMN phenotype and not in any of the other

family members. No putative pathogenic sequence variants were identified

within the 7q34-q36 interval using exome sequencing. However, two candidate

SNVs were identified in the genes DNAH11 and PCDHGA4. The SNV in PCD-

HGA4 is a particularly good candidate gene for dHMN, with essential roles in

the brain and spinal cord and a mouse model with severe neurodegenerative

phenotype. As no pathogenic mutation was identified within the 7q34-q36

interval in CMT44, it is unlikely that this locus plays a major role in the disease

and the suggestive (but insignificant) LOD score of 2 has probably occurred by

chance.

The MFN2 P123L mutation was previously described in a small pedigree

with severe early onset CMT2. This phenotype contrasts with the reduced

penetrance and mild phenotype seen in some individuals in CMT44. The more

severe dHMN phenotype in four individuals is likely to result from inheritance

of both the dHMN and CMT2 genes from the two independently affected

parents. The genetic analysis of CMT44 highlights the difficulties associated

with diagnosis of dHMN and related peripheral neuropathies and reaffirms

that caution must be applied when using unaffected individuals to define

disease intervals.

8.4 conclusion

Identification of new gene mutations is critical to further understanding the

biochemical and cellular processes underlying dHMN. Although the causative

mutation for dHMN on 7q34-q36 was not identified, a significant proportion

of the disease interval has been excluded using a combination of traditional

8 general discussion 193

and new technologies. This project spanned a period where there has been a

revolution in sequencing technologies. NGS continues to be developed and

improved. Bioinformatics platforms are also rapidly advancing for analysis

of the vast amounts of sequencing data. Much of the sequencing data gen-

erated by NGS in this project can be re-examined when new more powerful

bioinformatics approaches address current limitations or when more powerful

computing resources become available. In addition to the analysis of 7q34-q36

in family CMT54, a strong candidate dHMN gene mutation has been identified

in CMT44. Finding the 7q34-q36 disease gene and other dHMN mutations will

lead to improved dHMN diagnostics and may identify potential therapeutic

targets for disease prevention or treatment. The further analysis of rare genetic

variants identified through NGS will allow us to characterise the influence

of less penetrant modifier loci on the variable phenotypes seen in dHMN

and other peripheral neuropathies. Future studies should include transcrip-

tome analysis by next-gen RNA sequencing, which may identify unknown

transcripts and exons that map to chromosome 7q34-q36.

AC U S T O M G N U B A S H S C R I P T S

This Appendix details the BAsH scripts written to automate tasks and analysis

in this thesis. All scripts were written by myself for BAsH, version 4.1.5 (Free

Software Foundation, Inc) using Ubuntu 10.04.4 LTS (www.ubuntu.com).

a.1 linkage analysis

The script pedmake.sh converts MLINK pedigree and marker data files into

the input format required for the Easylinkage plus software package. File

and marker naming limitations apply when using this script. File names can

have up to 4 digits in a marker (e.g. D7s1234). Unique marker names are

supported if added when prompted by the script and must be 6 or 7 characters

long. For markers not named accordingly, manually edit the mlink.DAT file

before running the script to change the marker names to comply with these

requirements. Family names should be 6 digits long (e.g. CMT123). The output

files from this script have been tested with; FastSLink, FastLink, SuperLink,

and GeneHunter.

1 #!/bin/bash

2 echo "converts cyrillic output into easy linkage format"

3 OPTIONS="Help Run Quit"

4 select opt in $OPTIONS; do

5 if [ "$opt" = "Quit" ]; then

6 echo done

7 exit

8 elif [ "$opt" = "Run" ]; then

9

10 echo "enter the marker chromosome"

11 read chrom

12 echo "D"$chrom > pattern.tmp

13 echo "d"$chrom >> pattern.tmp

194

a custom gnu bash scripts 195

14 echo "Are there any special microsat names (y/n)?"

15 read REPLY

16 while [ $REPLY == "y" ]

17 do

18 # if [ $REPLY == "y" ]; then

19 echo "enter the special name"

20 read specM

21 echo $specM >> pattern.tmp

22 echo "another(y/n)?"

23 read REPLY

24 # fi

25 done

26

27 echo "*searching for patterns"

28 cat pattern.tmp

29 grep -f pattern.tmp mlink.DAT > markers.tmp # get a list of markers from .

DAT file

30 echo "*the following markers were found"

31 cat markers.tmp

32 rm pattern.tmp

33

34 echo "is the disease marker in the 1st position (y/n)?"

35 read REPLY

36 if [ $REPLY == "y" ]; then

37 b=42 #position of first marker score A_1

38 d=45 #position of second marker A_2

39 elif [ $REPLY == "n" ]; then

40 b=36 #position of first marker score A_1

41 d=39 #position of second marker A_2

42 fi

43 fcount=1 # file counter

44

45 while read line

46 do

47 extnum=$(printf %03d $fcount) #adds leading zeros to the

extension mumber for the file name

48 markerC=‘echo "$line"‘; # marker name line info

49 markerD=‘echo ${markerC/\"/\@}‘ # Replaces first match of " with

@

50 markerE=‘echo ${markerD/\"}‘ # Removes last "

51 markerN=‘echo ${markerE:\‘expr index "$markerE" ’\@’\‘:7}‘ # holds the

marker name with the extra space at the end

52 echo "generating file for" $markerN

53

54 filename=‘echo ${markerE:\‘expr index "$markerE" ’\@’\‘:${#markerN}}"_"

$chrom"_"$extnum".abi"‘

55

56 echo $filename

57 echo "output for marker "$markerN" into file: "$filename

58 echo -e "MARKER\tLANE\tID\tA_1\tA_2" > $filename #file header

59 a=0 #line counter

60 while read line

61 do

62 a=$(($a+1)) #a++

a custom gnu bash scripts 196

63 lineC=‘echo -e "$line \n"‘ # load marker scores line into

lineC

64 echo -e ${markerE:‘expr index "$markerE" ’\@’‘:${#markerN}}"\t1\

t"$a"\t"${lineC:‘echo $b‘:2}"\t"${lineC:‘echo $d‘:2} >>

$filename ; # MarkerName 1 $a A_1 A_2

65 done < "mlink.pre"

66 b=$(($b+10)) # move to the next marker A_1

67 d=$(($d+10)) # move to the next marker A_2

68 fcount=$(($fcount+1)) #count to the next file

69 cat ‘echo $filename‘

70 done < "markers.tmp"

71

72 # creating the p_family.pro file

73 #cut -b 1-39 mlink.pre > p_family.pro # create the pedigree file for easylinkage

74 # required extra field for SLink

75 a=0 #line counter

76 b=$(($b+1)) # move to the correct position for ’S’ in Sample

77 while read line

78 do

79 a=$(($a+1)) #a++

80 if [ ‘echo ${line:\‘echo $b\‘:1}‘ == "S" ]

81 then

82 echo "Sample"

83 # echo -e ${line:0:13}"\t"${line:13:7}"\t"${line:21:7}"\t"${line

:28:4}"\t"${line:32:5}"\t"${line:37:6}"\t2" >> pro.tmp ; #

84 echo -e ${line:0:41}"\t2" >> pro.tmp ; #

85 else

86 echo "No sample"

87 # echo -e ${line:0:13}"\t"${line:13:7}"\t"${line:21:7}"\t"${line

:28:4}"\t"${line:32:5}"\t"${line:37:6}"\t0" >> pro.tmp ; #

88 echo -e ${line:0:41}"\t0" >> pro.tmp ; #

89 fi

90 done < "mlink.pre"

91

92 cat pro.tmp > p_family.pro

93 cat p_family.pro

94 rm pro.tmp

95

96 echo "all done. Choose another option 1 (HELP), 2 (RUN) or 3 (EXIT)."

97

98 rm markers.tmp

99

100 elif [ "$opt" = "Help" ]; then

101 echo -e "limitations marker format D7s1234, with up to \n four

digits for the marker name. Doesnt work \n for made up

markers unless they are added in as specials \n will find

D7S as well as d7s. There cannot be any spaces in marker

names, manually edit these if there is not many in the mlink

.DAT file. Make marker names 6 or 7 characters long"

102 else

103 clear

104 echo bad option

105 fi

106 done �

a custom gnu bash scripts 197

a.2 ngs coverage visualisation

The script PileupToBgr.sh extracts sequencing depth from SAMTOOLS pileup

format and formats a bedgraph file for visualisation using the UCSC genome

browser. To work there must be some sequence coverage at the start position

selected. If there is no coverage at the desired start position then a position in

the previous gene must be selected where some sequence coverage is present.

1 #!/bin/bash

2

3 echo this script will take pileup formatted data and generate a bedgraph showing

the sequencing depth at each base position

4 echo it is recommended to examine small regions only to limit the size of the

files uploaded to the UCSC genome browser

5

6 echo enter sample number # id of the sample being examined

7 read sample

8 echo "enter the chromosome in the format chrN"

9 read chrom

10 echo "enter the start position for the gene or region of interest in the format

123456789" # 1st base position of gene of interest

11 read GeneStart

12 echo "enter the end position of the region of interest in the format 123456789"

13 read GeneEnd

14 echo enter the gene or region name "(Identifier)" # name of the gene of interest

15 read GeneName

16 GLength=$[$GeneEnd-$GeneStart] #calculate the length of the region of interest

17

18 # 1. make a smaller file with the info from just one chromosome

19 echo do you want to isolate chromosome data? only needs to be "done once. y/n" #

skip this step if it has already been done previously

20 read Answer

21 if [ $Answer = "y" ]; then

22 str1="grep "$chrom" "$sample".pileup > "$sample"."$chrom".pileup"

23 echo isolating $chrom data

24 eval $str1

25 fi

26

27 # 2. make a pile up file contining the sequence information for the gene of

interest

28 # due to gaps in the sequence where there is no coverage at all this pileup will

contain more coverage information at the end of the gene. This is because

it counts lines equal to the length of the gene so ant gaps will cause lines

after the gene to be included. This is no problem and beneficial as you can

see where the gaps really are.

29 str2="grep -A "$GLength" "$GeneStart" "$sample"."$chrom".pileup > "$GeneName".

pileup.tmp"

30 echo isolating $GeneName data

31 eval $str2

a custom gnu bash scripts 198

32

33 # 3. turn pileup into bedgraph format

34 echo generating bedgraph format

35 str3="cat "$GeneName".pileup.tmp | cut -f 1,2 > one.tmp"

36 eval $str3

37 str4="cat "$GeneName".pileup.tmp | cut -f 2,4 > two.tmp"

38 eval $str4

39 str5="join -1 2 -2 1 -o 1.1,1.2,2.1,2.2 one.tmp two.tmp > "$GeneName".tmp"

40 eval $str5

41

42 # 4. add header to bedgraph

43 echo adding headers

44 echo "browser position "$chrom":"$GeneStart"-"$GeneEnd > header.tmp

45 echo browser full altGraph > header1.tmp

46 echo track type=bedGraph name=\" $sample $GeneName \" description=\"Sequencing

depth "for" $sample $GeneName \" visibility=full color=200,100,0 altColor

=0,100,200 priority=10 alwaysZero=on autoScale=off viewLimits=0:50 yLineMark

=10 yLineOnOff=on > header2.tmp

47 str6="cat header.tmp header1.tmp header2.tmp "$GeneName".tmp > "$GeneName".bgr"

48 eval $str6

49

50 # 5. remove temporary files

51 echo removing temporary files

52 rm -v *.tmp

53

54 #merge Bedgraphs from different samples into one file with data on mutations in

the region... can load all the mutations into the same file as there are not

that many.

55

56 # 6 upload the GeneName.bgr to the UCSC genome browser

57 echo "process complete" check $GeneName".bgr"

58 #echo GeneName.bgr is now ready to upload to UCSC genome browser. Make sure you

are viewing HG18 build. �

a.3 ngs variant comparison with fold coverage analysis

Two versions have been written, CompSNPAllchr.sh and CompSNPchr7.sh.

CompSNPAllchr.sh replaces CompSNPchr7.sh with more efficient code and

handling of input data from multiple chromosomes. This tool compares SNP

reports between samples using the SNP reports generated with SAMtools. It

lists SNPs common to affected samples and not present in control samples,

checking for coverage where the variant is not reported. The dbSNP annota-

tions can be downloaded using the UCSC table browser. Limitations: Exome

a custom gnu bash scripts 199

sequencing pileup files with good coverage are large (6 gb to 8 gb). Pileup files

should be optimised to ensure good performance. If a linked region is being

examined extract only the target chromosome data using the grep command

(grep chr7 sample1.pileup > sample1.chr7.pileup). Otherwise use the cut com-

mand to remove only the necessary colums out of the data files to reduce their

size (optimised size 6̃0%) (cut -f 1-4 sample1.pileup > sample1.cut1-4.pileup).

The SNP report files are smaller and are not a limit to the comparison on a

reasonable desktop PC. Hard disk speed limits process on most computers,

so locate files on local HDD not on USB HDD or network attached storage. If

large amounts of RAM are available, load SNP and pileup files onto a RAM

disk to increase speed but output should be directed to normal HDD space.

This script is not recommended for routine use as it takes a long time to run

on an entire exome.

grep chr7 sample1.pileup > sample1.chr7.pileup

cut -f 1-4 sample1.pileup > sample1.cut1-4.pileup �1 #!/bin/bash

2

3 echo -e "**************\nCompSNP script\n**************"

4

5 echo "A specific file structure is required to run this script"

6 echo /Project

7 echo "/Project/snpfiles (/Project/\$SamPath/Sample1.snp)"

8 echo /Project/Sample1/Sample1.pileup

9 echo /Project/Sample2/Sample2.pileup

10 echo "Results are located in /Project/results renamed with the time of

completion"

11

12

13 echo -e "\nprogram requirements \n* Sample data: sample.snp and sample.pileup \n

snp file must have the same file name as the .pileup file\n* SNP database

reference for annotation\n eg. chr7q34-q36SNP130.snp \n* read the script

file for more detail\n\n"

14 read -p "Press any key to continue when files are ready or terminate using <ctrl

> C"

15

16 echo -e "\n\nEnter the path to the samples (dont use use / at end of path) \n\n

!!!Current default BGIData/onlychr7data/ will overwrite whatever you type!!!

"

17 read SamPath

18 SamPath=BGIData #SamPath points to the sample.snp files only

19

a custom gnu bash scripts 200

20 #Initialise results files

21 echo "‘date‘" > $SamPath/debug.txt

22 echo "‘date‘" > $SamPath/GoodSNP.txt

23 echo "‘date‘" > $SamPath/BadSNP.txt

24 echo "‘date‘" > $SamPath/checkSNP.tmp

25

26 # get sample numbers without file extensions

27 # should use read to input new sample numbers each time but i was using these

same four samples so added defaults that overwrite input

28 echo -e "Enter your sample numers \n\n!!!Current default will overwrite whatever

you type!!!"

29

30 echo -e "enter sample Affected 1:\c"

31 read Sam1

32 Sam1=A040681 # comment this line if you want to use different samples

33

34 echo -e "enter sample Affected 2:\c"

35 read Sam2

36 Sam2=A050794 # comment this line if you want to use different samples

37

38 echo -e "enter sample Affected 3:\c"

39 read Sam3

40 Sam3=A080682 # comment this line if you want to use different samples

41

42 echo -e "enter sample Control 1:\c"

43 read Sam4

44 Sam4=N010742 # comment this line if you want to use different samples

45

46 read -p "Press any key to start analysis... terminate process early using <ctrl>

C"

47

48 # adds a header with the sample numbers used to results files. Improvement would

create a unique file name each run.

49 echo -e "Samples:\t" $Sam1 $Sam2 $Sam3 $Sam4 >> $SamPath/GoodSNP.txt

50 echo -e "Samples:\t" $Sam1 $Sam2 $Sam3 $Sam4 >> $SamPath/BadSNP.txt

51 echo -e "Samples:\t" $Sam1 $Sam2 $Sam3 $Sam4 >> $SamPath/checkSNP.tmp

52 echo -e "Samples:\t" $Sam1 $Sam2 $Sam3 $Sam4 >> $SamPath/debug.txt

53 # Error checking for files

54 echo "header for file $Sam1.snp" >> $SamPath/debug.txt

55 head -n 1 $SamPath/$Sam1.snp >> $SamPath/debug.txt

56 echo "header for file $Sam1.pileup" >> $SamPath/debug.txt

57 head -n 1 "$Sam1/$Sam1.pileup" >> $SamPath/debug.txt

58

59 echo "header for file $Sam2.snp" >> $SamPath/debug.txt

60 head -n 1 $SamPath/$Sam2.snp >> $SamPath/debug.txt

61 echo "header for file $Sam2.pileup" >> $SamPath/debug.txt

62 head -n 1 "$Sam2/$Sam2.pileup" >> $SamPath/debug.txt

63

64 echo "header for file $Sam3.snp" >> $SamPath/debug.txt

65 head -n 1 $SamPath/$Sam3.snp >> $SamPath/debug.txt

66 echo "header for file $Sam3.pileup" >> $SamPath/debug.txt

67 head -n 1 "$Sam3/$Sam3.pileup" >> $SamPath/debug.txt

68

69 echo "header for file $Sam4.snp" >> $SamPath/debug.txt

a custom gnu bash scripts 201

70 head -n 1 $SamPath/$Sam4.snp >> $SamPath/debug.txt

71 echo "header for file $Sam4.pileup" >> $SamPath/debug.txt

72 head -n 1 "$Sam4/$Sam4.pileup" >> $SamPath/debug.txt

73 echo -e "\nNo file errors should have been reported. There should be a header

for each file." >> $SamPath/debug.txt

74

75 # generate SNP reference file automatically

76

77 echo "Generating SNP reference file"

78 cut -f 1,2 "$SamPath/$Sam1.snp" > AllSNP.tmp

79 cut -f 1,2 "$SamPath/$Sam2.snp" >> AllSNP.tmp

80 cut -f 1,2 "$SamPath/$Sam3.snp" >> AllSNP.tmp

81 cut -f 1,2 "$SamPath/$Sam4.snp" >> AllSNP.tmp

82 sort AllSNP.tmp > AllSNP.sort

83 uniq AllSNP.sort AllSNP.uniq

84 echo -e "unique SNPs: \c"

85 wc -l AllSNP.uniq

86

87 rm AllSNP.tmp

88 rm AllSNP.sort

89

90 DataFile=AllSNP.uniq

91 echo "DataFile is $DataFile" >> "$SamPath/debug.txt"

92 DataCol=1

93 echo "DataCol is $DataCol" >> "$SamPath/debug.txt"

94

95 #Pull out all markers from file (add -u command to sort to pull out unique SNPs)

96 #echo "The data file contains the following SNP positions"

97 str1="cat $DataFile"

98 #str1="cut -f $DataCol $DataFile"

99 #str1="sort -k "$DataCol","$DataCol" "$DataFile" | cut -f "$DataCol

100

101 #Count number of markers in $Datafile

102 echo -e "Begining analysis. \nThis may take some time. \nThere are ‘eval $str1|

wc -l‘ unique SNPs to be cross checked"

103 echo -e "number\tControl\tAff1\tAff2\tAff3\t"

104 #generates an array with the marker positions

105

106 OLD_IFS=$IFS

107 IFS=$’\n’

108 let line_counter=0

109 for line in $(cat "$DataFile"); do

110 str1=‘grep -m 1 ${line} $SamPath/$Sam1.snp | cut -f 1,2‘ # is there a

SNP in thos position?

111 str2=‘grep -m 1 ${line} $SamPath/$Sam2.snp | cut -f 1,2‘

112 str3=‘grep -m 1 ${line} $SamPath/$Sam3.snp | cut -f 1,2‘

113 str4=‘grep -m 1 ${line} $SamPath/$Sam1.snp | cut -f 1,2‘

114 str5=1

115 str6=0

116 echo -n -e "\n$line_counter:\t"

117 while :

118 do

119 str14=n

120 str11=n

a custom gnu bash scripts 202

121 str12=n

122 str13=n

123 if [[ $str4 ]]

124 then

125 str14=Y #This is the control sample so Y here excludes

this SNP

126 echo -n -e $str14 "\t"

127 str5=0

128 str6=0

129 break

130 else

131 str14=‘grep -m 1 ${line} $Sam4/$Sam4.pileup | cut -f 4‘

132 if [[ $str14 -lt 10 ]]

133 then #cannot exclude this SNP coverage is too

low

134 echo -n -e "!!"$str14"!!\t"

135 str6+=1

136 else # if this is above 10 then the sample is

included and will be caught by str5 analysis

137 echo -n -e $str14 "\t"

138 fi

139 fi

140 if [[ $str1 ]] # if the SNP exists in the sample.snp file it is reported

141 then

142 str11=Y

143 echo -n -e $str11 "\t"

144 else # the SNP is not reported in the sample.snp file, check the

coverage in the pileup file. Field 4 is the coverage figure

.

145 str11=‘grep -m 1 ${line} $Sam1/$Sam1.pileup | cut -f 4‘

146 str5=0

147 if [[ $str11 -lt 10 ]]

148 then #cannot exclude this SNP coverage is too

low

149 echo -n -e "!!"$str11"!!\t"

150 str6+=1

151 else #this SNP is not in this sample and has

good coverage therefore is excluded

152 echo -n -e $str11 "\t"

153 str6=0

154 break

155 fi

156 fi

157 if [[ $str2 ]]

158 then

159 str12=Y

160 echo -n -e $str12 "\t"

161 else

162 str12=‘grep -m 1 ${line} $Sam2/$Sam2.pileup | cut -f 4‘

163 str5=0

164 if [[ $str12 -lt 10 ]]

165 then #cannot exclude this SNP coverage is too

low

166 echo -n -e "!!"$str12"!!\t"

a custom gnu bash scripts 203

167 str6+=1

168 else #this SNP is not in this sample and has

good coverage therefore is excluded

169 echo -n -e $str12 "\t"

170 str6=0

171 break

172 fi

173 fi

174 if [[ $str3 ]]

175 then

176 str13=Y

177 echo -n -e $str13 "\t"

178 break

179 else

180 str13=‘grep -m 1 ${line} $Sam3/$Sam3.pileup | cut -f 4‘

181 str5=0

182 if [[ $str13 -lt 10 ]]

183 then #cannot exclude this SNP coverage is too

low

184 echo -n -e "!!"$str13"!!\t"

185 str6+=1

186 break

187 else #this SNP is not in this sample and has

good coverage therefore is excluded

188 echo -n -e $str13 "\t"

189 str6=0

190 break

191 fi

192 fi

193 done

194 if [[ $str5 -gt 0 ]]

195 then #generates a report on this SNP position from the UCSC snp

130 information

196 snpstr=‘grep ${line} chr7q34-q36SNP130.snp | cut -f

5,8,10,16,12,13‘

197 echo -e ${line}":\t" $str11"\t"$str12"\t"$str13"\t"

$str14"\t"$str5"\t" $snpstr >> "$SamPath/GoodSNP.txt

"

198 else # these SNPs are not found

199 echo -e ${line}":\t" $str11".\t"$str12".\t"$str13".\t"

$str14".\t"$str5 >> "$SamPath/BadSNP.txt"

200 fi

201 if [[ $str6 -gt 0 ]]

202 then

203 snpstr=‘grep "${line}" chr7q34-q36SNP130.snp | cut -f

5,8,10,16,12,13‘

204 echo -e ${line}":\t" $str11".\t"$str12".\t"$str13".\t"

$str14".\t"$str5"\t" $snpstr >> "$SamPath/checkSNP.

tmp"

205 fi

206 let line_counter=($line_counter+1)

207 done

208 IFS=$OLD_IFS

209

a custom gnu bash scripts 204

210 echo "SNPs that need to be checked" >> "$SamPath/GoodSNP.txt"

211 cat "$SamPath/checkSNP.tmp" >> "$SamPath/GoodSNP.txt"

212 rm "$SamPath/checkSNP.tmp"

213

214 mkdir results

215 mv "$SamPath/GoodSNP.txt" "results/‘date‘ GoodSNP.txt"

216 mv "$SamPath/BadSNP.txt" "results/‘date‘ BadSNP.txt"

217 mv "$SamPath/debug.txt" "results/‘date‘ debug.txt"

218 echo "the process is complete. The program output can be found in results/" �

BP C R P R I M E R S

This Appendix lists the PCR primers used in this thesis.

Table B.1: PCR primers used in gene screening. Primers were supplied by Invitrogenin a 25 nmole scale desalted purification

Gene / Exon Primer sequence (forward / reverse)RARRES2Ex1A F/R AGGAGCCAGAGTTGCTGGAG / ACCTAAACACGTGGGAGGGEx1B F/R AGAGGAAGGAGGGAAGATTTTG / CCCGGGCCATGCTACTCEx1C F/R AGTATTCGGCCTCCGGC / ACAGTGGCCTTTTCTCCAAAGEx2 F/R ATTTCCAGGCTCCTTCACCT / GAATGCGCTGGTCTCCTGEx3 F/R GGATCACACCAGGGCTTG / TTCTTAATGGACCTCCCAATTCEx4/5 F/R GCCTAGTTGGGGATCTGGTC / CCCCTCAGCATTCCCAGABP1Ex1_1 F/R AAAATTCCATGGCCCTAACC / ATACTCTGCTGTGGAGATGGGEx1_2 F/R AATGTCACCGAGTTTGCTGTG / GCTTGTGGGAGGAGAAGAGGEx1_3 F/R GTACAACGGGAAGTTCTATGGG / ATAATGGACCGGGTCATCGEx1_4 F/R ACCAAGTACCTCGATGTCGG / TTCCACCACCAGGGAGTTATAGEx2 F/R GATCCTGGGGTAGAATGAGGAG / ATCCACATAGACTAGGAACCCCEx3 F/R CTCTATTCTGACCCTGAGGCTG / GATGGTGATGAGGATGGTCCEx4 F/R GAGGAATTCAGCAAGTTTCCAG / CTCTGAGGGTTTATTGTCTGCCEZH2Ex2 F/R ACTGATTGTTAGTTTGCTGCGG / AAAACTTATTGAACTTAGGAGGGGEx3 F/R TTTTGTATTATTTGAATGTGGGAAAC / AAAGATGGACACCCTGAGGTCEx4 F/R ACCCTAAGTAAAAGAAAAGAGAGAATC / GGAAAAGAGTAATACTGCACAGGEx5 F/R AAATCTGGAGAACTGGGTAAAGAC / TCATGCCCTATATGCTTCATAAACEx6 F/R CTAGGCTATGCCTGTTTTGTCC / AAAAGAGAAAGAAGAAACTAAGCCCEx7 F/R GGCATTCCACAGACATGTAAAG / GCTCATCCGCTACATTGATTCEx8 F/R CATCAAAAGTAACACATGGAAACC / GCAATCCTCAAGCAACAAAGEx9 F/R TCCATTAATTGACTTTTCCAGTG / CATTAACGCTGACTTGATCACCEx10 F/R TTCACAATATAGAACTGTCTTGCC / AAACACCACAAGCTAGGGTACGEx11 F/R TTGAGTTGTCCTCATCTTTTCG / CCAAGAATTTTCTTTGTTTGGACEx12 F/R GCATCTAGCAGTGTCACAGAAG / AGAAACCAACAACAGCCCTTAGEx13 F/R TTCATGGCACAGATAGATCCAG / CAAATTGGTTTAACATACAGAAGGCEx14 F/R CACTCCACAGGTAGTAGGGAAG / AGGGAGTGCTCCCATGTTCEx15 F/R GAGAGTCAGTGAGATGCCCAG / TGCCCCAGCTAAATCATCTAAGEx16 F/R TTCTGTAGTCTACTTTGTCCCCAG / ATTCATTTCCAATCAAACCCACEx17 F/R TCCAGAAGGTCCAGTATTCACTC / GTCACCTCACTGACCTCTACCCEx18-19 F/R CAAACCCTGAAGAACTGTAACC / AACCCTCCTTTGTCCAGAGTTCEx20 F/R TTCTATGAATGTGCCGTTATGC / AAAACACTTTGCAGCTGGTGEx20-N F/R CACTATCTTCAGCAGGCTT / GTTAAATCAAGATTCATACAAAGACAACTA

Table B.1 - continued on next page

205

b pcr primers 206

Table B.1 - continued from previous pageGene / Exon Primer sequence (forward / reverse)FASTKEx1 F/R GTCTCCGAAGACCGATAGCTG / GTCTCCTCACCCGTGAAATGEx2_1 F/R TGAGGTAAGCAGTAGGTAGCAGG / AAGCGCCACCGAGTAGTGEx2_2 F/R GTCTCCGAAGACCGATAGCTG / GTCTCTGTCCGTGGTCCTCTACEx3 F/R AACGGGAGAGGAGAAAGGTG / CTGGCTCAATTCCACCTCAGEx4 F/R CTGACAGGAGGAGGAAGGTG / CTTCAGGAAAAGGATAGCCGAGEx5 F/R GGGACTGTCCTGTCTTTACAGC / ACGCACAATCAGAGCATGAGEx6 F/R AACTACATCAGTGGTACGCAGC / CATGGTCCCTCTGGCTTTTCEx7-8 F/R GTGGTGGCACTAGACTGGGAC / GCAGAAATGCCAGCGTTCEx9-10 F/R AGACCCTGCCCAGAGGTAAAG / ATCAAAACACAGGGAACCAAAGEx1-N F/R GCGTTCTGCTCTGGCTTTG / GTGGCCGCTGCCGCTAAEx2B-N F/R GCTGCTGATCCCTCCAGTA / CTCTGTCCGTGGTCCTCTAEx7-N F/R CTGTGCCTCCAGGCTACT / CCAGCGTTCCCGCAACAEx8-N F/R CGCCACTACTCGAGACC / GGACTCCAGTTCCTCGAAGLR8Ex2 F/R TCTCTCCTTTCACCTGTGCTG / TGTTGCTCCCAAAATGTAATTCEx3 F/R AGAGGTCAAGGCAGAGGAAAAG / CAAGATCCATAAGCGCCTTCEx4 F/R CCTGCTTCACATTGGCTATTC / GGGCAGAAGACAGATAATGGAGEx5 F/R GGGAGGAGAGGGTTTCCTG / ACATCTCTGTCCTCCAGGGEx6 F/R AGCAAGCTGGATAGAGTCAGTG / CCCCGCACTCCTACCTGTCEx7 F/R TTCCTTTCCCTCCTCCTTTC / CATAGGGGAGGCAAGTGTGAGCENTG3Ex1-2 F/R GTTGTTGTTGCCGCTGGAG / TGGAACACAGTCAGCAGAAAACEx3-4 F/R ATAGGTGACACAGGCAGCAG / AGGGAAGAGGACTGGAAGGEx5 F/R CTGAAGGTGAGCGTCACAGG / CAAGTAGACCCTTCTCCCTGGEx6 F/R GGTCTACTTGAGTTCACCTGCC / GAAAAGAGTAGGGGCTTGGGEx7 F/R GTTCTTGGTGCTCTGTGTGTTC / ACTGGGAGCCTTCCTATCGEx8 F/R AGTGAGAGCAAGGCTGTGTGTC / GAAGGTGGACAGGTGTGAGCEx9 F/R CCCCATTAGCACAGGGATTC / TACCGTCTTCCTGACTTGGGEx10 F/R TCTCAGTTCCTTCCCGTCC / CCTTGTTAAGGACTCCGCCEx11 F/R TGACGACATCGTGACCTCC / AAATCACTAAGGTCTGCTTCCCEx12 F/R GCAGGGAGAGACCTAGGAGTTG / CTGTGCCCAGGCTGAGAAGEx13 F/R CTGAATGTCTCAGGCTCTGCTC / AAGAATGCTGCCCAGTTCCEx14 F/R CCTAGAGAGAGGTGTCCGTCTG / GCAACTACTAGAAAGTGCTTAATCCEx15-16 F/R GGTAGTGAGTGGAGTTGTGGC / GCGGGACAACTAGGGAAGGEx17 F/R GAAGCCTTCCCTAGTTGTCCC / TCTAATACCATCACAGCCACTCCEx18 F/R TCTCACTGTTTCTTCCTTGTTCC / GGGAGGAAAGAAGAGGTTGCEx19 F/R TCTTCTTTCCTCCCCTACAACC / TGCTTCTCCATCTCTGTGTCTGUTR-N F/R ATGAGTAGGGGAGAAAGGCATATACTATT / GCTCCTCGCTTACAGCGTCIsoA-Ex1 F/R ATGAGTAGGGGAGAAAGGCATATACTATT / CGCCAGCCCAGTAGTCCIsoA-Ex1B F/R GAACTGCCCGCCGCAGA / GCTCCGAAGCCATGAACTTCCIsoA-Ex9 F/R GGAGGAGGGAGGCAGGA / CCCATCACAGAGGCCTGAGACCN3Ex4-5 F/R CTAAACCCTCAGCTCCTCAGAC / CACGAGTCCTTGCGAAGCEx6 F/R AGTACAAGAACTGTGCCCACC / CACTAGGGCTACGGCCTGEx7-8 F/R CTTGAAAGGAGTGGGAGAACC / ACACCTGGAGGCAAGCAGEx9-10 F/R CCTCGAGATCCTAGACTACCTCTG / AACGGGCACACCCACTGNOS3Ex2 F/R GTCCCCTCCCTCTTCCTAAG / CTCCCGACTCAGCTACAAGTTCEx3 F/R GTGAGATCGCCAGTGCTGTAAC / ATTCAAAGAAGGGTTGAGCTTCEx4 F/R ACCCCTGTACCCTTCCTCC / GGACAAGCAGGAAGTTGTGTGEx5 F/R AGCACTTGCACAAAGCCTG / CGGGAAGCCTTAGGAACTG

Table B.1 - continued on next page

b pcr primers 207

Table B.1 - continued from previous pageGene / Exon Primer sequence (forward / reverse)Ex6-7 F/R CCTGCCTCCTCACCAGC / GAGGCTCCCCTCTTCCTGEx8 F/R TGAAGGCAGGAGACAGTGG / CTTGCAGGCCCTTCTTGAGEx9 F/R AGCACCAAAGGGATTGACTG / TTTTGGGGATGGAGTGAGAGEx10 F/R GGGAAACAGAGATAGTCTCCCC / TCCTGCCCTAGTTTCTAAATGGEx11-12 F/R GATATGGGGTAATCGAGGGC / ACTGCTGGGGCAGGGTCEx13 F/R CACTCAGTATCCCAAAACCCTG / TCCTGACTTGTAACACTTCCTGTCEx14 F/R GTGTTCAGAGATCAAGTTGGGG / GTTCCTGCCCAGTCACTCCEx15 F/R CAAATCAATAATAATGAGGATCAGC / ACAGAGGGAGAGCTTTTCAGACEx16-17 F/R CCTCAGCCCCTCCCAAG / AAAGAACCAAACAGAATCAGGGEx18-19 F/R AGGAGCAAGACGCAGTGAAG / AGTCAGAGAGGGGTGGGGEx20 F/R CTCCTCCTCCCACATTCTCC / CTGGGTCGGGTCCTGAGEx21 F/R AGTCACCAAAACACAAACATCAG / AGGAGAAGAAAGTCCAGGGGEx22 F/R AAACCCTAAAGAGGCTCAGTGG / AGAGCTCCTTCCTGTTTCCAGEx23-24 F/R GAAGAACTTGGGTCCTCCTTG / TTCCTATTTCCTGCCTTCCAGEx25 F/R TTTCAGCCCAAAACGCTG / CTGGAGAGCGCGAGTCCEx26 F/R GGCTCTGCCCCTGTTGAC / GGAAAAGAGCACAGTGGATCAGEx27 F/R ACATGGAGCTGGACGAGG / AGGAATAATGCTGAATCCTTGCEx4-N F/R GGCCTGTCCTGACCTTT / CCACGTGGTGTTAGGATCTEX18-N F/R CCAGGAGCAAGACGCAGTGAA / GCAAATGAGGCAGCCTGAGCTAEX19-N F/R CTAACCCGGCTGGTTCT / GGGCACTTATGGGGAGTCAEx20-N F/R CCCTGTCCTGAACACCCTGA / CTGGGTCGGGTCCTGAGEX26-N F/R CCCGCTCCGGAGACTTT / CGTTGCTGATCCTGTCGGAAZNF775NC-Ex1 F/R CCAATGGAGTGGCTGGG / CTGGTCCATCCCTCGCACEx2 F/R AGAGTGCTTTATGGGATGGGAG / TGGCTTTAGAAAAGATAAGGGCEx3A F/R TTCTCTCCATCCCACCCAG / GCCGGGAGTGGGTCTTCEx3B F/R GTCACTTTGTATGCCTGGACTG / GCAGCTCTTACCGCAGCExCR F/R AGCTGATTCAGGACGCGG / CTCAGCGACAGCGTGGGEx3D F/R CTCTGCCAGGGCTGGTG / TTCAGAGCCTTAAGGAATCCACEx3E F/R AAGAGCTTCTCGTGGTGGTC / CTGAAGAGGGTGGTCACATCCEx3F F/R GGTGGATTCCTTAAGGCTCTG / CATACCAGTTCTAGAGTCGGGCTMEM176AEx1 F/R CTGCAGCCTGTCATGAACC / CTCCTACGCTCCTCCAACTGEx2 F/R CTCCCCAACTCCAATGAAATC / AAGACATAACTGGCTCGGTGTCEx3 F/R TTTTGGGGTTTTAGCGG / CATGTTGGAGGAAGGCAATCEx4 F/R CCTGCATGGAGGCTCTCTC / TCTCTCTCAGAGTCCAGTCGGEx5 F/R CTGTTGCAGGCTTCTCTGG / GGTAACACCTATTTCTGGTGGAGEx6 F/R ATAAGGAGCATGGGGACAGAG / ATGATAATAGGTGGGACTTGGGEx7 F/R ACCATGCTGCTGATTCTGGTAG / GTTTGCCATTGGCTCCTTCDPP6iso1-Ex1 F/R GCTGAGCCAGGCAGAGTC / AACGTAAGGCGAATTCCAGATiso3-Ex1 F/R CTGCCCTTAGGGACCAGAG / ACTGAGAAGGACAGACCTCCCiso2-Ex1 F/R TTACCTTACCGCTTGGACTCTG / CCCCAAATACATATCCTCCCTCEx2 F/R TAGCTTCAAGAATGGGAAGATG / GCATAGTGAGAATTAGAACTTGGGEx3 F/R AAATGACAATCAGAATGTTTATGG / AGACCCAACCTATTGAAAGTGGIso3-Ex1A F/R CTGCCCTTAGGGACCAGA / CTGAAGACTGGATGGAAACGIso3-Ex1B F/R AAACTGAAGACCTGGAAGATTT / CATCGCTCCTCACGGTGiso1-Ex1A F/R GAGAGAAAGCACAGCCAGA / GGTGTTGATCTTGCCAGTiso1-Ex1B F/R GGCTTCGCTGTACCAGA / AAAATTTCCCCTGCCGACiso2-Ex1 F/R CCACTTGCCGGTTGCTAA / CTCTGTCTCTGTCTCTCTCAExon2 F/R ACTGTTTTGTATCTTGGTTCTTGA / GCATAGTGAGAATTAGAACTTGGGAExon3 F/R CAATCAGAATGTTTATGGTCCTCTC / GCAAATTACTAGACCCAACCTAT

Table B.1 - continued on next page

b pcr primers 208

Table B.1 - continued from previous pageGene / Exon Primer sequence (forward / reverse)Exon4 F/R AGTGATTTTACTTTACATAAGCATTCG / TTGGCTATGCTAAAAGTTGAAACCExon5 F/R AAATGGCACCAGCAGATAC / AGAAGAGTCAAGGAAATATACTTAATGAGprot-Ex6A F/R AGCATCCTGGGACCATAG / GGTCTCCATGGAATATCTGAAACAprot-Ex6B F/R CATTGCCAGCTTGCCATGTA / CCAGTCTCCTTCCCATGATTAprot-Ex6C F/R CAATGGACCCTCGTGAATAC / GCTTCAGACACATGGAGAAAGExon6 F/R CTGATGTTTGTTAGGATTCACAGT / AACAGCTCGTCCATGTTCExon7 F/R AGGAGAAAGGAGTTAAGTTATAGGTAG / CAGGCTGAGTACAGTGAGATExon8 F/R ACCCGGTTGATGATTTCAG / AAGAACCCAACTCCAATGTAGExon9 F/R CCTCATGTTCCTGCTGG / ACTGAGTGAGCAGGGTTExon10 F/R GCTTTGCAGAGCTGAAGT / CCAGAAGAGACCATGCACExon11 F/R TGGCTGTGTCACCACTG / GGGCTGAGACGACTCTGExon12 F/R GCCAAATTGTTTCCTGCAC / GTGATTTACTCTGAGAAGCCATExon13 F/R ATTTTATCCTGGTTCAACCTCTT / CCTAGAGGCTGTCACCCExon14 F/R GAGGATGAAGAGCCATGC / CTGTACTCCATAAGTGTAATTGACAAAExon15 F/R TGTCCATAGAGGTAGTGCCA / GGTGCAATTCTGCAAAGCExon16 F/R TCCGCAGATCACCCTCAT / CCCTCCCTGCAAGTGTGExon17 F/R AGCCTGTTTTCTTGTTCTCG / GGGCCAGCTGATCTCATExon18 F/R CAGTTACGCACATGAAAAGC / CTGCACTGTGTCACAGCExon19 F/R CACCCAGAGCACCCTCA / TTCAATATTCCTTTATCATGGGTGTExon20A F/R CGGGTCACGGGTAAAATC / CTTCGTGCAGGAGCTTGExon20B F/R GTGGTGGTAAAGTGTGACG / TTTAGGTAAGAGCAAATTGCATGAAExon21 F/R TCTGGGCTACAGGAAGACAA / GGATTAGACGTGGGTAATGGExon22 F/R GAACTGGCTGCTCTGGG / CTGAAGGGAGGGCTGTGExon23 F/R GGCCCTCCCTAGAGTTTG / GAAGGCAGAGTGGGCCATAExon24 F/R TTCTCAGTCAGGTTTCCAAGC / ACCCATGGTGCCATCAAExon25 F/R AGAGCCTCCTCCTTGATG / TGAACACCCGTCAGGTCExon26A F/R CACACCTTCTGTGCGTTC / TGGAGGCCGTGAGTGATExon26B F/R TTGCTTGGGAAACAAGCTC / TCACCATGTTGAACGTCCTCATTAExon26C F/R GTTCATATATGGGCTTGCTACT / TAAACCAAATGTCATGGGAAAGAGExon26D F/R TGTCCTCCTTGGTACCG / CGAAGTCAGTCCTGACATTCExon26E F/R TTCCTTTTGAGCTTGTGGAC / AAGTCTCCTGAAGCCTAGCAExon22-N F/R TGGAACTGGCTGCTCTGGG / GGACGGTGGTGCTGAAGGCUL1Ex2 F/R TTTTCATCTAATGGAGAACTAGCTG/TTTTACATGATGAACTGGGAGGEx3 F/R TTTGGTTGATATATTGGGAATGC/TGCAATAGCAAAACACTGTCACEx4 F/R CGCATATCAAATGTTCAGGG/AAACTGTGATGGTAGGGTCAGGEx5/6 F/R TGTCCTGATAGCAGATTTTGACC/CCTTCGTAAATGCTAACAATATCCEx7 F/R AACATAACTTAAACTGAATGCTACACC/CAACAAATGACTTGTAAAGAAAGCEx8 F/R CCTCCCTTATTTTCTTAATCGC/TTTTAAAACTGACATGGGCAAGEx9 F/R TGAGAATTTTAGGAAGCATTAGAGTTG/AGCCTTTTGTTTCTAGGGAAGCEx10/11 F/R GTTTCAACGATGGTACCAAAAG/GGAGGGAATGCAGATTTTCAACEx12 F/R TGAGATTGCCAGTTTGCTGTAG/GGACTGACTGAGTATTAATGAGAAAAGEx13 F/R AAAGCCAAAGGCACATCTTG/GCTCGGCCATAGAGCTGEx14 F/R TGTGCAACATAGAGTTGTGGG/TAGAGCTGACACCACTTCAAGGEx15 F/R AATAAAGAGCAATCCACATTCATC/ATCTGCAGGCCTGTCCCEx16 F/R GCTTTACTTAGAAACGCTGCC/AAACCTGCCACTATTTCACCAGEx17 F/R AACTTGGGCTGTATATTTTGGG/AAAGATGCCCATATCCTCTTCCEx18/19 F/R GAAAATGGACTGTGAGGTTGC/FAATACTGAGATCCGTGCTAGGGEx20 F/R AAGCAGGGCACTTCTTAAAGG/GCAATCACTCCTTTGTCACCCEx21 F/R CTCTGTGCAGCTTGTCCTTAAC/CATGCTACAGCAATCAATCCACEx22 F/R TGCCAGGAAGTAAAACTCTATGC/ACGGAGGGTCAGTCCCCKCNH2

Table B.1 - continued on next page

b pcr primers 209

Table B.1 - continued from previous pageGene / Exon Primer sequence (forward / reverse)Ex1 F/R ACCCGAAGCCTAGTGCTGG/CCATCCACACTCGGAAGAGEx2 F/R GAGTGGAGAATGTGGGGAAG/CTCGGGGCGTTCTCCAGEx3 F/R ACTTGGGTTCCAGGGTCC/CCATTACATCCTCCCTTCCTGEx4 F/R GTTCCCCTCCTTCCCTTACC/GTGAAAAGGTCAGAAGCCGAGEx5 F/R GGTCTCCACTCTCGATCTATGG/CTCTGGATCACAGCCCACTCEx6 F/R CTCCTCCTCATTCTGCTTGG/TTTCTCTGTCCTCCTCGCCEx7 F/R AAGTCTTTGGGGATCTGACCTC/GGCTAGCAGCCTCAGTTTCCEx8 F/R ACGCTGAGACTGAGACACTGAC/ACTTGTTTGCTGTGCCAAGAGEx9A F/R AGATTTCTCTGACATGGAGGGG/CCCAGTGACTGCATATTCAGAAGEx9B F/R GCCCTGTACTTCATCTCCCG/GGCCCTTAGTGAAACCAAATGEx1-New F/R AGTCCAGTCTTGGCCGC/ACCAGGCCCCATTGACTCEx2-New F/R CAGTCCCAGCCATGCTTC/CCCACAGAACCCTGCCCEx9-New F/R AGATTTCTCTGACATGGAGGGG/CCCAGTGACTGCATATTCAGAAGEx10 F/R AGCTGAGGGGACATGCTCTG/ATTCAATGTCACACAGCAAAGGEx11 F/R TTTTCTGTGTTAAGGAGGGAGC/AAGGGATGGGAAGGTCTGAGEx12 F/R TCCTGCCCAGTCCTCTCTC/CTGGGTGAGCGGGGTAGEx13 F/R GAAGAGCAGCGACACTTGC/CCCTCTACCAGACAACACCGEx14 F/R AGGGCCTGGGTTGACAG/CAGCAGAAAGGCAGCAAAGEx15 F/R CTCCCGTCCATCCTCTGTC/CTCCTGAGCAGGGCCTCEx1-New2 F/R CCCGAAGCCTAGTGCTG/CCCATCCACACTCGGAAGAEx2-New2 F/R GCTGTGTGAGTGGAGAATG/GTCACACCCCCACAGAACEx4-New2 F/R TCCATTTCCCAGGCCTT/GGCCCAGAATGCAGCAASMARCD3Ex2 F/R CAGATCCTCAGAATGGCCC/ATGTGGTGGGAGCAGGAGEx3 F/R GTCACCAGCCGAGGGTC/CGCTACTCGCTTACCTGGTCEx4 F/R GGTCACTCCTGTGTCCTGAGAG/AGGCCCCTGTGTTCTGGEx5/6 F/R CTCCCCTGACATCTTTTGTACC/CCTTAGTGCAGACACCTTGTTCEx7/8 F/R GCAGAGGATTTCATCTAAGCCC/TCTCTCAAGATGCACCACTTTGEx9 F/R TGTTTCCTCAATCAACACATCC/ATCTAGAAGGGAGGGGTGGTAGEx10/11 F/R GAATCAGACTTCCTCTCAGCTTC/ACACCCTGGTCTCAGAAAAGTCEx12/13 F/R TGAGGGGTTGTCCTTCACAG/CCACAAGCTGAACAGGGAGEx14 F/R GGTGTCAGTAGGTGGGATTGTC/TGAATGACTTTTAATCCAGCCCEx3-New F/R AACGCTCAGGGTCACCAG/CGCTCCCTGCTAATCATTCEx3-New2 F/R TCAACGCTCAGGGTCAC/ACTCGCTTACCTGGTCCNUB1Ex2 F/R TTACCCAACAAAGCACCTTAAC/GCCAAAAGTTTATATGGTCATGCEx3 F/R TCATCAGAGTTGATGGCCTTAG/TGGATTGTAACTTTCCAAACAACEx4 F/R TCAGTTTATATCCTCTGCCGTC/CTCACAGAGGATCAAATAAGCCEx5 F/R ATAGGATCTCTCTGTTGCCCAG/TTGGAAAAGCTATTTTCTGCTTTACEx6/7 F/R GGGTGACAGAGTGAGACCTTG/TCCAACCCTTTACAACACTTCCEx8 F/R TTCTTAATTTTGGCTTTTAGATTGAG/CCACTGTTGTAGATGAAAGTCTGCEx9 F/R AAACACAGCTCCCTTCCTAAAC/GCCAGATTTGTAAAACATTGAACACEx10 F/R TTTAATACCTATCATGTTGCCCC/GAGAAAGCACAGGCATGTTTCEx11 F/R CCAAAATCTGATGCCTTTTAATTC/AGTTTCTCCTTCCTTCCCAATCEx12 F/R CTGGAAATGCGGAGAATCC/CTCAGGGACAAGGCCAGAGEx13 F/R TCATAGCAGCAGAGGACGG/AGTCCTGCTGCCTGATTCACEx14 F/R TCTTAAACATGAGAGTCGGCTG/CCTCAGTCTCCACATCTGCCEx15 F/R CAGTCTGAGGTTGACTGTGGTC/AAAGAGCTGCATTAGCCTTTTCTMUB1Ex2 F/R GACCCTGTGGAGGTTCCAG/ATAAAGGAGGGCTGGGTCTTAGEx3 F/R GAGGGTGGGAGTGCTGTTC/AGAGGCAGGCCGGAGAGEx2-New F/R GGTTTAGGTGAACCGTAACG/CCTTCTGCGACCACTCTC

Table B.1 - continued on next page

b pcr primers 210

Table B.1 - continued from previous pageGene / Exon Primer sequence (forward / reverse)Ex3-New F/R GTGGGAGTGCTGTTCCG/CAGTCCGTGCCCTTCTCREPIN1Ex1A F/R GTTGGTCTCTGGAGTGACTGTG/CCGCAGATGAGCTCGAAGEx1B F/R GCCCTGGTTCTGCATCTG/GATCGCTTGTGAATCTTGCTGEx1C F/R CGCTTTACCAATAAGCCCTATC/GGCTGCCCTGGGAGAAGEx1D F/R AAGAACTTCGGCAAGAAGACG/TAGGATTCCCATTGTCCCCTACABCB8Ex1 F/R AGCCAACATAGAGCCCTCAG/CTCGTGGACTCCATCTCCCEx2 F/R AACAAGCATAAACATTTGCGG/CTACATGAGGCCTTCTCCCACEx3 F/R AAGAGGAAGTGCTCCTTCCAG/GGAATGCCATCCCTTTGCEx4/5 F/R GACAAGCGCAGTGCTCAGTAG/CTTCAGGATAAGGTCCTACCCCEx6 F/R GTGACTGATGGCCTGAGGG/GTCAGCCCAACATCAGGGEx7/8 F/R CTGACCCTTGGAAAAGTCCTTC/TGAGAGAGACCAGTGCATTCAGEx9 F/R AAGAGGAGTGAGTCCTGGGAG/ATGAAGGGAGGGCCTTTGEx10 F/R ACTCTCCCAGTCAGCAGTGG/AGCTGCACAGCAGTCGGEx11/12 F/R ACCTGCTTCCTTGCTCCC/GGGGACATTAGTCCAGACCTCEx13/14 F/R AGGGCTACAGGGAGACTTCAG/AAAGAAAGAAAAGGCACCTGAGEx15 F/R GACTGTCCCATTGTGTATGGTG/CTGCTCACTGCTCCTGTGGEx16 F/R GGCACTTAGAGGATCTTGATGG/CAACTTCATATAACTCATTGCAGCEx17 F/R GCCCAATGGCTGATAGAGAG/AGTCCCCAGTAGGCTCCCCRYGNEx1 F/R AGCAAATCTTTTGTCCGGG/GGATTTGTCACCATCAACAGTGEx2 F/R CTGGGAGTAGGGGTTCCATC/GATTCCTTGCCACTGTCTGCEx3 F/R GCTTCTGTTTCTGGGAGAGG/CACTGCTGGAGGTCTCATTCEx4 F/R GTAGATGGGTCCCCAAAAGG/AACTGGCAGAGAAATGAAAACTGRHEBEx1 F/R AAAAGCGGCGGAAGAAG/CTCACTTCCCCAGACGAGACEx2 F/R AATGTTTTCTGACCCTGATACC/CCTCCATGAATACACAATGGCEx3 F/R TCATTGTTCAAACTTCAGGTGG/ACAACCTGAGGGGTTTTCTTGEx4 F/R ATCCACTCTGCGTCTACTTTG/ACTATTAATAATTCCAGCACAGTCCEx5/6 F/R ATGTTAAAATGGTGAGTCTGTGAC/ACGACCATAACACCTCTGAACGEx7 F/R AGAAATTGACCAAAATGGAGCC/ATCAAGACAGAGGTCAAATGCCEx8 F/R CTGAAGGAAACAGGGTCAAAAG/TGAAAGCCACATCAAGTAGCACEx1-New F/R GATGGAGTTTCCGACACG/CGGCTCCAAACTTCGCAPRKAG2Ex1 F/R GTGGCTTCAGGGTGTAACAGAG/AGCGTTAACCGCTTCGGEx2 F/R AAGGATAAGGTCTCAGGAAGGG/GTGAACACACAGGTGGAAGTGEx3 F/R AGTTTCATACAGGGCAGGAGG/ATACAGGCACTCAGCACCAAGEx4 F/R GAGACCAGACCTCTGACCACC/TGCAGATAGACTGCTCAGCTTGEx5 F/R GTCCCTTCCCACGCTTC/TAATTCGAGTTCGGCGAGTAAGEx6 F/R GGTTTACGTATATGCTCTTCATCC/AATAAACGTTTGCTGGCATTGEx7 F/R AAGACTTGGTCAAATATGTTAAGGG/TGAATTCCAAACCGGCATCEx8 F/R TCCAGTTTGGTATACACAATTAACAC/GATTTGGAAAATCACCATCAGCEx9 F/R TGCTATTATCCAAGCTATGTTTTG/TTTTATGGAGCAGAGAGAAAAGGEx10 F/R AAACTGAAGGGTATTTGAATTTGTG/GCCCTGTTTGGAATGAAGAACEx11 F/R AAGTGCTTTAAGGCAGGGGTAG/GGGAGACCTCAAAACTAACAATTTCEx12 F/R TCATGTTAATTCAGGCATCCAG/CTCCTGTGCCAGGTCATCTTACEx13 F/R TGAATAAATTCAGCACCAAGGG/CCTCACACCCCAAAAGGTTAGEx14 F/R ATGCAGGTACTAAATTCAGATAAGC/CTGAGATTCAGGCTTCCAGAGEx15 F/R GGCTGGAGGGATGTGTTG/TTAAATGCTGCACTTCCTGTTCEx16 F/R CTGCCAGACACAGGTAGAAACC/ACATTCTTAACCACTTGCAGCCEx1-New F/R CTTCTTGGCCCGCTGGA/CAGCGTTAACCGCTTCG

Table B.1 - continued on next page

b pcr primers 211

Table B.1 - continued from previous pageGene / Exon Primer sequence (forward / reverse)GALNT11Ex2-New F/R AGTGGTCCTACTTGGTGGTTTG/TGATGCAGCATTTCTGATCTCEx3-New F/R CAACCTAGAGACAACCTAAGTTTGAG/CCTCGTGACCTCTTACACTGACEx4-New F/R GGTGAACAAGGGGACAGTAAATC/CTTGAAAGTTCAGCTGTGTTAGTGEx5-New F/R CTGTAGTATTTGCTTGATAGTTTGTTG/ATCAGGACAGGCGCTGAGEx6-New F/R ATTGGACGGGTGGTTGTACC/CTTTTGCTCCTTACAAAGAAGGEx1 F/R ACGGGTGGTTGTACCTAAATTG/CTTTTGCTCCTTACAAAGAAGGEx2 F/R GTTATGTTCCAAAATGACACTGAG/GGCAGCAGAGAAAGACCCTATCEx3 F/R CTTGCCAAGTTTCATAAACACC/CTTGCCAAGTTTCATAAACACCEx4 F/R TTCATTGATTCTCAGATACAGTTTAAG/TGCGTAGAGAAGTTCCCATAAAGEx10-New F/R AATTCGTACCTCCGGAATAACC/CTGATGTCCTCAGGAGTGACTGEx11-New F/R AGTGCATTGGTCTGTTTCTGC/AAAATGAGTTCATCTGCCACACEx12-New F/R ACTAACCTTTGGCTAAGAGGGG/CTTTCCTAAAGCCCCAAACTCCSLC4A2Ex2 F/R CCATCCAGGTTATGCCTCC/CTCACACCCCTGGTTGGCEx3 F/R ATTAGAGGAGATAGCTGCTGGC/ACTAGGACCCCAGTCCCTTCEx4/5 F/R GGGATCAGGTTTGACTGTCC/GTTCCAGCATCACCTCACTATCEx6/7 F/R AGCTAGGAGGGCCTGGAGTC/AGGCTTGAGATTACAGCTGGTCEx8 F/R ACTCACCTCTGACACCCACC/GTGGACCATGCTTCCTCTGEx9/10 F/R CATGTGTAGGCTGGTGTTCC/CCTCCCACTCTGAGTCCACEx11/12 F/R TGAGGAAGCCTGGGTGTG/CACAAAGAAGAGGGCTGAGGEx13 F/R GTCTTGATCCCCATGACTGC/ACCCAGGCCTCCGTCAGEx14/15 F/R ATCTGCCCAAGATAAGGGTACG/AGGTCCTGAGCTCTGGGTGEx16 F/R CTCGGTGAGGGCTCTTCTC/CTCAAATCCAGAGGATGGGACEx17/18 F/R TTCCACCCGACACTCACTG/TAAGCTTAAGGACAGGCTGGGEx19 F/R CTTAAGATGCCATCCCCTTTC/ATCCAAGGTTCCCTGAGGTCEx20 F/R CATTCTGGAAAGACCCTGTG/ATCCCCGATAACTATGGAGAGGEx21 F/R GTACGTTGCCTCTTGCCTTTC/CTGAACTGGGCAGATGGGEx22/23 F/R CTGGCACCTAGGAATGTTCTTC/CCAAAGCACTTTACTGCAGGGXRCC2Ex1 F/R CTGCGCAGTTGGTGAATG/CCCAAGCCTCCCAATCCEx2 F/R CACCCAGCCTAAAGTTATGAAATC/CACAACCTGTGTCTCATTTTCCEx3A F/R TTGCTTGTACAGCTCCATTTTG/TGCATTATAGTTTGTGTCGTTGCEx3B F/R CCTTTTGATTTTGGATAGCCTG/TTTAAAACTTTAGGGCTGGGCABCF2Ex2 F/R ATTCCTGAGAGGGATGGAGC/ATCAAGATTCCAACTCTGGCTGEx3 F/R AGTGCTCAGCAGGATTGGAG/TTTGCTCTGAACACAGTTGATGEx4/5 F/R ATGGATGAGGTCCCAGTTTGAC/ACCACTGACTCCTGTTCCAAACEx6 F/R ACTCAGAATTGCATGTAGCAGG/TAACATCAGTGCAGTTTCTGGGEx7 F/R GGACAACACATCTAAAAGGTGC/TCAAGTGACCCACCCACCEx8/9 F/R CTTACTGCACAGCCTTAGAGCC/GCTCCTTTCTTATCCACCTCACEx10 F/R GATAAGGTGGGGTGTGATTGG/AGCACAGAACAGGAGCAACTCEx11 F/R CTTCCTCCTCTCTTGGTGACTG/AGTAGTAGCCCCTGCATCACTCEx12/13 F/R GTTTGCCTGCTCTGCCTATG/AAATGGAGGAGACTCTGGTTTGEx14 F/R TTATAAATGGCGATGCTAATGG/GGGTATTATGCACCCTCCACEx15 F/R CTTCCTAGACGCCCCTCAG/GAGCGGCTGGTCAGGTTAEx16 F/R CTTGCAGTGTAGGCAGGACC/CACAGCCGTCTGGACTTCTCKELEx1 F/R AAGGGCAAGATTGCTTGG/GGGCAGGACTTTTGCACEx2 F/R AACTCTCTTCAATGCACCAT/TAGTAGATGAGTGTTTGTGGGATEx3 F/R TTCTTCCCATCCCTGTCTCTCAA/GGTCAGGGTGGGCAGAAEx4 F/R CCTGATGTTCTTTGCCTAGC/GGAGCAGTCATGGTCATTT

Table B.1 - continued on next page

b pcr primers 212

Table B.1 - continued from previous pageGene / Exon Primer sequence (forward / reverse)Ex5 F/R CCTTTGATAGGAATTTGGGGC/TAAGCATGCATTGGTCCCATAEx6 F/R AGAGCCGATCCAGACAA/ACCTTCCTCTAAGGGATGATTEx7 F/R CACCTGCCCTCTCCATC/CCAGAGGTCCCTGCTTTEx8 F/R CCTCTGAGGCCAGGAGAA/GAGGAAGAAGGAAAGGAGGTAATEx9 F/R TCACACCCAAGGGGAAG/CCAAGATCTACGGTGCTCAEx10 F/R GGCTCTTCTTGAAGCATCC/GAGAGAGATGCCGACATTTACEx11 F/R GGAATGAAGATCTAGTGAAGACTG/GGAAGGCTGCTTTGGGTAEx12 F/R GTCAGAAGCTGTGGGCA/CATCTCGCTTGTTCCAATACTEx13 F/R GCAACGAGTGGGAGAAAT/CTTAGGAGGGTCAGAGAAGTGEx14 F/R ATTGGTGTTTGTCAGCGT/GGCTTCCTTTCCTGTGAATCEx15 F/R GAGGGACTGTGTAGGTCTCA/CATCATAACACCTGTCGGCEx16 F/R GGACTTACGGTTGGAGAATTG/CCCTTGTGGTCTTCCCTTAEx17 F/R CTCACCTAGGCAGCACCAA/CAGGAATGGTGGAAGGAAAEx18 F/R GTGTGGAGGGACTTTCAAAT/AGCTTCAGTAATAAGGCGTTCEx19 F/R ATCACCCCTTGTCACCG/GGAAATCTTGCTCTGATTCCATMLL3Ex1 F/R GGAGGAACCCGAGCAGCA/GGTGACAGCTCCGGCCCEx2 F/R TCAAATTGTGCCTTATGTTGAATATAAGTC/TACATGCAGTGTACAAACATATGCAAAEx3 F/R ATGCTTTATAGATACATTCACCAGTGATTA/TTTCTCTATAGTCATTTCACTGCTTTATTCEx4 F/R AAGCACAAGTATCTATAGTTATCGC/GAACAATCCCATTTGAAAATTTATAAACATEx5 F/R ATTTGGCAGTACATTGCATTTAAACA/CTAGATGAAAGGTATTTATTATTGAGGACAEx6 F/R AGAGTTGATTTTGAAGTTTTATGTCTTTAT/AATCCTTTCAAGTGAACTTCTGATTEx7 F/R CTTCCCAGCTTATATTTGAATATTCCTT/ACTCAGGGAATTTACAACATTTGTTAEx8 F/R AATCCTTTCAAATGATTTTACTGTATAGTC/CTATTATATTACACAAACCTTTAGCACCEx9 F/R GCAGTTCCCAGGACTGATTATT/CCGATTTGTCTGGTCTCCATEx10 F/R AAGGTTTACATCAGAACTACAATTACAAAT/GCTGTAAGATGGTAGAGCAAATGACEx11 F/R GCAATGCCTTCTCTTTACCTAGATAAT/ATTGGTGATACAATGGCACAACEx12 F/R AATCATAATTGTTGACTTAATGAACAAAGA/ACTGAAGTTTTAGTCAATATTCACATATTTEx13 F/R AACTCTTTACTCAAATTACTGTTGAGGGAA/AGCAATCGCAAAAATGTCTTTGGEx14A F/R GTTCTAGTGTAGCCAAATGTTTTATATATC/CATCAATCCAGTAGAAAGTTCAGAATEx14B F/R CATTATGTCCTGAGGAACAGT/TCCACACCAAAGCTACACATATACEx15 F/R TTCCTTAGTTGCTGGCTTTGAC/AAGATTTAATTCTTAACATCCAGTAGGGEx16 F/R GTTTTATCTCAGTGGCATTTGGATTTA/ACTTGCTATGAGATTTTCATCATTAACTEx17 F/R TTCTGAAGCTTTCCAGTTAAGTGTA/GTATATGAGATAAAGCAGAAATGTGCAAEx18 F/R TAGCTATAGAGCTATATTAGTTAAGAAGGT/AACTGGGAGCTCATGTTACTEx19 F/R CTTCTAGGTACATAAGCACCATATTAATTT/ACCAAAGTGGTATGAGAAAGCEx20 F/R CTTCCTGAATTACAGTAACTGAGTAACTA/TTGTGATTAAGGAAAAGCTATTTTCCATEx21 F/R TGAAGACTCAGTGTTACAGGTTT/GGTTATGGACAGTAAGGTGCAAEx22 F/R GCACTTTTAACTTTAGTGTTCAAACAT/GTTTGACAAGTGGTAAGCCAATEx23 F/R ACATGGGTATTTTGGAACAGTAATC/ACACATATTTCAAAAGTTACTTTATATGCTEx24 F/R ATTCCCCATTATGTAGGGATTATATATTTG/ATTGAAGTCTTCAGTATTGTATTTTTCTTTEx25 F/R CCTGTGTACTGTGGAAATGTAG/TTTCCAATAGAAATATCAAGACCAAAGGEx26 F/R ATTACCTTCAGTGGGTTTGTCT/ACAACAAACCTATGTTTTAAAACTCAATACEx27 F/R TCTGTTATTTCTGCTTCCTTGTCAA/AAAGATTTAACTGAAAGGAAGATGTATGEx28 F/R CAGTATCTGCTGTACTCTAGGTT/ACACATCATACACCTCTCGCEx29 F/R GTTATATACTAAGTACAGGTAAATAGTGCT/AGAAGTTAAATTATAACCATGCAAAAATCTEx30 F/R AAGATTTTTGCATGGTTATAATTTAACTTC/AGACTAGTAGGGTCCTGGTAEx31 F/R CCCATACCAGGACCCTACT/CTTGCTTGTGTGTGTATATGTACCEx32 F/R AGCTTTACTTCAACTCCTTGTTAC/AATATCATACACAATCATTTAAGCAGACTAEx33 F/R TGTAATAGGTTTTAATGAAAGCTATACTTG/GTTAAGCTACCTTTCTATTGCAGTTEx34 F/R CATCAACATTCTTTATTACAACACATCAT/CCTAATACAAATAAGGTTCAAGCACTGEx35 F/R TTTCTTTTATTTGAGAAAGTTCATGATAGT/AAATAATAATGCAAAGACCTCCCT

Table B.1 - continued on next page

b pcr primers 213

Table B.1 - continued from previous pageGene / Exon Primer sequence (forward / reverse)Ex36A F/R TCCCCCCATGATGGTAGTT/AGATGGTGGTCGTGAGTTAGEx36B F/R GTCTTTCTCAGGCTCAGACT/TCACTGGTTCCAGCTGCTAEx36C F/R TCCCAAATCCTTGGGCCTATC/CTACAACAGGTCGTGGAGTTCEx36D F/R CCCAGCCTGGAACCATA/AGGTGTCTGGGAACATGTATCEx36E F/R TTCCTGCAAGCAGCACAA/TCCTGTGTAGAAATTACTCAGTTTGATGEx37 F/R GGGCAACATAGCAAGATCCCATATCTAA/GCCACCACACCCAGCTCEx38A F/R GGTGATAGCTTTGCTTTTGTAAGTC/GGAGCTTCTGAAAATTCACCACEx38B F/R CCCAGGGTCTACCTAATCAGC/GAGAACCAGAGTTTTGTTTTCTTGEx38C F/R GATCCAGAACTTGACATGGGAG/AACTTGACACATGATTGGATGGEx38D F/R TGTGGCATAACTGGATCAACTC/AACAAACTCCTCAAGGAGGAATAGEx39 F/R AATATTTGAATTACTTCTTAGTGTACGTTG/GGGCTTTCCATTTTCTGTATGTEx40 F/R ACTGTTAAGCACCTAATACAGTTT/AAGCATGAGTTTGCTGGGEx41 F/R TTTTCTTTCTTTCTTTTAAAGTTTTTTCCC/GGTTTAGAACAGCACACATAGAGEx42 F/R AGATTTGTGTAACTTATAATGTTGGACT/AGTCACTAATGAACAACACACCEx43A F/R ACCTCAACCACAGTCATCAGC/CCGTTGTCTCTCTTGCTGTTCEx43B F/R AGTCTCAAATGCAAATCCACAG/GTGTAAAAGAAGGCCTCACTGGEx43C F/R ATTCACCATCAATCCCTGTTG/TCCATGGAGAGCTTGTCTACTTCEx43D F/R AAGAGTCGGTGGAACCAGTC/GAATAATGTGGAAATACAGTTGCTGEx44 F/R AGCTTGGCACTGCATGAACTAA/GGCATGAGCCACTGCCCEx31-iso2 F/R GGTCTTAGAAAGTTGAGCTACAC/ACAAGCTCTATCAATTCTGACATGTTAEx45 F/R ACAGAACTACGTTTTAGCAAGG/CTTTACAATTCTATTGCCATTTCACAATACEx46 F/R GGCAATAGAATTGTAAAGTTCTACCC/AAAGACATTATGCCACATACATACATTTEx47 F/R AGGGTTGGTACAGGAAGC/AATCAAAATATTTACATTAGTAAATTGGCGEx48 F/R ACACTACGCCAATTTACTAATGTAAATA/TGGTAGACACCTGACCCATCEx49 F/R TTTCTTCAGTGGTTCACTGTTG/AAGATAATCAGTATCTCAAAGAGTAAGCEx50 F/R ATGTTGTAATGAGTTGATACTGCATAATTT/AAGAGACACTGCCAAGGACEx51 F/R AGAACCACTGTCACTTTTGAATATAACATA/ATGCCCAGAACACAGAAGTCEx52A F/R TTGTCTTAACAAGAATTGCTTTCCG/GATAGTCTTTGGGCACAGGATEx52B F/R ACATTTAAACCACCTTGTGAGGATGAAATA/TGTCGCACCTCATCACGCEx52C F/R CCAAAGGGAATTCATGAGCAAG/ACAAAATACATTCATTAGCCTGACAEx53 F/R TCCTTGTGAGTGGTTATTCTTCATT/ATGTTACAGCATGTGTTCAAGTEx54 F/R CCTTTGACCAGCTTGAGATTG/ACAGGAAACCACTAATACGCCEx55 F/R TTTGGGCATTACTCAGAGGTATC/TCACATACTGGACATCAATAGCTTEx56 F/R CGAGGGTGAAACCTTTGTATGAA/ACGGAGAAAGGGAGAGGATEx57 F/R AGTCCTCCTGGTGTGGATAG/TCTCCACCAGCACTGGCEx58 F/R ACTGCTTTATCAATAAATGGGAAGT/GACCTGTGTGAGGAGGGEx59 F/R AGATGGAGGCAGAAACGC/CCAATGGTGTGTTGAATCGCPDIA4Ex1 F/R CCAGACCCCGAGTTCTC/CACTAGTTCTGGAGCTGACEx2 F/R CCGGCCAGGAAAGTTTTAAG/TGAAGGGCTGAGCGAATEx3 F/R ATGTATTATTGTTGGATTTAAATCCCATAG/CAAATTCCCATTCCCTGCGEx4 F/R CTTCTGGAGGGAGGTGAAG/CTGTTCCTGCCTCACGGEx5 F/R GCTCATTTAGCTAGAAAGTGAGTC/TCAATGGCAGCTTCCTGTTAEx6 F/R ATGGGCCATGGGAAGTG/GATGTCCCAGGAGCTTTCTEx7 F/R GGTAACGAATTAAATCTCAGACATTTAG/GTGGCTCAAAGCAGAGTEx8 F/R GTCCCTCAAACATTGTACTCAAGAA/CACTCCACCTTGCCTGAAAEx9 F/R GAGTCTGAGCTGGGACTTG/CCTGGCCTGTCTGGATTEx10 F/R CCCAGAATCTTTGGCCTTT/ACTGTCGTTTGTTGCCG

b pcr primers 214

Table B.2: PCR primers for microsatellite markers. Forward primers were5’-FAM labelled.

Marker Primers forward / reverseD7S1815 F/R FAM-GCCTACTTCCCTTGCTTACC / CCCAGTTCATGCCTCTGCD7S615 F/R FAM-TCCTATGGAATTCCTGTGG / GCCATTGGTCACTCATATTTGD7S2442 F/R FAM-TGAGCCAAGATCACAGCACT / CTGGAAGCAACAGATGTCACTAD7S2419 F/R FAM-TTGTCCCAGGCAGTTTTAC / GGCAATGAAGTTTCCCAGAD0-23xGT F/R FAM-GCTGAGATGGCACCATTGTA / GTAGGTGTGGGGGTGGAGTAD1-28xTA F/R FAM-ACATGCCTGTTATCCCAGCTAC / CGTTTTGATACATTATGGAATGGAD2-30xTA F/R FAM-GGCCTTGTGATCATGTGAGTTA / GAAACGTTGAAACAATGACAGCAD3-23xCA F/R FAM-AGATACTCAGGAGGCTGAGGTG / ATTCCTGGGGATGCTTATGACD7S1823 F/R FAM-TCCAAGTAAAGGACAGACTTC / AGCCTCTGTGCAGATCCTC †

D7S2447 F/R FAM-GTCAGATGTAGGAACCAAGC / TCTCACTGTACTCAGCCTGT †

D7S1518 F/R FAM-TCAGTGGGGAAAGCCCTCAC / GGGCAACAAGAACGAAACTCC †

D7S684 F/R FAM-GCTTGCAGTGAGCCGAC / GATGTTGATGTAAGACTTTCCAGCC †

D7S637 F/R FAM-GCTGTACCTCCCAGCA / ATATCAAAACCAAGATGTTGAC †

D7S661 F/R FAM-TTGGCTGGCCCAGAAC / TGGAGCATGACCTTGGAA †

D7S2513 F/R FAM-GCAGCATTATCCTCAACAGC / CACAAATGGCAGCCTTTCD7S2473 F/R FAM-CAGACCTCAGTGCCTCAG / GGGGATTCACCAGTAGATG† Marked primers were supplied by Sigma. Otherwise primers were supplied by

Invitrogen.

Table B.3: PCR primers for RT-PCR. Primers supplied by Invitrogen.Name Primer sequenceCNTNAP2-Ex20 F GCCACGAGAAGACCATCTTTCNTNAP2-Ex21 F CCCAAGTCGCTCTTTCTGCNTNAP2-SiEx1 F GACAAAGGACGGTCCTCGCNTNAP2-Ex22 R TCTGGAGAGGCAACCAGTCNTNAP2-Ex23 R GGATTATATGGAAAATCCGCACTCNTNAP2-Ex24 R GGTGCACAGGATGGTGAA

Table B.4: PCR primers for HRM analysis. Primers supplied by Sigma.Name Primer sequenceMFN2 Exon5F CTGGTATCTGCGTTGTGAAAGTMFN2 Exon5R AGAGGTGAGGTCAGCCGTGTC

Table B.5: PCR primers for validation of CNV identified with aCGH at 7q34-q36.Primers supplied by Invitrogen

Name Primer sequenceDelAa_F AGATTCATATTCAAACCATCTTTATTTACGDelAb_F TAGTTCTAGGTATCAGGACTAAAGGAATDelAc_F GATTCCTTCACAGTTCTAGTAAAGATTDelAa_R AACAAAGCAGGGTCATAGCDelAc_R CAAAGCAGGGTCATAGCATDelC_F TATAATGTGTGGGATTGAGGGTCDelC_R CTGTTACATGAATTGAATAAACTACAGTTG

b pcr primers 215

Table B.6: PCR primers for validation of variants identified through NGS at 7q34-q36.Primers supplied by Invitrogen and Sigma as indicated.

Sequence Name SupplierAGCATCGCACTAATCCACG CLCN1-UTR6.1F InvitrogenGAGCTACAAAGGGATGGGG CLCN1-UTR6.1R InvitrogenCAATCCCTCACCCTACCTCTTAC CLCN1-UTR6.2F InvitrogenACAGTCATTCCCTGGATGC CLCN1-UTR6.2R InvitrogenGCAGGGCCTGCAGTGAC SSPO–>C F InvitrogenGACAGTGCCCTGGGTTGGTAA SSPO–>C R InvitrogenCTTTGGGTGGAGGCATGG SSPO-C>G F InvitrogenCCCCTCACAAGTCCGATG SSPO-C>G R InvitrogenGCCCTGGCTGTGGGAGAA PRSS-FRAG F InvitrogenGCCTGGAGGCGAGAGGATTTA PRSS-FRAG R InvitrogenACTTGATTATCTTACCTTCTTATTCAAGTT ZNF398 frag F InvitrogenTCTTCAGTCAGTCCTGGCA ZNF398 frag R InvitrogenTGAAGGAGATCCCAGAGTTCTTGTTT KRBA1 frag F InvitrogenAGTGCAGTGCCGTGGAC KRBA1 frag R InvitrogenTCAAATTGTGCCTTATGTTGAATATAAGTC MLL3 frag F InvitrogenGTACAAACATATGCAAAGCATCATTATAC MLL3 frag R InvitrogenCTGACTCCCGGCCTCTC CENTG3prot Ex1 F InvitrogenACACTCGGTCCAGACCC CENTG3prot Ex1 R InvitrogenCTGGTTACACAGGTGTGTTCAGT NOS3EST-Gen-F InvitrogenGGCATCTAAACAGCGCTCAAT NOS3EST-Gen-R InvitrogenTCCCTCGAACACGAGACG NOS3EST-cDNA-F InvitrogenAAACAGCGCTCAATTTCTTCTTT NOS3EST-cDNA-R InvitrogenATTCCAGCATTTTATTTTATTTGTTTTATT NOS3EST-LS-F InvitrogenAAGGCCAGCCTGGGCAA NOS3EST-LS-R InvitrogenCCAGTGTCTGGGTTCCTGAG SSCP-Exon19-21-F InvitrogenGAGGGAAGGATGATGTGTGG SSCP-Exon19-21-R InvitrogenGTGTCTGGGTGGGAGGAG SSCP-Exon63-65-F InvitrogenGTTCTTCGCTGTCCAGATCC SSCP-Exon63-65-R InvitrogenCGATCTGATTGCTGGGG SSCP-Exon99-100-F InvitrogenCTCTCCCCACACCTACCC SSCP-Exon99-100-R InvitrogenAGTACCAGGCCAAAGTGGAAT GIMAP5-frag-F InvitrogenAGGCACACAGCTCCAAGTCT GIMAP5-frag-R InvitrogenACAGCAGCGGGTAGTATTTAGG CUL1-Ex1AF InvitrogenACCAGCCTCTGGAGAAACG CUL1-Ex1AR InvitrogenCCCAACAGCTTCTCCCC CUL1-Ex1BF InvitrogenAGTCACTGCGACAccgc CUL1-Ex1BR InvitrogenTTTTCTGTCAGTGTGTCTCACC CUL1-Ex2F-Alex InvitrogenAAGGTCTAGCTGCGTCCCTC CR624471-Ex1_1F InvitrogenTCGAGAATTGGCTGAGGTTC CR624471-Ex1_1R InvitrogenTTCAACCTAAATACTACCCGCTG CR624471-Ex1_3F InvitrogenAACCCCACATAGGTACACTGG CR624471-Ex1_3R InvitrogenCTGTTGCTGCTGCTCGC UTR-A-CNTNAP2F InvitrogenCTCCAGGCTCCAGTTCTTCC UTR-A-CNTNAP2R InvitrogenCCTTCTTGGTTTGAATTTCCTC UTR-B-CNTNAP2F InvitrogenACTTCGGAGTTGATACCCTGG UTR-B-CNTNAP2R InvitrogenGCAGTGTCATCTCCTACCACAG CNTNAP2-Exon18F InvitrogenTTGGAAAATTCCTACCTAAGTTGAG CNTNAP2-Exon18R InvitrogenGATACTTTTGTGTGGCTCTCCC CNTNAP2-Alt-EX1F InvitrogenGCGCACCCTTGCAACAC CNTNAP2-Alt-EX1R InvitrogenATTATGCAATTGGGTTGGGAG LOC392145-EX1F Invitrogenctttctgtgcagggaacagc LOC392145-EX1R InvitrogenTCCCCTTCCCTGGGCTC CASP2-Ex1R SigmaTTCACCCCATTGGACCGC CASP2-Ex1F Sigma

CP E D I G R E E S F O R A D D I T I O N A L D H M N FA M I L I E S

This Appendix shows the pedigrees for the additional dHMN families not

shown in Chapter 3. The dHMN families uninformative for markers at 7q34-

q36 are; CMT484, CMT274, CMT677, CMT703, CMT131, CMT273, CMT333 and

CMT609. The dHMN families excluded from linkage to 7q34-q36 are; CMT701,

CMT260, CMT618, CMT477, CMT748, CMT779, CMT808, CMT375, CMT102,

CMT724 and CMT331.

216

c pedigrees for additional dhmn families 217

Figu

reC

.1:P

edig

ree

CM

T484

wit

hha

plot

ype

anal

ysis

at7q

34-q

36Fi

gure

C.2

:Ped

igre

eC

MT2

74w

ith

hapl

otyp

ean

alys

isat

7q34

-q36

c pedigrees for additional dhmn families 218

Figure C.3: Pedigree CMT609 with haplotype analysis at 7q34-q36

c pedigrees for additional dhmn families 219

Figu

reC

.4:P

edig

ree

CM

T67

7w

ith

hap

loty

pe

anal

ysis

at7q

34-

q36

Figu

reC

.5:P

edig

ree

CM

T70

3w

ith

hap

loty

pe

anal

ysis

at7q

34-

q36

c pedigrees for additional dhmn families 220

Figure C.6: Pedigree CMT131 with haplo-type analysis at 7q34-q36

Figure C.7: Pedigree CMT273 withhaplotype analysis at 7q34-q36

c pedigrees for additional dhmn families 221

Figure C.8: Pedigree CMT333 with haplotype analysis at 7q34-q36

c pedigrees for additional dhmn families 222

Figu

reC

.9:P

edig

ree

CM

T701

wit

hha

plot

ype

anal

ysis

at7q

34-q

36

c pedigrees for additional dhmn families 223

Figu

reC

.10:

Pedi

gree

CM

T260

wit

hha

plot

ype

anal

ysis

at7q

34-q

36

c pedigrees for additional dhmn families 224

Figure C.11: Pedigree CMT618 with haplotype analysis at 7q34-q36

c pedigrees for additional dhmn families 225

Figure C.12: Pedigree CMT477 with haplotype analysis at 7q34-q36

c pedigrees for additional dhmn families 226

Figu

reC

.13:

Pedi

gree

CM

T748

with

hapl

o-ty

pean

alys

isat

7q34

-q36

Figu

reC

.14:

Pedi

gree

CM

T779

with

hapl

o-ty

pean

alys

isat

7q34

-q36

Figu

reC

.15:

Pedi

gree

CM

T808

with

hapl

o-ty

pean

alys

isat

7q34

-q36

c pedigrees for additional dhmn families 227

Figure C.16: Pedigree CMT375 with haplotype analysis at 7q34-q36

Figure C.17: Pedigree CMT102 with haplotype analysis at 7q34-q36

c pedigrees for additional dhmn families 228

Figure C.18: (Main figure) Pedigree CMT724 with haplotype analysis at 7q34-q36. (Box)The two possible haplotype pairs for individuals II:1 and II:2 can be inferred fromgeneration III but can not be assigned to either individual.

c pedigrees for additional dhmn families 229

Figu

reC

.19:

Pedi

gree

CM

T331

wit

hha

plot

ype

anal

ysis

at7q

34-q

36

DT W O P O I N T L O D S C O R E S F O R A D D I T I O N A L

DHMN FAMILIES

This Appendix shows the two-point LOD scores for the additional dHMN

families not shown in Chapter 3. The dHMN families uninformative for mark-

ers at 7q34-q36 are; CMT484, CMT274, CMT677, CMT703, CMT131, CMT273,

CMT333 and CMT609. The dHMN families excluded from linkage to 7q34-

q36 are; CMT701, CMT260, CMT618, CMT477, CMT748, CMT779, CMT808,

CMT375, CMT102, CMT724 and CMT331.

Table D.1: Pedigree CMT274 two-point LOD scores between the dHMN locus andmicrosatellite markers on chromosome 7q34-q36.

Marker LOD score at recombination fraction0.00 0.05 0.10 0.20 0.30 0.40

D7S1518 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S661 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S498 0.5809 0.5153 0.4469 0.3035 0.1612 0.0463D7S2511 0.5809 0.5153 0.4469 0.3035 0.1612 0.0463D7S615 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S2442 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S2419 0.5809 0.5153 0.4469 0.3035 0.1612 0.0463D7S688 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S2439 0.5809 0.5153 0.4469 0.3035 0.1612 0.0463D7S3070 0.5809 0.5153 0.4469 0.3035 0.1612 0.0463D7S483 0.4403 0.3842 0.3271 0.2128 0.1075 0.0295D7S1815 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S2462 0.3538 0.3052 0.2566 0.1623 0.0797 0.0213D7S2447 0.4403 0.3842 0.3271 0.2128 0.1075 0.0295

230

d two point lod scores for additionaldhmn families 231

Table D.2: Pedigree CMT609 two-point LOD scores between the dHMN locus andmicrosatellite markers on chromosome 7q34-q36.

Marker LOD score at recombination fraction0.00 0.05 0.10 0.20 0.30 0.40

D7S1518 -3.5741 -0.4492 -0.2014 -0.0119 0.0449 0.0417D7S661 -3.2730 -0.1704 0.0539 0.1922 0.1910 0.1209D7S498 -3.4491 -0.0634 0.1437 0.2505 0.2231 0.1335D7S2511 0.5944 0.5222 0.4540 0.3294 0.2171 0.1102D7S615 -0.1041 -0.0720 -0.0493 -0.0219 -0.0086 -0.0025D7S2442 -3.2730 -0.1704 0.0539 0.1922 0.1910 0.1209D7S2419 -3.2730 -0.1704 0.0539 0.1922 0.1910 0.1209D7S688 0.4058 0.3657 0.3262 0.2482 0.1701 0.0889D7S2439 0.0039 0.0263 0.0398 0.0486 0.0416 0.0244D7S3070 -3.5741 -0.4492 -0.2014 -0.0119 0.0449 0.0417D7S483 0.3503 0.3799 0.3841 0.3449 0.2626 0.1474D7S1815 -0.0689 -0.0724 -0.0684 -0.0481 -0.0245 -0.0070D7S2462 0.7270 0.6616 0.5958 0.4616 0.3214 0.1702D7S637 -0.2024 -0.1814 -0.1543 -0.0970 -0.0490 -0.0163D7S2447 0.5351 0.5039 0.4682 0.3832 0.2792 0.1532

Table D.3: Pedigree CMT333 two-point LOD scores between the dHMN locus andmicrosatellite markers on chromosome 7q34-q36.

Marker LOD score at recombination fraction0.00 0.05 0.10 0.20 0.30 0.40

D7S1518 0.0223 0.0252 0.0256 0.0209 0.0123 0.0038D7S661 0.1316 0.1094 0.0887 0.0521 0.0239 0.0061D7S498 0.0580 0.0529 0.0466 0.0319 0.0167 0.0048D7S2511 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S615 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S2442 0.0280 0.0228 0.0181 0.0103 0.0046 0.0012D7S2419 0.0902 0.0744 0.0598 0.0347 0.0157 0.0040D7S688 0.0580 0.0529 0.0466 0.0319 0.0167 0.0048D7S2439 0.1397 0.1182 0.0975 0.0594 0.0283 0.0075D7S3070 0.1316 0.1094 0.0887 0.0521 0.0239 0.0061D7S483 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S1815 -0.2108 -0.1621 -0.1227 -0.0647 -0.0276 -0.0067D7S2462 0.0580 0.0291 0.0079 -0.0136 -0.0141 -0.0055D7S637 0.0580 0.0291 0.0079 -0.0136 -0.0141 -0.0055D7S2447 0.0792 0.0652 0.0523 0.0302 0.0137 0.0035

d two point lod scores for additionaldhmn families 232

Table D.4: Pedigree CMT484 two-point LOD scores between the dHMN locus andmicrosatellite markers on chromosome 7q34-q36.

Marker LOD score at recombination fraction0.00 0.05 0.10 0.20 0.30 0.40

D7S1518 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170D7S661 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170D7S498 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S2511 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170D7S615 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S2442 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170D7S2419 0.1761 0.1477 0.1206 0.0719 0.0334 0.0086D7S688 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170D7S2439 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170D7S3070 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170D7S483 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S1815 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S2462 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170D7S637 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170D7S2447 0.1761 0.1477 0.1206 0.0719 0.0334 0.0086

Table D.5: Pedigree CMT677 two-point LOD scores between the dHMN locus andmicrosatellite markers on chromosome 7q34-q36.

Marker LOD score at recombination fraction0.00 0.05 0.10 0.20 0.30 0.40

D7S1518 -0.0212 -0.0340 -0.0402 -0.0373 -0.0228 -0.0072D7S661 0.1038 0.0780 0.0569 0.0272 0.0105 0.0024D7S498 -0.0984 -0.0783 -0.0630 -0.0398 -0.0211 -0.0064D7S2511 0.1415 0.1124 0.0874 0.0485 0.0220 0.0058D7S615 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S2442 0.2075 0.1682 0.1310 0.0673 0.0243 0.0040D7S2419 0.0186 0.0176 0.0159 0.0110 0.0057 0.0016D7S688 0.2239 0.1839 0.1453 0.0774 0.0294 0.0053D7S2439 -0.1947 -0.1414 -0.1022 -0.0509 -0.0216 -0.0056D7S3070 -0.7413 -0.4904 -0.3510 -0.1841 -0.0823 -0.0215D7S483 0.2129 0.1782 0.1464 0.0908 0.0456 0.0131D7S2462 -0.6164 -0.3733 -0.2460 -0.1118 -0.0451 -0.0110D7S637 0.1415 0.1124 0.0874 0.0485 0.0220 0.0058

d two point lod scores for additionaldhmn families 233

Table D.6: Pedigree CMT703 two-point LOD scores between the dHMN locus andmicrosatellite markers on chromosome 7q34-q36.

Marker LOD score at recombination fraction0.00 0.05 0.10 0.20 0.30 0.40

D7S1518 -0.4771 -0.3372 -0.2416 -0.1192 -0.0490 -0.0117D7S661 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170D7S498 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170D7S2511 -0.0969 -0.0768 -0.0595 -0.0325 -0.0141 -0.0035D7S615 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S2442 -0.4771 -0.3372 -0.2416 -0.1192 -0.0490 -0.0117D7S2419 -0.4771 -0.3372 -0.2416 -0.1192 -0.0490 -0.0117D7S688 -0.4771 -0.3372 -0.2416 -0.1192 -0.0490 -0.0117D7S2439 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S3070 -4.9384 -0.7212 -0.4437 -0.1938 -0.0757 -0.0177D7S483 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S1815 -0.4771 -0.3372 -0.2416 -0.1192 -0.0490 -0.0117D7S2462 -0.4771 -0.3372 -0.2416 -0.1192 -0.0490 -0.0117D7S637 0.1761 0.1477 0.1206 0.0719 0.0334 0.0086D7S2447 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

Table D.7: Pedigree CMT131 two-point LOD scores between the dHMN locus andmicrosatellite markers on chromosome 7q34-q36.

Marker LOD score at recombination fraction0.00 0.05 0.10 0.20 0.30 0.40

D7S1518 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S661 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S498 -4.5026 -0.9999 -0.6989 -0.3979 -0.2218 -0.0969D7S2511 -4.5026 -0.9999 -0.6989 -0.3979 -0.2218 -0.0969D7S615 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S2442 -4.5026 -0.9999 -0.6989 -0.3979 -0.2218 -0.0969D7S2419 0.0792 0.0719 0.0645 0.0492 0.0334 0.0170D7S2439 -4.5026 -0.9999 -0.6989 -0.3979 -0.2218 -0.0969D7S3070 -4.5026 -0.9999 -0.6989 -0.3979 -0.2218 -0.0969D7S483 0.3010 0.2787 0.2553 0.2041 0.1461 0.0792AD0AD0 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S1815 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000AD3AD3 0.3010 0.2787 0.2553 0.2041 0.1461 0.0792D7S2462 0.0792 0.0719 0.0645 0.0492 0.0334 0.0170D7S637 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S2447 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

d two point lod scores for additionaldhmn families 234

Table D.8: Pedigree CMT273 two-point LOD scores between the dHMN locus andmicrosatellite markers on chromosome 7q34-q36.

Marker LOD score at recombination fraction0.00 0.05 0.10 0.20 0.30 0.40

D7S1518 -0.0969 -0.0862 -0.0757 -0.0555 -0.0362 -0.0177D7S661 -0.0969 -0.0862 -0.0757 -0.0555 -0.0362 -0.0177D7S498 -0.0969 -0.0862 -0.0757 -0.0555 -0.0362 -0.0177D7S2511 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S615 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S2442 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S2419 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S688 -0.1761 -0.1549 -0.1347 -0.0969 -0.0621 -0.0300D7S2439 -4.9402 -1.0000 -0.6989 -0.3979 -0.2218 -0.0969D7S3070 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S483 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S1815 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S2462 -4.9402 -1.0000 -0.6989 -0.3979 -0.2218 -0.0969D7S637 -0.0969 -0.0862 -0.0757 -0.0555 -0.0362 -0.0177D7S2447 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

d two point lod scores for additionaldhmn families 235

Table D.9: Pedigree CMT808 two-point LOD scores between the dHMN locus andmicrosatellite markers on chromosome 7q34-q36.

Marker LOD score at recombination fraction0.00 0.05 0.10 0.20 0.30 0.40

D7S1518 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S661 -5.1858 -0.7413 -0.4625 -0.2096 -0.0877 -0.0247D7S498 -4.9402 -0.7413 -0.4625 -0.2096 -0.0877 -0.0247D7S2511 -0.2016 -0.1560 -0.1187 -0.0635 -0.0277 -0.0072D7S615 -5.1858 -0.7413 -0.4625 -0.2096 -0.0877 -0.0247D7S2442 -5.1858 -0.7413 -0.4625 -0.2096 -0.0877 -0.0247D7S2419 -4.9402 -0.7413 -0.4625 -0.2096 -0.0877 -0.0247D7S688 -4.9402 -0.7413 -0.4625 -0.2096 -0.0877 -0.0247D7S2439 -1.0212 -0.5733 -0.3758 -0.1712 -0.0679 -0.0160D7S3070 -4.9402 -0.7413 -0.4625 -0.2096 -0.0877 -0.0247D7S483 0.1037 0.1038 0.1018 0.0914 0.0720 0.0423D7S1815 0.5187 0.4788 0.4370 0.3467 0.2459 0.1317D7S2462 -1.0212 -0.5733 -0.3758 -0.1712 -0.0679 -0.0160D7S2447 -4.9402 -0.7413 -0.4625 -0.2096 -0.0877 -0.0247

Table D.10: Pedigree CMT260 two-point LOD scores between the dHMN locus andmicrosatellite markers on chromosome 7q34-q36.

Marker LOD score at recombination fraction0.00 0.05 0.10 0.20 0.30 0.40

D7S1518 -4.9720 -0.2587 0.0044 0.1649 0.1436 0.0545D7S661 -5.5081 -0.8214 -0.5004 -0.2035 -0.0705 -0.0139D7S498 -5.1560 -0.6542 -0.3532 -0.0999 -0.0138 0.0026D7S2511 -4.7725 -0.3563 -0.1115 0.0453 0.0549 0.0203D7S615 0.0667 0.0532 0.0415 0.0228 0.0099 0.0025D7S2442 0.5516 0.4869 0.4205 0.2824 0.1457 0.0390D7S2419 0.1987 0.1595 0.1229 0.0609 0.0201 0.0025D7S688 -4.8352 -0.4436 -0.1806 -0.0038 0.0211 0.0068D7S2439 0.0806 0.0637 0.0491 0.0263 0.0113 0.0028D7S3070 -4.8352 -0.4436 -0.1806 -0.0038 0.0211 0.0068D7S483 -5.5383 -0.9486 -0.6004 -0.2649 -0.1025 -0.0236D7S1815 -1.0985 -0.6980 -0.4767 -0.2326 -0.1010 -0.0264D7S2462 -5.1490 -0.6904 -0.4119 -0.1666 -0.0586 -0.0119D7S2447 -6.1524 -1.1612 -0.7143 -0.3007 -0.1117 -0.0247

d two point lod scores for additionaldhmn families 236

Table D.11: Pedigree CMT779 two-point LOD scores between the dHMN locus andmicrosatellite markers on chromosome 7q34-q36.

Marker LOD score at recombination fraction0.00 0.05 0.10 0.20 0.30 0.40

D7S1518 -4.9645 -1.9996 -1.3978 -0.7959 -0.4437 -0.1938D7S2511 0.3010 0.2577 0.2148 0.1335 0.0645 0.0170D7S615 0.0134 -0.0086 -0.0269 -0.0500 -0.0531 -0.0356D7S2442 -0.0389 -0.0534 -0.0641 -0.0728 -0.0641 -0.0389D7S2419 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S688 -4.9645 -1.9996 -1.3978 -0.7959 -0.4437 -0.1938D7S2439 0.0000 -0.0410 -0.0757 -0.1192 -0.1192 -0.0757D7S3070 -5.2180 -0.7212 -0.4437 -0.1938 -0.0757 -0.0177D7S483 -5.2180 -0.7212 -0.4437 -0.1938 -0.0757 -0.0177D7S1815 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S2462 -5.2180 -0.7212 -0.4437 -0.1938 -0.0757 -0.0177D7S2447 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

Table D.12: Pedigree CMT477 two-point LOD scores between the dHMN locus andmicrosatellite markers on chromosome 7q34-q36.

Marker LOD score at recombination fraction0.00 0.05 0.10 0.20 0.30 0.40

D7S1518 -0.9379 -0.6017 -0.4066 -0.1847 -0.0713 -0.0162D7S661 0.1020 0.0748 0.0544 0.0289 0.0155 0.0072D7S498 -0.1965 -0.1292 -0.0843 -0.0331 -0.0103 -0.0018D7S615 -0.2314 -0.1832 -0.1416 -0.0767 -0.0330 -0.0079D7S2442 -0.0517 -0.0588 -0.0569 -0.0380 -0.0153 -0.0009D7S2419 -0.8477 -0.5450 -0.3714 -0.1732 -0.0696 -0.0173D7S688 -0.1965 -0.1292 -0.0843 -0.0331 -0.0103 -0.0018D7S2439 -0.9379 -0.6017 -0.4066 -0.1847 -0.0713 -0.0162D7S3070 -1.0035 -0.6408 -0.4338 -0.2029 -0.0852 -0.0246D7S1815 0.0350 0.0251 0.0173 0.0073 0.0023 0.0004D7S2462 -0.6099 -0.4279 -0.3011 -0.1416 -0.0552 -0.0126D7S2447 -0.2979 -0.1716 -0.1035 -0.0404 -0.0174 -0.0076

d two point lod scores for additionaldhmn families 237

Table D.13: Pedigree CMT331 two-point LOD scores between the dHMN locus andmicrosatellite markers on chromosome 7q34-q36.

Marker LOD score at recombination fraction0.00 0.05 0.10 0.20 0.30 0.40

D7S1518 0.0561 0.0436 0.0330 0.0172 0.0072 0.0017D7S661 -2.3970 -0.4527 -0.2337 -0.0800 -0.0270 -0.0034D7S498 -1.7849 -0.5506 -0.3130 -0.1173 -0.0371 -0.0038D7S2511 0.2793 0.2383 0.1980 0.1222 0.0586 0.0154D7S615 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S2442 0.1213 0.0930 0.0610 0.0043 -0.0227 -0.0175D7S2419 -5.3260 -0.4600 -0.2023 0.0023 0.0666 0.0603D7S688 -0.1641 -0.1316 -0.1021 -0.0542 -0.0222 -0.0048D7S2439 0.2673 0.2147 0.1672 0.0890 0.0340 0.0030D7S3070 -2.6658 -0.5634 -0.3118 -0.1035 -0.0179 0.0121D7S483 -2.5798 -0.7164 -0.4417 -0.1932 -0.0755 -0.0177D7S1815 -2.2566 -0.9877 -0.6773 -0.3630 -0.1901 -0.0788D7S2462 0.4043 0.3624 0.3195 0.2330 0.1499 0.0728D7S2447 0.0502 0.0449 0.0447 0.0485 0.0462 0.0309

Table D.14: Pedigree CMT724 two-point LOD scores between the dHMN locus andmicrosatellite markers on chromosome 7q34-q36.

Marker LOD score at recombination fraction0.00 0.05 0.10 0.20 0.30 0.40

D7S2202 -1.1596 -0.5877 -0.2878 -0.0271 0.0390 0.0228D7S1518 0.6013 0.5264 0.4502 0.2971 0.1533 0.0429D7S2513 -1.1602 -0.5878 -0.2879 -0.0271 0.0390 0.0228D7S2473 0.5809 0.5153 0.4469 0.3035 0.1612 0.0463D7S661 -0.0121 -0.0114 -0.0101 -0.0066 -0.0033 -0.0009D7S498 -0.1414 -0.0627 -0.0216 0.0072 0.0070 0.0018D7S2511 0.1015 0.0839 0.0676 0.0393 0.0179 0.0045D7S615 -0.6518 -0.3902 -0.2515 -0.1101 -0.0449 -0.0118D7S2442 -0.5481 -0.2910 -0.1344 0.0175 0.0501 0.0243D7S2419 0.6080 0.5332 0.4570 0.3029 0.1570 0.0442D7S688 0.5923 0.5187 0.4438 0.2930 0.1512 0.0423D7S2439 -3.8570 -1.1223 -0.6148 -0.1923 -0.0362 0.0035D7S3070 -3.8570 -1.1223 -0.6148 -0.1923 -0.0362 0.0035D7S483 -4.4991 -1.1124 -0.6803 -0.2802 -0.1002 -0.0209D7S1815 -5.3591 -1.4482 -0.9227 -0.4044 -0.1535 -0.0342D7S2462 -5.1540 -1.5686 -0.9846 -0.4173 -0.1527 -0.0326D7S2447 -0.7413 -0.4724 -0.3223 -0.1516 -0.0610 -0.0145

d two point lod scores for additionaldhmn families 238

Table D.15: Pedigree CMT618 two-point LOD scores between the dHMN locus andmicrosatellite markers on chromosome 7q34-q36.

Marker LOD score at recombination fraction0.00 0.05 0.10 0.20 0.30 0.40

D7S1518 -4.5214 -0.4425 -0.1885 0.0103 0.0704 0.0614D7S661 -2.8949 -0.2845 -0.0595 0.0873 0.1063 0.0707D7S498 -2.9054 -0.5799 -0.3328 -0.1328 -0.0498 -0.0117D7S2511 -8.1683 -1.1633 -0.6320 -0.1835 -0.0053 0.0437D7S2442 -4.4554 -0.6362 -0.3337 -0.0654 0.0384 0.0537D7S2419 -4.9227 -1.3617 -0.8196 -0.3437 -0.1286 -0.0287D7S688 -7.9118 -1.1629 -0.6318 -0.1834 -0.0053 0.0437D7S2439 0.1249 0.1038 0.0840 0.0492 0.0226 0.0058D7S3070 -9.9771 -1.4425 -0.8874 -0.3876 -0.1514 -0.0355D7S483 -4.3118 -0.1849 0.0263 0.1438 0.1348 0.0785D7S1815 -5.1849 -0.7560 -0.4969 -0.2470 -0.1059 -0.0261D7S2462 -4.8261 -1.0688 -0.5517 -0.1310 0.0221 0.0519D7S637 -3.3683 -0.7205 -0.4434 -0.1937 -0.0757 -0.0177

Table D.16: Pedigree CMT748 two-point LOD scores between the dHMN locus andmicrosatellite markers on chromosome 7q34-q36.

Marker LOD score at recombination fraction0.00 0.05 0.10 0.20 0.30 0.40

D7S1518 -4.6970 -0.7413 -0.4625 -0.2096 -0.0877 -0.0247D7S661 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S498 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S2511 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S615 -1.0212 -0.5732 -0.3758 -0.1712 -0.0679 -0.0160D7S2442 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S2419 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S688 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S2439 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S3070 -4.9569 -0.7413 -0.4625 -0.2096 -0.0877 -0.0247D7S483 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000D7S1815 -1.0212 -0.5732 -0.3758 -0.1712 -0.0679 -0.0160D7S2462 -5.6999 -0.7413 -0.4625 -0.2096 -0.0877 -0.0247D7S2447 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

d two point lod scores for additionaldhmn families 239

Table D.17: Pedigree CMT701 two-point LOD scores between the dHMN locus andmicrosatellite markers on chromosome 7q34-q36.

Marker LOD score at recombination fraction0.00 0.05 0.10 0.20 0.30 0.40

D7S661 -4.4795 -0.5837 -0.3262 -0.1159 -0.0344 -0.0053D7S498 -8.9135 -1.6371 -1.0715 -0.5454 -0.2734 -0.1075D7S2511 -8.7837 -1.3049 -0.7699 -0.3097 -0.1101 -0.0230D7S2419 -4.8428 -0.6513 -0.3846 -0.1555 -0.0558 -0.0118D7S688 -9.2754 -1.4098 -0.8598 -0.3697 -0.1421 -0.0327D7S2439 -9.2308 -2.4876 -1.6226 -0.8057 -0.3821 -0.1346D7S3070 -4.1146 -1.8532 -1.2343 -0.6326 -0.3126 -0.1180D7S483 -3.8545 -1.2639 -0.7707 -0.3747 -0.2013 -0.0915D7S2462 -4.0306 -1.4206 -0.9053 -0.4600 -0.2406 -0.1008D7S637 -4.2524 -1.6062 -1.0544 -0.5429 -0.2749 -0.1083D7S2447 -4.4725 -1.9030 -1.2804 -0.6644 -0.3286 -0.1223

Table D.18: Pedigree CMT102 two-point LOD scores between the dHMN locus andmicrosatellite markers on chromosome 7q34-q36.

Marker LOD score at recombination fraction0.00 0.05 0.10 0.20 0.30 0.40

D7S2511 -3.9964 -0.7211 -0.4436 -0.1938 -0.0757 -0.0177D7S483 -3.9964 -0.7211 -0.4436 -0.1938 -0.0757 -0.0177

Table D.19: Pedigree CMT375 two-point LOD scores between the dHMN locus andmicrosatellite markers on chromosome 7q34-q36.

Marker LOD score at recombination fraction0.00 0.05 0.10 0.20 0.30 0.40

D7S2511 -5.1596 -0.6962 -0.3311 -0.0347 0.0562 0.0555D7S483 -6.1442 -0.6962 -0.3311 -0.0347 0.0562 0.0555

EM U LT I - P O I N T L O D S C O R E S F O R A D D I T I O N A L

DHMN FAMILIES

This Appendix shows the multi-point LOD scores for the additional dHMN

families not shown in Chapter 3. The dHMN families uninformative for mark-

ers at 7q34-q36 are; CMT484, CMT274, CMT677, CMT703, CMT131, CMT273,

CMT333 and CMT609. The dHMN families excluded from linkage to 7q34-

q36 are; CMT701, CMT260, CMT618, CMT477, CMT748, CMT779, CMT808,

CMT375, CMT102, CMT724 and CMT331. These plots show multi-point LOD

score plotted against genetic distance.

Mul

ti-p

oint

LO

Dsc

ore

D7S1518

D7S661

D7S2511

D7S615

D7S498

D7S2419

D7S688

D7S2442

D7S2439

D7S3070

D7S483

D7S1815

D7S2462

D7S637

D7S2447

-4.0

-3.0

-2.0

-1.0

0.0

1.0

2.0

3.0

5 10 15 20 250

Genetic distance (cM)

Figure E.1: Multi-point LOD plot for family CMT609.

240

e multi-point lod scores for additional dhmn families 241

Mul

ti-p

oint

LO

Dsc

ore

D7S1518

D7S661

D7S2511

D7S615

D7S498

D7S2419

D7S688

D7S2442

D7S2439

D7S3070

D7S483

D7S1815

D7S2462

D7S2447

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

0 5 10 15 20 25

Genetic distance (cM)

Figure E.2: Multi-point LOD plot for family CMT274.

Mul

ti-p

oint

LO

Dsc

ore

D7S1518

D7S661

D7S498

D7S2442

D7S2439

D7S3070

D7S483

D7S2462

D7S637

-2.0

-1.0

0.0

1.0

2.0

3.0

D7S2511

D7S615

D7S2419

D7S688

0 5 10 15 20

Genetic distance (cM)

Figure E.3: Multi-point LOD plot for family CMT677.

e multi-point lod scores for additional dhmn families 242

Mu

lti-

po

int

LO

Dsc

ore

D7S1518

D7S661

D7S2511

D7S615

D7S498

D7S2419

D7S688

D7S2442

D7S2439

D7S3070

D7S483

D7S1815

D7S2462

D7S637

D7S2447

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0 5 10 15 20 25

Genetic distance (cM)

Figure E.4: Multi-point LOD plot for family CMT484.

Mu

lti-

po

int

LO

D s

core

D7S1518

D7S661

D7S498

D7S2442

D7S2439

D7S3070 D

7S483

D7S1815

D7S2462

D7S637

D7S2447

-5.0

-4.0

-3.0

-2.0

-1.0

0.0

1.0

2.0

3.0

D7S2511 D

7S615

D7S2419 D

7S688

0 5 10 15 20 25

Genetic distance (cM)

Figure E.5: Multi-point LOD plot for family CMT703.

e multi-point lod scores for additional dhmn families 243

Multi-pointLOD

score

D7S1518

D7S661

D7S2511

D7S615

D7S498

D7S2419

D7S2442

D7S2419

D7S2439

D7S3070

D7S483

AD0

D7S1815

AD3

D7S2462

D7S637

D7S2447

-5.0

-4.0

-3.0

-2.0

-1.0

0.0

1.0

2.0

3.0

5 1 0 1 5 2 0 2 50

Figure E.6: Multi-point LOD plot for family CMT131.

Mu

lti-

po

int

LO

D s

core

Genetic distance (cM)

D7S1518

D7S661

D7S49

D7S2419 D

7S688

D7S2439

D7S3070 D

7S483

D7S1815

D7S2462

D7S637

D7S2447

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

D7S2511 D

7S615

D7S2442

0 5 10 15 20 25

Figure E.7: Multi-point LOD plot for family CMT333.

e multi-point lod scores for additional dhmn families 244

Mu

lti-

po

int

LO

D s

core

D7S1518

D7S661

D7S498

D7S2442

D7S2439

D7S3070 D

7S483

D7S1815

D7S2462

D7S637

D7S2447

-5.0

-4.0

-3.0

-2.0

-1.0

0.0

1.0

2.0

3.0

D7S2419 D

7S688

D7S2511 D

7S615

5 10 15 20 250

Genetic distance (cM)

Figure E.8: Multi-point LOD plot for family CMT273.

Mu

lti-

po

int

LO

D s

core

D7S1518

D7S661

D7S498

D7S2442

D7S2439

D7S3070 D

7S483

D7S1815

D7S2462

D7S2447

-6.0

-5.0

-4.0

-3.0

-2.0

-1.0

0.0

1.0

2.0

3.0

D7S2511 D

7S615

D7S2419 D

7S688

Genetic distance (cM)

0 5 10 15 20 25

Figure E.9: Multi-point LOD plot for family CMT260.

e multi-point lod scores for additional dhmn families 245

Mul

ti-p

oint

LO

Dsc

ore

D7S661

D7S498

D7S2511

D7S2419

D7S688

D7S2439

D7S3070

D7S483

D7S2462

D7S637

D7S2447

-8.0

-7.0

-6.0

-5.0

-4.0

-3.0

-2.0

-1.0

0.0

1.0

2.0

3.0

0 5 10 15 20

Genetic distance (cM)

Figure E.10: Multi-point LOD plot for family CMT701

Mul

ti-p

oint

LO

Dsc

ore

D7S1518

D7S661

D7S498

D7S2511

D7S2442

D7S2439

D7S3070

D7S483

D7S1815

D7S2462

D7S637

-8.0

-7.0

-6.0

-5.0

-4.0

-3.0

-2.0

-1.0

0.0

1.0

2.0

3.0

D7S2511

D7S2419

D7S688

20151050

Genetic distance (cM)

Figure E.11: Multi-point LOD plot for family CMT618.

e multi-point lod scores for additional dhmn families 246

Mul

ti-p

oint

LO

Dsc

ore

D7S2202

D7S1518

D7S2513

D7S2473

D7S661

D7S498

D7S2442

D7S2439

D7S3070

D7S483

D7S1815

D7S2462

D7S2447

-6.0

-5.0

-4.0

-3.0

-2.0

-1.0

0.0

1.0

2.0

3.0

D7S2511

D7S615

D7S2419

D7S688

Genetic distance (cM)

0 5 10 15 20 25

Figure E.12: Multi-point LOD plot for family CMT724.

Mul

ti-p

oint

LO

Dsc

ore

D7S1518

D7S661

D7S2511

D7S615

D7S498

D7S2419

D7S688

D7S2442

D7S2439

D7S3070

D7S483

D7S1815

D7S2462

D7S2447

-5.0

-4.0

-3.0

-2.0

-1.0

0.0

1.0

2.0

3.0

50 10 15 20 25

Genetic distance(cM)

Figure E.13: Multi-point LOD plot for family CMT331.

e multi-point lod scores for additional dhmn families 247

Mu

lti-

po

int

LO

D s

core

Genetic distance (cM)

D7S1518

D7S661

D7S498 D

7S615

D7S2442

D7S2439

D7S3070

D7S1815

D7S2462

D7S2447

-3.0

-2.0

-1.0

0.0

1.0

2.0

3.0

D7S2419 D

7S688

0 5 10 15 20 25

Figure E.14: Multi-point LOD plot for family CMT477.

Mu

lt-p

oin

t L

OD

sco

re

D7S1518

D7S2511 D

7S615

D7S2442

D7S2439

D7S3070 D

7S483

D7S1815

D7S2462

D7S2447

-5.0

-4.0

-3.0

-2.0

-1.0

0.0

1.0

2.0

3.0

D7S2419 D

7S688

Genetic distance (cM)

5 10 15 20 25

Figure E.15: Multi-point LOD plot for family CMT779.

e multi-point lod scores for additional dhmn families 248

Mul

ti-p

oint

LO

Dsc

ore

D7S1518

D7S661

D7S2511

D7S615

D7S498

D7S2419

D7S688

D7S2442

D7S2439

D7S3070

D7S483

D7S1815

D7S2462

D7S2447

-6.0

-5.0

-4.0

-3.0

-2.0

-1.0

0.0

1.0

2.0

3.0

Genetic distance (cM)

0 5 10 15 20 25

Figure E.16: Multi-point LOD plot for family CMT748.

Mul

ti-p

oint

LO

Dsc

ore

D7S1518

D7S661

D7S2511

D7S615

D7S498

D7S2419

D7S688

D7S2442

D7S2439

D7S3070

D7S483

D7S1815

D7S2462

D7S2447

-5.0

-4.0

-3.0

-2.0

-1.0

0.0

1.0

2.0

3.0

Genetic distance (cM)

0 5 10 15 20 25

Figure E.17: Multi-point LOD plot for family CMT808.

F7 Q 3 4 - Q 3 6 C O N T R O L H A P L O T Y P E S

Table F.1: Healthy control haplotypes for association analysis from unaffected spousesin dHMN families

# D7S

1518

D7S

661

D7S

498

D7S

2511

D7S

615

D7S

2442

D7S

2419

D7S

688

D7S

2439

D7S

3070

D7S

483

AD

0

D7S

1815

AD

3

D7S

2462

D7S

637

D7S

2447

1 5 3 3 7 1 2 5 8 1 6 3 . 2 . 5 . 22 9 3 2 5 1 4 3 4 4 2 4 . 1 . 3 . 33 7 3 1 5 2 2 3 4 1 3 3 . 2 . 2 . 34 6 3 7 6 2 2 5 3 1 4 4 . . . 8 . 15 8 7 2 7 1 4 7 4 1 1 2 . 5 . 5 5 16 8 4 8 8 1 5 7 3 1 2 6 . 1 . 3 1 27 5 3 1 1 1 5 3 4 3 6 7 . 2 . 3 4 .8 5 2 2 10 1 2 5 8 3 2 10 . 1 . 1 5 .9 7 5 1 1 1 7 3 2 3 2 3 . 2 . 2 4 1

10 11 4 2 1 2 4 3 4 4 3 3 . 3 . 6 5 311 8 2 2 . 2 4 3 4 4 1 3 . 2 . 7 5 .12 9 4 2 . 1 4 3 1 3 1 . . 1 . 6 5 .13 4 8 8 3 2 7 7 2 4 6 9 . 1 . 4 4 .14 5 3 1 5 1 4 3 3 4 1 2 . 4 . 2 3 .15 5 1 1 6 . 4 3 2 7 2 5 . 1 . 3 3 .16 8 1 1 5 . 8 3 7 10 1 5 . 2 . 5 1 .17 4 3 2 6 . 2 5 2 5 2 5 . 1 . 5 1 .18 9 1 1 5 . . 6 7 4 3 9 . 3 . 6 3 .19 8 3 1 1 . 2 3 7 10 1 2 . . . 5 1 .20 9 4 4 1 . 4 3 2 4 4 4 . . . 2 3 .21 2 1 1 6 2 . 3 . 4 . 6 . 1 . . . 322 5 3 1 7 2 . 3 . 1 . 3 . 1 . . . 323 6 7 2 6 1 . 3 7 4 4 9 . 2 . 2 . 324 9 5 2 8 1 5 3 5 2 4 4 . 2 . 1 . 125 7 1 1 6 1 2 3 4 4 7 2 . 2 . 2 . 126 10 5 1 1 2 4 3 4 3 3 3 . 1 . 9 . 327 4 5 3 6 1 4 11 . 3 4 2 2 3 2 6 3 128 5 3 1 5 2 2 5 . 3 6 4 2 2 4 5 5 329 3 1 2 3 2 2 4 2 3 2 2 2 2 2 2 3 230 5 3 1 4 1 2 3 5 1 5 3 2 2 1 2 6 231 5 2 1 1 2 2 3 6 6 2 5 4 1 1 2 5 1

Table F.1 - continued on next page

249

f 7q34-q36 control haplotypes 250

Table F.1 - continued from previous page

# D7S

1518

D7S

661

D7S

498

D7S

2511

D7S

615

D7S

2442

D7S

2419

D7S

688

D7S

2439

D7S

3070

D7S

483

AD

0

D7S

1815

AD

3

D7S

2462

D7S

637

D7S

2447

32 4 1 1 5 2 4 3 3 5 3 2 4 1 3 4 5 233 2 1 1 6 1 2 1 1 4 4 3 4 4 4 1 3 134 1 4 1 7 1 2 2 3 2 3 3 2 2 2 3 3 235 8 5 1 1 1 4 . 2 1 4 4 . 2 . 8 . 336 9 3 1 1 1 4 3 4 2 5 3 . 2 . . . 237 9 3 1 1 . 2 . . 4 3 2 5 . 2 1 . .38 9 7 1 . 2 2 3 4 1 4 . . 2 8 . . 239 4 2 1 5 1 6 6 4 10 . 9 . . . 5 4 340 5 1 2 3 2 4 6 7 1 . 10 . . . 2 5 341 7 7 1 6 1 4 13 4 3 . 3 . . . 6 4 242 5 2 2 2 2 4 14 4 4 . 2 . . . 5 5 343 9 1 1 1 1 4 13 5 4 . 9 . . . 7 3 144 5 3 1 1 2 2 14 1 4 . 9 . . . 5 3 345 5 1 1 1 1 2 13 5 1 . 3 . . . 5 5 246 2 3 1 6 1 7 6 3 4 . 2 . . . 6 3 247 3 3 1 6 2 4 5 8 1 5 2 . 1 . 2 . 348 8 4 1 7 1 4 2 3 . . . . . . 8 . 349 . 7 1 1 1 2 3 2 4 4 3 2 1 3 3 3 350 . 2 3 10 1 . . . . . 4 3 1 1 1 3 251 . 7 7 5 1 4 2 . 4 6 3 . 1 . 4 6 152 . . . 1 2 2 . 4 1 4 2 . 3 . 3 . 353 7 1 1/2 1/3 1 4 3 3/4 1/7 3 8 . 3 . 2 . 354 1 5 7 . 2 6 2 2 8 4 . . 1 . . . .55 1 3 2 1/4 1/2 4 . 2 4 4 2 . 3 . 3 . 256 6 . . 4 1 4 3 4 2 5 2 . 1 . 1 . 3

G7 Q 3 4 - Q 3 6 C A N D I D AT E G E N E S A N D R E P O RT E D

GENE EXPRESSION

Table G.1: Candidate genes within the minimum probable candidate interval on7q34-q36. Only alternate isoforms with unique exons are shown.

Symbol Accession # Gene name Alternate isoforms

CNTNAP2 NM_014141† contactin-associated protein-like 2 isoform 2 (NM_014141)†

C7orf33 NM_145304.2 chromosome 7 open reading frame 33

CUL1 NM_003592 cullin 1

EZH2 NM_004456 enhancer of zeste 2 isoform a isoform b (NM_152998)

PDIA4 NM_004911 protein disulfide isomerase-associated 4

ZNF786 NM_152411 zinc finger protein 786

ZNF425 NM_001001661 zinc finger protein 425

ZNF398 NM_170686 zinc finger 398 isoform a isoform b (NM_020781)

ZNF282 NM_003575 zinc finger protein 282

ZNF212 NM_012256 zinc finger protein 212

ZNF783 NM_001004302 zinc finger protein 783

ZNF777 NM_015694 zinc finger protein 777

ZNF746 NM_152557 zinc finger protein 746

ZNF767 NM_024910 zinc finger family member 767

KRBA1 NM_032534 KRAB A domain containing 1

ZNF467 NM_207336 zinc finger protein 467

SSPO NM_198455 SCO-spondin ZNF862 (NM_001099220)

ATP6V0E2 NM_145230 ATPase, H+ transporting, V0 subunit isoform 1 isoform 2(NM_001100592)

ACTR3C NM_001164458 Actin-related protein 3C variant 2(NM_001164459)

LRRC61 NM_001142928 leucine rich repeat containing 61 variant 2 (NM_023942)

C7orf29 NM_138434 Homo sapiens chromosome 7 open reading frame29

RARRES2 NM_002889 chemerin preproprotein

REPIN1 NM_014374 replication initiator 1 isoform 1 isoform 3(NM_001099695)

ZNF775 NM_173680 zinc finger protein 775

GIMAP8 NM_175571 GTPase, IMAP family member 8

GIMAP7 NM_153236 GTPase, IMAP family member 7

GIMAP4 NM_018326 GTPase, IMAP family member 4

GIMAP6 NM_024711 GTPase, IMAP family member 6

GIMAP2 NM_015660 GTPase, IMAP family member 2

GIMAP1 NM_130759 GTPase, IMAP family member 1

GIMAP5 NM_018384 GTPase, IMAP family member 5

Table G.1 - continued on next page† alternate isoforms with the same identifier as the parent gene.

251

g 7q34-q36 candidate genes and reported gene expression 252

Table G.1 - continued from previous pageSymbol Accession # Gene name Alternate isoforms

TMEM176B NM_014020 transmembrane protein 176B variant 1 variant 2(NM_001101311)

variant 3(NM_001101312)

TMEM176A NM_018487 hepatocellular carcinoma-associated antigen 112

ABP1 NM_001091† Amiloride-sensitive amine oxidase [copper-containing] isoform 2(NM_001091)†

KCNH2 NM_172057 voltage-gated potassium channel, subfamily H isoformc

Isoform b(NM_172056)

NOS3 NM_000603 nitric oxide synthase 3 (endothelial cell)

ATG9B NM_173681 ATG9 autophagy related 9 homolog B

ABCB8 NM_007188 ATP-binding cassette, sub-family B, member 8

ACCN3 NM_004769 amiloride-sensitive cation channel 3 isoform a isoform b(NM_020321)

CDK5 NM_004935 cyclin-dependent kinase 5

SLC4A2 NM_003040 solute carrier family 4, anion exchanger, memberisoform 1

isoform 1 variant 2(NM_001199692)

isoform 3(NM_001199694)

FASTK NM_006712 Fas-activated serine/threonine kinase isoform 1 isoform 4(NM_033015)

TMUB1 NM_031434 transmembrane and ubiquitin-like domain-containingprotein 1 isoform 1

isoform 2(NM_001136044)

AGAP3 NM_031946 centaurin, gamma 3 isoform a isoform b(NM_001042535)

GBX1 NM_001098834 gastrulation brain homeo box 1

ASB10 NM_001142459 ankyrin repeat and SOCS box-containing 10 variant 3(NM_080871)

ABCF2 NM_007189 ATP-binding cassette, sub-family F, member 2 isoforma

isoform b(NM_005692)

CSGLCA-T NM_019015 chondroitin sulfate glucuronyltransferase

SMARCD3 NM_001003802 SWI/SNF-related matrix-associated actin-dependentregulator of chromatin subfamily D member 3

variant 3(NM_001003801)

variant 3(NM_003078)

NUB1 NM_016118 NEDD8 ultimate buster-1

WDR86 NM_198285 WD repeat-containing protein 86

CRYGN NM_144727 Gamma-crystallin N (Gamma-N-crystallin)

RHEB NM_005614 Ras homolog enriched in brain

PRKAG2 NM_016203 5’-AMP-activated protein kinase subunit gamma-2 variant b(NM_024429)

variant c(NM_001040633)

GALNTL5 NM_145292 putative polypeptideN-acetylgalactosaminyltransferase-like protein 5

GALNT11 NM_022087 polypeptide N-acetylgalactosaminyltransferase 11

MLL3 NM_170606 myeloid/lymphoid or mixed-lineage leukemia 3

CCT8L1 NM_001029866 chaperonin containing TCP1, subunit 8

XRCC2 NM_005431 X-ray repair cross complementing protein 2

ACTR3B NM_020445 actin-related protein 3-beta isoform 1

DPP6 NM_130797 dipeptidyl-peptidase 6 isoform 1 isoform 2(NM_001936)

isoform 3(NM_001039350)

DPP6 protein(Q8IYG9)‡

† alternate isoforms with the same identifier as the parent gene.‡UniProtKB/TrEMBL accession number.

g 7q34-q36 candidate genes and reported gene expression 253

Table G.2: Genes localised outside the minimum probable candidate interval onchromosome 7q34-q36. Alternate isoforms are listed where alternate exons for CDS orUTRs exist which require additional PCR amplicons designed for sequencing coverage.Alternately spliced variants without unique exons are not shown.

Symbol Accession # Gene name Alternate isoforms

WEE2 NM_001105558 WEE1 homolog 2

SSBP1 NM_003143 single-stranded DNA binding protein 1

PRSS37 NM_001008270 protease, serine, 37

CLEC5A NM_013252 C-type lectin, superfamily member 5

MGAM NM_004668 maltase-glucoamylase

LOC93432 NR_003715 Putative maltase-glucoamylase-like proteinLOC93432

TRYX3 NM_001001317 trypsin X3

PRSS1 NM_002769 Trypsin-1

PRSS2 NM_002770 Trypsin-2

EPHB6 NM_004445 Ephrin type-B receptor 6 isoform 3 (-)

TRPV6 NM_018646 Transient receptor potential cation channel subfamilyV member 6

TRPV5 NM_019841 Transient receptor potential cation channel subfamilyV member 5

C7orf34 NM_178829 Uncharacterized protein C7orf34

KEL NM_000420 Kell blood group, metallo-endopeptidase

PIP NM_002652 prolactin-induced protein

GSTK1 NM_001143679 glutathione S-transferase kappa 1 isoform 2

TMEM139 NM_153345 transmembrane protein 139

CASP2 NM_032982 caspase 2 isoform 1 isoform 3(NM_032983)

CLCN1 NM_000083 Chloride channel protein 1

FAM131B NM_014690 Protein FAM131B

ZYX NM_003461 zyxin variant 2(NM_001010972)

EPHA1 NM_005232 ephrin receptor EphA1

FAM115C NM_001130025 Protein FAM115C variant 2(NM_173678)

FAM115A NM_014719 Homo sapiens family with sequence similarity 115,member A

ARHGEF35 NM_001003702 rho guanine nucleotide exchange factor 35

ARHGEF5 NM_005435 rho guanine nucleotide exchange factor 5

NOBOX NM_001080413 NOBOX oogenesis homeobox

TPK1 NM_022445 thiamin pyrophosphokinase 1 isoform a isoform b(NM_001042482)

g 7q34-q36 candidate genes and reported gene expression 254

Table G.3: Expression pattern of genes within the chromosome 7q34-q36 dHMN1disease region. (+) indicates the gene is expressed in the tissue indicated, (H) indicatesthe gene is expressed at a higher level in this tissue relative to other tissues in thedata set, (L) indicates the gene is expressed at a lower level in this tissue relative toother tissues, (-) indicates there is no gene expression detected. (*) indicates geneswith no expression data available from Genenote and GNF BioGPS, instead UniGene -electronic northern normal expression data is presented.

Gene Brain Spinal Cord Muscle

WEE2 + + +

SSBP1 + + +

PRSS37 + + +

CLEC5A + + +

MGAM L L L

LOC93432 + + +

TRYX3 + + +

PRSS1 L L L

PRSS2 L L L

EPHB6 + + +

TRPV6 + + +

TRPV5 + + +

C7orf34 + + +

KEL + + +

PIP L L L

GSTK1 + + +

TMEM139 + + +

CASP2 + + +

CLCN1 + + +

FAM131B + + L

ZYX + + +

EPHA1 + + +

FAM115C + + +

ARHGEF5 + + +

NOBOX*

TPK1 + + +

CNTNAP2 H H L

CUL1 + + +

EZH2 L L L

PDIA4 + + +

ZNF786 + L L

ZNF425 + + +

ZNF398 + L +

ZNF282 + + +

ZNF212 + + +

ZNF783 + + +

ZNF777 + + +

ZNF746 + + +

ZNF767 + + +

KRBA1 + + +

ZNF467 + + +

SSPO + + +

ATP6V0E2 H H +

Gene Brain Spinal Cord Muscle

ACTR3C* L

LRRC61 + + +

RARRES2 L L L

REPIN1 + + +

ZNF775 + + +

GIMAP8 + + +

GIMAP7 + + +

GIMAP4 + + +

GIMAP6 + + +

GIMAP2 + + +

GIMAP1 + + +

GIMAP5 + + +

TMEM176B L L L

TMEM176A L L L

ABP1 L L L

KCNH2 + + +

NOS3 + + +

ATG9B + + +

ABCB8 + + +

ACCN3 + + +

CDK5 H + +

SLC4A2 + + +

FASTK + + +

TMUB1 + + +

AGAP3 + + +

GBX1 + + +

ASB10 + + +

ABCF2 + + +

CSGLCA-T + + +

SMARCD3 + + +

NUB1 + + +

WDR86 + H +

CRYGN* + - -

RHEB + + +

PRKAG2 + + +

GALNTL5 + + +

GALNT11 + + +

MLL3 + + +

CCT8L1 + + +

XRCC2 + + +

ACTR3B + + +

DPP6 H H +

HN E X T G E N E R AT I O N S E Q U E N C I N G

Table H.1: Sequencing gaps in 454 sequencing of 7q34-q36 with lengthgreater than 10 kb.

Gap Start position† End position† Length (bp) GenesGAP1 142928643 142944949 16307GAP2 142958727 142984092 25366GAP3 142993142 143019916 26776GAP4 143019976 143055729 35754 FAM115CGAP5 143080626 143097060 16435GAP6 143118288 143174571 56285GAP7 143179456 143202297 22842GAP8 143507776 143535852 28077GAP9 143542456 143582489 40034

GAP10 143591439 143664224 72789GAP11 143671214 143704929 33718 ARHGEF5GAP12 149236329 149249201 12878GAP13 149336627 149348154 11534GAP14 149468960 149485938 16997GAP15 152134431 152148468 14038 ACTR3BGAP16 153092835 153106814 13985GAP17 153357906 153369172 11267 DPP6GAP18 153369692 153381978 12291 DPP6† Sequence position based on Human Mar. 2006 (NCBI36/hg18) Assembly.

255

h next generation sequencing 256

Table H.2: Analysis of novel variants within chromosomal translocations on 7q34-q36.Novel variants include six non-synonymous changes and three other novel variants

Identity† Chr. Strand Start End Ref. DNA SNP at position

MLL3:c.3611T>C100% 7 - 151548623 151548663 T

95.20% 1 - 147158243 147158283 C95.20% 2 + 91248287 91248327 G95.20% 13_rand - 177226 177266 C

MLL3:c.2689C>T100% 7 - 151563895 151563935 C100% 1 - 147142930 147142970 A/G rs58408284100% 2 + 91263559 91263599 G100% 13_rand - 163587 163627 G

97.60% 21 + 10048654 10048694 A/G rs58408284

MLL3:c.2674G>A100% 7 - 151563910 151563950 G

97.60% 1 - 147142915 147142955 C/T rs2838159 rs817533697.60% 2 + 91263574 91263614 C/T rs2838159 rs817533697.60% 13_rand - 163572 163612 G95.20% 21 + 10048669 10048709 C/T rs8175336

CCT8L1:c.483G>A100% 7 + 151773957 151773997 G

97.60% 22 - 15452938 15452978 T

CCT8L1:c.532A>G100% 7 + 151774006 151774046 A

95.20% 22 - 15452889 15452929 C

CCT8L1:c.579G>A100% 7 + 151774054 151774094 G

97.60% 22 - 15452841 15452881 T

CCT8L1:c.619A>G100% 7 + 151774093 151774133 A

95.20% 22 - 15452802 15452842 C

CCT8L1:c.626A>C100% 7 + 151774100 151774140 A

95.20% 22 - 15452795 15452835 G

CCT8L1:c.700G>A100% 7 + 151774175 151774215 G

97.60% 22 - 15452720 15452760 T† Identity indicates the variation of the sequence within 20bp in both directions around

reported variants.

IC N V I D E N T I F I E D W I T H A R R AY C G H

Table I.1: Gains and losses identified in control individual II:7 using aCGH. Start andend positions refer to the Human Mar. 2006 (NCBI36/hg18) assembly.

Chr. Start End

II:7 gains

chr7 141136570 141136707chr7 142281871 142282647chr7 148583077 148583362chr7 149208062 149208712chr7 153092228 153092495chr7 154022900 154027513II:7 losses

chr1 104250813 106105850chr3 117620757 119959839chr4 167486570 171501584chr6 122839696 126222259chr7 49342534 50068562chr7 77906667 79255842

Chr. Start End

chr7 112933761 114687270chr7 145426554 145427643chr7 147106328 147107617chr7 147703070 147707039chr7 148493747 148494200chr7 150677108 150677431chr7 151432571 151433370chr7 151728888 151747584chr9 92604082 93435806chr11 27545010 29000366chr11 38438080 39760781chr12 84625746 86284769chr18 30000311 31201020chr21 15899328 18942345

257

i cnv identified with array cgh 258

Table I.2: Gains and losses identified in affected individual III:9 using aCGH. Startand end positions refer to the Human Mar. 2006 (NCBI36/hg18) assembly.

Chr. Start End

Affected III:9 gains

chr7 141008522 141008827chr7 141136570 141136764chr7 141405649 141408015chr7 141411964 141433327chr7 141436298 141436906chr7 142368556 142369029chr7 142590983 142591379chr7 142926564 142928267chr7 143232340 143233967chr7 148583077 148583344chr7 148942734 148948626chr7 149208036 149208712chr7 150070208 150071040chr7 150469368 150472497chr7 151003440 151003796chr7 154021940 154027513chrX 133579511 134418254Affected III:9 losses

chr1 78301095 81843776chr1 104250813 106105850chr1 187316392 188037381chr2 109804744 111177875chr2 165516142 167081870chr2 218674727 219908166chr2 239631901 240609103chr3 37928555 38986892chr3 73618613 75126725chr3 76194026 81813307chr3 86019386 87778062chr3 171960773 176318840chr3 191707744 194848137chr4 11290241 17357044chr4 26907162 32192670chr4 124278703 124438026chr6 49926208 51776238chr6 78591356 80510744chr6 113488617 119933450

Chr. Start End

chr7 20398581 22826154chr7 87955968 88650701chr7 112933761 114687270chr7 118682103 122036934chr7 128353657 130376195chr7 141383232 141383755chr7 141452114 141452346chr7 141529779 141532054chr7 142705631 142707109chr7 143527056 143705101chr7 144547151 144552372chr7 145923012 145924645chr7 146642408 146643847chr7 148493747 148494215chr7 149504457 149506975chr7 150021169 150021774chr7 150677028 150677330chr7 151432885 151433370chr7 153572344 153576331chr8 28788076 30510657chr8 37782056 38480739chr8 115839571 118291267chr9 101609238 104005216chr9 121327018 122844496chr10 44219153 45339171chr10 87775822 89497316chr11 1405506 3601945chr12 73217711 74588154chr12 84625746 86284769chr13 106444209 109227022chr14 42637746 43884075chr14 79896757 83876603chr15 88803154 91002601chr18 22618645 26406470chr18 30000311 31201020chr18 46253735 48483432chr17 22200000 24170873chr21 15899328 18942345

i cnv identified with array cgh 259

Table I.3: Gains and losses identified in affected individual III:12 using aCGH. Startand end positions refer to the Human Mar. 2006 (NCBI36/hg18) assembly.

Chr. Start End

Affected III:12 gains

chr7 141076919 141077135chr7 141667108 141667973chr7 141691212 141712851chr7 142350806 142351891chr7 143232301 143233967chr7 143729164 143730031chr7 148583210 148583362chr7 148817533 148820960chr7 149207950 149208712chr7 149666004 149668696chr7 153092228 153092499chr7 154338103 154339800Affected III:2 losses

chr1 104250813 106105850chr2 33323057 35182721chr3 37928555 38986892chr3 144583097 147976445chr4 97035757 99272119chr5 117403380 118509111chr7 77906667 79255842chr7 112933761 114687270chr7 143527112 143582678

Chr. Start End

chr7 87955968 88650701chr7 143594780 143705101chr7 144546961 144552372chr7 144944021 144946260chr7 145532824 145533594chr7 146642327 146643847chr7 147703070 147707039chr7 148493747 148494215chr7 149504178 149506807chr7 151432885 151433370chr7 151511225 151515357chr7 151708944 151717257chr7 151728927 151747584chr7 154023876 154031412chr8 37782056 38480739chr8 111799647 114702528chr10 44219153 45339171chr11 38438080 39760781chr11 40954001 41584983chr12 73217711 74588154chr13 93112275 94079060chr14 42637746 43884075chr18 37521727 38182028

i cnv identified with array cgh 260

Table I.4: Gains and losses identified in affected individual II:1 using aCGH. Start andend positions refer to the Human Mar. 2006 (NCBI36/hg18) assembly.

Chr. Start End

Affected II:1 gains

chr7 141411964 141433443chr7 143729164 143730012chr7 148583210 148583362chr7 149207950 149208712chr7 149666004 149667659chr7 150065475 150065749chr7 153092228 153092495chr7 154022327 154029815chrX 22015874 23768392chrX 105779829 106404347chrX 115487962 116370306chrX 128122567 129811507chrX 133579511 134418254Affected II:1 losses

chr1 172119474 179297968chr1 181412375 186257480chr2 165516142 167081870chr3 37928555 38986892chr3 122740584 124718365chr4 90972361 93284435chr5 1597400 5644849chr6 49926208 51776238chr6 62601457 65114644chr6 78591356 80510744chr7 45237624 47001006chr7 112933761 114687270chr7 141452964 141453225chr7 141543684 141543981chr7 143527056 143582678

Chr. Start End

chr7 143594780 143705101chr7 144402282 144404560chr7 144546976 144552372chr7 144930202 144931073chr7 144944085 144945984chr7 146642408 146643847chr7 148136687 148137341chr7 148493747 148494215chr7 149503960 149505812chr7 150677108 150677330chr7 151432825 151433370chr7 151503095 151504259chr7 151597923 151620438chr7 151728927 151747193chr8 41597188 43507449chr8 101729326 106640189chr9 76586459 78615389chr10 44219153 45339171chr10 49721945 52447409chr10 110908377 111439564chr11 40954001 41584983chr12 84625746 86284769chr14 42637746 43884075chr14 57198571 58329722chr15 81026720 86249006chr15 88803154 91002601chr16 58191891 61617665chr18 22618645 26406470chr18 30000311 31201020chr18 37521727 38182028

B I B L I O G R A P H Y

[1] A.E. Harding. Inherited neuronal atrophy and degeneration predominantly of lower motor neurons. In P.J.Dyck, P.K. Thomas, J.W. Griffin, P.A. Low, and J.F. Poduslo, editors, Peripheral Neuropathy, pages 1051–1064.W.B. Saunders, Philadelphia, 3rd edition, 1993.

[2] O. V. Evgrafov, I. Mersiyanova, J. Irobi, L. Van Den Bosch, I. Dierick, C. L. Leung, O. Schagina, N. Verpoorten,K. Van Impe, V. Fedotov, E. Dadali, M. Auer-Grumbach, C. Windpassinger, K. Wagner, Z. Mitrovic, D. Hilton-Jones, K. Talbot, J. J. Martin, N. Vasserman, S. Tverskaya, A. Polyakov, R. K. Liem, J. Gettemans, W. Robberecht,P. De Jonghe, and V. Timmerman. Mutant small heat-shock protein 27 causes axonal Charcot-Marie-Toothdisease and distal hereditary motor neuropathy. Nat. Genet., 36:602–606, Jun 2004.

[3] J. Irobi, K. Van Impe, P. Seeman, A. Jordanova, I. Dierick, N. Verpoorten, A. Michalik, E. De Vriendt, A. Ja-cobs, V. Van Gerwen, K. Vennekens, R. Mazanec, I. Tournev, D. Hilton-Jones, K. Talbot, I. Kremensky, L. VanDen Bosch, W. Robberecht, J. Van Vandekerckhove, C. Van Broeckhoven, J. Gettemans, P. De Jonghe, andV. Timmerman. Hot-spot residue in small heat-shock protein 22 causes distal motor neuropathy. Nat. Genet.,36:597–601, Jun 2004.

[4] S. J. Kolb, P. J. Snyder, E. J. Poi, E. A. Renard, A. Bartlett, S. Gu, S. Sutton, W. D. Arnold, M. L. Freimer, V. H.Lawson, J. T. Kissel, and T. W. Prior. Mutant small heat shock protein B3 causes motor neuropathy: utility ofa candidate gene approach. Neurology, 74:502–506, Feb 2010.

[5] V. Timmerman, P. Raeymaekers, E. Nelis, P. De Jonghe, L. Muylle, C. Ceuterick, J. J. Martin, and C. Van Broeck-hoven. Linkage analysis of distal hereditary motor neuropathy type II (distal HMN II) in a single pedigree. J.Neurol. Sci., 109:41–48, May 1992.

[6] I. Dierick, J. Baets, J. Irobi, A. Jacobs, E. De Vriendt, T. Deconinck, L. Merlini, P. Van den Bergh, V. M. Rasic,W. Robberecht, D. Fischer, R. J. Morales, Z. Mitrovic, P. Seeman, R. Mazanec, A. Kochanski, A. Jordanova,M. Auer-Grumbach, A. T. Helderman-van den Enden, J. H. Wokke, E. Nelis, P. De Jonghe, and V. Timmerman.Relative contribution of mutations in genes for autosomal dominant distal hereditary motor neuropathies: agenotype-phenotype correlation study. Brain, 131:1217–1227, May 2008.

[7] K. W. Chung, S. B. Kim, S. Y. Cho, S. J. Hwang, S. W. Park, S. H. Kang, J. Kim, J. H. Yoo, and B. O. Choi. Distalhereditary motor neuropathy in Korean patients with a small heat shock protein 27 mutation. Exp. Mol. Med.,40:304–312, Jun 2008.

[8] B. Tang, X. Liu, G. Zhao, W. Luo, K. Xia, Q. Pan, F. Cai, Z. Hu, C. Zhang, B. Chen, F. Zhang, L. Shen, R. Zhang,and H. Jiang. Mutation analysis of the small heat shock protein 27 gene in chinese patients with Charcot-Marie-Tooth disease. Arch. Neurol., 62:1201–1207, Aug 2005.

[9] B. S. Tang, G. H. Zhao, W. Luo, K. Xia, F. Cai, Q. Pan, R. X. Zhang, F. F. Zhang, X. M. Liu, B. Chen, C. Zhang,L. Shen, H. Jiang, Z. G. Long, and H. P. Dai. Small heat-shock protein 22 mutated in autosomal dominantCharcot-Marie-Tooth disease type 2L. Hum. Genet., 116:222–224, Feb 2005.

[10] H. H. Kampinga, J. Hageman, M. J. Vos, H. Kubota, R. M. Tanguay, E. A. Bruford, M. E. Cheetham, B. Chen,and L. E. Hightower. Guidelines for the nomenclature of the human heat shock proteins. Cell Stress Chaperones,14:105–111, Jan 2009.

[11] G. Kappe, E. Franck, P. Verschuure, W. C. Boelens, J. A. Leunissen, and W. W. de Jong. The human genomeencodes 10 alpha-crystallin-related small heat shock proteins: HspB1-10. Cell Stress Chaperones, 8:53–61, 2003.

[12] S. Quraishe, A. Asuni, W. C. Boelens, V. O’Connor, and A. Wyttenbach. Expression of the small heat shockprotein family in the mouse CNS: differential anatomical and biochemical compartmentalization. Neuroscience,153:483–491, May 2008.

[13] P. Tallot, J. F. Grongnet, and J. C. David. Dual perinatal and developmental expression of the small heat shockproteins [FC12]aB-crystallin and Hsp27 in different tissues of the developing piglet. Biol. Neonate, 83:281–288,2003.

[14] P. Verschuure, C. Tatard, W. C. Boelens, J. F. Grongnet, and J. C. David. Expression of small heat shockproteins HspB2, HspB8, Hsp20 and cvHsp in different tissues of the perinatal developing pig. Eur. J. Cell Biol.,82:523–530, Oct 2003.

261

bibliography 262

[15] J. C. Plumier, D. A. Hopkins, H. A. Robertson, and R. W. Currie. Constitutive expression of the 27-kDa heatshock protein (Hsp27) in sensory and motor neurons of the rat nervous system. J. Comp. Neurol., 384:409–428,Aug 1997.

[16] R. Benndorf, X. Sun, R. R. Gilmont, K. J. Biederman, M. P. Molloy, C. W. Goodmurphy, H. Cheng, P. C.Andrews, and M. J. Welsh. HSP22, a new member of the small heat shock protein superfamily, interacts withmimic of phosphorylated HSP27 ((3D)HSP27). J. Biol. Chem., 276:26753–26761, Jul 2001.

[17] K. Kijima, C. Numakura, T. Goto, T. Takahashi, T. Otagiri, K. Umetsu, and K. Hayasaka. Small heat shockprotein 27 mutation in a Japanese patient with distal hereditary motor neuropathy. J. Hum. Genet., 50:473–476,2005.

[18] H. Houlden, M. Laura, F. Wavrant-De Vrieze, J. Blake, N. Wood, and M. M. Reilly. Mutations in the HSP27(HSPB1) gene cause dominant, recessive, and sporadic distal HMN/CMT type 2. Neurology, 71:1660–1668,Nov 2008.

[19] Y. Z. Chen, S. H. Hashemi, S. K. Anderson, Y. Huang, M. C. Moreira, D. R. Lynch, I. A. Glass, P. F. Chance, andC. L. Bennett. Senataxin, the yeast Sen1p orthologue: characterization of a unique protein in which recessivemutations cause ataxia and dominant mutations cause motor neuron disease. Neurobiol. Dis., 23:97–108, Jul2006.

[20] A. Antonellis and E. D. Green. The role of aminoacyl-tRNA synthetases in genetic diseases. Annu Rev GenomicsHum Genet, 9:87–107, 2008.

[21] M. Auer-Grumbach, A. Olschewski, L. Papi?, H. Kremer, M. E. McEntagart, S. Uhrig, C. Fischer, E. Frohlich,Z. Balint, B. Tang, H. Strohmaier, H. Lochmuller, B. Schlotter-Weigel, J. Senderek, A. Krebs, K. J. Dick, R. Petty,C. Longman, N. E. Anderson, G. W. Padberg, H. J. Schelhaas, C. M. van Ravenswaaij-Arts, T. R. Pieber, A. H.Crosby, and C. Guelly. Alterations in the ankyrin domain of TRPV4 cause congenital distal SMA, scapuloper-oneal SMA and HMSN2C. Nat. Genet., 42:160–164, Feb 2010.

[22] D. Rambaldi and F. D. Ciccarelli. FancyGene: dynamic visualization of gene structures and protein domainarchitectures on genomic loci. Bioinformatics, 25:2281–2282, Sep 2009.

[23] J. M. Fontaine, X. Sun, R. Benndorf, and M. J. Welsh. Interactions of HSP22 (HSPB8) with HSP20, alphaB-crystallin, and HSPB3. Biochem. Biophys. Res. Commun., 337:1006–1011, Nov 2005.

[24] A. A. Shemetov, A. S. Seit-Nebi, O. V. Bukach, and N. B. Gusev. Phosphorylation by cyclic AMP-dependentprotein kinase inhibits chaperone-like activity of human HSP22 in vitro. Biochemistry Mosc., 73:200–208, Feb2008.

[25] K. Kato, K. Hasegawa, S. Goto, and Y. Inaguma. Dissociation as a result of phosphorylation of an aggregatedform of the small stress protein, hsp27. J. Biol. Chem., 269:11274–11278, Apr 1994.

[26] L. Almeida-Souza, S. Goethals, V. de Winter, I. Dierick, R. Gallardo, J. Van Durme, J. Irobi, J. Gettemans,F. Rousseau, J. Schymkowitz, V. Timmerman, and S. Janssens. Increased monomerization of mutant HSPB1leads to protein hyperactivity in Charcot-Marie-Tooth neuropathy. J. Biol. Chem., 285:12778–12786, Apr 2010.

[27] D. Hayes, V. Napoli, A. Mazurkie, W. F. Stafford, and P. Graceffa. Phosphorylation dependence of hsp27multimeric size and molecular chaperone function. J. Biol. Chem., 284:18801–18807, Jul 2009.

[28] R. A. Stetler, Y. Gao, A. P. Signore, G. Cao, and J. Chen. HSP27: mechanisms of cellular protection againstneuronal injury. Curr. Mol. Med., 9:863–872, Sep 2009.

[29] P. Mehlen, A. Mehlen, J. Godet, and A. P. Arrigo. hsp27 as a switch between differentiation and apoptosis inmurine embryonic stem cells. J. Biol. Chem., 272:31657–31665, Dec 1997.

[30] R. Benndorf, K. Hayess, S. Ryazantsev, M. Wieske, J. Behlke, and G. Lutsch. Phosphorylation and supramolecu-lar organization of murine small heat shock protein HSP25 abolish its actin polymerization-inhibiting activity.J. Biol. Chem., 269:20780–20784, Aug 1994.

[31] A. P. Arrigo. The cellular "networking" of mammalian Hsp27 and its functions in the control of protein folding,redox state and apoptosis. Adv. Exp. Med. Biol., 594:14–26, 2007.

[32] Y. Chen, T. S. Voegeli, P. P. Liu, E. G. Noble, and R. W. Currie. Heat shock paradox and a new role of heatshock proteins and their receptors as anti-inflammation targets. Inflamm Allergy Drug Targets, 6:91–100, Jun2007.

bibliography 263

[33] K. S. Sinsimer, F. M. Gratacos, A. M. Knapinska, J. Lu, C. D. Krause, A. V. Wierzbowski, L. R. Maher,S. Scrudato, Y. M. Rivera, S. Gupta, D. K. Turrin, M. P. De La Cruz, S. Pestka, and G. Brewer. ChaperoneHsp27, a novel subunit of AUF1 protein complexes, functions in AU-rich element-mediated mRNA decay.Mol. Cell. Biol., 28:5223–5237, Sep 2008.

[34] C. A. Ross and M. A. Poirier. Protein aggregation and neurodegenerative disease. Nat. Med., 10 Suppl:S10–17,Jul 2004.

[35] S. Ackerley, P. A. James, A. Kalli, S. French, K. E. Davies, and K. Talbot. A mutation in the small heat-shockprotein HSPB1 leading to distal hereditary motor neuronopathy disrupts neurofilament assembly and theaxonal transport of specific cellular cargoes. Hum. Mol. Genet., 15:347–354, Jan 2006.

[36] I. V. Mersiyanova, A. V. Perepelov, A. V. Polyakov, V. F. Sitnikov, E. L. Dadali, R. B. Oparin, A. N. Petrin, andO. V. Evgrafov. A new variant of Charcot-Marie-Tooth disease type 2 is probably the result of a mutation inthe neurofilament-light gene. Am. J. Hum. Genet., 67:37–46, Jul 2000.

[37] G. M. Fabrizi, T. Cavallaro, C. Angiari, L. Bertolasi, I. Cabrini, M. Ferrarini, and N. Rizzuto. Giant axon andneurofilament accumulation in Charcot-Marie-Tooth disease type 2E. Neurology, 62:1429–1431, Apr 2004.

[38] J. Zhai, H. Lin, J. P. Julien, and W. W. Schlaepfer. Disruption of neurofilament network with aggregation oflight neurofilament protein: a common pathway leading to motor neuron degeneration due to Charcot-Marie-Tooth disease-linked mutations in NFL and HSPB1. Hum. Mol. Genet., 16:3103–3116, Dec 2007.

[39] J. Irobi, L. Almeida-Souza, B. Asselbergh, V. De Winter, S. Goethals, I. Dierick, J. Krishnan, J. P. Timmermans,W. Robberecht, P. De Jonghe, L. Van Den Bosch, S. Janssens, and V. Timmerman. Mutant HSPB8 causes motorneuron-specific neurite degeneration. Hum. Mol. Genet., 19:3254–3265, Aug 2010.

[40] M. Coleman. Axon degeneration mechanisms: commonality amid diversity. Nat. Rev. Neurosci., 6:889–898,Nov 2005.

[41] S. C. Benn, D. Perrelet, A. C. Kato, J. Scholz, I. Decosterd, R. J. Mannion, J. C. Bakowska, and C. J. Woolf.Hsp27 upregulation and phosphorylation is required for injured sensory and motor neuron survival. Neuron,36:45–56, Sep 2002.

[42] J. N. Lavoie, G. Gingras-Breton, R. M. Tanguay, and J. Landry. Induction of Chinese hamster HSP27 gene ex-pression in mouse cells confers resistance to heat shock. HSP27 stabilization of the microfilament organization.J. Biol. Chem., 268:3420–3429, Feb 1993.

[43] N. Mounier and A. P. Arrigo. Actin cytoskeleton and small heat shock proteins: how do they interact? CellStress Chaperones, 7:167–176, Apr 2002.

[44] B. M. Doshi, L. E. Hightower, and J. Lee. The role of Hsp27 and actin in the regulation of movement in humancancer cells responding to heat shock. Cell Stress Chaperones, 14:445–457, Sep 2009.

[45] G. Pigino, L.L. Kirkpatrick, and S.T. Brady. Cytoskeleton of Neurons and Glia. In G.J. Siegel, editor, Basicneurochemistry : molecular, cellular, and medical aspects, pages 123–138. Elsevier Academic, Toronto, 7th edition,2006.

[46] A. J. Macario and E. Conway de Macario. Chaperonopathies by defect, excess, or mistake. Ann. N. Y. Acad.Sci., 1113:178–191, Oct 2007.

[47] M. V. Kim, A. S. Kasakov, A. S. Seit-Nebi, S. B. Marston, and N. B. Gusev. Structure and properties of K141Emutant of small heat shock protein HSP22 (HspB8, H11) that is expressed in human neuromuscular disorders.Arch. Biochem. Biophys., 454:32–41, Oct 2006.

[48] M. Litt, P. Kramer, D. M. LaMorticella, W. Murphey, E. W. Lovrien, and R. G. Weleber. Autosomal dominantcongenital cataract associated with a missense mutation in the human alpha crystallin gene CRYAA. Hum.Mol. Genet., 7:471–474, Mar 1998.

[49] S. Bera, P. Thampi, W. J. Cho, and E. C. Abraham. A positive charge preservation at position 116 of alphaA-crystallin is critical for its structural and functional integrity. Biochemistry, 41:12421–12426, Oct 2002.

[50] D. Baumer, O. Ansorge, M. Almeida, and K. Talbot. The role of RNA processing in the pathogenesis of motorneuron degeneration. Expert Rev Mol Med, 12:e21, 2010.

[51] M. van Blitterswijk and J. E. Landers. RNA processing pathways in amyotrophic lateral sclerosis. Neurogenetics,11:275–290, Jul 2010.

bibliography 264

[52] X. Sun, J. M. Fontaine, A. D. Hoppe, S. Carra, C. DeGuzman, J. L. Martin, S. Simon, P. Vicart, M. J. Welsh,J. Landry, and R. Benndorf. Abnormal interaction of motor neuropathy-associated mutant HspB8 (Hsp22)forms with the RNA helicase Ddx20 (gemin3). Cell Stress Chaperones, 15:567–582, Sep 2010.

[53] S. Lefebvre, L. Burglen, S. Reboullet, O. Clermont, P. Burlet, L. Viollet, B. Benichou, C. Cruaud, P. Millasseau,and M. Zeviani. Identification and characterization of a spinal muscular atrophy-determining gene. Cell, 80:155–165, Jan 1995.

[54] S. Lefebvre, P. Burlet, Q. Liu, S. Bertrandy, O. Clermont, A. Munnich, G. Dreyfuss, and J. Melki. Correlationbetween severity and SMN protein level in spinal muscular atrophy. Nat. Genet., 16:265–269, Jul 1997.

[55] C. Soler-Botija, I. Cusco, L. Caselles, E. Lopez, M. Baiget, and E. F. Tizzano. Implication of fetal SMN2expression in type I SMA pathogenesis: protection or pathological gain of function? J. Neuropathol. Exp. Neurol.,64:215–223, Mar 2005.

[56] A. K. Gubitz, W. Feng, and G. Dreyfuss. The SMN complex. Exp. Cell Res., 296:51–56, May 2004.

[57] G. Meister, C. Eggert, and U. Fischer. SMN-mediated assembly of RNPs: a complex story. Trends Cell Biol., 12:472–478, Oct 2002.

[58] A. G. Todd, D. J. Shaw, R. Morse, H. Stebbings, and P. J. Young. SMN and the Gemin proteins form sub-complexes that localise to both stationary and dynamic neurite granules. Biochem. Biophys. Res. Commun., 394:211–216, Mar 2010.

[59] W. Rossoll, S. Jablonka, C. Andreassi, A. K. Kroning, K. Karle, U. R. Monani, and M. Sendtner. Smn, the spinalmuscular atrophy-determining gene product, modulates axon growth and localization of beta-actin mRNA ingrowth cones of motoneurons. J. Cell Biol., 163:801–812, Nov 2003.

[60] F. Gabanella, M. E. Butchbach, L. Saieva, C. Carissimi, A. H. Burghes, and L. Pellizzoni. Ribonucleoproteinassembly defects correlate with spinal muscular atrophy severity and preferentially affect a subset of spliceo-somal snRNPs. PLoS ONE, 2:e921, 2007.

[61] G. Laroia, R. Cuesta, G. Brewer, and R. J. Schneider. Control of mRNA decay by heat shock-ubiquitin-proteasome pathway. Science, 284:499–502, Apr 1999.

[62] G. Laroia, B. Sarkar, and R. J. Schneider. Ubiquitin-dependent mechanism regulates rapid turnover of AU-richcytokine mRNAs. Proc. Natl. Acad. Sci. U.S.A., 99:1842–1846, Feb 2002.

[63] I. Dierick, J. Irobi, S. Janssens, J. Theuns, R. Lemmens, A. Jacobs, E. Corsmit, N. Hersmus, L. Van Den Bosch,W. Robberecht, P. De Jonghe, C. Van Broeckhoven, and V. Timmerman. Genetic variant in the HSPB1 promoterregion impairs the HSP27 stress response. Hum. Mutat., 28:830, Aug 2007.

[64] I. Puls, C. Jonnakuty, B. H. LaMonte, E. L. Holzbaur, M. Tokito, E. Mann, M. K. Floeter, K. Bidus, D. Drayna,S. J. Oh, R. H. Brown, C. L. Ludlow, and K. H. Fischbeck. Mutant dynactin in motor neuron disease. Nat.Genet., 33:455–456, Apr 2003.

[65] C. Munch, R. Sedlmeier, T. Meyer, V. Homberg, A. D. Sperfeld, A. Kurt, J. Prudlo, G. Peraus, C. O. Hanemann,G. Stumm, and A. C. Ludolph. Point mutations of the p150 subunit of dynactin (DCTN1) gene in ALS.Neurology, 63:724–726, Aug 2004.

[66] C. Munch, A. Rosenbohm, A. D. Sperfeld, I. Uttner, S. Reske, B. J. Krause, R. Sedlmeier, T. Meyer, C. O.Hanemann, G. Stumm, and A. C. Ludolph. Heterozygous R1101K mutation of the DCTN1 gene in a familywith ALS and FTD. Ann. Neurol., 58:777–780, Nov 2005.

[67] C. Vilarino-Guell, C. Wider, A. I. Soto-Ortolaza, S. A. Cobb, J. M. Kachergus, B. H. Keeling, J. C. Dachsel,M. M. Hulihan, D. W. Dickson, Z. K. Wszolek, R. J. Uitti, N. R. Graff-Radford, B. F. Boeve, K. A. Josephs,B. Miller, K. B. Boylan, K. Gwinn, C. H. Adler, J. O. Aasly, F. Hentati, A. Destee, A. Krygowska-Wajs, M. C.Chartier-Harlin, O. A. Ross, R. Rademakers, and M. J. Farrer. Characterization of DCTN1 genetic variabilityin neurodegeneration. Neurology, 72:2024–2028, Jun 2009.

[68] T. A. Schroer. Dynactin. Annu. Rev. Cell Dev. Biol., 20:759–779, 2004.

[69] R. H. Melloni, M. K. Tokito, and E. L. Holzbaur. Expression of the p150Glued component of the dynactincomplex in developing and adult rat brain. J. Comp. Neurol., 357:15–24, Jun 1995.

[70] M. J. Farrer, M. M. Hulihan, J. M. Kachergus, J. C. Dachsel, A. J. Stoessl, L. L. Grantier, S. Calne, D. B. Calne,B. Lechevalier, F. Chapon, Y. Tsuboi, T. Yamada, L. Gutmann, B. Elibol, K. P. Bhatia, C. Wider, C. Vilarino-Guell,O. A. Ross, L. A. Brown, M. Castanedes-Casey, D. W. Dickson, and Z. K. Wszolek. DCTN1 mutations in Perrysyndrome. Nat. Genet., 41:163–165, Feb 2009.

bibliography 265

[71] J. R. Levy, C. J. Sumner, J. P. Caviston, M. K. Tokito, S. Ranganathan, L. A. Ligon, K. E. Wallace, B. H. LaMonte,G. G. Harmison, I. Puls, K. H. Fischbeck, and E. L. Holzbaur. A motor neuron disease-associated mutation inp150Glued perturbs dynactin function and induces protein aggregation. J. Cell Biol., 172:733–745, Feb 2006.

[72] B. H. LaMonte, K. E. Wallace, B. A. Holloway, S. S. Shelly, J. Ascano, M. Tokito, T. Van Winkle, D. S. Howland,and E. L. Holzbaur. Disruption of dynein/dynactin inhibits axonal transport in motor neurons causing late-onset progressive degeneration. Neuron, 34:715–727, May 2002.

[73] I. Puls, S. J. Oh, C. J. Sumner, K. E. Wallace, M. K. Floeter, E. A. Mann, W. R. Kennedy, G. Wendelschafer-Crabb,A. Vortmeyer, R. Powers, K. Finnegan, E. L. Holzbaur, K. H. Fischbeck, and C. L. Ludlow. Distal spinal andbulbar muscular atrophy caused by dynactin mutation. Ann. Neurol., 57:687–694, May 2005.

[74] J. K. Moore, D. Sept, and J. A. Cooper. Neurodegeneration mutations in dynactin impair dynein-dependentnuclear migration. Proc. Natl. Acad. Sci. U.S.A., 106:5147–5152, Mar 2009.

[75] S. Ahmed, S. Sun, A. E. Siglin, T. Polenova, and J. C. Williams. Disease-associated mutations in the p150(Glued)subunit destabilize the CAP-gly domain. Biochemistry, 49:5083–5085, Jun 2010.

[76] K. Grohmann, M. Schuelke, A. Diers, K. Hoffmann, B. Lucke, C. Adams, E. Bertini, H. Leonhardt-Horti,F. Muntoni, R. Ouvrier, A. Pfeufer, R. Rossi, L. Van Maldergem, J. M. Wilmshurst, T. F. Wienker, M. Sendtner,S. Rudnik-Schoneborn, K. Zerres, and C. Hubner. Mutations in the gene encoding immunoglobulin mu-binding protein 2 cause spinal muscular atrophy with respiratory distress type 1. Nat. Genet., 29:75–77, Sep2001.

[77] K. Grohmann, R. Varon, P. Stolz, M. Schuelke, C. Janetzki, E. Bertini, K. Bushby, F. Muntoni, R. Ouvrier,L. Van Maldergem, N. M. Goemans, H. Lochmuller, S. Eichholz, C. Adams, F. Bosch, P. Grattan-Smith,C. Navarro, H. Neitzel, T. Polster, H. Topaloglu, C. Steglich, U. P. Guenther, K. Zerres, S. Rudnik-Schoneborn,and C. Hubner. Infantile spinal muscular atrophy with respiratory distress type 1 (SMARD1). Ann. Neurol.,54:719–724, Dec 2003.

[78] M. Pitt, H. Houlden, J. Jacobs, Q. Mok, B. Harding, M. Reilly, and R. Surtees. Severe infantile neuropathy withdiaphragmatic weakness and its relationship to SMARD1. Brain, 126:2682–2692, Dec 2003.

[79] U. P. Guenther, M. Schuelke, E. Bertini, A. D’Amico, N. Goemans, K. Grohmann, C. Hubner, and R. Varon.Genomic rearrangements at the IGHMBP2 gene locus in two patients with SMARD1. Hum. Genet., 115:319–326,Sep 2004.

[80] I. Maystadt, M. Zarhrate, P. Landrieu, O. Boespflug-Tanguy, S. Sukno, P. Collignon, J. Melki, C. Verellen-Dumoulin, A. Munnich, and L. Viollet. Allelic heterogeneity of SMARD1 at the IGHMBP2 locus. Hum. Mutat.,23:525–526, May 2004.

[81] R. E. Appleton, C. Hubner, K. Grohmann, and R. Varon. ’Congenital peripheral neuropathy presenting asapnoea and respiratory insufficiency: spinal muscular atrophy with respiratory distress type 1 (SMARD1)’.Dev Med Child Neurol, 46:576, Aug 2004.

[82] A. Giannini, A. M. Pinto, G. Rossetti, E. Prandi, D. Tiziano, C. Brahe, and N. Nardocci. Respiratory failure ininfants due to spinal muscular atrophy with respiratory distress type 1. Intensive Care Med, 32:1851–1855, Nov2006.

[83] V. C. Wong, B. H. Chung, S. Li, W. Goh, and S. L. Lee. Mutation of gene in spinal muscular atrophy respiratorydistress type I. Pediatr. Neurol., 34:474–477, Jun 2006.

[84] U. P. Guenther, R. Varon, M. Schlicke, V. Dutrannoy, A. Volk, C. Hubner, K. von Au, and M. Schuelke. Clin-ical and mutational profile in spinal muscular atrophy with respiratory distress (SMARD): defining novelphenotypes through hierarchical cluster analysis. Hum. Mutat., 28:808–815, Aug 2007.

[85] U. P. Guenther, L. Handoko, R. Varon, U. Stephani, C. Y. Tsao, J. R. Mendell, S. Lutzkendorf, C. Hubner, K. vonAu, S. Jablonka, G. Dittmar, U. Heinemann, A. Schuetz, and M. Schuelke. Clinical variability in distal spinalmuscular atrophy type 1 (DSMA1): determination of steady-state IGHMBP2 protein levels in five patientswith infantile and juvenile disease. J. Mol. Med., 87:31–41, Jan 2009.

[86] V. Fanos, A. Cuccu, S. Nemolato, V. Marinelli, and G. Faa. A new nonsense mutation of the IGHMBP2 generesponsible for the first case of SMARD1 in a Sardinian patient with giant cell hepatitis. Neuropediatrics, 41:132–134, Jun 2010.

[87] A. Diers, M. Kaczinski, K. Grohmann, C. Hubner, and G. Stoltenburg-Didinger. The ultrastructure of pe-ripheral nerve, motor end-plate and skeletal muscle in patients suffering from spinal muscular atrophy withrespiratory distress type 1 (SMARD1). Acta Neuropathol., 110:289–297, Sep 2005.

bibliography 266

[88] A. M. Kaindl, U. P. Guenther, S. Rudnik-Schoneborn, R. Varon, K. Zerres, M. Schuelke, C. Hubner, and K. vonAu. Spinal muscular atrophy with respiratory distress type 1 (SMARD1). J. Child Neurol., 23:199–204, Feb2008.

[89] Y. Fukita, T. R. Mizuta, M. Shirozu, K. Ozawa, A. Shimizu, and T. Honjo. The human S mu bp-2, a DNA-binding protein specific to the single-stranded guanine-rich sequence related to the immunoglobulin muchain switch region. J. Biol. Chem., 268:17463–17470, Aug 1993.

[90] K. Grohmann, W. Rossoll, I. Kobsar, B. Holtmann, S. Jablonka, C. Wessig, G. Stoltenburg-Didinger, U. Fischer,C. Hubner, R. Martini, and M. Sendtner. Characterization of Ighmbp2 in motor neurons and implications forthe pathomechanism in a mouse model of human spinal muscular atrophy with respiratory distress type 1(SMARD1). Hum. Mol. Genet., 13:2031–2042, Sep 2004.

[91] A. E. Gorbalenyaa and E. V. Kooninb. Helicases: amino acid sequence comparisons and structure-functionrelationships. Curr. Opin. Struct. Biol., 3:419–429, June 1993.

[92] T. R. Mizuta, Y. Fukita, T. Miyoshi, A. Shimizu, and T. Honjo. Isolation of cDNA encoding a binding proteinspecific to 5’-phosphorylated single-stranded DNA with G-rich sequences. Nucleic Acids Res., 21:1761–1766,Apr 1993.

[93] J. E. Walker, M. Saraste, M. J. Runswick, and N. J. Gay. Distantly related sequences in the alpha- and beta-subunits of ATP synthase, myosin, kinases and other ATP-requiring enzymes and a common nucleotidebinding fold. EMBO J., 1:945–951, 1982.

[94] N. V. Grishin. The R3H motif: a domain that binds single-stranded nucleic acids. Trends Biochem. Sci., 23:329–330, Sep 1998.

[95] E. Liepinsh, A. Leonchiks, A. Sharipo, L. Guignard, and G. Otting. Solution structure of the R3H domain fromhuman Smubp-2. J. Mol. Biol., 326:217–223, Feb 2003.

[96] Y. Z. Chen, C. L. Bennett, H. M. Huynh, I. P. Blair, I. Puls, J. Irobi, I. Dierick, A. Abel, M. L. Kennerson,B. A. Rabin, G. A. Nicholson, M. Auer-Grumbach, K. Wagner, P. De Jonghe, J. W. Griffin, K. H. Fischbeck,V. Timmerman, D. R. Cornblath, and P. F. Chance. DNA/RNA helicase gene mutations in a form of juvenileamyotrophic lateral sclerosis (ALS4). Am. J. Hum. Genet., 74:1128–1135, Jun 2004.

[97] U. P. Guenther, L. Handoko, B. Laggerbauer, S. Jablonka, A. Chari, M. Alzheimer, J. Ohmer, O. Plottner,N. Gehring, A. Sickmann, K. von Au, M. Schuelke, and U. Fischer. IGHMBP2 is a ribosome-associated helicaseinactive in the neuromuscular disorder distal SMA type 1 (DSMA1). Hum. Mol. Genet., 18:1288–1300, Apr 2009.

[98] M. de Planell-Saguer, D. G. Schroeder, M. C. Rodicio, G. A. Cox, and Z. Mourelatos. Biochemical and geneticevidence for a role of IGHMBP2 in the translational machinery. Hum. Mol. Genet., 18:2115–2126, Jun 2009.

[99] M. Miao, S. L. Chan, G. L. Fletcher, and C. L. Hew. The rat ortholog of the presumptive flounder antifreezeenhancer-binding protein is a helicase domain-containing protein. Eur. J. Biochem., 267:7237–7246, Dec 2000.

[100] G. A. Cox, C. L. Mahaffey, and W. N. Frankel. Identification of the mouse neuromuscular degeneration geneand mapping of a second site suppressor allele. Neuron, 21:1327–1337, Dec 1998.

[101] T. P. Maddatu, S. M. Garvey, D. G. Schroeder, T. G. Hampton, and G. A. Cox. Transgenic rescue of neurogenicatrophy in the nmd mouse reveals a role for Ighmbp2 in dilated cardiomyopathy. Hum. Mol. Genet., 13:1105–1115, Jun 2004.

[102] P. De Jonghe, M. Auer-Grumbach, J. Irobi, K. Wagner, B. Plecko, M. Kennerson, D. Zhu, E. De Vriendt,V. Van Gerwen, G. Nicholson, H. P. Hartung, and V. Timmerman. Autosomal dominant juvenile amyotrophiclateral sclerosis and distal hereditary motor neuronopathy with pyramidal tract signs: synonyms for the samedisorder? Brain, 125:1320–1325, Jun 2002.

[103] S. Gopinath, I. P. Blair, M. L. Kennerson, J. C. Durnall, and G. A. Nicholson. A novel locus for distal motorneuron degeneration maps to chromosome 7q34-q36. Hum. Genet., 121:559–564, Jun 2007.

[104] M. C. Moreira, S. Klur, M. Watanabe, A. H. Nemeth, I. Le Ber, J. C. Moniz, C. Tranchant, P. Aubourg, M. Tazir,L. Schols, M. Pandolfo, J. B. Schulz, J. Pouget, P. Calvas, M. Shizuka-Ikeda, M. Shoji, M. Tanaka, L. Izatt, C. E.Shaw, A. M’Zahem, E. Dunne, P. Bomont, T. Benhassine, N. Bouslam, G. Stevanin, A. Brice, J. Guimaraes,P. Mendonca, C. Barbot, P. Coutinho, J. Sequeiros, A. Durr, J. M. Warter, and M. Koenig. Senataxin, theortholog of a yeast RNA helicase, is mutant in ataxia-ocular apraxia 2. Nat. Genet., 36:225–227, Mar 2004.

[105] A. Duquette, K. Roddier, J. McNabb-Baltar, I. Gosselin, A. St-Denis, M. J. Dicaire, L. Loisel, D. Labuda, L. Marc-hand, J. Mathieu, J. P. Bouchard, and B. Brais. Mutations in senataxin responsible for Quebec cluster of ataxiawith neuropathy. Ann. Neurol., 57:408–414, Mar 2005.

bibliography 267

[106] C. Criscuolo, L. Chessa, S. Di Giandomenico, P. Mancini, F. Sacca, G. S. Grieco, M. Piane, F. Barbieri,G. De Michele, S. Banfi, F. Pierelli, N. Rizzuto, F. M. Santorelli, L. Gallosti, A. Filla, and C. Casali. Ataxiawith oculomotor apraxia type 2: a clinical, pathologic, and genetic study. Neurology, 66:1207–1210, Apr 2006.

[107] T. Asaka, H. Yokoji, J. Ito, K. Yamaguchi, and A. Matsushima. Autosomal recessive ataxia with peripheralneuropathy and elevated AFP: novel mutations in SETX. Neurology, 66:1580–1581, May 2006.

[108] B. L. Fogel and S. Perlman. Novel mutations in the senataxin DNA/RNA helicase domain in ataxia withoculomotor apraxia 2. Neurology, 67:2083–2084, Dec 2006.

[109] A. Suraweera, O. J. Becherel, P. Chen, N. Rundle, R. Woods, J. Nakamura, M. Gatei, C. Criscuolo, A. Filla,L. Chessa, M. Fusser, B. Epe, N. Gueven, and M. F. Lavin. Senataxin, defective in ataxia oculomotor apraxiatype 2, is involved in the defense against oxidative DNA damage. J. Cell Biol., 177:969–979, Jun 2007.

[110] D. R. Lynch, C. D. Braastad, and N. Nagan. Ovarian failure in ataxia with oculomotor apraxia type 2. Am. J.Med. Genet. A, 143A:1775–1777, Aug 2007.

[111] M. Anheim, M. C. Fleury, J. Franques, M. C. Moreira, J. P. Delaunoy, D. Stoppa-Lyonnet, M. Koenig, andC. Tranchant. Clinical and molecular findings of ataxia with oculomotor apraxia type 2 in 4 families. Arch.Neurol., 65:958–962, Jul 2008.

[112] L. Arning, L. Schols, H. Cin, M. Souquet, J. T. Epplen, and D. Timmann. Identification and characterisationof a large senataxin (SETX) gene duplication in ataxia with ocular apraxia type 2 (AOA2). Neurogenetics, 9:295–299, Oct 2008.

[113] P. Nicolaou, A. Georghiou, C. Votsi, L. T. Middleton, E. Zamba-Papanicolaou, and K. Christodoulou. A novelc.5308_5311delGAGA mutation in Senataxin in a Cypriot family with an autosomal recessive cerebellar ataxia.BMC Med. Genet., 9:28, 2008.

[114] L. Schols, L. Arning, R. Schule, J. T. Epplen, and D. Timmann. "Pseudodominant inheritance" of ataxia withocular apraxia type 2 (AOA2). J. Neurol., 255:495–501, Apr 2008.

[115] M. Tazir, L. Ali-Pacha, A. M’Zahem, J. P. Delaunoy, M. Fritsch, S. Nouioua, T. Benhassine, S. Assami, D. Grid,J. M. Vallat, A. Hamri, and M. Koenig. Ataxia with oculomotor apraxia type 2: a clinical and genetic study of19 patients. J. Neurol. Sci., 278:77–81, Mar 2009.

[116] M. Anheim, B. Monga, M. Fleury, P. Charles, C. Barbot, M. Salih, J. P. Delaunoy, M. Fritsch, L. Arning, M. Syn-ofzik, L. Schols, J. Sequeiros, C. Goizet, C. Marelli, I. Le Ber, J. Koht, J. Gazulla, J. De Bleecker, M. Mukhtar,N. Drouot, L. Ali-Pacha, T. Benhassine, M. Chbicheb, A. M’Zahem, A. Hamri, B. Chabrol, J. Pouget, R. Murphy,M. Watanabe, P. Coutinho, M. Tazir, A. Durr, A. Brice, C. Tranchant, and M. Koenig. Ataxia with oculomotorapraxia type 2: clinical, biological and genotype/phenotype correlation study of a cohort of 90 patients. Brain,132:2688–2698, Oct 2009.

[117] G. Airoldi, A. Guidarelli, O. Cantoni, C. Panzeri, C. Vantaggiato, S. Bonato, M. Grazia D’Angelo, S. Falcone,C. De Palma, A. Tonelli, C. Crimella, S. Bondioni, N. Bresolin, E. Clementi, and M. T. Bassi. Characterizationof two novel SETX mutations in AOA2 patients reveals aspects of the pathophysiological role of senataxin.Neurogenetics, 11:91–100, Feb 2010.

[118] A. G. Bassuk, Y. Z. Chen, S. D. Batish, N. Nagan, P. Opal, P. F. Chance, and C. L. Bennett. In cis autosomaldominant mutation of Senataxin associated with tremor/ataxia syndrome. Neurogenetics, 8:45–49, Jan 2007.

[119] H. D. Kim, J. Choe, and Y. S. Seo. The sen1(+) gene of Schizosaccharomyces pombe, a homologue of buddingyeast SEN1, encodes an RNA and DNA helicase. Biochemistry, 38:14697–14710, Nov 1999.

[120] A. Suraweera, Y. Lim, R. Woods, G. W. Birrell, T. Nasim, O. J. Becherel, and M. F. Lavin. Functional role forsenataxin, defective in ataxia oculomotor apraxia type 2, in transcriptional regulation. Hum. Mol. Genet., 18:3384–3396, Sep 2009.

[121] D. Ursic, K. L. Himmel, K. A. Gurley, F. Webb, and M. R. Culbertson. The yeast SEN1 gene is required for theprocessing of diverse RNA classes. Nucleic Acids Res., 25:4778–4785, Dec 1997.

[122] M. Winey and M. R. Culbertson. Mutations affecting the tRNA-splicing endonuclease activity of Saccha-romyces cerevisiae. Genetics, 118:609–617, Apr 1988.

[123] T. P. Rasmussen and M. R. Culbertson. The putative nucleic acid helicase Sen1p is required for formation andstability of termini and for maximal rates of synthesis and levels of accumulation of small nucleolar RNAs inSaccharomyces cerevisiae. Mol. Cell. Biol., 18:6885–6896, Dec 1998.

bibliography 268

[124] D. Ursic, K. Chinchilla, J. S. Finkel, and M. R. Culbertson. Multiple protein/protein and protein/RNA inter-actions suggest roles for yeast DNA/RNA helicase Sen1p in transcription, transcription-coupled DNA repairand RNA processing. Nucleic Acids Res., 32:2441–2452, 2004.

[125] E. J. Steinmetz, C. L. Warren, J. N. Kuehner, B. Panbehi, A. Z. Ansari, and D. A. Brow. Genome-wide distribu-tion of yeast RNA polymerase II and its control by Sen1 helicase. Mol. Cell, 24:735–746, Dec 2006.

[126] A. Antonellis, R. E. Ellsworth, N. Sambuughin, I. Puls, A. Abel, S. Q. Lee-Lin, A. Jordanova, I. Kremensky,K. Christodoulou, L. T. Middleton, K. Sivakumar, V. Ionasescu, B. Funalot, J. M. Vance, L. G. Goldfarb, K. H.Fischbeck, and E. D. Green. Glycyl tRNA synthetase mutations in Charcot-Marie-Tooth disease type 2D anddistal spinal muscular atrophy type V. Am. J. Hum. Genet., 72:1293–1299, May 2003.

[127] K. Sivakumar, T. Kyriakides, I. Puls, G. A. Nicholson, B. Funalot, A. Antonellis, N. Sambuughin,K. Christodoulou, J. L. Beggs, E. Zamba-Papanicolaou, V. Ionasescu, M. C. Dalakas, E. D. Green, K. H. Fis-chbeck, and L. G. Goldfarb. Phenotypic spectrum of disorders associated with glycyl-tRNA synthetase muta-tions. Brain, 128:2304–2314, Oct 2005.

[128] R. Del Bo, F. Locatelli, S. Corti, M. Scarlato, S. Ghezzi, A. Prelle, G. Fagiolari, M. Moggio, M. Carpo, N. Bresolin,and G. P. Comi. Coexistence of CMT-2D and distal SMA-V phenotypes in an Italian family with a GARS genemutation. Neurology, 66:752–754, Mar 2006.

[129] O. Dubourg, H. Azzedine, R. B. Yaou, J. Pouget, A. Barois, V. Meininger, D. Bouteiller, M. Ruberg, A. Brice,and E. LeGuern. The G526R glycyl-tRNA synthetase gene mutation in distal hereditary motor neuropathytype V. Neurology, 66:1721–1726, Jun 2006.

[130] A. Hamaguchi, C. Ishida, K. Iwasa, A. Abe, and M. Yamada. Charcot-Marie-Tooth disease type 2D with anovel glycyl-tRNA synthetase gene (GARS) mutation. J. Neurol., 257:1202–1204, Jul 2010.

[131] A. Abe and K. Hayasaka. The GARS gene is rarely mutated in Japanese patients with Charcot-Marie-Toothneuropathy. J. Hum. Genet., 54:310–312, May 2009.

[132] P. A. James, M. Z. Cader, F. Muntoni, A. M. Childs, Y. J. Crow, and K. Talbot. Severe childhood SMA andaxonal CMT due to anticodon binding domain mutations in the GARS gene. Neurology, 67:1710–1712, Nov2006.

[133] B. Rohkamm, M. M. Reilly, H. Lochmuller, B. Schlotter-Weigel, N. Barisic, L. Schols, G. Nicholson, D. Pareyson,M. Laura, A. R. Janecke, G. Miltenberger-Miltenyi, E. John, C. Fischer, F. Grill, W. Wakeling, M. Davis, T. R.Pieber, and M. Auer-Grumbach. Further evidence for genetic heterogeneity of distal HMN type V, CMT2 withpredominant hand involvement and Silver syndrome. J. Neurol. Sci., 263:100–106, Dec 2007.

[134] K. L. Seburn, L. A. Nangle, G. A. Cox, P. Schimmel, and R. W. Burgess. An active dominant mutation ofglycyl-tRNA synthetase causes neuropathy in a Charcot-Marie-Tooth 2D mouse model. Neuron, 51(6):715–726,Sep 2006.

[135] W. W. Motley, K. L. Seburn, M. H. Nawaz, K. E. Miers, J. Cheng, A. Antonellis, E. D. Green, K. Talbot, X. L.Yang, K. H. Fischbeck, and R. W. Burgess. Charcot-Marie-Tooth-linked mutant GARS is toxic to peripheralneurons independent of wild-type GARS levels. PLoS Genet., 7(12):e1002399, Dec 2011.

[136] M. Delarue. Aminoacyl-tRNA synthetases. Curr. Opin. Struct. Biol., 5:48–55, Feb 1995.

[137] Q. Ge, E. P. Trieu, and I. N. Targoff. Primary structure and functional expression of human Glycyl-tRNAsynthetase, an autoantigen in myositis. J. Biol. Chem., 269:28790–28797, Nov 1994.

[138] W. W. Motley, K. Talbot, and K. H. Fischbeck. GARS axonopathy: not every neuron’s cup of tRNA. TrendsNeurosci., 33:59–66, Feb 2010.

[139] P. Latour, C. Thauvin-Robinet, C. Baudelet-Mery, P. Soichot, V. Cusin, L. Faivre, M. C. Locatelli, M. Mayen-con, A. Sarcey, E. Broussolle, W. Camu, A. David, and R. Rousson. A major determinant for binding andaminoacylation of tRNA(Ala) in cytoplasmic Alanyl-tRNA synthetase is mutated in dominant axonal Charcot-Marie-Tooth disease. Am. J. Hum. Genet., 86:77–82, Jan 2010.

[140] H. M. McLaughlin, R. Sakaguchi, C. Liu, T. Igarashi, D. Pehlivan, K. Chu, R. Iyer, P. Cruz, P. F. Cherukuri,N. F. Hansen, J. C. Mullikin, L. G. Biesecker, T. E. Wilson, V. Ionasescu, G. Nicholson, C. Searby, K. Talbot,J. M. Vance, S. Zuchner, K. Szigeti, J. R. Lupski, Y. M. Hou, E. D. Green, and A. Antonellis. Compoundheterozygosity for loss-of-function lysyl-tRNA synthetase mutations in a patient with peripheral neuropathy.Am. J. Hum. Genet., 87:560–566, Oct 2010.

bibliography 269

[141] A. Jordanova, J. Irobi, F. P. Thomas, P. Van Dijck, K. Meerschaert, M. Dewil, I. Dierick, A. Jacobs, E. De Vriendt,V. Guergueltcheva, C. V. Rao, I. Tournev, F. A. Gondim, M. D’Hooghe, V. Van Gerwen, P. Callaerts, L. VanDen Bosch, J. P. Timmermans, W. Robberecht, J. Gettemans, J. M. Thevelein, P. De Jonghe, I. Kremensky, andV. Timmerman. Disrupted function and axonal distribution of mutant tyrosyl-tRNA synthetase in dominantintermediate Charcot-Marie-Tooth neuropathy. Nat. Genet., 38:197–202, Feb 2006.

[142] C. Windpassinger, M. Auer-Grumbach, J. Irobi, H. Patel, E. Petek, G. Horl, R. Malli, J. A. Reed, I. Dierick,N. Verpoorten, T. T. Warner, C. Proukakis, P. Van den Bergh, C. Verellen, L. Van Maldergem, L. Merlini,P. De Jonghe, V. Timmerman, A. H. Crosby, and K. Wagner. Heterozygous missense mutations in BSCL2 areassociated with distal hereditary motor neuropathy and Silver syndrome. Nat. Genet., 36:271–276, Mar 2004.

[143] M. Auer-Grumbach, B. Schlotter-Weigel, H. Lochmuller, G. Strobl-Wildemann, P. Auer-Grumbach, R. Fischer,H. Offenbacher, E. B. Zwick, T. Robl, G. Hartl, H. P. Hartung, K. Wagner, and C. Windpassinger. Phenotypesof the N88S Berardinelli-Seip congenital lipodystrophy 2 mutation. Ann. Neurol., 57:415–424, Mar 2005.

[144] J. Irobi, P. Van den Bergh, L. Merlini, C. Verellen, L. Van Maldergem, I. Dierick, N. Verpoorten, A. Jor-danova, C. Windpassinger, E. De Vriendt, V. Van Gerwen, M. Auer-Grumbach, K. Wagner, V. Timmerman,and P. De Jonghe. The phenotype of motor neuropathies associated with BSCL2 mutations is broader thanSilver syndrome and distal HMN type V. Brain, 127:2124–2130, Sep 2004.

[145] B. P. van de Warrenburg, H. Scheffer, J. J. van Eijk, M. H. Versteeg, H. Kremer, M. J. Zwarts, H. J. Schelhaas,and B. G. van Engelen. BSCL2 mutations in two Dutch families with overlapping Silver syndrome-distalhereditary motor neuropathy. Neuromuscul. Disord., 16:122–125, Feb 2006.

[146] E. Brusse, D. Majoor-Krakauer, B. M. de Graaf, G. H. Visser, S. Swagemakers, A. J. Boon, B. A. Oostra, andA. M. Bertoli-Avella. A novel 16p locus associated with BSCL2 hereditary motor neuronopathy: a geneticmodifier? Neurogenetics, 10:289–297, Oct 2009.

[147] J. Magre, M. Delepine, E. Khallouf, T. Gedde-Dahl, L. Van Maldergem, E. Sobel, J. Papp, M. Meier, A. Megar-bane, A. Bachy, A. Verloes, F. H. d’Abronzo, E. Seemanova, R. Assan, N. Baudic, C. Bourut, P. Czernichow,F. Huet, F. Grigorescu, M. de Kerdanet, D. Lacombe, P. Labrune, M. Lanza, H. Loret, F. Matsuda, J. Navarro,A. Nivelon-Chevalier, M. Polak, J. J. Robert, P. Tric, N. Tubiana-Rufi, C. Vigouroux, J. Weissenbach, S. Savasta,J. A. Maassen, O. Trygstad, P. Bogalho, P. Freitas, J. L. Medina, F. Bonnicci, B. I. Joffe, G. Loyson, V. R. Panz,F. J. Raal, S. O’Rahilly, T. Stephenson, C. R. Kahn, M. Lathrop, and J. Capeau. Identification of the gene alteredin Berardinelli-Seip congenital lipodystrophy on chromosome 11q13. Nat. Genet., 28:365–370, Aug 2001.

[148] D. Ito, T. Fujisawa, H. Iida, and N. Suzuki. Characterization of seipin/BSCL2, a protein associated with spasticparaplegia 17. Neurobiol. Dis., 31:266–277, Aug 2008.

[149] W. Fei, G. Shui, B. Gaeta, X. Du, L. Kuerschner, P. Li, A. J. Brown, M. R. Wenk, R. G. Parton, and H. Yang.Fld1p, a functional homologue of human seipin, regulates the size of lipid droplets in yeast. J. Cell Biol., 180:473–482, Feb 2008.

[150] C. Lundin, R. Nordstrom, K. Wagner, C. Windpassinger, H. Andersson, G. von Heijne, and I. Nilsson. Mem-brane topology of the human seipin protein. FEBS Lett., 580:2281–2284, Apr 2006.

[151] D. Ito and N. Suzuki. Molecular pathogenesis of seipin/BSCL2-related motor neuron diseases. Ann. Neurol.,61:237–250, Mar 2007.

[152] D. Ito and N. Suzuki. Seipinopathy: a novel endoplasmic reticulum stress-associated disease. Brain, 132:8–15,Jan 2009.

[153] M. L. Kennerson, G. A. Nicholson, S. G. Kaler, B. Kowalski, J. F. Mercer, J. Tang, R. M. Llanos, S. Chu, R. I.Takata, C. E. Speck-Martins, J. Baets, L. Almeida-Souza, D. Fischer, V. Timmerman, P. E. Taylor, S. S. Scherer,T. A. Ferguson, T. D. Bird, P. De Jonghe, S. M. Feely, M. E. Shy, and J. Y. Garbern. Missense mutations in thecopper transporter gene ATP7A cause X-linked distal hereditary motor neuropathy. Am. J. Hum. Genet., 86:343–352, Mar 2010.

[154] R. I. Takata, C. E. Speck Martins, M. R. Passosbueno, K. T. Abe, A. L. Nishimura, M. D. Da Silva, A. Monteiro,M. I. Lima, F. Kok, and M. Zatz. A new locus for recessive distal spinal muscular atrophy at Xq13.1-q21. J.Med. Genet., 41:224–229, Mar 2004.

[155] M. Kennerson, G. Nicholson, B. Kowalski, K. Krajewski, D. El-Khechen, S. Feely, S. Chu, M. Shy, and J. Garbern.X-linked distal hereditary motor neuropathy maps to the DSMAX locus on chromosome Xq13.1-q21. Neurology,72:246–252, Jan 2009.

[156] C. C. Weihl and G. Lopate. Motor neuron disease associated with copper deficiency. Muscle Nerve, 34:789–793,Dec 2006.

bibliography 270

[157] C. Vulpe, B. Levinson, S. Whitney, S. Packman, and J. Gitschier. Isolation of a candidate gene for Menkesdisease and evidence that it encodes a copper-transporting ATPase. Nat. Genet., 3:7–13, Jan 1993.

[158] J. F. Mercer, J. Livingston, B. Hall, J. A. Paynter, C. Begy, S. Chandrasekharappa, P. Lockhart, A. Grimes,M. Bhave, and D. Siemieniak. Isolation of a partial candidate gene for Menkes disease by positional cloning.Nat. Genet., 3:20–25, Jan 1993.

[159] J. Chelly, Z. Tumer, T. Tonnesen, A. Petterson, Y. Ishikawa-Brush, N. Tommerup, N. Horn, and A. P. Monaco.Isolation of a candidate gene for Menkes disease that encodes a potential heavy metal binding protein. Nat.Genet., 3:14–19, Jan 1993.

[160] S. G. Kaler, L. K. Gallo, V. K. Proud, A. K. Percy, Y. Mark, N. A. Segal, D. S. Goldstein, C. S. Holmes, and W. A.Gahl. Occipital horn syndrome and a mild Menkes phenotype associated with splice site mutations at theMNK locus. Nat. Genet., 8:195–202, Oct 1994.

[161] I. Maystadt, M. Zarhrate, D. Leclair-Richard, B. Estournet, A. Barois, F. Renault, M. C. Routon, M. C. Durand,S. Lefebvre, A. Munnich, C. Verellen-Dumoulin, and L. Viollet. A gene for an autosomal recessive lower motorneuron disease with childhood onset maps to 1p36. Neurology, 67:120–124, Jul 2006.

[162] I. Maystadt, R. Rezsohazy, M. Barkats, S. Duque, P. Vannuffel, S. Remacle, B. Lambert, M. Najimi, E. Sokal,A. Munnich, L. Viollet, and C. Verellen-Dumoulin. The nuclear factor kappaB-activator gene PLEKHG5 ismutated in a form of autosomal recessive lower motor neuron disease with childhood onset. Am. J. Hum.Genet., 81:67–76, Jul 2007.

[163] M. De Toledo, V. Coulon, S. Schmidt, P. Fort, and A. Blangy. The gene for a new brain specific RhoA exchangefactor maps to the highly unstable chromosomal region 1p36.2-1p36.3. Oncogene, 20:7307–7317, Nov 2001.

[164] A. Matsuda, Y. Suzuki, G. Honda, S. Muramatsu, O. Matsuzaki, Y. Nagano, T. Doi, K. Shimotohno, T. Harada,E. Nishida, H. Hayashi, and S. Sugano. Large-scale identification and characterization of human genes thatactivate NF-kappaB and MAPK signaling pathways. Oncogene, 22:3307–3318, May 2003.

[165] S. W. Barger, A. M. Moerman, and X. Mao. Molecular mechanisms of cytokine-induced neuroprotection:NFkappaB and neuroplasticity. Curr. Pharm. Des., 11:985–998, 2005.

[166] G. Landoure, A. A. Zdebik, T. L. Martinez, B. G. Burnett, H. C. Stanescu, H. Inada, Y. Shi, A. A. Taye, L. Kong,C. H. Munns, S. S. Choo, C. B. Phelps, R. Paudel, H. Houlden, C. L. Ludlow, M. J. Caterina, R. Gaudet,R. Kleta, K. H. Fischbeck, and C. J. Sumner. Mutations in TRPV4 cause Charcot-Marie-Tooth disease type 2C.Nat. Genet., 42:170–174, Feb 2010.

[167] H. X. Deng, C. J. Klein, J. Yan, Y. Shi, Y. Wu, F. Fecto, H. J. Yau, Y. Yang, H. Zhai, N. Siddique, E. T. Hedley-Whyte, R. Delong, M. Martina, P. J. Dyck, and T. Siddique. Scapuloperoneal spinal muscular atrophy andCMT2C are allelic disorders caused by alterations in TRPV4. Nat. Genet., 42:165–169, Feb 2010.

[168] B. Nilius and G. Owsianik. Channelopathies converge on TRPV4. Nat. Genet., 42:98–100, Feb 2010.

[169] P. Fleury and G. Hageman. A dominantly inherited lower motor neuron disorder presenting at birth withassociated arthrogryposis. J. Neurol. Neurosurg. Psychiatr., 48:1037–1048, Oct 1985.

[170] A. J. van der Vleuten, C. M. van Ravenswaaij-Arts, C. J. Frijns, A. P. Smits, G. Hageman, G. W. Padberg,and H. Kremer. Localisation of the gene for a dominant congenital spinal muscular atrophy predominantlyaffecting the lower limbs to chromosome 12q23-q24. Eur. J. Hum. Genet., 6:376–382, 1998.

[171] G. Owsianik, D. D’hoedt, T. Voets, and B. Nilius. Structure-function relationship of the TRP channel super-family. Rev. Physiol. Biochem. Pharmacol., 156:61–90, 2006.

[172] M. B. Harms, K. M. Ori-McKenney, M. Scoto, E. P. Tuck, S. Bell, D. Ma, S. Masi, P. Allred, M. Al-Lozi, M. M.Reilly, L. J. Miller, A. Jani-Acsadi, A. Pestronk, M. E. Shy, F. Muntoni, R. B. Vallee, and R. H. Baloh. Mutationsin the tail domain of DYNC1H1 cause dominant spinal muscular atrophy. Neurology, Mar 2012.

[173] M. N. Weedon, R. Hastings, R. Caswell, W. Xie, K. Paszkiewicz, T. Antoniadi, M. Williams, C. King, L. Green-halgh, R. Newbury-Ecob, and S. Ellard. Exome sequencing identifies a DYNC1H1 mutation in a large pedigreewith dominant axonal Charcot-Marie-Tooth disease. Am. J. Hum. Genet., 89(2):308–312, Aug 2011.

[174] M. Hafezparast, R. Klocke, C. Ruhrberg, A. Marquardt, A. Ahmad-Annuar, S. Bowen, G. Lalli, A. S. With-erden, H. Hummerich, S. Nicholson, P. J. Morgan, R. Oozageer, J. V. Priestley, S. Averill, V. R. King, S. Ball,J. Peters, T. Toda, A. Yamamoto, Y. Hiraoka, M. Augustin, D. Korthaus, S. Wattler, P. Wabnitz, C. Dickneite,S. Lampel, F. Boehme, G. Peraus, A. Popp, M. Rudelius, J. Schlegel, H. Fuchs, M. Hrabe de Angelis, G. Schiavo,D. T. Shima, A. P. Russ, G. Stumm, J. E. Martin, and E. M. Fisher. Mutations in dynein link motor neurondegeneration to defects in retrograde transport. Science, 300(5620):808–812, May 2003.

bibliography 271

[175] X. J. Chen, E. N. Levedakou, K. J. Millen, R. L. Wollmann, B. Soliven, and B. Popko. Proprioceptive sensoryneuropathy in mice with a mutation in the cytoplasmic Dynein heavy chain 1 gene. J. Neurosci., 27(52):14515–14524, Dec 2007.

[176] L. Viollet, A. Barois, J. G. Rebeiz, Z. Rifai, P. Burlet, M. Zarhrate, E. Vial, M. Dessainte, B. Estournet,B. Kleinknecht, J. Pearn, R. D. Adams, J. A. Urtizberea, D. P. Cros, K. Bushby, A. Munnich, and S. Lefeb-vre. Mapping of autosomal recessive chronic distal spinal muscular atrophy to chromosome 11q13. Ann.Neurol., 51:585–592, May 2002.

[177] L. Viollet, M. Zarhrate, I. Maystadt, B. Estournet-Mathiaut, A. Barois, I. Desguerre, M. Mayer, B. Chabrol,B. LeHeup, V. Cusin, T. Billette De Villemeur, D. Bonneau, P. Saugier-Veber, A. Touzery-De Villepin, A. De-laubier, J. Kaplan, M. Jeanpierre, J. Feingold, and A. Munnich. Refined genetic mapping of autosomal recessivechronic distal spinal muscular atrophy to chromosome 11q13.3 and evidence of linkage disequilibrium in Eu-ropean families. Eur. J. Hum. Genet., 12:483–488, Jun 2004.

[178] M. McEntagart, N. Norton, H. Williams, M. D. Teare, M. Dunstan, P. Baker, H. Houlden, M. Reilly, N. Wood,P. S. Harper, P. A. Futreal, N. Williams, and N. Rahman. Localization of the gene for distal hereditary motorneuronopathy VII (dHMN-VII) to chromosome 2q14. Am. J. Hum. Genet., 68:1270–1276, May 2001.

[179] S. Reddel, R. A. Ouvrier, G. Nicholson, I. Dierick, J. Irobi, V. Timmerman, and M. M. Ryan. Autosomaldominant congenital spinal muscular atrophy–a possible developmental deficiency of motor neurones? Neu-romuscul. Disord., 18:530–535, Jul 2008.

[180] C. J. Frijns, J. Van Deutekom, R. R. Frants, and F. G. Jennekens. Dominant congenital benign spinal muscularatrophy. Muscle Nerve, 17:192–197, Feb 1994.

[181] M. Muglia, A. Magariello, L. Citrigno, L. Passamonti, T. Sprovieri, F. L. Conforti, R. Mazzei, A. Patitucci, A. L.Gabriele, C. Ungaro, M. Bellesi, and A. Quattrone. A novel locus for dHMN with pyramidal features maps tochromosome 4q34.3-q35.2. Clin. Genet., 73:486–491, May 2008.

[182] K. Christodoulou, E. Zamba, M. Tsingis, A. Mubaidin, K. Horani, S. Abu-Sheik, M. El-Khateeb, K. Kyriacou,T. Kyriakides, A. K. Al-Qudah, and L. Middleton. A novel form of distal hereditary motor neuronopathymaps to chromosome 9p21.1-p12. Ann. Neurol., 48:877–884, Dec 2000.

[183] L. T. Middleton, K. Christodoulou, A. Mubaidin, E. Zamba, M. Tsingis, K. Kyriacou, S. Abu-Sheikh, T. Kyri-akides, V. Neocleous, D. M. Georgiou, M. el Khateeb, A. al Qudah, and K. Horany. Distal hereditary motorneuronopathy of the Jerash type. Ann. N. Y. Acad. Sci., 883:65–68, Sep 1999.

[184] J. Irobi, I. Dierick, A. Jordanova, K. G. Claeys, P. De Jonghe, and V. Timmerman. Unraveling the genetics ofdistal hereditary motor neuronopathies. Neuromolecular Med., 8:131–146, 2006.

[185] G. Nicholson, M. Kennerson, M. Brewer, J. Garbern, and M. Shy. Genotypes & sensory phenotypes in 2 newX-linked neuropathies (CMTX3 and dSMAX) and dominant CMT/HMN overlap syndromes. Adv. Exp. Med.Biol., 652:201–206, 2009.

[186] I. Cusco, M. J. Barcelo, R. Rojas-Garcia, I. Illa, J. Gamez, C. Cervera, A. Pou, G. Izquierdo, M. Baiget, andE. F. Tizzano. SMN2 copy number predicts acute or chronic spinal muscular atrophy but does not account forintrafamilial variability in siblings. J. Neurol., 253(1):21–25, Jan 2006.

[187] D. D. Coovert, T. T. Le, P. E. McAndrew, J. Strasswimmer, T. O. Crawford, J. R. Mendell, S. E. Coulson, E. J.Androphy, T. W. Prior, and A. H. Burghes. The survival motor neuron protein in spinal muscular atrophy.Hum. Mol. Genet., 6:1205–1214, Aug 1997.

[188] G. E. Oprea, S. Krober, M. L. McWhorter, W. Rossoll, S. Muller, M. Krawczak, G. J. Bassell, C. E. Beattie, andB. Wirth. Plastin 3 is a protective modifier of autosomal recessive spinal muscular atrophy. Science, 320(5875):524–527, Apr 2008.

[189] N. Roy, M. S. Mahadevan, M. McLean, G. Shutler, Z. Yaraghi, R. Farahani, S. Baird, A. Besner-Johnston,C. Lefebvre, and X. Kang. The gene for neuronal apoptosis inhibitory protein is partially deleted in individualswith spinal muscular atrophy. Cell, 80:167–178, Jan 1995.

[190] Y. H. Liang, X. L. Chen, Z. S. Yu, C. Y. Chen, S. Bi, L. G. Mao, B. L. Zhou, and X. N. Zhang. Deletion analysisof SMN1 and NAIP genes in Southern Chinese children with spinal muscular atrophy. J Zhejiang Univ Sci B,10:29–34, Jan 2009.

[191] R. Gotz, C. Karch, M. R. Digby, J. Troppmair, U. R. Rapp, and M. Sendtner. The neuronal apoptosis inhibitoryprotein suppresses neuronal differentiation and apoptosis in PC12 cells. Hum. Mol. Genet., 9:2479–2489, Oct2000.

bibliography 272

[192] E. Perlson, S. Maday, M. M. Fu, A. J. Moughamian, and E. L. Holzbaur. Retrograde axonal transport: pathwaysto cell death? Trends Neurosci., 33:335–344, Jul 2010.

[193] S. J. Kolb, S. Sutton, and D. R. Schoenberg. RNA processing defects associated with diseases of the motorneuron. Muscle Nerve, 41:5–17, Jan 2010.

[194] J. Sreedharan, I. P. Blair, V. B. Tripathi, X. Hu, C. Vance, B. Rogelj, S. Ackerley, J. C. Durnall, K. L. Williams,E. Buratti, F. Baralle, J. de Belleroche, J. D. Mitchell, P. N. Leigh, A. Al-Chalabi, C. C. Miller, G. Nicholson, andC. E. Shaw. TDP-43 mutations in familial and sporadic amyotrophic lateral sclerosis. Science, 319:1668–1672,Mar 2008.

[195] C. Vance, B. Rogelj, T. Hortobagyi, K. J. De Vos, A. L. Nishimura, J. Sreedharan, X. Hu, B. Smith, D. Ruddy,P. Wright, J. Ganesalingam, K. L. Williams, V. Tripathi, S. Al-Saraj, A. Al-Chalabi, P. N. Leigh, I. P. Blair,G. Nicholson, J. de Belleroche, J. M. Gallo, C. C. Miller, and C. E. Shaw. Mutations in FUS, an RNA processingprotein, cause familial amyotrophic lateral sclerosis type 6. Science, 323:1208–1211, Feb 2009.

[196] J. Sambrook, E. F. Fritsch, and T Maniatis. Molecular Cloning, A laboratory manual,. Laboratory Press, ColdSpring Harbor, 2nd edition, 1989.

[197] X. Wu and D. J. Munroe. EasyExonPrimer: automated primer design for exon sequences. Appl. Bioinformatics,5:119–120, 2006.

[198] W. J. Kent. BLAT–the BLAST-like alignment tool. Genome Res., 12:656–664, Apr 2002.

[199] T. Strachan and A. P. Read. Human Molecular Genetics. Wiley-Liss, New York, 2nd edition, 1999.

[200] K. W. Broman, J. C. Murray, V. C. Sheffield, R. L. White, and J. L. Weber. Comprehensive human genetic maps:individual and sex-specific variation in recombination. Am. J. Hum. Genet., 63:861–869, Sep 1998.

[201] A. Kong, D. F. Gudbjartsson, J. Sainz, G. M. Jonsdottir, S. A. Gudjonsson, B. Richardsson, S. Sigurdardottir,J. Barnard, B. Hallbeck, G. Masson, A. Shlien, S. T. Palsson, M. L. Frigge, T. E. Thorgeirsson, J. R. Gulcher, andK. Stefansson. A high-resolution recombination map of the human genome. Nat. Genet., 31:241–247, Jul 2002.

[202] G. A. McVean, S. R. Myers, S. Hunt, P. Deloukas, D. R. Bentley, and P. Donnelly. The fine-scale structure ofrecombination rate variation in the human genome. Science, 304:581–584, Apr 2004.

[203] S. Myers, L. Bottolo, C. Freeman, G. McVean, and P. Donnelly. A fine-scale map of recombination rates andhotspots across the human genome. Science, 310:321–324, Oct 2005.

[204] F. Baudat, J. Buard, C. Grey, A. Fledel-Alon, C. Ober, M. Przeworski, G. Coop, and B. de Massy. PRDM9 is amajor determinant of meiotic recombination hotspots in humans and mice. Science, 327:836–840, Feb 2010.

[205] M. A. Pericak-Vance. Analysis of Genetic Linkage Data for Mendelian Traits. Current Protocols in HumanGenetics, Supplement 9:1.4.1–1.4.31, 2001.

[206] N. E. Morton. Sequential tests for the detection of linkage. Am. J. Hum. Genet., 7:277–318, Sep 1955.

[207] Z. Brkanac, M. Fernandez, M. Matsushita, H. Lipe, J. Wolff, T. D. Bird, and W. H. Raskind. Autosomaldominant sensory/motor neuropathy with Ataxia (SMNA): Linkage to chromosome 7q22-q32. Am. J. Med.Genet., 114:450–457, May 2002.

[208] Z. Brkanac, D. Spencer, J. Shendure, P. D. Robertson, M. Matsushita, T. Vu, T. D. Bird, M. V. Olson, and W. H.Raskind. IFRD1 is a candidate gene for SMNA on chromosome 7q22-q23. Am. J. Hum. Genet., 84:692–697, May2009.

[209] S. Gopinath. Finding new genes causing motor neuron diseases. Master’s thesis, Faculty of Medicine, Univer-sity of Sydney, 2006.

[210] W. J. Kent, C. W. Sugnet, T. S. Furey, K. M. Roskin, T. H. Pringle, A. M. Zahler, and D. Haussler. The humangenome browser at UCSC. Genome Res., 12:996–1006, Jun 2002.

[211] G. Benson. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res., 27:573–580, Jan1999.

[212] T. H. Lindner and K. Hoffmann. easyLINKAGE: a PERL script for easy and automated two-/multi-pointlinkage analyses. Bioinformatics, 21:405–407, Feb 2005.

[213] J. Ott. Computer-simulation methods in human linkage analysis. Proc. Natl. Acad. Sci. U.S.A., 86:4175–4178,Jun 1989.

bibliography 273

[214] D.E. Weeks, J. Ott, and G.M. Lathrop. SLINK: a general simulation program for linkage analysis. Am. J. Hum.Genet., 47(Suppl):A204, September 1990.

[215] A. E. Harding and P. K. Thomas. Peroneal muscular atrophy with pyramidal features. J. Neurol. Neurosurg.Psychiatr., 47:168–172, Feb 1984.

[216] W. J. Ewens, R. S. Spielman, N. L. Kaplan, X. Gao, R. W. Morris, and E. R. Martin. Disease associations andfamily-based tests. Current Protocols in Human Genetics, 58:1.12.1–1.12.24, 2008.

[217] G. A. Nicholson, J. L. Dawkins, I. P. Blair, M. Auer-Grumbach, S. B. Brahmbhatt, and D. J. Hulme. Hereditarysensory neuropathy type I: haplotype analysis shows founders in southern England and Europe. Am. J. Hum.Genet., 69:655–659, Sep 2001.

[218] R. Claramunt, T. Sevilla, V. Lupo, A. Cuesta, J. M. Millan, J. J. Vilchez, F. Palau, and C. Espinos. The p.R1109Xmutation in SH3TC2 gene is predominant in Spanish Gypsies with Charcot-Marie-Tooth disease type 4. Clin.Genet., 71:343–349, Apr 2007.

[219] A. Gabreels-Festen, S. van Beersum, L. Eshuis, E. LeGuern, F. Gabreels, B. van Engelen, and E. Mariman.Study on the gene and phenotypic characterisation of autosomal recessive demyelinating motor and sensoryneuropathy (Charcot-Marie-Tooth disease) with a gene locus on chromosome 5q23-q33. J. Neurol. Neurosurg.Psychiatr., 66:569–574, May 1999.

[220] J. Senderek, C. Bergmann, C. Stendel, J. Kirfel, N. Verpoorten, P. De Jonghe, V. Timmerman, R. Chrast, M. H.Verheijen, G. Lemke, E. Battaloglu, Y. Parman, S. Erdem, E. Tan, H. Topaloglu, A. Hahn, W. Muller-Felber,N. Rizzuto, G. M. Fabrizi, M. Stuhrmann, S. Rudnik-Schoneborn, S. Zuchner, J. Michael Schroder, E. Buch-heim, V. Straub, J. Klepper, K. Huehne, B. Rautenstrauss, R. Buttner, E. Nelis, and K. Zerres. Mutations ina gene encoding a novel SH3/TPR domain protein cause autosomal recessive Charcot-Marie-Tooth type 4Cneuropathy. Am. J. Hum. Genet., 73:1106–1119, Nov 2003.

[221] M. Brewer, F. Changi, A. Antonellis, K. Fischbeck, P. Polly, G. Nicholson, and M. Kennerson. Evidence of afounder haplotype refines the X-linked Charcot-Marie-Tooth (CMTX3) locus to a 2.5 Mb region. Neurogenetics,9:191–195, Jul 2008.

[222] J. L. Weber and C. Wong. Mutation of human short tandem repeats. Hum. Mol. Genet., 2:1123–1128, Aug 1993.

[223] G. F. Richard, A. Kerrest, and B. Dujon. Comparative genomics and molecular dynamics of DNA repeats ineukaryotes. Microbiol. Mol. Biol. Rev., 72:686–727, Dec 2008.

[224] B. Brinkmann, M. Klintschar, F. Neuhuber, J. Huhne, and B. Rolf. Mutation rate in human microsatellites:influence of the structure and length of the tandem repeat. Am. J. Hum. Genet., 62:1408–1415, Jun 1998.

[225] F. S. Collins. Positional cloning moves from perditional to traditional. Nat. Genet., 9:347–350, Apr 1995.

[226] J. S. Collins and C. E. Schwartz. Detecting polymorphisms and mutations in candidate genes. Am. J. Hum.Genet., 71:1251–1252, Nov 2002.

[227] International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the humangenome. Nature, 431:931–945, Oct 2004.

[228] J. C. Venter, M. D. Adams, E. W. Myers, P. W. Li, R. J. Mural, G. G. Sutton, H. O. Smith, M. Yandell, C. A. Evans,R. A. Holt, J. D. Gocayne, P. Amanatides, R. M. Ballew, D. H. Huson, J. R. Wortman, Q. Zhang, C. D. Kodira,X. H. Zheng, L. Chen, M. Skupski, G. Subramanian, P. D. Thomas, J. Zhang, G. L. Gabor Miklos, C. Nelson,S. Broder, A. G. Clark, J. Nadeau, V. A. McKusick, N. Zinder, A. J. Levine, R. J. Roberts, M. Simon, C. Slayman,M. Hunkapiller, R. Bolanos, A. Delcher, I. Dew, D. Fasulo, M. Flanigan, L. Florea, A. Halpern, S. Hannenhalli,S. Kravitz, S. Levy, C. Mobarry, K. Reinert, K. Remington, J. Abu-Threideh, E. Beasley, K. Biddick, V. Bonazzi,R. Brandon, M. Cargill, I. Chandramouliswaran, R. Charlab, K. Chaturvedi, Z. Deng, V. Di Francesco, P. Dunn,K. Eilbeck, C. Evangelista, A. E. Gabrielian, W. Gan, W. Ge, F. Gong, Z. Gu, P. Guan, T. J. Heiman, M. E. Higgins,R. R. Ji, Z. Ke, K. A. Ketchum, Z. Lai, Y. Lei, Z. Li, J. Li, Y. Liang, X. Lin, F. Lu, G. V. Merkulov, N. Milshina, H. M.Moore, A. K. Naik, V. A. Narayan, B. Neelam, D. Nusskern, D. B. Rusch, S. Salzberg, W. Shao, B. Shue, J. Sun,Z. Wang, A. Wang, X. Wang, J. Wang, M. Wei, R. Wides, C. Xiao, C. Yan, A. Yao, J. Ye, M. Zhan, W. Zhang,H. Zhang, Q. Zhao, L. Zheng, F. Zhong, W. Zhong, S. Zhu, S. Zhao, D. Gilbert, S. Baumhueter, G. Spier,C. Carter, A. Cravchik, T. Woodage, F. Ali, H. An, A. Awe, D. Baldwin, H. Baden, M. Barnstead, I. Barrow,K. Beeson, D. Busam, A. Carver, A. Center, M. L. Cheng, L. Curry, S. Danaher, L. Davenport, R. Desilets,S. Dietz, K. Dodson, L. Doup, S. Ferriera, N. Garg, A. Gluecksmann, B. Hart, J. Haynes, C. Haynes, C. Heiner,S. Hladun, D. Hostin, J. Houck, T. Howland, C. Ibegwam, J. Johnson, F. Kalush, L. Kline, S. Koduru, A. Love,F. Mann, D. May, S. McCawley, T. McIntosh, I. McMullen, M. Moy, L. Moy, B. Murphy, K. Nelson, C. Pfannkoch,E. Pratts, V. Puri, H. Qureshi, M. Reardon, R. Rodriguez, Y. H. Rogers, D. Romblad, B. Ruhfel, R. Scott, C. Sitter,M. Smallwood, E. Stewart, R. Strong, E. Suh, R. Thomas, N. N. Tint, S. Tse, C. Vech, G. Wang, J. Wetter,S. Williams, M. Williams, S. Windsor, E. Winn-Deen, K. Wolfe, J. Zaveri, K. Zaveri, J. F. Abril, R. Guigo, M. J.

bibliography 274

Campbell, K. V. Sjolander, B. Karlak, A. Kejariwal, H. Mi, B. Lazareva, T. Hatton, A. Narechania, K. Diemer,A. Muruganujan, N. Guo, S. Sato, V. Bafna, S. Istrail, R. Lippert, R. Schwartz, B. Walenz, S. Yooseph, D. Allen,A. Basu, J. Baxendale, L. Blick, M. Caminha, J. Carnes-Stine, P. Caulk, Y. H. Chiang, M. Coyne, C. Dahlke,A. Mays, M. Dombroski, M. Donnelly, D. Ely, S. Esparham, C. Fosler, H. Gire, S. Glanowski, K. Glasser,A. Glodek, M. Gorokhov, K. Graham, B. Gropman, M. Harris, J. Heil, S. Henderson, J. Hoover, D. Jennings,C. Jordan, J. Jordan, J. Kasha, L. Kagan, C. Kraft, A. Levitsky, M. Lewis, X. Liu, J. Lopez, D. Ma, W. Majoros,J. McDaniel, S. Murphy, M. Newman, T. Nguyen, N. Nguyen, M. Nodell, S. Pan, J. Peck, M. Peterson, W. Rowe,R. Sanders, J. Scott, M. Simpson, T. Smith, A. Sprague, T. Stockwell, R. Turner, E. Venter, M. Wang, M. Wen,D. Wu, M. Wu, A. Xia, A. Zandieh, and X. Zhu. The sequence of the human genome. Science, 291:1304–1351,Feb 2001.

[229] A. D. Baxevanis. Using genomic databases for sequence-based biological discovery. Mol. Med., 9:185–192,2003.

[230] J. E. Parrish and D. L. Nelson. Methods for finding genes. A major rate-limiting step in positional cloning.Genet. Anal. Tech. Appl., 10:29–41, Apr 1993.

[231] N. Papadopoulos. Introduction to positional cloning. Clin. Exp. Allergy, 25 Suppl 2:116–118, Nov 1995.

[232] M. Olson, L. Hood, C. Cantor, and D. Botstein. A common language for physical mapping of the humangenome. Science, 245:1434–1435, Sep 1989.

[233] D. L. Nelson. Positional cloning reaches maturity. Curr. Opin. Genet. Dev., 5:298–303, Jun 1995.

[234] T. A. Pearson and T. A. Manolio. How to interpret a genome-wide association study. JAMA, 299:1335–1344,Mar 2008.

[235] H. K. Tabor, N. J. Risch, and R. M. Myers. Candidate-gene approaches for studying complex genetic traits:practical considerations. Nat. Rev. Genet., 3:391–397, May 2002.

[236] K. Garber. Genetics. The elusive ALS genes. Science, 319:20, Jan 2008.

[237] T.A. Brown, editor. Genomes. Wiley-Liss, Oxford, 2nd edition edition, 2002.

[238] S. A. Cook, K. R. Johnson, R. T. Bronson, and M. T. Davisson. Neuromuscular degeneration (nmd): a mutationon mouse chromosome 19 that causes motor neuron degeneration. Mamm. Genome, 6:187–191, Mar 1995.

[239] M. Zhu and S. Zhao. Candidate gene identification approach: progress and challenges. Int. J. Biol. Sci., 3:420–427, 2007.

[240] M. L. Metzker. Emerging technologies in DNA sequencing. Genome Res., 15:1767–1776, Dec 2005.

[241] M. Grompe. The rapid detection of unknown mutations in nucleic acids. Nat. Genet., 5:111–117, Oct 1993.

[242] E. Y. Chan. Advances in sequencing technology. Mutat. Res., 573:13–40, Jun 2005.

[243] S. Kohler, S. Bauer, D. Horn, and P. N. Robinson. Walking the interactome for prioritization of candidatedisease genes. Am. J. Hum. Genet., 82:949–958, Apr 2008.

[244] L. C. Tranchevent, R. Barriot, S. Yu, S. Van Vooren, P. Van Loo, B. Coessens, B. De Moor, S. Aerts, and Y. Moreau.ENDEAVOUR update: a web resource for gene prioritization in multiple species. Nucleic Acids Res., 36:W377–384, Jul 2008.

[245] S. Aerts, D. Lambrechts, S. Maity, P. Van Loo, B. Coessens, F. De Smet, L. C. Tranchevent, B. De Moor, P. Mary-nen, B. Hassan, P. Carmeliet, and Y. Moreau. Gene prioritization through genomic data fusion. Nat. Biotechnol.,24:537–544, May 2006.

[246] C. Houdayer, C. Dehainault, C. Mattler, D. Michaux, V. Caux-Moncoutier, S. Pages-Berhouet, C. D. d’Enghien,A. Lauge, L. Castera, M. Gauthier-Villars, and D. Stoppa-Lyonnet. Evaluation of in silico splice tools fordecision-making in molecular diagnosis. Hum. Mutat., 29:975–982, Jul 2008.

[247] J. Bell, D. Bodmer, E. Sistermans, and S. Ramsden. Practice guidelines for the interpretation and reporting ofunclassified variants (uvs) in clinical molecular genetics. Technical report, Clinical Molecular Genetics Societyand Dutch Society of Clinical Genetic Laboratory Specialists, http://www.cmgs.org/BPGs/, Oct 2007.

[248] S. M. Hebsgaard, P. G. Korning, N. Tolstrup, J. Engelbrecht, P. Rouze, and S. Brunak. Splice site prediction inArabidopsis thaliana pre-mRNA by combining local and global sequence information. Nucleic Acids Res., 24:3439–3452, Sep 1996.

bibliography 275

[249] M. G. Reese, F. H. Eeckman, D. Kulp, and D. Haussler. Improved splice site detection in Genie. J. Comput.Biol., 4:311–323, 1997.

[250] F. O. Desmet, D. Hamroun, M. Lalande, G. Collod-Beroud, M. Claustres, and C. Beroud. Human SplicingFinder: an online bioinformatics tool to predict splicing signals. Nucleic Acids Res., 37:e67, May 2009.

[251] M. Pertea, X. Lin, and S. L. Salzberg. GeneSplicer: a new computational method for splice site prediction.Nucleic Acids Res., 29:1185–1190, Mar 2001.

[252] M. Wang and A. Marin. Characterization and prediction of alternative splice sites. Gene, 366:219–227, Feb2006.

[253] D.H. Mathews, J. Sabina, M. Zuker, and D.H. Turner. Expanded Sequence Dependence of ThermodynamicParameters Provides Robust Prediction of Rna Secondary Structure. J. Mol. Biol., 288:910–940, 1999.

[254] M. Zuker, D.H. Mathews, and D.H. Turner. Algorithms and Thermodynamics for RNA Secondary StructurePrediction: A Practical Guide. In J. Barciszewski and B.F.C. Clark, editors, RNA Biochemistry and Biotechnology,.NATO ASI Series, Kluwer Academic Publishers„ 1999.

[255] L. W. Hillier, R. S. Fulton, L. A. Fulton, T. A. Graves, K. H. Pepin, C. Wagner-McPherson, D. Layman, J. Maas,S. Jaeger, R. Walker, K. Wylie, M. Sekhon, M. C. Becker, M. D. O’Laughlin, M. E. Schaller, G. A. Fewell,K. D. Delehaunty, T. L. Miner, W. E. Nash, M. Cordes, H. Du, H. Sun, J. Edwards, H. Bradshaw-Cordum,J. Ali, S. Andrews, A. Isak, A. Vanbrunt, C. Nguyen, F. Du, B. Lamar, L. Courtney, J. Kalicki, P. Ozersky,L. Bielicki, K. Scott, A. Holmes, R. Harkins, A. Harris, C. M. Strong, S. Hou, C. Tomlinson, S. Dauphin-Kohlberg, A. Kozlowicz-Reilly, S. Leonard, T. Rohlfing, S. M. Rock, A. M. Tin-Wollam, A. Abbott, P. Minx,R. Maupin, C. Strowmatt, P. Latreille, N. Miller, D. Johnson, J. Murray, J. P. Woessner, M. C. Wendl, S. P. Yang,B. R. Schultz, J. W. Wallis, J. Spieth, T. A. Bieri, J. O. Nelson, N. Berkowicz, P. E. Wohldmann, L. L. Cook, M. T.Hickenbotham, J. Eldred, D. Williams, J. A. Bedell, E. R. Mardis, S. W. Clifton, S. L. Chissoe, M. A. Marra,C. Raymond, E. Haugen, W. Gillett, Y. Zhou, R. James, K. Phelps, S. Iadanoto, K. Bubb, E. Simms, R. Levy,J. Clendenning, R. Kaul, W. J. Kent, T. S. Furey, R. A. Baertsch, M. R. Brent, E. Keibler, P. Flicek, P. Bork,M. Suyama, J. A. Bailey, M. E. Portnoy, D. Torrents, A. T. Chinwalla, W. R. Gish, S. R. Eddy, J. D. McPherson,M. V. Olson, E. E. Eichler, E. D. Green, R. H. Waterston, and R. K. Wilson. The DNA sequence of humanchromosome 7. Nature, 424:157–164, Jul 2003.

[256] D. Karolchik, A. S. Hinrichs, T. S. Furey, K. M. Roskin, C. W. Sugnet, D. Haussler, and W. J. Kent. The UCSCTable Browser data retrieval tool. Nucleic Acids Res., 32:D493–496, Jan 2004.

[257] K. D. Pruitt, T. Tatusova, W. Klimke, and D. R. Maglott. NCBI Reference Sequences: current status, policy andnew initiatives. Nucleic Acids Res., 37:D32–36, Jan 2009.

[258] K. D. Pruitt, T. Tatusova, and D. R. Maglott. NCBI reference sequences (RefSeq): a curated non-redundantsequence database of genomes, transcripts and proteins. Nucleic Acids Res., 35:D61–65, Jan 2007.

[259] D. A. Benson, I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, and D. L. Wheeler. GenBank: update. Nucleic AcidsRes., 32:D23–26, Jan 2004.

[260] K. D. Pruitt, J. Harrow, R. A. Harte, C. Wallin, M. Diekhans, D. R. Maglott, S. Searle, C. M. Farrell, J. E.Loveland, B. J. Ruef, E. Hart, M. M. Suner, M. J. Landrum, B. Aken, S. Ayling, R. Baertsch, J. Fernandez-Banet,J. L. Cherry, V. Curwen, M. Dicuccio, M. Kellis, J. Lee, M. F. Lin, M. Schuster, A. Shkeda, C. Amid, G. Brown,O. Dukhanina, A. Frankish, J. Hart, B. L. Maidak, J. Mudge, M. R. Murphy, T. Murphy, J. Rajan, B. Rajput, L. D.Riddick, C. Snow, C. Steward, D. Webb, J. A. Weber, L. Wilming, W. Wu, E. Birney, D. Haussler, T. Hubbard,J. Ostell, R. Durbin, and D. Lipman. The consensus coding sequence (CCDS) project: Identifying a commonprotein-coding gene set for the human and mouse genomes. Genome Res., 19:1316–1323, Jul 2009.

[261] R. Leinonen, F. Nardone, W. Zhu, and R. Apweiler. UniSave: the UniProtKB sequence/annotation versiondatabase. Bioinformatics, 22:1284–1285, May 2006.

[262] S. Rouquier, S. Taviaux, B. J. Trask, V. Brand-Arpon, G. van den Engh, J. Demaille, and D. Giorgi. Distributionof olfactory receptor genes in the human genome. Nat. Genet., 18:243–250, Mar 1998.

[263] M. A. Robinson, M. P. Mitchell, S. Wei, C. E. Day, T. M. Zhao, and P. Concannon. Organization of human T-cellreceptor beta-chain genes: clusters of V beta genes are present on chromosomes 7 and 9. Proc. Natl. Acad. Sci.U.S.A., 90:2433–2437, Mar 1993.

[264] M. A. van Es, P. W. van Vught, H. M. Blauw, L. Franke, C. G. Saris, L. Van den Bosch, S. W. de Jong, V. de Jong,F. Baas, R. van’t Slot, R. Lemmens, H. J. Schelhaas, A. Birve, K. Sleegers, C. Van Broeckhoven, J. C. Schymick,B. J. Traynor, J. H. Wokke, C. Wijmenga, W. Robberecht, P. M. Andersen, J. H. Veldink, R. A. Ophoff, and L. H.van den Berg. Genetic variation in DPP6 is associated with susceptibility to amyotrophic lateral sclerosis. Nat.Genet., 40:29–31, Jan 2008.

bibliography 276

[265] S. Cronin, S. Berger, J. Ding, J. C. Schymick, N. Washecka, D. G. Hernandez, M. J. Greenway, D. G. Bradley,B. J. Traynor, and O. Hardiman. A genome-wide association study of sporadic ALS in a homogenous Irishpopulation. Hum. Mol. Genet., 17:768–774, Mar 2008.

[266] R. Del Bo, S. Ghezzi, S. Corti, D. Santoro, A. Prelle, M. Mancuso, G. Siciliano, C. Briani, L. Murri, N. Bresolin,and G. P. Comi. DPP6 gene variability confers increased risk of developing sporadic amyotrophic lateralsclerosis in Italian patients. J. Neurol. Neurosurg. Psychiatr., 79:1085, Sep 2008.

[267] T. Dunckley, M. J. Huentelman, D. W. Craig, J. V. Pearson, S. Szelinger, K. Joshipura, R. F. Halperin, C. Stamper,K. R. Jensen, D. Letizia, S. E. Hesterlee, A. Pestronk, T. Levine, T. Bertorini, M. C. Graves, T. Mozaffar, C. E.Jackson, P. Bosch, A. McVey, A. Dick, R. Barohn, C. Lomen-Hoerth, J. Rosenfeld, D. T. O’connor, K. Zhang,R. Crook, H. Ryberg, M. Hutton, J. Katz, E. P. Simpson, H. Mitsumoto, R. Bowser, R. G. Miller, S. H. Appel,and D. A. Stephan. Whole-genome analysis of sporadic amyotrophic lateral sclerosis. N. Engl. J. Med., 357:775–788, Aug 2007.

[268] E. Lingueglia, J. R. de Weille, F. Bassilana, C. Heurteaux, H. Sakai, R. Waldmann, and M. Lazdunski. Amodulatory subunit of acid sensing ion channels in brain and dorsal root ganglion cells. J. Biol. Chem., 272:29778–29783, Nov 1997.

[269] D. Alvarez de la Rosa, P. Zhang, D. Shao, F. White, and C. M. Canessa. Functional implications of thelocalization and activity of acid-sensitive channels in rat peripheral nervous system. Proc. Natl. Acad. Sci.U.S.A., 99:2326–2331, Feb 2002.

[270] J. F. Staropoli, C. McDermott, C. Martinat, B. Schulman, E. Demireva, and A. Abeliovich. Parkin is a componentof an SCF-like ubiquitin ligase complex and protects postmitotic neurons from kainate excitotoxicity. Neuron,37:735–749, Mar 2003.

[271] T. Kitada, S. Asakawa, N. Hattori, H. Matsumine, Y. Yamamura, S. Minoshima, M. Yokochi, Y. Mizuno, andN. Shimizu. Mutations in the parkin gene cause autosomal recessive juvenile parkinsonism. Nature, 392:605–608, Apr 1998.

[272] K. A. Strauss, E. G. Puffenberger, M. J. Huentelman, S. Gottlieb, S. E. Dobrin, J. M. Parod, D. A. Stephan, andD. H. Morton. Recessive symptomatic focal epilepsy and mutant contactin-associated protein-like 2. N. Engl.J. Med., 354:1370–1377, Mar 2006.

[273] J. I. Friedman, T. Vrijenhoek, S. Markx, I. M. Janssen, W. A. van der Vliet, B. H. Faas, N. V. Knoers, W. Cahn,R. S. Kahn, L. Edelmann, K. L. Davis, J. M. Silverman, H. G. Brunner, A. G. van Kessel, C. Wijmenga, R. A.Ophoff, and J. A. Veltman. CNTNAP2 gene dosage variation is associated with schizophrenia and epilepsy.Mol. Psychiatry, 13:261–266, Mar 2008.

[274] A. J. Verkerk, C. A. Mathews, M. Joosse, B. H. Eussen, P. Heutink, and B. A. Oostra. CNTNAP2 is disrupted ina family with Gilles de la Tourette syndrome and obsessive compulsive disorder. Genomics, 82:1–9, Jul 2003.

[275] D. E. Arking, D. J. Cutler, C. W. Brune, T. M. Teslovich, K. West, M. Ikeda, A. Rea, M. Guy, S. Lin, E. H. Cook,and A. Chakravarti. A common genetic variant in the neurexin superfamily member CNTNAP2 increasesfamilial risk of autism. Am. J. Hum. Genet., 82:160–164, Jan 2008.

[276] S. J. Huffaker, J. Chen, K. K. Nicodemus, F. Sambataro, F. Yang, V. Mattay, B. K. Lipska, T. M. Hyde, J. Song,D. Rujescu, I. Giegling, K. Mayilyan, M. J. Proust, A. Soghoyan, G. Caforio, J. H. Callicott, A. Bertolino,A. Meyer-Lindenberg, J. Chang, Y. Ji, M. F. Egan, T. E. Goldberg, J. E. Kleinman, B. Lu, and D. R. Weinberger.A primate-specific, brain isoform of KCNH2 affects cortical physiology, cognition, neuronal repolarization andrisk of schizophrenia. Nat. Med., 15:509–518, May 2009.

[277] M. Dahiyat, A. Cumming, C. Harrington, C. Wischik, J. Xuereb, F. Corrigan, G. Breen, D. Shaw, and D. St Clair.Association between Alzheimer’s disease and the NOS3 gene. Ann. Neurol., 46:664–667, Oct 1999.

[278] A. Akomolafe, K. L. Lunetta, P. M. Erlich, L. A. Cupples, C. T. Baldwin, M. Huyck, R. C. Green, and L. A.Farrer. Genetic association between endothelial nitric oxide synthase and Alzheimer disease. Clin. Genet., 70:49–56, Jul 2006.

[279] L. Bergeron, G. I. Perez, G. Macdonald, L. Shi, Y. Sun, A. Jurisicova, S. Varmuza, K. E. Latham, J. A. Flaws,J. C. Salter, H. Hara, M. A. Moskowitz, E. Li, A. Greenberg, J. L. Tilly, and J. Yuan. Defects in regulation ofapoptosis in caspase-2-deficient mice. Genes Dev., 12:1304–1314, May 1998.

[280] M. D. Nguyen, R. C. Lariviere, and J. P. Julien. Deregulation of Cdk5 in a mouse model of ALS: toxicityalleviated by perikaryal neurofilament inclusions. Neuron, 30:135–147, Apr 2001.

[281] R. B. Hough, A. Lengeling, V. Bedian, C. Lo, and M. Bucan. Rump white inversion in the mouse disruptsdipeptidyl aminopeptidase-like protein 6 and causes dysregulation of Kit expression. Proc. Natl. Acad. Sci.U.S.A., 95:13800–13805, Nov 1998.

bibliography 277

[282] L. Wilson, Y. H. Ching, M. Farias, S. A. Hartford, G. Howell, H. Shao, M. Bucan, and J. C. Schimenti. Randommutagenesis of proximal mouse chromosome 5 uncovers predominantly embryonic lethal mutations. GenomeRes., 15:1095–1105, Aug 2005.

[283] J. T. Winston, P. Strack, P. Beer-Romero, C. Y. Chu, S. J. Elledge, and J. W. Harper. The SCFbeta-TRCP-ubiquitin ligase complex associates specifically with phosphorylated destruction motifs in IkappaBalpha andbeta-catenin and stimulates IkappaBalpha ubiquitination in vitro. Genes Dev., 13:270–283, Feb 1999.

[284] T. Maniatis. A ubiquitin ligase complex essential for the NF-kappaB, Wnt/Wingless, and Hedgehog signalingpathways. Genes Dev., 13:505–510, Mar 1999.

[285] S. H. Loh, L. Francescut, P. Lingor, M. Bahr, and P. Nicotera. Identification of new kinase clusters required forneurite outgrowth and retraction by a loss-of-function RNA interference screen. Cell Death Differ., 15:283–298,Feb 2008.

[286] H. Z. Ring, V. Vameghi-Meyers, W. Wang, G. R. Crabtree, and U. Francke. Five SWI/SNF-related, matrix-associated, actin-dependent regulator of chromatin (SMARC) genes are dispersed in the human genome. Ge-nomics, 51:140–143, Jul 1998.

[287] D. F. Bogenhagen, D. Rousseau, and S. Burke. The layered structure of human mitochondrial DNA nucleoids.J. Biol. Chem., 283:3665–3675, Feb 2008.

[288] J. Huang, Z. Gong, G. Ghosal, and J. Chen. SOSS complexes participate in the maintenance of genomicstability. Mol. Cell, 35:384–393, Aug 2009.

[289] S. Poliak, L. Gollan, R. Martinez, A. Custer, S. Einheber, J. L. Salzer, J. S. Trimmer, P. Shrager, and E. Peles.Caspr2, a new member of the neurexin superfamily, is localized at the juxtaparanodes of myelinated axonsand associates with K+ channels. Neuron, 24:1037–1047, Dec 1999.

[290] M. Rebhan, V. Chalifa-Caspi, J. Prilusky, and D. Lancet. GeneCards: integrating information about genes,proteins and diseases. Trends Genet., 13:163, Apr 1997.

[291] I. Yanai, H. Benjamin, M. Shmoish, V. Chalifa-Caspi, M. Shklar, R. Ophir, A. Bar-Even, S. Horn-Saban,M. Safran, E. Domany, D. Lancet, and O. Shmueli. Genome-wide midrange transcription profiles revealexpression level relationships in human tissue specification. Bioinformatics, 21:650–659, Mar 2005.

[292] O. Shmueli, S. Horn-Saban, V. Chalifa-Caspi, M. Shmoish, R. Ophir, H. Benjamin-Rodrig, M. Safran, E. Do-many, and D. Lancet. GeneNote: whole genome expression profiles in normal human tissues. C. R. Biol., 326:1067–1072, 2003.

[293] C. Wu, C. Orozco, J. Boyer, M. Leglise, J. Goodale, S. Batalov, C. L. Hodge, J. Haase, J. Janes, J. W. Huss,and A. I. Su. BioGPS: an extensible and customizable portal for querying and organizing gene annotationresources. Genome Biol., 10:R130, 2009.

[294] A. I. Su, T. Wiltshire, S. Batalov, H. Lapp, K. A. Ching, D. Block, J. Zhang, R. Soden, M. Hayakawa, G. Kreiman,M. P. Cooke, J. R. Walker, and J. B. Hogenesch. A gene atlas of the mouse and human protein-encodingtranscriptomes. Proc. Natl. Acad. Sci. U.S.A., 101:6062–6067, Apr 2004.

[295] F. Zhang, W. Gu, M. E. Hurles, and J. R. Lupski. Copy number variation in human health, disease, andevolution. Annu Rev Genomics Hum Genet, 10:451–481, 2009.

[296] S. A. McCarroll, F. G. Kuruvilla, J. M. Korn, S. Cawley, J. Nemesh, A. Wysoker, M. H. Shapero, P. I. de Bakker,J. B. Maller, A. Kirby, A. L. Elliott, M. Parkin, E. Hubbell, T. Webster, R. Mei, J. Veitch, P. J. Collins, R. Handsaker,S. Lincoln, M. Nizzari, J. Blume, K. W. Jones, R. Rava, M. J. Daly, S. B. Gabriel, and D. Altshuler. Integrateddetection and population-genetic analysis of SNPs and copy number variation. Nat. Genet., 40:1166–1174, Oct2008.

[297] C. Alkan, B. P. Coe, and E. E. Eichler. Genome structural variation discovery and genotyping. Nat. Rev. Genet.,12:363–376, May 2011.

[298] J. Sebat, B. Lakshmi, J. Troge, J. Alexander, J. Young, P. Lundin, S. Maner, H. Massa, M. Walker, M. Chi,N. Navin, R. Lucito, J. Healy, J. Hicks, K. Ye, A. Reiner, T. C. Gilliam, B. Trask, N. Patterson, A. Zetterberg, andM. Wigler. Large-scale copy number polymorphism in the human genome. Science, 305:525–528, Jul 2004.

[299] A. J. Iafrate, L. Feuk, M. N. Rivera, M. L. Listewnik, P. K. Donahoe, Y. Qi, S. W. Scherer, and C. Lee. Detectionof large-scale variation in the human genome. Nat. Genet., 36:949–951, Sep 2004.

bibliography 278

[300] R. Redon, S. Ishikawa, K. R. Fitch, L. Feuk, G. H. Perry, T. D. Andrews, H. Fiegler, M. H. Shapero, A. R. Carson,W. Chen, E. K. Cho, S. Dallaire, J. L. Freeman, J. R. Gonzalez, M. Gratacos, J. Huang, D. Kalaitzopoulos,D. Komura, J. R. MacDonald, C. R. Marshall, R. Mei, L. Montgomery, K. Nishimura, K. Okamura, F. Shen,M. J. Somerville, J. Tchinda, A. Valsesia, C. Woodwark, F. Yang, J. Zhang, T. Zerjal, J. Zhang, L. Armengol, D. F.Conrad, X. Estivill, C. Tyler-Smith, N. P. Carter, H. Aburatani, C. Lee, K. W. Jones, S. W. Scherer, and M. E.Hurles. Global variation in copy number in the human genome. Nature, 444:444–454, Nov 2006.

[301] J. M. Kidd, G. M. Cooper, W. F. Donahue, H. S. Hayden, N. Sampas, T. Graves, N. Hansen, B. Teague, C. Alkan,F. Antonacci, E. Haugen, T. Zerr, N. A. Yamada, P. Tsang, T. L. Newman, E. Tuzun, Z. Cheng, H. M. Ebling,N. Tusneem, R. David, W. Gillett, K. A. Phelps, M. Weaver, D. Saranga, A. Brand, W. Tao, E. Gustafson,K. McKernan, L. Chen, M. Malig, J. D. Smith, J. M. Korn, S. A. McCarroll, D. A. Altshuler, D. A. Peiffer,M. Dorschner, J. Stamatoyannopoulos, D. Schwartz, D. A. Nickerson, J. C. Mullikin, R. K. Wilson, L. Bruhn,M. V. Olson, R. Kaul, D. R. Smith, and E. E. Eichler. Mapping and sequencing of structural variation fromeight human genomes. Nature, 453:56–64, May 2008.

[302] D. F. Conrad, D. Pinto, R. Redon, L. Feuk, O. Gokcumen, Y. Zhang, J. Aerts, T. D. Andrews, C. Barnes,P. Campbell, T. Fitzgerald, M. Hu, C. H. Ihm, K. Kristiansson, D. G. Macarthur, J. R. Macdonald, I. Onyiah,A. W. Pang, S. Robson, K. Stirrups, A. Valsesia, K. Walter, J. Wei, C. Tyler-Smith, N. P. Carter, C. Lee, S. W.Scherer, and M. E. Hurles. Origins and functional impact of copy number variation in the human genome.Nature, 464:704–712, Apr 2010.

[303] H. Park, J. I. Kim, Y. S. Ju, O. Gokcumen, R. E. Mills, S. Kim, S. Lee, D. Suh, D. Hong, H. P. Kang, Y. J. Yoo, J. Y.Shin, H. J. Kim, M. Yavartanoo, Y. W. Chang, J. S. Ha, W. Chong, G. R. Hwang, K. Darvishi, H. Kim, S. J. Yang,K. S. Yang, H. Kim, M. E. Hurles, S. W. Scherer, N. P. Carter, C. Tyler-Smith, C. Lee, and J. S. Seo. Discoveryof common Asian copy number variants using integrated high-resolution array CGH and massively parallelDNA sequencing. Nat. Genet., 42:400–405, May 2010.

[304] R. E. Mills, K. Walter, C. Stewart, R. E. Handsaker, K. Chen, C. Alkan, A. Abyzov, S. C. Yoon, K. Ye, R. K.Cheetham, A. Chinwalla, D. F. Conrad, Y. Fu, F. Grubert, I. Hajirasouliha, F. Hormozdiari, L. M. Iakoucheva,Z. Iqbal, S. Kang, J. M. Kidd, M. K. Konkel, J. Korn, E. Khurana, D. Kural, H. Y. Lam, J. Leng, R. Li, Y. Li, C. Y.Lin, R. Luo, X. J. Mu, J. Nemesh, H. E. Peckham, T. Rausch, A. Scally, X. Shi, M. P. Stromberg, A. M. Stutz,A. E. Urban, J. A. Walker, J. Wu, Y. Zhang, Z. D. Zhang, M. A. Batzer, L. Ding, G. T. Marth, G. McVean, J. Sebat,M. Snyder, J. Wang, K. Ye, E. E. Eichler, M. B. Gerstein, M. E. Hurles, C. Lee, S. A. McCarroll, J. O. Korbel, and. 1000 Genomes Project. Mapping copy number variation by population-scale genome sequencing. Nature,470:59–65, Feb 2011.

[305] J. A. Lee and J. R. Lupski. Genomic rearrangements and gene copy-number alterations as a cause of nervoussystem disorders. Neuron, 52:103–121, Oct 2006.

[306] J. Huang, X. Wu, G. Montenegro, J. Price, G. Wang, J. M. Vance, M. E. Shy, and S. Zuchner. Copy numbervariations are a rare cause of non-CMT1A Charcot-Marie-Tooth disease. J. Neurol., 257:735–741, May 2010.

[307] J. R. Lupski, R. M. de Oca-Luna, S. Slaugenhaupt, L. Pentao, V. Guzzetta, B. J. Trask, O. Saucedo-Cardenas,D. F. Barker, J. M. Killian, C. A. Garcia, A. Chakravarti, and P. I. Patel. DNA duplication associated withCharcot-Marie-Tooth disease type 1A. Cell, 66:219–232, Jul 1991.

[308] P. F. Chance, M. K. Alderson, K. A. Leppig, M. W. Lensch, N. Matsunami, B. Smith, P. D. Swanson, S. J.Odelberg, C. M. Disteche, and T. D. Bird. DNA deletion associated with hereditary neuropathy with liabilityto pressure palsies. Cell, 72:143–151, Jan 1993.

[309] Q. S. Padiath, K. Saigoh, R. Schiffmann, H. Asahara, T. Yamada, A. Koeppen, K. Hogan, L. J. Ptacek, and Y. H.Fu. Lamin B1 duplications cause autosomal dominant leukodystrophy. Nat. Genet., 38:1114–1123, Oct 2006.

[310] L. Feuk, A. R. Carson, and S. W. Scherer. Structural variation in the human genome. Nat. Rev. Genet., 7:85–97,Feb 2006.

[311] H. Matsuzaki, P. H. Wang, J. Hu, R. Rava, and G. K. Fu. High resolution discovery and confirmation of copynumber variants in 90 Yoruba Nigerians. Genome Biol., 10:R125, 2009.

[312] D. P. Locke, A. J. Sharp, S. A. McCarroll, S. D. McGrath, T. L. Newman, Z. Cheng, S. Schwartz, D. G. Albert-son, D. Pinkel, D. M. Altshuler, and E. E. Eichler. Linkage disequilibrium and heritability of copy-numberpolymorphisms within duplicated regions of the human genome. Am. J. Hum. Genet., 79:275–290, Aug 2006.

[313] A. J. Sharp, D. P. Locke, S. D. McGrath, Z. Cheng, J. A. Bailey, R. U. Vallente, L. M. Pertz, R. A. Clark,S. Schwartz, R. Segraves, V. V. Oseroff, D. G. Albertson, D. Pinkel, and E. E. Eichler. Segmental duplicationsand copy-number variation in the human genome. Am. J. Hum. Genet., 77:78–88, Jul 2005.

bibliography 279

[314] T. H. Shaikh, X. Gai, J. C. Perin, J. T. Glessner, H. Xie, K. Murphy, R. O’Hara, T. Casalunovo, L. K. Conlin,M. D’Arcy, E. C. Frackelton, E. A. Geiger, C. Haldeman-Englert, M. Imielinski, C. E. Kim, L. Medne, K. Anna-iah, J. P. Bradfield, E. Dabaghyan, A. Eckert, C. C. Onyiah, S. Ostapenko, F. G. Otieno, E. Santa, J. L. Shaner,R. Skraban, R. M. Smith, J. Elia, E. Goldmuntz, N. B. Spinner, E. H. Zackai, R. M. Chiavacci, R. Grundmeier,E. F. Rappaport, S. F. Grant, P. S. White, and H. Hakonarson. High-resolution mapping and analysis of copynumber variations in the human genome: a data resource for clinical and research applications. Genome Res.,19:1682–1690, Sep 2009.

[315] B. Guthmundsson, E. Olafsson, F. Jakobsson, and P. Luthvigsson. Prevalence of symptomatic Charcot-Marie-Tooth disease in Iceland: a study of a well-defined population. Neuroepidemiology, 34:13–17, 2010.

[316] M. L. Metzker. Sequencing technologies - the next generation. Nat. Rev. Genet., 11:31–46, Jan 2010.

[317] J. Shendure and H. Ji. Next-generation DNA sequencing. Nat. Biotechnol., 26:1135–1145, Oct 2008.

[318] M. Margulies, M. Egholm, W. E. Altman, S. Attiya, J. S. Bader, L. A. Bemben, J. Berka, M. S. Braverman, Y. J.Chen, Z. Chen, S. B. Dewell, L. Du, J. M. Fierro, X. V. Gomes, B. C. Godwin, W. He, S. Helgesen, C. H. Ho, C. H.Ho, G. P. Irzyk, S. C. Jando, M. L. Alenquer, T. P. Jarvie, K. B. Jirage, J. B. Kim, J. R. Knight, J. R. Lanza, J. H.Leamon, S. M. Lefkowitz, M. Lei, J. Li, K. L. Lohman, H. Lu, V. B. Makhijani, K. E. McDade, M. P. McKenna,E. W. Myers, E. Nickerson, J. R. Nobile, R. Plant, B. P. Puc, M. T. Ronan, G. T. Roth, G. J. Sarkis, J. F. Simons,J. W. Simpson, M. Srinivasan, K. R. Tartaro, A. Tomasz, K. A. Vogt, G. A. Volkmer, S. H. Wang, Y. Wang, M. P.Weiner, P. Yu, R. F. Begley, and J. M. Rothberg. Genome sequencing in microfabricated high-density picolitrereactors. Nature, 437:376–380, Sep 2005.

[319] A. Valouev, J. Ichikawa, T. Tonthat, J. Stuart, S. Ranade, H. Peckham, K. Zeng, J. A. Malek, G. Costa, K. McKer-nan, A. Sidow, A. Fire, and S. M. Johnson. A high-resolution, nucleosome position map of C. elegans revealsa lack of universal sequence-dictated positioning. Genome Res., 18:1051–1063, Jul 2008.

[320] J. M. Rothberg, W. Hinz, T. M. Rearick, J. Schultz, W. Mileski, M. Davey, J. H. Leamon, K. Johnson, M. J.Milgrew, M. Edwards, J. Hoon, J. F. Simons, D. Marran, J. W. Myers, J. F. Davidson, A. Branting, J. R. Nobile,B. P. Puc, D. Light, T. A. Clark, M. Huber, J. T. Branciforte, I. B. Stoner, S. E. Cawley, M. Lyons, Y. Fu, N. Homer,M. Sedova, X. Miao, B. Reed, J. Sabina, E. Feierstein, M. Schorn, M. Alanjary, E. Dimalanta, D. Dressman,R. Kasinskas, T. Sokolsky, J. A. Fidanza, E. Namsaraev, K. J. McKernan, A. Williams, G. T. Roth, and J. Bustillo.An integrated semiconductor device enabling non-optical genome sequencing. Nature, 475:348–352, Jul 2011.

[321] J. Korlach, K. P. Bjornson, B. P. Chaudhuri, R. L. Cicero, B. A. Flusberg, J. J. Gray, D. Holden, R. Saxena,J. Wegener, and S. W. Turner. Real-time DNA sequencing from single polymerase molecules. Meth. Enzymol.,472:431–455, 2010.

[322] M. Ronaghi, S. Karamohamed, B. Pettersson, M. Uhlen, and P. Nyren. Real-time DNA sequencing usingdetection of pyrophosphate release. Anal. Biochem., 242:84–89, Nov 1996.

[323] M. Ronaghi, M. Uhlen, and P. Nyren. A sequencing method based on real-time pyrophosphate. Science, 281:363, 365, Jul 1998.

[324] P. D. Stenson, M. Mort, E. V. Ball, K. Howells, A. D. Phillips, N. S. Thomas, and D. N. Cooper. The HumanGene Mutation Database: 2008 update. Genome Med, 1:13, 2009.

[325] J. Guo, N. Xu, Z. Li, S. Zhang, J. Wu, D. H. Kim, M. Sano Marma, Q. Meng, H. Cao, X. Li, S. Shi, L. Yu,S. Kalachikov, J. J. Russo, N. J. Turro, and J. Ju. Four-color DNA sequencing with 3’-O-modified nucleotidereversible terminators and chemically cleavable fluorescent dideoxynucleotides. Proc. Natl. Acad. Sci. U.S.A.,105:9145–9150, Jul 2008.

[326] J. Bowers, J. Mitchell, E. Beer, P. R. Buzby, M. Causey, J. W. Efcavitch, M. Jarosz, E. Krzymanska-Olejnik,L. Kung, D. Lipson, G. M. Lowman, S. Marappan, P. McInerney, A. Platt, A. Roy, S. M. Siddiqi, K. Steinmann,and J. F. Thompson. Virtual terminator nucleotides for next-generation DNA sequencing. Nat. Methods, 6:593–595, Aug 2009.

[327] M. J. Fullwood, C. L. Wei, E. T. Liu, and Y. Ruan. Next-generation DNA sequencing of paired-end tags (PET)for transcriptome and genome analyses. Genome Res., 19:521–532, Apr 2009.

[328] C. Trapnell and S. L. Salzberg. How to map billions of short reads onto genomes. Nat. Biotechnol., 27:455–457,May 2009.

[329] M. J. Chaisson, D. Brinza, and P. A. Pevzner. De novo fragment assembly with short mate-paired reads: Doesthe read length matter? Genome Res., 19:336–346, Feb 2009.

[330] D. R. Bentley. Whole-genome re-sequencing. Curr. Opin. Genet. Dev., 16:545–552, Dec 2006.

bibliography 280

[331] E. H. Turner, S. B. Ng, D. A. Nickerson, and J. Shendure. Methods for genomic partitioning. Annu RevGenomics Hum Genet, 10:263–284, 2009.

[332] E. Hodges, Z. Xuan, V. Balija, M. Kramer, M. N. Molla, S. W. Smith, C. M. Middle, M. J. Rodesch, T. J. Albert,G. J. Hannon, and W. R. McCombie. Genome-wide in situ exon capture for selective resequencing. Nat. Genet.,39:1522–1527, Dec 2007.

[333] M. A. Cleary, K. Kilian, Y. Wang, J. Bradshaw, G. Cavet, W. Ge, A. Kulkarni, P. J. Paddison, K. Chang, N. Sheth,E. Leproust, E. M. Coffey, J. Burchard, W. R. McCombie, P. Linsley, and G. J. Hannon. Production of complexnucleic acid libraries using highly parallel in situ oligonucleotide synthesis. Nat. Methods, 1:241–248, Dec 2004.

[334] S. B. Ng, E. H. Turner, P. D. Robertson, S. D. Flygare, A. W. Bigham, C. Lee, T. Shaffer, M. Wong, A. Bhattachar-jee, E. E. Eichler, M. Bamshad, D. A. Nickerson, and J. Shendure. Targeted capture and massively parallelsequencing of 12 human exomes. Nature, 461:272–276, Sep 2009.

[335] C. Gilissen, A. Hoischen, H. G. Brunner, and J. A. Veltman. Unlocking Mendelian disease using exomesequencing. Genome Biol, 12:228, Sep 2011.

[336] J. R. Lupski, J. G. Reid, C. Gonzaga-Jauregui, D. Rio Deiros, D. C. Chen, L. Nazareth, M. Bainbridge, H. Dinh,C. Jing, D. A. Wheeler, A. L. McGuire, F. Zhang, P. Stankiewicz, J. J. Halperin, C. Yang, C. Gehman, D. Guo,R. K. Irikat, W. Tom, N. J. Fantin, D. M. Muzny, and R. A. Gibbs. Whole-genome sequencing in a patient withCharcot-Marie-Tooth neuropathy. N. Engl. J. Med., 362:1181–1191, Apr 2010.

[337] P. C. Ng, S. Levy, J. Huang, T. B. Stockwell, B. P. Walenz, K. Li, N. Axelrod, D. A. Busam, R. L. Strausberg, andJ. C. Venter. Genetic variation in an individual human exome. PLoS Genet., 4:e1000160, 2008.

[338] S. B. Ng, K. J. Buckingham, C. Lee, A. W. Bigham, H. K. Tabor, K. M. Dent, C. D. Huff, P. T. Shannon, E. W.Jabs, D. A. Nickerson, J. Shendure, and M. J. Bamshad. Exome sequencing identifies the cause of a mendeliandisorder. Nat. Genet., 42:30–35, Jan 2010.

[339] J. L. Wang, X. Yang, K. Xia, Z. M. Hu, L. Weng, X. Jin, H. Jiang, P. Zhang, L. Shen, J. F. Guo, N. Li, Y. R. Li, L. F.Lei, J. Zhou, J. Du, Y. F. Zhou, Q. Pan, J. Wang, J. Wang, R. Q. Li, and B. S. Tang. TGM6 identified as a novelcausative gene of spinocerebellar ataxias using exome sequencing. Brain, 133:3510–3518, Dec 2010.

[340] R. Li, Y. Li, K. Kristiansen, and J. Wang. SOAP: short oligonucleotide alignment program. Bioinformatics, 24:713–714, Mar 2008.

[341] D. Blankenberg, G. Von Kuster, N. Coraor, G. Ananda, R. Lazarus, M. Mangan, A. Nekrutenko, and J. Taylor.Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol, Chapter 19:1–21, Jan2010.

[342] J. Goecks, A. Nekrutenko, J. Taylor, E. Afgan, G. Ananda, D. Baker, D. Blankenberg, R. Chakrabarty, N. Coraor,J. Goecks, G. Von Kuster, R. Lazarus, K. Li, A. Nekrutenko, J. Taylor, and K. Vincent. Galaxy: a comprehensiveapproach for supporting accessible, reproducible, and transparent computational research in the life sciences.Genome Biol., 11:R86, 2010.

[343] P. C. Ng and S. Henikoff. Predicting the effects of amino acid substitutions on protein function. Annu RevGenomics Hum Genet, 7:61–80, 2006.

[344] P. D. Thomas, M. J. Campbell, A. Kejariwal, H. Mi, B. Karlak, R. Daverman, K. Diemer, A. Muruganujan, andA. Narechania. PANTHER: a library of protein families and subfamilies indexed by function. Genome Res., 13:2129–2141, Sep 2003.

[345] Y. Bromberg, G. Yachdav, and B. Rost. SNAP predicts effect of mutations on protein function. Bioinformatics,24:2397–2398, Oct 2008.

[346] V. Ramensky, P. Bork, and S. Sunyaev. Human non-synonymous SNPs: server and survey. Nucleic Acids Res.,30:3894–3900, Sep 2002.

[347] S. Sunyaev, V. Ramensky, I. Koch, W. Lathe, A. S. Kondrashov, and P. Bork. Prediction of deleterious humanalleles. Hum. Mol. Genet., 10:591–597, Mar 2001.

[348] J. A. Bailey, Z. Gu, R. A. Clark, K. Reinert, R. V. Samonte, S. Schwartz, M. D. Adams, E. W. Myers, P. W. Li,and E. E. Eichler. Recent segmental duplications in the human genome. Science, 297:1003–1007, Aug 2002.

[349] J. A. Bailey, A. M. Yavor, H. F. Massa, B. J. Trask, and E. E. Eichler. Segmental duplications: organization andimpact within the current human genome project assembly. Genome Res., 11:1005–1017, Jun 2001.

bibliography 281

[350] A. Kozik, E. Kochetkova, and R. Michelmore. GenomePixelizer–a visualization program for comparativegenomics within and between species. Bioinformatics, 18:335–336, Feb 2002.

[351] S. Griffiths-Jones, H. K. Saini, S. van Dongen, and A. J. Enright. miRBase: tools for microRNA genomics.Nucleic Acids Res., 36:D154–158, Jan 2008.

[352] D. J. Hedges, D. Hedges, D. Burges, E. Powell, C. Almonte, J. Huang, S. Young, B. Boese, M. Schmidt, M. A.Pericak-Vance, E. Martin, X. Zhang, T. T. Harkins, and S. Zuchner. Exome sequencing of a multigenerationalhuman pedigree. PLoS ONE, 4:e8232, 2009.

[353] O. Harismendy, P. C. Ng, R. L. Strausberg, X. Wang, T. B. Stockwell, K. Y. Beeson, N. J. Schork, S. S. Murray,E. J. Topol, S. Levy, and K. A. Frazer. Evaluation of next generation sequencing platforms for populationtargeted sequencing studies. Genome Biol., 10:R32, 2009.

[354] N. F. Asan, Y. Xu, H. Jiang, C. Tyler-Smith, Y. Xue, T. Jiang, J. Wang, M. Wu, X. Liu, G. Tian, J. Wang, J. Wang,H. Yang, and X. Zhang. Comprehensive comparison of three commercial human whole-exome capture plat-forms. Genome Biol, 12:R95, Sep 2011.

[355] L. Cartegni, S. L. Chew, and A. R. Krainer. Listening to silence and understanding nonsense: exonic mutationsthat affect splicing. Nat. Rev. Genet., 3:285–298, Apr 2002.

[356] M. J. Bamshad, S. B. Ng, A. W. Bigham, H. K. Tabor, M. J. Emond, D. A. Nickerson, and J. Shendure. Exomesequencing as a tool for Mendelian disease gene discovery. Nat. Rev. Genet., 12:745–755, 2011.

[357] Megan Hwa. Brewer. Molecular genetics of X-linked Charcot-Marie-Tooth neuropathy (CMTX3). Master’sthesis, Faculty of Medicine, University of Sydney, 2011.

[358] H. Li and R. Durbin. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics,25:1754–1760, Jul 2009.

[359] H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, and R. Durbin. TheSequence Alignment/Map format and SAMtools. Bioinformatics, 25:2078–2079, Aug 2009.

[360] B. Giardine, C. Riemer, R. C. Hardison, R. Burhans, L. Elnitski, P. Shah, Y. Zhang, D. Blankenberg, I. Albert,J. Taylor, W. Miller, W. J. Kent, and A. Nekrutenko. Galaxy: a platform for interactive large-scale genomeanalysis. Genome Res., 15:1451–1455, Oct 2005.

[361] J. T. Robinson, H. Thorvaldsdottir, W. Winckler, M. Guttman, E. S. Lander, G. Getz, and J. P. Mesirov. Integra-tive genomics viewer. Nat. Biotechnol., 29:24–26, Jan 2011.

[362] K. Verhoeven, K. G. Claeys, S. Zuchner, J. M. Schroder, J. Weis, C. Ceuterick, A. Jordanova, E. Nelis,E. De Vriendt, M. Van Hul, P. Seeman, R. Mazanec, G. M. Saifi, K. Szigeti, P. Mancias, I. J. Butler, A. Kochanski,B. Ryniewicz, J. De Bleecker, P. Van den Bergh, C. Verellen, R. Van Coster, N. Goemans, M. Auer-Grumbach,W. Robberecht, V. Milic Rasic, Y. Nevo, I. Tournev, V. Guergueltcheva, F. Roelens, P. Vieregge, P. Vinci, M. T.Moreno, H. J. Christen, M. E. Shy, J. R. Lupski, J. M. Vance, P. De Jonghe, and V. Timmerman. MFN2 mutationdistribution and genotype/phenotype correlation in Charcot-Marie-Tooth type 2. Brain, 129:2093–2102, Aug2006.

[363] L. Bartoloni, J. L. Blouin, Y. Pan, C. Gehrig, A. K. Maiti, N. Scamuffa, C. Rossier, M. Jorissen, M. Armengot,M. Meeks, H. M. Mitchison, E. M. Chung, C. D. Delozier-Blanchet, W. J. Craigen, and S. E. Antonarakis.Mutations in the DNAH11 (axonemal heavy chain dynein type 11) gene cause one form of situs inversustotalis and most likely primary ciliary dyskinesia. Proc. Natl. Acad. Sci. U.S.A., 99:10282–10286, Aug 2002.

[364] G. C. Schwabe, K. Hoffmann, N. T. Loges, D. Birker, C. Rossier, M. M. de Santi, H. Olbrich, M. Fliegauf,M. Failly, U. Liebers, M. Collura, G. Gaedicke, S. Mundlos, U. Wahn, J. L. Blouin, B. Niggemann, H. Omran,S. E. Antonarakis, and L. Bartoloni. Primary ciliary dyskinesia associated with normal axoneme ultrastructureis caused by DNAH11 mutations. Hum. Mutat., 29:289–298, Feb 2008.

[365] G. T. Banks and E. M. Fisher. Cytoplasmic dynein could be key to understanding neurodegeneration. GenomeBiol., 9:214, 2008.

[366] J. Eschbach and L. Dupuis. Cytoplasmic dynein in neurodegeneration. Pharmacol. Ther., 130:348–363, Jun 2011.

[367] Q. Wu and T. Maniatis. A striking organization of a large family of human neural cadherin-like cell adhesiongenes. Cell, 97:779–790, Jun 1999.

[368] X. Wang, J. A. Weiner, S. Levi, A. M. Craig, A. Bradley, and J. R. Sanes. Gamma protocadherins are requiredfor survival of spinal interneurons. Neuron, 36:843–854, Dec 2002.

bibliography 282

[369] C. Zou, W. Huang, G. Ying, and Q. Wu. Sequence analysis and expression mapping of the rat clusteredprotocadherin gene repertoires. Neuroscience, 144:579–603, Jan 2007.

[370] B. Tasic, C. E. Nabholz, K. K. Baldwin, Y. Kim, E. H. Rueckert, S. A. Ribich, P. Cramer, Q. Wu, R. Axel, andT. Maniatis. Promoter choice determines splice site selection in protocadherin alpha and gamma pre-mRNAsplicing. Mol. Cell, 10:21–33, Jul 2002.

[371] X. Wang, H. Su, and A. Bradley. Molecular mechanisms governing Pcdh-gamma gene expression: evidencefor a multiple promoter and cis-alternative splicing model. Genes Dev., 16:1890–1905, Aug 2002.

[372] Q. Wu, T. Zhang, J. F. Cheng, Y. Kim, J. Grimwood, J. Schmutz, M. Dickson, J. P. Noonan, M. Q. Zhang,R. M. Myers, and T. Maniatis. Comparative DNA sequence analysis of mouse and human protocadherin geneclusters. Genome Res., 11:389–404, Mar 2001.

[373] J. A. Weiner, X. Wang, J. C. Tapia, and J. R. Sanes. Gamma protocadherins are required for synaptic develop-ment in the spinal cord. Proc. Natl. Acad. Sci. U.S.A., 102:8–14, Jan 2005.

[374] E. LeGuern, A. Guilbot, M. Kessali, N. Ravise, J. Tassin, T. Maisonobe, D. Grid, and A. Brice. Homozygositymapping of an autosomal recessive form of demyelinating Charcot-Marie-Tooth disease to chromosome 5q23-q33. Hum. Mol. Genet., 5:1685–1688, Oct 1996.

[375] H. Azzedine, N. Ravise, C. Verny, A. Gabreels-Festen, M. Lammens, D. Grid, J. M. Vallat, G. Durosier,J. Senderek, S. Nouioua, T. Hamadouche, A. Bouhouche, A. Guilbot, C. Stendel, M. Ruberg, A. Brice, N. Birouk,O. Dubourg, M. Tazir, and E. LeGuern. Spine deformities in Charcot-Marie-Tooth 4C caused by SH3TC2 genemutations. Neurology, 67:602–606, Aug 2006.

[376] M. Magrane and U. Consortium. UniProt Knowledgebase: a hub of integrated protein data. Database (Oxford),2011:bar009, 2011.

[377] M. J. Clark, R. Chen, H. Y. Lam, K. J. Karczewski, R. Chen, G. Euskirchen, A. J. Butte, and M. Snyder. Perfor-mance comparison of exome DNA sequencing technologies. Nat. Biotechnol., 29:908–914, 2011.

[378] M. Dejesus-Hernandez, I. R. Mackenzie, B. F. Boeve, A. L. Boxer, M. Baker, N. J. Rutherford, A. M. Nicholson,N. A. Finch, H. Flynn, J. Adamson, N. Kouri, A. Wojtas, P. Sengdy, G. Y. Hsiung, A. Karydas, W. W. Seeley, K. A.Josephs, G. Coppola, D. H. Geschwind, Z. K. Wszolek, H. Feldman, D. S. Knopman, R. C. Petersen, B. L. Miller,D. W. Dickson, K. B. Boylan, N. R. Graff-Radford, and R. Rademakers. Expanded GGGGCC HexanucleotideRepeat in Noncoding Region of C9ORF72 Causes Chromosome 9p-Linked FTD and ALS. Neuron, 72:245–256,Oct 2011.

[379] A. E. Renton, E. Majounie, A. Waite, J. Simon-Sanchez, S. Rollinson, J. R. Gibbs, J. C. Schymick, H. Laaksovirta,J. C. van Swieten, L. Myllykangas, H. Kalimo, A. Paetau, Y. Abramzon, A. M. Remes, A. Kaganovich, S. W.Scholz, J. Duckworth, J. Ding, D. W. Harmer, D. G. Hernandez, J. O. Johnson, K. Mok, M. Ryten, D. Trabzuni,R. J. Guerreiro, R. W. Orrell, J. Neal, A. Murray, J. Pearson, I. E. Jansen, D. Sondervan, H. Seelaar, D. Blake,K. Young, N. Halliwell, J. B. Callister, G. Toulson, A. Richardson, A. Gerhard, J. Snowden, D. Mann, D. Neary,M. A. Nalls, T. Peuralinna, L. Jansson, V. M. Isoviita, A. L. Kaivorinne, M. Holtta-Vuori, E. Ikonen, R. Sulkava,M. Benatar, J. Wuu, A. Chio, G. Restagno, G. Borghero, M. Sabatelli, D. Heckerman, E. Rogaeva, L. Zinman,J. D. Rothstein, M. Sendtner, C. Drepper, E. E. Eichler, C. Alkan, Z. Abdullaev, S. D. Pack, A. Dutra, E. Pak,J. Hardy, A. Singleton, N. M. Williams, P. Heutink, S. Pickering-Brown, H. R. Morris, P. J. Tienari, and B. J.Traynor. A Hexanucleotide Repeat Expansion in C9ORF72 Is the Cause of Chromosome 9p21-Linked ALS-FTD. Neuron, 72:257–268, Oct 2011.

[380] H. Daoud, P. N. Valdmanis, P. A. Dion, and G. A. Rouleau. Analysis of DPP6 and FGGY as candidate genesfor amyotrophic lateral sclerosis. Amyotroph Lateral Scler, 11:389–391, Aug 2010.