Developing Inhibitors of the Enzyme ‘TRMT2a’ for the Treatment of PolyQ Diseases

Dissertation approved by the Faculty of Mathematics, Computer Science and Natural Sciences of RWTH Aachen University for the award of the academic degree of Doctor of Natural Sciences

submitted by

Mag. Michael Alois Margreiter

from Schwaz

1st reviewer: Univ.-Prof. Dr. rer. nat. Carsten Bolm

2nd reviewer: Univ.-Prof. Dr. rer. nat. Elmar Weinhold

3rd reviewer: Jun.-Prof. Dr. Ph.D. Giulia Rossetti

Examiner: Dr. rer. nat. Meike Niggemann

Date of the oral examination: 26 May 2020

This dissertation is available on the website of the University Library.


Abstract

Although the specific causes of all nine polyglutamine diseases were elucidated decades ago, treatment options remain scarce and predominantly focus on symptomatic relief. This is particularly striking as these neurodegenerative diseases are monogenetic. A common hallmark is the formation of insoluble aggregates of the mutant protein in vulnerable cell populations in the brain, often starting as early as mid-life. Thus, there is a clear need for viable neuroprotective strategies that delay the onset and progression of these fatal disorders. To this end, the protein tRNA methyltransferase 2 homolog A (TRMT2a) was discovered in earlier studies as a strong modulator of polyglutamine-mediated toxicity (Dr. Voigt’s lab, Institute of Neurology, RWTH University Hospital).

Computational biology encompasses a wide range of computer-assisted biomodeling approaches. The resulting models aim to integrate often heterogeneous biological data and aid wet-lab experiments. This emerging field of research has contributed substantially to our understanding of the molecular foundations of health and pathogenesis, including in the context of polyglutamine diseases. Moreover, a detailed understanding of these processes can be leveraged to discover and design new drugs in silico, an approach known as computer-aided drug design (CADD). This thesis uses state-of-the-art CADD approaches to gain insights into TRMT2a at the sequence and structural level.

First, I compared the sequence of TRMT2a with closely related ones to identify conserved residues. As I initially lacked structural information on TRMT2a, I generated comparative structural models that eventually enabled the selection of a suitable protein fragment corresponding to a protein domain of TRMT2a. This fragment was then successfully expressed and purified in Prof. Niessing’s lab (University of Ulm), resulting in the first X-ray crystallographic structure of this domain.

Subsequently, I investigated whether pharmacological inhibition of TRMT2a function might prove as neuroprotective as silencing TRMT2a. Hence, I assessed whether TRMT2a is susceptible to small-molecule modulators, e.g., whether our recently crystallized TRMT2a domain features a binding site able to accommodate a small-molecule ligand, ideally one capable of crossing the blood-brain barrier. Unfortunately, I could not find such a site on the crystal structure of the domain.

Nevertheless, proteins in solution typically explore vast conformational landscapes. Thus, after characterizing the crystallographic water network, I augmented my search for binding sites by also incorporating dynamical aspects of this domain into my models. (i) I employed machine-learning-based methods to predict residues giving rise to local flexibility in the protein and (ii) analyzed allosteric communication networks within this domain, followed by (iii) molecular mechanics simulations in an implicit solvent with perturbation approaches. These analyses indicated a putative transient binding site capable of binding small molecules. (iv) To further understand the formation and collapse of such a transient pocket, I conducted molecular dynamics simulations in explicit solvent.

Next, I selected a representative snapshot that featured the domain with a pocket conformation suitable to host a small-molecule ligand and performed a virtual screening of commercially available compounds complementing this pocket. I prioritized multiple compounds, and these were tested by Dr. Voigt’s team in vitro on HEK cell lines. We found their efficacy in mitigating polyglutamine toxicity comparable to that of TRMT2a silencing, while they provided no further benefit in cell lines already depleted of TRMT2a. This indicates that these compounds might indeed interfere with TRMT2a. Moreover, for one compound, it was possible to show dose-dependent binding to the domain in surface plasmon resonance (SPR) experiments. The latter analyses were performed in Prof. Niessing’s lab. Encouragingly, a few of these compounds also showed a rescuing effect on fibroblasts derived from patients, independent of their respective polyglutamine disorders. A worldwide patent application for these compounds involving the labs of Prof. Shah, Prof. Schulz (both Research Center Jülich), Dr. Voigt, and Prof. Niessing is currently underway.

In the second part of my thesis, I explore the consequences of hampered TRMT2a activity and how this might give rise to reduced aggregate formation. I hypothesized that in the absence of TRMT2a, the translation of the polyglutamine tract becomes more error-prone and results in non-glutamine interruptions. Interrupted polyglutamine tracts are less toxic and result in fewer aggregates. To test this, I simulated uninterrupted and interrupted polyglutamine tract constructs with Replica Exchange Protein Monte Carlo methods and compared the physicochemical properties that give rise to their different aggregation propensities.

In the third part of my thesis, I applied CADD techniques to two targets of interest for pain. In collaboration with Prof. Gründer (University Clinic Aachen), I investigated how RPRFa, an RFamide peptide from the venom of a cone snail, binds to acid-sensing ion channel 3 (ASIC3). This proton-gated Na+ channel plays a key role in neuropathic pain. I built a comparative model and investigated in which conformation the peptide ligand could bind. Based on these models, I proposed several single-point mutants, and Prof. Gründer’s team assessed their effect with electrophysiology and UV-crosslinking experiments. In a follow-up project with the same group, I mapped the binding site of Dynorphin A(1-14) on ASIC1 using a related approach augmented with explicit solvent simulations.

In conclusion, the results presented in this thesis indicate that a computational approach to problems in the life sciences can not only facilitate the interpretation of experimental observations but also guide and augment them when such approaches are introduced at the critical early stages of a project.


Overview

Although the respective causes of all nine polyglutamine diseases have been known for decades, there are so far no approved curative therapies. Current approaches focus on symptom relief. This is remarkable insofar as these diseases can be traced back to single mutations in the disease-relevant genes. Another prominent commonality of these diseases is the formation of insoluble aggregates of the mutated proteins in certain cell populations of the brain, often as early as mid-life. There is therefore a clear need for effective and neuroprotective strategies capable of delaying the onset and the progression of these ultimately fatal diseases. In this regard, earlier studies at the University Hospital Aachen (Dr. Voigt, Neurology) showed that tRNA methyltransferase 2 homolog A (TRMT2a) is an effective modulator of the neurotoxicity induced by polyglutamine diseases.

Computational biology comprises a multitude of methods that aim to make sense of the often heterogeneous biological data sets. This emerging field of research has already made substantial contributions to our understanding of the molecular foundations of health and pathogenesis, including in the case of polyglutamine diseases. In addition, a detailed understanding of these processes can be used to discover and develop novel active compounds, an approach known as computer-aided drug design (CADD). In this dissertation, such state-of-the-art approaches are applied to gain insights into the protein TRMT2a at the sequence and structural level. To this end, I compared the sequence of TRMT2a with closely related sequences and first identified evolutionarily conserved residues. Since no structural information on the protein was available at the beginning of this work, I built homology models. These investigations ultimately allowed the selection of a suitable protein fragment. This fragment could be expressed and purified and moreover led to the first crystal structure of a domain of TRMT2a (Prof. Niessing, University of Ulm).

Furthermore, I wanted to find out whether pharmacological inhibition of TRMT2a would show neuroprotective effects similar to those of the corresponding knockdown. I therefore examined the apo structure of said domain for binding pockets that would allow the binding of small molecules. Unfortunately, I could not detect a suitable pocket in the crystal structure. Ultimately, however, a crystal structure represents only a very limited conformational space, whereas proteins in solution are often able to adopt many different conformations. After first analyzing the crystallographic water network, I therefore extended my search for a suitable binding pocket by also incorporating dynamic aspects of the domain into my models. (i) Initially, I used methods based on machine learning to predict residues that give rise to local flexibility in the protein and (ii) described allosteric communication networks among the residues, and, building on this, (iii) carried out molecular dynamics simulations in implicit solvent with perturbation approaches. These analyses revealed a transient binding pocket that also appeared adequate for binding small molecules. To obtain a more detailed understanding of the formation and collapse of this pocket, I performed (iv) extensive molecular dynamics simulations in explicit solvent, i.e., with an atomistic water model. Based on these, I selected a suitable conformation of the domain and carried out a virtual screening with commercially available chemicals. I selected several chemicals (complementary to the binding pocket), and Dr. Voigt’s lab tested them in vitro, first in HEK cell lines. It turned out that many of these chemicals were able to reduce polyglutamine-induced toxicity about as efficiently as the knockdown of TRMT2a, whereas no further improvement was observed in cells in which TRMT2a was practically absent. It is therefore quite possible that several of these chemicals actually inhibit TRMT2a. In addition, Prof. Niessing’s lab was able to demonstrate dose-dependent binding of one active molecule to the domain in surface plasmon resonance (SPR) experiments. Interestingly, some of these chemicals were also able to drastically reduce the number of toxic aggregates in fibroblasts donated by patients with polyglutamine diseases. This effect was furthermore demonstrated in fibroblasts with other polyQ diseases. A worldwide patent application for these chemical structures is currently underway in cooperation with the research groups of Prof. Shah and Prof. Schulz (both Forschungszentrum Jülich), Dr. Voigt, and Prof. Niessing.

In the second part of my work, I investigated the possible consequences of impaired TRMT2a function. According to my working hypothesis, a reduced TRMT2a concentration or activity should lead to a more error-prone translation of polyglutamine tracts. Interrupted polyglutamine tracts proved less toxic in earlier studies, and fewer aggregates are observed in these cases. I therefore simulated constructs with continuous and interrupted polyglutamine tracts using Replica Exchange Protein Monte Carlo methods and compared the respective physicochemical properties that lead to the experimentally observed differences in aggregation behavior.

In the third part of my dissertation, I used CADD to gain insight into proteins that play a role in the development of pain. To this end, I investigated, in collaboration with Prof. Gründer’s team (University Hospital Aachen), how the peptide RPRFa (an RFamide from the venom of the cone snail) binds acid-sensing ion channel 3 (ASIC3). This proton-gated Na+ channel is relevant in neuralgias. Here, I built a homology model and investigated in which conformation and at which site this peptide could bind the channel. On the basis of these binding hypotheses, I selected several point mutations. Prof. Gründer’s lab assessed their effects with electrophysiological and UV-crosslinking methods. In a follow-up project with the same group, I built a model of a complex consisting of Dynorphin A (1-14) and ASIC1 based on a similar approach. In addition, I extended this investigation with molecular dynamics simulations in explicit solvent.

In summary, the results of my work indicate that computational approaches in the life sciences not only enable the interpretation of experimental results but can also guide and complement such investigations when these approaches are incorporated at the critical early stage of a project.

In memory of my brother Josef Margreiter


Contents

Abstract
Overview
List of Figures
List of Tables
Acknowledgements

1 Introduction: Polyglutamine Diseases
1.1 Epidemiology and Societal Impact
1.2 Pathology
1.3 Drug Discovery and Treatment Options for CNS Diseases
1.4 Translational Neuroscience
1.4.1 TRMT2a as a Novel Target for PolyQ Diseases

2 Methods: Macromolecular Modeling
2.1 Comparative Modeling
2.1.1 Fold Assignment
2.1.2 Template Selection and Alignment
2.1.3 Model Building
2.1.4 Model Evaluation and Optimization
2.2 Statistical Mechanics
2.2.1 Regulation of Temperature and Pressure
2.2.2 Thermostats
2.2.3 Barostats
2.3 Molecular Dynamics Simulations
2.3.1 Quantum Mechanical Methods
2.3.2 Empirical Force Fields
2.3.3 Periodic Boundary Conditions & the Minimum Image Convention
2.3.4 Long-Range Interactions
2.3.5 Neighbor’s List and SHAKE Formalism
2.3.6 Solvation and Solvent Effects
2.4 Monte Carlo Protein Simulation
2.5 Computational Lead Discovery and Drug Design
2.5.1 Introduction
2.5.2 Computer-aided Drug Design
2.5.3 Structure- and Ligand-based Approaches
2.5.4 Scoring Functions
2.5.5 High-Throughput Screening and Virtual Screening
2.5.6 Druggability
2.5.7 Transient Sites

3 In Silico Discovery of Allosteric Inhibitors of TRMT2a RRM to Ameliorate PolyQ Disease-mediated Neurotoxicity
3.1 Introduction
3.2 Structure of the TRMT2a RNA Recognition Motif
3.2.1 Structure Determination
3.3 Structure-based Approaches
3.3.1 Bioinformatics Analysis
3.3.2 Crystallographic Water Clustering and 3D-RISM Analysis
3.4 Site Prediction on the Protein Crystal Structure
3.4.1 Traditional Structure-based Druggability Detection
3.4.2 Inferring Druggability from other RRMs
3.4.3 Small Molecular Probes to Detect Druggable Sites
3.5 Transient Site Discovery on TRMT2a RRM
3.6 Explicit Solvent MD and Snapshot Selection
3.7 Virtual Screening on RRM and Post Processing
3.7.1 Ligand-based Approach on the Catalytic Site
3.8 In cell Assays
3.9 Biophysical Measurements
3.10 Methods
3.10.1 Crystal Structure
3.10.2 Molecular Dynamics Simulation of RRM
3.10.3 Virtual Screening
3.11 Generation of HEKT293T with stable RNAi-mediated knockdown of TRMT2a
3.11.1 Cell Death Assay
3.12 Biophysical and Cell Experiments
3.13 Conclusion

4 Decreased Aggregate Formation upon TRMT2a Inhibition
4.1 Computational Aspects

5 The Conorfamide RPRFa Stabilizes the Open Conformation of Acid-Sensing Ion Channel 3 via the Nonproton Ligand-Sensing Domain
5.1 Introduction
5.1.1 Site-Directed Mutagenesis and RNA Synthesis
5.2 Preparation and Injection of Oocytes
5.3 Electrophysiology
5.4 Photo Affinity Labeling of ASIC3 by RPR[azF]a
5.5 Data Analysis
5.6 Modeling of Rat ASIC3 and RPRFa
5.7 Molecular Modeling of the RPRFa Binding Poses
5.8 Results
5.9 Discussion
5.10 Conclusion

Bibliography

Appendix


List of Figures

1.1 Rough eye phenotype (REP) used as a primary readout

2.1 Difference of Gaussians approach
2.2 Induced Fit and Conformational Selection
2.3 Different Types of Pocket Flexibility

3.1 RRM Crystal Structure & Crystallographic Water Network Analysis
3.2 Sequence-based Analysis of the RRM of TRMT2a
3.3 Protein Patches and 3D-RISM Analysis
3.4 Binding Site Detection on the Crystal Structure of TRMT2a RRM
3.5 Computational Modeling of Allosteric Effects
3.6 Binding Site Prediction with Mixed Solvent Molecular Dynamics Simulations
3.7 With Machine Learning and Perturbed Molecular Dynamics Simulations towards Local Flexibility Prediction
3.8 Extensive and Unperturbed Molecular Mechanics Simulations of the RRM of TRMT2a
3.9 Chemical Structures of Selected Compounds
3.10 Virtual Screening using the Cryptic Pocket and Pharmacophore Hypothesis for the Catalytic Domain

4.1 Potential energy histogram overlap of uninterrupted (left) and interrupted (right) strands. The high degree of overlap indicates that all temperatures were visited regularly.
4.2 Temperature-dependent solvent accessible surface area

5.1 The desensitized state of ASIC3 with RPRFa bound is not strongly populated
5.2 The photo-reactive peptide derivative abolishes desensitization when crosslinked to ASIC3
5.3 Cartoon representation of ASIC3 in complex with RPRFa
5.4 Binding of RPRFa to the NPLSD of ASIC3
5.5 Mutations of residues in the NPLSD reduce the modulation by RPRFa
5.6 NPLSD mutants are affected by high concentrations of RPRFa


List of Tables

1.1 PolyQ Disease Overview

4.1 Secondary structure propensities of continuous polyglutamine constructs and variants with a glutamate insertion with their respective temperature dependency

5.1 Potential binding sites on ASIC3


Acknowledgements

First and foremost, thanks to my supervisor Jun.-Prof. Dr. Ph.D. Giulia Rossetti for the opportunity to do my doctoral studies in her drug design group, for her encouragement, and for the chance to pursue my own research ideas. She showed me the importance of thinking critically and allowed me to be involved in several intriguing interdisciplinary research projects. This helped expand my skill set and scientific horizon.

My collaborators deserve special recognition: Dr. Aaron Voigt for his initiative and his unceasing flow of new ideas and suggestions; Yasmine Wasser, Carina Sobisch, and Benedetta Poma; Prof. Dr. Dierk Niessing with Monika Witzenberger and Elena Davydova; and Prof. Dr. Stefan Gründer with Dr. Axel Schmidt, Melissa Reiners, and Dr. Lilia Leisle. I am grateful for the many discussions, but also for your kind willingness to investigate some of my bolder hypotheses.

I would like to thank Prof. Dr. Carsten Bolm for his examination of this thesis. Moreover, I want to express my gratitude to the board of directors at Forschungszentrum Jülich for granting me the ‘Vorstandsdoktoranden’ scholarship and to Prof. Dr. Paolo Carloni for his support.

My friends at INM-9/IAS-5, notably Fabri, Zeineb Si, Luca M., Thomas, Luca P., Matic, Jonas G., Divya, Jakob, Anna, Emi, Vania, Slava, Rodrigo, Riccardo G., Loris, Joe, Riccardo C., Jonas M., Wenping, and the newer members of the group: you made my day so many times, and it was a pleasure to work with you in such an enjoyable research environment. I am most grateful for the support, encouragement, and fruitful discussions with Jun.-Prof. Dr. Mercedes Prieto, Dr. Vania Calandrini, and Dr. Emiliano Ippoliti.

Petra Rott, Elisa Polese, Sabrina Schulte, and all the other members of the administrative team, thank you ever so much for your kind support and efforts.

An exceptional thank you goes to my friends in Austria, especially Ozbej and Eva, Sandra F. and Sandra S., Flo, and Ladi. After all, when we catch up, it feels like I never moved abroad!

I thank my parents and siblings for their support and for standing together.

A very special thank you goes to Martin for his motivation and love during this time.


Chapter 1

Introduction: Polyglutamine Diseases

The brain is the center of the nervous system in all vertebrates and serves as a generalized control organ for the rest of the body. Its responsibilities include the generation of patterns of muscle activity, the secretion of hormones, the processing of sensory data, and the coordination of complex behaviors. On the cellular level, the brain is composed of neurons and glial cells. While glial cells provide metabolic and structural support, neurons mediate electrochemical signal propagation, even over long distances. This is achieved by small branches (dendrites) and, in particular, a long and thin protoplasmic fiber, the axon. Neurons communicate with each other through chemical messengers, the neurotransmitters, which are released into the synaptic cleft. Recent estimates state that the human brain comprises about 86 billion highly interconnected neurons [von Bartheld et al., 2016]. Given the staggering complexity of the brain, it is not surprising that it can be afflicted by numerous disorders. These include social and mood disorders (e.g., clinical depression, addiction), brain cancer (e.g., glioblastoma), stroke, neuropathic pain, and neurodegenerative diseases [Danon et al., 2019].

“Neurodegenerative diseases” is an umbrella term for several age-associated diseases that primarily affect vulnerable neuron populations in the brain. A hallmark of neurodegenerative diseases is a progressive deterioration of neuronal structure and function. This leads to problems related to movement, the ataxias, or to cognitive decline, the so-called dementias. Ataxias are characterized by a lack of coordination of muscle movements, e.g., abnormalities in gait, eye movement, and speech. Dementias, on the other hand, are more difficult to diagnose, since some degree of neurocognitive decline is typical of normal aging [Hugo and Ganguli, 2014]. Clinical presentations are rarely clear, as most patients show mixed symptoms. Therefore, comprehensive diagnostic information necessitates neuropathological evaluation, which is only possible at autopsy [Johnson et al., 2012]. In addition, advances in structural and functional imaging techniques provide insights into disease progression years before neurological symptoms develop [T et al., 2013] and are currently being integrated into diagnostic guidelines [Ghezzi, 2018]. Besides specific protein accumulations and anatomic vulnerability, other commonalities of neurodegenerative diseases include abnormalities in the ubiquitin–proteasomal and autophagosomal/lysosomal systems, oxidative stress, programmed cell death, and neuroinflammation [Dugger and Dickson, 2017]. Alzheimer’s disease (AD) is the leading cause of dementia and responsible for about 60% of all cases. The most evident risk factors are a positive family history and mutations in the genes encoding the amyloid precursor protein and presenilin 1 and 2. Hallmarks of AD are the formation of β-amyloid peptide aggregates and neuronal tau inclusions [Holmes and Amin, 2016]. Parkinson’s disease (PD) is the second most prevalent neurodegenerative disease and affects dopaminergic neurons in the substantia nigra. This leads to lower levels of dopamine in the striatum and disrupted motor control, resulting in rest tremor, slowness of movement, and postural instability [Elbaz et al., 2016]. Non-motor symptoms such as constipation, anosmia, and depression can accompany and even precede motor symptoms. PD diagnosis remains error-prone, and conclusive results require an autopsy [Horvath et al., 2012].

Repetitive DNA sequences are prone to instability and are therefore associated with a number of disorders. Trinucleotide repeat disorders are characterized by an abnormal expansion of a trinucleotide tract within a particular gene. They can be categorized into polyglutamine (polyQ) and non-polyQ diseases. This thesis focuses mostly on polyQ diseases.

Since senescence constitutes a major risk factor, an aging human population increases the number of age-related disorders and of patients suffering from them.

1.1 Epidemiology and Societal Impact

PolyQ diseases form a unique group of nine age-associated neurodegenerative disorders. Individually, they are often categorized as rare disorders, although there is no single universally accepted definition of rare diseases [Khosla and Valdez, 2018]. Taken together, however, they constitute the third most common group of neurodegenerative disorders after Alzheimer’s disease (AD) and Parkinson’s disease (PD), with an incidence of 1-10 per 100,000 [Margulis et al., 2013]. Nevertheless, epidemiological estimates vary widely among studies. The preeminent Huntington’s disease (HD) is estimated to affect 5-6 of 100,000 people in Western European and North American populations, where the highest-quality data are available [Pringsheim et al., 2012]. Estimates are less certain for other polyQ diseases, such as spinobulbar muscular atrophy (SBMA), dentatorubropallidoluysian atrophy (DRPLA), and several spinocerebellar ataxias (SCAs). In Japan, the spinocerebellar ataxias SCA3 (Machado-Joseph disease, MJD) and SCA6 are the most commonly reported [Maruyama et al., 2002]. Apart from easier access to disease screening, the elevated prevalence in industrialized regions is also due to longer life spans. The symptomatic onset of polyQ diseases typically occurs around middle age and thus often at working age. This results in a smaller workforce while simultaneously increasing healthcare costs. The resulting economic burden is estimated to be in the range of billions of dollars per year [Lieberman et al., 2019].

In theory, genetic testing permits a definitive diagnosis at any age, but it requires compassionate communication of the test results in the context of genetic counseling [Migliore et al., 2019]. Despite these strategies, the typical post-pubertal onset of symptoms often results in undiagnosed parents unknowingly passing on their polyQ disease to their offspring.

1.2 Pathology

In polyQ diseases, an extended CAG tract in the exons of the disease-relevant genes leads to a prolonged glutamine tract in the resulting protein product. The number of glutamine repeats varies not only in affected individuals but also in healthy ones and shows disease-specific cutoffs (Table 1.1). Apart from the X-linked recessive SBMA, all polyglutamine diseases are autosomal dominant. When passed on to the next generation, these conditions frequently worsen and show an earlier onset, a genetic phenomenon known as anticipation [Ridley et al., 1988].
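As a rough illustration of the repeat-length cutoffs described above, the following Python sketch counts the longest uninterrupted CAG run in a DNA string and compares it with the normal and disease ranges listed in Table 1.1 for SBMA, HD, and DRPLA. The function names, the example sequence, and the "outside tabulated ranges" label are assumptions made for illustration and are not part of the workflow used in this thesis. The sketch also ignores the reading frame and CAA codons, which likewise encode glutamine.

```python
import re

# (normal, disease) CAG repeat ranges copied from Table 1.1; bounds are inclusive.
REPEAT_RANGES = {
    "SBMA": ((5, 34), (37, 70)),
    "HD": ((6, 35), (39, 250)),
    "DRPLA": ((7, 35), (49, 88)),
}


def longest_cag_run(dna: str) -> int:
    """Return the number of repeats in the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify(disease: str, repeats: int) -> str:
    """Place a repeat count in the normal or disease range of Table 1.1."""
    (normal_lo, normal_hi), (disease_lo, disease_hi) = REPEAT_RANGES[disease]
    if normal_lo <= repeats <= normal_hi:
        return "normal"
    if disease_lo <= repeats <= disease_hi:
        return "disease"
    # Counts between or beyond the tabulated ranges are left unclassified here.
    return "outside tabulated ranges"


if __name__ == "__main__":
    # Hypothetical toy sequence: start codon, 42 CAG repeats, one further codon.
    example = "ATG" + "CAG" * 42 + "CCG"
    n_repeats = longest_cag_run(example)
    print(n_repeats, classify("HD", n_repeats))
```

Counts that fall into the gap between the two tabulated ranges (for example 36-38 repeats for HD) are deliberately reported as unclassified, since Table 1.1 does not assign them to either group.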

Interestingly, disease-relevant protein expression is not restricted to the brain but occurs throughout the body with putative cancer-protective effects, except for skin cancer [Ji et al., 2012, Coarelli et al., 2017]. Furthermore, not all neuron populations are equally affected. Overall, certain neuron populations suffer from synaptic loss, atrophy of dendritic arborizations, abnormal axonal swellings, and irregularities of nuclear contours. Depending on the length of the polyQ tract, protein solubility is decreased, leading to cytoplasmic and intranuclear aggregate formation. Although prolonged polyQ tracts are causative for neurological dysfunction, this does not explain why certain neuronal populations are especially vulnerable. Novel disease models provide additional support for the causative role of the polyQ tract, as introducing a previously nonexistent CAG tract into the gene encoding hypoxanthine phosphoribosyltransferase (Hprt) leads to Huntington’s disease-like symptoms in transfected heterozygous and hemizygous mice [Yamamoto et al., 2000]. In addition to the mutated proteins and fragments thereof, aggregates also contain ubiquitin and ubiquitin-binding proteins, molecular chaperones and proteasome components, and intranuclear transcriptional coregulators [Lieberman et al., 2019]. Apart from aggregate formation, some polyQ domains, even below the pathological threshold, are associated with a higher risk for amyotrophic lateral sclerosis (ALS) [Elden et al., 2010].


Disease | Gene | #CAG (normal) | #CAG (disease) | Major pathology
SBMA | AR | 5-34 | 37-70 | Degeneration of lower motor neurons in spinal cord and bulbar region of brainstem
HD | Htt | 6-35 | 39-250 | Major loss of medium-sized spiny neurons of the striatum and cortical projection neurons
DRPLA | Atrophin-1 | 7-35 | 49-88 | Degeneration of brainstem, cerebellar, and deep midbrain structures
SCA1 | Ataxin-1 | 6-44 | >39 | Atrophy, gliosis, and severe loss of Purkinje cells in the cerebellum
SCA2 | Ataxin-2 | 13-33 | >31 | Severe degeneration of the Purkinje and granule cells in addition to neuronal loss and gliosis of the inferior olive and pons
SCA3 | Ataxin-3 | 12-40 | 55-84 | Degeneration of the spinocerebellar tract, brainstem, and spinal cord
SCA6 | CACNA1A | 4-18 | 19-33 | Marked cerebellar atrophy with loss of Purkinje cells and cerebellar granule neurons
SCA7 | Ataxin-7 | 4-35 | 37-306 | Degeneration of retinal photoreceptors in addition to neuronal degeneration and reactive gliosis in the cerebellar cortex, dentate nucleus, inferior olive, and pontine nuclei
SCA17 | TBP | 25-48 | 43-66 | Atrophy of the cortex, striatum, and cerebellum, with neuronal loss in the striatum and cerebellar Purkinje cell layer

Table 1.1: Summary of human CAG-polyQ expansion diseases, typical CAG repeat ranges, and major pathologies, adapted from [Stoyas and Spada, 2018]
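As an illustration of how the disease-specific cutoffs in Table 1.1 might be applied, the following minimal Python sketch counts the longest uninterrupted CAG run in a DNA string and compares it with the tabulated ranges. The function names, the dictionary layout, and the handling of the open-ended entries (">39", ">31") are assumptions made for this example only; it is not a diagnostic procedure and ignores interrupted repeats.

```python
import math
import re

# Ranges copied from Table 1.1 as (normal_min, normal_max, disease_min, disease_max).
# Open-ended disease ranges (">39", ">31") are encoded with math.inf (assumption).
CAG_RANGES = {
    "SBMA": (5, 34, 37, 70),
    "HD": (6, 35, 39, 250),
    "DRPLA": (7, 35, 49, 88),
    "SCA1": (6, 44, 40, math.inf),
    "SCA2": (13, 33, 32, math.inf),
    "SCA3": (12, 40, 55, 84),
    "SCA6": (4, 18, 19, 33),
    "SCA7": (4, 35, 37, 306),
    "SCA17": (25, 48, 43, 66),
}


def longest_cag_run(dna):
    """Return the number of repeats in the longest uninterrupted CAG run."""
    runs = re.finditer(r"(?:CAG)+", dna.upper())
    return max((len(m.group(0)) // 3 for m in runs), default=0)


def classify_repeat(disease, n_cag):
    """Place a repeat count relative to the tabulated normal/disease ranges."""
    n_min, n_max, d_min, d_max = CAG_RANGES[disease]
    in_normal = n_min <= n_cag <= n_max
    in_disease = d_min <= n_cag <= d_max
    if in_normal and in_disease:
        return "overlap"  # the tabulated ranges overlap for some diseases
    if in_disease:
        return "disease"
    if in_normal:
        return "normal"
    return "outside tabulated ranges"


if __name__ == "__main__":
    toy_sequence = "ATG" + "CAG" * 45 + "CCG"  # made-up sequence
    n = longest_cag_run(toy_sequence)
    print(n, classify_repeat("HD", n))
```

Note that the tabulated normal and disease ranges overlap for SCA1, SCA2, and SCA17, which is why the sketch reports such counts separately instead of forcing a decision.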


1.3 Drug Discovery and Treatment Options for CNS Diseases

Drug discovery is a risky and cost-intensive process. It typically takes 12-15 years from an initial idea to the marketed product. Drug development for neurological disorders faces additional challenges, as the blood-brain barrier limits the access of systemically delivered drugs to the central nervous system (CNS) [Danon et al., 2019]. Neurological disorders tend to afflict the elderly, which critically impacts drug safety and toxicity considerations. Furthermore, results from animal models of CNS diseases often fail to translate to humans [Kulkarni and Saxena, 2018].

All these factors contribute to today’s shortage of therapeutic options for neurodegenerative diseases.

Symptomatic Interventions. Although there is no cure for HD, the most common polyQ disease, approved symptomatic treatments are available [Wyant et al., 2017], meaning they provide temporary relief without treating the root cause of the disease.

The array of agents and surgical interventions that were assessed to reduce chorea associated with HD includes dopamine antagonists, benzodiazepines, glutamate antagonists, dopamine-depleting agents, antiseizure medications, cannabinoids, lithium, deep brain stimulation, and fetal cell transplantation. Additionally, patients may benefit from complementary therapies, behavioral plans, and cognitive interventions [Frank, 2013].

While their long-term benefits remain uncertain [Bonelli and Wenning, 2006], a Cochrane review in 2009 concluded that tetrabenazine (TBZ), a drug formerly used as an antipsychotic, showed clear efficacy for the control of HD-associated chorea [Mestre et al., 2009, Jankovic and Clarence-Smith, 2011]. TBZ is a reversible inhibitor of the human vesicular monoamine transporter type 2 and acts in the basal ganglia by promoting the depletion of the monoamine neurotransmitters serotonin, norepinephrine, and dopamine. Since dopamine is needed for fine motor movement, lower dopamine levels reduce hyperkinetic movement. After oral administration, TBZ is extensively metabolized in the liver and predominantly eliminated via the renal route [Kaur et al., 2016]. To this day, TBZ and its partially deuterated analogue deutetrabenazine are the only FDA-approved treatments for HD. The six deuterium atoms in deutetrabenazine enhance its pharmacokinetic profile by slowing down its cytochrome-mediated clearance via the isotope effect and thus permit a less frequent dosing regimen [Russak and Bednarczyk, 2019]. This decreases side effects associated with varying plasma levels and was found to positively impact patient adherence [Paton, 2017]. Notably, deutetrabenazine was the first deuterated drug to earn FDA approval [Anderson et al., 2017].

The adverse effects of TBZ comprise depression, which is already common in patients with HD. Therefore, all patients taking TBZ need to be closely monitored for signs of depression and suicidal ideation. Since TBZ is a dopamine-depleting drug and low dopamine levels represent a hallmark of PD, a common side effect of TBZ is drug-induced parkinsonism [Blanchet and Kivenko, 2016, Caroff et al., 2018].

Potential milestones in HD therapeutics are neuroprotective drugs that delay the onset of motor and cognitive decline, slow progression, or reverse ongoing disease pathology [Frank, 2013].

1.4 Translational Neuroscience

Advances in genetics and molecular biology have helped to shed light on the molecular mechanisms causative for pathologies of the CNS. A key benefit for patients and researchers is that empiric diagnoses can now be correlated and augmented with assays tracking changes in the levels of DNA, RNA, and even translated or post-translationally modified proteins in real time. In turn, this paves the way for bona fide therapies rather than mere symptom relief.

The resulting approaches can be classified as gene-therapy-based strategies, strategies focusing on misfolding and aggregation, or the elucidation of alternative targets (proteins, DNA, or RNA) for polyQ diseases. Recent progress was made by designing polyQ drug candidates based on antisense oligonucleotides (ASOs) [Lane et al., 2018]. These are synthetic single-stranded chains of nucleic acids that bind to a specific RNA sequence and thereby prevent translation.

The structure of the polyQ tract depends on the repeat length, as some antibodies, e.g., 1C2, preferentially bind longer polyQ tracts [Trottier et al., 1995] in a dose-dependent manner in vitro [Heiser et al., 2000]. Therefore, Takeuchi and Nagai suggested that small molecules may stabilize a unique structure and hamper the protein aggregation process. They performed a phage display screening to discover short peptides that bind to the expanded polyQ proteins with high affinity [Takeuchi and Nagai, 2017]. Interestingly, polyQ binding peptide 1 (QBP1) efficiently lowered aggregation of expanded polyQ proteins in vitro and reduced polyQ-induced cytotoxicity by preventing the β-sheet conformational transition as well as oligomer formation. While QBP1 expression in polyQ Drosophila models indicated therapeutic potential for polyQ diseases, QBP1 was found to cross the blood-brain barrier only inefficiently upon peripheral administration in mouse models [Nagai, 2003, Popiel et al., 2009].

Misfolding and subsequent polymerization of otherwise soluble proteins is a hallmark of several neurodegenerative diseases. Chaperones are specialized molecules that guide the folding process of nascent proteins. Heat shock proteins (HSPs) form a group of chaperones that can be categorized into six families: Hsp100, Hsp90, Hsp70, Hsp60, Hsp40, and small HSPs. They are present throughout the nervous system and prevent aggregation, assist refolding, and mediate solubilization of stable protein aggregates. Therefore, they have also been investigated as modulators of polyQ-induced neurotoxicity. Hsp70 is selectively responsible for the degradation of misfolded proteins and is therefore an attractive alternative target for polyQ diseases. Thus, there is considerable interest in pharmaceutical interventions that enhance Hsp70-mediated protein quality control [Davis et al., 2019].

1.4.1 TRMT2a as a Novel Target for PolyQ Diseases

An estimated 70% of all human genes linked to disease have orthologs in the genome of Drosophila melanogaster [Bier, 2005]. A similar percentage of vital genes in the fruit fly genome is furthermore involved in eye development. Therefore, fruit fly eyes permit the study of cellular function and development. Degeneration can be monitored via morphological phenotypes, e.g., rough eye phenotypes (REPs) [Jackson, 2008, Iyer et al., 2016]. Due to its well-understood genetics, fast generation cycle, and uncomplicated handling, Drosophila is an established model organism.

In a large-scale RNAi screen using a polyglutamine disease model in Drosophila, Voßfeldt et al. identified several novel modifiers of polyQ-mediated neurotoxicity [H et al., 2012]. In this screen, rough eye phenotypes (REPs) were used as a readout, see Fig. 1.1. Interestingly, RNAi-mediated silencing of the enzyme tRNA methyltransferase 2 homolog A (TRMT2a) showed the strongest suppression of polyQ-induced rough eye phenotypes in flies and effectively reduced the amount of polyQ aggregates [H et al., 2012].

Transfer RNAs (tRNAs) serve as adapter molecules in the translation of single RNA strands, linking anticodons with their corresponding amino acids. Numerous individual nucleotides of the tRNA molecule are heavily chemically modified. TRMT2a is responsible for the C-5 methylation of uridine at position 54 in the TΨC loop of tRNA, yielding m5U54, also known as ribothymidine (T). Intriguingly, the function of TRMT2a is conserved throughout eukaryotes, including its Saccharomyces cerevisiae ortholog TRM2 [Chang et al., 2019, Towns and Begley, 2012].

Aminoacyl-tRNA synthetases (aaRSs) are found in all kingdoms of life (archaea, bacteria, and eukarya). They are responsible for attaching the correct amino acid onto individual tRNA molecules, depending on their respective anticodon, and are therefore key for the faithful transmission of genetic information in all organisms [Perona and Gruic-Sovulj, 2013]. In the first step of this reaction, the relevant amino acid is activated in an ATP-dependent reaction leading to aminoacyl adenylate. Subsequently, the aminoacyl moiety is transferred to the 3’-end of the tRNA [Ibba and Soll, 2000]. Every amino acid has its own synthetase that catalyzes both steps.
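The two steps described above can be summarized in the following reaction scheme. This is the generic textbook form of the aminoacylation reaction, written here for orientation; the notation (PP_i for pyrophosphate, aminoacyl-AMP for aminoacyl adenylate) is a choice made for this sketch.

```latex
\[
\begin{aligned}
\text{amino acid} + \text{ATP} &\longrightarrow \text{aminoacyl-AMP} + \text{PP}_i
  && \text{(activation)} \\
\text{aminoacyl-AMP} + \text{tRNA} &\longrightarrow \text{aminoacyl-tRNA} + \text{AMP}
  && \text{(transfer to the 3'-end)}
\end{aligned}
\]
```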

Interestingly, this is not the case in some archaea, bacteria, chloroplasts, and mitochondria, as they lack the aaRS responsible for glutamine (glutaminyl-tRNA synthetase) [Feng et al., 2004]. Thus, the glutamine-delivering tRNA cannot be charged directly with its cognate amino acid (glutamine). In the absence of glutaminyl-tRNA synthetase, glutamyl-tRNA synthetase, which normally only links glutamate to its specific tRNA, can also load glutamate onto the tRNA specific for glutamine. When this incorrect pairing is recognized, glutamate is converted to glutamine by the enzyme Glu-tRNAGln amidotransferase.

Figure 1.1: (A) Rough eye phenotype (REP) used as a primary readout for screening. Compared to control (upper panels), eye-specific (GMR-GAL4) expression of polyQ (lower panels) induces disturbances of the external eye texture, e.g., depigmentation of the compound eye observed by light microscopy (left) and as depicted in scanning electron micrographs (middle) [Freeman, 1996]. Toluidine blue-stained semi-thin eye sections reveal that the disturbance of external eye structures is accompanied by degeneration of retinal cells (right). (B) Modification of the polyQ-induced REP by enhancers and suppressors. Vienna Drosophila RNAi Center (VDRC) transformants used to silence respective genes: CG3284 (11219), CG16807 (23843), CG15399 (19450) and CG7843 (22574). (C) Flow chart of the screening procedures to identify modifiers of polyQ-induced toxicity. (D) Brief summary of screen results. Scale bars represent either 200 µm in eye pictures or 50 µm in semi-thin eye sections. Adapted from [H et al., 2012]

In E. coli it was shown that the modification of U54 leading to ribosylthymine (i.e., 5-methyluridine) in tRNAs is of importance for the protein-synthesizing machinery [Kersten et al., 1981]. Therefore, it was suggested that the TRMT2a-mediated C-5 methylation of tRNA at position U54 is essential for faithful translation. Thus, TRMT2a-depleted cells should be more prone to amino acid mischarging and should therefore show an increased error rate during translation. Translation of an elongated CAG tract is by itself error-prone, as (i) it requires a continuous supply of correctly charged glutaminyl-tRNAGln and (ii) the same decoding step needs to be executed many consecutive times. Therefore, it is conceivable that in the absence of TRMT2a, glutamate is accidentally inserted into the polyQ tract, since the discrimination between glutamine and glutamate loading evolved relatively late.
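To make point (ii) concrete: if one assumes, purely for illustration, that each glutamine codon is mistranslated independently with a fixed probability p, then the chance that a tract of n consecutive codons contains at least one misincorporation is 1 - (1 - p)^n, which grows with tract length. The short Python sketch below evaluates this expression; the independence assumption, the function name, and the numerical error rates are placeholders chosen for this example and are not measured values.

```python
def prob_at_least_one_error(per_codon_error, n_codons):
    """Probability of >= 1 misincorporation in n independent decoding events."""
    if not 0.0 <= per_codon_error <= 1.0:
        raise ValueError("per_codon_error must lie in [0, 1]")
    if n_codons < 0:
        raise ValueError("n_codons must be non-negative")
    return 1.0 - (1.0 - per_codon_error) ** n_codons


if __name__ == "__main__":
    # Hypothetical per-codon error rates; chosen only to show the trend.
    for p in (1e-4, 1e-3):
        for n in (20, 40, 100):
            print(f"p={p:g}, n={n}: {prob_at_least_one_error(p, n):.4f}")
```

Under these assumptions, both a higher per-codon error rate and a longer tract increase the fraction of interrupted polyQ chains.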

Interrupted polyQ tracts are known to be considerably less toxic than continuous polyQ chains [Menon et al., 2013]. The lack of TRMT2a leaves tRNA unmethylated at position U54. This could lead to higher error rates during translation, e.g., glutamate insertions.

To further validate TRMT2a as an attractive novel target for polyQ diseases, it was necessary to prove that neither the enzyme itself nor its function is essential. Moreover, while GMR-GAL4 drivers were long believed to be expressed exclusively in the fly eye, more recent work indicated expression in additional tissues, such as the pupal ventral and cerebral ganglia [Ray and Lakhotia, 2015]. Therefore, TRMT2a knock-out mice were produced to understand potential adverse effects that might result from a lack of TRMT2a. In a standardized setting, more than 550 parameters were checked, covering the areas of behavior, bone and cartilage development, neurology, clinical chemistry, eye development, immunology, allergy, steroid metabolism, energy metabolism, lung function, vision and pain perception, molecular phenotyping, cardiovascular analyses, and pathology. Encouragingly, researchers at the German Mouse Clinic (https://www.mouseclinic.de/), who performed these tests, only noted a slightly but statistically significantly lower weight in TRMT2a knock-out mice. This further indicates that hampering (the function of) TRMT2a could indeed be a viable strategy to ameliorate polyQ-mediated neurotoxicity.

The primary goal of this thesis was to assess whether pharmacological inhibition of TRMT2a could lead to neuroprotective effects in polyQ disease models comparable to those of the knock-down approach. To this end, I primarily employed computer-assisted molecular modeling, machine learning-based methods, and simulation techniques.


Chapter 2

Methods: Macromolecular Modeling

Molecular modeling methods suitable for the investigation of macromolecular structures have benefited from considerable experimental advances in human genome sequencing and from increased access to structural information from X-ray crystallography, nuclear magnetic resonance (NMR), and, more recently, cryogenic electron microscopy (cryoEM). Overall, the scientific community has accumulated a vast amount of information on the sequence and structural level of these molecules, which can elucidate biological and physiopathological processes in atomistic detail. However, the structure determination of proteins lags behind, since the corresponding databases remain ∼200 times smaller than those dedicated to sequences [K et al., 2014, Hollingsworth and Dror, 2018].

Computers have played a key role not only in managing and processing experimental data but increasingly also as computational lenses into a microscopic world [Lee et al., 2009, Dror et al., 2012, Hollingsworth and Dror, 2018]. So-called in silico methods have augmented biomolecular structure elucidation (comparative modeling) and compound prioritization for in vitro testing (docking and virtual screening), and have allowed the study of molecular motions (i.e., molecular dynamics and Monte Carlo methodologies). Because biomolecular models nowadays include up to a billion atoms, they can only be managed computationally [Jung et al., 2019].

Once pathophysiological mechanisms are elucidated at the protein level, strategies can be drafted to ameliorate their negative impact, e.g., by interfering with the function of a disease-relevant protein. This often entails repurposing known compounds or de novo design of chemicals that bind to the active site of the protein responsible for the pathogenesis and hamper its negative impact on the organism. Obviously, the active site of a protein can only be explored once a three-dimensional (3D) structure is available.

Unfortunately, not all disease-relevant protein structures have been solved experimentally. The 3D structure can, however, be estimated computationally in certain cases, e.g., via comparative modeling. Depending on their quality, these models serve as a starting point for a more refined model or guide experimentalists in selecting viable protein constructs that are easier to express and purify and that eventually lead to structure determination.

Finally, a protein structure, or at least a structural model, is a prerequisite to study dynamical features of a protein, e.g., with Monte Carlo or molecular dynamics approaches. Elucidation of dynamical properties enables binding pocket discovery, which is key in challenging cases where no pockets are distinguishable on the crystallographic protein structure per se. For a small molecule to bind efficiently to a protein, it needs to form protein-ligand interactions, typically on such concave and often partially hydrophobic protein surface patches.

2.1 Comparative Modeling

Proteins are highly flexible amino acid chains and adopt their fold in nature within a few microseconds to hours [Kim, 1990, Kubelka et al., 2004]. The 3D structure of a protein is fully determined by its sequence. Levinthal’s paradox underlines that an unfolded polypeptide chain exhibits an enormous number of degrees of freedom [Durup, 1998]. Randomly sampling them, whether in vivo or in silico, would result in extremely long folding times, in stark contrast to the observations above. Thus, an atomic resolution simulation of the folding process is generally not tractable, except for very short proteins. In some cases, extensive molecular dynamics simulations and high-quality force fields revealed that it is indeed possible to monitor the folding process in atomic detail [Lopes et al., 2014].
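The scale of Levinthal’s paradox can be illustrated with a back-of-the-envelope calculation. The sketch below assumes, as in the commonly quoted version of the argument, three accessible states per backbone dihedral, two dihedrals per peptide bond, and a sampling rate of 10^13 conformations per second; all three numbers and the function name are illustrative assumptions rather than values taken from this thesis.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_estimate(n_residues, states_per_dihedral=3, samples_per_second=1e13):
    """Return (number of conformations, exhaustive search time in years)."""
    n_dihedrals = 2 * (n_residues - 1)  # phi and psi for each peptide bond
    n_conformations = states_per_dihedral ** n_dihedrals
    years = n_conformations / samples_per_second / SECONDS_PER_YEAR
    return n_conformations, years


if __name__ == "__main__":
    conformations, years = levinthal_estimate(100)
    print(f"conformations: {float(conformations):.2e}")
    print(f"exhaustive search time: {years:.2e} years")
```

For a 100-residue chain the estimated search time exceeds the age of the universe by many orders of magnitude, which is the contrast with observed folding times that the paradox points out.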

The intimate relationship between fold and function has long been noted. Proteins are able to correctly exert their biological functions only when in the folded state. Consequently, the fold is more conserved than the DNA or protein sequence throughout evolution [C and AM, 1986, Sander and Schneider, 1991, MA et al., 2000]. Intrinsically disordered proteins, for which evidence is mounting, are a noteworthy exception: lacking a stable 3D structure, their intrinsic dynamical features give rise to the protein function [AK et al., 2001].

High sequence similarity indicates common ancestry (homology), which results either from a speciation event (orthologs), where populations give rise to distinct species, a gene duplication event (paralogs), or a gene transfer event (xenologs) [Frishman and Valencia, 2008].

By exploiting this relationship, one can build structural hypotheses of proteins lacking 3D information based on their sequence identity/similarity to known protein structures. These techniques are known as comparative or homology modeling [C and AM, 1986].

Comparative modeling can be broken down into a multi-step workflow [Martí-Renom et al., 2000], where tasks are repeated until a predefined model quality is achieved:


2.1.1 Fold Assignment

A prerequisite for any comparative model is the identification of at least one similar sequence with a known 3D protein structure (template candidate).

The principal online repository for biomolecular structural information is the Protein Data Bank (PDB) [Berman, 2000]. As of April 2020, over 163,410 biological macromolecular structures have been deposited and made accessible to the public (http://www.rcsb.org). The majority of structures were elucidated with X-ray crystallography, ∼10% by nuclear magnetic resonance (NMR), and an exponentially growing number by 3D electron microscopy (cryo-EM) [Egelman, 2016]. Additional databases of interest include SCOP [Andreeva, 2004], CATH [F et al., 2005], and DALI [Dietmann, 2001].

The success of a comparative modeling approach will likely benefit from the following two observations:

i Despite the rapid expansion of the structural databases, novel folds are rarely observed. This suggests that most of the folds common in nature may have already been unveiled.

ii Since protein structure databases are regularly updated and extended, newly added information can increase the accuracy of future comparative modeling initiatives.

Sequences with higher sequence identities are more likely to assume a similar fold. Alignment length-dependent guidelines have been established and show that this relationship breaks down when sequence identity decreases [C and AM, 1986, Sander and Schneider, 1991]. Sequence-structure relationships can be characterized by three regimes: a ”safe zone”, where a sequence identity of >30% indicates easily detected relationships, a so-called ”twilight zone” [Rost, 1999], where identities range from 30% down to 10%, and finally the ”midnight zone” [Rost, 1999], which lacks a statistically significant relationship. The safe zone is sometimes further subdivided into medium quality (30-50% sequence identity) and high quality (>50%) models.
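The regimes described above can be written down as a small lookup. The following Python sketch computes a simple percent identity from two pre-aligned sequences and maps it onto the zones named in the text. Note the simplifications: the guidelines cited above depend on alignment length, which this sketch ignores, and the identity definition (matches divided by columns without gaps) is only one of several conventions. Function names and boundary handling at exactly 30% and 50% are assumptions made for this example.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_zone(identity):
    """Map a percent identity onto the regimes named in the text."""
    if not 0.0 <= identity <= 100.0:
        raise ValueError("identity must lie in [0, 100]")
    if identity > 50.0:
        return "safe zone (high quality)"
    if identity > 30.0:
        return "safe zone (medium quality)"
    if identity >= 10.0:
        return "twilight zone"
    return "midnight zone"


if __name__ == "__main__":
    pid = percent_identity("ACDE-G", "ACDQHG")  # toy alignment
    print(pid, identity_zone(pid))
```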

Moreover, sequence identity also relates to the domain of applicability of the final homology model. Models based on sequence identities below 15% often lead to wrong conclusions. At around 30% to 50%, models can be informative for unveiling reaction or binding sites and druggability, and can assist the creation of mutagenesis hypotheses. Above ∼50%, the resulting models can enable the study of protein-ligand interactions in the context of drug design [Hillisch et al., 2004]. Clearly, high-quality models are a prerequisite to assess a protein-ligand binding hypothesis. Only sequence identities close to 100% are apt to elucidate enzymatic catalysis mechanisms [B and A, 2016].

In the ”twilight zone”, the sensitivity of heuristic sequence comparison algorithms, such as FASTA [Pearson, 1994] and BLAST [Altschul, 1997], begins to deteriorate rapidly [Brenner et al., 1998]. Encoding residue type occurrences at specific positions, so-called profiles, has allowed tackling these particularly challenging cases. Here, profile-sequence alignment methods, e.g., PSI-BLAST [Altschul, 1997], or profile/Hidden Markov Model (HMM) alignments, e.g., HHSearch [Soding, 2004], infer protein similarity more reliably. In this thesis, BLAST, PFAM, and ClustalW were used.

2.1.2 Template Selection and Alignment

Template selection can take sequence identity/similarity into account when the sequence identity lies above the ”twilight zone”, using tools such as SSEARCH [Pearson, 1994], BLAST [Altschul, 1997], or FASTA [Pearson, 1994]. For cases with lower sequence identity, threading (fold recognition) can help to identify suitable templates. Here, the 3D information from all available protein structures drives template selection.

In the case of multiple template structures, more than one template may be used to inform the model. The experimental method and parameters, structure quality metrics, but also biological parameters, e.g., oligomerization state and cellular localization, have to be considered for template discrimination. To study the ligand-bound state, homology models are built with templates where a ligand has been co-crystallized or soaked into the active site.

Once a suitable template is selected, the target-template alignment may require refinement. Here it is key to incorporate all available information; in particular, conserved or key residues should assume the same positions in the alignment. Manual alignment editing, e.g., removing flanking residues and loop insertions, is a widely used technique to ensure that the alignment reflects all accessible information. Root-mean-square deviation (RMSD) values,

\[
\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} d_i^{2}} \tag{2.1}
\]

where N represents the number of (heavy, meaning non-hydrogen) atoms and d_i the distance between the i-th pair of equivalent atoms, or TM scores [J and Y, 2010] can complement alignment optimization. Producing a suitable alignment is fundamental, since an alignment error can rarely be recovered from later in the workflow [Sanchez and Sali, 1997].
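Equation 2.1 translates directly into a few lines of code. The sketch below is a minimal pure-Python version that assumes the two coordinate sets have already been superposed and list equivalent atoms in the same order; it performs no structural alignment itself, and the function name is chosen for this example.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (Eq. 2.1) between two pre-superposed, equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("at least one atom pair is required")
    # d_i^2 for every pair of equivalent atoms
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.5, 0.0)]     # toy coordinates
    template = [(0.1, 0.0, 0.0), (1.4, 0.2, 0.0), (3.1, 0.4, 0.1)]  # toy coordinates
    print(f"RMSD = {rmsd(model, template):.3f}")
```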

2.1.3 Model Building

Actual model generation is another multistep process, and various procedures have been put forward. Overall, these can be classified as follows.

i Rigid body assembly


The model is built by the assembly of rigid bodies and takes advantage of the dissection of protein structures into conserved core regions, connecting loops with higher variability, and finally, side chains decorating them.

As a typical first step, the protein core, comprising only the Cα and backbone atoms in the non-loop regions, is assembled. These parts generally comprise the most structurally rigid regions of a given protein. The Cα atoms are positioned by averaging the Cα positions in conserved regions of the template structures. The main chain model is derived from the template closest to the target. Next, initial positions for the loop backbones are introduced. Here, knowledge-based methods that leverage information from known 3D loop structures [CM et al., 1993] or energy-based methods can be used. Despite their convenience, database-derived approaches are limited because the number of possible loop conformations grows exponentially with loop length. The energy-based approach attempts to compute the loop structure ab initio via a conformational search and can be improved by clustering and by considering entropic contributions [Z et al., 2002]. Once this rigid body assembly is finalized and initial loop positions have been assigned, sidechain atoms are introduced. Residues that are identical in target and template are assumed to adopt a similar sidechain conformation [Sutcliffe et al., 1987].

ii Segment Matching

An alternative approach is based on coordinate reconstitution or segment matching. The underlying idea is that most hexapeptides cluster into a limited number of structurally distinct classes [Jones and Thirup, 1986]. Therefore, these hexapeptide structures can be used to guide an incremental construction of the target sequence.

iii Satisfaction of spatial restraints

An extensible way of obtaining a homology model is the satisfaction of spatial restraints method. Here, concepts that are used for structure determination with NMR-derived restraints are repurposed. These methods focus on homology-based restraints, where one anticipates similar distances between equivalent atom pairs in template and target. Such restraints are augmented with stereochemical restraints on bond lengths, bond angles, and dihedral angles, as well as non-bonded molecular force field terms. Every piece of information on the protein can thus be incorporated into the model as a restraint, in particular experimental observations or heuristics. A model is then built by minimizing all restraint violations [Srinivasan and Blundell, 1993, Havel and Snow, 1991].
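The idea behind restraint satisfaction can be sketched with a deliberately small toy example. The code below places three points so that their pairwise distances match target values by minimizing the sum of squared distance violations with plain gradient descent. It only uses distance restraints, has no stereochemical or force field terms, and does not reflect the optimizer of any particular modeling program; the step size, iteration count, target distances, and function names are arbitrary choices for this illustration.

```python
import math


def total_violation(coords, restraints):
    """Sum of squared deviations between current and target distances."""
    return sum((math.dist(coords[i], coords[j]) - target) ** 2
               for i, j, target in restraints)


def minimize_restraints(coords, restraints, step=0.05, n_iter=2000):
    """Reduce restraint violations by gradient descent on the coordinates."""
    coords = [list(p) for p in coords]
    for _ in range(n_iter):
        grads = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, target in restraints:
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue  # direction undefined for coincident points
            factor = 2.0 * (d - target) / d
            for k in range(3):
                g = factor * (coords[i][k] - coords[j][k])
                grads[i][k] += g
                grads[j][k] -= g
        for point, grad in zip(coords, grads):
            for k in range(3):
                point[k] -= step * grad[k]
    return coords


if __name__ == "__main__":
    # Hypothetical restraints: (atom index, atom index, target distance)
    restraints = [(0, 1, 3.8), (1, 2, 3.8), (0, 2, 6.0)]
    start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
    final = minimize_restraints(start, restraints)
    print(f"violation before: {total_violation(start, restraints):.3f}")
    print(f"violation after:  {total_violation(final, restraints):.2e}")
```

In a realistic setting the objective would combine many more restraint types, and more robust optimizers would be used; the sketch only shows how additional knowledge enters as extra terms in a single function to be minimized.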

In this thesis, Schrödinger Prime [Jacobson et al., 2004] was used for model building. Alignments were built considering both sequence and secondary structure information with Prime’s internal alignment generator program STA.


Sidechain modeling

Whereas sidechain packing is straightforward in the more hydrophobic interior of globular proteins, this is not always the case for sidechains located on the protein surface. Sidechain atoms on the surface of a protein tend to be less sterically restricted and can thus explore a larger conformational space. They also mediate crucial interactions with the solvent or other proteins. Moreover, solvent-exposed sidechains on the protein surface often drive protein-ligand interactions. A reasonable sidechain placement is therefore key to utilizing the resulting model in a drug design context.

An exhaustive enumeration of all sidechain conformations of every residue in larger proteins is computationally infeasible. Therefore, one resorts to previously compiled libraries of low-energy sidechain conformers, so-called rotamers [Dunbrack and Karplus, 1993]. Such knowledge-based methods exploit information from protein databases containing structural information. The most successful approaches leverage not only sidechain dihedral propensities but also acknowledge their respective backbone dependencies (ϕ and ψ dihedrals) [Dunbrack, 2002].

Before a mature model is achieved, model optimization can improve the model quality. Although the extrapolated structure is expected to be close to the template, small deviations in backbone and sidechain positions are nonetheless to be expected. To alleviate such problems, Monte Carlo or molecular dynamics simulations can resolve atoms that are too tightly packed or even overlap. However, when a homology model is postprocessed in this way with only weak or no restraints, it may drift away from the native state towards unfolded ones [Heo and Feig, 2017].

2.1.4 Model Evaluation and Optimization

Developing comparative modeling algorithms is a particularly active field of computational biology research, and a vast number of tools and web servers are available. The most recent methodological advancements focus on the use of machine-learned force fields for protein structure prediction [Noe et al., 2020]. Given this abundance of methods, there is a need for an objective quality ranking. This idea was first implemented in the form of the biennial Critical Assessment of protein Structure Prediction (CASP) [Moult et al., 2009], where groups worldwide compete to predict the 3D structure of novel proteins in various disciplines in silico.

Besides the evaluation of the various algorithms and their implementations, individual comparative models need to be checked as well. In a typical workflow, computationally cost-effective algorithms and higher availability of IT resources permit the generation of several thousand homology hypotheses for a target protein of interest. This is of particular importance in modeling scenarios with a sequence identity below 30%. The main sources of error are the selection of unsuitable templates, incorrect sidechain packing, wrong alignments (especially distortions or shifts in parts that are correctly aligned), and unassigned areas [Fiser, 2017]. A full error assessment requires ”internal” self-consistency checks in conjunction with ”external” information that was not used in the model building workflow. On the one hand, internal checks include stereochemistry assessment, including bond lengths, bond angles, and dihedral angles, and non-bonded atom distance deviations [Hooft et al., 1996]. On the other hand, correct template selection is assessed and unreliable regions are identified with, for instance, 1D-3D profiles using tools such as Verify3D [Eisenberg et al., 1997].

The homology model at this stage still suffers from minor inconsistencies and typically warrants several preparation steps, which are also required for experimentally derived structures, before biological insights are derived. Typical steps include, e.g., hydrogen bond network optimization and a restrained minimization to resolve atoms in too close proximity. Considering the well-known shortcomings of structural data in the PDB [Warren et al., 2012], a comparative model building approach will necessarily inherit those. After all, the accuracy and precision of any prediction cannot be better than the data that was used for the generation of the model. Therefore, a "limited trust" in the information presented in these databases and derived models is crucial [W et al., 2016].

Besides those intrinsic shortcomings, as mentioned above, an aggressive refinement might lead to the accumulation of small errors and result in an even less reliable model, e.g., due to the quality of the force field.

Finally, structural genomics is an emerging field encompassing experimental and automated modeling approaches to provide much of the structural information on biomolecules [Chance, 2002], and fully automated modeling pipeline repositories already contain impressive amounts of comparative models, e.g., SWISS-MODEL [Biasini et al., 2014] and MODBASE [Pieper et al., 2013].

Homology models for proteins with multiple domains attempt to model each domain individually. Thus, their relative position and orientation, often key for their biological activity, are difficult to infer from a bottom-up approach.

Conclusive model validation is, however, restricted to the accumulation of experimental evidence, e.g., via solving the corresponding protein crystal structure, NMR experiments or cryo-EM.

2.2 Statistical Mechanics

Experimental observables, such as temperature, pressure, and conductivity, are not directly accessible from investigating individual molecules. A rigorous mapping of properties of individual molecules to their behavior at the macroscopic level requires chemical thermodynamics and statistical mechanics.

In statistical mechanics, a key concept is the phase space (Γ) associated with the considered physical system, which represents the set of all possible states accessible to the system itself. In classical mechanics, a microscopic state of a system composed of N particles is a pair (Q, P), where Q = (q1, · · · , qN) is the collection of the position coordinates of each particle while P = (p1, · · · , pN) is the collection of the corresponding momenta.

The macroscopic state is realized as a distribution of microscopic states, or microstates. A statistical ensemble corresponds to the propensity-weighted sum of all microstates. Although it is a statistical ensemble that gives rise to the macroscopic state, the physical observables are well-defined, with mean values and variances proportional to the underlying number of particles.

The macroscopic state is the collection of all properties needed in order to determine uniquely the system itself.

In the case of an ideal gas, e.g., the collection (P, T), namely pressure and temperature, allows us to fully understand the gas through the ideal gas state equation (PV = nRT). It is worth noting that there is no one-to-one correspondence between microstates and macrostates. Once one has defined the phase space, another important concept is that of a statistical ensemble. In general terms, an ensemble is an infinite collection of replicas of our system where each replica is associated with a possible microstate in such a manner that each microstate is compatible with a given (fixed) macrostate. More precisely, to each ensemble we can associate a probability density function through which we can define the statistical properties of a system or, mathematically speaking, average over all possible microstates.

Given a property, O, we can define an associated observable Oobs that is essentially a function. We can evaluate such a function on each microstate, and so, thanks to the probability density function mentioned above, it makes sense to define the average of Oobs(Γ) over all microstates in the ensemble previously chosen. In the NVT or canonical ensemble, the number of particles, the volume, and the temperature remain constant. The probability distribution for this ensemble is the Boltzmann distribution given by:

$$
p_{NVT} = \frac{e^{-H(\Gamma)/k_B T}}{Z} \tag{2.2}
$$

where H is the Hamiltonian function associated with the system and defined by the sum of the kinetic (K) and potential (V) functions. In case one is just interested in the configurational contribution to the density function, we can express

$$
p^{C}_{NVT} = \frac{e^{-V(\Gamma)/k_B T}}{Z_C} \tag{2.3}
$$

where, respectively, we have

$$
Z_C = \int e^{-V(\Gamma)/k_B T}\, d\Gamma \tag{2.4}
$$

denoting the canonical partition function, V(Γ) the potential energy of the system and kB the Boltzmann constant. This distribution permits, in turn, the derivation of the ensemble average as follows:

$$
o_{obs} = \langle O_{obs} \rangle = \int O(\Gamma)\, p(\Gamma)\, d\Gamma \tag{2.5}
$$

where dΓ = dq1...dqN dp1...dpN is the volume element in the phase space around the microstate (q1, ..., qN, p1, ..., pN).
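As a minimal sketch of Eqs. (2.2) and (2.5), the ensemble average can be illustrated for a system with a small, discrete set of microstates, for which the phase-space integral reduces to a sum. The function names and the two-state example are illustrative assumptions and are not part of the thesis workflow.

```python
import math


def boltzmann_probabilities(energies, kT):
    """Return canonical probabilities p_i = exp(-E_i / kT) / Z for discrete microstates."""
    e_min = min(energies)  # energy shift for numerical stability; it cancels in the ratio
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)  # partition function of the shifted energies
    return [w / z for w in weights]


def ensemble_average(values, energies, kT):
    """Discrete analogue of Eq. (2.5): <O> = sum_i O_i * p_i."""
    probs = boltzmann_probabilities(energies, kT)
    return sum(p * v for p, v in zip(probs, values))


if __name__ == "__main__":
    kT = 1.0
    energies = [0.0, kT * math.log(2.0)]  # hypothetical two-state system
    print(boltzmann_probabilities(energies, kT))
    print(ensemble_average(energies, energies, kT))
```

With an energy gap of kT ln 2, the lower state is twice as populated as the upper one, which makes the weighting in the average easy to check by hand.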

In a molecular dynamics simulation, the time evolution of a system is studied. The ergodic hypothesis states that over long periods of time, the time average equals the ensemble average. Thermodynamic properties are thus estimated by averaging over adequately long trajectories. Only when the system has been sampled exhaustively and the complete phase space has been visited does the ergodic hypothesis hold, i.e., the time average equals the ensemble average of any system property. Another intriguing consequence of this hypothesis is that the bias introduced by the starting point selection vanishes.

Generally, the underlying assumption when simulating complex biological systems is that simulations are ergodic.

2.2.1 Regulation of Temperature and Pressure

$$
\dot{r}_i = \frac{p_i}{m_i}; \quad \dot{p}_i = F_i \tag{2.6}
$$

Here $r_i$ and $p_i$ denote the coordinates and corresponding momenta of N particles with their masses $m_i$ and the forces $F_i$. However, normal experimental conditions do not correspond to a microcanonical ensemble (NVE) since experiments are conducted at a given temperature (e.g., room temperature). Such conditions are modeled more closely when the above scheme is adapted for isothermal systems. Here, either volume (NVT) or pressure (NPT) is kept constant. The NVT ensemble is also known as the canonical and the NPT as the isothermal-isobaric ensemble. Isothermal and isobaric conditions are imposed by dedicated algorithms referred to as "thermostats" and "barostats", respectively.

2.2.2 Thermostats

The instantaneous or kinetic temperature is typically derived from the total kinetic energy of the system. The purpose of a thermostat is to allow minimal temperature fluctuations and thus to keep the temperature constant overall. One way to achieve this is to fix the system temperature to a chosen value $T_c$ via rescaling the velocities of each atom at every or certain timesteps by a factor $(T_c/T_r)^{1/2}$, where $T_r$ denotes the calculated temperature of the system [Woodcock, 1971]. A more subtle approach by Berendsen [Berendsen et al., 1984] involves scaling the velocity with the factor:


$$
\lambda = \left[ 1 + \frac{dT}{t_T} \left( \frac{T_d}{T_r} - 1 \right) \right]^{1/2} \tag{2.7}
$$

where $t_T$ denotes the temperature coupling constant, $dT$ the integration time step, $T_d$ the desired temperature and $T_r$ the calculated temperature. This effectively couples the system to an external heat reservoir and is thus an efficient procedure to equilibrate the system to the desired temperature at a predetermined rate set by $t_T$. Andersen proposed an algorithm based on the stochastic collision method [Andersen, 1980], thereby generating the probability density of a canonical ensemble. This is achieved by assigning a new velocity drawn from a Maxwellian distribution to a random atom at certain intervals. An alternative thermostat definition was presented by Nosé and augmented by Hoover, through the addition of two non-physical degrees of freedom ($s$ and $p_s$) that regulate the total kinetic energy fluctuations:

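$$
\dot{r}_i = \frac{p_i}{m_i}; \quad \dot{p}_i = F_i - \frac{p_s}{Q} p_i; \quad \dot{s} = \frac{p_s}{Q}; \quad \dot{p}_s = \sum_{i=1}^{N} \frac{p_i^2}{m_i} - LkT \tag{2.8}
$$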

where L is to be determined and Q controls the strength of the coupling to the thermostat. These equations conserve C, where H(p, r) denotes the physical Hamiltonian:

$$
C = H(p, r) + \frac{p_s^2}{2Q} + LkTs \tag{2.9}
$$

If C constitutes the only constant of motion, rewriting the microcanonical partition function at a temperature T leads to:

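$$
\begin{aligned}
\Omega_T(N, V, C) &= \int d^N p \, d^N r \, dp_s \, ds \; e^{3Ns} \, \delta\!\left( H(p, r) + \frac{p_s^2}{2Q} + LkTs - C \right) \\
&= \frac{e^{3NC/LkT}}{LkT} \int dp_s \, e^{-3N p_s^2 / (2Q\,LkT)} \int d^N p \, d^N r \; e^{-3N H(p, r)/LkT}
\end{aligned}
\tag{2.10}
$$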

$\Omega_T$ is proportional, in the case of L = 3N, to the canonical partition function of the system, and the parameter s regulates the fluctuation of the kinetic energy.

In this thesis, the Langevin thermostat was used. Constant simulation temperature is thus achieved by altering Newton's equations of motion. Here $\gamma_i$ denotes the friction coefficient, $f_i$ a random force with dispersion $\sigma_i$, and $\Delta t$ the timestep used to integrate the equations of motion [Loncharich et al., 1992]:

$$
\dot{r}_i = \frac{p_i}{m_i}; \quad \dot{p}_i = F_i - \gamma_i p_i + f_i; \quad \sigma_i^2 = \frac{2 m_i \gamma_i k_B T}{\Delta t} \tag{2.11}
$$

Within this approach, many atoms are treated implicitly with stochastic terms. This efficiently decreases the associated computational cost.
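The Langevin equations of motion (2.11) can be illustrated with a deliberately simple one-dimensional sketch. It uses a plain Euler-type update rather than the integrator of any production MD package, and all names, reduced units and parameter values are assumptions chosen for illustration. For a free particle, the average of p²/m should settle close to kT, which is how the thermostat fixes the temperature.

```python
import math
import random


def langevin_step(r, p, force, mass, gamma, kT, dt, rng):
    """Advance one 1D particle by one Euler-type step of Eq. (2.11)."""
    sigma = math.sqrt(2.0 * mass * gamma * kT / dt)  # dispersion of the random force
    f_rand = rng.gauss(0.0, sigma)
    p_new = p + (force(r) - gamma * p + f_rand) * dt
    r_new = r + (p_new / mass) * dt
    return r_new, p_new


def kinetic_temperature(n_steps, mass=1.0, gamma=1.0, kT=1.0, dt=0.01, seed=0):
    """Estimate kT from the time average of p^2/m for a free particle (force = 0)."""
    rng = random.Random(seed)
    r, p = 0.0, 0.0
    total = 0.0
    for _ in range(n_steps):
        r, p = langevin_step(r, p, lambda x: 0.0, mass, gamma, kT, dt, rng)
        total += p * p / mass
    return total / n_steps


if __name__ == "__main__":
    print(kinetic_temperature(200000))  # fluctuates around the target kT = 1.0
```

The estimate carries both statistical noise and a small discretization bias of order γΔt, so it only approximates the target temperature.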


Langevin thermostat implementations rely on pseudorandom numbers generated by a pseudorandom number generator (PRNG), and the resulting number sequences have different properties with respect to true random numbers. This can lead to artifacts in the case of short concatenated simulation segments [Cerutti et al., 2008].

Repeatedly perturbing the rotational degrees of freedom of sidechain atoms during a simulation is a viable way to detect local flexibility. This method is known as rotamerically induced perturbation (RIP) [Ho and Agard, 2009]. In the modified Langevin-RIP approach (L-RIP), a Langevin thermostat with a damping coefficient of 1 ps^-1 enables longer MD relaxation steps without overall protein heating [Kokh et al., 2016].

2.2.3 Barostats

When constant pressure conditions are imposed, as is typically the case in lab settings, a corresponding simulation requires a barostat. A simple way to achieve constant pressure is by adapting the simulation container volume, e.g., by isotropic volume change via rescaling the atomic coordinates [Andersen, 1980].

The Berendsen barostat [Berendsen et al., 1984] is conceptually similar to the Berendsen thermostat and was used in this thesis:

$$
\dot{p} = \frac{p_{md} - p_t}{\tau_p}; \quad \eta_t = 1 - \frac{\Delta T}{\tau_p} \gamma \,(p_{md} - p_t) \tag{2.12}
$$

Here $p_{md}$ refers to the target pressure of the MD simulation and $p_t$ to the instantaneous pressure. Again, coupling to the 'pressure bath' is regulated by a parameter, here $\tau_p$, the barostat relaxation time constant. The MD container scaling factor is given by $\eta_t$, and $\gamma$ denotes the system's isothermal compressibility; typically the value for liquid water is chosen here.

In case the coupling is too strong, the Berendsen barostat might introduce simulation artifacts. Nevertheless, this barostat is computationally cost-efficient and thus also used for demanding free energy calculations [Song et al., 2019].
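The scaling factor of Eq. (2.12) can be sketched as follows. The function names are hypothetical, and the sketch assumes that the factor scales the box volume, so that box lengths and coordinates are scaled by its cube root. The numbers in the usage example (a compressibility of 4.5e-5 per bar, a 2 fs timestep, a 1 ps relaxation time) are typical illustrative values and are not taken from the thesis.

```python
def berendsen_scaling_factor(p_target, p_inst, dt, tau_p, compressibility):
    """Scaling factor of Eq. (2.12): 1 - (dt / tau_p) * gamma * (p_target - p_inst)."""
    return 1.0 - (dt / tau_p) * compressibility * (p_target - p_inst)


def rescale_coordinates(box_lengths, positions, eta):
    """Apply an isotropic rescaling, assuming eta scales the volume (lengths by eta**(1/3))."""
    mu = eta ** (1.0 / 3.0)
    new_box = [mu * length for length in box_lengths]
    new_positions = [[mu * x for x in pos] for pos in positions]
    return new_box, new_positions


if __name__ == "__main__":
    # Instantaneous pressure (101 bar) above the 1 bar target: the box expands slightly.
    eta = berendsen_scaling_factor(p_target=1.0, p_inst=101.0, dt=0.002,
                                   tau_p=1.0, compressibility=4.5e-5)
    print(eta)
    print(rescale_coordinates([30.0, 30.0, 30.0], [[1.0, 2.0, 3.0]], eta))
```

A short relaxation time makes the factor deviate more strongly from 1 per step, which corresponds to the strong-coupling regime in which artifacts may appear.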

2.3 Molecular Dynamics Simulations

In principle, the Dirac or Schrödinger equation allows an accurate electronic structure derivation of any molecule of interest. Nevertheless, analytical solutions of the Schrödinger equation are only possible for small systems with a limited number of atoms, while systems with several hundred atoms require numerical approximations. Moreover, dynamical studies at the ab initio level are necessarily limited to short simulation times (e.g., picoseconds) with current processing power.

Clearly, computational research on larger systems with biological relevance and on processes with longer timescales is typically beyond the scope of these quantum chemical approaches. To investigate such systems computationally as well, cost-effective approximations are unavoidable:

i Atom types

Whereas valences are a result of electronic structure theory, molecular dynamics simulations require the use of atom types, where the electronic degrees of freedom are treated implicitly. Therefore, e.g., an sp2 and an sp3 carbon atom now need to be introduced separately and thus also parameterized individually.

ii Interactions

In the molecular dynamics setup, parameterized potentials replace the accurate calculation of electronic and nuclear interactions. Despite a loss in accuracy, the so-called force fields provide a cost-effective alternative, and integrating them is straightforward. In classical molecular dynamics, forces are derived from predefined potential functions, parameterized to reproduce either experimental data or quantum chemical calculations.

2.3.1 Quantum Mechanical Methods

Quantum mechanical problems in chemistry typically involve solving the non-relativistic, time-independent Schrödinger equation [Schrodinger, 1926]. This section has been adapted from my master thesis, and the reader is referred there for a more comprehensive discussion.

HΨ = EΨ (2.13)

This eigenvalue equation contains the Hamiltonian operator H, which is formed by a kinetic term and a potential term V. Ψ constitutes the wave function, which describes the system.

$$
H = -\frac{\hbar^2}{2m} \nabla^2 + V
$$

The Hamiltonian operator can be divided into a kinetic contribution, $-\frac{\hbar^2}{2m}\nabla^2$, and a potential contribution. For other systems only the potential energy operator is altered. A postulate of quantum mechanics states that all conceivable information about the system is included in the wave function. The wave function may be complex; therefore, its interpretation long remained unclear. Only $\int \Psi^{*}\Psi \, d\tau$, the so-called probability density function (PDF), can be employed for physical reasoning [Heisenberg, 1956].

Schrödinger was able to explain the hydrogen spectrum correctly, and the reason for the natural order of chemical elements could be understood for the first time. However, as mentioned earlier, this equation cannot be solved analytically in all cases. Fortunately, various approaches have been proposed to tackle those cases numerically.

The Born-Oppenheimer approximation states that, since the nuclei are much heavier than the electrons and therefore move much more slowly (the typical velocity of an electron is of the order of 10^6 m/s, compared with a nuclear velocity of about 10^-2 m/s), the nuclei can be treated as essentially static [Born and Oppenheimer, 1927]. The resulting wave function is thus composed of the product of two terms, one for the electron(s) and one for the nuclei. The nuclear wave function therefore contributes only parametrically to the overall energy of the system. This means it is added as a constant offset after the electronic Schrödinger equation is solved.

Within the framework of Schrödinger's equation, theoretical chemists have searched for means to compute the wave function cost-efficiently. Several methods emerged and can be categorized into Hartree-Fock and density functional methods.

Since the solutions of the Schrödinger equation for the hydrogen atom are known analytically, these results can be considered good starting points for systems containing more than one electron. When each atom is associated with one or several of such functions, a basis for the electronic degrees of freedom is provided. Therefore, this set is generally referred to as a basis set. Assigning basis sets to atoms is a highly non-trivial task, and several workgroups around the globe work on these problems [Schuchardt et al., 2007]. After each electron is assigned such a function or functions, solving the Schrödinger equation is the next step. However, to decrease computational effort, several Gaussian functions (often at least 3) are used to approximate the exponential basis function. Gaussian functions have several convenient mathematical properties compared to exponential functions, one being the possibility to quickly calculate multicenter integrals.

The variational principle is a cornerstone of quantum chemistry. It states that every approximate wave function will yield an energy that is at least as high as the energy of the correct ground-state wave function. So in order to optimize the wave function, a legitimate ansatz is to vary an initial (trial) wave function until the lowest possible energy is found.
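A textbook illustration of the variational principle, not taken from the thesis, is the hydrogen atom with a single Gaussian trial function exp(-αr²). In atomic units its energy expectation value is E(α) = 3α/2 − 2·sqrt(2α/π). The sketch below minimizes this expression with a simple golden-section search (the helper names are hypothetical) and shows that the optimized energy stays above the exact ground-state energy of −0.5 Hartree.

```python
import math


def trial_energy(alpha):
    """Energy (Hartree) of the hydrogen atom for the Gaussian trial function exp(-alpha r^2)."""
    return 1.5 * alpha - 2.0 * math.sqrt(2.0 * alpha / math.pi)


def golden_section_minimize(f, lo, hi, tol=1e-8):
    """Locate the minimum of a unimodal function f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while (b - a) > tol:
        if f(c) < f(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


if __name__ == "__main__":
    alpha_opt = golden_section_minimize(trial_energy, 0.01, 2.0)
    print(alpha_opt, trial_energy(alpha_opt))  # analytic optimum: alpha = 8/(9*pi)
```

The analytic minimum lies at α = 8/(9π) with E = −4/(3π), roughly −0.424 Hartree, so the single Gaussian recovers only part of the exact energy, consistent with the bound stated above.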

Relativistic effects are ignored for the sake of simplicity, and so is the spin of the electrons. This fundamental non-classical property was shown by Dirac to emerge directly from a relativistic treatment of electrons. When performing non-relativistic quantum mechanics, this effect needs to be accounted for otherwise. The Hamiltonian operator does not account for it; therefore, the wave function has to be modified accordingly. To describe it consistently, so-called Slater determinants are needed. Slater determinants fulfill all requirements of the Pauli principle, which states that exchanging two electrons must change the sign of the wave function. Two electrons that occupy the same spin orbital cause the determinant to be zero.

One can construct many different determinants from the atomic orbitals, but since only the lowest energy levels are likely to be occupied, it makes sense to use only these for a cost-efficient description. The best possible description would be to include all determinants when composing the wave function. This method is known as full configuration interaction (Full-CI). However, it is not possible to use it with a reasonable basis set for all but the smallest systems. A general observation is that for systems for which a single Lewis formula can be drawn, only a few determinants are prevalent contributors to the overall wave function. Hartree-Fock (HF) methods try to change the basis in such a way that one determinant dominates the wave function and all others are negligible. This means the wave function can be substituted, in reasonable approximation, by only a single determinant.

All contributions to the energy associated with such a wave function can be split into terms that depend on the movement of either one or two electrons. The interaction of the nuclei contributes parametrically to the energy. The attraction between one nucleus and one electron, on the other hand, depends on the position of the electron. Finally, the electron-electron repulsion and the exchange contribution due to the Pauli principle have to be considered; both depend on the coordinates of two electrons. In order to ensure that the atomic orbitals remain orthogonal, Lagrange multipliers need to be introduced. Thus, the Schrödinger equation changes its form to the Roothaan equation [Roothaan, 1951]. This equation can only be solved via an iterative process. The reason for this inconvenience is that the Fock operator depends on all occupied molecular orbitals. The procedure is therefore started with an initial guess of the electronic distribution. Depending on the quality of the guess, the coefficients converge faster or slower. Finally, once a preset threshold has been reached and the coefficients no longer change considerably, "self-consistency" has been reached and the process is stopped. However, convergence is not always guaranteed, and several methods have been developed to speed up convergence [Hamilton and Pulay, 1986].

The Hartree-Fock method works rather well to provide a rough description of the system. The remaining error is due to the restriction that only one determinant is taken into consideration. The electron-electron interactions are oversimplified, whereas exchange and Coulomb effects are described rather accurately. Comparisons with all-determinant approaches, like Full-CI, reveal that electrons have a tendency to interact more strongly than Hartree-Fock would suggest. The offset between a system's HF and Full-CI energy is called correlation energy. This difference is not a physical observable, but only results from different quantum mechanical model chemistries, which in turn account differently for the correlated movements of the electrons. A further subdivision is conventional: dynamic correlation refers to the correlation of electrons occupying the same orbital, whereas static correlation is associated with electrons in different orbitals and is prominent for systems with a distinct orbital symmetry.

Despite being almost insignificant for the system's overall energy, the correlation energy is crucial for chemical processes. "Chemical accuracy" is only achieved when the error associated with the energy is less than 1 kcal/mol [Almlof and Taylor, 1987] and normally requires methods that mitigate the abovementioned errors.

2.3.2 Empirical Force Fields

The computational study of biologically more relevant system sizes and timescales warrants the implementation of parametrized potentials that are faster to evaluate. Despite being potentials, they are commonly referred to as force fields. The system energy is split into additive bonded and non-bonded contributions, thereby reflecting the configuration of the system in terms of energetic penalties for deviations from reference states. In this thesis, I used an Assisted Model Building with Energy Refinement (AMBER) force field [Maier et al., 2015] and the associated software suite [Case et al., 2005b] for the simulation of macromolecules. The potential takes the form of:

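$$
\begin{aligned}
E_{total} &= \sum_{covalent} + \sum_{non-covalent} \\
&= (E_{bonds} + E_{angles} + E_{dihedrals})_{covalent} + (E_{vdW} + E_{coulomb})_{non-covalent} \\
&= \sum_{bonds} K_r (r - r_{eq})^2 + \sum_{angles} K_\theta (\theta - \theta_{eq})^2 \\
&\quad + \sum_{dihedrals} \frac{V_n}{2} \left[ 1 + \cos(n\phi - \gamma) \right] + \sum_{i<j} \left[ \frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} + \frac{q_i q_j}{\varepsilon R_{ij}} \right]
\end{aligned}
\tag{2.14}
$$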

All terms in this equation can be separated into contributions from covalent and non-covalent interactions. Bond lengths, angles and dihedrals are here summarized as covalent, while the last term describes non-covalent contributions.

• $E_{bonds}$: Here $r_{eq}$ describes a reference bond length and $K_r$ a coefficient altering the contribution to $E_{total}$

• $E_{angles}$: Also a harmonic potential, only in this case for the bond angles

• $E_{dihedrals}$: Is defined by the rotational barriers around single bonds

– with $\phi$ being the dihedral angle

– $n$ alters the number of described minima/maxima

– $\gamma$ is an additional phase angle

• $E_{non-covalent}$: Electrostatic $\frac{q_i q_j}{\varepsilon R_{ij}}$, exchange $\frac{A_{ij}}{R_{ij}^{12}}$ and dispersion $\frac{B_{ij}}{R_{ij}^{6}}$ terms

– with $A$ and $B$ being coefficients

– $R$ the interatomic distances and

– $q$ the charges

– $\varepsilon$ represents the dielectric constant

To describe the smaller deviations from given reference geometries for, e.g., bond lengths, a harmonic approximation can be utilized. This computational representation allows a very cost-effective description of molecular structures. Hence, using such a model, large biomolecules can be studied. Even the time evolution of a system is accessible when Newton's equations of motion are integrated. Besides neglecting 3-body potentials, charge-transfer effects and other phenomena, the quality of this model is limited by the parameters available to build it.
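The individual terms of Eq. (2.14) can be written down directly. The following sketch evaluates one bond, one angle, one dihedral and one non-bonded pair term; the function names and all numerical parameters are illustrative assumptions and do not correspond to actual AMBER parameters or to the AMBER software interface. Units are left to the caller, and the electrostatic conversion constant used by real MD packages is omitted.

```python
import math


def bond_energy(r, r_eq, k_r):
    """Harmonic bond term K_r (r - r_eq)^2."""
    return k_r * (r - r_eq) ** 2


def angle_energy(theta, theta_eq, k_theta):
    """Harmonic angle term K_theta (theta - theta_eq)^2, angles in radians."""
    return k_theta * (theta - theta_eq) ** 2


def dihedral_energy(phi, v_n, n, gamma):
    """Torsional term (V_n / 2) [1 + cos(n phi - gamma)], angles in radians."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))


def nonbonded_energy(r_ij, a_ij, b_ij, q_i, q_j, epsilon=1.0):
    """Pair term A/R^12 - B/R^6 + q_i q_j / (epsilon R)."""
    return a_ij / r_ij ** 12 - b_ij / r_ij ** 6 + q_i * q_j / (epsilon * r_ij)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to exercise each term.
    total = (bond_energy(1.12, 1.09, 340.0)
             + angle_energy(math.radians(112.0), math.radians(109.5), 50.0)
             + dihedral_energy(math.radians(60.0), 1.4, 3, 0.0)
             + nonbonded_energy(3.5, 1.0e6, 6.0e2, 0.1, -0.1))
    print(total)
```

For uncharged atoms the pair term has its minimum at R = (2A/B)^(1/6) with a well depth of B²/(4A), which offers a quick consistency check for a chosen set of coefficients.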

2.3.3 Periodic Boundary Conditions & the Minimum Image Convention

Simulations of (bio)molecular systems are carried out on one or only a few biologically relevant molecules in explicit or implicit solvent. This system is encapsulated by a container, the simulation box. The box faces can serve as walls off which the molecules bounce. However, this approach leads to artifacts since the surface to volume ratio is drastically increased for small box dimensions, leading to more solvent molecules immediately adjacent to the wall. Since water behaves differently in bulk vs. close to the surface/walls, this would result in simulation artifacts.

To avoid such finite system effects and still ensure that the water molecules remain confined to the simulation container, the simulation box (unit cell) is replicated in all directions when using periodic boundary conditions (PBC). Thus, molecules that approach a border of the container vanish and reappear near the opposite box face. This approach restricts the selection of simulation container shapes since only certain geometries achieve the tiling of space necessary for periodic boundary conditions. Convex box options, therefore, include the triclinic box, the hexagonal prism, two different dodecahedrons, and finally, the truncated octahedron. Here the shape of the molecule to be simulated in conjunction with the available computational resources can guide the selection. In three dimensions, an optimal surface/volume ratio is achieved by a spherical container at the cost of losing tessellation. A tradeoff suitable for globular proteins is a truncated octahedron, which approximates the globular structure but also has a convenient space-filling property. This simulation container geometry and the cubic box were used in this thesis.

Pair interactions can be calculated exclusively between the closest periodic images of the molecules. This is then referred to as the minimum image convention.
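For a cubic box, periodic wrapping and the minimum image convention reduce to a few lines. The following sketch uses hypothetical helper names and assumes a cubic box of edge length `box_length`; a truncated octahedron, as also used in the thesis, needs a more involved treatment that is not shown here.

```python
import math


def wrap_coordinate(x, box_length):
    """Map a coordinate back into the primary box [0, box_length)."""
    return x % box_length


def minimum_image_displacement(r_i, r_j, box_length):
    """Displacement from particle i to the closest periodic image of particle j."""
    return [(b - a) - box_length * round((b - a) / box_length)
            for a, b in zip(r_i, r_j)]


def minimum_image_distance(r_i, r_j, box_length):
    """Distance between particle i and the closest periodic image of particle j."""
    d = minimum_image_displacement(r_i, r_j, box_length)
    return math.sqrt(sum(x * x for x in d))


if __name__ == "__main__":
    # Two particles near opposite faces of a box of length 10 are close neighbors.
    print(minimum_image_distance([1.0, 1.0, 1.0], [9.0, 1.0, 1.0], 10.0))
    print(wrap_coordinate(10.5, 10.0), wrap_coordinate(-0.5, 10.0))
```

In the usage example the direct separation is 8 length units, while the closest periodic image is only 2 units away across the box boundary.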

2.3.4 Long-Range Interactions

Although PBCs provide an elegant means to calculate bulk properties, they also lead to interactions across the boundaries of the simulation box. Especially electrostatic interactions, due to their long range, pose a problem, as all interactions with infinite images need to be considered. The upper limit for the distance of two charged particles is given by half of the length of a cubic simulation box. In the Ewald sum approach, initially devised to treat ionic crystals, an artificial shielding potential around charged particles is introduced. This potential is set to 1 for small arguments and to 0 when half of the box size is exceeded. The interaction potential then has to be split into short- and long-range contributions. While the short-range term readily converges in real space, the long-range term is calculated in reciprocal space, where its convergence is swifter.

The Smooth Particle Mesh Ewald (SPME) summation [Essmann et al., 1995] provides a more computationally tractable means to calculate the electrostatic interaction. Again, the potential function is decomposed into short- and long-range terms, and the method takes advantage of the Fast Fourier Transform to achieve a complexity of O(n log n).
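The splitting of the Coulomb interaction into short- and long-range parts can be made concrete with the Gaussian screening that is commonly used in Ewald-type methods, where 1/r = erfc(αr)/r + erf(αr)/r. The sketch below only demonstrates this identity and the fast decay of the real-space part; the splitting parameter `alpha` and its value are assumptions, and neither the reciprocal-space sum nor the mesh interpolation of SPME is implemented.

```python
import math


def coulomb_split(r, alpha):
    """Split 1/r into a short-range part erfc(alpha r)/r and a long-range part erf(alpha r)/r."""
    short_range = math.erfc(alpha * r) / r  # decays quickly, summed in real space
    long_range = math.erf(alpha * r) / r    # smooth, handled in reciprocal space
    return short_range, long_range


if __name__ == "__main__":
    alpha = 0.35  # hypothetical splitting parameter in inverse length units
    for r in (1.0, 4.0, 8.0, 12.0):
        short_range, long_range = coulomb_split(r, alpha)
        print(r, short_range, long_range, short_range + long_range)
```

Because the short-range part is negligible beyond a few multiples of 1/α, it can be truncated at a real-space cutoff, while the smooth remainder is the part that benefits from the Fourier-space treatment.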

2.3.5 Neighbor’s List and SHAKE Formalism

Cost-efficient simulations take advantage of the fast decay of short-ranged non-bonded potentials by neglecting the contribution of distant molecules. Thus, maintaining a record of molecules within a certain distance cutoff, a so-called neighbor list, decreases computational time drastically. Short-ranged interactions are only calculated for molecules nearby. Over the course of a simulation, such lists need to be updated regularly, e.g., every 10 or 20 steps.
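A neighbor list can be sketched as a mapping from each particle to the particles within a cutoff plus a safety margin. The version below is a naive all-pairs construction without periodic boundaries; the function name and the `skin` margin are illustrative assumptions, and production codes typically use cell-based schemes instead.

```python
def build_neighbor_list(positions, cutoff, skin=0.0):
    """Return {i: [j, ...]} for all pairs closer than cutoff + skin (no periodicity)."""
    radius_sq = (cutoff + skin) ** 2
    neighbors = {i: [] for i in range(len(positions))}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d_sq = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            if d_sq <= radius_sq:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
    print(build_neighbor_list(coords, cutoff=1.5, skin=0.5))
```

The margin lets the same list be reused for several steps: as long as no particle has moved further than half the margin, no pair inside the cutoff can be missing from the list.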

Additional speed-up is achieved by imposing holonomic constraints during the simulation, e.g., within the SHAKE approach [Ryckaert et al., 1977]. Typically, only the bonds involving hydrogen atoms are constrained, which permits larger simulation time steps.
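The idea behind SHAKE can be sketched for a single bond-length constraint between two atoms: after an unconstrained position update, both atoms are moved along the old bond vector, weighted by their inverse masses, until the bond length is restored. The function below is a simplified illustration with hypothetical names, not the implementation of any MD package, and it handles one constraint only.

```python
import math


def shake_pair(r_i_old, r_j_old, r_i_new, r_j_new, m_i, m_j, d, tol=1e-10, max_iter=100):
    """Iteratively correct two updated positions so that their distance equals d."""
    r_i, r_j = list(r_i_new), list(r_j_new)
    ref = [a - b for a, b in zip(r_i_old, r_j_old)]  # bond vector before the update
    for _ in range(max_iter):
        s = [a - b for a, b in zip(r_i, r_j)]        # current bond vector
        diff = sum(x * x for x in s) - d * d         # constraint violation
        if abs(diff) < tol:
            break
        s_dot_ref = sum(x * y for x, y in zip(s, ref))
        g = diff / (2.0 * (1.0 / m_i + 1.0 / m_j) * s_dot_ref)
        r_i = [a - g * x / m_i for a, x in zip(r_i, ref)]
        r_j = [b + g * x / m_j for b, x in zip(r_j, ref)]
    return r_i, r_j


if __name__ == "__main__":
    # Hypothetical C-H pair: masses 12 and 1, constrained bond length 1.09.
    c_pos, h_pos = shake_pair([0.0, 0.0, 0.0], [1.09, 0.0, 0.0],
                              [0.01, 0.02, 0.0], [1.15, -0.03, 0.01],
                              12.0, 1.0, 1.09)
    print(c_pos, h_pos, math.dist(c_pos, h_pos))
```

Because the corrections are weighted by the inverse masses, the light hydrogen atom absorbs most of the displacement and the center of mass of the pair is left unchanged.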

2.3.6 Solvation and Solvent Effects

Most biological processes occur in aqueous solution, and modeling biomolecules requires accounting for solvent effects. In the folded state, globular proteins are solvated by water molecules and electrolytes. Clearly, polar and charged protein sidechains prefer a more hydrophilic environment, which, in turn, leads apolar sidechains to orient towards the more hydrophobic core of these proteins.

Water is a dielectric medium, and a popular approach to mimic aqueous solution is by adjusting the dielectric constant without explicitly assigning water molecule positions. Such methods are termed implicit or continuum solvation models and approximate the dynamic behavior of a large number of solvent molecules. A major advantage of these methods is their computational cost-effectiveness. Nevertheless, solute-solvent hydrogen bonds, which play a role in protein-protein or protein-ligand recognition, are not captured. Therefore, several explicit water models have been put forward. These can be categorized depending on several key features such as (i) the inclusion of polarizability to account for non-additive effects, (ii) fitting or interpolating energies to account for short-range effects, (iii) incorporation of monomer flexibility, (iv) accounting for quantum effects in simulations and (v) transferability and dissociable water models [Ouyang and Bettens, 2015]. Commonly used explicit water models differ in the intramolecular flexibility they permit, the number of point charges (sites) to approximate the electrostatic potential or polarization correction terms. The TIP3P model [Jorgensen et al., 1983], which I used in this thesis, and the SPC/E model [Berendsen et al., 1987] are 3-site models widely used in molecular dynamics simulations due to their computational efficiency.

2.4 Monte Carlo Protein Simulation

Monte Carlo (MC) methods provide an alternative approach to study conformational changes in proteins. A potential conformational change is typically obtained by a rotation around a dihedral angle. This change is, however, only accepted when it either decreases the potential energy of the system or when the Boltzmann factor $e^{-\Delta E/kT}$ is larger than a random number between 0 and 1. The second criterion, also known as the Metropolis criterion [Metropolis et al., 1953], permits the system to overcome energy barriers.
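The acceptance rule can be sketched in a few lines. In the usual formulation of the Metropolis criterion, a move that lowers the energy is always accepted, and an uphill move is accepted when a uniform random number in [0, 1) falls below the Boltzmann factor exp(-ΔE/kT). The function names are hypothetical and the sketch is independent of the PROFASI implementation.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng=None):
    """Return True if a trial move with energy change delta_e is accepted."""
    if delta_e <= 0.0:
        return True  # downhill moves are always accepted
    rng = rng if rng is not None else random
    return rng.random() < math.exp(-delta_e / kT)


def acceptance_fraction(delta_e, kT, n_trials, seed=0):
    """Fraction of accepted moves for a fixed energy change (illustration only)."""
    rng = random.Random(seed)
    accepted = sum(1 for _ in range(n_trials) if metropolis_accept(delta_e, kT, rng))
    return accepted / n_trials


if __name__ == "__main__":
    kT = 1.0
    print(acceptance_fraction(kT * math.log(2.0), kT, 20000))  # expected to be near 0.5
```

An uphill move of kT ln 2 is thus accepted about half of the time, which is what allows the simulation to cross energy barriers instead of being trapped in the nearest minimum.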

In this thesis, we used PROFASI, an all-atom Monte Carlo simulation package for PROtein Folding and Aggregation SImulation [Irback and Mohanty, 2006]. In this implementation, a pairwise additive potential is used, which takes the form:

E = Eloc + Eev + Ehb + Ehp (2.15)

Here, Eloc refers to the potential of the local backbone in the form of an electrostatic potential between neighboring amino acids. The remaining non-local terms include an excluded volume term Eev, a hydrogen bond term Ehb and a hydrophobic interaction term Ehp.

Conformational updates can either include individual backbone and sidechain torsional angles, or biased Gaussian steps [Favrin et al., 2001], where 7-8 adjacent backbone dihedral angles, biased towards local deformations, are turned. Finally, in the case of multichain systems, rigid-body whole-chain translations or rotations are also considered.

2.5 Computational Lead Discovery and Drug Design

2.5.1 Introduction

Pharmaceutical discovery has benefited from a broad array of chemical disciplines, which include dyes, plant extracts and cell cultures of bacteria and fungi. Nevertheless, serendipity is an important factor in the discovery of drugs.


Classical drug research relied largely on trial and error with only limited knowledge of cellular and molecular pathophysiological processes [Hol, 1986, Verlinde and Hol, 1994]. Nonetheless, several still marketed drugs have been developed using systematic searches for derivatives of active compounds. Among the most prominent examples is the invention of acetylsalicylic acid (ASA) by Felix Hoffmann.

Clearly, the high complexity of drug discovery and development stems from various reasons, including (i) the constant emergence or rapid evolution of pathogens, e.g., Zika virus outbreaks in the last decade, (ii) multifactorial diseases, e.g., cancer, (iii) addressing pharmacodynamic aspects, and (iv) drug safety considerations [Maltarollo et al., 2018].

Furthermore, the development of tissue- or organ-specific drugs presents unique challenges with respect to other disorders. As mentioned above, this is in particular the case for diseases affecting the CNS, which comprises the brain and spinal cord. Rational drug design requires a fundamental understanding of the underlying disease pathology, which we still lack for many CNS disorders, e.g., the multifaceted AD.

Finally, the implementation of more stringent guidelines, such as those provided by the Food and Drug Administration (FDA) and the European Commission, adds additional hurdles for drug approval [Norman, 2016].

2.5.2 Computer-aided Drug Design

Small molecules that bind to proteins can drastically alter the properties of the protein. They can occupy the active site and modulate further substrate processing. Alternatively, binding can occur at a site other than the active site and affect substrate binding from a distance. Such modulators are then referred to as allosteric. This modulation can take many forms, e.g. steric clashes with the substrate or even a population shift in the conformational landscape [Ban et al., 2017].

When the protein is implicated in pathological processes, known small molecule modulators are especially interesting. They serve as starting points to propose a pharmaceutical intervention. The disease-relevant protein is then commonly referred to as a “target”.

The role of computers in medicinal chemistry has changed substantially over time. Formerly, their predominant responsibility was data storage and management. Subsequently, advanced computational concepts facilitated insights via in silico modeling and simulations. This, in turn, allowed the generation of research hypotheses and their evaluation in traditional wet lab settings.

To this end, computer-assisted drug design (CADD) leverages the computational power of modern information technology infrastructure to model and simulate biomolecular systems and even to propose and optimize small molecules that modulate physiologically relevant proteins [Klebe, 2013]. When CADD strategies are embedded into a drug design process from the start, they can not only guide experiments (e.g. which residues to select for a mutagenesis experiment) but also cost-effectively integrate findings from different experiments.

Such methods played a pivotal role in many drug discovery projects, including those of the approved drugs captopril, saquinavir, ritonavir, indinavir, and tirofiban, and are now a vital part of the drug development process [Talele et al., 2010, Sledz and Caflisch, 2018].

Recent advances in computing power, in particular the general availability of fast graphics processing units (GPUs), encouraged researchers to revisit problems in CADD. Of note, machine learning techniques have received renewed interest and were adopted in numerous areas of molecular modeling to address the ever-increasing amount of research data [Siegismund et al., 2018]. Popular techniques comprise support vector machines (SVMs), artificial neural networks (ANNs), random forests (RFs) and deep learning (DL) [Chen et al., 2018]. SVMs were initially conceived by Vapnik and Chervonenkis [Vapnik, 2000] for classification problems and later extended to ranking and to predictions using a regression model [Lima et al., 2016]. These algorithms attempt to identify an optimal decision boundary between two classes. Interestingly, the resulting Lagrangian depends only on the dot product of the respective samples. Replacing this dot product with a so-called kernel function yields the dot product in a higher-dimensional space, where the classes may become linearly separable. Popular choices include linear, polynomial, sigmoid or radial basis function (RBF) kernels [Maltarollo et al., 2018]. In this thesis, I used a quadratic and a Gaussian kernel function, respectively.
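The kernel idea described above can be checked numerically. The following minimal sketch is not the code used in this thesis; it assumes a homogeneous quadratic kernel k(x, y) = (x · y)^2 on two-dimensional inputs, and all function names as well as the width parameter gamma of the Gaussian kernel are illustrative choices. It shows that the kernel value equals an ordinary dot product after an explicit mapping into a three-dimensional feature space.

```python
import math

def dot(u, v):
    """Ordinary dot product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))

def quadratic_kernel(x, y):
    """Homogeneous quadratic kernel k(x, y) = (x . y)^2 (assumed form)."""
    return dot(x, y) ** 2

def quadratic_feature_map(x):
    """Explicit 2D -> 3D feature map whose dot product reproduces the kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * |x - y|^2); gamma is illustrative."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

x = (1.0, 2.0)
y = (3.0, -1.0)

# Kernel evaluated in the original space ...
k_implicit = quadratic_kernel(x, y)
# ... and as a plain dot product in the higher-dimensional feature space.
k_explicit = dot(quadratic_feature_map(x), quadratic_feature_map(y))
```

Both routes give the same number, but the kernel never constructs the feature vectors. For the Gaussian kernel the corresponding feature space is infinite-dimensional, so only the implicit route is practical.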

2.5.3 Structure- and Ligand-based Approaches

CADD approaches are often categorized as either ligand-based or structure-based. Ligand-based CADD primarily exploits information from known binders to a protein target, but can also be applied when the target is unclear. Such approaches are especially interesting when a set of (dis)similar small molecule modulators is known. The three-dimensional chemical structures of similar molecules can be readily superposed. This permits the deduction of common physicochemical traits, e.g. hydrogen bond donors/acceptors or charged/hydrophobic moieties, and their relative spatial positions. Such a physicochemical abstraction is called a pharmacophore hypothesis.

Approaches taking the target structure into account, or structure-based CADD, present alternatives towards understanding protein-ligand binding. The discovery and development of numerous new drugs have benefited from publicly available 3D structural protein information [Westbrook and Burley, 2019]. The target structure may be obtained either from experiment (e.g. X-ray, NMR or cryoEM) or from a high-quality homology model. Generally, the target is then screened for cavities that could accommodate a small molecule or even fragments. More recently, privileged flat protein surface patches have also been investigated in structure-based CADD, as they can mediate protein-protein interactions.


Protein-Ligand Complex Prediction

Once a suitable binding site is discovered, three-dimensional small molecule conformers can be placed into this site. This 3D-jigsaw approach is known as molecular docking, or simply docking. Besides shape complementarity, charge matching and the optimization of hydrogen bonding can also guide the generation of the ligand-protein complex.

Although CADD has been highly successful in advancing small molecules into the clinic and has even led to FDA-approved drugs, it is necessary to underline several caveats. In the molecular docking approach, the target and the ligand are often treated as static entities. Experimental evidence from X-ray temperature factors and NMR solution structures, however, highlights their intrinsic dynamical features. Thus, target/ligand structures in CADD are merely snapshots. For sufficiently small ligands, quantum chemical methods help to elucidate the electronic structure. On the other hand, dynamical aspects of ligand binding can also be studied by randomly inserting a small ligand into a solvated protein simulation box and subsequently performing long molecular dynamics simulations [Shan et al., 2014, Dror et al., 2011]. Nevertheless, such a detailed treatment is generally computationally intractable.

Therefore, knowledge-based methods are employed to deduce low-energy ligand conformers and protein-ligand complex structures in an attempt to reproduce or predict bioactive conformations. The resulting ligand binding hypotheses are known as “poses”, and a score assesses the quality of each such hypothesis.

2.5.4 Scoring Functions

The major aim of a scoring function (SF) is to discard unreasonable binding modes and subsequently provide binding affinity estimates. Depending on the underlying idea, SFs can be classified as follows [Li et al., 2019]:

• Physics-based: Such approaches can incorporate non-bonded force field terms [Meng et al., 1992], solvent models, and electronic structure methods [Mucs and Bryce, 2013].

• Empirical: Here the SF is a linear combination of energetically relevant contributions to protein-ligand binding [CW et al., 1998], e.g. van der Waals interactions and hydrogen bond formation. Individual weights are obtained by fitting to training data sets.

• Knowledge-based: Using a large set of known three-dimensional protein-ligand complexes allows the derivation of pairwise potentials [Gohlke et al., 2000].

• Machine-learning based: This class of SFs employs support vector machines (SVMs) [Zhang et al., 2017], random forests (RFs) [Wang and Zhang, 2016], artificial neural networks [Durrant and McCammon, 2011] and deep-learning algorithms [Stepniewska-Dziubinska et al., 2018]. Although they outperform traditional approaches, they require sufficiently large training data sets.
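The statement that an empirical SF is a weighted sum of interaction terms, with weights fitted to training data, can be illustrated with a minimal sketch. The two interaction terms, the synthetic "training" values and all function names below are hypothetical and serve only to show the fitting step; real empirical SFs use more terms and experimental affinities.

```python
def empirical_score(weights, term_values):
    """Score as a linear combination of interaction terms."""
    return sum(w * t for w, t in zip(weights, term_values))

def fit_two_weights(terms, affinities):
    """Least-squares fit of (w1, w2) in affinity ~ w1*t1 + w2*t2,
    solved through the 2x2 normal equations."""
    s11 = sum(t1 * t1 for t1, _ in terms)
    s12 = sum(t1 * t2 for t1, t2 in terms)
    s22 = sum(t2 * t2 for _, t2 in terms)
    b1 = sum(t[0] * a for t, a in zip(terms, affinities))
    b2 = sum(t[1] * a for t, a in zip(terms, affinities))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("interaction terms are linearly dependent")
    w1 = (b1 * s22 - s12 * b2) / det
    w2 = (s11 * b2 - s12 * b1) / det
    return w1, w2

# Synthetic complexes: (van der Waals contact term, number of hydrogen bonds).
training_terms = [(10.0, 1.0), (20.0, 0.0), (15.0, 3.0), (5.0, 2.0)]
# Synthetic affinities generated from known weights, so the fit can be checked.
true_weights = (-0.2, -1.5)
training_affinities = [empirical_score(true_weights, t) for t in training_terms]

fitted_weights = fit_two_weights(training_terms, training_affinities)
new_pose_score = empirical_score(fitted_weights, (12.0, 2.0))
```

Because the synthetic affinities are noise-free, the fit recovers the generating weights. With experimental data the residuals would not vanish, and the quality of the SF would depend on the size and diversity of the training set.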


In this thesis, the following empirical scoring functions were used:

• Chemgauss3: Gauges shape/charge complementarity and hydrogen bonding efficiency between ligand and protein as well as with implicit solvent. Moreover, metal-chelator interactions are taken into account.

• Chemgauss4: Constitutes a modification of the Chemgauss3 scoring function with improved hydrogen bonding (networking) and metal-chelator terms.

• GlideScore: Estimates ligand binding free energies but also includes force field terms. These force field terms are weighted more strongly when multiple poses of the same ligand are scored with respect to each other, leading to the so-called Emodel SF [Friesner et al., 2006].

Conformer and Pose Generation

In this thesis, we use the knowledge-based OpenEye OMEGA [Hawkins et al., 2010] to generate conformer libraries of small molecules. Here, fragment templates are assembled along sigma bonds, invertible nitrogens are enumerated and dihedrals are rotated. High-energy conformers and conformers that are too similar to each other are discarded based on symmetry or RMSD considerations.

For a given small molecule, the corresponding conformer ensemble generated by OMEGA is then placed into a protein cavity with OpenEye FRED [McGann, 2011] in a two-step workflow:

I Exhaustive Search:

i Enumeration of all putative translations and rotations of each conformer.

ii Removal of poses that are too close or far from the protein.

iii Discard all poses that do not match predefined constraints.

iv All remaining poses are scored with the preliminary Chemgauss3 function.

v Ranking the poses according to the score and retaining the best ones.

II Optimization:

i Initial poses are rotated and translated at half the step size of the exhaustive search.

ii Scoring with the refined Chemgauss4 function and retaining high-scoring poses.

iii The overall score is used to rank the remaining molecules with respect to each other.
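The logic of this two-step workflow can be sketched on a toy problem. The code below does not call FRED and does not implement Chemgauss3 or Chemgauss4: a pose is reduced to one translation and one rotation angle, and both score functions are arbitrary stand-ins (lower is better). All names, ranges and step sizes are assumptions made for illustration.

```python
import itertools
import math

def coarse_score(x, angle):
    """Stand-in for a fast preliminary score (lower is better)."""
    return (x - 1.3) ** 2 + (1.0 - math.cos(math.radians(angle - 40.0)))

def refined_score(x, angle):
    """Stand-in for a more detailed score used during optimization."""
    return coarse_score(x, angle) + 0.1 * abs(x - 1.3)

def exhaustive_search(x_step, angle_step, x_range=(-3.0, 3.0), keep=5):
    """Enumerate translations and rotations, filter, score and keep the best."""
    n_x = int(round((x_range[1] - x_range[0]) / x_step)) + 1
    xs = [x_range[0] + i * x_step for i in range(n_x)]
    angles = [i * angle_step for i in range(int(round(360.0 / angle_step)))]
    # Discard poses that lie too far from the (hypothetical) binding site.
    poses = [(x, a) for x, a in itertools.product(xs, angles) if abs(x) <= 2.5]
    poses.sort(key=lambda p: coarse_score(*p))
    return poses[:keep]

def optimize(poses, x_step, angle_step):
    """For each pose, test neighbours one half-step away and keep the best."""
    improved = []
    for x, a in poses:
        candidates = [(x + dx * x_step / 2.0, a + da * angle_step / 2.0)
                      for dx in (-1, 0, 1) for da in (-1, 0, 1)]
        improved.append(min(candidates, key=lambda p: refined_score(*p)))
    return sorted(improved, key=lambda p: refined_score(*p))

coarse_poses = exhaustive_search(x_step=1.0, angle_step=60.0)
refined_poses = optimize(coarse_poses, x_step=1.0, angle_step=60.0)
best_pose = refined_poses[0]
```

Since every starting pose is among its own refinement candidates, the optimization stage can only keep or improve the refined score of a pose. The final ranking therefore reflects the refined function, while the cheap function merely decides which poses survive the first stage.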


2.5.5 High-Throughput Screening and Virtual Screening

Clearly, there is an interest in probing not only one but several compounds for their ability to form a protein-ligand complex. Pharmaceutical companies might even want to consider their entire compound libraries. Traditionally, this was only achieved in wet lab settings by means of extensive manual labor. Such efforts were later facilitated through automation in high-throughput screening (HTS) approaches. The specifications of HTS inherently require a high degree of reliability and robustness. Moreover, it is important that the process is rapid and easily monitored. While HTS delivers assay data extremely quickly and at low volumes, the focus often lies only on single targets, without obtaining information regarding therapeutic effects. Here, phenotypic screening allows compounds to be tested for their actual biological impact without the need to identify a single target [Prior et al., 2014], but at the cost of lower throughput. Swinney and Anthony showed in a retrospective study that the contribution of phenotypic screening to the discovery of first-in-class small-molecule drugs exceeded that of target-based approaches [Swinney and Anthony, 2011].

The primary goal of HTS is not to uncover a drug that can be immediately marketed, but rather a “hit”, that is, a compound which binds to the target and has a statistically meaningful activity with respect to other library compounds. To qualify as a hit, the active compound needs to show a dose-dependent response. If this hit has the desired pharmacodynamic and pharmacokinetic properties and meets project-specific requirements, a decision is made whether to proceed with it. If so, the compound becomes a “lead” [Deprez-Poulain and Deprez, 2004]. Typically, not a single lead but a lead series is then investigated. This involves preliminary structure-activity relationships (SAR). Considerable effort is then devoted to optimizing the lead(s). This iterative process includes the synthesis and full characterization of the drug candidate. During this phase, medicinal chemists alter the absorption, distribution, metabolism, excretion and toxicological (ADMET) profile of the drug candidate, e.g. by replacing problematic functional groups or altering the scaffold. If metabolic processing is sufficiently understood, the lead can be evolved into a prodrug, which requires metabolic processing to become pharmacologically active.

While HTS remains the gold standard for hit discovery, there has been considerable interest in augmenting HTS with CADD approaches. On the one hand, ligand-based methodologies can efficiently guide a preselection of compound libraries, thus effectively reducing HTS costs. On the other hand, there are structure-based virtual screening (SBVS) and de novo design [Lionta et al., 2014]. In SBVS, compound libraries are docked into the target protein and scoring functions aim to reveal hits. Besides speed and relatively low cost, a key advantage of virtual screening (VS) is that compounds need only exist virtually to be assessed in silico. This idea is taken to the extreme in the de novo design approach, where a putative hit is built from scratch in a possible target binding site.

The outcome of VS approaches highly depends on the quality of experimental data, the appropriate choice of algorithms to design the screening library, the target structure preparation and the docking/scoring method [Forli, 2015]. In particular, the accurate in silico handling of tautomers and protonation states, and of their dependence on pH and temperature, is a problem, especially for larger libraries where individual molecules can no longer be inspected visually. To balance computational speed and accuracy, such libraries typically require a rule-based tautomer prediction, which relies primarily on connectivity information rather than electronic structure [Watson et al., 2019]. Moreover, when the three-dimensional structure is not taken into consideration, stabilizing intramolecular hydrogen bonds may go unnoticed.

Once the formal charge of all library molecules corresponding to their protonation states is determined, each atom is assigned an individual partial charge. This charge distribution can be based on quantum mechanical, semi-empirical, force-field or fully empirical methods [Forli, 2015].

At this point, it is often necessary to reduce the computational workload by trimming the compound library. A popular filter is the “rule of 5” put forward by Lipinski et al. [Lipinski et al., 2001], who noted physicochemical commonalities in a set of known drugs. Orally active drugs should not violate more than one of the following criteria:

• No more than 5 H-bond donors

• No more than 10 H-bond acceptors

• The molecular mass should not exceed 500 daltons

• A (predicted) octanol-water partition coefficient (logP) below 5

In this thesis, I used these heuristic guidelines, since I was interested in an orally available drug and wanted to keep the in silico library manageable.
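As a minimal sketch of how such a filter might be applied to a library, the four criteria listed above can be encoded directly. The descriptor values are assumed to have been computed beforehand by a cheminformatics toolkit; the class, the function names and the two example compounds are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    """Precomputed descriptors of one library compound."""
    h_bond_donors: int
    h_bond_acceptors: int
    molecular_mass: float  # in daltons
    log_p: float           # predicted octanol-water partition coefficient

def rule_of_five_violations(d):
    """Count how many of the four criteria listed above are violated."""
    checks = [
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
        d.molecular_mass > 500.0,
        d.log_p >= 5.0,  # the criterion above requires logP below 5
    ]
    return sum(checks)

def passes_rule_of_five(d, max_violations=1):
    """Accept compounds that violate at most one criterion."""
    return rule_of_five_violations(d) <= max_violations

# Two made-up compounds, for illustration only.
small_polar = Descriptors(h_bond_donors=2, h_bond_acceptors=5,
                          molecular_mass=320.4, log_p=2.1)
large_greasy = Descriptors(h_bond_donors=1, h_bond_acceptors=12,
                           molecular_mass=710.9, log_p=6.3)

library = {"small_polar": small_polar, "large_greasy": large_greasy}
kept = [name for name, d in library.items() if passes_rule_of_five(d)]
```

The tolerance of one violation follows the wording of the rule; a stricter filter would set max_violations to zero.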

The second key component of SBVS is a suitable protein target. Although high-resolution X-ray structures are preferable for this line of research, they can rarely be used without detailed preparation. In fact, an analysis of a set of 728 structures, taken from four published receptor-ligand structural databases, revealed that more than 20% required curation [Warren et al., 2012]. Regions with unassigned electron density require special attention and even comparative modeling. Alternative sidechain positions and temperature factors (also known as B-factors) provide first hints towards flexible segments.

Apart from the protein, X-ray structures often contain other molecules, such as buffers or agents used to induce crystal formation. If these are unlikely to be present under physiological conditions, they should be removed. Structurally important water molecules or ions that are tightly bound to the target should, however, be retained [Blum et al., 2010].


Finally, the hydrogen-bond network of the target is optimized, since it is not normally evident from the crystal structure. In this thesis, we used the heuristic PROPKA approach [Olsson et al., 2011].

Once the target is prepared, it is necessary to account for its flexibility, as proteins exist in a variety of conformational states distributed in a complex energy landscape [Fenimore et al., 2004]. Individual conformations can be associated with active/inactive states with biological relevance.

Furthermore, these states can be stabilized or destabilized by small molecule binding. When binding occurs in the active site of the target and causes small rearrangements in its vicinity, such a binding event is called “induced fit”. Accordingly, when the target structure is available in a ligand-bound (holo) state, SBVS with this structure performs better than with a ligand-unbound structure [McGovern and Shoichet, 2003].

Numerous docking algorithms account for small binding site changes, e.g. by considering alternative rotamers in the active site, as in the Induced Fit Docking (IFD) protocol [Sherman et al., 2006, Sotriffer, 2011]. The underlying assumption is that sidechain reorganization occurs on a much faster timescale than changes in the protein backbone.

Binding Site Detection and Selection

In an SBVS, a reasonable representation of the binding site is crucial; it typically comprises only residues, conserved ions or cofactors in or near the active site. When the boundaries of this representation, or receptor, encompass a larger portion of the protein, the performance of the screening deteriorates rapidly.

When an apo structure is considered for SBVS, binding site detection methods permit a fast assessment of putative binding sites. Nevertheless, even for apo structures, restricting the receptor boundaries to the active site may not be desirable. Apart from shape complementarity, other physicochemical properties of the compound library to be docked have to guide the binding site discovery. In this thesis, we used the following grid-based approaches:

• SiteMap [Halgren, 2009] differentiates “inside” from “outside” grid points. The enclosure of outside points is determined by sampling the fraction of radial rays that strike the protein surface within a cutoff, e.g. 8 Å; the points are subsequently clustered. Contour maps are then drawn for the hydrophilic and hydrophobic regions, together with a surface map indicating the overall space available for a ligand. The hydrophilic map is decomposed into hydrogen bond donor and acceptor maps. Putative sites are then scored by counting their respective site points, but also by their exposure, enclosure, contacts with the protein and overall hydrophobic/hydrophilic character. A weighted score of these values is reported as SiteScore, which is used to identify and rank binding sites:


\[ \mathrm{SiteScore} = 0.0733\,n^{1/2} + 0.6688\,e - 0.2\,p \tag{2.16} \]

Here n represents the number of site points (max. 100), e the enclosure score, and p the capped hydrophilic score.

• DoGSiteScorer [Volkamer et al., 2012a]: After spanning a grid around the protein, a difference of Gaussians filter [Bomans et al., 1990] identifies protein surface positions where a spherical ligand might bind (see Fig. 2.1). Using a density threshold, these positions are clustered and combined into putative binding pockets. To calculate pocket volume and surface, the relevant grid points forming the pocket are counted and multiplied by the grid box volume or surface. The pocket enclosure is computed as the ratio between pocket hull and surface grid points.

• FPocket [Schmidtke et al., 2010] uses Voronoi tessellation and α-spheres to characterize the protein surface. Here, α-spheres are spherical probes that serve as exclusion volumes while in contact with four protein atoms at their boundaries.
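The difference of Gaussians filter used by DoGSiteScorer can be illustrated in one dimension. The sketch below is only meant to show the shape of the filter; the actual method operates on a three-dimensional grid around the protein, and the two widths chosen here are arbitrary assumptions.

```python
import math

def gaussian(x, sigma):
    """Normalized one-dimensional Gaussian centred at zero."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def difference_of_gaussians(x, sigma_narrow, sigma_wide):
    """Narrow Gaussian minus wide Gaussian."""
    return gaussian(x, sigma_narrow) - gaussian(x, sigma_wide)

# Sample the filter on a small grid from -6 to 6 (arbitrary units).
grid = [i * 0.5 for i in range(-12, 13)]
profile = [difference_of_gaussians(x, sigma_narrow=1.0, sigma_wide=2.0) for x in grid]

centre_value = profile[len(profile) // 2]  # value at x = 0
flank_value = difference_of_gaussians(3.0, 1.0, 2.0)
```

The filter is positive at the centre and negative on the flanks, so it responds most strongly to features whose size lies between the two widths. This band-pass behaviour is what makes it suitable for picking out sphere-like cavities of a given size.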

However, when the active site is promiscuous, e.g. when it is known to bind several molecules or when a ubiquitous cofactor is bound there, blind docking [Hetenyi and van der Spoel, 2009], in which the whole protein is included in the receptor definition, can reveal alternative binding sites or hotspots. Such hotspots are protein areas with substantial contributions to the binding free energy [DeLano, 2002]. Moreover, they are less sensitive to conformational changes in the binding site and hence present viable starting points for lead discovery. The concomitant computational overhead of blind docking can, however, be avoided when small fragments are mapped instead of putative druglike molecules. A physicochemically diverse set of fragments is then required as a proxy for substructures found in the compound library. Regions where similar fragments cluster are then reported as putative binding hotspots. A popular implementation of this idea is FTMap [Kozakov et al., 2015], which was used in this thesis. Here, a chemically diverse set of 16 small molecules is used as probes: ethanol, isopropanol, isobutanol, acetone, acetaldehyde, dimethyl ether, cyclohexane, ethane, acetonitrile, urea, methylamine, phenol, benzaldehyde, benzene, acetamide, and N,N-dimethylformamide. After blind docking in Fourier space, low-energy probes are clustered and reported as consensus cluster (CC) sites. These sites are subsequently ranked by the number of probes they contain.
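The final ranking step can be sketched as follows. This is a strongly simplified illustration and not the FTMap algorithm: probe positions are invented, a greedy distance-based clustering with an assumed radius replaces the actual clustering procedure, and the function name is hypothetical.

```python
import math

def consensus_sites(probe_positions, radius=4.0):
    """Greedy clustering: a probe joins the first site whose seed position
    lies within `radius`; otherwise it seeds a new site. Sites are returned
    ranked by the number of probes they contain."""
    sites = []
    for name, position in probe_positions:
        for site in sites:
            if math.dist(site["seed"], position) <= radius:
                site["probes"].append(name)
                break
        else:
            sites.append({"seed": position, "probes": [name]})
    return sorted(sites, key=lambda s: len(s["probes"]), reverse=True)

# Invented low-energy probe positions (coordinates in arbitrary units).
probes = [
    ("ethanol", (0.0, 0.0, 0.0)),
    ("urea", (10.0, 10.0, 10.0)),
    ("acetone", (1.0, 0.5, 0.0)),
    ("ethane", (20.0, 0.0, 0.0)),
    ("benzene", (0.5, 1.0, 1.0)),
    ("phenol", (10.5, 9.0, 10.0)),
]

ranked_sites = consensus_sites(probes)
site_sizes = [len(site["probes"]) for site in ranked_sites]
```

In this toy example three different probes accumulate around the first seed, which therefore becomes the top-ranked site. The agreement of chemically different probes on one location is what marks it as a putative hotspot.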

A major shortcoming of these approaches is that they are typically applied to a single X-ray structure. The treatment of protein flexibility is restricted to sidechain movements, e.g. by swapping Dunbrack rotamers in the previously outlined Induced Fit Docking approach.

A more involved protein flexibility assessment for hotspot mapping is provided by mixed-solvent molecular dynamics simulations. Mixed-solvent refers here to the presence of non-water molecules in the simulation box. In the simplest form, this can be a binary aqueous mixture with an organic solvent that is miscible with water at low concentrations. Such molecules, depending on their physicochemical properties, serve as probes that mimic ligand binding. Where certain probes cluster, the corresponding region might be suitable as a starting point for drug discovery [Xiao et al., 2016]. Moreover, fully embracing protein flexibility by performing a mixed-solvent MD also provides insights into how binding affects the conformational landscape of a protein. To this end, we applied Schrödinger MixMD with binary mixtures consisting of either isopropanol, acetonitrile or pyrimidine in varying concentrations.

Figure 2.1: Two Gaussian functions with different σ are drawn in black lines. The difference of the two Gaussian functions and the Laplacian of a Gaussian with corresponding σ are plotted in red and green, respectively. From [Volkamer et al., 2010].

2.5.6 Druggability

As mentioned above, a prerequisite for a target to be suitable for SBVS is its ability to be modulated by a small molecule that binds to it with high affinity. This target property is known as ‘druggability’. Strikingly, only about 10% of the human genome represents druggable targets [Owens, 2007]. Moreover, not all druggable targets are implicated in human disease.


There are several ways to infer druggability. The most reliable druggability classification of a new target is made by querying the target gene family for known binders. If binders to similar proteins have been reported, this increases the chances that the target is druggable as well.

The obvious limitation of this precedence-based approach is, however, that only known protein families will be investigated for their druggability.

NMR-based screening of a set of small compounds or fragments is a powerful experimental technique to evaluate druggability [Hajduk et al., 2005, Kozakov et al., 2015]. Such biophysical assessments do not rely on parameterization but demand large quantities of protein with sufficiently high purity [Volkamer et al., 2012b].

Computational Prediction of Druggability

Although binding site residues tend to be conserved throughout evolution, they constitute only a small portion of the protein sequence. To further complicate sequence-based detection, their respective positions are frequently scattered. Consequently, precedence-based methods are very sensitive to the quality of the multiple sequence alignment and to the template selection [Tseng and Li, 2011].

When structural information is available, druggability can also be predicted with CADD. The presence of concave protein patches or even cavities is the first indication of chemical tractability. In fact, the curvature and the lipophilic surface area were found to correlate with the maximal affinity achievable by passively absorbed drugs [Cheng et al., 2007]. These parameters can further be used to classify targets as “undruggable”, “difficult” or “druggable”. Hopkins and Groom estimated that only 15% of proteins reveal an adequately deep active site pocket evident from their crystal structure [Hopkins and Groom, 2002]. Moreover, concave patches can be compared with similar sites on other proteins with respect to their physicochemical and geometrical properties in order to predict whether they are druggable. In fact, proteins with similar active sites frequently fulfill similar biological roles [Shulman-Peleg et al., 2004]. Sheridan et al. even identified a “pocket space” by analyzing putative pockets available from the Protein Data Bank (PDB) in terms of volume, buriedness, and hydrophobicity [Sheridan et al., 2010].

Clearly, the problem of a computer-aided druggability evaluation is closely related to binding pocket discovery. This is especially evident for SiteScore, where a corresponding druggability score (Dscore) is obtained by reweighting [Halgren, 2009]. We used SiteScore in this thesis. In Dscore, the number of site points (n) is capped at 100, while the hydrophilic term is not, effectively characterizing polar sites as less druggable:

\[ \mathrm{Dscore} = 0.094\,n^{1/2} + 0.60\,e - 0.324\,p \tag{2.17} \]
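To make the effect of the uncapped hydrophilic term concrete, Eqs. 2.16 and 2.17 can be evaluated side by side. The sketch below is not part of SiteMap; the function names and input values are illustrative, and the cap of 1.0 applied to the hydrophilic score in SiteScore is an assumption, since the text only states that this score is capped.

```python
import math

def site_score(n_site_points, enclosure, hydrophilic, hydrophilic_cap=1.0):
    """SiteScore according to Eq. 2.16; the cap value of 1.0 is an assumption."""
    n = min(n_site_points, 100)
    p = min(hydrophilic, hydrophilic_cap)
    return 0.0733 * math.sqrt(n) + 0.6688 * enclosure - 0.2 * p

def d_score(n_site_points, enclosure, hydrophilic):
    """Dscore according to Eq. 2.17; the hydrophilic score is not capped."""
    n = min(n_site_points, 100)
    return 0.094 * math.sqrt(n) + 0.60 * enclosure - 0.324 * hydrophilic

# Two hypothetical sites that differ only in their hydrophilic score.
moderately_polar = dict(n_site_points=100, enclosure=0.7, hydrophilic=1.0)
very_polar = dict(n_site_points=100, enclosure=0.7, hydrophilic=1.5)

site_scores = (site_score(**moderately_polar), site_score(**very_polar))
d_scores = (d_score(**moderately_polar), d_score(**very_polar))
```

Under these assumptions both sites obtain the same SiteScore, because the hydrophilic contribution saturates at the cap, whereas the Dscore of the more polar site is lower. This reproduces the statement that Dscore characterizes polar sites as less druggable.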

An alternative way to gauge druggability, using a support vector machine approach, was put forward by Volkamer et al. [Volkamer et al., 2012a], which includes the subdivision of a discovered pocket into subpockets. This facilitates the discovery of additional, e.g. allosteric or fragment, binding sites.

Although many methods permit the detection of putative binding sites and a follow-up druggability ranking, the final selection of which pocket to target cannot always be made purely on the basis of these scores. In ambiguous cases, pocket volume has been found to be a good active site predictor [Sotriffer and Klebe, 2002]. Moreover, fragment-based blind docking typically yields numerous pockets, due to the many cavities and protrusions that line the protein surface. In the docking study by Vass et al., such fragments were successfully linked, and the resulting compounds achieved up to 55-fold selectivity in favor of the dopamine D3 receptor [Vass et al., 2014].

Most CADD druggability metrics are limited not only by their exclusive analysis of static X-ray structures, but also by their dependence on parameterization from known target-ligand interfaces. Although these methods are not computationally demanding, they are sensitive to alternative sidechain orientations reported in high-quality crystallographic structures. Especially when binding site flexibility is evident from crystallographic B-factors or NMR experiments, the predictive power of these methods using a single snapshot is questionable. To this end, biophysics-based computational approaches that draw inspiration from NMR-based fragment screening or crystallographic soaking offer tractable and reliable alternatives to computational approaches relying purely on heuristics [Kozakov et al., 2015]. In particular, the aforementioned fragment screening by FTMap [Brenke et al., 2009] and mixed-solvent molecular dynamics simulations enable a more tractable computational druggability assessment from first principles [Seco et al., 2009, Ghanakota and Carlson, 2016].

Towards Embracing Binding Site Flexibility

Dynamical aspects of the targets, and in particular of binding sites, are more difficult to estimate. Nevertheless, they not only enable a more detailed understanding of molecular recognition, and therefore better-tailored screening libraries, but also provide means to tune specificity and selectivity.

The first model for protein-ligand binding was put forward by Fischer in 1894, when he proposed its analogy to a lock-and-key system [Fischer, 1894]. Since then, our understanding has evolved, and the “induced fit” model by Koshland [Koshland, 1958] introduced the notion that the binding site can rearrange in the presence of a suitable ligand (Fig. 2.2a). Later, the “conformational selection” model [Ma et al., 1999] suggested that in proteins the conformational change required to accommodate the ligand occurs prior to the binding event (Fig. 2.2b).


Figure 2.2: (a) In induced-fit binding, the change between the conformations P1 and P2 of the protein occurs after binding of the ligand L. The intermediate state P1L relaxes into the bound ground state P2L with rate kr, and is excited from the ground state with rate ke. (b) In conformational-selection binding, the conformational change of the protein occurs prior to ligand binding. The intermediate state P2 is excited from the unbound ground state P1 with rate ke, and relaxes back into the ground state with rate kr. Adapted from [Paul and Weikl, 2016].
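The two binding pathways described in the caption of Figure 2.2 can be written as kinetic schemes. The notation below is a sketch that uses only the states and the rates ke and kr named in the caption; rate constants for the ligand binding and unbinding steps are not given there and are therefore left unlabeled.

```latex
% (a) Induced fit: ligand binding precedes the conformational change.
\[
P_1 + L \;\rightleftharpoons\; P_1L
\;\underset{k_e}{\overset{k_r}{\rightleftharpoons}}\; P_2L
\]

% (b) Conformational selection: the conformational change precedes binding.
\[
P_1 \;\underset{k_r}{\overset{k_e}{\rightleftharpoons}}\; P_2,
\qquad
P_2 + L \;\rightleftharpoons\; P_2L
\]
```

In both schemes the bound ground state is P2L. The schemes differ only in whether the conformational transition takes place in the bound or in the unbound protein.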

2.5.7 Transient Sites

Not all proteins of pharmaceutical interest feature a distinct druggable binding site evident from either a crystal structure or a homology model. Moreover, some binding sites can be shallow or solvent-exposed, while others reveal subpockets interconnected only via so-called “bottlenecks”. Although such binding sites can be involved in protein-peptide interactions and thus be druggable in principle, targeting them with low molecular weight compounds is challenging and may require alternative chemical libraries, e.g. macrocycles [Ermert, 2017].

Furthermore, active sites often bind cofactors relevant to numerous other physiological pathways. When a ligand is designed to mimic such a cofactor without achieving high selectivity, off-target effects may hamper the development of this drug candidate.

Studying the dynamical behavior of these sites and discovering allosteric sites can provide viable starting points for drug discovery. Moreover, the in silico prediction of a transient site in HIV integrase and its subsequent experimental verification led to approved inhibitors against HIV infection [DJ et al., 2004, JR et al., 2004].

Cryptic or transient sites are only evident in the protein holo form and thus require a conformational change of the apo form to become apparent. Nevertheless, the fluctuation-dissipation theorem, which relates equilibrium fluctuations to the response of the system to a ligand-induced perturbation, implies that cryptic site formation will also occur over the course of a simulation of the apo state per se [Bowman and Geissler, 2012]. There is considerable pharmaceutical interest in such sites, as many protein targets lack druggable pockets. To this end, Cimermancic et al. characterized the physicochemical and conformational changes upon transient pocket formation by comparing apo and holo protein structures available from the Protein Data Bank (PDB) [Cimermancic et al., 2016]. They found that transient sites are evolutionarily as conserved as binding pockets but display higher B-factors. Interestingly, there was no statistically significant difference between the ligand properties of cryptic sites and binding pockets. Although a machine-learning model, namely an SVM, was built, they found that the most informative feature was based on their simplified energy-landscape molecular dynamics simulation. The second and third most important features were sequence conservation and fragment binding, respectively. Interestingly, when applying their model to the remaining subset of human structures in the PDB, they found that cryptic sites may be present in more than 74% of deposited structures and are thus about ten times more prevalent than proteins with binding pockets only. This has clear implications for drug design, as targets without a suitable pocket should not be dismissed without investigating putative cryptic sites.

Figure 2.3: Cartoon representation of five different classes of pocket dynamics: subpocket, adjacent pocket, breathing motion, channel/tunnel, allosteric pocket. Regions colored in pink indicate pocket variation relative to the reference structure (shown in the center); the red dotted lines show the pocket shapes. For allostery, the shape of the original binding site is affected by a molecule binding at a distinct binding site. Adapted from: [Stank et al., 2016]

Recently, Stank et al. introduced five classes to differentiate between binding pocket dynamics: subpocket, adjacent pocket, breathing motion, channel/tunnel, and allosteric pocket. An overview with examples is shown in Figure 2.3. Pink areas indicate regions that yield, i.e. become accessible during a simulation.

Although cryptic sites are presumably very abundant and medicinal chemists have started to classify them based on their shared traits, it is striking that HTS campaigns do not identify them more frequently. Here, the use of screening libraries biased towards traditional drug targets is likely the cause. Additionally, there can be a free energy penalty for the formation of the cryptic site, effectively leading to weaker binding in the cryptic site [Mobley and Dill, 2009].

Thus, to efficiently exploit a transient site it is not only necessary to detect it, but also to have a detailed understanding of its formation and stability. In return, this facilitates tailored screening library design and thus ensures a computationally cost-effective VS. Therefore, preliminary implicit solvent molecular dynamics simulations or perturbation approaches such as L-RIP/RIPlig [Kokh et al., 2016] or tCONCOORD [Seeliger et al., 2007] provide tractable means towards understanding transient site formation. Normal mode analysis (NMA) based on an elastic network model (ENM) description is another widely used technique. Here, harmonic springs link atom pairs in close proximity. Diagonalization of the Hessian matrix (i.e. the matrix of the second-order derivatives of the energy) yields the vibrational modes of the protein. Low-frequency modes often coincide with experimentally observed conformational changes [Ahmed et al., 2011].
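The elastic network idea can be illustrated with a minimal sketch. It assumes an anisotropic network model on a handful of pseudo-atoms, with a uniform spring constant `gamma` and a distance `cutoff`; both parameters, the function names, and the toy coordinates are illustrative assumptions and do not reproduce the protocol used in this thesis.

```python
import numpy as np

def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Build the 3N x 3N Hessian of a simple anisotropic network model."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist2 = float(d @ d)
            if dist2 > cutoff ** 2:
                continue  # no spring between distant atoms
            # Off-diagonal 3x3 block for the spring linking atoms i and j
            block = -gamma * np.outer(d, d) / dist2
            hessian[3*i:3*i+3, 3*j:3*j+3] = block
            hessian[3*j:3*j+3, 3*i:3*i+3] = block
            # Diagonal blocks accumulate the negative of the off-diagonal ones
            hessian[3*i:3*i+3, 3*i:3*i+3] -= block
            hessian[3*j:3*j+3, 3*j:3*j+3] -= block
    return hessian

def normal_modes(coords, cutoff=15.0, gamma=1.0):
    """Diagonalize the Hessian; eigenvalues are returned in ascending order."""
    return np.linalg.eigh(anm_hessian(coords, cutoff, gamma))

# Toy example: five pseudo-atoms in general position (hypothetical coordinates)
toy = [(0.0, 0.0, 0.0), (1.5, 0.1, 0.0), (0.2, 1.4, 0.1),
       (0.3, 0.2, 1.6), (1.1, 1.2, 1.3)]
eigenvalues, modes = normal_modes(toy)
# The six near-zero eigenvalues are rigid-body translations and rotations;
# the following columns of `modes` are the lowest-frequency internal modes.
internal_modes = modes[:, 6:]
```

For a real protein one would typically use the Cα coordinates only, so that the Hessian stays small enough to diagonalize directly.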

While sufficiently long explicit solvent molecular dynamics simulations will reveal cryptic pocket formation eventually, timescales above 1 µs are often required [Frembgen-Kesner and Elcock, 2006, Sledz and Caflisch, 2018]. Therefore, several perturbation methodologies were put forward to investigate these phenomena with less computational effort. Site Identification by Ligand Competitive Saturation (SILCS) is an MD-based solvent-mapping protocol that relies on co-solvent probes at high concentration that facilitate the opening of a cryptic site. To prevent the aggregation of co-solvent molecules, the attractive part of the Lennard-Jones potential is switched off [Raman et al., 2011]. Enhanced sampling by Hamiltonian replica-exchange in the presence of ligand probes proved successful in reproducing these results at lower computational cost [V et al., 2016].

Furthermore, co-solvent probes can be tailored to mimic not only small molecules but also the protein backbone, which allows protein-protein interaction surfaces to be discriminated from small molecule binding sites. Schmidt et al. established an all-atom workflow based on cosolvent molecular dynamics that can discriminate protein and small ligand interaction regions on a protein surface and identify cryptic sites [Schmidt et al., 2019].

In this thesis, we perform co-solvent simulations with Schrödinger MixMD [Ghanakota et al., 2018] to detect protein and small-molecule binding hotspots.

Transient allosteric sites. While historically a structure-centric understanding of allostery dominated [Monod et al., 1963], more recent descriptions also focus on the dynamical features [Motlagh et al., 2014]. This paradigm shift seems reasonable since allosteric effects were reported even in intrinsically disordered proteins (IDPs), which do not feature a fixed structure [VJ and EB, 2007]. Additional support for this view stems from experimental advances in NMR spectroscopy [Tzeng and Kalodimos, 2011].

Ligands that bind to a transient site distant from the active site may do so without changing the activity. This is especially the case when the targeted transient site does not couple with the active site. Considering the estimated abundance of transient sites, it is necessary to prioritize sites where such allosteric communication with the active site is conceivable, so-called transient allosteric sites. Furthermore, Bowman and Geissler predicted for β-lactamase, RNase H and interleukin-2 (IL2) that even single proteins may contain several allosteric pockets, and tested these predictions using thiol labeling experiments [Bowman and Geissler, 2012]. Additionally, they showed that even in the absence of backbone rearrangements, correlations between amino acid sidechains can convey allosteric effects.

Clearly, there is pharmaceutical interest in predicting protein allostery and in particular transient allosteric sites. However, allosteric effects can arise from a variety of mechanisms; a general predictor would therefore require a unified computational treatment. Furthermore, the conformational changes involved may range over timescales from micro- to milliseconds, and disorder can play a role. Approaches to predict protein allostery depend either on NMA techniques (e.g. AlloPred [Greener and Sternberg, 2015]), evolutionary relationships [GM et al., 2003] or machine learning. A list of tools to assess putative allosteric effects in proteins is given in [Greener and Sternberg, 2018].

AlloPred [Greener and Sternberg, 2015] mimics allosteric ligand binding by increasing the spring constant for the residues contributing to the allosteric site. Thereby, the flexibility of allosteric pocket residues is reduced and an active site perturbation would lead to different normal modes. These differences and physicochemical pocket descriptors are then fed into an SVM to discern allosteric effects. In this thesis, we used CryptoSite in conjunction with AlloPred for a preliminary transient allosteric site assessment.

Ensemble docking. Although these methods provide a first understanding of transient pocket formation, the introduced perturbations can lead to deviations from the conformations relevant for ligand binding. An additional shortcoming is the implicit treatment of water. Thus, sufficiently long explicit solvent simulations without perturbations are required. Ideally, repeated formation and collapse of such transient sites can then be studied over the course of the simulation. Regardless of whether a conformational ensemble was generated with perturbation approaches or an unbiased MD, ensemble docking refers to approaches where snapshots from such simulations are used for VS or docking. Since not only ligand flexibility during conformer generation, but also target flexibility is taken into account, ensemble docking can outperform screenings based on static crystallographic structures [Amaro et al., 2018].

In principle, all snapshots from a dynamics study could be employed for VS, ensuring that, if a relevant or drug-binding conformation is visited during the simulation, it is used to prioritize binders. This, however, is computationally very demanding, and all conformations are typically clustered before being considered for VS. Falcon et al., however, showed that clustering of molecular dynamics trajectories is not always the ideal way to select conformations capable of ligand binding [Falcon et al., 2019].

Therefore, in this thesis, I chose a different approach to incorporate target flexibility in our virtual screening approach. First, I analyzed the protein structure and investigated crystallographic water networks using Euclidean-distance based agglomerative clustering of the crystallographic oxygen positions. When using DoGSiteScorer on the crystallographic structure I could not discern suitable binding sites. Therefore I used the machine-learning-based CryptoSite approach for rapid transient site assessment. This was followed by an implicit solvent L-RIP/RIPlig perturbation study and finally 10 µs all-atom molecular dynamics simulations in explicit water. I subsequently investigated which overall structural changes in the protein are required for transient site formation and applied a cut-off. All snapshots within this geometric cutoff criterion were subjected to a druggability assessment with DoGSiteScorer. Finally, I computed the dihedral angle changes of residues crucial for transient site formation. The two most druggable target conformations were discarded since I noticed that their dihedrals had a low overall prevalence in my molecular dynamics simulation.

Hit Prioritization. In principle, screening a highly diverse and extensive compound library would allow the identification of many different starting points for lead optimization. Nevertheless, inaccuracies stemming from e.g. VS protocols, force fields and scoring functions, but also the use of biased chemical libraries, typically lead to structurally similar high-ranking compounds.

Furthermore, achiral compounds are preferable at this stage since they are often less complex to synthesize. Additive scoring functions suffer from a bias towards heavier compounds since they can form more protein-ligand interactions simply due to their size. In such cases, it is necessary to post-process the docking scores and calculate the ligand efficiency (LE) [Hopkins et al., 2004], e.g. by dividing the docking score by the number of heavy (non-hydrogen) atoms.
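As a small worked example of this size normalization, the sketch below divides a docking score by the heavy-atom count and re-ranks two compounds. The compound names and scores are invented for illustration, and the ranking assumes that more negative scores are better, which depends on the docking program used.

```python
def ligand_efficiency(docking_score, heavy_atoms):
    """Docking score per heavy (non-hydrogen) atom."""
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be positive")
    return docking_score / heavy_atoms

# Hypothetical hits: name -> (docking score, number of heavy atoms)
hits = {"cmpd_A": (-9.0, 30), "cmpd_B": (-7.5, 18)}

# Assumption: more negative is better, so sort in ascending order of LE
ranked = sorted(hits, key=lambda name: ligand_efficiency(*hits[name]))
```

Here the larger compound has the better raw score, but the smaller one ranks first once the score is normalized by size.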

In this thesis, I selected a subset of high-ranking compounds by visual inspection and based on how diverse their protein-ligand interactions were. To this end, I calculated a one-dimensional representation of each docking pose interaction using so-called protein-ligand interaction fingerprints (PLIFs) [SC, 2008] and performed a diverse subset selection.
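One common way to pick a diverse subset from fingerprints is MaxMin selection on Tanimoto distances. The sketch below is an illustration under assumptions: each PLIF is represented as a set of hypothetical "residue:interaction" labels, and the MaxMin picker stands in for whichever selection algorithm was actually used, which the text does not specify.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def maxmin_pick(fingerprints, n_pick, seed_index=0):
    """Greedily pick poses that are maximally distant from those already picked."""
    picked = [seed_index]
    # Distance of every pose to its nearest already-picked pose
    min_dist = [1.0 - tanimoto(fp, fingerprints[seed_index]) for fp in fingerprints]
    while len(picked) < min(n_pick, len(fingerprints)):
        remaining = [i for i in range(len(fingerprints)) if i not in picked]
        candidate = max(remaining, key=lambda i: min_dist[i])
        picked.append(candidate)
        for i, fp in enumerate(fingerprints):
            dist = 1.0 - tanimoto(fp, fingerprints[candidate])
            min_dist[i] = min(min_dist[i], dist)
    return picked

# Hypothetical interaction fingerprints of four docking poses
plifs = [
    {"W134:pi", "R95:hbond", "K135:ionic"},
    {"W134:pi", "R95:hbond", "K135:ionic", "D88:hbond"},
    {"F92:pi", "R91:hbond", "H83:hbond"},
    {"W134:pi", "R95:hbond"},
]
subset = maxmin_pick(plifs, n_pick=3)
```

Starting from the top-ranked pose as seed keeps the best-scoring compound in the subset while the remaining picks maximize interaction diversity.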


Chapter 3

In Silico Discovery of Allosteric Inhibitors of TRMT2a RRM to Ameliorate PolyQ Disease-mediated Neurotoxicity

This section is adapted from [Margreiter et al. 2020], in preparation. I was responsible for computational aspects of this project and the analysis of the crystal structure with CADD.

3.1 Introduction

To date, no therapy is available to interfere with the clinical progression of polyQ diseases in humans [Davis et al., 2019]. However, the administration of antisense oligonucleotides (ASOs) or a genetic deletion that interferes with the expression of expanded polyQ tracts in the androgen receptor (AR) gene rescued animal models from the disease phenotype [Cortes et al., 2014, Lieberman et al., 2014]. In this respect, promising clinical trials using ASOs designed to inhibit the relevant messenger RNA, thereby reducing the concentration of mutant RNA and proteins, are underway to uncover disease-modifying treatments [Sarah J. Tabrizi, 2019]. However, ASOs are administered through intrathecal injection, an invasive route of administration for which only very few drugs are currently FDA-approved [Christo and Bottros, 2014].

In this work, I pursue an alternative therapeutic strategy. Here, I used in silico methods to discover orally available inhibitors for an enzyme that mediates the correct translation of toxic CAG tracts.

Using a high-throughput RNAi screen, tRNA methyltransferase 2 homolog A (TRMT2a) was identified as a novel modifier of polyQ-induced toxicity [Voßfeldt et al., 2012]. This protein predominantly localizes to the nucleus and converts uridine to 5-methyl uridine (m5U) at position 54 in tRNAs [Carter et al., 2019]. Although methylation of different tRNA amino acid acceptors varies between species, the m5U modification at position 54 is conserved and found from yeast to man. This strong conservation suggests a pivotal role for this modification [Hou and Perona, 2009]. In Escherichia coli, the enzyme TrmA catalyzes the formation of m5U54 [Nordlund et al., 2000], while in Saccharomyces cerevisiae, the enzyme responsible for this modification is Trm2p [Nordlund et al., 2000]. In mammals, TRMT2a is considered to represent the ortholog of Saccharomyces cerevisiae Trm2p [Towns and Begley, 2012]. Indeed, similar to Trm2p, TRMT2a harbors a putative RNA recognition motif (RRM) at the N-terminus and a methyltransferase domain at the C-terminus [Chang et al., 2019]. The second ortholog of Trm2p, TRMT2b, is responsible for the m5U54 methylation in human mitochondrial tRNA [Powell and Minczuk, 2019].

Previous studies on Escherichia coli have shown that the presence of this methylation increases the fidelity and efficiency of protein synthesis by stabilizing the three-dimensional structure of tRNA in vitro [Davanloo et al., 1979, Kersten et al., 1981]. So far, however, the biological function of the enzyme responsible for U54 methylation in mammalian species remains largely unexplored [Chang et al., 2019].

In this study, I aimed to expand our knowledge on TRMT2a through several bioinformatics approaches and molecular dynamics simulations, and subsequently by proposing small molecules for its inhibition. Our aim is to understand whether hampering its function with small molecules might reduce some of the polyQ-related toxicity aspects, as shown by HEK293 cells with stable silencing of TRMT2a [Voßfeldt et al., 2012].

We applied several in silico approaches to define a small library of TRMT2a-interfering molecules. To this end, the crystal structure of one of the domains of the protein, the RRM, was determined. This structural information significantly broadened the knowledge of mammalian TRMT2a and was exploited to guide our discovery. Surface Plasmon Resonance (Biacore) experiments on the RRM were also performed. Although no direct binding to the RRM could be observed, except for spermidine, several candidate molecules were able to decrease cell death and ameliorate the aggregation of polyQ peptides in cultured cells.

3.2 Structure of the TRMT2a RNA Recognition Motif

3.2.1 Structure Determination

A variety of expression constructs of subfragments of TRMT2a were designed. Here, homology modelling approaches proved crucial to select suitable protein constructs. Amongst them, only a fragment from amino acid 69 to 147 that essentially included the RRM could be expressed and purified to near homogeneity. Robot-assisted crystallization screens yielded initial hits that were subsequently optimized for large, well-diffracting crystals. Datasets were collected at the Swiss Light Source (SLS) and German Electron Synchrotron (DESY) and resulted in a structure with 1.2 Å resolution and favorable statistics and stereochemistry (Rfree = 17.15%; Table 1). The structure revealed the typical arrangement of four anti-parallel β-strands and two α-helices that are packed against each other. They are arranged in a β1-α1-β2-β3-α2-β4 fold topology [Loerch and Kielkopf, 2015], as already reported for other RRMs [Dreyfuss et al., 2002]. The root mean square deviation of its Cα atoms to its closest structural homolog (PDB-ID: 3VF0) is 1.6 Å. Loops that connect β1/α1 (referred to as loop 1), β2/β3 (loop 3) and α2/β4 (loop 5) are situated in the lower part of the β-sheet, whereas loops 2 and 4 are found in the upper part of the β-sheet, see Fig. 3.1.

Figure 3.1: Structural features of the lower resolution TRMT2a RRM crystal structure in cartoon representation. A The two central β-strands feature the submotifs RNP1 (β3 strand) and RNP2 (β1 strand). Canonical RNP1/2 residues found in TRMT2a RRM are highlighted (RNP2 as yellow sticks on the β1-strand and RNP1 as magenta sticks on the β3-strand). The sidechain of F113, part of RNP1, is solvent-exposed. In contrast, the sidechain of F116 (also part of RNP1) is buried and part of loop 4. B View of the lower helical site of TRMT2a RRM and the water network. The indole ring of W134 π-stacks with F92 (blue dashed lines). The Cα distance between R95 and W134 is highlighted as a purple dashed line (see subsequent sections). C Agglomerative clustering using Euclidean distance. Discs with the same color belong to the same water cluster. The largest cluster is circled.

Typically, the β-sheet of an RRM interacts with a variable number of two to eight unpaired RNA nucleotides via two consensus regions located in the center of two neighboring β-strands. These submotifs are referred to as ribonucleoprotein motifs (RNPs), i.e. RNP1 and RNP2. In the structure of this RRM, known determinants of RNA-RRM molecular recognition can be found, such as positively charged residues and aromatic amino acids, which form π-π-stacking interactions with the RNA nucleobases [Clery et al., 2008].

Multiple sequence alignments of known RRM structures revealed that RNP1 is composed of K/R-G-F/Y-G/A112-F113/Y-V114/I/L-X-F116/Y (boldface amino acids refer to positions found in TRMT2a RNP1/RNP2; X refers to an arbitrary amino acid) [Maris et al., 2005]. RNP1 is present in the TRMT2a RRM β3 strand and spans from aa 109 to 116. In TRMT2a an additional shorter and less conserved N-terminal region, RNP2, is part of the adjacent β1 strand (aa 75-80). The RNP2 of TRMT2a RRM features I/V/L75-F/Y-I/F/L77-X-N79-L [Maris et al., 2005]. Generally, RRMs display two to three conserved aromatic residues on RNP1 and RNP2, e.g. phenylalanines/tyrosines. Typically, one aromatic RRM residue π-stacks with the 3’-end of the RNA and another one is inserted between two RNA ribose rings [Clery et al., 2008]. The positioning of the aromatic side chains is crucial for RNA recognition [dit Konte et al., 2017]. In TRMT2a RRM, two phenylalanines are present on RNP1 and only one is exposed to the solvent (F113).

3.3 Structure-based Approaches

Lacking structural information on the tRNA-RRM complex, and given the diverse set of RNA binding modes observed in known RNA-RRM complexes, we did not attempt to design a competitive structure-based inhibitor. Moreover, the crystal structures of the RRM lack distinct binding cavities. Therefore we searched for flexible sites that can potentially extend into underlying cavities and allosterically modulate tRNA binding. To this aim, we chose to augment traditional druggability detection methods with alternative approaches, which also incorporate receptor flexibility.

3.3.1 Bioinformatics Analysis

Binding sites have been studied with respect to their sizes, shapes, hydrophobicity, electrostatic interactions, composition, and residue distributions. It has further been recognized that binding sites are relatively well conserved during evolution, unlike the remainder of the protein surface [I et al., 2004]. Analysis of conservation patterns therefore appears to be a useful tool for the prediction of protein binding sites.

Figure 3.2: A Multiple Sequence Alignment (MSA) of several RRMs related to TRMT2a RRM (UniProt Code: Q8IZ69). B Weblogo depiction of the MSA. C Entropy-based conservation indices mapped onto the TRMT2a RRM crystal structure (left: front view, right: rear view). Higher indices reveal higher conservation, not only on the putative tRNA binding site (on top of the β-sheet), but also on the helical back (conservation index mapped onto the surface), in particular P81 and K135.

Thus, we calculated the entropy-based sequence conservation of the TRMT2a RRM residues with AL2CO [Pei and Grishin, 2001] using a multiple sequence alignment retrieved from RRMdb [Nowacka et al., 2019], an evolutionary-oriented database for RRM sequences. TRMT2a RRM is classified as a member of family 262 with 25 known members. Highly conserved residues are not only found on the face of the β-sheet (the putative tRNA binding site), but also on the helical back (see Fig. 3.1). All aligned sequences contain a lysine at position 135. A tryptophan at position 134 is present in all but three sequences, where a tyrosine/phenylalanine takes its place. Such high sequence conservation far from the putative tRNA binding site indicates a functionally important role of these residues, see Fig. 3.2C.
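The idea behind an entropy-based conservation index can be sketched per alignment column. The snippet below uses plain Shannon entropy over unweighted residue frequencies, normalized by the maximum entropy for 20 amino acids; this is a simplifying assumption and not the exact weighting or normalization scheme implemented in AL2CO. The two-column alignment is a hypothetical toy input.

```python
import math
from collections import Counter

def column_entropy(column, gap="-"):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    residues = [aa for aa in column if aa != gap]
    if not residues:
        return math.log2(20)  # all-gap column: treat as uninformative
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def conservation_profile(alignment):
    """Conservation index per column: 1 = invariant, 0 = maximally variable."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [1.0 - column_entropy(col) / math.log2(20) for col in zip(*alignment)]

# Hypothetical toy alignment of two positions (e.g. a W/Y column and an invariant K)
toy_msa = ["WK", "WK", "WK", "YK"]
profile = conservation_profile(toy_msa)
```

In this toy case the invariant lysine column receives an index of 1, while the column with a single tyrosine substitution scores lower.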

3.3.2 Crystallographic Water Clustering and 3D-RISM Analysis

Figure 3.3: Crystallographic and computational solvation analysis. Protein patches (projected onto a sphere encapsulating TRMT2a RRM) are outlined as blue (positive), red (negative) and green (hydrophobic) areas, with significant residues labeled. On the right, a solvent analysis with 3D-RISM around W134 is shown. Spheres represent predicted hydration site centers and are labeled according to free energies in a.u. (red sites indicate unstable water positions, grey sites acceptable ones). Two unstable waters are buried 2.5 Å beneath W134 (highlighted with black circles). Depicted with MOE (Molecular Operating Environment, Chemical Computing Group (CCG)).

Since water clusters can mediate protein-ligand binding, we first searched for their presence at the surface of the protein [Schiebel et al., 2018]. We performed an agglomerative clustering of the water molecules resolved in the crystal structure and identified four non-singleton clusters (see Figure 3.1B-C). The largest crystallographic cluster contains four water molecules and is located between the α1 helix and loop 5, in the so-called minor/small groove. In the high resolution structure, a sulfate ion takes the place of this cluster. We analyzed the protein surface for hydrophobic areas and regions with strong charges and identified a large positive patch surrounding W134. This motivated us to investigate solvation effects in more detail with the so-called three-dimensional reference interaction site model (3D-RISM) theory [Kovalenko and Hirata, 1998]. The latter can estimate water distribution propensities around a solute (see Fig. 3.3). This analysis confirmed the hydration site centers we found in the crystal structure. Notably, two unfavorable water propensity peaks (i.e. water molecules), buried 2.5 Å beneath the sidechain of W134, were identified: the favorable release of these two "water molecules" to the bulk might potentially drive local structural rearrangements.
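The clustering of crystallographic water oxygens can be sketched as follows. The example assumes single-linkage agglomerative clustering cut at a fixed Euclidean distance, which is equivalent to grouping all oxygens connected by a chain of contacts shorter than the cutoff. Both the linkage criterion and the 3.5 Å cutoff are assumptions for illustration, as are the coordinates.

```python
import math

def cluster_waters(oxygens, cutoff=3.5):
    """Single-linkage clustering of oxygen positions at a distance cutoff (in Å)."""
    parent = list(range(len(oxygens)))

    def find(i):
        # Follow parent links to the cluster representative (with path halving)
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(oxygens)):
        for j in range(i + 1, len(oxygens)):
            if math.dist(oxygens[i], oxygens[j]) <= cutoff:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(oxygens)):
        clusters.setdefault(find(i), []).append(i)
    # Largest cluster first
    return sorted(clusters.values(), key=len, reverse=True)

# Hypothetical oxygen coordinates in Å
waters = [(0.0, 0.0, 0.0), (2.8, 0.0, 0.0), (5.6, 0.0, 0.0),
          (20.0, 0.0, 0.0), (22.9, 0.0, 0.0), (50.0, 50.0, 50.0)]
clusters = cluster_waters(waters)
non_singleton = [c for c in clusters if len(c) > 1]
```

Other linkage criteria (e.g. average or complete linkage) would yield more compact clusters and could split chains of waters that single linkage merges.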

3.4 Site Prediction on the Protein Crystal Structure

3.4.1 Traditional Structure-based Druggability Detection

We performed a druggability assessment, i.e. we analyzed the size, shape, and chemical features of all the RRM cavities with DoGSiteScorer [Volkamer et al., 2012a] (v2.0.0 15.01.2019). Five candidate binding sites were identified, see Figure 3.4, among which the largest one overlaps with the above-mentioned minor/small groove. However, all sites detected in this way were found to be too shallow to accommodate a ligand. Unfortunately, this indicated that TRMT2a RRM is most likely undruggable.

DoGSite Pocket Assessment

Name          DrugScore  SimpleScore  Volume (Å³)
P1 (cyan)     0.18       0.16         221.44
P2 (magenta)  0.09       0.22         153.66
P3 (yellow)   0.44       0.21         104.19
P4 (salmon)   0.26       0            100.86
P5 (white)    0.36       0.08         100.86

Figure 3.4: Discovered sites with DoGSiteScorer [Volkamer et al., 2012a] on the protein crystal structure of TRMT2a RRM, with corresponding scores and pocket descriptors in the table. None of the outlined sites reaches into the protein. Proposing ligands for such sites might result in low binding affinity.

3.4.2 Inferring Druggability from other RRMs

An alternative approach to elucidate druggability is to infer it from related structures with known binders. Interestingly, some RRMs do in fact bind peptides. Protein-protein binding also occurs at shallow binding sites, e.g. the Raver1 PRI4 peptide forms a complex with polypyrimidine tract binding protein RRM2 (PDB Code: 3ZZZ). Thus, we asked whether this might also be the case for TRMT2a RRM.

With peptide and RNA binding on opposite faces of RRM2, this RRM can form a ternary complex with the Raver1 peptide and RNA [Rideau et al., 2006]. Another example of peptides binding to the helical face of an RRM is given by the complex of the splicing factor SPF45 U2AF-homology motif (UHM) and the SF3b155 ULM (aa 333-342) [Corsini et al., 2007]. Here, Trp338 of SF3b155-ULM5 is accommodated by the SPF45-UHM.

3.4.3 Small Molecular Probes to Detect Druggable Sites

Cosolvent mapping approaches allow a druggability assessment without prior knowledge of known binders or binding sites, while at the same time fully embracing protein flexibility.

To this end, we simulated TRMT2a RRM with a Mixed Solvent Molecular Dynamics approach (Schrödinger Release 2019-3: MxMD, Schrödinger, LLC, New York, NY, 2019). The protein is embedded in a binary mixture consisting of water and one of six small organic molecules, i.e. cosolvents, serving as probes that mimic ligands or peptides. During a production stage of 75 ns, cosolvent positions were registered. Overlapping regions of high propensities for individual fragments are then concatenated into so-called binding hotspots, see Fig. 3.6. The identified hotspots are in good agreement with the previously used traditional binding site detection methods, see Fig. 3.4. Strikingly, a protein backbone-like cosolvent probe (N-methylacetamide, see Fig. 3.6) attaches frequently to the dorsal site of TRMT2a RRM. From this analysis I concluded that, also in the case of TRMT2a RRM, this site may play a role in protein-protein binding. In related RRMs this peptide binding modifies RNA binding allosterically. Therefore, we investigated whether this site may also be allosterically connected to RNP1/RNP2.

RNP1/RNP2 and small/minor groove allosteric regulation. To predict the presence of allosteric communication, we carried out a normal mode analysis (NMA) with simulated perturbations mimicking ligand binding on the suggested tRNA-anchor-phenylalanine (F113 of RNP1) with AlloPred [Greener and Sternberg, 2015]. The unperturbed and perturbed normal modes were compared, along with pocket features, by employing an SVM model. Results show that the minor/small groove region can indeed allosterically modulate the opposite β-strand, where F113 is present, see Fig. 3.5. Specifically, W134 seems to be a major player in this allosteric communication since changes in its immediate surroundings are allosterically connected to the suggested tRNA binding site.

These findings are in line with the fact that the minor/small groove of other RRMs has previously been described to interact with different peptides [Clery et al., 2008]. Strikingly, in NMR studies hydrophobic packing against the helical back of this site was found to be able to occur simultaneously with RNA binding. Also, tryptophan, given its bulky and hydrophobic sidechain, is rarely found in loops, unless these loops are involved in protein-protein interactions [Gavenonis et al., 2014].

Figure 3.5: AlloPred analysis of the crystal structure of TRMT2a RRM. Residues depicted as blue sticks were used for the putative orthosteric (tRNA binding) site definition. Red and orange sites were detected as possible allosteric sites. The crystal structure of TRMT2a RRM is represented as white transparent cartoon. A Only the solvent-exposed phenylalanine on the β-sheet was used to define the orthosteric site. B All conserved residues on the β-sheet were used to define the orthosteric site. Two allosteric sites were detected (red, orange sticks) independently of the proposed tRNA binding site definition.

To explore this site further, we investigated the possibility of cryptic site formation, see Fig. 3.7. Cryptic or transient sites can be defined as only opening up when a binder is present [Kokh et al., 2016]. In a recent attempt to expand the druggable proteome, these sites were suggested as an interesting alternative for hit discovery, especially for proteins that lack conventional binding pockets [P et al., 2016].

3.5 Transient Site Discovery on TRMT2a RRM

We used the machine-learning-based method CryptoSite [P et al., 2016], which allows the identification of residues forming a cryptic site by using a support vector machine (SVM). We found that the conserved residues W134 and K135 are suggested to be part of a cryptic site located in the minor/small groove (Fig. 3.7). Analysis of the binding site dynamics conducted with TRAPP (TRAnsient Pockets in Proteins, [Kokh et al., 2013]) corroborated this finding. Specifically, below the minor/small groove, formed in part by W134/K135, a small, accessible cavity is formed repeatedly upon different types of perturbation: either under rotamerically induced side chain perturbations with Langevin dynamics (L-RIP) [Kokh et al., 2016] or under ligand-induced perturbations (RIPlig) with a probe mimicking an approaching apolar/aromatic ligand [Kokh et al., 2016].

[Panels: Acetone, N-Methylacetamide, Acetonitrile, Imidazole, Pyridine, Isopropanol, Binding Hotspots; center: TRMT2a RRM X-ray Structure]

Figure 3.6: Binding hotspot mapping with Mixed Solvent Molecular Dynamics of TRMT2a RRM. High propensity regions (contoured at an isovalue of 20σ) for individual cosolvent probes are color coded. All mixtures contain water and one type of small organic probe (binary solvent mixtures). During these simulations, polar probes (acetone, acetonitrile and imidazole) localize on the surface of the β-sheet and contribute to the two hotspots (red) on the left top of the RRM (rear view orientation) depicted in the center. All probes contribute to the hotspots on the helical back, while aromatic probes (imidazole and pyridine) exclusively contribute to them. The protein backbone-like cosolvent (N-methylacetamide, cyan) is also found preferentially in this area.

3.6 Explicit Solvent MD and Snapshot Selection

[Panel and axis labels: residues W134, R95, R91, D88, K135; Binding Site Residues (83, 87, 91, 94, 134); RMSD; Subpocket Number (1-4); Snapshots; states: closed, open, overlap]

Figure 3.7: A CryptoSite indices mapped onto TRMT2a RRM (shown in cartoon representation). Values range from grey (unlikely to be part of a cryptic site) to red (higher probability of contributing to a cryptic site). On the helical site of TRMT2a RRM, W134 and K135 are detected as putative contributors to cryptic site formation. B Rear view of TRMT2a RRM with the identified subpockets and consistent numbering in B, C, and D. All appearing transient regions are depicted as isosurfaces (at 25% occurrence over the simulation). Binding site residues are depicted as sticks. C Representative TRAPP binding site flexibility estimation (RMSD values in Å are color-coded from white to red) using the L-RIP/RIPlig approaches. W134, H83, and K135 show high structural variance. D Subpocket occurrence during the simulations shows that all subpockets are formed repeatedly.

To corroborate our findings, we investigated the flexibility of this putative allosteric pocket through molecular dynamics simulations. We performed a 10 µs-long MD simulation in explicit solvent (Figure 3.8). While the overall structure remained stable, loop 5 appears to contribute most to RRM flexibility. This is in line with the water distribution analyses in this area discussed above. We found that the Cα distance of W134 and R95 can increase twofold during the simulation, from 10.6 Å in the crystal structure to more than 20 Å (Figure 3.8B). We observed the unfolding and refolding of this loop numerous times over the course of the simulation.

During our simulation, we repeatedly observed that W134 transiently

56 3 Decreased Aggregate Formation upon TRMT2a Inhibition

[Figure 3.8, panels A and B. Recoverable axis labels: "RMSD / Å", "All Heavy Atoms / Å", "W134 Heavy Atoms / Å", "All Cα", "Loop 5 Cα"; annotations: "π-stacking", "X-ray".]

Figure 3.8: Explicit-Solvent Molecular Dynamics Simulation. A 2D-RMSD plots of Cα atoms (horizontal axes) and heavy atoms (vertical axis). B (top) Flexibility in the loop 5 region is mediated by the transient π-stacking of W134 and F92 (blue dashed lines in Fig. 1), as can be seen by their centroid distances. Frames with detected π-stacking, according to [Vernon et al., 2018], are highlighted in grey. (bottom) We found the Cα distance between R95 and W134 (as outlined in Figure 1, purple dashed line) to be a more robust indicator of pocket formation. In the X-ray structure, the distance is 10.62 Å; we observe the formation of a hydrophobic pocket when the residues approach. We extracted snapshots with a distance below 8.5 Å (indicated by the purple line) for a druggability analysis. The red arrow indicates the snapshot used for VS. After considering transient site rotamer propensities, the 3rd most druggable frame according to DrugScore (red) was prioritized for VS. A superposition with the X-ray structure (green) is shown.

passes from solvent-buried to solvent-exposed conformations. The conformations where W134 is solvent-exposed are also characterized by loop 5 structural rearrangements. When the Cα atoms of R95 and W134 came close, we registered the consistent formation of an enclosed cavity. Specifically, the indole ring of W134 is gradually pushed outwards and becomes solvent-exposed. This is of particular interest since W134 was indicated in our initial CryptoSite model, and R95 in the follow-up TRAPP analysis, as key residues involved in putative cryptic site formation. In addition, R95 was highly conserved (77%) based on our conservation analysis.


We thus selected frames with a Cα distance (R95-W134) below 8.5 Å, resulting in a subset of 104 frames out of 100,000 total frames for the whole trajectory. Among those, six conformations had a DrugScore [Volkamer et al., 2012b] above 0.60. The two highest-ranking receptor conformations displayed a W134 indole ring flip, which occurs only infrequently during the simulation, and were therefore not considered. The third highest-ranking conformation was frame 60,942 (after 6,094.2 ns, DrugScore: 0.62), where the loop is in the process of refolding into its crystallographic conformation after roughly 1 µs of remodeling. In this state, the binding site volume of the minor/small groove is significantly increased, reaching up to 589 Å³.
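The snapshot selection described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used in this work: the Frame record, the function names, and all toy values are hypothetical, except the index of frame 60,942 and its DrugScore of 0.62. The frame spacing of 0.1 ns is inferred from 100,000 frames covering 10 µs.

```python
from dataclasses import dataclass

# Assumption: 100,000 frames over 10 µs, i.e. one frame every 0.1 ns.
FRAME_INTERVAL_NS = 0.1


@dataclass
class Frame:
    """Hypothetical per-frame record; field names are illustrative only."""
    index: int
    ca_distance: float    # R95-W134 Cα distance in Å
    drugscore: float      # druggability score between 0 and 1
    indole_flipped: bool  # True if W134 shows the rare indole ring flip


def frame_time_ns(index: int) -> float:
    """Convert a frame index into simulation time in ns."""
    return index * FRAME_INTERVAL_NS


def select_frames(frames, max_distance=8.5, min_drugscore=0.60):
    """Keep compact, druggable frames, ranked by DrugScore (best first).

    Frames with the infrequent W134 indole flip are dropped after ranking.
    """
    compact = [f for f in frames if f.ca_distance < max_distance]
    druggable = [f for f in compact if f.drugscore > min_drugscore]
    ranked = sorted(druggable, key=lambda f: f.drugscore, reverse=True)
    return [f for f in ranked if not f.indole_flipped]


# Toy data: made-up values, apart from frame 60,942 and its DrugScore.
frames = [
    Frame(10, 10.6, 0.30, False),     # crystal-like, pocket closed
    Frame(20, 8.1, 0.70, True),       # high score, but indole flip
    Frame(30, 7.9, 0.66, True),       # high score, but indole flip
    Frame(60942, 8.2, 0.62, False),   # third-ranked, no flip
    Frame(70000, 8.4, 0.55, False),   # compact, but below score cut-off
]

best = select_frames(frames)[0]
print(best.index, round(frame_time_ns(best.index), 1))
```

Ranking before removing the flipped frames keeps the original rank order visible, which mirrors the text: the selected frame is the third-ranked one, not a re-ranked first.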

By comparing this selected conformation with the original X-ray structure, we can see that the α2-β4 loop moves outward, while the α2-helix partially claims the now available space (Figure 3.8B). During this widening of the pocket, the side chain of tryptophan (W134) opens a small and transient cavity below the small groove. The indole ring then forms a cation-π interaction with R135. This interaction cannot be observed in the crystal structure and partially explains how the entropic penalty for the solvent exposure of the indole ring is overcome. The RMSD between the crystal structure and the selected structure is 1.7 Å (Figure 3.8B).

Peptide Binding on the helical site of other RRMs. The minor/small groove of other RRMs has been previously described to interact with different peptides [Clery et al., 2008]. Strikingly, NMR studies found that hydrophobic packing against the helical back of this site can occur simultaneously with RNA binding. Interestingly, tryptophan, given its bulky and hydrophobic sidechain, is rarely found in loops, unless these loops are involved in protein-protein interactions [Gavenonis et al., 2014]. This partially hydrophobic pocket was next used for structure-based small-molecule inhibitor design. A more detailed pocket description was obtained with Schrodinger SiteMap [Halgren, 2009].

3.7 Virtual Screening on RRM and Post Processing.

To exploit our putative cryptic pocket we performed a virtual screening workflow to prioritize small molecules that might bind into this site for in vitro testing. As a first step, we retrieved in silico libraries of purchasable small molecules and prepared them by adjusting pH and conformational expansion. Given our prior insights into the protein receptor structure, we generated a receptor representation for our virtual screening surrounding W134.

During virtual screening, we generated a possible binding pose for each ligand conformer and then scored them with a preliminary scoring function. Unreasonable binding mode hypotheses were weeded out, while the best ones were retained and subsequently rescored using a more detailed scoring function. By visual inspection, we found several molecules with favorable ChemGaussScore4 values (< -7.5)


and diverse binding modes. Interestingly, several compounds predicted to bind into the minor/small groove revealed a common substructure, despite displaying diverse binding modes. Compounds g-j (see Fig. 3.9) not only showed reasonable docking poses/scores but also displayed activity in our cell-based assays.

3.7.1 Ligand-based Approach on the Catalytic Site

Several inhibitors have been reported for the E. coli homolog of hTRMT2a (TrmA) and are discussed below. The methylase activity requires the cofactor S-adenosyl-methionine (SAM or AdoMet) (Km values between 0.002 and 0.018 mM) [Shugart, 1978]. The SAM precursor L-methionine and the structurally related norleucine are weak TrmA binders (both Km = 30 mM). The ethyl analog of methionine (ethionine) is a known non-competitive selective inhibitor. The structurally related S-adenosyl-ethionine does not show such selectivity (Km = 0.6 mM) [Tscherne and Wainfan, 1978]. Ethylthioadenosine, where methionine is replaced by an ethyl group, shows inhibition with Km = 0.2 mM. This indicates that the adenosine and the methionine substructure are each sufficient for binding by themselves. S-adenosyl homocysteine (SAH), on the other hand, acts via product inhibition, competitive to S-adenosyl-methionine, suggesting a tightly regulated biochemical network. Several polyamines were shown to inhibit E. coli TrmA to different extents, also depending on environmental conditions [Nonaka, 1994].

Among these, only L-ethionine is selective. Unfortunately, this compound is hepatotoxic [Lu et al., 1976]. The other above-mentioned ligands are not toxic, but they are not specific. Therefore, I explored the chemical space around ethionine using a pharmacophore approach, in an attempt to optimize toxicity properties (and the pharmacological/pharmacokinetic profile in general) while retaining specificity.

To this end, I superposed the structurally similar inhibitors norleucine, methionine, and ethionine and constructed a pharmacophore model. I chose methionine and norleucine specifically because, despite their structural similarity to ethionine, they introduce some variability in the hydrophobic moiety (see Fig. 3.10 C).

The resulting model contained only three features: a hydrogen bond donor and acceptor (amino acid abstraction) and an equidistant hydrophobic site feature (sidechain). This pharmacophore hypothesis was used to filter the ZINC database [Sterling and Irwin, 2015], resulting in 1220 hits. Those were subsequently filtered with Canvas Version 3.9.011 (Schrodinger Release 2019-1: Canvas, Schrodinger, LLC, New York, NY, 2019).

As a next step, the pharmacological/pharmacokinetic profiles of the compounds were assessed. These are summarized by an overall ADMET acceptance score; compounds violating Lipinski’s rule of 5 [Lipinski, 2004] were omitted. Next, compounds were sorted according to human oral availability and blood-brain barrier permeability. Virtual screening hits and pharmacophore-derived compounds were selected for in vitro testing.
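The filtering and sorting logic of this step can be illustrated with a short sketch. It is not the Canvas workflow itself: the compound names, descriptor values, and the two ranking fields are hypothetical placeholders, and the sketch assumes that a single rule-of-5 violation is already sufficient to discard a compound. Only the four Lipinski thresholds are standard.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count violations of Lipinski's rule of 5."""
    return sum([
        mol_weight > 500,   # molecular weight above 500 Da
        logp > 5,           # octanol-water partition coefficient above 5
        h_donors > 5,       # more than 5 hydrogen bond donors
        h_acceptors > 10,   # more than 10 hydrogen bond acceptors
    ])


def filter_and_rank(compounds, max_violations=0):
    """Drop rule-of-5 violators, then sort by oral absorption and BBB score.

    Assumption: higher values mean better oral availability and better
    blood-brain barrier permeability, respectively.
    """
    kept = [
        c for c in compounds
        if lipinski_violations(c["mw"], c["logp"], c["hbd"], c["hba"])
        <= max_violations
    ]
    return sorted(kept, key=lambda c: (-c["oral_absorption"], -c["bbb"]))


# Hypothetical compounds with made-up descriptor values.
compounds = [
    {"name": "cmpd_1", "mw": 163.2, "logp": -1.9, "hbd": 3, "hba": 3,
     "oral_absorption": 60.0, "bbb": -0.4},
    {"name": "cmpd_2", "mw": 612.7, "logp": 4.1, "hbd": 2, "hba": 8,
     "oral_absorption": 85.0, "bbb": -0.1},   # too heavy, discarded
    {"name": "cmpd_3", "mw": 310.4, "logp": 2.7, "hbd": 1, "hba": 4,
     "oral_absorption": 92.0, "bbb": 0.2},
]

ranked = filter_and_rank(compounds)
print([c["name"] for c in ranked])
```

Setting max_violations to 1 would reproduce the more lenient reading of the rule, in which one violation is tolerated.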


3.8 In cell Assays

Of all compounds for which we performed cell death assays, (2S)-2-aminohexanoic acid, L-norleucine, L-methionine, L-ethionine, spermidine, putrescine, and spermine were previously reported in the literature as inhibitors of E. coli or Saccharomyces cerevisiae homologs of TRMT2a. However, neither L-norleucine, L-methionine, nor L-ethionine was able to reduce cell death, suggesting considerably different catalytic binding site features. Putrescine and spermidine consistently decreased cell death. Furthermore, the organomercury agent p-chloromercuribenzoic acid (PCMB) reacts with free thiol groups in proteins containing cysteine. In fact, TRMT2a RRM features a solvent-exposed cysteine (CYS111).

On the other hand, compounds g-u resulted from structure-based drug discovery targeting the putative cryptic allosteric pocket on the helical site of TRMT2a. The majority of compounds were predicted to bury at least one aromatic moiety into the cryptic pocket. Compound n contains a hydrophobic naphthalenyl group. Not only does it fail to rescue the cell lines affected by polyglutamine toxicity, but it also reveals additional cytotoxicity, even in the absence of the Cytochrome P450-dependent metabolic activation typically observed for naphthalene [Spiess et al., 2010].

Interestingly, ligands featuring a phenyl group (e.g. compounds g-j, m, p, and t) predominantly decrease cell death, with the exception of the toxic compound w and compound t, for which we detected only a subtle drop. If these compounds indeed bind into the cryptic site, taking the place of the indole ring of W134, they would be able to form π-stacking interactions not only with F92 but also with the indole ring, slightly displaced with respect to the X-ray structure.

Furthermore, since we did not measure TRMT2a inhibition directly, but by means of a cell-based assay, we wanted to investigate the possibility of problematic chemotypes, which frequently lead to false positives, with the Pan-assay Interference (PAINS) filter [JB and JWM, 2018]. Four tested compounds did not pass the PAINS filter, namely compounds m, n, o, and p. As mentioned above, compound n increased cytotoxicity. For the remaining three compounds (m, o, p), a false positive hit cannot be excluded.

For all tested compounds that decreased cytotoxicity below 50%, we subsequently performed a filter trap assay to ensure that these compounds not only increased cell viability but were also able to suppress aggregate formation. All these compounds showed decreased polyQ aggregation.

Notably, three novel compounds discovered by means of structure-based discovery (j, l, and p), along with the polyamines (spermidine and putrescine) reported in the literature for TRMT2a homologs, consistently and effectively lowered cell death and aggregation in polyQ-affected cells and do not contain critical substructures that often confound cell assays. Finally, compounds j and p show structural similarity in the form of a phenyl group.


3.9 Biophysical Measurements

To analyze the potential in vitro interaction of three selected compounds from in silico screening (compounds g, j, l) with the RRM of TRMT2a, surface plasmon resonance (SPR) experiments were performed. As a positive control, the polyamine spermine was established, which we observed to bind to the RRM in the micromolar range (KD = 1.30 ± 0.27 µM; Figure 5). In contrast, none of the selected compounds showed any binding to the protein. Notably, the SPR experiments were performed with low salt concentrations in the running buffer to also allow for the occurrence and detection of lower affinity binding events. The size difference between compounds (> 0.2 kDa) and the RRM of TRMT2a (9.2 kDa) was taken into consideration by coupling high amounts of protein (3000 RU) to the CM5 chip.
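Why a high immobilization level helps for small analytes can be made concrete with the commonly used estimate of the theoretical maximum response, Rmax = (MW analyte / MW ligand) x immobilized response x stoichiometry. The sketch below is only a back-of-the-envelope illustration: it assumes 1:1 binding and a fully active surface, and the molar mass of spermine (202.34 g/mol) is a value supplied here, not taken from the experiments.

```python
def theoretical_rmax(mw_analyte_da, mw_ligand_da, immobilized_ru,
                     stoichiometry=1.0):
    """Estimate the maximum SPR response (in RU) at surface saturation.

    Assumes that every immobilized protein molecule is active and binds
    `stoichiometry` analyte molecules.
    """
    if mw_ligand_da <= 0:
        raise ValueError("ligand molecular weight must be positive")
    return mw_analyte_da / mw_ligand_da * immobilized_ru * stoichiometry


# RRM of TRMT2a: 9.2 kDa, immobilized to 3000 RU (values from the text).
RRM_MW_DA = 9200.0
IMMOBILIZED_RU = 3000.0

# Spermine molar mass is an assumption added for this example.
rmax_spermine = theoretical_rmax(202.34, RRM_MW_DA, IMMOBILIZED_RU)
print(f"Expected Rmax for spermine: {rmax_spermine:.1f} RU")

# With only 300 RU of protein, the same analyte would give a tenfold
# smaller signal, which is much harder to detect reliably.
rmax_low_density = theoretical_rmax(202.34, RRM_MW_DA, 300.0)
print(f"Expected Rmax at 300 RU:    {rmax_low_density:.1f} RU")
```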

3.10 Methods

3.10.1 Crystal Structure

Cloning, expression, and purification. A plasmid containing the human TRMT2a sequence was provided by Aaron Voigt. The human TRMT2a RRM (aa 69 – 147) was cloned with a cleavable SUMO-tag at the N-terminus and expressed in Rosetta E. coli using LB medium. After isopropyl-thiogalactoside (IPTG, 0.5 mM) induction overnight at 18 °C, cells were harvested (4500 x g, 15 min, 4 °C) and flash-frozen with liquid nitrogen. All subsequent steps were carried out at 4 °C. Typically, cells from 6 liters of culture were resuspended in lysis buffer (50 mM HEPES/NaOH pH 8.5, 500 mM NaCl, 20 mM Imidazole, 0.5 % Tween, 2 % Glycerol) supplemented with one tablet of EDTA-free protease inhibitor cocktail to a total volume of 50 ml. After sonication (4 x 6 min, amplitude 40%, output: 6), the lysate was clarified by centrifugation (40000 x g, 30 min), loaded onto a 5 ml His-FFTrap column (GE Healthcare) equilibrated in His-A buffer (50 mM HEPES/NaOH pH 8.5, 500 mM NaCl, 20 mM Imidazole), and washed with 10 column volumes (CV) of His-A buffer, followed by a 10 CV wash with His-B buffer (50 mM HEPES/NaOH pH 8.5, 2000 mM NaCl, 20 mM Imidazole). SUMO-tagged protein was eluted with a 10 CV gradient of His-A buffer and His-C elution buffer (50 mM HEPES/NaOH pH 8.5, 500 mM NaCl, 500 mM Imidazole). For tag cleavage, the eluate was supplemented with 100 µg of HRV 3C protease (PreScission) and dialyzed against dialysis buffer (50 mM HEPES/NaOH pH 7.5, 500 mM NaCl, 1 mM DTT) overnight. The cleaved protein was applied onto a His-FFTrap column and the flowthrough, which contains the cleaved protein, was collected. This flowthrough was concentrated and loaded onto a size exclusion chromatography column (Superdex 75 10/300 GL) equilibrated in SEC buffer (50 mM HEPES/NaOH pH 7.5, 500 mM NaCl). Purified protein was usually concentrated to 8 mg/ml by ultrafiltration. Protein concentrations were determined by measurement of the A280. Aliquots were flash-frozen in liquid nitrogen and stored at -80 °C.


Crystallization, diffraction data collection, and processing. The crystallization experiments for the TRMT2a RRM domain were performed at the X-ray Crystallography Platform at Helmholtz Zentrum Munchen. Crystallization screening was done at 292 K using 8.0 mg/ml of protein with a nanodrop dispenser in sitting-drop 96-well plates and commercial screens. Crystals appeared after 1 week and were big enough for X-ray diffraction experiments. The initial data set for solving the structure was collected from a crystal grown in 0.22 M lithium sulfate, 0.1 M sodium acetate (pH 4.5) and 26 % PEG 6000 (Hampton Research PEG screen). However, the best data set used for refinement was collected from a crystal grown in 0.2 M cesium sulfate and 2.2 M ammonium sulfate (Hampton Research AmSO4 screen). For the X-ray diffraction experiments, the crystals were mounted in a nylon fiber loop and flash-cooled to 100 K in liquid nitrogen. Prior to freezing, the crystals were protected with 25% (v/v) ethylene glycol. Diffraction data were collected at 100 K on the PXI beamline (DESY, Hamburg) and PIII beamline (SLS, Villigen). The diffraction data were indexed and integrated using XDS [Kabsch, 2010] and scaled using SCALA [Evans, 2005] (Winn, Ballard et al. 2011). Intensities were converted to structure-factor amplitudes using the program TRUNCATE (French and Wilson 1978).

Structure determination and refinement. The structure of TRMT2a RRM with a resolution of 2.02 Å was solved by the Auto-Rickshaw pipeline [Panjikar et al., 2005] [Panjikar et al., 2009] using the sequence as input. For the molecular replacement step followed by several cycles of automated model building and refinement, the Auto-Rickshaw pipeline invoked the following X-ray crystallography software: MORDA (Vagin and Lebedev; http://www.biomexsolutions.co.uk/morda/), CCP4 [Nonaka, 1994], SHELXE [Sheldrick, 2002], BUCCANEER [K, 2006] [K, 2008], RESOLVE [Terwilliger, 2000], REFMAC5 [Murshudov et al., 1997] and PHENIX [PD et al., 2010]. Model rebuilding was performed in COOT [Emsley and Cowtan, 2004]. Initial refinement was done with the 2.02 Å dataset in REFMAC5 [Murshudov et al., 1997] using the maximum-likelihood target function. Further refinement was done with another dataset with a higher resolution of 1.23 Å. The stereochemical analysis of the final model was done in PROCHECK and MolProbity [Laskowski et al., 1993]. The final model is characterized by R/Rfree factors of 12.60/18.90%.

In Silico Analyses on the Crystal Structure. Agglomerative Euclidean distance clustering on the crystallographic water clusters was done with Wolfram Mathematica 11.3. The X-ray structure was prepared with Maestro Schrodinger (Schrodinger Release 2019-1: Maestro, Schrodinger, LLC, New York, NY, 2019) using default settings unless otherwise noted. Alignments were performed with ClustalW [Larkin et al., 2007]. For binding site detection we used DoGSiteScorer [Volkamer et al., 2012a]; for transient site discovery, we used the machine learning-based CryptoSite [Cimermancic et al., 2016] approach and TRAPP [Kokh et al., 2013].


TRAPP is a framework to generate and analyze large-scale secondary structure rearrangements on short molecular dynamics timescales, e.g. by disrupting sidechain atoms with short pulses (L-RIP). The system temperature is controlled by a Langevin thermostat to ensure a constant average temperature. In parallel, we investigated ligand-mediated conformational changes. We used an apolar/aromatic amino acid as a model for hydrophobic/aromatic ligand features and brought it into contact with the protein surface (RIPlig) [Kokh et al., 2016]. Finally, we investigated putative allosteric effects in the RRM with AlloPred [Greener and Sternberg, 2015].

3.10.2 Molecular Dynamics Simulation of RRM

The protein was prepared using PDB2PQR [Jurrus et al., 2018] (Version 2.0.0) and checked for correct protonation states. Of the partially occupied R137 conformations, the one with the side chain closer to the protein center was retained, as were all crystallographic waters. To prepare the protein structure for molecular dynamics simulations, we used tleap (AmberTools 17, [Case et al., 2005a]) and the Amber force field ff14SB to set up the system [Maier et al., 2015]. First, it was embedded in a truncated octahedral TIP3P [Jorgensen et al., 1983] water box with a buffer of 15 Å, resulting in about 6690 water molecules. The system was neutralized by introducing nine randomly placed Cl– counterions with the addion2 application, to minimize potential perturbations of the system by adding charges near the protein.

We applied previously reported standard MD parameters with periodic boundary conditions, particle mesh Ewald (PME) to handle long-range electrostatics [Darden et al., 1993], a cut-off for non-bonded interactions of 8 Å, the SHAKE algorithm [Ryckaert et al., 1977] on bonds involving hydrogens, the Langevin thermostat for temperature control (with a collision frequency γ of 3 ps⁻¹ and a time constant of 1 ps) [Loncharich et al., 1992] and the Berendsen barostat [Berendsen et al., 1984].

The production run consists of a sequence of 100 NPT MD simulations (each 100 ns long) without any additional restraints. The resulting trajectories were analyzed for convergence in the usual MD simulation parameters (energies, density, temperature, etc.).

We performed the analysis with PyMDLog (Version 1.0.1) and cpptraj (AmberTools17, [Case et al., 2005a]). The frames were imaged and stripped of solvent molecules and Cl– ions. RMSD values were calculated based on backbone atoms, with the first frame used as a reference.
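For readers unfamiliar with the quantity, the RMSD relative to the first frame can be written down in a few lines. The following sketch is a simplified stand-in for the cpptraj calculation, not a reimplementation of it: it assumes that all frames have already been superposed onto the reference and that each frame is a plain list of (x, y, z) backbone coordinates in Å. The function names and toy coordinates are made up for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized coordinate sets.

    No superposition is performed; the frames are assumed to be aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in size")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def rmsd_series(trajectory):
    """RMSD of every frame with respect to the first frame."""
    reference = trajectory[0]
    return [rmsd(reference, frame) for frame in trajectory]


# Toy trajectory with two atoms and three frames (made-up coordinates).
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],   # reference frame
    [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],   # both atoms shifted by 1 Å
    [(2.0, 0.0, 0.0), (1.0, 0.0, 0.0)],   # only the first atom moved by 2 Å
]

series = rmsd_series(trajectory)
print([round(value, 3) for value in series])
```

A 2D-RMSD matrix such as the one in Figure 3.8A follows from the same function by comparing every frame with every other frame instead of only with the first.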

3.10.3 Virtual Screening

RRM. The protein conformation with the third highest DrugScore was further investigated with virtual screening using MolPort stock compounds (7,409,777 compounds, www.molport.com) and NCI (237,771 compounds, https://www.nih.gov) as SMILES strings [Weininger, 1988]. The libraries were prepared as follows: i) With the built-in filters for lead-likeness in the OpenEye tools (OpenEye Scientific Software, Santa Fe, NM. http://www.eyesopen.com Hawkins, P.C.D.; Skillman, A.G.; Warren, G.L.; Ellingson, B.A.; Stahl, M.T.), we discarded compounds in violation of the “rule of 5” [Lipinski, 2004] and ii) requested strict atom typing. iii) The compounds were adjusted to pH 7 and desalted. iv) We performed a conformational expansion considering conformers of up to 15 kcal/mol using OpenEye OMEGA [Hawkins et al., 2010]. v) We prepared the apo receptor structure with the apopdb2receptor tool on the frame with the third highest druggability (the cavity formed by solvent exposure of the W134 indole ring; thus, the side chain of W134 was used as the central amino acid for the receptor setup). vi) Next, we conducted a virtual screening with the apoprotein receptor using OpenEye FRED [McGann, 2011] with standard precision.

Finally, we considered only compounds with a CHEMGAUSS04 score better (lower) than -7.5, and a diverse subset of 16 compounds was selected via visual inspection/PLIFs for in vitro testing.

CD. We computed low-energy conformers of the three selected known inhibitors with ChemAxon tools (Calculator Plugins were used for structure, property prediction, and calculation, Marvin 19.8, 2019, ChemAxon (http://www.chemaxon.com)) and used PharmaGist [Schneidman-Duhovny et al., 2008] to align them and build the hypothesis. This pharmacophore hypothesis was used to filter the ZINC database [Sterling and Irwin, 2015]. ADMET properties were calculated with Canvas Version 2.8.014 (Schrodinger Release 2016-1: Canvas, Schrodinger, LLC, New York, NY, 2016). Next, we calculated Qikprop properties and omitted all compounds violating the rule of 3 or 5, while retaining FDA-approved compounds. Compounds were sorted according to human oral availability and blood-brain barrier permeability. After visual inspection, several known inhibitors from the literature and pharmacophore-derived compounds were selected for in vitro testing.

3.11 Generation of HEK293T with stable RNAi-mediated knockdown of TRMT2a

Stable RNAi-mediated knockdown of TRMT2a was achieved by infection of HEK293T cells with commercially available lentiviral particles (MISSION® shRNA Lentiviral Transduction Particles NM 182984.2-1574s1c1; Sigma-Aldrich). Cells with stable integration of the shRNA construct were identified by selection of puromycin-resistant colonies (0.5 µg/ml puromycin; Invitrogen). Efficacy of TRMT2a silencing was determined by qPCR and Western blot. The cell line with the strongest reduction of TRMT2a (>90%; short sh1574) was selected for use in further experiments. The same procedure was used to generate a control cell line (short shK) expressing a scrambled shRNA (SHC002V; Sigma-Aldrich).


Transfections of HEK293T cells. HEK293T cells were grown in Dulbecco’s modified Eagle’s medium (DMEM) supplemented with 10% (v/v) FBS and 1% (v/v) penicillin/streptomycin. For transfection, cells were seeded at a density of 30 x 10⁴ cells per well in a 6-well plate. 24 h post-seeding, cells were transiently transfected with a construct mediating expression of an elongated polyQ with 103 glutamines fused to GFP (polyQ103:GFP) using either Metafectene® (Biontex) or calcium chloride (CaCl2). Metafectene® was used according to the manufacturer’s instructions, and CaCl2 transfection was achieved as follows: The transfection mix (in relation to the total volume of cell culture medium per well) contained 1% CaCl2 (1 M), 4% ddH2O, 5% 2 x HBSS, and 0.075% (v/w) DNA. The mix was incubated for 5 min and added to the medium of cultured cells.
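Because the CaCl2 transfection mix is defined relative to the medium volume, the pipetting volumes scale linearly with the well volume. The short sketch below works this out for the three liquid components; the medium volume of 2 ml per well is an assumption chosen for illustration (it is not stated in the protocol), and the DNA component is left out because its (v/w) unit cannot be resolved from the text.

```python
# Fractions of the medium volume per well, taken from the protocol above.
MIX_FRACTIONS = {
    "CaCl2 (1 M)": 0.01,
    "ddH2O": 0.04,
    "2 x HBSS": 0.05,
}


def transfection_mix_ul(medium_volume_ml):
    """Return the volume of each liquid mix component in µl."""
    if medium_volume_ml <= 0:
        raise ValueError("medium volume must be positive")
    medium_ul = medium_volume_ml * 1000.0
    return {name: fraction * medium_ul
            for name, fraction in MIX_FRACTIONS.items()}


# Assumption: 2 ml of medium per well of a 6-well plate.
mix = transfection_mix_ul(2.0)
for name, volume in mix.items():
    print(f"{name:12s} {volume:6.1f} µl")
print(f"{'total':12s} {sum(mix.values()):6.1f} µl")
```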

3.11.1 Cell Death Assay

Treated cells were mechanically detached from the plate by scraping and the cell/medium mixture was transferred to an Eppendorf tube. The cell/medium mixture was diluted in a 1:100 ratio with fresh DMEM medium. The diluted cell/medium mixture was supplemented with trypan blue solution (0.04%) in a 1:1 ratio. Cell viability was analyzed according to the dye exclusion test using the automated Cedex cell counter (Innovatis). To verify the observed differences in cell death, the effect of a few selected inhibitors was analyzed by manual quantification of cell death. Here, trypan blue-stained cells were analyzed in a haemocytometer (Neubauer chamber). GFP-positive cells were counted and the relative number of additionally blue cells (dead cells) was determined. 100 GFP-positive cells were analyzed per condition and three to six biological replicates were assayed.
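The manual quantification reduces to a simple ratio per replicate, followed by averaging over the biological replicates. A minimal sketch with made-up counts is given below; the function names and the example numbers are hypothetical, and the use of the sample standard deviation as the spread measure is an assumption, since the text does not specify one.

```python
from statistics import mean, stdev


def percent_dead(blue_gfp_cells, gfp_cells_counted=100):
    """Share of GFP-positive cells that also stain with trypan blue."""
    if gfp_cells_counted <= 0:
        raise ValueError("at least one GFP-positive cell must be counted")
    if not 0 <= blue_gfp_cells <= gfp_cells_counted:
        raise ValueError("blue cells cannot exceed the counted cells")
    return 100.0 * blue_gfp_cells / gfp_cells_counted


def summarize(blue_counts):
    """Mean and sample standard deviation over biological replicates."""
    values = [percent_dead(count) for count in blue_counts]
    return mean(values), stdev(values)


# Made-up counts of blue cells among 100 GFP-positive cells, 3 replicates.
average, spread = summarize([42, 38, 40])
print(f"cell death: {average:.1f} % ± {spread:.1f} %")
```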

Cell lysis and protein fractionation. Cells were washed in PBS and then mechanically detached from the bottom of the plate in RIPA buffer (100 µl per six-well). Samples were centrifuged for 30 min at 13,300 rpm at 4 °C and supernatants were collected (RIPA soluble fraction). The protein concentration of the RIPA soluble fraction was determined using the Bradford protein assay (DC Protein assay, Bio-Rad) according to the manufacturer’s instructions.

The pellet (RIPA insoluble fraction) was used for Dot Blot analysis. It was washed two times (resuspended in RIPA buffer and centrifuged for 30 min at 13,300 rpm at 4 °C). Finally, the pellet was resuspended in urea buffer (30 mM Tris Buffer, 7 M Urea, 2 M Thiourea, 4% CHAPS, pH 8.5; 100 µl per six-well). Proteins in the pellet were solubilized by sonication (incubation in Ultrasonic Cleaner bath, VWR) for 10 min followed by incubation for 60 min at 4 °C under rotation. Subsequently, the lysate was centrifuged at 4 °C, 9,000 x g for 30 min. The supernatant was collected as the urea soluble fraction and the urea insoluble pellet was discarded.


Western blotting. 20-50 µg of the RIPA soluble fraction protein samples were supplemented with 1x Laemmli buffer and boiled (95 °C, 5 min). Samples were subjected to SDS-PAGE (100 V for 90 min). Resolved proteins were transferred onto a nitrocellulose membrane (225 mA per gel for 60 min) by semi-dry blotting. The membranes were then blocked with skim milk (5% in TBS-T for 60 min) followed by incubation with the primary antibody at 4 °C overnight (in 5% skim milk in TBS-T). The primary antibodies used were mouse anti-Polyglutamine (Millipore, MAB1574), mouse anti-Huntingtin polyQ (Developmental Hybridoma Bank, MW1), or mouse anti-GFP (Roche, 11814460001). All antibodies were applied in a 1:1000 dilution. Membranes were washed three times for 10 min in TBS-T and incubated with the secondary horseradish peroxidase (HRP)-coupled antibody (GE Healthcare, NXA931V; in a 1:3000 dilution) for two hours under agitation at room temperature. Subsequently, the membranes were washed three times in TBS-T for 10 min and the chemiluminescence signal was detected using the Super Signal® West Femto Maximum Sensitivity Substrate (Thermo Scientific) detection kit according to the manufacturer’s instructions. Chemiluminescence signals were visualized using the Alliance UVltec system (Biometra) and processed using Image J software.

Filter retardation assay. Relative to the total protein concentration in the RIPA soluble fraction (volumes kept constant), we loaded a volume that would correspond to 90 µg of protein in the RIPA soluble fraction. The total volume was adjusted to 50 µl by addition of RIPA buffer supplemented with 7 µl dot blot buffer (0.5 M Tris pH 6.8, 0.4% SDS, 20% Glycerol, 0.2 M DTT). The samples were transferred to a nitrocellulose membrane (Protran® BA 83, Whatman, 0.2 µm pore size) by sucking the samples through the membrane using a vacuum pump attached to a dot blot chamber (Camlab, UK). The membrane was washed several times with TBS-T and blocked for 1 hour using 5% skim milk in TBS-T. Afterward, the membrane was treated as described for Western blotting using the respective primary and secondary antibodies.

3.12 Biophysical and Cell Experiments

Surface Plasmon Resonance (SPR). SPR studies were performed using a BIACORE 3000 system (GE Healthcare). The RRM of TRMT2a was diluted in 10 mM HEPES pH 7.5 for immobilization. Using coupling buffer (500 mM NaCl, 50 mM HEPES pH 7.5, 0.05 % Tween), the protein was amine-coupled to a CM5 chip (GE Healthcare) according to the manufacturer’s instructions, reaching 3000 resonance units.

Analysis of protein-compound interactions was performed at a flow rate of 30 µL per min in running buffer (150 mM NaCl, 50 mM HEPES pH 7.5, 0.05 % Tween). The analyte of interest was diluted in running buffer and injected for 2 minutes. To remove any residual attached compound, 2 x 2 minutes regeneration injections with 1 M NaCl were done in between runs. Concentration series ranged from 0.5 µM to 8 µM for the positive control spermine (Sigma Aldrich) and from 0.5 µM to 32 µM for compounds g, j and l.


Data were analyzed with the BIA evaluation software (GE Healthcare). Obtained binding curves were double-referenced against the signal in a protein-free reference channel and a buffer run. The response at equilibrium of each binding curve was plotted against analyte concentration. The KD was determined by fitting this curve to the steady-state affinity model. All experiments were performed in quadruplicate on different days.
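The steady-state affinity model relates the equilibrium response to the analyte concentration via Req = Rmax x C / (KD + C). The sketch below fits this model to synthetic data with a dependency-free least-squares search. It is an illustration of the model only and not the algorithm of the BIA evaluation software: the function names, the assumed Rmax of 60 RU, and the two-fold dilution series are hypothetical, while the KD of 1.30 µM and the concentration range of 0.5 to 8 µM follow the text.

```python
import math


def steady_state_response(conc, rmax, kd):
    """Equilibrium response of a 1:1 interaction (steady-state model)."""
    return rmax * conc / (kd + conc)


def _best_rmax(concs, responses, kd):
    # For a fixed KD the model is linear in Rmax, so the least-squares
    # optimum has a closed form.
    occupancy = [c / (kd + c) for c in concs]
    numerator = sum(r * x for r, x in zip(responses, occupancy))
    return numerator / sum(x * x for x in occupancy)


def _sse(concs, responses, kd):
    rmax = _best_rmax(concs, responses, kd)
    return sum((r - steady_state_response(c, rmax, kd)) ** 2
               for c, r in zip(concs, responses))


def fit_kd(concs, responses, kd_min=1e-3, kd_max=1e3, iterations=200):
    """Least-squares fit of KD and Rmax via golden-section search on log10(KD)."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    low, high = math.log10(kd_min), math.log10(kd_max)
    for _ in range(iterations):
        left = high - ratio * (high - low)
        right = low + ratio * (high - low)
        if _sse(concs, responses, 10 ** left) < _sse(concs, responses, 10 ** right):
            high = right   # minimum lies in [low, right]
        else:
            low = left     # minimum lies in [left, high]
    kd = 10 ** ((low + high) / 2.0)
    return kd, _best_rmax(concs, responses, kd)


# Synthetic, noise-free data: KD = 1.30 µM (text), Rmax = 60 RU (assumed).
concs_um = [0.5, 1.0, 2.0, 4.0, 8.0]
responses_ru = [steady_state_response(c, 60.0, 1.30) for c in concs_um]

kd_fit, rmax_fit = fit_kd(concs_um, responses_ru)
print(f"KD = {kd_fit:.2f} µM, Rmax = {rmax_fit:.1f} RU")
```

Searching on a logarithmic KD axis keeps the fit well behaved over several orders of magnitude, which is convenient when the affinity is not known in advance.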

3.13 Conclusion

PolyQ diseases is an umbrella term for a group of devastating neurological diseases. Although the causative mutations in the relevant genes have long been known, no cure is available to date. Currently, several clinical trials are underway to uncover bona fide treatments, including one of the antisense oligonucleotide Tominersen [Sarah J. Tabrizi, 2019].

In this work, we applied several in silico strategies to identify a small library of ligands able to modulate TRMT2a activity in cell.

TRMT2a is an enzyme that converts uridine to 5-methyluridine (m5U) at position 54 of tRNAs across species, from yeast to man [Hou and Perona, 2009]. However, very little is known concerning the role of this modification in mammals [Chang et al., 2019].

Mammalian TRMT2a harbors a putative RNA recognition motif (RRM) at the N-terminus and a methyltransferase catalytic domain (CD) at the C-terminus [Chang et al., 2019].

No full-length structure is available for this protein. Here we managed for the first time to express and purify a subfragment of TRMT2a corresponding to the RRM. We therefore pursued two parallel strategies for finding inhibitors: one exploiting the structural information of the RRM, the other exploiting the available knowledge of general methyltransferase inhibitors.

The RRM is characterized by the absence of protein pockets. However, we identified a putative cryptic site, potentially able to allosterically affect RNA binding. As a first step, we conducted a structure-based virtual screening under this hypothesis, and the selected candidates were tested in both in vitro and in cell experiments. Several compounds with similar substructures induced a reduction of polyQ aggregation in cell.

For the CD, a pharmacophore model based on the TRMT2a cofactor, S-adenosylmethionine (SAM or AdoMet), was identified and used for a ligand-based virtual screening. Here too, several selected candidates were able to induce a reduction of polyQ aggregation in cell.

Due to several possible reasons, we were not able to prove any direct binding between our ligands and TRMT2a, except for spermidine. First, our biophysical experiments were restricted to the RRM alone; therefore, ligands presumably targeting the CD could not be tested. Ligands targeting the RRM, in turn, were designed for the cryptic site in the minor/small groove; our calculation suggests that


the opening of this site might be triggered by the presence of the tRNA. The latter was unfortunately not present in our biophysical experiments. Also, the absence of the rest of the protein might play a role in the suggested allosteric mechanism connecting the RNA binding site with the identified cryptic site. Second, the observed effects of inhibitors in cell might be the result of an interaction with other proteins. In an attempt to exclude this possibility, we also tested our ligands on TRMT2a-silenced cells (sh1574), and no effect on aggregation or cell death was observed, pointing toward a TRMT2a-dependent effect of our ligands. We also evaluated the possibility of off-targets with a reverse screening approach as implemented in SwissTargetPrediction [Daina et al., 2019] to elucidate putative human protein targets for non-polyamine bioactives. For all compounds, the most frequently predicted target class was kinases, with the exception of ligand k (lyase family). In the case of compound m, kinase targets were predicted to be as likely as enzymes and family C G protein-coupled receptors.

Figure 3.9: Chemical Structures of Selected Compounds (compounds labeled g through u)

Figure 3.10: Virtual Screening Results. A Protein-ligand interaction fingerprint (PLIF) of the compounds that were selected for in vitro testing. Amino acid interactions with the docked compound are highlighted as black bars (multiple interactions of one amino acid are indicated by a broader bar). Each line represents a docked ligand. The top line corresponds to compound j. B Protein-ligand interaction diagram for the docked pose of compound j. Lys135 and Arg137 form polar contacts. C Elucidation of the pharmacophore model on the basis of inhibitors of related proteins.


Chapter 4

Decreased Aggregate Formation upon TRMT2a Inhibition

The formation of insoluble aggregates is a hallmark of polyQ diseases and is hampered by disrupting the polyQ tract, e.g. via decreasing translation fidelity. Here we estimated the physicochemical properties of such strands in silico to elucidate how a non-glutamine interruption can lead to such different aggregation behavior.

Figure 4.1: Potential energy histogram overlap of uninterrupted (left) and interrupted (right) strands. The high degree of overlap indicates that all temperatures were visited regularly.

We modeled two individual single strands of polyQ, one with a continuous glutamine (Q) tract, MYPYDVPDYA(Q)61(H)6, and one with a glutamate point mutation, MYPYDVPDYA(Q)30E(Q)30(H)6. These constructs correspond to our aforementioned hypothesis that the inhibition of TRMT2a would result in a more error-prone translation. We will refer to them as uninterrupted and interrupted polyQ models, respectively. We then simulated their conformational properties using the protein folding and aggregation simulator PROFASI [Lyubartsev et al., 1992], which can reproduce conformational ensembles of unfolded proteins and intrinsically disordered proteins, see Fig. 4.1.

We noticed common general trends in the structural propensities of the two models, i.e. in both cases, the radii of gyration decrease on passing from lower temperatures to 325-330 K, while they increase afterward. This can be related to a loss of secondary structure. Indeed, both polyQ models tend towards a coil conformation at higher temperatures, whereas at lower temperatures both systems seem to stabilize in α-helical structures. The loss of secondary structure at high temperatures (above 330 K) is in turn reflected in the deterioration of intrachain hydrogen bonds. These findings are in good agreement with experimental results [Kim et al., 2009, Masino et al., 2002] and computational studies carried out with explicit solvent and replica-exchange molecular dynamics [Długosz and Trylska, 2011].

Despite these common trends, small but statistically significant differences can be pointed out between 297 K (room temperature) and 323 K, a range that covers all relevant physiological temperatures. Interestingly, the introduction of the negative charge renders the polyQ structure slightly more compact, with a higher content of Turn/Bend structures at body temperature (BT, 310 K) for interrupted models. Accordingly, the solvent-accessible surface area (SASA) increases on passing from uninterrupted to interrupted polyQ models, see Fig. 4.2. These features point toward a better solvent exposure of the interrupted chain and thus a possibly higher desolvation penalty for the interrupted chain during the aggregation process [Fernandez-Escamilla et al., 2004].

Figure 4.2: Temperature-dependent solvent accessible surface area


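| Propensities [%] | 297 K Uninterr. | 297 K Interr. | 310 K Uninterr. | 310 K Interr. | 323 K Uninterr. | 323 K Interr. |
|---|---|---|---|---|---|---|
| Coil | 4 | 6 | 7 | 14 | 29 | 38 |
| α-helix | 95 | 91 | 90 | 78 | 59 | 44 |
| β-bridge | 0 | 0 | 0 | 0 | 0 | 0 |
| β-sheet | 0 | 0 | 0 | 0 | 0 | 0 |
| 3-helix | 0 | 0 | 0 | 1 | 2 | 3 |
| 5-turn | 0 | 0 | 0 | 0 | 0 | 0 |
| turn | 1 | 2 | 1 | 4 | 4 | 7 |
| bend | 0 | 1 | 1 | 3 | 5 | 8 |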

Table 4.1: Secondary structure propensities of continuous polyglutamine constructs and of variants with a glutamate insertion, with their respective temperature dependency

4.1 Computational Aspects

Replica-exchange Monte Carlo (REMC) simulations using the PROFASI package were carried out on polyQ models with random starting conformations in implicit solvent. The uninterrupted polyQ model is composed of MYPYDVPDYA(Q)61(H)6; the interrupted counterpart has the sequence MYPYDVPDYA(Q)30E(Q)30(H)6. For the modeling workflow, terminal amino acids were left charged. Both systems were enclosed in a periodic cubic box with a volume of 297.1 Å³. A replica-exchange run with 16 temperatures evenly distributed over a temperature range of 274-374 K was chosen to enhance the sampling in our simulations. Periodically, conformations of replicas at consecutive temperatures were interchanged with the swap probability:

$$p_{ij}^{\mathrm{swap}} = \min\left(1,\; e^{\left(\frac{1}{k_B T_i} - \frac{1}{k_B T_j}\right)\left(E_i - E_j\right)}\right) \qquad (4.1)$$

where Ti is the temperature and Ei is the energy of replica i, and kB is the Boltzmann constant. The exchange procedure improves sampling at the lower temperatures by stochastically seeding them with independent states obtained at high temperatures, and the above exchange probability ensures that the equilibrium is maintained at each temperature even for the replica-exchange step. Data for the energy histograms (see Fig. 4.1) at different temperatures were accumulated after 25,000 cycles and demonstrated a sufficient overlap between each consecutive pair of temperatures to guarantee a reasonable flow of replicas. For both systems, 2.3 million cycles were simulated.
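As an illustration of the exchange step described here, the following minimal Python sketch builds a 16-temperature ladder between 274 K and 374 K and evaluates the swap probability of Eq. 4.1. It is not taken from PROFASI; the function names, the linear spacing of the ladder, and the energy unit (kcal/mol, fixed through the value chosen for kB) are assumptions made for this example only.

```python
import math
import random

# Boltzmann constant in kcal/(mol*K); the energy unit is an assumption
# made for this sketch, not a statement about PROFASI's internal units.
K_B = 0.0019872041


def temperature_ladder(t_min=274.0, t_max=374.0, n=16):
    """Return n temperatures linearly spaced from t_min to t_max (assumed spacing)."""
    step = (t_max - t_min) / (n - 1)
    return [t_min + k * step for k in range(n)]


def swap_probability(e_i, e_j, t_i, t_j, k_b=K_B):
    """Acceptance probability for exchanging replicas i and j, following Eq. 4.1."""
    delta = (1.0 / (k_b * t_i) - 1.0 / (k_b * t_j)) * (e_i - e_j)
    # min(1, exp(delta)); skip the exponential when delta >= 0 to avoid overflow
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(e_i, e_j, t_i, t_j, rng=random):
    """Draw a uniform random number and decide whether the swap is accepted."""
    return rng.random() < swap_probability(e_i, e_j, t_i, t_j)
```

With this form, a swap that moves the lower-energy conformation to the lower temperature is always accepted, while the reverse move is accepted only with a probability smaller than one.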

At each temperature, simulation snapshots were extracted and protein properties were cumulatively calculated for the snapshots. We computed the radius of gyration to understand differences in the globularity of the uninterrupted and interrupted strands. DSSP [Kabsch and Sander, 1983] was used for pattern-recognition-based secondary structure assignment, and only the polyQ tract was used for its computation. The solvent-accessible surface area (SASA) was estimated with the GROMACS [Berendsen et al., 1995] tools. In a typical SASA computation, a water probe with a radius of 0.14 nm is rolled over a molecule to model its solvent exposure. Hydrogen bonds were calculated using the HBond plugin in VMD [Humphrey et al., 1996].
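To make the globularity measure concrete, the sketch below computes a mass-weighted radius of gyration for one snapshot given as a list of Cartesian coordinates. It is a generic textbook formulation, not the routine used by PROFASI or GROMACS; the function name and the default of equal masses are assumptions for this example.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of a list of (x, y, z) points.

    Equal masses are assumed when none are given (an assumption of this sketch).
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total_mass = sum(masses)
    # Center of mass, one component at a time
    center = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    # Mass-weighted mean squared distance from the center of mass
    weighted_sq = sum(
        m * sum((c[k] - center[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)
```

A smaller value indicates a more compact chain; the result has the same length unit as the input coordinates.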


Chapter 5

The Conorfamide RPRFa Stabilizes the Open Conformation of Acid-Sensing Ion Channel 3 via the Nonproton Ligand-Sensing Domain

In this project I was responsible for the in silico studies of the channel, the peptide, and the complex. Adapted with permission of the American Society for Pharmacology and Experimental Therapeutics. All rights reserved.

5.1 Introduction

Acid-sensing ion channels (ASICs) are Na+ channels gated by extracellular protons. They assemble as trimers [Bartoi et al., 2014]. Each subunit has a topology with two transmembrane domains and a large extracellular domain (ECD), which is composed of 7 α-helices and 12 β-strands that fold into a structure that has been compared with a clenched hand [Jasti et al., 2007]. In this image, the two transmembrane domains build the forearm and the ECD is composed of a palm domain, a finger, a knuckle, and a thumb domain that surround the central β-ball domain. A pocket in the ECD containing several conserved acidic residues has been implicated in binding protons and gating modifier toxins [Baconguis and Gouaux, 2012].

ASIC3 is an isoform that is predominantly expressed in the peripheral nervous system, where it contributes to the detection of painful acidosis [Deval and Lingueglia, 2015]. It is expressed in primary muscle afferents and mediates muscle pain in ischemic-like conditions [Sluka et al., 2003, Molliver et al., 2005, Ross et al., 2016], for example during muscle fatigue. Muscle incision also decreases pH, and ASIC3 also mediates postincisional (postoperative) pain [Deval et al., 2011]. Its expression is particularly high in afferents innervating the heart muscle, where it probably mediates the pain during myocardial ischemia (angina) [Benson et al., 1999, Sutherland et al., 2001]. Moreover, in rodents it contributes to cutaneous acid pain [Deval et al., 2008], to hyperalgesia associated with neuropathic pain [Omori et al., 2008], to acute arthritic pain [Ikeuchi et al., 2009], and to itch sensation in response to acid and pruritogens [Peng et al., 2015]. Thus, ASIC3 has a well-established role in the detection of pain associated with sustained acidosis. In addition, ASIC3 is expressed in cell models of glioblastoma [Tian et al., 2017], where it might sense the sustained acidic microenvironment of a brain tumor.

Upon rapid application of protons, however, ASIC3 completely desensitizes, with a time constant of 0.3 seconds [Sutherland et al., 2001]. During a slight but prolonged acidosis, it desensitizes without apparent opening [Grunder and Pusch, 2015], clearly limiting its capacity to sense sustained acidosis. Although it has been shown that the overlap of steady-state desensitization and activation curves of ASIC3 leads to a small window current in a tiny window of pH values that roughly correspond to the pH values reached during myocardial ischemia [Yagi et al., 2006], it is probable that different modulators interact with ASIC3 to either extend the window current [Deval et al., 2008, Marra et al., 2016] or to induce sustained currents.

One group of such modulators is the RFamide neuropeptides [Vick and Askwith, 2015]. The prototypical RFamide is FMRFamide, an important neurotransmitter in different invertebrates. The main effect of FMRFamide is a slowing of desensitization (prolonging opening) of ASICs. In addition, it induces a variable sustained current [Askwith et al., 2000]. The mammalian RFamides neuropeptide FF (FLFQPQRFamide) and neuropeptide AF (AGEGLNSQFWSLAAPQRFamide) have similar but smaller effects. Although RFamides bind to the open conformation of ASICs [Chen et al., 2006], the on-rate to their binding site is relatively slow, and therefore binding to the closed conformation is usually necessary for an efficient modulation [Askwith et al., 2000, Chen et al., 2006]. Neither the mechanism by which RFamides slow desensitization and induce a sustained current nor their binding site on ASICs is known.

Recently, RPRFamide (RPRFa), a conorfamide that has a comparatively high affinity for ASIC3 and strongly slows its desensitization, has been isolated from the venom of the cone snail Conus textile [Reimers et al., 2017]. RPRFa increases the excitability of sensory neurons expressing ASIC3, and injection of RPRFa into the gastrocnemius muscle of mice enhances muscle pain in response to acidic stimuli [Reimers et al., 2017]. Thus, RPRFa is an ideal tool to study the modulation of ASIC3 by RFamides. Here, we reveal that RPRFa has to unbind from ASIC3 before the channel can desensitize, explaining the slowed desensitization. Moreover, we show that a cavity in the lower ECD of ASIC3, which is surrounded by the base of the three palm domains, the so-called nonproton ligand-sensing domain (NPLSD), is important for the modulation by RPRFa. Since the lower palm domain contracts during desensitization [Baconguis and Gouaux, 2012, Baconguis et al., 2014], our results provide a mechanistic explanation for the functional effects of RFamides on ASIC3 and strongly suggest that the NPLSD is the RFamide binding site on ASIC3.

5.1.1 Site-Directed Mutagenesis and RNA Synthesis

We used rat ASIC3 cDNA contained in the expression vector pRSSP [Chen et al., 2006]. Mutants of ASIC3 were generated by site-directed mutagenesis following the Stratagene QuikChange protocol; polymerase chain reaction was performed with KAPA HiFi Polymerase (Kapa Biosystems, Wilmington, MA). The sequences of mutated cDNAs were verified in full by sequencing (Eurofins Genomics, Ebersberg, Germany).

After linearization of plasmids with MluI (New England Biolabs, Ipswich, MA), capped cRNA was synthesized using the mMessage mMachine SP6 transcription kit (Thermo Fisher Scientific, Waltham, MA). Finally, cRNA was purified using RNA Clean & Concentrator (Zymo Research, Irvine, CA). Concentrations were determined with a Nanodrop 2000 spectrophotometer (Thermo Fisher Scientific). RNA was stored at -80°C.

5.2 Preparation and Injection of Oocytes

Animal care and experiments were conducted according to protocols approved by the state office for nature, environment, and consumer protection (LANUV) of the state of North Rhine-Westphalia (NRW). Surgical removal of ovaries of female Xenopus laevis frogs was described previously [Springauf and Grunder, 2010]. Oocytes were separated and the follicular membrane was removed by mechanical dissection followed by enzymatic digestion for a period of 2 hours with collagenase type 2 (Worthington Biochemical Corporation, Lakewood, NJ). We injected oocytes of stages V and VI with 3 ng capped cRNA of ASIC3 wild-type or ASIC3 mutants. For ASIC3 mutated at residues E239 or E79 and for substitutions by cysteine, 8-16 ng cRNA were injected. Injected oocytes were incubated for 24-72 hours at 19°C in oocyte Ringer solution (in millimolars: 82.5 NaCl, 2.5 KCl, 1.0 Na2HPO4, 5.0 HEPES, 1.0 MgCl2, 1.0 CaCl2, 0.5 g/l polyvinylpyrrolidone, 1000 IU/l penicillin and 10 mg/l streptomycin; pH was adjusted to pH 7.3).

5.3 Electrophysiology

For most experiments, we used the Screening Tool (npi electronic, Tamm, Germany) as measurement setup, which provides a small recording chamber combined with a computer-driven dispensing robot, allowing fast and programmable solution exchange [Baburin et al., 2006]. Two-electrode voltage clamp recordings of Xenopus oocytes were conducted at ambient temperature (20-23°C). A Turbo TEC-03X amplifier (npi electronic) was used to establish voltage clamp conditions at -70 mV in all experiments. Micropipettes filled with 3 M KCl and with a resistance of 0.5-2 MΩ were used to gain electrical access to oocytes. The current signal was filtered at 20 Hz and digitized with a sampling rate of 500 Hz using the software Cellworks (npi electronic).

For the experiments reported in Fig. 5.1 and in Fig. 5.2C-E, a setup with a larger recording chamber and a computer-controlled valve-driven solution exchange system was used. This setup allowed shorter intervals between solution applications (as short as 2 seconds) [Madeja et al., 1995]. For these experiments, currents were digitized with a sampling rate of 200 Hz.

Bath solutions for two-electrode voltage clamp recordings contained (in millimolars) 140 NaCl, 1.8 CaCl2, 1.0 MgCl2, and 10 HEPES (for pH ≥ 6.6) or 10 MES (for pH < 6.6); pH was adjusted with NaOH. If not stated otherwise, pH 7.4 was used as conditioning solution. Compounds were added to the bath solution as indicated.

5.4 Photo Affinity Labeling of ASIC3 by RPR[azF]a

ASIC3-expressing oocytes were incubated in oocyte Ringer medium containing either 30-100 µM RPR[azF]a or 100 µM RPRFa and were irradiated for 1-10 minutes with UV light of 302 nm in a Gel Doc XR UV-transilluminator (Bio-Rad, Hercules, CA). Electrophysiological recordings of photo-crosslinked ASIC3-expressing oocytes were conducted within 2 hours after photo-crosslinking.

5.5 Data Analysis

For analysis of current recordings, we used Cellworks Reader (version 6.2.2; npi Electronic) and IgorPro (version 5.0.3.0; WaveMetrics, Lake Oswego, OR). For kinetic analysis, the current decay phase was fit with a double exponential function:

$$I(t) = y_0 + A\left[a_1 e^{-t/\tau_1} + a_2 e^{-t/\tau_2}\right] \qquad (5.1)$$

I(t) represents the current at time point t, y0 a sustained current component, A the desensitizing current component, and a1 and a2 the two fractions of current components desensitizing with the fast time constant τ1 and the slower time constant τ2, respectively; a1 + a2 = 1. The underlying assumption was that currents of ASIC3 with no RPRFa bound decay with τ1 and currents of ASIC3 with RPRFa bound decay with τ2 [Reiners et al., 2018].
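The double exponential model can be written as a small function, for example to generate a curve that is then compared with a recorded trace. The sketch below is only an illustration of the formula as defined by the symbols above (fast fraction a1 decaying with τ1, slow fraction a2 = 1 − a1 decaying with τ2); the function name is hypothetical and the fitting itself, which was done in IgorPro, is not reproduced.

```python
import math


def current_decay(t, y0, amplitude, a1, tau1, tau2):
    """Double exponential current decay.

    y0        sustained current component
    amplitude desensitizing current component (A in the text)
    a1        fraction decaying with the fast time constant tau1
    tau1/tau2 fast and slow time constants (same time unit as t)
    The slow fraction a2 follows from the constraint a1 + a2 = 1.
    """
    a2 = 1.0 - a1
    return y0 + amplitude * (
        a1 * math.exp(-t / tau1) + a2 * math.exp(-t / tau2)
    )
```

At t = 0 the function returns y0 + A, and for times much longer than τ2 it approaches the sustained component y0.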

Statistical analysis was performed in Excel 2016 (Microsoft, Redmond, WA) and R 3.4.2 (R Core Team, Vienna, Austria). Results are reported as the mean ± S.E.M. Student's unpaired t test was used to determine P values, with the exception of experiments reported in Fig. 5.1, for which Student's paired t test was used. Significance levels were defined as follows: *P < 0.05; **P < 0.01; ***P < 0.001. In case of multiple testing on the same data set, the Bonferroni correction was used by multiplying the P value by the number of tests. Graphs and current traces were plotted with IgorPro and arranged with Adobe Illustrator CS6. All experiments were conducted with oocytes of at least two different frogs, with the exception of photo-labeling experiments, which were conducted with oocytes of one frog.
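The Bonferroni step and the significance levels described above can be sketched in a few lines of Python. This is an illustration, not the authors' Excel or R workflow; the function names, the cap of adjusted P values at 1.0, and the "ns" label for non-significant results are assumptions added for this example.

```python
def bonferroni(p_values):
    """Multiply each P value by the number of tests.

    Capping the result at 1.0 is an assumption of this sketch, since a
    probability cannot exceed one.
    """
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


def significance_label(p):
    """Map a P value to the star notation defined in the text."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"  # hypothetical label for P >= 0.05


# Illustrative P values, not data from this study
adjusted = bonferroni([0.0002, 0.01, 0.04])
labels = [significance_label(p) for p in adjusted]
```

With three tests, a raw P value of 0.04 no longer reaches the 0.05 threshold after correction, which is the conservative behavior the correction is meant to provide.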

5.6 Modeling of Rat ASIC3 and RPRFa

We created a knowledge-based homology model of rat ASIC3 (UniProtKB: O35240, 533 aa) in both its open and desensitized states, on the basis of the corresponding ion channel structures of chicken ASIC1a (cASIC1). The sequences were aligned with ClustalW, revealing a sequence identity of 44%. In the open cASIC1 structure, the channel is in complex with the snake toxin MitTx [Protein Data Bank (PDB) entry code: 4NTW; resolution of 2.07 Å] [Baconguis et al., 2014]. For the desensitized receptor state, we used as a template the desensitized structure of cASIC1 (PDB entry code: 2QTS; resolution of 1.9 Å) [Jasti et al., 2007].

The full biologic assemblies were retrieved from the PDB and processed with Schrödinger software (Maestro Version 11.4.011; Small-Molecule Drug Discovery Suite 2017-4; Schrödinger, LLC, New York, NY), using default settings unless otherwise noted. All entities not belonging to the receptor trimer were removed, and missing side chains/loops were modeled. With PROPKA (propka.org) we adjusted the pH of the receptors to 6 to account for the pH experienced by an open channel. The tetrapeptide RPRF was built from the sequence in an extended conformation, and a C-terminal amide group was manually introduced, yielding RPRFa.

5.7 Molecular Modeling of the RPRFa Binding Poses

Binding Site. To characterize potential binding sites on the homology models corresponding to the open and the desensitized states, we used Schrödinger SiteMap [Halgren, 2009]. Site candidates were ranked on the basis of SiteScore (see Supplemental Table 5.1), which considers the presence of hydrophilic and hydrophobic features and pocket size. The required amount of enclosure and the threshold for van der Waals interactions were decreased to also detect shallower binding sites known to be relevant in protein-protein interactions. SiteScore allows ranking potential binding sites but does not allow predictions concerning the relevance of these sites for a specific ligand.

| Model | Site | SiteScore | Volume (Å³) |
|---|---|---|---|
| Open Model | Site #1 (AP) | 1.10 | 820 |
| Open Model | Site #2 (NPLSD) | 1.06 | 3484 |
| Open Model | Site #3 | 1.03 | 884 |
| Open Model | Site #4 | 1.02 | 919 |
| Open Model | Site #5 | 1.00 | 968 |
| Desensitized Model | Site #1 (AP1) | 1.11 | 631 |
| Desensitized Model | Site #2 | 1.10 | 497 |
| Desensitized Model | Site #3 (AP2) | 1.08 | 700 |
| Desensitized Model | Site #4 | 1.00 | 874 |
| Desensitized Model | Site #5 | 0.99 | 1180 |

Table 5.1: Potential binding sites on ASIC3. Binding sites were detected with SiteMap. SiteScore values and volumes of the sites are indicated. Potential binding sites were sorted according to SiteScore for the individual models. The NPLSD is only detected in the open-state model. The acidic pocket (AP) is detected twice in the desensitized-state model, due to receptor symmetry. In the open-state model, the AP is ranked slightly higher than the NPLSD.

Docking Protocol and Scoring. A peptide-protein docking approach as implemented in Schrödinger Glide was performed [Friesner et al., 2006]. The centroid of the interaction points in SiteMap served to center a Glide grid suitable for peptide docking. This grid represents a cubic box (8000 Å³) that encapsulates the lower half of the three palm domains, with one cube face positioned approximately parallel to the membrane plane. RPRFa docking was carried out with a standard-precision docking approach tailored for peptide ligands, SP-peptide [Tubert-Brohman et al., 2013], which ensures sufficient conformational sampling of the peptide. The Glide docking procedure penalizes ligand-receptor interactions producing steric clashes. Moreover, side chains of the receptor were allowed to relax their conformation around the docked ligand. Therefore, none of the binding poses had steric clashes. We scored the resulting binding poses with Glide Emodel [Friesner et al., 2006], since force field terms are particularly well suited to rank individual ligand conformers, rather than different ligands. A protein-ligand interaction fingerprint (PLIF) was generated with MOE (Molecular Operating Environment, 2013.08; Chemical Computing Group ULC, Montreal, Canada).

5.8 Results

RPRFa Has to Unbind before ASIC3 Desensitizes. To describe the interaction of RPRFa with ASIC3, Reimers et al. [2017] proposed a simplified kinetic model with three sequential states: closed (C), open (O) and desensitized (D) (Fig. 5.1A). According to this model, RPRFa binds to all three of these states with varying dissociation constants. The slower desensitization in the presence of RPRFa is then explained by a slower transition rate (τ2) from OR to DR, when RPRFa is bound. An alternative explanation would be that RPRFa has to unbind before the channel can desensitize. The slower desensitization in the presence of RPRFa would then be explained by the transition from OR to O, which would be significantly slower than the transition from O to D (Fig. 5.1A), leading to an indirect slowing of desensitization. The main difference between these two explanations is the existence of a desensitized conformation that has RPRFa bound (DR state), which should be readily populated according to the first explanation, but not according to the second explanation. To decide between these two explanations, we therefore tested the population of the DR state.

First, we examined whether RPRFa can bind to the desensitized conformation D of ASIC3. We desensitized ASIC3 with pH 6.0 and then applied 100 µM RPRFa for 60 seconds at pH 6.0. Next, we washed out the peptide by two short wash steps (pH 6.0 followed by pH 7.4, 2 seconds each). When we then activated ASIC3 by application of pH 6.0, we observed no slowing of desensitization, as measured by the ratio of the current 2.5 seconds post-peak (I2.5s) to the current at peak (I2.5s/Ipeak = 1.8% ± 0.2% for peak iii and I2.5s/Ipeak = 2.7% ± 0.4% for peak ii; iii vs. ii, P = 0.26; n = 12; Fig. 5.1B). In contrast, when we applied RPRFa for 60 seconds at pH 7.4, followed by two wash steps of identical duration as before (pH 7.4, 2 seconds each), desensitization of ASIC3 after activation by pH 6.0 was strongly slowed down (I2.5s/Ipeak = 71% ± 13% for peak iv; iv vs. iii, P < 0.001; Fig. 5.1). We conclude that the affinity of the desensitized state D for RPRFa is much lower than that of the closed state C, owing to either a smaller binding rate or a larger unbinding rate.

Next, we tested whether RPRFa unbinds during current decline (Fig. 5.1C). We preapplied RPRFa (30 µM, 60 seconds) and then activated ASIC3 for 17 seconds with pH 6.0 in the absence of RPRFa. During this long activation, current declined to < 20% of its peak amplitude. It was followed by a short 2-second interval in pH 7.4 (no RPRFa), just enough to allow recovery of ASIC3. We then activated again with pH 6.0 and evaluated current decline. In case RPRFa stayed bound to ASIC3 during the long first current decline, the slowly desensitizing current component of the second activation should be larger than the current at the end of the previous activation. In contrast, in case RPRFa unbound during current decline, the slowly desensitizing current component of the second activation should be similar to or smaller than the current at the end of the previous activation. We found that the slowly desensitizing current component of the second activation was indeed similar to the current at the end of the previous activation (Fig. 5.1C, left panel). To rule out that RPRFa unbinding happened during the 2-second interval at pH 7.4, we preapplied RPRFa as before (60 seconds), followed by 2-second preconditioning at pH 7.4 in the absence of RPRFa. When we then activated with pH 6.0 and evaluated I2.5s/Ipeak, we found that channels were strongly modified (I2.5s/Ipeak = 47.8% ± 2.4%; n = 7). Although I2.5s/Ipeak was larger without the 2-second interval at pH 7.4 (I2.5s/Ipeak = 60.6% ± 2.6%, peak i; n = 9; iii vs. i, P = 0.004), this result still rules out that RPRFa unbinds quickly at pH 7.4, suggesting that current decline of ASIC3 after RPRFa modification is accompanied by a substantial unbinding of RPRFa.


So far, we found no evidence that the desensitized conformation with RPRFa bound (DR) is strongly populated. If RPRFa indeed needs to unbind to allow desensitization of ASIC3, the irreversible binding of RPRFa should prevent desensitization of ASIC3. To test this prediction, we replaced the F residue of RPRFa by the photo-reactive unnatural amino acid 4-azido-phenylalanine (AzF) to yield RPR[azF]a. Upon irradiation by UV light, the azide moiety of AzF is converted to the highly reactive nitrene group, which allows the covalent attachment of RPR[azF]a to ASIC3.

RPR[azF]a slowed desensitization of ASIC3 similarly to RPRFa (Fig. 5.2A), even with a higher apparent affinity and a slower τdes than RPRFa (EC50: 1.2 ± 0.2 µM; τdes = 12.1 ± 3 seconds; n = 7; Fig. 5.2A) [Reimers et al., 2017], demonstrating that the peptide containing the AzF modification still binds to and modifies ASIC3. We then irradiated ASIC3-expressing oocytes with UV light for 1-10 minutes in 30-100 µM peptide (Fig. 5.2B). As control, we irradiated ASIC3-expressing oocytes with UV light for 6 minutes in the presence of 100 µM unmodified RPRFa. When cells treated with RPR[azF]a were subsequently activated by pH 6.0, a sustained current was observed, although no peptide was present or preapplied (Fig. 5.2C). This sustained current was not observed for control oocytes that had been incubated in RPRFa, suggesting that RPR[azF]a was covalently bound to ASIC3 and modulated the channel. The sustained current increased with increasing durations of photo labeling (I25s/Ipeak ratios: control, 0.3% ± 0.2%; UV treatment 1 minute, 3.7% ± 0.4%; UV 6 minutes, 20.5% ± 1.8%; UV 10 minutes, 59.8% ± 5.0%; P < 0.001 for all conditions with UV treatment; n = 5-10; Fig. 5.2D). But in contrast to modulation by RPRFa, the sustained current of covalently modulated channels persisted for the whole duration of pH 6.0 application (30 seconds) without any apparent desensitization. Moreover, the transient part of the current desensitized with a τdes that is typical for unmodified ASIC3; only at prolonged UV irradiation was τdes slowed down (τdes: control, 0.44 ± 0.02 seconds; UV treatment 1 minute, 0.62 ± 0.06 seconds, P = 0.08; UV 6 minutes, 0.82 ± 0.07 seconds, P = 0.004; UV 10 minutes, 1.43 ± 0.14 seconds, P = 0.005). These results are expected when UV irradiation leads to two populations of ASIC3, one covalently modified by RPRFa and one unmodified. The modified population would be trapped in the open conformation (O-R) and would not desensitize at all, and the other fraction would desensitize with the usual τdes. The increase in the sustained current with increasing duration of UV irradiation speaks in favor of this interpretation.

To further test the presence of two populations (one covalently modified and one unmodified) in these experiments, we irradiated ASIC3-expressing oocytes with UV light for 4 minutes in 30 µM RPR[azF]a. As before, these oocytes displayed a nondesensitizing sustained current when subsequently activated by pH 6.0. Importantly, the transient current was still sensitive to RPR[azF]a (no UV irradiation), which slowed desensitization similarly to RPRFa (Fig. 5.2E). Thus, this result is in agreement with two populations of ASIC3 in UV-irradiated oocytes that had been incubated in RPR[azF]a.


Collectively, these results strongly argue that RPRFa has to unbind before ASIC3 can desensitize and that the kinetics of current decline of ASIC3 in the presence of RPRFa mainly reflects the slow unbinding of RPRFa (transition OR to O, Fig. 5.1A).

Molecular Docking Predicts the Nonproton Ligand-Sensing Domain as a Possible Binding Site for RPRFa. To discover possible binding sites of RPRFa on ASIC3, we modeled the open and the desensitized conformations of rat ASIC3 on the open and the desensitized conformation of chicken ASIC1, respectively (PDB ID: 4NTW and 2QTS). We then performed an unbiased binding-site detection protocol (see Materials and Methods for details) across these two in silico models and found several potential binding sites; for each model, the five highest-ranking sites according to SiteScore are reported in Supplemental Table 5.1. Since experimental evidence, as reported above, suggests that RPRFa unbinds before ASIC3 desensitizes, the binding site of RPRFa should be located in a region accessible only in the open state but not in the desensitized state. The largest binding site in the open-state model that is not accessible in the desensitized-state model is a cavity in the lower palm domain, the NPLSD. Indeed, in cASIC1 the lower palm domain slightly shrinks on passing from the open to the desensitized state [Baconguis and Gouaux, 2012], and this is also reflected in our models. The conformational changes in the lower palm domain, when transitioning from the open to the desensitized state, suggest that RPRFa binding to the NPLSD might be sterically unfavorable in the desensitized state.

RPRFa was therefore docked into the NPLSD of the open-state homology model of ASIC3 (Fig. 5.3). Protein-peptide docking is more challenging than small-molecule docking, since even short peptides are often highly flexible, and several binding poses with comparable probability are possible. We obtained 34 distinct binding poses. These are summarized by a protein-ligand interaction fingerprint (PLIF) showing the interaction between RPRFa and ASIC3 (Fig. 5.4B). High-ranking poses form cation-π interactions with the C-terminal F of RPRFa and R376 in β-strand 10, whereas the amide capping group hydrogen bonds with Q269 of β9. Salt bridges can be observed between the C-terminal R of RPRFa and E79/E423 as well as between the N-terminal R of RPRFa and E418 (Fig. 5.4A). Notably, the interactions between the peptide and receptor can involve more than one monomer at the same time, possibly altering monomer interaction.

The Nonproton Ligand-Sensing Domain Is Involved in Modulation by RPRFa. The NPLSD is mainly encapsulated by four β-strands of each of the three palm domains. To test experimentally the prediction by in silico docking, we mutated eight residues in the NPLSD that surrounded RPRFa docked to the homology model: L77 and E79 in β-strand 1, E423 in β12, Q269 and Q271 in β9, and R376, A378, and E380 in β10 (total of 23 mutations; Fig. 5.5). First, we activated ASIC3 wild-type and each mutant with pH 6.3 and pH 4.0 and then preapplied 3 µM RPRFa followed by another activation with pH 6.3 (Fig. 5.5B). To quantify modulation by RPRFa, we calculated the difference of the current remaining after 2.5 seconds with and without RPRFa (I2.5s+RFa − I2.5s−RFa) and normalized it to the peak current without RPRFa ([I2.5s+RFa − I2.5s−RFa]/IPeak−RFa). Out of a total of 23 mutants, for 18 mutants [I2.5s+RFa − I2.5s−RFa]/IPeak−RFa was significantly reduced, compared with the wild-type (Fig. 5.5D). Notably, all substitutions of R376 and Q269, two amino acids that appeared to be crucial for interaction with the Famide group of RPRFa in in silico docking (Fig. 5.4), significantly reduced modulation by RPRFa. Likewise, substitutions of E423, which makes extensive contacts to RPRFa in most docking poses (Fig. 5.4B), also significantly reduced modulation by RPRFa, with the exception of the relatively conservative exchange E423Q (Fig. 5.5D). As an alternative way to quantify modulation by RPRFa, we also calculated the ratio I2.5s+RFa/I2.5s−RFa. Also in this type of analysis, 17 out of the 23 mutations in the NPLSD, including all substitutions of R376 and Q269, significantly reduced modulation by RPRFa. Together, these results show that the NPLSD is crucial for the modulation of ASIC3 by RPRFa. To test whether the mutations specifically changed the interaction of amino acids of the NPLSD with RPRFa, we additionally generated mutants Q270E and K379E, which are also located on β-strands 9 and 10 but whose side chains point away from the cavity to which RPRFa was docked. Both mutants indeed did not significantly affect the modulation by RPRFa as measured by the ratio [I2.5s+RFa − I2.5s−RFa]/IPeak−RFa (Fig. 5.5D). For K379E, the ratio I2.5s+RFa/I2.5s−RFa was reduced, albeit less strongly than for the other mutants. The results are in agreement with the interpretation that RPRFa was bound to the NPLSD.
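The two modulation measures used in this paragraph can be expressed as short functions. The sketch below is an illustration under assumptions: the function and argument names are hypothetical, the inputs are single current values read from a trace (the current 2.5 seconds after the peak with and without RPRFa, and the peak current without RPRFa), and the numbers in the usage lines are made up for demonstration and are not data from this study.

```python
def modulation_difference(i_with, i_without, i_peak_without):
    """[I2.5s(+RFa) - I2.5s(-RFa)] / Ipeak(-RFa), the normalized difference."""
    return (i_with - i_without) / i_peak_without


def modulation_ratio(i_with, i_without):
    """I2.5s(+RFa) / I2.5s(-RFa), the alternative ratio measure."""
    return i_with / i_without


# Made-up example currents in microamperes (inward currents are negative)
diff = modulation_difference(i_with=-0.50, i_without=-0.02, i_peak_without=-1.0)
ratio = modulation_ratio(i_with=-0.50, i_without=-0.02)
```

Because the difference is normalized to the peak current, it is bounded for typical traces, whereas the ratio can become very large when the residual current without peptide is close to zero; this may be one reason why the two measures flag slightly different sets of mutants.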

Evidence that Mutations within the Nonproton Ligand-Sensing Domain Reduce Binding of RPRFa and Accelerate Unbinding. In principle, there are two ways by which the mutations might have reduced the modulation by RPRFa: first, by reducing binding, or second, by reducing the effect of binding on desensitization. To further differentiate between these two possibilities, for 12 mutants we also tested a higher RPRFa concentration (100 µM) (Fig. 5.6); for wild-type ASIC3, 100 µM RPRFa is a saturating concentration [Reimers et al., 2017]. When looking at a shorter time scale (0.5 seconds after peak), RPRFa was indeed able to modulate desensitization kinetics of many mutants (Fig. 5.6B). To quantify RPRFa effects, we determined $[I_{0.5\mathrm{s}+\mathrm{RFa}} - I_{0.5\mathrm{s}-\mathrm{RFa}}]/I_{\mathrm{Peak}-\mathrm{RFa}}$. For wild-type ASIC3 this ratio was 0.90 ± 0.07, whereas it was significantly reduced to a range of -0.10 to 0.51 for 11 mutants (L77F, L77C, E79S, E423R, Q269R, Q269C, Q271R, R376E, A378R, A378C, and E380Q; Fig. 5.6C), confirming the importance of these amino acids for RPRFa modulation of ASIC3.

5.9 Discussion

Our results show: 1) that RPRFa has to unbind before ASIC3 can desensitize, 2) that RPRFa docks to the NPLSD of an ASIC3 open-state model, and 3) that mutations within the NPLSD, but not within the acidic pocket, reduce modulation of ASIC3 by RPRFa. The finding that RPRFa has to unbind before ASIC3 can desensitize suggests that the slow time constant $\tau_2$ of the current decay phase, which is introduced by RPRFa and which is commonly described as slower desensitization of ASIC3, in reality reflects unbinding of RPRFa (Fig. 5.1A). Thus, it appears that RPRFa uses a simple mechanism to modify gating of ASIC3: by stabilizing the open conformation.

Many mutations within the NPLSD, a cavity of the lower palm domain, reduced the modulation of ASIC3 by RPRFa. Mutation of two amino acids that do not point toward the interior of the cavity, however, did not reduce RPRFa modulation. In addition, the reduced $a_2$ in the bi-exponential fits of mutant ASIC3 and the faster slow time constant $\tau_2$ indicate, in our interpretation, reduced binding of RPRFa and faster unbinding of RPRFa, respectively. These findings suggest that RPRFa indeed binds to the NPLSD. In agreement with this idea, the lower palm has been implicated in desensitization by several previous studies [Cushman et al., 2007, Springauf et al., 2011, Roy et al., 2013]. In particular, comparison of the open and desensitized conformation revealed that the ECD contracts in the lower palm domain that encapsulates the negatively charged central vestibule [Baconguis et al., 2014], which forms the NPLSD of ASIC3 [Yu et al., 2010]. A contraction of the lower palm domain was later confirmed by functional studies; preventing the contraction impairs desensitization [Roy et al., 2013]. Moreover, interfering with the conformational changes in the lower palm domain induces sustained currents by destabilizing the desensitized conformation. Of particular interest, it has been shown that FRRFa limits covalent modification of a residue in the lower palm domain of ASIC1a [Frey et al., 2013]. All these findings are consistent with binding of RPRFa to the NPLSD of ASIC3.

Collectively, our results are consistent with the following mechanistic model: RPRFa slowly (within several seconds) binds to a cavity in the lower palm domain, the NPLSD, in the closed or the open conformation and sterically stabilizes the open conformation by preventing the contraction of the central vestibule. Unbinding of RPRFa with a time constant on the order of 6 seconds [Reimers et al., 2017] would then allow delayed contraction of the central vestibule and desensitization of ASIC3. Since the time constant of desensitization of ASIC3 (0.3 seconds) [Sutherland et al., 2001] is more than 10-fold faster than unbinding of RPRFa, RPRFa unbinding would dominate the kinetics of current decay. Although this is a plausible explanation of our results, we cannot, at present, exclude other models in which RPRFa binds to a site different from the NPLSD on ASIC3 and, for example, allosterically enhances unbinding of RPRFa. The availability of RPR[azF]a as a covalent ligand provides the opportunity to unequivocally determine the binding site of RPRFa on ASIC3 in the future, by identifying the covalently modified channel fragments, for example by tandem MS analysis.
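The kinetic argument in this model can be made concrete with a small calculation. The sketch below is a deliberately simplified two-step scheme and not a model that was fitted in this work: it assumes that peptide-bound open channels cannot desensitize, that unbinding (OR to O) and desensitization (O to D) are irreversible first-order steps, and that no rebinding occurs. The function names and the `frac_bound` parameter are illustrative; the time constants of 0.3 s and 6 s are those quoted in the text.

```python
import math


def open_fraction(t: float, tau_desens: float, tau_unbind: float,
                  frac_bound: float = 1.0) -> float:
    """Fraction of channels still open t seconds after the peak.

    Unbound open channels desensitize directly (O -> D).
    Bound open channels first release the peptide (OR -> O), then desensitize.
    """
    k_d = 1.0 / tau_desens   # desensitization rate
    k_u = 1.0 / tau_unbind   # unbinding rate
    if math.isclose(k_d, k_u):
        raise ValueError("closed-form solution requires tau_desens != tau_unbind")
    unbound = math.exp(-k_d * t)
    still_bound = math.exp(-k_u * t)
    # Channels that have released the peptide but have not yet desensitized.
    released = k_u / (k_d - k_u) * (math.exp(-k_u * t) - math.exp(-k_d * t))
    return (1.0 - frac_bound) * unbound + frac_bound * (still_bound + released)


def mean_open_time(tau_desens: float, tau_unbind: float,
                   frac_bound: float = 1.0) -> float:
    """Time integral of open_fraction from 0 to infinity."""
    return tau_desens + frac_bound * tau_unbind


TAU_DESENS = 0.3  # s, desensitization of ASIC3 as quoted in the text
TAU_UNBIND = 6.0  # s, unbinding of RPRFa as quoted in the text

for t in (0.5, 2.5):
    print(t,
          round(open_fraction(t, TAU_DESENS, TAU_UNBIND, frac_bound=0.0), 3),
          round(open_fraction(t, TAU_DESENS, TAU_UNBIND, frac_bound=1.0), 3))
```

Under these assumptions the mean open time of a bound channel is the sum of both time constants, about 6.3 s, so current decay is set almost entirely by unbinding. Lowering `frac_bound` mimics reduced binding, and choosing `tau_unbind` comparable to or shorter than `tau_desens` shifts control of the decay back to desensitization.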

We have previously shown that the prototypical RFamide FMRFa competes with RPRFa for modulation of ASIC3 [Reimers et al., 2017], suggesting that both RFamides share a binding site. Since all RFamides have similar effects on ASICs [Vick and Askwith, 2015], mainly slowing current decay [Askwith et al., 2000, Deval et al., 2003, Chen et al., 2005, Reimers et al., 2017], it is conceivable that all RFamide neuropeptides that modulate ASICs share a binding site on ASICs. Thus, the cavity in the lower palm domain is a probable common binding site for RFamide neuropeptides on ASICs. Although RFamides also modulate ASICs other than ASIC3 [Askwith et al., 2000], ASIC3 and ASIC3-containing heteromers seem to be more strongly modulated by RFamides than other ASICs [Deval et al., 2003, Chen et al., 2006, Vick and Askwith, 2015]; RPRFa, in particular, modulates only ASIC3 and different ASIC3-containing heteromers but not ASIC1a [Reimers et al., 2017]. It is probable that structural differences in the central vestibule are responsible for the differential sensitivity of different ASICs to RFamides. It might be that RFamides bind with lower affinity to the NPLSD of ASIC1a, for example. Since ASIC1a desensitizes more slowly than ASIC3, an enhanced unbinding of RFamides, owing to lower affinity, could result in a situation in which the kinetics of current decay would still be dominated by desensitization rather than by RFamide unbinding, effectively reducing the functional outcome of RFamide binding to the channel. Future studies are necessary to reveal the molecular basis for the subtype-specific action of RFamides.

If RFamide neuropeptides indeed bind to the NPLSD of ASIC3, they would join several other ligands for which there is evidence of binding to the NPLSD. 2-Guanidine-4-methylquinazoline (GMQ) is a synthetic ligand that already induces sustained ASIC3 currents at pH 7.4 [Yu et al., 2010] and also binds to the NPLSD [Yu et al., 2010]. Also, different natural ligands bind to the NPLSD, such as the arginine metabolite agmatine [Yu et al., 2010], serotonin [Wang et al., 2013], and the pruritogenic peptide SLIGRLamide [Peng et al., 2015]. Thus, the NPLSD appears as an important binding cavity in the ECD of ASIC3. All these ligands, including RPRFa, are positively charged, containing guanidine or amine groups, which suggests that they are attracted by the negative charges of the central vestibule [Jasti et al., 2007]. Moreover, the presence of two guanidine and one amino group in RPRFa might explain its particularly high affinity for ASIC3 [Reimers et al., 2017], for example compared with FMRFa, which has only one guanidine and one amino group.

Apart from the positive charges of the ligands binding to the NPLSD, however, their precise structures are quite diverse. This situation is reminiscent of the delayed rectifier K+ channel hERG, where many structurally diverse compounds bind to a common inner cavity to inhibit the channel [J et al., 2005]. This cavity is particularly large in hERG compared with other related K+ channels, explaining the drug promiscuity of hERG. Perhaps the central vestibule of ASIC3 is also larger than that of other ASICs, explaining its sensitivity to different compounds, some of which are quite bulky. It might be that the central vestibule of ASIC3 evolved into a cavity where different physiologic ligands bind to modulate the activity of ASIC3.

In summary, our results reveal how RPRFa slows current decay of ASIC3 and that the NPLSD is important for the modulation of ASIC3 by RPRFa. Although we cannot exclude other models, we propose a mechanistic model in which RPRFa binds to the NPLSD, preventing the contraction of the central vestibule, which appears to be sufficient to slow desensitization.

Figure 5.1: The desensitized state of ASIC3 with RPRFa bound is not strongly populated. (A) Simplified kinetic model of the three main states of ASIC3 with or without RPRFa bound. $\tau_1$ and $\tau_2$ are the time constants of current decay in the absence or presence of RPRFa, respectively. The transitions that describe current decay are highlighted in bold. State DR is highlighted, as its population is uncertain. (B) Left, representative current trace of an ASIC3-expressing oocyte when activated by pH 6.0 (black bars) and treated with 100 µM RPRFa (red bars). Peaks are marked by Roman numerals. Right, mean $I_{2.5\mathrm{s}}/I_{\mathrm{peak}}$ ratios for the pH-activated currents. Mean amplitude of peak i was -17.4 ± 3.4 µA; n = 12. (C) Left, representative current trace of an ASIC3-expressing oocyte when activated by pH 6.0 (black bars) and treated with 30 µM RPRFa (red bars). Peaks are marked by Roman numerals. The dotted red line represents a fit of the current decline of peak i. Current decline of peak ii followed this fit. The trace shown is representative of nine similar experiments. Right, mean $I_{2.5\mathrm{s}}/I_{\mathrm{peak}}$ ratios. Mean amplitude of peak i was -14.0 ± 5.1 µA; n = 9. Error bars represent S.E.M.; n.s., not significant; ∗∗P < 0.01; ∗∗∗P < 0.001 (Student’s paired t test).

Figure 5.2: The photo-reactive peptide RPR[azF]a abolishes desensitization when crosslinked to ASIC3. (A) Left, representative current trace illustrating that increasing concentrations of RPR[azF]a (red bars) slow the desensitization and increase current amplitude of ASIC3 activated by pH 6.3 (black bars). Right, concentration-response curve. The response was quantified as the $I_{5\mathrm{s}}/I_{\mathrm{peak}}$ ratio in measurements like the one shown on the left. n = 7 for 1, 10 and 100 µM and n = 5 for 0.1, 3 and 30 µM. (B) Scheme illustrating that ASIC3-expressing oocytes were incubated in RPR[azF]a and irradiated by UV light. Irradiation time and RPR[azF]a concentration were 1 minute and 30 µM, 6 minutes and 100 µM, or 10 minutes and 30 µM, respectively. When applied for 10 minutes, half of the RPR[azF]a solution was renewed every 2 minutes. The control condition was incubated in 100 µM RPRFa and irradiated for 6 minutes. (C) Representative ASIC3-expressing oocytes, which had been photo-labeled, were activated by pH 6.0 (black bar) in the absence of any peptide. (D) Mean $I_{25\mathrm{s}}/I_{\mathrm{peak}}$ ratios of pH 6.0 currents of UV-irradiated ASIC3-expressing oocytes. n = 5-10. Error bars represent S.E.M.; ∗∗∗P < 0.001. Mean peak currents were in the range of -7.0 ± 3.1 to -17.0 ± 2.5 µA. (E) Representative current trace of an ASIC3-expressing oocyte, which had been photo-labeled with 30 µM RPR[azF]a for 4 minutes. The first activation with pH 6.0 was in the absence of peptide and the second activation was after preapplication of 30 µM RPR[azF]a. Six similar measurements were recorded in the same week.

Figure 5.3: Cartoon representation of ASIC3 in complex with RPRFa. (A) Top view. (B) Side view with highest ranked RPRFa docking pose in green licorice. The cartoon was made with PyMOL (The PyMOL Molecular Graphics System, Version 2.0; Schrödinger, LLC). Higher magnification view of the highest ranked docking pose with key interactions highlighted.

Figure 5.4: Binding of RPRFa to the NPLSD of ASIC3. (A) Two-dimensional protein-ligand interaction diagram of the best binding pose. (B) Protein-ligand interaction fingerprint (PLIF) as population histogram. For each amino acid of the receptor, as many columns are displayed as there are distinguishable noncovalent interactions in at least one pose.

Figure 5.5: Mutations of residues in the NPLSD reduce the modulation by RPRFa. (A) Scheme illustrating the four β-strands of one subunit that surround the cavity in the lower palm domain. Approximate positions of amino acids that point toward the cavity and have been mutated are indicated. (B) Representative current trace of an ASIC3-expressing oocyte when activated by pH 6.3 (black bar) or pH 4.0 (blue bar) with or without preapplication of RPRFa (3 µM, red bar). (C) Representative pH-activated currents of NPLSD mutants in the absence of (black trace) and after preincubation with (red trace) 3 µM RPRFa were overlaid. (D) Mean $[I_{2.5\mathrm{s}+\mathrm{RFa}} - I_{2.5\mathrm{s}-\mathrm{RFa}}]/I_{\mathrm{Peak}-\mathrm{RFa}}$ ratios for all 25 mutants of the NPLSD. Bars are shown in the color of the corresponding β-strands in (A); white bars represent residues whose side chains point away from the cavity. Error bars indicate S.E.M. $n_{\mathrm{mutant}}$ = 8-19; $n_{\mathrm{wt}}$ = 4-28 (each condition), $n_{\mathrm{wt}}$ (total) = 102. ∗P < 0.05; ∗∗P < 0.01; ∗∗∗P < 0.001 (Student’s t test followed by Bonferroni correction). WT, wild type.

Figure 5.6: NPLSD mutants are affected by high concentrations of RPRFa. (A) Representative recording of an ASIC3-expressing oocyte, illustrating the application protocol. ASIC3 was first activated by pH 6.3 (black bar) without peptide and then after preapplication of 100 µM RPRFa (red bar). (B) Representative pH-activated currents of NPLSD mutants in the absence of (black trace) and after preincubation with (red trace) 3 µM RPRFa were overlaid. Gray vertical lines indicate the time points 0, 0.5, and 2.5 seconds post-peak. (C) Bar graph displaying mean ratios of current amplitudes 0.5 seconds post-peak in the presence and in the absence of RPRFa ($[I_{0.5\mathrm{s}+\mathrm{RFa}} - I_{0.5\mathrm{s}-\mathrm{RFa}}]/I_{\mathrm{Peak}-\mathrm{RFa}}$). Error bars indicate S.E.M.; $n_{\mathrm{wt}}$ = 17, $n_{\mathrm{mutant}}$ = 6-12. (D) Current decay after preapplication of RPRFa was fit with double exponential functions yielding $a_2$ (top) and the ratio $\tau_2/\tau_1$ (bottom). Mutant Q271R was not analyzed, because it could not be fit properly. $n_{\mathrm{wt}}$ = 15, $n_{\mathrm{mutant}}$ = 6-10. ∗P < 0.05; ∗∗P < 0.01; ∗∗∗P < 0.001. WT, wild type.

5.10 Conclusion

Despite considerable efforts from industry and academia, the development of novel pharmaceutical treatment options for an aging society has lagged behind. This is particularly the case for neurodegenerative disorders, where the need for new drugs remains unmet and attrition rates are high [Craven, 2011, Cummings and Zhong, 2016, Mehta et al., 2017]. Here computational methods can potentially guide experimental efforts and provide insights into the biological and pathophysiological processes at the molecular level.

Voßfeldt et al. in Dr. Aaron Voigt’s Lab, in Prof. Jörg B. Schulz’s institute (Institute of Neurology, RWTH University Hospital, RWTH Aachen), had previously shown neuroprotective effects of tRNA methyltransferase 2 homolog A (TRMT2a) knock-down in a large-scale screen for modulators of polyglutamine diseases in fly disease models. In this thesis, I explored computer-assisted molecular modeling and simulation techniques to inhibit TRMT2a for polyglutamine diseases and attempted to match the disease-modifying effects of TRMT2a knock-down with a pharmacological intervention. To this end, I built homology models for the S-adenosyl methionine-dependent catalytic domain located near the C-terminus and for the N-terminal RNA Recognition Motif (RRM) domain (TRMT2a RRM). The resulting models for TRMT2a RRM proved essential for the selection of constructs for effective protein expression, purification and eventually structure elucidation by means of X-ray crystallography, which were performed in the lab of Prof. Niessing, University of Ulm. However, RRMs are the most abundant RNA binding domain in eukaryotes, often with multiple RRM copies present in a single protein, which indicated that finding a specific inhibitor of TRMT2a RRM is key to avoiding off-target effects of putative drugs. Moreover, several distinct RNA binding modes show a high degree of structural versatility, which complicates the in silico prediction of the tRNA-RRM complex. However, on the sequence level, I identified two subdomains (RNP1 and RNP2) of the RRM, which frequently mediate RNA-RRM recognition in other RRMs. Unfortunately, an initial assessment of the crystal structure of TRMT2a RRM indicated that these subdomain regions were not sufficiently concave to provide a suitable binding pocket for small molecules. Also, when considering the entire domain, such a site was not discernible, indicating that this domain might be undruggable.
While druggability can be assumed when such a protein features a distinct pocket, it is also possible to infer it from related structures that have known binders. Therefore, I looked for structural information on RRMs in complex with molecules other than RNA and indeed discovered a peptide binding site on the helical site. Thus, to deduce such a site on TRMT2a RRM as well, I performed molecular dynamics simulations in the presence of cosolvent probes that mimic the peptides. Encouragingly, I found increased probe propensity in the same region of TRMT2a RRM, suggesting that the helical site is also involved in protein-protein interactions in this case and that such an interaction could be hampered by interfering molecules.

Next, I assessed the protein dynamics with machine learning tools and implicit solvent molecular dynamics simulations. This led to the discovery of a sufficiently concave protein patch on the aforementioned helical site. Since this pocket was not apparent in the X-ray structure and became evident only when considering the dynamical fluctuations of the protein at room temperature, we classified it as a transient site. Moreover, I predicted this pocket to be allosterically connected with the putative tRNA binding site of this protein domain. Given the high abundance of RRMs, an inhibitor binding to an allosteric site of TRMT2a RRM would clearly be preferable in terms of selectivity.

Subsequently, I investigated the pocket dynamics in detail in several µs long molecular dynamics simulations with explicit solvent and could thereby capture the formation and closure of this pocket repeatedly. Rather than relying on visual inspection of each trajectory snapshot alone, I again used machine learning methods that predict the druggability of a given trajectory snapshot, together with a rotamer propensity analysis of a key active residue.
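Repeated formation and closure of a transient pocket can be counted from a per-frame descriptor of the trajectory. The following minimal sketch assumes that such a descriptor is available as a list of numbers, here called a pocket volume; the descriptor, both thresholds, and the example values are hypothetical and are not taken from the simulations described above.

```python
from typing import Sequence


def count_opening_events(volumes: Sequence[float],
                         open_above: float,
                         closed_below: float) -> int:
    """Count closed-to-open transitions along a trajectory.

    Two thresholds (hysteresis) keep small fluctuations around a single
    cutoff from being counted as separate opening events. The pocket is
    treated as closed before the first frame.
    """
    if closed_below > open_above:
        raise ValueError("closed_below must not exceed open_above")
    is_open = False
    events = 0
    for volume in volumes:
        if not is_open and volume >= open_above:
            is_open = True
            events += 1
        elif is_open and volume <= closed_below:
            is_open = False
    return events


def fraction_open(volumes: Sequence[float], open_above: float) -> float:
    """Fraction of frames whose descriptor is at or above the open threshold."""
    if not volumes:
        raise ValueError("trajectory must contain at least one frame")
    return sum(v >= open_above for v in volumes) / len(volumes)


# Hypothetical per-frame pocket volumes in cubic angstrom; not simulation output.
trajectory = [50, 60, 210, 250, 190, 90, 80, 220, 230, 70]
print(count_opening_events(trajectory, open_above=200, closed_below=100))
print(fraction_open(trajectory, open_above=200))
```

In this example the frame with a value of 190 does not end the first opening event, because the pocket only counts as closed again once the descriptor drops to the lower threshold.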

To finally target this transient pocket, I ran a virtual screening to identify binders. I selected a small subset of these hits for in vitro testing (Dr. Voigt’s lab, University Clinic Aachen) and biophysical binding measurements (Prof. Niessing’s lab, University of Ulm). Intriguingly, the majority of tested compounds targeting this pocket improved cell viability and decreased pathological aggregation. In contrast, binding was not evident in biophysical measurements with TRMT2a RRM. We assume that the full-length protein would be required to detect binding in the case of the other ligands. To further support our hypothesis that these compounds hamper TRMT2a, we tested their effect on knock-down cell lines transfected with a polyglutamine disease. Encouragingly, TRMT2a knock-down cell lines remained unaffected when treated with these compounds, indicating that our proposed compounds might indeed modify the biological function of TRMT2a.

In parallel, I exploited available ligand information on homologous methylases to build a pharmacophore hypothesis. Most inhibitors closely resembled the cofactor; therefore, we considered them to bind to the catalytic domain of TRMT2a. Here, polyamines and, in particular, spermidine form a noteworthy exception. Of note, spermidine is already in a monocentric, randomized, double-blind, placebo-controlled phase IIb trial for AD [Wirth et al., 2019]. Our results suggest that the neuroprotective effects of spermidine, also observed for polyglutamine disorders, are in part due to interfering with the biological function of TRMT2a.

According to our working hypothesis, the ameliorating effects of TRMT2a inhibition/knock-down are due to an increased error rate during translation, leading to interrupted polyglutamine tracts.

These interrupted tracts are less prone to form toxic aggregates. I simulated the effect of a point mutation in the polyglutamine tract with a Monte Carlo approach [Irback and Mohanty, 2006] and concluded that changed chemo-physical properties lead to the observed decrease in intrinsic toxicity.

In addition, I also applied molecular modeling to predict the binding site and mode of several peptides to acid-sensing ion channels (ASICs) in collaboration with Prof. Gründer (University Clinic Aachen). These channels play a key role in pain. Here, again it was necessary to build comparative models. To validate my models, extensive mutagenesis, electrophysiology and UV-crosslinking experiments were performed in Prof. Gründer’s lab. The atomistic insights gained from these models pave the way for rational structure-based drug discovery on ASICs.

In another ongoing project, again with the University Clinic Aachen, this time in collaboration with the group of Prof. Huber, I rationalized the binding of the compound KIRA-6 to the Fyn and Lyn kinase, respectively. These models encouraged us to propose a shared inhibitor for both kinases. After a virtual screening we prioritized five compounds for in-cell testing, and found that one compound showed a dose-dependent inhibition.

In conclusion, computer-assisted drug discovery, machine learning and molecular dynamics simulations provide striking opportunities to increase our understanding of biological and pathophysiological processes, often even at the atomistic level. This, in turn, facilitates the discovery of novel chemical entities, also for diseases that have proven to be challenging. Ideally, these methods should be integrated at an early stage of a project to leverage their potential and complement experiments.

Bibliography

Aqeel Ahmed, Friedrich Rippmann, Gerhard Barnickel, and Holger Gohlke. A normal mode-based geometric simulation approach for exploring biologically relevant conformational transitions in proteins. 51(7):1604–1622, 6 2011. ISSN 1549-9596. doi: 10.1021/ci100461k. URL http://dx.doi.org/10.1021/ci100461k.

Dunker AK, Lawson JD, Brown CJ, Williams RM, Romero P, Oh JS, Oldfield CJ, Campen AM, Ratliff CM, Hipps KW, Ausio J, Nissen MS, Reeves R, Kang C, Kissinger CR, Bailey RW, Griswold MD, Chiu W, Garner EC, and Obradovic Z. Intrinsically disordered protein. 19:11381529, 2001. ISSN 1093-3263.

Jan Almlof and Peter R Taylor. General contraction of gaussian basis sets. i. atomic natural orbitals for first- and second-row atoms. The Journal of chemical physics, 86:4070, 1987.

S. Altschul. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 25(17):3389–3402, 9 1997. ISSN 1362-4962. doi: 10.1093/nar/25.17.3389. URL http://dx.doi.org/10.1093/nar/25.17.3389.

Rommie E. Amaro, Jerome Baudry, John Chodera, Ozlem Demir, J. Andrew McCammon, Yinglong Miao, and Jeremy C. Smith. Ensemble docking in drug discovery. 114(10):2271–2278, 5 2018. ISSN 0006-3495. doi: 10.1016/j.bpj.2018.02.038. URL http://dx.doi.org/10.1016/j.bpj.2018.02.038.

Hans C. Andersen. Molecular dynamics simulations at constant pressure and/or temperature. 72(4):2384–2393, 2 1980. ISSN 0021-9606. doi: 10.1063/1.439486. URL http://dx.doi.org/10.1063/1.439486.

Karen E Anderson, David Stamler, Mat D Davis, Stewart A Factor, Robert A Hauser, Jouko Isojarvi, L Fredrik Jarskog, Joohi Jimenez-Shahed, Rajeev Kumar, Joseph P McEvoy, Stanislaw Ochudlo, William G Ondo, and Hubert H Fernandez. Deutetrabenazine for treatment of involuntary movements in patients with tardive dyskinesia (AIM-TD): a double-blind, randomised, placebo-controlled, phase 3 trial. 4(8):595–604, 8 2017. ISSN 2215-0366. doi: 10.1016/s2215-0366(17)30236-5. URL http://dx.doi.org/10.1016/s2215-0366(17)30236-5.

A. Andreeva. SCOP database in 2004: refinements integrate structure and sequence family data. 32(90001):226D–229, 1 2004. ISSN 1362-4962. doi: 10.1093/nar/gkh039. URL http://dx.doi.org/10.1093/nar/gkh039.

Candice C Askwith, Chun Cheng, Mutsuhiro Ikuma, Christopher Benson, Margaret P Price, and Michael J Welsh. Neuropeptide FF and fmrfamide potentiate acid-evoked currents from sensory neurons and proton-gated deg/enac channels. 26(1):133–141, 4 2000. ISSN 0896-6273. doi: 10.1016/s0896-6273(00)81144-7. URL http://dx.doi.org/10.1016/s0896-6273(00)81144-7.

Webb B and Sali A. Comparative protein structure modeling using MODELLER. 54:27322406, Jun 2016. ISSN 1934-3396. doi: 10.1002/cpbi.3. URL https://dx.doi.org/10.1002/cpbi.3.

I. Baburin, S. Beyl, and S. Hering. Automated fast perfusion of xenopus oocytes for drug screening. 453(1):117–123, 9 2006. ISSN 0031-6768. doi: 10.1007/s00424-006-0125-y. URL http://dx.doi.org/10.1007/s00424-006-0125-y.

Isabelle Baconguis and Eric Gouaux. Structural plasticity and dynamic selectivity of acid-sensing ion channel–spider toxin complexes. 489(7416):400–405, 7 2012. ISSN 0028-0836. doi: 10.1038/nature11375. URL http://dx.doi.org/10.1038/nature11375.

Isabelle Baconguis, Christopher J. Bohlen, April Goehring, David Julius, and Eric Gouaux. X-ray structure of acid-sensing ion channel 1–snake toxin complex reveals open state of a na+-selective channel. 156(4):717–729, 2 2014. ISSN 0092-8674. doi: 10.1016/j.cell.2014.01.011. URL http://dx.doi.org/10.1016/j.cell.2014.01.011.

David Ban, Luigi I. Iconaru, Arvind Ramanathan, Jian Zuo, and Richard W. Kriwacki. A small molecule causes a population shift in the conformational landscape of an intrinsically disordered protein. 139(39):13692–13700, 9 2017. ISSN 0002-7863. doi: 10.1021/jacs.7b01380. URL http://dx.doi.org/10.1021/jacs.7b01380.

T. Bartoi, K. Augustinowski, G. Polleichtner, S. Grunder, and M. H. Ulbrich. Acid-sensing ion channel (ASIC) 1a/2a heteromers have a flexible 2:1/1:2 stoichiometry. 111(22):8281–8286, 5 2014. ISSN 0027-8424. doi: 10.1073/pnas.1324060111. URL http://dx.doi.org/10.1073/pnas.1324060111.

Christopher J. Benson, Stephani P. Eckert, and Edwin W. McCleskey. Acid-evoked currents in cardiac sensory neurons. 84(8):921–928, 4 1999. ISSN 0009-7330. doi: 10.1161/01.res.84.8.921. URL http://dx.doi.org/10.1161/01.res.84.8.921.

H. J. C. Berendsen, J. P. M. Postma, W. F. van Gunsteren, A. DiNola, and J. R. Haak. Molecular dynamics with coupling to an external bath. 81(8):3684–3690, 10 1984. ISSN 0021-9606. doi: 10.1063/1.448118. URL http://dx.doi.org/10.1063/1.448118.

H. J. C. Berendsen, J. R. Grigera, and T. P. Straatsma. The missing term in effective pair potentials. 91(24):6269–6271, 11 1987. ISSN 0022-3654. doi: 10.1021/j100308a038. URL http://dx.doi.org/10.1021/j100308a038.

H.J.C. Berendsen, D. van der Spoel, and R. van Drunen. GROMACS: A message-passing parallel molecular dynamics implementation. 91(1-3):43–56, 9 1995. ISSN 0010-4655. doi: 10.1016/0010-4655(95)00042-e. URL http://dx.doi.org/10.1016/0010-4655(95)00042-e.

H. M. Berman. The protein data bank. 28(1):235–242, 1 2000. ISSN 1362-4962. doi: 10.1093/nar/28.1.235. URL http://dx.doi.org/10.1093/nar/28.1.235.

Marco Biasini, Stefan Bienert, Andrew Waterhouse, Konstantin Arnold, Gabriel Studer, Tobias Schmidt, Florian Kiefer, Tiziano Gallo Cassarino, Martino Bertoni, Lorenza Bordoli, and Torsten Schwede. SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information. 42(W1):W252–W258, 4 2014. ISSN 1362-4962. doi: 10.1093/nar/gku340. URL http://dx.doi.org/10.1093/nar/gku340.

Ethan Bier. Drosophila, the golden bug, emerges as a tool for human genetics. 6(1):9–23, 1 2005. ISSN 1471-0056. doi: 10.1038/nrg1503. URL http://dx.doi.org/10.1038/nrg1503.

Pierre Blanchet and Veronika Kivenko. Drug-induced parkinsonism: diagnosis and management. Volume 6:83–91, 9 2016. ISSN 1927-7733. doi: 10.2147/jprls.s99197. URL http://dx.doi.org/10.2147/jprls.s99197.

A. P. Blum, H. A. Lester, and D. A. Dougherty. Nicotinic pharmacophore: The pyridine N of nicotine and carbonyl of acetylcholine hydrogen bond across a subunit interface to a backbone NH. 107(30):13206–13211, 6 2010. ISSN 0027-8424. doi: 10.1073/pnas.1007140107. URL http://dx.doi.org/10.1073/pnas.1007140107.

M. Bomans, K.-H. Hohne, U. Tiede, and M. Riemer. 3-D segmentation of MR images of the head for 3-D display. 9(2):177–183, 6 1990. ISSN 0278-0062. doi: 10.1109/42.56342. URL http://dx.doi.org/10.1109/42.56342.

Raphael Bonelli and Gregor Wenning. Pharmacological management of huntingtons disease: An evidence-based review. 12(21):2701–2720, 7 2006. ISSN 1381-6128. doi: 10.2174/138161206777698693. URL http://dx.doi.org/10.2174/138161206777698693.

M. Born and R. Oppenheimer. Zur Quantentheorie der Molekeln. Annalen der Physik, 389(20):457–484, 1927. ISSN 1521-3889. doi: 10.1002/andp.19273892002. URL http://dx.doi.org/10.1002/andp.19273892002.

G. R. Bowman and P. L. Geissler. Equilibrium fluctuations of a single folded protein reveal a multitude of potential cryptic allosteric sites. 109(29):11681–11686, 7 2012. ISSN 0027-8424. doi: 10.1073/pnas.1209309109. URL http://dx.doi.org/10.1073/pnas.1209309109.

Ryan Brenke, Dima Kozakov, Gwo-Yu Chuang, Dmitri Beglov, David Hall, Melissa R. Landon, Carla Mattos, and Sandor Vajda. Fragment-based identification of druggable ‘hot spots’ of proteins using fourier domain correlation techniques. 25(5):621–627, 1 2009. ISSN 1460-2059. doi: 10.1093/bioinformatics/btp036. URL http://dx.doi.org/10.1093/bioinformatics/btp036.

S. E. Brenner, C. Chothia, and T. J. P. Hubbard. Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. 95(11):6073–6078, 5 1998. ISSN 0027-8424. doi: 10.1073/pnas.95.11.6073. URL http://dx.doi.org/10.1073/pnas.95.11.6073.

Chothia C and Lesk AM. The relation between the divergence of sequence and structure in proteins. 5:3709526, Apr 1986. ISSN 0261-4189.

Stanley N Caroff, Saurabh Aggarwal, and Charles Yonan. Treatment of tardive dyskinesia with tetrabenazine or valbenazine: a systematic review. 7(2):135–148, 2 2018. ISSN 2042-6305. doi: 10.2217/cer-2017-0065. URL http://dx.doi.org/10.2217/cer-2017-0065.

Jean-Michel Carter, Warren Emmett, Igor Rdl Mozos, Annika Kotter, Mark Helm, Jernej Ule, and Shobbir Hussain. Ficc-seq: a method for enzyme-specified profiling of methyl-5-uridine in cellular RNA. 47(19):e113–e113, 7 2019. ISSN 0305-1048. doi: 10.1093/nar/gkz658. URL http://dx.doi.org/10.1093/nar/gkz658.

D. A. Case, T. E. Cheatham, T. Darden, H. Gohlke, R. Luo, K. M. Merz, A. Onufriev, C. Simmerling, B. Wang, and R. J. Woods. The Amber biomolecular simulation programs. J Comput Chem, 26(16):1668–1688, Dec 2005a.

David A. Case, Thomas E. Cheatham, Tom Darden, Holger Gohlke, Ray Luo, Kenneth M. Merz, Alexey Onufriev, Carlos Simmerling, Bing Wang, and Robert J. Woods. The amber biomolecular simulation programs. 26(16):1668–1688, 2005b. ISSN 0192-8651. doi: 10.1002/jcc.20290. URL http://dx.doi.org/10.1002/jcc.20290.

David S. Cerutti, Robert Duke, Peter L. Freddolino, Hao Fan, and Terry P. Lybrand. A vulnerability in popular molecular dynamics packages concerning langevin and andersen dynamics. 4(10):1669–1680, 9 2008. ISSN 1549-9618. doi: 10.1021/ct8002173. URL http://dx.doi.org/10.1021/ct8002173.

M. R. Chance. Structural genomics: A pipeline for providing structures for thebiologist. 11(4):723–738, 4 2002. ISSN 0961-8368. doi: 10.1110/ps.4570102. URLhttp://dx.doi.org/10.1110/ps.4570102.

Yu-Hsin Chang, Susumu Nishimura, Hisashi Oishi, Vincent P. Kelly, Akihiro Kuno,and Satoru Takahashi. TRMT2A is a novel cell cycle regulator that suppresses cellproliferation. 508(2):410–415, 1 2019. ISSN 0006-291X. doi: 10.1016/j.bbrc.2018.11.104. URL http://dx.doi.org/10.1016/j.bbrc.2018.11.104.

Hongming Chen, Ola Engkvist, Yinhai Wang, Marcus Olivecrona, and Thomas Blaschke. The rise of deep learning in drug discovery. 23(6):1241–1250, 6 2018. ISSN 1359-6446. doi: 10.1016/j.drudis.2018.01.039. URL http://dx.doi.org/10.1016/j.drudis.2018.01.039.

Xuanmao Chen, Hubert Kalbacher, and Stefan Grunder. The tarantula toxin psalmotoxin 1 inhibits acid-sensing ion channel (ASIC) 1a by increasing its apparent H+ affinity. 126(1):71–79, 6 2005. ISSN 0022-1295. doi: 10.1085/jgp.200509303. URL http://dx.doi.org/10.1085/jgp.200509303.

Xuanmao Chen, Martin Paukert, Ivan Kadurin, Michael Pusch, and Stefan Grunder. Strong modulation by rfamide neuropeptides of the asic1b/3 heteromer in competition with extracellular calcium. 50(8):964–974, 6 2006. ISSN 0028-3908. doi: 10.1016/j.neuropharm.2006.01.007. URL http://dx.doi.org/10.1016/j.neuropharm.2006.01.007.

Alan C Cheng, Ryan G Coleman, Kathleen T Smyth, Qing Cao, Patricia Soulard, Daniel R Caffrey, Anna C Salzberg, and Enoch S Huang. Structure-based maximal affinity model predicts small-molecule druggability. 25(1):71–75, 1 2007. ISSN 1087-0156. doi: 10.1038/nbt1273. URL http://dx.doi.org/10.1038/nbt1273.

Paul Christo and Michael Bottros. Current perspectives on intrathecal drug delivery. page 615, 11 2014. ISSN 1178-7090. doi: 10.2147/jpr.s37591. URL http://dx.doi.org/10.2147/jpr.s37591.

Peter Cimermancic, Patrick Weinkam, T. Justin Rettenmaier, Leon Bichmann, Daniel A. Keedy, Rahel A. Woldeyes, Dina Schneidman-Duhovny, Omar N. Demerdash, Julie C. Mitchell, James A. Wells, James S. Fraser, and Andrej Sali. Cryptosite: Expanding the druggable proteome by characterization and prediction of cryptic binding sites. 428(4):709–719, 2 2016. ISSN 0022-2836. doi: 10.1016/j.jmb.2016.01.029. URL http://dx.doi.org/10.1016/j.jmb.2016.01.029.

Antoine Clery, Markus Blatter, and Frederic H-T Allain. RNA recognition motifs: boring? not quite. 18(3):290–298, 6 2008. ISSN 0959-440X. doi: 10.1016/j.sbi.2008.04.002. URL http://dx.doi.org/10.1016/j.sbi.2008.04.002.

Topham CM, McLeod A, Eisenmenger F, Overington JP, Johnson MS, and Blundell TL. Fragment ranking in modelling of protein structure. conformationally constrained environmental amino acid substitution tables. 229:8421300, Jan 1993. ISSN 0022-2836. doi: 10.1006/jmbi.1993.1018. URL https://dx.doi.org/10.1006/jmbi.1993.1018.

Giulia Coarelli, Alhassane Diallo, Morgane Sonia Thion, Daisy Rinaldi, Fabienne Calvas, Ouahid Lagha Boukbiza, Alina Tataru, Perrine Charles, Christine Tranchant, Cecilia Marelli, Claire Ewenczyk, Maya Tchikviladze, Marie-Lorraine Monin, Bertrand Carlander, Mathieu Anheim, Alexis Brice, Fanny Mochel, Sophie Tezenas du Montcel, Sandrine Humbert, and Alexandra Durr. Low cancer prevalence in polyglutamine expansion diseases. 88(12):1114–1119, 2 2017. ISSN 0028-3878. doi: 10.1212/wnl.0000000000003725. URL http://dx.doi.org/10.1212/wnl.0000000000003725.

Lorenzo Corsini, Sophie Bonnal, Jerome Basquin, Michael Hothorn, Klaus Scheffzek, Juan Valcarcel, and Michael Sattler. U2af-homology motif interactions are required for alternative splicing regulation by SPF45. 14(7):620–629, 6 2007. ISSN 1545-9993. doi: 10.1038/nsmb1260. URL http://dx.doi.org/10.1038/nsmb1260.

Constanza J. Cortes, Shuo-Chien Ling, Ling T. Guo, Gene Hung, Taiji Tsunemi, Linda Ly, Seiya Tokunaga, Edith Lopez, Bryce L. Sopher, C. Frank Bennett, G. Diane Shelton, Don W. Cleveland, and Albert R. La Spada. Muscle expression of mutant androgen receptor accounts for systemic and motor neuron disease phenotypes in spinal and bulbar muscular atrophy. 82(2):295–307, 4 2014. ISSN 0896-6273. doi: 10.1016/j.neuron.2014.03.001. URL http://dx.doi.org/10.1016/j.neuron.2014.03.001.

Rebecca Craven. The risky business of drug development in neurology. 10(2):116–117, 2 2011. ISSN 1474-4422. doi: 10.1016/s1474-4422(11)70004-7. URL http://dx.doi.org/10.1016/s1474-4422(11)70004-7.

Jeffrey L. Cummings and Kate Zhong. Clinical Trials and Drug Development in Neurodegenerative Diseases. Oxford University Press, 11 2016. doi: 10.1093/med/9780190233563.003.0018. URL http://dx.doi.org/10.1093/med/9780190233563.003.0018.

Kenneth A. Cushman, Josephine Marsh-Haffner, John P. Adelman, and Edwin W. McCleskey. A conformation change in the extracellular domain that accompanies desensitization of acid-sensing ion channel (ASIC) 3. 129(4):345–350, 3 2007. ISSN 0022-1295. doi: 10.1085/jgp.200709757. URL http://dx.doi.org/10.1085/jgp.200709757.

Murray CW, Auton TR, and Eldridge MD. Empirical scoring functions. II. the testing of an empirical scoring function for the prediction of ligand-receptor binding affinities and the use of bayesian regression to improve the quality of the model. 12:9834910, Sep 1998. ISSN 0920-654X.

Antoine Daina, Olivier Michielin, and Vincent Zoete. Swisstargetprediction: updated data and new features for efficient prediction of protein targets of small molecules. 47(W1):W357–W364, 5 2019. ISSN 0305-1048. doi: 10.1093/nar/gkz382. URL http://dx.doi.org/10.1093/nar/gkz382.

Jonathan J. Danon, Tristan A. Reekie, and Michael Kassiou. Challenges and opportunities in central nervous system drug discovery. 1(6):612–624, 9 2019. ISSN 2589-5974. doi: 10.1016/j.trechm.2019.04.009. URL http://dx.doi.org/10.1016/j.trechm.2019.04.009.

Tom Darden, Darrin York, and Lee Pedersen. Particle mesh ewald: An n-log(n) method for ewald sums in large systems. The Journal of Chemical Physics, 98(12):10089–10092, 1993. doi: 10.1063/1.464397. URL https://doi.org/10.1063/1.464397.

Pari Davanloo, Mathias Sprinzl, Kimitsuna Watanabe, Martin Albani, and Helga Kersten. Role of ribothymidine in the thermal stability of transfer RNA as monitored by proton magnetic resonance. 6(4):1571–1581, 1979. ISSN 0305-1048. doi: 10.1093/nar/6.4.1571. URL http://dx.doi.org/10.1093/nar/6.4.1571.

Amanda K. Davis, William B. Pratt, Andrew P. Lieberman, and Yoichi Osawa. Targeting hsp70 facilitated protein quality control for treatment of polyglutamine diseases. 9 2019. ISSN 1420-682X. doi: 10.1007/s00018-019-03302-2. URL http://dx.doi.org/10.1007/s00018-019-03302-2.

Warren L DeLano. Unraveling hot spots in binding interfaces: progress and challenges. 12(1):14–20, 2 2002. ISSN 0959-440X. doi: 10.1016/s0959-440x(02)00283-x. URL http://dx.doi.org/10.1016/s0959-440x(02)00283-x.

Rebecca Deprez-Poulain and Benoit Deprez. Facts, figures and trends in lead generation. 4(6):569–580, 2 2004. ISSN 1568-0266. doi: 10.2174/1568026043451168. URL http://dx.doi.org/10.2174/1568026043451168.

E. Deval, A. Baron, E. Lingueglia, H. Mazarguil, J.-M. Zajac, and M. Lazdunski. Effects of neuropeptide SF and related peptides on acid sensing ion channel 3 and sensory neuron excitability. 44(5):662–671, 4 2003. ISSN 0028-3908. doi: 10.1016/s0028-3908(03)00047-9. URL http://dx.doi.org/10.1016/s0028-3908(03)00047-9.

E. Deval, J. Noel, X. Gasull, A. Delaunay, A. Alloui, V. Friend, A. Eschalier, M. Lazdunski, and E. Lingueglia. Acid-sensing ion channels in postoperative pain. 31(16):6059–6066, 4 2011. ISSN 0270-6474. doi: 10.1523/jneurosci.5266-10.2011. URL http://dx.doi.org/10.1523/jneurosci.5266-10.2011.

Emmanuel Deval and Eric Lingueglia. Acid-sensing ion channels and nociception in the peripheral and central nervous systems. 94:49–57, 7 2015. ISSN 0028-3908. doi: 10.1016/j.neuropharm.2015.02.009. URL http://dx.doi.org/10.1016/j.neuropharm.2015.02.009.

Emmanuel Deval, Jacques Noel, Nadege Lay, Abdelkrim Alloui, Sylvie Diochot, Valerie Friend, Martine Jodar, Michel Lazdunski, and Eric Lingueglia. ASIC3, a sensor of acidic and primary inflammatory pain. 27(22):3047–3055, 10 2008. ISSN 0261-4189. doi: 10.1038/emboj.2008.213. URL http://dx.doi.org/10.1038/emboj.2008.213.

S. Dietmann. A fully automatic evolutionary classification of protein folds: Dali domain dictionary version 3. 29(1):55–57, 1 2001. ISSN 1362-4962. doi: 10.1093/nar/29.1.55. URL http://dx.doi.org/10.1093/nar/29.1.55.

Nana Diarra dit Konte, Miroslav Krepl, Fred F. Damberger, Nina Ripin, Olivier Duss, Jiri Sponer, and Frederic H.-T. Allain. Aromatic side-chain conformational switch on the surface of the RNA recognition motif enables RNA discrimination. 8(1), 9 2017. ISSN 2041-1723. doi: 10.1038/s41467-017-00631-3. URL http://dx.doi.org/10.1038/s41467-017-00631-3.

Hazuda DJ, Anthony NJ, Gomez RP, Jolly SM, Wai JS, Zhuang L, Fisher TE, Embrey M, Guare JP Jr, Egbertson MS, Vacca JP, Huff JR, Felock PJ, Witmer MV, Stillmock KA, Danovich R, Grobler J, Miller MD, Espeseth AS, Jin L, Chen IW, Lin JH, Kassahun K, Ellis JD, Wong BK, Xu W, Pearson PG, Schleif WA, Cortese R, Emini E, Summa V, Holloway MK, and Young SD. A naphthyridine carboxamide provides evidence for discordant resistance between mechanistically identical inhibitors of HIV-1 integrase. 101:15277684, Aug 2004. ISSN 0027-8424. doi: 10.1073/pnas.0402357101. URL https://dx.doi.org/10.1073/pnas.0402357101.

Gideon Dreyfuss, V. Narry Kim, and Naoyuki Kataoka. Messenger-rna-binding proteins and the messages they carry. 3(3):195–205, 3 2002. ISSN 1471-0072. doi: 10.1038/nrm760. URL http://dx.doi.org/10.1038/nrm760.

Ron O. Dror, Albert C. Pan, Daniel H. Arlow, David W. Borhani, Paul Maragakis, Yibing Shan, Huafeng Xu, and David E. Shaw. Pathway and mechanism of drug binding to g-protein-coupled receptors. 108(32):13118–13123, 7 2011. ISSN 0027-8424. doi: 10.1073/pnas.1104614108. URL http://dx.doi.org/10.1073/pnas.1104614108.

Ron O. Dror, Robert M. Dirks, J.P. Grossman, Huafeng Xu, and David E. Shaw. Biomolecular simulation: A computational microscope for molecular biology. 41(1):429–452, 6 2012. ISSN 1936-122X. doi: 10.1146/annurev-biophys-042910-155245. URL http://dx.doi.org/10.1146/annurev-biophys-042910-155245.

Brittany N. Dugger and Dennis W. Dickson. Pathology of neurodegenerative diseases. 9(7):a028035, 1 2017. ISSN 1943-0264. doi: 10.1101/cshperspect.a028035. URL http://dx.doi.org/10.1101/cshperspect.a028035.

Roland L Dunbrack. Rotamer libraries in the 21st century. 12(4):431–440, 8 2002. ISSN 0959-440X. doi: 10.1016/s0959-440x(02)00344-5. URL http://dx.doi.org/10.1016/s0959-440x(02)00344-5.

Roland L. Dunbrack and Martin Karplus. Backbone-dependent rotamer library for proteins application to side-chain prediction. 230(2):543–574, 3 1993. ISSN 0022-2836. doi: 10.1006/jmbi.1993.1170. URL http://dx.doi.org/10.1006/jmbi.1993.1170.

Jacob D. Durrant and J. Andrew McCammon. Nnscore 2.0: A neural-network receptor–ligand scoring function. 51(11):2897–2903, 11 2011. ISSN 1549-9596. doi: 10.1021/ci2003889. URL http://dx.doi.org/10.1021/ci2003889.

Jean Durup. On “levinthal paradox” and the theory of protein folding. 424(1-2):157–169, 2 1998. ISSN 0166-1280. doi: 10.1016/s0166-1280(97)00238-8. URL http://dx.doi.org/10.1016/s0166-1280(97)00238-8.

Maciej Długosz and Joanna Trylska. Secondary structures of native and pathogenic huntingtin n-terminal fragments. 115(40):11597–11608, 10 2011. ISSN 1520-6106. doi: 10.1021/jp206373g. URL http://dx.doi.org/10.1021/jp206373g.

Edward H. Egelman. The current revolution in cryo-em. 110(5):1008–1012, 3 2016. ISSN 0006-3495. doi: 10.1016/j.bpj.2016.02.001. URL http://dx.doi.org/10.1016/j.bpj.2016.02.001.

David Eisenberg, Roland Luthy, and James U. Bowie. [20] VERIFY3D: assessment of protein models with three-dimensional profiles. In Methods in Enzymology, pages 396–404. Elsevier, 1997. ISBN 9780121821784. doi: 10.1016/s0076-6879(97)77022-8. URL http://dx.doi.org/10.1016/s0076-6879(97)77022-8.

A. Elbaz, L. Carcaillon, S. Kab, and F. Moisan. Epidemiology of parkinson’s disease. 172(1):14–26, 1 2016. ISSN 0035-3787. doi: 10.1016/j.neurol.2015.09.012. URL http://dx.doi.org/10.1016/j.neurol.2015.09.012.

Andrew C. Elden, Hyung-Jun Kim, Michael P. Hart, Alice S. Chen-Plotkin, Brian S. Johnson, Xiaodong Fang, Maria Armakola, Felix Geser, Robert Greene, Min Min Lu, Arun Padmanabhan, Dana Clay-Falcone, Leo McCluskey, Lauren Elman, Denise Juhr, Peter J. Gruber, Udo Rub, Georg Auburger, John Q. Trojanowski, Virginia M.-Y. Lee, Vivianna M. Van Deerlin, Nancy M. Bonini, and Aaron D. Gitler. Ataxin-2 intermediate-length polyglutamine expansions are associated with increased risk for ALS. 466(7310):1069–1075, 8 2010. ISSN 0028-0836. doi: 10.1038/nature09320. URL http://dx.doi.org/10.1038/nature09320.

Paul Emsley and Kevin Cowtan. Coot: model-building tools for molecular graphics. 60(12):2126–2132, 11 2004. ISSN 0907-4449. doi: 10.1107/s0907444904019158. URL http://dx.doi.org/10.1107/s0907444904019158.

Philipp Ermert. Design, properties and recent application of macrocycles in medicinal chemistry. 71(10):678–702, 10 2017. ISSN 0009-4293. doi: 10.2533/chimia.2017.678. URL http://dx.doi.org/10.2533/chimia.2017.678.

Ulrich Essmann, Lalith Perera, Max L. Berkowitz, Tom Darden, Hsing Lee, and Lee G. Pedersen. A smooth particle mesh ewald method. 103(19):8577–8593, 11 1995. ISSN 0021-9606. doi: 10.1063/1.470117. URL http://dx.doi.org/10.1063/1.470117.

Philip Evans. Scaling and assessment of data quality. 62(1):72–82, 12 2005. ISSN 0907-4449. doi: 10.1107/s0907444905036693. URL http://dx.doi.org/10.1107/s0907444905036693.

Pearl F, Todd A, Sillitoe I, Dibley M, Redfern O, Lewis T, Bennett C, Marsden R, Grant A, Lee D, Akpor A, Maibaum M, Harrison A, Dallman T, Reeves G, Diboun I, Addou S, Lise S, Johnston C, Sillero A, Thornton J, and Orengo C. The CATH domain structure database and related resources gene3d and DHS provide comprehensive domain family information for genome analysis. 33:15608188, Jan 2005. ISSN 0305-1048. doi: 10.1093/nar/gki024. URL https://dx.doi.org/10.1093/nar/gki024.

Wilfredo Evangelista Falcon, Sally R. Ellingson, Jeremy C. Smith, and Jerome Baudry. Ensemble docking in drug discovery: How many protein configurations from molecular dynamics simulations are needed to reproduce known ligand binding? 123(25):5189–5195, 1 2019. ISSN 1520-6106. doi: 10.1021/acs.jpcb.8b11491. URL http://dx.doi.org/10.1021/acs.jpcb.8b11491.

Giorgio Favrin, Anders Irback, and Fredrik Sjunnesson. Monte carlo update for chain molecules: Biased gaussian steps in torsional space. 114(18):8154–8158, 5 2001. ISSN 0021-9606. doi: 10.1063/1.1364637. URL http://dx.doi.org/10.1063/1.1364637.

Liang Feng, Kelly Sheppard, Suk Namgoong, Alexandre Ambrogelly, Carla Polycarpo, Lennart Randau, Debra Tumbula-Hansena, and Dieter Soll. Aminoacyl-trna synthesis by pre-translational amino acid modification. 1(1):15–19, 5 2004. ISSN 1547-6286. doi: 10.4161/rna.1.1.953. URL http://dx.doi.org/10.4161/rna.1.1.953.

P. W. Fenimore, H. Frauenfelder, B. H. McMahon, and R. D. Young. Bulk-solvent and hydration-shell fluctuations, similar to α- and β-fluctuations in glasses, control protein motions and functions. 101(40):14408–14413, 9 2004. ISSN 0027-8424. doi: 10.1073/pnas.0405573101. URL http://dx.doi.org/10.1073/pnas.0405573101.

Ana-Maria Fernandez-Escamilla, Frederic Rousseau, Joost Schymkowitz, and Luis Serrano. Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins. 22(10):1302–1306, 9 2004. ISSN 1087-0156. doi: 10.1038/nbt1012. URL http://dx.doi.org/10.1038/nbt1012.

Emil Fischer. Einfluss der configuration auf die wirkung der enzyme. 27(3):2985–2993, 10 1894. ISSN 0365-9496. doi: 10.1002/cber.18940270364. URL http://dx.doi.org/10.1002/cber.18940270364.

Andras Fiser. Comparative protein structure modelling. In From Protein Structure to Function with Bioinformatics, pages 91–134. Springer Netherlands, 2017. ISBN 9789402410679. doi: 10.1007/978-94-024-1069-3_4. URL http://dx.doi.org/10.1007/978-94-024-1069-3_4.

Stefano Forli. Charting a path to success in virtual screening. 20(10):18732–18758, 10 2015. ISSN 1420-3049. doi: 10.3390/molecules201018732. URL http://dx.doi.org/10.3390/molecules201018732.

Samuel Frank. Treatment of huntington’s disease. 11(1):153–160, 12 2013. ISSN 1933-7213. doi: 10.1007/s13311-013-0244-z. URL http://dx.doi.org/10.1007/s13311-013-0244-z.

Matthew Freeman. Reiterative use of the egf receptor triggers differentiation of all cell types in the drosophila eye. Cell, 87(4):651–660, 1996.

Tamara Frembgen-Kesner and Adrian H. Elcock. Computational sampling of a cryptic drug binding site in a protein receptor: Explicit solvent molecular dynamics and inhibitor docking to p38 MAP kinase. 359(1):202–214, 5 2006. ISSN 0022-2836. doi: 10.1016/j.jmb.2006.03.021. URL http://dx.doi.org/10.1016/j.jmb.2006.03.021.

Erin N. Frey, Ryan E. Pavlovicz, Clem John Wegman, Chenglong Li, and Candice C. Askwith. Conformational changes in the lower palm domain of asic1a contribute to desensitization and rfamide modulation. 8(8):e71733, 8 2013. ISSN 1932-6203. doi: 10.1371/journal.pone.0071733. URL http://dx.doi.org/10.1371/journal.pone.0071733.

Richard A. Friesner, Robert B. Murphy, Matthew P. Repasky, Leah L. Frye, Jeremy R. Greenwood, Thomas A. Halgren, Paul C. Sanschagrin, and Daniel T. Mainz. Extra precision glide: Docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes. 49(21):6177–6196, 10 2006. ISSN 0022-2623. doi: 10.1021/jm051256o. URL http://dx.doi.org/10.1021/jm051256o.

Dmitrij Frishman and Alfonso Valencia, editors. Modern Genome Annotation. Springer Vienna, 2008. ISBN 9783211751220. doi: 10.1007/978-3-211-75123-7. URL http://dx.doi.org/10.1007/978-3-211-75123-7.

Jason Gavenonis, Bradley A Sheneman, Timothy R Siegert, Matthew R Eshelman, and Joshua A Kritzer. Comprehensive analysis of loops at protein-protein interfaces for macrocycle design. 10(9):716–722, 7 2014. ISSN 1552-4450. doi: 10.1038/nchembio.1580. URL http://dx.doi.org/10.1038/nchembio.1580.

Phani Ghanakota and Heather A. Carlson. Moving beyond active-site detection: Mixmd applied to allosteric systems. 120(33):8685–8695, 6 2016. ISSN 1520-6106. doi: 10.1021/acs.jpcb.6b03515. URL http://dx.doi.org/10.1021/acs.jpcb.6b03515.

Phani Ghanakota, Herman van Vlijmen, Woody Sherman, and Thijs Beuming. Large-scale validation of mixed-solvent simulations to assess hotspots at protein–protein interaction interfaces. 58(4):784–793, 4 2018. ISSN 1549-9596. doi: 10.1021/acs.jcim.7b00487. URL http://dx.doi.org/10.1021/acs.jcim.7b00487.

Laura Ghezzi. Diagnosis of alzheimer’s disease typical and atypical forms. In Neurodegenerative Diseases, pages 21–28. Springer International Publishing, 2018. ISBN 9783319729374. doi: 10.1007/978-3-319-72938-1_2. URL http://dx.doi.org/10.1007/978-3-319-72938-1_2.

Suel GM, Lockless SW, Wall MA, and Ranganathan R. Evolutionarily conserved networks of residues mediate allosteric communication in proteins. 10:12483203, Jan 2003. ISSN 1072-8368. doi: 10.1038/nsb881. URL https://dx.doi.org/10.1038/nsb881.

Holger Gohlke, Manfred Hendlich, and Gerhard Klebe. Knowledge-based scoring function to predict protein-ligand interactions. 295(2):337–356, 1 2000. ISSN 0022-2836. doi: 10.1006/jmbi.1999.3371. URL http://dx.doi.org/10.1006/jmbi.1999.3371.

Joe G Greener and Michael JE Sternberg. Allopred: prediction of allosteric pockets on proteins using normal mode perturbation analysis. 16(1), 10 2015. ISSN 1471-2105. doi: 10.1186/s12859-015-0771-1. URL http://dx.doi.org/10.1186/s12859-015-0771-1.

Joe G Greener and Michael JE Sternberg. Structure-based prediction of protein allostery. 50:1–8, 6 2018. ISSN 0959-440X. doi: 10.1016/j.sbi.2017.10.002. URL http://dx.doi.org/10.1016/j.sbi.2017.10.002.

Stefan Grunder and Michael Pusch. Biophysical properties of acid-sensing ion channels (asics). 94:9–18, 7 2015. ISSN 0028-3908. doi: 10.1016/j.neuropharm.2014.12.016. URL http://dx.doi.org/10.1016/j.neuropharm.2014.12.016.

Vossfeldt H, Butzlaff M, Prussing K, Ni Charthaigh RA, Karsten P, Lankes A, Hamm S, Simons M, Adryan B, Schulz JB, and Voigt A. Large-scale screen for modifiers of ataxin-3-derived polyglutamine-induced toxicity in drosophila. 7:23139745, 2012. doi: 10.1371/journal.pone.0047452. URL https://dx.doi.org/10.1371/journal.pone.0047452.

Philip J. Hajduk, Jeffrey R. Huth, and Stephen W. Fesik. Druggability indices for protein targets derived from nmr-based screening data. 48(7):2518–2525, 4 2005. ISSN 0022-2623. doi: 10.1021/jm049131r. URL http://dx.doi.org/10.1021/jm049131r.

Thomas A. Halgren. Identifying and characterizing binding sites and assessing druggability. 49(2):377–389, 1 2009. ISSN 1549-9596. doi: 10.1021/ci800324m. URL http://dx.doi.org/10.1021/ci800324m.

Tracy P Hamilton and Peter Pulay. Direct inversion in the iterative subspace (DIIS)optimization of open-shell, excited-state, and small multiconfiguration SCF wavefunctions. The Journal of chemical physics, 84:5728, 1986.

Timothy F. Havel and Mark E. Snow. A new method for building protein conformations from sequence alignments with homologues of known structure. 217(1):1–7, 1 1991. ISSN 0022-2836. doi: 10.1016/0022-2836(91)90603-4. URL http://dx.doi.org/10.1016/0022-2836(91)90603-4.

Paul C. D. Hawkins, A. Geoffrey Skillman, Gregory L. Warren, Benjamin A. Ellingson, and Matthew T. Stahl. Conformer generation with OMEGA: algorithm and validation using high quality structures from the protein databank and cambridge structural database. 50(4):572–584, 3 2010. ISSN 1549-9596. doi: 10.1021/ci100031x. URL http://dx.doi.org/10.1021/ci100031x.

Werner Heisenberg. Die Entwicklung der Deutung der Quantentheorie. Physikalische Blatter, 12(7):289–304, 1956.

V. Heiser, E. Scherzinger, A. Boeddrich, E. Nordhoff, R. Lurz, N. Schugardt, H. Lehrach, and E. E. Wanker. Inhibition of huntingtin fibrillogenesis by specific antibodies and small molecules: Implications for huntington’s disease therapy. 97(12):6739–6744, 5 2000. ISSN 0027-8424. doi: 10.1073/pnas.110138997. URL http://dx.doi.org/10.1073/pnas.110138997.

Lim Heo and Michael Feig. What makes it difficult to refine protein models further via molecular dynamics simulations? 86:177–188, 10 2017. ISSN 0887-3585. doi: 10.1002/prot.25393. URL http://dx.doi.org/10.1002/prot.25393.

Csaba Hetenyi and David van der Spoel. Efficient docking of peptides to proteins without prior knowledge of the binding site. 11(7):1729–1737, 4 2009. ISSN 0961-8368. doi: 10.1110/ps.0202302. URL http://dx.doi.org/10.1110/ps.0202302.

Alexander Hillisch, Luis Felipe Pineda, and Rolf Hilgenfeld. Utility of homology models in the drug discovery process. 9(15):659–669, 8 2004. ISSN 1359-6446. doi: 10.1016/s1359-6446(04)03196-4. URL http://dx.doi.org/10.1016/s1359-6446(04)03196-4.

Bosco K. Ho and David A. Agard. Probing the flexibility of large conformational changes in protein structures through local perturbations. 5(4):e1000343, 4 2009. ISSN 1553-7358. doi: 10.1371/journal.pcbi.1000343. URL http://dx.doi.org/10.1371/journal.pcbi.1000343.

Wim G. J. Hol. Protein crystallography and computer graphics—toward rational drug design. 25(9):767–778, 9 1986. ISSN 0570-0833. doi: 10.1002/anie.198607673. URL http://dx.doi.org/10.1002/anie.198607673.

Scott A. Hollingsworth and Ron O. Dror. Molecular dynamics simulation for all. 99(6):1129–1143, 9 2018. ISSN 0896-6273. doi: 10.1016/j.neuron.2018.08.011. URL http://dx.doi.org/10.1016/j.neuron.2018.08.011.

Clive Holmes and Jay Amin. Dementia. 44(11):687–690, 11 2016. ISSN 1357-3039. doi: 10.1016/j.mpmed.2016.08.006. URL http://dx.doi.org/10.1016/j.mpmed.2016.08.006.

Rob W. W. Hooft, Gert Vriend, Chris Sander, and Enrique E. Abola. Errors in protein structures. 381(6580):272–272, 5 1996. ISSN 0028-0836. doi: 10.1038/381272a0. URL http://dx.doi.org/10.1038/381272a0.

Andrew L. Hopkins and Colin R. Groom. The druggable genome. 1(9):727–730, 9 2002. ISSN 1474-1776. doi: 10.1038/nrd892. URL http://dx.doi.org/10.1038/nrd892.

Andrew L. Hopkins, Colin R. Groom, and Alexander Alex. Ligand efficiency: a useful metric for lead selection. 9(10):430–431, 5 2004. ISSN 1359-6446. doi: 10.1016/s1359-6446(04)03069-7. URL http://dx.doi.org/10.1016/s1359-6446(04)03069-7.

Judit Horvath, Pierre R. Burkhard, Constantin Bouras, and Eniko Kovari. Etiologies of parkinsonism in a century-long autopsy-based cohort. 23(1):28–33, 7 2012. ISSN 1015-6305. doi: 10.1111/j.1750-3639.2012.00611.x. URL http://dx.doi.org/10.1111/j.1750-3639.2012.00611.x.

Ya-Ming Hou and John J. Perona. Stereochemical mechanisms of trna methyltransferases. 584(2):278–286, 11 2009. ISSN 0014-5793. doi: 10.1016/j.febslet.2009.11.075. URL http://dx.doi.org/10.1016/j.febslet.2009.11.075.

Julie Hugo and Mary Ganguli. Dementia and cognitive impairment. 30(3):421–442, 8 2014. ISSN 0749-0690. doi: 10.1016/j.cger.2014.04.001. URL http://dx.doi.org/10.1016/j.cger.2014.04.001.

William Humphrey, Andrew Dalke, and Klaus Schulten. VMD: visual molecular dynamics. 14(1):33–38, 2 1996. ISSN 0263-7855. doi: 10.1016/0263-7855(96)00018-5. URL http://dx.doi.org/10.1016/0263-7855(96)00018-5.

Halperin I, Wolfson H, and Nussinov R. Protein-protein interactions; coupling of structurally conserved residues and of hot spots across interfaces. implications for docking. 12:15274922, Jun 2004. ISSN 0969-2126. doi: 10.1016/j.str.2004.04.009. URL https://dx.doi.org/10.1016/j.str.2004.04.009.

Michael Ibba and Dieter Soll. Aminoacyl-trna synthesis. 69(1):617–650, 6 2000. ISSN 0066-4154. doi: 10.1146/annurev.biochem.69.1.617. URL http://dx.doi.org/10.1146/annurev.biochem.69.1.617.

Masahiko Ikeuchi, Sandra J. Kolker, and Kathleen A. Sluka. Acid-sensing ion channel 3 expression in mouse knee joint afferents and effects of carrageenan-induced arthritis. 10(3):336–342, 3 2009. ISSN 1526-5900. doi: 10.1016/j.jpain.2008.10.010. URL http://dx.doi.org/10.1016/j.jpain.2008.10.010.

Anders Irback and Sandipan Mohanty. PROFASI: A monte carlo simulation package for protein folding and aggregation. 27(13):1548–1555, 2006. ISSN 0192-8651. doi: 10.1002/jcc.20452. URL http://dx.doi.org/10.1002/jcc.20452.

Janani Iyer, Qingyu Wang, Thanh Le, Lucilla Pizzo, Sebastian Gronke, Surendra S. Ambegaokar, Yuzuru Imai, Ashutosh Srivastava, Beatriz Llamusi Troisi, Graeme Mardon, Ruben Artero, George R. Jackson, Adrian M. Isaacs, Linda Partridge, Bingwei Lu, Justin P. Kumar, and Santhosh Girirajan. Quantitative assessment of eye phenotypes for functional genetic studies using drosophila melanogaster. 6(5):1427–1437, 3 2016. ISSN 2160-1836. doi: 10.1534/g3.116.027060. URL http://dx.doi.org/10.1534/g3.116.027060.

Mitcheson J, Perry M, Stansfeld P, Sanguinetti MC, Witchel H, and Hancox J. Structural determinants for high-affinity block of herg potassium channels. 266:16050266, 2005. ISSN 1528-2511.

Xu J and Zhang Y. How significant is a protein structure similarity with tm-score = 0.5? 26:20164152, Apr 2010. ISSN 1367-4803. doi: 10.1093/bioinformatics/btq066. URL https://dx.doi.org/10.1093/bioinformatics/btq066.

George R Jackson. Guide to understanding drosophila models of neurodegenerative diseases. 6(2):e53, 2 2008. ISSN 1545-7885. doi: 10.1371/journal.pbio.0060053. URL http://dx.doi.org/10.1371/journal.pbio.0060053.

Matthew P. Jacobson, David L. Pincus, Chaya S. Rapp, Tyler J.F. Day, Barry Honig, David E. Shaw, and Richard A. Friesner. A hierarchical approach to all-atom protein loop prediction. 55(2):351–367, 3 2004. ISSN 0887-3585. doi: 10.1002/prot.10613. URL http://dx.doi.org/10.1002/prot.10613.

Joseph Jankovic and Kathleen Clarence-Smith. Tetrabenazine for the treatment of chorea and other hyperkinetic movement disorders. 11(11):1509–1523, 11 2011. ISSN 1473-7175. doi: 10.1586/ern.11.149. URL http://dx.doi.org/10.1586/ern.11.149.

J. Jasti, H. Furukawa, E.B. Gonzales, and E. Gouaux. Structure of an acid-sensing ion channel 1 at 1.9 A resolution and low ph, 9 2007. URL http://dx.doi.org/10.2210/pdb2qts/pdb.

Baell JB and Nissink JWM. Seven year itch: Pan-assay interference compounds (PAINS) in 2017-utility and limitations. 13:29202222, Jan 2018. ISSN 1554-8929. doi: 10.1021/acschembio.7b00903. URL https://dx.doi.org/10.1021/acschembio.7b00903.

Jianguang Ji, Kristina Sundquist, and Jan Sundquist. Cancer incidence in patients with polyglutamine diseases: a population-based study in sweden. 13(6):642–648, 6 2012. ISSN 1470-2045. doi: 10.1016/s1470-2045(12)70132-8. URL http://dx.doi.org/10.1016/s1470-2045(12)70132-8.

K. A. Johnson, N. C. Fox, R. A. Sperling, and W. E. Klunk. Brain imaging in alzheimer disease. 2(4):a006213–a006213, 1 2012. ISSN 2157-1422. doi: 10.1101/cshperspect.a006213. URL http://dx.doi.org/10.1101/cshperspect.a006213.

T.A. Jones and S. Thirup. Using known substructures in protein model building and crystallography. 5(4):819–822, 4 1986. ISSN 0261-4189. doi: 10.1002/j.1460-2075.1986.tb04287.x. URL http://dx.doi.org/10.1002/j.1460-2075.1986.tb04287.x.

William L. Jorgensen, Jayaraman Chandrasekhar, Jeffry D. Madura, Roger W. Impey, and Michael L. Klein. Comparison of simple potential functions for simulating liquid water. 79(2):926–935, 7 1983. ISSN 0021-9606. doi: 10.1063/1.445869. URL http://dx.doi.org/10.1063/1.445869.

Schames JR, Henchman RH, Siegel JS, Sotriffer CA, Ni H, and McCammon JA. Discovery of a novel binding trench in HIV integrase. 47:15055986, Apr 2004. ISSN 0022-2623. doi: 10.1021/jm0341913. URL https://dx.doi.org/10.1021/jm0341913.

Jaewoon Jung, Wataru Nishima, Marcus Daniels, Gavin Bascom, Chigusa Kobayashi, Adetokunbo Adedoyin, Michael Wall, Anna Lappala, Dominic Phillips, William Fischer, Chang-Shung Tung, Tamar Schlick, Yuji Sugita, and Karissa Y. Sanbonmatsu. Scaling molecular dynamics beyond 100,000 processor cores for large-scale biophysical simulations. 4 2019. ISSN 0192-8651. doi: 10.1002/jcc.25840. URL http://dx.doi.org/10.1002/jcc.25840.

Elizabeth Jurrus, Dave Engel, Keith Star, Kyle Monson, Juan Brandi, Lisa E. Felberg, David H. Brookes, Leighton Wilson, Jiahui Chen, Karina Liles, Minju Chun, Peter Li, David W. Gohara, Todd Dolinsky, Robert Konecny, David R. Koes, Jens Erik Nielsen, Teresa Head-Gordon, Weihua Geng, Robert Krasny, Guo-Wei Wei, Michael J. Holst, J. Andrew McCammon, and Nathan A. Baker. Improvements to the apbs biomolecular solvation software suite. Protein Science, 27(1):112–128, 2018. doi: 10.1002/pro.3280. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/pro.3280.

Cowtan K. The buccaneer software for automated model building. 1. tracing protein chains. 62:16929101, Sep 2006. ISSN 0907-4449. doi: 10.1107/S0907444906022116. URL https://dx.doi.org/10.1107/S0907444906022116.

Cowtan K. Fitting molecular fragments into electron density. 64:18094471, Jan 2008. ISSN 0907-4449. doi: 10.1107/S0907444907033938. URL https://dx.doi.org/10.1107/S0907444907033938.

Khafizov K, Madrid-Aliste C, Almo SC, and Fiser A. Trends in structural coverage of the protein universe and the impact of the protein structure initiative. 111:24567391, Mar 2014. ISSN 0027-8424. doi: 10.1073/pnas.1321614111. URL https://dx.doi.org/10.1073/pnas.1321614111.

Wolfgang Kabsch. XDS. 66(2):125–132, 1 2010. ISSN 0907-4449. doi: 10.1107/s0907444909047337. URL http://dx.doi.org/10.1107/s0907444909047337.

Wolfgang Kabsch and Christian Sander. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. 22(12):2577–2637, 12 1983. ISSN 0006-3525. doi: 10.1002/bip.360221211. URL http://dx.doi.org/10.1002/bip.360221211.

Navneet Kaur, Puneet Kumar, Sumit Jamwal, Rahul Deshmukh, and Vinod Gauttam. Tetrabenazine: Spotlight on drug review. 23(3):176–185, 2016. ISSN 0972-7531. doi: 10.1159/000449184. URL http://dx.doi.org/10.1159/000449184.

Helga Kersten, Martin Albani, Elisabeth Mannlein, Rosemarie Praisler, Peter Wurmbach, and Knud H. Nierhaus. On the role of ribosylthymine in prokaryotic trna function. 114(2):451–456, 2 1981. ISSN 0014-2956. doi: 10.1111/j.1432-1033.1981.tb05166.x. URL http://dx.doi.org/10.1111/j.1432-1033.1981.tb05166.x.

N. Khosla and R. Valdez. A compilation of national plans, policies and government actions for rare diseases in 23 countries. Intractable Rare Dis Res, 7(4):213–222, Nov 2018.

Mee Whi Kim, Yogarany Chelliah, Sang Woo Kim, Zbyszek Otwinowski, and Ilya Bezprozvanny. Secondary structure of huntingtin amino-terminal region. 17(9):1205–1212, 9 2009. ISSN 0969-2126. doi: 10.1016/j.str.2009.08.002. URL http://dx.doi.org/10.1016/j.str.2009.08.002.

P. Kim. Intermediates in the folding reactions of small proteins. 59(1):631–660,1 1990. ISSN 0066-4154. doi: 10.1146/annurev.biochem.59.1.631. URL http://dx.doi.org/10.1146/annurev.biochem.59.1.631.

Gerhard Klebe, editor. Drug Design. Springer Berlin Heidelberg, 2013. ISBN9783642179068. doi: 10.1007/978-3-642-17907-5. URL http://dx.doi.org/10.1007/978-3-642-17907-5.

Daria B. Kokh, Stefan Richter, Stefan Henrich, Paul Czodrowski, Friedrich Ripp-mann, and Rebecca C. Wade. TRAPP: A tool for analysis of transient bindingpockets in proteins. 53(5):1235–1252, 5 2013. ISSN 1549-9596. doi: 10.1021/ci4000294. URL http://dx.doi.org/10.1021/ci4000294.

Daria B. Kokh, Paul Czodrowski, Friedrich Rippmann, and Rebecca C. Wade. Perturbation approaches for exploring protein binding site flexibility to predict transient binding pockets. 12(8):4100–4113, 7 2016. ISSN 1549-9618. doi: 10.1021/acs.jctc.6b00101. URL http://dx.doi.org/10.1021/acs.jctc.6b00101.

D. E. Koshland. Application of a theory of enzyme specificity to protein synthesis. 44(2):98–104, 2 1958. ISSN 0027-8424. doi: 10.1073/pnas.44.2.98. URL http://dx.doi.org/10.1073/pnas.44.2.98.

Andriy Kovalenko and Fumio Hirata. Three-dimensional density profiles of water in contact with a solute of arbitrary shape: a RISM approach. 290(1-3):237–244, 6 1998. ISSN 0009-2614. doi: 10.1016/s0009-2614(98)00471-0. URL http://dx.doi.org/10.1016/s0009-2614(98)00471-0.

Dima Kozakov, Laurie E Grove, David R Hall, Tanggis Bohnuud, Scott E Mottarella, Lingqi Luo, Bing Xia, Dmitri Beglov, and Sandor Vajda. The ftmap family of web servers for determining and characterizing ligand-binding hot spots of proteins. 10(5):733–755, 4 2015. ISSN 1754-2189. doi: 10.1038/nprot.2015.043. URL http://dx.doi.org/10.1038/nprot.2015.043.

Jan Kubelka, James Hofrichter, and William A Eaton. The protein folding ‘speed limit’. 14(1):76–88, 2 2004. ISSN 0959-440X. doi: 10.1016/j.sbi.2004.01.013. URL http://dx.doi.org/10.1016/j.sbi.2004.01.013.

Pushkar Kulkarni and Uday Saxena. Treatment paradigms in huntington’s disease. In Pathology, Prevention and Therapeutics of Neurodegenerative Disease, pages 191–196. Springer Singapore, 9 2018. ISBN 9789811309434. doi: 10.1007/978-981-13-0944-1_17. URL http://dx.doi.org/10.1007/978-981-13-0944-1_17.

Roger M. Lane, Anne Smith, Tiffany Baumann, Marc Gleichmann, Dan Norris, C. Frank Bennett, and Holly Kordasiewicz. Translating antisense technology into a treatment for huntington’s disease. In Methods in Molecular Biology, pages 497–523. Springer New York, 2018. ISBN 9781493978243. doi: 10.1007/978-1-4939-7825-0_23. URL http://dx.doi.org/10.1007/978-1-4939-7825-0_23.

M.A. Larkin, G. Blackshields, N.P. Brown, R. Chenna, P.A. McGettigan, H. McWilliam, F. Valentin, I.M. Wallace, A. Wilm, R. Lopez, J.D. Thompson, T.J. Gibson, and D.G. Higgins. Clustal W and clustal X version 2.0. 23(21):2947–2948, 9 2007. ISSN 1367-4803. doi: 10.1093/bioinformatics/btm404. URL http://dx.doi.org/10.1093/bioinformatics/btm404.

R. A. Laskowski, M. W. MacArthur, D. S. Moss, and J. M. Thornton. PROCHECK: a program to check the stereochemical quality of protein structures. 26(2):283–291, 4 1993. ISSN 0021-8898. doi: 10.1107/s0021889892009944. URL http://dx.doi.org/10.1107/s0021889892009944.

Eric H. Lee, Jen Hsin, Marcos Sotomayor, Gemma Comellas, and Klaus Schulten. Discovery through the computational microscope. 17(10):1295–1306, 10 2009. ISSN 0969-2126. doi: 10.1016/j.str.2009.09.001. URL http://dx.doi.org/10.1016/j.str.2009.09.001.

Jin Li, Ailing Fu, and Le Zhang. An overview of scoring functions used for protein–ligand interactions in molecular docking. 11(2):320–328, 3 2019. ISSN 1913-2751. doi: 10.1007/s12539-019-00327-w. URL http://dx.doi.org/10.1007/s12539-019-00327-w.

Andrew P. Lieberman, Zhigang Yu, Sue Murray, Raechel Peralta, Audrey Low, Shuling Guo, Xing Xian Yu, Constanza J. Cortes, C. Frank Bennett, Brett P. Monia, Albert R. La Spada, and Gene Hung. Peripheral androgen receptor gene suppression rescues disease in mouse models of spinal and bulbar muscular atrophy. 7(3):774–784, 5 2014. ISSN 2211-1247. doi: 10.1016/j.celrep.2014.02.008. URL http://dx.doi.org/10.1016/j.celrep.2014.02.008.

Andrew P. Lieberman, Vikram G. Shakkottai, and Roger L. Albin. Polyglutamine repeats in neurodegenerative diseases. 14(1):1–27, 1 2019. ISSN 1553-4006. doi: 10.1146/annurev-pathmechdis-012418-012857. URL http://dx.doi.org/10.1146/annurev-pathmechdis-012418-012857.

Angelica Nakagawa Lima, Eric Allison Philot, Gustavo Henrique Goulart Trossini, Luis Paulo Barbour Scott, Vinícius Goncalves Maltarollo, and Kathia Maria Honorio. Use of machine learning approaches for novel drug discovery. 11(3):225–239, 2 2016. ISSN 1746-0441. doi: 10.1517/17460441.2016.1146250. URL http://dx.doi.org/10.1517/17460441.2016.1146250.

Evanthia Lionta, George Spyrou, Demetrios Vassilatis, and Zoe Cournia. Structure-based virtual screening for drug discovery: Principles, applications and recent advances. 14(16):1923–1938, 10 2014. ISSN 1568-0266. doi: 10.2174/1568026614666140929124445. URL http://dx.doi.org/10.2174/1568026614666140929124445.

Christopher A. Lipinski. Lead- and drug-like compounds: the rule-of-five revolution. 1(4):337–341, 12 2004. ISSN 1740-6749. doi: 10.1016/j.ddtec.2004.11.007. URL http://dx.doi.org/10.1016/j.ddtec.2004.11.007.

Christopher A Lipinski, Franco Lombardo, Beryl W Dominy, and Paul J Feeney. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. (PII of original article: S0169-409X(96)00423-1; the article was originally published in Advanced Drug Delivery Reviews 23 (1997) 3–25.) 46(1-3):3–26, 3 2001. ISSN 0169-409X. doi: 10.1016/s0169-409x(00)00129-0. URL http://dx.doi.org/10.1016/s0169-409x(00)00129-0.

Sarah Loerch and Clara L. Kielkopf. Dividing and conquering the family of RNA recognition motifs: A representative case based on hnrnp L. 427(19):2997–3000, 9 2015. ISSN 0022-2836. doi: 10.1016/j.jmb.2015.06.009. URL http://dx.doi.org/10.1016/j.jmb.2015.06.009.

Richard J. Loncharich, Bernard R. Brooks, and Richard W. Pastor. Langevin dynamics of peptides: The frictional dependence of isomerization rates of n-acetylalanyl-n′-methylamide. 32(5):523–535, 5 1992. ISSN 0006-3525. doi: 10.1002/bip.360320508. URL http://dx.doi.org/10.1002/bip.360320508.

Pedro E. M. Lopes, Olgun Guvench, and Alexander D. MacKerell. Current status of protein force fields for molecular dynamics simulations. In Methods in Molecular Biology, pages 47–71. Springer New York, 9 2014. ISBN 9781493914647. doi: 10.1007/978-1-4939-1465-4_3. URL http://dx.doi.org/10.1007/978-1-4939-1465-4_3.

L. W. Lu, G. H. Chiang, and K. Randerath. Effects of dl-ethionine on mouse liver trna base composition. 3(9):2243–2254, 9 1976. ISSN 0305-1048. doi: 10.1093/nar/3.9.2243. URL http://dx.doi.org/10.1093/nar/3.9.2243.

A. P. Lyubartsev, A. A. Martsinovski, S. V. Shevkunov, and P. N. Vorontsov-Velyaminov. New approach to monte carlo calculation of the free energy: Method of expanded ensembles. 96(3):1776–1783, 2 1992. ISSN 0021-9606. doi: 10.1063/1.462133. URL http://dx.doi.org/10.1063/1.462133.

Buyong Ma, Sandeep Kumar, Chung-Jung Tsai, and Ruth Nussinov. Folding funnels and binding mechanisms. 12(9):713–720, 9 1999. ISSN 1741-0134. doi: 10.1093/protein/12.9.713. URL http://dx.doi.org/10.1093/protein/12.9.713.

Martí-Renom MA, Stuart AC, Fiser A, Sanchez R, Melo F, and Sali A. Comparative protein structure modeling of genes and genomes. 29:10940251, 2000. ISSN 1056-8700. doi: 10.1146/annurev.biophys.29.1.291. URL https://dx.doi.org/10.1146/annurev.biophys.29.1.291.

Michael Madeja, Ulrich Musshoff, and Erwin-Josef Speckmann. Improvement and testing of a concentration-clamp system for oocytes of xenopus laevis. 63(1-2):211–213, 12 1995. ISSN 0165-0270. doi: 10.1016/0165-0270(95)00094-1. URL http://dx.doi.org/10.1016/0165-0270(95)00094-1.

James A. Maier, Carmenza Martinez, Koushik Kasavajhala, Lauren Wickstrom, Kevin E. Hauser, and Carlos Simmerling. ff14sb: Improving the accuracy of protein side chain and backbone parameters from ff99sb. 11(8):3696–3713, 7 2015. ISSN 1549-9618. doi: 10.1021/acs.jctc.5b00255. URL http://dx.doi.org/10.1021/acs.jctc.5b00255.

Vinicius Goncalves Maltarollo, Thales Kronenberger, Gabriel Zarzana Espinoza, Patricia Rufino Oliveira, and Kathia Maria Honorio. Advances with support vector machines for novel drug discovery. 14(1):23–33, 11 2018. ISSN 1746-0441. doi: 10.1080/17460441.2019.1549033. URL http://dx.doi.org/10.1080/17460441.2019.1549033.

Boris A. Margulis, Vladimir Vigont, Vladimir F. Lazarev, Elena V. Kaznacheyeva, and Irina V. Guzhova. Pharmacological protein targets in polyglutamine diseases: Mutant polypeptides and their interactors. 587(13):1997–2007, 5 2013. ISSN 0014-5793. doi: 10.1016/j.febslet.2013.05.022. URL http://dx.doi.org/10.1016/j.febslet.2013.05.022.

Christophe Maris, Cyril Dominguez, and Frederic H.-T. Allain. The RNA recognition motif, a plastic rna-binding platform to regulate post-transcriptional gene expression. 272(9):2118–2131, 4 2005. ISSN 1742-464X. doi: 10.1111/j.1742-4658.2005.04653.x. URL http://dx.doi.org/10.1111/j.1742-4658.2005.04653.x.

Sebastien Marra, Romain Ferru-Clement, Veronique Breuil, Anne Delaunay, Marine Christin, Valerie Friend, Stephane Sebille, Christian Cognard, Thierry Ferreira, Christian Roux, Liana Euller-Ziegler, Jacques Noel, Eric Lingueglia, and Emmanuel Deval. Non-acidic activation of pain-related acid-sensing ion channel 3 by lipids. 35(4):414–428, 1 2016. ISSN 0261-4189. doi: 10.15252/embj.201592335. URL http://dx.doi.org/10.15252/embj.201592335.

Marc A. Martí-Renom, Ashley C. Stuart, Andras Fiser, Roberto Sanchez, Francisco Melo, and Andrej Sali. Comparative protein structure modeling of genes and genomes. 29(1):291–325, 6 2000. ISSN 1056-8700. doi: 10.1146/annurev.biophys.29.1.291. URL http://dx.doi.org/10.1146/annurev.biophys.29.1.291.

Hirofumi Maruyama, Yuishin Izumi, Hiroyuki Morino, Masaya Oda, Hiromasa Toji, Shigenobu Nakamura, and Hideshi Kawakami. Difference in disease-free survival curve and regional distribution according to subtype of spinocerebellar ataxia: A study of 1,286 japanese patients. 114(5):578–583, 6 2002. ISSN 0148-7299. doi: 10.1002/ajmg.10514. URL http://dx.doi.org/10.1002/ajmg.10514.

Laura Masino, Geoff Kelly, Kevin Leonard, Yvon Trottier, and Annalisa Pastore. Solution structure of polyglutamine tracts in gst-polyglutamine fusion proteins. 513(2-3):267–272, 2 2002. ISSN 0014-5793. doi: 10.1016/s0014-5793(02)02335-9. URL http://dx.doi.org/10.1016/s0014-5793(02)02335-9.

Mark McGann. FRED pose prediction and virtual screening accuracy. 51(3):578–596, 2 2011. ISSN 1549-9596. doi: 10.1021/ci100436p. URL http://dx.doi.org/10.1021/ci100436p.

Susan L. McGovern and Brian K. Shoichet. Information decay in molecular docking screens against holo, apo, and modeled conformations of enzymes. 46(14):2895–2907, 7 2003. ISSN 0022-2623. doi: 10.1021/jm0300330. URL http://dx.doi.org/10.1021/jm0300330.

Dev Mehta, Robert Jackson, Gaurav Paul, Jiong Shi, and Marwan Sabbagh. Why do trials for alzheimer’s disease drugs keep failing? A discontinued drug perspective for 2010-2015. 26(6):735–739, 5 2017. ISSN 1354-3784. doi: 10.1080/13543784.2017.1323868. URL http://dx.doi.org/10.1080/13543784.2017.1323868.

Elaine C. Meng, Brian K. Shoichet, and Irwin D. Kuntz. Automated docking with grid-based energy evaluation. 13(4):505–524, 5 1992. ISSN 0192-8651. doi: 10.1002/jcc.540130412. URL http://dx.doi.org/10.1002/jcc.540130412.

Rajesh P. Menon, Suran Nethisinghe, Serena Faggiano, Tommaso Vannocci, Human Rezaei, Sally Pemble, Mary G. Sweeney, Nicholas W. Wood, Mary B. Davis, Annalisa Pastore, and Paola Giunti. The role of interruptions in polyq in the pathology of SCA1. 9(7):e1003648, 7 2013. ISSN 1553-7404. doi: 10.1371/journal.pgen.1003648. URL http://dx.doi.org/10.1371/journal.pgen.1003648.

Tiago Mestre, Joaquim Ferreira, Miguel M Coelho, Mario Rosa, and Cristina Sampaio. Therapeutic interventions for symptomatic treatment in huntington’s disease. 7 2009. ISSN 1465-1858. doi: 10.1002/14651858.cd006456.pub2. URL http://dx.doi.org/10.1002/14651858.cd006456.pub2.

Nicholas Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. Equation of state calculations by fast computing machines. 21(6):1087–1092, 6 1953. ISSN 0021-9606. doi: 10.1063/1.1699114. URL http://dx.doi.org/10.1063/1.1699114.

S. Migliore, J. Jankovic, and F. Squitieri. Genetic Counseling in Huntington’s Disease: Potential New Challenges on Horizon? Front Neurol, 10:453, 2019.

David L. Mobley and Ken A. Dill. Binding of small-molecule ligands to proteins: “what you see” is not always “what you get”. 17(4):489–498, 4 2009. ISSN 0969-2126. doi: 10.1016/j.str.2009.02.010. URL http://dx.doi.org/10.1016/j.str.2009.02.010.

Derek C Molliver, David C Immke, Leonardo Fierro, Michel Pare, Frank L Rice, and Edwin W McCleskey. ASIC3, an acid-sensing ion channel, is expressed in metaboreceptive sensory neurons. 1:1744–8069–1–35, 1 2005. ISSN 1744-8069. doi: 10.1186/1744-8069-1-35. URL http://dx.doi.org/10.1186/1744-8069-1-35.

Jacques Monod, Jean-Pierre Changeux, and Francois Jacob. Allosteric proteins and cellular control systems. 6(4):306–329, 4 1963. ISSN 0022-2836. doi: 10.1016/s0022-2836(63)80091-1. URL http://dx.doi.org/10.1016/s0022-2836(63)80091-1.

Hesam N. Motlagh, James O. Wrabl, Jing Li, and Vincent J. Hilser. The ensemble nature of allostery. 508(7496):331–339, 4 2014. ISSN 0028-0836. doi: 10.1038/nature13001. URL http://dx.doi.org/10.1038/nature13001.

John Moult, Krzysztof Fidelis, Andriy Kryshtafovych, Burkhard Rost, and Anna Tramontano. Critical assessment of methods of protein structure prediction-round VIII. 77(S9):1–4, 2009. ISSN 0887-3585. doi: 10.1002/prot.22589. URL http://dx.doi.org/10.1002/prot.22589.

Daniel Mucs and Richard A Bryce. The application of quantum mechanics in structure-based drug design. 8(3):263–276, 1 2013. ISSN 1746-0441. doi: 10.1517/17460441.2013.752812. URL http://dx.doi.org/10.1517/17460441.2013.752812.

G. N. Murshudov, A. A. Vagin, and E. J. Dodson. Refinement of macromolecular structures by the maximum-likelihood method. 53(3):240–255, 5 1997. ISSN 0907-4449. doi: 10.1107/s0907444996012255. URL http://dx.doi.org/10.1107/s0907444996012255.

Y. Nagai. Prevention of polyglutamine oligomerization and neurodegeneration by the peptide inhibitor QBP1 in drosophila. 12(11):1253–1259, 6 2003. ISSN 1460-2083. doi: 10.1093/hmg/ddg144. URL http://dx.doi.org/10.1093/hmg/ddg144.

Frank Noe, Gianni De Fabritiis, and Cecilia Clementi. Machine learning for protein folding and dynamics. Current Opinion in Structural Biology, 60:77–84, 2020.

Takamasa Nonaka. The CCP4 suite-computer programs for protein crystallography. 36(3):224–227, 1994. ISSN 0369-4585. doi: 10.5940/jcrsj.36.224. URL http://dx.doi.org/10.5940/jcrsj.36.224.

Monica E. Nordlund, J.O. Marcus Johansson, Ulrich von Pawel-Tammingen, and Anders S. Bystrom. Identification of the TRM2 gene encoding the trna(m5u54)methyltransferase of saccharomyces cerevisiae. 6(6):844–860, 6 2000. ISSN 1355-8382. doi: 10.1017/s1355838200992422. URL http://dx.doi.org/10.1017/s1355838200992422.

Gail A. Van Norman. Drugs and devices. 1(5):399–412, 8 2016. ISSN 2452-302X. doi: 10.1016/j.jacbts.2016.06.003. URL http://dx.doi.org/10.1016/j.jacbts.2016.06.003.

Martyna Nowacka, Pietro Boccaletto, Elzbieta Jankowska, Tomasz Jarzynka,Janusz M Bujnicki, and Stanislaw Dunin-Horkawicz. Rrmdb—an evolutionary-oriented database of RNA recognition motif sequences. 2019, 1 2019. ISSN 1758-0463. doi: 10.1093/database/bay148. URL http://dx.doi.org/10.1093/database/bay148.

Mats H. M. Olsson, Chresten R. Søndergaard, Michal Rostkowski, and Jan H. Jensen. PROPKA3: consistent treatment of internal and surface residues in empirical pka predictions. 7(2):525–537, 1 2011. ISSN 1549-9618. doi: 10.1021/ct100578z. URL http://dx.doi.org/10.1021/ct100578z.

Megumi Omori, Masataka Yokoyama, Yoshikazu Matsuoka, Hiroyuki Kobayashi, Satoshi Mizobuchi, Yoshitaro Itano, Kiyoshi Morita, and Hiroyuki Ichikawa. Effects of selective spinal nerve ligation on acetic acid-induced nociceptive responses and ASIC3 immunoreactivity in the rat dorsal root ganglion. 1219:26–31, 7 2008. ISSN 0006-8993. doi: 10.1016/j.brainres.2008.03.040. URL http://dx.doi.org/10.1016/j.brainres.2008.03.040.

John F. Ouyang and Ryan P. A. Bettens. Modelling water: A lifetime enigma. 69(3):104–111, 3 2015. ISSN 0009-4293. doi: 10.2533/chimia.2015.104. URL http://dx.doi.org/10.2533/chimia.2015.104.

Joanna Owens. Determining druggability. 6(3):187–187, 3 2007. ISSN 1474-1776. doi: 10.1038/nrd2275. URL http://dx.doi.org/10.1038/nrd2275.

Cimermancic P, Weinkam P, Rettenmaier TJ, Bichmann L, Keedy DA, Woldeyes RA, Schneidman-Duhovny D, Demerdash ON, Mitchell JC, Wells JA, Fraser JS, and Sali A. Cryptosite: Expanding the druggable proteome by characterization and prediction of cryptic binding sites. 428:26854760, Feb 2016. ISSN 0022-2836. doi: 10.1016/j.jmb.2016.01.029. URL https://dx.doi.org/10.1016/j.jmb.2016.01.029.

Santosh Panjikar, Venkataraman Parthasarathy, Victor S. Lamzin, Manfred S. Weiss, and Paul A. Tucker. Auto-rickshaw: an automated crystal structure determination platform as an efficient tool for the validation of an x-ray diffraction experiment. 61(4):449–457, 3 2005. ISSN 0907-4449. doi: 10.1107/s0907444905001307. URL http://dx.doi.org/10.1107/s0907444905001307.

Santosh Panjikar, Venkataraman Parthasarathy, Victor S. Lamzin, Manfred S. Weiss, and Paul A. Tucker. On the combination of molecular replacement and single-wavelength anomalous diffraction phasing for automated structure determination. 65(10):1089–1097, 9 2009. ISSN 0907-4449. doi: 10.1107/s0907444909029643. URL http://dx.doi.org/10.1107/s0907444909029643.

D. M. Paton. Deutetrabenazine: Treatment of hyperkinetic aspects of Huntington’s disease, tardive dyskinesia and Tourette syndrome. Drugs Today, 53(2):89–102, Feb 2017.

Fabian Paul and Thomas R. Weikl. How to distinguish conformational selection and induced fit based on chemical relaxation rates. PLOS Computational Biology, 12(9):1–17, 09 2016. doi: 10.1371/journal.pcbi.1005067. URL https://doi.org/10.1371/journal.pcbi.1005067.

Adams PD, Afonine PV, Bunkoczi G, Chen VB, Davis IW, Echols N, Headd JJ, Hung LW, Kapral GJ, Grosse-Kunstleve RW, McCoy AJ, Moriarty NW, Oeffner R, Read RJ, Richardson DC, Richardson JS, Terwilliger TC, and Zwart PH. PHENIX: a comprehensive python-based system for macromolecular structure solution. 66:20124702, Feb 2010. ISSN 0907-4449. doi: 10.1107/S0907444909052925. URL https://dx.doi.org/10.1107/S0907444909052925.

William R. Pearson. Using the FASTA program to search protein and DNA sequence databases. In Computer Analysis of Sequence Data, pages 365–389. Humana Press, 1994. ISBN 9780896032767. doi: 10.1385/0-89603-276-0:365. URL http://dx.doi.org/10.1385/0-89603-276-0:365.

J. Pei and N. V. Grishin. AL2CO: calculation of positional conservation in a protein sequence alignment. 17(8):700–712, 8 2001. ISSN 1367-4803. doi: 10.1093/bioinformatics/17.8.700. URL http://dx.doi.org/10.1093/bioinformatics/17.8.700.

Zhong Peng, Wei-Guang Li, Chen Huang, Yi-Ming Jiang, Xiang Wang, Michael Xi Zhu, Xiaoyang Cheng, and Tian-Le Xu. ASIC3 mediates itch sensation in response to coincident stimulation by acid and nonproton ligand. 13(2):387–398, 10 2015. ISSN 2211-1247. doi: 10.1016/j.celrep.2015.09.002. URL http://dx.doi.org/10.1016/j.celrep.2015.09.002.

John J. Perona and Ita Gruic-Sovulj. Synthetic and editing mechanisms of aminoacyl-trna synthetases. In Topics in Current Chemistry, pages 1–41. Springer Netherlands, 2013. ISBN 9789401787000. doi: 10.1007/128_2013_456. URL http://dx.doi.org/10.1007/128_2013_456.

Ursula Pieper, Benjamin M. Webb, Guang Qiang Dong, Dina Schneidman-Duhovny, Hao Fan, Seung Joong Kim, Natalia Khuri, Yannick G. Spill, Patrick Weinkam, Michal Hammel, John A. Tainer, Michael Nilges, and Andrej Sali. Modbase, a database of annotated comparative protein structure models and associated resources. 42(D1):D336–D346, 11 2013. ISSN 0305-1048. doi: 10.1093/nar/gkt1144. URL http://dx.doi.org/10.1093/nar/gkt1144.

H. Akiko Popiel, Yoshitaka Nagai, Nobuhiro Fujikake, and Tatsushi Toda. Delivery of the aggregate inhibitor peptide QBP1 into the mouse brain using ptds and its therapeutic effect on polyglutamine disease mice. 449(2):87–92, 1 2009. ISSN 0304-3940. doi: 10.1016/j.neulet.2008.06.015. URL http://dx.doi.org/10.1016/j.neulet.2008.06.015.

Christopher A. Powell and Michal Minczuk. TRMT2B is responsible for both trna and rrna m5u-methylation in human mitochondria. 10 2019. doi: 10.1101/797472. URL http://dx.doi.org/10.1101/797472.

Tamara Pringsheim, Katie Wiltshire, Lundy Day, Jonathan Dykeman, Thomas Steeves, and Nathalie Jette. The incidence and prevalence of huntington’s disease: A systematic review and meta-analysis. 27(9):1083–1091, 6 2012. ISSN 0885-3185. doi: 10.1002/mds.25075. URL http://dx.doi.org/10.1002/mds.25075.

Marguerite Prior, Chandramouli Chiruta, Antonio Currais, Josh Goldberg, Justin Ramsey, Richard Dargusch, Pamela A. Maher, and David Schubert. Back to the future with phenotypic screening. 5(7):503–513, 6 2014. ISSN 1948-7193. doi: 10.1021/cn500051h. URL http://dx.doi.org/10.1021/cn500051h.

E. Prabhu Raman, Wenbo Yu, Olgun Guvench, and Alexander D. MacKerell. Reproducing crystal binding modes of ligand functional groups using site-identification by ligand competitive saturation (SILCS) simulations. 51(4):877–896, 4 2011. ISSN 1549-9596. doi: 10.1021/ci100462t. URL http://dx.doi.org/10.1021/ci100462t.

Mukulika Ray and Subhash C Lakhotia. The commonly used eye-specific sev-gal4 and gmr-gal4 drivers in drosophila melanogaster are expressed in tissues other than eyes also. Journal of genetics, 94(3):407–416, 2015.

Catharina Reimers, Cheng-Han Lee, Hubert Kalbacher, Yuemin Tian, Chih-Hsien Hung, Axel Schmidt, Lea Prokop, Silke Kauferstein, Dietrich Mebs, Chih-Cheng Chen, and Stefan Grunder. Identification of a cono-rfamide from the venom of conus textile that targets ASIC3 and enhances muscle pain. 114(17):E3507–E3515, 4 2017. ISSN 0027-8424. doi: 10.1073/pnas.1616232114. URL http://dx.doi.org/10.1073/pnas.1616232114.

Melissa Reiners, Michael A. Margreiter, Adrienne Oslender-Bujotzek, Giulia Rossetti, Stefan Grunder, and Axel Schmidt. The conorfamide rprfa stabilizes the open conformation of acid-sensing ion channel 3 via the nonproton ligand–sensing domain. 94(4):1114–1124, 7 2018. ISSN 0026-895X. doi: 10.1124/mol.118.112375. URL http://dx.doi.org/10.1124/mol.118.112375.

Alexis P Rideau, Clare Gooding, Peter J Simpson, Tom P Monie, Mike Lorenz, Stefan Huttelmaier, Robert H Singer, Stephen Matthews, Stephen Curry, and Christopher W J Smith. A peptide motif in raver1 mediates splicing repression by interaction with the PTB RRM2 domain. 13(9):839–848, 8 2006. ISSN 1545-9993. doi: 10.1038/nsmb1137. URL http://dx.doi.org/10.1038/nsmb1137.

R M Ridley, C D Frith, T J Crow, and P M Conneally. Anticipation in huntington’s disease is inherited through the male line but may originate in the female. 25(9):589–595, 9 1988. ISSN 1468-6244. doi: 10.1136/jmg.25.9.589. URL http://dx.doi.org/10.1136/jmg.25.9.589.

Clemens Carel Johannes Roothaan. New developments in molecular orbital theory. Reviews of modern physics, 23(2):69, 1951.

J. L. Ross, L. F. Queme, E. R. Cohen, K. J. Green, P. Lu, A. T. Shank, S. An, R. C. Hudgins, and M. P. Jankowski. Muscle IL1 drives ischemic myalgia via asic3-mediated sensory neuron sensitization. 36(26):6857–6871, 6 2016. ISSN 0270-6474. doi: 10.1523/jneurosci.4582-15.2016. URL http://dx.doi.org/10.1523/jneurosci.4582-15.2016.

Burkhard Rost. Twilight zone of protein sequence alignments. 12(2):85–94, 2 1999. ISSN 1741-0134. doi: 10.1093/protein/12.2.85. URL http://dx.doi.org/10.1093/protein/12.2.85.

Sophie Roy, Celine Boiteux, Omar Alijevic, Chungwen Liang, Simon Berneche, and Stephan Kellenberger. Molecular determinants of desensitization in an enac/degenerin channel. 27(12):5034–5045, 12 2013. ISSN 0892-6638. doi: 10.1096/fj.13-230680. URL http://dx.doi.org/10.1096/fj.13-230680.

E. M. Russak and E. M. Bednarczyk. Impact of Deuterium Substitution on the Pharmacokinetics of Pharmaceuticals. Ann Pharmacother, 53(2):211–216, 02 2019.

Jean-Paul Ryckaert, Giovanni Ciccotti, and Herman J.C Berendsen. Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. 23(3):327–341, 3 1977. ISSN 0021-9991. doi: 10.1016/0021-9991(77)90098-5. URL http://dx.doi.org/10.1016/0021-9991(77)90098-5.

Chris Sander and Reinhard Schneider. Database of homology-derived protein structures and the structural meaning of sequence alignment. 9(1):56–68, 1 1991. ISSN 0887-3585. doi: 10.1002/prot.340090107. URL http://dx.doi.org/10.1002/prot.340090107.

Sarah J. Tabrizi, Blair R. Leavitt, G. Bernhard Landwehrmeyer, Edward J. Wild, Carsten Saft, Roger A. Barker, Nick F. Blair, David Craufurd, Josef Priller, Hugh Rickards, Anne Rosser, Holly B. Kordasiewicz, et al. Targeting huntingtin expression in patients with huntington’s disease. New England Journal of Medicine, 381(14):1398–1398, 2019. doi: 10.1056/NEJMx190024. URL https://doi.org/10.1056/NEJMx190. PMID: 31577898.

Brewerton SC. The use of protein-ligand interaction fingerprints in docking. 11:18428089, May 2008. ISSN 1367-6733.

Johannes Schiebel, Roberto Gaspari, Tobias Wulsdorf, Khang Ngo, Christian Sohn, Tobias E. Schrader, Andrea Cavalli, Andreas Ostermann, Andreas Heine, and Gerhard Klebe. Intriguing role of water in protein-ligand binding studied by neutron crystallography on trypsin complexes. 9(1), 9 2018. ISSN 2041-1723. doi: 10.1038/s41467-018-05769-2. URL http://dx.doi.org/10.1038/s41467-018-05769-2.

Denis Schmidt, Markus Boehm, Christopher L. McClendon, Rubben Torella, and Holger Gohlke. Cosolvent-enhanced sampling and unbiased identification of cryptic pockets suitable for structure-based drug design. 15(5):3331–3343, 4 2019. ISSN 1549-9618. doi: 10.1021/acs.jctc.8b01295. URL http://dx.doi.org/10.1021/acs.jctc.8b01295.

P. Schmidtke, V. Le Guilloux, J. Maupetit, and P. Tuffery. fpocket: online tools for protein ensemble pocket detection and tracking. 38(Web Server):W582–W589, 5 2010. ISSN 0305-1048. doi: 10.1093/nar/gkq383. URL http://dx.doi.org/10.1093/nar/gkq383.

D. Schneidman-Duhovny, O. Dror, Y. Inbar, R. Nussinov, and H. J. Wolfson. PharmaGist: a webserver for ligand-based pharmacophore detection. Nucleic Acids Res., 36(Web Server issue):W223–228, Jul 2008.

E Schrodinger. Quantisierung als Eigenwertproblem. iii. Storungstheorie, 1926.

Karen L Schuchardt, Brett T Didier, Todd Elsethagen, Lisong Sun, Vidhya Gurumoorthi, Jared Chase, Jun Li, and Theresa L Windus. Basis set exchange: A community database for computational sciences. Journal of chemical information and modeling, 47(3):1045–1052, 2007.

Jesus Seco, F. Javier Luque, and Xavier Barril. Binding site detection and druggability index from first principles. 52(8):2363–2371, 4 2009. ISSN 0022-2623. doi: 10.1021/jm801385d. URL http://dx.doi.org/10.1021/jm801385d.

Daniel Seeliger, Jurgen Haas, and Bert L. de Groot. Geometry-based sampling of conformational transitions in proteins. 15(11):1482–1492, 11 2007. ISSN 0969-2126. doi: 10.1016/j.str.2007.09.017. URL http://dx.doi.org/10.1016/j.str.2007.09.017.

Yibing Shan, Eric Kim, Michael P. Eastwood, Ron O. Dror, Markus A. Seeliger, and David E. Shaw. Correction to how does a drug molecule find its target binding site? 136(8):3320–3320, 2 2014. ISSN 0002-7863. doi: 10.1021/ja500545u. URL http://dx.doi.org/10.1021/ja500545u.

G. M. Sheldrick. Macromolecular phasing with SHELXE. 217(12), 1 2002. ISSN 2196-7105. doi: 10.1524/zkri.217.12.644.20662. URL http://dx.doi.org/10.1524/zkri.217.12.644.20662.

Robert P. Sheridan, Vladimir N. Maiorov, M. Katharine Holloway, Wendy D. Cornell, and Ying-Duo Gao. Drug-like density: A method of quantifying the “bindability” of a protein target based on a very large set of pockets and drug-like ligands from the protein data bank. 50(11):2029–2040, 10 2010. ISSN 1549-9596. doi: 10.1021/ci100312t. URL http://dx.doi.org/10.1021/ci100312t.

Woody Sherman, Tyler Day, Matthew P. Jacobson, Richard A. Friesner, and Ramy Farid. Novel procedure for modeling ligand/receptor induced fit effects. 49(2):534–553, 1 2006. ISSN 0022-2623. doi: 10.1021/jm050540c. URL http://dx.doi.org/10.1021/jm050540c.

Lee Shugart. Kinetic studies of escherichia coli transfer RNA (uracil-5-)-methyltransferase. 17(6):1068–1072, 3 1978. ISSN 0006-2960. doi: 10.1021/bi00599a020. URL http://dx.doi.org/10.1021/bi00599a020.

Alexandra Shulman-Peleg, Ruth Nussinov, and Haim J. Wolfson. Recognition of functional sites in protein structures. 339(3):607–633, 6 2004. ISSN 0022-2836. doi: 10.1016/j.jmb.2004.04.012. URL http://dx.doi.org/10.1016/j.jmb.2004.04.012.

Daniel Siegismund, Vasily Tolkachev, Stephan Heyse, Beate Sick, Oliver Duerr, and Stephan Steigele. Developing deep learning applications for life science and pharma industry. 68(06):305–310, 1 2018. ISSN 2194-9379. doi: 10.1055/s-0043-124761. URL http://dx.doi.org/10.1055/s-0043-124761.

Kathleen A Sluka, Margaret P Price, Nicole M Breese, Cheryl L Stucky, John A Wemmie, and Michael J Welsh. Chronic hyperalgesia induced by repeated acid injections in muscle is abolished by the loss of ASIC3, but not ASIC1. 106(3):229–239, 12 2003. ISSN 0304-3959. doi: 10.1016/s0304-3959(03)00269-0. URL http://dx.doi.org/10.1016/s0304-3959(03)00269-0.

J. Soding. Protein homology detection by HMM-HMM comparison. 21(7):951–960, 11 2004. ISSN 1367-4803. doi: 10.1093/bioinformatics/bti125. URL http://dx.doi.org/10.1093/bioinformatics/bti125.

Lin Song, Tai-Sung Lee, Chun Zhu, Darrin M. York, and Kenneth M. Merz Jr. Validation of AMBER/GAFF for relative free energy calculations. 2 2019. doi: 10.26434/chemrxiv.7653434.v1. URL http://dx.doi.org/10.26434/chemrxiv.7653434.v1.

Christoph Sotriffer and Gerhard Klebe. Identification and mapping of small-molecule binding sites in proteins: computational tools for structure-based drug design. 57(3):243–251, 3 2002. ISSN 0014-827X. doi: 10.1016/s0014-827x(02)01211-9. URL http://dx.doi.org/10.1016/s0014-827x(02)01211-9.

Christoph A. Sotriffer. Accounting for induced-fit effects in docking: What is possible and what is not? 11(2):179–191, 1 2011. ISSN 1568-0266. doi: 10.2174/156802611794863544. URL http://dx.doi.org/10.2174/156802611794863544.

Page C. Spiess, Dexter Morin, Chase R. Williams, and Alan R. Buckpitt. Protein thiol oxidation in murine airway epithelial cells in response to naphthalene or diethyl maleate. 43(3):316–325, 9 2010. ISSN 1044-1549. doi: 10.1165/rcmb.2009-0135oc. URL http://dx.doi.org/10.1165/rcmb.2009-0135oc.

Andreas Springauf and Stefan Grunder. An acid-sensing ion channel from shark (squalus acanthias) mediates transient and sustained responses to protons. 588(5):809–820, 2 2010. ISSN 0022-3751. doi: 10.1113/jphysiol.2009.182931. URL http://dx.doi.org/10.1113/jphysiol.2009.182931.

Andreas Springauf, Pia Bresenitz, and Stefan Grunder. The interaction between two extracellular linker regions controls sustained opening of acid-sensing ion channel 1. 286(27):24374–24384, 5 2011. ISSN 0021-9258. doi: 10.1074/jbc.m111.230797. URL http://dx.doi.org/10.1074/jbc.m111.230797.

N. Srinivasan and Tom L. Blundell. An evaluation of the performance of an automated procedure for comparative modelling of protein tertiary structure. 6(5):501–512, 1993. ISSN 1741-0126. doi: 10.1093/protein/6.5.501. URL http://dx.doi.org/10.1093/protein/6.5.501.

Antonia Stank, Daria B. Kokh, Jonathan C. Fuller, and Rebecca C. Wade. Proteinbinding pocket dynamics. Accounts of Chemical Research, 49(5):809–815, 2016.doi: 10.1021/acs.accounts.5b00516. URL https://doi.org/10.1021/acs.accounts.5b00516. PMID: 27110726.

Marta M Stepniewska-Dziubinska, Piotr Zielenkiewicz, and Pawel Siedlecki.Development and evaluation of a deep learning model for protein–ligandbinding affinity prediction. 34(21):3666–3674, 5 2018. ISSN 1367-4803.doi: 10.1093/bioinformatics/bty374. URL http://dx.doi.org/10.1093/bioinformatics/bty374.

Teague Sterling and John J. Irwin. ZINC 15 – ligand discovery for everyone. 55(11):2324–2337, 11 2015. ISSN 1549-9596. doi: 10.1021/acs.jcim.5b00559. URLhttp://dx.doi.org/10.1021/acs.jcim.5b00559.

Colleen A. Stoyas and Albert R. La Spada. The cag–polyglutamine repeat dis-eases: a clinical, molecular, genetic, and pathophysiologic nosology. In Neu-rogenetics, Part I, pages 143–170. Elsevier, 2018. ISBN 9780444632333. doi:10.1016/b978-0-444-63233-3.00011-7. URL http://dx.doi.org/10.1016/b978-0-444-63233-3.00011-7.

M.J. Sutcliffe, F.R.F. Hayes, and T.L. Blundell. Knowledge based modelling of ho-mologous proteins, part II: rules for the conformations of substituted sidechains.1(5):385–392, 1987. ISSN 1741-0126. doi: 10.1093/protein/1.5.385. URL http://dx.doi.org/10.1093/protein/1.5.385.

S. P. Sutherland, C. J. Benson, J. P. Adelman, and E. W. McCleskey. Acid-sensing ionchannel 3 matches the acid-gated current in cardiac ischemia-sensing neurons.98(2):711–716, 1 2001. ISSN 0027-8424. doi: 10.1073/pnas.98.2.711. URL http://dx.doi.org/10.1073/pnas.98.2.711.

David C. Swinney and Jason Anthony. How were new medicines discovered? 10(7):507–519, 6 2011. ISSN 1474-1776. doi: 10.1038/nrd3480. URL http://dx.doi.org/10.1038/nrd3480.

Roberto Sanchez and Andrej Sali. Advances in comparative protein-structure mod-elling. 7(2):206–214, 4 1997. ISSN 0959-440X. doi: 10.1016/s0959-440x(97)80027-9.URL http://dx.doi.org/10.1016/s0959-440x(97)80027-9.

Varghese T, Sheelakumari R, James JS, and Mathuranath P. A review of neuroimag-ing biomarkers of alzheimer’s disease. 18:25431627, 2013. ISSN 1823-6138.

Toshihide Takeuchi and Yoshitaka Nagai. Protein misfolding and aggregation asa therapeutic target for polyglutamine diseases. 7(12):128, 10 2017. ISSN 2076-3425. doi: 10.3390/brainsci7100128. URL http://dx.doi.org/10.3390/brainsci7100128.

128 Bibliography

Tanaji Talele, Santosh Khedkar, and Alan Rigby. Successful applications of com-puter aided drug discovery: Moving drugs from concept to the clinic. 10(1):127–141, 1 2010. ISSN 1568-0266. doi: 10.2174/156802610790232251. URLhttp://dx.doi.org/10.2174/156802610790232251.

Thomas C. Terwilliger. Maximum-likelihood density modification. 56(8):965–972,8 2000. ISSN 0907-4449. doi: 10.1107/s0907444900005072. URL http://dx.doi.org/10.1107/s0907444900005072.

Yuemin Tian, Pia Bresenitz, Anna Reska, Laila El Moussaoui, Christoph PatrickBeier, and Stefan Grunder. Glioblastoma cancer stem cell lines express func-tional acid sensing ion channels asic1a and ASIC3. 7(1), 10 2017. ISSN 2045-2322. doi: 10.1038/s41598-017-13666-9. URL http://dx.doi.org/10.1038/s41598-017-13666-9.

William L. Towns and Thomas J. Begley. Transfer RNA methytransferases and theircorresponding modifications in budding yeast and humans: Activities, predica-tions, and potential roles in human health. 31(4):434–454, 4 2012. ISSN 1044-5498. doi: 10.1089/dna.2011.1437. URL http://dx.doi.org/10.1089/dna.2011.1437.

Yvon Trottier, Yves Lutz, Giovanni Stevanin, Georges Imbert, Didier Devys,Geraldine Cancel, Frederic Saudou, Chantal Weber, Gilles David, Laszlo Tora,Yves Agid, Alexis Brice, and Jean-Louis Mandel. Polyglutamine expansion as apathological epitope in huntington’s disease and four dominant cerebellar atax-ias. 378(6555):403–406, 11 1995. ISSN 0028-0836. doi: 10.1038/378403a0. URLhttp://dx.doi.org/10.1038/378403a0.

Joan S. Tscherne and Elsie Wainfan. Selective inhibition of uracil trna methylasesof E. coli by ethionine. 5(2):451–461, 1978. ISSN 0305-1048. doi: 10.1093/nar/5.2.451. URL http://dx.doi.org/10.1093/nar/5.2.451.

Y. Y. Tseng and W.-H. Li. Evolutionary approach to predicting the binding siteresidues of a protein from its primary sequence. 108(13):5313–5318, 3 2011. ISSN0027-8424. doi: 10.1073/pnas.1102210108. URL http://dx.doi.org/10.1073/pnas.1102210108.

Ivan Tubert-Brohman, Woody Sherman, Matt Repasky, and Thijs Beuming. Im-proved docking of polypeptides with glide. 53(7):1689–1699, 7 2013. ISSN1549-9596. doi: 10.1021/ci400128m. URL http://dx.doi.org/10.1021/ci400128m.

Shiou-Ru Tzeng and Charalampos G Kalodimos. Protein dynamics and allostery:an NMR view. 21(1):62–67, 2 2011. ISSN 0959-440X. doi: 10.1016/j.sbi.2010.10.007. URL http://dx.doi.org/10.1016/j.sbi.2010.10.007.

Bibliography 129

Oleinikovas V, Saladino G, Cossins BP, and Gervasio FL. Understanding cryp-tic pocket formation in protein targets by enhanced sampling simulations. 138:27726386, Nov 2016. ISSN 0002-7863. doi: 10.1021/jacs.6b05425. URL https://dx.doi.org/10.1021/jacs.6b05425.

Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Springer New York,2000. ISBN 9781441931603. doi: 10.1007/978-1-4757-3264-1. URL http://dx.doi.org/10.1007/978-1-4757-3264-1.

Marton Vass, Eva Agai Csongor, Ferenc Horti, and Gyorgy M. Keseru. Multiplefragment docking and linking in primary and secondary pockets of dopaminereceptors. 5(9):1010–1014, 7 2014. ISSN 1948-5875. doi: 10.1021/ml500201u. URLhttp://dx.doi.org/10.1021/ml500201u.

Christophe LMJ Verlinde and Wim GJ Hol. Structure-based drug design:progress, results and challenges. 2(7):577–587, 7 1994. ISSN 0969-2126.doi: 10.1016/s0969-2126(00)00060-5. URL http://dx.doi.org/10.1016/s0969-2126(00)00060-5.

R. M. Vernon, P. A. Chong, B. Tsang, T. H. Kim, A. Bah, P. Farber, H. Lin, and J. D.Forman-Kay. Pi-Pi contacts are an overlooked protein feature relevant to phaseseparation. Elife, 7, 02 2018.

Jonathan S. Vick and Candice C. Askwith. Asics and neuropeptides. 94:36–41,7 2015. ISSN 0028-3908. doi: 10.1016/j.neuropharm.2014.12.012. URL http://dx.doi.org/10.1016/j.neuropharm.2014.12.012.

Hilser VJ and Thompson EB. Intrinsic disorder as a mechanism to optimizeallosteric coupling in proteins. 104:17494761, May 2007. ISSN 0027-8424.doi: 10.1073/pnas.0700329104. URL https://dx.doi.org/10.1073/pnas.0700329104.

A. Volkamer, A. Griewel, T. Grombacher, and M. Rarey. Analyzing the topology ofactive sites: on the prediction of pockets and subpockets. J Chem Inf Model, 50(11):2041–2052, Nov 2010.

A. Volkamer, D. Kuhn, F. Rippmann, and M. Rarey. Dogsitescorer: a web server forautomatic binding site prediction, analysis and druggability assessment. 28(15):2074–2075, 5 2012a. ISSN 1367-4803. doi: 10.1093/bioinformatics/bts310. URLhttp://dx.doi.org/10.1093/bioinformatics/bts310.

Andrea Volkamer, Daniel Kuhn, Thomas Grombacher, Friedrich Rippmann, andMatthias Rarey. Combining global and local measures for structure-based drug-gability predictions. 52(2):360–372, 1 2012b. ISSN 1549-9596. doi: 10.1021/ci200454v. URL http://dx.doi.org/10.1021/ci200454v.

130 Bibliography

Christopher S. von Bartheld, Jami Bahney, and Suzana Herculano-Houzel. Thesearch for true numbers of neurons and glial cells in the human brain: A reviewof 150 years of cell counting. 524(18):3865–3895, 6 2016. ISSN 0021-9967. doi:10.1002/cne.24040. URL http://dx.doi.org/10.1002/cne.24040.

Hannes Voßfeldt, Malte Butzlaff, Katja Prußing, Roisın-Ana Nı Charthaigh, PeterKarsten, Anne Lankes, Sabine Hamm, Mikael Simons, Boris Adryan, Jorg B.Schulz, and Aaron Voigt. Large-scale screen for modifiers of ataxin-3-derivedpolyglutamine-induced toxicity in drosophila. 7(11):e47452, 11 2012. ISSN 1932-6203. doi: 10.1371/journal.pone.0047452. URL http://dx.doi.org/10.1371/journal.pone.0047452.

Minor W, Dauter Z, and Jaskolski M. The young person’s guide to the PDB. 62:28132477, 2016. ISSN 0032-5422.

Cheng Wang and Yingkai Zhang. Improving scoring-docking-screening powersof protein-ligand scoring functions using random forest. 38(3):169–177, 11 2016.ISSN 0192-8651. doi: 10.1002/jcc.24667. URL http://dx.doi.org/10.1002/jcc.24667.

X. Wang, W.-G. Li, Y. Yu, X. Xiao, J. Cheng, W.-Z. Zeng, Z. Peng, M. Xi Zhu, and T.-L. Xu. Serotonin facilitates peripheral pain sensitivity in a manner that dependson the nonproton ligand sensing domain of ASIC3 channel. 33(10):4265–4279, 32013. ISSN 0270-6474. doi: 10.1523/jneurosci.3376-12.2013. URL http://dx.doi.org/10.1523/jneurosci.3376-12.2013.

Gregory L. Warren, Thanh D. Do, Brian P. Kelley, Anthony Nicholls, and Stephen D.Warren. Essential considerations for using protein–ligand structures in drug dis-covery. 17(23-24):1270–1281, 12 2012. ISSN 1359-6446. doi: 10.1016/j.drudis.2012.06.011. URL http://dx.doi.org/10.1016/j.drudis.2012.06.011.

Mark A. Watson, Haoyu S. Yu, and Art D. Bochevarov. Generation of tautomersusing micro-pka’s. 59(6):2672–2689, 5 2019. ISSN 1549-9596. doi: 10.1021/acs.jcim.8b00955. URL http://dx.doi.org/10.1021/acs.jcim.8b00955.

David Weininger. Smiles, a chemical language and information system. 1. in-troduction to methodology and encoding rules. Journal of Chemical Informa-tion and Computer Sciences, 28(1):31–36, 1988. doi: 10.1021/ci00057a005. URLhttps://pubs.acs.org/doi/abs/10.1021/ci00057a005.

John D. Westbrook and Stephen K. Burley. How structural biologists and the pro-tein data bank contributed to recent FDA new drug approvals. 27(2):211–217, 22019. ISSN 0969-2126. doi: 10.1016/j.str.2018.11.007. URL http://dx.doi.org/10.1016/j.str.2018.11.007.

Miranka Wirth, Claudia Schwarz, Gloria Benson, Nora Horn, Ralph Buchert,Catharina Lange, Theresa Kobe, Stefan Hetzer, Marta Maglione, Eva Michael,

Bibliography 131

Stefanie Marschenz, Knut Mai, Ute Kopp, Dietmar Schmitz, Ulrike Grittner,Stephan J. Sigrist, Slaven Stekovic, Frank Madeo, and Agnes Floel. Effects ofspermidine supplementation on cognition and biomarkers in older adults withsubjective cognitive decline (smartage)—study protocol for a randomized con-trolled trial. 11(1), 5 2019. ISSN 1758-9193. doi: 10.1186/s13195-019-0484-1. URLhttp://dx.doi.org/10.1186/s13195-019-0484-1.

L.V. Woodcock. Isothermal molecular dynamics calculations for liquid salts. 10(3):257–261, 8 1971. ISSN 0009-2614. doi: 10.1016/0009-2614(71)80281-6. URLhttp://dx.doi.org/10.1016/0009-2614(71)80281-6.

Kara J. Wyant, Andrew J. Ridder, and Praveen Dayalu. Huntington’sdisease—update on treatments. 17(4), 3 2017. ISSN 1528-4042. doi:10.1007/s11910-017-0739-9. URL http://dx.doi.org/10.1007/s11910-017-0739-9.

Z. Xiao, D. Riccardi, H. A. Velazquez, A. L. Chin, C. R. Yates, J. D. Carrick, J. C.Smith, J. Baudry, and L. D. Quarles. A computationally identified compoundantagonizes excess FGF-23 signaling in renal tubules and a mouse model ofhypophosphatemia. 9(455):ra113–ra113, 11 2016. ISSN 1945-0877. doi: 10.1126/scisignal.aaf5034. URL http://dx.doi.org/10.1126/scisignal.aaf5034.

Junichi Yagi, Heather N. Wenk, Ligia A. Naves, and Edwin W. McCleskey. Sus-tained currents through ASIC3 ion channels at the modest ph changes that occurduring myocardial ischemia. 99(5):501–509, 9 2006. ISSN 0009-7330. doi: 10.1161/01.res.0000238388.79295.4c. URL http://dx.doi.org/10.1161/01.res.0000238388.79295.4c.

Ai Yamamoto, Jose J Lucas, and Rene Hen. Reversal of neuropathology and motordysfunction in a conditional model of huntington’s disease. 101(1):57–66, 3 2000.ISSN 0092-8674. doi: 10.1016/s0092-8674(00)80623-6. URL http://dx.doi.org/10.1016/s0092-8674(00)80623-6.

Ye Yu, Zhi Chen, Wei-Guang Li, Hui Cao, En-Guang Feng, Fang Yu, Hong Liu,Hualiang Jiang, and Tian-Le Xu. A nonproton ligand sensor in the acid-sensingion channel. 68(1):61–72, 10 2010. ISSN 0896-6273. doi: 10.1016/j.neuron.2010.09.001. URL http://dx.doi.org/10.1016/j.neuron.2010.09.001.

Xiang Z, Soto CS, and Honig B. Evaluating conformational free energies: the colonyenergy and its application to the problem of loop prediction. 99:12032300, May2002. ISSN 0027-8424. doi: 10.1073/pnas.102179699. URL https://dx.doi.org/10.1073/pnas.102179699.

Le Zhang, Chunqiu Zheng, Tian Li, Lei Xing, Han Zeng, Tingting Li, Huan Yang,Jia Cao, Badong Chen, and Ziyuan Zhou. Building up a robust risk mathematical

132 Bibliography

platform to predict colorectal cancer. 2017:1–14, 2017. ISSN 1076-2787. doi: 10.1155/2017/8917258. URL http://dx.doi.org/10.1155/2017/8917258.

Paweł Sledz and Amedeo Caflisch. Protein structure-based drug design: fromdocking to molecular dynamics. 48:93–102, 2 2018. ISSN 0959-440X. doi: 10.1016/j.sbi.2017.10.010. URL http://dx.doi.org/10.1016/j.sbi.2017.10.010.


Statutory Declaration

Michael A. Margreiter hereby declares that this dissertation and the content presented in it are his own and were generated independently, as the result of his own original research.

I hereby declare in lieu of an oath:

1. This work was carried out entirely or mainly while I was a doctoral candidate at this faculty and university;

2. Where any part of this dissertation has previously been used for an academic degree or any other qualification at this or any other institution, this has been clearly indicated;

3. Wherever my own or third-party publications were consulted, these have been clearly named;

4. Wherever I have quoted from my own or third-party publications, the source is always given. With the exception of such quotations, this dissertation is entirely my own work;

5. All major sources of support have been named;

6. Wherever any part of this dissertation is based on collaboration with others, I have clearly indicated what was done by others and what I contributed myself;

7. Parts of this work have previously been published in:

• Reiners, Melissa & Margreiter, Michael & Oslender-Bujotzek, Adrienne & Rossetti, Giulia & Gründer, Stefan & Schmidt, Axel. (2018). The Conorfamide RPRFa Stabilizes the Open Conformation of Acid-Sensing Ion Channel 3 via the Nonproton Ligand-Sensing Domain. Molecular Pharmacology. 94. mol.118.112375. 10.1124/mol.118.112375.

• "TRMT2a Inhibitors for the Use in the Treatment of PolyQ Diseases", G. Rossetti, M. Margreiter, D. Niessing, E. Davydova, M. Witzenberger, J. Schulz, A. Voigt, N. Jon Shah, PCT/EP2019/082434, 2019

Aachen, May 2020

Michael A. Margreiter


Publications

"Challenges in RNA Regulation in Huntington's Disease: Insights from Computational Studies", O. Palomino-Hernandez, M.A. Margreiter, G. Rossetti. Israel Journal of Chemistry 2020

"The Conorfamide RPRFa Stabilizes the Open Conformation of Acid-Sensing Ion Channel 3 via the Nonproton Ligand-Sensing Domain", Reiners, Melissa & Margreiter, Michael & Oslender-Bujotzek, Adrienne & Rossetti, Giulia & Gründer, Stefan & Schmidt, Axel. (2018). Molecular Pharmacology. 94. mol.118.112375. 10.1124/mol.118.112375.

"Heteroaromatic π-Stacking Energy Landscapes", Roland G. Huber, Michael A. Margreiter, Julian E. Fuchs, Susanne von Grafenstein, Christofer S. Tautermann, Klaus R. Liedl, and Thomas Fox. Journal of Chemical Information and Modeling 2014 54 (5), 1371-1379. https://doi.org/10.1021/ci500183u

"Cleavage entropy as quantitative measure of protease specificity.", Fuchs, J. E., von Grafenstein, S., Huber, R. G., Margreiter, M. A., Spitzer, G. M., Wallnoefer, H. G., & Liedl, K. R. (2013). PLoS computational biology, 9(4), e1003007. https://doi.org/10.1371/journal.pcbi.1003007

Patents

"TRMT2a Inhibitors for the Use in the Treatment of PolyQ Diseases", G. Rossetti, M. Margreiter, D. Niessing, E. Davydova, M. Witzenberger, J. Schulz, A. Voigt, N. Jon Shah, PCT/EP2019/082434, 2019

Michael Alois Margreiter

+49 17657908493


Born on February 4th, 1989

WORK EXPERIENCE

April 2016 – present
Scientific Staff and Doctoral Candidate
Forschungszentrum Jülich, Germany
• Computational drug discovery for neurological disorders
• System administrator for the high-performance-computing infrastructure
• Scientific software management and license allocation
• Responsible for master students
• Oral defense on May 26th, 2020 (magna cum laude)

January 2015 – present
Chemistry and Biology Lecturer
Institute for Student Courses (IFS) Dr. Rampitsch, Innsbruck & Cologne
• Preparatory courses for the MedAT/TMS exams

August 2014 – June 2017
Lecturer for the Biomedical Sciences
University of Applied Sciences (AZW), Innsbruck
• Human Biochemistry
• Pathophysiology

February 2017 – June 2018
Teaching Assistant for High School Students
Schülerhilfe Köln-Kalk/City, Cologne
• Chemistry, Biology, German, French, and Mathematics

July 2010 – March 2014
Scientific Staff and Teaching Assistant
University of Innsbruck, Innsbruck
• Chemoinformatics, Bioinformatics, Molecular Modelling, and Quantum Chemistry
• Lab courses

February 2012 – May 2012
Computational Chemistry Internship
Lead Discovery Division at Boehringer Ingelheim, Biberach an der Riss, Germany

July 2011 – October 2011
Wet Lab Internship
Biopharmaceuticals and Upscaling at Sandoz (Novartis) Kundl, Austria

July 2010 – October 2010
Wet Lab Internship
Organic Synthesis Lab at Sandoz (Novartis) Kundl, Austria

July 2009 – October 2009
Wet Lab Internship
R&D Department at Sandoz (Novartis) Kundl, Austria

EDUCATION

2016 – 2020
Doctoral Studies in Chemistry
RWTH Aachen University, Aachen
• Research Group of Jun.-Prof. Dr. Ph.D. Giulia Rossetti
• Inhibitor Design on TRMT2a for Polyglutamine Diseases

2007 – 2014
Master's Degree in Chemistry
University of Innsbruck, Innsbruck
• Research Group of Univ.-Prof. DDr. Klaus Liedl
• Diploma Thesis: "Probing Aromatic-Heteroaromatic Interactions for Drug Design (A Density Functional Theory Approach)"

PUBLICATIONS

Posters
"Small-molecule modulators of TRMT2a decrease PolyQ Aggregation and PolyQ-induced Cell Death", Molecular Medicine Tri-Conference 2020, San Francisco, USA
"Cryptic and Allosteric Modulators of TRMT2A for Polyglutamine Diseases", Drug Discovery-2019, Boston, USA
"Mapping the binding site of the opioid neuropeptide Big Dynorphin on ASIC1a: B 08-7." Acta Physiologica 227 (2019).
"Inhibiting the Enzyme TRMT2A to Ameliorate PolyQ Toxicity", EuroQSAR 2018, Thessaloniki, Greece
"Developing Inhibitors of the Enzyme TRMT2A for the Treatment of PolyQ Diseases", 11th Triennial Congress of the World Association of Theoretical and Computational Chemists (WATOC2017), Munich, Germany
"Similarity in shape and electrostatics guide virtual screening for novel influenza nucleoprotein ligands." 5th Life Science Meeting 2013 of the Innsbruck Universities, Austria.
"Probing Heteroaromatic-Aromatic Interactions for Drug Design", EuroCUP VI 2013, Santpoort, Netherlands
"In silico Identification of Precursors for CYP Profiling Breath Tests", 27th Molecular Modelling Workshop 2013, Erlangen, Germany
"Probing Heteroaromatic-Aromatic Interactions for Drug Design", 26th Molecular Modelling Workshop 2012, Erlangen, Germany
"Dispersion bound aromatic dimers", 4th LIFESCIENCE Meeting 2012, Innsbruck

Talks
"Prediction of Allosteric Modulation of TRMT2A", INM-9 Conference 2019, Eupen
"Druggability Assessment of TRMT2a: X-ray vs. a Dynamical Perspective", PyEmma Workshop 2019, Berlin
"TRMT2a: Insights from Molecular Dynamics Simulations", Helmholtz Center Munich, 2018
"Pharmacophore-Based Virtual Screening of TRMT2a", INM-ICS-Conference 2017, Jülich

Papers
"Challenges in RNA Regulation in Huntington's Disease: Insights from Computational Studies", Israel Journal of Chemistry 2018
"The Conorfamide RPRFa Stabilizes the Open Conformation of Acid-Sensing Ion Channel 3 via the Nonproton Ligand-Sensing Domain", Mol Pharmacol. 2018
"Heteroaromatic π-Stacking Energy Landscapes", J. Chem. Inf. Model. 2014
"Cleavage Entropy as a Quantitative Measure of Protease Specificity", PLoS Comput Biol 2013

Patents
"TRMT2a Inhibitors for the Use in the Treatment of PolyQ Diseases", G. Rossetti, M. Margreiter, D. Niessing, E. Davydova, M. Witzenberger, J. Schulz, A. Voigt, N. Jon Shah, WO 2020/109233, 2020

WORKSHOPS

PRACE Winter school 2019 – Introduction to Machine Learning for scientists, Leuven
Chemical Computing Group User Group Meeting and Conference 2019, Oxford, UK
PyEMMA Winter School for Markov Modeling 2019, Berlin
Summer School on Machine Learning in Drug Design 2018, Leuven, Belgium
Schrödinger European User Meeting 2018, Rome, Italy

HONORS

Scholarships
Vorstandsdoktoranden Scholarship, Forschungszentrum Jülich

Prizes
"Probing Heteroaromatic-Aromatic Interactions for Drug Design", 26th Molecular Modelling Workshop in Erlangen, Germany (Best Poster Award, 2012)
BioSolveIT Scientific Challenge, 2nd Stage, 2017
OMICtools Contest, 1st Prize, 2017

SKILLS

Languages
German: native
English: fluent
• EF High School Year 2006 in Kansas, USA
French: fluent
• French Exchange Program 2003 in Bourg-de-Péage, France