Étude de la structure et des propriétés SAR/QSAR de ...

MINISTERE DE L’ENSEIGNEMENT SUPERIEUR ET DE LA

RECHERCHE SCIENTIFIQUE

UNIVERSITE MOHAMED KHIDER BISKRA

FACULTE DES SCIENCES EXACTES ET

DES SCIENCES DE LA NATURE ET DE LA VIE

Département des Sciences de la matière

THESE

Présentée par

Mariem GHAMRI

En vue de l’obtention du diplôme de :

Doctorat en chimie

Option :

Chimie Moléculaire

Devant la commission d’examen :

Mr. Salah BELAIDI Prof. Univ. Biskra Président

Mme. Dalal HARKATI Prof. Univ. Biskra Directrice de thèse

Mr. Ammar DIBI Prof. Univ. Batna 1 Examinateur

Mr. Lazhar BOUCHLALEG MC/A Univ. Batna 2 Examinateur

Intitulée:

Étude de la structure et des propriétés SAR/QSAR de

quelques molécules à visée thérapeutique

Soutenue le :13/01/2022

MINISTRY OF HIGHER EDUCATION AND SCIENTIFIC RESEARCH

MOHAMED KHIDER BISKRA UNIVERSITY

FACULTY OF EXACT SCIENCES AND SCIENCES OF NATURE AND

LIFE

Matter Sciences Department

THESIS

Presented by

Mariem GHAMRI

In order to obtain the diploma of:

PhD in Chemistry

Option:

Molecular chemistry

In front of the thesis committee members:

Mr. Salah BELAIDI Prof. Biskra Univ. President

Mme. Dalal HARKATI Prof. Biskra Univ. Supervisor

Mr. Ammar DIBI Prof. Batna 1 Univ. Examiner

Mr. Lazhar BOUCHLALEG MC/A Batna 2 Univ. Examiner

Entitled:

Defended on : 13/01/2022

Study of the structure and SAR / QSAR

properties of some therapeutic molecules

To my dear parents

To my husband

To my siblings

To all those who are dear to me

Acknowledgements

First of all, I pray to Allah, the almighty for providing me this opportunity

and granting me the capability to proceed successfully.

At the outset, I would like to thank my supervisor Prof. Harkati Dalal for

providing me an opportunity to do my doctoral research under his supervision,

continuous support, encouragement throughout my research, and for his useful

comments and discussion to make my thesis complete.

I wish to extend my sincere thanks to Mr. Belaidi Salah, Professor at the

University of Biskra, for accepting to chair the dissertation of my thesis.

I would also like to thank the committee members, Mr.Ammar Diobi,

Professors at the University of Batna 1, and Mr. Lazhar Bouchlaleg at the

University of Batna 2, for agreeing to examine and judge my work.

I would like to thank all the members of group of computational and

pharmaceutical chemistry, LMCE Laboratory at Biskra University, with

whom I had the opportunity to work. They provided me a useful feedback and

insightful comments on my work.

I would like to express my special thanks to Prof. Houchlaf Majdi and his

team member Linguerri Roberto and Gilberte Chambaud from Gustave Eiffel

University for the help with a smile, the good hospitality and the good reception

during my training in their Multiscale Modeling and Simulation Lab.

Last, but not least, I would like to dedicate this thesis to my beloved parents

whom raised me with a love of science and supported me in all my pursuits, my

husband and my siblings for their love and continuous support – both spiritually

and materially.

Table of contents

ACKNOWLEDGMENTS

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

General Introduction 2

Chapter I: Drug discovery: finding a lead

I.Context: drug research 5

I.1 Research and Development (R&D) 6

I.1.1 Target identification and validation 6

I.1.1.1 Choosing a disease 6

I.1.1.1.1 General about cancer 6

I.1.1.1.1.1 Definition of cancer 7

I.1.1.1.2 Types of cancer 8

I.1.1.1.3 Distribution of cancer 9

I.1.1.1.4 Anticancer drugs 10

I.1.1.1.4.1 The various types of anticancer drugs and their mechanisms of action 10

I.1.1.1.4.2 Toxicity of anticancer drugs 12

I.1.1.1.5 Signs and symptoms 12

I.1.1.1.6 Treatment 13

I.1.1.1.6.1 Surgery 13

I.1.1.1.6.2 Radiotherapy 13

I.1.1.1.6.3 Medical treatment 13

I.1.1.1.6.4 Radiation therapy 13

I.1.1.1.6.5 Hormone therapy 13

I.1.1.1.6.6 Biological response modifier therapy 14

Table of contents

I.1.1.1.6.7 Immunotherapy 14

I.1.1.1.6.8 Bone marrow transplant 14

I.1.1.2 Choosing a drug target 14

I.1.1.2.1 Historical Overview 14

I.1.1.2.2 Drug targets 15

I.1.1.2.3 Discovering drug targets 16

I.1.1.3 Finding a lead compound 18

I.1.1.3.1 Generation of Hits and Leads 18

I.1.1.3.2 Screening on the targets 18

I.1.1.3.3 Phenotypical screening 19

I.1.1.4 Lead Optimization 19

I.1.1.4.1 Rationalize the selection 20

I.1.1.4.1.1 “Drug-like” compounds 20

I.1.1.4.1.2 "lead-like" compounds 21

I.1.2 Pre-clinical trials 22

I.1.3 Clinical tests 22

I.2. Limitations and failures 24

I.3. Medicinal Chemists Today 25

I.4 Conclusion 25

I.5 References 27

Chapter II

Part I: Computer in medicinal chemistry

II.1 Theoretical Background for Quantum Mechanical Calculations 35

II.1.1 Molecular mechanics 36

II.1.1.1 Introduction 36

Table of contents

II.1.1.1.1 Elongation energy 38

II.1.1.1.2. Bending energy 39

II.1.1.1.3. Torsional energy 39

II.1.1.1.4. Van der Waals energy 40

II.1.1.1.5. Electrostatic energy 41

II.1.1.2 Force Fields 42

II.1.1.3 Limitations of Molecular Mechanics 43

II.1.2 Quantum Mechanics 44

II.1.2.1 Introduction 44

II.1.2.2 HF and DFT Methods 45

II.1.2.3 Semi-empirical methods 46

II.1.2.4 Limitations of Quantum Mechanics 47

II.2 Choice of method 49

II.3 Minima search method 50

II.3.1 Introduction 50

II.3.2 Minimization algorithms

II.3.2.1 The "steepest descent" method

50

51

II.3.2.2. The conjugate gradient method 52

II.3.2.3. The Newton-Raphson method 52

II.3.2.4. The simulated annealing method 53

II.4 Molecular Modeling 53

II.4.1 Elements of Computational Chemistry 53

II.4.1.1 Drawing chemical structures 53

II.4.1.2 Three-dimensional structures 55

II.4.1.3 Molecular Structure Databases 56

Table of contents

II.4.1.4 Energy minimization 58

II.4.1.5 Molecular dimensions 59

II.4.2 Molecular properties 60

II.4.2.1 Geometric Parameters 61

II.4.2.1.1 Bond length 61

II.4.2.1.2 Valence angle 61

II.4.2.1.3 Dihedral Angle (Torsion angle) 62

II.4.3 Electronic Parameters 63

II.4.3.1 Atomic charges 63

II.4.3.2 Molecular electrostatic potentials 63

II.4.3.3 Molecular orbitals 65

II.4.3.4 Fukui functions 66

II.5. Studies of vibrational properties 67

II.5.1 Introduction 67

II.5.2 Theoretical aspects of infrared vibration spectroscopy 67

II.5.3 Principle of infrared spectroscopy 68

II.5.3.1 Electromagnetic radiation 68

II.5.3.2 Infrared 70

II.5.3.3 Infrared absorption 70

II.5.4 Theoretical aspects 72

II.5.4.1 Internal vibration modes 72

II.5.4.2 Classification of vibration modes 73

II.5.4.3 Factors that influence vibration frequencies 74

II.5.5 Application 75

II.6 Conclusion 75

Table of contents

Part II: Computer-Assisted Drug Design

II.1 Predictive Quantitative Structure–Activity Relationship Modeling 78

II.2. Principle of QSPR / QSAR methods 79

II.3 Key Quantitative Structure–Activity Relationship Concepts 80

II.3.1 Molecular Descriptors 82

II.3.3 Quantitative Structure–Activity Relationship Modeling Approaches 83

II.3.3.1 General Classification 83

II.3.3.2. General methodology of a QSPR / QSAR study 84

II.3.3.3 Transforming the Bioactivities 85

II.3.3.4 Determination of the Best Set of Descriptors Approaches 85

II.4 Building Predictive Quantitative Structure–Activity Relationship Models: The Approaches to Model

Validation

86

II.4.1 Data analysis methods 86

II.4.1.1 Neural networks 86

II.4.1.2 Multiple linear regression 88

II.4.1.2.1 Randomization test 90

II.4.2 Interpretation and validation of a QSPR / QSAR model 91

II.4.2.1 The Importance of Validation 91

II.4.2.1.1 Internal Validation 92

II.4.2.1.1.1 Least Squares Fit 92

II.4.2.1.1.2 Fit of the Model 93

II.4.2.1.2 External validation 94

II.5 Application of QSAR 95

II.6 References 96

Table of contents

Chapter III: Results and Discussion

III.1 Introduction 104

III.2 Structure, spectroscopy and electronic properties of carbazole and its derivatives 108

III.2.1 Carbazole structure and properties 109

III.3 Carbazole derivatives structures 120

III.4 CDCAs derivatives physicochemical properties, descriptors and drug-likeness scoring 123

III.5 QSAR modelling studies 129

III.5.1 Internal validation 129

III.5.2 External validation 130

III.5.3 Y-randomization 130

III.6 Multiple linear regression (MLR) 131

III.7 Artificial Neural Networks (ANN) 133

General Conclusion 137

APPENDIX 140

REFERENCES 160

ABSTRACT

List of Figures

Figure.I.1. The (simplified) process of drug discovery is divided into four major steps…...…5

Figure.I.2. Understanding cancer start…………………………………..……………….……7

Figure.I.3. How cancer spreads? ...............................................................................................8

Figure.I.4. National Ranking of Cancer as a Cause of Death at Ages <70 Years in 2019. The

numbers of countries represented in each ranking group are included in the

legend.....................................................................................................................10

Figure.I.5. 2D & 3D structure for Etoposide (VP-16)…….…………………………….…...15

Figure.I.6. Design of CDCAs as novel Topo II-targeting anticancer agents…….…….…….17

Figure.I.7. Success rate of phase I clinical trials in obtaining marketing authorization for the

ten largest pharmaceutical companies between 1991 and 2000. The success rate

varies according to the pathologies targeted…………………...……...…………24

Figure.II.8. An illustration of the various terms that contribute to a typical force field…….37

Figure.II.9. Elongation between two atoms………………………………………………….38

Figure.II.10. Deformation of valence angles………………………………………………...39

Figure.II.11. Dihedral angle formed by atoms 1-2-3-4……………………………………...40

Figure.II.12. Van der Waals energy curve…………………………………………………...41

Figure.II.13. Electrostatic interactions between two atoms………………………………….42

Figure .II.14. Drawing chemical structures………………………………………………….54

Figure .II.15. Conversion of a 2D drawing to a 3D model…………………………………..56

Figure.II.16. The Cambridge Crystallographic Data Centre (CCDC)……………………….57

Figure.II.17. The Protein Data Bank (PDB)…………………………………………………58

Figure.II.18. Diagram of energy minimization……………………………………………...59

Figure.II.19. Different methods of visualizing molecules…………………………………..60

Figure II.20. Molecular dimensions for adrenaline (3D Hyperchem)………………………60

List of Figures

Figure.II.21. Tetrahedral shape of CCl4……………………………………………………..62

Figure.II.22. Newman projection……………………………………………………………62

Figure.II.23. Ring variation on cromakalim………………………………………………...64

Figure.II.24. Bicyclic models in cromakalim study……...…………………………………64

Figure.II.25. Molecular electrostatic potentials (MEPs) of bicyclic models (III) and (IV)…65

Figure.II.26. HOMO and LUMO molecular orbitals for ethene…………………………….66

Figure.II.27. The various spectral domains of electromagnetic radiation……………….…..69

Figure.II.28. Movement of atoms during the phenomenon of vibration………………….....71

Figure.II.29. Movements associated with normal modes of vibration of a molecule containing

3 atoms………………………………………………………………………….74

Figure.II.30. Molecular descriptors and fingerprints are examples of strategies that allow

researchers to extract important information about compounds that can be used

in additional computer-aided drug design techniques, such as virtual screening,

quantitative-structure-activity relationship (QSAR)…………...….....……….81

Figure.II.31. Representation of a formal neuron…………………...………………………..86

Figure.II.32. Architecture of neural networks……………………………………………….86

Figure.II.33. Schematic overview of the QSAR process…..……………………………….. 95

Figure.III.34. Chemical structure of carbazole with atom numbering………………….…104

Figure.III.35. The Synthetic or natural carbazole derivatives induces cancer cells death

triggering the intrinsic apoptotic pathway by inhibition of Topoisomerase

II……………………………………………………………………………108

Figure.III.36. Dual descriptor (∆f) Vs Atoms ………………………….….………………120

Figure.III.37. Face and profile views of Compounds 1 and 3…………………….………..121

Figure.III.38. Generic leverage plot of observed [11] (Observed pIC50) versus predicted

(Predicted pIC50) activity according to the MLR model. The oblique line is the

diagonal……………………………………………………………………...132

Figure.III.39. The architecture (6-4-1) of the adopted ANN model………………………..134

List of Tables

Table.I.1. Overview of anticancer drugs……………………………………………………..11

Table.II.2. Molecular degrees of freedom…………………………………………………...73

Table.III.3. Chemical structure and experimental activity (pIC50 [125]) of the CDCAs under

study. pIC50 is log10(1/IC50). * denotes the compounds selected for external

validation (test set)…………...………………………………………………..105

Table.III.4. Equilibrium Geometrical Parameters (Bond lengths in Å and angles in degrees)

of carbazole in its electronic ground state. The numbering of the atoms is given

in Figure 34……………………………………………………………………109

Table.III.5. Comparison of the experimental and calculated anharmonic frequencies (ν, in

cm-1)) of carbazole. Computed at the B3LYP/6–311G(d,p) level of theory and

we give also the computing of IR intensity (I in (km/mol). Abbreviations used:

υ,stretching; υs, sym. stretching; υas, asym. stretching; b, in-plane-bending; g,

out-of-plane bending; Ben: Benzene; Pyr: pyrrole……………………………112

Table.III.6. Outermost molecular orbitals of carbazole and of its derivatives of interest in the

present study. E (in eV) corresponds to the energy HOMO – LUMO

gap……………………………………………………………………………..113

Table.III.7. Values of the Fukui functions of carbazole. See text for the definition of the

different terms…………………………………………………………………119

Table.III.8. Physicochemical properties of CDCAs derivatives. For each compound, we give

the energies of the HOMO (EHOMO in a.u.) and of the LUMO (ELUMO in a.u.), the

polarizability (Pol in Å3), the volume (V in Å3), the surface area grid (SAG in

Å3) and the hydration energy (HE in kcal/mol). * denotes the selected

compounds for external validation (test set)…………………………………..124

Table.III.9. Drug-likeness parameters and lipophilicity indices for the CDCAs under study.

See text for the definition of these parameters and their description……….…128

Table.III.10. Numerical parameters used to confirm the QSAR model validity. See text for

more details……………………………………………………………….....131

Table.III.11. Random MLR model parameters…………………………………………….132

General Introduction


2

Molecular modeling is a broad term encompassing different molecular

graphing and computational chemistry techniques to display, simulate, analyze,

calculate and store the properties of molecules; to determine the geometry of a

molecule and to evaluate the associated physicochemical properties.

The comparison of the biological activity of certain molecules and their structures has

made it possible in many cases to establish correlations between the structural

parameters and the properties of a molecule.

The majority of heterocycles play a very important therapeutic role. Due to their

biological interest, the chemistry of heterocycles is growing.

The majority of the heterocycles more to the complexes of therapeutic interest

currently used clinically were obtained by chemical modification from natural

molecules or by a total synthesis.

The research and synthesis of new chemical and biochemical compounds are today

often associated with a molecular modeling study. Thanks to the computer

development of recent years and the rise of intensive parallel computing in particular,

molecular modeling has become a real game.

Molecular modeling can be defined as an application of computing to create,

manipulate, calculate and predict molecular structures and associated properties.

There are many theoretical chemistry methods aimed at determining the physical or

chemical properties of isolated molecules, be they thermodynamic properties such as

binding enthalpies, relative energies of different conformers, or simulations of

infrared, Raman or electronic.

Two main classes of simulation methods can be distinguished: on the one hand

quantum chemistry methods that allow the precise determination of the electronic

properties of molecules, on the other hand molecular mechanics methods that are

based on empirical parameters that allow in particular to determine the structural

parameters and secondly these methods make it possible to calculate the

physicochemical parameters used in the QSAR study.

Quantitative Structure Activity Relationship (QSAR), as their names suggest,

sets up a mathematical relationship using data analysis methods, linking microscopic

molecular properties called descriptors, to an effect experimental (biological activity,

toxicity, affinity for a receptor), for a series of similar chemical compounds. The

starting point for such methods is the definition of empirical or theoretical molecular


3

descriptors. The latter take into account information on the structure and the

physicochemical characteristics of the molecules.

The choice of experimental reference databases is decisive in a QSAR study. It must

consist of reliable experimental data obtained by following a unique experimental

protocol. Indeed, the robustness of the model strongly depends on the base on which it

is based.

Finally, the link between the descriptors and the database is determined through

analysis tools such as multilinear regressions (MLRs), partial least squares (PLS)

regressions, neural networks, and component analysis.

The main objective of this study is to predict molecules of pharmaceutical

interest, ie to sort molecules before going to the experimental stage, or to interpret or

predict the biological activation of molecules already synthesized. So, look for the

preferred conformations by quantum and semi-empirical calculations of heterocyclic

compounds of pharmaceutical interest (eg: carbazole and chalcone....) whose majority

of derivatives are bioactive.

The effect of the substituents on their basic skeletons and the main structural motifs

will be examined, and qualitative and quantitative studies of the structure-activity

relationships, using the QSAR model, will be carried out in order to arrive at models

for some series of these heterocycles of pharmaceutical interest.

Chapter I

Drug discovery: finding a lead

Drug discovery : finding a lead

5

I. Context: drug research

Drug discovery is a multidisciplinary cycle that takes quite a while (10-20 years)

and costs a ton of cash. In the writing, there is an assortment of assessments for his

complete expense, which goes from $300 million to more than $1.75 billion dollars

[1]. DiMasi's investigation [2, 3], which is quite possibly frequently referred to,

puts the expense at around 802 million dollars, while a later report puts it at under

$12 million [4]. Nonetheless, most of the studies concur on the huge measure of

assets needed to finish a task, just as the continuous expansion in general

interaction costs, which were assessed to be 138 million dollars during the 1970s

and 300 million dollars during the 1980s [5]. To exacerbate the situation, the

achievement pace of drug industry projects stays low [6], and the quantity of new

drugs acquainted with the market has stayed stable, if not declining, since the year

2000 [5].

Subsequently, drug scientists are continually endeavoring to improve every one of

the means prompting the commercialization of a medication. Drug organizations,

for instance, are among the most universally contributed, with 10 to 20% of their

income put resources into innovative work (R&D)[7].

The whole drug research interaction can be separated into four significant stages,

as demonstrated in Figure 1. We'll portray them momentarily in the parts that

follow, with an attention on the stages for which the techniques introduced in this

theory were created.

Figure .1.The (simplified) process of drug discovery is divided into four major steps.

Gene TargetLead

validationLead

optimisationclinical Sales

Biology

Screening/ Modeling

Chemistry/ Toxicology

Years 3 5 7 9 12 13-25


6

I.1. Research and Development (R&D)

One of the most goals of the R&D part is to deliver one or additional active and

innovative drug candidates with quantity amount of toxicity potential, giving them the

most effective probability of passing the next stages.

I.1.1. Target identification and validation

The initial step is to recognize at least one biological entities whose movement can

be regulated to give an advantageous impact with regard to a particular illness.

Because all following research will be predicated on the notion that these targets are

effectively related to the disease and that the medication's action would have a

favorable effect on humans, this phase is crucial. It's a lot more delicate because the

target's relevance to the targeted disease must be evaluated against the potential

negative consequences of modifying the target's activity. The search for new targets

has intensified since the human genome was sequenced and bioinformatics methods

for comparing sequences and protein structures were developed. Many authors have

also questioned the classic "one disease, one target, one drug" approach, claiming that

biological network regulation can compensate for a molecule's influence on its target

[8, 9], even if the molecule has a high chance of interacting with other proteins. Poly-

pharmacology [10] and chemogenomic are two new sciences that have arisen in recent

decades.

I.1.1.1. Choosing a disease

I.1.1.1.1. General about cancer

Cancer is one of the most important health problems of the current era and also a

leading cause of death among populations. Cancer can simply be defined as

unregulated cell division leading to a tumor formation in any part of the body. In its

natural course, tumor mass continues to grow invading the surrounding tissues and

finally tumor cells get access to the lymphatic and vascular systems spreading to

distant organs which results in metastasis. In order to be successful in the treatment of

cancer, early diagnosis, before the tumor spreads to the surrounding tissues or distant

organs, is mandatory. It is now known that most cancer types result from accumulation


7

of multiple errors in DNA that may affect primarily the regulatory pathways in the

cell. Although the currently used treatment modalities are mainly directed to the

macroscopic destruction of the tumor mass, the presence of systemic dissemination

could not be denied and systemic treatments are widely used in order to control the

microscopic disease. Despite these advances in treatment, the development of new

strategies towards the correction of molecular impairments in the cell is indispensable.

I.1.1.1.1.1. Definition of cancer

Cancer is defined by the uncontrollable growth and spread of aberrant cells, as well as

apoptosis resistance. External agents or hereditary genetic factors are responsible for

its occurrence. When a cell's repair system allows the accumulation of microscopic

changes to escape, the cell becomes malignant.

Figure .2. Understanding cancer start.

This cell divides fast and uncontrolled, resulting in an organ tumor. Through blood

vessels and lymphatics, malignant cells can escape and spread to other tissues, causing

metastases [11].

Sometimes cells don’t grow, divide and die in the usual way. This may cause blood or

lymph fluid in the body to become abnormal, or form a lump called a tumour. A

tumour can be benign or malignant.


8

Benign tumour: Cells are confined to one area and are not able to spread to other

parts of the body. This is not cancer.

Malignant tumour: This is made up of cancerous cells which have the ability to

spread by travelling through the bloodstream or lymphatic system (lymph fluid).

Carcinogenesis is separated into three stages: initiation, promotion, and tumor

progression [12].

Figure .3.How cancer spreads?

I.1.1.1.2 Types of cancer

Doctors divide cancer into types based on where it begins. Four main types of cancer

are:

Carcinomas: A carcinoma begins in the skin or the tissue that covers the surface

of internal organs and glands. Carcinomas usually form solid tumors. They are

the most common type of cancer. Examples of carcinomas include prostate

cancer, breast cancer, lung cancer, and colorectal cancer.

Sarcomas: A sarcoma begins in the tissues that support and connect the body. A

sarcoma can develop in fat, muscles, nerves, tendons, joints, blood vessels,

lymph vessels, cartilage, or bone.


9

Leukemias: Leukemia is a cancer of the blood. Leukemia begins when healthy

blood cells change and grow uncontrollably. The 4 main types of leukemia are

acute lymphocytic leukemia, chronic lymphocytic leukemia, acute myeloid

leukemia, and chronic myeloid leukemia.

Lymphomas: Lymphoma is a cancer that begins in the lymphatic system. The

lymphatic system is a network of vessels and glands that help fight infection.

There are 2 main types of lymphomas: Hodgkin lymphoma and non-Hodgkin

lymphoma.

I.1.1.1.3Distribution of cancer

Cancer ranks as a leading cause of death and an important barrier to increasing life

expectancy in every country of the world. According to estimates from the World

Health Organization (WHO) in 2019, cancer is the first or second leading cause of

death before the age of 70 years in 112 of 183 countries and ranks third or fourth in a

further 23 countries (Fig. 4). Cancer's rising prominence as a leading cause of death

partly reflects marked declines in mortality rates of stroke and coronary heart disease,

relative to cancer, in many countries.

Overall, the burden of cancer incidence and mortality is rapidly growing

worldwide; this reflects both aging and growth of the population as well as changes in

the prevalence and distribution of the main risk factors for cancer, several of which are

associated with socioeconomic development. [13]


10

Figure.4.National Ranking of Cancer as a Cause of Death at Ages <70 Years in 2019. The

numbers of countries represented in each ranking group are included in the legend.

I.1.1.1.4 Anticancer drugs

Anticancer agents are drugs that are used in chemo-therapies to treat cancer. It is

used alone or in combination. They want to stop cells from proliferating

uncontrollably because they have genetic and functional defects that set them apart

from healthy cells. This chemotherapy isn't meant to kill the tumor locally; rather, it's

meant to prevent tumor escape through the production of distant metastases [14].

I.1.1.1.4.1 The various types of anticancer drugs and their mechanisms of action

These compounds have been divided into three groups based on their

pharmacological mechanisms:

Cytotoxic anticancer drugs: The compounds employed in cytotoxic

chemotherapy vary in their modes of action and chemical family membership.

Alkylating agents, topoisomerase I and II inhibitors, microtubule poisons

(intercalating agents), antimetabolites, and splitting agents are the six primary

families.


11

Targeted therapies: these are medications that operate on tumor cells at a

specific level of their function or development. Extracellular medications (such

as avastin, erbitux, etc.) are aimed against a membrane receptor located on the

tumor cell, whereas intracellular medications (such as: multikinases, glivec ...)

disrupt the proliferative signal transmission cell through enzymatic inhibition.

Other anticancer drugs: these are anticancer drugs with different modes of

action of two classes mentioned above.

Table 1: Overview of anticancer drugs.

DRUGS DRUGS DESCRIPTION

AVASTIN®

is a monoclonal anti-vascular endothelial growth factor antibody

used in combination with antineoplastic agents for the treatment

of many types of cancer.

ERBITUX® is an endothelial growth factor receptor binding fragment used to

treat colorectal cancer as well as squamous cell carcinoma of the

head and neck.

GLIVEC® is a tyrosine kinase inhibitor used to treat a number of leukemias,

myelodysplastic/myeloproliferative disease, systemic

mastocytosis,hypereosinophilic syndrome, dermatofibrosarcoma

protuberans, andgastrointestinal stromal tumors.

CABOMETYX® is a tyrosine kinase inhibitor used to treat advanced renal cell

carcinoma, hepatocellular carcinoma, and medullary thyroid

cancer. (multikinases inhibitors).

PLATINOL® is a platinum based chemotherapy agent used to treat various sarcomas, carcinomas, lymphomas, and germ cell tumors. (Alkylating agents).

ETOPOSIDE® is a podophyllotoxin derivative used to treat testicular and small

cell lung tumors. (topoisomerase I and II inhibitors)

AMSIDINE® is a cytotoxic agent used to induce remission in acute adult


12

leukemia that is not adequately responsive to other

agents.(intercalating agents)

I.1.1.1.4.2 Toxicity of anticancer drugs

The many drugs used in chemotherapy procedures are not only harmful to tumor

cells, but they can also be toxic to healthy cells, particularly those that proliferate

quickly. Cardiotoxicity, neurotoxicity, nephrotoxicity, hepatotoxicity, alopecia,

dehydration, exhaustion, infertility, undernutrition, skin issues, allergies, and many

more are examples of toxic effects [15].

I.1.1.1.5 Signs and symptoms

No matter your age or health, it’s good to know the possible signs of cancer.

Common signs and symptoms of cancer in both men and women include:

- Pain. Bone cancer often hurts from the beginning. Some brain tumors cause

headaches that last for days and don’t get better with treatment. Pain can also be

a late sign of cancer.

- Weight loss without trying. Almost half of people who have cancer lose weight.

It’s often one of the signs that they notice first.

- Fatigue. If you’re tired all the time and rest doesn’t help.

- Fever. If it’s high or lasts more than 3 days. Some blood cancers, like

lymphoma, cause a fever for days or even weeks.

- Changes in your skin.

- Sores that don’t heal. Spots that bleed and won’t go away are also signs of skin

cancer.

- Cough or hoarseness that doesn’t go away. A cough is one sign of lung cancer,

and hoarseness may mean cancer of your voice box (larynx) or thyroid gland.

- Unusual bleeding. Cancer can make blood show up where it shouldn’t be.

Blood in your poop is a symptom of colon or rectal cancer. And tumors along

your urinary tract can cause blood in your urine.


13

- Anemia. This is when your body doesn’t have enough red blood cells, which

are made in your bone marrow. Cancers like leukemia, lymphoma, and multiple

myeloma can damage your marrow. Tumors that spread there from other places

might crowd out regular red blood cells.

I.1.1.1.6 Treatment

The treatment of a cancer patient requires a multidisciplinary approach. It employs a

variety of techniques, the most well-known of which are surgery, radiation, and

chemotherapy. Its goal is to get rid of the cancerous tumor and prevent it from

spreading to other parts of the body [16].

I.1.1.1.6.1 Surgery

This method entails interfering directly at the tumor's level, i.e., removing the

tumor, either partially or completely. When the tumor is localized, it is usually the first

and most successful treatment.

I.1.1.1.6.2 Radiotherapy

It is a cancer treatment strategy based on the biological effect of ionizing radiation,

particularly high-energy X-rays. It tries to modify cancer cells' reproduction capability

by exposing them to high doses of radiation while maintaining as much healthy tissue

and surrounding organs as feasible. It can be given alone or in conjunction with

chemotherapy.

I.1.1.1.6.3 Medical treatment

Chemotherapy for cancer is designed to kill cells that are actively dividing. It is a

treatment that involves ingesting a medication that slows the growth of tumor cells in

an attempt to extend the patient's life expectancy and minimize pain from metastases.

I.1.1.1.6.4 Radiation therapy

This treatment kills cancer cells with high dosages of radiation. In some instances,

radiation may be given at the same time as chemotherapy.


14

I.1.1.1.6.5 Hormone therapy

Sometimes hormones can block other cancer-causing hormones. For example, men

with prostate cancer might be given hormones to keep testosterone (which contributes

to prostate cancer) at bay.

I.1.1.1.6.6 Biological response modifier therapy

This treatment stimulates your immune system and helps it perform more

effectively. It does this by changing your body’s natural processes.

I.1.1.1.6.7 Immunotherapy

Sometimes called biological therapy, immunotherapy treats disease by using the

power of your body’s immune system. It can target cancer cells while leaving healthy

cells intact.

I.1.1.1.6.8 Bone marrow transplant

Also called stem cell transplantation, this treatment replaces damaged stem cells

with healthy ones. Prior to transplantation, you’ll undergo chemotherapy to prepare

your body for the process.

I.1.1.2 Choosing a drug target

I.1.1.2.1 Historical Overview

Heterocycles are common structural units in marketed drugs and in medicinal

chemistry targets in the drug discovery process. Over 80% of top small molecule drugs

by US retail sales in 2010 contain at least one heterocyclic fragment in their structures.

The one reason behind such high prevalence of oxygen, sulfur, and especially

nitrogen- containing rings in drug molecules is obvious. The research process that

leads to identification of an effective therapeutic treatment is largely based on

mimicking nature by "fooling" it in a very subtle way. Because heterocycles are the

core elements of a wide range of natural products such as nucleic acids, amino acids,

carbohydrates, vitamins, and alkaloids, medicinal chemistry efforts often evolve

around simulating such structural motifs. However, heterocycles play a much bigger


15

role in the modern repertoire of medicinal chemists. Some of the drug properties that

can be modulated by a strategic inclusion of heterocyclic moiety into the molecule

include:

1) Potency and selectivity through bioisosteric replacements, 2) Lipophilicity, 3)

Polarity, and4) Aqueous solubility [17].

Heterocyclic Compounds

The IUPAC Gold Book describes heterocyclic compounds as:

Cyclic compounds having as ring members atoms of at least two different

elements, e.g. Carbazole, Carbazole Derivatives containing Chalcone

Analogues (CDCAs) [18].

Rings are considered as "heterocycles" only if they contain at least one atom

selected from halogen, N, O, S, Se or Te as a ring member. Heterocyclic rings

may be present as distinct entities or condensed, either with carbocycles or

among themselves.

I.1.1.2.2 Drug targets

The next step is to find an appropriate pharmacological target once a therapeutic

area has been identified (e.g., receptor, enzyme, or nucleic acid). It's evident that

knowing which biomacromolecules are involved in a given disease state is crucial.

This enables the medicinal research team to determine if agonists or antagonists for a

certain receptor or inhibitors for a certain enzyme should be developed. Many

chemotherapeutic agents have been shown to be promising cancer weapons.

Anticancer drugs as etoposide (Fig .5), doxorubixin, and amonafide, for example,

strongly inhibit topoisomerase II (Topo II) [19]. Topo II is one of the most promising

anticancer therapeutic targets because it can interfere with the enzyme-DNA complex,

causing persistent DNA damage and cell death [20]. Over the past 40 years,

topoisomerase inhibitors have been widely employed in the clinical treatment of

cancer. At least one drug that targets these enzymes is used in around half of all

chemotherapy regimens [21].


16

Etoposide (VP-16)

Etoposide is an antitumor agent currently in clinical use for the treatment of small cell

lung cancer, testicular cancer and lymphomas. Since the introduction of etoposide in

1971, its mechanism of action and potent antineoplastic activity has served as the

impetus for intensive research activities in chemistry and biology. This drug acts by

stabilizing a normally transient DNA-topoisomerase II complex, thus increasing the

concentration of double-stranded DNA breaks. This phenomenon triggers mutagenic

and cell death pathways. The function of topoisomerase II is understood in some

detail, as is the mechanism of inhibition of etoposide at a molecular level. Etoposide

has shortcomings of limited neoplastic activity against several solid tumors such as

non-small cell lung cancer, cross-resistance to MDR tumor cell lines and low

bioavailability. The design and synthesis of etoposide analogs is an activity of

fundamental interest to the field of cancer chemotherapy.[22]

Figure .5. 2D & 3D structure for Etoposide (VP-16).

I.1.1.2.3 Discovering drug targets

If a drug or poison has a biological impact, that substance must have a molecular

target in the body. Previously, finding drug targets necessitated first finding the drug.


17

Most Topo II inhibitors, like other anticancer drugs, have serious side effects,

including cardiotoxicity, specific luckemia, and multidrug resistance [23-25]. As a

result, finding new Topo II inhibitors in the form of CDCAs (carbazole derivatives

including chalcone analogs) with high efficacy and low toxicity is a hot topic of

research [26-31]. Many bioactive natural products and synthetic compounds with a

wide range of biological activities, including antimalarial, antibacterial, and anticancer

activity, contain the carbazole scaffold [32-34].

Furthermore, studies have revealed that carbazole derivatives may inhibit Topo II,

which could explain their antitumor effects [35, 36]. Modifications/substitutions of the

carbazole ring have become a hot focus in the development of novel Topo II-targeting

anticancer agents in recent years. For example (Figure 6), Compound 1 operates as a

possible non-intercalative Topo II catalytic inhibitor (Figure 6) [37]. Compound 2

showed substantial cytotoxicity against the HL-60 cell line as a Topo II catalytic

inhibitor [38].

NH

N

NH2

NH

O

O

OHHO

O

NH

HN

HN

O

O

O

O

N

O

R1

R1

NH

O

O

O

O

NH

N

O

O

O

O

N

N

OO

O

OCompound 1

Compound 2

Compound 4

Compound 5

Compound 6Compound 1

Figure.6. Design of CDCAs as novel Topo II-targeting anticancer agents.


18

Compound 3 showed modest cytotoxicity against cancer cell lines when used as a

Topo II poison. These findings sparked a lot of interest in looking into carbazole

derivatives as a new type of potential anticancer drug that targets Topo II. Chalcone's

pharmacological activities, including anticancer, antioxidant, anti-inflammatory, and

anti-infective activity, piqued researchers' interest in the twenty-first century.

Compounds 4, 5, and 6 (Figure 6), which are naturally occurring and synthesized

chalcone analogs, are now being studied as cytotoxic agents. In several cancer cell

lines; these chemicals have been shown to suppress cancer cell proliferation and cause

apoptosis. Tubulin polymerization inhibition, apoptosis induction, and Topo II

inhibition have all been identified as modes of action for chalcone analogs. The

mechanism of action of the most bioactive substances was investigated further.

CDCAs appear to be non-intercalative Topo II catalytic inhibitors, certain CDCAs

showed significant cytotoxic activity with low micromolar IC50.[18]

I.1.1.3 Finding a lead compound

I.1.1.3.1 Generation of Hits and Leads

Once the target has been identified, the vast majority of drug development

projects move on to the screening stage. Its goal is to identify a first group of active

molecules, more commonly referred to as hits. During a screening test, a molecule

must have a certain level of activity (usually in the micromolar range) in order to be

identified as such. Because the precise value of the required level of activity is not

absolute, a hit will be defined as a molecule that demonstrated moderate or strong

activity during an experimental test. There are a variety of experimental methods

available for identifying hits. Today, high-throughput screening (HTS) [39-41] is most

likely the most widely used method in the pharmaceutical industry. There are two

main types of experimental screening. [5, 42-44].

I.1.1.3.2 Screening on the targets

Molecules are tested on relatively small biochemical systems, usually for their

affinity or ability to inhibit a certain protein. This is the most common sort of

screening because it allows you to test many molecules in a short amount of time and


19

because it closely resembles the traditional pharmacological research paradigm: a

target / a drug.

I.1.1.3.3 Phenotypical screening

Molecules are tested on whole cells or animal models of a specific disease. They

are slower and more expensive, but they allow researchers to study molecule activity

in a cell-like environment and so achieve more goals. Several decades ago, this sort of

screening was widely used. It has gradually been phased out in favor of target

screening in order to save costs and shorten test times while increasing the number of

molecules tested. Moreover, a number of authors have recently highlighted its benefits

[42, 45, 46].

Once identified, the hits must be confirmed using more detailed testing,

primarily to ensure that the observed activity is not due to artefacts related to the

experimental method or the presence of impurities. A lead is a compound that has

demonstrated a moderate or significant activity and is considered a good place to start

looking for a potential drug candidate. After the effectiveness of the activity has been

confirmed, more advanced tests will be conducted. These studies will aim to determine

not only the activity of molecules (selectivity, low-concentration inhibition) but also

their physico-chemical properties (solubility, lipophilic, metabolic stability, etc.) in

order to select the most promising molecules. The selection of leads is a crucial step

because the rest of the project will be focused on the few molecules that are obtained.

Once a lead has been identified, and if it is successful, one or more molecules will be

submitted to the optimization stage, representing as many paths as feasible for the

discovery of a potential drug candidate.

I.1.1.4 Lead Optimization

During this stage, medicinal chemists will begin an iterative process in which

chemical alterations will be made to a few molecules obtained during screening in

order to improve the activity and properties of future drug candidates. The goal of this

step is to obtain some molecules with a high level of activity (the level of activity

depends on the project and the goal, with the goal being to maximize the ratio of


20

activity to concentration) and appropriate Physicochemical, biological, and

toxicological properties.

This is by far the most difficult step, as it necessitates optimizing both activity and

other properties that will turn the molecule into a drug that is both effective and low

(or non-toxic). In general, multi-objective optimization is discussed, and numerous

studies have been conducted in this area, including in cheminformatics [47-49].

I.1.1.4.1 Rationalize the selection

The elimination of false selection allows for a reduction in the number of

unnecessary tests and hence significant cost savings. There are also methods for

selecting more precisely the components to test in order to reduce the chances of

success in relation to the project's goals.

I.1.1.4.1.1 “Drug-like” compounds

Ideally, only molecules with a high potential to become a drug should be tested: no

toxicity, high therapeutic efficacy, good absorption for oral prescription drugs, and so

on. In the absence of a universal definition or a perfect predictive method, numerous

studies have attempted to distinguish between a biologically active molecule and a

"drug-like" molecule that approaches the ideal drug [50-54].

Lipinski's [55]definition of the term "drug-like" is the most well-known. The

molecules with the best probability of being absorbed by voice oral satisfy at least

three of the following characteristics, based on compounds administered by voice oral

that passed phase 2 of clinical tests successfully:

Molecular weight < 500 Da,

LogP< 5,

Number of hydrogen bond acceptors < 10,

Number of hydrogen bond donors < 5.

With a similar specific goal, Veber et al [56]:

Polar surface area ≤140 Å2 and ≤10 rotatable bonds


21

I.1.1.4.1.2 "lead-like" compounds

Despite being frequently associated with Lipinski's rules, the term "drug-like" is

quite vague and has thus been repurposed to describe more specific classes of

molecules. Numerous studies have been conducted, for example, in order to

distinguish between the leads of pharmaceuticals and other molecules. Once again, the

definition of a lead remains mostly speculative, and other studies have attempted to

establish analogous rules to Lipinski's. The concept stems from the observation that

molecules entering clinical trials are frequently more complicated and larger than the

leads from which they were derived [57-60]. As a result, more restrictive filters for

obtaining simpler compounds have been developed, the most famous of which is that

of Hann and Oprea[59]:

Molecular weight < 460 Da ;

−4 < Log P < 4, 2 ;

Log Sw>= −5 ;

rotatable bonds <10 ;

Number of hydrogen bond donors < 5 ;

Number of hydrogen bond acceptors <9 ;

Number of cycles <4 ;

Other criteria have recently been introduced as well. Hopkins et al.'s of "Ligand

Efficiency" [61]suggest that the ratio between liaison energy and the number of heavy

atoms is an effective measure for lead selection. It allows, for example, to distinguish

between an active and complicated molecule and a molecule with a similar activity but

a lower complexity, allowing for the selection of leads with a higher potential for

further modification. Further groups have proposed other indices of this sort, this time

taking into account the area of the polar topological surface or lipophilicity.


22

I.1.2 Pre-clinical trials

After obtaining optimized "leads", the pre-clinical trials consist of numerous

studies aimed at qualifying the drug candidate from a pharmacological,

pharmacokinetic and toxicological point of view. The use of rational animal

experiments will allow consideration of the administration of the candidate drug in

humans during subsequent clinical trials. The aim of pharmacological studies is to

validate the mechanism of action of the candidate drug and to measure its activity in

experimental models of the pathology, in vitro and in vivo in animals.

Pharmacokinetic studies can describe the fate of a compound, its distribution,

absorption and metabolism, and then its elimination from the body. Toxicological

studies are used to determine the toxic doses of the drug candidate in the organism

studied (mainly mice or rats, more rarely cats, dogs, pigs, or primates) .[62, 63]

These data will help determine the doses to be administered to humans during clinical

trials and constitute a first approach in the study of potential adverse effects of the

candidate drug, allowing these effects to be monitored proactively. An ancillary part of

the pre-clinical development also consists of the assessment of the environmental risk

associated with the marketing of the candidate drug. All the information collected

during the pre-clinical trials will be compiled in a marketing authorization application

for the candidate drug. This will be closely studied by the competent health authorities

(in France, this is the National Medicines Safety Agency (ANSM) and the People

Protection Committee (CPP)), before authorizing, or no, the entry of the drug

candidate into the clinical trial phase.[62, 63]

I.1.3 Clinical tests

Clinical trials represent the most critical step in the drug design process,

whether or not validating several years of pre-clinical and extremely expensive

research. They are divided into four phases. The first three make it possible to

establish the efficacy and safety of the drug candidate in order to obtain Marketing

Authorization, while the fourth consists of the monitoring of side effects throughout

the period of marketing and use of the drug (pharmacovigilance phase ) .[64]The


23

supervision of clinical trials by health authorities is very strict. In France, the informed

consent of volunteers is required and a national register kept by the ANSM lists all the

subjects, the amount of their compensation (capped at 4,500 euros per year to avoid

possible abuses), the date and duration of their protocols. During each phase, if

adverse reactions are detected, the clinical study may be terminated and the drug

candidate may be permanently discontinued.

Phase I is preliminary to the study of the efficacy of a drug candidate. It takes place on

a small number of healthy volunteers (20 to 80) and its sole objective is to assess the

tolerance or absence of any side effects associated with the administration of the

candidate drug. However, these trials can be offered to patients with treatment failure,

for whom the studied treatment represents the only chance of survival. Approximately

54% of the compounds tested in phase I will advance to the next phase.[65] Phase II

aims to determine the optimal dosage, the efficacy of the drug candidate at this dosage

and its short-term tolerance. It is generally carried out on a homogeneous group of 100

to 200 patients. Only 18% of the clinical trials started go on to phase III.[65]These

trials, on a larger scale, are carried out on several thousand patients representative of

the patient population for which the treatment is intended. These are controlled trials in

which the drug in development is compared either with an effective treatment already

on the market or with a placebo.

Phase III trials are most often carried out double-blind and with random draw:

the treatments or placebo are randomly assigned to patients and to the doctors in

charge of monitoring, without them being informed of what assignment they have

been subject to. This method avoids biases related to the patient management process,

commonly known as the "placebo effect". The complete clinical trial process results in

a Marketing Authorization being obtained for 11% of the drug candidates.[6, 65]

When Marketing Authorization is granted, the marketing and application of the

treatment can begin. Phase IV of clinical trials, also called pharmacovigilance, then

consists of monitoring the potential side effects of the treatment on all the patients who

benefit from it (large and heterogeneous population). In this context, physicians have


24

the obligation to report the adverse effects described by their patients, which makes it

possible to quickly identify the emergence of new effects which would not have been

detected during clinical trials and guarantees greater safety. of use to patients.

I.2. Limitations and failures

Pharmaceutical research therefore faces major challenges, mainly scientific,

but also financial and political. The success rate of the clinical trial phases is about

11%, all types of pathologies combined.[6, 65] This varies according to the types of

pathologies, from around 5% for those involving the central nervous system to around

20% for cardiovascular disease (Figure 7) .[6]

Figure. 7. Success rate of phase I clinical trials in obtaining marketing authorization for the

ten largest pharmaceutical companies between 1991 and 2000. The success rate varies

according to the pathologies targeted.

These low success rates can be explained in several ways: pharmaceutical research

is currently focused on pathologies of great complexity (cancer, AIDS, etc.),

competition between different pharmaceutical companies is increasing in parallel with

standards of care and health authorities are becoming more demanding. In 1991, poor

pharmacokinetic characteristics of drug candidates were the leading cause of clinical

trial failure (40%). In 2000, pharmacokinetic problems represented only 10% of


25

clinical trial failures. Today, the main causes of failure during clinical trial phases are

the lack of efficacy of compounds (30%) and their toxicity (30%). [6] The failures

linked to the lack of efficacy of compounds are more frequent when the animal models

used are not very predictive, in particular for pathologies of the central nervous system

and cancers, inducing high failure rates in phases II and III (Figure 7) .[6]

I.3. Medicinal Chemists Today

Medicinal chemists today are facing a serious challenge because of the increased

cost and enormous amount of time taken to discover a new drug ,and also because of

fierce competition amongst different drug companies.

Pharma Industry development became one of the important targets for different

Governments in the last three decades, especially with the tremendous achievements of

the multinational companies in this area of industry.

Challenges of this industry in third world have been increased; the discovery of a

new active marketed molecule costs millions of USD, which increase the problem we

are facing. Many worldwide companies have been merged to increase their ability in

new drug discovery area. Most of them on the other hand adopted new technologies to

discover and develop new drug entities, among these are computational and modeling

techniques, which help in decreasing the time and cost of discovery researches, that's

why importance of Computer Aided Drug Design and Molecular Modeling is

increasing nowadays[6].

I.4 Conclusion

It was discovered that chemical libraries were at the heart of the process of

developing drug candidates. With the passage of time and ever-increasing storage

capacities, the size of chemical libraries and the number of molecules available have

grown significantly. The management and analysis of such a large amount of data

necessitate the use of computer systems capable of implementing various data analysis

and composition selection strategies. In the following section, we will introduce the

field of discussing methods for responding to the issues raised in the previous section,

putting the work of this thesis into context.

References

26

[1] C.P. Adams, V.V.J.H.a. Brantner, Estimating the cost of new drug development: is it

really $802 million?, 25 (2006) 420-428.

[2] J.A.J.P. DiMasi, The value of improving the productivity of the drug development

process, 20 (2002) 1-10.

[3] J.A. DiMasi, R.W. Hansen, H.G.J.J.o.h.e. Grabowski, The price of innovation: new

estimates of drug development costs, 22 (2003) 151-185.

[4] C.P. Adams, V.V.J.H.e. Brantner, Spending on new drug development 1, 19 (2010) 130-

141.

[5] R.M.J.R.W.D.D. Rydzewski, Project Considerations, (2008) 175.

[6] I. Kola, J.J.N.r.D.d. Landis, Can the pharmaceutical industry reduce attrition rates?, 3

(2004) 711-716.

[7] H. Hernandez Guevara, F. Hervas, A. Tuebke, The 2011 EU Industrial R&D Investment

Scoreboard, in, Joint Research Centre (Seville site), 2011.

[8] S. Maslov, I.J.P.o.t.N.A.o.S. Ispolatov, Propagation of large concentration changes in

reversible protein-binding networks, 104 (2007) 13655-13660.

[9] H. Kitano, Towards a theory of biological robustness, in, John Wiley & Sons, Ltd

Chichester, UK, 2007.

[10] A.L.J.N.c.b. Hopkins, Network pharmacology: the next paradigm in drug discovery, 4

(2008) 682-690.

[11] M. Soulie, G. Portier, L.J.P.e.u.j.d.l.A.f.d.u.e.d.l.S.f.d.u. Salomon, Oncological principles

for local control of primary tumor, 25 (2015) 918-932.

[12] M. Gérin, P. Gosselin, S. Cordier, C. Viau, P. Quénel, La référence bibliographique de ce

document se lit comme suit: Gérin M, Band P (2003) Cancer. In: Environnement et santé

publique-Fondements et.

[13] H. Sung, J. Ferlay, R.L. Siegel, M. Laversanne, I. Soerjomataram, A. Jemal,

F.J.C.a.c.j.f.c. Bray, Global cancer statistics 2020: GLOBOCAN estimates of incidence and

mortality worldwide for 36 cancers in 185 countries, 71 (2021) 209-249.

References

27

[14] E. Apretna, F. Thiessard, G. Miremont-Salamé, F.J.O. Haramburu, Pharmacovigilance,

cancer et médicaments anticancéreux, 6 (2004) 66-71.

[15] N.T. Mason, N.I. Khushalani, J.S. Weber, S.J. Antonia, H.L. McLeod, Modeling the cost

of immune checkpoint inhibitor-related toxicities, in, American Society of Clinical Oncology,

2016.

[16] T.J. Yang, A.J.J.T.l.c.r. Wu, Cranial irradiation in patients with EGFR-mutant non-small

cell lung cancer brain metastases, 5 (2016) 134.

[17] A.D. McNaught, A. Wilkinson, Compendium of chemical terminology. IUPAC

recommendations, (1997).

[18] P.-H. Li, H. Jiang, W.-J. Zhang, Y.-L. Li, M.-C. Zhao, W. Zhou, L.-Y. Zhang, Y.-D.

Tang, C.-Z. Dong, Z.-S.J.E.j.o.m.c. Huang, Synthesis of carbazole derivatives containing

chalcone analogs as non-intercalative topoisomerase II catalytic inhibitors and apoptosis

inducers, 145 (2018) 498-510.

[19] A.M. Rahman, S.-E. Park, A.A. Kadi, Y.J.J.o.m.c. Kwon, Fluorescein hydrazones as

novel nonintercalative topoisomerase catalytic inhibitors with low DNA toxicity, 57 (2014)

9139-9151.

[20] M.D. Galsky, N.M. Hahn, B. Wong, K.M. Lee, P. Argiriadi, C. Albany, K. Gimpel-

Tetra, N. Lowe, M. Shahin, V.J.C.c. Patel, pharmacology, Phase 2 trial of the topoisomerase

II inhibitor, amrubicin, as second-line therapy in patients with metastatic urothelial

carcinoma, 76 (2015) 1259-1265.

[21] G. Capranico, J. Marinello, G.J.J.o.m.c. Chillemi, Type i DNA topoisomerases, 60

(2017) 2169-2192.

[22] P. Meresse, E. Dechaux, C. Monneret, E.J.C.m.c. Bertounesque, Etoposide: discovery

and medicinal chemistry, 11 (2004) 2443-2466.

[23] B. Li, Z.-Z. Yue, J.-M. Feng, Q. He, Z.-H. Miao, C.-H.J.E.j.o.m.c. Yang, Design and

synthesis of pyrido [3, 2-α] carbazole derivatives and their analogues as potent antitumour

agents, 66 (2013) 531-539.

References

28

[24] A.K. McClendon, N.J.M.R.F. Osheroff, M.M.o. Mutagenesis, DNA topoisomerase II,

genotoxicity, and cancer, 623 (2007) 83-97.

[25] N.J. Winick, R.W. McKenna, J.J. Shuster, N.R. Schneider, M.J. Borowitz, W.P.

Bowman, D. Jacaruso, B.A. Kamen, G.R.J.J.o.c.o. Buchanan, Secondary acute myeloid

leukemia in children with acute lymphoblastic leukemia treated with etoposide, 11 (1993)

209-217.

[26] M.S. Christodoulou, F. Calogero, M. Baumann, A.N. García-Argáez, S. Pieraccini, M.

Sironi, F. Dapiaggi, R. Bucci, G. Broggini, S.J.E.j.o.m.c. Gazzola, Boehmeriasin A as new

lead compound for the inhibition of topoisomerases and SIRT2, 92 (2015) 766-775.

[27] M.S. Christodoulou, N. Fokialakis, D. Passarella, A.N. García-Argáez, O.M. Gia, I.

Pongratz, L. Dalla Via, S.A.J.B. Haroutounian, m. chemistry, Synthesis and biological

evaluation of novel tamoxifen analogues, 21 (2013) 4120-4131.

[28] M.S. Christodoulou, M. Zarate, F. Ricci, G. Damia, S. Pieraccini, F. Dapiaggi, M. Sironi,

L.L. Presti, A.N. García-Argáez, L.J.E.j.o.m.c. Dalla Via, 4-(1, 2-diarylbut-1-en-1-yl)

isobutyranilide derivatives as inhibitors of topoisomerase II, 118 (2016) 79-89.

[29] N. Ferri, T. Radice, M. Antonino, E.M. Beccalli, S. Tinelli, F. Zunino, A. Corsini, G.

Pratesi, E.M. Ragg, M.L.J.B. Gelmi, m. chemistry, Synthesis, structural, and biological

evaluation of bis-heteroarylmaleimides and bis-heterofused imides, 19 (2011) 5291-5299.

[30] H. Matsumoto, M. Yamashita, T. Tahara, S. Hayakawa, S.-i. Wada, K. Tomioka, A.J.B.

Iida, m. chemistry, Design, synthesis, and evaluation of DNA topoisomerase II-targeted

nucleosides, 25 (2017) 4133-4144.

[31] M.S. Islam, S. Park, C. Song, A.A. Kadi, Y. Kwon, A.M.J.E.j.o.m.c. Rahman,

Fluorescein hydrazones: A series of novel non-intercalative topoisomerase IIα catalytic

inhibitors induce G1 arrest and apoptosis in breast and colon cancer cells, 125 (2017) 49-67.

[32] L. S Tsutsumi, D. Gündisch, D.J.C.t.i.m.c. Sun, Carbazole scaffold in medicinal

chemistry and natural products: a review from 2010-2015, 16 (2016) 1290-1313.

[33] V.R. PO, M.P. Tantak, R. Valdez, R.P. Singh, O.M. Singh, R. Sadana, D.J.R.a. Kumar,

Synthesis and biological evaluation of novel carbazolyl glyoxamides as anticancer and

antibacterial agents, 6 (2016) 9379-9386.

References

29

[34] B. Maji, K. Kumar, M. Kaulage, K. Muniyappa, S.J.J.o.m.c. Bhattacharya, Design and

synthesis of new benzimidazole–carbazole conjugates for the stabilization of human telomeric

DNA, telomerase inhibition, and their selective action on cancer cells, 57 (2014) 6973-6988.

[35] W. Wang, X. Sun, D. Sun, S. Li, Y. Yu, T. Yang, J. Yao, Z. Chen, L.J.C. Duan,

Carbazole aminoalcohols induce antiproliferation and apoptosis of human tumor cells by

inhibiting topoisomerase I, 11 (2016) 2675-2681.

[36] A.J.E.j.o.m.c. Głuszyńska, Biological potential of carbazole derivatives, 94 (2015) 405-

426.

[37] Y. Funayama, K. Nishio, K. Wakabayashi, M. Nagao, K. Shimoi, T. Ohira, S. Hasegawa,

N.J.M.R.F. Saijo, M.M.o. Mutagenesis, Effects of β-and γ-carboline derivatives on DNA

topoisomerase activities, 349 (1996) 183-191.

[38] Y. Hajbi, C. Neagoie, B. Biannic, A. Chilloux, E. Vedrenne, B. Baldeyrou, C. Bailly, J.-

Y. Mérour, S. Rosca, S.J.E.j.o.m.c. Routier, Synthesis and biological activities of new furo [3,

4-b] carbazoles: Potential topoisomerase II inhibitors, 45 (2010) 5428-5437.

[39] R. Macarron, M.N. Banks, D. Bojanic, D.J. Burns, D.A. Cirovic, T. Garyantes, D.V.

Green, R.P. Hertzberg, W.P. Janzen, J.W.J.N.r.D.d. Paslay, Impact of high-throughput

screening in biomedical research, 10 (2011) 188-195.

[40] J.W.J.I.J.H.T.S. Noah, New developments and emerging trends in high-throughput

screening methods for lead compound identification, 1 (2010) 141-149.

[41] L.M. Mayr, D.J.C.o.i.p. Bojanic, Novel trends in high-throughput screening, 9 (2009)

580-588.

[42] D.C. Swinney, J.J.N.r.D.d. Anthony, How were new medicines discovered?, 10 (2011)

507-519.

[43] A. Golebiowski, S.R. Klopfenstein, D.E.J.C.o.i.c.b. Portlock, Lead compounds

discovered from libraries: part 2, 7 (2003) 308-325.

[44] A. Golebiowski, S.R. Klopfenstein, D.E.J.C.o.i.c.b. Portlock, Lead compounds

discovered from libraries, 5 (2001) 273-284.

References

30

[45] S. Heller, A.J.J.o.C. McNaught, The status of the InChI project and the InChI trust, 2

(2010) 1-1.

[46] D.C. Ince, L. Hatton, J.J.N. Graham-Cumming, The case for open computer programs,

482 (2012) 485-488.

[47] S. Pickett, Multi-objective approaches to screening collection design and analysis of HTS

data, in: ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, AMER

CHEMICAL SOC 1155 16TH ST, NW, WASHINGTON, DC 20036 USA, 2005, pp. U770-

U771.

[48] S. Ekins, J.D. Honeycutt, J.T.J.D.d.t. Metz, Evolving molecules using multi-objective

optimization: applying to ADME/Tox, 15 (2010) 451-460.

[49] S.J. Lusher, R. McGuire, R. Azevedo, J.-W. Boiten, R.C. van Schaik, J.J.D.d.t. de Vlieg,

A molecular informatics view on best practice in multi-parameter compound optimization, 16

(2011) 555-568.

[50] M.S. Lajiness, M. Vieth, J.J.C.o.i.d.d. Erickson, development, Molecular properties that

influence oral drug-like behavior, 7 (2004) 470-477.

[51] I.J.M.r.r. Muegge, Selection criteria for drug‐like compounds, 23 (2003) 302-321.

[52] G. Vistoli, A. Pedretti, B.J.D.d.t. Testa, Assessing drug-likeness–what are we missing?,

13 (2008) 285-294.

[53] M.-Q. Zhang, B.J.C.o.i.b. Wilkinson, Drug discovery beyond the ‘rule-of-five’, 18

(2007) 478-488.

[54] P.D. Leeson, B.J.N.r.D.d. Springthorpe, The influence of drug-like concepts on decision-

making in medicinal chemistry, 6 (2007) 881-890.

[55] C.A. Lipinski, F. Lombardo, B.W. Dominy, P.J.J.A.d.d.r. Feeney, Experimental and

computational approaches to estimate solubility and permeability in drug discovery and

development settings, 23 (1997) 3-25.

[56] D.F. Veber, S.R. Johnson, H.-Y. Cheng, B.R. Smith, K.W. Ward, K.D.J.J.o.m.c. Kopple,

Molecular properties that influence the oral bioavailability of drug candidates, 45 (2002)

2615-2623.

References

31

[57] M.M. Hann, A.R. Leach, G.J.J.o.c.i. Harper, c. sciences, Molecular complexity and its

impact on the probability of finding leads for drug discovery, 41 (2001) 856-864.

[58] T.I. Oprea, A.M. Davis, S.J. Teague, P.D.J.J.o.c.i. Leeson, c. sciences, Is there a

difference between leads and drugs? A historical perspective, 41 (2001) 1308-1315.

[59] M.M. Hann, T.I.J.C.o.i.c.b. Oprea, Pursuing the leadlikeness concept in pharmaceutical

research, 8 (2004) 255-263.

[60] G.M. Keserü, G.M.J.n.r.D.D. Makara, The influence of lead discovery strategies on the

properties of drug candidates, 8 (2009) 203-212.

[61] A.L. Hopkins, C.R. Groom, A.J.D.d.t. Alex, Ligand efficiency: a useful metric for lead

selection, 9 (2004) 430-431.

[62] J.P. Hughes, S. Rees, S.B. Kalindjian, K.L.J.B.j.o.p. Philpott, Principles of early drug

discovery, 162 (2011) 1239-1249.

[63] A. Bender, J. Scheiber, M. Glick, J.W. Davies, K. Azzaoui, J. Hamon, L. Urban, S.

Whitebread, J.L.J.C. Jenkins, Analysis of pharmacology data and the prediction of adverse

drug reactions and off-target effects from chemical structure, 2 (2007) 861-873.

[64] D. Wang, A. Bakhai, Clinical trials: a practical guide to design, analysis, and reporting,

Remedica, 2006.

[65] S.M. Paul, D.S. Mytelka, C.T. Dunwiddie, C.C. Persinger, B.H. Munos, S.R. Lindborg,

A.L.J.N.r.D.d. Schacht, How to improve R&D productivity: the pharmaceutical industry's

grand challenge, 9 (2010) 203-214.

Chapter II

Part I: Computer in medicinal

chemistry

Computer in medicinal chemistry

33

All theoretical research is underpinned by two essential motivations:

understanding and forecasting, that is, understanding what has already been done and

forecasting what may be achievable. The forecast answers questions like: "What

would happen if ...?", Or "Can we do ...?" or “What would be the value of…?”. The

traditional answer would be to experiment. But at a time when the price of computer

calculations is continually falling, while chemicals, devices, skilled labor, etc.

continues to grow, it is increasingly interesting to exploit theoretical models of all

kinds to help design new chemical species [1].

The ambition of a theoretical chemist is to be able to predict, confirm or reinterpret the

experiment using molecular modeling. Indeed, the perseverance of researchers, and

above all the power of their computer resources, play in favor of theoretical chemistry,

and its field of application.

The research and synthesis of new chemical and biochemical compounds is often

associated today with a study by molecular modeling. Molecular modeling is a

commonly used methodology. For a little over thirty years, it has gradually established

itself as a tool of choice for the discovery and oriented design of new active molecules.

Previously, only systematic biological tests were performed on a large number of

molecules and, often, only luck made it possible to highlight an interesting lead [2].

Molecular modeling is a tool for researchers concerned with the structure and

reactivity of molecules. Knowing the structure of molecular edifices makes it possible

to understand what is achieved in a physical, chemical or biological transformation. It

can also make it possible to predict such transformations. Both understanding and

forecasting are greatly facilitated when one can visualize the structures. A molecule is

correctly described by its geometry and its thermodynamic properties. The

visualization must account for all of these characteristics. The essential question is to

represent a molecule on the screen as close as possible to "reality" [3].


34

II.1 Theoretical Background for Quantum Mechanical Calculations

The goal of this chapter is to go over some uses of quantum mechanical

calculations in medicinal chemistry and drug design. These introductions, on the other

hand, will be kept to a bare minimum, focusing on the key differences between

methodologies and their known strengths and shortcomings in respect to medicinal

chemistry applications, with only one mathematical equation, probably to the relief of

the majority of readers.

The Schrödinger equation is the foundation of quantum mechanics, and it has a simple

eigenvalue problem form:

𝐻𝜓 = 𝐸𝜓

Even with today's computing power, this famous equation cannot be solved directly

for anything greater than hydrogen. As a result, a number of assumptions were devised

to allow for the quantum mechanical treatment of molecules of more immediate

concern to most chemists.

For example, the Born–Oppenheimer approximation treats atom nuclei as fixed, and

the Hartree–Fock (HF) approximation replaces the genuine electron– electron potential

description with an effective potential, thus eliminating electron correlation. Another is

the use of basis sets instead of actual electron integrals, which are designed to

approximate the structure of orbitals. For ab initio computations, the quality of these

basis sets is critical, and the larger they are and the more individual Gaussian functions

they contain, the better. The utility and correctness of calculations based on these

approximations are typically confirmed by comparison with experiment, especially for

molecular geometries and properties such as dipole moments, and the findings are

often surprisingly exact despite the use of these simplifications.

Some approximations, on the other hand, are more dubious, and the magnitude of

inaccuracy imposed is frequently unknown. Despite the use of a variety of

approximations, quantum mechanical calculations are often quite accurate and useful

in answering questions and defining molecule structures, characteristics, and

interactions that are crucial in medicinal chemistry. It should also be noted that some


35

of these questions, such as those relating to chemical reactivity, molecular properties

such as nucleophilicity, electrophilicity, charge distribution, spin–orbit coupling,

dipole, and higher multipole moments relating to polarizability, infrared, Raman, and

NMR chemical shifts, circular dichroism, and magnetic susceptibility [4], must be

answered in the affirmative, the computational and medicinal chemist's only choice for

obtaining reliable predictions is quantum mechanical computations.

In contrast to molecular mechanics, quantum mechanical calculations are obtained

directly from the physical rules that affect molecular structure through an approximate

solution of the Schrödinger equation. Ab initio, DFT, and semiempirical approaches

are the three types of methodologies available. While ab initio and density functional

approaches do not need parametrization to solve the Schrödinger equation,

semiempirical methods use parameters to avoid having to compute some of the time-

consuming integrals that ab initio and DFT computations require. Furthermore,

semiempirical approaches only consider the valence electrons, and despite having

significantly fewer parameters than molecular mechanics methods, are also less

comprehensible. All three approaches (ab initio, DFT, and semiempirical) produce a

wave function that may be used to calculate all electrical properties [5].

II.1.1 Molecularmechanics

II.1.1.1 Introduction

Molecular modellers usually have a different goal in mind: they want a force field

that can be transmitted from molecule to molecule so that they may anticipate (for

example, the shape of a new molecule) using data from other molecules. They use the

bond notion and appeal to classical chemists' ideas that a molecule is made up of a

collection of bonded atoms; a large molecule contains the same qualities as small

molecules, and in various combinations [6].

Because of non-bonded van der Waals and Coulombic interactions, molecules are

described in terms of "bonded atoms," which have been distorted from some idealized

shape. The success of molecular mechanics models is determined by the geometrical

parameters' great transferability from one molecule to the next, as well as the

parameters' predictable dependency on atomic hybridization.


36

Carbon-carbon single bond lengths, for example, are typically in the tiny range of 1.45

to 1.55, and grow in length as the “p character” of the carbon hybrids increases. As a

result, if the molecule has already been represented in terms of a certain valence

structure, it is possible to make a pretty accurate “guess” about molecular geometry in

terms of bond lengths, bond angles, and torsion angles. This group includes the vast

majority of organic compounds.

The “energy” of a molecule in molecular mechanics is described as a sum of

contributions from distortions in “ideal” bond distances (stretch contributions), bond

angles (bond contributions), and torsion angles (torsion contributions), as well as

contributions from “non-bonded” (van der Waals and Coulombic) interactions.

Figure .8. An illustration of the various terms that contribute to a typical force field.


37

II.1.1.1.1 Elongation energy

Bonds between atoms in a molecular structure often tend to elongate or contract

(Figure 9).

Figure.9.Elongation between two atoms.

This deformation is governed as a first approximation by the “Hooke” law of spring

elongation. We can thus associate it with an energy of elongation of the form:

E(L) = 1/2[K1(L-Lo)²]

WhereK1: is the constant of elongation or constant of Hooke

Lo: the length of the reference link.

L: the length of the link in the model.

All of these elongation terms are summed over all of the bonds in the molecule.

A cubic term (L-Lo)3 is generally added for large deformations. The computation of

this energy thus imposes to know at least the two indissociable parameters (K1 and Lo)

which represent a subset of the force field. Indeed, it emerges from the serial

development of the mathematical expression of the Morse curve reflecting the existing

interaction between two atoms according to their respective distance [7].


38

II.1.1.1.2. Bending energy

The fluctuation of atoms around their equilibrium position generates a

deformation of the valence angles (figure 10) [7].

Figure.10.Deformation of valence angles.

This phenomenon is governed by a bending energy which can be expressed in the

same forms as above, namely, to put it simply:

E() =1/2[Kf(- o)²]

Kf: bending constant

o: reference valence angle

: valence angle in the molecule

The pair (Kf, o) represents here again a subset of the force field.

II.1.1.1.3. Torsional energy

It concerns the dihedral angle formed by atoms 1-2-3-4. It notably takes into account

the 3D structure of the molecule (figure 11) [7].


39

Figure.11.Dihedral angle formed by atoms 1-2-3-4.

The evaluation of this energy E (f) is done by a function developed in Fourier series.

E(f) = 1/2 [V1(1+cosf)+V2(1 - cos2f)+V3(1+cos3f)]

The dihedral angle (f) defines the twist around link 2-3.

V1, V2, V3 are the constants of the potential of torsional energy.

II.1.1.1.4. Van der Waals energy

This energy concerns atoms not linked to each other and not linked to a common

atom [8]. It consists of two parts, one repellent and the other attractive, and can be

expressed by the following equation[7]:

E(vdw)= e*[- C1(r*/r) 6 +C2 exp(-C3(r/r*)]

e * :energy parameter which characterizes the depth of the potential well at the

distance r *, also called "hardness".

r *: sum of the VdW radii of the interacting atoms.

r: interatomic distance.

C1, C2, C3: constants of the force field.

We can therefore represent this energy as a function of the interatomic distance "r" as

follows:


40

Figure.12.Van der Waals energy curve.

II.1.1.1.5. Electrostatic energy

The electrostatic interactions can, in certain cases, take on a considerable

importance, in particular in the case of molecules comprising two or more

heteroatoms, Meyer et al [9], have proposed to introduce an electrostatic term even for

hydrocarbons. It can be expressed from the atomic charges or the dipole moments of

each bond [7].

In the first case:

E(e)= S q1q2 / D.d12

D: local dielectric constant of the medium

q1q2: atomic partial charges of atoms 1,2

d12: interatomic distance.

In the second:


41

E(e) = m1 m2(cosX - 3cosa1 .cosb2) / D.r12-3

r: distance between the midpoints of the two links

m1, m2: respectively represent the dipole moments of the two bonds.

X: the angle between the two moment vectors.

a1, b2: angles formed respectively between µ1 and r and µ2 and r

Figure.13.Electrostatic interactions between two atoms.

II.1.1.2 Force Fields

Models of molecular mechanics differ in terms of the number and nature of terms

they include, as well as the details of their parameterization. The combination of

functional form and parameterization is known as a force field.

We have been vague so far about which variables are the ‘correct’ ones to take.

Chemists visualize molecules in terms of bond lengths, bond angles and dihedral

angles, yet this information is also contained in the set of Cartesian coordinates for the

constituent atoms. Both are therefore ‘correct’; it is largely a matter of personal choice

and professional training.

Spectroscopists have developed systematic simplifications to the force fields in order

to make as many as possible of the small terms vanish. If the force field contains only

‘chemical’ terms such as bond lengths, bond angles and dihedral angles, then it is

referred to as avalence force field (VFF). There are other types of force fields in the

literature, intermediate between the VFF and the general force field[10].


42

With these principles in mind, many different force fields were build in professional

molecular modelling programs such as :Dreiding, MM1, MM2, Amber, OPLS…etc.

II.1.1.3 Limitations of Molecular Mechanics

The primary advantage of molecular mechanics models, is their simplicity. Except

for very small systems, computation cost is completely dominated by evaluation of

non-bonded van der Waals and Coulombic terms, the number of which is given by the

square of the number of atoms.

However, the magnitude of these terms falls off rapidly with increasing interatomic

distance and, in practice, computation cost scales linearly with molecular size for

sufficiently large molecules.

Molecular mechanics calculations may easily be performed on molecules

comprising several thousand atoms. Additionally, molecular mechanics calculations

are sufficiently rapid to permit extensive conformational searching on molecules

containing upwards of 100-200 atoms.

The fact that molecular mechanics models are parameterized may also be seen as

providing an advantage over quantum chemical models. It is possible, at least in

principle, to construct molecular mechanics models which will accurately reproduce

known experimental data, and hopefully will anticipate (unknown) data on closely-

related systems.

There are important limitations of molecular mechanics models. First, they are

limited to the description of equilibrium geometries and equilibrium conformations.

Because the mechanics “strain energy” is specific to a given molecule (as a measure of

how far this molecule deviates from its “ideal arrangement”). Two important

exceptions are: calculations involving isomers with exactly the same bonding, e.g.,

comparison of cis and trans-2-butene, and conformational energy comparisons, where

different conformers necessarily have exactly the same bonding.

In addition, molecular mechanics calculations reveal nothing about bonding or, more

generally, about electron distributions in molecules. As will become evident later,


43

information about electron distributions is key to modeling chemical reactivity and

selectivity. There are however, important situations where purely steric effects are

responsible for trends in reactivity and selectivity, and here molecular mechanics

would be expected to be of somevalue.

Finally, it needs to be noted that molecular mechanics is essentially an

interpolation scheme, the success of which depends not only on good parameters, but

also on systematics among related molecules. Molecular mechanics models would not

be expected to be highly successful in describing the structures and conformations of

“new”) unfamiliar (molecules outside the range of parameterization [11].

II.1.2 Quantum Mechanics

II.1.2 .1 Introduction

There are two approaches to quantum mechanics. One is to follow the historical

development of the theory from the first indications that the whole fabric of classical

mechanics and electrodynamics should be held in doubt to the resolution of the

problem in the work of Planck, Einstein, Heisenberg, Schrödinger, and Dirac. The

other is to stand back at a point late in the development of the theory and to see its

underlying theoretical structure[12].Quantum mechanics can be thought of roughly as

the study of physics on very small length scales, although here are also certain

macroscopic systems it directly applies to. In quantum mechanics, particles have wave

like properties, and a particular wave equation, the Schrodinger equation, governs how

these waves behave.

What are we looking for in quantum mechanics? we search to solve the equation of

Schrödinger of an electron using Born-Oppenheimer approximations. We can use the

Born- Oppenheimer approximation to construct an electronic Hamiltonian, which

neglects the kinetic energy term of the nuclei since the weight of a typical nucleus is

thousands of times greater than that of an electron.


44

This Hamiltonian is used in the Schrödinger equation describing the motion of the

electrons in the field of the fixed nuclei:

Solving this equation for the electronic wave function will produce the effective

nuclear potential function Eeff that depends on the nuclear coordinates and describes

the potential energy surface of the system.

For bond electronic problem, should satisfy two requirements: antisymmetricity and

normalization. should change sign when two electrons of the molecule interchange and

the integral of overall space should be equal to the number of electrons of the molecule

[13].

II.1.2.2 HF and DFT Methods

Many aspects of molecular structure and dynamics canbe modeled

usingclassicalmethods in the form of molecular mechanics and dynamics. The

classical force field is based on empirical results, averaged over a large number of

molecules. Because of this extensive averaging, the results can be good for standard

systems, but there are many important questions in chemistry that cannot at all be

addressed by means of this empirical approach. If one wants to know more than just

structure or other properties that are derived only from the potential energy surface, in

particular properties that depend directly on the electron density distribution, one has

to resort to a more fundamental and general approach: quantum chemistry. The same

holds for all non-standard cases for which molecular mechanics is simply not

applicable[14].In these methods, we use a set of basic functions from which we define

all the molecular orbitals of the chemical system studied by LCAO approximation

(linear combination of atomic orbitals).

The choice of basic orbital is essential for quantum chemistry calculations. There

are two types of basic functions mainly used in electronic structure calculations: The

Orbital Type Slater (STO) and those of Gaussian type (GTO).

The question here is, what basis to choose? Is it the biggest possible?


45

The answer is: all depend of your studied system.

With the increase in computing power, the minimum size of a calculation base is

currently around 6-31G (d).

Double zeta at least.

With polarization at least on the heavy atoms.

Diffuse functions in the case of an anion or a system performing hydrogen

bonds or a system having free doublets.

A good basis: 6-31G** (or6-31++G**).

A very good basis: 6-311 ++ G (2df,2p).

II.1.2.3. Semi-empirical methods

In ab-initio methods almost all of the computation time is consumed by the

calculations of the integrals, and in order to reduce this computation time, it is

necessary to simplify the Roothann equations.

A semi-empirical method is a method in which part of the calculations necessary for

Hartree-Fock calculations are replaced by parameters fitted to experimental values (the

Hamiltonian is always parameterized by comparison with references). In general, all

these methods are very precise for given families of products similar to those used for

parameterization. Depending on the nature of the approximations used [15], there are

several variants:

CNDO / 2: (Complete Neglect of Differential Overlep) 1st semi-empirical

method, it was proposed by Pople, Segal and Santry in 1965. Method

presenting certain defects among others: it does not take into account Hund's

rule.

INDO: (Intermediate Neglect of Differential Overlap) Proposed by Pople,

Beveridge and Dobosh in 1967. It distinguishes between singlet states and

triplet states of a system by keeping the exchange integrals.

NDDO: method (Neglect of Diatomic Differential Overlap): proposed by Pople

in 1965. All bicenter bi-electronic integrals are retained.


46

MINDO / 3: Proposed by Bingham, Dewar and Lo in 1975. Parameterization

carried out with reference to experimental results and not to initial results,

moreover the optimization algorithm used is very efficient (DavidonFletcher-

Powel). However, it overestimates the heat of formation of unsaturated systems

and underestimates that of molecules containing neighboring atoms with free

pairs.

MNDO: (Modified Neglect of Diatomic Overlap) Proposed by Dewar and

Thiel in 1977. Methods based on the NDDO (Neglect of Diatomic Differential

Overlap) approximation which consists in neglecting the differential overlap

between atomic orbitals on different atoms. This method does not deal with

transition metals and presents difficulties for conjugate systems.

AM 1: (Austrin Model 1) Proposed by Dewar in 1985. He tried to correct the

flaws of MNDO.

PM 3: (Parametric Method 3) Proposed by Stewart in 1989. Presents many

points in common with AM1, moreover there is still a debate concerning the

relative merits of parametrization of each of them.

SAM 1: (Semi-ab initio Model 1) The most recent method proposed by Dewar

in 1993. It includes electronic correlation.

II.1.2.4 Limitations of Quantum Mechanics

However, these methods draw back calculations could be sometimes very long,

the size of the system studied is limited and there must be very good calculation

equipment.

One of the major challenges for computer-aided drug design is that it is not governed

by the clear-cut rules of design in engineering, and hence, these methods do not

produce a finished product by a fully prescribed procedure. The limitations of the

rational computer aided drug design approach arise because of the complexity of the

biological processes involved in drug actions and metabolism at the molecular level

and the level of approximation that must be used in describing molecular

properties.[5]However, there is clear evidence in the literature that molecular modeling

and computer-aided drug design methods and also data analysis and chemoinformatics


47

approaches have become very important tools for drug discovery and that they have

been successfully applied to medicinal chemistry,[16]particularly hit and lead

generation as well as at the lead development stages.[17-19] Accepting that molecular

modeling and chemoinformatics are useful techniques does however not sufficiently

explain why one needs quantum mechanical methods. This has been done by Clark in

a recent review, where he indicates that calculational techniques used to describe

molecules should be able to describe the intermolecular interactions adequately.

[20]He points out that this can only be achieved if the molecular electrostatics and the

molecular polarizability are described well. The former is responsible for strong

interactions and the latter is directly related to dispersion and other weak interactions.

Therefore, following this argument, molecular interactions of any type can only be

described adequately and accurately by using quantum mechanical calculations.

We will divide the application of quantum mechanical calculations to answer

medicinal chemistry related questions in a drug design environment into a total of four

sections on:

(1)the accurate calculation of molecular structure,

(2) the calculation of quantum mechanical descriptors for prediction of molecular

propertiesand QSAR, [21, 22]

(3) applications to chemical reactivity and the investigation of enzyme mechanisms,

and

(4) the calculation of interactions and binding energies of small molecules with

proteins.

This selection of topics is meant to reflect the main areas of interest to medicinal

chemists working in the field of drug discovery. It is noted that although there are a

great number of publications on the use of quantum mechanical calculations to

medicinal chemistry, however, a large number of them are retrospective studies

concerned with the validation of new technology rather than the prospective

application to problem solving and design of new chemical entity (NCEs).


48

II.2 Choice of method

The method of calculation chosen depends on what calculation needs to be done,

as well as the size of the molecule. As far as size of molecule is concerned, ab initio

calculations are limited to molecules containing tens of atoms, semi-empirical

calculations on molecules containing hundreds of atoms, and molecular mechanics on

molecules containing thousands of atoms.

Molecular mechanics is useful for the following operations or calculations: [23]

• Energy minimization ;

• Identifying stable conformations;

• Energy calculations for specific conformations;

• Generating different conformations;

• Studying molecular motion.

Quantum mechanical methods are suitable for calculating the following:

• Molecular orbital energies and coefficients;

• Heat of formation for specific conformations;

• Partial atomic charges calculated from molecular orbital coefficients;

• Electrostatic potentials;

• Dipole moments;

• Transition state geometries and energies;

• Bond dissociation energies.

II.3 Minima search method

II.3.1 Introduction

Minimization sometimes results in hydrogens and doublets arranged around the

atom in impossible structural positions. This is often due to improper initial movement

of these light atoms when a strongly distorted structure is introduced.

This can also happen if you start with a planar structure. This is why the program

performs a "second pass" of minimization after having repositioned the light atoms.


49

Another difficulty of minimization concerns the problem of the local minimum. The

constrained optimization routines indeed have the unfortunate tendency to find a

minimum energy closest to the input structure [24].

In general, bond distances and angles are properly minimized, so the local minimum

problem can boil down to optimizing dihedral angles (it takes a lot more energy to

deform a bond or valence angle by compared to a dihedral angle). The following

approaches aim to extract the molecule from its potential well.

Almost all minimization methods have at least one point in common: we start at a

given place in the hypersurface and work our way down to the nearest minimum,

without knowing whether this minimum is local or absolute. We must therefore

present to the computer several starting conformations, in the form of internal

coordinates, taking inspiration from molecular models [7].

II.3.2 Minimization algorithms

For a molecule comprising N atoms, the function to be minimized therefore

comprises 3N variables. Such a function generally comprises a global minimum and

local minimisa.

From the initial geometry, we look for the set of Cartesian coordinates which reduces

to a minimum the sum of all the energy contributions.

In principle, it suffices to take the first derivative of the steric energy with respect to

each of the degrees of freedom of the molecule and to find the place on the energetic

hypersurface where, for each coordinate ri, (dE / dri) = 0.

The procedures to achieve this goal are of two types: Some use only the slope of the

surface (first derivative), others, both this slope and the curvature of the surface (the

first and second derivatives).

II.3.2.1 The "steepest descent" method

The first minimization program that can perform geometry optimization is due to

Wiberg[25] and uses the steepest descent method. After having calculated the energy


50

corresponding to an initial geometry, we move each atom individually according to its

three Cartesian coordinates and we recalculate the energy after each displacement.

This amounts to calculating the first derivative only. Then we move all the atoms over

a distance that depends on (dE / dri), and so on. This algorithm will therefore follow

the direction imposed by the dominant interatomic forces.

This is why it proves to be very effective in removing bad contacts or the main

stereochemical problems which exist in the coarse coordinates of a crystalline or

modeled structure, while disturbing the latter very little.

However, this random method is generally long towards the end of each minimization

cycle and convergence becomes very slow beyond the first cycles (oscillating

phenomena, energy rise).

In fact, the method consists in finding the direction of the greatest slope

during which the objective function F (x) decreases the most rapidly. The direction

followed will be that indicated by the opposite of the energy gradient, i.e. the direction

of the greatest slope of the energy function, which is the direction in which the energy

decreases the fastest, at least locally.

II.3.2.2. The conjugate gradient method

This method, based on the same principle as the previous one (direction opposite to

the energy gradient), also takes into account the previous steps, in order to more

precisely determine the direction and the pitch. For a quadratic energy surface,

function of 3N variables, this method converges in 3N not [26].

It retains good efficiency, but is more costly in computation time (a factor of 2

compared to "steepest descent"). The step is adjusted at each cycle to obtain the best

reduction in energy. The interest of this algorithm is to avoid an oscillatory behavior

around the minimum and to accelerate the convergence. However, it turns out to be

less efficient or even unusable (no convergence) for structures which present a lot of

bad contacts, such as structures averaged over the trajectory of a molecular dynamic.


51

II.3.2.3. The Newton-Raphson method.

An additional improvement of convergence can still be obtained by having

recourse to a quadratic approximation Q of the function F, obtained by development in

series of Taylor [27].

The method consists in seeking at each step the minimum of the development at order

2 of the function F. This method known as "Newton-Raphson", uses the second

derivatives. Now we use this optimization technique instead. It evaluates the second

derivatives of molecular energy with respect to the geometric parameters and therefore

converges more quickly.

In summary, the complete optimization according to the Newton Raphson

method requires calculating the complete matrix of second derivatives (matrix of force

constants). This matrix contains 3N X 3N elements for a molecule of N atoms. The 3N

dimensions correspond to the three degrees of motion for each atom.

The optimization of the geometry however requires only 3N-6 degrees of freedom

since the three translations and the three rotations are not accompanied by changes of

energy. The force constants are then manipulated in the form of a matrix of second

derivatives.

II.3.2.4. The simulated annealing method.

The methods that we have just described have the particularity of making the

function F decrease at each step; these methods cannot thus escape the local minimum

close to the starting structure and consequently have a radius of convergence always

restricted. The “anneal” simulated annealing method, developed by Kirkpatrick [28],

allows the F function to increase momentarily in order to cross energy barriers and fall

back to a deeper minimum.

The crossing of these barriers makes it possible to go beyond the local minima in the

vicinity of the initial structure to explore more extensively the accessible


52

conformational space, in order to discover deeper minima and more distant from the

initial structure than the minima. local.

In conclusion, a minimizer is a mathematical black box which remains an important

tool in the search for a minimum of energy. In general, several minimizers are used.

We go to a second or a third if it does not converge quickly enough. Everything also

depends on the number of variables or the number of variations in the connection

angle introduced.

II.4 Molecular Modeling

II.4.1 Elements of Computational Chemistry

II.4.1.1 Drawing chemical structures

Various software packages, such as ChemDraw, Hyperchem, MarvinSketch,

ChemWindow, and IsisDraw, are available which can be used to construct diagrams

quickly and to a professional standard.

For example, the diagrams in this thesis have all been prepared using the Hyperchem

package. Some drawing packages are linked to other items of software which allow

quick calculations of various molecular properties. For example, the following

properties for ‘adrenaline’ were obtained using Hyperchem and, gaussian: the

structure's correct IUPAC chemical name, molecular formula, molecular weight, exact

mass, and theoretical elemental analysis. It was also possible to get calculated

predictions of the compound’s 1H and 13C nuclear magnetic resonance (NMR)

chemical shifts, Energies of the HOMO and LUMO, melting point, freezing point, log

P value, molar refractivity, and heat of formation ect…( Fig.14)


53

Figure .14.Drawing chemical structures.

A vast number of organic molecules are known. In order to distinguish one from

another, chemists give them names. There are two kinds of names: trivial and

systematic. Trivial names are often brand names (such as adrenaline). Trivial names

don’t give any real clue as to the structure of a molecule, unless you are the recipient

of divine inspiration. TheIUPAC systematic name for adrenaline is 1-(3,4

dihydroxyphenyl)-2-methylaminoethanol. Any professional scientist with training in

chemistry would be able to translate the systematic name into (Figure 14).

There are various conventions that we can follow when drawing chemical

structures, but the conventions are well understood amongst professionals. First of all,

we haven’t shown the hydrogen atoms attached to the benzene ring (or indeed the

carbon atoms within), and we have taken for granted that we understand that the

normal valence of carbon is four. Everyone understands that hydrogens are present,

and so we needn’t clutter up an already complicated drawing.

In the Figure 14 the benzene ring as alternate single and double bonds, yet we

understand that the CــــC bonds in benzene are all the same. This may not be the case

in the molecule shown; some of the bonds may well have more double bond character

than others and so have different lengths, but once again it is a well-understood

convention. Sometimes a benzene ring is given its own symbol Ph or Φ. Then again,

we drawn the NHMe and the OH groups as ‘composites’ rather than showing the


54

individual OـــH and NـــH bonds, and so on. We followed to some extent the

convention that all atoms are carbon atoms unless otherwise stated.

Much of this is personal preference, but the important point is that no one with a

professional qualification in chemistry would mistake this drawing for another

molecule. Equally, given the systematic name, no one could possibly write down an

incorrect molecule.

The aim of this chapter is to show how chemistry is a well-structured

science, with a vast literature. There are a number of important databases that

contain information about syntheses, crystal structures, physical properties and

so on. Many databases use a molecular structure drawing as the key to their

information, rather than the systematic name. Structure drawing is therefore a

key chemical skill.

II.4.1.2 Three-dimensional structures

Molecular modelling software allows the chemist to construct a three-

dimensional (3D) molecular structure on the computer. There are several software

packages available, such as Chem3D, Alchemy, Sybyl, Hyperchem, Discovery Studio

Pro, Spartan, and CAChe. The 3D model can be made by constructing the molecule

atom by atom, and bond by bond. It is also possible to automatically convert a two-

dimensional (2D) drawing into a 3D structure, and most molecular modelling packages

have this facility. For example, the 2D structure of adrenaline in Fig. 15 was drawn in

ChemDraw, then copied and pasted into Chem3D, resulting in the automatic

construction of the 3D model shown. The 3D structures of a large number of small

molecules can also be accessed from the Cambridge Structural Database (CSD) and

downloaded. This database contains over 200,000 molecules which have been

crystallized and their structure determined by X-ray crystallography.


55

Figure .15.Conversion of a 2D drawing to a 3D model.

II.4.1.3 Molecular Structure Databases

Molecular geometries can be determined for gas-phase molecules by microwave

spectroscopy and by electron diffraction. In the solid state, the field of structure

determination is dominated by X-ray and neutron diffraction and many crystal

structures are known. Also, nuclear magnetic resonance (NMR) also has a role to play,

especially for proteins.

Over the years, a vast number of molecular structures have been determined and there

are several well-known structural databases. One is the Cambridge Structural Database

(CSD)(http://ccdc.cam.ac.uk= ), which is supported by the Cambridge

Crystallographic Data Centre (CCDC). The CCDC was established in 1965 to

undertake the compilation of acomputerized data base containing comprehensive data

for organic and metal–organic compounds studied by X-ray and neutron diffraction. At

the time of writing, there are some 272 000 structures in the database[29].

http://ccdc.cam.ac.uk=/


56

Figure.16.The Cambridge Crystallographic Data Centre (CCDC).

For each entry in the CSD, three types of information are stored. First, the

bibliographic information: who reported the crystal structure, where they reported it

and so on. Next comes the connectivity data; this is a list showing which atom is

bonded to which in the molecule. Finally, the molecular geometry and the crystal

structure. The molecular geometry consists of cartesian coordinates. The database can

be easily reached through the Internet, but individual records can only be accessed on

a fee-paying basis.

The Protein Data Bank (PDB) is the single worldwide repository for the processing

and distribution of three-dimensional biological macromolecular structural data. It is

operated by the Research Collaboratory for Structural Bioinformatics[30].

At the time of writing, there were19749 structures in the data bank, relating to

proteins, nucleic acids, protein–nucleic acid complexes and viruses. The databank is

available free of charge to anyone who can navigate to their site


57

http://www.rcsb.org/.Information can be retrieved from the main website. A four-

character alphanumeric identifier, such as 1PCN, represents each structure. The PDB

database can be searched using a number of techniques, all of which are described in

detail at the homepage.

Figure.17.TheProteinDataBank(PDB).

II.4.1.4 Energy minimization

Whichever software program is used to create a 3D structure, a process called

energy minimization should be carried out once the structure is built. This is because

the construction process may have resulted in unfavorable bond lengths, bond angles,

or torsion angles. Unfavorable non-bonded interactions may also be present (i.e. atoms

from different parts of the molecule occupying the same region of space).[14]The

energy minimization process is usually carried out by a molecular mechanics program

which calculates the energy of the starting molecule, then varies the bond lengths,

bond angles, and torsion angles to create a new structure. The energy of the new

structure is calculated to see whether it is energetically more stable or not. If the

http://www.rcsb.org/


58

starting structure is inherently unstable, a slight alteration in bond angle or bond length

will have a large effect on the overall energy of the molecule resulting in a large

energy difference (ΔE; Fig. 18 ).

The program will recognize this and carry out more changes, recognizing those which

lead to stabilization and those which do not. Eventually, a structure will be found

where structural variations result in only slight changes in energy—an energy

minimum. The program will interpret this as the most stable structure and will stop at

that stage.

Figure .18. Diagram ofenergy minimization.

II.3.1.5 Molecular dimensions

Once a structure has been energy minimized, it can be rotated in various axes to

study its shape from different angles. It is also possible to display the structure in

different formats (i.e. cylindrical bonds, wire frame, ball and stick, space-filling; Fig.

19).


59

Figure .19. Different methods of visualizing molecules.

Once a 3D model of a structure has been constructed, it is a straightforward procedure

to measure all of its bond lengths, bond angles, and torsion (or dihedral) angles. These

values can be read from tables or by highlighting the relevant atoms and bonds on the

structure itself. The various bond lengths, bond angles, and torsion angles measured

for adrenaline are illustrated in Fig. 20. It is also a straightforward process to measure

the separation between any two atoms in a molecule.

Figure .20. Molecular dimensions for adrenaline (3D Hyperchem).

II.4.2 Molecular properties

Various properties of the 3D structure can be calculated once it has been built and

minimized. For example, the steric energy is automatically measured as part of the

minimization process and takes into account the various strain energies within the

molecule, such as bond stretching or bond compression, deformed bond angles,

deformed torsion angles, non-bonded interactions arising from atoms which are too


60

close to each other in space, and unfavorable dipole–dipole interactions. The steric

energy is useful when comparing different conformations of the same structure, but the

steric energies of different molecules should not be compared.

II.4.2.1 Geometric Parameters

II.4.2.1.1 Bond length

Is the distance between atomic centers involved in a chemical bond. The notion of

bond length is defined differently in various experimental methods of determination of

molecular geometry; this leads to small (usually 0.01 – 0.02 Å) differences in bond

lengths obtained by different techniques. For example, in gas-phase electron-

diffraction experiments, the bond length is the interatomic distance averaged over all

occupied vibrational states at a given temperature. In an X-ray crystal structural

method, the bond length is associated with the distance between the centroids of

electron densities around the nuclei. In gas-phase microwave spectroscopy, the bond

length is an effective inter atomic distance derived from measurements on a number of

isotopic molecules, etc[31].

II.4.2.1.2 Valence angle

The shape of a molecule is determined by its Valence angles, the angles made by

the lines joining the nuclei of the atoms in the molecule. The bond angles of a

molecule, together with the bond lengths, define the shape and size of the molecule. In

Figure 21,we can see that there are six bond angles Cl-C ClinCCl4, all of which have

the same value.That bond angle,109.5°, is characteristic of a tetrahedron. In addition,

all four C-Cl bonds are of the same length (1.78 Å). Thus, the shape and size of CCl4

are completely described by stating that the molecule is tetrahedral with C-Cl bonds of

length 1.78 Å[31].


61

Figure .21. Tetrahedral shape of CCl4.

II.4.2.1.3 Dihedral Angle (Torsion angle)

In a chain of atoms A-B-C-D, the dihedral angle between the plane containing

the atoms A,B,C and that containing B,C,D. In a Newman projection (Fig.22) the

torsion angle is the angle (having an absolute value between 0° and 180°) between

bonds to two specified (fiducial) groups, one from the atom nearer (proximal) to the

observer and the other from the further (distal) atom. The torsion angle between

groups A and D is then considered to be positive if the bond A-B is rotated in a

clockwise direction through less than 180° in order that it may eclipse the bond C-D: a

negative torsion angle requires rotation in the opposite sense[31].

Figure .22. Newman projection.


62

Other properties for the structure can be calculated, such as Atomic charges, the

predicted electrostatic potential, Fukui function, and infrared vibrational frequencies.

Some of these are described in the following sections:

II.4.3 Electronic Parameters

II.4.3.1 Atomic charges

A main problem in comparing different point-charge models is the missing of

clearcriterion for the quality of the charges. This is probably the reason why so many

charge models have been suggested such as:

CHelpG: Produce charges fit to the electrostatic potential at points selected

according to the CHelpG scheme.

NBO: Requests a full Natural Bond Orbital analysis.

Mulliken: retains only the density terms involving pairs of basic functions on

different centers. HLY: Hu, Lu, and Yang charge fitting method.

Furthermore, different applications put different demands on the charges. For example,

in molecular dynamics, molecules move, so the charges must be able to describe the

electrostatics properly in all accessible points in the phase space, and they should also

be in variant to changes in the internal coordinates of the molecule [32-34].

II.4.3.2 Molecular electrostatic potentials

The molecular electrostatic potential (MESP) surface which is a plot of

electrostatic potential mapped onto the iso-electron density surface [35], the

importance of the MESP lies in the fact that it simultaneously displays the molecular

size and shape as well as positive, negative and neutral electrostatic potential regions

in terms of the electrostatic surface, which explain the investigation of the molecular

structure with its physiochemical property relationships also be useful in identifying

how compounds with different structures might line up to interact with corresponding

electron rich and electron-poor areas in a binding site [36, 37].

An example of how electrostatic potentials have been used in drug design can be seen

in the design of the cromakalim analogue (II; Fig.23), where the cyano aromatic ring


63

was replaced by a pyridine ring. This was part of a study looking into analogues of

cromakalim which would have similar antihypertensive properties, but which might

have different pharmacokinetics. In order to retain activity, it was important that any

replacement heteroaromatic ring was as similar in character to the original aromatic

ring as possible. Consequently, the MEPs of various bicyclic systems were calculated

and compared with the parent bicyclic system (III; Fig. 24).

Figure .23. Ring variation on cromakalim.

In order to simplify the analysis, the study was carried out in 2D within the plane of

the bicyclic systems, and maps were created showing areas of negative potential

(Fig.25). The contours represent the various levels of the MEP and can be taken to

indicate possible hydrogen bonding regions around each molecule. The analysis

demonstrated that the bicyclic system (IV) had similar electrostatic properties to (III),

resulting in the choice of structure (II) as an analogue.

Figure .24. Bicyclic models in cromakalim study.


64

Figure .25. Molecular electrostatic potentials (MEPs) of bicyclic models (III) and (IV).

II.4.3.3 Molecular orbitals

The most important orbitals in a molecule are the frontier molecular orbitals,

called highest occupied molecular orbital (HOMO) which explains the ionization

energy and lowest unoccupied molecular orbital (LUMO) which explains the electron

affinity in a molecule.

These orbitals determine the way the molecule interacts with other species including

the interactions between a ligand and its receptor [38]. Whereas, electrophilic and

nucleophilic attack will most likely occur at atoms where the coefficients of the

corresponding atomic orbitals in HOMO and LUMO, respectively, are large [39]. For

example, ethene can be shown to have 12 molecular orbitals. The highest occupied

molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) are

shown in Fig. 26.


65

Figure .26.HOMO and LUMO molecular orbitals for ethene.

II.4.3.4 Fukui functions

The Fukui functions are some of these indexes. They represent a local reactivity of the

studied compounds. They are given by [40]:

where ρ(r) is the electronic density, N is the number of electrons and v(r) is a constant

external potential. The reactivity of an atom k in a molecule can be described, by a

condensed Fukui function fk. As ρ(r) is a discontinuous function of N, Yang and Parr

[41, 42]have proposed approximated atomic indices fk by applying the finite

difference approximation to the condensed electronic population on any atom. Three

indices were defined to describe nucleophilic, electrophilic, and radical attack. These

can be written respectively as:

f+(A) = [qN+1(A) - qN(A)]

f-(A) = [qN(A) - qN-1(A)]

f0(A)= [qN+1(A) - qN-1(A)] /2

qk(N): electronic population of k atom in neutral molecule.

qk(N+1): electronic population of k atom in anionic molecule.

qk(N-1): electronic population of k atom in cationic molecule.


66

In this approximation, the indices depend widely on the used population analysis

approach. The electronic population around an atom k can be evaluated using

Mulliken[43], Hirshfield[44], or natural orbital [45]approximations. These indices are

some of the widely used local density functional descriptors to model chemical

reactivity and site selectivity [41, 46]. The atom with the highest Fukui indices is the

most reactive compared to the other atoms in the molecule.

II.5. Studies of vibrational properties

II.5.1. Introduction

Molecular modeling is essential for the interpretation and understanding of

experimental observations. In the molecular domain, all properties are related to the

nature and shape of the molecule. Being able to optimize the geometry of a molecule

by a theoretical model (quantum chemistry methods) is to approach its molecular

conformation observed experimentally [47, 48].

The object of this part of our work, which was oriented towards the search for a

molecular conformation of the calculated indole similar to that given by the

experiment, is to make a contribution to the understanding of the structural, vibrational

properties and electronics to exhibit the most stable molecular conformation and also

the arrangement and atomic environment of carbazole.

II.5.2. Theoretical aspects of infrared vibration spectroscopy

The movements of atoms in a molecule can be classified into three categories:

translations, rotations and vibrations. Nowadays, studies by vibrational spectroscopy

are, more and more, supplemented by calculations of quantum chemistry [49, 50].

In this case, the contribution of molecular modeling is very important to

understand reaction mechanisms or have access to chemical properties. Indeed, the

methods of quantum chemistry make it possible to model a very large number of

quantities characteristic of atomic or molecular systems or to simulate a wide variety


67

of reaction processes. Also, the combination of these two techniques is proving to be

very powerful in explaining mechanistic details at the molecular level [51].

The main purpose of vibrational spectroscopy is the determination of the

vibrational frequencies of a molecule. These frequencies depend on the mass of the

atoms involved in the normal mode of vibration as well as the strength of the

interatomic bonds. Therefore, precise information about the structure of a molecule

can be deduced from a vibrational spectrum [52, 53].

Molecular vibrations take place at different frequencies (υ vib) which depend on the

nature of the bonds as well as their environment. These frequencies correspond to the

infrared range of electromagnetic radiation [54].

II.5.3. Principle of infrared spectroscopy

II.5.3.1. Electromagnetic radiation

Electromagnetic radiation consists of a beam of particles: photons, the

movement of which is described by means of equations of wave mechanics.

The latter has shown that light participates in both wave and particle properties. When

it collides with matter, it can be thought of as discrete packets of energy (quanta)

called photons [55]. The interactions between matter and a radiation to which it is

subjected are numerous. The most interesting and the most studied involve absorption

phenomena (molecules can absorb the energy quanta of certain radiations). In this

case, their fundamental energy states are altered by transition, or transition, to excited

states of higher energy. Each state of matter is quantified and excitation takes place by

absorption of a discrete amount of energy ΔE [56]. To be absorbed, the radiation must

be at the same frequency corresponding to this quantity of energy either: the recording

of the energy absorbed or transmitted according to the frequency or the wavelength,

constitutes the absorption spectrum of the compound in the spectral region of interest

[55, 57]. The frequency ν of an electromagnetic wave is related to its wavelength by

the relation [56, 57]:

ν = C/λ


68

where C represents the speed of light propagation in vacuum and λ the wavelength.

The energy E of the radiation or photon is directly proportional to the frequency of the

electromagnetic wave or the field wave.

It is written: E = hν

Normally, a spectrum should appear as a succession of fine lines. In fact, there is for a

molecule a succession of energetic states close to each other.

Hence the obtaining of more or less broad peaks or absorption bands even with

equipment with high resolving power. The purpose of studying infrared spectra is most

often to characterize compounds or to verify their compliance with a reference sample.

Absorption in the infrared spectral region corresponds to transitions in the energies of

molecular vibrations.

Figure .27.The various spectral domains of electromagnetic radiation.

Quantum mechanics shows us that only vibrational transitions in which there

is variation in the dipole moment of the molecule, can cause an absorption peak in IR

to appear. Many lines are due to the different modes of vibration of the bonds.


69

II.5.3.2. Infrared

Understood between the visible and the microwaves, the infrared is subdivided into

three parts: the near infrared located between 13,000 and 4,000 cm-1, the medium

infrared located between 4,000 and 200 cm-1, and the far infrared located between 200

and 10 cm-1[56]. Infrared spectroscopy is based on the study of the reactions between

matter and electromagnetic radiation. The latter can be defined by its frequency ν

expressed in hertz (Hz) or by its wave number, V is usually expressed in cm-1 or in

kayser. It represents the number of waves contained in the interval of 1 cm [57]. In our

investigation, we limited ourselves to the mid-infrared range (in the region of 4000 to

600 cm-1), this corresponds to energies ranging from 4 to 40 kJ.mol-1.

An infrared spectrum is complex. This complexity can be increased by the appearance

of additional bands due to the harmonics of the fundamental absorption frequencies νf

of lower intensity. They can be observed at 2νf, 3νf, or at frequency combination bands,

which correspond to the sum of two fundamental frequencies.

Fundamental bands may not appear in the following cases: the vibration does not

cause a variation in the dipole moment of the molecule, which often occurs in the case

of compounds with a very symmetrical structure; vibrations occur at very close

frequencies or at the same frequency; absorption is too low for the band to be visible.

II.5.3.3. Infrared absorption

Any molecule excited by electromagnetic radiation absorbs an amount of energy

and goes into an excited state. According to the mechanics, the absorption process is

quantified and only particular frequencies can be absorbed by the molecule.

Vibrational excitation can be considered simply by considering two atoms A1 and A2

united by a bond as being two masses connected by a spring (figure 28) which

stretches and relaxes at a certain frequency υ.


70

Figure.28.Movement of atoms during the phenomenon of vibration

In this representation, the frequency of vibrations between the two atoms depends

both on the bond strength between them and on their atomic masses. It can be shown

that it is governed by the following Hooke's law which describes the motions of a

spring.

Ʋ՟ : vibrational frequency in wavenumbers (cm-1)

K: constant

f :force constant, indicating the stiffness of the spring (of the connection)

m1 and m2: values of the masses attached to the spring (masses of the united atoms)

This equation could lead us to think that each individual bond in a molecule

gives rise to a specific absorption band in the infrared spectrum. Which is not the case.

The energies of radiation absorption correspond to movements of elongation, or

deformation of most of the covalent bonds of molecules. In the process of absorption,

the frequencies of infrared radiation coincide with those of the natural vibrational

frequencies of molecules [58]. Not all bonds are capable of absorbing infrared energy

even if their frequency coincides perfectly with that of movement of the bond. Only

groups whose dipole moment varies during vibration are bonds which admit a dipole

movement are able to absorb infrared radiation [57].


71

A bond must have an electric dipole which must oscillate at the same frequency as that

of the excitation radiation, so that energy can be transferred.

In the case of infrared spectroscopy the molecule absorbs infrared light if and only if

the dipole moment of the molecule changes during vibration [58].

The advantage with infrared is that it avoids unwanted photochemical or

luminescence reactions. Electromagnetic radiation in the infrared (located in the region

of 10,000 to 20 cm-1), causing a change in the vibrational level nevertheless leaves the

molecule at its fundamental electronic level [58, 59]. To understand the phenomenon

of atomic vibration, it must be visualized as the interaction of two particles linked

together by a spring [58]. When this spring is pulled and released, the two particles

enter into a stretching motion. The amplitude of their movement is not only a function

of the mass of the particles but also of the force of the spring (see figure 28). During

the vibration phenomenon, other movements can be described: stretching, folding,

twisting, swaying, shearing. In biological membranes, these movements become all the

more complex as the chemical structures of the membrane constituents are more

elaborate.

II.5.4. Theoretical aspects

II.5.4.1. Internal vibration modes

In a polyatomic molecule comprising N atoms, the atoms can perform oscillating

movements around their position of equilibrium. The rather weak oscillations of the

molecule cause small displacements of the atoms formed. In this case the harmonic

oscillation model is adapted to describe the movements of the atoms of the molecule.

In a molecule made up of N atoms, each atom is assigned three coordinates to define

its position. There by; an atom has three independent degrees of freedom of movement

in the three directions of space. This results in 3N degrees of freedom to describe the

motion of an entire molecule (Table 2). Of these 3N degrees of freedom, three are the

coordinates that define the position of the center of mass of the molecule (translational

degrees of freedom), and three other coordinates necessary to specify the molecular

orientation around the center of gravity. These represent the degrees of freedom of


72

rotation. Note that a linear molecule has only two degrees, since rotation around the

molecular axis does not cause the nuclei to shift. In conclusion, a molecule with N

atoms has 3N-6 (3N 5 for a linear molecule) degrees of freedom, responsible for all

changes in the shape of the molecule without moving or rotating the center of gravity

of the molecule [60, 61].

Table.2. Molecular degrees of freedom

II.5.4.2. Classification of vibration modes

Vibrations can be classified into two categories: in-plane and out-of-plane [62, 63].

In the plane: we distinguish the elongation (ν) and the angular deformation (δ)

which can be either symmetrical (δs and νs), or antisymmetric (δas and νas).

Symmetric and antisymmetric angular deformations correspond to shearing or

rotational movements of three atoms forming the angle θ (Figure 29).

Out of plane: these are angular deformations outside a molecular plane that can

induce collective movement of the molecule corresponding to torsion (τ) or

sway (γ) movements.


73

Figure.29. Movements associated with normal modes of vibration of a molecule containing

3 atoms.

II.5.4.3. Factors that influence vibration frequencies

Analysis of an IR spectrum most often carried out in solution or in the solid state

shows that a number of factors influence the vibration frequencies as well as their

intensities.

The intensity of the bands depends on the nature and polarity of the bond. The

vibration frequency, for its part, depends on the atoms constituting the bond, the

multiplicity of the bond and its environment. Generally speaking, bonds between light

atoms vibrate at a higher frequency than bonds between heavier atoms. The higher the

constant K (bond strength constant), the higher the vibration frequency.

The link environment can have an electronic or mechanical impact on the vibration of

this frequency. We will cite a few examples:

Electrical effects: The effects of electronegativity and conjugation can cause

significant shifts in vibration frequencies.

Hydrogen bond: For some functions such as –OH, -NH, -SH, there is the

possibility of hydrogen bond formation when they perceive in their molecular

neighborhood the presence of an electronegative atom. This results in an

elongation of the X-H bond and therefore a weakening of its force constant.


74

Steric Effects: The frequency of a vibration depends on the steric constraints

associated with this vibrator. The vibration ν (C = O) occurs at a frequency so

much higher as the valence angle COC is small.

Mechanical coupling: When in a molecule vibration of the same nature are found in

the vicinity of one another, having an atom in common for example, they do not

vibrate independently of one another. As a result, the frequencies are affected

compared to those generally observed. This phenomenon is due to the existence of a

mechanical coupling.

II.5.5. Application

Infrared spectrometry [64]thus provides detailed information on:

The chemical structure of macromolecules and the composition of the polymer:

identification of the basic unit, branches, analysis of chain ends, determination

of the nature and concentration of additives, structural defects, impurities.

Intra- or intermolecular interactions, chain conformation, polymer crystallinity,

orientation of macromolecules.

Infrared spectrometry is also an effective tool for studying the structural

modifications of polymers resulting from chemical treatments, degradations or

aging of various origins.

II.6 Conclusion

Chemoinformatics and molecular modeling methods are among the recent

techniques now routinely applied in the research phases and in particular for the

management and analysis of chemical libraries. They certainly arouse as much hope as

they do criticism, but nevertheless constitute a very active line of research over the

past two decades. In this chapter, we have presented most of the methods used in

chemoinformatics for the analysis of chemical libraries. All of these concepts were

used during this thesis in order, on the one hand, to make our own contribution

through the creation of new methods allowing to characterize the chemical libraries

and on the other hand, to develop a software allowing to implement some of these

methods.

Part II:

Computer-Assisted Drug Design

Computer-Assisted drug design

76

What is computer-assisted drug design (CADD), and why is it important? There is no

clear definition, although a consensus view has emerged. Simply, CADD is the

coalescence of information on chemical structures, their properties, and their

interactions with biological macromolecules. Further, these data are transformed into

knowledge intended to aid in making better decisions for drug discovery and

development.

These are representative of some questions facing the current drug-design community

and significant applications of CADD.

Of the compounds included in our dataset, how many could be predicted to lack drug

likeproperties based on similarity in properties to known orally active drugs?

How many would be predicted to be inactive based on the known structure–activity

dataavailable on topoisomerase II inhibitors?

Given the structure–activity relationships (SARs) available on the inhibitors, what

could one determine regarding the active site of topoisomerase II?

What novel classes of compounds could be suggested based on the SAR of inhibitors,

or based on the new crystal structure of the complex?

Do the most potent compounds share a set of properties that can be identified and used

to optimize a novel lead structure?

Can a predictive equation relating properties and affinity for the isolated enzyme be

established?

II.1 Predictive Quantitative Structure–Activity Relationship Modeling

At the beginning of its over 40 years of existence as an independent area of

research,quantitative structure–activity relationship (QSAR) modeling was viewed

strictly as analytical physical chemical approach applicable only to small congeneric

series of molecules. The technique was first introduced by Hanschet al.[65]on the basis

of implications from linear free energy relationships in general and the Hammett

equation in particular. [66]

It is based upon the assumption that differences in physicochemical properties account

for the differences in biological activities of compounds. According to this approach,


77

the changes in physicochemical properties that affect the biological activities of a set

of congeners are of three major types: electronic, steric, and hydrophobic.[67]

The quantitative relationships between biological activity (or chemical property)

and the structural parameters could be conventionally obtained using multiple linear

regression (MLR) analysis. The fundamentals and applications of this method in

chemistry and biology have been summarized by Hansch and Leo.[67]This traditional

QSAR approach has generated many useful and, in some cases, predictive QSAR

equations and led to several documented drug discoveries.[68-70]

Many years of active research in QSAR have dramatically changed the breadth

and the depth of this field in all its components including the diversity of target

properties, descriptor types, data modeling approaches, and applications. The most

important changes in QSAR deal with a substantial increase in the size of data sets

available for the analysis and an increasing use of QSAR models as virtual screening

tools to discover biologically active molecules in chemical databases and virtual

chemical libraries.

II.2. Principle of QSPR / QSAR methods

The principle of the QSPR / QSAR methods is to establish a mathematical

relationship linking in a quantitative manner molecular properties, called descriptors,

with a macroscopic observable (biological activity, toxicity, physicochemical property,

etc.), for a series of similar chemical compounds. using data analysis methods. The

general form of such a model is as follows:

Property / Activity = f (D1, D2,…Dn ,,…)

D1, D2, Dn are descriptors of molecular structures.

The aim of such a method is to analyze structural data in order to detect the

determining factors for the property / activity being measured. To do this, different

types of statistical tools can be used:

- Simple and multiple linear regressions [71],

- Partial least squares regressions (PLS) [72],


78

- Decision trees [73],

- Neural networks [74-76],

- Genetic algorithms[77],

- Machine Vectors [76],

Once this relationship is established and validated, it can then be employed for the

prediction of the property / activity of new molecules, for which experimental values

are not available. Such models can also be used to better understand mechanisms and

modes of action.

II.3 Key Quantitative Structure–Activity Relationship Concepts

An inexperienced user or sometimes even an avid practitioner of QSAR could be

easily confused by the diversity of methodologies and naming conventions used in

QSAR studies. 2D or three-dimensional (3D) QSAR, variable selection or artificial

neural network (ANN) methods, Comparative molecular field analysis (CoMFA), or

binary QSAR present examples of various terms that may appear to describe totally

independent approaches, which cannot be generalized or even compared to each other.

In fact, any QSAR method can be generally defined as an application of mathematical

and statistical methods to the problem of finding empirical relationships (QSAR

models) of the form 𝑃𝑖 = 𝑘(𝐷1, 𝐷2, … 𝐷𝑛),where 𝑃𝑖 are biological activities (or other

properties of interest) of molecules, 𝐷1, 𝐷2, … 𝐷𝑛 are calculated (or, sometimes,

experimentally measured) structural properties (molecular descriptors) of compounds,

and 𝑘 is some empirically established mathematical transformation that should be

applied to descriptors to calculate the property values for all molecules.

The relationship between values of descriptors D and target properties P can be

linear (e.g., MLR as in the Hansch QSAR approach), where target property can be

predicted directly from the descriptor values, or nonlinear (such as ANNs or

classification QSAR methods) where descriptor values are used in characterizing

chemical similarity between molecules, which in turn is used to predict compound

activity.


79

The differences in various QSAR methodologies can be understood in terms of the

types of target property values, descriptors, and optimization algorithms used to relate

descriptors to the target properties and generate statistically significant models.

A- Target properties(regarded as dependent variables in statistical data modeling

sense) can generally be of three types:

(1) continuous, i.e., real values covering certain range, e.g., IC50, MIC values, or

binding constants.

(2) categorical related, classes of target properties covering certain range of values,

e.g., active and inactive compounds, frequently encoded numerically for the

purpose of the subsequent analysis as 1 (for active) or 0 (for inactive), or adjacent

classes of metabolic stability such as unstable, moderately stable, and stable.

(3) categorical unrelated, classes of target properties that do not relate to each

other in any continuum, e.g., compounds that belong to different pharmacological

classes, or compounds that are classified as drugs versus non drugs.

B- Chemical descriptors(or independent variables in terms of statistical data

modeling) can be typically classified into two types:

(1) continuous (i.e., range of real values, e.g., as simple as molecular weight or

many molecular connectivity indices); or

(2) categorical related (i.e., classes corresponding to adjacent ranges of real values,

e.g., counts of functional groups or binary descriptors indicating presence or

absence of a chemical functional group or an atom in a molecule).

Descriptors can be generated from various representations of molecules, e.g., 2D

chemical graphs or 3D molecular geometries, giving rise to the terms of 2D or 3D

QSAR, respectively.

C- Correlation methods(which can be used either with or without variable selection)

can be classified into two major categories:

(1) linear (e.g., linear regression (LR), or principal component regression (PCR),

or partial least squares (PLS)) or

(2) nonlinear (e.g., k nearest neighbor (kNN), recursive partitioning (RP), ANNs,

or support vector machines (SVMs). [78]


80

Figure .30.Molecular descriptors and fingerprints are examples of strategies that allow researchers to

extract important information about compounds that can be used in additional computer-aided drug

design techniques, such as virtual screening, quantitative-structure-activity relationship (QSAR).

In some cases, the types of biological data, the choice of descriptors, and the class

of optimization methods are closely related and mutually inclusive. For instance, MLR

can only be applied when a relatively small number of molecular descriptors are used

(at least five to six times less than the total number of compounds) and the target

property is characterized by a continuous range of values. The use of multiple

descriptors makes it impossible to use MLR due to a high chance of spurious

correlation [79]and requires the use of PLS or nonlinear optimization techniques.

II.3.1 Molecular Descriptors

Theoretical descriptors derived from physical and physico-chemical theories

show some natural overlap with experimental measurements. Several quantum

chemical descriptors, surface areas, and volume descriptors are examples of such

descriptors also having an experimental counterpart.

With respect to experimental measurements, the greatest recognized advantages of the

theoretical descriptors are usually (but not always) in terms of cost, time, and

availability.

Each molecular descriptor takes into account a small part of the whole chemical

information contained in to the real molecule and,asaconsequence,the number of

descriptors


81

iscontinuouslyincreasingwiththeincreasingrequestofdeeperinvestigationsonchemicalan

d biological systems. Different descriptors have independent methods or perspectives

to view a

molecule,takingintoaccountthevariousfeaturesofchemicalstructure.Moleculardescriptor

s have now become some of the most important variables used in molecular modeling,

and, consequently, managed by statistics, chemometrics, and chemoinformatics[80].

There are many different descriptors and many different kinds of classification

based on the effect (steric, hydrophobic, electronic, etc.) or based on its dimension that

called Descriptors Block, whereas we find four types of molecular descriptors 0D, 1D,

2D and 3D [80].

Molecular descriptors 0D, 1D: contains numbers of atoms, functional groups,

molecules properties, and charges (eg.MolecularWeight,

MolecularRefractivity,etc.).

Moleculardescriptors2D:containssimpledescriptorsandotherderivativesofalgorit

hms applied to a topological representation (eg. Number of Cl atoms, presence

of hydroxyl group,etc.).

Molecular descriptors 3D: contains descriptors derived from a geometric

representation also called geometric descriptors (eg. Molecular volume, Surface

area,etc.).

II.3.3 Quantitative Structure–Activity Relationship Modeling Approaches

II.3.3.1 General Classification

Many different approaches to QSAR have been developed since Hansch’s seminal

work. As briefly discussed above, the major differences between these methods can be

analyzed from two viewpoints:

(1) the types of structural parameters that are used to characterize molecular identities

starting from different representation of molecules, from simple chemical formulas to

3D conformations, and

(2) the mathematical procedure that is employed to obtain the quantitative relationship

between these structural parameters and biological activity.


82

Based on the origin of molecular descriptors used in calculations, QSAR methods can

be divided into three groups.

One group is based on a relatively small number (usually many times smaller than the

number of compounds in a data set) of physicochemical properties and parameters

describing hydrophobic, steric, electrostatic, etc. effects. Usually, these descriptors are

used as independent variables in multiple regression approaches. In the literature, these

methods are typically referred to as Hansch analysis.

A more recent group of methods is based on quantitative characteristics of molecular

graphs (molecular topological descriptors). Since molecular graphs or structural

formulas are ‘two dimensional,’ these methods are described as 2D QSAR. Most of

the 2D QSAR methods are based on graph theoretical indices. Although these

structural indices represent different aspects of molecular structures, and, what is

important for QSAR, different structures provide numerically different values of

indices, their physicochemical meaning is frequently unclear.

The third group of methods is based on descriptors derived from spatial (3D)

representation of molecular structures. Correspondingly, these methods are referred to

as 3D QSAR; they have become increasingly popular with the development of fast and

accurate computational methods for generating 3D conformations and alignments of

chemical structures. Perhaps the most popular example of 3D QSAR is CoMFA,

developed by Cramer et al.,[81]which has combined the power of molecular graphics

and PLS technique and has found wide applications in medicinal chemistry and

toxicity analysis.[82]

II.3.3.2. General Methodology of a QSPR / QSAR study

The general methodology of a QSAR / QSPR study is as follows:

a- Build up a database from reliable experimental measurements of the property or

activity of each compound.

b- Select the descriptors in relation to the property or activity studied.

c- Divide this database, randomly, into a training set which generally contains 2/3 of

the database and a test series (test set) consisting of the remaining 1/3.


83

d- Establish mathematical models using the learning series.

e- Characterize the models developed by their internal validation indices and check

their robustness by a randomization test (Randomization) of the dependent variable Y

(response).

f-Validate the models developed using the test series and calculate their statistical

parameters for external validation.

II.3.3.3 Transforming the Bioactivities

The main advantage of transforming data is to guarantee linearity, to achieve

normality, or to stabilize the variance. Several simple nonlinear regression

relationships can be made linear through the appropriate transformations. The simplest

and most common method of transforming [83]bioactivity data is to take the log or

negative log of the bioactivities to reduce the range of the data.

The method of creating Training and Test Sets that are representative of the population

is to choose molecules that represent all the molecules of interest based on molecular

structure and bioactivity. The number of molecules in the Test Set is determined (20%

of the total molecules in this study) and then all the molecules are placed in a table and

ordered based on bioactivities. With the table prepared, the extraction of the molecules

for the Test Set can begin through an iterative process starting at the top of the table.

II.3.3.4 Determination of the Best Set of Descriptors Approaches

Both 2D and 3D QSAR studies have focused on the development of optimal

QSAR models through variable selection. This implies that only a subset of available

descriptors of chemical structures, which are the most meaningful and statistically

significant in terms of correlation with biological activity, is selected.

The Stepwise Method Search selects a model by adding or removing individual

descriptors, a step at a time, based on their statistical significance. The end result of

this process is a single regression model, which makes it nice and simple.

The p-value for each term tests the null hypothesis that the coefficient is equal to zero

(no effect). A low p-value (< 0.05) indicates that you can reject the null hypothesis. In

other words, a descriptor that has a low p-value is likely to be a meaningful addition to


84

your model because changes in the descriptor's value are related to changes in the

response variable (bioactivity). Conversely, a larger (insignificant) p-value suggests

that changes in the descriptor are not associated with changes in the response.

II.4. Building Predictive Quantitative Structure–Activity Relationship Models:

The Approaches to Model Validation

II.4.1. Data analysis methods

To develop a QSPR / QSAR model we need a data analysis method, this method

allows to quantify the relationship between the property / Activity and the Structure

(descriptors).

There are several methods to build a model and analyze its statistical data, some are

linear such as multiple linear regression (MLR), partial least squares (PLS) regression,

others are nonlinear such as trees of decisions, neural networks… these methods are

available in software such as, Excel, Origin Microcal, Minitab, Statistica, SPSS, JMP,

R,…

The method used in our studies is the Multiple Linear Regression (MLR) and Neural

Networks (ANN) method.

II.4.1.1. Neural networks

The formal neural networks initiated in 1943 by Mc Culloch and Pitts are analogous to

biological nervous systems.[84]

Formal neurons (Figure 31) perform a linear combination of the inputs

received(descriptors), then apply to this value an activation function f, generally

nonlinear (sigmoidal, tangent, hyperbolic). The obtained value Y (biological activity)

is the output of the neuron.[85]


85

Figure.31.Representation of a formal neuron.

Xk (where k = 1, 2, …, n) are the input neurons (descriptors) and Wk

(where k = 0, 1, 2, …,n) are the weights. W0 is the weight associated with

an entry set at 1, called bias. The neuron equation looks like this:

Neurons alone perform fairly simple functions, and it is their combinations, called

neural networks, that make it possible to construct particularly interesting functions

(Figure 31).

Figure.32.Architecture of neural networks.


86

The architecture of neural networks consists of three layers:

o An input layer: made up of input neurons, their number is equal to the number

of input variables (descriptors) plus one (bias). Each neuron is connected to the

hidden neurons.

o A hidden layer: made up of a variable number of neurons. For each hidden

neuron, the network performs a weighted sum operation with the different

weights of each input neuron.

o An output layer: where the number of output neurons is equal to the number of

properties (activities) to be modeled.

Each connection between the neurons of the different layers is associated with a

weight. The number of weights then depends on the number of neurons in the hidden

layer, so it is possible to vary the complexity of the network by increasing or

decreasing the number of hidden neurons. To do this, data presentation cycles are

established via a learning algorithm. The one used in our approach is a

backpropagation algorithm.[86]

The principle of the back-propagation algorithm is to minimize the difference between

the calculated output (calculated activity) and the experimental values of the activity.

Defined in two stages, the algorithm propagates in a first stage the inputs (descriptors)

forward until obtaining an output (biological activity) calculated by the network. In a

second step, the calculated output is compared with the experimental value. The error

obtained by this comparison is then propagated back to the input layer by modifying

the weights of the neurons. This process is repeated until a negligible error is

obtained.[87]

II.4.1.2 Multiple linear regression

Multiple linear regression is the simplest statistical modeling method and the most

applied in studies of the structure-activity relationship [88]. The method was

popularized by Hanch by relating biological activity to experimental lipophilic,

electronic and steric properties for series of compounds.[89]


87

The RLM method is based on the assumption that there is a linear relationship

between a dependent variable Y (in our case the biological activity) and a series of n

independent variables Xi (here, the molecular descriptors)[90]. The objective is to

arrive at a mathematical equation of the form:

Where ai (i = 0, 1,…, n) are the coefficients of the regression.

Determining the equation from a data set of p samples is like solving a system of p

equations.

Or:

The residuals bi (i = 1, 2, ..., p) represent the model error,

yi (i = 1, 2,…, p) represent the dependent variables (activities),

xi (i = 1, 2,…, p) represent the independent variables (descriptors),

ai (i = 1, 2,…, p) represent the coefficients of the regression.

This set of equations can be written in the following matrix form:


88

Or in a condensed way:

Y = XA + B

Where Y, X, A, and B represent the property vector, the attribute matrix (descriptors),

the coefficient matrix, and the regression error matrix, respectively.

The RLM method then consists in choosing the coefficients ai so as to minimize

the sum of the squares of the differences between the calculated values of the property

and the experimental ones, the equation of the model therefore becomes:

Or in matrix form:

The coefficients can be obtained from the following matrix equation:

II.4.1.2.1. Randomization test

The randomization test is used by the modeler to assert that the good correlations,

between descriptors and activity, presented by the QSAR model are not due to luck.

To do this, the observations are randomly disorganized, for example ten times, by

randomly changing the activity column, but the descriptor columns remain

unchanged[91]. We therefore obtain ten models with specific statistical characteristics.

If the randomization of the observations leads to weak forecast models, this means that


89

the predictive capacities of the constructed QSAR model are not due to the

correlations of luck.[92]

II.4.2. Interpretation and validation of a QSPR / QSAR model

Once developed, the model must be interpreted by analyzing all the statistical

parameters of this model, its quality must also be studied, this quality is checked by

what is called validation. Its robustness, ie the influence of the compounds of the

training series on the model, is estimated by internal validation methods. In order to

estimate its predictive power, additional experimental data is needed to determine the

ability of the model to predict these values, this is called external validation. Finally, it

is important to know which type of molecules used with which model.

II.4.2.1. The Importance of Validation

The process of QSAR model development is divided into three key steps: (1) data

preparation, (2) data analysis, and (3) model validation. The implementation and

relative merit of these steps is generally determined by the researcher’s interests and

experience, and the availability of software. The resulting models are then frequently

employed, at least in theory, to design new molecules based on chemical features or

trends found to be statistically significant with respect to underlying biological

activity.

The first stage includes the selection of a data set for QSAR studies and the

calculation of molecular descriptors. The second stage deals with the selection of a

statistical data analysis technique, either linear or nonlinear such as MLR, PLS or

ANN. A variety of different algorithms and computer software are available for this

purpose. In all approaches, descriptors are considered as independent variables, and

biological activities as dependent variables.

Typically, the final part of QSAR model development is model validation, [93, 94]in

which estimates of the predictive power of the model are calculated. This predictive

power is one of the most important characteristics of QSAR models.

Ideally, it should be defined as the ability of the model to predict accurately the target

property (e.g., biological activity) of compounds that were not used in model


90

development.

Here, we are going to discuss about the validation parameters for the QSAR

models which are developed by multiple linear regression (MLR). Four tools of

assessing validity of QSAR models [95]are (i) cross-validation, (ii) bootstrapping, (iii)

randomization of the response data, and (iv) external validation. Where we are using

the data that created the model (an internal method) and using a separate data set (an

external method).

The methods of least squares fit (R²), cross validation (Q²) [96, 97], adjusted R²

(R²adj), chisquared test (χ²), rootmean-squared error (RMSE), bootstrapping and

scrambling (YRandomization) [98, 99]are internal methods of validating a model. The

best method of validating a model is an external method, such as evaluating the QSAR

model on a test set of compounds.

II.4.2.1.1. Internal Validation

II.4.2.1.1.1. Least Squares Fit

The most common internal method of validating the model is least squares fitting.

This method of validation is similar to linear regression and is the R2(squared

correlation coefficient) for the comparison between the predicted and experimental

activities. An improved method of determining R2is the robust straight line fit, where

data points are away from the central data points (essentially data points a specified

standard deviation away from the model) are given less weight when calculating the

R2. An alternative to this method is the removal of outliers (compounds from the

training set) from the dataset in an attempt to optimize the QSAR model and is only

valid if strict statistical rules are followed. The difference between the R2and R2adj

value is less than 0.3 indicates that the number of descriptors involved in the QSAR

model is acceptable. The number of descriptors is not acceptable if the difference is

more than 0.3.


91

Where, y and ŷ are the experimental and predicted bioactivity for an individual

compound in the training set, ymis the mean of the experimental bioactivities, and n is

the number of molecules in the set of data being examined. PRESS is the predictive

residual sum of the squares.

II.4.2.1.1.2 Fit of the Model

Fit of the QSAR models can be determined by the methods of chi-squared (χ2) and

root-mean squared error (RMSE). These methods are used to decide if the model

possesses the predictive quality reflected in the R2. The use of RMSE shows the error

between the mean of the experimental values and predicted activities. The chi squared

value exhibits the difference between the experimental and predicted bioactivities:

Large chi-square or RMSE values (≥0.5 and 1.0, respectively) reflect the model’s

poor ability to accurately predict the bioactivities even the model is having large

R2value (≥0.7).

For good predictive model the chi and RMSE values should be low (<0.5 and

<0.3, respectively). These methods of error checking can also be used to aid in

creating models and are especially useful in creating and validating models for

nonlinear data sets, such as those created with Artificial Neural Network (ANN) [100].

However, excellent values of R2, χ2and RMSE are not sufficient indicators of model

validity. Thus, alternative parameters must be provided to indicate the predictive

ability of models. In principle, two reasonable approaches of validation can be

envisaged one based on prediction and the other based on the fit of the predictor

variables to rearranged response variables.


92

II.4.2.1.2 External validation

Several authors have suggested that the only way to estimate the true predictive

power of a QSAR model is to compare the predicted and observed activities of an

(sufficiently large)external test set of compounds that were not used in the model

development [94, 101-104]. The problem in external validation is how can we select

the training and test set? Roy et al. clearly discussed that how we can solve this

problem in one of their article. [105]

To estimate the predictive power of a QSAR model, Golbraikh and Tropsha

recommended use of the following statistical characteristics of the test set [93]: (i)

correlation coefficient R between the predicted and observed activities; (ii) coefficients

of determination (R2) (predicted vs. observed activities r02, and observed vs. predicted

activities r0'); (iii) slopes k and k' of the regression lines through the origin. They

consider a QSAR model is predictive, if the following conditions are satisfied [93]:

The predictive ability of the selected model was also confirmed by external

R2test. A value ofR2test is greater than 0.6 may be taken as an indicator of good

external predictability.

Where �̅�𝑡𝑟 is the average value for the dependent variable for the training set. Kubinyi

et al[94], Novellino et al.[102], Norinder[103], and Golbraikh and

Tropsha[93]demonstrated that all of the above-mentioned criteria are necessary to

adequately assess the predictive ability of a QSAR model. Norinder suggest [103]that

the external test set must contain at least five compounds, representing the whole range

of both descriptor and activities of compounds included into the training set.


93

II.5. Application of QSAR

There are a large number of applications of QSAR models in industry, in university

research, in economics, in weather forecasting, …etc.

We report below some possible applications of QSAR models:

Rational identification of new leads with: optimization of pharmacological,

biocidal or pesticide activity.

Rational design of many products such as surfactants, perfumes, dyes and fine

chemicals.

The identification of dangerous compounds in the early stages of development.

Prediction of toxicity and side effects of new compounds.

The selection of compounds with optimal pharmacokinetic properties, be it

stability or availability in biological systems.


94

Figure.33. Schematic overview of the QSAR process.

References

[1] C. Larive, Apports combinés de l'expérimentation et de la modélisation à la compréhension de

l'alcali-réaction et de ses effets mécaniques, in, Ecole nationale des ponts et chaussees, 1997.

[2] A. Hamdouch, M.-H. Depret, Spatial Scales, Proximity and Institutional Logics: Outline of an

Evolutionary Approach of Technological Change Dynamics in Pharmaceuticals and Biotechnology

(Echelles spatiales, formes de proximite et logiques institutionnelles: Esquisse d'une approche co-

evolutionnaire des dynamiques de changement technologique dans la pharmacie et les

biotechnologies), (2008).

[3] F. Lerbet-Sereni, Expériences de la modélisation, modélisation de l'expérience, Harmattan, 2004.

[4] Y. Yamaguchi, H.F. Schaefer, Y. Osamura, J. Goddard, A New Dimension to Quantum Chemistry:

Analytic Derivative Methods in Ab Initio Molecular Electronic Structure Theory, Oxford University

Press, 1994.

[5] G.H. Loew, H.O. Villar, I.J.P.r. Alkorta, Strategies for indirect computer-aided drug design, 10

(1993) 475-486.

[6] A. Hinchliffe, Molecular modelling for beginners, John Wiley & Sons, 2003.

[7] S. Belaidi, Nouvelle approche de la stéréosélectivité dans les macrolides antibiotiques

dissymétriques, par la modélisation moléculaire, in, Batna, Université El Hadj Lakhder. Faculté des

sciences, 2002.

[8] N.L. Allinger, X. Zhou, J.J.J.o.M.S.T. Bergsma, Molecular mechanics parameters, 312 (1994) 69-

83.

[9] A. Meyer, F.J.J.o.c.c. Forrest, Towards the convergence of molecular‐mechanical force fields, 6

(1985) 1-4.

[10] R. Thomsen, M.H.J.J.o.m.c. Christensen, MolDock: a new technique for high-accuracy molecular

docking, 49 (2006) 3315-3321.

[11] W.J. Hehre, A guide to molecular mechanics and quantum chemical calculations, Wavefunction

Irvine, CA, 2003.

[12] P.W. Atkins, R.S. Friedman, Molecular Quantum Mechanics, OUP Oxford, 2011.

[13] A. KERASSA, Etude par la modélisation moléculaire des relations structures-propriétés de

quelques séries hétérocycliques bioactives, in, Université Mohamed Khider-Biskra, 2015.

References

96

[14] N.L. Allinger, Y.H. Yuh, J.H.J.J.o.t.A.C.S. Lii, Molecular mechanics. The MM3 force field for

hydrocarbons. 1, 111 (1989) 8551-8566.

[15] H.J.A.t.e.p. Dugas, quatrième édition, Librairie de l’Université de Montréal, Principes de base en

modélisation moléculaire, (1996).

[16] F.J.C.m.c. Ooms, Molecular modeling and computer aided drug design. Examples of their

applications in medicinal chemistry, 7 (2000) 141-158.

[17] K.H. Bleicher, H.-J. Böhm, K. Müller, A.I.J.N.r.D.d. Alanine, Hit and lead generation: beyond

high-throughput screening, 2 (2003) 369-378.

[18] G.J.D.D.T.T. Klebe, Lead identification in post-genomics: computers as a complementary

alternative, 1 (2004) 225-230.

[19] G.R. Marshall, T.I. Oprea, M.M.H.A.R. Leach, D.V. Green, M.R.C.L.H. Matter, D.H.B.M.R.

Gozalbes, F.B.S.L. Rogalski, I.M.G.W.J. Alvarez, C.L.C.D.M. Schnur, A.J. Tebben,

Chemioinformatics in drug discovery, Wiley Online Library, 2004.

[20] R. Clark, D. Sprous, J.J.P.S. Leonard, Barcelona, Rational Approaches to Drug Design, HD

Höltje, W, (2001) 475-485.

[21] C.J.D.D.R. Hansch, The physicochemical approach to drug design and discovery (QSAR), 1

(1981) 267-309.

[22] C.J.A.o.c.r. Hansch, Quantitative structure-activity relationships and the unnamed science, 26

(1993) 147-153.

[23] G.L. Patrick, An introduction to medicinal chemistry, Oxford university press, 2013.

[24] V. Brenner, Modelisation des forces intermoleculaires et caracterisation des hypersurfaces

d'energie potentielle. Applications a de petits agregats de van der waals, in, Paris 11, 1993.

[25] K.B.J.J.o.t.A.C.S. Wiberg, A scheme for strain energy minimization. Application to the

cycloalkanes1, 87 (1965) 1070-1078.

[26] W.H. Press, W.T. Vetterling, S.A. Teukolsky, B.P. Flannery, Numerical recipes, Cambridge

university press Cambridge, 1986.

[27] A. Warshel, S.J.T.J.o.C.P. Lifson, Consistent force field calculations. II. Crystal structures,

sublimation energies, molecular and lattice vibrations, molecular conformations, and enthalpies of

alkanes, 53 (1970) 582-594.

References

97

[28] V.N. Viswanadhan, A.K. Ghose, J.N.J.B.e.B.A.-P.S. Weinstein, M. Enzymology, Mapping the

binding site of the nucleoside transporter protein: a 3D-QSAR study, 1039 (1990) 356-366.

[29] Cambridge Crystallographic Data Centre (CCDC) database, http://www.ccdc.cam.ac.uk. , in.

[30] Official website of Protein Data Bank (PDB) : www.rcsb.org/pdb. .

[31] A.D. McNaught, A. Wilkinson, Compendium of chemical terminology. IUPAC

recommendations, (1997).

[32] K.B. Wiberg, P.R.J.J.o.C.C. Rablen, Comparison of atomic charges derived via different

procedures, 14 (1993) 1504-1518.

[33] J.G.J.C.p.l. Ángyán, Choosing between alternative MP2 algorithms in the self-consistent reaction

field theory of solvent effects, 241 (1995) 51-56.

[34] E. Sigfridsson, U.J.J.o.C.C. Ryde, Comparison of methods for deriving atomic charges from the

electrostatic potential and moments, 19 (1998) 377-395.

[35] O. Prasad, L. Sinha, N.J.J.A.M.S. Kumar, Theoretical Raman and IR spectra of tegafur and

comparison of molecular electrostatic potential surfaces, polarizability and hyerpolarizability of

tegafur with 5-fluoro-uracil by density functional theory, 1 (2010) 201-214.

[36] C. Ai, Y. Li, Y. Wang, Y. Chen, L.J.B. Yang, m.c. letters, Insight into the effects of chiral

isomers quinidine and quinine on CYP2D6 inhibition, 19 (2009) 803-806.

[37] R.K. Srivastava, V. Narayan, O. Prasad, L.J.J.C.P.R. Sinha, Vibrational, Structural and Electronic

properties of 6-methyl nicotinic acid by Density Functional Theory, 4 (2012) 3333-3341.

[38] H.B. Broughton, I.A.J.J.o.M.G. Watson, Modelling, Selection of heterocycles for drug design, 23

(2004) 51-58.

[39] I. Lessigiarska, Development of structure-activity relationships for pharmacotoxicological

endpoints relevant to European Union legislation, in, Liverpool John Moores University, 2006.

[40] G. Petersson, M.A.J.T.J.o.c.p. Al‐Laham, A complete basis set model chemistry. II. Open‐shell

systems and the total energies of the first‐row atoms, 94 (1991) 6081-6090.

[41] R.G. Parr, W.J.J.o.t.A.C.S. Yang, Density functional approach to the frontier-electron theory of

chemical reactivity, 106 (1984) 4049-4050.

http://www.ccdc.cam.ac.uk/

file:///C:/Users/client/Desktop/www.rcsb.org/pdb

References

98

[42] W. Yang, W.J.J.J.o.t.A.C.S. Mortier, The use of global and local molecular parameters for the

analysis of the gas-phase basicity of amines, 108 (1986) 5708-5711.

[43] R.S.J.T.J.o.C.P. Mulliken, Electronic population analysis on LCAO–MO molecular wave

functions. I, 23 (1955) 1833-1840.

[44] F.L.J.T.c.a. Hirshfeld, Bonded-atom fragments for describing molecular charge densities, 44

(1977) 129-138.

[45] A.E. Reed, F.J.T.J.o.c.p. Weinhold, Natural bond orbital analysis of near‐Hartree–Fock water

dimer, 78 (1983) 4066-4073.

[46] K.J.s. Fukui, Role of frontier orbitals in chemical reactions, 218 (1982) 747-754.

[47] F. Maseras, K.J.J.o.C.C. Morokuma, IMOMM: A new integrated ab initio+ molecular mechanics

geometry optimization scheme of equilibrium structures and transition states, 16 (1995) 1170-1179.

[48] S. Humbel, S. Sieber, K.J.T.J.o.c.p. Morokuma, The IMOMO method: Integration of different

levels of molecular orbital approximations for geometry optimization of large systems: Test for

n‐butane conformation and SN 2 reaction: RCl+ Cl−, 105 (1996) 1959-1967.

[49] E.B. Wilson, J.C. Decius, P.C. Cross, Molecular vibrations: the theory of infrared and Raman

vibrational spectra, Courier Corporation, 1980.

[50] X. Zhou, L. Wang, P.J.J.o.C. Qin, T. Nanoscience, Free vibration of micro-and nano-shells based

on modified couple stress theory, 9 (2012) 814-818.

[51] É. Biémont, Spectroscopie moléculaire: structures moléculaires et analyse spectrale, De Boeck

Supérieur, 2008.

[52] I. Taleb, Apport de la spectroscopie vibrationnelle, infrarouge et Raman, appliquée au sérum pour

le diagnostic de carcinome hépatocellulaire chez les patients atteints de cirrhose, in, Reims, 2013.

[53] A.C.E.Z.J.C.d.p. CIHLA, 12C 160, MAURICE KIBLER: NIVEAUX D'ÉNERGIE DES IONS

3d" PLACÉS DANS UN CHAMP CRIS-TALLIN DE SYMÉTRIE Caro Mlle FRANÇOISE

CALENDINI ET GUY MESNARD: NIVEAUX D'ÉNERGIE DE L'ION Ti3+ DANS UN CHAMP

CRISTALLIN DE SYMÉTRIE Cero, 21 (1966) 96.

[54] M. Dalibart, Spectroscopie Dans I'infrarouge, Ed. Techniques Ingénieur, 2000.

[55] J. Darnell, Biologie moléculaire de la cellule, (1993).

References

99

[56] Z. Szafran, R. Pike, M.J.I.N. Singh, Microscale Inorganic Chemistry, J. Wiley&Sons, (1991).

[57] F. BEAUDOIN, 1) Chimie organique expérimentale, (1991).

[58] M. Hamel, Influence de la variation de la température ambiante sur les vibrations induites par

effet de couronne, Université du Québec à Chicoutimi, 1991.

[59] M. Lutz, W.J.C. Mäntele, Vibrational spectroscopy of chlorophylls, (1991) 855-902.

[60] S.W. Shaw, C.J.J.o.s. Pierre, vibration, Normal modes of vibration for non-linear continuous

systems, 169 (1994) 319-347.

[61] T. Eicher, S. Hauptmann, A. Speicher, The chemistry of heterocycles: structures, reactions,

synthesis, and applications, John Wiley & Sons, 2013.

[62] Y. Morino, K.J.T.J.o.C.P. Kuchitsu, A note on the classification of normal vibrations of

molecules, 20 (1952) 1809-1810.

[63] Z. Konkoli, D.J.I.j.o.q.c. Cremer, A new way of analyzing vibrational spectra. I. Derivation of

adiabatic internal modes, 67 (1998) 1-9.

[64] J. Bartol, P. Comba, M. Melter, M.J.J.o.c.c. Zimmer, Conformational searching of transition

metal compounds, 20 (1999) 1549-1558.

[65] C. Hansch, R.M. Muir, T. Fujita, P.P. Maloney, F. Geiger, M.J.J.o.t.A.C.S. Streich, The

correlation of biological activity of plant growth regulators and chloromycetin derivatives with

Hammett constants and partition coefficients, 85 (1963) 2817-2824.

[66] L.P.J.C.r. Hammett, Some relations between reaction rates and equilibrium constants, 17 (1935)

125-136.

[67] C. Hansch, A. Leo, D. Hoekman, Exploring QSAR: fundamentals and applications in chemistry

and biology, American Chemical Society Washington, DC, 1995.

[68] D.B. Boyd, K.J.R.i.C.C. Lipkowitz, The computational chemistry literature, 2 (1991) 461-479.

[69] U. Norinder, T.J.A.p.n. Högberg, A quantitative structure-activity relationship for some dopamine

D2 antagonists of benzamide type, 4 (1992) 73-78.

[70] H. Van de Waterbeemd, N. El Tayar, B. Testa, H. Wikstroem, B.J.J.o.m.c. Largent, Quantitative

structure-activity relationships and eudismic analyses of the presynaptic dopaminergic activity and

References

100

dopamine D2 and. sigma. receptor affinities of 3-(3-hydroxyphenyl) piperidines and octahydrobenzo

[f] quinolines, 30 (1987) 2175-2181.

[71] J. Ghasemi, S. Saaidpour, S.D.J.J.o.M.S.T. Brown, QSPR study for estimation of acidity

constants of some aromatic acids derivatives using multiple linear regression (MLR) analysis, 805

(2007) 27-32.

[72] P. Geladi, B.R.J.A.c.a. Kowalski, Partial least-squares regression: a tutorial, 185 (1986) 1-17.

[73] A.J. Myles, R.N. Feudale, Y. Liu, N.A. Woody, S.D.J.J.o.C.A.J.o.t.C.S. Brown, An introduction

to decision tree modeling, 18 (2004) 275-285.

[74] A.F. Duprat, T. Huynh, G.J.J.o.c.i. Dreyfus, c. sciences, Toward a principled methodology for

neural network design and performance evaluation in QSAR. Application to the prediction of logP, 38

(1998) 586-594.

[75] I.V. Tetko, A.E.J.N.P.L. Villa, An enhancement of generalization ability in cascade correlation

algorithm by avoidance of overfitting/overtraining problem, 6 (1997) 43-50.

[76] J. Gasteiger, J.J.A.C.I.E.i.E. Zupan, Neural networks in chemistry, 32 (1993) 503-527.

[77] R.J.J.o.C.A.J.o.t.C.S. Leardi, Genetic algorithms in chemometrics and chemistry: a review, 15

(2001) 559-569.

[78] A. Kovatcheva, A. Golbraikh, S. Oloff, Y.-D. Xiao, W. Zheng, P. Wolschann, G. Buchbauer,

A.J.J.o.c.i. Tropsha, c. sciences, Combinatorial QSAR of ambergris fragrance compounds, 44 (2004)

582-595.

[79] J.G. Topliss, R.P.J.J.o.M.C. Edwards, Chance factors in studies of quantitative structure-activity

relationships, 22 (1979) 1238-1244.

[80] J.J.N.F.C.S. Leszczynski, Challenges and advances in computational chemistry and physics,

(2006) 1-680.

[81] R.D. Cramer, D.E. Patterson, J.D.J.J.o.t.A.C.S. Bunce, Comparative molecular field analysis

(CoMFA). 1. Effect of shape on binding of steroids to carrier proteins, 110 (1988) 5959-5967.

[82] H. Kubinyi, G. Folkers, Y.C. Martin, 3D QSAR in Drug Design: Volume 2: Ligand-Protein

Interactions and Molecular Similarity, Springer Science & Business Media, 1998.

[83] S. Chatterjee, A. Hadi, B.J.I. Price, New York, Regression analysis by example John Wiley &

Sons, (2000).

References

101

[84] W.S. McCulloch, W.J.T.b.o.m.b. Pitts, A logical calculus of the ideas immanent in nervous

activity, 5 (1943) 115-133.

[85] L. Douali, A. Schmitzer, D. Villemin, A. Jarid, D.J.P. Cherqaoui, c. news, Neural networks and

their applications in chemistry and biology, (2007) 131-144.

[86] D.W. Salt, N. Yildiz, D.J. Livingstone, C.J.J.P.s. Tinsley, The use of artificial neural networks in

QSAR, 36 (1992) 161-170.

[87] I.J.M.p.l.o.d.d.d.M.e.M. Ghania, Université Mentouri, Constantine, Fasculté des Sciences

Exactes., Publié le, Régression et modélisation par les réseaux de neurones, (2009) 49.

[88] R.J.R.l.b.e.p. Rakotomalala, Université Lumière Lyon, Pratique de la régression logistique, 2

(2011) 258.

[89] J. Confais, M. Le Guen, Premiers pas en régression linéaire avec SAS®, (2007).

[90] K. Roy, P.P.J.E.j.o.m.c. Roy, Comparative chemometric modeling of cytochrome 3A4 inhibitory

activity of structurally diverse compounds using stepwise MLR, FA-MLR, PLS, GFA, G/PLS and

ANN techniques, 44 (2009) 2913-2922.

[91] L. Eriksson, J. Jaworska, A.P. Worth, M.T. Cronin, R.M. McDowell, P.J.E.h.p. Gramatica,

Methods for reliability and uncertainty assessment and for applicability evaluations of classification-

and regression-based QSARs, 111 (2003) 1361-1375.

[92] P.J.Q. Gramatica, c. science, Principles of QSAR models validation: internal and external, 26

(2007) 694-701.

[93] A. Golbraikh, A.J.J.o.m.g. Tropsha, modelling, Beware of q2!, 20 (2002) 269-276.

[94] H. Kubinyi, F.A. Hamprecht, T.J.J.o.m.c. Mietzner, Three-dimensional quantitative similarity−

activity relationships (3d qsiar) from seal similarity matrices, 41 (1998) 2553-2564.

[95] A. Worth, T. Hartung, C.J.S. Van Leeuwen, Q.i.E. Research, The role of the European centre for

the validation of alternative methods (ECVAM) in the validation of (Q) SARs, 15 (2004) 345-358.

[96] A.R. Leach, A.R. Leach, Molecular modelling: principles and applications, Pearson education,

2001.

[97] J.J.J.o.t.A.s.A. Shao, Linear model selection by cross-validation, 88 (1993) 486-494.

References

102

[98] W. Svante, M. Sjöström, L.J.E.C.C. Erikson, Partial least squares projections to latent structures

(PLS) in chemistry, 3 (2002).

[99] A. Yasri, D.J.J.o.c.i. Hartsough, c. sciences, Toward an optimal procedure for variable selection

and QSAR model building, 41 (2001) 1218-1227.

[100] G. Schneider, P.J.P.i.b. Wrede, m. biology, Artificial neural networks for computer-based

molecular design, 70 (1998) 175-222.

[101] R. Guha, P.C.J.J.o.c.i. Jurs, modeling, Determining the validity of a QSAR model− a

classification approach, 45 (2005) 65-73.

[102] E. Novellino, C. Fattorusso, G.J.P.A.H. Greco, Use of comparative molecular field analysis and

cluster analysis in series design, 70 (1995) 149-154.

[103] U.J.J.o.c. Norinder, Single and domain mode variable selection in 3D QSAR applications, 10

(1996) 95-105.

[104] N.S. Zefirov, V.A.J.J.o.c.i. Palyulin, c. sciences, QSAR for boiling points of “small” sulfides.

Are the “high-quality structure-property-activity regressions” the real high quality QSAR models?, 41

(2001) 1022-1027.

[105] P.P. Roy, K.J.Q. Roy, C. Science, On some aspects of variable selection for partial least squares

regression models, 27 (2008) 302-313.

Chapter III

Results and Discussion

Result and discussion

104

Many researches have been carried out for the development of anticancer

drugs; nevertheless, no drug could achieve a parasitological cure and some of them

presented serious toxic side effects.

While, RESEARCHES NEVER END WITH NOTHING, some compounds were

found to be important candidates as inhibitors against topoisomerase II and cytotoxic

activity.

Therefore, following our interest in this field, we applied several computational

methods on the most effective Topo II candidates in order to, study structures and

properties of these ligands, evaluate their biological response and better understand

topoisomerase II inhibition mode.

To simplify our ideas, we divide this chapter to three big titles as fellow:

• Molecular Geometry, Electronic Properties, MPO Methods and Structure

Activity/Property Relationship Studies of carbazole derivatives containing

chalcone Analogues (CDCAs).

At first, we aim to study the Structure, spectroscopy and electronic properties of

carbazole subunit using DFT methods, Afterward, we extend a dataset of Carbazole

Derivatives containing Chalcone Analogues, in order to identify compounds with

higher potency, we study physicochemical properties, descriptors and drug-likeness

scoring.

• Quantitative Structure-Activity Relationships of Topo II inhibition activity

of carbazole derivatives containing chalcone analogues (CDCAs).

Next, in this part we develop QSAR models of CDCAs analogues with respect to their

topoisomerase II inhibition activities.


105

III .1 Introduction

Carbazole is a tricyclic compound (figure 34) belonging to the general class of

nitrogen heterocycles. It possesses photoluminescent properties and good thermal

stabilities [1,2]due its extended π-conjugation and aromatic character. [1]Carbazole

derivatives exhibit interesting biological activities, with high potential for use as

antibacterial, antifungal, antiviral, antimalarial, anticancer and anti-Alzheimer agents.

[3,4]Additionally, carbazole-based compounds have potential applications for the

symptomatic and disease-modifying treatment of Parkinson’s disease. [5] Some

derivatives of carbazole, such as N-substituted carbazole–imidazole hybrids, N-

substituted carbazole imidazolium salts [6], N-thioalkylcarbazoles [7]and carbazole

aminoalcohols, [8]have shown anti-proliferative activity against several tumor cell

lines by a variety of mechanisms based on the inhibition of topoisomerase I and II

(Topo I and Topo II, respectively). Further, according to recent studies, [9,10] it has

been found that the carbazole scaffold functionalized with small substitution groups

leads to stabilization of the intramolecular G-quadruplex, thus inhibiting the

telomerase activity.

Figure .34.Chemical structure of carbazole with atom numbering.

A new set of Carbazole Derivatives containing Chalcone Analogues (CDCAs)

was reported by Li et al in 2018. [11]All these compounds (from 1 to 24, see Table 3)

possess a carbazole moiety connected to a chalcone group, with the only exception of


106

compound 8, in which the phenyl group of chalcone is replaced by a thiophene unit.

Although etoposide (VP-16), here numbered as compound 25, is chemically different

from the other compounds in Table 3, it was chosen as a reference drug. Indeed, its use

as an efficient antineoplastic agent is well established since more than five decades.

[12,13] Li et al. showed that the carbazole subunit in these compounds acts as a

scaffold for Topo II inhibition, and that the chalcone moiety of CDCAs plays a key

role in anticancer activity. This CDCAs series exhibits also a strong Topo II inhibitory

and moderate cytotoxic activity against different tested cancer cell lines such as Hela,

HL-60, A549, and PC-3. The different inhibitory activity shown by these molecules in

laboratory studies is related to the differences in their molecular structures.

Table 3: Chemical structure and experimental activity (pIC50[11]) of the CDCAs under study.

pIC50 is log10(1/IC50). * denotes the compounds selected for external validation (test set).

N Structure pIC50 N Structure pIC50

1*

4.874 2

4.600

3

5.545 4*

5.209

5

4.574 6

4.888

7

6.658 8

4.787

9

4.623 10

5.038


107

Computational chemistry and modelling are useful tools in modern pharmacological

and medicinal chemistry for both drug discovery and drug development. Indeed, their

interplay with biological sciences have resulted in a better understanding of the

mechanisms of action of drugs and body functions at the cellular and molecular levels.

Consequently, most research projects in the pharmaceutical industry start by

identifying a suitable target in the body and designing drugs able to interact with it.

Knowledge of the structure and function of the target, as well as the mechanisms by

which it interacts with drug molecules, is crucial to this approach. [14]In this respect,

Density Functional Theory (DFT) [15-17] is a very elegant approach to describe the

chemical processes and provides a powerful theoretical framework for the study of

both reactivity and selectivity [17]of medium-sized molecular systems. In addition, the

11

5.144 12*

5.195

13

5.265 14

4.525

15

3.573 16

5.413

17

6.013 18

5.666

19

5.411 20

6.174

21

5.485 22

5.343

23

4.788 24*

5.903


108

Quantitative Structure-Activity Relationships (QSAR) approach attempts to establish a

mathematical relation between the physicochemical and structural properties of a

potential drug and its biological activity, [18]expressed in terms of pIC50. Thus, it

allows designing drug molecules with improved pharmacokinetic properties, reducing

the number of compounds to be tested and limiting development failures. [14]

In this paper, we carried out several benchmark computations on the structure

and spectroscopy of the carbazole subunit using the DFT technique. After comparison

to experiments, it turns out that the B3LYP/6–311 G(d,p) level is accurate enough to

derive accurate physicochemical properties and parameters for the 25 CDCA

derivatives identified in the recent work by Li et al. [11]In order to predict the

pharmacological activity of these compounds, we incorporated these parameters into

quantitative models based on two statistical methods: Multiple Linear Regression

(MLR) and Artificial Neural Network (ANN). As implemented, the present combined

QSAR-DFT approach [19-23] for the calculation of descriptors allows us to build

simpler (less parameterized) models, thus reducing the risk of over-parameterization,

for an easier and more direct interpretation of the drug activity at the

phenomenological level. In order to develop an accurate model capable of explaining

the role of CDCAs derivatives in chemotherapy, we have performed drug-like

calculations, evaluations of physicochemical properties and QSAR study of carbazole

derivatives containing chalcone analogues targeting Topo II inhibition (Figure 35) and

presenting strong cytotoxic activity. In addition, our model will facilitate the design of

new CDCAs for this purpose.


109

Figure .35. The Synthetic or natural carbazole derivatives induces cancer cells death triggering

the intrinsic apoptotic pathway by inhibition of Topoisomerase II.

III .2 Structure, spectroscopy and electronic properties of carbazole and its

derivatives

The choice of the approach for electronic structure computations on a given

molecular system is crucial to accurateely describe its reactivity, selectivity and most

stable molecular conformation. Indeed, the latter is closely connected to the

identification of interactions between the drug and active site of the enzyme. Due to

the relatively large size of drug molecules, post Hartree-Fock ab initio computations

on these systems are out of reach. Instead, various studies showed that DFT

calculations on large molecules yield good results on different systems (organic,

inorganic, and organometallic ...) [23-26]at a reasonable computational cost. Indeed,

DFT takes into account the dynamic correlation of electrons [27], which is mandatory

for the accurate description of the structure and the spectroscopic properties of these

compounds. Hence, in this study we opted for DFT-based approaches. For instance,

computations were performed using gradient-corrected DFT with the Becke 3

exchange [28]and Lee-Yang-Parr correlation functions (B3LYP), in conjunction with

three different basis sets 6-31G+(d,p), 6-311G and 6-311G(d,p), as implemented in

GAUSSIAN 09 program package. [29]

III .2.1 Carbazole structure and properties

Prior to deriving the properties of the CDCAs of interest, we performed

geometry optimizations of the carbazole subunit in its electronic ground state. We

carried out these computations at the B3LYP/6-31G+(d,p), B3LYP/6-311G and

B3LYP/6-311G(d,p) levels of theory in order to select an appropriate atomic basis set

for the molecular systems studied herein. The results are listed in Table 2.


110

Table 4: Equilibrium Geometrical Parameters (Bond lengths in Å and angles in degrees) of

carbazole in its electronicground state. The numbering of the atoms is given in

Figure 34.

DFT/B3LYP Exp. a)

Parameters 6-31G+(d.p) 6-311G 6-311G(d,p)

C1-C2 1.398 1.397 1.395 1.393

C1-C6 1.421 1.424 1.419 1.401

C1-N13 1.388 1.396 1.385 1.387

C2-C3 1.395 1.395 1.390 1.373

C2-H14 1.087 1.082 1.085 1.00

C3-C4 1.408 1.408 1.404 1.379

C3-H15 1.086 1.082 1.084 0.96

C4-C5 1.394 1.395 1.390 1.379

C4-H16 1.086 1.082 1.084 1.02

C5-C6 1.402 1.400 1.398 1.388

C5-H17 1.087 1.082 1.085 1.01

C6-C7 1.450 1.454 1.449 1.415

C7-C8 1.402 1.400 1.398 1.388

C7-C12 1.421 1.424 1.419 1.401

C8-C9 1.394 1.395 1.390 1.379

C8-H18 1.087 1.082 1.085 1.01

C9-C10 1.408 1.408 1.404 1.379

C9-H19 1.086 1.082 1.084 1.02

C10-C11 1.395 1.395 1.390 1.373

C10-H20 1.086 1.082 1.084 0.96

C11-C12 1.398 1.397 1.395 1.393

C11-H21 1.087 1.082 1.085 1.00

C12-N13 1.388 1.396 1.385 1.387

N13-H22 1.007 1.003 1.057 0.98

C2-C1-C6 121.9 121.8 121.8 122.3

C2-C1-N13 129.6 129.8 129.7 128.7

C6-C1-N13 108.4 108.3 108.4 109

C1-C2-C3 117.6 117.7 117.7 116.7

C1-C2-H14 121.4 121.3 121.3 121

C3-C2-H14 120.9 120.8 120.9 122

C2-C3-C4 121.3 121.2 121.3 122


111

C2-C3-H15 119.2 119.3 119.2 119

C4-C3-H15 119.4 119.4 119.4 119

C3-C4-C5 120.7 120.7 120.7 121

C3-C4-H16 119.5 119.4 119.4 118

C5-C4-H16 119.8 119.8 119.8 121

C4-C5-C6 119.2 119.2 119.2 118.4

C4-C5-H17 120.3 120.3 120.3 124

C6-C5-H17 120.5 120.5 120.4 118

C1-C6-C5 119.2 119.2 119.2 119.7

C1-C6-C7 106.7 106.9 106.7 106.6

C5-C6-C7 134.0 133.9 134.1 133.7

C6-C7-C8 134.0 133.9 134.1 133.7

C6-C7-C12 106.7 106.9 106.7 106.6

C8-C7-C12 119.2 119.2 119.2 119.7

C7-C8-C9 119.2 119.2 119.2 118.4

C7-C8-H18 120.4 120.5 120.4 118

C9-C8-H18 120.3 120.3 120.3 124

C8-C9-C10 120.7 120.7 120.7 121

C8-C9-H19 119.8 119.8 119.8 121

C10-C9-H19 119.5 119.4 119.4 118

C9-C10-C11 121.3 121.2 121.3 122

C9-C10-H20 119.4 119.4 119.4 119

C11-C10-H20 119.2 119.3 119.2 119

C10-C11-C12 117.6 117.7 117.7 116.7

C10-C11-H21 120.9 120.8 120.9 122

C12-C11-H21 121.4 121.3 121.3 121

C7-C12-C11 121.9 121.8 121.8 122.3

C7-C12-N13 108.5 108.3 108.4 109

C11-C12-N13 129.6 129.8 129.7 128.7

C1-N13-C12 109.6 109.5 109.6 108.7

C1-N13-H22 125.2 125.2 125.8 125.65

C12-N13-H22 125.2 125.2 125.8 125.65

a. Ref. [30].

where a comparison is made between these theoretical data and the

experimental results by Gerkin and Reppart. [30] As expected, the larger basis set (i.e.


112

6-311G(d,p)) performs better than the smaller ones. Indeed, a good agreement between

the experimental data and those deduced using the 6-311G(d,p) basis set is found. For

instance, deviations between the computed and measured distances are mostly in the

range 0.002-0.2 Å. For the angles, the differences between the two sets are mostly less

than 1°, except for a few ones that are 3.7° off. Such deviations may be related to the

gas phase structure (here) and the crystal structure of carbazole determined in Gerkin

and Reppart’s experiments. Overall, this basis set is reliable enough for the study of

the molecular and electronic structure of this molecule and of its derivatives. For

further confirmation, we computed the anharmonic frequencies of carbazole. Table 3

gives the computed values for the 60 normal modes of vibration of carbazole, together

with the comparison to the measured fundamentals by Bree and Zwarich. [31]Again, a

good agreement is found between the two sets of data, where the differences between

both sets are less than 30 cm-1, except for Ʋ1 (NH stretching), for which we computed

3501 cm-1, while Bree and Zwarich measured 3421 cm-1. In sum, we conclude that

B3LYP, in conjunction with the 6-311G(d, p) basis set, is accurate enough for

investigating the carbazole derivatives. In the following, this level of theory will be

used for the treatment of the structural and physicochemical properties of the CDCAs

of interest.

Table 5:Comparison of the experimental and calculated anharmonic frequencies (ν, in cm-1)) of

carbazole. Computed at the B3LYP/6–311G(d,p) level of theory and we give also the computing of IR

intensity (I in (km/mol). Abbreviations used:υ,stretching; υs, sym. stretching; υas, asym. stretching; b,

in-plane-bending; g, out-of-plane bending; Ben: Benzene; Pyr: pyrrole.

Sym Calculated Expa)

Assignment I

a1 3501 41.0 3421 υ NH (Ben)

3056 43.2 3084 υs CH (Ben)

3051 2.3 3077 υs CH (Ben)

3048 1.2 3055 υs CH (Ben)

3046 55.6 3039 υas CH (Ben)

1628 2.9 1625 β CCC (Ben) + υ CC (Pyr)


113

1580 0.8 1576 υ C=C (Ben-Pyr)

1486 0.0 1481 υ CC (Pyr)

1449 27.1 1449 υ C=C (Ben-Pyr)

1340 8.2 1334 β CNC (Pyr) + υ C=C (Ben-Pyr)

1312 0.1 1288 υ CC (Pyr) + υ C=C (Ben)

1235 78.8 1205 υ C=C (Ben-Pyr) + υ CN (Pyr)

1162 4.6 1136 υ C=C (Ben-Pyr)

1113 2.3 1107 υ CC (Pyr) + CCC (Ben)

1021 0.3 1012 υ C=C (Ben)

932 3.6 910 γ CH (Ben)

851 0.0 856 γ CH (Ben) + γ CCC (Ben-Pyr)

755 47.3 747 γ CH (Ben) + γ CNH (Pyr)

642 0.5 658 CNC + CCC (Ben-Pyr)

426 10.4 425 γ CH (Ben) + γ NH (Pyr)

216 0.3 220 βCCC (Ben-Pyr)

a2

969 0.0

γ CH (Ben)

968 0.0

γ CH (Ben)

846 2.7

β CCN +β CCC (Ben-Pyr)

780 0.0 γ CH (Ben) + γ CCC (Pyr)

580 0.0

γ CH (Ben)

429 1.0

CCC + CCN (Ben-Pyr)

290 0.0

ring Ben

298 3.9 299 γ CH (Ben) + γ NH (Pyr)

107 3.0 104 CH (Ben)

b2

3059 0.3 3094 υas CH (Ben)

3048 10.6 3050 υs CH (Ben)

3027 5.5 3030 υ CH (Ben)

3013 1.6 2940 υ CH (Ben)

1582 1.5 1594 υ C=C (Ben-Pyr)

1495 15.0 1490 υ C=C (Ben) + υ CN (Pyr)

1465 30.0 1452 υ C=C (Ben-Pyr)

1395 21.2 1380 υ C=C (Ben) + υ CN (Pyr)

1324 29.4 1320 υ C=C (Ben-Pyr)

1281 1.7 1233 υ CN (Pyr)

1216 6.4 1204 υ C=C (Ben) + υ CN (Pyr)

1207 6.0 1158 υ C=C (Ben-Pyr)

1126 5.6 1118 υ C=C (Ben)

1031 0.3 1022 υ C=C (Ben)

1002 6.4 995 CCC (Ben)+ βCH (Ben)

844 1.3 835 γ CH (Ben)

741 0.0 737 CH (Ben)

627 5.9 616 βCCC (Ben-Pyr)

559 0.1 548 CCC (Ben)

510 3.5 505 β ringPyr

b1

1605 23.7

β CCC (Pyr) + υ C=C (Ben)

1169 0.5 1152 υ C=C (Ben)

937 0.1 926 γ CH (Ben)

882 0.2 880 β CNC +β CCC (Ben-Pyr)

749 0.1 741 CCC (Ben)

731 87.5 722 CH (Ben)

569 6.6 566 CH (Ben) + γ NH (Pyr)

444 0.0 445 γ CH (Ben) + γ CCC (Ben-Pyr)

342 70.7 310 NH ( pyr)

149 0.0 139 CH (Ben)


114

a) The detection of these observed frequencies of the bands has been made in the gas state.[2]

Since carbazole is the scaffold for Topo II inhibition, we studied its reactivity.

A powerful practical model for describing chemical reactivity is the frontier molecular

orbital theory, which is useful to explain drug–receptor interactions. [32]These are

related to the overlap between the highest occupied molecular orbital (HOMO) and the

lowest unoccupied molecular orbital (LUMO) of the two molecular entities, which

provides information on their electron-donating and electron-withdrawing character,

[33]respectively. These molecular orbitals (MOs) are displayed in Table 6. This figure

shows that the HOMO and LUMO correspond to π bonding and anti-bonding MOs,

respectively, spreading over the whole carbazole molecule.

Table 6: Outermost molecular orbitals of carbazole and of its derivatives of interest in the

present study.E (in eV) corresponds to the energy HOMO – LUMO gap.

Compound HOMO LUMO E

Carbazole

4.73

Chalcone

4.13

1

3.47


115

2

3.53

3

3.63

4

3.31

5

3.84

6

3.45


116

7

3.50

8

3.70

9

3.65

10

3.65

11

3.76

12

3.41


117

13

3.94

14

3.86

15

3.48

16

3.43

17

3.71

18

3.67


118

19

3.42

20

3.67

21

3.75

22

3.75

23

3.46


119

24

3.29

In addition, we analyzed the Fukui function of carbazole. This function

describes how the electron density responds to the change in the total number of

electrons and allows us to recognize the favored sites for electrophilic and nucleophilic

attacks. [17,34,35]The reactivity of an atom A in a molecule can be described by the

following condensed Fukui functions [17]:

f+(A) = [qN+1(A) - qN(A)]

f-(A) = [qN(A) - qN-1(A)]

f0(A)= [qN+1(A) - qN-1(A)] /2

where qN(A), qN+1(A) and qN-1(A) are the electronic populations of atom A in a neutral

molecule and in the corresponding anionic and cationic species, respectively. Atoms

with large values of f+(A), f-(A) and f0(A) in a given molecule will be more easily

attacked by nucleophiles, electrophiles and radicals, respectively.

We also computed the dual function, ∆f(r), [37] which is calculated as the difference

between the nucleophilic and electrophilic condensed Fukui functions:

∆f(A)=[f+(A) - f-(A)]

Thus, ∆f is positive for atoms that are more easily attacked by nucleophiles and

negative for those that are more easily attacked by electrophiles. [37]Results are given

in Table 7.

Table 7: Values of the Fukui functions of carbazole. See text for the definition of the different

terms.

Atoms q(N) q(N+1) q(N-1) f+ f- f0 ∆f

C1 0.2050 0.1950 0.1990 -0.0100 0.0060 -0.0020 -0.0160


120

C2 -0.0660 0.0060 -0.1350 0.0720 0.0690 0.0705 0.0030

C3 -0.0960 -0.0750 -0.1590 0.0210 0.0630 0.0420 -0.0420

C4 -0.1110 -0.0580 -0.1120 0.0530 0.0010 0.0270 0.0520

C5 -0.0390 0.0070 -0.1290 0.0460 0.0900 0.0680 -0.0440

C6 -0.0920 -0.0710 -0.0940 0.0210 0.0020 0.0115 0.0190

C7 -0.0920 -0.0710 -0.0940 0.0210 0.0020 0.0115 0.0190

C8 -0.0390 0.0071 -0.1290 0.0461 0.0900 0.0681 -0.0439

C9 -0.1110 -0.0580 -0.1120 0.0530 0.0010 0.0270 0.0520

C10 -0.0960 -0.0750 -0.1590 0.0210 0.0630 0.0420 -0.0420

C11 -0.0660 0.0060 -0.1350 0.0720 0.0690 0.0705 0.0030

C12 0.2050 0.1950 0.1990 -0.0100 0.0060 -0.0020 -0.0160

N13 -0.5250 -0.4350 -0.5310 0.0900 0.0060 0.0480 0.0840

H14 0.0850 0.1430 0.0230 0.0580 0.0620 0.0600 -0.0040

H15 0.0910 0.1470 0.0240 0.0560 0.0670 0.0615 -0.0110

H16 0.0890 0.1520 0.0270 0.0630 0.0620 0.0625 0.0010

H17 0.0820 0.1330 0.0270 0.0510 0.0550 0.0530 -0.0040

H18 0.0820 0.1330 0.0270 0.0510 0.0550 0.0530 -0.0040

H19 0.0890 0.1520 0.0270 0.0630 0.0620 0.0625 0.0010

H20 0.0910 0.1470 0.0240 0.0560 0.0670 0.0615 -0.0110

H21 0.0850 0.1430 0.0230 0.0580 0.0620 0.0600 -0.0040

H22 0.2250 0.2770 0.1860 0.0520 0.0390 0.0455 0.0130

Apart from the hydrogens atoms, (Table 7,Figure 36) shows that negative ∆f values

are determined for the C1, C3, C5, C8, C10 and C12 atoms. This reveals that the ortho positions

of the aromatic cycles in carbazole are highly reactive with respect to electrophilic attacks. In

contrast, the other carbon atoms, except C2 and C11, of carbazole and especially the nitrogen


121

atom present positive ∆f values. Therefore, these atomic centers are mostly subject to

nucleophilic attacks. We expect that the C2 and C11 atoms, i.e. those associated with the

largest f0 values (Table 7), will be the most probable sites for radical attacks.

Figure.36.Dual descriptor (∆f) Vs Atoms.

III .3 Carbazole derivatives structures

At first, we preoptimized the CDCAs to get minimal energy structures using the

Molecular Mechanics Force Field (MM+) followed by the semi-empirical AM1

method, with a convergence threshold of 0.01 kcal/Å in the gradient norm, as

implemented in the HyperChem (version 8.08) package. [38]This allowed us to

identify the most stable forms that were further optimized using the DFT B3LYP/6-

311++G(d,p) method. The resulting structures, in Cartesian coordinates, are displayed

in Table S1 (in Appendix). Close examination of this table reveals that the carbazole

subunit exhibits slight changes compared to the isolated carbazole, whereas the

chalcone part of the CDCAs either resembles the isolated chalcone or presents strong

changes. This justifies a posteriorithe choice of chalcone for the above-mentioned

benchmark computations.

-0.0600

-0.0400

-0.0200

0.0000

0.0200

0.0400

0.0600

0.0800

0.1000

C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 N13 H14 H15 H16 H17 H18 H19 H20 H21 H22

Du

al

des

crip

tor

(∆f)

Atom numbering


122

According to the structure of the chalcone moiety, we can easily classify the CDCAs

into two conformational types: i) Type 1 compounds, showing slight changes in the

chalcone subunit, and ii) Type 2 compounds, with a non-planar arrangement of the

enone/chalcone moiety.

Face view Profile view

Type 1

Compound 1

Type 2

Compound 3

Figure .37. Face and profile views of Compounds 1 and 3.


123

The central part of Type 1 compounds, the one formed by the chalcone and

carbazole moieties, is almost planar. Type 1 molecules have similar shapes as

Compound 1, displayed in the upper part of Figure 37, and correspond to Compounds

1, 2, 4, 5, 6, 7, 12, 15, 16, 19 and 23. These molecules are stabilized by extended

electron conjugation over the aromatic rings of carbazole and chalcone through the

connecting enone. Type 2 molecules have similar shapes as Compound 3, displayed in

the lower part of Figure 37, and correspond to Compounds 3, 8, 9, 10, 11, 13, 14, 17,

18 20, 21, 22 and 24. The stabilization here is due to π-stacking interactions between

the phenyl of chalcone and the aromatic rings of carbazole. Indeed, the mesomerism

and the π-stacking interactions are in competition to lead to Type 1 or Type 2

compounds. Nevertheless, we cannot observe any clear relation between the structure

of these CDCAs and their measured biological activity (Table 3). For instance,

Compounds 15 and 7 have pIC50values of ~3.5 and 6.7 despite both of them being of

Type 1. Table 3 shows also that Compound 20 has a pIC50 of 6.174 close to that of

Compound 7, despite being of Type 2. Consequently, different shapes are obtained for

these CDCAs even if all of them possess similar biological and cytotoxic effects.

Therefore, the interaction of these CDCAs with Topo II should be more complex than

the commonly admitted key-lock mechanism, which is based on shape

complementarity between the substrate and enzymatic active site. We conclude that an

induced-fit mechanism is more plausible, where conformational fluctuations in the

enzyme-substrate complex should influence the dynamics of the inhibition process,

thus affecting the biological activity of the studied compounds.

III.4 CDCAs derivatives physicochemical properties, descriptors and drug-

likeness scoring

The molecular orbital theory is extremely useful for physicists and chemists. In

particular, the HOMO and LUMO of a chemical species are important for discussing

the molecular reactivity and can help to interpret the intermolecular protein-ligand

interactions. The DFT B3LYP/6-311G(d,p) method as implemented in Gaussian 09


124

was employed to derive the outermost molecular orbitals (HOMO, LUMO and related

energies) of carbazole derivatives. These orbitals are displayed in Table 6. These

compounds have a HOMO/LUMO mostly localized on the carbazole and chalcone

moiety, respectively. Interestingly the HOMO-LUMO gap for all CDCAs is almost the

same (of ~ 3.5 eV).

In order to obtain validated QSAR models, several molecular descriptors,

encoding various information about size, hydrophilicity, electronic and topological

properties, were calculated using the Molecular Operating Environment (MOE)

[39]software and the QSAR Properties (version 8.0.6) module integrated in the

HyperChem package. The deduced properties correspond to molar weight (MW),

surface area grid (SAG), volume (V), molecular refractivity (MR), polarizability (Pol),

octanol-water partition coefficient (log10P), hydration energy (HE), hydrogen bond

donors (HBDs, counting the sum of all NH and OH groups), hydrogen bond acceptors

(HBA, counting all N and O atoms), number of rotatable bonds (Nbrot) and topological

polar surface area (TPSA). We also deduced the number of atoms (Natom) and the

number of heavy atoms (NH). These data are listed in Tables 8 and 9. These tables

show that large values of MR and Pol correspond to large values of V and MW, in

agreement with the Lorentz-Lorenz equation. [40] For example, Compound 24, the

one with the p-chlorophenyl and nitrophenyl groups, has rather large values of Pol

(68.74 Å3) and MR (149.41Å3), which are associated with large values of V (1409.29

Å3) and MW (466.92 amu). Whereas, Compound 1 has the smallest values of MR

(115.47 Å3) and Pol (55.05 Å3). HE is an important parameter connected to the

stabilization of molecules bearing polar groups and a great predictive index for

molecule accessibility in biologic media. [41,42]Compound 14 has the smallest |HE|

value (1.1 kcal/mol), while the largest |HE| values are observed for Compounds 24 and

25 (VP-16) (9.05 and 22.75 kcal/mol, respectively), which bear hydrophilic groups

(e.g. NO2, CO, CN, and OH).

Table 8:Physicochemical properties of CDCAs derivatives. For each compound, we give the

energies of the HOMO (EHOMO in a.u.) and of the LUMO (ELUMO in a.u.), the polarizability

(Pol in Å3), the volume (V in Å3), the surface area grid (SAG in Å3) and the hydration energy

(HE in kcal/mol). * denotes the selected compounds for external validation (test set).


125

Compound EHOMO ELUMO Pol V SAG HE

1* -0.2052 -0.07763 55.05 953.84 559.04 -2.86

2 -019969 -0.06997 68.66 1177.74 678.02 -3.56

3 -0.20881 -0.0735 56.56 955.41 543.72 -2.47

4* -0.21087 -0.08925 57.82 1033.16 601.92 -2.29

5 -0.20883 -0.06771 58.42 1020.74 604.76 -7.10

6 -0.20716 -0.08021 54.94 962.56 566.94 -2.57

7 -0.20385 -0.07526 58.14 1007.58 584.98 -4.46

8 -0.20599 -0.06983 53.10 906.05 535.90 -3.12

9 -0.21001 -0.07562 57.43 983.06 557.22 -2.76

10 -0.21069 -0.07664 56.56 976.37 573.38 -2.70

11 -0.21046 -0.07214 56.56 992.69 584.95 -2.89

12* -0.20561 -0.08033 54.94 960.25 562.34 -2.68

13 -0.20507 -0.06008 62.84 1065.32 596.47 -5.22

14 -0.20475 -0.06289 61.24 1092.35 596.88 -1.10

15 -0.20476 -0.07672 58.84 1039.58 609.74 -3.87

16 -0.20417 -0.07786 58.03 1014.3 588.16 -1.55

17 -0.21414 -0.07771 66.59 1089.36 597.18 -3.76

18 -0.21053 -0.07575 69.79 1133.75 618.72 -2.92

19 -0.21022 -0.08467 66.70 1144.15 665.11 -4.04

20 -0.21105 -0.07607 66.59 1123.72 634.55 -3.92

21 -0.21356 -0.0757 66.48 1127.34 637.58 -3.72

22 -0.21514 -0.07723 66.59 1132.60 644.85 -4.02

23 -0.21404 -0.08698 68.21 1189.24 693.69 -3.74

24* -0.22107 -0.10003 68.74 1183.65 676.29 -9.05

25 (VP-16) -0.21034 -0.02485 82.80 1409.29 785.78 -22.75

Ideally, from a pharmacological point of view, a drug should be orally bioavailable.

The oral route is particularly praised especially for those drugs that have to be self-

administered by the patients on a continuous basis. [43]The quickest strategy for

assessing the drug-likeness of a given compound is to apply "rules" as a standard

guideline. Orally ingested drugs should, in general, obey Lipinski's rule of five,


126

[44]which is summarized as: MW < 500; HBD < 5; log10P < 5; and HBAs < 10.

Additionally, Veber and Ghose’s rules complement the Lipinski’s one. For instance,

Veber et al. [45]

showed that compounds with Nbrot< 10 and TPSA < 140 Å2 present a good oral

bioavailability in rats. Also, Ghose et al. [46] suggested a qualifying range that could

be used in the development of drug-like chemical libraries, recommending the

following constraints: 160 ≤ MW ≤ 480; - 0.4 ≤ log10 P ≤ 5.6; 40 ≤ MR ≤ 130 and 20 ≤

Natom ≤ 70. In the following, we will discuss the oral bioavailability of the CDCAs of

interest in light of the parameters presented in Tables 8 and 9:

MW is closely connected to the molecular size. As the size of a compound

gets larger, a bigger cavity must be shaped in water to accommodate it, thus

decreasing the corresponding solubility. Growing MW reduces the

compound fixation at the surface of the intestinal epithelium, hence reducing

its absorption. In addition, a large molecular size obstructs passive diffusion

through the firmly pressed aliphatic side chains of the cellular bilayer

membrane. [47]Table 9 shows that all the considered compounds, except

Compound 25, verify both Ghose and Lipinski’s rules, because their MW

are in the 325 – 470 range. Most likely, all these compounds will easily pass

through cell membranes.

The relative hydrophilic/hydrophobic properties of a drug are crucial in

influencing its solubility, ADME (Absorption, Distribution, Metabolism,

and Excretion) properties, as well as pharmacological activity. Absorption in

the gastrointestinal tract is most effective when the drug has the right

balance of solubility in water and fat, which is the case for moderate values

of log10P (i.e. 0 < log10P < 3). If the drug is polar with low log10P

(hydrophilic), it will fail to penetrate across the intestinal epithelial cell

layer. Whereas if the molecule is nonpolar, with high log10P (hydrophobic),

it will dissolve in fat globules, leading to poor contact with the intestinal

walls and resulting in malabsorption. [14] As can be seen in Table 9, all

compounds present a log10P in this range, except Compounds 24 and 25,

which have out-of-range values of -2.22 and -4.27, respectively.


127

TPSA of a molecule corresponds to the surface sum over every single polar

atom (e.g., oxygen, nitrogen), where we include also attached hydrogens.

Generally, a molecule with TPSA > 140 Å2 will have low oral absorption.

For the central nervous system, the upper limit of TPSA for the drug to cross

the blood-brain barrier is even reduced to 80 Å2. [48]Table 9 shows that the

TPSA values are in the range 22-40 for Compounds 1-23 and that

Compound 24 is still within the acceptable range (< 140 Å2), whereas

Compound 25 is out-of-range. The percentage of absorption (%ABS, Table

9) is related to TPSA via %ABS = 109 ± 0.345 x TPSA. [49]Except

Compound 25, %ABS values are close to 100%. Again, these compounds

should have good cellular plasmatic membrane permeability.

Nbrot counts any single bond bound to a nonterminal heavy atom and not

being part of a ring. [50,51]Table 9 shows that Nbrot is smaller than 8 for all

compounds. This is a signature of relatively good molecular flexibility,

which helps the drug to traverse the membrane and better interact with the

enzyme.

Ligand efficiency (LE = 1.4 x pIC50/NH) measures the potency per heavy

atom of a given compound, where, for drugs of comparable activity, the

smaller ones have greater LE than the larger ones. As can be seen in Table

9, Compounds 7, 3 and 11 have a rather small number of NH, and thus

possess relatively high LE values (of 0.358, 0.299, 0.277, 0.273), whereas

Compounds 15, 23, 25 have higher values of NH leading to small values of

LE (0.179, 0.209, 0.224). Hence, the latter ones may exhibit bad

physicochemical and ADME properties.

Lipophilic ligand efficiency (LLE = pIC50 – log10P) gives insight into the

interaction of a molecule with its own receptor, where low lipophilicity

should be maintained [52] for good interaction. Concerning the studied

series, Table 9 shows that all compounds have positive LLE values, which is

considered as a favorable quality to create highly efficient interactions with

the biological targets.


128

In sum, the Lipinski, Veber and Ghose’s rules are verified by most of our compounds,

except number 25. Therefore, these compounds are ideal for oral bioavailability. They

have also large chance of achieving high intestinal permeability and solubility.

Compound log10P MW HBA HBD TPSA Natom MR Nbrot Lipinski Score Ghose Score Veber Score NH LE LLE ABS %

1* 2.01 325 1 0 22.00 44 115.47 4 4 4 2 25 0.273 2.864 101.41

2 0.72 411 3 0 34.47 57 138.17 5 4 3 2 31 0.208 3.880 97.11

3 1.78 360 2 0 22.00 44 120.19 4 4 4 2 26 0.299 3.765 101.41

4* 2.58 393 4 0 22.00 47 120.69 5 4 4 2 29 0.251 2.629 101.41

5 0.21 369 3 0 40.46 47 121.06 4 4 4 2 28 0.229 4.364 95.04

6 1.41 343 2 0 22.00 44 115.60 4 4 4 2 26 0.263 3.478 101.41

7 2.16 339 1 0 22.00 47 119.75 4 4 4 2 26 0.358 4.498 101.41

8 0.62 331 1 0 22.00 41 112.77 4 4 4 2 26 0.258 4.167 101.41

9 2.06 404 2 0 22.00 44 123.01 4 4 4 2 26 0.249 2.563 101.41

10 2.52 360 2 0 22.00 44 120.19 4 4 4 2 26 0.271 2.518 101.41

11 1.78 360 2 0 22.00 44 120.19 4 4 4 2 26 0.277 3.364 101.41

12* 0.02 343 2 0 22.00 44 115.60 4 4 4 2 26 0.280 5.175 101.41

13 2.31 385 3 0 40.46 52 128.22 6 4 4 2 29 0.254 2.955 95.04

14 2.31 353 1 0 22.00 50 124.04 4 4 4 2 27 0.235 2.215 101.41

15 0.41 373 3 0 31.23 48 121.97 5 4 4 2 28 0.179 3.163 98.23

16 1.56 357 2 0 22.00 47 119.88 4 4 4 2 27 0.281 3.853 101.41

17 1.85 440 3 0 22.00 51 144.32 5 4 4 2 32 0.263 4.163 101.41

18 2.61 436 2 0 22.00 54 148.47 5 4 3 2 32 0.248 3.056 101.41

19 2.46 422 2 0 22.00 51 144.19 5 4 3 2 31 0.244 2.951 101.41

20 1.85 440 3 0 22.00 51 144.32 5 4 3 2 32 0.270 4.324 101.41

21 1.25 458 4 0 22.00 51 144.45 5 4 3 2 33 0.233 4.235 101.41

22 1.85 440 3 0 22.00 51 144.32 5 4 3 2 32 0.234 3.493 101.41

23 2.23 456 3 0 22.00 51 148.91 5 4 3 2 32 0.209 2.558 101.41

24* -2.22 467 4 0 67.82 53 149.41 6 4 2 2 34 0.243 8.123 85.60

25 (VP-16) -4.27 589 12 3 160.83 74 145.81 5 2 0 1 42 0.224 10.991 53.51

Table 9: Drug-likeness parameters and lipophilicity indices for the CDCAs under study. See text for the definition of these parameters and their description.


130

III.5 QSAR modelling studies

QSAR studies have proved their applicability in modern drug discovery

protocols. The goal of QSAR models is to establish quantitative relations connecting

the physicochemical properties of classes of drugs to their biological activity.

Typically, this involves the synthesis of a series of differently substituted analogue

compounds, and the study of how their biological activities depend on the

physicochemical properties of the substituents. [14]Several linear and nonlinear

statistical model-building methods have been applied to QSAR, [53]where the needed

physicochemical descriptors are treated as independent variables. We used both

multiple linear regressions (MLR) and artificial neural network (ANN), as

implemented in the software JMP 8.0.2 and XLSTAT, [54,55] to correlate these

parameters with the biological activities of CDCAs. For internal, external validation

and Y-randomization [56,57]of the QSAR model, we used different statistical

parameters such as:

(i) The leave-one-out cross-validation coefficient (R2cv(LOO)), used to predict

the internal validation of a model. The threshold for acceptability is

R2CV(LOO)> 0.5. [58]

(ii) The significance level (P-value). It should be < 0.05.

(iii) The square correlation coefficient for external validation (R2ext). It is taken

as a good indicator when its value is greater than 0.6. [58-60]

(iv) The PRESS/SSY ratio, where PRESS is the predicted residual sum of

squares and SSY is the sum of squares deviation. It should be smaller than

0.4. [61]

(v) The rm2 metric introduced by Roy et al. [62]It should be ensured that rm̅

2>

0.5 and Δrm2< 0.2 to prevent bias in model predictability. This applies to

both the training (internal validation) and the test (external validation) sets.

III.5.1.1 Internal validation

The internal validation of a QSAR model is carried out using a cross-validation

method, using the molecules from the training set. In this paper, we used the cross-


131

validation leave-one-out (LOO) approach, where a large value of the cross-validated

squared correlation coefficient R2cv is considered as a proof of high predictive capacity

of the model. [58]

An acceptable Q2cv does not guarantee that the predicted activity values are close

to the observed ones, but simply attests that a good overall correlation exist between

them. So, to avoid this error and better highlight the predictability of the model, we

considered also the rm2 metrics, including average rm

2(LOO) (the average value of rm

2and

r′m2 ) and Δr2

m(LOO) (the absolute difference between rm2 and r’m

2)(Table 10), introduced

by Roy and colleagues. [62] For a good prediction of the QSAR model Δr2m(LOO)

should be less than 0.2, provided that the average value of rm2 is greater than 0.5.

III.5.1.2 External validation

External validation guarantees the predictability of the developed QSAR model

and its applicability to potential new drugs. The correlation between observed and

predicted data is measured by the R2ext metric (Table 10). According to Golbraikh and

Tropsha[58-60], the following criteria for the external validation, based on the

validation set, were adopted:

(i) R2ext> 0.6

(ii) (r2- r02/ r2) < 0.1 and 0.85 ≤ k ≤ 1.15 or (r2- r’0

2/ r2) < 0.1 and 0.85 ≤ k’ ≤ 1.15

(iii) |rm2 - r′m

2 | < 0.3

III.5.1.3 Y-randomization

Y-randomization [56] is a widely accepted validation technique used to check the

robustness of the models. In this validation approach, the values of the target variable

are randomly redistributed over the entire learning set and a new model is derived. The

whole operation is repeated several times. For robustness, the squared mean

correlation coefficient (R2r) of randomized models should be less than the squared

correlation coefficient (R2) of non-randomized ones. Based on the results of the


132

randomization tests, the R2p (= R2√(R2 + Rr

2)) [53] value should be greater than 0.5 to

ensure that the good predictive capacity of the so-developed models is not reached by

mere chance.

III.6 Multiple linear regression (MLR)

Despite being the oldest, MLR still remains one of the most popular approaches

to build QSAR models. This is due to its simple practical use, ease of interpretation

and transparency. Indeed, the key algorithm is available and accurate predictions can

be provided. [63]The values of the calculated descriptors are those listed in Tables 8

and 9. Data were randomly divided into two groups: a training set (internal validation)

and a testing set (external validation). The significant correlation analysis between

Topo II inhibition activity and descriptors is represented by the following equation:

pIC50 = 6,759 - 0,135×TPSA + 0,314×Natom - 0,015×V -0,485×HE +

0,314×log10P…… (Eq.1)

Table 10: Numerical parameters used to confirm the QSAR model validity. See text for more

details.

Model Training set Test set

R2inter R2

cv ṝm2

(LOO) Δrm2

(LOO) RMSE PRESS/SST R2ext ṝm

2(test) Δrm

2(test) RMSE PRESS/SST

MLR 0.768 0.686 0.601 0.162 1.224 0.341 0.800 0.596 0.197 0.545 0.092

ANN 0.999 0.991 0.988 0.000 0.005 0.000 0.650 0.598 0.108 0.372 0.135

The statistical parameters of this model are listed in Table 10. The high values

of the correlation coefficient (R2) for the training set (R2inter= 0.768) and test set

(R2ext=0.800) indicate the significant correlation between different independent

variables with Topo II inhibition activity. In addition, Table 10 shows that RMSE and

PRESS/SSY ratio are low for both internal validation (= 0.341 < 0.4) and external

validation (= 0.092 < 0.4). This confirms the reliability of our model.


133

Figure.38. Generic leverage plot of observed [11] (Observed pIC50) versus predicted

(Predicted pIC50) activity according to the MLR model. The oblique line is the

diagonal.

The model was checked by applying the Y-randomisation test. Several Y vector

random shuffles were performed and small average values of 0.1445 for R2 and -0.327

for Q2 (Table 11) were obtained after 1000 random tests, thus proving that the good

results of our original model are not due to chance correlation or structural dependence

of the training set.

Table 11:Random MLR model parameters.

Average R 0.3646

Average R2 0.1445

Average Q2 -0.327

cRp2 0.768


134

For further testing the reliability and stability of our model (Eq.1), we used the

"Leave-One-Out" (LOO) technique. Again, the high value of cross-validation

correlation coefficient (R2CV = 0.686 > 0.5) for this model allows us to qualify it as

valid. [58]Other criteria such as parameters metric (rm̅2 and Δrm

2, Table 10) confirm

that this QSAR model is acceptable. The plot of the predicted values of biological

activity pIC50 vs. the experimental values is shown in Figure 38, which certifies once

more the validity of the predicted model.

Let us now analyze the coefficients in Eq.1. The positive values of log10P and

Natom indicate a promoter effect on Topo II inhibition activity. Indeed, any increase in

the octanol-water partition coefficient and number of atoms of the molecules causes an

increase in the biological activity. This is related to the stabilization of the polar

groups in the CDCAs and represents a good criterion for molecule accessibility in the

biologic medium. Also, since the octanol-water partition coefficient provides an

indication of the lipophilic character of the drug and its aptitude to cross the cell

membrane, it represents a key parameter to measure the drug inhibitory activity.

No contribution from the HOMO-LUMO gap appears in Eq.1. Thus, the model

is independent of this parameter. This may be related to the almost constant value of

this gap computed for all the CDCAs of interest, as noted earlier. Eq.1 does not

include any dependence on HBD since this parameter is zero for all compounds,

except Compound 25 (Table 9). Similarly, MR and HBA parameters seem to have no

influence on the biological activity, since they do not appear in our model.

III.7 Artificial Neural Networks (ANN)

ANN [64,65]is a popular nonlinear model, which is used to predict the

biological activity (i.e. IC50) of the datasets of therapeutic molecules. It presents

several benefits like better prediction, adaptation and generalization capacity beyond

the studied sample, and better stability of the coefficients. It is employed in complex

drug design, drug engineering and medicinal chemistry domains. [66]


135

Figure .39.The architecture (6-4-1) of the adopted ANN model.

In this work, the neural network is a system of fully interconnected neurons

arranged in three layers. The input layer is made of six neurons, where each of them

receive one of the six descriptors selected from the correlation matrix of the model.

The intermediate (hidden) layer is composed of four neurons that form the deep

internal pattern that discovers the most significant correlations between predicted and

experimental data. One neuron constitutes the output layer, which returns the value of

IC50 (Figure 39). [67]

Figure .40.Correlation plot of observed (Observed pIC50) [11] versus predicted activity

(Predicted pIC50) calculated using ANN.


136

As can be seen in Figure 40, a good agreement between experimental data and

predicted pIC50 issued from the ANN model is observed. Indeed, the statistical

parameters for this model, as given in Table 10, reveal a correlation coefficient close

to 1 (R2inter=0.999), associated with lower values of RMSE and PRESS/SSY for both

external and internal validations. Note that these parameters are better than those

found for the MLR model, indicating that the ANN one is more reliable. Furthermore,

the robustness of the model was further confirmed by the significant R2ext value of the

test data set (R2ext= 0.650). To further evaluate our model, we applied the Leave-One-

Out cross-validation method. The resulting rm̅2

(LOO), Δrm2

(LOO), rm̅2

(test), Δrm2

(test)

statistical criteria (Table 10) support the validity of our model, since they fall within

the ranges specified by Roy et al. [62]

General conclusion

137

General Conclusion

General conclusion

138

Using first principle methodologies, we studied the molecular structure and the

vibrational/electronic spectroscopic properties of the set of CDCAs recently synthetized by Li

et al. as potential Topo II inhibitors. We also characterized the carbazole subunit of these

CDCAs. Afterwards, we derived a set of physicochemical parameters for these compounds,

which are incorporated later into MPO, MLR and ANN methods for screening the

lipophilicity indices, structure-activity relationship and to construct a QSAR model. The

results indicate that these compounds have a 98% chance of being orally active. As can be

seen in Table 10, the ANN network has substantially better predictive capabilities compared

to MLR, leading to pIC50 values closer to the experimental determinations. Nevertheless, both

models remain satisfactory and exhibit a high predictive power, thus validating their use to

explore and propose new molecules as active inhibitors of Topo II.

Table .10. Measured [11] and predicted pIC50 using MLR and ANN methods. * denotes the

compounds selected for external validation (test set).

Compound MLR predicted pIC50 ANN predicted pIC50 Observed pIC50

1* 5.496 4.873 4.874

2 4.600 4.599 4.600

3 5.574 5.536 5.545

4* 5.166 5.209 5.209

5 4.456 4.573 4.574

6 5.039 4.886 4.888

7 6.466 6.656 6.658

8 4.953 4.804 4.787

9 5.031 4.623 4.623

10 5.245 5.039 5.038

11 4.864 5.138 5.144

12* 4.690 5.195 5.195

13 5.111 5.265 5.265

14 4.568 4.527 4.525

15 4.228 3.574 3.573

16 4.766 5.413 5.413

17 6.073 6.013 6.013

General conclusion

139

18 6.188 5.768 5.666

19 5.588 5.411 5.411

20 5.642 5.466 6.174

21 5.303 5.284 5.485

22 5.558 5.342 5.343

23 4.703 5.098 4.788

24* 0.427 5.898 5.903

The investigation of CDCAs at the microscopic level allows a screening of the

physicochemical properties related to their activity in biological media. In contrast to the

statement by Li et al., we did not notice any obvious relation between the incorporation of a

halogen on the chemical composition of these CDCAs and their biological activity. Our work

shows, however, that among the physicochemical parameters considered here, only log10P and

Natom play a role. These two parameters are connected with the flexibility of these compounds,

which seems increasing the Topo II inhibitory activity and more generally improves the

anticancer potency of these compounds. For instance, the coefficient in front of log10P in our

model (Eq.1) is positive and relatively large. This means that the flexibility of the CDCAs, by

facilitating cell membrane crossing, will enhance the Topo II inhibition. This may be related

to the mechanism of action of Topo II itself, which consists in a balanced flexibility-rigidity

associated with structural changes of the quaternary structure [68] of this enzyme during the

CDCA-Topo II-DNA interaction. These findings were already suggested by Guedes et al. [69]

This work discloses the phenomenological relations existing between potential biological

activities and molecular descriptors of a relatively large panel of compounds to be used as

Topo II inhibitors, and pave the way for specific investigations of their complex mechanisms

of action at a molecular dynamics level.

APPENDIX

Appendix

141

Table S1: Chemical structure of the investigated CDCAs (Table 1) as configured at the theory level of

B3LYP/6-311G(d,p). In Angstrom, we give the Cartesian coordinates (X,Y,Z). The number of the

atom in the structure and its atomic number, respectively, correspond to CN(Center Number) and AN

(Atomic Number).

N Molecular Structure Cartesian coordinates CN AN X Y Z

1

1 6 -0.000003050 0.000046017 0.000000586

2 6 0.000009707 -0.000001908 0.000016748

3 6 -0.000003806 0.000002246 -0.000009254

4 8 0.000007744 -0.000022231 -0.000002513

5 6 -0.000006171 -0.000031415 -0.000006993

6 6 -0.000001642 0.000006633 0.000006829

7 6 0.000001428 0.000002239 0.000003030

8 6 0.000002843 0.000001175 0.000003104

9 6 0.000006202 -0.000001955 -0.000001558

10 6 -0.000002367 -0.000004444 0.000004983

11 6 -0.000001696 -0.000004343 -0.000002963

12 6 0.000001241 0.000000475 -0.000003286

13 6 0.000002966 -0.000001139 0.000002132

14 6 -0.000003638 0.000008892 -0.000005424

15 6 0.000000664 -0.000004022 -0.000009217

16 6 -0.000004033 -0.000002056 0.000002141

17 7 -0.000003881 -0.000003033 0.000019400

18 6 -0.000010756 0.000003415 -0.000005437

19 6 -0.000001163 -0.000007095 -0.000000147

20 6 -0.000002334 0.000005217 0.000003301

21 6 0.000010187 -0.000002566 0.000013611

22 6 -0.000002590 0.000015244 -0.000001697

23 6 0.000004705 -0.000007651 -0.000004148

24 6 0.000001421 0.000004393 -0.000010450

25 6 -0.000002366 -0.000001141 0.000000919

26 1 -0.000001982 -0.000000900 -0.000001167

27 1 0.000002322 0.000000383 -0.000000748

28 1 0.000003833 -0.000001051 0.000000212

29 1 0.000003203 -0.000002667 0.000002383

30 1 0.000002952 -0.000002687 0.000001044

31 1 0.000000745 -0.000001871 0.000001063

32 1 0.000001546 0.000000716 -0.000001274

33 1 -0.000002432 0.000000461 -0.000003325

34 1 -0.000002370 -0.000000561 -0.000002722

35 1 -0.000001085 -0.000001607 -0.000003734

36 1 -0.000002025 0.000001164 -0.000004006

37 1 -0.000000086 0.000003956 0.000003030

38 1 0.000000588 0.000000411 0.000001401

39 1 0.000000561 -0.000001691 -0.000003108

40 1 -0.000001273 0.000001401 0.000000682

41 1 -0.000002250 0.000001394 -0.000000986

42 1 -0.000000617 0.000001449 -0.000000526

43 1 -0.000001582 0.000000758 -0.000001424

44 1 0.000000339 -0.000000002 -0.000000492

Appendix

142

2

1 6 0.000005387 -0.000004291 0.000005055

2 6 -0.000010761 -0.000000836 -0.000009164

3 6 0.000002187 -0.000000116 0.000009235

4 8 -0.000002039 -0.000001701 -0.000001332

5 6 0.000000563 0.000006155 0.000002983

6 6 -0.000002893 -0.000005957 -0.000001186

7 6 0.000010163 0.000000633 -0.000001397

8 6 -0.000013025 0.000000884 -0.000001906

9 6 0.000001471 -0.000002870 -0.000001129

10 6 0.000000979 0.000000460 -0.000001512

11 6 -0.000002885 0.000000903 -0.000001141

12 6 0.000001241 -0.000003722 -0.000002768

13 8 -0.000006310 0.000002407 0.000007805

14 6 0.000011419 0.000001828 -0.000007113

15 6 -0.000003881 -0.000001208 -0.000006346

16 7 0.000005697 0.000000083 0.000009302

17 6 -0.000000034 -0.000000185 -0.000000674

18 6 -0.000000526 0.000000430 -0.000001540

19 6 0.000000763 -0.000001891 0.000001662

20 6 -0.000001633 0.000004432 -0.000001487

21 6 -0.000000230 -0.000001176 -0.000000515

22 6 0.000000764 0.000001237 0.000000291

23 7 -0.000000637 -0.000009731 -0.000006108

24 6 0.000000173 0.000006351 0.000003561

25 6 -0.000000570 -0.000000985 0.000002625

26 6 -0.000000715 -0.000000172 0.000000424

27 6 0.000000294 0.000001569 -0.000002272

28 6 0.000001029 0.000003560 -0.000000204

29 6 0.000001304 -0.000005790 0.000000135

30 6 0.000001908 0.000004972 0.000003962

31 6 -0.000000009 0.000000505 0.000001836

32 1 0.000001095 0.000002038 0.000000116

33 1 -0.000000477 0.000000801 0.000002154

34 1 -0.000000579 0.000000581 0.000000512

35 1 -0.000000173 0.000000058 0.000000884

36 1 0.000000600 0.000001193 -0.000001678

37 1 0.000000266 -0.000000267 -0.000000440

38 1 -0.000001697 -0.000000939 -0.000002156

39 1 -0.000000049 0.000001418 -0.000001530

40 1 -0.000000988 0.000000958 -0.000001660

41 1 -0.000000035 0.000000265 -0.000001426

42 1 -0.000000035 0.000000240 -0.000000748

43 1 -0.000000894 -0.000001478 0.000000059

44 1 -0.000001051 -0.000000026 -0.000000076

45 1 0.000001538 0.000000578 0.000000977

46 1 0.000000254 0.000000002 -0.000000707

47 1 0.000000140 0.000000202 -0.000000846

48 1 -0.000000028 -0.000000064 -0.000002091

49 1 0.000000118 -0.000000411 -0.000001126

50 1 0.000000108 -0.000000189 0.000000669

51 1 0.000000021 0.000000105 0.000001935

52 1 0.000000547 0.000000935 0.000001451

53 1 -0.000001101 -0.000001152 -0.000001084

54 1 0.000001128 0.000000387 0.000002688

55 1 0.000000585 0.000000101 0.000000562

56 1 0.000000699 -0.000000653 0.000001224

57 1 0.000000816 -0.000000462 0.000001256

Appendix

143

3

1 6 -0.000002374 0.000007476 0.000004949

2 6 0.000002054 -0.000002553 -0.000004138

3 6 0.000001781 0.000003233 0.000001714

4 8 0.000001748 -0.000002198 -0.000001319

5 6 -0.000001364 0.000001494 0.000000223

6 6 -0.000001931 -0.000000292 -0.000000019

7 6 0.000000625 0.000003596 -0.000000494

8 6 0.000001917 -0.000004596 -0.000005448

9 6 0.000000256 0.000000398 0.000004834

10 6 -0.000001224 0.000000733 -0.000003669

11 7 -0.000005575 -0.000002626 -0.000007410

12 6 0.000008422 0.000011938 0.000000365

13 6 -0.000005594 -0.000008105 -0.000005551

14 6 0.000006020 0.000013871 0.000010474

15 6 0.000004224 -0.000008850 -0.000012944

16 6 -0.000007568 0.000002809 0.000006967

17 6 -0.000001471 -0.000001042 0.000005360

18 6 0.000000487 0.000001726 0.000000965

19 6 0.000001729 0.000000780 -0.000001226

20 6 0.000001918 -0.000009630 0.000003597

21 6 -0.000000113 0.000001721 0.000000444

22 6 -0.000000471 -0.000002961 0.000002398

23 6 -0.000000716 -0.000001259 -0.000000655

24 6 0.000000269 -0.000004837 0.000003692

25 6 -0.000001674 0.000005013 0.000000404

26 17 -0.000001129 -0.000002381 0.000001497

27 1 0.000000696 0.000000668 0.000002053

28 1 0.000001785 0.000000085 0.000001143

29 1 -0.000001276 0.000000258 -0.000001698

30 1 -0.000000889 0.000000322 -0.000000922

31 1 -0.000000302 -0.000000242 0.000000055

32 1 -0.000001018 0.000000843 -0.000001107

33 1 -0.000001468 -0.000004245 -0.000001668

34 1 0.000002528 0.000000933 0.000000111

35 1 0.000001925 -0.000001633 0.000000505

36 1 0.000000203 -0.000000605 -0.000000926

37 1 -0.000001016 0.000000861 -0.000001798

38 1 0.000000253 0.000001703 -0.000001758

39 1 0.000000037 0.000001681 -0.000001378

40 1 0.000000496 0.000001548 -0.000001383

41 1 -0.000000406 -0.000001817 0.000001532

42 1 -0.000001454 -0.000001223 0.000001198

43 1 -0.000000665 -0.000000942 0.000000464

44 1 0.000000324 -0.000001653 0.000000568

Appendix

144

4

1 6 -0.000000245 -0.000015843 0.000000210

2 6 -0.000002401 0.000001093 -0.000000207

3 6 -0.000000417 -0.000000467 0.000002526

4 8 -0.000003639 0.000006264 -0.000002060

5 6 0.000004245 0.000011963 0.000006543

6 6 -0.000001543 -0.000006354 -0.000002494

7 6 0.000004917 0.000001076 -0.000001337

8 6 -0.000003356 0.000003583 0.000005249

9 6 0.000000444 -0.000004398 -0.000000491

10 6 0.000001564 0.000000637 0.000000578

11 6 -0.000001944 0.000000809 0.000000800

12 9 0.000000538 -0.000000856 -0.000001947

13 9 0.000004028 -0.000000908 -0.000003148

14 9 0.000000921 -0.000001692 0.000005133

15 6 -0.000000333 0.000000610 -0.000002877

16 6 -0.000000538 -0.000000757 -0.000002757

17 6 -0.000001373 0.000000950 -0.000002915

18 6 0.000000322 -0.000001589 -0.000000629

19 6 -0.000000542 0.000000900 0.000000181

20 6 -0.000000380 0.000000327 -0.000003794

21 7 -0.000001036 -0.000002034 -0.000006681

22 6 0.000001327 0.000003093 0.000002785

23 6 -0.000000348 -0.000001088 0.000000036

24 6 0.000002724 0.000001074 0.000000806

25 6 -0.000001596 0.000000421 0.000003740

26 6 -0.000000742 0.000003271 -0.000003837

27 6 -0.000001544 -0.000000787 0.000002471

28 6 0.000000293 0.000001225 0.000002763

29 6 0.000000194 0.000000014 -0.000000486

30 1 0.000000418 0.000000071 0.000000134

31 1 -0.000000136 0.000000731 0.000002777

32 1 0.000001043 0.000000103 0.000002882

33 1 0.000000843 -0.000000491 0.000002367

34 1 0.000000315 -0.000000167 -0.000000046

35 1 -0.000000129 0.000000378 -0.000000595

36 1 -0.000000705 -0.000000345 -0.000003662

37 1 -0.000000373 -0.000000143 -0.000003847

38 1 -0.000000251 -0.000000050 -0.000001829

39 1 -0.000000360 -0.000000102 -0.000001482

40 1 -0.000000213 0.000000547 0.000002141

41 1 -0.000000489 0.000000486 0.000002563

42 1 0.000000398 -0.000000820 0.000000798

43 1 -0.000000833 0.000000359 -0.000000347

44 1 -0.000000636 0.000000477 -0.000000320

45 1 0.000000190 -0.000000257 0.000000363

46 1 0.000000767 -0.000000768 -0.000000694

47 1 0.000000608 -0.000000542 0.000000635

5

1 6 -0.000000917 0.000002148 0.000008893

2 6 0.000003255 0.000001271 0.000001916

3 6 0.000008782 0.000000812 -0.000000308

4 8 0.000000697 0.000002432 -0.000002872

5 6 -0.000000730 -0.000000215 0.000001289

6 6 -0.000000506 -0.000000282 0.000000693

7 6 0.000000158 -0.000001116 -0.000000112

8 6 -0.000002040 0.000000293 0.000004727

9 6 -0.000000977 -0.000001941 -0.000002262

10 6 -0.000000425 -0.000001220 0.000000695

11 7 0.000000789 0.000000843 0.000001810

12 6 0.000000051 -0.000002468 -0.000002802

13 6 -0.000001908 -0.000000064 0.000001955

14 6 0.000002927 0.000000667 0.000002162

15 6 -0.000009636 -0.000001427 -0.000003537

16 6 -0.000000238 -0.000002222 0.000000321

17 6 0.000004068 0.000003565 -0.000002278

18 6 -0.000000612 -0.000001321 0.000001087

19 6 -0.000000944 0.000000400 -0.000000221

20 8 0.000005673 0.000001600 0.000015939

Appendix

145

21 6 0.000009296 0.000008942 -0.000016977

22 8 -0.000007658 -0.000005574 -0.000001730

23 6 -0.000007464 -0.000008029 0.000019520

24 6 -0.000004329 0.000003487 -0.000027465

25 6 0.000014262 0.000006844 0.000009267

26 6 -0.000016985 -0.000010820 -0.000006638

27 6 0.000001716 0.000002631 -0.000011352

28 6 0.000007421 0.000004481 0.000005813

29 1 -0.000000072 -0.000001195 -0.000000209

30 1 -0.000000822 -0.000000312 -0.000000078

31 1 0.000000820 -0.000000992 0.000000472

32 1 0.000000284 -0.000001148 0.000000001

33 1 -0.000000408 0.000000005 -0.000000476

34 1 0.000000703 -0.000000088 0.000000325

35 1 0.000000539 0.000000726 -0.000000125

36 1 -0.000000563 -0.000000157 -0.000000086

37 1 -0.000000207 0.000000133 -0.000000517

38 1 -0.000000121 0.000000738 0.000000510

39 1 0.000000824 -0.000000528 0.000000913

40 1 0.000000207 -0.000000116 0.000001350

41 1 -0.000000347 -0.000000903 -0.000000185

42 1 -0.000000684 -0.000000563 0.000000885

43 1 -0.000000536 -0.000000500 -0.000000515

44 1 -0.000001539 0.000000708 0.000001319

45 1 -0.000001925 -0.000000769 -0.000001035

46 1 0.000000248 0.000001440 0.000001504

47 1 -0.000000128 -0.000000193 -0.000001585

6

1 6 -0.000012637 0.000029245 0.000006654

2 6 0.000015818 -0.000001369 0.000013279

3 6 -0.000006526 -0.000009624 -0.000006276

4 8 0.000011426 -0.000011925 -0.000000264

5 6 -0.000001131 -0.000020433 -0.000015415

6 6 0.000001827 0.000005072 0.000011336

7 6 0.000000334 -0.000001511 0.000003382

8 6 0.000013312 0.000007952 0.000001249

9 6 0.000005907 0.000000908 -0.000001180

10 6 -0.000008047 -0.000004047 0.000003204

11 9 -0.000002459 -0.000010057 0.000000683

12 6 -0.000001780 -0.000003358 -0.000004536

13 6 -0.000000180 0.000000580 -0.000004932

14 6 0.000002483 -0.000001438 0.000000547

15 6 -0.000006187 0.000008278 -0.000004855

16 6 0.000001178 -0.000003977 -0.000006192

17 6 -0.000004291 -0.000001583 -0.000000266

18 7 -0.000004314 -0.000002658 0.000008506

19 6 -0.000010476 0.000002919 -0.000000531

20 6 -0.000001292 -0.000003972 -0.000000796

21 6 -0.000004025 0.000002312 0.000004528

22 6 0.000005593 0.000010956 0.000000025

23 6 0.000000185 0.000004550 0.000003692

24 6 0.000006558 -0.000005177 -0.000000610

25 6 0.000000771 0.000004515 -0.000006641

26 6 -0.000001125 -0.000001359 0.000003236

27 1 -0.000001938 -0.000003190 0.000004858

28 1 0.000002180 0.000001582 0.000002257

29 1 0.000005617 0.000001497 0.000000683

30 1 0.000002284 -0.000003294 0.000002894

31 1 0.000000250 -0.000002619 -0.000000684

32 1 0.000002397 0.000000005 -0.000002329

33 1 -0.000002967 0.000000624 -0.000004794

34 1 -0.000003036 -0.000000532 -0.000004554

35 1 -0.000001874 -0.000001143 -0.000004468

36 1 -0.000002413 0.000001548 -0.000004187

37 1 0.000000529 0.000003549 0.000004318

38 1 0.000001078 0.000001421 0.000001953

39 1 -0.000000741 -0.000000054 -0.000002451

40 1 -0.000000141 0.000002356 0.000001477

Appendix

146

41 1 -0.000001888 0.000001773 -0.000001689

42 1 0.000000327 0.000001622 -0.000000745

43 1 -0.000001295 0.000000352 -0.000000904

44 1 0.000000707 -0.000000300 0.000000538

7

1 6 0.000000263 -0.000002846 -0.000000061

2 6 0.000000057 0.000001263 0.000001454

3 6 -0.000000105 -0.000000014 0.000002205

4 8 -0.000000484 0.000000472 0.000000514

5 6 0.000001578 0.000001591 0.000001030

6 6 0.000001485 -0.000000891 0.000001965

7 6 0.000001870 -0.000000647 0.000002021

8 6 0.000002336 -0.000001011 0.000000785

9 6 0.000001142 -0.000000428 -0.000000367

10 6 -0.000000286 -0.000001253 -0.000000290

11 6 0.000000685 -0.000001721 -0.000000341

12 6 -0.000001610 -0.000000111 -0.000003890

13 6 -0.000001307 -0.000000614 -0.000003975

14 6 -0.000001360 0.000000326 -0.000002911

15 6 -0.000000649 -0.000000220 -0.000002023

16 6 -0.000001581 0.000000551 -0.000003260

17 6 -0.000001707 0.000000138 -0.000002433

18 7 -0.000000783 0.000000627 0.000005274

19 6 -0.000000821 0.000001316 -0.000001670

20 6 -0.000000569 0.000000538 0.000002240

21 6 0.000000527 0.000001360 0.000002032

22 6 -0.000000304 0.000000776 0.000002185

23 6 -0.000000501 0.000001625 -0.000000504

24 6 -0.000001029 0.000000305 -0.000000936

25 6 -0.000001127 0.000000873 -0.000002612

26 6 -0.000000304 0.000000079 -0.000000031

27 1 0.000000663 0.000000006 0.000002617

28 1 0.000000455 0.000000969 0.000003472

29 1 0.000002328 -0.000000682 0.000002890

30 1 0.000002483 -0.000001231 0.000002457

31 1 0.000000670 -0.000000929 -0.000001067

32 1 0.000000104 -0.000000245 -0.000001055

33 1 0.000003080 -0.000001789 0.000001548

34 1 0.000002672 -0.000002170 0.000000060

35 1 0.000001829 -0.000000699 0.000001424

36 1 -0.000002052 -0.000000141 -0.000004895

37 1 -0.000001655 -0.000000446 -0.000004942

38 1 -0.000001002 -0.000000318 -0.000002879

39 1 -0.000001683 0.000000432 -0.000002716

40 1 -0.000000356 0.000001332 0.000002329

41 1 0.000000044 0.000001067 0.000003292

42 1 -0.000000156 -0.000000079 0.000000042

43 1 -0.000001343 0.000001538 0.000001100

44 1 -0.000001852 0.000001279 -0.000000681

45 1 -0.000000187 0.000000447 0.000000553

46 1 0.000000020 -0.000000275 -0.000000773

47 1 0.000000524 -0.000000153 0.000000823

8

1 6 0.000020849 -0.000000797 -0.000018457

2 6 -0.000013177 0.000010498 0.000001562

3 6 0.000008409 0.000009858 0.000000403

4 8 -0.000006825 -0.000001494 0.000007098

5 6 -0.000000087 0.000003333 0.000000326

6 6 -0.000001451 -0.000002032 -0.000000019

7 6 0.000001281 0.000000188 -0.000000194

8 6 0.000005854 0.000002389 -0.000002187

9 6 -0.000000450 -0.000001958 -0.000001315

10 6 0.000001476 -0.000000552 -0.000000727

11 7 -0.000002881 -0.000001632 0.000003013

12 6 0.000003321 0.000004800 -0.000003469

13 6 -0.000002343 -0.000005571 0.000001667

14 6 0.000002765 0.000010946 0.000004173

15 6 -0.000008291 -0.000016475 0.000001661

16 6 0.000005036 0.000003166 -0.000000526

Appendix

147

17 6 -0.000009241 -0.000003528 0.000000067

18 6 -0.000000006 0.000000117 -0.000002094

19 6 0.000000529 -0.000000282 0.000000519

20 6 -0.000003769 -0.000012050 0.000006869

21 16 -0.000004732 -0.000003182 0.000007910

22 6 -0.000000542 0.000012726 -0.000014923

23 6 0.000002310 0.000006492 -0.000004905

24 6 -0.000002343 -0.000007565 0.000016548

25 1 0.000002128 -0.000001572 -0.000000264

26 1 0.000000178 -0.000000560 0.000000706

27 1 0.000000067 -0.000000301 -0.000000466

28 1 -0.000000215 0.000000214 -0.000000915

29 1 -0.000000072 0.000000014 -0.000000655

30 1 -0.000000341 0.000000506 -0.000000771

31 1 0.000000175 0.000000970 -0.000000577

32 1 0.000001353 -0.000001930 0.000000604

33 1 -0.000000656 -0.000000430 -0.000000250

34 1 -0.000000113 -0.000000067 0.000000271

35 1 0.000000146 -0.000000086 0.000000180

36 1 0.000000020 0.000000436 -0.000000404

37 1 0.000000178 0.000000015 -0.000000329

38 1 0.000000332 0.000000136 -0.000000057

39 1 0.000001573 -0.000003697 0.000003122

40 1 -0.000001937 -0.000001353 -0.000001099

41 1 0.000001495 0.000000312 -0.000002097

9

1 6 0.000000458 -0.000016797 -0.000027164

2 6 -0.000009286 -0.000002275 0.000002749

3 6 0.000005989 0.000009311 0.000001144

4 8 0.000006042 -0.000001585 0.000007678

5 6 -0.000002159 0.000001702 -0.000000638

6 6 -0.000001610 0.000003897 -0.000000471

7 6 0.000002802 0.000003328 -0.000000746

8 6 -0.000006912 -0.000001436 -0.000005925

9 6 0.000003596 0.000000007 -0.000005410

10 6 -0.000002223 -0.000001883 -0.000001025

11 7 0.000001852 0.000000950 0.000004247

12 6 -0.000004530 -0.000000681 0.000005340

13 6 0.000001499 -0.000000026 -0.000001123

14 6 0.000002579 -0.000004577 0.000002726

15 6 -0.000002319 -0.000002621 -0.000000099

16 6 0.000000566 -0.000000037 -0.000003285

17 6 0.000007772 0.000004099 0.000005506

18 6 -0.000003750 0.000000428 -0.000001825

19 6 -0.000002603 0.000000771 -0.000002038

20 6 0.000001708 0.000016431 0.000023921

21 6 0.000002209 -0.000008924 -0.000001393

22 6 0.000001735 -0.000003179 0.000005431

23 6 -0.000001149 -0.000001686 -0.000001726

24 6 -0.000001737 0.000001079 -0.000000795

25 6 -0.000001839 0.000000812 -0.000006705

26 35 -0.000000400 -0.000001887 0.000000565

27 1 0.000004824 -0.000002851 -0.000001466

28 1 0.000001236 -0.000001810 -0.000000805

29 1 0.000000531 0.000002851 -0.000000679

30 1 0.000001493 0.000001464 -0.000000884

31 1 0.000000484 0.000000999 0.000001315

32 1 -0.000001085 0.000003142 0.000001069

33 1 -0.000000560 0.000000549 -0.000000172

34 1 -0.000000646 -0.000000502 -0.000001058

35 1 0.000001395 0.000000429 0.000001689

36 1 0.000000205 0.000002039 -0.000003675

37 1 -0.000001675 0.000001995 0.000002473

38 1 -0.000002579 0.000002262 0.000000136

39 1 -0.000000318 0.000000092 0.000000127

40 1 -0.000001424 -0.000000189 -0.000000229

41 1 -0.000000893 -0.000000997 0.000000795

42 1 -0.000001206 -0.000001482 -0.000000845

Appendix

148

43 1 0.000001390 -0.000001632 0.000000704

44 1 0.000000537 -0.000001581 0.000002563

10

1 6 -0.000013947 -0.000008364 0.000024202

2 6 -0.000002657 -0.000016791 -0.000010074

3 6 0.000005463 0.000011580 -0.000000548

4 8 0.000008547 0.000001360 -0.000010796

5 6 0.000000312 0.000001391 0.000002380

6 6 0.000001011 0.000000983 0.000000675

7 6 0.000000549 0.000000295 0.000001152

8 6 -0.000009502 -0.000001562 0.000001210

9 6 -0.000001728 0.000006856 -0.000006834

10 6 0.000000787 -0.000001214 0.000004555

11 7 0.000007110 -0.000007136 0.000018747

12 6 -0.000003074 -0.000002686 -0.000007821

13 6 -0.000001202 0.000000956 0.000004789

14 6 0.000001559 -0.000004744 -0.000001981

15 6 -0.000002210 -0.000000815 0.000002755

16 6 -0.000003984 -0.000004762 -0.000004497

17 6 0.000011295 0.000008160 0.000000988

18 6 -0.000004862 0.000003269 -0.000004812

19 6 -0.000003277 -0.000000467 0.000001415

20 6 0.000003873 0.000015963 -0.000022679

21 6 -0.000001179 -0.000002536 -0.000001030

22 6 0.000003545 -0.000004562 -0.000001827

23 6 -0.000000808 0.000002739 -0.000000452

24 6 0.000001462 0.000002690 -0.000003253

25 6 -0.000001479 -0.000003420 0.000007828

26 17 -0.000002148 0.000001844 -0.000000519

27 1 0.000004287 -0.000000324 -0.000000347

28 1 0.000000027 -0.000001567 -0.000001589

29 1 -0.000000262 0.000002044 0.000002498

30 1 0.000000720 0.000001617 0.000001935

31 1 0.000000890 0.000001494 0.000000181

32 1 -0.000000314 0.000001018 0.000001914

33 1 -0.000000198 -0.000000804 -0.000000392

34 1 0.000001232 -0.000001220 -0.000000504

35 1 0.000001874 0.000000653 0.000000617

36 1 -0.000000777 0.000000647 0.000000415

37 1 0.000000177 -0.000000377 0.000003050

38 1 -0.000001847 -0.000000823 0.000002025

39 1 -0.000001521 -0.000000099 0.000001956

40 1 -0.000001039 -0.000001125 0.000001001

41 1 -0.000000298 -0.000000795 -0.000001963

42 1 0.000000788 -0.000000293 -0.000001022

43 1 0.000001460 0.000000244 -0.000000266

44 1 0.000001344 0.000000685 -0.000003085

11

1 6 -0.000002954 0.000016443 -0.000012367

2 6 -0.000012066 0.000003650 0.000003206

3 6 0.000005797 -0.000001726 -0.000002509

4 8 0.000002263 -0.000004701 0.000001724

5 6 0.000000564 0.000000324 -0.000000401

6 6 -0.000000805 -0.000000784 0.000000829

7 6 0.000001161 -0.000001369 0.000000412

8 6 -0.000006983 0.000000333 0.000001223

9 6 0.000000479 0.000006460 0.000002178

10 6 0.000000869 -0.000000368 -0.000001694

11 7 0.000002216 -0.000011663 -0.000012554

12 6 -0.000002319 0.000005243 0.000004459

13 6 -0.000000061 -0.000000061 -0.000001563

14 6 0.000000647 0.000004526 -0.000000725

15 6 -0.000005754 -0.000002979 0.000005192

16 6 0.000001303 -0.000003627 -0.000002154

17 6 0.000005379 -0.000002074 0.000003654

18 6 0.000000017 0.000004542 0.000005120

19 6 -0.000000292 0.000000103 0.000000162

20 6 0.000011094 -0.000018236 0.000004400

21 6 -0.000007823 0.000010071 -0.000000258

Appendix

149

22 6 0.000004382 -0.000004301 -0.000001352

23 6 -0.000000811 0.000003326 0.000003693

24 6 -0.000001354 0.000000355 -0.000002733

25 6 0.000001659 0.000002724 0.000003179

26 17 0.000003331 -0.000002980 0.000001686

27 1 0.000001254 0.000000859 0.000001783

28 1 -0.000000119 -0.000002792 -0.000002899

29 1 -0.000000141 -0.000000016 0.000000001

30 1 -0.000000021 0.000000000 -0.000000100

31 1 -0.000000003 -0.000000015 0.000000071

32 1 -0.000000188 -0.000000163 0.000000188

33 1 -0.000000111 -0.000000393 0.000000288

34 1 -0.000000398 -0.000000489 0.000000246

35 1 -0.000000610 -0.000000281 -0.000000892

36 1 0.000000707 0.000000594 -0.000000138

37 1 0.000000156 0.000001018 -0.000000484

38 1 -0.000000129 -0.000000689 -0.000000343

39 1 -0.000000005 -0.000000343 -0.000000221

40 1 -0.000000097 -0.000000035 -0.000000051

41 1 0.000000096 0.000000452 -0.000000043

42 1 0.000000177 -0.000000265 -0.000000303

43 1 -0.000000174 -0.000000090 0.000000216

44 1 -0.000000332 -0.000000585 -0.000000126

12

1 6 -0.000003771 -0.000005174 0.000000741

2 6 0.000002717 0.000000045 -0.000000009

3 6 -0.000000715 -0.000001061 0.000002023

4 8 0.000000340 0.000000502 -0.000000522

5 6 0.000004466 0.000004597 -0.000000075

6 6 0.000000877 -0.000002481 0.000000319

7 6 0.000003539 -0.000000152 0.000000239

8 6 0.000000489 -0.000000044 0.000000488

9 6 0.000000993 -0.000002123 -0.000001605

10 6 0.000000230 -0.000000806 -0.000000459

11 9 0.000002069 -0.000000219 0.000001491

12 6 -0.000001090 -0.000000252 -0.000002206

13 6 -0.000001123 -0.000000486 -0.000002432

14 6 -0.000000846 -0.000000292 -0.000001429

15 6 -0.000001042 0.000000715 -0.000001349

16 6 -0.000000559 0.000000241 -0.000001061

17 6 -0.000001652 0.000000334 -0.000001320

18 7 -0.000001238 -0.000000436 -0.000000567

19 6 -0.000001033 0.000001148 0.000001462

20 6 -0.000000016 -0.000000442 0.000000956

21 6 0.000000404 0.000001063 0.000001240

22 6 0.000000625 0.000001572 0.000001399

23 6 -0.000000394 0.000001182 -0.000000090

24 6 0.000000165 -0.000000492 0.000000370

25 6 -0.000000580 0.000001689 0.000000955

26 6 -0.000000584 0.000000529 0.000000750

27 1 0.000000949 0.000000159 0.000001088

28 1 0.000000625 0.000000793 0.000002482

29 1 0.000002329 -0.000000863 0.000000841

30 1 0.000001936 -0.000001352 -0.000000691

31 1 0.000000657 -0.000000850 -0.000001340

32 1 0.000000228 -0.000000473 -0.000001286

33 1 -0.000001630 -0.000000362 -0.000002739

34 1 -0.000001206 -0.000000693 -0.000003049

35 1 -0.000000598 -0.000000642 -0.000002048

36 1 -0.000001534 0.000000208 -0.000001220

37 1 -0.000000539 0.000001308 0.000002190

38 1 0.000000027 0.000001017 0.000002576

39 1 -0.000000031 -0.000000542 -0.000000163

40 1 -0.000001454 0.000000937 0.000000951

41 1 -0.000001693 0.000000861 0.000000599

42 1 -0.000000511 0.000000784 0.000001140

43 1 -0.000000070 0.000000154 0.000000139

44 1 0.000000244 0.000000396 0.000001221

Appendix

150

13

1 6 -0.000003788 0.000011923 0.000010914

2 6 0.000003004 -0.000005077 -0.000006946

3 6 0.000007071 0.000007869 -0.000000638

4 8 -0.000001271 0.000001846 -0.000002419

5 6 -0.000002528 -0.000002486 -0.000000200

6 6 -0.000001007 0.000001144 -0.000003076

7 6 -0.000000395 0.000001349 -0.000001136

8 6 -0.000000026 -0.000002381 -0.000004144

9 6 -0.000000401 -0.000000929 0.000001853

10 6 -0.000001800 -0.000003096 -0.000000512

11 7 0.000006189 0.000001025 -0.000006550

12 6 -0.000008793 0.000001310 0.000001469

13 6 0.000007489 -0.000000395 0.000000662

14 6 0.000000548 -0.000002107 0.000001511

15 6 -0.000008884 -0.000003971 -0.000002006

16 6 0.000004861 0.000002081 -0.000002977

17 6 -0.000001242 -0.000000082 0.000003998

18 6 -0.000002359 -0.000000841 0.000004138

19 6 0.000000836 -0.000000824 0.000000647

20 6 0.000000778 -0.000006786 -0.000002920

21 6 0.000000467 0.000002953 0.000003182

22 6 -0.000003658 0.000003912 -0.000004733

23 6 0.000007476 -0.000007477 0.000002844

24 6 -0.000005745 0.000007880 -0.000000252

25 6 -0.000003004 -0.000001938 -0.000004169

26 8 0.000004409 -0.000002394 -0.000000391

27 6 0.000000649 0.000000977 0.000002668

28 8 -0.000001923 0.000000585 0.000004058

29 6 0.000001147 -0.000001352 -0.000000907

30 1 -0.000000851 0.000000290 0.000001410

31 1 -0.000001097 -0.000001002 -0.000001544

32 1 0.000001003 -0.000000984 -0.000000566

33 1 0.000000281 -0.000001979 -0.000000517

34 1 -0.000000679 -0.000001580 -0.000000701

35 1 0.000000166 0.000000169 -0.000000481

36 1 -0.000000901 0.000001506 -0.000000222

37 1 0.000000130 0.000000500 -0.000001069

38 1 -0.000000619 -0.000000482 -0.000000009

39 1 -0.000000551 0.000000915 -0.000001571

40 1 0.000000995 0.000000103 -0.000000171

41 1 0.000000481 0.000000652 0.000000294

42 1 0.000000815 0.000000364 0.000001855

43 1 -0.000000803 -0.000000178 0.000000941

44 1 -0.000001026 0.000001669 -0.000000438

45 1 0.000002173 -0.000001487 0.000000337

46 1 0.000000561 -0.000000474 0.000000502

47 1 -0.000000586 0.000000555 0.000001635

48 1 0.000000715 0.000000005 0.000000814

49 1 0.000000439 0.000000954 0.000001968

50 1 -0.000000144 -0.000001111 0.000001981

51 1 0.000001188 -0.000001232 -0.000000232

52 1 0.000000210 0.000000110 0.000001817

Appendix

151

14

1 6 0.000001667 -0.000015735 0.000012842

2 6 -0.000007798 -0.000002308 -0.000001298

3 6 0.000011391 0.000014491 0.000000457

4 8 0.000002293 0.000003062 -0.000006049

5 6 -0.000000899 -0.000000059 -0.000000207

6 6 -0.000000454 0.000002121 0.000000805

7 6 0.000001467 0.000001562 -0.000001736

8 6 -0.000008593 -0.000003864 0.000005984

9 6 0.000000466 0.000001895 -0.000004312

10 6 -0.000000726 -0.000001267 0.000002682

11 7 0.000009840 -0.000003859 0.000013148

12 6 -0.000010675 0.000001492 -0.000006164

13 6 0.000006301 -0.000000347 0.000002472

14 6 0.000001204 -0.000003627 -0.000003115

15 6 -0.000009749 -0.000008825 0.000001964

16 6 0.000001508 -0.000000759 -0.000002404

17 6 0.000008649 0.000006294 -0.000004788

18 6 -0.000004041 -0.000001189 -0.000004674

19 6 0.000000069 0.000000153 0.000000130

20 6 -0.000001978 0.000017268 -0.000013129

21 6 0.000002091 -0.000006538 -0.000000570

22 6 0.000000459 0.000000468 -0.000000674

23 6 0.000001090 -0.000002482 0.000001741

24 6 -0.000002651 0.000006949 -0.000002453

25 6 -0.000001473 -0.000005510 0.000007267

26 6 0.000001017 -0.000000809 -0.000001111

27 6 -0.000001080 0.000002311 -0.000000061

28 1 0.000002372 -0.000001205 -0.000002703

29 1 -0.000000073 -0.000002247 -0.000000261

30 1 0.000000688 0.000000820 0.000000917

31 1 0.000000487 0.000000531 -0.000000152

32 1 0.000000323 0.000000486 -0.000000749

33 1 -0.000000601 0.000000898 0.000000639

34 1 -0.000001107 -0.000000242 -0.000000078

35 1 -0.000000205 -0.000000636 0.000000583

36 1 0.000001030 0.000000269 -0.000000672

37 1 -0.000001190 0.000000547 0.000001403

38 1 -0.000000386 -0.000000072 0.000000487

39 1 -0.000000316 0.000000420 0.000001827

40 1 0.000000016 -0.000000810 0.000001320

41 1 -0.000000274 -0.000000998 0.000000379

42 1 -0.000000891 0.000000616 0.000000349

43 1 0.000000866 -0.000000060 0.000000307

44 1 0.000000186 0.000000904 -0.000002400

45 1 -0.000000391 -0.000000514 0.000000517

46 1 -0.000000946 -0.000000589 0.000000081

47 1 0.000000305 -0.000000097 0.000000578

48 1 0.000000518 0.000000641 0.000000295

49 1 0.000000327 0.000000465 0.000000577

50 1 -0.000000135 -0.000000015 0.000000006

15

1 6 0.000008162 -0.000017837 -0.000002323

2 6 -0.000008618 0.000001749 0.000001814

3 6 0.000003462 0.000005606 0.000004070

4 8 -0.000004583 0.000009666 0.000001028

5 6 0.000006219 0.000005729 0.000002135

6 6 0.000002048 -0.000000293 0.000000557

7 6 -0.000002021 -0.000000045 0.000003135

8 6 0.000002605 -0.000006838 0.000001376

9 6 0.000002066 0.000004350 -0.000000104

10 6 -0.000005643 -0.000002849 -0.000000297

11 8 0.000001601 0.000002401 -0.000003174

12 6 0.000002192 -0.000000875 0.000001785

13 9 0.000002910 -0.000002507 0.000002695

14 6 -0.000001375 0.000000353 -0.000004087

15 6 -0.000001188 -0.000000569 -0.000004317

16 6 -0.000001740 0.000000432 -0.000002903

17 6 -0.000001571 0.000000894 -0.000003899

Appendix

152

18 6 -0.000002307 0.000001679 -0.000001464

19 6 -0.000000140 0.000000232 -0.000003883

20 7 -0.000000135 -0.000004066 -0.000003559

21 6 -0.000000423 0.000006480 0.000002552

22 6 0.000000271 -0.000003034 0.000001121

23 6 0.000003904 0.000004264 0.000001864

24 6 -0.000003438 -0.000006545 0.000001449

25 6 -0.000002275 0.000004294 -0.000002787

26 6 -0.000001227 -0.000003745 0.000005198

27 6 0.000000651 0.000001757 0.000001376

28 6 0.000000245 -0.000000047 0.000000577

29 1 0.000000965 0.000000677 0.000002717

30 1 0.000000122 0.000000274 0.000004322

31 1 0.000001184 -0.000000202 0.000003825

32 1 -0.000001626 -0.000001355 -0.000001570

33 1 0.000000218 0.000000593 -0.000001462

34 1 0.000001864 -0.000001840 -0.000000326

35 1 0.000001475 -0.000001254 -0.000002355

36 1 0.000000720 0.000000729 -0.000000928

37 1 -0.000001524 0.000000188 -0.000005473

38 1 -0.000001372 -0.000000096 -0.000005786

39 1 -0.000000915 -0.000000222 -0.000002956

40 1 -0.000000860 0.000000458 -0.000002302

41 1 -0.000000025 0.000001265 0.000003651

42 1 -0.000000272 0.000000111 0.000004224

43 1 0.000000197 -0.000000523 0.000000982

44 1 -0.000000851 0.000001039 0.000001221

45 1 -0.000001245 0.000001267 -0.000000089

46 1 0.000000528 -0.000000144 0.000001435

47 1 0.000000741 -0.000000660 -0.000000477

48 1 0.000001028 -0.000000941 0.000001414

16

1 6 -0.000045981 0.000010417 0.000007197

2 6 0.000029447 0.000003099 -0.000009129

3 6 -0.000008299 -0.000016162 0.000010334

4 8 0.000018975 -0.000019707 -0.000004895

5 6 0.000011783 0.000016129 0.000000304

6 6 -0.000000773 -0.000001902 0.000010274

7 6 0.000001981 -0.000001122 0.000002517

8 6 -0.000004319 0.000003024 -0.000015453

9 6 0.000004196 0.000000391 0.000002266

10 6 -0.000004146 -0.000006013 0.000007164

11 6 0.000003411 -0.000000340 -0.000009531

12 9 0.000005892 -0.000001944 -0.000002636

13 6 -0.000000889 -0.000001060 -0.000001452

14 6 -0.000001215 0.000001388 -0.000001726

15 6 0.000001142 -0.000001150 0.000000353

16 6 -0.000004675 0.000001637 -0.000005046

17 6 0.000003352 -0.000006137 0.000000534

18 6 -0.000002512 0.000002081 -0.000004588

19 7 -0.000000425 0.000014976 -0.000014645

20 6 -0.000005627 -0.000012027 0.000005089

21 6 -0.000002767 0.000008520 -0.000002475

22 6 -0.000006083 -0.000006268 -0.000001475

23 6 0.000003053 0.000018874 0.000002072

24 6 0.000000019 -0.000007346 -0.000005711

25 6 0.000008771 0.000003940 0.000006565

26 6 -0.000009365 -0.000005478 0.000009425

27 6 0.000002067 0.000002490 -0.000001256

28 1 0.000000430 0.000000979 -0.000002610

29 1 0.000000919 0.000001964 0.000004487

30 1 -0.000000437 0.000000294 0.000000736

31 1 0.000000144 0.000000863 -0.000002054

32 1 0.000000686 0.000001030 -0.000001544

33 1 0.000002085 -0.000002011 0.000007931

34 1 0.000002524 -0.000002064 0.000004487

35 1 0.000000953 -0.000004709 0.000007382

36 1 -0.000001747 -0.000000281 -0.000002936

Appendix

153

37 1 -0.000001218 -0.000000613 -0.000002993

38 1 -0.000000974 -0.000000266 -0.000001759

39 1 -0.000000772 0.000000092 -0.000000958

40 1 0.000000115 0.000000404 0.000002602

41 1 0.000000154 0.000001431 0.000002856

42 1 -0.000000857 -0.000002615 0.000003101

43 1 0.000001337 0.000002117 -0.000002098

44 1 0.000001665 0.000002611 -0.000001544

45 1 -0.000000695 0.000001120 0.000000217

46 1 -0.000001115 -0.000000464 -0.000000335

47 1 -0.000000211 -0.000000193 0.000000956

17

1 6 -0.000002426 0.000000512 -0.000001029

2 6 0.000003871 -0.000004230 -0.000000769

3 6 -0.000000668 0.000000069 -0.000002333

4 8 -0.000000233 -0.000003485 0.000000333

5 6 0.000000567 0.000002480 -0.000001305

6 6 0.000000531 0.000001132 -0.000001280

7 6 0.000003705 0.000000467 -0.000004465

8 6 -0.000002795 0.000001789 0.000002567

9 6 -0.000002085 0.000003444 0.000000695

10 6 0.000001718 0.000002170 -0.000001161

11 7 0.000000662 0.000000794 -0.000000576

12 6 0.000000487 0.000002691 -0.000000466

13 6 -0.000000091 0.000001317 -0.000001033

14 6 0.000000548 -0.000000255 -0.000001757

15 6 0.000004925 0.000002046 0.000002105

16 6 -0.000001523 -0.000000331 -0.000001446

17 6 0.000003480 0.000001363 -0.000003112

18 6 -0.000000596 0.000000198 0.000000788

19 6 -0.000001982 0.000000606 0.000003147

20 6 0.000000434 -0.000001832 0.000000410

21 6 -0.000001943 0.000001494 0.000003204

22 6 -0.000003708 -0.000002963 0.000003655

23 6 -0.000001625 0.000000004 0.000000989

24 6 -0.000001826 0.000001897 0.000003303

25 9 -0.000002380 -0.000001266 0.000003616

26 6 0.000003365 0.000002225 -0.000000815

27 6 0.000000973 -0.000003409 -0.000001584

28 6 -0.000000179 -0.000002279 0.000002800

29 6 0.000000337 -0.000005042 -0.000001220

30 6 -0.000000010 -0.000000810 0.000001404

31 6 -0.000000122 -0.000004029 -0.000002394

32 17 -0.000000987 -0.000004137 0.000001964

33 1 0.000002336 0.000000653 -0.000005151

34 1 0.000002139 0.000000776 -0.000004704

35 1 0.000000279 0.000002317 -0.000000832

36 1 0.000002003 0.000001930 -0.000003008

37 1 0.000001388 0.000001655 -0.000001827

38 1 -0.000001096 0.000002199 0.000001197

39 1 -0.000001194 0.000000827 0.000000684

40 1 -0.000000695 0.000000659 0.000000188

41 1 0.000001942 0.000001206 -0.000001594

42 1 -0.000001364 0.000002538 0.000002037

43 1 -0.000001129 0.000002620 0.000000936

44 1 -0.000001162 -0.000000169 0.000000734

45 1 -0.000000768 -0.000002357 0.000001957

46 1 -0.000003034 0.000000365 0.000003996

47 1 -0.000002802 0.000002364 0.000002271

48 1 -0.000000100 -0.000001435 0.000000836

49 1 -0.000000489 -0.000003451 0.000000961

50 1 0.000001061 -0.000003572 -0.000001129

51 1 0.000002260 -0.000001755 -0.000001785

Appendix

154

18

1 6 -0.000002392 0.000013555 0.000007636

2 6 0.000007207 -0.000011040 -0.000003831

3 6 0.000005100 0.000006459 -0.000000324

4 8 -0.000000023 -0.000003003 -0.000000846

5 6 0.000001101 0.000002482 -0.000000903

6 6 0.000000857 0.000002000 -0.000001404

7 6 0.000004466 0.000001370 -0.000005038

8 6 -0.000008605 -0.000000599 0.000003877

9 6 -0.000000266 0.000006024 0.000002010

10 6 0.000001474 0.000001839 -0.000002278

11 7 0.000002565 0.000002603 -0.000000058

12 6 -0.000002266 0.000000728 -0.000004393

13 6 0.000001447 0.000000822 0.000001005

14 6 0.000000937 0.000001856 -0.000000863

15 6 -0.000002582 -0.000005279 -0.000002614

16 6 -0.000000011 0.000000760 0.000000741

17 6 0.000008415 0.000001377 -0.000004346

18 6 -0.000000286 -0.000000332 0.000001338

19 6 -0.000001681 0.000001335 0.000002701

20 6 -0.000000472 -0.000000986 -0.000000367

21 6 -0.000000248 0.000001780 0.000002938

22 6 -0.000004579 -0.000003042 0.000003553

23 6 -0.000000741 0.000000895 -0.000000185

24 6 -0.000001781 0.000002119 0.000003266

25 6 -0.000002067 -0.000000979 0.000002812

26 6 0.000001003 -0.000006924 -0.000009285

27 6 0.000000238 -0.000001164 0.000000563

28 6 -0.000000567 -0.000003584 -0.000001264

29 6 0.000000948 -0.000001392 0.000004590

30 6 -0.000000057 -0.000001550 -0.000000406

31 6 0.000000031 -0.000003433 -0.000000621

32 17 -0.000000468 -0.000005289 -0.000001661

33 1 0.000002172 -0.000000333 -0.000002414

34 1 0.000002080 0.000000056 -0.000003464

35 1 0.000000599 0.000002408 -0.000000620

36 1 0.000001919 0.000002006 -0.000002436

37 1 0.000001621 0.000001516 -0.000001220

38 1 -0.000001084 0.000002406 0.000001041

39 1 -0.000000799 0.000001238 0.000000172

40 1 -0.000000162 0.000000199 -0.000000899

41 1 0.000002204 0.000000856 -0.000002218

42 1 -0.000001567 0.000000837 0.000001512

43 1 -0.000000652 0.000002493 0.000000200

44 1 -0.000000758 -0.000000713 0.000000657

45 1 -0.000001618 -0.000001535 0.000001284

46 1 -0.000002876 0.000000725 0.000003264

47 1 -0.000002248 0.000002004 0.000001714

48 1 -0.000002918 -0.000001006 0.000003222

49 1 -0.000002158 -0.000001695 0.000002364

50 1 -0.000002516 -0.000001960 0.000002741

51 1 -0.000000081 -0.000001626 0.000000329

52 1 -0.000000495 -0.000002941 0.000000510

53 1 0.000001165 -0.000003279 -0.000000879

54 1 0.000001479 -0.000001063 -0.000001206

Appendix

155

19

1 6 -0.000003179 -0.000004704 -0.000002212

2 6 0.000000691 -0.000000251 -0.000002103

3 6 -0.000001217 -0.000003481 -0.000002897

4 8 0.000000038 0.000002903 -0.000000580

5 6 0.000002117 0.000005278 0.000001354

6 6 -0.000004231 -0.000001489 -0.000003837

7 6 -0.000001095 0.000002632 -0.000002350

8 6 -0.000003484 0.000002656 0.000000831

9 6 -0.000000807 0.000001560 0.000001566

10 6 -0.000002417 -0.000000564 0.000002849

11 17 -0.000003880 0.000001767 0.000000421

12 6 0.000003991 0.000000058 0.000006125

13 6 0.000003068 -0.000000274 0.000006591

14 6 0.000002183 0.000000187 0.000004771

15 6 0.000001686 0.000000936 0.000002697

16 6 0.000002557 -0.000000860 -0.000001209

17 6 0.000003027 0.000000239 0.000005295

18 7 0.000001582 -0.000004914 0.000005536

19 6 0.000000403 0.000003394 -0.000001969

20 6 0.000001122 -0.000004985 -0.000003302

21 6 0.000001985 0.000000647 -0.000004837

22 6 -0.000000571 0.000000028 -0.000002810

23 6 -0.000000159 0.000001902 0.000000619

24 6 0.000002093 -0.000003338 -0.000001075

25 6 0.000003032 0.000000489 -0.000003210

26 6 -0.000001858 -0.000001224 -0.000001074

27 6 0.000001656 0.000000741 -0.000001308

28 6 -0.000000477 0.000000050 0.000000789

29 6 -0.000003432 0.000000271 -0.000001114

30 6 -0.000002857 0.000002560 -0.000001329

31 6 -0.000001120 0.000000099 0.000001372

32 1 -0.000001443 -0.000001028 -0.000005491

33 1 -0.000000766 -0.000000990 -0.000005504

34 1 -0.000003112 0.000001386 -0.000003576

35 1 -0.000005191 0.000001708 -0.000002991

36 1 -0.000000615 0.000001037 0.000003404

37 1 0.000001049 0.000000910 0.000002411

38 1 0.000003971 -0.000000046 0.000007853

39 1 0.000003587 0.000000629 0.000008397

40 1 0.000002377 0.000000213 0.000004963

41 1 0.000003156 -0.000000968 0.000003462

42 1 0.000000206 -0.000001621 -0.000004634

43 1 -0.000000843 -0.000001634 -0.000006108

44 1 0.000000664 -0.000001686 0.000001022

45 1 0.000001663 -0.000002279 -0.000004437

46 1 0.000004526 -0.000002468 0.000000391

47 1 0.000001915 -0.000001769 -0.000000934

48 1 -0.000000561 0.000000029 -0.000001003

49 1 -0.000004038 0.000002389 -0.000000313

50 1 -0.000004656 0.000002539 0.000000050

51 1 -0.000002336 0.000001331 -0.000000563

Appendix

156

20

1 6 -0.000010461 0.000016231 -0.000007361

2 6 0.000014637 -0.000014618 0.000002000

3 6 -0.000001170 0.000001789 -0.000004260

4 8 0.000004174 -0.000002502 -0.000004488

5 6 -0.000003105 0.000004766 -0.000002149

6 6 -0.000000244 0.000003075 -0.000003505

7 6 -0.000001977 0.000005114 -0.000005393

8 6 0.000005202 0.000002079 -0.000001922

9 6 -0.000008326 0.000001064 0.000002255

10 6 0.000002420 0.000000820 -0.000004633

11 7 0.000005047 -0.000001778 -0.000007620

12 6 0.000008770 -0.000002520 0.000000806

13 6 0.000003684 0.000009600 0.000010511

14 6 -0.000000566 -0.000034796 -0.000020747

15 6 -0.000008057 0.000006183 0.000009135

16 6 0.000003806 0.000008259 -0.000006007

17 6 -0.000004801 0.000001445 -0.000002044

18 6 0.000000381 0.000008051 0.000009563

19 6 -0.000004571 -0.000005322 0.000000431

20 6 -0.000002040 -0.000003125 0.000002895

21 6 -0.000000224 -0.000003251 0.000005826

22 6 0.000002396 -0.000001407 0.000000499

23 6 0.000000995 -0.000005283 0.000000146

24 6 0.000002285 -0.000002019 0.000003931

25 9 -0.000001084 0.000003314 0.000002667

26 6 -0.000001317 -0.000008152 0.000009308

27 6 0.000003131 -0.000002004 0.000002046

28 6 -0.000001786 0.000001412 0.000003853

29 6 -0.000001447 0.000001724 0.000002154

30 6 -0.000002850 0.000002923 0.000003445

31 6 -0.000002149 -0.000001029 -0.000004592

32 17 -0.000003693 0.000003376 0.000004173

33 1 0.000002388 -0.000001740 -0.000006711

34 1 0.000002909 -0.000001146 -0.000003909

35 1 -0.000000905 0.000004286 -0.000002435

36 1 -0.000000140 0.000005163 -0.000005977

37 1 0.000000920 0.000002635 -0.000004259

38 1 -0.000000726 0.000002174 0.000001239

39 1 -0.000000888 -0.000001498 -0.000000928

40 1 0.000002063 0.000004896 0.000005480

41 1 0.000000251 0.000000292 -0.000003965

42 1 -0.000000943 -0.000000805 0.000001318

43 1 -0.000001755 0.000001419 0.000003617

44 1 -0.000000462 -0.000001721 0.000002644

45 1 0.000001411 -0.000005642 0.000003022

46 1 0.000001860 -0.000005571 0.000001758

47 1 0.000000979 -0.000002707 0.000001756

48 1 -0.000000911 -0.000000971 0.000002624

49 1 -0.000001968 0.000000351 0.000004686

50 1 -0.000002156 0.000003155 -0.000000437

51 1 0.000001016 0.000004013 -0.000000444

Appendix

157

21

1 6 -0.000000405 -0.000011249 0.000010399

2 6 -0.000005744 0.000003737 -0.000000983

3 6 0.000001887 0.000003888 0.000001620

4 8 0.000001215 0.000003370 -0.000005319

5 6 0.000001876 -0.000002588 0.000003209

6 6 0.000001575 -0.000003583 0.000004601

7 6 0.000002283 -0.000003540 0.000004833

8 6 -0.000001918 0.000001454 0.000000253

9 6 -0.000001162 -0.000002529 0.000001995

10 6 0.000001662 -0.000001043 0.000002696

11 7 0.000004155 0.000007453 -0.000002328

12 6 -0.000008551 -0.000003731 0.000005764

13 6 0.000005589 0.000000168 -0.000003261

14 6 -0.000001644 0.000001580 -0.000000142

15 6 -0.000005750 0.000000378 0.000004031

16 6 0.000002586 0.000001213 -0.000002558

17 6 0.000003305 -0.000002720 0.000006859

18 6 0.000000516 -0.000005367 -0.000001657

19 6 -0.000003558 0.000002285 0.000000103

20 6 -0.000002693 0.000003968 -0.000007759

21 6 -0.000000582 0.000002994 0.000001266

22 6 -0.000006290 0.000004968 0.000001528

23 6 0.000000116 0.000001675 -0.000002742

24 6 0.000000373 0.000000202 0.000002992

25 9 0.000000869 0.000000017 -0.000001982

26 9 -0.000003653 0.000004462 -0.000001514

27 6 0.000000971 0.000009514 -0.000009965

28 6 0.000000299 -0.000002821 -0.000000478

29 6 0.000001977 -0.000000967 -0.000003021

30 6 0.000002832 -0.000002909 -0.000003253

31 6 0.000000505 0.000000014 -0.000005088

32 6 0.000002070 -0.000001294 -0.000002516

33 17 0.000003356 -0.000002846 -0.000005397

34 1 -0.000000064 0.000000936 0.000000904

35 1 -0.000001145 0.000000283 0.000002588

36 1 0.000001567 -0.000003436 0.000004302

37 1 0.000001550 -0.000003288 0.000005767

38 1 0.000000636 -0.000002273 0.000005052

39 1 0.000000902 -0.000002465 0.000001197

40 1 -0.000001628 -0.000001199 -0.000002709

41 1 -0.000000512 0.000001952 -0.000000676

42 1 -0.000000203 -0.000001408 0.000004710

43 1 0.000000482 0.000000500 -0.000001689

44 1 0.000000832 -0.000000545 0.000000102

45 1 -0.000002795 0.000004072 -0.000003065

46 1 -0.000003064 0.000002912 0.000001775

47 1 -0.000001293 0.000000766 0.000001575

48 1 0.000001654 -0.000001063 0.000000142

49 1 0.000002634 -0.000002986 -0.000001525

50 1 0.000002432 -0.000000282 -0.000005911

51 1 -0.000000053 0.000001371 -0.000004725

Appendix

158

22

1 6 -0.000012863 -0.000009099 0.000027887

2 6 0.000002756 -0.000013252 -0.000003203

3 6 0.000007935 0.000014903 -0.000001548

4 8 0.000007442 0.000000303 -0.000003480

5 6 -0.000000807 0.000003577 -0.000003963

6 6 0.000000494 0.000002764 -0.000004499

7 6 -0.000001224 0.000001288 -0.000006273

8 6 -0.000010334 -0.000006657 -0.000000018

9 6 -0.000003356 0.000005126 -0.000002353

10 6 0.000004393 0.000003434 -0.000006987

11 7 0.000002643 0.000004218 -0.000010627

12 6 -0.000002608 -0.000006868 0.000004254

13 6 -0.000000980 0.000001033 0.000000281

14 6 -0.000000102 0.000000855 -0.000000186

15 6 -0.000004241 -0.000013322 -0.000002755

16 6 -0.000001410 -0.000002972 -0.000005079

17 6 0.000010015 0.000009661 0.000002429

18 6 0.000001716 -0.000002141 0.000002675

19 6 -0.000002959 0.000010913 -0.000001203

20 6 0.000001914 -0.000004708 -0.000000931

21 6 0.000003135 0.000000154 0.000009575

22 6 0.000000830 0.000002315 -0.000000097

23 6 0.000003441 -0.000009097 -0.000003395

24 6 0.000004726 -0.000007927 0.000006783

25 9 0.000002389 -0.000006123 0.000002352

26 6 0.000001470 0.000018347 -0.000020384

27 6 -0.000000416 -0.000009636 0.000009487

28 6 -0.000002641 0.000005337 -0.000000427

29 6 -0.000003411 -0.000000788 0.000007468

30 6 -0.000003030 0.000005126 -0.000002542

31 6 0.000000017 -0.000001166 0.000008547

32 17 -0.000004434 0.000004334 0.000002365

33 1 0.000002226 0.000000519 0.000001664

34 1 0.000000641 -0.000001830 -0.000003683

35 1 -0.000001821 0.000002048 -0.000004665

36 1 -0.000002080 0.000002812 -0.000006959

37 1 0.000000948 0.000002178 -0.000004680

38 1 -0.000001963 0.000000535 0.000000887

39 1 0.000000534 -0.000001540 0.000001031

40 1 0.000000663 -0.000001317 0.000001079

41 1 0.000001248 0.000001326 -0.000002015

42 1 -0.000000501 -0.000000886 -0.000000667

43 1 -0.000000604 -0.000000515 0.000002275

44 1 0.000000347 -0.000002561 0.000001877

45 1 0.000004086 -0.000003659 0.000000897

46 1 0.000003754 -0.000001739 -0.000001139

47 1 0.000000944 0.000000699 -0.000001955

48 1 -0.000000385 0.000000219 0.000003630

49 1 -0.000002314 -0.000000412 0.000006539

50 1 -0.000003036 0.000002987 0.000002723

51 1 -0.000003186 0.000001204 -0.000000994

Appendix

159

23

1 6 -0.000009834 -0.000006857 -0.000002839

2 6 0.000009038 0.000002586 0.000006305

3 6 -0.000007015 -0.000008730 -0.000014712

4 8 -0.000002497 -0.000001697 0.000003871

5 6 0.000005858 0.000008849 0.000000992

6 6 -0.000005227 -0.000004626 -0.000002866

7 6 -0.000000235 0.000003960 -0.000000914

8 6 -0.000003874 0.000003258 0.000003137

9 6 -0.000000604 0.000004765 0.000004444

10 6 -0.000009140 -0.000002254 0.000005158

11 17 -0.000004900 0.000001979 0.000002770

12 6 0.000003985 0.000000685 0.000004794

13 6 0.000003367 0.000000174 0.000005937

14 6 0.000002403 0.000000029 0.000006204

15 6 -0.000000643 0.000004978 -0.000003351

16 6 0.000000136 0.000000275 -0.000001835

17 6 0.000004772 0.000002228 0.000006240

18 7 0.000006410 -0.000015523 0.000003148

19 6 -0.000003067 0.000010156 0.000002565

20 6 0.000002128 -0.000008059 -0.000006362

21 6 0.000005773 0.000001489 -0.000003642

22 6 0.000002633 0.000002080 0.000009641

23 6 -0.000003804 0.000010742 -0.000008802

24 6 0.000006848 -0.000009023 0.000003010

25 6 0.000003318 0.000002070 -0.000002433

26 6 -0.000007825 -0.000005220 -0.000007610

27 6 0.000009262 0.000005482 -0.000009404

28 6 0.000003666 0.000000666 0.000009467

29 6 -0.000011309 -0.000011697 -0.000002113

30 6 0.000002240 0.000008997 -0.000010154

31 6 0.000000704 -0.000002752 0.000011497

32 17 -0.000002381 0.000003130 -0.000005556

33 1 -0.000003645 0.000000387 -0.000004389

34 1 -0.000001063 0.000000085 -0.000005363

35 1 -0.000004819 0.000002229 -0.000000762

36 1 -0.000006708 0.000001531 0.000000130

37 1 -0.000000401 0.000001331 0.000004623

38 1 0.000003114 0.000001870 0.000003166

39 1 0.000004537 -0.000000218 0.000006468

40 1 0.000004613 0.000001516 0.000007294

41 1 0.000003217 0.000001078 0.000004315

42 1 0.000003302 -0.000001381 0.000000579

43 1 -0.000000376 -0.000000471 -0.000003277

44 1 -0.000002097 -0.000002162 -0.000005870

45 1 0.000003609 -0.000001457 0.000002528

46 1 0.000002740 -0.000003374 -0.000004710

47 1 0.000002837 -0.000004225 -0.000001629

48 1 -0.000000282 -0.000002133 -0.000002521

49 1 -0.000001337 -0.000001357 -0.000005015

50 1 -0.000004530 0.000001659 -0.000000064

51 1 -0.000002898 0.000002952 -0.000002089

Appendix

160

24

1 6 -0.000001169 0.000002971 0.000001256

2 6 0.000000558 -0.000004440 -0.000008397

3 6 -0.000002299 0.000000481 0.000003322

4 8 0.000001275 0.000000700 0.000000477

5 6 -0.000001527 0.000000783 -0.000000479

6 6 -0.000000376 -0.000000834 -0.000000506

7 6 -0.000000376 0.000000338 -0.000001924

8 6 -0.000004703 -0.000002660 0.000001113

9 6 -0.000000840 0.000002756 0.000004446

10 6 0.000000616 -0.000001492 -0.000002517

11 7 -0.000000406 0.000003398 -0.000012724

12 6 0.000000975 -0.000003151 0.000003892

13 6 -0.000000978 0.000002216 0.000000398

14 6 0.000000927 -0.000005225 0.000000358

15 6 0.000002523 0.000006454 -0.000001820

16 6 -0.000003515 -0.000003802 -0.000004454

17 6 0.000005935 0.000004693 0.000000337

18 6 0.000001670 -0.000002552 0.000006920

19 6 -0.000000244 -0.000002865 -0.000001343

20 6 0.000000565 -0.000001022 0.000000137

21 6 0.000001896 -0.000002214 0.000002095

22 6 -0.000000483 -0.000000155 0.000000800

23 6 0.000000464 -0.000000287 -0.000000414

24 6 -0.000000478 -0.000000517 0.000001884

25 7 0.000006080 0.000002554 0.000006280

26 6 -0.000001235 0.000004290 -0.000005017

27 6 0.000000361 -0.000001772 0.000002183

28 6 -0.000002208 0.000001766 0.000000872

29 6 0.000002022 -0.000001648 0.000000684

30 6 -0.000001728 0.000005765 -0.000003113

31 6 -0.000000135 0.000000073 0.000006958

32 17 -0.000000944 0.000002438 0.000000502

33 1 -0.000000135 0.000000721 0.000001196

34 1 -0.000001174 -0.000000346 0.000002053

35 1 -0.000000160 -0.000000075 -0.000000567

36 1 -0.000000639 -0.000000153 -0.000001849

37 1 -0.000000375 -0.000000269 -0.000001062

38 1 0.000000664 -0.000000370 -0.000000122

39 1 0.000001233 -0.000000182 0.000000387

40 1 -0.000001129 -0.000000465 -0.000000004

41 1 0.000000103 0.000000367 -0.000000547

42 1 -0.000000444 0.000001147 -0.000000737

43 1 0.000000159 0.000000695 0.000001967

44 1 0.000001343 0.000000133 0.000001515

45 1 0.000000957 -0.000000256 0.000000932

46 1 0.000000578 -0.000002518 -0.000000270

47 1 0.000000844 -0.000000932 -0.000000266

48 1 -0.000000685 0.000001367 0.000000259

49 1 0.000000021 0.000001608 0.000001118

50 1 -0.000000664 0.000000928 0.000000322

51 1 -0.000001268 0.000000336 -0.000001383

52 8 -0.000001061 -0.000005883 -0.000002149

53 8 -0.000000389 -0.000002891 -0.000002999

References

161

[1] J.R. Stepherson, C.E. Ayala, T.H. Tugwell, J.L. Henry, F.R. Fronczek, R. Kartika,

Carbazole Annulation via Cascade Nucleophilic Addition–CyclizationInvolving 2-(Silyloxy)

pentadienyl Cation, J Org. Lett 18(12) (2016) 3002-3005.

[2] G.-M. Tang, R.-H. Chi, W.-Z. Wan, Z.-Q. Chen, T.-X. Yan, Y.-P. Dong, Y.-T. Wang, Y.-

Z. Cui, Tunable photoluminescent materials based on two phenylcarbazole-based dimers

through the substituent groups, J. Lumin 185 (2017) 1-9.

[3] A. Głuszyńska, Biological potential of carbazole derivatives, Eur. J. Med. Chem 94 (2015)

405-426.

[4] L. S Tsutsumi, D. Gündisch, D. Sun, Carbazole scaffold in medicinal chemistry and

natural products: a review from 2010-2015, J Curr. Top. Med. Chem 16(11) (2016) 1290-

1313.

[5] E. Asma, D. Banibrata, Y. Deepthi, X. Liping, A. Tamara, E.A.R. Maarten, K.D. Aloke

Design, Synthesis, and Pharmacological Characterization of Carbazole Based Dopamine

Agonists as Potential Symptomatic and Neuroprotective Therapeutic Agents for Parkinson’s

Disease, J ACS Chem. Neurosci 10 (2019) 396−411.

[6] L.-X. Liu, X.-Q. Wang, B. Zhou, L.-J. Yang, Y. Li, H.-B. Zhang, X.-D. Yang, Synthesis

and antitumor activity of novel N-substituted carbazole imidazolium salt derivatives, Sci. Rep

5 (2015) 13101.

[7] M.S. Sinicropi, D. Iacopetta, C. Rosano, R. Randino, A. Caruso, C. Saturnino, N. Muià, J.

Ceramella, F. Puoci, M. Rodriquez, N-thioalkylcarbazoles derivatives as new anti-

proliferative agents: synthesis, characterisation and molecular mechanism evaluation, J

Enzyme Inhib Med Chem 33(1) (2018) 434-444.

[8] W. Wang, X. Sun, D. Sun, S. Li, Y. Yu, T. Yang, J. Yao, Z. Chen, L. Duan, Carbazole

aminoalcohols induce antiproliferation and apoptosis of human tumor cells by inhibiting

topoisomerase I, J ChemMedChem 11(24) (2016) 2675-2681.

[9] A. Głuszyńska, B. Juskowiak, M. Kuta-Siejkowska, M. Hoffmann, S. Haider, Carbazole

ligands as c-myc G-quadruplex binders, Int. J. Biol. Macromol 114 (2018) 479-490.

[10] M.H. Kaulage, B. Maji, S. Pasadi, A. Ali, S. Bhattacharya, K. Muniyappa, Targeting G-

quadruplex DNA structures in the telomere and oncogene promoter regions by

benzimidazole‒carbazole ligands, Eur. J. Med. Chem 148 (2018) 178-194.

[11] P.-H. Li, H. Jiang, W.-J. Zhang, Y.-L. Li, M.-C. Zhao, W. Zhou, L.-Y. Zhang, Y.-D.

Tang, C.-Z. Dong, Z.-S. Huang, Synthesis of carbazole derivatives containing chalcone

analogs as non-intercalative topoisomerase II catalytic inhibitors and apoptosis inducers, Eur.

J. Med. Chem 145 (2018) 498-510.

References

162

[12] R. Fleming, A. Miller, C. Stewart, Etoposide: an update, J Clin Pharm 8(4) (1989) 274-

293.

[13] K. R. Hande, Etoposide: four decades of development of a topoisomerase II inhibitor.

Eur. J. Cancer 34, 1514-1521 (1998).

[14] G.L. Patrick, An Introduction to Medicinal Chemistry, OUP Oxford, United Kingdom,

2013.

[15] P.W. Ayers, J.S. Anderson, L.J. Bartolotti, Perturbative perspectives on the chemical

reaction prediction problem, Int. J. Quantum Chem 101(5) (2005) 520-534.

[16] P. Geerlings, F. De Proft, W. Langenaeker, Conceptual density functional theory, Chem.

Rev 103(5) (2003) 1793-1874.

[17] R.G. Parr, W. Yang, Density-Functional Theory of Atoms and Molecules, Oxford

University Press, New York, 1989.

[18] C. Hansch, R.M. Muir, T. Fujita, P.P. Maloney, F. Geiger, M. Streich, The correlation of

biological activity of plant growth regulators and chloromycetin derivatives with Hammett

constants and partition coefficients, J. Am. Chem. Soc 85(18) (1963) 2817-2824.

[19] M. Alloui, S. Belaidi, H. Othmani, N. E. Jaidane, M. Hochlaf, Imidazole derivatives as

angiotensin II AT1 receptor blockers: Benchmarks, drug-like calculations and quantitative

structure-activity relationships modeling, Chem. Phys. Lett. 696 (2018) 70-78.

[20] S. Boudergua, M. Alloui, S. Belaidi, M.M. Al Mogren, U.A.E. Ibrahim, M.Hochlaf,

QSAR Modeling and Drug-Likeness Screening for Antioxidant Activity of Benzofuran

Derivatives, J. Mol. Struct. 1189 (2019) 307-314.

[21] M. Manachou, Z. Gouid, Z. Almi, S. Belaidi, S. Boughdiri, M. Hochlaf, Pyrazolo [1, 5-

a][1, 3, 5] triazin-2-thioxo-4-ones derivatives as thymidine phosphorylase inhibitors:

Structure, drug-like calculations and quantitative structure-activity relationships (QSAR)

modeling, J. Mol. Struct. 1199 (2020) 127027.

[22] I. Almi, S. Belaidi, E. Zerroug, M. Alloui, R. B. Said, R. Linguerri, M. Hochlaf, QSAR

investigations and structure-based virtual screening on a series of nitrobenzoxadiazole

derivatives targeting human glutathione-S-transferases, J. Mol. Struct. (2020) 128015.

[23] LuanaJanaína de Campos, E.B.d. Melo, A QSAR study of integrase strand transfer

inhibitors based on a large set of pyrimidine, pyrimidone, and pyridopyrazine carboxamide

derivatives, J. Mol. Struct 1141 (2017) 252-260.

[24] M. Ghamali, S. Chtita, A. Ousaa, B. Elidrissi, M. Bouachrine, T. Lakhlifi, QSAR

analysis of the toxicity of phenols and thiophenols using MLR and ANN, J Taibah Univ Sci

11(1) (2017) 1-10.

References

163

[25] P. Maldivi, L. Petit, C. Adamo, V. Vetere, Theoretical description of metal–ligand

bonding within f-element complexes: A successful and necessary interplay between theory

and experiment, J C. R. Chim 10(10-11) (2007) 888-896.

[26] A. Toropova, A. Toropov, E. Benfenati, G. Gini, Co-evolutions of correlations for QSAR

of toxicity of organometallic and inorganic substances: an unexpected good prediction based

on a model that seems untrustworthy, J Chemometr. Intell. Lab 105(2) (2011) 215-219.

[27] J. Thijssen, Computational Physics, Cambridge University Press2007.

[28] A.D. Becke, Density‐functional thermochemistry. III. The role of exact exchange, J.

Chem. Phys 98(7) (1993) 5648-5652.

[29] M. Frisch, G. Trucks, H.B. Schlegel, G. Scuseria, M. Robb, J. Cheeseman, G. Scalmani,

V. Barone, B. Mennucci, G. Petersson, H.C. Nakatsuji, M.; Li, X., H.P.I. Hratchian, A. F.;

Bloino, J., G.S. Zheng, J. L., M.E. Hada, M.; Toyota, , R.H. K.; Fukuda, J., M.N. Ishida, T.,

Y.K. Honda, O.; Nakai, H.; Vreven, , J.A. T.; Montgomery, Jr.; Peralta, , F.B. J. E.; Ogliaro,

M., J.J.B. Heyd, E., K.N.S. Kudin, V. N., R.N. Kobayashi, J., K.R. Raghavachari, J.C.I. A.;

Burant, S. S, J.C. Tomasi, M.; Rega,, N.J.K. N.; Millam, M., J.E.C. Knox, J. B.; Bakken, V.,

C.J. Adamo, J.; Gomperts, R., R.E.Y. Stratmann, O., A.J.C. Austin, R., C.O. Pomelli, J. W.,

R.L.M. Martin, K., V.G.V. Zakrzewski, G. A., P.D. Salvador, J. J., S.D. Dapprich, A. D., Ö.F.

Farkas, J. B., J.V.C. Ortiz, J., D.J. Fox, Gaussian 09, Gaussian, Inc., Wallingford, CT (2009).

[30] R.E. Gerkin, W.J. Reppart, The structure of carbazole at 168 K, Acta Cryst 42(4) (1986)

480-482.

[31] A. Bree, R. Zwarich, Vibrational Assignment of Carbazole from Infrared, Raman, and

Fluorescence Spectra, J. Chem. Phys 49(8) (1968) 3344-3355.

[32] S. Premkumar, T. Rekha, R.M. Asath, T. Mathavan, A.M.F. Benial, Vibrational

spectroscopic, molecular docking and density functional theory studies on 2-acetylamino-5-

bromo-6-methylpyridine, Eur. J. Pharm. Sci 82 (2016) 115-125.

[33] K. Fukui, Role of frontier orbitals in chemical reactions, J Science 218(4574) (1982)

747-754.

[34] R. Parr, W. Yang, Density functional approach to the frontier-electron theory of chemical

reactivity, J. Am. Chem. Soc 106(14) (1984) 4049-4050.

[35] P.W. Ayers, R.G. Parr, Variational principles for describing chemical reactions: the

Fukui function and chemical hardness revisited, J. Am. Chem. Soc 122(9) (2000) 2010-2018.

[36] C. Morell, A. Grand, A. Toro-Labbe, New dual descriptor for chemical reactivity, J.

Phys. Chem 109(1) (2005) 205-212.

[37] F. Guégan, P. Mignon, V. Tognetti, L. Joubert, C. Morell, Dual descriptor and molecular

electrostatic potential: complementary tools for the study of the coordination chemistry of

ambiphilic ligands, Phys. Chem. Chem. Phys 16(29) (2014) 15558-15569.

References

164

[38] HyperChem, (Molecular modeling system) Hypercube, Inc., 1115NW, 4th Street,

Gainesville, FL 32601, USA (2008).

[39] MOE, Chemical Computing Group (CCG): Montreal, Canada

[40] P.V. Rysselberghe, Remarks concerning the Clausius-Mossotti law, J. Phys. Chem 36(4)

(1932) 1152-1155.

[41] G. Pèpe, G. Guiliani, S. Loustalet, P. Halfon, Hydration free energy a fragmental model

and drug design, Eur. J. Med. Chem 37(11) (2002) 865-872.

[42] T. Salah, S. Belaidi, N. Melkemi, N. Tchouar, Molecular Geometry, Electronic

Properties, MPO Methods and Structure Activity/Property Relationship Studies of 1, 3, 4-

Thiadiazole Derivatives by Theoretical Calculations, Rev. Theor. Sci 3(4) (2015) 355-364.

[43] V. Stella, R. Borchardt, M. Hageman, R. Oliyai, H. Maag, J. Tilley, Prodrugs:

Challenges and Rewards, Springer, New York, 2007.

[44] C.A. Lipinski, F. Lombardo, B.W. Dominy, P.J. Feeney, Experimental and

computational approaches to estimate solubility and permeability in drug discovery and

development settings, Adv. Drug Deliv. Rev 23(1-3) (1997) 3-25.

[45] D.F. Veber, S.R. Johnson, H.-Y. Cheng, B.R. Smith, K.W. Ward, K.D. Kopple,

Molecular properties that influence the oral bioavailability of drug candidates, J. Med. Chem

45(12) (2002) 2615-2623.

[46] A.K. Ghose, V.N. Viswanadhan, J.J. Wendoloski, A knowledge-based approach in

designing combinatorial or medicinal chemistry libraries for drug discovery. 1. A qualitative

and quantitative characterization of known drug databases, J. Comb. Chem 1(1) (1999) 55-68.

[47] K. Edward H, D. Li, Drug-like Properties: Concepts, Structure Design and Methods:

from ADME to Toxicity Optimization, Elsevier Science, United kingdom, 2008.

[48] L.J. Cseke, A. Kirakosyan, P.B. Kaufman, M.V. Westfall, Handbook of Molecular and

Cellular Methods in Biology and Medicine, CRC Press, USA, 2011.

[49] Y.H. Zhao, M.H. Abraham, J. Le, A. Hersey, C.N. Luscombe, G. Beck, B. Sherborne, I.

Cooper, Rate-limited steps of human oral absorption and QSAR studies, J Pharm Res 19(10)

(2002) 1446-1457.

[50] P.A. Bath, A.R. Poirrette, P. Willett, F.H. Allen, The extent of the relationship between

the graph-theoretical and the geometrical shape coefficients of chemical compounds, J. Chem.

Inf. Comput. Sci. 35(4) (1995) 714-716.

[51] K. Gohda, I. Mori, D. Ohta, T. Kikuchi, A CoMFA analysis with conformational

propensity: an attempt to analyze the SAR of a set of molecules with different conformational

flexibility using a 3D-QSAR method, J Comput Aided Mol Des 14(3) (2000) 265-275.

References

165

[52] R.G. Hill, Drug Discovery and Development: Technology in Transition, Elsevier Health

Sciences, London, 2012.

[53] F. Ghasemi, A. Mehridehnavi, A. Pérez-Garrido, H. Pérez-Sánchez, Neural network and

deep-learning algorithms used in QSAR studies: merits and drawbacks, Drug Discov. Today

23(10) (2018) 1784-1790.

[54] JMP.8.0.2, SAS Institute Inc, 2009.

[55] XLSTAT, Software, XLSTATCompany, (2013). http://www.xlstat.com.

(Accessed17.04.2020)

[56] H. Van Der Voet, Comparing the predictive accuracy of models using a simple

randomization test, Chemom. Intell. Lab. Syst., 25(2) (1994) 313-323.

[57] C. Rücker, G. Rücker, M. Meringer, Y-randomization and its variants in QSPR/QSAR, J.

Chem. Inf. Model, 47(6) (2007) 2345-2357.

[58] A. Golbraikh, A. Tropsha, Beware of q2!, J. Mol. Graph 20(4) (2002) 269-276.

[59] A. Golbraikh, A. Tropsha, Predictive QSAR modeling based on diversity sampling of

experimental datasets for the training and test set selection, J. Comput. Aided Mol. Des 16(5-

6) (2002) 357–369.

[60] A. Tropsha, P. Gramatica, V.K. Gombar, The importance of being earnest: validation is

the absolute essential for successful application and interpretation of QSPR models, QSAR

Com. Sci 22(1) (2003) 69-77.

[61] A. Srivastava, N. Shukla, QSAR based modeling on a series of lactam fused chroman

derivatives as selective 5-HT transporters, J. Saudi Chem. Soc 16(4) (2012) 405-412.

[62] P.P. Roy, K. Roy, On some aspects of variable selection for partial least squares

regression models, QSAR Comb. Sci. 27(3) (2008) 302-313.

[63] K. Roy, S. Kar, R.N. Das, A primer on QSAR/QSPR modeling: Fundamental concepts,

Springer, New York, 2015.

[64] S. Erić, M. Kalinić, A. Popović, M. Zloh, I. Kuzmanovski, Prediction of aqueous

solubility of drug-like molecules using a novel algorithm for automatic adjustment of relative

importance of descriptors implemented in counter-propagation artificial neural networks, Int.

J. Pharm 437(1-2) (2012) 232-241.

[65] R. Lowe, H.Y. Mussa, J.B. Mitchell, R.C. Glen, Classifying molecules using a sparse

probabilistic kernel binary classifier, J. Chem. Inf. Model 51(7) (2011) 1539-1544.

[66] C. Feng, S. Vijaykumar, Applications of artificial neural network modeling in drug

discovery, Clin. Exp. Pharmacol 2(3) (2012) 2-3.

References

166

[67] B.D. Ripley, Pattern Recognition and neural networks, Cambridge University Press, NY

United States, 1996.

[68] T.J. Wendorff, B.H. Schmidt, P. Heslop, C.A. Austin, J.M.J.J.o.m.b. Berger, The

structure of DNA-bound human topoisomerase II alpha: conformational mechanisms for

coordinating inter-subunit interactions with DNA cleavage, 424(3-4) (2012) 109-124.

[69] R.A. Guedes, N. Aniceto, M.A. Andrade, J.A. Salvador, R.C. Guedes, Chemical Patterns

of Proteasome Inhibitors: Lessons Learned from Two Decades of Drug Design, Int. J. Mol.

Sci 20(21) (2019) 5326.

Recently, a series of carbazole derivatives containing chalcone analogues (CDCAs) were synthetized

as potent anticancer agents and apoptosis inducers. These compounds target the inhibition of

topoisomerase II and present cytotoxic activities. After comparison to experiment, we validated the

use of B3LYP, a density functional theory-based approach, to describe the structure and molecular

properties of the carbazole subunit and CDCAs compounds of interest. Then, we derived relationships

between the chemical descriptors and activity of these carbazole derivatives using multi-parameter

optimization and quantitative structure activity relationships (QSAR) approaches. For the QSAR

studies, we used multiple linear regression and artificial neural network statistical modelling. Our

predicted activities are in good agreement with the experimental ones. We found that the most

important parameter influencing the activity of the considered compounds is the octanol-water

partition coefficient, highlighting the importance of flexibility as a key molecular parameter to favor

cell membrane crossing and enhance the action of these CDCAs against topoisomerase II. Our results

provide useful guidelines for designing new oral active CDCAs medicaments for cytotoxic inhibition.

Keywords: Carbazole derivatives; Topo II inhibition; Molecular structure; QSAR model.

Récemment, une série de dérivés du carbazole contenant des analogues de la chalcone (CDCA) ont été

synthétisés en tant qu'agents anticancéreux puissants et inducteurs de l'apoptose. Ces composés ciblent

l'inhibition de la topoisomérase II et présentent des activités cytotoxiques. Après comparaison à l'expérience,

nous avons validé l'utilisation de B3LYP, une approche basée sur la théorie fonctionnelle de la densité, pour

décrire la structure et les propriétés moléculaires de la sous-unité carbazole et des composés CDCA d'intérêt.

Ensuite, nous avons dérivé des relations entre les descripteurs chimiques et l'activité de ces dérivés de

carbazole en utilisant des approches d'optimisation multi-paramètres et de relations quantitatives structure-

activité (QSAR). Pour les études QSAR, nous avons utilisé une régression linéaire multiple et une

modélisation statistique des réseaux de neurones artificiels. Nos activités prédites sont en bon accord avec

celles expérimentales. Nous avons constaté que le paramètre le plus important influençant l'activité des

composés considérés est le coefficient de partage octanol-eau, soulignant l'importance de la flexibilité en tant

que paramètre moléculaire clé pour favoriser le franchissement de la membrane cellulaire et renforcer l'action

de ces CDCA contre la topoisomérase II. Nos résultats fournissent des lignes directrices utiles pour la

conception de nouveaux médicaments CDCA actifs oraux pour l'inhibition cytotoxique.

Mots clés : Dérivés du carbazole ; Inhibition de Topo II; Structure moleculaire; Modèle QSAR.

Étude de la structure et des propriétés SAR/QSAR de ...

Documents

Transcript of Étude de la structure et des propriétés SAR/QSAR de ...