Designing novel antitrypanosomal agents from a mixed graph-theoretical substructural approach


ALEJANDRO SPECK PLANCHE,1,2 MARCUS TULIUS SCOTTI,2,3 VICENTE DE PAULO EMERENCIANO,2

AMERICA GARCIA LOPEZ,1 ENRIQUE MOLINA PEREZ,4,5,6 EUGENIO URIARTE6

1Department of Chemistry, Faculty of Natural Sciences, University of Oriente, Santiago de Cuba 90500, Cuba

2Institute of Chemistry, University of São Paulo, Postal Box 26777, São Paulo, 05513-970, Brazil

3Center of Applied Sciences and Education, Federal University of Paraiba, Campus IV, Rio Tinto 58297-000, Brazil

4Faculty of Chemistry, University of Camaguey, Camaguey 74650, Cuba

5Health and Food Technologies Development Group, University of Camaguey, Camaguey 74650, Cuba

6Department of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela, Santiago de Compostela 15982, Spain

Received 21 October 2008; Revised 28 April 2009; Accepted 7 June 2009

DOI 10.1002/jcc.21374

Published online 17 July 2009 in Wiley InterScience (www.interscience.wiley.com).

Abstract: Chagas disease is nowadays the most serious parasitic health problem. This disease is caused by Trypanosoma cruzi. The great number of deaths and the insufficient effectiveness of drugs against this parasite have alarmed the scientific community worldwide. In an attempt to overcome this problem, a model for the design and prediction of new antitrypanosomal agents was obtained. The model uses a mixed approach that combines simple fragment-based descriptors with topological substructural molecular design descriptors. The data set was made up of 188 compounds, 99 of them characterized by antitrypanosomal activity and 89 belonging to other pharmaceutical categories. The model showed sensitivity, specificity and accuracy values above 85%. Quantitative fragmental contributions were also calculated. Then, to confirm the quality of the model, 15 structures of molecules tested as antitrypanosomal compounds (not included in this study) were predicted, taking into account the calculated fragmental contributions. The model showed an accuracy of 100%, which means that the “in silico” methodology developed by our team is promising for the rational design of new antitrypanosomal drugs.

© 2009 Wiley Periodicals, Inc. J Comput Chem 31: 882–894, 2010

Key words: massive screening; antitrypanosomal; discriminant analysis; fragment contributions; graph theoretical

descriptors

Introduction

Chagas’ disease has been identified as a major parasitic health

problem. That is a consequence of the rising levels of urbanization and migration. Chagas’ disease is caused by Trypanosoma cruzi (T. cruzi) and was confined to Central and South America until 2004; currently about 13 million people are estimated to be infected with this parasite. It is calculated that

25–30% of chronically infected patients will eventually suffer

irreversible damage to the heart and digestive tract, resulting in

about 14,000 annual deaths and causing considerable morbidity.

The disease can be transmitted in three principal ways: by the

bite of bloodsucking triatomine insect vectors, by blood transfu-

sions or by congenital transmission; the countries of the South-

ern Cone of South America (Argentina, Bolivia, Brazil, Chile,

Paraguay, Uruguay) are the most affected by Chagas’ disease, because the main vector species there lives in or around human homes.1

Additional supporting information may be found in the online version of this article.

Correspondence to: A. S. Planche; e-mail: [email protected], [email protected]

Contract/grant sponsor: Xunta de Galicia; contract/grant number: 20304PR

Contract/grant sponsors: CAPES/MES-Cuba, CNPq, FAPESP

Medication for Chagas’ disease is generally effective when it

is given during early stages of the infection and no treatment

based on medication has been proven to be efficient in later

stages. Unfortunately, although nifurtimox (Nfx) and benznida-

zole (Bnz) (the only two drugs available for the clinical treat-

ment of this disease) were able to eliminate patent parasitemia

and to reduce serological titers in acute and early chronic infections, they are not active against all T. cruzi strains and have significantly lower efficacy in long-term chronic infections.2,3 Moreover, both drugs act via reduction of the nitro group. In the case of Nfx, this reduction generates an unstable nitro radical which reduces oxygen, and the resulting species are highly toxic. Bnz, on the other hand, acts through covalent modification of macromolecules by nitro reduction intermediates.4 All these drawbacks explain the need for the search and design of new drugs against T. cruzi.5,6

In modern drug design, there are mainly two ways to discover new drugs: “rational design” and “massive screening”. In rational design, some of the currently used methods require previous knowledge about the target and the mechanism of action involved, but this information is frequently limited or not available. In massive screening, knowledge about the target can help to better understand the drug-receptor interaction, but it is not a prerequisite for developing a rational design. In this sense, computer-aided drug design methods have developed rapidly in recent years.7 In particular, methods related to the definition and application of graph-theoretical descriptors have been very useful in quantitative structure-activity relationship (QSAR) studies to perform a rational analysis of biological activities.8 These descriptors make it possible to encode the structural information of molecules and express it as a number. Graph-based molecular descriptors have proven to be very efficient in drug design.9–12 Many works, using

graph-theoretical approaches, have been reported to design new

drugs, to model toxicities and to find common structural patterns

of the known drugs. We can mention the design of analgesic,

antiviral, antibacterial and antimycobacterial compounds.13–16

Other descriptors, such as those based on fragmental contributions, have also been used to model pharmacological activities. Free-Wilson analysis can be considered a pioneer in this sense.8 This approach expresses, by means of indicator variables, the contribution of a given structural feature at defined positions to the biological activity. An advantage of the indicator variables of the Free-Wilson analysis is that they have been extended and can be readily combined with other types of descriptors.8,17,18

In the context of compounds active against T. cruzi, some works on biological assays and on QSAR studies applied to compounds with antitrypanosomal activity have been published.5,6,19–28 However, massive screening for new antitrypanosomal compounds using graph-theoretical approaches has not been sufficiently explored. One of the three reported works is the virtual screening of compounds as inhibitors of trypanothione reductase.29 However, the principal disadvantage of this work seems to be its restriction to the inhibition of only one target (trypanothione reductase), whereas several potential targets of T. cruzi have been discovered and studied in the last 15 years.30–35

The other two works can be considered promising in this area. One of them uses a non-stochastic quadratic fingerprints-based approach to obtain a model of good statistical quality; some of the designed compounds were synthesized and their antitrypanosomal activity was confirmed.36 The other study published in the field uses the Markov chains invariants for simulation and design (MARCH-INSIDE) approach, which takes into consideration not only the molecular structure of a given drug, but also the average probability that an atom interacts with a specific site of the target in a given microorganism. This methodology has made it possible, through a single unified equation, to model the antimicrobial activity37 of several drugs against several protozoa species. Knowledge about fragmental features and how they are spatially connected is decisive for the development of new drugs. The aim of the present work was to develop a model, using a mixed graph-theoretical substructural approach based on fragment descriptors, to obtain a molecular pattern for the search, design and prediction of new compounds with antitrypanosomal activity, reporting for the first time quantitative contributions of essential fragmental features for the development of antitrypanosomal activity.

Methods

Fragment Descriptors

The principal utility of fragment-based molecular descriptors is that they indicate which fragments have a positive (favorable) influence on the activity and which ones have an undesirable influence. They make it possible to reorient the design of bioactive compounds so as to minimize the number of fragments with negative contribution and maximize the number of fragments with positive contribution. In the present work, we used a mixed approach that employs atom-centered fragments,38,39 functional group counts and topological substructural molecular design (TOPS-MODE) descriptors40 to obtain fragment contributions that provide a guideline for the design of antitrypanosomal compounds.

Atom-centered Fragment and Functional

Group Count Descriptors

Atom-centered fragments are useful molecular descriptors and

they have been employed in some QSAR/QSPR studies.38,39

These are simple molecular descriptors defined as the number of

specific atom types in a molecule. They are calculated by know-

ing the molecular composition and atom connectivities. Each

type of atom is an atom in the molecule described by its neigh-

boring atoms. Hydrogen and halogen atoms are classified by the

hybridization and oxidation state of the carbon atom to which

they are bound; for hydrogens, heteroatoms attached to a carbon atom in the α-position are further considered. Carbon atoms are classified by their hybridization state and depending on whether

their neighbors are carbon or heteroatoms. Other descriptors that

express certain fragmental features are the functional group

counts. These are simple molecular descriptors defined as the


number of specific functional groups in a molecule. They are

calculated by knowing both the molecular composition and atom

connectivities. Both atom-centered fragment and functional group count descriptors can be considered related to the indicator variables of a Free-Wilson analysis, but with some marked differences. One of them is that these descriptors do not merely indicate the presence or absence of a given fragment: when a structural feature is present, they express how many times it appears in the molecule, independently of the position in which it appears.

TOPS-MODE Approach

This approach has been applied to describe physicochemical

properties of organic compounds and many applications for the

design of bioactive compounds have also been reported. The

theoretical background of the method has been widely

described.41–52 Here we focus only on the main points.

This method codifies the molecular structure by means of the

edge adjacency matrix E (commonly called the bond adjacency

matrix B).53

The E (or B) matrix is a square table of order m (the number of chemical bonds in a molecule). The elements of this matrix ($e_{ij}$) are equal to 1 if bonds i and j are adjacent (that is, i and j are incident on the same vertex or atom) and 0 otherwise. To codify the information on heteroatoms, the TOPS-MODE approach uses weighted matrices $\mathbf{B}(w_{ij})$ instead of B. The weights ($w_{ij}$) are chemically meaningful numbers such as bond distances, bond dipoles, bond polarizabilities, or mathematical expressions involving atomic weights.54–56 The weights are introduced on the main diagonal of matrix $\mathbf{B}(w_{ij})$. The spectral moments of this matrix can then be used as molecular fingerprints in QSAR studies for the codification of molecular structures. By mathematical definition, the expression “spectral moments” must be understood as the sum of the main diagonal elements of the natural powers of $\mathbf{B}(w_{ij})$. Thus, the spectral moment of order k ($\mu_k$) is the sum of the main diagonal elements ($e_{ii}$) of matrix $\mathbf{B}(w_{ij})^k$. The total spectral moments of the bond matrix43–45 are defined as:

$$\mu_k = \mathrm{Tr}(\mathbf{B}^k) = \sum_{i=1}^{m} (e_{ii})^k \qquad (1)$$

where Tr means the trace of the matrix, that is the sum of the diago-

nal entries of the matrix and the elements (eii)k are the diagonal

entries of the kth power of the bond matrix. Local spectral moments

are defined as the sum of the diagonal entries of different powers of

the bond matrix corresponding to a given molecular fragment.40 In

mathematical terms, the local spectral moments of the bond matrix

are defined as follows:

$$\mu_k(f) = \sum_{i \in f} (e_{ii})^k \qquad (2)$$

where f is the fragment for which the moment is defined and the sum is carried out over all bonds forming the fragment f. The simplest case is when f corresponds to a single bond; in this case the kth local moment is the diagonal entry corresponding to this bond in the matrix raised to the kth power. Although TOPS-MODE descriptors are topological in nature, their main advantage over most types of graph-theoretical descriptors is the possibility of calculating the contributions of the fragments to a desired activity.50,51 This makes it possible to study local chemical processes such as addition, substitution, elimination and many other reactions. TOPS-MODE descriptors have also been shown to explain a considerable part of the geometrical phenomena.57
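As an illustration of eqs. (1) and (2), the following minimal sketch computes total and local spectral moments for a toy bond matrix. The three-bond chain and the diagonal weights 1, 2, 1 are hypothetical values chosen for easy checking by hand; they are not taken from any compound in this study, and the function names are ours, not those of MODESLAB.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(b, k):
    """Return the k-th natural power of b (identity matrix for k = 0)."""
    n = len(b)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, b)
    return result


def spectral_moment(b, k, fragment=None):
    """Sum the diagonal entries of b**k over all bonds, or over a fragment.

    `fragment` is an iterable of bond indices; None gives the total moment.
    """
    power = mat_power(b, k)
    bonds = range(len(b)) if fragment is None else fragment
    return sum(power[i][i] for i in bonds)


# Toy chain of three bonds: bond 0 is adjacent to bond 1, bond 1 to bond 2.
# Off-diagonal entries mark adjacency; the diagonal holds hypothetical weights.
B = [[1, 1, 0],
     [1, 2, 1],
     [0, 1, 1]]

total_mu2 = spectral_moment(B, 2)                    # total moment of order 2
local_mu2 = spectral_moment(B, 2, fragment=[0, 1])   # fragment made of bonds 0 and 1
```

Because a local moment is just a partial sum of the same diagonal, the local moments of disjoint fragments covering the whole molecule add up to the total moment.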

Selection of the Data Set. Molecular Modeling and

Development of Discriminant Analysis

One of the most important steps in computer-aided search of

novel drugs is the design of representative and randomized train-

ing and prediction sets. With this aim, we selected a data set

made up of 188 compounds, 99 of them characterized by an

antitrypanosomal activity36 (Fig. 1), and 89 compounds that

belong to other pharmaceutical categories. The active com-

pounds belong to different families of compounds such as:

imidazoles, imidazolidines, triazoles, macrocycles, nitrofurans,

thiocyanate derivatives, and many others. They were active,

principally against the epimastigote and amastigote forms of

T. cruzi.3,19–28,36,58–63 The 89 inactive compounds were selected

from drugs of 19 pharmacological categories: analgesics, antiin-

flammatories, antianxiety drugs, anticonvulsants, hypnotics, mus-

cle relaxants, sedatives, antiparkinsonians, antihypertensives,

diuretics, anticoagulants, antihistaminics, antivirals, antitumorals,

local anesthetics, vasoconstrictors, anabolics, antioxidants, and

antituberculosis agents.64,65

All molecules were drawn with HyperChem (version 7.01),66 generating *.hin files. Subsequently, the DRAGON program (version 5.4)67 was used to calculate the functional group counts and atom-centered fragment descriptors. In addition, all molecules were converted to SMILES codes and recorded in a txt file. This file was used to calculate the TOPS-MODE descriptors with MODESLAB (version 1.5),69 using the molar refractivity and the Abraham molecular interaction term as weights of the bond adjacency matrix. This interaction term is expressed as the excess molar refraction and contains important information about solvation phenomena.68

The performance of a discriminant analysis depends on the

selection of compounds for the training and prediction series

used to build the classifier model. One of the most critical aspects in the construction of the training series is to guarantee a great molecular diversity in this data set.70 For this reason, and to design both the training and prediction sets, the complete data set was processed using two k-means cluster analyses (k-MCA),71 one for the set of antitrypanosomal compounds and the other for the inactive ones. The k-MCA was carried out with STATISTICA 6.0,72 using the k-MCA module.

To obtain the acceptable statistical quality of the data parti-

tion in clusters, we took into account the number of members in

each cluster and the standard deviation of the variables in the

clusters (as low as possible). We also inspected the standard

deviation between and within clusters, the respective Fisher ratio

and their p-level of significance, considered to be lower than

0.05. The functional group counts and atom-centered fragment



descriptors were used to develop the k-MCA. The training set (built by randomly selecting compounds from each cluster) was made up of 142 drugs, 74 compounds with antitrypanosomal activity and 68 inactive compounds, and the prediction set included 46 compounds (≈24.5% of the whole data set), 25 antitrypanosomal compounds and 21 inactive compounds.
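The cluster-based partition described above can be sketched as follows. This is only an illustration under stated assumptions: the cluster labels are taken as already computed (the k-MCA itself is not reproduced), and the function name, the per-cluster fraction and the seed are hypothetical choices, not settings reported in the text.

```python
import random
from collections import defaultdict


def split_by_cluster(cluster_of, test_fraction=0.25, seed=0):
    """Randomly draw a fraction of every cluster for the prediction set.

    `cluster_of` maps a compound identifier to its cluster label.
    Returns (training, prediction) as two lists of identifiers.
    """
    rng = random.Random(seed)
    members = defaultdict(list)
    for compound, cluster in cluster_of.items():
        members[cluster].append(compound)

    training, prediction = [], []
    for cluster in sorted(members):
        group = sorted(members[cluster])   # fixed order before shuffling
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        prediction.extend(group[:n_test])
        training.extend(group[n_test:])
    return training, prediction


# Hypothetical example: 40 compounds spread evenly over 4 clusters.
clusters = {f"cmpd{i}": i % 4 for i in range(40)}
train, pred = split_by_cluster(clusters, test_fraction=0.2)
```

Sampling inside each cluster, rather than from the pooled data, is what keeps every structural family represented in both series.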

Figure 1. Structure of the molecules active against T. cruzi in the training and prediction series.



Linear discriminant analysis (LDA) has been the statistical technique of choice in some QSPR and QSAR studies.49–51,73 In the present work, a similar expression for the QSAR was derived:

$$\mathrm{ATc} = a_0 + a_1 D_1 + a_2 D_2 + \cdots + a_k D_k \qquad (3)$$

In eq. (3), ATc (acronym of antitrypanosomal activity) is a dummy indicator variable. This variable takes the value 1 for antitrypanosomal compounds and −1 for inactive ones. The $a_k$ term is the kth coefficient in the discriminant function and the $D_k$ term is the kth descriptor (fragmental/TOPS-MODE). The discriminant function was obtained by employing the LDA module of STATISTICA 6.0. The default parameters of this program

were used in the development of the model. The variables to be

included in the equation were selected using forward stepwise

procedure as variable selection strategy. The selection of the

model was subjected to the principle of parsimony (Occam’s

razor).74 Then, the function with a high statistical significance,

but having as few parameters as possible, was chosen.

The statistical quality of the model was determined by examining the Wilks’ λ, the squared Mahalanobis distance (D²), the Fisher ratio (F), the corresponding p-level, the percentage of classification among the groups (for each case) and the ratio between cases and variables. The Wilks’ λ for the overall discrimination can take values in the range from 0 (perfect discrimination) to 1 (no discrimination). The D² statistic indicates the separation of the respective groups, showing whether the model possesses an appropriate discriminatory power to differentiate between the two groups. Compounds were considered unclassified by the model when the percentages of classification in the two groups differed by no more than 5%.

Compounds in the prediction set were never used to develop the

discrimination function. In the case of the prediction series, we

focused only on the ability of the model to classify active and

inactive compounds.

Additionally and to validate the models, for the training and

prediction data sets, some other statistical indices were given,

i.e., sensitivity (sens) as true positive rate, specificity (spec) as

true negative rate, accuracy (acc) as overall predictability, posi-

tive predictive value (PPV) and negative predictive value

(NPV). All these parameters were determined according to the

following equations:

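$$\mathrm{sens} = \frac{TP}{C^{+}} \times 100\% \qquad (4)$$

$$\mathrm{spec} = \frac{TN}{C^{-}} \times 100\% \qquad (5)$$

$$\mathrm{acc} = \frac{TP + TN}{(C^{+}) + (C^{-})} \times 100\% \qquad (6)$$

$$\mathrm{PPV} = \frac{TP}{TP + FP} \times 100\% \qquad (7)$$

$$\mathrm{NPV} = \frac{TN}{TN + FN} \times 100\% \qquad (8)$$

where TP is the number of true positive compounds, $C^{+}$ is the total number of active compounds, TN is the number of true negative compounds, $C^{-}$ is the total number of inactive compounds, FP is the number of false positives and FN is the number of false negatives.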
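Equations (4) to (8) can be evaluated with a few lines of code. In the sketch below the function name is ours, and the counts used in the example (67 true positives and 61 true negatives for a set of 74 active and 68 inactive compounds) are not stated in the text: they are back-calculated on the assumption that they reproduce the percentages reported for the training series.

```python
def classification_indices(tp, tn, fp, fn):
    """Return sens, spec, acc, PPV and NPV (in %) from confusion-matrix counts."""
    c_pos = tp + fn   # total active compounds, C+
    c_neg = tn + fp   # total inactive compounds, C-
    return {
        "sens": 100.0 * tp / c_pos,
        "spec": 100.0 * tn / c_neg,
        "acc": 100.0 * (tp + tn) / (c_pos + c_neg),
        "PPV": 100.0 * tp / (tp + fp),
        "NPV": 100.0 * tn / (tn + fn),
    }


# Counts inferred (not reported) for the training series: 74 actives, 68 inactives.
training = classification_indices(tp=67, tn=61, fp=7, fn=7)
rounded = {name: round(value, 2) for name, value in training.items()}
```

With these inferred counts the numbers of false positives and false negatives coincide, which is why the PPV equals the sensitivity and the NPV equals the specificity.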




Results and Discussion

Discriminant Model

After performing a random and representative selection of the training and prediction series, the training series was used to fit the discriminant function. The best model contained the three types of descriptors calculated by us: atom-centered fragments, functional group counts and TOPS-MODE descriptors. The classification function that distinguishes between antitrypanosomal and inactive compounds in the training set, together with the statistical parameters of the LDA, is as follows:

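$$\begin{aligned}
\mathrm{ATc} ={}& 2.459(\mathrm{RSCN}) + 1.176 \times 10^{-2}\,\mu_2(\mathrm{Mol}) - 3.051 \times 10^{-5}\,\mu_8(\mathrm{Ab\text{-}R2}) \\
&+ 1.701(\mathrm{ArNO_2}) + 3.004(\mathrm{Pyraz}) + 0.853(\mathrm{C\text{-}028}) + 1.295(\mathrm{Hyd}) - 0.541(\mathrm{H\text{-}053}) \\
&+ 0.415(\mathrm{C\text{-}016}) - 1.188(\mathrm{RCOOR}) + 2.238(\mathrm{CH_2RX}) - 2.964 \qquad (9)
\end{aligned}$$

$$N = 142;\quad \lambda = 0.377;\quad D^2 = 6.521;\quad F(11, 130) = 19.506;\quad p < 0.0000001$$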

In eq. (9), RSCN represents the number of aliphatic thiocyanate groups, $\mu_2(\mathrm{Mol})$ is the spectral moment of order 2 weighted by the molar refractivity, and $\mu_8(\mathrm{Ab\text{-}R2})$ is the spectral moment of order 8 weighted by the Abraham excess molar refraction term. The descriptor ArNO2 represents the number of aromatic nitro groups; Pyraz is the number of pyrazole moieties; C-028 is the number of carbon atoms attached to a carbon atom and to one electronegative atom (O, N, S, P, Se, halogens) by aromatic bonds (as in benzene) or delocalized bonds (such as the N–O bond in a nitro group) and to another carbon atom by a single bond; Hyd is the number of hydrazone moieties; H-053 represents the number of hydrogens attached to $\mathrm{C}^{0}(sp^3)$ with two electronegative atoms attached to the next carbon atom (the superscript represents the formal oxidation number, calculated as the sum of the conventional bond orders with electronegative atoms); C-016 is the number of sp2 carbon atoms attached by single bonds to one carbon and to one hydrogen atom; RCOOR is the number of aliphatic ester groups; and CH2RX is the number of methylene moieties attached to a halogen atom and to another carbon atom.
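To make the use of eq. (9) concrete, the sketch below evaluates the discriminant score for a descriptor vector. The coefficients are transcribed from eq. (9); the dictionary keys, the function name and the example descriptor values are hypothetical and do not correspond to any compound in the data set. Since ATc was coded as 1 for active and −1 for inactive compounds, larger scores point toward the active class, but the exact decision rule applied by STATISTICA (percentages of classification) is not reproduced here.

```python
# Coefficients transcribed from eq. (9); keys are illustrative labels only.
COEFFICIENTS = {
    "RSCN": 2.459,
    "mu2_Mol": 1.176e-2,
    "mu8_AbR2": -3.051e-5,
    "ArNO2": 1.701,
    "Pyraz": 3.004,
    "C-028": 0.853,
    "Hyd": 1.295,
    "H-053": -0.541,
    "C-016": 0.415,
    "RCOOR": -1.188,
    "CH2RX": 2.238,
}
INTERCEPT = -2.964


def atc_score(descriptors):
    """Evaluate eq. (9); descriptors absent from the dict count as zero."""
    return INTERCEPT + sum(
        coef * descriptors.get(name, 0.0) for name, coef in COEFFICIENTS.items()
    )


# Hypothetical descriptor vector: one aromatic nitro group, one hydrazone
# moiety, and made-up spectral moment values.
example = {"ArNO2": 1, "Hyd": 1, "mu2_Mol": 100.0, "mu8_AbR2": 10000.0}
score = atc_score(example)
```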

In differentiating between active and inactive compounds in the training series, the sensitivity of the model (i.e., its ability to identify antitrypanosomal compounds correctly) is 90.54% and the specificity (i.e., its ability to identify inactive compounds correctly) is 89.71%, for an accuracy of 90.14%. The positive and negative predictive values in the training series are 90.54 and 89.71%, respectively. For the PPV, this means that if the QSAR model predicts a compound to have antitrypanosomal activity, the probability that the compound is truly active is 90.54%. Similarly, the NPV means that if the model predicts a compound to be inactive against T. cruzi, the probability that it is truly inactive is 89.71%.

Models often misclassify some compounds and this is a fact

we have to take into consideration when we use them. A classi-

cally accepted solution is checking the squared Mahalanobis dis-

tance (D2) of the molecule with respect to the two centroids of

both groups (active and inactive compounds). Generally, in the

case of abnormal values of D2, this prediction must not be used.

No misclassified compound was found in the model and they

did not show abnormal values of Mahalanobis distances. The

most important criterion for accepting or rejecting a discriminant model such as the one represented by eq. (9) is based on the statistics for the prediction set. The sensitivity of the model for the prediction set was 92% and the specificity was 90.47%, which means an accuracy of 91.30%. The positive and negative predictive values were 92 and 90.47%, respectively. Supporting Information Tables 1 and 2 show the percentages of classification obtained for the training and prediction sets, respectively.

As can be seen, the model has a good statistical quality. An important point to note is the number of variables in the model. Two principal aspects were taken into account. The first is the complexity of the data, in which there is great variability in terms of structure, composition, and number of atoms. Simple molecules such as guanine analogues as well as complex ones such as amphotericin B were analyzed in this model. For this reason, if we consider that we were trying to “catch” the

Table 1. Correlation Matrix at p < 0.05, N = 142.

                RSCN   RCOOR   ArNO2     Hyd   CH2RX   Pyraz   C-016   C-028   H-053  μ2(Mol)  μ8(Ab-R2)
RSCN            1.00   −0.08   −0.10   −0.10   −0.04   −0.05   −0.06   −0.11   −0.10    −0.11      −0.25
RCOOR                   1.00    0.07   −0.09   −0.03   −0.05    0.58   −0.10    0.39     0.33       0.22
ArNO2                           1.00    0.39   −0.04   −0.05   −0.05   −0.07   −0.11    −0.13      −0.11
Hyd                                     1.00   −0.04   −0.06   −0.07   −0.12   −0.11    −0.02      −0.17
CH2RX                                           1.00   −0.02   −0.03   −0.04   −0.04    −0.02      −0.11
Pyraz                                                   1.00   −0.03    0.01   −0.06    −0.20       0.07
C-016                                                           1.00   −0.10    0.63     0.47       0.28
C-028                                                                   1.00   −0.08     0.04      −0.07
H-053                                                                           1.00     0.46       0.32
μ2(Mol)                                                                                  1.00       0.56
μ8(Ab-R2)                                                                                           1.00



structural features at a substructural level, then an increase in

the number of variables should be expected.

Correlation Among Variables

In a statistical model, the independent variables are often highly correlated. That is not a consequence of the statistical nature of the method employed; it is due to the inherent properties of the descriptors used. This disturbing factor is frequently overlooked and can lead to instability in the model, and the principal difficulty appears when we need to analyze the relative importance of the variables in the model. The correlation coefficient between independent variables should not exceed 0.7, which is traditionally considered the cut-off for the analysis of collinearity between variables. The correlation matrix of the model (Table 1) shows that the variables used do not exceed this value and thus each gives specific structural information.
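The collinearity check can be reproduced directly from the upper triangle of Table 1. The sketch below is an illustration only: the values are transcribed from Table 1 (reading the extraction's leading "2" as a minus sign), and the helper name and the ASCII labels for the two spectral moments are ours.

```python
NAMES = ["RSCN", "RCOOR", "ArNO2", "Hyd", "CH2RX", "Pyraz",
         "C-016", "C-028", "H-053", "mu2(Mol)", "mu8(Ab-R2)"]

# Upper triangle of Table 1; row i starts at the diagonal element (1.00).
UPPER = [
    [1.00, -0.08, -0.10, -0.10, -0.04, -0.05, -0.06, -0.11, -0.10, -0.11, -0.25],
    [1.00, 0.07, -0.09, -0.03, -0.05, 0.58, -0.10, 0.39, 0.33, 0.22],
    [1.00, 0.39, -0.04, -0.05, -0.05, -0.07, -0.11, -0.13, -0.11],
    [1.00, -0.04, -0.06, -0.07, -0.12, -0.11, -0.02, -0.17],
    [1.00, -0.02, -0.03, -0.04, -0.04, -0.02, -0.11],
    [1.00, -0.03, 0.01, -0.06, -0.20, 0.07],
    [1.00, -0.10, 0.63, 0.47, 0.28],
    [1.00, -0.08, 0.04, -0.07],
    [1.00, 0.46, 0.32],
    [1.00, 0.56],
    [1.00],
]


def off_diagonal_pairs(names, upper):
    """Yield (descriptor_a, descriptor_b, r) for every pair above the diagonal."""
    for i, row in enumerate(upper):
        for offset, r in enumerate(row[1:], start=1):
            yield names[i], names[i + offset], r


pairs = list(off_diagonal_pairs(NAMES, UPPER))
worst = max(pairs, key=lambda pair: abs(pair[2]))   # most strongly correlated pair
below_cutoff = all(abs(r) < 0.7 for _, _, r in pairs)
```

The strongest correlation is between C-016 and H-053, which still lies under the 0.7 cut-off.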

Receiver Operating Characteristic (ROC) Curve

Sensitivity and specificity describe the true performance with greater clarity than accuracy, but they also have some disadvantages. For a specific classifier, they represent two measures, one for positive cases and the other for negative ones. A single combined measure for the classifier is given through receiver operating characteristic (ROC) analysis. ROC is a classic methodology from signal detection theory.75,76 Crucial for understanding ROC is the confidence threshold in the classification task. For a strict threshold, the sensitivity will be low while the specificity will be very high. If the threshold is lowered, specificity will fall while sensitivity will rise. In this way we can compare two classifiers over a broad range of threshold conditions. The ROC curve is created by plotting the true positive rate against the false positive rate, that is, sensitivity against 1 − specificity.

A ROC curve running along the diagonal from bottom left to upper right represents pure-chance performance. When the variable (or variables) under study cannot distinguish between the two groups, i.e., when there is no difference between the two distributions, the area under the curve will be equal to 0.5 (the ROC curve will coincide with the diagonal). When there is a perfect separation of the values of the two groups, i.e., there is no overlap of the distributions, the area under the ROC curve equals 1 (the ROC curve will reach the upper left corner of the plot). The ROC curves for the model (using both the training and prediction sets) (Fig. 2) were generated using MATLAB (version 6.5).77

The value of the area under the ROC curve can be interpreted as follows: for the training and prediction sets, the areas under the ROC curves are 0.96 and 0.95, respectively, which means that, in the case of the training set, a randomly selected compound from the active series (antitrypanosomal compounds) has a higher value than a randomly chosen compound from the inactive series 96% of the time. The same interpretation applies to the prediction set. Thus, the values of the ROC areas confirm that the model obtained is not a random classifier, given that the area values are significantly higher than 0.5.
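The probabilistic reading of the ROC area given above can be checked with a small pairwise computation. This sketch is not the MATLAB procedure used for Fig. 2; the function name and the example scores are hypothetical, and ties are counted as one half, which is a common convention we assume here.

```python
def roc_auc(active_scores, inactive_scores):
    """Fraction of (active, inactive) pairs in which the active scores higher.

    Ties contribute one half. This pairwise fraction equals the area under
    the ROC curve.
    """
    wins = 0.0
    for a in active_scores:
        for i in inactive_scores:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(active_scores) * len(inactive_scores))


# Hypothetical discriminant scores for three active and three inactive compounds.
actives = [0.9, 0.8, 0.4]
inactives = [0.5, 0.3, 0.2]
area = roc_auc(actives, inactives)   # 8 of the 9 pairs are ordered correctly
```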

Structural Interpretation

One of the most important features of a model is that its descriptors have an interpretation as clear as possible. In the model represented by eq. (9), the functional group counts and atom-centered descriptors are easy to interpret because they denote certain types of functional features and atoms in the molecules, respectively. On the other hand, the interpretation of a spectral moment (TOPS-MODE) can be a difficult task. However, the principal advantage of these descriptors is that, once they are introduced in eq. (9), we can calculate the relative contribution of any fragment to the antitrypanosomal activity. Another important advantage, following from the definition of the spectral moments, is that fragments that are not present in the molecules under study can be generated and their contributions calculated. The general methodology for calculating the quantitative contribution of a given substructure (fragment) to the antitrypanosomal activity is as follows. In the first step, all the substructures (see Fig. 3) whose contributions we would like to determine were selected. The spectral moments for each substructure were then calculated and their contributions to the antitrypanosomal activity were obtained (Table 2) by substituting these spectral moments in the discriminant model. Some fragments contain specific information represented by the functional group counts and atom-centered descriptors; this additional information was taken into account when calculating the quantitative contributions. Thus, we obtained fragments with favorable (positive) influence on antitrypanosomal activity and fragments with undesirable (negative) contribution to this activity.

Table 2. Fragment Contributions.

Fragment  Contribution  Fragment  Contribution  Fragment  Contribution  Fragment  Contribution
F1        3.679         F11       0.572         F21       1.468         F31       0.263
F2        0.698         F12       1.005         F22       1.683         F32       1.566
F3        1.371         F13       2.937         F23       0.356         F33       2.850
F4        0.142         F14       2.660         F24       0.499         F34       4.197
F5        1.539         F15       1.897         F25       0.865         F35       3.615
F6        1.407         F16       0.370         F26       3.732         F36       0.580
F7        0.474         F17       0.409         F27       1.130         F37       0.903
F8        2.672         F18       0.469         F28       0.697         F38       1.763
F9        1.785         F19       0.746         F29       0.471         F39       0.200
F10       3.530         F20       0.884         F30       2.837         F40       2.872

Figure 2. ROC curves for the training and prediction set using the model represented by eq. (9).

Figure 3. Some fragments present in the molecules.

Figure 4. Molecules designed and predicted by the model with a proved antitrypanosomal activity.

Another point worth noting is that some fragments with negative (or small positive) contributions can appear in active (antitrypanosomal) molecules and, at the same time, fragments with considerable positive contributions can appear in inactive molecules. Thus, the presence or absence of antitrypanosomal activity is not associated with only one fragment; it depends on the sum of the contributions of all fragments present in the molecule. For this reason, it is the sum over all fragments in the compounds under study that determines whether they can be classified as antitrypanosomal agents.
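The additive reasoning above can be sketched as a simple sum over the fragments that make up a candidate structure. The contribution values are a subset of those printed in Table 2; the two fragment combinations and the function name are hypothetical, and the text does not state how such a sum maps onto a class assignment, so the sketch only compares totals.

```python
# Subset of Table 2 (values as printed there).
CONTRIBUTION = {
    "F1": 3.679,
    "F4": 0.142,
    "F34": 4.197,
    "F39": 0.200,
}


def total_contribution(fragments):
    """Sum the tabulated contributions of the fragments present in a molecule.

    A fragment that occurs several times should be listed several times.
    """
    return sum(CONTRIBUTION[name] for name in fragments)


# Two hypothetical fragment combinations, not molecules from this study.
candidate_a = total_contribution(["F1", "F34"])
candidate_b = total_contribution(["F4", "F39"])
ranked_higher = "a" if candidate_a > candidate_b else "b"
```

Used this way, the table acts as a ranking aid for design ideas before the full model is applied to the complete structure.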

Design and Prediction

The last, and one of the most important, steps in the development of a discriminant model is the prediction of new compounds.78 To give a general idea of how the model can be used for design and prediction, some aspects should be pointed out. This model based on fragment descriptors can be used for both purposes: for rapid design, because it gives some important structural features, and for prediction. The model offers a simple guideline to determine some structural requirements for the development of antitrypanosomal activity. There are two ways to design antitrypanosomal agents using this model:

• Starting from the structures of molecules reported as antitrypanosomal agents and making molecular variations according to the model based on fragment descriptors, to obtain new molecules.

• Starting from several fragments with considerable positive contributions and constructing new molecules from them.

In the first alternative, the molecular structures submitted to molecular variation to obtain new molecules can be those that the model considers active. The molecular variations are carried out according to the structural information of the different descriptors in the model, and the probabilities of the new molecules are then predicted. The second alternative requires previous knowledge of the fragments that may be essential for the development of antitrypanosomal activity. The new molecules are obtained from the structural information of the model descriptors in terms of quantitative contributions, and afterwards their probabilities of being active against T. cruzi are calculated. To give an idea of how the model works in the detection of fragments that can be useful for the design of antitrypanosomal agents, we selected 15 molecules. The first five molecules are benzimidazole derivatives, and the other 10 are 1,3,4-thiadiazolium-2-phenylamine chloride analogues derived from natural piperine. Recently, both groups of compounds (Fig. 4) have been shown to exhibit considerable antitrypanosomal activity.79,80
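Both design routes end with a predicted probability for each new molecule. The Python sketch below shows one common way such a probability can arise, assuming a two-class linear discriminant score that is read as the log posterior odds of the active class. How the statistical software used in this work computes its probabilities is not stated here, and the candidate names and scores are hypothetical.

```python
import math


def posterior_active(score):
    """Probability of the active class for a score read as log posterior odds.

    Written in a numerically stable form so that large negative scores
    do not overflow math.exp.
    """
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)


def rank_candidates(scores):
    """Return (name, probability) pairs sorted by decreasing probability."""
    probabilities = {name: posterior_active(s) for name, s in scores.items()}
    return sorted(probabilities.items(), key=lambda item: item[1], reverse=True)


# Hypothetical discriminant scores for three designed candidates.
candidate_scores = {"cand_1": 3.2, "cand_2": -0.7, "cand_3": 0.9}
ranking = rank_candidates(candidate_scores)
```

Ranking designed candidates in this way gives a simple order in which to consider them for synthesis or testing.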

The model correctly classified all the compounds (Table 3). On simple inspection, fragments with a favorable contribution, such as substructures containing benzimidazole, 3-phenyl-1,3,4-thiadiazolium and benzo[d][1,3]dioxole moieties (Fig. 5), together with structural features such as the number of aromatic nitro groups and those accounted for by the descriptors C-016 and C-028 (see the Excel file called "molecule validation"), are present in the molecules predicted as potential antitrypanosomal agents. These molecules had been tested previously, and our model confirms the experimental findings. At the same time, we have determined which features of these compounds, in terms of quantitative fragment contributions, are favorable for the development of antitrypanosomal activity and account for the potential of these compounds to inhibit T. cruzi.
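The classification result can be checked directly against the probabilities listed in Table 3. The short Python sketch below does so, assuming a cutoff of 0.5 for assigning a molecule to the active class; the cutoff is an assumption of this example, while the probabilities are those of Table 3.

```python
# Predicted probabilities of antitrypanosomal activity, taken from Table 3.
table3 = {
    "M-1": 0.9828, "M-2": 0.9975, "M-3": 0.9983, "M-4": 0.9984, "M-5": 0.9990,
    "M-6": 0.9501, "M-7": 0.9994, "M-8": 0.9847, "M-9": 0.9998, "M-10": 0.8136,
    "M-11": 0.9977, "M-12": 0.7756, "M-13": 0.9971, "M-14": 0.6928, "M-15": 0.9954,
}

THRESHOLD = 0.5  # assumed cutoff for the active class

predicted_active = [mol for mol, p in table3.items() if p > THRESHOLD]
least_confident = min(table3, key=table3.get)
below_095 = sorted(mol for mol, p in table3.items() if p < 0.95)
```

All 15 molecules lie above the assumed cutoff. The least confident prediction is M-14 (0.6928), and only M-10, M-12 and M-14 fall below 0.95.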

Table 3. Predicted Probabilities of Being Antitrypanosomal Agents.

Mol ID   Prob.     Mol ID   Prob.     Mol ID   Prob.
M-1      0.9828    M-6      0.9501    M-11     0.9977
M-2      0.9975    M-7      0.9994    M-12     0.7756
M-3      0.9983    M-8      0.9847    M-13     0.9971
M-4      0.9984    M-9      0.9998    M-14     0.6928
M-5      0.9990    M-10     0.8136    M-15     0.9954

Figure 5. Quantitative contribution of the principal fragments contained in the molecules predicted by the model. BIM (benzimidazole moiety), PhTD (3-phenyl-1,3,4-thiadiazolium), BDO (benzo[d][1,3]dioxole).

Conclusions

In this work, a model for the classification, design and prediction of antitrypanosomal compounds was obtained. The selected descriptors yielded a model of good statistical quality. The procedure has shown that a good linear discriminant function can be generated by using simple molecular descriptors, such as functional group counts and atom-centered fragments, in combination with descriptors of a substructural topological nature, such as the TOPS-MODE descriptors. This provided structural features, expressed as fragments, together with the way they are connected and influence one another’s behavior, giving an idea of the structural requirements that favor antitrypanosomal activity. Thus, we report for the first time the quantitative contributions of different fragments to the antitrypanosomal activity. Substructures containing benzimidazole, 3-phenyl-1,3,4-thiadiazolium and benzo[d][1,3]dioxole moieties could be identified as essential for the development of this activity, allowing the design and prediction of 15 molecules recently reported to show a significant inhibitory effect against T. cruzi. In this sense, the predicted molecules can become, or give rise to, new leads for the development of more effective and less toxic compounds with antitrypanosomal activity. The methodology developed in this work provides, in principle, a quantitative, rapid and easy way to design, search for and predict novel antitrypanosomal agents.

Acknowledgments

The authors thank Dr. Ernesto Estrada for the donation of MODESLAB 1.5, which was a valuable tool in the development of this work. The referees’ comments contributed to a significant improvement of this work.

References

1. World Health Organization Report, TDR (2005), Progress 2003–2004, 17th program Report, Available at http://www.who.int/tdr/publicatios/publicatios/pdf/pr17/chagas.pdf.

2. Urbina, J. A. Curr Pharm Des 2002, 8, 287.

3. Urbina, J. A.; Docampo, R. Trends Parasitol 2003, 19, 495.

4. Docampo, R.; Moreno, S. N. J. In Free Radicals in Biology; Pryor,

W. A., Ed.; Academic Press: New York, 1984; pp. 243–288.

5. Muelas, S.; Suarez, M.; Perez, R.; Rodríguez, H.; Ochoa, C.; Escario, J. A.; Gomez-Barrio, A. Mem Inst Oswaldo Cruz 2002, 97, 269.

6. Muelas-Serrano, S.; Le-Senne, A.; Fernandez-Portillo, C.; Nogal, J. J.; Ochoa, C.; Gomez-Barrio, A. Mem Inst Oswaldo Cruz 2002, 97, 553.

7. Hansch, C.; Leo, A. QSAR: Fundamentals and Applications in

Chemistry and Biology; American Chemical Society: Washington,

DC, 1995.

8. Kubinyi, H. In Methods and Principles in Medicinal Chemistry; Mannhold, R.; Krogsgaard-Larsen, P.; Timmerman, H., Eds.; VCH: New York, 1993.

9. Kier, L. B.; Hall, L. H. Molecular Connectivity in Chemistry and

Drug Research; Academic Press: New York, 1976.

10. Kier, L. B.; Hall, L. H. Molecular Connectivity in Structure-Activity

Analysis; Research Studies Press: Letchworth, Herts, 1986.

11. Ivanciuc, O. In Handbook of Chemoinformatics; Gasteiger, J., Ed.;

Wiley-VCH: 2003; pp. 103–138.

12. Ivanciuc, O. In Handbook of Chemoinformatics; Gasteiger, J., Ed.;

Wiley-VCH: 2003; pp. 981–1003.

13. García-Domenech, R.; García, F. J.; Soler, R. M.; Galvez, J.; Anton-Fos, G. M.; De Julian-Ortiz, J. V. Quant Struc-Act Rel 1996, 15, 1.

14. Julian-Ortiz, J. V.; Galvez, J.; Munoz-Collado, C.; García-Domenech, R.; Gimeno-Cardona, C. J Med Chem 1999, 42, 3308.

15. De Gregorio Alapont, C.; García-Domenech, R.; Galvez, J.; Ros, M. J.; Wolski, S.; García, M. D. Bioorg Med Chem Lett 2000, 10, 2033.

16. Gozalbes, R.; Brun-Pascaud, M.; Galvez, J.; García-Domenech, R.; Girard, P. M.; Doucet, J. P.; Derouin, F. Antimicrob Agents Chemother 2000, 44, 2764.

17. Kubinyi, H.; Kehrhahn, O. H. J Med Chem 1976, 19, 578.

18. Kubinyi, H. J Med Chem 1976, 19, 587.

19. Bonse, S.; Santelli-Rouvier, C.; Barbe, J.; Krauth-Siegel, R. L.

J Med Chem 1999, 42, 5448.

20. De Koning, H. P. Int J Parasitol 2001, 31, 511.

21. Elhalem, E.; Bailey, B. N.; Docampo, R.; Ujvary, I.; Szajnman, S. H.; Rodríguez, L. B. J Med Chem 2002, 45, 3984.

22. Avila, J. L.; Rojas, T.; Avila, A.; Polegre, M. A.; Robins, R. K.

Antimicrob Agents Chemother 1987, 31, 447.

23. Oliveira, R. B.; Passos, A. P. F.; Alves, R. O.; Romanha, A. J.;

Prado, M. A. F.; Dias De Sousa Filho, J.; Alves, R. J. Mem Inst

Oswaldo Cruz 2003, 98, 141.

24. Carvalho, S. A.; Da Silva, E. F.; Santa-Rita, R. M.; De Castro,

S. L.; Fraga, C. A. M. Bioorg Med Chem Lett 2004, 14, 5967.

25. Cerecetto, H.; Di Maio, R.; Gonzalez, M.; Risso, M.; Sagrera, G.; Seoane, G.; Denicola, A.; Peluffo, G.; Quijano, C.; Stoppani, A. O.; Paulino, M.; Olea-Azar, C.; Basombrío, M. A. Eur J Med Chem 2000, 35, 343.

26. Gilbert, I. H. Biochim Biophys Acta 2002, 1587, 249.

27. Du, X.; Guo, C.h; Hansell, E.; Doyle, P. S.; Caffrey, C. R.; Holler, T. P.; McKerrow, J. H.; Cohen, F. E. J Med Chem 2002, 45, 2695.

28. Jonckers, T. H. M.; Van Miert, S.; Cimanga, K.; Baillo, C.h; Colson,

P.; De Pauw-Gillet, M. C.; Van Den Heuvel, H.; Claeys, M.;

Lemiere, F.; Esmans, E. L.; Rozenski, J.; Quirijnen, L.; Maes, L.;

Dommisse, R.; Lemiere, G. L.; Vlietinck, A.; Pieters, L. J Med

Chem 2002, 45, 3497.

29. Prieto, J. J.; Talevi, A.; Bruno-Blanch, L. E. Mol Divers 2006, 10,

361.

30. McGrath, M. E.; Eakin, A. E.; Engel, J. C.; McKerrow, J. H.; Craik, C. S.; Fletterick, R. J. J Mol Biol 1995, 247, 251.

31. Eakin, A. E.; Guerra, A.; Focia, P. J.; Torres-Martinez, J.; Craig, S.

P., III. Antimicrob Agents Chemother 1997, 41, 1686.

32. Chowdhury, S. F.; Bernier Villamor, V. B.; Guerrero, R. H.; Leal, I.;

Brun, R.; Croft, S. L.; Goodman, J. M.; Maes, L.; Ruiz-Perez, L. M.;

Gonzalez-Pacanowska, D.; Gilbert, I. H. J Med Chem 1999, 42, 4300.

33. Blankenfeldt, W.; Nowicki, C.; Montemartini-Kalisz, M.; Kalisz, H.

M.; Hecht, H. J. Protein Sci 1999, 8, 2406.

34. Zuccotto, F.; Zvelebil, M.; Brun, R.; Chowdhury, S. F.; Di Lucrezia,

R.; Leal, I.; Maes, L.; Ruiz-Perez, L. M.; Gonzalez-Pacanowska, D.;

Gilbert, I. H. Eur J Med Chem 2001, 36, 395.

35. Buckner, F.; Yokoyama, K.; Lockman, J.; Aikenhead, K.; Ohkanda,

J.; Sadilek, M.; Sebti, S.; Van Voorhis, W.; Hamilton, A.; Gelb, M.

H. PNAS 2003, 100, 15149.

36. Montero-Torres, A.; Vega, M. C.; Marrero-Ponce, Y.; Rolon, M.; Gomez-Barrio, A.; Escario, J. A.; Aran, V. J.; Martínez-Fernandez, A. R.; Meneses-Marcel, A. Bioorg Med Chem 2005, 13, 6264.

37. Prado-Prado, F. J.; Gonzalez-Díaz, H.; Martinez De La Vega, O.; Ubeira, F. M.; Chou, K. Ch Bioorg Med Chem 2008, 16, 5871.

38. Viswanadhan, V. N.; Ghose, A. K.; Revankar, G. R.; Robins, R. K.

J Chem Inf Comput Sci 1989, 29, 163.

39. Viswanadhan, V. N.; Reddy, M. R.; Bacquet, R. J.; Erion, M. D.

J Comput Chem 1993, 14, 1019.

40. Estrada, E.; Molina, E. J Mol Graph Model 2001, 20, 54.

41. Estrada, E. J Chem Inf Comput Sci 1996, 36, 844.



44. Estrada, E. J Chem Soc Faraday Trans 1998, 94, 1407.


46. Estrada, E.; Pena, A.; García-Domenech, R. J Comput-Aided Mol Des 1998, 12, 583.

47. Estrada, E.; Gutierrez, Y. J Chromatogr A 1999, 858, 187.

48. Estrada, E.; Gutierrez, Y.; Gonzalez, H. J Chem Inf Comput Sci

2000, 40, 1386.

49. Estrada, E.; Pena, A. Bioorg Med Chem 2000, 8, 2755.

50. Estrada, E.; Uriarte, E.; Montero, A.; Tejeira, M.; Santana, L.; De

Clercq, E. J Med Chem 2000, 43, 1975.



51. Perez-Gonzalez, M.; Gonzalez-Díaz, H.; Molina-Ruiz, R.; Cabrera, M. A.; Ramos De Armas, R. J Chem Inf Comput Sci 2003, 43, 1192.

52. Perez-Gonzalez, M.; Morales Helguera, A.; Morales Rodríguez, Y. Electron J Mol Des 2004, 3, 750.


54. Estrada, E.; Uriarte, E. SAR QSAR Environ Res 2001, 12, 309.

55. Estrada, E.; Molina, E.; Uriarte, E. SAR QSAR Environ Res 2001,

12, 445.

56. Estrada, E.; Uriarte, E. Curr Med Chem 2001, 8, 1573.

57. Estrada, E.; Molina, E.; Perdomo-Lopez, I. J Chem Inf Comput Sci

2001, 41, 1015.

58. Lima Leite, A. C.; Souza De Lima, R.; Moreira, D. R.; Cardoso,

M. V.; Gouveia De Brito, A. C.; Farias Dos Santos, L. M. Bioorg

Med Chem 2006, 14, 3749.

59. Rolon, M.; Seco, E. M.; Vega, C.; Nogal, J. J.; Escario, J. A.;

Gomez-Barrio, A.; Malpartida, F. Int J Antimicrob Agents 2006, 28,

104.

60. Gonzalez, J. L.; Stephens, C. E.; Wenzler, T.; Brun, R.; Tanious, F. A.; Wilson, W. D.; Barszcz, T.; Werbovetz, K. A.; Boykin, D. W. Eur J Med Chem 2007, 42, 552.

61. Gerpe, A.; Odreman-Nunez, I.; Draper, P.; Boiani, L.; Urbina, J. A.;

Gonzalez, M.; Cerecetto, H. Bioorg Med Chem 2008, 16, 569.

62. Caterina, M. C.; Perillo, I. A.; Boiani, L.; Pezaroglo, H.; Cerecetto,

H.; Gonzalez, M.; Salerno, A. Bioorg Med Chem 2008, 16, 2226.

63. Porcal, W.; Hernandez, P.; Boiani, L.; Boiani, M.; Ferreira, A.; Chi-

dichimo, A.; Cazzulo, J. J.; Olea-Azar, C.; Gonzalez, M.; Cerecetto,

H. Bioorg Med Chem 2008, 16, 6995.

64. O’Neil, M. J.; Heckelman, P. E.; Koch, C. B.; Roman, K. J.; Kenny, C. M.; D’Arecca, M. R. The Merck Index: An Encyclopedia of Chemicals, Drugs, and Biologicals; Merck & Co., Inc.: Whitehouse Station, NJ, USA, 2006.

65. PubChem Database, Available at: http://pubchem.ncbi.nlm.nih.gov/.

66. HyperChem™ Release 7.01 for Windows (Molecular Modeling System). © 2002, Hypercube, Inc.

67. Talete srl, DRAGON for Windows (Software for Molecular Descriptor Calculations). Version 5.4, 2006, Available at: http://www.talete.mi.it/dragon.htm.

68. Platts, J. A.; Butina, D.; Abraham, M. H.; Hersey, A. J Chem Inf

Comput Sci 1999, 39, 835.

69. Estrada, E.; Gutierrez, Y. MODESLAB 1.5. © 2002–2004. Available at: www.modeslab.com.

70. Kowalski, R. B.; Wold, S. Handbook of Statistics; Krishnaiah, P. R.; Kanal, L. N., Eds.; North Holland Publishing Company: Amsterdam, 1982; pp. 673–697.

71. Johnson, R. A.; Wichern, D. W. Applied Multivariate Statistical

Analysis, Prentice-Hall: NJ, 1998.

72. StatSoft, Inc. 2001. STATISTICA (data analysis software system),

version 6. Available at: www.statsoft.com.

73. Buryak, A.; Severin, K. J Comb Chem 2006, 8, 540.

74. Estrada, E. Topological Indices and Related Descriptors in QSAR and QSPR; Devillers, J.; Balaban, A. T., Eds.; Gordon and Breach: Amsterdam, 1999; pp. 403–453.

75. Fawcett, T. ROC Graphs: Notes and Practical Considerations for

Researchers. Tech. Rep. HPL. 2003–4, Palo Alto, 2004.

76. Hanley, J. A.; McNeil, B. J. Radiology 1982, 143, 29.

77. MATLAB Version 6.5. © 1984–2002. The MathWorks, Inc. Available at: http://www.mathworks.com.

78. Van Waterbeemd, H. Methods and Principles in Medicinal Chemistry; Van Waterbeemd, H.; Mannhold, R.; Krogsgaard-Larsen, P.; Timmerman, H., Eds.; VCH: Weinheim, 1995; pp. 283–293.

79. Da Silva Ferreira, W.; Freire-De-Lima, L.; Barbosa Saraiva, V.; Alisson-Silva, F.; Mendonça-Previato, L.; Previato, J. O.; Echevarria, A.; Freire De Lima, M. E. Bioorg Med Chem 2008, 16, 2984.

80. Brain-Isasi, S.; Quezada, C.; Pessoa, H.; Morello, A.; Kogan, M. J.;

Alvarez-Lueje, A. Bioorg Med Chem 2008, 7622.


