Designing Novel Antitrypanosomal Agents from a Mixed
Graph-Theoretical Substructural Approach
ALEJANDRO SPECK PLANCHE,1,2 MARCUS TULIUS SCOTTI,2,3 VICENTE DE PAULO EMERENCIANO,2
AMERICA GARCIA LOPEZ,1 ENRIQUE MOLINA PEREZ,4,5,6 EUGENIO URIARTE6
1 Department of Chemistry, Faculty of Natural Sciences, University of Oriente, Santiago de Cuba 90500, Cuba
2 Institute of Chemistry, University of São Paulo, Postal Box 26777, São Paulo, 05513-970, Brazil
3 Center of Applied Sciences and Education, Federal University of Paraiba, Campus IV, Rio Tinto 58297-000, Brazil
4 Faculty of Chemistry, University of Camaguey, Camaguey 74650, Cuba
5 Health and Food Technologies Development Group, University of Camaguey, Camaguey 74650, Cuba
6 Department of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela, Santiago de Compostela 15982, Spain
Received 21 October 2008; Revised 28 April 2009; Accepted 7 June 2009
DOI 10.1002/jcc.21374
Published online 17 July 2009 in Wiley InterScience (www.interscience.wiley.com).
Abstract: Chagas disease is nowadays the most serious parasitic health problem. This disease is caused by Trypanosoma cruzi. The great number of deaths and the insufficient effectiveness of drugs against this parasite have alarmed the scientific community worldwide. In an attempt to overcome this problem, we developed a model for the design and prediction of new antitrypanosomal agents. It uses a mixed approach that combines simple fragment-based descriptors with topological substructural molecular design descriptors. The data set consisted of 188 compounds: 99 with antitrypanosomal activity and 89 belonging to other pharmaceutical categories. The model showed sensitivity, specificity and accuracy values above 85%. Quantitative fragmental contributions were also calculated. Then, to confirm the quality of the model, 15 molecules tested as antitrypanosomal compounds (not included in this study) were predicted, taking into account the calculated fragmental contributions. The model classified all 15 correctly (100% accuracy), which indicates that the "in silico" methodology developed by our team is promising for the rational design of new antitrypanosomal drugs.
© 2009 Wiley Periodicals, Inc. J Comput Chem 31: 882–894, 2010
Key words: massive screening; antitrypanosomal; discriminant analysis; fragment contributions; graph theoretical
descriptors
Introduction
Chagas' disease has been identified as a major parasitic health problem, a consequence of rising levels of urbanization and migration. Chagas disease is caused by Trypanosoma cruzi (T. cruzi), and until 2004 it was confined to Central and South America; about 13 million people are currently estimated to be infected with this parasite. It is estimated that 25–30% of chronically infected patients will eventually suffer irreversible damage to the heart and digestive tract, resulting in about 14,000 deaths per year and considerable morbidity. The disease can be transmitted in three principal ways: by the bite of bloodsucking triatomine insect vectors, by blood transfusion, or congenitally. The countries of the Southern Cone of South America (Argentina, Bolivia, Brazil, Chile, Paraguay, Uruguay) are the most affected by Chagas' disease, because there the main vector species lives in or around human homes.1

Additional supporting information may be found in the online version of this article.
Correspondence to: A. S. Planche
Contract/grant sponsor: Xunta de Galicia; contract/grant number: 20304PR
Contract/grant sponsors: CAPES/MES-Cuba, CNPq, FAPESP
Medication for Chagas' disease is generally effective when given during the early stages of infection, and no drug-based treatment has been proven effective in later stages. Unfortunately, although nifurtimox (Nfx) and benznidazole (Bnz), the only two drugs available for the clinical treatment of this disease, were able to eliminate patent parasitemia and reduce serological titers in acute and early chronic infections, they are not active against all T. cruzi strains and have significantly lower efficacy in long-term chronic infections.2,3 Moreover, both drugs act via reduction of their nitro group. In the case of Nfx, this reduction generates an unstable nitro radical that reduces oxygen, producing highly toxic reactive species. Bnz, on the other hand, acts through covalent modification of macromolecules by nitro-reduction intermediates.4 These drawbacks explain the need to search for and design new drugs against T. cruzi.5,6
In modern drug design, there are mainly two ways to discover new drugs: "rational design" and "massive screening". In rational design, some of the methods currently used require previous knowledge of the target and of the mechanism of action involved, but this information is frequently limited or unavailable. In massive screening, knowledge of the target can help to better understand the drug-receptor interaction, but it is not a prerequisite for a rational design. In this context, computer-aided drug design methods have developed rapidly in recent years.7 In particular, methods based on the definition and application of graph-theoretical descriptors have been very useful in quantitative structure-activity relationship (QSAR) studies for the rational analysis of biological activities.8 These descriptors encode the structural information of a molecule as a number. Graph-based molecular descriptors have proved very efficient in drug design.9–12 Many studies using graph-theoretical approaches have been reported for designing new drugs, modeling toxicities and finding common structural patterns among known drugs, including the design of analgesic, antiviral, antibacterial and antimycobacterial compounds.13–16
Other descriptors, such as those based on fragmental contributions, have also been used to model pharmacological activities. Free-Wilson analysis can be considered a pioneer in this respect.8 This approach uses indicator variables to express the contribution of a given structural feature, at a defined position, to the biological activity. An advantage of Free-Wilson indicator variables is that they have been extended and can readily be combined with other types of descriptors.8,17,18
Regarding compounds active against T. cruzi, several biological assays and QSAR studies of compounds with antitrypanosomal activity have been published.5,6,19–28 However, massive screening for new antitrypanosomal compounds using graph-theoretical approaches remains an insufficiently explored area. One of the three reported works is a virtual screening of compounds as inhibitors of trypanothione reductase.29 The principal disadvantage of this work, however, seems to be its restriction to the inhibition of a single target (trypanothione reductase), whereas several other potential targets of T. cruzi have been discovered and studied over the last 15 years.30–35
The other two works can be considered promising in this area. One of them uses a non-stochastic quadratic fingerprint-based approach, obtaining a model of good statistical quality; some of the designed compounds were synthesized and their antitrypanosomal activity was confirmed.36 The other study uses the Markov chain invariants for simulation and design (MARCH-INSIDE) approach, which takes into consideration not only the molecular structure of a given drug but also the average probability of an atom interacting with a specific site of the target in a given microorganism. This methodology has made it possible to model, with a single unified equation, the antimicrobial activity37 of several drugs against several protozoan species. Knowledge about fragmental features and how they are spatially connected is decisive for the development of new drugs. The aim of the present work was to develop a model, using a mixed graph-theoretical substructural approach based on fragment descriptors, to obtain a molecular pattern for the search, design and prediction of new compounds with antitrypanosomal activity, reporting for the first time quantitative contributions of essential fragmental features to antitrypanosomal activity.
Methods
Fragment Descriptors
The principal utility of fragment-based molecular descriptors is that they reveal which fragments have a positive (favorable) influence on the activity and which have an undesirable one. This makes it possible to redirect the design of bioactive compounds so as to minimize the number of fragments with negative contributions and maximize the number of fragments with positive contributions. In the present work, we used a mixed approach employing atom-centered fragment descriptors,38,39 functional group counts and topological substructural molecular design (TOPS-MODE) descriptors40 to obtain fragment contributions that provide a guideline for the design of antitrypanosomal compounds.
Atom-centered Fragment and Functional Group Count Descriptors
Atom-centered fragments are useful molecular descriptors and have been employed in several QSAR/QSPR studies.38,39 They are simple molecular descriptors defined as the number of specific atom types in a molecule, and they are calculated from the molecular composition and atom connectivities. Each atom type is an atom of the molecule described by its neighboring atoms. Hydrogen and halogen atoms are classified by the hybridization and oxidation state of the carbon atom to which they are bound; for hydrogens, heteroatoms attached to a carbon atom in the α-position are also considered. Carbon atoms are classified by their hybridization state and by whether their neighbors are carbon atoms or heteroatoms. Other descriptors that express certain fragmental features are the functional group counts. These are simple molecular descriptors defined as the number of specific functional groups in a molecule, and they are also calculated from the molecular composition and atom connectivities. Both atom-centered fragment and functional group count descriptors can be regarded as related to the indicator variables of a Free-Wilson analysis, but with some marked differences. The main one is that these descriptors do not merely encode the presence or absence of a given fragment: when a structural feature is present, they express how often it appears in the molecule, regardless of the position in which it appears.
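To make the count-versus-indicator distinction concrete, the following Python sketch counts one fragment type used later in the model, CH2RX (a methylene carbon bonded to one halogen and one other carbon), on a toy hydrogen-depleted graph. The molecule encoding (element plus a hand-assigned hydrogen count) and the helper names are hypothetical; this is not how DRAGON perceives atom types, and it ignores hybridization and aromaticity.

```python
from collections import Counter

HALOGENS = {"F", "Cl", "Br", "I"}


def count_ch2rx(atoms, bonds):
    """Count carbons carrying two hydrogens whose heavy-atom neighbours are
    exactly one halogen and one carbon (a CH2RX-type centre).

    atoms: list of (element, implicit_hydrogen_count) tuples (hypothetical encoding)
    bonds: list of (i, j) index pairs between heavy atoms
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].append(atoms[j][0])
        neighbours[j].append(atoms[i][0])
    count = 0
    for idx, (element, n_h) in enumerate(atoms):
        if element != "C" or n_h != 2:
            continue
        kinds = Counter("X" if e in HALOGENS else e for e in neighbours[idx])
        if kinds == Counter({"X": 1, "C": 1}):
            count += 1
    return count


def indicator(count):
    """Free-Wilson-style presence/absence variable derived from a count."""
    return 1 if count > 0 else 0


# 1,2-dichloroethane, Cl-CH2-CH2-Cl
dce_atoms = [("Cl", 0), ("C", 2), ("C", 2), ("Cl", 0)]
dce_bonds = [(0, 1), (1, 2), (2, 3)]
# chloroethane, CH3-CH2-Cl
ce_atoms = [("C", 3), ("C", 2), ("Cl", 0)]
ce_bonds = [(0, 1), (1, 2)]

print(count_ch2rx(dce_atoms, dce_bonds), indicator(count_ch2rx(dce_atoms, dce_bonds)))  # 2 1
print(count_ch2rx(ce_atoms, ce_bonds), indicator(count_ch2rx(ce_atoms, ce_bonds)))  # 1 1
```

Both molecules get the same indicator value (1), while the count separates them (2 versus 1), which is the extra information the count descriptors carry.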
TOPS-MODE Approach
This approach has been applied to describe physicochemical
properties of organic compounds and many applications for the
design of bioactive compounds have also been reported. The
theoretical background of the method has been widely
described.41–52 Anyway, we will focus only on the main issues.
This method codifies the molecular structure by means of the
edge adjacency matrix E (commonly called the bond adjacency
matrix B).53
The E (or B) matrix is a square matrix of order m, the number of chemical bonds in the molecule. Its elements eij are equal to 1 if bonds i and j are adjacent (i.e., incident on the same vertex, or atom) and 0 otherwise. To encode information on heteroatoms, the TOPS-MODE approach uses weighted matrices B(wij) instead of B. The weights (wij) are chemically meaningful numbers such as bond distances, bond dipoles, bond polarizabilities, or mathematical expressions involving atomic weights.54–56 The weights are placed on the main diagonal of B(wij), and the spectral moments of this matrix can then be used as molecular fingerprints to encode molecular structure in QSAR studies. By definition, the spectral moment of order k (μk) is the sum of the main diagonal elements of the kth power of the weighted matrix, [B(wij)]^k, i.e., its trace. The total spectral moments of the bond matrix43–45 are defined as:
$$\mu_k = \operatorname{Tr}\left(\mathbf{B}^{k}\right) = \sum_{i=1}^{m} \left(\mathbf{B}^{k}\right)_{ii} \qquad (1)$$
where Tr denotes the trace of a matrix, i.e., the sum of its diagonal entries, and $(\mathbf{B}^{k})_{ii}$ is the ith diagonal entry of the kth power of the bond matrix. Local spectral moments are defined as the sum of the diagonal entries of the powers of the bond matrix that correspond to a given molecular fragment.40 In mathematical terms, the local spectral moments of the bond matrix are defined as follows:
$$\mu_k(f) = \sum_{i \in f} \left(\mathbf{B}^{k}\right)_{ii} \qquad (2)$$
where f is the fragment for which the moment is defined and the sum runs over all bonds forming fragment f. The simplest case is when f corresponds to a single bond; the kth local moment is then the diagonal entry of the kth power of the matrix corresponding to that bond. Although TOPS-MODE descriptors are topological in nature, their main advantage over most other graph-theoretical descriptors is that they allow the contributions of fragments to a desired activity to be calculated.50,51 This makes it possible to study local chemical processes such as addition, substitution, elimination and many other reactions. TOPS-MODE descriptors have also been shown to account successfully for a considerable part of geometrical phenomena.57
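Equations (1) and (2) can be illustrated in plain Python: build the bond adjacency matrix from a bond list, place weights on its diagonal, and take traces of matrix powers. This is a minimal sketch assuming a hydrogen-depleted graph and unit weights as a placeholder; the study used molar refractivity and Abraham excess molar refraction weights computed by MODESLAB, which are not reproduced here. For n-butane (three C-C bonds), the middle bond is adjacent to both terminal bonds.

```python
def bond_adjacency(bonds):
    """Edge (bond) adjacency matrix E: e_ij = 1 if bonds i and j share an atom, else 0."""
    m = len(bonds)
    return [
        [1 if i != j and set(bonds[i]) & set(bonds[j]) else 0 for j in range(m)]
        for i in range(m)
    ]


def weighted(E, weights):
    """B(w): a copy of E with one weight per bond placed on the main diagonal."""
    B = [row[:] for row in E]
    for i, w in enumerate(weights):
        B[i][i] = w
    return B


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)] for i in range(n)]


def matrix_power(B, k):
    n = len(B)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # identity
    for _ in range(k):
        result = matmul(result, B)
    return result


def spectral_moment(B, k):
    """Total spectral moment, eq. (1): mu_k = Tr(B^k)."""
    Bk = matrix_power(B, k)
    return sum(Bk[i][i] for i in range(len(Bk)))


def local_spectral_moment(B, k, fragment):
    """Local spectral moment, eq. (2): diagonal entries of B^k summed over a fragment's bonds."""
    Bk = matrix_power(B, k)
    return sum(Bk[i][i] for i in fragment)


# n-butane, hydrogen-depleted: C0-C1, C1-C2, C2-C3
butane_bonds = [(0, 1), (1, 2), (2, 3)]
E = bond_adjacency(butane_bonds)
B = weighted(E, [1.0, 1.0, 1.0])  # placeholder unit weights, not molar refractivity
print(E)  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(spectral_moment(B, 2))  # 7.0
print(local_spectral_moment(B, 2, [1]))  # 3.0 (the central C1-C2 bond)
```

Because the local moments of all bonds add up to the total moment, the total moment splits naturally into bond (or fragment) contributions, which is the property exploited later to compute fragment contributions.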
Selection of the Data Set. Molecular Modeling and Development of Discriminant Analysis
One of the most important steps in the computer-aided search for novel drugs is the design of representative, randomized training and prediction sets. With this aim, we selected a data set of 188 compounds, 99 of them with antitrypanosomal activity36 (Fig. 1) and 89 belonging to other pharmaceutical categories. The active compounds belong to different chemical families, such as imidazoles, imidazolidines, triazoles, macrocycles, nitrofurans and thiocyanate derivatives, among many others. They were active mainly against the epimastigote and amastigote forms of T. cruzi.3,19–28,36,58–63 The 89 inactive compounds were selected from drugs of 19 pharmacological categories: analgesics, anti-inflammatories, antianxiety drugs, anticonvulsants, hypnotics, muscle relaxants, sedatives, antiparkinsonians, antihypertensives, diuretics, anticoagulants, antihistaminics, antivirals, antitumorals, local anesthetics, vasoconstrictors, anabolics, antioxidants, and antituberculosis agents.64,65
All molecules were drawn with HyperChem version 7.01,66 generating *.hin files. The DRAGON program version 5.4,67 was then used to calculate the functional group counts and atom-centered fragment descriptors. In parallel, all molecules were converted to SMILES strings and stored in a text file, which was used to calculate the TOPS-MODE descriptors with MODESLAB version 1.5.69 The molar refractivity and the Abraham molecular interaction term were used as weights of the bond adjacency matrix; this interaction term is expressed as the excess molar refraction and contains important information about solvation phenomena.68
The performance of a discriminant analysis depends on the selection of compounds for the training and prediction series used to build the classifier. One of the most critical aspects in constructing the training series is to ensure high molecular diversity in this data set.70 For this reason, to design both the training and prediction sets, the complete data set was processed with two k-means cluster analyses (k-MCA),71 one for the antitrypanosomal compounds and the other for the inactive ones. The k-MCA was carried out with the k-MCA module of STATISTICA 6.0.72
To ensure acceptable statistical quality of the partition into clusters, we took into account the number of members in each cluster and the standard deviation of the variables within the clusters (as low as possible). We also inspected the standard deviation between and within clusters, the respective Fisher ratios and their p-levels of significance, which were required to be below 0.05. The functional group counts and atom-centered fragment descriptors were used to develop the k-MCA. The training set (compounds selected randomly from each cluster) consisted of 142 drugs, 74 with antitrypanosomal activity and 68 inactive, and the prediction set included 46 compounds (≈24.5% of the whole data set), 25 antitrypanosomal and 21 inactive.
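As a rough illustration of this partitioning strategy, the sketch below runs a plain Lloyd's k-means (a stand-in for the STATISTICA k-MCA module, whose settings are not reproduced) and then draws about a quarter of each cluster at random into the prediction set. The initialisation rule, the seed, the test_fraction parameter and the toy descriptor profiles are all assumptions; since actives and inactives were clustered separately in the study, the functions would be applied once per class.

```python
import random


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def split_by_cluster(labels, test_fraction=0.25, seed=0):
    """Draw about `test_fraction` of every cluster at random into the prediction set."""
    rng = random.Random(seed)
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    train, test = [], []
    for members in clusters.values():
        members = members[:]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


# Hypothetical two-descriptor profiles of eight compounds forming two obvious groups
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = kmeans(points, 2)
train, test = split_by_cluster(labels, test_fraction=0.25, seed=1)
print(labels, train, test)
```

Sampling within each cluster, rather than from the pooled data, keeps every structural group represented in both the training and the prediction series.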
Figure 1. Structure of the molecules active against T. cruzi in the training and prediction series.
Linear discriminant analysis (LDA) has been the statistical technique of choice in several QSPR and QSAR studies.49–51,73 In the present work, a QSAR expression of the same form was derived:
$$\mathrm{ATc} = a_0 + a_1 D_1 + a_2 D_2 + \cdots + a_k D_k \qquad (3)$$
In eq. (3), ATc (acronym for antitrypanosomal activity) is a dummy indicator variable that takes the value 1 for antitrypanosomal compounds and −1 for inactive ones. The term $a_k$ is the kth coefficient of the discriminant function and $D_k$ is the kth descriptor (fragmental or TOPS-MODE). The discriminant function was obtained with the LDA module of STATISTICA 6.0, using the program's default parameters. The variables included in the equation were selected by a forward stepwise procedure. Model selection followed the principle of parsimony (Occam's razor):74 the function with high statistical significance but as few parameters as possible was chosen.
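To show how a discriminant function of the form of eq. (3) arises, here is a minimal two-class Fisher LDA in pure Python. It is a textbook sketch, not STATISTICA's implementation: it omits forward stepwise selection, and its coefficients may differ from STATISTICA's by a scale factor. It also returns the squared Mahalanobis distance D² between the two group centroids. The toy descriptor data are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def pooled_covariance(groups):
    """Pooled within-group covariance: summed scatter matrices / (N - number of groups)."""
    p = len(groups[0][0])
    S = [[0.0] * p for _ in range(p)]
    n_total = 0
    for rows in groups:
        mu = mean(rows)
        n_total += len(rows)
        for row in rows:
            d = [a - b for a, b in zip(row, mu)]
            for i in range(p):
                for j in range(p):
                    S[i][j] += d[i] * d[j]
    dof = n_total - len(groups)
    return [[s / dof for s in row] for row in S]


def fit_lda(active, inactive):
    """Two-class Fisher LDA: returns (coefficients, intercept, D2 between centroids)."""
    m1, m0 = mean(active), mean(inactive)
    S = pooled_covariance([active, inactive])
    diff = [a - b for a, b in zip(m1, m0)]
    w = solve(S, diff)  # w = S^-1 (m1 - m0)
    # Threshold halfway between the projected centroids (equal priors)
    intercept = -sum(wi * (a + b) / 2 for wi, a, b in zip(w, m1, m0))
    d2 = sum(wi * di for wi, di in zip(w, diff))  # (m1 - m0)^T S^-1 (m1 - m0)
    return w, intercept, d2


def score(w, intercept, x):
    """Discriminant score: positive means assignment to the active group."""
    return intercept + sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical two-descriptor training data
active = [[2, 1], [3, 2], [4, 2], [3, 3]]
inactive = [[0, 0], [1, 0], [0, 1], [1, 2]]
w, b0, d2 = fit_lda(active, inactive)
print([round(v, 3) for v in w], round(b0, 3), round(d2, 3))  # [5.0, 0.0] -8.75 12.5
```

A larger D² means better-separated group centroids relative to the within-group scatter, which is why it is reported alongside Wilks' λ and F.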
The statistical quality of the model was determined by examining Wilks' λ, the squared Mahalanobis distance (D²), the Fisher ratio (F) and its p-level, the percentage of correct classification in each group, and the ratio of cases to variables. Wilks' λ for the overall discrimination ranges from 0 (perfect discrimination) to 1 (no discrimination). The D² statistic measures the separation between the two groups, showing whether the model has adequate discriminatory power to differentiate between them. Compounds were considered unclassified by the model when their percentages of classification into the two groups differed by no more than 5%. Compounds in the prediction set were never used to develop the discriminant function; for the prediction series, we focused only on the model's ability to classify active and inactive compounds.
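The 5% rule for unclassified compounds can be written as a small decision function. The posterior_active helper is an assumption: it converts a Fisher discriminant score into a probability under equal priors and a shared Gaussian covariance (where the score equals the log-odds), which may not match exactly how STATISTICA reports posterior probabilities.

```python
import math


def posterior_active(score):
    """Probability (%) of the active class from a Fisher discriminant score.

    Assumes equal priors and a shared Gaussian covariance, under which the
    score equals the log-odds of being active.
    """
    return 100.0 / (1.0 + math.exp(-score))


def assign_class(p_active, margin=5.0):
    """5% rule: leave a compound unclassified when its percentages of
    classification into the two groups differ by no more than `margin` points."""
    p_inactive = 100.0 - p_active
    if abs(p_active - p_inactive) <= margin:
        return "unclassified"
    return "active" if p_active > p_inactive else "inactive"


for s in (2.0, 0.05, -1.5):  # hypothetical discriminant scores
    p = posterior_active(s)
    print(s, round(p, 1), assign_class(p))
```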
Additionally, to validate the model, further statistical indices were computed for the training and prediction sets: sensitivity (sens, the true positive rate), specificity (spec, the true negative rate), accuracy (acc, the overall predictability), positive predictive value (PPV) and negative predictive value (NPV). All these parameters were determined according to the following equations:
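$$\mathrm{sens} = \frac{TP}{C^{+}} \times 100\% \qquad (4)$$
$$\mathrm{spec} = \frac{TN}{C^{-}} \times 100\% \qquad (5)$$
$$\mathrm{acc} = \frac{TP + TN}{C^{+} + C^{-}} \times 100\% \qquad (6)$$
$$\mathrm{PPV} = \frac{TP}{TP + FP} \times 100\% \qquad (7)$$
$$\mathrm{NPV} = \frac{TN}{TN + FN} \times 100\% \qquad (8)$$
where TP is the number of true positive compounds, C+ the total number of active compounds, TN the number of true negative compounds, C− the total number of inactive compounds, FP the number of false positives and FN the number of false negatives.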
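Equations (4) to (8) translate directly into code. The confusion counts in the usage example are not listed in the paper; they are back-calculated from the training-set sizes (74 actives, 68 inactives) and the sensitivity and specificity reported in the Results, and they reproduce all five reported training-set percentages.

```python
def classification_stats(tp, tn, fp, fn):
    """Eqs. (4)-(8): sensitivity, specificity, accuracy, PPV and NPV, in percent."""
    c_pos = tp + fn  # C+: all truly active compounds
    c_neg = tn + fp  # C-: all truly inactive compounds
    return {
        "sens": 100.0 * tp / c_pos,
        "spec": 100.0 * tn / c_neg,
        "acc": 100.0 * (tp + tn) / (c_pos + c_neg),
        "ppv": 100.0 * tp / (tp + fp),
        "npv": 100.0 * tn / (tn + fn),
    }


# Training set: 74 actives, 68 inactives (counts back-calculated, see above)
train = classification_stats(tp=67, tn=61, fp=7, fn=7)
print({name: round(value, 2) for name, value in train.items()})
# {'sens': 90.54, 'spec': 89.71, 'acc': 90.14, 'ppv': 90.54, 'npv': 89.71}
```

The same back-calculation for the prediction set (25 actives, 21 inactives) gives tp=23, tn=19, fp=2, fn=2, i.e., 92.00, 90.48 and 91.30% for sens, spec and acc; the paper prints the specificity as 90.47%, a difference only in the last digit.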
Results and Discussion
Discriminant Model
After a random and representative selection of the training and prediction series, the training series was used to fit the discriminant function. The best model included all three types of descriptors computed here: atom-centered fragments, functional group counts and TOPS-MODE descriptors. The classification function that distinguishes between antitrypanosomal and inactive compounds in the training set, together with the statistical parameters of the LDA, is as follows:
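$$\begin{aligned}
\mathrm{ATc} ={}& 2.459(\mathrm{RSCN}) + 1.176\times10^{-2}\,\mu_{2}(\mathrm{Mol}) - 3.051\times10^{-5}\,\mu_{8}(\text{Ab-R2}) + 1.701(\mathrm{ArNO_2}) \\
&+ 3.004(\mathrm{Pyraz}) + 0.853(\text{C-028}) + 1.295(\mathrm{Hyd}) - 0.541(\text{H-053}) \\
&+ 0.415(\text{C-016}) - 1.188(\mathrm{RCOOR}) + 2.238(\mathrm{CH_2RX}) - 2.964
\end{aligned}$$
$$N = 142;\quad \lambda = 0.377;\quad D^2 = 6.521;\quad F(11, 130) = 19.506;\quad p < 0.0000001 \qquad (9)$$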
In eq. (9), RSCN is the number of aliphatic thiocyanate groups, μ2(Mol) is the spectral moment of order 2 weighted by the molar refractivity, and μ8(Ab-R2) is the spectral moment of order 8 weighted by the Abraham excess molar refraction term. ArNO2 is the number of aromatic nitro groups and Pyraz the number of pyrazole moieties. C-028 is the number of carbon atoms attached to a carbon atom and to one electronegative atom (O, N, S, P, Se, halogens) by aromatic bonds (as in benzene) or delocalized bonds (such as the N–O bond in a nitro group), and to another carbon atom by a single bond. Hyd is the number of hydrazone moieties. H-053 is the number of hydrogens attached to a C⁰(sp³) carbon with two electronegative atoms attached to the next carbon atom (the superscript is the formal oxidation number, calculated as the sum of the conventional bond orders with electronegative atoms). C-016 is the number of sp² carbon atoms attached by single bonds to one carbon and one hydrogen atom, RCOOR is the number of aliphatic ester groups, and CH2RX is the number of methylene groups attached to a halogen atom and to another carbon atom.
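Eq. (9) can be evaluated for any descriptor vector with a few lines of code. The coefficients below are those reported above; the dictionary keys (mu2_Mol, mu8_AbR2, etc.) are hypothetical names, and the decision rule (positive score means active) is an assumption based on the +1/−1 coding of ATc. The descriptor values in the example are made up, and the spectral moments are set to zero only to keep the arithmetic easy to follow; real molecules have nonzero moments.

```python
COEFFICIENTS = {
    "RSCN": 2.459,  # aliphatic thiocyanates
    "mu2_Mol": 1.176e-2,  # spectral moment of order 2, molar refractivity weights
    "mu8_AbR2": -3.051e-5,  # spectral moment of order 8, Abraham R2 weights
    "ArNO2": 1.701,
    "Pyraz": 3.004,
    "C-028": 0.853,
    "Hyd": 1.295,
    "H-053": -0.541,
    "C-016": 0.415,
    "RCOOR": -1.188,
    "CH2RX": 2.238,
}
INTERCEPT = -2.964


def atc_score(descriptors):
    """Evaluate eq. (9); descriptors absent from the dict are treated as zero."""
    return INTERCEPT + sum(coef * descriptors.get(name, 0.0) for name, coef in COEFFICIENTS.items())


def predict(descriptors):
    # Assumed decision rule: ATc coded +1 (active) / -1 (inactive), so score > 0 -> active
    return "active" if atc_score(descriptors) > 0 else "inactive"


# Hypothetical descriptor values (spectral moments zeroed for readability only)
example = {"ArNO2": 1, "Hyd": 1, "C-028": 1}
print(round(atc_score(example), 3), predict(example))  # 0.885 active
```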
In the training series, the sensitivity of the model (i.e., its ability to identify antitrypanosomal compounds correctly) is 90.54% and the specificity (i.e., its ability to identify inactive compounds correctly) is 89.71%, giving an accuracy of 90.14%. The positive and negative predictive values in the training series are 90.54% and 89.71%, respectively. The PPV means that if the model predicts a compound to have antitrypanosomal activity, the probability that the compound is actually active is 90.54%. Similarly, the NPV means that if the model predicts a compound to be inactive against T. cruzi, the probability that it is actually inactive is 89.71%.
Models often misclassify some compounds, a fact that must be taken into consideration when using them. A classically accepted safeguard is to check the squared Mahalanobis distance (D²) of each molecule from the centroids of the two groups (active and inactive compounds); in general, a prediction for a compound with abnormal D² values should not be used. None of the compounds showed abnormal values of the Mahalanobis distance. The most important criterion for accepting a discriminant model such as eq. (9) is its performance on the prediction set. For this set, the sensitivity of the model was 92% and the specificity 90.47%, giving an accuracy of 91.30%; the positive and negative predictive values were 92% and 90.47%, respectively. Supporting Information Tables 1 and 2 show the percentages of classification obtained for the training and prediction sets, respectively.
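A minimal version of the Mahalanobis check for two descriptors is sketched below. The paper does not state what counts as an "abnormal" D²; the cutoff used here (the 97.5% chi-square quantile with 2 degrees of freedom, which has the closed form −2 ln(1 − level)) is an assumed convention, and the centroids and covariance matrix are hypothetical.

```python
import math


def mahalanobis_sq(x, centroid, cov):
    """Squared Mahalanobis distance of x from a centroid (two descriptors, 2x2 covariance)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [x[0] - centroid[0], x[1] - centroid[1]]
    return sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))


def is_unusual(x, centroids, cov, level=0.975):
    """Flag x when it is far from every group centroid.

    Assumed cutoff: chi-square quantile with 2 degrees of freedom,
    whose closed form is -2 * ln(1 - level) (about 7.38 for level = 0.975).
    """
    cutoff = -2.0 * math.log(1.0 - level)
    return min(mahalanobis_sq(x, c, cov) for c in centroids) > cutoff


cov = [[1.0, 0.0], [0.0, 4.0]]  # hypothetical pooled covariance
centroids = [(3.0, 2.0), (0.5, 0.75)]  # hypothetical active / inactive centroids
print(mahalanobis_sq((3.0, 4.0), centroids[0], cov))  # 1.0
print(is_unusual((3.0, 4.0), centroids, cov), is_unusual((20.0, 20.0), centroids, cov))  # False True
```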
As can be seen, the model has good statistical quality. An important point concerns the number of variables in the model, for which two principal aspects were taken into account. The first is the complexity of the data, which show great variability in structure, composition and number of atoms: simple molecules such as guanine analogues as well as complex ones such as amphotericin B were analyzed with this model. For this reason, since we were trying to capture structural features at a substructural level, an increase in the number of variables should be expected.

Table 1. Correlation Matrix at p < 0.05, N = 142.
Variable  RSCN      RCOOR     ArNO2     Hyd       CH2RX     Pyraz     C-016     C-028     H-053     μ2(Mol)   μ8(Ab-R2)
RSCN      1.00      −0.08     −0.10     −0.10     −0.04     −0.05     −0.06     −0.11     −0.10     −0.11     −0.25
RCOOR               1.00      0.07      −0.09     −0.03     −0.05     0.58      −0.10     0.39      0.33      0.22
ArNO2                         1.00      0.39      −0.04     −0.05     −0.05     −0.07     −0.11     −0.13     −0.11
Hyd                                     1.00      −0.04     −0.06     −0.07     −0.12     −0.11     −0.02     −0.17
CH2RX                                             1.00      −0.02     −0.03     −0.04     −0.04     −0.02     −0.11
Pyraz                                                       1.00      −0.03     0.01      −0.06     −0.20     0.07
C-016                                                                 1.00      −0.10     0.63      0.47      0.28
C-028                                                                           1.00      −0.08     0.04      −0.07
H-053                                                                                     1.00      0.46      0.32
μ2(Mol)                                                                                             1.00      0.56
μ8(Ab-R2)                                                                                                     1.00
Correlation Among Variables
In statistical models, the independent variables are often highly correlated. This is not a consequence of the statistical nature of the method employed but of the inherent properties of the descriptors used. This disturbing factor is frequently overlooked; it can make the model unstable, and the principal difficulty appears when the relative importance of the variables in the model has to be analyzed. The correlation coefficient between independent variables should not exceed 0.7 in absolute value, which is traditionally considered the cut-off for collinearity between variables. The correlation matrix of the model (Table 1) shows that no pair of variables exceeds this value, so each variable provides specific structural information.
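The collinearity check can be automated as below. The sketch assumes Pearson correlation (the paper does not name the coefficient behind Table 1) and flags any descriptor pair whose absolute correlation exceeds 0.7; the descriptor columns are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(columns, cutoff=0.7):
    """Return (name_a, name_b, r) for every descriptor pair with |r| > cutoff."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > cutoff:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical descriptor columns over five compounds
columns = {
    "D1": [1, 2, 3, 4, 5],
    "D2": [2, 4, 6, 8, 10.5],  # nearly proportional to D1
    "D3": [5, 1, 4, 2, 3],
}
print(collinear_pairs(columns))
```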
Receiver Operating Characteristic (ROC) Curve
Sensitivity and specificity describe the true performance with greater clarity than accuracy, but they also have some disadvantages: for a given classifier they are two separate measures, one for positive cases and the other for negative ones. A single combined measure of classifier performance is provided by receiver operating characteristic (ROC) analysis, a classic methodology from signal detection theory.75,76 Crucial for understanding ROC is the confidence threshold of the classification task. With a strict threshold, sensitivity will be low and specificity very high; as the threshold is relaxed, specificity falls while sensitivity rises. In this way, two classifiers can be compared over a broad range of threshold conditions. The ROC curve is created by plotting the true positive rate against the false positive rate, i.e., sensitivity against 1 − specificity.

An ROC curve running along the diagonal from bottom left to upper right represents pure-chance performance. When the variable (or variables) under study cannot distinguish between the two groups, i.e., when there is no difference between the two distributions, the area under the curve equals 0.5 (the ROC curve coincides with the diagonal). When the values of the two groups are perfectly separated, i.e., the distributions do not overlap, the area under the ROC curve equals 1 (the ROC curve reaches the upper left corner of the plot). The ROC curves for the model, for both the training and prediction sets (Fig. 2), were obtained with MATLAB version 6.5.77
The areas under the ROC curves for the training and prediction sets are 0.96 and 0.95, respectively. For the training set, this means that a randomly selected compound from the active series (antitrypanosomal compounds) receives a higher discriminant value than a randomly chosen compound from the inactive series 96% of the time; the prediction set can be interpreted in the same way. Since both areas are substantially higher than 0.5, the ROC analysis confirms that the model is not a random classifier.
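Both the ROC curve and the probabilistic reading of its area can be sketched in a few lines. The study used MATLAB 6.5; this pure-Python version computes ROC points by sweeping the threshold over the observed scores and computes the area as the fraction of active/inactive pairs in which the active compound scores higher (ties count one half), which is the interpretation given above. The scores are hypothetical.

```python
def roc_points(active_scores, inactive_scores):
    """ROC points (1 - specificity, sensitivity), sweeping the threshold from high to low."""
    thresholds = sorted(set(active_scores) | set(inactive_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in active_scores) / len(active_scores)
        fpr = sum(s >= t for s in inactive_scores) / len(inactive_scores)
        points.append((fpr, tpr))
    return points


def auc_pairwise(active_scores, inactive_scores):
    """Area under the ROC curve as P(random active scores higher than random inactive)."""
    wins = 0.0
    for a in active_scores:
        for i in inactive_scores:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5  # ties count as half a win
    return wins / (len(active_scores) * len(inactive_scores))


active = [2.1, 1.4, 0.3, -0.2]  # hypothetical discriminant scores
inactive = [-1.5, -0.8, 0.5, -2.0]
print(auc_pairwise(active, inactive))  # 0.875
print(roc_points(active, inactive)[-1])  # (1.0, 1.0)
```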
Structural Interpretation
One of the most important features of a model is that its descriptors should have as clear an interpretation as possible. In the model represented by eq. (9), the functional group counts and atom-centered descriptors are easy to interpret because they correspond to particular functional groups and atom types in the molecules. The interpretation of a spectral moment (TOPS-MODE), on the other hand, can be difficult. However, the principal advantage of these descriptors is that, once they are included in eq. (9), the relative contribution of any fragment to the antitrypanosomal activity can be calculated. Another important advantage, which follows from the definition of the spectral moments, is that fragments not present in the molecules under study can also be generated and their contributions calculated. The general methodology for calculating the quantitative contribution of a given substructure (fragment) to the antitrypanosomal activity is as follows. First, all the substructures (see Fig. 3) whose contributions we wished to determine were selected. Their spectral moments were then calculated, and their contributions to the antitrypanosomal activity were obtained (Table 2) by substituting these spectral moments into the discriminant model. Some fragments also carry specific information represented by the functional group counts and atom-centered descriptors, and this additional information was taken into account when calculating the quantitative contributions. In this way, we obtained fragments with a favorable (positive) influence on antitrypanosomal activity and fragments with an undesirable (negative) contribution.

Table 2. Fragment Contributions.
Fragment Contribution Fragment Contribution Fragment Contribution Fragment Contribution
F1 3.679 F11 0.572 F21 1.468 F31 0.263
F2 0.698 F12 1.005 F22 1.683 F32 1.566
F3 1.371 F13 2.937 F23 0.356 F33 2.850
F4 0.142 F14 2.660 F24 0.499 F34 4.197
F5 1.539 F15 1.897 F25 0.865 F35 3.615
F6 1.407 F16 0.370 F26 3.732 F36 0.580
F7 0.474 F17 0.409 F27 1.130 F37 0.903
F8 2.672 F18 0.469 F28 0.697 F38 1.763
F9 1.785 F19 0.746 F29 0.471 F39 0.200
F10 3.530 F20 0.884 F30 2.837 F40 2.872

Figure 2. ROC curves for the training and prediction sets using the model represented by eq. (9).
Figure 3. Some fragments present in the molecules.
Figure 4. Molecules designed and predicted by the model with proven antitrypanosomal activity.
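The substitution step described above can be sketched as a linear sum: each fragment's descriptor values (its local spectral moments plus any fragment counts it carries) are multiplied by the eq. (9) coefficients. Whether the intercept −2.964 was included in the published contributions is not stated; this sketch leaves it out, which is an assumption. The local moment values are hypothetical, not taken from Table 2 or Fig. 3.

```python
# Subset of the eq. (9) coefficients needed for the two example fragments
COEFFICIENTS = {
    "mu2_Mol": 1.176e-2,
    "mu8_AbR2": -3.051e-5,
    "ArNO2": 1.701,
    "RCOOR": -1.188,
}


def fragment_contribution(fragment_descriptors):
    """Sum of coefficient x value over the descriptors carried by one fragment.

    The intercept is deliberately left out (an assumption, see the lead-in).
    """
    return sum(COEFFICIENTS[name] * value for name, value in fragment_descriptors.items())


# Hypothetical local spectral moments and fragment counts
nitroaryl = {"mu2_Mol": 40.0, "mu8_AbR2": 1000.0, "ArNO2": 1}
ester = {"mu2_Mol": 10.0, "mu8_AbR2": 0.0, "RCOOR": 1}
print(round(fragment_contribution(nitroaryl), 3))  # 2.141
print(round(fragment_contribution(ester), 3))  # -1.07
```

With these made-up values, the nitroaryl-type fragment contributes positively and the ester-type fragment negatively, mirroring the signs of the ArNO2 and RCOOR coefficients in eq. (9).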
Another point worth noting is that some fragments with negative (or small positive) contributions can appear in active (antitrypanosomal) molecules, while fragments with considerable positive contributions can appear in inactive molecules. Thus, the presence or absence of antitrypanosomal activity is not associated with a single fragment; it depends on the sum of the contributions of all fragments present in the molecule. It is this sum that determines whether a compound under study is classified as an antitrypanosomal agent.
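The point that classification depends on the sum of all fragment contributions, not on any single fragment, can be illustrated with a toy calculation. This is a sketch under stated assumptions: the contributions and intercepts are invented numbers, "score above zero" is a hypothetical decision rule, and the logistic conversion to a probability assumes the score is on a log-odds scale with equal prior probabilities, which may not match how this model's posterior probabilities were obtained.

```python
import math

def discriminant_score(fragment_contributions, intercept):
    """Score of a molecule: intercept plus the sum of ALL its fragment contributions."""
    return intercept + sum(fragment_contributions)

def predicted_active(fragment_contributions, intercept, cutoff=0.0):
    """Hypothetical decision rule: active when the total score exceeds the cutoff."""
    return discriminant_score(fragment_contributions, intercept) > cutoff

def posterior_probability(score):
    """Logistic transform; meaningful only if the score is a log-odds (assumption)."""
    return 1.0 / (1.0 + math.exp(-score))

# Invented numbers: a negative fragment (-0.4) inside a molecule that is still
# predicted active, and a strongly positive fragment (0.9) inside one that is not.
mol_a = [1.2, -0.4, 0.3]
mol_b = [0.9, -1.5]
print(predicted_active(mol_a, intercept=-0.5))  # True
print(predicted_active(mol_b, intercept=0.2))   # False
```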
Design and Prediction
The last, and one of the most important, steps in the development of a discriminant model is the prediction of new compounds.78 To give a general idea of how the model can be used for design and prediction, some aspects should be pointed out. This model, based on fragment descriptors, serves both purposes: rapid design, because it reveals important structural features, and prediction. It offers a simple guideline for identifying structural requirements for antitrypanosomal activity. There are two ways to design antitrypanosomal agents using this model:
• Starting from the structures of molecules reported as antitrypanosomal agents and making molecular variations, guided by the model based on fragment descriptors, to obtain new molecules.
• Starting from several fragments with considerable positive contributions and using them to construct new molecules.
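The two design routes listed above can be sketched with a deliberately simplified, hypothetical representation in which a molecule is just a tuple of fragment labels and its score is an intercept plus the sum of tabulated fragment contributions. The labels BIM, PhTD and BDO are borrowed from Fig. 5, but every number below is invented, and the sketch ignores how fragments are connected, which the real TOPS-MODE descriptors do capture.

```python
from itertools import combinations

# Hypothetical fragment contributions (invented values, NOT those of Table 2).
CONTRIB = {"BIM": 1.10, "PhTD": 0.95, "BDO": 0.60, "NO2_ar": 0.45,
           "CH3": -0.05, "COOH": -0.70}
INTERCEPT = -1.0  # hypothetical intercept of the discriminant function

def score(fragments):
    """Toy score: intercept plus the sum of the fragment contributions."""
    return INTERCEPT + sum(CONTRIB[f] for f in fragments)

def vary(parent, position, replacements):
    """Route 1: analogues of a known active, swapping one fragment at a time."""
    variants = []
    for new_fragment in replacements:
        child = list(parent)
        child[position] = new_fragment
        variants.append(tuple(child))
    return variants

def design_by_variation(parent, position, replacements):
    """Score every analogue and keep those predicted active, best first."""
    scored = [(score(v), v) for v in vary(parent, position, replacements)]
    return sorted((s for s in scored if s[0] > 0.0), reverse=True)

def design_by_assembly(n_fragments):
    """Route 2: combine fragments that have positive contributions."""
    positive = [f for f, c in CONTRIB.items() if c > 0.0]
    scored = [(score(c), c) for c in combinations(positive, n_fragments)]
    return sorted((s for s in scored if s[0] > 0.0), reverse=True)

print(design_by_variation(("BIM", "CH3"), 1, ["NO2_ar", "COOH"]))
print(design_by_assembly(2)[0])
```

In a real application the scoring function would be the fitted discriminant model evaluated on the full structure of each candidate, rather than a lookup table of labels.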
In the first alternative, the molecules subjected to molecular variation can be those that the model classifies as active. The variations are carried out according to the structural information encoded in the model descriptors, and the probabilities of activity of the resulting molecules are then predicted. The second alternative requires prior knowledge of which fragments may be essential for antitrypanosomal activity. New molecules are built from the structural information of the model descriptors, expressed as quantitative contributions, and their probabilities of being active against T. cruzi are then calculated. To illustrate how the model detects fragments that are useful for the design of antitrypanosomal agents, we selected 15 molecules. The first five are benzimidazole derivatives, and the other 10 are 1,3,4-thiadiazolium-2-phenylamine chloride analogues derived from the natural product piperine. Both groups of compounds (Fig. 4) have recently been shown to exhibit considerable antitrypanosomal activity.79,80
The model correctly classified all 15 compounds (Table 3). Simple inspection shows that fragments with favorable contributions, such as substructures containing benzimidazole, 3-phenyl-1,3,4-thiadiazolium and benzo[d][1,3]dioxole moieties (Fig. 5), together with structural features such as the number of aromatic nitro groups and those accounted for by the descriptors C-016 and C-028 (see the Excel file "molecule validation"), are present in the molecules predicted as potential antitrypanosomal agents. These molecules had already been tested experimentally, and the model's predictions agree with those experimental results. At the same time, the model identifies which features of these compounds, expressed as quantitative fragment contributions, favor antitrypanosomal activity and thus account for their ability to inhibit T. cruzi.
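As a quick arithmetic check of the statement that all 15 compounds were classified correctly, the probabilities reported in Table 3 can be thresholded. The 0.5 cut-off is an assumption (the usual two-class decision rule), not a value stated in this section.

```python
# Probabilities of being antitrypanosomal agents, transcribed from Table 3.
TABLE_3 = {
    "M-1": 0.9828, "M-2": 0.9975, "M-3": 0.9983, "M-4": 0.9984, "M-5": 0.9990,
    "M-6": 0.9501, "M-7": 0.9994, "M-8": 0.9847, "M-9": 0.9998, "M-10": 0.8136,
    "M-11": 0.9977, "M-12": 0.7756, "M-13": 0.9971, "M-14": 0.6928, "M-15": 0.9954,
}
CUTOFF = 0.5  # assumed two-class decision threshold

def classify(probabilities, cutoff=CUTOFF):
    """Map each molecule ID to True (predicted active) or False."""
    return {mol: p > cutoff for mol, p in probabilities.items()}

predicted = classify(TABLE_3)
n_active = sum(predicted.values())
lowest = min(TABLE_3, key=TABLE_3.get)
below_090 = sorted(m for m, p in TABLE_3.items() if p < 0.90)

print(f"{n_active}/{len(TABLE_3)} predicted active")         # 15/15 predicted active
print(f"lowest probability: {lowest} ({TABLE_3[lowest]})")  # lowest probability: M-14 (0.6928)
print(below_090)                                             # ['M-10', 'M-12', 'M-14']
```

Even the least confident prediction, M-14 at 0.6928, lies well above the assumed cut-off.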
Table 3. Probabilities of Being Antitrypanosomal Agents.
Mol ID   Prob.     Mol ID   Prob.     Mol ID   Prob.
M-1      0.9828    M-6      0.9501    M-11     0.9977
M-2      0.9975    M-7      0.9994    M-12     0.7756
M-3      0.9983    M-8      0.9847    M-13     0.9971
M-4      0.9984    M-9      0.9998    M-14     0.6928
M-5      0.9990    M-10     0.8136    M-15     0.9954
Figure 5. Quantitative contribution of the principal fragments contained in the molecules predicted by the model. BIM (benzimidazole moiety), PhTD (3-phenyl-1,3,4-thiadiazolium), BDO (benzo[d][1,3]dioxole).
Conclusions
In this work, a model for the classification, design and prediction of antitrypanosomal compounds was obtained. The selected descriptors yielded a model of good statistical quality. The procedure has shown that a good linear discriminant function can be generated by using simple molecular descriptors, such as functional group counts and atom-centered fragments, in combination with descriptors of a topological substructural nature, such as the TOPS-MODE descriptors. This combination provided structural features expressed as fragments, together with information on how those fragments are connected and influence one another, giving an idea of the structural requirements favorable to antitrypanosomal activity. Thus, we report for the first time the quantitative contribution of different fragments to antitrypanosomal activity; substructures containing benzimidazole, 3-phenyl-1,3,4-thiadiazolium and benzo[d][1,3]dioxole moieties were identified as essential for this activity, allowing the design and prediction of 15 molecules recently reported to show a significant inhibitory effect against T. cruzi. The predicted molecules may therefore become, or give rise to, new leads for the development of more effective and less toxic antitrypanosomal compounds. In principle, the methodology developed in this work allows the design, search and prediction of novel antitrypanosomal agents in a quantitative, rapid and easy way.
Acknowledgments
The authors thank Dr. Ernesto Estrada for donating MODESLAB 1.5, which was a valuable tool in the development of this work. The referees' comments contributed significantly to improving this work.
References
1. World Health Organization Report, TDR (2005), Progress 2003–2004, 17th Program Report. Available at http://www.who.int/tdr/publicatios/publicatios/pdf/pr17/chagas.pdf.
2. Urbina, J. A. Curr Pharm Des 2002, 8, 287.
3. Urbina, J. A.; Docampo, R. Trends Parasitol 2003, 19, 495.
4. Docampo, R.; Moreno, S. N. J. In Free Radicals in Biology; Pryor, W. A., Ed.; Academic Press: New York, 1984; pp. 243–288.
5. Muelas, S.; Suarez, M.; Perez, R.; Rodríguez, H.; Ochoa, C.; Escario, J. A.; Gomez-Barrio, A. Mem Inst Oswaldo Cruz 2002, 97, 269.
6. Muelas-Serrano, S.; Le-Senne, A.; Fernandez-Portillo, C.; Nogal, J. J.; Ochoa, C.; Gomez-Barrio, A. Mem Inst Oswaldo Cruz 2002, 97, 553.
7. Hansch, C.; Leo, A. QSAR: Fundamentals and Applications in Chemistry and Biology; American Chemical Society: Washington, DC, 1995.
8. Kubinyi, H. In Methods and Principles in Medicinal Chemistry; Mannhold, R.; Krogsgaard-Larsen, P.; Timmerman, H., Eds.; VCH: New York, 1993.
9. Kier, L. B.; Hall, L. H. Molecular Connectivity in Chemistry and Drug Research; Academic Press: New York, 1976.
10. Kier, L. B.; Hall, L. H. Molecular Connectivity in Structure-Activity Analysis; Research Studies Press: Letchworth, Herts, 1986.
11. Ivanciuc, O. In Handbook of Chemoinformatics; Gasteiger, J., Ed.; Wiley-VCH: 2003; pp. 103–138.
12. Ivanciuc, O. In Handbook of Chemoinformatics; Gasteiger, J., Ed.; Wiley-VCH: 2003; pp. 981–1003.
13. García-Domenech, R.; García, F. J.; Soler, R. M.; Galvez, J.; Anton-Fos, G. M.; De Julian-Ortiz, J. V. Quant Struct-Act Relat 1996, 15, 1.
14. Julian-Ortiz, J. V.; Galvez, J.; Munoz-Collado, C.; García-Domenech, R.; Gimeno-Cardona, C. J Med Chem 1999, 42, 3308.
15. De Gregorio Alapont, C.; García-Domenech, R.; Galvez, J.; Ros, M. J.; Wolski, S.; García, M. D. Bioorg Med Chem Lett 2000, 10, 2033.
16. Gozalbes, R.; Brun-Pascaud, M.; Galvez, J.; García-Domenech, R.; Girard, P. M.; Doucet, J. P.; Derouin, F. Antimicrob Agents Chemother 2000, 44, 2764.
17. Kubinyi, H.; Kehrhahn, O. H. J Med Chem 1976, 19, 578.
18. Kubinyi, H. J Med Chem 1976, 19, 587.
19. Bonse, S.; Santelli-Rouvier, C.; Barbe, J.; Krauth-Siegel, R. L. J Med Chem 1999, 42, 5448.
20. De Koning, H. P. Int J Parasitol 2001, 31, 511.
21. Elhalem, E.; Bailey, B. N.; Docampo, R.; Ujvary, I.; Szajnman, S. H.; Rodríguez, L. B. J Med Chem 2002, 45, 3984.
22. Avila, J. L.; Rojas, T.; Avila, A.; Polegre, M. A.; Robins, R. K. Antimicrob Agents Chemother 1987, 31, 447.
23. Oliveira, R. B.; Passos, A. P. F.; Alves, R. O.; Romanha, A. J.; Prado, M. A. F.; Dias De Sousa Filho, J.; Alves, R. J. Mem Inst Oswaldo Cruz 2003, 98, 141.
24. Carvalho, S. A.; Da Silva, E. F.; Santa-Rita, R. M.; De Castro, S. L.; Fraga, C. A. M. Bioorg Med Chem Lett 2004, 14, 5967.
25. Cerecetto, H.; Di Maio, R.; Gonzalez, M.; Risso, M.; Sagrera, G.; Seoane, G.; Denicola, A.; Peluffo, G.; Quijano, C.; Stoppani, A. O.; Paulino, M.; Olea-Azar, C.; Basombrío, M. A. Eur J Med Chem 2000, 35, 343.
26. Gilbert, I. H. Biochim Biophys Acta 2002, 1587, 249.
27. Du, X.; Guo, C.; Hansell, E.; Doyle, P. S.; Caffrey, C. R.; Holler, T. P.; McKerrow, J. H.; Cohen, F. E. J Med Chem 2002, 45, 2695.
28. Jonckers, T. H. M.; Van Miert, S.; Cimanga, K.; Bailly, C.; Colson, P.; De Pauw-Gillet, M. C.; Van Den Heuvel, H.; Claeys, M.; Lemiere, F.; Esmans, E. L.; Rozenski, J.; Quirijnen, L.; Maes, L.; Dommisse, R.; Lemiere, G. L.; Vlietinck, A.; Pieters, L. J Med Chem 2002, 45, 3497.
29. Prieto, J. J.; Talevi, A.; Bruno-Blanch, L. E. Mol Divers 2006, 10, 361.
30. McGrath, M. E.; Eakin, A. E.; Engel, J. C.; McKerrow, J. H.; Craik, C. S.; Fletterick, R. J. J Mol Biol 1995, 247, 251.
31. Eakin, A. E.; Guerra, A.; Focia, P. J.; Torres-Martinez, J.; Craig, S. P., III. Antimicrob Agents Chemother 1997, 41, 1686.
32. Chowdhury, S. F.; Bernier Villamor, V. B.; Guerrero, R. H.; Leal, I.; Brun, R.; Croft, S. L.; Goodman, J. M.; Maes, L.; Ruiz-Perez, L. M.; Gonzalez-Pacanowska, D.; Gilbert, I. H. J Med Chem 1999, 42, 4300.
33. Blankenfeldt, W.; Nowicki, C.; Montemartini-Kalisz, M.; Kalisz, H. M.; Hecht, H. J. Protein Sci 1999, 8, 2406.
34. Zuccotto, F.; Zvelebil, M.; Brun, R.; Chowdhury, S. F.; Di Lucrezia, R.; Leal, I.; Maes, L.; Ruiz-Perez, L. M.; Gonzalez-Pacanowska, D.; Gilbert, I. H. Eur J Med Chem 2001, 36, 395.
35. Buckner, F.; Yokoyama, K.; Lockman, J.; Aikenhead, K.; Ohkanda, J.; Sadilek, M.; Sebti, S.; Van Voorhis, W.; Hamilton, A.; Gelb, M. H. PNAS 2003, 100, 15149.
36. Montero-Torres, A.; Vega, M. C.; Marrero-Ponce, Y.; Rolon, M.; Gomez-Barrio, A.; Escario, J. A.; Aran, V. J.; Martínez-Fernandez, A. R.; Meneses-Marcel, A. Bioorg Med Chem 2005, 13, 6264.
37. Prado-Prado, F. J.; Gonzalez-Díaz, H.; Martinez De La Vega, O.; Ubeira, F. M.; Chou, K. C. Bioorg Med Chem 2008, 16, 5871.
38. Viswanadhan, V. N.; Ghose, A. K.; Revankar, G. R.; Robins, R. K. J Chem Inf Comput Sci 1989, 29, 163.
39. Viswanadhan, V. N.; Reddy, M. R.; Bacquet, R. J.; Erion, M. D. J Comput Chem 1993, 14, 1019.
40. Estrada, E.; Molina, E. J Mol Graph Model 2001, 20, 54.
41. Estrada, E. J Chem Inf Comput Sci 1996, 36, 844.
42. Estrada, E. J Chem Inf Comput Sci 1997, 37, 320.
43. Estrada, E. J Chem Inf Comput Sci 1998, 38, 23.
44. Estrada, E. J Chem Soc Faraday Trans 1998, 94, 1407.
45. Estrada, E. J Chem Inf Comput Sci 1999, 39, 90.
46. Estrada, E.; Pena, A.; García-Domenech, R. J Comput-Aided Mol Des 1998, 12, 583.
47. Estrada, E.; Gutierrez, Y. J Chromatogr A 1999, 858, 187.
48. Estrada, E.; Gutierrez, Y.; Gonzalez, H. J Chem Inf Comput Sci 2000, 40, 1386.
49. Estrada, E.; Pena, A. Bioorg Med Chem 2000, 8, 2755.
50. Estrada, E.; Uriarte, E.; Montero, A.; Tejeira, M.; Santana, L.; De Clercq, E. J Med Chem 2000, 43, 1975.
51. Perez-Gonzalez, M.; Gonzalez-Díaz, H.; Molina-Ruiz, R.; Cabrera, M. A.; Ramos De Armas, R. J Chem Inf Comput Sci 2003, 43, 1192.
52. Perez-Gonzalez, M.; Morales Helguera, A.; Morales Rodríguez, Y. Electron J Mol Des 2004, 3, 750.
53. Estrada, E. J Chem Inf Comput Sci 1995, 35, 31.
54. Estrada, E.; Uriarte, E. SAR QSAR Environ Res 2001, 12, 309.
55. Estrada, E.; Molina, E.; Uriarte, E. SAR QSAR Environ Res 2001, 12, 445.
56. Estrada, E.; Uriarte, E. Curr Med Chem 2001, 8, 1573.
57. Estrada, E.; Molina, E.; Perdomo-Lopez, I. J Chem Inf Comput Sci 2001, 41, 1015.
58. Lima Leite, A. C.; Souza De Lima, R.; Moreira, D. R.; Cardoso, M. V.; Gouveia De Brito, A. C.; Farias Dos Santos, L. M. Bioorg Med Chem 2006, 14, 3749.
59. Rolon, M.; Seco, E. M.; Vega, C.; Nogal, J. J.; Escario, J. A.; Gomez-Barrio, A.; Malpartida, F. Int J Antimicrob Agents 2006, 28, 104.
60. Gonzalez, J. L.; Stephens, C. E.; Wenzler, T.; Brun, R.; Tanious, F. A.; Wilson, W. D.; Barszcz, T.; Werbovetz, K. A.; Boykin, D. W. Eur J Med Chem 2007, 42, 552.
61. Gerpe, A.; Odreman-Nunez, I.; Draper, P.; Boiani, L.; Urbina, J. A.; Gonzalez, M.; Cerecetto, H. Bioorg Med Chem 2008, 16, 569.
62. Caterina, M. C.; Perillo, I. A.; Boiani, L.; Pezaroglo, H.; Cerecetto, H.; Gonzalez, M.; Salerno, A. Bioorg Med Chem 2008, 16, 2226.
63. Porcal, W.; Hernandez, P.; Boiani, L.; Boiani, M.; Ferreira, A.; Chidichimo, A.; Cazzulo, J. J.; Olea-Azar, C.; Gonzalez, M.; Cerecetto, H. Bioorg Med Chem 2008, 16, 6995.
64. O’Neil, M. J.; Heckelman, P. E.; Koch, C. B.; Roman, K. J.; Kenny, C. M.; D’Arecca, M. R. The Merck Index: An Encyclopedia of Chemicals, Drugs, and Biologicals; Merck & Co., Inc.: Whitehouse Station, NJ, USA, 2006.
65. PubChem Database. Available at: http://pubchem.ncbi.nlm.nih.gov/.
66. HyperChem™ Release 7.01 for Windows (Molecular Modeling System). © 2002, Hypercube, Inc.
67. Talete srl, DRAGON for Windows (Software for Molecular Descriptor Calculations). Version 5.4, 2006. Available at: http://www.talete.mi.it/dragon.htm.
68. Platts, J. A.; Butina, D.; Abraham, M. H.; Hersey, A. J Chem Inf Comput Sci 1999, 39, 835.
69. Estrada, E.; Gutierrez, Y. MODESLAB 1.5. © 2002–2004. Available at: www.modeslab.com.
70. Kowalski, B. R.; Wold, S. In Handbook of Statistics; Krishnaiah, P. R.; Kanal, L. N., Eds.; North Holland Publishing Company: Amsterdam, 1982; pp. 673–697.
71. Johnson, R. A.; Wichern, D. W. Applied Multivariate Statistical Analysis; Prentice-Hall: NJ, 1998.
72. StatSoft, Inc. 2001. STATISTICA (data analysis software system), version 6. Available at: www.statsoft.com.
73. Buryak, A.; Severin, K. J Comb Chem 2006, 8, 540.
74. Estrada, E. In Topological Indices and Related Descriptors in QSAR and QSPR; Devillers, J.; Balaban, A. T., Eds.; Gordon and Breach: Amsterdam, 1999; pp. 403–453.
75. Fawcett, T. ROC Graphs: Notes and Practical Considerations for Researchers. Tech. Rep. HPL-2003-4, Palo Alto, 2004.
76. Hanley, J. A.; McNeil, B. J. Radiology 1982, 143, 29.
77. MATLAB Version 6.5. © 1984–2002. The MathWorks, Inc. Available at: http://www.mathworks.com.
78. Van de Waterbeemd, H. In Methods and Principles in Medicinal Chemistry; Van de Waterbeemd, H.; Mannhold, R.; Krogsgaard-Larsen, P.; Timmerman, H., Eds.; VCH: Weinheim, 1995; pp. 283–293.
79. Da Silva Ferreira, W.; Freire-De-Lima, L.; Barbosa Saraiva, V.; Alisson-Silva, F.; Mendonca-Previato, L.; Previato, J. O.; Echevarria, A.; Freire De Lima, M. E. Bioorg Med Chem 2008, 16, 2984.
80. Brain-Isasi, S.; Quezada, C.; Pessoa, H.; Morello, A.; Kogan, M. J.; Alvarez-Lueje, A. Bioorg Med Chem 2008, 16, 7622.