Predicting resistance mutations using protein design algorithms


Kathleen M. Frey (a,1), Ivelin Georgiev (b,1,3), Bruce R. Donald (b,c,2), and Amy C. Anderson (a,2)

(a) Department of Pharmaceutical Sciences, University of Connecticut, Storrs, CT 06269; (b) Department of Computer Science, Duke University; and (c) Department of Biochemistry, Duke University, Durham, NC 27708-0129

Edited by Robert M. Stroud, University of California, San Francisco, CA, and approved June 15, 2010 (received for review February 20, 2010)

Drug resistance resulting from mutations to the target is an unfortunate common phenomenon that limits the lifetime of many of the most successful drugs. In contrast to the investigation of mutations after clinical exposure, it would be powerful to be able to incorporate strategies early in the development process to predict and overcome the effects of possible resistance mutations. Here we present a unique prospective application of an ensemble-based protein design algorithm, K*, to predict potential resistance mutations in dihydrofolate reductase from Staphylococcus aureus using positive design to maintain catalytic function and negative design to interfere with binding of a lead inhibitor. Enzyme inhibition assays show that three of the four highly-ranked predicted mutants are active yet display lower affinity (18-, 9-, and 13-fold) for the inhibitor. A crystal structure of the top-ranked mutant enzyme validates the predicted conformations of the mutated residues and the structural basis of the loss of potency. The use of protein design algorithms to predict resistance mutations could be incorporated in a lead design strategy against any target that is susceptible to mutational resistance.

negative design algorithm ∣ DHFR ∣ MRSA ∣ K* design algorithm

Resistance has been observed for even the most reserved antibiotics, sometimes after only brief clinical exposure. One of the most common resistance mechanisms is the accumulation of mutations in an enzyme target, creating an active site that can no longer accommodate the inhibitor yet maintains function. When these resistance mutations are discovered in the clinic, the mutants must be identified and studied, forcing the drug design process to start anew. To address this problem in preclinical drug discovery, resistance mutants are generated and studied in vitro with labor-intensive experiments. In contrast, it would be useful to predict resistance mutations in silico during the very early stages of drug discovery, thus encouraging strategies to overcome these limitations during the design process. In response to this need, we have developed and experimentally tested a protocol to computationally predict resistance mutations in a protein target, using algorithms for positive and negative structure-based protein design.

Protein design algorithms have recently been developed and used to reengineer proteins and enzymes to bind unique ligands. In the case of enzymes, successful design requires catalytic activity as well as binding. A promising approach involves ensemble-based scoring and search algorithms for protein design, which have been applied to modify the substrate specificity of an antibiotic-producing enzyme in the nonribosomal peptide synthetase pathway (1). New algorithms for protein design, an example of which is called K*, combine a statistical mechanics-derived ensemble-based approach to computing the binding constant with the speed and completeness of a branch-and-bound pruning algorithm (2–4). In addition, efficient deterministic algorithms include provable ϵ-approximation algorithms for estimating partition functions in order to model binding to arbitrary precision (4). Other examples of successful applications of computational protein design include design of unique enzymes (5, 6), enhancement of antibody affinity (7), and design of a unique protein fold (8).

There have been few previous attempts to predict potential resistance mutations; most of these have been applied retrospectively to predict known mutations that occur in response to clinically used antibiotics. Predictions have identified peptide sequences that would bind more tightly to a mutant HIV protease enzyme (9), profiled tyrosine kinase inhibitors (10), and estimated “vitality values” to account for the catalytic activity of resistant mutants of HIV protease (11). A recent review (12) summarizes structure- and sequence-based attempts to predict known mutations in HIV protease and reverse transcriptase.

Using structure-based design, we have been developing a unique class of propargyl-linked antifolate inhibitors that are active against several pathogenic species of dihydrofolate reductase (DHFR), specifically, methicillin-resistant Staphylococcus aureus (MRSA) DHFR (Sa DHFR) (13, 14). As MRSA has an extensive array of resistance mechanisms, it is critical to consider the likely development of resistance for any new inhibitors. Therefore, given that we have determined high resolution structures of wild-type Sa DHFR bound to these propargyl-linked antifolates (13, 14), we considered this to be an excellent case to apply structure-based protein design algorithms for resistance mutation prediction.

Here, we report a prospective study that uses the protein design algorithm, K*, to predict resistance mutations in DHFR from MRSA toward a potent propargyl-linked antifolate. Structure-based negative design predicted mutations to destabilize the binding of the inhibitor; positive design predicted mutations that stabilize the native protein function. Intersecting the top-scoring positive and negative designs predicted candidate resistance mutations. Four of the top ten ranked mutant enzymes were created and evaluated. Three of the mutants indeed maintained activity and displayed lower (18-, 9- and 13-fold) affinity for the inhibitor. A crystal structure of the top-ranked mutant bound to the inhibitor shows a conformation of the ligand that clearly has significantly fewer interactions with the protein. This unique and expedient approach to resistance mutation prediction should be useful for the development of inhibitors toward other targets for which drug therapy is limited by mutational resistance.

Author contributions: K.M.F., I.G., B.R.D., and A.C.A. designed research; K.M.F. and I.G. performed research; K.M.F., I.G., B.R.D., and A.C.A. analyzed data; and K.M.F., I.G., B.R.D., and A.C.A. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The atomic coordinates have been deposited in the Protein Data Bank, www.pdb.org (PDB ID codes 3F0Q and 3LG4).

(1) K.M.F. and I.G. contributed equally to this work.

(2) To whom correspondence may be addressed. E-mail: [email protected] or [email protected].

(3) Present address: Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1002162107/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1002162107 PNAS ∣ August 3, 2010 ∣ vol. 107 ∣ no. 31 ∣ 13707–13712

BIOPHYSICS AND COMPUTATIONAL BIOLOGY

Results and Discussion

Compound 1 (Fig. 1) is a lead inhibitor of MRSA DHFR (Ki = 10 nM) and possesses antibacterial activity against isolates of MRSA in culture. In order to elucidate the basis of its potency, we determined a crystal structure of the enzyme bound to its cofactor, NADPH, and compound 1. Crystals of the wild-type Sa DHFR showed diffraction amplitudes to 2.1 Å (Table 1); the structure was determined with difference Fourier methods using coordinates of Sa (F98Y) DHFR (15). The model reveals the overall characteristic “DHFR fold” that consists of an eight-stranded beta sheet and four alpha helices connected by flexible loop regions. The pyrimidine ring of compound 1 forms two conserved hydrogen bonds with Asp 27 and the 4-amino group forms two additional hydrogen bonds with the backbone carbonyl atoms of residues Leu 5 and Phe 92 (Fig. 1). The propargyl linker and propargylic methyl group form van der Waals contacts with Phe 92 and Leu 20. The methyl group at the C6 position forms van der Waals interactions with Leu 20 and Leu 28 and the meta-biphenyl ring extends into a hydrophobic pocket formed primarily by residues Ile 50, Leu 28, and Leu 54.

In order to identify residues that when mutated may incur a binding penalty for compound 1 but that would maintain binding to the substrate, dihydrofolate, K* searches using the minDEE/A*/K* algorithms (1, 4) were performed. For these searches, the algorithm used as input the crystal structure of Sa DHFR:compound 1:NADPH as well as a model of Sa DHFR bound to NADPH and dihydrofolate, which was adapted from coordinates of a single mutant Sa DHFR bound to the same ligands (15). Ten active site residues (Leu 5, Val 6, Leu 20, Asp 27, Leu 28, Val 31, Thr 46, Ile 50, Leu 54, and Phe 92) were modeled as flexible using rotamers (16) and allowed to keep their wild-type identity or to mutate within a restricted group of amino acids (see Materials and Methods) that conserve the amino acid property or correlate with a different DHFR species. In order to comprehensively test the algorithm, the group included mutations that represent both single- and double-nucleotide polymorphisms. While single-nucleotide polymorphisms are the most prevalent mechanism of mutational resistance, more complex mutations do evolve such as those in HIV-1 reverse transcriptase (17–19), hepatitis C protease (20), hepatitis B reverse transcriptase (21), and Helicobacter pylori rpoB (22).

Based on the input model, K* searches designed to identify up to 2-point mutations were performed separately for dihydrofolate and compound 1. K* scores are computed as a ratio of Boltzmann-weighted partition functions over rotamer-based conformational ensembles for the bound protein-ligand complex, the free protein, and free ligand (see Materials and Methods). Since higher K* scores imply better affinity, we focused our attention on pairs of mutations that scored highly for dihydrofolate and poorly for compound 1, particularly on those that returned a score of zero for compound 1 (Table 2) and a resultant score ratio of infinity. A mutant K* score of 0 for compound 1 implies one of two scenarios: (1) K* pruned all rotameric conformations for these mutants; or (2) the computed partition function for the bound protein-ligand complex for a given mutant was significantly less (in the current redesigns: more than eight orders of magnitude) than the product of the partition functions for the free protein and the free ligand. Typically, scenario (1) occurs when significant steric interference exists either within the protein or between the protein and the ligand for all possible rotameric conformations, even after minimization. An initial steric overlap (before minimization) between two rotamers of more than 1.5 Å was considered significant and the corresponding pair of rotamers was pruned from further consideration (Materials and Methods). Scenario (2) is typically caused by significantly unfavorable interactions between the protein and the ligand, as opposed to the protein in its unbound form (Table 2). The different mutants with a score of 0 for compound 1 are thus not equivalent with respect to the predicted destabilized interactions with compound 1. There were 105 mutants with a score ratio of infinity; the top ten exhibited a dihydrofolate score considerably better than the others and were further investigated. These ten mutants fell into two groups: the first group contained mutations at residues 31 and 92, the second group contained mutations at positions 50 and 92 (Table 2). The computed partition functions for the bound protein-ligand complex (with compound 1) and the free protein (Table 2 D, E), along with the lowest energy for the respective ensemble of conformations (Table S1) suggest that compound 1 has significantly destabilized interactions with these mutants.
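The score described above, a ratio of Boltzmann-weighted partition functions, can be illustrated with a short sketch. This is not the K* implementation, which also performs pruning and ϵ-approximation; the function names, the temperature of 298 K, and the energy lists are assumptions made only for illustration. The sketch works in log space because partition functions of the magnitude reported in Table 2 exceed double-precision range.

```python
import math

R_KCAL_PER_MOL_K = 0.0019872  # gas constant, kcal/(mol*K)


def log10_partition(energies, temperature=298.0):
    """Return log10 of sum_i exp(-E_i / RT) using a stable log-sum-exp."""
    if not energies:
        return float("-inf")  # every conformation was pruned (scenario 1)
    rt = R_KCAL_PER_MOL_K * temperature
    exponents = [-e / rt for e in energies]
    largest = max(exponents)
    ln_q = largest + math.log(sum(math.exp(x - largest) for x in exponents))
    return ln_q / math.log(10.0)


def log10_kstar(e_complex, e_protein, e_ligand, temperature=298.0):
    """log10 of q_PL / (q_P * q_L); assumes non-empty free ensembles."""
    return (log10_partition(e_complex, temperature)
            - log10_partition(e_protein, temperature)
            - log10_partition(e_ligand, temperature))


# Hypothetical conformational energies in kcal/mol, for illustration only.
score = log10_kstar([-180.0, -179.4], [-175.0, -174.2], [-3.0])
pruned = log10_kstar([], [-175.0], [-3.0])  # -inf, i.e. a K* score of 0
```

In this picture, scenario (1) corresponds to an empty complex ensemble, while scenario (2) corresponds to a finite but very negative log score.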

Interestingly, the wild-type sequence was ranked #306 out of 1,173 with dihydrofolate and #369 out of 1,173 with compound 1, suggesting that a subset of the higher-ranked sequences may have improved binding to dihydrofolate or compound 1.
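The total of 1,173 candidate sequences can be reproduced by counting the wild type plus all 1- and 2-point mutants permitted by the substitution sets listed in Materials and Methods. The following is a minimal counting sketch, not part of the K* software; the dictionary layout and the function name are assumptions.

```python
import math
from itertools import combinations

HYDROPHOBIC = "AVLIMFWY"

# Position -> (wild-type letter, allowed identities), per Materials and Methods.
# Thr46 and Leu54 are flexible but may not mutate.
ALLOWED = {
    "Leu5": ("L", HYDROPHOBIC),
    "Val6": ("V", HYDROPHOBIC),
    "Leu20": ("L", HYDROPHOBIC),
    "Asp27": ("D", "DE"),
    "Leu28": ("L", HYDROPHOBIC),
    "Val31": ("V", HYDROPHOBIC),
    "Thr46": ("T", "T"),
    "Ile50": ("I", HYDROPHOBIC),
    "Leu54": ("L", "L"),
    "Phe92": ("F", HYDROPHOBIC + "S"),
}


def count_sequences(allowed, max_mutations=2):
    """Count sequences with at most max_mutations substitutions."""
    # Number of non-wild-type choices available at each position.
    choices = [len(set(options) - {wt}) for wt, options in allowed.values()]
    total = 1  # the wild-type sequence
    for k in range(1, max_mutations + 1):
        for combo in combinations(choices, k):
            total += math.prod(combo)
    return total


print(count_sequences(ALLOWED))  # 1173
```

The count breaks down into 1 wild-type sequence, 51 single mutants, and 1,121 double mutants.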

Experimental Validation of the Predicted Mutants. To test the validity of the results of the algorithm, we created four of the mutants that show diversity at each of the positions (ranked by the algorithm #1, 3, 7, and 9) using site-directed mutagenesis and purification procedures similar to the wild-type enzyme (13, 14). Enzyme activity assays show that there is only a modest (3- to 4-fold) increase in KM values for dihydrofolate (Table 3) for the Val31/Phe92 mutants. While the kcat and kcat/KM values are lower than those of the wild-type enzyme (Table 3), the losses are within the range of other clinically observed DHFR mutants. For example, the F57L mutation in Plasmodium vivax DHFR (pyrimethamine, cycloguanil and WR99210 resistance) (23), the L22R mutation in human DHFR (methotrexate resistance) (24) and the A16V mutation in Plasmodium falciparum DHFR (cycloguanil resistance) (25) suffer 220-, 250-, and 680-fold losses, respectively, in kcat/KM. The ninth-ranked mutant, Sa (I50W, F92S) DHFR, was not active, suggesting that the Ile50Trp mutation prevents binding to the substrate. The results for the mutants ranked #1, 3, and 7 by the algorithm experimentally validate the positive design component of the K* algorithm and its success in predicting mutants that retain catalytic activity.

Table 1. Statistics of data collection and refinement

| Complex | Sa (WT)/cmpd 1/NADPH | Sa (V31Y, F92I)/cmpd 1/NADPH |
|---|---|---|
| PDB ID | 3F0Q | 3LG4 |
| Space group | P6₁22 | P6₁ |
| Unit cell (a, b, c in Å) | a = b = 79.16, c = 108.80 | a = b = 88.752, c = 103.167 |
| Resolution (last shell), Å | 25.9 - 2.10 (2.29 - 2.08) | 42.8 - 3.15 (3.23 - 3.15) |
| Completeness, % (last shell, %) | 87.9 (95.3) | 91.2 (99.7) |
| Redundancy (last shell) | 10.40 (12.2) | 5.0 (5.4) |
| Rsym (last shell) | 0.050 (0.143) | 0.061 (0.379) |
| ⟨I/σ⟩ (last shell) | 8.6 (2.7) | 14.5 (3) |
| Refinement statistics | | |
| R-factor; Rfree | 0.196; 0.231 | 0.26; 0.292 |
| Rms deviation bond lengths (Å); angles (°) | 0.010; 1.386 | 0.007; 1.219 |
| Average B factors (Å²): overall; NADPH; compound 1 | 20.6; 16.7; 21.8 | 88.7; 94.7; 68.5 |
| Residues in most favored regions; allowed regions (%) | 90.4; 9.6 | 87.5; 12.5 |

In order to assess the results of the negative design component of the algorithm, Ki values were measured for the wild-type and Sa (V31Y, F92I) DHFR enzymes with compound 1. Dixon plots show that the inhibitor binds competitively (Figs. S1 and S2); analysis of the plots yields Ki values of 7.5 nM and 128 nM for the wild type and mutant, respectively. Ki values were also calculated from IC50 values and KM values (26) for all active mutants (Table 4). The top-ranked resistance mutant, Sa (V31Y, F92I) DHFR, shows the greatest (18-fold) loss in affinity for compound 1. Mutants Sa (V31Y, F92S) and Sa (V31F, F92L) DHFR also show significant (9-fold and 13-fold, respectively) losses in potency, suggesting that the algorithm is also successful in its negative design component. The success of the algorithm prompted the investigation of a structure of the mutant to determine why the resistance mutations at positions 31 and 92 retain activity but lose affinity for compound 1.
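The text does not spell out how Ki was obtained from IC50 and KM. A common choice for a competitive inhibitor is the Cheng-Prusoff relation, Ki = IC50 / (1 + [S]/KM), and the sketch below assumes that relation applies here. The function name, the IC50 value, and the substrate concentration are hypothetical; only the wild-type KM of 14.5 μM is taken from Table 3.

```python
def ki_from_ic50(ic50_nM, substrate_uM, km_uM):
    """Cheng-Prusoff estimate of Ki (nM) for a competitive inhibitor.

    Assumes Ki = IC50 / (1 + [S] / KM); substrate and KM share the same units.
    """
    if km_uM <= 0:
        raise ValueError("KM must be positive")
    if substrate_uM < 0:
        raise ValueError("substrate concentration cannot be negative")
    return ic50_nM / (1.0 + substrate_uM / km_uM)


# Hypothetical assay values: IC50 = 80 nM measured at 100 uM dihydrofolate,
# combined with the wild-type KM from Table 3.
ki_wt_estimate = ki_from_ic50(ic50_nM=80.0, substrate_uM=100.0, km_uM=14.5)
```

Because the correction factor depends on KM, mutants with a higher KM receive a smaller correction at the same substrate concentration, which is why KM must be measured for each enzyme.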

Determination of a Crystal Structure of Sa (V31Y, F92I) DHFR, NADPH and Compound 1. Crystals of Sa (V31Y, F92I) DHFR showed diffraction amplitudes to 3.15 Å (Table 1); the structure of the mutant was determined using difference Fourier methods based on the wild-type structure bound to NADPH and compound 3 (Table S2) as a model (PDB ID: 3FQC) (13). There is a high degree of similarity between Sa (wild-type) and Sa (V31Y, F92I) DHFR, reflected in a root mean square deviation for 157 Cα atoms of 0.355 Å. The similarity of the enzymes is also reflected in their melting temperatures, as determined by circular dichroism (Tm wild-type = 42.5 °C, Tm Sa (V31Y, F92I) = 36.3 °C, graphs shown in Fig. S3). The Sa (V31Y, F92I) DHFR mutant structure exhibits the standard, extended conformation of NADPH, in contrast to the alternate conformation observed in several structures of the Sa (F98Y) DHFR mutant (13). In contrast to the wild-type structure in which the ligand fully occupies the site, compound 1 binds the mutant active site with 50% occupancy, suggesting that the V31Y and F92I mutations affect ligand binding.
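For readers unfamiliar with the similarity measure quoted above, the root mean square deviation over paired Cα atoms can be sketched as follows. The sketch assumes the two structures have already been superposed and that atoms are paired in the same order; the superposition procedure used by the authors is not stated here, and the coordinates below are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical C-alpha positions in angstroms, for illustration only.
wild_type = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
mutant = [(0.0, 0.3, 0.0), (3.8, 0.0, 0.4), (7.6, 0.0, 0.0)]
deviation = rmsd(wild_type, mutant)
```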

Despite the moderate resolution of the data for the mutant enzyme, the electron density maps revealed significant structural details including side chain and ligand orientations in the active site that disclose the basis of the lower affinity of compound 1 (Fig. 1). Strong hydrophobic interactions made with the native Phe92 residue and propargyl linker of compound 1 are reduced with the mutation to Ile92. The Val31Tyr mutation introduces steric bulk in the active site that interferes with the 2-methyl substitution on the distal phenyl ring, causing the substituted biphenyl of the ligand to contort around the propargyl position and reorient by approximately 60°. Reorientation positions the two phenyl rings outside the main hydrophobic pocket, causing the loss or reduction of strong hydrophobic interactions with residues Leu 28, Val 31, Leu 54, and Phe 92. In the new position, the distal phenyl ring maintains interactions only with Leu 20. While it appears that the mutant enzyme may have bound the opposite enantiomer compared to that bound in the wild-type structure, the resolution of the electron density does not permit exact evaluation.

KM values suggest that active sites mutated at the Val31 and Phe92 positions retain affinity for the substrate, dihydrofolate. In order to understand why these mutations allow substrate binding, we compared the Sa (V31Y, F92I) DHFR crystal structure bound to compound 1 and NADPH with the lowest energy predicted structure for Sa (V31Y, F92I) DHFR bound to dihydrofolate (Fig. 2A). In contrast to compound 1, which relies on the interaction between the propargyl linker and Phe 92, dihydrofolate lacks a propargyl linker and is minimally affected by the Ile 92 mutation. In addition, the para-aminobenzoic acid moiety of dihydrofolate can be accommodated in the limited space near the Tyr 31 mutation. This accommodation differs from compound 1, where the biphenyl changes orientation to avoid steric repulsion with Tyr 31. Although a crystal structure for the Sa (V31F, F92L) DHFR mutant was not obtained, it appears that the dihydrofolate molecule could also be accommodated in the space near the Phe 31 mutation.

Table 2. The top ten mutants as ranked by the score ratio criterion are shown with the respective K* scores with dihydrofolate (DHF). For each of the mutants, the computed partition functions for the free protein (FP) and the protein-ligand (PL) complex (with compound 1) are also shown; all of the top 10 mutants had a K* score of 0 for compound 1. Mutants ranked 1, 3, 7, and 9 (shown in bold in the original table) were selected for experimental validation.

| Rank | Mutations | DHF K* score | Compound 1 partition function, FP | Compound 1 partition function, PL |
|---|---|---|---|---|
| 1 | V31Y/F92I | 4.30 × 10^40 | 6.6 × 10^292 | 2.1 × 10^132 |
| 2 | V31Y/F92V | 3.81 × 10^40 | 2.9 × 10^301 | 4.3 × 10^143 |
| 3 | V31Y/F92S | 3.13 × 10^40 | 1.2 × 10^331 | 2.4 × 10^173 |
| 4 | V31Y/F92A | 2.94 × 10^40 | 1.3 × 10^330 | 4.4 × 10^171 |
| 5 | V31Y/F92M | 6.77 × 10^38 | 3.4 × 10^332 | 1.0 × 10^169 |
| 6 | V31Y/F92L | 6.38 × 10^38 | 2.3 × 10^320 | 2.7 × 10^150 |
| 7 | V31F/F92L | 6.01 × 10^33 | 2.3 × 10^327 | 2.3 × 10^310 |
| 8 | I50W/F92M | 7.70 × 10^32 | 8.8 × 10^340 | 5.0 × 10^327 |
| 9 | I50W/F92S | 2.74 × 10^32 | 1.9 × 10^339 | 3.2 × 10^325 |
| 10 | I50W/F92A | 2.10 × 10^32 | 1.9 × 10^338 | 3.9 × 10^323 |
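The gap between the complex and free-protein partition functions in Table 2 is easiest to compare on a log scale. The sketch below, which is an illustration and not part of the published analysis, stores each value as a (mantissa, exponent) pair because several entries exceed double-precision range. The K* score would additionally divide by the free-ligand partition function, which Table 2 does not list, so the sketch stops at the PL/FP ratio.

```python
import math

# (mantissa, exponent) pairs transcribed from Table 2:
# free protein (FP), then protein-ligand complex with compound 1 (PL).
TABLE_2 = {
    "V31Y/F92I": ((6.6, 292), (2.1, 132)),
    "V31Y/F92V": ((2.9, 301), (4.3, 143)),
    "V31Y/F92S": ((1.2, 331), (2.4, 173)),
    "V31Y/F92A": ((1.3, 330), (4.4, 171)),
    "V31Y/F92M": ((3.4, 332), (1.0, 169)),
    "V31Y/F92L": ((2.3, 320), (2.7, 150)),
    "V31F/F92L": ((2.3, 327), (2.3, 310)),
    "I50W/F92M": ((8.8, 340), (5.0, 327)),
    "I50W/F92S": ((1.9, 339), (3.2, 325)),
    "I50W/F92A": ((1.9, 338), (3.9, 323)),
}


def log10_value(pair):
    """log10 of mantissa * 10**exponent without forming the huge number."""
    mantissa, exponent = pair
    return math.log10(mantissa) + exponent


def log10_pl_over_fp(fp, pl):
    """log10 of the PL partition function divided by the FP partition function."""
    return log10_value(pl) - log10_value(fp)


ratios = {name: log10_pl_over_fp(fp, pl) for name, (fp, pl) in TABLE_2.items()}
for name, value in ratios.items():
    print(f"{name}: log10(PL/FP) = {value:.1f}")
```

The Val31Tyr mutants show gaps of roughly 150 to 170 orders of magnitude, whereas V31F/F92L and the Ile50Trp mutants show gaps of roughly 13 to 17, which is consistent with the statement that mutants with a score of 0 are not equivalent in their predicted destabilization.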

Fig. 1. Stereo view of the superposition of the ternary crystal structures of Sa (WT) DHFR (green) and Sa (V31Y/F92I) DHFR (magenta) bound to NADPH and compound 1 (A) as (B) stick models and (C) surface rendering.

Table 3. Kinetic parameters for the wild-type and mutant enzymes

| Enzyme | KM (μM) | kcat (1/s) | kcat/KM (fold decrease) |
|---|---|---|---|
| Sa (WT) | 14.5 ± 3.5 | 31 | 2.14 (1.00) |
| Sa (V31Y, F92I) | 43 ± 2.6 | 2.8 | 0.06 (36) |
| Sa (V31Y, F92S) | 58 ± 3.0 | 1.4 | 0.02 (107) |
| Sa (V31F, F92L) | 45 ± 4.3 | 0.31 | 0.007 (306) |


To assess whether the K* algorithm predicted rotamer conformations similar to those observed in the crystal structure, the crystal structure of Sa (V31Y, F92I) DHFR was compared with the lowest energy predicted structure (Fig. 2B). The Tyr 31 mutation observed in the crystal structure has the same conformation as each of the top ten members of the ensemble of predicted structures. Although the rotamer for Ile 92 in the crystal structure differs from the predicted structure, the predicted residue occupies the same location and space as the corresponding residue in both the mutant Sa (V31Y, F92I) and Sa (WT) DHFR structures. Similarity of rotamer conformations suggests that the algorithm is accurate in predicting not only the identity of residues targeted for mutation, but also the proper orientation of mutated residues.

The conformation of compound 1 from the top K*-predicted structure lacks the reorientation of the biphenyl observed in the Sa (V31Y, F92I) DHFR crystal structure. This reoriented conformation is significantly out of the range of ligand conformations input to the software to be modeled by K*. For example, in the K* input ligand rotamer (see Materials and Methods section) closest to the ligand conformation in the Sa (V31Y, F92I) DHFR crystal structure, the biphenyl of compound 1 is rotated by approximately 60°.

The goal of the K* algorithm was to block binding to compound 1 by introducing mutations to DHFR, based on the selected set of input ligand conformations. Using this input model, the algorithm successfully identified mutations that have significantly diminished binding to compound 1. The ligand flexibility model used by K* significantly improves on models where ligands are treated as rigid or in which rigid ligand rotamers are used. However, due to the expense of the computation, the number of rotamers and the span of ligand conformations were still significantly limited. While, ideally, the entire feasible conformation space around a ligand should be evaluated by the computational procedure, this would come at the cost of a combinatorial increase in compute times. Optimal balance should thus be sought between the computational requirements and the comprehensiveness of the flexibility models.

Nevertheless, it is interesting to determine how a ligand conformation mimicking the one observed in the crystal structure of the Sa (V31Y, F92I) mutant complex would be scored by our algorithm. To this end, we set up and performed a K* score computation for the Sa (V31Y, F92I) mutant, with the ligand dihedrals constrained to a conformation close to the one observed in the mutant crystal structure. Interestingly, the computed lowest energy (approx. −179.4 kcal/mol) in the K* conformation ensemble for the bound protein-ligand complex was virtually identical to the lowest energy of the original K* ensemble (approx. −180.0 kcal/mol, see Table S1). Thus, even with the ligand conformation mimicking the one observed in the crystal structure complex, K* predicts the Sa (V31Y, F92I) mutant to have significantly destabilized interactions with compound 1 as compared to the free protein (see Table S1).

Table 4. Inhibition assays for enzymes and compound 1

| Enzyme | Ki (nM) | Fold loss* |
|---|---|---|
| Sa wild-type (WT) | 10 | 1.0 |
| Sa (V31Y, F92I) | 180 | 18 |
| Sa (V31Y, F92S) | 87 | 8.7 |
| Sa (V31F, F92L) | 130 | 13 |

*Fold loss = Ki value for enzyme / Ki value for WT.
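The fold-loss column follows directly from the footnote definition. As a quick check, the values in Table 4 can be recomputed with a few lines; the variable and function names are chosen here for illustration.

```python
# Ki values in nM, transcribed from Table 4.
KI_NM = {
    "Sa wild-type (WT)": 10.0,
    "Sa (V31Y, F92I)": 180.0,
    "Sa (V31Y, F92S)": 87.0,
    "Sa (V31F, F92L)": 130.0,
}


def fold_loss(ki_enzyme, ki_wild_type):
    """Fold loss in affinity: Ki of the enzyme divided by Ki of the wild type."""
    if ki_wild_type <= 0:
        raise ValueError("wild-type Ki must be positive")
    return ki_enzyme / ki_wild_type


ki_wt = KI_NM["Sa wild-type (WT)"]
fold_losses = {name: fold_loss(ki, ki_wt) for name, ki in KI_NM.items()}
```

Applying the same ratio to the Dixon-plot values quoted in the text (128 nM and 7.5 nM) gives roughly a 17-fold loss for the top-ranked mutant, in line with the 18-fold loss obtained from the IC50-derived values.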

Fig. 2. Stereo views of the superposition of the Sa (V31Y/F92I) DHFR crystal structure (magenta) with (A) a model of dihydrofolate in the active site (cyan) and (B) the lowest energy model from the predicted ensemble (yellow).


Elucidating Structure-Activity Relationships of Additional Antifolate Compounds. The structural evidence suggests that the Val31Tyr and Phe92Ile mutations decrease affinity by reducing hydrophobic interactions with Phe 92 and introducing steric bulk at Val 31. In order to validate these hypotheses, we tested the activity of four other propargyl-based antifolates for the mutant enzymes and probed the structure-activity relationships. All four additional antifolates (compounds 2, 3, 4, and 5 shown in Table S2 with inhibition values) are potent inhibitors for the wild-type enzyme, but lose potency for all of the mutant enzymes. Compound 2, with an unsubstituted distal phenyl ring relative to the dimethyl substitution on compound 1, loses only 7-fold affinity for the Sa (V31Y, F92I) DHFR enzyme, validating that the unsubstituted phenyl is less affected by mutations at the Val 31 position. In contrast, compound 4, with an ethyl group at the C6 position of the diaminopyrimidine ring, is affected by the Val 31 mutation (exhibiting 10.5-, 16-, and 28-fold losses with each of the enzymes mutated at Val 31) because the ethyl group has extensive van der Waals contacts with Val 31.

Mutations at the Phe 92 position affect the interactions with the propargyl substitutions on the compounds. As such, these mutations reduce the affinity of compound 3, with a gem-dimethyl at the propargylic position, by 7–18-fold. Compound 5, with a para-biphenyl ring instead of the meta-biphenyl of compound 1, has lower affinity for the wild-type enzyme (Ki = 24 nM); analysis of cocrystal structures shows that the para-biphenyl ring juts out of the active site (13). Interestingly, the orientation of the para-biphenyl ring in these structures is similar to the new orientation adopted by the meta-biphenyl ring of compound 1 in the Sa (V31Y, F92I) DHFR structure.

Conclusions

Predicting potential resistance mutations in an enzyme target in silico would enable the design of compounds early in the drug design process that overcome these limitations. Towards that goal, we used the protein design algorithm K* to predict mutations in DHFR from S. aureus. Positive design selected mutations that maintain binding to the substrate, dihydrofolate; negative design selected mutations that lower the affinity for a lead inhibitor, and the intersection of these sets resulted in a ranked series of double mutants. Four of the ten top-ranked mutants were chosen for experimental validation. Excitingly, as predicted, three of the mutants maintain catalytic activity and show lower affinity for the inhibitor. A crystal structure of the top-ranked mutant further validates the predicted conformations of the mutated residues and reveals a conformation of the ligand with many fewer interactions than are apparent in the wild-type structure.

Materials and Methods

K*. The following mutations (one-letter amino acid codes) were allowed for the ten active site residues: Leu5, Val6, Leu20, Leu28, Val31, and Ile50 (AVLIMFWY); Asp27 (DE); Phe92 (AVLIMFWYS). Residues Thr46 and Leu54 were modeled using rotamers but were not allowed to mutate. K* performed 2-point mutation searches, in which any two of the ten active site residues were allowed to simultaneously mutate, while the other eight residues were allowed to change rotamers. These searches yielded a total of 1,173 candidate mutants (that included the wild-type, 1-, and 2-point mutations), corresponding to 4.37 × 10^10 (DHFR:NADPH:dihydrofolate) and 2.49 × 10^12 (DHFR:NADPH:compound 1) conformations, for which K* scores were computed (see below). The use of a factor of ∼100 more conformations to predict mutants with reduced binding to compound 1 reflects the need for greater conformational flexibility for negative design.
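The candidate count quoted above can be checked by direct enumeration. The following is a minimal sketch, not the authors' code: it assumes the per-residue alphabets listed in the text, treats Thr46 and Leu54 as positions with no allowed substitutions, and counts every sequence carrying at most two mutations. The dictionary keys and function names are illustrative only.

```python
from itertools import combinations

# Allowed amino acids per active-site position, taken from the text.
# Keys are hypothetical labels: wild-type letter followed by residue number.
HYDROPHOBIC = "AVLIMFWY"
ALLOWED = {
    "L5": HYDROPHOBIC, "V6": HYDROPHOBIC, "L20": HYDROPHOBIC,
    "L28": HYDROPHOBIC, "V31": HYDROPHOBIC, "I50": HYDROPHOBIC,
    "D27": "DE",
    "F92": HYDROPHOBIC + "S",
    "T46": "T",  # rotamers only, not allowed to mutate
    "L54": "L",  # rotamers only, not allowed to mutate
}


def mutation_options(position):
    """Return the non-wild-type amino acids allowed at a position."""
    wild_type = position[0]
    return [aa for aa in ALLOWED[position] if aa != wild_type]


def count_candidates(max_mutations=2):
    """Count sequences with 0..max_mutations simultaneous substitutions."""
    total = 0
    for k in range(max_mutations + 1):
        for positions in combinations(ALLOWED, k):
            n = 1
            for p in positions:
                n *= len(mutation_options(p))
            total += n
    return total


print(count_candidates())  # 1173: wild-type + 51 single + 1121 double mutants
```

Under these assumptions the enumeration reproduces the 1,173 candidates reported in the text.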

NADPH and a steric shell of residues within 5 Å from the active site or 8 Å from dihydrofolate were included as part of the input structure. The substrate was modeled as flexible using rotamers and was allowed to rotate/translate. Modal values from the Penultimate rotamer library (16) were used for the amino acid side chains. Ligand rotamers were defined for dihydrofolate and compound 1 by sampling sets of rotatable bonds (see SI Text). For all rotamers, each dihedral was allowed to minimize within ±9° from its initial value.

For each of the candidate mutants, positive design (dihydrofolate as substrate) and negative design (compound 1 as substrate) computations were performed and scored. The ratio of the K* scores for a given mutant with dihydrofolate and compound 1 was used to rank the candidate mutants; mutants with high ratios were predicted to be good dihydrofolate and poor compound 1 binders. The energy function consisted of the Amber vdW, electrostatic, and dihedral energies (27), and the EEF1 implicit pairwise solvation model (28) (see SI Text).
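The ranking step can be illustrated with a short sketch. It assumes that K* scores are stored as log10 values (so that the ratio becomes a difference), and the mutant names and scores below are invented purely for illustration; they are not results from this study.

```python
def rank_by_score_ratio(log10_scores):
    """Rank mutants by the ratio K*(dihydrofolate) / K*(compound 1).

    log10_scores maps a mutant name to a pair
    (log10 K* with dihydrofolate, log10 K* with compound 1).
    Working in log space, the ratio is a simple difference.
    """
    log_ratio = {name: dhf - cpd1 for name, (dhf, cpd1) in log10_scores.items()}
    # Highest ratio first: predicted good substrate binders, poor inhibitor binders.
    return sorted(log_ratio, key=log_ratio.get, reverse=True)


# Hypothetical scores, for illustration only.
example = {
    "WT": (30.0, 32.0),
    "mutant_A": (29.0, 25.0),
    "mutant_B": (27.5, 26.0),
    "mutant_C": (20.0, 21.0),
}
print(rank_by_score_ratio(example))
```

In this toy input, mutant_A ranks first because it keeps most of its predicted dihydrofolate binding while losing the most predicted affinity for compound 1.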

A K* score for a given mutant was computed as the ratio of the partition functions for the bound protein-ligand complex and the free protein and free ligand. Partition functions were computed over Boltzmann-weighted ensembles of conformations: a partition function q for a given ensemble S of conformations was computed as $q = \sum_{s \in S} \exp(-E_s/RT)$, where $E_s$ is the energy of conformation s, T is the temperature in Kelvin, and R is the gas constant. For each partition function computation, initial rotamer pruning based on the MinDEE algorithm (4) was first applied as a preprocessing step to reduce the number of candidate rotamers and rotamer-based conformations. MinDEE is provably accurate with continuously flexible rotamers defined over a bounded (but continuous) region of side-chain conformation space. Following this, A* enumerated the remaining unpruned conformations in order of increasing lower bounds on their energies. For computational efficiency, a provably accurate ϵ-approximation algorithm was applied to guarantee the accuracy of the computed partition function based only on a small fraction of the remaining unpruned conformations (4). Using this ϵ-approximation algorithm, the A* enumeration could be provably halted once the computed partition function was guaranteed to be within ϵ from the full partition function (when all rotameric conformations are taken into account). An ϵ value of 0.03 guaranteed that the computed partition functions were at least 97% of the corresponding full partition functions (1, 4). The MinDEE/A* and K* algorithms return a gap-free list of predictions in the order of the predicted score (either empirical molecular mechanics energy [MinDEE], or free energy based on molecular ensembles [K*]).
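The partition function, the K* ratio, and the early-halting idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the published implementation: the K* ratio is read here as q(complex) / (q(protein) × q(ligand)); conformations are assumed to arrive already sorted by their exact energies (the actual algorithm enumerates by lower bounds with A* and then minimizes); the temperature of 298.15 K and all function names are assumptions. The halting test uses the fact that, if n conformations remain and each has energy at least E, their total contribution is at most n·exp(−E/RT).

```python
import math

R_KCAL = 1.9872e-3  # gas constant in kcal/(mol*K)


def log_partition_function(energies, temperature=298.15):
    """Return ln q, with q = sum over s of exp(-E_s / RT), computed stably."""
    rt = R_KCAL * temperature
    exponents = [-e / rt for e in energies]
    shift = max(exponents)  # factor out the largest term to avoid overflow
    return shift + math.log(sum(math.exp(x - shift) for x in exponents))


def log_kstar_score(e_complex, e_protein, e_ligand, temperature=298.15):
    """ln K* = ln q_complex - ln q_protein - ln q_ligand (assumed form)."""
    return (log_partition_function(e_complex, temperature)
            - log_partition_function(e_protein, temperature)
            - log_partition_function(e_ligand, temperature))


def epsilon_approx_log_q(sorted_energies, epsilon=0.03, temperature=298.15):
    """Sum Boltzmann factors in increasing-energy order and stop early.

    Returns (ln q*, number of conformations summed), where the partial
    sum q* is guaranteed to satisfy q* >= (1 - epsilon) * q.
    """
    rt = R_KCAL * temperature
    e_min = sorted_energies[0]
    n = len(sorted_energies)
    partial = 0.0
    used = 0
    for i, e in enumerate(sorted_energies):
        weight = math.exp(-(e - e_min) / rt)
        # Every conformation not yet summed has energy >= e, so the
        # unsummed remainder is at most (n - i) * weight.
        remainder_bound = (n - i) * weight
        if remainder_bound <= epsilon / (1.0 - epsilon) * partial:
            break
        partial += weight
        used += 1
    return -e_min / rt + math.log(partial), used


# Hypothetical ensemble: 200 conformations spaced 0.5 kcal/mol apart.
energies = [-180.0 + 0.5 * k for k in range(200)]
exact = log_partition_function(energies)
approx, used = epsilon_approx_log_q(energies)
print(used, math.exp(approx - exact))  # conformations summed, fraction of q recovered
```

With ϵ = 0.03 the loop stops once the bound on the unsummed remainder falls below ϵ/(1 − ϵ) times the partial sum, which is sufficient to ensure that the partial sum is at least 97% of the full partition function.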

Mutant Enzyme Preparation. Sa DHFR mutants ranked #1, 3, 7, and 9 by the K* algorithm were prepared with site-directed mutagenesis using the DNA encoding Sa (WT) DHFR as a template. All clones were verified by DNA sequencing. Mutant enzymes were expressed in E. coli BL21(DE3) cells and purified using procedures previously adapted for the wild-type protein (13).

Enzyme Assays. Enzyme activity and inhibition assays were performed in triplicate for each DHFR mutant by monitoring the rate of NADPH oxidation by the DHFR enzyme at an absorbance of 340 nm (13). Kinetic parameters were measured by performing triplicate enzyme activity assays at varying substrate concentrations of dihydrofolate and analyzed with Lineweaver-Burk plots. KM values, in addition to the obtained IC50 values, were used to calculate Ki values for each enzyme and inhibitor (26). Ki values were also determined experimentally for compound 1 and the wild-type and Sa (V31Y, F92I) enzymes using four concentrations of substrate (50, 100, 150, and 200 μM) and four concentrations of inhibitor near the IC50 value (50, 75, 100, and 150 nM for wild-type and 150, 300, 450, and 600 nM for Sa (V31Y, F92I) DHFR). The inhibition data were analyzed with Dixon plots.
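The conversion from IC50 to Ki cited above (26) can be sketched as follows. This assumes the competitive-inhibition form of the Cheng-Prusoff relationship, Ki = IC50 / (1 + [S]/KM); the IC50, substrate concentration, and KM in the first example are hypothetical values, while the fold-loss example uses the Ki values from Table 4.

```python
def cheng_prusoff_ki(ic50, substrate_conc, km):
    """Ki from IC50, assuming competitive inhibition.

    ic50 and the returned Ki share one unit (e.g. nM);
    substrate_conc and km must share one unit (e.g. uM).
    """
    if km <= 0:
        raise ValueError("KM must be positive")
    return ic50 / (1.0 + substrate_conc / km)


def fold_loss(ki_mutant, ki_wild_type):
    """Fold loss as defined under Table 4: Ki (enzyme) / Ki (WT)."""
    return ki_mutant / ki_wild_type


# Hypothetical assay values, for illustration only.
print(cheng_prusoff_ki(ic50=100.0, substrate_conc=100.0, km=10.0))

# Ki values (nM) from Table 4.
for name, ki in [("V31Y,F92I", 180.0), ("V31Y,F92S", 87.0), ("V31F,F92L", 130.0)]:
    print(name, fold_loss(ki, 10.0))
```

The second loop reproduces the 18-, 8.7-, and 13-fold losses listed in Table 4.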

Enzyme Stability by Circular Dichroism. Temperature-induced unfolding experiments were performed for both Sa (WT) and Sa (V31Y, F92I) DHFR by increasing the temperature from 5 °C to 70 °C and monitoring circular dichroism at 222 nm. CD measurements were taken at 1 °C temperature increments with an equilibration time of 2 min. Protein concentrations used in these experiments were 12 μM for Sa (WT) DHFR and 17 μM for Sa (V31Y, F92I) DHFR. To account for the difference in enzyme concentration, the data points were converted from millidegrees to molar ellipticity and then plotted to determine Tm values.
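The normalization described above can be illustrated with the standard definition of molar ellipticity, [θ] = 100·θ(deg) / (c·l), with c in mol/L and l in cm, giving units of deg·cm²·dmol⁻¹. The cuvette path length is not stated in the text, so the 0.1 cm used below is an assumption, and the raw signals are invented for illustration.

```python
def molar_ellipticity(theta_mdeg, conc_molar, path_cm):
    """Convert a raw CD signal (millidegrees) to molar ellipticity.

    [theta] = 100 * theta_deg / (c * l) = theta_mdeg / (10 * c * l),
    in deg * cm^2 / dmol, with c in mol/L and l in cm.
    """
    if conc_molar <= 0 or path_cm <= 0:
        raise ValueError("concentration and path length must be positive")
    return theta_mdeg / (10.0 * conc_molar * path_cm)


PATH_CM = 0.1  # assumed path length; not given in the text

# Hypothetical raw signals at 222 nm for the two protein concentrations.
wt = molar_ellipticity(-12.0, 12e-6, PATH_CM)      # 12 uM wild-type
mutant = molar_ellipticity(-17.0, 17e-6, PATH_CM)  # 17 uM mutant
print(wt, mutant)
```

Two samples whose raw signals differ only because of concentration give the same molar ellipticity, which is why the melting curves become directly comparable after conversion.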

Crystallization. Both Sa DHFR and Sa (V31Y, F92I) DHFR were crystallized using hanging-drop vapor diffusion. The purified enzymes (12 mg/mL) were incubated with compound 1 (1 mM) and NADPH (2 mM) for 2 h on ice. Crystals of the protein:ligand:NADPH complex were optimized in a crystallization solution containing 15% PEG MW 10,000, 150 mM sodium acetate, and 100 mM MES pH 6.5.

Data Collection and Refinement. Diffraction data with amplitudes extending to 2.1 Å or 3.15 Å were collected at the National Synchrotron Light Source (beamline X29 or X25) for complexes of Sa DHFR and Sa (V31Y, F92I) DHFR, respectively. Data were indexed and scaled using HKL 2000 (29). Programs Coot (30) and Refmac (31) were used to build and refine the structure until an acceptable Rcryst and Rfree were achieved. The geometry of the structure was validated using Procheck (32) and Ramachandran plots. Data collection and refinement statistics are reported in Table 1.

Frey et al. PNAS ∣ August 3, 2010 ∣ vol. 107 ∣ no. 31 ∣ 13711

Software. All software is freely available open-source upon publication.

ACKNOWLEDGMENTS. The authors thank Pablo Gainza-Cirauqui, Nanda Karri, and Michael Lombardo as well as all members of the Donald and Anderson labs for thoughtful suggestions. We thank Dr. Dennis Wright's laboratory for providing compounds 1–5, Drs. Carol Teschke and Juliana Cortines for assistance with the circular dichroism experiments, and the National Institutes of Health (NIH) for funding (GM78031 to B.R.D. and GM067542 to A.C.A.).

1. Chen C, Georgiev I, Anderson A, Donald B (2009) Computational structure-based redesign of enzyme activity. Proc Natl Acad Sci USA 106:3764–3769.

2. Georgiev I, Donald B (2007) Dead-end elimination with backbone flexibility. Bioinformatics 23:i185–194.

3. Georgiev I, Keedy D, Richardson J, Richardson D, Donald B (2008) Algorithm for backrub motions in protein design. Bioinformatics 24:i196–204.

4. Georgiev I, Lilien R, Donald B (2008) The minimized dead-end elimination criterion and its application to protein redesign in a hybrid scoring and search algorithm for computing partition functions over molecular ensembles. J Comput Chem 29:1527–1542.

5. Jiang L, et al. (2008) De Novo computational design of retro-aldol enzymes. Science 319:1387–1391.

6. Rothlisberger D, et al. (2008) Kemp elimination catalysts by computational enzyme design. Nature 453:190–195.

7. Lippow S, Wittrup K, Tidor B (2007) Computational design of antibody affinity improvement beyond in vivo maturation. Nat Biotechnol 25:1171–1176.

8. Kuhlman B, et al. (2003) Design of a novel globular protein fold with atomic-level accuracy. Science 302:1364–1368.

9. Altman M, Nalivaika E, Prabu-Jeyabalan M, Schiffer C, Tidor B (2008) Computational design and experimental study of tighter binding peptides to an inactivated mutant of HIV-1 protease. Proteins 70:678–694.

10. Verkhivker G (2007) Computational proteomics of biomolecular interactions in the sequence and structure space of the tyrosine kinome: Deciphering the molecular basis of the kinase inhibitors selectivity. Proteins 66:912–929.

11. Ishikita H, Warshel A (2008) Predicting drug-resistant mutations of HIV protease. Angewandte Chemie International Edition 47:697–700.

12. Cao Z, et al. (2005) Computer prediction of drug resistance mutations in proteins. Drug Discov Today 10:521–529.

13. Frey K, et al. (2009) Crystal structures of wild-type and mutant methicillin-resistant Staphylococcus aureus dihydrofolate reductase reveal an alternative conformation of NADPH that may be linked to trimethoprim resistance. J Mol Biol 387:1298–1308.

14. Frey K, Lombardo M, Wright D, Anderson A (2010) Towards the understanding of resistance mechanisms in clinically isolated, trimethoprim-resistant, methicillin-resistant Staphylococcus aureus dihydrofolate reductase. J Struc Biol 170:93–97.

15. Dale G, et al. (1997) A single amino acid substitution in Staphylococcus aureus dihydrofolate reductase determines trimethoprim resistance. J Mol Biol 266:23–30.

16. Lovell S, Word J, Richardson J, Richardson D (2000) The penultimate rotamer library. Proteins 40:389–408.

17. Hu Z, et al. (2006) Fitness comparison of thymidine analog resistance pathways in human immunodeficiency virus type 1. J Virol 80:7020–7027.

18. Larder B, Coates K, Kemp S (1991) Zidovudine-resistant human immunodeficiency virus selected by passage in cell culture. J Virol 65:5232–5236.

19. Larder B, Kemp S (1989) Multiple mutations in HIV-1 reverse transcriptase confer high-level resistance to zidovudine (AZT). Science 246:1155–1158.

20. Rong L, Dahari H, Ribeiro R, Perelson A (2010) Rapid emergence of protease inhibitor resistance in hepatitis C virus. Science Translational Medicine 2.

21. Yatsuji H, et al. (2006) Emergence of a novel lamivudine-resistant hepatitis B virus variant with a substitution outside the YMDD motif. Antimicrob Agents Chemother 50:3867–3874.

22. Wueppenhorst N, Stueger H, Kist M, Glocker E (2009) Identification and characterization of triple- and quadruple-resistant Helicobacter pylori clinical isolates in Germany. J Antimicrob Chemother 63:648–653.

23. Leartsakulpanich U, et al. (2002) Molecular characterization of dihydrofolate reductase in relation to antifolate resistance in Plasmodium vivax. Mol Biochem Parasitol 119:63–73.

24. Ercikan-Abali E, et al. (1996) Active site-directed double mutants of dihydrofolate reductase. Cancer Res 56:4142–4145.

25. Sirawaraporn W, Sathitkul T, Sirawaraporn R, Yuthavong Y, Santi D (1997) Antifolate-resistant mutants of Plasmodium falciparum dihydrofolate reductase. Proc Natl Acad Sci USA 94:1124–1129.

26. Cheng Y, Prusoff W (1973) Relationship between the inhibition constant (Ki) and the concentration of inhibitor which causes 50 percent inhibition (I50) of an enzymatic reaction. Biochem Pharmacol 22:3099–3108.

27. Cornell W, et al. (1995) A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. J Am Chem Soc 117:5179–5197.

28. Lazaridis T, Karplus M (1999) Effective energy function for proteins in solution. Proteins 35:133–152.

29. Otwinowski Z, Minor W (1997) Macromolecular Crystallography. Method Enzymol, eds CW Carter and RM Sweet 276 (Academic Press, New York), 307–326.

30. Emsley P, Cowtan K (2004) Coot: Model-building tools for molecular graphics. Acta Crystallogr D 60:2126–2132.

31. Murshudov G, Vagin A, Dodson E (1997) Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr D 53:240–255.

32. Laskowski R, MacArthur M, Moss D, Thornton J (1993) PROCHECK: A program to check the stereochemical quality of protein structures. J Appl Crystallogr 26:283–291.
