Prediction of folding mechanism for circular-permuted proteins



doi:10.1006/jmbi.2001.4871 available online at http://www.idealibrary.com on J. Mol. Biol. (2001) 311, 879–890

Prediction of Folding Mechanism for Circular-permuted Proteins

Cecilia Clementi1*, Patricia A. Jennings2 and José N. Onuchic1

1Department of Physics and 2Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, CA 92093, USA

E-mail address of the corresponding author: [email protected]

Abbreviations used: TSE, transition state ensemble; CI2, chymotrypsin inhibitor 2.

0022-2836/01/040879–12 $35.00/0

Recent theoretical and experimental studies have suggested that real proteins have sequences with sufficiently small energetic frustration that topological effects are central in determining the folding mechanism. A particularly interesting and challenging framework for exploring and testing the viability of these energetically unfrustrated models is the study of circular-permuted proteins. Here we present the results of the application of a topology-based model to the study of circular permuted SH3 and CI2, in comparison with the available experimental results. The folding mechanism of the permuted proteins emerging from our simulations is in very good agreement with the experimental observations. The differences between the folding mechanisms of the permuted and wild-type proteins thus appear to be strongly related to the change in the native state topology.

© 2001 Academic Press

Keywords: protein folding; circular-permuted protein; transition state; molecular dynamics simulations

*Corresponding author

Introduction

The structural heterogeneity experimentally observed in the transition state ensemble (TSE) for well-designed sequences appears to be more strongly determined by the type of native fold than by differences in sequence, as long as these differences leave the native fold relatively unchanged and the energetic frustration small.1-19 Therefore, the structural heterogeneity observed in the funnel-like energy landscape has a strong "topological" component and the roughness of this landscape is also influenced by these effects. Experimental evidence supporting this idea is based on the observation that, generally, members of homologous protein families show a conservation of the folding mechanism, even when they have little sequence identity (see, for instance, refs 14,15). Moreover, it has been shown recently that there is a substantial correlation between the average sequence separation between contacting residues in the native structure (the so-called "contact order", a quantifiable topological feature of the native fold) and the folding rates for single domain proteins.16,17
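As a minimal sketch of the contact order mentioned above, the snippet below computes the average sequence separation between contacting residues and, as a commonly used normalization, the same quantity divided by the chain length. The function name `contact_order`, the toy contact list and the chain length are hypothetical and serve only as an illustration; how native contacts are identified is not addressed here and is left to the caller.

```python
from typing import Iterable, Tuple

Contact = Tuple[int, int]  # pair of residue indices (1-based) in contact


def contact_order(contacts: Iterable[Contact], chain_length: int) -> Tuple[float, float]:
    """Return (absolute, relative) contact order for a set of native contacts.

    absolute: mean sequence separation |i - j| over all contacts
    relative: absolute contact order divided by the chain length
    """
    separations = [abs(i - j) for i, j in contacts]
    if not separations:
        raise ValueError("at least one contact is required")
    absolute = sum(separations) / len(separations)
    relative = absolute / chain_length
    return absolute, relative


# Hypothetical 10-residue chain with three native contacts (illustration only).
toy_contacts = [(1, 4), (2, 9), (5, 8)]
abs_co, rel_co = contact_order(toy_contacts, chain_length=10)
print(f"absolute CO = {abs_co:.3f}, relative CO = {rel_co:.3f}")
```

A fold dominated by local contacts (helices, tight hairpins) gives a low value, while a fold with many contacts between residues far apart in sequence gives a high one.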

Although the central role of native state topology in determining the folding mechanism is thought to be a quite general result, at least for small two-state folding proteins, there are remarkable exceptions. In fact, it has been found recently that families of proteins with a very similar final fold may show large differences in the folding mechanism (see, for instance, refs 20-24). Such experimental findings show that topology alone cannot always and completely determine the folding mechanism and demonstrate that the balance between topology and energetics in the folding of proteins is very delicate. This balance seems to be particularly critical in proteins with a highly symmetrical native structure (such as protein L and protein G), where the influence of sequence on the folding mechanism has been found to be particularly strong (see, for instance, refs 25,26). It is clear from these observations that a better understanding of the interplay between topology and energetics in protein folding is needed in order to make significant progress and to be able to make quantitative predictions.

Nevertheless, starting from the theoretical and experimental evidence on the generally leading role of topology in the folding of small proteins, several simple theoretical models have been developed during the last few years in order to exploit the topological information of the native structure to make predictions about the folding mechanism.10,11,27-35 The success obtained by these models clearly corroborates the idea that the native state topology is central in determining the overall shape of the energy landscape associated with the folding process, at least for a set of proteins.

In this context, we have introduced a model based on an off-lattice Cα minimalist representation of a real protein structure, dressed with potential interactions designed to drastically reduce the energetic frustration and heterogeneity in native residue-residue interactions (Go-like Hamiltonian).27,28 The agreement between the predictions obtained by using this model and the TSE structure of real proteins, experimentally determined by means of Φ value measurements, lends full support to the idea that topology is strongly responsible for the observed structural heterogeneity of the TSE.

A particularly interesting and challenging framework for exploring and testing the viability of these energetically unfrustrated models is the study of circular-permuted proteins. While experimentalists often make single-site mutations to ask what role particular side-chains play in the TSE, these experiments have many potential contributions, including sterics, charge change, etc. To test the role topology plays in folding, we sought to address this question in a system in which the energetics should remain essentially the same. Towards this end, we take a cue from experimentalists who have shown that "circularly permuting" a sequence can help address questions regarding contextual sequence information and folding. Circular permutation studies work best when the N and C termini of the protein are close in space. This being the case, the N and C termini can be joined by a peptide linker sequence or chemical linkage (with minimal disruption to function). The circular permutation is then accomplished by subsequently disrupting a peptide bond at a given position along the sequence. How the circular permutation affects the folding of the protein is strongly dependent on the region where the "cut" is made. While the protein sequence is shifted but unchanged upon circular permutation,† the chain connectivity and the contact order (as defined in ref. 16) could be strongly modified: the change between the wild-type and the permutant is thus mostly topological, with minimal contribution from sterics, charge change, etc. A recent computational study on circular permutations in a lattice protein model has shown that different circular permutations may produce substantially different results on the folding nucleus of the protein.36

† Although a few mutations are usually performed from the wild-type protein for allowing the joining of the two chain termini, the comparison in the Φ-value analysis is made between the mutated protein and the circularly permuted mutated protein.

‡ As we have described,27 since the energetic frustration is fully removed from these models, energy-dependent quantities such as protein stability and folding barriers cannot be quantitatively estimated by them. Nonetheless, the structural details are well determined as expected for minimally frustrated sequences.
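The effect of a circular permutation on chain connectivity can be illustrated with a short sketch. It assumes an idealized permutation in which the last residue is joined directly to the first (no linker residues and no extra mutations) and the chain is cut after a chosen residue; the function names, the toy contact list and the cut position are hypothetical. The set of contacting residue pairs is unchanged, but their sequence separations, and therefore the contact order, are not.

```python
from typing import List, Tuple

Contact = Tuple[int, int]  # pair of residue indices (1-based)


def permuted_index(residue: int, cut_after: int, chain_length: int) -> int:
    """Index of a wild-type residue in the permuted chain.

    The permuted chain starts at residue cut_after + 1, runs to the old
    C terminus, continues through the old N terminus and ends at cut_after.
    """
    if not 1 <= residue <= chain_length:
        raise ValueError("residue index out of range")
    if not 1 <= cut_after < chain_length:
        raise ValueError("cut must lie between two residues of the chain")
    if residue > cut_after:
        return residue - cut_after
    return residue + (chain_length - cut_after)


def permute_contacts(contacts: List[Contact], cut_after: int, chain_length: int) -> List[Contact]:
    """Relabel every contact with the indices of the permuted chain."""
    relabeled = []
    for i, j in contacts:
        a = permuted_index(i, cut_after, chain_length)
        b = permuted_index(j, cut_after, chain_length)
        relabeled.append((min(a, b), max(a, b)))
    return relabeled


def mean_separation(contacts: List[Contact]) -> float:
    """Average sequence separation between contacting residues."""
    return sum(abs(i - j) for i, j in contacts) / len(contacts)


# Hypothetical 10-residue chain: one contact between the termini, one
# hairpin-like contact spanning the future cut, one purely local contact.
wild_type = [(1, 10), (4, 6), (2, 3)]
permuted = permute_contacts(wild_type, cut_after=5, chain_length=10)
print(permuted, mean_separation(wild_type), mean_separation(permuted))
```

In this toy case the contact between the termini becomes local, whereas the contact that spanned the cut becomes the longest-range contact in the chain, which is the kind of change in connectivity discussed in the text.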

The success in predicting the main features of the transition state structure of small fast-folding proteins encourages us to investigate the effects of circular permutations on the folding mechanism of these proteins.

Here, we present the results of the application of a topology-based model (as described in ref. 27) to the study of circular permuted variants of two simple two-state folding proteins, SH3 and CI2, in comparison with the available experimental results.‡

The folding mechanism of the permuted proteins emerging from our simulations is in very good agreement with the experimental observations. The differences between the folding mechanisms of the permuted and wild-type proteins thus appear to be strongly related to the change in the native state topology.

Results and Discussion

Chymotrypsin inhibitor 2 (CI2)

Chymotrypsin inhibitor 2 (CI2) is a 64-residue protein, consisting of six β-strands packed against an α-helix to form a hydrophobic core. Experimental37-40 and computational41,42 studies have established that CI2 folding and unfolding can be modeled by simple two-state kinetics and that the transition state ensemble has roughly half of the native interactions formed and a broad distribution of Φ values. The α-helix and the mini-core (defined by strands 3 and 4 and their connecting loop) seem to play the role of a nucleation site around which folding initiates. The transition state structure of a circular permuted CI2 has been experimentally characterized by Otzen & Fersht.43 This permuted CI2 has residues 3 and 63 joined together (by a disulfide bond) and the peptide bond Met40-Glu41 cleaved by cyanogen bromide (CNBr). In the following, we refer to this permutation as perm 40-41. Results from protein engineering analysis on perm 40-41 have shown that its transition state is essentially unchanged from that of the wild-type protein, suggesting that the folding nucleus is retained under this particular circular permutation.

The overall structure of the experimentally observed transition state can be recovered using a simplified, energetically unfrustrated model. The energetically unfrustrated Cα minimalist model adopted for our simulations is described in detail elsewhere27 and a brief summary of it is given in Appendix A.

Figure 1. The transition state structure for the wild-type (left) and circular permuted, perm 40-41 (right), CI2, as obtained from simulations. Different colored parts of the structures indicate their different degree of formation at the transition state, as quantified in the color scale on top. The predominant colors in both structures are yellow-gray, indicating largely homogeneous TSE structures. The α-helix and the mini-core are the most structured parts of the proteins for both wild-type and perm 40-41 CI2. The main difference between the two proteins resides in the joint region, where perm 40-41 is more structured than the wild-type protein. This description is in good agreement with experimental results.43

To compare the results from simulations and experiments, we compute the probability Qi for each native contact i to be formed at the TSE. In practice, Qi is obtained by averaging the number of times the contact i occurs over a large set of structures sampled in equilibrium around the free energy barrier between the folded and unfolded states (see Appendix A for details). The fractional formations Qi are strongly related to the experimentally measurable (bond) Φi values (see ref. 27 for a thorough discussion).
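The averaging that defines Qi can be sketched as follows. The sketch assumes that each sampled TSE structure has already been reduced to the distances of its M native contacts, and that a contact counts as formed when its distance is within a tolerance factor of the native distance; the factor 1.2, the function name and the toy numbers are assumptions made for this illustration, the actual criterion being the one given in Appendix A.

```python
from typing import List, Sequence


def contact_formation_probabilities(
    snapshots: Sequence[Sequence[float]],
    native_distances: Sequence[float],
    tolerance: float = 1.2,  # assumed cutoff factor, not taken from the text
) -> List[float]:
    """Fraction of sampled structures in which each native contact is formed.

    snapshots[k][i] is the distance of native contact i in structure k;
    contact i is counted as formed when that distance does not exceed
    tolerance * native_distances[i].
    """
    if not snapshots:
        raise ValueError("at least one sampled structure is required")
    n_contacts = len(native_distances)
    counts = [0] * n_contacts
    for distances in snapshots:
        if len(distances) != n_contacts:
            raise ValueError("every snapshot must list all native contacts")
        for i, (d, d_native) in enumerate(zip(distances, native_distances)):
            if d <= tolerance * d_native:
                counts[i] += 1
    return [count / len(snapshots) for count in counts]


# Hypothetical data: three native contacts, four structures sampled at the barrier.
native = [5.0, 6.0, 7.0]
tse_snapshots = [
    [5.5, 6.5, 12.0],
    [5.2, 9.0, 11.0],
    [5.9, 10.0, 8.0],
    [9.0, 7.0, 13.0],
]
q = contact_formation_probabilities(tse_snapshots, native)
print(q)
```

Each Qi lies between 0 (contact never formed in the sampled ensemble) and 1 (always formed), which is what makes it comparable, at least roughly, to a Φ value.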

Figure 1 shows the degree of "nativeness" of each residue (computed as an average over the formation probabilities Qi of all contacts involving that residue) at the transition state for the wild-type and perm 40-41 CI2. It is clear from the Figure that there are no major differences in the overall transition state structure between the wild-type and permuted proteins. In particular, the α-helix and the mini-core remain the parts of the protein most likely to be formed at the transition state, although the degree of structural inhomogeneity of the TSE is quite small.
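The per-residue "nativeness" described above is a plain average of contact probabilities, which the following sketch makes explicit. The function name, the contact list and the Qi values are hypothetical and serve only to show the bookkeeping.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

Contact = Tuple[int, int]  # pair of residue indices (1-based)


def residue_nativeness(contacts: Sequence[Contact], q: Sequence[float]) -> Dict[int, float]:
    """Average the formation probability over all contacts involving each residue."""
    if len(contacts) != len(q):
        raise ValueError("one probability per contact is required")
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for (a, b), q_i in zip(contacts, q):
        for residue in (a, b):
            totals[residue] += q_i
            counts[residue] += 1
    return {residue: totals[residue] / counts[residue] for residue in sorted(totals)}


# Hypothetical contacts and formation probabilities (illustration only).
toy_contacts = [(1, 5), (1, 8), (5, 8)]
toy_q = [0.9, 0.3, 0.6]
nativeness = residue_nativeness(toy_contacts, toy_q)
print(nativeness)
```

Residues that take part in no native contact do not appear in the result, so a caller would need to decide how to display them in a structure colored by nativeness.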

By analyzing the individual fractional formations Qi for the different contacts (i = 1, ..., M) at the TSE, we see that there is a good correlation between the values for the wild-type and permuted proteins (as shown in Figure 2), again confirming that the folding mechanism is almost unchanged. However, the circular permutation induces a very localized perturbation in the contacts involving residues in the joint and/or in the cleavage region. The points deviating most from the diagonal in Figure 2 identify the contacts of perm 40-41 with the largest difference in fractional formation at the TSE with respect to the wild-type protein. Experimentally, it is observed43 that the regions around the disulfide bond and the cleavage are locally perturbed; in particular, an increase of "nativeness" is registered for residues Lys2 and Val60 in the permuted CI2. This is a direct result of the constraint of the permutant. An interesting test would be to vary the linker composition in order to see at which linker length this effect fades. Accordingly, our results show that the termini are slightly more structured than in the wild-type protein. Figure 3 shows the contact map of the average transition state structure for wild-type and perm 40-41 CI2. Different colors in these contact maps represent different probabilities for a contact to be formed, as quantified by the color scale: the two contact maps are very similar, with the major differences localized in the joint and cleavage regions, as discussed above.

SH3 domain of the src tyrosine-protein kinase

The α-spectrin SH3 domain is a largely β-sheet, two-state folding protein with a more polarized transition state structure (a substantial number of large Φ values).15 The SH3 fold contains three distinct β-hairpins (we refer to them with the standard names used in the literature: RT loop (residues 9-22, in our notation); n-Src loop (26-41); and distal loop (36-49)). Extensive mutagenesis studies on this protein44 have shown that the folding nucleus of the wild-type α-spectrin SH3 is formed mainly by residues located in the RT loop and distal loop β-hairpins. Moreover, a detailed comparison of the transition state structure of α-spectrin SH3 and its homolog src SH3 has shown that Φ values at corresponding sequence positions are highly correlated14,15 although the two homologs have only weak identity (~30% identity with gaps). This fact suggests that the sequence details and local stability are less important for determining how structured a region is in the TSE than its location in the final folded conformation. The SH3 domain is thus an optimal candidate for the study of circular permutations by using topology-based models.

Figure 2. Correlation of the probability Qi for a contact to be formed (i.e. a rough measure of the contact Φ value) at the transition state between the wild-type and circular permuted CI2. Almost all Qi have comparable values for the wild-type and circular permuted proteins. The contacts where the largest differences reside are those involving residues in the joint and/or in the cleavage region. The red line is the best linear fit for the circular permuted Qi as a function of the corresponding wild-type Qi. The diagonal is shown (blue line) as a reference.

Two different circular permutants of α-spectrin SH3 have been experimentally studied by Serrano's group:44 one involves the disruption of a bond in the RT loop (between residues 14 and 15,† indicated as perm 14-15 in the following), the other the disruption of a bond in the distal loop (between residues 42 and 43,‡ perm 42-43). The transition states of these two circular permuted SH3 proteins are remarkably different from each other. In particular, the folding mechanism of perm 42-43 is completely different from that of the wild-type. Differences between the wild-type and these two circular permuted SH3 proteins are somewhat expected, since in both cases the cut is made in a region involved in the transition state structure, where topological effects (i.e. chain connectivity) were thought to play the largest role. The results from our simulations are in excellent agreement with the experiments, as shown in the Figures.

Figure 4 shows the degree of "nativeness" for different parts of the SH3 structure for the wild-type and the permuted proteins. As described for CI2, the degree of nativeness of a residue is obtained by averaging the formation probabilities over all the contacts involving that residue. In perm 42-43 the cleaved distal loop is not formed at the transition state (while it is almost fully formed in the wild-type transition state), as found experimentally. Moreover, the n-Src loop and N terminus (which are less structured in the wild-type transition state) are more structured in the transition state of perm 42-43, again in agreement with experimental results.44 Not many experimental data are available for perm 14-15, but overall it has been found that the changes are less dramatic. The nucleus is destabilized, but to a lesser extent than in the 42-43 permutant. From our results we see that indeed the wild-type nucleus (interactions between distal loop and diverging turn) is destabilized but the distal loop itself is still structured, and there is no contribution of the N terminus in the transition state structure.

† Residues 14-15 in our notation correspond to residues 19-20 in ref. 44.

‡ Residues 42-43 in our notation correspond to residues 47-48 in ref. 44.

Figure 5 shows the correlation between the probabilities of contact formation for corresponding contacts in the wild-type and the two permuted proteins. With a correlation between the fractional formation of contacts as low as ≈0, it is clear that the folding mechanism for perm 42-43 is completely different from the wild-type. For perm 14-15, instead, the correlation of the same quantities is ≈0.7, confirming that part of the wild-type transition state structure is retained in the permuted molecule, although the folding nucleus is partly destabilized. The destabilization of the wild-type folding nucleus for perm 14-15 is better represented in Figure 6. As already stated above, the folding nucleus of α-spectrin SH3 is composed of the interactions involving the residues of the RT loop and distal loop β-hairpins (i.e. interactions within and between the regions around residue 18 and residue 45).
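The comparison between wild-type and permuted contact probabilities amounts to a correlation coefficient and a linear fit over matched contacts. The sketch below computes the Pearson correlation and the least-squares line of the permuted Qi against the wild-type Qi; the function name and all numerical values are hypothetical and are not data from the simulations. Contacts must be matched by residue pair before the comparison, since the residue numbering differs between the two chains.

```python
import math
from typing import Sequence, Tuple


def correlation_and_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (Pearson r, slope, intercept) for the least-squares line y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for constant data")
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Hypothetical Qi values for four matched contacts (illustration only).
q_wild_type = [0.1, 0.4, 0.6, 0.9]
q_similar = [0.2, 0.5, 0.7, 1.0]    # same mechanism, uniformly shifted
q_reshuffled = [0.9, 0.6, 0.4, 0.1]  # nucleus moved elsewhere

r_similar, slope_similar, intercept_similar = correlation_and_fit(q_wild_type, q_similar)
r_reshuffled, _, _ = correlation_and_fit(q_wild_type, q_reshuffled)
print(r_similar, slope_similar, intercept_similar, r_reshuffled)
```

A correlation near 1 with a fitted line close to the diagonal indicates a retained transition state, while a correlation near zero indicates that the contacts formed at the transition state have been redistributed.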

The contact maps in Figure 6 show that while these interactions are almost fully formed (blue-black colors) in the wild-type, they are poorly structured in perm 42-43 (white-yellow colors) and partially destabilized in perm 14-15 (the distal loop is largely structured but the RT loop is unstructured and the interactions between the two loops are only partially formed).

Conclusions

We have shown that simple energetically unfrustrated models are quite effective for predicting the experimentally observed changes in the folding mechanism upon circular permutation, providing strong support for the idea that the structural heterogeneity in the protein funnel-like energy landscape is strongly influenced by topological effects.

We have presented the results obtained using a Go-like potential on a Cα off-lattice representation of CI2 and SH3 under different circular permutations. The predictions of the model are in excellent agreement with the available experimental results, which is consistent with the fact that these proteins have sufficiently reduced energetic frustration that most of the observed structural polarization of the TSE is due to topological factors. It is interesting to notice that while the TSE structure of the circular permuted CI2 is almost unchanged with respect to the wild-type, the TSEs of the two circular permuted SH3 analyzed here show quite large differences from each other and from the wild-type. These results are perfectly consistent with the fact that the TSE of CI2 is generally depicted as a collection of non-specific and somewhat diffuse nuclei, while the TSE of SH3 presents a substantially larger degree of structural polarization. The more polarized TSE suggests that SH3 has a backbone conformation which is intrinsically more difficult to fold, i.e. there is a greater level of topological frustration in this structure. Changes in chain connectivity are then expected to affect the folding mechanism more in the SH3 fold than in CI2. In other words, since the important contacts at the TSE of CI2 are mostly local (i.e. involving residues close to each other along the sequence, as in the α-helix) and distributed over the protein, big changes are not expected upon shifting of the sequence over the structure. On the contrary, since the TSE of SH3 involves the formation of specific non-local contacts (such as the interactions between the distal-loop and RT-loop hairpins), changing the chain connectivity in those regions is expected to have a strong effect on the formation of the TSE.

The success of our approach in explaining the experimental data for SH3 and CI2 circular permutants gives us the confidence needed to extend these studies to more challenging and interesting proteins such as DHFR and IL-1β, for which the folding mechanism has already been shown to have strong topological dependence.28 We shall extend this procedure to predictions on larger proteins in our future work.

Figure 3. Probability for different contacts to be formed at the transition state for the wild-type (left) and circular permuted (right) CI2. Different colors in the contact maps indicate different probabilities as quantified by the color scale on top of the Figure. The main differences reside in a few contacts in the regions of the joint and the cleavage, in agreement with the experimental findings43 and as also shown in Figures 1 and 2.

Figure 4. The transition state structure for the wild-type (center) and two circular permuted SH3: perm 14-15 (left) and perm 42-43 (right), as obtained from simulations. Different colors indicate their different degree of formation at the transition state. Differently from CI2, these circular permutations (especially perm 42-43) do change the folding mechanism.

Figure 5. Correlation of the probabilities Qi for a contact to be formed at the transition state between the wild-type and the two circular permuted SH3: perm 14-15 (left) and perm 42-43 (right). In the case of perm 14-15 there are still some similarities with the folding of the wild-type and the overall correlation is around 0.7. On the contrary, there is no correlation between the transition state contact probabilities of perm 42-43 and wild-type. As also shown in Figure 6, there is a clear shift of the folding nucleus to another region of the protein for perm 42-43. In fact, a cluster of contacts with medium-high probability of formation in the wild-type (contacts involving residues 30-50) has a much lower probability in perm 42-43, whereas contacts with very low probability of formation, between the N and C termini in the wild-type, show an enhanced likelihood in perm 42-43. The shift of the folding nucleus produces two clusters of points in the diagram. The red line is the best linear fit for the circular permuted Qi as a function of the corresponding wild-type Qi. The diagonal is shown (blue line) as a reference.

Figure 6. Contact map representation of the TSE structure for perm 14-15 (left), wild-type (center) and perm 42-43 (right) SH3. The main difference between the wild-type and perm 14-15 transition state consists of a less stable folding nucleus for perm 14-15. On the contrary, the folding mechanism of perm 42-43 is completely different from the wild-type: a new folding nucleus emerges whereas the one existing in the wild-type is completely disrupted.

Acknowledgments

One of us (C.C.) expresses her gratitude to Giovanni Fossati for his suggestions and for carefully reading the manuscript, and to the Center for Astrophysics & Space Sciences of UCSD for the usage of graphics facilities and computer time. This work has been supported by the NSF (grant MCB-0084797), by the La Jolla Interfaces in Science program (sponsored by the Burroughs Wellcome Fund) and by NIH. Numerical simulations would be impossible without the support of the NSF (grant no. 9970199).

References

1. Boczko, E. M. & Brooks, C. L., III (1995). First-principles calculation of the folding free energy of a three-helix bundle protein. Science, 269, 393-396.

2. Wolynes, P. G. (1996). Symmetry and the energy landscapes of biomolecules. Proc. Natl Acad. Sci. USA, 93, 14249-14255.

3. Nelson, E. D., Eyck, L. T. & Onuchic, J. N. (1997). Symmetry and kinetic optimization of protein-like heteropolymers. Phys. Rev. Letters, 79, 3534-3537.

4. Nelson, E. D. & Onuchic, J. N. (1998). Proposed mechanism for stability of proteins to evolutionary mutations. Proc. Natl Acad. Sci. USA, 95, 10682-10686.

5. Shea, J. E., Nochomovitz, Y. D., Guo, Z. Y. & Brooks, C. L., III (1998). Exploring the space of protein folding Hamiltonians: the balance of forces in a minimalist beta-barrel model. J. Chem. Phys. 109, 2895-2903.

6. Sheinerman, F. B. & Brooks, C. L., III (1998). Calculations on folding of segment B1 of streptococcal protein G. J. Mol. Biol. 278, 439-455.

7. Sheinerman, F. B. & Brooks, C. L., III (1998). Molecular picture of folding of a small alpha/beta protein. Proc. Natl Acad. Sci. USA, 95, 1562-1567.

8. Onuchic, J. N., Socci, N. D., Luthey-Schulten, Z. A. & Wolynes, P. G. (1996). Protein folding funnels: the nature of the transition state ensemble. Folding Des. 1, 441-450.

9. Socci, N. D., Nymeyer, H. & Onuchic, J. N. (1997). Exploring the protein folding landscape. Physica D, 107, 366-382.

10. Portman, J. J., Takada, S. & Wolynes, P. G. (1998). Variational theory for site resolved protein folding free energy surfaces. Phys. Rev. Letters, 81, 5237-5240.

11. Micheletti, C., Banavar, J., Maritan, A. & Seno, F. (1999). Protein structures and optimal folding emerging from a geometrical variational principle. Phys. Rev. Letters, 82, 3372-3375.

12. Scheraga, H. A. (1992). Contribution of physical chemistry to an understanding of protein structure and function. Protein Sci. 1, 691.

13. Shea, J. E., Onuchic, J. N. & Brooks, C. L., III (1999). Exploring the origins of topological frustration: design of a minimally frustrated model of fragment B of protein A. Proc. Natl Acad. Sci. USA, 96, 12512-12517.

14. Grantcharova, V., Riddle, D., Santiago, J. & Baker,D. (1998). Important role of hydrogen bonds in thestructurally polarized transition state for folding ofthe src SH3 domain. Nature Struct. Biol. 5, 714-720.

15. Martinez, J., Pisabarro, M. & Serrano, L. (1998).Obligatory steps in protein folding and the confor-mational diversity of the transition state. NatureStruct. Biol. 5, 721-729.

16. Plaxco, K. W., Simons, K. T. & Baker, D. (1998).Contact order, transition state placement and therefolding rates of single domain proteins. J. Mol.Biol. 277, 985-994.

17. Plaxco, K. W., Simons, K. T., Ruczinski, I. & Baker,D. (2000). Topology, stability, sequence, and length:De®ning the determinants of two-state protein fold-ing kinetics. Biochemistry, 39, 11177-11183.

18. Chan, H. S. (1998). Matching speed and locality.Nature, 392, 761-763.

19. Thirumalai, D. & Klimov, D. K. (1999). Decipheringthe timescales and mechanisms of protein foldingusing minimal off-lattice models. Curr. Opin. Struct.Biol. 9, 197-207.

20. Ferguson, N., Capaldi, A. P., James, R., Kleanthous, C. & Radford, S. E. (1999). Rapid folding with and without populated intermediates in homologous four-helix proteins Im7 and Im9. J. Mol. Biol. 286, 1597-1608.

21. Dalessio, P. M. & Ropson, I. J. (2000). β-sheet proteins with nearly identical structures have different folding intermediates. Biochemistry, 39, 860-871.

22. Burns, L. L., Dalessio, P. M. & Ropson, I. J. (1998). Folding mechanism of three structurally similar β-sheet proteins. Proteins: Struct. Funct. Genet. 33, 107-118.

23. Guerois, R. & Serrano, L. (2000). The SH3-fold family: experimental evidence and prediction of variations in the folding pathways. J. Mol. Biol. 304, 967-982.

24. Cota, E., Steward, A., Fowler, S. B. & Clarke, J. (2001). The folding nucleus of a fibronectin type III domain is composed of core residues of the immunoglobulin-like fold. J. Mol. Biol. 305, 1185-1194.

25. McCallister, E. L., Alm, E. & Baker, D. (2000). Critical role of beta-hairpin formation in protein G folding. Nature Struct. Biol. 7, 669-673.

26. Kim, D. E., Fisher, C. & Baker, D. (2000). A breakdown of symmetry in the folding transition state of protein L. J. Mol. Biol. 298, 971-984.

27. Clementi, C., Nymeyer, H. & Onuchic, J. N. (2000). Topological and energetic factors: what determines the structural details of the transition state ensemble and "en-route" intermediates for protein folding? An investigation for small globular proteins. J. Mol. Biol. 298, 937-953.

28. Clementi, C., Jennings, P. A. & Onuchic, J. N. (2000). How native state topology affects the folding of dihydrofolate reductase and interleukin-1β. Proc. Natl Acad. Sci. USA, 97, 5871-5876.

29. Alm, E. & Baker, D. (1999). Prediction of protein-folding mechanisms from free-energy landscapes derived from native structures. Proc. Natl Acad. Sci. USA, 96, 11305-11310.

30. Munoz, V. & Eaton, W. A. (1999). A simple model for calculating the kinetics of protein folding from three-dimensional structures. Proc. Natl Acad. Sci. USA, 96, 11311-11316.

31. Galzitskaya, O. V. & Finkelstein, A. V. (1999). A theoretical search for folding/unfolding nuclei in three-dimensional protein structures. Proc. Natl Acad. Sci. USA, 96, 11299-11304.

32. Klimov, D. K. & Thirumalai, D. (2000). Native topology determines force-induced unfolding pathways in globular proteins. Proc. Natl Acad. Sci. USA, 97, 7254-7259.

33. Pande, V. & Rokhsar, D. (1999). Folding pathway of a lattice model for proteins. Proc. Natl Acad. Sci. USA, 96, 1273-1278.

34. Du, R., Pande, V., Grosberg, A., Tanaka, T. & Shakhnovich, E. (1999). On the role of conformational geometry in protein folding. J. Chem. Phys. 111, 10375-10380.

35. Choe, S., Li, L., Matsudaira, P., Wagner, G. & Shakhnovich, E. (2000). Differential stabilization of two hydrophobic cores in the transition state of the villin 14T folding reaction. J. Mol. Biol. 304, 99-115.

36. Li, L. & Shakhnovich, E. (2001). Different circular permutations produced different folding nuclei in proteins: a computational study. J. Mol. Biol. 306, 121-132.

37. Jackson, S. E. & Fersht, A. R. (1991). Folding of chymotrypsin inhibitor 2. 1. Evidence for a two-state transition. Biochemistry, 30, 10428-10435.

38. Jackson, S. E. & Fersht, A. R. (1991). Folding of chymotrypsin inhibitor 2. 2. Influence of proline isomerization on the folding kinetics and thermodynamic characterization of the transition state of folding. Biochemistry, 30, 10436-10443.

39. Jackson, S. E., Moracci, M., elMasry, N., Johnson, C. & Fersht, A. R. (1993). The effect of cavity creating mutations in the hydrophobic core of chymotrypsin inhibitor 2. Biochemistry, 32, 11262-11269.

40. Jackson, S. E., elMasry, N. & Fersht, A. R. (1993). Structure of the hydrophobic core in the transition state for folding of chymotrypsin inhibitor 2: a critical test of the protein engineering method of analysis. Biochemistry, 32, 11270-11278.

41. Pan, Y. P. & Daggett, V. (2001). Direct comparison of experimental and calculated folding free energies for hydrophobic deletion mutants of chymotrypsin inhibitor 2: free energy perturbation calculations using transition and denatured states from molecular dynamics simulations of unfolding. Biochemistry, 40, 2723-2731.

42. Kazmirski, S., Wong, K., Freund, S., Tan, Y., Fersht, A. & Daggett, V. (2001). Protein folding from a highly disordered denatured state: the folding pathway of chymotrypsin inhibitor 2 at atomic resolution. Proc. Natl Acad. Sci. USA, 98, 4349-4354.

43. Otzen, D. E. & Fersht, A. R. (1998). Folding of circular and permuted chymotrypsin inhibitor 2: retention of the folding nucleus. Biochemistry, 37, 8139-8146.

44. Viguera, A. R., Serrano, L. & Wilmanns, M. (1996). Different folding transition states may result in the same native structure. Nature Struct. Biol. 3, 874-880.

45. Ueda, Y., Taketomi, H. & Go, N. (1975). Studies on protein folding, unfolding and fluctuations by computer simulation. I. The effects of specific amino acid sequence represented by specific inter-unit interactions. Int. J. Pept. Res. 7, 445-459.

46. Ueda, Y., Taketomi, H. & Go, N. (1978). Studies on protein folding, unfolding, and fluctuations by computer simulations. II. A three-dimensional lattice model of lysozyme. Biopolymers, 17, 1531-1548.

47. Pearlman, D. A., Case, D. A., Caldwell, J. W., Ross, W. S., Cheatham, T. E., III, DeBolt, S. et al. (1995). AMBER, a package of computer programs for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to simulate the structural and energetic properties of molecules. Comput. Phys. Commun. 91, 1-41.

48. Berendsen, H. J. C., Postma, J. P. M., van Gunsteren, W. F., DiNola, A. & Haak, J. R. (1984). Molecular dynamics with coupling to an external bath. J. Chem. Phys. 81, 3684-3690.

49. Sobolev, V., Wade, R., Vriend, G. & Edelman, M. (1996). Molecular docking using surface complementarity. Proteins: Struct. Funct. Genet. 25, 120-129.

50. Onuchic, J. N., Nymeyer, H., García, A. E., Chahine, J. & Socci, N. D. (2000). The energy landscape theory of protein folding: insights into folding mechanisms and scenarios. Advan. Protein Chem. 53, 87-152.

51. Ferrenberg, A. M. & Swendsen, R. H. (1988). New Monte Carlo technique for studying phase transitions. Phys. Rev. Letters, 61, 2635-2638.

52. Ferrenberg, A. M. & Swendsen, R. H. (1989). Optimized Monte Carlo data analysis. Phys. Rev. Letters, 63, 1195-1198.

53. Swendsen, R. H. (1993). Modern methods of analyzing Monte Carlo computer simulations. Physica A, 194, 53-62.

54. Nymeyer, H., Socci, N. D. & Onuchic, J. N. (2000). Landscape approaches for determining the ensemble of folding transition states: success and failure hinge on the degree of frustration. Proc. Natl Acad. Sci. USA, 97, 634-639.

Appendix A

Model and Method

The energetically unfrustrated models of the analyzed proteins are constructed by using a Go-like Hamiltonian. 1,2 A Go-like potential takes into account only native interactions, and each of these interactions enters the energy balance with the same weight. Residues in the proteins are represented as single beads centered on their Cα positions. Adjacent beads are strung together into a polymer chain by means of bond and angle interactions, while the geometry of the native state is encoded in the dihedral angle potential and a non-local bead-bead potential.

A detailed description of this energy function can be found elsewhere. 3 Solvent mediation and side-chain effects are already included in these effective energy functions. Therefore, entropy changes are associated with the configurational entropy of the chain. We used molecular dynamics (MD), i.e. the numerical integration of Newton's laws of motion, to simulate the kinetics of the protein models. We employed the simulation package AMBER (version 4.1) 47 at constant temperature, using the Berendsen algorithm to couple the system to an external bath. 5

The native contact map of a protein is derived with the CSU software, based upon the approach developed by Sobolev et al. 6 Native contacts between pairs of residues (i,j) with j < i + 4 are discarded from the native map, as any three and four subsequent residues already interact through the angle and dihedral terms. The energy gain of a formed native contact is denoted by ε, and the temperatures and energies reported here are measured in units of ε.
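The sequence-separation filter applied to the native contact map can be illustrated with a minimal Python sketch. It assumes that the contacts have already been obtained (for example from CSU output) as pairs of residue indices, and it reads the discard rule as a sequence separation of fewer than four residues; the function name `filter_native_contacts`, the parameter `min_separation` and the example pairs are hypothetical and are not part of the original work.

```python
def filter_native_contacts(contacts, min_separation=4):
    """Keep only contacts (i, j) whose sequence separation is >= min_separation.

    Pairs closer in sequence are dropped because, in the model described
    above, such residues already interact through angle and dihedral terms.
    """
    kept = set()
    for i, j in contacts:
        lo, hi = (i, j) if i < j else (j, i)  # order each pair so lo < hi
        if hi - lo >= min_separation:
            kept.add((lo, hi))
    return sorted(kept)


# Hypothetical raw contact list (residue indices are illustrative only).
raw_contacts = [(1, 2), (1, 4), (1, 5), (10, 3), (7, 20)]
native_map = filter_native_contacts(raw_contacts)
```

In this toy list the pairs (1, 2) and (1, 4) are removed, while (1, 5), (3, 10) and (7, 20) are retained.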

A contact between two residues (i,j) is considered formed if the distance between the Cα atoms is shorter than γ times their native distance σ_ij. It has been shown 7 that the results are not strongly dependent on the choice made for the cut-off distance γ. In this work we used γ = 1.2.
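As an illustration of this criterion, the following Python sketch counts the native contacts formed in a single conformation and returns their fraction, the quantity used as reaction coordinate Q in this appendix. The data layout (dictionaries keyed by residue index), the function name `fraction_native_contacts` and the toy coordinates are assumptions made for the example; only the cutoff factor of 1.2 is taken from the text.

```python
import math


def fraction_native_contacts(coords, native_distances, gamma=1.2):
    """Return the fraction of native contacts formed in one conformation.

    coords           -- {residue index: (x, y, z)} of the C-alpha beads
    native_distances -- {(i, j): native C-alpha distance of that contact}
    gamma            -- cutoff factor; a contact counts as formed when the
                        current distance is shorter than gamma times the
                        native distance
    """
    if not native_distances:
        raise ValueError("the native contact map is empty")
    formed = sum(
        1
        for (i, j), sigma in native_distances.items()
        if math.dist(coords[i], coords[j]) < gamma * sigma
    )
    return formed / len(native_distances)


# Toy conformation with three native contacts (illustrative numbers only).
coords = {0: (0.0, 0.0, 0.0), 5: (1.0, 0.0, 0.0), 12: (5.0, 0.0, 0.0)}
native_distances = {(0, 5): 1.0, (0, 12): 4.0, (5, 12): 3.5}
q = fraction_native_contacts(coords, native_distances)
```

Here two of the three contacts satisfy the criterion, so the conformation has Q = 2/3.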

For all the proteins presented here, several constant temperature simulations were made and combined using the WHAM algorithm 8-10 to generate a specific heat profile versus temperature and a free energy F(Q) as a function of the folding reaction coordinate Q, defined as the fraction of native contacts formed in a conformation (Q = 0 at the fully unfolded state and Q = 1 at the folded state). The use of the parameter Q as a single reaction coordinate for the folding process is justified by the fact that proteins with a reduced level of energetic frustration (as this model is by construction) have a funnel-like energy landscape with a solvent-averaged potential depth strongly correlated with the degree of nativeness. In this situation, the folding dynamics can be described as the diffusion of an ensemble of protein configurations over a low-dimensional free energy surface, defined in terms of the reaction coordinate Q 3,11,12,14,15,17 (see Appendix B for a discussion on the validity of the choice of Q as reaction coordinate).
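To make the notion of a free energy profile along Q concrete, the sketch below estimates F(Q) = -T ln P(Q) from the Q values sampled in a single constant temperature run. This is a deliberately simplified single-temperature histogram estimate and not the WHAM procedure used in this work, which combines runs at several temperatures. The function name `free_energy_profile`, the bin count and the sample values are hypothetical, and the Boltzmann constant is set to 1 so that the temperature is expressed in energy units.

```python
import math
from collections import Counter


def free_energy_profile(q_samples, temperature, n_bins=10):
    """Estimate F(Q) = -T ln P(Q) (k_B = 1) from Q values in [0, 1].

    Returns {bin centre: F}, shifted so that the most populated bin has
    F = 0. Bins that were never visited are omitted.
    """
    if not q_samples:
        raise ValueError("no samples provided")
    # Assign each sample to a bin; Q = 1 falls into the last bin.
    counts = Counter(min(int(q * n_bins), n_bins - 1) for q in q_samples)
    total = len(q_samples)
    raw = {
        (b + 0.5) / n_bins: -temperature * math.log(c / total)
        for b, c in counts.items()
    }
    f_min = min(raw.values())
    return {q: f - f_min for q, f in sorted(raw.items())}


# Toy bimodal sample: unfolded basin, rarely visited barrier, folded basin.
q_samples = [0.15] * 4 + [0.55] + [0.85] * 4
profile = free_energy_profile(q_samples, temperature=1.0)
barrier = max(profile.values())
```

For this toy sample the barrier bin is visited four times less often than either basin, so the estimated barrier height is T ln 4.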

The folding temperature of each protein model is estimated as the temperature corresponding to the peak in the specific heat curve.
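The peak-picking step can be sketched in Python as follows. The example assumes the standard fluctuation formula for the specific heat, C_v = (<E^2> - <E>^2) / T^2 with the Boltzmann constant set to 1, evaluated separately at each simulated temperature; in this work the curve is obtained from the WHAM-combined data instead, so the code is an illustration of the idea rather than the procedure actually used. All names and energy values are hypothetical.

```python
from statistics import pvariance


def specific_heat(energies, temperature):
    """Fluctuation estimate C_v = (<E^2> - <E>^2) / T^2 with k_B = 1."""
    return pvariance(energies) / temperature ** 2


def folding_temperature(energy_runs):
    """Return (T at the specific heat peak, {T: C_v}).

    energy_runs -- {temperature: list of potential energies sampled at
                    that temperature}
    """
    cv = {t: specific_heat(e, t) for t, e in energy_runs.items()}
    return max(cv, key=cv.get), cv


# Toy energy samples: only the middle run hops between the two basins.
energy_runs = {
    0.90: [-100.0, -101.0, -99.0, -100.0],
    0.96: [-100.0, -40.0, -95.0, -45.0],
    1.02: [-40.0, -41.0, -39.0, -40.0],
}
t_f, cv_curve = folding_temperature(energy_runs)
```

The run that samples both the folded and the unfolded basin has by far the largest energy fluctuations, so its temperature is returned as the estimate of the folding temperature.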

The errors (reported as error bars in the plots) on the estimates of the contact probabilities are obtained by computing these quantities from several (more than 15) uncorrelated sets of simulations and then considering the dispersion of values obtained for the same contact.
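A minimal sketch of this error estimate is given below. It assumes that each set of simulations is available as a list of frames, each frame being the set of contacts formed in it, and it takes the sample standard deviation across sets as the measure of dispersion; the text does not specify which measure was used, so that choice, like the function names and the toy data, is an assumption.

```python
from statistics import fmean, stdev


def contact_probability(frames, contact):
    """Fraction of frames in which the given contact is formed."""
    return fmean([1.0 if contact in frame else 0.0 for frame in frames])


def probability_with_error(simulation_sets, contact):
    """Mean contact probability and its dispersion across independent sets.

    simulation_sets -- list of sets of simulations; each one is a list of
                       frames, and each frame is a set of formed contacts
    """
    per_set = [contact_probability(frames, contact) for frames in simulation_sets]
    return fmean(per_set), stdev(per_set)


# Three toy sets of four frames each (the source uses more than 15 sets).
c = (3, 27)
simulation_sets = [
    [{c}, {c}, set(), {c}],
    [{c}, set(), set(), {c}],
    [{c}, {c}, {c}, {c}],
]
mean_p, err_p = probability_with_error(simulation_sets, c)
```

The per-set probabilities are 0.75, 0.5 and 1.0, giving a mean of 0.75 with a dispersion of 0.25.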

References

1. Ueda, Y., Taketomi, H. & Go, N. (1975). Studies on protein folding, unfolding and fluctuations by computer simulation. I. The effects of specific amino acid sequence represented by specific inter-unit interactions. Int. J. Pept. Res. 7, 445-459.

2. Ueda, Y., Taketomi, H. & Go, N. (1978). Studies on protein folding, unfolding, and fluctuations by computer simulations. II. A three-dimensional lattice model of lysozyme. Biopolymers, 17, 1531-1548.

3. Clementi, C., Nymeyer, H. & Onuchic, J. N. (2000). Topological and energetic factors: what determines the structural details of the transition state ensemble and "en-route" intermediates for protein folding? An investigation for small globular proteins. J. Mol. Biol. 298, 937-953.

4. Pearlman, D. A., Case, D. A., Caldwell, J. W., Ross, W. S., Cheatham, T. E., III, DeBolt, S. et al. (1995). AMBER, a package of computer programs for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to simulate the structural and energetic properties of molecules. Comput. Phys. Commun. 91, 1-41.

5. Berendsen, H. J. C., Postma, J. P. M., van Gunsteren, W. F., DiNola, A. & Haak, J. R. (1984). Molecular dynamics with coupling to an external bath. J. Chem. Phys. 81, 3684-3690.

6. Sobolev, V., Wade, R., Vriend, G. & Edelman, M. (1996). Molecular docking using surface complementarity. Proteins: Struct. Funct. Genet. 25, 120-129.

7. Onuchic, J. N., Nymeyer, H., García, A. E., Chahine, J. & Socci, N. D. (2000). The energy landscape theory of protein folding: insights into folding mechanisms and scenarios. Advan. Protein Chem. 53, 87-152.

8. Ferrenberg, A. M. & Swendsen, R. H. (1988). New Monte Carlo technique for studying phase transitions. Phys. Rev. Letters, 61, 2635-2638.

9. Ferrenberg, A. M. & Swendsen, R. H. (1989). Optimized Monte Carlo data analysis. Phys. Rev. Letters, 63, 1195-1198.

10. Swendsen, R. H. (1993). Modern methods of analyzing Monte Carlo computer simulations. Physica A, 194, 53-62.

11. Onuchic, J. N., Socci, N. D., Luthey-Schulten, Z. A. & Wolynes, P. G. (1996). Protein folding funnels: the nature of the transition state ensemble. Folding Des. 1, 441-450.

12. Shea, J. E., Onuchic, J. N. & Brooks, C. L., III (1999). Exploring the origins of topological frustration: design of a minimally frustrated model of fragment B of protein A. Proc. Natl Acad. Sci. USA, 96, 12512-12517.

13. Clementi, C., Jennings, P. A. & Onuchic, J. N. (2000). How native state topology affects the folding of dihydrofolate reductase and interleukin-1β. Proc. Natl Acad. Sci. USA, 97, 5871-5876.

14. Nymeyer, H., Socci, N. D. & Onuchic, J. N. (2000). Landscape approaches for determining the ensemble of folding transition states: success and failure hinge on the degree of frustration. Proc. Natl Acad. Sci. USA, 97, 634-639.

Appendix B

Is Q an Appropriate Reaction Coordinate?

The comparison between the simulation and the experiments presented here (as well as in 1,2 ) is based upon the assumption that we are able to identify the transition state ensemble through a single reaction coordinate, Q, defined as the fraction of native contacts formed.

This choice for the reaction coordinate, on which the identification of the TSE depends, has the indisputable advantage that Q is computationally straightforward. The theoretical justification for the use of Q as a single reaction coordinate for the folding process relies on the fact that proteins with a reduced level of energetic frustration (as this model is by construction) have a funnel-like energy landscape with a solvent-averaged potential depth strongly correlated with the degree of nativeness. This argument is not sufficient to ensure that there is a perfect correspondence between the TSE thus obtained from simulations and the TSE experimentally described through the Φ-value analysis.

This is a source of significant debate, and alternative definitions of the reaction coordinate have been discussed in the literature (see for instance 3 ), as well as different methods for the identification of a set of transition state structures.

The transition state ensemble is defined as those structures having the same probability of folding to the native state, or unfolding back to the denatured state, at the transition temperature. The ideal exact reaction coordinate would be the probability for a configuration to fold to the native state, which we indicate in the following as P_native.

This choice of the reaction coordinate also affects the robustness of the definition of the transition temperature itself.

In principle, all the structures collected during a simulation should be checked for their probability to lead to the native/denatured state. This is computationally too intensive. For instance, the typical number of configurations gathered during one of the runs performed for this analysis is of the order of 10^5. In order to infer the free energy profile reliably, by using the WHAM algorithm with the choice of Q as the reaction coordinate, about 30 runs (sampling different temperatures) are necessary.

Each individual configuration could, in principle, be labeled with a value of P_native, by means of a set of ancillary simulations starting from it. Clearly this task is overwhelmingly demanding.

The same procedure could be applied, however, to validate a posteriori the set of configurations selected as TSE on the basis of the thermodynamic analysis performed by using Q as the reaction coordinate.

We discuss the results of the investigation of this question, carried out by checking the TSE configurations gathered from the simulations and the analysis presented here. As an example, we focus on the results obtained for perm 40-41 CI2. The analyses performed for all the other proteins considered here yield the same conclusions.

According to our approach, a configuration is considered in the TSE if the corresponding Q value lies in a range bracketing the top of the barrier in the free energy profile.

Following this criterion, four types of configurations are gathered during a simulation "experiment" as contributing to the putative TSE (see Figure A1): (i) those visited transiently during folding (U → TS → F), (ii) those visited transiently during unfolding (F → TS → U), (iii) configurations that seem to be just fluctuations from the folded state (F → TS → F), and (iv) configurations that seem to be just fluctuations from the unfolded state (U → TS → U).

Figure A1. Example of a portion of a typical simulation run for perm 40-41 CI2. The range of Q defining the TSE is highlighted. The four different types of configurations (see the text) contributing to the TSE are marked in different colors.

The first question to answer in order to validate the putative TSE is whether the average TSE structure is similar across these four subsets. Clearly, if the four types of configurations had significantly different average values of P_native, the TSE average structure would differ when computed for the four different subsets.

We show in Figure A2 that the average structure of the TSE is essentially the same for the four categories of configurations. For instance, configurations of the type (U → U) are not distinguishable from those that we collect on the U → F path.

Figure A2. Correlation between the probabilities of contact formation computed separately for different subsets of TSE configurations for perm 40-41 CI2: (i) U → TS → F, (ii) F → TS → U, (iii) U → TS → U, (iv) F → TS → F. The three sets of points show the correlation of (ii), (iii) and (iv) versus (i). The correlation coefficients are shown in the inset.

A further step in the validation of our choice of TSE, based upon Q, is the explicit calculation of P_native for a representative set of TSE configurations. Since we have shown that the structural information contained in the four different subsets is equivalent, we performed this test only on the U → TS → U configurations. For each of the 153 configurations of the U → TS → U ensemble we ran a series of 100 short simulations† and recorded the value of Q at their final step. These simulations determine the propensity to go from the TSE to the native or unfolded ensembles. Initial velocities are drawn at random from a Boltzmann distribution at the temperature of the simulation. The simulations are repeated for three different temperatures: 0.96ε, 0.97ε, 0.98ε. The TSE analysis presented here for this protein is done for T = 0.96ε, i.e. the folding temperature estimated from the peak in the specific heat curve.

Figure A3. Distribution of the values of Q obtained at the end of short simulations starting from each of the TSE configurations of type U → TS → U, for three different temperatures around the putative T_f. The integral probabilities for Q < 0.4 (Q > 0.6) are 0.41 (0.53), 0.44 (0.48) and 0.51 (0.40) for T = 0.96ε, 0.97ε, 0.98ε, respectively.

The results are summarized in Figure A3, which shows a well balanced distribution of folded versus unfolded affinities.‡ The integral probabilities for Q < 0.4, i.e. the unfolded ensemble (Q > 0.6, i.e. the folded ensemble), are 0.41 (0.53), 0.44 (0.48) and 0.51 (0.40) for T = 0.96ε, 0.97ε, 0.98ε, respectively.
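The integral probabilities quoted above can be obtained from the final Q values of the short runs with a calculation of the following kind. This Python sketch uses the thresholds stated in the text (below 0.4 for the unfolded ensemble, above 0.6 for the folded ensemble); the function name `commitment_fractions` and the list of final Q values are hypothetical and do not reproduce the simulation data.

```python
def commitment_fractions(final_q, q_unfolded=0.4, q_folded=0.6):
    """Fractions of short runs that end in the unfolded and folded ensembles.

    final_q    -- Q values recorded at the last step of each short run
    q_unfolded -- runs ending with Q below this value count as unfolded
    q_folded   -- runs ending with Q above this value count as folded
    Runs ending between the two thresholds are counted in neither group.
    """
    if not final_q:
        raise ValueError("no runs provided")
    n = len(final_q)
    unfolded = sum(1 for q in final_q if q < q_unfolded) / n
    folded = sum(1 for q in final_q if q > q_folded) / n
    return unfolded, folded


# Hypothetical final Q values of ten short runs started in the TSE.
final_q = [0.20, 0.30, 0.35, 0.10, 0.45, 0.50, 0.65, 0.70, 0.80, 0.90]
p_unfolded, p_folded = commitment_fractions(final_q)
```

In this toy example four runs end in each ensemble and two remain in the transition region, which is the kind of balanced outcome expected for configurations with a folding probability close to one half.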

The conclusion of this analysis is that the TSE selected by relying on Q as reaction coordinate around the folding temperature provides a reliable sample of configurations with P_native ≈ 0.5. The results presented here are therefore meaningfully comparable with the TSE structures experimentally inferred by means of Φ-value analysis.

† We selected a time interval of 100 steps. From Figure A1, it can be noticed that 100 steps is of the order of the typical crossing time through the transition region, but it is much shorter than the folding time.

‡ The folding affinities for the single configurations are in qualitative agreement with the average result presented here.

References

1. Clementi, C., Nymeyer, H. & Onuchic, J. N. (2000). Topological and energetic factors: what determines the structural details of the transition state ensemble and "en-route" intermediates for protein folding? An investigation for small globular proteins. J. Mol. Biol. 298, 937-953.

2. Clementi, C., Jennings, P. A. & Onuchic, J. N. (2000). How native state topology affects the folding of dihydrofolate reductase and interleukin-1β. Proc. Natl Acad. Sci. USA, 97, 5871-5876.

3. Du, R., Pande, V., Grosberg, A., Tanaka, T. & Shakhnovich, E. (1999). On the role of conformational geometry in protein folding. J. Chem. Phys. 111, 10375-10380.

Edited by C. R. Matthews

(Received 9 February 2001; received in revised form 8 June 2001; accepted 21 June 2001)