Theoretical and Computational Protein Design

Ilan Samish,∗ Christopher M. MacDermaid,∗ Jose Manuel Perez-Aguilar, and Jeffery G. Saven

Department of Chemistry, University of Pennsylvania, Philadelphia, Pennsylvania 19104; email: [email protected]

Annu. Rev. Phys. Chem. 2011. 62:129–49

First published online as a Review in Advance on December 3, 2010

The Annual Review of Physical Chemistry is online at physchem.annualreviews.org

This article’s doi: 10.1146/annurev-physchem-032210-103509

Copyright © 2011 by Annual Reviews. All rights reserved

0066-426X/11/0505-0129$20.00

∗These authors contributed equally to this work.

Keywords

de novo protein design, energy landscape theory, negative design, probabilistic protein design, artificial enzymes, nonbiological cofactors, membrane proteins

Abstract

From exponentially large numbers of possible sequences, protein design seeks to identify the properties of those that fold to predetermined structures and have targeted structural and functional properties. The interactions that confer structure and function involve intermolecular forces and large numbers of interacting amino acids. As a result, the identification of sequences can be subtle and complex. Sophisticated methods for characterizing sequences consistent with a particular structure have been developed, assisting the design of novel proteins. Developments in such computational protein design are discussed, along with recent accomplishments, ranging from the redesign of existing proteins to the design of new functionalities and nonbiological applications.




De novo protein design: design of proteins from first principles, often involving the predetermination of structure, sequence, and function

INTRODUCTION

The function of a folded protein is largely dictated by its structure. Consequently, understanding how structure and function are encoded in the sequence is a long-standing goal critical for understanding many life processes and for interpreting genomic information. The primary sequence of a protein is determined easily, whereas predicting structure from sequence is difficult; it is a rapidly developing field (1) aided by efforts in structural genomics (2, 3). Insight into protein folding and function has come with advances in theoretical and experimental methods that address structure prediction, protein folding, and protein design. In particular, the design and redesign of proteins open novel means to explore natural proteins and to engineer new molecular systems having functions that need not be observed in nature.

Protein design is motivated by the desire to study, understand, and exploit the versatile structures and functions achievable with proteins. Nature leverages the physicochemical properties of the amino acids to arrive at highly functional sequences that fold spontaneously, in which structural and functional properties are fine-tuned during the course of evolution. Proteins comprise tens to thousands of amino acids, and backbone and side-chain degrees of freedom result in an immense number of possible configurations for a single sequence. Many of the apparent paradoxes involving folding dynamics and stability associated with folding are addressed using (a) models in which a limited set of ensembles of conformations provides steps on a path to the folded state or, more generally, (b) the energy landscape theory of folding, which quantifies the features of a protein’s configurational energy surface (4, 5). Central to the energy landscape theory is the idea of sufficiently minimal frustration in natural proteins, in which the native state is sufficiently low in energy that the protein avoids becoming trapped in local misfolded energy minima. In the de novo design of sequences, the complexity is compounded by the exponentially large numbers of possible sequences: $20^N$ for an N-residue protein using only the 20 naturally occurring amino acids. Rather than using evolution, protein design achieves sufficiently minimal frustration via the careful identification of sequences and their properties.

Why design proteins when nature already provides a wealth of structures and functions? Designing proteins critically tests and advances our understanding of protein stability and folding (6). In addition, designing proteins provides efficient access to large (nanoscale), well-defined molecular structures, as arbitrary protein sequences can be straightforward to create and folding yields the three-dimensional structure. Protein design can provide possible routes to novel catalytic, pharmaceutical, structural, and sensing properties. Realizing such designed proteins, however, requires the means to successfully identify appropriate sequences that fold and are functional.

De novo protein design faces a variety of challenges. Early design efforts impressively yielded new proteins, but many were structurally and thermodynamically less well defined than natural proteins (7–12). The difficulty results in part from the subtle physical and chemical interactions that stabilize folded structures. In addition, the exponentially large number of possible sequences (e.g., more than $20^{100} \sim 10^{130}$ sequences for a 100-residue protein) impedes direct rational design. Besides being energetically consistent with its folded structure, a protein sequence also must be incompatible with alternative, competing structures. Computational methods for protein design are intended to surmount some of these difficulties and address the large number of concerted interactions within a targeted structure that confer folding and desired functional properties.
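As a quick check of the magnitude quoted above, the size of sequence space for a 100-residue protein can be worked out with base-10 logarithms. This is a worked calculation added for illustration, not a result from the original text.

```latex
\log_{10}\!\left(20^{100}\right) = 100 \,\log_{10} 20 \approx 100 \times 1.301 = 130.1,
\qquad\text{so}\qquad
20^{100} \approx 1.3 \times 10^{130}.
```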

Herein, computational methods for designing and engineering proteins are surveyed. Theoretical and computational methods are the focus, particularly when used to predetermine the sequence properties at large numbers of variable residue positions. The review is not exhaustive and highlights computational aspects of protein design, particularly for cases in which designed sequences have been realized and characterized experimentally.


NMR: nuclear magnetic resonance spectroscopy

ELEMENTS OF COMPUTATIONAL APPROACHES TO PROTEIN DESIGN

Computational methods have been developed to explore and characterize sequence space and guide protein design with atomic resolution. Common inputs, considerations, and techniques for computational protein design (hereafter referred to as design) are discussed below.

Target Structure

The target protein structure is represented most often as a set of coordinates for the backbone atoms, which may be obtained from an existing high-resolution crystal structure or carefully selected from a nuclear magnetic resonance (NMR)-derived structure (13). Novel structures can be modeled de novo via assembly of protein fragments (14–16) or secondary-structure elements (17–19). Often these backbone coordinates are constrained during the design calculations. Although fixing the backbone greatly reduces the computational complexity, it also precludes main-chain adjustments to accommodate sequence variation. Backbone flexibility can be introduced by including alternative, closely related conformations (20–27). An existing structure can yield designed functionalities beyond a protein’s natural function. For example, the TIM-barrel topology is one of the most common structures and is found in 21 unrelated protein superfamilies (28). Recently novel functionalities were introduced successfully into this fold (Figure 1a,b) (29).

Figure 1. Novel designed structures and enzymes. (a) A novel retroaldol enzyme designed within a TIM-barrel template (29). Side chains of designed residues are rendered with catalytic residues in yellow. (b) Designed Kemp elimination enzyme with TIM-barrel template, with measured rate enhancements of up to $10^5$ and multiple turnovers (118). Side chains of active site residues are depicted. (c) Novel βαβ protein, including a β-sheet (pink) forming a tight core with the helix (cyan) (120). (d) The redesign of a procarboxypeptidase, yielding a highly stable and fast-folding antiparallel dimer. One monomer depicts the designed positions (cyan) and the positions that were not designed (magenta) (121). (e) The design of a novel α/β protein structure, TOP7 (119). Two pseudosymmetric halves are highlighted by light and dark shading.


Rotamer: conformation of amino acid side chains generated by variation of dihedral angles; often discrete sets of rotamers are identified from known protein structures and/or molecular modeling

Residue Degrees of Freedom

The degrees of freedom per residue are the chemical identities of the amino acids and their conformations. Including side-chain conformations expands the combinatorial complexity of design, but recovers the complementary intermolecular interactions observed in natural proteins.

Amino acids. The amino acid degrees of freedom refer to the chemical variability of the sequence and to the number of different amino acids allowed at each variable position. Although using all 20 natural monomers allows the entire sequence space to be searched, studies have suggested a variety of ways to reduce the amino acid degrees of freedom. Hydrophobic patterning can dramatically reduce the sequence search space (30). Monomers other than the naturally occurring amino acids also may be considered (31) for use in enhancing function (32) and probing protein structure (33).

Rotamer libraries. The side-chain conformations usually are inferred from a structural database and typically are consistent with energy minima in a molecular potential (34). Rotamer states correspond to frequently observed values of the dihedral angles in the side chain of each amino acid. The simplest amino acids (alanine and glycine) are considered to have just one rotamer state, whereas larger amino acids may have as many as 80–100 different rotamer states (or more). The use of a rotamer library significantly reduces the computational complexity by discretizing the state space. A variety of rotamer libraries are available for protein modeling and protein design, including backbone-independent, secondary-structure-dependent, and backbone-dependent libraries (34–43). Early studies utilized as few as 84 rotamers to represent all side-chain conformations (35). More recently, large libraries have been constructed with as many as 50,000 rotamers (44). Rotamer libraries can be augmented by allowing for variation in bond lengths and angles as observed in crystal structures, yielding conformer libraries (45, 46). Conformer libraries appear to be superior to rotamer libraries in the computational design of small-molecule placement in enzymes (47). Rotamer libraries can be expanded beyond the 20 natural amino acids. Post-translational modifications, water, ligands, and cofactors can be modeled using effective rotamers; e.g., structurally well-defined water molecules can be incorporated as a solvated rotamer library (31), enhancing the modeling of protein-protein interactions and water within protein interiors. Rotamers also can be constructed for nonbiological monomers, e.g., β-amino acids (48).
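To make the combinatorial growth concrete, the sketch below counts the discrete (amino acid, rotamer) states for a handful of variable positions. The rotamer counts in `ROTAMER_COUNTS` are illustrative placeholders chosen for this example, not values from any published library, and the function names are hypothetical.

```python
from math import prod, log10

# Hypothetical rotamer counts per amino acid (illustrative only; real
# libraries differ). Glycine and alanine have one state, as in the text.
ROTAMER_COUNTS = {"G": 1, "A": 1, "V": 3, "L": 9, "F": 6, "K": 81}


def states_per_position(allowed):
    """Number of (amino acid, rotamer) states at one variable position."""
    return sum(ROTAMER_COUNTS[aa] for aa in allowed)


def total_states(allowed_per_position):
    """Size of the discrete search space: product over variable positions."""
    return prod(states_per_position(allowed) for allowed in allowed_per_position)


# Ten variable positions, each allowing all six amino acids listed above.
n_states = total_states(["GAVLFK"] * 10)
log10_states = log10(n_states)
```

Even with this small alphabet, ten positions already give a state space far beyond exhaustive enumeration, which is why the search methods discussed later matter.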

Energy Functions

Physicochemical potentials and energy functions are used to quantify sequence-structure compatibility (49, 50). Potential functions currently in use for protein design are similar to those developed for molecular simulations (51–54). Such potentials can be parameterized to yield naturally occurring sequences as local minima (55). Simplified, coarse-grain energy functions also can be derived from natural protein structures (56).

If the state of a sequence can be denoted by $(\alpha_1, r(\alpha_1); \alpha_2, r(\alpha_2); \ldots; \alpha_N, r(\alpha_N))$, where $\alpha_i$ and $r(\alpha_i)$ represent the amino acid and the rotamer of a residue, then the energy of a particular set of amino acids and rotamer states is computed by summing over one- and two-residue interactions:

$$E(\alpha_1, r(\alpha_1); \alpha_2, r(\alpha_2); \ldots; \alpha_N, r(\alpha_N)) = \sum_{i=1}^{N} \varepsilon_i(\alpha_i, r(\alpha_i)) + \sum_{i=1}^{N} \sum_{j>i}^{N} \gamma_{ij}(\alpha_i, r(\alpha_i); \alpha_j, r(\alpha_j)). \tag{1}$$

Negative design: incorporates information about competing structures and configurations that do not have the targeted structure and/or functional properties in the design process

MC: Monte Carlo methods

The one-body energy $\varepsilon$ is calculated by summing over the interatomic interactions between the side-chain atoms of amino acid $\alpha_i$ when in rotamer state $r(\alpha_i)$ with the backbone and other fixed moieties, e.g., a metal ion cofactor. Similarly, the two-body term $\gamma$, representing the rotamer-rotamer interaction, is calculated as the sum of all interatomic energies between the atoms in the side chains of amino acids $\alpha_i$ and $\alpha_j$ given that their rotamer states are $r(\alpha_i)$ and $r(\alpha_j)$.
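The pairwise decomposition of Equation 1 can be sketched as a table lookup, assuming the one-body and two-body terms have been precomputed. The function name, the table layout, and the numbers below are hypothetical choices made for this illustration; each state index stands for one (amino acid, rotamer) pair at a position.

```python
def total_energy(state, one_body, two_body):
    """Evaluate Equation 1 for one sequence-rotamer assignment.

    state    : list of state indices, one per position.
    one_body : one_body[i][s] is the one-body energy of state s at position i.
    two_body : two_body[(i, j)][s][t] is the pair energy for states s and t
               at positions i < j.
    """
    n = len(state)
    # Sum of one-body terms over all positions.
    energy = sum(one_body[i][state[i]] for i in range(n))
    # Sum of two-body terms over all pairs with j > i.
    for i in range(n):
        for j in range(i + 1, n):
            energy += two_body[(i, j)][state[i]][state[j]]
    return energy


# Toy tables with made-up numbers: three positions, two states each.
one_body = [[0.0, 1.0], [0.5, -0.5], [0.2, 0.0]]
two_body = {
    (0, 1): [[0.0, -1.0], [2.0, 0.0]],
    (0, 2): [[0.0, 0.0], [0.0, -0.5]],
    (1, 2): [[1.0, 0.0], [0.0, -2.0]],
}
example_energy = total_energy([0, 1, 1], one_body, two_body)
```

Precomputing the tables once is what makes repeated evaluation cheap during a search: changing one position only changes one one-body term and the pair terms involving that position.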

Solvent can be treated explicitly (31) or, more commonly, implicitly via pairwise hydrophobic interactions, variable dielectric constant (57), or propensities for solvent exposure (27). Explicit solvent models can become impractical for protein design, so solvation free energies are often accounted for by using solvent-accessible surface areas (58, 59). Such implicit models of solvation can also include generalized Born terms, in which solvent is treated as a polarizable dielectric (60–63). An environmental potential energy (64) based on the Cβ density in the vicinity of each side chain has been parameterized from a set of soluble proteins and was shown to correlate well with commonly used hydrophobicity scales (27).

Foldability Criteria and Negative Design

As it folds, a protein avoids populating alternative structures that are significantly different from the target. The energy landscape is appropriately funneled toward the native state (4, 5). In protein design, the inclusion of this bias against misfolded structures is referred to as negative design (65). A variety of foldability criteria have been presented whose optimization during the course of design yields such funneled energy landscapes (66). The simplest criterion suggests that a well-folded protein has an energy gap separating the target structure energetically from possible competing structures (4, 66–68). Focusing only on the target structure and decreasing its energy by varying sequence are not always correlated with improved foldability. This can be particularly problematic for reduced representations of the amino acids in which negative design can be crucial (69), or for cases in which competing structures that are energetically degenerate are readily accessible, e.g., protein-protein interactions and helical oligomers (70). For simplified models of proteins, foldability criteria that include information about competing structures can guide sequence design explicitly (56).
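The energy-gap criterion can be illustrated with a minimal sketch, assuming that energies of the target and of a set of competing structures are already available for each candidate sequence. The function names and all numbers are hypothetical; the example is constructed so that the sequence with the lowest target energy is not the one with the largest gap.

```python
def energy_gap(target_energy, competing_energies):
    """Gap between the lowest-energy competing structure and the target.

    Positive values mean the target lies below every listed competitor.
    """
    return min(competing_energies) - target_energy


def pick_by_gap(candidates):
    """Return the label of the candidate sequence with the largest gap.

    candidates maps a sequence label to (target_energy, competing_energies).
    """
    return max(candidates, key=lambda name: energy_gap(*candidates[name]))


# Made-up energies: seq_a has the lower target energy but a smaller gap.
candidates = {
    "seq_a": (-12.0, [-11.5, -9.0]),
    "seq_b": (-10.0, [-6.0, -5.0]),
}
best = pick_by_gap(candidates)
```

In practice the set of competing structures is rarely known in full, which is one reason explicit negative design is harder for atomically detailed models than for simplified ones.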

For atomically detailed representations of proteins, however, the reduction of the energy (or other scoring functions) of the target structure is central to most design methods and yields complementary interactions among residues, consistent with what is observed in naturally occurring structures. Sequences yielding such tightly packed structures are likely specific to a particular backbone structure. Similar interior packing and inter-residue complementarity are unlikely to be observed in alternative conformations of the backbone, at least for many tertiary structures. Explicit representation of the side chains increases the effective number of monomer types as each amino acid possesses a set of conformational states. As the effective number of monomers increases, encoding a particular tertiary structure is facilitated. In the absence of explicit negative design, the unfolded state still needs to be considered. This is often addressed by the use of reference energies, which quantify the typical energy (or free energy) of each amino acid in the unfolded state. Such energies can be determined empirically so as to reproduce known amino acid frequencies or can be calculated from modeling small peptides (55, 71, 72).

Sequence Search and Characterization

DEE: dead-end elimination

There are a variety of methods for identifying sequences subject to a particular energy or objective function (73). Most methods rely on optimization procedures to identify candidate sequences, whereas others focus on characterizing the sequence space, often probabilistically. Optimization methods can employ stochastic or deterministic approaches. Stochastic algorithms include the Monte Carlo (MC) methods (simulated annealing) (55, 74–76), graph search algorithms such as A* (77), and genetic algorithms (78). The deterministic optimization methods systematically identify global optima (or estimates of optima), e.g., elimination methods [dead-end elimination (DEE)] (79), self-consistent mean field (80), and graph decomposition and linear programming (81). Rather than specific sequences, some methods produce a site-specific probabilistic description of the sequence ensemble that is likely to be compatible with the target structure (27, 82). Here we discuss a few of the more commonly used techniques.

Monte Carlo. MC methods are widely used to stochastically sample states of complex systems in a manner consistent with a chosen probability distribution. The Boltzmann distribution is applied most often, which preferentially weights low-energy configurations. At each elementary step in a search, a partially random test sequence is generated, and its energy is computed using a physical potential. These elementary steps can modify both rotamer state and amino acid identity. An effective temperature determines the energy changes that are likely. In simulated annealing, gradually decreasing the effective temperature allows preferential sampling of lower-energy configurations (55, 74, 83). Convergence to a global minimum is not guaranteed, however, so multiple independent calculations are applied. Other extensions include MC with quenching (84) and biased MC. In biased MC, trial moves are biased to increase acceptance probabilities (85), in which the biasing derives from a probability that is predetermined or a function of the local energy surface, akin to configurational biasing (86). Related methods have been applied to the continuous exploration of side-chain conformations (87). Biased MC and self-consistent-field biased MC have provided efficient sampling and better estimates of the lowest energy sequence (83). Biased MC has been applied in an iterative manner, in which acceptable mutations were identified from a preliminary calculation and utilized to bias a second MC procedure having a much larger rotamer library (88). Replica exchange methods have been applied to protein design, allowing local barriers in the sequence landscape to be overcome and improving the sampling of sequences (75). During design, MC methods can address explicitly effects resulting from the loss of side-chain entropy upon folding, but studies involving large numbers of structures suggest this local entropy has a modest impact on determining amino acid identity (89).
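The elementary steps of simulated annealing described above can be sketched as follows. This is a minimal illustration, assuming a user-supplied energy function and a geometric cooling schedule; the function name, parameters, schedule, and toy energy are assumptions made here and are not taken from any of the cited methods.

```python
import math
import random


def simulated_annealing(energy_fn, n_positions, n_states, n_steps=5000,
                        t_start=2.0, t_end=0.05, seed=0):
    """Metropolis sampling with a geometric cooling schedule.

    energy_fn maps a list of state indices to an energy; each index stands
    for one (amino acid, rotamer) choice at one position.
    """
    rng = random.Random(seed)
    state = [rng.randrange(n_states) for _ in range(n_positions)]
    energy = energy_fn(state)
    best_state, best_energy = list(state), energy
    for step in range(n_steps):
        # Effective temperature decreases gradually from t_start to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(n_steps - 1, 1))
        # Elementary move: change the state at one randomly chosen position.
        pos = rng.randrange(n_positions)
        old = state[pos]
        state[pos] = rng.randrange(n_states)
        new_energy = energy_fn(state)
        delta = new_energy - energy
        # Metropolis criterion, consistent with a Boltzmann distribution.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            energy = new_energy
            if energy < best_energy:
                best_state, best_energy = list(state), energy
        else:
            state[pos] = old  # reject the move and restore the old state
    return best_state, best_energy


# Toy energy (an assumption for this demo): count mismatches with a pattern.
TARGET = [2, 0, 1, 3, 1, 0]


def toy_energy(state):
    return float(sum(s != t for s, t in zip(state, TARGET)))


best_state, best_energy = simulated_annealing(toy_energy, len(TARGET), 4)
```

Because convergence is not guaranteed for realistic energy functions, a run like this would normally be repeated with different seeds and the lowest-energy results compared.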

Dead-end elimination. A pruning method of identifying sequence-rotamer states that correspond to the global minimum energy conformation (79) has been adapted for side-chain modeling (90, 91). Functionally equivalent to an exhaustive search, the algorithm finds the minimum energy solution of an energy function comprising at most two-body interactions. The DEE theorem can be stated briefly as follows: For two rotamers $i_r$ and $i_t$ at position $i$, if the following inequality holds true,

$$E(i_r) + \sum_{j \neq i}^{N} \min_{s} E(i_r, j_s) > E(i_t) + \sum_{j \neq i}^{N} \max_{s} E(i_t, j_s), \tag{2}$$

where $E(i_r)$ is side chain–backbone energy, and $E(i_r, j_s)$ is side chain–side chain energy with other rotamers, then $i_r$ is incompatible with the global minimum and can be eliminated. The pruning criterion can be relaxed to expedite the elimination of suboptimal rotamers (92):

$$E(i_r) - E(i_t) + \sum_{j \neq i}^{N} \min_{s} \left[ E(i_r, j_s) - E(i_t, j_s) \right] > 0, \tag{3}$$

which is equivalent to eliminating rotamer $i_r$ if it has a higher energy than rotamer $i_t$ for all possible conformations at other sites. The process is repeated iteratively until no further amino acids or rotamer states may be eliminated, and a search among the remaining configurations identifies the global minimum.
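The relaxed criterion of Equation 3 can be sketched as an iterative pruning loop over precomputed energy tables. This is a minimal illustration under assumed data layouts; the function name, the nested-list tables, and the toy numbers are hypothetical.

```python
def goldstein_dee(one_body, two_body):
    """Iteratively prune states using the criterion of Equation 3.

    one_body[i][r]       : E(i_r), the side chain-backbone energy.
    two_body[i][j][r][s] : E(i_r, j_s), defined for all i != j.
    Returns, for each position, the sorted list of surviving state indices.
    """
    n = len(one_body)
    alive = [set(range(len(e))) for e in one_body]
    changed = True
    while changed:  # repeat until no further state can be eliminated
        changed = False
        for i in range(n):
            for r in list(alive[i]):
                for t in alive[i] - {r}:
                    # E(i_r) - E(i_t) + sum_j min_s [E(i_r,j_s) - E(i_t,j_s)]
                    bound = one_body[i][r] - one_body[i][t]
                    for j in range(n):
                        if j != i:
                            bound += min(
                                two_body[i][j][r][s] - two_body[i][j][t][s]
                                for s in alive[j]
                            )
                    if bound > 0:
                        alive[i].discard(r)  # i_r cannot be in the minimum
                        changed = True
                        break
    return [sorted(states) for states in alive]


# Toy tables with made-up numbers: two positions, two states each.
one_body = [[0.0, 5.0], [0.0, 0.0]]
pair01 = [[0.0, 1.0], [0.5, 0.2]]             # rows: position 0, columns: position 1
pair10 = [list(col) for col in zip(*pair01)]  # transpose for the (1, 0) ordering
two_body = [[None, pair01], [pair10, None]]
survivors = goldstein_dee(one_body, two_body)
```

In this toy case pruning leaves a single state per position, so no further search is needed; for realistic problems several states usually survive and the remaining combinations still have to be searched.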


Although applying DEE to protein design results in an exponential increase in computation time for large proteins (84), the method has been applied successfully (93, 94) and continues to be improved. Generalized DEE compares clusters of rotamers instead of individual rotamers, improving convergence and accommodating larger systems (94). Revised elimination and flagging criteria have been suggested (95), which include an MC calculation to eliminate high-energy rotamers. DEE developed for multistate and flexible structures includes extended DEE, which searches within discrete subspaces to yield gap-free lists of low-energy states (96). In tackling multistate design problems, i.e., including negative design, investigators recently suggested a type-dependent DEE (97), in which a rotamer may be eliminated based only on comparisons with other rotamers of the same amino acid. Flexible backbone DEE adds a constraining box around each residue, using upper and lower bounds of rotameric interactions (98). Despite continuing advances, DEE does not scale well with the size of the protein and is often applied to small proteins or limited numbers of amino acid residues.

Mean-field theory. An alternative to stochastic sampling is the mean-field approach (39, 80, 99–101). Rather than enumerating sequences, mean-field calculations estimate the relative weights of interactions that determine the local energy at a site (residue) and the probabilities of the amino acids. An effective temperature may be lowered so as to identify the properties of low-energy sequences. The sequence and structure of a protein of N residues may be described as the set $(\alpha_1, r(\alpha_1); \alpha_2, r(\alpha_2); \ldots; \alpha_N, r(\alpha_N))$, where $\alpha_i$ and $r(\alpha_i)$ denote the amino acid identity and side-chain conformation, respectively, at site $i$. The average local energy $\varepsilon_i(\alpha_i)$ at position $i$ may be written as

$$\varepsilon_i(\alpha_i, r(\alpha_i)) = \sum_{j,\alpha_j,r(\alpha_j)} w_j(\alpha_j, r(\alpha_j))\, \gamma_{ij}(\alpha_i, r(\alpha_i); \alpha_j, r(\alpha_j)) + \varepsilon^0_i(\alpha_i, r(\alpha_i)), \tag{4}$$

where $\gamma_{ij}(\alpha_i, r(\alpha_i); \alpha_j, r(\alpha_j))$ is the two-body interaction energy between side chains $\alpha_i$ and $\alpha_j$, and $\varepsilon^0_i(\alpha_i, r(\alpha_i))$ is the one-body energy that results from side chain–backbone interactions and/or the structural propensities of the amino acids. The $w_j(\alpha_j, r(\alpha_j))$ are the site-specific probabilities of the amino acids and their conformational states, and often have a Boltzmann form: $w_j(\alpha_j, r(\alpha_j)) \propto \exp(-\beta \varepsilon_j(\alpha_j, r(\alpha_j)))$, where $\beta$ is an effective inverse temperature. These equations are solved self-consistently for the local fields $\varepsilon_i(\alpha_i)$. The properties of low-energy sequences may be explored by decreasing the effective temperature parameter $\beta^{-1}$. Often the most probable amino acid at each variable position is selected in determining a candidate sequence. Mean-field methods avoid the explicit enumeration or sampling of conformations. With regard to characterizing global minima, mean-field algorithms have performed well in specifying hydrophobic core residues but less so for boundary and surface positions (84).
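The self-consistent solution of Equation 4 with Boltzmann weights can be sketched as a damped fixed-point iteration. This is a minimal illustration: the function name, the table layout, the damping factor `mix`, and the toy numbers are assumptions made here, and the pair sum is taken over sites $j \neq i$.

```python
import math


def mean_field(one_body, two_body, beta, n_iter=200, mix=0.5):
    """Iterate Equation 4 with Boltzmann weights until (approximately) stable.

    one_body[i][a]       : one-body energy of state a at site i.
    two_body[i][j][a][b] : pair energy of states a, b at sites i != j.
    Returns w[i][a], the site-specific state probabilities.
    """
    n = len(one_body)
    # Start from uniform probabilities at every site.
    w = [[1.0 / len(e)] * len(e) for e in one_body]
    for _ in range(n_iter):
        new_w = []
        for i in range(n):
            # Local field: one-body term plus probability-weighted pair terms.
            field = []
            for a in range(len(one_body[i])):
                eps = one_body[i][a]
                for j in range(n):
                    if j != i:
                        eps += sum(w[j][b] * two_body[i][j][a][b]
                                   for b in range(len(w[j])))
                field.append(eps)
            shift = min(field)  # subtract the minimum for numerical stability
            boltz = [math.exp(-beta * (f - shift)) for f in field]
            z = sum(boltz)
            new_w.append([x / z for x in boltz])
        # Damped update to help convergence (a common heuristic, assumed here).
        w = [[(1 - mix) * old + mix * new for old, new in zip(wi, nwi)]
             for wi, nwi in zip(w, new_w)]
    return w


# Toy tables with made-up numbers: two sites, two states each.
one_body = [[0.0, 5.0], [0.0, 0.0]]
pair01 = [[0.0, 1.0], [0.5, 0.2]]
pair10 = [list(col) for col in zip(*pair01)]
two_body = [[None, pair01], [pair10, None]]
w = mean_field(one_body, two_body, beta=5.0)
```

Raising `beta` (lowering the effective temperature) sharpens the probabilities toward the lowest-energy states, mirroring the annealing of the effective temperature described in the text.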

Probabilistic approach. Probabilistic approaches often are applied when complete information is not available for a system, as is certainly the case for protein modeling and design. Site-specific amino acid probabilities, rather than specific sequences, may be used in protein design (27, 71). Such a probabilistic approach is motivated in part by the approximations and uncertainties associated with identifying sequences consistent with a particular structure: e.g., energy functions are approximate; side-chain conformations are treated discretely; backbone atoms are fixed or highly constrained; and solvation is treated using simple models. Nonetheless, probabilistic methods leverage parameterizations of interatomic interactions and structural features to provide useful sequence information, which can guide design experiments and identify structurally important amino acids. The site-specific probabilities highlight residues that are likely to tolerate mutation without adversely affecting structure, information that is certainly useful in designing functional proteins. Lastly, the site-specific probabilities provided are a natural input to guide the construction of combinatorial libraries of proteins.

MC methods naturally provide a probabilistic sampling of sequences (75, 83), and in addition, an entropy-based formalism to determine amino acid probabilities for a given backbone structure has been developed (27, 82). The theory uses concepts from statistical thermodynamics to estimate directly the site-specific probabilities. The method addresses the whole space of available compositions and is not limited by computational enumeration and sampling. Large protein structures (more than 100 variable residues) can be accommodated easily. The effective entropy quantifies the variability of sequences consistent with the target structure. Entropy maximization is central to the methodology, as in statistical thermodynamics and information theory. The most probable set of site-specific probabilities is determined by optimizing an effective entropy function, subject to any imposed constraints. Constraints can specify both global considerations (e.g., the overall energy of the sequences) and local features (e.g., the allowed amino acids at particular sites). With the judicious application of such constraints, the properties of sequences consistent with a particular tertiary structure and other desired functional properties may be readily explored. With $w_i(\alpha_i, r(\alpha_i))$ denoting the amino acid and rotamer state probabilities at residue position $i$, the total sequence-conformational entropy $S_c$ is written as

$$S_c = -\sum_{i,\alpha_i,r(\alpha_i)} w_i(\alpha_i, r(\alpha_i)) \ln w_i(\alpha_i, r(\alpha_i)). \tag{5}$$

The sum extends over each sequence position $i$ and all available amino acids $\alpha$ and rotamer states at each position. Although writing the entropy $S_c$ in this manner implies a factorization approximation and suggests that the site-specific probabilities are independent, constraints on the sequences cause the probabilities to be coupled. The $w_i(\alpha_i, r(\alpha_i))$ are determined as those that maximize $S_c$ subject to constraints $f_m$, which are themselves functions of the $w_i(\alpha_i, r(\alpha_i))$. The functions $f_m$ may be used to specify a wide variety of properties on the sequences, including the overall energy of the structure, the patterning of residue properties, and effective energies that quantify solvation and/or secondary-structure propensities of the amino acids. Fluctuations of the constraint functions $f_m$ about their average values are assumed to be small. Different constraints enter in a dimensionless manner, obviating the difficulties associated with the relative weighting of the energetic terms, e.g., physically derived versus database-derived energy terms. The resulting set of coupled, nonlinear equations is solved iteratively. If the only constraints imposed are those involving the atomistic energy and the normalization of the $w_i(\alpha_i, r(\alpha_i))$, this methodology reduces to the mean-field methods discussed above. In guiding the identification of specific sequences, a low-energy consensus sequence may be identified as that comprising the most probable amino acid at each position. Alternatively, an iterative series of calculations may be performed in which increasing numbers of residue identities or constraints are specified until a unique sequence is identified. The calculated probabilities also may be used to guide an efficient MC-based method that uses predetermined amino acid probabilities to bias the generation of trial sequences at each step (83).
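To see why the entropy formalism reduces to the mean-field form, the following worked derivation maximizes $S_c$ subject only to normalization at each site and a fixed average energy. The multipliers $\beta$ and $\lambda_i$, the target value $E_0$, the shorthand $w_i \equiv w_i(\alpha_i, r(\alpha_i))$, and the restriction $j \neq i$ in the pair sum are notational assumptions introduced for this sketch; it also assumes $\gamma_{ij} = \gamma_{ji}$.

```latex
\begin{aligned}
\langle E \rangle &= \sum_{i,\alpha_i,r(\alpha_i)} w_i\, \varepsilon^0_i
  + \frac{1}{2} \sum_{i} \sum_{j \neq i}
    \sum_{\substack{\alpha_i, r(\alpha_i) \\ \alpha_j, r(\alpha_j)}}
    w_i\, w_j\, \gamma_{ij}, \\[4pt]
\mathcal{L} &= S_c - \beta \left( \langle E \rangle - E_0 \right)
  - \sum_i \lambda_i \Bigl( \sum_{\alpha_i, r(\alpha_i)} w_i - 1 \Bigr), \\[4pt]
\frac{\partial \mathcal{L}}{\partial w_i}
  &= -\ln w_i - 1
     - \beta \Bigl[ \varepsilon^0_i
       + \sum_{j \neq i} \sum_{\alpha_j, r(\alpha_j)} w_j\, \gamma_{ij} \Bigr]
     - \lambda_i = 0, \\[4pt]
w_i(\alpha_i, r(\alpha_i)) &=
  \frac{\exp\bigl(-\beta\, \varepsilon_i(\alpha_i, r(\alpha_i))\bigr)}
       {\sum_{\alpha_i', r(\alpha_i')} \exp\bigl(-\beta\, \varepsilon_i(\alpha_i', r(\alpha_i'))\bigr)}.
\end{aligned}
```

The bracketed term in the third line is the local field $\varepsilon_i(\alpha_i, r(\alpha_i))$ of Equation 4, so the stationary probabilities take the Boltzmann form used in the mean-field approach, with $\beta$ fixed by the energy constraint.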

EFFORTS IN THEORETICALLY GUIDED PROTEIN DESIGN

We highlight some recent achievements in computational protein design, focusing on proteins designed with the aid of computational methods and then realized experimentally. In some cases, high-resolution structures are available in addition to biochemical and biophysical characterization, providing detailed information on the accuracy of computational prediction. Computational protein design has advanced sufficiently to allow medium-size proteins to be designed efficiently, as well as to permit the complete design of novel proteins. Given the space limitations of this review, there are many interesting topics that are not included, e.g., designed proteins to study folding kinetics (102–104) and allostery (105). We sample studies primarily from 2003–2010 and refer the reader to earlier reviews of prior pioneering work (66, 106–109).

Toward Redox-Active Proteins and Enzymes

One of the most striking functions of proteins is their capacity for enzymatic catalysis, so the engineering of new catalytic activity (110–112) has been a subject of design efforts.

Thioredoxin was re-engineered (113) to perform the hydrolysis of PNPA (p-nitrophenyl acetate). The tetrahedral intermediate of histidine-PNPA was modeled as a series of rotamers, and the surrounding protein sequence was optimized for binding to this high-energy intermediate. A high-scoring sequence was realized experimentally, and the pH dependence of the reaction was consistent with histidine acting as the catalytic nucleophile in the designed protein. The designed mutant does not reach natural enzymatic efficiency, however, with only a tenfold enhancement in the turnover rate compared with a mixture of 4-methyl imidazole and catalytically inactive wild-type thioredoxin.

The structure and sequence of a 114-residue four-helix bundle (DFsc) with a dinuclear metal center have been designed computationally (Figure 2a) (71). The target template for the monomeric variant DFsc was created by designing a single protein chain that properly positioned the four helices. Active site residues were constrained, and computational methods were used to determine the identities of the remaining 88 residues. Despite the six ionizable residues within the interior, the designed apo protein folds in the absence of metal ions. The protein stoichiometrically binds two equivalents of Fe(III), Co(II), Mn(II), and Zn(II) and has increased thermal stability upon metal binding. The experimentally determined structure matches the template within 1.0-Å root-mean-square deviation (114). The resulting protein also exhibits catalytic activity with regard to known peroxidase substrates, and spectroscopic studies inform the redox mechanism of related di-iron proteins (115).

Figure 2. Designed proteins containing cofactors. (a) A designed dinuclear single-chain four-helix bundle showing the NMR structure (cyan) superimposed on the template (gray) used for design (71). (b) Model of a four-helix-bundle protein that binds four equivalent nonbiological Fe diphenyl porphyrins (19). (c) Model of an A2B2 heterotetramer that binds a nonbiological photoactive Zn porphyrin (128). (d) Model of a single-chain four-helix bundle that binds two nonbiological Fe diphenyl porphyrin cofactors (129).

Beta motifs frequently occur in natural metal-binding proteins. The structure and sequence of a novel beta protein have been designed with a metal binding site that mimics that of rubredoxin. The 40-residue protein exhibits beta structure in the presence and absence of metal ions, and binds Fe(II/III) to yield a redox-active site that supports more than 16 cycles of oxidation and reduction, even in an aerobic environment (116).

The structure, sequence, and function of a phenol oxidase have been designed de novo, resulting in a protein that catalyzes O2-dependent oxidation of 4-aminophenol to quinone monoimine (117). The resulting protein exhibits features of biological enzymes (saturation kinetics and an active site pocket) but does not have a rate or efficiency (kcat/Km = 1,500 M⁻¹ min⁻¹) associated with natural enzymes.

Enzymes that catalyze reactions for which there is no known natural enzyme recently have been designed computationally. Such efforts select and redesign appropriate protein scaffolds, often the common TIM-barrel fold, so as to bind and stabilize high-energy intermediate structures in retro-aldol reactions (29) and in Kemp eliminations (118) (Figure 1a,b). In each case, large fractions (32/72 and 8/59) of the proteins that were realized exhibited enzymatic activity. In the case of the Kemp elimination catalyst, directed in vitro evolution methods were used subsequently to identify proteins having a 200-fold increase in kcat/Km over the initial computational design. The enzymes exhibit impressive rate enhancements of several orders of magnitude, but their efficiencies (kcat/Km ∼ 1–10³ M⁻¹ s⁻¹) do not yet rival efficient natural enzymes (29, 118).
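Because the efficiencies quoted in this section use different time units, a short conversion may help in comparing them. The Python sketch below converts the phenol oxidase value of 1,500 M⁻¹ min⁻¹ to M⁻¹ s⁻¹ and evaluates the low-substrate rate law v ≈ (kcat/Km)[E][S], which holds only when [S] is much smaller than Km. The function names and the example concentrations are hypothetical and serve only as illustration.

```python
def per_minute_to_per_second(efficiency_per_min):
    """Convert kcat/Km from M^-1 min^-1 to M^-1 s^-1."""
    return efficiency_per_min / 60.0

def low_substrate_rate(efficiency_per_s, enzyme_molar, substrate_molar):
    """Initial rate in M s^-1; valid only when [S] << Km."""
    return efficiency_per_s * enzyme_molar * substrate_molar

# Value quoted in the text for the designed phenol oxidase.
phenol_oxidase = per_minute_to_per_second(1500.0)

# Hypothetical concentrations: 1 uM enzyme and 10 uM substrate.
rate = low_substrate_rate(phenol_oxidase, 1e-6, 1e-5)

# Compare with the 1 to 10^3 M^-1 s^-1 range quoted for the designed enzymes.
in_reported_range = 1.0 <= phenol_oxidase <= 1.0e3
```

On this common scale the phenol oxidase value is 25 M⁻¹ s⁻¹, which falls within the range quoted for the computationally designed retro-aldol and Kemp elimination catalysts.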

De Novo Design and Redesign

Using a coarse-grained protein model, investigators have included information about unfolded structures (negative design) in a stochastic search for a sequence with a funneled conformational energy landscape (56). A combination of MC sequence search and repeated folding simulations ensures that the targeted native state is well separated from other sampled structures. The authors designed a three-helix-bundle topology and selected several of the designed sequences for synthesis, one of which had spectroscopic (circular dichroism and NMR) data consistent with a well-defined target structure. The study elegantly synthesizes energy landscape ideas with protein design methods.
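The idea of combining a sequence search with negative design can be sketched with a toy model. The Python example below is an illustration only and is not the method of Reference 56: it replaces the coarse-grained protein model and the folding simulations with a two-letter (H/P) alphabet, a fixed hypothetical target contact map, and a small fixed set of hypothetical decoy contact maps. A Metropolis MC search over point mutations minimizes the gap between the energy on the target and the lowest energy on any decoy, so that sequences are rewarded for destabilizing competing structures as well as for stabilizing the target.

```python
import math
import random

# Hypothetical contact maps (pairs of sequence positions) for a 6-residue chain.
TARGET = [(0, 3), (1, 4), (2, 5)]
DECOYS = [
    [(0, 1), (2, 3), (4, 5)],
    [(0, 5), (1, 2), (3, 4)],
    [(0, 2), (1, 3), (4, 5)],
]

def energy(seq, contacts):
    """Toy energy: -1 for every contact between two H residues."""
    return -sum(1 for i, j in contacts if seq[i] == "H" and seq[j] == "H")

def gap(seq):
    """Target energy minus the best decoy energy; more negative is better."""
    return energy(seq, TARGET) - min(energy(seq, d) for d in DECOYS)

def design(length=6, steps=2000, temperature=0.3, seed=0):
    """Metropolis search over point mutations that minimizes the energy gap."""
    rng = random.Random(seed)
    seq = [rng.choice("HP") for _ in range(length)]
    current = gap(seq)
    best_seq, best_gap = list(seq), current
    for _ in range(steps):
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = "P" if old == "H" else "H"  # trial point mutation
        trial = gap(seq)
        accept = trial <= current or rng.random() < math.exp(-(trial - current) / temperature)
        if accept:
            current = trial
            if current < best_gap:
                best_seq, best_gap = list(seq), current
        else:
            seq[pos] = old  # reject: restore the previous residue
    return "".join(best_seq), best_gap

best_sequence, best_gap = design()
```

Scoring the gap rather than the target energy alone is the design choice that matters here: an all-H sequence has the lowest target energy in this toy model but is equally stable on every decoy, so it gains nothing in specificity.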

A 97-residue α/β protein, Top7, with a novel fold was designed computationally (Figure 1e) (119). The design protocol consisted of cycling between sequence design and backbone optimization. Starting from a rough two-dimensional representation of the structure, peptide fragments were assembled to yield three-dimensional models consistent with the desired topology. A sequence was designed computationally, and subsequently the backbone structure was minimized. For each of five initial structures, 15 cycles of sequence design and backbone optimization were used to obtain low-energy sequence-structure pairs. The crystal structure of one of the designed proteins, Top7, adopts the designed topology with 1.17-Å root-mean-square deviation over all backbone atoms.


A novel, stand-alone βαβ motif has been designed computationally (Figure 1c) (120). The 36-residue protein possesses an interesting parallel beta sheet. The protein topology was modeled, and the sequence was designed computationally. The use of negative design and Trp interactions borrowed from trp-zip peptides ultimately yielded a protein of well-defined structure. The protein has exceptional thermostability (Tm = 90°C), and the solution NMR structure is in excellent agreement with the targeted topology.

Protein Re-Engineering

Naturally occurring proteins continue to be redesigned extensively using computational methods. The activation domain of human procarboxypeptidase A2 has been redesigned computationally, resulting in a protein with 68% of the wild-type sequence mutated and having a 10 kcal mol⁻¹ increase in stability (Figure 1d). A high-resolution crystal structure and solution NMR structures are effectively superimposable with the targeted tertiary structure (121). The entire sequence of a 51-residue engrailed homeodomain has been redesigned computationally (122). Two sequences selected for experimental characterization had ∼23% sequence identity with the wild type, but both were much more stable than the natural protein, having thermal denaturation temperatures greater than 99°C. The NMR solution structure closely matches that of the natural design template.

New binding sites for trinitrotoluene, L-lactate, and serotonin have been engineered computationally in five proteins from the Escherichia coli periplasmic binding protein superfamily (123). The target ligands have little similarity with the natural ligands and exhibit a wide range of chemical properties in terms of shape, chirality, functional groups, flexibility, charge, and solubility. Satisfying all potential hydrogen-bond donors and acceptors in the ligand was critical. Experimental studies reveal that all six designed receptors for trinitrotoluene distinguish the absence of a single nitro group, and all 10 lactate designs exhibited chiral stereospecificity for L-lactate over the D-lactate enantiomer, pyruvate, and the prochiral oxidized form of lactate. The serotonin design had a significantly lower affinity for two related molecules, tryptamine and tryptophan. The observed binding affinities of many of the engineered proteins for their target ligands were in the same range as the wild-type receptors (Kd ∼ 0.1–1.5 μM). One group has suggested that some of these proteins do not show evidence of binding to some of the targeted substrates in crystal structures and solution phase studies (124), indicating that these designed proteins merit further study.

Although there are few examples of designed beta-sheet proteins (116), tenascin has been redesigned without explicit negative design to yield a stable protein compared with the wild type (125). The crystallographic structure closely matches the structure of the computationally generated model. Although alternate competing structures may be less likely to be populated for this particular fold, explicit negative design may not be necessary to arrive at well-folded beta proteins.

The specificity of interactions has been studied with the aid of computational protein design. A scheme for computationally designing specificity into protein interactions has been developed and used to identify partners for leucine zipper (bZIP) transcription factors (81). With the aid of experimental studies using protein microarrays, many designed peptide ligands, including those that bind important oncoproteins, were found to be selective for their intended targets. The findings suggest that bZIPs have sparsely sampled the range of possible interactions and that the interplay between stability and specificity in these systems can be explored using computational protein design.

Computational protein design also may be used to design libraries of sequences. Different methods of generating combinatorial libraries have been explored and applied to green fluorescent protein variants (126). The fraction of functional proteins was largest for the library designed by using a structure-based computational method. A greater diversity of color was observed in designed libraries that maintained fluorescence. In contrast to many other methods for specifying libraries, structure-based computational methods preserved function as the mutation level increased.

Transfer proteins that bind and transfer lipids have been re-engineered to provide novel biosensors for fatty acids. Computational protein design has been used to remove a disulfide bridge and attach a thio-reactive fluorophore. Ligand binding likely causes a conformational change that can be monitored using fluorescence. These variants have affinity to palmitate that is consistent with binding to wild type (127).

Cofactors and Nonbiological Protein Assemblies

Computational de novo design accelerates the creation of novel protein assemblies, which may not have natural analogs. The association of nonbiological cofactors with proteins opens the possibility of realizing biomolecular systems with properties that are accessible neither with naturally occurring amino acids nor with biological cofactors. For complex macromolecular cofactors, natural protein structures may not provide appropriate scaffolds, and novel structures also must be created. An oligomeric, tetrahelical protein framework has been designed that encapsulates a synthetic cofactor (17), yielding a computationally designed protein that selectively binds two copies of a nonbiological diphenyl iron porphyrin (DPP-Fe). Binding of the cofactor in a bis-His fashion was observed for DPP-Fe but not for other Fe-containing porphyrins (17). This design strategy has been elaborated to realize a modular metalloporphyrin peptide array with a coiled-coil repeat that binds four equivalents of DPP-Fe (Figure 2b) (19). Similarly, a protein that selectively binds a nonbiological photoactive zinc porphyrin DPP-Zn has been designed computationally, which assembles as an A2B2 heterotetramer (Figure 2c) (128). Experimental studies reveal the targeted cofactor-binding stoichiometry and cofactor-binding specificity in each case. One path to proteins that incorporate nonbiological, asymmetric cofactors is via the computational design of single-chain (monomeric) proteins. Experimental characterization of a computationally designed 108-residue single-chain helical protein is consistent with high specificity of binding to the desired cofactors and a well-structured protein that binds the DPP-Fe cofactor (Figure 2d) (129). This designed protein presents an example of a scaffold that may be used to explore asymmetric mutations to enable a variety of properties, including surface immobilization and selective binding of different cofactors and chromophores. Designed amphipathic helical bundle proteins binding similar cofactors are capable of self-ordering at aqueous interfaces, conferring vectorial orientation to the bundles and the contained cofactors (130, 131).

Protein-Protein Interactions and Protein Assemblies

There has been much recent interest in multimeric proteins and protein-protein interactions. Computational studies involving iterative design have revealed that a bias toward low-energy complexes can result in symmetric assemblies of proteins from random ensembles in which symmetric complexes are essentially absent initially (132). Protein-protein interactions recently have been subjects of computational protein design (133). Structure-based computational design has yielded tetrapeptides that efficiently depolymerize serine-protease inhibitor (serpin) fibrils (134). Thus small, designed peptides may be used to modulate polymerization and depolymerization of fibrillar aggregates and provide evidence that peptide-induced depolymerization takes place via a heterogeneous, multistep process that begins with internal fragmentation. Protein dimers have been generated from monomeric proteins using computational protein docking and sequence design. A docking algorithm was used to generate a dimeric model of the β1 domain of streptococcal protein G. Computational design of 24/56 residues yielded a heterodimer comprising subunits with 8 and 12 mutations relative to wild type. The dimer is weakly bound (0.3-mM dissociation constant), but for one of the designed proteins, NMR spectral changes observed in the presence of its binding partner suggest specific dimer formation (135). A designed de novo hydrogen-bond network at the DNase-immunity protein interface yielded new binding partners that exhibited specificities of at least 300-fold preference for the cognate over the noncognate binding pairs (136). Computational redesign has stabilized an adhesion protein interaction involved in inflammatory response, involving integrin lymphocyte function-associated antigen-1 (LFA-1) and its ligand intercellular adhesion molecule-1 (ICAM-1) (137). A calcium binding site has been engineered into the cell adhesion protein CD2, and the construct retains its ability to bind CD48; such systems can be useful for engineering Ca2+-responsive sensors and switches and also for probing the subtleties of Ca2+ signaling and binding (138).
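The binding numbers quoted above can be put on a common free-energy scale with two standard relations, ΔG° = RT ln Kd (1 M standard state) and ΔΔG = RT ln(specificity ratio). The Python sketch below applies them to the 0.3-mM dissociation constant and the 300-fold specificity mentioned in the text. The temperature of 298.15 K, the assumption of simple 1:1 binding with the partner in excess, and the function names are assumptions made for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def binding_free_energy(kd_molar, temperature=298.15):
    """Standard binding free energy (kcal/mol) relative to a 1 M standard state."""
    return R_KCAL * temperature * math.log(kd_molar)

def specificity_free_energy(fold_preference, temperature=298.15):
    """Free-energy difference (kcal/mol) implied by a fold preference in affinity."""
    return R_KCAL * temperature * math.log(fold_preference)

def fraction_bound(partner_molar, kd_molar):
    """Fraction of protein bound for 1:1 binding with the partner in excess."""
    return partner_molar / (partner_molar + kd_molar)

# Designed protein G heterodimer: Kd = 0.3 mM.
dg_dimer = binding_free_energy(0.3e-3)

# Redesigned DNase-immunity protein pairs: at least 300-fold specificity.
ddg_specificity = specificity_free_energy(300.0)

# With the partner at a concentration equal to Kd, half of the protein is bound.
half_bound = fraction_bound(0.3e-3, 0.3e-3)
```

Under these assumptions, the 0.3-mM dimer corresponds to roughly −4.8 kcal mol⁻¹, and a 300-fold preference to roughly 3.4 kcal mol⁻¹.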

Proteins having complex quaternary structures also have been subjects of design. The symmetry of large, symmetric protein assemblies may be leveraged to address large numbers of simultaneous mutations (139). The DNA protection protein Dps is a dodecameric structure having a 4.5-nm-diameter interior cavity that has been redesigned computationally by varying up to 120 sites (10 per protein subunit) to confer a large hydrophobic interior surface (Figure 3a) (140). Thus ferritin proteins provide robust scaffolds for engineering large cavities having potential applications as nanoscale containers. Human H ferritin forms an assembly with 24 four-helix-bundle subunits having an 8-nm-diameter cavity. This system has been redesigned to confer noble metal ion (Au3+, Ag+) binding, reduction, and nanoparticle formation within the cavity (141). Binding to these metals occurs via exposed thiol groups introduced (via cysteine mutations) to the surface of the cavity. A variant with a total of 96 cysteines and histidines removed from the exterior surface and 96 nonnative cysteines added to the interior surface retained wild-type stability and structure, as confirmed by X-ray crystallography and spectroscopic studies.

Figure 3. Computationally designed associating proteins. (a) The redesigned interior of human ferritin (140). The parent wild type has no cysteines on the interior surface. Eight of the 24 subunits have been deleted to expose the interior. Metal-binding cysteines are shown in gray (carbon) and yellow (sulfur). (b) Superposition of the structure of the transmembrane portion of the bacterial potassium ion channel KcsA (gray) and the experimentally determined solution structure of a computationally designed water-soluble variant (cyan) (143, 144). (c) Model of the CHAMP transmembrane helix (magenta) designed to specifically bind integrins (cyan) αIIb (left) and αv (right), both of which naturally bind integrin β3. The GxxxG interface motif is rendered with space-filling atoms (147).


Membrane Proteins

The design of novel membrane proteins has applications to improving our understanding and control of membranes and membrane proteins. Given their central roles in many biological processes, membrane proteins are targets of many drugs and therapeutics. Computational protein design has been used to test and refine models of G-protein coupled receptors, in which the effects of mutations on oligomerization state are compared with mutagenesis studies and used to confirm atomistic models of these important receptors (142).

Transmembrane proteins span the membrane and have large numbers of exterior hydrophobic residues that maintain structure and registry in the membrane bilayer. As a result, membrane proteins are aggregation prone, difficult to obtain in large amounts, and recalcitrant to purification using standard solution phase methods. Most biophysical studies of membrane proteins involve the aqueous dispersion of membrane proteins using detergents, lipids or auxiliary proteins, and reconstitution, e.g., so as to obtain diffraction-quality crystals or solubilized protein suitable for NMR and other studies. Obtaining such solubilizing conditions for biophysical and structural studies remains subtle and time intensive.

Water-soluble variants of integral membrane proteins have been designed computationally, potentially facilitating studies of their structures and function. The solution structure of a computationally designed water-soluble variant of the transmembrane domain of the bacterial potassium ion channel KcsA recovers the tertiary and quaternary structures of the membrane-soluble wild-type structure (Figure 3b) (143, 144). The redesigned protein contains 29 designed mutations in each of the four 104-residue subunits. These findings highlight the promise of developing water-soluble variants of membrane proteins using computational redesign of sequence.

Transmembrane proteins have been targeted for computationally directed structure-based design and engineering. Studies of a model helical protein having both transmembrane and aqueous domains have explored the roles of Asn-mediated interactions in conferring a particular fold and oligomerization state (145). Cooperative, interhelix interactions in a serine-zipper transmembrane helix motif have been examined in a computationally designed transmembrane system. The designed protein exhibits parallel dimerization of the helices, and mutation of the central serine residues to alanine yields dimers of comparable stability, suggesting that complementary packing interactions, rather than hydrogen bonding, play a central role in stabilizing the dimer (146). Building upon these findings, researchers have computationally designed peptides that specifically target transmembrane helices to modulate the activity of membrane proteins. The designed peptides were able to discriminate between the transmembrane helices of two closely related integrins (αIIbβ3 and αvβ3), in which the specificity was obtained using complementary peptide-helix steric interactions (Figure 3c) (147). This approach has been extended to address direct interactions of a designed anti-αIIb peptide with isolated full-length integrin αIIb, and the peptides take on a transmembrane alpha-helical structure that does not disrupt the bilayer (148). Proteins have been designed to provide controllable integrity of a bilayer. A natural alpha-helical cell-lytic peptide, mastoparan X, has been re-engineered to bind metal cations; binding of Zn(II) or Ni(II) stabilizes the peptide's amphiphilic structure, leading to lysis of cells and vesicles (149).

CONCLUSION

Successful protein design poses many challenges: the many degrees of freedom involving both sequence and local structure that lead to the combinatorial complexity of the search for sequences, the subtlety of the underlying physical forces that stabilize folded structures, and our incomplete understanding of the determinants of folding and function. Computational protein design seeks to make quantitative many fundamental rules governing protein folding and to develop efficient approaches to identify and characterize sequences consistent with a given structure. In recent years, this quantitative approach to protein design has yielded a number of high-quality sequence prediction methods. These efforts have led to milestones in the design of new proteins with novel properties. Continued development and application of these methods will inform our understanding of protein structure, protein folding, and protein activity. Such methods open the door to the creation of tailored, nonnatural functional proteins and the re-engineering of natural proteins to enhance or redirect their activities.

DISCLOSURE STATEMENT

The authors are not aware of any affiliations, memberships, funding, or financial holdings that might be perceived as affecting the objectivity of this review.

ACKNOWLEDGMENTS

The authors gratefully acknowledge support from the U.S. Department of Energy (DE-FG02-04ER46156) and the National Institutes of Health (P01 GM55876, R01 HL-085303). This work was partially supported through the Laboratory for Research on the Structure of Matter through NSF MRSEC DMR05-20020. The authors acknowledge infrastructural support from the University of Pennsylvania's Nano/Bio Interface Center through the National Science Foundation (NSF) NSEC DMR08-32802. I.S. acknowledges funding from the Human Frontiers Science Program (HFSP). J.G.S. is grateful for the hospitality of the College of Chemistry and Molecular Engineering at Peking University.

LITERATURE CITED

1. Moult J, Fidelis K, Kryshtafovych A, Rost B, Tramontano A. 2009. Critical assessment of methods of protein structure prediction: round VIII. Proteins 77:1–4

2. Dessailly BH, Nair R, Jaroszewski L, Fajardo JE, Kouranov A, et al. 2009. PSI-2: structural genomics to cover protein domain family space. Structure 17:869–81

3. Terwilliger TC, Stuart D, Yokoyama S. 2009. Lessons from structural genomics. Annu. Rev. Biophys. 38:371–83

4. Onuchic JN, Luthey-Schulten Z, Wolynes PG. 1997. Theory of protein folding: the energy landscape perspective. Annu. Rev. Phys. Chem. 48:545–600

5. Onuchic JN, Wolynes PG, Luthey-Schulten Z, Socci ND. 1995. Toward an outline of the topography of a realistic protein-folding funnel. Proc. Natl. Acad. Sci. USA 92:3626–30

6. DeGrado WF. 1988. Design of peptides and proteins. Adv. Protein Chem. 39:51–124

7. Hill CP, Anderson DH, Wesson L, DeGrado WF, Eisenberg D. 1990. Crystal structure of α1: implications for protein design. Science 249:543–46

8. Kamtekar S, Schiffer JM, Xiong H, Babik JM, Hecht MH. 1993. Protein design by binary patterning of polar and nonpolar amino acids. Science 262:1680–85

9. Regan L, DeGrado WF. 1988. Characterization of a helical protein designed from first principles. Science 241:976–78

10. Quinn TP, Tweedy NB, Williams RW, Richardson JS, Richardson DC. 1994. Betadoublet: de novo design, synthesis, and characterization of a beta-sandwich protein. Proc. Natl. Acad. Sci. USA 91:8747–51

11. Bryson JW, Betz SF, Lu HS, Suich DJ, Zhou HXX, et al. 1995. Protein design: a hierarchical approach. Science 270:935–41

12. Bryson JW, Desjarlais JR, Handel TM, DeGrado WF. 1998. From coiled coils to small globular proteins: design of a native-like three-helix bundle. Protein Sci. 7:1404–14


13. Schneider M, Fu XR, Keating AE. 2009. X-ray versus NMR structures as templates for computational protein design. Proteins 77:97–110

14. Simons KT, Kooperberg C, Huang E, Baker D. 1997. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J. Mol. Biol. 268:209–25

15. Levitt M. 1992. Accurate modeling of protein conformation by automatic segment matching. J. Mol. Biol. 226:507–33

16. Tsai CJ, Zheng J, Aleman C, Nussinov R. 2006. Structure by design: from single proteins and their building blocks to nanostructures. Trends Biotechnol. 24:449–54

17. Cochran FV, Wu SP, Wang W, Nanda V, Saven JG, et al. 2005. Computational de novo design and characterization of a four-helix bundle protein that selectively binds a nonbiological cofactor. J. Am. Chem. Soc. 127:1346–47

18. North B, Summa CM, Ghirlanda G, DeGrado WF. 2001. Dn-symmetrical tertiary templates for the design of tubular proteins. J. Mol. Biol. 311:1081–90

19. McAllister KA, Zou HL, Cochran FV, Bender GM, Senes A, et al. 2008. Using α-helical coiled coils to design nanostructured metalloporphyrin arrays. J. Am. Chem. Soc. 130:11921–27

20. Desjarlais JR, Handel TM. 1995. New strategies in protein design. Curr. Opin. Biotechnol. 6:460–66

21. Harbury PB, Tidor B, Kim PS. 1995. Repacking protein cores with backbone freedom: structure prediction for coiled coils. Proc. Natl. Acad. Sci. USA 92:8408–12

22. Harbury PB, Plecs JJ, Tidor B, Alber T, Kim PS. 1998. High-resolution protein design with backbone freedom. Science 282:1462–67

23. Humphris EL, Kortemme T. 2008. Prediction of protein-protein interface sequence diversity using flexible backbone computational protein design. Structure 16:1777–88

24. Georgiev I, Keedy D, Richardson J, Richardson D, Donald B. 2008. Algorithm for backrub motions in protein design. Bioinformatics 24:I196–204

25. Mandell DJ, Coutsias EA, Kortemme T. 2009. Sub-angstrom accuracy in protein loop reconstruction by robotics-inspired conformational sampling. Nat. Methods 6:551–52

26. Mandell DJ, Kortemme T. 2009. Backbone flexibility in computational protein design. Curr. Opin. Biotechnol. 20:420–28

27. Kono H, Saven JG. 2001. Statistical theory for protein combinatorial libraries: packing interactions, backbone flexibility, and the sequence variability of a main-chain structure. J. Mol. Biol. 306:607–28

28. Nagano N, Orengo CA, Thornton JM. 2002. One fold with many functions: the evolutionary relationships between TIM barrel families based on their sequences, structures and functions. J. Mol. Biol. 321:741–65

29. Jiang L, Althoff EA, Clemente FR, Doyle L, Rothlisberger D, et al. 2008. De novo computational design of retro-aldol enzymes. Science 319:1387–91

30. Marshall SA, Mayo SL. 2001. Achieving stability and conformational specificity in designed proteins via binary patterning. J. Mol. Biol. 305:619–31

31. Jiang L, Kuhlman B, Kortemme T, Baker D. 2005. A “solvated rotamer” approach to modeling water-mediated hydrogen bonds at protein-protein interfaces. Proteins 58:893–904

32. Chin JW, Cropp TA, Anderson JC, Mukherji M, Zhang Z, Schultz PG. 2003. An expanded eukaryotic genetic code. Science 301:964–67

33. Serrano AL, Troxler T, Tucker MJ, Gai F. 2010. Photophysics of a fluorescent non-natural amino acid: p-cyanophenylalanine. Chem. Phys. Lett. 487:303–6

34. Dunbrack RL Jr. 2002. Rotamer libraries in the 21st century. Curr. Opin. Struct. Biol. 12:431–40

35. Ponder JW, Richards FM. 1987. Tertiary templates for proteins: use of packing criteria in the enumeration of allowed sequences for different structural classes. J. Mol. Biol. 193:775–91

36. Tuffery P, Etchebest C, Hazout S, Lavery R. 1991. A new approach to the rapid determination of protein side chain conformations. J. Biomol. Struct. Dyn. 8:1267–89

37. Dunbrack RL Jr, Karplus M. 1993. Backbone-dependent rotamer library for proteins: application to side-chain prediction. J. Mol. Biol. 230:543–74

38. Schrauber H, Eisenhaber F, Argos P. 1993. Rotamers: to be or not to be? An analysis of amino acid side-chain conformations in globular proteins. J. Mol. Biol. 230:592–612


39. Kono H, Doi J. 1996. A new method for side-chain conformation prediction using a Hopfield networkand reproduced rotamers. J. Comput. Chem. 17:1667–83

40. De Maeyer M, Desmet J, Lasters I. 1997. All in one: A highly detailed rotamer library improves bothaccuracy and speed in the modelling of sidechains by dead-end elimination. Fold. Des. 2:53–66

41. Dunbrack RLJ, Cohen FE. 1997. Bayesian statistical analysis of protein side-chain retainer preferences.Protein Sci. 6:1661–81

42. Lovell SC, Davis IW, Arendall WB 3rd, de Bakker PI, Word JM, et al. 2003. Structure validation by Cα geometry: ϕ, ψ and Cβ deviation. Proteins 50:437–50

43. Lovell SC, Word JM, Richardson JS, Richardson DC. 2000. The penultimate rotamer library. Proteins 40:389–408

44. Peterson RW, Dutton PL, Wand AJ. 2004. Improved side-chain prediction accuracy using an ab initio potential energy function and a very large rotamer library. Protein Sci. 13:735–51

45. Xiang ZX, Honig B. 2001. Extending the accuracy limits of prediction for side-chain conformations. J. Mol. Biol. 311:421–30

46. Shetty RP, de Bakker PIW, DePristo MA, Blundell TL. 2003. Advantages of fine-grained side chain conformer libraries. Protein Eng. 16:963–69

47. Lassila JK, Privett HK, Allen BD, Mayo SL. 2006. Combinatorial methods for small-molecule placement in computational enzyme design. Proc. Natl. Acad. Sci. USA 103:16710–15

48. Shandler SJ, Shapovalov MV, Dunbrack RL, DeGrado WF. 2010. Development of a rotamer library for use in β-peptide foldamer computational design. J. Am. Chem. Soc. 132:7312–20

49. Boas FE, Harbury PB. 2007. Potential energy functions for protein design. Curr. Opin. Struct. Biol. 17:199–204

50. Makhatadze GI, Privalov PL. 1995. Energetics of protein structure. Adv. Protein Chem. 47:307–425

51. Weiner SJ, Kollman PA, Case DA, Singh UC, Ghio C, et al. 1984. A new force-field for molecular mechanical simulation of nucleic acids and proteins. J. Am. Chem. Soc. 106:765–84

52. Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M. 1983. CHARMM: a program for macromolecular energy minimization and dynamics calculations. J. Comput. Chem. 4:187–217

53. Hermans J, Berendsen HJC, Vangunsteren WF, Postma JPM. 1984. A consistent empirical potential for water-protein interactions. Biopolymers 23:1513–18

54. Mayo SL, Olafson BD, Goddard WA III. 1990. DREIDING: a generic force field for molecular simulations. J. Phys. Chem. 94:8897–909

55. Kuhlman B, Baker D. 2000. Native protein sequences are close to optimal for their structures. Proc. Natl. Acad. Sci. USA 97:10383–88

56. Jin WZ, Kambara O, Sasakawa H, Tamura A, Takada S. 2003. De novo design of foldable proteins with smooth folding funnel: automated negative design and experimental verification. Structure 11:581–90

57. Gordon DB, Marshall SA, Mayo SL. 1999. Energy functions for protein design. Curr. Opin. Struct. Biol. 9:509–13

58. Eisenberg D, McLachlan A. 1986. Solvation energy in protein folding and binding. Nature 319:199–203

59. Sharp KA, Nicholls A, Friedman R, Honig B. 1991. Extracting hydrophobic free energies from experimental data: relationship to protein folding and theoretical models. Biochemistry 30:9686–97

60. Im W, Lee MS, Brooks CL III. 2003. Generalized Born model with a simple smoothing function. J. Comput. Chem. 24:1691–702

61. Lee MS, Feig M, Salsbury FR Jr, Brooks CL III. 2003. New analytic approximation to the standard molecular volume definition and its application to generalized Born calculations. J. Comput. Chem. 24:348–56

62. Guvench O, Weiser J, Shenkin P, Kolossvary I, Still WC. 2002. Application of the frozen atom approximation to the GB/SA continuum model for solvation free energy. J. Comput. Chem. 23:214–21

63. Vizcarra CL, Mayo SL. 2005. Electrostatics in computational protein design. Curr. Opin. Chem. Biol. 9:622–26

64. Takada S, Luthey-Schulten Z, Wolynes PG. 1999. Folding dynamics with nonadditive forces: a simulation study of a designed helical protein and a random heteropolymer. J. Chem. Phys. 110:11616–29

65. Hecht MH, Richardson JS, Richardson DC, Ogden RC. 1990. De novo design, expression, and characterization of Felix: a four-helix bundle protein of native-like sequence. Science 249:884–91

66. Saven JG. 2001. Designing protein energy landscapes. Chem. Rev. 101:3113–30

67. Hellinga HW. 1997. Rational protein design: combining theory and experiment. Proc. Natl. Acad. Sci. USA 94:10015–17

68. Shakhnovich EI. 1998. Protein design: a perspective from simple tractable models. Fold. Des. 3:R45–58

69. Yue K, Fiebig KM, Thomas PD, Chan HS, Shakhnovich EI, Dill KA. 1995. A test of lattice protein folding algorithms. Proc. Natl. Acad. Sci. USA 92:325–29

70. Havranek JJ, Harbury PB. 2003. Automated design of specificity in molecular recognition. Nat. Struct. Biol. 10:45–52

71. Calhoun JR, Kono H, Lahr S, Wang W, DeGrado WF, Saven JG. 2003. Computational design and characterization of a monomeric helical dinuclear metalloprotein. J. Mol. Biol. 334:1101–15

72. Wernisch L, Hery S, Wodak SJ. 2000. Automatic protein design with all atom force-fields by exact and heuristic optimization. J. Mol. Biol. 301:713–36

73. Samish I. 2009. Search and sampling in structural bioinformatics. In Structural Bioinformatics, ed. J Gu, PE Bourne, pp. 207–35. New York: Wiley

74. Hellinga HW, Richards FM. 1994. Optimal sequence selection in proteins of known structure by simulated evolution. Proc. Natl. Acad. Sci. USA 91:5803–7

75. Yang X, Saven JG. 2005. Computational methods for protein design and protein sequence variability: biased Monte Carlo and replica exchange. Chem. Phys. Lett. 401:205–10

76. Allen BD, Mayo SL. 2010. An efficient algorithm for multistate protein design based on FASTER. J. Comput. Chem. 31:904–16

77. Leach AR, Lemon AP. 1998. Exploring the conformational space of protein side chains using dead-end elimination and the A* algorithm. Proteins 33:227–39

78. Desjarlais JR, Handel TM. 1999. Side-chain and backbone flexibility in protein core design. J. Mol. Biol. 290:305–18

79. Desmet J, De Maeyer M, Hazes B, Lasters I. 1992. The dead-end elimination theorem and its use in protein side-chain positioning. Nature 356:539–42

80. Koehl P, Delarue M. 1994. Application of a self-consistent mean field theory to predict protein side-chains conformation and estimate their conformational entropy. J. Mol. Biol. 239:249–75

81. Grigoryan G, Reinke AW, Keating AE. 2009. Design of protein-interaction specificity gives selective bZIP-binding peptides. Nature 458:859–64

82. Zou JM, Saven JG. 2000. Statistical theory of combinatorial libraries of folding proteins: energetic discrimination of a target structure. J. Mol. Biol. 296:281–94

83. Zou J, Saven JG. 2003. Using self-consistent fields to bias Monte Carlo methods with applications to designing and sampling protein sequences. J. Chem. Phys. 118:3843–54

84. Voigt CA, Gordon DB, Mayo SL. 2000. Trading accuracy for speed: a quantitative comparison of search algorithms in protein sequence design. J. Mol. Biol. 299:789–803

85. Cootes AP, Curmi PMG, Torda AE. 2000. Biased Monte Carlo optimization of protein sequences. J. Chem. Phys. 113:2489–96

86. Siepmann JI. 1990. A method for the direct calculation of chemical potentials for dense chain systems. Mol. Phys. 70:1145–58

87. Jain T, Cerutti DS, McCammon JA. 2006. Configurational-bias sampling technique for predicting side-chain conformations in proteins. Protein Sci. 15:2029–39

88. Dantas G, Kuhlman B, Callender D, Wong M, Baker D. 2003. A large scale test of computational protein design: folding and stability of nine completely redesigned globular proteins. J. Mol. Biol. 332:449–60

89. Hu XZ, Kuhlman B. 2006. Protein design simulations suggest that side-chain conformational entropy is not a strong determinant of amino acid environmental preferences. Proteins 62:739–48

90. Dahiyat BI, Mayo SL. 1996. Protein design automation. Protein Sci. 5:895–903

91. Liang S, Grishin NV. 2002. Side-chain modeling with an optimized scoring function. Protein Sci. 11:322–31

92. Goldstein RF. 1994. Efficient rotamer elimination applied to protein side-chains and related spin glasses. Biophys. J. 66:1335–40

93. Dahiyat BI, Mayo SL. 1997. De novo protein design: fully automated sequence selection. Science 278:82–87

94. Looger LL, Hellinga HW. 2001. Generalized dead-end elimination algorithms make large-scale protein side-chain structure prediction tractable: implications for protein design and structural genomics. J. Mol. Biol. 307:429–45

95. Gordon DB, Hom GK, Mayo SL, Pierce NA. 2003. Exact rotamer optimization for protein design. J. Comput. Chem. 24:232–43

96. Kloppmann E, Ullmann GM, Becker T. 2007. An extended dead-end elimination algorithm to determine gap-free lists of low energy states. J. Comput. Chem. 28:2325–35

97. Yanover C, Fromer M, Shifman JM. 2007. Dead-end elimination for multistate protein design. J. Comput. Chem. 28:2122–29

98. Georgiev I, Donald B. 2007. Dead-end elimination with backbone flexibility. Bioinformatics 23:I185–94

99. Lee C. 1994. Predicting protein mutant energetics by self-consistent ensemble optimization. J. Mol. Biol. 236:918–39

100. Vasquez M. 1995. An evaluation of discrete and continuum search techniques for conformational analysis of side-chains in proteins. Biopolymers 36:53–70

101. Mendes J, Soares CM, Carrondo MA. 1999. Improvement of side-chain modeling in proteins with the self-consistent mean field theory method based on an analysis of the factors influencing prediction. Biopolymers 50:111–31

102. Zhu Y, Fu X, Wang T, Tamura A, Takada S, et al. 2004. Guiding the search for a protein’s maximum rate of folding. Chem. Phys. 307:99–109

103. Bunagan MR, Yang X, Saven JG, Gai F. 2006. Ultrafast folding of a computationally designed Trp-cage mutant: Trp2-cage. J. Phys. Chem. B 110:3759–63

104. Tang J, Kang SG, Saven JG, Gai F. 2009. Characterization of the cofactor-induced folding mechanism of a zinc-binding peptide using computationally designed mutants. J. Mol. Biol. 389:90–102

105. Ambroggio XI, Kuhlman B. 2006. Computational design of a single amino acid sequence that can switch between two distinct protein folds. J. Am. Chem. Soc. 128:1154–61

106. DeGrado WF, Summa CM, Pavone V, Nastri F, Lombardi A. 1999. De novo design and structural characterization of proteins and metalloproteins. Annu. Rev. Biochem. 68:779–819

107. Park S, Xi Y, Saven JG. 2004. Advances in computational protein design. Curr. Opin. Struct. Biol. 14:487–94

108. Butterfoss GL, Kuhlman B. 2006. Computer-based design of novel protein structures. Annu. Rev. Biophys. Biomol. Struct. 35:49–65

109. Kang SG, Saven JG. 2007. Computational protein design: structure, function and combinatorial diversity. Curr. Opin. Chem. Biol. 11:329–34

110. Koder RL, Anderson JL, Solomon LA, Reddy KS, Moser CC, Dutton PL. 2009. Design and engineering of an O2 transport protein. Nature 458:305–9

111. Ashworth J, Havranek JJ, Duarte CM, Sussman D, Monnat RJ Jr, et al. 2006. Computational redesign of endonuclease DNA binding and cleavage specificity. Nature 441:656–59

112. Razeghifard R, Wallace BB, Pace RJ, Wydrzynski T. 2007. Creating functional artificial proteins. Curr. Protein Pept. Sci. 8:3–18

113. Bolon DN, Mayo SL. 2001. Enzyme-like proteins by computational design. Proc. Natl. Acad. Sci. USA 98:14274–79

114. Calhoun JR, Liu W, Spiegel K, Dal Peraro M, Klein ML, et al. 2008. Solution NMR structure of a designed metalloprotein and complementary molecular dynamics refinement. Structure 16:210–15

115. Bell CB, Calhoun JR, Bobyr E, Wei PP, Hedman B, et al. 2009. Spectroscopic definition of the biferrous and biferric sites in de novo designed four-helix bundle DFsc peptides: implications for O2 reactivity of binuclear non-heme iron enzymes. Biochemistry 48:59–73

116. Nanda V, Rosenblatt MM, Osyczka A, Kono H, Getahun Z, et al. 2005. De novo design of a redox-active minimal rubredoxin mimic. J. Am. Chem. Soc. 127:5804–5

117. Kaplan J, DeGrado WF. 2004. De novo design of catalytic proteins. Proc. Natl. Acad. Sci. USA 101:11566–70

118. Rothlisberger D, Khersonsky O, Wollacott AM, Jiang L, DeChancie J, et al. 2008. Kemp elimination catalysts by computational enzyme design. Nature 453:190–95

119. Kuhlman B, Dantas G, Ireton GC, Varani G, Stoddard BL, Baker D. 2003. Design of a novel globular protein fold with atomic-level accuracy. Science 302:1364–68

120. Liang HH, Chen H, Fan KQ, Wei P, Guo XR, et al. 2009. De novo design of a βαβ motif. Angew. Chem. Int. Ed. Engl. 48:3301–3

121. Dantas G, Corrent C, Reichow SL, Havranek JJ, Eletr ZM, et al. 2007. High-resolution structural and thermodynamic analysis of extreme stabilization of human procarboxypeptidase by computational protein design. J. Mol. Biol. 366:1209–21

122. Shah PS, Hom GK, Ross SA, Lassila JK, Crowhurst KA, Mayo SL. 2007. Full-sequence computational design and solution structure of a thermostable protein variant. J. Mol. Biol. 372:1–6

123. Looger LL, Dwyer MA, Smith JJ, Hellinga HW. 2003. Computational design of receptor and sensor proteins with novel functions. Nature 423:185–90

124. Schreier B, Stumpp C, Wiesner S, Hocker B. 2009. Computational design of ligand binding is not a solved problem. Proc. Natl. Acad. Sci. USA 106:18491–96

125. Hu XZ, Wang HC, Ke HM, Kuhlman B. 2008. Computer-based redesign of a beta sandwich protein suggests that extensive negative design is not required for de novo beta sheet design. Structure 16:1799–805

126. Treynor TP, Vizcarra CL, Nedelcu D, Mayo SL. 2007. Computationally designed libraries of fluorescent proteins evaluated by preservation and diversity of function. Proc. Natl. Acad. Sci. USA 104:48–53

127. Choi EJ, Mao J, Mayo SL. 2007. Computational design and biochemical characterization of maize nonspecific lipid transfer protein variants for biosensor applications. Protein Sci. 16:582–88

128. Fry HC, Lehmann A, Saven JG, DeGrado WF, Therien MJ. 2010. Computational design and elaboration of a de novo heterotetrameric α-helical protein that selectively binds an emissive abiological (porphinato)zinc chromophore. J. Am. Chem. Soc. 132:3997–4005

129. Bender GM, Lehmann A, Zou H, Cheng H, Fry HC, et al. 2007. De novo design of a single-chain diphenylporphyrin metalloprotein. J. Am. Chem. Soc. 129:10732–40

130. Zou HL, Therien MJ, Blasie JK. 2008. Structure and dynamics of an extended conjugated NLO chromophore within an amphiphilic 4-helix bundle peptide by molecular dynamics simulation. J. Phys. Chem. B 112:1350–57

131. Zou HL, Strzalka J, Xu T, Tronin A, Blasie JK. 2007. Three-dimensional structure and dynamics of a de novo designed, amphiphilic, metallo-porphyrin-binding protein maquette at soft interfaces by molecular dynamics simulations. J. Phys. Chem. B 111:1823–33

132. Andre I, Strauss CEM, Kaplan DB, Bradley P, Baker D. 2008. Emergence of symmetry in homooligomeric biological assemblies. Proc. Natl. Acad. Sci. USA 105:16148–52

133. Mandell DJ, Kortemme T. 2009. Computer-aided design of functional protein interactions. Nat. Chem. Biol. 5:797–807

134. Chowdhury P, Wang W, Lavender S, Bunagan MR, Klemke JW, et al. 2007. Fluorescence correlation spectroscopic study of serpin depolymerization by computationally designed peptides. J. Mol. Biol. 369:462–73

135. Huang P-S, Love JJ, Mayo SL. 2007. A de novo designed protein-protein interface. Protein Sci. 16:2770–74

136. Joachimiak LA, Kortemme T, Stoddard BL, Baker D. 2006. Computational design of a new hydrogen bond network and at least a 300-fold specificity switch at a protein-protein interface. J. Mol. Biol. 361:195–208

137. Song G, Lazar GA, Kortemme T, Shimaoka M, Desjarlais JR, et al. 2006. Rational design of intercellular adhesion molecule-1 (ICAM-1) variants for antagonizing integrin lymphocyte function-associated antigen-1-dependent adhesion. J. Biol. Chem. 281:5042–49

138. Yang W, Wilkins AL, Ye YM, Liu ZR, Li SY, et al. 2005. Design of a calcium-binding protein with desired structure in a cell adhesion molecule. J. Am. Chem. Soc. 127:2085–93

139. Fu XR, Kono H, Saven JG. 2003. Probabilistic approach to the design of symmetric protein quaternary structures. Protein Eng. 16:971–77

140. Swift J, Wehbi WA, Kelly BD, Stowell XF, Saven JG, Dmochowski IJ. 2006. Design of functional ferritin-like proteins with hydrophobic cavities. J. Am. Chem. Soc. 128:6611–19

141. Butts CA, Swift J, Kang SG, Di Costanzo L, Christianson DW, et al. 2008. Directing noble metal ion chemistry within a designed ferritin protein. Biochemistry 47:12729–39

142. Taylor MS, Fung HK, Rajgaria R, Filizola M, Weinstein H, Floudas CA. 2008. Mutations affecting the oligomerization interface of G-protein-coupled receptors revealed by a novel de novo protein design framework. Biophys. J. 94:2470–81

143. Slovic AM, Kono H, Lear JD, Saven JG, DeGrado WF. 2004. Computational design of water-soluble analogues of the potassium channel KcsA. Proc. Natl. Acad. Sci. USA 101:1828–33

144. Ma DJ, Tillman TS, Tang P, Meirovitch E, Eckenhoff R, et al. 2008. NMR studies of a channel protein without membranes: structure and dynamics of water-solubilized KcsA. Proc. Natl. Acad. Sci. USA 105:16537–42

145. Cristian L, Nanda V, Lear JD, DeGrado WF. 2005. Synergistic interactions between aqueous and membrane domains of a designed protein determine its fold and stability. J. Mol. Biol. 348:1225–33

146. North B, Cristian L, Stowell XF, Lear JD, Saven JG, DeGrado WF. 2006. Characterization of a membrane protein folding motif, the Ser zipper, using designed peptides. J. Mol. Biol. 359:930–39

147. Yin H, Slusky JS, Berger BW, Walters RS, Vilaire G, et al. 2007. Computational design of peptides that target transmembrane helices. Science 315:1817–22

148. Caputo GA, Litvinov RI, Li W, Bennett JS, DeGrado WF, Yin H. 2008. Computationally designed peptide inhibitors of protein-protein interactions in membranes. Biochemistry 47:8600–6

149. Signarvic RS, DeGrado WF. 2009. Metal-binding dependent disruption of membranes by designed helices. J. Am. Chem. Soc. 131:3377–84
