EMAN: Semiautomated Software for High-Resolution Single-Particle Reconstructions

16
EMAN: Semiautomated Software for High-Resolution Single-Particle Reconstructions Steven J. Ludtke, Philip R. Baldwin, and Wah Chiu Verna and Marrs McLean Department of Biochemistry, National Center for Macromolecular Imaging, Baylor College of Medicine, Houston, Texas 77030 Received June 7, 1999, and in revised form August 10, 1999 We present EMAN (Electron Micrograph ANaly- sis), a software package for performing semiauto- mated single-particle reconstructions from transmis- sion electron micrographs. The goal of this project is to provide software capable of performing single- particle reconstructions beyond 10 Å as such high- resolution data become available. A complete single- particle reconstruction algorithm is implemented. Options are available to generate an initial model for particles with no symmetry, a single axis of rotational symmetry, or icosahedral symmetry. Model refinement is an iterative process, which utilizes classification by model-based projection matching. CTF (contrast transfer function) param- eters are determined using a new paradigm in which data from multiple micrographs are fit simulta- neously. Amplitude and phase CTF correction is then performed automatically as part of the refine- ment loop. A graphical user interface is provided, so even those with little image processing experience will be able to begin performing reconstructions. Advanced users can directly use the lower level shell commands and even expand the package utiliz- ing EMAN’s extensive image-processing library. The package was written from scratch in C11 and is provided free of charge on our Web site. We present an overview of the package as well as several con- formance tests with simulated data. r 1999 Academic Press Key Words: electron cyromicroscopy; CTF; con- trast transfer function; single-particle reconstruc- tion; image processing INTRODUCTION Over the past decade, an experimental/computa- tional technique known as single-particle analysis has been growing rapidly in popularity. In this technique, transmission electron microscopy is used to address problems that are difficult to approach with traditional crystallographic methods. A variety of statistical and image processing techniques are applied to a large number of images of identical molecules to produce a three-dimensional structure. This technique offers several advantages over crystal- lographic methods. First, the macromolecule is typi- cally embedded in vitreous ice, which preserves its native state in a biologically relevant conformation. Second, it is ideally suited to address molecules that are problematical for X-ray crystallography. The majority of X-ray structures are for proteins smaller than 200 kDa. Electron cryomicroscopy, however, is ideal for large macromolecules and assemblies in the range of hundreds to thousands of kilodaltons. Fi- nally, single-particle analysis allows functional states to be addressed relatively easily (Gabashvili et al., 1999; Orlova et al., 1996). When the macromolecule consists of a mixed population of functional states and is still somewhat structurally heterogeneous, statistical techniques can, in principle, be applied to separate particles in the desired state from the mixed population. For macromolecules that can be purified to chemi- cal and structural homogeneity and are amenable to recording high-resolution images, the biggest limit- ing factor in applying single-particle methods is data processing. Data collection and scanning can be accomplished relatively rapidly, but once the data are ready for processing, analysis may proceed for weeks before it is even known whether the quality and quantity of data are sufficient to achieve the desired resolution. While several excellent software packages exist that allow single-particle and other reconstruction procedures to be performed (Frank et al., 1996; Schroeter and Bretaudiere, 1996; van Heel et al., 1996; Whittaker et al., 1995), they are de- signed primarily for experienced users performing reconstructions at intermediate resolutions. EMAN is designed to make this technique more accessible to inexperienced users and provide tools necessary to efficiently process large amounts of data for high- resolution reconstructions. EMAN is a software package designed specifically for single-particle reconstructions. Its design and Journal of Structural Biology 128, 82–97 (1999) Article ID jsbi.1999.4174, available online at http://www.idealibrary.com on 82 1047-8477/99 $30.00 Copyright r 1999 by Academic Press All rights of reproduction in any form reserved.

Transcript of EMAN: Semiautomated Software for High-Resolution Single-Particle Reconstructions

Journal of Structural Biology 128, 82–97 (1999)Article ID jsbi.1999.4174, available online at http://www.idealibrary.com on

EMAN: Semiautomated Software for High-ResolutionSingle-Particle Reconstructions

Steven J. Ludtke, Philip R. Baldwin, and Wah Chiu

Verna and Marrs McLean Department of Biochemistry, National Center for Macromolecular Imaging, Baylor College of Medicine,Houston, Texas 77030

Received June 7, 1999, and in revised form August 10, 1999

smstprpOfrMumedntmewAsippaf

tt

thttwoa

mTlcnSamtirnt1cassm

cripaawadpraesriier

1CA

We present EMAN (Electron Micrograph ANaly-is), a software package for performing semiauto-ated single-particle reconstructions from transmis-

ion electron micrographs. The goal of this project iso provide software capable of performing single-article reconstructions beyond 10 Å as such high-esolution data become available. A complete single-article reconstruction algorithm is implemented.ptions are available to generate an initial model

or particles with no symmetry, a single axis ofotational symmetry, or icosahedral symmetry.odel refinement is an iterative process, which

tilizes classification by model-based projectionatching. CTF (contrast transfer function) param-

ters are determined using a new paradigm in whichata from multiple micrographs are fit simulta-eously. Amplitude and phase CTF correction ishen performed automatically as part of the refine-ent loop. A graphical user interface is provided, so

ven those with little image processing experienceill be able to begin performing reconstructions.dvanced users can directly use the lower levelhell commands and even expand the package utiliz-ng EMAN’s extensive image-processing library. Theackage was written from scratch in C11 and isrovided free of charge on our Web site. We presentn overview of the package as well as several con-ormance tests with simulated data. r 1999 Academic Press

Key Words: electron cyromicroscopy; CTF; con-rast transfer function; single-particle reconstruc-ion; image processing

INTRODUCTION

Over the past decade, an experimental/computa-ional technique known as single-particle analysisas been growing rapidly in popularity. In thisechnique, transmission electron microscopy is usedo address problems that are difficult to approachith traditional crystallographic methods. A varietyf statistical and image processing techniques are

pplied to a large number of images of identical f

82047-8477/99 $30.00opyright r 1999 by Academic Pressll rights of reproduction in any form reserved.

olecules to produce a three-dimensional structure.his technique offers several advantages over crystal-

ographic methods. First, the macromolecule is typi-ally embedded in vitreous ice, which preserves itsative state in a biologically relevant conformation.econd, it is ideally suited to address molecules thatre problematical for X-ray crystallography. Theajority of X-ray structures are for proteins smaller

han 200 kDa. Electron cryomicroscopy, however, isdeal for large macromolecules and assemblies in theange of hundreds to thousands of kilodaltons. Fi-ally, single-particle analysis allows functional stateso be addressed relatively easily (Gabashvili et al.,999; Orlova et al., 1996). When the macromoleculeonsists of a mixed population of functional statesnd is still somewhat structurally heterogeneous,tatistical techniques can, in principle, be applied toeparate particles in the desired state from theixed population.For macromolecules that can be purified to chemi-

al and structural homogeneity and are amenable toecording high-resolution images, the biggest limit-ng factor in applying single-particle methods is datarocessing. Data collection and scanning can beccomplished relatively rapidly, but once the datare ready for processing, analysis may proceed foreeks before it is even known whether the qualitynd quantity of data are sufficient to achieve theesired resolution. While several excellent softwareackages exist that allow single-particle and othereconstruction procedures to be performed (Frank etl., 1996; Schroeter and Bretaudiere, 1996; van Heelt al., 1996; Whittaker et al., 1995), they are de-igned primarily for experienced users performingeconstructions at intermediate resolutions. EMANs designed to make this technique more accessible tonexperienced users and provide tools necessary tofficiently process large amounts of data for high-esolution reconstructions.EMAN is a software package designed specifically

or single-particle reconstructions. Its design and

icmsypuaTitab

sftnulvtppct

P

imgpmaiooiphgr

pmkLCogael

StbbfalwcteTtgptpstprpt

ttosvmscpaaavfwpc

P

tsictictwNb

83EMAN: SINGLE-PARTICLE RECONSTRUCTION SOFTWARE

mplementation are based on three general prin-iples. The first is to improve processing efficiency, toake processing the thousands to hundreds of thou-

ands of particles required for reconstructions be-ond 10 Å feasible. The second is to make single-article processing more accessible to inexperiencedsers, while still providing the tools necessary tollow advanced users to perform custom processing.he third is to introduce a robust method for perform-

ng CTF correction with minimal effort on the part ofhe user. This article describes the design rationalesnd features of EMAN and demonstrates its applica-ility with simulated data.

3D RECONSTRUCTIONS IN EMAN

Performing a reconstruction in EMAN is a three-tage process. First, the particles must be selectedrom scanned micrographs or CCD frames. Second,he boxed-out particles are used to generate a prelimi-ary 3D model. Finally, the preliminary model issed as the starting point for the main refinement

oop, which is iterated until the refinement con-erges. EMAN also allows the CTF of the microscopeo be corrected semiautomatically. In this case, CTFarameters are determined and phase correction iserformed before the refinement loop. Amplitudeorrections are performed automatically as part ofhe refinement loop.

article Selection

Before a reconstruction can be performed, thendividual particles must be located in the raw

icrographs or CCD frames. EMAN includes boxer, araphical program for manual or semiautomaticarticle selection. This program displays the overallicrograph in one window and the boxed particles in

nother window. The boxed particles are updatedmmediately when a new particle is selected or anld box is moved. Individual boxes may be adjustedr deleted with a simple mouse click. Contrastnversion is provided to aid in manually locatingarticles in low-contrast images. For very large origh-resolution micrographs, boxer allows the micro-raph to be split into several regions of equal size,educing the memory requirements of the program.Boxer also incorporates a semiautomatic selection

rocedure. A number of automatic particle selectionethods have been proposed (Frank and Wagen-

necht, 1984; Harauz and Fong-Lochovsky, 1989;ata et al., 1995; Martin et al., 1997; Thuman-ommike and Chiu, 1995; van Heel, 1982). Severalf these techniques are based on the technique ofenerating a rotationally averaged reference imagend locating particles by cross-correlating the refer-nce with the entire micrograph. Boxer uses a simi-

ar technique with a few additional refinements. d

ince projections of particles in different orienta-ions may have dramatically different appearances,oxer uses multiple references to ensure accurateoxing. The user manually selects several particlesrom the micrograph, which are then rotationallyveraged and used as templates. Particles are thenocated by cross-correlating each template imageith the entire micrograph. The individual cross-

orrelation maps, each from a different template, arehen combined by selecting the maximum value atach pixel location from the set of correlation maps.he combined cross-correlation map will then con-

ain peaks for each putative particle in the micro-raph. Since some particles may contain multipleeaks, this map is then low-pass filtered to a resolu-ion equivalent to 1⁄2 the box size. This minimizesroblems with the same particle being multiplyelected. Instead, this ensures that peaks that areoo close to each other are averaged and a singlearticle will be identified. A peak-searching algo-ithm then extracts the location of all recognizedarticles from this map with peaks above somehreshold value.

The only remaining difficulty is determination ofhis threshold value. Rather than attempting to dohis automatically, boxer presents the user with a setf threshold sliders. One slider sets the basic peakelection threshold, and two additional thresholdalues allow particles with excessive or insufficientean contrast to be excluded. The second pair of

liders is used primarily for eliminating areas ofontamination. As the user varies the sliders, thearticle boxes in a 1k 3 1k section of the micrographre updated interactively. Once all three thresholdsre at appropriate levels, the entire micrograph isutoboxed. While the selection routine in the currentersion does a relatively good job, there is still roomor improvement. Particles that are nonsphericalill be selected less accurately than nearly sphericalarticles. This procedure is likely to be improvedonsiderably in future releases.

reliminary Model Generation

EMAN’s refinement procedure is model based;hat is, it requires an initial 3D model to use as atarting point for refinement. The quality of thenitial model required to allow the refinement loop toonverge is a function of many parameters, includinghe symmetry of the model, the signal-to-noise ration the individual particles, and the degree of spheri-al asymmetry in the model. Generally speaking, ifhe data have sufficient signal, the refinement loopill converge even if a very poor initial model is used.onetheless, some sort of model is required, and theetter the model is, the faster the refinement proce-

ure will converge. Many methods for generating

igCOrbsrtieg

itthissnoictGodltcabow

otripgggstepddpaiutg

ErsttiFtTciteaapntc

rstspabCtpsnwsaatgssmvvtdmpHfs

uEo

84 LUDTKE, BALDWIN, AND CHIU

nitial models have been invented by many differentroups over the years (Baker and Cheng, 1996;rowther, 1971; Frank, 1989; van Heel et al., 1996).ne solution to this problem is for the user to select

epresentative particle views and generate a modely manually placing geometric objects (cylinders,pheres, etc.) in a 3D map. Of course, this processequires considerable effort and ability on the part ofhe user and may bias the reconstruction toward anncorrect result. To avoid this, EMAN contains sev-ral symmetry-specific routines for automaticallyenerating initial models.Naturally, in many cases the symmetry is not

nitially known. In general, the actual symmetry ofhe molecule must be imposed during the reconstruc-ion or symmetry-breaking artifacts will occur due toigh noise levels present in the individual particle

mages. For example, consider performing a recon-truction of a molecule with a twofold rotationalymmetry without imposing this symmetry. If nooise is present, every particle will be assigned onef two Euler angles with equal probability. However,f noise is present, it will dominate the alignmenthoice, combining constructively to produce symme-ry breaking artifacts in the final reconstruction.enerally, with twofold symmetry, this will causene half of the protein to have much higher meanensity than the other half. The lower the noiseevels are in the individual particle images, the lesshis effect will be observed, but in general, theorrect symmetry must be applied to obtain a reli-ble reconstruction from noisy data. The sectionelow, entitled ‘‘Symmetry Determination’’ discussesne method for evaluating the symmetry of a modelith unknown symmetry.For asymmetric particles, those with C2 symmetry

r those for which the routines with imposed symme-ry do not work well, a generic model-generatingoutine is provided. This routine begins by generat-ng a set of reference-free class averages. That is,articles that appear to be similar to one another arerouped together, and then the particles within eachroup are mutually aligned and averaged. Thisenerates a class average for each group, whichhould represent one characteristic view of the par-icle. This routine has several user-defined param-ters, which may be adjusted to obtain the bestossible distribution of class averages. Assuming theistribution of particle orientations is sufficientlyiverse, the set of averages will represent most of theossible particle orientations. Several of these aver-ges are then selected manually for use in generat-ng a 3D model. A Fourier common-lines routine issed to determine the relative orientations of all ofhe selected averages, which are then combined to

enerate a 3D model. a

For objects known to have icosahedral symmetry,MAN uses a very fast initial model generationoutine. This routine searches the complete particleet for particles with the best five-, three-, andwo-fold symmetries. There is, of course, no guaran-ee that particles with these precise views will existn the data set, but this is generally not a problem.or each symmetry, EMAN will use the particles

hat come closest to having the desired symmetry.he quality factor for each particle is measured byalculating the dot product between the particle andtself after an appropriate rotation. Once the par-icles for each symmetric axis have been determined,ach group of particles is mutually aligned andveraged to generate three characteristic class aver-ges. These three views are then used to build areliminary 3D model. This model is, naturally, veryoisy and visually may appear quite different fromhe final refined model. It is, however, usually suffi-ient for use as an initial model for refinement.For objects with Cn symmetry (a single axis of

otational symmetry), where n . 2, a procedureimilar to the icosahedral procedure is used. First,he particles are searched for views with good Cn

ymmetry. Then a second search is performed forarticles with a mirror or pseudo-mirror symmetrynd a poor Cn symmetry. That is, particles that haveoth a good mirror symmetry and the worst possiblen symmetry are located. If we consider the projec-

ions with Cn symmetry to be ‘‘top’’ views of thearticle, then this second group represents possibleide views of the particle. Macromolecules with evenwill have a true mirror symmetry. Macromoleculesith odd n will still typically have a pseudo-mirror

ymmetry in the side views. Once top-view particlesnd side-view particles have been located, they areligned and averaged to generate class averages ofhe top and one side view of the particle. Since theroup of side-view particles may contain a variety ofide views, one dominant side view from the set iselected by the alignment routine. A preliminary 3Dodel is then constructed from these two orthogonal

iews. As expected, the quality of this model will beery poor; however, we have found it to be sufficiento achieve convergence in several test cases with realata and one test with simulated data. While thisethod is very fast, it is not robust. It may not

rovide an adequate starting model in every case.owever, it is sufficiently fast that it is worth trying

or any molecule with a known or suspected Cn

ymmetry.Naturally, other software packages can also be

sed at this point to generate an initial model.MAN can read models in a variety of formats, sother packages that provide, for example, multivari-

te statistical analysis techniques for low-symmetry

pFpa

R

Xtgmmigcwidwstsnscpc

tatlanbtFbchtabm

ESeltbnIr

muf

85EMAN: SINGLE-PARTICLE RECONSTRUCTION SOFTWARE

articles (Frank et al., 1996; van Heel et al., 1996) orourier common-lines techniques for icosahedralarticles (Crowther, 1971) may be used to generaten initial model for subsequent refinement in EMAN.

efinement Loop

As with other structural techniques like NMR and-ray crystallography, the experimental data in elec-

ron cryomicroscopy cannot be directly inverted toenerate an optimal 3D model. In cases like this, theost commonly used technique is iterative refine-ent. In this technique, a rough preliminary model

s iteratively refined against the data. True conver-ence is achieved when the model remains un-hanged for several successive iterations. In EMAN,e use a less restrictive definition of convergence,

ncluding the noise level of our initial data in theefinition. In this definition, convergence is achievedhen the FSC (Fourier shell correlation) between

uccessive iterations stabilizes. This is not to sayhat the FSC between iterations must fall below apecific value. Indeed, if the data are sufficientlyoise-free, the FSC may remain above 0.5 at allpatial frequencies. Rather, we require that the FSCurve between successive iterations cease to im-rove, no matter what numerical values the curvesontain. Small, resolution-dependent variations be-

FIG. 1. A flow diagram outlining the reconstruction process inodel generation, and model refinement. The boxed particles aresing the same set of particles. The refinement loop is iterated unt

rom projections indicates use as references only.

ween iterations due to high noise levels are unavoid-ble, even when the model has stabilized. However,his fact can be used to our advantage. Once ourimited definition of convergence is satisfied, anydditional changes between models are due solely tooise present in the images. This makes the FSCetween successive iterations a rough measure ofhe resolution of the model. The point at which theSC curve begins to fall is the point at which noiseegins to strongly affect the reconstruction. Typi-ally, the resolution at which the FSC curve fallsalfway to its minimum value will correspond roughlyo the final resolution of the model as determined byt-test. This is not a robust measure of resolution,

ut is useful for preliminary estimates as the refine-ent progresses.While many of the individual techniques used in

MAN are conceptually similar to those used byPIDER (Frank et al., 1996) and IMAGIC (van Heelt al., 1996), the overall refinement algorithm, out-ined in Fig. 1, is different than that used by either ofhese packages. EMAN uses particle classification,ut the classes generated in the refinement loop areot reference-free, as they generally are in anMAGIC reconstruction. The projection matchingoutine used by EMAN to classify particles is similar

. Reconstruction occurs in three phases: particle selection, initialo generate a preliminary model, which is then refined iterativelyergence is achieved. CTF correction (in grey) is optional. Grey line

EMANused til conv

tt

FaoeptSbTecs

ptpwDrstpuhab

pawdsbsipcottaagFrmgtptaNO

aaituhTiautegTdbrrcta

acrsabapbuaitppsaarcebTamacht

trr

86 LUDTKE, BALDWIN, AND CHIU

o one commonly used Euler-angle determinationechnique in SPIDER.

The refinement loop is outlined in the right half ofig. 1. The refinement process begins by generatingset of N projections with uniformly distributed

rientations. These projections are used as refer-nces for classification of the raw particle data. Eacharticle will be associated with one of the projec-ions, resulting in N classes of raw particle images.everal techniques for classification are provided,ut the most robust is performed by classesbymra.his program translationally and rotationally alignsach particle to each reference image and thenalculates a dot product, which is used to rate theimilarity of the particle to each reference.A minimum number of projections are required to

rovide sufficient angular sampling for a reconstruc-ion at a given resolution. This minimum number ofrojections can be estimated in various ways, but theell-known pD/d approximation is adequate (whereis the size of the particle and d is the desired

esolution). In EMAN, the number of projections iselected by the user and may be increased to improvehe homogeneity of individual particle classes androvide better angular resolution. This allows theser to vary continuously between the extreme withigh angular resolution and very little averagingnd the extreme with very poor angular resolution,ut with very consistent, low-noise class averages.The next step is generation of class averages. The

articles within a class must be mutually alignednd then averaged. While each particle is groupedith the projection it most closely resembles, thisoes not mean all of the particles necessarily repre-ent usable data for this orientation. Particles maye partially denatured, in a different functionaltate, radiation damaged, or simply in a patch of badce. If a large angular spacing is used betweenrojections, individual ‘‘good’’ particles may also lookonsiderably different due to small differences inrientation. We do not wish to eliminate particles inhis second class, since they represent true data forhis macromolecule, and the ‘‘blurring’’ effect gener-ted when they are averaged is expected. Thislgorithm attempts to determine which particles areood and which are bad as they are mutually aligned.or this procedure to succeed there should be aeasonable number of images in the class (,10 orore) and more than half of the particles must be

ood. Identification of bad particles is accomplishedhrough a simple iterative procedure. First, eacharticle is aligned to the class average that was usedo generate the class. The particles are then aver-ged together to generate an initial class average.ext, each particle is aligned to this initial average.

nce all of the particles have been aligned to the q

verage, a histogram of similarity between the aver-ge and each aligned particle is generated. Themages represented by the tail of this histogram arehe least similar to the class average and are notsed in generating the new average. They are,owever, allowed to participate in the next iteration.his process is iterated several times. The number of

terations and how similar a particle must be to theverage to be included in the new average areser-defined parameters. Eventually a self-consis-ent class average is produced. Note that the refer-nce projection is used only to help speed the conver-ence and is used only in the first round of alignment.he final average may actually appear considerablyifferent than the initial projection the class wasased on. In fact, this difference is the primaryeason the overall refinement loop converges soapidly. If CTF correction is enabled, amplitudeorrections are performed automatically as the par-icles within a class are averaged together (see CTFnd Envelope Function Corrections).Generally, the particle classification routine does

n excellent job of assigning particles to the correctlass. However, in some cases the low signal-to-noiseatio in the electron micrographs will lead to misas-ignment of some particles. In most cases this is notproblem, and the incorrectly assigned particles wille eliminated by the alignment/averaging procedurebove. In cases where one orientation is stronglyreferred the problem may be more severe. We beginy considering the case where the orientations areniformly distributed. In this case, the correctlyssigned particles in each class will outnumber thencorrectly assigned particles. However, if one orien-ation is strongly preferred, with say 100 times morearticles in a particular orientation, the misassignedarticles may actually outnumber the correctly as-igned particles within some classes. The alignment/veraging algorithm will still produce a consistentverage in this case, but the resulting average willepresent the preferred orientation rather than theorrect orientation. This problem is relatively rare,ven in cases where a preferred orientation exists,ut it will have serious consequences if it does occur.o deal with this potential problem, once the classverages have been generated, their Euler anglesay be redetermined by projection matching. This

ssignment will be more accurate than the initiallassification since the class averages have a muchigher signal-to-noise ratio than the individual par-icles.

The class averages with assigned Euler angles arehen used to construct a new 3D model for the nextound of refinement. EMAN uses a Fourier spaceeconstruction routine, which is extremely fast and

uite robust for most systems. This is a conceptually

saiountFt

ort1swraptsFb(mqngptt

S

tttadbupmalaampcsca

id

snubcgrTthbtp

mstpctatdprfarstssuimfpoa

C

rooaoim1snpsd

87EMAN: SINGLE-PARTICLE RECONSTRUCTION SOFTWARE

imple method. Since the orientation of each classverage is known, each class average is simplynserted into the Fourier volume in the correctrientation, passing through the center of the vol-me. Each inserted class average is weighted by theumber of particles used to generate it. When all ofhe class averages have been averaged into theourier volume, an inverse Fourier transform yieldshe reconstructed model in real space.

Once the new 3D model has been generated, theverall process is iterated. The convergence of theefinement process can be observed by examininghe FSC between the 3D models (Harauz and Heel,986) generated in each iteration. This curve willtabilize once the refinement loop has converged. Itill also generally give a good estimate of the

esolution of the 3D models as described above. Andditional resolution measurement can be made byerforming a simple two-way T test: the raw par-icles are split in half and each half is reconstructedeparately. There are different definitions for theSC cut-off that defines the resolution of the model,ut we prefer the more conservative 0.5 FSC cut-offBottcher et al., 1997). This method may underesti-ate the reliability of the model somewhat, since the

uality of the class averages in EMAN dependsonlinearly on the number of particles used toenerate each class. That is, dividing the number ofarticles in a class by 2 may generate a class averagehat due to alignment limitations, may be worsehan simply using half the particles would suggest.

ymmetry Determination

A prerequisite for determining an accurate struc-ure using single-particle analysis is knowledge ofhe particle’s symmetry. If there is no prior informa-ion on the symmetry of the particle, EMAN provides

procedure for making a preliminary symmetryetermination. The raw particle data are classifiedy generating a set of rotational invariants andsing a standard k-means iterative classificationrocedure. The particles in each class are thenutually aligned and averaged, using the same

lignment routine described above in the refinementoop. This produces a set of reference-free classverages. Generally, visual inspection of these classverages will give a good idea of the symmetry of theodel. For a quantitative assessment, a routine is

rovided to examine the symmetry quality of eachlass average for each possible simple rotationalymmetry, C2–C14. Multivariate statistical analysisan also be applied at this point to provide somedditional insight.In particularly difficult cases, rather than apply-

ng this technique to high-resolution, low-contrast

ata, far from focus micrographs can be used in- l

tead. These data will have a very high signal-to-oise ratio at low resolution, but contain little or nosable high-resolution information. These data cane used in the above procedure and may even bearried through a complete 3D reconstruction. Theoal of this exercise is to rapidly generate a low-esolution model with a high degree of confidence.his model will generally be sufficient to determine

he symmetry. The symmetry may be broken if aigh-resolution reconstruction is performed later,ut for a new particle with very little initial struc-ural information, this technique provides a goodlace to start.Once the symmetry has been tentatively deter-ined, a 3D reconstruction is performed with this

ymmetry imposed. Two methods are then used toest the accuracy of the applied symmetry. First,rojections of the final model are compared with theorresponding class averages. If the assigned symme-ry is too high, several class averages will not matchny of the projections. If the assigned symmetry isoo low, say C3, when the true symmetry is C6,uplicates of each class will appear in the set ofrojections. In the second test, the symmetry in theefinement loop is relaxed, and the refinement is runor several additional iterations. If the symmetryssignment was accurate, the refined model willemain essentially unchanged. Generally, some smallymmetry-breaking mass movement will occur inhis process, but the overall appearance of the modelhould still maintain its original symmetry. If theymmetry assignment was incorrect, the model willndergo dramatic changes in the first or second

teration after the symmetry is relaxed. Since refine-ent progresses much more rapidly and requires

ewer particles with some applied symmetry, thisrocedure is more robust than the alternate methodf starting with no applied symmetry and thenpplying the predicted symmetry in the second step.

TF and Envelope Function Corrections

The goal of single-particle reconstruction is toegenerate the correct three-dimensional structuref a molecule based on two-dimensional projectionsf the molecule. Unfortunately, the images gener-ted by electron microscopes are not true projectionsf the specimen. They suffer from a set of artifactsncluding the CTF and the envelope function of the

icroscope (Erickson and Klug, 1970; Hanszen,971). In addition, noise is present from a variety ofources. If we assume that astigmatism and drift areegligible, these effects are all isotropic. The CTF inarticular can cause serious artifacts in a 3D recon-truction. Without CTF correction, the model pro-uced by a reconstruction may contain significant

ocal mass displacements. The high-resolution struc-

toehcoris

l(eiwpdfvC1q

dtcsraateospffictdts

iwwfsfiestet

ecfifm

wCramsinsTe

wt(tfb

P

acddgldriicfiw

a

88 LUDTKE, BALDWIN, AND CHIU

ure will generally be severely distorted, and the lackf low-resolution amplitude correction can causeffects such as causing a solid object to appearollow. The severity and type of these effects, ofourse, depend on many factors, the most significantf which is the defocus of the data used in theeconstruction. Nonetheless, if a reconstruction thats truly representative of the real structure is de-ired, CTF correction is crucial.CTF correction has been routine in electron crystal-

ography and helical reconstructions for some timeAmos et al., 1982; Crowther et al., 1996; Hendersont al., 1986; Jeng et al., 1989; Mimori et al., 1995), butt is only now becoming widespread in single-particleork. This is principally due to the difficulty inerforming this correction on noncrystallographicata. Several groups have proposed and/or per-ormed CTF correction on single-particle data toarious degrees in the past (Bottcher et al., 1997;onway et al., 1997; Zhou et al., 1998; Zhu et al.,997), but the existing correction software is stilluite difficult to use.CTF correction can be performed with varying

egrees of completeness, ranging from simple trunca-ion of the data at the first zero crossing of the CTF toomplete amplitude and phase correction performedimultaneously on multiple data sets. The EMANeconstruction procedure incorporates complete CTFmplitude and phase correction that is nearly fullyutomated. A graphical utility, ctfit, allows the usero determine the CTF parameters for the particles inach micrograph. Multiple micrographs with a rangef defocus values should be used to sample Fourierpace as uniformly as possible. Ctfit stores the CTFarameters in the header of each particle and per-orms phase correction. The complete set of particlesrom the entire set of micrographs is then combinednto a single file for reconstruction. CTF amplitudeorrection is then performed transparently as part ofhe reconstruction procedure. To provide a completeescription of how EMAN performs CTF correctionhe mathematical methodology will be discussed inome detail.The envelope function and CTF are best examined

n Fourier space. At this point in the discussion, weill ignore the effects of drift and astigmatism,hich cause asymmetries in the CTF and envelope

unctions. The Fourier transform of the true 3Dtructure of the molecule is called the structureactor of the molecule, F(s, u, f), where the overbarndicates a complex valued function. As discussedarlier, in Fourier space, a projection of the 3Dtructure is represented by a slice passing throughhe origin in Fourier space. This means CTF andnvelope functions that are circularly symmetric in

he images are also spherically symmetric when

xtended to 3D. This is a crucial point if we wish toorrect for the CTF during a reconstruction. Thenal goal of the reconstruction is to produce F(s, u,). The data measured in a transmission electronicroscope can be described by

M(s, u, f) 5 C(s)E(s)F(s, u, f),

here M(s, u, f) is the measured data, C(s) is theTF, E(s) is the envelope function, and N(s, u, f)

epresents random noise with a consistent spectralmplitude profile. If we wish to obtain F(s, u, f), weust first know C(s) and E(s). N(s, u, f) cannot be

ubtracted directly, since all we know is its meanntensity. We rely on averaging to reduce the meanoise level, which we monitor by examining theignal-to-noise ratio as the reconstruction progresses.o determine C(s) and E(s), it is more convenient toxamine the rotationally averaged power spectra

M(s)2 5 F(s)2C(s)2E(s)2 1 N(s)2,

here N(s)2 is the mean noise intensity and F(s)2 ishe one-dimensional structure factor of the moleculerotationally averaged power spectrum). To performhe 3D correction, we must determine C(s) and E(s)or each micrograph. To properly weight the dataetween micrographs, we must also know N(s).

arameter Determination

We know M(s) for each micrograph and C(s), E(s),nd N(s) can all be parameterized based on theoreti-al or empirical models as shown below. In two-imensional crystals or helical arrays, N(s) can beetermined independently by examining the back-round between the crystallographic peaks or layerines. In single-particle processing, the signal isistributed continuously throughout Fourier spaceather than in discrete peaks, making this approachmpossible. However, since C(s) varies sinusoidally,nformation about N(s) can be obtained from the zerorossings of C(s). Even with this fact, the overalltting problem is impossible for a single data setithout first determining F(s).The functional forms used for C(s), E(s), and N(s)

re

C(s) 5 A(Î1 2 CA2 sin (g) 1 CA cos (g)),

where g 5 2p1Csl3s4

41

DZls2

2 2 , 0 # CA # 1

E(s) 5 e2Bs2

2 n21n3s21n4Îs

N(s) 5 n1e .

TAcoshsbmsdwtAeehtcnidmtmbcaimwet

cgastio1afTXcappeuntd

oiacdfspsmfpsengca

gbfTwvssdcdulcepocnaaawrftmoFt

P

asc

89EMAN: SINGLE-PARTICLE RECONSTRUCTION SOFTWARE

his gives a total of eight parameters used for fitting:, CA, DZ, B, and n1–4. The model for N(s) isompletely empirical and encompasses a wide rangef different physical effects, including incoherentcattering, film noise, and scanner noise. This modelas worked well with data from four different micro-copes (JEOL 1200EX, 4000EX, 2010F, and 3000SFF),ut could potentially require modification for othericroscopes. In particular, we anticipate micro-

copes with energy filters to produce a substantiallyifferent noise distribution. C(s) is provided by theell-known weak phase approximation with ampli-

ude contrast corrections (Erickson and Klug, 1970).five-term theoretical expression exists for E(s),

ncompassing microscope parameters as well asffects such as specimen movement. In practice,owever, a simple Gaussian is experimentally indis-inguishable from this aggregate expression in mostases. It is also worth noting that there are currentlyo parameters to compensate for astigmatism. Ctfit

s capable of measuring astigmatism angle andefocus difference, but no corrections are currentlyade for this effect. As single-particle reconstruc-

ions approach atomic resolution, however, the astig-atism present even in excellent micrographs may

ecome significant. EMAN can be easily modified toompensate for this, but a considerable speed pen-lty would be imposed. In addition, specimen charg-ng will induce apparent defocus changes and astig-

atism within a single micrograph, and beam tiltill cause resolution-dependent phase shifts. These

ffects are ignored at present, beyond avoiding par-icles in micrograph areas with obvious charging.

As mentioned above, before C(s), E(s), and N(s)an be determined accurately from individual micro-raphs, we must determine F(s)2. Once this has beenccomplished, parameter determination becomes atraightforward fitting problem. We will discusshree methods for determining F(s)2. The first methods to perform a solution X-ray scattering experimentn the macromolecule under study (Schmid et al.,999; Thuman-Commike et al., 1999). This providesn isotropically averaged one-dimensional structureactor that will be nearly identical to F(s)2 above.here may be discrepancies due to the differences in-ray and electron scattering cross sections for spe-ific charge distributions (Mitsuoka et al., 1999). Inddition, the solution scattering experiment willrovide a structure factor that is completely isotro-ic. If the orientations of the single particles in thelectron cryomicroscopy experiment are not distrib-ted uniformly, the structure factors may containoticeable differences. Even in this case, however,he structure factor will generally be sufficient toetermine C(s), E(s), and N(s).

2

The second method for determining F(s) is to take s

ne or more micrographs of the sample in ice depos-ted on a layer of continuous carbon. By selectingreas of the micrograph containing only continuousarbon, the CTF parameters can be determinedirectly by assuming a one-dimensional structureactor for the carbon film (approximated as a con-tant for low to intermediate resolutions). Thesearameters can then be used to determine thetructure factor of the macromolecule in the sameicrograph. That is, we determine C(s) and E(s)

rom the carbon film alone and then use thesearameters with the macromolecule data in theame micrograph to determine F(s)2 of the macromol-cule. Of course, this F(s)2 will be very inaccurateear the zeros of the CTF in an individual micro-raph, so an average F(s)2 must be determined byombining the information from several micrographst different defocus settings.The final method for determining F(s)2 is the most

eneral and requires no additional data collection,ut is the most difficult to implement. F(s)2 is aunction solely of the macromolecule being studied.hat is, it is the portion of the data that is constanthen microscope parameters such as defocus arearied. When data taken at multiple defocuses are fitimultaneously, this fact provides an additional con-traint, which makes the fit feasible. As long as theistribution of particle orientations remains fairlyonsistent, any micrographs of the same protein willo. In the most general case, this would still be annderdetermined fitting problem. However, the oscil-

atory nature of the CTF combined with a monotoni-ally decreasing envelope function provides the nec-ssary additional constraints to make the fittingroblem converge. After the fit has been performednce, with several micrographs, the resulting F(s)2

an be used to determine C(s), E(s), and N(s) for anyumber of additional micrographs. While relativemplitudes and B-factors can be determined, there islways an arbitrary overall scaling factor and anrbitrary overall B-factor that cannot be determinedithout a known reference. Currently this technique

equires some manual fitting to be performed, but aully automated solution may be possible. Despitehe difficulties, we have successfully applied thisethod to several problems in which results from

ne of the other methods were also available, and the(s)2 determined by the two methods matched ex-

remely well.

erforming the Corrections

Once the parameters have been determined, thectual CTF correction is performed when aligned 2Dingle-particle images are averaged to generate alass average. The averaging is performed in Fourier

pace. The basic method is to perform a weighted

asssafniddpstmwcraotspv

wsda

Wmtr

Tcsoiintccoa

st

vDptorwodduoref

rfiHstrirt

rpepfddrttacjmCtaFwzbw(p

90 LUDTKE, BALDWIN, AND CHIU

verage of the images, where the weights vary withpatial frequency. We wish to choose the weightsuch that data are used optimally; that is, theignal-to-noise ratio is maximized in the final aver-ge image at all spatial frequencies. The parametersor C(s), E(s), and N(s) provide a measure of signal-to-oise ratio as a function of spatial frequency for each

ndividual particle. Since F(s) is the same in all of theata for a given particle, we can eliminate it andefine the relative signal-to-noise ratio for eacharticle as Rn(n) 5 [Cn(s)2E n(s)2]/[Nn(s)2], where theubscript, n, denotes particle number. To simplifyhe expression, we assume that Nn(s) is approxi-ately the same between exposures. The solution inhich Nn(s) is allowed to vary is somewhat more

omplex, but is easily derived. Note that some inaccu-acy in Nn(s) will not have a significant effect on theccuracy of the results; it will simply cause slightver- or underweighting of certain particles. In prac-ice, this effect is negligible for data collected on aingle microscope. The final class average is ex-ressed as a simple linear combination of the indi-idual aligned images

T(s, u) 5 on

kn(s)Mn(s, u),

here s and u represent polar coordinates in Fourierpace, and kn(s) are the weighting coefficients to beetermined. The signal-to-noise ratio of the classverage, T(s,u) is then just

RT(s, u) 5 on

kn(s)Rn(s, u).

e wish to determine kn(s) such that RT (s) is maxi-ized at all s, Sn Cn(s)En(s)kn(s) 5 1 at all s and that

he CTF and envelope functions are corrected. Theesult of this maximization is

kn(s) 51

Cn(s)En(s)

Rn(s)

om

Rm(s)5

Cn(s)En(s)

om

Cm(s)2Em(s)2.

hat is, for an optimal class average, the weightingoefficients, k, are proportional to Rn(s), the relativeignal-to-noise ratio within each image. This method-logy makes optimal use of the available informationn all of the images. When an image contains nonformation at a particular spatial frequency, it doesot contribute to the final image. In addition, thisechnique is relatively insensitive to small inaccura-ies in the CTF model and/or parameters. The typi-al effect of fitting inaccuracies would be slightlyver- or underweighting a particular image when

veraging. Since all the images represent the same w

tructure, this simply causes a slight reduction inhe statistical definition of the result.

TESTING

Each routine in EMAN has been subjected to aariety of tests, with both real and simulated data.iscussing all of these tests would make this paperrohibitively long, so we will limit this discussion towo of the most fundamental tests. First, a basic testf the 3D projection/reconstruction routines at highesolution is described, since this procedure is some-hat novel for the single-particle community. Sec-nd, a complete reconstruction test using simulatedata is presented. EMAN has successfully repro-uced and, in fact, improved several reconstructionssing real data as well. Again, to limit the complexityf this paper, these results will be published sepa-ately. All of the data and specific program param-ters used in the tests described here are availableor download with the EMAN distribution.

EMAN uses a direct Fourier space reconstructionoutine, whereas most reconstruction packages useltered real-space methods (Frank et al., 1996; vaneel et al., 1996). The direct Fourier algorithm offers

everal advantages over typical real-space methods,he most important of which is speed. In many othereconstruction schemes, construction of a 3D models a time-limiting step. The described Fourier algo-ithm is one to two orders of magnitude faster thanypical back-projection methods.

To verify the robustness of EMAN’s Fourier algo-ithm, a simple projection/reconstruction test in theresence of high noise levels was performed. A 3Dlectron density map of a trimer of the outer shellrotein of the bluetongue virus (VP7) was generatedrom PDB data (1BVP). The PDB model contains aimer of trimers (Grimes et al., 1995). The electronensity map of a single trimer was generated at 3-Åesolution. Two thousand five hundred projections ofhis model were generated using real-space projec-ion with trilinear interpolation. Noise was thendded to each projection at typical levels for electronryomicroscopy. Figure 2 shows representative pro-ections with and without noise and a plot of the

ean signal-to-noise ratio in the noisy particles. NoTF or envelope function was applied in the first

est. All 2500 projections with predefined Eulerngles were then used to generate a 3D model. Aourier space reconstruction routine was employedith 2 3 2 3 2 voxel Gaussian interpolation and 20%

ero padding. For quantitative comparison, the FSCetween the original and the reconstructed modelsas calculated as a function of spatial frequency

Fig. 3, black line). Naturally, with the added noise, aerfect reconstruction is not possible. Some noise

ill always be present in the final model. To provide

aomrtsistd

goassoC

t

twps

91EMAN: SINGLE-PARTICLE RECONSTRUCTION SOFTWARE

standard for comparison, the signal-to-noise ratiof the original particles was used to calculate theaximum FSC that could be obtained by an ideal

econstruction algorithm (Fig. 3, gray line). Whilehere is a one-to-one relationship between FSC andignal-to-noise ratio, the relationship contains anntegral with no analytical solution, so the conver-ion is calculated numerically. Note that this calcula-ion does not account for the numerical errors in-

FIG. 2. Particles used for both of the conformance tests descririmer from PDB data. (B) The same projections with added noiseith applied CTF, envelope function, and noise. Each particle sharticles used in the second test. (D) The mean signal-to-noisetructure factor of the macromolecule divided by the applied noise

uced by the trilinear interpolation used in s

enerating projections, so this curve actually slightlyverestimates the best possible FSC. Even withoutccounting for this affect, the Fourier space recon-truction routine performs extremely well. Alsohown in Fig. 3 are two representative views of theriginal and reconstructed models at 4-Å resolution.learly, the two models are nearly indistinguishable.In the second test, the overall reconstruction rou-

ine, including CTF correction, was tested with

the text. (A) A few raw projections of the bluetongue virus capsidd from those used in the first test. (C) Representative projectionsepresents a different simulated defocus. These are a few of then the individual images shown in B. The curve represents the

bed inselecteown rratio ilevel.

imulated data. The same BTV trimer was used as a

tpteeaad

uedpdfg

pr

92 LUDTKE, BALDWIN, AND CHIU

est model. In this test, five sets of 500 randomrojections were generated. To make the test realis-ic, we used CTF and noise parameters taken fromxperimental measurements made on a JEOL 2010Flectron microscope with a field emission gun oper-ted at 200 keV with an objective lens sphericalberration coefficient of 0.5 mm. Parameters were

FIG. 3. Results of the first test. A side view and a thin slice of thlot shows the FSC between the reconstruction and the origineconstruction routine with the given data set at the noise level sh

etermined from a measurement of carbon film fit r

sing ctfit. These parameters were then applied toach of the five data sets, with each set simulating aifferent defocus. Figure 2C shows representativerojections with noise and applied CTF at variousefocus settings. A reconstruction was then per-ormed as described above. An initial model wasenerated using the simple symmetry search algo-

view are shown for the original and the reconstructed models. Thedel. Also shown is the maximum FSC possible with an idealFig. 2.

e topal mo

own in

ithm for a single three-fold rotational symmetry.

TdrFtat

tcid

Aataaptcswed

s

sdasm

93EMAN: SINGLE-PARTICLE RECONSTRUCTION SOFTWARE

his model was then refined against the completeata set with CTF correction. Figure 4 shows theesults after zero to three refinement iterations. TheSC indicates a final resolution of ,5 Å, close to theheoretical limit with the applied envelope functionnd noise levels in the original particles. The struc-ures appear nearly identical at this resolution.

INDIVIDUAL PROGRAMS

EMAN contains a number of individual programshat are useful for general purpose electron micros-opy image processing. All of the necessary process-ng steps in the reconstruction procedure are embed-ed in a small number of straightforward commands.

FIG. 4. Results of the second test. In this test, a complete reconhows the FSC between the model after each iteration and themonstrates that the refinement converged by the fourth iteratifter each iteration are shown above the plot. The leftmost viewsearch algorithm described in the text. The rightmost model is th

odels were smoothed to 6 Å with a Gaussian filter.

graphical user interface is provided to allow rapidnalysis of the results and provides instructions toake a novice user through the reconstruction system-tically. For advanced users, lower level commands,s well as a complete set of C11 classes, arerovided to allow custom processing in cases wherehe general reconstruction procedures are insuffi-ient. For a typical macromolecule, a refined 3Dtructure can be generated from boxed out particlesith a sequence of three commands. Aside fromvaluating results, no user interaction is requireduring the refinement procedure.Despite its relative ease of use, generating a final

tructure and verifying its correctness are not en-

ion was performed with 2500 particles of simulated data. The plote’’ model used to generate the original particle data. The ploth a final resolution of ,4.5 Å. Representative views of the modelent the preliminary model generated by the automatic symmetrymodel. Models after iterations 1–3 are shown in the middle. All

structe ‘‘truon witreprese true

ttmmEettpoeb

igpicmcfpiprrcfvatiecit

cCpdseegWgi

gb3msd

pdfp

ihSsattto

fcwwmtwcfvsWdoo(ttptpts

etiiptrpsptqrE

94 LUDTKE, BALDWIN, AND CHIU

irely trivial. A novice user may, for example, overes-imate the quality of an early model of a newacromolecule or be unaware of the pitfalls thatay exist in making initial symmetry assumptions.ven with fairly automated software, user experi-nce will play a role in generating a robust model. Inhe future, EMAN will be expanded to include addi-ional routines to ease the analysis procedure androvide additional user guidance. The initial versionf the software is usable by novices, but they shouldxpect a considerable amount of trial and errorefore they begin obtaining robust models.The graphical user interface in EMAN is packaged

n several separate programs. Screenshots of tworaphical programs are shown in Fig. 5. The mainrogram, called eman, allows graphical browsing ofmages and command history. Every time an EMANommand-line program is executed, a log-file entry isade, recording the exact command parameters,

ommand run-time, and accessed files. This is per-ormed automatically with no additional effort on theart of the user. This command history can benteractively browsed and searched with eman. Thisrovides a mechanism for recording the history of aeconstruction, as well as allowing the user to easilyerun commands with small variations. Eman alsoontains a complete image browser and specific toolsor evaluating the results of a reconstruction. Indi-idual 2D or 3D image files can be viewed directly inwindow in eman with interactive brightness, con-

rast, and inversion controls. Additionally, whenntermediate files from a reconstruction are viewed,man provides a variety of useful features includingomparison of projections with class averages andndividual images in a class with reference projec-ions.

As mentioned above, EMAN contains a programalled ctfit, which allows parameters necessary forTF correction to be interactively determined. Thisrogram also acts as a complete CTF simulator foretermining good defocus values for future micro-cope sessions and comparing parameters from differ-nt microscopes. It also has the ability to apply CTF,nvelope function, and/or noise to a set of images toenerate simulated particles for testing purposes.ith a 3D model, this can be used to predict how a

iven particle will appear visually on a microscope ince.

Helixhunter and sheethunter are nongraphical pro-rams that facilitate the location of alpha helices andeta sheets in 4- to 8-Å reconstructions. Proc3d is aD processing program that can perform most com-on image processing operations on volume data,

uch as filtering, centering a model, calculating a

ifference map, and scaling. In addition, it can take p

airs of models and calculate Fourier shell phaseifference, Fourier shell correlation coefficients, R-actors, etc. Align3d is provided to automaticallyerform rotational alignment of two 3D models.For isosurface visualization, an excellent solution

s Vis5D (http://www.ssec.wisc.edu/,billh/vis5d.tml), a freely available program written by thepace Science and Engineering Center at the Univer-ity of Wisconsin, Madison. EMAN includes mrc2v5dnd mrc2v5dt, utilities to convert 3D MRC models tohe format used by Vis5D. Vis5D provides extensiveools for visually comparing models, generating con-our slices, creating small animations, and dozens ofther useful features.EMAN is provided completely free of charge with

ull C11 source. No modification is necessary forompilation on SGIs or Linus based machines. It wasritten with portability in mind and should compileith relatively little effort on other Unix basedachines. Graphical programs were written using

he freely available QT toolkit (http://www.troll.no/),hich compiles on virtually all Unix based ma-

hines. Fourier transforms are performed with theree FFTW library (http://www.fftw.org/), which pro-ides performance comparable to that of machine-pecific optimized code (http://www.fftw.org/benchfft/).hile the MRC and IMAGIC file formats are used by

efault, EMAN can also read and write a variety ofther formats, including SPIDER, TIFF (currentlynly 8 bit), GIF, etc. Machine byte order swappingMSB first vs LSB first) is handled automatically byhe image reading functions. Users are encouragedo modify our examples and write their own utilityrograms. Any useful programs may be submitted tohe NCMI for inclusion in future releases of theackage. Complete documentation, sample data, andhe software itself can be obtained on the NCMI Webite: http://ncmi.bcm.tmc.edu/.

DISCUSSION AND CONCLUSIONS

EMAN was written with the ambitious goal ofventually generating 3- to 5-Å resolution reconstruc-ions from single-particle data. The largest problemn generating such a high-resolution reconstructions the rapid increase of the number and size ofarticles such as reconstruction requires. Clearly,he amount of image data scales as the square of theeciprocal of the resolution. That is, the individualarticles for a 5-Å reconstruction will be twice theize and contain four times as much data as thearticles used for a 10-Å reconstruction. Not only dohe individual particles become larger, but the re-uired angular accuracy scales linearly with theeciprocal of the resolution. Since there are twouler angles, this means the number of reference

rojections used for Euler angle determination also

c

95EMAN: SINGLE-PARTICLE RECONSTRUCTION SOFTWARE

FIG. 5. Sample screenshots of two of the graphical programsarbon film images. Eman is shown in four of its many different di

included with the EMAN package. Ctfit is shown performing a fit of twosplay modes.

strp

irdbnpnlovotgoBrItfmrt

phfrtdeti,eca

hwmsahsatallSm

acsnnrrdsrhbmlrearuompem

rcvtfiapwretgttsmru(d

fprabb

96 LUDTKE, BALDWIN, AND CHIU

cales as the square of the reciprocal of the resolu-ion. This factor does not cause an increase in theequired data, but does cause an increase in therocessing time for a reconstruction.In addition to these two factors, we must also take

nto account the increase in number of particlesequired due to the low signal-to-noise ratio in theata. There is a direct correspondence between FSCetween two independent data sets and signal-to-oise ratio. This means that the number of requiredarticles is inversely proportional to the signal-to-oise ratio at the desired resolution. Since the enve-

ope function causes the signal-to-noise ratio to fallff as e22Bs2, the number of particles required risesery rapidly beyond s 5 Î1/B. This makes the qualityf the microscope imaging system extremely impor-ant. For example, with images with B 5 200 Å2,oing from 20- to 10-Å resolution will require a factorf 20 increase in the number of particles. However, if

5 100 Å2, the same resolution improvement willequire only a factor of 4 increase in particle count.n addition, the signal-to-noise ratio is also propor-ional to the structure factor of the particle. Asiderom localized peaks, the structure factor of mostacromolecules falls off rapidly from 100 to 10 Å,

equiring a further increase in the number of par-icles (Thuman-Commike et al., 1999).

Combining these factors, we immediately see thatrocessing becomes a limiting factor very rapidly atigh resolution. Even if we discount the structureactor, a 5-Å reconstruction with B 5 50 Å2 data willequire approximately 300 times more computa-ional time than a 10-Å reconstruction with the sameata quality. If the 10-Å reconstruction required, forxample, a day of processing time, the 5-Å reconstruc-ion would take nearly a year of continuous process-ng. Improving B to 25 Å2 would reduce this time to2–3 months. Even so, improvements in processing

fficiency and availability of parallel processing areritical for higher resolution reconstructions to bechieved.EMAN is already performing reconstructions in 24that previously required a month of user-intensiveork. Several novel techniques for rapid projectionatching promise another order of magnitude in

peed improvement in the near future. In addition,ll of the time-limiting steps in the reconstructionave been parallelized for use on shared-memoryupercomputers. Single-particle reconstruction is ide-lly suited to parallel processing, since in most stepshe particles or classes can be treated independently,llowing very coarse-grained parallelization. Paral-el code is currently implemented using the pthreadsibrary supported by virtually all UNIX machines.upport for distributed processing on clusters of

achines should exist by the next release. r

One assumption that is frequently made whenpproximating the feasibility of high-resolution re-onstructions is that data with an arbitrarily smallignal-to-noise ratio can be recovered if a sufficientumber of particles are averaged together. Unfortu-ately, this is not the case. When the signal-to-noiseatio falls below a particle specific value, alignmentoutines will no longer be able to accurately align theata at high resolution. If the alignment is notufficiently good, the data at the correspondingesolution will no longer add coherently, no matterow many data are provided. A very rough estimate,ased on our experience with EMAN and variousicroscopes, any data with a signal-to-noise ratio

ess than ,0.02 will be unrecoverable. With thisequirement and typical dose conditions of ,102/Å2, we estimate that B < 2d 2 provides a reason-ble estimate of the largest B capable of providing aeconstruction at a given resolution, d, given annlimited number of particles. This approximation,f course, ignores the structure factor of the macro-olecule, which can have a significant effect. Ctfit

rovides the tools necessary to make a more precisestimate for a specific macromolecule on a specificacromolecule on a specific microscope.Another factor to consider in performing high-

esolution reconstructions is CTF correction. Whenollecting data for reconstructions beyond 10 Å, it isirtually impossible to avoid multiple oscillations ofhe CTF. Pictures would have to be taken so close toocus that there would be virtually no contrast in themages, and low-resolution information would belmost completely absent. A more reasonable ap-roach is to combine further from focus images,hich provide much greater total signal-to-noise

atio, with a robust CTF correction algorithm. How-ver, even with a robust algorithm, there are limitso the defocus that can be successfully used at aiven resolution. When the defocus becomes suchhat multiple CTF oscillations occur from one pixelo the next at high spatial frequency in Fourierpace, reliable corrections can no longer be made. Inost cases, this turns out to be a relatively modest

equirement. For example, for the 200-keV scopesed in earlier examples, defocuses as high as 1.5 µm

first zero of the CTF at ,20 Å) should still produceata correctable to 5–6 Å.In summary, EMAN provides the tools necessary

or achieving the next level of resolution in single-article reconstructions, while also allowing low-esolution reconstructions to be performed rapidlynd routinely. As data with B 5 50 Å2 or betterecome available, single-particle reconstructions atelow 10 Å resolution should become routine. As

esolution improves, EMAN will continue to be im-

pp

mpP(FB

A

B

B

C

C

C

E

F

F

F

G

G

H

H

H

H

J

L

M

M

M

O

S

S

T

T

v

v

W

Z

Z

97EMAN: SINGLE-PARTICLE RECONSTRUCTION SOFTWARE

roved to provide an up-to-date tool for single-article reconstructions.

We thank Joanita Jakana and Irina Serysheva for providinguch of the experimental data used when writing and testing the

ackage. This project was supported in part by Grants41RR02250, F32AR08474 (to S. J. Ludtke), NLM2T15LM07093

to P. R. Baldwin), and RO1AR41729, by the Robert A. Welchoundation, and by the W. M. Keck Center for Computationaliology.

REFERENCES

mos, L. A., Henderson, R., and Unwin, P. N. (1982) Three-dimensional structure determination by electron microscopy oftwo-dimensional crystals, Prog. Biophys. Mol. Biol. 39, 183–231.

aker, T. S., and Cheng, R. H. (1996) A model-based approach fordetermining orientations of biological macromolecules imagedby cryoelectron microscopy, J. Struct. Biol. 116, 120–130.

ottcher, B., Wynne, S. A., and Crowther, R. A. (1997) Determina-tion of the fold of the core protein of hepatitis B virus by electroncryomicroscopy, Nature 386, 88–91.

onway, J. F., Cheng, N., Zlotnick, A., Wingfield, P. T., Stahl, S. J.,and Steven, A. C. (1997) Visualization of a 4-helix bundle in thehepatitis B virus capsid by cryo-electron microscopy, Nature386, 91–94.

rowther, R. A. (1971) Procedures for three-dimensional recon-struction of spherical viruses by Fourier synthesis from electronmicrographs, Philos. Trans. R. Soc. London B 261, 221–230.

rowther, R. A., Henderson, R., and Smith, J. M. (1996) MRCimage processing programs, J. Struct. Biol. 116, 9–16.

rickson, H. P., and Klug, A. (1970) The Fourier transform of anelectron micrograph: Effects of defocusing and aberrations, andimplications for the use of underfocus contrast enhancement,Philos. Trans. R. Soc. London B 261, 105–118.

rank, J. (1989) Image analysis of single macromolecules, Elec-tron Microsc. Rev. 2, 53–74.

rank, J., Radermacher, M., Penczek, P., Zhu, J., Li, Y., Ladjadj,M., and Leith, A. (1996) SPIDER and WEB: Processing andvisualization of images in 3D electron microscopy and relatedfields, J. Struct. Biol. 116, 190–199.

rank, J., and Wagenknecht, T. (1984) Automatic selection ofmolecular images from electron micrographs, Ultramicroscopy12, 169–176.abashvili, I. S., Agrawal, R. K., Grassucci, R., and Frank, J.(1999) Structure and structural variations of the Escherichiacoli 30 S ribosomal subunit as revealed by three-dimensionalcryo-electron microscopy, J. Mol. Biol. 286, 1285–1291.rimes, J., Basak, A. K., Roy, P., and Stuart, D. (1995) The crystalstructure of bluetongue virus VP7, Nature 373, 167.anszen, K. J. (1971) The optical transfer theory of the electronmicroscope: Fundamental principles and applications, in Barer,R., and Cosslett, V. E. (Eds.), Advances in Optical and ElectronMicroscopy, pp. 1–84, Academic Press, New York.arauz, G., and Fong-Lochovsky, A. (1989) Automatic selection ofmacromolecules from electron micrographs by component label-ling and symbolic processing, Ultramicroscopy 31, 333–344.arauz, G., and Heel, M. V. (1986) Exact filters for three-dimensional reconstruction of an object from projections, Optik

73, 146–156.

enderson, R., Baldwin, J. M., Downing, K. H., Lepault, J., andZemlin, F. (1986) Structure of purple membrane from Halobac-terium halobium: Recording, measurement and evaluation ofelectron micrographs at 3.5 Å resolution, Ultramicroscopy 19,147–178.

eng, T. W., Crowther, R. A., Stubbs, G., and Chiu, W. (1989)Visualization of alpha-helices in tobacco mosaic virus by cryo-electron microscopy, J. Mol. Biol. 205, 251–257.

ata, K. R., Penczek, P., and Frank, J. (1995) Automatic particlepicking from electron micrographs, Ultramicroscopy 58, 381–391.artin, I. M. B., Marinescu, D. C., Lynch, R. E., and Baker, T. S.(1997) Identification of spherical virus particles in digitizedimages of entire electron micrographs, J. Struct. Biol. 120,146–157.imori, Y., Yamashita, I., Murata, K., Fujiyoshi, Y., Yonekura, K.,Toyoshima, C., and Namba, K. (1995) The structure of theR-type straight flagellar filament of Salmonella at 9 Å resolu-tion by electron cryomicroscopy, J. Mol. Biol. 249, 69–87.itsuoka, K., Hirai, T., Murata, K., Miyazawa, A., Kidera, A.,Kimura, Y., and Fujiyoshi, Y. (1999) The structure of bacterior-hodopsin at 3.0 Å resolution based on electron crystallography:Implication of the charge distribution, J. Mol. Biol. 286, 861–882.rlova, E. V., Serysheva, II, van Heel, M., Hamilton, S. L., andChiu, W. (1996) Two structural configurations of the skeletalmuscle calcium release channel, Nat. Struct. Biol. 3, 547–552.

chmid, M. F., Sherman, M. B., Matsudaria, P., Tsuruta, H., andChiu, W. (1999) Scaling structure factor amplitudes in electroncryomicroscopy using X-ray solution scattering, J. Struct. Biol.128, 51–57.

chroeter, J. P., and Bretaudiere, J. P. (1996) SUPRIM: Easilymodified image processing software, J. Struct. Biol. 116, 131–137.

human-Commike, P. A., and Chiu, W. (1995) Automatic detectionof spherical particles from spot-scan electron microscopy im-ages, J. Micros. Soc. Am. 1, 191–201.

human-Commike, P. A., Tsuruta, H., Greene, B., Prevelige, P. E.,King, J., and Chiu, W. (1999) Solution X-ray scattering basedestimation of electron cryomicroscopy imaging parameters forreconstruction of virus particles, Biophys. J. 76, 2249–2261.

an Heel, M. (1982) Detection of objects in quantum-noise-limitedimages, Ultramicroscopy 8, 331–342.

an Heel, M., Harauz, G., Orlova, E. V., Schmidt, R., and Schatz,M. (1996) A new generation of the IMAGIC image processingsystem, J. Struct. Biol. 116, 17–24.hittaker, M., Carragher, B. O., and Milligan, R. A. (1995)PHOELIX: A package for semi-automated helical reconstruc-tion, Ultramicroscopy 58, 245–259.

hou, Z. H., Jakana, J., Rixon, F. J., and Chiu, W. (1998) 9 Åstructure of HSV-1 B capsid, in 14th International Congress onElectron Microscopy, 4th ed. pp. 291–292, Cancun, Mexico.

hu, J., Penczek, P. A., Schroder, R., and Frank, J. (1997)Three-dimensional reconstruction with contrast transfer func-tion correction from energy-filtered cryoelectron micrographs:Procedure and application to the 70S Escherichia coli ribosome,

J. Struct. Biol. 118, 197–219.