Identification of Spherical Virus Particles in Digitized Images of Entire Electron Micrographs

12
Identification of Spherical Virus Particles in Digitized Images of Entire Electron Micrographs Ioana M. Boier Martin Department of Computer Science, Indiana University, South Bend, Indiana 46634 Dan C. Marinescu and Robert E. Lynch Department of Computer Sciences, Purdue University, West Lafayette, Indiana 47907 and Timothy S. Baker 1 Department of Biological Sciences, Purdue University, West Lafayette, Indiana 47907 Received March 7, 1997, and in revised form June 26, 1997 New methods are described that should facilitate high-resolution (5–10 Å) image reconstructions from low-dose, low-contrast electron micrographs of fro- zen-hydrated specimens and processing of large, digital images produced by new imaging devices and modern electron microscopes. Existing tech- niques for automatic selection of images of indi- vidual biological macromolecules from electron mi- crographs are inefficient or unreliable. We describe the Crosspoint method (CP), which produces good quality solutions with relatively small miss rates and few false hits, and an extension of this method along with a procedure for refining its solution. Two algorithms for processing large images, one based on image subsampling, the other on image decompo- sition, are described. A large image is first com- pressed (e.g., by subsampling) and the CP method is applied to the compressed image to produce an initial solution. The information gathered at this stage is used to cut the original image into subim- ages and then to refine the particle coordinates in each subimage. An interactive environment for ex- perimenting with particle identification methods is described. r 1997 Academic Press 1 INTRODUCTION An important goal of transmission electron micros- copy is to reveal the three-dimensional structure of the specimen under study. From electron micro- graphs which contain many different projections of identical macromolecules (e.g., virus particles), it is possible to produce a spatial model of the structure of an average particle (see, for example, (4)). The three-dimensional (3D) model of a specimen is normally represented as a density function sampled at the points of a regular grid. The images of individual particles in electron micrographs are ap- proximate projections of the specimen in the direc- tion of the electron beam. The problem of determin- ing the specimen structure from the micrographs is equivalent to the problem of reconstructing a density distribution from its projections. Fourier theory pro- vides a simple approach to finding the 3D structure of an object from its projections. The Projection Theorem (6) connects the Fourier transform of the object with the transforms of its projections. The basic steps of the 3D reconstruction process begin with the selection (boxing) of individual par- ticle images from a number of digitized micrographs (Fig. 1). These projections of the virus particles constitute the different views used to fill in the 3D Fourier transform of the specimen. The number of such views depends on the desired resolution of the final structure and on the particle size. Next, the orientations of the specimen that give rise to these projections must be determined (9). The best results are often obtained in the case of highly symmetrical particles such as icosahedral viruses because the high symmetry leads to redundancies in the Fourier transform data and this, in turn, aids in the orienta- tion search process. The 3D Fourier transform of the particle is calculated from experimental values in central sections. The values of the 3D transform 1 To whom correspondence should be addressed. Fax: (765) 496-1189. E-mail: [email protected]. JOURNAL OF STRUCTURAL BIOLOGY 120, 146–157 (1997) ARTICLE NO. SB973901 146 1047-8477/97 $25.00 Copyright r 1997 by Academic Press All rights of reproduction in any form reserved.

Transcript of Identification of Spherical Virus Particles in Digitized Images of Entire Electron Micrographs

Identification of Spherical Virus Particles in Digitized Imagesof Entire Electron Micrographs

Ioana M. Boier Martin

Department of Computer Science, Indiana University, South Bend, Indiana 46634

Dan C. Marinescu and Robert E. Lynch

Department of Computer Sciences, Purdue University, West Lafayette, Indiana 47907

and

Timothy S. Baker1

Department of Biological Sciences, Purdue University, West Lafayette, Indiana 47907

Received March 7, 1997, and in revised form June 26, 1997

New methods are described that should facilitatehigh-resolution (5–10 Å) image reconstructions fromlow-dose, low-contrast electron micrographs of fro-zen-hydrated specimens and processing of large,digital images produced by new imaging devicesand modern electron microscopes. Existing tech-niques for automatic selection of images of indi-vidual biological macromolecules from electron mi-crographs are inefficient or unreliable. We describethe Crosspoint method (CP), which produces goodquality solutions with relatively small miss ratesand few false hits, and an extension of this methodalong with a procedure for refining its solution. Twoalgorithms for processing large images, one basedon image subsampling, the other on image decompo-sition, are described. A large image is first com-pressed (e.g., by subsampling) and the CP method isapplied to the compressed image to produce aninitial solution. The information gathered at thisstage is used to cut the original image into subim-ages and then to refine the particle coordinates ineach subimage. An interactive environment for ex-perimenting with particle identification methods isdescribed. r 1997 Academic Press

1 INTRODUCTION

An important goal of transmission electron micros-copy is to reveal the three-dimensional structure ofthe specimen under study. From electron micro-

graphs which contain many different projections ofidentical macromolecules (e.g., virus particles), it ispossible to produce a spatial model of the structure ofan average particle (see, for example, (4)).

The three-dimensional (3D) model of a specimen isnormally represented as a density function sampledat the points of a regular grid. The images ofindividual particles in electron micrographs are ap-proximate projections of the specimen in the direc-tion of the electron beam. The problem of determin-ing the specimen structure from the micrographs isequivalent to the problem of reconstructing a densitydistribution from its projections. Fourier theory pro-vides a simple approach to finding the 3D structureof an object from its projections. The ProjectionTheorem (6) connects the Fourier transform of theobject with the transforms of its projections.

The basic steps of the 3D reconstruction processbegin with the selection (boxing) of individual par-ticle images from a number of digitized micrographs(Fig. 1). These projections of the virus particlesconstitute the different views used to fill in the 3DFourier transform of the specimen. The number ofsuch views depends on the desired resolution of thefinal structure and on the particle size. Next, theorientations of the specimen that give rise to theseprojections must be determined (9). The best resultsare often obtained in the case of highly symmetricalparticles such as icosahedral viruses because thehigh symmetry leads to redundancies in the Fouriertransform data and this, in turn, aids in the orienta-tion search process. The 3D Fourier transform of theparticle is calculated from experimental values incentral sections. The values of the 3D transform

1 To whom correspondence should be addressed. Fax: (765)496-1189. E-mail: [email protected].

JOURNAL OF STRUCTURAL BIOLOGY 120, 146–157 (1997)ARTICLE NO. SB973901

1461047-8477/97 $25.00Copyright r 1997 by Academic PressAll rights of reproduction in any form reserved.

must be sampled at the points of a 3D regular gridand this requires interpolation methods (14,18). Thelast step is to compute the electron density functionfrom the 3D Fourier transform by an inverse Fouriertransformation.

Boxing, the first step in the 3D reconstructionprocedure, is generally performed by a manual selec-tion process. Because this selection procedure can betedious, most low-resolution reconstructions (e.g., 20 Å)of relatively small virus particles have been computedfrom fewer than 100 particle images. It was estimatedthat approximately 2000 particle images are necessaryfor the reconstruction of a virus with a diameter of 1000Å at 10 Å resolution (20), and recent results at 7–9 Åresolution with hepatitis B virus capsids (3,5) haveconfirmed this estimate. Hence, manual boxing meth-ods are becoming impractical. The need for computer-aided particle detection methods provides the motiva-tion for our work.

At high magnification, noise in electron micro-graphs of unstained, frozen-hydrated macromol-ecules is unavoidable (14) and makes automaticdetection of particle positions a challenging task.Variability in the background support film of thespecimen sample and radiation damage are two

major sources of noise. Background variations in amicrograph can be enhanced in the digitized imageby use of a color look-up table (Fig. 2). Radiationdamage is the consequence of the exposure of thespecimen to the electron beam required to producehigh-magnification images. Limited exposure is usedto maximize specimen preservation, but the result isa low-contrast image. A typical low-contrast micro-graph and a histogram of the density values showthat gray levels in the image are concentrated in avery narrow range, as discussed in section 2.3 andillustrated in Figs. 5a and 5c.

An ideal automatic particle selection method mustproduce a reliable solution and be computationallyefficient. For high-resolution reconstruction work itis necessary to analyze large numbers of micro-graphs at speeds comparable to the data acquisitionrates. New input devices such as modern scanningmicrodensitometers and CCD (charge-coupled de-vice) detectors routinely allow frames consisting of6000 3 6000 or more pixels to be collected within atime frame of minutes or less.

FIG. 1. Schematic representation of the steps in a three-dimensional reconstruction of a spherical virus particle fromelectron micrographs.

FIG. 2. Variation of the background intensity values across adigitized micrograph containing a mixture of bacteriophage FX174(,30 nm in diameter) and polyoma virus (,50 nm in diameter)particles. Inset at lower right shows the color look-up table used tomap image intensities. Graph at top depicts the intensity varia-tion along a line that crosses the field of particles (black horizontalline). The virus particles are embedded in a thin (,100 nm) layerof vitrified water which is suspended across holes in a carbon film(edge seen in the lower left corner). In this example, the intensityrange varies linearly from red to yellow to blue, corresponding toprogressively lower densities in the specimen.

147IDENTIFICATION OF VIRUS PARTICLES IN MICROGRAPHS

The quality of the solution can be measured interms of the number of particles correctly identified,the number of unidentified particles, and the num-ber of false hits. Missed particles do not constitute asevere error as long as their number is small.Information that could be gathered from these projec-tions is lost, but the loss can be compensated byincreasing the number of micrographs from whichparticles are selected. False hits pose a more serioustype of error. If used, such regions that do notcorrespond to any real particle projections introduceadditional errors into the 3D Fourier transform.However, methods do exist that allow such ‘‘bad’’data to be screened (e.g., (2,9)).

2 AUTOMATIC PARTICLE SELECTION METHODS

Image processing of noise-obscured micrographsenables one to deal with problems such as (a) locat-ing and extracting the motif representing the projec-tion of a particle from a noisy background (8,11,16),(b) enhancing the structurally significant details ofthis motif (6,11), and (c) determining the orientationof the particle which produced the motif relative tosome viewing direction. A number of techniques thattake advantage of the high symmetry of icosahedralviruses have been successfully implemented and areroutinely used to solve problems (b) and (c) (6,9).However, the human visual system remains unsur-passed in its ability to analyze micrograph imageseffectively and reliably.

In this section we review the effects of severalimage processing algorithms and heuristics on micro-graphs and discuss the results obtained.

2.1 Edge Detection

Edge detection is a popular segmentation methodwhich exploits the discontinuity of the gray-levelvalues in an image. An edge is the boundary betweentwo regions with relatively distinct gray-level proper-ties. The idea is to transform the image so that apixel no longer contains gray-level information, but amagnitude and direction representing the severityand orientation of the local gray-level change. Gradi-ent operators have been widely used for edge detec-tion (10). A local derivative (gradient) is computed atevery pixel in the image. Regions of constant inten-sity yield a null gradient, whereas varying regionsare characterized by nonzero derivatives. The gradi-ent of an image I at pixel (x, y) is a vector:

=I 5 3Ix

Iy4 5 3

­I

­x

­I

­y4 .

A special case of gradient operators is the Sobeloperators, which have both a differencing and asmoothing effect (10). A common implementation ofthe Sobel operators is (using the notations in Fig.3a): Ix 5 (z7 1 2z8 1 z9) 2 (z1 1 2z2 1 z3) and Iy 5(z3 1 2z6 1 z9) 2 (z1 1 2z4 1 z7). The masks for theseoperators are shown in Figs. 3b and 3c.

The Sobel transform fails to yield reliable resultswhen applied to electron micrograph images. Thedigitized micrograph is very noisy and has signifi-cant levels of intensity variation both inside andoutside particle regions. Clearly the values of theintensities alone are incapable of defining whichportions of the line lie inside particles and which ofthem lie in the background (Fig. 2).

2.2 Template Matching

Template-matching methods have been proposedby several groups (8,12,16,20). In this method areference (template) particle is selected from themicrograph and cross-correlated with the entireimage (Fig. 4). Computationally it is more efficient totransform the entire image and the reference par-ticle to the Fourier domain, multiply the transforms,and transform back than to perform a correlation ofthe original images. Peaks in the correlation pattern(i.e., values of the correlation coefficient larger thana particular threshold) identify the locations of re-gions in the micrograph most similar to the templateand, in the ideal case, correspond to the centers ofthe particle projections.

Olson and Baker (16) proposed a two-cycle tem-plate-matching algorithm in which the peaks de-tected after the first correlation cycle are sortedaccording to their magnitude, the particle projec-tions corresponding to the strongest peaks are aver-aged, and the average is used as a new template inthe second cycle of the algorithm.

Template-matching methods produce reasonableresults only when applied to images with a goodsignal-to-noise ratio, i.e., formed with medium tohigh electron dose, and after background variationsare minimized or removed. However, it is commonlyagreed that it is difficult to identify peaks in thecross-correlation maps computed from such low-dose

FIG. 3. The Sobel operator masks.

148 BOIER MARTIN ET AL.

micrographs, and peak discrimination is extremelysensitive to fluctuations of the average intensityvalue throughout the image.

The basic template-matching algorithm describedby Thuman-Commike and Chiu (20,21) is precededby a constant area detection and correction process.The algorithm is rather complicated, involving im-age ‘‘cutting’’ and ‘‘sewing.’’ The authors report per-centages of false hits between 35 and 55%.

The computations involved in template-matchingmethods are relatively large because the complexityof the Fourier transform alone is n 3 log (n), with nthe number of pixels in the image. A small imagemay consist of n 5 1000 3 1000 pixels, whereas ascan of an entire micrograph can easily approach n 510 000 3 10 000 pixels.

2.3 The Crosspoint Method

The Crosspoint method we have developed com-bines traditional image processing techniques withheuristics and a new algorithm for the detection ofparticle centers. The time complexity of variousalgorithms used by this method is n, where n repre-sents the number of pixels in the digitized micro-graph. This method is described in detail in (13). Themain steps are summarized below and illustrated inFig. 5.

2.3.1 Image enhancement. The digitized micro-graph is enhanced by histogram equalization (10),followed by image averaging to smooth out localfluctuations of pixel intensities. High-resolution 3Dreconstructions usually include close-to-focus, i.e.,

low-contrast, images in which the high-resolutiondetails are not destroyed by the electron beam.Histogram equalization helps improve image con-trast by redistributing the gray levels in the imagemore uniformly over the gray-scale range (Figs.5a–5c).

The rationale for neighborhood averaging the digi-tized, histogram-equalized image is motivated by thefact that the intensities of the pixels in the image arenot characteristic of the inside or the outside of aparticle projection (see Fig. 2). A particular intensityvalue may occur in a region inside a projection aswell as somewhere in the background, where thereare no particles. However, the majority of the pixelsinside a particle projection have lower intensity (aredarker) than the pixels surrounding the particle,thus enabling the human eye to easily recognizeparticles. High-intensity fluctuations tend to be sharpand scattered throughout the entire area of theprojected image. Since averaging is a smoothingoperation, such fluctuations are reduced or disap-pear completely in this process. However, the size ofthe averaging filter must be chosen carefully toprevent the resulting image from becoming tooblurred (Fig. 5d).

2.3.2 Particle identification with a double scanprocedure. The particle identification algorithm isat the core of the Crosspoint (CP) method. It consistsof two phases: marking and clustering.

The algorithm takes as input an image and thevalue of the radius r of the particles to be identified.The image could be the original raw image, theenhanced image, or a subimage. The radius of theparticles is defined interactively based on the visualinspection of the image or is inferred from measure-ments made on previous images. The result of themarking phase is a binary image. Each pixel isconsidered to belong either to a particle projection(marked, set to 1) or to the background (not marked,set to 0).

In the original CP method (13) the image isscanned horizontally, row by row from top to bottom.Pairs of pixels at distance r 1 1 are compared andthe difference between the intensity values of thepixels in such a pair is tested against a thresholdvalue. If this difference is larger than the threshold,the algorithm proceeds to compare the lower inten-sity value with that of a pixel at distance r 1 1 in thevertical direction. This difference is also testedagainst the threshold and, if it is larger, the algo-rithm marks the element of the pair with the lowerintensity as being inside a particle; otherwise thepixel is not marked.

A portion of a micrograph after marking (Fig. 5e)shows pixels that have been marked as being insidea particle colored in green, and those unmarked

FIG. 4. The basic cross-correlation, template-matching algo-rithm.

149IDENTIFICATION OF VIRUS PARTICLES IN MICROGRAPHS

FIG. 5. (a) Portion of a low-contrast micrograph of frozen-hydrated sample of reovirus cores. (b) The micrograph after histogramequalization. (c) Gray-level histograms before (top) and after (bottom) histogram equalization. (d) The micrograph in (b) after neighborhoodaveraging with a 10 3 10 filter. (e) Contents of the binary image after pixel marking (green) superimposed on the micrograph in (d). (f ) Theresult of the CP2 method.

FIG. 6. The result of the marking phase in the case of an ideal particle. Note. The three lowest yellow pixels should have been colored green.FIG. 7. Portion of a micrograph in which the pixels have been marked (a) once and (b) twice. In (b), the particle projections, and hence

their centers, are better approximated by the clusters.FIG. 8. Disconnecting particles by thinning: (a) particle identification without thinning and (b) particle identification with thinning.FIG. 9. The solution produced by the CP2 method for the micrograph shown in Fig. 2.FIG. 10. (a) Electron micrograph containing several particle projections. (b) New particle positions (in red) after refinement (old

positions—in blue—are shown for comparison).

retaining their original intensities. For most par-ticles, the clusters of green pixels approximate quitewell the area of the particle’s projection. The centerof each particle is computed as the center of mass ofthe cluster corresponding to that particle.

Owing to the asymmetric nature of the scanningprocess, the top region of each particle projection issystematically left unmarked. The pixels marked bythe algorithm in the case of an ideal particle (Fig. 6)are colored green, whereas the top portion of theparticle (yellow) is not marked because the comparisonbetween the intensities of those pixels and the ones atdistance r 1 1 in the vertical direction fails. Hence, thissingle-scan process results in a systematic misjudging ofthe particle centers in the vertical direction.

A more accurate version of the algorithm, CP2,involves scanning the image twice: the first scan isperformed as before, followed by a second scanapplied to a transposed image rowwise (i.e., frombottom to top). The marked pixels are the cumulativesum of both scans. A portion of a micrograph withpixels marked (a) after CP and (b) after CP2 isillustrated in Fig. 7.

Clustering is the second phase of the particleidentification algorithm. It determines the clusters,i.e., the connected components in the binary imageresulting from the marking phase. Two algorithms,one based on a depth-first search and the other on acoloring scheme, are briefly described.

The stack algorithm (13) is a depth-first algorithmfor detecting connected components in a binaryarray using a stack. The array is scanned rowwiseuntil the first 1 is encountered. Its coordinates areused to update the center of mass of the clustercurrently being determined and the size of thecluster is incremented. All marked neighbors of thecurrent position are pushed onto a stack (eightneighbors are considered). The next position to beprocessed is the one at the top of the stack. A clusterhas been completely detected and processed whenthe stack becomes empty. The horizontal scanning ofthe binary array then resumes, until all clustershave been detected.

The coloring algorithm was suggested by M. J.Atallah (1). It detects connected components in abinary array by ‘‘coloring’’ them with different colors.As in the previous algorithm, the array is scannedrowwise, and every time a marked position is encoun-tered, it is either colored with a new color from acolor array (if none of its neighbors is colored) or itreceives the color of its neighbors. Only four neigh-bors are considered (top left, top, top right, and left).A decision must be made when, at some point duringthe scanning process, two clusters that have beenconsidered separate and have been colored with twodifferent colors become connected. In this case, thetwo clusters have to be ‘‘recolored’’ with the same

color. The simplest way to achieve this is to make thetwo different colors synonyms. The center of massand the size of the clusters can be computed on-the-fly, as the scanning progresses.

The size of a cluster is used to filter out clustersthat are too large or too small compared with theexpected area of the projection (see Section 3.2 and (13)).The center of mass of each cluster approximates thecorresponding particle center. The application of CP2 tothe micrograph in Fig. 5e is shown in Fig. 5f.

2.3.3 Postprocessing. A particle identificationmethod is affected by two types of errors: (a) missedparticles and (b) false hits. In the CP2 procedure thenumber of missed particles can be reduced by adjust-ing the rejection criteria based on the size of theclusters. A false hit occurs when a cluster that doesnot correspond to a real particle projection is ac-cepted. As mentioned in Section 1, this is a moreserious type of error. One way to reduce the numberof false hits is to calculate the average intensityinside each of the particles detected and to compareit with the average of all intensity values in acircular region outside it. If the two average valuesare very close, then it is very likely that the particleis merely a false hit.

One of the most common causes for missing par-ticles in the CP2 method occurs when just one clusteris detected instead of two, as results when twoparticle projections are touching or are very close toone another (see Fig. 8a). Here, postprocessing isnecessary to disconnect the two clusters. A ‘‘thin-ning’’ procedure has been adopted in which theoutermost layers of pixels from each cluster areremoved and this can effectively disconnect theclusters that have merged into single clusters. Thisprocedure works in such situations because theclusters are generally connected by thin ‘‘bridges.’’The appearance of clusters before and after thethinning procedure is illustrated in Fig. 8.

Results produced by the CP2 method for a micro-graph which contains a mixture of two types of virusparticles is shown in Fig. 9. In this example, thedesired particles were the smaller ones (bacterio-phage FX174). The larger particles (polyoma virus)were included solely for calibration purposes (16).The image is particularly noisy, a large portion of thecarbon film obscures the lower left corner, and thevariation of the background intensity is clearly vis-ible. Nevertheless, the CP2 method is quite success-ful in identifying most of the FX174 images anddistinguishes them from other objects (polyoma par-ticles, carbon film, unidentified contaminants).

3 THE REFINEMENT OF PARTICLE POSITIONS

Our experience with a large number of micro-graphs of different virus samples indicates that thecenters determined using the CP2 method approxi-

151IDENTIFICATION OF VIRUS PARTICLES IN MICROGRAPHS

mate fairly well the true centers of the particles.However, it is possible to improve the quality of oursolution by refining the centers determined by CP2.We assessed the quality of our refinement method bycomparing the positions of the centers before andafter refinement with centers selected by an experi-mentalist. In the case of Fig. 10a, two correspondingpositions differ, on average, by (1.65, 3.37) pixels.The error function used to calculate the averagedistance between a position (xi

a, yia) detected by the

program and its manually selected counterpart (xim,

yim) is given by the formula

(ex, ey) 5 1Î oi51,...,C

(xia 2 xi

m)2@C,

Î oi51,...,C

(yia 2 yi

m)2@C2 ,

where C is the total number of particle imagespresent in the micrograph.

The next section describes a correlation-basedmethod for refining the centers of the particlesobtained using the CP2 method and also analyzesthe results obtained. A second, background equalizationmethod was tested but produced unreliable results.

3.1 Correlation-Based Refinementof Particle Centers

This algorithm was inspired by the template-matching algorithms (16,20,21), but it is more effi-cient and accurate. Let C be the number of particleprojections detected using the CP2 method and let rbe the radius of the particles. Provided the number offalse hits and of missed particles is small, a modelparticle projection built by averaging all the Cprojections is more accurate than one constructedmanually by selecting one or few particles as de-scribed in (16). The model can be cross-correlatedwith the points of the entire scanned image, butrestricted to a limited search region. We take advan-tage of the accuracy of the CP2 solution by restrict-ing this region to a small area around the center ofeach of the C particles. For a square-shaped region ofdimension 2b 1 1 pixels where b is typically set to avalue in the range of 2 to 4, the total number ofcorrelations performed is C(2b 1 1)2. The positionyielding the largest correlation coefficient identifiesthe region in the micrograph most similar to thereference and it is likely to be a more accurateapproximation of the true center. The two steps ofthe procedure are

Step 1. Build the model particle projection. Theintensity of every pixel of the model is the average of

the C corresponding pixel intensities in all detectedparticle images.

Step 2. For each of the C particles, for everyposition (xb, yb) within the search box, correlate themodel particle with the micrograph in a circularregion of radius r centered at (xb, yb).

Let I denote the image on which the refinement isto be performed, M the model particle, Ib and sI,b,respectively, the average intensity and the standarddeviation of I inside a circle of radius r centered at(xb, yb), N the number of pixels inside the modelparticle, M the average intensity, and sM the stan-dard deviation of the model particle. The correlationcoefficient, r, is given by the formula (15)

r 5

o(xi,yi)[M

(I(xi 1 xb, yi 1 yb) 2 Ib) 3 (M(xi, yi) 2 M)

N 3 sI,b 3 sM.

The new, refined center corresponds to the positionyielding the largest r. We have implemented this algo-rithm such that if the new center is located on the borderof the search box, we allow for the search box to move inthe direction of the maximum correlation coefficient anumber of times to obtain a more accurate value for r.

The results of the correlation refinement for a testimage are shown in Fig. 10b. The particle centercoordinates after refinement approximate the truecenters better than the initial values. Analysis of alarge number of micrographs shows that correlationwith a model particle usually improves the resultsproduced by the Crosspoint method. For the imageshown in Fig. 10a, the average error is (0.83, 0.84)pixels for the correlation-based refinement, as op-posed to the (1.65, 3.37) pixel error obtained beforerefinement. The time to produce the initial CP2solution in the case of a 1280 3 1000 pixel image(Fig. 12a) was about 18 sec on a Reality EngineSilicon Graphics machine with a 90-MHz processorand 128 Mbytes of main memory. The subsequentcorrelation refinement step took about 15 sec.

TABLE 1Sensitivity of the Crosspoint Method to Changes

in the Particle Radius

Radius (in pixels) Correctly identified Missed False hits

18 36 (77%) 11 (23%) 3 (6%)20 39 (83%) 8 (17%) 2 (4%)22 38 (81%) 9 (19%) 2 (4%)24 42 (89%) 5 (11%) 2 (4%)25 42 (89%) 5 (11%) 2 (4%)26 39 (83%) 8 (17%) 3 (6%)28 40 (85%) 7 (15%) 3 (6%)30 39 (83%) 8 (17%) 6 (12%)

Note. Results are given for the micrograph shown in Fig. 9. Thetrue radius is approximately 25 pixels.

152 BOIER MARTIN ET AL.

We also designed and tested a background equal-ization refinement method, but with unsatisfactoryresults on our test images. In this method, the solutiongenerated by the CP2 method was used to produce aninitial set of particle centers, at which point background

FIG. 11. Sensitivity of the Crosspoint method to changes in thenumber of thinning layers: (a) no thinning, (b) one thinning layer,(c) two thinning layers. The micrograph shown contains images ofhuman rhinovirus (HRV) particles decorated with Fab antibodyfragments.

FIG. 12. Sensitivity of the Crosspoint method to changes in theimage pixel resolution: (a) original low-contrast image(1280 3 1000 pixels) showing projections of several human rhino-virus particles, (b) the result of the CP2 method applied to (a), (c)the result of the CP2 method applied to (a) after subsampling(640 3 500 pixels).

153IDENTIFICATION OF VIRUS PARTICLES IN MICROGRAPHS

variations were removed before correlation-basedrefinement of the centers was performed.

3.2 The Sensitivity of the Crosspoint Method

The CP2 method is sensitive to changes in severalparameters. For example, the radius r of the virusparticle projections is a very important input param-eter. The CP2 method cannot be used for micro-graphs containing a mixture of different virus par-ticles that are comparable in size (e.g., whosediameters differ by only ,10%). Often it is difficulteven to locate the particles in the original image;hence, errors in defining the radius are expected.

Our experience indicates that the quality of thesolution is not seriously affected for small variationsof r. Table 1 illustrates the results obtained for onemicrograph (Fig. 9).

Another solution-sensitive parameter, set by theprogram user, is the number of thinning layers usedto disconnect particles that have joined into a singlecluster. An illustration of the use of the Crosspointmethod using zero, one, and two thinning layers,respectively, is shown in Fig. 11. For most micrographswe have tested, two thinning layers are optimal.

Three other parameters influence the CP2 solu-tion. One is the threshold used in the marking phaseof the particle identification algorithm. The othertwo are the upper and lower bounds for the size of acluster of marked pixels. A cluster is considered torepresent a particle if its size approximates the areaof a circle with the same radius as the particle. Wehave selected optimal values for these bounds basedupon our analysis of a large number of micrographs.

4 PROCESSING LARGE IMAGES

Modern transmission electron microscopy meth-ods now make it possible to produce very largeimages, with 50–100 Mpixels (million pixels). Onemicrograph may contain the projected images of athousand or more virus particles. Manipulating suchan image in the computer requires 50–400 Mbytes ofstorage depending on the dynamic range of theimaging device (1, 2, 3, or 4 bytes/pixel). The fact thatsuch images can be generated at a fairly high rateand have to be stored creates a critical need for largesecondary storage systems and data compressiontechniques.

Processing and rendering such images is a chal-lenging proposition due to the speed and storagelimitations of current graphics workstations. Forexample, rendering a 5878 3 7521 pixel image takesabout 120 sec on an SGI Power Onyx with oneprocessor and 128 Mbytes of memory. Histogramequalization of the same image takes more than 200sec, and 10 3 10 averages more than 400 sec.

Several possible solutions to this problem exist. A

graphics system with several processors and 512Mbytes to 1 Gbyte of main memory could be used.Parallel algorithms for image enhancement, particleidentification, and center refinement are needed toexploit efficiently such an expensive machine. Alter-natively, the large image can be cut into subimagesand each subimage then processed independently.

Another option is to compress the original, digi-tized image and then to detect the initial positions ofthe particles on the compressed image. Once thesepositions are detected, one can cut the original imagemore efficiently and conduct the refinement proce-dure on each of the subimages, using as an initialapproximation the positions located on the com-pressed image.

4.1 Image Compression

The size of an image can be significantly reducedby simple compression algorithms. A lossless com-pression method like run-length encoding (RLE) isnot very useful because it alters the contents of theimage and the Crosspoint method is not designed towork on an encoded image.

The alternative to lossless compression is lossycompression. Several lossy compression schemes arepossible. The simplest one is subsampling. Imagesize can be decreased by a factor of m2 by using onlyevery mth pixel on each row and every mth row of theimage. More sophisticated algorithms involving

FIG. 13. Experimenting with a low-contrast image: (a) portionof a 5878 3 7521 pixel image; (b) image after histogram equaliza-tion and averaging; (c) CP method applied to image in (b); (d)refinement of the positions detected in (c): blue circles—positionsbefore refinement; red circles—positions after refinement.

154 BOIER MARTIN ET AL.

wavelet transformations (19) could also be used tocompress micrograph images. Preliminary resultsindicate that such transformations can be success-fully used even for very low-contrast images inconjunction with the Crosspoint method (Fig. 12).

4.2 A Particle Identification Algorithm Basedon Image Subsampling and Decomposition

In contrast with correlation-based methods forwhich higher pixel resolution means better chancesfor a more accurate match with the template, theCrosspoint algorithm works well on subsampledimages. A 1280 3 1000 pixel micrograph containingseveral human rhinovirus particle images (Fig. 12a)was examined with the CP2 method on the originalimage (Fig. 12b) and on the subsampled image (Fig.12c; subsampling factor, m 5 2). The average errorfor the coordinates of the particle centers selectedmanually versus using the CP2 method in the case ofFigs. 12b and 12c, without refinement, is (0.34, 1.04)pixels for the full-resolution image and (0.49, 1.25)pixels for the subsampled one.

The processing of a large image involves thefollowing steps:

Step 1. Reduce the size of the image by a factor ofm2. Values of m 5 2 and m 5 3 (i.e., four- andninefold subsampling) seem sufficient for all practi-cal purposes.

Step 2. Enhance this image by histogram equaliza-tion and averaging.

Step 3. Specify the radius r of the particles to beidentified and use the Crosspoint method to identifythem on the image resulting after Step 2. Let (xi, yi),i 5 1, . . . , C, denote the coordinates of the C particlecenters detected.

Step 4. Divide the original image into P subim-ages. Several strategies can be used. One is to ensurethat each subimage has a rectangular shape andcontains roughly C/P particles. Another is to dividethe image into subimages of equal size (possiblyoverlapping).

Step 5. For each subimage carry out the correlation-based refinement algorithm described in Section 3.1.Construct a model particle projection by averagingthe particles within that subimage. Allow the centerof particle i to move within the box with corners(xi 2 b, yi 2 b), (xi 2 b, yi 1 b), (xi 1 b, yi 2 b), and(xi 1 b, yi 1 b). As before, if the best correlation isobtained when the center is located on the edge of therefinement box, allow up to k moves of the refine-ment box.

Step 6. Filter out particles whose best correlationcoefficient is lower than a given threshold (likely tobe false hits).

The timing results for the CP2 method in the caseof the micrograph in Fig. 12 recorded on a Silicon

Graphics workstation with a 200-MHz IP22 proces-sor and 64 Mbytes of main memory were 18 sec forthe entire image (1000 3 1280 pixels) and 7 sec forthe subsampled image (subsampling factor m 5 2).

4.3 Image Decomposition

An alternative to image compression is to decom-pose it into several overlapping subimages and toapply the Crosspoint method to each subimage inde-pendently. With the exception of histogram equaliza-tion, the CP2 procedure does not involve any globalimage transformation (such as Fourier transforms)and it is, therefore, suitable to be applied to indi-vidual subimages. Histogram equalization is theonly transformation affected by decomposition. Byapplying histogram equalization to subimages animproved contrast is obtained in each subimage,closer to the optimal image contrast that would beachieved by using, for instance, the adaptive histo-gram equalization technique described in (17).

In contrast to the complex ‘‘cutting’’ and ‘‘sewing’’described in (20,21), the only requirement of themethod proposed here is that each subimage includea border region to allow for correct averaging, cluster-ing, and box migration during the refinement stagefor all particles inside. Let r denote the radius of theparticles, 2 3 b 1 1 the length of the side of therefinement box, and k the maximum number of boxshifts allowed during the refinement. Then, thewidth of the border region should be w 5 2 3 r 1max5r 1 1, k 3 b6. For example, a 10 000 3 10 000pixel image, with r 5 64, b 5 4, and k 5 5, can bedecomposed into four subimages of 5193 3 5193pixels each. In this case, the border region has thewidth w 5 2 3 64 1 max565, 5 3 46 5 193 pixels andthe actual subimage has 5000 3 5000 pixels. Due tothe need to include a border region, there is a point ofdiminishing return when increasing the number ofsubimages into which an image is decomposed.

Special attention must be paid to processing clus-ters located close to the border region. If a cluster isfully contained within the extended subimage andits center of mass is within the boundaries of theactual subimage then it is considered to belong to thesubimage. If a cluster is fully contained in theextended subimage, but its center of mass is insidethe border region, the cluster is not considered partof the current subimage. Its processing will takeplace in one of the neighboring subimages.

Due to the nature of the decomposition and thefact that the center of a cluster may belong to onlyone of the actual subimages, each cluster is pro-cessed only once. Therefore, it is straightforward tocombine the results from all subimages: the list of allparticle positions for the whole image is the union ofall subimage lists.

155IDENTIFICATION OF VIRUS PARTICLES IN MICROGRAPHS

5. AN ENVIRONMENT FOR EXPERIMENTING WITHPARTICLE IDENTIFICATION METHODS

The expectation that one can design a fully auto-mated particle identification method capable of pro-cessing micrographs produced under various condi-tions without any human intervention seemsunrealistic at this time. What we believe can be doneat this stage is to design an environment whichsupports experimenting and tuning of various meth-ods.

Given a batch of images obtained under similarconditions, the user needs to fine tune the generalalgorithm, e.g., the number of thinning layers, theradius of the particle, etc. Once an optimal procedureis established, all images in the batch can be pro-cessed automatically. Occasionally, a different se-quence of image enhancement steps leads to betterresults than the one we described in Section 2.3.1.Figure 13a shows an image in which particles can beidentified with the naked eye only by a very astuteobserver. On the enhanced images (Figs. 13b and13c), the particles can be identified and the CPmethod works well. In this case, the following se-quence of image enhancement steps leads to the bestresults: histogram equalization followed by averag-ing, followed by another cycle of histogram equaliza-tion and averaging steps.

To support such experimentation, the environ-ment we have developed supports the standardparticle identification method described earlier, aswell as individual image transformations that can becomposed in random order.

EMMA is an interactive software package builtaround the Crosspoint method. In addition to auto-matic particle selection and refinement, it includescapabilities to decompose large images and to dis-play the subimages, to perform various traditionalimage processing transforms on the digitized micro-graphs, to select, unselect, and extract individualparticles interactively, and to store particles intofiles. The transforms supported are: histogram equal-ization, averaging, Sobel and Laplace gradient meth-ods, high-boost filtering, colormap modification, com-pression, and the Hough transform. The programallows for an easy composition of such transforms inthe order specified by the user.

EMMA is built in X-Windows and Motif (21) andconsists of approximately 20 000 lines of code.

6 CONCLUSIONS AND FUTURE WORK

To improve the resolution of virus structures deter-mined using cryoelectron microscopy methods from20 Å to 5–10 Å, the number of virus particle projec-tions used in the three-dimensional reconstructionprocess must increase from a few hundred to severalthousands. To make better use of the biological

samples, the electron microscope must be controlledto aim its beam at particles with positions previouslydetermined from low-dose, low-magnification, andhence very noisy images. Modern devices are capableof producing images with 100 Mpixels, or even more,containing thousands of virus particle projectionsthat need to be analyzed, hence, the motivation forthe work reported in this paper.

Efforts to automate the particle identification pro-cess have been reported in the literature, but exist-ing methods are inefficient and none of them hasgained wide acceptance. Noise due to a variety ofsources makes particle identification very difficult.Due to the low contrast of some of the micrographs, itis often a challenge for the human eye to even noticea particle in a micrograph. And it is not an easy taskto capture in an algorithm the eye’s ability to recog-nize shapes.

The original automatic particle identificationmethod, the Crosspoint or CP, is described in (13). Asreported in (13), the algorithm is efficient and pro-duces relatively accurate solutions, with acceptablemiss rates and few false hits. By exploiting the localproperties of the micrograph image, the algorithm iscapable of dealing with a varying background.

After a large number of tests, we conclude that theparticle identification algorithm works best on im-ages enhanced by histogram equalization and aver-aging. In this paper, we propose a refinement methodbased on correlating a model particle with regions ofthe image located in the vicinity of the particlesdetected by the CP algorithm.

Another contribution of this paper is an algorithmfor processing large images. A compressed image isused to obtain the initial list of particle centers.Experiments confirm that a 4- to 16-fold lossy com-pression does not deter the CP algorithm fromlocating particle centers with sufficient accuracy.The uncompressed large image is then decomposedinto subimages to which the refinement algorithm isapplied.

There are no obvious ways to compute the qualityof the solution provided by a particle identificationprogram. The best one can do is to identify manuallythe location of particles on one micrograph, comparethem with those computed by the program, andreport the error. It is difficult to compare differentalgorithms and programs. There are no benchmarkimages and it is possible that a program which doesvery well on some images may provide a poor-qualitysolution for others. Likewise, there are few timingresults to allow a fair comparison of different pro-grams. Nonetheless, an analysis of the algorithmsinvolved favors the CP method over methods requir-ing Fourier transforms.

The speed of such a method is a prerequisite for

156 BOIER MARTIN ET AL.

automatic control of the electron microscope. Toavoid premature damaging of the biological speci-men, the following approach can be used: first, alow-dose, low-magnification image is recorded on aslow-scan CCD camera and the digital image is usedto determine the coordinates of the virus particles;then these coordinates are used to aim and calibratethe electron beam to take high-magnification pic-tures of each particle or clusters of particles, usingflood-beam or spot-beam imaging procedures (7).

Further information about the EMMA package, aswell as a number of test images, can be obtainedfrom http://www.cs.purdue.edu/homes/sb/Projects/EMMA/emma.html. The software is available freeupon request.

This research has been partially supported by the NationalScience Foundation Grants BIR-9301210 and MCB-9527131, by agrant from the Intel Corporation, a grant from the PurdueResearch Foundation, a Grant-In-Aid of Research and a SummerFaculty Fellowship from Indiana University, and by the ScalableI/O Initiative. We thank the anonymous reviewers for manyconstructive comments.

REFERENCES

1. Atallah, M. J. (1996) Private communications.2. Baker, T. S., and Cheng, R. H. (1996) A model-based approach

for determining orientations of biological macromoleculesimaged by cryo-electron microscopy, J. Struct. Biol. 116,120–130.

3. Bottcher, B., Wynne, S. A., and Crowther, R. A. (1997)Determination of the fold of the core protein of hepatitis Bvirus by electron microscopy, Nature 386, 88–91.

4. Cheng, R. H., Kuhn, R. J., Olson, N. H., Rossmann, M. G.,Choi, H.-K., Smith, T. J., and Baker, T. S. (1995) Three-dimensional structure of an enveloped alphavirus with T 5 4icosahedral symmetry, Cell 80, 621–630.

5. Conway, J. F., Cheng, N., Zlotnick, A., Wingfield, P. T., Stahl,S. J., and Steven, A. C. (1997) Visualization of a 4-helix bundlein the hepatitis B virus capsid by cryo-electron microscopy,Nature 386, 91–94.

6. Crowther, R. A., DeRosier, D. J., and Klug, A. (1970) Thereconstruction of a three-dimensional structure from projec-tions and its applications to electron microscopy, Proc. R. Soc.London Ser. A 317, 319–340.

7. Downing, K. H. (1991) Spot-scan imaging in transmissionelectron microscopy, Science 251, 53–59.

8. Frank, J., and Wagenknecht, T. (1984) Automatic selection ofmolecular images from electron microscopy, Ultramicroscopy12, 169–175.

9. Fuller, S. D., Butcher, S. J., Cheng, R. H., and Baker, T. S.(1996) Three-dimensional reconstruction of icosahedral par-ticles—The uncommon line, J. Struct. Biol. 116, 48–55.

10. Gonzalez, R. C., and Woods, R. E. (1993) Digital ImageProcessing, Addison–Wesley, Reading, MA.

11. Harauz, G., and Fong-Lochovsky, A. (1989) Automatic selec-tion of macromolecules from electron micrographs by compo-nent labeling and symbolic processing, Ultramicroscopy 31,333–344.

12. van Heel, M. (1982) Detection of objects in quantum-noise-limited images, Ultramicroscopy 8, 331–342.

13. Boier Martin, I. M. (1996) Scientific Data Visualization andDigital Image Processing for Structural Biology, Ph.D. thesis,Purdue University.

14. Moody, M. F. (1990) Image analysis of electron micrographs,in Biophysical Electron Microscopy, pp. 145–285, AcademicPress, New York.

15. Mosteller, F., Rourke, R. E. K., and Thomas, J. B., Jr. (1970)Probability With Statistical Applications, Addison–Wesley,Reading, MA.

16. Olson, N. H., and Baker, T. S. (1989) Magnification calibrationand the determination of spherical virus diameters usingcryo-microscopy, Ultramicroscopy 30, 281–298.

17. Pizer, S. M., Amburn, E. P., Austin, J. D., Cromartie, R.,Geselowitz, A., ter Haar Romeny, B., Zimmerman, J. B., andZuiderveld, K. (1987) Adaptive histogram equalization and itsvariations, Comp. Vis. Graph. Image Proc. 39, 269–280.

18. Smith, P. R., Peters, T. M., and Bates, R. H. T. (1973) Imagereconstruction from finite numbers of projections, J. Phys. 6,319–381.

19. Stollnitz, E. J., DeRose, A. D., and Salesin, D. H. (1996)Wavelets for Computer Graphics: Theory and Applications,Morgan Kaufmann, San Mateo, CA.

20. Thuman-Commike, P., and Chiu, W. (1995) Automatic detec-tion of spherical particles from spot-scan electron microscopyimages, J. Microsc. Soc. Am. 1, 191–201.

21. Thuman-Commike, P. A., and Chin, W. (1996) PTOOL:Asoftware package for the selection of particles from electroncryomicroscopy spot-scan images, J. Struct. Biol. 116, 41–47.

22. Young, D. A. (1990) The X Window System Programming andApplications with Xt, the OSF/Motif ed., Prentice Hall, NewYork.

157IDENTIFICATION OF VIRUS PARTICLES IN MICROGRAPHS