Routine quantitative analysis of brain and cerebrospinal fluid spaces with MR imaging


Original Research

Routine Quantitative Analysis of Brain and Cerebrospinal Fluid Spaces with MR Imaging¹

Ron Kikinis, MD • Martha E. Shenton, PhD • Guido Gerig, PhD • John Martin, MS • Mark Anderson, BS • David Metcalf, BS • Charles R. G. Guttmann, MD • Robert W. McCarley, MD • William Lorensen, MS • Harvey Cline, PhD • Ferenc A. Jolesz, MD

A computerized system for processing spin-echo magnetic resonance (MR) imaging data was implemented to estimate whole brain (gray and white matter) and cerebrospinal fluid volumes and to display three-dimensional surface reconstructions of specified tissue classes. The techniques were evaluated by assessing the radiometric variability of MR volume data and by comparing automated and manual procedures for measuring tissue volumes. Results showed (a) the homogeneity of the MR data and (b) that automated techniques were consistently superior to manual techniques. Both techniques, however, were affected by the complexity of the structure, with simpler structures (eg, the intracranial cavity) showing less variability and better spatial correlation of segmentation results between raters. Moreover, the automated techniques were completed for whole brain in a fraction of the time required to complete the equivalent segmentation manually. Additional evaluations included interrater reliability and an evaluation that included longitudinal measurement, in which one subject was imaged sequentially 24 times, with reliability computed from data collected by three raters over 1 year. Results showed good reliability for the automated segmentation procedures.

Index terms: Brain, MR, 10.1214 • Cerebrospinal fluid, MR, 10.1214 • Comparative studies • Image display • Image processing • Three-dimensional imaging • Volume measurement

JMRI 1992; 2:619-629

Abbreviations: CSF = cerebrospinal fluid, ICC = intracranial cavity, ROI = region of interest, SD = standard deviation, 3D = three-dimensional.

Most magnetic resonance (MR) imaging assessments of pathologic changes in the brain have relied on subjective visual interpretation of two-dimensional cross sections. Such interpretations are limited because (a) they do not allow accurate quantitative analysis, (b) boundary definitions are inexact, and (c) a pixel-by-pixel analysis is not feasible. The superior spatial and contrast resolutions of current MR images, however, make it possible to quantify pathologic changes in the brain, changes that are evident in many brain disorders.

A prerequisite for quantifying and visualizing morphometric changes in whole brain, or in specific tissue, is to be able to reliably identify the relevant structures. We have recently developed a computerized method of segmentation (1) based on a multistep procedure. A crucial element of this procedure is the optimization of the MR imaging data set (2), including the application of a filter to reduce noise (3). A two-stage multistep segmentation procedure that includes volumetric analysis of specified tissues and/or structures is then applied. Finally, three-dimensional (3D) renderings from surface models are generated for the evaluation of morphometric features.

Herein, we report the reliability data obtained with these procedures (including newly developed, semiautomated generation of a mask for identifying the intracranial cavity [ICC]) in a subset of cases in which we compare automated techniques with manual measurements for both interrater and intrarater reliability estimates.

¹ From the Departments of Radiology (R.K., J.M., M.A., D.M., C.R.G.G., F.A.J.) and Psychiatry (M.E.S., R.W.M.), Harvard Medical School, Brigham and Women's Hospital, 75 Francis St, Boston, MA 02115; the Communication Technology Laboratory, Image Science Division, ETH, Zurich, Switzerland (G.G.); and GE Corporate Research and Development Center, Schenectady, NY (W.L., H.C.). Received April 13, 1992; revision requested April 14; revision received and accepted September 4. Supported in part by grants from the Swiss National Foundation (R.K., C.R.G.G.); by a Research Scientist Development Award from NIMH (K01-MH00746-05), a Milton grant, and a Scottish Rite grant (M.E.S.); by National Institutes of Health grants PO1 CA41167, 5-KO4-NS011083, and 2 PO1 AC04953, and by a grant from NYNEX (F.A.J.); by the Department of Veterans Affairs Medical Research Service and NIMH grant 40,799 (R.W.M.); by the Theodore Vada Stanley Research Award (M.E.S., R.K.) and the Whitaker Foundation (R.K.); and by Swiss National Science Foundation grant 4018-11082 (C.R.G.G.). Address reprint requests to R.K.

SMRI, 1992

MATERIALS AND METHODS

Subjects

The data came from several sources, including (a) an MR imaging study of a patient with multiple sclerosis, (b) a neurosurgical patient selected for surgical planning, and (c) 15 healthy male control subjects evaluated prospectively in an ongoing study of schizophrenia. The average age of the control subjects was 38 years (range, 23-54 years). All were screened for drug and alcohol abuse and for psychiatric disorders; no subject was receiving medications with known effects on brain volume (eg, steroids) (4).

Data Acquisition Protocol

All brain images were acquired with the same 1.5-T Signa system (GE Medical Systems, Milwaukee). A double-echo spin-echo acquisition, covering the whole brain, was performed in the axial plane. The section thickness was 3.0 mm, and sections were acquired contiguously (no gap) by combining two interleaved sequences in the individual acquisitions. Half-Fourier sampling (0.5 excitations) at 54 section locations was done in 12 minutes with 192 phase-encoding steps, TEs of 30 and 80 msec, and a TR of 3,000 msec. The field of view was 24 cm. To reduce flow artifacts, we used a gradient-moment-nulling flow-compensation technique (2,5). See Image Acquisition in Figure 1.

Image Processing

The data were transferred through an Ethernet connection to our Sun workstations (Sun Microsystems, Mountain View, Calif), where images were processed. The techniques used were based on a multistep approach (1,6,7). Briefly, the gray-scale images were translated into label classes, in which each label represented an entity defined by the operator, such as gray matter, white matter, ventricular and/or subarachnoidal cerebrospinal fluid (CSF), and lesions. The operator defined the seed points for each tissue class, thus providing the initial information. It is important to emphasize, however, that the actual classifications and resulting label maps were done automatically (see below). These label maps, in turn, were easily accessible for computer analysis, which resulted in volume determinations for the different label classes and/or 3D reconstructions derived from them.

Figure 1. Flowchart gives an overview of the data acquisition and processing procedures used in the Surgical Planning Laboratory. Multivar. = multivariate, 1D = one-dimensional, 2D = two-dimensional.
[Flowchart boxes: Patient; Image Acquisition (magnet field strength; data acquisition protocol: slice thickness, field of view, pulse sequence); Image Processing (preprocessing filter; segmentation: supervised multivariate analysis, classification, grouping).]

Noise reduction with an anisotropic diffusion filter (3).-A filter was applied to each set of images to reduce noise without blurring fine morphologic details (see Image Processing in Fig 1). This filter was based on the simulation of anisotropic diffusion of heat originally reported by Perona and Malik (8) and subsequently adapted for double-echo MR imaging by Gerig et al (3,6). Two user-specified parameters were necessary for this implementation: (a) the number of iterations and (b) the threshold level to distinguish noise and real signal (k value). These two parameters were determined empirically by repeatedly applying the filter at different settings to a set of images and by selecting the image with the optimal parameter combination, which, in the present study, was found to be three iterations and a k value of 8. These parameter values were then applied to all data sets. Figure 2a and 2b show unfiltered double-echo gray-scale images, and Figure 2c and 2d show the filtered images for this image pair. Note the improved quality of the latter images.
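The filtering step can be illustrated with a minimal single-channel sketch of Perona-Malik diffusion. It is not the double-echo implementation of Gerig et al used in the study: the exponential conduction function, the four-neighbor stencil, and the integration constant `step` are assumptions chosen for illustration, and only the defaults of three iterations and k = 8 come from the text.

```python
import math

def anisotropic_diffusion(image, iterations=3, k=8.0, step=0.2):
    """Perona-Malik style smoothing of a 2D list of intensities.

    iterations and k follow the values quoted in the text; step is an
    assumed integration constant (not taken from the paper).
    """
    rows, cols = len(image), len(image[0])
    current = [[float(v) for v in row] for row in image]
    for _ in range(iterations):
        updated = [row[:] for row in current]
        for r in range(rows):
            for c in range(cols):
                flux = 0.0
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        diff = current[nr][nc] - current[r][c]
                        # Conduction is near 1 for small differences (noise)
                        # and near 0 when |diff| is much larger than k (edge).
                        flux += math.exp(-(diff / k) ** 2) * diff
                updated[r][c] = current[r][c] + step * flux
        current = updated
    return current
```

With these settings a small intensity fluctuation inside a uniform region is averaged away within a few iterations, whereas a step of about 100 gray levels is left essentially untouched because its conduction coefficient is effectively zero.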

Supervised segmentation into tissue classes.-After application of the filter, a segmentation algorithm, based on a multivariate analysis, was used to differentiate tissue classes. In using this algorithm, the operator begins by providing sample points selected from a corresponding set of gray-scale images (about 20 sample points per tissue class) that are then used to calculate a classifier with a nonparametric statistical algorithm (kNN [nearest-neighbor supervised classification]). This classifier basically represents a two-dimensional lookup table for efficiently assigning the most probable category to the double-valued measurements of each voxel. If the operator is not satisfied with the results, additional "training" points can be picked from either of the two gray-scale images or from the label maps, and the classifier can then be recalculated. Generally, this last step was not necessary for an experienced operator. Figure 3 shows the color-coded display of the different tissue classes.
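The lookup-table idea described above can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names, the choice of k = 3, the squared Euclidean distance, and the quantization of both echoes to 64 levels are all assumptions made to keep the example small.

```python
from collections import Counter

def knn_label(point, samples, k=3):
    """Majority label among the k training samples nearest to `point`.

    samples: list of (first_echo, second_echo, label) tuples.
    """
    nearest = sorted(
        samples,
        key=lambda s: (s[0] - point[0]) ** 2 + (s[1] - point[1]) ** 2,
    )[:k]
    return Counter(label for _, _, label in nearest).most_common(1)[0][0]

def build_lookup_table(samples, levels=64, k=3):
    """table[e1][e2] holds the label for each quantized intensity pair."""
    return [
        [knn_label((e1, e2), samples, k) for e2 in range(levels)]
        for e1 in range(levels)
    ]

def classify(echo1, echo2, table):
    """Label every pixel of a double-echo section by table lookup."""
    return [
        [table[a][b] for a, b in zip(row1, row2)]
        for row1, row2 in zip(echo1, echo2)
    ]
```

Building the table once and then indexing it per voxel is what makes the classification cheap: the nearest-neighbor search is paid for each intensity pair rather than for each voxel, and adding training points only requires rebuilding the table.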

JMRI, November/December 1992

Figure 2. Original gray-scale images before (a, b) and after (c, d) filtering show an axial section at the level of the orbits in a patient with multiple sclerosis. First- (a, c) and second-echo (b, d) images (proton-density- and T2-weighted, respectively). Comparison of white matter on the unfiltered and filtered images demonstrates the removal of the salt-and-pepper texture and enhancement of the boundaries between tissues on the latter.

Generation and application of a mask of the ICC.-By using our acquisition parameters, we found several structures outside the ICC that had signal intensity distributions similar to those of some structures within the ICC (eg, the orbits were classified as CSF). To address this problem, we generated a mask of the ICC by first changing all labels that represented gray matter, white matter, and CSF into one class and then labeling everything else as background. The area that represented the ICC was then eroded (6) to break connections that might exist between the ICC and extraneous structures (eg, the optic nerve that connects the ICC with the orbits). Supervised connectivity was then used to remove the extraneous structures, and a dilation was subsequently performed to reverse the erosion of the edited ICC. To remove the holes left by vessels, which were generally classified as background (when vessels were not selected as a tissue class), a modified connectivity algorithm was used.

Application of the mask to the segmentation results.-The mask generated in the previous step was applied to the segmented images, and all labels outside the mask were reset to the background value.
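The erode, connect, dilate, and apply sequence can be sketched on a single section as below. This is only an illustration under stated assumptions: a 3 × 3 (eight-neighbor) structuring element, one erosion step, an operator-supplied seed pixel inside the brain, and a dilation clipped to the original foreground are all choices made here, and the hole-filling step for vessels is not shown.

```python
from collections import deque

# Eight-neighbor offsets (assumed structuring element).
NEIGHBORS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]

def erode(pixels):
    """Keep only pixels whose eight neighbors all belong to the set."""
    return {(r, c) for r, c in pixels
            if all((r + dr, c + dc) in pixels for dr, dc in NEIGHBORS)}

def dilate(pixels):
    """Add every neighbor of every pixel in the set."""
    return pixels | {(r + dr, c + dc) for r, c in pixels for dr, dc in NEIGHBORS}

def connected(pixels, seed):
    """Pixels reachable from the seed without leaving the set."""
    if seed not in pixels:
        return set()
    seen, queue = {seed}, deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBORS:
            nxt = (r + dr, c + dc)
            if nxt in pixels and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def icc_mask(labels, tissue, seed):
    """Merge tissue labels, erode, keep the seeded component, dilate back."""
    foreground = {(r, c) for r, row in enumerate(labels)
                  for c, value in enumerate(row) if value in tissue}
    core = connected(erode(foreground), seed)
    return dilate(core) & foreground

def apply_mask(labels, mask, background=0):
    """Reset every label outside the mask to the background value."""
    return [[value if (r, c) in mask else background
             for c, value in enumerate(row)]
            for r, row in enumerate(labels)]
```

In this sketch a thin bridge (the analogue of the optic nerve) disappears during erosion, so the structure behind it is no longer reachable from the seed and is reset to background when the mask is applied.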

Supervised connectivity for additional classes.-Where necessary, additional classes, such as ventricular versus subarachnoidal CSF, were identified interactively by applying the connectivity algorithm to specified voxels.

Volume 2, Number 6, JMRI

Figure 3. User interface for sampling. The images in the top half are the first- (left) and second-echo images. The result of the segmentation is shown in the lower-left panel, and the classifier of feature space is shown in the lower-right panel, with actual sample points that were used to calculate the map (blue = CSF, yellow = white matter lesions, pink = skin, and gray = gray matter).

Generation of 3D reconstructions and determination of the volume of the different label classes.-By using the dividing cubes algorithm, surface models of the different tissue classes were generated (9,10). These models were then interactively evaluated alone and in relation to other tissue classes. Volumes were obtained by adding up the number of single voxels in each label class. This number was multiplied by the volume in milliliters of each voxel to obtain the volume of each label class in milliliters (Fig 1). Figure 4 illustrates the final segmentation of brain into tissue classes.
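The volume determination amounts to counting voxels per label and scaling by the voxel volume, as in the short sketch below. The 3.0-mm section thickness is taken from the acquisition protocol; the in-plane pixel size of 0.9375 mm is an assumption (a 24-cm field of view divided by a 256-pixel matrix, which the text does not state), so it is exposed as a parameter.

```python
from collections import Counter

def class_volumes_ml(label_volume, voxel_mm=(0.9375, 0.9375, 3.0)):
    """Volume of each label class in milliliters.

    label_volume: list of sections, each a list of rows of labels.
    voxel_mm: voxel edge lengths in mm; the in-plane values are assumed.
    """
    voxel_ml = voxel_mm[0] * voxel_mm[1] * voxel_mm[2] / 1000.0  # 1 mL = 1,000 mm^3
    counts = Counter(label
                     for section in label_volume
                     for row in section
                     for label in row)
    return {label: count * voxel_ml for label, count in counts.items()}
```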

Specific Analyses

Radiometric Variability of MR Volume Data

A classification based on absolute signal intensity values assumes that a given tissue class yields voxels with constant values. To test radiometric variability, we chose white matter areas as our reference class because these areas represent large regions throughout the data sets. At different locations, within sections and in different sections, we selected regions of interest (ROIs) representing white matter areas. The statistical parameters assessed, therefore, reflected a mixture of the variability in anatomy (white matter is not completely homogeneous), a human factor introduced by interactive selection of the ROIs, and radiometric distortion.
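The ROI comparison can be sketched as two small helpers: one summarizes the double-echo samples of an ROI, and one expresses an ROI mean as an absolute and percentage deviation from a reference population. The helper names are hypothetical, and the use of the sample SD is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev

def roi_stats(samples):
    """Mean and SD per echo for a list of (first_echo, second_echo) values.

    Uses the sample SD (n - 1 denominator); this choice is an assumption.
    """
    first = [s[0] for s in samples]
    second = [s[1] for s in samples]
    return (mean(first), mean(second)), (stdev(first), stdev(second))

def deviation_from_reference(roi_mean, reference_mean):
    """Absolute and percentage deviation of an ROI mean from a reference mean."""
    diff = roi_mean - reference_mean
    return diff, 100.0 * diff / reference_mean

# First-echo means reported in Table 1: lower-right quadrant vs the
# equally distributed population of 200 voxels.
diff, pct = deviation_from_reference(510.7, 487.2)
print(round(diff, 1), round(pct, 1))  # 23.5 4.8
```

Applied to the first-echo means reported in Table 1, this gives the +23.5 (+4.8%) maximal change quoted in the table note for the lower-right quadrant.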

Reliability of Image Processing Measurements

To critically evaluate our automated segmentation approach, we (a) correlated the automated segmentation results with manual measurements performed by the same five experienced operators (R.K., M.E.S., and three raters who had spent more than 1 year segmenting brain images with the automated techniques used in our laboratory) in one section showing the largest body of the lateral ventricles, (b) assessed results of automated segmentation performed by three operators in a single case (54 section levels, with two sections [first and second echo] per level), (c) examined overall reproducibility by imaging the same subject 24 times and segmenting the data set, and (d) analyzed the volume of whole brain (gray matter and white matter) and CSF in a small but carefully defined group of healthy, right-handed male control subjects (n = 15) whose MR images were collected prospectively from the general population in the Boston area. The methods used for the four evaluations are described in detail below.

Supervised automated segmentation compared with manual measurement in a single section.-To assess interrater accuracy and to reference our automated measurements to another available method, a section pair (first and second echo) was selected at the level of the cella media of the lateral ventricles in a healthy volunteer. The five experienced raters each then independently traced the outline of the ICC, the brain surface, the surface of the white matter, and the CSF on the same filtered image (T2-weighted image). For automated segmentation, the five raters selected training points on the same image for gray matter, white matter, and CSF, and then the segmentation algorithm was used to compute the surface area encompassed by each of the tissue classes. The ventricles were identified with a connectivity algorithm.

To further assess statistical and/or systematic error sources, the boundary length for each tissue class was calculated. On a binary raster image, a boundary is determined by a continuous series of "cracks" between adjacent pixels. To correct for the staircase effect of diagonal boundaries, steps were interpolated by connecting the centers of neighboring cracks (polygon-fit procedure). Because of the limited resolution of the raster images and the binary decision process used to assign a category to each pixel, we expected the segmentation error to increase with decreasing size and increasing complexity of the tissue class boundaries. The numbers in Figure 5b correspond to increasing complexity of the boundary line, with 1 being the least complex.

Figure 4. Final result of the segmentation procedure. (a) Section that has been segmented and separated from bone is shown. Gray matter is shown in gray, white matter in green, white matter lesions in yellow, and CSF in blue. (b) Three-dimensional reconstruction derived from all sections in the brain. White matter lesions around the ventricular system are shown in yellow, skin in brown, gray matter in gray, and CSF in violet.

The complexity of an object can be expressed as a percentage by comparing the boundary length ($P$) of an object with its area ($A$) by the following: $C = (P/A) \times 100$. In a circular object with radius $r$, $C$ varies with $2/r$. This results in small values for large objects and large values for small objects. $C$ can be described as the relative boundary length per pixel. If an algorithm results in an inaccurate estimation of the boundary pixels, $C$ expresses the rate of change for the area calculation. Small objects or objects with a complex boundary are thus more likely to be affected by changes in the boundary pixels than are large, compact objects.
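A minimal sketch of the complexity measure follows. The boundary length here is the raw crack count, that is, the number of unit pixel edges between object and background; the polygon-fit interpolation that corrects the staircase effect is deliberately not reproduced, so diagonal boundaries will come out longer than in the study. The function names are illustrative.

```python
def crack_perimeter(pixels):
    """Count unit-length cracks between object pixels and background.

    pixels: set of (row, col) coordinates belonging to the object.
    No polygon-fit correction is applied (simplifying assumption).
    """
    return sum((r + dr, c + dc) not in pixels
               for r, c in pixels
               for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)))

def complexity(perimeter, area):
    """C = (P / A) x 100, the relative boundary length per pixel."""
    return 100.0 * perimeter / area

# A compact 10 x 20 rectangle versus a thin 1 x 200 strip of equal area.
block = {(r, c) for r in range(10) for c in range(20)}
strip = {(0, c) for c in range(200)}
print(complexity(crack_perimeter(block), len(block)))  # 30.0
print(complexity(crack_perimeter(strip), len(strip)))  # 201.0
```

Both shapes contain 200 pixels, yet the strip has almost seven times the complexity of the block, which is the sense in which thin, convoluted structures such as subarachnoid CSF are more sensitive to boundary-pixel errors than compact ones such as the ICC.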

Supervised automated segmentation in a single subject by three raters.-For further estimation of reliability, a single study consisting of 108 sections was analyzed by three raters. The segmentation was performed for brain, white matter, gray matter, and CSF, and the reliability for the volumes of each of these tissue classes as determined by the three raters was calculated.

Longitudinal analysis of the same brain.-Data from a female patient were initially obtained at weekly intervals over 8 weeks, then biweekly, and later monthly for a total of 24 examinations. This represents the most rigorous test of the overall reliability of our measurement system, because all components of analysis and measurement were involved. This patient was studied over a period of 1 year, and thus we have data that reflect the stability of the MR imager and data that show the reliability of image processing performed by three different raters (the three raters evaluated nonoverlapping examinations, n = 8 for each).

Application of the method to a small, well-defined group.-Data from a prospective study of 15 healthy, right-handed male control subjects were used to illustrate the kind of information that can be obtained with these procedures. Volumes were determined for whole brain, gray matter, white matter, CSF, subarachnoid CSF, and ventricles. The ventricles were separated further from subarachnoid CSF by applying a connectivity algorithm to the classified images. Three-dimensional reconstructions of specified tissue were then rendered (see below).

Three-dimensional reconstructions.-In all 15 cases, 3D reconstructions of the brain surface were generated. In selected cases, additional 3D reconstructions were generated for all available tissues.

RESULTS

Radiometric Variability of the MR Volume Data

Our first goal was to determine the reliability and consistency of our imager. Accordingly, we measured whether the same tissue had the same signal intensity in different locations in one representative section and in different sections. This was done by selecting ROIs in white matter from different quadrants of the image. Thus, at different locations within sections and in different sections, we selected ROIs representing white matter exclusively. The multivariate statistical parameters within single ROIs and of the complete population of ROIs were compared. These values are shown in Tables 1 and 2. The results demonstrate the excellent homogeneity of the MR data. (Head phantom data, collected to determine the validity of our measurements, have previously shown that the measurement error for brain parenchyma is 4%-6% [11].)

Figure 5. Interrater reliability. (a) Example of segmentation result used for the analysis in b and in Table 3. Subarachnoid CSF is shown in light blue, gray matter in brown, white matter in yellow, and ventricles in dark blue. (b) Increasing tissue complexity (see text) is plotted versus percent overlap for four of five raters. Five experienced raters determined the areas of the same structures with supervised multivariate analysis and manual measurements. There was better overlap for each of the six structures with the automated procedure than with manual segmentation. In addition, automated segmentation provided more complexity in the more complex structures, reflecting a more consistent handling of the partial-volume data. 1 = ICC, 2 = brain, 3 = ventricles, 4 = white matter, 5 = gray matter, and 6 = subarachnoid CSF.
[Plot in b: "Complexity vs Overlap"; complexity axis from 0 to 120.]

Reliability of Image Processing Measurements

Supervised multivariate analysis compared with manual measurements.-For both the automated and manual measurements, the following areas were determined: white matter, gray matter, the ICC, the ventricular system, and subarachnoid CSF. For the automated procedures the five raters selected training points for the tissue classes (see Materials and Methods), and for the manual measurements the five raters used the cursor to draw a line that followed the boundary of each tissue class. Composite structures were determined by adding up the areas of their constituent parts (eg, brain = gray matter + white matter).

Figure 5 shows the computed measurements for gray matter, white matter, and CSF based on the au- tomated results from the five raters and the manual measurements done by the same raters.

Because the absolute size of these structures varied greatly, the data were normalized by using the average of all 10 measurements from each structure and then calling this average the 100% value for the ICC, with which the individual values were then compared. As expected, the scatter between the raters increased with increasing complexity of the structures (Fig 5 and overlap in Table 3). That is, deviations between methods (error) increased with increasing complexity. However, although there was clearly a methodical error, in particular in the more complex tissues, the variability within one method was relatively small (good interrater reliability).

Table 1
Radiometric Variability of MR Volume Data in One Imaging Section

                                    Mean Volume (mL)
                       No. of     First      Second
Location               Voxels     Echo       Echo       σ1       σ2
Upper left               20       477.1      220.4      30.7      7.2
Upper right              20       487.5      201.9      20.5      7.1
Lower left               20       483.6      234.7      22.3      7.6
Lower right              20       510.7      230.7      23.3      9.5
Equally distributed     200       487.2      216.1      26.5     15.1

Note.-σ1 and σ2 are the standard deviations (SDs) (in milliliters) for the first and second echoes, respectively. Samples of white matter signal intensity were obtained in the various locations. There is an increase in the first-echo mean value in the lower-right quadrant of the image and a decrease in the second-echo mean value in the upper-right quadrant. Compared with the values of the equally distributed voxel population, the maximal changes are +23.5 (+4.8%) and -10.1 (-2.1%) for the first-echo values and +18.6 (+8.6%) and -14.2 (-6.6%) for the second-echo values.

Table 2
Radiometric Variability of MR Volume Data from Different Imaging Sections

                                    Mean Volume (mL)
Axial Section          No. of     First      Second
Pair                   Voxels     Echo       Echo       σ1       σ2
1                       200       491.9      223.3      32.5     17.7
2                       200       496.3      249.9      19.6     10.9
3                       200       483.0      219.7      37.5     23.5
4                       200       479.5      213.0      30.5     20.6
5                       200       487.7      219.2      29.0     17.8
6                       200       487.2      216.1      26.5     15.1
7                       200       492.2      228.1      15.4     14.8
8                       200       484.0      226.2      21.0     14.7

Note.-σ1 and σ2 are the SDs (in milliliters) for the first and second echoes, respectively. This table and Table 1 allow comparison of mean values of different sample groups within white matter throughout the data set.

Table 3
Complexity Analysis

                       Area         SD      Perimeter    Complexity    Overlap (≥4
Tissue Class           (pixels²)*   (%)*    (pixels)†    (%)†          raters) (%)
Manual segmentation
  ICC                  19,655.4      0.9      555.9         2.8           99.3
  Brain                15,914.8      2.2    1,874.2        11.6           95.9
  Ventricles            1,528.6      3.6      256.7        16.8           93.9
  White matter          8,169.6      9.7    1,488.5        18.2           86.2
  Gray matter           7,745.2      7.7    3,180.3        41.1           80.4
  SAS                   2,212.0     18.6    1,870.2        84.5           63.4
Supervised segmentation
  ICC                  19,814.8      0.5      608.6         3.1           99.6
  Brain                17,369.4      0.8    1,611.8         9.3           99.4
  Ventricles            1,518.6      2.1      278.4        18.3           99.5
  White matter          8,595.6      6.6    2,853.5        33.2           92.4
  Gray matter           8,773.8      6.5    4,252.1        48.5           92.6
  SAS                     926.8     15.5    1,083.6       116.9           81.8

Note.-The boundary was extracted as a series of cracks between background and object structure and approximated with a polygon fit. These data are represented graphically in Figure 5b. SAS = subarachnoid CSF.
* The areas are given as the mean and SD of five segmentations.
† The perimeters were measured in one case. Complexity is equal to the perimeter divided by the area, × 100.

A measure of complexity was also used to compare the automated and manual ratings. As described in the Materials and Methods section, the boundaries of the segmented objects were extracted and approximated by the polygon-fit procedure. Table 3 lists the areas, boundary lengths, and complexity values. The results are different for manual and automated segmentations but nonetheless show the same trends: Complexity increases from the ICC, which is the least complex, to white matter, gray matter, and subarachnoid CSF, which is the most complex. This order of increasing complexity is also reflected by the SDs and the measurements for degree of overlap (ie, with increasing complexity, the SD becomes larger and the degree of overlap becomes smaller). The correlation of SD and overlap with C can be expressed numerically: The correlations between SD and complexity were r = .927 for manual and r = .985 for automated segmentation, and the correlations between overlap and complexity were r = -.979 for manual and r = -.976 for automated segmentation. These results illustrate that the reliability decreases for objects of complex shape. On the basis of these results, it is possible to define confidence boundaries for detecting statistically significant differences between tissues.
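The correlations quoted above can be checked directly against the six tissue classes in Table 3. The sketch below assumes that a plain Pearson coefficient over the six rows was used, which the text does not state explicitly; under that assumption the results should land close to the reported values.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Table 3 values in row order: ICC, brain, ventricles, white matter,
# gray matter, subarachnoid CSF.
manual = {
    "complexity": [2.8, 11.6, 16.8, 18.2, 41.1, 84.5],
    "sd": [0.9, 2.2, 3.6, 9.7, 7.7, 18.6],
    "overlap": [99.3, 95.9, 93.9, 86.2, 80.4, 63.4],
}
automated = {
    "complexity": [3.1, 9.3, 18.3, 33.2, 48.5, 116.9],
    "sd": [0.5, 0.8, 2.1, 6.6, 6.5, 15.5],
    "overlap": [99.6, 99.4, 99.5, 92.4, 92.6, 81.8],
}

for name, data in (("manual", manual), ("automated", automated)):
    print(name,
          "SD vs C:", round(pearson_r(data["complexity"], data["sd"]), 3),
          "overlap vs C:", round(pearson_r(data["complexity"], data["overlap"]), 3))
```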

A closer examination of the pixel-by-pixel overlap between manual and automated segmentation, how- ever, shows that although both methods resulted in a similar number of white matter voxels being classified (8,169 manual vs 8,595 automated), setting our crite- ria to a minimum of four raters obtaining identical classifications per pixel resulted in 86.2% overlap of pixels for manual and 92.4% for automated tech- niques (Table 3 ) .

Moreover, although selecting the training points for the segmentation was done in one section, this information can be used to segment the entire brain (all sections). In contrast, the manual measurements, which took approximately 2 hours to complete as opposed to 10 minutes for the computed measurements, were for only one section. Another 60-80 hours would have been needed to complete manual measurements for all sections.

Supervised automated segmentation of a whole brain data set by three raters.-Table 4 lists the results of analysis of the same whole brain data set by three raters. As for a single section pair (Table 3), brain and ICC could be measured with high reliability, whereas measurements of white matter, gray matter, white matter lesions, and CSF showed greater variability, though still quite acceptable reliability.

Longitudinal analysis of the same subject over several weeks.-Figure 6 plots ICC and brain volumes derived from measurements over time in a single female patient. We found that the volume of the ICC, which is anatomically bounded mostly by bone and should accordingly not change, was very stable (SD was 1.2% of the mean). Brain volume was found to fluctuate slightly more over the same time (SD was 6.34% of the mean). This correlates well with the higher complexity value for the structure of the brain. This consistency of measurements over time was obtained even though three different raters segmented the data sets.

Table 4
Interrater Variability for Three Raters

| Tissue Class | Rater 1 (mL) | Rater 2 (mL) | Rater 3 (mL) | Mean (mL) | Error Bounds (%) |
|---|---|---|---|---|---|
| First run | | | | | |
| ICC | 1,495.8 | 1,486.9 | 1,474.3 | 1,485.7 | +0.7, -0.8 |
| Brain | 1,342.8 | 1,333.0 | 1,358.6 | 1,344.8 | +1.0, -0.9 |
| CSF | 129.7 | 132.5 | 101.6 | 121.3 | +9.3, -16.2 |
| White matter | 601.1 | 661.0 | 853.2 | 705.1 | +21.0, -14.7 |
| Gray matter | 741.7 | 672.0 | 505.4 | 639.7 | +15.9, -21.0 |
| WML | 11.7 | 10.7 | 7.1 | 9.8 | +19.0, -28.0 |
| Second run* | | | | | |
| ICC | 1,448.0 | 1,435.2 | 1,441.3 | 1,441.5 | +0.4, -0.4 |
| Brain | 1,302.2 | 1,283.1 | 1,301.1 | 1,295.5 | +0.5, -1.0 |
| CSF | 145.7 | 152.1 | 140.2 | 146.0 | +4.2, -4.0 |
| White matter | 613.1 | 637.7 | 679.1 | 643.3 | +5.6, -4.7 |
| Gray matter | 679.4 | 630.8 | 614.4 | 641.5 | +5.9, -4.2 |
| WML | 9.7 | 14.6 | 7.7 | 10.7 | +37.1, -28.2 |

Note.-WML = white matter lesion.
* Results obtained by same three raters, using same data set, 3 months after first run.
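The error bounds reported in Table 4 are not defined in this section. The tabulated values appear consistent with the largest positive and negative deviations of the individual raters from the three-rater mean, expressed as a percentage of that mean. The sketch below assumes that definition; the function name `error_bounds` is illustrative, and the volumes are the first-run values from Table 4.

```python
def error_bounds(volumes):
    """Return (mean, upper %, lower %) for one tissue class.

    Assumption: bounds are the max and min rater deviations from the
    mean, as a percentage of the mean.
    """
    mean = sum(volumes) / len(volumes)
    upper = 100.0 * (max(volumes) - mean) / mean
    lower = 100.0 * (min(volumes) - mean) / mean
    return mean, upper, lower


# First-run volumes (mL) for raters 1-3, taken from Table 4.
first_run = {
    "ICC": (1495.8, 1486.9, 1474.3),
    "Brain": (1342.8, 1333.0, 1358.6),
    "CSF": (129.7, 132.5, 101.6),
    "White matter": (601.1, 661.0, 853.2),
    "Gray matter": (741.7, 672.0, 505.4),
}

for tissue, vols in first_run.items():
    mean, upper, lower = error_bounds(vols)
    print(f"{tissue}: mean {mean:.1f} mL, bounds {upper:+.1f}%, {lower:+.1f}%")
```

Under this assumed definition, the narrow bounds for the ICC and brain and the wide bounds for white and gray matter follow directly from the spread among the three raters.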

Application of the method to a small, well-defined group.-Table 5 lists the volumetric results from the 15 healthy control subjects. The trend that we found in the single sections (see above) was repeated here. The scatter, as reflected in the SD, increased from 6% for the ICC to 28% for the subarachnoid CSF. Because the volumes were determined in different subjects, the natural scatter within the group added to the methodologic error.
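The relative scatter quoted above can be recomputed from the means and SDs in Table 5 as SD divided by mean. The sketch below does this and also checks that the tabulated compartment means add up; the dictionary layout and the helper name `relative_sd` are illustrative choices, not part of the original method.

```python
# Mean and SD (mL) per tissue class, taken from Table 5.
table5 = {
    "ICC": (1562.10, 104.95),
    "Brain": (1440.81, 214.01),
    "White matter": (681.59, 112.33),
    "Gray matter": (759.22, 101.68),
    "Subarachnoid CSF": (104.48, 29.21),
    "Ventricles": (16.81, 4.4),
    "Total CSF": (121.29, 34.66),
}


def relative_sd(mean, sd):
    """SD expressed as a percentage of the mean."""
    return 100.0 * sd / mean


cv = {tissue: relative_sd(m, s) for tissue, (m, s) in table5.items()}
for tissue, value in cv.items():
    print(f"{tissue}: SD is {value:.1f}% of the mean")

# Consistency of the tabulated means: parts should sum to the whole.
brain_sum = table5["White matter"][0] + table5["Gray matter"][0]
csf_sum = table5["Subarachnoid CSF"][0] + table5["Ventricles"][0]
icc_sum = table5["Brain"][0] + table5["Total CSF"][0]
```

The relative SD is between 6% and 7% for the ICC and close to 28% for the subarachnoid CSF, in line with the trend described in the text.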

Three-dimensional reconstructions.-Our 3D reconstructions of the skin surface and brain were generated from the same segmentation results used to calculate the tissue volumes (Table 5). The surface anatomy is shown in Figure 7.

DISCUSSION

In recent years, the quality of MR imaging data has improved considerably. This is because advanced features in the hardware and software of commercial imagers have provided more spatially homogeneous data and a better signal-to-noise ratio. Such improvements have allowed us to optimize MR imaging acquisition parameters and to apply multichannel image processing to MR data sets, neither of which would have been possible without such improvements. We have now successfully applied these newly developed image processing techniques to more than 365 routine clinical MR studies of the brain. Herein we have reported an analysis of the reliability and reproducibility of the method in a subset of these cases.

With few exceptions (6), earlier attempts have not explicitly used the concept of multiple steps but have relied on one or a few algorithms. These earlier attempts can be divided into two types: (a) single-pixel-based methods (eg, 12) and (b) neighborhood algorithms (eg, 13).

Single-pixel-based classification of MR images, such as by windowing or multivariate analysis, failed because of the inhomogeneity of the available data (14-16). Similarly, attempts to apply neighborhood algorithms to gray-scale data did not result in automated procedures. Different edge-detection schemes have been used, but thus far all have relied on substantial user interaction (17) and/or supervised edge tracing (13,15,18,19). Others have used interactive manual segmentation (20,21) or a combination of manual and probabilistic techniques (22,23). The drawbacks of manual segmentation are the amount of time required for the analysis and the introduction of subjectivity. Still others have relied on techniques that optimize contrast between only two tissue classes during data acquisition (24,25). Such a "binary" approach to segmentation is useful if only one tissue is of interest,


Table 5
Application of Computer Methods to a Small, Well-Defined Group of Healthy Volunteers

| Tissue Class | Mean Volume (mL) | SD (mL) |
|---|---|---|
| ICC | 1,562.10 | 104.95 |
| Brain | 1,440.81 | 214.01 |
| White matter | 681.59 | 112.33 |
| Gray matter | 759.22 | 101.68 |
| Subarachnoid CSF | 104.48 | 29.21 |
| Ventricles | 16.81 | 4.4 |
| Total CSF | 121.29 | 34.66 |

Note.-The 15 healthy, right-handed male volunteers were prospectively chosen.

as in CSF volume determination (24); however, it falls short in cases in which multiple tissue classes are of interest, for instance, when it is useful to differentiate among gray matter, white matter, and CSF.

Another problem that limits image processing techniques is separation of the ICC from extraneous tissue outside the ICC. This is because many soft tissues outside the ICC have signal intensity properties similar to those of tissues inside the ICC (eg, the contents of the eyeball have the same signal intensity properties as CSF). Some groups have used data only from the upper parts of the skull, where there are no direct connections with soft-tissue bridges (23). Others avoided this problem because they used manual correction in outlining the brain (13,15). We have generated a full 3D description of the ICC as an individual

Figure 7. Example of segmentation-derived 3D reconstruction shows reconstructed skin from posterior oblique view and simulated craniotomy. The opening in the skin surface allows visualization of the central sulcus. Part of the gray matter (which is white and gray) was removed on the pre- and postcentral gyri to emphasize the white matter (which is yellow).

step using automated techniques. Anatomically speaking, the boundary of the ICC is marked by the inner table of the skull bone (which has low signal intensity on all MR images) on one side and by CSF and brain tissue (which have intermediate, low, or high signal intensity, depending on the echo used) on the other side. Since there are only small openings in the boundary of the ICC (eg, the different foramina), we have developed a procedure that cuts through bridges that connect soft tissues inside and outside the ICC.
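The bridge-cutting procedure itself is not detailed in this section. One generic way to sever thin connections in a binary mask is to erode the mask, keep the connected component that contains a seed point inside the structure of interest, and then dilate that component back within the original mask. The 2D sketch below illustrates this idea only; it is an assumed stand-in rather than the authors' procedure, and the names `erode`, `component`, `dilate_within`, and `cut_bridges` as well as the toy mask are hypothetical.

```python
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity


def erode(mask):
    """Keep a pixel only if it and its four in-bounds neighbors are set."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and all(
                0 <= r + dr < rows and 0 <= c + dc < cols and mask[r + dr][c + dc]
                for dr, dc in NEIGHBORS
            ):
                out[r][c] = 1
    return out


def component(mask, seed):
    """Flood fill: pixels of `mask` that are 4-connected to `seed`."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    if not mask[seed[0]][seed[1]]:
        return out
    out[seed[0]][seed[1]] = 1
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] and not out[nr][nc]:
                out[nr][nc] = 1
                queue.append((nr, nc))
    return out


def dilate_within(mask, limit):
    """Grow `mask` by one pixel, but only into pixels set in `limit`."""
    rows, cols = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                for dr, dc in NEIGHBORS:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and limit[nr][nc]:
                        out[nr][nc] = 1
    return out


def cut_bridges(mask, seed, steps=1):
    """Erode, keep the seeded component, then regrow inside the original mask."""
    eroded = mask
    for _ in range(steps):
        eroded = erode(eroded)
    kept = component(eroded, seed)
    for _ in range(steps):
        kept = dilate_within(kept, mask)
    return kept


# Toy mask (hypothetical): a 5 x 5 blob, a one-pixel-wide bridge, a 3 x 3 blob.
mask = [[0] * 10 for _ in range(5)]
for r in range(5):
    for c in range(5):
        mask[r][c] = 1
for c in (5, 6):
    mask[2][c] = 1
for r in range(1, 4):
    for c in range(7, 10):
        mask[r][c] = 1

result = cut_bridges(mask, seed=(2, 2), steps=1)
```

In this toy case the small blob on the right is removed because the bridge does not survive erosion. The simple regrowth step rounds the corners of the retained blob and can leave a one-pixel stub of the bridge, so a practical implementation would need a more careful reconstruction step.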

Our approach offers the following: (a) The data are compatible with routine clinical evaluations because the images contain the full contrast range of clinical images; (b) the imager and the data acquisition protocols provide spatially homogeneous data sets; (c) whole-brain data sets are acquired; (d) multiple tissues can be extracted from a single acquisition (the method is not a binary segmentation, eg, CSF vs everything else); and (e) except for the supervision required to select the initial pixels, the method is essentially automated, thus having the potential to be fast (relative to techniques based on a larger proportion of interactive work) and to have greater reproducibility and less variability. Although there are still instances when manually guided ROIs are necessary (eg, in evaluating the hippocampus and parahippocampal gyrus [25]), the automated techniques offer a clear advantage. Moreover, this system is under full control of the operator because the operator can perform the training step iteratively until a satisfactory result is obtained. However, while the operator decides which sample points are picked for the different tissue labels, computerized tools are used to extract signal intensities derived from those sample points and to define classification rules based on the statistics.
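The supervised training step described above (operator-selected sample points, followed by statistics-based classification rules) can be illustrated with a small sketch. The specific rule used here, an independent Gaussian per echo with maximum-likelihood assignment, is an assumption made for illustration; this section does not state which statistical rule the system applies. The intensity values, the tissue labels in `training`, and the function names are all hypothetical.

```python
import math


def fit_class_stats(samples):
    """Per-class mean and variance for each of the two echo intensities.

    `samples` maps a tissue label to a list of (echo1, echo2) pairs.
    """
    stats = {}
    for label, points in samples.items():
        n = len(points)
        means = [sum(p[k] for p in points) / n for k in range(2)]
        variances = [
            max(sum((p[k] - means[k]) ** 2 for p in points) / (n - 1), 1e-6)
            for k in range(2)
        ]
        stats[label] = (means, variances)
    return stats


def classify(pixel, stats):
    """Assign the label whose Gaussian model gives the highest log-likelihood."""

    def log_likelihood(means, variances):
        return sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(pixel, means, variances)
        )

    return max(stats, key=lambda label: log_likelihood(*stats[label]))


# Hypothetical operator-picked sample points (arbitrary intensity units).
training = {
    "CSF": [(60, 200), (65, 210), (55, 190)],
    "gray matter": [(120, 140), (125, 150), (115, 145)],
    "white matter": [(100, 100), (105, 95), (95, 105)],
}

stats = fit_class_stats(training)
label_a = classify((58, 205), stats)
label_b = classify((102, 98), stats)
print(label_a, label_b)
```

Repeating the training step with different sample points, as the text describes, would simply mean calling `fit_class_stats` again with the revised samples and reclassifying.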

Using the techniques described herein, we have automated the identification of different anatomic structures in MR images of the brain. On the basis of our experience and the fact that much more powerful computers will become available and that the algorithms will be improved, we believe that application of our technique to generate 3D reconstructions and volume determinations of brain structures will become part of the routine evaluation of MR imaging data sets of the head.

These new segmentation procedures can be used in two different ways: (a) to determine the volumes of structures identified and (b) to generate 3D reconstructions of identified structures. These two capabilities will likely increase our ability to more reliably and accurately diagnose such disorders as Alzheimer disease, multiple sclerosis, brain atrophy, and hydrocephalus. Follow-up studies in patients with diseases such as tumor, brain edema, and multiple sclerosis will allow a more precise determination of the progression of disease on the basis of quantitative volumetric measures, in addition to the qualitative analysis by the radiologist. The routine evaluation of the brain in 3D representations also affords access to new methods of diagnosis of conditions such as brain atrophy and hydrocephalus (7) and for surgical planning (26,27). The latter would include, but not be limited to, identification of the central sulcus for the planning of neurosurgical procedures, the localization of tumor relative to the brain surface for the planning of optimum entry for tumor excision, and/or interactive identification of blood vessels to assist in selecting the optimum entry for neurosurgery (Fig 7).

CONCLUSIONS

Our main findings can be summarized as follows:

1. We were able to determine the ICC, brain (gray matter and white matter), and CSF volumes by using new automated MR image processing procedures. We were also able to create 3D reconstructions of these tissue components, which greatly enhances the appreciation of complex anatomy. The validity of these measurements was not the focus of this study; rather their reliability was the focus. Since no standard of reference exists for determining the validity of MR volumetric measurements, we compared volumetric data from computed segmentation with postmortem volumes reported in the literature and found that our results were consistent with those data (25). We have also reported volumetric data calculated from head phantom images, which also showed consistent results (11). Thus, while our focus in the present study was to assess the reproducibility/reliability of our measurements, we also have indirect assessments of the validity of these measurements.

2. There was less variability among the five trained raters when they used the automated procedures than among the same raters using manual outlining for any given structure.

3. The variability in measurements changed as a function of the complexity of the structure being assessed. Simpler structures such as the ICC showed less measurement variability among the five raters than did more complex structures such as the subarachnoid CSF. This was true for both manual and automated techniques, although the variability was less for the automated measurements.

4. Automated segmentation of the whole brain by three raters showed high reliability for the ICC, brain, and CSF, with measurement variability a function of structural complexity.

5. A longitudinal analysis of the same subject over several weeks showed that the volume of the ICC was stable (SD, 1.2%), as was brain volume (SD, 6.3%).

Acknowledgments: The authors gratefully acknowledge the technical and administrative support provided by Diane Doolin, BS, Marianna Jakab, MSEE, Adam Shostack, BS, Andre Robatino, MS, Brian Chiango, RT, and Maureen Ainslie, RT.

References

1. Cline HE, Lorensen WE, Kikinis R, Jolesz FA. Three-dimensional segmentation of MR images of the head using probability and connectivity. J Comput Assist Tomogr 1990; 14:1037-1045.

2. Jolesz FA, Schwartz RB, LeClerq GT, et al. Half Fourier spin echo imaging in routine clinical brain and cervical spine protocols (abstr). Magn Reson Imaging 1990; 8(suppl 1):62.

3. Gerig G, Kubler O, Kikinis R, Jolesz FA. Nonlinear anisotropic filtering of MRI data. IEEE Trans Med Imaging 1992; 11:221-232.

4. Shenton ME, Kikinis R, McCarley RW, et al. Application of automated MRI volumetric measurement techniques to the ventricular system in schizophrenics and normal controls. Schizophr Res 1991; 5:103-113.

5. Feinberg DA, Hale JD, Watts JC, et al. Halving MR imaging time by conjugation: demonstration at 3.5 kG. Radiology 1986; 161:527-531.

6. Gerig G, Kuoni W, Kikinis R, Kubler O. Medical imaging and computer vision: an integrated approach for diagnosis and planning. Presented at the DAGM Symposium on Computer Vision, Hamburg, Germany, October 2-4, 1989.

7. Kikinis R, Jolesz FA, Gerig G, et al. 3D morphometric and morphologic information derived from clinical brain MR images: NATO advanced workshop in Travemünde, June 1990. In: Höhne KH, Fuchs H, Pizer SM, eds. 3D imaging in medicine: algorithms, systems, applications. NATO ASI series F: computer systems sciences. Vol 60. Berlin: Springer-Verlag, 1990; 441-454.

8. Perona P, Malik J. Scale space and edge detection using anisotropic diffusion. In: Proceedings of IEEE workshop on computer vision. Miami, Fla: IEEE, 1987; 6-22.

9. Lorensen WE, Cline HE. Marching cubes: a high resolution 3D surface reconstruction algorithm. ACM Comput Graphics 1987; 21:163-169.

10. Cline HE, Lorensen WE, Ludke S, Crawford CR, Teeter BC. Two algorithms for the three-dimensional reconstruction of tomograms. Med Phys 1988; 5:320-327.

11. Cline HE, Lorensen WE, Souza SP, et al. 3D surface rendered MR images of the brain and its vasculature. J Comput Assist Tomogr 1991; 15:344-351.

12. Kohn MI, Tanna NK, Herman GT, et al. Analysis of brain and cerebrospinal fluid volumes with MR imaging. I. Methods, reliability, and validation. Radiology 1991; 178:115-122.

13. Filipek PA, Kennedy DN, Caviness VS, et al. Magnetic resonance imaging-based brain morphometry: development and application to normal subjects. Ann Neurol 1989; 25:61-67.

14. Vannier MW, Butterfield RL, Jordan D, et al. Multispectral analysis of magnetic resonance images. Radiology 1985; 154:221-224.

15. Levin DN, Pelizzari CA, Chen GTY, et al. Retrospective geometric correlation of MR, CT, and PET images. Radiology 1988; 169:817-823.

16. Udupa JK, Srihari SN, Herman GT. Pattern analysis and machine intelligence. IEEE Trans 1982; 4:41-50.

17. Höhne KH, Bomans M, Pommert A, et al. Rendering tomographic volume data: adequacy of methods of different modalities and organs. In: Höhne KH, Fuchs H, Pizer SM, eds. 3D imaging in medicine: algorithms, systems, applications. NATO ASI series F: computer systems sciences. Vol 60. Berlin: Springer-Verlag, 1990; 197-215.

18. Jack CR Jr, Gehring DG, Sharbrough FW, et al. Temporal lobe volume measurement from MR images: accuracy and left-right asymmetry in normal persons. J Comput Assist Tomogr 1988; 12:21-29.

19. Jack CR Jr. Brain and cerebrospinal fluid volume: measurement with MR imaging. Radiology 1991; 178:22-24.

20. Press GA, Amaral DG, Squire LR. Hippocampal abnormalities in amnesic patients revealed by high-resolution magnetic resonance imaging. Nature 1989; 341:54-57.

21. Squire LR, Amaral DG, Press GA. Magnetic resonance imaging of the hippocampal formation and mammillary nuclei distinguish medial temporal lobe and diencephalic amnesia. J Neurosci 1990; 10:3106-3117.

22. Schroth G, Naegele T, Klose U, Mann K, Petersen D. Reversible brain shrinkage in abstinent alcoholics, measured by MRI. Neuroradiology 1988; 30:385-392.

23. Rusinek H, de Leon MJ, George AE, et al. Alzheimer disease: measuring loss of cerebral grey matter with MR imaging. Radiology 1991; 178:109-114.

24. Condon B, Wyper D, Grant R, Patterson J, Hadley D, Teasdale G. Use of magnetic resonance imaging to measure intracranial cerebrospinal fluid volume. Lancet 1986; 1:1355-1357.

25. Shenton ME, Kikinis R, Jolesz FA, et al. Left temporal lobe abnormalities in schizophrenia and thought disorder. N Engl J Med 1992; 327:604-612.

26. Kikinis R, Jolesz FA, Lorensen WE, Cline HE, Stieg PE, Black PML. 3D reconstruction of skull base tumors from MRI data for neurosurgical planning (abstr). In: Book of abstracts: Society of Magnetic Resonance in Medicine 1991. Berkeley, Calif: Society of Magnetic Resonance in Medicine, 1991; 752.

27. Kikinis R, Altobelli DE, Cline HE, Lorensen WE, Mulliken J, Jolesz FA. Planning and simulation of cranio-maxillofacial surgery using 3-dimensional reconstructions. Presented at the XII International Congress of Head and Neck Radiology, Zurich, Switzerland, October 1991.
