Int. JCARS manuscript No. (will be inserted by the editor)

Virtual Mastoidectomy Performance Evaluation through Multi-Volume Analysis

Thomas Kerwin · Don Stredney · Gregory Wiet · Han-Wei Shen

the date of receipt and acceptance should be inserted later

Abstract

Purpose Development of a visualization system that provides surgical instructors with a method to compare the results of many virtual surgeries (n > 100).
Methods A masked distance field models the overlap between expert and resident results. Multiple volume displays are used side-by-side with a 2D point display.
Results Performance characteristics were examined by comparing the results of specific residents with those of experts and the entire class.
Conclusions The software provides a promising approach for comparing performance between large groups of residents learning mastoidectomy techniques.

Keywords Volume feature extraction, comparative visualization, mastoidectomy

T. Kerwin · D. Stredney
Ohio Supercomputer Center, Columbus, Ohio, USA
E-mail: [email protected]; [email protected]

G. Wiet
Nationwide Children’s Hospital, Columbus, Ohio, USA
The Ohio State University Medical Center, Columbus, Ohio, USA
E-mail: [email protected]

H.-W. Shen
Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio, USA
E-mail: [email protected]

1 Introduction

The creation of more realistic and effective surgical simulation is an ongoing and active area of research. An often overlooked component of a successful simulation suite is the analysis of the output of those surgical simulations. This type of technology is especially needed during the use of a simulation by many residents during a training course, where many volumes can be recorded. Tools are needed to examine the performance over time of individual residents as well as the performance of the class in general. Our application is the analysis of datasets acquired from a mastoidectomy simulator involving virtual drilling on a volumetric bone. In the surgery, bone is carefully removed from the temporal bone region of the skull using a drill in order to access the middle and inner ear. The anatomy data used in the simulator are acquired from CT scans of cadavers. Using a haptic 3D joystick as a virtual surgical drill, users remove portions of the bone on the computer to expose underlying structures, as in actual surgery. This simulation system trains residents in the psychomotor and cognitive skills necessary to perform in the operating room.

The traditional curriculum for mastoidectomy surgery consists of expert surgeons commenting on the performance of a resident performing the procedure on a cadaver or artificial bone. With the use of a computer simulation, data on performance from an entire group of residents can be evaluated and compared. The computer simulation removes the variability inherent in the different temporal bones used by different trainees, and can potentially eliminate the subjective biases of the expert observer by allowing for objective, algorithmic assessment. The application detailed in this article provides a way to compare the output of one of these standardized virtually drilled bones to others and presents a unified view of these resident examinations. It also allows for the automated graphical analysis of multiple volume data files, facilitating the comparison of surgical skill as well as the uniform analysis and application of previously defined descriptive metrics [18] for objective comparison across trainees and against expert performance. We hypothesize that with further development and testing, such applications can provide both formative and summative assessment of skill with a reduced burden on expert surgeons, allowing them to devote more time to other matters.


This application utilizes the output of these simulations (virtual bones that have been partially drilled away on the computer) and organizes them for comparative analysis and qualitative visualization. Through a multi-institutional study, 80 residents were asked to perform a virtual mastoidectomy on a temporal bone before and after a training class. Some of the residents declined to provide data after the class, so we have 80 pre-test mastoidectomy performances and 50 post-test performances: 130 volumes total.

To create a system that allows meaningful exploration of a multi-volume dataset, we create distance field representations of volumes created by experts performing the simulation. Our task differs in several respects from many other multi-volume dataset studies. For instance, we are not concerned with registration issues, since all the volumes are modified from the same original volume in the simulation. Also, we are not concerned with segmentation or the creation of an atlas. Our focus is on the problem of finding distance measures between volumes that reflect the differences between expert and resident performances in the virtual surgery. The distance field representation captures shape information from the expert simulation example, which is used to construct a distance measure between the expert and resident data.

After performing a masking operation on the distance fields based on a resident volume, we compute histogram distances to extract features that represent the amount of overlap between the two volumes. We use these and other features to construct a feature vector. Dimensionality reduction techniques are used to display a 2D plot of the vectors from the whole dataset. We link the 2D plot of volumes to individual volume rendering windows so that the data space of the multi-volume dataset can be explored interactively. The contributions of this article are both the novel shape comparison based on masked distance fields and the interactive application that utilizes this and other volume comparison techniques to visualize a multi-volume dataset.

2 Previous Work

Multifield visualization is a topic of high interest, but most work considers only a small number of fields. Woodring and Shen [19] show a method to compare volumes using set operators and a tree to composite multiple volumes into one display. This works well for four or even eight volumes, but applying it to a global view of our dataset is infeasible. We do build on their approach for our individual volume displays. However, the problem of visualizing a large dataset of multiple, related volumetric images has not been widely studied. Joshi et al. [9] used distance features to place brain imaging data into a three-dimensional space for exploration, with a corresponding distance grid to aid in an overview of the space. However, displaying the separate brain volumes in a shared three-dimensional space creates occlusion problems and does not allow for the easy visual comparison of multiple volumes. In addition, they were forced to display only isosurfaces of the brains because of the difficulty of rendering 500 volumes simultaneously. Parallel coordinates have been used as an overview component of a visualization for a 4D anatomical motion database, as described by Keefe et al. [10].

We use a method inspired by Bruckner and Möller [1] to generate features for our volumes. In that work, the authors generate a distance field for each isosurface value. They compute a histogram of that distance field and, using that, find a mutual-information histogram distance measure between each pair of isosurface values. Our use of distance fields is inspired by their work, but we use different processing methods on those fields. We generate a distance field for each expert volume and expert-derived volume and use a masking approach to compare histograms. They use a similarity map to display one value per pair of isosurface values, while we generate a high-dimensional feature vector using a variety of distance measures. They use a mutual-information-based histogram distance measure, while we employ an earth mover's distance. We share with them the requirement of a pre-processing step, in which distance fields of volumetric data are generated and either mutual information or earth mover's distance is computed. We compare our distance measure to theirs.

Our shape analysis of the volumes used in our application is based on a binary mask: a voxel is either occupied or not. The task of extracting a binary mask from volumetric datasets has been widely studied, and many methods, from isovalue choice to much more complex manual, automatic, and semi-automatic approaches, have been used [6,20]. Mahfouz et al. [13] looked at geometrical measures to create feature vectors on medical volumes. That work used a dataset of 228 computed tomography (CT) volumes of human knees, and they were able to classify the sex of the individual by a feature vector of length 45 generated from the voxels of the patellae. They considered individual measures, such as moments, rather than measures based on the difference between voxels in individual volumes.

3 Methods

3.1 Definitions

Our goal is to create a visualization system to evaluate student performance on a volumetric surgical simulator. During the simulation, material is removed from the original virtual bone. This procedure defines a region that is a subset of the original bone. Instructors who wish to examine the performance of a particular resident will want to compare the region drilled by the resident to regions drilled in other situations: by experts, by other students, or even by the same student at a different time. This paper describes a system by which instructors can review a large number of these surgical simulation performances and evaluate the quality of the drilling results. The system consists of two main parts:

1. Extraction of features that quantify the overlap of space between the resident-drilled regions and the expert-drilled regions. Intuitively, high overlap of these regions is an indicator of good surgical technique by the resident, while low overlap indicates poor technique. We employ an approach based on the distance field of the expert-drilled regions to quantify this overlap, as explained below.

2. Interactive visualization that displays the position of expert and resident volumes in the feature space defined by the features of each volume, but also allows direct viewing of the comparison between the expected and actual performance on the resident bone via volume rendering.

We convert each volume in our dataset into a cavity representation before any other processing is performed. We use the set of voxels that were removed by drilling as our definition of the final product from each resident. In other words, the set of voxels inside the cavity drilled away by the user is used as the volume in this analysis. The empty space is treated as positive space during analysis. A student who does not drill at all during the simulation will produce an empty cavity volume as the result. This is a key point to remember in the following sections. All of the data volumes are treated as binary volumes for processing, although we use the numerical values of the original opacity of the bone in our rendering.

Our dataset consists of two disjoint subsets. The smaller subset contains volumes from surgical performances by known experts; we call this the reference set. The remaining volumes have been obtained from surgical residents of unknown competency in the procedure; we call this set the test set. We use 130 test volumes and four reference volumes to demonstrate the utility of our system.

3.2 Composite reference volumes

To account for the variation in performance among experts, we compare the entire set of reference volumes with our test volumes. We create two composite volumes to describe this set: ALL and ANY. A voxel in volume ALL is occupied if and only if the corresponding voxel is occupied in all reference volumes. Similarly, a voxel in volume ANY is occupied if and only if any corresponding voxel in the reference volumes is occupied. So, where R is the set of reference volumes:

ALL = \bigcap_{V \in R} V, \qquad ANY = \bigcup_{V \in R} V. (1)

The ANY and ALL volumes will be referred to as composite reference volumes, since they are formed by an operation over all the volumes in the reference set. These composite volumes designate the maximal and minimal boundaries of the reference set. In our application, the portion of the bone that is removed is the portion considered for scoring the performance of each user. Therefore, the ALL volume represents where it is mandatory for a user to remove bone from the skull, since all of the experts removed bone in those areas. Similarly, the ANY volume represents regions where a user is allowed to remove bone from the skull, since at least one expert removed bone in those areas.
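Under these definitions, the composite reference volumes reduce to elementwise boolean operations over the reference set. A minimal sketch of Eq. 1 in NumPy follows; the function name `composite_reference_volumes` and the use of NumPy are our own assumptions, not the authors' implementation:

```python
import numpy as np

def composite_reference_volumes(reference_set):
    """Combine binary reference volumes into the ALL and ANY composites.

    ALL: voxels removed by every expert (mandatory drilling).
    ANY: voxels removed by at least one expert (permitted drilling).
    """
    stack = np.stack([v.astype(bool) for v in reference_set])
    vol_all = np.logical_and.reduce(stack)  # intersection over R
    vol_any = np.logical_or.reduce(stack)   # union over R
    return vol_all, vol_any
```

Any voxel grid shape works, since the volumes are pre-registered to the same base volume.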

3.3 Feature vectors

As stated in Sec. 3.1, one of our goals is the definition of a feature that gives a similarity measure between two volumes. There are, of course, a wide variety of features that have been used in the literature for various types of volume visualization and analysis. We use a distance field-based histogram similarity feature rather than texture- [2] or size-based [3] measures for two reasons: (1) all of the volumes in our dataset are pre-registered, since they were constructed by modifying the same base volume, and (2) a large proportion of each volume is unchanged from user to user. We use a distance field to represent the shape of our objects. This technique is a common step in non-linear registration techniques [7,14] as well as shape skeletonization algorithms [5,12].

The distance field representation of the volume is a scalar field where each point has a value equal to the shortest distance to the boundary of the volume. We use a signed Euclidean distance field. Let d be the Euclidean distance to the boundary; voxels outside the boundary of the object have a value of −d, while voxels inside the boundary have a value of d. A signed distance field is computed for all of the volumes in the reference set, as well as the ALL and ANY composite volumes. In Fig. 1, (c) shows a signed distance field generated from (a). Teal and green pixels show negative values of the distance field, while blue pixels show positive values.
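A signed Euclidean distance field with this convention (positive inside, negative outside) can be sketched with SciPy's distance transform; the helper name `signed_distance_field` is hypothetical:

```python
import numpy as np
from scipy import ndimage

def signed_distance_field(cavity):
    """Signed Euclidean distance to the boundary of a binary cavity volume.

    Voxels inside the cavity get +d, voxels outside get -d, matching the
    sign convention in the text.
    """
    cavity = cavity.astype(bool)
    # Distance from each inside voxel to the nearest outside voxel,
    # and vice versa; each approximates distance to the boundary.
    inside = ndimage.distance_transform_edt(cavity)
    outside = ndimage.distance_transform_edt(~cavity)
    return np.where(cavity, inside, -outside)
```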

The signed distance field representation is a tool for describing the shape of the volume, but we need a function that characterizes the differences between the test data items and those in the reference set. Histogram comparison is often used for this purpose. However, since we expect the shapes of the reference and test datasets to be fairly close, direct histogram comparison between the two distance fields


Fig. 1 A 2D example of our technique. Panels: (a) reference data item; (b) test data item; (c) full distance field; (d) MR,T: (c) masked by (b); (e) MR,R: (c) masked by (a); (f) histogram of (c); (g) histogram of (d); (h) histogram of (e). A distance field (c) is computed based on the reference data boundary (a). Test data boundaries (b) are used as a mask over this distance field (creating (d)) and the reference data is used as a mask as well (creating (e)). The histograms of the two masked distance fields are calculated (producing (g) and (h)). These histograms are compared using the methods detailed in Sec. 3.3.

would reveal very little. In order to magnify the shape difference between the two volumes, we define a masking operation to retrieve only the part of the reference volume distance field that overlaps the test volume. For every pair of reference volume R and test volume T, we define MR,T as a volume that includes the values from the distance field of R only where the voxels from T are occupied. Fig. 1 (d) shows the result of this masking operation in 2D. Only the values in the distance field of the reference volume that correspond with occupied voxels in the mask set are kept; all others are discarded.

After this masking operation, which excludes the regions of the reference volume distance field that do not overlap the region of the test volume, we calculate a histogram of MR,T. Since we are concerned with binary volumes that have a large number of overlapping voxels, the following procedure will produce features characterizing the area that the test volume occupies but the reference volume does not, as well as the area in which the test volume fails to occupy the reference volume area. We also compute MR,R, which is the reference volume distance field excluding all regions outside the reference volume. This will be used later to compare with MR,T.

3.3.1 Histogram distances

We create two signatures for every combination of test volume and expert volume. A histogram of MR,T is obtained. We divide this histogram into halves: one where the distance field is negative (Hout) and one where it is zero or positive (Hin). Based on these two histograms, we calculate a distance between a given reference volume and all test volumes.
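The masking and histogram-splitting step can be sketched as follows; the function name and the bin parameters are illustrative assumptions, not values from the paper:

```python
import numpy as np

def masked_histograms(dist_field, mask, n_bins=64, rng=(-32.0, 32.0)):
    """Histogram of a reference distance field restricted to a mask.

    Keeps only distance values at voxels occupied in `mask` (the masking
    operation M_{R,T}), then splits the histogram into Hout (negative
    distances) and Hin (zero or positive distances).
    """
    values = dist_field[mask.astype(bool)]
    hist, edges = np.histogram(values, bins=n_bins, range=rng)
    centers = 0.5 * (edges[:-1] + edges[1:])
    h_out = hist[centers < 0]   # space outside the reference region
    h_in = hist[centers >= 0]   # space inside the reference region
    return h_in, h_out
```

Passing the reference volume itself as the mask yields the histograms of MR,R used for comparison.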

Next, we must define functions in terms of the histograms to quantify the difference between the reference and test volumes. We will refer to the scoring function as ω, with ωin evaluating the inside portion of the distance field (Hin) and ωout evaluating the outside portion (Hout). This is a better approach in our case than full histogram matching, since Hout of MR,R (where R is the reference volume) will consist entirely of zero-valued bins (see Fig. 1 (h)).

We employ the Earth Mover's Distance (EMD) to compare the inside (positive) portions of the two histograms:

ω_in(A_in, B_in) = EMD(A_in, B_in), (2)

where A is the histogram of the distance field (computed from the reference volume) masked by the test volume (MR,T) and B is the histogram of the same distance field masked by the reference volume (MR,R). The EMD operation is appropriate since the purpose of ω is to evaluate the difference in shape of the region occupied by both the reference and test datasets; the negative bins of the histogram represent space outside that region.
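For 1D histograms, the EMD coincides with the first Wasserstein distance, so Eq. 2 can be sketched with SciPy; the helper name `omega_in` and the fallback for empty histograms are our own assumptions:

```python
import numpy as np
from scipy.stats import wasserstein_distance

def omega_in(a_in, b_in, bin_centers):
    """EMD between the inside histograms of M_{R,T} (a_in) and
    M_{R,R} (b_in), treating bin centers as 1D locations."""
    # An empty histogram makes the EMD degenerate; return the
    # histogram span as a maximal penalty in that (assumed) case.
    if a_in.sum() == 0 or b_in.sum() == 0:
        return float(bin_centers.max() - bin_centers.min())
    return wasserstein_distance(bin_centers, bin_centers,
                                u_weights=a_in, v_weights=b_in)
```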

As shown in Fig. 1, the histogram for the reference example has no entries for outside distances, since the mask is the same as the object used to generate the distance field. Therefore, using an approach similar to Eq. 2 to measure the distance for outside histograms will not work, since there are no negative values in the expert histogram. The EMD for the negative values is equivalent to simply summing the total number of voxels contained in all bins of the histogram:

EMD(A_out, B_out) = α ∑_{i ∈ bins} r_i, (3)

where α is the constant weight defined as the penalty for creating or destroying a unit of “earth” in the EMD. Using this definition for ωout would effectively “flatten” the distance-based signature into a simple count of voxels. Since this flattening of the signature is an undesirable result for our similarity feature, we use the method described below to find an appropriate distance feature that quantifies the region outside the original expert bone.

Virtual mastoidectomies are performed using a 3D joystick. As in the actual surgery, this produces small variations around the edges of the drilled region. However, variation is unlikely in a region far away from voxels that the user intended to drill. Therefore, we chose to penalize voxels where the magnitude (absolute value) of the expert distance field is large. A large magnitude represents a significant distance from the expected boundary, so these voxels are less likely to be modified by the user by chance. Errors in voxels that have a smaller magnitude of the expert distance field should be penalized as well, but not as much. We use a weighted sum that depends on the value of the bin. The larger the distance from the expert boundary that a bin represents, the larger the effect that bin should have on the total distance value. We adopt the function

ω_out(A_out) = (∑_{i=1}^{n} (n − i)^s · A_out[i]) / (∑_{i=1}^{n} i^s), (4)

as a distance measure. In this function, n represents the number of bins in the histogram A_out, while s is a parameter that controls how fast the penalty increases; we use a value of 1.2 for s. A_out[i] represents the ith bin of the histogram.
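Eq. 4 can be sketched directly in Python. We assume here that the bins of A_out are ordered from the most negative distance toward zero (as a standard histogram would produce), so bin 1 (farthest outside the expert boundary) receives the largest weight (n − 1)^s; the function name `omega_out` is hypothetical:

```python
def omega_out(a_out, s=1.2):
    """Weighted penalty for drilled voxels outside the expert boundary.

    Assumes a_out is ordered from most negative distance bin to the
    bin adjacent to the boundary, so bin i gets weight (n - i)**s:
    the farther a voxel lies from the expert boundary, the larger
    its contribution to the penalty.
    """
    n = len(a_out)
    num = sum((n - i) ** s * a_out[i - 1] for i in range(1, n + 1))
    den = sum(i ** s for i in range(1, n + 1))
    return num / den
```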

We compare the test volumes to the ALL, ANY, and individual reference volumes in order to generate features. Together, these features comprise a vector of length 12: the ωin and ωout values for the ALL and ANY volumes and the four reference volumes.

3.3.2 Other features

We compared our feature vector to two other methods derived from previous work in order to ascertain the effectiveness of these features for mastoidectomy evaluation. The quality of these features is addressed in Sec. 5.

Features were generated using a method from Sanchez-Cruz and Bribiesca [15]. In this method, distances are calculated between two volumes using their non-overlapping sets. For a pair of volumes A, where A is from the reference set, and B, where B is from the test set, we calculate the sets of voxels A − B and B − A. We then find the solution to the transportation problem [11], which assigns these sets to each other with the smallest distance between corresponding elements. Unassigned elements, in the case of unequal set sizes, incur a penalty. The volume sets needed to be downsampled to a third of their original sizes due to the large memory and computation requirements of this method. As with the previous method, we generate a distance feature for each combination of a test volume with the ALL and ANY composite volumes along with the four reference volumes, giving us six features for each volume.

Features were also generated using a variant of the method described by Bruckner and Möller [1]. Their method uses the mutual information between two distance fields as a distance measure. We use the normalized mutual information, as suggested in their work. For each test volume, this generates six features, since we find the mutual information between the test volume and all four reference volumes plus the ALL and ANY composite volumes.
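A common form of normalized mutual information between two scalar fields, estimated from a joint histogram, is NMI = (H(X) + H(Y)) / H(X, Y). The sketch below is our own illustration of that general estimator, not the authors' or Bruckner and Möller's code; the function name and bin count are assumptions:

```python
import numpy as np

def normalized_mutual_information(x, y, bins=32):
    """Normalized mutual information (H(X) + H(Y)) / H(X, Y) between
    two distance fields, estimated from a joint histogram."""
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1)   # marginal of X
    py = pxy.sum(axis=0)   # marginal of Y

    def entropy(p):
        p = p[p > 0]       # ignore empty bins (0 * log 0 := 0)
        return -np.sum(p * np.log(p))

    return (entropy(px) + entropy(py)) / entropy(pxy.ravel())
```

Identical fields give the maximal value of 2; unrelated fields approach 1.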

3.4 Multi-volume dataset viewer

Even with defined similarity features between volumes, a way to interact with the data space is needed. Our solution is a visualization program with two displays. On one side is a grid of volume viewing panels and on the other is a 2D scatterplot. This follows the general framework of modern visualization for large datasets: examination of small-scale and individual data points using a set of tools, and the linking of those examinations to a large-scale dataset overview.

Due to the large number of volumes in the dataset, we provide users with an overview of the data space, which is a more efficient way to browse the virtual mastoidectomies than a flat list. We append all the features from the distance field histogram method (Sec. 3.3) into a high-dimensional feature vector that encodes shape-based distance information between each volume and the composite or individual reference volumes. We map these feature vectors into a two-dimensional space. We give the user the choice between two projection methods, principal component analysis (PCA) and locally linear embedding (LLE). We can also append the features from Sec. 3.3.2 or any other desired features into this vector. We discuss the effects of using the features from Sec. 3.3.2 in Sec. 5.1.

Since we wish to compare a small number of reference volumes to a large number of test volumes, techniques like PCA and LLE are well suited to the task. For comparing all pairwise distances in a set, techniques like self-organizing maps [8] and multidimensional scaling are appropriate; since we are not comparing all pairwise distances, we do not use them.
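The projection step can be sketched with scikit-learn, which provides both methods; the wrapper name `project_feature_vectors` and the default neighbor count are our own assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding

def project_feature_vectors(features, method="pca", n_neighbors=10):
    """Project an (n_volumes, n_features) array of per-volume feature
    vectors to 2D for the scatterplot display."""
    if method == "pca":
        model = PCA(n_components=2)
    else:  # locally linear embedding
        model = LocallyLinearEmbedding(n_components=2,
                                       n_neighbors=n_neighbors)
    return model.fit_transform(np.asarray(features, dtype=float))
```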


Fig. 2 Proximity in the scatterplot relates to commonality between volumes. The two volumes marked with the green and yellow triangles show incorrect over-drilling, as seen by the large amounts of red in the corresponding volume displays. The red and blue triangles show results closer to the experts'. Blue circles on the scatterplot represent expert volumes, green circles represent pre-training resident results, and red circles represent post-training resident results. Arrows connect the same user's pre- and post-training results, if available. See Sec. 4 for more detail.

3.5 Comparative rendering

The second main component of the visualization framework is a set of volume visualization windows using small multiples [17]. Only four multiples are displayed, since the user needs to resolve detail in the volume image; given a larger pixel display, a higher number of multiples could be shown. When configurations with more, but smaller, windows were used, users complained about the amount of zooming necessary to view details of the volumes.

Joshi et al. [9] describe a multi-volume visualization technique that displays all volumes in the same shared 3D space. However, this hinders detailed comparison. In our application, we link the cameras of all the multiple volume display windows. This allows for meaningful comparisons between the objects and avoids the problem of having to fly around objects to view them. As a metaphor, the shared 3D space approach can be thought of as walking around a museum, comparing objects in fixed locations, while our approach can be thought of as putting objects on a table in front of you. Another drawback of the shared 3D space approach is occlusion due to overlapping. While we have overlapping items in the 2D scatterplot display, our volumes are viewed separately and never overlap.

We use a six-item classification in our volume visualization display. Recall that we have constructed ANY and ALL composite volumes by combining all the volumes in the reference set. The combination of these two volumes defines three regions, as shown in Table 1.

ALL  ANY  Interpretation
0    0    Unobserved: no expert modified that region
0    1    Optional: some experts modified that region
1    0    (Meaningless, does not occur)
1    1    Consensus: all experts modified that region

Table 1 Four combinations of the ANY and ALL regions result in three meaningful values. These values are used for voxels outside and inside the user-drilled area, giving six regions of bone.

Each test volume defines a binary division of the space as well, which gives us six combinations: Unobserved, Optional, and Consensus regions that are inside the test volume region and those that are outside it. Given these six classes, we can define colors to control display of the regions. These colors are user editable, although only some combinations make sense. This is discussed in Sec. 4.
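The six-class labeling follows directly from Table 1 crossed with the test volume; a minimal sketch (the function name and the integer label encoding are our own choices):

```python
import numpy as np

def classify_regions(vol_all, vol_any, test_vol):
    """Label each voxel with one of six classes: {Unobserved, Optional,
    Consensus} x {outside, inside the user-drilled test region}.

    Codes 0-2 mean outside the test volume, 3-5 inside; within each
    triple, 0 = Unobserved, 1 = Optional, 2 = Consensus (per Table 1).
    """
    vol_all = vol_all.astype(bool)
    vol_any = vol_any.astype(bool)
    test_vol = test_vol.astype(bool)
    expert = np.zeros(vol_all.shape, dtype=np.uint8)
    expert[vol_any & ~vol_all] = 1  # Optional: some experts drilled here
    expert[vol_all] = 2             # Consensus: all experts drilled here
    return expert + 3 * test_vol.astype(np.uint8)
```

A color transfer function can then map the six label values to the region colors used in the volume display.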

4 Interaction

We use a brushing and linking approach to facilitate exploration of the dataset. By clicking and dragging individual points from the scatterplot display into the volume windows, the existing volume in that volume display is replaced by the selected point. This allows users to select arbitrary volumes of interest, either near or far from each other in the volume data space, and investigate the shape differences between the different volumes.

The camera is locked between the different volume displays. Unlocked cameras provide flexibility, but users felt that since the volumes are pre-registered to each other, the ability to view different parts of different volumes at the same time was unnecessary and potentially confusing.

Fig. 3 This image shows the inverse cavity rendering, where the bone that has been drilled away in the simulation is rendered as solid. Blue shows the consensus areas that have been drilled, while bright green shows areas where drilling is considered optional. The semitransparent teal color designates regions that are considered consensus areas for drilling based on the expert data but were not drilled by the user. Red indicates over-drilling errors. The scatterplot display in this example is color coded by institution: each color is a different institution. The scatterplot here uses LLE for layout.

Each volume has metadata associated with it: the institution at which the resident is located, the resident's year of experience, and whether the performance was a post-test (performed after training) or a pre-test (performed before training). This metadata is incorporated into the scatterplot display by coloring the glyphs used to represent each volume. In Fig. 2, the scatterplot display shows pre-test volumes (those that were virtually drilled before training) in green and post-test ones (those virtually drilled after training) in red. The expert volumes are displayed in blue for reference. Which metadata property controls the coloring is selected by the user. In Fig. 3, the glyph color encodes the institution at which that user is studying.
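Coloring glyphs by a user-selected metadata property could be sketched like this (the field names, palette, and lazy institution-color assignment are our assumptions; the paper does not specify its palette):

```python
# Map each volume's metadata record to a glyph color, keyed on
# whichever property the user selected for coloring.

PALETTES = {
    "phase": {"pre-test": "green", "post-test": "red", "expert": "blue"},
    "institution": {},  # institutions get colors assigned on demand
}

CYCLE = ["orange", "purple", "cyan", "magenta", "brown"]

def glyph_color(record, prop):
    """Return the display color for one volume's glyph."""
    palette = PALETTES[prop]
    value = record[prop]
    if value not in palette:  # assign unseen categories lazily
        palette[value] = CYCLE[len(palette) % len(CYCLE)]
    return palette[value]
```

Switching the `prop` argument between "phase" and "institution" reproduces the two coloring modes shown in Figs. 2 and 3.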

We use an arrow glyph to show the temporal progression between two volumes. We asked users to provide a sample volume before and after going through a training session. As shown in Figs. 2 and 3, many arrows are displayed which connect two data points. The arrow goes from the data point that represents a virtual mastoidectomy performed before training to a data point representing a post-training virtual mastoidectomy. The display of these arrows can be toggled by the user.
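Pairing each resident's pre- and post-training samples into arrow endpoints might look like the following (a sketch; the record field names are assumed):

```python
def arrow_pairs(points):
    """points: dicts with 'resident', 'phase' ('pre'/'post'), and a
    2D 'pos' from the dimensionality reduction. Returns one
    (pre_pos, post_pos) arrow per resident who has both samples."""
    by_resident = {}
    for p in points:
        by_resident.setdefault(p["resident"], {})[p["phase"]] = p["pos"]
    return [(v["pre"], v["post"])
            for v in by_resident.values()
            if "pre" in v and "post" in v]
```

Residents with only one of the two samples simply contribute no arrow, which matches the optional display of arrows in the scatterplot.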

The features incorporated into the feature vector which is used for the scatterplot display can also be changed by the user. Four sets are available: the distance field histogram metric for the individual test volumes, the same metric on the composite volumes, and the two metrics described in Sec. 3.3.2. We discuss the effects of the selections on the point display in Sec. 5.1.

(a) Good performance (b) Poor performance

Fig. 4 Comparison of good and poor mastoidectomy performances using one of the volumetric display modes. In a good mastoidectomy, most of the green volume should be removed, since green represents consensus areas where all experts removed bone (the ALL volume). Red areas are those that were not drilled away by any expert but were removed by that particular resident.

5 Discussion

The volume visualization portion of the display benefits a great deal from the composite volumes defined in Sec. 3.2. As shown in Fig. 4, the boundaries between the ANY and ALL volumes can be a very effective visual indicator of correctness. Orange-colored voxels can be drilled or not; they belong to the ANY set but not the ALL set. Green-colored voxels belong to the ALL set and designate areas that should be drilled away. The different classes of voxels, as listed in Tab. 1, can be given arbitrary colors and opacities, depending on the visualization effect desired. In Fig. 3, the removed voxels are shown with high opacity. Surrounding the removed voxels is a low-opacity teal region which contains voxels that should have been removed but were not. This display mode presents an easier way for users to examine errors in drilling which occur deep in the bone cavity. The display mode in Figs. 3 and 4 has high opacity only for the bone left as solid, which is a more natural visualization for experts and residents and is preferred as an orienting display mode by the users.

The manual interaction between the reduced dimensionality data space and the individual volume rendering displays is key to the utility of this software. Due to the nature of dimensionality reduction, the resulting scatterplot may not have axes with simple meanings related to clinical performance. In our case, as surgeons explored the dataset, useful facts came to light that can help to evaluate the residents who use the simulator. In Fig. 2, two volumes from the upper right of the scatterplot and two from the lower left have been selected. The selected volumes are indicated on the scatterplot by triangles, with the color of each triangle matching the color of the triangle in the upper left corner of the corresponding volume display. On the volume displays, the similarities between data items that are in close proximity are apparent. The items on the lower left of the scatterplot (the upper left and lower left volumes in the volume display) have large areas of red voxels. These correspond to voxels that no experts drilled away but were removed by the user, which in turn corresponds to an error in performing the mastoidectomy procedure. Data items in this lower left area of the scatterplot show similar behavior. The other two items displayed show good performance, with little red and with much of the green consensus portion of the bone, which all expert examples removed, drilled away.

Occlusion can be an issue in scatterplot displays. A slight offset of the position of the point (jitter) has been used to avoid occlusion in displays which have categorical or integer values [16], but this technique can be misleading for continuous data like ours. Other approaches include filtering, transparency to see overlapped points, and distortion to separate the points [4]. We employ a transparency-based solution for the occlusion issues. Overlapping points can be easily recognized, and when items are dragged over to a volume display window, a menu pops up allowing the user to choose from any item near the mouse cursor when the drag operation was initiated. We found this to be sufficient for the amount of occlusion in our current dataset. Different techniques might need to be employed if the number of data points doubles.
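The disambiguation step, gathering every point near the cursor position where the drag began, amounts to a simple radius query (a sketch; the radius threshold is an assumption):

```python
import math

def points_near(points, cursor, radius):
    """Return indices of scatterplot points within `radius` of the
    cursor position where the drag started. If more than one index
    is returned, the UI shows a pop-up menu to disambiguate."""
    cx, cy = cursor
    return [i for i, (x, y) in enumerate(points)
            if math.hypot(x - cx, y - cy) <= radius]
```

A linear scan is adequate at this dataset size; a spatial index would only become worthwhile with far more points.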

In addition to the software being used by instructors to gauge the performance of the class as a whole and find students who are doing very well or very poorly, the software could be used for self examination by each individual resident. We demonstrate a preliminary version of this with our arrow glyph, which tracks the progress from pre-instruction to post-instruction. We plan that after deployment of the simulator many more time-stamped samples of each resident's surgical performances will be acquired, and from those we can construct a trend line. This trend line can be plotted in the data-space view, along with the locations of expert performances. This will let the resident know if he or she is progressing and improving the quality of procedural technique.

5.1 Effects of different feature vectors

The selection of the features that are included in the feature vector can dramatically change the scatterplot display. Fig. 5 shows the results of selecting each of the feature sets described earlier as the sole constituent of the feature vector. The software allows us to compare the masked distance field approach described in Sec. 3.3 with features from other works, as described in Sec. 3.3.2.

The Mutual Information feature is shown in Fig. 5 (a). This shows a continuum of bad examples (on the left) and good examples (on the right). However, items that are bad because of over-drilling and items that are bad because of under-drilling appear in similar locations. This makes it difficult to understand nuances of student performance at a glance. The difference assignment feature, as described by Sanchez-Cruz [15], performs even more poorly than the Mutual Information feature for this application. As shown in Fig. 5 (b), most of the data items are forced along a single line. Manual exploration of this space revealed very little useful information contained in this feature. It may be useful for other types of data, but it is not good for evaluating the quality of mastoidectomies.
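For reference, mutual information between two volumes can be computed from their joint label counts. The sketch below is a generic MI computation over binary drilled/undrilled voxel sequences, not necessarily the paper's exact formulation:

```python
import math
from collections import Counter

def mutual_information(vol_a, vol_b):
    """MI in bits between two equal-length binary voxel sequences."""
    n = len(vol_a)
    joint = Counter(zip(vol_a, vol_b))  # joint label counts
    pa = Counter(vol_a)                 # marginal counts
    pb = Counter(vol_b)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((pa[a] / n) * (pb[b] / n)))
    return mi
```

Because MI collapses spatial structure into a single number, two volumes that differ in opposite ways (over- vs. under-drilled) can score similarly, which is consistent with the ambiguity seen in Fig. 5 (a).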

We found that the composite volume histogram comparison feature produced the most intuitive dataspace for investigating the overall performance of residents. With this feature (shown in Fig. 5 (c)), the expert reference volumes are clustered in the upper right corner of the 2D space. Data points near this corner tend to have good performance characteristics: most of the consensus area is drilled away and little over-drilling occurs. Data points in the bottom right tend to be under-drilled; in the bottom right corner are volumes that were not drilled on at all. The data points in the upper left are items where substantial over-drilling occurred but a large portion of the consensus area was drilled as well. These volumes represent residents who removed all that was needed to complete the procedure but were too aggressive and went too deep into the bone. Finally, the lower left corner is sparsely populated, but consists of those users who made major over-drilling mistakes but did not drill away the areas necessary to complete the surgery. These users probably have more substantial misunderstandings of the mastoid anatomy than the other groups, or they had problems using the simulator.

(a) Mutual information (b) Difference assignment (c) Composite volume histogram comparison (d) Independent reference volume histogram comparison

Fig. 5 Comparison of scatterplots generated from various feature vectors.

Fig. 6 Scatterplot display of pre-test items, graded by experts using the “Post. wall thinned” metric. Green shows good performance, while red shows bad performance. The scatterplot layout here uses the composite and individual expert histogram matching features.

Based on the interactions of expert surgeons with the software, we discovered some interesting properties of the multi-visualization dataset. This section discusses these properties. An experienced otologic surgeon graded the pre-test bones using the “Post. canal wall thinned” metric. This metric has been previously shown to be relevant to general performance for mastoidectomies: an expert looks at the results of a mastoidectomy, either virtual or on cadaver bone, and determines if the posterior wall of the external auditory canal is thinned properly. Grading was done on a 1 through 5 scale.

We mapped the results to our scatterplot, as shown in Fig. 6. Low scores were assigned red colors, while high scores were assigned green colors, with pale yellow representing a score of 3. Good scores clustered around the middle of the plot, while items in the far bottom left and right corners of the plot tended to have low scores. This follows from our previous analysis: the lower left corner contains examples that have substantial over-drilling errors, and those in the lower right corner did not drill enough. The two red dots in the middle were an anomaly, so more investigation was done. The examples from those particular users were considered good by the experts, but they did remove much more of the posterior canal wall than is allowed.
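The red-to-green grading colormap could be implemented as a piecewise linear interpolation over the 1-5 scale (a sketch; the exact RGB endpoints, including the pale-yellow midpoint, are our assumptions):

```python
def grade_color(score):
    """Map a 1-5 expert grade to an RGB triple:
    1 -> red, 3 -> pale yellow, 5 -> green."""
    t = (score - 1) / 4.0            # normalize grade to [0, 1]
    if t <= 0.5:                     # red (1,0,0) -> pale yellow (1,1,0.6)
        u = t / 0.5
        return (1.0, u, 0.6 * u)
    u = (t - 0.5) / 0.5              # pale yellow -> green (0,1,0)
    return (1.0 - u, 1.0, 0.6 * (1 - u))
```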

In addition, flaws in the grading system used by experts for evaluating virtual mastoidectomies presented themselves through interaction with the software. After coloring by a composite score given by experts, some anomalous high scores appeared in the lower left corner of the plot. By dragging these data items from the scatterplot to the volume rendering window, we identified the problem. The mastoidectomies in question were done well, but only after the users had performed extraneous drilling on the side of the bone. This indicates a failure to orient the bone properly before starting the procedure. However, the portion of the bone drilled incorrectly was to the rear of the virtual specimen, and the metrics used for scoring do not take that part into account. This shows an important problem with defining metrics for scoring this type of procedure: failure to include a case for errors that are unusual. Overall, the experts feel that this tool is a useful overview of the volumes and a convenient way to browse the dataset, but further research must be done on procedure-specific metrics for it to be a completely automated scoring system for mastoidectomies.

5.2 Implementation notes and future work

We used logarithmic histograms (log(1.0 + x) for each bin quantity x) in our calculations, as we found they achieved better results than linear histograms. This may be data dependent, and for other types of datasets linear histograms might produce good results.
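The log-scaled histogram can be written compactly (a sketch; the bin count and range parameters are assumptions):

```python
import math

def log_histogram(values, n_bins, lo, hi):
    """Histogram over [lo, hi) with log(1 + x) applied to each bin
    count x, damping the dominance of very full bins."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            counts[int((v - lo) / width)] += 1
    return [math.log(1.0 + c) for c in counts]
```

The log transform keeps a handful of very large bins (e.g., undrilled background voxels) from swamping the contribution of smaller but more informative bins when histograms are compared.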

Some experimentation was done with a gradient-based rendering technique in which colors would change smoothly based on the value of the expert distance fields. However, during testing, users complained that the color gradients were confusing and obscured the boundaries between the sections described in Tab. 1. We feel that future experimentation with non-photorealistic rendering techniques for the volume displays would be fruitful.

6 Conclusion

We have described a system for viewing the similarities and differences of a multi-volume dataset consisting of results from a mastoidectomy surgery simulator. This system allows users to compare resident performance to expert performance using similarity measures based on histograms of the distance fields of the expert results. The analysis of the multi-volume dataset includes the construction of two composite volumes defining the observed range of the expert volumes. The application provides a solid starting point for the investigation of any similar multi-volume datasets.

Acknowledgements This work is supported by a grant from the National Institute on Deafness and Other Communication Disorders of the National Institutes of Health, 1 R01 DC06458-01A1.

References

1. Bruckner, S., Moller, T.: Isosurface similarity maps. Computer Graphics Forum 29(3), 773–782 (2010). DOI 10.1111/j.1467-8659.2009.01689.x

2. Caban, J.J., Rheingans, P.: Texture-based transfer functions for direct volume rendering. IEEE Transactions on Visualization and Computer Graphics 14(6), 1364–1371 (2008). DOI 10.1109/TVCG.2008.169

3. Correa, C.D., Ma, K.L.: Size-based transfer functions: a new volume exploration technique. IEEE Transactions on Visualization and Computer Graphics 14(6), 1380–1387 (2008). DOI 10.1109/TVCG.2008.162

4. Ellis, G., Dix, A.: A taxonomy of clutter reduction for information visualisation. IEEE Transactions on Visualization and Computer Graphics 13(6), 1216–1223 (2007). DOI 10.1109/TVCG.2007.70535

5. Gagvani, N., Silver, D.: Parameter-controlled volume thinning. CVGIP: Graphical Models and Image Processing 61(3), 149–164 (1999)

6. Heimann, T., Meinzer, H.P.: Statistical shape models for 3D medical image segmentation: a review. Medical Image Analysis 13(4), 543–563 (2009). DOI 10.1016/j.media.2009.05.004

7. Huang, X., Paragios, N., Metaxas, D.N.: Shape registration in implicit spaces using information theory and free form deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(8), 1303–1318 (2006). DOI 10.1109/TPAMI.2006.171

8. Hussain, M., Eakins, J.P.: Component-based visual clustering using the self-organizing map. Neural Networks 20(2), 260–273 (2007). DOI 10.1016/j.neunet.2006.10.004

9. Joshi, S.H., Horn, J.D.V., Toga, A.W.: Interactive exploration of neuroanatomical meta-spaces. Frontiers in Neuroinformatics 3, 38 (2009). DOI 10.3389/neuro.11.038.2009

10. Keefe, D.F., Ewert, M., Ribarsky, W., Chang, R.: Interactive coordinated multiple-view visualization of biomechanical motion data. IEEE Transactions on Visualization and Computer Graphics 15(6), 1383–1390 (2009). DOI 10.1109/TVCG.2009.152

11. Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955). DOI 10.1002/nav.3800020109

12. Latecki, L.J., Li, Q.N., Bai, X., Liu, W.Y.: Skeletonization using SSM of the distance transform. In: IEEE International Conference on Image Processing. IEEE (2007). DOI 10.1109/ICIP.2007.4379837

13. Mahfouz, M., Badawi, A., Merkl, B., Fatah, E.E.A., Pritchard, E., Kesler, K., Moore, M., Jantz, R., Jantz, L.: Patella sex determination by 3D statistical shape models and nonlinear classifiers. Forensic Science International 173(2-3), 161–170 (2007). DOI 10.1016/j.forsciint.2007.02.024

14. Masuda, T.: Registration and integration of multiple range images by matching signed distance fields for object shape modeling. Computer Vision and Image Understanding 87(1-3), 51–65 (2002). DOI 10.1006/cviu.2002.0982

15. Sanchez-Cruz, H., Bribiesca, E.: A method of optimum transformation of 3D objects used as a measure of shape dissimilarity. Image and Vision Computing 21(12), 1027–1036 (2003). DOI 10.1016/S0262-8856(03)00119-7

16. Trutschl, M., Grinstein, G., Cvek, U.: Intelligently resolving point occlusion. In: IEEE Symposium on Information Visualization, pp. 131–136. IEEE (2003). DOI 10.1109/INFVIS.2003.1249018

17. Tufte, E.R.: Envisioning Information. Graphics Press (1990)

18. Wan, D., Wiet, G.J., Welling, D.B., Kerwin, T., Stredney, D.: Creating a cross-institutional grading scale for temporal bone dissection. The Laryngoscope 120(7), 1422–1427 (2010). DOI 10.1002/lary.20957

19. Woodring, J., Shen, H.W.: Multi-variate, time-varying, and comparative visualization with contextual cues. IEEE Transactions on Visualization and Computer Graphics 12(5), 909–916 (2006). DOI 10.1109/TVCG.2006.164

20. Zhang, H., Fritts, J., Goldman, S.: Image segmentation evaluation: A survey of unsupervised methods. Computer Vision and Image Understanding 110(2), 260–280 (2008). DOI 10.1016/j.cviu.2007.08.003