David V. Beard, PhD • Paul L. Molina, MD • Keith E. Muller, PhD • Kevin M. Denelsbeck, MS • Bradley M. Hemminger, MS • J. Randolph Perry, MD • M. Patricia Braeuning, MD • Deborah H. Glueck, MS • W. Dean Bidgood, Jr, MD • Matthew Mauro, MD • Richard C. Semelka, MD • Ann S. Willms, MD • David Warshauer, MD • Etta D. Pisano, MD
Interpretation Time of Serial Chest CT Examinations with Stacked-Metaphor Workstation versus Film Alternator¹
¹ From the Departments of Radiology (D.V.B., P.L.M., B.M.H., J.R.P., M.P.B., W.D.B., M.M., R.C.S., A.S.W., D.W., E.D.P.), Computer Science (D.V.B., K.M.D.), and Biostatistics (K.E.M., D.H.G.), University of North Carolina School of Medicine, Chapel Hill. Received March 9, 1995; revision requested April 26; revision received June 5; accepted June 12. Supported in part by National Institutes of Health grant no. R01 CA44060. Address reprint requests to D.V.B., Idaho State University, Computer Information Systems, College of Business, Campus Box 8020, Pocatello, ID 83209-8020.
© RSNA, 1995
PURPOSE: Interpretation time of serial staging chest CT cases, which each contained current and previous examinations, with a simple prototype workstation called filmstack was experimentally compared with interpretation time with a film alternator.
MATERIALS AND METHODS: The filmstack displayed a "stack" of sections for each examination; user controls allowed rapid selection of preset attenuation windows and both synchronized and unsynchronized scrolling. Eight radiologists were timed as they used the filmstack and the film alternator to interpret four ergonomically complex serial CT cases.
RESULTS: All reports dictated on the basis of findings with filmstack and film were of acceptable clinical accuracy. The time to examine a case with filmstack was significantly shorter than the time with film, including the time to load and unload the alternator (99% confidence [P = .01]). There was no statistically significant difference in interpretation time between filmstack and prehung film.
CONCLUSION: Use of a low-cost stacked CT workstation with a single 1,024 × 1,024 monitor is an effective means of interpreting cases that require comparison of multiple CT examinations.
Index terms: Computed tomography (CT), image display and recording • Diagnostic radiology, observer performance • Receiver operating characteristic curve (ROC) • Workstation
Radiology 1995; 197:753-758
Picture archiving and communication systems (PACS) may eventually reduce costs (1) and improve logistics by eliminating film images, providing simultaneous access by multiple physicians, and allowing use of a variety of three-dimensional and contrast-manipulation display algorithms (2). Computed tomography (CT) is a candidate for early filmless operations because, unlike radiography, CT is inherently digital and its relatively small images completely fit onto available commodity-priced monitors.
CT examinations are often compared to one or more previous CT examinations, and these additional images greatly complicate CT workstation interpretation. Similarly, loading and unloading the film onto the film alternator can be increasingly time-consuming as the number of comparison examinations in a series increases. Many radiology departments have allied personnel preload both the current study and the most appropriate comparison examination to maximize radiologist productivity. However, comparison examinations may be incorrectly selected, hung in the wrong order, or unavailable until just before interpretation, so film hanging still occupies radiologists' time in even the most efficient departments.
Generally, PACS workstations for interpretation of CT examinations have used a mosaic arrangement of the CT sections that is analogous to the 3 × 4 or 3 × 5 grid used with film display of a CT examination. Such a mosaic display allows the radiologist's eyes to move quickly back and forth between a "cluster" of two to eight images (3), which facilitates comparison. An alternative to the mosaic arrangement is the metaphor of a stack of CT images sequentially displayed one at a time on a monitor. Use of "forward" and "backward" controls allows the radiologist to visualize the entire examination one image at a time. Because the images are superimposed and "aligned" from one section to the next, radiologists can view longitudinally oriented structures such as blood vessels without having to move their eyes from a given location on the monitor.
Stacked displays have advantages compared with mosaic displays for workstation presentation of CT examinations. Since only one image from each examination is displayed at the same time, a much smaller monitor can be used, which reduces the cost of the workstation. Further, a single low-cost 1,024 × 1,024 monitor can hold up to four "stacked" CT examinations simultaneously for comparison. Stacks for each examination are on the same monitor, so comparison images are close together rather than up to several feet away, as can be the case with film or mosaic workstations. There are also disadvantages to stacked displays. Only a single image is displayed at one time, so radiologists may find it difficult to obtain an understanding, or a "gestalt," of a large three-dimensional section of anatomy. Further, coordination of stacks of serial cases can be difficult.
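The capacity claim above can be checked with simple arithmetic. The sketch below assumes a 512 × 512 CT image matrix displayed at full resolution; that matrix size is an assumption for illustration, not a figure given in the text, and the function name is hypothetical.

```python
def stacks_per_monitor(monitor_w: int, monitor_h: int, image_side: int = 512) -> int:
    """Count the square images that fit side by side at full resolution.

    image_side=512 is an assumed CT matrix size, not a value from the article.
    """
    return (monitor_w // image_side) * (monitor_h // image_side)


# A 1,024 x 1,024 monitor holds a 2 x 2 arrangement of stacks.
print(stacks_per_monitor(1024, 1024))
# The 1,100 x 900 prototype monitor holds two stacks side by side.
print(stacks_per_monitor(1100, 900))
```

Under this assumption the 1,100 × 900-pixel prototype monitor fits two full-resolution stacks, which matches the two-stack (current and previous) layout described for filmstack.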
A number of researchers have evaluated the speed and accuracy of use of workstations for interpretation of a single CT examination (4-12). Interpretation with a CT workstation can be as accurate as interpretation with film, as long as the workstation has monitors of good quality, appropriate attenuation window settings, sufficiently rapid image-display times, and acceptable computer-human interactions. Beard et al (11) found that with use of 2,500 × 2,000-pixel monitors and 0.11-second image display times, a single CT examination could be interpreted as quickly with a workstation as with a film alternator. Straub et al (10) conducted a detailed receiver operating characteristic (ROC) study to compare the accuracy of interpretation of an abdominal mass in a single CT examination with a 2,048 × 2,048-pixel mosaic workstation, a stack-metaphor 2,048 × 2,048-pixel workstation, and a film alternator: Results with both the stacked and mosaic methods were obtained with accuracy and speed equal to those achieved with film. Also, Beard et al (13) used an eye-tracker and time-motion analysis (14) to predict that as the image display times are reduced to about 0.3 second, results with the stacked display are achieved about as quickly as they are with the mosaic display. However, interpretation speed was not sufficient with previous workstations that used less expensive 1,024 × 1,024 monitors, regardless of image display speed (12). To our knowledge, the feasibility (or efficacy) of workstation interpretation of serial CT cases, rather than of just a single CT case, has been largely unstudied.
Figures 1, 2. (1a) Stacked display. Screen layout contains images from the follow-up and initial chest CT examinations. (1b) Gestalt display. Screen layout contains a mosaic of miniature versions of all the images from both the follow-up and initial examinations. (2) Configuration of keyboard for stacked display. Double-arrow buttons scroll both sets of images at the same time.
MATERIALS AND METHODS
Experimental Goal
We conducted a timing experiment to evaluate a simple prototype stack-metaphor CT interpretation workstation we called filmstack for use in interpretation of serial CT cases that resulted in dictated reports. Two hypotheses were evaluated. First, error-free interpretation of serial staging chest CT cases with the filmstack workstation is faster than interpretation with a film alternator, including the time to load and unload the film. Second, filmstack interpretation time is faster than interpretation of prehung film with a film alternator, not including the time to load and unload the film. Error-free interpretation time includes interpretation times only for those interpretations that meet acceptable clinical standards of accuracy. Error-free time is a technique used in psychology studies to hold accuracy constant to obtain a valid comparison of response time (15).
Equipment
Serial cases containing both initial and follow-up CT examinations were interpreted with a conventional horizontal film alternator and with filmstack. This prototype workstation has a SPARC 1+ computer with a single 36 × 28-cm, 1,100 × 900-pixel, 66-Hz, noninterlaced, landscape-oriented monitor (Sun Microsystems, Mountain View, Calif). The filmstack software was implemented with the C programming language and the X windowing system. This hardware and software configuration displays the next or previous image in a filmstack in 0.2 second, which is approximately the time a human requires to press a button. Figure 1a depicts the filmstack screen layout, with the film stacks of the current examination on the left and of the previous examination on the right.
Pilot studies indicated that a number of radiologists were troubled by their inability to see an overview or gestalt of the images. To address this preference, we added an alternate screen layout that presented gestalt views for both the current and previous examinations (Fig 1b). The gestalt view consists of a five-column mosaic of reduced-size (100 × 100-pixel square) images from both examinations. A button on the control panel toggled the display between the gestalt view and the stacked view of both the current and previous examinations.
Figure 2 shows the configuration of the control panel. By using self-sticking tags on the keys of the conventional Sun workstation keyboard, we were able to rapidly rearrange and reconfigure the keyboard during prototyping and pilot studies. Six functions were included in the final version of the keyboard used in this experiment. First, a vertical button pair was provided to scroll the follow-up examination up and down, as well as a corresponding vertical button pair for the initial examination. Second, a vertical button pair between the follow-up and initial button pairs was provided to scroll both examinations in a synchronized manner. Third, a button below the scrolling buttons moved both examinations to the topmost image that contained a portion of the liver. This "liver" button proved very useful because it synchronized the images in both examinations regardless of the anatomic position of the examination. Once the examinations were synchronized with the liver button, the radiologist could scroll up and down and the examinations would move in parallel, generally in synchronized fashion. We do not know of any current technology that can automatically recognize which section contains the top of the liver or some other anatomic landmark in a CT examination, so a technologist had to select this section at the same time that the preset attenuation window settings were chosen. Fourth, a long horizontal button above the scrolling vertical button pairs moved both examinations to their topmost image. Fifth, a button toggled between preset soft-tissue, lung, and liver windows. This attenuation window operation took 0.2 second. Findings in previous studies (7,11) supported the effectiveness and speed of providing only preset attenuation windows for general CT interpretation. Sixth, a button toggled between the stacked and the gestalt display.
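The six keyboard functions can be summarized as a small state model. The sketch below is an illustration only and is not the original C and X implementation; the class names, method names, and the order in which the preset windows cycle are assumptions.

```python
from dataclasses import dataclass

# Preset attenuation windows named in the text; the cycling order is assumed.
WINDOWS = ("soft-tissue", "lung", "liver")


@dataclass
class Stack:
    n_sections: int    # number of CT sections in the examination
    liver_top: int     # index of the topmost liver section, marked by a technologist
    position: int = 0  # index of the displayed section; 0 is the topmost image

    def scroll(self, step: int) -> None:
        # Clamp so the stack cannot move past its first or last section.
        self.position = max(0, min(self.n_sections - 1, self.position + step))


@dataclass
class FilmstackState:
    follow_up: Stack
    initial: Stack
    window_index: int = 0
    gestalt: bool = False

    def scroll_both(self, step: int) -> None:
        # Synchronized scrolling: the same step is applied to both stacks.
        self.follow_up.scroll(step)
        self.initial.scroll(step)

    def both_to_liver(self) -> None:
        # Align both examinations on the pre-marked liver landmark.
        self.follow_up.position = self.follow_up.liver_top
        self.initial.position = self.initial.liver_top

    def both_to_top(self) -> None:
        self.follow_up.position = 0
        self.initial.position = 0

    def next_window(self) -> str:
        self.window_index = (self.window_index + 1) % len(WINDOWS)
        return WINDOWS[self.window_index]

    def toggle_gestalt(self) -> bool:
        self.gestalt = not self.gestalt
        return self.gestalt


# Hypothetical section counts and landmark indices, for illustration only.
state = FilmstackState(follow_up=Stack(35, liver_top=20), initial=Stack(33, liver_top=18))
state.both_to_liver()
state.scroll_both(2)
print(state.follow_up.position, state.initial.position)
```

Unsynchronized scrolling corresponds to calling `scroll` on one stack only. Because the liver button aligns the stacks on an anatomic landmark rather than on a section index, later synchronized scrolling keeps the two examinations at roughly matching anatomy even when they start at different levels.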
Observers
Eight board-certified radiologists, five men and three women who read CT scans in our department, participated in the experiment. All observers had some experience with computers such as word processors or the information system in our radiology department. All had used consoles on CT machines and several had participated in previous CT workstation evaluations, but none had experience with filmstack.
Cases
Four pairs of serial "staging" chest CT examinations (lung apices through the liver) with multiple abnormal findings were selected for the experiment. Ergonomically complex cases with potentially long interpretation times were deliberately chosen to maximize any interpretation-time differences between film and filmstack. Such cases were expected to be particularly difficult for interpretation with filmstack, owing to the large number of multisection visual comparisons required between the serial examinations.
A homogeneous set of cases was chosen to reduce variance. Findings among these abnormal cases were as follows: Case A was a postmastectomy evaluation of a pulmonary nodule suspicious for metastasis, postirradiation changes in the left upper lobe, and diffuse fatty infiltration of the liver. Case B was a posttransplantation evaluation of a lung with interval increased mediastinal adenopathy, fluid collections, bilateral pleural effusions that tracked into the fissures, and patchy right-upper-lobe posterior-segment consolidation. Case C exhibited right-lower-lobe collapse, development of a large right pleural effusion, and persistent right-upper-lobe posterior-segment parenchymal consolidation. Case D was of multiple bilateral pulmonary metastases. All examinations had been performed with a Somatom Plus CT scanner (Siemens; Iselin, NJ) with contiguous 1-cm CT sections and an average of 35 sections per scan.
To help control for the effect of varying difficulty, the four cases were paired and then divided into two groups of equal difficulty. The original report for the initial examination as well as the original CT requisition form for the follow-up examination (which contained the patient history and the referring physician's clinical question) were obtained for each case. The original scans were used for viewing with the film alternator. Preset values for attenuation windows in filmstack viewing were chosen to match as closely as possible the windows used to obtain the original scans.
Experimental Design and Procedure
A counterbalanced, within-subject experimental design was used, with each observer reading two cases with a film alternator and two cases with filmstack. Each case was read exactly once by each observer, and each case was read the same number of times with filmstack and with a film alternator, so no radiologist read the same case with both display methods. Case presentation, observer, and method were all controlled. Four observers started with filmstack, whereas the other four started with the film alternator. Alternator and filmstack interpretations were systematically intermixed to minimize the effect of ordering.
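One schedule that satisfies the constraints described above can be generated mechanically. The sketch below is a hypothetical construction, not the authors' actual assignment: the pairing of cases into groups (A with B, C with D) and the strict alternation of methods within a session are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical pairing of the four cases into two groups of equal difficulty;
# the article does not say which cases were grouped together.
GROUPS = (("A", "B"), ("C", "D"))


def build_schedule():
    """Return one reading session, a list of (case, method), per observer."""
    schedule = []
    # 2 choices of filmstack group x 2 starting methods x 2 case orders = 8 observers.
    for fs_group, fs_first, reverse in product((0, 1), (True, False), (False, True)):
        fs_cases = list(GROUPS[fs_group])
        film_cases = list(GROUPS[1 - fs_group])
        if reverse:
            fs_cases.reverse()
            film_cases.reverse()
        blocks = [("filmstack", fs_cases), ("film", film_cases)]
        if not fs_first:
            blocks.reverse()
        session = []
        for k in range(2):
            # Alternate the two methods so they are intermixed within a session.
            for method, cases in blocks:
                session.append((cases[k], method))
        schedule.append(session)
    return schedule


schedule = build_schedule()
counts = Counter(pair for session in schedule for pair in session)
print(len(schedule))
print(sorted(counts.items()))
```

In this construction every observer reads each case once, each case is read four times with each method, and four of the eight observers begin with filmstack.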
Both filmstack and film alternator reading sessions were conducted under conditions controlled for lighting, sound, distractions, and interruptions (16). Observers were instructed to work as quickly as possible while producing dictated reports of clinically acceptable accuracy. For each film case, the observer was given a folder that contained the scans for the initial and follow-up examinations, as well as a typed sheet of 8.5 × 11-inch paper that contained the dictated report for the initial examination and the requisition information for the follow-up examination. Observers loaded the scans onto the alternator, viewed the images, and dictated a report with a familiar dictation machine.
Before interpretation of the first workstation case, the seat adjustment, keyboard, and monitor placement were checked, the keyboard arrangement was explained, and the observer was trained for 2 to 3 minutes by scrolling through a case, adjusting contrast, etc. Then the observer was given two serial staging chest CT training cases to interpret, for a total of approximately 20 minutes of practice. One training case involved changes in size and number of liver metastases, and the other training case involved a pleural effusion.
On the start signal, the observer was given the typed sheet of paper that contained the initial report and the follow-up requisition information, and the images were displayed on the workstation. The observer then scrolled back and forth in the examinations, changed attenuation windows as needed, and dictated a report.
All observers had years of experience interpreting CT cases displayed on film. Given finite available observer time, it was not possible to match this film experience in a laboratory setting. Results of pilot studies indicated that only a small amount of training was sufficient for our observers to accurately interpret cases with filmstack.
Data Collection
Verbal comments made by the observers during the study (referred to as verbal protocol) were recorded by the experimenter, and the observers were debriefed in an unstructured interview at the end of each session to gather comments and assess their experience with workstation and film interpretation. The experimenter used a stopwatch to measure the time to interpret each case. Load and unload times as well as interpretation times were measured in the film alternator cases. The stopwatch times have an accuracy of 1 second.
Whether film load and unload times should be included in a comparison of film and filmstack interpretation is open to debate, and the answer depends on the operational characteristics of the reading room to which the results are applied. Thus we report both total film time (including the time to load and unload films) and prehung film time (without the time to load and unload films) to allow the reader to make institutionally specific comparisons. Radiologists often perform some image "reading" while they load scans onto the view box, so we might be slightly underestimating the actual time to interpret prehung films.
Data Analysis
It was not the purpose of this experiment to evaluate interpretation accuracy, but we needed to ensure that only accurate interpretations were included in the mean interpretation times for workstation and film. Accuracy of the 32 reports (the eight observers generated one report for each of the four cases, which each contained scans from an initial and a follow-up CT examination) was determined by a physician grader who was not an observer in the experiment and who was blinded to both interpretation method and the observer. The grader had available the original scans, all the dictated reports generated during the experiment, and the original copies of the initial CT report and follow-up requisition form.
Each experimental report was transcribed, with any information as to observer or method removed, and then evaluated by the grader in the following manner: First, a list of findings was generated for each report. Second, a list of findings was generated for each case on the basis of all available data. Third, the findings for each case were divided into critical findings, which might influence patient-care decisions; relevant findings, which related to the clinical question on the follow-up requisition form; and additional noncritical findings. Fourth, each report was determined to be clinically acceptable if it contained no incorrect findings and contained all critical and relevant findings.
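The acceptability rule in the fourth step can be written as a set comparison. The sketch below is a simplification under stated assumptions: findings are represented as plain labels, and a finding counts as incorrect when it does not appear in the grader's list for the case. In the study this judgment was made by a physician grader, not by software, and the finding labels here are hypothetical.

```python
def is_clinically_acceptable(report, critical, relevant, noncritical):
    """Apply the stated rule: no incorrect findings, all critical and relevant ones."""
    report = set(report)
    case_findings = set(critical) | set(relevant) | set(noncritical)
    # Assumption: any reported finding absent from the case list is incorrect.
    incorrect = report - case_findings
    missing_required = (set(critical) | set(relevant)) - report
    return not incorrect and not missing_required


# Hypothetical finding labels, for illustration only.
critical = {"new pulmonary nodule"}
relevant = {"pleural effusion unchanged"}
noncritical = {"fatty liver"}

print(is_clinically_acceptable(
    {"new pulmonary nodule", "pleural effusion unchanged"},
    critical, relevant, noncritical))
```

Note that omitting an additional noncritical finding does not make a report unacceptable under this rule, whereas omitting any critical or relevant finding does.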
The assumption of the general linear multivariate model is that the errors have a multivariate normal distribution. Box plots, tests of normality, normal plots, and analysis of the means, standard deviations, skewness, and kurtosis revealed that the various time measurements satisfied this assumption fairly well.
Two general linear multivariate models were fitted. The first model was used to test the hypothesis of no difference between filmstack time and total film time. The second model was used to test the hypothesis that there was no difference between filmstack time and prehung film time. The Wilks Λ was chosen as the test statistic. In this case, this is equivalent to a paired-data t test. A Bonferroni-corrected α level of .025 was used for each test to ensure an experimental type 1 error rate of .05.
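The two ingredients named above, the Bonferroni correction and the paired-data t statistic, can be sketched as follows. This is a minimal illustration, not the authors' analysis code; the function names are hypothetical, the sample numbers are made up, and converting the statistic to a P value (which requires the t distribution) is left out.

```python
import math
from statistics import mean, stdev


def bonferroni_alpha(familywise_alpha: float, n_tests: int) -> float:
    """Per-test alpha that keeps the overall type 1 error rate at familywise_alpha."""
    return familywise_alpha / n_tests


def paired_t(x, y):
    """Return the paired-data t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Two tests at an overall rate of .05 give the .025 level used in the text.
print(bonferroni_alpha(0.05, 2))

# Made-up paired measurements, for illustration only (not study data).
t_stat, dof = paired_t([3.0, 5.0, 7.0], [1.0, 2.0, 3.0])
print(round(t_stat, 3), dof)
```

With eight observers, each contributing one mean time per method, the paired test would have 7 degrees of freedom.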
RESULTS
Accuracy Data
Interpretation accuracy with a film alternator and filmstack were identical, with no unacceptable reports generated with either method. Thus, all the interpretation times can be considered error-free task times. While perfect accuracy is unusual in a reaction-time experiment, it is reasonable and in fact expected with this radiologic image interpretation task. Radiologists, during residency, fellowship, and their work experience, have been trained to a near-perfect accuracy asymptote to always generate a clinically acceptable report. In the past (11), radiologists have been observed to slow down, take notes, use a grease pen, and take other steps to ensure they maintain that standard of clinical acceptability.
Time Data
Case A interpretation time averaged 6.74 minutes, case B averaged 5.61 minutes, case C averaged 6.10 minutes, and case D averaged 8.38 minutes, with the averages computed with two filmstack times and two prehung-film times for each observer. The Table details the interpretation times by observer and by display method. Load-Unload Time is the sum of the time to load the images onto the alternator and the time to unload the images from the alternator back into the film folder. Total film time of 10.84 minutes was significantly longer than the time of 6.68 minutes to interpret the serial CT cases with filmstack (P = .0103 < .025 [ie, the 99% confidence interval on the difference of the means does not cover zero]).
There was no significant difference between prehung film time of 6.75 minutes and filmstack time of 6.68 minutes (P = .9183), and four of the eight radiologists interpreted faster with filmstack than with prehung film. A 95% confidence interval on the difference between the mean prehung film time and the mean filmstack time would be (-1.43 minutes, +1.6 minutes). A confidence interval of ±1.5 minutes may seem quite large, but the natural variation among interpreted cases was over 8 minutes, with interpretation times ranging from less than 4 minutes to almost 12 minutes, so a reduction to ±1.5 minutes is still fairly useful. Going further and proving that the mean filmstack time and the mean prehung film time are likely to differ only by a clinically unimportant difference (ie, proving the null hypothesis) requires a much more powerful experiment. Retrospective power analysis indicated that even if we had used 12 cases and 24 radiologists, we would have been able to detect a 30-second difference only 36% of the time.
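The reported interval is consistent with the two means and the stated half-width. The sketch below reproduces that arithmetic; the half-width of 1.5 minutes is taken from the "±1.5 minutes" phrasing in the text and is approximate, and the function name is hypothetical.

```python
def interval_on_difference(mean_a: float, mean_b: float, half_width: float):
    """Return (lower, upper) bounds of an interval centered on mean_a - mean_b."""
    diff = mean_a - mean_b
    return diff - half_width, diff + half_width


# Prehung film mean, filmstack mean, and approximate half-width, all in minutes.
lower, upper = interval_on_difference(6.75, 6.68, 1.5)
print(round(lower, 2), round(upper, 2))
```

The observed difference of 0.07 minute is small relative to the half-width, which is why the interval spans zero and the test is far from significant.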
Additional exploratory analyses were conducted to support the above conclusions. A general linear multivariate model was fitted to examine the effects of the order of case presentation and the method of display. There was no significant interaction between the order in which an observer read the cases and the display method (P = .8010). This means that the interpretation time for an observer's first and second cases was about the same, whether the cases were presented on film or on filmstack. There was also no significant effect of order (P = .7029). This means that the first film or filmstack case took nearly the same amount of time to complete as the second. There were no missing data.
Observations and Observer Comments
All observers believed that filmstack would provide acceptable interpretations and that they would be willing to use a workstation with a stacked display in clinical practice. However, half of the observers stated that they would still be more comfortable with film, though all liked the fact that an electronic workstation would handle film management and manipulation better.
A number of observers commented that they sometimes had trouble "getting the big picture" (developing a spatial gestalt of the three-dimensional information contained in the CT examinations) with filmstack. One radiologist noted that film was better for determining the change in size of a pleural effusion from the initial to the follow-up examinations because this task necessitated moving the eyes rapidly back and forth over four to six images in the follow-up examination to develop a notion of the craniocaudal extent of the effusion. We believe this finding is likely to apply to spatial comprehension of a variety of longitudinally oriented structures. While several observers made use of the gestalt view (Fig 1b), most did not. Several observers suggested that the 100 × 100-pixel images were too small to provide sufficient information for an adequate spatial understanding of the data. Many observers used the "both to top of liver" key to synchronize the examinations and then scrolled back and forth, keeping the examinations synchronized.
Several experienced radiologists dictated findings "on the fly," or as they discovered them, on film but sometimes memorized the findings and dictated them all at one time with filmstack. When asked, these radiologists stated that it took too much time to scroll back and forth through the examination or to write notes and that it was easier to just remember the findings and dictate them at one time. On the other hand, another radiologist reported trouble keeping track of findings with the stacked display, because the spatial location of the images on the alternator normally assisted memory of critical findings. Another radiologist used written notes more with filmstack than with a film alternator. Finally, one radiologist who had been skeptical at first about the stacked display stated that the section-by-section synchronized viewing of the initial and follow-up examinations was ideal for determining change in number and extent of multiple liver lesions.
DISCUSSION
Interpretation Time
After 20 minutes of training, average interpretation time with the stacked display was significantly faster than with total film (including the time to load and unload the film) and was slightly faster than with prehung film. Four of eight radiologists were faster with the stacked display than with prehung scans. Further reductions in interpretation time with a stacked display may occur after several minor changes are made to the user interface and radiologists receive additional training and become more familiar with both the interaction and how best to integrate a workstation into their clinical operations. Given these results and our power analysis, we would be surprised if, under clinical conditions, replacement of even prehung film with a stacked display would result in any significant reduction in operational throughput.
Our experimental results showed interpretation time with stacked display was faster than with total film, and these results are strengthened further by our decision to limit training and to use complex serial CT cases that compound the difficulties of interpretation with a stacked display. One of our main strategic decisions was that whenever we were forced to choose between two experimental design alternatives, we would be biased in favor of the existing film display and against the new stacked display workstation. This strategy would result in a stronger case for the new technology if our experimental hypothesis was found to be true and the examination would be kept within tractable means. Thus we limited our subject training because two cases were sufficient for accurate interpretations and because we could never train radiologists as well with the new workstation as they had been trained in the film approach they had been using in the clinic for years. We also chose serial staging chest CT examinations for our test cases because these are a particularly difficult and common type of multiple-examination CT case (the longer the interpretation, the more differences in display-method interpretation time are accentuated) and because we believed the cases would be fairly difficult to interpret with a stacked display workstation, again with bias in favor of film when we had to make a choice.
Beard et al (11) reported that the time to load and unload the scans in one chest CT case averaged 1.18 minutes versus the mean of 4.09 minutes reported for the similar cases in this experiment (Table). Assuming CT cases of roughly similar complexity in the two experiments, doubling the number of scans to be examined in a case may roughly triple the film-handling time. Four minutes per case across 10 or 20 CT cases per day can add up to a considerable amount of radiologist time. Unless radiologists have considerable free time in their daily routine, it is clearly advantageous either to use an electronic workstation or to have allied personnel accurately preload alternators.
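The film-handling figures above can be worked through directly. The sketch below uses only the numbers quoted in the text (1.18 and 4.09 minutes per case, and 10 or 20 cases per day); the function name is hypothetical.

```python
def daily_handling_minutes(minutes_per_case: float, cases_per_day: int) -> float:
    """Total radiologist time per day spent loading and unloading film."""
    return minutes_per_case * cases_per_day


# Ratio of serial-case handling time to single-examination handling time.
ratio = 4.09 / 1.18
print(round(ratio, 2))

# Roughly four minutes per case over a day of 10 or 20 cases.
for cases in (10, 20):
    print(cases, daily_handling_minutes(4.0, cases))
```

The ratio of about 3.5 is the basis for the statement that doubling the number of scans may roughly triple handling time, and the daily totals of 40 to 80 minutes show why the cost is not negligible.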
Interpretation Accuracy
The purpose of this experiment was to analyze error-free response time for serial CT cases. As such, the experiment was not designed to have sufficient statistical power to draw conclusions with respect to interpretation accuracy. However, given the lack of unacceptable reports with the stacked display, combined with the excellent accuracy results of Straub et al (10) with another workstation using the stack metaphor, we believe that a stacked display such as filmstack is capable of providing acceptable accuracy for interpretation of both individual and serial CT examinations.
Radiologic Accuracy versus Task-Time Experiment Design
Task-time experiments have a number of major differences from accuracy experiments. First, task-time experiments have a continuous response variable, typically seconds or minutes, whereas accuracy experiments have a binary or discrete response variable, typically an "acceptable or unacceptable" rating. Experiments with continuous response variables necessitate use of considerably fewer cases and/or observers for the same statistical power. Second, variance in observer task time is almost always much greater than variation between case and image-display method (11), so it is more important to use a larger number of observers than a larger number of cases. Third, and perhaps most importantly, accuracy and task-time laboratory results are put to very different purposes in the clinic and so have different levels of acceptable variance. Interpretation accuracy directly affects patient outcome, so radiologists correctly demand very strong experimental results before they accept a new technology. On the other hand, interpretation task time for a new display method is only one of many factors in overall clinical productivity. The "noise" from operational efficiency, case mix, and most importantly, individual differences among radiologists, means that interpretation task times can have larger variances than those for accuracy results and still be clinically relevant.
Caveats
Observers: The radiologists had approximately 20 minutes of training with stacked display compared with several years of training with a film alternator. We would expect the stacked display times to improve with additional training, but we would not anticipate much change with the film times. We would expect radiologists who are less familiar with computers to eventually achieve similar performance after additional training.
A number of radiologists were uncomfortable with the stacked display interpretation method for some cases. We suspect that three-dimensional spatial inferences, such as visualization of longitudinally oriented structures, may be more complex with stacked display than with a mosaic approach. On the other hand, the stacked approach combined with the synchronization function, which allowed both examinations to be scrolled in parallel, appears to be a superior method for section-by-section comparison of CT examinations.
Reading Time: Since radiologists could view the images while they were loading the films onto the alternator, it is possible that we have slightly underestimated the amount of time a radiologist would require to read prehung films. The prehanging of films was found to be experimentally problematic during pilot studies, owing to variance in preferred film-arrangement configurations.
Fatigue: Experimental sessions lasted about 1 hour, whereas clinical interpretation sessions can last all morning or all afternoon. Thus, it is possible that in routine practice, mental and eye fatigue with either film alternator or stacked display might affect relative performance.
Implications for CT Workstation Design
The stacked-metaphor approach provides an effective means for interpretation of multiple CT examinations and allows interpretations that are as fast as, if not faster than, those with film. However, on the basis of observations and comments by radiologists, we believe that if cost were not a factor, use of a 2,500 × 2,000-pixel, high-brightness monitor with a 0.1-second image display time would be superior to use of a 1,024 × 1,024-pixel monitor for interpretation of CT examinations. CT images might have to be mathematically interpolated to a slightly larger size to compensate for use of smaller pixels, as are often found with high-resolution monitors. This interpolation would still allow 12 or more CT images to be displayed on a single monitor. The 2,500 × 2,000-pixel monitor could be configured for a simultaneous mosaic display of 12 CT
images from the follow-up examination with the option of displaying either the initial or the follow-up examination with the stacked metaphor. A 2,500 × 2,000-pixel monitor could also be used to display chest radiographs and other large images.
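The claim that 12 interpolated CT images still fit on one monitor can be checked with simple arithmetic. The sketch below assumes a native CT matrix of 512 × 512 pixels (a common size, but not stated in the text) and square image tiles with no borders; the function name `best_mosaic` is hypothetical.

```python
import math


def best_mosaic(monitor_w, monitor_h, n_images, native=512):
    """Find the grid that gives the largest square tile for n_images.

    Returns (columns, rows, tile_size_px, scale_vs_native). The native
    matrix size of 512 is an assumption, not a value from the article.
    """
    best = None
    for cols in range(1, n_images + 1):
        rows = math.ceil(n_images / cols)
        # The tile must fit both across and down the monitor.
        tile = min(monitor_w // cols, monitor_h // rows)
        if best is None or tile > best[2]:
            best = (cols, rows, tile)
    cols, rows, tile = best
    return cols, rows, tile, tile / native


print(best_mosaic(2500, 2000, 12))  # (4, 3, 625, 1.220703125)
print(best_mosaic(1024, 1024, 12))  # (3, 4, 256, 0.5)
```

Under these assumptions, the larger monitor holds 12 images in a 4 × 3 grid with each image enlarged by about 22%, whereas the 1,024 × 1,024-pixel monitor would have to halve each image to show the same 12.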
Nevertheless, we believe there are a number of circumstances in which use of a 1,024 × 1,024-pixel monitor and the stacked display approach should be considered. First, 2,500 × 2,000-pixel monitors are likely to remain more expensive and physically larger than 1,024 × 1,024-pixel monitors for some time, so stacked display can provide effective CT interpretation at a much lower cost in hardware and physical footprint. There are numerous places at a medical center where this cost trade-off might be advantageous, including in radiologists’ offices and in nonradiology clinics. Second, given that the stacked display provides effective CT visualization with considerably less screen display area, it would be advantageous for display of comparison examinations. Third, we believe that use of a stacked display with a synchronization function could provide a superior means of side-by-side examination of sections from serial examinations. Software would have to be developed to take into consideration varying section thicknesses and intervals.
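One way such software might handle differing section intervals is to synchronize on table position rather than on section number. The following is a minimal sketch under stated assumptions: each examination supplies a sorted list of section positions in millimeters, and an optional offset aligns the two examinations at a common landmark. The function names and the nearest-position rule are hypothetical and do not describe the filmstack implementation.

```python
from bisect import bisect_left


def nearest_section(positions_mm, target_mm):
    """Index of the section whose position is closest to target_mm.

    positions_mm must be sorted in ascending order.
    """
    i = bisect_left(positions_mm, target_mm)
    if i == 0:
        return 0
    if i == len(positions_mm):
        return len(positions_mm) - 1
    before, after = positions_mm[i - 1], positions_mm[i]
    # Ties go to the earlier section.
    return i - 1 if target_mm - before <= after - target_mm else i


def synchronized_index(positions_a, index_a, positions_b, offset_mm=0.0):
    """Map the section shown in examination A to its counterpart in B."""
    return nearest_section(positions_b, positions_a[index_a] + offset_mm)


# Hypothetical examinations: 10-mm intervals versus 7-mm intervals.
initial = [0, 10, 20, 30, 40]
follow_up = [0, 7, 14, 21, 28, 35, 42]

print([synchronized_index(initial, i, follow_up) for i in range(len(initial))])
# [0, 1, 3, 4, 6]
```

Scrolling one section in the initial examination then advances the follow-up examination by one or two sections as needed, so the two stacks stay at roughly the same anatomic level.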
Acknowledgments: We thank Joseph K. T. Lee, MD, and R. Eugene Johnston, PhD, for their help in the preparation of the manuscript.
References
1. Arenson RL, van der Voorde F, Stevens JF. Improved financial management of the radiology department with a microcosting system. Radiology 1988; 166:255-259.
2. Pizer SM. Psychovisual issues in the display of medical images. In: Hoehne KH, ed. Pictorial information systems in medicine. Berlin, Germany: Springer-Verlag, 1985; 235-250.
3. Beard DV, Johnston RE, Toki O, Wilcox C. A study of radiologists viewing multiple computed tomography studies using an eyetracking device. J Digital Imaging 1990; 3:230-237.
4. Arenson RL, Chakraborty DP, Seshadri SB, et al. The digital imaging workstation. Radiology 1990; 176:303-315.
5. Foley WD, Jacobson DR, Taylor AJ, et al. Display of CT studies on a two-screen electronic workstation versus a film panel alternator: sensitivity and efficiency among radiologists. Radiology 1990; 174:769-773.
6. Johnston RE, Yankaskas BC, Perry JR, et al. Agreement experiments: a method for quantitatively testing new medical image display approaches. Proc SPIE 1990; 1234:621-630.
7. Beard DV. Designing a radiology workstation: a focus on navigation during the interpretation task. J Digital Imaging 1990; 3:152-163.
8. Brown JJ, Malchow SC, Totty WG, et al. Examination of the knee: interpretation with multiscreen digital workstation vs hardcopy format. AJR 1991; 157:81-85.
9. Berbaum KS, Franken EA, Honda H, et al. Evaluation of a PACS workstation for assessment of body CT studies. J Comput Assist Tomogr 1990; 4:853-858.
10. Straub WH, Gur D, Good WF, et al. Primary CT: diagnosis of abdominal masses in a PACS environment. Radiology 1991; 1:739-743.
11. Beard DV, Hemminger BM, Perry JR, et al. Single-screen workstation versus film alternator for fast CT interpretation. Radiology 1993; 187:565-569.
12. Beard DV, Hemminger BM, Pisano ED, et al. CT interpretations with a low cost workstation: a timing study. J Digital Imaging 1994; 7:1-7.
13. Beard DV, Pisano ED, Denelsbeck KM, Johnston RE. Eye-movement during CT interpretation: eyetracker results and image-display-time implications. J Digital Imaging 1994; 7:189-192.
14. Card SK, Moran TP, Newell A. The psychology of human-computer interaction. Hillsdale, NJ: Erlbaum, 1983.
15. Fitts PM. The information capacity of the human motor system in controlling the amplitude of movement. J Exp Psychol 1954; 47:381-391.
16. Rogers DC, Johnston RE, Hemminger BM, Pizer SM. Effect of ambient light on electronically displayed medical images as measured by luminance-discrimination thresholds. J Opt Soc Am 1987; 4:976-983.