Variability in B-mode ultrasound measurements in the Atherosclerosis Risk in Communities (ARIC)...

ELSEVIER

Ultrasound in Med. & Biol., Vol. 22, No. 5, pp. 545-554, 1996
Copyright © 1996 World Federation for Ultrasound in Medicine & Biology

Printed in the USA. All rights reserved
0301-5629/96 $15.00 + .00

PII: S0301-5629(96)00039-7

Original Contribution

VARIABILITY IN B-MODE ULTRASOUND MEASUREMENTS IN THE ATHEROSCLEROSIS RISK IN COMMUNITIES (ARIC) STUDY

LLOYD E. CHAMBLESS,† MING M. ZHONG,† DONNA ARNETT,* AARON R. FOLSOM,‡ WARD A. RILEY* and GERARDO HEISS§

†Collaborative Studies Coordinating Center, Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA; ‡Division of Epidemiology, School of Public Health, University of Minnesota, Minneapolis, MN, USA; *Department of Neurology, Bowman Gray School of Medicine of Wake Forest University, Winston-Salem, NC, USA; and §Department of Epidemiology, School of Public Health, University of North Carolina, Chapel Hill, NC, USA

(Received 27 April 1995; in final form 12 January 1996)

Abstract-The Atherosclerosis Risk in Communities (ARIC) study is a multicenter, long-term epidemiologic study that uses B-mode ultrasound primarily to measure carotid arterial intima-media wall thickness. To assess the reliability of the measurement process of this application of B-mode technology to population-based research, 36 volunteers from four centers were scanned at three visits, 7-14 days apart. Estimates of the components of variation in the B-mode measurements of artery wall thickness from between-person, between-sonographer, within-sonographer, between-reader and within-reader variation are presented, along with estimates of the correlation R between measures made at repeat visits by different sonographers and read by different readers. The estimates of R for mean intima-media thickness are 0.69, 0.60, 0.54 and 0.66 for the carotid bifurcation, internal carotid, common carotid, and three-segment overall carotid mean, respectively. When these are adjusted to account for differences in between-person variance between the ancillary study and the main ARIC study, the estimated site-specific reliability coefficients appropriate to the ARIC study population are 0.77, 0.73 and 0.70 for mean carotid far-wall intima-media thickness at the carotid bifurcation and the internal and common carotid arteries, respectively.

Key Words: B-mode ultrasound, Quantitative imaging, Reliability, Repeatability, Variance, Measurement variation.

INTRODUCTION

The measurement of carotid artery intima-media wall thickness by B-mode ultrasound is a major component of several recent epidemiologic studies (The ARIC Investigators 1989; Bots et al. 1995; Heiss et al. 1991; O’Leary et al. 1991; Salonen et al. 1991) and clinical trials (Riley et al. 1992). In multicenter settings, even if a single, standardized protocol is used, measurement variability is typically introduced from several sources: ultrasound scanning equipment; sonographers; reading equipment; and readers of the scans. The repeatability of a measurement determines, in part, the ability to ascertain its association with an outcome in populations. If wall thickness is considered as the independent variable in an analysis, then it is known (Fuller 1987) that high measurement variability causes standard linear and logistic regression to produce biased estimates of the association, with the bias being toward the null when no other covariates are included. If wall thickness is considered as the dependent variable, then high measurement variability contributes not to bias but to decreased precision in estimates of association. The effects of measurement variation can be minimized at the planning and training stage, by taking steps to reduce the several components of variance, or by planning multiple measurements from which an average is calculated. Alternatively, the bias can be eliminated at the analysis stage with measurement error models (Fuller 1987) if the reliability of the independent variable is known. Hence, it is important to assess thoroughly the amount of total measurement variance and the sources of this variance for B-mode ultrasonography.

Address correspondence to: Dr. Lloyd E. Chambless, Collaborative Studies Coordinating Center, Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27514.
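The bias toward the null described in the Introduction can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the measurement error models of Fuller (1987): it assumes a single predictor, classical additive error that is independent of the true value, and a known reliability. The function names, the true slope of 2.0 and the reliability of 0.6 are all hypothetical choices made for the illustration.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_attenuation(beta=2.0, reliability=0.6, n=20000, seed=1):
    """Return (naive_slope, corrected_slope) for one simulated data set."""
    rng = random.Random(seed)
    # True values have variance 1; the error variance is chosen so that
    # var(true) / var(observed) equals the requested reliability.
    error_sd = ((1.0 - reliability) / reliability) ** 0.5
    true_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    observed_x = [x + rng.gauss(0.0, error_sd) for x in true_x]
    outcome = [beta * x + rng.gauss(0.0, 1.0) for x in true_x]
    naive = ols_slope(observed_x, outcome)   # attenuated toward zero
    corrected = naive / reliability          # simple method-of-moments correction
    return naive, corrected


naive, corrected = simulate_attenuation()
print(f"naive slope: {naive:.2f}, corrected slope: {corrected:.2f}")
```

Under these assumptions the naive slope is close to the reliability times the true slope, and dividing by the reliability approximately recovers the true slope, which is why an estimate of the reliability is needed before such a correction can be applied.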

METHODS

The ARIC study includes a prospective investigation of the etiology and natural history of atherosclerosis and the etiology of clinical atherosclerotic disease in cohorts selected from four U.S. communities: Forsyth County, NC; Jackson, MS (an all-black cohort); selected suburbs of Minneapolis, MN; and Washington County, MD (The ARIC Investigators 1989). A sample of approximately 4000 residents aged 45-64 years was drawn from each community to take part in an extensive examination of cardiovascular risk factors and their atherosclerotic sequelae. A major component is an ultrasonic examination of the carotid arteries to obtain precise measurements of arterial wall intima plus media thickness at standard sites. Two different methods were used to assess measurement variability of carotid wall thickness. Repeat scans and repeat readings were built into the regular cohort examination as a quality control (QC) program. In addition, an intraindividual variability (IIV) study was conducted by scanning 36 individuals from the four centers on three visits. From these data, the repeatability of the B-mode ultrasound measurements of mean carotid far-wall thickness was estimated.

High-resolution longitudinal ultrasonic B-mode images were acquired using a Biosound 2000 II SA. The ARIC study followed a detailed protocol to evaluate four high-resolution ultrasound systems prior to selection of a system in 1986. The protocol included measurement of the transmitted pressure waveform and lateral beam characteristics in the focal plane using a miniature hydrophone probe, determination of effective axial and lateral resolution using a tissue-equivalent phantom, and evaluation of the performance and ease of use by sonographers during participant examinations. The model Biosound 2000 II SA was the Biosound model marketed in 1986 and was the instrument selected. The effective axial resolution is approximately 0.10 mm and the effective lateral resolution in the focal plane is slightly better than 1.0 mm. The transducer is a mechanically scanned annular array with variable depth electronic focusing. When an image frame is digitized with most common image capture boards, the pixel dimension is approximately 0.067 mm, considerably smaller than that of most B-mode imaging systems at that time.

At each field center, the scanning instrument was standardized and monitored at intervals ranging from 1 week to 1 month by standard tissue-mimicking phantoms. The RMI (Radiation Measurements Inc.) Model 414B phantom was used. Complete scans of all study participants were recorded on high-resolution 3/4-in. (19.1 mm) Sony video-cassettes which were sent to the ultrasound reading center (URC), where images were transcribed onto an optical disk which was then randomly assigned to one of five to seven readers. A B-mode image reading station consisted of a personal computer (IBM-XT equivalent), a frame-grabber card, a color image monitor and an optical disk player. A Truevision Targa image capture board was used. Reading station pixel size (0.067 mm) was calibrated at the beginning of each participant study. Sonographers and readers were centrally trained, followed by supervised practice and certification.

The ARIC scanning and reading protocols (Bond et al. 1991; National Heart, Lung, and Blood Institute 1987; Riley et al. 1991) divide the extracranial carotid arteries into three 1-cm segments as shown in Fig. 1: the internal carotid; the carotid bifurcation; and the common carotid. In each arterial wall segment, four boundaries are identified on the image: the adventitia-media and the intima-blood on the near wall, and the blood-intima and media-adventitia on the far wall. During scanning the sonographers recorded the best possible images on the basis of clear visualization of the arterial boundaries; Doppler signals were used only to verify the identification of the internal carotid arteries. All measurements were made from longitudinal B-mode images of the carotid arteries as illustrated in Fig. 1. The examination was performed with the patient in the supine position and the sonographer seated at the head of the examinee. The head was tilted and placed on a foam wedge oriented approximately 45° to the left when the right side was examined and to the right when the left side was examined. The sonographer was instructed to seek out the interrogation angle which, if possible, displayed the internal carotid and external carotid in the same plane. If the anatomy of the participant would not permit this image to be acquired, the sonographer was instructed to seek an interrogation angle for which the bulb most notably widened when scanning from the common carotid artery to the bifurcation. The three segments were then carefully scanned, using fine manipulations of the transducer, to visualize as clearly as possible the double line pattern defining the combined intimal-medial thickness at both the near wall and far wall of the segment. In those cases where both near wall and far wall could not be well visualized at the same interrogation angle, the sonographer gave priority to the far wall boundaries while also maintaining a good image of the near wall media-adventitia boundary. When the optimum image was obtained, the sonographer depressed a footswitch to trigger a tone code which permits readers to locate this image sequence on the video recording.

Fig. 1. Schematic overview of carotid artery B-mode ultrasound measurements.

At the URC, the portions of the videotape identified by the sonographer as containing the best boundary images for each arterial segment were transcribed to optical disk. From it, the readers selected the systolic frame (largest lumen diameter) which best visualized the arterial wall boundaries. Wall motion was used to assist in identifying the boundaries. Arterial boundaries were marked to the nearest image pixel (0.067 mm) on the screen by placing a computer-controlled crosshair at the edges. Up to 11 points could be marked at 1-mm intervals along a 1-cm segment at each boundary. However, some points were difficult to visualize due to arterial curvature, tortuosity, kinking, arterial mineral deposits or arteries very deep from the skin surface. To recover some of the missing information, a cubic splining procedure was applied so that when at least 5 of the 11 points on a boundary were seen, missing points between the visualized points were interpolated. Details on the reading protocol are given elsewhere (National Heart, Lung, and Blood Institute 1987; Riley et al. 1991).
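The interpolation rule just described (fill in missing points that lie between visualized points, but only when at least 5 of the 11 points on a boundary were seen) can be sketched as follows. The text does not say which cubic spline was used, so this sketch assumes a natural cubic spline; the function names and the example depths are hypothetical.

```python
def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives; zero at both ends (natural spline)
    if n >= 2:
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        # Thomas algorithm for the tridiagonal system: forward elimination ...
        for k in range(1, n - 1):
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        # ... then back substitution.
        m[n - 1] = rhs[n - 2] / diag[n - 2]
        for k in range(n - 3, -1, -1):
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def evaluate(x):
        i = 0
        while i < n - 1 and x > xs[i + 1]:  # find the interval containing x
            i += 1
        a = xs[i + 1] - x
        b = x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate


def fill_boundary(depths_mm, min_points=5):
    """Interpolate unseen points (None) lying between visualized points."""
    seen = [(i, d) for i, d in enumerate(depths_mm) if d is not None]
    if len(seen) < min_points:
        return list(depths_mm)  # too few points: leave the boundary as read
    spline = natural_cubic_spline([float(i) for i, _ in seen],
                                  [d for _, d in seen])
    filled = list(depths_mm)
    for i in range(seen[0][0], seen[-1][0] + 1):  # no extrapolation past the ends
        if filled[i] is None:
            filled[i] = spline(float(i))
    return filled


# Hypothetical boundary depths (mm) at 11 positions spaced 1 mm apart.
boundary = [5.0, 5.1, None, 5.3, None, None, 5.6, 5.7, None, 5.9, None]
print(fill_boundary(boundary))
```

In this sketch the last position stays missing because it lies outside the range of visualized points, matching the statement that only points between visualized points were interpolated.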

Mean arterial wall intima-media thickness at a segment is defined as the mean of all available pairs of coordinate readings, including the splined coordinates. Even after interpolation at the coordinate level, a significant percentage of the segment-specific means remains missing. Because the correspondence between ultrasound echoes and arterial wall structures is less precise, and missing data are more common, at the near wall than at the far wall, most B-mode data analyses in ARIC focus only on the far-wall thickness. As a summary index, the overall far-wall mean (grand mean) of six segments (left and right internal, bifurcation and common) is often used. Missing segments are imputed when a value is available for at least one of the segments. The imputation is done by maximum likelihood using the EM algorithm (Little and Rubin 1987). In this article, however, segment-specific analyses are based on the observed mean far-wall thickness, including the splining interpolation of the coordinate data, but not the EM imputation. Only for the grand mean have these imputed data been included. The data used in this study come from two sources.

1. Ultrasound IIV study. The IIV study was designed to assess the between-person variation ($\sigma^2_{P}$), the between- and within-sonographer variation ($\sigma^2_{BSON}$, $\sigma^2_{WSON}$) and the between- and within-reader variation ($\sigma^2_{BR}$, $\sigma^2_{WR}$). Thirty-six healthy white volunteers (15 men and 21 women, mean age 54 y) were recruited for the ultrasound intraindividual variability (IIV) study. At each field center these volunteers were invited to attend three visits spaced 7-14 d apart. At each visit, two ultrasound B-mode scans were obtained by each of the sonographers independently. A unique identification number was assigned to each scan so that the URC was blinded to the identity of the participants. At the URC, half of the IIV scans were randomly selected for repeat readings, with some specified as rereadings by the same reader and some by a randomly selected second reader. Three field centers had two sonographers, and one had three. There were 14 readers.

To lessen participant burden, only the left or the right carotid was randomly selected for each participant (half on the left and the other half on the right). Due to limited sample size, the IIV analyses were done with side effect ignored. The results thus represent the average of the two sides.

2. Quality control (QC) replicate data. ARIC’s ongoing ultrasound QC programs were designed to monitor the same-visit within-sonographer variation and the within- and between-reader variation. Since the sample sizes of the QC studies are much larger than that of the IIV study, we include these data to confirm results from the IIV study, although the IIV study includes more sources of variance than the ongoing QC study. The repeat scan program was designed such that after finishing all the segment scannings for a participant, one or two segments were randomly selected by the computer to be repeated by the same sonographer. QC repeat scanning data were stored with the main study data and were read by the same reader (unblinded).

In a separate QC program, independent, blinded repeat readings on randomly selected participants are performed at the URC, sometimes assigned to the same reader, sometimes to a different reader. For analysis of the QC replicates, right and left sides were analyzed separately.

Statistical methods

We applied a random effects model (Searle 1971) to the IIV data to estimate the between-person variance, between- and within-sonographer variance and between- and within-reader variance. Details on the modeling are given in the Appendix. Of primary interest is the proportion of the between-person variance over the total variance, which is also called the reliability coefficient, R. In an epidemiologic study, relating differences between people with respect to arterial wall thickness to some potential risk factor, one hopes that R is near unity, so that the measurement “well classifies” a person with respect to wall thickness. R can also be interpreted as the correlation between measures made at different visits, scanned by different sonographers, and read by different readers for an individual. Similarly, other correlations can also be defined: between repeat readings by the same reader on the same scan; between repeat readings by different readers on the same scan; between scans by the same sonographer read by the same reader; and so forth. All of these correlation coefficients may be expressed as

$$R_r = \frac{\sigma_T^2 - \sigma_r^2}{\sigma_T^2},$$

where $\sigma_r^2$ is the sum of the components of variance related to the type of repeatability being considered and $\sigma_T^2$ is the sum of all components. A large component of $\sigma_T^2 - \sigma_r^2$ is $\sigma_P^2$, the between-person component, which can vary greatly between studies. So for across-study comparability purposes it is often preferable to consider $\sigma_r^2$. The results of the variance components analysis are presented by source of variation, giving the square root of the variance associated with that source and the ratio of this square root of variance to the mean of far-wall thickness, multiplied by 100. This latter statistic will be referred to as a coefficient of variation (CV).

For the QC intrasonographer replicate data, we used the nested random effects model Y = PERSON + ERROR to estimate the between-person and within-sonographer + within-reader variance components. Within-sonographer and within-reader components cannot be estimated separately from this dataset. However, because within-reader variation can be estimated from the reader replicate data, within-sonographer variation could then be obtained by subtracting the within-reader variation from the sum.

For the QC reader replicate data, we used the two-way factorial random effects model Y = PERSON + READER + PERSON*READER + ERROR to estimate between-person, between-reader and within-reader variations, the latter from the error term. We ascribed variation from the reader*person term to between-reader variation.

All the models stated above were implemented by the SAS VARCOMP procedure (SAS 1993). The maximum likelihood method was chosen over the sum-of-squares method to avoid producing negative variance components. This requires the assumption of normal distributions for each term in the model. In addition to the normality assumption, linearity and independence (among effects) assumptions are also required for these models.

An alternative procedure to estimate the overall correlation coefficients, using virtually no assumptions except independence among participants, is a random resampling method. Two thousand independent samples were drawn from the IIV dataset. In each sample, a pair of measurements was selected for each person. The pair, in random order, consists of measurements made from different visits, different sonographers, and different readers. Sample variances of the two measurements ($V_{1i}$, $V_{2i}$, where $i$ is the index for the sample) and the sample covariance ($C_i$) were derived for each sample. Because the two measurements were in a completely random order, it was expected that they had the same mean and the same variance. Thus, pooled mean and variance were used in estimating R. This led to estimating R by

$$\hat{R} = \frac{\bar{C}}{(\bar{V}_1 + \bar{V}_2)/2},$$

the bar representing means over the 2000 samples.

The grand mean in the IIV study is the mean of three segments from one side, whereas the grand mean in the ARIC main study, which is of interest to many investigators, is the mean of six segments from both sides. Results presented are the grand mean of three segments from the IIV study, but unfortunately the IIV study design does not permit transforming those estimates of measurement variation to estimates for the six-segment mean, basically because between-side measurement covariances are not available.
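The reliability coefficient and the CV defined in this section can be illustrated with a short calculation. This is a sketch only: the five variance components below are hypothetical values chosen for round numbers, not estimates from the study, and the grouping of components into each type of repeat measure is an assumption made for the illustration (the actual model is in the Appendix). The one grounded number is the last line, which applies the CV formula to the main-study bifurcation mean and SD in Table 1.

```python
def reliability(components, measurement_sources):
    """R_r = (sigma_T^2 - sigma_r^2) / sigma_T^2 for one type of repeat measure."""
    total = sum(components.values())                                # sigma_T^2
    measurement = sum(components[s] for s in measurement_sources)   # sigma_r^2
    return (total - measurement) / total


def cv_percent(variance, mean):
    """Square root of a variance as a percentage of the mean thickness."""
    return 100.0 * variance ** 0.5 / mean


# Hypothetical variance components in mm^2 (NOT estimates from the study).
components = {
    "person": 0.030,
    "between_sonographer": 0.002,
    "within_sonographer": 0.008,
    "between_reader": 0.003,
    "within_reader": 0.007,
}

# Overall: every source except the person contributes measurement variance.
overall = reliability(components, [k for k in components if k != "person"])
# Same scan read by two different readers.
between_reader = reliability(components, ["between_reader", "within_reader"])
# Same scan read twice by the same reader.
within_reader = reliability(components, ["within_reader"])

print(f"overall R        = {overall:.2f}")
print(f"between-reader R = {between_reader:.2f}")
print(f"within-reader R  = {within_reader:.2f}")
# CV from the Table 1 main-study bifurcation values (SD 0.390 mm, mean 0.857 mm).
print(f"CV               = {cv_percent(0.390 ** 2, 0.857):.0f}%")
```

The ordering within-reader R > between-reader R > overall R follows directly from the formula, because each successive type of repeat measure adds further components to $\sigma_r^2$ while $\sigma_T^2$ stays fixed.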

For the ARIC main study data and QC data, we analyzed the left side and right side separately. We calculated the unweighted average of the means and variance components for the two sides to be comparable with the IIV data. Because the side differences were very small, in this article we present only the simple averages of means and variances of the two sides.

RESULTS

Table 1 presents sample sizes and summary statistics of carotid artery far-wall intima-media thickness from the ARIC main study and from the IIV study. The standard deviations (SD) are higher on the right side for all three segments, and the means are higher on the right side for the bifurcation and internal segments (data not shown). The means and SDs of far-wall thickness at all carotid segments are somewhat lower in the IIV sample than in the ARIC population. This should be expected because the IIV participants are healthy volunteers and all of one race (white). Table 2 shows the square roots of the variance components and coefficients of variation (CV) for ARIC QC data. At the bifurcation and internal carotid, the sonographer and reader variations are higher on the right side than the left side except in one case, while at the common carotid no side effect is seen (data not shown). Comparing the average of left and right side of the QC data in Table 2 with the IIV data in Table 3, the between-reader and the within-reader variations are, in most cases, slightly larger in the QC data than those in the IIV data. The within-sonographer (within-visit, same reader) variations are similar between the two tables. Because of one influential observation, we can conclude only that, for the bifurcation, the sonographer component is either about the same as or much larger than the reader component. Table 3 also shows that the total sonographer component of measurement variance is much larger than the total reader component for the internal carotid and about the same for the common carotid.

Table 1. Sample size, mean (mm), standard deviation (SD) and coefficient of variation (CV) of carotid artery far-wall intima-media thickness from ARIC main study data and the ARIC IIV study, by segment.

Study               Segment                       N        Mean (mm)   SD (mm)   CV (%)
ARIC main study^a   Bifurcation                   8,331    0.857       0.390     46
                    Internal                      5,732    0.719       0.403     56
                    Common                        11,143   0.666       0.183     27
                    Six-segment grand mean^b      13,479   0.745       0.191     36
ARIC IIV study^c    Bifurcation                   35       0.769       0.277     36
                    Internal                      33       0.659       0.325     49
                    Common                        36       0.616       0.147     24
                    Three-segment grand mean^d    36       0.699       0.188     27

^a Means, sample sizes and variances are unweighted averages of left and right sides.
^b Mean of six segments, observed plus imputed, at both sides.
^c Average of 18 participants scanned on the left side and 18 on the right side.
^d Mean of three segments at one side, observed plus imputed, on the left for 18 participants and on the right for the others.

Table 4 contains correlations between repeated measures, from the IIV and the QC study, as estimated from the linear variance component models. From the ARIC IIV study the overall reliability coefficients, which are correlations between repeat measurements on a person scanned by different sonographers at different visits and read by different readers, are 0.61, 0.53, 0.53 and 0.67 for the bifurcation, internal, common and three-segment grand mean, respectively. For both sources of data, the within-reader reliability coefficients are 0.83 or better at all segments, except for the IIV bifurcation. The between-reader correlations are close to 0.8 at all segments for both data sources. The within-sonographer (within-visit, same reader) reliability coefficients from the IIV study are around 0.8 for the internal and 0.65 for the bifurcation and common, whereas, as estimated from the QC study, they are around 0.78 for the bifurcation and common and 0.90 for the internal.

Table 2. Square roots of variance components (mm) for far-wall thickness by arterial segment: ARIC QC data^a.

                                        Carotid bifurcation    Internal carotid       Common carotid
Source of variation                     N      Sq. root (mm)   N      Sq. root (mm)   N      Sq. root (mm)
Repeat scannings
  Within-sonographer (within-visit)
    + within-reader                     720    0.170           539    0.127           565    0.082
Repeat readings
  Between-reader                        1280   0.033           850    0.102           1733   0.060
  Within-reader                                0.164                  0.125                  0.051
  Reader total                                 0.168                  0.162                  0.079

Sq. root = square root of variance component.
^a Repeated scans and readings determined by the quality assurance protocol. Sample sizes and variance components are unweighted averages over left and right sides.
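As a consistency check on Table 2, the "Reader total" row is what one obtains by adding the between-reader and within-reader variance components and taking the square root, on the assumption (built into a random effects model) that the two components are independent. The tabulated square roots are rounded to three decimals, so agreement can be expected only to about 0.001; the common-carotid within-reader value is read here as 0.051.

```latex
\begin{align*}
\text{Bifurcation:} \quad & \sqrt{0.033^2 + 0.164^2} = \sqrt{0.027985} \approx 0.167
  && (\text{tabulated } 0.168) \\
\text{Internal:}    \quad & \sqrt{0.102^2 + 0.125^2} = \sqrt{0.026029} \approx 0.161
  && (\text{tabulated } 0.162) \\
\text{Common:}      \quad & \sqrt{0.060^2 + 0.051^2} = \sqrt{0.006201} \approx 0.079
  && (\text{tabulated } 0.079)
\end{align*}
```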

Table 5 compares the estimates of overall reliability coefficients between the variance components method and the resampling method. They agree very closely for the common and grand mean, but for the bifurcation and internal the coefficients are somewhat higher for the resampling method. From the resampling method the reliability coefficients are 0.69, 0.60, 0.54 and 0.66 for the bifurcation, internal, common and three-segment grand mean, respectively.

DISCUSSION

The assessment of measurement variance in the ARIC study reported here considers only short-term variation and does not address drift over time. Because such drift can be a major difficulty in the analysis of change in arterial wall thickness over time, long-term epidemiologic studies need not only to monitor and minimize short-term variability, but also to measure and control measurement drift in the scanning and reading process.

A potential threat to reproducibility in ultrasound data derives from missing information, which can occur to a considerable degree in protocols based on precise measurements of B-mode images of arterial wall structures. The proportions of missing carotid intima-media wall thickness measurements are about 60% at the internal, 40% at the bifurcation and 15% at the common carotid in the ARIC main study. One way to present the amount of missing data from the IIV study is to note that, in the 2000 resamples of pairs of measurements from the 36 volunteers, the mean sample sizes are 17, 16 and 32 for the bifurcation, internal and common, respectively.

In screening the IIV data for potential outliers, all readings were found to be within plausible biologic range. However, for one person there was a consistent difference between the two sonographers at the bifurcation, wall thickness being around 2.6 mm on average for one sonographer versus 1.1 mm for the other. This markedly influential participant corresponds to a volunteer who had an eccentric atherosclerotic lesion in one carotid bifurcation which was not visible in the angle of interrogation specified by protocol, but entered the field of view following a minor change in the transducer angle of interrogation. The ARIC protocol provides standards for sonographers on the transducer angle of interrogation, and instructs the sonographers to disregard arterial wall thickness and the presence or absence of a lesion in identifying the optimal angle. Because the sample size in the IIV study is small, the between-sonographer variation and the total variation are markedly influenced by this extreme case. Inclusion of this influential observation in the analyses increased the within-sonographer-within-visit-same reader correlation from 0.64 to 0.77, and increased both the between-reader and within-reader correlations to 0.88 (see Table 4). The impact of this influential observation on the repeat measures of the three-segment grand mean was much smaller, of course. The impact of this single observation on the estimates of the overall reliability coefficient was to decrease the correlation between repeat measures by different sonographers at different visits read by different readers to 0.47 for the bifurcation, and 0.62 for the three-segment grand mean. Because the frequency of large, eccentric lesions in the ARIC population is not known to us (nor estimable from the ultrasound examinations performed using our study protocol), it is not possible to provide a quantitative estimate of the contribution of such lesions to the variability of ARIC’s ultrasound data.

Table 4. Correlation coefficients between various repeat measures for far-wall thickness of the carotid artery: random effects modeling for ARIC IIV study and ARIC QC data.

Study        Source                                           Bifurcation   Internal   Common   Three-segment grand mean
ARIC QC^a    Within-sonographer (within-visit), same reader   0.78          0.90       0.78
             Between-reader                                   0.85          0.85       0.81
             Within-reader                                    0.86          0.91       0.92
ARIC IIV^b   Overall                                          0.61          0.53       0.53     0.67
             Within-sonographer (within-visit), same reader   0.64          0.81       0.67     0.74
             Between-reader                                   0.77          0.85       0.78     0.79
             Within-reader                                    0.79          0.87       0.83     0.83

^a R = 1 - (average measurement variance)/(average total variance), where measurement variance and total variance are unweighted averages over the left and the right side.
^b Includes 18 participants scanned on the left side, and 18 on the right.

Although the measurement variances and the coefficients of variation are substantially smaller at the common carotid than those at the bifurcation and internal carotid, the overall reliability coefficients, R (Table 5), are not much different, because of the effect of between-person variation. The mean and the variance components of the right side arterial wall thickness are higher for the internal and bifurcation than those of the left side. It is not known whether the differences truly exist with respect to human anatomy, or are caused by differences in procedures. For example, scanning the right side may be different from scanning the left side due to the right-handedness of most of the sonographers, and reading may differ slightly for the two sides because the images are inverted with respect to each other.

Table 5. Overall reliability coefficients: three estimators of correlation coefficients for site-specific and grand mean far-wall thickness, for repeated measures by different sonographers at different visits read by different readers.

Segment                        IIV data: random     IIV data:      Measurement variance from
                               effects modeling     resampling     IIV study; total variance
                               method               method         from ARIC main study
Bifurcation                    0.61                 0.69           0.77
Internal                       0.53                 0.60           0.73
Common                         0.53                 0.54           0.70
Three-segment grand mean(a)    0.67                 0.66

(a) Grand mean for the IIV study refers to the mean over the three segments on one side.

Due to the homogeneity of the IIV volunteers (e.g., they are all white and healthy), the between-person variations, and therefore the reliability coefficients, are expected to be somewhat smaller than those of the main study population. Under the assumption that measurement variation was the same in the IIV and main ARIC study, the total measurement variances estimated from the IIV study (by the resampling method) and the total variance estimated from the ARIC main study can be combined to produce estimates of reliability. This results in estimates of 0.77 for the bifurcation, 0.73 for the internal and 0.70 for the common carotid. Comparable estimates for the six-segment grand mean are not available, because the measurement error covariances between the left and the right side are not available. To put these estimates of reliability coefficients in perspective, we can compare them with coefficients for other variables of interest in epidemiologic studies: for plasma cholesterol R = 0.94 for ARIC, other studies 0.76-0.90 (Chambless 1992a); plasma HDL-cholesterol R = 0.94 for ARIC, other studies 0.61-0.96 (Chambless 1992a); plasma fibrinogen R = 0.72 for ARIC, other studies 0.54-0.72 (Chambless 1992b); antithrombin III activity R = 0.42 and apolipoprotein A-I R = 0.60 in ARIC (Chambless 1992b).
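The combination of a measurement variance from one study with a total variance from another can be sketched as follows. This is only an illustration of the relation R = 1 - (measurement variance)/(total variance); the function name and the numeric variances are hypothetical and are not values from the IIV or ARIC main study.

```python
def reliability_from_variances(measurement_var: float, total_var: float) -> float:
    """Return R = 1 - measurement variance / total variance."""
    if total_var <= 0:
        raise ValueError("total variance must be positive")
    if not 0 <= measurement_var <= total_var:
        raise ValueError("measurement variance must lie between 0 and the total variance")
    return 1.0 - measurement_var / total_var


# Hypothetical variances (mm^2), chosen for illustration only.
substudy_measurement_var = 0.03  # assumed: estimated in a repeatability substudy
main_study_total_var = 0.12      # assumed: estimated in the larger cohort

r_combined = reliability_from_variances(substudy_measurement_var, main_study_total_var)
print(f"combined reliability estimate: {r_combined:.2f}")
```

The sketch relies on the same assumption stated in the text, namely that measurement variation is the same in both studies; if that fails, the ratio mixes quantities that do not belong together.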

The ACAPS ultrasound reproducibility study (Riley 1992) differs from the ARIC study in that the two repeat visits are a month apart, the same reader reads each pair of scans, the number of participants is much larger (n = 405 for the same sonographer, n = 453 for the different sonographer, n = 40 for within-reader, and n = 41 for between-reader repeat studies), the numbers of scans per person are fewer (two as opposed to ARIC’s 12), and within-visit variability between-sonographer or within-sonographer is not considered separately. Further, the ACAPS study subjects were required to have focal atherosclerotic lesions and to have “a minimum of four (out of six possible) high-quality IMT measurements from each carotid system.” The variables of interest in ACAPS were different: of primary interest was the mean over all “sites” of maximum wall thickness, where “site” here includes right and left carotid, and internal, common or bifurcation, and near or far wall. Of secondary interest was the maximum of the “site”-specific maxima. Only the primary variable is discussed here in a comparison between the ACAPS and ARIC three-segment grand mean results, and reference is made only to the ARIC estimates from the IIV random effects model. Using our notation, ACAPS derived between-sonographer (between-visit)/same reader R = 0.75, as compared to ARIC’s 0.71; ACAPS within-sonographer (between-visit)/same reader R = 0.79, as compared to ARIC’s 0.78; ACAPS within-reader R = 0.95 compared to ARIC’s 0.80-0.96 at the various sites for main study QC data, and 0.85-0.89 at the individual sites for the IIV study. For between-reader repeatability, ACAPS had R = 0.73 compared to ARIC’s 0.78-0.88 at the various sites in the main QC study, and 0.77-0.88 at the individual sites for the IIV study.

The study on ultrasound variability reported by Salonen et al. (1991) included ten 58-year-old men from the Kuopio Ischaemic Heart Disease Risk Factor Study. The individuals visited one clinic three times, at intervals of 1 week, and were scanned and read with a single ultrasound system by four “observers,” who did both the scanning and the reading. The scanning protocol differed from ARIC’s, in that the angle of interrogation was chosen to show the maximal far-wall thickness, and the maximal thickness was measured twice on each of the left and right common carotids, and the average of these four measures was used. Thus, the homogeneity of test subjects, i.e., the between-person variance, and the homogeneity of equipment are quite different from the ARIC IIV study. Furthermore, the measurements being considered (mean of left and right maximal far-wall thickness vs. one-sided mean wall thickness) are also quite different. We restrict our attention in this discussion to measures of variation most comparable to ARIC’s. Between-observer CVs were given as 10.5% for each side. The ARIC IIV study had approximately 14% CV for a sum of between-reader and between-sonographer components. Salonen et al. (1991) report a mean coefficient of variation for the difference between first and third visits of around 6%, whereas ARIC reports a CV of 10% for within-sonographer variation. They conclude that most of the method variability in B-mode ultrasound measurements is due to intersonographer variation. The ARIC IIV study reached the same conclusion.

In the Rotterdam study (Bots et al. 1995) on reproducibility of carotid artery wall thickness measurements, only the common carotid was considered, both the left and the right and the mean over the two. The far-wall intima-media thickness measurement on each side was the average over three digitized images. The subjects were 80 randomly selected participants of the main Rotterdam study, invited within 3 months after their baseline carotid ultrasonography for a second ultrasound scan. The subjects were over 55 years of age, 56% female and of mean age 68 y. The overall correlations between measurements made at different visits by different sonographers and read by different readers were 0.71 for the right side, 0.66 for the left and 0.74 for the mean of two sides. Although these correlations should also estimate the proportion of total variance due to differences between subjects, assuming the time span too short for biologic within-person variation, this proportion is also reported as 0.87, with no discussion given of the difference between the estimates. It is also stated that most of the measurement variation was due to reader variation, although no evidence is given. Indeed, because for the left side and the mean over both sides the correlations between measurements by different sonographers are notably lower than the correlation between different readers, it actually seems that sonographer variation is larger, as in the ARIC and the Finnish studies.

Persson et al. (1992) from Malmö, Sweden report on a reproducibility study for common carotid mean far-wall thickness. Intersonographer (two different sonographers, same visit) and intrasonographer (same sonographer, two different visits) repeat scans were done, but it is not clear whether the readings of paired scans were done by the same or different readers (who were also the sonographers), so comparison to ARIC is not straightforward. The only statistics reported which are similar to ARIC’s are the correlations between repeat measurements. The correlation for intrasonographer (between-visit) was 0.89, and for intersonographer (within-visit) the correlation was 0.84. For ARIC, assuming different readers, these were both around 0.54 from the IIV ancillary study, and around 0.68 if adjusted to ARIC study total variance.

The results of the CHS (O’Leary et al. 1991) and a French/Italian reliability study (Touboul et al. 1992) were presented in terms of rank correlations and mean absolute difference, so they are not really comparable to the presentation of the ARIC and other studies reviewed here.

With reliability coefficients in the neighborhood of 0.7, estimates of the association between wall thickness and a disease outcome, such as prevalent or incident coronary heart disease or stroke, will generally be biased toward the null in the standard logistic regression or survival analysis methods, with the true relative risk ($RR_T$) being approximately a power of the expected value $RR_E$ of the estimated RR: $RR_T = RR_E^{1/R} = RR_E^{1/0.7} = RR_E^{1.4}$. This is strictly true only for regression with wall thickness as the only explanatory (independent) variable (Fuller 1987), but is also likely to hold when additional explanatory variables without measurement error are added to the model. The results presented in this study are not directly applicable, although they are relevant, to clinical applications of B-mode ultrasound such as diagnostic evaluation and decision making. Estimates for these applications can be obtained by extending the approach used in this study. Knowledge of the variability inherent to the B-mode scanning and measurement process provides the basis for achieving validity of the cutpoints used in screening and diagnosis employing this technology, for the reliability of decision making and for the prediction of risk. For example, access to estimates of measurement variability such as the ones presented here allows the adjustment of relative risk and other estimates for attenuation due to measurement error. More generally, studies such as the present one should increase the awareness of the potential bias when analyzing arterial wall thickness as an independent variable and of the loss of precision when considering wall thickness as a dependent variable. Furthermore, efforts should be maintained to decrease the measurement variation in future studies, by improving the measurement process and by making multiple measurements, and repeat measurements on a subsample should always be included so that intraindividual variability can be assessed.
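The attenuation relation discussed above, in which the true relative risk is approximately the expected estimated relative risk raised to the power 1/R, can be illustrated with a short sketch. The function name and the example relative risks are hypothetical; as noted in the text, the relation holds strictly only when wall thickness is the sole explanatory variable.

```python
def deattenuate_relative_risk(rr_estimated: float, reliability: float) -> float:
    """Approximate the true relative risk as RR_E ** (1 / R)."""
    if rr_estimated <= 0:
        raise ValueError("relative risk must be positive")
    if not 0 < reliability <= 1:
        raise ValueError("reliability coefficient must be in (0, 1]")
    return rr_estimated ** (1.0 / reliability)


# Hypothetical estimated relative risks, corrected with R = 0.7.
for rr_e in (1.5, 2.0):
    rr_t = deattenuate_relative_risk(rr_e, 0.7)
    print(f"estimated RR = {rr_e:.2f} -> approximate true RR = {rr_t:.2f}")
```

With R = 1 (no measurement error) the function returns the estimate unchanged, and a relative risk of 1 stays at 1 for any R, which matches the idea that the bias is toward the null rather than away from it.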

Acknowledgements-This research was supported by contracts N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021 and N01-HC-55022 from the National Heart, Lung, and Blood Institute. The authors thank Ernestine Bland for the preparation of the manuscript. The following coworkers in the ARIC project are also acknowledged: ARIC Coordinating Center, University of North Carolina: Richard Hayes, Doris Jones, George Williams; Hemostasis Lab: Pam Pfile, Hoang Pham, Valerie Stinson, Teri Trevino; Lipid Lab: Val Creswell, Maria Messi, Julita Samoro, Selma Soyal; Forsyth County Field Center: Carolyn Bell, Jeannette Bensen, Delilah Cook, Teresa Crotts, Amy Haire, Suzanne Pillsbury, Delilah Posey; Jackson Field Center: Connie Myers, Virginia Overman, Stephanie Parker, Liza Sullivan; Minneapolis Field Center: Marilyn Bowers, Ellie Justiniano, Gail Murton, Leone Reed; Washington County Field Center: Carol Christman, Dorrie Costa, Melissa Minotti, Joan Nelling.

REFERENCES

The ARIC Investigators. The atherosclerosis risk in communities (ARIC) study: Design and objectives. Am J Epidemiol 1989;129:687-702.

Bond MG, Barnes RW, Riley WA, Wilmoth SK, Chambless LE. High resolution B-mode ultrasound scanning methods in the atherosclerosis risk in communities (ARIC) study. Neuroimaging 1991;1:68-73.

Bots ML, Mulder PGH, Hofman A. Reproducibility of carotid vessel wall thickness measurements. The Rotterdam study. J Clin Epidemiol (in press).

Chambless LE, McMahon RP, Brown SA, Patsch W, Heiss G. Short-term intraindividual variability in lipoprotein measurements: The atherosclerosis risk in communities (ARIC) study. Am J Epidemiol 1992a;136:1069-1081.

Chambless LE, McMahon RP, Wu K, Folsom A, Finch A. Short-term intraindividual variability in hemostasis factors: The ARIC study. Ann Epidemiol 1992b;2:723-733.

Fuller WA. Measurement error models. New York: John Wiley & Sons, 1987.

Heiss G, Sharrett AR, Barnes R, Chambless LE, Szklo M. Carotid atherosclerosis measured by B-mode ultrasound in populations: Associations with cardiovascular risk factors in the ARIC study. Am J Epidemiol 1991;134:250-256.

Little R, Rubin J. Statistical analysis with missing data. New York: John Wiley & Sons, 1987.

National Heart, Lung, and Blood Institute. Atherosclerosis risk in communities (ARIC) study. Operations manual no. 6: Ultrasound assessment, version 1.0, April 24, 1987.

O’Leary DH, Polak JF, Wolfson SK Jr. Use of sonography to evaluate carotid atherosclerosis in the elderly: The cardiovascular health study. Stroke 1991;22:1155-1163.

Persson J, Stavenow L, Wikstrand J, Israelsson B, Formgren J. Noninvasive quantification of atherosclerotic lesions: Reproducibility of ultrasonographic measurement of arterial wall thickness and plaque size. Arterioscler Thromb 1992;12:261-266.

Riley WA, Barnes RW, Applegate WB, Dempsey R, Hartwell T. Reproducibility of noninvasive ultrasonic measurement of carotid atherosclerosis: The asymptomatic carotid artery plaque study. Stroke 1992;23:1062-1068.

Riley WA, Barnes RW, Bond MG, Evans GW, Chambless LE. High resolution B-mode ultrasound reading methods in the atherosclerosis risk in communities (ARIC) study cohort. Neuroimaging 1991;1:168-172.


Salonen R, Haapanen A, Salonen J. Measurement of intima-media thickness of common carotid arteries with high-resolution B-mode ultrasonography: Inter- and intraobserver variability. Ultrasound Med Biol 1991;17:225-230.

SAS user’s guide: Statistics, Vol. 6. No. 3. Cary, NC: SAS Institute.

Searle SR. Linear models. New York: John Wiley & Sons, 1971.

Touboul PJ, Prati P, Scarabin PV, Andrai V, Thibout E. Use of monitoring software to improve the measurement of carotid wall thickness by B-mode imaging. J Hypertens 1992;10(suppl 5):S37-S41.

APPENDIX

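We applied a random effects model (Searle 1971) to the IIV study data:

$$
\begin{aligned}
Y_{ijklmn} ={}& \mathrm{PERSON}_i + \mathrm{SONOG}_j + \mathrm{PERSON}_i * \mathrm{SONOG}_j \\
&+ \mathrm{VISIT}_k(\mathrm{PERSON}_i) + \mathrm{SONOG}_j * \mathrm{VISIT}_k(\mathrm{PERSON}_i) \\
&+ \mathrm{SCAN}_l(\mathrm{PERSON}_i, \mathrm{SONOG}_j, \mathrm{VISIT}_k) \\
&+ \mathrm{READER}_m + \mathrm{ERROR}_{ijklmn}
\end{aligned}
\tag{1}
$$

for Y = far wall arterial thickness. The various terms in the model are assumed independent.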

From this model one can write the total variance as:

$$
\sigma^2_T = \sigma^2_{\mathrm{BP}} + \sigma^2_{\mathrm{SONOG}} + \sigma^2_{\mathrm{SONOG*PERSON}} + \sigma^2_{\mathrm{VISIT}} + \sigma^2_{\mathrm{SONOG*VISIT}} + \sigma^2_{\mathrm{SCAN}} + \sigma^2_{\mathrm{BR}} + \sigma^2_{\mathrm{WR}}
$$

where the variance components are listed in the same order as the terms in the model. $\sigma^2_{\mathrm{BP}}$ is the between-person variation and $\sigma^2_{\mathrm{VISIT}}$ is the between-visit variation. There is unlikely to be perceptible biologic within-person variation in arterial wall thickness between visits, because the visits are only a week apart, so $\sigma^2_{\mathrm{VISIT}}$ is ascribed to within-sonographer variation beyond that occurring within a visit, which is $\sigma^2_{\mathrm{SCAN}}$. Sonographer performance may vary among sonographers differently for different participants or different visits, so $\sigma^2_{\mathrm{SONOG}}$, $\sigma^2_{\mathrm{SONOG*PERSON}}$ and $\sigma^2_{\mathrm{SONOG*VISIT}}$ are all ascribed to between-sonographer variation. $\sigma^2_{\mathrm{BR}}$ is the between-reader variance, and $\sigma^2_{\mathrm{WR}}$ is the within-reader variance, from the error term. (An alternative and direct estimate of $\sigma^2_{\mathrm{WR}}$ comes from the sums of squares of differences between readings by the same reader on the same scan, but these are so close to the estimates from the model that they are not reported.) $\sigma^2_T - \sigma^2_{\mathrm{BP}}$ is the total measurement variance, where $\sigma^2_T$ is the total of all the variance components.
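The grouping of variance components described above can be sketched in a few lines. The component values below are hypothetical and in arbitrary units, chosen only so that they sum to 1; the dictionary keys and the function name are illustrative labels, not names from the study's analysis code.

```python
# Hypothetical variance components (arbitrary units), one per model term.
components = {
    "BP": 0.60,            # between-person
    "SONOG": 0.05,
    "SONOG*PERSON": 0.05,
    "VISIT": 0.03,
    "SONOG*VISIT": 0.07,
    "SCAN": 0.08,
    "BR": 0.04,            # between-reader
    "WR": 0.08,            # within-reader (error term)
}


def group_components(c: dict) -> dict:
    """Collect the components into the sources of variation named in the text."""
    total = sum(c.values())
    return {
        "between_sonographer": c["SONOG"] + c["SONOG*PERSON"] + c["SONOG*VISIT"],
        "within_sonographer": c["VISIT"] + c["SCAN"],
        "between_reader": c["BR"],
        "within_reader": c["WR"],
        "total": total,
        "measurement": total - c["BP"],  # total measurement variance
    }


groups = group_components(components)
for name, value in groups.items():
    print(f"{name:>20}: {value:.2f}")
```

The four measurement-related groups add up to the total measurement variance, which is a quick consistency check on any set of estimated components.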

Of primary concern is the ratio $R = \sigma^2_{\mathrm{BP}}/\sigma^2_T$, the proportion of total variance which falls in the $\sigma^2_{\mathrm{BP}}$ component, which is also called the reliability coefficient. R can also be interpreted as the correlation between measures made at repeat visits scanned by different sonographers and read by different readers, as follows from the definition of the correlation coefficient as the ratio of covariance to variance, where the variances at repeated measures are assumed equal. Note that the covariance in the definition of R is:

$$
\mathrm{Cov}(Y_{ijklmn}, Y_{ij'k'l'm'n'}) = \mathrm{Cov}(\mathrm{PERSON}_i, \mathrm{PERSON}_i) = \mathrm{Var}(\mathrm{PERSON}) = \sigma^2_{\mathrm{BP}}
$$

for $j \neq j'$, $k \neq k'$, $m \neq m'$. Similarly, other correlations are given as follows: between repeat readings by the same reader on the same scan,

$$
R_{\mathrm{WR}} = 1 - \frac{\sigma^2_{\mathrm{WR}}}{\sigma^2_T};
$$

between repeat readings by different readers on the same scan,

$$
R_{\mathrm{BR}} = 1 - \frac{\sigma^2_{\mathrm{WR}} + \sigma^2_{\mathrm{BR}}}{\sigma^2_T};
$$

and between different scans by the same sonographer on the same person at the same visit read by the same reader,

$$
R_{\mathrm{WSWVSR}} = \frac{\sigma^2_{\mathrm{BP}} + \text{terms with SONOG} + \sigma^2_{\mathrm{VISIT}} + \sigma^2_{\mathrm{BR}}}{\sigma^2_T}.
$$
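The interpretation of R as a correlation between repeat measures can be checked with a small simulation. The sketch below is a deliberate simplification, not the study's estimation procedure: it lumps all measurement terms (sonographer, visit, scan, reader and error) into a single independent normal term per measurement, which amounts to assuming that every repeat measure uses a fresh sonographer, visit and reader. The variance values, seed and function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_repeat_correlation(var_bp, var_meas, n_persons=20000, seed=1):
    """Correlate two measurements per person that share only the person effect."""
    rng = random.Random(seed)
    sd_bp, sd_meas = math.sqrt(var_bp), math.sqrt(var_meas)
    first, second = [], []
    for _ in range(n_persons):
        person = rng.gauss(0.0, sd_bp)                    # shared between-person effect
        first.append(person + rng.gauss(0.0, sd_meas))    # independent measurement noise
        second.append(person + rng.gauss(0.0, sd_meas))
    return pearson(first, second)


# Hypothetical variances: between-person 0.6, total measurement 0.4.
var_bp, var_meas = 0.6, 0.4
theoretical_r = var_bp / (var_bp + var_meas)
simulated_r = simulate_repeat_correlation(var_bp, var_meas)
print(f"theoretical R = {theoretical_r:.3f}, simulated correlation = {simulated_r:.3f}")
```

Because only the person effect is shared between the two measurements, their covariance is the between-person variance, and the sample correlation should settle near the ratio of between-person to total variance as the number of simulated persons grows.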