
Journal of Clinical Epidemiology 62 (2009) 415–424

Patients were more consistent in randomized trial at prioritizing childbirth preferences using graphic-numeric than verbal formats

Karen B. Eden a,*, James G. Dolan b,c, Nancy A. Perrin d, Dundar Kocaoglu e, Nicholas Anderson f, James Case g, Jeanne-Marie Guise a,h,i

a Oregon Health & Science University (OHSU) Department of Medical Informatics & Clinical Epidemiology, Portland, OR, USA
b Unity Health System, Rochester, NY, USA
c University of Rochester Department of Medicine, Rochester, NY, USA
d OHSU School of Nursing, Portland, OR, USA
e Portland State University Department of Engineering and Technology Management, Portland, OR, USA
f University of Washington Department of Medical Education and Biomedical Informatics, Seattle, WA, USA
g OHSU Division of General Internal Medicine, Portland, OR, USA
h OHSU Department of Obstetrics & Gynecology, Portland, OR, USA
i OHSU Department of Public Health & Preventive Medicine, Portland, OR, USA

Accepted 5 May 2008

Abstract

Objective: We developed an evidence-based decision aid to help women with a prior cesarean to prioritize their childbirth preferences related to a future birth. Because there was uncertainty about which scale format would assist the patients in being most consistent in prioritizing preferences in a multiattribute decision model, we compared a graphic-numeric scale with a text-anchored scale.

Study Design and Setting: Ninety-six postnatal women with a prior cesarean were randomized to use 1 of 2 preference scale formats in a computerized childbirth decision aid. We measured the level of inconsistency (intransitivity) when patients prioritized their childbirth preferences and clarity of values before and after using the decision aid.

Results: When the trade-offs involved risk, women were more consistent when using graphic-numeric than text-anchored formats (P = 0.015). They prioritized safety to their baby as 4 times more important than any other decision factor including safety to self. Both groups reduced unclear childbirth values over time (P < 0.001). Women who over-used the extreme ends of the scale when evaluating risk were more likely to be inconsistent (P < 0.001).

Conclusion: Patients were more consistent in making trade-offs involving risk using graphic-numeric formats than text-anchored formats to measure patient preferences. © 2009 Elsevier Inc. All rights reserved.

Keywords: Decision support techniques; Pregnancy; Vaginal birth after cesarean; Patient preferences; Decision aid; Analytic hierarchy process

What is new?

• Patients can be inconsistent in expressing preferences related to health risks, which can misguide the healthcare decision.
• A method for measuring inconsistency due to intransitivity in establishing childbirth preferences is described.
• Patients were more consistent at assessing preferences related to health risks when they used a scale format that included graphics and numbers rather than only words.

Preliminary findings were presented at the annual meeting of the Society of Medical Decision Making, San Francisco, CA, October 2005, and the Women’s Health Interdisciplinary Research Symposium, Bethesda, MD, October 2005.

Sponsored by the OHSU Foundation; NIH K12 grant: Building Interdisciplinary Research Careers in Women’s Health, 5K12HD043488-04. This work was also supported by grants 1 R03 HS013959 and 1 R01 HS15321-01 from the Agency for Healthcare Research and Quality and 1 K08 HS11338-01 from the National Institute of Child Health & Human Development.

* Corresponding author. 3181 Southwest Sam Jackson Park Road, Mail Code: BICC. Tel.: +503-494-2456; fax: +503-494-4551.
E-mail address: [email protected] (K.B. Eden).

0895-4356/09/$ – see front matter © 2009 Elsevier Inc. All rights reserved.
doi: 10.1016/j.jclinepi.2008.05.012

1. Introduction

Patient-centered care is considered essential for high-quality health care (Institute of Medicine) [1,2]. To provide patient-centered care, it is essential for clinicians to understand patient preferences and priorities. This becomes even more important during the birth of a child, which is not only a health event but also a personal, intimate experience for women and provides memories for the entire family.

In 2004, there were 4,115,590 live births in the United States; 20.6% were primary (first-time) cesareans [3]. According to these numbers, 847,811 women who gave birth that year would be potentially eligible to face the complex decision of whether they should attempt a vaginal delivery after cesarean (VBAC) or have a cesarean in their next delivery. Related to the current pregnancy, both the delivery options contain risks for the patient to consider and risk to the placenta in future pregnancies [4–7]. Because the delivery decision requires trade-offs, a woman and her clinician must select a strategy that takes into consideration her priorities for the delivery.

Very little is known about the process by which women set priorities related to childbirth [8]. Women most likely make their decisions by considering not only safety but also several other factors: availability of VBAC, family obligations (related to recovery from delivery, and the impact on partners and other children), and personal values (e.g., desire to experience labor) [5,8–22].

One approach which several researchers have taken when patients are faced with complex medical decisions is to create a decision aid that measures patients’ priorities (preference weights) for the decision. A recent randomized controlled trial found that a decision aid that helped women prioritize preferences about possible health outcomes had a larger effect on the delivery decision than a decision aid focused on presenting probabilities or comparing with usual care [23]. This finding suggests that the actual process of considering the possible health outcomes impacted the final decision. Although more than 200 health care decision aids have been created in the past few years [24,25], only a few include a method to explicitly measure patient priorities, and these have not attempted to evaluate the variability or accuracy of such measurements. Reliable measurement of patient preferences has proved to be challenging. Previous investigators have demonstrated that 24–89% of respondents were inconsistent when using ranking, visual analog, or time–trade-off scales for assessing health states [26,27]. When respondents are inconsistent during the measurement process, the associated preference weights that are determined can have large variability [28].

The most commonly used measure of consistency in preference elicitation is the extent to which preferences obey the transitive rule [29]. For example, if a patient felt safety to her baby was twice as important as safety to herself and if she felt safety to herself was 3 times more important than having her desired delivery experience, we would expect her to indicate that her baby’s safety was 6 times more important than her desired delivery experience. Perfectly consistent preferences are difficult to achieve in complex choice tasks. Large deviations from perfect consistency, however, call into question the reliability of the measurement procedure. Good preference assessment procedures, therefore, should be associated with only minor degrees of inconsistency. When inconsistency differs between 2 formats (e.g., graphic-numeric or text-anchored scales) measuring the same preferences, the source of the inconsistency is likely to be the result of the measurement process rather than the complexity of the task or some other factor.
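The transitive rule in the example above can be written out as a short Python sketch. The function names and the relative-gap measure are hypothetical and chosen only for illustration; they are not the inconsistency measure used in the trial.

```python
def implied_ratio(a_vs_b: float, b_vs_c: float) -> float:
    """Ratio of A to C that the transitive rule implies from A:B and B:C."""
    return a_vs_b * b_vs_c


def transitivity_gap(a_vs_b: float, b_vs_c: float, stated_a_vs_c: float) -> float:
    """Relative deviation of the stated A:C ratio from the implied one (0 = transitive).

    Illustrative helper only; not the measure reported in the paper.
    """
    implied = implied_ratio(a_vs_b, b_vs_c)
    return abs(stated_a_vs_c - implied) / implied


# Baby's safety is 2x mother's safety; mother's safety is 3x delivery experience.
print(implied_ratio(2, 3))        # 6
print(transitivity_gap(2, 3, 6))  # 0.0, perfectly transitive
print(transitivity_gap(2, 3, 3))  # 0.5, stated ratio is half the implied one
```

A respondent who states 3:1 where 6:1 is implied is not wrong about her values, but her three answers cannot all be taken at face value when weights are derived from them.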

The long-term objective of our research program is to understand and improve the decision quality of pregnant women by creating a childbirth decision aid that helps them make decisions consistent with their childbirth preferences. The aim of this study was to compare, in a randomized trial, the patient’s inconsistency due to intransitivity when using a graphic-numeric scale to that when using a text-anchored scale in a computerized decision aid. We also measured how clear patients were about their delivery priorities [30], a measure of the decision process [25].

2. Methods

2.1. Population

The study population consisted of women who had at least 1 prior cesarean (within the past 18 months) and were eligible for a VBAC. The women were recruited from July 2004 through February 2005. Most women were recruited from the postpartum unit of a large university hospital in Oregon. Eligible patients in the postnatal unit were identified by medical chart review and approached directly. Patient information acquired from medical records during recruitment was kept confidential and limited to only that information necessary for identifying qualifying participants. During the chart review process, women were eligible if they (1) had experienced a cesarean delivery, (2) were 18 years of age or older, (3) spoke English, and (4) were physically capable of bearing additional children by vaginal delivery (e.g., had not had a bilateral tubal ligation or hysterectomy). Additional patients were recruited during the same time period using informational brochures available in 6 women’s health and family medicine clinics (1 private practice and 5 faculty-resident-midwife teaching practices), at a women’s health fair, through announcements on local radio, and through an advertisement in a local family magazine. These patients were screened using the same criteria (1–4) when they contacted the study coordinator. Individual patient appointments were scheduled, allowing approximately 45 minutes per session. Each patient received $20 reimbursement for time and travel. The study was approved by the Oregon Health & Science University Institutional Review Board and the trial was registered with ClinicalTrials.gov.


Women were randomized to use 1 of the 2 formats of a computerized childbirth decision aid based on a previously generated computerized randomization performed by an independent biostatistician. The randomization was blocked in groups of 6 to ensure an equal proportion of respondents in each group throughout the study. The scheduler of the patient session had no access to the randomization schedule. After the patient arrived at the session, the research assistant consulted the randomization schedule so she could select the decision aid with the assigned format for the patient. Each participant was blinded to her intervention assignment, and research assistants were carefully trained to provide objective feedback that would not reveal these assignments or present opinions regarding the program material.
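A pre-generated schedule blocked in groups of 6 could look like the following minimal Python sketch. The seed, the arm labels, and the function name are assumptions made for illustration; the paper does not describe the biostatistician's actual procedure beyond the block size.

```python
import random


def blocked_schedule(n_participants: int, block_size: int = 6,
                     arms=("graphic-numeric", "text-anchored"), seed: int = 2004):
    """Return an assignment list built from shuffled, balanced blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # fixed (hypothetical) seed: schedule is generated in advance
    per_arm = block_size // len(arms)
    schedule = []
    while len(schedule) < n_participants:
        block = list(arms) * per_arm  # equal allocation inside each block
        rng.shuffle(block)            # random order within the block
        schedule.extend(block)
    return schedule[:n_participants]


schedule = blocked_schedule(96)
print(schedule[:6])
print({arm: schedule.count(arm) for arm in sorted(set(schedule))})
```

Because every block of 6 contains 3 assignments per arm, the two groups can never differ by more than 3 participants at any point during enrollment, and 96 participants split exactly 48 and 48.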

2.2. Intervention

A research assistant with a laptop and printer met the participant in a quiet location at any of the 6 participating clinics. Women who were unable to travel to a participating clinic were invited to participate at the location of the recruitment (at the hospital (n = 5) or health fair (n = 16)). All women completed questions on the computer about prior obstetric history and demographics and then were randomized to use 1 of 2 decision aids to help them clarify their values.

Each decision aid was built on an underlying, evidence-based [4–6,8], multicriterion decision model that included probabilities of maternal and neonatal health outcomes and childbirth preferences from a systematic evidence review we completed [5,8]. The decision model was evaluated for face validity by clinicians during structured interviews and by patients in 4 focus group sessions. The resulting model was based on the analytic hierarchy process (AHP), a technique for turning vague preferences or perceptions into normalized weights and then measuring the inconsistency related to intransitivity with which the users provide information [28,31]. It has been used for healthcare decisions when preferences and values are key to the decision [32]. The AHP is typically presented using verbal or numeric scales. Verbal statements may be easier for people to understand as we think in terms of words, not numbers. However, there is evidence that people attach different numeric interpretations to verbal labels [33], which led us to the current study, a comparison of inconsistency using text-anchored and graphic-numeric scale formats.

There are typically 4 steps in using AHP: (1) create a decision model with a goal for the decision, the options, and the criteria for judging the options (Fig. 1); (2) complete a series of pairwise comparisons among the criteria to determine their relative importance to meeting the goal, using a constant-sum response scale that is either a graphic-numeric format or a text-anchored format; (3) complete a series of pairwise comparisons among the options to determine how well they fulfill the criteria, again using a constant-sum response scale; and (4) compute the weights of the criteria and integrate the information gleaned from steps 2 and 3 to identify the option that best fulfills the goal [32]. Participants in our study completed steps 2 and 3 using customized software that contained the decision model (step 1) and carried out the computation and summation processes (step 4). During the computation process, AHP also measures the inconsistency level due to intransitivity introduced by the user during the pairwise comparison process. Participants received feedback on their preference weights and the model’s recommended option. They were given an option to revise their preference weights.

Fig. 1. Multiattribute decision model showing criteria for deciding between a “trial of labor” (attempting a VBAC) and an elective repeat cesarean.
Goal: Make best delivery decision for me
Major criteria: Avoid harm to the baby; Avoid harm to the mother; Avoid risk to future pregnancies; Have a good delivery experience
Sub-criteria (Avoid harm to the mother): Avoid incontinence; Avoid numbness or pain by incision; Avoid hysterectomy
Sub-criteria (Have a good delivery experience): Scheduled delivery; Partner involved; Smooth recovery; Bond with baby; Option for labor; Reduce cost of delivery
Options: Request a Trial of Labor; Request an Elective Repeat Cesarean
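Steps 2 through 4 of the AHP workflow described above can be sketched in Python as follows. This is a generic illustration, not the study software: all point allocations are invented, and the row geometric mean used to turn ratios into weights is a common AHP approximation swapped in here because the paper's own computation (Appendix 1) is not reproduced in this text.

```python
import math

CRITERIA = ["baby", "mother", "future pregnancies", "delivery experience"]

# Hypothetical step-2 data: POINTS[i][j] is the share of 100 points given to
# criterion i when compared with criterion j (diagonal entries are unused).
POINTS = [
    [50, 80, 85, 90],
    [20, 50, 60, 75],
    [15, 40, 50, 65],
    [10, 25, 35, 50],
]

# Hypothetical step-3 data: points (out of 100) given to a trial of labor,
# versus an elective repeat cesarean, under each criterion.
TRIAL_OF_LABOR_POINTS = [40, 55, 70, 60]


def ratio_matrix(points):
    """ratios[i][j] = points for i over points for j in the (i, j) comparison."""
    n = len(points)
    return [[1.0 if i == j else points[i][j] / points[j][i] for j in range(n)]
            for i in range(n)]


def weights_from_ratios(ratios):
    """Normalized weights via row geometric means (an assumed, generic method)."""
    n = len(ratios)
    geo = [math.prod(row) ** (1.0 / n) for row in ratios]
    total = sum(geo)
    return [g / total for g in geo]


def option_scores(criteria_weights, first_option_points):
    """Step 4: weighted sum of how well each of the two options meets each criterion."""
    first = sum(w * p / 100 for w, p in zip(criteria_weights, first_option_points))
    return first, 1.0 - first


weights = weights_from_ratios(ratio_matrix(POINTS))
tol, cesarean = option_scores(weights, TRIAL_OF_LABOR_POINTS)
for name, w in zip(CRITERIA, weights):
    print(f"{name}: {w:.3f}")
print(f"trial of labor: {tol:.3f}, elective repeat cesarean: {cesarean:.3f}")
```

With only two options, a single constant-sum comparison per criterion already yields normalized local weights (points divided by 100), which is why step 3 needs no matrix computation in this sketch.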


The goal of the decision model (Fig. 1) was for the woman to choose a delivery strategy that best matched her preferences. The 2 delivery options are shown at the bottom of the model: “Request an elective repeat cesarean delivery” or “Request a trial of labor” (attempt a vaginal delivery). The term “trial of labor” is currently used instead of vaginal delivery because not all women who request a vaginal birth will achieve this. Three-quarters (76%, range, 60–82%) of women who attempt a VBAC will have a vaginal delivery and the remainder have a repeat cesarean [4]. The women were given this information when they used either format of the decision aid. The boxes in the middle of the model represent the many criteria or decisional factors a woman may be considering in forming her delivery strategy. These criteria at the first and second levels are defined and presented to patients as follows:

Avoid harm to the baby: such as, avoid death or disability to the baby.

Avoid side effects to the mother: such as, avoid prolonged numbness or pain by incision, avoid hysterectomy or avoid incontinence by middle age.

Avoid risk to future pregnancies: such as, avoid risk of placenta accreta and placenta previa.

Have a good delivery experience: such as, have the option to schedule delivery; to involve one’s partner in the delivery; to reduce cost; to be physically and emotionally able to bond with one’s baby immediately following delivery; to have a smooth recovery within weeks of delivery; and to have the option to experience labor.

Every attempt was made to define the criteria to be independent of each other. For example, the definition of the delivery experience was explicitly defined to not include health outcomes that were defined in other criteria.

During 30 additional pretesting sessions with eligible women before the trial, we evaluated whether the women could make the pairwise comparisons. Comparisons of criteria that appeared related and/or difficult to understand, for example, avoiding mild compared with major infant disabilities, were removed or revised during this iterative process. In cases where the definition for the major criterion included attributes that would have a mixed effect (some attributes might match better to VBAC and some to repeat cesarean), subcriteria were created. For example, the major criterion, “Have a good delivery experience” included “option to experience labor” (which relates more closely to desire for VBAC) and also “option to schedule delivery” (which is more closely related to repeat cesarean). For the major criterion, “Avoid harm to baby,” we did not find sufficient medical evidence to report infant disability related to delivery options separately from infant mortality and thus left them combined in the definition and did not create subcriteria. To test the hierarchy in the decision model for face validity, the women were given note-cards with the major criteria and subcriteria displayed. They were asked to lay out the decision model as a verification of the hierarchy of the model. As needed, the model was revised or definitions were clarified, and the process was repeated with new participants. The model and definitions stabilized within the first 10 participants. For the last 20 pretester sessions, we measured the accuracy of placement for each major criterion or subcriterion. These 20 testers correctly placed each criterion in >85% of placements.

The AHP has typically used a variety of scale formats (verbal, Likert scales, graphical) to assist the user in making the pairwise comparisons [33]. Because there was uncertainty about which scale format for measuring preference weights during the pairwise comparisons would reduce intransitivity, we compared use of a bipolar graphic-numeric scale to a bipolar text-anchored scale.

In each childbirth decision aid, the patient made a series of pairwise comparisons to determine the relative importance of each decision criterion [28,31,32,34], for example, comparing the importance of “avoiding side effects to the mother” with “having a good delivery experience,” using 1 of 2 formats (Figs. 2 and 3). The women in this study were making the comparisons without foreknowledge of their future delivery outcome, so that the actual outcome did not affect their preference weight. We measured intransitivity for each set of pairwise comparisons within a group of criteria: the Major Criteria (6 pairwise comparisons); the Harm-to-Mother subcriteria (3 comparisons that included incontinence, pain, and hysterectomy); and finally within the Delivery subcriteria (15 comparisons that included convenience of scheduling, recovery process, and so forth; see Fig. 1).

Fig. 2. Screenshot of the childbirth decision aid showing a pairwise comparison using the graphic-numeric format for prioritizing decision criteria.

Fig. 3. Screenshot of the childbirth decision aid showing a pairwise comparison using the text-anchored format for prioritizing decision criteria.
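The comparison counts for the three judgment sets (6, 3, and 15) follow from the number of unordered pairs among n items, n(n − 1)/2. A short Python check, using item labels taken from Fig. 1 (the dictionary name is only illustrative):

```python
from itertools import combinations
from math import comb

GROUPS = {
    "Major Criteria": ["harm to baby", "side effects to mother",
                       "risk to future pregnancies", "delivery experience"],
    "Harm-to-Mother subcriteria": ["incontinence", "numbness or pain by incision",
                                   "hysterectomy"],
    "Delivery subcriteria": ["scheduled delivery", "partner involved", "smooth recovery",
                             "bond with baby", "option for labor", "reduce cost"],
}

for name, items in GROUPS.items():
    pairs = list(combinations(items, 2))       # every unordered pair, once
    assert len(pairs) == comb(len(items), 2)   # n * (n - 1) / 2
    print(f"{name}: {len(items)} items -> {len(pairs)} pairwise comparisons")
```

The count grows quadratically, so the 6-item Delivery set asks for five times as many judgments as the 3-item Harm-to-Mother set.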


With either format, the patients allocated a constant sum of 100 points between the 2 criteria, reflecting their relative importance, using either a graphic-numeric scale that displayed numbers or a Likert scale that contained text but no numeric information. With the graphic-numeric scale, the patient used a sliding bar (Fig. 2) to allocate a total of 100 points between the 2 alternatives to express the ratio scale relations between them. Sliding the bar away from the midpoint (which represents equal distribution of points, 50 points for each criterion) in either direction increased the magnitude of the number. In this example, the patient selected “Avoid side effects” to the mother as a magnitude of 78 compared with 22 (not shown) for “Have a good delivery experience.” This represented the ratio scale value of 78/22 (or 3.55:1), meaning “avoiding side effects” is more than 3 times as important as “having a good delivery experience” for her. The participant only saw the position of the sliding bar (and the associated number) on a scale that increased from 50 to 100 but did not see the underlying ratio.
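The conversion from a constant-sum allocation to a ratio can be sketched in Python as below. The function name is hypothetical, and clamping at 9:1 is an assumption about how the software enforced the maximal ratio the paper reports for both scale formats.

```python
def points_to_ratio(points_for_first: float, total: float = 100.0,
                    max_ratio: float = 9.0) -> float:
    """Ratio first:second implied by a constant-sum allocation, capped at max_ratio."""
    if not 0 < points_for_first < total:
        raise ValueError("allocation must leave points for both criteria")
    ratio = points_for_first / (total - points_for_first)
    # Assumption: the stated 9:1 maximum is applied symmetrically by clamping.
    return max(1.0 / max_ratio, min(max_ratio, ratio))


print(round(points_to_ratio(78), 2))  # 3.55
print(points_to_ratio(50))            # 1.0
```

Under this reading, any slider position of 90 or above maps to the same 9:1 ratio, which is one way heavy use of the scale's extreme end could compress distinct judgments.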

With the text-anchored scale, the patient picked the radio button by the text that best represented the level of importance (Fig. 3). For this scale, the radio buttons had invisible ratios associated with them that ranged from 1:1; 2:1; up to 9:1 in either direction. In Fig. 3, the radio button she selected represented a ratio of 4:1 in favor of “avoiding side effects.” The maximal ratio for either scale format was 9:1. At the end of each set of comparisons for both formats, the women received an on-screen summary of their preference weights (Fig. 4). Finally, the women were presented with the probabilities of the health outcomes for each delivery option and made another series of pairwise comparisons (VBAC vs. repeat cesarean) to determine how well they fulfilled the criteria, again, using the same response format they used previously.

In cases where probabilities were not available, for example, comparisons among delivery experience subcriteria, the women made subjective comparisons about how well the delivery options fulfilled the criterion. At any point while using the decision aid, for example, after learning the probabilities of the health outcomes, the women could reset their priorities (adjust their preference weights). The decision aid recorded all changes. The program used the AHP to combine the final preference weights and provide a suggested delivery strategy to the patient [28,32]. A sample model and computation is shown in Appendix 1 (available on the journal’s website at www.elsevier.com/locate/jclinepi).

Note: The program was not designed to make a final decision for the patient but to help the patient gain deeper insight into her priorities for competing decision factors.

The participants were given the opportunity to print the results to discuss with their providers and/or to take home. A research assistant was available to answer any questions regarding the output and to take note of any comments users had after completing the program. The database in the decision aid recorded scores for all comparisons and any changes in comparisons once feedback was given.

Fig. 4. Sample feedback on childbirth preferences that the patient received after prioritizing the criteria for childbirth. The patient could revise her comparisons to adjust the preference weights (shown by percentages) if she desired.

2.3. Measures

The primary focus of the trial was to identify the response format that helped the women be most consistent in their decision making. Appendix 1 provides the details for computation of the measure of inconsistency introduced by women during the pairwise comparison process. In computing inconsistency due to intransitivity, the average preference weight is found through a series of matrix computations. The decision aid re-computes the weights, the mean of the weights, and their associated standard deviations (SDs) using every possible orientation in which the matrices could be ordered. The SDs are used as the measure of intransitivity introduced by the patient’s comparisons. When the SD surrounding each weight is small, the overall intransitivity level is low (<0.10) and the patient’s responses are relatively transitive, a necessary requirement for computing accurate preferences [29,34]. When the inconsistency is low, there is a clearer delineation between more desired and lesser desired preferences [33]. The decision maker has more confidence in the results of the AHP.
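The idea of recomputing weights under every ordering and using their spread as an intransitivity measure can be illustrated with the Python sketch below. It is one simplified reading of the description above, not the Appendix 1 computation: the chaining rule, the choice to sum the SDs, and all names are assumptions, so its values should not be compared with the 0.10 guideline.

```python
from itertools import permutations
from statistics import pstdev


def weights_for_ordering(ratios, order):
    """Chain the stated ratios between neighbours in `order` into normalized weights."""
    raw = {order[0]: 1.0}
    for prev, cur in zip(order, order[1:]):
        raw[cur] = raw[prev] / ratios[prev][cur]  # ratios[prev][cur] means prev : cur
    total = sum(raw.values())
    return [raw[i] / total for i in range(len(order))]


def intransitivity(ratios):
    """Sum over criteria of the SD of each weight across all orderings (assumed summary)."""
    n = len(ratios)
    per_ordering = [weights_for_ordering(ratios, order)
                    for order in permutations(range(n))]
    return sum(pstdev([w[i] for w in per_ordering]) for i in range(n))


# Transitive judgments: 2:1, 3:1, and the implied 6:1.
consistent = [[1, 2, 6], [1 / 2, 1, 3], [1 / 6, 1 / 3, 1]]
# Intransitive judgments: 2:1 and 3:1, but 3:1 stated where 6:1 is implied.
inconsistent = [[1, 2, 3], [1 / 2, 1, 3], [1 / 3, 1 / 3, 1]]

print(f"{intransitivity(consistent):.4f}")
print(f"{intransitivity(inconsistent):.4f}")
```

When the judgments are transitive, every ordering yields the same weights and the spread is zero up to floating-point error; any disagreement between orderings shows up as a positive SD.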

To measure the effect of the format on the decision process, we adapted questions from one subscale of the Decisional Conflict Scale [30,35,36]. In this study, we focused on reducing 1 aspect of decisional conflict (unclear values related to childbirth). Patients who report having less conflict are more likely to make decisions. When the whole scale is used, decisional conflict results <25 suggest that the decision maker is prepared to make a decision. This validated and reliable tool predicts that users with greater decisional conflict are more likely to delay decisions and are less knowledgeable about the decision [35,37,38]. They may also be more likely to express regret or blame their practitioner for a bad outcome [39,40]. We measured clarity of values before and after the patients used the decision aid.

2.4. Statistical analyses

Using chi-square tests for categorical variables and independent-samples t-tests for continuous variables, we analyzed the patients’ baseline demographics, obstetric history, and insurance status to evaluate whether the randomization effectively delivered similar groups. Independent-samples t-tests were used to determine if the 2 groups differed on levels of intransitivity applied to the 3 judgment sets. Because these measures are correlated, we used a Bonferroni-corrected alpha level of 0.017 (0.05/3) to determine significance. Repeated measures analysis of variance (ANOVA) was used to determine if the clarity about childbirth values measured before and after using the decision aid was the same for the 2 groups. In subsequent analyses, we compared consistent and inconsistent responders using chi-square tests and t-tests for cross-sectional measures and repeated measures ANOVA to examine changes over time.

3. Results

The number of patients screened, eligible, and recruited is depicted in Fig. 5. Of the 105 women who enrolled in the study, 9 women failed to appear for their computer sessions. The remaining 96 women (48 in each response format group) were included in the analysis. The 2 groups were similar in all demographic, health insurance status, and obstetric history variables (Table 1). The majority (64 of 96) of the women participated in the immediate postpartum period (within 8 weeks of their cesarean delivery). The remaining 32 women participated 9 weeks to 18 months after delivery. Seventy-eight percent of the women had only 1 prior cesarean. One-quarter of the women had an emergency cesarean. There were no differences in the childbirth preferences expressed by the 2 groups after using the tool (Table 2). Across both formats, women placed the most weight on avoiding harm to their babies; followed by avoiding side effects to themselves; avoiding risk to future pregnancies; and having a good delivery experience. Five women using the text-anchored scale (10.4%) and 6 women using the graphic-numeric scale (12.5%) made changes to their preference weights after receiving feedback on their priorities or after learning the probabilities of maternal and neonatal outcomes. In instances when the women changed their priorities, the final preference weights were included in the analyses.

In evaluating which format produced the most inconsistent results, we found that women were more intransitive with the text-anchored format than the graphic-numeric format for comparisons among the Major Criteria (text-anchored intransitive mean = 0.09 vs. graphic-numeric mean = 0.05; mean difference = 0.04, 95% CI = 0.01–0.06, t = 2.473, P = 0.015) and among the Harm-to-Mother subcriteria (text-anchored = 0.06 vs. graphic-numeric = 0.03, mean difference = 0.03, 95% CI = 0.02–0.04, t = 4.483, P < 0.001; Fig. 6). There were no differences between the 2 groups for intransitivity in comparisons among the Delivery subcriteria (text-anchored = 0.11 vs. graphic-numeric = 0.10; mean difference = 0.01, 95% CI = −0.02 to 0.03, t = 0.756, P = 0.452), which hovered at the inconsistency guideline of <0.10. This particular set of comparisons involved 6 items (and 15 pairwise comparisons), the largest of the 3 sets of comparisons. As more comparisons are involved, inconsistency due to intransitivity increases, simply because of the increased opportunity.
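Applying the Bonferroni-corrected threshold from Section 2.4 (0.05/3) to the three P values reported above can be written as a short Python check. The variable names are illustrative, and "P < 0.001" is entered as 0.001 as a conservative stand-in.

```python
ALPHA = 0.05

# P values as reported for the three judgment sets; "<0.001" is entered as 0.001.
REPORTED_P = {
    "Major Criteria": 0.015,
    "Harm-to-Mother subcriteria": 0.001,
    "Delivery subcriteria": 0.452,
}

corrected_alpha = ALPHA / len(REPORTED_P)
print(round(corrected_alpha, 3))  # 0.017

for name, p in REPORTED_P.items():
    verdict = "significant" if p < corrected_alpha else "not significant"
    print(f"{name}: P = {p} -> {verdict}")
```

The Major Criteria result (P = 0.015) clears the corrected threshold only narrowly, whereas it would pass an uncorrected 0.05 level comfortably.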

Our repeated-measures ANOVA revealed that both groups of patients improved the decision process by significantly reducing unclear childbirth values after using the decision

[Flow diagram]

Recruitment through the postnatal unit at OHSU, n = 482.

Excluded, n = 236: bilateral tubal or hysterectomy, n = 56; outside Portland, n = 5; under 18 yrs, n = 1; correctional inmate, n = 2; planned adoption, n = 2; Spanish speaking, n = 131; other language, n = 21; no record of prior cesarean, n = 10; other, n = 8.

Eligible for trial (by postnatal medical chart), n = 246.

Number not recruited, n = 135: notice of "no visitors", n = 9; patient busy with nurse, n = 18; patient sleeping, n = 2; patient refused study, n = 9; patient discharged before recruitment, n = 97.

Invited to trial from postnatal unit, n = 111. Declined, n = 41.

Additional recruitment, n = 35: health expos (n = 23) or responded to radio, newspaper announcements, and clinic flyers (n = 12).

Enrolled in trial and scheduled for computer session, n = 105. Didn't appear for computer session, n = 9.

Number randomized to trial, n = 96: graphic numerical format, n = 48; verbally anchored format, n = 48.

Fig. 5. Recruitment and enrollment of participants.

K.B. Eden et al. / Journal of Clinical Epidemiology 62 (2009) 415-424

aid (F = 23.74, df = 1, P < 0.001), and the degree of change did not differ between the groups (F = 0.436, df = 1, P = 0.510) (text-anchored mean = 29.86 at baseline; 21.01 after using the decision aid; graphic-numeric mean = 33.85 at baseline; 22.22 after using the decision aid).

When we considered only sets of comparisons that included trade-offs of risk to mother or baby (among the Major Criteria and among the Harm-to-Mother subcriteria), 31.3% (15 of 48) of women assigned to the text-anchored format displayed inconsistency outside the guideline for the methodology (>0.10) compared with 16.7% (8 of 48) of women assigned to the graphic-numeric format (chi-square = 2.801, df = 1, P = 0.094). In total, 23 of the 96 women exceeded the inconsistency guideline for at least 1 set of comparisons involving risk (Major Criteria or Harm-to-Mother subcriteria).

To better understand differences between the consistent women who stayed within the 0.10 guideline (n = 72) and inconsistent women who exceeded the 0.10 guideline (n = 23), we examined their use of the extreme ends of the scales in comparisons involving risk. We found that 56.5% (13 of 23) of inconsistent women used only extreme ends of the scales (<25 or >75) on every comparison compared with 12.5% (9 of 72) of consistent responders (chi-square = 19.0, df = 1, P < 0.001). Regardless of inconsistency level, this behavior was more prevalent (38.3%, 18 of 47) among women assigned to the text-anchored format than among those assigned to the graphic-numeric format (8.3%, 4 of 48) (chi-square = 12, df = 1, P = 0.001). Because previous researchers have reported that uncertain responders may overuse middle values of scales [41], we also screened for overuse of middle quartile values (25-75) and found that only 1 of the 96 respondents kept all evaluations of risk in this range.
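The chi-square statistics in the two paragraphs above can be recomputed from the reported counts. The sketch below assumes an uncorrected Pearson chi-square on a 2 x 2 table; the paper does not state whether a continuity correction was applied, and the function and variable names are illustrative. Under that assumption the counts give statistics of about 2.80, 18.98, and 11.98, in line with the reported 2.801, 19.0, and 12.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 df, the upper-tail probability equals erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))


# Each table: rows are the two groups, columns are (yes, no) counts.
# Outside the 0.10 guideline: text-anchored 15 of 48, graphic-numeric 8 of 48.
outside_guideline = chi_square_2x2(15, 33, 8, 40)
# Extreme ends only: inconsistent 13 of 23, consistent 9 of 72.
extremes_by_consistency = chi_square_2x2(13, 10, 9, 63)
# Extreme ends only: text-anchored 18 of 47, graphic-numeric 4 of 48.
extremes_by_format = chi_square_2x2(18, 29, 4, 44)

for label, (stat, p) in [
    ("outside guideline, by format", outside_guideline),
    ("extreme ends, by consistency", extremes_by_consistency),
    ("extreme ends, by format", extremes_by_format),
]:
    print(f"{label}: chi-square = {stat:.2f}, P = {p:.4f}")
```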

4. Discussion

We found that patients using the graphic-numeric format for judging the relative importance of criteria were more consistent in prioritizing risk than those using the text-anchored format. This result may reflect the difficulty people

Table 1

Characteristics of the 2 randomized groups

| Characteristic | Graphic-numeric | Text-anchored | P-value |
|---|---|---|---|
| Mean age (yr) | 28.3 | 29.6 | 0.307 (a) |
| Race: White | 43/48 (90%) | 38/47 (81%) | 0.067 (b) |
| Race: African American | 5/48 (10%) | 4/47 (9%) | |
| Race: Other | 0/48 (0%) | 5/47 (11%) | |
| Ethnicity: Hispanic | 5/48 (10%) | 3/48 (6%) | 0.460 (b) |
| Has insurance that covers next delivery | 35/48 (73%) | 28/48 (58%) | 0.133 (b) |
| Time since last cesarean: up to 8 weeks | 30/48 (63%) | 34/48 (71%) | 0.386 (b) |
| Time since last cesarean: nine weeks up to 18 months | 18/48 (38%) | 14/48 (29%) | |
| Obstetric history: prior vaginal delivery | 6/46 (13%) | 7/48 (15%) | 0.829 (b) |
| Obstetric history: only one prior cesarean delivery | 37/48 (77%) | 38/48 (79%) | 0.805 (b) |
| Obstetric history: emergency cesarean | 14/48 (29%) | 10/48 (21%) | 0.346 (b) |

(a) t-tests for independent samples. (b) Chi-square tests.

[Bar chart: mean inconsistency (y-axis, 0 to 0.12) for the Major Criteria*, Harm-to-Mother**, and Delivery comparison sets, shown separately for the graphic-numeric and text-anchored formats. Statistical difference calculated with t-tests for independent samples, *p = 0.015, **p < 0.001.]

Fig. 6. For pairwise comparisons involving risk to either mother, baby, or future pregnancies ("Major Criteria") or to the mother alone ("Harm-to-Mother") the patients were less inconsistent using a graphic-numeric format than a text-anchored format. When the patients made pairwise comparisons among criteria related to delivery and recovery ("Delivery"), there was no difference in inconsistency among the formats.


have in transferring words to numbers. For example, the comparison term "moderately more important" translates to "3 times more important" when we convert the ratio scale for the AHP computations. The conversion to a numeric scale may overestimate the women's intended difference between the criteria [33]. By contrast, the women may

Table 2

Childbirth priorities (preference weights)

| | Graphic-numeric, mean (SD) | Text-anchored, mean (SD) | P-value |
|---|---|---|---|
| Major criteria | | | |
| Avoid harm to baby | 0.64 (0.13) | 0.65 (0.11) | 0.614 |
| Avoid harm to mother | 0.15 (0.08) | 0.14 (0.06) | 0.645 |
| Avoid risk to future pregnancy | 0.10 (0.06) | 0.11 (0.08) | 0.473 |
| Have good delivery experience | 0.11 (0.08) | 0.10 (0.08) | 0.304 |
| Inconsistency | 0.05 (0.06) | 0.09 (0.07) | 0.015 |
| Harm-to-mother subcriteria | | | |
| Avoid hysterectomy | 0.62 (0.19) | 0.68 (0.19) | 0.125 |
| Avoid incontinence | 0.24 (0.15) | 0.21 (0.14) | 0.284 |
| Avoid numbness or pain | 0.14 (0.14) | 0.11 (0.13) | 0.317 |
| Inconsistency | 0.03 (0.04) | 0.06 (0.03) | 0.000 |
| Delivery subcriteria | | | |
| Bond with baby | 0.32 (0.12) | 0.34 (0.09) | 0.328 |
| Have smooth recovery | 0.21 (0.11) | 0.20 (0.11) | 0.758 |
| Involve partner | 0.20 (0.09) | 0.22 (0.11) | 0.393 |
| Desire to labor | 0.15 (0.10) | 0.12 (0.10) | 0.210 |
| Convenience | 0.06 (0.05) | 0.06 (0.05) | 0.815 |
| Reduce cost | 0.06 (0.03) | 0.06 (0.06) | 0.664 |
| Inconsistency | 0.10 (0.7) | 0.11 (0.05) | 0.452 |

have been more careful in their comparisons using the numeric scale as the inclusion of numbers suggested some level of magnitude and precision. Users of the text-anchored format in this study tended to use extreme ends of the scales, making the inconsistencies large because of the magnitude of the comparisons. Dolan and Kind [42] reported the same behavior with a subset of inconsistent respondents in their study of patients, disabled adults, and community members evaluating EuroQol states using a visual analog scale. Inconsistent respondents tended to score health states around the endpoints of the scale (0 or 100).

Next, we evaluated how intransitivity affected the accuracy of the preference weights computed using AHP. In Appendix 1, we provide computations of preference weights for 1 hypothetical highly consistent decision maker and 1 hypothetical highly inconsistent decision maker. The end result is a small variation in the computed preference weights for the consistent responder and a large variation in the computed preference weights for the inconsistent decision maker that resulted in a misleading recommendation.

Because of the ethical nature of the comparisons in this study (e.g., consideration of risk to baby vs. concern about risk to future pregnancies), some women may have felt more comfortable choosing less extreme values on the graphic-numeric scale with minimal labels compared with the text-labeled format. As one user commented on the graphic-numeric scale, "I liked the program because it allowed me to show how I was leaning." Women using this scale may have felt that it was "value-free" and less judged by it. By contrast, women using the text-labeled format may have felt compelled to choose the label indicating that


protection from harm was "extremely more important," even though they may have privately considered the compared item (e.g., smooth recovery) as somewhat important, too. However, they may have been hesitant to indicate this with the text labels. These women may have felt "more judged" by this scale.

Another explanation for the increased inconsistency with the text-anchored format could be that the women simplified their process. Previous researchers have found that most people avoid reasoning using verbatim traces of memory and instead rely on the "gist" they extracted from the data [43,44]. In our study, women using the sliding bar may have more easily recalled how they were "leaning" in prior comparisons using the graphic-numeric scale (e.g., avoiding harm to my baby is more important than having my desired birth experience) than women using the text-anchored scale (with terms, such as "moderately more important," "much more important," and so forth). By selecting the extreme ends of scales, the women assigned to the text-anchored scales may have then relied on the gist of the radio button placement (extreme left or right side to indicate which factor was most important).

Other researchers have demonstrated that when children complete tasks that involve the transitivity rule, they are reasonably good with comparing items of easily measured attributes (comparison of 3 objects of length A, B, and C) but may use simpler heuristics when the items are less certain or measurable (comparisons of friendships among children A, B, and C) [45]. Perhaps this behavior is also true of a subset of adults in comparing health outcomes and experiences of less certain nature?

It may also be that the inconsistent women had already decided the route of delivery and felt no reason to engage in an activity to make trade-offs between the decisional factors. Other researchers have reported similar simplifying decision-making behavior, particularly if respondents base their choice solely on one decisional criterion [46-48]. In this study, for many responders, avoidance of harm to the baby was several times more important than all other decisional factors and thus simplified the trade-off process. Some users may have used the "priority heuristic": avoid the worst outcome (harming the baby) and pay less attention to other factors [48].

Previous researchers have suggested that users should receive feedback about their level of inconsistency and be given an option to revise responses [27]. Ideally, this feedback will help users understand the interrelationships of the comparisons. Another approach is for the users to rank-order the criteria before making any pairwise comparisons. These approaches would reduce unintentional measurement mistakes. They would not aid the decision maker who has intentionally simplified the decision, but they might prompt her to re-engage in the decision-making process. Designers of future decision aids may want to include a triage point for users who are inconsistent and/or make extreme trade-offs. Rather than providing preference weight feedback, the decision aid might instead prompt them to learn more about factors they seem to be ignoring with their current strategy.
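One way such a triage point might be sketched is shown below. This is a hypothetical illustration, not the logic of the decision aid used in this study: the class, function, and return labels are invented for the example, and the only values taken from the paper are the 0.10 inconsistency guideline and the extreme-end cutoffs (<25 or >75) used in the Results.

```python
from dataclasses import dataclass
from typing import List

INCONSISTENCY_GUIDELINE = 0.10       # guideline used in this study
LOW_EXTREME, HIGH_EXTREME = 25, 75   # extreme-end cutoffs from the Results


@dataclass
class ComparisonSet:
    """One set of pairwise comparisons made by a user (hypothetical structure)."""
    inconsistency: float     # inconsistency level computed for the set
    scale_values: List[int]  # 0-100 position chosen on each comparison


def triage(cs: ComparisonSet) -> str:
    """Decide which kind of feedback to show next (labels are illustrative)."""
    exceeds_guideline = cs.inconsistency > INCONSISTENCY_GUIDELINE
    # Flag users who chose an extreme end of the scale on every comparison.
    only_extremes = bool(cs.scale_values) and all(
        v < LOW_EXTREME or v > HIGH_EXTREME for v in cs.scale_values
    )
    if exceeds_guideline or only_extremes:
        return "prompt_to_review_factors"
    return "show_preference_weights"


consistent_user = ComparisonSet(inconsistency=0.04, scale_values=[80, 40, 65])
extreme_user = ComparisonSet(inconsistency=0.17, scale_values=[95, 5, 90])
print(triage(consistent_user), triage(extreme_user))
```

Keeping the two checks separate would let a designer route inconsistent users and extreme-end users to different prompts if that proved useful.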

Another approach might be to relay the inconsistency information to the patient's provider or a decision coach as an alert to probe deeper into the decision process with the patient. Perhaps the patient has undue worry about a risk or has narrowly focused on 1 specific criterion at the expense of not considering other important factors. This insight may empower the providers to know what topics to focus on during the decision-making discussions.

5. Conclusions

Although both formats improved the decision process (clarified values), patients were more consistent in making trade-offs involving risk when the format was graphic-numeric rather than text-anchored. This insight aids in the design of future computerized systems to measure patient priorities and may aid in provider-patient communication. Our finding may prove particularly important in ethically packed medical decisions when patients may be hesitant to reveal the importance of avoiding risk to self or others out of fear of being judged.

Acknowledgments

This study was supported by grants from the OHSU Foundation; NIH K12 grant: Building Interdisciplinary Research Careers in Women's Health, 5K12HD043488-04; and the Agency for Healthcare Research and Quality, 1R03HS013959-01A1, 1R01HS15321-01, and 1K08HS11338-01. The authors also thank Deborah Rosenberg, BA, Jennifer Williams, MPA, and Megan Palinsky, MS, who assisted in developing the decision aid software, recruiting the participants and meeting with the women during computer sessions. Finally, the authors thank Annette O'Connor, MScN, PhD, FCAHS, for her assistance in interpreting the results, Robert Hamm, PhD, for his insightful comments on the psychology of inconsistency, John A. Beekman, PhD, for advice on the mathematics of the inconsistency measurement, and Jane Kyhl Beekman, BA and Kathryn Pyle Krages, AMLS, MA, for their editing expertise.

References

[1] Institute of Medicine, editor. Envisioning the national health care quality report. Washington, DC: National Academy Press; 2000.

[2] Institute of Medicine, editor. Crossing the quality chasm: a new health system for the 21st century. Washington, DC: National Academy Press; 2001.

[3] Martin JA, Hamilton BE, Sutton PD, Ventura SJ, Menacker F, Kirmeyer S. Births: final data for 2004. Natl Vital Stat Rep 2006;55(1).


[4] Guise J-M, Berlin M, McDonagh M, Osteweil P, Nygren P, Helfand M. Safety of vaginal birth after cesarean: a systematic review. Obstet Gynec 2004;103(3):420-9.

[5] Guise J-M, McDonagh M, Hashima J, Kraemer D, Eden KB, Berlin M, et al. Vaginal birth after cesarean section (VBAC). Evidence report/technology assessment no. 71. Rockville, MD: Agency for Healthcare Research and Quality; 2003.

[6] Guise J-M, McDonagh M, Osteweil P, Nygren P, Chan B, Helfand M. Systematic review of the incidence and consequences of uterine rupture in women with previous cesarean section. Br Med J 2004;329(7456):19-25.

[7] Mankuta DD, Leshno MM, Menasche MM, Brezis MM. Vaginal birth after cesarean section: trial of labor or repeat cesarean section? A decision analysis. Am J Obstet Gynecol 2003;189:714-9.

[8] Eden KB, Hashima JN, Osterweil P, Nygren P, Guise JM. Childbirth preferences after cesarean birth: a review of the evidence. Birth 2004;31:50-62.

[9] Fraser W, Maunsell E, Hodnett E, Moutquin JM. Randomized controlled trial of a prenatal vaginal birth after cesarean section education and support program. Childbirth alternatives post-cesarean study group. Am J Obstet Gynecol 1997;176(2):419-25.

[10] Kirk EP, Doyle KA, Leigh J, Garrard ML. Vaginal birth after cesarean or repeat cesarean section: medical risks or social realities? Am J Obstet Gynecol 1990;162(6):1398-403. discussion 1403-5.

[11] Kline CR, Martin DP, Deyo RA. Health consequences of pregnancy and childbirth as perceived by women and clinicians. Obstet & Gynecol 1998;92(5):842-8.

[12] Martin JN, Harris BA, Huddleston JF, Morrison JC, Propst MG, Wiser WL, et al. Vaginal delivery following previous cesarean birth. Am J Obstet Gynecol 1983;146(3):255-63.

[13] McClain CS. The making of a medical tradition: vaginal birth after cesarean. Social Sci Med 1990;31(2):203-10.

[14] Meier PR, Porreco RP. Trial of labor following cesarean section: a two-year experience. Am J Obstet Gynecol 1982;144(6):671-8.

[15] Melnikow J, Romano P, Gilbert WM, Schembri M, Keyzer J, Kravitz RL. Vaginal birth after cesarean in California. Obstet Gynecol 2001;98(3):421-6.

[16] Fawcett J, Tulman L, Spedden J. Responses to vaginal birth after cesarean section. J Obstet Gynecol Neonatal Nurs 1994;23(3):253-9.

[17] Gamble JA, Creedy DK. Women's preference for a cesarean section: incidence and associated factors. Birth 2001;28(2):101-10.

[18] Lau TK, Wong SH, Li CY. A study of patient's acceptance towards vaginal birth after cesarean section. Austr NZ J Obstet Gynaecol 1996;36(2):155-8.

[19] Murphy MC, Harvey SM. Choice of a childbirth method after cesarean. Women Health 1989;15(2):67-85.

[20] Gamble JA, Creedy DK. Women's request for a cesarean section: a critique of the literature. Birth 2000;27(4):256-63.

[21] Shorten A, Shorten B, Keogh J, West S, Morris J. Making choices for childbirth: a randomized controlled trial of a decision-aid for informed birth after cesarean. Birth 2005;32(4):252-61.

[22] Murphy D, Liebling R. Cohort study of maternal views on future mode of delivery after operative delivery in the second stage of labor. Am J Obstet Gynecol 2003;188:542-8.

[23] Montgomery A, Emmett C, Fahey T, Jones C, Ricketts I, Patel R, et al. Two decision aids for mode of delivery among women with previous caesarean section: randomised controlled trial. Br Med J 2007;(June 23):334.

[24] O'Connor AM, Stacey D, Entwistle V, Llewellyn-Thomas H, Rovner D, Holmes-Rovner M, et al. In: Decision aids for people facing health treatment or screening decisions (Cochrane review), vol. 4. Chichester, UK: John Wiley & Sons, Ltd; 2003.

[25] O'Connor AM, Bennett C, Stacey D, Barry MJ, Col NF, Eden KB, et al. Do patient decision aids meet effectiveness criteria of the international patient decision aid standards collaboration? A systematic review and meta-analysis. Med Decis Making 2007;27(5):554-74.

[26] Lamers L, Stalmeier P, Krabbe P, Busschbach J. Inconsistencies in TTO and VAS values for EQ-5D health states. Med Decis Making 2006;26(Mar-Apr):173-81.

[27] Badia X, Roset M, Herdman M. Inconsistent responses in three preference-elicitation methods for health states. Social Sci Med 1999;943-50.

[28] Kocaoglu D. A participative approach to program evaluation. IEEE Trans Eng Manage 1983;30(3).

[29] Forman E, Gass S. The analytic hierarchy process: an exposition. Oper Res 2005;49(4):469-86.

[30] O'Connor AM. Decisional conflict scale, © 1993. 4th edition. Ottawa: Loeb Health Research Institute; 1999.

[31] Cleland DI, Kocaoglu D. Engineering management. McGraw-Hill; 1981.

[32] Dolan JG, Frisina S. Randomized controlled trial of a patient decision aid for colorectal cancer screening. Med Decis Making 2002;22:125-39.

[33] Huizingh E, Vrolijk H. A comparison of verbal and numerical judgments in the analytic hierarchy process. Organ Behav Hum Decis Process 1997;70(3):237-47.

[34] Saaty RW. The analytic hierarchy process-what it is and how it is used? Math Model 1987;9(3-5):161-76.

[35] O'Connor AM. Validation of a decisional conflict scale. Med Decis Making 1995;15:25-30.

[36] O'Connor AM. Decision conflict scale, © 1993 (updated 2006). Ottawa Health Research Institute (University of Ottawa); 2006.

[37] Drake E, Engler-Todd L, O'Connor AM, Surh LC, Hunter A. Development and evaluation of a decision aid for prenatal testing for women of advanced maternal age. J Genet Couns 1999;8(4):217-33.

[38] Comeau C. Evaluation of a decision aid for family members considering long term care options for their relative with dementia. Ottawa: University of Ottawa; 2001.

[39] Sun Q. Predicting downstream effects of high decisional conflict: meta-analysis of the decisional conflict scale. Ottawa: University of Ottawa; 2004.

[40] Gattellari M, Ward J. Men's reactions to disclosed and undisclosed opportunistic PSA screening for prostate cancer. Med J Aust 2005;182(8):386-9.

[41] Bruin de Bruin W, Fischbeck P, Stiber N, Fischhoff B. What number is "Fifty-Fifty"?: Redistributing excessive 50% responses in elicited probabilities. Risk Anal 2002;22(4):713-23.

[42] Dolan P, Kind P. Inconsistency and health state valuations. Social Sci Med 1996;42(2):609-15.

[43] Howe M, Brainard C, Reyna V. Development of long-term retention. New York: Springer-Verlag; 1992.

[44] Reyna V. How people make decisions that involve risk: a dual-processes approach. Curr Dir Psychol Sci 2004;13(2):60-6.

[45] Markovits H, Dumas C. Developmental patterns in understanding social and physical transitivity. J Exp Child Psychol 1999;73:95-114.

[46] Salkeld G, Solomon M, Butow P, Short L. Discrete-choice experiment to measure patient preferences for the surgical management of colorectal cancer. Br J Surg 2005;92:742-7.

[47] Cairns J, van der Pol M, Lloyd A. Decision making heuristics and the elicitation of preferences: being fast and frugal about the future. Health Econ 2002;11:655-8.

[48] Brandstatter E, Girgerenzer G, Hartwig R. The priority heuristic: making choices without trade-offs. Psychol Rev 2006;113(2):409-32.

Table A2

Matrix 2 of orientation A-B-C

| A | B | C |
|---|---|---|
| h = 1 | i = 2 | j = 9 |
| k = 1/2 | l = 1 | m = 3 |
| n = 1/9 | o = 1/3 | p = 1 |


Appendix 1

Sample computations of the inconsistency (due to intransitivity) measure using the constant-sum method

The specific method of analytic hierarchy process (AHP) used by the decision aid in this study is the constant-sum method [1], which differs slightly from other forms of AHP that are used in some commercialized software, for example, Expert Choice [2]. This method was chosen to give the participants more precision in measuring preference weights and to allow us to create a customized interface. With this method, participants can allocate importance using a more precise scale than the standard ratio scale. To illustrate the process to arrive at measured inconsistency, consider the simple decision model shown below (Fig. A1).

[Hierarchy diagram. Goal: Make Best Decision for Me. Major criteria: Criterion A, Criterion B, Criterion C. Options: Option 1, Option 2.]

Fig. A1. Sample decision model.

The decision maker first determines the priority of the criteria to the decision, using a series of pairwise comparisons for all possible comparisons, that is, Criterion A vs. Criterion B, Criterion B vs. Criterion C, Criterion C vs. Criterion A. The decision maker allocates a total of 100 points between 2 criteria based on relative importance to the decision. Because previous work with AHP has revealed that ratios greater than 9:1 create large variation in the preference weights, we limited our constant-sum values to be ≥10 and ≤90 [3].
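A constant-sum allocation converts to an importance ratio by simple division. The sketch below is illustrative only: the function name is hypothetical, and it assumes the 10 and 90 limits are inclusive, because the worked example in this appendix uses a 90 vs. 10 split (a 9:1 ratio).

```python
def constant_sum_ratio(points_first: int, points_second: int) -> float:
    """Return how many times more important the second criterion is than the first.

    The two allocations must total 100 points, and each must lie between
    10 and 90 (assumed inclusive), which caps the ratio at 9:1.
    """
    if points_first + points_second != 100:
        raise ValueError("the two allocations must sum to 100 points")
    if not 10 <= points_first <= 90:
        raise ValueError("each allocation must be between 10 and 90 points")
    return points_second / points_first


print(constant_sum_ratio(10, 90))            # Criterion A vs. Criterion C
print(constant_sum_ratio(25, 75))            # Criterion C vs. Criterion B
print(round(constant_sum_ratio(33, 67), 2))  # Criterion A vs. Criterion B
```

Note that 67/33 is about 2.03; the worked example treats this ratio as 2.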

Example 1: hypothetical highly consistent (transitive) respondent

Let's assume that the user of the software considers Criterion C to be extremely important, Criterion B to be moderately important, and Criterion A not very important to the decision. If she is reasonably consistent with these underlying feelings, the results of her pairwise comparisons might show the following allocation of points:

Criterion C (90 points) vs. Criterion A (10 points)

Criterion A (33 points) vs. Criterion B (67 points)

Criterion B (25 points) vs. Criterion C (75 points)

These points are then placed in a matrix (see Table A1, Matrix 1) for 1 orientation (or order) of the criteria, in this case the columns and rows are ordered in the matrix as A-B-C.

Table A1

Matrix 1 of orientation A-B-C

| | Criterion A | Criterion B | Criterion C |
|---|---|---|---|
| Criterion A | X | 67 | 90 |
| Criterion B | 33 | X | 75 |
| Criterion C | 10 | 25 | X |

Next, we compute the ratios of the comparisons for each pair and create a new matrix. For example, the respondent indicated that Criterion C is worth 90 compared with Criterion A, which was scored as 10, suggesting that Criterion C is 9 times more important in the decision than Criterion A. In matrix 2, this ratio is computed as j = 90/10 = 9 (Table A2).

Next, we evaluate the relationship of the ratios to each other. The ratios of B compared with C are computed as i/j = 2/9, l/m = 1/3, o/p = 1/3, and similarly the ratios of A/B are computed (Table A3).

Table A3

Matrix 3 of orientation A-B-C

| | A/B | B/C |
|---|---|---|
| | 1/2 | 2/9 |
| | 1/2 | 1/3 |
| | 1/3 | 1/3 |
| Mean | 0.44444 | 0.2963 |

Now we solve for A, B, and C. Assign C = 1.0 and solve for B and A.

$$B = C \times (\text{mean of } B/C) = 1.0 \times 0.2963 = 0.2963$$

$$A = B \times (\text{mean of } A/B) = 0.2963 \times 0.4444 = 0.1317$$

We then sum C, B, and A:

$$C + B + A = 1.0 + 0.2963 + 0.1317 = 1.428$$

We normalize to create preference weights ranging from 0 to 1.0.

$$\text{Weight } C = C/(A + B + C) = 1/1.428 = 0.7003$$

$$\text{Weight } B = B/(A + B + C) = 0.2963/1.428 = 0.2075$$

$$\text{Weight } A = A/(A + B + C) = 0.1317/1.428 = 0.0922$$

If the respondent is perfectly consistent, all rows in matrix 3 in each column would be identical. In this example, the numbers are similar but not identical, suggesting that the respondent introduced some minor inconsistency in the magnitude and/or direction of the comparisons. This first series of matrices provides the weights for the first orientation (ordered A, B, C) only. To determine the inconsistency level, we compute the weights for all possible orientations of the comparisons and compute an averaged deviation. For each orientation, we reset the starting matrix with the new orientation, for example, rows and columns ordered A-C-B (Table A4).

Table A4

Matrix 1 of orientation A-C-B

| | Criterion A | Criterion C | Criterion B |
|---|---|---|---|
| Criterion A | X | 90 | 67 |
| Criterion C | 10 | X | 25 |
| Criterion B | 33 | 75 | X |

With this new starting matrix, we work through matrices 2 and 3 and compute the weights. Once computations for all orientations are performed, we create a summary table (Table A5) of the computed weights.
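The steps for a single orientation (ratio matrix, ratios of adjacent columns, back-substitution, normalization) can be sketched as follows. This is a reading of the worked example rather than the decision aid's actual code: the function names are hypothetical, and the input uses the rounded ratios of Table A2 (67/33 treated as 2), which is an assumption made so that the output can be compared with the published figures.

```python
# Rounded ratios from Table A2: RATIOS[(x, y)] is how many times more
# important criterion y is than criterion x (e.g. C is 9 times A).
RATIOS = {("A", "B"): 2.0, ("A", "C"): 9.0, ("B", "C"): 3.0}


def ratio(row, col):
    """Matrix 2 entry: importance of `col` relative to `row`."""
    if row == col:
        return 1.0
    if (row, col) in RATIOS:
        return RATIOS[(row, col)]
    return 1.0 / RATIOS[(col, row)]


def weights_for_orientation(order):
    """Normalized preference weights for one ordering of the criteria."""
    # Matrix 3: mean ratio between each pair of adjacent columns.
    mean_ratios = []
    for left, right in zip(order, order[1:]):
        per_row = [ratio(row, left) / ratio(row, right) for row in order]
        mean_ratios.append(sum(per_row) / len(per_row))
    # Set the last criterion to 1.0, then work backwards through the means.
    values = [1.0]
    for m in reversed(mean_ratios):
        values.insert(0, values[0] * m)
    total = sum(values)
    return {name: value / total for name, value in zip(order, values)}


abc = weights_for_orientation("ABC")  # first orientation in the text
acb = weights_for_orientation("ACB")  # orientation of Table A4
print({k: round(v, 4) for k, v in abc.items()})
print({k: round(v, 4) for k, v in acb.items()})
```

Under these assumptions, the A-B-C call returns weights that agree to four decimals with those derived in the text (0.0922, 0.2075, 0.7003); passing the other orderings of "ABC" gives one set of weights per orientation.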

Table A5

Summary of preference weights for consistent responder

| Orientation | Weight A | Weight B | Weight C |
|---|---|---|---|
| A-B-C (a) | 0.0922 | 0.2075 | 0.7003 |
| A-C-B | 0.0916 | 0.2019 | 0.7065 |
| B-A-C | 0.0905 | 0.2112 | 0.6983 |
| B-C-A | 0.0880 | 0.2084 | 0.7036 |
| C-A-B | 0.0889 | 0.2001 | 0.7110 |
| C-B-A | 0.0870 | 0.2029 | 0.7101 |
| Mean | 0.0897 | 0.2053 | 0.7050 |
| SD | 0.0021 | 0.0043 | 0.0052 |

(a) Computed above.

The overall inconsistency level is determined using the standard deviations (SDs) for each weight (where SD_A = 0.0021; SD_B = 0.0043; SD_C = 0.0052; and n = 3) in the following formula:

$$\text{Inconsistency level} = \sqrt{\frac{1}{n}\left(SD_A^2 + SD_B^2 + SD_C^2\right)} = 0.0039 \text{ in this example.}$$

Observe that the preference weights, regardless of orientation, are approximately the same in each column of Table A5 and the inconsistency level for this respondent is very low (much less than the guideline of 0.10). The respondent then went on to make the final set of comparisons indicating how well each option achieves each criterion:

Criterion A (Option 1: 71 points vs. Option 2: 29 points)

Criterion B (Option 1: 65 points vs. Option 2: 35 points)

Criterion C (Option 1: 32 points vs. Option 2: 68 points)

Using the distributive mode [4] to combine the preference weights with the points assigned to each option for each criterion, we generate the following summary scores:

$$\text{Option 1: } 71 \times (0.0897) + 65 \times (0.2053) + 32 \times (0.7050) = 42.3 \text{ points}$$

$$\text{Option 2: } 29 \times (0.0897) + 35 \times (0.2053) + 68 \times (0.7050) = 57.7 \text{ points}$$

Option 2 is recommended based on the patient's preference weights and comparison of options.
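The inconsistency formula and the summary scores for this example can be checked with a few lines of arithmetic. The sketch below takes the Mean and SD rows of Table A5 as given inputs; the function names are illustrative. Because the SDs are entered as rounded in the table, the computed level comes out at roughly 0.004, close to but not exactly the reported 0.0039.

```python
from math import sqrt

# Mean and SD rows of Table A5 (consistent responder).
MEAN_WEIGHTS = {"A": 0.0897, "B": 0.2053, "C": 0.7050}
SDS = {"A": 0.0021, "B": 0.0043, "C": 0.0052}


def inconsistency_level(sds):
    """Square root of the mean squared SD across the n criteria."""
    return sqrt(sum(sd ** 2 for sd in sds.values()) / len(sds))


def option_score(points, weights):
    """Weighted sum of the points an option received on each criterion."""
    return sum(points[c] * weights[c] for c in weights)


level = inconsistency_level(SDS)
option_1 = option_score({"A": 71, "B": 65, "C": 32}, MEAN_WEIGHTS)
option_2 = option_score({"A": 29, "B": 35, "C": 68}, MEAN_WEIGHTS)
print(f"inconsistency = {level:.4f}")
print(f"Option 1 = {option_1:.1f} points, Option 2 = {option_2:.1f} points")
```

Because the two options' points sum to 100 on every criterion and the mean weights sum to 1, the two scores also sum to 100.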

Example 2: hypothetical highly inconsistent (intransitive) respondent

Next, assume that the respondent has made a mistake in expressing her preferences on one comparison. She switched the numbers on the last comparison (originally: Criterion B = 25 points vs. Criterion C = 75 points). The pairwise comparisons led to the following:

Criterion C (90 points) vs. Criterion A (10 points)

Criterion A (33 points) vs. Criterion B (67 points)

Criterion B (75 points) vs. Criterion C (25 points)

To complete this computation, we would create the same 3 matrices that were created in the first example and then repeat the process for all 6 orientations. The summary of the computations for each orientation is shown in Table A6, and the resulting inconsistency level in the following equation.

Table A6

Summary of preference weights for inconsistent responder

| Orientation | Weight A | Weight B | Weight C |
|---|---|---|---|
| A-B-C | 0.1891 | 0.5471 | 0.2638 |
| A-C-B | 0.2664 | 0.2695 | 0.4641 |
| B-A-C | 0.0765 | 0.7903 | 0.1332 |
| B-C-A | 0.0497 | 0.6412 | 0.3091 |
| C-A-B | 0.0989 | 0.2860 | 0.6151 |
| C-B-A | 0.0343 | 0.3547 | 0.6109 |
| Mean | 0.1192 | 0.4815 | 0.3994 |
| SD | 0.0904 | 0.2119 | 0.1964 |

$$\text{Inconsistency level} = \sqrt{\frac{1}{n}\left(SD_A^2 + SD_B^2 + SD_C^2\right)} = 0.175 \text{ in this example.}$$

Notice how the orientation greatly affects the magnitude of the weights in Table A6. Looking only at the preferences within the column labeled "Weight A," the weights are larger when the orientation begins with "A" than for orientations beginning with "B" or "C" (see shaded cells in column "Weight A" of Table A6). The weights in the "Weight B" column are largest when the orientation begins with "B." The same pattern holds true for computed weights of C. The weights for all preferences have extreme variation related to the orientation.

Now, assume that the respondent makes the same comparisons for the options achieving the criterion as were previously presented with the highly consistent respondent.

Criterion A (Option 1: 71 points vs. Option 2: 29 points)

Criterion B (Option 1: 65 points vs. Option 2: 35 points)

Criterion C (Option 1: 32 points vs. Option 2: 68 points)

$$\text{Option 1: } 71 \times (0.1192) + 65 \times (0.4815) + 32 \times (0.3994) = 52.5 \text{ points}$$

$$\text{Option 2: } 29 \times (0.1192) + 35 \times (0.4815) + 68 \times (0.3994) = 47.5 \text{ points}$$

Option 1 is marginally recommended, based on highly variable preference weights and the patient's introduction of intransitivity in the preference measurement process. These hypothetical examples demonstrate the importance of measuring the intransitivity introduced by the respondent in prioritizing the decision criteria before providing feedback on preference weights or summated results.
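To show how the check might sit in front of the feedback step, the sketch below runs the whole procedure for Example 2: weights for every orientation, the sample SD of each weight across orientations, the inconsistency level, and a comparison against the 0.10 guideline. It is an illustration under stated assumptions, not the study software: the function names are hypothetical, the ratios are the rounded values used in the worked example (A vs. B treated as 2), and the use of the sample (n - 1) standard deviation is inferred from the SD rows of the summary tables.

```python
from itertools import permutations
from math import sqrt
from statistics import mean, stdev


def weights(order, ratios):
    """Weights for one orientation; ratios[(x, y)] = importance of y relative to x."""
    def r(row, col):
        if row == col:
            return 1.0
        if (row, col) in ratios:
            return ratios[(row, col)]
        return 1.0 / ratios[(col, row)]

    values = [1.0]  # the last criterion in the ordering is fixed at 1.0
    for left, right in reversed(list(zip(order, order[1:]))):
        mean_ratio = mean([r(row, left) / r(row, right) for row in order])
        values.insert(0, values[0] * mean_ratio)
    total = sum(values)
    return {c: v / total for c, v in zip(order, values)}


def assess(ratios, criteria="ABC"):
    """Return (mean weight per criterion, inconsistency level) over all orientations."""
    runs = [weights(order, ratios) for order in permutations(criteria)]
    columns = {c: [run[c] for run in runs] for c in criteria}
    level = sqrt(sum(stdev(col) ** 2 for col in columns.values()) / len(criteria))
    return {c: mean(col) for c, col in columns.items()}, level


# Example 2: the B vs. C comparison is reversed (B 75 points, C 25 points).
example_2 = {("A", "B"): 2.0, ("A", "C"): 9.0, ("B", "C"): 1 / 3}
mean_weights, level = assess(example_2)
show_weights = level <= 0.10  # guideline used in this study
print({c: round(w, 4) for c, w in mean_weights.items()}, round(level, 3), show_weights)
```

With these inputs the level exceeds the guideline, so a decision aid following this approach would withhold or qualify the preference-weight feedback rather than report the marginal recommendation for Option 1.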

References to Appendix

[1] Kocaoglu D. A participative approach to program evaluation. IEEE Trans Eng Manage 1983;30(3).

[2] Expert Choice 11 quick start guide & tutorials. Arlington, PA; 2004.

[3] Forman E, Gass S. The analytic hierarchy process: an exposition. Oper Res 2005;49(4):469-486.

[4] Dolan JG, Frisina S. Randomized controlled trial of a patient decision aid for colorectal cancer screening. Med Decis Making 2002;22:125-139.