Decisions Based on Numerically and Verbally Expressed Uncertainties



Journal of Experimental Psychology: Human Perception and Performance
1988, Vol. 14, No. 2, 281-294

Copyright 1988 by the American Psychological Association, Inc.
0096-1523/88/$00.75

Decisions Based on Numerically and Verbally Expressed Uncertainties

David V. Budescu and Shalva Weinberg
University of Haifa, Haifa, Israel

Thomas S. Wallsten
University of North Carolina

A two-stage within subjects design was used to compare decisions based on numerically and verbally expressed probabilities. In Stage 1, subjects determined approximate equivalences between vague probability expressions, numerical probabilities, and graphical displays. Subsequently, in Stage 2 they bid for (Experiment 1) or rated (Experiment 2) gambles based on the previously equated verbal, numerical, and graphical descriptors. In Stage 1, numerical and verbal judgments were reliable, internally consistent, and monotonically related to the displayed probabilities. However, the numerical judgments were significantly superior in all respects because they were much less variable within and between subjects. In Stage 2, response times, bids, and ratings were inconsistent with both of two opposing sets of predictions, one assuming that imprecise gambles will be avoided and the other that verbal probabilities will be preferred. The entire pattern of results is explained by means of a general model of decision making with vague probabilities which assumes that in the present task, when presented with a vague probability word, people focus on an implied probability interval and sample values within it to resolve the vagueness prior to forming a bid or a rating.

Subjective probability (SP) is a basic concept in all models of individual decision making under uncertainty. In one class of models, functions of SP are used to weight the utilities, or values, of the basic outcomes to yield a global assessment of goodness for each alternative. This class includes the traditional Subjectively Expected Utility model (Savage, 1954) and a large variety of more recent generalizations and refinements such as the Subjectively Weighted Utility (Karmarkar, 1978), Certainty Equivalence (Handa, 1977), Prospect Theory (Kahneman & Tversky, 1979), and Anticipated Utility (Quiggin, 1982) models. Obviously, such models require the SPs to take real numerical values bounded by 0 and 1 that satisfy certain consistency or coherence conditions. Thus, SP is considered to represent a mapping of an individual's subjective beliefs into the real numbers.

In another class of models, SP and outcome utilities are treated as separate dimensions, and alternatives are compared on a dimensional rather than a global basis (Payne, 1976; Russo & Dosher, 1983; Tversky, 1969). In these models, too, SP is treated as a mapping of subjective uncertainty onto the real numbers.

The numerous decision models that assume numerical representation of uncertainty are in sharp contrast with the fact that people generally prefer to express their beliefs by means of natural language. Several reasons have been cited for the distinct preference of words over numbers (see also Beyth-Marom, 1982; Budescu & Wallsten, 1985; Wallsten, Budescu, Rapoport, Zwick, & Forsyth, 1986). It is claimed that most people understand words better than numbers and typically handle uncertainty by means of verbal expressions and associated rules of conversation rather than by numbers (Zimmer, 1984). Historically, probability theory is much younger (Hacking, 1975) than natural language, and at the individual level language and its subtleties are mastered long before one can deal with the numerical system and its intricacies, let alone with probabilities.

This work was supported by a grant from the United States-Israel Binational Science Foundation (BSF), Jerusalem, Israel, and a Hugo Bergman Award to David V. Budescu. We wish to thank Hana Shtruminger for writing the programs for the experiments. Correspondence concerning this article should be addressed to David V. Budescu, Department of Psychology, University of Haifa, Haifa 31999, Israel.

Finally, numbers are perceived as conveying a level of precision and authority that people do not associate with their opinions. Words are perceived as more flexible and less precise in meaning and, therefore, seem better suited to describe vague and imprecise opinions and beliefs. This property of probability phrases was recently demonstrated by Wallsten et al. (1986) and by Rapoport, Wallsten, and Cox (1987), who have proposed and evaluated various ways of modeling the vagueness of these expressions.

In light of this reality, it is surprising that there is virtually no literature dealing with decisions based on verbally expressed beliefs. The only published article on this issue is by Shanteau (1974), but its goals were different from those of the present research, and its implicit assumption was that probability phrases have precise and well-defined meanings.

Most of the empirical literature on probability expressions has focused on the translation of verbal expressions to point numerical equivalents. The overwhelming result is great variability in the values assigned to words and large overlap among the ranges assigned to the various expressions (e.g., Beyth-Marom, 1982; Budescu & Wallsten, 1985; Foley, 1959; Hakel, 1968; Johnson, 1973; Kenney, 1981; Lichtenstein & Newman, 1967; Nakao & Axelrod, 1983; Simpson, 1944, 1963). Some of these studies (Beyth-Marom, 1982; Budescu & Wallsten, 1985; Johnson, 1973) have also shown that the between-subjects variability in assigning numbers to expressions far exceeds the within-subjects variability, which itself is not minor. Thus, it may be concluded that verbal expressions have imprecise meanings to individuals, and further, that there are substantial individual differences in the meanings of the expressions.


By considering verbal expressions of uncertainty as vague probabilities, a link can be made to the literature on the role of vagueness in choice and decisions.¹ This link assumes that when a person is told, for example, that an event is likely, he or she will act similarly to being told that the probability falls within a certain interval or that various probabilities are more or less possible. Of course, each individual may interpret the phrase differently with respect to the uncertainty it conveys, but we assume that each deals with the imprecision in the same way that he or she would if the imprecision were stated in a corresponding numerical fashion.

With this assumption in mind, we developed a paradigm in which in Stage 1 subjects assigned numerical probabilities and probability expressions to graphical probability displays (i.e., spinners) in a manner designed to achieve whatever equivalence is possible between the three modes (graphical, numerical, and verbal) for each person. In Stage 2, the same subjects bid to buy or sell gambles for gains or losses contingent on probabilities expressed in each of the three approximately equivalent ways. Contrasting predictions can be derived for Stage 2 based on two different approaches to decision making, given vague probabilities.

From one perspective, Ellsberg (1961) was the first to demonstrate decision behavior inconsistent with the axioms of rationality (e.g., Savage, 1954) when probabilities are not described precisely, and similar results have been reported by others as well (Becker & Brownson, 1964; Curley & Yates, 1985; Larson, 1980; Yates & Zukowsky, 1976). In most of these experiments, subjects chose between gambles with precise probabilities and ones with probabilities defined only in terms of intervals or second-order distributions (i.e., probabilities over probabilities). Most people avoided the imprecise gambles, while a minority seemed to prefer them, in both cases generally at the sacrifice of expected gain. To explain his results, Ellsberg (1961) hypothesized that when faced with a situation in which many probability distributions are plausible, people consider two distributions: the most likely one and either (for a majority of the people) the distribution associated with the lowest expected gain or (for the minority who are optimists) the distribution associated with the highest expected gain. Each individual, then, behaves in a manner consistent with a SP equal to a weighted average of the two distributions, with the relative weights depending on the degree of imprecision.
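Ellsberg's weighted-average account can be illustrated with a minimal sketch. It assumes a simple two-outcome gain gamble, so that each candidate distribution reduces to a single probability of winning and the lowest-gain distribution is the one with the lowest win probability. The function name `ellsberg_sp`, the parameter `weight_likely`, and the numbers are hypothetical and chosen only for illustration; they are not taken from Ellsberg (1961).

```python
def ellsberg_sp(p_likely, p_worst, p_best, weight_likely, pessimist=True):
    """Weighted average of the most likely win probability and one extreme.

    weight_likely is the (assumed) weight on the most likely distribution;
    the account implies it shrinks as the probabilities become less precise.
    """
    if not 0.0 <= weight_likely <= 1.0:
        raise ValueError("weight_likely must lie in [0, 1]")
    # Pessimists attend to the lowest-gain case, optimists to the highest.
    extreme = p_worst if pessimist else p_best
    return weight_likely * p_likely + (1.0 - weight_likely) * extreme


# Hypothetical vague gamble: most likely p = .50, plausible range .30 to .70.
pessimist_sp = ellsberg_sp(0.50, 0.30, 0.70, weight_likely=0.6)
optimist_sp = ellsberg_sp(0.50, 0.30, 0.70, weight_likely=0.6, pessimist=False)
print(round(pessimist_sp, 2), round(optimist_sp, 2))  # 0.42 0.58
```

With a precisely stated probability the range collapses and both values equal the stated probability, so the sketch reproduces avoidance of, or preference for, imprecise gambles only when the plausible range is wide.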

Whether or not we accept Ellsberg's explanation, we can generalize from his and related results to make predictions for our paradigm. First, because most people avoid vague probabilities, it can be predicted that they will pay more to be relieved of negative gambles and concurrently will pay less to obtain positive gambles, when the gambles are expressed verbally than when they are expressed numerically or graphically. Furthermore, it can be predicted that they will do so even at the loss of long-run expected gain. To the extent that this occurs, it can be said that behavior in response to verbal probabilities is suboptimal relative to that in response to numerical or graphical probabilities. Finally, in view of the negative correlation between preference and decision time (Jamieson & Petrusic, 1977), it can be predicted that it will take longer to bid to verbal than to numerical or graphical gambles.

A contrasting perspective comes from Zimmer (1983, 1984), who has suggested that the verbal mode of communication is more natural to people than is the numerical. Specifically, he claims that "people handle uncertainty by customary verbal expressions and the implicit and explicit rules of conversation connected with them" (Zimmer, 1983, p. 163). "Therefore, if one forces people to give numerical estimates, one forces them to operate in a mode which requires more mental effort and is therefore more prone to interference (and bias)" (Zimmer, 1984, p. 123). Zimmer's claims lead to just the opposite predictions from those made above, namely, that people will generally prefer gambles based on verbal rather than on numerical or graphical probabilities, that they will perform more optimally (in the same sense of making more money) with such gambles, and that decisions about them will be faster.

The first goal of this study, therefore, is to test the contrasting predictions drawn from Ellsberg's and Zimmer's work. This is to be done in the present paradigm by comparing subjects' bids and the decision times in Stage 2 for gambles based on verbally, numerically, and graphically expressed probabilities established as approximately equivalent in Stage 1. Of course, there does not exist an exact one-to-one correspondence between a certain probability and a given phrase, as demonstrated by the empirical evidence cited earlier. Approximate equivalences can be established only by a methodology that will (a) allow for considerable individual differences in the understanding and use of probability phrases; (b) eliminate the asymmetry induced by the "quantify the word" paradigm; (c) link both words and numbers to certain well-defined events; and (d) allow comparison of the two types of judgments. The second goal of the study is to develop such a methodology, which will be used in Stage 1.

The development of such a methodology gives rise to a third research goal, namely, a comparison of the quality of probabilistic judgments by verbal and numerical means. In an extensive review of the literature on encoding of subjective probabilities, Wallsten and Budescu (1983) have described five criteria for evaluating the quality of judgment of uncertainty. Three of them (reliability, internal consistency, and construct validity) are relevant for comparing verbal and numerical responses.

¹ It is necessary to clarify terminology here. Unfortunately, as pointed out by Oden (1986), many workers in behavioral decision theory (e.g., Einhorn & Hogarth, 1985; Ellsberg, 1961; Yates & Zukowsky, 1976; as well as others) have used the terms ambiguous or ambiguity where they actually meant vague or vagueness. According to both the dictionary (e.g., Merriam-Webster, 1984) and careful analysis (Black, 1937), a statement, phrase, or event is ambiguous if it is capable of being understood in two or more different, but precise, ways. It is vague if it is not clearly defined or capable of being understood precisely. Thus, for example, the probability of drawing a ball from an urn of unspecified composition (Ellsberg, 1961) is vague, but it is not ambiguous, and a probability based on very little information (e.g., Einhorn & Hogarth, 1985) is vague, but it is not ambiguous, and so forth. A probability would be ambiguous, for example, if one had an urn containing black cork-filled balls and white lead-filled balls, and were told "the probability of drawing a light ball is 0.80." We are concerned here with vagueness, not ambiguity. We will use only the former and related terms, while nevertheless referring to articles that (incorrectly) use the latter and related terms.

Considering reliability first, the usual measures of correlation are not applicable to the verbal responses and therefore cannot be used for comparing the two modes. However, reliability can be assessed and compared in terms of (a) the number of distinct numerical and verbal responses given to a particular spinner probability, (b) the number of probabilities to which a unique response of each type is given, and (c) the within-subject variability for each of the two types of responses. A consistent and reliable judge will repeatedly use the same response when presented with a given spinner, will use responses in a differential and discriminatory fashion, and will display relatively little variation in the way events and responses are associated. (An important and complicated issue that we will not treat here concerns the use of verbal synonyms, but see Zwick, 1987, for one approach to this problem.)
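The three reliability indices can be sketched as simple tallies over one subject's judgments in one response mode. This is only an illustration under stated assumptions: the function name `reliability_summary`, the data layout (pairs of displayed probability in percent and response), the example phrases, and the use of a population standard deviation for index (c) are all hypothetical choices, not the authors' procedure.

```python
from collections import defaultdict
from statistics import pstdev


def reliability_summary(judgments):
    """Tally (display, response) pairs for one subject and one response mode."""
    responses_by_display = defaultdict(set)
    displays_by_response = defaultdict(list)
    for display, response in judgments:
        responses_by_display[display].add(response)
        displays_by_response[response].append(display)
    # (a) number of distinct responses given to each display
    per_display = {d: len(r) for d, r in responses_by_display.items()}
    # (b) number of distinct displays to which each response was given
    per_response = {r: len(set(d)) for r, d in displays_by_response.items()}
    # (c) spread of the displays associated with each response
    spread = {r: pstdev(d) for r, d in displays_by_response.items()}
    return per_display, per_response, spread


# Hypothetical judgments: three replications of the .10 and .20 displays.
example = [
    (10, "unlikely"), (10, "unlikely"), (10, "doubtful"),
    (20, "unlikely"), (20, "unlikely"), (20, "unlikely"),
]
per_display, per_response, spread = reliability_summary(example)
print(per_display)   # {10: 2, 20: 1}
print(per_response)  # {'unlikely': 2, 'doubtful': 1}
```

A perfectly consistent judge would show a count of 1 everywhere in (a) and (b) and a spread of 0 in (c).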

Internal consistency of the judgments can be tested by examining the way in which words and numbers are used over a large number of distinct probabilities. In particular, we will test to what degree the ordering of these probabilities is reflected in the subjects' judgments. Construct validity can be established by taking advantage of the fact that in this paradigm subjects perform several tasks associated with the same events: judgments, bids, and attractiveness ratings. The levels of correspondence among the responses in these tasks can serve as measures of construct validity.

Experiment 1

Method

Subjects

Twenty native English speakers, all students at the University of Haifa, agreed to participate in a judgment/decision-making experiment. In return for their participation, they received a fixed amount of $2.50 in Israeli currency and an additional bonus, depending on the quality of their decisions. Six of the subjects were male, 14 were female. Their average age was 25 years.

Procedure

The experiment consisted of two sessions, approximately 1 week apart. During the first session, subjects provided numerical and verbal judgments of graphic displays. These judgments were used in the second session, which consisted of a bidding task. Both sessions were controlled by a PDP 11/73 computer with a graphic terminal (Visual 550).

Stage 1: The judgment session. On each trial, subjects were shown a circle radially divided into a shaded and an unshaded sector on the terminal's screen. They were instructed to imagine that a dart was pointed at the circle's center and were asked to judge the probability that it would land on the shaded section. Eleven different displays were used, with the shaded sections equal to 0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, and 0.95 of the total area. Each subject judged each display in three different ways: (a) by open-ended numerical judgment, (b) by open-ended verbal judgment, and (c) by selecting a phrase from a fixed list. Each judgment was repeated three times for a total of nine judgments per display.

The subjects first provided the open-ended numerical and verbal judgments. These were obtained in blocks of 11 trials, covering the 11 displays. The order of presentation was randomized in each block. Half the subjects started with a numerical (N) followed by a verbal (V) block and continued through the alternating sequence of NVNVNV. The other half completed the inverse sequence VNVNVN. Subjects were instructed to respond by providing their best numerical or verbal estimate of the required probability. In the verbal case they were instructed to use only probability phrases and to avoid frequency or quantity descriptions. Numbers were restricted to integers from 0 to 100.

Following completion of these 66 judgments, subjects performed three blocks of selections of phrases. A list of 18 phrases was presented on the screen next to the partially shaded circle. The list included only probability terms and was based on results obtained from previous studies of words-to-numbers conversions, so as to cover the entire range of values. The 18 words were ordered randomly on each trial, and the subjects were instructed to select the phrase that best captured the probability of the event. The selection task was always the last in order to avoid contamination of the subjects' subjective lexicon, and the words were reordered on each trial in order to force the subjects to carefully consider the list for each judgment and to discourage the development of alternative simplifying procedures.

After the various judgments, subjects established an approximate equivalence between displays, phrases, and numbers. First, each of the 11 graphic displays was presented together with the list of the various words (up to six) that had been elicited from that subject. At the bottom of the screen the various numerical judgments (up to three) were shown, and the subject was asked to select the one that best described the graphic display and the various phrases. Finally, each of the graphic displays was presented together with its numerical estimates (up to three), and the verbal judgments (up to six) were shown at the bottom of the screen. The subject selected the one phrase that best captured the graphic display and the various numerical estimates. In each of the two sequences, the order of presentation was randomized. The number and phrase selected for each display were considered to be approximately equivalent with respect to probability from the subject's point of view and were used as such in the second session. Occasionally, a word or a number was selected by a given subject as "best" for two different probabilities. This problem occurred 7 times for the numerical selection (3.2%) and 26 times in the verbal selection task (11.8%). In these cases the modal numerical/verbal response of the subject to that particular probability in the judgment stage was selected for presentation in the decision stage. The session lasted from 30 to 45 min.

Stage 2: The bidding session. Subjects performed a slight variation of the Marschak bidding procedure (Becker, DeGroot, & Marschak, 1964). On each trial they were told that they were in possession of a lottery ticket that had a certain chance to win/lose a given sum of money. They were instructed to decide on the minimal cash equivalent they would be willing to accept instead of a potentially winning ticket or the maximal sum they would be willing to pay in order to give away a potentially losing ticket. They were further informed that not all the offers would be accepted, but rather each would be compared with a counteroffer generated by the computer. The optimal strategy of responding according to one's "true" subjective worth of a gamble was described and illustrated for both the winning and losing cases. Half the trials involved winning lotteries, and the other half were their negative reflections. Eleven probabilities were presented in three forms: graphically (a display identical to the one used in Stage 1), verbally (the phrase selected in Stage 1), or numerically (the number selected in Stage 1). Each presentation was replicated three times with the stakes of $0.80, $1.05, and $1.25.

After six training trials in which all modes of presentation were demonstrated and the computer's counteroffers were presented and explained, the subjects provided the experimental bids. Each person performed 198 trials (11 probabilities × 3 modes of presentation × 3 sums × 2 domains) in a random order.
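The trial count follows from fully crossing the four factors. The sketch below is an illustration only: the function name `bidding_trials`, the factor labels, and the use of an index 0 to 10 in place of the eleven subject-specific probability descriptors are assumptions, and the shuffling routine is not claimed to be the one used in the original programs.

```python
import itertools
import random

PROBABILITY_INDEX = range(11)          # stands in for the 11 probabilities
MODES = ("graphical", "verbal", "numerical")
STAKES = (0.80, 1.05, 1.25)            # dollar amounts reported in the text
DOMAINS = ("win", "lose")


def bidding_trials(seed=None):
    """Return the fully crossed trial list in a random order."""
    trials = list(itertools.product(PROBABILITY_INDEX, MODES, STAKES, DOMAINS))
    random.Random(seed).shuffle(trials)
    return trials


trials = bidding_trials(seed=0)
print(len(trials))  # 198
```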

A counteroffer was determined for each gamble by selecting a random number in the range EV ± ½ SD (i.e., a range of one standard deviation around its expected value; Tversky, 1967). These values were used to determine whether an offer would be accepted or rejected. In order to allow comparison of the bids across subjects and modes of presentation, the same counteroffers were employed for all replications of a given Probability × Stake combination. However, neither the counteroffer nor the outcome of the comparison was presented to the subjects. The time that elapsed from the presentation of the gamble until the subject's bid (total decision time) was also recorded on each trial.
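The counteroffer rule can be sketched for a two-outcome gamble that pays the stake with probability p and nothing otherwise. Several points are assumptions made for illustration: the function names `gamble_moments` and `counteroffer` are hypothetical, the draw is taken to be uniform over the range (the text says only that a random number was selected), and a losing gamble is represented by a negative stake.

```python
import math
import random


def gamble_moments(p, stake):
    """Expected value and standard deviation of 'stake with probability p, else 0'."""
    ev = p * stake
    sd = abs(stake) * math.sqrt(p * (1.0 - p))
    return ev, sd


def counteroffer(p, stake, rng):
    """Draw a counteroffer from EV - SD/2 to EV + SD/2 (uniform draw assumed)."""
    ev, sd = gamble_moments(p, stake)
    return rng.uniform(ev - sd / 2.0, ev + sd / 2.0)


rng = random.Random(1)
ev, sd = gamble_moments(0.20, 1.25)    # ev = 0.25, sd = 0.50
offer = counteroffer(0.20, 1.25, rng)  # falls between 0.00 and 0.50
```

Reusing one drawn value for every replication of a given probability and stake, as the text describes, would amount to caching the result of `counteroffer` per (p, stake) pair.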

At the conclusion of the bidding task, subjects actually played six lotteries. First, it was determined whether or not an offer was accepted; then, if necessary, a random number was selected to simulate a lottery and determine its outcome. Subjects received the total amount earned in these six lotteries as well as their base pay. At the conclusion of this session, subjects were asked to answer several questions regarding their use of the various modes of representation of probability in the study and in everyday life. The average duration of this session was 90 min.

Results

Stage 1: Judgment

This is the first study in which numerical and verbal judgments of a fixed set of events were elicited from the same subjects. Therefore, it is informative to compare carefully the two parallel sets of judgments.

Direct comparisons of numerical and verbal responses are difficult, however, because the two modes have different metric characteristics. Without entering a debate on what transformation would render the numerical mode a linear or a ratio scale measure of subjective uncertainty, it can be agreed that the numerical mode has at least rough interval properties. In contrast, however, one cannot with certainty even rank order all the probability expressions used by an individual (Budescu & Wallsten, 1985). We used two methods of comparison to overcome this difficulty. First, we treated the responses categorically, simply counting the number of occurrences of distinct elements. Second, we calculated statistics regarding the graphic displays to which responses of each type were assigned.

One of the most surprising results was the richness of the probability vocabulary of the subjects. The 20 subjects generated 111 different phrases (compared with 73 numbers of 101 possible integers). The distribution of the total number of responses over subjects for the three response methods is summarized in the top panel of Table 1. Note that the average subject used about 13 phrases (both self-produced and from the list) and about 18 numbers.

The next two panels in Table 1 present the distribution of

the number of responses per display and of the number of

displays for which a certain response was used. A perfectly

consistent subject should establish a 1:1 mapping between

responses and displays. Although the numerical judgments

do not demonstrate perfect consistency and reliability, they

are more reliable and discriminatory. The mean number ofresponses per display under the numerical condition is signif-

Table 1

Analyses of Verbal and Numerical Subjective

Judgments of Probability

Statistic

No. of responses persubject

MinimumMeanMaximumSD

No. of responses perdisplay (%)

123

MeanNo. of displays asso-

ciated with a re-sponse (%)12

3*3Mean

X Response/displayMinimumMedianMaximum

Kendall's rb

MinimumMedianMaximum

Phrases

713.4519

1.65

19.0949.5531.36

2.12

40.1119.9939.902.67

.35

.55

.73

.75

.90

.97

List

1013.2516

1.45

26.8247.3325.45

1.99

30.8125.1844.01

2.52

.33

.60

.79

.79

.92

.98

Numbers

1217.95294.66

35.4543.1821.36

1.86

84.8713.53

1.601.94

.39

.39

.83

.86

.99

.99

Note. Phrases refers to the open-ended elicitation, and list to theselection from the fixed list of phrases.

icantly lower than in the two verbal conditions combined,

((19) = 2.68, p < .05, and so is the mean number of displaysassociated with a given spinner, r(19) = 5.90, p < .05. Figure

1 displays the mean number of responses generated by each

method as a function of the displayed probability. Note the

M-shaped curves for all three methods, with the number of

responses lowest at .50, and .95. For all displays there are

more verbal responses than there are numerical ones, withthe largest difference being observed at the lower end of the

continuum (.05). The M-shape of the curve was confirmed by

a trend analysis that showed a significant quartic effect, F(1, 190) = 31.68, p < .05, accounting for over 44% of the interprobabilities variance.

Figure 1. Mean number of responses generated as a function of the probability displayed and the response mode. (Horizontal axis: probability displayed; separate curves for phrases, list, and numbers.)

The last panels of Table 1 summarize the degree of monotonicity in the relation between the displayed probabilities and the responses provided by the subjects. For each subject, under each of the response modes, we obtained a two-way table of the frequency of usage of a given phrase, or number, in describing the various displays. Then, the rows of the table were permuted to achieve the best possible monotonic fit. This was done by ordering them according to the weighted mean of the probabilities for which the response was used. (For example, if the phrase a was used once to describe a .10 display and twice to describe a .20 display, its weighted average of .167 placed it after word b, which was used twice to describe a .10 display and once to describe a .20 display and had an average of .133.) This procedure yielded a highly consistent monotonic pattern, with a large majority of the responses concentrated near the diagonal. In fact, for all three modes of judgment, over 85% of the judgments were located in this main block diagonal, with a median value of 97% for the numerical judgments and 94% for the verbal ones. To further quantify this relation, we report two statistics focusing on slightly different aspects of the data. Goodman's λ (response | display) is an asymmetric measure of nominal PRE (proportional reduction in error). This statistic describes the degree to which knowledge of the display presented can improve prediction of the subject's response. Kendall's τb is a symmetric ordinal coefficient based on calculation of the proportion of concordant and discordant pairs of rows and columns. Both measures suggest that subjects tend to use the verbal as well as the numerical response modes in a consistent monotonic fashion, with a slight advantage to the numerical mode, but no significance tests are available to compare the various conditions.
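The row-ordering rule and the λ statistic described above can be sketched in a few lines of Python. This is only an illustration, not the authors' program: the function names and the table layout (each response label mapped to a list of usage counts, one count per display) are assumptions, and the only data used are the two phrases from the worked example in the text.

```python
def weighted_mean_display(counts, displays):
    """Mean of the displayed probabilities, weighted by how often the response was used."""
    total = sum(counts)
    return sum(p * n for p, n in zip(displays, counts)) / total


def order_rows(table, displays):
    """Response labels sorted by the weighted mean of the displays they described."""
    return sorted(table, key=lambda label: weighted_mean_display(table[label], displays))


def goodman_kruskal_lambda(table):
    """lambda(response | display): proportional reduction in error when the
    response (row) is predicted from the display (column)."""
    rows = list(table.values())
    n = sum(sum(r) for r in rows)
    best_without_display = max(sum(r) for r in rows)        # modal response overall
    best_with_display = sum(max(col) for col in zip(*rows))  # modal response per display
    return (best_with_display - best_without_display) / (n - best_without_display)


# Worked example from the text: two displays (.10 and .20), two phrases.
displays = [0.10, 0.20]
table = {"a": [1, 2], "b": [2, 1]}

mean_a = weighted_mean_display(table["a"], displays)  # about .167
mean_b = weighted_mean_display(table["b"], displays)  # about .133
row_order = order_rows(table, displays)               # b is placed before a
lam = goodman_kruskal_lambda(table)
```

With real data the same functions would be applied to each subject's full response-by-display table; Kendall's τb is not computed here.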

Figures 2-4 illustrate this analysis of monotonicity across all subjects. To facilitate presentation, the figures are limited to responses repeated at least 10 times (across subjects and replications). The graphic portions of the figures show the frequencies with which responses were used for each display. Note first that, as expected, the numerical responses (Figure 2) demonstrate a consistent monotonic relation to the displayed spinners. A similar pattern is obtained for the phrases, whether self-generated (Figure 3) or selected from the list (Figure 4).

The two modes do differ in a very important respect, however, as is evident by inspecting the figures. Namely, each phrase is given in response to a broader range of displays than is each number. The question of whether this unsurprising result is due primarily to within- or between-subjects variability is answered in the last two columns prior to the charts in the figures. The first column shows the weighted mean of within-subjects standard deviations of the displays associated with each response. The final column shows the standard deviation of the mean display value for each subject per response. It is particularly interesting to note that between-subjects variability exceeds within-subjects variability for 18

Number     N     Mean    S.D.   S.D. (means)
5         19     5.00    0       0
6         10     6.50    0       3.24
10        40    11.31    1.16    5.69
15        19    10.00    1.44    3.48
20        26    17.78    2.09    2.26
25        14    22.14    5.40    5.98
30        25    26.80    4.85    4.67
35        23    29.13    5.62    6.25
33        12    30.00    0       0
45        18    33.33    8.82    9.05
40        33    40.30    5.27    6.23
50        54    49.44    2.77    3.19
60        32    60.31    3.22    4.71
65        25    64.80    5.48   10.03
66        11    70.91    0       4.77
70        27    72.59    4.17    8.80
75        23    74.35    3.97    7.41
80        42    80.73    5.31   10.67
85        21    87.62    0       8.07
90        33    92.26    2.55    7.34
95        27    93.52    2.41    4.16

(Chart of response frequencies by displayed probability omitted. Legend: cell frequencies of 1-5, 6-10, 11-20, 21-30, and more than 30.)

Figure 2. Distribution of numerical judgments as a function of the probability displayed.


Phrase              N     Mean    S.D.   S.D. (means)
Improbable          17     9.41    4.68    2.53
Very unlikely       10    10.50    8.23    5.04
Unlikely            40    12.13    5.56    6.2?
Fair chance         13    27.89    5.77    7.98
Some chance         32    30.63   10.06   10.01
Possible            88    42.67   13.43   10.59
Likely              20    58.00    9.45   22.62
Quite likely        11    58.18   14.08    7.54
Good chance         40    62.75   13.80   11.75
Quite possible      43    67.33   12.12   13.32
Very good chance    15    76.00    6.20   12.33
Probable            26    78.65    9.95    6.28
Very possible       25    82.80    8.40    7.98
Quite probable      15    83.67    6.16   16.84
Very likely         25    85.40   12.28    5.07
Almost certain      11    93.12    4.40    2.09

(Chart of response frequencies by displayed probability omitted. Legend: cell frequencies of 1-5, 6-10, 11-20, 21-30, and more than 30.)

Figure 3. Distribution of verbal judgments (phrases) as a function of the probability displayed.

of the 21 numerical responses (86%), whereas the reverse is true in only 1 case (4.5%). In contrast, when looking at the verbal responses, between-subjects variability exceeds within-subjects variability in 14 of 32 cases (44%), combining over the free and list verbal conditions, and the reverse is true in 12 of 32 instances (43%). Thus, individuals are far more variable in their assignment of phrases than of numbers to displays, relative to between-subjects variability. Furthermore, comparison of the within- and between-subjects standard deviations indicates substantially greater individual differences in the use of phrases than of numbers. To substantiate this impression, we compared the values obtained under the three conditions by a nonparametric procedure. Mann-Whitney U tests showed significant differences between the numeric condition and the fixed list (Z = -2.19, p < .05, for the within-subjects SDs, and Z = -2.89, p < .05, for the between-subjects SDs) and between the numeric condition and the self-generated phrases (Z = -4.50, p < .05, for the within-subjects variation, and Z = -2.34, p < .05, for the between-subjects variation), but no difference between the two verbal conditions (Z = 1.1, p > .05, for the within-subjects values, and Z = 0.05, p > .05, for the between-subjects values).
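For readers who want to see how such a comparison works mechanically, the following is a minimal sketch of a Mann-Whitney U test using the normal approximation with mid-ranks and a tie correction. It is not the authors' procedure: the function name is an assumption, and the inputs are the within-subjects S.D. columns as read from Figures 2 and 4, which cover only responses used at least 10 times, so the resulting Z need not match the values reported above.

```python
import math


def mann_whitney_z(x, y):
    """Normal-approximation Z for the Mann-Whitney U statistic of sample x
    against sample y, using mid-ranks and the usual tie correction."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                      # size of this group of tied values
        mid_rank = (i + 1 + j) / 2.0   # average of ranks i+1 .. j
        rank_sum_x += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    return (u - mean_u) / math.sqrt(var_u)


# Within-subjects S.D. values as read from Figure 2 (numbers) and Figure 4 (list).
numeric_sd = [0, 0, 1.16, 1.44, 2.09, 5.40, 4.85, 5.62, 0, 8.82, 5.27,
              2.77, 3.22, 5.48, 0, 4.17, 3.97, 5.31, 0, 2.55, 2.41]
list_sd = [1.44, 3.02, 2.04, 8.00, 3.95, 7.15, 11.16, 11.50, 13.30, 0,
           12.01, 10.27, 11.08, 9.26, 5.80]

z_numeric_vs_list = mann_whitney_z(numeric_sd, list_sd)
print(f"Z (numeric vs. list, within-subjects SDs) = {z_numeric_vs_list:.2f}")
```

A negative Z here means that the first sample (numbers) tends to have the smaller standard deviations, which is the direction reported in the text.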

Phrase               N     Mean    S.D.   S.D. (means)
Almost impossible    31     5.32    1.44    0.72
Very poor chance     37     6.11    3.02    2.29
Improbable           14    11.43    2.04    6.84
Unlikely             24    15.83    8.00    8.54
Poor chance          25    16.00    3.95    7.08
Doubtful             21    19.29    7.15   10.34
Somewhat unlikely    66    28.03   11.16    9.16
Possible             52    42.12   11.50   13.97
Somewhat likely      47    48.09   13.30   12.81
Toss-up/even odds    54    50.19    0       1.36
Probable             41    64.15   12.01   20.35
Likely               32    64.38   10.27   14.36
Good chance          68    68.82   11.08   10.79
Very good chance     71    81.20    9.26    9.13
Almost certain       74    95.36    5.80    4.81

(Chart of response frequencies by displayed probability, 0 to 100, omitted. Legend: cell frequencies of 1-5, 6-10, 11-20, 21-30, and more than 30.)

Figure 4. Distribution of verbal judgments (list) as a function of the probability displayed.


At the conclusion of the verbal and numerical direct judgments, the subjects established approximate equivalence relations among the various modes. A total of 63 phrases and 51 numerical values were selected for the 11 displays. Even at this stage the intersubject agreement was quite poor but somewhat better for numbers. Only 8 phrases and 9 numbers (12.70% and 17.65%, respectively) were selected by more than half the subjects, and the majority of selections, 43 phrases (68.25%) and 30 numbers (58.82%), were specific to 1 or 2 subjects only. Moreover, when a certain phrase or number was selected by 2 or more subjects, it was not necessarily in connection with the same display. Overall, 110 Phrase x Probability and 74 Number x Probability combinations were recorded. Thus, on the average, 10 different words and 6.73 numbers were selected for each display. Only 12 Number x Probability combinations (16.22%) and 6 Phrase x Probability combinations (5.45%) were selected by more than 5 subjects. Most selections were idiosyncratic for a single subject; this was the case in 50% of the numerical and 65.45% of the verbal choices.
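The proportions in this paragraph all follow from the reported counts, and a short script makes the arithmetic easy to recheck. The dictionary layout and the helper name below are assumptions made for this illustration; the counts themselves are the ones given in the text.

```python
def share(part, whole):
    """Percentage of `whole` accounted for by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# (count, total) pairs taken from the paragraph above.
selections = {
    "phrases chosen by more than half the subjects": (8, 63),
    "numbers chosen by more than half the subjects": (9, 51),
    "phrases specific to 1 or 2 subjects": (43, 63),
    "numbers specific to 1 or 2 subjects": (30, 51),
    "Number x Probability pairs chosen by more than 5 subjects": (12, 74),
    "Phrase x Probability pairs chosen by more than 5 subjects": (6, 110),
}

for label, (part, whole) in selections.items():
    print(f"{label}: {part}/{whole} = {share(part, whole):.2f}%")

# Average number of distinct selections per display (11 displays).
words_per_display = round(110 / 11, 2)
numbers_per_display = round(74 / 11, 2)
print(words_per_display, numbers_per_display)
```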

Stage 2: Bids and Inferred Probabilities

Assume that in the narrow monetary range used in this study (gains and losses of $0.80-$1.25) the value functions for money are roughly linear in both domains (see Footnote 2). Then, the ratio of the subject's bid to the lottery's stake can be used to infer the subject's SP of a win or loss. We will refer to this ratio as the adjusted bid (for example, if a subject's bid for a $1.25 lottery was $1.00, the adjusted bid is 1/1.25 = 0.8). Figure 5 shows mean adjusted bids as a function of the lottery's probability, separately for the two domains and the three modes of presentation (the diagonal represents SP equal to the displayed probability). In all three conditions, low probabilities are overweighted, and high ones are underweighted. The graphic presentation yields results most nearly linear, and the verbal function is the least linear of all three. This can be further verified in Table 2, which presents the linear correlations between the displayed probabilities and the inferred SPs. The lowest correlations are recorded in the verbal condition, and the tests of equality of correlated correlations (e.g., Guilford & Fruchter, 1978) indicated that the correlations in the verbal case are significantly lower than in the graphic condition (Z = 2.87, p < .05, for losses and Z = 5.38, p < .05, for gains) and in the numeric one (Z = 1.90, p < .05, for losses and Z = 1.85, p < .05, for gains). The correlation in the numeric condition is significantly lower than its graphic counterpart for gains (Z = 3.13, p < .05), but not for losses (Z = 1.15, p > .05).
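The adjusted bid defined above is a simple ratio, and the over- and underweighting pattern amounts to comparing that ratio with the displayed probability. The sketch below assumes the linear value function stated in the text; the function names are hypothetical, and the second example uses made-up numbers purely for illustration.

```python
def adjusted_bid(bid, stake):
    """Absolute bid divided by the stake, read as the inferred subjective
    probability (SP) under the linear-value assumption."""
    if stake <= 0:
        raise ValueError("stake must be positive")
    return abs(bid) / stake


def weighting(inferred_sp, displayed_p, tol=1e-9):
    """Classify an inferred SP relative to the displayed probability."""
    if inferred_sp > displayed_p + tol:
        return "overweighted"
    if inferred_sp < displayed_p - tol:
        return "underweighted"
    return "on the diagonal"


# Example from the text: a $1.00 bid for a $1.25 lottery.
example = adjusted_bid(1.00, 1.25)

# Hypothetical case: a $0.25 bid for a $1.25 lottery displayed at p = .10.
hypothetical = weighting(adjusted_bid(0.25, 1.25), 0.10)
print(example, hypothetical)
```

Taking the absolute value lets the same function serve for losses, where the bid is an amount paid to avoid the lottery.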

To test the primary predictions derived earlier, the absolute adjusted bids were submitted to an analysis of variance (ANOVA) blocked within subjects with domain (gain vs. loss), displayed probability, and mode of presentation as independent, within-subjects factors (e.g., Kirk, 1982). As Table 3 shows, subjects are willing to pay more to avoid possible losses than they demand in place of parallel monetary gains, but this difference is not significant, F(1, 15) = 3.62, p > .05. The differential relations between bids and probabilities in the domains of gains and losses are reflected in a significant Domain x Probability interaction, F(10, 130) = 2.99, p < .05.

The effect of presentation mode is significant, F(2, 19) = 3.77, p < .05, with slightly more extreme bids in response to verbal than to graphic or numerical presentations, as evident in Table 3. However, the Mode x Domain interaction is not significant, F(2, 38) = 0.29, p > .05, indicating that subjects both demanded more money in place of the verbal than the other positive gambles and offered more money to be relieved of them in the negative domain. This result is also supported at the individual level. When the gambles are matched one to one, 16 subjects displayed the same pattern of selling/buying prices in a majority of cases.

Profit/loss analysis. This pattern of bids is also reflected in an analysis of their outcomes. We computed for each subject his or her expected total outcome across all 66 bets, defined as the sum of two components: the subjects' bids in cases where they were accepted and the gambles' expected values when the offers were rejected. Table 4 presents the mean results for gains and losses. Note that subjects win less and lose more under the verbal presentation, and their overall outcome is worst under this condition. The difference between the modes of presentation is significant in both the domain of losses, F(2, 38) = 3.42, p < .05, and the domain of gains, F(2, 38) = 3.38, p < .05. However, actual differences are

Figure 5. Mean adjusted bid as a function of lottery's probability and mode of presentation. (Horizontal axis: probability.)

Footnote 2. This assumption was supported by the regression of the subjects' average bids on the corresponding stakes for fixed levels of probability. The linear regression accounted for over 95% of the variance in the bids for all probabilities equal to or greater than .40 and for significant proportions of the variance for lower probabilities (56%, 70%, 85%, and 91% for .05, .10, .20, and .30, respectively). The intercepts of the regression lines were very close to 0 in absolute terms (the median was .02 and the largest departure from 0 was .06), and the slopes increased monotonically as a function of the probabilities.


Table 2
Correlations of Inferred Probabilities With the Displayed Probability

                Mode of presentation
Domain      Graphic   Numeric   Verbal
Losses        .78       .75       .70
Gains         .90       .86       .83
Combined      .83       .80       .76

Table 4
Mean Expected Gain/Loss as a Function of Mode of Presentation and Domain

                  Mode of presentation
Domain      Graphic   Numeric   Verbal      M
Gains        15.18     14.95     14.88     15.00
Losses      -19.60    -19.50    -20.46    -19.85
Total        -4.42     -4.55     -5.58     -4.85

small. The gain under the verbal condition is only 1.23% less than under the other two conditions combined, and the losses are larger by only 4.65%. Only when the total outcomes are combined does the inferiority of the verbal presentation reach an impressive 24.41%.

Decision times. In all these analyses we employ the inverse transformation to stabilize the variances, eliminate skewness, and diminish the impact of outliers (e.g., Smith, 1976; Wainer, 1977). Harmonic mean decision times are shown in Table 5 and Figure 6. To test the predictions regarding decision times, we performed an ANOVA on the response rates, with mode of presentation, displayed probability, and domain as independent variables blocked within subjects. Decisions are significantly slower when losses are involved, F(1, 19) = 5.34, p < .05, and the three modes of presentation differ significantly, F(2, 38) = 19.27, p < .05, with the slowest decisions recorded with numerical probabilities. There are significant differences in the decision times at the various probabilities, F(10, 190) = 4.11, p < .05, and the probability interacts with the mode of presentation, F(20, 380) = 2.67, p < .05. Generally, decisions are quickest at probabilities of .50 and above .80, and slowest in the ranges of .30-.40 and .60-.70, resulting in approximately M-shaped curves. The nature of the Mode of Presentation x Probability interaction is not easily explained.
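The inverse transformation mentioned above converts each decision time into a response rate (1/time); averaging the rates and inverting the result gives the harmonic mean. The sketch below illustrates this with Python's standard library. The helper name and the three made-up times are assumptions; the final lines apply the same function to the three gain-domain cell means of Table 5, which should only approximately reproduce that table's marginal, since the published marginal was presumably computed from the raw data.

```python
from statistics import harmonic_mean, mean


def harmonic_mean_time(times):
    """Average the response rates (1/time), then convert back to a time."""
    rates = [1.0 / t for t in times]
    return 1.0 / mean(rates)


# Hypothetical decision times in seconds, including one slow outlier.
times = [4.0, 8.0, 40.0]
arithmetic = mean(times)               # pulled upward by the 40-second trial
harmonic = harmonic_mean_time(times)   # much less affected by the outlier
assert abs(harmonic - harmonic_mean(times)) < 1e-9  # agrees with the stdlib function

# Gain-domain cell means from Table 5 (graphic, verbal, numeric).
gains_cells = [7.56, 9.37, 10.62]
gains_marginal = harmonic_mean_time(gains_cells)
print(f"arithmetic = {arithmetic:.2f}, harmonic = {harmonic:.2f}, "
      f"gains marginal = {gains_marginal:.2f}")
```

The outlier example shows why the transformation helps: a single very slow trial dominates the arithmetic mean but has little weight on the rate scale.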

Experiment 2

Neither the predictions derived from Ellsberg's ideas nor those derived from Zimmer's were well supported by the results of Experiment 1. In the positive domain, subjects were willing to bid more, thus displaying a preference for the verbal gambles, whereas the reverse was true in the negative domain. Furthermore, expected gain was only slightly worse in the verbal than the other conditions, while decision time was slowest in the numerical condition. This last result may be at least partially due to the special relation between the nature of the task and the various modes of presentation. Most subjects reported that they performed some calculations prior to bidding under all modes of presentation, but this tendency was higher in the numerical case. The fact that bidding induces calculations can be deduced from the subjects' responses under the graphic presentation. In this condition, they were faced with identical spinners in Stage 1 (judgment) and Stage 2 (bids). In the former case, 82% of the responses were multiples of .05, but in the latter only 50% were, indicating that a different response mechanism was used in this task. Because the numerical display of the probability facilitates calculations, it is possible that the longer decision times in this condition reflect additional computation time. Additional support for this interpretation is obtained from a comparison of the variance of the bidding times in the verbal and numerical cases. Because of large individual differences, it makes sense to compare these variances only within subjects. In 13 cases (65%) the variance was larger for the numerical bids, and the average within-subjects ratio of these two variances is 1.51. Thus, on the average the variance was larger by over 50% with precise numerical probabilities, reflecting a greater heterogeneity of strategies in this case.

In order to test this possibility, we decided to replicate part of the experiment with a different response technique that does not encourage any calculations, namely, rating the attractiveness of lotteries. This response scale is not directly related to monetary outcomes, and it is free of some of the biases usually associated with bidding and pricing procedures (Tversky, Sattath, & Slovic, 1987). It is reasonable to expect that if the decision time results from Experiment 1 are an artifact of calculation time, they will not be replicated with this different response method. However, there is no reason to expect the pattern of ratings to differ from that of the bids.

Method

Subjects

Six of the subjects who took part in Experiment 1 could be located and agreed to participate in one additional session.

Table 3
Mean Absolute Adjusted Bid as a Function of the Mode of Presentation and Domain

                  Mode of presentation
Domain      Graphic   Numeric   Verbal      M
Gains        .512      .503      .534      .516
Losses       .568      .562      .582      .571
M            .540      .532      .558      .543

Table 5
Harmonic Mean Decision Time (in Seconds) as a Function of Mode of Presentation and Domain

                  Mode of presentation
Domain      Graphic   Verbal    Numeric     M
Gains        7.56      9.37     10.62      9.01
Losses       8.09     10.06     10.84      9.51
M            7.82      9.70     10.73      9.25

Table 6
Reliability of Ratings and Validity With Respect to Corresponding Bids

                  Mode of presentation
Domain      Graphic   Numeric   Verbal
Reliability
Gains         .86       .97       .90
Losses        .85       .90       .79
Combined      .86       .94       .85
Validity
Gains         .92       .92       .73
Losses        .56       .67       .56
Combined      .74       .79       .65

Figure 6. Harmonic mean decision time (in seconds) as a function of probability displayed and mode of presentation. (Horizontal axis: probability; separate curves for graphic, numeric, and verbal presentation.)

Procedure

The experiment was a partial replication of the second session of the first study. Subjects rated the attractiveness of gambles involving 11 probabilities, presented numerically, graphically, or verbally. The numeric and verbal equivalents of the graphic displays were those established in Stage 1 of the first experiment. Each of the 33 gambles (11 probabilities x 3 modes) was presented in a "win" and a "lose" context, and was replicated twice in each condition, with stakes of $0.80 and $1.25. Thus, each subject rated 132 gambles. The order of presentation was randomized. The subjects were instructed to rate each winning (losing) gamble according to its attractiveness (unattractiveness) on a scale ranging from 9 (extremely attractive or unattractive) to 1 (indifferent). The lotteries were presented on the screen with a response scale, and the subject moved the cursor to the appropriate location.

Results

Table 6 presents the reliability of the attractiveness ratings and their validity, the latter calculated by correlating the mean ratings with the mean bids obtained from the same subjects in Experiment 1. Note that all the ratings are highly reliable. The validity measures are lowest for the verbal presentation mode and highest for the numerical. Also, the results are more stable and consistent in the domain of gains.

Table 7 and Figure 7 display the mean attractiveness/unattractiveness of the lotteries in the various conditions. No significant effects of presentation mode, F(2, 10) = 1.47, p > .05, or domain of lottery, F(1, 5) = 1.3, p > .05, were detected in the ANOVA of the ratings. The subjects' ratings were monotonically related to the probabilities, F(10, 50) = 73.97, p < .05, but as in Experiment 1, the form of the relation was different in the two domains, as witnessed by the significant Probability x Domain interaction, F(10, 50) = 2.23, p < .05.

Finally, Table 8 presents the harmonic mean rating time as a function of the domain and mode of presentation. An ANOVA of the response rates failed to detect significant differences between the two domains, F(1, 5) = 1.38, p > .05, the three modes of presentation, F(2, 10) = 1.83, p > .05, or a significant interaction between the two. There was a significant effect of probability, F(10, 50) = 2.34, p < .05, that is shown in Figure 8. Rating was fastest at the probability of .50 and generally decreased monotonically in both directions.

General Discussion

This study addressed the distinct, but related, issues of (a) the relation between verbal and numerical judgments of uncertainty, as obtained in Stage 1, and (b) the effect of the mode of uncertainty expression on decision behavior, as obtained in Stage 2. We will first discuss the two issues separately and then conclude by integrating them.

Verbal Versus Numerical Judgments of Uncertainty

We evaluated and compared the two modes of judgment in terms of the three criteria of reliability, internal consistency, and construct validity. Although both modes reached satisfactory levels by all criteria, the numerical judgments consistently and significantly outperformed the verbal ones. Specifically, reliability was assessed in three ways: (a) the number of distinct responses given to a particular display, (b) the number of displays to which a unique response of each type was given, and (c) the within-subjects display variability for each of the two types of responses. On the basis of all three measures, the numerical mode was significantly superior to the verbal one.

Table 7
Mean Attractiveness or Unattractiveness Rating (1-9 Scale) as a Function of Mode of Presentation and Domain (Experiment 2)

                  Mode of presentation
Domain      Graphic   Numeric   Verbal      M
Gains        5.07      4.92      5.30      5.09
Losses       5.70      5.90      5.97      5.86
M            5.38      5.41      5.63      5.48


Figure 7. Mean rating (1-9 scale) as a function of lottery's probability and mode of presentation. (Horizontal axis: probability.)

This superiority of the numerical responses cannot be attributed to either the open-ended nature of the free method or to artificiality of the list method, because when the numerical responses were rounded to multiples of 5 (yielding 21 categories, comparable to the size of the phrase list), they still showed the greatest within-subjects reliability.

Evidence for internal consistency among the responses was obtained from the monotonicity analyses and indicated that subjects were consistent in both modes of response. Unfortunately, significance tests are not available to compare these statistics, but they were always higher for the numerical judgments.

Finally, construct validity was examined by regular correlational analysis. The adjusted bids in response to verbally and numerically presented uncertainties correlated highly with the display probabilities that initially elicited the verbal and numerical descriptors. Also, attractiveness ratings of the gambles correlated well with both the adjusted bids and the original probabilities. However, correlations in the numerical case were always significantly higher than in the verbal one (cf. Tables 2 and 6). In examining the construct validity of the verbal and numerical presentations, it is instructive to compare the correlations with their upper limit, namely, those

Table 8
Harmonic Mean Judgment Times as a Function of Mode of Presentation and Domain (Experiment 2)

                  Mode of presentation
Domain      Graphic   Verbal    Numeric     M
Gains        3.03      3.31      3.48      3.26
Losses       3.37      3.89      3.46      3.56
M            3.19      3.58      3.47      3.41

Figure 8. Harmonic mean rating time (in seconds) as a function of probability displayed and mode of presentation. (Horizontal axis: probability, .05 to .95; separate curves for graphic, numeric, and verbal presentation.)

obtained under the graphic presentation mode. If the judgment (numerical or verbal) introduces any noise and causes confusion, the former correlations will be lower than the latter. In fact, the numerical and graphic correlations are very similar and, in the domain of losses, are not significantly different. Correlations in the verbal condition were in all cases significantly lower than their graphic and numerical counterparts, indicating that this mode of judgment is associated with higher levels of noise and confusion.

To summarize, both modes of judging uncertainty yield reliable, internally consistent scales that demonstrated construct validity at the level of individual subjects. Although not surprising in the numerical mode, these results extend those of Budescu and Wallsten (1985) in the verbal mode, who showed only that individuals consistently rank order nonnumerical descriptors of certainty. However, all the comparisons favored the numerical over the verbal response mode, because of considerably and significantly smaller within- and between-subjects variability under the former mode. Thus, the experimental procedure yielded verbal and numerical descriptions of the graphic displays that were as similar as possible in central probability meaning for each subject but dissimilar in at least two other regards. Specifically, the phrases were more vague than the numbers for each subject; this is consistent with much previous research (Rapoport et al., 1987; Wallsten et al., 1986). Also, when considering those phrases and numbers used by multiple subjects, we found that between-subjects variability and therefore individual differences were much greater in the verbal mode.

Finally, it is noteworthy that, on average, subjects responded with over 13 phrases within each of the two verbal methods. Altogether, the 20 subjects generated 111 distinct expressions in response to 11 probability displays. Even after the "best" phrases were ultimately determined for each subject for each display, disagreement in choice of expression was extreme. Thus, it appears that our subjects had substantial working vocabularies of uncertainty, with nevertheless relatively little overlap among individuals. These results stand in marked contrast to those of Zimmer (1983), whose subjects' active lexicons for uncertainty seemed to contain five or six expressions each, with many phrases in common. Specifically, the 150 subjects in his experiment used a total of 12 verbal labels of probability. Our and his experiments differ in numerous ways, any of which might be responsible for the conflicting results.

Decision Behavior

Before discussing the various predictions, it is worth pointing out that when considering each of the presentation modes separately, essential features of the present data replicate results previously obtained with numerical lotteries. Specifically, within the graphic, numerical, and verbal presentation modes, subjects were more sensitive to losses than to gains, in the sense of offering more money to avoid a negative lottery than requiring to replace an equivalent positive one (Budescu & Weiss, 1987; Kahneman & Tversky, 1979), and decision times were longer for negative than for positive lotteries (Ben Zur & Breznitz, 1981).

However, when comparing behavior across modes, and considering Experiments 1 and 2 together, the data demonstrated neither of the expected patterns of results. There was no evidence of systematic avoidance of the verbal lotteries in conjunction with longer decision times and considerable sacrifice of financial gain, as predicted from the results of Ellsberg (1961) and others. Nor were there systematic preferences for such lotteries in conjunction with shorter decision times and enhancement of financial gain, as predicted from the idea (Zimmer, 1983, 1984) that verbally expressed uncertainties are preferred and more optimally processed than are their numerical counterparts.

Two features of the rating times in Experiment 2 demonstrate that the longer bidding times for the numerical than the other gambles in the first study were artifactual, probably because of calculation time. First, rating times for the three presentation modes were equal, suggesting no intrinsic difference in time to process the three kinds of lotteries. Second, the rating times were considerably shorter than the bidding times, probably reflecting the absence of calculations. Finally, it must be emphasized that subjects were not informed that decision times were to be collected, nor were they urged to respond quickly. Thus, before final conclusions are reached regarding the relative difficulty of processing uncertainty presented in the various modes, it is necessary to collect data under conditions that encourage quick responding.

Although neither set of predictions was supported, the relations among responses in the three modes are systematic, with three important conclusions emerging. First, the general pattern of bids and ratings was similar in all three cases (Figures 5 and 7). This result is especially surprising, given the vagueness of the phrases relative to the numbers and displays themselves, as documented by the judgments in Stage 1. Second, bids under verbal presentation were relatively less optimal than under the other two modes, as measured by expected earnings. Overall, the subjects would have earned about 24% less in the verbal than in either of the other two conditions (Table 4). Third, on the basis of the bids in the first experiment, the verbal lotteries were slightly but significantly and consistently preferred in the domain of gains, whereas the opposite was true in the domain of losses. This conclusion is derived not only from the group analyses, but it also holds for 16 of the 20 individual subjects. Ratings in Experiment 2 showed the same pattern, although the effect of presentation mode was not significant in this case.

This demonstration of preference for vagueness under a very general condition stands in marked contrast to the results of all other studies of which we are aware on the effects of vagueness on preference. Ellsberg (1961), Einhorn and Hogarth (1985), and Hogarth and Kunreuther (1984) all reported some individuals who preferred vague to precise lotteries. In addition, experimental conditions were manipulated in the latter two articles that encouraged vagueness preferences. Otherwise, the universal result has been vagueness avoidance. Indeed, except for the studies of Einhorn and Hogarth (1985) and Hogarth and Kunreuther (1984), all experiments used only positive lotteries, which is precisely where we found vagueness preference.

In trying to understand the pattern of bids obtained in the domain of gains, it is important to realize that the general vagueness avoidance hypothesis implies a context- or domain-specific effect; namely, a vague probability is given more weight or responded to as a higher chance in the context of a loss than of a gain. Thus, a lottery based on a verbal probability is valued less in the context of gains but more (negatively) in the context of losses than is a gamble based on a precise probability to which the verbal one was judged equivalent in Stage 1. The alternative hypothesis, that verbal probabilities are preferred to numerical or graphic ones, implies an opposite effect in the two domains. That is, gambles based on verbal probabilities are valued more in the domain of gains and less (negatively) in the domain of losses. In contrast, our data indicate that the subjects attached more extreme values to the verbal gambles than to the precise ones in both domains.

The most obvious difference between this study and all the others is that in the present case the vague uncertainties were represented linguistically. Also, we used a bidding task, as did Becker and Brownson (1964); all other studies used a pairwise choice procedure. Because Becker and Brownson's results are similar to everyone else's, we are inclined to think that the important feature is the manner in which the vagueness is expressed. However, the relative contributions of the two differences must be explored carefully, especially in light of the well-documented effects of response mode and framing on preferences (Goldstein & Einhorn, 1987; Slovic & Lichtenstein, 1983; Tversky & Kahneman, 1981).

Integrating the Stage 1 and 2 Results: The v-n Model

The primary puzzle in the present data is that the Stage 1 data clearly showed that the phrases are more vague than the numbers or the displays, but nevertheless the general patterns of bids, ratings, and times were similar in all cases (except for the numerical bid times, presumably because of calculations). Despite the overall similarity, the verbal gambles were responded to less optimally, in that earnings from them were 24% less than from the other two types, because verbal gambles were valued more positively in the case of gains and more negatively in the case of losses than were the others.

A theory of judgment and choice on the basis of linguistic uncertainties (represented by the v-n model) has been developed to explain these and numerous other results (Wallsten, Budescu, & Erev, in press) and will be sketched here insofar as it pertains to the present data.

The first assumption is that probability phrases are vague in the sense that they describe some probabilities very well, some not at all, and some to an intermediate degree. The exact probabilities represented by a phrase, and the degree of vagueness, vary over individuals. Thus, probability phrases are "linguistic variables" (Zadeh, 1974) that can be represented by membership functions over the (0, 1) interval of probabilities, as illustrated in Figure 9. The ordinate of such functions, μ, denotes the degree of membership of a given probability in a particular vague phrase for a given person. Alternatively, μ for a phrase W can be thought of as the truth value of the statement, "The probability p is described by the phrase W," bounded by 0 (absolutely false) and 1 (absolutely true). Wallsten et al. (1986) and Rapoport et al. (1987) discuss properties of these functions in detail and have also developed methods of empirically establishing them in reliable and valid ways in the context of the representation of vagueness.

Because phrase meanings are vague and overlap, as illustrated by Wallsten et al. (1986) and in Figure 9, they were applied in a relatively inconsistent way to the probabilities presented in Stage 1. However, in Stage 2 it was absolutely necessary to resolve the vagueness in order to provide a bid or a rating. We theorized that this was accomplished by focusing on a restricted interval of the most representative probabilities and by sampling from it in some way to achieve a single-valued representation of the phrase, from which a bid (rating) was formed. Thus, subjects considered only probabilities with memberships above a certain task-specific threshold, ν. Figure 9 presents membership functions of four hypothetical words, W1-W4, and three possible thresholds, ν1-ν3. Note that the threshold employed determines the degree of effective overlap among the four words, and the width of the four effective probability intervals. For example, if one uses ν1, he or she is faced with four relatively wide and highly overlapping intervals. On the other hand, ν3 yields four narrow and nonoverlapping intervals. This amounts to the assumption that in the context of a given task, the subject can unambiguously rank order the four words. Thus, the general similarity of the bids and ratings in the three presentation modes can be explained by assuming a relatively high threshold, yielding a stable ordering of the words.
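
The dependence of interval width and overlap on the threshold can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration: the triangular membership functions, the four made-up words, the two threshold values, and the helper names `triangular`, `threshold_interval`, and `adjacent_overlaps` are hypothetical and are not taken from the data or from Figure 9. Membership "above the threshold" is read here as "at least the threshold".

```python
from typing import Callable, List, Optional, Tuple

Membership = Callable[[float], float]
Interval = Tuple[float, float]


def triangular(left: float, peak: float, right: float) -> Membership:
    """Hypothetical single-peaked membership function (an assumption, not fitted data)."""
    def mu(p: float) -> float:
        if p <= left or p >= right:
            return 0.0
        if p <= peak:
            return (p - left) / (peak - left)
        return (right - p) / (right - peak)
    return mu


def threshold_interval(mu: Membership, nu: float, steps: int = 1000) -> Optional[Interval]:
    """Lowest and highest grid probabilities whose membership is at least nu."""
    kept = [i / steps for i in range(steps + 1) if mu(i / steps) >= nu]
    return (kept[0], kept[-1]) if kept else None


def adjacent_overlaps(words: List[Membership], nu: float) -> int:
    """Count neighbouring word pairs whose effective intervals share probabilities."""
    cuts = [threshold_interval(mu, nu) for mu in words]
    count = 0
    for a, b in zip(cuts, cuts[1:]):
        if a is not None and b is not None and a[0] <= b[1] and b[0] <= a[1]:
            count += 1
    return count


# Four made-up words ordered from low to high probability.
WORDS = [
    triangular(0.00, 0.20, 0.50),
    triangular(0.15, 0.40, 0.70),
    triangular(0.30, 0.60, 0.85),
    triangular(0.50, 0.80, 1.00),
]

for nu in (0.2, 0.9):  # a low and a high threshold
    cuts = [threshold_interval(mu, nu) for mu in WORDS]
    print(nu, cuts, adjacent_overlaps(WORDS, nu))
```

With these illustrative functions, the low threshold leaves every neighbouring pair of words overlapping, whereas the high threshold leaves narrow, disjoint intervals that can be rank ordered without ambiguity.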

Figure 9. Hypothetical membership function for four words (W1-W4) and three ν-cut levels (ν1-ν3). Abscissa: probability.

Once a threshold is determined, the subject must select a point value in the interval to represent the phrase. Many sampling rules can be invoked; data from a choice experiment (Wallsten et al., in press) support a model in which probabilities are assigned a sampling weight equal to their relative membership value above the threshold. Given this model, the overweighting of high probabilities implies a strong prediction regarding the shape of the membership functions, namely, that they be generally positively skewed above the threshold.
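
One possible reading of this sampling rule is sketched below. It assumes that each probability whose membership is at least the threshold receives a weight proportional to its membership value, normalized over the retained interval; other readings (for example, weighting by the excess of membership over the threshold) are equally conceivable. The membership function, the threshold of 0.5, and the names `skewed_membership`, `sampling_weights`, `expected_probability`, and `sample_probability` are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Membership = Callable[[float], float]


def skewed_membership(p: float) -> float:
    """Hypothetical positively skewed membership: steep rise from 0.5 to a
    peak at 0.6, then a slow decline to 0.95 (an assumption, not fitted data)."""
    left, peak, right = 0.5, 0.6, 0.95
    if p <= left or p >= right:
        return 0.0
    if p <= peak:
        return (p - left) / (peak - left)
    return (right - p) / (right - peak)


def sampling_weights(mu: Membership, nu: float, steps: int = 1000) -> List[Tuple[float, float]]:
    """(probability, weight) pairs for grid points with membership >= nu.
    Weights are proportional to membership and sum to 1 (assumed rule)."""
    kept = [(i / steps, mu(i / steps)) for i in range(steps + 1) if mu(i / steps) >= nu]
    if not kept:
        raise ValueError("no probability reaches the threshold")
    total = sum(m for _, m in kept)
    return [(p, m / total) for p, m in kept]


def expected_probability(weights: List[Tuple[float, float]]) -> float:
    """Mean of the sampled point value under the weights."""
    return sum(p * w for p, w in weights)


def sample_probability(weights: List[Tuple[float, float]], rng: random.Random) -> float:
    """Draw one point value to stand for the phrase."""
    probs, ws = zip(*weights)
    return rng.choices(probs, weights=ws, k=1)[0]


weights = sampling_weights(skewed_membership, nu=0.5)
rng = random.Random(0)  # fixed seed so the illustration is repeatable
print(expected_probability(weights), sample_probability(weights, rng))
```

Under these assumptions, the mean sampled value lies above the peak of the membership function, which is one way in which positive skew above the threshold could translate into relatively heavy weighting of high probabilities.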

This ν-μ model serves as a general conceptual framework that provides one possible explanation for all aspects of the present results. Obviously, additional experiments using verbal probabilities with known and established membership functions are necessary to fully test this theory.

References

Becker, S. W., & Brownson, F. O. (1964). What price ambiguity? Or the role of ambiguity in decision making. Journal of Political Economy, 72, 62-73.

Becker, G. M., De Groot, M. H., & Marschak, J. (1964). Measuring utility by a single response sequential method. Behavioral Science, 9, 226-232.

Ben Zur, H., & Breznitz, S. J. (1981). The effect of time pressure on risky choice behavior. Acta Psychologica, 47, 89-104.

Beyth-Marom, R. (1982). How probable is probable? Numerical translation of verbal probability expressions. Journal of Forecasting, 1, 257-269.

Black, M. (1937). Vagueness. Philosophy of Science, 4, 427-455.

Budescu, D. V., & Wallsten, T. S. (1985). Consistency in interpretation of probabilistic phrases. Organizational Behavior and Human Decision Processes, 36, 391-405.

Budescu, D. V., & Weiss, W. (1987). Reflection of transitive and intransitive preferences: A test of prospect theory. Organizational Behavior and Human Decision Processes, 39, 184-202.

Curley, S. P., & Yates, J. F. (1985). The center and range of the probability interval as factors affecting ambiguity preferences. Organizational Behavior and Human Decision Processes, 36, 273-287.

Einhorn, H. J., & Hogarth, R. M. (1985). Ambiguity and uncertainty in probabilistic inference. Psychological Review, 92, 433-461.

Ellsberg, D. (1961). Risk, ambiguity and the Savage axioms. Quarterly Journal of Economics, 75, 643-669.

Foley, B. J. (1959). The expression of certainty. American Journal of Psychology, 72, 614-615.

Goldstein, W. M., & Einhorn, H. (1987). Expression theory and the preference reversal phenomenon. Psychological Review, 94, 236-254.

Guilford, J. P., & Fruchter, B. (1978). Fundamental statistics in psychology and education (6th ed.). New York: McGraw-Hill.

Hacking, I. (1975). The emergence of probability. Cambridge, MA: Cambridge University Press.

Hakel, M. (1968). How often is often? American Psychologist, 23, 533-534.

Handa, J. (1977). Risk, probabilities, and a new theory of cardinal utility. Journal of Political Economy, 85, 97-122.

Hogarth, R. M., & Kunreuther, H. C. (1984). Risk ambiguity and insurance. Unpublished manuscript, University of Chicago, Graduate School of Business, Center for Decision Research.

Jamieson, D. G., & Petrusic, W. M. (1977). Preference and the time to choose. Organizational Behavior and Human Performance, 19, 56-67.

Johnson, E. M. (1973). Encoding of qualitative expressions of uncertainty (Tech. Paper 250). Arlington, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.

Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263-291.

Karmarkar, U. S. (1978). Subjectively weighted utility: A descriptive extension of the expected utility model. Organizational Behavior and Human Performance, 21, 61-72.

Kenney, R. M. (1981). Between never and always. New England Journal of Medicine, 305, 1097-1098.

Kirk, R. E. (1982). Experimental design (2nd ed.). Belmont, CA: Brooks/Cole.

Larson, J. R. (1980). Exploring the external validity of a subjectively weighted utility model of decision making. Organizational Behavior and Human Performance, 26, 293-304.

Lichtenstein, S., & Newman, J. R. (1967). Empirical scaling of common verbal phrases associated with numerical probabilities. Psychonomic Science, 9, 563-564.

Merriam-Webster. (1984). Webster's Ninth New Collegiate Dictionary. Springfield, MA: Author.

Nakao, M. A., & Axelrod, S. (1983). Numbers are better than words: Verbal specifications of frequency have no place in medicine. The American Journal of Medicine, 74, 1061-1065.

Oden, G. (1986, November). Discussion at the symposium on "The Representation and Role of Ambiguity in Judgment and Individual Decision Making" at the annual meeting of the Judgment/Decision Making Society, New Orleans.

Payne, J. W. (1976). Task complexity and contingent processing in decision making: An information search and protocol analysis. Organizational Behavior and Human Performance, 16, 366-387.

Quiggin, J. (1982). A theory of anticipated utility. Journal of Economic Behavior and Organization, 3, 323-343.

Rapoport, A., Wallsten, T. S., & Cox, J. A. (1987). Direct and indirect scaling of membership functions of probability phrases. Mathematical Modelling, 9, 397-417.

Russo, J. E., & Dosher, B. A. (1983). Strategies of multiattribute binary choice. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 676-696.

Savage, L. J. (1954). The foundations of statistics. New York: Wiley.

Shanteau, J. (1974). Component processes in risk decision making. Journal of Experimental Psychology, 103, 680-691.

Simpson, R. H. (1944). The specific meanings of certain terms indicating differing degrees of frequency. Quarterly Journal of Speech, 30, 328-330.

Simpson, R. H. (1963). Stability in meanings for quantitative terms: A comparison over 20 years. Quarterly Journal of Speech, 49, 146-151.

Slovic, P., & Lichtenstein, S. (1983). Preference reversals: A broader perspective. American Economic Review, 73, 596-605.

Smith, J. E. K. (1976). Data transformations in analysis of variance. Journal of Verbal Learning and Verbal Behavior, 15, 339-346.

Tversky, A. (1967). Additivity, utility, and subjective probability. Journal of Mathematical Psychology, 4, 175-201.

Tversky, A. (1969). Intransitivity of preferences. Psychological Review, 76, 31-48.

Tversky, A., & Kahneman, D. (1981). The framing of decisions and the psychology of choice. Science, 211, 453-458.

Tversky, A., Sattath, S., & Slovic, P. (1987). Contingent weighting in judgment and choice. Unpublished paper, Stanford University, Stanford, CA.

Wainer, H. (1977). Speed vs. reaction time as a measure of cognitive performance. Memory & Cognition, 5, 278-280.

Wallsten, T. S., & Budescu, D. V. (1983). Encoding subjective probabilities: A psychological and psychometric review. Management Science, 29, 151-173.

Wallsten, T. S., Budescu, D. V., & Erev, I. (in press). Understanding and using linguistic uncertainties. Acta Psychologica.

Wallsten, T. S., Budescu, D. V., Rapoport, A., Zwick, R., & Forsyth, B. (1986). Measuring the vague meanings of probability terms. Journal of Experimental Psychology: General, 115, 348-365.

Yates, J. F., & Zukowski, L. G. (1976). Characterization of ambiguity in decision making. Behavioral Science, 21, 19-25.

Zadeh, L. A. (1974). The concept of a linguistic variable and its application to approximate reasoning. In K. S. Fu & J. T. Tou (Eds.), Learning systems and intelligent robots (pp. 1-10). New York: Plenum Press.

Zimmer, A. C. (1983). Verbal vs. numerical processing of subjective probabilities. In R. W. Scholz (Ed.), Decision making under uncertainty (pp. 159-182). Amsterdam: North-Holland.

Zimmer, A. C. (1984). A model for the interpretation of verbal predictions. International Journal of Man-Machine Studies, 20, 121-134.

Zwick, R. (1987). Combining stochastic uncertainty and linguistic inexactness: Theory and experimental evaluation. Unpublished doctoral dissertation, University of North Carolina at Chapel Hill.

Received September 15, 1986
Revision received August 19, 1987
Accepted September 1, 1987
