Loci of contextual effects in judgment

20
Journal of Experimental Psychology: Human Perception and Performance 1982, Vol. 8, No. 4, 582-601 Copyright 1982 by the American Psychological Association, Inc. 0096-1523/82/0804-0582S00.75 Loci of Contextual Effects in Judgment Barbara A. Mellers University of California, Berkeley Michael H. Birnbaum University of Illinois at Urbana-Champaign Three experiments investigated the loci of contextual effects in judgment, Ex- periment 1 demonstrated the effect of stimulus spacing on category ratings and magnitude estimations of the darkness of dot patterns. Variations in the stimulus spacing were shown to affect both category ratings and magnitude estimations in a similar fashion. Experiment 2 was designed to determine whether contextual effects due to stimulus spacing influence the scale values or the judgment function. Subjects judged "differences" and "ratios" of the subjective darkness of dot patterns. Differences in mean judgments of single stimuli from Experiment 1 did not predict the rank order of judged "differences" and "ratios" from Experiment 2. The estimated scale values of the stimuli appeared to be independent of stim- ulus spacing. These findings suggest that contextual effects due to the stimulus spacing occur in the judgment function for within-modality judgments. Exper- iment 3 examined contextual effects in cross-modality judgments. Stimulus spac- ing and stimulus range were manipulated for "difference" and "total" judgments. Unlike the within-modality results, the stimulus range and spacing influenced the scale values. A contextual theory of within- and cross-modality judgment is presented. It is now well established that judgments are relative. That is, the response to a stim- ulus depends not only on the stimulus to be judged, but also on other stimuli that form a context, or frame of reference for judg- ment (Birnbaum, 1974b, 1978; Helson, 1964; Johnson & Mullally, 1969; Parducci, 1963, 1968, 1974, 1982; Poulton, 1968; Restle & Greeno, 1970). The term context refers to such factors as the stimulus range, stimulus spacing, frequency of stimulus presentation, response range, and other experimental de- tails that influence judgment. What is not so well established is how to deal with contextual effects (Anderson, 1975; Birnbaum, 1974b, 1978; Poulton, 1979). Some authors have regarded contextual ef- fects as sources of error, noise, confusion, or bias. Stevens (1971), for example, remarked that they "rate no better than a nuisance" Coauthorship is equal. This research was supported by the Research Board of the University of Illinois at Urbana-Champaign. We thank Jerome R. Busemeyer for discussion of an earlier draft. Requests for reprints should be sent to either Barbara A. Mellers, Department of Psychology, University of California, Berkeley, California 94720 or Michael H. Birnbaum, Department of Psychology, University of Illinois, 603 East Daniel, Champaign, Illinois 61820. and are a "diversion from the basic business of sorting out the fundamental principles" (p. 448). Poulton (1979) and Anderson (1982) contended that contextual effects can and should be "avoided," though their rec- ommendations for procedures that are sup- posed to accomplish this goal differ. Others have argued that contextual effects are both lawful and necessary for a complete psychophysical theory. This approach at- tempts to develop theories that account for judgments in a wide variety of contexts (Birnbaum, 1974b, 1982a; Parducci, 1974, 1982; Birnbaum, Parducci, & Gifford, 1971). The purpose of this paper is to explore con- textual effects in within-modality and cross- modality comparison and combination tasks. Outline of Judgment To facilitate discussion of possible loci of contextual effects, it is useful to represent judgment as a composition of functions. In single-stimulus judgments, such as category ratings or magnitude estimations, subjective values (s) of the stimuli are related to phys- ical values by the psychophysical function, s = H(4>). The output function, /, translates subjective values to overt responses. The function relating responses to stimuli is the 582

Transcript of Loci of contextual effects in judgment

Journal of Experimental Psychology:Human Perception and Performance1982, Vol. 8, No. 4, 582-601

Copyright 1982 by the American Psychological Association, Inc.0096-1523/82/0804-0582S00.75

Loci of Contextual Effects in Judgment

Barbara A. MellersUniversity of California, Berkeley

Michael H. BirnbaumUniversity of Illinois at Urbana-Champaign

Three experiments investigated the loci of contextual effects in judgment, Ex-periment 1 demonstrated the effect of stimulus spacing on category ratings andmagnitude estimations of the darkness of dot patterns. Variations in the stimulusspacing were shown to affect both category ratings and magnitude estimationsin a similar fashion. Experiment 2 was designed to determine whether contextualeffects due to stimulus spacing influence the scale values or the judgment function.Subjects judged "differences" and "ratios" of the subjective darkness of dotpatterns. Differences in mean judgments of single stimuli from Experiment 1 didnot predict the rank order of judged "differences" and "ratios" from Experiment2. The estimated scale values of the stimuli appeared to be independent of stim-ulus spacing. These findings suggest that contextual effects due to the stimulusspacing occur in the judgment function for within-modality judgments. Exper-iment 3 examined contextual effects in cross-modality judgments. Stimulus spac-ing and stimulus range were manipulated for "difference" and "total" judgments.Unlike the within-modality results, the stimulus range and spacing influencedthe scale values. A contextual theory of within- and cross-modality judgment ispresented.

It is now well established that judgmentsare relative. That is, the response to a stim-ulus depends not only on the stimulus to bejudged, but also on other stimuli that forma context, or frame of reference for judg-ment (Birnbaum, 1974b, 1978; Helson, 1964;Johnson & Mullally, 1969; Parducci, 1963,1968, 1974, 1982; Poulton, 1968; Restle &Greeno, 1970). The term context refers tosuch factors as the stimulus range, stimulusspacing, frequency of stimulus presentation,response range, and other experimental de-tails that influence judgment.

What is not so well established is how todeal with contextual effects (Anderson, 1975;Birnbaum, 1974b, 1978; Poulton, 1979).Some authors have regarded contextual ef-fects as sources of error, noise, confusion, orbias. Stevens (1971), for example, remarkedthat they "rate no better than a nuisance"

Coauthorship is equal.This research was supported by the Research Board

of the University of Illinois at Urbana-Champaign. Wethank Jerome R. Busemeyer for discussion of an earlierdraft.

Requests for reprints should be sent to either BarbaraA. Mellers, Department of Psychology, University ofCalifornia, Berkeley, California 94720 or Michael H.Birnbaum, Department of Psychology, University ofIllinois, 603 East Daniel, Champaign, Illinois 61820.

and are a "diversion from the basic businessof sorting out the fundamental principles"(p. 448). Poulton (1979) and Anderson(1982) contended that contextual effects canand should be "avoided," though their rec-ommendations for procedures that are sup-posed to accomplish this goal differ.

Others have argued that contextual effectsare both lawful and necessary for a completepsychophysical theory. This approach at-tempts to develop theories that account forjudgments in a wide variety of contexts(Birnbaum, 1974b, 1982a; Parducci, 1974,1982; Birnbaum, Parducci, & Gifford, 1971).The purpose of this paper is to explore con-textual effects in within-modality and cross-modality comparison and combination tasks.

Outline of Judgment

To facilitate discussion of possible loci ofcontextual effects, it is useful to representjudgment as a composition of functions. Insingle-stimulus judgments, such as categoryratings or magnitude estimations, subjectivevalues (s) of the stimuli are related to phys-ical values by the psychophysical function,s = H(4>). The output function, /, translatessubjective values to overt responses. Thefunction relating responses to stimuli is the

582

LOCI OF CONTEXTUAL EFFECTS 583

composition, R = J[H(<j>)]. Judgments basedon multiple stimuli (judgments of "differ-ences," "ratios," or "total intensities") canbe represented as the composition of threefunctions (Birnbaum, 1974a, 1978), as inFigure I.1

Contextual effects due to manipulation ofthe stimuli that form the frame of refererieecan, in principle, influence any or all of thesefunctions. Identifying the loci of contextualeffects amounts to pinpointing contextualeffects in terms of these functions. A com-plete psychophysical theory should not onlyidentify the loci of contextual effects but suc-cessfully predict changes in the function (orfunctions) influenced by manipulation of thecontext.

Range-Frequency Theory

One contextual theory for predicting cat-egory ratings of single stimuli as a functionof the stimulus distribution is range-fre-quency theory (Parducci, 1965; 1982; Par-ducci & Perrett, 1971). This theory assumesthat the response is a compromise betweentwo tendencies: (a) a tendency to divide thestimulus range into equal subranges; and(b) a tendency to use categories along theresponse scale with equal frequency. A gen-eral form of range-frequency theory can bewritten:

Gtk = akFk(s,) + ftks, + yk, (1)

where Gik is the response to stimulus / incontext k; st is the subjective value of thestimulus; Fk is the cumulative stimulus dis-tribution for context k (i.e., Fk[st] is linearlyrelated to the rank of stimulus / in contextk)\ ak and ftk are the weights that dependon the response range, stimulus range, andweights of the two tendencies, and yk is anadditive constant that depends on the re-sponse scale. If ak is zero, then the responsewill be linearly related to the subjective valueof the stimulus (range principle). On theother hand, if f)k is zero, then the responsewill be linearly related to the stimulus rank(frequency principle). Range-frequency the-ory has been successful in a wide variety ofjudgment domains (Birnbaum, 1974b; Par-ducci, 1963; 1974; Parducci & Perrett,

Outline of Psychological MeasurementPhysical_w,Values

ScaleValues-C- Jntegrated ,

Impression.OvertResponse

Figure 1. Outline of notation. (Subjective scale valuesof the stimuli, s, and st are combined [or compared] bythe function, \ftij = C[s,, fy], and transformed to an overtresponse by the strictly monotonic judgment function,Rv = . From Birnbaum, 1978.)

1971). Range-frequency theory may applyto contextual effects in the psychophysicalfunction or the judgment function (H or Jin Figure 1).

Overview

The purpose of this article is to identifythe loci of contextual effects in within-modaland cross-modal judgments. All three ex-periments use the same stimuli: dot patterns.The stimulus levels (in number of dots) werespaced according to either a positively ornegatively skewed distribution in which sixdot patterns were common to all distribu-tions and to all three experiments. Judg-ments or inferred scale values of these com-mon stimuli are affected by the other stimuliin their distribution; this influence is termeda contextual effect.

In Experiment 1, judges rated or esti-mated the "darkness" of single-dot patternsusing category ratings or magnitude esti-mations. In Experiment 2, judges evaluated"ratios" and "differences" in darkness be-tween each pair of stimuli. In Experiment3, judges made cross-modality comparisonsand combinations of dot patterns with circlesthat varied in diameter. They were asked tojudge the "difference" between the darknessof each dot pattern and the size of each circleand the "total intensity" of each pair.

1 Quotation marks are used to designate the task givento the subject or responses given by the subject. Quo-tation marks are not used for models or theoretical state-ments. Thus "ratios" and "differences" refer to judg-ments of stimulus pairs, which may or may not fit ratioand difference models. Quotations are not used for com-puted differences in judgment, which are actual differ-ences between two judgments.

584 BARBARA A. MELLERS AND MICHAEL H. BIRNBAUM

Experiment 1 investigates the effects ofthe stimulus spacing and the response pro-cedure. In Experiment 1, contextual effectscan be attributed to the composition of twofunctions, /[#(</>)]. With single stimulusjudgments, it is impossible to unconfoundcontextual changes in the scale values (H)from changes in the judgment function (/).

Experiment 2 separates the effects of thestimulus spacing on H and J by examiningtwo-stimulus "ratio" and "difference" judg-ments. Previous research (Birnbaum, 1978,1980) concluded that the subtractive oper-ation underlies both "ratio" and "differ-ence" judgments. An extension of the sub-tractive theory which allows variations in thestimulus spacing to affect both the scalevalues and the judgment function, can bewritten:

Ryk ~ Jf(.Sjk ~ slk), (2)

where RIJk and Dtjk represent "ratio" and"difference" judgments, respectively; J% andJk are judgment functions that depend onthe response procedure and the distributionof stimuli, k; slk and sjk are the estimatedscale values for stimulus i and j that alsodepend on the stimulus distribution, where[slk = Hk(<l)j)]. Experiment 2 investigatesEquations 2 and 3 as well as two specialcases, in which contextual effects influenceonly the scale values or only the judgmentfunctions. These special cases make differentpredictions that will be tested in Experi-ment 2.

Experiment 3 examines the effects of thestimulus range and spacing on cross-modaljudgments. Subjects are asked to compareand combine the subjective darkness of dotpatterns with the subjective size of circles.A general theory that asserts that contextualeffects both precede (occur in H) and follow(i.e., in J) stimulus comparison can be ex-pressed by the equations

Ttjk = J'k(Cj + slk), (4)

Dt]k = Jk(cj ~ %), (5)

where Tijk and Dyk are the "total intensity"and "difference" judgments, respectively;J'k and Jk are strictly monotonic judgmentfunctions; sik and c, are the scale values of

the ith dot pattern and the /th circle size incontext k. (Because the same distribution ofcircle sizes is used throughout, the circlescale values do not have a subscript for con-text.) Special cases of Equations 4 and 5 thatassume that the stimulus range and spacingoperate entirely on H or / make differentialpredictions that will be tested in Experi-ment 3.

The major purpose of these experimentsis to test alternative theories of the loci ofcontextual effects in which the influence ofsuch variables as stimulus spacing can beattributed to different functions in Figure 1.It will be shown that the effects of contextcan be attributed to the J functions in Ex-periments 1 and 2. However, in the cross-modality tasks of Experiment 3 there wasevidence that scale values depend on stim-ulus spacing and range.

Experiment 1: Contextual Effects in"Direct" Scaling

Traditionally, the two most popular "di-rect" methods for scaling have been categoryratings and magnitude estimations. Magni-tude estimations are usually found to be apositively accelerated function of categoryratings (Stevens, 1966; Stevens & Galanter,1957; Torgerfcon, 1961; Eisler, 1963; Marks,1974, 1979). The nonlinear relationship be-tween the two procedures has been a long-standing puzzle.

One theory is that judgments of stimulican be regarded as "direct" measurementsof subjective value. This theory ignores the/ function in Figure 1 and operationally de-fines scale values as judgments:

tk - sik ,

tk - s*k,

(6)

(7)

where Gik and Mik are category ratings andmagnitude estimations of stimulus i in con-text k, slk and sfk are the subjective values,in which a nonlinear transformation, s,k =T(sfk), relates the two scales.

An alternative representation of thesejudgments (Birnbaum, 1978; 1982a) is asfollows:

Glk = Jk(s,\ (8)

Mik = (9)

LOCI OF CONTEXTUAL EFFECTS 585

POSITIVELY SKEWED CONTEXT NEGATIVELY SKEWED CONTEXT

9 '; • • 10 •'•*V'.- i i '•,'•.'.• •• • • • • • •

Figure 2. Two stimulus distributions. (Patterns 9, 11,6, 10, 1, and 7 are identical in both contexts; thesestimuli have 12, 18, 27, 40, 60, and 90 dots, respectively.)

where st is assumed to be a sensory scalevalue, independent of the stimulus range,stimulus spacing, or the response scale; Jkand 7f are assumed to be strictly monotonicjudgment functions whose exact functionalform depends lawfully on the stimulus range,stimulus spacing, and response continuum.

There are three major purposes for Ex-periment 1. First, the experiment determinesthe magnitude of the contextual effect dueto stimulus spacing using the same stimuliand general procedure that will be used inExperiment 2. Predictions for Experiment2 will be calculated based on results of Ex-periment 1. The second purpose of Experi-ment 1 is to examine the effects of stimulusspacing and response range for both categoryratings and magnitude estimations. Thesame stimuli and procedures are used to de-termine whether the effects are comparablefor the two tasks. A third purpose of Ex-periment 1 is to investigate the consequencesof comparing magnitude estimations be-tween 'groups of subjects who experiencedifferent contexts. Equations 8 and 9 implythat these judgments are not necessarily anordinal scale of subjective value when com-pared across contexts.

MethodObservers were asked to judge the subjective darkness

of dot patterns using either category ratings or mag-nitude estimations. Four anchored category-rating con-ditions were produced by a factorial design of two re-sponse ranges combined with two stimulus distributions.In two other category-rating conditions, the largest andsmallest stimuli were not anchored to the endpoints ofthe response scale, to examine the effect of anchoring.

Twelve magnitude-estimation conditions were producedby a factorial combination of two stimulus distributions,three standards, and two response ranges. Different sub-jects served in e^ch of the 18 conditions, which werecarried out over a two-year period as separate experi-ments with the same stimuli.

Stimuli. The stimuli were 25-mm squares containingirregular patterns of 1-mm solid dots. The judges re-ceived either a positively skewed or a negatively skeweddistribution of dot patterns, as shown in the left andright of Figure 2, respectively. Six stimuli, common toboth distributions, contained 12, 18, 27, 40, 60, and 90dots (patterns numbered 9, 11, 6,10, 1, and 7 in Figure2, respectively). In the positively skewed distribution,five additional (context) stimuli had 14, IS, 16, 21, and23 dots. In the negatively skewed distribution the fivecontext stimuli had 47, 51, 70, 74, and 77 dots.

Each subject received a 22 cm X 28 cm sheet on whichthe 11 stimuli were printed in the format shown on eitherside of Figure 2, except that there were no titles orborder, and each stimulus was identified by a letter in-stead of a number. Thus, all of the stimuli for one con-text were simultaneously available during the judg-ments.2 The common stimuli appear in the same positionsin both contexts so that position in the arrangement andstimulus rank would be unconfounded.

Category rating conditions. The positively and neg-atively skewed stimulus distributions were factoriallycombined with three response scales. In two of the con-ditions, judges were asked to use integers from 1 to 5where 1 = very light, 2 = light, 3 = medium, 4 = dark,and 5 = very dark. In two other conditions, the ratingscale went from 1 to 100 (where 1 = very light and100 = very dark). Rating scales were anchored by theinstructions to call the lightest dot pattern 1 and thedarkest dot pattern either 5 or 100. In two other con-

2Parducci (1963) found that contextual effects aresimilar for situations in which the stimuli are presentedsuccessively or simultaneously. Birnbaum (1978,1982a)found similar results for the two procedures for judg-ments of "ratios" and "differences." In pilot work, weobtained similar results to our Experiment 2 with suc-cessive presentation of stimulus pairs.

586 BARBARA A. MELLERS AND MICHAEL H. BIRNBAUM

40 60 90

Stimulus Value1 (Log Scale)

Figure 3. Contextual effects in category ratings. (Meanratings of the common stimuli are plotted against stim-ulus values, which are spaced in equal log steps. Dashedcurves show ratings for the positively skewed stimulusdistribution [left of Figure 2]; solid curves are from thenegatively skewed distribution [right of Figure 2]. Up-per panel shows results for 1-100 rating scale; lowerpanel for 1-5 rating scale. Brackets show plus and minusone standard error.)

ditions (positively and negatively skewed stimulus dis-tributions), the 1-5 scale was used, but the categorylabels were not anchored to the stimuli.

The subjects repeated the experiment in all of theconditions, and only the second set of responses was usedin the analyses.

Magnitude estimation conditions. The stimuli, dis-tributions, and procedures were the same as those forthe category-rating tasks except that judges were askedto make magnitude estimations. Positively and nega-tively skewed stimulus distributions were factoriallycombined with two ranges of response examples andthree standards (12, 18, and 27 dots), yielding 12 con-ditions.

Judges were asked to estimate the "ratio" of eachstimulus to the standard, using a modulus of 100. Oneset of instructions used the following examples: 33 = Was dark as the standard dot pattern, 40 = % as dark,50 = ft as dark, 67 = ̂ as dark, 100 = equal in darknessto the standard, 150 = 1V4 times as dark, 200 = 2 timesas dark, 250 = 2!4 times as dark, and 300 = 3 timesas dark.

In the other instructions, the examples read: 1 1 = %as dark as the standard, 14 = '/7 as dark, 20 = % asdark, 33 = W as dark, 100 = equal in darkness to the

standard, 300 = 3 times as dark, 500 = 5 times as dark,700 = 7 times as dark, and 900 = 9 times as dark. Whenthe 12-dot pattern (lightest stimulus) was used as thestandard, the examples below 100 were omitted.

In all of the magnitude estimation conditions, judgeswere encouraged to use any numbers, in between ormore extreme than the examples, to indicate the "ratioof the darkness of each dot pattern to the standard."

Subjects. The judges were 883 undergraduates.3

In the category-rating conditions, there were 134judges who used the 5-point anchored scale, 228 withunanchored 5-point scale, and 158 with the 100-pointanchored scale. About half of each group received thepositively or negatively skewed distributions.

In the magnitude estimation conditions, there were363 subjects with 22 to 41 judges in each of the 12conditions.

Results and Discussion

Category Ratings. Mean judgments forthe anchored category-rating tasks are shownin Figure 3. The upper panel shows meanratings of the common stimuli in the an-chored 1-100 condition; the lower panelshows the results for the anchored 1-5 con-dition. The two curves within each panelshow that the mean ratings of the same stim-uli can be either positively or negatively ac-celerated relative to log <£, depending on thestimulus spacing. Brackets represent plusand minus one standard error for each mean.The general shape of the curves is consistentwith Parducci's range-frequency theory(Equation 1). Data for the 1-5 unanchoredscale were similar to those for the anchoredscale.

Parducci (1982) has recently found thatthe magnitude of the contextual effect dueto the stimulus distribution, decreases withincreasing number of response categoriesand that it increases as a function of thenumber of distinct stimulus levels. Using a100-point scale, Parducci (1982) found asmall contextual effect due to variation ofthe relative frequency with which 5 stimuli

3 The research participants in Experiments 1-3 were1,123 undergraduates at the University of Illinois atUrbana-Champaign, who were tested alone or in smallgroups and who received credit in lower division psy-chology courses. An additional 51 were tested who eitherfailed to follow instructions or to complete the tasks intime. Of these, 33 students (most of whom failed tocomplete one of the two parts in the allotted time) werein Experiment 3; 13 were in Experiment 1; and the other5 were in Experiment 2.

LOCI OF CONTEXTUAL EFFECTS 587

were presented. However, the present data,obtained with 11 stimulus values, show thatthe contextual effect of stimulus spacing forthe 100-point rating scale remains quitelarge.

Magnitude estimations. Magnitude es-timations of the common stimuli are shownin Figure 4, with a separate curve for eachof the four conditions in which a standardof 12 dots was used. If there were no effectof the stimulus distribution and the examplesused in the instructions, then all four curveswould coincide. Instead, the difference be-tween the open and solid points shows thatwhen the examples range as high as 900,subjects use numbers that average muchhigher than when the largest example is only300. It appears that the power function ex-ponent obtained in magnitude estimationexperiments depends largely on two vari-ables that are under the experimenter's con-trol: the (log) response range and the (log)stimulus range. Thus, exponents of the powerfunction may relate more closely to the ex-perimental design than to the subjects' sen-sations. Robinson (1976) and Poulton (1979)reached similar conclusions (see also Teght-soonian, 1971).

Figure 4 shows that the effects of the stim-ulus spacing on magnitude estimations aresimilar to the effects for category ratings.Parducci (1963) found similar results. Onedifference is that the magnitude estimationcurves in Figure 4 do not rejoin at the upperend, as they do for category ratings.4

Since the relationship between M and <j>and between G and 0 can have so many dif-ferent forms, the relationship between Mand G should not be theorized to be an in-variant functional form (Montgomery, 1975).

In Figure 5, mean magnitude estimationsof the same six stimuli (averaged over con-text) are shown for the between-subject de-signs (on the left) from Experiment 1 anda within-subject design from Experiment 2(on the right; these data will be discussedin more detail later). In the between-subjectdesigns, each subject is given only one stan-dard and several comparison stimuli; in thewithin-subject design, each subject receivesmultiple standards and stimuli. The stimuliare spaced on the abscissa according to judg-ments using the 27-dot standard, which then

900

800

§ 700

| 600u)111

„ 500•o3

1 400

2 300cI 200

100

. Magnitude Estimation

900"

LorgesrExample

I i I12 18 27 40 60 90

Stimulus Value(Log Scale)

Figure 4. Contextual effects in magnitude estimation.(Mean magnitude estimations for common stimuli areplotted against stimulus values as in-Figure 3. The 12-dot stimulus was the standard and was assigned thevalue 100. Dashed lines show results for positivelyskewed distributions, solid lines are for negativelyskewed conditions. Upper two curves show results wheninstructions included an example response as high as900; lower two curves show results when the highestexample was 300. Brackets show plus and minus onestandard error.)

automatically becomes the identity line ineach set. The ratio model (Rtj = Sj/th whereRv is the magnitude estimation of the "ratio"of stimulus levels j to i, with scale values Sjand tt) implies that the other curves in Figure

4 The present data allow a test of the conclusion ofMoskowitz (1982) that magnitude estimations are moresensitive to stimulus differences than category ratings.Following the procedure of Moskowitz, the F ratio forthe main effect of the common stimuli (relative to theSubjects X Stimulus interaction) was calculated for thefirst 22 subjects in each of the 18 conditions. Contraryto Moskowitz, it was found that the Fs for categoryratings are larger than those for magnitude estimations.All 6 category-rating conditions had F(5, 105) greaterthan 80, and 5 of 6 Fs were greater than 120. For the12 magnitude-estimation conditions, only 6 of 12 hadF(5, 105) greater than 80, and only 2 Fs were greaterthan 120. A difference of procedure may have producedthe difference in results. Moskowitz did not anchor hiscategory scale to the stimuli by means of warm-ups and/or instructions to judge the stimuli relative to the ex-tremes. It seems likely that his subjects assigned severalstimuli to the same category and effectively used onlya few categories.

588 BARBARA A. MELLERS AND MICHAEL H. BIRNBAUM

200 400 6000 100 200 3000 200 400 600Magnitude Estimation (27 Dot Standard)

Figure 5. A comparison of magnitude estimations from between-subject designs (in which each judgereceived only one standard) and a within-subject design, plotted as a function of judgments using the27-dot standard, with a separate curve for each standard. (Examples in the instructions went as highas 900 for the condition on the left, 300 for the curves in the center, and 800 for the condition on theright. Judgments of common stimuli were averaged over the positively and negatively skewed distri-butions. Dashed lines show predictions from the product rule for the upper curve in each set.)

5 should also be linear, and all three curvesin each set should intersect at a commonpoint. The two sets of curves obtained usingbetween-subject designs (left of Figure 5)bow out in the center and pinch in at theends. The curves obtained using a within-subject design show bilinear divergence,more consistent with the ratio model.

Product rule. Previous investigations ofmagnitude estimations have examined theproduct rule, which is implied by a specialcase of the ratio model.5 The product rulecan be written as

/? — /? • /? ^ 10^

which follows from the ratio model, Ry =Sj/tt, if Sj = tt when i= j (that is, if the scalevalue of first and second stimuli are equalfor equal stimuli).

The dashed lines in Figure 5 show theprediction of the product rule (Equation 10).Only a portion of the dashed line is plottedfor the curves on the left because the pre-dicted curve goes beyond the scale on theordinate. In fact, the predicted "ratio" of 90dots to 12 dots is 1,593, although the meanresponse is only 746. The between-subjectdata clearly violate the product rule. Theupper curve with the 12-dot standard is notas steep as the curve predicted by the prod-uct rule for either the "300" or the "900"condition. However, the within-subject data

(Experiment 2) appear reasonably consistentwith the product rule. The violations of theproduct rule and violations of bilinearity inthe between-subject data are expected fromEquations 8 and 9, which assume that dif-ferent J and J* functions are involved fordifferent groups of subjects receiving differ-ent stimulus distributions, standards, andexamples.

In sum, both category ratings and mag-nitude estimations of the subjective darknessof dot patterns depend on the stimulus spac-ing. Both procedures also depend on the re-sponse range presented in the instructions.In the case of magnitude estimations, it ap-pears that the supposed freedom of the sub-ject to control the range of responses islargely illusory, since subjects seem to beextremely sensitive to the range of examplesused to explain the task.

5 Bilinearity is a weaker requirement, implied by amore general ratio model, Rv = (*//*,)* + b. If the curvesintersect with an ordinate projection of zero,' then b =0. Otherwise, the ordinate projection of the intersection(b) can be subtracted from each "ratio" response. IfSj = lt when / = j, then these adjusted "ratio" responses(Ry - b) should obey the product rule. A recent articleby Fagot (1979) distinguishes three properties of ratiomodels, which he defines as ratio consistency, Rv = R,k •Rtf product constancy; Rv • Rlk = Rlm • Rm/c; and ratioconstancy, Ry/Rkj = Rlm/Rkm. The first property isEquation 10. Violation of bilinearity would refute allthree properties.

LOCI OF CONTEXTUAL EFFECTS 589

Experiment 2: Contextual Effects in"Ratio" and "Difference" Judgments

One interpretation of the results fromExperiment 1 is that the judgment functiondepends upon the stimulus spacing and othercontextual aspects of the experiment, Gtk =•^t(*f)> where st is independent of context.However, another interpretation is that theseresults reveal changes in the psychophysicalfunction, as expressed by the equation, Gik =slk = //*(<£(). Experiment 2 is designed toseparate the H function from the J functionand thereby identify the loci of contextualeffects due to stimulus spacing.

In Experiment 1, the difference in the re-sponses between 18 and 12 dots exceeds thedifference in the responses between 90 and60 dots for the positively skewed stimulusdistribution. However, for the negativelyskewed distribution, the mean difference inthe responses to 18 versus 12 is less than themean difference in responses to 90 versus 60(see Figures 3 and 4). With a unifactor de-sign (as in Experiment 1) it is impossible totell whether these changes in response shouldbe attributed to changes in the scale valuesor changes in the judgment function.

In Experiment 2, however, it is possibleto distinguish between contextual effectsthat precede or follow stimulus comparison.The experimental procedure involves askingsubjects to judge "differences" betweenstimulus pairs and varying the spacing of thestimuli. For example, will judges in the pos-itively skewed context rank the "difference"in darkness between 18 and 12 dots asgreater than the "difference" in darknessbetween 90 and 60? Will the subjects giventhe negatively skewed distribution rank or-der the two "differences" in the oppositeorder? (Similar questions can be asked con-cerning "ratios.")

This question can be formalized with re-spect to two special cases of Equations 2 and3. First, contextual effects may operate en-tirely prior to stimulus comparison; that is,they operate on the .//function in Figure 1.This theory assumes that judgments of singlestimuli directly reflect scale values, and sikcan be replaced with judgments from Ex-periment 1, Glk\ that is, % = G,k. This theoryimplies that judged "differences" in Exper-

100§80S 60

Ji 4°5 20

| -20|-40o-60

-80

-100

Positively SkewedDistribution

Negatively SkewedDistribution

121827406090 121827406090Stimulus Value (Log Spacing)

Figure 6. Computed differences between mean categoryratings (on the 100-point scale) from Figure 3. (If stim-ulus spacing affects scale values, the rank orders of thejudged "differences" in the two contexts would not bethe same. Dashed lines, which connect equal physicalratios, go in opposite directions in the two panels.)

iment 2 will be monotonically related tocomputed differences in judgment from Ex-periment 1 as follows:

Rijk = J*[G)k - Gik], (2a)

Di)k = J[GJk - Glk], (3a)

where R and D are "ratio" and "difference"judgments; / and J* are monotonic judg-ment functions, as in Equations 2 and 3; Gtkrepresents the judged value of stimulus i incontext k, as in Experiment 1.

Second, the effect of context may onlyfollow stimulus comparison; that is, contextmay only operate on the /function in Figure1. This second special case can be writtenas follows:

- st], (2b)

- st], (3b)

where only the judgment functions (/) de-pend on the context (subscript k). This the-ory implies that the rank order of "differ-ences" and "ratios" will not vary as afunction of stimulus spacing.

Predictions of the model in which contexteffects precede stimulus comparison (Equa-tions 2a and 3 a), are shown in Figure 6.Differences (GJk — Glk) were computed fromthe single stimulus category ratings in Ex-periment 1 (on the 1-100 anchored scale).These computed differences are plotted as

590 BARBARA A. MELLERS AND MICHAEL H. BIRNBAUM

a function of the left stimulus (minuend—along the abscissa) with a separate curve foreach level of the stimulus on the right (sub-trahend). The left and right of Figure 6 showpredicted differences computed from cate-gory ratings in the positively skewed andnegatively skewed conditions, respectively.

Notice that the rank order differs system-atically between the two conditions in Figure6. Dashed lines connect pairs of equal phys-ical ratios (i.e., equal log differences) tohighlight the change in rank order. In thepositively skewed condition, equal log dif-ferences decrease in subjective extremity asone moves up the scale. In the negativelyskewed condition, equal log differences arepredicted to increase in extremity as onemoves up the scale. For example, Figure 6shows that in the positively skewed context,the computed difference between the 18- and12-dot stimuli exceeds the difference be-tween the 90- and 60-dot stimuli, whereasfor the negatively skewed context, the orderof these differences is reversed.

If contextual effects govern H or both Hand J, then the rank order of judged "dif-ferences" in the positively skewed contextwould be different from the rank order ofthe judged "differences" in,the negativelyskewed context. If the pattern in Figure 6is obtained, the simpler interpretation thatscale values are independent of context(Equations 2b and 3b) would be rejected infavor of Equations 2a and 3a or Equations2 and 3. On the other hand, if the scale val-ues are independent of the stimulus spac-ing—that is, if contextual effects operateonly on the transformation from subjectivedifferences to overt responses—then Equa-tions 2b and 3b (in which Sj and st replacesjk and sl/c) would be sufficient. This modelimplies that the rank order of "difference"and "ratio" judgments in both contextsshould be the same: All four matrices shouldhave the same rank order.

Method

Observers were asked to judge either "ratios" or"differences" of the subjective darkness of dot patterns.Positively and negatively skewed stimulus distributions(shown in Figure 2) were factorially combined with in-structions to judge either "ratios" or "differences." Dif-ferent judges served in each of the four conditions.

"Difference" task. Judges were told to rate the"difference of subjective darkness" between stimuluspairs on a scale from 90 to -90 where 80 = left stimulusis very very much darker than right; 60 = left stimulusis very much darker than right; 40 «= left stimulus ismuch darker than right; 20 = left stimulus is darkerthan right; 0 = left and right stimuli are equal in dark-ness; -20 = right stimulus is darker than left; -40 =right stimulus is much darker than left; -60 = rightstimulus is very much darker than left; -80 = rightstimulus is very very much darker than left. Judges wereinstructed to use integers between -90 and 90.

"Ratio" task. Judges were given the following ex-amples: 800 = left stimulus is 8 times as dark as right;400 = left stimulus is 4 times as dark as right; 200 =left stimulus is 2 times as dark as right; 100 = left andright stimulus are equal in darkness; SO = left stimulusis V4 as dark as right; 25 = left stimulus is '/4 as darkas the right; 12.5 = left stimulus is '/» as dark as right.Judges were encouraged to use numbers in between ormore extreme than the examples to express their judg-ments.

Design. In each condition there were 121 trials con-structed from an 11 X 11, Left Stimulus X Right Stim-ulus, factorial design in which each dot pattern on theleft could appear with each of the 11 dot patterns onthe right. The distribution of stimuli was the same forboth the right and left stimulus.

Procedure. Judges read the instructions and com-pleted 33 representative warm-up trials followed by 121experimental trials in random order. Each trial referredto two of the dot patterns of the stimulus array. Thestimulus sheets were the same as those used in Exper-iment 1. Thus, the stimuli and procedure were like thoseof Experiment 1 except that judges were directed tocompare stimuli in pairs rather than judge them indi-vidually.

Subjects. There were 80 judges, with 18 to 23 dif-ferent people in each of the four conditions.

Results and Discussion

Figure 7 shows mean "ratios" plottedagainst mean "differences" with a separatetype of symbol for each divisor (subtrahend).Data are shown for the 6 X 6 common de-sign, with the results for the positivelyskewed condition on the left, and the resultsfor the negatively skewed condition on theright. Note that the data appear reasonablyconsistent with the hypothesis that one op-eration underlies both tasks, that is, thatjudged "ratios" are approximately a mono-tonic function of judged "differences." Ifsubjects truly used both ratio and subtractiveoperations with the same scale values, how-ever, then judgments of "ratios" and "dif-ferences" would not be monotonically re-lated but instead would show a particularnonmonotonicity in which equal ratios cor-

LOCI OF CONTEXTUAL EFFECTS 591

respond to more extreme differences as thedivisor increases, (see Birnbaum, 1980, Fig-ure 3).

To test the one-operation theory and tofind out if scale values depend on context,the judgments were fit by a special case ofEquations 2 and 3 in which J and J* arelinear and exponential, respectively:

Rijk = 7* exp[ty - slk] + dk , (11)

DIJk = ak[sjk ~ slk] + 0k , (12)

where RIJk and Dtjk are predicted "ratio" and"difference" judgments between stimuli iand/' in context k; ak, (3k, yk, and Sk are fittedconstants. The subscript, k, on the scale val-ues indicates that different scale values werepermitted for each context (though the scalevalues and comparison process were assumedto be the same for both "ratio" and "differ-ence" tasks within each contextual distri-bution). Although Equations 11 and 12 as-sume that the scale value of each stimulusis independent of the standard with whichit is compared and also that the scales arethe same for both tasks, these equations gavea good fit to data from nine experiments withseveral continua (Birnbaum, 1978, 1980,1982a).

For each task, a proportion of varianceunaccounted for was defined as follows:

(13)

where PT is the proportion of residual sys-tematic variance for task T, XT is the meanjudgment within each cell of the design, XTthe corresponding prediction, and XT theoverall mean judgment for task T. In the"difference" tasks, XT represents the meanjudgment; for the "ratio"_tasks, XTis the logof the mean judgment, XT is the log of theprediction, and XT is the mean of the logs.The summation is over all cells in the designfor task T. The sum of these four proportions(across all four matrices) was minimized, bymeans of a computer program that utilizedChandler's (1969) STEPIT subroutine. Pa-rameter estimates were derived using the11X11 designs. Similar results were ob-tained when only the 6 X 6 common designswere used. For the common design, the over-all indices of lack of fit were .011 and .014for the positively and negatively skewed con-ditions, respectively. Model deviations thusconstitute a little more than .5% of the sys-tematic variance for each of the four ma-trices.

600c~ 500o•| 400</>LJ

300

200

100

0

Pos

-100 -80 -60 -40 -20 0 20 40 60

"DIFFERENCE" RatingFigure 7. Judgments of "ratios" plotted against "differences." (Data on the left and right are for thepositively and negatively skewed contexts, respectively. Abscissa shows scale for negatively skewedcontext; positively skewed context data are shifted 40 units to the left; arrows show zero "difference."Curves are best-fit solutions to a special case of the one operation theory,)

592 BARBARA A. MELLERS AND MICHAEL H. BIRNBAUM

The fit of the model can be assessed inFigure 8, which shows mean judgments forthe common stimuli in the positively skeweddistribution (upper panels) and negativelyskewed distribution (lower panels). Datapoints are connected with solid lines. Dashedlines are predictions of the theory that sub-jects use the same operation and scale valuesfor both tasks (Equations 11 and 12).

The subtractive model (Equations 11 and12) predicts that the "difference" judgmentswill be parallel and linear and that the "ra-tio" judgments will show bilinear divergence

due to the exponential J function. In all fourpanels, the data points lie close to the pre-dictions of the model. The average standarderrors of the means are 2.55 and 8.62 in the"difference" and "ratio" tasks, respectively.

Note that not only are the rank orders ofthe data points similar within a context (i.e.,for "difference" and "ratio" judgments) butalso across contexts (i.e., for positive andnegative skew). Thus, the predictions in Fig-ure 6, which were based on judgments of thesame stimuli in the same contexts (Experi-ment 1), failed to materialize. Instead, the

c<oE

<a:

coQ>

700

600

500

400

300

200

100

0

700

600

500

400

300

200

100

. 0

Positively Skewed -

h R

121(18) (27) (40) (60) (90)

. Negatively Skewed- R

(12) (18) (27) (40) (60) (90)

60

40

20

0

-20

-40 ~

rn33m

-60

60

40

20 g-(O

-20

-40

•60

8 10

Antilog of Scale Value (2s)

I 2 3

Scale Value (5 )

Figure 8. Mean "ratio" judgments (R) and "difference" judgments (D) for the common stimuli in thepositively skewed and negatively skewed conditions. (This figure replots the data of Figure 7 to assessthe predictions of bilinearity and parallelism. Dashed lines are the predictions of the subtractive theoryfit to all four matrices simultaneously. "Ratios" are plotted against antilogs of the scale values for theleft stimulus, with a separate curve for each stimulus on the right [divisor]. Linearity of the curves isconsistent with the theory that one operation underlies both tasks.)

LOCI OF CONTEXTUAL EFFECTS 593

rank order of "differences" and "ratios"appears largely independent of stimulusspacing.

Figure 9 shows the estimated scale valuesfor the positively skewed distribution plottedagainst those for the negatively skewed con-text. Note that although the scale values ofthe two conditions were permitted to differ,they are quite similar; the points lie veryclose to the identity line. The dashed lineshows the expected relationship if the stim-ulus spacing had influenced the estimatedscale values in the same fashion as it affectedthe single ratings of Experiment 1, accordingto Equations 2a and 3a (i.e., contextual ef-fects precede stimulus comparison). Instead,the data appear consistent with the view thatcontextual effects operate only on the J func-tion, as in Equations 2b and 3b.

In conclusion, these data do not provideevidence that contextual effects due to stim-ulus spacing operate on the scale values. Itappears that the rank order of "ratios" and"differences" can be reproduced by assum-ing that judges, compute differences betweenscale values that are independent of the stim-ulus spacing. Considering Experiments 1and 2 together, it appears that judgments ofsingle stimuli depend on stimulus spacing,but scale values derived from the subtractivemodel do not. Thus, it seems reasonable tolocalize the effects of stimulus spacing in thejudgment function for "direct" ratings andfor the present within-modal stimulus com-parisons.

Experiment 3: Contextual Effects in Cross-Modality Comparison and Combination

Experiment 3 examines whether the sametheories can account for contextual effectsin cross-modality judgments as for within-modality judgments. Cross-modality match-ing involves the comparison of stimuli fromtwo different dimensions. "Does the punish-ment fit the crime?" or "Is this salary fairpay for this job?" are examples of this typeof judgment.

Two prominent theories of cross-modalitymatching are mapping theory and relationtheory (Krantz, 1972; Shepard, 1978). Ac-cording to mapping theory, psychologicalvalues of stimuli from different continua are

o ,—o in

IwUJ

1 2 3 4

Estimated Scale Value(Neg)

Figure 9. Solid points show estimated scale values forpositively skewed context plotted against estimated scalevalues for negatively skewed context. (Broken curveshows relationship predicted from scale values based onthe 100-point ratings in Figure 3.)

mapped onto a common scale of sensationand can be directly compared. A cross-mo-dality match is presumed to occur whenequal strength sensations are elicited bystimuli on different continua.

According to relation theory, relationships(e.g., ratios) between pairs of stimuli fromdifferent continua are compared. In physicalmeasurement, a mass in grams cannot becompared with a length in centimeters butratios of masses can be compared with ratiosof length. By analogy, it may be possible tocompare the ratio of the heaviness of twoweights to the ratio of the loudness of twotones, since the ratios 'of stimulus pairs areon a common scale. Neither mapping theorynor relation theory gives an explicit accountof contextual effects due to the stimulus dis-tribution.

Another view of cross-modality matching,psychological relativity theory, contends thateach stimulus is compared to its distribution,and the relative positions of the two stimuliwith respect to their distributions are com-pared. For example, consider the question,"Is the weight as heavy as the light isbright?" Psychological relativity theory as-serts that to answer this question, the relativeposition of the weight in the distribution ofsubjective heavinesses is compared to therelative position of the light in the distri-

594 BARBARA A. MELLERS AND MICHAEL H. BIRNBAUM

Figure 10. Example stimulus trial for the cross-modalityexperiment.

bution of brightnesses. The relative positionof a stimulus in its distribution is assumedto be its range-frequency value, as in Equa-tion 1.

Experiment 3 investigates the effects ofthe stimulus distribution (range and spac-ing) on cross-modality judgments. Two taskswere used, "difference" and "total intensity"judgments. Subjects were asked to compareand combine the subjective size of circleswith the subjective darkness of dot patterns.Psychological relativity theory asserts thatthe estimated scale values derived from thesubtractive and additive models (Equations4 and 5) depend on the stimulus distributionfor each dimension.

On the other hand, if the scale values areindependent of the stimulus distribution, asimpler theory than relativity theory mightsuffice. A special case of Equations 4 and 5in which the stimulus distribution only af-fects the judgment function (analogous toEquations 2b and 3b) can be written:

Tijk = J'k[cj (4b)

(5b)

where T and D are "total" and "difference"judgments, and the other terms are also asdefined in Equations 4 and 5, except thescale values now do not depend on context(note there is no k subscript for the scalevalues). Experiment 3 examines whether thescale values in cross-modality judgment de-pend on stimulus distribution.

MethodJudges rated both "differences" and "total intensi-

ties" of the subjective size of circles and subjective dark-ness of dot patterns. An example stimulus trial is shownin Figure 10. Judges were asked to rate the "total in-tensity" of the size of the circle and the darkness of thedot pattern. They were also asked to indicate whetherthe size of the circle exceeded the darkness of the dotpattern and to estimate the "difference." Four differentgroups received the same set of circles paired with oneof four distributions of dot patterns.

"Difference" task. Judges rated the "difference be-tween the subjective size of the circle and the subjectivedarkness of the dot pattern" on a scale from -90 ("thedarkness of the dot pattern is very very much greaterthan the size of the circle") to 90 ("the size of the circleis very very much greater than the darkness of the dotpattern"). Zero referred to an "equal match" of circlesize and dot darkness.

"Total intensity" task. Subjects were instructed thatthe darker the dot pattern and the larger the circle, thegreater the "total intensity" of the stimuli. "Total in-tensity" was rated on a scale from 0 ("the intensity isvery very weak") to 90 ("the intensity is very verygreat"). On trials in which only one stimulus was pre-sented, judges were asked to rate the "intensity" of thatstimulus as though it was presented with another stim-ulus of zero "intensity." Instructions stated that the"intensity" of a stimulus presented alone was alwaysless than the "total intensity" of the same stimulus pre-sented with another.

Design. There were four experimental conditions.Six circles with diameters of 7.6, 11.2, 14.7, 18.3, 21.8,or 25.4 mm, were factorially combined with one of fourdifferent sets of dot patterns, with 1.5-mm black dotsinside 25-mm squares. Six dot patterns were commonto all four distributions of dots; these patterns contained12, 18, 27, 40, 60 or 90 dots.

The four distributions of dots were as follows: Me-dium range: 10, 12, 18, 27, 40, 60, 90, and 135. Therewere 48 cells produced from a 6 X 8, Circle Size X DotNumber, factorial design. Wide range: 6, 12, 18, 27,40, 60, 90, 180 (a 6 X 8 design). Positively skewed: 12,14, 15, 16, 18, 21, 23, 27, 40, 60, 90 (a 6 X 11 design).Negatively skewed: 12, 18, 27, 40, 47, 51, 60, 70, 74,77, 90 (a 6 X 11 design).

Each of the four conditions was constructed from afactorial design in which the stimulus on the left wasone of six circles and the stimulus on the right was oneof the eight or eleven dot patterns. Each circle and dotpattern was also judged by itself for the "total intensity"conditions.

Procedure. Each trial consisted of either a circle, adot pattern, or both. Task order ("difference" or "total")was counterbalanced across subjects. Judges were given24 to 26 representative warm-up trials to acquaint themwith the stimulus range and stimulus spacing. Trialswere printed in random order, and pages were shuffledto provide different orders. The task took approximatelyone hour.

Subjects. The judges were 157 undergraduates, with38 to 41 different judges in each of the four conditions.

Results and Discussion

Figures 11 and 12 show mean judgmentsof the common stimuli in the "total" and"difference" tasks, respectively. Data pointsare connected by solid lines; dashed lines in-dicate prediction of the model described be-low. Estimated circle scale values are plottedon the abscissa with a separate curve foreach level of the dot stimulus. The open cir-

LOCI OF CONTEXTUAL EFFECTS 595

cnzLJ

O

0)Eo>

T3

10 20 30 40 10 20 30 40

Estimated Scale Value of CircleFigure 11. Mean "total intensity" judgments for common stimuli in the four contexts plotted as afunction of the estimated circle scale value with a separate curve for each dot pattern. (Open circles areused for judgments of dot pattern alone; the curve labeled "none" show judgments of circles alone.Dashed lines are predictions of the additive model, in which scale values for dot patterns were assumedto depend on the range and spacing of the dot stimuli.)

cles in Figure 11 indicate judgments of dotstimuli alone; the bottom curves labeled"none" indicate judgments of circle stimulialone. The average standard errors are 2.95and 2.00 for the "difference" and "totals"tasks, respectively. The approximate paral-lelism of the curves in Figures 11 and 12 isconsistent with predictions of additive andsubtractive models for the "total" and "dif-ference" tasks, respectively, assuming linearJ functions.

The four sets of "total" judgments andfour sets of "difference" judgments were fitto a special case of Equations 4 and 5:

fljk = ak(cj + sik) + l3k, (14)

D,Jk = yk(cj-sik), (15)

where fijk and &vk are the predicted "total

intensity" and "difference" judgments ofcircle j and dot pattern i in context k; Cj andslk are the estimated scale values of the cir-cles and the dot patterns, respectively; ak andftk, and yk are linear constants for each con-text. In Equation 15, when the response is"no difference" it is assumed that c, = slk,that is, a cross-modality "match." In theadditive, model, the additive constant (3k isdetermined by the constraint that when astimulus is not presented, its value is zero.6

Scale values for the dot patterns were per-mitted to be different for "totals" and "dif-ferences" in each context; that is, there are

'Equation 14 jmplies the following: Trot + fOJt -Tyt = |8*, where Trat and T0jt are predicted judgmentsof the /th dot pattern and jth circle presented alone incontext k.

596 BARBARA A. MELLERS AND MICHAEL H. BIRNBAUM

eight sets of scale values for dots. Since thedistribution of circles was identical in allconditions, circle scale values were assumedto be the same for all eight conditions. Byconstraining the circle scale values to beidentical across tasks, it is only necessary tofix one circle scale value in order to deter-mine the darkness scale values.

The model shown in Equations 14 and 15could account for all but 3.06% of the vari-ance in the mean judgments. When it is as-sumed that the dot scale values are the samefor both "difference" and "totals" tasks (butdot scales depend on context), the model lefta residual of 4.41% using 38 fewer estimatedparameters. Hence, not much is lost by as-suming the dot scale values are largely in-dependent of the task.

The vertical spacing of the curves in Fig-ures 11 and 12 determines the scale valuesfor the dots. Note that the spacing betweenthe curves differs for different contexts. For

example, the vertical spacing between the12- and 90-dot curves is greater for the nar-row range conditions (positive and negativeskew) than for the wide range conditions forboth "difference" and "total" tasks at alllevels of circle size.

Estimated scale values are shown in Fig-ure 13 as a function of log 4> with separatecurves for each stimulus distribution. Notethat for both "differences" and "totals," theslopes are greater for the narrow range con-ditions than for the wide range condition.Note also that a medium-level dot pattern(e.g., 27) receives a greater scale value inthe positively skewed context (where themajority of stimuli are lighter), than it doesin the negatively skewed context. Thesechanges in the slope and the height of thecurves are in the general direction of theusual contextual effects in ratings (Parducci,1974), although the curvature is less thanexpected by range-frequency theory.

O

IUJ

t

o>£o>•o

80

40

0

-40

-80

80

40

§-400)5 -80

I 1 I I I I

"DIFFERENCES"Positively Skewed

90

•4 1 1

Medium Range

I I T^ I I

Negatively Skewed

H 1 1 h

Wide Range

0 10 20 30 40 0 10 20 30 40

Estimated Scale Value of CircleFigure 12. Mean "differences" for the common stimuli in the four tasks, plotted as in Figure 11. (Dashedlines are predictions of the subtractive model, allowing different dot scale values for different contexts.)

LOCI OF CONTEXTUAL EFFECTS 597

In summary, Experiment 3 shows that incross-modality comparison and combina-tion, the marginal stimulus distribution af-fects the scale values. Recall that in Exper-iment 2, the marginal stimulus spacing didnot affect the scale values for within-mo-dality comparisons. The results of Experi-ment 3 suggest that stimuli are judgedwithin the context of other stimuli in thesame continuum before they are comparedor combined across continua. These findingsare compatible with a psychological relativ-ity view of cross-modality matching.

General Discussion

Single Stimulus Versus ComparisonJudgments

Experiments 1 and 2 taken together sug-gest that contextual effects due to manipu-lation of the stimulus distribution for within-modal judgments can be attributed to thejudgment function (J in Figure 1), ratherthan the psychophysical function. In Exper-iment 1, the stimulus distribution was shownto affect both category ratings and magni-tude estimations of single stimuli. These con-textual effects could be explained by a gen-eralized form of Parducci's range-frequencytheory for the judgment function. Localiza-tion of the contextual effects in J was con-sistent with the finding in Experiment 2 thatsubtractive model scale values do not appearto change as a function of stimulus spacingin "ratio" and "difference" judgments.7

Within-Modality Versus Cross-Modality'Comparison

A comparison of Experiment 2 and 3shows that manipulation of the stimulusspacing has different effects on the estimatedscale values for within-modality and cross-modality judgments. In the within-modalityjudgments of Experiment 2, estimated scalevalues for "ratios" and "differences" in bothcontexts were virtually identical.

However, with cross-modality judgmentsof "differences" and "totals" in Experiment3, estimated scale values do appear to de-pend on the range and spacing of the stimuli.Hence it appears that in cross-modalityjudgments, contextual effects occurred in the

. "TOTALS

-"DIFFERENCES"/ Po9

18 27 40 60 90 12 18 27 40 60 90

Stimulus ValueFigure 13. Estimated scale values for dot patterns plot-ted against stimulus values in equal log steps. (If therewere no contextual effects, curves in each panel shouldcoincide.)

H function; in within-modality judgments,the H function was unaffected. Cross-mo-dality judgments may require an additionalimplicit judgmental transformation, in whichsubjective values are related to their owndistributions before they are compared orcombined across dimensions.

Some cross-modality judgments seem toinvolve a joint distribution between the twomodalities in addition to the marginal dis-tributions. Consider a judgment such as "Isthis salary fair pay for this job?" Psycho-logical relativity theory asserts that the rel-ative position of the salary in the distributionof salaries is compared with the relative po-sition of the job in the distribution of jobs.That is to say, judges compare the stimuli

7 In Experiment 2, two types of distributions are rel-evant: (a) the marginal distribution, or spacing of thestimuli, and (b) the distribution of ¥ (subjective dif-ferences). It is assumed that the J function depends onthe distribution of impressions (differences), which wasnot systematically manipulated in Experiment 2. There-fore, Experiment 2 was not designed to produce changesin the J function. Mellers and Birnbaum (in press) var-ied the J function by systematically manipulating thejoint distribution in order to affect the distribution of* (see also Birnbaum, et al, 1971, Experiment 5, andMellers, 1982).

598 BARBARA A. MELLERS AND MICHAEL H. BIRNBAUM

§ 500

•| 400inUJ| 300

& 200

c 100

Magnitude Estimation!'900" 27

;"300" 12

Standard

Largest Example

12 IS 27 40 60 90Stimulus Value (Log Spacing)

Figure 14. Evidence against between-subject compari-son of magnitude estimation: Crossover refutes simpletheories of magnitude estimation.

with respect to the marginal stimulus dis-tributions. In addition to the marginal stim-ulus information, the judge may also attendto the distribution of salaries for each joband the distribution of jobs for each salary.Thus, the joint distribution seems relevantto certain cross-modality comparisons.

Mellers (1982, Experiment 4) investi-gated inequity judgments of hypotheticalfaculty members relative to other membersof an academic department based on infor-mation about their merit ratings and sala-ries. The marginal distributions of merit andsalary and the distribution of salary-meritdifferences were manipulated. It was shownthat the marginal stimulus distributions af-fected the scale values as predicted by anextension of range-frequency theory, as inExperiment 3. Variations in the distributionof differences influenced the judgment func-tion and could also be described by an ex-tension of range-frequency theory applied tovalues of ̂ , as in Birnbaum et al. (1971) andMellers & Birnbaum (in press). In partic-ular, Mellers found that a salary is judgedto be "fair" when its position relative to thedistribution of salaries corresponds to therelative position of the person's merit in themerit distribution. Furthermore, the judg-ment of inequity was found to depend on thesubjective difference between salary andmerit relative to the distribution of subjec-tive differences.

Contextual Effects in Between-SubjectVersus Within-Subject Designs

In a recent paper that examined "ratio"and "difference" judgments, different groups

of subjects used different standards for the"ratio" task (Rule, Curtis, & Mullin, 1981).They assumed that such "ratio" judgmentscan be described by the following ratiomodel:

(16)

where RtJ is the estimation of "ratio," andSj and s, are the subjective values of the stim-ulus and the standard, respectively, and J*is a single monotonic function that was as-sumed to be independent of the value of thestandard. Violations of bilinearity in Figure5 show that this model can be refuted if J*is assumed to be any power function with anadditive constant.

Even with the weak assumption that J*is monotonic, Equation 16 can be refuted.Figure 14 replots two of the curves fromFigure 5 (Experiment 1) to reveal an ordinalviolation of Equation 16 for between-subjectdesigns. Figure 14 demonstrates that Equa-tion 16 cannot account for the data whendifferent subjects use different standards anddifferent ranges of examples. For example,the "ratio" of the darkness of the 18-dotstimulus to the 12-dot stimulus (when thelargest example is 300) is greater than the"ratio" of the 18-dot stimulus to the 27-dotstimulus (when the largest example is 900).In other words, the judgments and Equation16 imply:

S>S' °7)

However, the "ratio" of the 90-dot stimulusto the 27-dot stimulus (when the largest ex-ample is 900) is greater than the "ratio" ofthe 90-dot stimulus to the 12-dot stimulus(when the largest example is 300), implying:

J90 £90 (18)

Thus, if Equation 16 is assumed, and J* isthe same for different groups of subjects, itfollows that s21 > s\2 in Equation 17, butSn > Syj in Equation 18. This contradictionrefutes Equation 16. However, these other-wise puzzling results are expected if differ-ent J* functions occur for different groupsof subjects, as in Equations 8 and 9, or 2band 3b.

Another violation of Equation 16 can beseen in Figure 4. The "ratio" of the 27-dot

LOCI OF CONTEXTUAL EFFECTS 599

stimulus to the 12-dot stimulus when ex-amples go up to 900 in the positively skewedcontext is greater than the "ratio" of the 40-dot stimulus to the 12-dot stimulus in thenegatively skewed context, which shows thatbetween-subject comparisons imply that the27-dot pattern is darker than the 40-dot pat-tern! Yet within any of the conditions, the40-dot stimulus is always judged to bedarker than the 27-dot pattern. These resultsare consistent with Equations 8 and 9, whichallow for different J* functions.

Experiment 1 demonstrates that categoryratings and magnitude estimations dependon the stimulus spacing and the responserange given in the instructions. When dif-ferent groups of subjects receive differentstimulus or response distributions (i.e., mag-nitude estimations in which different groupsof subjects receive different stimuli, stan-dards, and/or different examples) the judg-ment function may indeed vary. Thereforemagnitude estimations (or ratings) shouldnot be regarded as an ordinal scale of sen-sation when compared across groups whoexperience different contexts.

If variations of standards and examplesin between-subjects designs produce differ-ent J* functions, the conclusion of Rule etal. (1981) that subjects use two operationswhen judging "differences" and "ratios" ofheaviness of lifted weights should be ques-tioned and reexamined. This conclusion restsentirely on the assumption that J* in Equa-tion 16 is the same for different groups ofsubjects who received different standards intheir experiments. Figure 14 shows that thisassumption may lead to contradictions.

It might be argued that Rule et al. (1981)were correct in their domain (heaviness)when they assumed that J* was the same forall groups of subjects. However, it seemslikely that if our Experiment 1 was repli-cated with lifted weights, similar contextualeffects would occur in which the curves couldbe either made to fit the ratio model, givethe same order as the subtractive model, orviolate the ratio model depending on the re-sponse range and standard. The present the-ory (Equation 9) should be preferred toEquation 16 because it can explain both theRule et al. results (which are by analogyreplicated in Figure 5) and also our results(Figure 14), which contradict Equation 16."

Is There a "Right" Way to DoPsychophysics?

Some have argued that with certain pro-cedures, contextual effects can be avoided.For example, Poulton (1979), who reviewedcontextual effects due to stimulus spacingand other factors, recommended using eithera logarithmic spacing in magnitude esti-mation or a complete between-subjects de-sign. According to range-frequency theory,logarithmic spacing could yield category rat-ings that are linearly related to subjectivevalue if Fechner's law is assumed to be true.However, it seems unwise to assume Fech-ner's law in the design of experiments with-out providing a test of the assumption. Fur-thermore, for magnitude estimation, thejudgment function appears to depend on theresponse examples. To "avoid" this issue,Poulton (1979) advised using no examples.However, when the examples are not ma-nipulated, the J* function is uncontrolledand unknown. Just because the experimenterrefrained from using examples to illustratethe magnitude estimation scale does notmean that the subject did not do so. Eachsubject may use a different set of implicitexamples, thereby producing a different J*function.

If there are supposed to exist "right" pro-cedures to avoid contextual effects, how arethe "right" procedures to be determined?Some criteria for establishing the "correct-ness" of procedures are needed to resolvedisagreements concerning proper procedure.Certainly, a procedure should not be advo-cated based on an uncertain theory that can-not be tested using that procedure (Birn-baum, 1982a; 1982b).

Consider Poulton's (1973; 1979) recom-mendation to "avoid" contextual effects by

8 Some investigators have noted the difficulty of fittingthe ratio model to data (Anderson, 1974; Eisler, 1960;Fagot & Stewart, 1971; SjOberg, 1971). Ironically,Birnbaum and his colleagues, who argue for a subtrac-tive rather than a ratio representation of "ratio" and"difference" judgments, have been reasonably success-ful in fitting the ratio model to "ratio" judgments. Intheir procedures, variation of standards and compari-sons is done in within-subject factorial designs, and geo-metrically-spaced response examples are used. It is theo-rized that geometrically-spaced examples can induce anexponential J* function for magnitude estimations.Hence, a subtractive comparison process can lead todata that will fit a ratio model (Birnbaum, 1980,1982a).

600 BARBARA A. MELLERS AND MICHAEL H. BIRNBAUM

asking each subject to make only a singlejudgment, prom our theoretical viewpoint,this procedure confounds the stimulus andthe context: If each subject judged a differ-ent "ratio," the judgment could be repre-sented, RIJ = Jtj (sj - s,), which shows thata different Jtj function would be allowed foreach stimulus and standard. This recom-mendation must be based on the theory thateach subject has the same J* function as inEquation 16, a dubious assumption given theresults of Figure 14. More importantly, sucha procedure would not allow a test of thetheory upon which it is based.

Another approach—the approach of thisarticle—is to systematically manipulate thecontext and use a theory of the context toderive subjective values. Context is viewedas an integral part of the judgment process—something that cannot be "avoided." Thissystextual design approach is discussed ingreater detail and compared with repre-sentative design, standardization design,and between-subjects design by Birnbaum(1974b; 1982a, Section E).

Conclusions

The following tentative conclusions ap-pear consistent with the data:

1. Both category ratings and magnitudeestimations appear to depend on the stimulusspacing and the response range in a similarfashion. These contextual effects are in thedirection predicted by Parducci's range-fre-quency theory. The relationship betweencategory ratings and magnitude estimationsdepends on the context and cannot be de-scribed by a context-invariant functionalform.

2. In within-modal judgments, the scalevalues appear to be independent of variationsin the stimulus distribution. These contex-tual effects can be accounted for by changesin the judgment function.

3. In cross-modality judgments, the scalevalues are influenced by the stimulus distri-bution: It appears that subjects compare therelative position of a stimulus in its distri-bution with the relative position of a stimulusof another modality to its distribution. Re-sults were consistent with a psychologicalrelativity theory of cross-modality judgment.

References

Anderson, N. H. Algebraic models in perception. InE. C. Carterette & M. P. Friedman (Eds.), Handbookof perception (Vol. 2). New York: Academic Press,1974.

Anderson, N. H. On the role of context effects in psy-chophysical judgment. Psychological Review, 1975,52,462-482.

Anderson, N. H. Cognitive algebra and social psycho-physics. In B. Wegener (Ed.), Social attitudes andpsychophysical measurement. Hillsdale, N.J.: Erl-baum, 1982.

Birnbaum, M. H. The nonadditivity of personalityimpressions. Journal of Experimental Psychology,1974, 102, 543-561. (a)

Birnbaum. M. H. Using contextual effects to derivepsychophysical scales. Perception & Psychophysics,1974, 15, 89-96. (b)

Birnbaum, M. H. Differences and ratios in psychologicalmeasurement. In N.J. Castellan & F. Restle (Eds.),Cognitive Theory (Vol. 3). Hillsdale, N.J.: Erlbaum,1978.

Birnbaum, M. H. A comparison of two theories of "ra-tio" and "difference" judgments. Journal of Exper-imental Psychology: General, 1980, 3, 304-319.

Birnbaum, M. H. Controversies in psychological mea-surement. In B. Wegener (Ed.), Social attitudes andpsychophysical measurement. Hillsdale, N.J.: Erl-baum, 1982. (a)

Birnbaum, M. H. Problems with so-called "direct" scal-ing methods. In J. T. Kuznicki, R. A. Johnson, &A. F. Rutkiewic (Eds.), Sensory methods: Problemsand approaches to hedonics. ASTM STP773. Phil-adelphia, Pa.: American Society for Testing and Ma-terials, 1982. (b)

Birnbaum, M. H., Parducci, A., & Oifford, R. K. Con-textual effects in information integration. Journal ofExperimental Psychology, 1971, 88, 158-170.

Chandler, J. D. Subroutine STEPiT-Finds local minimaof a smooth function of several parameters. Behav-ioral Science, 1969, 14, 81-82.

Eisler, H. Similarity in the continuum of heaviness withsome methodological and theoretical considerations.Scandinavian Journal of Psychology, 1960, /, 69-81.

Eisler, H. Magnitude scales, category scales, and Fech-nerian integration. Psychological Review, 1963, 70,243-253.

Fagot, R. F. Nested models of relative judgment: Ap-plications to a similarity averaging model. Perception& Psychophysics, 1979, 26, 255-264.

Fagot, R. F., & Stewart, M. Tests of product and ad-ditive scaling axioms. Perception & Psychophysics,1971, 10, 418-422.

Helson, H. Adaption-level theory. New York: Harper& Row, 1964.

Johnson, D. M., & Mullally, C. R. Correlation-and-regression model for category judgments. Psycholog-ical Review, \969, 76, 205-215.

Krantz, D. H. Magnitude estimations and cross-mo-dality matching. Journal of Mathematical Psychol-ogy, 1972, 9, 168-199.

Marks, L. E. On scales of sensation: Prolegomena toany future psychophysics that will be able to come

LOCI OF CONTEXTUAL EFFECTS 601

forth as a science. Perception & Psychophysics, 1974,16, 358-376.

Marks, L. E. A theory of loudness and loudness judg-ments. Psychological Review, 1979, 86, 256-285.

Mellers, B. A. Equity judgment: A revision of Aristo-telian views. Journal of Experimental Psychology:General, 1982, / / / , 242-270.

Mellers, B. A. & Birnbaum, M. H. Contextual effectsin social judgment. Journal of Experimental SocialPsychology, in press.

Montgomery, H. Direct estimation: Effect of method-ological factors on scale type. Scandinavian Journalof Psychology, 1975, 16, 19-29.

Moskowitz, H. R. Utilitarian benefits of magnitude es-timation scaling for testing product acceptability. InJ. T. Kuznicki, R. A. Johnson, and A. F. Rutkiewic(Eds.), Selected sensory methods: Problems and ap-proaches to hedonics. ASTM STP 773. Philadelphia,Pa.: American Society for Testing and Materials,1982.

Parducci, A. Range-frequency compromise in judgment.Psychological Monographs, 1963, 77(2, Whole No.565).

Parducci, A. Category judgment: A range-frequencymodel. Psychological Review, 1965, 72, 407-418.

Parducci, A. The relativism of absolute judgment. Sci-entific American, 1968, 219, 84-90.

Parducci, A. Contextual effects: A range-frequencyanalysis. In E. C. Carterette & M. P. Friedman(Eds.), Handbook of Perception (Vol. 2). New York:Academic Press, 1974.

Parducci, A. Category ratings: Still more contextualeffects. In B. Wegener (Ed.), Social attitudes andpsychophysical measurement. Hillsdale, N.J.: Erl-baum, 1982.

Parducci, A., & Perrett, L. Category rating scales: Ef-fects of relative spacing and frequency of stimulusvalues. Journal of Experimental Psychology, 1971,89, 427-452.

Poulton, E. C. The new psychophysics: Six models for

magnitude estimation. Psychological Bulletin, 1968,69, 1-19.

Poulton, E. C. Unwanted range effects from usingwithin-subject experimental design. PsychologicalBulletin, 1973, 80, 113-121.

Poulton, E. C. Models for biases in judging sensorymagnitude. Psychological Bulletin, 1979, 86, 777-803.

Restle, F., & Greeno, J. G. Introduction to mathe-matical psychology. Reading, Mass.: Addison-Wes-ley, 1970.

Robinson, G. H. Biasing power law exponents in mag-nitude estimation instructions. Perception & Psycho-physics, 1976, 19, 80-84.

Rule, S. J., Curtis, D. W., & Mullin, L. C. Subjectiveratios and differences in perceived heaviness. Journalof Experimental Psychology: Human Perception andPerformance, 1981, 7, 459-466.

Shepard, R. N. On the status of "direct" psychologicalmeasurement. In C. W. Savage (Ed.), Minnesotastudies in the philosophy of science (Vol. 9). Min-neapolis: University of Minnesota Press, 1978.

Sjtiberg, L. Three models for the analysis of subjectiveratios. Scandinavian Journal of Psychology, 1971,12, 217-240.

Stevens, S. S. A metric for the social consensus. Science,1966, 151, 530-541.

Stevens, S. S. Issues in psychophysical measurement.Psychological Review, 1971, 78, 426-450.

Stevens, S. S., & Galanter, E. H. Ratio scales and cat-egory scales for a dozen perceptual continua. Journalof Experimental Psychology, 1957, 54, 337-411.

Teghtsoonian, R. On the exponents in Stevens' law andthe constant in Ekman's law. Psychological Review,1971, 78, 71-80.

Torgerson, W. S. Distances and ratios in psychologicalscaling. Acta Psychologica, 1961, 19, 201-205.

Received August 12, 1981