Context in category scales: is “fully agree” equal to twice agree?


http://france.elsevier.com/direct/ERAP/

Revue européenne de psychologie appliquée 56 (2006) 223–229

Original article

Context in category scales: is “fully agree” equal to twice agree?

Effets de contexte et échelles de catégories

W. Cools*, J. Hofmans, P. Theuns

* Corresponding author. E-mail address: [email protected] (W. Cools).

1162-9088/$ - see front matter © 2006 Elsevier Masson SAS. All rights reserved. doi:10.1016/j.erap.2005.09.007

Vakgroep Arbeids- and Organisatiepsychologie, Faculteit voor Psychologie en Educatiewetenschappen, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium

Received 15 March 2005; accepted 5 September 2005

Abstract

Through the “cross modality matching” approach, this research examines the perceived intensity of verbal qualifiers used in agreement scales. The effect on perceived intensity of the scaling context itself (i.e. the number of response categories and the chosen verbal qualifiers) is also investigated. Results show little inter-individual variability in the perceived intensity of verbal qualifiers. A category scale with five response alternatives is least prone to context effects, and the use of supplementary extreme answer categories at the left and right ends of the scale does not improve the metric properties of the scale.
© 2006 Elsevier Masson SAS. All rights reserved.

Résumé

The use of scales is a common method in the social sciences. The aim of this experiment was to determine how the use of verbal descriptors on a graphic scale influences subjects’ responses. The subjects’ task was to indicate the intensity of seven descriptors by numerical estimation and by graphically producing a line. The descriptors were presented on their own and in the context of a scale with five, six or seven categories. To study these relations, we used the cross-modality matching technique. The results show that a scale with five categories has the best psychometric properties and that the extreme categories add no further value to the scale.
© 2006 Elsevier Masson SAS. All rights reserved.

Keywords: Scaling; Cross-modality matching; Context; Measurement level

Mots clés : Échelles ; Appariement inter modal ; Contexte ; Niveau de mesure

1. Introduction

Although category scales have the disadvantage of being susceptible to different kinds of biases, and even though they fail to attain a quantitative measurement level, they are the most widely used response format in social science. This can be explained by their convenience: they are easy to use and large samples can be reached at relatively low cost (Cools et al., 2003; Hofmans et al., 2004; Lodge et al., 1975).

Usually category scales with 4–11 response alternatives are used. Often these are associated with numbers, words or graphical symbols (called “labels” or “qualifiers”). The choice of these labels or qualifiers is not neutral with regard to the elicited response patterns. Our paper focuses on this choice. When talking about context in this paper we mean the scaling context itself: the focal point of this article is the impact of the chosen number of scale points on the one hand and the influence of the chosen verbal labels on the other. One type of category scale is used particularly often and therefore attracts our attention, namely agreement scales (e.g. totally agree, do not agree, agree, …).

According to Rohrmann (2003) it is essential to strive for the development of equidistant and unambiguous category scales, but little has yet been done to achieve this goal. He found that whether labels are embedded in a specific “context A” or “context B” has little influence on the perceived intensity of those labels. However, he employed a different definition of context, i.e. the content of the associated questions. Borg and Lindblad (1976) found that everyday expressions (e.g. verbal labels) used as qualifiers were rated quite similarly by different groups of individuals, so much so that these ratings are representative of the underlying rank-ordering and perceived intensity of the expressions presented. Building on this acceptable inter-individual correspondence they concluded that a category scale with verbal qualifiers can be constructed. They stipulated that these qualifiers ought to be positioned according to their underlying perceived intensity; furthermore, they should be positioned in the same way as they are perceived in relation to one another. Hence Borg (Borg, 1982; Borg and Borg, 1987, 2001) developed his own scaling method, “category ratio scaling”, resulting in the development of the CR-100 and the CR-10 scales, which measure perceived exertion (see Fig. 1). According to Borg (2003), scaling will never be performed with a perfect scale and the idea itself is utopian; he prefers to speak of “semi-ratio scaling” properties. “Semi-ratio” refers to the fact that Borg’s scale can be treated as though it has ratio properties (in spite of small individual variances), since the scale proved to have high predictive power (Borg, 2003).

In addition to investigating the effect of “context defined by the scale in itself”, another aim of this research is to examine the possibility of constructing an agreement category scale as described by Borg and Lindblad (1976).

Rohrmann (2003) states that in the eighties the focus of social sciences research temporarily shifted from category-based scaling to other scaling techniques such as magnitude estimation. At times magnitude estimation reaches a higher level of measurement and is less prone to context and other biases than category scaling (Gescheider, 1988; Stevens, 1966). Different authors (Lodge et al., 1975; Rohrmann, 2003; Schaffer and Bradburn, 1989; Wegener, 1983) stated that magnitude scaling is indeed superior in terms of the measurement level obtained, but is perceived as more difficult to use by respondents. In classic magnitude estimation experiments, participants are asked to directly match numbers to the perceived intensity (sensation magnitude) of a sensory or physical stimulus (e.g. a bright light, a sound, …). This is done one stimulus at a time, with each individual stimulus randomly presented several times. In doing so, ratio scales of sensation can be developed. Sometimes participants are given a reference stimulus and a numerical value, called a “modulus”. They are then asked to rate the perceived intensity relative to the given reference value (Gescheider, 1988; Lodge et al., 1976a). According to Rohrmann (2003) the fundamental difference between category rating and magnitude estimation lies in the different cognitive operations required from respondents, namely thinking in differences (category scales) or in ratios (magnitude estimation). To date this kind of research has established that human observers are capable of using numbers to make proportional judgments of physical (or more objective) stimuli.

Fig. 1. Borg CR scale for perceived exertion.

Magnitude estimation is a scaling technique typically used in psychophysical research and was introduced by Stevens (1975). According to Stevens’ “representational measurement theory”, measurement refers to the allocation of numerical values to stimuli (object, event, attitude, …) according to a set of meaningful rules (Stevens, 1975). Stevens’ power law (1975) claims that equal ratios of physical intensity produce equal ratios of sensation, or:

$$\psi = k I^{n} \qquad (1)$$

where ψ is the sensation magnitude, k is a constant depending on the scale unit, and n is an exponent to which the stimulus intensity I is raised. This exponent n is specific to each stimulus modality; for instance, the exponent n for line length equals 1 and for handgrip it equals 0.67 (Gescheider, 1988; Stevens, 1975). The sensation magnitude ψ can be expressed by means of different response modalities (e.g. numbers, line length, handgrip, …) and not only by means of numerical estimates. Others argue that because in some cases the real value of an attribute or stimulus (e.g. IQ) cannot be observed directly, there is no way of knowing which measurement scale corresponds to the values of an attribute (Kampen, 2001). Thus, for classic magnitude estimation techniques, the shift from physical to social stimuli (more accurately, the shift to the measurement of attitudes) raises one major problem: the lack of known metric properties of these social stimuli and the impossibility of assessing them directly. A possible answer to this problem can be found in the “cross modality” paradigm, which provides a way to validate estimates of such social–psychological stimuli.

1 The response alternatives were presented in Dutch: “fully agree” (‘helemaal akkoord’), “rather agree” (‘eerder akkoord’), “neutral” (‘neutraal’), “rather disagree” (‘eerder niet akkoord’), “fully disagree” (‘helemaal niet akkoord’).

Validation in this case stands for a method to establish the proper quantification of these social–psychological stimuli through a calibration task. In this task participants first express estimates of their sensation of physical stimuli through two different modalities (Bruce and Clayton, 1999; Cools et al., 2004). In practice, participants in this stage are often asked to estimate line length stimuli by means of numbers (modality 1) and to produce lines to express numerical stimuli (modality 2). The rationale behind this paradigm is mainly based on the transitivity of magnitude estimations across different response scales, e.g. number estimation, line production, hand grip, etc. (Cardoso et al., 2001; Kampen, 2001). The paradigm may be presented schematically (Lodge et al., 1976a) as follows:

Because the stimulus Φ remains the same across the different response modalities, both related power functions can be written as:

$$R_1 = k_1 \Phi^{x_1}, \qquad R_2 = k_2 \Phi^{x_2} \qquad (2a)$$

Taking logarithms of both functions in Eq. (2a) and eliminating Φ, we can deduce Eq. (2b), where $k = \log k_1 - \frac{x_1}{x_2} \log k_2$ is a constant:

$$\log R_1 = \frac{x_1}{x_2} \log R_2 + k \qquad (2b)$$

It follows that the exponent $x_1 / x_2$ obtained by matching any two modalities should also hold if they are matched indirectly through a third one. This enables researchers to validate the observed estimates of the social scale values (Cardoso et al., 2001). At this point respondents’ estimates of social stimuli can be validated, and the next step in the analysis is the assessment of the underlying scale values. After the calibration phase, participants are asked to provide estimates of social stimuli by means of different response modalities, in our case “numerical estimation” (NE) and “line length production” (LLP). Scale values are determined through the empirical exponents obtained from the calibration exercises. Ideally, social scale values are given by the following formula (Lodge, 1981):

$$\Psi = \left( R_1^{1/x_1} \, R_2^{1/x_2} \right)^{1/2} \qquad (3)$$

where $x_1$ is the exponent derived from the first modality (often numerical estimation) and $x_2$ is the exponent derived from the second modality (often LLP) in the calibration phase.
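The calibration logic of Eqs. (2a), (2b) and (3) can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names `fit_power_exponent` and `social_scale_value` and all data are hypothetical, and the calibration responses are generated as noiseless power functions purely for illustration. The exponent of each modality is estimated as the slope of a log-log regression, the ratio of the two exponents is the slope expected in Eq. (2b), and a scale value is computed from Eq. (3).

```python
import math

def fit_power_exponent(stimuli, responses):
    """Least-squares slope of log(response) on log(stimulus),
    i.e. the exponent x in R = k * stimulus**x."""
    log_s = [math.log(s) for s in stimuli]
    log_r = [math.log(r) for r in responses]
    mean_s = sum(log_s) / len(log_s)
    mean_r = sum(log_r) / len(log_r)
    sxy = sum((s - mean_s) * (r - mean_r) for s, r in zip(log_s, log_r))
    sxx = sum((s - mean_s) ** 2 for s in log_s)
    return sxy / sxx

def social_scale_value(r1, r2, x1, x2):
    """Eq. (3): geometric mean of the two responses, each corrected
    by the exponent of its modality from the calibration phase."""
    return math.sqrt(r1 ** (1.0 / x1) * r2 ** (1.0 / x2))

# Hypothetical calibration data (not from the study).
stimuli = [10, 20, 40, 80, 160]
ne_responses = [2.0 * s ** 1.0 for s in stimuli]   # modality 1, exponent 1.0
llp_responses = [3.0 * s ** 0.9 for s in stimuli]  # modality 2, exponent 0.9

x1 = fit_power_exponent(stimuli, ne_responses)
x2 = fit_power_exponent(stimuli, llp_responses)
ratio = x1 / x2  # slope expected when log R1 is regressed on log R2, Eq. (2b)

# Hypothetical NE and LLP responses to one social stimulus.
psi = social_scale_value(r1=50.0, r2=120.0, x1=x1, x2=x2)
print(round(x1, 3), round(x2, 3), round(ratio, 3), round(psi, 2))
```

With both exponents equal to 1, Eq. (3) reduces to the plain geometric mean of the two responses.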

In summary, through “cross modality matching” we attempt to examine the perceived intensity of typical verbal qualifiers used in agreement scales. We will also map the effect of the scaling context itself (verbal labels and number of response alternatives). Finally, we will assess whether category scaling and magnitude estimation provide coherent information about the examined verbal qualifiers.

2. Method

The experiment was conducted on the Internet, so subjects could participate from any place of their choice. Forty-three subjects participated, of whom 11 were male and 32 female. Their age varied from 18 to 54 years. The experiment consisted of 10 trials. A complete within-subjects design was used. This enabled us to assess the size and direction of context effects resulting from a particular choice of verbal qualifiers and number of categories on the scale. To avoid sequential biases such as practice effects, fatigue and memory effects, all stimuli within each trial were randomly presented several times. The presented stimuli consisted of seven verbal qualifiers¹: “fully disagree”, “disagree”, “rather disagree”, “neutral”, “rather agree”, “agree” and “fully agree” (see Table 1).

First, each participant took part in a calibration trial. Each subject produced numerical estimates of perceived line length and then produced lines to express numerical stimuli. As mentioned before, this allows us to validate the estimates of social scale values in the subsequent trials.

The calibration task was followed by two no-context trials in which subjects produced estimates of the perceived magnitude of each verbal label. During these two trials the qualifiers were presented without any scaling context. In the first no-context trial subjects provided numerical estimates (NE) and in the second one LLP was used. Participants were presented with these two no-context trials first in order to avoid sequential biases: we assumed that if participants were first confronted with verbal qualifiers within a scaling context, this context might affect their estimates later on in the no-context trials.

After the no-context trials participants produced numerical estimates and line length estimates of the same verbal qualifiers, now presented within the context of a scale with five, six or seven categories. These context trials were counterbalanced among participants.

In summary the experiment consisted of:

● two calibration trials;

● two no-context trials: NE and LLP estimation of verbal qualifiers presented as such;

● three context trials: subjects gave NE and LLP estimations of verbal qualifiers presented in the context of a typical:

○ five-category-scale;

○ six-category-scale;

○ seven-category-scale.
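The trial structure listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the software used in the experiment: the function `trial_sequence`, the rotation through all six context orders by participant index, and the NE-before-LLP order within each context trial are hypothetical choices, since the text only states that the context trials were counterbalanced.

```python
import itertools

CONTEXTS = ["five-category", "six-category", "seven-category"]
MODALITIES = ["NE", "LLP"]

def trial_sequence(participant_index):
    """Return the ordered list of (trial type, response modality) pairs
    for one participant."""
    # Fixed opening block: calibration first, then the two no-context trials.
    trials = [("calibration", "NE"), ("calibration", "LLP"),
              ("no-context", "NE"), ("no-context", "LLP")]
    # Assumed counterbalancing: cycle through the 6 possible context orders.
    orders = list(itertools.permutations(CONTEXTS))
    order = orders[participant_index % len(orders)]
    for context in order:
        for modality in MODALITIES:
            trials.append((context, modality))
    return trials

for participant in range(3):
    sequence = trial_sequence(participant)
    print(participant, len(sequence), [context for context, _ in sequence[4::2]])
```

Each participant receives 2 + 2 + 3 × 2 = 10 trials, which matches the number of trials stated in the Method section.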


3. Results

Table 1. NE and LLP estimates for each verbal qualifier in the no-context trials

| Verbal qualifier | NE X̄ | NE S.D. | LLP X̄ | LLP S.D. |
|---|---|---|---|---|
| Fully disagree | 0.7 | 2.3 | 0.02 | 0.2 |
| Disagree | 2.9 | 4.79 | 8.2 | 2.3 |
| Rather disagree | 35.1 | 8.1 | 41.7 | 17.8 |
| Neutral | 37 | 7.9 | 129.8 | 47.1 |

The first part of the results concerns the psychophysical validation based on the two calibration sessions. The observed exponents $x_1$ and $x_2$ in Eq. (2) have been derived for both modalities (numerical estimates and LLPs). These are used to check performance adequacy and subjects’ understanding of the instructions. We know from psychophysical research (Gescheider, 1988; Stevens, 1975) that the theoretical exponents of the related power functions for NE and line length are both equal to 1. When the fraction $x_1 / x_2$ in Eq. (2) is calculated, this should therefore also be equal to 1, as stipulated by the cross modality paradigm. This indicates that an increase/decrease of a line length/number produces an equal increase/decrease of the corresponding number/line length (Cools et al., 2003; Gescheider, 1988; Lodge et al., 1975; Stevens, 1975).

In our findings the average value of this fraction equals 1.02 with a standard deviation of 0.049. When testing whether this fraction is equal to 1 (one-sample t-test) we surprisingly found that the observed fraction differs significantly from 1 (t(3) = 2.746; P < 0.01). This could indicate that at least one of our participants did not perform adequately in the calibration task. However, when scanning the data for outliers (for the exponents for the different modalities, and for the fraction in Eq. (2)) none were found.
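As a rough check, the one-sample t statistic can be recomputed from the rounded summary values given above. The sketch below assumes that all 43 participants from the Method section entered the test; the helper `one_sample_t` is hypothetical and the raw data are not available here, so the result can only approximate the reported value.

```python
import math

def one_sample_t(mean, sd, n, mu0=1.0):
    """t statistic and degrees of freedom for H0: population mean == mu0,
    computed from summary statistics only."""
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1

# Rounded summary values from the text; n = 43 is an assumption.
t_value, df = one_sample_t(mean=1.02, sd=0.049, n=43)
print(f"t({df}) = {t_value:.2f}")
```

Under these assumptions the statistic comes out close to, but slightly below, the reported 2.746, a difference that rounding of the mean and standard deviation could account for.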

As shown in Fig. 2, the regression of LLP on NE gave us no cause for concern, since the numerical estimates explain about 95% of the variance in the LLP responses (R² = 0.947).

If one were to consider only the cross modality matching paradigm, the finding that the observed fraction differs from 1 would mean that the estimates given by our subjects are not validated. However, since we found no extreme values and moreover computed an R² value of 0.95, we conclude that people did assign ratio estimates to the presented stimuli. A possible explanation can be found in the calibration exercise, in our implementation of the “line length” modality. The theoretical exponents used for the different modalities are primarily based on findings that date from more than two decades ago. One may wonder whether in this case both can still be considered the same modality: back then line length estimates would have been obtained with a paper-and-pencil experiment, while in the present study we used a computer-based task which was completed within the limits of a screen. Therefore we may have to think of a specific exponent for computer-based line length estimates. Our rationale is supported by the research of Ross (2003) and Teghtsoonian (2004), who also stated that the variance of fit (or R²) may be a better indicator of the quality of judgments than the fraction (see Eq. (2)) which is traditionally used in the “cross modality matching” paradigm.

Fig. 2. Average LLP estimates plotted against average NE estimates for the calibration trial.

This section involves the calculation of the no-context scale values based on the two “no-context” trials. When exploring the raw data (as shown in Table 1) it is noticed that the LLP estimates exceed the NE estimates by a multiplicative factor of more than 2. Calculating the actual scale values according to the Lodge et al. (1975) rationale (see Eq. (3)) requires prior standardization of the raw estimates. We used the following linear transformation:

$$\mathrm{LLP}_{ij} = a_i \, \mathrm{NE}_{ij} \qquad (4)$$

where $\mathrm{LLP}_{ij}$ stands for the “line length” estimate of label i obtained from subject j, and $\mathrm{NE}_{ij}$ refers to the corresponding “numerical estimate”.

In short, a transformation based on the regression coefficients of the LLP estimates against the NE estimates was performed for each subject individually. From these transformed estimates the social scale values were calculated according to Lodge (1981) (as shown in Table 2).
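One way to read this standardization step is sketched below in Python. It is an assumption-laden illustration rather than the authors' procedure: the slope of Eq. (4) is estimated per subject by least squares through the origin, the LLP estimates are divided by that slope to bring them onto the NE scale, and Eq. (3) is then applied with both exponents set to their theoretical value of 1. The function names and the example data are hypothetical.

```python
import math

def slope_through_origin(ne, llp):
    """Least-squares coefficient a in LLP = a * NE (no intercept), as in Eq. (4)."""
    return sum(n * l for n, l in zip(ne, llp)) / sum(n * n for n in ne)

def standardize_llp(ne, llp):
    """Rescale one subject's LLP estimates onto the NE scale (assumed reading)."""
    a = slope_through_origin(ne, llp)
    return [l / a for l in llp]

def scale_values(ne, llp_std, x1=1.0, x2=1.0):
    """Eq. (3) applied label by label to the standardized estimates."""
    return [math.sqrt(n ** (1.0 / x1) * l ** (1.0 / x2))
            for n, l in zip(ne, llp_std)]

# Hypothetical estimates from one subject for four labels (not study data).
ne = [5.0, 30.0, 60.0, 100.0]
llp = [12.0, 70.0, 150.0, 240.0]

llp_std = standardize_llp(ne, llp)
print([round(v, 1) for v in scale_values(ne, llp_std)])
```

Because the rescaling is done per subject, differences in how long a line each subject is inclined to draw do not carry over into the derived scale values.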

A quick inspection of the results reveals that the label “neutral” does not coincide with the mid-point of the scale, as is generally assumed in survey research. Also, the overlaps (95% confidence intervals) found between “agree” and “fully agree” and between “disagree” and “fully disagree” are noteworthy. The use of additional extreme verbal qualifiers (like “fully agree” and “fully disagree”) apparently adds nothing to the metric properties of the scale.
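The overlap between “agree” and “fully agree” and the off-centre position of “neutral” can be illustrated from the Table 2 means and standard deviations. The sketch below assumes n = 43 and a normal-approximation interval for independent means; the authors may have computed their intervals differently (for example, taking the within-subjects design into account), so the numbers are indicative only.

```python
import math

N_SUBJECTS = 43  # from the Method section
Z_95 = 1.96      # normal approximation (assumption)

def confidence_interval(mean, sd, n=N_SUBJECTS, z=Z_95):
    """Approximate 95% confidence interval for a mean."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

def intervals_overlap(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

agree = confidence_interval(113.0, 21.3)        # Table 2
fully_agree = confidence_interval(106.5, 21.0)  # Table 2
print(agree, fully_agree, intervals_overlap(agree, fully_agree))

# Position of "neutral" between the lowest (0.0) and highest (113.0) mean value.
neutral_position = (41.5 - 0.0) / (113.0 - 0.0)
print(round(neutral_position, 2))
```

A value of 0.5 for `neutral_position` would correspond to a label sitting exactly at the mid-point of the scale.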

In this section the results concerning the influence of the scaling context are described. The same linear transformation was also applied to the LLP estimates of the context trials, and the social scale values for the different scaling contexts (i.e. five, six or seven categories) were derived in the same way. Results are shown in Table 3.

Table 1 (continued)

| Verbal qualifier | NE X̄ | NE S.D. | LLP X̄ | LLP S.D. |
|---|---|---|---|---|
| Rather agree | 84.3 | 26.3 | 205.1 | 49.2 |
| Agree | 102.3 | 12 | 337.7 | 88.6 |
| Fully agree | 111 | 21.5 | 273.4 | 53.6 |

Table 2. Derived social scale values for each verbal qualifier in the no-context trials

| Verbal qualifier | X̄ | S.D. |
|---|---|---|
| Fully disagree | 0.0 | 0.0 |
| Disagree | 1.6 | 2.6 |
| Rather disagree | 22.7 | 6.6 |
| Neutral | 41.5 | 8.2 |
| Rather agree | 79.6 | 19.9 |
| Agree | 113.0 | 21.3 |
| Fully agree | 106.5 | 21.0 |

Table 3. Derived social scale values for each verbal qualifier in the different context trials

| Verbal qualifier | Five categories X̄ | Five categories S.D. | Six categories X̄ | Six categories S.D. | Seven categories X̄ | Seven categories S.D. |
|---|---|---|---|---|---|---|
| Fully disagree | | | 0.08 | 0.32 | 0.00 | 0.00 |
| Disagree | 6.3 | 7.6 | 7.2 | 7.1 | 6.2 | 6.6 |
| Rather disagree | 18.5 | 18.8 | 20.4 | 19.2 | 18.9 | 18.8 |
| Neutral | 43.6 | 9.4 | | | 43.8 | 8.7 |
| Rather agree | 77.2 | 20.3 | 81.4 | 20.4 | 79.4 | 18.3 |
| Agree | 109.7 | 20.1 | 105.4 | 15.1 | 106.5 | 17.02 |
| Fully agree | | | 108.7 | 19.8 | 107.06 | 17.9 |

We produced four overlay scatter plots, one for each “context” condition (i.e. five, six or seven categories) and one for the “no-context” condition. In these scatter plots (as shown in Fig. 3) all NE estimates are plotted against the LLP estimates for all verbal labels. These plots reveal that the estimates of the investigated “context” trials contain less error variance than the estimates of the “no-context” condition. In general, high regression fits (R² > 0.70) were found in all conditions. A “six category” scale seems to show the least spread around the fit line (R² = 0.88).

The error bar graph indicates that the relative position of each individual verbal label does not change dramatically due to the scaling context. However, the scale values obtained in the “six categories” trial show a large gap between “rather agree” and “rather disagree”. In this case we expected a repositioning of the scale values along the scale due to the absence of the qualifier “neutral”.

In order to assess the differences between the derived scale values of the “no-context” trials and each “context” trial separately, we performed three (i.e. five, six and seven categories versus no-context) different two-way repeated measures ANOVAs with the factors “label” and “context”. The levels of the factor “label” represented the actual verbal qualifiers, and the levels of the factor “context” stand for either “no-context” or “the specific number of categories”. In all three cases we found a significant interaction effect of label and context (F(3, 130) = 12.399, P < 0.05; F(3, 101) = 14.482, P < 0.05; F(2, 82) = 20.908, P < 0.05). The interaction plots in Fig. 4 show that the response patterns for the context and no-context conditions are very similar. A difference between context and no-context only seems to occur for some specific labels, i.e. “rather disagree” for the “five categories” context and “agree” in the “six and seven categories” conditions.

Fig. 3. Average LLP estimates plotted against average NE estimates for the no-context and context trials.

The Bonferroni analysis of the difference between the “no-context” condition and the “five categories” condition showed a significant small difference (partial η² = 0.212) for the label “disagree” (F(5) = 4.31, P < .05). The same analysis for the “six categories” trial also reveals only a significant medium difference (partial η² = 0.365) for “disagree” (F(6) = 7.571, P < .05). The “seven categories” context showed no significant difference for “disagree”, although the qualifier “fully disagree” showed a significant difference (F(6) = 5.379, P < .05) with a medium effect size (partial η² = 0.29).
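Partial η² is linked to an F statistic through the standard identity η²p = (F · df_effect) / (F · df_effect + df_error). The Python sketch below applies this identity to the three interaction effects, which are the only tests reported with both degrees of freedom. These effect sizes are not reported in the article; the assignment of the three F values to the five-, six- and seven-category comparisons assumes that they are listed in that order, and the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Label x context interaction effects as reported in the text (order assumed).
reported = {
    "five categories vs. no-context": (12.399, 3, 130),
    "six categories vs. no-context": (14.482, 3, 101),
    "seven categories vs. no-context": (20.908, 2, 82),
}

for comparison, (f_value, df_effect, df_error) in reported.items():
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{comparison}: partial eta squared = {eta:.3f}")
```

The same function cannot be applied to the Bonferroni comparisons above, because only one degree-of-freedom value is given for them.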

4. Discussion and conclusion

We used the “cross modality matching” paradigm to assess the perceived value of verbal qualifiers measuring level of agreement as well as the effect of the scaling context itself. Like Borg and Lindblad (1976) we conclude that our subjects rated the agreement qualifiers quite similarly. A high agreement among participants is observed. A linear regression of the two response modalities over all verbal labels, for all (no-)context conditions, revealed all R² values to be higher than 0.75.

Fig. 4. Average derived scale values for the no-context trial and each context trial.

Fig. 5. Proposed scale.

Because participants did not rate the label “neutral” as taking the middle position of the scale, and because the results of the ANOVAs showed a small but significant repositioning for some of the verbal qualifiers, we tend to disagree with Rohrmann (2003), who states that rating scales should be “constructed which approximate interval scale quality, and that therefore it is essential to use equidistant scale points”. Yet we support Borg and Lindblad (1976), who argue that the presented scale should be representative of the underlying perceived intensities.

For the “qualifiers” used in our research, a category scale with five response alternatives appears to be the best choice. For such a scale, little variability around the fit line of the NE estimates plotted against the LLP estimates (as shown in Fig. 3) is established. Our results indicate that the metric scale properties are not improved when adding the labels “fully agree” or “fully disagree” to the respective extreme ends of the scale: the 95% confidence intervals of the extreme labels “fully disagree” and “fully agree” overlap with those of “disagree” and “agree”. A “six-category” scale shows less inter-individual variability than a “five-category” scale, but the error bar plot shows an important gap between “rather agree” and “rather disagree”. A “six-category” scale can possibly result in less differentiation between respondents. Also, the effect of context on the label “disagree” is smaller for the “five-category” scale.

These results indicate that in order to achieve better metric properties, the mean social scale values (i.e. 1.6; 22.7; 41.5; 79.6; 112.9) ought to be used for data encoding in survey research instead of the currently used values (e.g. 1; 2; 3; 4; 5 or …). In observing the LLP and NE estimates of the social scale values of all participants, a good linear regression fit (R² > 0.70) with an intercept equal to 0 is demonstrated. Hence, we can conclude that the derived social scale estimates have “semi-ratio” properties as defined by Borg (2003). Furthermore, this enables us to conclude that the acquired social scale values can be divided or multiplied by any factor, and that they can be used in possible future data analyses. This has also been done for the values presented in Fig. 5. Hofmans et al. (2004) stipulate that for response scales in Dutch, labels are best presented with those perceived as lowest at the left-hand side of the scale, because then they are less prone to orientation effects. At this stage we settled for the scale presented in Fig. 5. However, it must be noted that layout can also be interpreted as part of the scaling “context”, though this was not the focus of this study.
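The proposed data encoding can be illustrated with a short Python sketch. The assignment of the five values to the labels “disagree” through “agree” is an assumption based on matching the values in the text to the rows of Table 2; the function `encode` and the example responses are hypothetical.

```python
# Mean social scale values from the text, matched to Table 2 labels (assumed).
SCALE_VALUES = {
    "disagree": 1.6,
    "rather disagree": 22.7,
    "neutral": 41.5,
    "rather agree": 79.6,
    "agree": 112.9,
}

def encode(responses, scale_values=SCALE_VALUES):
    """Replace each verbal response by its derived scale value
    instead of an equidistant code such as 1 to 5."""
    return [scale_values[response] for response in responses]

# Hypothetical survey answers to one item.
responses = ["agree", "neutral", "rather agree", "disagree"]
coded = encode(responses)
print(coded, sum(coded) / len(coded))

# Ratio of perceived intensities, meaningful only under "semi-ratio" properties.
ratio = SCALE_VALUES["agree"] / SCALE_VALUES["rather agree"]
print(round(ratio, 2))
```

With equidistant codes the same ratio would be 5/4, so the two encodings imply different distances between adjacent labels.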

Because “cross modality matching” as described above is a very new approach for the analysis of context effects, for the investigation of the measurement level of category scales and for the assessment of the perceived intensity of chosen verbal qualifiers, some caution is required. Further evaluation and replication of these findings are certainly needed.

References

Borg, E., 2003. The multidimensional character of perceived exertion for a maximal and submaximal cycle ergometer test. In: Berglund, B., Borg, E. (Eds.), Fechner Day 2003. Proceedings of the Nineteenth Annual Meeting of the International Society for Psychophysics. International Society for Psychophysics, Larnaca.

Borg, G., 2003. An index for relations between perceptual magnitudes based on level-anchored ratio scaling. In: Berglund, B., Borg, E. (Eds.), Fechner Day 2003. Proceedings of the Nineteenth Annual Meeting of the International Society for Psychophysics. International Society for Psychophysics, Larnaca.

Borg, G., Lindblad, I., 1976. The determination of subjective intensities in verbal descriptions of symptoms. University of Stockholm (Unpublished manuscript).

Borg, G., 1982. A category scale with ratio properties for intermodal and interindividual comparisons. (Psychological judgments). Deutscher Verlag der Wissenschaften, Berlin (Unpublished manuscript).

Borg, G., Borg, E., 1987. On the relation between category scales and ratio scales and a method for scale transformations. (Unpublished manuscript).

Bruce, H.W., Clayton, P., 1999. An internet role for the academic librarian? Australian Academic & Research Libraries 30 (3), 171–187.

Cardoso, F., Matsushima, E.H., Kamizaki, R., Oliveira, A.M., Da Silva, J.A., 2001. The measurement of emotion intensity: a psychophysical approach. In: Sommerfeld, E., Kompass, R., Lachmann, T. (Eds.), Fechner Day 2001. Proceedings of the Seventeenth Annual Meeting of the International Society for Psychophysics. International Society for Psychophysics, Berlin.

Cools, W., Hofmans, J., Baekelandt, S., Theuns, P., 2003. The role of context in category scaling: placing anchors at equal distances. In: Berglund, B., Borg, E. (Eds.), Fechner Day 2003. Proceedings of the Nineteenth Annual Meeting of the International Society for Psychophysics. International Society for Psychophysics, Larnaca.

Cools, W., Hofmans, J., Baekelandt, S., Theuns, P., 2004. The numeric estimation of verbal qualifiers measuring level of agreement: differences in response patterns. In: Oliveira, A., Teixeira, M., Borges, G., Ferro, M. (Eds.), Fechner Day 2004. The Twentieth Annual Meeting of the International Society for Psychophysics (pp. 344–349). International Society for Psychophysics, Coïmbra.

Hofmans, J., Baekelandt, S., Cools, W., Theuns, P., 2004. The influence of reversing the anchors on the rating scale on the intensity perception of the anchors. In: Oliveira, A., Teixeira, M., Borges, G., Ferro, M. (Eds.), Fechner Day 2004. The Twentieth Annual Meeting of the International Society for Psychophysics (pp. 190–198). International Society for Psychophysics, Coïmbra.

Gescheider, G.A., 1988. Psychophysical scaling. Annual Review of Psychology 39, 169–200.

Kampen, J., 2001. The adequacy and interpretation of models for ordinal association. Unpublished doctoral dissertation, Katholieke Universiteit Brussel, Brussels.

Lodge, M., Cross, D.V., Tursky, B., Tanenhaus, J., 1975. The psychophysical scaling and validation of political support scale. American Journal of Political Science 19, 611–650.

Lodge, M., Cross, D., Tursky, B., Tanenhaus, J., Reeder, R., 1976a. The psychophysical scaling of political support in the real world. Political Methodology 2, 159–182.

Lodge, M., 1981. Magnitude scaling: quantitative measurement of opinions. Sage University papers series on quantitative applications in the social sciences (Vol. 5). Sage Publications, London.

Rohrmann, B., 2003. Verbal qualifiers for rating scales: sociolinguistic considerations and psychometric data. University of Melbourne, Australia (Unpublished manuscript).

Ross, H.E., 2003. Context effects in the scaling and discrimination of size. In: Berglund, B., Borg, E. (Eds.), Fechner Day 2003. The Nineteenth Annual Meeting of the International Society for Psychophysics (pp. 257–304). International Society for Psychophysics, Larnaca.

Schaffer, N.C., Bradburn, N.M., 1989. Respondent behaviour in magnitude estimation. Journal of the American Statistical Association 84 (406), 402–413.

Stevens, S.S., 1966. A metric for social consensus: methods of sensory psychophysics have been used to gauge the intensity of opinions and attitudes. Science 151, 530–541.

Stevens, S.S., 1975. Psychophysics: Introduction to its Perceptual, Neural and Social Prospects. Wiley, New York.

Teghtsoonian, 2004. Range effects how many and how important? In: Oliveira, A., Teixeira, M., Borges, G., Ferro, M. (Eds.), Fechner Day 2004. The Twentieth Annual Meeting of the International Society for Psychophysics. International Society for Psychophysics, Coïmbra.

Wegener, B., 1983. Category-rating and magnitude estimation scaling techniques. Social methods and research 12, 31–75.