Religiosity, moral attitudes and moral competence

Issues on Study Design Petri Nokelainen [email protected]

Articles

1. Fung, I. Y., Wilkinson, I. A., & Moore, D.W. (2003). L1-assisted reciprocal teaching to improve ESL students' comprehension of English expository text. Learning and Instruction, 13(1), 1-31.

2. Lewalter, D. (2003). Cognitive strategies for learning from static and dynamic visuals. Learning and Instruction, 13(2), 177–189.

3. Ruggero, C. J., Johnson, S. L., & Kizer, A. (2004). Spanish language measures of mania and depression. Psychological Assessment, 16(4), 381-385.

4. Laurent, J., Catanzaro, S. J., & Joiner, T. E. Jr. (2004). Development and validation of the physiological hyperarousal scale for children. Psychological Assessment, 16(4), 373-380.

5. Nokelainen, P., Tirri, K., & Merenti-Välimäki, H.-L. (2007). Investigating the Influence of Attribution Styles on the Development of Mathematical Talent. Gifted Child Quarterly, 51(1), 64-81.

6. Duriez, B., & Soenens, B. (2006). Religiosity, moral attitudes and moral competence: A critical investigation of the religiosity-morality relation. International Journal of Behavioral Development, 30(1), 76-83.

7. Nokelainen, P., & Ruohotie, P. (2009). Non-linear Modeling of Growth Prerequisites in a Finnish Polytechnic Institution of Higher Education. Journal of Workplace Learning, 21(1), 36-57.

8. Tirri, K., Nokelainen, P., & Mahkonen, M. (2009). How Morality and Religiosity Relate to Intelligence: A Case Study of Mathematically Gifted Adolescents. Journal of Empirical Theology, 22(1), 70-87.

Learning and Instruction 13 (2003) 1–31
www.elsevier.com/locate/learninstruc

L1-assisted reciprocal teaching to improve ESL students’ comprehension of English expository text

Irene Y.Y. Fung (a,∗), Ian A.G. Wilkinson (b), Dennis W. Moore (a)

(a) Research Centre for Interventions in Teaching and Learning, School of Education, University of Auckland, Private Bag 92019, Auckland, New Zealand

(b) The Ohio State University, OH, USA

Abstract

A multiple-baseline design across three schools was used to investigate the effects of L1-assisted reciprocal teaching on 12 Year 7 and Year 8 (Grades 6 and 7) Taiwanese ESL students’ comprehension of English expository text. The intervention comprised the alternate use of L1 (Mandarin) and L2 (English) reciprocal teaching procedures. Through 15–20 days of instruction, students learned how to foster and monitor their comprehension by using the cognitive and metacognitive strategies of questioning, summarising, clarifying, and predicting. Students made gains on both researcher-developed and standardised tests of reading comprehension and showed evidence of qualitative changes in their comprehension processes when reading L1 and L2 texts. © 2002 Published by Elsevier Science Ltd.

Keywords: English (second language); Reciprocal teaching; Reading comprehension; Metacognition

1. Introduction

The learning difficulties of students for whom English is their second language (ESL) are a key issue in many educational settings. It is a challenge to help these students read age-appropriate content area materials for academic learning when their English language proficiency is limited.

∗ Corresponding author. Tel.: +(64-9)-373-7599x3721; fax: +(64-9)-367-7191. E-mail address: [email protected] (I.Y.Y. Fung).

0959-4752/02/$ - see front matter © 2002 Published by Elsevier Science Ltd. PII: S0959-4752(01)00033-0

2 I.Y.Y. Fung et al. / Learning and Instruction 13 (2003) 1–31

Remedial programs for ESL school students who are experiencing reading difficulties typically emphasise improving students’ English language knowledge and competence. The learning difficulties of these students are usually attributed to their limited-English proficiency, so remedial programs in schools often delay reading comprehension instruction until the students are fluent in oral English (Anderson & Roit, 1993). At most, reading and reading instruction in ESL classes aim to develop students’ decoding skills or their knowledge of syntax or vocabulary for literal comprehension. Few focus on teaching higher-level reading comprehension strategies. Findings of bilingual education research suggest that, without special intervention, ESL students normally take 2–3 years to become proficient in basic communication skills in English and 4–10 years to approach grade-level competence in English academic skills (Collier, 1988; Collier & Thomas, 1989). The pressing reality is that non-English proficient new immigrant secondary school students do not have time to wait for their English language skills to mature to the stage where they can read to learn.

Interventions that develop cognitive and metacognitive strategies for reading comprehension may provide an alternative approach to enhancing ESL students’ reading performance. There is substantial evidence that reading strategy instruction can help poor readers improve their reading comprehension (Anderson & Roit, 1993; Block, 1993; Deshler & Schumaker, 1993; Miller, 1985; Palincsar & Brown, 1984; Paris, Cross, & Lipson, 1984; Pressley, Johnson, Symons, McGoldrick, & Kurita, 1989). Reciprocal teaching, in particular, has been shown to be a feasible method of teaching cognitive and metacognitive strategies for reading comprehension to poor readers even before they are fully able to decode (Brown & Palincsar, 1985; Le Fevre, 1996). The reading difficulties experienced by young and slow learners whose first language is English are quite similar to those of ESL students (Klingner & Vaughn, 1996; Miller & Perkins, 1990; O’Malley & Chamot, 1990). Hence, ESL students might also benefit from reciprocal teaching.

Reciprocal teaching, developed by Palincsar and Brown (1984), is designed to improve the reading comprehension of students who are able to decode, but have difficulty in comprehending age-appropriate text. The procedure employs a collaborative small-group discussion method for teaching and learning a set of reading comprehension fostering and monitoring strategies by engaging students routinely in four strategic activities (questioning, summarising, clarifying, and predicting) while reading a common text. Initially, the teacher takes the major responsibility for leading the discussion in the group, by thinking aloud and modelling what it is that expert readers do when they try to understand and remember a text. In so doing, the normally covert comprehension fostering and monitoring processes are made visible to students. Later, the students adopt the teacher’s role and take turns leading the discussion for a segment of text, while the teacher monitors, diagnoses and supports students’ participation. The teacher’s task is to ensure students’ successful participation, using such techniques as prompting, praising, altering the demand on the students, or providing extra scaffolding when necessary (Palincsar, 1986). Rosenshine and Meister (1994) reviewed 16 quantitative studies on reciprocal teaching conducted between 1984 and 1992 and reported an effect size of 0.88 when researcher-developed tests were used to assess students’ comprehension and 0.32 when standardised tests were used.

Reciprocal teaching has been used with ESL students but with mixed success. Cotterall (1990) and Dashwood and Mangubhai (1996) reported no significant improvements in the reading comprehension of their ESL pre-university participants after implementing reciprocal teaching interventions, whereas Klingner and Vaughn (1996) reported positive results with grades 7–8 Hispanic ESL students. In Klingner and Vaughn’s study, students were encouraged to use their first language among their peers to clarify misunderstanding even though the teacher did not speak their language.

One plausible reason for the reported failure of reciprocal teaching with ESL students is that it requires students to be able to cope with the concurrent cognitive demands of high-level language processing and high-level strategic thinking for reading comprehension. Few ESL students are able to handle the linguistic burden when the reciprocal teaching dialogue is conducted only in English. Indeed, during the first phase of a pilot study, the first author found that four Year 7 (grade 6) Taiwanese ESL students were unable to interact with the teacher and other students when conventional reciprocal teaching, in English, was tried (Fung, 1999). These students’ mean daily reading comprehension performance during this phase was even lower than their mean performance during the baseline. To ease the linguistic burden, Cotterall (1990) and Dashwood and Mangubhai (1996) have suggested using participants’ first language during strategy training. However, few systematic studies exploring the effects of L1-based reading instruction have been conducted.

2. L1-assisted reciprocal teaching

2.1. Theoretical background

Three related literatures guided the modifications to conventional reciprocal teaching procedures in the present study. The first related to the nature of ESL students’ L2 reading processes; the second concerned the role of metacognition in facilitating knowledge and skills transfer; and the third was research on effective reading strategy instruction.

2.1.1. L2 reading processes

Fluent reading entails heavy demands on the reader’s attention (Samuels, 1994) and relies on the automaticity of component processes of decoding and comprehension. In the case of ESL students who are trying to comprehend a text in a language not yet mastered, the task inevitably requires more attention than is available, as their L2 reading is often slower, more laborious and frustrating compared to their L1 reading. Indeed, the rationale of many ESL remedial programs is to develop students’ English language competencies and decoding automaticity to attain reading fluency.

However, according to schema theory (Adams & Collins, 1979; Anderson & Pearson, 1984; Rumelhart, 1980), there is a strong relationship between the knowledge of the world that readers bring to text and comprehension of those texts. Many studies in the L2 reading literature have reported that, beyond a necessary basic, but limited knowledge of English, ESL readers’ abilities to use schemata and engage in an interactive processing to construct meaning can make a difference in how well they comprehend in English (e.g. Carrell & Eisterhold, 1983; Hague & Olejnik, 1989; Langer, 1990; Swaffar, 1988). These findings suggest that a wealth of background knowledge and a high level of strategic knowledge, both acquired in L1 reading, may facilitate L2 reading comprehension.

This view is similar to the point Vygotsky (1962) made when discussing learning a foreign language. He argued that a foreign (or second) language learner transfers the system of meaning he or she already has to the new language:

The foreign [or second] language acquisition process does not repeat the course of the first language acquisition, but is an analogous system that develops in a reverse direction. Each system complements the other and the two languages interact to the advantage of each (Vygotsky, 1962, p. 26).

These two perspectives suggest that as L2 readers are trying to derive meaning from text, they may call into play their existing meaning making system and skills acquired from their previous L1 reading. Strategy use for deriving meaning from text appears to be a stable phenomenon that is not tied to specific language features. Hence, L2 readers should be able to transfer their successful L1 reading strategies and skills to their L2 reading (Benedetto, 1984; Block, 1986; Cummins, 1980; Hague & Olejnik, 1989; Hudson, 1982; Langer, 1990; Lee & Musumeci, 1988).

However, we note that there are sceptics who argue that the transfer of L1 reading skills to L2 reading is not necessarily automatic (e.g. Bossers, 1992; Carrell, 1991; Clarke, 1980; Taillefer, 1996). Findings from their studies suggest that novice L2 readers may not have enough language proficiency to derive meaning from texts written in their developing second language.

One pedagogical implication of the studies discussed above is that, in addition to developing ESL students’ English language proficiency, non-English-proficient ESL learners may benefit from reading instruction if it capitalizes on the learners’ existing knowledge and skills acquired in their L1 literacy activities and promotes transfer to their English reading.

2.1.2. Metacognition facilitates strategy transfer

Findings from strategy training studies show that metacognition facilitates transfer of learners’ acquired knowledge, skills, and strategies to different learning situations (Paris, Lipson, & Wixson, 1983). Metacognition involves the awareness of one’s own mental process and abilities, as well as those mechanisms that allow one to evaluate and regulate one’s progress during the learning activity (Brown, Campione, & Day, 1981). Readers with metacognitive knowledge are able to simultaneously concentrate on the text and on their reading processes, checking if their reading is resulting in understanding and knowing how to deal with comprehension breakdowns (Paris & Winograd, 1990; Raphael & Pearson, 1985).

Models of reading frequently include a metacognitive component, alongside word, sentence and idea level processing. Reading comprehension is conceptualised as comprising four distinctive and interdependent component processes: decoding, literal comprehension, inferential comprehension, and comprehension monitoring (Frederiksen, 1982; Just & Carpenter, 1987; Thibadeau, Just, & Carpenter, 1982). Decoding involves using the printed word to activate word meanings in memory, either through a direct association of the printed word and its meaning or through the intermediate step of representing letter-sound correspondences. Literal comprehension involves putting activated word meanings together to form propositions. Inferential comprehension involves going beyond the idea explicitly stated to integrate, summarise and elaborate on these ideas. Comprehension monitoring involves setting a reading goal, checking to see if the goal is being reached, and implementing remedial strategies when one’s goal is not being reached. The reader’s background knowledge, automated basic skills and strategies in these four component processes interact and compensate each other to enhance meaning construction while processing text (Gagne, Yekovich, & Yekovich, 1993).

However, when confronting a comprehension breakdown, a reader’s metacognitive knowledge of how, when, where and why to use specific strategies plays a causal role in strategy selection, implementation, and monitoring (Borkowski, Carr, & Pressley, 1987). Therefore, we hypothesised that instruction aimed at raising ESL students’ metacognitive awareness of their L1 reading comprehension would facilitate transfer of the knowledge and skills acquired from their L1 reading to their L2 reading.

2.1.3. Effective strategy instruction

Studies indicate that the metacognitive knowledge and skills that facilitate transfer can be taught and learned through explicit reading strategy instruction (Anderson & Roit, 1993; Baker, 1996; Baker & Brown, 1984; Block, 1993; Brown et al., 1981; Deshler & Schumaker, 1993; Dole, Duffy, Roehler, & Pearson, 1991; Gelzheiser, 1986; Miller, 1987; Palincsar & Brown, 1984). The effectiveness of strategy instruction, however, depends largely on some quite specific teacher actions. Effective strategy instruction requires the teacher to become a mediator who provides explicit explanation, modelling, and scaffolding to help students construct understandings about the content of the text, strategies that aid in comprehension of the text, and the nature of the reading process (Dole et al., 1991; Duffy et al., 1987; Palincsar & Brown, 1984; Paris et al., 1984; Pearson & Dole, 1987). Reciprocal teaching appears to meet these requirements.

2.2. Modifications to conventional practice

Based on these three lines of research, and the results of our pilot study, we modified conventional reciprocal teaching procedures to enable students with limited-English proficiency to cope with the competing cognitive and language demands when reading English expository text. First, our L1-assisted reciprocal teaching incorporated the use of both L1 reciprocal teaching, using students’ first language as the medium of instruction, and L2 reciprocal teaching, using students’ second language as the medium of instruction, while students were reading age-appropriate expository text written in their first and second languages, respectively. The former aimed to facilitate internalisation of the comprehension fostering and monitoring strategies by using the students’ stronger language. The latter aimed to encourage knowledge and strategy transfer and to improve students’ comprehension of English expository text. Second, we adopted the explicit-teaching-before-reciprocal-teaching form of the conventional procedure (for detailed discussion of this procedure, see Rosenshine & Meister, 1994), wherein all new strategies were introduced and practised on L1 reciprocal teaching days, and revisited on L2 reciprocal teaching days.

The purpose of the present study, then, was to investigate the effectiveness of our L1-assisted reciprocal teaching in improving limited-English-proficient students’ comprehension of English expository text. Further, we wanted to examine whether there were any qualitative differences between students’ reading comprehension processes when reading text in their L1 and L2 prior to and after the intervention.

3. Method

3.1. Participants

Twelve Year 7 and Year 8 (Grades 6 and 7) ESL students aged between 11.6 and 13.6 years from three suburban schools in Auckland, New Zealand participated in the study. There were four students, two boys and two girls, from each school, comprising three experimental groups. All students were new migrants from Taiwan, who spoke Mandarin as their first language and were able to read in Chinese at grade level when they left their home country, whereas their English reading ability was approximately 4–6 years behind their chronological ages (see Table 1). They were selected because they were fluent Mandarin speakers and were proficient in Chinese literacy, and because their ESL teachers recommended them as students who were likely to benefit from the intervention.

3.2. Instruments and materials

Four sets of instruments were used in this study to examine students’ comprehension product and process (see Table 2). The standardised test, the Neale Analysis of Reading Ability (1988), measured students’ reading comprehension and decoding accuracy of narrative text at pre-test and post-test. The researcher-developed comprehension tests measured students’ daily performance in comprehension of expository text across various phases of the study. Think-aloud tasks examined students’ L1 and L2 comprehension processes at pre-test and post-test, and the transfer test ascertained students’ ability to transfer comprehension strategies to novel tasks at pre-test and post-test.

Table 1
Demographic information on participants (a)

Student | Sex | Age | No. of years of schooling in NZ | Current year level in NZ | Comprehension reading age (b) | Reading decoding age (c)
S1 | F | 11.6 | 0.5 | Year 7 | 6.0 | 7.2
S2 | M | 11.4 | 0.5 | Year 7 | 6.3 | 7.3
S3 | F | 11.6 | 0.5 | Year 7 | 6.3 | 6.5
S4 | M | 11.11 | 0.5 | Year 7 | 6.3 | 7.1
S5 | M | 13.6 | 3 | Year 8 | 7.10 | 8.8
S6 | F | 12.11 | 2 | Year 8 | 8.1 | 9.10
S7 | M | 12.3 | 3 | Year 8 | 8.6 | 8.10
S8 | F | 12.5 | 1 | Year 8 | 6.3 | 7.1
S9 | M | 13 | 1.5 | Year 8 | 9.5 | 10.6
S10 | M | 12.7 | 0.5 | Year 8 | 7.10 | 9.1
S11 | F | 11.7 | 3 | Year 7 | 8.9 | 10.8
S12 | F | 11.8 | 3 | Year 7 | 8.1 | 9.0

(a) All participants spoke Mandarin as their first language.
(b) Comprehension reading age was assessed by the Neale Analysis of Reading Ability.
(c) Reading decoding age was assessed by the Neale Analysis of Reading Ability.

Table 2
Instrumentation and administration

Instrument | Administration
Product measures |
The Neale Analysis of Reading Ability (Neale, 1988) | Pre-test & post-test
Researcher-developed Comprehension Tests | Daily assessment during Baseline, Intervention, and Follow-up phases
Process measures |
Think-aloud Tasks | Pre-test & post-test
Transfer Test | Pre-test & post-test

3.2.1. Standardised reading comprehension test

The first author administered the Neale Analysis of Reading Ability (Neale, 1988) as a pre-test 2 days before baseline data collection began and as a post-test 2–3 weeks after the intervention. Form 1 was used at pre-test and Form 2 at post-test. Stability coefficients between Forms 1 and 2 are 0.95 and 0.98 for comprehension and accuracy, respectively (Neale, 1988). Internal consistency coefficients for Forms 1 and 2 are 0.90 and 0.89, respectively, for comprehension and 0.81 and 0.83, respectively, for accuracy.

3.2.2. Researcher-developed daily comprehension tests

Students’ daily reading comprehension performance was assessed using a set of 28 researcher-developed daily comprehension short-answer tests. The 28 expository passages of 400–900 words were selected from the 10–12 reading-age category of the New Zealand School Journal Catalogues for 1978–1995 (Department of Education, 1986; Ministry of Education, 1995) based on the selection criteria of age-appropriateness, interest, content, and relevance to the cultural backgrounds of the participants. Each test consisted of ten questions constructed by the first author. Four questions were text-explicit, four were text-implicit, and two were script-implicit, using the taxonomy of question types developed by Pearson and Johnson (1978). An independent rater judged the classification of each question for 16 passages (57% of the 28 assessment passages). Inter-rater agreement ranged from 80 to 100% with a mean of 94%. For School 1, the passages for the tests were randomly assigned to sessions. For Schools 2 and 3, the order of the passages followed that of School 1. In total, the 12 students completed 307 daily comprehension tests across baseline, intervention, and follow-up phases. The first author and a graduate student in education independently marked all tests. Inter-marker agreement ranged from 80 to 100% with a mean of 90%.
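The agreement figures reported above can be illustrated with a short sketch. It assumes that agreement was computed as simple percent agreement (the share of items on which two raters assigned the same label); the article does not state the formula. The function name, the labels TE/TI/SI, and the example classifications are hypothetical and are not data from the study.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of items on which two raters assigned the same label."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must classify the same items.")
    if not rater_a:
        raise ValueError("At least one item is required.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Hypothetical classifications of the ten questions of one passage:
# TE = text-explicit, TI = text-implicit, SI = script-implicit.
author = ["TE", "TE", "TE", "TE", "TI", "TI", "TI", "TI", "SI", "SI"]
rater = ["TE", "TE", "TE", "TI", "TI", "TI", "TI", "TI", "SI", "SI"]

# The raters differ on one of the ten questions.
agreement = percent_agreement(author, rater)
print(agreement)
```

Percent agreement does not correct for agreement expected by chance, so a chance-corrected index would give lower values for the same ratings.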

3.2.3. Think-aloud tasks

Prompted think-aloud tasks were conducted on an individual basis prior to and following the intervention. They were used to elicit participants’ procedural knowledge of strategy use for processing Chinese and English expository text. Participants were allowed to use their first language to think aloud even when they were reading English text, because it was believed that use of a student’s stronger language to discuss text would reveal a more complete picture of students’ reading comprehension (Moll, 1988; Moll, Estrada, Diaz, & Lopes, 1980).

Two English expository passages entitled Insect Noises and Birdstrikes, each of about 280 words, were used for the think-aloud task to examine students’ comprehension processes while reading English texts at pre-test and post-test. These were also chosen from the 10–12 reading age category of the New Zealand School Journal Catalogues for 1978–1995 (Department of Education, 1986; Ministry of Education, 1995). The two passages were used as parallel forms because they were comparable in length, reading age, readability [84.3 and 83.3, using Klare’s (1984) Flesch Reading Ease Formula] and were from the same publisher. Similarly, two Chinese passages from a non-fiction book series written for students in Taiwan aged between 10 and 12 years entitled Genetic Theories and Continental Drift were selected for use as parallel forms for the think-aloud task, to examine students’ comprehension processes while reading Chinese texts at pre-test and post-test. The two texts were comparable in readability, length (420 words each), reading age (10–12 year-old), and were from the same publisher.
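For readers unfamiliar with the readability index mentioned above, the following sketch computes a Flesch Reading Ease score from word, sentence, and syllable counts using the standard published coefficients. How the authors counted sentences and syllables is not described, so the counts below are hypothetical and do not reproduce the reported values of 84.3 and 83.3.

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Flesch Reading Ease: higher scores indicate easier text."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("Word and sentence counts must be positive.")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical counts for a passage of about 280 words.
score = flesch_reading_ease(total_words=280, total_sentences=20, total_syllables=364)
print(round(score, 1))
```

Shorter sentences and shorter words both raise the score, which is why two passages matched on this index can be treated as roughly comparable in surface difficulty.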

The think-aloud protocol consisted of three parts: prior knowledge assessment, prompted think-aloud, and text retelling. Prior knowledge assessment required participants to give an introductory statement briefly describing the topic of the text, state as many different things about the topic as they could, and explain four key vocabulary terms chosen from the text. The prompted think-aloud elicited participants’ introspective knowledge of cognitive and metacognitive strategies for dealing with expository text. Prior to and following the intervention phase, each participant randomly read two texts, one in Chinese and one in English. Nine places in the Chinese texts were marked with a red dot, and 13 places in the English text were marked with a red dot. Placement of dots was based on the natural pause structure of the text. Procedures for the prompted think-aloud assessment in this study were the same as those used by Hare and Smith (1982), but in our study students were allowed to consult a dictionary or the researcher for the translation equivalents of difficult words they encountered while reading English text. Each participant was asked to read silently and stop at each dot. Then, the researcher asked him/her to answer three questions without looking at the text: “What do you remember about what you read?” “What were you doing and thinking as you read it?” “Is there anything else?” Before the text retelling, we asked each participant to reread the story silently so he or she could construct a coherent version of the text uninterrupted by the think-aloud task (cf. Jimenez, Garcia, & Pearson, 1996). Then participants were asked to tell everything they remembered about the passage without looking at the text.

Students’ responses during the think-alouds were audio-taped for later analysis. Only students’ strategy use during the think-alouds is reported in this paper. Strategy use during the think-aloud was coded using a system developed by the first author. Table 3 shows the four groups of comprehension strategies and their subcategories. The unit of analysis was the segment of text students read before they were prompted to stop and think aloud. A category code for a specific strategy was assigned only once for each text segment when one or more of its subcategories were identified. The first author and a graduate student in education independently coded students’ think-alouds. Inter-rater agreement ranged from 80 to 100% with a mean of 88%.

Table 3
Categories and subcategories of reading comprehension strategies

Decoding
• Breaking lexical items into parts
• Using letter-sound correspondence
• Consulting the dictionary for translation equivalents

Literal comprehension
• Paraphrasing
• Switching codes between L1 and L2
• Translating word for word
• Restating the text

Inferential comprehension
• Previewing the text and the illustrations for general understanding
• Using text structure to help identify important ideas
• Drawing inferences
• Questioning
• Clarifying
• Summarising
• Predicting
• Visualizing
• Invoking and relating prior knowledge/experience to the text content
• Evaluating adequacy/truthfulness of what the author says

Comprehension monitoring
• Awareness of failure/difficulties to understand
• Awareness of success in understanding
• Awareness of the existing knowledge contradicting the text
• Awareness of lack of background knowledge
• Rereading
• Reading ahead
• Adjusting reading speed
• Using a bilingual dictionary
• Correcting misunderstanding

3.2.4. Transfer test

Six expository passages each of about 150 words were selected for the transfer test. Four of them were adapted from Markman (1979) and contained logical inconsistencies in the presentation of content materials. The other two were normal texts, similar in length and readability to those from Markman, and were adapted from other sources. One was from the New Zealand School Journal (1994, Part 2, Number 1) and the other was from an unidentifiable source. One normal passage and two passages with inconsistencies formed the texts used at pre-test or post-test (see Table 4). The two sets of texts were matched approximately in length and readability. The transfer test consisted of two tasks. The first task was for participants to give a free recall after reading each of the three English texts. This was to assess the extent of participants’ comprehension. The second task was for participants to comment on whether the texts were clear and easy for readers to understand if they were to be published. Among the three texts the participants read, two contained deliberate inconsistencies and one did not. The purpose of this part of the transfer test was to assess how well the participants monitored their comprehension and whether they were able to detect inconsistencies as they read. Students were allowed to consult a bilingual dictionary or the researcher for Chinese translation equivalents of difficult words. The pre-test was administered 1 week before intervention began and the post-test, 2 weeks after intervention. To offset any order effect, participants were told to choose a text at random to read each time. Participants’ responses were audio-taped and later transcribed in Chinese.

Table 4
Inconsistencies in the texts used for the transfer test at pre-test and post-test

Texts | Inconsistent statements
Pre-test |
Food Problem | –
Ants | • Ants do not have a nose. • Ants can always find their way home by smelling for the odour.
Fish | • Fish cannot see colours at the dark bottom of the sea. • Fish can see the colour of their food.
Post-test |
Possum | –
Ice cream | • The ice cream in Baked Alaska melts when it gets hot. • The ice cream stays firm and does not melt when they make Baked Alaska.
Snake | • Garden snakes do not have ears. They cannot hear the insects. • Garden snakes can hear the sounds of the insects and catch them for food.

For the scoring of free recalls, each of the three passages was divided into idea units. The number of idea units recalled was expressed as a percentage of the total number of idea units in each text. For the scoring of students’ comments, a maximum score of 2 points was assigned to each correct answer to the two questions: “Does everything make sense to you?” “What do you want to add so that the passage will become easier to understand?” For each question, a correct yes or no answer with sound justification received 2 points. A correct yes or no answer, but without sound justification, received only 1 point. The maximum score for a participant’s comments on each passage was thus 4 points; hence, the maximum total score assigned to each participant on the inconsistencies measure was 12 points. Participants’ performance was also expressed in terms of percentages of the total scores.
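The scoring rules above can be summarised in a short sketch. The function names and the example responses are illustrative assumptions and are not taken from the study's materials; in particular, the text does not say what an incorrect answer scored, so 0 points is assumed here.

```python
def recall_percent(units_recalled: int, units_total: int) -> float:
    """Percentage of a passage's idea units that appear in the free recall."""
    if units_total <= 0:
        raise ValueError("a passage must contain at least one idea unit")
    return 100.0 * units_recalled / units_total


def question_points(correct: bool, justified: bool) -> int:
    """2 points for a correct yes/no answer with sound justification,
    1 point for a correct answer without it.
    Assumption: an incorrect answer scores 0."""
    if not correct:
        return 0
    return 2 if justified else 1


def inconsistency_percent(answers) -> float:
    """answers holds one (correct, justified) pair per question:
    2 questions for each of the 3 passages, so 12 points at most."""
    if len(answers) != 6:
        raise ValueError("expected 2 questions x 3 passages = 6 answers")
    total = sum(question_points(c, j) for c, j in answers)
    return 100.0 * total / 12


# Made-up responses for one participant, for illustration only.
example_answers = [
    (True, True), (True, False),    # passage 1
    (True, True), (False, False),   # passage 2
    (True, False), (True, True),    # passage 3
]
print(recall_percent(9, 20))
print(round(inconsistency_percent(example_answers), 2))
```

Expressing both measures as percentages, as the authors did, lets free recall and inconsistency detection be compared on one scale even though the passages differ in their number of idea units.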

The transfer test was administered on an individual basis 3 days prior to and 2 days after the intervention. Students’ performance on free recall and inconsistencies measures at both pre-test and post-test were audio-taped and later transcribed in Chinese. The first author and a graduate student in education independently scored all transcripts. Inter-rater agreement ranged between 75 and 100% with a mean of 89%.
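The article does not state how inter-rater agreement was calculated. A minimal sketch, assuming simple percent agreement (the share of items on which the two scorers gave the same score), is shown below; the transcript names and scores are hypothetical.

```python
from statistics import mean


def percent_agreement(rater_a, rater_b) -> float:
    """Share of items (in percent) on which two raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty item list")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Hypothetical scores from two raters on two transcripts (illustration only).
transcripts = {
    "transcript_1": ([2, 1, 2, 0], [2, 1, 2, 0]),
    "transcript_2": ([1, 2, 2, 1], [1, 2, 0, 1]),
}
agreement = {name: percent_agreement(a, b) for name, (a, b) in transcripts.items()}

# Report the range and the mean, in the same form as the article.
print(min(agreement.values()), max(agreement.values()), mean(agreement.values()))
```

Percent agreement does not correct for chance agreement; a chance-corrected index such as Cohen's kappa would be an alternative, but nothing in the text indicates that one was used.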

3.3. Design and procedure

The study used a single-subject multiple baseline design across the three groups of subjects (Barlow & Hersen, 1984). This design used control procedures rather than control groups (Neuman & McCormick, 1995). It emphasized analysis of personalised evaluation data to gauge treatment effects on individual participants, comparing individuals’ changes against their own pre-intervention levels and trends in performance (McCormick, 1995), and it used systematic replication across participants in different settings to attest to the degree of reliability and generalizability of the results (Kucera & Axelrod, 1995). In addition, this design emphasized analysis of transfer of the targeted knowledge and skills (McCormick, 1995). The number of data collection points across baselines in the present study was kept constant to minimise possible practice effects in the delayed baselines. The order in which the three experimental groups participated in the study corresponded to the order in which school principals granted permission.

Following the baseline, we introduced the modified L1-assisted reciprocal teaching procedure. After the intervention, there was a follow-up phase of data collection to assess the maintenance of treatment effects. Effects of the intervention on reading comprehension and accuracy were measured with pre-test and post-test administrations of the Neale Analysis of Reading Ability (Neale, 1988). Treatment effects on participants’ procedural knowledge of strategy use for reading comprehension were assessed using information collected from the think-aloud tasks before and after intervention. Pre-test and post-test administrations of the transfer test were also included to assess the extent of cognitive and metacognitive strategy transfer to a novel task.


The first author served as the teacher and conducted all reciprocal teaching and explicit strategy instruction in a small room in each school. The researcher met all students for 1 h each day, 4 days a week, over a period of 4–5 weeks during the third term of the school year. The reciprocal teaching sessions occurred daily for 20 days in Schools 1 and 2, and for 15 days in School 3. The program in School 3 was cut short because there was a 2-week vacation after the 15th session of the program and one of the students moved to another school after the term break.

3.3.1. Baseline

Students received a 5-day baseline assessment. On each day, they read an assessment passage, and then attempted to answer ten comprehension questions without looking at the text. The researcher read the questions if requested by the students. When students came to a difficult word, they were allowed to ask the researcher or to consult a bilingual dictionary for its translation equivalent. Each test was to be completed within 25 min. The students received their test results the following day and were asked to record their results on the record sheet.

3.3.2. Intervention

The intervention comprised the modified L1-assisted reciprocal teaching procedure in which both Chinese and English reciprocal teaching occurred on alternate days. On each day, prior to the 20-min reciprocal teaching dialogue, there was a 15-min session of teacher-directed explicit strategy instruction. The language used in the explicit instruction was the same as that used for the reciprocal teaching dialogue on that day.

During the explicit strategy instruction, new concepts and strategies were introduced in Mandarin first and revisited in English on the following day. The first day was an introduction to the purpose and format of the comprehension program and brief training in communication skills for group discussion. The content of the following day’s explicit strategy instruction was the same except that it was conducted in English. On the third day, the focus was on knowledge about the reading process, reading goals, and reading strategies. The teacher explained what strategic readers do as they read; what the four strategies of questioning, summarising, clarifying and predicting were; and why the four strategies were useful. After the 4th day, one strategy was singled out and taught. Each strategy was taught and revisited while reading a short training passage that exemplified one of Meyer’s (1975) four types of top-level structure: comparison, causation, problem/solution and collection of descriptions. For example, when reading a text with the cause-effect top-level structure, students were told that the use of clarifying, questioning, summarising, and predicting strategies would be more effective if the focus was on the cause and the effects of the issue stated in the text. Overall, there were 12 days of explicit strategy instruction, six conducted in Mandarin and six in English. After the 12th day, explicit strategy instruction was omitted and every session began with 5 min of feedback on students’ performance on the previous day’s comprehension test. The reciprocal teaching dialogue then followed.

Four passages were used for English reciprocal teaching throughout the intervention. They ranged from 200 to 1300 words, with the shorter ones being used in the early stage of the intervention. They were selected from the 10–12 reading-age category from the New Zealand School Journal Catalogue for 1980–1995 (Ministry of Education, 1995). Another eight passages were used for L1 reciprocal teaching throughout the intervention. These were Chinese passages of 350–1400 words chosen from a non-fiction book series written for students in Taiwan aged between 10 and 12 years.

The dialogues followed the procedures described by Palincsar and Brown (1984), except that Mandarin and English were used on alternate days. The 1st day of the intervention was Chinese reciprocal teaching. The teacher first modelled the use of the four strategies of questioning, summarising, clarifying and predicting while reading a segment of a Chinese training passage. Then the students took turns to assume the teacher’s role and practised the four strategies on the following sections of the text. The teacher supported each student’s participation by giving prompts, praise, feedback, or explanation. The next day was English reciprocal teaching. In the English reciprocal teaching sessions, students were encouraged to clarify the meaning of any difficult words they encountered during their silent reading, either by consulting a bilingual dictionary or the teacher for the translation equivalents in Chinese, before the dialogue began. In so doing, the dialogues focused more on idea-level rather than word-level comprehension. Following each reciprocal teaching session, students individually completed the daily comprehension test in the same way as they did during the baseline phase.

3.3.3. Follow-up

Following completion of the intervention phase, baseline procedures were reintroduced over three sessions as follow-up probes. These follow-up probes were carried out three to four weeks after each intervention.

4. Results and discussion

4.1. Quality of dialogue

On the whole, interactions during L1 reciprocal teaching were more collaborative, fruitful, and enjoyable to students than those during the L2 reciprocal teaching. Students in School 1, in particular, showed much better performance during the L1-reciprocal-teaching sessions (see Table 5). In the early stages of the intervention, students in School 1 spent almost half of each L2 session solving vocabulary problems, leaving less time available for the substantive dialogue (see Table 6). Moreover, the pace of the dialogue was so slow that only a small part of an article was covered in each session. As a result, the teacher often had to alter the demands on students, for example, by asking each student to go through one activity instead of four to ensure that each student had a turn to give input to the dialogue.

Unlike School 1, there were at least two students in Schools 2 and 3 who had better oral English skills. Therefore, the pace of their L2 dialogue was somewhat faster, and the amount of text covered was greater, though still wanting when compared with their performance during the L1 sessions. During the L2 sessions, the teacher was generally more supportive of students and gave more prompts and praise, and the students were more sympathetic and helpful to each other. By contrast, during L1 sessions, the teacher placed higher demand on students, who tended to ask more challenging questions, often in a playful fashion. However, during the final stage of the intervention, the quality of L2 dialogues in all three schools did improve (see Table 7).

Table 5
Excerpt from the L1 reciprocal dialogue, school 1, day 3

Reading text: Bats

Nearly all bats are active at night, and many of them live in dark caves. Have you ever wondered how bats get around in the dark without any problems? They shouldn’t be able to see things in the dark, but how can they manage to avoid mid-air collisions or find their food? 200 years ago, two zoologists, one from Italy, and one from Switzerland, did an experiment with some bats. They covered the ears of the bats with very thick cloth and let them fly in a dark cave. They found that these bats could not fly as smoothly as usual. The two biologists did not hear any noises made by the bats while they were flying, so they just couldn’t understand why the bats didn’t fly properly when their ears were covered with thick cloth.

S1: The finding … about bats … The finding is about what … and who found it?
T: It seems that you’ve got many questions to ask. I think it’d be easier to ask ONE at a time. So what is your first question?
S1: Well, my first question is: Who found out this?
T: Found out what?
S1: Who found out bats don’t use their eyes when they fly?
S3: Two biologists, one from Italy and the other one from Switzerland.
S1: Yes, that’s correct. Now my second question is: why … why … I mean when the ears of the bats were covered with thick cloth, why didn’t they fly properly?
S2: [can’t wait the question is finished] Because they rely on hearing sonar echoes to find their way around.
T: Wow, S2 [real name replaced] knows pretty much about bats, doesn’t he?
S1: But, but … it didn’t say anything about sonar echoes here.
S4: It just says here they didn’t know why the bats didn’t fly properly.
T: Well, it’s really good to see (S2) to use the knowledge in his head to help us understand this part. But as what S1 and S4 did, we also need to check whether there’s any difference between what we know in our head and what the author is trying to say in the text.
S2: I have a question too, why … I mean why did they want to cover up the bats’ ears, rather than their eyes … or mouths?
T: That’s really a good question! It leads us to clarify the purpose of their experiment. Can anyone answer it?
S4: I know, because bats … it says here bats do not have any problems flying inside a dark cave, so bats don’t use their eyes.
S1: And they just guessed that bats used their ears.
T: So, they wanted to test if this guessing was right. Okay, S1, could you summarize this part now?
S1: [Reading from the beginning of this segment] Nearly all bats are…
T: Use your own words when you summarize.
S1: Well, this part is about … about the purpose of the experiment. It was to test if bats used their ears as they fly.
T: Is that all? What about the finding?
S1: Their finding was … bats didn’t fly very well when their ears were covered, but they didn’t know why.
T: Excellent! Now, can you predict what will be coming next?
S1: Maybe about … about why … about how bats use their ears as they fly.
T: Well, lets read further and see if it gives us an explanation. S3, would you like to be the next teachers?

Table 6
Excerpt from the L2 reciprocal dialogue, school 1, day 4

Reading text: Hospital Helicopter

What’s that noise? I’m visiting a friend in hospital, and outside I can hear a “chukka, chukka, chukka,” getting louder and louder. It’s the shiny red and yellow helicopter arriving. This helicopter is a kind of air ambulance. It can bring patients to hospital much quicker than an ordinary ambulance. People go over to the windows to watch. Outside, two people in blue uniforms are waiting with a stretcher to help get the patient out of the helicopter. The helicopter sways as it lands. The two helicopter pilots, dressed in red, jump out to help. So does the ambulance officer, who has been looking after the patient as they flew to the hospital.

T: By looking at the title “Hospital Helicopter” and all the pictures here, what do you think this article will tell us about?
S3: Doctors and … [point to the word “patients”]
T: Patients.
S3: Patients. Doctors and patients.
T: Do you mean about how doctors help patients?
S3: Yes [nodding].
T: So S3 [real name replaced] predicts that this article is about how doctors help patients. S3, would you like to be our first teacher today and lead us to read the first paragraph?
S3: Yes [nodding]. [Students asked the teacher for translation equivalents as they did their silent reading.]
T: [When students stopped reading] S3, now ask us a question about this paragraph.
S3: Umm … What colour [pointing at the word “helicopter”]…
T: Helicopter.
S3: Helicopter. What colour is the helicopter?
S1: Red and yellow.
T: That is really an easy question, isn’t it? Have you got a harder one? For example, a question begins with “how” or “why”?
S3: Umm … umm … no.
T: Does anyone have another question to ask about this paragraph? Yes, S4, you’ve got one.
S4: How … how people … how the people help the patient?
T: Are you asking: how do the people on the helicopter help the patient?
S4: Mm … How this helicopter help the patient.
T: That’s a very good question. How does this helicopter help the patient? Can anyone answer it?
[No response]
T: S4, can you tell us how this helicopter helps the patient?
S4: Umm, it can … it can bring … patients to the … hospital much quicker [reading from the text.]
T: Yes. I have a question, why do they bring patients to the hospital by helicopter, but not by ordinary ambulance?
S2: Because, because it is quick.
T: I see. A helicopter can bring patients to the hospital much quicker than an ambulance. Now, S3, can you summarise this paragraph for us?
S3: It’s about the helicopter, umm … the helicopter can bring … [point to the word “patient”] … patients [after the teacher saying the word “patient”] to the hospital quicker.
T: Well done, S3.

4.2. Researcher-developed daily comprehension tests

Fig. 1 shows the mean performance of the three groups of participants on daily comprehension assessment tests across baseline, intervention, and follow-up phases. To aid visual inspection, trend lines for the three schools across phases were developed using the celeration line technique (Bloom & Fischer, 1982). In time series analysis, the two dimensions of concern are the change in level and change in trend (Krishef, 1991). A change in level refers to a change at the point where the baseline ends and the treatment begins. A change in trend refers to a change in tendency for performance to increase or decrease systematically over time (Kazdin, 1982).
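The article does not spell out how the celeration lines were constructed. The sketch below assumes one common half-split construction: each phase is divided into two halves (the middle session is left out when the count is odd), the mean score of each half is placed at the mean session number of that half, and a straight line is drawn through the two points. Change in level is taken as the first treatment score minus the last baseline score, and change in trend as the difference between the two slopes. All function names and scores are hypothetical.

```python
from statistics import mean


def celeration_line(scores):
    """Return (slope, intercept) of a half-split trend line for one phase.

    Sessions are numbered 1..n. Assumption: with an odd number of
    sessions, the middle session belongs to neither half.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two data points")
    half = n // 2
    x_first = mean(range(1, half + 1))
    x_second = mean(range(n - half + 1, n + 1))
    y_first = mean(scores[:half])
    y_second = mean(scores[n - half:])
    slope = (y_second - y_first) / (x_second - x_first)
    intercept = y_first - slope * x_first
    return slope, intercept


def change_in_level(baseline, treatment):
    """Jump at the point where the baseline ends and the treatment begins."""
    return treatment[0] - baseline[-1]


def change_in_trend(baseline, treatment):
    """Difference between the treatment slope and the baseline slope."""
    return celeration_line(treatment)[0] - celeration_line(baseline)[0]


# Made-up daily percent-correct scores, for illustration only.
baseline = [20, 30, 25, 30, 40]
treatment = [50, 55, 65, 60, 70, 80]

print(celeration_line(baseline))
print(change_in_level(baseline, treatment))
print(change_in_trend(baseline, treatment))
```

Extending the baseline line into the treatment phase gives a visual reference: treatment scores that sit consistently above the projected line point to an effect beyond the pre-existing trend.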

Fig. 1 shows that there was an upward shift in level and an upward change in trend for participants in Schools 2 and 3. In School 1, the changes in level and trend were not so apparent, though a consistent tendency for increasing performance was evident. This might have been due to the fact that the four students were too overwhelmed by the task on the 1st day, hence their baseline trend might have been confounded with the effect of having extremely low scores on the first day. Individual students’ daily graphs showed that the magnitude and latency of the change varied from student to student. This variability might be attributable to individual differences in the time needed for internalisation of cognitive and metacognitive strategies. It might also be the result of individual differences in prior knowledge, interest in the content material, vocabulary knowledge, or motivation. Nevertheless, students in all three schools showed a marked gain in mean scores on the reading comprehension tests between the first and second half of the intervention (see Table 8). The follow-up data show that the achieved level of performance was maintained three to four weeks after the intervention. Paired samples t-tests indicate a significant difference between the overall baseline means and follow-up means (t=11.03, df=10, p<0.05).

4.3. Standardised reading test

Table 9 shows students’ reading comprehension and accuracy scores on the Neale Analysis of Reading Ability (Neale, 1988). There was a mean gain of 4.8 in comprehension raw scores or a mean increase of 12 months in comprehension reading ages. There was also a mean gain of 11.73 in accuracy raw scores or a mean increase of 12 months in accuracy reading ages. Results of paired samples t-tests on the raw scores indicate a significant difference between pre-test and post-test in both comprehension (t=5.53, df=10, p<0.05) and accuracy (t=5.18, df=10, p<0.05).
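As a worked illustration of the paired samples t-test reported above, the sketch below recomputes the comprehension result from the raw scores in Table 9. The score lists are a reading of the extraction-damaged table and should be treated as an assumption; Student 12, who has no post-test score, is left out, which gives the 11 pairs implied by df=10. The article does not say which software the authors used.

```python
import math
from statistics import mean, stdev

# Comprehension raw scores for S1-S11 as read from Table 9 (assumed reading).
pre = [3, 4, 4, 4, 11, 12, 14, 4, 18, 11, 15]
post = [10, 5, 8, 4, 19, 20, 22, 8, 24, 13, 20]


def paired_t(before, after):
    """Paired samples t statistic and degrees of freedom."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # stdev is the sample SD
    return mean(diffs) / standard_error, n - 1


t, df = paired_t(pre, post)
print(f"mean gain = {mean(post) - mean(pre):.2f}, t = {t:.2f}, df = {df}")
```

With these values the sketch gives a mean gain of 4.82 and t = 5.53 on 10 degrees of freedom, in line with the figures reported in the text.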


Table 7
Excerpt from the L2 reciprocal dialogue, school 2, day 18

Reading text: The Dunker

Meet HUET, the Helicopter Underwater Escape Trainer, commonly known as Dunker. This machine is used in the training given to all workers who travel by helicopter back and forth to Taranaki’s offshore gas rigs. Everyone—from the highest boss to the newest rigger—has to practice what to do if their helicopter should crash into the sea. HUET is a bright orange metal cage about the same size and shape as a large helicopter cabin. The trainees take their seats and tighten their seat belts. Experienced divers wait in the pool, ready to help anyone who gets into difficulties. If a trainee is wearing a red helmet, the divers know that this person is a non-swimmer, and will watch extra carefully. The instructor tells the trainees how to brace themselves for when they hit the water, just as they would in a real crash.

[As the students received their copy of the article, they spontaneously flipped over, looking at the title, and the pictures.]
T: Okay, do we know anything about this article with the title called “The Dunkers”?
Ss: No.
S7: I think it’s about diving, because a diver is here [point at the picture].
S6: I think it’s about training … training divers.
S5: No, I think it’s about training people to escape for safety. Ha! Ha! I saw words like “escape,” “helicopter,” and “crash,” here and there.
T: You clever boy! Will you be our first teacher today?
S5: Okay! [saying proudly]
T: How much do you want us to read, S5?
S5: Two paragraphs, up to “in a real crash.” [Students started reading and asked the teacher for translation equivalents as they read.]
T: S5, we’re ready now. So what’s your first question?
S5: My first question IS: what’s “HUET”?
Ss: Helicopter Underwater Escape Trainer [all students reading aloud the text].
S5: Correct! My next question IS: What is dunker?
S7: It’s the machine, … a red-orange metal cage, … It’s of the same size and same shape of a helicopter.
S5: Correct! Now, I have a harder question for you: when a helicopter crashes into the sea, what … what will they do?
S6: They will die.
Ss: They will die! [all students repeated amusedly, and all laughed]
S5: No, I mean they should brace themselves for when they hit the water [reading aloud the text].
T: So you actually wanted to ask us something about the training program. But before that, I want to clarify: who need this kind of training program anyway [looking at S8]?
S8: Fishermen?
S6: No, helicopter pilots.
S7: And passengers too.
T: But I think this training program is especially for some people.
S5: It says here … [pointing to the word “rigger”] it’s for riggers.
S7: But why riggers?
S6: [Reading aloud the text] Because they travel by helicopter back and forth to Taranaki’s offshore gas rigs.
T: Thank you, S5. Now, please summarize these two paragraphs for us.
S5: Umm, it’s an introduction of the HUET. Emm … It talks about the machine, the dunker, and the training program for riggers.
T: Good job, S5. Choose the next teacher please.


Fig. 1. Mean daily comprehension performance across baseline, intervention, and follow-up phases (changes in level and trend indicated on the right).

Table 8
Mean percent correct on researcher-developed daily comprehension tests across baseline, intervention, and follow-up phases (standard deviations in parentheses)

            Baseline         First half of     Second half of    Follow-up
                             intervention      intervention
School 1    22.75 (1.5)      45.63 (3.33)      61.75 (6.25)      71.67 (7.2)
School 2    45.25 (13.2)     58.25 (14.56)     78.72 (5.8)       82.5 (13.2)
School 3    44.25 (17.99)    61.00 (13.34)     80.00 (9.13)      81.11 (14.17)

4.4. Think-alouds

The data in Table 10 show that there were mean increases from pre-test to post-test in the frequencies of inferential strategy use of 24.27% in L1 reading (t=3.04, df=10, p<0.05) and 43.36% in L2 reading (t=6.18, df=10, p<0.05). These increases suggest that students made more conscious effort to achieve idea-level comprehension during their L1 and L2 text processing at post-test. Moreover, as can be seen in Table 11, students demonstrated use of a wider repertoire of inferential strategies at post-test in both their L1 and L2 reading.

Table 9
Comprehension and accuracy scores on the Neale Analysis of Reading Ability test at pre-test and post-test

            Comprehension                            Accuracy
            Pre-test            Post-test            Pre-test            Post-test
Student     Raw      Reading    Raw      Reading     Raw      Reading    Raw      Reading
            scores   age        scores   age         scores   age        scores   age

School 1
S1          3        6.0        10       7.7         25       7.2        40       8.6
S2          4        6.3        5        6.5         26       7.3        43       8.9
S3          4        6.3        8        7.1         17       6.5        24       7.0
S4          4        6.3        4        6.2         24       7.1        37       8.2

School 2
S5          11       7.10       19       9.7         41       8.8        60       10.4
S6          12       8.1        20       9.10        54       9.10       56       9.11
S7          14       8.6        22       10.3        43       8.10       68       11.0
S8          4        6.3        8        7.1         24       7.1        25       7.1

School 3
S9          18       9.5        24       10.9        61       10.6       74       11.7
S10         11       7.10       13       8.3         46       9.1        59       10.3
S11         15       8.9        20       9.10        63       10.8       57       11.0
S12         12       8.1        –        –           45       8.1        –        –

Mean        9.09                13.91*               38.55               50.27*
SD          5.43                7.29                 16.28               17.39

* p<0.05

Table 10
Mean percent of text segments in which strategies were used in think-alouds during L1 and L2 reading at pre-test and post-test

                            L1 reading                          L2 reading
                            Pre-test          Post-test         Pre-test          Post-test
Strategies                  Mean     SD       Mean     SD       Mean     SD       Mean     SD

Decoding                    0.00     0.00     0.00     0.00     73.55    26.80    64.36    35.30
Literal                     98.00    4.45     99.00    3.32     97.82    3.74     96.45    9.42
Inferential                 35.36    22.42    59.64*   18.32    19.55    10.96    62.90*   20.94
Comprehension monitoring    38.27    24.67    40.36    20.85    81.18    19.82    82.64    16.72

* p<0.05
There were nine units of analysis in the L1 reading text and 13 units of analysis in the L2 reading text at both pre-test and post-test.

Table 11
Frequencies of inferential strategies used in the think-alouds at pre-test and post-test (a)

Strategies for inferential         L1 reading              L2 reading
comprehension                      Pre-test   Post-test    Pre-test   Post-test

Previewing text & illustrations    0          0            0          1
Drawing inferences                 22         33           16         66
Questioning                        12         28           6          32
Using structural cues              0          3            0          0
Clarifying                         0          4            0          14
Summarizing                        0          2            0          4
Predicting                         1          4            0          7
Visualizing                        2          7            8          11
Total                              37         81           30         135

(a) Frequencies are expressed in terms of total number of times a strategy was noted in the think-aloud protocols. There were nine units of analysis in the L1 text and 13 units of analysis in the L2 text at pre-test and post-test.

Qualitative data from the think-aloud transcripts show that at pre-test students usually ignored what they did not understand. Even when they acknowledged that they did not understand, eight out of 11 students did not know how to fix up their idea-level comprehension breakdowns during both their L1 and L2 text processing. They simply stopped and expressed frustration over the task. For example:

Student 1 (dot 7/L1 reading): It says his genes, well, he discovered … and then, he felt quite strange, and then … I don’t know.

Student 1 (dot 3/L2 reading): Some insects make sounds because they want to send messages. Well … I’m not quite sure what it’s talking about.

Student 8 (dot 7/L1 reading): I’ve read twice this part but I still don’t quite understand. The first sentence says this is an important discovery. The chromosomes were passed onto those flies with pale red eyes, … then … all are pale red … I can’t link them up! [sighed]

Student 8 (dot 5/L2 reading): It says you can also make this sort of sound ... um, … like fingernails … like when fingernails are long … It seems to say … well, it also says teeth can make sounds … when combing hair … you can also hear the sounds. Oh! I just don’t understand what it says. Fingernails … and teeth, and making sounds … What on earth is it talking about?

However, at post-test, students were more active in their meaning making during L1 and L2 reading. For example:

Student 1 (dot 7/L1 reading): It says South American Continent and African Continent can move, because now they’ve moved and joined together. But, it doesn’t say how they moved. How did he discover this after all? From the TV, from some books? Or did he himself discover it?

Student 1 (dot 3/L2 reading): I still don’t quite understand the meaning of this word “birdstrike.” The strike of birds … but this is also one way to hurt birds. It will kill birds. The birds will die, so it must be a way of doing harm to birds when they’re sucked into the engine.

Student 8 (dot 8/L1 reading): Goodness me, what kind of writing it is! “Continental drift — What a great dream!” This sentence just comes from nowhere! Yes, I know this part is talking about his discovery, but … here (pointing at dot 6) it says that Wegener began to think about this question seriously, and here (pointing at dot 8), it explains how he did the research, that is, by studying theses, and after studying those theses he had this discovery. Well, I think this part shouldn’t be placed here because it doesn’t read well; instead, it should be placed right after here (pointing at the end of dot 6). This passage must have been written by a foreigner!

Student 8 (dot 4/L2 reading): Well, it says the birds could be sucked into the engine. This could damage the engine of the plane. Once damaged, the engine needs repair. It says if the engine suddenly lost … that would be very dangerous. … when lost means no longer there, so there could be a danger, the plane could crash. But why is it that when the engine is blocked by the bird … I mean … why is it that the engine would suddenly disappear?

Student 10, in particular, demonstrated a dramatic change in his approach to reading. At pre-test, he insisted on using English to do the think-aloud task during his L2 reading, but all he did was to recite word by word from the text without concern for understanding:

Student 10 (dot 8/L2 reading): (Trying hard to recite the text) In the body of the male cicada, they have two discs in the each side, in each side…. some cicada can … slightly curved, they can slightly, …… no, they are slightly curved… and cicada can click in and out like, out like … pressed. (Embarrassed) I can memorize short sentences without understanding, but I can’t remember such a long sentence like this!

Student 10 (dot 3/L1 reading): It talks about a zoologist who also studies genetics. He found … um.. and studies developed rapidly…um… some people found genes in chromosomes. In fact I’m not sure what chromosomes are.

As can be seen, Student 10 demonstrated somewhat greater concern for understanding during his L1 reading at pre-test, though he did little to recover from his comprehension breakdowns. By contrast, at post-test, he was more able to monitor and to foster his L2 reading comprehension through clarifying, predicting, verifying his predictions, and linking ideas from different parts of the text. He was also more able to draw on his prior knowledge to evaluate and extrapolate the ideas presented in the L1 text at post-test. For example:


Student 10 (dot 11/L2 reading): (Reading from the text) “But they are not perfect.” … Um… “They” should mean seagulls, but I’m not sure. I need to read further and see what comes up next.

Student 10 (dot 12/L2 reading): Okay, I know it now! It says the gas-guns are not perfect. Well, here it explains that the gas-guns have been used for a long time, certainly the seagulls have got used to them. Right! No matter how many times they fire the guns, the seagulls will not be scared or fly away, because they have got used to the noise.

Student 10 (dot 7/L1 reading): Yes, I do think so! African Continent and South American Continent were joined together in the past, but over a long period of time, um… Well … maybe it had been pounded … pounded by the sea waves … then they broke apart … so, after a long time, they ended up in the present positions.

However, a change in strategy use was not so clear for three students who were already using some fix-up strategies to recover from their comprehension breakdowns while reading English text at pre-test. For example, instances of inferential strategy use were identified in Student 6’s think-alouds at both pre-test and post-test:

Student 6 (dot 3/L2 reading/pre-test): It says these insects … in this way … are perhaps calling their brothers and sisters. Right! I remember when I was in Taiwan, I read something about … ants, for example, they leave some sort of “juice” on their path, and they will know where the food is. Those insects are calling, maybe because they’re trying to tell their brothers and sisters where the food is. I’m not so sure.

Student 6 (dot 11/L2 reading/post-test): (Reading aloud the text) “But they are not perfect” … Which “they” is it talking about? It may mean the seagulls, or the gas-guns. … Well, I think it should mean the gas-guns… because you can’t say the seagulls aren’t quite “perfect,” unless you haven been getting along with them before. So, it should be talking about the gas-guns that are not that perfect.

On the other hand, Student 5 (during L1 reading) and Student 9 (during L1 and L2 reading) did not verbalise any comprehension breakdowns or fix-up strategies; instead, they simply recalled the text during their think-alouds at post-test. It was not clear whether this was because they did not have any idea-level breakdowns or because their strategy use had already become automatised and, therefore, not available to the students’ conscious awareness. Nevertheless, the think-aloud transcripts showed that following the intervention students generally used inferential comprehension strategies more frequently. They became more deliberate in their idea-level comprehension and demonstrated a deeper level of text processing during both L1 and L2 reading.

In summary, results of the protocol analyses show treatment effects on students’ L1 and L2 reading comprehension processes and L2 comprehension product. Following the strategy intervention, students became more deliberate in fostering and monitoring their L1 and L2 comprehension at the idea level, using a wider repertoire of strategies for inferential comprehension to integrate their prior knowledge and experience with the text information.

4.5. Transfer test

Table 12 shows students’ mean performance on the transfer measures. There was a significant increase in the percent of idea units recalled from pre-test to post-test (t=6.19, df=10, p<0.05). There was also a significant improvement on the detection of inconsistencies measure (t=2.70, df=10, p<0.05), suggesting that students’ comprehension monitoring was more successful at post-test.
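The reported statistics can be read against a standard t table. The following is only a minimal sketch, assuming the comparison was a paired-samples t-test (the pre-test/post-test design suggests this, but the text does not state it) and a two-tailed test at the 0.05 level. The function names and the example scores are hypothetical and are not the study's data; only the two t values and df=10 come from the text.

```python
import math
from statistics import mean, stdev

# Two-tailed critical value of Student's t for alpha = 0.05 and df = 10
# (standard table value; the two-tailed choice is an assumption here).
T_CRIT_DF10 = 2.228

def paired_t(pre, post):
    """Return (t, df) for a paired-samples t-test on matched score lists."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    # Standard error of the mean difference (undefined if all diffs are equal).
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1

def is_significant(t, t_crit=T_CRIT_DF10):
    """True if |t| exceeds the two-tailed critical value."""
    return abs(t) > t_crit

# Hypothetical scores for illustration only; NOT the study's data.
pre_scores = [1.0, 2.0, 3.0]
post_scores = [2.0, 4.0, 6.0]
t_value, df = paired_t(pre_scores, post_scores)

# t values as reported in the text (both with df = 10).
reported = {"free recall": 6.19, "inconsistencies": 2.70}
all_significant = all(is_significant(t) for t in reported.values())
```

Under these assumptions, both reported values exceed the critical value of 2.228, which agrees with the significance level stated in the text.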

At pre-test, eight students failed to spontaneously detect the inconsistencies on at least one of the two problem-passages. At post-test, only two students failed to do so. Students’ critical comments made at pre-test usually were about the choice of words, the punctuation, the length of the text, or the amount of information presented in the text. For example:

Student 7 (reading article about ants): It cannot be published. It is too short. It would be better if it talks more about ants.
Student 9 (reading article about fish): It’s okay. It can be published. But it can add some more information about fish, like how they breathe.

At post-test, students’ comments focused more on the logical structure of the text, and the majority of students were able to suggest sensible solutions to the contradictions that existed in the problem passages. For example:

Student 7 (reading article about snakes): This article cannot be published, unless one more paragraph is added. It must explain how snakes hear, how they know there are insects, something like that.
Student 9 (reading article about ice cream): I think it may be because this is a very special kind of ice cream, and it takes a long time to melt. But if it’s to publish, it needs to explain more clearly as to why it doesn’t melt after being baked in the oven.

Table 12
Results of transfer test on the free recall measure and inconsistencies measure at pre-test and post-test

Assessment                 Pre-test              Post-test
                           Mean (%)    SD        Mean (%)    SD
Free recall measure        41.18       8.83      60.55∗      5.99
Inconsistencies measure    53.09       25.21     78.73∗      16.77

∗ p<0.05.


5. Conclusion

The purpose of this study was to investigate the effect of a modified form of reciprocal teaching on the comprehension of limited-English proficient ESL students who had difficulties reading age-appropriate English expository text. In addition, the study sought to examine if there were qualitative differences in students’ first and second language reading processes prior to and following the intervention. Both quantitative and qualitative data strongly indicate that the L1-assisted reciprocal teaching procedure improved the English reading competence of the Mandarin-speaking ESL students who participated in the study.

The quantitative data provided evidence of a treatment effect on students’ L2 reading comprehension. On the daily reading comprehension measures, all students from the three schools demonstrated improvement. Nine out of the 12 students attained and sustained a mean score greater than or equal to 70% correct over the final five sessions of intervention (70% correct is a benchmark criterion for good readers set by Palincsar and Brown, 1984). Follow-up probes showed that the level of performance was maintained 3–4 weeks after the intervention. Moreover, on the standardised test, there were statistically significant increases in both the comprehension and accuracy scores, representing gains in reading age in the order of 12 months. It should be noted that gains on standardised tests of reading following reciprocal teaching have been notoriously hard to obtain (Rosenshine & Meister, 1994).
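The 70% benchmark can be illustrated with a short calculation. This is a minimal sketch of one possible reading of the criterion (the mean of the final five daily scores is at least 70% correct); the function name, the student labels and all scores are hypothetical and are not taken from the study.

```python
from statistics import mean

CRITERION = 70.0  # percent correct; benchmark attributed to Palincsar and Brown (1984)
WINDOW = 5        # final five sessions of intervention

def meets_criterion(daily_scores, criterion=CRITERION, window=WINDOW):
    """True if the mean of the last `window` scores is at least `criterion`."""
    if len(daily_scores) < window:
        raise ValueError("not enough sessions to evaluate the criterion")
    return mean(daily_scores[-window:]) >= criterion

# Hypothetical daily scores (percent correct) for two made-up students.
scores = {
    "student_a": [40, 50, 55, 60, 70, 75, 70, 80, 75],
    "student_b": [30, 40, 45, 50, 55, 60, 65, 60, 65],
}
reached = [name for name, s in scores.items() if meets_criterion(s)]
```

A stricter reading of “attained and sustained” would require every one of the final five scores to reach the benchmark; the text does not say which reading the authors applied.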

The qualitative data from the think-aloud protocols provide evidence of corresponding effects on students’ reading comprehension processes. There were changes in students’ procedural knowledge for both L1 and L2 text processing. Following the intervention, students demonstrated their ability to concentrate simultaneously on reading the text and on monitoring their own performance, checking if their reading was resulting in understanding and knowing how to deal with comprehension breakdowns. This is one source of evidence of improvement in students’ metacognition.

In addition, results of the transfer test indicated that students were able to transfer their newly acquired comprehension fostering and monitoring strategies to a novel task. Successful strategy transfer relies on students having metacognitive knowledge of how, when, where, and why to use specific strategies (Borkowski et al., 1987). Students’ successful performance at post-test demonstrated that they acquired this metacognitive knowledge.

On the whole, the findings of this study are consistent with those of Brown and Palincsar (1985) and Le Fevre (1996) in that they support the feasibility of using reciprocal teaching to teach cognitive and metacognitive strategies for reading comprehension to poor readers even before they are fully able to decode.

One possible explanation for the success of the present study is that the intervention addressed the problems of linguistic burden that previous research has indicated might be a problem when using reciprocal teaching with ESL students. During L1 reciprocal teaching dialogues, the ESL students were able to capitalise on their first-language proficiency and literacy experiences as they learned the higher-level cognitive and metacognitive strategies. During the L2 reciprocal teaching, students already had a clear conceptual understanding about what strategy to use and how, when, where and why to use it as they were practising the four strategies. Although the dialogue was much slower because of students’ limited English proficiency, students were still able to interact with the text, and with each other, meaningfully.

Another explanation for the success of the present study is that we adopted the explicit-teaching-before-reciprocal teaching format rather than the reciprocal-teaching-only format in the intervention. Although the two forms of reciprocal teaching appear to be similarly effective, as indicated by the results of Rosenshine and Meister’s (1994) meta-analysis, the two approaches might be differentially effective when dealing with students who have limited-English language proficiency. Prior explicit strategy instruction might help in two ways. First, it might serve to activate and build students’ existing knowledge of the reading process and strategies. Second, it might introduce students to the ‘language’ of reciprocal teaching. This may help lessen the competing cognitive demands of text processing that requires high-level thinking and verbal interaction that requires high-level language proficiency. Our findings, similar to those reported by Duffy et al. (1987), suggest that explicit instruction leads to poor readers’ more conscious use of reading strategies, and to better reading performance.

A third explanation for the success of the present study is that the structure of the intervention helped promote strategy transfer. As students practised using the comprehension fostering and monitoring strategies through L1 and L2 reciprocal teaching dialogues on alternate days, they had opportunities to construct understandings about the similarity of the L1 and L2 reading processes. The incorporation of L1 and L2 reciprocal teaching may have helped students develop a sense of conscious control, or metacognitive awareness, over a set of strategies that they could adapt for use with both Chinese and English text. New immigrant ESL students may be preoccupied with the linguistic ‘trees’ and forget the knowledge ‘forest’ when they read in a second language. Consequently, their wealth of prior L1-literacy experiences may not be accessed. The success of the present study may be due in part to the improvement of students’ metacognitive knowledge of the processes in both L1 and L2 reading through which transfer was facilitated.

Although the convergence of data from various sources in this study provides strong evidence of treatment effects on students’ English comprehension, three limitations must be kept in mind when interpreting these findings. First, there were no tests of generalisability across settings. Therefore, we do not know whether students would apply the cognitive and metacognitive strategies to their regular classroom reading activities. Second, because we used a range of measures to assess students’ comprehension in both L1 and L2 reading (both processes and outcomes), the assessments themselves may have had some impact on the changes in the students’ comprehension performance. We do not know the extent or nature of these testing effects. Third, because of the relatively small sample of 12 students in the study, individual differences among participating students may have obscured some effects of the intervention. Sources of individual differences that may have influenced our ability to detect the effects of treatment in this study include: time needed for internalisation of cognitive and metacognitive strategies (Raphael, 1984), prior knowledge of, and interest in, the text content (Afflerbach, 1990; Bugel & Buunk, 1996), vocabulary knowledge (Anderson & Freebody, 1980) and motivation (Guthrie & Wigfield, 1999).

In conclusion, this study supports the feasibility of using the reciprocal teaching procedure with limited-English proficient ESL students when the procedure is modified to cater for the language needs of these students. By pairing L1-reciprocal teaching sessions with L2-reciprocal teaching sessions and providing explicit instruction in the use of the cognitive and metacognitive strategies, students’ existing expertise and knowledge within their L1 can be capitalized upon to facilitate the development of appropriate reading and comprehension skills in their L2.

Acknowledgements

We are grateful to Professor John Hattie and to the three anonymous referees for comments on earlier versions of the manuscript.

References

Adams, M. J., & Collins, A. (1979). A schema-theoretic view of reading. In R. O. Freedle (Ed.), New directions in discourse processing (pp. 1–22). Norwood, NJ: Ablex.

Afflerbach, P. P. (1990). The influence of prior knowledge and text genre on readers’ prediction strategies. Journal of Reading Behavior, 22(2), 131–148.

Anderson, R. C., & Freebody, P. (1980). Vocabulary knowledge and reading. Reading Education Report No. 11. Cambridge, MA.: Illinois University, Urbana. Center for the Study of Reading. ED177470.

Anderson, R. C., & Pearson, P. D. (1984). A schema-theoretic view of basic processes in reading comprehension. In P. D. Pearson, R. Barr, M. L. Kamil, & P. Mosenthal (Eds.), Handbook of reading research (pp. 255–291). New York: Longman.

Anderson, V., & Roit, M. (1993). Planning and implementing collaborative strategy instruction for delayed readers in grades 6–10. The Elementary School Journal, 94, 121–137.

Baker, L. (1996). Social influences on metacognitive development in reading. In C. Cornoldi, & J. Oakhill (Eds.), Reading comprehension difficulties: Processes and intervention (pp. 331–352). Mahwah, NJ: Lawrence Erlbaum Associates.

Baker, L., & Brown, A. L. (1984). Metacognitive skills and reading. In P. D. Pearson, R. Barr, M. Kamil, & P. Mosenthal (Eds.), Handbook of reading research (pp. 353–394). New York: Longman.

Barlow, D. H., & Hersen, M. (1984). Single case experimental designs (2nd ed.). New York: Pergamon.

Benedetto, R. (1984). A psycholinguistic investigation of the top-level organization strategies in first and second language reading: Five case studies. Unpublished doctoral dissertation, New York University, New York.

Block, E. (1986). The comprehension strategies of second language readers. TESOL Quarterly, 20, 463–494.

Block, C. C. (1993). Strategy instruction in a literature-based program. Elementary School Journal, 94, 139–151.

Bloom, M., & Fischer, J. (1982). Evaluating practice: Guidelines for the accountable professional. Englewood Cliffs, NJ: Prentice-Hall.

Borkowski, J. G., Carr, M., & Pressley, M. (1987). “Spontaneous” strategy use: Perspectives from metacognitive theory. Intelligence, 11, 61–75.

Bossers, B. (1992). Reading in two languages: A study of reading comprehension in Dutch as a second language and in Turkish as a first language. Rotterdam, The Netherlands: Drukkerij Van Driel.


Brown, A. L., Campione, J. C., & Day, J. D. (1981). Learning to learn: On training students to learn from texts. Educational Researcher, 10, 14–21.

Brown, A. L., & Palincsar, A. S. (1985). Reciprocal teaching of comprehension strategies: A natural history of one program for enhancing learning. (Technical Report No. 334). U.S. Illinois: Center for the Study of Reading, University of Illinois, Urbana.

Bugel, K., & Buunk, B. P. (1996). Sex differences in foreign language text comprehension: The role of interests and prior knowledge. Modern Language Journal, 80(1), 15–31.

Carrell, P. (1991). Second language reading: Reading ability or language proficiency? Applied Linguistics, 12, 159–179.

Carrell, P. L., & Eisterhold, J. C. (1983). Schema theory and ESL reading. TESOL Quarterly, 17, 553–573.

Clarke, M. A. (1980). The short-circuit hypothesis of ESL reading—or when language competence interferes with reading performance. Modern Language Journal, 64, 203–209.

Collier, V. P. (1988). How long? A synthesis of research on academic achievement in a second language. TESOL Quarterly, 23, 509–531.

Collier, V. P., & Thomas, W. P. (1989). How quickly can immigrants become proficient in school English? Journal of Educational Issues of Language Minority Students, 5, 26–38.

Cotterall, S. (1990). Reciprocal teaching: A problem-solving approach to reading. Guidelines, 12, 2.

Cummins, J. (1980). The cross-lingual dimensions of language proficiency: Implications for bilingual education and the optimal age issue. TESOL Quarterly, 14, 175–187.

Dashwood, A., & Mangubhai, F. (1996). ESL students and strategy training in reading comprehension-decoding the implications for second language learning and teaching. Paper presented at the ACTA–ATESOL (NT) National Conference and 7th TESOL in Teacher Education Conference, Darwin, Northern Territory, Australia.

Department of Education (1986). School Journal Catalogue 1965–1985. Wellington, NZ: Department of Education, School Publications Branch.

Deshler, D. D., & Schumaker, J. B. (1993). Strategy mastery by at-risk students: Not a simple matter. Elementary School Journal, 94, 153–167.

Dole, J. A., Duffy, G. G., Roehler, L. R., & Pearson, P. D. (1991). Moving from the old to the new: Research on reading comprehension instruction. Review of Educational Research, 61(2), 239–264.

Duffy, G. G., Roehler, L. R., Sivan, E., Rackliffe, G., Book, C., Meloth, M. S., Vavrus, L. G., Wesselman, R., Putnam, J., & Bassiri, D. (1987). Effects of explaining the reasoning associated with using reading strategies. Reading Research Quarterly, 22, 347–368.

Frederiksen, J. R. (1982). A componential theory of reading skills and their interactions. In R. J. Sternberg (Ed.), Advances in the psychology of human intelligence. Hillsdale, NJ: Lawrence Erlbaum Associates.

Fung, I. Y. Y. (1999). L1-assisted reciprocal teaching for ESL students. Unpublished Masters Thesis, University of Auckland, Auckland, N.Z.

Gagne, E. D., Yekovich, C. W., & Yekovich, F. R. (1993). The cognitive psychology of school learning. New York: HarperCollins College Publishers.

Gelzheiser, L. M. (1986). Instruction that affords skill transfer. New York: Paper presented at the Annual Conference of the Association for Children and Adults with Learning Disabilities.

Guthrie, J. T., & Wigfield, A. (1999). How motivation fits into a science of reading. Scientific Studies of Reading, 3(3), 199–205.

Hague, S., & Olejnik, S. (1989). Text structure: Does awareness transfer from first language to second language? Paper presented at the annual meeting of the American Educational Research Association, San Francisco.

Hare, V. C., & Smith, D. C. (1982). Reading to remember: Studies of metacognitive reading skills in elementary school-aged children. Journal of Educational Research, 75(3), 157–164.

Hudson, T. (1982). The effects of induced schemata on the short circuit in L2 reading: Non-decoding factors in L2 reading performance. Language Learning, 32, 1–31.

Jimenez, R. T., Garcia, G. E., & Pearson, P. D. (1996). The reading strategies of bilingual Latina/o students who are successful English readers: Opportunities and obstacles. Reading Research Quarterly, 31(1), 90–112.

Just, M. A., & Carpenter, P. A. (1987). The psychology of reading and language comprehension. Newton, MA: Allyn & Bacon.


Kazdin, A. E. (1982). Single-case research designs: Methods for clinical and applied settings. New York: Oxford University Press.

Klingner, J. K., & Vaughn, S. (1996). Reciprocal teaching of reading comprehension strategies for students with learning disabilities who use English as a second language. Elementary School Journal, 96(3), 275–293.

Klare, G. R. (1984). Readability. In P. D. Pearson, R. Barr, M. L. Kamil, & P. Mosenthal (Eds.), Handbook of reading research (Vol 1, pp. 681–744). New York: Longman.

Krishef, C. H. (1991). Fundamental approaches to single subject design and analysis. Malabar, Florida: Krieger.

Kucera, J., & Axelrod, S. (1995). Multiple-baseline designs. In S. B. Neuman, & S. McCormick (Eds.), Single subject experimental research: Applications for literacy (pp. 47–63). Newark, Delaware: International Reading Association.

Langer, J. A. (1990). Meaning construction in school literacy tasks: A study of bilingual students. American Educational Research Journal, 27(3), 427–471.

Le Fevre, D. (1996). Tape-assisted reciprocal teaching for readers with poor decoding skills. Unpublished Masters Thesis, The University of Auckland, Auckland.

Lee, J. F., & Musumeci, D. (1988). On hierarchies of reading skills and text types. Modern Language Journal, 72, 000–000.

Markman, E. M. (1979). Realizing that you don’t understand: Elementary school children’s awareness of inconsistencies. Child Development, 50, 643–655.

McCormick, S. (1995). What is single-subject experimental research? In S. B. Neuman, & S. McCormick (Eds.), Single subject experimental research: Applications for literacy (pp. 1–31). Newark, Delaware: International Reading Association.

Meyer, B. J. F. (1975). The organization of prose and its effects on memory. Amsterdam: North-Holland.

Miller, G. E. (1985). The effects of general and specific self-instruction training on children’s comprehension monitoring performance during reading. Reading Research Quarterly, 20, 616–628.

Miller, G. E. (1987). The influence of self-instruction on the comprehension monitoring performance of average and above-average readers. Journal of Reading Behavior, 19, 303–317.

Miller, L. D., & Perkins, K. (1990). ESL reading comprehension instruction. RELC Journal, 21(1), 79–94.

Ministry of Education (1995). School Journal Catalogue (1980–1995). Wellington, NZ: Ministry of Education, Learning Media.

Moll, L. C. (1988). Some key issues in teaching Latino students. Language Arts, 65, 465–472.

Moll, L. C., Estrada, E., Diaz, E., & Lopes, L. M. (1980). The organization of bilingual lessons: Implications for schooling. The Quarterly Newsletter of the Laboratory of Comparative Human Cognition, 2(3), 53–58.

Neale, M. D. (1988). Neale analysis of reading ability revised. Australia: ACER.

Neuman, S. B., & McCormick, S. (1995). Single subject experimental research: Applications for literacy. Newark, Delaware: International Reading Association.

O’Malley, J. M., & Chamot, A. U. (1990). Learning strategies in second language acquisition. New York: Cambridge University Press.

Palincsar, A. S. (1986). Reciprocal teaching. In A. S. Palincsar (Ed.), Teaching reading as thinking. Oak Brook, IL: North Central Regional Educational Laboratory.

Palincsar, A. S., & Brown, A. L. (1984). Reciprocal teaching of comprehension-fostering and comprehension-monitoring activities. Cognition and Instruction, 1(2), 117–175.

Paris, S. G., Lipson, M., & Wixson, K. (1983). Becoming a strategic reader. Contemporary Educational Psychology, 8, 293–316.

Paris, S. G., & Winograd, P. (1990). How metacognition can promote academic learning and instruction. In B. F. Jones, & L. Idol (Eds.), Dimensions of thinking and cognitive instruction (pp. 15–51). Hillsdale, NJ: Erlbaum.

Paris, S. G., Cross, D. R., & Lipson, M. Y. (1984). Informed strategies for learning: a program to improve children’s reading awareness and comprehension. Journal of Educational Psychology, 76(6), 1239–1252.

Pearson, P. D., & Johnson, D. D. (1978). Teaching reading comprehension. New York: Holt, Rinehart & Winston.


Pearson, P. D., & Dole, J. A. (1987). Explicit comprehension instruction: A review of research and a new conceptualisation of instruction. Elementary School Journal, 88(2), 151–165.

Pressley, M., Johnson, C. J., Symons, S., McGoldrick, J. S., & Kurita, J. A. (1989). Strategies that improve children’s memory and comprehension of text. The Elementary School Journal, 90, 3–32.

Raphael, T. E. (1984). Developmental aspects of training students to use information-locating strategies for responding to questions. Research Series No. 137. East Lansing, MI: Michigan State University. ED241592.

Raphael, T. E., & Pearson, P. D. (1985). Increasing student’s awareness of sources of information for answering questions. American Educational Research Journal, 22, 217–235.

Rosenshine, B., & Meister, C. (1994). Reciprocal teaching: A review of the research. Review of Educational Research, 64(4), 479–530.

Rumelhart, D. E. (1980). Schemata: The building blocks of cognition. In R. J. Spiro, B. C. Bruce, & W. F. Brewer (Eds.), Theoretical issues in reading comprehension (pp. 33–58). Hillsdale, NJ: Lawrence Erlbaum Associates.

Samuels, S. J. (1994). Toward a theory of automatic information processing in reading, revisited. In R. B. Ruddell, M. R. Ruddell, & H. Singer (Eds.), Theoretical models and processes of reading (4th ed., pp. 816–837). Newark, Del.: International Reading Association.

Swaffar, J. K. (1988). Readers, texts, and second languages: The interactive processes. The Modern Language Journal, 72(2), 123–148.

Taillefer, G. F. (1996). L2 reading ability: Further insight into the short-circuit hypothesis. The Modern Language Journal, 80, 461–477.

Thibadeau, R., Just, M. A., & Carpenter, P. A. (1982). A model of the time course and content of reading. Cognitive Science, 6, 157–203.

Vygotsky, L. S. (1962). Thought and language. New York: Wiley.


Learning and Instruction 13 (2003) 177–189
www.elsevier.com/locate/learninstruc

Cognitive strategies for learning from static and dynamic visuals

D. Lewalter∗

University of the Armed Forces, Munich, FRG, Werner-Heisenberg-Weg 39, D-85577 Neubiberg, Germany

Abstract

An experimental study with 60 students investigated the effects of including static or dynamic visuals in an expository text on a learning outcome and the use of learning strategies while working with those visuals. For the study, two illustrated and one text-only version of a computer-based learning text on an astrophysical subject were developed and served as the learning material. Considering the cognitive task demand in a learning test, we found significant differences between the illustrated versions and the text-only version, but not between the two illustrated ones. We used think-aloud protocols to examine the learning processes initiated by both types of visuals. The coding of the recorded learning activities was based on recent theories of learning strategies. The results for both types of illustrations indicate different frequencies in the use of learning strategies relevant for the learning outcome, and therefore indicate the contribution of the cognitive process quality for the supportive function of visuals. © 2003 Elsevier Science Ltd. All rights reserved.

1. Introduction

Illustrations are frequently integrated in expository text to support learning and to make the learning process more effective. A large number of studies have substantiated the supportive function of static illustrations in text on learning outcomes under certain conditions (Rieber, 1994). A theoretical explanation for the positive effects of visuals is provided by the cognitive theory of multimedia learning from text and pictures presented by Mayer (1994, 2003). When learners build referential connections between their separately developed mental representations of verbal and visual material and their prior knowledge, learning is enhanced.

∗ Fax: +49-89-6004-2841. E-mail address: [email protected] (D. Lewalter).

0959-4752/03/$ - see front matter © 2003 Elsevier Science Ltd. All rights reserved. doi:10.1016/S0959-4752(02)00019-1

The integration of illustrations plays an important role in designing effective computer-based learning programs. With the advent of new technologies, dynamic visuals such as animated graphics are used instead of or in addition to static visuals like pictures. Comparing both types of illustrations, we find numerous similarities concerning the representation of objects. However, animations seem to be superior for the visualization of spatial aspects and dynamic processes. They allow a complete visualization of spatial constellations and dynamic processes, whereas in pictures, static indicators such as shading or arrows must be used to symbolize this information.

Consequently, animations and pictures impose different cognitive demands on the learner when creating a mental representation of the dynamic learning content.

As dynamic illustrations offer a complete model for generating a mental representation of motion, thereby reducing the level of abstraction of temporal ideas, they should support deeper understanding than static visuals do (Park & Hopkins, 1993). When static visuals like pictures are used, learners are forced to infer this model on their own. One may, therefore, expect dynamic visuals to be more helpful in fostering the learning process if motion in three-dimensional space is a relevant aspect of the learning material. However, only weak empirical evidence supports the assumption of the superiority of animated illustrations (e.g. Reed, 1985; Rieber, 1989).

How can this lack of superiority of animations over static visuals be explained? One opportunity to find an answer to this question is to examine the learning process more closely. The expectation of the higher impact of dynamic visuals on learning outcomes when compared with static visuals is based on the assumption that learners use all the information presented in these illustrations. Static and dynamic illustrations are seen as “simple media” that do not confront the learner with problems in processing the presented information. Weidenmann (1988) argues that the cognitive processing of visuals is not as easy as is frequently supposed. Studies have shown that learners sometimes have problems establishing connections between visual and textual information (e.g. Mayer, 1994; Weidenmann, 1988). They also have difficulty identifying the relevant information presented in an illustration (Mayer & Gallini, 1990). When learning with dynamic illustrations, learners are confronted with similar or even more challenging problems (Lowe, 1999, 2003). The transitory nature of dynamic visuals confronts learners with higher levels of cognitive load than would be expected for static visuals. On the other hand, the apparent simplicity of the dynamic information presented may influence the learners to adopt less mental effort and volitional attention (Rieber, 1989).

From an information-processing perspective, it seems reasonable to assume that there are differences in cognitive processes associated with learning from animated and static visuals. Therefore we tried to investigate the actual strategies and thought processes that learners are engaged in when learning from illustrations integrated in text. These strategies have to be considered as a central factor for the effective use of visuals.

Activities that support successful learning are generally referred to as learning strategies (Dansereau, 1985; Pintrich, 1989; Weinstein & Mayer, 1986). We used the model proposed by Weinstein and Mayer (1986) as an appropriate theoretical framework for understanding the process of information encoding when learning with illustrations in text. Weinstein and Mayer (1986, p. 315) defined learning strategies as “behaviors and thoughts that learners engage in during learning and that are intended to influence the learner’s encoding process”. Learning strategies are understood as a schematic structure combining a sequence of specific learning activities that will be executed by the learner to gain new knowledge. The authors differentiate the learning strategies by considering their function within the encoding process. This process can be analyzed with respect to the following four main components: selection, acquisition, construction and integration. Weinstein and Mayer (1986) distinguish between cognitive and metacognitive strategies in the process of learning from text. Cognitive strategies refer to the learners’ cognitive processes during the process of encoding, for example rehearsal or elaboration behaviors. Metacognitive strategies refer to the learners’ knowledge of their own cognitive processes and their ability to control these processes, for example by monitoring and modifying them.

In our study we used these main categories of learning strategies in order to characterize the learners’ activity when engaged in the visual aspects of the learning material. The material is composed of a text describing and explaining an aspect of the subject matter, immediately followed by dynamic or static visuals on a separate frame (see Section 2.1 for a detailed description).

In general, the term “rehearsal strategies” refers to learning techniques like memorizing by recitation and recapitulation. We expected that the statements of the learners when using rehearsal strategies during working on the visuals would probably be very similar to the text information they read in the preceding text. Since dynamic illustrations offer the motion of the pictorial components in a chronological order, the rehearsal of the information may be more structured. Because of the short duration of the animation the repeated information may be less complete. Learners may only mention the most obvious aspects of information presented by the dynamic visual. Considering the assumption that learning is enhanced when learners build referential connections between their separately developed verbal and visual representations of the learning material (Mayer, 2003), this learning strategy may support learning, particularly when factual knowledge is supposed to be acquired.

In general, “elaboration strategies” include learning techniques such as building connections between new information and prior knowledge or experiences. Therefore elaboration strategies seem to be crucial for deeper comprehension. It was assumed that the illustrations stimulate the learners to use optical links to the learning content, which were offered to them in addition to verbal links that were offered by the text. Learners could connect the new information not only to their prior knowledge but also to their optical experiences stored in memory. For example, they could identify optical similarities. Compared with static visuals, animations offer the dynamic aspect of the optical appearance as an additional link for elaborations.

In our study, we used the term “control strategies” to refer to learning techniques that (a) aim at the planning and regulation of the further steps in learning and (b) refer to the control of the actual level of comprehension. According to Chi, Bassok, Lewis, Reimann, and Glaser (1989), statements that refer to the “controlling of the level of comprehension” can be understood either as statements confirming comprehension of the learning material or as statements indicating comprehension failure. These strategies refer to both the comprehension of the illustrations and the building of referential connections between text and visuals. Because of the transitory and automatic nature of the dynamic visuals, it might be more difficult for the learners to detect possible problems in comprehension when compared with static visuals. Learners who generate a visual impression of the motion on the basis of static symbols on their own may be likely to recognize problems in comprehension because they have the ability to regulate the speed of their learning, perhaps making them more able to employ planning and regulation strategies. These strategies are crucial for deeper comprehension (Entwistle, 1988).

This taxonomy of learning strategies represents an analytical distinction between learning processes which in fact are highly interrelated. However, this framework provides a theoretical basis for an attempt to distinguish various kinds of cognitive and metacognitive learning techniques executed by the learners when they are learning with different kinds of visuals. In addition, since we do not know how successful students use learning strategies when working with different kinds of visuals, we have to investigate the intensity of the strategy use and its impact on learning outcomes.

In sum, these considerations lead to the following research questions:

1. What are the effects of static and dynamic visuals in an expository text on learning outcomes?

2. What kind of learning strategies do students use when learning with static and dynamic visuals and to what extent do they employ them?

3. What effect does the use of learning strategies have on learning outcome when learning with visuals?

2. Method

In order to answer these questions, we conducted an experimental study with two experimental groups and one control group. Subjects were tested individually when learning with a computer-based learning program on an astrophysical topic.

2.1. Material

The learning material consisted of three versions of a computer-based learning text, two illustrated versions and one non-illustrated control version. The learning programs dealt with the external appearance and the explanation of optical phenomena as a result of optical gravitational lensing (see Fig. 1a and b). In general relativity theory, the presence of a deflector (for example, the gravitational field of a massive star) can curve spacetime, and the path of a light ray will be deflected as a result (see Fig. 1b). This process is called gravitational lensing. As a consequence of lensing, light rays that would not have otherwise reached the observer are bent from their paths and towards the observer (see Fig. 1b). A gravitational field may cause a source of light to appear greatly distorted and as multiple images (Fig. 1a).

Fig. 1. (a) Static illustration showing the outward appearance of the optical phenomenon of the apparent doubling of a star (inverted and transferred to gray scale). (b) Static illustration explaining the optical phenomenon presented in Fig. 1a by depicting the bending of the light beams (inverted and transferred to gray scale).

The learning programs were composed of a learning text (about 2100 words) and nine units with illustrations. Each text section that described and explained an optical phenomenon was immediately followed by an illustration. The text was identical in all three versions. The visuals were extracted from a video-exhibit from the Deutsches Museum in Munich. They showed either the changing external appearance of an optical phenomenon (for example, optical distortions of a star that is revolving around a smaller but massive star; Fig. 1a) or they illustrated the explanation of the phenomenon by depicting the effect of bending the path of a light ray because of gravitational lensing (for example, when a straight light ray becomes bent so it reaches the eye of the observer; Fig. 1b). Within the latter illustrations the pictorial components were labeled according to the text. In all illustrations the spatial relations and the motion were crucial to understanding the principles and laws behind the phenomenon explained. In the version with dynamic visuals, the learning text was illustrated by animated graphics which showed the course of motion completely. In the version with static visuals, motion was symbolized by a single frame or a series of frames (two to four frames) depicting the steps, showing the position of stars and the change of their external appearance (see Fig. 1a) or by arrows symbolizing the development of the light ray (see Fig. 1b). The information presented by both types of illustrations was equivalent. The text and the illustrations were presented on separate pages, starting with the text passage which explained what would be seen in the immediately following illustration(s). To maintain the alternation of text and illustration, the text-only version included content-free pictures showing circles and lines.

2.2. Subjects

A total of 60 education and psychology undergraduate students with an average age of 25 years participated in the study. The participation was voluntary and was not based on a reward. Each version of the learning program was given to 20 students, 14 females and six males.

2.3. Measure of the use of learning strategies

Measurement of cognitive activities during learning can prove problematic. It might be supposed that, while learning with illustrations, experienced learners utilize numerous learning techniques that are beyond their conscious control. These learners will likely have difficulty remembering the strategies at the end of the learning task. Therefore, recording the cognitive and metacognitive process activities should be done simultaneously with the learning process. This is the main reason why we used think-aloud protocols. In order to overcome methodological problems that have been discussed with respect to reliability, validity and completeness of the data gained by this method, we used a procedure that follows the guidelines of Ericsson and Simon (1984; see Section 2.5).

Using this method, we were able to get a quite unbiased insight into the learning process and to consider additional learning strategies, in case the strategies mentioned above were inaccurate or incomplete for categorizing the different forms of learning techniques used by the students.

The think-aloud protocols were taped and transcribed verbatim. Complete sentences and subordinate clauses were used as coding units. The following examples of think-aloud protocols show statements of the learners while looking at Fig. 1a or 1b. They will give an impression of how we tried to reconstruct the usage of learning strategies (further descriptions of the statements we found will be given with the Results).

Rehearsal: “There are the light rays coming from the star behind the massive star. They are bent to the right and to the left, so they are reaching the eye of the observer.” (This statement is an almost verbatim repetition of the preceding text passage. Therefore it was coded as an indicator of the usage of rehearsal strategy.)

Elaboration: “The bigger star is revolving around the smaller one like one knows it from the earth. … Because the star is bending the light ray distorting pictures come up like in a distorting mirror.”

Confirming Comprehension: “OK, clear. I think I understand it now.”

Comprehension Failure: “I do not understand it at all.”

Planning Further Learning: “Now I have to see if I can find something about it in the text.”

The quantitative data analysis differs with respect to the kind of learning strategy. As a quantitative indicator of rehearsal strategies, the completeness of the repetition of the text-based information referring to the particular illustration was used within each of the nine think-aloud periods. A score was derived according to the following rules: 1 point when only one aspect was mentioned, 2 points for partial repetition and 3 points for complete repetition (cf. Lewalter, 1997). Elaboration and control strategies were coded according to the frequency of their occurrence (1 point for each occurrence). To assess the reliability, two raters categorized all the protocols and reached 94% agreement. Disagreements were resolved in discussion.
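The scoring rules and the agreement figure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the category labels, the function names, and the treatment of a period without any rehearsal statement as 0 points are hypothetical and are not taken from the original coding manual. Agreement is computed as simple percent agreement, which is one plausible reading of the reported 94%.

```python
# Point values follow the rules in Section 2.3; the labels themselves are illustrative.
REHEARSAL_POINTS = {"one_aspect": 1, "partial": 2, "complete": 3}


def rehearsal_score(period_codes):
    """Sum rehearsal points over the think-aloud periods of one learner.

    A period without a rehearsal statement is passed as None and
    contributes 0 points (an assumption, not stated in the source).
    """
    return sum(REHEARSAL_POINTS[code] for code in period_codes if code is not None)


def percent_agreement(rater_a, rater_b):
    """Percentage of coding units that two raters assigned to the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of units")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# A learner who repeats the text completely in all nine periods reaches the maximum score.
max_score = rehearsal_score(["complete"] * 9)

# Two raters agreeing on three of four hypothetical coding units.
agreement = percent_agreement(
    ["rehearsal", "elaboration", "rehearsal", "planning"],
    ["rehearsal", "elaboration", "planning", "planning"],
)
```

With nine periods and at most 3 points each, the rehearsal score under these rules ranges up to 27.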

2.4. Measure of the learning outcome

Learning was measured with a test consisting of tasks that differ in the cognitive demand required. The test includes tasks on factual knowledge, which required the reproduction of specific facts or events (seven items, 56 points, Cronbach’s alpha 0.79), and tasks on comprehension and problem solving, which require the understanding and use of principles and laws (nine items, 54 points, Cronbach’s alpha 0.80). Within both groups of questions, the assessment of the learning outcome covers pictorial and verbal answer formats. The students had to answer drawing tasks (e.g. draw in the path of a light ray in a certain constellation of stars) and verbal tasks (e.g. describe an optical phenomenon). The test meets the requirements proposed by Joseph and Dwyer (1984) and others.
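For readers unfamiliar with the reliability coefficient reported above, the following minimal sketch shows how Cronbach's alpha is computed from an item-score matrix, using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the toy data are hypothetical; the original item-level data are not available here.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a score matrix (rows = respondents, columns = items)."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data: two items that rank three respondents identically give alpha = 1.
alpha = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
```

Values near 0.8, as reported for both subtests, are conventionally read as acceptable internal consistency.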

2.5. Procedure

The study was conducted in two steps: a pre-test and a main investigation. During the pre-test, we used a test on topic-specific prior knowledge (11 items, Cronbach’s alpha 0.70) and five scales of the Wilde-Intelligence-Test measuring spatial imagination ability, speed of perception and verbal intelligence (WIT, Jager & Althoff, 1983). The results of the pre-test were used to form parallel groups for the main investigation, which took place one or two weeks later. In the course of the main investigation, the students worked with one version of the learning program. These learning sessions were performed individually. At the beginning of the learning session the learners got an introduction to the think-aloud method according to the Ericsson and Simon guidelines (1984). These guidelines indicate that the task of thinking aloud must be clearly subordinate to the learning task and that students should be instructed to verbalize all their thoughts immediately and without selection. Then the students were asked to work on the learning program for about 45 minutes in order to learn the content and be able to answer questions about it. The think-aloud method was used only when the students were learning with the visuals. To keep the learning process as equal as possible, the learners of the text-only version had to think aloud when they were looking at the text-irrelevant illustrations. Following the learning session, the students completed a questionnaire evaluating the learning program and the quality of the visuals. Finally, they answered the questions on the learning outcome test.

3. Results

3.1. What are the effects of static vs. dynamic visuals in an expository text on learning outcome?

In the first step of analysis, the results of the test were compared for the two experimental groups on the one side and the control group on the other. Significant differences were found on factual knowledge for both experimental groups when compared with the text-only group (control group) (see Table 1). For the measures on comprehension and problem solving, only the difference between the dynamic version and text version was significant.

Table 1
Mean number (M) and standard deviation (SD) of the result in the learning outcome test depending on learning program version, mean comparison, one-factor variance analysis, Scheffé test, p < 0.05

| Measure | Dynamic visuals M | Dynamic visuals SD | Static visuals M | Static visuals SD | Text-only version M | Text-only version SD | F-value (df 2,57) | p | D-S | D-T | S-T |
| Factual knowledge | 28.55 | 11.78 | 28.50 | 10.24 | 15.90 | 10.81 | 8.84 | <0.001 | | ∗ | ∗ |
| Comprehension and problem solving | 26.80 | 9.91 | 21.15 | 8.64 | 14.80 | 9.98 | 7.94 | <0.001 | | ∗ | |

D-S, D-T and S-T denote the pairwise comparisons dynamic vs. static, dynamic vs. text-only and static vs. text-only; ∗ marks a significant difference.

In the second step of analysis, the results of the two experimental groups were compared. No significant differences were found between the subjects learning with the two illustrated versions, neither for questions concerning factual knowledge nor for the tasks on comprehension and problem solving (see Table 1).
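The F-values in Table 1 can be cross-checked from the reported means and standard deviations alone. The sketch below assumes three equally sized groups of 20 students (Section 2.2) and a standard one-way analysis of variance; the function name is illustrative. With equal group sizes, the between-groups mean square is n times the variance of the group means, and the within-groups mean square is the average of the group variances.

```python
def one_way_f_from_summary(means, sds, n):
    """One-way ANOVA F statistic from group means and SDs, equal group size n."""
    k = len(means)
    grand_mean = sum(means) / k
    # Between-groups mean square with k - 1 degrees of freedom.
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    # Within-groups mean square: average of the group variances (equal n).
    ms_within = sum(sd ** 2 for sd in sds) / k
    return ms_between / ms_within


# Values from Table 1: dynamic, static, text-only.
f_factual = one_way_f_from_summary([28.55, 28.50, 15.90], [11.78, 10.24, 10.81], 20)
f_comprehension = one_way_f_from_summary([26.80, 21.15, 14.80], [9.91, 8.64, 9.98], 20)

print(round(f_factual, 2), round(f_comprehension, 2))  # 8.84 7.94
```

Both results agree with the tabled values, with degrees of freedom 3 - 1 = 2 and 60 - 3 = 57.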

3.2. What kind of learning strategies do students use when learning with static and dynamic visuals and to what extent do they employ them?

Data from the think-aloud protocols show that learners used all three main categories of learning strategies to varying degrees when learning with static or dynamic visuals (see Table 2).

Table 2
Mean number (M), standard deviation (SD) and median of the use of learning strategies depending on learning program version, mean comparison U-test, Mann–Whitney, two-tailed

| Learning strategy | Dynamic visuals M (SD) | Dynamic visuals Median | Static visuals M (SD) | Static visuals Median | U-value | z | p |
| Rehearsal | 20.0 (6.8) | 21.5 | 25.8 (3.0) | 27.0 | 88.5 | -3.03 | <0.01 |
| Elaboration | 2.3 (3.4) | 1.0 | 1.2 (2.0) | 0.5 | 168.5 | -0.91 | ns |
| Control strategies overall | 7.6 (5.2) | 6.0 | 7.5 (7.9) | 5.0 | 172.5 | -0.75 | ns |
| Confirming comprehension | 3.3 (3.3) | 3.0 | 1.2 (2.3) | 0.0 | 112.0 | -0.48 | <0.05 |
| Comprehension failure | 2.6 (2.3) | 2.0 | 2.8 (3.3) | 2.0 | 191.0 | -0.25 | ns |
| Planning for further learning | 1.7 (1.8) | 1.5 | 3.6 (3.4) | 2.0 | 128.5 | -1.97 | <0.05 |

Rehearsal strategies were the most frequently used learning strategies in both learning groups. All students used this kind of strategy almost every time while learning with a visual. The intensive use of this learning strategy may be partly induced by the think-aloud method and the coding system, which does not allow a distinction between the use of rehearsal strategies and verbalization of the components of the illustration. As Table 2 shows, the learners, on average, used rehearsal strategies significantly more often when learning with static illustrations. The contents of the rehearsal statements mainly refer to the text sequences of the learning program directly related to the visuals and not to previously given information. Statements coded as rehearsal were repetitions of the learning text with exact wording or the recapitulation of the main idea of the learning content while using the expressions of the text.

We found that both groups very rarely used elaboration strategies (see Table 2). When they were used, learners made references only to static optical aspects of the learning material and very rarely to their prior domain knowledge. The elaborations we found reflected an effort to comprehend the principles of the subject matter by making optical analogies. The dynamic visuals group used slightly more elaboration strategies, but the difference was not significant. It is noteworthy that half of the students did not verbalize any elaborations whatsoever.

We did not find any significant difference between the two groups on the average use of control strategies overall. Within the three categories of control strategies we found relatively small frequencies of mentioning. Nonetheless, we find differences concerning the expression of these strategies between both groups that might be quite important for the learning process. The learners reported with significantly higher frequency on their comprehension when using dynamic illustrations. The frequencies of statements indicating comprehension failure were similar for both groups. Significant differences were found in the statements about planning further steps in learning in favor of the static visuals group. Statements were coded as indicative of further planning when learners mentioned learning aims concerning special aspects of the content or their further learning activities.
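To make the link between the U-values and z-values in Table 2 concrete, the sketch below applies the usual large-sample normal approximation for the Mann–Whitney U statistic. It assumes two groups of 20 learners and omits any correction for ties or continuity; the source does not state which corrections were applied, so small deviations from the tabled z-values are to be expected. The function name is illustrative.

```python
from math import sqrt


def mann_whitney_z(u, n1, n2):
    """Normal approximation for U, without tie or continuity correction."""
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u


# Rehearsal row of Table 2: U = 88.5 with 20 learners per group.
z_rehearsal = mann_whitney_z(88.5, 20, 20)
```

For the rehearsal comparison this gives a z of roughly -3.02, close to the tabled -3.03; a tie correction, which shrinks the standard deviation of U, would move the value slightly further from zero.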

3.3. What effect does the use of learning strategies have on learning outcome when learning with visuals?

To answer the question concerning the effect of strategy use on learning outcome, we divided the learners into two groups: those who often used a specific strategy vs. those not using it (with planning strategy) or using it less frequently (with rehearsal strategy). We carried out two-way analyses of variance (ANOVAs) on the frequency of strategy use and the version of the learning program. With regard to learning outcome, we distinguished between factual knowledge and comprehension/problem solving.

As can be seen in Fig. 2a, the intensity of rehearsal strategy use had a main effect on factual knowledge (F(1,36) = 5.67, p < 0.05). As expected, rehearsal strategies did not have a significant influence on comprehension and problem solving as shown in Fig. 2b. The program version was a significant factor only when questions on comprehension and problem solving were considered (F(1,36) = 5.55, p < 0.05). In both cases, there are no significant interactions between program version and strategy use (see Fig. 2a and b).

Fig. 2. Average result in the test on factual knowledge (a) and comprehension and problem solving (b), depending on learning program version and intensity of use of rehearsal strategy.

Within the pattern of the control strategies, an effect on learning outcome that is based on cognitive information processing can only be expected for planning strategies and particularly for tasks which demand a deeper comprehension (Entwistle, 1988). This assumption was confirmed by our data. The use of control strategies had a significant main effect on students’ performance on comprehension and problem solving tasks (F(1,36) = 5.08, p < 0.05), but not on the recall of factual knowledge. In neither case was there a statistical interaction with program version (see Fig. 3a and b). The effect of program version was significant only for questions requiring comprehension and problem solving (F(1,36) = 5.87, p < 0.05).
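The degrees of freedom in the reported F(1,36) statistics follow from the design: the two illustrated groups together contain 40 learners (20 per version, Section 2.2), and each analysis crosses two program versions with two levels of strategy use. A small sketch of this bookkeeping, with an illustrative function name, is:

```python
def two_way_anova_df(n_total, levels_a, levels_b):
    """Degrees of freedom for a two-way between-subjects ANOVA with interaction."""
    df_a = levels_a - 1
    df_b = levels_b - 1
    return {
        "factor_a": df_a,
        "factor_b": df_b,
        "interaction": df_a * df_b,
        "error": n_total - levels_a * levels_b,
    }


# 40 learners, program version (2 levels) x strategy use (2 levels).
df = two_way_anova_df(40, 2, 2)
```

Each main effect and the interaction therefore has 1 degree of freedom, and the error term has 40 - 4 = 36.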

In addition, we analyzed whether students’ perception of their own understanding corresponded to their actual performance. We found a positive correlation (r = 0.49, p < 0.05) between statements indicating comprehension and general learning outcome for dynamic version learners but not for static version subjects, who very rarely referred to comprehension. As the learners of both groups mentioned comprehension failure very rarely, no significant correlation with learning outcome was found.
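The significance of the reported correlation can be illustrated with the standard t-test for a Pearson coefficient, t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch assumes that all 20 learners of the dynamic version entered the correlation and that the test was two-tailed; neither detail is stated explicitly in the text. The critical value is taken from a standard t table for 18 degrees of freedom.

```python
from math import sqrt


def t_from_r(r, n):
    """t statistic for testing a Pearson correlation against zero (df = n - 2)."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


T_CRITICAL_DF18 = 2.101  # two-tailed, alpha = 0.05, df = 18 (standard t table)

t_value = t_from_r(0.49, 20)
is_significant = t_value > T_CRITICAL_DF18
```

Under these assumptions the statistic is about 2.38, which exceeds the critical value and is consistent with the reported p < 0.05.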

Fig. 3. Average result in the test on factual knowledge (a) and comprehension and problem solving (b), depending on learning program version and intensity of use of planning for further learning strategy.


4. Discussion

The results of this study confirm the supportive function of the illustrations used in the learning programs on factual knowledge when compared with text-only information. Concerning tasks on comprehension and problem solving, this is only true for dynamic visuals. The illustrations helped the subjects to store the new information in memory. However, the data indicate a lack of superiority of dynamic visuals on learning outcome when compared with static visuals. They confirm the findings of comparative studies between animations and pictures (e.g. Reed, 1985; Rieber, 1989, 1994). This is even true for different cognitive task demands of the test on learning outcome. The hypothesis that the use of animations has a positive impact on learning outcome because of the complete presentation of the dynamic aspects and the reduction of the level of abstraction (Park & Hopkins, 1993; Rieber & Kini, 1991) is not supported by our data. Arrows and series of frames, which are quite conventional symbols for motion, may be sufficient for the learners to acquire factual knowledge in this case.

From a media research perspective, this result can be interpreted as an example of the equality of effectiveness of dynamic and static visuals in supporting the learning process. The use of static visuals might be completely sufficient in some cases. Further research is needed to examine the conditions for the specific and effective use of dynamic visuals.

From the cognitive psychology research perspective, which was applied in this study, we tried to obtain information about the impact of learning strategy use on the learning process.

The adoption of the concept of learning strategies of Weinstein and Mayer’s (1986) model was an appropriate basis for the description and classification of cognitive processes during learning with illustrations. The results indicate that the proposed categories for describing learning strategies are useful, but for a more detailed reconstruction of these processes the classification system needs further differentiation and adaptation to the specific characteristics of the learning process with visuals.

The think-aloud method has proved to be an appropriate means for the assessment of these processes. However, there are also some methodological problems. For example, it may be the case that the method evokes the salience of one kind of learning strategy (e.g. rehearsal) to a higher degree than others. With regard to future research, additional methods like retrospective interviews on the use of learning strategies should be tried to get closer insight.

Nevertheless, the data show that rehearsal strategies were used often by both groups, but significantly more frequently by the static visuals group. As the use of this strategy has an impact on acquiring factual knowledge, this finding seems to support our central hypothesis that the quality of the cognitive process contributes to the supportive function of visuals.

The very rare use of elaboration strategies in this study might partly be explained by a lack of prior domain knowledge. Since most references were made to the optical aspects of the learning content, the results suggest additional investigation of the expression and the conditions for the use of elaboration strategies when learning with illustrations.

In our study, learners of both groups used control strategies quite rarely. While the more successful learners in the dynamic group made comprehension confirming statements, both groups very rarely referred to comprehension failure. The more intensive planning of further learning steps by the subjects of the static version indicates that they were aware of the actual learning demand and responded with a purposeful organization of their learning activities. The possibly supportive effect of animations may be partly lost due to the lesser use of appropriate learning strategies. Perhaps the amount of effort invested is, besides other aspects (e.g. complexity of the material, purpose of the task), influenced by general expectation about how to use and engage in these new media of learning, as has been shown by Rieber (1994) and others.

Several conclusions might be drawn from these results. Further research is needed to extend our knowledge on cognitive processing of visual information. If the results reported in this experimental study are replicated in future research approaches, students may need additional support to use suitable learning strategies in order to learn more effectively with animated visuals. Additional research is needed to explore the nature of the instructional design that encourages students to use dynamic visual information for reaching a higher level of learning.

References

Chi, M. T. H., Bassok, M., Lewis, M. W., Reimann, P., & Glaser, R. (1989). Self-explanations: how students study and use examples in learning to solve problems. Cognitive Science, 13, 145–182.

Dansereau, D. F. (1985). Learning strategy research. In J. W. Segal, S. F. Chipman, & R. Glaser (Eds.), Relating instruction to research (pp. 209–239). Thinking and learning skills, Vol. 1. Hillsdale, NJ: Erlbaum.

Entwistle, N. (1988). Motivational factors in students’ approaches to learning. In R. R. Schmeck (Ed.), Learning strategies and learning styles (pp. 21–51). New York: Plenum Press.

Ericsson, K. A., & Simon, H. A. (1984). Protocol analysis. Cambridge, MA: MIT Press.

Jager, A. O., & Althoff, K. (1983). Der Wilde-Intelligenz-Test (WIT). Gottingen: Hogrefe.

Joseph, J. H., & Dwyer, F. M. (1984). The effects of prior knowledge, presentation mode, and visual realism on student achievement. Journal of Experimental Education, 52, 110–121.

Lewalter, D. (1997). Lernen mit Bildern und Animationen. Munster: Waxmann.

Lowe, R. (1999). Extracting information from an animation during complex visual learning. European Journal of Psychology of Education, 14, 225–244.

Lowe, R. (2003). Animation and learning: selective processing of information in dynamic graphics. Learning and Instruction, 13, 157–176.

Mayer, R. E. (1994). Visual aids to knowledge construction: building mental representations from pictures and words. In W. Schnotz, & R. W. Kulhavy (Eds.), Comprehension of graphics (pp. 125–138). Amsterdam: Elsevier.

Mayer, R. E. (2003). The promise of multimedia learning: using the same instructional design methods across different media. Learning and Instruction, 13, 125–139.

Mayer, R. E., & Gallini, J. K. (1990). When is an illustration worth ten thousand words? Journal of Educational Psychology, 82, 715–726.

Park, O.-C., & Hopkins, R. (1993). Instructional conditions for using dynamic visual displays: a review. Instructional Science, 21, 427–449.

Pintrich, P. R. (1989). The dynamic interplay of student motivation and cognition in the college classroom. Advances in Motivation and Achievement, 6, 117–160.

Reed, S. K. (1985). Effect of computer graphics on improving estimates to algebra word problems. Journal of Educational Psychology, 77, 285–298.

Rieber, L. P. (1989). The effects of computer animated elaboration strategies and practice on factual and application learning in an elementary science lesson. Journal of Educational Computing Research, 5, 431–444.

Rieber, L. P. (1994). Computers, graphics & learning. Madison, WI: Brown & Benchmark.

Rieber, L. P., & Kini, A. S. (1991). Theoretical foundations of instructional applications of computer-generated animated visuals. Journal of Computer-Based Instruction, 18, 83–88.

Weidenmann, B. (1988). When good pictures fail. An information processing approach to the effect of illustrations. In H. Mandl, & J. R. Levin (Eds.), Knowledge acquisition from text and pictures (pp. 157–171). Amsterdam: Elsevier.

Weinstein, C. E., & Mayer, R. E. (1986). The teaching of learning strategies. In M. C. Wittrock (Ed.), Handbook of research on teaching (pp. 315–327). New York: Macmillan Publishing Company.


BRIEF REPORTS

Spanish-Language Measures of Mania and Depression

Camilo J. Ruggero, Sheri L. Johnson, and Amy K. Cuellar
University of Miami

Efforts to better understand bipolar spectrum disorders across ethnic groups are often hampered by the lack of commonly used self-report instruments to assess mania and depression in individuals who speak languages other than English. This article describes the translation into Spanish of 2 self-report measures of manic symptoms (i.e., the Internal State Scale and the Hypomanic Personality Scale) and 2 self-report measures of depression (i.e., the Inventory to Diagnose Depression and the Inventory to Diagnose Depression, Lifetime version). The authors translated these measures into Spanish and assessed their psychometric properties among bilingual college students (N = 88). Results suggest that the Spanish versions have psychometric properties comparable to the English versions of the instruments.

During the 1990s, the Hispanic population was the fastest growing minority group in the United States and became the largest minority population in the country (U.S. Census Bureau, 2001). This demographic phenomenon makes salient the need to incorporate this group into clinical research on mental health. A major barrier to this endeavor is the lack of valid and reliable Spanish-language assessment instruments for those Hispanics who speak Spanish as their primary language (Ginzberg, 1991; Norvy, Stanley, Averill, & Daza, 2001). Measures of psychopathology that do exist in Spanish tend to be developed for, and normed on, largely non-Hispanic, White samples; moreover, these instruments lack psychometric support for use with bilingual Americans (Norvy et al., 2001).

Although lifetime prevalence rates of bipolar disorder are comparable across ethnic groups (American Psychiatric Association, 2000), there is some preliminary evidence that cultural variables may influence the course of the disorder (Nandi, Banerjee, Mukherjee, Nandi, & Nandi, 2000). Nonetheless, little research exists comparing the course of the disorder or the mechanisms by which it unfolds across ethnic groups. Spanish-language measures of bipolar disorder are needed to facilitate this type of research. Although interview-based instruments exist, we are aware of no Spanish versions of self-report measures to assess current manic symptoms, and we are aware of only a single effort to translate a measure of lifetime vulnerability to mania (Rawlings, Barrantes-Vidal, Claridge, McCreery, & Galanos, 2000).

In this article, we report on the translation into Spanish of two self-report measures of mania, the Internal State Scale (ISS; Bauer et al., 1991) and the Hypomanic Personality Scale (HPS; Eckblad & Chapman, 1986), and on the psychometric properties of these measures. We also report on the translation into Spanish of two widely used self-report measures of depression and lifetime history of depression, the Inventory to Diagnose Depression (IDD; Zimmerman, Coryell, Corenthal, & Wilson, 1986) and the Inventory to Diagnose Depression, Lifetime version (IDD-L; Zimmerman & Coryell, 1987), and on the psychometric properties of these measures. These instruments have been extensively used in research with English-speaking participants.

The goal of the current study was to develop Spanish-language versions of these commonly used measures of mania and depression and to compare the psychometric properties of the English versions of these measures with the Spanish versions among a sample of bilingual individuals. Specifically, internal consistency estimates and mean scale score differences were calculated and compared across language versions; intraclass correlations were calculated to assess the relationship between the measures in English and Spanish. In so comparing the two language versions, we did not intend to make cross-cultural comparisons; rather, we intended to obtain initial information on the psychometric properties of the Spanish versions of each measure. Positive findings in this respect would provide support for further testing of these measures among more diverse clinical samples.

Method

The sample initially consisted of 90 English–Spanish bilingual undergraduates at the University of Miami who received, for participation, partial credit for an introductory psychology research assignment. To be included, participants were required to pass an English and Spanish comprehension test consisting of 12 words at the 7th-grade level (drawn from the Peabody Picture Vocabulary Test—Third Edition, Dunn & Dunn, 1997; Test De Vocabulario En Imagenes Peabody: Adaptacion Hispanoamericana, Dunn, Padilla, Lugo, & Dunn, 1986). Participants scored well above the recommended threshold of 4 correct words on both scales. That is, they correctly identified an average of 9.38 (SD = 1.93) Spanish and 11.38 (SD = 1.11) English words. A review of responses suggested a random pattern of responding for two individuals; they were removed from subsequent analyses.

Of the 88 participants retained for the analyses, 61.4% were men and 38.6% were women. Participants ranged in age from 17 to 47 (M = 19.81, SD = 4.46). Thirty-one percent of the participants were freshmen, 23.9%

Camilo J. Ruggero, Sheri L. Johnson, and Amy K. Cuellar, Department of Psychology, University of Miami.

We extend special thanks to Sandra Trifonovic, Daniela Malazzo, Wendy Vega, Iruma Bello, Amy E. Hutchings, and Jose Menendez, who assisted in the translation of measures and data collection.

Correspondence concerning this article should be addressed to Camilo J. Ruggero, Department of Psychology, University of Miami, Coral Gables, FL 33124-2070. E-mail: [email protected]

Psychological Assessment, 2004, Vol. 16, No. 4, 381–385. Copyright 2004 by the American Psychological Association. 1040-3590/04/$12.00 DOI: 10.1037/1040-3590.16.4.381


page 46

were sophomores, 22.7% were juniors, and the remaining 22.7% were either seniors or graduate-level students. In terms of ethnicity, 81.8% of the participants reported being Hispanic, and 13.7% designated themselves as Caucasian, 1.1% as Asian, 1.1% as African American, and 1.1% as other. With respect to country of origin, 67% of participants were born in the United States or Canada (5.7% of the participants were from Puerto Rico), 21.6% were born in South America, 3.4% were born in Central America, and 3.4% were born in Cuba. The remaining 4.5% of the participants were born in other regions of the world.

Participants met with the experimenter in small groups. Participants were told that they would be completing English and Spanish versions of various questionnaires. All participants completed written informed-consent procedures, with none declining to participate in the study. Each participant completed the language comprehension test noted above and then completed computerized versions of the measures. The language order of the administration was varied, with some participants completing the English versions first and others completing the Spanish versions first. Within each language, however, the order of instrument administration was the same. To reduce the possibility that a participant would remember his or her response to any particular item, participants first completed all the measures in one language before completing the measures in the second language.

Translation of Measures

We took several steps to obtain Spanish versions of the measures. A medical translator was hired to provide an initial translation of all measures into Spanish. To ensure that different Hispanic groups could readily understand items, a team of translators who had lived in Mexico, Cuba, Argentina, Chile, and Spain reviewed the initial translations and replaced culture-specific wording with culture-neutral wording. A second team of translators back-translated the items of the Spanish versions. If wording from the original items and back-translations were discrepant, the team of translators consulted with one another and selected wording that was mutually acceptable.

ISS

The ISS is a 16-item self-report instrument designed to assess the severity of current manic and depressive symptoms. In the original version, participants respond to each item using a 100-mm visual analog scale. This response format was modified in the current research by adopting a Likert-type scale that ranged from 1 (not at all) to 10 (extremely) and that has been validated in previous research (Glick, McBride, & Bauer, 2003; Johnson, Ruggero, & Carver, in press; B. Meyer, Johnson, & Carver, 1999). Bauer et al.’s (1991) principal-components analysis yielded four subscales: Activation (ACT), Well-Being (WB), Perceived Conflict (PC), and the Depression Index (DI).

All subscales had good internal consistency reliability (ACT, α = .84; WB, α = .87; PC, α = .81; DI, α = .92; Bauer et al., 1991). With respect to validation, ACT was significantly correlated (r = .60, p < .0001) with the Young Mania Rating Scale (Young, Biggs, Ziegler, & Meyer, 1978), whereas DI was significantly correlated (r = .84, p < .0001) with the Hamilton Depression Rating Scale (Hamilton, 1960). Both scales succeeded not only in distinguishing diagnostic groups but also in discriminating changes in symptom severity (Bauer et al., 1991). As expected, the ACT and DI scales were not highly correlated (r = .17; Bauer et al., 1991).

HPS

The HPS is a 48-item self-report measure designed to identify individuals at risk for manic episodes. The HPS is one of the few measures that has been shown to predict the development of hypomania and bipolar disorder over time (Kwapil et al., 2000). The items assess positive affect, energy, extraversion, and goal-driven behavior. Each of the 48 items is keyed either “true” or “false.” Sample items include “I often feel excited and happy for no apparent reason,” “I often have moods where I feel so energetic and optimistic that I feel I could outperform almost anyone at anything,” and “There have often been times when I had such an excess of energy that I felt little need to sleep at night.”

The HPS has been shown to differentiate individuals with and without manic symptoms: More than 75% of individuals with high scores were found to meet diagnostic criteria for bipolar disorder (Eckblad & Chapman, 1986). The measure has high reliability (15-week test–retest reliability = .81; coefficient α = .87) and correlates with other measures of risk for bipolar disorder (General Behavior Inventory: r = .47, n = 768; Eckblad & Chapman, 1986). The measure is uncorrelated with indices of social desirability (Crowne-Marlowe Scale for Social Desirability: r = .05, n = 768; Eckblad & Chapman, 1986). High scores have been shown to predict the onset of bipolar disorder and related conditions over a 10-year period (Kwapil et al., 2000). In addition, the HPS has been shown to relate to symptoms of mania more robustly than other scales, such as the NEO-V (T. D. Meyer, 2002).

IDD

The IDD is a 22-item self-report measure designed to assess the symptoms of major depressive disorder. Unlike the Beck Depression Inventory (Beck, Rush, Shaw, & Emery, 1979), the IDD and IDD-L were designed to closely correspond with Diagnostic and Statistical Manual of Mental Disorders (3rd ed.; DSM–III; American Psychiatric Association, 1980) criteria for the diagnosis of a major depressive episode and also closely correspond with the criteria of the Diagnostic and Statistical Manual of Mental Disorders (4th ed., text rev.; DSM–IV–TR; American Psychiatric Association, 2000). Each item consists of five statements assessing the degree to which one has experienced a specific symptom of depression (0 = absence of the symptom, 1 = subclinical severity, 2–4 = clinically significant symptoms).

The IDD has been shown to differentiate between individuals with and without major depression (Zimmerman et al., 1986). The measure has high reliability (split-half reliability = .93; Cronbach’s α = .92) and correlates significantly with other measures of depression (Hamilton Rating Scale: r = .80, p < .001; Beck Depression Inventory: r = .87, p < .001; Zimmerman et al., 1986). Moreover, the IDD is sensitive to changes in depression severity from inpatient admission to discharge (Zimmerman et al., 1986).

IDD-L

The IDD-L is a 22-item self-report measure designed to assess lifetime history of depression. It is identical to the IDD with one exception: Rather than referring to current symptoms, the IDD-L asks respondents to focus on the week in their life when they felt the most profoundly sad or depressed. The IDD-L was originally designed to diagnose a lifetime history of major depressive disorder according to the DSM–III. However, the items cover all of the criteria necessary to make this diagnosis according to the standards of the DSM–IV–TR.

The IDD-L has good reliability (Cronbach’s α = .92; split-half reliability = .90) and has demonstrated significant concordance with other measures of lifetime history of depression (Diagnostic Interview Schedule [DIS]: κ = .66). Using the DIS as the criterion, the sensitivity of the IDD-L was 74% and its specificity was 93%. The chance-corrected level of agreement between the IDD-L and DIS was κ = .60.
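Sensitivity, specificity, and chance-corrected agreement of the kind reported above are all derived from a 2 × 2 cross-classification of test results against the criterion. The following is a minimal sketch in Python; the function name `diagnostic_agreement` and the cell counts are hypothetical illustrations, not data from Zimmerman and Coryell (1987) or from the present study.

```python
def diagnostic_agreement(tp, fp, fn, tn):
    """Summarise a 2 x 2 table of test (e.g. IDD-L) vs. criterion (e.g. DIS).

    tp: test positive, criterion positive
    fp: test positive, criterion negative
    fn: test negative, criterion positive
    tn: test negative, criterion negative
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # share of criterion cases the test detects
    specificity = tn / (tn + fp)   # share of criterion non-cases the test clears
    observed = (tp + tn) / n       # raw proportion of agreement
    # Agreement expected by chance, from the marginal totals
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Hypothetical counts, chosen only to make the arithmetic easy to follow
sens, spec, kappa = diagnostic_agreement(tp=40, fp=10, fn=10, tn=40)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, kappa={kappa:.2f}")
```

Because kappa subtracts the agreement expected from the marginal totals alone, it is typically lower than the raw proportion of agreement.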

Affective State

To examine links between symptoms and current affect, a list of six positive and six negative affect adjectives was also included. For each adjective, individuals were asked to describe how they were feeling “right now” on a scale of 0 (not at all) to 8 (extremely). Positive affect items

382 BRIEF REPORTS

page 47

included amused, elated, enthusiastic, euphoric, excited, happy, and surprised. Negative affect items included annoyed, anxious, distressed, fearful, hostile, and nervous. These items were drawn from the Current Affective State Inventory (Gross, Sutton, & Ketelaar, 1998).

Results

Table 1 presents descriptive statistics for the sample (M, SD, and α) on the English and Spanish versions of the HPS, ISS, IDD, IDD-L, and positive and negative affect measures. Alpha coefficients of internal consistency for the Spanish versions of each scale were .70 or greater (M = .83), and all were comparable to the alpha coefficients for the English versions (M = .82). Among our measures, the HPS is the only one previously translated into Spanish (Rawlings et al., 2000); however, the current Spanish translation of the HPS, which differed by relying on back-translation, had a significantly higher alpha coefficient of internal consistency than the previously translated version (.86 vs. .70).
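For readers who want to see how an alpha coefficient of internal consistency is obtained, the standard formula is α = k/(k − 1) × (1 − sum of item variances / variance of the total score), where k is the number of items. A minimal Python sketch follows; the helper name `cronbach_alpha` and the response matrix are made up for illustration and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items score matrix.

    rows: list of respondents, each a list of item scores of equal length.
    """
    k = len(rows[0])                     # number of items
    items = list(zip(*rows))             # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical responses: 5 respondents x 4 items
demo = [
    [1, 2, 2, 1],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [2, 2, 3, 2],
    [4, 5, 4, 4],
]
print(round(cronbach_alpha(demo), 2))
```

Alpha rises as items covary more strongly relative to their individual variances, which is why it is read as an index of how consistently the items measure a common construct.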

Intraclass correlations using the agreement model were calculated to assess the correspondence between the language versions of the measures and are presented in Table 2. The intraclass correlations between English and Spanish versions of each scale exceeded .67, ranging from .68 (p < .01) for the ISS Activation scale to .97 (p < .01) for the IDD-L scale.¹ The relatively small size of the current sample did not allow for extensive analyses of the performance of the Spanish versions of the instruments at the item level (e.g., confirmatory factor analysis). However, a preliminary comparison of item performance between the two language versions revealed only a few discrepancies.²
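An agreement-model intraclass correlation differs from a Pearson correlation in that it penalises systematic mean differences between the two versions as well as inconsistent rank ordering. The sketch below assumes the two-way, absolute-agreement, single-measure form, (MS_rows − MS_error) / (MS_rows + (k − 1) MS_error + k (MS_columns − MS_error) / n); the article does not state which variant was used, so this choice, the function name `icc_agreement`, and the example scores are assumptions made for illustration only.

```python
def icc_agreement(scores):
    """Two-way, absolute-agreement, single-measure intraclass correlation.

    scores: list of rows, one per participant; each row holds that
    participant's score on every version, e.g. [english, spanish].
    """
    n = len(scores)                      # participants
    k = len(scores[0])                   # versions (columns)
    grand = sum(sum(r) for r in scores) / (n * k)
    row_means = [sum(r) / k for r in scores]
    col_means = [sum(r[j] for r in scores) / n for j in range(k)]

    # Sums of squares from the two-way ANOVA decomposition
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in scores for x in r)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical scores: the second version is always exactly 1 point higher,
# so the Pearson correlation is 1 but absolute agreement is imperfect.
shifted = [[1, 2], [2, 3], [3, 4], [4, 5]]
print(round(icc_agreement(shifted), 3))
```

In the shifted example the two versions order participants identically, yet the coefficient falls below 1 because the constant offset counts as disagreement under this model.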

As would be expected in a nonclinical sample, only 4.5% of participants endorsed symptoms of a current major depressive episode on the IDD. This result was the same for both the English

¹ The ISS Activation scale is the primary measure of current manic symptoms. However, a single-item measure of mania, “Today I feel manic,” was included. This item fared poorly, with a low intraclass correlation between the English and Spanish versions, r_i = .32, p < .01.

² For scales based on items with a continuous response format, we examined the item-scale correlations for each scale in the two language versions to help us identify items that were possibly performing differently across the two languages. With only one exception, all of the items across the scales had similar item-scale correlations for the two language versions, and all were significantly greater than 0. The exception was Item 5 of the IDD, where the item-scale correlation was .40 for the English version but .59 for the Spanish version. The item-scale correlation for Item 13 of the ISS Activation scale was low, but this was true for both language versions (r_English = .34; r_Spanish = .35) and suggested a difficulty with the original instrument for our sample, not the translation. Item-level performance for the HPS was explored by comparing the rates of endorsement for each item in the English and Spanish versions. Thirty-nine of the 48 items had similar (less than a 10-point difference) rates of endorsement between the Spanish and English versions. The rates of endorsement between the language versions of the remaining 9 items (Items 2, 9, 14, 18, 21, 23, 31, 44, and 47) differed by, at most, 18 percentage points.

Table 2
Intraclass Correlations for English- and Spanish-Language Versions of the HPS, ISS, IDD, IDD-L Scales, and the Affect Measures

Scale          r_i     95% CI

HPS            .94*    .92–.96
ISS Act        .68*    .54–.78
ISS WB         .81*    .73–.87
ISS PC         .85*    .78–.90
ISS DI—5ᵃ      .88*    .82–.92
ISS DI—2ᵇ      .77*    .67–.85
IDD            .95*    .93–.97
IDD-L          .97*    .95–.98
PA             .88*    .81–.92
NA             .87*    .80–.91

Note. CI = confidence interval; HPS = Hypomanic Personality Scale; ISS = Internal State Scale; Act = Activation scale; WB = Well-Being scale; PC = Perceived Conflict scale; DI = Depression Index; IDD = Inventory to Diagnose Depression; IDD-L = Inventory to Diagnose Depression, Lifetime version; PA = positive affect words; NA = negative affect words.
ᵃ ISS DI five-item version. ᵇ ISS DI two-item version.
* p < .01.

Table 1
Descriptive Statistics for English and Spanish Versions of the HPS, ISS, IDD, IDD-L Scales, and the Affect Measures

                  English version           Spanish version
Scale           M       SD      α         M       SD      α

HPS           16.20    7.41    .86      16.30    7.49    .86
ISS Act        3.20    1.74    .75       3.00    1.95    .86
ISS WB         6.25    2.25    .81       6.11    2.30    .81
ISS PC         2.56    1.35    .64       2.41    1.34    .70
ISS DI—5ᵃ      3.25    1.90    .82       3.26    1.81    .78
ISS DI—2ᵇ      2.50    2.05    .75       2.32    1.81    .64
IDD            3.73    6.30    .92       3.58    6.38    .93
IDD-L          8.88   13.94    .90       9.44   14.52    .91
PA             3.18    1.55    .92       3.27    1.66    .92
NA             2.26    1.22    .80       2.25    1.30    .84

Note. HPS = Hypomanic Personality Scale; ISS = Internal State Scale; Act = Activation scale; WB = Well-Being scale; PC = Perceived Conflict scale; DI = Depression Index; IDD = Inventory to Diagnose Depression; IDD-L = Inventory to Diagnose Depression, Lifetime version; PA = positive affect words; NA = negative affect words.
ᵃ ISS DI five-item version. ᵇ ISS DI two-item version.


page 48

and Spanish versions of the measure. A lifetime major depressive episode as assessed by IDD-L was reported by 18.2% of the sample on the Spanish IDD-L and 17% on the English IDD-L (one participant met criteria on the Spanish, but not the English IDD-L). Both rates are congruent with what would be expected for lifetime history of depression among a nonclinical sample. Only 3.4% of the sample obtained a score of 36 or higher on the HPS, which was the original cut-off used in concurrent validity studies by Eckblad and Chapman (1986).

Analyses suggested that the convergent validity in Spanish, as indicated by the correlation among related scales, mirrored the convergent validity among the English versions of the scales. As would be expected, English measures of lifetime risk for mania, current hypomanic symptoms, and positive affect were robustly correlated (see Table 3). The pattern of scale correlations for the Spanish versions closely mirrored the pattern of scale correlations for the English versions.

English measures of current depression and negative affect were also robustly correlated (see Table 4). Lifetime history of depression (IDD-L) was correlated with the IDD, but not with current depression on the ISS or the negative affect measure. This finding is not too surprising given the low rates of current depression among individuals with a lifetime history of depression. As before, the pattern of scale correlations for the Spanish versions closely mirrored the pattern of scale correlations for the English versions.

Discussion

The goal of this study was to provide Spanish-language versions of self-report measures of manic symptoms, as well as measures of current and lifetime depression. Results of this study provided preliminary support for the Spanish versions of these measures. Specifically, the measures had acceptable internal consistency estimates; the Spanish versions of the measures were all comparable to their English counterparts, having high intraclass correlations between language versions. Although strong support emerged for the compatibility between the English and Spanish versions of the mania and depression measures, we address some issues and limitations that influence the interpretation of results.

Table 3
Correlations Between English- and Spanish-Language Versions of Mania and Positive Affect Measures

Scale             1      2      3      4      5      6

English version
  1. HPS          —
  2. ISS Act     .35*    —
  3. PA          .34*   .51*    —

Spanish version
  4. HPS         .94*   .36*   .34*    —
  5. ISS Act     .56*   .68*   .62*   .56*    —
  6. PA          .32*   .41*   .88*   .31*   .63*    —

Note. Two-tailed Pearson’s correlation coefficient used. HPS = Hypomanic Personality Scale; ISS Act = Internal State Scale Activation scale; PA = positive affect words.
* p < .01.

Table 4
Correlations Between English- and Spanish-Language Versions of Depression and Negative Affect Measures

Scale               1      2      3      4      5      6      7      8      9     10

English version
  1. ISS DI—5ᵃ      —
  2. ISS DI—2ᵇ     .79*    —
  3. NA            .59*   .62*    —
  4. IDD           .41*   .35*   .39*    —
  5. IDD-L         .12    .10    .14    .51*    —

Spanish version
  6. ISS DI—5ᵃ     .86*   .66*   .64*   .47*   .17     —
  7. ISS DI—2ᵇ     .70*   .78*   .63*   .28*   .16    .72*    —
  8. NA            .58*   .66*   .87*   .26*   .13    .61*   .65*    —
  9. IDD           .46*   .41*   .45*   .95*   .53*   .49*   .35*   .31*    —
 10. IDD-L         .16    .14    .14    .51*   .96*   .20    .19    .13    .53*    —

Note. Two-tailed Pearson’s correlation coefficient used. ISS DI = Internal State Scale Depression Index; NA = negative affect words; IDD = Inventory to Diagnose Depression; IDD-L = Inventory to Diagnose Depression, Lifetime version.
ᵃ ISS DI five-item version. ᵇ ISS DI two-item version.
* p < .01.

page 49

Most importantly, we relied on a relatively small, undergraduate, and bilingual sample. This has several repercussions. First, the small sample size limited our ability to examine the factor structure of the measures. Second, it may be that individuals with more severe forms of psychopathology would describe their symptoms differently, and it will be important to assess how well measures cohere among clinical samples. Third, because of the relatively low levels of symptoms within this sample, one would expect restriction of range. This is most likely for more rare symptoms, including current symptoms (which should be endorsed at a lower rate than lifetime symptoms), and hypomanic symptoms (which occur in a much smaller proportion of the population). This restriction-of-range issue would be expected to artificially lower the magnitude of the correlation between scales. Fourth, the properties of the current measures need to be explored in other populations, in particular, among nonbilingual participants and among participants from Spanish-speaking regions other than those of the current sample. Future research may also benefit from considering the need to adapt the current measures to accommodate for regional differences in language usage. Finally, language proficiency issues were poorly considered in this study. The current study did not include assessment of the language difficulty of measures, nor the language proficiency levels of participants. Moreover, our language-proficiency standard was quite minimal. It will be important for future studies to consider the extent to which language proficiency influences the psychometric properties of the measures. Beyond the limitations discussed so far, interpretation of the results is limited by the fact that participants knew they would complete the measures in two languages, which may have introduced bias into their responses.
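The restriction-of-range concern raised among the limitations can be illustrated with a small simulation. The Python sketch below draws two scores that correlate at a chosen population value and then recomputes the correlation using only the lower half of the first score, loosely mimicking a sample with few high scorers. All names and parameter values (`simulate`, `rho`, the sample size, the seed, the cut at zero) are hypothetical and are not drawn from the study.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(rho=0.6, n=5000, seed=0):
    """Return (full-range r, restricted-range r) for simulated scores."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        # y shares a proportion rho of its standardised score with x
        y = rho * x + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    full = pearson(xs, ys)
    # Keep only cases scoring below the population mean on x
    low = [(x, y) for x, y in zip(xs, ys) if x < 0]
    restricted = pearson([x for x, _ in low], [y for _, y in low])
    return full, restricted


full_r, restricted_r = simulate()
print(f"full range r = {full_r:.2f}, restricted range r = {restricted_r:.2f}")
```

The restricted correlation comes out smaller than the full-range one even though the underlying relationship is unchanged, which is the attenuation the limitation describes.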

The availability of Spanish versions of commonly used measures of mania and depression is a necessary prerequisite to understanding cross-cultural differences in the course of bipolar disorder. Results in the current study provide preliminary support for the use of these measures in Spanish-speaking participants; however, before further endorsement can be given, investigation of the psychometric properties of these measures in clinical as well as nonbilingual samples will be necessary.³

³ The Spanish versions of the ISS and HPS described here can be downloaded directly from the Web at http://www.psy.miami.edu/faculty/sjohnson

References

American Psychiatric Association. (1980). Diagnostic and statistical manual of mental disorders (3rd ed.). Washington, DC: Author.

American Psychiatric Association. (2000). Diagnostic and statistical manual of mental disorders (4th ed., text rev.). Washington, DC: Author.

Bauer, M. S., Crits-Christoph, P., Ball, W. A., Dewees, E., McAllister, T., Alahi, P., et al. (1991). Independent assessment of manic and depressive symptoms by self-rating: Scale characteristics and implications for the study of mania. Archives of General Psychiatry, 48, 807–812.

Beck, A. T., Rush, A. J., Shaw, B. F., & Emery, G. (1979). Cognitive therapy for depression. New York: Guilford Press.

Dunn, L., & Dunn, L. (1997). Peabody Picture Vocabulary Test—III. Circle Pines, MN: American Guidance Service.

Dunn, L., Padilla, E., Lugo, D., & Dunn, L. (1986). Test De Vocabulario En Imagenes Peabody: Adaptacion Hispanoamericana [Peabody Picture Vocabulary Test: Hispanic-American Adaptation]. Circle Pines, MN: American Guidance Service.

Eckblad, M., & Chapman, L. J. (1986). Development and validation of a scale for hypomanic personality. Journal of Abnormal Psychology, 95, 214–222.

Ginzberg, E. (1991). Access to health care for Hispanics. Journal of the American Medical Association, 265, 238–241.

Glick, H. A., McBride, L., & Bauer, M. S. (2003). A manic-depressive symptom self-report in optical scannable format. Bipolar Disorders, 5, 366–369.

Gross, J. J., Sutton, S. K., & Ketelaar, T. V. (1998). Relations between affect and personality: Support for the affect-level and affective-reactivity views. Personality and Social Psychology Bulletin, 24, 279–288.

Hamilton, M. (1960). A rating scale for depression. Journal of Neurology, Neurosurgery and Psychiatry, 23, 56–61.

Johnson, S. L., Ruggero, C. J., & Carver, C. S. (in press). Cognitive, behavioral, and affective responses to reward: Links with hypomania. Journal of Clinical and Social Psychology.

Kwapil, T. R., Miller, M. B., Zinser, M. C., Chapman, L. J., Chapman, J., & Eckblad, M. (2000). A longitudinal study of high scorers on the Hypomanic Personality Scale. Journal of Abnormal Psychology, 109, 222–226.

Meyer, B., Johnson, S. L., & Carver, C. S. (1999). Exploring behavioral activation and inhibition sensitivities among college students at risk for bipolar spectrum symptomatology. Journal of Psychopathology and Behavioral Assessment, 21, 275–292.

Meyer, T. D. (2002). The Hypomanic Personality Scale, the Big Five, and their relationship to depression and mania. Personality and Individual Differences, 32, 649–660.

Nandi, D. N., Banerjee, G., Mukherjee, S. P., Nandi, P. S., & Nandi, S. (2000). Psychiatric morbidity of a rural Indian community: Changes over a 20-year interval. British Journal of Psychiatry, 176, 351–356.

Norvy, D. M., Stanley, M. A., Averill, P., & Daza, P. (2001). Psychometric comparability of English- and Spanish-language measures of anxiety and related affective symptoms. Psychological Assessment, 13, 347–355.

Rawlings, D., Barrantes-Vidal, N., Claridge, G., McCreery, C., & Galanos, G. (2000). A factor analytic study of the Hypomanic Personality Scale in British, Spanish and Australian samples. Personality and Individual Differences, 28, 73–84.

U.S. Census Bureau. (2001). Census 2000 PHC-T-9. Population by age, sex, race, and Hispanic or Latino origin for the United States: 2000. Retrieved February 21, 2003, from http://www.census.gov/population/www/cen2000/phc-t9.html

Young, R. C., Biggs, J. T., Ziegler, V. E., & Meyer, D. A. (1978). A rating scale for mania: Reliability, validity and sensitivity. British Journal of Psychiatry, 133, 429–435.

Zimmerman, M., & Coryell, W. (1987). The inventory to diagnose depression, lifetime version. Acta Psychiatrica Scandinavica, 75, 495–499.

Zimmerman, M., Coryell, W., Corenthal, C., & Wilson, S. (1986). A self-report scale to diagnose major depressive disorder. Archives of General Psychiatry, 43, 1076–1081.

Received March 25, 2003
Revision received April 28, 2004
Accepted April 29, 2004


page 50

Development and Preliminary Validation of the Physiological Hyperarousal Scale for Children

Jeff Laurent
De Pere, Wisconsin

Salvatore J. Catanzaro
Illinois State University

Thomas E. Joiner Jr.
Florida State University

Considerable empirical support exists for the positive affect and negative affect components of the tripartite model of anxiety and depression proposed by L. A. Clark and D. Watson (1991); however, less attention has been paid to the physiological hyperarousal component of the model. The development of the Physiological Hyperarousal Scale for Children (PH–C; J. Laurent, S. J. Catanzaro, & T. E. Joiner Jr., 1995) is described. The psychometric properties of items are examined using students in Grades 6–12 (N = 398). Initial scale validation includes a joint factor analysis with the Positive and Negative Affect Scale for Children (PANAS–C; J. Laurent et al., 1999; J. Laurent, K. Potter, & S. J. Catanzaro, 1994). The relationship between the PH–C and existing measures that tap related constructs is examined. Together, the PH–C and PANAS–C provide a means to assess tripartite model constructs useful in differentiating anxiety and depression.

The tripartite model of anxiety and depression developed by Clark and Watson (1991) has been helpful in understanding these disorders. After reviewing the psychometric data concerning the relationship between adult anxiety and depression, Clark and Watson concluded that anxiety and depression share a nonspecific component that reflects general affective distress or negative affect (NA). However, they also identified characteristics that distinguished the disorders. Symptoms involving physiological hyperarousal (PH; e.g., racing heart, sweaty palms, dry mouth) were unique to anxiety, whereas the lack of positive affect (PA; i.e., pleasant engagement with the environment) identified depression.

Originally part of the two-factor structure of affect proposed by Zevon and Tellegen (1982), the PA and NA aspects of the tripartite model have garnered nearly 2 decades of support (e.g., Tellegen, Watson, & Clark, 1999; Watson & Clark, 1984; Watson & Tellegen, 1985). Support for the PA and NA constructs was fostered by the development of an instrument that allowed their measurement, the Positive and Negative Affect Schedule (PANAS; Watson, Clark, & Tellegen, 1988). The PANAS, and a youth version, the Positive and Negative Affect Scale for Children (PANAS–C; Laurent et al., 1999; Laurent, Potter, & Catanzaro, 1994), have been used with a variety of adult and youth populations (Crocker, 1997; Crook, Beaver, & Bell, 1998; Huebner & Dew, 1995; Jolly, Dyck, Kramer, & Wherry, 1994; Kercher, 1992; Lonigan, Hooe, David, & Kistner, 1999; Lonigan, Phillips, & Hooe, 2003; Watson, 1988; Watson, Clark, & Carey, 1988).

The introduction of the tripartite model created a need for a measure of the PH construct. This was remedied quickly in the adult literature with the introduction of the Anxious Arousal subscale of the Mood and Anxiety Symptom Questionnaire (MASQ; Watson & Clark, 1991). The MASQ has demonstrated its validity in adult samples (Keogh & Reidy, 2000; Reidy & Keogh, 1997; Watson, Clark, et al., 1995; Watson, Weber, et al., 1995) and preliminary studies with adolescents (Ettelson & Laurent, 2002; Laurent & Ettelson, 2001a), lending support for the PH construct. Recently, Joiner et al. (1999) identified items from the Beck Anxiety Inventory (BAI; Beck & Steer, 1993) that were consistent with the PH construct.

The development of child measures of PH has been slower. As a result, items from popular child self-report measures of anxiety and depression were used in initial attempts to validate the tripartite model (Chorpita, Albano, & Barlow, 1998; Chorpita, Plummer, & Moffitt, 2000; Joiner, Catanzaro, & Laurent, 1996). Rationally derived PH scales were developed using items from the Revised Children’s Manifest Anxiety Scale (RCMAS; Reynolds & Richmond, 1985). It is interesting to note that the Physiological Anxiety subscale was not used in its entirety, because it contained items about trouble making up one’s mind and bad dreams, which were viewed as unrelated to the somatic arousal associated with PH. Although the particular RCMAS items selected varied (Joiner et al., 1996, used Items 5 [trouble breathing], 17 [sick to stomach], and 19 [sweaty hands]; Chorpita, Plummer, & Moffitt, 2000, used Items 5, 9 [mad easily], 13 [sleep problems], 17, 21 [tired], 33

Jeff Laurent, De Pere, Wisconsin; Salvatore J. Catanzaro, Department of Psychology, Illinois State University; Thomas E. Joiner Jr., Department of Psychology, Florida State University.

Portions of this article were presented at the 70th annual meeting of the Midwestern Psychological Association, Chicago, Illinois, April/May 1998, and the 32nd annual convention of the National Association of School Psychologists, New Orleans, Louisiana, March/April 2000.

Correspondence concerning this article should be addressed to Jeff Laurent, 1508 Fox Ridge Court, De Pere, WI 54115. E-mail: [email protected]

Psychological Assessment, 2004, Vol. 16, No. 4, 373–380. Copyright 2004 by the American Psychological Association. 1040-3590/04/$12.00 DOI: 10.1037/1040-3590.16.4.373


page 51

[wiggle in seat]), these rationally derived scales provided adequate measurement of the PH construct for the purposes of the studies in which they were used.

The current zeitgeist in the youth literature concerning the model is moving away from rationally derived scales to empirically based measurement of the tripartite constructs. Shortly after developing the PANAS–C, Laurent, Catanzaro, and Joiner (1995) began work on a measure of PH, the Physiological Hyperarousal Scale for Children (PH–C). Initial data using the PH–C in combination with the PANAS–C were encouraging (Laurent, Catanzaro, & Joiner, 1998, 2000). The results from principal-axis factoring (PAF) and relationships among PA, NA, and PH scales were consistent with the tripartite model. Recently, Chorpita, Daleiden, and colleagues presented another tripartite measure, the Affect and Arousal Scale for Children (AFARS; Chorpita, Daleiden, Moffitt, Yim, & Umemoto, 2000). Selected from an initial item pool of 77, the 27-item AFARS has NA (8 items), PA (10 items), and PH (9 items) subscales that mirror tripartite model constructs. The authors of the AFARS provided data attesting to the validity of the measure (Chorpita & Daleiden, 2002; Chorpita, Daleiden, et al., 2000; Daleiden, Chorpita, & Lu, 2000). Although the PA and NA components of the AFARS differ from the PANAS–C scales in that items from the former are more symptom-oriented (e.g., “I have plenty of friends” [PA], “I can’t calm down once I am upset” [NA]) and items from the latter are affect-based (i.e., youth indicate the extent to which they have felt “active” [PA], “nervous” [NA]), the PH scale of the AFARS and the PH–C are similar in that they both tap physiological aspects of anxiety; differences lie in the type and breadth of symptoms measured in the 18-item PH–C versus 9 PH items of the AFARS.

The purpose of the current article is twofold. First, we further describe the development of the PH–C, a measure of physiological hyperarousal for youth. The PH–C makes it unnecessary to rely on questionable subscales (e.g., RCMAS Physiological Anxiety) or the patchwork of measures that have been rationally derived from existing self-report anxiety instruments to assess PH. In addition, the PH–C complements the PANAS–C, providing measurement of all three components of the tripartite model of anxiety and depression proposed by Clark and Watson (1991). Second, the psychometric properties of the 18-item PH–C were examined using a sample of students in Grades 6–12, and preliminary information regarding scale validation is provided.

Method

Participants

The sample consisted of students in Grades 6–12 (N = 398) in central Illinois who were participating in studies related to various aspects of adolescent development. The mean age of the students in the sample was 14.60 years (SD = 1.73 years). The sample consisted of 66 sixth graders (16.6%), 73 seventh graders (18.3%), 76 eighth graders (19.1%), 77 ninth graders (19.3%), 43 tenth graders (10.8%), 34 eleventh graders (8.5%), and 29 twelfth graders (7.3%). The sample was 49.2% male and 50.8% female. Most children were Caucasian (81.9%), with African Americans, Asian Americans, Latino Americans, and those of Middle Eastern descent, interracial, or other ethnic backgrounds representing 3.8%, 2.3%, 0.8%, 0.5%, 1.2%, and 3.5% of the sample, respectively; data on ethnicity were missing for 6.0% of the sample. The majority of students in the sample lived with both biological parents (66.6%), 12.8% lived with their biological mother only, 1.5% lived with their biological father only, 8.8% lived with their biological mother and stepfather, 5.3% lived with their biological father and stepmother, and 4.5% reported other living arrangements (e.g., grandparent[s], adoptive parent[s]); data on living arrangements were unavailable for 0.5% of the sample. Specific questions soliciting information concerning socioeconomic and special education status of students participating in the study were not asked.

Instruments

PH–C. The PH–C consists of 18 items that assess physiological hyperarousal, defined as bodily manifestations of autonomic arousal (see the Appendix). Criteria for anxiety disorders from the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM–IV; American Psychiatric Association, 1994) and existing self-report measures of anxiety, such as the State–Trait Anxiety Inventory for Children (STAIC; Spielberger, 1973), the RCMAS, and the BAI were consulted in order to ensure that terms that best fit the definition were included and that terms that did not refer specifically to physiological aspects of anxiety were excluded. With regard to the anxiety disorders listed in the DSM–IV, the particular disorder that has the highest degree of physical arousal associated with it is panic disorder. All physical symptoms listed under panic disorder are found on the PH–C. Symptoms of restlessness and muscle tension listed under generalized anxiety disorder also are represented on the PH–C. Other symptoms were considered, for example, exaggerated startle response from posttraumatic stress disorder and feeling that one has to urinate frequently. However, these were viewed as less clearly associated with anxious arousal among children.

On the PH–C, children are instructed to rate on a 5-point scale (1 = very slightly or not at all to 5 = extremely) how often they have experienced symptoms such as “sweaty hands/palms,” “feeling of choking,” “numbness (like your foot’s asleep),” “heart pounding,” “can’t catch your breath,” and so forth, during the past 2 weeks. Initial findings using the PH–C and PANAS–C were consistent with the tripartite model (Laurent et al., 1998, 2000). The psychometric properties of the measure, the focus of this article, are described below.

PANAS–C. The PANAS–C consists of a 12-item PA scale and a 15-item NA scale. The PANAS–C contains a mix of items from the original PANAS, the PANAS–X Basic Negative Emotions, Basic Positive Emotions, and Other Affective States scales (Watson & Clark, 1994), and items that represented synonyms for some PANAS–X items that Laurent et al. (1994) felt were more easily understood by children. The PANAS–C instructs children to indicate how often they have felt interested, sad, and so forth, during the past few weeks on a 5-point Likert-type scale (1 = very slightly or not at all to 5 = extremely). Laurent et al. (1999) presented information on the properties of the PANAS–C. The alpha coefficients for the PA scale (.90, .89) and the NA scale (.94, .92) were in the generally accepted range. The PANAS–C scales also demonstrated good convergent and discriminant validity in a school sample (Grades 4–8) and in a small inpatient sample.

RCMAS. The RCMAS is designed to assess anxiety in children and adolescents. Items consist of generally descriptive statements of anxiety (e.g., “I worry about. . .”) or social desirability (e.g., “I am always. . .”). Children are asked to determine if the item describes themselves. Affirmative responses are counted and converted to a T score on the Total Anxiety scale (M = 50, SD = 10). The RCMAS also yields four subscale scores (M = 10, SD = 3): Physiological Anxiety, Worry/Oversensitivity, Social Concerns/Concentration, and Lie. The current study reported findings for the 10-item Physiological Anxiety subscale. According to the test authors, this factor-based subscale provides an index of a child’s expression of the physical manifestations of anxiety. Reynolds and Richmond (1985) provided extensive reliability and validity data concerning the RCMAS. They reported reliability estimates for the Physiological Anxiety subscale that were consistently in the .60s and .70s, except above age 15, where estimates fell within the .50s.

Children’s Psychosomatic Symptom Checklist (CPSC). The CPSC (Wisniewski, Naglieri, & Mulick, 1988) is a 12-item scale of psychosomatic distress experienced by children. The measure is based on the State University at New York revision of the Psychosomatic Symptom Checklist (PSC; Attanasio, Andrasik, Blanchard, & Arena, 1984). On the CPSC, children are asked to rate physical problems (e.g., stomach pains, heart beating very fast, etc.) on two dimensions: how often they experienced the problem (i.e., frequency; 0 = not a problem to 4 = every day) and how bad the problem was (i.e., intensity; 0 = not a problem to 4 = very, very bad). Wisniewski et al. (1988) reported a coefficient alpha of .83 for their 12-item measure. They also provided evidence of construct and discriminant validity for the measure.

Children’s Depression Inventory (CDI). The CDI (Kovacs, 1992) is a 27-item self-report measure of depression that was designed for school-age children and adolescents. Each item assesses a specific symptom of depression (e.g., sadness) or its school-related consequences (e.g., perceived performance in school). On each item, the child is instructed to choose from three statements that range from no symptom (0) through mild (1) to severe symptoms (2); total raw scores range from 0 to 54. The psychometric properties of the CDI have been examined extensively. Kovacs (1992) found support for five subscales: Negative Mood, Interpersonal Problems, Ineffectiveness, Anhedonia, and Negative Self-Esteem. The six-item Negative Mood subscale was used in the present study. Kovacs (1992) reported a reliability estimate of .62 for the Negative Mood subscale.

Procedure

Letters were sent to parents and guardians of students in Grades 6–12 inviting them to allow their child to participate in studies about adolescent development. Students for whom parental consent was obtained completed the PH–C along with other measures as part of larger studies on social–emotional functioning. Data were collected in groups of 12–40 students. Each student was given a packet containing an assent form, a sheet requesting demographic information, and various self-report measures, depending on the particular study. Measures within packets were assembled in random order. The introduction to each student in the various studies contained the statement, “This is a study of students and their emotions or feelings.” Often, this was the first statement the student heard or read. No study asked questions about a student’s health, reducing the likelihood that the PH–C and physiological anxiety-related measures would be interpreted as physical illness measures. It is possible that a student may have changed the context of an item and thought about an illness, but we do not believe that is likely, given the nature of the studies. Very few students asked questions of clarification on any measures used in the studies. Students typically completed measures in about 25–30 min; no packet took more than 45 min to complete. The researchers and graduate or undergraduate students in psychology were available to answer questions and debriefed participants about the nature of the studies after the measures were completed.

Results

The analysis of the PH–C occurred in several steps. First, the distribution of item responses was examined. All items had all response possibilities endorsed. As would be expected in a general school sample, the frequency distribution was positively skewed for each item. Examining the distribution of responses to Options 3, 4, and 5 for each item revealed that the lowest percentage of endorsement of Options 3–5 occurred for the item “feeling of choking” (3.3%), and the highest percentage occurred for the item “I can’t sit still” (40.9%).

Next, corrected item-total correlations were generated for the scale. Using the guidelines provided by Nunnally and Bernstein (1994), an item was considered weak if it had a corrected item-total correlation less than .30. All 18 items had item-total correlations that exceeded the .30 criterion. Item-total correlations ranged from .35 (“sweaty hands/palms”) to .59 (“heart pounding” and “shaky”). The alpha coefficient for the scale was .87.
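The item analysis in this step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' software: the function names and the small response matrix are hypothetical, and only the .30 cutoff comes from the text. Each item is correlated with the sum of the remaining items (the "corrected" total), and coefficient alpha is computed from the item variances and the total-score variance.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(responses):
    """Correlate each item with the total of all *other* items.

    responses: list of rows, one per respondent, each a list of item scores.
    """
    n_items = len(responses[0])
    correlations = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]  # total without item j
        correlations.append(pearson_r(item, rest))
    return correlations


def cronbach_alpha(responses):
    """Coefficient alpha from item variances and total-score variance."""
    k = len(responses[0])
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def weak_items(responses, cutoff=0.30):
    """Indices of items whose corrected item-total correlation is below cutoff."""
    return [j for j, r in enumerate(corrected_item_total(responses)) if r < cutoff]


# Hypothetical 5-point ratings: five respondents, four items.
demo = [
    [1, 2, 1, 1],
    [2, 2, 3, 1],
    [3, 4, 3, 2],
    [4, 4, 5, 2],
    [5, 5, 4, 3],
]
print([round(r, 2) for r in corrected_item_total(demo)])
print(round(cronbach_alpha(demo), 2), weak_items(demo))
```

Removing the item from its own total matters most for short scales, where an item would otherwise correlate with itself inside the total score.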

As a first step in validating the 18-item PH–C, a joint factor analysis was conducted with the 27 PANAS–C items (12 PA, 15 NA) to examine the construct validity of the PH–C. A common factor analysis (i.e., PAF) was conducted. On the basis of the tripartite model, a three-factor solution with an oblique rotation was requested. Unlike an orthogonal rotation that forces factors to be unique (i.e., uncorrelated), an oblique rotation allows the relationship among factors to emerge. The structure matrix was examined to determine the relationship of each item to the factor. Absolute values greater than or equal to .40 were considered significant in interpretation (Gorsuch, 1997). Results from the PAF are presented in Table 1. Loadings on the three factors were as predicted with clear representations of PA, NA, and PH. Five items had significant cross-loadings (i.e., ≥ .40) on the NA and PH factors.
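The cross-loading rule stated above (absolute structure-matrix values of .40 or more on more than one factor) is easy to apply mechanically. The sketch below is illustrative only: the helper names are hypothetical, the loadings are a handful of rows copied from Table 1, and reading the printed factor order as 1 = PH, 2 = PA, 3 = NA is an inference from the loading pattern, since the table labels the factors only by number.

```python
# A few rows of structure-matrix loadings copied from Table 1.
# Tuple order follows the printed columns (Factor 1, Factor 2, Factor 3);
# treating them as PH, PA, NA is an assumption based on the loading pattern.
LOADINGS = {
    "Happy": (-0.08, 0.73, -0.18),
    "Frightened": (0.40, 0.04, 0.61),
    "Nervous": (0.41, 0.17, 0.57),
    "Scared": (0.38, 0.06, 0.62),
    "Jittery": (0.33, 0.28, 0.29),
    "Afraid": (0.41, 0.04, 0.59),
    "Dry mouth": (0.47, -0.00, 0.36),
    "Shaky": (0.64, -0.07, 0.47),
    "Heart pounding": (0.63, 0.01, 0.40),
}

THRESHOLD = 0.40  # salience criterion quoted in the text


def salient_factors(loadings, threshold=THRESHOLD):
    """Return 1-based factor numbers whose absolute loading meets the threshold."""
    return [i + 1 for i, value in enumerate(loadings) if abs(value) >= threshold]


def cross_loading_items(table, threshold=THRESHOLD):
    """Items that are salient on more than one factor."""
    return [
        item
        for item, loadings in table.items()
        if len(salient_factors(loadings, threshold)) > 1
    ]


print(cross_loading_items(LOADINGS))
```

On these rows the rule flags three NA items and two PH items, which agrees with the count of five cross-loading items reported in the text.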

Next, the correlations between the PH–C and the PANAS–C scales were examined using a subset of 118 children who had completed these measures along with self-report measures of anxiety and depression. To be consistent with the tripartite model, the correlations between PA–NA and PA–PH should be near zero or slightly negative. Theoretically, the NA–PH correlation should be near zero or slightly positive (Clark & Watson, 1991). However, previous research suggests that NA–PH correlations among child samples range from the .30s–.50s (see Laurent & Ettelson, 2001b, for a review). Correlations among the scales are presented in Table 2. The correlations follow the predicted pattern. PA–NA and PA–PH correlations were near zero. The NA–PH correlation (r = .64) was higher than in other studies (Laurent & Ettelson, 2001b).

The relations between the PH–C and self-report measures of anxiety and depression among the 118 students also were examined (see Table 2). Because child self-report measures of anxiety typically assess physical aspects of anxiety to some degree, including symptoms of somatic arousal, correlations between these scales and the PH–C should be moderate. Higher correlations would be expected with measures that specifically assess the physical aspects of anxiety, such as the RCMAS Physiological Anxiety subscale: PH–C correlations among anxiety measures should be greater than NA correlations among anxiety measures. The correlation between the CDI Negative Mood subscale and the PANAS–C NA scale should be greater than the correlation between this CDI subscale and the PH–C.

Correlations reflected expected relations: PH correlations with anxiety measures (range = .56–.64) were higher than the NA correlations (range = .38–.58). The difference in the size of the correlations between PH and NA was statistically significant for the RCMAS Physiological Anxiety subscale, t(115) = 2.44, p < .01, Cohen’s d = .45, and the CPSC intensity scale, t(115) = 3.27, p < .001, d = .61. No difference was found in the size of the correlations between PH and NA with regard to the CPSC frequency scale, t(115) = 1.02, ns, d = .19. The NA–CDI Negative Mood subscale correlation was higher than the PH–CDI Negative Mood subscale correlation but not statistically greater, t(115) = −1.53, p < .10, d = .28.
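The text does not name the test used to compare these correlations. Because both correlations share one variable (the criterion scale) and the reported degrees of freedom equal N − 3, one plausible candidate is Williams's t for dependent correlations; the sketch below implements that test as an assumption, not as the authors' documented procedure. The function and argument names are hypothetical.

```python
import math


def williams_t(r_xy, r_zy, r_xz, n):
    """Williams's t for two dependent correlations that share variable y.

    r_xy: correlation of predictor x with the shared criterion y
    r_zy: correlation of predictor z with the shared criterion y
    r_xz: correlation between the two predictors
    n:    sample size
    Returns (t, df) with df = n - 3.
    """
    # Determinant of the 3 x 3 correlation matrix of x, z, and y.
    det = 1 - r_xy**2 - r_zy**2 - r_xz**2 + 2 * r_xy * r_zy * r_xz
    r_bar = (r_xy + r_zy) / 2
    numerator = (n - 1) * (1 + r_xz)
    denominator = 2 * ((n - 1) / (n - 3)) * det + r_bar**2 * (1 - r_xz) ** 3
    t = (r_xy - r_zy) * math.sqrt(numerator / denominator)
    return t, n - 3


# Rounded values from Table 2: PH-C and NA each correlated with the
# RCMAS Physiological Anxiety subscale, plus the PH-C/NA correlation.
t, df = williams_t(r_xy=0.56, r_zy=0.40, r_xz=0.64, n=118)
print(round(t, 2), df)
```

With the two-decimal correlations from Table 2 the statistic comes out at about 2.43 on 115 degrees of freedom, close to the reported 2.44; the small gap is what rounding of the input correlations would produce, but the match does not prove which test the authors ran.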

To further examine the scale, we conducted regression analyses using the PA and NA scales of the PANAS–C and the PH–C to predict scores on the RCMAS Physiological Anxiety subscale and the CDI Negative Mood subscale. To be consistent with the tripartite model, the PH–C should add significantly to the prediction of anxiety after accounting for PA and NA. In contrast, there should be no unique contribution of PH–C to the prediction of depression after accounting for PA and NA.

Results followed this pattern. When entered in Block 1 of the regression analysis to predict scores on the RCMAS Physiological Anxiety subscale, the partial correlations for PA (−.30, p < .001) and NA (.44, p < .001) were significant. When PH was entered in Block 2, the partial correlation with anxiety continued to be significant (.44, p < .001). In the regression to predict scores on the CDI Negative Mood subscale, the partial correlations for PA (−.28, p < .01) and NA (.68, p < .001) also were significant when entered in Block 1. However, when PH was entered in Block 2, the partial correlation with depression was not significant (.18, ns).
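The Block 2 statistic is a partial correlation: the association between PH and the outcome after PA and NA have been controlled. A minimal sketch of that quantity, computed from zero-order correlations with the standard recursive formula, is given below. The helper names and the correlation values are hypothetical, chosen only for illustration; the sketch is not expected to reproduce the reported coefficients, which depend on the authors' raw data.

```python
import math


def corr(table, a, b):
    """Look up a zero-order correlation regardless of argument order."""
    return table[frozenset((a, b))]


def partial_one(table, a, b, c):
    """First-order partial correlation of a and b, controlling for c."""
    r_ab, r_ac, r_bc = corr(table, a, b), corr(table, a, c), corr(table, b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))


def partial_two(table, a, b, c, d):
    """Second-order partial correlation of a and b, controlling for c and d."""
    r_ab_c = partial_one(table, a, b, c)
    r_ad_c = partial_one(table, a, d, c)
    r_bd_c = partial_one(table, b, d, c)
    return (r_ab_c - r_ad_c * r_bd_c) / math.sqrt(
        (1 - r_ad_c**2) * (1 - r_bd_c**2)
    )


# Hypothetical zero-order correlations among an outcome and three predictors.
R = {
    frozenset(("outcome", "PH")): 0.55,
    frozenset(("outcome", "NA")): 0.45,
    frozenset(("outcome", "PA")): -0.25,
    frozenset(("PH", "NA")): 0.60,
    frozenset(("PH", "PA")): -0.05,
    frozenset(("NA", "PA")): 0.00,
}

# Association of PH with the outcome once PA and NA are held constant.
print(round(partial_two(R, "outcome", "PH", "PA", "NA"), 3))
```

The order in which the two control variables are removed does not change the result, which is a convenient check on any implementation of the recursion.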

Discussion

The tripartite model of anxiety and depression has proven useful in understanding the relationship between these two disorders. Until recently, the only method to measure tripartite constructs among youth was to rationally select and combine items from existing self-report measures. Lately, researchers have developed child measures to assess tripartite constructs (e.g., AFARS; Chorpita, Daleiden, et al., 2000; Daleiden et al., 2000) or have used or modified existing adult measures (e.g., MASQ; Ettelson & Laurent, 2002; Rudolph, Lambert, Osborne, Gathright, & Kumar, 2002). The current study described the further development and initial validation of the PH–C, a measure of somatic arousal commonly associated with anxiety. When combined with the PANAS–C (Laurent et al., 1999), the PH–PANAS–C allows all aspects of the tripartite model to be measured with youth.

Item analysis and preliminary validation of the PH–C were promising. Correlations among the PA and NA scales of the PANAS–C and the PH–C were largely consistent with those predicted within the tripartite model. PA–NA and PA–PH correlations were near zero. The NA–PH correlation was larger than predicted by the model, but not out of line with previous findings (cf. Chorpita, Plummer, & Moffitt, 2000). Similar correlations between NA and PH have been reported in the adult literature (Brown, Chorpita, & Barlow, 1998; Cox, Enns, Walker, Kjernisted, & Pidlubny, 2001; Joiner, 1996; Joiner & Blalock, 1995).

The joint factor analysis provided insight into the nature of the NA–PH relationship. Three NA items had significant cross-loadings (i.e., ≥ .40) on the PH factor and two PH items cross-loaded on the NA scale. The NA items frightened, nervous, and afraid represent emotions that often have arousal as a component. The PH item shaky can be viewed as similar to the NA items just listed, and the PH item heart pounding is an outcome of arousal. The items nervous and afraid are on the adult PANAS. These items, along with frightened and scared, which approach a significant cross-loading on the PH scale, represent four of the six items on the Fear subscale of the Basic Negative Emotion Scales of the expanded PANAS (PANAS–X; Watson & Clark, 1994). The item shaky could also be related to fear. The shared arousal feature associated with fear may influence the higher-than-expected correlation between the NA and PH scales. Future research with the PH–C should more closely examine and delineate this fear–arousal relationship. A way to influence the relationship between NA and PH may be to reduce the number of items related to fear on measures of NA.

Alternatively, future studies could consider the impact of eliminating items that have significant cross-loadings on scales. Because the items with cross-loadings had their highest loading on predicted scales, suggesting the robustness of the constructs being measured, it may be premature to eliminate these items from scales. However, if future research replicates the pattern of cross-loadings in different samples, then some consideration should be given to item elimination or modification to further improve the scales.

Table 1
Principal-Axis Factoring (Oblique Rotation) Requesting a Three Factor Solution for the PANAS–C and PH–C Scales

                                  Factor
Item                           1       2       3
Interested                  −.04     .51    −.08
Excited                      .01     .67    −.08
Happy                       −.08     .73    −.18
Strong                       .02     .51    −.09
Energetic                    .02     .68    −.10
Calm                        −.14     .19    −.06
Cheerful                    −.03     .73    −.12
Active                      −.04     .69    −.14
Proud                       −.05     .64    −.12
Joyful                      −.06     .81    −.14
Delighted                    .02     .71    −.04
Lively                      −.06     .76    −.07
Sad                          .36    −.21     .66
Frightened                   .40     .04     .61
Ashamed                      .33    −.01     .64
Upset                        .27    −.21     .69
Nervous                      .41     .17     .57
Guilty                       .33     .04     .60
Scared                       .38     .06     .62
Miserable                    .35    −.27     .61
Jittery                      .33     .28     .29
Afraid                       .41     .04     .59
Lonely                       .35    −.18     .71
Mad                          .29    −.18     .60
Disgusted                    .28    −.04     .57
Blue                         .34    −.19     .66
Gloomy                       .32    −.27     .69
Dry mouth                    .47    −.00     .36
Sweaty hands/palms           .37    −.02     .17
Tingling                     .50    −.08     .21
Blushing                     .51     .03     .34
Shaky                        .64    −.07     .47
Stomach ache                 .62     .02     .32
Cold flashes/chills          .53    −.05     .27
Dizzy                        .49    −.06     .30
Heart pounding               .63     .01     .40
Sweating when not hot        .47    −.01     .28
Can’t catch breath           .53    −.01     .22
Feeling of choking           .50    −.08     .26
Hot flashes                  .51    −.08     .31
Numbness                     .47     .05     .24
Pain in chest                .64     .04     .31
Feeling like throwing up     .52    −.07     .35
Tight muscles                .55     .02     .28
Can’t sit still              .44     .14     .17
Eigenvalues                 8.88    5.41    2.16
% variance                 19.72   12.16    4.79

Note. The structure matrix is provided. Bold numbers represent factor loadings ≥ .40. Underlined numbers represent the highest loading when more than one factor has a loading ≥ .40.

The correlations between the PH–C and instruments that purport to measure aspects of anxiety that have a physiological component ranged from .56 to .64. Although the magnitude of the correlations was encouraging, a note of caution is in order. Previous research has challenged the ability of existing child self-report measures to assess the physiological aspects of anxiety. For example, Lee, Piersel, and Unruh (1989) raised concerns about the convergent and discriminant validity of the Physiological Anxiety subscale of the RCMAS. The fact that this RCMAS subscale contains items related to bad dreams, anger, and concentration raises questions about the accuracy of calling this factor “physiological” anxiety. On the other hand, the PH–C was developed specifically to tap the physiological aspects of anxiety.

Also, because other child anxiety measures were examined when developing PH–C items, the possibility that the correlation between this scale and the RCMAS Physiological Anxiety subscale was inflated needs to be considered. To some extent, this problem is inevitable whenever the authors of scales consult the same child psychopathology literature. The hallmark symptoms of a disorder form the basis of instruments designed to assess that disorder. Compounding the problem is the fact that these scales are then used to validate one another. We are guilty of perpetuating this practice, consulting the DSM–IV and reviewing the STAIC, RCMAS, and BAI while developing PH–C items, and then using a subscale of the RCMAS as part of the process of validating the PH–C. Although the format of the PH–C differed from existing child anxiety measures, some item content overlapped. Three items from the PH–C and RCMAS were similar (sweaty hands, stomach ache, can’t catch breath). The inevitability of the problem is evident by the fact that there were similar items on the PH–C and CPSC (stomach ache, dizzy, heart pounding), even though the latter was not reviewed when developing items for the former. However, the impact of the overlapping items appears minimal. When the overlapping items were removed from the PH–C the correlation with the RCMAS Physiological Anxiety scale went from .56 to .55, and the correlations with the CPSC frequency and intensity subscales went from .64 to .63 and .59 to .60, respectively. Perhaps a better test of convergent validity for the PH–C would be similar scales from other tripartite measures, such as the Arousal scale of the AFARS or the Anxious Arousal scale of the MASQ or a youth version of the MASQ (Rudolph et al., 2002). Now that other measures of the tripartite model exist, comparisons with other measures of PH can be made.

The correlation with the CDI Negative Mood subscale provided an initial attempt at demonstrating discriminant validity. Although the PH–CDI Negative Mood subscale correlation was less than the NA–CDI Negative Mood subscale correlation, the difference was not significant. However, partial correlation of the PH–C with the CDI Negative Mood subscale controlling for PA and NA revealed the relationship (more specifically, the lack of relationship) predicted by the tripartite model. This finding, combined with the significant relationship witnessed between the RCMAS Physiological subscale and PH–C when controlling for PA and NA, provides preliminary support for both the discriminant and convergent validity of the scale, respectively.

Although preliminary findings regarding the PH–C are promising, several other approaches might be considered in future validity research. For example, to date, only one study has employed a tripartite measure designed for youth with a clinical sample (Chorpita & Daleiden, 2002). In the current study, scale development using a broad sampling of youth representative of the population ensured desired variability. However, it is important to demonstrate that the resulting measure is sensitive to relationships predicted in the literature on which it is based in all samples in which it is employed. When compared with children and adolescents in a general school sample, youth experiencing emotional problems would be expected to obtain scores reflective of their distress. Indeed, preliminary data with a small group of undifferentiated inpatient youth found that PH and NA scores were significantly higher and PA scores were significantly lower for inpatients compared with schoolchildren. Clearly, more needs to be done with both general outpatient and inpatient clinical samples of youth and specific samples of anxious and/or depressed youth to validate the PH–C and the tripartite model in general. Also, although the current study employed students in Grades 6–12, the PH–C was designed to complement the PANAS–C, which has been used with students in Grades 4–12. Future studies should examine the utility of the PH–C with younger students.

Table 2
Correlations Between the Physiological Hyperarousal Scale for Children (PH–C) and Self-Report Measures of Anxiety and Depression

Scale            1       2       3       4       5       6       7
1. PH–C          –
2. PANAS–C NA    .64***  –
3. PANAS–C PA    −.07    .01     –
4. RCMAS Physio  .56***  .40***  −.28**  –
5. CPSC Freq     .64***  .58***  −.24**  .61***  –
6. CPSC Intens   .59***  .38***  −.26**  .39***  .55***  –
7. CDI Neg Mood  .56***  .65***  −.23*   .52***  .55***  .50***  –

Note. N = 118. PANAS–C NA = Positive and Negative Affect Scale for Children Negative Affect scale; PANAS–C PA = Positive and Negative Affect Scale for Children Positive Affect scale; RCMAS Physio = Revised Children’s Manifest Anxiety Scale Physiological Anxiety scale; CPSC Freq = Children’s Psychosomatic Checklist Frequency scale; CPSC Intens = Children’s Psychosomatic Checklist Intensity scale; CDI Neg Mood = Children’s Depression Inventory Negative Mood scale.
* p < .05. ** p < .01. *** p < .001.

Another approach to establishing the validity of the PH–C would be to examine the relationship between the scale and psychophysiological measures (e.g., blood pressure and heart rate). Future studies might also employ an experimental methodology similar to the mood induction procedure used in depression research. Experimental conditions that compared anxious versus depressed mood inductions, and resulting scores on the PH–PANAS–C scales, would provide additional validity evidence for the PA, NA, and PH measures.

Future research might also examine the effect of changing the response choices on the PH–C. The current response choices mirror those used on the PANAS–C, and reflect severity ratings rather than frequency ratings. Changing the response choices to frequency ratings (e.g., never, sometimes, about half the time, often, all the time) would be more consistent with the instructions. Our observations indicated that students did not have difficulty recognizing that they were being asked to rate different degrees of experience. Nevertheless, future work with the PH–C should examine the impact of changing response choices to reflect frequency ratings.

In summary, our findings indicate that the PH–C is a meaningful measure of physiological hyperarousal in youth. When combined with the PANAS–C, the PH–PANAS–C assesses all aspects of the tripartite model. Certainly, further research is needed with both the PH–C and PANAS–C to demonstrate their utility in differentiating anxiety and depression among youth. This research should include samples that are ethnically diverse and represent both internalizing and externalizing disorders. However, initial indications are that these instruments address shortcomings of traditional child self-report measures of anxiety and depression. Thus, the PH–PANAS–C scales may be useful tools in applied and research situations where the differentiation of anxiety and depression among youth is important.

References

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.

Attanasio, V., Andrasik, F., Blanchard, E. B., & Arena, J. G. (1984). Psychometric properties of the SUNYA revision of the Psychosomatic Symptom Checklist. Journal of Behavioral Medicine, 7, 247–258.

Beck, A. T., & Steer, R. A. (1993). Manual for the Beck Anxiety Inventory. San Antonio, TX: Psychological Corporation.

Brown, T. A., Chorpita, B. F., & Barlow, D. H. (1998). Structural relationships among dimensions of the DSM–IV anxiety and mood disorders and dimensions of negative affect, positive affect, and autonomic arousal. Journal of Abnormal Psychology, 107, 179–192.

Chorpita, B. F., Albano, A. M., & Barlow, D. H. (1998). The structure of negative emotions in a clinical sample of children and adolescents. Journal of Abnormal Psychology, 107, 74–85.

Chorpita, B. F., & Daleiden, E. L. (2002). Tripartite dimensions of emotion in a child clinical sample: Measurement strategies and implications for clinical utility. Journal of Consulting and Clinical Psychology, 70, 1150–1160.

Chorpita, B. F., Daleiden, E. L., Moffitt, C., Yim, L., & Umemoto, L. A. (2000). Assessment of tripartite factors of emotion in children and adolescents I: Structural validity and normative data of an affect and arousal scale. Journal of Psychopathology and Behavioral Assessment, 22, 141–160.

Chorpita, B. F., Plummer, C. M., & Moffitt, C. E. (2000). Relations of tripartite dimensions of emotion to childhood anxiety and mood disorders. Journal of Abnormal Child Psychology, 28, 299–310.

Clark, L. A., & Watson, D. (1991). Tripartite model of anxiety and depression: Psychometric evidence and taxonomic implications. Journal of Abnormal Psychology, 100, 316–336.

Cox, B. J., Enns, M. W., Walker, J. R., Kjernisted, K., & Pidlubny, S. R. (2001). Psychological vulnerabilities in patients with major depression vs. panic disorder. Behaviour Research and Therapy, 39, 567–573.

Crocker, P. R. E. (1997). A confirmatory factor analysis of the Positive Affect Negative Affect Schedule (PANAS) with a youth sport sample. Journal of Sport & Exercise Psychology, 19, 91–97.

Crook, K., Beaver, B. R., & Bell, M. (1998). Anxiety and depression in children: A preliminary examination of the PANAS–C. Journal of Psychopathology and Behavioral Assessment, 20, 333–350.

Daleiden, E., Chorpita, B. F., & Lu, W. (2000). Assessment of tripartite factors of emotion in children and adolescents II: Concurrent validity of the Affect and Arousal Scales for Children. Journal of Psychopathology and Behavioral Assessment, 22, 161–182.

Ettelson, R., & Laurent, J. (2002, February/March). Assessing anxiety and depression among adolescents: Confirmatory factor analysis of the Mood and Anxiety Symptom Questionnaire. Poster presented at the 34th Annual Convention of the National Association of School Psychologists, Chicago, IL.

Gorsuch, R. L. (1997). Exploratory factor analysis: Its role in item analysis. Journal of Personality Assessment, 68, 532–560.

Huebner, E. S., & Dew, T. (1995). Preliminary validation of the Positive and Negative Affect Schedule with adolescents. Journal of Psychoeducational Assessment, 13, 286–293.

Joiner, T. E., Jr. (1996). A confirmatory factor-analytic investigation of the tripartite model of depression and anxiety in college students. Cognitive Therapy and Research, 20, 521–539.

Joiner, T. E., Jr., & Blalock, J. A. (1995). Gender differences in depression: The role of anxiety and generalized negative affect. Sex Roles, 33, 91–108.

Joiner, T. E., Jr., Catanzaro, S. J., & Laurent, J. (1996). The tripartite structure of positive and negative affect, depression, and anxiety in child and adolescent psychiatric inpatients. Journal of Abnormal Psychology, 105, 401–409.

Joiner, T. E., Jr., Steer, R. A., Beck, A. T., Schmidt, N. B., Rudd, M. D., & Catanzaro, S. J. (1999). Physiological hyperarousal: Construct validity of a central aspect of the tripartite model of depression and anxiety. Journal of Abnormal Psychology, 108, 290–298.

Jolly, J. B., Dyck, M. J., Kramer, T. A., & Wherry, J. N. (1994). Integration of positive and negative affectivity and cognitive content-specificity: Improved discrimination of anxious and depressed symptoms. Journal of Abnormal Psychology, 103, 544–552.

Keogh, E., & Reidy, J. (2000). Exploring the factor structure of the Mood and Anxiety Symptom Questionnaire (MASQ). Journal of Personality Assessment, 74, 106–125.

Kercher, K. (1992). Assessing subjective well-being in the old-old: The PANAS as a measure of orthogonal dimensions of positive and negative affect. Research on Aging, 14, 131–168.

Kovacs, M. (1992). The Children’s Depression Inventory (CDI) manual. North Tonawanda, NY: Multi-Health Systems.

Laurent, J., Catanzaro, S. J., & Joiner, T. E., Jr. (1995). Theory-based screening of youth internalizing disorders. Unpublished manuscript, Illinois State University, Department of Psychology, Normal, IL.

Laurent, J., Catanzaro, S. J., & Joiner, T. E., Jr. (1998, April/May). A child measure of the tripartite model: Initial development and validation of the Physiological Hyperarousal-Positive and Negative Affect Scale for Children. In S. J. Catanzaro (Chair), The tripartite model of anxiety and depression: Current research and future prospects. Symposium conducted at the 70th annual meeting of the Midwestern Psychological Association, Chicago, IL.

Laurent, J., Catanzaro, S. J., & Joiner, T. E., Jr. (2000, March/April). Physiological Hyperarousal Scale for Children: Scale development and preliminary validation. Poster session presented at the 32nd annual convention of the National Association of School Psychologists, New Orleans, LA.

Laurent, J., Catanzaro, S. J., Joiner, T. E., Jr., Rudolph, K. D., Potter, K. I., Lambert, S., et al. (1999). A measure of positive and negative affect for children: Scale development and preliminary validation. Psychological Assessment, 11, 326–338.

Laurent, J., & Ettelson, R. (2001a, April). Assessing anger, anxiety, and depression among adolescents: An examination of two measures. Poster presented at the 33rd Annual Convention of the National Association of School Psychologists, Washington, DC.

Laurent, J., & Ettelson, R. (2001b). An examination of the tripartite model of anxiety and depression and its application to youth. Clinical Child and Family Psychology Review, 4, 209–230.

Laurent, J., Potter, K., & Catanzaro, S. J. (1994, March). Assessing positive and negative affect in children: The development of the PANAS–C. Poster session presented at the 26th Annual Convention of the National Association of School Psychologists, Seattle, WA.

Lee, S. W., Piersel, W. C., & Unruh, L. (1989). Concurrent validity of the physiological subscale of the Revised Children’s Manifest Anxiety Scale: A multitrait-multimethod analysis. Journal of Psychoeducational Assessment, 7, 246–254.

Lonigan, C. J., Hooe, E. S., David, C. F., & Kistner, J. A. (1999). Positive and negative affectivity in children: Confirmatory factor analysis of a two-factor model and its relation to symptoms of anxiety and depression. Journal of Consulting and Clinical Psychology, 67, 374–386.

Lonigan, C. J., Phillips, B. M., & Hooe, E. S. (2003). Relations of positive and negative affectivity to anxiety and depression in children: Evidence from a latent variable longitudinal study. Journal of Consulting and Clinical Psychology, 71, 465–481.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.

Reidy, J., & Keogh, E. (1997). Testing the discriminant and convergent validity of the Mood and Anxiety Symptoms Questionnaire using a British sample. Personality and Individual Differences, 23, 337–344.

Reynolds, C. R., & Richmond, B. O. (1985). Revised Children’s Manifest Anxiety Scale (RCMAS) manual. Los Angeles: Western Psychological Services.

Rudolph, K. D., Lambert, S. F., Osborne, L., Gathright, T., & Kumar, S. (2002). Developmental considerations in anxiety and depression: II. The emergence of sex differences across early adolescence. Unpublished manuscript.

Spielberger, C. D. (1973). Preliminary manual for the State-Trait Anxiety Inventory for Children (“How I Feel Questionnaire”). Palo Alto, CA: Consulting Psychologists Press.

Tellegen, A., Watson, D., & Clark, L. A. (1999). On the dimensional and hierarchical structure of affect. Psychological Science, 10, 297–303.

Watson, D. (1988). The vicissitudes of mood measurement: Effects of varying descriptors, time frames, and response formats on measures of positive and negative affect. Journal of Personality and Social Psychology, 55, 128–141.

Watson, D., & Clark, L. A. (1984). Negative affectivity: The disposition to experience aversive emotional states. Psychological Bulletin, 96, 465–490.

Watson, D., & Clark, L. A. (1991). The Mood and Anxiety Symptom Questionnaire. Unpublished manuscript, University of Iowa, Department of Psychology, Iowa City, IA.

Watson, D., & Clark, L. A. (1994). The PANAS–X: Manual for the Positive and Negative Affect Schedule–Expanded form. Unpublished manuscript, University of Iowa, Department of Psychology, Iowa City, IA.

Watson, D., Clark, L. A., & Carey, G. (1988). Positive and negative affectivity and their relation to anxiety and depressive disorders. Journal of Abnormal Psychology, 97, 346–353.

Watson, D., Clark, L. A., & Tellegen, A. (1988). Development and validation of brief measures of positive and negative affect: The PANAS scales. Journal of Personality and Social Psychology, 54, 1063–1070.

Watson, D., Clark, L. A., Weber, K., Assenheimer, J. S., Strauss, M. E., & McCormick, R. A. (1995). Testing a tripartite model: II. Exploring the symptom structure of anxiety and depression in student, adult, and patient samples. Journal of Abnormal Psychology, 104, 15–25.

Watson, D., & Tellegen, A. (1985). Toward a consensual structure of mood. Psychological Bulletin, 98, 219–235.

Watson, D., Weber, K., Assenheimer, J. S., Clark, L. A., Strauss, M. E., & McCormick, R. A. (1995). Testing a tripartite model: I. Evaluating the convergent and discriminant validity of anxiety and depression symptom scales. Journal of Abnormal Psychology, 104, 3–14.

Wisniewski, J. J., Naglieri, J. A., & Mulick, J. A. (1988). Psychometric properties of a children’s psychosomatic symptom checklist. Journal of Behavioral Medicine, 11, 497–507.

Zevon, M. A., & Tellegen, A. (1982). The structure of mood change: An idiographic/nomothetic analysis. Journal of Personality and Social Psychology, 43, 111–122.

(Appendix follows)


Appendix

How I Feel

Please circle the number that best describes how often you have felt or experienced the following during the last two weeks.

Response scale: 1 = Very slightly or not at all; 2 = A little; 3 = Moderately; 4 = Quite a bit; 5 = Extremely

Item                                  Rating
Dry mouth                             1  2  3  4  5
Sweaty hands/palms                    1  2  3  4  5
Tingling (like pins and needles)      1  2  3  4  5
Blushing                              1  2  3  4  5
Shaky                                 1  2  3  4  5
Stomach ache                          1  2  3  4  5
Cold flashes/chills                   1  2  3  4  5
Dizzy                                 1  2  3  4  5
Heart pounding                        1  2  3  4  5
Sweating when you are not hot         1  2  3  4  5
Can’t catch your breath               1  2  3  4  5
Feeling of choking                    1  2  3  4  5
Hot flashes                           1  2  3  4  5
Numbness (like your foot’s asleep)    1  2  3  4  5
Pain in your chest                    1  2  3  4  5
Feeling like throwing up              1  2  3  4  5
Tight muscles                         1  2  3  4  5
Can’t sit still                       1  2  3  4  5

Note. Items are from the Physiological Hyperarousal Scale for Children (Laurent et al., 1995).
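As a minimal sketch of how responses to the 18 items above might be scored, the Python snippet below sums the 1 to 5 ratings into a total. Treating the scale score as a plain sum is an assumption made here for illustration, not a rule stated in the appendix, and the names `PHC_ITEMS` and `score_phc` are hypothetical. With 18 items rated 1 to 5, such a total can range from 18 to 90.

```python
# Item labels copied from the appendix; the scoring rule (a plain sum)
# is an assumption for illustration, not a rule stated in the appendix.
PHC_ITEMS = [
    "Dry mouth", "Sweaty hands/palms", "Tingling (like pins and needles)",
    "Blushing", "Shaky", "Stomach ache", "Cold flashes/chills", "Dizzy",
    "Heart pounding", "Sweating when you are not hot",
    "Can't catch your breath", "Feeling of choking", "Hot flashes",
    "Numbness (like your foot's asleep)", "Pain in your chest",
    "Feeling like throwing up", "Tight muscles", "Can't sit still",
]


def score_phc(responses):
    """Return the summed rating across all 18 items (hypothetical helper)."""
    total = 0
    for item in PHC_ITEMS:
        if item not in responses:
            raise ValueError(f"missing response for item: {item}")
        rating = responses[item]
        if rating not in (1, 2, 3, 4, 5):
            raise ValueError(f"rating for {item!r} must be an integer from 1 to 5")
        total += rating
    return total


# Boundary cases: every item rated 1, and every item rated 5.
lowest = score_phc({item: 1 for item in PHC_ITEMS})
highest = score_phc({item: 5 for item in PHC_ITEMS})
print(lowest, highest)  # 18 90
```

Rejecting incomplete or out-of-range responses, rather than silently skipping them, keeps totals comparable across respondents.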

Received June 2, 2003
Revision received April 28, 2004
Accepted May 27, 2004


Gifted Child Quarterly
Volume 51 Number 1
Winter 2007 64-81
© 2007 National Association for Gifted Children
http://gcq.sagepub.com
hosted at http://online.sagepub.com


Investigating the Influence of Attribution Styles on the Development of Mathematical Talent

Petri Nokelainen
University of Tampere, Finland

Kirsi Tirri
University of Helsinki, Finland

Hanna-Leena Merenti-Välimäki
Espoo-Vantaa Institute of Technology, Finland


Abstract: In this article, the authors examine the influence of attribution styles on the development of mathematical talent. The study employs a Self-Confidence Attitude Attribute Scale questionnaire, which measures ability and effort attributions. Participants are three groups of highly, moderately, or mildly mathematically gifted Finnish adolescents and adults (N = 203). The results of Bayesian classification modeling show that items attributing success to effort and failure to lack of effort are the best predictors for the level of mild mathematical giftedness and gender (females). The results of multivariate analysis of variance show that highly and moderately mathematically gifted students reported that ability was more important for success than effort, but mildly mathematically gifted tended to see effort as leading to success. Moderately and mildly mathematically gifted students attribute failure to lack of effort, whereas highly mathematically gifted students attribute failure to lack of ability.

Keywords: attribution styles; self-regulation; mathematical talent; academic Olympians; vocational high school students

DOI: 10.1177/0016986206296659

Authors’ Note: Please address correspondence to Petri Nokelainen, Research Centre for Vocational Education, P.O. Box 229, Hämeenlinna, Finland 13101; e-mail: [email protected].

Putting the Research to Use: It is essential that educators and parents understand the influence of different attribution styles on the development of mathematical talent. This study provides understanding of how highly, moderately, and mildly mathematically gifted adolescents and adults differ in their specific reasons for success and failure. Differences in attribution styles between the three groups of mathematically gifted, as measured with the Self-Confidence Attitude Attribute Scales questionnaire, indicate that it is important to know if the attributions for success or failure are stable or unstable, external or internal.

Knowledge of how learners or trainees use attributions to account for success and failure can help educators and parents gain a deeper awareness of the mathematically gifted and, thus, predict their expectancies and plan intervention strategies when needed. The information is also applicable to courses concerning the needs of the gifted. Furthermore, the information can be presented directly to mathematically gifted students to help them develop more insight into their own behavior.

In 2000, 180,000 students from 28 Organization for Economic Co-Operation and Development (OECD) member countries and 4 non-OECD countries (Brazil, Latvia, Liechtenstein, and the Russian Federation) participated in the first Programme for International Student Assessment (PISA). The results showed that students from Japan, Korea, New Zealand, and Finland scored highest in all tests measuring mathematics literacy (OECD, 2001). The Finnish students’ ranking was even higher (i.e., third) when variation within country was taken into account (OECD, 2001). In 2003, the PISA follow-up study, focusing on mathematics literacy and involving 276,165 15-year-old students, was conducted in 41 countries (OECD, 2004). The results of overall student performance in different countries on the mathematics scale showed that Hong Kong students had the highest, and Finnish students had the second highest, mean student score (OECD, 2004). The finding of small within-country variance in the Finnish sample was repeated.

One logical reason for success in international comparison studies is the Finnish government’s “equal opportunities and high-quality education for all” principle. The first practical consequence of the principle is that education is free for all students participating in these assessments. The second consequence is the government’s strong financial support for public sector educational institutions. This has led to a situation where there are no appreciable differences in teaching quality or premises between public and special schools. Partly for this reason, only a small minority of the schools in Finland are special schools with entrance examinations and financial support from private or corporate sources. There are no private universities or polytechnics in Finland.

The purpose of this study is to explore the attribution styles (that is, personal explanations for success and failure) in Finnish adolescents and adults (N = 203) with varying levels of mathematical giftedness to discover what attributions contribute to or impede the development of mathematical talent.

The first group, “Olympians,” consists of highly mathematically gifted adults who have participated in the International Olympics for Mathematics. Tirri and Campbell (2002) reported that 80% of the Finnish Olympians apply their mathematical talent by choosing a career in science. The majority of them are researchers in academia or engineers in technical fields. The Olympians have been very successful in their graduate studies, and they have published articles and books related to their fields. Those Olympians who did not continue in academia chose a career as an engineer or as a CEO or a manager in leading Finnish companies, such as Nokia (Tirri, 2002; Tirri & Campbell, 2002).

The second group, “Prefinalists,” consists of secondary school students who have taken part in national competitions in mathematics. The group represents the top level of Finnish 15-year-old students that participated in the international PISA 2000 study.

The third group, “Polytechnics,” consists of adolescent students from a technical vocational high school who study mathematics as their major subject. In Finland, most of the vocational high schools are highly specialized regional institutions that train adolescents for the tasks of an expert. This particular institution is the top-rated technically oriented vocational high school in Finland.

Giftedness is not a monolithic construct. There are different levels of giftedness, and thus, the three groups representing mathematically gifted adolescents and adults in this study are not homogenous. Furthermore, we are not able to guarantee that the individuals within each group share the same level of mathematical ability. Intelligence quotient is, especially with children, a useful index of the discrepancy between mental and chronological age. As the participants of this study are adolescents and adults, we did not measure their IQ, but instead, we looked at their current or past achievements. Olympians are the most homogenous and mathematically gifted group in this study on the basis of their achievements as Academic Olympians and their traceable academic publication record (Nokelainen, Tirri, & Campbell, 2004; Nokelainen, Tirri, Campbell, & Walberg, 2004). We classify Olympians for the purpose of this study as highly mathematically gifted. Also, Prefinalists underwent a series of increasingly demanding mathematical tests to be included in the Academic Olympians training program. Their trainers are past Olympians, that is, members of the first group in this study. Prefinalists are classified as moderately mathematically gifted, as we do not yet know how many will be selected to participate as Academic Olympians in the future. Technical vocational high school students, who study mathematics as their major, represent mildly mathematically gifted students in this study. Group membership (1 = Olympians, 2 = Prefinalists, and 3 = Polytechnics) showed a strong positive correlation with secondary school mathematics grade average (from 1, the highest, to 7), with a correlation coefficient of r(203) = .82, p < .001.
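The correlation between group membership and grade average reported above is a Pearson coefficient, which can be sketched as follows. The function `pearson_r` and the six data points are hypothetical and made up purely for illustration; they are not the study's data and will not reproduce r = .82.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustration data, NOT the study's data.
# group: 1 = Olympians, 2 = Prefinalists, 3 = Polytechnics
# grade: 1 (highest) to 7, as coded in the text
group = [1, 1, 2, 2, 3, 3]
grade = [1, 2, 3, 3, 5, 6]

r = pearson_r(group, grade)
print(round(r, 2))
```

Because both variables are coded with 1 as the top category, a positive coefficient means that the groups classified as more gifted also report better grade averages.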

Earlier studies of mathematical giftedness have mainly focused on within-group differences related to, for example, gender or attribution styles. Very few between-group comparisons, other than cross-cultural ones, have been reported. Socioeconomic differences do exist in Finland, but their impact on children’s educational possibilities is minor because education is free at all levels. As the PISA results indicate (OECD, 2001, 2004), all individuals in Finland are provided with a uniformly high level of basic mathematical training, thus controlling, at least to some extent, individual-level educational differences. This allows us to interpret possible differences between the groups through differences in individuals’ characteristics, such as mathematical giftedness and attribution styles.


All the participants completed the Self-Confidence Attitude Attribute Scales (SaaS) questionnaire (Campbell, 1996a). The instrument included 18 items based on Weiner’s (1974, 1980, 1986, 1994, 2000) properties of attributional thinking, measuring ability and effort attributions on four dimensions: (a) success because of ability, (b) failure because of lack of ability, (c) success because of effort, and (d) failure because of lack of effort.

Our three research questions are as follows: (a) Are the four dimensions of the SaaS instrument present in the empirical sample? (b) What are the best predictors of the level of mathematical giftedness and gender among the SaaS variables? (c) Do the attribution styles differ by the level of mathematical giftedness or gender?

Theoretical Framework

Properties of Attributional Thinking

Reasons people give for an outcome, such as success or failure in a task, are called attributions (Heider, 1958). Factors involved in attributional thinking, such as specific reasons for success and failure, have been shown to be related in achievement settings (Weiner, 1974, 1980, 1986, 1994, 2000). In his studies, Weiner found that the four most frequent reasons for success and failure are ability, effort, task difficulty, and luck. Subsequent research identified learning strategies as a fifth possible reason for success and failure (Alderman, 2004): “It is no good thing trying harder if you do not know how to try.”

Dai, Moon, and Feldhusen (1998) classify attribution constructs into three groups. First, attribution appraisals are explanations assessed following actual or manipulated success or failure in performing a specific task. Second, attribution beliefs are domain-specific or domain-general beliefs about the causes of success or failure. Third, attribution styles are generalized, stereotypical patterns of attributions and dispositional beliefs. Attribution styles are assessed in a similar way to attribution beliefs, except that a certain typology is imposed on the data using predetermined criteria. In this study, we examined attribution styles using Weiner’s (1992) classification of reasons for success and failure: (a) internal and external attributions, referring to within or outside person causes; (b) stable and unstable attributions, referring to consistent or inconsistent causes over time; and (c) controllable and uncontrollable attributions, referring to the extent a person believes he or she has control over the cause of an outcome. In this study, we examined within-person factors (ability and effort) as they have typically been found to be the most frequently cited reasons for success and failure in achievement contexts. Those factors are classified as “internal” attributions. “External” attributions (luck, task difficulty) were omitted from the study design. Thus, our focus is on stable and unstable, controllable and uncontrollable, internal attributions. Most effort attributions tend to be unstable and controllable, as opposed to ability attributions, which are usually stable and uncontrollable. We will show later how the 18 SaaS items are related to these dimensions. We will also discuss in later stages of the analysis how the four SaaS factors describe dimensions of reasons for success and failure.

Self-Regulation and Attribution Styles

Self-regulation refers to the process through which self-generated thoughts, feelings, and actions are planned and systematically adapted as necessary to affect one’s learning and motivation (Schunk & Ertmer, 2000; Zimmerman, 2000). According to social-cognitive theory, self-regulation is dependent on the situation. Therefore, self-regulation is not a general characteristic or a developmental level but rather is contextually dependent.

Zimmerman (2000) describes self-regulation as cyclical because the feedback from prior performance is used to make adjustments during current efforts. Personal, behavioral, and environmental factors are constantly changing, and therefore, an individual has to monitor these changes continuously to know whether any adjustments are required. Zimmerman describes the three feedback loops involved in monitoring one’s internal state, one’s behaviors, and one’s environment as the triadic forms of self-regulation.

Figure 1 describes self-regulation of learning tasks as a cyclical, three-phase process (Zimmerman, 1998). The phases in this learning cycle are forethought, performance or volitional control, and self-reflection. Forethought, which creates the necessary conditions for learning, consists of task analysis and self-motivation beliefs. Performance or volitional control, which guides the learning process and regulates concentration and learning performance, consists of self-control and self-observation. Self-reflection, which refers to examining and making meaning of the learning experience, consists of self-judgment and self-reaction. Next, we examine more closely the last phase, which contains the focus of this article, attribution styles.


Figure 1. Cyclical Self-regulatory Phases
Source: Adapted from Zimmerman, 2000

Self-reflection begins with self-judgment, which is the process whereby an individual compares information attained through self-monitoring to extrinsic standards or goals. He or she wants to have fast and accurate feedback on his or her performance as compared to others. Self-judgment leads to attribution interpretations where the learner interprets the reasons for success or failure. Attribution interpretations can lead to positive self-reactions. The individual might interpret the failure of a strategy as the result of too little effort and then increase his or her efforts, but if he or she interprets the reason for failure as being a lack of ability, the reaction is liable to be negative. Attribution interpretations reveal the possible reasons for learning mistakes and help the learner to find those learning strategies that best suit the given situation. They also develop or promote the adaptation process. Self-regulated individuals are more adaptive and evaluate their performance appropriately. Positive reactions (e.g., self-satisfaction) reinforce positive interpretations of oneself as an individual and enhance intrinsic interest in the task.

Ellström (2001) defines qualification as the competence that is actually required by a task and/or is implicitly or explicitly determined by individual qualities. In our study setting, the most interesting point is that competence may also be seen as an attribute of the individual, meaning, for example, a human resource that the person brings to mathematical problem-solving situations. Furthermore, attributions may emphasize formal competence as indicated by degree requirements and certificates or, the focus of this study, potential competence as indicated by the capacity of the individual to successfully complete tasks and face new challenges on the basis of demonstrated personal attributes and abilities (other than those obtained through formal training). Ellström (2001) has noticed that potential competence may vary greatly between individuals with the same formal qualifications, because they may possess very different levels of inherent ability and may have learned different things outside of school or studies through their working life and recreational activities. Thus, ability attributions affect later performance expectations and, in negative cases, the development or continuation of learned helplessness (Ruohotie & Nokelainen, 2000).

In this study, we concentrate on participants’ self-evaluations on the basis of mathematics achievement and academic ability, because causal attributions (see phase “Self-Reflection” in Figure 1) play an important part in the self-regulatory process by being central elements of self-judgment and thus influencing, for example, goal setting and self-efficacy. We are interested to see if the attribution styles of highly mathematically gifted individuals differ from those of the mathematically able.

Our research questions are as follows: (a) Are the four dimensions of the SaaS instrument (success because of ability, failure because of lack of ability, success because of effort, and failure because of lack of effort) present in the empirical sample? (b) What are the best predictors of the level of mathematical giftedness (high = Olympians, moderate = Prefinalists, and mild = Polytechnics) and gender among the SaaS variables? (c) Do the attribution styles differ by the level of mathematical giftedness or gender?

Literature Review

Mathematical Giftedness and Attribution Styles

Campbell has conducted several cross-national studies on Mathematics Olympians (see, e.g., Campbell, 1994, 1996b; Nokelainen, Tirri, & Campbell, 2004). He made two interesting findings: First, the international data on mathematics self-concept verified the finding that their academic self-concepts fluctuate from grade school to high school and, second, that the Olympians attributed effort to be more important in their success than ability (Campbell, 1996b). The latter research finding has been verified by Chan (1996), who reported that adolescent gifted students were more likely to attribute failure to lack of effort than to attribute it to low ability. The American and Taiwanese Olympians have also attributed success and failure more to effort than to ability (Feng, Campbell, & Verna, 2001; Wu & Chen, 2001).

Heller and Lengfelder (2000) investigated 100 German Olympian finalists and 135 Prefinalists in mathematics, physics, and chemistry. In contrast to Campbell’s findings, their results showed that participants in both groups valued ability significantly more highly than effort. Effort was estimated to be equally important in the case of failure as in the case of success (Heller & Lengfelder, 2000).

Marsh (1983) found, as he studied relationships between the dimensions of self-attribution, self-concept, and academic achievements, that those who attribute academic success to ability and who do not attribute failure to a lack of ability have better academic self-concepts and better academic achievement. Multon, Brown, and Lent (1991) have also shown a positive correlation between perceived ability and achievement.

Gender and Attribution Styles

In an American study by Verna and Campbell (2000), a small significant difference between males and females was found with regard to perceptions of ability. The female American Chemistry Olympians considered ability to be a more important factor for success than did the males. However, no difference was found for the effort factor.

Kerr (1994) and Reis (1998) have identified external barriers to gifted women as including the attitudes of parents and school, environmental options, and possible discrimination or harassment at school or at work. The possible internal barriers among gifted females included self-doubt, self-criticism, and low expectations. According to Siegle and Reis (1998), gifted girls tend to underestimate their abilities, especially in mathematics, social studies, and science.

Instrumentation of Attribution Theory

There is abundant literature and research on attribution theory, especially on attributional properties in achievement settings (Weiner, 1974, 1980, 1986, 1994, 2000), because the role of motivation in academic achievement has proven to be a popular topic. The principle of attribution theory is that students search for understanding, trying to discover why an event has occurred (Weiner, 1974). The interest is apparent as we examine the structure of existing measurement instruments: Biggs’s (1985) 42-item Study Process Questionnaire consists of two scales (Motive and Strategy) with three approaches: (a) surface, (b) deep, and (c) achieving. The questionnaire contains six subscales (Surface Motive, Deep Motive, Achieving Motive, Surface Strategy, Deep Strategy, and Achieving Strategy). Ramsden and Entwistle’s (1981) Approaches to Studying Inventory, which is one of the most widely used questionnaires on student learning in higher education, contains subscales including such factors as fear of failure, extrinsic motivation, and achieving orientation. Marsh (e.g., Marsh & O’Neill, 1984) has developed a set of scales (Self-Description Questionnaire I to III) for different age groups measuring self-concept with a multifaceted (e.g., mathematics, verbal, academic, physical) view. According to Strein (1995), research results during the past 15 years have strongly supported the multifaceted view emphasizing domain-specific self-concepts. In this study, we apply the SaaS questionnaire that was developed by Campbell (1996a) originally for cross-cultural Academic Olympiad studies.

Method

Sample

The Finnish education system includes comprehensive schools, postcomprehensive general and vocational education, higher education, and adult education. Comprehensive schools provide a 9-year compulsory educational program for all school-age children beginning at the age of 7. Postcomprehensive education is provided in upper secondary schools and vocational institutions. The Finnish higher education system includes 20 universities and 30 vocational high schools. The higher education system as a whole offers openings for 66% of the relevant age group (universities 29%, vocational high schools 37%).

Respondents in the first group, Olympians, are the Finnish students most gifted in mathematics. The group consists of individuals of different ages who participated in Olympiad Studies in Mathematics from 1965 through 1999. Separate programs exist for the Mathematics, Physics, and Chemistry Olympiads. In recent years, programs have been created for Biology and Computer Science Olympiads as well. Distinct studies have been undertaken in each of these academic areas. In the Mathematics, Physics, and Chemistry Olympiad programs, a series of increasingly difficult tests are administered. This testing concludes with the identification of the top national finalists (6 to 20 Olympians). These individuals are trained to compete in the International Olympiad programs.

The second group, Prefinalists, involved in this study consists of secondary school students who have taken part in the national competitions in mathematics from 2000 to 2001. Each year, schools all over Finland send their most talented students to this annual competition. The tests of this competition resemble the tests used in academic Olympians.

The third group, Polytechnics, consists of students of Espoo-Vantaa Institute of Technology. They need progressively advanced mathematical skills as they progress in their studies. However, compared to higher level mathematics studies in universities, technological mathematics studies in vocational high schools are more practically oriented.

In addition, respondents’ parents were asked about their educational level. More than 60% of Olympians’ (62.2%) and Prefinalists’ (65.4%) parents had an academic degree. Only 23.9% of Polytechnics’ parents had the same educational level. Analysis of parental occupation in the three groups showed that Olympians’ and Prefinalists’ parents shared similar vocational interests, as they were, for example, doctors, teachers, and business managers. Polytechnics’ parents were mainly middle-class (e.g., factory workers). However, in Finland, educational level is not a good predictor of socioeconomic status. Prefinalists’ parents belong to the highest income class in Finland, with an average annual salary of US$83,597. Both Olympians’ parents ($49,447) and Polytechnics’ parents ($46,721) earn middle-level salaries in the Finnish context.

Procedure

All the participants completed the SaaS questionnaire (Campbell, 1996a) based on Weiner’s (1974) self-attribution theory. The Mathematics Olympians’ data (n = 77) included 68 male and 9 female respondents. The sample is quite representative, as the total number of Finnish Mathematics Olympians is 84 (70 males and 14 females). The second group (n = 52) is a sample from about 200 secondary school national competitors in mathematics. The polytechnic student data (n = 74) is a fully representative sample of an advanced mathematics course held at Espoo-Vantaa Institute of Technology in Autumn 2001, the total number of participants in which is approximately 3,000. Olympiad data was collected between 1997 and 2002, the Polytechnic data in 2001, and the Prefinalists data between 2001 and 2002.

Table 1 shows that, except for the Polytechnics, gender is biased toward males. This finding is related to the well-documented tendency of females not to pursue careers in science even though they are as capable as males (e.g., Enman & Lupart, 2000), unless one or both of their parents are in the same field (Tirri, 2002). The student’s age is a good predictor for group membership.

© 2007 National Association for Gifted Children. All rights reserved. Not for commercial use or unauthorized distribution. Downloaded from http://gcq.sagepub.com by guest on October 16, 2007.

Measurement Instrument

The SaaS instrument was mailed to respondents in a traditional paper-and-pen form (see Table 2). The instrument used a 6-point Likert-type scale ranging from 1 = strongly disagree to 6 = strongly agree. The SaaS questionnaire included 18 items measuring the students’ attributions based on self-attribution theory (Weiner, 1974). Although Weiner’s original conceptualization contained four attributions (ability, effort, difficulty, and luck), the statistical analysis based on numerous empirical samples produced only two distinct scales: Effort and Ability (Campbell, 1996a, 1996b; Feng et al., 2001; Heller & Lengfelder, 2000; Tirri, 2001). In each of these studies, a consistent factor structure was found for the Ability and Effort scales. Statements linking success and effort produced high scores on the Effort scale. Statements on the Ability scale expressed the view that ability is more important than hard work.

In addition to SaaS, we asked for the following background information from the respondents: gender, age, number of programming languages known, and average of mathematics, physics, and chemistry secondary school grades.

Statistical Analysis

The data analysis began by examining all the items to see if they were technically applicable for linear statistical computations based on multivariate normality assumptions, such as exploratory factor analysis (EFA) and multivariate analysis of variance (MANOVA).

In the second stage, we conducted an exploratory factor analysis to answer the first research question: Are the four dimensions of the SaaS instrument (success because of ability, failure because of lack of ability, success because of effort, and failure because of lack of effort) present in the empirical sample?

Bayesian classification modeling (Silander & Tirri, 1999, 2000) was conducted in the third stage of the analysis to answer the second research question: What are the best predictors for the level of mathematical giftedness (high = Olympians, moderate = Prefinalists, and mild = Polytechnics) and gender among the SaaS variables? Bayesian classification modeling resembles linear discriminant analysis (LDA; Huberty, 1994), but it is free of most of the assumptions of Gaussian modeling (Nokelainen, Ruohotie, & Tirri, 1999; Nokelainen & Tirri, 2004).

In the fourth stage, we conducted MANOVA (with the Roy-Bargman step-down analysis and Bonferroni post hoc test) to see if the attribution styles differ by the level of mathematical giftedness or gender. When investigating more than one dependent variable, we applied factorial MANOVA instead of a series of ANOVAs, as it controls for the increasing risk of Type I error (falsely rejecting the null hypothesis when it is true).
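The Type I error inflation that motivates MANOVA over a series of ANOVAs can be illustrated with a short calculation. This is a minimal sketch, not part of the original analysis: the function name is hypothetical, and the formula assumes independent tests, which is only an approximation here because the SaaS factors are correlated.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false rejection across n_tests tests.

    Assumes the tests are independent (an approximation for correlated
    dependent variables such as the SaaS factors).
    """
    return 1.0 - (1.0 - alpha) ** n_tests


# One ANOVA per dependent variable, each at the conventional .05 level.
for k in (1, 2, 4):
    print(k, round(familywise_error_rate(0.05, k), 3))
```

With four dependent variables tested separately at .05, the chance of at least one false rejection under this assumption is well above .05, which is the risk the multivariate test is meant to control.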

Results

Investigating Variables’ Statistical Properties

In the first stage, a frequency analysis was carried out for all the variables. Results show that the respondents used the whole scale from 1 (totally disagree) to 6 (totally agree) for all items. According to Kerlinger (1986), before constructing one’s own questionnaire, “one should first ask the question: Is there a better way to measure my variables?” (p. 495). He classifies weaknesses of rating scales into extrinsic and intrinsic.

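Table 1
Description of Mathematics Olympians, Prefinalists, and Polytechnics Data

| | Olympians (n = 77): n | % | Prefinalists (n = 52): n | % | Polytechnics (n = 74): n | % |
|---|---|---|---|---|---|---|
| Gender: Male | 68 | 88 | 43 | 83 | 40 | 54 |
| Gender: Female | 9 | 12 | 9 | 17 | 34 | 46 |

| | Olympians | Prefinalists | Polytechnics |
|---|---|---|---|
| Age: Median | 37 | 17 | 24 |
| Age: Range | 20 to 55 | 15 to 20 | 20 to 34 |

Note: Total N of the data is 203.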


Table 2
Descriptive Statistics and Attribution Dimensions of the Self-Confidence Attribute Attitude Scale

| Item | Olympians (n = 77): M | SD | Prefinalists (n = 52): M | SD | Polytechnics (n = 74): M | SD | Stable – Unstable(a) | Controllable – Uncontrollable(b) |
|---|---|---|---|---|---|---|---|---|
| Effort (12 items) | | | | | | | | |
| 1. I did poorly only when I did not work hard enough. | 3.79 | 0.97 | 3.56 | 1.15 | 3.46 | 1.15 | U | C |
| 2. You can be successful in anything if you work hard enough at it. | 3.27 | 1.23 | 3.78 | 1.03 | 3.99 | 0.85 | U | C |
| 6. When I scored low on a test, it was because I didn’t study hard enough. | 3.92 | 0.79 | 3.63 | 0.97 | 3.53 | 0.95 | U | C |
| 8. My achievement would have been better if I tried harder. | 3.29 | 1.25 | 3.83 | 0.96 | 4.09 | 0.69 | U | C |
| 9. Self-discipline is the key to school success. | 3.27 | 0.91 | 3.47 | 0.92 | 3.41 | 1.06 | S | C |
| 10. The smart kids tried the hardest. | 2.43 | 0.93 | 2.81 | 0.95 | 2.22 | 0.93 | S | C |
| 11. Poor study habits are the main cause of low grades. | 3.32 | 0.91 | 3.44 | 0.98 | 3.54 | 1.04 | U | C |
| 12. I had to work hard to get good grades. | 2.11 | 0.95 | 2.08 | 0.95 | 2.74 | 1.03 | S | C |
| 15. When I didn’t understand something, it meant I didn’t put in enough time. | 3.69 | 0.87 | 3.58 | 1.07 | 3.51 | 0.94 | U | C |
| 16. I could have done better in mathematics if I had worked harder. | 3.11 | 1.17 | 3.81 | 1.05 | 4.01 | 0.85 | U | C |
| 17. Hard work is the key to get good grades. | 2.83 | 1.05 | 2.77 | 1.13 | 3.00 | 0.89 | S | C |
| 18. I let people down when I don’t work hard enough. | 2.64 | 1.05 | 2.62 | 1.14 | 2.45 | 1.09 | U | C |
| Ability (6 items) | | | | | | | | |
| 3. There are some things you cannot do no matter how hard you try. | 3.68 | 1.19 | 3.29 | 1.17 | 2.97 | 1.19 | S | U |
| 4. I worked harder if I liked the teacher. | 3.16 | 1.22 | 3.57 | 1.24 | 3.77 | 0.99 | U | C |
| 5. Being smart is more important than working hard. | 2.99 | 1.03 | 3.23 | 0.92 | 2.64 | 1.03 | S | U |
| 7. You have to have the ability in order to succeed in most things. | 3.92 | 0.74 | 3.90 | 0.69 | 3.86 | 0.75 | S | U |
| 13. When I did poorly in school it was because I did not have the needed ability. | 2.46 | 1.04 | 2.37 | 0.98 | 2.09 | 0.80 | S | U |
| 14. Why work in an area where your ability is low? | 2.81 | 1.03 | 2.75 | 1.23 | 2.58 | 1.16 | S | U |

a. Stable and unstable attributions refer to consistent or inconsistent cause over time.
b. Controllable and uncontrollable attributions refer to extent person believes he or she has control over the cause of an outcome.


The extrinsic defect is that scales are much too easy to construct and use. Sometimes a scale is used to measure things for which it is not appropriate. Kerlinger defines the intrinsic defect of rating scales as their proneness to constant error. He lists four main sources: halo effect, the error of severity (to rate all items too low), error of leniency (to rate all items too high), and error of central tendency (to avoid all extreme judgments). To address this issue, we analyzed the overall response tendency. We found that distribution of the modes on a 6-point Likert-type scale was multimodal and slightly biased toward positive values: (a) n = 0, (b) n = 5, (c) n = 0, (d) n = 11, (e) n = 0, and (f) n = 0.

Numerous publications declare that certain attributes belong to data appropriate for multivariate analysis (e.g., Bradley & Schaefer, 1998; Tabachnick & Fidell, 1996). The most commonly used criteria for accepting variables for multivariate analysis are as follows: (a) a standard deviation of no more than half the mean, (b) skewness less than +/– .3, and (c) correlation between +/– .3 and .7. When we examined the 18 items using the first two criteria, we noticed that all the items passed the first criterion, but only 4 items passed the second criterion. As it seemed impossible to take the second criterion literally because of a high rejection rate at the .03 level, we examined the skewness of items at three additional levels (.05, .07, and .08). The .07 level proved to be suitable for this data set, suggesting rejection of three items (#4, #6, and #7). A nonparametric interitem correlation matrix was produced to examine the third criterion. Thirteen items reached the desired level, as the values ranged from –.48 to .71 (M = .05, SD = .16). The rejected items were #4, #7, #10, #14, and #18. We examined multivariate normality with Mahalanobis distances. The maximum values for the two SaaS scales were below critical values obtained from the chi-square table (alpha = .001), thus not indicating the presence of outliers.
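The first two screening criteria can be sketched as a small check on a single item’s responses. This is an illustrative sketch under stated assumptions: the function names and the response vectors are hypothetical, the source does not say which standard deviation or skewness estimator was used (the sample standard deviation and the moment-based skewness are assumed here), and the skewness limit is passed in as a parameter set to the .07 level named in the text.

```python
from statistics import mean, pstdev, stdev


def skewness(values):
    """Moment-based skewness: third central moment divided by sd cubed."""
    m = mean(values)
    sd = pstdev(values)  # population sd, as used in the moment definition
    n = len(values)
    return sum((x - m) ** 3 for x in values) / (n * sd ** 3)


def passes_screening(values, max_abs_skew):
    """Return (sd_ok, skew_ok) for one item's responses.

    sd_ok:   sample standard deviation is no more than half the mean.
    skew_ok: absolute skewness is below the chosen limit.
    """
    sd_ok = stdev(values) <= mean(values) / 2
    skew_ok = abs(skewness(values)) < max_abs_skew
    return sd_ok, skew_ok


# Hypothetical responses on the 6-point scale (not data from the study).
symmetric_item = [1, 2, 3, 3, 4, 4, 5, 6]
skewed_item = [1, 1, 1, 1, 2, 2, 3, 6]

print(passes_screening(symmetric_item, max_abs_skew=0.07))
print(passes_screening(skewed_item, max_abs_skew=0.07))
```

Keeping the limit as a parameter mirrors the procedure described above, in which several skewness levels were tried before one was chosen.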

Finally, when we combine the results of the variable selection phase, it seems obvious that at least two items (#4, “I worked harder if I liked the teacher,” and #7, “You have to have the ability in order to succeed in most things”) should be omitted from further analysis.

Explorative Factor Analysis

Our next task, according to the first research question, was to see if the combined sample and three subsamples contained the following four dimensions: (a) success because of ability, (b) failure because of lack of ability, (c) success because of effort, and (d) failure because of lack of effort. We performed the analysis with 16 items, as the variable rejection based on communalities of two-dimensional principal components analysis structure did not appear to provide a feasible solution. Factor analysis with the maximum likelihood extraction method and direct oblimin rotation (delta value was set to zero, i.e., letting factors correlate) was conducted for the combined sample (N = 203) and for each sample separately (Olympians n = 77, Prefinalists n = 52, and Polytechnics n = 74).

A four-factor solution with eight items grouped the variables in all three subsamples and the combined sample as expected. Next, we present the eight items operationalizing the four SaaS factors. Factor 1, success because of ability, included only one variable: #5, “Being smart is more important than working hard.” The logic behind this solution was that the other two related items (#4, “I worked harder if I liked the teacher,” and #7, “You have to have the ability in order to succeed in most things”) were omitted from further analysis because they did not meet the assumptions of multivariate analysis. Factor 2, failure because of a lack of ability, included Items 3, “There are some things you cannot do no matter how hard you try,” and 13, “When I did poorly in school it was because I did not have the needed ability” (alpha = .62). Factor 3, success because of effort, included Item 9, “Self-discipline is the key to school success”; Item 12, “I had to work hard to get good grades”; and Item 17, “Hard work is the key to get good grades” (alpha = .63). Factor 4, failure because of a lack of effort, included Items 8, “My achievement would have been better if I tried harder,” and 16, “I could have done better in mathematics if I had tried harder” (alpha = .82). The Cronbach’s alpha values for the four factors within the three groups varied as follows: Olympians data (Factor 1 = not calculated, Factor 2 = .56, Factor 3 = .75, and Factor 4 = .76), Prefinalists data (Factor 1 = not calculated, Factor 2 = .60, Factor 3 = .67, and Factor 4 = .84), and Polytechnics data (Factor 1 = not calculated, Factor 2 = .59, Factor 3 = .42, and Factor 4 = .80).
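The reliability figures above are Cronbach’s alpha values. A minimal sketch of the standard formula follows; the function name and the response vectors are hypothetical and are not data from the study. The sketch also shows why no alpha can be reported for a factor with a single item, such as Factor 1.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` is a list of equal-length response lists, one list per item,
    with respondents in the same order in every list.
    """
    k = len(items)
    if k < 2:
        # With a single item there is no inter-item covariance to assess.
        raise ValueError("alpha needs at least two items")
    totals = [sum(responses) for responses in zip(*items)]
    sum_item_variances = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


# Hypothetical 6-point responses from six respondents on two items.
item_a = [2, 3, 3, 4, 5, 6]
item_b = [1, 3, 4, 4, 5, 5]
print(round(cronbach_alpha([item_a, item_b]), 2))
```

Because alpha depends on the number of items as well as on their intercorrelations, two- and three-item factors like those above tend to yield lower values than longer scales.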

Although we found only one item measuring the first SaaS dimension, success because of ability, correlations between factors behaved as expected (see Table 3). Ability and effort factors correlated negatively with each other, and both effort factors, as well as both ability factors, correlated positively.

Table 4 shows how four SaaS factors are related to internal, stable-unstable, and controllable-uncontrollable attribution dimensions. The stable-unstable dimension is important, as research has typically found that learners view ability as relatively stable (Alderman, 2004). For example, if a mathematically gifted person is convinced that he or she is not able to solve certain types of problems, it is an indication of an internal-stable attribution, and his or her failure appears to be fixed. This belief pattern is known as learned helplessness. However, there is also a third dimension: controllable-uncontrollable. As seen in Table 4, both ability factors represent uncontrollable attributions, and both effort factors represent controllable attributions. If the mathematically gifted person decides to continue solving problems related to areas he or she finds difficult, he or she has changed an uncontrollable attribution (ability) to one that he or she can control (effort). The least serious dimension for the learner’s self-esteem is unstable-controllable, because the first component refers to a situation that is temporary by nature and the second refers to an effort level that is adjustable. For example, a learner explains his or her failure in integral calculations by saying that certain routine procedures need more practice. No single SaaS item (see Table 2) or factor represents the last quarter of Table 4, unstable-uncontrollable, as it describes, for example, a situation in which a mathematically gifted person guesses correctly the answers to those competition exercises that he or she is unable to solve. Although we did not include external attributions in our design, we will demonstrate one by converting the latest internal-unstable-uncontrollable example to the form of external-unstable-uncontrollable. This is accomplished by replacing “guessing” (internal attribution) with an “easy test” (external attribution).

Bayesian Classification Modeling

We conducted the Bayesian classification modeling with the B-Course program (Myllymäki, Silander, Tirri, & Uronen, 2002) to find out which variables measuring attribution styles are the best predictors for the level of mathematical giftedness (high = Olympians, moderate = Prefinalists, and mild = Polytechnics) and gender (Research Question 2). In the classification process, the automatic search tried to find the best set of variables to predict the class variable for each data item. This procedure resembles the traditional LDA (Huberty, 1994), but the implementation is totally different. For example, a variable selection problem that is addressed with forward, backward, or stepwise selection procedure in LDA is replaced with a genetic algorithm approach (e.g., Hilario, Kalousis, Prados, & Binz, 2004; Hsu, 2004) in the Bayesian classification modeling. The genetic algorithm approach means that variable selection is not limited to one (or two or three) specific approach; instead, many approaches and their combinations are exploited. One possible approach is to begin with the presumption that the models (i.e., possible predictor variable combinations) that resemble each other a lot (i.e., have almost the same variables and discretizations) are likely to be almost equally good. This leads to a search strategy in which models that resemble the current best model are selected for comparison, instead of picking models randomly. Another approach is to abandon the habit of always rejecting the weakest model and instead collect a set of relatively good models. The next step is to combine the best parts of these models so that the resulting combined model is better than any of the original models. B-Course is capable of mobilizing many more viable approaches, for example, rejecting the better model (algorithms such as hill climbing, simulated annealing) or trying to avoid picking a similar model twice (tabu search).

Table 3
Correlations Between the Self-Confidence Attribute Attitude Scale Factors

| Factor | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1. Success because of ability (Item 5) | 1.000 | | | |
| 2. Failure because of lack of ability (Items 3, 13) | 0.135 | 1.000 | | |
| 3. Success because of effort (Items 9, 12, 17) | –0.194** | –0.052 | 1.000 | |
| 4. Failure because of lack of effort (Items 8, 16) | 0.151* | –0.286** | 0.171* | 1.000 |

Note: Item descriptions can be found in Table 2.
*p < .05. **p < .01 (two-tailed).

Table 4
Self-Confidence Attribute Attitude Scale Factors by Attribution Dimensions

| | Internal, controllable: Items | Factors | Internal, uncontrollable: Items | Factors |
|---|---|---|---|---|
| Stable | 9, 12, 17 | Success because of effort | 5 | Success because of ability |
| | | | 3, 13 | Failure because of lack of ability |
| Unstable | 8, 16 | Failure because of lack of effort | | |

Table 5
Importance Ranking of the Self-Confidence Attribute Attitude Scale Items by the Level of Giftedness and Gender

The level of giftedness(b)

| Predictor variable | Drop(a) % | Olympians (n = 77): M | SD | Prefinalists (n = 52): M | SD | Polytechnics (n = 74): M | SD |
|---|---|---|---|---|---|---|---|
| 10. The smart kids tried the hardest. | 14.36 | 2.43 | 0.93 | 2.81 | 0.95 | 2.22 | 0.93 |
| 16. I could have done better in mathematics if I had worked harder. | 7.92 | 3.11 | 1.17 | 3.81 | 1.05 | 4.01 | 0.85 |
| 12. I had to work hard to get good grades. | 6.93 | 2.11 | 0.95 | 2.08 | 0.95 | 2.74 | 1.03 |
| 5. Being smart is more important than working hard. | 4.95 | 2.99 | 1.03 | 3.23 | 0.92 | 2.64 | 1.03 |
| 3. There are some things you cannot do no matter how hard you try. | 3.96 | 3.68 | 1.19 | 3.29 | 1.17 | 2.97 | 1.19 |
| 4. I worked harder if I liked the teacher. | 2.48 | 3.16 | 1.22 | 3.57 | 1.24 | 3.77 | 0.99 |
| 8. My achievement would have been better if I tried harder. | 1.98 | 3.29 | 1.25 | 3.83 | 0.96 | 4.09 | 0.69 |

Gender(c)

| Predictor variable | Drop(a) % | Male (n = 149): M | SD | Female (n = 52): M | SD |
|---|---|---|---|---|---|
| 12. I had to work hard to get good grades. | 6.93 | 2.23 | 0.96 | 2.67 | 1.10 |
| 8. My achievement would have been better if I tried harder. | 6.44 | 3.68 | 1.13 | 3.85 | 0.87 |
| 2. You can be successful in anything if you work hard enough at it. | 3.96 | 3.65 | 1.14 | 3.71 | 0.98 |
| 1. I did poorly only when I did not work hard enough. | 3.47 | 3.67 | 1.11 | 3.43 | 1.03 |
| 14. Why work in an area where your ability is low? | 2.97 | 2.78 | 1.13 | 2.61 | 1.02 |
| 5. Being smart is more important than working hard. | 1.98 | 2.96 | 0.99 | 2.86 | 1.06 |

a. Decrease in predictive classification if item is dropped from the classification model.
b. Classification accuracy is 65.35%.
c. Classification accuracy is 79.70%.
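The search strategies described in this section can be sketched in a few lines. The following is a minimal illustration only, not B-Course’s actual implementation: it combines the idea of comparing models that resemble the current best model (subsets differing by one variable) with a simple tabu list of models already accepted. All names are hypothetical, and the toy scoring function stands in for an estimated classification accuracy.

```python
from typing import Callable, FrozenSet, Iterator, Tuple

Subset = FrozenSet[int]


def neighbours(subset: Subset, n_vars: int) -> Iterator[Subset]:
    """Yield every subset that differs from `subset` in exactly one variable."""
    for v in range(n_vars):
        yield subset ^ frozenset({v})  # toggles variable v in or out


def local_search(score: Callable[[Subset], float], n_vars: int,
                 max_steps: int = 100) -> Tuple[Subset, float]:
    """Move to the best unvisited neighbour while the score keeps improving."""
    current: Subset = frozenset()
    current_score = score(current)
    visited = {current}  # tabu list: models already accepted are not revisited
    for _ in range(max_steps):
        candidates = [s for s in neighbours(current, n_vars) if s not in visited]
        if not candidates:
            break
        best = max(candidates, key=score)
        best_score = score(best)
        if best_score <= current_score:
            break  # no neighbouring model is better; stop at a local optimum
        current, current_score = best, best_score
        visited.add(current)
    return current, current_score


# Hypothetical per-variable gains and a small penalty per variable used.
GAIN = {0: 3.0, 1: -1.0, 2: 2.0, 3: 0.5}


def toy_score(subset: Subset) -> float:
    return sum(GAIN[v] for v in subset) - 0.6 * len(subset)


best_subset, best_value = local_search(toy_score, n_vars=4)
print(sorted(best_subset), round(best_value, 2))
```

A search of this kind can stop at a local optimum, which is one reason to combine several strategies rather than rely on a single one.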

First, we derived the model for classifying data items according to the class variable level of mathematical giftedness (Olympians, Prefinalists, and Polytechnics) with 18 variables of the SaaS scale as predictors (items are listed in Table 2). The estimated classification accuracy for the model was 65.4%. Second, we derived the model for classifying data items according to the class variable gender. The estimated classification accuracy for the model was 79.7%.

Table 5 lists the variables ordered by their estimated classification performance in the model. The strongest variables (that is, those that discriminate the independent variables best) are listed first. The percentage value attached to each variable indicates the predicted decrease in the classification performance if the variable were to be dropped from the model. The table shows that the variables in the first two models, level of giftedness and gender, have a clear order of importance. The most important variable for both models is Item 12, “I had to work hard to get good grades.” If we remove that variable from the first model, it would weaken the performance from 65.4% to 58.4%. Removal of the variable from the second model would weaken the performance from 79.7% to 72.8%.
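The arithmetic behind these figures is a simple subtraction of the drop percentage from the model’s estimated accuracy. A short sketch using the values reported in Table 5 for Item 12 (the function name is hypothetical):

```python
def accuracy_after_drop(accuracy_pct: float, drop_pct: float) -> float:
    """Predicted classification accuracy once a variable is removed."""
    return accuracy_pct - drop_pct


# Accuracies come from the Table 5 footnotes; 6.93 is the drop for Item 12.
giftedness_model = accuracy_after_drop(65.35, 6.93)
gender_model = accuracy_after_drop(79.70, 6.93)
print(round(giftedness_model, 2), round(gender_model, 2))
```

Rounded to one decimal place, the two results correspond to the 58.4% and 72.8% quoted in the text.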

Differences in the group means in Table 5 show that the first classification between three groups of the mathematically gifted is based on effort attributions as five items out of seven measure success or failure because of effort. The first item, 10, “The smart kids tried the hardest,” is the best overall predictor variable. However, the other items are more interesting, as they show that both mildly (Polytechnics) and moderately (Prefinalists) mathematically gifted individuals attribute failure to lack of effort, but only mildly mathematically gifted individuals attribute success to effort. Highly (Olympians) and moderately (Prefinalists) mathematically gifted individuals prefer ability as an explanation for their success.

Females in this sample tend to attribute success to effort more than males. Furthermore, they are also more likely to attribute failure to lack of ability than males. Both findings are consistent with existing research (e.g., Alderman, 2004; Vermeer, Boekaerts, & Seegers, 2000). However, we note that the “female voice” in this study belongs mostly to those who are mathematically mildly gifted, as they are members of the Polytechnics group. This explains at least to some extent why items measuring effort have such an important role in the first two classification models.

Multivariate Analysis of Variance

We investigated the third research question, “Do the attribution styles differ by the level of mathematical giftedness and gender?” with a 3 × 2 factorial multivariate analysis of variance. The four dependent variables were SaaS factors based on both theoretical assumptions (Weiner, 1974) and the results of the preceding EFA: success because of ability, failure because of lack of ability, success because of effort, and failure because of lack of effort. The independent variables were the level of mathematical giftedness and gender. Preliminary assumption testing was conducted to check for normality, linearity, univariate and multivariate outliers, homogeneity of variance-covariance matrices, and multicollinearity. No violations were discovered except for the fourth factor, failure because of lack of effort, for which the test of homogeneity of variance was not met (Levene’s p < .001, Cochran’s p = .014, Bartlett-Box’s p = .002). Larger variances indicate that a .05 alpha level is overstated and the differences should be assessed using a lower value (e.g., .03; Hair, Anderson, Tatham, & Black, 1998). For such dependent variables, Tabachnick and Fidell (1996) suggest using Pillai’s criterion instead of Wilks’s lambda.

With the use of Pillai’s criterion, the level of mathematical giftedness multivariate main effect on the SaaS factors was found to be significant, F(8, 384) = 4.33, p < .001. The gender multivariate main effect on the SaaS factors was not found to be significant, F(4, 191) = 0.23, p = .992. The level of mathematical giftedness and gender multivariate interaction was not found to be significant, F(8, 384) = 0.45, p = .893. The results reflected a modest association between the three groups of mathematically gifted and the SaaS factors, partial η² = .08 (Pillai’s trace) to .16 (Roy’s largest root). This finding suggests that group membership explains from 8% to 16% of the variance in attribution styles. The achieved statistical power for this main effect was 1.0.

To further investigate the impact of the level of mathematical giftedness main effect on the SaaS, a Roy-Bargman step-down analysis was performed. Step-down analysis resolves the problem of correlated univariate F tests with correlated dependent variables (Tabachnick & Fidell, 1996). The following priority order of SaaS factors from most to least important was developed on the basis of theoretical assumptions, instead of assigning priority on the basis of univariate F, to avoid problems inherent in stepwise regression: (a) success because of ability, (b) failure because of lack of ability, (c) success because of effort, and (d) failure because of lack of effort. In step-down analysis, each SaaS factor was analyzed, in turn, with higher priority factors treated as covariates and with the highest priority SaaS factor (success because of ability) tested in a univariate ANOVA. In addition, univariate F values were calculated to allow correct interpretation of the step-down analysis. Results of the analysis are summarized in Table 6. An experiment-wise error rate of 5% was achieved by the apportionment of alpha, as shown in the last column of Table 6, for each of the SaaS factors.
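The apportionment of alpha can be illustrated as follows. This sketch assumes an equal split of the 5% experiment-wise rate across the four SaaS factors; the source does not state the split explicitly, although an equal split is consistent with the rounded .01 per-factor value in Table 6. The function name is hypothetical.

```python
def apportion_alpha(experimentwise_alpha: float, n_outcomes: int) -> float:
    """Split an experiment-wise alpha equally across the outcome variables."""
    return experimentwise_alpha / n_outcomes


# Assumed equal split across the four SaaS factors.
per_factor_alpha = apportion_alpha(0.05, 4)

# Bonferroni bound: the summed per-factor alphas cap the experiment-wise rate.
upper_bound = 4 * per_factor_alpha
print(per_factor_alpha, upper_bound)
```

Under this assumption each factor is tested at .0125, so the combined risk of a false rejection across the four step-down tests cannot exceed the intended 5%.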

A unique contribution to predicting differences between the three groups of mathematically gifted was made by the success because of ability factor, step-down F(2, 194) = 3.28, p = .040, η² = .03. According to Cohen (1988), the effect size is small, indicating that only 3% of the variance in the dependent variable is attributable to differences in mathematical giftedness. Highly (mean success because of ability = 3.45, SE = .07) and moderately (mean success because of ability = 3.56, SE = .09) mathematically gifted individuals evaluated the role of ability to be higher than the mildly mathematically gifted did (mean success because of ability = 3.28, SE = .07). The mean difference between the moderately and mildly gifted reached statistical significance using Bonferroni adjusted alpha level of .013, p = .040. This result is consistent with both Weiner’s (1986) findings showing high-achieving students’ tendency to use internal-stable-uncontrollable causal attributions for success and Heller and Lengfelder’s (2000) findings for German Olympians and Prefinalists. However, studies of American and Taiwanese Olympians showed opposite results, as participants referred more to effort than ability attributions (Campbell, 1996b; Feng et al., 2001; Wu & Chen, 2001).
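The effect sizes reported with the step-down F tests can be recovered from each F value and its degrees of freedom. The source does not say how η² was computed; the sketch below assumes the standard partial eta squared identity, which reproduces the values listed in Table 6. The function and dictionary names are hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Step-down F values and degrees of freedom for the level of giftedness,
# taken from Table 6.
step_down_tests = {
    "SUC_ABI": (3.28, 2, 194),
    "FAI_ABI": (11.20, 2, 193),
    "SUC_EFF": (3.05, 2, 192),
    "FAI_EFF": (11.43, 2, 191),
}

effect_sizes = {
    name: round(partial_eta_squared(f, df1, df2), 2)
    for name, (f, df1, df2) in step_down_tests.items()
}
print(effect_sizes)
```

The same identity applied to the multivariate test, F(8, 384) = 4.33, gives approximately .08, which agrees with the Pillai’s trace value reported for the main effect.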

After the pattern of differences measured by the first SaaS factor was entered, a difference was also found on failure because of lack of ability, step-down F(2, 193) = 11.20, p < .001, η² = .10. The effect size of this finding is moderate, indicating that 10% of variance in the dependent variable is attributable to differences in mathematical giftedness. Mathematically highly gifted (adjusted mean failure because of lack of ability = 3.00, SE = .09) students attributed failure to lack of ability more than moderately (adjusted mean failure because of lack of ability = 2.60, SE = .11) and mildly (adjusted mean failure because of lack of ability = 2.38, SE = .09) mathematically gifted. The mean difference between highly and moderately mathematically gifted students reached statistical significance using Bonferroni adjusted alpha level of .013, p = .028. Also, the mean difference between the highly and mildly mathematically gifted reached statistical significance using Bonferroni adjusted alpha level of .013, p < .001. This internal-stable-uncontrollable finding, which according to Marsh (1983) may lead to low academic self-concept and achievement, is against Weiner’s internal-unstable-controllable expectation for high achievers who have failed.

Table 6
Tests of the Level of Giftedness, Gender, and Their Interaction

| IV | DV | Univariate F | df | Step-down F(a) | df | α | η² |
|---|---|---|---|---|---|---|---|
| The level of giftedness | SUC_ABI | 3.28* | 2, 194 | 3.28* | 2, 194 | .01 | .03 |
| | FAI_ABI | 11.99*** | 2, 194 | 11.20*** | 2, 193 | .01 | .10 |
| | SUC_EFF | 4.12* | 2, 194 | 3.05* | 2, 192 | .01 | .03 |
| | FAI_EFF | 17.80*** | 2, 194 | 11.43*** | 2, 191 | .01 | .11 |
| Gender | SUC_ABI | 0.00 | 1, 194 | 0.00 | 1, 194 | .01 | .00 |
| | FAI_ABI | 0.91 | 1, 194 | 0.92 | 1, 193 | .01 | .00 |
| | SUC_EFF | 0.13 | 1, 194 | 0.11 | 1, 192 | .01 | .00 |
| | FAI_EFF | 0.18 | 1, 194 | 0.09 | 1, 191 | .01 | .00 |
| Group by gender | SUC_ABI | 1.00 | 2, 194 | 1.00 | 2, 194 | .01 | .01 |
| | FAI_ABI | 0.09 | 2, 194 | 0.16 | 2, 193 | .01 | .00 |
| | SUC_EFF | 0.23 | 2, 194 | 0.38 | 2, 192 | .01 | .00 |
| | FAI_EFF | 0.73 | 2, 194 | 0.26 | 2, 191 | .01 | .00 |

Note: α = adjusted alpha level. η² = effect size for the step-down F. SUC_ABI = success because of ability; FAI_ABI = failure because of lack of ability; SUC_EFF = success because of effort; FAI_EFF = failure because of lack of effort.
a. Roy-Bargman step-down F.
*p < .05. **p < .01. ***p < .001.

The third step in the analysis was to enter the success because of effort factor. This step-down reached statistical significance, although with a small effect size, F(2, 192) = 3.05, p < .05, η² = .03. Mildly mathematically gifted individuals valued success because of effort higher (adjusted mean success because of effort = 3.04, SE = .09) than those who were highly (adjusted mean success because of effort = 2.74, SE = .09) and moderately gifted (adjusted mean success because of effort = 2.79, SE = .10). The mean difference between mildly and highly gifted students reached statistical significance using Bonferroni adjusted alpha level of .013, p = .029. This result showing mildly mathematically gifted Polytechnics preferring internal-unstable-controllable attributions was expected.

After the pattern of differences measured by success because of ability, failure because of lack of ability, and success because of effort was entered, a difference was also found on the attitude toward failure because of lack of effort, step-down F(2, 191) = 11.43, p < .001, η² = .11. According to Cohen (1988), the effect size for this finding is classified as large. Mildly (adjusted mean failure because of lack of effort = 3.99, SE = .11) and moderately (adjusted mean failure because of lack of effort = 3.77, SE = .12) mathematically gifted students attributed failure to lack of effort more than the highly gifted did (adjusted mean failure because of lack of effort = 3.27, SE = .11). The mean difference between mildly gifted Polytechnics and highly gifted Olympians reached statistical significance using Bonferroni adjusted alpha level of .013, p < .001. In addition, the mean difference between moderately gifted Prefinalists and highly gifted Olympians reached statistical significance using Bonferroni adjusted alpha level of .013, p < .001. The group-level results of this second SaaS failure factor were not congruent with theoretical expectations. We expected to see mildly gifted individuals prefer internal-stable-uncontrollable attributions (that is, failure because of lack of ability) instead of internal-unstable-controllable.
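The effect sizes reported for the two step-down tests can be cross-checked against their F statistics. The sketch below assumes that the reported η² values are partial eta squared, which the text does not state explicitly; the function name is illustrative and not part of the original analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    Assumption: the eta squared values in the text are partial eta squared.
    """
    if df_effect <= 0 or df_error <= 0:
        raise ValueError("degrees of freedom must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Step-down test for success because of effort: F(2, 192) = 3.05
effort_success = partial_eta_squared(3.05, 2, 192)

# Step-down test for failure because of lack of effort: F(2, 191) = 11.43
effort_failure = partial_eta_squared(11.43, 2, 191)

print(round(effort_success, 2), round(effort_failure, 2))  # prints 0.03 0.11
```

Under that assumption, both recomputed values agree with the reported η² = .03 and η² = .11.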

Summary of Results

In this article, we have examined the influence of attribution styles on the development of mathematical talent in three groups of mathematically gifted Finnish adolescents and adults (N = 203).

All the participants completed the SaaS questionnaire (Campbell, 1996a). The instrument included 18 items, based on Weiner’s (1974) attribution theory, measuring the students’ ability and effort attributions on four dimensions: (a) success because of ability, (b) failure because of lack of ability, (c) success because of effort, and (d) failure because of lack of effort.

The research questions in this study were as follows: (a) Are the four dimensions of the SaaS instrument (success because of ability, failure because of lack of ability, success because of effort, and failure because of lack of effort) present in the empirical sample? (b) What are the best predictors for the level of mathematical giftedness and gender among the SaaS scale variables? and (c) Do the attribution styles differ by the level of mathematical giftedness (Olympians, Prefinalists, and Polytechnics) or gender?

The first research question was addressed with EFA. The results showed that the four dimensions of the SaaS were present in all samples. The overall alpha values ranged from .62 to .82. Three alpha values were less than .60 at the group level (Olympians, Factor 2 alpha = .56; Polytechnics, Factor 2 alpha = .59 and Factor 3 alpha = .42).
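The alpha values above are internal consistency coefficients. As a minimal sketch, assuming they are Cronbach's alpha computed from item scores in the usual way, the calculation looks as follows. The function name and the response data are hypothetical and are not taken from the SaaS data set.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for a scale.

    items[i][j] is the score of respondent j on item i.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are needed")
    # Sample variance of each item across respondents
    item_variance_sum = sum(variance(item) for item in items)
    # Total scale score per respondent
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses of four people to a three-item scale
responses = [
    [2, 3, 4, 5],
    [3, 3, 4, 4],
    [1, 3, 3, 5],
]
print(round(cronbach_alpha(responses), 2))
```

Items that rank respondents in the same order push alpha toward 1, while unrelated items push it toward 0, which is why a factor with alpha = .42 is treated with caution.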

The second research question was analyzed using Bayesian classification modeling. The classification variables were the level of mathematical giftedness (high = Olympians, moderate = Prefinalists, and mild = Polytechnics) and gender. Eighteen SaaS items were predictors in all the analyses. The results showed that both Polytechnic students and females think that they “had to work hard to get good grades.” When we further examined the females’ preference for effort as a cause for success, we learned that the result was true only for the Polytechnics and Prefinalists samples, as there was no difference between female and male Olympians’ responses. Thus, Verna and Campbell’s (2000) earlier finding that female Chemistry Olympians considered ability to be a more important factor for success was not repeated in this study. Failure because of lack of ability was the only self-attribute scale that was able to predict respondents’ age. The youngest students (15 to 28 years old) believed more in their abilities than the older ones (29 to 41 and 42 to 55 years old). We explain this finding with the fact that the younger individuals have not yet reached as high a level in their mathematical studies as the older ones and thus have not yet realized that “the more you know, the more you know you ought to know.”

The third research question was analyzed with a 3 × 2 factorial design MANOVA. Dependent variables were four SaaS factors (success because of ability, failure because of lack of ability, success because of effort, and failure because of lack of effort). Independent variables were the level of mathematical giftedness (high = Olympians, moderate = Prefinalists, and mild = Polytechnics) and gender. Results showed that the multivariate main effect of the level of mathematical giftedness on the SaaS factors was significant. The multivariate main effect of gender on the SaaS factors and the multivariate interaction between the level of mathematical giftedness and gender were not significant. Highly and moderately mathematically gifted individuals felt that ability is more important to success than effort. According to Dai et al. (1998), such attributions represent self-awareness of high potentialities that constitute a necessary but not sufficient condition for high levels of performance. Mildly mathematically gifted individuals tended to see effort leading to success. Mildly and moderately mathematically gifted students attributed failure to lack of effort. The highly gifted attributed failure to lack of ability. This finding is related to self-concept, as mildly and moderately mathematically gifted individuals tend to judge their efficacy favorably, whereas the highly gifted are likely to base their appraisals of self-efficacy on the actual difficulty levels of the tasks in question (see Dai et al., 1998, for discussion).

Limitations of the Study

In this study, we measured attribution styles with a questionnaire. Such self-reporting allows us to study, as opposed to attribution appraisals or causal beliefs, hypothetical success and failure situations without clear reference to who is the performer. The first possible source of error is the SaaS instrument translation from English to Finnish. To control the error variance because of translation, all the items were translated back to English and compared with the original items. However, no pilot study with correlational analyses was conducted. A second possible source of error is cross-cultural differences between U.S. and Finnish mathematicians, as the original instrument was developed for studies among the U.S. mathematics Olympians. Fortunately, the language of the SaaS items is free of cultural references. A third possible source of error is the psychometric properties of the SaaS instrument itself, as no alpha values were reported in the original study (Campbell, 1996a).

Discussion

The theoretical idea of Ellström (2001) essential to this study is that attributions for success or failure affect potential competence, which is a human resource each individual brings to the mathematical problem-solving situation. So what are the “good” attributions that give rise to an individual’s potential competence? The previous research body shows two trends: The first is seeing ability as a more important explanation for success than effort (“ability is everything”; e.g., Heller & Lengfelder, 2000), and another is claiming that “ability without effort goes nowhere” (e.g., Campbell, 1996b; Chan, 1996; Feng et al., 2001; Wu & Chen, 2001).

Why do several international Academic Olympian research teams end up with contradictory results? The first natural explanation is cultural differences (Campbell, Tirri, Ruohotie, & Walberg, 2004). We suggest here that the second natural explanation is statistical. In all of the aforementioned studies, effort and ability attributions were calculated as mean scores of 18 SaaS items: the Effort scale was measured with 12 items and the Ability scale with 6 items (see Table 2). The results of variable selection and EFA indicated that, in the Finnish sample, keeping all 18 SaaS items in the model does not lead to a psychometrically justified solution (α_effort = .63 and α_ability = .27). We forced the two-factor solution and calculated internationally comparable mean scores for the Finnish Olympian sample (n = 77) but found no difference between ability (M = 3.16, SD = 0.32) and effort attributions (M = 3.18, SD = 0.45). It should also be noted that the reported mean difference between Ability (M = 2.91, SD = 0.58) and Effort (M = 3.21, SD = 0.62) scales in the American study (Campbell, 1996b) is quite small.

The third trend in research (e.g., Schunk & Ertmer, 2000; Zimmerman, 2000) says that ability and effort without self-regulation go nowhere. Figure 1 shows that ability is present in the “Self-Reflection” phase that precedes the “Forethought” phase, where individuals set goals and plan actions. At this point, it is important to make a distinction between an individual’s real and self-perceived ability level. The real ability level is what the first research trend is speaking about, but according to the concept of self-regulation, it is indistinguishable from the self-perceived ability. This notion comes from the fact that effort is represented in the preceding phase, “Performance or Volitional Control,” in the cyclical self-regulation process described in Figure 1 via self-control. It is therefore sending adjusting signals via self-observation to the individual’s self-perceived ability. Our results that show highly mathematically gifted preferring ability over effort as an explanation for success or failure might indicate that their perceived ability level is so high that they are able to meet the most demanding challenges by adjusting their efforts. According to the “attribution asymmetry” phenomenon (Dai et al., 1998), high-ability students tend to attribute their success to both ability and effort. According to Dai et al., attributing success to ability represents self-awareness of high potentialities that constitute a necessary but not sufficient condition for success. Attributing success to effort has a self-enhancing and motivating effect, as one feels in control of one’s own development.

It is not our primary purpose to compare ability and effort only with questions that ask “Which is a better explanation for success?” but rather to continue with further questions, such as “What are the best developmental practices for those individuals who mostly prefer effort over ability as an explanation for success in a given task?” As Campbell (1995) says, “achievers need four qualities: ability, discipline, confidence and good working habits” (p. 186). Thus, a high effort level as a cause for success may indicate that tasks are too demanding and thus individuals feel that too much effort is needed to accomplish the task. Furthermore, this might indicate that an individual needs more support to be convinced that he or she has the ability to succeed. Research has shown that ability is often viewed as a stable and uncontrollable attribution (e.g., SaaS Item 7, “You have to have the ability in order to succeed in most things”). The worst scenario for a mathematician according to some of the self-attribution theorists (e.g., Dweck, 1999) is to blame personal ability for failure, as it is believed to be something that you either do or do not have, an internal and stable attribution. It is interesting that the most highly mathematically gifted in this study scored highest in this respect when compared to those who were moderately and mildly gifted. This finding makes sense if we agree that Olympians represent the highest mathematical giftedness level in this study and thus have probably faced a lot more demanding mathematical tasks during their lifetime than members of the other two groups.

Kay Alderman (2004) suggests that it is up to the teacher or trainer to convince a learner or trainee that mathematical thinking ability as a skill or knowledge is a learnable, unstable quality. Thus, knowledge of how learners or trainees use attributions to account for success and failure can help teachers or trainers predict their expectancies and plan intervention strategies when needed.


References

Alderman, K. (2004). Motivation for achievement. Possibilities for teaching and learning. Mahwah, NJ: Lawrence Erlbaum.

Biggs, J. (1985). The role of metalearning in study processes. British Journal of Educational Psychology, 55, 185-212.

Bradley, W. J., & Schaefer, K. C. (1998). The uses and misuses of data and models. Thousand Oaks, CA: Sage.

Campbell, J. R. (1994). Developing cross-cultural/cross-national methods and procedures. International Journal of Educational Research, 21(7), 675-684.

Campbell, J. R. (1995). Raising your child to be gifted. Cambridge, MA: Brookline Books.

Campbell, J. R. (1996a). Developing cross-national instruments: Using cross-national methods and procedures. International Journal of Educational Research, 25(6), 485-496.

Campbell, J. R. (1996b). Early identification of mathematics talent has long-term positive consequences for career contributions. International Journal of Educational Research, 25(6), 497-522.

Campbell, J. R., Tirri, K., Ruohotie, P., & Walberg, H. (Eds.). (2004). Cross-cultural research: Basic issues, dilemmas, and strategies. Hämeenlinna, Finland: RCVE.

Chan, L. (1996). Motivational orientations and metacognitive abilities of intellectually gifted students. Gifted Child Quarterly, 40, 184-193.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

Dai, D. Y., Moon, S. M., & Feldhusen, J. F. (1998). Achievement motivation and gifted students: A social cognitive perspective. Educational Psychologist, 33(2/3), 45-63.

Dweck, C. (1999). Self theories: Their role in motivation, personality and development. Philadelphia: Psychology Press.

Ellström, P.-E. (2001). The many meanings of occupational competence and qualification. In W. J. Nijhof & J. N. Streumer (Eds.), Key qualifications in work and education (pp. 39-50). Dordrecht, the Netherlands: Kluwer Academic.

Enman, M., & Lupart, J. (2000). Talented female students’ resistance to science: An exploratory study of post-secondary achievement motivation, persistence, and epistemological characteristics. High Ability Studies, 11(2), 161-178.

Feng, A., Campbell, J. R., & Verna, M. (2001). The talent development of American Physics Olympians. Gifted and Talented International, 16(2), 108-114.


Hair, J. F., Anderson, R. E., Tatham, R. L., & Black, W. C. (1998). Multivariate data analysis (5th ed.). Englewood Cliffs, NJ: Prentice Hall.

Heider, F. (1958). The psychology of interpersonal relationships. New York: Wiley.

Heller, K., & Lengfelder, A. (2000, April). German Olympiad study on mathematics, physics and chemistry. Paper presented at the annual meeting of American Educational Research Association, New Orleans, LA.

Hilario, M., Kalousis, A., Prados, J., & Binz, P.-A. (2004). Data mining for mass-spectra based diagnosis and biomarker discovery. Drug Discovery Today: BIOSILICO, 2(5), 214-222.

Hsu, W. H. (2004). Genetic wrappers for feature selection in decision tree induction and variable ordering in Bayesian network structure learning. Information Sciences, 163(1-3), 103-122.

Huberty, C. (1994). Applied discriminant analysis. New York: John Wiley & Sons.

Kerlinger, F. (1986). Foundations of behavioral research (3rd ed.). New York: CBS College Publishing.

Kerr, B. (1994). Smart girls: A new psychology of girls, women and giftedness (2nd ed.). Scottsdale, AZ: Gifted Psychology Press.

Marsh, H. (1983). Relationships among dimensions of self-attribution, dimensions of self-concept and academic achievements. (ERIC Document Reproduction Service No. ED 243 914)

Marsh, H., & O’Neill, R. (1984). Self Description Questionnaire III: The construct validity of multidimensional self-concept ratings by late adolescents. Journal of Educational Measurement, 21, 153-174.

Multon, K. D., Brown, S. D., & Lent, R. W. (1991). Relation of self-efficacy beliefs to academic outcomes: A meta-analytic investigation. Journal of Counseling Psychology, 38, 30-38.

Myllymäki, P., Silander, T., Tirri, H., & Uronen, P. (2002). B-Course: A Web-based tool for Bayesian and causal data analysis. International Journal on Artificial Intelligence Tools, 11(3), 369-387.

Nokelainen, P., Ruohotie, P., & Tirri, H. (1999). Professional growth determinants: Comparing Bayesian and linear approaches to classification. In P. Ruohotie, H. Tirri, P. Nokelainen, & T. Silander (Eds.), Modern modeling of professional growth (Vol. 1, pp. 85-120). Hämeenlinna, Finland: RCVE.

Nokelainen, P., & Tirri, H. (2004). Bayesian methods that optimize cross-cultural data analysis. In J. R. Campbell, K. Tirri, P. Ruohotie, & H. Walberg (Eds.), Cross-cultural research: Basic issues, dilemmas, and strategies (pp. 141-158). Hämeenlinna, Finland: RCVE.

Nokelainen, P., Tirri, K., & Campbell, J. R. (2004). Cross-cultural predictors of mathematical talent and academic productivity. High Ability Studies, 15(2), 230-242.

Nokelainen, P., Tirri, K., Campbell, J. R., & Walberg, H. (2004). Isolating factors that contribute or hinder adult productivity: Comparing the Terman longitudinal studies with the retrospective Olympiad studies. In J. R. Campbell, K. Tirri, P. Ruohotie, & H. Walberg (Eds.), Cross-cultural research: Basic issues, dilemmas, and strategies (pp. 119-140). Hämeenlinna, Finland: RCVE.

Organization for Economic Co-Operation and Development. (2001). Knowledge and skills for life. First results from the OECD Programme for International Student Assessment (PISA) 2000. Paris: Author.


Organization for Economic Co-Operation and Development. (2004). Learning for tomorrow’s world. First results from PISA 2003. Paris: Author.

Ramsden, P., & Entwistle, N. (1981). Effects of academic departments on students’ approaches to studying. British Journal of Educational Psychology, 51, 368-383.

Reis, S. (1998). Work left undone. Mansfield Center, CT: Creative Learning Press.

Ruohotie, P., & Nokelainen, P. (2000). Modern modeling of student motivation and self-regulated learning. In P. R. Pintrich & P. Ruohotie (Eds.), Conative constructs and self-regulated learning (pp. 141-193). Hämeenlinna, Finland: RCVE.

Schunk, D. H., & Ertmer, P. A. (2000). Self-regulation and academic learning. In M. Boekaerts, P. R. Pintrich, & M. Zeidner (Eds.), Handbook of self-regulation (pp. 631-650). San Diego, CA: Academic Press.

Siegle, D., & Reis, S. (1998). Gender differences in teacher and student perceptions of gifted students’ ability and effort. Gifted Child Quarterly, 42(1), 39-47.

Silander, T., & Tirri, H. (1999). Bayesian classification. In P. Ruohotie, H. Tirri, P. Nokelainen, & T. Silander (Eds.), Modern modeling of professional growth (Vol. 1, pp. 61-84). Hämeenlinna, Finland: RCVE.

Silander, T., & Tirri, H. (2000, April). Model selection for Bayesian networks. Paper presented at the annual meeting of American Educational Research Association, New Orleans, LA.

Strein, W. (1995). Assessment of self-concept. (ERIC Document Reproduction Service No. ED 389 962)

Tabachnick, B., & Fidell, L. (1996). Using multivariate statistics. New York: HarperCollins.

Tirri, K. (2001). Finland Olympiad studies: What factors contribute to the development of academic talent in Finland? Educating Able Children, 5(2), 56-66.

Tirri, K. (2002). Developing females’ talent: Case studies of Finnish Olympians. Journal of Research in Education, 12(1), 80-85.

Tirri, K., & Campbell, J. (2002). Actualizing mathematical giftedness in adulthood. Educating Able Children, 6(1), 14-20.

Vermeer, H. J., Boekaerts, M., & Seegers, G. (2000). Motivational and gender differences: Sixth-grade students’ mathematical problem-solving behavior. Journal of Educational Psychology, 92, 308-315.

Verna, M., & Campbell, J. R. (2000, April). Career orientations for American chemistry Olympians. Paper presented at the annual meeting of American Educational Research Association, New Orleans, LA.

Weiner, B. (1974). Achievement motivation and attribution theory. Morristown, NJ: General Learning Press.

Weiner, B. (1980). The role of affect in rational (attributional) approaches to human motivation. Educational Researcher, 9, 4-11.

Weiner, B. (1986). An attributional theory of motivation and emotion. New York: Springer.

Weiner, B. (1992). Human motivation: Metaphors, theories and research. Newbury Park, CA: Sage.

Weiner, B. (1994). Integrating social and personal theories of achievement striving. Review of Educational Research, 64, 557-573.

Weiner, B. (2000). Intrapersonal and interpersonal theories of motivation from an attributional perspective. Educational Psychology Review, 12(1), 1-14.


Wu, W., & Chen, J. (2001). A follow-up study of Taiwan physics and chemistry Olympians: The role of environmental influences in talent development. Gifted and Talented International, 16(1), 16-26.

Zimmerman, B. J. (1998). Developing self-fulfilling cycles of academic regulation: An analysis of exemplary instructional models. In D. H. Schunk & B. J. Zimmerman (Eds.), Self-regulated learning: From teaching to self-reflective practice (pp. 1-19). New York: Guilford.

Zimmerman, B. J. (2000). Attaining self-regulation. A social cognitive perspective. In M. Boekaerts, P. R. Pintrich, & M. Zeidner (Eds.), Handbook of self-regulation (pp. 13-39). San Diego, CA: Academic Press.

Petri Nokelainen, Ed. Lic., is a special researcher at the Research Centre for Vocational Education, University of Tampere, Finland. His research interests lie in the study of applied statistical modeling, gifted education, modern network-based learning, and professional growth.

Kirsi Tirri is a professor at the Department of Practical Theology, University of Helsinki, Finland. Her research interests include gifted education, teacher training, moral education, and cross-cultural studies.

Hanna-Leena Merenti-Välimäki is a principal lecturer of mathematics at the Espoo-Vantaa Institute of Technology. Her research interests include mathematical education, mathematical modeling, and statistical and technical analysis.


International Journal of Behavioral Development 2006, 30 (1), 76–83

http://www.sagepublications.com

© 2006 The International Society for the Study of Behavioural Development

DOI: 10.1177/0165025406062127

Religiosity, moral attitudes and moral competence: A critical investigation of the religiosity–morality relation

Bart Duriez and Bart Soenens
Department of Psychology, Katholieke Universiteit Leuven, Belgium

The present study investigates the relation between the religiosity dimensions which Wulff (1991) described (Exclusion versus Inclusion of Transcendence and Literal versus Symbolic) and both moral attitudes and moral competence. The Post-Critical Belief Scale (Duriez, Fontaine, & Hutsebaut, 2000) was used as a measure of Wulff’s religiosity dimensions, and the Moral Judgment Test (Lind, 1998) was used to measure both moral attitudes and moral competence. Results from a middle adolescent sample (N = 338), a university sample (N = 336) and an adult sample (N = 336) suggest that whereas the Literal versus Symbolic dimension shows substantial relations with moral attitudes and moral competence, the Exclusion versus Inclusion of Transcendence dimension is unrelated to both of them. This suggests that, although there is no intrinsic relationship between religiosity and morality, the way people process religious contents is predictive of the way they deal with moral issues.

Correspondence concerning this article should be sent to Bart Duriez, Katholieke Universiteit Leuven, Department of Psychology, Tiensestraat 102, 3000 Leuven, Belgium; e-mail: [email protected]. Tel: +32(0)16/32.59.62. Fax: 32(0)16/32.60.00

Downloaded from http://jbd.sagepub.com at Stanford University on December 2, 2008

Introduction

According to Rest (1983), moral behavior is the result of at least four component processes: (1) identifying a situation as a moral problem, (2) figuring out what one ought to do and evaluating possible plans of action, (3) evaluating how the various courses of action serve moral and nonmoral values and deciding which action will be pursued, and (4) executing the plan of action. Most of the research and theorizing on moral development has focused on the second component: the phase in which one needs to figure out what ought to be done, given the fact that a situation is perceived as a moral problem. As Thoma, Rest, and Davison (1991) have noted, there are several interpretive systems by which moral action choices can be generated. People may rely on justice reasoning or so-called moral reasoning (Kohlberg, 1976), but they may also rely on concepts of care (Gilligan, 1977), social norms and conventions (Nisan, 1984; Turiel, 1983), or religious prescriptions (Lawrence, 1979). This article will focus on the relation between moral reasoning and religiosity.

Kohlberg (1981) has argued that religiosity and moral reasoning are inherently unrelated because they constitute two distinct areas of human concern: Whereas moral decision making is grounded in rational arguments of justice and is influenced by level of cognitive development (e.g., education) and exposure to socio-moral experiences (e.g., role-taking opportunities), religious reasoning is based on revelations by religious authorities. Whereas the primary function of morality is to resolve competing claims among individuals, the primary function of religion is to affirm morality. In other words, whereas moral reasoning provides moral prescriptions, religious reasoning affirms moral judgment as meaningful (Fernhout & Boyd, 1985). In spite of Kohlberg’s arguments, several researchers have attempted to relate both concepts and have concluded that religiosity and morality are not unrelated at all (e.g., Deka & Broota, 1988; Siegmund, 1979; Wakenhut, 1981). However, other researchers (e.g., Wahrman, 1981) have argued that the apparent religiosity–morality relation can probably be explained by cognitive processes such as dogmatism. In the present study, we will examine whether differences in the processing of religious contents can indeed explain the supposed religiosity–morality relation.

Recent theorizing in the psychology of religion makes a distinction between being religious or not and the way in which religious contents are processed (e.g., Wulff, 1991). Usually, both dimensions are tightly intertwined in religiosity measures. However, Fontaine, Duriez, Luyten, and Hutsebaut (2003) have shown that the two religiosity dimensions that Wulff (1991) described (Exclusion versus Inclusion of Transcendence and Literal versus Symbolic) can be discerned by the Post-Critical Belief Scale (Duriez, Fontaine, & Hutsebaut, 2000). In this way, the effect of being religious as such can be separated from the effect of the way people process religious contents. This allows for a nuanced study of the religiosity–morality issue. First, the theory of Wulff is summarized, followed by a presentation of the Post-Critical Belief Scale. Second, the difference between moral attitudes and moral competence is highlighted, followed by a presentation of the Moral Judgment Test (Lind, 1998), which allows for the separation of both aspects. Third, research on the religiosity–morality relation is summarized. Finally, the aim of the present study is outlined and hypotheses regarding the relation between the religiosity dimensions and both moral attitudes and moral competence are formulated.

The theory of Wulff

According to Wulff (1991), all possible approaches to religion can be located in a two-dimensional space along two orthogonal bipolar dimensions. The vertical axis in this space, the Exclusion versus Inclusion of Transcendence dimension, specifies to what extent people accept the existence of God or some other transcendent reality, and refers to the distinctions between being religious or not and being spiritual or not. The horizontal axis, the Literal versus Symbolic dimension, is situated at the level of social cognitions and refers to the way religious contents are processed, namely either in a literal or symbolic way. In this way, four attitudes towards religion are defined (see Figure 1). Literal Affirmation represents a position in which the literal existence of the religious realm is affirmed. This position is most clearly embodied by religious fundamentalists. Literal Disaffirmation represents a position in which the existence of the religious realm is rejected and in which the possibility that religious language has a symbolic meaning is lost from sight. Religious language is also understood in a literal way, but this time religion is rejected. Symbolic Disaffirmation represents a position in which the existence of the religious realm is rejected, but in which the possibility is taken into account that religious contents might refer to a hidden symbolic meaning. Symbolic Affirmation represents a position in which the existence of the religious realm is affirmed, and in which one tries to encompass and transcend reductive interpretations in order to find a symbolic meaning in the religious language which has personal relevance.

Building on Wulff’s theory, Hutsebaut and his colleagues (Duriez & Hutsebaut, 2000; Hutsebaut, 1996) constructed the Post-Critical Belief Scale, which captures four approaches to Christian religion: Orthodoxy, External Critique, Relativism and Second Naiveté. These four approaches were considered equivalent to Literal Affirmation, Literal Disaffirmation, Symbolic Disaffirmation and Symbolic Affirmation, respectively (see Figure 1). Only recently, however, have thorough assessments of the construct validity of the Post-Critical Belief Scale been made. Duriez et al. (2000) have shown that its four subscales provide accurate measures of Wulff’s four approaches to religion, and Fontaine, Duriez, Luyten, and Hutsebaut (2003) have shown that, when differences in acquiescence are corrected for, two components are sufficient to explain the empirical relations among the items and that these two components can be interpreted in terms of the dimensions Exclusion versus Inclusion of Transcendence and Literal versus Symbolic. Both dimensions have recently been shown to result largely from the identity style that people develop during adolescence, and hence can be said to be susceptible to developmental influences (Duriez & Soenens, in press; Duriez, Soenens, & Beyers, 2004).
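The correspondence between Wulff's two dimensions, his four attitudes and the four Post-Critical Belief subscales can be summarized as a small lookup. The sketch below is illustrative only: the function name is hypothetical, and cutting each dimension at zero is an assumption made for the example, since the study treats both dimensions as continuous scores.

```python
def wulff_position(inclusion: float, symbolic: float) -> tuple[str, str]:
    """Map two dimension scores to (Wulff attitude, Post-Critical Belief subscale).

    inclusion > 0 stands for Inclusion of Transcendence, otherwise Exclusion.
    symbolic > 0 stands for Symbolic processing, otherwise Literal.
    The zero cut-off is an illustrative assumption, not part of the scale.
    """
    quadrants = {
        (True, False): ("Literal Affirmation", "Orthodoxy"),
        (False, False): ("Literal Disaffirmation", "External Critique"),
        (False, True): ("Symbolic Disaffirmation", "Relativism"),
        (True, True): ("Symbolic Affirmation", "Second Naiveté"),
    }
    return quadrants[(inclusion > 0, symbolic > 0)]


# A respondent who accepts transcendence and reads religious contents literally
print(wulff_position(inclusion=0.4, symbolic=-0.3))
```

Keeping the two scores separate, rather than collapsing them into a quadrant label, is what allows the effect of being religious to be distinguished from the effect of how religious contents are processed.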

Figure 1. Integration of the average structure of the Post-Critical Belief Scale items in Wulff’s (1991) theoretical model. Vertical axis: Exclusion versus Inclusion of Transcendence. Horizontal axis: Literal versus Symbolic. Quadrants: Literal Affirmation (Orthodoxy), Symbolic Affirmation (Second Naiveté), Literal Disaffirmation (External Critique), Symbolic Disaffirmation (Relativism).

Moral attitudes versus moral competence

Within the Kohlbergian tradition (e.g., Colby et al., 1983, 1987; Kohlberg, 1969, 1976, 1981, 1984; Rest, 1974, 1997), moral reasoning is defined as the individual’s socio-moral perspective: the characteristic point of view from which the individual formulates moral judgments. In this line of research, participants are usually offered moral dilemmas in which there is a conflict between different moral principles, and in which each possible solution is doomed to conflict with some moral principles. Participants are asked to argue, either freely (as is the case in Kohlberg’s Moral Judgment Interview) or via forced choice (as is the case in Rest’s Defining Issues Test), why it is justified to choose a certain outcome. On the basis of this kind of research, and drawing on Piagetian assumptions concerning stagewise cognitive development, Kohlberg (1984) proposed a six stage model to describe moral development. These six stages are divided, two by two, in three distinctive levels. The preconventional level has been described as a self-perspective. Social norms are either not comprehended or ignored, and hence fail to enter into the process of moral reasoning. The guiding moral principle is to avoid punishment (stage 1) and to satisfy one’s needs (stage 2). In the conventional level, social norms guide the moral reasoning process. Of central importance are being a nice person (stage 3) and conforming to as well as trying to maintain the social order (stage 4). Finally, in the post-conventional level, one no longer relies upon the social norms, but rather on the moral principles upon which these norms are based. There is a focus on the legal viewpoint, including the possibility to change the law when at odds with rational considerations of social utility (stage 5) and on abstract ethical principles, such as equality and respect for the dignity of human beings (stage 6). When there is a conflict between conventions and moral principles, a conventional reasoner will judge by convention rather than by moral principle, whereas a post-conventional reasoner will judge by principle rather than by convention. However, this does not imply that individuals at the post-conventional level are also more moral.
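The grouping of the six stages, two by two, into three levels can be written down compactly. The sketch below only restates the structure described above; the names LEVELS and kohlberg_level are hypothetical and serve as an illustration.

```python
# Levels in developmental order, as named in the text
LEVELS = ("preconventional", "conventional", "post-conventional")


def kohlberg_level(stage: int) -> str:
    """Return the level to which a stage (1 to 6) belongs.

    Stages 1-2 are preconventional, 3-4 conventional, 5-6 post-conventional.
    """
    if not 1 <= stage <= 6:
        raise ValueError("stage must be between 1 and 6")
    # Integer division pairs the stages: (1, 2) -> 0, (3, 4) -> 1, (5, 6) -> 2
    return LEVELS[(stage - 1) // 2]


for stage in range(1, 7):
    print(stage, kohlberg_level(stage))
```

The mapping describes the kind of reasoning used, not the correctness of the conclusion reached, which is the point made at the end of the paragraph above.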

In this respect, Wagner (1990) notes that a higher level of moral development is not defined by the “correctness” of one’s moral conclusions, but by the concepts and reasons employed in arriving at these conclusions. Individuals who have reached higher moral development levels have a repertoire of concepts and justifications which allows them to comprehend the moral reasoning of persons at lower levels. Conversely, persons at lower moral development levels are unable to understand fully the justifications used by those who have reached higher moral development levels. The Kohlbergian tradition thus presupposes an affective-cognitive parallelism (e.g., Kohlberg, 1969, p. 349): A preference for higher stages (the affective component) should develop simultaneously with the ability to use the underlying perspective in a consistent and differentiated manner (the cognitive component).

As Lind (1985) has noted, although this affective-cognitive parallelism is one of the core assumptions of Kohlberg’s theory, this hypothesis had not been dealt with adequately in the design of research methods and, hence, was never empirically assessed. To test this hypothesis, a new research design was needed. For this purpose, Lind (1978, 1995, 1998; Lind & Wakenhut, 1985) constructed the Moral Judgment Test, which allows the investigation of this supposed affective-cognitive parallelism. According to Lind (1995), it is insufficient and theoretically invalid to focus exclusively on the moral principles someone pursues or, in other words, on his/her moral attitudes (i.e., the affective aspect). One should also look at how competently or how consistently a person applies these principles in the decision-making process (i.e., the cognitive aspect). A child may hold high moral principles, such as justice and keeping one’s promises, but will lack the competence to apply them in a consistent but differentiated manner to everyday decision-making. Thus, according to Lind (1995), a consistent moral judgment can only be expected in highly morally developed subjects. But this consistency must be defined with respect to a well-reasoned criterion. The criterion Lind puts forward is that people should appreciate a moral principle independently of whether the resulting arguments are in line with personal opinion on a particular issue. Now, how does this work?

The Moral Judgment Test confronts people with two moral dilemmas. For each dilemma, a person has to indicate to what degree he/she agrees with the solution chosen by the main character(s). Next, this person is confronted with six arguments pro and six arguments contra his/her opinion on how to solve each of the dilemmas. Each argument represents one of Kohlberg’s (1958, 1984) stages of moral reasoning. The sum of the scores a person obtains for the arguments referring to the same stage indicates the degree to which this person reasons according to the underlying perspective. In addition, the Moral Judgment Test measures, by means of the C-index, the degree to which judgments about these pro and contra arguments are consistent. A highly morally consistent or competent person will appreciate all arguments referring to a certain socio-moral perspective, irrespective of whether it is a pro or contra argument. A person with low moral competence will appreciate the pro arguments only. And although the C-index is obviously logically independent of the moral principles someone pursues, a strongly positive relation between the C-index and a preference for the highest stages of Kohlberg’s model has been reported (Lind, 1985). Thus, in general, people obtaining the highest moral competence levels are also the ones preferring the most advanced socio-moral perspectives. These results support Kohlberg’s presupposed affective-cognitive parallelism.

Religiosity and morality

Although Kohlberg (1981) argued that religiosity and moral reasoning are inherently unrelated, research has reported that religiously affiliated persons exhibit increased preference for Kohlberg’s conventional level (Siegmund, 1979; Wakenhut, 1981) and decreased preference for the principled reasoning that is exhibited in stages 5 and 6 (Deka & Broota, 1988). Moral reasoning was also reported to be negatively related to Allport and Ross’s (1967) intrinsic and extrinsic religiosity (Sapp & Gladding, 1989), and positively to Batson’s (1976) quest dimension (Glover, 1997; Sapp & Gladding, 1989). Following Kohlberg’s assumption of affective-cognitive parallelism, these findings suggest that religious persons exhibit limited moral development because they lack the cognitive capacity for principled reasoning. However, Ernsberger and Manaster (1981) and Glover (1997) have argued that the moral reasoning of religious persons depends on the seriousness of their religious commitment and on the moral stage which is normative for their religious community. In a religious community whose teachings include principled reasoning, highly religious individuals are likely to show increased preference for this kind of reasoning. In contrast, in a community whose teachings do not include principled reasoning, highly religious individuals are likely to exhibit decreased preference for this kind of reasoning. The theological superiority of the conventional moral arguments would then overrule the logical superiority of the post-conventional arguments. In a similar vein, Wahrman (1981) argued that the apparent religiosity–morality relation can probably be explained by cognitive processes such as dogmatism.

78 DURIEZ AND SOENENS / RELIGIOSITY, MORAL ATTITUDES AND MORAL COMPETENCE


The present study

In the present study, both the Post-Critical Belief Scale and the Moral Judgment Test were used. The Post-Critical Belief Scale allowed separating differences in religiosity (Exclusion versus Inclusion of Transcendence) from differences in the way people process religious contents (Literal versus Symbolic). The Moral Judgment Test yielded measures of both moral attitudes and moral competence. Given the fact that the Literal versus Symbolic dimension relates to empathy, perspective taking, authoritarianism and racism (Duriez, 2004a, 2004b) – all of which have been shown to relate to moral development (e.g., Duriez & Van Hiel, 2002; Ernsberger & Manaster, 1981; Kohlberg, 1976), as well as to concepts that are closely related to dogmatism such as dualism, intolerance of ambiguity and closed-mindedness (Desimpelaere et al., 1999; Duriez, 2003a) – we expected this dimension to relate negatively to preference for lower moral stages and positively to preference for higher moral stages and moral competence. We expected the Exclusion versus Inclusion of Transcendence dimension to be unrelated to moral attitudes and moral competence once the Literal versus Symbolic dimension is considered.

Although the main aim of this study was to look at the religiosity–morality relation, we also looked at how age and education related to moral competence. Research has shown that level of education is a more important determinant of moral competence than age, and that education can stimulate moral competence (Lind, 1993, 1995, 2003; Oser, 1986; Rest, 1986). Therefore, looking at the relations between moral competence and age and education allowed us to test the validity of the Moral Judgment Test. In line with previous research, we expected moral competence to be unrelated to age and to be positively related to level of education. To rule out the possibility that the relationships between moral competence and the religiosity dimensions that are derived from the Post-Critical Belief Scale (i.e., Exclusion versus Inclusion of Transcendence and Literal versus Symbolic) can be reduced to educational differences, we checked whether these relations between moral competence and the religiosity dimensions remain intact after taking educational differences into account. In addition, to rule out the possibility that the relations between moral competence and the religiosity dimensions vary by level of education, we tested the moderating role of education. It was expected that, despite potential mean (educational) differences in the study variables, the relationships among the constructs would hold across level of education.

Method

Samples

Dutch-speaking Belgian educational science students (N = 171) of the Katholieke Universiteit Leuven were asked to complete a questionnaire, and to distribute five questionnaires to other people, including both their parents (or, when impossible, adults of the same age, gender and educational level), a fellow college student of the opposite gender, and a male and a female high school student. Because participants received course credit, response rates were very high (> 98%). This procedure resulted in a sample of adolescents (Sample 1; N = 338; 50% male, ranging in age from 14 to 20 years with a mean of 16 years), university students (Sample 2; N = 336; 50% male, ranging in age from 18 to 24 years with a mean of 20 years), and adults (Sample 3; N = 336; 50% male, ranging in age from 30 to 70 years with a mean of 48 years). Of the participants in Sample 3, the highest obtained degree was a primary school degree (10%), a high school degree (26%), a non-university higher education degree (41%), or a university degree (23%). Given our data collection procedure, this distribution can be assumed to resemble the educational distribution of the parents of the participants in Samples 1 and 2. In Belgium, Roman Catholicism is the dominant religion, and although only 10% of Belgians attend church services regularly, about 90% are baptized as Roman Catholic (Office of Church Statistics, Brussels), so all participants had a fair knowledge of Roman Catholic doctrines and customs. Participants having missing values on the Moral Judgment Test or having over three missing values on the Post-Critical Belief Scale were excluded from the analyses, resulting in sample sizes of 314 (Sample 1), 320 (Sample 2) and 318 (Sample 3). Missing values on the Post-Critical Belief Scale (24 in Sample 1, 18 in Sample 2 and 18 in Sample 3) were replaced by the sample-specific mean of the item.
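The exclusion and replacement rules for the Post-Critical Belief Scale can be illustrated with a minimal sketch. It is not the authors' procedure as implemented: the function name `impute_item_means`, the use of `None` for a missing answer, and the toy data are assumptions made for illustration. Only the two rules themselves (exclude respondents with more than three missing items, replace remaining gaps with the sample-specific item mean) come from the text; computing the item mean from the retained respondents' observed answers is a further assumption.

```python
from statistics import mean


def impute_item_means(responses, max_missing=3):
    """Drop respondents with too many gaps, then fill the rest with item means.

    responses: list of per-respondent answer lists; None marks a missing answer.
    """
    # Keep only respondents with at most `max_missing` missing items.
    kept = [row for row in responses
            if sum(v is None for v in row) <= max_missing]
    n_items = len(kept[0]) if kept else 0
    # Item means are computed within this sample, from observed answers only
    # (assumption: the text does not say which respondents enter the mean).
    item_means = [
        mean(row[j] for row in kept if row[j] is not None)
        for j in range(n_items)
    ]
    return [
        [item_means[j] if v is None else v for j, v in enumerate(row)]
        for row in kept
    ]


# Hypothetical answers of four respondents to five items.
sample = [
    [1, 2, 3, 4, 5],
    [3, None, 3, 4, 1],
    [None, None, None, None, 2],  # four gaps: excluded
    [5, 6, 3, None, 3],
]
cleaned = impute_item_means(sample)
```

Applying the function once per sample keeps the replacement values sample-specific, as the text requires.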

Measures

Religiosity. Participants completed the 33-item Post-Critical Belief Scale (Duriez et al., 2000). All items were scored on a 7-point Likert scale. As in Fontaine et al. (2003), an estimate of the level of acquiescence was subtracted from the raw scores. A Principal Component Analysis (PCA) was then performed on these corrected scores. A scree test pointed to a two-componential solution for all three samples. In all samples, after orthogonal Procrustes rotation towards the average structure reported by Fontaine et al. (2003; see Figure 1), these components could be interpreted in terms of Exclusion versus Inclusion of Transcendence and Literal versus Symbolic. In all samples, Tucker’s Phi indices were above .90 for both components, suggesting good congruence (Van de Vijver & Leung, 1997). Estimates of internal consistency (theta; Armor, 1974) were .87, .88 and .89 for Exclusion versus Inclusion of Transcendence and .80, .83 and .84 for Literal versus Symbolic in Samples 1 to 3 respectively. A high score on Exclusion versus Inclusion of Transcendence indicates a tendency to include transcendence. A high score on Literal versus Symbolic indicates a tendency to deal with religion in a symbolic way.
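As an illustration of the congruence check, Tucker's Phi between two vectors of component loadings is the sum of their cross-products divided by the square root of the product of their sums of squares. The sketch below assumes that standard definition; the function name `tucker_phi` and the loadings are hypothetical and are not taken from the study.

```python
from math import sqrt


def tucker_phi(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two loading vectors."""
    if len(loadings_a) != len(loadings_b):
        raise ValueError("loading vectors must have the same length")
    cross = sum(a * b for a, b in zip(loadings_a, loadings_b))
    norm = sqrt(sum(a * a for a in loadings_a) * sum(b * b for b in loadings_b))
    return cross / norm


# Hypothetical loadings of five items on one component: the reference
# structure versus the rotated solution found in one sample.
reference = [0.70, 0.65, -0.60, 0.55, -0.50]
rotated = [0.68, 0.60, -0.66, 0.50, -0.45]
phi = tucker_phi(reference, rotated)
```

In this toy case `phi` exceeds .90, the threshold the text uses as an indication of good congruence.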

Morality. Participants completed the Moral Judgment Test (Lind, 1998), which consists of a workers’ dilemma and a mercy-killing dilemma. For each dilemma, a person has to indicate to what degree he/she agrees with the solution chosen by the main character. Next, this person is confronted with six arguments pro and six arguments contra his/her opinion on how to solve the dilemma. The person then indicates, on a 9-point scale ranging from -4 to +4, to what degree these arguments are (un)acceptable. The mean score a person obtains for the arguments referring to the same stage indicates the degree to which this person reasons according to the underlying socio-moral perspectives. In addition, the C-index measures the degree to which a person’s judgments about these pro and contra arguments are consistent. A detailed description of how to compute this index can be found in Lind (1998). The Dutch version was validated by Duriez and De Marez (2000) according to the prescriptions of Lind (1998).
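The two kinds of score can be made concrete with a small sketch. The stage score follows the description above (mean rating of the arguments belonging to one stage). For the C-index, the sketch assumes the common summary of the index as the percentage of a respondent's rating variance that is attributable to the stage of the argument; this is an assumption, and the authoritative procedure remains the one in Lind (1998), which may differ in detail. The function names and the two respondents are hypothetical.

```python
def stage_means(ratings):
    """Mean rating per stage; ratings maps stage -> list of ratings (-4..+4)."""
    return {stage: sum(vals) / len(vals) for stage, vals in ratings.items()}


def c_index(ratings):
    """Percentage of rating variance attributable to stage (assumed definition)."""
    all_vals = [v for vals in ratings.values() for v in vals]
    grand_mean = sum(all_vals) / len(all_vals)
    ss_total = sum((v - grand_mean) ** 2 for v in all_vals)
    if ss_total == 0:
        return 0.0  # identical ratings everywhere: nothing to partition
    ss_stage = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in ratings.values()
    )
    return 100.0 * ss_stage / ss_total


# Each stage receives four ratings: one pro and one contra argument
# in each of the two dilemmas.
# Hypothetical respondent who rates by stage only, ignoring pro/contra.
consistent = {1: [-4] * 4, 2: [-3] * 4, 3: [-1] * 4,
              4: [1] * 4, 5: [3] * 4, 6: [4] * 4}
# Hypothetical respondent who accepts every pro argument (+4) and
# rejects every contra argument (-4), whatever the stage.
partisan = {stage: [4, 4, -4, -4] for stage in range(1, 7)}

c_consistent = c_index(consistent)
c_partisan = c_index(partisan)
```

Under this assumed definition the first respondent reaches the maximum of the index and the second the minimum, which mirrors the contrast the text draws between appreciating arguments for their stage and appreciating the pro arguments only.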

INTERNATIONAL JOURNAL OF BEHAVIORAL DEVELOPMENT, 2006, 30 (1), 76–83 79


Results

Validity analyses

The means and standard deviations of the scores on the Moral Judgment Test are presented in Table 1. For the Post-Critical Belief Scale, means and standard deviations are not included in this table, because, due to the procedure that was described in the measures section, the mean always equals zero, and the standard deviation always equals 1. Table 1 shows that there was no significant difference between the samples with respect to the higher levels of Kohlberg’s model. In each of the samples, equal importance was attached to the arguments of stages 5 and 6, F(2, 943) = 2.83 and 0.62, ns, respectively. In contrast, important differences were found with respect to the arguments of stages 1, 2, 3 and 4, F(2, 943) = 26.65, 39.31, 23.77 and 19.87, p < .001, respectively, as well as with respect to the C-index, F(2, 942) = 41.79, p < .001. Post-hoc Tukey comparisons revealed that participants in Sample 1 attached significantly more importance to arguments of stages 1, 2, 3 and 4 than participants in Samples 2 and 3. In addition, participants in Sample 2 attached significantly less importance to arguments of stages 1, 2 and 3 than participants in Sample 3. These results suggest that university students (Sample 2) make the sharpest distinction between moral arguments of the lower stages of Kohlberg’s model and moral arguments of the higher stages, followed by adults (Sample 3) and middle adolescents (Sample 1). Finally, participants in Sample 2 obtained significantly higher scores than participants in Samples 1 and 3 for the C-index.

The latter results are in line with previous research that has shown that education is a more important determinant of moral competence than age (the participants in Sample 3 are older than the participants in Sample 2, but Sample 2 is clearly the best educated group). These results receive further support from the finding that, within Sample 3, important differences in the C-index were found with respect to level of education, F(3, 304) = 6.03, p < .001, but not with respect to age, F(3, 308) = 2.81, ns. For this purpose, Sample 3 was split into four age groups (30–40, 40–50, 50–60 and 60–70). Post-hoc Tukey comparisons revealed that scores on the C-index tend to rise with level of education: The C-index equals 14.00, 17.81, 21.82 and 27.43 for, respectively, those with a primary school degree only, those with a high school degree, those with a higher education degree, and those with a university degree. In addition, when scores on the C-index are compared between Sample 2 and the participants of Sample 3 with a university degree (people with a similar educational level but a distinctly different age), the mean scores of the two groups were not significantly different, F(1, 388) = 6.20, ns. This testifies to the importance of level of education instead of age in the prediction of moral competence, and shows that the results that we obtain with the Moral Judgment Test are in line with earlier results.
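The F values reported in these validity analyses come from one-way comparisons of group means. A minimal sketch of how such an F ratio and its degrees of freedom are obtained is given below; the function name `one_way_anova` and the toy scores are assumptions, and the sketch covers only the omnibus test, not the post-hoc Tukey comparisons.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical C-index scores for three small groups.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df1, df2 = one_way_anova(toy_groups)
```

With three groups the numerator degrees of freedom equal 2, which is the first value in expressions such as F(2, 943); the second value is the number of scores minus the number of groups.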

Correlation analyses

The relation between the religiosity dimensions and both moral attitudes and moral competence was investigated by means of bivariate correlations (see Table 2). In all samples, results show that the Exclusion versus Inclusion of Transcendence dimension is unrelated to moral attitudes, but that the Literal versus Symbolic dimension is not. In general, this dimension is negatively related to preference for stages 1, 2 and 3, unrelated to preference for stage 4, and positively related to preference for stages 5 and 6. Results also show that, whereas Exclusion versus Inclusion of Transcendence is unrelated to moral competence (as measured by the C-index), the Literal versus Symbolic dimension is strongly and positively related to moral competence. This pattern of results also came to the fore in all four age groups that were discerned in Sample 3 (30–40, 40–50, 50–60 and 60–70).

Table 1
Means and standard deviations of the scores on the Moral Judgment Test

            Sample 1            Sample 2            Sample 3
            M         SD        M         SD        M         SD
Stage 1     –2.83c    4.77      –5.63a    4.53      –4.47b    5.23
Stage 2     –1.44c    4.62      –4.70a    4.63      –2.42b    5.02
Stage 3     –0.73c    4.68      –3.35a    4.49      –2.04b    5.13
Stage 4      2.52b    3.69       1.22a    4.23       0.56a    3.93
Stage 5      3.23a    3.74       3.92a    3.78       3.47a    3.46
Stage 6      3.20a    3.50       3.13a    3.50       2.89a    3.84
C-Index     23.66a   17.55      33.48b   18.22      21.43a   17.19

Mean levels with different superscripts (a, b, c) are significantly different at the .01 level.

Table 2
Correlations between the variables included in this study

Inclusion versus Exclusion of Transcendence

Morality    Sample 1    Sample 2    Sample 3
Stage 1      .07**       .09**       .11**
Stage 2      .03**       .02**       .05**
Stage 3      .05**       .08**       .10**
Stage 4     –.08**       .00**       .10**
Stage 5     –.10**      –.05**      –.08**
Stage 6     –.07**      –.01**       .01**
C-Index     –.05**       .06**      –.05**

Literal versus Symbolic

Morality    Sample 1    Sample 2    Sample 3
Stage 1     –.37**      –.27**      –.42**
Stage 2     –.36**      –.29**      –.37**
Stage 3     –.21**      –.24**      –.39**
Stage 4     –.01**      –.03**      –.11**
Stage 5      .13**       .19**       .16**
Stage 6      .18**       .25**       .17**
C-Index      .31**       .28**       .27**

* p < .01, ** p < .001.
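A bivariate correlation of the kind reported in Table 2 can be computed as in the following sketch, which also converts a correlation into the t value used to test it against zero. The function names are hypothetical, and using the post-exclusion sample size of 314 as the n for the Sample 1 correlations is an assumption; the value .31 is the Sample 1 correlation between Literal versus Symbolic and the C-index.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


# t value for r = .31 with an assumed n of 314.
t_value = t_statistic(0.31, 314)
```

Because the t value grows with the square root of the sample size, correlations of modest size can reach conventional significance levels in samples of roughly 300 respondents, whereas correlations close to zero cannot.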

Regression analyses

In order to determine (a) whether the relations between both religiosity dimensions and moral competence remain significant after controlling for the effect of level of education, and (b) whether level of education moderates these relations, a hierarchical regression analysis was performed. In this analysis, moral competence served as the dependent variable and was predicted by level of education in Step 1, the two religiosity dimensions in Step 2, and the two interaction components (level of education by Inclusion versus Exclusion of Transcendence and level of education by Literal versus Symbolic) in Step 3. Results show that, after controlling for the effect of level of education (β = .16, p < .01), Literal versus Symbolic explained additional variance in moral competence scores (β = .22, p < .001). Neither Inclusion versus Exclusion of Transcendence (β = –.02, ns) nor the two interaction components (β = .00 and .01, ns) significantly added to the prediction of moral competence.
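The three-step logic of such a hierarchical regression can be sketched as follows. This is an illustration under stated assumptions rather than a reconstruction of the reported analysis: all data are invented, the helper names are hypothetical, the interaction terms are formed as products of mean-centered predictors (a common convention that the text does not confirm), and the sketch reports the increase in explained variance per step instead of the standardized β weights given above.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(predictors, y):
    """R^2 of an OLS fit of y on the predictor columns plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + [list(map(float, p)) for p in predictors]
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(bk * c[i] for bk, c in zip(beta, cols)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def center(values):
    """Subtract the mean so that product terms are less collinear."""
    m = sum(values) / len(values)
    return [v - m for v in values]


# Invented data for ten respondents (not from the study).
edu = [1, 2, 3, 4, 1, 2, 3, 4, 2, 3]
trans = [0.5, -1.2, 0.3, 1.1, -0.7, 0.9, -0.4, 0.2, -1.0, 0.6]
symb = [-1.0, -0.5, 0.2, 1.3, -0.8, 0.1, 0.7, 0.9, -1.1, 0.4]
competence = [12, 15, 22, 35, 10, 20, 27, 30, 11, 24]

edu_x_trans = [e * t for e, t in zip(center(edu), center(trans))]
edu_x_symb = [e * s for e, s in zip(center(edu), center(symb))]

steps = [
    [edu],                                        # Step 1: education
    [edu, trans, symb],                           # Step 2: + religiosity
    [edu, trans, symb, edu_x_trans, edu_x_symb],  # Step 3: + interactions
]
r2_by_step = [r_squared(p, competence) for p in steps]
delta_r2 = [b - a for a, b in zip(r2_by_step, r2_by_step[1:])]
```

Each entry of `delta_r2` is the variance explained by the block added at that step over and above the earlier blocks, which is the question each step of the analysis addresses.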

Discussion

Religiosity and morality

Results of the present study show that, when separating the effects of being religious or not (Exclusion versus Inclusion of Transcendence) from the way people process religious contents (Literal versus Symbolic), the apparent religiosity–morality relation that was observed in previous studies can be explained by the way people process religious contents. In comparison to people who process religious contents in a literal way, people processing religious contents in a symbolic way show higher moral competence and tend to make a sharper distinction between moral arguments of the lower stages and higher stages of Kohlberg’s model: they pay less attention to arguments of the lower stages (as is shown by the negative correlations between Literal versus Symbolic and stages 1, 2 and 3 preference) and more attention to arguments of the higher stages (as is shown by the positive correlations with stages 5 and 6 preference). In contrast, being religious or not is unrelated to both moral attitudes and moral competence. These relations follow a similar pattern among adolescents, university students, adults (and different age groups of adults), and among highly religious subjects (see also Duriez, 2003b). Results also remained the same after controlling for educational differences. The relationship between moral competence and the religiosity dimensions could not be explained by educational differences, and did not vary by level of education. Results support the idea of Kohlberg (1981) that religiosity and morality are inherently unrelated and the idea of Wahrman (1981) that the apparent religiosity–morality relation that was observed in previous studies can be explained by cognitive processes that go beyond mere educational differences.

In sum, these results suggest that whether or not someone is religious has no consequences for moral reasoning ability. What seems to be vitally important, however, is the way in which someone processes religious contents. If people process religious contents in a literal way, this seems to have a deleterious effect on their moral reasoning ability. In a similar vein, research that allows for the separation of differences in religiosity from differences in the way religious contents are processed suggests that, even though, in comparison to non-religious people, religious people are more likely to prefer order, structure and predictability (Duriez, 2003a), are less likely to value hedonism, stimulation and self-direction values and more likely to value tradition and conformity values (Fontaine, Duriez, Luyten, Corveleyn, & Hutsebaut, 2005), are more likely to hold culturally conservative and authoritarian beliefs (Duriez, 2003c, 2004a), and are more likely to base important choices in life on the expectations of parents, authority figures or reference groups (Duriez & Soenens, in press; Duriez et al., 2004), they are neither more nor less likely to experience psychological well-being (Dezutter, 2004), to feel empathy (Duriez, 2004a, 2004b), or to hold prejudiced attitudes (Duriez, 2004a). In short, although religious people tend to be conservative and submissive, they are neither more nor less happy, good-natured and tolerant. This suggests that the impact of being religious or not on individuals’ lives is limited when it is separated from the impact of the way people process religious contents. The impact of the way people process religious contents, on the other hand, seems vitally important, with people processing religious contents in a literal way not only showing less advanced moral reasoning abilities but also less psychological well-being, less empathy and more prejudice.

Theoretical and practical implications

The validity analyses that are reported in this article support the claim that level of education can stimulate moral competence (cf. Lind, 1993, 2003; Oser, 1986; Rest, 1986). However, these analyses also make it clear that the way in which people process religious contents contributes to the prediction of moral competence beyond educational differences. In line with this, previous research has shown that the type of education and the type of values that are promoted matter (Lickona, 1977; Snarey, Reimer, & Kohlberg, 1985). There is a consensus among researchers that educational programs targeted at stimulating moral development should be aimed at learning how to distinguish good from bad arguments and at learning to translate one’s ethical principles to solutions for specific problems with which one is confronted in real life, even under those circumstances where factors like prejudice, authority or the so-called moral majority try to prevent people from thinking about the different aspects that are part of the problem (e.g., Lind, 2003). The current results suggest that a related educational aim should be to teach people that there are plenty of religious systems, that all of them are more similar than they appear at first sight, and that it is important to spot the symbolic meaning of their contents prior to making judgments about whether a specific religion or religion in general is something that deserves to be incorporated into the personal worldview. This aim might be achieved indirectly. Previous research has shown that the identity styles that are formed during adolescence predict the way religious information is processed, and there is growing evidence that it is possible to direct this identity formation process (Archer, 1994; Ferrer et al., 2002; Josselson, 1994). This way, adolescents can be expected to learn to deal with religious issues in a more symbolic way. Teaching people either directly or indirectly to deal with religious issues in a more symbolic way can be expected both to stimulate moral development in general and moral competence in particular and to have positive effects on well-being, pro-social behavior and tolerance. In sum, incorporating this kind of education into the educational system might be a prerequisite for a better functioning democratic society in which people feel better, are less prejudiced and more helpful and, above all, are capable of behaving in a morally competent way.

Limitations and suggestions

An important limitation of the present study is that it was conducted in one specific cultural setting, namely Flanders (Belgium). One might wonder whether the Roman Catholic tradition, which is the dominant religious tradition in this region, shapes the way people process religious issues, and whether members of other religious traditions process religious contents in a different way. However, the present study demonstrates that the religiosity–morality relation remained the same in spite of mean level differences in moral competence. In line with this, even though the way church members process religious issues is no doubt shaped by the church to which they belong, the denomination that is studied will not necessarily alter our findings. In fact, we expect the religiosity–morality relation to remain the same in spite of possible mean level differences in the way people within different religious traditions deal with religious contents. Of course, the prediction that, worldwide, the religiosity–morality relation will be due to differences in the way people process religious contents, with people dealing with religious contents in a more literal way displaying less moral competence, needs further investigation. Future research needs to check these relations in areas in which other religious denominations are dominant, in areas in which religious denominations have to compete in order to gain members, and in areas in which religion plays a less important role in society. However, before being able to do so, future research should focus on revising the Post-Critical Belief Scale, because the current version is tailored for usage among Christians and people that grew up in a Christian setting.

Apart from assessing the cross-cultural generalizability of our findings, research should further develop the educational programs that are available to stimulate moral competence. Our results suggest that, rather than focusing exclusively on directly facilitating moral development, these programs might benefit from incorporating both a pluralist religious education and identity development programs. Incorporating these elements might yield additional beneficial effects on moral development. Because of the importance of citizens’ moral competence levels for a properly functioning democratic society, ministers of education should sponsor and integrate research programs that are aimed, either directly or indirectly, at raising moral competence levels, reshaping the school curriculum in order to leave more room for this kind of education, convincing schools and schoolteachers of the importance of these educational programs, and investing in training programs that teach schoolteachers the necessary skills to effectively carry out these educational programs.

Acknowledgement

The contribution of the second author was supported by the Fund for Scientific Research Flanders (FWO).

References

Allport, G.W., & Ross, J.M. (1967). Personal religious orientation and prejudice. Journal of Personality and Social Psychology, 5, 432–443.

Archer, S.L. (1994). Interventions for adolescent identity development. Thousand Oaks, CA: Sage.

Armor, D.J. (1974). Theta reliability and factor scaling. In H.L. Costner (Ed.), Sociological methodology (pp. 17–50). London: Jossey-Bass.

Batson, C.D. (1976). Latent aspects of “From Jerusalem to Jericho.” In M.P. Golden (Ed.), The research experience. Itasca, IL: F.E. Peacock.

Colby, A., Kohlberg, L., Gibbs, J., & Lieberman, C. (1983). A longitudinal study of moral judgment. Monographs of the Society for Research in Child Development, 48.

Colby, A., Kohlberg, L., Speicher, B., Hewer, A., Candee, D., Gibbs, J., & Power, C. (1987). The measurement of moral judgment, Vols. 1 and 2. New York: Cambridge University Press.

Deka, N., & Broota, K.D. (1988). Relation between level of religiosity and principled moral judgment among four religious communities in India. Journal of Personality and Clinical Studies, 4, 151–156.

Desimpelaere, P., Sulas, F., Duriez, B., & Hutsebaut, D. (1999). Psycho-epistemological styles and religious beliefs. The International Journal for the Psychology of Religion, 9, 125–137.

Dezutter, J. (2004). [Religiosity and mental health]. Unpublished raw data.

Duriez, B. (2003a). Vivisecting the religious mind: Religiosity and motivated social cognition. Mental Health, Religion & Culture, 6, 79–86.

Duriez, B. (2003b). Religiosity, moral attitudes and moral competence. A research note on the relation between religiosity and morality. Archiv für Religionspsychologie, 25, 210–221.

Duriez, B. (2003c). Religiosity and conservatism revisited. Relating a new religiosity measure to the two main conservative political ideologies. Psychological Reports, 92, 533–539.

Duriez, B. (2004a). A research note on the relation between religiosity and racism: The importance of the way in which religious contents are being processed. The International Journal for the Psychology of Religion, 14, 175–189.

Duriez, B. (2004b). Taking a closer look at the religion-empathy relationship: Are religious people nicer people? Mental Health, Religion & Culture, 7, 249–254.

Duriez, B., & De Marez, P. (2000). Voorstelling en validering van de Morele Oordeel Test (MOT), de Nederlandstalige versie van de Moral Judgment Test (MJT). Internal report. Leuven: K.U. Leuven.

Duriez, B., Fontaine, J.R.J., & Hutsebaut, D. (2000). A further elaboration of the Post-Critical Belief Scale: Evidence for the existence of four different approaches to religion in Flanders-Belgium. Psychologica Belgica, 40, 153–181.

Duriez, B., & Hutsebaut, D. (2000). The relation between religion and racism: The role of post critical beliefs. Mental Health, Religion and Culture, 3, 85–102.

Duriez, B., & Soenens, B. (in press). Personality, identity styles and religiosity: An integrative study among late and middle adolescents. Journal of Adolescence.

Duriez, B., Soenens, B., & Beyers, W. (2004). Personality, identity styles and religiosity: An integrative study among late adolescents in Flanders (Belgium). Journal of Personality, 72, 877–910.

Duriez, B., & Van Hiel, A. (2002). The march of modern fascism. A comparison of social dominance orientation and authoritarianism. Personality and Individual Differences, 32, 1199–1213.

Ernsberger, D.J., & Manaster, G.J. (1981). Moral development, intrinsic/extrinsic religious orientation and denominational teachings. Genetic Psychology Monographs, 104, 23–41.

Fernhout, H., & Boyd, D. (1985). Faith in autonomy: Development in Kohlberg’s perspectives in religion and morality. Religious Education, 80, 287–307.

Ferrer, L.A., Cass, C., Kurtines, W.M., Briones, E., Bussell, J.R., Berman, S.L., & Arrufat, O. (2002). Promoting identity development in marginalized youth. Journal of Adolescent Research, 17, 168–187.

Fontaine, J.R.J., Duriez, B., Luyten, P., & Hutsebaut, D. (2003). The internal structure of the Post-Critical Belief scale. Personality and Individual Differences, 35, 501–518.

Fontaine, J.R.J., Duriez, B., Luyten, P., Corveleyn, J., & Hutsebaut, D. (2005). Consequences of a multi-dimensional approach to religion for the relationship between religiosity and value priorities. The International Journal for the Psychology of Religion, 15, 123–143.

Gilligan, C. (1977). In a different voice: Women’s conceptions of the self andmorality. Harvard Educational Review, 49, 481–517.

Glover, R.J. (1997). Relationships in moral reasoning and religion amongmembers of conservative, moderate, and liberal religious groups. The Journalof Social Psychology, 137, 247–254.

Hutsebaut, D. (1996). Post-critical belief: A new approach to the religiousattitude problem. Journal of Empirical Theology, 9, 48–66.

Josselson, R. (1994). The theory of identity development and the question ofintervention: An introduction. In S.L. Archer (Ed.), Interventions for adolescentidentity development. Thousand Oaks, CA: Sage.

Kohlberg, L. (1958). The development of modes of moral thinking and choice in theyears 10 to 16. University of Chicago: Unpublished doctoral dissertation.

Kohlberg, L. (1969). Stage and sequence: The cognitive developmentalapproach to socialization. In D.A. Goslin (Ed.), Handbook of socializationtheory and research (pp. 370–480). Chicago: McNally.

Kohlberg, L. (1976). Moral stages and moralization: The cognitive develop-mental approach. In T. Lickona (Ed.), Moral development and behavior: Theory,research, and social issues (pp. 31–53). New York: Holt, Rinehart and Winston.

Kohlberg, L. (1981). Essays on moral development, vol. 1: The philosophy of moraldevelopment. Moral stages and the idea of justice. San Francisco: Harper & Row.

Kohlberg, L. (1984). Essays on moral development, vol. 2: The psychology of moraldevelopment. The nature of moral stages. San Francisco: Harper & Row.

Lawrence, J.A. (1979). The component procedure of moral judgment making.Dissertation Abstracts International, 40, 896B.

Lickona, T. (1977). Creating the just community with children. Theory intoPractice, 16, 97–104.

Lind, G. (1978). Der “Moralisches-Urteil-Test (MUT)”. Anleitung zur An-wendung und Weiterentwickelung des Tests. In L.H. Eckensberger (Ed.),Sozialisation und Moral (pp. 171–201). Weinheim: Beltz.

Lind, G. (1985). The theory of moral cognitive development. A socio-psycho-logical assessment. In G. Lind, H.A. Hartmann & R.H. Wakenhut (Eds.),Moral development and the social environment (pp. 21–53). Chicago: PrecedentPublishing.

Lind, G. (1993). Morality and education. Heidelberg: Asanger.Lind, G. (1995). The meaning and measurement of Moral Judgment Competence

revisited. A dual-aspect model. Invited address, SIG Moral Development andEducation, American Educational Research Association (AERA), San Fran-cisco, 17–21 April.

Lind, G. (1998). An introduction to the Moral Judgment Test (MJT). Unpublishedmanuscript. Konstanz: University of Konstanz.

Lind, G. (2000). Off limits. A cross-cultural study on possible causes of segmentationof moral judgment competence. Paper presented at the annual scientific meetingof the American Educational Research Association, 24–28 April, NewOrleans, LA.

Lind, G. (2003). Moral ist lehrbar. Handbuch zur Theorie und Praxis der moralischenund demokratischen Bildung. Munich: Oldenburg.

Lind, G., & Wakenhut, R.H. (1985). Testing for moral judgment competence.In G. Lind, H.A. Hartmann & R.H. Wakenhut (Eds.), Moral development andthe social environment (pp. 79–105). Chicago: Precedent Publishing.

Nisan, M. (1984). Social norms and moral judgment. In W. Kurtines & J.Gewirtz (Eds.), Morality and moral development (pp. 208–224). New York:Wiley.

Oser, F. (1986). Moral education and values education: The discourse perspec-tive. In M.C. Wittrock (Ed.), Handbook of research on teaching (pp. 917–941).New York: Macmillan.

Rest, J.R. (1974). Manual for the Defining Issues Test: An objective test for moraljudgment development. Minneapolis: University of Minneapolis Press.

Rest, J.R. (1983). Morality. In P. Mussen (Ed.), Manual of child psychology (4thedn, vol. 3, pp. 556–629). New York: Wiley.

Rest, J.R. (1986). Moral development: Advances in research and theory. New York:Praeger.

Rest, J.R. (1997). Alchemy and beyond: Indexing the Defining Issues Test.Journal of Educational Psychology, 89, 498–507.

Sapp, G.L., & Gladding, S.T. (1989). Correlates of religious orientation, reli-giosity and moral judgment. Counseling and Values, 33, 140–145.

Siegmund, U. (1979). Das moralisches Urteilsniveau von reilgiösen Studenten-gruppen. In G. Lind (Ed.), Moralisches Entwicklung und Soziale Umwelt(pp. 13–21). Zentrum 1 Bildungsforschung. Mimeo: University of Konstanz.

Snarey, J., Reimer, J., & Kohlberg, L. (1985). The Kibbutz as a model for moraleducation: A longitudinal cross-cultural study. Journal of Applied Develop-mental Psychology, 6, 151–172.

Thoma, S.J., Rest, J.R., & Davison, M.L. (1991). Describing and testing amoderator of the moral judgment and action relationship. Journal of Person-ality and Social Psychology, 61, 659–669.

Turiel, E. (1983). Domains and categories in social-cognitive development. InW. Overton (Ed.), The relationship between social and cognitive development(pp. 69–106). Hillsdale, NJ: Erlbaum.

Van de Vijver, F., & Leung, K. (1997). Methods and data analysis for cross-culturalresearch. London: Sage.

Wagner, J. (1990). Rational constraint in mass belief systems: The role ofdevelopmental moral stages in the structure of political beliefs. PoliticalPsychology, 11, 147–171.

Wahrman, I.S. (1981). The relationship of dogmatism, religious affiliation, andmoral judgment development. The Journal of Psychology, 108, 151–154.

Wakenhut, R. (1981). Sozialisationsforschung und Lebenskundlicher Unterricht.Dokumentation zur katholischen Militärseelsorge, 8, 48–71.

Wulff, D.M. (1991). Psychology of religion: Classic and contemporary views. NewYork: Wiley.

at Stanford University on December 2, 2008 http://jbd.sagepub.comDownloaded from

page 84

Non-linear modeling of growth prerequisites in a Finnish polytechnic institution of higher education

Petri Nokelainen and Pekka Ruohotie
University of Tampere, Tampere, Finland

Abstract

Purpose – This study aims to examine the factors of growth-oriented atmosphere in a Finnish polytechnic institution of higher education with categorical exploratory factor analysis, multidimensional scaling and Bayesian unsupervised model-based visualization.

Design/methodology/approach – This study was designed to examine employee perceptions of how their managers create conditions that support professional growth and learning, and how the employees perceive their growth motivation and commitment to the organization. Data were gathered from 447 employees with the Growth-oriented Atmosphere Questionnaire in a Finnish polytechnic institution of higher education.

Findings – Results showed that the theoretical four-group classification of the growth-oriented atmosphere factors was supported by the empirical evidence. Results further showed that managers and teachers had higher growth motivation and level of commitment to work than other personnel, including job titles such as cleaner, caretaker, accountant and computer support. Employees across all job titles in the organization who had temporary or part-time contracts reported higher growth motivation and commitment to work and organization than their established colleagues.

Practical implications – Leaders in various organizations may benefit from learning the current professional growth status of diverse employee groups, and from understanding the potential differences in employee growth motivation.

Originality/value – This study contributes to an understanding of organizational growth and learning as a non-linear process. The statistical non-linear modeling approach is novel, providing a research-based and practical example of how to use these techniques.

Keywords Organizations, Continuing development, Workplace training, Modeling, Polytechnics, Finland

Paper type Research paper

Introduction

Professional growth is a continuous learning process that enables individuals to acquire the knowledge, skills and abilities needed to cope with changing demands for vocational proficiency throughout their career (London and Mone, 1999). It is, thus, viable to speak about “professional career growth” to distinguish it from the concept of “professional development”, which is a collection of concrete developmental strategies and functions that aim to support professional growth. However, these two concepts appear in the research literature intertwined in the form of “professional growth and development”. This is natural, as professional development is de rigueur but not de facto for professional growth (Nokelainen, 2008).

The current issue and full text archive of this journal is available at www.emeraldinsight.com/1366-5626.htm

Received 23 June 2008. Revised 18 September 2008. Accepted 22 September 2008.

Journal of Workplace Learning, Vol. 21 No. 1, 2009, pp. 36-57. © Emerald Group Publishing Limited, 1366-5626. DOI 10.1108/13665620910924907


London and Mone’s term “continuous” describes a strong and durable need or will to learn, and also the valuation of learning. “Learning” refers in this context to an individual’s will to develop his or her skills via practice and training in order to meet the changing challenges of work. Naturally, in most cases such development is possible only if the employer or organization shares the same goals. Continuous learning is characteristic of a multifaceted career, which is in fact defined as growth of know-how (Ruohotie, 2000). People may simultaneously have several career paths, or consecutive work periods in different companies or even in different professions.

In order to be successful, educational organizations must provide effective professional development programs for employees over the entire course of their careers (Lawler, 1994). This notion makes studies of professional updating, and especially those concerning the problems and prerequisites of continual growth in various work communities, most important. These include factors within the individual, the job, the work place and society (Nokelainen, 2008; Ruohotie, 1996; Ruohotie and Nokelainen, 2000).

Continuous development and updating of skills is important, as otherwise they may become useless (or at least obsolete) for working life. Kaufman (1974, p. 23) has defined professional obsolescence as “the degree to which professionals lack the up-to-date knowledge and skills necessary to maintain effective performance in either their current or future work role.” According to Pazy (2004), professional updating is a learning response to imminent obsolescence. Like other forms of adult learning beyond formal education, updating is characterized by a problem focus (Knowles, 1990), and it is typically a self-initiated, self-structured, and self-defined activity. This paper has two main goals: first, to present a theoretical model of growth-oriented atmosphere and, second, to demonstrate its practical use as a measurement instrument of growth prerequisites with different employee groups of a Finnish polytechnic institution of higher education.

Growth prerequisites are examined on the basis of a 14 dimensional theoretical model of growth-oriented atmosphere developed by Ruohotie and Nokelainen (2000). The organization investigated in this study received its polytechnic institution of higher education status among the first in 1996. We conducted the first survey investigating growth prerequisites in the organization in 1998 (Ruohotie and Nokelainen, 2000). This research paper reports the findings of the second survey that was conducted in 2002. Although the study reported here is based on non-probability sampling, the polytechnic institution of higher education investigated in this study represents all the other 30 organizations quite well as they all had to meet the same criteria (e.g., planning, function and goals of education, curriculum, the development of evaluation and feedback systems) evaluated by the same committee before they were promoted from vocational institutions to high schools.

The Finnish education system consists of comprehensive school, post-comprehensive general and vocational education, higher education and adult education. Comprehensive school provides a nine-year compulsory educational program for all school-age children, beginning at the age of seven. Post-comprehensive education is given by upper secondary schools and vocational schools or institutes. The higher education system consists of 20 universities and 31 polytechnic institutions of higher education. The higher education system as a whole offers openings for 66 percent of the relevant age group (universities: 29 percent, polytechnic institutions of higher education: 37 percent). Polytechnic institutions of higher education have been part of the Finnish school system now for only ten years. The polytechnic institution of higher education evaluation committee decided between 1992 and 1996, on the basis of the 14 evaluation criteria, which of 200 vocational education institutions were promoted to polytechnic institutions of higher education (Liljander, 2002, p. 10).

The following four research questions are to be considered:

RQ1. Is the 13-factor model of the growth-oriented atmosphere relevant for describing the growth prerequisites of Finnish polytechnic institution of higher education employees?

RQ2. Is the theoretical four-group classification of the growth-oriented atmosphere factors present in the sample?

RQ3. To what extent is employees’ position connected to growth motivation and commitment to the organization?

RQ4. Is the nature of an employee’s contract connected to growth motivation and commitment to the organization?

The data analyses applied in this study are exploratory factor analysis for categorical indicators (RQ1), multidimensional scaling (RQ2) and Bayesian unsupervised model-based visualization (RQ3 and RQ4).

The paper is organized as follows: first, we give a condensed up-to-date introduction to the theoretical model of growth-oriented atmosphere. Second, we present the empirical results considering the four research questions.

Growth prerequisites in organizations

Professional development includes all developmental functions, which are directed at the maintenance and enhancement of professional competency. In the modern world, updating is, ideally, a continual, lifelong process that addresses such goals as the acquisition of new and up-to-date information, the development of skills and techniques and the elevation of one’s personal esteem (Ruohotie, 1996). The maintenance and enhancement of competency is subject to the combined effect of many factors, ranging from personal traits to salient features of the work environment (Fishbein and Stasson, 1990).

Maurer and Tarulli (1994) have identified the following factors affecting the voluntary involvement of workers in development activities:

(1) Perceptions related to the working environment.

(2) Perceptions and beliefs regarding the benefits of development.

(3) Values and judgments.

(4) Personality factors including:
• identification with work;
• the personal concept of career;
• the need for self-development; and
• self-efficacy.


Hall (1986) has created a model of mid-career sub-identity development which outlines factors that influence professional development (growth triggering factors) and the process through which the professional exploration cycle progresses. It shows that professional growth is dependent on the social and institutional context as well as personal attributes and circumstances. Several factors are presented which can trigger changes in career routine and lead to the acquisition and development of new knowledge and skills.

Organizational triggers

Changes in organizational structure, areas of responsibility and tasks often require the development of new skills. Individuals respond to such changes both affectively and behaviorally according to their perception of their circumstances, interpreting environmental events or situational change on the basis of personal values and perspectives.

Research conducted as part of the Growth Needs Project in Finland shows that the following factors are among the keys to the creation and maintenance of growth and high innovative capacity in an organization (Ruohotie, 1996):

• Creation of a supportive culture. In a supportive environment innovation becomes a natural part of everyday work. Tasks may be intentionally defined in broad terms, encouraging change and emphasizing the possibility of choice.

• Reward of development. In innovative organizations learning, initiative and experiment are prized as inherently valuable.

• Supportive and participative management. In innovative organizations it is seen as the duty of management to create a workplace where each individual can reach his or her full potential.

• Intensive communication. The more intensive the communication, the more effectively new ideas and alternative points of view can be shared and developed.

• Security. In an era of intensifying competition, the organizations that will survive and succeed are those where there is a secure and confident atmosphere for employees. The fear of failure, of blame or of criticism is an effective damper to creative innovation.

Generating continuous enlivening innovation requires at least two things of an organization:

(1) it must learn to fully develop and utilize the capacity of its personnel; and

(2) it must show imagination at all times, suspending judgment temporarily when necessary in order to promote the development of new ideas.

Work role triggers

Research results of the Growth Needs Project indicate that motivational aspects of the work environment and the individual’s opportunities to influence it correlate positively with personnel motivation. Boring, repetitive and dependent work discourages professional development and growth. Challenging, variable and independent work encourages it (Ruohotie, 1996).


Personal triggers

Events or stages connected to everything from personal factors to life changes – for example, changes in family relationships, health, age and so forth – can cause an individual to reconsider his or her career priorities and goals. In addition, according to Hall (1986, 1990), certain personal characteristics predispose an individual to make changes in order to avoid the negative consequences of work pressure or deal with personal frustration at the status quo (i.e. basic personality disposition, motivation for advancement, initiative, stress on performance, hardiness, flexibility, tolerance of ambiguity, independence).

Factors contributing to growth-oriented atmosphere

Important factors in the development of growth orientation are support and rewards from the management, the incentive value of the job itself, the operational capacity of the team and work-related stress. Each of these can further be divided into smaller individual factors.

Management and leaders face such challenges as how to empower people, support the development of their professional identity and how to create careers based on interaction. They should also aim to develop, reward, set goals and evaluate learning in the organization. Successful leadership creates commitment to the job and the organization.

The incentive value of the job depends on the opportunities it offers for learning, i.e. the developing nature of the job. Therefore, essential factors for professional growth are the developmental challenges, the employees’ chances to influence, opportunities for collaborative learning and valuation of the job.

The operational capacity of a team or a group can be defined by its members’ capability to operate and learn together, by the work group co-operation and by the reputation for effectiveness.

Work-related stress might become an obstacle to professional growth. Ambiguity, vagueness and role conflicts, a too heavy mental load and demand for continual alterations may stress people and damage the organizational atmosphere. Negative stress quickly suppresses growth and development.

Theoretical dimensions of growth-oriented atmosphere

In the earlier study dating back to 1998, Ruohotie and Nokelainen (2000) examined the theoretical dimensions of a growth-oriented atmosphere in the same organization as in the current study. The organization consisted of ten geographically separate units. The sample size was 318 employees, 66 percent of the survey population of 479 employees. The target population was Finnish polytechnic institution of higher education personnel in 1998 (n = 7,958).

The male (n = 145) and female (n = 147) respondent groups were almost identical in size (46 percent each), with 8 percent (n = 27) missing data. Respondents’ age was reported in four classes: 20 to 29 years (5 percent, n = 17), 30 to 39 years (25 percent, n = 78), 40 to 49 years (37 percent, n = 120), and over 50 years (24 percent, n = 75), with 9 percent (n = 29) missing data. The job profile contained three groups (7 percent missing data): managers (8 percent, n = 25), teachers (44 percent, n = 139) and other personnel, i.e. cleaner, caretaker, librarian (41 percent, n = 131).


Although the non-response rate was quite high in this study, the job title distribution of the sample (teachers: 44 percent, managers: 8 percent, other personnel: 41 percent, missing: 7 percent) was parallel both to the survey population (teachers: 47 percent, managers: 5 percent, other personnel: 48 percent) and target population (teachers: 63 percent, managers: 5 percent, other personnel: 32 percent) distributions derived from the public records.

The instrument utilized in the study contained 80 statements. The response options in a five-point summative rating scale (aka “Likert scale”, see DeVellis, 2003, pp. 78-80) varied from 1 (strongly disagree) to 5 (strongly agree).

Ruohotie and Nokelainen (2000) constructed 14 summated scales (Hair et al., 1995, p. 9) to represent the theoretical dimensions of growth-oriented atmosphere. The scales were formed on the basis of both theoretical aspects and the results of exploratory factor analysis (Maximum likelihood with Varimax rotation). The 13-factor solution was the most parsimonious, representing 67 percent of the variance within the 80 items. Eigenvalues were between 1.05 and 23.98. Respondents indicated only moderate differences in preferences for various dimensions as mean ratings ranged between 3.2 and 3.8. Internal consistency for each factor was estimated with Cronbach’s alpha coefficient (Cronbach, 1970, pp. 160-1). The alpha values ranged from 0.77 to 0.93 (Mα = 0.84).
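As a rough illustration of the internal-consistency estimate mentioned above, Cronbach’s alpha for a summated scale of k items can be computed from the item variances and the variance of the sum score: alpha = k/(k − 1) × (1 − sum of item variances / variance of the sum score). The sketch below is a minimal Python version under that textbook formula; the function name and the response data are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns.

    Each element of `items` holds the responses of the same respondents,
    in the same order, to one item of the scale.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sum score of each respondent across the k items.
    totals = [sum(responses) for responses in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical answers of five respondents to three 1-5 rating items.
scale = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [5, 3, 4, 2, 4],
]
alpha = cronbach_alpha(scale)
print(f"alpha = {alpha:.2f}")
```

Sample variances are used throughout; because the same denominator appears in both variance terms, the population variance would give the same coefficient.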

Although the authors report continuous parameters such as mean and alpha on items measured with the non-metric ordinal scale, we consider the results plausible as the underlying phenomenon, a growth-oriented atmosphere, is continuous by nature (Marini et al., 1996). Johnson and Creech (1983) used simulation studies to examine the categorization error that occurs when continuous variables are measured by indicators with only a few categories. The results indicated that while categorization error does produce distortions in multiple indicator models, under most conditions explored the bias was not sufficient to alter substantive interpretations. However, the authors advised caution in the use of two-, three- or four-category ordinal indicators, particularly when the sample size is small. In the Ruohotie and Nokelainen (2000) study, as well as in the present study, the ordinal scale has five categories and the ratio of sample size to the number of observed variables is acceptable according to empirical and simulation studies (Cattell, 1978; Gorusch, 1983; MacCallum et al., 1999). Further, according to Yeo and Neal (2008), recent research has shown that the psychometric properties of single-item measures can equal those of multi-item measures for a variety of psychological constructs, such as job satisfaction.
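The idea of categorization error can be made concrete with a small simulation. The sketch below is not Johnson and Creech’s design; it is a simplified illustration that draws a continuous latent variable, cuts it into two or five ordered categories, and compares how strongly each coarse version still correlates with the latent values. The cutpoints, the sample size and the seed are arbitrary assumptions chosen for illustration.

```python
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def categorize(value, cutpoints):
    """Return the 1-based ordinal category of `value` for ascending cutpoints."""
    return 1 + sum(value > c for c in cutpoints)


rng = random.Random(0)  # fixed seed so the run is repeatable
latent = [rng.gauss(0, 1) for _ in range(5000)]

# Hypothetical cutpoints on the standard normal scale.
two_cat = [categorize(v, [0.0]) for v in latent]
five_cat = [categorize(v, [-1.5, -0.5, 0.5, 1.5]) for v in latent]

r_two = pearson(latent, two_cat)
r_five = pearson(latent, five_cat)
print(f"2 categories: r = {r_two:.2f}; 5 categories: r = {r_five:.2f}")
```

In this setup the five-category indicator retains more of the latent information than the dichotomy, which is in line with the caution about indicators with very few categories.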

Ruohotie and Nokelainen (2000) found that growth-oriented atmosphere generates togetherness and reflects on developing leadership. Multidimensional scaling provided evidence to conclude that factors representing the incentive value of the job, commitment to work and organization, the clarity of the job and growth motivation are the strongest indicators of growth-oriented atmosphere. Ruohotie and Nokelainen (2000) made the following conclusions based on their research findings:

• teacher’s professional growth-motivation reflects directly with task value on teacher-pupil relationships and on achievement motivation;

• task value has an effect on growth-oriented atmosphere; and

• growth-oriented atmosphere is the highest in work assignments that offer challenging professional tasks (manager, teacher) and lowest among other workers.


Method

Sample

A non-probability sample included employees who worked in a Finnish polytechnic institution of higher education during the year 2002. The organization is the same as in the 1998 study (Ruohotie and Nokelainen, 2000), but as the organization structure was re-arranged in 2000, the number of units has dropped from ten to eight. A total of 447 participants completed the questionnaire. The sample size is 87 percent of the survey population of 512 workers, indicating a 13 percent non-response rate. The target population of Finnish polytechnic institution of higher education personnel in 2002 was 9,661. Non-response error was analyzed in the study by comparing job title distributions (manager/teacher/other) between the sample and public employee records. We conclude that the results of this study are to some extent generalizable to the target population of Finnish polytechnic institutions of higher education, as the target organization’s job distribution resembles the job distribution of the target population.

The average age of respondents in the sample was 39 years (SD = 9.1, range 22-62). Respondents’ job profiles were as follows (with 6 percent, n = 27, missing data): teachers (48 percent, n = 215), managers (7 percent, n = 30) and other personnel (39 percent, n = 175).
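The response rate and the job profile percentages reported above follow directly from the counts given in the text. The short Python sketch below reproduces that arithmetic; the variable names are illustrative only, and the counts are the ones stated in the two preceding paragraphs.

```python
# Counts as reported in the text for the 2002 survey.
respondents = 447
survey_population = 512

response_rate = respondents / survey_population
non_response_rate = 1 - response_rate
print(f"response rate: {response_rate:.0%}, non-response: {non_response_rate:.0%}")

# Job profile counts as reported in the text, including missing data.
job_profile_counts = {
    "teachers": 215,
    "managers": 30,
    "other personnel": 175,
    "missing": 27,
}

# Share of each group in the whole sample, rounded to whole percentages.
job_profile_shares = {
    group: round(100 * count / respondents)
    for group, count in job_profile_counts.items()
}
for group, share in job_profile_shares.items():
    print(f"{group}: {share} percent")
```

The four groups add up to the 447 respondents, so the shares are computed against the full sample rather than against the valid answers only.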

A majority of the respondents were established employees (64 percent, n = 287), but the sample also included temporary (25 percent, n = 109) and part-time (6 percent, n = 28) workers. Of the managers, 80 percent (n = 24) had established contracts and 20 percent (n = 6) had a temporary contract. Over half of the teachers (67 percent, n = 143) had established contracts, 21 (10 percent) had part-time, and 48 (22 percent) had temporary contracts. Other personnel had the following contracts: 66 percent (n = 115) established, 3 percent (n = 6) part-time and 29 percent (n = 51) temporary.

Instrument

The Growth-oriented Atmosphere Questionnaire (GOAQ) used in this study was a modified version of the one developed during the Growth Needs project (Ruohotie, 1996). The theoretical basis for the structure of the instrument was derived from the works of Argyris (1972, 1992), Dubin (1977, 1990), Hall (1986, 1990), and Kaufman (1974, 1990). The latest version of the GOAQ is based on the research findings of the Growth Needs Project’s previous research phase (Ruohotie and Nokelainen, 2000). The original instrument contained 92 items operationalizing 14 latent dimensions. Each item was measured on a five-point summative rating scale from 1 (strongly disagree) to 5 (strongly agree). According to the results of exploratory factor analysis for categorical variables (CEFA), the 67 strongest loading items were chosen to describe the 13 dimensions of the growth-oriented atmosphere model (see Table I). The dimension measuring students’ attitudes toward the teacher in the 1998 study was dropped from the current study as it is relevant only for the teachers, who represent 48 percent of the sample. A demographics sheet was attached to the instrument enquiring about respondents’ position in the organization and the nature of the contract.
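Table I summarizes each item with its median and mode, two statistics that suit ordinal five-point ratings because they do not assume equal distances between the response options. The sketch below shows how such item summaries could be obtained with the Python standard library; the responses are invented for illustration and do not come from the GOAQ data.

```python
from statistics import median, multimode

# Hypothetical answers of ten respondents to one item,
# coded 1 (strongly disagree) to 5 (strongly agree).
responses = [5, 4, 4, 3, 5, 4, 2, 4, 5, 3]

item_median = median(responses)   # middle value of the sorted answers
item_modes = multimode(responses)  # most frequent answer(s), ties included

print(f"median = {item_median}, mode(s) = {item_modes}")
```

`multimode` is used instead of `mode` so that a tie between two equally frequent answers is reported rather than silently resolved.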


Data(n ¼ 447)

Item Median Mode

Factor 1. Encouraging leadership (ENC)v5. My manager is friendly and easily approachable 4.0 5v6. My manager pays attention to my suggestions and wishes 4.0 5v7. My manager works with a team to find solutions 4.0 4v8. My manager is fair 4.0 5v9. The employees in my organization are encouraged to develop

new working methods and to think creatively 4.0 4v10. My manager trusts his or her staff and allows them to work

independently 4.0 5v11. The organization promotes self-reliance and employees are

encouraged to find new and improved working methods 4.0 4v13. The managers are interested in the wellbeing of staff 3.0 4v14. The management strives to improve the working conditions of

staff 4.0 4v15. My goals were agreed in co-operation with my manager 4.0 4v23. Failures are dealt with in a constructive manner and employees

are encouraged to learn from their mistakes 3.0 3v25. My manager has supported me in the past 4.0 4v26. My manager knows how to tap into the differing characteristics

within the workforce 3.0 4v27. My manager has succeeded in strengthening the sense of unity in

the workplace 3.0 3v90. This organisation values me as an individual 4.0 4

Factor 2. Strategic leadership (STR)v1. The management of my organization provides a clear direction

and highlights the key points in education 3.0 3v2. The management of my organization expresses and enforces

accepted values both in spoken form and through its example 3.0 3v3. The management of my organization embodies distinct values

and a clearly defined style of leadership 3.0 3v4. The management of my organization observes the latest

educational developments and uses this information whenplanning the organization’s activities 3.0 3

Factor 3. Know-how rewarding (REW)v20. It is rewarding to achieve my goals 2.0 1v21. The organization rewards its employees’ professional knowledge

and skills 2.0 1v22. Employees with increased knowledge are given extra

responsibility 3.0 3v24. The organization rewards employees for tackling demanding

tasks 3.0 3

Factor 4. Know-how developing (DEV)v37. The organization endeavours to always use the latest knowledge

in the field 4.0 4v38. The organization’s employees are given training to increase their

professional skills 3.0 4

(continued )

Table I.The growth-oriented

atmosphere questionnaire

Modelingof growth

prerequisites

43

page 92

Data (n = 447)

Item Median Mode

v39. The organization takes an active interest in its employees’ professional growth 3.0 3
v40. The staff is given the latest information and professional literature 4.0 4
v41. I am given the chance to learn new things and improve myself 4.0 4

Factor 5. Incentive value of the job (INV)
v28. I can work independently and without restrictions 4.0 4
v29. I can use my skills at work in a variety of ways 4.0 4
v30. My work consists of various differing tasks 4.0 5
v31. My work gives me a sense of success and achievement 4.0 4
v32. My work gives me personal satisfaction 4.0 4

Factor 6. Clarity of the job (CLA)
v46. A clear division of tasks exists between members of teaching staff 3.0 4
v47. The organization’s decision making structure is transparent 3.0 3
v48. The organization’s goals are transparent 3.0 4
v49. The teachers know exactly what their colleagues expect of them 3.0 3

Factor 7. Valuation of the job (VAL)
v42. My manager appreciates my work 4.0 4
v43. I am given encouraging feedback on my work 3.0 4
v45. I feel that my work is valued 4.0 4

Factor 8. Community spirit (COS)
v54. The organization’s staff feels personally responsible for achieving their goals 4.0 4
v55. The staff maintains a demand for high performance 4.0 4
v56. The staff possesses a sense of unity and a willingness to strive towards a common goal 4.0 4
v57. My colleagues help me when necessary 4.0 4
v58. The staff discusses improvements to work and/or their working environment 4.0 4
v59. The staff presents new ideas about solving work-related problems 4.0 4
v60. The staff wants to improve the quality of teaching 4.0 4

Factor 9. Team spirit (TES)
v50. I have ample opportunities to exchange work-related ideas and experiences with my colleagues 4.0 4
v51. We tend to evaluate and analyze our work together to learn from it 3.0 3
v52. We solve work-related problems together 4.0 4
v53. We advise and guide each other on executing work-related tasks 4.0 4

Factor 10. Psychic stress of the job (PSY)
v78. I feel that I am beginning to dislike my work 2.0 2
v79. I feel that it is getting more difficult for me to take the initiative 2.0 1
v80. I find it difficult to concentrate 2.0 1


Procedure

The sample was obtained with non-probability sampling. Each employee of the organization was personally invited via e-mail to complete an online version of the GOAQ. The online questionnaire (Miettinen et al., 2005) presented one to five questions on the same page, allowing respondents to attach an open comment to each question.

Non-response error was analyzed by comparing the job profiles of the sample with the survey population (teachers: 48 per cent, managers: 6 per cent, other personnel: 46 per cent) and target population distributions (teachers: 57 per cent, managers: 9 per cent, other personnel: 34 per cent) derived from public records. Comparison of the job profile distributions shows that the “other personnel” group is 12 per cent underrepresented in the sample when compared with the survey population. Teachers are 9 per cent underrepresented in the sample when compared with the target population. The sample distribution of job profiles is similar enough to the survey and target population distributions to represent the personnel of a Finnish polytechnic institution of higher education in this study.

Table I (continued)

Factor 11. Build-up of work requirements (BUI)
v70. My workplace has too few employees to cope with the workload 4.0 4
v72. My workload has increased during the past years 4.0 5
v76. My working pace has increased in recent years 4.0 4
v77. I feel that I am experiencing fatigue 3.0 4

Factor 12. Commitment to work and organization (COM)
v87. I am happy in my present job 4.0 4
v88. I want to continue in my present job; it gives me job satisfaction 4.0 4
v89. I don’t find going to work each morning disagreeable 4.0 5
v91. I do not wish to change jobs 4.0 4

Factor 13. Growth motivation (GRM)
v81. I feel encouraged by having added responsibilities 4.0 4
v82. I find self-improvement useful 5.0 5
v83. I like to participate in all manner of improvement projects within the organization (such as training, team work and projects, exchanging duties, taking on additional tasks, etc.) 4.0 4
v84. I am interested in further training, provided it speeds up my transfer to other, more challenging tasks 4.0 5
v85. I like to experiment with new ideas 4.0 4

Note: Data = Employees of a Finnish polytechnic institution of higher education

Data analyses

Research questions in this study are addressed with unsupervised multivariate data analysis methods that allow ordinal indicators. Unsupervised methods (e.g., exploratory factor analysis) discover variable structure from the evidence of the data matrix, as opposed to supervised methods (e.g., discriminant analysis) that assume a given structure (Venables and Ripley, 2002, p. 301). Unsupervised methods are further divided into four sub categories:

(1) visualization methods;

(2) cluster analysis;

(3) factor analysis; and

(4) discrete multivariate analysis.

The first research question is investigated with exploratory factor analysis for categorical indicators (CEFA), which is implemented in Mplus (Muthen and Muthen, 2001), and Spearman nonparametric rank-order correlations. The use of CEFA has two major advantages over traditional exploratory factor analysis. First, it allows the use of ordinal indicators as it is based on the categorical variable model developed by Bengt Muthen (1993). Second, it does not require multivariate normality as it applies the general asymptotic distribution free function instead of the usual maximum likelihood estimator (Muthen and Muthen, 2001).
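To illustrate why a rank-order coefficient suits five-point ordinal items, the following minimal sketch computes Spearman's coefficient as the Pearson correlation of tie-averaged ranks. The function names and the response data are hypothetical and are not taken from the study; the authors' own computations were done in statistical software.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's coefficient: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical five-point responses of eight respondents to two items.
item_a = [1, 2, 2, 3, 4, 4, 5, 5]
item_b = [2, 1, 3, 3, 4, 5, 4, 5]
rho = spearman(item_a, item_b)
```

Because only the ordering of the responses enters the calculation, the coefficient does not assume equal distances between the scale points.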

The other three research questions are investigated in this paper with non-linear visualization methods. According to Venables and Ripley (2002), visualization methods are often more effective than clustering methods in discovering interesting groupings in the data, and they avoid the danger of over-interpretation of the results as the researcher is not allowed to input the number of expected latent dimensions. In cluster analysis the centroids that represent the clusters are still high dimensional, and some additional illustration methods are needed for visualization (Kaski, 1997), for example, multidimensional scaling (Kim et al., 2000). We apply in this study the BayMiner (www.bayminer.com/en) non-linear visualization modeling software as it is capable of analyzing both linear and non-linear dependencies between variables under investigation (Kontkanen et al., 2000).

Results

Kerlinger (1986) classified the weaknesses of rating scales into extrinsic and intrinsic. The extrinsic defect is that scales are too easy to construct and use. Sometimes a scale is used to measure things that it is not designed to measure. This point was addressed with a pilot study of 12 respondents and an interview of the organization’s development manager. The online questionnaire that was used for the pilot study was the near-final version, allowing respondents to attach an open comment to each question. This procedure is quite close to what Fowler (1995, pp. 130-1) calls “field pre-test with observation”, as with an online questionnaire we are able to ask for item-specific comments and even track answering times for each item. The comments from the pilot study and interview were analyzed and wordings improved where necessary. The item structure from the pilot study was analyzed with Bayesian dependency modeling, which is computationally robust even with small sample sizes. Results of the pilot study showed, for example, that the term “manager” was not clear to all the respondents. Further, some of them did not understand the difference between “management” and “manager”. We solved this problem by adding clear definitions of the terms on the opening page of the questionnaire.

According to Kerlinger (1986, p. 495), the intrinsic defect of rating scales is their proneness to constant error. He lists four main sources:

(1) halo effect;

(2) error of severity (to rate all items too low);


(3) error of leniency (to rate all items too high); and

(4) error of central tendency (to avoid all extreme judgments).

To examine the intrinsic defect we analyzed the overall response tendency. Results show that the respondents used the whole scale from 1 (totally disagree) to 5 (totally agree) for all the items but one. The scale for item v82 (“I find self-improvement useful”) ranged from 2 to 5. Mode frequencies that sum up to the number of items in the questionnaire were as follows: (1, strongly disagree) n = 0, (2) n = 9, (3) n = 27, (4) n = 54, (5, strongly agree) n = 2. This result indicates that the overall distribution of the modes on a five-point summative rating scale is unimodal and only slightly biased towards positive values.
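The response-tendency check described above can be sketched as follows. The per-item responses are hypothetical and only show how the median and mode of each item are obtained; the tally of modes is copied from the figures reported in the text.

```python
from collections import Counter
from statistics import median, mode

# Hypothetical responses (1-5) of seven respondents to three items.
responses = {
    "item_1": [4, 4, 5, 3, 4, 2, 4],
    "item_2": [3, 3, 2, 4, 3, 5, 1],
    "item_3": [5, 4, 4, 4, 3, 4, 5],
}

# Median and mode per item, as in the Median and Mode columns of Table I.
summary = {name: (median(vals), mode(vals)) for name, vals in responses.items()}

# How often each scale point is the modal answer across the items.
mode_counts = Counter(m for _, m in summary.values())

# Mode frequencies reported in the text for scale points 1..5.
reported = {1: 0, 2: 9, 3: 27, 4: 54, 5: 2}
total_items = sum(reported.values())  # the tallies add up to 92
share_positive = (reported[4] + reported[5]) / total_items
```

With the reported tallies, scale point 4 is the single peak and roughly 61 per cent of the modes fall on the two agreeing categories, which is consistent with a unimodal distribution that leans towards positive values.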

RQ1. Relevance of the 13-factor growth-oriented atmosphere model

Exploratory factor analysis for categorical indicators was conducted to solve the first research question: “Is the 13-factor model of growth-oriented atmosphere relevant to describe growth prerequisites of Finnish polytechnic institution of higher education employees?” In technical terms, our goal is to find the most relevant factorial structure for observed variables measuring growth-oriented atmosphere.

The GOAQ items were subjected to categorical exploratory unweighted least squares factor analysis with Varimax rotation. An initial estimation yielded 14 factors with eigenvalues exceeding unity, accounting for 73 per cent of the total variance. A thirteen-factor Varimax-rotated solution, accounting for 71 per cent of the total variance, was found to be most interpretable in terms of meaningful clusters and correspondence to both theoretical and empirical findings of our previous research work. The root mean square residuals (RMSR) help the investigator to examine how well the aspects of the data are captured by the model (Loehlin, 2004, p. 70). The RMSR value of 0.03 was well below a cut-off value of 0.08 (Hu and Bentler, 1999). Figure 1 presents the 13-factor model of the growth-oriented atmosphere. The individual items related to the dimensions are presented in Table I.
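As a rough illustration of the two summary figures used above, the sketch below computes the share of variance explained per factor and an RMSR value from a loading matrix. It assumes orthogonal factors (as with Varimax rotation) and takes RMSR as the root mean square of the off-diagonal residual correlations; the loadings and correlations are invented for the example, and Mplus may differ in computational details.

```python
from math import sqrt


def reproduced_correlations(loadings):
    """Model-implied correlations for orthogonal factors: L times L-transposed."""
    n = len(loadings)
    return [
        [sum(a * b for a, b in zip(loadings[i], loadings[j])) for j in range(n)]
        for i in range(n)
    ]


def rmsr(observed, loadings):
    """Root mean square of the off-diagonal residual correlations."""
    implied = reproduced_correlations(loadings)
    n = len(observed)
    residuals = [observed[i][j] - implied[i][j] for i in range(n) for j in range(i)]
    return sqrt(sum(r * r for r in residuals) / len(residuals))


def variance_explained(loadings):
    """Share of total variance per factor: sum of squared loadings / item count."""
    n_items = len(loadings)
    n_factors = len(loadings[0])
    return [sum(row[f] ** 2 for row in loadings) / n_items for f in range(n_factors)]


# Hypothetical four-item, two-factor example (illustrative values only).
loadings = [[0.8, 0.1], [0.7, 0.2], [0.1, 0.9], [0.2, 0.6]]
observed = [
    [1.00, 0.60, 0.15, 0.25],
    [0.60, 1.00, 0.27, 0.24],
    [0.15, 0.27, 1.00, 0.55],
    [0.25, 0.24, 0.55, 1.00],
]
fit = rmsr(observed, loadings)
shares = variance_explained(loadings)
```

In this toy example the residuals are small, so the RMSR falls below the 0.08 cut-off in the same way as the value reported for the 13-factor solution.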

Figure 1. Thirteen-factor model of the growth-oriented atmosphere


Dimensions derived from the factor analysis are strongly related to each other as the correlation coefficients presented in Table II are significant at the 0.01 level (two-tailed). Spearman bivariate coefficients range between 0.81 and -0.52. The average of all coefficients is 0.26 and the average of total variance explained is 7 per cent. Closer examination of the coefficients reveals, as expected, that growth motivation (GRM) is not affected by strategic leadership (STR), know-how rewarding (REW) or build-up of work requirements (BUI). It is also noteworthy that psychic stress of the job (PSY) correlates positively only with build-up of work requirements (BUI).

Growth prerequisites of a polytechnic institution of higher education can be described with the help of the 12 dimensions that are presented in Table III. The students’ attitude to teacher dimension that was present in the earlier solution of Nokelainen and Ruohotie (2000) was omitted from this model for theoretical and technical reasons. The theoretical reason was that the factor is too tightly related to teaching, making it an irrelevant dimension for those employees who do not teach (i.e. managers and other personnel). The second, more technical point favoring rejection of the factor was that the items operationalizing the dimension were not selective enough (i.e. the full scale was not in use).

Internal consistency measures estimate how consistently individuals respond to the items within a scale. Reliability is, thus, a characteristic of the data in hand, and not of the test (Thompson, 1998). Table III shows both lower (Cronbach’s alpha) and upper bound (Tarkkonen’s reliability, see Vehkalahti, 2000) of such measures. The scores in our study range from 0.75 to 0.97 (Cronbach’s alpha) and from 0.79 to 0.97 (Tarkkonen’s reliability). The most reliable factor was encouraging leadership (ENC). This finding is partly due to the fact that alpha values tend to get larger as the number of items grows (ENC was measured with 15 items, whereas the other dimensions had three to seven items).
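The dependence of alpha on scale length can be made concrete with a short sketch. The first function computes Cronbach's alpha from raw item scores; the second uses the standardized form of alpha to show how the coefficient grows with the number of items when the mean inter-item correlation is held fixed. The data and the correlation value of 0.5 are assumptions chosen for illustration, not values from the study.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of per-item score lists
    covering the same respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))


def standardized_alpha(k, mean_r):
    """Alpha implied by k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Hypothetical scores of five respondents on a three-item scale.
scale = [[1, 2, 3, 4, 5], [2, 2, 3, 5, 5], [1, 3, 3, 4, 4]]
alpha = cronbach_alpha(scale)

# Same mean inter-item correlation, different scale lengths.
alpha_3_items = standardized_alpha(3, 0.5)
alpha_15_items = standardized_alpha(15, 0.5)
```

With an assumed mean inter-item correlation of 0.5, a three-item scale yields 0.75 while a 15-item scale yields about 0.94, which mirrors the remark that the 15-item ENC scale benefits from its length.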

RQ2. Validity of the four group classification of growth-oriented atmosphere factors

Non-metric multidimensional scaling was conducted in order to answer the second research question: Is the theoretical four-group classification of the growth-oriented atmosphere factors valid for this sample? In technical terms, we examine what is the geometric two-dimensional structure of the components operationalizing growth-oriented atmosphere.

Figure 2 represents the structure of two dimensional distance measures between cases in our growth-oriented atmosphere data set. Euclidean distance as the dissimilarity measure and a distance scaling model for ordinal data were applied. The first dimension classified components into two groups. The first group contained factors representing operational capacity of the team: student’s attitudes towards teacher (STA); growth motivation (GRM); incentive value of the job (INV); and work group cooperation (WOC). Factors in the second group were connected to supporting and rewarding management: rewarding for know-how (REW); strategic leadership (STR); and clarity of the job (CLA). The second dimension visualizes work-related stress: psychic stress of the job (PSY) and increase in the demands of the work (BUI) represent the negative end of the scale, and encouraging leadership (ENC), valuation of the job (VAL), know-how developing (DEV), commitment to work and organization (COM), and community spirit (COS) the positive polarity (Figure 2).


Table II. Correlation coefficients of the 13 dimensions of growth-oriented atmosphere

Growth-oriented atmosphere factors  1  2  3  4  5  6  7  8  9  10  11  12  13

1. Encouraging leadership (ENC)  –
2. Strategic leadership (STR)  0.39  –
3. Know-how rewarding (REW)  0.65  0.54  –
4. Know-how developing (DEV)  0.68  0.40  0.61  –
5. Incentive value of the job (INV)  0.60  0.27  0.41  0.59  –
6. Clarity of the job (CLA)  0.72  0.47  0.57  0.61  0.48  –
7. Valuation of the job (VAL)  0.85  0.37  0.60  0.66  0.61  0.63  –
8. Community spirit (COS)  0.55  0.32  0.39  0.55  0.46  0.48  0.56  –
9. Team spirit (TES)  0.48  0.30  0.36  0.49  0.37  0.44  0.49  0.75  –
10. Psychic stress of the job (PSY)  -0.30  -0.19  -0.21  -0.31  -0.40  -0.30  -0.35  -0.24  -0.23  –
11. Build-up of work requirements (BUI)  -0.19  -0.19  -0.25  -0.16  -0.12  -0.20  -0.23  -0.06  -0.06  0.41  –
12. Commitment to work and organization (COM)  0.55  0.34  0.42  0.49  0.61  0.47  0.58  0.36  0.32  -0.50  -0.28  –
13. Growth motivation (GRM)  0.20  0.04  0.04  0.24  0.29  0.11  0.21  0.30  0.22  -0.23  0.03  0.19

Note: Spearman rank order correlations (rs) were calculated due to ordinal measurement scale


Table III. The 13 dimensions of growth-oriented atmosphere

Each entry gives the growth-oriented atmosphere factor, its description, Cronbach’s alpha (α, note b) and Tarkkonen’s reliability (TR, note c).

1. Encouraging leadership (ENC) (a)
Management of the organization expresses and consolidates values that direct activities, monitors the development processes of units and defines the direction and focus of operations.
α = 0.97; TR = 0.97

2. Strategic leadership (STR)
Manager supports and motivates personnel to develop know-how, work methods and work community. He takes advantage of work community member’s expert knowledge and he tries to solve problems with them. He pays attention to the expectations and wishes of personnel.
α = 0.89; TR = 0.90

3. Know-how rewarding (REW)
Organization rewards its employees’ professional knowledge and skills. Members of work community gain more responsibility as their know-how increases.
α = 0.87; TR = 0.87

4. Know-how developing (DEV)
Organization takes an active interest in its employee’s professional growth. Members of work community are interested in self-developing.
α = 0.88; TR = 0.90

5. Incentive value of the job (INV) (a)
Work gives intrinsic fulfilment by being versatile, autonomous and challenging.
α = 0.88; TR = 0.90

6. Clarity of the job (CLA)
Personnel has a clear picture of goals and responsibilities. They are aware of decision-making processes and personal expectations.
α = 0.87; TR = 0.90

7. Valuation of the job (VAL) (a)
Work contribution is respected by the worker itself, colleagues and management.
α = 0.88; TR = 0.90

8. Community spirit (COS) (a)
How community members may learn from each other, for example via dialogue, by analyzing mistakes, participating in collaborative planning and quality development.
α = 0.92; TR = 0.93

9. Team spirit (TES)
Good team spirit promotes helping each other and taking responsibility over common goals. Work group members discuss about developing work and working environment.
α = 0.87; TR = 0.88

10. Psychic stress of the job (PSY)
To what extent work and changes relating to it induce psychic strain like fatigue and flightiness.
α = 0.83; TR = 0.85

11. Build-up of work requirements (BUI)
How to cope with changes in the personal workload.
α = 0.75; TR = 0.79

12. Commitment to work and organization (COM) (a)
To be truly excited about one’s work. How important it is to stay in current job.
α = 0.87; TR = 0.89

13. Growth motivation (GRM) (a)
To trust one’s abilities in difficult situations, take new challenges and develop one’s know-how.
α = 0.80; TR = 0.81

Notes: (a) Common dimension as in the previous study in the same organization (Ruohotie and Nokelainen, 2000) with 80-item version of the questionnaire; (b) Cronbach’s index of internal consistency; (c) Tarkkonen’s reliability index


Examination of the coordinates for scaling Euclidean dimensions in two-dimensional space shows that growth motivation (1.3945) and incentive value of the job (1.1274) are the strongest components on the positive end of the first dimension, and psychic stress of the job (-2.4731) is strongest on the negative end together with rewarding of know-how (-1.5332) and strategic leadership (-1.4983). The stress value (0.049) indicates that the model fits the data reasonably well. This result, together with visual examination of Figure 2, supports the earlier research finding suggesting that encouraging leadership (ENC) and commitment to work and organization (COM) are closely situated in the visual space, but in different dimensions.
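A stress value of this kind can be sketched with Kruskal's stress-1 formula. The version below is a simplification: it compares the map distances with the target dissimilarities directly, whereas a non-metric procedure would first replace the dissimilarities with monotonically transformed disparities. The three-point configuration is hypothetical.

```python
from math import dist, sqrt


def stress_1(dissimilarities, coords):
    """Kruskal's stress-1 between target dissimilarities and map distances.

    dissimilarities maps (i, j) label pairs to target values;
    coords maps each label to its coordinates in the low-dimensional map.
    """
    squared_error = 0.0
    squared_distance = 0.0
    for (i, j), target in dissimilarities.items():
        d = dist(coords[i], coords[j])  # Euclidean distance in the map
        squared_error += (d - target) ** 2
        squared_distance += d ** 2
    return sqrt(squared_error / squared_distance)


# Hypothetical two-dimensional configuration of three components.
coords = {"A": (0.0, 0.0), "B": (3.0, 0.0), "C": (0.0, 4.0)}

# Targets reproduced exactly by the map, and a slightly distorted set.
exact = {("A", "B"): 3.0, ("A", "C"): 4.0, ("B", "C"): 5.0}
distorted = {("A", "B"): 3.0, ("A", "C"): 4.0, ("B", "C"): 5.5}

perfect_fit = stress_1(exact, coords)
slight_misfit = stress_1(distorted, coords)
```

A value of zero means the map reproduces the targets exactly, and values grow as the configuration distorts them, so a small stress value such as the one reported above points to little distortion.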

RQ3. Position and the nature of contract as predictors of growth motivation

Bayesian model-based visualization is applied in this study to investigate the third research question: To what extent are employees’ position and the nature of contract connected to growth motivation? With Bayesian unsupervised model-based visualization we may concentrate on singular summary factors and study each one’s distribution dynamically.

Figure 3 is a visualization of the Bayesian network model. The window has the following elements: main window, attribute selection window (upper left corner), low profile window (lower left corner), initial profile window (lower right corner), and high profile window (upper right corner). The main window contains the model in which each dot stands for one respondent (n = 447). The attribute selection window shows the current component of interest and its discretization (i.e. the classes of data). In Figure 3, the component of interest is growth motivation (KM_GRM). The low profile window shows the distribution of the examined variables when the sub sample represents the lowest values of the component; the high profile window has the same functionality for the high end sub sample. The initial profile window shows the initial distribution of the examined variables. Thin bars in a profile window represent initial values, thick bars the values of the current sub sample.

Figure 2. The growth-oriented atmosphere factors in two-dimensional space (MDS, Euclidean distance model)

The attribute selection window in the upper left part of Figure 3 shows that the growth motivation scale is quite biased and thus the upper bound for the lowest category is 3.55. However, inspection of the values in the high-scale profile window and high scale sub sample frame gives evidence that managers and teachers have a distinguished representation in the highest category of growth motivation, as the thick bar is taller than the thin bar that indicates the average value. It is interesting to observe that those respondents with the most insecure contracts, namely temporary and part-time, have higher growth motivation than their established colleagues (Figure 3).
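The comparison of thick and thin bars can be imitated with a simple tabulation. The sketch below is not BayMiner's algorithm; it only contrasts the share of each job position in a high-score sub sample with its share among all respondents. The records, field names and the cut-off of 4.5 are hypothetical.

```python
from collections import Counter


def distribution(values):
    """Relative frequency of each distinct value."""
    counts = Counter(values)
    n = len(values)
    return {key: count / n for key, count in counts.items()}


def profile(records, score_key, attr_key, lower_bound):
    """Pair each attribute value's share in the high-score sub sample
    (thick bar) with its share among all records (thin bar)."""
    initial = distribution([r[attr_key] for r in records])
    high = distribution([r[attr_key] for r in records if r[score_key] >= lower_bound])
    return {key: (high.get(key, 0.0), initial[key]) for key in initial}


# Hypothetical respondents with a position and a growth motivation score.
records = [
    {"position": "teacher", "grm": 4.8},
    {"position": "teacher", "grm": 4.6},
    {"position": "teacher", "grm": 3.2},
    {"position": "manager", "grm": 4.9},
    {"position": "other", "grm": 3.0},
    {"position": "other", "grm": 3.4},
    {"position": "other", "grm": 4.7},
    {"position": "other", "grm": 3.1},
]

result = profile(records, "grm", "position", 4.5)
# Positions whose sub-sample share exceeds their overall share.
overrepresented = sorted(k for k, (high, initial) in result.items() if high > initial)
```

In this invented data set, managers and teachers are overrepresented in the high-score sub sample, which corresponds to a thick bar that is taller than the thin one.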

RQ4. Position and the nature of contract as predictors of commitment to the organization

The fourth research question is to study to what extent employees’ position and the nature of contract are connected to their commitment to the organization. The attribute selection window in the upper left part of Figure 4 shows that the scale for commitment to work and organization is balanced: the upper bound for the lowest category is 2.92 and the lower bound for the highest category is 4.76. Values in the high-scale profile window and high scale sub sample frame suggest that managers and teachers have the highest level of commitment to work, as the thick bar is taller than the thin bar indicating the average value. This result is parallel with our earlier research findings (Ruohotie and Nokelainen, 2000). Commitment to work and organization is highest among those respondents with the most insecure contracts (Figure 4).

Figure 3. Bayesian model-based visualization of growth motivation by employees’ position and the nature of contract

Conclusion

We have examined dimensions of growth-oriented atmosphere in a Finnish polytechnic institution of higher education with categorical exploratory factor analysis, classical multidimensional scaling and Bayesian unsupervised model-based visualization.

The thirteen-dimension Varimax-rotated solution in the categorical factor analysis was found to be interpretable in terms of meaningful clusters and correspondence to both theoretical and empirical findings of previous research (Ruohotie and Nokelainen, 2000).

Results of two-dimensional scaling showed that the components on the negative end of the first dimension represent operational capacity of the team. Components on the positive end of the first dimension are related to supporting and rewarding management. The second dimension visualized work-related stress; the components with the most negative coordinates were psychic stress of the job and increase in the demands of the work. Rewarding for know-how, clarity of the job assignments, and encouraging leadership represented the positive polarity of the second dimension. Research evidence suggests that the psychic stress caused by the work increasingly affects the build-up of work requirements.

Discussion

The findings of a previous study (Ruohotie and Nokelainen, 2000) conducted in the same domain suggested that growth-oriented atmosphere generates togetherness and reflects on developing leadership. Multidimensional scaling and Bayesian unsupervised model-based visualization both provided evidence to conclude that factors representing encouraging leadership and commitment to work and organization are closely situated in the visual space, but in different dimensions. Results further showed that managers and teachers had the highest growth motivation and level of commitment to work. Employees across all job titles in the organization with temporary or part-time contracts had higher self-reported growth motivation and commitment to work and organization than their established colleagues.

Figure 4. Bayesian model-based visualization of commitment to work and organization by employees’ position and the nature of contract

A recent study among the employees of a US restaurant chain showed that conscientiousness was the best predictor of job performance against work experience, psychological atmosphere and work effort (Byrne et al., 2005). Results indicated that being conscientious might not be enough to secure the highest levels of performance unless the individual is concurrently willing to work hard, and is a member of a psychologically secure work setting. In the current study, most of the employees were working with established contracts and the work performance was not measured, but still the results are comparable at least to some extent. Findings of both studies underline the importance of willingness to work hard (i.e. high growth motivation and valuation of the job) and a psychologically secure work setting (i.e. low level of psychic stress, strong team and community spirit).

Finally, an interesting future research direction would be to study the relationship between empowerment and management of change (Spreitzer et al., 1999). Empowerment means the removal of constraints that prevent individuals from working to their optimal potential (Mills and Friesen, 1995). Perceived empowerment is a process that expands individual power in comparison to the status quo or some solid end result. It will occur to varying degrees within any organization, and individuals will experience variable feelings of empowerment at different times (Koberg et al., 1999). Thus, managers should be able to evolve innovative ideas, gain support from their superiors, and finally, encourage members of the work community to strive for a common goal.

References

Argyris, C. (1972), The Applicability of Organizational Sociology, Cambridge University Press, London.

Argyris, C. (1992), On Organizational Learning, Blackwell Publishers, Cambridge, MA.

Byrne, Z.S., Stoner, J., Thompson, K.R. and Hochwarter, W.A. (2005), “The interactive effects of conscientiousness, work effort, and psychological climate on job performance”, Journal of Vocational Behavior, Vol. 66 No. 2, pp. 326-38.

Cattell, R.B. (1978), The Scientific Use of Factor Analysis in Behavioral and Life Sciences, Plenum Press, New York, NY.

Cronbach, L.J. (1970), Essentials of Psychological Testing, 3rd ed., Harper & Row, New York, NY.

DeVellis, R.F. (2003), Scale Development, Theory and Applications, 2nd ed., Sage, Thousand Oaks, CA.

Dubin, S. (1977), “The updating process”, Continuing Education in Science and Engineering, December, pp. 165-86.

Dubin, S. (1990), “Maintaining competence through updating”, in Willis, S. and Dubin, S. (Eds), Maintaining Professional Competence, Jossey-Bass, San Francisco, CA, pp. 9-43.


Fishbein, M. and Stasson, M. (1990), “The role of desires, self-predictions, and perceived control in the prediction of training session attendance”, Journal of Applied Social Psychology, Vol. 20 No. 3, pp. 173-98.

Fowler, F.J. Jr (1995), “Improving survey questions”, Design and Evaluation, Sage, Thousand Oaks, CA.

Gorsuch, R. (1983), Factor Analysis, 2nd ed., Lawrence Erlbaum Associates, Hillsdale, NJ.

Hair, J.F., Anderson, R.E., Tatham, R.L. and Black, W.C. (1995), Multivariate Data Analysis, 4th ed., Prentice-Hall, Englewood Cliffs, NJ.

Hall, D.T. (1986), “Breaking career routines: mid-career choice and identity development”, in Hall, D.T. (Ed.), Career Development in Organizations, Jossey-Bass, San Francisco, CA, pp. 120-59.

Hall, D.T. (1990), “Career development theory in organizations”, in Brown, D. and Brooks, L. (Eds), Career Choice and Development, Jossey-Bass, San Francisco, CA, pp. 422-54.

Hu, L. and Bentler, P. (1999), “Cut-off criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives”, Structural Equation Modeling, Vol. 6 No. 1, pp. 1-55.

Johnson, D.R. and Creech, J.C. (1983), “Ordinal measures in multiple indicator models: a simulation study of categorization error”, American Sociological Review, Vol. 48 No. 3, pp. 398-407.

Kaski, S. (1997), “Data exploration using self-organizing maps”, Acta Polytechnica Scandinavica, Mathematics, Computing and Management in Engineering Series No. 82, Finnish Academy of Technology, Espoo.

Kaufman, H. (1974), Obsolescence and Professional Career Development, AMACOM, New York, NY.

Kaufman, H. (1990), “Management techniques for maintaining a competent professional work force”, in Willis, S. and Dubin, S. (Eds), Maintaining Professional Competence, Jossey-Bass, San Francisco, CA, pp. 249-61.

Kerlinger, F. (1986), Foundations of Behavioral Research, 3rd ed., CBS College Publishing, New York, NY.

Kim, S.-S., Kwon, S. and Cook, D. (2000), “Interactive visualization of hierarchical clusters using MDS and MST”, Metrika, Vol. 51 No. 1, pp. 39-51.

Knowles, M. (1990), The Adult Learner: A Neglected Species, Gulf Publishing Company, Houston, TX.

Koberg, C., Boss, R., Senjem, J. and Goodman, E. (1999), “Antecedents and outcomes of empowerment”, Group & Organizational Management, Vol. 24 No. 1, pp. 71-91.

Kontkanen, P., Lahtinen, J., Myllymaki, P. and Tirri, H. (2000), “Unsupervised Bayesian visualization of high-dimensional data”, in Ramakrishnan, R., Stolfo, S., Bayardo, R. and Parsa, I. (Eds), Proceedings of the Sixth International Conference on Knowledge Discovery and Data Mining, The Association for Computing Machinery, New York, NY, pp. 325-9.

Lawler, E.E. (1994), “From job-based to competence-based organizations”, Journal of Organizational Behaviour, Vol. 15, pp. 3-15.

Liljander, J.-P. (2002), “AMK-uudistus” (“Polytechnic institution of higher education reform”), in Liljander, J.-P. (Ed.), Omalla Tiella (On One’s Own Road), Edita, Helsinki, pp. 10-20.

Loehlin, J.C. (2004), Latent Variable Models, 4th ed., Lawrence Erlbaum Associates, Mahwah, NJ.


London, M. and Mone, E.M. (1999), “Continuous learning”, in Ilgen, D.R. and Pulakos, E.D. (Eds), The Changing Nature of Performance: Implications for Staffing, Motivation and Development, Jossey-Bass Publishers, San Francisco, CA, pp. 119-53.

MacCallum, R.C., Widaman, K.F., Zhang, S. and Hong, S. (1999), “Sample size in factor analysis”, Psychological Methods, Vol. 4 No. 1, pp. 84-99.

Marini, M., Li, X. and Fan, P. (1996), “Characterizing latent structure: factor analytic and grade of membership models”, Sociological Methodology, Vol. 1 No. 26, pp. 133-64.

Maurer, T. and Tarulli, B. (1994), “Investigation of perceived environment, perceived outcome, and person variables in relationship to voluntary development activity by employees”, Journal of Applied Psychology, Vol. 79 No. 1, pp. 3-14.

Miettinen, M., Nokelainen, P., Kurhila, J., Silander, T. and Tirri, H. (2005), “EDUFORM – a tool for creating adaptive questionnaires”, International Journal on E-learning, Vol. 4 No. 3, pp. 365-73.

Mills, D. and Friesen, B. (1995), “Empowerment”, in Crainer, S. and Dearlove, D. (Eds), Financial Times Handbook of Management, FT/Pitman, London, pp. 345-57.

Muthen, B.O. (1993), “Goodness of fit with categorical and other non-normal variables”, in Bollen, K.A. and Long, J.S. (Eds), Testing Structural Equation Models, Sage, Newbury Park, CA, pp. 205-43.

Muthen, L.K. and Muthen, B.O. (2001), Mplus User’s Guide, 2nd ed., Muthen and Muthen, Los Angeles, CA.

Nokelainen, P. (2008), Modeling of Professional Growth and Learning: Bayesian Approach, Tampere University Press, Tampere.

Nokelainen, P., Ruohotie, P., Tirri, H. and Silander, T. (2002), “Empowerment – modeling the prerequisites of change and growth with Bayesian networks”, paper presented at the European Conference on Educational Research (ECER), Lisbon, September.

Nokelainen, P., Silander, T., Ruohotie, P. and Tirri, H. (2007), “Investigating the number of non-linear and multi-modal relationships between observed variables measuring a growth-oriented atmosphere”, Quality & Quantity, Vol. 41 No. 6, pp. 869-90.

Pazy, A. (2004), “Updating in response to the experience of lacking knowledge”, Applied Psychology: An International Review, Vol. 53 No. 3, pp. 436-52.

Ruohotie, P. (1996), “Professional growth and development”, in Leithwood, K., Chapman, S., Carson, D., Hollinger, P. and Hart, A. (Eds), International Handbook of Educational Leadership and Administration, Kluwer Academic Publishers, Dordrecht, pp. 419-45.

Ruohotie, P. (2000), “Conative constructs in learning”, in Pintrich, P. and Ruohotie, P. (Eds), Conative Constructs and Self-regulated Learning, Career Development Centre, Vancouver, pp. 1-31.

Ruohotie, P. and Nokelainen, P. (2000), “Beyond the growth-oriented atmosphere”, in Beairsto, B. and Ruohotie, P. (Eds), Empowering Teachers as Lifelong Learners, RCVE, Hameenlinna, pp. 147-67.

Spreitzer, G., DeJanasz, S. and Quinn, R. (1999), “Empowered to lead: the role of psychological empowerment in leadership”, Journal of Organizational Behavior, Vol. 20 No. 4, pp. 511-26.

Thompson, B. (1998), “Five methodology errors in educational research: the pantheon of statistical significance and other faux pas”, invited address presented at the Annual Meeting of the American Educational Research Association, San Diego, CA, April.

Vehkalahti, K. (2000), Reliability of Measurement Scales, Statistical Research Reports No. 17, Finnish Statistical Society, Helsinki, available at: http://ethesis.helsinki.fi/julkaisut/val/tilas/vk/vehkalahti (accessed March 16, 2008).


Venables, W.N. and Ripley, B.D. (2002), Modern Applied Statistics with S, 4th ed., Springer, New York, NY.

Yeo, G. and Neal, A. (2008), “Subjective cognitive effort: a model of states, traits, and time”, Journal of Applied Psychology, Vol. 93 No. 3, pp. 617-31.

Further reading

Nokelainen, P., Ruohotie, P., Tirri, H. and Silander, T. (2002), “Empowerment – modeling the prerequisites of change and growth with Bayesian networks”, paper presented at the European Conference on Educational Research (ECER), Lisbon, September.

Nokelainen, P., Silander, T., Ruohotie, P. and Tirri, H. (2007), “Investigating the number of non-linear and multi-modal relationships between observed variables measuring a growth-oriented atmosphere”, Quality & Quantity, Vol. 41 No. 6, pp. 869-90.

Tracey, J., Tannenbaum, S. and Kavanagh, M. (1995), “Applying trained skills on the job: the importance of the work environment”, Journal of Applied Psychology, Vol. 80 No. 3, pp. 239-52.

Walsh, K., Bartunek, J. and Lacey, C. (1998), “A relational approach to empowerment”, in Cooper, C. and Rousseau, D. (Eds), Trends in Organizational Behavior, Vol. 5, John Wiley & Sons, Chichester, pp. 103-26.

© Koninklijke Brill NV, Leiden, 2009 DOI: 10.1163/157092509X437224

Journal of Empirical Theology 22 (2009) 70-87 brill.nl/jet

How Morality and Religiosity Relate to Intelligence: A Case Study of Mathematically Gifted Adolescents

Kirsi Tirri (a), Petri Nokelainen (b), Marko Mahkonen (c)

a) University of Helsinki, [email protected]

b) University of Tampere, Finland

c) Nokia Research Centre, Finland

Received 2 February 2009; accepted 5 March 2009

Summary

In this article we explore the moral and religious reasoning of mathematically gifted adolescents (N = 20) who attend a special boarding school for gifted students in Finland. The sample consists of 11 female and 9 male first-year upper secondary school students (M_age = 16.25, SD_age = 0.444). The participants’ intelligence and their moral and religious reasoning were measured by means of the following instruments: Wechsler Adult Intelligence Scale III (WAIS-III); Defining Issues Test (DIT); and Religious Judgment Test (RJT) respectively. The research design was correlational and included the following three research questions: (1) How is intelligence related to moral thinking? (2) How is intelligence related to religious thinking? (3) How are moral and religious thinking related to each other? Results regarding the first research question showed that moral reasoning was related to intelligence. However, WAIS-III scores were not positively linked to the DIT scores within this highly gifted sample. Results regarding the second research question showed that the most intelligent young adults were more opposed to the lowest and highest forms of religious reasoning than their less intelligent peers. Results regarding the last research question showed that the level of moral thinking was negatively related to both the lowest and the highest stages of religious judgement, but positively related to the third religious orientation stage (ego autonomy and one-sided self-responsibility).

Keywords

morality, religiosity, intelligence, mathematically gifted students

1 Introduction

We know from earlier empirical research that intelligence tends to correlate with high levels of moral and religious reasoning (Narváez, 1993; Räsänen, Tirri & Nokelainen, 2006; 2007). However, the relationship between intelligence and morality is a very complex one and needs more detailed study (Tirri & Nokelainen, 2007). Not much research has been done on the relationship between moral judgement and religious judgement (Räsänen et al., 2006).

The existing body of research indicates that religiously conservative subjects tend to score lower on moral reasoning tests (e.g. Defining Issues Test, DIT) than their more liberal peers. According to Rest (1986, p. 131), ‘the most striking finding from the literature relating religious measures to moral judgement development is the consistent relationship between DIT P index and religious beliefs’.

According to Duriez and Soenens (2006), research has shown that religiously affiliated persons exhibit increased performance for moral reasoning in terms of Lawrence Kohlberg’s conventional level (stages 3 and 4) and decreased performance for the post-conventional level (stages 5 and 6). They further argue (2006, p. 76), in their recent empirical study with 1010 participants representing Belgian Dutch-speaking adolescents and adults, that ‘although there is no intrinsic relationship between religiosity and morality, the way people process religious contents is predictive of the way they deal with moral issues’. In practice this means that the moral stage of a religious community (e.g. whether principled moral reasoning is used or not) affects the level of moral reasoning.

In this article we explore the moral and religious reasoning of mathematically gifted adolescents (N = 20) who attend a special school for gifted students. The private and independent boarding school is located in the countryside of Southern Finland. The object of the current study, a mathematics program, is supported by Nokia, the largest IT company in Finland. External financial support ensures that studying and living in this boarding school is almost free for the students.

The mathematics program began in 1994. The school annually selects 20 students according to the ‘excursion weekends’ entrance examination tests. Students are mathematically gifted 15- to 18-year-olds and they graduate from the senior secondary school in two years instead of the average three years. All the students have taken three well-known tests measuring their general intellectual ability (Wechsler Adult Intelligence Scale, WAIS-III), moral reasoning (DIT) and religious thinking (RJT).

In this study we have formulated the following three research questions: (1) How is intelligence related to moral thinking? (2) How is intelligence related to religious thinking? (3) How are moral and religious thinking related to each other?

The article is organised as follows: First we give an overview of the existing research on the measurement of moral and religious thinking. Second, we describe the sample and our research instruments. Finally we present and discuss the results.

2 Theoretical Background

2.1 Research on Moral Thinking

Most of the studies in the area of moral development are based on the cognitive-developmental theory of Lawrence Kohlberg (e.g. 1969). The Defining Issues Test (DIT) is a well-documented measure of moral judgement that has been used all over the world (Rest, 1986). The index most frequently used is the “P index,” which reflects a person’s principled reasoning (stages 5 and 6 in Kohlberg’s theory). Kohlberg’s procedures have been criticised for lack of diversity in the moral dilemmas that have been used in the interviews (Yussen, 1977). The hypothetical dilemmas can also be seen as too abstract and removed from the daily experiences of most people (Straughan, 1975). Recognition of these aspects of hypothetical dilemmas has led educational researchers to study real-life moral problems identified by people (Walker, de Vries & Trevethan, 1987). The research conducted in this area shows that adolescents formulate dilemmas very different from the hypothetical dilemmas used by Kohlberg and his colleagues to assess moral reasoning (Yussen, 1977; Binfet, 1995). Most of the dilemmas formulated by Kohlberg focus on issues of ownership, public welfare, and life and death. In Yussen’s study (1977), the moral dilemma themes formulated by adolescents focused most frequently on interpersonal relations. Colangelo (1982) and Tirri (1996) found the same tendency among gifted adolescents.

Andreani and Pagnin provided a comprehensive review of the literature in their article (1993). According to these authors, gifted students are presumed to have a privileged position in the maturation of moral thinking because of their precocious intellectual growth. Terman’s (1925) sample of gifted children showed superior maturity in moral development in terms of choosing socially constructive activities and in rating misbehaviour.

In the 1980s Karnes and Brown (1981) made an initial investigation of moral development and the gifted, using Rest’s DIT. Their sample included 233 gifted students (9-15 years of age) who were selected for a gifted program. The results of the DIT were compared to the students’ results in a test that measured their intellectual ability (WISC-R). The empirical results of the study showed a positive correlation between the two tests. According to the researchers, intellectually gifted children appear to reach a relatively high stage of moral reasoning earlier than their chronological peers (Karnes & Brown, 1981).

Other studies of moral judgement using DIT P indexes have shown that gifted adolescents scored higher than their peers as a group (Tan-Willman & Gutteridge, 1981; Janos & Robinson, 1985; Narváez, 1993). However, the data for high-achieving adolescents have indicated that the relationship between apparent academic talent and moral judgement indexes is more complex. According to Narváez’s study, high academic competence is necessary for an unusually high P index but does not necessarily predict it. The high achievers can have an average to high moral judgement index, whereas low achievers cannot be high scorers in moral judgement (Narváez, 1993).

Ikonen-Varila (2000) reported DIT P indexes of Finnish 9th graders (N = 1631). According to her, the proportion of post-conventional moral reasoning was 22.6 per cent. Ikonen-Varila found a positive connection between academic competence and moral reasoning. Success at school, classified into three groups (satisfactory, moderate, excellent), produced the average DIT P indexes of 15.4, 24.2 and 29.7 respectively. She concluded that because cognitive factors regulate moral reasoning in childhood and adolescence, it is natural that school success should be one of the main background factors explaining moral reasoning abilities. Her results support the connection between giftedness and moral reasoning: the more gifted, the more capable of principled moral reasoning.

Tirri and Pehkonen (2002) explored the moral reasoning and scientific argumentation of Finnish adolescents who are gifted in science. These 16 girls and 15 boys (14-15 years of age) participated in a gifted program at the University of Helsinki. The design contained the following research instruments and procedures: (1) Raven’s Standard Progressive Matrices (SPM) was used to provide a test comparing students’ capacities for observation and clear thinking; (2) moral reasoning was measured by means of DIT; (3) students were asked to write essays on scientific moral dilemmas; (4) researchers interviewed the students. The results show that the average DIT P index was 41, representing the average score for a heterogeneous group of 18-year-olds. Scores ranged from 7 to 78, indicating quite high variance (SD = 15.8); some students clearly represented post-conventional moral reasoning, some not at all. An interesting finding was that the correlation between DIT and SPM was close to zero (Tirri & Pehkonen, 2002).

In a recent Finnish study, DIT P indexes of 51 academically gifted 9th grade students and their average-ability peers (N = 77) were compared (Räsänen et al., 2006). Räsänen and his colleagues investigated the DIT P index distribution separately for the male (n = 21) and female (n = 25) sub-samples. The score average for the gifted males was 35.0 with a standard deviation of 15.5. The lower boundary of the 95 per cent confidence interval was 26.7 and the upper boundary 43.3. The DIT P indexes ranged from 16 to 79 in the male sub-sample. The score average for the gifted females was 35.9 with a standard deviation of 15.4. The lower boundary of the 95 per cent confidence interval was 29.5 and the upper boundary 42.3. DIT score values ranged from 15 to 75 in the female sub-sample. These results run parallel with those of an earlier Narváez study (1993). She found that standard deviation increased according to the level of giftedness, concluding that academic competence is a necessary but not sufficient condition for principled thinking.
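
The interval reported for the female sub-sample can be checked with a short calculation. The sketch below assumes that the cited study used an ordinary t-based confidence interval for a mean, which the text does not state; the function name is illustrative, and the critical value 2.064 is the tabulated 97.5th percentile of Student's t with 24 degrees of freedom.

```python
import math


def t_confidence_interval(mean, sd, n, t_crit):
    """Two-sided confidence interval for a mean: mean +/- t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin


# Reported female sub-sample: M = 35.9, SD = 15.4, n = 25.
# Assumption: a t-based interval; 2.064 is t(0.975) with n - 1 = 24 df.
lower, upper = t_confidence_interval(35.9, 15.4, 25, 2.064)
print(round(lower, 1), round(upper, 1))
```

Under these assumptions the result agrees with the reported boundaries of 29.5 and 42.3.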

Räsänen and his colleagues (2006) further classified the DIT scores into four classes on the basis of the quartiles: 1st quartile (25%, DIT score values below 25.0), 2nd quartile (50%, DIT score values from 25.0 to 33.9), 3rd quartile (75%, DIT score values from 34.0 to 44.4) and 4th quartile (100%, DIT score values above 44.4). However, no statistically significant difference between the DIT scores of male and female respondents was found, χ2(3, n = 41) = 4.733, p = .192. The existing body of research shows that the absence of gender differences in the case of gifted students is not unusual. Narváez (1993) did not find significant differences between gifted female and male students: girls had an average P index of 28.2 while the boys had an average P index of 25.6. Rest (1986) too concludes, on the basis of a meta-analysis of 56 DIT studies, that although females usually gain a higher P index than males, gender accounts for only 0.9 per cent of the variance. According to him, age and education are 250 times more powerful than gender in explaining the P index variance.

2.2 Research on Religious Thinking

Studies concerning the relationship between intelligence and religious thinking are far rarer. Religious reasoning can be investigated by means of a new test called the religious judgement test (RJT) that adapts the ideas in DIT. The test, developed by Antti Räsänen and his colleagues (Räsänen et al., 2006; 2007), is based on Oser’s theory of religious development (Oser & Gmünder, 1991). Oser’s theory is about the way one copes with contingency situations. The concepts of religious judgement and developmental stage are two key constructs in Oser’s theory. Religious judgement is a kind of cognitive pattern of religious knowing of reality (Oser, 1980). As individuals interpret their experiences, pray, study religious texts or take part in religious life, they actualise their personal religious consciousness, the system of rules that concern their relationship to the Ultimate. The relationship appears in verbal form in religious judgement. According to the developmental theory of religious judgement (Oser & Gmünder, 1991), individuals produce religious judgements especially when faced with the border or contingency situations of life.

According to Räsänen (2003), the stage concept signifies that developmental stages are qualitatively different and they follow an unchanging sequence in the course of development across the life span. The five developmental stages focus on various forms of relationship between the human and the Ultimate being, to whom Christianity refers as God. Each stage must be seen as a unique depth structure which may have various contents. Oser claims that the stages have universal cross-cultural validity, although the content may vary at specific stages. Usually transition from one stage to the next involves a period of uncertainty. The individuals’ relation to God is qualitatively different, depending on their developmental stage.

Oser has differentiated five developmental stages forming a hierarchical sequence (Oser, 1980; Oser & Gmünder, 1991), but he has not presented well-defined age limits for his stages (Räsänen, 2003). Basically, the first stage is most prevalent at the age of 7 to 9 and it disappears entirely by the age of 14 to 15. The second stage of religious thinking is most intensive at the age of 11-12 and then vanishes by the age of 20-25, but it reappears strongly in late adulthood (56-75 years). The third stage is generally not possible until the phase of abstract thinking is reached. According to Oser’s empirical studies (1980; Oser & Gmünder, 1991), the third stage is the typical one in youth and young adulthood. The fourth stage is usually placed at middle age, though some people reach it in young adulthood. The fifth stage is hypothetical in that, in empirical studies, no-one has been placed there (Räsänen et al., 2007).

3 Method

3.1 Sample

The sample consisted of 11 female and 9 male first-year upper secondary school students at an independent and private boarding school in Finland. The school specialises in mathematics, both education and competitions. Participants’ ages ranged from 16 to 17 years (M = 16.25, SD = 0.444). All the measurements were completed during 2008.

3.2 Measurements

An experienced, licensed psychologist measured students’ general intelligence (including verbal and performance indexes) by means of the Wechsler Adult Intelligence Scale III (WAIS-III, Wechsler, 1997). In addition, students responded on two scales: the Defining Issues Test (DIT, Rest, 1986) and the Religious Judgment Test (RJT, Räsänen, 2003; Räsänen et al., 2007).

The Wechsler Adult Intelligence Scale III (WAIS-III) is one of the best known intelligence tests worldwide. The test has two main components, Verbal IQ (VIQ) and Performance IQ (PIQ); together they form the general-level full-scale intelligence quotient (FSIQ). In the following analysis, we used the FSIQ score to represent the participant’s measured intelligence. For the purpose of the analysis, the FSIQ score (theoretically ranging from 45 to 155) was divided into three classes (1 = 111-120, 2 = 121-130, 3 > 130). The class with the lowest FSIQ was labelled ‘C’, the middle class ‘B’ and the highest-achieving class ‘A’ in line with Terman’s studies (1925). We are mainly interested in comparing the performance of the A (FSIQ scores above 130) and C (FSIQ scores 111-120) groups.

The Defining Issues Test (DIT) is based on Kohlberg’s (1969) research on moral judgement. It contains six dilemmas: (1) Heinz and the drug; (2) Student takeover; (3) Escaped prisoner; (4) The doctor’s dilemma; (5) Webster; (6) Newspaper. According to Rest (1986), people at different points of development interpret moral dilemmas differently, define the critical issues of the dilemmas differently and have different intuitions about what is right and fair in a situation. The respondent’s task is to consider 12 issues for each dilemma and then indicate which are the most important in deciding what to do. The P index (‘principled morality’) is the most widely used score from the DIT (the others include the D index, M score, A score, Utiliser score and Action Choice index). According to Rest (1986), it is calculated by counting the number of times Kohlberg’s stage 5 and 6 items are chosen as the first, second, third or fourth most important consideration, weighting these ranks by 4, 3, 2 and 1 respectively. The score ranges from zero (lowest) to 95 (highest). Test-retest reliability ranges between .70 and .80 and the DIT typically takes 35-50 minutes to complete (Narváez, 1993). A new version of the test (DIT 2) is also available, and it has been shown to correlate positively with the DIT (.53-.70) (Thoma, Rest, Narváez & Derryberry, 1999). In this study the DIT P index, ranging from 0 to 95, was divided into three classes (1 < 40, 2 = 40-49, 3 > 49).
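
The weighting rule described above can be sketched in a few lines. This is an illustration only: the item numbers and the sets of stage 5 and 6 items are hypothetical (the real DIT scoring key is not given in the text), and expressing the raw score as a percentage of the maximum weighted sum is an assumption about how the index is scaled.

```python
RANK_WEIGHTS = (4, 3, 2, 1)  # weights for the 1st to 4th most important item


def p_index(rankings, principled_items):
    """Return (raw score, percentage) for a set of dilemmas.

    rankings: one 4-tuple of item numbers per dilemma, most important first.
    principled_items: one set per dilemma holding the stage 5 and 6 item
        numbers (hypothetical keys here, not the actual DIT key).
    """
    raw = 0
    for ranked, principled in zip(rankings, principled_items):
        for weight, item in zip(RANK_WEIGHTS, ranked):
            if item in principled:
                raw += weight
    # Assumption: the index is the raw score as a share of the maximum
    # weighted sum (10 points per dilemma).
    max_raw = sum(RANK_WEIGHTS) * len(rankings)
    return raw, 100.0 * raw / max_raw


# Two hypothetical dilemmas: items 1 and 3 are principled in the first,
# item 8 in the second.
raw, percent = p_index([(1, 2, 3, 4), (5, 6, 7, 8)], [{1, 3}, {8}])
print(raw, percent)
```

In this made-up example the respondent collects 4 + 2 points from the first dilemma and 1 point from the second.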

Religious reasoning is explored in this study by means of the RJT (Räsänen, 2003; Räsänen et al., 2007), which is a 26-item multiple-choice questionnaire based on Fritz Oser’s (1980) theory of religious judgement (see Appendix). The scale for each item ranges from 1 (totally disagree) to 5 (totally agree). Oser has mainly used hypothetical dilemmas to study the developmental stages, but Räsänen later showed in his empirical studies (Räsänen et al., 2007) that multiple-choice items too can capture individual variance in religious thinking. The five stages of Oser’s theory (1980) are as follows: (1) Orientation to religious heteronomy (e.g. item 4, ‘God is able to control all the events in the world by sending, for example, storm’); (2) Orientation to do ut des (give so that you may receive, God is an all-powerful being, e.g. item 8, ‘Righteous life protects me from God’s anger’); (3) Orientation to ego autonomy and one-sided self-responsibility (deism, e.g. item 9, ‘free human being acts autonomously without God’); (4) Orientation to mediated autonomy and salvation plan (e.g. item 16, ‘even though I am free to make my own decisions, I consider God’s advice to humankind when making decisions’); (5) Orientation to religious intersubjectivity and autonomy (e.g. item 18, ‘I know that the invisible world of God reverberates in this world in love and forgiveness’). RJT was analysed in terms of the summative scores of five developmental stages in religious judgement. Räsänen and his colleagues (2007) investigated internal scale reliabilities by means of Cronbach’s (1970) alpha, using a sample of 413 Finnish adolescents. They reported the following alpha values for the five developmental stages: I α = .83, II α = .88, III α = .73, IV α = .88, V α = .81. In the current sample, internal consistency values ranged from .79 to .90 (see Appendix). These alpha levels are adequate if we consider Nunnally’s statement: ‘Increasing reliabilities much beyond .80 is often wasteful of time and funds with the exception of applied settings where important decisions are made with respect to specific test scores’ (Nunnally, 245-246).
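
For readers unfamiliar with the reliability coefficient used above, the following minimal sketch computes Cronbach's alpha from a respondents-by-items score table. The data are made up for illustration and the function name is hypothetical; the software used by the authors is not stated in the text.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a table with one row per respondent.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals),
    where k is the number of items.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of four people to two 5-point items.
responses = [[2, 3], [4, 3], [3, 5], [5, 5]]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Items that rise and fall together across respondents push alpha towards 1, while unrelated items push it towards 0.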

In addition, the students’ gender, school achievement (self-reported 9th grade marks in mathematics, religion, native and foreign languages) and mathematical giftedness (boarding school’s entrance examination test) were used as controlling variables in the analysis. The entrance examination test, PSMEES, contains a set of multifaceted mathematical tasks at various competence levels. The score, ranging from −23.5 (minimum) to 50 (maximum), was divided into three classes (1 < 20, 2 = 20-34, 3 > 34.1).

3.3 Statistical Analyses

Because of small sample size and uncertainty of linear variable dependencies (Marini, Li & Fan, 1996), we applied non-linear and non-parametric statistical techniques to answer the research questions. First, we examined the statistical relationship between controlling variables and dependent variables by means of the Chi square test (χ2), contingency coefficient (C) and non-parametric rank-order correlation (r_s). Second, considering the three research questions separately, we examined statistical dependencies between the three test scores by means of Bayesian classification and dependency modelling.
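
As an illustration of the rank-order correlation mentioned above, the sketch below computes Spearman's r_s as the Pearson correlation of average ranks, which handles tied values. It is a generic implementation with made-up scores, not the authors' procedure, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical scores for five students; the two 7s are tied.
rho = spearman_rho([1, 2, 3, 4, 5], [5, 6, 7, 8, 7])
print(round(rho, 3))
```

Because only the ordering of the scores enters the calculation, the coefficient is insensitive to outliers and to any monotonic rescaling of either variable.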

Bayesian theory, based on a concept of subjective probability, was initially developed by a British clergyman, Thomas Bayes, in the 18th century and published posthumously (Bayes, 1763). The essential benefits of using Bayesian methods in this study are twofold: they work robustly even with small samples and allow prediction by means of the model derived from the empirical evidence (Nokelainen, 2008). In this study, Bayesian models were calculated by means of the B-Course computer program (Myllymäki, Silander, Tirri & Uronen, 2002).

Bayesian classification modelling (BCM) resembles linear discriminant analysis (Huberty, 1994) to some extent; but instead of using forward, backward or stepwise search methods for predictor variables, it applies genetic search algorithms. The genetic algorithm approach means that predictor-variable selection is not limited to one (or two or three) specific approaches; instead, many approaches and their combinations are exploited (Cormen, Leiserson & Rivest, 1996).

Bayesian dependency modelling (BDM) predicts the most probable statistical dependency structure between the observed variables. It visualises the result in the form of a Bayesian network, allowing the user to probe the model by adjusting the values of all variables and examining the effects on other variables included in the best model (Nokelainen, 2008).
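
The idea of probing a model by fixing one variable and reading off the distribution of another can be illustrated with plain conditional frequencies. The sketch below is not the B-Course algorithm: it involves no structure learning and no priors, and the records and variable names are hypothetical.

```python
from collections import Counter


def conditional_distribution(records, given_var, given_value, target_var):
    """Relative frequencies of target_var among records where
    given_var equals given_value, i.e. an estimate of P(target | given)."""
    matching = [r[target_var] for r in records if r[given_var] == given_value]
    if not matching:
        raise ValueError("no records match the condition")
    counts = Counter(matching)
    total = len(matching)
    return {value: count / total for value, count in sorted(counts.items())}


# Hypothetical records (not the study's data): FSIQ group and an RJT class.
records = [
    {"fsiq": "A", "rjt_i": 1}, {"fsiq": "A", "rjt_i": 1},
    {"fsiq": "B", "rjt_i": 1}, {"fsiq": "B", "rjt_i": 2},
    {"fsiq": "C", "rjt_i": 2}, {"fsiq": "C", "rjt_i": 3},
]
dist_b = conditional_distribution(records, "fsiq", "B", "rjt_i")
print(dist_b)
```

Fixing the FSIQ group to a different value and recomputing the distribution corresponds, in a much simplified way, to adjusting a variable in the network and examining the effect on the others.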

4 Results

4.1 Overall Results of WAIS-III, DIT and RJT

Participants’ VIQ and PIQ scores were as follows: M_VIQ = 125.35, SD_VIQ = 6.393; M_PIQ = 123.85, SD_PIQ = 8.561. We conclude that the sample consists of highly intelligent young adults, as most of the participants (n = 17, 85%) were above the ‘slightly better than average’ level (111-120 points) in FSIQ (M_FSIQ = 126.20, SD_FSIQ = 5.908).

The DIT P index represents the relative importance that participants ascribe to stage 5 and 6 items of Kohlberg’s theory (level 3, post-conventional: ‘social contract orientation’ and ‘universal ethical principles’). Participants completed the DIT within 15-50 minutes (M = 36.25, SD = 13.463). According to Narváez (1993) the resultant P index, 41.65 (SD = 12.041), is above the normal senior high level (M = 31.80, SD = 13.500) and more closely resembles a typical college student’s P index (M = 42.30, SD = 13.200).

RJT results showed that mathematically gifted adolescents scored highest on the last three stages of Oser’s theory: (1) Orientation to religious heteronomy (M = 2.00, SD = .750); (2) Orientation to reciprocity (M = 1.75, SD = .720); (3) Orientation to ego autonomy and one-sided self-responsibility (M = 3.01, SD = 1.230); (4) Orientation to mediated autonomy and salvation plan (M = 2.22, SD = 1.056); (5) Orientation to religious intersubjectivity and autonomy (M = 2.66, SD = 1.139).

Results of statistical analyses controlling gender for the WAIS-III, DIT and RJT scores showed that there was only one statistically significant difference: males in the sample (n = 9) had higher WAIS-III scores than females (n = 11), χ2(2, n = 20) = 6.147, p = .046. However, this finding is not very significant scientifically, as the related contingency coefficient is .49 (C_max = .71).
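
The quantities in this result can be illustrated as follows. The sketch assumes a 2 x 3 table (gender by FSIQ class), uses the closed-form upper tail of the chi-square distribution that holds for even degrees of freedom, and takes C_max = sqrt((k - 1) / k) with k the smaller table dimension, a common approximation. The function names are illustrative, and a coefficient computed from the rounded χ2 need not match the published value exactly.

```python
import math


def chi2_p_value_even_df(chi2, df):
    """Upper-tail p-value of the chi-square distribution for even df:
    exp(-x/2) * sum_{k < df/2} (x/2)^k / k!"""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = chi2 / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def contingency_coefficient(chi2, n):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n))."""
    return math.sqrt(chi2 / (chi2 + n))


def max_contingency_coefficient(rows, cols):
    """Approximate upper bound of C for a rows x cols table."""
    k = min(rows, cols)
    return math.sqrt((k - 1) / k)


p = chi2_p_value_even_df(6.147, 2)         # reported chi-square, df = 2
c = contingency_coefficient(6.147, 20)     # n = 20 participants
c_max = max_contingency_coefficient(2, 3)  # assumed 2 x 3 table
print(round(p, 3), round(c, 3), round(c_max, 2))
```

Comparing C with C_max rather than with 1 matters in small tables, because the coefficient cannot reach 1 even under perfect association.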

Next, we checked self-reported school achievement (9th grade marks in mathematics, religion, native and foreign languages) for the WAIS-III, DIT and RJT scores. Results showed a strong positive correlation for both mathematics and foreign language grades (Cohen, 1988) with the WAIS-III score at r_s(20) = .62, p = .003 and r_s(20) = .54, p = .015 respectively.

Mathematical giftedness was measured in this study by means of the boarding school’s entrance examination test (PSMEES). Student scores ranged from 10.7 to 43.0 (M = 28.24, SD = 8.972). Not surprisingly, mathematical giftedness showed a strong positive correlation with the WAIS-III score, r_s(19) = .57, p = .011.

Controlling the three test scores for gender showed, as expected (e.g. Wechsler, 1997; Rest, 1986; Räsänen, 2003), that they do not produce gender-biased results. It was also no surprise that students’ mathematical ability, measured by means of the 9th grade mark and supervised test, correlated positively with the WAIS-III FSIQ score. Further, the result showing no statistically significant correlation between the 9th grade religion mark and RJT scores was not unexpected, as it replicates Räsänen’s earlier finding (Räsänen et al., 2007) in terms of several larger samples (reported correlations ranged from -.04 to .21).

4.2 RQ 1: How is Intelligence Related to Moral Thinking?

Our hypothesis regarding the first research question is that intelligence and moral thinking are positively related concepts. We expect to find, however, in parallel with Narváez (1993), that an above-average level of intelligence is not a positive predictor of an above-average level of moral reasoning.

Non-parametric correlational analysis showed that there was no linear statistical dependency between the DIT P index and WAIS-III FSIQ scores in this sample. Further, BCM indicated, with a 60 per cent classification accuracy, that none of the A group students would achieve the highest P indexes in the sample; on the other hand, some of the C group students achieved the highest DIT scores in this sample. Our finding supports Narváez’s conclusion (1993, p. 277) that ‘intellectual accomplishment is necessary but not sufficient for a high score in moral judgement’. When making practical recommendations, however, we should bear in mind that the sample consists of highly gifted young adults who scored exceptionally high on both tests measuring intelligence and moral thinking.

4.3 RQ 2: How is Intelligence Related to Religious Thinking?

Our hypothesis regarding the second research question is that intelligence and religious thinking are positively related concepts, since Oser’s theory builds on cognitive thinking skills. Furthermore, we expect to find a preference among the most gifted adolescents for the more autonomous stages of religious thinking.

The results of the Chi square test showed that the young adults in the A group (FSIQ above 130) were not associated with the lowest stage of religious judgment (I Orientation to religious heteronomy) to the same extent as the B and C group students, χ2(4, n = 20) = 9.938, p = .041. The scientific significance of this finding is strong, as the related contingency coefficient (C = .61) is close to the theoretical maximum approximation of .82. According to Räsänen (2003), the first stage is the most prevalent at the ages of 7 to 9 and it should disappear by the age of 14 to 15 years. As the age of the students in the sample was from 16 to 17 years, we conclude that this finding is related to intellectual maturity (physical vs. intellectual age).

The BCM showed, with 65 per cent classification accuracy, that the A group’s views were the most non-religious and the C group’s the most religious on all five RJT scales in the sample. This finding suggests that students with at least slightly above-average intelligence (that is, those whose FSIQ scores are above 111 points, which covers all three groups in this sample) are capable of religious reasoning at the highest stages (IV Orientation to mediated autonomy and salvation plan, V Orientation to religious intersubjectivity and autonomy) if they wish to do so.

The BDM resulted in a model that is presented in figure 1. The initial value distributions (that is, without making any prediction with the three FSIQ value classes) are shown in the left-hand column. When A group members are selected (FSIQ value prediction is set to ‘A’), we see from the ‘A’ column that the highest-scoring young adolescents in WAIS-III totally disagree with the first and second religious orientation stage statements. Columns ‘B’ and ‘C’ show that their lower-scoring peers are not so strongly opposed to those two stages. Finally, results show clearly that members of the lowest-achieving group C locate themselves more in the two highest religious orientation stages than their higher-achieving peers in the A and B groups. This is evident as, in the bottom row of figure 1 (‘RJT V’), summative percentages of the two most positive response options (4 ‘Agree’ and 5 ‘Strongly agree’) decrease from the right end of the row (C group, 32.0%) towards the left (B group, 22%; A group, 6.2%).

K. Tirri et al., / Journal of Empirical Theology 22 (2009) 70-87 81

4.4 RQ 3: How are Moral and Religious Thinking Related to Each Other?

Our hypothesis regarding the third research question is that moral and religious thinking are related to each other since they both build on cognitive judgement skills. However, we expect them not to be reduced to cognitive skills only.

Figure 1: Bayesian dependency model of moral and religious judgment. 0 = None of the Wechsler Adult Intelligence Scale III (WAIS-III) full-scale IQ (FSIQ) values are fixed and thus no prediction is made about the behaviour of the Religious Judgment Test (RJT) values in the model. A = FSIQ value is fixed to represent the A group (scores > 130) and the model shows the students’ corresponding responses on the five RJT scales (RJT I = Orientation to religious heteronomy, RJT II = Orientation to reciprocity, RJT III = Orientation to ego autonomy and one-sided self-responsibility, RJT IV = Orientation to mediated autonomy and salvation plan, RJT V = Orientation to religious intersubjectivity and autonomy). B = FSIQ value is fixed to represent the B group (scores 121-130). C = FSIQ value is fixed to represent the C group (scores < 121).

Moral thinking showed a weak negative correlation to the first two and last two stages of religious judgement (I, II, IV and V respectively) and a strong positive correlation to the third stage (Orientation to ego autonomy and one-sided self-responsibility), χ²(16, n = 20) = 27.570, p = .036. The finding is plausible according to Räsänen (2003), as this stage requires abstract thinking skills and is typical in youth and young adulthood.

The BDM resulted in a model that is depicted in figure 2. The initial value distributions (that is, without making any prediction with the three DIT value classes) are shown in the left-hand column. When the DIT value prediction is set to one, we see how the participants who scored lowest in the DIT have responded to the five RJT scales. The analysis shows that they are not as absolute in their opposition to the first two religious orientations as the other two groups. On the other hand, they have stronger opinions than their peers on the last two religious orientations. The previously presented result of strong positive dependency between moral thinking and the third religious orientation is also explicit in the figure, since the negative response frequencies (1 ‘Totally disagree’ and 2 ‘Disagree’) tend to decrease as the DIT score increases (45.0%, 24.0%, 16.2% respectively). The students who scored highest in the moral thinking test are profiled as belonging to the third religious orientation stage in this sample, and the students who scored lowest are profiled into the fourth and fifth religious orientation stages. The middle-level moral thinkers have a non-linear relationship (multimodality issue) to their religious thinking: some absolutely deny the first two stages of religious orientation, like the members of the highest-scoring DIT group, while some have the most sympathetic feelings for the first two stages in this sample. Our conclusion is that as the level of moral thinking increases, preference for the third stage of religious orientation (ego autonomy and one-sided self-responsibility) increases together with the avoidance of the other four religious orientation stages.

5 Discussion

The first hypothesis on the relationship between intelligence and moral thinking yielded the expected results: moral reasoning was related to intelligence. However, WAIS-III scores were not positively connected to the DIT scores within this highly gifted sample.

Regarding the first research question, the DIT P index did not correlate with WAIS-III FSIQ. This finding disagrees with the conclusion by Sanders et al. (1995, p. 502) that ‘[t]he DIT is simply another way of measuring verbal ability’, but supports the finding by Thoma et al. (1999, p. 325) that ‘DIT scores describe a latent variable that is distinct from verbal ability’. Results showed that females and males scored equal DIT P indexes. This finding supports Narváez’s (1993) earlier findings of gender independence of the test.

Figure 2: Bayesian dependency model of moral and religious judgment. 0 = None of the Defining Issues Test (DIT) values are fixed and, thus, no prediction is made about the behaviour of the Religious Judgment Test (RJT) values in the model. 1 = DIT value is fixed to represent the lowest-scoring group (P index < 40) and the model shows the students’ corresponding responses on the five RJT scales (RJT I = Orientation to religious heteronomy, RJT II = Orientation to reciprocity, RJT III = Orientation to ego autonomy and one-sided self-responsibility, RJT IV = Orientation to mediated autonomy and salvation plan, RJT V = Orientation to religious intersubjectivity and autonomy). 2 = DIT value is fixed to represent the middle-scoring group (P index 40-49). 3 = DIT value is fixed to represent the highest-scoring group (P index > 49).

Another interesting finding was that, while calculating the DIT P indexes, we noticed that some of the items were religiously coloured, for example: ‘Whether only God should decide when a person’s life should end’ (The doctor’s dilemma, item 9); ‘Whether the Christian commandment to love your fellow man applies to this case’ (The Websters dilemma, item 11); ‘If someone is in need, shouldn’t he be helped regardless of what you get back from him?’ (The Websters dilemma, item 12). We also learned that these items were selected to represent Kohlberg’s stage 4 of moral reasoning by the DIT authors. In practice this means that if respondents choose to use one or more of these items, their DIT P index will be lower than if they use the items representing Kohlberg’s stages 5 and 6. To demonstrate this we selected two female participants, ‘Leena’ and ‘Suvi’, from the sample, with similar FSIQ scores (122 and 118 respectively). Leena had selected all three abovementioned stage 4 items in the DIT, whereas Suvi had used none of them. Had Leena selected ‘correct’ items representing stages 5 or 6 instead, her P index would have increased from .38 to .53, resembling Suvi’s P index (.50). This finding indicates that the use of Christian ethics is not perceived as principled morality in Kohlberg’s procedures.
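
The mechanism by which stage 4 items lower the P index can be illustrated with a small sketch. It assumes the commonly described DIT scoring scheme, in which the four items ranked most important in each dilemma receive 4, 3, 2 and 1 points and only points given to stage 5 and 6 items count towards the P index. The response data are hypothetical and are not the responses of Leena or Suvi; all names in the code are illustrative.

```python
# Assumption: the four items ranked most important in each dilemma
# receive weights 4, 3, 2 and 1, and only stage 5 and 6 items count.
RANK_WEIGHTS = (4, 3, 2, 1)
PRINCIPLED_STAGES = {5, 6}

def p_index(ranked_stages):
    """Return the share of rank points given to stage 5 and 6 items.

    ranked_stages: one tuple per dilemma holding the Kohlberg stage of
    the items ranked first to fourth.
    """
    points = sum(
        weight
        for dilemma in ranked_stages
        for weight, stage in zip(RANK_WEIGHTS, dilemma)
        if stage in PRINCIPLED_STAGES
    )
    max_points = sum(RANK_WEIGHTS) * len(ranked_stages)
    return points / max_points

# Hypothetical respondent (not data from the study), two dilemmas.
with_stage4_items = [(4, 5, 4, 3), (5, 4, 6, 2)]
# Same respondent if each stage 4 item were swapped for a stage 5 item.
with_stage5_items = [(5, 5, 5, 3), (5, 5, 6, 2)]

print(p_index(with_stage4_items))  # 0.45
print(p_index(with_stage5_items))  # 0.9
```

Under these assumptions, every highly ranked stage 4 item removes its rank weight from the numerator, which is why a respondent who favours the religiously coloured items scores lower.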

The second hypothesis on the relationship between intelligence and religious thinking was partly affirmed: the most intelligent young adults were more opposed to the lowest and highest forms of religious reasoning than their less intelligent peers. This might be an indication of atheism among the most intelligent participants. Furthermore, the least intelligent participants in the sample (the C group) were more religiously oriented than their peers. This result is intelligible as, according to Räsänen (2003), the first stage is most prevalent at the age of 7 to 9 and should disappear by the age of 14 to 15 years. As the age of the students in the sample ranged from 16 to 17 years, we conclude that religious judgement is to some extent related to intellectual maturity (physical vs. intellectual age).

The third hypothesis was affirmed: the level of moral thinking was negatively related to both the lowest and the highest stages of religious judgement, but positively related to the third religious orientation stage (ego autonomy and one-sided self-responsibility). This finding is plausible, as Räsänen (2003) states that this stage requires abstract thinking skills and is typical of youth and young adulthood.

Whether we use traditional frequentistic parametric (or non-parametric) methods or any non-parametric approach, like neural networks (self-organising maps), fuzzy logic, minimum description length calculation or Bayesian methods, the power (Murphy & Myors, 1998) of the study remains a relevant question: how do we know for sure that if we reject our null hypothesis (H₀) it is false, too, in the real world? Traditional power analysis is impossible with statistical techniques based on the concept of subjective (i.e. non-frequentistic) probability. The justification is simple: the Bayesian statistics that we have applied in this article do not include the concepts of statistical significance, alpha (type I) or beta (type II) error levels (Hoijtink & Klugkist, 2007). Our conclusion is that the current results should be interpreted with caution. We need to collect a larger sample for a further study of topics presented in this article.

Another interesting issue for further investigation would be a longitudinal comparison of DIT P index development with the two other measures, as earlier studies (e.g. Rest, 1986) have shown that DIT scores tend to increase with age.

Our study has important implications for religious and moral education. Teachers and psychologists should be informed about the theories and measurement techniques used to measure moral and religious thinking. These issues should also be discussed with secondary school students in both religious and non-religious schools. This kind of education would promote intelligent belief and a critical attitude towards testing for the measurement of intelligence, morality and religiosity.

References

Andreani, O. & Pagnin, A. (1993). Nurturing the moral development of the gifted. In K. Heller, F. Mönks, & H. Passow (eds), International handbook of research and development of giftedness and talent (pp. 539-553). Oxford: Pergamon Press.

Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society, 53, 370-418.

Binfet, J. (1995, April). Identifying the themes in student-generated moral dilemmas. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Second edition. Hillsdale, NJ: Lawrence Erlbaum Associates.

Colangelo, N. (1982). Characteristics of moral problems as formulated by gifted adolescents. Journal of Moral Education, 11(4), 219-232.

Cormen, T. H., Leiserson, C. E., & Rivest, R. L. (1996). Introduction to algorithms. Sixteenth edition. Cambridge, MA: The MIT Press.

Cronbach, L. J. (1970). Essentials of psychological testing. Third edition. New York: Harper & Row.

Duriez, B. & Soenens, B. (2006). Religiosity, moral attitudes and moral competence: A critical investigation of the religiosity-morality relation. International Journal of Behavioral Development, 30(1), 76-83.

Hoijtink, H. & Klugkist, I. (2007). Comparison of hypothesis testing and Bayesian model selection. Quality & Quantity, 41, 73-91.

Huberty, C. (1994). Applied discriminant analysis. New York: John Wiley & Sons.

Ikonen-Varila, M. (2000). Moraalipäättelyn kehittyneisyys, normisosialisaatio ja toimintastrategiat [Development of ethics, norm-socialization and strategic plans]. In J. Hautamäki, P. Arinen, A. Hautamäki, M. Ikonen-Varila, S. Kupiainen, B. Lindholm, M. Niemivirta, P. Rantanen, M. Ruuth, & P. Scheinin (eds), Oppimaan oppiminen yläasteella. Oppimistulosten arviointi 7/2000 (pp. 215-238). Helsinki: Opetushallitus.

Janos, P. & Robinson, N. (1985). Psychosocial development in intellectually gifted children. In F. Horowitz & M. O’Brien (eds), The gifted and talented: Developmental perspectives (pp. 149-195). Washington, DC: American Psychological Association.

Karnes, F. & Brown, K. (1981). Moral development and the gifted: An initial investigation. Roeper Review, 3, 8-10.

Kohlberg, L. (1969). Stage and sequence: The cognitive-developmental approach to socialization. In D. A. Goslin (ed.), Handbook of socialization theory and research (pp. 347-480). Chicago: Rand McNally.

Marini, M., Li, X. & Fan, P. (1996). Characterizing latent structure: Factor analytic and grade of membership models. Sociological Methodology, 1, 133-164.

Murphy, K. R. & Myors, B. (1998). Statistical power analysis. A simple and general model for traditional and modern hypothesis tests. Mahwah, NJ: Lawrence Erlbaum Associates.

Myllymäki, P., Silander, T., Tirri, H. & Uronen, P. (2002). B-Course: A web-based tool for Bayesian and causal data analysis. International Journal on Artificial Intelligence Tools, 11(3), 369-387.

Narvaez, D. (1993). High achieving students and moral judgment. Journal for the Education of the Gifted, 16(3), 268-279.

Nokelainen, P. (2008). Modeling of professional growth and learning: Bayesian approach. Tampere, Finland: Tampere University Press.

Nunnally, J. C. (1978). Psychometric theory. New York: McGraw-Hill.

Oser, F. (1980). Stages of religious judgment. In C. Brusselmans (ed.), Toward moral and religious maturity (pp. 277-315). First International Conference on Moral and Religious Development. Morristown: Silver Burdett Company.

Oser, F. & Gmünder, P. (1991). Religious judgment: a developmental perspective. Birmingham: Religious Education Press.

Rest, J. R. (1986). Moral development. Advances in research and theory. New York: Praeger Publishers.

Räsänen, A. (2003). Das religiöse Urteil und die Glaubensvorstellungen. Eine finnische Untersuchung. Archiv für Religionspsychologie, 25, 195-209.

Räsänen, A., Tirri, K. & Nokelainen, P. (2006). The moral and religious reasoning of gifted adolescence. In K. Tirri (ed.), Nordic perspectives on religion, spirituality and identity (pp. 97-111). Helsinki, Finland: University of Helsinki.

Räsänen, A., Tirri, K. & Nokelainen, P. (2007). Religious thinking and giftedness. In K. Tirri & M. Ubani (eds), Holistic education and giftedness (pp. 91-108). Helsinki: Department of Practical Theology, University of Helsinki.

Sanders, C. E., Lubinski, D. & Benbow, C. P. (1995). Does the Defining Issues Test measure psychological phenomena distinct from verbal ability? An examination of Lykken’s query. Journal of Personality and Social Psychology, 69(3), 498-504.

Straughan, R. (1975). Hypothetical moral situations. Journal of Moral Education, 4(3), 183-189.

Tan-Willman, C. & Gutteridge, D. (1981). Creative thinking and moral reasoning in academically gifted secondary school adolescents. Gifted Child Quarterly, 25(4), 149-153.

Terman, L. (1925). Genetic studies of genius: Vol. 1. Mental and physical traits of a thousand gifted children. Stanford, CA: Stanford University Press.

Thoma, S. J., Rest, J., Narváez, D. & Derryberry, P. (1999). Does moral judgment development reduce to political attitudes or verbal ability: Evidence using the Defining Issues Test. Review of Educational Psychology, 11, 325-342.

Tirri, K. (1996). The themes of moral dilemmas formulated by preadolescents. Resources in Education ED 399046.

Tirri, K. & Nokelainen, P. (2007). Comparison of academically average and gifted students’ self-rated ethical sensitivity. Educational Research and Evaluation, 13(6), 587-601.

Tirri, K. & Pehkonen, L. (2002). The moral reasoning and scientific argumentation of gifted adolescents. The Journal of Secondary Gifted Education, 13(3), 120-129.

Walker, L., de Vries, B. & Trevethan, S. (1987). Moral stages and moral orientation in real-life and hypothetical dilemmas. Child Development, 58, 842-858.

Wechsler, D. (1997). Manual for the Wechsler Adult Intelligence Scale – Third Edition (WAIS-III). San Antonio, TX: The Psychological Corporation.

Yussen, S. (1977). Characteristics of moral dilemmas written by adolescents. Developmental Psychology, 13(2), 162-163.

Appendix

Religious Judgment Test factors and items (alpha values in parentheses).

I Orientation to religious heteronomy (.79)

v32 God either protects or abandons.
v40 If I do not fulfil God’s will, my relationship to Him will break down.
v41 God directly influences human beings by awarding and punishing.

II Orientation to do ut des (.90)

v29 Adherence to the religious rules helps me to relate well with God.
v34 I can influence God by praying.
v36 God sends afflictions, and after surviving them I can have God’s love.
v38 Proper behaviour protects me from God’s anger.

III Orientation to ego autonomy and one-sided self-responsibility (.81)

v18 A free human being acts autonomously without God.
v20 My life in this world does not rest on a consideration of God, who is outside this world.
v21 I think God is somewhere else, not participating in occurrences in this world.
v22 My life in this world and the transcendent world of God do not intersect.

IV Orientation to mediated autonomy and salvation plan (.79)

v28 God has plans for this world and they come true through human beings.
v31 God needs human beings to realise His will in this world.
v35 The existence of the world requires that God exist.
v37 I am free to make decisions in my life, but within the limits of God’s instructions and directions.

V Orientation to religious intersubjectivity and autonomy (.82)

v19 A deeply religious individual is in every way committed to God and to love for one’s neighbour.
v26 I know that the invisible world of God is manifested in this world in love and forgiveness.
v27 I believe that many things in this world reflect God’s invisible world.
v39 I consider that God is always present in interpersonal involvement.
