

School Psychology Review, 2005, Volume 34, No. 4, pp. 454-474

Observing Students in Classroom Settings: A Review of Seven Coding Schemes

Robert J. Volpe
Northeastern University

James C. DiPerna
The Pennsylvania State University

John M. Hintze
University of Massachusetts-Amherst

Edward S. Shapiro
Lehigh University

Abstract. A variety of coding schemes are available for direct observational assessment of student classroom behavior. These instruments have been used for a number of assessment tasks including screening children in need of further evaluation for emotional and behavior problems, diagnostic assessment of emotional and behavior problems, assessment of classroom ecology in the formulation of academic interventions, and monitoring the progress of medical, psychosocial, and academic interventions. Although this method of behavioral assessment has a high degree of face validity, it is essential to consider the psychometric properties of available coding schemes to select the appropriate instrument for a given assessment. This article reviews the structure, content, training requirements, and available psychometric properties of seven available direct observation codes. Recommendations for the use of each code and future directions for research in observational assessment are provided.

Systematic observation of student behavior is among the most common assessment methodologies utilized by school psychologists (Shapiro & Heick, 2004; Wilson & Reschly, 1996) and is viewed as one of the most objective and direct measurement tools available for the assessment of child behavior. This method traditionally has been used for a number of assessment tasks including: (a) screening children in need of further evaluation for emotional and behavior problems, (b) diagnostic assessment of emotional and behavior problems, (c) assessing classroom ecology in the formulation of academic interventions, and (d) monitoring the progress of medical, psychosocial, and academic interventions.

Correspondence regarding this article should be addressed to: Robert J. Volpe, PhD, Department of Counseling and Applied Educational Psychology, 202 Lake Hall, Northeastern University, Boston, MA 02115-5000; E-mail: [email protected]

Copyright 2005 by the National Association of School Psychologists, ISSN 0279-6015

“You can observe a lot just by watchin’.” Yogi Berra



Information gathered via direct observation has a high degree of face validity; however, several factors may have a negative effect on the quality of these data. Merrell (1999) lists the following threats to the validity of behavioral observation: (a) poorly defined behavior categories, (b) low interobserver reliability, (c) observee reactivity, (d) situational specificity of target behaviors, (e) inappropriate code selection, and (f) observer bias. These threats may be minimized through the selection and use of well-validated instruments and adequate training in their use. Hintze (2005) provides specific guidelines for the selection of appropriate coding schemes. Because no observation code is appropriate for all situations, the selection of an appropriate observation coding scheme is an essential step in maximizing the validity of an observation-based assessment.

To select the observation code that is most appropriate for a given purpose, users should be aware of the available codes, their composition, their psychometric properties (see Hintze, 2005), and the amount of time required to learn each code. Although several resources exist that describe observation coding schemes (e.g., Hintze, Volpe, & Shapiro, 2002; Winsor, 2003), they include only a few measures and provide limited information regarding the psychometric properties of each measure. The purpose of this article is to provide a comprehensive review of observation coding schemes available to assess the academic behaviors of elementary school children. Although coding schemes exist to measure a wide variety of student behaviors in classrooms (e.g., academic behaviors, social behaviors, student-teacher interactions), this review focuses on codes measuring academic engagement given the central role this observable classroom behavior plays in the learning process (e.g., Greenwood, 1996). The ultimate goal is to assist school psychologists with the selection of appropriate observation codes for use in their professional practice.

Selection of Observation Codes for Review

In keeping with our aforementioned purpose, we limited our review to coding schemes easily obtained by school practitioners that include a measure of academic engagement and can be used without the need for a computer. Although excellent computer-based coding systems exist for assessing child academic behaviors such as the Eco-Behavioral Assessment Software System (Greenwood, Carta, Kamps, & Delquadri, 1993), the need for a portable computer and extensive training limits their widespread use by school practitioners. To be included in this review, a coding system must: (a) have been designed for use in elementary classrooms; (b) have a manual available from the author, or otherwise published, in a form to facilitate standardized use of the code; (c) assess academically related behaviors (e.g., academic engagement); and (d) allow for recording via a paper-and-pencil format (some available codes can be used only with portable technology, such as a laptop computer or PDA). A search of electronic databases (e.g., ERIC Assessment Clearinghouse, PsychInfo), Buros Mental Measurements Yearbooks, recent test catalogs, and attempts to contact developers yielded the following seven codes meeting our inclusion criteria: Academic Engaged Time Code of the Systematic Screening for Behavior Disorders (AET-SSBD; Walker & Severson, 1990), ADHD School Observation Code (ADHD-SOC; Gadow, Sprafkin, & Nolan, 1996), Behavioral Observation of Students in Schools (BOSS; Shapiro, 2004), Classroom Observation Code (COC; Abikoff & Gittelman, 1985), Direct Observation Form (DOF; Achenbach, 1986), State-Event Classroom Observation System (SECOS; Saudargas, 1997), and Student Observation System of the Behavioral Assessment System for Children-2 (SOS; Reynolds & Kamphaus, 2004).

For each systematic observation code included in this review, we report the following information: (a) the general purpose of the code, (b) the length of time required for training (as reported in research studies; these are likely overestimates for experienced school psychologists), (c) a list of behavior categories (with examples of target behaviors), and (d) a summary of published psychometric data. In addition, we address the strengths and limitations of each code and offer recommendations for its appropriate use. Table 1 presents characteristics of each code, including availability, availability on hand-held computers, recording method (e.g., partial interval, momentary time sample), behavior categories, training requirements, and the typical length of single observations. Table 2 summarizes the psychometric properties of each code. Table 3 summarizes the strengths and limitations of each code and offers recommendations for use based on available data. In the following sections we review each of the seven systematic direct observation codes.

Academic Engaged Time Code (AET-SSBD)

The AET-SSBD is a component of Systematic Screening for Behavior Disorders (SSBD; Walker & Severson, 1990), a screening procedure for emotional and behavior disorders in elementary school children (Grades 1–6). The full SSBD system involves three gates: (a) teacher rank ordering of students in terms of externalizing and internalizing problems, (b) teacher ratings of adaptive behavior and critical events, and (c) systematic direct observations in multiple school settings (the AET-SSBD in classroom settings, and the Peer Social Behavior Code or PSB-SSBD in free-play settings). Leff and Lakin (2005) provide a review of the PSB in this issue. The AET-SSBD and PSB-SSBD are used in the third gate to verify teacher ratings. The AET-SSBD measures the amount of time a student spends engaged in academic material during independent seatwork (e.g., listening to the teacher, writing in a workbook). The total amount of time a student exhibits behavior consistent with the definition of academic engagement is recorded with a stopwatch. This value is divided by the overall time of observation (usually 15 minutes) and multiplied by 100 to compute academic engaged time. Typically data from two observations are averaged to obtain a stable academic engaged time score. As opposed to comparing this score to one or more randomly selected peers, in the AET-SSBD raw scores are converted to T-scores using normative data tables. Normative data for the AET-SSBD consist of a sample of 1,300 first through sixth grade children from 16 school districts across six states. Normative tables in the SSBD manual are arranged according to gender and whether or not the child of interest has met the criteria for either internalizing or externalizing problems in the second stage of the SSBD. Training requirements have been reported to consist of as little as 4 to 6 hours for both the AET-SSBD and PSB-SSBD codes (Walker et al., 1990).
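The percentage calculation described for the AET-SSBD can be sketched in a few lines of Python. This is a minimal illustration rather than part of the SSBD materials: the function names and the example stopwatch totals are hypothetical, and the conversion to T-scores is omitted because it depends on the normative tables in the SSBD manual.

```python
def academic_engaged_time(engaged_seconds, observed_seconds):
    """Percentage of the observation during which the student was engaged."""
    if observed_seconds <= 0:
        raise ValueError("observed_seconds must be positive")
    if not 0 <= engaged_seconds <= observed_seconds:
        raise ValueError("engaged_seconds must lie between 0 and observed_seconds")
    return 100 * engaged_seconds / observed_seconds


def mean_aet(observations):
    """Average the AET percentage over (engaged_seconds, observed_seconds) pairs."""
    if not observations:
        raise ValueError("at least one observation is required")
    scores = [academic_engaged_time(e, t) for e, t in observations]
    return sum(scores) / len(scores)


# Hypothetical stopwatch totals from two 15-minute (900-second) observations.
sessions = [(630, 900), (720, 900)]
print(mean_aet(sessions))  # 75.0
```

Averaging the two percentages, rather than pooling the seconds, mirrors the text's description of averaging data from two observations; with equal-length sessions the two approaches give the same result.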

Psychometric properties. For the AET-SSBD, interobserver agreement is calculated by dividing the smaller score of two observers by the larger score. Mean interobserver agreement coefficients across five published studies have ranged from .95 (Walker et al., 1994) to .98 (Quinn, Mathur, & Rutherford, 1995). Coefficients for individual cases have typically ranged from the .80s to 1.00. Scores on the AET-SSBD have been shown to correlate significantly (r = -.42) with teacher ratings of externalizing behavior problems (Walker et al., 1988). Several studies have found scores on the AET-SSBD to accurately discriminate control children from those nominated as at-risk by teachers for externalizing problems (e.g., Quinn et al., 1995; Walker et al., 1990). Less consistent has been the ability of scores from the AET-SSBD to discriminate children nominated as at-risk for internalizing problems from controls (Walker et al., 1988; Walker et al., 1990).
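The smaller-over-larger agreement index described at the start of this section is simple enough to express directly. The sketch below assumes each observer's score is a non-negative number (seconds or a percentage); the function name and the handling of the case in which both observers record zero are our own illustrative choices, not something specified in the SSBD manual.

```python
def aet_agreement(score_a, score_b):
    """Interobserver agreement: smaller score divided by larger score."""
    if score_a < 0 or score_b < 0:
        raise ValueError("scores must be non-negative")
    larger = max(score_a, score_b)
    if larger == 0:
        # Assumption: two observers who both record no engaged time agree fully.
        return 1.0
    return min(score_a, score_b) / larger


# Hypothetical example: observers record 570 and 600 engaged seconds.
print(round(aet_agreement(570, 600), 2))  # 0.95
```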

Attention-Deficit Hyperactivity Disorder School Observation Code (ADHD-SOC)

The ADHD-SOC (Gadow et al., 1996) was developed as both a screening measure and as a tool for evaluating the effects of interventions for children with attention-deficit/hyperactivity disorder (ADHD) and related disorders across school settings (classroom, lunchroom, playground). According to its developers, training for the ADHD-SOC should require approximately 20 to 25 hours. Categories are coded in 15-second intervals for 15 minutes. The following behaviors are coded using the partial interval method: (a) interference (e.g., calling out when it is not appropriate to do so), (b) motor movement (e.g., getting out of seat without permission), (c) verbal aggression (e.g., cursing at another student), (d) symbolic aggression (e.g., taking another student’s pencil), (e) object aggression (e.g., kicking a chair), and (f) off-task (e.g., looking out a window instead of completing an assignment). Noncompliance (e.g., ignoring verbal direction from teacher) is scored using the whole interval method. Other categories are coded in lunchroom and playground settings to assess appropriate and inappropriate social behaviors (see Leff & Lakin, 2005, for use of the ADHD-SOC in playground settings). When used as part of a comprehensive diagnostic assessment, the authors of the ADHD-SOC recommend selecting three or four peers to observe for comparison. Selected peers are observed with the target student in alternating 1-minute segments.

Table 1
Characteristics of Reviewed Systematic Observation Codes

| Code/Availability | Computerized? | Recording Method(s) | Behavior Categories | Training Requirements | Typical Length of Observation |
|---|---|---|---|---|---|
| Academic Engaged Time Code of the SSBD (AET-SSBD; Walker & Severson, 1990); www.sopriswest.com | No | Duration | 1. Academic engaged time | Low | 15 minutes |
| ADHD School Observation Code (ADHD-SOC; Gadow, Sprafkin, & Nolan, 1996); available from: www.checkmateplus.com | No | Partial interval; 15-second intervals | 1. Interference; 2. Motor movement; 3. Noncompliance; 4. Nonphysical aggression; 5. Off-task | Moderate | 15 minutes |
| Behavioral Observation of Students in Schools (BOSS; Shapiro, 2004); computer version available from: www.sopriswest.com | Palm | Momentary time sample; partial interval; 15-second intervals | 1. Active engaged time; 2. Passive engaged time; 3. Off-task motor; 4. Off-task verbal; 5. Off-task passive; 6. Teacher directed instruction | Moderate | 15 minutes |
| Classroom Observation Code (COC; Abikoff & Gittelman, 1985); this reference provides a detailed description of code and the observation protocol | No | Partial interval; whole interval | 1. Interference; 2. Minor motor movement; 3. Gross motor standing; 4. Gross motor vigorous; 5. Physical aggression; 6. Verbal aggression; 7. Solicitation of teacher; 8. Off-task; 9. Noncompliance; 10. Out of chair behavior; 11. Absence of behavior | High | 32 minutes |
| Direct Observation Form (DOF; Achenbach, 1986); www.aseba.org | No | Predominant activity sampling; 4-point Likert scale for 97 problem items | 1. On-task; 2. Withdrawn-Inattentive; 3. Nervous-Obsessive; 4. Depressed; 5. Hyperactive; 6. Attention Demanding; 7. Aggressive | Low | 10 minutes |
| State-Event Classroom Observation System (SECOS; Saudargas, 1997); rsaudarg@utk.edu | No | Momentary time sample; frequency recording; 15-second intervals | States: 1. School work; 2. Looking around; 3. Other activity; 4. Social interaction with child; 5. Social interaction with teacher; 6. Out of seat. Events: 1. Out of seat; 2. Approach child; 3. Other child approach; 4. Raise hand; 5. Calling out to teacher; 6. Teacher approach | Moderate | 20 minutes |
| Student Observation System (SOS; Reynolds & Kamphaus, 2004); www.agsnet.com | Windows; Palm | Momentary time sample; 30-second intervals; 3-point Likert scale for 65 behavior items (some items permit scoring of whether the behavior was disruptive to the class) | Adaptive Behaviors: 1. Response to teacher/lesson; 2. Peer interaction; 3. Works on school subjects; 4. Transition movement. Problem Behaviors: 1. Inappropriate movement; 2. Inattention; 3. Inappropriate vocalization; 4. Somatization; 5. Repetitive motor movements; 6. Aggression; 7. Self injurious behavior; 8. Inappropriate sexual behavior; 9. Bowel/bladder problems | Unclear | 15 minutes |

Note. Low (up to 10 hours), Moderate (between 11–25), High (> 25).

Table 2
Psychometric Properties of Reviewed Systematic Observation Codes

| Code | Interobserver Agreement | Validity: Convergent | Validity: Discriminant | Treatment Sensitivity | Normative Data |
|---|---|---|---|---|---|
| AET-SSBD | Percent agreement averaged across 5 studies was .96 | TRF Externalizing (r = -.42) | Discriminated children at-risk for externalizing problems from controls. Discriminated children at-risk for internalizing problems from controls | No data available | 1,300 children from 16 school districts in 6 states. Data available by gender, and by grouping of the SSBD (externalizing, internalizing, nonranked) |
| ADHD-SOC | Kappas = .77–.86; Kappas = .67–.80 | IOWA (rs = .06–.71); ATRS (rs = .11–.62); Peer Conflict Scale (rs = .38–.74) | Discriminated children with emotional and behavioral disorders from children with learning disabilities. Discriminated children with ADD from their nondisabled peers. Discriminated children with ADHD from their nondisabled peers | Sensitive to stimulant drug effects | Authors recommend use of classroom comparison children (observed in alternating 1-minute segments) |
| BOSS | Kappas = .93–.98 | No published data available | Discriminated children with ADHD from their nondisabled peers | Sensitive to instructional manipulations | Authors recommend use of classroom comparison children (observed every fifth interval) |
| COC | Phi coefficients = .80–1.0; Phi coefficients = .55–.95; Phi coefficients = .40–.97. Poor reliability for verbal aggression in some studies | No published data available | Discriminated children with ADHD from their nondisabled peers. Using multiple categories 80% of ADHD and typically developing children were correctly classified | Sensitive to psychosocial interventions and stimulant drug effects | Authors recommend use of classroom comparison children (observed in alternating 4-minute segments) |
| DOF | Average Pearson correlations across 4 studies was .90 for total behavior problems and .84 for on-task. One way ICCs for 60 minutes of observation was .86 for total behavior problems and .71 for on-task. One way ICC for 10 minutes of observation was .85 for total behavior problems and .58 for on-task | TRF total behavior problems score (rs = -.26– -.53); TRF school performance (rs = -.14–.66); TRF Adaptive functioning composite (rs = .48–.72) | Discriminated boys referred for problem behaviors from typically developing boys matched for age, grade, and race | On-task, and the Nervous/Obsessive, and Depression scales have demonstrated sensitivity to a targeted prevention program | Authors recommend use of classroom comparisons observed in 10-minute blocks before and after 10-minute observation of target child. Normative data on 287 children from Nebraska, Oregon, and Vermont also are available for comparison purposes |
| SECOS | Average total agreement = .81; Total agreement = .75–1.0 | No published data available | Discriminated between children with behavior disorders and their nondisabled peers. Discriminated between children with learning disabilities from their nondisabled peers | No published data available | 500 children in Grades 1 through 5. Children attended 1 of 10 elementary schools in East Tennessee |
| SOS | Unclear from available data | No published data available | Discriminated between children with ADHD and nondisabled children | No published data available | Authors recommend use of two or three randomly selected peer comparisons |

Note. IOWA = IOWA Conners Teacher’s Rating Scale (Loney & Milich, 1982); ATRS = Abbreviated Teachers Rating Scale (Conners, 1973); TRF = Teacher Report Form (cf., Achenbach & Rescorla, 2001).

Table 3
Strengths, Limitations, and Recommendations for Reviewed Systematic Observation Codes

| Code | Strengths | Limitations | Recommended Use |
|---|---|---|---|
| Academic Engaged Time-SSBD | Simple code to learn and use. Low training requirements. Strong reliability. Strong support for discriminant validity | Narrow assessment of student behavior. Need for updated normative data | Use as part of the SSBD for screening purposes. Useful for measuring student engagement as part of a diagnostic assessment |
| ADHD-School Observation Code | Broad measurement of externalizing behaviors. Known psychometric properties. Sensitive to effects of treatment | Variable interobserver agreement. Relatively low association with teacher ratings of hyperactivity | Assessment of externalizing problems. Monitoring effects of interventions |
| BOSS | Specific assessment of active student engagement. Some evidence for treatment sensitivity | More information needed regarding treatment sensitivity. In general, limited information concerning psychometric properties | Describing the classroom behavior of children. May be useful in assessment of externalizing behavior |
| COC | Strong support for discriminant validity. Strong support for treatment sensitivity | Complex code that is a challenge to learn. Data needed examining discrimination of children with ADHD from other affected populations | Screening and diagnosis of ADHD. Monitoring the effects of interventions for ADHD |
| DOF | Easy to learn and use. Broad assessment of externalizing and internalizing behaviors. Integrated into a broad assessment system (ASEBA). Some evidence of treatment sensitivity. Some evidence for utility in diagnostic assessments of behavior problems | More information needed with regard to the psychometric properties of scales other than total behavior problems. More information needed regarding treatment sensitivity and discriminant validity. Small normative sample | As part of the ASEBA for assessment of emotional and behavior problems |
| SECOS | Known accuracy. Evidence for discriminant validity. Low inference/descriptive categories. Has been used on a wide age range (first grade–high school) | Normative group consists only of 500 students from East Tennessee. No data available regarding treatment sensitivity | Assessment of externalizing problems if peer comparisons are utilized |
| SOS | Relatively broad assessment of both positive and negative student behaviors | Limited evidence of psychometric properties | None at this time |

Psychometric properties. Interobserver agreement using the ADHD-SOC has been somewhat variable. For example, Nolan and Gadow (1994) reported kappa coefficients between .77 and .86 for the five classroom categories, with only the category of nonphysical aggression falling below .80. However, Gadow, Nolan, Sprafkin, and Sverd (1995) reported kappas at or below .80 for all five categories (κ = .67–.80). Test-retest coefficients based on observations within a 2-week period were low to moderate (range = .27–.72) (Gadow et al., 1996). The association between teacher ratings of hyperactivity and relevant ADHD-SOC categories (motor movement and off-task) has been low and not statistically significant. However, significant associations between teacher ratings of hyperactivity and observed off-task behavior (though not motor movement) emerged when teacher ratings of negative behavior were controlled for statistically (rs between .46 and .48) (Gadow et al., 1996). Evidence for the convergent validity of the remaining categories of the ADHD-SOC (interference, noncompliance, nonphysical aggression) is more robust. Nolan and Gadow (1994) found moderate correlations between these categories and teacher ratings of aggression and emotional lability (range = .38–.74). In addition, the ADHD-SOC has been found to discriminate between children identified as having ADHD and their nonlabeled peers (Gadow et al., 1992). Finally, the treatment sensitivity of all but one ADHD-SOC category (nonphysical aggression) has been demonstrated in school-based studies of stimulant drug effects (Gadow, Nolan, & Sverd, 1992; Gadow, Nolan, Sverd, Sprafkin, & Paolicelli, 1990).
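For readers less familiar with the kappa coefficients reported above, the following sketch shows the standard Cohen's kappa calculation for two observers coding the same intervals. It is offered as a general illustration: the cited studies do not describe their computations at this level of detail, and the function name and example records are hypothetical.

```python
def cohens_kappa(observer_a, observer_b):
    """Cohen's kappa for two equal-length sequences of interval codes."""
    a, b = list(observer_a), list(observer_b)
    if not a or len(a) != len(b):
        raise ValueError("both observers must code the same non-empty set of intervals")
    n = len(a)
    # Proportion of intervals on which the observers agree.
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each observer's marginal proportions.
    p_chance = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    if p_chance == 1:
        raise ValueError("kappa is undefined when both observers use a single code")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical records for one category over ten intervals (1 = occurred).
observer_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
observer_b = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
print(round(cohens_kappa(observer_a, observer_b), 2))  # 0.8
```

In this example the observers agree on 9 of 10 intervals (.90), but half of that agreement would be expected by chance, so kappa is .80. This is why kappa is usually lower than simple percent agreement for the same data.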

Behavioral Observation of Students in Schools (BOSS; Shapiro, 2004)

The BOSS was designed to assess student academic behavior in the classroom environment. According to its developer (the fourth author of this article), it should take between 10 and 15 hours of training to become proficient using the BOSS. The BOSS essentially measures levels of on- and off-task behavior. However, the BOSS divides on-task behavior into active engaged time (AET; coded when a student is actively engaged in academic responding; e.g., reading aloud, writing in a journal), and passive engaged time (PET; coded when a student is passively attending; e.g., listening to a teacher, looking at the blackboard while a teacher writes). Both AET and PET are scored using momentary time sampling at the beginning of each 15-second interval. During the remainder of each interval, the partial interval method is used to record the following off-task behavior categories: (a) off-task motor (motor activity not associated with the assigned academic task; e.g., leaving seat to throw a piece of paper in the trash can), (b) off-task verbal (utterances not associated with the academic task; e.g., talking to a peer about something other than the current assignment, humming), and (c) off-task passive (passive nonengagement; e.g., looking out the window). Using the BOSS, the observer codes the behavior of the target child for four out of every five intervals. On every fifth interval, the behavior of one of several preselected peers is coded on the same behaviors as the target child for comparison purposes. Finally, teacher-directed instruction (TDI) is coded using the partial-interval method. Scores on TDI estimate the amount of time a teacher is engaged in instruction. For example, TDI would be coded if the teacher was lecturing to the class, but TDI would not be coded if he was grading papers at his desk. TDI, like peer comparison data, is scored on every fifth interval.
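The interval structure just described for the BOSS can be made concrete with a small data model. The sketch below is our own illustration, not the published BOSS software or scoring form: the class, function, and off-task label names are hypothetical, intervals are assumed to be numbered from 1, and TDI is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

MOMENTARY_CODES = ("AET", "PET")
OFF_TASK_CODES = ("off_task_motor", "off_task_verbal", "off_task_passive")


@dataclass
class Interval:
    # Code seen at the instant the 15-second interval begins (momentary time sample).
    momentary: Optional[str] = None
    # Off-task codes seen at any point in the rest of the interval (partial interval).
    off_task: Set[str] = field(default_factory=set)


def is_peer_interval(number):
    """Every fifth interval is coded for a comparison peer, not the target child."""
    return number % 5 == 0


def percent_by_code(intervals, peer=False):
    """Percentage of target (or peer) intervals in which each code was recorded."""
    selected = [iv for n, iv in enumerate(intervals, start=1)
                if is_peer_interval(n) == peer]
    if not selected:
        raise ValueError("no intervals of the requested type")
    result = {}
    for code in MOMENTARY_CODES:
        result[code] = 100 * sum(iv.momentary == code for iv in selected) / len(selected)
    for code in OFF_TASK_CODES:
        result[code] = 100 * sum(code in iv.off_task for iv in selected) / len(selected)
    return result


# Hypothetical record of ten intervals; intervals 5 and 10 belong to a peer.
record = [
    Interval("AET"), Interval("AET", {"off_task_verbal"}), Interval("PET"),
    Interval(None, {"off_task_passive"}), Interval("PET"),           # 5: peer
    Interval("AET"), Interval("AET"), Interval("PET"),
    Interval(None, {"off_task_passive"}), Interval("AET"),           # 10: peer
]
print(percent_by_code(record))             # target child
print(percent_by_code(record, peer=True))  # comparison peers
```

Because engagement is sampled at a single instant while off-task behavior is recorded across the remainder of the interval, one interval can legitimately carry both an engaged code and an off-task code, as the second interval in the example does.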

Psychometric properties. Reports of interobserver agreement for the BOSS have been consistently high. For example, in a study involving repeated measurement of three participants, Ota and DuPaul (2002) reported total agreement ranging between 90 and 100%. More recently, DuPaul et al. (2004) reported kappas ranging from .93 to .98 for observations in a large sample of children with ADHD and normal comparison children (N = 136). Although there are no data available supporting the convergent validity of the BOSS, there are some data supporting the ability of the BOSS to discriminate between children with ADHD and typically developing children. Specifically, DuPaul et al. (2004) found that PET and a composite of the three off-task scores of the BOSS significantly discriminated between children with ADHD who had academic problems and typically developing peers, whether the observations were conducted during instruction in mathematics or reading. Effect sizes for these variables ranged between -.53 and 1.25.

Treatment sensitivity of the BOSS has been documented in a study investigating the efficacy of computer-aided instruction for three children with ADHD (e.g., Ota & DuPaul, 2002). In a multiple-baseline design across three participants, the BOSS categories of AET (ES between -2.91 and -13.01) and a composite of the three off-task scores (ES between 1.8 and 3.06) were found to be sensitive to manipulations in instructional modality (regular math instruction vs. working on a computer).

Classroom Observation Code (COC)

The COC (Abikoff & Gittelman, 1985) was designed to quantify the classroom behavior of children for diagnostic assessment for ADHD and for monitoring the effects of interventions designed to ameliorate the symptoms of ADHD. The COC is a relatively complex code consisting of 12 behavior categories. Abikoff, Gittelman-Klein, and Klein (1977) reported that training for the code averaged 50 hours, and that only 5 of 8 advanced undergraduate and graduate student research assistants met the training criteria of 70% agreement at the end of training. Like the ADHD-SOC, the COC focuses exclusively on child behaviors. Categories of the COC are recorded in 15-second intervals using one of two sampling methods. The following discrete behaviors are scored using the partial interval method: (a) interference (e.g., calling out during a teacher lecture), (b) minor motor movement (e.g., twisting and turning while seated), (c) gross motor standing (e.g., out of seat and standing), (d) gross motor-vigorous (e.g., running or crawling across the classroom), (e) physical aggression (e.g., kicks or hits another child), (f) threat or verbal aggression-to children or -to teacher (e.g., curses at another child or teacher), and (g) solicitation of teacher (e.g., raises hand). The following behaviors are coded using the whole interval method: (a) off-task (e.g., plays with toy while the teacher is talking), (b) noncompliance (e.g., ignores verbal direction from teacher), and (c) out of chair behavior (e.g., out of seat when not appropriate to do so). Finally, if none of the aforementioned behaviors are noted in an interval, “absence of behavior” is coded. Observation sessions using the COC typically are 32 minutes in duration. A target child and a same gender teacher-nominated “normal” peer are observed for 16 minutes each, in alternating 4-minute blocks.
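The two sampling methods and the alternating observation schedule described for the COC can be illustrated as follows. This sketch uses the conventional definitions of partial-interval recording (the behavior occurs at any point in the interval) and whole-interval recording (the behavior persists throughout the interval). The function names are hypothetical, and starting with the target child is an assumption, since the text does not say which child is observed first.

```python
def score_partial_interval(samples):
    """True if the behavior was present at any sampled moment of the interval."""
    samples = list(samples)
    if not samples:
        raise ValueError("an interval needs at least one sample")
    return any(samples)


def score_whole_interval(samples):
    """True only if the behavior was present at every sampled moment."""
    samples = list(samples)
    if not samples:
        raise ValueError("an interval needs at least one sample")
    return all(samples)


def coc_schedule(total_minutes=32, block_minutes=4, interval_seconds=15, first="target"):
    """List which child is observed in each interval of a session."""
    intervals_per_block = block_minutes * 60 // interval_seconds
    other = "peer" if first == "target" else "target"
    schedule = []
    for block in range(total_minutes // block_minutes):
        child = first if block % 2 == 0 else other
        schedule.extend([child] * intervals_per_block)
    return schedule


schedule = coc_schedule()
print(len(schedule), schedule.count("target"), schedule.count("peer"))  # 128 64 64

# Hypothetical 15-second interval sampled once per second: a two-second call-out.
brief_call_out = [False] * 5 + [True] * 2 + [False] * 8
print(score_partial_interval(brief_call_out))  # True
print(score_whole_interval(brief_call_out))    # False
```

The example shows why the choice of method matters: a brief behavior is captured by partial-interval recording but not by whole-interval recording, which suits sustained states such as off-task or out-of-chair behavior.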

Psychometric properties. Reported interobserver agreement for the COC has been high. For example, Abikoff et al. (2002) collected interobserver agreement data for 10% of 1,893 observations, which yielded mean phi coefficients ranging from .80 to 1.00. The discriminant validity of the COC is well documented (Abikoff et al., 1977; Abikoff, Gittelman, & Klein, 1980; Abikoff et al., 2002). For example, in a study of 502 pairs of children with ADHD and their classmates, Abikoff et al. (2002) found that all of the COC categories significantly discriminated between children with ADHD and their typically developing peers. The categories of off-task and interference have been found to be the most discriminating, correctly classifying 77% and 76.2% of cases, respectively (e.g., ADHD vs. “normal”). However, by combining the categories of interference, off-task, minor motor movement, gross motor movement, and solicitation, almost 80% of cases were correctly classified (Abikoff et al., 1980). The treatment sensitivity of the COC is well documented, and the code has been used as a dependent measure in numerous studies of medical and psychosocial interventions for children with ADHD (e.g., Abikoff et al., 2004; Klein & Abikoff, 1997). Based on our review of the extant literature, no studies have investigated the convergent validity of the COC.
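For readers unfamiliar with the phi coefficient as an index of interobserver agreement, the following is a minimal sketch of the standard 2 x 2 formula applied to two observers' interval-by-interval records of a single category. The function name and the example data are hypothetical and are not drawn from the studies cited above.

```python
import math
from typing import Sequence


def phi_coefficient(obs1: Sequence[int], obs2: Sequence[int]) -> float:
    """Phi for two observers' binary (1 = scored, 0 = not scored) interval records."""
    if len(obs1) != len(obs2):
        raise ValueError("both observers must code the same number of intervals")
    a = sum(1 for x, y in zip(obs1, obs2) if x and y)          # both scored
    b = sum(1 for x, y in zip(obs1, obs2) if x and not y)      # only observer 1
    c = sum(1 for x, y in zip(obs1, obs2) if not x and y)      # only observer 2
    d = sum(1 for x, y in zip(obs1, obs2) if not x and not y)  # neither scored
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when an observer's record has no variation")
    return (a * d - b * c) / denom


# Hypothetical records for eight 15-second intervals
observer_1 = [1, 1, 0, 0, 1, 0, 0, 0]
observer_2 = [1, 1, 0, 0, 0, 0, 0, 0]
phi = phi_coefficient(observer_1, observer_2)  # one disagreement in eight intervals
```

Note that phi cannot be computed when a behavior is never (or always) scored by one observer, which can occur for low-frequency categories in short sessions.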

Direct Observation Form (DOF)

The DOF (Achenbach, 1986) was designed to obtain ratings of problem behaviors and on-task behavior directly observed in group settings, and is part of the Achenbach System of Empirically-Based Assessment (ASEBA; Achenbach & Rescorla, 2001). The DOF has been used in research studies across a number of school settings, including the classroom, lunchroom, and playground. Training for the DOF should take about 10 hours. Although each observation period is relatively brief (10 minutes), the developers of the DOF recommend that three to six observations be performed to gain a stable estimate of child behavior. During each observation session, the observer writes a narrative or running log describing the target student’s behavior. In the last 5 seconds of each 1-minute interval, the observer also records whether the target child is on-task or off-task. On-task versus off-task is determined by the predominant activity sampling method wherein behavior must occur for more than half of the 5-second sampling interval. Hence, the DOF requires that the observer write a narrative and observe on- and off-task behavior simultaneously.

At the end of each 10-minute observation session the observer uses the DOF form to rate the student’s behavior on 97 problem items. Problem items are scored on a 4-point Likert scale: 0 = no occurrence; 1 = slight or ambiguous occurrence; 2 = definite occurrence with mild to moderate intensity and less than 3 minutes in duration; and 3 = definite occurrence with severe intensity or greater than 3 minutes in duration. Problem items are short (e.g., “acts too young for age,” “sulks,” “nervous, highstrung, or tense”) with 72 items corresponding to items of the Child Behavior Checklist for Ages 6 to 18 (CBCL; Achenbach & Rescorla, 2001) and 83 items corresponding to the Teacher Report Form (TRF; Achenbach & Rescorla, 2001). Factor analyses of data from 212 clinically referred children between 5 and 14 years of age generated six syndrome scales (Withdrawn-Inattentive, Nervous-Obsessive, Depressed, Hyperactive, Attention Demanding, Aggressive; Achenbach & Rescorla, 2001), plus Internalizing and Externalizing scales. The DOF also provides a Total Problem score that is the sum of the 0 to 3 ratings on the 97 items and an on-task score ranging from 1 to 10. In addition, the developers of the DOF recommend observing two comparison children in the same setting (one observed before and one after the target student). The DOF scoring profile provides raw scores for the six syndrome scales, plus T scores for Internalizing, Externalizing, and Total Problems. The DOF profile compares scores for the target child (and control children) to a normative sample of 287 children from Nebraska, Oregon, and Vermont.

Psychometric properties. In several studies Pearson correlations were indicative of good interobserver agreement. Averaging across four studies of children in public school classrooms and a residential treatment center (Achenbach & Edelbrock, 1983; McConaughy, Achenbach, & Gent, 1988; McConaughy, Kay, & Fitzgerald, 1998, 1999), mean interobserver agreement was .90 for DOF Total Behavior Problems and .84 for on-task. In an examination of the generalizability of the DOF, Reed and Edelbrock (1983) found that DOF Total Behavior Problems (mean intraclass correlation = .85), but not on-task (mean intraclass correlation = .58), generalized well from one observer to another for individual 10-minute observation sessions. When data from six sessions were combined, intraclass correlations improved for on-task (mean intraclass correlation = .71), and Total Behavior Problems remained stable (mean intraclass correlation = .86).

The convergent validity of the DOF is supported by significant correlations (rs = .37 to .51) between total behavior problem scales of the DOF and TRF (Achenbach & Edelbrock, 1986; Reed & Edelbrock, 1983). The Total Behavior Problems score and the on-task score of the DOF have also been shown to discriminate between boys referred for problem behavior and a sample of typically developing boys matched for age, grade, and race (Reed & Edelbrock, 1983). The treatment sensitivity of the on-task, internalizing, nervous/obsessive, and depressed scales (McConaughy et al., 1999) has been demonstrated in evaluations of school-based programs to prevent emotional disturbance (McConaughy et al., 1998, 1999).

State-Event Classroom Observation System (SECOS)

The SECOS (Saudargas, 1997) was designed to quantify student behavior as part of a comprehensive multimethod assessment and to assess the effectiveness of classroom interventions. It has been used in research studies involving students from first grade through high school. Learning the code typically requires 13 to 15 hours of training (Saudargas, 1997). For the SECOS, momentary time sampling is used to derive an estimate of the amount of time the student engages in the following six “state” behaviors: (a) school work (e.g., a student is solving a math problem in a workbook), (b) out of seat (e.g., student leaves seat without permission), (c) looking around (e.g., student looks out window), (d) social interaction with child (e.g., student talks to neighbor about school work), (e) social interaction with teacher (e.g., teacher is helping student solve a math problem), and (f) other activity (e.g., sharpening pencil). The frequencies of five additional “event” behaviors are recorded in 15-second intervals: (a) raise hand (e.g., student raises hand in response to a teacher question), (b) calling out to teacher (e.g., student calls teacher to ask for help), (c) approach child (e.g., student taps neighbor on the shoulder), (d) other child approach (e.g., another child taps the target student on the shoulder), and (e) teacher approach (e.g., teacher asks student a question). Out of seat appears as both a state and an event category, which allows for an estimate of both the frequency and the duration of this behavior.
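The two kinds of scores described above can be summarized with a brief sketch: state behaviors yield an estimated percentage of time (the proportion of sampled moments at which the state was observed), whereas event behaviors yield a rate per minute. The function names and the category abbreviations (SW, LK, OS, RH, CAL) are hypothetical and are used only for illustration; they are not taken from the SECOS manual.

```python
from collections import Counter
from typing import Dict, List


def state_percentages(samples: List[str]) -> Dict[str, float]:
    """Estimated percent of time per state from momentary time samples.

    `samples` holds the single state code seen at each sampled moment.
    """
    if not samples:
        raise ValueError("at least one momentary sample is required")
    counts = Counter(samples)
    return {state: 100.0 * n / len(samples) for state, n in counts.items()}


def event_rates(events: List[str], session_min: float) -> Dict[str, float]:
    """Events per minute, from a tally of every event seen during the session."""
    if session_min <= 0:
        raise ValueError("session length must be positive")
    counts = Counter(events)
    return {event: n / session_min for event, n in counts.items()}


# Hypothetical 2-minute session: eight 15-second intervals, one sample per interval
samples = ["SW", "SW", "LK", "SW", "OS", "SW", "SW", "LK"]
events = ["RH", "RH", "CAL"]  # every event tallied as it occurred

percent_time = state_percentages(samples)
rates = event_rates(events, session_min=2.0)
```

Keeping the two summaries separate mirrors the distinction drawn in the text: a state score answers "how much of the time," and an event score answers "how often."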

The author of the SECOS recommends observing a classroom peer for comparison purposes. Although no guidelines are offered to direct the collection of such data in the SECOS manual, research studies have collected target and peer data in alternating 20-minute sessions (cf. Slate & Saudargas, 1986a). Also, normative data for the SECOS are available for children in first through fifth grade. The normative sample consisted of 500 children from 10 schools in East Tennessee. Due to a lack of statistically significant differences in scores between boys and girls in the normative sample, these data are grouped together in T-score conversion tables by grade.

Psychometric properties. Interobserver agreement using the SECOS appears good. Fellers and Saudargas (1987) reported an average total agreement of .81. In another study, Slate and Saudargas (1986a) found total agreement to range from .75 to 1.0 for 25% of observations.

Although there do not appear to be any published data supporting the convergent validity of the SECOS, two studies have examined the accuracy of the SECOS. Saudargas and Lentz (1986) found that the association between state and event category scores on the SECOS and real-time recording of the same behaviors on hand-held computers supported the sampling methods employed (rs = .67 to .92), as did t-tests comparing levels of estimated and real-time scores. However, in a later study, Saudargas and Zanolli (1990) found that momentary time sampling in 15-second intervals may not be sensitive to behaviors of short duration (e.g., teacher interactions, verbalizations). If these behaviors are of particular interest, these authors have suggested that shortening intervals (e.g., 5-second) would improve sensitivity, but perhaps at the cost of reliability.
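The sensitivity issue can be made concrete with a small constructed example. The sketch below is not a reanalysis of the cited studies: the episode times are invented, and the convention that a behavior counts as occurring at moment t when start <= t < end is an assumption. It compares the true percentage of time with momentary time sampling estimates at 15-second and 5-second intervals for four brief (3-second) episodes.

```python
from typing import List, Tuple

Episode = Tuple[int, int]  # hypothetical (start_s, end_s) of one behavior episode


def true_percent(episodes: List[Episode], session_s: int) -> float:
    """Percent of the session actually occupied by the behavior (real-time record)."""
    return 100.0 * sum(end - start for start, end in episodes) / session_s


def mts_percent(episodes: List[Episode], session_s: int, interval_s: int) -> float:
    """Momentary time sampling estimate, sampling at the end of each interval."""
    moments = [(i + 1) * interval_s for i in range(session_s // interval_s)]
    hits = sum(1 for t in moments
               if any(start <= t < end for start, end in episodes))
    return 100.0 * hits / len(moments)


# Four 3-second episodes in a 60-second session
episodes = [(3, 6), (20, 23), (41, 44), (52, 55)]
actual = true_percent(episodes, 60)
estimate_15s = mts_percent(episodes, 60, 15)  # samples at 15, 30, 45, 60 s
estimate_5s = mts_percent(episodes, 60, 5)    # samples at 5, 10, ..., 60 s
```

Here the behavior occupies 20% of the session, yet none of the four 15-second sampling moments falls within an episode, so the estimate is 0%; with 5-second intervals two of twelve moments do, giving an estimate of about 17%.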

The SECOS significantly discriminated between typically developing children and those with behavior disorders (Slate & Saudargas, 1986a) and learning disabilities (e.g., Fellers & Saudargas, 1987). However, it should be noted that only observed teacher behaviors and a combination of observed teacher and child behaviors were able to discriminate between boys with learning disabilities and their typically developing same-gender peers (Slate & Saudargas, 1986b).

Student Observation System (SOS)

The SOS (Reynolds & Kamphaus, 2004) was designed to assess a broad array of both adaptive and maladaptive classroom behaviors, and is a component of the Behavior Assessment System for Children-2nd Edition (BASC-2; Reynolds & Kamphaus, 2004). It has been suggested that training for the SOS can be accomplished in a 30-minute workshop or by simply reading the manual, but no criterion for training has been reported (Lett & Kamphaus, 1997). The length of observation sessions is typically 30 minutes, but the authors recommend observing the target child across 3 or 4 days in different classrooms to enhance the reliability of measurement.

Using the SOS, the observer takes notes concerning child and teacher behaviors for 27 seconds of each 30-second interval. In the last 3 seconds of each interval, the observer uses a 3-second momentary time sampling procedure to record adaptive and/or maladaptive behaviors exhibited by the target child. The adaptive behaviors are coded using the following four categories: (a) response to teacher/lesson (e.g., answers teacher’s question appropriately), (b) peer interaction (e.g., participates appropriately in small group discussion), (c) work on school subjects (e.g., completing a math worksheet alone), and (d) transition movement (e.g., walking to blackboard when asked to do so). The following nine categories are grouped together as maladaptive behaviors: (a) inappropriate movement (e.g., walking around classroom when inappropriate), (b) inattention (e.g., doodling on book), (c) inappropriate vocalization (e.g., teases another student), (d) somatization (e.g., complains about a headache), (e) repetitive motor movements (e.g., plays with hair), (f) aggression (e.g., intentionally breaks a neighbor’s pencil), (g) self-injurious behavior (e.g., pulls own hair), (h) inappropriate sexual behavior (e.g., strokes self), and (i) bowel/bladder problems (e.g., wets pants). At the end of the 30-minute observation session, the observer then reviews notes and rates the student’s behavior on 65 behavior items on a 3-point Likert scale (NO = never observed, SO = sometimes observed, FO = frequently observed). Items are grouped according to the aforementioned adaptive and problem behavior categories. For the problem behavior items, there is a column to indicate whether the behavior was disruptive to the class.

Psychometric properties. Unfortunately, few published data are available concerning the psychometric properties of the SOS. The manual for the BASC-2 reports no data concerning technical adequacy of the SOS. In one study, the SOS was evaluated with regard to its ability to discriminate a group of 37 children with ADHD from a group of 18 typically developing children (Lett & Kamphaus, 1997). In this study interobserver agreement was examined in a subsample of participants (n = 44) with coefficients reported to be in the .80s. However, the range of interobserver agreement coefficients and the method employed to evaluate interobserver agreement were not reported, making interpretation of these data difficult. Nevertheless, scores on the category of inappropriate movement and the problem behavior composite (of which the inappropriate movement category is a contributor) from the momentary time sampling portion of the SOS significantly discriminated children with ADHD from typically developing children.

Discussion

The purpose of this article was to critically evaluate seven observation systems designed to assess a student’s classroom behavior. Table 3 provides a summary of the strengths, limitations, and recommended uses for each of the codes included in this review.

Recommendations for selection of observation codes. Three of the codes, the AET-SSBD, DOF, and SOS, were developed in conjunction with other measures (the SSBD, ASEBA, and BASC-2, respectively) and are closely aligned with the constructs assessed by these measures. The remaining four codes were developed to assess key behavioral domains, although some focus exclusively on problem behaviors (i.e., ADHD-SOC, COC), whereas others (i.e., BOSS, SECOS) focus on positive behaviors (e.g., academic engagement) as well as problem behaviors. As such, the primary target behavior(s) of interest will be one of the initial factors guiding the selection of a potential observation code from among those included in this review.

Beyond the consideration of target behaviors, reliability and validity evidence must be weighed when deciding which observation code to use. With the exception of the SOS, all of the reviewed codes have minimally sufficient reliability evidence. With regard to validity, all of the codes have at least some evidence to suggest that scores differentiate between students with classroom behavior difficulties and students without such difficulties. Only three of the codes (AET-SSBD, ADHD-SOC, and DOF), however, have published evidence of convergent validity. Similarly, only four of the codes (ADHD-SOC, COC, DOF, and BOSS) have published evidence to support their use for monitoring change in classroom behavior in response to intervention.

Given the strengths and limitations of the available data, six of the codes have sufficient evidence to be used as part of a multimethod assessment. Based on the available data, the ADHD-SOC and DOF appear to have the most support for use in the multimethod assessment of externalizing problems, and the DOF is the only code appropriate for assessing internalizing problems. The COC shows promise in the assessment of classroom behaviors associated with ADHD, whereas the SECOS, BOSS, and AET-SSBD demonstrate promise in the assessment of positive behaviors in the classroom setting. The extremely limited published evidence available for the SOS precludes its use at the current time.

Recommendations for observation best practices. In the beginning of this article we listed the threats to the validity of observational assessment identified by Merrell (1999), including the use of poorly defined behavior categories and inappropriate code selection. By presenting information concerning the seven codes reviewed here we hope to enhance the validity of observations by facilitating the appropriate selection of well-validated coding schemes for particular assessment tasks. There are other strategies that observers can use to maximize the validity of observational assessments. First, it is incumbent on observers to ensure that they are adequately trained on a given code and that the consistency of their observations does not decline over time. Training requirements (summarized in Table 1) should be taken into consideration when selecting any given code. One way to ensure the adequacy of training is to utilize a precoded videotape to determine if a minimum degree of accuracy has been achieved. Unfortunately, such tapes are available only for the AET-SSBD. Alternatively, observers can check their agreement with a second observer (see Hintze, 2005, for methods to calculate interobserver reliability). In addition to ascertaining whether observers have been trained to criterion initially, it is also necessary to check reliability periodically to curb observer drift.
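As one illustration of an agreement check between two observers, the sketch below computes percent agreement and Cohen's kappa from interval-by-interval codes. These two indices are chosen here only as common examples and are an assumption on our part; the function names and the sample data are hypothetical.

```python
from typing import List


def percent_agreement(obs1: List[str], obs2: List[str]) -> float:
    """Percentage of intervals on which the two observers recorded the same code."""
    if len(obs1) != len(obs2) or not obs1:
        raise ValueError("records must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(obs1, obs2) if x == y)
    return 100.0 * matches / len(obs1)


def cohens_kappa(obs1: List[str], obs2: List[str]) -> float:
    """Agreement corrected for the agreement expected by chance."""
    n = len(obs1)
    p_observed = percent_agreement(obs1, obs2) / 100.0
    categories = set(obs1) | set(obs2)
    # Chance agreement: product of each observer's marginal proportions per code
    p_chance = sum((obs1.count(c) / n) * (obs2.count(c) / n) for c in categories)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when both observers use a single code")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical on-task/off-task codes for ten intervals from two observers
observer_1 = ["on", "on", "off", "on", "off", "on", "on", "off", "on", "on"]
observer_2 = ["on", "on", "off", "off", "off", "on", "on", "off", "on", "on"]

agreement = percent_agreement(observer_1, observer_2)
kappa = cohens_kappa(observer_1, observer_2)
```

Repeating such a check at intervals over the course of data collection, rather than only at the end of training, is one way to detect the observer drift described above.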

Second, several observations are necessary to achieve a reliable estimate of a student’s behavior (see Hintze, 2005). Although this has been discussed in terms of traditional measurement theory (e.g., the measurement of a trait), it would seem equally valid for behavioral approaches to assessment wherein one is more interested in assessing differences in a given behavior across conditions. As such, it is recommended that if one wishes to make comparisons of student behavior across settings, multiple observations should be performed within each setting. Third, the normative data that are currently available (e.g., AET-SSBD, SECOS) appear inadequate due to either sampling techniques, the age of the data, or both. Further, given the variability in ecology across educational settings (e.g., task demands, classroom rules, classroom management skills, quality of instruction), standardized norms seem ill suited for observational assessment. Hence, it is recommended that local normative data be collected for frequently used codes and that for each assessment, data be collected on one or more peers under the same conditions as the target child.

Two final considerations for ensuring observation best practices are reactivity and observer bias. Reactivity refers to a target child altering behaviors as a result of being observed, resulting in inaccurate estimates of actual target behaviors. One strategy for minimizing reactivity is to conduct multiple observations to increase the child’s familiarity/comfort with the observer in the classroom. Observer bias also affects observation accuracy and refers to the tendency of an observer to consistently view (and record) observed behaviors in a particular way (e.g., negative, positive). Adequate training and periodic reliability checks described previously are perhaps the most effective way to minimize the likelihood of observer bias.

Finally, no assessment should rely on a single measurement method, particularly when reliability and validity evidence is limited. Assessments are enhanced when multiple methods are employed (Cone, 1978) to assess behavior across multiple dimensions (Achenbach, 1993). Hence, observations, like any other assessment methodology, should only be used as part of a broader assessment battery irrespective of the assessment domain.

Future Research Directions

There are multiple critical directions for future research to ensure identification of appropriate (and inappropriate) uses of the standardized behavior observation codes reviewed herein. First, in light of the limited number of studies evaluating convergent validity, each of the codes would benefit from additional studies addressing this type of evidence. Second, evidence for treatment sensitivity is nonexistent for some codes and minimally sufficient for others. Studies examining treatment sensitivity are essential if these systems are to be used to evaluate intervention effectiveness.

One final line of validity evidence not currently addressed in studies of these codes is the representativeness of observed behavior based on a small number of observations. Despite the common professional belief that results of observation are the “gold standard” in the assessment of behavior, studies (e.g., Doll & Elliott, 1994; Hintze & Matthews, 2004) have raised important questions regarding the validity of a small number of observations to measure classroom behavior. Given that most practitioners rarely have time to engage in a large number of observations for an individual student, determining the validity of a single observation (or small number of observations) with each of these codes is essential for justification of their use in professional practice for screening and diagnosis.

Conclusions

The direct assessment of student behavior has been a critical component of comprehensive evaluations of student behavior in classroom settings. The seven observation codes reviewed in this article have been developed to provide practitioners with a standardized framework for measuring classroom behavior. With the exception of one code, all have published interobserver agreement evidence to support their use with school-age populations. Most also have some evidence of predictive validity and treatment sensitivity, though much of this evidence is limited to single studies with samples that are small to moderate in size. Even less evidence is available related to convergent validity. As a result of these limitations in existing evidence, much research is necessary to ensure that these codes are used for appropriate assessment purposes, behaviors, and target students. Until the completion of such studies, practitioners are encouraged to select measures cautiously and use multiple methods in screening, diagnosis, and evaluation of treatment effectiveness.

References

Abikoff, H., & Gittelman, R. (1985). Classroom Observation Code: A modification of the Stony Brook Code. Psychopharmacology Bulletin, 21, 901-909.

Abikoff, H., Gittelman, R., & Klein, D. F. (1980). A classroom observation code for hyperactive children: A replication of validity. Journal of Consulting and Clinical Psychology, 48, 555-565.

Abikoff, H., Gittelman-Klein, R., & Klein, D. F. (1977). Validation of a classroom observation code for hyperactive children. Journal of Consulting and Clinical Psychology, 45, 772-783.

Abikoff, H., Hechtman, L., Klein, R. G., Weiss, G., Fleiss, K., Etcovitch, J., Cousins, L., Greenfield, B., Martin, D., & Pollack, S. (2004). Symptomatic improvement in children with ADHD treated with long-term methylphenidate and multimodal psychosocial treatment. Journal of the American Academy of Child and Adolescent Psychiatry, 43, 802-811.

Abikoff, H. B., Jensen, P. S., Arnold, L. L. E., Hoza, B., Hechtman, L., Pollack, S., Martin, D., Alvir, J., March, J. S., Hinshaw, S., Vitiello, B., Newcorn, J., Greiner, A., Cantwell, D. P., Conners, C. K., Elliott, G., Greenhill, L. L., Kraemer, H., Pelham, W. E., Jr., Severe, J. B., Swanson, J. M., Wells, K., & Wigal, T. (2002). Observed classroom behavior of children with ADHD: Relationship to gender and comorbidity. Journal of Abnormal Child Psychology, 4, 349-359.

Achenbach, T. M. (1986). The Direct Observation Form of the Child Behavior Checklist (rev. ed.). Burlington, VT: University of Vermont, Department of Psychiatry.

Achenbach, T. M. (1993). Implications of multiaxial empirically based assessment for behavior therapy with children. Behavior Therapy, 24, 91-116.

Achenbach, T. M., & Edelbrock, C. (1983). Manual for the Child Behavior Checklist/4-18 and Revised Child Behavior Profile. Burlington, VT: University of Vermont, Department of Psychiatry.

Achenbach, T. M., & Edelbrock, C. (1986). Manual for the Teacher’s Report Form and Teacher Version of the Child Behavior Profile. Burlington, VT: University of Vermont, Department of Psychiatry.

Achenbach, T. M., & Rescorla, L. A. (2001). Manual for the ASEBA School-Age Forms & Profiles. Burlington, VT: Research Center for Children, Youth, and Families.

Cone, J. D. (1978). The behavioral assessment grid (BAG): A conceptual framework and taxonomy. Behavior Therapy, 9, 882-888.

Conners, C. K. (1973). Rating scale for use in drug studies with children [Special issue: Pharmacotherapy of children]. Psychopharmacology Bulletin, 24-84.

Doll, B., & Elliott, S. N. (1994). Representativeness of observed preschool social behaviors: How many data are enough? Journal of Early Intervention, 18, 227-238.

DuPaul, G. J., Volpe, R. J., Jitendra, A. K., Lutz, J. G., Lorah, K. S., & Gruber, R. (2004). Elementary school students with attention-deficit/hyperactivity disorder: Predictors of academic achievement. Journal of School Psychology, 42, 285-301.

Fellers, G., & Saudargas, R. A. (1987). Classroom behaviors of LD and nonhandicapped girls. Learning Disability Quarterly, 10, 231-236.

Gadow, K. D., Nolan, E. E., Sprafkin, J., & Sverd, J. (1995). School observations of children with attention-deficit hyperactivity disorder and comorbid tic disorder: Effects of methylphenidate treatment. Journal of Developmental and Behavioral Pediatrics, 16, 167-176.

Gadow, K. D., Nolan, E. E., & Sverd, J. (1992). Methylphenidate in hyperactive boys with comorbid tic disorder: II. Behavioral effects in school settings. Journal of the American Academy of Child and Adolescent Psychiatry, 31, 462-471.

Gadow, K. D., Nolan, E. E., Sverd, J., Sprafkin, J., & Paolicelli, L. (1990). Methylphenidate in aggressive-hyperactive boys: I. Effects on peer aggression in public school settings. Journal of the American Academy of Child and Adolescent Psychiatry, 29, 710-718.

Gadow, K. D., Paolicelli, L. M., Nolan, E. E., Schwartz, J., Sprafkin, J., & Sverd, J. (1992). Methylphenidate in aggressive hyperactive boys: II. Indirect effects of medication on peer behavior. Journal of Child and Adolescent Psychopharmacology, 2, 49-61.

Gadow, K. D., Sprafkin, J., & Nolan, E. E. (1996). ADHD School Observation Code. Stony Brook, NY: Checkmate Plus.

Greenwood, C. R. (1996). The case for performance-based instructional models. School Psychology Quarterly, 11, 283-296.

Greenwood, C. R., Carta, J. J., Kamps, D., & Delquadri, J. (1993). Ecobehavioral Assessment Systems Software (EBASS): Observational instrumentation for school psychologists. Kansas City: Juniper Gardens Children’s Project, University of Kansas.

Gruber, R., DuPaul, G. J., Jitendra, A. K., Volpe, R. J., & Lorah, K. S. (in press). Classroom observations of students with and without ADHD: Differences across academic subjects and types of engagement. Journal of School Psychology.

Hintze, J. M. (2005). Psychometrics of direct observation. School Psychology Review, 34, 507-519.

Hintze, J. M., & Matthews, W. J. (2004). The generalizability of systematic direct observations across time and setting: A preliminary investigation of the psychometrics of behavioral observation. School Psychology Review, 33, 258-270.

Hintze, J. M., Volpe, R. J., & Shapiro, E. S. (2002). Best practices in systematic direct observation of student behavior. In A. Thomas & J. Grimes (Eds.), Best practices in school psychology IV (Vol. 2, pp. 993-1006). Bethesda, MD: National Association of School Psychologists.

Klein, R. G., & Abikoff, H. (1997). Behavior therapy and methylphenidate in the treatment of children with ADHD. Journal of Attention Disorders, 2, 89-114.

Leff, S. S., & Lakin, R. (2005). Playground-based observational systems: A review and implications for practitioners and researchers. School Psychology Review, 34, 474-488.

Lett, N. J., & Kamphaus, R. W. (1997). Differential validity of the BASC Student Observation System and the BASC Teacher Rating Scale. Canadian Journal of School Psychology, 13, 1-14.

Loney, J., & Milich, R. (1982). Hyperactivity, inattention and aggression in clinical practice. In M. Wolraich & D. K. Routh (Eds.), Advances in developmental and behavioral pediatrics (Vol. 3, pp. 113-147). Greenwich, CT: JAI Press.

McConaughy, S. H., Achenbach, T. M., & Gent, C. L. (1988). Multiaxial empirically based assessment: Parent, teacher, observational, cognitive, and personality correlates of child behavior profile types for 6- to 11-year-old boys. Journal of Abnormal Child Psychology, 16, 485-509.

McConaughy, S. H., Kay, P. J., & Fitzgerald, M. (1998). Preventing SED through parent-teacher action research and social skills instruction: First-year outcomes. Journal of Emotional and Behavioral Disorders, 6, 81-93.

McConaughy, S. H., Kay, P. J., & Fitzgerald, M. (1999). The Achieving, Behaving, Caring Project for preventing ED: Two-year outcomes. Journal of Emotional and Behavioral Disorders, 7, 224-239.

Merrell, K. W. (1999). Behavioral, social, and emotional assessment of children & adolescents. Mahwah, NJ: Lawrence Erlbaum Associates.

Nolan, E. E., & Gadow, K. D. (1994). Relation between ratings and observations of stimulant drug response in hyperactive children. Journal of Clinical Child Psychology, 23, 78-90.

Ota, K. R., & DuPaul, G. J. (2002). Task engagement and mathematics performance in children with attention-deficit hyperactivity disorder: Effects of supplemental computer instruction. School Psychology Quarterly, 17, 242-257.

Quinn, M. M., Mathur, S. R., & Rutherford, R. B. (1995). Early identification of antisocial boys: A multi-method approach. Education and Treatment of Children, 18, 272-281.

Reed, M. L., & Edelbrock, C. (1983). Reliability and validity of the Direct Observation Form of the Child Behavior Checklist. Journal of Abnormal Child Psychology, 11, 521-530.

Reynolds, C. R., & Kamphaus, R. W. (2004). Behavior Assessment System for Children (2nd ed.). Circle Pines, MN: American Guidance System Publishing.

Saudargas, R. A. (1997). State-Event Classroom Observation System (SECOS). Observation manual. University of Tennessee, Knoxville.

Saudargas, R. A., & Lentz, F. E. Jr. (1986). Estimating percent of time and rate via direct observation: A suggested observational procedure and format. School Psychology Review, 15, 36-48.

Saudargas, R. A., & Zanolli, K. (1990). Momentary time sampling as an estimate of percentage time: A field validation. Journal of Applied Behavior Analysis, 23, 533-537.

Shapiro, E. S. (2004). Academic skills problems workbook(rev.). New York: The Guilford Press.

Shapiro, E. S., & Heick, P. (2004). School psychologist assessment practices in the evaluation of students referred for social/behavioral/emotional problems. Psychology in the Schools, 41, 551-561.

Slate, J. R., & Saudargas, R. A. (1986a). Differences in the classroom behaviors of behaviorally disordered and regular class children. Behavioral Disorders, 11, 45-55.

Slate, J. R., & Saudargas, R. A. (1986b). Differences in learning disabled and average students’ classroom behaviors. Learning Disability Quarterly, 9, 61-67.

Walker, H. M., & Severson, H. H. (1990). Systematic Screening for Behavior Disorders: Users guide and administration manual. Longmont, CO: Sopris West.

Walker, H. M., Severson, H. H., Nicholson, F., Kehle, T., Jenson, W. R., & Clark, E. (1994). Replication of the Systematic Screening for Behavior Disorders (SSBD) procedure for the identification of at-risk children. Journal of Emotional and Behavioral Disorders, 2, 66-77.

Walker, H. M., Severson, H., Stiller, B., Williams, G., Haring, N., Shinn, M., & Todis, B. (1988). Systematic screening of pupils in the elementary age range at risk for behavior disorders: Development and trial testing of a multiple gating model. Remedial and Special Education, 9(3), 8-14.

Walker, H. M., Severson, H. H., Todis, B. J., Block-Pedego, A. E., Williams, G. J., Haring, N. G., & Barckley, M. (1990). Systematic Screening for Behavior Disorders (SSBD): Further validation, replication, and normative data. Remedial and Special Education, 11(2), 32-46.

Wilson, M. S., & Reschly, D. J. (1996). Assessment in school psychology training and practice. School Psychology Review, 25, 9-23.

Winsor, A. P. (2003). Direct observation for classrooms. In C. R. Reynolds & R. W. Kamphaus (Eds.), Handbook of psychological & educational assessment of children: Personality, behavior, and context (2nd ed., pp. 248-255). New York: Guilford Press.

Robert J. Volpe, PhD, is Assistant Professor in the Department of Counseling and Applied Educational Psychology at Northeastern University. His primary research interests concern academic problems experienced by children with attention-deficit/hyperactivity disorder, academic and behavioral assessment, and academic interventions.

James Clyde DiPerna, PhD, is Assistant Professor in the School Psychology Program at the Pennsylvania State University. His research focuses on assessment and intervention strategies to promote students’ academic, social, and emotional competence.

John M. Hintze, PhD, is an Associate Professor and Director of the School Psychology Program at the University of Massachusetts at Amherst. He received his doctorate from Lehigh University in 1994 and prior to that was a practitioner in the public schools for 10 years. His research interests are in CBM and various forms of progress monitoring, research design, and data analysis that informs practice.

Edward S. Shapiro, PhD, currently is Iacocca Professor of Education, Professor of School Psychology and Director, Center for Promoting Research to Practice in the College of Education at Lehigh University, Bethlehem, Pennsylvania. He is the author or co-author of 10 books including his most recently published third edition of Academic Skills Problems: Direct Assessment and Intervention and the Academic Skills Problems Workbook (revised edition), both by Guilford Press. His primary research interests are assessment and intervention for academic skills problems, issues in scaling up of research to practice, and Pediatric School Psychology.
