School Psychology Review, 2005, Volume 34, No. 4, pp. 454–474
Observing Students in Classroom Settings: A Review of Seven Coding Schemes
Robert J. Volpe, Northeastern University
James C. DiPerna, The Pennsylvania State University
John M. Hintze, University of Massachusetts-Amherst
Edward S. Shapiro, Lehigh University
Abstract. A variety of coding schemes are available for direct observational assessment of student classroom behavior. These instruments have been used for a number of assessment tasks including screening children in need of further evaluation for emotional and behavior problems, diagnostic assessment of emotional and behavior problems, assessment of classroom ecology in the formulation of academic interventions, and monitoring the progress of medical, psychosocial, and academic interventions. Although this method of behavioral assessment has a high degree of face validity, it is essential to consider the psychometric properties of available coding schemes to select the appropriate instrument for a given assessment. This article reviews the structure, content, training requirements, and available psychometric properties of seven available direct observation codes. Recommendations for the use of each code and future directions for research in observational assessment are provided.
“You can observe a lot just by watchin’.” Yogi Berra

Systematic observation of student behavior is among the most common assessment methodologies utilized by school psychologists (Shapiro & Heick, 2004; Wilson & Reschly, 1996) and is viewed as one of the most objective and direct measurement tools available for the assessment of child behavior. This method traditionally has been used for a number of assessment tasks, including (a) screening children in need of further evaluation for emotional and behavior problems, (b) diagnostic assessment of emotional and behavior problems, (c) assessing classroom ecology in the formulation of academic interventions, and (d) monitoring the progress of medical, psychosocial, and academic interventions.

Correspondence regarding this article should be addressed to Robert J. Volpe, PhD, Department of Counseling and Applied Educational Psychology, 202 Lake Hall, Northeastern University, Boston, MA 02115-5000.

Copyright 2005 by the National Association of School Psychologists, ISSN 0279-6015
Classroom Observation Codes
Information gathered via direct observation has a high degree of face validity; however, several factors may have a negative effect on the quality of these data. Merrell (1999) lists the following threats to the validity of behavioral observation: (a) poorly defined behavior categories, (b) low interobserver reliability, (c) observee reactivity, (d) situational specificity of target behaviors, (e) inappropriate code selection, and (f) observer bias. These threats may be minimized through the selection and use of well-validated instruments and adequate training in their use. Hintze (2005) provides specific guidelines for the selection of appropriate coding schemes. Because no observation code is appropriate for all situations, the selection of an appropriate observation coding scheme is an essential step in maximizing the validity of an observation-based assessment.
To select the observation code that is most appropriate for a given purpose, users should be aware of the available codes, their composition, their psychometric properties (see Hintze, 2005), and the amount of time required to learn each code. Although several resources exist that describe observation coding schemes (e.g., Hintze, Volpe, & Shapiro, 2002; Winsor, 2003), they include only a few measures and provide limited information regarding the psychometric properties of each measure. The purpose of this article is to provide a comprehensive review of observation coding schemes available to assess the academic behaviors of elementary school children. Although coding schemes exist to measure a wide variety of student behaviors in classrooms (e.g., academic behaviors, social behaviors, student–teacher interactions), this review focuses on codes measuring academic engagement, given the central role this observable classroom behavior plays in the learning process (e.g., Greenwood, 1996). The ultimate goal is to assist school psychologists with the selection of appropriate observation codes for use in their professional practice.
Selection of Observation Codes for Review
In keeping with our aforementioned purpose, we limited our review to coding schemes easily obtained by school practitioners that include a measure of academic engagement and can be used without a computer. Although excellent computer-based coding systems exist for assessing child academic behaviors, such as the Eco-Behavioral Assessment Software System (Greenwood, Carta, Kamps, & Delquadri, 1993), the need for a portable computer and extensive training limits their widespread use by school practitioners. To be included in this review, a coding system must (a) have been designed for use in elementary classrooms; (b) have a manual available from the author, or otherwise published, in a form that facilitates standardized use of the code; (c) assess academically related behaviors (e.g., academic engagement); and (d) allow for recording via a paper-and-pencil format (some available codes can be used only with portable technology, such as a laptop computer or PDA). A search of electronic databases (e.g., ERIC Assessment Clearinghouse, PsycINFO), the Buros Mental Measurements Yearbooks, and recent test catalogs, together with attempts to contact developers, yielded the following seven codes meeting our inclusion criteria: Academic Engaged Time Code of the Systematic Screening for Behavior Disorders (AET-SSBD; Walker & Severson, 1990), ADHD School Observation Code (ADHD-SOC; Gadow, Sprafkin, & Nolan, 1996), Behavioral Observation of Students in Schools (BOSS; Shapiro, 2004), Classroom Observation Code (COC; Abikoff & Gittelman, 1985), Direct Observation Form (DOF; Achenbach, 1986), State-Event Classroom Observation System (SECOS; Saudargas, 1997), and Student Observation System of the Behavior Assessment System for Children-2 (SOS; Reynolds & Kamphaus, 2004).
For each systematic observation code included in this review, we report the following information: (a) the general purpose of the code, (b) the length of time required for training (as reported in research studies; these are likely overestimates for experienced school psychologists), (c) a list of behavior categories (with examples of target behaviors), and (d) a summary of published psychometric data. In addition, we address the strengths and limitations of each code and offer recommendations for its appropriate use. Table 1 presents characteristics of each code, including availability, availability on hand-held computers, recording method (e.g., partial interval, momentary time sample), behavior categories, training requirements, and the typical length of single observations. Table 2 summarizes the psychometric properties of each code. Table 3 summarizes the strengths and limitations of each code and offers recommendations for use based on available data. In the following sections we review each of the seven systematic direct observation codes.
Academic Engaged Time Code (AET-SSBD)
The AET-SSBD is a component of the Systematic Screening for Behavior Disorders (SSBD; Walker & Severson, 1990), a screening procedure for emotional and behavior disorders in elementary school children (Grades 1–6). The full SSBD system involves three gates: (a) teacher rank ordering of students in terms of externalizing and internalizing problems, (b) teacher ratings of adaptive behavior and critical events, and (c) systematic direct observations in multiple school settings (the AET-SSBD in classroom settings, and the Peer Social Behavior Code, or PSB-SSBD, in free-play settings). Leff and Lakin (2005) provide a review of the PSB in this issue. The AET-SSBD and PSB-SSBD are used in the third gate to verify teacher ratings. The AET-SSBD measures the amount of time a student spends engaged in academic material during independent seatwork (e.g., listening to the teacher, writing in a workbook). The total amount of time a student exhibits behavior consistent with the definition of academic engagement is recorded with a stopwatch. This value is divided by the overall time of observation (usually 15 minutes) and multiplied by 100 to compute academic engaged time. Typically, data from two observations are averaged to obtain a stable academic engaged time score. Rather than comparing this score with that of one or more randomly selected peers, the AET-SSBD converts raw scores to T-scores using normative data tables. Normative data for the AET-SSBD consist of a sample of 1,300 first- through sixth-grade children from 16 school districts across six states. Normative tables in the SSBD manual are arranged according to gender and whether or not the child of interest has met the criteria for either internalizing or externalizing problems in the second stage of the SSBD. Training requirements have been reported to consist of as little as 4 to 6 hours for both the AET-SSBD and PSB-SSBD codes (Walker et al., 1990).
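The AET-SSBD score described above is simple arithmetic. The Python sketch below illustrates it under stated assumptions: stopwatch totals are entered in seconds, and the function names are hypothetical helpers rather than part of the SSBD materials. Conversion of the averaged score to a T-score requires the SSBD normative tables and is not shown.

```python
from typing import Iterable, Tuple


def academic_engaged_time(engaged_seconds: float, observation_seconds: float) -> float:
    """Percent academic engaged time: stopwatch total / observation length x 100."""
    if observation_seconds <= 0:
        raise ValueError("observation length must be positive")
    if not 0 <= engaged_seconds <= observation_seconds:
        raise ValueError("engaged time must lie between 0 and the observation length")
    return engaged_seconds * 100 / observation_seconds


def mean_academic_engaged_time(sessions: Iterable[Tuple[float, float]]) -> float:
    """Average the percentages from several (engaged, total) sessions; typically two."""
    scores = [academic_engaged_time(engaged, total) for engaged, total in sessions]
    if not scores:
        raise ValueError("at least one session is required")
    return sum(scores) / len(scores)


# Two hypothetical 15-minute (900-second) observations: 70% and 80% engaged.
print(mean_academic_engaged_time([(630, 900), (720, 900)]))  # 75.0
```

Averaging percentages rather than pooling seconds gives the same result here because both hypothetical sessions have the same length.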
Psychometric properties. For the AET-SSBD, interobserver agreement is calculated by dividing the smaller score of two observers by the larger score. Mean interobserver agreement coefficients across five published studies have ranged from .95 (Walker et al., 1994) to .98 (Quinn, Mathur, & Rutherford, 1995). Coefficients for individual cases have typically ranged from the .80s to 1.00. Scores on the AET-SSBD have been shown to correlate significantly (r = -.42) with teacher ratings of externalizing behavior problems (Walker et al., 1988). Several studies have found scores on the AET-SSBD to accurately discriminate control children from those nominated as at-risk by teachers for externalizing problems (e.g., Quinn et al., 1995; Walker et al., 1990). Less consistent has been the ability of scores from the AET-SSBD to discriminate children nominated as at-risk for internalizing problems from controls (Walker et al., 1988; Walker et al., 1990).
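The ratio method of interobserver agreement described above can be sketched in a few lines, assuming each observer's stopwatch total is recorded in seconds; the function name is a hypothetical helper. Treating two zero totals as perfect agreement is an assumption of this sketch, not a rule taken from the SSBD manual.

```python
def ratio_agreement(total_a: float, total_b: float) -> float:
    """Smaller of two observers' totals divided by the larger."""
    if total_a < 0 or total_b < 0:
        raise ValueError("totals cannot be negative")
    larger = max(total_a, total_b)
    if larger == 0:
        return 1.0  # assumption: both observers recorded no engaged time
    return min(total_a, total_b) / larger


# Hypothetical totals of 612 and 648 seconds from two observers.
print(round(ratio_agreement(612, 648), 3))  # 0.944
```

Because only the two totals are compared, this index can be high even when the observers disagreed about which moments the child was engaged; interval-by-interval indices such as kappa do not share that property.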
Attention-Deficit Hyperactivity Disorder School Observation Code (ADHD-SOC)
The ADHD-SOC (Gadow et al., 1996) was developed both as a screening measure and as a tool for evaluating the effects of interventions for children with attention-deficit/hyperactivity disorder (ADHD) and related disorders across school settings (classroom, lunchroom, playground). According to its developers, training for the ADHD-SOC should require approximately 20 to 25 hours. Categories are coded in 15-second intervals for 15 minutes. The following behaviors are coded using the partial interval method: (a) interference (e.g., calling out when it is not appropriate to do so), (b) motor movement (e.g., getting out of seat without permission), (c) verbal aggression (e.g., cursing at another student), (d) symbolic aggression (e.g., taking another student’s pencil), (e) object aggression (e.g., kicking a chair), and (f) off-task (e.g., looking out a window instead of completing an assignment). Noncompliance (e.g., ignoring verbal direction from teacher) is scored using the whole interval method. Other categories are coded in lunchroom and playground settings to assess appropriate and inappropriate social behaviors (see Leff & Lakin, 2005, for use of the ADHD-SOC in playground settings). When the ADHD-SOC is used as part of a comprehensive diagnostic assessment, its authors recommend selecting three or four peers to observe for comparison. Selected peers are observed with the target student in alternating 1-minute segments.

Table 1
Characteristics of Reviewed Systematic Observation Codes

Academic Engaged Time Code of the SSBD (AET-SSBD; Walker & Severson, 1990)
Availability: www.sopriswest.com
Computerized? No
Recording method(s): Duration
Behavior categories: 1. Academic engaged time
Training requirements: Low
Typical length of observation: 15 minutes

ADHD School Observation Code (ADHD-SOC; Gadow, Sprafkin, & Nolan, 1996)
Availability: www.checkmateplus.com
Computerized? No
Recording method(s): Partial interval; 15-second intervals
Behavior categories: 1. Interference; 2. Motor movement; 3. Noncompliance; 4. Nonphysical aggression; 5. Off-task
Training requirements: Moderate
Typical length of observation: 15 minutes

Behavioral Observation of Students in Schools (BOSS; Shapiro, 2004)
Availability: Computer version available from www.sopriswest.com
Computerized? Palm
Recording method(s): Momentary time sample; partial interval; 15-second intervals
Behavior categories: 1. Active engaged time; 2. Passive engaged time; 3. Off-task motor; 4. Off-task verbal; 5. Off-task passive; 6. Teacher directed instruction
Training requirements: Moderate
Typical length of observation: 15 minutes

Classroom Observation Code (COC; Abikoff & Gittelman, 1985)
Availability: This reference provides a detailed description of the code and the observation protocol.
Computerized? No
Recording method(s): Partial interval; whole interval
Behavior categories: 1. Interference; 2. Minor motor movement; 3. Gross motor standing; 4. Gross motor vigorous; 5. Physical aggression; 6. Verbal aggression; 7. Solicitation of teacher; 8. Off-task; 9. Noncompliance; 10. Out of chair behavior; 11. Absence of behavior
Training requirements: High
Typical length of observation: 32 minutes

Direct Observation Form (DOF; Achenbach, 1986)
Availability: www.aseba.org
Computerized? No
Recording method(s): Predominant activity sampling; 4-point Likert scale for 97 problem items
Behavior categories: 1. On-task; 2. Withdrawn-Inattentive; 3. Nervous-Obsessive; 4. Depressed; 5. Hyperactive; 6. Attention Demanding; 7. Aggressive
Training requirements: Low
Typical length of observation: 10 minutes

State-Event Classroom Observation System (SECOS; Saudargas, 1997)
Availability: rsaudarg@utk.edu
Computerized? No
Recording method(s): Momentary time sample (states); frequency recording (events); 15-second intervals
Behavior categories: States: 1. School work; 2. Looking around; 3. Other activity; 4. Social interaction with child*; 5. Social interaction with teacher; 6. Out of seat. Events: 1. Out of seat; 2. Approach child; 3. Other child approach; 4. Raise hand; 5. Calling out to teacher; 6. Teacher approach
Training requirements: Moderate
Typical length of observation: 20 minutes

Student Observation System (SOS; Reynolds & Kamphaus, 2004)
Availability: www.agsnet.com
Computerized? Windows, Palm
Recording method(s): Momentary time sample; 30-second intervals; 3-point Likert scale for 65 behavior items (some items permit scoring of whether the behavior was disruptive to the class)
Behavior categories: Adaptive Behaviors: 1. Response to teacher/lesson; 2. Peer interaction; 3. Works on school subjects; 4. Transition movement. Problem Behaviors: 1. Inappropriate movement; 2. Inattention; 3. Inappropriate vocalization; 4. Somatization; 5. Repetitive motor movements; 6. Aggression; 7. Self-injurious behavior; 8. Inappropriate sexual behavior; 9. Bowel/bladder problems
Training requirements: Unclear
Typical length of observation: 15 minutes

Note. Training requirements: Low (up to 10 hours), Moderate (between 11 and 25 hours), High (more than 25 hours).

Table 2
Psychometric Properties of Reviewed Systematic Observation Codes

AET-SSBD
Interobserver agreement: Percent agreement averaged across 5 studies was .96
Convergent validity: TRF Externalizing (r = -.42)
Discriminant validity: Discriminated children at-risk for externalizing problems from controls; discriminated children at-risk for internalizing problems from controls
Treatment sensitivity: No data available
Normative data: 1,300 children from 16 school districts in 6 states; data available by gender, and by grouping of the SSBD (externalizing, internalizing, nonranked)

ADHD-SOC
Interobserver agreement: Kappas = .77–.86; kappas = .67–.80
Convergent validity: IOWA (rs = .06–.71); ATRS (rs = .11–.62); Peer Conflict Scale (rs = .38–.74)
Discriminant validity: Discriminated children with emotional and behavioral disorders from children with learning disabilities; discriminated children with ADD from their nondisabled peers; discriminated children with ADHD from their nondisabled peers
Treatment sensitivity: Sensitive to stimulant drug effects
Normative data: Authors recommend use of classroom comparison children (observed in alternating 1-minute segments)

BOSS
Interobserver agreement: Kappas = .93–.98
Convergent validity: No published data available
Discriminant validity: Discriminated children with ADHD from their nondisabled peers
Treatment sensitivity: Sensitive to instructional manipulations
Normative data: Authors recommend use of classroom comparison children (observed every fifth interval)

COC
Interobserver agreement: Phi coefficients = .80–1.0; phi coefficients = .55–.95; phi coefficients = .40–.97 (poor reliability for verbal aggression in some studies)
Convergent validity: No published data available
Discriminant validity: Discriminated children with ADHD from their nondisabled peers; using multiple categories, 80% of ADHD and typically developing children were correctly classified
Treatment sensitivity: Sensitive to psychosocial interventions and stimulant drug effects
Normative data: Authors recommend use of classroom comparison children (observed in alternating 4-minute segments)

DOF
Interobserver agreement: Average Pearson correlations across 4 studies were .90 for total behavior problems and .84 for on-task; one-way ICCs for 10 minutes of observation were .85 for total behavior problems and .58 for on-task; one-way ICCs for 60 minutes of observation were .86 for total behavior problems and .71 for on-task
Convergent validity: TRF total behavior problems (rs = -.26 to -.53); TRF school performance (rs = -.14 to .66); TRF adaptive functioning composite (rs = .48 to .72)
Discriminant validity: Discriminated boys referred for problem behaviors from typically developing boys matched for age, grade, and race
Treatment sensitivity: On-task and the Nervous/Obsessive and Depression scales have demonstrated sensitivity to a targeted prevention program
Normative data: Authors recommend use of classroom comparisons observed in 10-minute blocks before and after the 10-minute observation of the target child; normative data on 287 children from Nebraska, Oregon, and Vermont also are available for comparison purposes

SECOS
Interobserver agreement: Average total agreement = .81; total agreement = .75–1.0
Convergent validity: No published data available
Discriminant validity: Discriminated between children with behavior disorders and their nondisabled peers; discriminated between children with learning disabilities and their nondisabled peers
Treatment sensitivity: No published data available
Normative data: 500 children in Grades 1 through 5; children attended 1 of 10 elementary schools in East Tennessee

SOS
Interobserver agreement: Unclear from available data
Convergent validity: No published data available
Discriminant validity: Discriminated between children with ADHD and nondisabled children
Treatment sensitivity: No published data available
Normative data: Authors recommend use of two or three randomly selected peer comparisons

Note. IOWA = IOWA Conners Teacher’s Rating Scale (Loney & Milich, 1982); ATRS = Abbreviated Teachers Rating Scale (Conners, 1973); TRF = Teacher Report Form (cf. Achenbach & Rescorla, 2001).

Table 3
Strengths, Limitations, and Recommendations for Reviewed Systematic Observation Codes

Academic Engaged Time-SSBD
Strengths: Simple code to learn and use; low training requirements; strong reliability; strong support for discriminant validity
Limitations: Narrow assessment of student behavior; need for updated normative data
Recommended use: As part of the SSBD for screening purposes; useful for measuring student engagement as part of a diagnostic assessment

ADHD-School Observation Code
Strengths: Broad measurement of externalizing behaviors; known psychometric properties; sensitive to effects of treatment
Limitations: Variable interobserver agreement; relatively low association with teacher ratings of hyperactivity
Recommended use: Assessment of externalizing problems; monitoring effects of interventions

BOSS
Strengths: Specific assessment of active student engagement; some evidence for treatment sensitivity
Limitations: More information needed regarding treatment sensitivity; in general, limited information concerning psychometric properties
Recommended use: Describing the classroom behavior of children; may be useful in assessment of externalizing behavior

COC
Strengths: Strong support for discriminant validity; strong support for treatment sensitivity
Limitations: Complex code that is a challenge to learn; data needed examining discrimination of children with ADHD from other affected populations
Recommended use: Screening and diagnosis of ADHD; monitoring the effects of interventions for ADHD

DOF
Strengths: Easy to learn and use; broad assessment of externalizing and internalizing behaviors; integrated into a broad assessment system (ASEBA); some evidence of treatment sensitivity; some evidence for utility in diagnostic assessments of behavior problems
Limitations: More information needed with regard to the psychometric properties of scales other than total behavior problems; more information needed regarding treatment sensitivity and discriminant validity; small normative sample
Recommended use: As part of the ASEBA for assessment of emotional and behavior problems

SECOS
Strengths: Known accuracy; evidence for discriminant validity if peer comparisons are utilized; low inference/descriptive categories; has been used on a wide age range (first grade–high school)
Limitations: Normative group consists only of 500 students from East Tennessee; no data available regarding treatment sensitivity
Recommended use: Assessment of externalizing problems

SOS
Strengths: Relatively broad assessment of both positive and negative student behaviors
Limitations: Limited evidence of psychometric properties
Recommended use: None at this time
Psychometric properties. Interobserver agreement using the ADHD-SOC has been somewhat variable. For example, Nolan and Gadow (1994) reported kappa coefficients between .77 and .86 for the five classroom categories, with only the category of nonphysical aggression falling below .80. However, Gadow, Nolan, Sprafkin, and Sverd (1995) reported kappas at or below .80 for all five categories (κ = .67–.80). Test-retest coefficients based on observations within a 2-week period were low to moderate (range = .27–.72; Gadow et al., 1996). The association between teacher ratings of hyperactivity and the relevant ADHD-SOC categories (motor movement and off-task) has been low and not statistically significant. However, significant associations between teacher ratings of hyperactivity and observed off-task behavior (though not motor movement) emerged when teacher ratings of negative behavior were controlled for statistically (rs between .46 and .48; Gadow et al., 1996). Evidence for the convergent validity of the remaining categories of the ADHD-SOC (interference, noncompliance, nonphysical aggression) is more robust. Nolan and Gadow (1994) found moderate correlations between these categories and teacher ratings of aggression and emotional lability (range = .38–.74). In addition, the ADHD-SOC has been found to discriminate between children identified as having ADHD and their nonlabeled peers (Gadow et al., 1992). Finally, the treatment sensitivity of all but one ADHD-SOC category (nonphysical aggression) has been demonstrated in school-based studies of stimulant drug effects (Gadow, Nolan, & Sverd, 1992; Gadow, Nolan, Sverd, Sprafkin, & Paolicelli, 1990).
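Kappa, the agreement index reported above for the ADHD-SOC, corrects the proportion of intervals on which two observers agree for the agreement expected by chance. The sketch below computes Cohen's kappa for two observers' interval-by-interval records of a single category, coded 1 when the behavior was scored and 0 otherwise. The coding of the records and the example data are hypothetical, and the cited studies may have computed kappa with somewhat different procedures.

```python
from typing import Sequence


def cohens_kappa(obs_a: Sequence[int], obs_b: Sequence[int]) -> float:
    """Cohen's kappa for two observers' interval-by-interval codes."""
    if len(obs_a) != len(obs_b) or len(obs_a) == 0:
        raise ValueError("need two non-empty records of equal length")
    n = len(obs_a)
    # Observed agreement: proportion of intervals coded identically.
    p_observed = sum(a == b for a, b in zip(obs_a, obs_b)) / n
    # Chance agreement: product of the observers' marginal proportions, summed over codes.
    codes = set(obs_a) | set(obs_b)
    p_chance = sum((obs_a.count(c) / n) * (obs_b.count(c) / n) for c in codes)
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Ten hypothetical 15-second intervals for one category (1 = scored).
observer_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
observer_2 = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
print(round(cohens_kappa(observer_1, observer_2), 2))  # 0.6
```

In this example the observers agree on 8 of 10 intervals (80%), but because each scored the behavior in half of the intervals, an agreement rate of 50% is expected by chance, so kappa is .60.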
Behavioral Observation of Students in Schools (BOSS; Shapiro, 2004)
The BOSS was designed to assess student academic behavior in the classroom environment. According to its developer (the fourth author of this article), it should take between 10 and 15 hours of training to become proficient using the BOSS. The BOSS essentially measures levels of on- and off-task behavior. However, the BOSS divides on-task behavior into active engaged time (AET; coded when a student is actively engaged in academic responding; e.g., reading aloud, writing in a journal) and passive engaged time (PET; coded when a student is passively attending; e.g., listening to a teacher, looking at the blackboard while a teacher writes). Both AET and PET are scored using momentary time sampling at the beginning of each 15-second interval. During the remainder of each interval, the partial interval method is used to record the following off-task behavior categories: (a) off-task motor (motor activity not associated with the assigned academic task; e.g., leaving seat to throw a piece of paper in the trash can), (b) off-task verbal (utterances not associated with the academic task; e.g., talking to a peer about something other than the current assignment, humming), and (c) off-task passive (passive nonengagement; e.g., looking out the window). Using the BOSS, the observer codes the behavior of the target child for four out of every five intervals. On every fifth interval, the behavior of one of several preselected peers is coded on the same behaviors as the target child for comparison purposes. Finally, teacher-directed instruction (TDI) is coded using the partial interval method. Scores on TDI estimate the amount of time a teacher is engaged in instruction. For example, TDI would be coded if the teacher was lecturing to the class, but TDI would not be coded if he was grading papers at his desk. TDI, like peer comparison data, is scored on every fifth interval.
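The BOSS interval schedule described above can be illustrated with a short sketch. It assumes a 15-minute observation divided into 15-second intervals, marks every fifth interval for peer comparison and TDI coding, and, as an assumption about reporting, expresses each category as the percentage of the relevant intervals in which it was coded. The function names, category labels, and data layout are hypothetical and are not taken from the BOSS manual.

```python
from typing import List, Set, Tuple


def boss_schedule(minutes: int = 15, interval_seconds: int = 15) -> List[str]:
    """Label each interval 'target' or 'peer'; every fifth interval is a peer/TDI interval."""
    n_intervals = minutes * 60 // interval_seconds
    return ["peer" if i % 5 == 0 else "target" for i in range(1, n_intervals + 1)]


def percent_of_intervals(records: List[Tuple[str, Set[str]]], who: str, category: str) -> float:
    """Percentage of the intervals observed for `who` in which `category` was coded."""
    relevant = [codes for observed, codes in records if observed == who]
    if not relevant:
        raise ValueError(f"no intervals recorded for {who!r}")
    return 100 * sum(category in codes for codes in relevant) / len(relevant)


schedule = boss_schedule()
print(schedule.count("target"), schedule.count("peer"))  # 48 12

# Five hypothetical coded intervals: four for the target child, one for a peer.
records = [
    ("target", {"AET"}),
    ("target", {"PET", "off-task motor"}),
    ("target", set()),
    ("target", {"AET"}),
    ("peer", {"PET", "TDI"}),
]
print(percent_of_intervals(records, "target", "AET"))  # 50.0
```

Keeping target and peer intervals in one list mirrors the single data sheet an observer fills in; the percentages are then computed separately for each.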
Psychometric properties. Reports of interobserver agreement for the BOSS have been consistently high. For example, in a study involving repeated measurement of three participants, Ota and DuPaul (2002) reported total agreement ranging between 90 and 100%. More recently, DuPaul et al. (2004) reported kappas ranging from .93 to .98 for observations in a large sample of children with ADHD and normal comparison children (N = 136). Although there are no data available supporting the convergent validity of the BOSS, there are some data supporting the ability of the BOSS to discriminate between children with ADHD and typically developing children. Specifically, DuPaul et al. (2004) found that PET and a composite of the three off-task scores of the BOSS significantly discriminated between children with ADHD who had academic problems and typically developing peers, whether the observations were conducted during instruction in mathematics or reading. Effect sizes for these variables ranged between -.53 and 1.25.
Treatment sensitivity of the BOSS has been documented in a study investigating the efficacy of computer-aided instruction for three children with ADHD (Ota & DuPaul, 2002). In a multiple-baseline design across three participants, the BOSS categories of AET (ES between -2.91 and -13.01) and a composite of the three off-task scores (ES between 1.8 and 3.06) were found to be sensitive to manipulations in instructional modality (regular math instruction vs. working on a computer).
Classroom Observation Code (COC)
The COC (Abikoff & Gittelman, 1985) was designed to quantify the classroom behavior of children for diagnostic assessment of ADHD and for monitoring the effects of interventions designed to ameliorate the symptoms of ADHD. The COC is a relatively complex code consisting of 12 behavior categories. Abikoff, Gittelman-Klein, and Klein (1977) reported that training for the code averaged 50 hours, and that only 5 of 8 advanced undergraduate and graduate student research assistants met the training criterion of 70% agreement at the end of training. Like the ADHD-SOC, the COC focuses exclusively on child behaviors. Categories of the COC are recorded in 15-second intervals using one of two sampling methods. The following discrete behaviors are scored using the partial interval method: (a) interference (e.g., calling out during a teacher lecture), (b) minor motor movement (e.g., twisting and turning while seated), (c) gross motor standing (e.g., out of seat and standing), (d) gross motor vigorous (e.g., running or crawling across the classroom), (e) physical aggression (e.g., kicks or hits another child), (f) threat or verbal aggression to children or to teacher (e.g., curses at another child or teacher), and (g) solicitation of teacher (e.g., raises hand). The following behaviors are coded using the whole interval method: (a) off-task (e.g., plays with toy while the teacher is talking), (b) noncompliance (e.g., ignores verbal direction from teacher), and (c) out of chair behavior (e.g., out of seat when not appropriate to do so). Finally, if none of the aforementioned behaviors are noted in an interval, “absence of behavior” is coded. Observation sessions using the COC typically are 32 minutes in duration. A target child and a same-gender, teacher-nominated “normal” peer are observed for 16 minutes each, in alternating 4-minute blocks.
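The difference between the partial interval and whole interval methods, and the momentary time sampling used by some other codes in this review, can be made concrete by scoring the same stream of behavior three ways. This sketch assumes behavior episodes are available as merged, non-overlapping (start, end) times in seconds; that continuous record is a hypothetical input used only to compare the rules, since an observer applying the COC records interval codes directly.

```python
from typing import List, Tuple

Episode = Tuple[float, float]  # (start, end) in seconds; assumed merged and non-overlapping


def score_intervals(episodes: List[Episode], total_seconds: float,
                    interval_seconds: float = 15.0, method: str = "partial") -> List[bool]:
    """Score each interval as occurrence (True) or non-occurrence (False)."""
    n_intervals = int(total_seconds // interval_seconds)
    scores = []
    for i in range(n_intervals):
        lo, hi = i * interval_seconds, (i + 1) * interval_seconds
        if method == "partial":      # behavior occurs at any point in the interval
            hit = any(start < hi and end > lo for start, end in episodes)
        elif method == "whole":      # behavior lasts for the entire interval
            hit = any(start <= lo and end >= hi for start, end in episodes)
        elif method == "momentary":  # behavior is occurring at the start of the interval
            hit = any(start <= lo < end for start, end in episodes)
        else:
            raise ValueError(f"unknown method: {method}")
        scores.append(hit)
    return scores


# A hypothetical off-task episode from 10 s to 40 s within one minute (four 15-second intervals).
episode = [(10.0, 40.0)]
for method in ("partial", "whole", "momentary"):
    print(method, score_intervals(episode, 60.0, method=method))
# partial [True, True, True, False]
# whole [False, True, False, False]
# momentary [False, True, True, False]
```

The behavior filled 30 of 60 seconds (50%), yet in this example it is scored in 75% of intervals under the partial interval rule and 25% under the whole interval rule, showing how the choice of method can shift estimates in opposite directions.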
Psychometric properties. Reported interobserver agreement for the COC has been high. For example, Abikoff et al. (2002) collected interobserver agreement data for 10% of 1,893 observations, which yielded mean phi coefficients ranging from .80 to 1.00. The discriminant validity of the COC is well documented (Abikoff et al., 1977; Abikoff, Gittelman, & Klein, 1980; Abikoff et al., 2002). For example, in a study of 502 pairs of children with ADHD and their classmates, Abikoff et al. (2002) found that all of the COC categories significantly discriminated between children with ADHD and their typically developing peers. The categories of off-task and interference have been found to be the most discriminating, correctly classifying 77% and 76.2% of cases (i.e., ADHD vs. “normal”), respectively. By combining the categories of interference, off-task, minor motor movement, gross motor movement, and solicitation, almost 80% of cases were correctly classified (Abikoff et al., 1980). The treatment sensitivity of the COC is well documented; the code has been used as a dependent measure in numerous studies of medical and psychosocial interventions for children with ADHD (e.g., Abikoff et al., 2004; Klein & Abikoff, 1997). Based on our review of the extant literature, no studies have investigated the convergent validity of the COC.
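The phi coefficients reported for the COC are correlations between two observers' dichotomous (occurred/did not occur) interval codes. A minimal sketch, assuming each observer's record for one category is a list of 0/1 codes of equal length; the function name and example data are hypothetical.

```python
import math
from typing import Sequence


def phi_coefficient(obs_a: Sequence[int], obs_b: Sequence[int]) -> float:
    """Phi coefficient from the 2 x 2 table of two observers' 0/1 interval codes."""
    if len(obs_a) != len(obs_b) or len(obs_a) == 0:
        raise ValueError("need two non-empty records of equal length")
    both = only_a = only_b = neither = 0
    for a, b in zip(obs_a, obs_b):
        if a and b:
            both += 1
        elif a:
            only_a += 1
        elif b:
            only_b += 1
        else:
            neither += 1
    # Product of the four marginal totals of the 2 x 2 table.
    denominator = math.sqrt((both + only_a) * (only_b + neither)
                            * (both + only_b) * (only_a + neither))
    if denominator == 0:
        raise ValueError("phi is undefined when an observer coded every interval the same way")
    return (both * neither - only_a * only_b) / denominator


observer_1 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
observer_2 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(round(phi_coefficient(observer_1, observer_2), 3))  # 0.583
```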
Direct Observation Form (DOF)
The DOF (Achenbach, 1986) was designed to obtain ratings of problem behaviors and on-task behavior directly observed in group settings, and is part of the Achenbach System of Empirically Based Assessment (ASEBA; Achenbach & Rescorla, 2001). The DOF has been used in research studies across a number of school settings, including the classroom, lunchroom, and playground. Training for the DOF should take about 10 hours. Although each observation period is relatively brief (10 minutes), the developers of the DOF recommend that three to six observations be performed to gain a stable estimate of child behavior. During each observation session, the observer writes a narrative or running log describing the target student’s behavior. In the last 5 seconds of each 1-minute interval, the observer also records whether the target child is on-task or off-task. On-task versus off-task is determined by the predominant activity sampling method, wherein behavior must occur for more than half of the 5-second sampling interval. Hence, the DOF requires that the observer write a narrative and observe on- and off-task behavior simultaneously.
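The predominant activity sampling rule used for the DOF on-task judgment can be expressed as a short function, assuming the observer's judgment of how many of the 5 seconds were spent on-task is available as a number. The tally of on-task samples across a 10-minute session only illustrates the rule; the DOF manual governs how the published on-task score is derived, and the function names are hypothetical.

```python
from typing import Iterable


def predominantly_on_task(on_task_seconds: float, window_seconds: float = 5.0) -> bool:
    """On-task only if on-task behavior fills more than half of the sampling window."""
    if not 0 <= on_task_seconds <= window_seconds:
        raise ValueError("on-task time must lie within the sampling window")
    return on_task_seconds > window_seconds / 2


def count_on_task_samples(samples: Iterable[float]) -> int:
    """Number of 1-minute intervals whose final 5-second sample was judged on-task."""
    return sum(predominantly_on_task(s) for s in samples)


# Hypothetical on-task seconds in the last 5 seconds of each of 10 one-minute intervals.
session = [5, 4, 2.5, 3, 0, 5, 1, 5, 2.6, 5]
print(count_on_task_samples(session))  # 7
```

A sample with exactly half of the window on-task (2.5 seconds) does not qualify, because the rule requires more than half.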
At the end of each 10-minute observation session, the observer uses the DOF form to rate the student’s behavior on 97 problem items. Problem items are scored on a 4-point Likert scale: 0 = no occurrence; 1 = slight or ambiguous occurrence; 2 = definite occurrence with mild to moderate intensity and less than 3 minutes’ duration; and 3 = definite occurrence with severe intensity or greater than 3 minutes’ duration. Problem items are short (e.g., “acts too young for age,” “sulks,” “nervous, high strung, or tense”), with 72 items corresponding to items of the Child Behavior Checklist for Ages 6 to 18 (CBCL; Achenbach & Rescorla, 2001) and 83 items corresponding to the Teacher Report Form (TRF; Achenbach & Rescorla, 2001). Factor analyses of data from 212 clinically referred children between 5 and 14 years of age generated six syndrome scales (Withdrawn-Inattentive, Nervous-Obsessive, Depressed, Hyperactive, Attention Demanding, Aggressive; Achenbach & Rescorla, 2001), plus Internalizing and Externalizing scales. The DOF also provides a Total Problem score, which is the sum of the 0 to 3 ratings on the 97 items, and an on-task score ranging from 1 to 10. In addition, the developers of the DOF recommend observing two comparison children in the same setting (one observed before and one after the target student). The DOF scoring profile provides raw scores for the six syndrome scales, plus T scores for Internalizing, Externalizing, and Total Problems. The DOF profile compares scores for the target child (and the comparison children) to a normative sample of 287 children from Nebraska, Oregon, and Vermont.
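Because the Total Problem score is the sum of the 0 to 3 ratings across the 97 items, it can be computed and set beside the two recommended comparison children as sketched below. The function names and example ratings are hypothetical, and the comparison stays in raw-score units; converting to T scores would require the published DOF norms, which are not reproduced here.

```python
N_ITEMS = 97  # DOF problem items, each rated 0-3


def total_problem_score(ratings):
    """Sum of the 0-3 ratings across all 97 problem items."""
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item ratings, got {len(ratings)}")
    if any(r not in (0, 1, 2, 3) for r in ratings):
        raise ValueError("each item must be rated 0, 1, 2, or 3")
    return sum(ratings)


def compare_to_peers(target_ratings, peer_ratings):
    """Raw Total Problem score for the target, the comparison-child mean, and the gap."""
    target = total_problem_score(target_ratings)
    peer_mean = sum(total_problem_score(p) for p in peer_ratings) / len(peer_ratings)
    return target, peer_mean, target - peer_mean


# Hypothetical ratings: a target child and two comparison children.
target = [0] * 90 + [2] * 5 + [3] * 2
peers = [[0] * 95 + [1] * 2, [0] * 96 + [2]]
print(compare_to_peers(target, peers))  # prints (16, 2.0, 14.0)
```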
Psychometric properties. Pearson correlations reported in several studies indicate good interobserver agreement. Averaged across four studies of children in public school classrooms and a residential treatment center (Achenbach & Edelbrock, 1983; McConaughy, Achenbach, & Gent, 1988; McConaughy, Kay, & Fitzgerald, 1998, 1999), mean interobserver agreement was .90 for DOF Total Behavior Problems and .84 for on-task. In an examination of the generalizability of the DOF, Reed and Edelbrock (1983) found that DOF Total Behavior Problems (mean intraclass correlation = .85), but not on-task (mean intraclass correlation = .58), generalized well from one observer to another for individual 10-minute observation sessions. When data from six sessions were combined, intraclass correlations improved for on-task (mean intraclass correlation = .71), and Total Behavior Problems remained stable (mean intraclass correlation = .86).
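An intraclass correlation treats observers (or sessions) as interchangeable measurements of the same children. The sketch below computes a one-way random-effects ICC(1,1) from a table with one row per child and one column per observer. Reed and Edelbrock's (1983) exact ICC model is not described here, so the choice of ICC(1,1) is an assumption, and the function name and data are hypothetical.

```python
def icc_oneway(scores):
    """One-way random-effects ICC(1,1).

    scores: one row per child, one column per observer (or session).
    """
    n = len(scores)
    k = len(scores[0]) if scores else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with at least 2 children and 2 observers")
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    # Between-children and within-children sums of squares
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(scores, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical Total Problem scores from two observers for three children.
print(round(icc_oneway([[1, 2], [3, 4], [5, 6]]), 3))  # prints 0.882
```

If every score in the table is identical, both mean squares are zero and the function raises ZeroDivisionError, reflecting that the ICC is undefined in that case.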
The convergent validity of the DOF is supported by significant correlations (rs = .37 to .51) between the total behavior problem scales of the DOF and the TRF (Achenbach & Edelbrock, 1986; Reed & Edelbrock, 1983). The Total Behavior Problems score and the on-task score of the DOF have also been shown to discriminate between boys referred for problem behavior and a sample of typically developing boys matched for age, grade, and race (Reed & Edelbrock, 1983). The treatment sensitivity of the on-task, internalizing, nervous/obsessive, and depressed scales (McConaughy et al., 1999) has been demonstrated in evaluations of school-based programs to prevent emotional disturbance (McConaughy et al., 1998, 1999).
State-Event Classroom Observation System (SECOS)
The SECOS (Saudargas, 1997) was designed to quantify student behavior as part of a comprehensive multimethod assessment and to assess the effectiveness of classroom interventions. It has been used in research studies involving students from first grade through high school. Learning the code typically requires 13 to 15 hours of training (Saudargas, 1997). With the SECOS, momentary time sampling is used to estimate the amount of time the student engages in the following six “state” behaviors: (a) school work (e.g., student is solving a math problem in a workbook), (b) out of seat (e.g., student leaves seat without permission), (c) looking around (e.g., student looks out window), (d) social interaction with child (e.g., student talks to neighbor about school work), (e) social interaction with teacher (e.g., teacher is helping student solve a math problem), and (f) other activity (e.g., sharpening pencil). The frequency of five additional “event” behaviors is recorded in 15-second intervals: (a) raise hand (e.g., student raises hand in response to a teacher question), (b) calling out to teacher (e.g., student calls teacher to ask for help), (c) approach child (e.g., student taps neighbor on the shoulder), (d) other child approach (e.g., another child taps the target student on the shoulder), and (e) teacher approach (e.g., teacher asks student a question). Out of seat appears as both a state and an event category, which allows both the frequency and the duration of this behavior to be estimated.
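Momentary time sampling of states and interval recording of events yield two different summaries: the percentage of intervals in which each state was observed (an estimate of duration) and the rate of each event (frequency). A minimal sketch follows, assuming one record per 15-second interval holding the set of states observed at the sampling moment and a list of the events recorded during that interval; the category labels are hypothetical shorthand, not official SECOS codes.

```python
from collections import Counter

INTERVAL_SECONDS = 15


def summarize_secos(intervals):
    """Summarize one session of interval records.

    intervals: one (states, events) pair per 15-second interval, where `states`
    is the set of state categories seen at the sampling moment and `events`
    lists each event occurrence recorded during the interval.
    Returns (% of intervals per state, events per minute).
    """
    n = len(intervals)
    if n == 0:
        raise ValueError("no intervals recorded")
    minutes = n * INTERVAL_SECONDS / 60
    state_counts = Counter(s for states, _ in intervals for s in set(states))
    event_counts = Counter(e for _, events in intervals for e in events)
    state_pct = {s: 100.0 * c / n for s, c in state_counts.items()}
    event_rate = {e: c / minutes for e, c in event_counts.items()}
    return state_pct, event_rate


# Hypothetical 2-minute record (8 intervals) using made-up labels.
example = [
    ({"school_work"}, []),
    ({"school_work"}, ["raise_hand"]),
    ({"looking_around"}, []),
    ({"school_work"}, ["teacher_approach"]),
    ({"out_of_seat", "other_activity"}, ["out_of_seat"]),
    ({"school_work"}, ["raise_hand"]),
    ({"looking_around"}, []),
    ({"school_work"}, []),
]
state_pct, event_rate = summarize_secos(example)
print(state_pct["school_work"], event_rate["raise_hand"])  # prints 62.5 1.0
```

Recording out of seat in both lists mirrors the dual state/event coding described above, so the same behavior contributes both a duration estimate and a rate.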
The author of the SECOS recommends observing a classroom peer for comparison purposes. Although the SECOS manual offers no guidelines for collecting such data, research studies have collected target and peer data in alternating 20-minute sessions (cf. Slate & Saudargas, 1986a). Normative data for the SECOS are also available for children in first through fifth grade. The normative sample consisted of 500 children from 10 schools in East Tennessee. Because scores did not differ significantly between boys and girls in the normative sample, these data are combined in T-score conversion tables by grade.
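T-score conversion tables express a raw score relative to its normative group on a scale with a mean of 50 and a standard deviation of 10. The sketch assumes a linear conversion, T = 50 + 10(x - M)/SD, using grade-level means and standard deviations. The NORMS values below are invented placeholders, not the East Tennessee norms, and the published SECOS tables may use a different (e.g., normalized) conversion.

```python
# Hypothetical grade-level norms (mean, SD) for one SECOS category.
# These numbers are placeholders, NOT the published East Tennessee norms.
NORMS = {1: (70.0, 12.0), 2: (72.0, 11.0)}


def linear_t_score(raw, grade, norms=NORMS):
    """Linear T score: mean 50, SD 10 relative to the child's grade norms."""
    if grade not in norms:
        raise KeyError(f"no norms available for grade {grade}")
    mean, sd = norms[grade]
    return 50 + 10 * (raw - mean) / sd


print(linear_t_score(46, grade=1))  # prints 30.0
```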
Psychometric properties. Interobserver agreement using the SECOS appears good. Fellers and Saudargas (1987) reported an average total agreement of .81. In another study, Slate and Saudargas (1986a) assessed agreement on 25% of observations and found that total agreement ranged from .75 to 1.0.
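Total agreement is commonly computed interval by interval as the number of agreements divided by the number of intervals coded by both observers, and reliability data are often gathered on a random subset of sessions, as in the 25% of observations noted above. The sketch assumes exactly that interval-by-interval definition; the SECOS studies may have computed total agreement differently, and the function names and code labels are hypothetical.

```python
import math
import random


def pick_reliability_sessions(session_ids, fraction=0.25, seed=None):
    """Randomly select a fraction of sessions (at least one) for a second observer."""
    ids = list(session_ids)
    k = max(1, math.ceil(fraction * len(ids)))
    return sorted(random.Random(seed).sample(ids, k))


def total_agreement(codes_a, codes_b):
    """Proportion of intervals on which both observers recorded the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("observers must code the same, non-empty set of intervals")
    agreements = sum(1 for a, b in zip(codes_a, codes_b) if a == b)
    return agreements / len(codes_a)


dual_coded = pick_reliability_sessions(range(1, 21), seed=1)  # 5 of 20 sessions
print(dual_coded)
print(total_agreement(["SW", "SW", "LA", "OS"], ["SW", "LA", "LA", "OS"]))  # prints 0.75
```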
Although there do not appear to be any published data supporting the convergent validity of the SECOS, two studies have examined its accuracy. Saudargas and Lentz (1986) found that the association between state and event category scores on the SECOS and real-time recording of the same behaviors on hand-held computers supported the sampling methods employed (rs = .67 to .92), as did t tests comparing levels of estimated and real-time scores. However, in a later study, Saudargas and Zanolli (1990) found that momentary time sampling in 15-second intervals may not be sensitive to behaviors of short duration (e.g., teacher interactions, verbalizations). If these behaviors are of particular interest, these authors suggested that shortening intervals (e.g., to 5 seconds) would improve sensitivity, but perhaps at the cost of reliability.
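The sensitivity problem described by Saudargas and Zanolli (1990) can be illustrated with a toy simulation. The sketch assumes a behavior stream recorded second by second (1 = occurring), treats the last second of each interval as the sampling moment, and compares momentary time sampling at 15- and 5-second intervals with the true percentage of time. The stream is contrived for illustration and says nothing about reliability.

```python
def true_percent(stream):
    """Percentage of seconds in which the behavior actually occurred."""
    return 100.0 * sum(stream) / len(stream)


def mts_percent(stream, interval):
    """Momentary time sampling: check only the last second of each interval."""
    moments = range(interval - 1, len(stream), interval)
    samples = [stream[t] for t in moments]
    return 100.0 * sum(samples) / len(samples)


# Contrived 60-second record with four brief 2-second episodes.
stream = [0] * 60
for start in (3, 20, 33, 50):
    stream[start] = stream[start + 1] = 1

print(round(true_percent(stream), 1))             # prints 13.3
print(mts_percent(stream, interval=15))           # prints 0.0
print(round(mts_percent(stream, interval=5), 1))  # prints 16.7
```

In this stream the brief episodes fill about 13% of the minute, yet the 15-second schedule misses every one of them, whereas the 5-second schedule catches two.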
The SECOS significantly discriminated between typically developing children and those with behavior disorders (Slate & Saudargas, 1986a) and between typically developing children and those with learning disabilities (e.g., Fellers & Saudargas, 1987). However, it should be noted that only observed teacher behaviors and a combination of observed teacher and child behaviors were able to discriminate between boys with learning disabilities and their typically developing same-gender peers (Slate & Saudargas, 1986b).
Student Observation System (SOS)
The SOS (Reynolds & Kamphaus, 2004) was designed to assess a broad array of both adaptive and maladaptive classroom behaviors, and it is a component of the Behavior Assessment System for Children, Second Edition (BASC-2; Reynolds & Kamphaus, 2004). It has been suggested that training for the SOS can be accomplished in a 30-minute workshop or simply by reading the manual, but no training criterion has been reported (Lett & Kamphaus, 1997). Observation sessions typically last 30 minutes, but the authors recommend observing the target child across 3 or 4 days in different classrooms to enhance the reliability of measurement.
Using the SOS, the observer takes notes concerning child and teacher behaviors for 27 seconds of each 30-second interval. In the last 3 seconds of each interval, the observer uses a 3-second momentary time sampling procedure to record adaptive and/or maladaptive behaviors exhibited by the target child. Adaptive behaviors are coded using the following four categories: (a) response to teacher/lesson (e.g., answers teacher’s question appropriately), (b) peer interaction (e.g., participates appropriately in small group discussion), (c) work on school subjects (e.g., completing a math worksheet alone), and (d) transition movement (e.g., walking to blackboard when asked to do so). The following nine categories are grouped together as maladaptive behaviors: (a) inappropriate movement (e.g., walking around classroom when inappropriate), (b) inattention (e.g., doodling on book), (c) inappropriate vocalization (e.g., teases another student), (d) somatization (e.g., complains about a headache), (e) repetitive motor movements (e.g., plays with hair), (f) aggression (e.g., intentionally breaks a neighbor’s pencil), (g) self-injurious behavior (e.g., pulls own hair), (h) inappropriate sexual behavior (e.g., strokes self), and (i) bowel/bladder problems (e.g., wets pants). At the end of the 30-minute observation session, the observer reviews the notes and rates the student’s behavior on 65 behavior items using a 3-point Likert scale (NO = never observed, SO = sometimes observed, FO = frequently observed). Items are grouped according to the aforementioned adaptive and problem behavior categories. For the problem behavior items, an additional column indicates whether the behavior was disruptive to the class.
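The SOS interval structure (27 seconds of note taking followed by a 3-second momentary sample, repeated every 30 seconds) can be laid out as a timing schedule, and the momentary samples can then be summarized as the percentage of intervals in which any behavior from a chosen set was recorded. The function names and category labels are hypothetical, and summarizing samples as percentages of intervals is an assumption, not the BASC-2 scoring procedure.

```python
INTERVAL = 30  # seconds per SOS interval
SAMPLE = 3     # final seconds used for the momentary sample


def sos_schedule(session_minutes=30):
    """Start/end seconds of the note-taking and sampling windows for each interval."""
    n_intervals = session_minutes * 60 // INTERVAL
    schedule = []
    for i in range(n_intervals):
        start = i * INTERVAL
        sample_start = start + INTERVAL - SAMPLE
        schedule.append({
            "interval": i + 1,
            "notes": (start, sample_start),
            "sample": (sample_start, start + INTERVAL),
        })
    return schedule


def percent_intervals(samples, categories):
    """% of intervals whose momentary sample contained any of `categories`."""
    if not samples:
        raise ValueError("no samples recorded")
    hits = sum(1 for s in samples if s & categories)
    return 100.0 * hits / len(samples)


schedule = sos_schedule()
print(len(schedule), schedule[0])  # 60 intervals; the first samples seconds 27-30
maladaptive = {"inattention", "inappropriate_movement"}  # hypothetical labels
samples = [{"inattention"}, set(), {"work_on_school_subjects"},
           {"inappropriate_movement", "inattention"}]
print(percent_intervals(samples, maladaptive))  # prints 50.0
```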
Psychometric properties. Unfortunately, few published data are available concerning the psychometric properties of the SOS. The manual for the BASC-2 reports no data concerning the technical adequacy of the SOS. In one study, the SOS was evaluated with regard to its ability to discriminate a group of 37 children with ADHD from a group of 18 typically developing children (Lett & Kamphaus, 1997). In this study, interobserver agreement was examined in a subsample of participants (n = 44), with coefficients reported to be in the .80s. However, neither the range of interobserver agreement coefficients nor the method employed to evaluate interobserver agreement was reported, making these data difficult to interpret. Nevertheless, scores on the inappropriate movement category and the problem behavior composite (to which the inappropriate movement category contributes) from the momentary time sampling portion of the SOS significantly discriminated children with ADHD from typically developing children.
Discussion
The purpose of this article was to critically evaluate seven observation systems designed to assess a student’s classroom behavior. Table 3 provides a summary of the strengths, limitations, and recommended uses for each of the codes included in this review.
Recommendations for selection of observation codes. Three of the codes, the AET-SSBD, DOF, and SOS, were developed in conjunction with other measures (the SSBD, ASEBA, and BASC-2, respectively) and are closely aligned with the constructs assessed by these measures. The remaining four codes were developed to assess key behavioral domains, although some focus exclusively on problem behaviors (i.e., ADHD-SOC, COC), whereas others (i.e., BOSS, SECOS) focus on positive behaviors (e.g., academic engagement) as well as problem behaviors. As such, the primary target behavior(s) of interest will be one of the initial factors guiding the selection of an observation code from among those included in this review.
Beyond the consideration of target behaviors, reliability and validity evidence must be weighed when deciding which observation code to use. With the exception of the SOS, all of the reviewed codes have minimally sufficient reliability evidence. With regard to validity, all of the codes have at least some evidence suggesting that scores differentiate between students with and without classroom behavior difficulties. Only three of the codes (AET-SSBD, ADHD-SOC, and DOF), however, have published evidence of convergent validity. Similarly, only four of the codes (ADHD-SOC, COC, DOF, and BOSS) have published evidence to support their use for monitoring change in classroom behavior in response to intervention.
Given the strengths and limitations of the available data, six of the codes have sufficient evidence to be used as part of a multimethod assessment. Based on the available data, the ADHD-SOC and DOF appear to have the most support for use in the multimethod assessment of externalizing problems, and the DOF is the only code appropriate for assessing internalizing problems. The COC shows promise in the assessment of classroom behaviors associated with ADHD, whereas the SECOS, BOSS, and AET-SSBD show promise in the assessment of positive classroom behaviors. The extremely limited published evidence available for the SOS precludes its use at the current time.
Recommendations for observation best practices. In the beginning of this article we listed the threats to the validity of observational assessment identified by Merrell (1999), including the use of poorly defined behavior categories and inappropriate code selection. By presenting information concerning the seven codes reviewed here, we hope to enhance the validity of observations by facilitating the selection of well-validated coding schemes appropriate to particular assessment tasks. Observers can use other strategies to maximize the validity of observational assessments. First, it is incumbent on observers to ensure that they are adequately trained on a given code and that the consistency of their observations does not decline over time. Training requirements (summarized in Table 1) should be taken into consideration when selecting any given code. One way to ensure the adequacy of training is to use a precoded videotape to determine whether a minimum degree of accuracy has been achieved. Unfortunately, such tapes are available only for the AET-SSBD. Alternatively, observers can check their agreement with a second observer (see Hintze, 2005, for methods to calculate interobserver reliability). In addition to ascertaining whether observers have initially been trained to criterion, it is also necessary to check reliability periodically to curb observer drift.
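Periodic reliability checks against a second observer or a precoded criterion record lend themselves to a simple routine. The sketch below uses Cohen's kappa, a chance-corrected agreement index chosen here as one reasonable option (the review does not prescribe a statistic), and flags any check that falls below a cutoff. The 0.60 cutoff, the function names, and the labels are all hypothetical.

```python
from collections import Counter

KAPPA_CUTOFF = 0.60  # hypothetical criterion; choose one appropriate to the code in use


def cohens_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two equal-length lists of interval codes."""
    n = len(codes_a)
    if n == 0 or n != len(codes_b):
        raise ValueError("both records must cover the same, non-empty set of intervals")
    p_observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1:  # both records use one identical category throughout
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)


def flag_drift(checks, cutoff=KAPPA_CUTOFF):
    """checks: list of (label, observer_codes, reference_codes); returns labels below cutoff."""
    return [label for label, obs, ref in checks if cohens_kappa(obs, ref) < cutoff]


checks = [
    ("initial training", ["on", "off", "on", "off"], ["on", "off", "on", "off"]),
    ("week 6", ["on", "on", "off", "off"], ["on", "off", "off", "off"]),
]
print(flag_drift(checks))  # prints ['week 6']
```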
Second, several observations are necessary to achieve a reliable estimate of a student’s behavior (see Hintze, 2005). Although this has been discussed in terms of traditional measurement theory (e.g., the measurement of a trait), it would seem equally valid for behavioral approaches to assessment, wherein one is more interested in assessing differences in a given behavior across conditions. As such, if one wishes to compare student behavior across settings, it is recommended that multiple observations be performed within each setting. Third, the normative data that are currently available (e.g., AET-SSBD, SECOS) appear inadequate because of sampling techniques, the age of the data, or both. Further, given the variability in ecology across educational settings (e.g., task demands, classroom rules, classroom management skills, quality of instruction), standardized norms seem ill suited for observational assessment. Hence, it is recommended that local normative data be collected for frequently used codes and that, for each assessment, data be collected on one or more peers under the same conditions as the target child.
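Following the recommendation to collect local normative data and to observe peers under the same conditions, a target child's score can be located within the local peer distribution. The sketch assumes a single summary score per child (e.g., percentage of intervals on-task) and reports the peer mean, the sample standard deviation, a z score, and the percentage of peers scoring below the target. The function name and example values are hypothetical, and with only a handful of peers these statistics are rough.

```python
import statistics


def compare_with_local_peers(target_value, peer_values):
    """Locate a target child's score within locally collected peer scores."""
    if len(peer_values) < 2:
        raise ValueError("need at least two peer observations to estimate spread")
    mean = statistics.mean(peer_values)
    sd = statistics.stdev(peer_values)  # sample standard deviation
    z = (target_value - mean) / sd if sd > 0 else float("nan")
    below = sum(1 for v in peer_values if v < target_value)
    return {
        "peer_mean": mean,
        "peer_sd": sd,
        "z": z,
        "percent_of_peers_below": 100.0 * below / len(peer_values),
    }


# Hypothetical percentage of intervals on-task for five peers and the target child.
peers = [80, 85, 90, 75, 70]
print(compare_with_local_peers(50, peers))
```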
Two final considerations for ensuring observation best practices are reactivity and observer bias. Reactivity refers to a target child altering behaviors as a result of being observed, resulting in inaccurate estimates of actual target behaviors. One strategy for minimizing reactivity is to conduct multiple observations to increase the child’s familiarity and comfort with the observer in the classroom. Observer bias also affects observation accuracy; it refers to the tendency of an observer to consistently view (and record) observed behaviors in a particular way (e.g., negatively or positively). Adequate training and the periodic reliability checks described previously are perhaps the most effective ways to minimize the likelihood of observer bias.
Finally, no assessment should rely on a single measurement method, particularly when reliability and validity evidence is limited. Assessments are enhanced when multiple methods are employed (Cone, 1978) to assess behavior across multiple dimensions (Achenbach, 1993). Hence, observations, like any other assessment methodology, should be used only as part of a broader assessment battery, irrespective of the assessment domain.
Future Research Directions
There are multiple critical directions for future research to ensure identification of appropriate (and inappropriate) uses of the standardized behavior observation codes reviewed herein. First, in light of the limited number of studies evaluating convergent validity, each of the codes would benefit from additional studies addressing this type of evidence. Second, evidence for treatment sensitivity is nonexistent for some codes and minimally sufficient for others. Studies examining treatment sensitivity are essential if these systems are to be used to evaluate intervention effectiveness.
One final line of validity evidence not currently addressed in studies of these codes is the representativeness of observed behavior based on a small number of observations. Despite the common professional belief that results of observation are the “gold standard” in the assessment of behavior, studies (e.g., Doll & Elliott, 1994; Hintze & Matthews, 2004) have raised important questions regarding the validity of a small number of observations to measure classroom behavior. Given that most practitioners rarely have time to conduct a large number of observations for an individual student, determining the validity of a single observation (or a small number of observations) with each of these codes is essential to justify their use in professional practice for screening and diagnosis.
Conclusions
The direct assessment of student behavior has been a critical component of comprehensive evaluations of student behavior in classroom settings. The seven observation codes reviewed in this article were developed to provide practitioners with a standardized framework for measuring classroom behavior. With the exception of one code, all have published interobserver agreement evidence to support their use with school-age populations. Most also have some evidence of predictive validity and treatment sensitivity, though much of this evidence is limited to single studies with small to moderate samples. Even less evidence is available related to convergent validity. Because of these limitations in the existing evidence, much research is necessary to ensure that these codes are used for appropriate assessment purposes, behaviors, and target students. Until such studies are completed, practitioners are encouraged to select measures cautiously and to use multiple methods in screening, diagnosis, and evaluation of treatment effectiveness.
References
Abikoff, H., & Gittelman, R. (1985). Classroom Observation Code: A modification of the Stony Brook Code. Psychopharmacology Bulletin, 21, 901-909.
Abikoff, H., Gittelman, R., & Klein, D. F. (1980). A classroom observation code for hyperactive children: A replication of validity. Journal of Consulting and Clinical Psychology, 48, 555-565.
Abikoff, H., Gittelman-Klein, R., & Klein, D. F. (1977). Validation of a classroom observation code for hyperactive children. Journal of Consulting and Clinical Psychology, 45, 772-783.
Abikoff, H., Hechtman, L., Klein, R. G., Weiss, G., Fleiss, K., Etcovitch, J., Cousins, L., Greenfield, B., Martin, D., & Pollack, S. (2004). Symptomatic improvement in children with ADHD treated with long-term methylphenidate and multimodal psychosocial treatment. Journal of the American Academy of Child and Adolescent Psychiatry, 43, 802-811.
Abikoff, H. B., Jensen, P. S., Arnold, L. E., Hoza, B., Hechtman, L., Pollack, S., Martin, D., Alvir, J., March, J. S., Hinshaw, S., Vitiello, B., Newcorn, J., Greiner, A., Cantwell, D. P., Conners, C. K., Elliott, G., Greenhill, L. L., Kraemer, H., Pelham, W. E., Jr., Severe, J. B., Swanson, J. M., Wells, K., & Wigal, T. (2002). Observed classroom behavior of children with ADHD: Relationship to gender and comorbidity. Journal of Abnormal Child Psychology, 4, 349-359.
Achenbach, T. M. (1986). The Direct Observation Form of the Child Behavior Checklist (rev. ed.). Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T. M. (1993). Implications of multiaxial empirically based assessment for behavior therapy with children. Behavior Therapy, 24, 91-116.
Achenbach, T. M., & Edelbrock, C. (1983). Manual for the Child Behavior Checklist/4-18 and Revised Child Behavior Profile. Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T. M., & Edelbrock, C. (1986). Manual for the Teacher’s Report Form and Teacher Version of the Child Behavior Profile. Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T. M., & Rescorla, L. A. (2001). Manual for the ASEBA School-Age Forms & Profiles. Burlington, VT: Research Center for Children, Youth, and Families.
Cone, J. D. (1978). The behavioral assessment grid (BAG): A conceptual framework and taxonomy. Behavior Therapy, 9, 882-888.
Conners, C. K. (1973). Rating scale for use in drug studies with children [Special issue: Pharmacotherapy of children]. Psychopharmacology Bulletin, 24-84.
Doll, B., & Elliott, S. N. (1994). Representativeness of observed preschool social behaviors: How many data are enough? Journal of Early Intervention, 18, 227-238.
DuPaul, G. J., Volpe, R. J., Jitendra, A. K., Lutz, J. G., Lorah, K. S., & Gruber, R. (2004). Elementary school students with attention-deficit/hyperactivity disorder: Predictors of academic achievement. Journal of School Psychology, 42, 285-301.
Fellers, G., & Saudargas, R. A. (1987). Classroom behaviors of LD and nonhandicapped girls. Learning Disability Quarterly, 10, 231-236.
Gadow, K. D., Nolan, E. E., Sprafkin, J., & Sverd, J. (1995). School observations of children with attention-deficit hyperactivity disorder and comorbid tic disorder: Effects of methylphenidate treatment. Journal of Developmental and Behavioral Pediatrics, 16, 167-176.
Gadow, K. D., Nolan, E. E., & Sverd, J. (1992). Methylphenidate in hyperactive boys with comorbid tic disorder: II. Behavioral effects in school settings. Journal of the American Academy of Child and Adolescent Psychiatry, 31, 462-471.
Gadow, K. D., Nolan, E. E., Sverd, J., Sprafkin, J., & Paolicelli, L. (1990). Methylphenidate in aggressive-hyperactive boys: I. Effects on peer aggression in public school settings. Journal of the American Academy of Child and Adolescent Psychiatry, 29, 710-718.
Gadow, K. D., Paolicelli, L. M., Nolan, E. E., Schwartz, J., Sprafkin, J., & Sverd, J. (1992). Methylphenidate in aggressive hyperactive boys: II. Indirect effects of medication on peer behavior. Journal of Child and Adolescent Psychopharmacology, 2, 49-61.
Gadow, K. D., Sprafkin, J., & Nolan, E. E. (1996). ADHD School Observation Code. Stony Brook, NY: Checkmate Plus.
Greenwood, C. R. (1996). The case for performance-based instructional models. School Psychology Quarterly, 11, 283-296.
Greenwood, C. R., Carta, J. J., Kamps, D., & Delquadri, J. (1993). Ecobehavioral Assessment Systems Software (EBASS): Observational instrumentation for school psychologists. Kansas City: Juniper Gardens Children’s Project, University of Kansas.
Gruber, R., DuPaul, G. J., Jitendra, A. K., Volpe, R. J., & Lorah, K. S. (in press). Classroom observations of students with and without ADHD: Differences across academic subjects and types of engagement. Journal of School Psychology.
Hintze, J. M. (2005). Psychometrics of direct observation. School Psychology Review, 34, 507-519.
Hintze, J. M., & Matthews, W. J. (2004). The generalizability of systematic direct observations across time and setting: A preliminary investigation of the psychometrics of behavioral observation. School Psychology Review, 33, 258-270.
Hintze, J. M., Volpe, R. J., & Shapiro, E. S. (2002). Best practices in systematic direct observation of student behavior. In A. Thomas & J. Grimes (Eds.), Best practices in school psychology IV (Vol. 2, pp. 993-1006). Bethesda, MD: National Association of School Psychologists.
Klein, R. G., & Abikoff, H. (1997). Behavior therapy and methylphenidate in the treatment of children with ADHD. Journal of Attention Disorders, 2, 89-114.
Leff, S. S., & Lakin, R. (2005). Playground-based observational systems: A review and implications for practitioners and researchers. School Psychology Review, 34, 474-488.
Lett, N. J., & Kamphaus, R. W. (1997). Differential validity of the BASC Student Observation System and the BASC Teacher Rating Scale. Canadian Journal of School Psychology, 13, 1-14.
Loney, J., & Milich, R. (1982). Hyperactivity, inattention and aggression in clinical practice. In M. Wolraich & D. K. Routh (Eds.), Advances in developmental and behavioral pediatrics (Vol. 3, pp. 113-147). Greenwich, CT: JAI Press.
McConaughy, S. H., Achenbach, T. M., & Gent, C. L. (1988). Multiaxial empirically based assessment: Parent, teacher, observational, cognitive, and personality correlates of child behavior profile types for 6- to 11-year-old boys. Journal of Abnormal Child Psychology, 16, 485-509.
McConaughy, S. H., Kay, P. J., & Fitzgerald, M. (1998). Preventing SED through parent-teacher action research and social skills instruction: First-year outcomes. Journal of Emotional and Behavioral Disorders, 6, 81-93.
McConaughy, S. H., Kay, P. J., & Fitzgerald, M. (1999). The Achieving, Behaving, Caring Project for preventing ED: Two-year outcomes. Journal of Emotional and Behavioral Disorders, 7, 224-239.
Merrell, K. W. (1999). Behavioral, social, and emotional assessment of children & adolescents. Mahwah, NJ: Lawrence Erlbaum Associates.
Nolan, E. E., & Gadow, K. D. (1994). Relation between ratings and observations of stimulant drug response in hyperactive children. Journal of Clinical Child Psychology, 23, 78-90.
Ota, K. R., & DuPaul, G. J. (2002). Task engagement and mathematics performance in children with attention-deficit hyperactivity disorder: Effects of supplemental computer instruction. School Psychology Quarterly, 17, 242-257.
Quinn, M. M., Mathur, S. R., & Rutherford, R. B. (1995). Early identification of antisocial boys: A multi-method approach. Education and Treatment of Children, 18, 272-281.
Reed, M. L., & Edelbrock, C. (1983). Reliability and validity of the Direct Observation Form of the Child Behavior Checklist. Journal of Abnormal Child Psychology, 11, 521-530.
Reynolds, C. R., & Kamphaus, R. W. (2004). Behavior Assessment System for Children (2nd ed.). Circle Pines, MN: American Guidance Service Publishing.
Saudargas, R. A. (1997). State-Event Classroom Observation System (SECOS). Observation manual. University of Tennessee, Knoxville.
Saudargas, R. A., & Lentz, F. E., Jr. (1986). Estimating percent of time and rate via direct observation: A suggested observational procedure and format. School Psychology Review, 15, 36-48.
Saudargas, R. A., & Zanolli, K. (1990). Momentary time sampling as an estimate of percentage time: A field validation. Journal of Applied Behavior Analysis, 23, 533-537.
Shapiro, E. S. (2004). Academic skills problems workbook (rev.). New York: The Guilford Press.
Shapiro, E. S., & Heick, P. (2004). School psychologist assessment practices in the evaluation of students referred for social/behavioral/emotional problems. Psychology in the Schools, 41, 551-561.
Slate, J. R., & Saudargas, R. A. (1986a). Differences in the classroom behaviors of behaviorally disordered and regular class children. Behavioral Disorders, 11, 45-55.
Slate, J. R., & Saudargas, R. A. (1986b). Differences in learning disabled and average students’ classroom behaviors. Learning Disability Quarterly, 9, 61-67.
Walker, H. M., & Severson, H. H. (1990). Systematic Screening for Behavior Disorders: User’s guide and administration manual. Longmont, CO: Sopris West.
Walker, H. M., Severson, H. H., Nicholson, F., Kehle, T., Jenson, W. R., & Clark, E. (1994). Replication of the Systematic Screening for Behavior Disorders (SSBD) procedure for the identification of at-risk children. Journal of Emotional and Behavioral Disorders, 2, 66-77.
Walker, H. M., Severson, H., Stiller, B., Williams, G., Haring, N., Shinn, M., & Todis, B. (1988). Systematic screening of pupils in the elementary age range at risk for behavior disorders: Development and trial testing of a multiple gating model. Remedial and Special Education, 9(3), 8-14.
Walker, H. M., Severson, H. H., Todis, B. J., Block-Pedego, A. E., Williams, G. J., Haring, N. G., & Barckley, M. (1990). Systematic Screening for Behavior Disorders (SSBD): Further validation, replication, and normative data. Remedial and Special Education, 11(2), 32-46.
Wilson, M. S., & Reschly, D. J. (1996). Assessment in school psychology training and practice. School Psychology Review, 25, 9-23.
Winsor, A. P. (2003). Direct observation for classrooms. In C. R. Reynolds & R. W. Kamphaus (Eds.), Handbook of psychological & educational assessment of children: Personality, behavior, and context (2nd ed., pp. 248-255). New York: Guilford Press.
Robert J. Volpe, PhD, is Assistant Professor in the Department of Counseling and Applied Educational Psychology at Northeastern University. His primary research interests concern academic problems experienced by children with attention-deficit/hyperactivity disorder, academic and behavioral assessment, and academic interventions.
James Clyde DiPerna, PhD, is Assistant Professor in the School Psychology Program at the Pennsylvania State University. His research focuses on assessment and intervention strategies to promote students’ academic, social, and emotional competence.
John M. Hintze, PhD, is an Associate Professor and Director of the School Psychology Program at the University of Massachusetts at Amherst. He received his doctorate from Lehigh University in 1994 and prior to that was a practitioner in the public schools for 10 years. His research interests are in CBM and various forms of progress monitoring, research design, and data analysis that informs practice.
Edward S. Shapiro, PhD, currently is Iacocca Professor of Education, Professor of School Psychology, and Director of the Center for Promoting Research to Practice in the College of Education at Lehigh University, Bethlehem, Pennsylvania. He is the author or co-author of 10 books, including his most recently published third edition of Academic Skills Problems: Direct Assessment and Intervention and the Academic Skills Problems Workbook (revised edition), both published by Guilford Press. His primary research interests are assessment and intervention for academic skills problems, issues in scaling up research to practice, and pediatric school psychology.