MUSIC TEACHERS’ ASSESSMENT LITERACY, BELIEFS, & PRACTICES:
AN INTERVENTION STUDY
by
JOCELYN WENONA ARMES
B.A., Salisbury University, 2012
M.M., Ithaca College, 2016
A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado in partial fulfillment
of the requirement for the degree of
Doctor of Philosophy
Department of Music Education
2020
Doctoral Committee:
James R. Austin, Chair
Margaret H. Berg
David A. Rickels
Tom Myer
Carolyn A. Haug
Armes, Jocelyn W. (Ph.D., Music Education)
Music Teachers’ Assessment Literacy, Beliefs, & Practices
Dissertation Directed by Dr. James R. Austin
Abstract
While shifting priorities in educational policy have increased demand that teachers be
proficient in classroom assessment, teacher preparation programs have responded slowly, despite
evidence that preservice training in assessment may increase teachers’ knowledge and valuation
of assessment. Although professional development (PD) has been shown to change inservice
teachers’ knowledge and attitudes toward assessment, researchers have not found evidence
that PD influences assessment practices. The purpose of this pretest-posttest control group study
was to examine the effect of an online PD on music teachers’ assessment literacy, beliefs, and
practices. Forty-three participants were randomly assigned to the control or intervention group and
completed the pretest and posttest. These questionnaires consisted of three measures: the
Classroom Assessment Literacy Inventory (CALI), the Music Teacher Assessment Implementation
Inventory (MTAII), and the Music Teacher Assessment Beliefs Inventory (MTABI). Intervention group
participants (n = 18) enrolled in a four-week online PD focused on increasing music teacher
assessment literacy.
Following the PD, intervention participants demonstrated a significant increase in
assessment literacy scores compared to their control group peers; assessment beliefs and practices
did not significantly change over time. I found several significant relationships between
participants’ assessment literacy and their self-reported practices, as well as significant
relationships between participants’ assessment beliefs and practices. Implications for music
teacher PD in assessment, preservice teachers, and music teacher educators are discussed.
Acknowledgements
Completing this dissertation, and this degree, would not have been possible without the
support of countless far-flung people in my life, to whom I am unspeakably grateful.
First, and always, thank you to my mother and grandmother -- Pamela Simson and Jackie
Fritch -- for being not only my staunchest supporters, but also for setting the standard for what true inner
strength, compassion, determination, patience, and a sense of justice can accomplish in service to
others. Without our family of brilliant and strong women, I do not know who I would be, or what
is possible. Mom, you are my inspiration as a teacher and human being. Grandmomsy Jackie,
even though you are no longer with us, I see you in every member of our family, and I am so
grateful for your influence and love. I love you both.
To my adviser, Dr. James “Papa Bear” Austin, I have no possible way of adequately
thanking you for your mentorship and support throughout not only this dissertation, but this
entire degree. It has been the privilege of my academic life to study with you, and to serve as
your editorial assistant for the Journal of Music Teacher Education. I will fondly recall our many
meetings, your kindness, humor, and encouragement. I am sure I drove you crazy at times with
my sense of scale and use of metaphors (e.g., yoga), but I suspect you enjoyed our conversations
as much as I did. I would not be the thinker, writer, researcher, or person I am today without you,
and I am undoubtedly better for it.
To my committee members -- Dr. Margaret Berg, Dr. David Rickels, Professor Tom
Myer, and Dr. Carolyn Haug. Thank you for your collective efforts and support throughout this
degree. I heard all your voices in my head throughout this process, and the final document is
more thoughtful and complete as a result. Thank you, Dr. Berg, for your support personally and
administratively of my research and success throughout this degree. While this project did not
end up using a qualitative design, our conversations still guided my thinking about participants’
experiences throughout the intervention; you have shown me how to think in Technicolor as a
researcher, and I am grateful. Thank you, Dr. Rickels, for your support during my time at CU. I
have enjoyed learning from you in different contexts -- from MSE to research methods -- and
have often found that “resistance is futile” when it comes to your enthusiasm for all things
spreadsheet-related. Thank you, Professor Myer, for serving on my committee and for your
support of my musicianship during my time at CU. I gleefully recall the first time we met as I
was touring campus; I accosted you in the hall and asked if we could discuss the Harbison
Sonata. You were gracious enough to spare me a few moments and have included me as an
honorary member of the studio ever since. To Dr. Haug, thank you for sharing your expertise in
assessment and education; your suggestions were some of the most insightful.
To my graduate cohort, your friendship was one of the most important aspects of my
experience during this degree. Our conversations always improved my thinking, and my spirits.
Special thanks to Seth Taft, Jacob Holster, and Ian Miller for also participating in our zany sax
quartet, and to Ellie Wolfe, Kate Bertelli-Wilinski, and Bryan Koerner for your friendship,
support, and wisdom from afar.
Finally, at risk of being verbose, thank you to the many colleagues, friends, and family
members, from music education to yoga, with whom I have crossed paths in this life. You
have taught me that nothing meaningful happens without community.
Table of Contents
Chapter
I. Introduction to the Study …………………………………………………………………...1
What Drives Teachers’ Assessment Practices? ............................................................2
Definitions & Purposes of Assessment ……………………………………………….9
Assessment Literacy ………………………………………………………………...14
Assessment Beliefs ………………………………………………………………….15
Teacher Preparation and Development ……………………………………………...17
Professional Development and Assessment Literacy………………………………..19
Study Need and Significance ………………………………………………………..22
Purpose and Research Questions ……………………………………………………25
Definitions …………………………………………………………………………..27
Delimitations ………………………………………………………………………..29
Assessment Literacy……………………………………………………………..29
Sampling ………………………………………………………………………...29
Measures ………………………………………………………………………...29
Researcher Interest …………………………………………………………………..31
II. Review of Related Literature ……………………………………………………………...34
Assessment Literacy ………………………………………………………………...35
...in Education ……………………………………………………………………35
...in Music Education ……………………………………………………………43
...in Summary …………………………………………………………………..44
Assessment Beliefs ………………………………………………………………….45
...of Teachers …………………………………………………………………….46
...of Music Teachers ……………………………………………………………..57
...in Summary ……………………………………………………………………60
Assessment Practices ………………………………………………………………..61
...of Teachers …………………………………………………………………….62
...of Music Teachers ……………………………………………………………..69
...in Summary ……………………………………………………………………78
III. Methodology ………………………………………………………………………………..80
Research Design and Intervention …………………………………………………..81
Research Design ...………………………………………………………………81
Intervention Design ……………………………………………………………...82
Course Elements and Organization …………...………………………………...83
Population and Sample ……………………………………………………………...87
Selection and Design of Research Measures ………………………………………..88
Data Collection Instruments ………………………………………………………...90
Prescreening Questionnaire and Informed Consent …………………………….90
Assessment Literacy ……………………………………………….…………….91
Assessment Practices ……………………………………………………………92
Assessment Beliefs ……………………………………………………….……...94
Intervention Group Posttest ……………………………………………………..95
Procedures …………………………………………………………………………...95
Pilot Testing ……………………………………………………………………..95
Data Collection ………………………………………………………………….96
Data Analysis ………………………………………………………………..…..97
IV. Results …………………………………………………………………………………..…100
Participant Demographics …………………………………………............………..….101
Reliability and Item Analysis …………………………...............................………..…104
CALI Reliability and Item Analysis …………………......................……..…....104
Correlations ……………………….................................................……105
Difficulty and Discrimination Indices ....................................................106
MTABI Reliability …………………………………………….................……..107
Descriptive Statistics for the CALI, MTAII, & MTABI ...................................................109
CALI Descriptives ...............................................................................................110
MTAII Descriptives .............................................................................................110
MTABI Descriptives ............................................................................................111
Research Questions .........................................................................................................112
Multivariate Analysis of Assessment Literacy and Beliefs .................................113
MANOVA Results ..................................................................................114
Nonparametric Analyses of Assessment Practices .............................................115
Mann-Whitney U Test Results ................................................................117
Spearman Rho Results ............................................................................118
Feedback from Intervention Participants ........................................................................121
Question One ......................................................................................................122
Question Two ......................................................................................................122
Question Three ....................................................................................................123
Question Four .....................................................................................................123
Question Five .....................................................................................................124
V. Summary and Conclusions ................................................................................................125
Summary of Findings .....................................................................................................126
Assessment Literacy ...........................................................................................126
Assessment Beliefs ..............................................................................................127
Assessment Practices ..........................................................................................127
Relationships Between Assessment Literacy, Beliefs, and Practices ..................127
Discussion........................................................................................................................128
Major Findings ..................................................................................................128
Music Teachers Lack Prior Assessment Training ..................................128
Online Professional Development Formatting ........................................130
Assessment Literacy can be Impacted Through Intervention .................131
Assessment Beliefs are Related to Assessment Literacy ........................132
Assessment Beliefs Appear Stable Across Time ....................................133
Music Teachers’ Assessment Practices Vary and are
Largely Informal .........................................................................134
Assessment Beliefs and Assessment Literacy are
Related to Assessment Practices .................................................135
Music Teachers’ Assessment Practices May Be
Impacted by Other Factors .........................................................135
Music Teachers’ Educational Decision Making .................................................136
Role of Other Factors ..............................................................................139
Internal Factors ...........................................................................139
External Factors ..........................................................................140
Tension .......................................................................................141
Classroom Realities ....................................................................142
Role of Socialization ...............................................................................143
Measuring Assessment Literacy, Beliefs, and Practices .....................................145
Calibration ...............................................................................................145
Dimensionality ........................................................................................146
Implications .....................................................................................................................149
...for Music Teacher Development ......................................................................149
...for Future Implementation of this Intervention ...............................................151
Study Limitations and Implementation Challenges ........................................................153
Sampling Procedure ............................................................................................153
Participant Attrition Between Stages ..................................................................156
History Effects Due to the COVID-19 Pandemic ...............................................156
Reliability of the CALI Instrument ......................................................................157
Recommendations for Future Researchers .....................................................................157
Conclusion ......................................................................................................................161
References .................................................................................................................................163
Appendices ................................................................................................................................180
A. IRB Approval Documentation ..............................................................................................180
B. Standards for Teacher Competence in Educational Assessment of Students (STCEAS)
Standards & Corresponding CALI Items ..............................................................................181
C. Original CALI by Mertler (2002) ..........................................................................................182
D. Music Teachers Assessment Workshop (MTAW) Design .....................................................192
E. Prescreening and Informed Consent Questionnaire & Informed Consent ............................199
F. Adapted Classroom Assessment Literacy Inventory (CALI) ................................................205
G. Music Teacher Assessment Implementation Inventory (MTAII) ...........................................210
H. Music Teacher Assessment Beliefs Inventory (MTABI) ........................................................212
I. Invitation to Participate .........................................................................................................214
J. Trigger Email Correspondence .............................................................................................215
K. CALI Distractor Item Analysis ..............................................................................................217
L. Intervention Participant Feedback .......................................................................................218
M. Teacher-Constructed Task Exemplar ....................................................................................220
N. Descriptive Data for the CALI, MTAII, and MTABI by Assigned Group .............................233
List of Tables
Table
1.1 The Standards for Teacher Competence in Educational Assessment of Students...............4
2.1 Dissertations about Music Teacher Assessment Practices ................................................70
3.1 Participant Tasks ...............................................................................................................84
3.2 Sample Items for Subscales of Assessment Literacy in the Original CALI ......................92
4.1 Participant Descriptive Statistics ....................................................................................103
4.2 Correlations of Item Scores with CALI Standards ..........................................................106
4.3 Difficulty & Discrimination Indices ...............................................................................107
4.4 CALI Pretest and Posttest Descriptive Statistics .............................................................110
4.5 MTAII Pretest and Posttest Descriptive Statistics ...........................................................111
4.6 MTABI Pretest and Posttest Descriptive Statistics ..........................................................112
4.7 Mann-Whitney U Test Results on Assessment Practice Mean Change Scores ..............119
4.8 Spearman’s Rho Results .................................................................................................120
List of Figures
Figure
1.1 Cyclical Response of National Organizations to Reform Efforts .......................................3
1.2 The Cyclic Nature of Assessment & Educational Decision Making in Instruction .........11
1.3 Teachers’ Classroom Assessment Decision Making ........................................................23
3.1 Conceptual Diagram of Study Procedures ........................................................................82
3.2 Perusall Discussion Board on an Assessment Text ..........................................................86
4.1 Pretest to Posttest Mean Literacy Scores ........................................................................116
4.2 Pretest to Posttest Mean Belief Scores ............................................................................117
5.1 Music Teachers’ Classroom Assessment Decision Making ...........................................137
5.2 Internal Factors ...............................................................................................................140
5.3 External Factors ..............................................................................................................141
5.4 Tensions ..........................................................................................................................142
5.5 Classroom Realities ........................................................................................................143
Chapter 1
Introduction to the Study
In his keynote address for the 2007 Florida Symposium on Assessment in Music
Education, Richard Colwell expressed the following sentiment:
“Music educators have two ideas about assessment: (1) we assess continually or (2) our
goals cannot be assessed. Murphy phrases this dichotomy as either “ardent passion or
blithe disregard.” Other than our fascination with aptitude tests, individual assessment
has been informal, and used in private instruction. Group assessment has been conducted
in classes, contests, festivals, and concerts. The professional literature in assessment is
focused on program assessment. However, music education, K-12, has no identifiable
programs.” (p. 3)
While perhaps harsh, Colwell’s commentary is relevant. Over the past thirty years -- as national
organizations, policy makers, educational experts, and researchers have debated how to reform
education -- the meaning of assessment has become distorted. Teachers now equate assessment
with standardized, high-stakes testing (Colwell, 2008; Heritage, 2007; Stiggins, 2014) and, as a
result, they assess learning in narrower or more calculated ways, and their efforts to improve how
they assess at the classroom level have been compromised.
Historically, educators have used assessments for the purposes of diagnosis,
accountability, and communication (Nierman & Colwell, 2019, p. 181). Music educators,
however, have narrower views and practices. As Colwell opined at the Florida Symposium,
music educators use informal assessments almost exclusively, or “believe that music is all
process or they believe that their classes/ensembles are continually being assessed in public
performances or by outside adjudicators in contests or festivals” (p. 181). Yet, there is an
argument to be made that “the music education profession has been about the business of
assessment since its inception” (Nierman & Colwell, 2019, p. 181); secondary music teachers, in
particular, claim to informally assess students’ performance through error detection. Here resides
the tension among music educators about the nature of assessment: they are both assessing
constantly, and not fully assessing in ways recognized by educational experts and stakeholders.
Since the inception of the national standards movement, obtaining recognition for music as a
legitimate school subject has been a coveted goal of the music education profession and the
organizations that have represented and advocated for it (Colwell, 2008). Abdicating
responsibility for assessing students -- whether due to “ardent passion or blithe disregard” -- is
not a viable path for music educators interested in legitimizing the discipline. There are societal
and political forces that compel music educators to embrace the multiplicity of assessment
purposes, both professionally and for the benefit of their students.
In this chapter, I will discuss the historical rationales for assessment, the definitions and
primary purposes of assessment, the research surrounding assessment literacy, beliefs, and
practices, and the potential of online professional development to increase music teacher
assessment literacy and enhance their use of assessment within the classroom. Then, I will situate
the need for the present study within the literature, articulate my purposes for this intervention
study and my research questions, provide definitions for pertinent constructs, and describe
delimitations of the study. Finally, I will share my personal interest in this topic.
What Drives Teachers’ Assessment Practices?
Historically, teachers have changed their teaching and assessment practices in response to
external pressures from district-, state-, and federal-level policies and other contextual demands
within their jobs. The time between policy issuance and teacher response tends to lag, resulting
in incrementally wider swings in policy. This iterative process is depicted in Figure 1.1. The
most recent wave of education reform efforts can be traced to the publication of A Nation at Risk
in 1983. Perhaps in tandem with increased public demand for accountability and transparency,
the standards movement unintentionally led teachers away “from evaluating student knowledge
and ability for a successful life to an evaluation of the different school systems, different
teachers, schools, and practices” (Colwell, 2008, p. 6). Nonetheless, national standards, and the
assessment measures designed from them, provided educational stakeholders with clear,
measurable objectives of what students should know and be able to do.
Figure 1.1
Cyclical Response of National Organizations to Reform Efforts
National teacher organizations also responded to demands for increased instructional
quality in the early 1990s. The seven Standards for Teacher Competence in Educational
Assessment of Students (STCEAS) were developed by the American Federation of Teachers
(AFT), National Council on Measurement in Education (NCME), and the National Education
Association (NEA) to address inadequate assessment training in teacher preparation programs.
These standards outline the competencies required for teachers to be considered assessment
literate (Table 1.1; AFT, NCME, & NEA, 1990). The goal of these standards was to provide “a
guide for teacher educators in their work with teacher education programs, a self-assessment
guide for teachers, a guide for workshop instructors, and an impetus for educational
measurement instructors to conceptualize student assessment more broadly than had been done
in the past” (Brookhart, 2011, p. 3).
Table 1.1. The Standards for Teacher Competence in Educational Assessment of Students
Teachers should be skilled in...
1. … choosing assessment methods appropriate for instructional decisions.
2. … developing assessment methods appropriate for instructional decisions.
3. … administering, scoring and interpreting the results of both externally-produced and teacher-
produced assessment methods.
4. … using assessment results when making decisions about individual students, planning teaching,
developing curriculum, and school improvement.
5. … developing valid pupil grading procedures which use pupil assessments.
6. ... communicating assessment results to students, parents, other lay audiences, and other educators.
7. … recognizing unethical, illegal, and otherwise inappropriate assessment methods and uses of
assessment information.
The STCEAS also served as a subsequent blueprint for researchers to measure the
assessment literacy of preservice and inservice teachers, and design programming intended to
enhance their literacy. Beginning with the Teacher Assessment Literacy Questionnaire (TALQ)
(Impara et al., 1993), the Classroom Assessment Literacy Inventory (CALI) (Mertler, 2004), and
then the Assessment Literacy Inventory (ALI) (Mertler & Campbell, 2005), psychometricians and
education experts have measured assessment literacy almost exclusively through the
competencies described in STCEAS. This is not to say the STCEAS have been accepted without
criticism. A number of psychometricians and researchers have examined the internal structure of
the TALQ (Alkharusi, 2015), the CALI (Gotch & French, 2015; Ryan, 2019; Xu & Brown, 2016),
and the ALI (Hailaya et al. 2014). Critics have found literacy measures associated with these
inventories to have adequate reliability (i.e., internal consistency) for preservice teachers, but
insufficient reliability for inservice teachers. They also argue -- depending upon the analysis
technique employed -- that the factors (i.e., these inventories typically assign five items to each
standard) do not neatly correspond to the standards. Recently, scholars (DeLuca et al. 2016;
Gotch & French, 2014; Xu & Brown, 2016) have offered contemporary interpretations or
addenda to the original seven standards published thirty years ago. Brookhart (2011) and
Popham (2009) previously suggested modest updates to the standards, as well.
Brookhart (2011) published an updated list of competencies, increasing the total number
to eleven, in order to address 21st century concerns, such as assessing diverse learners and using
technology appropriately to facilitate assessment. Assessment scholar James Popham (2009)
identified thirteen target skills and knowledge areas corresponding to assessment literacy,
including identifying, constructing, implementing, and interpreting assessments or assessment data.
Popham argued that to be assessment literate, teachers needed knowledge and skills relevant for
both classroom-level and large-scale assessments. These knowledge and skill targets reflect the
myriad functions of assessment in the current educational landscape, and extend the
competencies outlined by the STCEAS. Popham’s incorporation of the accountability functions
of assessment hold important implications for inservice teachers, who are increasingly asked to
improve student performance on high-stakes measures and evaluated, in part, on the basis of
those same measures.
In keeping with the iterative nature of educational reform and teacher response, the
arguably lax assessment environment of the early 1990s (Brookhart, 2001) swung toward an
overemphasis on high-stakes testing during the 2000s (Stiggins, 2002, p. 760). Reform in the
“Era of Accountability” has no doubt contributed to widespread skepticism of assessment (i.e.,
its accountability purpose, and the high-stakes, standardized methods often used) among teachers
(Barnes et al., 2017). Currently, the pendulum of education reform appears poised to swing in the
direction of classroom-based efforts to improve student outcomes, including the use of teacher-
devised assessments (Brookhart, 2011). Researchers have suggested that classroom-based
assessment can increase student motivation and engagement (Earl & Katz, 2006). If history is
any indication, however, teacher practices are unlikely to evolve in such a manner that the
benefits of classroom-level assessment are fully realized; teachers’ general mistrust and
misapplication of assessment may be difficult to shake (Pishghadam et al., 2014; Steinberg,
2008).
These societal trends have undoubtedly affected the assessment beliefs and practices of
music teachers. Perhaps due to the contextual demands of their positions, music teachers appear
to lag behind their general education peers in adopting sound assessment principles
(Austin & Russell, 2017; Russell & Austin, 2010), despite the efforts of national organizations to
instill awareness and knowledge about assessment principles and practices. In the 1990s, MENC
responded to the standards-based reform movement by (a) developing national standards for
music education (1994); (b) publishing a corresponding handbook, Performance Standards for
Music (1996); and (c) assembling a collection of 31 articles replete with rubrics, strategies, and
other assessment best practices, Spotlight on Assessment in Music Education (2001).
Nearly ten years later, leaders within the National Association for Music Education
(NAfME) were still responding to policy pressures to improve music teacher assessment
knowledge and practices. In 2009, NAfME released an official position statement on
“Assessment in Music Education” wherein they asserted that “assessment, and the accountability
that stems from the public dissemination of the results of assessment, are key components in
building quality instructional programs.” NAfME also addressed the challenges imposed by
high-stakes assessment and the logistical obstacles of designing, implementing, and analyzing
classroom assessments for music educators. However, the most strongly worded statement was
about the responsibility of the music educator:
While the forms and content of music assessment may appropriately vary, some form of
regular assessment of music programs should be adopted. The assessment should
measure student learning across a range of standards representative of quality,
balanced music curriculum, including not only responding to music but also
creating and performing music. This assessment should serve the goal of educational
accountability by providing data that can be included in the school- or district-level
“report card” disseminated to the public as required by law. [Bolded by NAfME]
Colwell and other experts have argued that efforts to reform teachers’ assessment
practices have had “minimal impact upon the classroom” with regard to “any effort to improve
teaching and learning in music” (2008, p. 7). While teachers may be generally more aware of
sound assessment principles, they remain unlikely to employ them in practice (Sears, 2002).
Researchers conducting studies after the standards-based reforms of the 1990s, but prior to the
accountability-based reforms of No Child Left Behind (2001), found that the majority of
teachers’ assessment practices tended to be informal (i.e., observational, large-group feedback
during instruction) and that grades served accountability and motivational functions (i.e.,
behavioral or attitudinal criteria constituted a large proportion of grades) (Hanzlik, 2001; Hill,
1999; McClung, 1996; Simanton, 2001). In several studies, researchers found that innovative
assessment practices, such as student portfolios, accounted for less than 20% of music teachers’
assessment strategies (Hanzlik, 2001; Hill, 1999; Simanton, 2001). Thus, while national efforts
to enhance assessment practices among music teachers have taken hold to a certain extent,
changes have been slow.
More recently, researchers have found that music teachers tend to focus their classroom
assessment practices on the evaluation of student performance skills, but non-musical criteria are
still emphasized to a considerable extent when assigning grades (Austin & Russell, 2017;
Kancianic, 2006; LaCognata, 2010; Russell & Austin, 2010; Sherman, 2006; St. Pierre &
Wuttke, 2017). The disconnect between policy efforts to improve how music teachers assess and
grade their students, and their actual practices, is complex. Music teachers experience the same
policy pressures as their peer educators outside of music, but these pressures are compounded by
unique features within their jobs: community expectations for performances; program history
and enrollments; music teachers working with many of the same students over multiple years
(i.e., social consequences affecting and arising from assessment); and competition for resources
and time. These issues frame the beliefs music teachers may hold about assessment. Some
researchers have cited the importance of autonomy and its impact on teacher beliefs about
assessment (Box et al., 2015; Fulmer et al., 2014; Simanton, 2001), while others attribute
deficient practices to a lack of adequate training in assessment (Austin & Russell, 2016; Russell
& Austin, 2010; St. Pierre & Wuttke, 2017). Other researchers have cited participant
philosophies about the purpose of a music education (i.e., music classes should be focused on
non-academic outcomes: fun, enjoyment, community engagement, etc.) (LaCognata, 2010,
Richerme, 2016). This attitude is particularly common amongst secondary music educators, who
often self-identify as directors rather than music educators (Isbell, 2008), and have expressed
feelings that assessment is outside the purview of their role (Denis, 2018).
Definitions & Purposes of Assessment
There is little doubt that some of the confusion surrounding the purposes of assessment
can be attributed to the myriad definitions of assessment. In a 2009 chapter of Assessment
Policy: Making Sense of the Babel, Herman and Baker astutely observed that:
“Assessment, test, measure, instrument, [or] examination metric? Tests or assessments
are instruments used to collect and provide information; they are composed of measures
that can be numerically summarized...A metric is an indicator divided by some other
variable, such as time or cost. Although the term “test” often connotes more traditional
kinds of measures, and assessment a wider array of tasks and item types, we use the two
terms interchangeably.” (p. 176)
Herman and Baker commented upon but a few dimensions of assessment: type, form, and
method. When these terms are used interchangeably, it is no wonder that confusion abounds. Indeed, a review of
music education research literature confirms there are numerous definitions for assessment.
General education researchers have fared no better in their efforts to clearly explain what
assessment means or entails. When defining assessment, some experts focus on the means by
which learning information may be collected, while others focus on the process or outcome. This
conceptual disparity over what assessment is and does reflects the confusion surrounding its
dimensions (e.g., structure, scale, interpretation, and consequences). In the most general sense
assessment could be defined as any range of methods or processes applied to the pursuit of
evaluating learner performance (Popham, 2009). For the purposes of this study, assessment is
defined as the process, inclusive of any purpose or method, of evaluating student learning before,
during, or after the instructional cycle.
Even once definitional consensus is reached, identifying the desired function of an assessment
is critical, as the same assessment tool could be used for multiple purposes. Teachers
utilize assessments for a number of curricular and non-curricular purposes in instruction,
including diagnosing areas of improvement or need for students, placing students in instructional
groups or supplemental programs, assigning grades, providing feedback about progress to
students and parents, controlling student behavior, communicating achievement expectations,
and teaching concepts and skills to students (Airasian, 2004). Researchers and educational
experts have routinely criticized teachers for this “hodgepodge” of academic and nonacademic
purposes informing their classroom assessment practices (McMillan, 2001, 2003, p. 34; Schafer,
1993).
Assessment functions are partially determined by their location in the overall
instructional cycle (Figure 1.2). In a 2015 brief, the U.S. Department of Education defined the
instructional cycle as four recurring components: “selecting an instructional strategy,
implementing the strategy, collecting data on strategy implementation (i.e., assessment), and
analyzing the data and reflecting on the results” (p. 2). This cycle can take place over the course
of a lesson, and/or be nested within a larger curricular design. An assessment -- even the same
assessment tool -- placed at any point throughout this cycle would have a distinct purpose. In a
special issue of Music Educators Journal, Goolsby (1999) defined four primary purposes of
assessment: placement, summative, diagnostic, and formative. All four can be distinguished by
the time in the instructional cycle during which they occur, as well as the kind of educational
decisions for which such assessments are used.
Figure 1.2
The Cyclic Nature of Assessment & Educational Decision Making in Instruction
Placement assessments typically occur prior to instruction and provide the teacher with
information about students’ abilities needed to properly place them within groups. In a music
context, such assessments may include auditions, ensemble seating/part assignments, and seat
challenges. Summative assessments typically occur at the conclusion of a teaching cycle and
provide information about a group or individual’s mastery of the content. Concerts, festivals,
recitals, and other kinds of performances or musical products are classic examples of summative
assessments in music settings. Diagnostic assessments are used to determine where learning
difficulties exist for students, so that teachers can tailor instruction to remedy gaps in knowledge
or skill. For music teachers, this often requires using task analysis to determine which musical
element students are struggling with (i.e., rhythmic, melodic, harmonic, or technical elements of
the music), and devising exercises to strengthen deficiencies. Formative assessments are used
throughout the instructional cycle to provide feedback to students about their progress in meeting
a desired and explicitly stated learning outcome. In the music classroom this may look like
feedback from the teacher during rehearsals, short quizzes, exit tickets, or other checks for
understanding.
More recently, scholars and practitioners have reimagined the purposes of assessment
using the language assessment of learning, assessment for learning, and assessment as learning
(Scott, 2012). This shift in the language about the purposes of assessment captures the parallel
shift in curriculum design toward a student-centered model. In a 2006 guide developed by Earl
and Katz on behalf of the Canadian government, full chapters were devoted to unpacking the
definition of each term, the curricular implications, and assessment practices corresponding to
each purpose. According to Earl and Katz (2006), “assessment of learning refers to strategies
designed to confirm what students know, demonstrate whether or not they have met curriculum
outcomes or the goals of their individualized programs, or to certify proficiency and make
decisions about students’ future programs or placements” (p. 55). Thus, assessment of learning
encompasses the summative and placement purposes, as well as some of the accountability
functions of assessment (Stiggins, 2002). Often, assessment of learning is separated from the act
of teaching and learning within the instructional cycle. It is also inclusive of high-stakes forms of
assessment. With this type of assessment, teachers are responsible for collecting and analyzing
data and awarding grades; such assessment, according to Scott (2012), is something that is “done
to students” (p. 32).
Assessment as learning “focuses on students and emphasizes assessment as a process of
metacognition (knowledge of one’s own thought processes) for students…[and] emerges from the
idea that learning is not just a matter of transferring ideas from someone who is knowledgeable
to someone who is not…” (Earl & Katz, 2006, p. 41). Thus, assessment as learning
acknowledges that learning is flexible and fluid rather than linear and rigid. The role of the
teacher is to facilitate student understanding by designing instruction that allows students to
think about and monitor their own learning (p. 42). Assessment as learning is rooted in
reflection-as-practice, and positions assessment as something that is “done by” students (Scott,
2016, p. 32). While this is, on some level, the most aspirational form of assessment, the
consensus among experts, policymakers, and researchers is that assessment for learning
represents the most actionable means of changing teaching practice and enhancing student
learning outcomes.
Assessment for learning “occurs throughout the learning process...it is designed to make
each student’s understanding visible, so that teachers can decide what they can do to help
students progress” (Earl & Katz, 2006, p. 29). This definition encompasses the formative and
diagnostic purposes of assessment, and fully empowers the teacher to make educational
decisions about instruction based on ongoing assessment interpretations and feedback to
students. Assessment for learning has gained traction as a way to reimagine assessment within a
post-high-stakes assessment environment (Hansen, 2019; Stiggins, 2004, 2005; Wiggins &
McTighe, 2006). This orientation positions assessment as something that is “done for” students
to enhance instruction and learning (i.e., educative assessment), positions teachers as
collaborative facilitators, and embeds assessment practice directly in instruction, rather than as a
stand-alone event (Scott, 2016, p. 32).
Ideally, teachers utilize a variety of assessment strategies and purposes throughout the
course of instruction. No single purpose of assessment should supersede another; balance is
critical to meeting the needs of students, teachers, administrators, and other educational
stakeholders. Educational experts have established the motivational and engagement benefits of
assessment for learning (Earl & Katz, 2006; Stiggins, 2004, 2005). Assessment of learning has
been used for decades to determine proficiency throughout a student’s K-12 and higher education
experience, and to make important decisions about promotion, credentialing, and the quality of
instruction being provided to students. Assessment as learning represents a newer perspective --
one of empowering students as co-constructors of learning and understanding in the classroom
(Earl & Katz, 2006). Collectively and ideally, the well-balanced adoption of these varied
assessment functions meets the multifaceted needs of all educational stakeholders.
Assessment Literacy
Researchers have shown that basic knowledge of assessment is not sufficient to change
assessment practice. That is, teachers can be generally knowledgeable about what kinds of
assessments are available to them, but not make connections between that knowledge and the
instructional practices that follow. Understanding both the components and processes of
assessment constitutes procedural knowledge, or assessment literacy. Assessment literacy has
been defined as “an understanding of the principles of sound assessment” (Crusan et al., 2016, p.
43). Stiggins stated that “assessment literates know the difference between sound and unsound
assessment...they are not intimidated by the sometimes mysterious and always daunting technical
world of assessment” (cited in Mertler, 2009, p. 102). Functionally, assessment literate “teachers
must not only be competent to develop and use high-quality authentic assessments and scoring
rubrics, but also be able to master evaluative skills to make sound judgments about student
performance” (Koh, 2011). Most importantly, assessment literacy is contextually situated; Willis
et al. (2013) explained:
“Assessment literacy is a dynamic context dependent social practice that involves
teachers articulating and negotiating classroom and cultural knowledges with one another
and learners, in the initiation, development, and practice of assessment to achieve the
learning goals of students” (p. 2).
Thus, to be assessment literate is to be both fluent and adaptable in the knowledge and uses of
assessment within a specific context.
Assessment literacy is a transferable knowledge base and set of skills that may be
developed by both preservice and inservice teachers. Ryan (2019) adapted the Classroom
Assessment Literacy Inventory (CALI) to include parallel measures of assessment literacy and
confidence. Ryan found significant relationships between preservice teacher GPA and
assessment knowledge, confidence, and edTPA assessment task ratings. This suggests that there
may be a link between assessment literacy and confidence in affecting teacher practice, as
preservice teachers are required to demonstrate assessment competencies through five rubric-
based evaluations associated with the edTPA assessment task. What remains unclear is the
degree to which confidence about assessment literacy reflects ongoing teacher learning about
assessment as opposed to existing practices or beliefs.
Assessment Beliefs
Some researchers posit that increasing assessment literacy in the teacher workforce is not
sufficient to effect change in teachers’ practices (Ludwig, 2013; Ryan, 2019; Xu & Brown, 2016).
Researchers have suggested that teacher beliefs about assessment may be as important as their
assessment knowledge (Barnes et al., 2017; Brookhart, 2011; Deneen & Brown, 2016). The ways
teachers conceptualize assessment (i.e., what assessments are, and their purposes), and their
feelings surrounding assessment (i.e., value judgements, past experiences, and preferences), can
directly impact their educational decision making (Deneen & Brown, 2016). To illustrate, a
teacher who is very knowledgeable about assessment methods, but who assigns a lower priority
to assessment than to planning and instruction, may opt for the assessment methods that require the
least effort, even if they are not especially effective.
Assessment conceptions encompass “teachers’ general views about what assessment is
and its purposes in school and in society” (Fulmer et al., 2015, p. 479). Researchers have found
that teachers conceive of assessment in three broad ways: for instructional feedback (i.e.,
formative feedback to students, instructional decision making for the educator), for accountability
(i.e., of students, in grading practices, of their teaching, etc.), and as irrelevant (i.e., high-stakes
standardized tests that do not provide disaggregated data at the classroom level, testing that takes
away from instructional time, etc.) (Opre, 2015).
The terms beliefs and conceptions are used interchangeably within the literature;
however, beliefs are understood to be inclusive of conceptions (Fulmer et al., 2015). Assessment
beliefs comprise the values, conceptions, and attitudes teachers hold; they “merge affect and
concept” (William, 1979, as cited in Fulmer et al., 2015, p. 478). Teachers’ assessment beliefs
may also be linked to confidence in their knowledge and skills. Austin and Russell (2019)
collected data about preservice music teachers’ (N = 75) confidence in their ability to properly
assess K-12 music students. They found that preservice teachers with more assessment training
felt more confident in their knowledge and skills. Ryan (2018) collected information about
preservice teachers’ confidence in their assessment knowledge but found that those with the least
knowledge were typically the most confident. Ludwig (2013) collected data from 160 teachers
about their assessment confidence and beliefs. Ludwig also found that the most well-trained and
confident teachers tended to have the most positive conceptions of assessment. While
recognizing that confidence and positive conceptions may not translate into implementation of
sound assessment practices, Ludwig concluded that increasing assessment literacy in teachers,
and providing opportunities for teachers to collaborate and reflect on their conceptions, were
necessary to improve student involvement in assessment (2013, p. iv). Consequently, it may be
important to consider how teachers’ beliefs about assessment, in tandem with their assessment
literacy, inform assessment practices.
Teacher Preparation and Development
Despite the development of the Standards for Teacher Competence in Educational
Assessment of Students (STCEAS), a shift toward standards-based curriculum, and political
support for accountability-based educational reform -- all of which were intended to elevate
teachers’ assessment competencies -- education expert Stiggins (2014) argued that these efforts
“paradoxically, have been barriers to developing assessment competence in the classroom.”
One such barrier is the slow pace at which teacher preparation programs have made substantive
changes to assessment curriculum amid competition for curricular space from courses tied to teacher
certification and licensing (Darling-Hammond et al., 2002; DeLuca & Klinger, 2011; DeLuca et
al., 2010; Gareis & Grant, 2015). Mertler noted, “ironically, in this age of increased emphasis on
testing and assessment, many colleges of education and state education agencies do not require
preservice teachers to complete specific coursework in classroom assessment” (2004, p. 50).
Even as teachers are pressured to improve student learning outcomes, and evidence suggests that
effective teacher-constructed assessments can contribute to significant gains on standardized
tests (Mertler, 2004), preservice teachers are not provided adequate training in assessment. It is small
wonder that “beginning teachers continue to feel unprepared to assess student learning”
(DeLuca & Bellara, 2013, p. 357, as cited in Gareis & Grant, 2015).
Some teacher preparation programs have responded by developing stand-alone courses
for minimal credit (Austin & Russell, 2016, 2019; Gareis & Grant, 2015). Evidence for the effectiveness of
such additions to the teacher education curriculum has been inconclusive, however. Gutierrez (2014)
found that “the amount of teacher training explain[ed] 17% of the variability in classroom
assessment practices, while teachers’ assessment knowledge explain[ed] 38% of such variability
in assessment practices…the amount of teacher training did not significantly predict teachers’
assessment knowledge” (p. 4). Clearly, beyond offering units or entire courses focused on
assessment to preservice teachers, it is important that the instruction associated with those
offerings be of a high enough quality to enhance both assessment knowledge and practical
application of that knowledge in assessing learning. Undergraduate music education curricula
are characteristically dense, with few programs offering formal training in assessment principles
(May et al., 2017). Austin and Russell (2019) found that preservice music teachers who received
a greater amount of assessment education valued assessment more and were more confident in
their assessment abilities, but still anticipated using assessment, at least in part, to target and
document non-musical outcomes (e.g., behavioral compliance, rehearsal attendance, punctuality,
citizenship, etc.). Few students had experience implementing assessments in their field
placements, however, which may have limited their ability to conceive of or appreciate how
assessments might be used to target and promote music learning.
Given that undergraduate music education majors typically report that few (if any) class
sessions are devoted to assessment throughout their teacher preparation course work, graduate
study might be considered a stopgap or compensatory means of improving music teacher
assessment literacy, confidence, and practices (Austin & Russell, 2019). Yet, there are no
guarantees that music teachers engaged in graduate education will have access to high-quality
assessment courses. Austin and Russell (2016) estimated that fewer than half of all students
pursuing master’s degrees in music would complete an entire course on assessment; only 58% of
institutions reported offering stand-alone assessment courses, and only 72% of those that offered
such a course required graduate students to take it. Thus, it may be important to consider
how professional development programming could be used to make assessment education
accessible to more music teachers, and whether this instructional format (which would
necessarily be less comprehensive and more condensed than a dedicated course) might be
suitable for enhancing assessment literacy, beliefs, and practices. Additionally, education reform
efforts and trends tend to shift faster than teacher preparation and graduate education programs
can respond. While providing a foundational understanding of effective assessment
principles should be an attainable objective for teacher preparation programs, ongoing education
is necessary to keep assessment principles and practices current and relevant to inservice
teachers.
Professional Development and Assessment Literacy
One possible avenue for addressing teachers’ assessment literacy, beliefs, and practices is
through inservice teacher professional development. Professional development, like assessment
itself, has proven difficult to define given the “broad-based assumption that teachers already
know what professional development is” (vanOostveen et al., 2019, p. 1876). Generally,
researchers ascribe desired outcomes for professional development, or compare it to other
professions’ development activities. Because teachers are continually developing instructional
competencies, some researchers have shifted from the language “professional development” to
“job-embedded professional learning” (Zepeda, 2019, p. 3). Regardless of terminology,
professional development can be defined as any range of learning opportunities provided for
teachers to improve instructional practice in the service of student learning outcomes.
Professional development has traditionally been delivered in a face-to-face format
(McConnell et al., 2013). Teachers now utilize novel online information and communication
technology (ICT) for professional development opportunities (Wasserman & Migdal, 2019).
Whereas traditional formats often require all participants to meet for a short period of time in a
prescribed location -- which may present logistical obstacles for many teachers -- online formats
provide teachers from different districts and schools the chance to engage in collaborative
professional learning communities (PLCs) at a distance. Further, the kinds of activities that
teachers can engage in through online professional development are often unique in comparison
to face-to-face formats. For example, vanOostveen et al. (2019) investigated the efficacy of
Professional Development Learning Environments (PDLEs; a series of learning tasks and video-
based case studies) in online professional development and found evidence of “some effect on
beliefs about personal theories of learning” (p. 1864). Boling et al. (2011) found that online
professional development was most effective when designers eschewed traditional (i.e., face-to-
face) activities, and fully embraced the pedagogical opportunities of digital formats.
Online professional development is also a potentially viable option for alleviating teacher
concerns of relevancy and effectiveness. One of teachers’ primary concerns is that professional
development may not address their unique contextual factors (i.e., school- or district-level factors,
content area, and experience level of the teachers) (Guskey, 2003, 2009). Often, face-to-face
formats (e.g., workshops, conferences, coaching, professional communities, etc.) necessitate
presenting educational information in a general manner that is incompatible with teachers’
personal needs (Cook et al., 2017). Online professional development holds the potential for more
individualized and longer-term engagement with teachers than periodic or short-term face-to-
face formats.
During development of the STCEAS, the collaborating organizations emphasized that
teacher assessment literacy must be cultivated during preservice and inservice teacher education
(AFT et al., 1990). Wang et al. (2008) described four models for developing teacher assessment
literacy ranging from face-to-face training to graduate coursework for credit. As a test of one
model, they implemented an online training system for preservice teachers to improve
assessment literacy and found that those in the treatment group improved their assessment
knowledge and conceptions (changes in actual assessment practices were not considered). In
2001, staff in Lincoln (Nebraska) Public Schools sought assistance in developing ongoing
professional development in assessment literacy for teachers (Lukin et al., 2004). The
subsequent program, the Assessment Literacy Learning Team (ALLT), was implemented with
approximately five percent of teachers. This program was unique in that teachers’ literacy,
confidence, and the quality of their subsequent assessments were appraised, as well as student
attitudes about themselves as learners. Researchers reported that teacher assessment literacy,
confidence, and the quality of classroom assessments improved, as did student beliefs and attitudes.
Huai et al. (2006) conducted a quasi-experimental study with 55 teachers from Arizona,
South Carolina, and Wisconsin to evaluate the effectiveness of an online professional
development program, Assessing One and All, in increasing teachers’ assessment literacy and
practices. Teachers completed the three-month online course, maintained journals, and kept
records of any other professional development they attended for the duration of the intervention.
Measures of assessment literacy and knowledge of assessment practices, obtained prior to and
following the course, were compared. Huai et al. found that “the multimedia, web-based AOA
course was effective in improving participants’ knowledge and self-efficacy with regard to
general and inclusive educational assessments” (p. 257). This study was unique in that it
evaluated an exclusively online professional development program and required teachers to
maintain journals of their experiences.
Long-term or intensive interventions may be key to changing teacher assessment
practices. Mertler (2009) designed a two-week-long intervention for inservice teachers in a
mixed methods study. He found that the intervention was effective in increasing assessment literacy
scores, as well as improving teacher perceptions of assessment as reported in their journals. Like
Wang et al., Mertler did not examine whether the intervention changed teacher assessment
practices in the long term. Koh (2011) investigated whether ongoing and sustained professional
development activities or short-term, one-shot workshops were more effective in cultivating
increased assessment literacy among teachers. Koh directed professional development for two
groups of teachers randomly assigned to each condition. While teachers receiving one-shot
workshops demonstrated improved performance on assessment tasks in the near term, teachers in
the sustained professional development group had higher levels of assessment literacy one year
later.
Study Need and Significance
Teachers face increasing pressure from political, societal, and local forces to provide
evidence that students are learning at an appropriate level (Colwell, 2008; Mertler, 2009;
Nierman & Colwell, 2019; Stiggins, 2014). Past emphasis on the accountability purposes of
assessment, however, coupled with inadequate assessment training, has left teachers skeptical
and uninformed as to how assessment might be used to improve teaching and learning at the
classroom level (Stiggins, 2014). Teachers also use assessments for purposes other than
providing feedback or measuring achievement (McMillan, 2003), which undermines the integrity
of the assessment process and diminishes trust in teachers and the information they share about
student learning. Teachers’ educational decision making is a complex process that encompasses
their knowledge, beliefs, expectations, values, their classroom contexts, and external factors
(e.g., local and state policies, parents, demands from administration, etc.) (Figure 1.3). Their
subsequent assessment practices reflect the intersecting – and potentially conflicting – forces at
play. To date, researchers in music education have primarily focused on music teachers’
assessment practices, and devoted nominal attention to music teachers’ assessment literacy and
beliefs. General education researchers have yet to explore the interaction of assessment literacy,
beliefs, and practices, although it is an often-cited implication for future research efforts (Fan et
al., 2011; Mertler, 2009; Quilter & Gallini, 2000).
Figure 1.3. Teachers' Classroom Assessment Decision Making (adapted from McMillan, 2003).
Possible solutions for increasing assessment literacy among teachers include adding
assessment coursework to teacher preparation programs (DeLuca & Klinger, 2010), or enhancing
inservice teacher assessment literacy through professional development (Huai et al., 2006; Koh,
2011; Mertler, 2009; Wang et al., 2008). In research addressing music teacher perceptions of
preservice coursework, participants have reported poor connections (i.e., a decontextualized
experience) between theory and practice (Conway, 2002, 2012). General education researchers
have also noted that the “combination of theoretical and practical study is a particularly
important change from the traditional approach, which front-loads theory, does not enable
applications, and therefore does not support grounded analysis of teaching and learning”
(Darling-Hammond, 2006, p. 154). The same criticism can be leveled toward poorly constructed
professional development. If opting to enhance teacher assessment literacy through professional
development, integrating the theoretical and practical applications in teachers’ classrooms is
critical to grounding their confidence and beliefs.
To date, efforts to expand inservice teacher assessment literacy through professional
development have consisted mainly of short-term face-to-face interventions (Lukin et al., 2004;
Mertler, 2009), online interventions (Koh, 2011), or a mixture of the two (Wang et al., 2008). All
have reported varying levels of success in enhancing teachers’ assessment literacy, while only
some have attempted to measure teachers’ confidence in their assessment abilities, beliefs about
assessment, or changes in assessment practices. Given that teachers’ assessment practices are a
complex sum of teachers’ knowledge, beliefs, and contexts, providing professional development
that addresses all these components appears vital to changing educational decision-making
processes long-term. Such an endeavor will not be without challenge. Teachers’ educational
decision-making processes are impacted by the complex and recursive interaction of their
philosophical beliefs, logistical challenges and realities, and external social and political
pressures (McMillan, 2003). Yet, providing professional development targeted toward enhancing
— even incrementally — music teachers’ assessment literacy and practices, while engaging them
in reflection about their beliefs, may have an impact on their subsequent educational decision
making.
This study was unique and significant in several ways. First, I investigated the
intersection of assessment literacy, beliefs, and practices; as previously mentioned, researchers in
general and music education have yet to simultaneously explore these concepts. I also adapted a
measure of assessment literacy — the Classroom Assessment Literacy Inventory by Mertler
(2000) — to a music education context, within an experimental research design that allowed me
to make statistical comparisons across conditions and time. Further, the use of an intervention to
enhance music teachers’ assessment literacy, beliefs, and practices is a novel approach that has
not previously been employed in music education research.
Purpose & Research Questions
The primary purpose of this study was to examine the impact of an online professional
development intervention on music teachers’ assessment literacy, beliefs, and practices. The
intervention was designed to improve music teachers’ classroom assessment literacy, beliefs, and
practices via a four-week online, module-based course. Assessment literacy was measured using
an adapted version of Mertler’s (2000) Classroom Assessment Literacy Inventory (CALI). To
measure assessment practices, as represented by the forms and functions of assessment most
frequently implemented in the classroom, participants responded to a researcher-devised measure
(the Music Teacher Assessment Implementation Inventory). Finally, to evaluate music teacher
beliefs, participants indicated their level of agreement with 17 statements comprising the Music
Teacher Assessment Beliefs Inventory (MTABI).
The research questions for this investigation were:
1. Does a four-week online professional development intervention have a significant effect
on music teachers’ assessment literacy?
2. Does a four-week online professional development intervention have a significant effect
on music teachers’ assessment beliefs?
3. Does a four-week online professional development intervention have a significant effect
on music teachers’ assessment practices?
4. Are there significant relationships between music teachers’ assessment literacy, beliefs,
and practices?
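Research questions 1 through 3 imply a comparison of pretest-to-posttest change between the intervention and control groups. The sketch below illustrates one common way to frame such a comparison (an independent-samples t-test on gain scores). All scores are invented for illustration only; they are not the study's data, and the study's actual analyses are reported in later chapters.

```python
# Hypothetical sketch: comparing pretest-to-posttest gains between an
# intervention group and a control group. All scores below are invented.
from scipy import stats

# Invented CALI-style scores (number of items answered correctly, out of 35)
intervention_pre  = [20, 22, 19, 23, 21, 24, 20, 22, 18, 23]
intervention_post = [26, 27, 24, 28, 25, 29, 26, 27, 23, 28]
control_pre       = [21, 20, 23, 22, 19, 24, 21, 20, 22, 23]
control_post      = [22, 20, 24, 22, 20, 25, 21, 21, 22, 24]

# Gain scores isolate each teacher's change from pretest to posttest
gain_int  = [post - pre for pre, post in zip(intervention_pre, intervention_post)]
gain_ctrl = [post - pre for pre, post in zip(control_pre, control_post)]

# Independent-samples t-test on the gain scores
t, p = stats.ttest_ind(gain_int, gain_ctrl)
print(f"mean gain (intervention) = {sum(gain_int) / len(gain_int):.2f}")
print(f"mean gain (control)      = {sum(gain_ctrl) / len(gain_ctrl):.2f}")
print(f"t = {t:.2f}, p = {p:.4f}")
```

With real data, an ANCOVA using pretest scores as the covariate is a common alternative to gain-score comparisons for pretest-posttest control group designs.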
Hypotheses
Based upon the literature, I formed directional hypotheses about how these variables
would evolve throughout the study. With regard to assessment literacy, I anticipated that music
teachers would score poorly across groups (i.e., control and intervention) on the pretest, and that
those in the intervention group would show growth; Mertler (2002, 2009) demonstrated that
teacher knowledge is relatively malleable with direct and/or sustained intervention. I also
hypothesized that music teachers’ beliefs about assessment would reflect the findings of Austin
and Russell (2017); those with less knowledge or experience with assessment would tend to
value it less than peers with more knowledge or experience. Researchers have continually
reported -- and lamented -- widespread use of informal assessment, or behavioral and
accountability uses of assessment, within the music teaching profession (Hanzlik, 2001; Hill,
1999; Kancianic, 2006; LaCognata, 2010; McClung, 1996; Sears, 2002; Simanton, 2001).
Consequently, I anticipated that music teachers would confirm prior findings before the
intervention. I hoped that the intervention group would show increased usage of formal
formative assessments (e.g., written assignments, projects, portfolios, individual playing
assessments), and less reliance on behavioral measures such as attendance and participation. I
also hoped that those in the intervention group would see the utility of assessment for serving a
wider variety of functions in the music classroom, such as placement and diagnostic functions,
rather than a reliance on summative and extramusical functions. Collectively, I anticipated that
these findings would lend credence to the conceptual model McMillan (2003) designed, by
demonstrating that changes to music teachers’ knowledge and beliefs surrounding assessment
could be leveraged to change practices.
Definitions
For the purposes of this study, I defined the most salient terms and constructs; these
definitions reflect their use within education and assessment research literature. Assessment
refers to the process of gathering or eliciting information about student learning, and any specific
methods for doing so. This definition accounts for the various dimensions of assessment: its
structure (e.g., informal, formal, standardized), scale (e.g., classroom-level, state-level, or
national-level), format (e.g., traditional or alternative), purpose (e.g., diagnostic, placement,
formative, or summative), interpretation criteria (i.e., in relation to prior achievement, standards,
or norms), and the consequences or outcomes attached to results (i.e., no-stakes, low-stakes, or
high-stakes).
Assessment literacy is the adaptable knowledge of processes and methods used to
evaluate student learning, as well as the practices best suited for specific learning contexts. It is
not enough to know the processes and methods associated with assessment; teachers must also be
able to use them to make sound educational decisions in a variety of contexts. Assessment
literacy “involves the understanding and appropriate use of assessment practices along with the
knowledge of the theoretical and philosophical underpinnings in the measurement of students’
learning” (DeLuca & Klinger, 2010, p. 420). This distinction -- between knowing and using -- is
a core tenet of modern cognitive theory (Wiggins & McTighe, 2005).
Assessment beliefs are the conceptions and values teachers hold about assessment.
Conceptions include teachers’ knowledge about what assessment is and personal views about
how it should be used. Teachers’ beliefs about the fundamental purposes and goals of assessment
may influence their practices, as well as their responses to efforts designed to alter those
practices. Examining teachers’ assessment conceptions in tandem with their assessment literacy
may prove integral to changing teacher assessment practices in the long-term.
Assessment practices encompass the specific forms and functions for which music
educators gather information about student learning. Researchers in general education and music
education have examined both the specific forms (e.g., written assessments, individual
performance tasks, group performance tasks) and the purposes for which teachers employed
assessments (e.g., formative, summative, accountability). Because there is no definitional
consensus, I elected to collect information about both the forms of assessment music teachers use
in their classrooms and the functions those assessments serve.
Professional development encompasses any range of activities and methods geared
toward improving teacher practice in the service of improving student learning outcomes. Thus,
professional development can occur in face-to-face or online formats. For the purposes of this
study, professional development consisted of an online intervention delivered over the course of
four weeks. There were four modules, each one week in length, corresponding to the first four
STCEAS competencies. These standards state that teachers should be skilled in (a) choosing
appropriate assessment methods, (b) developing appropriate assessment methods, (c)
administering, scoring, and interpreting externally produced and teacher-produced assessments,
and (d) using assessment results for educational decision making.
Delimitations
The results of this investigation were examined with the following delimitations in mind.
Assessment Literacy
The Standards for Teacher Competence in Educational Assessment of Students (STCEAS)
encompass seven unique competencies that teachers must possess to be considered assessment
literate. However, it is evident from the general and music education literature that many
teachers struggle with the first four standards, and that these are the standards most closely
associated with classroom-level assessment practices that may have a meaningful impact on
student outcomes. Thus, the first four standards were the only ones addressed in the online
professional development course and adapted CALI measure. The CALI was also selected
because it presents questions in the form of realistic vignettes that participants evaluate; rather
than measuring inert knowledge about assessment, respondents must utilize procedural and
applicable knowledge about assessment. This most closely embodies the skillset associated with
the STCEAS and assessment literate teachers.
Sampling
Via email, I solicited participants from a nationwide population of approximately 20,000
music teachers holding NAfME membership, who presumably were technologically capable
enough to access and complete the study. I recognized that NAfME membership is not
necessarily representative of the entire music teacher profession. However, I solicited
participants from a nationwide sample in hopes that the analyses would be adequately powered
for the pretest-posttest control group design and number of variables. This study was conducted
during the COVID-19 pandemic in the spring of 2020; this may have been a significant factor
in participants’ decisions not to engage with the study, or to drop out. My final sample
comprised 43 music educators: 18 in the intervention group and 25 in the control group.
Measures
I chose to measure assessment literacy using a pre-existing quantitative questionnaire
adapted to align to the first four STCEAS and music teacher contexts. While there were other
measures and strategies available, I selected the Classroom Assessment Literacy Inventory
(CALI) specifically because of (a) its alignment to the STCEAS, to which I also aligned the
modules of the intervention; (b) its use of vignettes to frame questions, thus requiring respondents
to authentically apply their knowledge; and (c) its status as the measure most widely utilized by
researchers who have examined assessment literacy in inservice and preservice teacher
populations. After reviewing the literature, I also decided to construct my own instrument to
measure the frequency with which music teachers self-reported utilizing assessments in specific
forms, and for specific functions. While there are some noted flaws with collecting self-report
data about assessment practices, direct observation and documentation of music teachers’
assessment practices were not viable. Finally, I utilized a pre-existing and validated measure, the
MTABI, from Austin and Russell (2017) to measure music teachers’ beliefs about the purposes
and value of assessment. This measure imposed an a priori conceptualization of assessment
beliefs; in the interest of maintaining a reasonable workload for intervention participants, I did
not utilize more exploratory strategies (e.g., interviews or journaling prompts).
Researcher Interest
I would be remiss if I failed to express my own interests in conducting this study. I
pursued a career in education due to the influence of my mother. She is a middle school librarian,
STEM teacher, and state assessment coordinator. She started teaching over twenty years ago,
halfway through my childhood. I assisted her during many summers: setting up her classroom,
moving desks into nonsymmetrical (to my chagrin) learning centers, alphabetizing student
folders, and making copies. Little did I know I was watching her explore the possibilities of
Deweyian learning concepts in her own teaching. She fully embraced project-based learning
early in her career, and incorporated assessment as learning principles into her instruction. She
was truly a model educator for assessment. And I, unknowingly, was learning from a master
teacher.
When I began my undergraduate career, Mom and I had many conversations about the
role of the teacher in directing (my words) and facilitating (her words) instruction. My decision
to become a music teacher brought me, unexpectedly, to a precipice. I had spent 15 years
watching my mother teach and assess in a student-centered way. Yet, by choosing to study
music, I was confronted with the ways that I had witnessed my former and current teachers teach
and assess, and experienced cognitive dissonance. Rehearsals had always been teacher centered.
Assessment, if it happened at all, appeared to only exist as a box to check at the end of a marking
period or semester, and usually to appease administrators. “We assess constantly,” my professors
and cooperating teachers told me. True, the informal and formative strategies that I learned to
name during my studies (i.e., error detection, pedagogy) are a type of assessment. But I had seen
what a master teacher can do with assessment, and how powerful it could be for student learning.
My mother was able to elevate the learning of second graders and middle schoolers alike.
During my first few years of teaching I, like many young teachers, reverted to the
comfortable routines of what I knew from my experiences as a music student. Rehearsals
resembled those that I had participated in as a middle school, high school, and university student.
While my students did get better at their instruments, sing in tune, march in step, and play piano
with appropriately raised and relaxed wrists, I didn’t see evidence that they could carry these
skills outside of my classroom without being directed. I had taught them to rely on me in order to
be musical. They were terrific musicians, but their musicianship was inert.
Over the next few years I continued talking to my mother about teaching. I attended
professional development. I read about teaching music from experts, including an article on
assessment from my future advisor and mentor James Austin. I started adapting the principles I
had witnessed in my mother’s classroom. I grew less afraid of losing control of my classroom
and using student voice and choice to frame my rehearsals. I had students devise criteria for
evaluating performances, self-evaluate, and demonstrate growth (not just proficiency!) in their
musicianship. I used project-based learning to encourage students to arrange their own music;
some of it we even performed in concert. Student performance on my traditional paper-and-
pencil summative assessment even increased. I found students slowly taking their skills out of
the classroom. I know they had musical lives before, but now they could use the skills they had
acquired in class to enhance their musical lives out of class. By expanding my assessment palette
from the default informal strategies that I had witnessed and been taught, my classroom became
a colorful space for transformational learning.
I believe in the power of effective assessment. I know music teachers lead underappreciated
and thanklessly busy professional lives. I know that high-stakes testing and assessment have left a
sour taste in the mouths of all teachers, especially music teachers, who have been robbed of
already scarce instructional and planning time. I know that professional development can often
feel like an administrative hoop to jump through, and that trends in education often change faster
than administrators change schools. I still believe in the power of effective assessment. The key
to changing teacher practice is a combination of building knowledge, building confidence, and
giving teachers space to experiment and collaborate. I was fortunate to have my mother down the
hall for the first few years of my career. I was lucky to have a master teacher as my mother. So I
believe that I can help other music teachers see the value of effective assessment, too.
Chapter 2
Review of Related Literature
Music teachers’ assessment practices often reflect a lack of awareness regarding
assessment principles designed to promote and document learning in an effective manner at the
classroom level. This problem may be due, in part, to teachers’ negative experiences -- both as
educators and former students -- and their association of assessment with high-stakes,
standardized testing. Many music teachers also report having received inadequate training in
assessment as part of their teacher preparation programs, resulting in gaps in their knowledge
base and skill set (i.e., lack of assessment literacy). Researchers have found that teachers’
assessment practices may be influenced by their beliefs about assessment (Harris & Brown,
2009), including the major purposes that assessment should serve, if assessments provide a
trustworthy basis for making educational decisions (Olsen & Buchanan, 2019), and whether
assessment is even appropriate where music making and learning are concerned (Denis, 2018).
Professional development may prove a viable avenue to educate inservice teachers about
assessment; online professional development can be highly efficacious, in part, due to the unique
delivery formats available in digital platforms. In this study, I provided a four-week online
professional development intervention to a voluntary sample of music educators. Through the
use of an intervention design, I was uniquely situated to explore the impact of the intervention on
music teachers’ assessment literacy, beliefs, and practices.
Assessment is a vast topic that scholars have examined from numerous vantage points.
To include the sum of assessment research literature in this chapter would be unfeasible.
Therefore, for the purposes of this study, I delimited the range of literature to scholarly, peer-
reviewed articles published after 1990. In addition, I delimited research in general education to
articles addressing the assessment literacy, beliefs, and practices of teachers broadly (i.e., not
content area-specific). Comparatively fewer music education scholars have examined the
assessment literacy, beliefs, and practices of inservice music teacher populations; therefore, I
have occasionally included articles that were topically relevant, but that described preservice
teacher populations. The body of research reviewed in this chapter was organized into three main
sections according to major research outcomes for this study: (a) assessment literacy; (b)
assessment beliefs; and (c) assessment practices. Each section includes subsections for
assessment research focused on teachers in general education and research focused on teachers in
music education, as well as a section summary.
Assessment Literacy
Since the publication of the STCEAS in 1990, researchers have developed numerous tools
to measure assessment literacy in inservice teacher populations. The STCEAS have been used as
a benchmark for establishing the content validity of a number of assessment literacy measures,
including those developed by Impara et al. (1993) and Mertler (2001, 2004, 2009), and
assessment conceptualization measures by Brown and colleagues (2004, 2006, 2011, 2012,
2015). Critics of these measures caution that the STCEAS are not inclusive of all the factors that
inform teachers’ assessment practices (Alkharusi, 2015; Brookhart, 2011; Hailaya et al., 2014;
Stiggins, 1999). In the following section, I will chronologically explore the development,
validation, and use of assessment literacy measures in both general education and music
education contexts.
Assessment Literacy of Teachers
Impara et al. (1993) were among the first educational researchers to examine teachers’
assessment literacy. They developed the Teacher Assessment Literacy Questionnaire (TALQ),
and asked teachers to respond to items about both preservice and inservice assessment literacy
training, their assessment practices, their comfort level in interpreting standardized test
information, assessment preferences, and interest in increasing their assessment literacy. Using a
national sample of 555 teachers from 42 states (47% response rate), they found that teachers
harbored positive feelings about the role of classroom assessments, but less positive feelings
about standardized instruments. Impara et al. reported 46% of teachers did not feel comfortable
interpreting information from standardized tests. While 70% of teachers reported having some
training in testing and measurement, 30% reported no training at all. Further, 59% of teachers
reported a preference for training to occur as part of their inservice experience. Perhaps most
interestingly, the researchers found that teachers who were least interested in becoming
assessment literate were also the least confident in their abilities to assess (p. 116). This study
provided researchers -- such as McMillan -- with a foundational understanding of the concepts
and issues surrounding assessment literacy, and teachers’ general beliefs about their own skills.
Using the STCEAS as a basis for measuring assessment literacy competencies, Mertler
(2004) devised and administered the Classroom Assessment Literacy Inventory (CALI) to a
sample of 67 preservice and 10 inservice teachers. The CALI consists of 35 items (5 items per
STCEAS standard, presented via vignettes in consecutive order), and was adapted from Impara et
al.’s (1993) TALQ. Mertler conducted descriptive analyses and t-test comparisons of preservice
and inservice teachers’ mean scores for each of the seven subscales, as well as the total score for
the instrument. In all cases where there were significant differences in scores between inservice
and preservice teachers, inservice teachers scored higher. Both groups answered an average of 22
of 35 items correctly. Mertler reflected that “traditional teacher preparation courses in
classroom assessment are not well matched with what teachers need to know for classroom
practice” (p. 60). He acknowledged that there were limitations with regard to sampling and
generalizability, and the internal consistency of the assessment literacy measure.
Building upon the 2004 study, Mertler and Campbell (2005) revised the CALI and
renamed it the Assessment Literacy Inventory (ALI), intending to evaluate preservice teacher
populations. This instrument was also based upon the STCEAS, but was organized differently
than the CALI in that there were only five vignettes (i.e., one item per standard, per vignette).
This investigation included a two-stage pilot of the ALI with 152 preservice teachers in the fall of
2003, and 249 preservice teachers in the spring of 2004. Mertler and Campbell found increased
internal consistency (α = .74) in comparison to the CALI for this population. As in the earlier
study, Mertler and Campbell found that preservice teachers answered an average of 23 out of 35
items correctly; they considered this result puzzling – and lower than expected – given preservice
teachers’ recent coursework. They concluded that “because the ALI is specifically designed to
measure the real-world application of assessment concepts and competencies outlined in The
Standards, limited familiarity and experience with the day-to-day realities of the classroom may
have precluded preservice teachers from making necessary connections” (p. 13). They also
concluded that researchers should evaluate the psychometric properties of the ALI with inservice
teacher populations.
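The internal consistency coefficients reported throughout these validation studies (Cronbach's alpha) can be computed directly from item-level response data. The sketch below uses invented dichotomous item responses, not data from any study cited here; it simply illustrates the statistic behind values such as α = .74.

```python
# Hypothetical sketch of Cronbach's alpha, the internal-consistency
# statistic reported in these studies. Item responses are invented.
import statistics

def cronbach_alpha(items):
    """items: list of per-item score lists, each with one entry per respondent."""
    k = len(items)                                        # number of items
    item_vars = [statistics.variance(item) for item in items]
    totals = [sum(scores) for scores in zip(*items)]      # total score per respondent
    # alpha = (k / (k - 1)) * (1 - sum of item variances / variance of totals)
    return (k / (k - 1)) * (1 - sum(item_vars) / statistics.variance(totals))

# Five invented dichotomously scored items (1 = correct) for eight respondents
items = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 1, 0, 1, 1],
    [1, 1, 0, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 0, 1, 1],
]
print(f"alpha = {cronbach_alpha(items):.2f}")  # → alpha = 0.82
```

Alpha rises when items covary (respondents who get one item right tend to get the others right), which is why it is read as evidence that a scale's items measure a single construct.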
Four years later, Mertler (2009) used the ALI with a small sample of inservice teachers.
In a mixed-methods intervention study, Mertler administered the ALI as a pre- and posttest
measure to seven elementary school teachers as part of an intensive two-week professional
development inservice intended to increase assessment literacy. In addition to the ALI, Mertler
used journal prompts to corroborate quantitative results regarding perceived assessment literacy
and growth throughout the intervention. Topics in the two-week intensive included “norm- and
criterion-referenced measurements, validity and reliability of assessments, the integration of
teaching and assessment, construction and use of traditional assessments, construction and use of
authentic assessments, grading, and the interpretation of standardized test results” (p. 104).
Participants completed nine assessment tasks in the two-week intensive. Mertler found that
teachers’ ALI scores increased from pretest to posttest, but did not analyze whether such changes
were statistically significant due to the small sample size (N = 7). The largest changes were for
standards 5 (“Teachers should be skilled in developing valid pupil grading procedures which use
pupil assessments”; + 2.00 out of five) and 2 (“Teachers should be skilled in developing
assessment methods appropriate for instructional decisions”; +1.86 out of five). Through the
journal prompts, teachers revealed that their assessment knowledge was initially limited, but that
the intervention was “highly beneficial to their work” (p. 111). Teachers also found that the
intensive format of the professional development was helpful because it required them to put
aside time to deliberately consider their current understanding and practices. Mertler did acknowledge the
limitations of this intervention study, including the limited sample size, and generalizability of
the results to different groups of teachers. However, Mertler concluded that “performance-based
inservice teacher training sessions, which focus on applied assessment decision-making, could
prove to be beneficial to a majority of classroom teachers” (p. 112).
One of Mertler’s recommendations for future researchers was to validate the use of the
ALI with inservice teacher populations. Hailaya et al. (2014) surveyed 582 inservice teachers
with the ALI to validate the instrument, employing Rasch and confirmatory factor analysis
techniques. The researchers made slight adjustments to the vignettes used in the ALI, mainly to
suit prospective Tawi-Tawi and Philippine respondents and their context. They piloted the
measure with 45 elementary and secondary teachers and found acceptable measures of internal
consistency for the subscales (α = .75). Then, they administered the instrument to the selected
sample of 582 teachers (100% response rate). After removing a single item for violating
assumptions required for the Rasch model, they conducted a confirmatory factor analysis. They
found that the model had appropriate item-level fit and that “all items are appropriate in
measuring teacher assessment literacy and reflect the unitary dimension of the scale pertaining to
assessment literacy” (p. 305). With regard to how the items loaded onto the subscales (n.b., each
subscale corresponding to a standard in the STCEAS), both the Rasch and CFA results indicated
poor fit due to overlapping items. Hailaya et al. had three primary conclusions: (a) there was an
absence of hierarchy among items and factors; (b) factors and standards could not be used
interchangeably in interpreting the validity of the instrument; and (c) the ALI may have other
factorial or structural variance across cultures. They asserted that the STCEAS standards upon
which the ALI is based may not be “sufficiently comprehensive” to assess teachers’
understanding of assessment as it relates to the “realities of the classroom” (p. 312). They also
noted that modern statistical techniques may not be appropriate for interpreting the validity of an
instrument created using classical testing theory.
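For readers unfamiliar with the technique, the dichotomous Rasch model referenced in these analyses has a simple closed form. The probability that respondent j, with latent assessment literacy θ_j, answers item i (with difficulty b_i) correctly is:

```latex
P(X_{ij} = 1 \mid \theta_j, b_i) = \frac{e^{\theta_j - b_i}}{1 + e^{\theta_j - b_i}}
```

Items "fit" the model when observed response patterns match these probabilities; because a single θ per respondent drives every item, good Rasch fit is read as evidence of a unidimensional construct, which is the property Hailaya et al. examined.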
Similarly, in 2015 Alkharusi examined the psychometric properties of the Teacher
Assessment Literacy Questionnaire (TALQ), upon which the CALI and ALI are based. Alkharusi
administered the TALQ to 259 preservice teachers enrolled in an assessment course in Oman
after translating the instrument into Arabic. He found that the TALQ items demonstrated
“acceptable levels of difficulty, discrimination, reliability, and validity” (p. 1), measured a
unitary construct of assessment literacy, and correlated positively with course scores. Construct
validity (unitary model fit) was measured using a CFA, χ2(329) = 990.762, RMSEA = .08, CFI =
.89. Alkharusi found the instrument to have high internal consistency (α = .84). He concluded
that the TALQ was a viable tool for instructional and assessment purposes in preservice teacher
populations but cautioned that it should also be validated in other countries.
Some researchers have targeted specific skills within assessment literacy in their
development of measures. Donovan, in her 2015 dissertation and subsequent 2018 article,
developed the Teachers’ Knowledge and Use of Data and Assessments (tKUDA) measure. The
measure specifically captures teachers’ knowledge and use of assessments and assessment data
in educational decision making. Both her dissertation and article described the psychometric
development and validation of the instrument (30 items; 15 for knowledge, and 15 for use).
Donovan used two samples to calibrate (n = 201) and validate (n = 164) the instrument. She
identified assessment knowledge and assessment use as separate constructs and subsequently
conducted Rasch analyses on each set of items separately. She found excellent model fit and
unidimensionality for both constructs. However, at the item level, several items across both
constructs did not fit the model. She did find strong evidence of internal consistency for both
constructs (knowledge, α = .95; use, α = .96). She concluded that further calibration of the items
was warranted, but that results confirmed other researchers’ findings about teachers’ ability to
analyze data from assessments, and use such information for educational decision making.
Ryan (2018) sought to create and evaluate an instrument that would measure the
assessment literacy of preservice teachers. She also investigated the relationship between
assessment literacy and confidence scores and scores on the edTPA (i.e., to assess the convergent
validity of the edTPA as a measure of preservice teachers’ assessment literacy). She adapted the Classroom
Assessment Literacy Inventory (CALI) by adding a confidence scale item after every literacy item
on the inventory. After piloting the instrument with 165 sophomores and juniors within one
teacher preparation program in the Midwest, she modified the instrument. This initial sample
was used to evaluate the internal structure and validity of the instrument. As discussed above, the
CALI was designed in alignment with the STCEAS. Ryan used Rasch and CFA to evaluate the
internal structure and factor loading of individual items onto the standards (used as subscales for
the measure). She did not “draw any definitive conclusions in support of one internal structure,
but [the CFA] results from this study at least demonstrate that the ‘clean’ and ‘tidy’ Standards-
based conceptualization of assessment knowledge is questionable, and perfect alignment [of the
items] with the seven standards is highly improbable regardless of the sample used” (p. 244).
Ryan suggested researchers may be inclined to use parts of the CALI to assess dimensions of
assessment knowledge based upon other theoretical understandings of assessment literacy.
A subsequent sample was used to investigate the relationship between the CALI and a
teacher performance measure (i.e., the edTPA). The second sample (n = 112 seniors) was
obtained from the same program. Ryan found numerous significant relationships for preservice
teachers’ cumulative GPA with assessment knowledge, assessment confidence, and the edTPA
scores. Specifically, the senior’s assessment knowledge score was moderately and positively
correlated to their edTPA score (r = .257, p = .011). Further, “the significant relationships
between cumulative GPA and edTPA total (r = .381, p < .001) and edTPA Assessment (r = .395)
[ratings] were positive and moderate to strong” (p. 232). When controlling for cumulative GPA,
the significant relationship between assessment knowledge and edTPA performance became
“nonsignificant” (p. 235). She concluded that cumulative GPA was an important predictor of
performance on “external assessment literacy tests and performance-based exams” (p. 235). She
also found that confidence had a moderating, but not predictive, effect on assessment knowledge.
Students in the secondary program reported higher confidence than early childhood program
students, but scored lower on assessment literacy items. She advised that teacher preparation
programs with high academic standards may be able to rely on their curriculum and grading
procedures to demonstrate preservice teacher assessment literacy, rather than using the edTPA.
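Ryan's control for cumulative GPA amounts to a partial correlation: residualize both scores on GPA, then correlate the residuals. A minimal sketch with synthetic data (the variable names and values are illustrative, not Ryan's):

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after regressing out z from both."""
    z1 = np.column_stack([np.ones_like(z), z])
    rx = x - z1 @ np.linalg.lstsq(z1, x, rcond=None)[0]
    ry = y - z1 @ np.linalg.lstsq(z1, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(0)
gpa = rng.normal(3.0, 0.4, 112)                    # hypothetical cumulative GPAs
knowledge = 0.8 * gpa + rng.normal(0, 0.3, 112)    # both scores driven by GPA
edtpa = 0.8 * gpa + rng.normal(0, 0.3, 112)

r_zero = np.corrcoef(knowledge, edtpa)[0, 1]
r_partial = partial_corr(knowledge, edtpa, gpa)
```

Because both synthetic scores are driven largely by GPA, their zero-order correlation is sizable while the partial correlation collapses toward zero, mirroring the pattern Ryan reported.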
Researchers have explored relationships between teachers’ assessment beliefs (or
attitudes about assessments) and their assessment literacy (McMillan, 2000; McMillan & Nash,
2003). Quilter and Gallini (2000) devised their own instrument to measure this relationship for a
sample of 117 inservice teachers in southeastern Michigan. Like other assessment literacy
measures (e.g., TALQ, CALI, ALI, tKUDA), this instrument was designed to cover the seven
STCEAS. Quilter and Gallini adapted the TALQ, reducing it to 21 items (three per standard).
They found that the average item difficulty for this sample of teachers was .68, and the
average item discrimination was .28. Thus, the measure was appropriately challenging and
discriminated well at the item level. The measure demonstrated weaker internal consistency (α =
.50) than the original TALQ, perhaps due to reducing the number of items per standard. Quilter
and Gallini found that their sample (N = 117) scored similarly to the original TALQ sample
studied by Impara et al. (1993); 91% of the sample answered correctly for the items
corresponding to the third standard, and 60% of the sample answered correctly for the items
corresponding to the second standard. Teachers’ past experiences with standardized testing and
classroom assessment were positively correlated with their current attitudes, while current
attitudes toward classroom assessment were negatively related to current attitudes toward
alternative assessments. Overall, Quilter and Gallini found that teachers’ personal experiences
and attitudes played a more important role than professional training, and that “teachers’
current attitudes toward educational practice result from a mix of affective and cognitive
variables, with more emphasis on affective variables” (p. 128).
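The item statistics Quilter and Gallini report come from classical test theory: difficulty is the proportion answering correctly, discrimination is a corrected item-total correlation, and internal consistency is Cronbach's alpha. A minimal sketch (the toy response matrix is illustrative, not their data):

```python
import numpy as np

def item_stats(scores):
    """Classical test theory statistics for a 0/1 response matrix
    (rows = examinees, columns = items)."""
    difficulty = scores.mean(axis=0)          # proportion answering correctly
    total = scores.sum(axis=1)
    # discrimination: correlation of each item with the total of the rest
    discrimination = np.array([
        np.corrcoef(scores[:, j], total - scores[:, j])[0, 1]
        for j in range(scores.shape[1])
    ])
    k = scores.shape[1]                       # Cronbach's alpha (KR-20 here)
    alpha = (k / (k - 1)) * (1 - scores.var(axis=0, ddof=1).sum()
                             / total.var(ddof=1))
    return difficulty, discrimination, alpha

# Toy matrix of three perfectly parallel items
scores = np.array([[1, 1, 1],
                   [1, 1, 1],
                   [0, 0, 0],
                   [1, 1, 1],
                   [0, 0, 0]])
difficulty, discrimination, alpha = item_stats(scores)
```

For these artificial parallel items, each difficulty is .60 and alpha reaches 1.0; Quilter and Gallini's lower values (difficulty .68, discrimination .28, α = .50) reflect real, noisier data and the shortened 21-item form.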
Recognizing that prior affective experiences may have a greater impact than training on
teachers’ educational decision making, Fan et al. (2011) investigated the effectiveness of a web-
based assessment literacy program in enhancing secondary inservice teachers’ assessment
knowledge and perspectives. A sample of 47 secondary math and science teachers participated in
a six-week summer program. The program was unique in its ability to deliver individualized,
situated professional development to the participants. Fan et al. administered a researcher-
devised instrument, the 40-item Assessment Knowledge Test (AKT), to evaluate teachers’
understanding of assessment principles (e.g., construction of multiple-choice items, reliability
and validity, item discrimination, etc.) as described in the STCEAS standards (p. 1734). They
also administered a Survey of Assessment Perspectives (SAP) to evaluate teachers’ perspectives
toward assessment functions and procedures. They found that teacher participation in the web-
based program improved their assessment knowledge and perspectives, “especially those in the
low-level prior knowledge group” (p. 1738). They concluded that future research was needed “to
explore whether other factors might have an impact on assessment literacy for inservice teachers”
(p. 1739), particularly whether teachers’ background and instructional experiences moderate
their knowledge and use.
...of Music Teachers.
One of the first investigations of assessment training for inservice music teachers was
conducted by Austin and Russell (2016). Because many states require teachers to obtain master’s
degrees for licensure after a probationary license, Austin and Russell hypothesized that graduate
programs may offer a potential access point for increasing teacher literacy. Using a researcher-
developed instrument, they surveyed faculty from 69 music schools holding National Association
of Schools of Music (NASM) accreditation (33% response rate). They found that graduate
courses specifically focused on assessment were offered at only 58% of institutions. According
to respondents, when stand-alone assessment courses were not offered it typically was because of
a perception that (a) such material was already adequately covered throughout other courses, (b)
a lack of instructional time and/or limited program enrollment prohibited offering an entire
course focused on assessment, and/or (c) program philosophy did not support assessment as a
major curricular strand. Within institutions that offered an assessment course, only 72% of them
required master’s students to take the course, and only 33% required doctoral students to do so.
Such courses were typically delivered in a face-to-face format, rather than an online or hybrid
format. The most important learning outcomes for assessment courses were developing rubrics,
aligning assessments to objectives, and using formative assessments and feedback to improve
student learning. While Austin and Russell examined graduate courses that may lead to increased
assessment literacy, they did not measure assessment literacy of teachers specifically.
...in Summary.
Assessment literacy measurement is attributable primarily to the work of Impara et al.
(1993), Mertler (2000), and Mertler and Campbell (2005) in general education. Nearly all
assessment literacy inventories (e.g., TALQ, CALI, ALI, ALICE) have been based upon the
competencies outlined in the seven STCEAS. Mertler developed two of the most frequently used
and/or adapted instruments, the CALI and ALI. In both of these measures, Mertler used vignettes
to evaluate teachers’ assessment literacy via 35 items corresponding to the STCEAS standards
(five items per standard). In the CALI, the vignettes (and items) present each STCEAS standard in order
(i.e., items 1-5 represent the first standard, 6-10 the second, and so on). In the ALI, there are five
vignettes, each with seven questions; each question represents one item from the seven
standards. Researchers examining the reliability of these measures with inservice teacher
populations have often found that literacy scores lack adequate internal consistency, and that the
evidence of construct validity corresponding to the seven standards (i.e., the STCEAS) is not
always satisfactory or does not exhibit good fit (Alkharusi, 2015; Ryan, 2018). However, the
CALI remains the most comprehensive measure of teachers’ assessment literacy available.
General education researchers are beginning to investigate the intersection of assessment
literacy and assessment beliefs, because there is evidence that teacher practice may be influenced
by teachers’ personal experiences, conceptions, confidence in their ability to execute
professional judgement related to assessments, and their valuation of assessment. Similar to
McMillan and Nash’s (2001) conceptualization of assessment practice, teachers’ assessment
literacy appears to moderate, in some fashion, their educational decision-making process and
subsequent assessment practices. Many teachers appear to know which assessments should be
used and how, but are reluctant to use them in their instructional practices because of
conflicting notions about why they should assess, how assessment fits into their broader
philosophical beliefs, or the negative consequences of providing students, parents, administrators,
or other educational stakeholders with assessment information that may be disappointing.
Assessment Beliefs
As previously discussed, education researchers generally use the terms beliefs,
conceptions, and values interchangeably, with the understanding that beliefs encompass affective
(i.e., values) and objective (i.e., conceptions of what assessment is and how it functions)
dimensions (Fulmer et al., 2015; Opre, 2015). Thus, dimensions of belief include personal
meanings derived from experience, abstract mental images, feelings, forms of knowledge, rules,
and preferences (Box et al., 2015; Opre, 2015). Researchers who have studied assessment
literacy and teacher assessment practices generally have found that internally constructed beliefs
about assessment may moderate the influence of assessment literacy on teacher practices (Brown
& Michaelides, 2011; Deneen & Brown, 2016; Ludwig, 2013; Nyberg, 2016). Further,
researchers have found that teacher beliefs about assessment tend to be reinforced by personal
beliefs about other educational matters (e.g., curriculum, roles of the students and teachers,
content area) and assessment usage. That is, teachers’ assessment beliefs become reified
over time by their unique contexts and experiences, and reinforce practices that reflect such
beliefs (Box et al., 2015). Most researchers have only addressed teacher beliefs within one
dimension of assessment (i.e., formative or summative, informal or formal). In this section, I will
first review the literature in general education surrounding teacher assessment beliefs. Then, I
will describe emerging research in music education on this topic. Finally, I will summarize the
most significant findings within this body of research.
...of Teachers.
In 2000, McMillan and Nash conducted a qualitative inquiry with 24 elementary and
secondary English and mathematics teachers selected for maximum variation. The purpose of
this study was to explore the reasons teachers give for their assessment and grading practices, as
well as the factors that influenced such decisions. Participant interviews were transcribed and
coded eclectically (i.e., both deductively and inductively). The final emergent themes comprised
McMillan’s (2003) conceptual model of teachers’ educational decision making (Figure 1.3, p.
41): this model included teacher beliefs and values, classroom realities, external factors, decision
making rationale, and assessment and grading practices. McMillan and Nash found that “the
most salient internal factor that appears to influence teacher decision making concerning
classroom assessment and grading practices is the teacher’s philosophy of teaching and
learning,” explained as “assessment and grading practices are whatever will best serve the
purposes that are linked to a larger, more encompassing philosophy of education” (p. 10).
Scholars have yet to expand upon this model or the role that teachers’ beliefs play in assessment
and grading practices.
Gavin Brown, in partnership with other researchers, has established an extensive line of
research addressing teachers’ beliefs and their influence on subsequent assessment behaviors.
Brown (2006) devised the Teachers’ Conceptions of Assessment (TCoA) measure as part of his
dissertation. A subsequent article summarized the psychometric properties of an abridged
instrument. The TCoA-III consisted of 27 (rather than the original 50) statements using a positively-packed
response scale derived from a previous iteration of the TCoA (n.b., two negative options, and
four positive agreement options). The statements addressed the four main purposes of
assessment: (a) assessment makes schools accountable, (b) assessment makes students
accountable, (c) assessment improves education, and (d) assessment is irrelevant. These were
derived from earlier multi-level and multifactorial models of teacher conceptions as developed
by Brown. The measure was administered to a sample of 692 teachers from Queensland. Brown
used CFA to establish construct validity (χ2(311) = 1492.61, p < .001, RMSEA = .074, TLI =
.80). Items loaded well on each of the first- and second-order factors in the sample. He
concluded that the abridged version of the TCoA was more efficient than the original, while
providing similar quality information.
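The RMSEA values reported throughout these CFA studies follow directly from the model chi-square, its degrees of freedom, and the sample size. A minimal sketch of the standard formula (assuming the conventional N − 1 denominator):

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation from a model chi-square,
    its degrees of freedom, and sample size (conventional n - 1 form)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))
```

Plugging in Brown's reported values, rmsea(1492.61, 311, 692) gives roughly .074, matching the fit statistic above; values near or below .06 are commonly read as indicating good fit.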
In 2009, Harris and Brown used a phenomenographic approach to explore the purposes 26
New Zealand teachers ascribed to assessment. Despite Brown’s prior research surrounding this
topic, and development of the TCoA and its related forms, Harris and Brown argued that the
instrument may be limited because it only accounts for four possible purposes of assessment.
They chose this approach because “it is based on the assumption that people
hold multiple, and at times, contradictory conceptions within their frame of reference, making it
impossible to claim that any particular participant ‘holds’ just one specific conception” (p. 367).
Participants were selected for maximum variation and interviewed with a semi-structured
protocol. Data were analyzed in multiple steps. First, preconceived ideas were bracketed or
excluded to reduce researcher subjectivity. Data were coded inductively, systematically and
iteratively within and across cases. Categories were formed by grouping based upon frequency,
position, and pregnancy of the statement. Subsequently, passages were grouped to create “pools
of meaning” (p. 368). Harris and Brown identified seven major purposes of assessment: (1)
compliance, (2) external reporting, (3) reporting to parents, (4) extrinsically motivating students,
(5) facilitating group instruction, (6) teacher use for individualizing learning, and (7) joint teacher
and student use for individualized learning. These findings align with other researchers’ reports
that teachers’ educational decision-making is a complex interaction of internal and external
factors.
Remesal (2011) also conducted a qualitative inquiry into teachers’ assessment beliefs to
construct a new model of conceptions of assessment. She used two sequential interview
techniques with 50 primary and secondary math teachers in Spain. First, participants were
interviewed with a semi-structured protocol. Then, one month later, participants were asked to
provide examples of typical classroom assessment material and interviewed using a “critical
event recall” protocol. In this way, the artifacts would serve as a point of triangulation for the
teachers as they referenced their rationale for constructing, implementing, scoring, and
interpreting data. Remesal identified four purposes for assessments spanning a continuum from
pedagogical concerns to accountability concerns. The four purposes that operated along this
continuum were the effect of assessment on (a) learning, (b) teaching, (c) the certification of
learning, and (d) the accountability of teaching. Remesal argued that her findings demonstrated
the complexity of school assessment, and the limitations of current analyses (e.g., dichotomous
distinctions between purposes, or static categorizations) in understanding teachers’ rationales.
Segers and Tillema (2011) used Brown’s abridged instrument, the TCoA-III, to evaluate
Dutch secondary school teachers’ (n = 351) and students’ (n = 712) conceptions of the purposes
of assessment. A maximum likelihood factor analysis (varimax rotation) of the teacher
data initially yielded nine factors accounting for 48.8% of the variance. Subsequent models had
inappropriate cross-loadings of items between factors. The final four-factor solution explained 34%
of the variance, and included formative purposes for assessment (19.5%), school accountability
purposes (6.3%), perceptions that assessments are irrelevant or inaccurate (4.6%), and
perceptions that assessments have reliability and validity (3.6%). Segers and Tillema also used
maximum likelihood factor analysis (varimax rotation) with the student data, and initially
reported a seven-factor solution accounting for 55.4% of the variance. This model had similar
inappropriate cross-loadings of items between factors. The final five-factor solution accounted
for 46.2% of the variance. The factors included “supports learning” (18.6%), “student
accountability” (8.8%), “the experience of assessment as enjoyable” (6.9%), “the positive effect
of assessment on the supportive and collaborative climate in the class” (6.5%), and “school
accountability” purposes (5.3%) (p. 51). Segers and Tillema framed their conclusions within the
larger political environment of Dutch education; that is, policy and education experts had been
endorsing a shift from using assessment for summative and accountability purposes (i.e.,
assessment of learning) toward formative and instructional decision-making purposes (i.e.,
assessment for learning). They concluded that, in this sample, students were closer to embracing
this perspective than teachers.
In 2011, Brown et al. developed a new self-reporting inventory to examine Hong Kong
and Chinese teachers’ assessment beliefs. This measure, the Chinese-Teachers’ Conceptions of
Assessment (C-TCoA), introduced two new constructs (“development” and “control”), and was
translated into Cantonese and Putonghua. Modifications to each form were made to obtain a
natural and appropriate flow in each language. Participants were purposively sampled, as random
sampling has been shown to be ineffective in Chinese contexts (p. 310). In a Cantonese region of
China, the researchers obtained 1014 survey responses (69% response rate), and in Guangzhou,
they obtained 898 survey responses (80%). They used EFA and CFA to develop a well-fitting
model that explained Chinese teacher responses to the instrument. Brown et al. found a seven-factor
model with acceptable fit (χ2(414) = 3479.15, RMSEA = .062, CFI = .87). However, upon closer
inspection of factor intercorrelations, the researchers determined the model may require a
second-order structure. The second, hierarchical model with three intercorrelated factors had
worse fit, but it was still acceptable (χ2(426) = 3856.94, p < .001, RMSEA = .065, CFI = .85).
They also tested
the invariance of the model with both groups (i.e., Hong Kong and Guangzhou), and found
statistically significant differences in the way the two groups answered the instrument. They
concluded that the new inventory — adapted specifically for Chinese contexts — was a valid
tool to measure Chinese teachers’ assessment beliefs. The role of accountability and control was
found to be important to this population.
In another investigation, Brown et al. (2011) administered the TCoA-IIIA to a sample of
1,525 teachers (47.3% response rate) from Queensland, Australia. This instrument differs
importantly from the TCoA-III in that two of the purposes are considered
hierarchical: “improvement” contains four sub-factors each with three items, and “irrelevant” has
three sub-factors each with three items. Brown et al. found that the CFA model for Queensland
teachers was inadmissible; that is, it did not fit the data. Because CFA models are specified a
priori from theoretical hypotheses, the research team speculated that it may
be necessary to consider alternative models with different paths. They also theorized that teacher
characteristics (e.g., primary or secondary) may not be invariant. Subsequently, they specified
models for primary and secondary teachers. They found, after also introducing two new paths
between first-order factors, that the respecified model had considerably improved fit (χ2(309) =
2741.56, p < .04, RMSEA = .05, CFI = .846), indicating that the two groups responded to the
instrument differently. The researchers used MANOVA to investigate whether this
difference was significant, and found that the groups were significantly different, but the effect
size was small to moderate. Brown et al. concluded that the quality of the instrument was “such
that differences between groups were most likely due to real population differences rather than
chance artefacts in responding to the questionnaire” (p. 218). This was important, because
subsequent research by Brown and others would be conducted on the basis that differences
between cultural groups needed to be measured. With regard to policy and teachers’ assessment
conceptions, Brown et al. argued their results demonstrate that “policy makers, professional
developers, teacher educators, and administrators may have failed to persuade teachers that the
currently available assessment systems provide informative, valid, and improving techniques” (p.
218).
Brown et al. (2012) sought to determine if teachers’ understandings of feedback (n.b.,
“understandings” as the analog for beliefs, feedback as the analog for formative assessment)
influenced the type and quality of feedback provided to students. Using a self-devised
instrument, the Teachers’ Conceptions of Feedback Inventory (TCoF), the researchers solicited a
sample of 518 New Zealand teachers to respond to six-point, positively packed agreement scales
corresponding to assessment purpose and task statements. The researchers found, using structural
equation modelling, that teacher beliefs about formative feedback were influenced by “feedback
practices that they control (i.e., teacher-centric) [that] are used for the explicit purpose of
improving the quality of student learning outcomes” (p. 974). They also found that “teachers’
understanding of feedback as a set of practices in which they are not involved was predicated by
an understanding that feedback requires self- and peer-interaction” (p. 947); teachers appeared to
believe assessment was something done to students rather than something done with students.
Allal (2013) investigated Swedish teachers’ assessment beliefs through the lens of
socially-situated practice by conducting two interviews with each of 10 sixth-grade teachers.
Prior to each interview, teachers were asked to select two students “for whom the teacher had
hesitated when completing the students’ report card between the grades of 3 (‘objectives nearly
attained’) and 4 (‘objectives obtained’)” (p. 25). Allal’s rationale for focusing on this scenario
was that teachers would best be able to display evidence of professional judgement, and that
requiring teachers to bring materials to the interview would help them re-enact their practice and
reasoning in authentic ways. Data were coded inductively. The major themes were “professional
judgement as a cognitive act” and “professional judgement as a socially situated practice” (p.
27). Allal found that teachers used, more often than other appraisal techniques, informal
observation as the basis for making final decisions in these scenarios. Teacher participants
frequently articulated that they made these decisions using the “sum” of students’ efforts within
classes but were not often able to point to specific examples of student work. Allal concluded
that future efforts to enhance teachers’ practices and professional judgement should focus on
helping teachers document and communicate, coherently and transparently, the rationale for their
grading procedures.
Azis (2015) used an explanatory mixed methods design to investigate teachers’
assessment beliefs and how such beliefs related to their practices. Using a sample of 107 middle
school English teachers, Azis administered Brown et al.'s (2010) Teacher Conceptions of
Assessment (TCoA). This measure consisted of several subscales measuring teacher agreement,
indicated via 5-point scales, with items representing three dimensions of assessment function (as
cited in Brown et al., 2009): improvement, accountability, and irrelevance. Azis found the
highest levels of agreement with items related to the improvement dimension, closely followed
by accountability. In the follow-up qualitative phase, Azis interviewed four teachers who agreed
strongly with the improvement dimension. She found that teachers in this group “believed the
main purpose of assessment was to inform teaching” (p. 138), and that, subsequently, they
favored teacher-constructed tools and authentic tasks. The participants also felt external or
standardized assessments reduced their autonomy, had a negative impact on the equity of their
students, and were not credible (p. 143). In total, Azis found that teachers’ patterns of TCoA
responses were inconsistent with their practices; that is, teachers’ beliefs about the purposes of
assessment contradicted their actual practices.
Hidri (2015) explored assessment beliefs in Tunisian secondary school (n = 336) and
university (n = 336) teachers using Brown’s (2006) TCoA-III measure. Subsequent analyses were
carried out in four phases: EFA, PCA, dimension analysis, and CFA. Hidri used EFA to
investigate possible factor structure. She used Monte Carlo PCA parallel analysis (with the
sample data set at 100 cases, the PCA set at 2, and the desired percentile set at 95%) to
determine the statistically significant eigenvalues of each factor based on random data
generation. Then, she used dimension analysis to estimate the correct number of factors; this
produced different results from phase one to phase two. Finally, Hidri used CFA to investigate
the relationship paths between variables, to test data fit, and to check the indicators’
moderating influence on factors. Hidri found discrepancies among the methods of analysis. Data
were not separated (i.e., secondary and university teachers), which Hidri acknowledged may have
created issues for fitting models to (what may have been) data that were not invariant. With
regard to the results of the TCoA-III, Hidri concluded that “conflicting conceptions of
assessment might also impact teachers’ practices,” and that, despite her models diverging from
previous studies by Brown, there was a strong relationship between Tunisian teachers’
conceptions of assessment purposes for accountability and for improvement.
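The parallel-analysis step Hidri describes follows Horn's method: retain only the components whose observed eigenvalues exceed a chosen percentile of eigenvalues computed from random data of the same dimensions. A generic sketch (not Hidri's exact procedure; the two-factor synthetic data and settings are illustrative):

```python
import numpy as np

def parallel_analysis(data, n_sims=100, percentile=95, seed=0):
    """Horn's parallel analysis: retain components whose observed
    eigenvalues exceed the chosen percentile of eigenvalues obtained
    from random normal data of the same shape."""
    n, p = data.shape
    obs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    rng = np.random.default_rng(seed)
    rand = np.empty((n_sims, p))
    for i in range(n_sims):
        sim = rng.normal(size=(n, p))
        rand[i] = np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False))[::-1]
    threshold = np.percentile(rand, percentile, axis=0)
    return int(np.sum(obs > threshold))

# Demo on synthetic responses with two underlying factors
rng = np.random.default_rng(1)
factors = rng.normal(size=(400, 2))
loadings = np.zeros((2, 6))
loadings[0, :3] = 0.9   # items 1-3 load on factor 1
loadings[1, 3:] = 0.9   # items 4-6 load on factor 2
data = factors @ loadings + rng.normal(scale=0.3, size=(400, 6))
n_keep = parallel_analysis(data)
```

With 400 simulated respondents and two strong underlying factors, only the first two observed eigenvalues clear the 95th-percentile random threshold, so two components are retained.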
Ludwig (2013) investigated the relationship between 160 upstate New York public
school teachers’ conceptions of assessment and their confidence in their assessment knowledge.
She used Brown’s (2006) TCoA-III to evaluate teachers’ assessment beliefs, and Arter and
Busick’s (2001) Classroom Assessment Confidence Questionnaire (CACQ). She found that
teachers’ collective self-perceived agreement was greatest for the student accountability purpose
of assessment, but participants tended to rate themselves as most confident in effectively
communicating assessment results. Ludwig found a statistically significant difference in
confidence scores for teachers with varying types of assessment training. She concluded that
teachers should be provided with opportunities to (a) reflect on their assessment beliefs, (b)
collaborate with peers, and (c) enhance their assessment literacy, especially with regards to
improving student involvement in assessment.
Brown et al. (2015) examined teachers’ assessment beliefs in an Indian context, because
teachers are the main agents of educational reform in Indian schools. The researchers
hypothesized that teachers’ beliefs about the purposes of assessment would predict their
assessment practices, and that responses would differ between external (i.e., government- or
publisher-created) and internal (i.e., teacher-constructed) assessments, with preference given to
formative and diagnostic functions. Teachers were assigned a modified TCoA-III instrument
and answered questions about either internal (n = 603) or external (n = 649) assessments. All
teachers also took the Practices of Assessment Inventory (PrAI), a 32-item inventory developed in
Hong Kong to identify teacher agreement with assessment practices for specific purposes. As in
prior studies, Brown et al. used CFA to test the fit of a set of pathways within and among factors,
and to establish construct validity. The first nine-factor model was rejected because of negative
error variance in three first-order factors. The final four-factor model included the following
factors: improvement, irrelevance, control, and school quality accountability (χ2(293) = 2254.88,
RMSEA = .06, CFI = .82); groups were structurally invariant. For the PrAI, the researchers found
a four-factor solution using 29 of the 32 items with good fit (χ2(269) = 1887.16, RMSEA = .06,
CFI = .88). With regards to teachers’ conceptions and beliefs about the purposes of assessment,
researchers found that “regardless of internal or external conditions [they] still see assessment
predominantly around improving student learning by teaching for exams” (p. 59). The
researchers recommended that new resources and programs were needed to support teachers’
formative assessment practices. They lamented that “simply put, it seems that teachers in quite
diverse contexts believe a good school’s effect is seen on better examination performance”
(p. 60).
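Several of the studies reviewed here report model fit with χ2, RMSEA, and CFI. As a point of reference, RMSEA can be computed directly from the χ2 statistic, its degrees of freedom, and the sample size. The sketch below uses purely hypothetical values, not data from any study reviewed in this chapter.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation.

    chi2: model chi-square statistic; df: model degrees of freedom;
    n: sample size. Values near .06 or below are conventionally
    interpreted as good fit.
    """
    return math.sqrt(max((chi2 - df) / (df * (n - 1)), 0.0))


# Hypothetical model: chi2 = 120 on df = 60 with n = 400 respondents.
print(round(rmsea(120, 60, 400), 3))  # -> 0.05
```

Note that when χ2 does not exceed df, the index is truncated at zero, indicating no detectable misfit.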
Barnes et al. (2017) are among the few, and the most recent, researchers to examine U.S.
teachers’ assessment beliefs. Using a sample of 179 participants from the northeast region of the
United States, the researchers administered an abridged version of Brown’s (2006) TCoA - III.
They conducted an EFA and found that Brown’s four-factor model did not fit their data. They
then used principal axis factoring with promax rotation to identify a factor structure that best fit
the data. They found a
three-factor solution that accounted for 51.54% of the variance. They called the three factors
“assessment as valid for accountability” (10 items), “assessment improves teaching and learning”
(6 items), and “assessment as irrelevant” (8 items); thus, they merged the formerly separate
accountability factors from the school and teacher level. Barnes et al. theorized that “the items
considered to tap assessment for student accountability may be seen by teachers as a middle-
ground between assessment for teaching and learning and the more extreme accountability end
of assessment that is present in many U.S. teaching contexts” (p. 114). Like many researchers
before them, they also urged future researchers to “tease out how these conceptions of
assessment influence practice” (p. 115).
Most recently, Olsen and Buchanan (2019) conducted an inductive, multiple case study to
explore secondary teachers’ grading beliefs and practices. After reviewing many of the studies
discussed previously in this chapter, Olsen and Buchanan argued that modern approaches to
grading (e.g., standards-based, four-point systems, software packages or other technology)
necessitated an inductive examination of teachers’ grading practices. They collected data from
15 teachers and 2 principals in two New York schools undergoing a year-long professional
development program targeting grading practices. Data consisted of observations of professional
development meetings, semi-structured interviews with individual teachers throughout the year,
and grading documents teachers produced.
Olsen and Buchanan (2019) found evidence that teachers held seemingly contradictory
beliefs about the purposes of grading (i.e., a hodgepodge of academic and nonacademic factors),
reported a lack of professional development around grading for teachers (i.e., limited preservice
and inservice training about sound grading practices), expressed conflicting messages about the
purposes of schooling, and frequently adapted grading approaches to fit their classroom realities.
In interviews, teachers also noted the influence of their own personal experiences with grades as
students and teachers, and their desire to see students be successful (i.e., teachers felt guilty when
students demonstrated low achievement). Many teachers commented that grading practices
endorsed by the professional development program failed to capture the myriad ways in which
students learn or demonstrate growth. That is, they felt using strictly academic criteria was unfair
to a segment of students who put forth appropriate effort, were compliant, or showed growth but
not proficiency in a given skillset. Overall, Olsen and Buchanan noted that teachers’ grading
practices evolved slowly and recursively. They concluded that in order for such professional
development to be effective, schools or districts needed complete teacher buy-in.
Assessment Beliefs of Music Teachers.
Music education researchers have only recently begun examining assessment beliefs,
largely to determine if they have an impact upon future preservice and inservice teacher
assessment practices. Leong (2014) examined Singaporean music teachers’ conceptions of
classroom assessment in an attempt to “unravel the multiplicity of significant relationships of
concepts within the specific context of music teachers’ classroom decision-making
environments” (p. 454). Through the theoretical lens of Alexander’s (1992) conceptions of music
classroom assessment, Leong examined the complex intersection of music teachers’ values,
beliefs, needs, required professional duties, and knowledge about assessment. Leong utilized
Q-methodology, in which textual data are analyzed with software such as PQMethod and PCQ,
to distinguish the components of 30 teachers’ conceptions about assessment. The software
organized statements into factors, much as an EFA organizes continuous data. The teachers
were from primary, secondary, university, and department of education
contexts. The textual data were extracted and organized into core thematic statements and loaded
onto four factors: efficient, evolving, embedded, and empirical. Several statements were
represented across all factors (e.g., “I believe in making an effort to select/design the most
appropriate assessment task for my students” and “I believe there is always room for
improvement when it comes to assessment”; p. 461). The factors were not loaded uniformly
across participant characteristics; that is, one factor was generated from exclusively primary
teachers, while another was generated by only university professors and department of education
staff. Leong concluded that:
“classroom assessment, like many aspects of classroom teaching and learning is not a
stable entity. Rather it is highly variable, contested, and irreducibly situated in a specific
context. The different conceptions of what classroom assessment practice entails suggest
there are many, often conflicting mediating influences with which teachers need to
grapple” (p. 464).
Leong’s conclusion gives credence to the notion that assessment beliefs are a complex
interaction of teachers’ feelings, experiences, needs, and professional obligations.
Nyberg (2016) used participatory action research as a method to explore secondary music
teachers' conceptualizations of musical knowledge, learning, and communication (i.e.,
communicating learning outcomes) in a Swedish school. Nyberg, two administrative music
teachers, and five classroom music teachers formed a core group at a secondary school with the
intention to “increas[e] students’ goal-related achievements” and “develop practices through a
municipal drive for funding of research and development projects...on assessment of musical
knowledge and learning” (p. 244). In such a qualitative approach, the participants are also
collaborators in investigating a problem. In this study, the core group met eight times to
determine what would be assessed and how, and to work through their conceptions of what
assessment should look like. The meetings were recorded, generating transcriptions and
field notes. Additionally, participants maintained journals, which were subsequently used to
begin discussions. Nyberg found that teachers’ conceptions of assessment were, in part,
influenced by how they conceived of learning; that is, it was typically more difficult to create
assessments for knowledge or skills that teachers perceived as “holistic.” While many of
Nyberg’s findings reflect the characteristics and context of Swedish schools, stakeholder
demands for accountability and transparency of assessment data are reminiscent of the United
States in the early 1990s.
Austin and Russell (2017) examined the role assessment beliefs and occupational identity
may play in accounting for inservice music teachers’ assessment practices. They surveyed over
9,000 secondary music teachers in the United States, with 423 providing complete and usable
data (6% response rate; sampling error of ±5%). The researcher-developed questionnaire
included a mix of selection-type and rating scale items that respondents used to describe their
assessment and grading criteria and practices, and a separate section of rating scale items, from
Brown’s (2006) instrument that teachers used to report their beliefs about assessment. Items
addressing beliefs corresponded to functions of assessment (formative, 7 items, α = .87;
summative, 11 items, α = .87; accountability, 7 items, α = .91) and beliefs about the value of
assessment (positive valence, 9 items, α = .91; negative valence, 8 items, α = .88). To measure
role identity, Austin and Russell used 6-point scales for respondents to self-report their degree of
agreement with six items corresponding to teacher (two items) or performer (four items)
identities. Finally, respondents were also asked to
report the weighting of grading criteria representing performance skills, attitude, attendance, and
musical knowledge. Austin and Russell found that teachers who valued assessment more were
more likely to target musicianship outcomes in their grading practices and identify with the
teacher occupational identity, while teachers who devalued assessment were more likely to target
extramusical (i.e., behavioral) outcomes.
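The ±5% sampling error that Austin and Russell reported is consistent with the standard normal-approximation formula for a sample proportion, given 423 usable responses. A minimal sketch, assuming the conventional worst case p = .5 and a 95% confidence level:

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    # Half-width of a 95% confidence interval for a sample proportion;
    # p = .5 is the conservative (maximum-variance) assumption, and the
    # finite-population correction is ignored.
    return z * math.sqrt(p * (1 - p) / n)


# 423 usable responses yield roughly a 4.8 percentage-point margin,
# i.e., approximately +/-5%, matching the reported figure.
print(round(margin_of_error(423) * 100, 1))  # -> 4.8
```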
In 2019, Austin and Russell investigated preservice teachers’ conceptions of assessment
and projected assessment practices. Using a researcher-developed instrument, they surveyed 75
music education majors from eight institutions across four regions in the U.S. They found
respondents favored the formative functions of assessment over summative and accountability
functions. Students who reported receiving greater amounts of training in assessment valued
assessment more than less-trained peers, and were more confident in their assessment abilities.
Most participants reported two or fewer music education class sessions devoted to assessment
topics, yet one-third felt “very or extremely” confident in their ability to assess future students.
Austin and Russell’s investigation explored both the function and value conceptions of
assessment.
Assessment Beliefs in Summary.
Assessment beliefs are a complex, intersectional sum of teachers’ experiences as teachers
and students, philosophical beliefs about teaching and learning, and beliefs about the purposes of
assessment. General education researchers have been strongly influenced by the work of Brown
(2006) and his colleagues, especially given his prolific testing of the TCoA inventory in contexts
worldwide, from Tunisia to Indo-China. The TCoA, and its many iterations, are based upon
Brown’s (2006) finding that teachers’ assessment beliefs were centered around (a) accountability
at the school level, (b) accountability at the classroom level, (c) formative feedback, and/or (d)
perceptions of irrelevance. The TCoA measures use this a priori classification of beliefs, which
may not be appropriate for all teaching populations or contexts (Brown & Gao, 2015). Few
researchers have explored teachers’ assessment beliefs in the United States. Barnes et al. (2017)
recently used the TCoA with a population of U.S. teachers, but found that Brown’s a priori
classification demonstrated poor fit and condensed to three factors. As a result, some researchers
have used exploratory or inductive methodologies to investigate teachers’ assessment beliefs
(Leong, 2014; Remesal, 2011). Music education researchers have only begun to examine music
teachers’ assessment beliefs and how they may intersect with assessment practice. Little is
known about music teachers’ assessment beliefs, especially in the United States; thus, future
investigations of the dimensions of assessment belief, and development of a reliable instrument
to capture music teachers’ beliefs, are warranted.
Assessment Practices
The decisions teachers make about when and how to employ assessment in their
classrooms comprise only a small portion of the professional judgments they make in a given day;
planning and preparing learning activities, implementing lessons, managing student behavior,
communicating with parents, other teachers, and administrators, and providing additional support
or assistance to students constitute the bulk of the activities in which teachers engage. As discussed in
Chapter 1, teachers’ assessment practices appear to lag behind best practices associated with
reliable, valid, fair, and useful assessment, but those practices also reflect broader societal and
policy trends related to education reform. In the following section, I discuss specific literature
pertaining to assessment practices employed by teachers (i.e., classroom specialists, not music
teachers) and teachers’ perceptions associated with specific practices, and then review
assessment practice research involving music educators; within each section, studies will be
presented chronologically based on year of publication.
Assessment Practices of Teachers.
In 1995, Oosterhof et al. conducted an observational study of 15 Floridian public-school
teachers to explore current classroom assessment practices. Two teachers were from elementary
schools, seven from middle schools, and six from high schools. Teachers were selected for
maximum variation. Each teacher was observed for five consecutive teaching days in order to
“gain insight into a number of teacher and student behaviors, perceptions, and strategies that
perhaps go unnoticed when shorter periods are involved” (p. 2). Observations were documented
as 10-minute interval estimates of the amount of time that teachers and students engaged in (a)
formal assessment, (b) informal assessment, (c) integrated assessment and instruction, (d) other
on-task activity, and (e) off-task activity. Observations were also recorded in field notes.
Oosterhof et al. found -- consistently across classrooms -- that teachers primarily engaged in
informal assessment and integrated assessment and instruction techniques (i.e., “show of hands”,
questioning techniques) with students. Oosterhof et al. reported “one of the more surprising
findings” was the “limited or non-existent time teachers have for developing assessment skills”
(p. 12). They concluded that “if practicing teachers' abilities with critical measurement skills are
less than acceptable, perhaps we need to give careful consideration to selecting a subset of skills
with which we will train teachers well” (p. 12).
Building upon earlier work, Zhang (1996) administered his Assessment Practices
Inventory (API) to 311 inservice teachers to determine the hierarchy of teacher assessment
competencies -- as outlined in the STCEAS standards -- using Rasch analysis. The API consisted
of 67 items describing specific classroom assessment practices, to which teachers responded
using a 5-point scale from “not at all skilled” to “highly skilled.” Zhang used principal
component analysis and determined that teachers’ responses were organized into six components.
Subsequently, the logit of the items belonging to each component was calculated. The result of
these analyses was a list of the assessment practices perceived to be most difficult. Of the six
categories -- relative to the STCEAS standards -- interpreting standardized test results,
conducting classroom statistics, and using assessment results in decision making were perceived
to be the most difficult practices by inservice teachers. Communicating assessment results was
perceived to be the easiest. Non-academic grading practices (i.e., participation, attendance, effort)
had the second-highest logit score, meaning that teachers perceived these practices to be
relatively easy. Zhang concluded that these findings confirmed other researchers’ assertions that
teachers were unskilled in developing, implementing, and interpreting assessment data.
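Zhang’s Rasch scaling places items on a logit (log-odds) scale. As a simplified illustration only -- full Rasch estimation jointly models persons and items, which this sketch omits -- the log-odds transformation underlying such a scale looks like this:

```python
import math


def log_odds(p: float) -> float:
    # Log-odds (logit) of a proportion p, e.g., the share of teachers
    # rating themselves skilled on an item; on a difficulty scale,
    # lower endorsement implies a harder practice.
    return math.log(p / (1 - p))


# If 75% of teachers rate themselves skilled on an item:
print(round(log_odds(0.75), 2))  # -> 1.1
```

A proportion of .5 maps to a logit of zero, so positive values indicate practices most teachers feel skilled in and negative values indicate practices few feel skilled in.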
As standardized testing became a regular feature in classrooms of the late 1990s,
researchers began measuring teachers’ perceptions of standardized testing and its relevance
to their assessment and instructional practice. Goldberg and Roswell (1998)
gathered instructional materials from Maryland teachers who had previously scored the
Maryland School Performance Assessment Program (MSPAP) state tests, and subsequently
surveyed 50 Charles County teachers about the impact of scoring MSPAP on their teaching and
perceptions of how the MSPAP “integrated into their own and their colleagues' instructional and
classroom assessment practices” (p. 5). They also interviewed 12 Charles County teachers with
more than one year of scoring experience. They found that teachers believed the MSPAP scoring
experience had either improved or had the potential to improve their teaching and assessment
practices. The two significant themes from the interviews with teachers corroborated the
questionnaire results; namely, “scoring was such a valuable experience that it would be ideal if
every teacher and administrator could score”, and “scoring gives you the ‘big picture’ and serves
as a ‘wake up call’” (p. 14). Goldberg and Roswell also found that teachers’ instructional materials
became grounded in “plausible, real-life situations, problems, issues, or decisions, and are
comprised of a series for which the purposes are clear and authentic” (p. 20) after scoring
experience. Teachers also became more effective in aligning instructional materials to state
standards and learning indicators. Goldberg and Roswell surmised that professional development
approximating the experiences teachers had scoring the MSPAP was imperative to enhancing
teacher assessment practice.
As education reform efforts turned toward accountability, state departments of education
and policy organizations began developing rubrics and other methods of evaluating teacher
assessment practices. Aschbacher (1999) developed an evaluation framework and rubrics based
upon the participation of 24 third- and seventh-grade teachers. Teachers were asked to submit
exemplars of typical classroom assignments and student work in a binder. Collectively, the
teacher participants contributed 136 teacher-constructed assignments, with four pieces of student
work for each assignment. Aschbacher evaluated the assignments using five 4-point scales to
evaluate the cognitive demand of the task, the clarity of grading, the alignment of the task with
learning goals, the alignment of grading criteria with learning goals, and the overall task quality.
Aschbacher found that the “vast majority of assignments collected for this study at both
elementary and middle schools made relatively low-level cognitive demands on students” (p.
26). She also found that students encountered “coherent assignments less than half the time
based on assignments submitted,” particularly at the middle school level (p. 30). That is, they
were not typically aligned to teachers’ goals, nor were grading practices. She also found that
students received no feedback on over one-third of the assignments, and that teachers’
perceptions about what constitutes high- or low-quality work had a “low to moderate”
correlation to the raters’ appraisal of the student work examples. Aschbacher concluded that her
“approach to measuring classroom practice through ratings of a sample of assignments shows
promise in its capacity to describe several important aspects of the classroom learning
environment… and in suggesting areas for administrative attention, professional development,
and teacher reflection” (p. 41). A notable implication of Aschbacher’s study is the potential value
of amassing teacher-constructed materials and student work exemplars in appraisals of teachers’
instructional effectiveness and the integrity of their assessment practices.
In 2000, Mertler conducted a study of teachers’ assessment practices and literacy
that would inform subsequent research in this area. The purpose of this descriptive study was to
explore the assessment practices of teachers in Ohio; specifically, the methods used to ensure
construct validity and reliability of classroom assessments. Using a stratified random sample of
K-12 teachers, Mertler developed and administered the Ohio Teacher Assessment Practices
Survey to 625 teachers. Respondents were asked to list specific steps used to ensure that their
assessments aligned to objectives (i.e., validity), and how often they used these procedures on a
five-point scale (1 = never to 5 = always). Respondents were asked the same question about
steps used to ensure assessments yielded consistent student scores (i.e., reliability). Responses
were coded into six categories. More than half of the responses involved teacher-constructed
tests. The remaining categories included “compare to objectives”, “analysis of test data”, “I don’t
determine validity”, “asking for student feedback”, and miscellaneous (p. 32). With regards to
preparation, only 13% of respondents indicated that they felt “well prepared” to assess student
learning (p. 34). Mertler concluded that his findings confirmed those of earlier research implying
that teachers were unprepared to conduct and interpret assessment data.
McMillan (2001) surveyed 1,483 teachers (65% response rate) in seven urban Virginia school districts
about their assessment and grading practices. To document various assessment and grading
practices, McMillan used closed-item 6-point frequency scales. Using principal component
analysis, McMillan found that teachers utilized a “hodgepodge of factors” to determine grades:
academic achievement, “academic enablers” (i.e., non-academic factors such as effort, ability,
attendance, and participation), use of external benchmarks, and the use of extra credit
assignments or tasks (p. 28). He also found significant differences between teachers of different
subjects; math teachers, for example, were less likely to use academic enablers than English or
social studies teachers. Perhaps alluding to his work with Nash in developing a conceptual model
of educational decision making, McMillan concluded that teachers’ assessment practices are
inexorably connected to other demands influencing teachers’ educational decision making,
including their desire to document learning and motivate students.
McMillan et al. (2002) subsequently examined the assessment and grading practices of
921 elementary teachers in Richmond, Virginia public schools (58% response rate). Mirroring
McMillan’s (2001) study of secondary teachers’ assessment and grading practices, the
researchers developed a similar instrument. This instrument, however, also incorporated
questions “to emphasize actual teacher behaviors in relation to a specific class of students, rather
than more global teacher beliefs; teacher responded to all items once for language arts and once
for mathematics” (p. 206). In this way, the researchers could approximate elementary school
teachers’ unique context; they validated the instrument with the assistance of 15 elementary
school teachers. As McMillan did in 2001, the researchers used principal component analysis to
organize teachers’ practices. This analysis resulted in three components: teacher-constructed
performance tasks (e.g., constructed-response assignments, essays, projects), publisher- or
district-provided performance tasks, and teacher-constructed achievement tasks (i.e., major
exams and tests). Except for the frequency with which tasks were used, the researchers found no
significant differences between mathematics and English teachers’ assessment and grading
practices. The researchers also found confirmatory evidence suggesting “within-school variance
is greater than between-school variance; individual teacher preferences are more important than
are differences between schools in determining grading practices” (p. 212). This strengthens
McMillan’s earlier findings that teachers’ educational decisions involve the reconciling of a
complex assortment of competing internal and external needs (2000, 2001).
Building upon McMillan’s earlier work, Frey and Schmitt (2010) investigated the
assessment practices of 3rd- through 12th-grade teachers in Kansas. The researchers devised
their own survey instrument. Respondents were asked to use provided definitions for terms (e.g.,
traditional tests, performance tests) and answer questions about six aspects of classroom
assessment, including usage as measured by the “estimated percentage of the time they use
various types of assessments” (p. 109). Respondents were also asked for demographic data such
as their gender, subject area, and years of teaching experience. Frey and Schmitt found no
relationship between years of teaching experience and the frequency with which teachers used
teacher-constructed or publisher-constructed tests. They did, however, find a relationship
between teaching experience and the tendency to use short-answer formats as well as
performance-based (i.e., authentic, alternative) formats -- both of which were more common
among less experienced teachers. They also found significant differences in assessment practices
by teacher gender, subject, and level. Female teachers were more likely to
use performance-based assessments than male teachers, and less likely to use teacher-constructed
assessments. Elementary teachers were less likely to use teacher-constructed tests. Frey and
Schmitt bemoaned that “a generation after the call for improved assessment practices . . . the
research focus is overwhelmingly on large-scale test development with little emphasis on
assisting teachers in developing high-quality classroom measures” (p. 116).
Researchers other than Zhang (1996) have used the STCEAS as a framework to assess
teachers' assessment knowledge and practices. In her 2014 dissertation, Gutierrez administered
the Assessment Literacy Inventory for Classroom Educators (A.L.I.C.E.) to a population of 94
Midwestern suburban teachers from three separate middle schools. She found that most teachers
in the sample reported never taking assessment coursework in their undergraduate or graduate
preparation. Additionally, despite reporting more than ten hours of professional development,
as well as access to instructional coaches and regular administrative meetings, teachers felt they
lacked a sound understanding of assessment principles. For this population, Gutierrez found that
teacher training “explains 17% of the variability in classroom assessment practices, while
teachers’ assessment knowledge explains 38% of such variability in assessment practices” (p. 4).
She concluded that the findings supported previous research in that many teachers fail to receive
formal assessment coursework, but that training in assessment significantly increases the
likelihood of using a wider variety of assessment practices.
Researchers have also displayed a renewed interest in investigating the personal and
contextual factors associated with teachers’ use of assessment to make educational decisions.
Box et al. (2015) used the Personal Practice Assessment Theories (PPAT) framework as a lens to
investigate the educational decision making and assessment practices of three science teachers in
a west Texas suburban community. Like McMillan and Nash’s (2000) conceptual model of
assessment, this framework incorporates teachers’ contextual circumstances into the rationale for
educational decisions and assessment practices. Using ethnographic methods and a multiple case
study approach, Box et al. collected interview and artifact data. They found distinct differences
in assessment practices among the three teachers, confirming prior research that individual
teacher assessment practices have great variance due to the contextual demands of their
classrooms, schools, and districts.
Fulmer et al. (2015) explored the contextual factors affecting teachers' assessment
practices. In their conceptual model, they reviewed research that may explain teacher assessment
practices from micro-, meso-, and macro- perspectives. They found that most research addresses
teacher assessment practices at the micro- (i.e., local, classroom-based) level, and that fewer
researchers conduct research at the meso- (i.e., school) level or seek to connect
micro- and macro- (i.e., national, policy level) contexts. They divided their discussion of
literature by each level and relevant topics. For the micro-level, they discussed research
surrounding teachers’ conceptions, beliefs, knowledge, and value of assessment, in addition to
specific teacher background variables. For example, they reported that teachers’ roles and
experiences have been shown to influence their assessment practices; teachers with greater
managerial responsibilities tended to value and utilize sound assessment principles with greater
frequency (p. 483). They also reported that secondary school teachers of science and humanities
have wider gaps between their values and practice than do teachers of language and creative arts
(p. 483). They concluded that researchers should strive to investigate teacher practices, values,
and knowledge of assessment at each of the three levels; untangling the complex interaction of
contextual (i.e., micro-) and circumstantial (i.e., meso- and macro-) variables is key to
understanding how to improve teacher practice.
Assessment Practices of Music Teachers.
Music teachers’ uses of assessments are varied. Researchers have consistently found that
when music educators assess student learning, most employ informal and formative assessments
such as “in class, down-the-line” performance assessments that are designed to provide checks of
skill or accuracy (Hill, 1999; Kancianic, 2006; LaCognata, 2011; McClung, 1996; McCoy, 1988,
1991; McQuarrie & Sherwin, 2013; Russell & Austin, 2010; Simanton, 2000). However, the bulk of
student appraisals are accounted for by “non-academic criteria” such as attendance and attitude
(Russell & Austin, 2010). Researchers have ascribed this phenomenon to several potential
causes: circumstantial or logistical challenges facing teachers (McClung, 1996), a lack of
administrative oversight and support when assigning grades (Russell & Austin, 2010), the
common conception amongst music educators that assessment is not appropriate for the
“subjective” experience of music (Denis, 2018), and a lack of teacher preparation in assessment
knowledge and use (Austin & Russell, 2019).
Most music education researchers have examined assessment practices through
dissertation projects; eight dissertations by music educators were published in the 1990s and
early 2000s. The dissertation authors typically described music educator assessment practices
(see Table 2.1). All but two dissertations were written by music educators with an interest in
exploring secondary -- particularly high school -- band teacher practices.
Table 2.1. Dissertations about Music Teacher Assessment Practices

Date  Author     Institution                            Title
1996  McClung    Florida State University               A descriptive study of learning assessment and grading practices in the high school choral music performance classroom
1999  Hill       University of Southern Mississippi     A descriptive study of assessment procedures, assessment attitudes, and grading policies in selected public high school band performance classrooms in Mississippi
2001  Hanzlik    University of Nebraska                 An examination of Iowa high school instrumental band directors' assessment practices and attitudes toward assessment
2001  Simanton   University of North Dakota             Assessment and grading practices among high school band teachers in the United States: A descriptive study
2002  Sears      University of Massachusetts Lowell     Assessment in the instrumental music classroom: Middle school methods & materials
2006  Kancianic  University of Maryland - College Park  Classroom assessment in U.S. high school band programs: methods, purposes, & influences
2006  Sherman    Teachers College, Columbia University  A study of current strategies and practices in the assessment of individuals in high school bands
2010  LaCognata  University of Florida                  Current student assessment practices of high school band directors
McClung (1996) described learning assessment and grading practices in high school
choral contexts. Using three samples, a student sample from the 1995 Georgia High All-State
Chorus (n = 615; 100% return rate), a high school choral teacher sample (n = 160; 80% return
rate), and a high school principal sample (n = 150; 78% return rate), McClung surveyed
participants about the purposes and practices of assessment. He found that all groups perceived
grades to be an important component of the choral experience, and that the process of assigning
grades could potentially affect the public’s perception of the legitimacy and value of high school
choral music. Most notably, McClung found strong support for use of extra-musical criteria in
grading and assessment practices from all samples, including participation, attitudinal criteria,
and attendance. McClung speculated that these extra-musical criteria may provide opportunities
for students to demonstrate “achievement in the affective domain”, or for teachers to credit less
musically talented students for growth. His findings were mirrored in subsequent dissertations on
music teacher assessment practices.
In his dissertation, Hill (1999) examined the assessment practices, procedures, and
policies in high school band contexts in Mississippi. Like McClung, Hill surveyed a population
of students (n = 327; from the Mississippi Bandmasters’ Association State Band Clinic), teachers
(n = 93; Mississippi Bandmasters Association), and principals (n = 38; randomly selected public-
school administrators). Hill found that grades were considered an important part of the
instrumental classroom. He also found that extramusical (i.e., non-academic) criteria were
important and significantly weighted in grading practices. These criteria included attendance,
participation, and attitude. Hill also asked participants about their use of a range of assessment
formats, from portfolios to traditional paper-and-pencil, and found that fewer than one out of
four teachers employed any academic achievement criteria in their assessment and grading
practices. Hill’s findings reflect the acute issues that professional organizations in the 1990s
sought to allay.
Hanzlik’s (2001) dissertation marks the first efforts of a music education researcher to
examine assessment practices and beliefs in tandem, as well as an effort to document differences
in assessment beliefs based on teacher differences. Hanzlik developed the instrument -- the
Survey of Band Directors Attitude Toward Assessment (SBDAA) -- and administered it to a
sample of 200 randomly selected band directors in Iowa; 154 surveys were returned (77%
response rate). Contrary to Hill and McClung’s findings, Hanzlik found that Iowa band directors
tended to use playing or performance task assessments the most, followed by attendance,
“teacher observation”, participation, and sight-reading (p. 6). Hanzlik concluded that “the
emphasis of the instructional process in Iowa band rooms seems to be clearly on performance
learning and not on cognitive or affective learning” (p. 6). He also found that band teachers
generally held positive attitudes toward assessments. Using ANOVA and hierarchical regression,
Hanzlik found only one significant group difference, with band directors in Class A schools
exhibiting more positive attitudes toward assessment than band directors in larger Class 1A
schools. Additional analyses revealed a significant curvilinear relationship between high school
teaching experience and attitude toward assessment, with directors with the most and least
experience having more positive attitude scores than directors with 10-25 years of teaching
experience. These findings may have reflected the unique teaching context of Iowa at that
particular time.
Simanton (2001) conducted his dissertation the same year as Hanzlik. Like Hanzlik,
Simanton was concerned with the growing emphasis on academic reform and accountability, and
how these emphases might play out in music teacher assessment practices and beliefs. Using a
researcher-developed questionnaire, Simanton surveyed 202 high school band directors via a
regionally stratified sample (based upon six regions defined by the Music Educators National
Conference [MENC]). He found that few band directors -- nationally -- employed sound
assessment practices, but that they were highly satisfied with their current practices. Simanton
also found that directors of smaller bands, or bands led by directors with graduate degrees,
tended to use sound assessment practices with greater frequency. Further, there appeared to be
regional differences for director use of grading criteria and time spent on assessing students. Like
McClung, Simanton cited workload and contextual factors as significant reasons for directors’
deficiencies in assessment practice.
Sears’ (2002) master’s thesis was unique in that she sampled middle school instrumental
music instructors (i.e., band and orchestra). The purpose of the study was to describe the types of
assessments in use by middle school instrumental instructors in Massachusetts. She sent
questionnaires to eighty schools across southeastern Massachusetts, and a total of 42 responded
(52.5% response rate). Like McClung and Hill, Sears found that the majority of directors used
non-musical criteria in their grading practices, and that nearly all directors considered student
attendance and participation as appropriate and important measures. Other assessment practices
included the use of performance tasks (as found by Hanzlik in 2001), practice cards, and teacher-
devised rubrics. Overall, Sears found that most respondents’ schools required teachers to
consider existing state frameworks in assessment planning and usage.
In 2005, Kotora examined assessment strategies used by high school choral music
teachers and taught by college methods professors in Ohio. After surveying 246 high school
choral teachers (43% return rate), he found that non-musical criteria were frequently utilized;
attendance and participation were used as assessment criteria by 85% of choral teachers, with
attitude used by 74% of choral teachers. College methods professors (n = 20; 53% return rate)
also used non-musical criteria, but to a lesser degree (attendance, 55%; participation, 45%;
attitude, 35%). Musical criteria, including concert performances, singing and written tests,
audiotape recordings, and individual performances, were all used by at least 68% of the
participants. Kotora also collected information about whether the decision to use various
assessment practices was impacted by district, state, or national mandates, or personal choice.
Most participants reported that personal choice guided their decision to use specific assessment
practices. Kotora concluded that the profession should seek to provide clarity in the assessment
options available to teachers, increase the availability of technology, and conduct research to
explore the factors that inform teachers’ decisions to use specific assessment strategies.
Drawing from a random national sample, Kancianic (2006) surveyed 2,000 high school
band directors about their assessment methods, purposes, and classroom characteristics. He
received 634 (31.7%) completed questionnaires from participants. The independent variables
comprised personal (11 items) and school (11 items) characteristics. The stated dependent
variables were 23 assessment methods, 19 assessment purposes, and 23 factors influencing the
use of classroom assessment (e.g. logistics, performance expectations, administrative demands,
etc.). Kancianic found that directors tended to employ performance task assessments, but fewer
academic assessment practices. Directors cited logistical reasons as the primary impediment to
utilizing other assessment practices, echoing findings reported by McClung (1996), Hill (1999),
and Simanton (2001). Kancianic reported that none of the MANOVA analyses were found to be
significant for group differences based upon assessment methods, purposes, or factors, likely due
to the large number of variables at play.
Sherman (2006) investigated high school band directors’ assessment and grading
practices, the current tools employed in conducting assessments, and attitudes toward such
practices. Utilizing a random sample of 500 high school band directors from the NAfME Eastern
Region, Sherman surveyed instructors about the kinds of assessments used, including
performance tasks, writing samples, and student self-evaluation. The second phase of this
dissertation involved interviews with directors who participated in the survey. In the interviews,
participants described their use of assessments. Many did employ performance tasks, rubrics, and
portfolios, but used them infrequently, citing time and workload challenges; “with 160 students
to grade, it is impossible to perform any type of assessment more than once as I need to give up a
full week of teaching to do them” (p. 60). More often, Sherman found, teachers incorporated
non-musical criteria in grading practices, specifically, class participation, effort, and class
preparation. However, teachers also reported feeling that they needed to justify their practices;
“If I don’t have a strong background and supporting data for my strategy, I will be eaten alive in
this community. Any hint of ambiguity will be challenged. I have no option but to CYA at every
corner. That is what dictates my grading procedure, nothing else” (p. 61). Sherman’s study
confirmed prior researchers’ findings that classroom realities often impede teachers’ abilities to
formally assess students with frequency.
LaCognata (2010) studied current high school band directors’ assessment practices and
beliefs about assessment. Using a survey, LaCognata sampled 5,000 directors who held NAfME
(formerly MENC) membership; a total of 454 completed the survey (10% of the usable sample,
after undeliverable emails were removed). Overall, LaCognata found that “the main purpose of
student assessment for high school band directors centered on providing their students and
themselves with feedback concerning the instructional process in the classroom” (p. 10). This
represents a fundamental shift in thought from the earliest dissertations by McClung (1996), Hill
(1999), and Simanton (2001), whereby those researchers found that summative purposes of
assessment (and related assessment types) were cited more frequently. However, for the
directors in LaCognata’s study, nonmusical criteria still represented a significant component of
teachers’ grading practices.
In a 2010 study, Russell and Austin surveyed 352 secondary music teachers in the
NAfME Southwest region about specific assessment and grading practices as well as any
contextual or individual difference variables that may influence their practices. They found, like
LaCognata, that while the use and weighting of non-musical criteria in assessing and grading
was less pronounced when compared to studies conducted in the 1990s or early 2000s, such
criteria still accounted for the preponderance of students’ grades. Specifically, non-musical
criteria (i.e., attendance, attitude, practice, participation) accounted for an average weighting of
60% of student grades. Within non-musical criteria, Russell and Austin found that directors used
a combination of “subjective impressions and objective documentation to assess” factors (p. 44).
They also found that teachers were most likely to use traditional (i.e., written), in-class
approaches for assessing knowledge, and “down-the-line” performance tasks to assess skills (p.
46). Using MANOVA analysis, Russell and Austin also found that middle school choral
directors gave more weight to written assessments than their instrumental colleagues but found
no such difference between vocal and instrumental high school directors. High school directors
gave greater weight to attendance than did middle school directors, and choral high school
directors gave greater weight to attitudinal factors than did their instrumental colleagues. Russell
and Austin concluded that music teachers may face “the greatest challenge in moving their
assessment paradigm out of the 20th century” if their assessment practices continue to lag behind
assessment principles espoused by experts and framed in policy.
As previously discussed, most researchers have examined music teachers’ assessment
practices and beliefs in secondary and/or instrumental contexts. Fewer have considered other
levels or specializations (e.g., middle school, elementary, vocal, or general contexts). McQuarrie
and Sherwin (2013) recently explored the relationship between elementary music teachers’
assessment practices and assessment topics in literature aimed toward them. First, they collected
data from 100 elementary general music teachers in the Northwestern United States about their
assessment practices. Then, they reviewed ten years (1999-2009) of the national publications
Teaching Music and Music Educators Journal for articles related to classroom music assessment.
Finally, they ranked the reported classroom assessment techniques, and those found in the
literature, by frequency. They found that there is a discrepancy between the assessment strategies
reported as being used most frequently by elementary music classroom teachers and those
espoused in the literature, matching the findings of Russell and Austin (2010).
Austin and Russell (2017) recently explored the role that teacher occupational identity
and assessment conceptions may play in assessment practices. Similar to their 2010 study, they
asked 423 secondary music teachers in the United States about assessment practices. In
alignment with their previous findings, they found that the most heavily weighted grading
criteria were performance skills (35%), followed by non-musical criteria (attitude, 24%;
attendance, 14%). By exploring assessment conceptions (used interchangeably with beliefs and
valuing throughout assessment literature), Austin and Russell identified that music teachers who
valued assessment were more likely to target musicianship outcomes and utilize a mixture of
formative and summative assessments in their practice than music teachers who did not value
assessment. They, however, did not find that teacher identity was strongly correlated to
assessment conceptions. Their findings warrant further examination of the role of assessment
beliefs in impacting music teacher assessment practice.
In Summary.
Researchers in general education and music education have conceptualized and measured
teachers’ assessment and grading practices in a number of ways. Some have described specific
kinds of assessments (e.g., teacher constructed tests, externally produced tests, rubrics, etc.) that
teachers may use, such as Aschbacher (1999). Other researchers have described teachers’
assessment practices by the cognitive or affective outcomes that may be achieved (McCoy,
1988). Music education researchers have described the purposes for which teachers employ
assessments (e.g., achievement, extramusical). There does not appear to be consensus amongst
researchers about the most effective way to have teachers self-report their assessment practices.
Some researchers, like Austin and Russell (2010, 2017), have asked teachers to estimate the
weight of specific assessments in a grading scheme. Frey and Schmitt (2010), however, asked
teachers to self-report the frequency with which they employed specific assessments. Asking
teachers to report the purposes for which they use assessments may be necessary to
understanding the complex interaction of external and internal factors that inform their
assessment practices, as McMillan and Nash (2000) suggested.
Both general education teachers’ and music teachers’ assessment practices have been
described as lacking (Colwell, 2008; Stiggins, 2014). Informal methods of assessment (e.g.,
questioning techniques, “fist to five”, or other participation-based methods of engaging students)
are favored by general classroom and music teachers, as opposed to formal and alternative
methods of assessing (e.g., exit tickets, performance assessments). Music teachers appear to use
nonacademic criteria (e.g., attendance, attitudinal criteria) as the basis for grades to a greater
extent than their general education peers. Both sets of teachers also appear to account for non-
academic criteria in their grading and assessment practices, in part, to motivate students, or
account for effort and growth in students who perform below proficiency. Finally, researchers of
both populations appear interested in examining the role of assessment beliefs in teachers’
educational decision-making processes and assessment practices. McMillan and Nash (2000)
devised a conceptual model that may prove useful in determining how teachers resolve tensions
arising from teachers’ beliefs and literacy, external mandates, and classroom realities, and which
of those factors, if any, influence assessment practices.
Chapter 3
Methodology
In this investigation I used a pretest-posttest control group design to examine the
effectiveness of an online professional development intervention for music educators in changing
assessment literacy, beliefs, and practices. The intervention consisted of a four-week online,
module-based course. A secondary purpose of this investigation was to explore relationships
among music teachers’ assessment literacy, beliefs, and practices.
My research questions were:
1. Does a four-week online professional development intervention have a significant effect
on music teachers’ assessment literacy?
2. Does a four-week online professional development intervention have a significant effect
on music teachers’ assessment beliefs?
3. Does a four-week online professional development intervention have a significant effect
on music teachers’ assessment practices?
4. Are there significant relationships between music teachers’ assessment literacy, beliefs,
and practices?
The independent variable had two levels corresponding to the group or condition to
which participants were randomly assigned (i.e., the control or intervention group). The
dependent variables were assessment literacy, beliefs, and practices. In this chapter, I describe
the research design and intervention, participant sampling, the instruments used to collect data,
and procedures related to data collection and analysis.
Research Design and Intervention
Research Design
I used a pretest-posttest control group design. This design allowed me to determine
whether the intervention was effective in changing music educators’ assessment literacy, beliefs,
and/or practices. Drawing upon a national sample of music educators, I administered a
prescreening questionnaire to determine respondents’ eligibility to participate, requested that
they complete the pretest (including separate measures of assessment literacy, beliefs, and
practices), and then randomly assigned participants to either a control or intervention group
(Figure 3.1). The control group was a true control, in that the control group participants did not
receive professional development or any form of communication from me prior to my
administering the posttest. The pretest-posttest control group design minimizes many threats to
internal validity, including history effects, selection effects, and testing effects, because the
influence of environmental factors can be minimized or controlled for across groups (Adams &
Lawrence, 2019; Campbell & Stanley, 1963).
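To make the logic of this design concrete, the following sketch simulates pretest and posttest scores for two groups and compares their gain scores. It is purely illustrative; the group sizes, score distributions, and the choice of a Welch t test on gains are assumptions for the example, not the analyses reported in this dissertation.

```python
# Illustrative sketch only: simulated pretest-posttest control group data,
# compared with an independent-samples (Welch) t test on gain scores.
# All group sizes, means, and variances here are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_control, n_intervention = 25, 18  # hypothetical group sizes

# Simulated assessment-literacy totals for each group at pretest,
# with posttest = pretest + a random gain
pre_c = rng.normal(22, 4, n_control)
post_c = pre_c + rng.normal(0.5, 2, n_control)       # little change expected
pre_i = rng.normal(22, 4, n_intervention)
post_i = pre_i + rng.normal(3.0, 2, n_intervention)  # PD-related gain assumed

gain_c = post_c - pre_c
gain_i = post_i - pre_i

# Welch's t test does not assume equal group variances
t, p = stats.ttest_ind(gain_i, gain_c, equal_var=False)
print(f"mean gain (control) = {gain_c.mean():.2f}")
print(f"mean gain (intervention) = {gain_i.mean():.2f}")
print(f"Welch t = {t:.2f}, p = {p:.4f}")
```

Comparing gains rather than raw posttest scores is one common way such designs control for chance pretest differences between randomly assigned groups.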
Figure 3.1
Conceptual Diagram of Study Procedures
Intervention Design
I implemented the intervention, a four-week online professional development program,
via Google Classroom, a commercial learning management system akin
to programs such as Canvas, Blackboard, or Moodle. Participants accessed the course,
hereafter called the Music Teacher Assessment Workshop (MTAW), using a personal email
address associated with Google mail. Google Classroom is organized by course. Email addresses
associated with professional or institutional accounts may not be authorized to use Google
Classroom; thus, it was important that participants use a personal email account. In the
platform, I created the course and its modules, and invited intervention group participants
via email. Participants used a class code to enroll in the course.
Course Organization & Elements
The course was titled Music Teacher Assessment Workshop (MTAW). All courses in
Google Classroom are organized around a home page, called the “Stream.” At the top of the
page were four tabs, including the homepage: Stream, Classwork, People, and Grades (Appendix
D). The Stream is a running collection of updates the instructor has made to the course or
announcements from the instructor. The Classwork tab is the repository for the modules. The
People tab is a directory of the course participants. The Grades tab is a gradebook for the
instructor and an individualized report card for participants. Participants did not receive formal
grades for this course, but I did provide notification after they completed each module. Further,
every module was given a categorical weight of 25%, which helped communicate progress to
participants throughout the workshop.
The Classwork tab is organized by Topic. Google Classroom uses Topics in the way that
Canvas, Blackboard, or Moodle use modules; that is, each Topic serves as an anchor point for
assignments and materials. For the MTAW, there were five topics: Week 1, Week 2, Week 3,
Week 4, and Questions & Comments. Topics were listed in the order that participants completed
them. Additionally, the Classwork tab included a link to a Google Calendar for the workshop.
Participants could sync this calendar to their personal accounts, or reference it within the class to
see when each module, or Topic, should be completed. Participants were provided access to each
Topic on a weekly basis, beginning on March 10 and ending on April 19, 2020. Participants
had access to the first Topic from registration through the end of the first week, to the
first and second modules during the second week, to the first through third modules during
the third week, and to all four modules during the fourth week.
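The cumulative release schedule described above amounts to a simple rule: during week k of the workshop, modules 1 through k are open. A minimal sketch of that rule, assuming the March 10, 2020 start date given in the text:

```python
from datetime import date

# Sketch of the cumulative module-release rule described in the text:
# in week k of the workshop, modules 1..k are accessible.
START = date(2020, 3, 10)  # workshop start date given in the text

def open_modules(today: date, n_modules: int = 4) -> list[int]:
    """Return the module numbers accessible on a given date."""
    week = (today - START).days // 7 + 1   # workshop week containing `today`
    week = max(0, min(week, n_modules))    # clamp before start / after week 4
    return list(range(1, week + 1))

print(open_modules(date(2020, 3, 12)))  # week 1 -> [1]
print(open_modules(date(2020, 3, 25)))  # week 3 -> [1, 2, 3]
```

The clamp also covers the tail of the access window (through April 19), during which all four modules remained open.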
Each Topic was aligned to a specific STCEAS standard; thus, the assignments and
materials associated with each Topic were also aligned to the standard. There were three
assignments per Topic: an Overview, a Discussion, and a Teacher-Constructed Task. A full
listing of the tasks teachers completed each week can be found in Table 3.1.
Table 3.1. Participant Tasks

Visit          Procedures/Tools                            Location                        Estimated Time
Pretest        CALI questionnaire (15 minutes);            Qualtrics                       25 minutes
               MTAII questionnaire (5 minutes);
               MTABI (5 minutes)
Week 1 Module  Review of materials (1-2 hours);            Google Classroom (MTAW course)  2.5-5 hours
               discussion board (0.5-1 hour);
               teacher-constructed task (0.5 hour)
Week 2 Module  Review of materials (1-2 hours);            Google Classroom (MTAW course)  3-5 hours
               discussion board (0.5-1 hour);
               teacher-constructed task (1 hour)
Week 3 Module  Review of materials (1-2 hours);            Google Classroom (MTAW course)  3-5.5 hours
               discussion board (0.5-1 hour);
               teacher-constructed task (1-2 hours)
Week 4 Module  Review of materials (1-2 hours);            Google Classroom (MTAW course)  3-5.5 hours
               discussion board (0.5-1 hour);
               teacher-constructed task (1-2 hours)
Posttest       CALI questionnaire (15 minutes);            Qualtrics                       25 minutes
               MTAII questionnaire (5 minutes);
               MTABI (5 minutes)
Within the Overview assignment, participants could access a collection of five to six
articles and print resources. I asked participants to read all of the materials, and encouraged them
to download articles and materials for their personal use. There was no specific accountability
measure to ensure that participants accessed and read all of the materials. However, the
discussion board each week was based upon one of the readings from each collection. This was
an intentional part of the workshop design. Requiring participants to access all of the materials
could potentially discourage engagement in the workshop, and I hoped to provide differentiation
through levels of engagement.
Each Discussion assignment directed participants to the web-based
application Perusall, a social annotation tool that hosts discussion on the text itself (Figure 3.2). Each
week, participants responded to discussion questions corresponding to one of the required
materials, and to comments from their peers. The application allows participants to engage in
discussion directly on a text, rather than in a discussion board one step removed from it.
Participants can add emoji, pictures, video, hyperlinks, or most other media to their comments;
this platform is highly participatory and authentic to digital learning contexts, which researchers
have found to be more effective than traditional formats (Boling et al., 2012; Cook et al., 2017).
Further, providing opportunities for teachers to work together has been shown to increase
satisfaction with online professional development, and the likelihood that teachers will complete
online programs (Yurkofsky et al., 2019). While the discussion activity in Perusall was
interactive, participants’ comments were anonymous to one another but visible to me as the
instructor. This precaution was taken to reduce any possible biased or negative interaction
between participants, or the unlikely (given the use of a national sample) possibility of
participants knowing one another professionally or personally.
Figure 3.2
Perusall Discussion Board on an Assessment Text
The Teacher-Constructed Task was directly tied to a corresponding STCEAS standard
each week. Over the course of the professional development, teachers selected a cohort of
students for whom they could design an assessment, developed an assessment (informed by
sound assessment principles), interpreted their assessment data, and used the results for future
educational decision-making. Each week, teachers submitted either a written account of how
they completed the task, a physical copy of the assessment they designed, or both. This
cumulative assignment kept the course grounded in participants’ classroom practice and
connected theory to practice (Darling-Hammond, 2002). Due to the COVID-19 pandemic, which
resulted in nation-wide closures of schools and a rapid transition to online and distance learning
formats (UNESCO, 2020), I altered the task requirements for the third and fourth tasks. I
provided all teachers with a dataset for purposes of interpretation given that they could not
feasibly collect actual student data; all related materials can be found in Appendix D, and an
exemplar of a completed teacher-constructed task is in Appendix M. Using the provided dataset,
participants then reflected on their next instructional steps and on how they might revise the
assessment for future use.
As a form of reciprocity to control group participants, I offered the workshop again at the
conclusion of the study, from June 1 through June 28, 2020. Finally, I planned to present a
summary report of my findings from this investigation to all participants in late summer 2020.
Population and Sampling
My initial plan for conducting this study involved working with a medium-size school
district in Maryland that employed 50 music teachers across 17 elementary/K-8 schools and
seven middle/high schools. Only two music teachers, however, agreed to participate in the study.
Therefore, I abandoned the initial plan and decided to utilize the National Association for Music
Education’s Survey Research Assistance program to disseminate the study description and
informed consent documentation via email to a national population of approximately 20,000
music educators in the United States. I delimited the target population to active K-12 music
educators of any interest area (i.e., band, orchestra, choral, technology, etc.) with NAfME
membership. Because music educators had to open and read an email invitation and
complete a prescreening questionnaire before they could participate in the study, the
participants constituted a volunteer sample.
Following IRB approval of my study on February 12, 2020, NAfME approved and
distributed my project description and prescreening questionnaire on March 10, 2020, with the
intent to resend the email one week later. The purposes of the prescreening questionnaire were to
(a) ensure that all participants met my aforementioned criteria for participation (n.b. NAfME
members include preservice teachers, retired teachers, and college-level teachers), (b) provide
prospective participants with a detailed description of the study, (c) obtain informed consent, (d)
collect demographic information from participants, and (e) randomly assign participants to the
control or intervention group. Due to the impact of the emerging COVID-19 pandemic, NAfME
halted all research after March 17, 2020. Thus, my sample was limited to the music educators
who completed the prescreening questionnaire during the first week. The Survey Research
Assistance program provided information about how members accessed the email. The email,
containing the project description and link to the prescreening measure (Appendix I), was sent to
20,474 members. Of that number, 604 emails were rejected by the members’ accounts, thereby
reducing my accessible population to 19,870. Of that number, 6,309 opened the email, and 247
clicked on the link to the prescreening questionnaire. Finally, of the 247 members who accessed
the prescreening measure, 108 completed the measure in full (i.e., informed consent and
demographic information). I used features in the Qualtrics survey design to randomly assign the
108 participants who completed the prescreening measure to the control or intervention group;
because both the participants and I were aware of group assignments, this was
not a blinded design. However, in simple random assignment, each participant has an equal
chance of experiencing either condition (Adams & Lawrence, 2019, p. 288).
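Simple random assignment of consenting respondents to two approximately equal groups can be sketched as follows (a hypothetical Python illustration of the procedure; the study itself used an automated Qualtrics feature, not this code):

```python
import random

def assign_groups(participant_ids, seed=None):
    """Randomly split participants into two (near-)equal groups, so each
    participant has an equal chance of experiencing either condition."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)  # random order determines group membership
    midpoint = len(ids) // 2
    return {"control": ids[:midpoint], "intervention": ids[midpoint:]}

# 108 participants completed the prescreening measure
groups = assign_groups(range(108), seed=42)
```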
Selection and Development of Research Measures
Assessment literacy is an adaptable and applicable understanding of the components and
uses of assessment in educational decision making (Stiggins, 1991). To be assessment literate,
teachers must have a firm understanding of sound assessment principles — selecting, designing,
implementing, scoring, and interpreting assessments — and how to use subsequent data in
educational decision making. Quantitative education researchers have traditionally measured
assessment literacy using the seven Standards for Teacher Competence in Educational
Assessment of Students (STCEAS) as benchmarks for subscales (Impara et al., 1993; Mertler,
2000; Mertler & Campbell, 2005). To measure music teacher assessment literacy, I selected the
Classroom Assessment Literacy Inventory (CALI). The CALI and related measures (TALQ, ALI)
have been used extensively in studies of preservice and inservice teachers’ assessment literacy.
While there are continuing concerns about the internal consistency and construct validity of the
CALI when used with inservice teachers, it was the instrument that could best measure
assessment literacy in a contextualized manner (i.e., assessment principles grounded within
realistic vignettes) and be most readily adapted for use with music teachers working within a
range of school settings and sub-specialties (Mertler, 2002). The psychometric properties of the
adapted CALI are reported in Chapter 4.
Assessment practices are the specific methods teachers use to gather or elicit information
about student learning. Researchers in general education and music education have developed
numerous ways to describe and quantify assessment practices (Rowan & Correnti, 2009). In
general education, researchers have described assessments based upon whether they were
teacher- or publisher-constructed (McMillan, 2001), their format (Oosterhof et al., 1995), their
purpose (Box et al., 2015), and even specific assignment types (Schmitt & Frey, 2010). In music
education, McCoy (1988) categorized assessment practices based upon the ways that students
may be engaged (e.g., cognitive, psychomotor, affective, and non-music), while Russell and
Austin (2010) categorized assessments based upon achievement or non-achievement criteria.
Regardless of the nomenclature, researchers do tend to use frequency ratings to quantify
assessment practice (Rowan & Correnti, 2009). I developed an instrument, the Music Teacher
Assessment Implementation Inventory (MTAII), that participants used to report the frequency
with which they employed specific forms of assessment, and used assessments for specific
functions (e.g., summative, formative, diagnostic, placement, and extramusical). I believed it was
important to ask music teachers how often they employ assessment for various functions because
researchers have found that many music teachers use assessment informally and to evaluate
extramusical criteria (e.g., participation, attitude, attendance).
Assessment beliefs comprise the values, conceptions, and attitudes teachers hold about
and toward assessment; they “merge affect and concept” (William, 1979 as cited in Fulmer et al.,
2015, p. 478). Educational researchers, particularly Brown, have studied teacher assessment
beliefs using a priori designations with teaching populations worldwide. Generally, researchers
have found that some teachers hold contradictory beliefs about the purposes and value of
assessment; they may perceive assessment as useful and necessary for instructional feedback and
accountability, but also as potentially irrelevant, or even detrimental, to their educational
decision making (Austin & Russell, 2017, 2019; Opre, 2015). Based upon existing research,
music teachers may hold different conceptions of assessment from the general teaching
population as a result of the specialized nature of the discipline. Thus, I chose to utilize an
existing measure specifically designed for music teachers to examine the beliefs participants held
about the purposes of assessment. This measure, the Music Teacher Assessment Beliefs Inventory
(MTABI), was adapted and evaluated by Austin and Russell (2017) from Brown’s TCoA
questionnaire.
Data Collection Instruments
Prescreening Questionnaire and Informed Consent
I distributed the invitation to participate in the study via email through NAfME’s
Research Survey Assistance program. The prescreening questionnaire contained four sections
(Appendix E). First, respondents were asked if they were a current K-12 music teacher to ensure
they were part of my target population. Respondents who selected “no” were directed to a
message thanking them for their interest, and informing them they were not eligible to participate
in the study. I used the “avoid ballot box stuffing” feature in Qualtrics to ensure that these
respondents would not access the prescreening measure again from the link they were provided.
Second, respondents who selected “yes” were directed to the informed consent document. In
addition to providing consent, participants provided their school email address, which was used
to link their responses throughout the study. Third, participants responded to a series of 11 items
that addressed demographic characteristics and educational experience.
Assessment Literacy
The CALI, as well as the MTAII and MTABI, constituted the pretest questionnaire. I
adapted a measure previously used for determining the assessment literacy of varied teacher
populations, the Classroom Assessment Literacy Inventory (CALI; Appendix F). I selected
the CALI as opposed to other assessment literacy instruments (i.e., TALQ, ALI, ALICE, or
tKUDA) for reasons discussed in the above section. The fact that the CALI often exhibits
marginal reliability when used with inservice teachers may be due to the number and nature of
items (i.e., how the measure is constructed), characteristics of teachers completing the CALI (i.e.,
most music teachers have minimal educational background in assessment), and the type of
information it is designed to provide (i.e., literacy measured in terms of the number of items
answered correctly across distinct standards). According to classical test theory, research
instruments will produce optimally reliable outcomes when there are an extensive number of
items of similar difficulty, when the instruments are administered to a sample that is large and
diverse, and when the measure represents strongly related dimensions of an attribute or construct
(Thompson, 2010). The original instrument (Appendix C) was constructed to reflect the
assessment competencies articulated in the STCEAS. There were 35 items, and seven dimensions
or subscales (i.e., each subscale corresponding to a standard, with 5 items per subscale). Each
item consisted of a vignette or scenario. Respondents selected one of four options that they
believed best answered the question. Table 3.2 includes sample items from each of the seven
dimensions (i.e., subscales) of assessment literacy measured by the CALI.
Table 3.2. Sample Items for Subscales of Assessment Literacy in the Original CALI

Dimension 1: Teachers should be skilled in choosing assessment methods appropriate for instructional decisions.
Sample item: Mrs. Bruce wished to assess her students’ understanding of the method of problem solving she had been teaching. Which assessment strategy below would be most valid?

Dimension 2: Teachers should be skilled in developing assessment methods appropriate for instructional decisions.
Sample item: Ms. Gregory wants to assess her students’ skills in organizing ideas rather than just repeating facts. Which words should she use in formulating essay exercises to achieve this goal?

Dimension 3: The teacher should be skilled in administering, scoring and interpreting the results of both externally-produced and teacher-produced assessment methods.
Sample item: Many teachers score classroom tests using a 100-point percent correct scale. In general, what does a student’s score of 90 on such a scale mean?

Dimension 4: Teachers should be skilled in using assessment results when making decisions about individual students, planning teaching, developing curriculum, and school improvement.
Sample item: Ms. Camp is starting a new semester with a factoring unit in her Algebra I class. Before beginning the unit, she gives her students a test on the commutative, associative, and distributive properties of addition and multiplication. Which of the following is the most likely reason she gives this test to her students?

Dimension 5: Teachers should be skilled in developing valid pupil grading procedures which use pupil assessments.
Sample item: A teacher gave three tests during a grading period and she wants to weight them all equally when assigning grades. The goal of the grading program is to rank order students on achievement. In order to achieve this goal, which of the following should be closest to equal?

Dimension 6: Teachers should be skilled in communicating assessment results to students, parents, other lay audiences, and other educators.
Sample item: In a routine conference with Mary’s parents, Mrs. Estes observed that Mary’s scores on the state assessment program’s quantitative reasoning tests indicate Mary is performing better in mathematics concepts than in mathematics computation. This probably means that:

Dimension 7: Teachers should be skilled in recognizing unethical, illegal, and otherwise inappropriate assessment methods and uses of assessment information.
Sample item: Mrs. Brown wants to let her students know how they did on their test as quickly as possible. She tells her students that their scored tests will be on a chair outside of her room immediately after school. The students may come by and pick out their graded test from among the other tests for their class. What is wrong with Mrs. Brown’s action?
For the purposes of this study, I adapted the CALI in two important ways. First, I reduced
the number of subscales from seven to four (and thus, the number of items from 35 to 20), and
altered the vignettes to reflect contexts familiar to music teachers while retaining the content
objective associated with each item (Table 3.2). Second, all items in this measure were
randomized to reduce the likelihood of a testing effect (i.e., pretest and posttest versions of the
questionnaire functioned as parallel forms of the same measure). The decision to delimit the
number of items and subscales in the CALI was based upon several factors: (a) researchers have
identified the first four standards as the most likely to be improved through professional
development with inservice teacher populations (Mertler, 2005, 2009), (b) the first four standards
address competencies that are grounded in classroom practices and educational decision making,
(c) music teachers, specifically, are reported to need training in classroom assessment principles
(Russell & Austin, 2010), and (d) my interest, as the researcher, in designing a professional
development intervention that would be viewed as both meaningful and manageable, thereby
reducing participant attrition.
Assessment Practices
I used a researcher-developed measure, the Music Teacher Assessment Implementation
Inventory (MTAII), to evaluate the frequency and manner with which music teachers employ
specific forms of assessment, and the functions for which they use assessment in their
classrooms. This measure consisted of two parts (Appendix G). In the first section, respondents
self-reported how often they utilized specific assessment forms. In the second section,
respondents self-reported how often they used assessments for a given function.
The MTAII was developed after consulting prior research by both general education and
music education researchers. This measure consisted of two matrices with items rated on a
5-point frequency scale ranging from “Never” to “Almost Always.” To frame responses, I asked
participants to reflect upon a class that represented the most typical example of their teaching
area. Then, I asked them to self-report the frequency with which they had used specific forms of
assessment in the last four weeks. I determined four weeks would be the longest amount of time
teachers could accurately self-report their assessment usage through recall or by checking
relevant lesson plans and/or their gradebooks. Eight forms of assessment were presented: written
tests/quizzes, written classwork/homework, group performance, individual performance,
projects, portfolios, attendance, and participation. In the second matrix, participants were asked,
using the same framing and scale, the frequency with which they had employed assessments for
a given function within the last four weeks. I presented five assessment functions (i.e.,
summative, formative, diagnostic, placement, and extramusical) common to music teachers’
practice. For the posttest, I slightly modified the MTAII; rather than asking participants to reflect
upon the previous four weeks, I asked them to project the frequency with which they would use
specific forms of assessment, and for what functions, in the four weeks to come.
Assessment Beliefs
Researchers have speculated that, due to the contextual realities of their jobs, music
teachers may hold conceptions of assessment that differ from the general teaching population
(Russell & Austin, 2010). While Brown’s TCoA questionnaire has been validated for a general
teaching population, the Music Teacher Assessment Beliefs Inventory (MTABI) is a measure
adapted for use with music teacher populations (Austin & Russell, 2017). This measure consists
of 17 items (nine positively phrased, eight negatively phrased) characterizing assessment of music
learning in terms of importance, value, relevance, and trustworthiness; participants were directed
to indicate their level of agreement with each statement using a six-point Likert type scale
(Appendix H). In their study, Austin and Russell (2017) reported high internal consistency (α =
.92) for their measure.
Intervention Group Posttest
The posttest was identical to the pretest with the exception that music teachers assigned
to the intervention group were asked to respond to five open-ended questions about their
experience completing an online professional development program. Responses to these
questions were ancillary to the main research questions, but provided me, as the researcher, with
some basis for contextualizing the findings and judging the efficacy of the intervention. As with
the pretest, items were randomized within each section (corresponding to the CALI, MTAII, and
MTABI) to control for a testing effect and response set.
Procedures
Pilot Testing
I pilot tested the pretest questionnaire (inclusive of the CALI, MTAII, and MTABI
measures) between January 6, 2020 and January 31, 2020 using a sample of 28 Colorado music
educators. I viewed pilot testing as an opportunity to establish the face validity of the research
measures, refine certain items, and gauge the user experience (e.g., the time and effort required
to complete the questionnaire). Face validity is an informal way of evaluating whether a measure
appears appropriate to assess a construct (Adams & Lawrence, 2019). In addition, I sought input
from several music teacher educator colleagues about the layout and accessibility of the online
professional development course. Based upon user feedback, I modified the wording of two
questions in the assessment literacy measure, and limited the number of items per page to five.
Further information about the psychometric properties of the CALI and MTABI measures can be
found in Chapter 4; because the MTAII was designed to capture purely descriptive data about
music educators' assessment practices, psychometric analysis was not warranted.
Prior to implementing the main study and gathering data, I completed the IRB procedures
for all human subject research under the auspices of the University of Colorado Boulder. I was
granted IRB approval to conduct the study on February 13, 2020. Then, I applied for NAfME’s
Research Survey Assistance, which was approved on March 6, 2020 (Appendix A).
Data Collection
The solicitation email, informing target population members about the study and
encouraging them to participate, was distributed on March 10, 2020. While my original intention
was for NAfME to send a follow-up email the very next week, NAfME ceased all research
activities on March 17, 2020 due to the COVID-19 pandemic. I did, however, use the emails of
people who had provided informed consent but not completed the pretest measure, to send a
second wave of solicitation emails on March 18, 2020. Participants (N = 108) who completed the
prescreening questionnaire and provided informed consent were randomly assigned to the
control or intervention group through an automated feature within Qualtrics, and were then
sent the pretest measure via an automated trigger email (Appendix J).
For those in the intervention group who completed the pretest measure, another automatic email
was sent containing instructions for registering to take the MTAW. I closed the pretest measure
on March 23, 2020. A total of 74 participants completed the pretest (39 in the intervention group,
35 in the control group).
During the subsequent four weeks, from March 23 through April 19, 2020, I used Google
Classroom to provide music teachers assigned to the intervention group a professional
development (PD) experience focused on assessment. Each week, I sent PD participants an email
describing the topic and tasks to be covered within the module. I monitored the class daily, in
order to respond quickly to participant comments and questions about the intervention. I also
reviewed participants’ submissions and provided informal, formative feedback in written form,
as would be customary for most multi-week PD offerings. At the conclusion of the intervention, I
emailed all participants, including those in the control group, the link to the posttest (Appendix
J). The posttest was kept open until April 30th, and I sent two subsequent reminder emails to
participants to complete the posttest.
Data Analysis
On April 30th, I closed the posttest and downloaded the data files from Qualtrics. A total
of 18 intervention group participants completed the posttest, along with 25 participants who had
been assigned to the control group. All files were compiled into one database where participant
responses were linked through their school email. Data were analyzed using the Statistical
Package for Social Sciences (SPSS, Version 26.0.0.0 for Windows 10, 2019). The file contained
a total of 179 variables, including demographic information (13 items), as well as pretest and
posttest responses to the 20 assessment literacy items, the two assessment practice matrices, and
the 17 assessment belief items. I recoded the assessment literacy items to reflect their ipsative
nature by using “1” for correct responses and a “0” for incorrect responses. This allowed me to
compute total participant scores on the assessment literacy items, as well as difference scores
between participants’ performance on the pretest and posttest. I also reverse coded participant
responses to the negatively phrased items within the assessment beliefs matrices, and then
summed responses across items to compute a total assessment beliefs score. Then, I computed a
difference score for participants’ assessment beliefs. Finally, I summed participants’ responses to
individual items on the MTAII, and computed difference scores for each item.
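The coding and scoring steps described above can be sketched in Python with pandas (a hypothetical illustration with invented data and item names; the actual analysis was conducted in SPSS):

```python
import pandas as pd

# Mock responses for three participants (the real data set had 43)
df = pd.DataFrame({
    "lit1_pre": ["b", "c", "b"],   # raw responses to one literacy item
    "lit1_post": ["c", "c", "b"],
    "belief1": [5, 2, 6],          # positively phrased, 6-point scale
    "belief2": [2, 5, 1],          # negatively phrased, 6-point scale
})

key = {"lit1": "c"}  # hypothetical answer key

# Dichotomous scoring: 1 = correct, 0 = incorrect
df["lit1_pre_score"] = (df["lit1_pre"] == key["lit1"]).astype(int)
df["lit1_post_score"] = (df["lit1_post"] == key["lit1"]).astype(int)

# Difference score: posttest minus pretest
df["lit1_diff"] = df["lit1_post_score"] - df["lit1_pre_score"]

# Reverse code the negatively phrased belief item (6-point scale)
df["belief2_rc"] = 7 - df["belief2"]

# Total beliefs score = positively phrased + reverse-coded items
df["beliefs_total"] = df["belief1"] + df["belief2_rc"]
```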
After coding responses and creating composite measures and difference scores, I used
descriptive statistics to summarize participant responses and assess the normality of responses to
the assessment beliefs measure. Then, I estimated the internal consistency of the CALI
instrument (using a Kuder-Richardson 20 procedure) and the MTABI instrument (using
Cronbach’s alpha). I also used Pearson correlations to analyze the relationships among CALI
items, their respective subscales, and participants’ total scores on the pretest.
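The two reliability indices can be computed from their textbook formulas (an illustrative sketch using population variances, i.e., ddof = 0; statistical packages may apply slightly different variance corrections, so values can differ marginally from SPSS output):

```python
import numpy as np

def kr20(items):
    """Kuder-Richardson 20 for dichotomously scored (0/1) items.
    items: 2-D array, rows = respondents, columns = items."""
    X = np.asarray(items, dtype=float)
    k = X.shape[1]
    p = X.mean(axis=0)                 # proportion correct per item
    total_var = X.sum(axis=1).var()    # variance of total scores
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_var)

def cronbach_alpha(items):
    """Cronbach's alpha for Likert-type items (same array layout)."""
    X = np.asarray(items, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0).sum()    # sum of individual item variances
    total_var = X.sum(axis=1).var()    # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)
```

With perfectly consistent responding, both indices reach their ceiling of 1.0, which is a quick sanity check on the implementation.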
To determine whether the professional development intervention had a significant effect
on music teacher assessment literacy and beliefs, I used a multivariate analysis of variance
(MANOVA) with assigned group serving as the independent variable and assessment literacy
and assessment belief difference scores (posttest mean minus pretest mean) serving as the
dependent variable set. Historically, the use of difference scores to determine whether an
experimental manipulation or educational intervention has had an intended effect has been
discouraged by some statisticians and psychometricians. In recent years, however, scholars have
qualified such criticisms depending on the kind of design employed and the nature of the
research question (Henson, 2001; Thomas & Zumbo, 2012; Thompson, 2010). For example,
Thompson (2010) objected to the use of repeated measures ANOVA in designs where there is
only one posttest, and argued that difference scores were more accurate in determining the
efficacy of an intervention in such designs. Thomas and Zumbo (2012) demonstrated
mathematically why there is little difference in the statistical outcome between using difference
scores in a MANOVA and using a repeated measures analysis. They stipulated, however, that
there might be slight differences in the significance level and effect size.
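With a two-level independent variable and two dependent variables, this MANOVA reduces mathematically to a two-sample Hotelling's T² test, which can be sketched as follows (an illustrative NumPy/SciPy sketch with invented difference scores, not the SPSS procedure used in the study):

```python
import numpy as np
from scipy import stats

def hotelling_t2(group_a, group_b):
    """Two-sample Hotelling's T^2: the two-group special case of a
    one-way MANOVA. Rows = participants, columns = dependent variables."""
    A = np.asarray(group_a, dtype=float)
    B = np.asarray(group_b, dtype=float)
    n1, n2, k = len(A), len(B), A.shape[1]
    d = A.mean(axis=0) - B.mean(axis=0)          # mean difference vector
    # Pooled covariance matrix
    S = ((n1 - 1) * np.cov(A, rowvar=False) +
         (n2 - 1) * np.cov(B, rowvar=False)) / (n1 + n2 - 2)
    t2 = (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(S, d)
    # Convert T^2 to an exact F statistic
    f_stat = t2 * (n1 + n2 - k - 1) / (k * (n1 + n2 - 2))
    p_value = stats.f.sf(f_stat, k, n1 + n2 - k - 1)
    return t2, f_stat, p_value

# Invented (literacy diff, beliefs diff) scores for two small groups
a = [[2, 5], [1, 3], [3, 6], [0, 2]]
b = [[0, 1], [1, 0], [2, 3], [1, 2]]
t2, f_stat, p_value = hotelling_t2(a, b)
```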
In addition to the MANOVA for intervention effects on music teacher assessment literacy
and beliefs, I used a Mann Whitney U procedure to compare assessment practices of music
teachers representing the intervention and control groups. This was necessary because the data
for participants’ assessment practices did not meet the assumption of normality required for the
parametric equivalent. Upon examination of the descriptive statistics from participants’
responses to the assessment practice items, it was apparent that participants interpreted the items
as ordinal rather than interval in nature (i.e., that the differences between points on the scale
were not equal). This further reinforced my decision to use a Mann Whitney U test to compare
participants’ pretest and posttest frequency reports for assessment practices.
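A Mann-Whitney U comparison of ordinal frequency ratings can be run with SciPy (the ratings below are invented for illustration only):

```python
from scipy.stats import mannwhitneyu

# Hypothetical frequency ratings (1 = "Never" ... 5 = "Almost Always")
control = [2, 3, 3, 1, 2, 4, 3, 2]
intervention = [4, 5, 3, 4, 5, 4, 3, 5]

# Nonparametric comparison of the two independent groups
u_stat, p_value = mannwhitneyu(control, intervention, alternative="two-sided")
```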
Finally, I used Pearson correlations and Spearman’s rho analyses to answer the fourth
research question and explore possible relationships between music teachers’ assessment
literacy, beliefs, and practices. Spearman’s rho operates similarly to the Pearson correlation,
but can be used when one or more variables being correlated reflect ordinal measurement.
Additionally, while “in parametric correlation the relationship between the two variables should
be linear, the relationship between any two variables being examined through Spearman
correlation should be monotonic” (Russell, 2018, p. 292). That is, the relationship between the
variables should demonstrate mutual growth or decrease, or an inverse relationship. Because the
assessment practices data were ordinal, this test was the most appropriate analysis to use. I
utilized both participants’ pretest and posttest scores for the measures; specifically, the summed
pretest and posttest literacy scores, the summed pretest and posttest assessment belief scores, and
the self-reported pretest and posttest mean scores of assessment practices.
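Both correlation types can be computed with SciPy (invented scores for illustration: Pearson for the interval-level literacy and beliefs totals, Spearman where an ordinal variable is involved):

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical summed scores for six participants
literacy = [12, 15, 9, 18, 14, 11]   # total literacy scores (0-20)
beliefs  = [60, 72, 55, 80, 70, 58]  # total beliefs scores
practice = [3, 4, 2, 5, 4, 2]        # ordinal frequency ratings

r, r_p = pearsonr(literacy, beliefs)        # linear association
rho, rho_p = spearmanr(literacy, practice)  # monotonic association
```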
Chapter 4
Results
Nearly 20,000 inservice K-12 music educators were solicited for study participation via
the National Association for Music Education’s (NAfME) Research Survey Assistance program
in March 2020. A follow-up email was not distributed after NAfME halted all research assistance
on March 18th due to the COVID-19 pandemic. Of the NAfME members who received the
invitation email, 247 educators accessed the link describing the study, 108 provided informed
consent and were randomly assigned (through a feature within Qualtrics) to approximately
equal control and intervention groups, 74 completed the pretest in full, and 43 completed the
study in full.
I analyzed data using IBM SPSS Statistics (Version 26.0). First, I sought to establish the
extent to which the control and intervention groups were both representative and equivalent in
terms of participant characteristics. Second, I estimated the reliability of the Classroom
Assessment Literacy Inventory (CALI) and the Music Teacher Assessment Beliefs Inventory
(MTABI), and also conducted item-level difficulty and discrimination analyses for the CALI.
Third, I produced descriptive statistics for participants’ demographic features and responses to
each of the three measures: the CALI, the Music Teacher Assessment Implementation Inventory
(MTAII), and the MTABI. Finally, I conducted difference testing (i.e., MANOVA, Mann-
Whitney U) to answer the research questions. The following chapter is organized in the four
sections described above. The research questions addressed group differences over time and
relations among assessment literacy, beliefs, and practices; thus, all analyses associated with the
major research questions were based upon the total number of participants who completed the
study.
Participant Demographics
Participant characteristics can be found in Table 4.1. A volunteer sample of forty-three
music educators from a target population of roughly 20,000 NAfME members currently teaching
music in K-12 schools completed the requirements of the study. Of those, 25 (58.1%)
participants represented the control group, and 18 (41.9%) the intervention group. Twenty-seven
of the participants were female (62.8%), and 16 were male (37.2%); within the respective
experimental groups, there were 15 females and 10 males in the control group, and 12 females
and six males in the intervention group. I found, using a chi-square test of independence, no
significant association between assigned group and gender [χ²(1) = 0.19, p = .655]. A non-
significant association suggests the groups were comparable on this characteristic.
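The gender-by-group test can be reproduced from the counts reported above (with SciPy; passing correction=False requests the uncorrected Pearson chi-square, which appears to match the reported statistic):

```python
from scipy.stats import chi2_contingency

# Observed counts from the text: gender (rows) by assigned group (columns)
table = [[15, 12],   # female: control, intervention
         [10, 6]]    # male:   control, intervention

# correction=False disables the Yates continuity correction
chi2, p, dof, expected = chi2_contingency(table, correction=False)
```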
Within this sample, 37 of the participants identified as Caucasian or non-Hispanic (86%),
while the remaining six identified as one of the four other ethnicities. In the control group, 22
participants identified as Caucasian, one as Hispanic or Latinx, one as American Indian, and one
as bi- or multi-racial; within the intervention group, 15 identified as Caucasian, one as Black or
African American, and two as bi- or multi-racial. Using a 2 x 2 chi-square test of independence
with races and ethnicities collapsed to Caucasian and non-Caucasian, I found no significant
association between assigned group and race/ethnicity [χ²(1) = 0.19, p = .663].
Of the 43 participants, 19 taught elementary grades (44.2%), one taught grades K-8
(2.3%), eight taught exclusively middle school grades (18.6%), ten taught a mixture of grades 6-
12 (23.3%), four taught exclusively high school grades (9.3%), and one taught all K-12 grade
levels (2.3%). Within the control group, 13 participants taught some combination of elementary
grades, four taught middle school grades, and eight taught a mixture of middle and high school
grades. In the intervention group, eight taught mostly elementary grades, four taught middle
school grades, and six taught mostly high school grades. After collapsing these descriptions into
“elementary”, “middle”, and “high” levels to ensure cells had a minimum count of five, I found
no significant association between the assigned group and grade levels taught [χ²(2) = 0.35, p =
.840].
Participants taught a mixture of musical subjects. Fifteen participants taught chorus or
vocal courses (34.9%), 16 taught band (37.2%), six taught orchestra (14%), nine taught an
instrumental ensemble other than band or orchestra (e.g., guitar, jazz band, marching band, etc.;
20.9%), 27 taught general music (62.8%), two taught music appreciation (4.7%), two taught
music theory (4.7%), and three taught in a visual and performing arts program (7%). Due to the
number of courses present, I could not collapse descriptors into a reasonable number to conduct a
chi-square analysis; however, groups had comparable counts of the three most prominent
ensemble courses (i.e., band, chorus, and orchestra), alternative ensembles, and other courses.
There was a large range, from 2 to 39 years, of teaching experience within this sample (M
= 14.37, SD = 9.14). Within the control group, participants averaged 15.20 years of experience
(SD = 8.99), and those in the intervention group averaged 13.22 years of experience (SD = 9.48).
Using an independent samples t-test, I found no significant difference between experimental
groups based upon participants’ average years of teaching experience [t(41) = .002, p = .491].
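An independent-samples t test can be recomputed directly from the reported group sizes, means, and standard deviations (an illustrative cross-check with SciPy; the published SPSS output remains the study's record):

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics reported in the text for years of teaching experience
t_stat, p_value = ttest_ind_from_stats(
    mean1=15.20, std1=8.99, nobs1=25,   # control group
    mean2=13.22, std2=9.48, nobs2=18,   # intervention group
)
```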
With regard to educational credentials, 13 participants reported holding a Bachelor’s
degree (30.2%), 18 held a Master’s degree (41.9%), 11 had acquired an additional 30 credits
after a Master’s degree (25.6%), and one had completed a Doctoral degree (2.3%). I collapsed
the post-baccalaureate degree category to include the Master’s, “+30,” and Doctoral degrees, and
found no significant association between assigned group and level of education [χ²(1) = 0.09, p =
.766].
Finally, I asked participants five questions about their prior experiences with assessment,
and general preparedness to teach music and assess student learning. Only five participants
(11.6%) reported having a specific course devoted to assessment in their undergraduate
coursework. On a six-point
Likert type scale ranging from “Very Unprepared” to “Very Prepared”, participants, on the
whole, reported feeling prepared to be a music teacher after their undergraduate coursework
(72.09%), while only 18 (41.9%) reported feeling prepared to assess student learning. A majority
of participants also reported that they had never taken a prior workshop on assessment (67.7%),
or any kind of course on assessment following their undergraduate coursework (76.7%).
Table 4.1. Descriptive Statistics for Participant Demographics (N = 43)

Experimental Group
  Control Group: 25 (58.1%)
  Intervention Group: 18 (41.9%)
Gender
  Female: 27 (62.8%)
  Male: 16 (37.2%)
Race & Ethnicity
  Caucasian or Non-Hispanic: 37 (86.0%)
  Black or African American: 1 (2.3%)
  Hispanic or Latinx: 1 (2.3%)
  American Indian or Alaska Native: 1 (2.3%)
  Biracial or multi-racial: 3 (7.0%)
Grade Level
  Elementary: 19 (44.2%)
  Elementary + Middle: 1 (2.3%)
  Middle: 8 (18.6%)
  Middle + High: 10 (23.3%)
  High: 4 (9.3%)
  K-12: 1 (2.3%)
Courses Taught
  Vocal/Choral: 15 (34.9%)
  Instrumental Band: 16 (37.2%)
  Instrumental Orchestra: 6 (14.0%)
  Instrumental Other: 9 (20.9%)
  General Music: 27 (62.8%)
  Music Appreciation: 2 (4.7%)
  Music Theory: 2 (4.7%)
  Visual & Performing Arts: 3 (7.0%)
Educational Background
  Bachelor's degree: 13 (30.2%)
  Master's degree: 18 (41.9%)
  Master's +30: 11 (25.6%)
  Doctoral degree: 1 (2.3%)
Years Teaching Experience
  1-10: 16 (37.2%)
  11-20: 19 (44.2%)
  21-30: 6 (13.9%)
  31-40: 2 (4.7%)
Reliability and Item Analysis
Given the importance of research measure fidelity in providing a trustworthy test of an
educational intervention, I estimated the reliability of the Classroom Assessment Literacy
Inventory (CALI; Mertler, 2004), as adapted for a music education context, and confirmed the
reliability of the Music Teacher Assessment Beliefs Inventory (MTABI; Austin & Russell, 2017).
I did not evaluate the reliability of the Music Teacher Assessment Implementation Inventory
(MTAII), because that measure served a primarily descriptive purpose (i.e., it collected scaled
frequency data rather than participants’ levels of agreement or attitudes). For the purposes of
reliability and item analysis, I utilized the pretest responses of the 43 music educators who
completed the requirements of the study.
CALI Reliability and Item Analysis
In prior studies, researchers have used the Kuder-Richardson 20 (KR20) index to estimate
the reliability of assessment literacy scores measured by the CALI and its related forms (i.e.,
TALQ, ALI). The KR20 index is used to calculate the internal consistency of dichotomously
scored items, that is, items that may be scored correct or incorrect (Thompson, 2010). KR20
values normally range from 0 to 1, with higher values representing a more internally consistent
instrument; values of .70 or higher are considered adequate for research purposes. It is important
to note, however, that this standard is commonly understood to apply to instruments with 50 or
more items of homogeneous difficulty (Thompson, 2020, p. 668). Finally, the KR20 index itself
can be interpreted as an estimate of the proportion of score variance that did not result from error.
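For readers who wish to replicate the computation, the KR20 index can be sketched in a few lines of Python (an illustrative implementation, not the procedure used in the study; the convention shown uses the population variance of total scores):

```python
import numpy as np

def kr20(responses):
    """Kuder-Richardson 20 for dichotomously scored (0/1) items.

    responses: 2-D array of shape (n_respondents, n_items).
    """
    responses = np.asarray(responses, dtype=float)
    k = responses.shape[1]                          # number of items
    p = responses.mean(axis=0)                      # proportion correct per item
    sum_item_var = np.sum(p * (1 - p))              # summed item variances (pq)
    total_var = responses.sum(axis=1).var(ddof=0)   # variance of total scores
    return (k / (k - 1)) * (1 - sum_item_var / total_var)
```

With this convention, a set of perfectly consistent items returns exactly 1.0, while unrelated items drive the estimate toward (or below) zero.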
Impara et al. (1993) reported a TALQ reliability estimate of .54 for a national sample of
555 inservice teachers, and Campbell (2002), using the ALI with a convenience sample of 220
preservice teachers, reported a reliability estimate of .74. In a 2004 study of 67 preservice
teachers and 101 inservice teachers, Mertler reported KR20 estimates of .74 for preservice
teachers and .44 for inservice teachers on the CALI, which he interpreted as “comparable” to
prior researchers’ reliability estimates.
The original CALI measured seven facets of assessment literacy, corresponding to the
seven Standards for Teacher Competence in the Educational Assessment of Students (STCEAS),
using a total of 35 items. In my adapted measure, I reduced this to four facets and a total of 20
items. I used the KR20 index to evaluate the internal consistency of the adapted CALI. For the
sample of 43 music educators who completed all requirements of the study, the KR20 reliability
estimate was .29, a result that could indicate poor internal consistency of the measure. To
investigate possible reasons for poor internal consistency, I conducted three further analyses:
correlations among scores for individual CALI items, the 5-item standards, and the 20-item
CALI measure; difficulty indices for individual items; and discrimination indices for individual
items.
Correlations
I computed scores for each standard by summing the number of items correct out of the
five items for that standard. I also summed the items correct across the four standards to arrive at
a total CALI score. I then calculated bivariate correlations between scores on each standard and
total CALI scores (Table 4.2). All correlations between total CALI scores and scores for each
standard were of moderate magnitude (r = .42 to .67) and significant at p < .01; participants’
performance within each standard thus correlated moderately with their overall test performance.
Next, I correlated participants’ scores on individual items with the total score for the
corresponding standard (Table 4.2). Across all four standards, scores for at least three items
were moderately and significantly correlated with total scores for the standard. Item-standard
correlations were weakest for Item 3.2 and Items 4.2 and 4.5. Based upon these correlations, it
appears there is reasonable alignment between item-level responses and scores for the CALI
standards.
Table 4.2. Correlations of Item Scores with CALI Standards (N = 43)

             Pretest Total   Item 1   Item 2   Item 3   Item 4   Item 5
Standard 1       .42**        .45**    .66**    .38*     .50**    .51**
Standard 2       .67**        .36*     .46**    .46**    .36*     .70**
Standard 3       .64**        .58**    .22      .50**    .59**    .42**
Standard 4       .57**        .42**    .16      .42**    .70**    .21
**Significant at p < .01 (2-tailed).
*Significant at p < .05 (2-tailed).
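The correlations in Table 4.2 can be reproduced in principle with SciPy (a sketch using randomly generated stand-in responses, not the study’s data; note that correlating an item with a standard total that includes that item inflates the estimate somewhat):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
# Hypothetical 0/1 responses: 43 respondents x 5 items for one standard.
items = rng.integers(0, 2, size=(43, 5))
standard_total = items.sum(axis=1)   # summed score for the 5-item standard

for i in range(items.shape[1]):
    r, p = pearsonr(items[:, i], standard_total)
    print(f"Item {i + 1}: r = {r:.2f}, p = {p:.3f}")
```

A corrected item-total correlation, which excludes the item from the total before correlating, avoids that inflation.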
Difficulty and Discrimination Indices
To provide additional perspective on the psychometric quality of CALI items, I conducted
difficulty and discrimination analyses (Table 4.3). A difficulty index is essentially the percentage
of respondents that answered an item correctly, expressed as a decimal. A discrimination index
compares the percentage of correct responses from the highest and lowest scoring groups to
evaluate whether the item distinguishes between the respondents scoring highest and lowest
overall (Salkind, 2018, p. 166). The formula for calculating a discrimination index is d = (Nh -
Nl) / pT, where d is the discrimination index, Nh is the number of respondents in the high scoring
group who answered correctly, Nl is the number of respondents in the low scoring group who
answered correctly, p is the percentage threshold the researcher selects to define the high and low
scoring groups, and T is the total number of responses for the item. Like a Pearson correlation, it
is expressed as a value between -1 and 1, with positive values representing instances where more
respondents in the high scoring group answered correctly than in the low scoring group (p. 167).
Thus, the discrimination of an item is constrained by its difficulty; an item can only have perfect
discrimination if all of those in the highest scoring group answered correctly and all of those in
the lowest scoring group answered incorrectly (p. 169). I selected a threshold of the highest and
lowest scoring 20% from the sample (n = 16).
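Under the upper/lower-group approach just described, both indices can be computed as follows (a sketch assuming equal-sized extreme groups defined by total score; the function and variable names are hypothetical):

```python
import numpy as np

def item_indices(responses, group_frac=0.2):
    """Difficulty and discrimination for dichotomous (0/1) items.

    responses: (n_respondents, n_items) array of scores.
    group_frac: fraction of the sample placed in each extreme group.
    """
    responses = np.asarray(responses, dtype=float)
    n = responses.shape[0]
    g = max(1, int(round(n * group_frac)))      # size of each extreme group
    order = np.argsort(responses.sum(axis=1))   # respondents sorted by total score
    low, high = order[:g], order[-g:]
    difficulty = responses.mean(axis=0)         # proportion answering correctly
    discrimination = responses[high].mean(axis=0) - responses[low].mean(axis=0)
    return difficulty, discrimination
```

An item answered correctly by every member of the high group and by no member of the low group receives a discrimination of 1.0.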
Table 4.3. Difficulty (D) & Discrimination (d) Indices

              D      d
Standard 1
  1.1       0.98   0.00
  1.2       0.58   0.00
  1.3       0.93   0.00
  1.4       0.84   0.10
  1.5       0.26   0.05
Standard 2
  2.1       0.88   0.10
  2.2       0.19   0.15
  2.3       0.63   0.15
  2.4       0.98   0.00
  2.5       0.44   0.20
Standard 3
  3.1       0.81   0.10
  3.2       0.93   0.00
  3.3       0.53   0.20
  3.4       0.51   0.20
  3.5       0.95   0.00
Standard 4
  4.1       0.42   0.10
  4.2       0.86   0.00
  4.3       0.42   0.20
  4.4       0.40   0.20
  4.5       0.09   0.00
Overall, it appears that there were a number of easier items (i.e., 9 of the 20 items had a
difficulty index > .80) that likely constrained the discriminating power of the CALI and possibly
contributed to its lack of internal consistency as well. It is important to note that item difficulty and
discrimination indices are not always accurate representations of item quality. A certain number
of difficult and easy items may be needed to adequately sample the full range of assessment
literacy among a group of music teachers. Item difficulty and discrimination indices also are
sensitive to the type and number of individuals being tested, as well as random error (e.g., item
ambiguity, clues, or other technical defects).
The two most difficult items for participants (Items 2.2 and 4.5) had limited to no ability
to discriminate between the highest and lowest performing respondents. I performed a distractor
analysis to determine which foils participants selected most frequently. Item 2.2
(Appendix F) asked respondents to determine which strategy would increase the reliability of a
test. Participants most frequently (and incorrectly) selected the first foil, which more accurately
described how to increase the validity of a test. Thus, I determined that this item was difficult
because participants may not have known the difference between reliability and validity. Item
4.5 asked respondents to determine which factor might invalidate comparisons between scores
on standardized tests. Participants almost exclusively selected the first foil, which speculated that
scores might differ in districts more aligned to the standards upon which the test was based. I
determined that participants likely answered this item incorrectly because they did not
understand the difference between a valid and an invalid comparison of scores. A detailed
distractor analysis can be found in Appendix K.
MTABI Reliability Analysis
I utilized Austin and Russell’s (2017) instrument for this study; however, rather than
create subscale scores based on factor analysis results, I reverse coded negatively phrased items
and then summed responses across all items to create total MTABI scores reflecting the extent to
which participants adopted a positive orientation toward assessment as an important, valuable,
and trustworthy aspect of music teaching and learning. Upon request, Austin provided
Cronbach's α for their data (N = 406), after reverse coding to match the specifications of this
study; he reported strong reliability for the measure (α = .92). I used the same procedure to
establish the internal consistency of the 17 items for the 43 participants who completed the
pretest, and found their scores to be highly consistent (α = .89).
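The reverse coding and alpha computation can be sketched as follows (illustrative only; the study’s analyses were run in SPSS, and the helper names here are hypothetical):

```python
import numpy as np

def reverse_code(x, scale_min=1, scale_max=6):
    """Flip a Likert response so that 1 <-> 6, 2 <-> 5, and so on."""
    return (scale_min + scale_max) - np.asarray(x)

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()    # summed item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return (k / (k - 1)) * (1 - item_var / total_var)
```

Negatively phrased items would be passed through `reverse_code` before the matrix is handed to `cronbach_alpha`.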
Participant Attrition
I would be remiss not to note the possible impact of participant attrition on study results,
especially in light of the study occurring during the beginning of the first wave of the COVID-19
pandemic. I initially recruited 108 participants. Of that number, only 74 completed the pretest
measure (31 in the control group, and 43 in the intervention group). From March 18th, when the
pretest was first available, to March 23rd, when the intervention began, 18 participants enrolled
in and later completed the professional development workshop. The posttest was distributed on
April 2nd and closed on April 27th; a total of 43 participants completed the measure and the
study (25 in the control group, and 18 in the intervention group). Thus, over the course of the
study, from March 18th until April 27th, approximately 42% of participants dropped out, with,
as might be expected, a greater attrition rate for the intervention group (58%) than
for the control group (19%). To determine whether attrition differentially impacted intervention
group teachers as compared to control group teachers, I conducted a 2 x 2 factorial ANOVA,
with assigned group (intervention, control) and study completion status (completed, not
completed) as the independent variables, and pretest CALI scores as the dependent measure.
There was no significant interaction (F = .27, p = .602) or main effects (F = .13, p = .718, for
assigned group, F = .36, p = .553 for study completion status), which suggests that attrition did
not differentially affect the groups in terms of the assessment literacy they exhibited at the
beginning of the study. I then repeated this analysis with pretest MTABI sum scores as the
dependent measure. Again, I found no significant interaction (F = .07, p = .797) or main effects
(F = .86, p = .358, for assigned group; F = .00, p = .949, for study completion status). Thus, I
concluded that attrition did not differentially affect the groups in terms of participants’
assessment beliefs.
Descriptive Statistics for the CALI, MTAII, and MTABI
Using SPSS 26, I conducted descriptive analyses of the 43 participants’ responses on the
pretest and posttest CALI, MTAII, and MTABI measures. Results are reported in Tables 4.4, 4.5,
and 4.6. Descriptive analyses also served to evaluate whether the data met the assumptions
required for further inferential statistical analyses.
CALI Descriptives
Participants responded to 20 multiple-choice items. After data collection, items were
recoded to reflect the dichotomous (i.e., correct or incorrect) nature of the questions, using “1”
for correct responses and “0” for incorrect responses. Subsequently, data were summed for the
total measure and for each of the four STCEAS standards to which items corresponded. I ran
descriptive statistics on these summed scores; the means and standard deviations are reported in
Table 4.4.
Table 4.4. CALI Pretest and Posttest Descriptive Statistics (N = 43)
Pretest Posttest
Mean SD Skew Kurtosis Mean SD Skew Kurtosis
Standard 1 3.58 0.91 -0.46 0.43 3.95 0.69 0.06 -0.19
Standard 2 3.12 0.93 0.13 -0.30 3.26 0.82 0.03 -0.80
Standard 3 3.74 0.93 -0.77 0.75 3.93 0.83 -0.13 -0.60
Standard 4 2.19 0.88 0.72 0.10 2.14 1.10 -0.18 -0.93
Total Score 12.63 2.09 0.36 6.19 13.28 1.92 -0.29 -0.19
The general pattern is one of means increasing and standard deviations decreasing over time,
except for Standard 4 (using assessment results to make educational decisions), where the
posttest mean was smaller and the standard deviation larger. Further, music teachers generally
appeared to have lower levels of assessment literacy for Standards 2 (developing and
implementing appropriate assessments) and 4.
MTAII Descriptives
I designed this measure after consulting prior researchers' examination of music
teachers’ assessment practices. Within the literature, researchers conceptualized assessment
practices in two ways: by the specific forms of assessment used (e.g., tests, classwork,
performances, etc.) and the purposes for which these assessments were used (e.g., summative,
formative, diagnostic, etc.). I attempted to account for both form and function by asking music
educators to report the frequency with which they used specific forms of assessment and how
often they used assessments to serve varied functions. The scale ranged from 1-6, with options
“Never”, “Less than Once Per Week”, “Once Per Week”, “Several Times Per Week”, “Nearly
Every Day”, and “Always.” Descriptive statistics for the MTAII are reported in Table 4.5.
Table 4.5. MTAII Pretest and Posttest Descriptive Statistics (N = 43)

                               Pretest          Posttest
                               x̄      SD        x̄      SD
Forms
  Participation               4.18   1.45      3.68   1.48
  Group Performances          3.78   1.22      2.94   1.63
  Individual Performances     2.94   1.13      2.72   1.11
  Attendance                  3.14   1.90      2.52   1.76
  Written Classwork           1.92   0.85      2.50   0.97
  Projects                    2.08   1.18      2.16   0.93
  Written Tests & Quizzes     1.74   0.78      1.82   0.69
  Portfolios                  1.22   0.47      1.40   0.93
Purposes
  Formative                   3.70   1.28      3.72   1.34
  Diagnostic                  3.14   1.49      3.00   1.40
  Extramusical                2.44   1.50      2.10   1.18
  Summative                   2.16   0.84      1.98   0.74
  Placement                   1.50   0.91      1.50   0.74
*Frequency scale from 1-6 (Never to Always).
**Bolded posttest means highlight increases from pretest means.
Within the eight specific assessment forms comprising music teacher practices,
participants generally reported more frequent use of written tests and quizzes, homework,
projects, and portfolios on the posttest. Participants also reported using individual performances,
group performances, attendance, and participation less frequently in their assessment of student
learning. With regard to the functions of assessment, participants self-reported using fewer
summative, diagnostic, and extramusical assessments, and approximately the same number of
assessments for formative and placement purposes.
MTABI Descriptives
As previously discussed, this measure was adapted from Austin and Russell’s (2017)
study. Participants used a 6-point Likert-type scale with response options ranging from “Strongly
Disagree” to “Strongly Agree” to rate their level of agreement with 17 statements comprising
possible beliefs about the value and use of assessments. After data collection, I reverse coded
negatively-phrased items. The means and standard deviations (pretest and posttest) for the 17
items completed by study participants are reported in Table 4.6.
Table 4.6. MTABI Pretest and Posttest Descriptives (N = 43)
Pretest Posttest
x̄ SD x̄ SD
Assessment is an important music teacher responsibility 5.05 0.98 5.16 0.75
Assessment and instruction can be seamlessly integrated 4.86 0.99 5.12 0.91
Assessment helps music teachers to be effective 5.00 0.87 5.00 0.85
Assessment has little impact on music teaching⍑ 4.86 1.21 5.00 0.87
Assessment forces music teachers to contradict their beliefs⍑ 4.30 1.32 4.81 0.98
Assessment consistently provides useful information 4.72 0.93 4.70 0.74
Assessment results are rightfully ignored by most music teachers⍑ 4.56 1.03 4.67 1.06
Assessment reduces music teacher creativity⍑ 4.47 1.33 4.65 1.02
Assessment causes music teachers to be conformists⍑ 4.19 1.26 4.56 1.05
Assessment results are of great use to music teachers 4.30 1.41 4.51 1.06
Assessment interferes with teaching⍑ 3.98 1.32 4.47 1.1
Assessment results are dependable 4.09 0.81 4.37 0.79
Assessment results are trustworthy 4.12 0.79 4.35 0.78
Assessment helps music teachers treat their students fairly 4.02 1.32 4.26 1.22
Assessment results are often inaccurate⍑ 4.09 1.13 4.05 0.87
Assessment typically provides precise information 3.74 0.98 3.95 0.95
Assessment results are prone to error⍑ 3.70 1.06 3.86 0.94
Total 74.05 11.59 77.49 9.30
*Agreement scale from 1-6.
** Bolded posttest means indicate an increase from pretest means.
⍑ Negatively phrased items that were reverse coded after data collection.
After reverse coding items and comparing pretest and posttest descriptives, I found that
participants began and ended the experiment with an overall positive orientation toward
assessment. While music teachers reported higher levels of agreement with 15 of the 17
statements on the posttest, these changes were small.
Research Questions
The research questions for this investigation were:
1. Does a four-week online professional development intervention have a significant effect
on music teachers’ assessment literacy?
2. Does a four-week online professional development intervention have a significant effect
on music teachers’ assessment beliefs?
3. Does a four-week online professional development intervention have a significant effect
on music teachers’ assessment practices?
4. Are there significant relationships between music teachers’ assessment literacy, beliefs,
and practices?
In the following section, I describe how I met the assumptions necessary for the
parametric and nonparametric statistical analyses I employed, and the results of those analyses.
Multivariate Analysis of Assessment Literacy and Beliefs
To determine whether a four-week online professional development intervention had a
significant effect on music teachers’ assessment literacy and their beliefs about the value of
assessment, I employed a multivariate analysis of variance (MANOVA), with assessment
literacy and belief gain scores (i.e., the change in participants’ scores from pretest to posttest)
serving as the dependent variables. As discussed in Chapter 3, I considered this analysis the
most appropriate procedure for answering the first two research questions, as opposed to a
repeated measures MANOVA or a MANCOVA.
The assumptions for a MANOVA are (Russell, 2018, p.131):
1. The data collected for the dependent variables are continuous, rather than categorical.
2. The data collected for the independent variable are categorical rather than continuous.
3. Each observation is independent of any other observation.
4. All of the dependent variables are normally distributed themselves, and any combination
of dependent variables is normally distributed (multivariate normality).
5. Each of the dependent variables has equal variance when compared to each independent
variable.
6. There is a linear relationship between independent and dependent variables.
All assumptions for this analysis were met. The first three assumptions were met through
study design. The fourth assumption was met through analyzing descriptive statistics, a Shapiro-
Wilk normality test, and visual inspection of Q-Q plots in SPSS 26. I determined that two cases
in my data set were outliers in the CALI difference score variable and used the “Select Cases”
function to omit them from subsequent analyses. As a result, my sample size for this analysis
was 41, with 24 participants in the control group and 17 in the intervention group. After retesting
my dataset, I determined that I met the assumptions for normality. The fifth assumption was met
by conducting a Box’s M test (Box M = 1.56, p = .669). Because the test was not significant, I
determined that each of the dependent variables had equivalent variance in relation to the
independent variable. The sixth assumption was met by generating a scatter plot matrix with
Loess lines for each level within the independent variable, and visually inspecting the matrices
for similar features.
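The gain-score screening described above can be approximated as follows (hypothetical data generated for illustration; the study itself relied on SPSS’s Shapiro-Wilk test and Q-Q plots):

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(1)
pretest = rng.normal(12.6, 2.1, size=43)            # stand-in CALI pretest scores
posttest = pretest + rng.normal(0.5, 1.5, size=43)  # stand-in posttest scores
gain = posttest - pretest                           # change from pretest to posttest

stat, p = shapiro(gain)   # H0: the gain scores are normally distributed
print(f"Shapiro-Wilk W = {stat:.3f}, p = {p:.3f}")

# Flag candidate outliers more than 3 SD from the mean gain.
z = (gain - gain.mean()) / gain.std(ddof=1)
print("Outlier indices:", np.flatnonzero(np.abs(z) > 3))
```

Flagged cases would then be inspected (and, as here, omitted) before the omnibus test is run.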
MANOVA Results
Using the gain scores on the two dependent measures (i.e., assessment literacy and
beliefs) as the dependent variable set, I conducted a MANOVA. I found the overall model to be
significant (Λ = .785, F = 5.21, p = .01, ηp² = .22). To determine which mean differences
contributed to the overall significant multivariate outcome, I subsequently conducted univariate
ANOVA tests. I met the equality of variance assumption for both assessment literacy change
scores [F(1, 39) = 1.255, p = .270] and assessment belief change scores [F(1, 39) = .252, p =
.618]. Based on the univariate ANOVA tests, I determined that there was a significant group
difference for assessment literacy change scores [F(1) = 7.731, p = .008, ηp² = .17], but not for
assessment belief change scores [F(1) = 1.580, p = .216]. Participants in the intervention group
(n = 17, x̄ = 1.41, SD = 2.15) exhibited significantly greater growth in assessment literacy over
time than their peers in the control group (n = 24, x̄ = -.25, SD = 1.67). There was not a
significant difference in belief change over time between the intervention (n = 17, x̄ = 3.94, SD
= 8.26) and control (n = 24, x̄ = .83, SD = 7.46) groups. Figures 4.1 and 4.2 depict the pretest
and posttest scores by assigned group for the literacy and beliefs measures, respectively.
Figure 4.1. Pretest to Posttest Mean Literacy Scores
Figure 4.2. Pretest to Posttest Mean Belief Scores
Nonparametric Analyses of Assessment Practices
To answer the third and fourth research questions, it was necessary to utilize
nonparametric analyses, because participant change scores on the MTAII did not meet the
assumptions required for parametric analyses. It was apparent that participants treated the
frequency scale as ordinal in nature (i.e., the distances between points on the frequency scale
were not equal).
Thus, I employed a Mann-Whitney U test to determine whether differences existed
between groups for assessment forms and assessment functions/purposes (i.e., the third research
question). For this analysis, I utilized all 43 participants because the test does not require
normally distributed data.
The assumptions for a Mann-Whitney U test are (Russell, 2018, p. 270):
1. The independent variable (or grouping variable) is binary. That is, the independent
variable is membership in one or another group or category.
2. The data collected for the dependent variable are ordinal or continuous rather than
categorical.
3. Each observation is independent of any other observation (most often accomplished
through random sampling).
All assumptions were met by virtue of the study design and type of data collected.
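With those assumptions satisfied, the group comparison for each MTAII change score reduces to a single SciPy call (illustrative data below, not the study’s):

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(2)
# Hypothetical pretest-to-posttest change scores for one assessment form.
control = rng.integers(-2, 3, size=25)
intervention = rng.integers(-2, 3, size=18)

res = mannwhitneyu(control, intervention, alternative="two-sided")
print(f"U = {res.statistic:.1f}, p = {res.pvalue:.3f}")
```

The test ranks all 43 change scores jointly, so only the ordinal ordering of responses matters.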
In order to answer the fourth research question, I used a Pearson correlation to analyze
the relationship between music teachers’ (N = 43) assessment literacy and beliefs, and a
Spearman’s Rho analysis to explore relationships between music teachers’ assessment literacy
and practices, and assessment beliefs and practices. I used pretest and posttest data for all
analyses. The assumptions for a Pearson correlation are like those of a Spearman’s Rho analysis
and differ only in that all data must use scale-level measurement. The assumptions for a
Spearman’s Rho analysis are (Russell, 2018, p. 291):
1. The variables need to be ordinal or continuous (ratio or interval).
2. Whereas in parametric correlation the relationship between the two variables should be
linear, the relationship between any two variables being examined through Spearman
correlation should be monotonic.
The study design and data collected met these assumptions. It should be noted that the
difference between a monotonic and a linear relationship is that a monotonic relationship only
requires the two variables to move consistently in the same or in opposite directions, whereas a
linear relationship further requires that they change at a constant rate.
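The monotonic-versus-linear distinction can be made concrete with a toy example: for a relationship that is monotonic but nonlinear, Spearman's rho is a perfect 1.0 while Pearson's r falls short of it:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.arange(1, 11, dtype=float)
y = x ** 3                    # monotonic but decidedly nonlinear

rho, _ = spearmanr(x, y)      # rank-based: perfect monotonic agreement
r, _ = pearsonr(x, y)         # linear: penalized by the curvature
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```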
Mann-Whitney U Test Results
The results of this test can be found in Table 4.7. There were no significant differences
between groups in participants’ change scores for the specific assessment forms employed or for
the purposes such assessments served. Change scores were small and eclipsed by the variance in
participants’ scores.
Table 4.7. Mann-Whitney U Test Results on Assessment Practice Mean Change Scores (N = 43)

                                    U        Z       p
Forms
  Written Tests & Quizzes        182.00   -1.13   .261
  Written Classwork & Homework   219.50   -0.14   .886
  Group Performances             162.00   -1.58   .114
  Individual Performances        207.50   -0.45   .651
  Projects                       176.00   -1.28   .199
  Portfolios                     185.00   -1.19   .236
  Attendance                     214.50   -0.28   .778
  Participation                  196.50   -0.75   .454
Purposes
  Summative                      196.00   -0.86   .391
  Formative                      190.00   -0.91   .365
  Diagnostic                     189.00   -0.91   .362
  Placement                      221.50   -0.09   .926
  Extramusical                   214.50   -0.27   .791
*Frequency scale from 1-6 (Never to Always).
Pearson Correlation and Spearman’s Rho Results
The results of these analyses can be found in Table 4.8. Using Pearson correlations, I
found significant, albeit modest to moderate, relationships between CALI and MTABI scores at
both pretest and posttest. Participants who scored highly on the CALI were also likely to score
highly on the MTABI, and those who scored poorly on the CALI were somewhat likely to score
poorly on the MTABI.
Using Spearman’s Rho analyses, I found several significant inverse relationships between
CALI posttest scores and participants’ self-reported use of written tests and quizzes, written
classwork, and participation. Participants who scored higher on the CALI posttest were less
likely to report employing written tests and quizzes, written classwork, and participation as
appraisals of students’ learning than peers who scored poorly on the CALI posttest.
I also found several significant relationships between participants’ assessment beliefs
and practices. Participants who scored highly on the MTABI pretest were somewhat more likely
to self-report using written tests and quizzes. Interestingly, participants who scored highly on the
MTABI posttest did not self-report using written tests and quizzes with any greater frequency
than participants who scored poorly. While there was no relationship between MTABI pretest
scores and self-reported use of group performances, there was a significant inverse relationship
between MTABI posttest scores and self-reported use of group performances. That is, music
teachers who indicated a higher regard for assessment tended to self-report using group
performances less frequently than those who held a lower regard for assessment. The same
pattern held for participation scores: there was no relationship between participants’ MTABI
pretest scores and self-reported use of participation, but there was a significant inverse
relationship between participants’ posttest MTABI scores and self-reported use of participation.
While there was a significant inverse relationship between participants’ MTABI pretest scores
and self-reported use of attendance, on the posttest there was virtually no relationship with
participants’ self-reported use of attendance. At the outset of the study, participants with a higher
regard for assessment were less likely to report using attendance as an appraisal of student
learning. Given that participants’ belief scores on the MTABI posttest held stable, it appears that
attendance was no longer utilized by high or low scorers; this finding likely had more to do with
participants’ transition to digital, distance learning platforms. Finally, participants who scored
highly on the MTABI pretest and posttest were moderately likely to self-report using assessment
for formative purposes.
Table 4.8. Relationships between Assessment Literacy, Beliefs, and Practices (N = 43)

                                 CALI Sum Score       MTABI Sum Score
                                 Pretest  Posttest    Pretest  Posttest
CALI Sum Score                      –        –         0.33*    0.36*
MTABI Sum Score                   0.33*    0.36*         –        –
MTAII Sum Scores
  Written Tests & Quizzes          0.07   -0.38*       0.37*    0.11
  Written Classwork                0.04   -0.33*       0.09     0.14
  Group Performances               0.12   -0.04        0.10    -0.35*
  Individual Performances         -0.01    0.14        0.23    -0.10
  Projects                         0.06    0.08        0.04     0.02
  Portfolios                      -0.27    0.17        0.13     0.13
  Attendance                      -0.07    0.04       -0.31*    0.05
  Participation                    0.07   -0.33*      -0.29    -0.36*
  Summative                       -0.05   -0.05       -0.02    -0.07
  Formative                       -0.01    0.09        0.52**   0.38*
  Diagnostic                       0.05    0.23        0.21    -0.20
  Placement                        0.02    0.07        0.20    -0.02
  Extramusical                     0.07   -0.24        0.11     0.21
**Significant at p < .01 (2-tailed).
*Significant at p < .05 (2-tailed).
Feedback from Intervention Participants
I solicited participant feedback about the intervention at the end of the posttest in the
form of five questions:
● Was the online professional development relevant to you as a music teacher?
● Was the online professional development course appropriately challenging or too
difficult?
● What did you like about the online professional development course?
● What would you have changed about the online professional development course to
make it more enjoyable or useful?
● Would you recommend this online professional development course to other music
educators?
I viewed these ancillary questions and intervention group members’ responses as an
important opportunity to contextualize findings, offer insights into future online course designs,
and provide information about what music teachers need and desire from professional
development experiences. This feedback was not collected or used for analysis, but only to aid in
interpretation and forming recommendations that could benefit music teachers directly. Eight
participants, of the 18 enrolled in the intervention, answered these questions; because slightly
MUSIC TEACHERS’ ASSESSMENT LITERACY, BELIEFS, & PRACTICES 122
fewer than half of the intervention participants provided comments, these should be interpreted
with this limitation in mind. The full set of participant responses can be found in Appendix L.
Question One
All eight participants indicated that the course was relevant to their practice as a music
teacher. Responses ranged in detail from one-word affirmations, to explanations about what they
found particularly useful. One participant noted, “it was very relevant for me as a [sic] jh and hs
choir teacher, and I will be referring back to the chapters as I need.”
Question Two
With regard to the challenge of the course, there was considerable variability in
responses. Four of the respondents felt that the course was appropriately challenging. One
participant did note that specific sections were more challenging than others (“there was only one
section that I got sort of lost, but it wasn’t overly difficult”), but did not provide information
about which section was more challenging for them. Two felt that the course was challenging,
but only because of their current circumstances, in light of the COVID-19 pandemic; one opined,
“I did not have enough time to participate as fully as I would have liked”, and another lamented,
“appropriately challenging for ordinary circumstances, difficulty to manage in what became my
current situation.” One participant thought that the course was “to [sic] easy.” Such variability in perceived difficulty was to be expected, both given the circumstances and in any learning environment.
Question Three
With regard to what participants liked about the professional development, there was a
wide range of responses addressing course design, materials, and activities. Most of the feedback
centered on the relevance and accessibility of the readings, especially the primary text by Brian
Shaw, “Music Assessment for Better Ensembles” (2018). One participant said, “I thought there
was a lot of great content in the primary book we were reading and discussing from. I always
love to learn of a new-to-me author or researcher”, and another stated, “I plan on buying the
book that some chapters of the readings were taken from so I can read it in full and mark it up.”
Others felt the online format was appealing and accessible. Several commented that the activities
were helpful, “focused on real world teaching problems”, that discussions were “engaging” and
“interactive”, and that the course allowed them the “chance to practice something like designing
an assessment.” As noted previously, designing a course that music teachers valued was a
personal goal of mine during this study. This feedback was helpful and aligned to other
researchers’ findings about online professional development (Boling et al., 2011; vanOostveen et
al., 2019; Wasserman & Migdal, 2019) and professional development targeted to teachers’
unique contextual factors (Guskey, 2003, 2009).
Question Four
Participants in the intervention also offered constructive criticism about what course
elements could be altered to enhance their experience. Two participants felt that the course was
fine in its current state, and two others only wished that it had not coincided with the COVID-19
pandemic, but acknowledged that “you [the researcher] had zero control of that” because the
course was tied to a dissertation project. One participant wished that the “course was offered
over a longer period. Eight or more weeks would have allowed me the time to participate more
fully.” Two participants offered helpful feedback about the accessibility of course materials; one
wished that readings were also offered in audio or video formats, and another wished there were
slightly fewer readings. Taking participants’ comments into account, I believe future iterations of
this course would benefit from greater use of alternative formats and from emphasizing some materials over others, perhaps by making some optional.
Question Five
When asked if they would recommend the course to other music teachers, all eight
participants who responded answered in the affirmative, and some were insightful about how
their peers may perceive the course. One noted, “[the course] approaches some negative thinking
and stereotypes that i hear and see from other music teachers in a way that is very clear and
shows alternative approaches, but it may be too [sic] engrained in some people to have a positive
outtake from this pd.” Another stated that “well designed PD in assessment is badly needed”, and
“I think a lot of teachers in my district would benefit from this development.” These comments
aligned with prior findings from other researchers, as well. Teachers do appear to desire
professional development, but they need it to be targeted to their context (Guskey, 2003, 2009).
Chapter 5
Summary and Conclusions
Shifts in educational policy over the last thirty years have increased the demand for
teachers to be proficient in classroom assessment. Yet, teacher educators have been slow to
respond in developing teachers’ assessment competency through alterations to their curriculum
(Darling-Hammond et al., 2002; DeLuca & Klinger, 2011; Gareis & Grant, 2015). However,
researchers have found that preservice music teachers who do receive training in assessment
have more favorable beliefs about assessment and their ability to use it effectively (Austin &
Russell, 2019), and that sustained and relevant professional development can effectively change
inservice teachers’ beliefs surrounding assessment (Huai et al., 2006; Koh, 2011; Mertler, 2009).
To date, no one has examined the effectiveness of using an online professional development
program to enhance music teachers’ assessment literacy, beliefs, and practices.
Therefore, the purpose of this pretest-posttest control group study was to examine the
effects of an online professional development intervention on music teachers’ assessment
literacy, beliefs, and practices. In the spring of 2020, I solicited participation from music
educators with NAfME membership. After two weeks, I obtained informed consent from 108
respondents. A total of 43 participants completed all requirements of the study: 18 in the
intervention group and 25 in the control group. Participants in the intervention group enrolled in
a four-week professional development (PD) focused on increasing music teacher assessment
literacy based upon the first four Standards for Teacher Competence in Educational Assessment
of Students (STCEAS). All participants completed a pretest and posttest consisting of three
measures: the Classroom Assessment Literacy Inventory (CALI), the Music Teacher Assessment
Implementation Inventory (MTAII), and the Music Teacher Assessment Beliefs Inventory
(MTABI).
In this chapter, I will provide a summary of the major findings, situate findings within the
extant literature, offer implications for music teacher PD, discuss limitations of the study, and
provide recommendations for future research.
Summary of Findings
Findings are organized by the four research questions of this study:
1. Does a four-week online professional development intervention have a significant effect
on music teachers’ assessment literacy?
2. Does a four-week online professional development intervention have a significant effect
on music teachers’ assessment beliefs?
3. Does a four-week online professional development intervention have a significant effect
on music teachers’ assessment practices?
4. Are there significant relationships between music teachers’ assessment literacy, beliefs,
and practices?
Assessment Literacy
Using multivariate analysis, I compared assessment literacy and assessment beliefs
change scores of intervention and control group participants. Given a significant multivariate
outcome, I conducted follow-up univariate tests to determine whether group differences applied
to one dependent variable or both. I found a significant difference between groups’ assessment
literacy scores, with a large effect size; thus, differences between assigned groups were unlikely
due to chance. Intervention group participants, on average, answered one to two additional
questions correctly (out of twenty; thus, increasing their score by about 15%) on the posttest than
they did on the pretest. Control group participants, on average, answered the same number of
questions correctly from pretest to posttest. On average, participants demonstrated the most
growth in responding to items from Standard 1 (selecting appropriate assessments), modest
growth for Standard 2 (designing and implementing assessments), while showing nominal to no
growth on items from Standard 3 (scoring and interpreting assessments) and Standard 4 (making
educational decisions based upon assessment data). Participants were most literate in selecting
assessments as well as scoring and interpreting assessment results, less literate in designing and
implementing assessments, and least literate in making decisions based upon assessment data.
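The multivariate comparison of change scores described above can be sketched as a two-sample Hotelling's T² (the two-group special case of MANOVA) on literacy and belief gain scores. The data below are simulated stand-ins, not the study's data, and this is an illustrative reimplementation rather than the software actually used for the analysis:

```python
import numpy as np

def hotelling_t2(a, b):
    """Two-sample Hotelling's T^2 for (n, p) arrays of gain scores
    (e.g., columns = CALI gain, MTABI gain). Returns (T2, F, df1, df2);
    the F statistic is referred to an F(df1, df2) distribution."""
    na, p = a.shape
    nb = b.shape[0]
    d = a.mean(axis=0) - b.mean(axis=0)
    # Pooled within-group covariance matrix of the gain scores
    s = ((na - 1) * np.cov(a, rowvar=False)
         + (nb - 1) * np.cov(b, rowvar=False)) / (na + nb - 2)
    t2 = (na * nb) / (na + nb) * d @ np.linalg.solve(s, d)
    df1, df2 = p, na + nb - p - 1
    f = t2 * df2 / ((na + nb - 2) * p)
    return t2, f, df1, df2

# Simulated gain scores: intervention (n = 18) given larger mean gains
rng = np.random.default_rng(1)
intervention = rng.normal([1.5, 2.0], 1.0, size=(18, 2))
control = rng.normal([0.0, 0.2], 1.0, size=(25, 2))
t2, f, df1, df2 = hotelling_t2(intervention, control)
```

A significant multivariate F would then justify follow-up univariate tests on each gain score separately, mirroring the two-step procedure summarized above.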
Assessment Beliefs
I did not find significant group (intervention, control) differences in assessment belief
gain scores, although there was a qualitative change in music teachers’ assessment beliefs from
the pretest to the posttest. Overall, intervention group participants demonstrated modest growth
in assessment beliefs, while control group participants demonstrated relatively stable assessment
beliefs across time (Appendix M). Participants’ assessment belief scores averaged in the upper
third of possible values, suggesting an overall positive regard for assessment.
Assessment Practices
I found no significant differences between assigned groups for music teachers’ self-
reports of how frequently they used specific forms of assessment or how frequently they used
assessment to serve specific functions.
Relationships between Music Teachers’ Assessment Literacy, Beliefs, and Practices
I found a positive relationship between music teachers’ literacy and belief scores. This
suggests that participants who scored well on the literacy items tended to hold positive beliefs
about the usefulness, value, and trustworthiness of assessment as a basis for educational
reporting and decision making. There were several significant inverse relationships between
music teachers’ literacy scores and their self-reported use of written tests and quizzes, written
classwork, and participation to appraise student performance. There were also significant
relationships between music teachers’ belief scores and their self-reported use of specific
assessment forms (e.g., written tests and quizzes, group performances, attendance, and
participation) and the purpose for which they used assessment (e.g., formative assessments).
Discussion
In this section, I connect major findings to prior literature. Next, I contextualize the major
findings of this study by revisiting and re-imagining McMillan’s (2003) educational decision-
making conceptual map and illustrate how additional factors may shape music teachers’
assessment decision-making. Finally, I explore the role that measurement played in obtaining
these results.
Major Findings
Music Teachers Lack Prior Assessment Training
Only four of the music teachers in this study reported feeling prepared to assess students
after graduation. This finding aligned to Mertler’s (2001) survey of over six hundred inservice
teachers, where roughly the same proportion of respondents reported feeling prepared to assess
student learning after graduation. Consequently, the degree to which preservice training in
assessment moderates inservice music teachers’ assessment beliefs remains to be seen; yet, there
is some evidence that training and coursework for inservice music teachers are associated with
positive assessment beliefs (Austin & Russell, 2017).
While education experts and theorists argue that assessment is part of the instructional
process, it seems that teachers – and music teachers, specifically – do not necessarily conceive of
assessment as an integrated component of instruction. This perception – that assessment is a
significant and uniquely different skillset from instruction – is likely partially informed by the
lack of prior assessment training that music teachers receive. When asked if they felt prepared to
be a music teacher following undergraduate study, nearly three quarters of my participants
answered in the affirmative; yet, when asked if they felt prepared to assess student learning, only
one out of ten participants answered in the affirmative. Ludwig (2013) found that inservice
teachers who felt confident about their assessment knowledge were more likely to have prior
training in assessment, and to hold positive beliefs surrounding the accountability purposes of
assessment.
To date, music education researchers have not explicitly investigated assessment literacy
amongst inservice or preservice music teacher populations. I did collect demographic
information about participants’ prior assessment training and terminal degree; my findings were
comparable to Austin and Russell (2016), when they surveyed graduate programs offering music
education degrees. They found that about three out of five institutions offered a stand-alone
assessment course, and of those, three-quarters required master’s students to take the course.
Thus, it was not surprising that of the 30 participants in my study who held master’s degrees,
only four had taken a prior stand-alone course in assessment, and five felt prepared to assess
students when they entered the teaching profession. Further, of the 30 participants in my study
who held master’s degrees or higher, only eleven had attended a prior workshop focused on
assessment. Austin and Russell (2016) did not collect demographic and assessment training
information from inservice music teachers. Yet, it is not surprising to see that when institutions
do not require assessment courses as part of their curriculum for students, few inservice music
teachers report prior coursework focused exclusively on assessment. While this finding does not
account for institutions that embed assessment training within other methods or curriculum
courses, it is nonetheless apparent that current curricular practices did not lead to assessment
literate participants in this study. Between a lack of dedicated coursework, and evidence that
professional development is more effective in changing practice than preservice coursework
(Gutierrez, 2014), I believe professional development may hold greater potential for developing
assessment literacy in the music teacher population than preservice coursework.
Online Professional Development Formatting
Participants’ assessment literacy increased following an online intervention; this echoes
findings from researchers who have studied the efficacy of online professional development for
inservice teaching populations. Based upon feedback from intervention participants (n = 8),
music teachers in the present study also appeared to appreciate the use of non-traditional formats
(i.e., delivery mechanisms, materials, and activities not used in face-to-face learning
experiences). Boling et al. (2011) and vanOostveen et al. (2019) reported similar findings while
investigating teachers’ beliefs about theories of learning via online PD.
To date, researchers have not specifically studied the efficacy of specific course elements
(other than online or face-to-face formats). However, researchers have found that use of novel
online information and communication technology (ICT) in digital formats is more effective than
adaptation of traditional presentation strategies (e.g., taped lectures) in online PD (Boling et al., 2011; DeLuca et al., 2004; Huai et al., 2006; vanOostveen et al., 2019; Wasserman & Migdal, 2019; Wang et al., 2008). Such findings are partially due to the collaborative opportunities such
formats provide, as well as the potential for creating relevant PD that addresses teachers’ unique
contextual factors (Guskey, 2003, 2009). In developing the intervention, I purposively selected
activities that would balance opportunities for collaboration (e.g., the weekly discussions using
the Perusall application) with authentic application of content in music teachers’ unique context
(e.g., the Teacher-Constructed tasks). However, it may be worthwhile to examine the
effectiveness of different kinds of tasks, such as journals, in enhancing teachers’ assessment
literacy, beliefs, and practices.
Assessment Literacy Can Be Impacted through Intervention
I found that intervention group participants demonstrated significantly greater assessment
literacy than their control group peers at the conclusion of the study. Fan et al. (2011) also found
inservice teachers’ (N = 47) assessment literacy increased on the Assessment Knowledge Test
(AKT) – a researcher-designed measure – following a six-week online professional development
course. The AKT bears little resemblance to the CALI; thus, growth comparisons are difficult to
make. The AKT was not designed to align to the STCEAS, used a different number of items, and
was scored differently from the CALI. However, Fan et al.’s finding does lead me to wonder
whether a longer intervention (e.g., six or eight weeks) would have led to significant changes in
music teachers’ assessment beliefs. Koh’s (2011) finding that sustained professional
development (albeit in a face-to-face format) was more effective than traditional “one-shot”
workshops in changing teachers’ assessment literacy in the long term suggests that such a change
should be further investigated. In fact, in feedback from intervention participants, one teacher
specifically commented that they would have preferred a longer workshop. When determining the length of my intervention, I weighed the rigor of the course against the possibility of participant attrition if the course ran too long. In future uses of this professional development, I
may elect to extend the duration to eight weeks, allotting two weeks per module, which may
allow participants more time to synthesize and reflect upon content as they perform the teacher-
constructed task. Or, in extending the length of the intervention, I could allot more time for
standards that participants perceived as more challenging to understand, or that were resistant to
improvement on the posttest. In future studies, I may compare the impact of varying intervention
durations upon assessment literacy, beliefs, and practices.
Mertler (2009) used the Assessment Literacy Inventory (ALI) to measure teachers’
assessment literacy before and after a two-week professional development intensive (N = 7). The
ALI is comparable to the CALI; there are an equivalent number of items aligned to the STCEAS,
and items are scored identically; only the grouping and organization of items differ between
measures. Mertler (2009) found that participants demonstrated, on average, lower competency
amongst items from Standards 1, 2, and 3 on both the pretest and posttest in comparison to those
enrolled in the intervention in my study. However, Mertler’s participants did show greater
change in their scores on items from Standards 2, 3, and 4 from pretest to posttest, as well as
greater change overall on all four standards from pretest to posttest. In contrast to Mertler’s
(2009) findings, I found that participants’ scores increased the most for items comprising
Standards 1 (selecting appropriate assessments) and 3 (scoring and interpreting assessment
results). While Mertler’s intervention was a two-week in-person intensive, mine was an online
four-week asynchronous experience; differences in participants’ assessment literacy following an
intervention may be, in part, attributable to the difference in duration and/or format. Further,
Mertler studied general education inservice teachers (i.e., English, math), while I studied inservice music teachers exclusively. These differences could also be due to changes in
preservice teacher curriculum since Mertler conducted his study.
Assessment Beliefs are Related to Assessment Literacy
I found a modest significant relationship between assessment beliefs and assessment
literacy, indicating that those who were more assessment literate tended to hold higher regard for
assessment, and vice versa. While the intervention did not directly focus upon or target teachers’
assessment beliefs – though perceptions about assessment and its value were certainly alluded to
in the weekly discussion board – it is evident that the experience had some influence on
participants’ beliefs surrounding assessment. The focus of the intervention was primarily on
assessment literacy, i.e., the adaptable knowledge of processes and methods used to evaluate
student learning. Specifically, the intervention was created to enhance teachers’ knowledge about
selecting, designing, implementing and scoring, and interpreting assessments in their teaching
practice. One of my original hypotheses was that intervention participants would demonstrate a significant increase in assessment literacy compared to their peers.
Participant assessment literacy scores and assessment belief scores were moderately
related, suggesting that assessment literacy may inform assessment beliefs. That is, the more participants know about assessment and how to implement it effectively, the higher regard they may hold for assessment. This was in keeping with Fan et al.’s (2011) findings after
investigating the effectiveness of an online program to enhance secondary inservice teachers’ (N
= 47) assessment knowledge and perspectives. This further suggests that assessment literacy
may, in some way, be influenced by the beliefs that music teachers hold about assessment
(Austin & Russell, 2019). Or, conversely, greater assessment literacy may inform music
teachers’ beliefs about assessment. Quilter and Gallini (2000) found that teachers’ past
experiences with classroom assessment correlated highly with their current beliefs.
Assessment Beliefs Appear Stable Across Time
Music teachers in this study held an overall positive view about the value and purposes of
assessment at the outset and conclusion of the study. While some beliefs nominally changed,
such changes were not significantly different between groups over time. While the four-week
period between the administration of the pretest and posttest may not have been enough time to
impact participants’ beliefs, I believe that assessment beliefs and other internal factors described
by McMillan may be resistant to change due to their connection to music teachers’ overall self-
identification with what Bartel (2004) and others have termed the “teacher-conductor” model. In
this model, music teachers have traditionally been characterized as “teacher-directed,
authoritarian” leaders compelling students to engage in “re-creative rather than creative
experience[s]” (Countryman, 2008, and Reimer, 1989, as cited in Berg, 2014, p. 263). Isbell
(2008) suggested that music teachers’ self-identities may be formed during preservice training
through conflicting narratives about their role as performers and educators. Such identity
formation may be reinscribed through secondary- and tertiary socialization relationships with
previous secondary ensemble teacher-conductors, and preservice collegiate conductors and
instructors (Berg, 2014).
Music Teachers’ Assessment Practices Vary, And Are Largely Informal
While I did not find music teachers’ assessment practices were significantly changed
following the intervention, I did find that they comprised a variety of forms (e.g., performance
tasks, tests, and attendance) and purposes (e.g., formative, summative, extramusical), directly
recalling McMillan’s (2001, 2003) findings. Music teachers largely reported using individual
performances (e.g., “down-the-line” music performance checks), but not other forms of
assessment, such as written tests and quizzes, classwork, portfolios, or projects (Hill, 1999;
Kancianic, 2006; Kotora, 2005; LaCognata, 2011; McClung, 1996; McCoy, 1988, 1991;
McQuarrie & Sherwin, 2013; Russell & Austin, 2010; Simanton, 2000). Half of my sample
reported using attendance, and over three quarters reported using participation to assess students
on a weekly basis. This corroborates Russell and Austin’s (2010) finding that many
student appraisals are accounted for by “non-academic criteria.” Participants’ self-reported use of
attendance and participation appeared to wane on the posttest, and there was not a difference
between assigned groups. Likely, the COVID-19 pandemic rendered such assessments moot as
PK-12 music educators shifted to online instruction.
Assessment Beliefs and Literacy are Related to Assessment Practices
My findings did align with Austin and Russell’s (2017) observation that music teachers
who valued assessment were more likely to assess student learning for formative purposes, and
to eschew extramusical purposes of assessment (i.e., participation, attendance, and other
compliance-based behavioral targets). I also found that participants who reported using assessment for formative functions tended to hold assessment in higher regard. Perhaps
teachers who utilize formative assessments with greater frequency find that students demonstrate
greater understanding of taught material, or due to increased assessment literacy, no longer
perceive assessment as a skillset separated from their teaching practices.
Music Teachers’ Assessment Practices May Be Impacted by Other Factors
In this study, music teachers’ assessment beliefs and practices did not significantly
change. While direct remediation of assessment beliefs and practices, extending the intervention,
and attempting to further mitigate the impact of COVID-19 on participants’ experiences may
have changed the outcome of this study, music teachers’ decision-making process may also be
impacted by additional factors. Measuring the influence of external factors (e.g., state, district, or
building policies, parents, etc.) or classroom realities (e.g., building schedules, teaching load,
resources, and student characteristics) was not an aim in this study. However, these factors
undoubtedly influence teachers’ decision making regarding specific assessment practices, as
McMillan (2003) found.
Music teachers’ decision making may be uniquely impacted in comparison to their
general education peers by their positions and identities as musicians. Prior researchers have
found that music educators’ identities are shaped by factors such as personal philosophy about
the purpose of music education (LaCognata, 2010; Richerme, 2016), whether music teachers
identify more strongly as an educator or director (Isbell, 2008), and if they believe assessment
resides outside their creative roles (Denis, 2018). It may be important to examine music teachers’
decision-making processes independently of their general education peers. Otherwise, music
teachers will continue to (a) feel inadequate about their ability as assessors of student learning;
(b) disregard their role as assessors; and (c) execute the role ineffectively.
Music Teachers’ Classroom Assessment Decision-Making
McMillan’s (2003) conceptual map of teachers’ assessment decision making (Figure 1.3,
p. 23) depicts the factors teachers take into consideration when selecting assessment practices.
McMillan (2001, 2003) developed this model using survey and semi-structured interview data
collected from 27 English and mathematics teachers. McMillan described six major themes: (a)
internal factors (e.g., teachers’ knowledge, values, and beliefs), (b) external factors (e.g., state
accountability policies, district policies, and parents), (c) tension between internal and external
factors (e.g., grades, discipline, student success), (d) classroom realities (e.g., absenteeism,
heterogeneity, and limited resources), (e) decision making rationales (e.g., student engagement,
student success, and difficulty), and (f) assessment practices. In McMillan’s conceptual model
assessment practices are the manifested, tangible outcome of the six factors.
Based upon my findings, I have altered McMillan’s model (Figure 5.1). My model differs
from McMillan’s (2003) in two ways: while McMillan conceptualized internal factors as
inclusive of teachers’ knowledge, beliefs, expectations, and values, in my model I have
extrapolated assessment literacy (i.e., adaptable assessment knowledge) and assessment beliefs
(i.e., conceptions and values associated with assessment) from internal factors. Teachers’
knowledge and beliefs outside of assessment, as well as their expectations and values (e.g.,
personal philosophy of music education, goals for the music program, etc.), may serve as
additional constructs informing “internal factors” unique to music educators. External factors and
classroom realities may also require extrapolating factors that are unique to music educators.
Figure 5.1
Music Teachers’ Classroom Assessment Decision-Making
In both McMillan’s and my figure, assessment practices are the output of competing,
inter-related factors that form a teacher’s decision-making rationale for selecting a specific
assessment (or a specific purpose for assessing students). Internal and external factors contribute
to dissonance within the teacher; for example, a teacher’s philosophy about the purpose of
education may conflict with external demands from administrators (e.g., enrollment, classroom
management, increasing test scores, etc.). As teachers respond to these tensions, their own
internal narratives may shift, or the narratives of the external actors may shift. The teacher whose
philosophy was in conflict with the administrator’s demand for a different classroom
management style may react to the conflict by changing their philosophy, or by convincing the
administrator that the approach they are using is in the best interest of students. Internal and
external factors also directly impact the teachers’ rationale for subsequent assessment practices.
Additionally, classroom realities such as student heterogeneity and access to resources directly
impact teachers’ assessment decision-making. Teachers with large sections of students with varied needs may feel pressured to use fewer, less formal, or less varied assessments, and to assess less frequently. I believe that assessment beliefs and literacy also directly
impact teachers’ assessment decision-making, as well as moderate one another. For example,
teachers with lower levels of assessment literacy likely hold a lower regard for the value of
assessment (and vice versa), which subsequently impacts their assessment decision-making.
Teachers’ assessment beliefs and literacy may also impact other internal factors, such as
philosophical narratives about the purpose of education. Those with low regard for assessment or
a lack of knowledge about assessment processes may feel that education is not a measurable or
observable endeavor that can be captured in data, for instance. Such a stance also may impact a
teacher’s willingness to learn more about assessment (i.e., increase their assessment literacy), or
change their assessment beliefs.
Teachers’ classroom assessment decision-making is likely not a series of static, discrete
events that result in individual decisions about individual assessments. This process is possibly
continuous, and each of the factors may account for variance in the outcome depending upon the
most pressing needs of the students and teacher. That final decision – the assessment practice
selected by the teacher – may subsequently inform the next series of assessment decisions based
upon the teachers’ ability to reflect on assessment results, and any subsequent changes to their
assessment beliefs and/or literacy.
The Role of Other Factors
Other factors may influence music teachers’ assessment decision-making and allow
music teachers to overlook their roles as assessors. Measuring the influence of these factors (e.g.,
internal, external, tensions between internal and external, and classroom realities) was not an aim
of this study but should be further defined and explored by researchers.
Internal Factors. Internal factors, such as music teachers’ prior training in assessment,
individual program aspirations, or other personal values attached to music education (i.e.,
philosophy, cultural values, or even pedagogical beliefs), likely inform educational decision-
making. For the purposes of future studies, I believe that internal factors should be extrapolated
to include the elements depicted in Figure 5.2. Collectively, these elements may help explain
why some music educators are resistant to utilizing a variety of assessment forms and purposes.
For example, Denis (2018) argued that music educators often believed assessment was inappropriate for appraising subjective experiences like music, and that it fell outside the
purview of their role as directors. In fact, it is this pervasive perspective held by music teachers –
that they are conductors rather than educators – that may contribute to these perceptions
(Mantie, 2012). Music educators who view themselves as conductors rather than educators may
have program aspirations that are more likely to include public recognition (i.e., in the form of
trophies, prestige, and community support), which affirms the conductor identity. Additionally, a
lack of prior assessment training may reinforce the perception that assessment is not an integral
professional duty, especially if prior training holds little relationship to the music teachers’
philosophy, program aspirations, and/or pedagogical beliefs.
Figure 5.2
Internal Factors
External Factors. Factors such as parental expectations for students’ grades, affect, and
achievement, as well as school and district expectations for program size and achievement, may
also play a role in educational decision-making related to assessment (Russell & Austin, 2010).
Music teachers often assign higher grades than teachers in other content areas do
(LaCognata, 2010). This may create an expectation amongst some parents that high grades are to
be expected of music coursework, regardless of students’ demonstrated proficiency surrounding
knowledge about music or technical prowess. Allsup and Benedict (2008) suggested that
challenging the expectations of parents (as well as students, administrators, and community stakeholders) about the relationship between learning and grading in music courses means confronting larger discourses about the legitimacy of music as an academic subject.
Figure 5.3
External Factors
Tension. Researchers in general education and music education have not examined how the tension between internal and external factors impacts educational decision-making tied to assessment.
However, the parallel tensions between the roles of performer and educator may provide context
for music teachers’ general fatigue and ambivalence toward assessment. Just as music teachers’
occupational role identity is shaped by conflicting – and often unreconciled – narratives (Isbell,
2008), I believe music teachers’ assessment decision-making may be, as well. When music
teachers’ personal expectations, values, beliefs, and knowledge about assessment outweigh the
relative importance of external factors (or, when music teachers have the autonomy and agency
to implement program and pedagogical change), music teachers may elect to use a greater
variety of assessments for myriad purposes, and conceive of assessment as an integral
component of instruction. When external demands of parents and other stakeholders about the
success, impact, and size of a music program outweigh the relative importance of music
teachers’ internal narratives (i.e., knowledge, values, beliefs, prior training, confidence, etc.),
music teachers may subordinate their personal desires, select fewer assessments, or fail to see
assessment as an integral component of instruction.
Figure 5.4
Tension
Classroom Realities. Classroom realities, such as the size of classes, teachers’ teaching
schedules, student discipline, and resources (i.e., monetary, technological, material, and human
resources), likely also impact assessment decision-making (Figure 5.5). Music teachers are often
assigned the largest class sections within their schools, the most and varied teaching preps, and
the most challenging teaching and extracurricular schedules, particularly at the elementary
general and secondary levels (Hanzlik, 2001; Hill, 1999; Kancianic, 2006; LaCognata, 2010;
McClung, 1996; Sherman, 2006; Simanton, 2001). Music teachers may feel overwhelmed by the
prospect of designing, grading, and interpreting dozens – if not hundreds – of assessments,
especially if they lack resources such as technology, materials, or adequate planning time. These
factors likely contribute to music teachers’ reliance upon extramusical purposes of
assessment, such as attendance, participation, and other compliance-based appraisals of student
performance, especially if music teachers have not undergone assessment training.
Figure 5.5
Classroom Realities
The Role of Socialization
As described in Berg’s (2014) review of literature about music teacher preparation and
role-identity, music teachers’ socialization into the profession may play a role in forming a
“teacher-conductor” identity (p. 261). Teacher-conductor identity has traditionally eschewed
more student-centered practices, and led to a reliance on instruction where “a teacher/conductor
[stands] in front of a group of music makers controlling the starts and stops, correctly diagnosing
problems, and effectively prescribing remedies to reach the goal of a flawless performance”
(Bartel, 2004, as cited in Berg, 2014, p. 261). Contemporary music education researchers have
advocated for student-centered educative practices such as focused discussion to promote
musical awareness and critical thinking, and the use of peer-assisted learning activities
embedded within a traditional rehearsal-based context (Berg, 2014). Isbell (2008) suggested that
conflict between the performer (e.g., conductor, director, musician) and educator identities music
teachers hold over the course of preservice training may contribute to practices associated with
either student-centered or teacher-centered pedagogies. Denis (2018), after a review of music
education assessment literature, suggested that music teacher identity formation may even
contribute to perceptions amongst music teachers that assessment is inappropriate, and outside of
their responsibilities as educators. Johnson (2014) suggested that this false dichotomy between
music teachers’ self-identification either as performers or educators could be reconciled through
conducting coursework. Music education researchers’ examination of identity and role formation
through professional socialization could serve as an important framework for understanding the
reasons why music teachers hold beliefs about assessment that are resistant to change, and how
such beliefs could be influenced during preservice training.
Formation of music teacher identity occurs during primary, secondary, and tertiary
socialization. Primary socialization experiences occur prior to preservice teacher training, often
through formative familial experiences. Berg (2014) suggested that beliefs formed during this
period are “often not questioned and can be emotionally charged (Berger & Luckman, 1966),
thus functioning as one’s habitus (Bourdieu, 1993) or [contributing to] ideas about appropriate
actions, values, and one’s function in society (DeMarrais & LeCompte, 1999)” (p. 266).
Secondary socialization occurs in the years immediately preceding preservice training;
researchers have suggested that this period is often critical to music teachers’ decision to pursue
collegiate music training (p. 267). Inservice music teachers tend to identify most strongly with the role of music teacher or performer prior to preservice training, based upon performance experiences in
secondary ensembles (Berg, 2014; Isbell, 2008). Austin and Reinhardt (1999) suggested that
preservice music teachers’ philosophical beliefs tend to remain stable over time, further
reinforcing later researchers’ findings that preservice training may not necessarily alter
perspectives about the value and purpose of music education. Isbell (2008) found that
undergraduates’ occupational identity was best predicted by secondary socialization experiences.
Thus, it is not surprising that preservice and inservice teachers who recall secondary performance experiences under secondary ensemble conductors, in combination with the perception of directors’ roles as performers, decide to pursue music education.
Tertiary socialization occurs during occupational role construction, often during
preservice teacher coursework. As preservice music teachers acquire knowledge and experience
teaching music, they reconcile or integrate their prior secondary socialization experiences to
form an overall occupational identity. This identity (and associated values), while resistant to
change, is not immutable, and may be reshaped based upon the setting (Berg, 2014). Preservice
music teachers’ occupational identity consists of three constructs, according to Isbell (2008):
musician, self-perceived teacher, and teacher identity as inferred from others. Thus, providing
preservice and early career music teachers with experiences that challenge their prior
conceptions, engage them in reflection, and address the emotionally charged components of their
philosophies could prove vital to shaping music teachers’ beliefs surrounding their roles as
assessors, and the integration of the roles of assessor and educator.
Measuring Assessment Literacy, Beliefs, and Practices
Evaluating the effect of the intervention on participants’ assessment literacy, beliefs, and
practices was the primary purpose of this study. Yet, to do so, it was critical to have effective
instruments; reliably measuring assessment literacy, beliefs, and practices in valid ways was
critical to determining the efficacy of the intervention. Selecting or designing appropriate
instruments for this study impacted the quality and kind of data I gathered. My results lead me to
wonder about the calibration and dimensionality of the instruments employed, and what
researchers must consider in future investigations surrounding assessment literacy, beliefs, and
practices.
Calibration
With regards to assessment literacy, I selected the CALI for the following reasons: (a) it
was the most widely utilized measure by researchers who have examined assessment literacy in
inservice and preservice teacher populations, (b) it was designed to align with the Standards for
Teacher Competence in Educational Assessment of Students (STCEAS), which I was also using
to plan instruction for the Music Teachers’ Assessment Workshop (MTAW), and (c) the questions
posed were based upon realistic vignettes portraying application of assessment knowledge (i.e.,
the knowledge measured was procedural, not inert, which is a key attribute of assessment
literacy). Yet, the internal consistency of the CALI remains compromised. On the CALI there are
five items associated with each standard. I found, within each standard, that some items were
neither difficult nor able to discriminate between the highest and lowest performers. Based upon
this finding, I believe that both more items are needed per standard (to increase the internal
consistency), and that the items should be more carefully calibrated to challenge and discriminate
between responses.
For example, the five items corresponding to the first STCEAS standard (“selecting
appropriate assessments”) addressed issues surrounding reliability and validity, selecting the
most appropriate assessment strategy from a list for a specific scenario, and rationales underlying
the selection of an assessment. While these are all pertinent to the adaptable knowledge required
to select appropriate assessments, it may be necessary for there to be more items spanning these
concepts, and/or for more elements to be considered and represented under this standard.
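The item-quality concerns described above can be checked with a short classical-test-theory analysis. The sketch below uses invented response data (not actual CALI responses) to compute item difficulty and corrected item-total discrimination; the final item, answered correctly by everyone, illustrates an item that can neither challenge respondents nor discriminate between them.

```python
import numpy as np

# Hypothetical scored responses (1 = correct, 0 = incorrect); rows are
# examinees, columns are the five items tied to one standard. These data
# are invented for illustration only.
responses = np.array([
    [1, 1, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1],
])

# Item difficulty (p-value): proportion of examinees answering correctly.
# Values near 1.0 flag items too easy to be informative.
difficulty = responses.mean(axis=0)

# Corrected item-total discrimination: correlation between each item and
# the total score on the remaining items (the "rest score").
def item_discrimination(data: np.ndarray) -> np.ndarray:
    stats = []
    for j in range(data.shape[1]):
        item = data[:, j]
        rest = data.sum(axis=1) - item
        if item.std() == 0:
            # An item everyone answers the same way cannot discriminate.
            stats.append(float("nan"))
        else:
            stats.append(np.corrcoef(item, rest)[0, 1])
    return np.array(stats)

disc = item_discrimination(responses)
print("difficulty:    ", np.round(difficulty, 2))
print("discrimination:", np.round(disc, 2))
```

In this toy example, the fifth item has a difficulty of 1.0 and an undefined discrimination index, which is the statistical signature of the uninformative items described above.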
Dimensionality
Prior researchers have criticized the CALI for its lack of internal consistency, as well as
the lack of fit between specific items and the subscale (i.e., the STCEAS) they comprise
(Alkharusi, 2015; Hailaya et al., 2014; Ryan, 2018). Alkharusi (2015) and Hailaya et al. (2014)
suggested that assessment literacy – as measured by the CALI – was a unitary construct; that is,
that it was not comprised of the seven subscales corresponding to the STCEAS. Ryan (2018) “did
not draw any definitive conclusion in support of one internal structure, but the results from [her]
study at least demonstrate that the ‘clean’ and ‘tidy’ Standards-based conceptualization of
assessment knowledge is questionable, and perfect alignment with the seven standards is highly
improbable regardless of the sample used” (p. 244). Future examination of assessment literacy
must begin with measurement studies to determine the dimensionality of assessment literacy, and
whether it is a unitary or a multi-dimensional construct.
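One rough exploratory check on dimensionality is to inspect the eigenvalues of the inter-item correlation matrix. The sketch below uses simulated data generated from a single latent factor (an assumption for illustration; these are not CALI or TCoA responses): when the first eigenvalue accounts for most of the common variance, a unitary-construct interpretation is more plausible than a seven-subscale one. Parallel analysis or confirmatory factor analysis would be the rigorous follow-up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate Likert-style responses to a 35-item instrument driven by one
# latent factor plus noise, so the data are effectively unidimensional
# even if the items are nominally grouped into subscales.
n_respondents, n_items = 200, 35
factor = rng.normal(size=(n_respondents, 1))
loadings = rng.uniform(0.5, 0.9, size=(1, n_items))
scores = factor @ loadings + rng.normal(scale=0.6, size=(n_respondents, n_items))

# Eigenvalues of the inter-item correlation matrix, largest first.
corr = np.corrcoef(scores, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
print("first eigenvalue share:", round(eigvals[0] / eigvals.sum(), 2))
```

Because the simulated data contain a single dominant factor, the first eigenvalue dwarfs the rest; data generated from seven distinct subscales would instead show several comparably large eigenvalues.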
Assessment beliefs have also been measured by researchers using instruments predicated
on a mixture of unitary or multi-dimensional definitions. For example, the work of Brown (2004,
2006) and his colleagues (2009a, 2009b, 2011a, 2011b, 2011c, 2012, 2015) was based upon his
instrument, the Teachers’ Conceptions of Assessment (TCoA). Brown and his colleagues
conceived of teachers’ assessment beliefs as multi-dimensional, comprising three primary
themes: (a) accountability purposes, (b) formative feedback purposes, and (c) irrelevance to
teachers’ practices. There have been numerous measurement studies both confirming and
disconfirming statistical fit to these dimensions (Azis, 2015; Remesal, 2011; Segers & Tillema,
2011); however, Brown has also cautioned that the fit of this multi-dimensional conception of
assessment belief appears dependent upon the population used. Harris and Brown (2009) used a
phenomenographic approach to explore the purposes that a sample of 26 New Zealand teachers
ascribed to assessment and arrived upon four dimensions. Allal (2013) asked ten Swiss
teachers to select student work and discuss the rationale for their appraisals, and found that
assessment techniques were socially situated, and often informed by conflicting internal and
external factors. Using Brown’s TCoA, Azis (2015) conducted an explanatory mixed-method
study to investigate assessment beliefs and their influence on assessment practices; they also
found agreement with Brown’s three-dimension conception of assessment beliefs.
While the MTABI was adapted from the TCoA by Austin and Russell (2017), it differs from Brown’s instrument in that it is a global measure of assessment beliefs. Nevertheless, it has demonstrated a consistently high degree of internal consistency. The statements used in the MTABI
encompass evaluative (e.g., “assessment results are trustworthy”) and affective (e.g., “assessment
results are rightfully ignored by most music teachers”) aspects of belief related to assessment,
stated in a positive or negative manner. As with the CALI, researchers should continue to
examine the dimensionality and malleability of assessment beliefs, and the often-conflicting
narratives that inform such beliefs.
I designed the MTAII, in part, because researchers have not reached definitional
consensus about what constitutes assessment practices. Some researchers defined practices as the
specific forms of assessment used (Aschbacher, 1999; Frey & Schmitt, 2010; Hanzlik, 2001;
Hill, 1999; Kotora, 2005; McMillan et al., 2002; Russell & Austin, 2010; Sears, 2002; Sherman,
2006), while others defined assessment practices as the purposes for which teachers employ
assessment (LaCognata, 2010; Mertler, 2000; McClung, 1996; McMillan, 2001; Oosterhof,
1995; Zhang, 1996). It is possible that assessment practices may encompass both the forms and
purposes for which teachers use assessment. However, it is also important to consider the way
that data are collected; some researchers, such as Kancianic (2006), measured the number of
assessments teachers report using, while others measured the frequency using a scale (Mertler,
2000), or the degree to which teachers felt skilled while employing specific assessment practices
(Zhang, 1996). The common feature of these data collection techniques is that all relied upon
accurate self-reporting from teachers. There is no evidence suggesting that teachers are accurate
in their self-reporting of assessment practice data, and researchers should continue to investigate
more objective (e.g., observation-based) ways of collecting this kind of data.
Implications
...for Music Teacher Development
Given the findings of this study, further contextualized by intervention participant
feedback (Appendix L), I believe there are important implications for inservice music teachers,
school districts, music teacher educators, and national arts education organizations.
Inservice music teachers must advocate for and seek targeted, district-supported
professional development that addresses their content area. Professional development should be
designed in a flexible format (e.g., face-to-face, online, or a hybrid of the two), and facilitate
application of the desired competencies for music teachers. Facilitated practice has been shown
to be more effective in changing teachers’ knowledge and practices than traditional lecture-based
professional development formats (Chen, 2007). Professional development is the most
appropriate avenue to rectify assessment illiteracy within the music teaching profession, as
current research suggests that preservice assessment coursework (regardless of delivery via a
stand-alone or integrated course) may have little impact on the subsequent assessment practices
of inservice teachers (Gutierrez, 2014). Based upon the results of this study, feedback from
participants, and prior research, it is evident that music teachers may desire additional
assessment knowledge development opportunities that their preservice and/or master’s degree
experiences, state conference experiences, and prior inservice professional development
experiences have not provided.
Music teacher educators also stand to benefit from these findings. While some suggest
that assessment literacy development occurs most rapidly with inservice teachers, perhaps
because they are able to enact novel strategies in their own classrooms, preservice music teachers
may later become more open to professional development if assessment concepts are introduced
during their training in a fresh way. Music teacher educators should find ways to incorporate
more assessment training into their curriculum, whether through formal changes to program
course requirements, or embedded activities where students create, implement, and reflect upon
assessments they give in their practicum placements. For example, the assessment construction
project that music teachers enrolled in my intervention completed could be readily adapted for
preservice teachers to use in their field-based practica.
study found those experiences – where activities were aligned to course readings and objectives
– valuable. Preservice teachers would no doubt also benefit from such activities, even if they did
not lead to sustained assessment literacy in their inservice teaching. Learning from both their
cooperating teacher and education professors could likely be more impactful than traditional
preservice assessment training, which is typically delivered in the form of single-class lectures,
or out-of-context assignments.
While NAfME does have a visible position statement about assessment in music
education on their website, as well as guidelines for music teachers, school boards, legislators,
and other decision-makers, it is not clear that there is consequential organizational support for
developing music teachers’ competency as assessors. NAfME, as well as other teacher advocacy
organizations, may benefit from the findings of this study by providing the teachers they
represent with effective professional development. In particular, NAfME is better situated to
access and appeal to music teachers (both inservice and preservice) than any other organization.
Whether through their online presence as a national organization, or satellite presence via state-
level music education associations and conferences, NAfME can and should endorse, finance,
develop, and implement professional development to achieve the aims of their position
statement. Further, NAfME, and other national arts education organizations, should work to
develop music-teacher specific assessment literacy standards that consider the unique contextual
features and attributes of music educators.
...for Future Implementations of this Intervention
In future iterations, I would alter several elements of the intervention.
Specifically, I would incorporate reflection-based activities to directly address music teachers’
assessment beliefs, reapportion emphasis to provide participants with more practice interpreting
assessment data and using data to make future educational-decisions, assure that participants
were able to conduct the Teacher-Constructed tasks with their students, provide greater
differentiation in resources, and increase the length of the intervention from four to eight weeks.
I did not directly address music teachers’ assessment beliefs in this
intervention, although teachers did engage in discussions about the nature of their beliefs, and
assumptions they held about the purposes and value of assessment. In a future implementation of
the professional development, I will explicitly address music teachers’ beliefs through journal
prompts and reflections based upon both the readings and discussions surrounding specific
readings. I believe this alteration will help music teachers examine the origins of their beliefs
(which may be tied to professional socialization experiences), and dispassionately disentangle what
could be emotionally charged assessment beliefs from their identity as educators. I also believe
this may help integrate music teachers’ often conflicting narratives about their roles as
conductors and educators.
I will also place greater emphasis on materials and activities connected to the third and fourth STCEAS standards. In this study, music teachers showed the most growth in Standards 1 and 2 (selecting
and designing assessments), but minimal growth in Standards 3 and 4 (implementing, scoring,
and interpreting assessment data for educational decision making). Participants’ comments about
the intervention also lead me to believe that activities associated with Standard 3 were the most
intellectually challenging for many music teachers. Arguably, Standard 4 is essential to shifting
music teachers’ actual assessment practices. Thus, I might alter the teacher-constructed tasks
associated with Standards 3 and 4 to include greater scaffolding of evaluation and measurement
concepts, ask teachers to respond to various scenarios about scoring and interpreting assessment data, and have them reflect upon how effective their subsequent educational decision making (i.e., the instruction following the assessment) was for their students. Or, I might alter all of the teacher-constructed tasks to embody a participatory action research project by asking teachers to use the
assessment they designed during the course with one cohort of students, and a previously-utilized
assessment (or no assessment) with another cohort.
While the COVID-19 pandemic prevented music teachers from implementing their assessments with their students due to access, I will design future versions of this professional development to allow music teachers to involve their students in more varied ways. For example,
within this study I asked participants to design a rubric-based assessment for students that would
conceivably be implemented in ordinary circumstances. In the future, I will spend more time
emphasizing the integrated nature of assessment and instruction and provide participants with
exemplars for assessments fulfilling a variety of forms and purposes. This way, regardless of
circumstances, teachers will be more likely to utilize their assessment with their students and
maximize the value that authentic practice may hold in shaping assessment beliefs.
The resources and materials participants accessed during this study were largely text-
based articles. In feedback comments, several participants voiced their desire for greater
differentiation in the materials, such as audio files of articles being read (e.g., Audible or other
software that offers spoken versions of text) or video-based lectures that summarize the major points
from each week’s readings. I will take the time to locate or create audio versions of articles and
provide other audio- and video-based options for participants who require such accommodations
to complete the coursework. I will also emphasize that participants are only required to access
one or two articles each week so they can discuss them with other participants; this activity was
well-received by participants and may help challenge previously held assessment beliefs.
Additional materials will be offered on a supplemental basis.
Finally, I will change the length of the professional development from four to eight
weeks. Two participants suggested they would have benefitted from more time to complete the
modules, or to engage in the resources more fully. By doubling the time of the intervention, I can
devote one week for participants to read and discuss materials, and one week to synthesize and
apply their knowledge through the Teacher-Constructed tasks. Changing the duration of the
intervention will provide participants more time to engage with materials and to reflect upon the
processes associated with each of the four STCEAS.
Study Limitations and Implementation Challenges
The findings of this study must be contextualized within the limitations of this study,
including the sampling procedure, participant attrition between stages of the study, possible
history effects, and reliability of the CALI instrument.
Sampling Procedure
In March of 2020, I utilized the NAfME Research Assistance Program (RSA) to solicit
participants from a nationwide sample of 19,870 music teachers. Of that number, only 6,309
opened the email, and 247 clicked on the link to read more about the study. It is impossible to
know now, but I suspect the opportunity to send one or two follow-up reminders could have
garnered more attention from music teachers at the outset of the study. Of those, 108 completed informed
consent, 74 completed the pretest, and 43 completed the study. Due to NAfME’s data collection
procedures for members and the RSA, NAfME was unable to provide demographic information
about the 19,870 members they pulled from their lists (i.e., if participants were full-time K-12
music educators). This is, in part, why I included a pre-screening question. However, it also
means that I am unable to calculate an accurate response rate or fully assess the
representativeness of my participants, because I do not know how many of the music teachers in
the target population met my parameters. Regardless, it is evident that participation of 43
members from a target population of approximately 20,000 educators would result in a quite low
response rate. Participation may have been improved had I utilized a more compelling form of
reciprocity (e.g., monetary compensation) to incentivize both participation and completion of the
study.
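The recruitment numbers above form a funnel whose stage-to-stage conversion rates make the attrition concrete. A minimal sketch, using only the figures reported in this section:

```python
# Recruitment funnel reported in this study; each percentage is a
# simple conversion rate from the previous stage.
funnel = [
    ("emails sent", 19_870),
    ("opened email", 6_309),
    ("clicked link", 247),
    ("gave consent", 108),
    ("completed pretest", 74),
    ("completed study", 43),
]

for (stage, n), (_, prev) in zip(funnel[1:], funnel):
    print(f"{stage}: {n} ({n / prev:.1%} of previous stage)")

overall = funnel[-1][1] / funnel[0][1]
print(f"overall completion: {overall:.2%}")  # about 0.22%
```

The steepest drop occurs between opening the email and clicking the link, which is consistent with the suggestion that follow-up reminders might have improved initial engagement.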
Although generalizability is not a goal of intervention designs, it was still important to
ensure that participant characteristics were not significantly different between groups. A low
response rate could conceivably result in a nonresponse bias, where participants in the study
differ from those who chose not to participate. The use of a volunteer sample based upon music
teachers with NAfME membership could also conceivably impact the representativeness of
participants. In the future, researchers might draw a sample of music teachers from a single
district or state to contextualize findings for a specific population; their findings could be
sponsored and distributed by state-level music education associations, which would benefit from
empirical research.
Other music education researchers who have utilized a similar sampling procedure also
reported low response rates. LaCognata (2013) reported a 10% response rate for a sample of 4,500 music teachers; Koerner (2017) was unable to report an official response rate for his sample of 154 music teachers from ten states, but estimated an approximate response rate of 3% based upon National Center for Education Statistics data about teacher populations; and several other researchers have reported low response rates using this sampling method (Bacala, 2020; Hahn, 2010; Hourigan, 2008). Researchers should take this trend into account when
utilizing the NAfME Research Assistance Service. In the meantime, researchers may lobby for changes to the service (e.g., permitting follow-up reminder emails) through channels such as the Society for Research in Music Education (SRME).
Framing changes to NAfME’s Research Assistance Service as a collaborative effort between
national leadership and researchers for the benefit of the profession may be persuasive.
Coupled with the timing of the study during the onset of the COVID-19 pandemic, it is possible that participants in this study are only representative of NAfME-member music teachers who are invested in professional development and assessment. Because
NAfME would not resend the solicitation email, I was strictly limited to the 108 participants who
completed the informed consent document from the initial wave, which limited statistical power.
However, the data met the statistical assumptions for the analyses employed in this study, and,
as discussed in Chapter 4, I did not find any preexisting differences between intervention and
control group participants at the pretesting stage, which implies that the randomization process
may have worked as intended.
In a future replication of this study, I might consider co-designing the professional
development experience with input from teachers. This could potentially increase buy-in and
participation by music teachers and help deliver the most relevant and targeted professional
development. I also suspect strong endorsement (i.e., compulsory participation and completion)
by a school district, state accrediting or licensure body (e.g., a state department of education), or
national organization (e.g., NAfME) may have also helped attract more participants. These
considerations bolster my resolve to continue developing and implementing this assessment-
focused professional development in the future.
Participant Attrition Between Stages
In a similar vein to the sampling procedure, participant attrition between stages of the
study may have served as a major threat to internal validity (i.e., attrition cases in either group
may have altered the posttest results). However, as noted in Chapter 4, those who chose to leave
the study were not significantly different from those who completed the study regarding their
pretest measures of assessment literacy, beliefs, or practices. The design of the study, including
the length of the measures utilized and the time required to complete the MTAW modules, may
have also contributed to the attrition between stages of the study.
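To make the dropout-versus-completer comparison described above concrete, the following sketch computes Welch’s t statistic for two independent groups of pretest scores. The data, group sizes, and function name are invented for illustration and do not come from this study.

```python
import numpy as np

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    va, vb = a.var(ddof=1) / a.size, b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite approximation for degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    return t, df

# Hypothetical pretest assessment literacy scores (illustrative only)
completers = [21, 24, 19, 26, 22, 23, 25, 20]
dropouts = [20, 23, 18, 25, 22, 21]
t_stat, df = welch_t(completers, dropouts)
```

A t statistic near zero (relative to its reference distribution) supports the conclusion that those who left the study did not differ meaningfully from completers on the pretest measure.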
History Effects Due to the COVID-19 Pandemic
Unfortunately, there is no way to tell what impact the concurrence of this study with the
first wave of the COVID-19 pandemic may have had on findings, or to what degree. While I
cannot definitively prove that COVID-19 caused the 42% participant attrition rate (n = 31), I
know through anecdotal evidence (i.e., emails) from participants that the pandemic placed an
immense burden on music teachers, who suddenly shifted all instruction online during late
March and early April 2020, and that it contributed to at least seven participants’ decisions to
leave the study. The sudden shift to online instruction may also have impacted findings
regarding music teachers’ assessment practices: emphasis moved from traditionally utilized
forms and functions of assessment (e.g., participation, attendance, and extramusical functions)
toward formative assessment, written classwork, and individual performances. A future
replication of this study, in a post-COVID-19 world, may reveal greater attainment of the
intervention objectives and lower attrition.
Reliability of the CALI Instrument
Reliability estimates – as measured through a KR-20 analysis – of inservice music teacher
responses to the CALI instrument continue to be less than satisfactory. Just as Mertler (2004)
reported, I was unable to reach an acceptable reliability threshold with my participants (N =
43). This also corresponds to what Ryan (2019) reported for the CALI, and what other
researchers have reported for similar measures, like the TALQ, tKUDA, and ALI (Alkharusi,
2015; Donovan, 2015; Hailaya et al., 2014). However, it is important to consider what internal
consistency measures of a competency instrument conceptually convey; that is, such an
instrument can only be considered “reliable” to the degree that participants hold a similar level of
knowledge (or lack thereof). Psychometricians disagree about the use of traditional internal
consistency estimates for instruments that are not scale-based (Thompson, 2010). Regardless, it
may be prudent, based upon item difficulty and discrimination indices and item-level distractor
analysis, to amend existing measures of assessment literacy or to create alternatives, especially
for music educator populations.
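For readers unfamiliar with the KR-20 estimate discussed above, the following sketch shows how it is computed from a matrix of dichotomous item responses. The data and function name are fabricated for illustration; note that this version uses the population variance of total scores, and a sample-variance convention also appears in the literature.

```python
import numpy as np

def kr20(responses):
    """Kuder-Richardson Formula 20 for dichotomous (0/1) item responses.

    responses: 2-D array, rows = examinees, columns = items.
    """
    responses = np.asarray(responses, dtype=float)
    k = responses.shape[1]                     # number of items
    p = responses.mean(axis=0)                 # proportion correct per item
    total_var = responses.sum(axis=1).var()    # population variance of total scores
    return (k / (k - 1)) * (1.0 - (p * (1.0 - p)).sum() / total_var)

# Perfectly consistent responses: half all-correct, half all-incorrect
perfect = np.array([[1, 1, 1, 1],
                    [1, 1, 1, 1],
                    [0, 0, 0, 0],
                    [0, 0, 0, 0]])
print(round(kr20(perfect), 6))  # 1.0
```

When examinees hold widely similar levels of knowledge, total-score variance shrinks and the estimate drops, which is the conceptual limitation noted above for competency instruments.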
Recommendations for Future Research
This study contributed to an under-investigated area of music education research:
music educator assessment literacy, beliefs, and practices are almost completely represented by
doctoral dissertations, volumes produced after each of the seven International Symposia on
Assessment in Music Education, and the published research of Austin and Russell (Austin &
Russell, 2016, 2017, 2019; Russell & Austin, 2010). Austin and Russell have primarily examined
music teachers’ assessment beliefs, extending the work of Brown and his colleagues in general
education. Music education researchers have not converged on a conceptualization of assessment
practices, and past research has included both forms and functions of assessment in classroom
instruction. Thus, it would be helpful for music education researchers to create an instrument or
utilize a strategy that does not solely rely upon self-reported data. Aside from the possibilities for
future research that may directly address or overcome some of this study’s limitations and
challenges, there are other avenues for studying music teacher assessment as illuminated by my
findings. Chiefly, my recommendations include: (a) development of reliable measures of
music teachers’ assessment literacy, beliefs, and practices; (b) examination of the relationships
between music teachers’ assessment literacy, beliefs, and practices; (c) deliberate partnerships
between researchers and school districts to facilitate intervention-based studies; and (d) further
examination of the role other factors may play in music teachers’ educational decision-making
surrounding assessment.
Developing a reliable measure of music teachers’ assessment literacy is an important step
toward accurately capturing inservice and preservice music teachers’ competency. Perhaps music
education researchers should align a future instrument to standards other than the STCEAS, such
as music teacher certification competencies. Alternatively, NAfME or another music teacher
advocacy group could assemble a task force to identify the factors that characterize assessment-
literate music teachers. Following the Seventh International Symposium on Assessment in Music
Education in 2019, a task force was assembled to identify such factors, but it has not yet released
documentation publicly. As researchers have noted in recent years, the STCEAS needs
an update (Brookhart, 2011; DeLuca et al., 2016; Gotch & French, 2014; Popham, 2019).
Additionally, developing a valid and reliable instrument for evaluating music teachers’
assessment practices is required to perform parametric analyses of those practices and to
compare them to assessment literacy and beliefs. As previously discussed, the calibration and
perceived dimensions of assessment literacy, beliefs, and practices require further examination.
Future researchers should utilize exploratory and descriptive designs (e.g., qualitative case
studies, explanatory mixed method designs) to operationalize these constructs.
I also believe that researchers should continue examining the relationships between the
three major constructs in this study (i.e., assessment literacy, beliefs, and practices). For
example, researchers could examine whether the effect of assessment literacy on assessment
practices is moderated by assessment beliefs. They could also examine the direction and
magnitude of the relationship between assessment literacy and beliefs. Teasing out these
relationships has important implications for rectifying music teachers’ assessment illiteracy. For
example, if assessment beliefs were found to be influenced by music teachers’ assessment
literacy, that finding could shape curricular offerings for preservice music teachers and adjust the
professional development priorities of administrators overseeing inservice music teachers. These
relationships could be examined using path analysis, confirmatory factor analysis, or structural
equation modeling.
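As a sketch of how the moderation question above might be tested, the following code fits an ordinary least squares model with a literacy × beliefs interaction term on simulated data; a non-negligible interaction coefficient is the signature of moderation. All variable names, effect sizes, and data are invented for illustration and are not findings of this study.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Simulated standardized scores (illustrative only, not study data)
literacy = rng.normal(size=n)
beliefs = rng.normal(size=n)

# Generate practices so that beliefs moderate the effect of literacy
practices = (0.3 * literacy + 0.2 * beliefs
             + 0.4 * literacy * beliefs
             + rng.normal(scale=0.5, size=n))

# Design matrix: intercept, literacy, beliefs, and the interaction term
X = np.column_stack([np.ones(n), literacy, beliefs, literacy * beliefs])
coefs, *_ = np.linalg.lstsq(X, practices, rcond=None)

labels = ["intercept", "literacy", "beliefs", "interaction"]
print(dict(zip(labels, np.round(coefs, 2))))
```

In practice, researchers would use dedicated path-analysis or SEM software and report standard errors and fit indices; this minimal fit only illustrates the logic of testing an interaction.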
Music education researchers have suggested that accessing teaching populations can be
challenging (Austin, 2018; Koerner, 2017; LaCognata, 2013). Austin noted that the issue of
access is compounded when “researcher [projects do] not support the districts’ policy positions
or curricular priorities” (2018, p. 8). It is little surprise, therefore, that intervention designs
utilizing inservice music teachers are not well represented in our research journals. These
circumstances (i.e., a need for targeted professional development for music teachers, and a desire
from music education researchers to access teaching populations more easily) should be
approached as an opportunity by all stakeholders to form partnerships. Music teachers and
districts would benefit from researchers’ expertise in assessment content, and researchers would
benefit from more intimate access to the population they most often study. Additionally,
assessment is both an understudied area in music education research, and a highly desired area
for professional development by school districts, especially those in the process of adapting
curricula to new state standards aligned to the National Core Arts Standards and/or Model
Cornerstone Assessments (Payne et al., 2019). Finally, use of an online or hybrid delivery system
for professional development could prove useful in alleviating some of the access barriers posed
by face-to-face formats. Music education researchers could even frame such interventions as a
form of participatory action research, thereby increasing teacher buy-in and support for research-
based practices.
Finally, researchers should extend the findings of this study and McMillan’s (2003)
conceptualization of teachers’ educational decision-making surrounding assessment by exploring
the role other factors may play. For example, researchers might investigate how impactful music
teachers’ internalized beliefs, values, knowledge, confidence, self-efficacy, prior experiences
with assessment training, and/or prior socialization experiences are in shaping educational
decision-making. This could take the form of various mixed method designs, or advanced
experimental designs. For example, researchers could use a survey instrument to collect
assessment literacy, belief, and/or practice data, and demographic information. In a follow-up
case-study or grounded-theory approach, researchers could conduct extensive semi-structured
interviews with teachers, administrators, parents, and other stakeholders, as well as classroom
observations to verify self-reported data, and build a cohesive theory of music teachers’
assessment decision-making. Then researchers could utilize assessment literacy, belief, and/or
practice data in a confirmatory factor analysis of the grounded model. Or researchers could
conduct a study where they collect data from district teachers in the same way described within
this study, but also delve into the experiences of teachers in both groups using a multiple-case
study approach. This would allow for corroboration of findings and cross-case analysis between
assigned groups. It would be helpful for researchers, policymakers, and inservice teachers to
know approximately how much variance in assessment decision-making is accounted for by
other factors.
Conclusion
In this study, I found that music teachers’ assessment literacy can be significantly
increased through online professional development, that music teachers’ assessment literacy and
beliefs are moderately related, and that music teachers’ beliefs and specific assessment practices
are moderately related. Further, I found that music teachers in the intervention group valued
receiving content-area-focused professional development. While there are still issues surrounding the
reliability of instrumentation for measuring assessment literacy, these findings point to possible
solutions for alleviating overall music teacher assessment illiteracy, including but not limited to
national support for developing music teacher-specific literacy standards, development and
dissemination of professional development by national organizations (e.g., NAfME), and
intentional collaboration between music education researchers and school districts. School
districts, inservice music teachers, and music teacher preparation programs may benefit from use
of targeted training in assessment for their respective teacher populations. In the future,
researchers should continue to examine the dimensionality, direction, and magnitude of the
relationships between music teachers’ assessment literacy, beliefs, and practices. The COVID-19
pandemic may have played a role in the impact of this specific intervention; yet a silver lining of
this timing is that there has never been a better moment to reimagine teacher development and work
toward improving music teachers’ assessment literacy, beliefs, and practices. Ultimately, these
concepts affect the quality of students’ learning experiences. In the months and years to follow
the COVID-19 pandemic, music teachers, researchers, and other stakeholders will have the
opportunity to make substantive changes – whether curricular or policy-based – to the
educational endeavor, and truly unlock the potential of effective assessment.
References
Adams, K. A. & Lawrence, E. K. (2019). Research methods, statistics, and applications (2nd
ed.). SAGE Publications.
Airasian, P. W. (2004). Classroom assessment: Concepts and applications (5th ed.). McGraw-
Hill.
Allal, L. (2013). Teachers’ professional judgement in assessment: a cognitive act and a socially
situated practice. Assessment in Education: Principles, Policy & Practice, 20(1), 20-34.
https://doi.org/10.1080/0969594X.2012.736364
Allsup, R. E., & Benedict, C. (2008). The problems of band: An inquiry into the future of
instrumental music education. Philosophy of Music Education Review, 16(2), 156–173.
www.jstor.org/stable/40327299
American Federation of Teachers, National Council on Measurement in Education, and the
National Education Association. (1990). The standards for teacher competence in
educational assessment of students. Retrieved from
http://www.unl.edu/buros/article3.html.
Austin, J. R. (2018). In defense of researcher access. Journal of Music Teacher Education, 27, 7-
10. https://doi.org/10.1177/1057083717748707
Austin, J. R., & Reinhardt, D. (1999). Philosophy and advocacy: An examination of preservice
music teachers’ beliefs. Journal of Research in Music Education, 47(1), 18–30.
https://doi.org/10.2307/3345825
Austin, J. R. & Russell, J. (2016). The status of assessment instruction in U.S. graduate music
education programs: Access, curriculum, and outcomes. Paper presented at the 32nd
World Conference of the International Society of Music Education in Glasgow, Scotland.
Austin, J. R. & Russell, J. (2017). Secondary music teachers’ assessment practices: The role of
occupational identity and assessment conceptions. Paper presented at the Sixth
International Symposium on Assessment in Music Education in Birmingham, England.
Austin, J. R. & Russell, J. (2019). Preservice music teachers’ assessment education: Relations
with assessment conceptions, assessment confidence, projected assessment practices, and
occupational identity. Paper presented at the Seventh International Symposium on
Assessment in Music Education in Gainesville, Florida, USA.
Aschbacher, P. R. (1999, December). Developing indicators of classroom practice to monitor
and support school reform. University of California, Los Angeles: CRESST Technical
Report 513.
Assessment in Music Education. (2009). Retrieved from https://nafme.org/about/position-
statements/assessment-in-music-education-position-statement/assessment-in-music-
education/.
Azis, A. (2015). Conceptions and practices of assessment: A case of teachers representing
improvement conception. TEFLIN Journal, 26(2), 129-152.
https://doi.org/10.16639/TEFLINJOURNAL.V26I2/129-154
Bailey, S., Henricks, S., & Applewhite, S. (2015). Student perspectives of assessment strategies
in online courses. Journal of Interactive Online Learning, 13(3), 112-125.
Baccala, A. C. (2020). Elements of comprehensive musicianship: A survey addressing the
attitudes and approaches of middle and high school choral directors [Doctoral
dissertation, Auburn University]. Auburn University Archive.
https://etd.auburn.edu/bitstream/handle/10415/7172/Elements%20of%20Comprehensive
%20Musicianship-
%20A%20Survey%20Addressing%20the%20Attitudes%20and%20Approaches%20of%
20Middle%20School%20and%20High%20School%20Choral%20Directors%20by%20Al
lison%20Baccala.pdf?sequence=2
Baker, W. (2013). Questioning assumptions. Vivienne: a case study of e-learning in music
education. Australian Journal of Music Education, (1), 13-22.
Baldwin, S. J., Ching, Y., & Friesen, N. (2018). Online course design and development among
college and university instructors: An analysis using grounded theory. Online Learning
Journal, 22(2), 157-171. https://doi.org/10.24059/olj.v22i2.1212
Barnes, N., Fives, H., & Dacey, C. M. (2017). U.S. teachers' conceptions of the purposes of
assessment. Teaching and Teacher Education, 65, 107-116.
https://doi.org/10.1016/j.tate.2017.02.017
Berg, M. H. (2014). Preservice music teacher preparation for the conductor-educator role. In J.
R. Barrett & P. R. Webster (Eds.), The musical experience: Rethinking music teaching and
learning (pp. 261-283). Oxford Scholarship Online.
https://doi.org/10.1093/acprof:oso/9780199363032.003.0015
Biasutti, M., Frate, S., & Concina, E. (2019). Music teachers’ professional development:
Assessing a three-year collaborative online course. Music Education Research, 21(1),
116-133. https://doi.org/10.1080/14613808.2018.1534818
Boling, E. C., Hough, M., Krinsky, H., Saleem, H., & Stevens, M. (2012). Cutting the distance in
distance education: Perspectives on what promotes positive, online learning experiences.
Internet and Higher Education, 15, 118-126.
Box, C., Skoog, G., & Dabbs, J.M. (2015). A case study of teacher personal practice assessment
theories and complexities of implementing formative assessment. American Education
Research Journal, 52(5), 956-983. https://doi.org/10.3102/0002831215587754
Brookhart, S. (2001). The “Standards” and classroom assessment research. Paper presented at
the 53rd Annual Meeting of the American Association of Colleges for Teacher Education
in Dallas, Texas, USA.
Brookhart, S. (2011). Educational assessment knowledge and skills for teachers. Educational
Measurement: Issues & Practice, 30(1), 3-12.
Brown, G. (2004). Teachers’ conceptions of assessment: implications for policy and professional
development. Assessment in Education: Principles, Policy & Practice, 11(3), 301-318.
https://doi.org/10.1080/0969594042000304609
Brown, G. (2006). Teachers’ conceptions of assessment: Validation of an abridged version.
Psychological Reports, 99(1), 166-170. https://doi.org/10.2466/pr0.99.1.166-170
Brown, G., Chaudhry, H., & Dhamiji, R. (2015). The impact of an assessment policy upon
teachers' self-reported assessment beliefs and practices: A quasi-experimental study of
Indian teachers in private schools. International Journal of Educational Research, 71, 50-
64. https://doi.org/10.1016/j.ijer.2015.03.001
Brown, G., Harris, L. R., & Harnett, J. (2012). Teacher beliefs about feedback within an
assessment for learning environment: Endorsement of improved learning over student
well-being. Teaching and Teacher Education. https://doi.org/10.1016/j.tate.2012.05.003
Brown, G., Hui, S., Yu, F., & Kennedy, K. (2011). Teachers’ conceptions of assessment in
Chinese contexts: A tripartite model of accountability, improvement, and irrelevance.
International Journal of Educational Research, 50, 307-320.
https://doi.org/10.1016/j.ijer.2011.10.003
Brown, G., Irving, S., Peterson, E., & Hirschfeld, G. (2009). Use of interactive-informal
assessment practices: New Zealand secondary students’ conceptions of assessment.
Learning and Instruction, 19, 97-111. https://doi.org/10.1016/j.learninstruc.2008.02.003
Brown, G., Kennedy, K., Fok, P., Chan, J., & Yu, M. (2009). Assessment for student
improvement: understanding Hong Kong teachers’ conceptions and practices of
assessment. Assessment in Education: Principles, Policy & Practice, 16(3), 347-363.
https://doi.org/10.1080/09695940903319739
Brown, G., Lake, R., & Matters, G. (2011). Queensland teachers’ conceptions of assessment:
The impact of policy priorities on teacher attitudes. Teaching and Teacher Education, 27,
210-220. https://doi.org/10.1016/j.tate.2010.08.003
Brown, G. & Michaelides, M. (2011). Ecological rationality in teachers’ conceptions of
assessment across samples from Cyprus and New Zealand. European Journal of
Psychology of Education, 26, 319-337. https://doi.org/10.1007/s10212-010-0052-3
Burnaford, G. (1999). Teacher action research as professional development in schools: four
paths to change. School wide inquiry: a self-study of an “outside” teacher researcher.
Opinion Paper presented at the Annual Meeting of the American Educational Research
Association, Montreal, PQ.
Campbell, D.T., & Stanley, J.C. (1963). Experimental and quasi-experimental designs for
research. Rand McNally.
Chen, S. (2007). Instructional design strategies for intensive online courses: an objectivist-
constructivist blended approach. Journal of Interactive Online Learning, 6(1), 1-15.
Cherasaro, T. L., Reale, M. L., Haystead, M., & Marzano, R. J. (2015). Instructional improvement
cycle: A teacher's toolkit for collecting and analyzing data on instructional strategies.
ERIC Clearinghouse.
Colwell, R. (2008). Assessment in music education: Integrating curriculum, theory, and practice.
Proceedings of the 2007 Florida Symposium on Assessment in Music Education, March
29-31, 2007, University of Florida, Gainesville, Florida (K. Albert & T. S. Brophy,
Eds.). GIA.
Conway, C. (2002). Perceptions of beginning teachers, their mentors, and administrators
regarding preservice music teacher preparation. Journal of Research in Music Education,
50(1), 20-36.
Conway, C. (2012). Ten years later: Teachers reflect on “perceptions of beginning teachers, their
mentors, and administrators regarding preservice music teacher preparation.” Journal of
Research in Music Education, 60(3), 324-338.
https://doi.org/10.1177/0022429412453601
Crusan, D., Plakans, L., & Gebril, A. (2016). Writing assessment literacy: Surveying second
language teachers' knowledge, beliefs, and practices. Assessing Writing, 28, 43-56.
https://doi.org/10.1016/j.asw.2016.03.001
Darling-Hammond, L., Chung, R., & Frelow, F. (2002). Variation in teacher preparation:
How well do different pathways prepare teachers to teach? Journal of Teacher
Education, 53(4), 286-302.
Darling-Hammond, L. (2006). Powerful teacher education: Lessons from exemplary programs.
Jossey-Bass.
DeLuca, C., & Klinger, D. A. (2010). Assessment literacy development: Identifying gaps
in teacher candidates’ learning. Assessment in Education: Principles, Policy, &
Practice, 17(4), 419-438.
DeLuca, C., Klinger, D., Searle, M., & Schula, L. (2010). Developing a curriculum for
assessment education. Assessment Matters, 1, 133-156.
Deneen, C. & Brown, G. (2016). The impact of conceptions of assessment on assessment literacy
in a teacher education program. Cogent Education, 3, 1-14.
https://doi.org/10.1080/2331186X.2016.1225380.
Denis, J.M. (2018). Assessment in music: A practitioner introduction to assessing students.
Update, 36(3), 20-28. https://doi.org/10.1177/8755123317741489
Desimone, L. (2009). Improving impact studies of teachers’ professional development: Toward
better conceptualizations and measures. Educational Researcher, 38(3), 181-199.
https://doi.org/10.3102/0013189X-8331140
Dietz-Uhler, B. & Hurn, J. (2013). Using learning analytics to predict (and improve) student
success: a faculty perspective. Journal of Interactive Online Learning, 12(1), 17-26.
Donovan, C. (2015). Measuring teachers' knowledge and use of data assessments: Creating a
measure as a first step toward effective professional development (10017935) [Doctoral
dissertation]. University of Denver.
Donovan, C. (2018). Rasch analysis of the Teachers' Knowledge and Use of Data and
Assessment (tKUDA) measure. Journal of Applied Measurement, 19(1), 76-92.
Earl, L. & Katz, S. (2006). Rethinking classroom assessment with purpose in mind: Assessment
for learning, assessment as learning, assessment of learning. Winnipeg: Manitoba
Education, Citizenship and Youth.
Fan, Y., Wang, T., & Wang, K. (2011). A web-based model for developing assessment literacy of
secondary in-service teachers. Computers & Education, 57, 1727-1740.
https://doi.org/10.1016/j.compedu.2011.03.006
Fulmer, G., Lee, I., & Tan, K. (2015). Multi-level model of contextual factors and teachers’
assessment practices: an integrative review of research. Assessment in Education:
Principles, Policy & Practice, 22(4), 475-494.
https://doi.org/10.1080/0969594X.2015.1017445
Gareis, C. R., & Grant, L. W. (2015). Assessment literacy for teacher candidates: A
focused approach. Teacher Educators' Journal, 4-21.
Goldberg, G. L., & Roswell, B. S. (1998). Perception and practice: The impact of teachers'
scoring experience on performance-based instruction and classroom assessment. Paper
presented at the annual meeting of the American Educational Research Association, San
Diego. (ERIC Document Number ED 420 670)
Great Schools Partnership (2013). Professional Development Definition. Retrieved from
https://www.edglossary.org/professional-development/.
Gutierrez, S.L. (2014). From National Standards to classrooms: A case study of middle level
teachers' assessment knowledge and practices (Doctoral dissertation). Western Michigan
University (https://scholarworks.wmich.edu/dissertations/245).
Hahn, K. R. (2010). Inclusion of students with disabilities: Preparation and practices of music
educators (Publication No. 3420149) [Doctoral dissertation]. ProQuest Dissertations &
Theses A&I.
Hanzlik, T. (2001). An examination of Iowa high school instrumental band directors' assessment
practices and attitudes toward assessment (3009721) [Doctoral dissertation]. University of
Nebraska, Lincoln.
Herman, J. L. & Baker, E. L. (2009). Assessment policy: Making sense of the babel. In G. Sykes,
B. Schneider, & D. N. Plank (Eds.), Handbook of education policy research (pp. 176-190).
Routledge.
Heritage, M. (2007). Formative assessment: What do teachers need to know and do? Phi
Delta Kappan, 89(2).
Hidri, S. (2015). Conceptions of assessment: Investigating what assessment means to secondary
and university teachers. Arab Journal of Applied Linguistics, 1(1), 19-43.
Hill, K. W. (1999). A descriptive study of assessment procedures, assessment attitudes, and
grading policies in selected public high school band performance classrooms in
Mississippi (9935693) [Doctoral dissertation]. The University of Southern Mississippi.
ProQuest Dissertations and Theses A&I.
Hoover, N. R., & Abrams, L. M. (2013). Teachers’ instructional use of summative
student assessment data. Applied Measurement in Education, 26(3), 219–231.
Hourigan, R. M. (2008). Teaching strategies for performers with special needs. Teaching
Music, 15, 26-29.
Isbell, D. (2008). Musicians and teachers: The socialization and occupational identity of
preservice music teachers. Journal of Research in Music Education, 56(2), 162-178.
https://doi.org/10.1177/0022429408322853
Jamil, F. & Hamre, B. (2018). Teacher reflection in the context of an online professional
development course: applying principles of cognitive science to promote teacher
learning. Action in Teacher Education, 40(2), 220-236.
https://doi.org/10.1080/01626620.2018.1424051
Johnson, E. (2014). Preservice music teachers’ occupational identity in a beginning conducting
course. Journal of Education and Training Studies, 2(3).
https://doi.org/10.11114/jets.v2i3.422
Kancianic, P.M. (2006). Classroom assessment in U.S. high school band programs: methods,
purposes, & influences (3222315) [Doctoral Dissertation]. University of Maryland,
College Park. ProQuest Dissertations and Theses A&I.
Koerner, B. D. (2017). Beginning music teacher mentoring: Impact on reflective practice,
teaching efficacy, and professional commitment (Publication No. 10642603) [Doctoral
dissertation, University of Colorado Boulder]. ProQuest Dissertations and Theses Global.
Kotora, E. J. (2005). Assessment practices in the choral music classroom: A survey of Ohio high
school choral music teachers and college choral methods professors. Contributions to
Music Education, 32, 65–80.
Koutsoupidou, T. (2014). Online distance learning and music training: Benefits, drawbacks, and
challenges. Open Learning: The Journal of Open, Distance and e-Learning, 29(3), 243-
255. https://doi.org/10.1080/02680513.2015.1011112
Leong, W. (2014). Understanding classroom assessment in dilemmatic spaces: Case studies of
Singaporean music teachers’ conceptions of classroom assessment. Music Education
Research, 16(4), 454-470. https://doi.org/10.1080/14613808.2013.878325
LaCognata, J.P. (2010). Current student assessment practices of high school band directors
(3436343) [Doctoral Dissertation]. University of Florida, Gainesville.
LaCognata, J. P. (2013). Current student assessment practices of high school band directors in
the United States. In T. Brophy & A. Lehmann-Wermser (Eds.), Music assessment across
cultures and continents: The culture of shared practice (pp. 109-128). GIA.
Ludwig, N. (2013). Exploring the relationship between K-12 public school teachers’ conceptions
of assessment and their classroom assessment confidence levels (3579798) [Doctoral
dissertation]. Regent University. ProQuest Dissertations and Theses A&I.
Mantie, R. (2012). Band and/as music education: Antinomies and the struggle for legitimacy.
Philosophy of Music Education Review, 20(1), 63-81.
https://doi.org/10.2979/philmusieducrevi.20.1.63
May, B. N., Willie, K., Worthen, C., & Pehrson, A. (2017). An analysis of state music education
certification and licensure practices in the United States. Journal of Music Teacher
Education, 27(1), 65-88. https://doi.org/10.1177/1057083717699650
McClung, A.C. (1996). A descriptive study of learning assessment and grading practices in the
high school choral music performance classroom (9700217) [Doctoral dissertation]. The
Florida State University, Tallahassee. ProQuest Dissertations and Theses A&I.
McConnell, T., Parker, J., Eberhardt, J., Koehler, M., & Lundeberg, M. (2012). Virtual
professional learning communities: Teachers’ perceptions of virtual versus face-to-face
professional development. Journal of Science Education and Technology, 22, 267-277.
https://doi.org/10.1007/s10956-012-9391-y
McMillan, J. H. (2001). Secondary teachers’ classroom assessment and grading practices.
Educational Measurement: Issues and Practice, 20, 20–32.
McMillan, J. H., & Nash, S. (2000, April). Teacher classroom assessment and grading practice
decision making. Paper presented at the annual meeting of the National Council on
Measurement in Education, New Orleans.
McQuarrie, S. H., & Sherwin, R. G. (2013). Assessment in music education: Relationships between classroom practice and professional publication topics. Research & Issues in Music Education, 11(1).
Mertler, C. (2000). Teacher-centered fallacies of classroom assessment validity and reliability.
Mid-Western Educational Researcher, 13(4), 29-35.
Mertler, C. (2004). Secondary teachers' assessment literacy: Does classroom
experience make a difference? American Secondary Education, 49-64.
Mertler, C. (2009). Teachers’ assessment knowledge and their perceptions of the impact of
classroom assessment professional development. Improving Schools, 12(1), 101–113.
Mertler, C. A., & Campbell, C. S. (2005). Measuring teachers’ knowledge and application of classroom assessment concepts: Development of the Assessment Literacy Inventory. Paper presented at the annual meeting of the American Educational Research Association, Montreal, Quebec, Canada.
Montgomery, A., Mousavi, A., Carbonaro, M., Hayward, D., & Dunn, W. (2019). Using learning analytics to explore self-regulated learning in flipped blended learning music teacher education. British Journal of Educational Technology, 50(1), 114-127. https://doi.org/10.1111/bjet.12590
National Education Association. (2019). Professional development. http://www.nea.org/home/30998.htm
Nierman, G. E., & Colwell, R. (2019). Perspectives from North America. In T. S. Brophy (Ed.), The Oxford handbook of assessment policy and practice in music education (pp. 173-196). Oxford University Press.
Nyberg, J. (2016). You are seldom born with a drum kit in your hands: Music teachers’ conceptualizations of knowledge and learning within music education as an assessment practice. Systemic Practice and Action Research, 29, 235-259. https://doi.org/10.1007/s11213-015-9352-3
Oosterhof, A. (1995). An extended observation of assessment procedures used by selected public school teachers. Paper presented at the annual meeting of the American Educational Research Association, San Francisco. (ERIC Document Number ED 390 937)
Opre, D. (2015). Teachers’ conceptions of assessment. Procedia: Social and Behavioral Sciences, 209, 229-233.
Pastore, S., & Pentassuglia, M. (2016). Teachers’ and students’ conceptions of assessment within the Italian higher education system. Practitioner Research in Higher Education, 10(1), 109-120.
Payne, P., Burrack, F., Parkes, K., & Wesolowski, B. (2019). An emerging process of assessment
in music education. Music Educators Journal, 105(3), 36-44.
https://doi.org/10.1177/0027432118818880
Pellegrino, K., Conway, C., & Russell, J. (2015). Assessment in performance-based secondary
music classes. Music Educators Journal, 102(1), 48-55.
https://doi.org/10.1177/0027432115590183
Perry, M. L. (2013). Teacher and principal assessment literacy (Publication No. 3568118) [Doctoral dissertation]. University of Montana. ProQuest Dissertations and Theses A&I.
Pike, P. (2017). Improving music teaching and learning through online service: A case study of a synchronous online teaching internship. International Journal of Music Education, 35(1), 107-117. https://doi.org/10.1177/0255761415613534
Pishghadam, R., Adamson, B., Sadafian, S., & Kan, F. (2014). Conceptions of assessment and
teacher burnout. Assessment in Education: Principles, Policy & Practice, 21(1), 34-51.
https://doi.org/10.1080/0969594X.2013.817382
Popham, W. J. (2009). Assessment literacy for teachers: Faddish or fundamental? Theory
into Practice, 48(1), 4-11.
Popham, W. (2011). Assessment literacy overlooked: A teacher educator’s confession. The Teacher Educator, 46(4), 265-273. https://doi.org/10.1080/08878730.2011.605048
Prichard, S. (2018). A profile of high-stakes assessment practices in music teacher education.
Journal of Music Teacher Education, 27(3), 94-105.
https://doi.org/10.1177/1057083717750079
Remesal, A. (2010). Primary and secondary teachers’ conceptions of assessment: A qualitative study. Teaching and Teacher Education, 27, 472-482. https://doi.org/10.1016/j.tate.2010.09.017
Richerme, L.K. (2016). Measuring music education: A philosophical investigation of the model
cornerstone assessments. Journal of Research in Music Education, 64(3), 274-293.
https://doi.org/10.1177/0022429416659250
Rowan, B., & Correnti, R. (2009). Studying reading instruction with teacher logs: Lessons from
the study of instructional improvement. Educational Researcher, 38(2), 120-131.
Russell, J. (2011). Assessment and case law: Implications for the grading practices of music
educators. Music Educators Journal, 97(3), 35-39.
https://doi.org/10.1177/0027432110392051
Russell, J. A. (2018). Statistics in music education research. Oxford University Press.
Russell, J. A., & Austin, J. R. (2010). Assessment practices of secondary music teachers. Journal of Research in Music Education, 58(1), 37-54. https://doi.org/10.1177/0022429409360062
Ryan, K. A. (2018). An investigation of pre-service teacher assessment literacy and assessment confidence: Measure development and edTPA performance (Publication No. 10871606) [Doctoral dissertation]. Kent State University. ProQuest Dissertations and Theses A&I.
Sears, M. (2002). Assessment in the instrumental music classroom: Middle school methods & materials (Publication No. 1409387) [Master’s thesis]. University of Massachusetts. ProQuest Dissertations and Theses A&I.
Sherman, C. (2006). A study of current strategies and practices in the assessment of individuals in high school bands (Publication No. 3237089) [Doctoral dissertation]. Teachers College, Columbia University. ProQuest Dissertations and Theses A&I.
Siegel, M. A., & Wissehr, C. (2011). Preparing for the plunge: Preservice teachers’ assessment literacy. Journal of Science Teacher Education, 22, 371–391.
Simanton, E. (2000). Assessment and grading practices among high school band teachers in the United States: A descriptive study (Publication No. 9986536) [Doctoral dissertation]. University of North Dakota. ProQuest Dissertations and Theses A&I.
Stake, R. E. (2010). Qualitative research: Studying how things work. Guilford Press.
Stiggins, R. (1991). Assessment literacy. The Phi Delta Kappan, 72(7), 534–539.
Stiggins, R. (2002). Assessment crisis: The absence of assessment for learning. The Phi Delta Kappan, 83(10), 758-765. https://doi.org/10.1177/003172170208301010
Stiggins, R. (2004). New assessment beliefs for a new school mission. The Phi Delta Kappan,
86(1), 22-27. https://doi.org/10.1177/003172170408600106
Stiggins, R. (2005). From formative assessment to assessment for learning: A path to success in standards-based schools. The Phi Delta Kappan, 87(4), 324-328. https://doi.org/10.1177/003172170508700414
Stiggins, R. (2014). Improve assessment literacy outside of schools, too. The Phi Delta Kappan,
96(2), 67-72. https://doi.org/10.1177/0031721714553413
St. Pierre, N., & Wuttke, B. (2017). Standards-based grading practices among practicing music educators: Prevalence and rationale. Update: Applications of Research in Music Education, 35(2), 30-37. https://doi.org/10.1177/8755123315604468
Thompson, N. (2010). KR-20. In N. J. Salkind (Ed.), Encyclopedia of research design (p. 668).
SAGE Publications, Inc. https://doi.org/10.4135/9781412961288.n205
UNESCO. (2020). COVID-19 impact on education.
https://en.unesco.org/covid19/educationresponse
vanOostveen, R., Desjardins, F., & Bullock, S. (2019). Professional development learning environments (PDLEs) embedded in a collaborative online learning environment (COLE): Moving towards a new conception of online professional learning. Education and Information Technologies, 24, 1863-1900. https://doi.org/10.1007/s10639-018-9686-6
Walls, K. (2008). Distance learning in graduate music teacher education: Promoting professional development and satisfaction of music teachers. Journal of Music Teacher Education, 18(1), 55-66. https://doi.org/10.1177/1057083708323137
Wasserman, E., & Migdal, R. (2019). Professional development: Teachers’ attitudes in online and traditional training courses. Online Learning Journal, 23(1), 132-143. https://doi.org/10.24059/olj.v23i1.1299
Wiggins, G., & McTighe, J. (2005). Understanding by design (2nd ed.). Association for Supervision and Curriculum Development.
Willis, J., Adie, L., & Klenowski, V. (2013). Conceptualising teachers' assessment literacies in
an era of curriculum and assessment reform. Australian Educational Researcher, 40(2),
241-256.
Yurkofsky, M., Blum-Smith, S., & Brennan, K. (2019). Expanding outcomes: Exploring varied conceptions of teacher learning in an online professional development experience. Teaching and Teacher Education, 82, 1-13. https://doi.org/10.1016/j.tate.2019.03.002
Zepeda, S. J. (2019). Professional development: What works (3rd ed.). http://ebookcentral.proquest.com
Zhang, Z. (1996). Teacher assessment competency: A Rasch model analysis. Paper presented at
the annual meeting of the American Educational Research Association, New York.
(ERIC Document Number ED 400 322)
Appendix B
Standards for Teacher Competence in Educational Assessment of Students (STCEAS)
Standards & Corresponding CALI Items
1. Teachers should be skilled in choosing assessment methods appropriate for instructional
decisions.
Items: 1, 2, 3, 4, 5
2. Teachers should be skilled in developing assessment methods appropriate for
instructional decisions.
Items: 6, 7, 8, 9, 10
3. The teacher should be skilled in administering, scoring and interpreting the results of
both externally-produced and teacher-produced assessment methods.
Items: 11, 12, 13, 14, 15
4. Teachers should be skilled in using assessment results when making decisions about
individual students, planning teaching, developing curriculum, and school
improvement.
Items: 16, 17, 18, 19, 20
5. Teachers should be skilled in developing valid pupil grading procedures which use
pupil assessments.
6. Teachers should be skilled in communicating assessment results to students, parents,
other lay audiences, and other educators.
7. Teachers should be skilled in recognizing unethical, illegal, and otherwise
inappropriate assessment methods and uses of assessment information.
*Standards 1–4 and their corresponding items (listed above) were used in the measure for this study.
Appendix C
Original CALI Instrument
by Dr. Craig Mertler,
Bowling Green State University
[Adapted from the Teacher Assessment Literacy Questionnaire (1993), by Barbara S. Plake &
James C. Impara, University of Nebraska-Lincoln, in cooperation with The National Council on
Measurement in Education & the W.K. Kellogg Foundation]
Directions: Please read each item carefully and select the response you think is the best one by
shading the corresponding circle. Even if you are not sure of your choice, but you think you
know which is best, mark that response.
PART I
1. What is the most important consideration in choosing a method for assessing student
achievement?
❏ The ease of scoring the assessment.
❏ The ease of preparing the assessment.
❏ The accuracy of assessing whether or not instructional objectives were attained.
❏ The acceptance by the school administration.
2. When scores from a standardized test are said to be “reliable,” what does it imply?
❏ Student scores from the test can be used for a large number of educational decisions.
❏ If a student retook the same test, he or she would get a similar score on each retake.
❏ The test score is a more valid measure than teacher judgments.
❏ The test score accurately reflects the content of what was taught.
3. Mrs. Bruce wished to assess her students' understanding of the method of problem solving she
had been teaching. Which assessment strategy below would be most valid?
❏ Select a textbook that has a "teacher's guide" with a test developed by the authors.
❏ Develop an assessment consistent with an outline of what she has actually taught in the
class.
❏ Select a standardized test that provides a score on problem solving skills.
❏ Select an instrument that measures students' attitudes about problem solving strategies.
4. What is the most effective use a teacher can make of an assessment that requires students to
show their work (e.g., the way they arrived at a solution to a problem or the logic used to arrive
at a conclusion)?
❏ Assigning grades for a unit of instruction on problem solving.
❏ Providing instructional feedback to individual students.
❏ Motivating students to attempt innovative ways to solve problems.
❏ None of the above.
5. Ms. Green, the principal, was evaluating the teaching performance of Mr. Williams, the fourth
grade teacher. One of the things Ms. Green wanted to learn was if the students were being
encouraged to use higher order thinking skills in the class. What documentation would be the
most valid to help Ms. Green to make this decision?
❏ Mr. Williams’ lesson plans.
❏ The state curriculum guides for fourth grade.
❏ Copies of Mr. Williams’ unit tests or assessment strategies used to assign grades.
❏ Worksheets completed by Mr. Williams’ students, but not used for grading.
6. A teacher wants to document the validity of the scores from a classroom assessment strategy
she plans to use for assigning grades on a class unit. What kind of information would provide the
best evidence for this purpose?
❏ Have other teachers judge whether the assessment strategy covers what was taught.
❏ Match an outline of the instructional content to the content of the actual assessment.
❏ Let students in the class indicate if they thought the assessment was valid.
❏ Ask parents if the assessment reflects important learning outcomes.
7. Which of the following would most likely increase the reliability of Mrs. Lockwood's multiple
choice end-of-unit examination in physical science?
❏ Use a blueprint to develop the test questions.
❏ Change the test format to true-false questions.
❏ Add more items like those already on the test.
❏ Add an essay component.
8. Ms. Gregory wants to assess her students' skills in organizing ideas rather than just repeating
facts. Which words should she use in formulating essay exercises to achieve this goal?
❏ compare, contrast, criticize
❏ identify, specify, list
❏ order, match, select
❏ define, recall, restate
9. Mr. Woodruff wanted his students to appreciate the literary works of Edgar Allen Poe. Which
of his test items shown below will best measure his instructional goal?
❏ "Spoke the raven, nevermore." comes from which of Poe's works?
❏ True or False: Poe was an orphan and never knew his biological parents.
❏ Edgar Allen Poe wrote: 1. Novels 2. Short stories 3. Poems 4. All of the above.
❏ Discuss briefly your view of Poe's contribution to American literature.
10. Several students in Ms. Atwell's class received low scores on her end-of-unit test covering
multi-step story problems in mathematics. She wanted to know which students were having
similar problems so she could group them for instruction. Which assessment strategy would be
best for her to use for grouping students?
❏ Use the test provided in the "teacher's guide."
❏ Have the students take a test that has separate items for each step of the process.
❏ Look at the student's records and standardized test scores to see which topics the students
had not performed well on previously.
❏ Give students story problems to complete and have them show their work.
11. Many teachers score classroom tests using a 100-point percent correct scale. In general, what
does a student's score of 90 on such a scale mean?
❏ The student answered 90% of the items on this test correctly.
❏ The student knows 90% of the instructional content of the unit covered by this test.
❏ The student scored higher than 90% of all the students who took the test.
❏ The student scored 90% higher than the average student in the class.
12. Students in Mr. Jakman's science class are required to develop a model of the solar system as
part of their end-of-unit grade. Which scoring procedure below will maximize the objectivity of
assessing these student projects?
❏ When the models are turned in, Mr. Jakman identifies the most attractive models and
gives them the highest grades, the next most attractive get a lower grade and so on.
❏ Mr. Jakman asks other teachers in the building to rate each project on a 5-point scale
based on their quality.
❏ Before the projects are turned in, Mr. Jakman constructs a scoring key based on the
critical features of the projects as identified by the highest performing students in the
class.
❏ Before the projects are turned in, Mr. Jakman prepares a model or blueprint of the
critical features of the product and assigns scoring weights to these features. The models
with the highest scores receive the highest grade.
13. At the close of the first month of school, Mrs. Friend gives her fifth grade students a test she
developed in social studies. Her test is modeled after a standardized social studies test. It presents
passages and then asks questions related to understanding and problem definition. When the test
was scored, she noticed that two of her students—who had been performing well in their class
assignments—scored much lower than other students. Which of the following types of additional
information would be most helpful in interpreting the results of this test?
❏ The gender of the students.
❏ The age of the students.
❏ Reliability data for the standardized social studies test she used as the model.
❏ Reading comprehension scores for the students.
14. Frank, a beginning fifth grader, received a G. E. (grade equivalent score) of 8.0 on the
Reading Comprehension subtest of a standardized test. This score should be interpreted to mean
that Frank
❏ can read and understand 8th grade reading level material.
❏ scored as well as a typical beginning 8th grader scored on this test.
❏ is performing in Reading Comprehension at the 8th grade level.
❏ will probably reach maximum performance in Reading Comprehension at the beginning
of the 8th grade.
15. When the directions indicate each section of a standardized test is timed separately, which of
the following is acceptable test-taking behavior?
❏ John finishes the vocabulary section early; he then rechecks many of his answers in that
section.
❏ Mary finishes the vocabulary section early; she checks her answers on the previous test
section.
❏ Jane finishes the vocabulary section early; she looks ahead at the next test section but
does not mark her answer sheet for any of those items.
❏ Bob did not finish the vocabulary section; he continues to work on that section when the
testing time is up.
16. Ms. Camp is starting a new semester with a factoring unit in her Algebra I class. Before
beginning the unit, she gives her students a test on the commutative, associative, and distributive
properties of addition and multiplication. Which of the following is the most likely reason she
gives this test to her students?
❏ The principal needs to report the results of this assessment to the state testing director.
❏ Ms. Camp wants to give the students practice in taking tests early in the semester.
❏ Ms. Camp wants to check for prerequisite knowledge in her students before she begins
the unit on factoring.
❏ Ms. Camp wants to measure growth in student achievement of these concepts, and scores
on this test will serve as the students' knowledge baseline.
17. To evaluate the effectiveness of the mathematics program for her gifted first graders, Ms.
Allen gave them a standardized mathematics test normed for third graders. To decide how well
her students performed, Ms. Allen compared her students' scores to those of the third-grade norm
group. Why is this an incorrect application of standardized test norms?
❏ The norms are not reliable for first graders.
❏ The norms are not valid for first graders.
❏ Third grade mathematics items are too difficult for first graders.
❏ The time limits are too short for first graders.
18. When planning classroom instruction for a unit on arithmetic operations with fractions,
which of these types of information have more potential to be helpful?
norm-referenced information: describes each student's performance relative to other students in
a group (e.g., percentile ranks, stanines), or
criterion-referenced information: describes each student's performance in terms of status on
specific learning outcomes (e.g., number of items correctly answered for each specific objective)
❏ Norm-referenced information.
❏ Criterion-referenced information.
❏ Both types of information are equally useful in helping to plan for instruction.
❏ Neither, test information is not useful in helping to plan instruction.
19. Students' scores on standardized tests are sometimes inconsistent with their performances on
classroom assessments (e.g., teacher tests or other in-class activities). Which of the following is
not a reasonable explanation for such discrepancies?
❏ Some students freeze up on standardized tests, but they do fine on classroom
assessments.
❏ Students often take standardized tests less seriously than they take classroom
assessments.
❏ Standardized tests measure only recall of information while classroom assessments
measure more complex thinking.
❏ Standardized tests may have less curriculum validity than classroom assessment.
20. Elementary school teachers in the Baker School system collectively designed and developed
new curricula in Reading, Mathematics, and Science that are based on locally developed
objectives and objectives in state curriculum guides. The new curricula were not matched
directly to the content of the fourth grade standardized test. A newspaper reports that Baker
Public Schools is among the lowest scoring districts of fourth grade students in the State Assessment
Program. Which of the following would invalidate the comparison between Baker Public
Schools and other schools in the state?
❏ The curriculum objectives of the other districts may more closely match those of the
State Assessment.
❏ Other school systems did not design their curriculum to be consistent with the State
Assessment test.
❏ Instruction in Baker schools is poor.
❏ Other school systems have different promotion policies than Baker.
21. Which of the following choices typically provides the most reliable student-performance
information that a teacher might consider when assigning a unit grade?
❏ Scores from a teacher-made test containing two or three essay questions related directly
to instructional objectives of the unit.
❏ Scores from a teacher-made 20 item multiple-choice test designed to measure the
specific instructional objectives of the unit.
❏ Oral responses to questions asked in class of each student over the course of the unit.
❏ Daily grades designed to indicate the quality of in-class participation during regular
instruction.
22. A teacher gave three tests during a grading period and she wants to weight them all equally
when assigning grades. The goal of the grading program is to rank order students on
achievement. In order to achieve this goal, which of the following should be closest to equal?
❏ Number of items.
❏ Number of students taking each test.
❏ Average scores.
❏ Variation (range) of scores.
23. When a parent asks a teacher to explain the basis for his or her child's grade, the teacher
should
❏ explain that the grades are assigned fairly, based on the student's performance and other
related factors.
❏ ask the parents what they think should be the basis for the child's grade.
❏ explain exactly how the grade was determined and show the parent samples of the
student's work.
❏ indicate that the grading scale is imposed by the school board and the teachers have no
control over grades.
24. Which of the following grading practices results in a grade that least reflects students'
achievement?
❏ Mr. Jones requires students to turn in homework; however, he only grades the odd
numbered items.
❏ Mrs. Brown uses weekly quizzes and three major examinations to assign final grades in
her class.
❏ Ms. Smith permits students to redo their assignments several times if they need more
opportunities to meet her standards for grades.
❏ Miss Engle deducts 5 points from a student's test grade for disruptive behavior.
25. During the most recent grading period, Ms. Johnson graded no homework and gave only one
end-of-unit test. Grades were assigned only on the basis of the test. Which of the following is the
major criticism regarding how she assigned the grades?
❏ The grades probably reflect a bias against minority students that exists in most tests.
❏ Decisions like grade assignment should be based on more than one piece of information.
❏ The test was too narrow in curriculum focus.
❏ There is no significant criticism of this method providing the test covered the unit's
content.
26. In a routine conference with Mary's parents, Mrs. Estes observed that Mary's scores on the
state assessment program's quantitative reasoning tests indicate Mary is performing better in
mathematics concepts than in mathematics computation. This probably means that
❏ Mary's score on the computation test was below average.
❏ Mary is an excellent student in mathematics concepts.
❏ the percentile bands for the mathematics concepts and computation tests do not overlap.
❏ the mathematics concepts test is a more valid measure of Mary's quantitative reasoning
ability.
27. Many states are revising their school accountability programs to help explain differences in
test scores across school systems. Which of the following is not something that needs to be
considered in such a program?
❏ The number of students in each school system.
❏ The average socio-economic status of the school systems.
❏ The race/ethnic distribution of students in each school system.
❏ The drop-out rate in each school system.
28. The following standardized test data are reported for John.
Subject -- Stanine Score
Vocabulary -- 7
Mathematics Computation -- 7
Social Studies -- 7
Which of the following is a valid interpretation of this score report?
❏ John answered correctly the same number of items on each of the three tests.
❏ John's test scores are equivalent to a typical seventh grader's test performance.
❏ John had the same percentile rank on the three tests.
❏ John scored above average on each of the three tests.
29. Mr. Klein bases his students' grades mostly on graded homework and tests. Mr. Kaplan bases
his students' grades mostly on his observation of the students during class. A major difference in
these two assessment strategies for assigning grades can best be summarized as a difference in
❏ formal and informal assessment.
❏ performance and applied assessment.
❏ customized and tailored assessment.
❏ formative and summative assessment.
30. John scored at the 60th percentile on a mathematics concepts test and scored at the 57th
percentile on a test of reading comprehension. If the percentile bands for each test are five
percentile ranks wide, what should John's teacher do in light of these test results?
❏ Ignore this difference.
❏ Provide John with individual help in reading.
❏ Motivate John to read more extensively outside of school.
❏ Provide enrichment experiences for John in mathematics, his better performance area.
31. In some states testing companies are required to release items from prior versions of a test to
anyone who requests them. Such requirements are known as
❏ open-testing mandates.
❏ gag rules.
❏ freedom-of-information acts.
❏ truth-in-testing laws.
32. Mrs. Brown wants to let her students know how they did on their test as quickly as possible.
She tells her students that their scored tests will be on a chair outside of her room immediately
after school. The students may come by and pick out their graded test from among the other tests
for their class. What is wrong with Mrs. Brown's action?
❏ The students can see the other students' graded tests, making it a violation of the
students' right of privacy.
❏ The students have to wait until after school, so the action is unfair to students who have
to leave immediately after school.
❏ Mrs. Brown will have to rush to get the tests graded by the end of the school day, hence,
the action prevents her from using the test to identify students who need special help.
❏ The students who were absent will have an unfair advantage, because her action allows
the possibility for these students to cheat.
33. A state uses its statewide testing program as a basis for distributing resources to school
systems. To establish an equitable distribution plan, the criterion set by the State Board of
Education provides additional resources to every school system with student achievement test
scores above the state average. Which cliché best describes the likely outcome of this regulation?
❏ Every cloud has its silver lining.
❏ Into each life some rain must fall.
❏ The rich get richer and the poor get poorer.
❏ A bird in the hand is worth two in the bush.
34. In a school where teacher evaluations are based in part on their students' scores on a
standardized test, several teachers noted that one of their students did not reach some vocabulary
items on a standardized test. Which teacher's action is considered ethical?
❏ Mr. Jackson darkened circles on the answer sheet at random. He assumed Fred, who was
not a good student, would just guess at the answers, so this would be a fair way to obtain
Fred's score on the test.
❏ Mr. Hoover filled in the answer sheet the way he thought Joan, who was not feeling well,
would have answered based on Joan's typical in-class performance.
❏ Mr. Stover turned in the answer sheet as it was, even though he thought George, an
average student, might have gotten a higher score had he finished the test.
❏ Mr. Lund read each question and darkened in the bubbles on the answer sheet that
represented what he believed Felicia, a slightly below average student, would select as
the correct answers.
35. Mrs. Overton was concerned that her students would not do well on the State Assessment
Program to be administered in the Spring. She got a copy of the standardized test form that was
going to be used. She did each of the following activities to help increase scores. Which activity
was unethical?
❏ Instructed students in strategies on taking multiple choice tests, including how to use
answer sheets.
❏ Gave students the items from an alternate form of the test.
❏ Planned instruction to focus on the concepts covered in the test.
❏ None of these actions are unethical.
PART II
36. What is your gender?
❏ female
❏ male
37. Which of the following is the most appropriate description of the level at which you teach?
❏ elementary – primary (K – grade 3)
❏ elementary – intermediate (grades 4 – 6)
❏ elementary (K – 6)
❏ middle (grades 6 – 8)
❏ high (grades 9 – 12)
❏ secondary (grades 6 – 12)
❏ K – 12
❏ other
38. Which best describes the educational level you have attained?
❏ B.A. or B.S.
❏ M.A. or M.S.
❏ Specialist
❏ Ed.D.
❏ Ph.D.
39. Including the current year, how many years of experience do you have as a classroom
teacher?
❏ 1 – 5 years
❏ 6 – 10 years
❏ 11 – 15 years
❏ 16 – 20 years
❏ 21 – 25 years
❏ 26 – 30 years
❏ more than 30 years
40. To the best of your knowledge, did you take a standalone course in classroom assessment as
part of your undergraduate teacher preparation?
❏ yes
❏ no
41. Which of the following best describes your perception of the level of preparation for the
overall job of being a classroom teacher that resulted from your undergraduate teacher
preparation program?
❏ very unprepared
❏ somewhat unprepared
❏ somewhat prepared
❏ very prepared
42. Which of the following best describes your perception of the level of preparation for
assessing student performance that resulted from your undergraduate teacher preparation
program?
❏ very unprepared
❏ somewhat unprepared
❏ somewhat prepared
❏ very prepared
Appendix D
Intervention Module Design
Home Screen
Weekly Modules & Prompts
Appendix E
Prescreening and Informed Consent Questionnaire & Informed Consent
Thank you for expressing interest in participating in this research study. The following
questionnaire includes three sections:
(1) A pre-screening question to ensure that you are eligible to participate in this study.
(2) A description of the study followed by informed consent documentation. If you are
interested in participating in this study, you must provide informed consent and an email address
that you access regularly.
(3) A series of demographic and background questions (11 items).
This questionnaire should take you no longer than five minutes to complete. At the end, you will
be notified about which condition you have been randomly assigned to (i.e., the intervention
group or the control group). If you are randomly assigned to the intervention group, you will also
receive information about how to register for the four-week online professional development. If
you are assigned to the control group but would still like to receive the four-week online
professional development, I will offer the course again, upon request, after the data collection
period for this study has ended.
If you have any questions or concerns, please contact me via email at
[email protected] or via phone at (443) 235-0957.
***
Q8 Are you currently a music teacher in a PK-12 classroom in the United States?
● Yes (1)
● No (2)
[Skip To: End of Block If Are you currently a music teacher in a PK-12 classroom in the United
States? = Yes]
Unfortunately you are not eligible to participate in this study at this time. Thank you for your
interest, and best wishes to you in your future teaching endeavors. You may exit out of this
window at any time.
[Skip To: End of Survey If Unfortunately you are not eligible to participate in this study at this
time. Thank you for your... Is Displayed]
***
Q32 You are eligible to participate in this study.
Please continue to the next screen to read a description of the study and provide informed
consent.
***
Q7 Permission to Take Part in a Human Research Study

Title of research study: Music Teachers’ Assessment Literacy, Beliefs, & Practices: A Mixed Methods Intervention Study
IRB Protocol Number: 20-0054
Investigator: Jocelyn W. Armes

Purpose of the Study
The purpose of the study is to examine the effectiveness of an online professional development intervention for music teachers in changing assessment literacy, beliefs, and practices. A second purpose of this study is to explore music teachers’ beliefs about assessment. Although researchers in general education have examined assessment literacy, beliefs, and practices separately, no one, to date, has examined these concepts at the same time. In addition, no one has used an intervention to enhance music teachers’ assessment literacy, beliefs, and practices. Very little is known about music teachers’ assessment literacy and beliefs; however, there is reason to suspect that music teachers’ assessment practices are informed by conflicting beliefs about assessment and external expectations.

I expect that you will be in this research study for six weeks, from March 8, 2020 until April 13, 2020. I expect about 200 people will be in this research study.

Explanation of Procedures
If you decide to participate in this study, you can expect the following: Because this is an intervention study, and I am trying to measure the change in music teachers’ assessment literacy, beliefs, and practices, you will be randomly assigned to either the intervention group or a control group. Your group will be chosen by chance, like flipping a coin. You will have an equal chance of being assigned to either group. Both groups will take a questionnaire-type survey with questions related to assessment knowledge, practices, and beliefs.

Intervention Group. If you are in the intervention group, you will be sent an additional email with information about how to register for the online assessment workshop. The online assessment workshop will take place in a virtual classroom via Google Classroom. You will have access to the course from March 8, 2020 until April 13, 2020. Each week you will be asked to complete a module. Each module is expected to take about two hours to complete. In each module, you will read materials about assessment. Next, you will complete a discussion activity with other participants in the intervention. Then, you will complete a task related to designing an assessment for your classroom. Modules will stay open for the entire course. However, you should complete the modules in order, and strive to complete every module within a week.

Control Group. For this study, the control group is a true control; that is, the participants in the control group will not take part in the online professional development. If you, however, would like to receive the online professional development after the experiment is over, you will be permitted to do so, upon request.

At the conclusion of the four-week online assessment workshop, both groups will be sent a final email with the link to the final questionnaire. This will help me measure any changes in music teachers’ assessment knowledge, practices, and beliefs.

Voluntary Participation and Withdrawal
Whether or not you take part in this research is your choice. You can leave the research at any time and it will not be held against you.

Potential Benefits
We cannot promise any benefits to you or others from your taking part in this research. However, you may find that participating in online professional development about assessment increases your knowledge and skill in this area of your teaching practice. You may also find that collaborating with your music teacher peers provides you with fresh perspectives related to assessment. Finally, your students may benefit from any increase in your knowledge or skill in this area.

Confidentiality
Information obtained about and from you for this study will be kept confidential to the extent allowed by law. Research information that identifies you may be shared with the University of Colorado Boulder Institutional Review Board (IRB) and others who are responsible for ensuring compliance with laws and regulations related to research, including people on behalf of the Office for Human Research Protections. The information from this research may be published for scientific purposes; however, your identity will not be given out.

Payment for Participation
You will not be paid to be in this study.

Questions
If you have questions, concerns, or complaints, or think the research has hurt you, you can contact me at [email protected], or my advisor, James Austin, at [email protected]. This research has been reviewed and approved by an IRB. You may talk to them at (303) 735-3702 or [email protected] if:
• Your questions, concerns, or complaints are not being answered by the research team.
• You cannot reach the research team.
• You want to talk to someone besides the research team.
• You have questions about your rights as a research subject.
• You want to get information or provide input about this research.
Q3 Informed Consent Documentation
I have read through the invitation to participate in this intervention study, am aware of the
potential risks and benefits of participation, and understand that being in this study is voluntary
and that my responses are confidential and private.
● I agree to participate
● I do not agree to participate
Q5 By providing your email in the space below, you acknowledge that (a) you have read through
the invitation to participate in this intervention study, (b) you are aware of the potential risks and
benefits of participation, and (c) you understand that being in this study is voluntary and that
your responses will be kept confidential and private.
________________________________________________________________
***
Demographic Information
Please provide the basic demographic information requested below. This information will be
kept confidential. You will not be personally identifiable in any documents or reports generated
from this study.
Q15 With what gender do you identify?
● Female (1)
● Male (2)
● Trans or Nonbinary (3)
Q16 What is your race?
● Caucasian or Non-Hispanic (1)
● Black or African American (2)
● Hispanic or Latinx (3)
● American Indian or Alaska Native (4)
● Asian (5)
● Native Hawaiian or Pacific Islander (6)
● Biracial or multi-racial (7)
Q17 Which of the following grade levels do you teach? (Check all that apply.)
❏ Pre-Kindergarten (1)
❏ Kindergarten (5)
❏ 1st (6)
❏ 2nd (7)
❏ 3rd (8)
❏ 4th (9)
❏ 5th (10)
❏ 6th (11)
❏ 7th (12)
❏ 8th (13)
❏ 9th (14)
❏ 10th (15)
❏ 11th (16)
❏ 12th (17)
Q18 What courses do you teach? (Check all that apply.)
❏ Chorus/Vocal (1)
❏ Instrumental Band (2)
❏ Instrumental Orchestra (3)
❏ Instrumental Other (Jazz, marching, guitar, etc.) (4)
❏ General Music (5)
❏ Music Appreciation (6)
❏ Music Theory (7)
❏ Visual & Performing Arts (8)
Q19 Which best describes the educational level you have attained?
● Bachelor's degree (1)
● Master's degree (2)
● Master's +30 credits (3)
● Doctoral degree (4)
Q20 Including the current year, how many years of experience do you have as a classroom
teacher? (Answer as a number; e.g., 11.)
Q21 To the best of your knowledge, did you take a standalone course in classroom assessment as
part of your undergraduate teacher preparation?
● Yes (1)
● No (2)
Q22 Coming out of your undergraduate teacher preparation program, how prepared were you
for the job of being a music teacher?
● Very unprepared
● Unprepared
● Somewhat unprepared
● Somewhat prepared
● Prepared
● Very prepared
Q23 Coming out of your undergraduate teacher preparation program, how prepared were you to
assess students' learning?
● Very unprepared
● Unprepared
● Somewhat unprepared
● Somewhat prepared
● Prepared
● Very prepared
Q24 Have you ever taken a workshop (a few days or less) in which the only topic was
assessment?
● Yes (1)
● No (2)
Q25 Since completing your undergraduate degree, have you ever taken a course (a few weeks or
more) focused only on assessment?
● Yes (1)
● No (2)
***
[RANDOM ASSIGNMENT TO CONTROL OR INTERVENTION]
Q34 Thank you for your interest in participating in this study and completing the Informed
Consent documentation. You have been randomly assigned to the Control Group.
This means that you will not receive the online professional development from March 2020 -
April 2020. However, you will be asked to take a questionnaire about your assessment
knowledge, practices, and beliefs in March 2020 and April 2020. Your responses to these
questionnaires are still important for comparison purposes.
If you would still like to receive the free online professional development in April 2020, please
select "Yes" below.
You should expect to receive a link to the study questionnaire from Jocelyn Armes, the Principal
Investigator, within an hour at the email address you provided. If you have further questions,
please contact Jocelyn Armes at [email protected] or (443) 235-0957.
● Yes, I would like to receive the free online professional development in April 2020, after
I complete the questionnaires in March and April. (4)
● No, I would not like to receive the free online professional development in April 2020, after
I complete the questionnaires in March and April. (5)
[OR]
Q35 Thank you for your interest in participating in this study and completing the Informed
Consent documentation. You have been randomly assigned to the Intervention Group.
This means that you will receive the online professional development from March 2020 - April
2020.
You should expect to receive a link to the study questionnaire from Jocelyn Armes, the Principal
Investigator, within an hour at the email address you provided. If you have further questions,
please contact Jocelyn Armes at [email protected] or (443) 235-0957.
Appendix F
Adapted Classroom Assessment Literacy Inventory (CALI)
Classroom Assessment Literacy Inventory
In this section, you will be asked 20 multiple choice questions about assessment. Choose the
best answer to each question. Even if you are not sure of your choice, mark that response.
[Adapted from the Classroom Assessment Literacy Inventory (2000), by C. Mertler, Bowling
Green State University]
***
What is the most important consideration in choosing a method for assessing student
achievement?
o The ease of scoring the assessment.
o The ease of preparing the assessment.
o The accuracy of assessing whether or not instructional objectives were attained.
o The acceptance by the school administration.
When scores from a standardized test are said to be “reliable,” what does it imply?
o Student scores from the test can be used for a large number of educational decisions.
o If a student retook the same test, he or she would get a similar score on each retake.
o The test score is a more valid measure than teacher judgments.
o The test score accurately reflects the content of what was taught.
Mrs. Bruce wished to assess her students' understanding of key signature identification. Which
assessment strategy below would be most valid?
o Select a music theory text that has a "teacher's guide" with a test developed by the
authors.
o Develop an assessment consistent with an outline of what she has taught in class.
o Select a standardized test that provides a score on problem solving skills.
o Select an instrument that measures students' attitudes about problem solving strategies.
What is the most effective use a teacher can make of an assessment that requires students to
show their work (e.g., the way they arrived at a solution to a problem or the logic used to arrive
at a conclusion)?
o Assigning grades for a unit of instruction on problem solving.
o Providing instructional feedback to individual students.
o Motivating students to attempt innovative ways to solve problems.
o None of the above.
Ms. Green, the principal, was evaluating the teaching performance of Mr. Williams, the
elementary general music teacher. One of the things Ms. Green wanted to learn was if the
students were being encouraged to use higher order thinking skills in the class. What
documentation would be the most valid to help Ms. Green to make this decision?
o Mr. Williams’ lesson plans.
o The state curriculum guides for fourth grade music.
o Copies of Mr. Williams’ unit tests or assessment strategies used to assign grades.
o Worksheets completed by Mr. Williams’ students, but not used for grading.
A teacher wants to document the validity of the scores from a classroom assessment strategy she
plans to use for assigning grades on a class unit. What kind of information would provide the
best evidence for this purpose?
o Have other teachers judge whether the assessment strategy covers what was taught.
o Match an outline of the instructional content to the content of the actual assessment.
o Let students in the class indicate if they thought the assessment was valid.
o Ask parents if the assessment reflects important learning outcomes.
Which of the following would most likely increase the reliability of Mrs. Lockwood's multiple-
choice end-of-unit examination in middle school band?
o Use a curriculum guide to develop the test questions.
o Change the test format to true-false questions.
o Add more items like those already on the test.
o Add an essay component.
Ms. Gregory wants to assess her students' skills in organizing ideas rather than just repeating
facts. Which words should she use in formulating essay exercises to achieve this goal?
o compare, contrast, criticize
o identify, specify, list
o order, match, select
o define, recall, restate
Mr. Woodruff wanted his students to appreciate the choral works of Craig Hella Johnson. Which
of his test items shown below will best measure his instructional goal?
o What is the name of the collection of Dorothy Water’s poetry that Johnson set to music?
o True or False: Johnson was the first ever Artist in Residence at Texas State University.
o Johnson writes works for: 1. Chorus 2. Soloists 3. Trios 4. All of the above.
o Discuss briefly your view of Johnson’s contribution to choral literature.
Several students in Ms. Atwell's class received low scores on her end-of-unit test covering
counting rhythms in simple duple meters. She wanted to know which students were having
similar problems so she could group them for instruction. Which assessment strategy would be
best for her to use for grouping students?
o Use the test provided in the "teacher's guide."
o Have the students take a test that has separate items for each simple duple meter.
o Look at the students' records to see which topics the students had not performed well on
previously.
o Give students practice worksheets to complete and have them write the counts in under
the examples.
Many teachers score classroom tests using a 100-point percent correct scale. In general, what
does a student's score of 90 on such a scale mean?
o The student answered 90% of the items on this test correctly.
o The student knows 90% of the instructional content of the unit covered by this test.
o The student scored higher than 90% of all the students who took the test.
o The student scored 90% higher than the average student in the class.
Students in Mr. Jakman's music class are required to compose an original song as part of their
end-of-unit grade. Which scoring procedure below will maximize the objectivity of assessing
these student projects?
o When the compositions are turned in, Mr. Jakman identifies the compositions that are most
beautiful to his ear and gives them the highest grades; the next most beautiful receive a lower
grade, and so on.
o Mr. Jakman asks other teachers in the building to rate each composition on a 5-point
scale based on its quality.
o Before the compositions are turned in, Mr. Jakman constructs a scoring key based on
critical features of the projects as identified by the highest performing students in the class.
o Before the compositions are turned in, Mr. Jakman prepares a model of the critical
features of a composition and assigns scoring weights to these features. The compositions with
the highest scores receive the highest grade.
At the close of the first month of school, Mrs. Friend gives her fifth grade students a test she
developed for musical aptitude. Her test is modeled after a standardized aptitude test. It presents
aural examples and then asks questions related to identifying features in the music. When the test
was scored, she noticed that two of her students—who had been performing well in their class
assignments—scored much lower than other students.
Which of the following types of additional information would be most helpful in
interpreting the results of this test?
o The gender of the students.
o The age of the students.
o Reliability data for the standardized test she used as the model.
o Reading comprehension scores for the students.
Frank, a fifth grader taking private violin lessons, received a G.E. (grade equivalent score) of 8.0
on the music theory subtest of a standardized test. This score should be interpreted to mean that Frank:
o can understand 8th grade music theory material.
o scored as well as a typical beginning 8th grader scored on this test.
o is performing in music theory at the 8th grade level.
o will probably reach maximum performance in music theory at the beginning of the 8th
grade.
When the directions indicate each section of a test is timed separately, which of the following is
acceptable test-taking behavior?
o John finishes the vocabulary section early; he then rechecks many of his answers in that
section.
o Mary finishes the vocabulary section early; she checks her answers on the previous test
section.
o Jane finishes the vocabulary section early; she looks ahead at the next test section but
does not mark her answer sheet for any of those items.
o Bob did not finish the vocabulary section; he continues to work on that section when the
testing time is up.
Ms. Camp is starting a new concert cycle with a unit on minor keys in her auditioned choir class.
Before beginning the unit, she gives her students a test on key signature identification. Which of
the following is the most likely reason she gives this test to her students?
o The principal needs to report the results of this assessment to the state testing director.
o Ms. Camp wants to give students practice in taking tests early in the semester.
o Ms. Camp wants to check for prerequisite knowledge in her students before she begins
the unit.
o Ms. Camp wants to measure growth in student achievement of these concepts, and scores
on this test will serve as the students' knowledge baseline.
To evaluate the effectiveness of the Kodály curriculum for her first graders, Ms. Allen gave them
a standardized test normed for third graders. To decide how well her students performed, Ms.
Allen compared her students' scores to those of the third-grade norm group. Why is this an
incorrect application of standardized test norms?
o The norms are not reliable for first graders.
o The norms are not valid for first graders.
o Third grade music items are too difficult for first graders.
o The time limits are too short for first graders.
When planning classroom instruction for a unit on seventh chord construction, which of these
types of information has more potential to be helpful?
o norm-referenced information: describes each student's performance relative to other
students in a group (e.g., percentile ranks).
o criterion-referenced information: describes each student's performance in terms of status
on specific learning outcomes (e.g., number of items correctly answered for each specific
objective).
o Both types of information are equally useful in helping to plan for instruction.
o Neither; test information is not useful in helping to plan instruction.
Students' scores on standardized tests are sometimes inconsistent with their performances on
classroom assessments (e.g., teacher tests or other in-class activities). Which of the following is
not a reasonable explanation for such discrepancies?
o Some students freeze up on standardized tests, but they do fine on classroom
assessments.
o Students often take standardized tests less seriously than they take classroom
assessments.
o Standardized tests measure only recall of information while classroom assessments
measure more complex thinking.
o Standardized tests may have less curriculum validity than classroom assessment.
Elementary school teachers in the Baker School system collectively designed and developed new
curricula in elementary general music, chorus, and band that are based on locally developed
objectives and objectives in state curriculum guides. The new curricula were not matched
directly to the content of the fourth-grade standardized test. A newspaper reports the fourth-grade
students in Baker Public Schools are among the lowest scoring districts in the State Assessment
Program.
Which of the following would invalidate the comparison between Baker Public Schools and
other schools in the state?
o The curriculum objectives of the other districts may more closely match those of the State
Assessment.
o Other school systems did not design their curriculum to be consistent with the State
Assessment test.
o Instruction in Baker schools is poor.
o Other school systems have different promotion policies than Baker.
Appendix G
Music Teacher Assessment Implementation Inventory (MTAII)
Music Teacher Assessment Implementation Inventory (MTAII)
In this section, you will be asked to answer two questions related to the forms of assessment you
use and the purposes for which you use assessment. In both cases, reflect upon a typical class in
your main teaching area.
***
Reflect upon a typical class in your main teaching area.
Within the last four weeks, how often have you used the following forms of assessment?

Response scale: Never / Less Than Once Per Week / Once Per Week / Several Times Per Week / Nearly Every Day

Written Tests/Quizzes
Written Classwork/Homework
Group Performances
Individual Performances
Projects
Portfolios
Attendance
Participation
***
Reflect upon a typical class in your main teaching area.
Within the last four weeks, how often have you used assessment for the following purposes?

Response scale: Never / Less Than Once Per Week / Once Per Week / Several Times Per Week / Nearly Every Day

Summative (i.e., assessments used to provide information about mastery, usually at the end of instruction)
Formative (i.e., assessments used to provide students feedback during ongoing instruction)
Diagnostic (i.e., assessments used to identify areas of improvement for students)
Placement (i.e., assessments used to sort or order students into targeted groupings)
Extramusical (i.e., assessments used to motivate or hold students accountable for behaviors)
Appendix H
Music Teacher Assessment Beliefs Inventory (MTABI)
In this section, you will be asked to answer questions related to your beliefs about assessment.
Music teachers may adopt different views as to the nature or value of assessment. Please
indicate the extent to which you agree with each statement listed below:
Response scale: Strongly Disagree / Disagree / Somewhat Disagree / Somewhat Agree / Agree / Strongly Agree

Assessment results are trustworthy
Assessment consistently provides useful information
Assessment results are dependable
Assessment is an important music teacher responsibility
Assessment helps music teachers be more effective
Assessment and instruction can be seamlessly integrated
Assessment forces music teachers to contradict their beliefs
Assessment helps music teachers treat their students fairly
Assessment interferes with teaching
Assessment results are of great use to music teachers
Assessment results are rightfully ignored by most music teachers
Assessment has little impact on music teaching
Assessment results are often inaccurate
Assessment results are prone to error
Assessment typically provides precise information
Assessment causes teachers to be conformists
Assessment reduces music teacher creativity
Appendix I
Invitation to Participate
SUBJECT LINE: Music Teacher Assessment Professional Development
Greetings Music Educators:
I am writing to ask for your help with a study to better understand what music teachers know and
feel about assessment in their classrooms, and to see if professional development created
specifically for music teachers can change those thoughts and feelings. I am sending this
invitation to all members of NAfME who teach music in any PK-12 setting, and I need as many
people as possible to respond.
If you decide to participate in this study, you will complete a brief online questionnaire, which
should take less than 10 minutes. Then, you will be randomly assigned to either an intervention
or control group, and asked to take an online survey about your assessment knowledge, practices,
and beliefs. Those in the intervention group will also receive access to a four-week assessment
workshop, and those in the control group will only be asked to take the online survey. All
participants will also take the same online survey at the end of the study. Your participation is
voluntary, and your responses are confidential. When you click on the screening survey link
below, you will learn more about the study to help you decide whether to participate.
If you have any questions about the study, please contact me via email at
[email protected]. A summary of major research findings will be made available to
interested participants upon request. This research has been reviewed and approved by an IRB.
You may talk to them at (303) 735-3702, or [email protected].
This screening survey will remain open to you for just under three weeks, closing on March 30,
2020. To begin the survey, click on this link or copy and paste the URL into your internet
browser: https://cuboulder.qualtrics.com/jfe/form/SV_0OjuK7q3vBqPPed
Warm Regards,
Jocelyn W. Armes
PhD Candidate in Music Education
University of Colorado Boulder
Appendix J
Trigger Email Correspondence
Intervention Group Pretest Trigger Email
Greetings Music Educators:
Thank you for enrolling in this study about music teacher assessment literacy, beliefs, and
practices. You have been assigned to the intervention group of this study, and will receive the
online professional development from March 23 - April 19, 2020.
In order to enroll in the course, you will need a Gmail account. Some school districts use
Google products for teacher productivity; for the purposes of this study, it would be simpler to
use a personal Gmail account. If you do not have one, please go to www.gmail.com and create a
Google account to enroll in this course. Instructions for creating a Gmail account can be found by
following this link: https://edu.gcfglobal.org/en/gmail/setting-up-a-gmail-account/1/.
After you are logged into your Gmail account, you can enroll in the professional development
course using the following steps:
1. In the same window, type classroom.google.com
2. At the top, click + (Add) and then Join class.
3. Enter the class code: rojrvgn
On the homescreen of the course, you will find further instructions for how to navigate through
our classroom. The professional development course will last for four weeks. The course
calendar is as follows:
Week I: Choosing Assessment Methods Appropriate for Instructional Decisions (March 23 – March 29)
Week II: Developing Assessment Methods Appropriate for Instructional Decisions (March 30 – April 5)
Week III: Administering, Scoring, and Interpreting Assessments (April 6 – April 12)
Week IV: Using Assessment Results for Educational Decision Making (April 13 – April 19)
After completing the course, you will be sent a link to another questionnaire like the one you
recently completed. Thank you, again, for your participation in this study. If you have any
further questions, please contact me via email at [email protected].
Warm regards,
Jocelyn W. Armes
PhD Candidate in Music Education
University of Colorado Boulder
Appendix K
CALI Item Distractor Analysis
Appendix L
Intervention Participant Feedback
Intervention participants responded to five questions:
Q1. Was the online professional development relevant to you as a music teacher?
Q2. Was the online professional development course appropriately challenging or too difficult?
Q3. What did you like about the online professional development course?
Q4. What would you have changed about the online professional development course to make it more enjoyable or useful?
Q5. Would you recommend this online professional development course to other music educators?

Participant 1
Q1: Yes, it was very relevant for me as a jh and hs choir teacher, and I will be referring back to the chapters as I need.
Q2: I thought overall it was appropriately challenging. There was only one section that I got sort of lost, but it wasn't overly difficult.
Q3: I thought there was a lot of great content in the primary book we were reading and discussing from. I always love to learn of a new to me author or researcher. I also really enjoyed the discussion with other music teachers, hearing some of our similarities and differences, in thinking, practices, and circumstances.
Q4: Overall I wouldn't change anything. I personally struggle with a lot of academic reading and also read very slowly, so that articles and chapters could sometimes be a lot to get through. Having audio chapters or video format information may have been helpful at times, but I know how to work around my reading issues for myself, so it wasn't an absolute need.
Q5: I would, but I am not sure how well received it would be by every music teacher. I think it approaches some negative thinking and stereotypes that I hear and see from other music teachers in a way that is very clear and shows alternative approaches, but it may be too engrained in some people to have a positive outtake from this pd.

Participant 2
Q1: Yes it was.
Q2: I felt it was appropriately challenging.
Q3: It forced me to take a look at my own assessment practices (or lack thereof) and decide how to use more assessment in my classroom.
Q4: I'm not sure if I would have changed anything.
Q5: Yes I would.

Participant 3
Q1: Yes.
Q2: It was not too difficult, but perhaps had a little too many texts involved. Some seemed more relevant than others.
Q3: Reading the texts, especially the Shaw book and Russell article.
Q4: It seemed to be designed like a college course, which would be fine for college students. However, if the target audience is working teachers, I'd recommend cutting back on some of the reading and activities and streamlining it a bit. It was a lot you were asking people to do.
Q5: I would highly recommend some of the texts shared, but the development course itself I would probably not recommend.

Participant 4
Q1: I was able to complete the PD course. What I experienced was relevant to my position as a general music teacher.
Q2: To easy.
Q3: I appreciated the design of the course. The readings were informative without begin too complex. The assignments seemed focused on real world teaching problems.
Q4: I would have appreciated the course being offered over a longer period of time. 8 or more weeks would have allowed me the time to participate fully in the course.
Q5: I would recommend this course. Assessment is a critical element to music teaching which is frequently misunderstood. Well designed PD in assessment is badly needed.

Participant 5
Q1: Yes. I was only able to complete some activities because my schedule became overloaded with distance learning preparations and implementation. What I participated in was valuable. I wish circumstances had been different so I could have benefited from the training in its entirety.
Q2: Appropriately challenging for ordinary circumstances. Difficult to manage in what became my current situation.
Q3: It made me think and presented valuable information on ways assessment can be a valuable tool in the music classroom beyond jumping through district hoops.
Q4: No covid-19... but you had zero control of that.
Q5: Yes.

Participant 6
Q1: Yes.
Q2: Appropriately challenging.
Q3: The accessibility of an online course was appealing.
Q4: Nothing, I found it quite useful.
Q5: Yes.

Participant 7
Q1: Yes.
Q2: I felt it was appropriately challenging.
Q3: It used relevant material. I plan on buying at least one of the books that some chapters of the readings were taken from so I can read it in full and mark it up.
Q4: The "UGH!" comments participants kept leaving. It made them sound like my jr high students.
Q5: Yes.

Participant 8
Q1: Yes.
Q2: I think it was appropriately challenging. However, I did not have enough time to fully participate in the course.
Q3: The reading was insightful.
Q4: Other than it taking place during a global pandemic where I've been engaged in other PD on smartmusic, zoom and other platforms to serve my students, nothing.
Q5: I think a lot of teachers in my district would benefit from this development.
Appendix M
Teacher-Constructed Task Exemplar
Appendix N
Descriptive Data for CALI, MTAII, and MTABI by Assigned Group
CALI

                 Control Group (n = 25)            Intervention Group (n = 18)
                 Pretest        Posttest           Pretest        Posttest
                 Mean    SD     Mean    SD         Mean    SD     Mean    SD
Standard 1       3.60    0.96   3.88    0.73       3.56    0.86   4.06    0.64
  1.1            0.96    0.20   1.00    0.00       1.00    0.00   1.00    0.00
  1.2            0.68    0.48   0.76    0.44       0.44    0.51   0.78    0.43
  1.3            0.88    0.33   1.00    0.00       1.00    0.00   1.00    0.00
  1.4            0.80    0.41   0.84    0.37       0.89    0.32   0.94    0.24
  1.5            0.28    0.46   0.28    0.46       0.22    0.43   0.33    0.49
Standard 2       3.20    0.96   3.28    0.84       3.00    0.91   3.22    0.81
  2.1            0.88    0.33   0.96    0.20       0.89    0.32   0.89    0.32
  2.2            0.24    0.44   0.20    0.41       0.11    0.32   0.22    0.43
  2.3            0.68    0.48   0.60    0.50       0.56    0.51   0.72    0.46
  2.4            0.96    0.20   1.00    0.00       1.00    0.00   1.00    0.00
  2.5            0.44    0.51   0.52    0.51       0.44    0.51   0.39    0.50
Standard 3       3.80    0.96   3.80    0.82       3.67    0.91   4.11    0.83
  3.1            0.88    0.33   0.92    0.28       0.72    0.46   0.83    0.38
  3.2            0.96    0.20   0.88    0.33       0.89    0.32   1.00    0.00
  3.3            0.48    0.51   0.44    0.51       0.61    0.50   0.67    0.49
  3.4            0.52    0.51   0.56    0.51       0.50    0.51   0.67    0.49
  3.5            0.96    0.20   1.00    0.00       0.94    0.24   0.94    0.24
Standard 4       2.24    0.58   2.04    0.93       2.11    0.58   2.28    1.32
  4.1            0.52    0.51   0.48    0.51       0.28    0.46   0.50    0.51
  4.2            0.84    0.37   0.72    0.46       0.89    0.32   0.67    0.49
  4.3            0.40    0.50   0.36    0.49       0.44    0.51   0.61    0.50
  4.4            0.32    0.48   0.32    0.48       0.50    0.51   0.44    0.51
  4.5            0.16    0.37   0.16    0.37       0.00    0.00   0.06    0.24
Total Score      12.84   2.54   13.00   2.02       12.33   1.24   13.67   1.75
MTAII

                            Control Group (n = 25)            Intervention Group (n = 18)
                            Pretest        Posttest           Pretest        Posttest
                            Mean    SD     Mean    SD         Mean    SD     Mean    SD
Practices
  Written Tests & Quizzes   1.76    0.72   2.08    0.64       1.72    0.67   1.72    0.67
  Written Classwork         1.88    0.83   2.56    1.04       2.06    0.94   2.67    0.91
  Group Performances        3.68    1.22   3.08    1.61       3.89    1.23   2.61    1.50
  Individual Performances   3.16    1.18   2.92    1.04       2.78    1.00   2.61    1.20
  Projects                  2.20    1.16   1.96    0.84       2.11    1.32   2.22    0.88
  Portfolios                1.32    0.56   1.44    1.12       1.17    0.38   1.50    0.79
  Attendance                2.68    1.84   2.12    1.62       3.28    1.93   2.89    1.84
  Participation             3.76    1.62   3.56    1.50       4.44    1.34   4.00    1.33
Purposes
  Summative                 2.08    0.82   2.12    0.73       2.33    0.84   2.00    0.77
  Formative                 3.76    1.17   3.60    1.26       3.78    1.48   4.06    1.21
  Diagnostic                3.16    1.52   2.84    1.31       3.22    1.56   3.28    1.36
  Placement                 1.40    0.87   1.44    0.77       1.67    1.03   1.56    0.71
  Extramusical              2.56    1.53   2.12    1.20       2.22    1.40   2.06    1.06
*Frequency scale from 1-6 (Never to Always)
MTABI

                                                                   Control Group (n = 25)      Intervention Group (n = 18)
                                                                   Pretest      Posttest       Pretest      Posttest
                                                                   Mean  SD     Mean  SD       Mean  SD     Mean  SD
Assessment is an important music teacher responsibility            5.16  0.75   5.20  0.76     5.05  0.98   5.16  0.75
Assessment and instruction can be seamlessly integrated            4.96  0.94   5.20  0.91     4.86  0.99   5.12  0.91
Assessment helps music teachers to be effective                    5.08  0.81   5.00  0.91     5.00  0.87   5.00  0.85
Assessment has little impact on music teaching⍑                    4.80  1.26   4.76  0.97     4.86  1.21   5.00  0.87
Assessment forces music teachers to contradict their beliefs⍑      4.20  1.47   4.80  0.91     4.30  1.32   4.81  0.98
Assessment consistently provides useful information                4.84  0.85   4.72  0.79     4.72  0.93   4.70  0.74
Assessment results are rightfully ignored by most music teachers⍑  4.60  0.91   4.60  1.08     4.56  1.03   4.67  1.06
Assessment reduces music teacher creativity⍑                       4.48  1.30   4.56  1.04     4.47  1.33   4.65  1.02
Assessment causes music teachers to be conformists⍑                4.04  1.20   4.44  1.04     4.19  1.26   4.56  1.05
Assessment results are of great use to music teachers              4.04  1.54   4.40  1.26     4.30  1.41   4.51  1.06
Assessment interferes with teaching⍑                               4.04  1.31   4.20  1.16     3.98  1.32   4.47  1.10
Assessment results are dependable                                  4.32  0.63   4.48  0.82     4.09  0.81   4.37  0.79
Assessment results are trustworthy                                 4.24  0.83   4.36  0.86     4.12  0.79   4.35  0.78
Assessment helps music teachers treat their students fairly        4.08  1.58   4.16  1.38     4.02  1.32   4.26  1.22
Assessment results are often inaccurate⍑                           4.28  1.10   4.08  0.95     4.09  1.13   4.05  0.87
Assessment typically provides precise information                  3.80  1.08   3.96  1.06     3.74  0.98   3.95  0.95
Assessment results are prone to error⍑                             4.00  1.12   3.96  1.06     3.70  1.06   3.86  0.94
Total                                                              74.96 10.78  76.88 9.57     74.05 11.59  77.49 9.30
*Agreement scale from 1-6.
** Bolded posttest means indicate an increase from pretest means.
⍑ Negatively phrased items that were reverse coded after data collection.