Theory underlying a national teacher evaluation program


Sandy Taut *, Verónica Santelices, Carolina Araya, Jorge Manzi

Pontificia Universidad Católica de Chile, Centro de Medición MIDE UC, Santiago, Chile

Evaluation and Program Planning 33 (2010) 477–486

ARTICLE INFO

Article history:
Received 11 May 2009
Received in revised form 2 November 2009
Accepted 17 January 2010

Keywords:
Teacher evaluation
Performance assessment system
Merit pay
Professional development
Program theory

ABSTRACT

The paper describes a study conducted to explicate the multiple theories underlying Chile’s national teacher evaluation program. These theories will serve as the basis for evaluating the intended consequences of this evaluation system, while not losing sight of emerging unintended consequences. We first analyzed legal and policy documents and then interviewed fourteen representatives of the four stakeholder groups involved in the program’s design and implementation, in order to gain insight into their respective conceptions of the program’s functioning and intended effects. The results show that, as was to be expected and despite the long and difficult negotiation process that preceded implementation of this program, multiple political stakeholders still view the program’s intended effects differently. However, there was substantial overlap regarding a number of intended effects, such as building the capacity of, and triggering change in, teachers with shortcomings, informing the selection of new teachers, and facilitating the exit of unsatisfactory teachers from the system. It was difficult to get interviewees to talk about how exactly these intended effects are supposed to be achieved. The paper draws conclusions regarding the theory elaboration process involving multiple stakeholders in a highly political context.

© 2010 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel.: +56 2 354 5302; fax: +56 2 552 2563.

doi:10.1016/j.evalprogplan.2010.01.002

1. Introduction

Educational evaluation, assessment and accountability systems are considered increasingly important in many countries in an effort to increase educational achievement, quality and equality (Greaney & Kellaghan, 2008). Teachers are attributed a key role in these efforts, and there is an ongoing international debate on how best to attract, develop and retain effective teachers (see OECD, 2005; Barber & Mourshed, 2007). The Chilean national teacher evaluation system (NTES) has received international attention for its explicit, standards-based nature, technical sophistication, formative purpose and high-stakes consequences, including merit pay (OECD, 2004, 2005; Manzi, 2009).

In the context of a larger study of the NTES’ intended and unintended consequences, this paper describes this evaluation system’s underlying theory, which is supposed to delineate the assumed causal connections between program components or activities on the one hand, and expected program effects or consequences on the other hand. Our role was to help program designers and implementers formulate their underlying program theories regarding the NTES, in reaction to what was portrayed in policy and legal documents. We explored this perceived underlying theory of the NTES by focusing on its intended effects at different levels of the educational system (local, school, individual), particularly highlighting two main components of the NTES: an incentives program for high-achieving teachers (Asignación Variable de Desempeño Individual, or AVDI), and a mandatory professional development program for low-achieving teachers (Planes de Superación Profesional, or PSP). Supposed connections between program activities and expected outcomes are known in the program evaluation literature as the program theory, or theory of change, behind a program or policy (see Weiss, 1972; Bickman, 1987; Patton, 2008).

Since the NTES resulted from a long and difficult negotiation process between three social groups, the study of this program’s theory required exploring the conceptions held by three stakeholder groups: (1) the Chilean Education Ministry, represented by professionals from the Centro de Perfeccionamiento, Experimentación e Investigación Educacional [Center for In-Service Teacher Training, Experimentation and Educational Research] (CPEIP); (2) the Colegio de Profesores [Chile’s Teacher Union]; and (3) the Asociación de Municipalidades [Association of Local Authorities, or Municipalities Association]. In addition, since the program’s inception, professionals from the Centro de Medición [Measurement Center] MIDE UC of the Pontificia Universidad Católica de Chile [Catholic University of Chile] have been in charge of implementing it. Since their views have also had a direct impact on the specific methodology and processes used to carry out the NTES, they have been included in this study as a fourth stakeholder group (4). We examined the program theory of these four stakeholder groups through interviews with a number of their current and past representatives in leadership positions.

The work we report on in this paper is part of a larger study on the consequences (or consequential validity) of the national teacher evaluation system. In this larger research effort, explicating the program theory was only the first step. The second step is to explore the program theory empirically, particularly its intended and unintended consequences, through data collected from evaluated teachers, school directors, and local authorities. This paper deals only with the first part, bringing the program’s underlying theory to light; the second part will be reported elsewhere.

First, the paper provides a brief review of the literature related to teacher accountability in the context of the introduction of the NTES in Chile, as well as a review of the program theory literature relevant to our study. The paper then states the research question, describes the methodological details, and communicates the findings of the study. Finally, we offer a discussion of these findings and reflect on the conclusions and lessons learnt.

2. Literature review

2.1. Teacher accountability in the context of introducing the NTES in Chile

A discussion of teacher accountability in Chile must start by briefly outlining the history of the teaching profession in Chile. While the status of teachers had improved during the democratic governments of the 1960s, this changed dramatically during the military government (1973–1990). In the military government’s first five years, the Teacher Union was dismantled and a considerable number of teachers judged to be political opponents were forced to leave their jobs, forced into exile, or worse, imprisoned or made to disappear. In the course of the decentralization and privatization of education at the beginning of the 1980s, municipalities were given the right to appoint and dismiss teachers, while teacher salaries were not adjusted in line with those of other public sector professionals. In addition, teacher education was demoted to non-university status (Avalos, 2004).

The new democratic government that took power in 1990 gave first priority to improving the working conditions and salaries of teachers. Among its first actions was passing a new law regulating teachers’ contractual conditions, known as the Estatuto Docente. This law, in place to this day (although with various modifications), allowed municipalities to hire teachers following due process but not to fire tenured ones. In general, the law strengthened teaching as a profession by granting the right to appropriate initial and in-service education, adequate working conditions and improved salaries. Between 1991 and 2001, teacher salaries improved by 141%, more than salary increases in the economy at large or in the public sector (OECD, 2004). The law also contained a clause regarding teacher evaluation as a way for municipalities to remove seriously underperforming teachers. The Teacher Union opposed this clause, and in practice it was rarely, if ever, enforced (Avalos, 2004).

From the mid-1990s there was growing concern about the insufficient student achievement results of Chile’s public education sector, as shown both by the results of its national student testing system (SIMCE) and by international comparative assessments. Different parts of society, especially the conservative opposition and the municipalities, criticized the government for not acting more forcefully to improve teacher performance in this context, for example, by enforcing the teacher evaluation clause in the Estatuto Docente. The Teacher Union decided to tackle the issue and entered negotiations, initially only with the Ministry of Education, to design an acceptable system. In 2000, the Municipalities Association joined the discussions. The negotiation process involved an international seminar on teacher evaluation presenting experiences in Cuba, the U.K. and the USA, as well as visits abroad for a group of involved stakeholders. This led to a better understanding of the complexities of installing a teacher evaluation system that would satisfy different stakeholder expectations. An important step in reaching an agreement was the publication of a set of standards describing competent teaching, based on Danielson’s Framework for Teaching (1996). Finally, a draft law was sent out for consultation and was approved by nearly 80% of the teachers who responded to the consultation (N = 65,846). The evaluation law was passed by Parliament on August 14, 2004, some seven years after negotiations had begun (see OECD, 2004; Avalos & Assael, 2006; Assael & Pavez, 2008).

The consensus reflects compromises from all stakeholder groups. The system is not purely formative, as the Teacher Union had wanted: it contains the option to remove unsatisfactory teachers from their positions. But the system does not connect teacher performance to student learning, something that conservative groups had urged and the Teacher Union strongly opposed. The system grants incentives to high-performing teachers but does not connect these incentives to a professional career ladder and other existing incentive schemes, as the union had demanded but the Ministry resisted. The agreement reflects a compromise between two perspectives on teacher performance that co-exist not only in Chile but also at the international level: on the one hand, there are those arguing for accountability and control, as well as incentives, as a way to bring about improvements in teacher performance; on the other hand, there are those who call for trust in the professional identity of teachers as public servants and for support of their motivation to improve themselves (see OECD, 2005). In the view of Cox (2008), in the case of Chile this tension has so far been resolved more in favor of pressure or control than in favor of support or trust.1

2.2. Program theory

The concept of program theory has been present in the evaluation literature since the 1970s (Weiss, 1972), but it was in the 1980s that Chen and Rossi (1983, 1987) formally introduced the notion of program theory evaluation. Although there is an ongoing academic debate around the definition of program theory and program theory evaluation (Davidson, 2005), a common definition is the one provided by Bickman (1987), who defines program theory as “the construction of a plausible and sensible model of how a program is supposed to work.”

Theory-driven or theory-based evaluation emerged as a response to evaluation approaches that overemphasized the role of methods, and it aims to provide information that can be used to improve program design and implementation (Chen, 1990). It advocates the use of multiple methods and highlights the need for a theory that guides the work of evaluators by helping them prioritize evaluation questions and methodological approaches. That theory can come from formal social science knowledge and from practitioners’ experience (Chen, 1990; Donaldson, 2007). According to Rossi, Freeman, and Lipsey (1999), the aim of reconstructing program theory is to describe the “program as intended” and its rationale, in particular, how the program is supposed to bring about the intended outcomes.

1 It is interesting to note that in a recent high-level commission report on education quality, commissioned by the Chilean President, these two positions come out clearly with regard to the discussion of the Estatuto Docente and the related performance evaluation. They are presented in the report as “Position A” and “Position B”, existing in parallel and seemingly irreconcilable (Consejo Asesor Presidencial para la Calidad de la Educación, 2006).


Reconstructing program theory is necessary because it is frequently implicit or unsystematic. As Weiss (1973) says:

“. . . the goal of social programs are often global, diffuse, diverse and inconsistent, vary over stakeholders and may have little to do with program functioning. One reason for this sorry state is that it often requires coalition support to secure adoption of a program. Holders of diverse values and different interests have to be won over, and therefore a host of realistic and unrealistic promises are made in the guise of program goals.” (pp. 180–181)

The relationship between evaluation and politics identified by Weiss is especially important to the work presented in this paper. In Weiss’ view, stakeholders differ in their assumptions and tacit understandings, and therefore a program may have more than one underlying theory. All these theories need to be brought to light in order to reach a consensus on which deserve to be tested in an evaluation (Stame, 2004). Leeuw (2003) suggests that rebuilding program theory becomes more important the larger the assumed impact of a program, the more money involved in the program, and the larger the risks involved. Risks refer to both expected and unexpected consequences.

Establishing the underlying program theory requires evaluators to familiarize themselves with the program’s goals and implementation. According to Donaldson (2007) and Rossi et al. (1999), program theory must be developed in an iterative process that involves evaluators on the one side and program staff and stakeholders on the other. Rossi et al. (1999, p. 164) call program staff and stakeholders “the most important sources of information” in this context. The theory articulation process can be completed during the program design stage (which would be the ideal case), but in reality it often happens later in the program cycle, when the need for an evaluation arises. Ideally, a first draft would be developed based on program material and written descriptions of the program. This first draft would be reviewed and discussed with stakeholders, and its plausibility checked and agreed on. Then the evaluators would look more carefully at the connections between the different program elements and present a revised program theory to stakeholders in order to get their final approval. Such interaction with program staff and stakeholders brings the additional benefit of increased evaluation use, especially process use (Patton, 2008). In summary, methods to be used in the process of reconstructing program theory include the review of program documents, interviews and meetings with stakeholders and program staff, observation of program activities, and review of social science research (Donaldson, 2007; Ehren, Leeuw, & Scheerens, 2005; Leeuw, 2003; Rossi et al., 1999).

Some authors differentiate two components of a program theory: (i) one that refers to the implementation of processes aimed at providing the intended services to the target population and (ii) another that refers to the hypothesized causal links and impact of the intervention. Rossi et al. (1999) and Donaldson (2007), for example, refer to these two components as the program process theory and the program impact theory, respectively. More specifically, Rossi et al. (1999, pp. 167–168) define program impact theory as “the goals and objectives that describe the outcome of the change process the program aims to bring about in social conditions” as well as “any intermediate objectives that represent steps along the pathway leading from program services, on one end, to improved social conditions that are the program’s ultimate goal, at the other.”

A useful tool often mentioned for helping to reconstruct a program theory is the logic model (Frechtling, 2007). A logic model characterizes an intervention by defining its components and making the connections between them explicit. The following are the four basic components: (a) inputs: resources brought to a project; (b) activities: actions undertaken by the project; (c) outputs: immediate results of the actions, such as events, participation, or products; and (d) outcomes: desired accomplishments or change (Frechtling, 2007; McLaughlin & Jordan, 1999). What turns a logic model into a program theory is adding detailed change mechanisms that explain how and why the program will achieve the desired outcomes (Patton, 2008, p. 336).
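The distinction between a logic model and a program theory can be sketched as a small data structure. The Python below is a hypothetical illustration, not a tool used in this study: the class names `LogicModel` and `Link`, and the rule that every link must name a mechanism before the model counts as a program theory, are assumptions chosen to mirror Patton’s point that explicit change mechanisms are what turn a logic model into a program theory.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Link:
    """A hypothesized connection between two elements of the model."""
    source: str
    target: str
    # How and why `source` is expected to produce `target`;
    # None means the arrow is drawn but not explained (a bare logic model).
    mechanism: Optional[str] = None


@dataclass
class LogicModel:
    """The four basic logic model components plus the links between them."""
    inputs: List[str]      # resources brought to the project
    activities: List[str]  # actions undertaken by the project
    outputs: List[str]     # immediate results: events, participation, products
    outcomes: List[str]    # desired accomplishments or change
    links: List[Link] = field(default_factory=list)

    def is_program_theory(self) -> bool:
        """Treat the model as a program theory only if it has links and
        every link states an explicit change mechanism."""
        return bool(self.links) and all(link.mechanism for link in self.links)
```

In this sketch, a model whose arrows connect components without saying how one leads to the next fails the check; it passes only once every arrow carries a stated mechanism.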

Typically, program theories are depicted as charts or graphical displays (Rossi et al., 1999, p. 172). Examples of diagrams illustrating program impact theories, as presented in the literature, usually resemble simple flowcharts with boxes containing the program as input on the left side, mediator variables or intermediate outcomes in the middle, and long-term intended outcomes or impacts on the right side, with arrows connecting them (see Rossi et al., 1999, p. 103). Donaldson (2007, p. 26) describes the “variable oriented approach” as the most widely used approach to pictorially represent program impact theories, including direct effects, mediator relationships, and moderator relationships. Donaldson (2007) promotes the mediator model, which in practice corresponds to input → intermediate outcomes → long-term outcomes models, as shown in examples of such diagrams he replicated from projects he has been involved in (see pp. 62 and 88).

There is an ongoing debate on the importance of the evaluator assessing the quality of a program theory. While some authors (e.g., Rossi, Chen) argue in favor of a normative role for evaluators, others (e.g., Patton) stress the participatory nature of the process and the importance of having stakeholders come to their own conclusions about program feasibility and coherence based on the collaborative dynamic of rebuilding the program’s theory.

Program theory is mentioned as part of guidelines on how to examine the consequential validity of assessment and accountability systems (Forte Fast & Hebbler, with ASR-CAS Joint Study Group on Validity in Accountability Systems, 2004) and as a basis for evaluating social programs in general (Bickman, 1987; Carvalho & White, 2004; Chen, 1990; Donaldson, 2007). There are examples of studies explicating the theories underlying social policies, such as the school inspection policy in the Netherlands (Ehren et al., 2005), and underlying similar teacher assessment systems, for example the accreditation program of the National Board for Professional Teaching Standards in the United States (National Research Council, 2008, p. 31).

3. Research objective

The research objective guiding this study is to make explicit the expected consequences behind the NTES and the mechanisms by which the program would generate these expected consequences. In Rossi et al.’s (1999) and Donaldson’s (2007) terminology, we focused on the NTES’ program impact theory. Specifically, we concentrated on the two main components of the NTES: an incentives program for high-achieving teachers (AVDI), and a mandatory professional development program for low-achieving teachers (PSP).

4. Methodology

The research question stated above was explored using the following methodological approach. First, we reviewed legal and policy documents to develop an initial graphical representation of the program’s underlying theory according to these official published sources. Then, we empirically reconstructed the theory held by each of the four stakeholder groups that participated in the teacher evaluation system’s planning, design, and implementation.


We analyzed these perspectives regarding their differences and commonalities, trying to arrive at a consolidated version of the program’s theory.

In order to reconstruct the stakeholders’ theories, we conducted a total of fourteen interviews, three to four per stakeholder group. Our interviewees held leadership positions in the institutions involved in the design of the NTES. For example, in the Ministry of Education we interviewed the director of the CPEIP, as well as the person directly responsible for coordinating much of the process that conceived and introduced the NTES in Chile. In the Teacher Union, we interviewed the president as well as two technical advisors on NTES matters. We also spoke with the president of the Municipalities Association and his technical staff, some of whom had directly participated in the NTES negotiation process.

We followed a semi-structured protocol that focused on exploring the expected consequences that members of each stakeholder group associated with the NTES. The protocol also investigated the mechanisms by which the program would generate such changes. A diagram of the NTES’ underlying theory, as described in legal and policy papers, was presented to each interviewee in order to facilitate their analysis of the program (see Fig. 1). We discuss this initial portrayal of the program in the following section.

4.1. NTES’ underlying theory as explicated in official documents

Since 2005 the evaluation has been mandatory and serves as the basis for rewarding and sanctioning about 71,000 teachers working in the Chilean public education sector. The evaluation distinguishes between “outstanding”, “competent”, “basic”, and “unsatisfactory” performance. Performance standards guiding the evaluation have been defined, officially endorsed, published and widely disseminated as the “Marco Para la Buena Enseñanza [Guidelines for Good Teaching]” (Ministry of Education, 2004). Evaluation methods include a portfolio comprising a written part and a videotaped lesson, a supervisor questionnaire, a peer interview, and a self-assessment questionnaire. The evaluated teachers receive a descriptive report detailing their results for the different portfolio dimensions, describing strengths and weaknesses, and including their final score. The school principal and the head of the municipal education department also receive reports providing the final performance categories of the teachers evaluated in that school or municipality, and the average results for all those teachers by portfolio dimension. Thus, while a considerable number of key stakeholders receive the individual teacher evaluation results, one can say that the results are not public and are not used for ranking purposes. For a full description of the program, see Manzi, Preiss, Gonzalez, Flotts, and Sun (2008).

Fig. 1. NTES’ theory as portrayed in legal and policy documents.

Fig. 2. NTES’ underlying theory as discussed by CPEIP (Ministry of Education).

Fig. 3. NTES’ underlying theory as discussed by Teacher Union.

Fig. 4. NTES’ underlying theory as discussed by Municipalities Association.

Fig. 5. NTES’ underlying theory as discussed by implementers (MIDE UC).

Legal and policy documents (e.g., Law 19.933, 2004; Law 19.961, 2004; Decreto No. 192, 2004; Ministry of Education, n.d.-a; Ministry of Education, n.d.-b) describe the National Teacher Evaluation System as a program that combines monetary incentives motivating teachers to perform well with formative feedback and institutionalized support opportunities for those in need of improvement. As the laws and regulations reflect, the result of the evaluation has high-stakes consequences for individual teachers: high-performing teachers are eligible for a salary increase, low-performing teachers are subject to mandatory professional development, and teachers repeatedly evaluated as “unsatisfactory” face loss of employment.

The incentives program (AVDI) is accessible to teachers evaluated as “outstanding” or “competent” and involves passing a subject and pedagogical knowledge test following the teacher evaluation process. The results of this test determine the percentage salary increase (0%, 5%, 15% or 25%) the teacher receives until it is time for his or her re-evaluation (after 4 years).

Teachers evaluated as “basic” or “unsatisfactory” are required to attend professional development courses (PSP), which are supposed to provide them with an opportunity to learn and improve their practices. The PSP are expected to be more effective than traditional development courses because their designers would take into consideration the performance weaknesses identified through the evaluation process. Local managers of school systems would also be able to use the information provided by the NTES reports in hiring and firing decisions.
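The formal consequences that the legal documents attach to each performance category amount to a simple decision rule. The Python below encodes that rule as an illustration only; the names `Category`, `Consequences` and `consequences` are hypothetical, and the `repeated_unsatisfactory` flag is an assumption supplied by the caller, because the text says only that teachers “repeatedly” rated unsatisfactory lose their jobs, without stating how many ratings that takes. How knowledge-test results map onto the 0%, 5%, 15% or 25% AVDI increase is likewise not described, so it is not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The four NTES performance categories."""
    OUTSTANDING = "outstanding"
    COMPETENT = "competent"
    BASIC = "basic"
    UNSATISFACTORY = "unsatisfactory"


@dataclass(frozen=True)
class Consequences:
    avdi_eligible: bool  # may sit the AVDI knowledge test (0%, 5%, 15% or 25% increase)
    psp_required: bool   # must attend Planes de Superación Profesional courses
    dismissal: bool      # loss of employment


def consequences(category: Category, repeated_unsatisfactory: bool = False) -> Consequences:
    """Map one evaluation result to the formal consequences listed in the text.

    `repeated_unsatisfactory` is a hypothetical, caller-supplied flag: the text
    says only "repeatedly" and does not state how many unsatisfactory ratings
    lead to dismissal.
    """
    high = category in (Category.OUTSTANDING, Category.COMPETENT)
    dismissal = category is Category.UNSATISFACTORY and repeated_unsatisfactory
    return Consequences(avdi_eligible=high, psp_required=not high, dismissal=dismissal)
```

Keeping dismissal behind an explicit flag makes visible that this consequence depends on a teacher’s evaluation history rather than on a single result.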

In the long run, the NTES is expected to help professionalize the teaching career, improve student learning, and link teachers’ salaries to their actual performance. Fig. 1 shows the full program schematically as described in the legal and policy documents we consulted.

4.2. Data collection

The representatives from each of the stakeholder groups were contacted in order to schedule a personal interview. In addition to the names originally considered by the research group, each interviewee was asked to provide the name and position of additional informants they considered key for the study. The goal was to conduct a minimum of three interviews per stakeholder group. Each interview was approximately one hour long and was conducted by a team of two researchers, who alternated in the roles of moderator and observer, the observer also taking notes. The interviews were tape-recorded for later transcription and analysis.

4.3. Data analysis

Each interview transcript was analyzed in a data analysis matrix especially designed for this purpose. The matrix dimensions reflected the study’s goals and interview questions. After coding the contents of each interview using our matrix, we summarized each matrix in the form of a diagram. The diagrams served as a one-page graphical display of each interviewee’s personal vision of the consequences and mechanisms underlying the NTES. Once the diagrams were completed for all interviewees from one stakeholder group, a general diagram for the group was developed. The members of the research group worked to integrate the different visions within one stakeholder group by building on the considerable overlap that existed between them. This also meant that we did not include in this consolidated version those statements that were not shared among the interviewees of one group. The last step was to look for commonalities and differences in the views of the stakeholder groups by creating summary tables, which are presented in the Discussion section.

5. Results

This section presents the program theories of each stakeholder group. We developed diagrams, which are presented in the following order: the Ministry of Education’s Center for In-Service Teacher Training, Experimentation and Educational Research (CPEIP), the Teacher Union, the Municipalities Association, and the implementing institution.

How to read the program theory diagrams

The program theory diagrams have a common structure for all stakeholder groups. The upper part of each diagram shows the way interviewees expected the NTES to generate change at the educational system level. The first box represents the NTES as the origin of any theory regarding this program. The second box was mentioned by three of the four groups and describes a key characteristic of the NTES: it is a teacher evaluation system based on professional standards. The central box shows the initial expected consequences stated by the members of the different stakeholder groups. The last two boxes show the expected longer-term effects of the NTES. The lower part of the diagram represents the way interviewees expected the NTES to generate change at the individual (teacher) level. The broad arrows connect both parts of the diagram, indicating that the sum of the individual changes should relate to at least some of the system-level effects.

Please keep in mind that each diagram summarizes the interviewees’ original intentions regarding the NTES’ functioning, not necessarily their opinion about the program’s actual or observed functioning after implementation began. In addition, the diagrams aim to show the institutional position of each of the four stakeholder groups. That is, each diagram represents an integrated version of what the three or four interviewees from one particular stakeholder group had to say, with some individual points of view deemphasized in this summary.

5.1. Program theory of the Ministry of Education (CPEIP)

Representatives from the CPEIP expressed the expectation that the NTES would advance education quality and student achievement through the improvement of teacher competencies and teacher performance. The NTES would achieve these long-term goals by explicitly evaluating performance against the professional teaching standards stipulated in the Marco para la Buena Enseñanza (MBE), or “Framework for Good Teaching”, a document developed in collaboration with representatives of the teaching profession and international experts. The standards-based nature of the NTES is novel to teachers, since the previous evaluation system focused exclusively on formal and administrative aspects of their professional performance.

To CPEIP’s representatives, one of the NTES’ most important features is its formative nature, that is, the opportunities and support it provides to teachers so that they can improve their classroom practices. These opportunities are crystallized in the NTES’ professional development program (PSP). This program, designed and implemented at the local level by municipal education authorities, would be based on actually diagnosed local needs and would provide teachers with the tools to improve their practices given the reality of each municipality.

The interviewees expected that the NTES would contribute to its long-term goals by providing diagnostic information about the strengths and weaknesses of teachers currently working in schools. This information and the associated resources would help make municipal educational decision-making more efficient and effective, especially decisions related to teacher professional development. Another use of the diagnostic would be for the best teachers to help other teachers, as well as the students in greatest need within a municipality. Furthermore, it is hoped that NTES results will increasingly be used for hiring decisions, as well as for not renewing the contracts of consistently under-performing teachers.

At the same time, teachers who are rated competent or outstanding would receive monetary and non-monetary incentives. For example, after passing a subject knowledge test they would obtain a salary increase (AVDI). These teachers would also be able to apply to join a network of “teachers teaching teachers”, which is seen as a source of social recognition. Over time, CPEIP’s representatives expect that these mechanisms would help keep good teachers in the classroom and would encourage the sharing of good practices and experiences.

Along these lines, CPEIP’s representatives expected the NTES to improve teaching practices by fostering a dialogue between peers focused on overcoming weaknesses and maintaining good practices. One of the interviewees said that peer collaboration means “a different way of doing things . . . it is about looking at ourselves permanently and finding out what our weaknesses are.”

At the individual level, CPEIP’s representatives expected the external feedback generated by the NTES to encourage teachers to perform regular, critical self-observation. Teachers are supposed to engage in reflection based on the information provided in the individual evaluation reports. This reflection would make them not only aware of their weaknesses but also responsible for them, and would result in a commitment to change weak practices. In addition, just by undergoing the evaluation process, teachers would be encouraged to observe the effectiveness of their practices in the classroom and to assume their professional responsibility for fostering student learning. Finally, the NTES’ external feedback would also be a source of recognition of good performance. Fig. 2 graphically depicts CPEIP’s program theory.

5.2. Teacher Union’s program theory

For the representatives of the Teacher Union, the professional teaching standards are the most central feature of the NTES in terms of bringing about the expected consequences. In their view, the evaluation should only assess compliance with the professional standards and should be disconnected from student performance. The interviewees from the Teacher Union expect the NTES to improve teachers’ practices in the short and medium term in four important ways.

First of all, the NTES is seen as a response to the Teacher Union’s search for a cultural change in which teachers are given a more prominent social position and are held in higher esteem because they are perceived as professionals bound by professional standards.

Secondly, the NTES is perceived as an opportunity for all teachers to improve their practices, regardless of their NTES category. The Teacher Union representatives firmly believe in the formative character of the program and are committed to the continuing professional development of all teachers, whether they are classified as unsatisfactory, basic, competent or outstanding. The NTES would provide opportunities for teachers to improve their practices beyond the official professional development program (PSP).

Along these lines, the representatives from the Teacher Union voiced the expectation that the NTES would generate improvement through peer collaboration. The system would stimulate the sharing of particularly successful practices and would allow all teachers, not only low-performing ones, to develop professionally through peer dialogue and interaction.

Finally, the NTES was also expected to help develop a professional career in which teachers’ experience and performance would be associated with monetary compensation and promotion opportunities. The Teacher Union considers the NTES a fundamental input for improving teachers’ job prospects through compensation and promotion decisions that are based on teachers’ classroom practices.

At the individual level, the representatives from the Teacher Union believed the NTES would prompt self-observation throughout the entire evaluation process. This self-regulation would begin the moment teachers become conscious of their practices while producing the products required by the evaluation (portfolio, video-taped lesson, self-evaluation questionnaire). Subsequently, teachers would receive external feedback on their practices based on the results of the evaluation, elaborating on their strengths and weaknesses. Together, both processes would motivate teachers’ continuous learning. Fig. 3 graphically displays how the Teacher Union views the NTES’ program theory.

5.3. Program theory of the Municipalities Association

According to the representatives of the Municipalities Association, the NTES would ultimately improve the learning and achievement of students through the improvement of the teaching practices of public school teachers. Unlike the other stakeholder groups, the interviewees did not mention the teaching standards as an important enabling factor.

These long-term consequences would come about because, first of all, the NTES provides a diagnostic of teachers’ practices within a municipality, highlighting their strengths and weaknesses. This information would allow local authorities to make better decisions about the allocation of resources, the design and implementation of teacher professional development, and the planning and implementation of educational plans, and it would provide recognition for especially proficient teachers. The NTES would serve as an external validation of the decisions that the local authority makes regarding the hiring, promotion and removal of teachers. The information the NTES provides about teachers’ current practices is expected to confirm what the local authority already knows about its teachers. Having information from an external source, however, allows the local authority to make human resource decisions about teachers without being viewed as arbitrary or politically motivated. This is especially important when letting go of low-performing teachers.

In this context, the NTES would also serve to moderate the rigid labor conditions that govern teachers in public schools. The interviewees stated that the special legislation governing the labor situation of public school teachers (Estatuto Docente) makes it very hard to remove teachers, even those for whom there might be evidence of a lack of preparation and professionalism. However, none of the interviewees viewed the hiring and removal of teachers as the only tool to improve education quality at the municipal level.

At the individual level, the interviewees from the Municipalities Association expected that the NTES would make teachers undergo two types of processes: (i) the observation of and reflection about their own practices and (ii) taking responsibility for the consequences of their professional actions. These two processes would result in the improvement of weak practices and would serve as positive reinforcement for those teaching practices identified as effective. In Fig. 4 we show the Municipalities Association’s program theory.

5.4. Implementers’ program theory

The Education Ministry contracted out the implementation of the NTES to the Measurement Center MIDE UC. In the opinion of MIDE UC representatives, the NTES was expected to improve teaching practices and student learning in the long run. These long-term outcomes were to be achieved through the first-ever evaluation of public school teachers using explicit professional standards of good teaching. In the opinion of the implementers of the NTES, explicitly stating the expectations for new and current teachers by way of these professional standards would help improve and modernize the teaching practices of public school teachers. In the words of one of the interviewees: “just introducing the standards as part of the evaluation gives them a very high status.”

The interviewees suggested that the NTES would work as an information system with multiple short- and medium-term consequences. First of all, the NTES would make it possible to link social recognition for high-performing teachers to real and concrete achievements that are comparable at the national level. The NTES would identify good teachers and generate incentives associated with good performance in the classroom.

Second, the interviewees thought the NTES would create an opportunity to support and offer professional development to teachers who show weaknesses in their practices. This is done through the Professional Development Plans (PSP). The professional development would enable low-performing teachers to improve and eventually to achieve competent or outstanding performance. The NTES would make it possible to address teachers’ weaknesses using solid information and leveraging local resources.

Regarding the use of NTES information at the local (municipal) level, the implementers mentioned that the NTES would support the capacity of municipalities’ education staff to make informed decisions and thereby improve education quality overall in the municipalities. For example, it would allow them to make decisions about hiring good teachers and allocating them to different municipal schools based on their quality in relation to school needs.

Lastly, the interviewees mentioned the role that the NTES would play in reforming the teacher education system by providing information about the weaknesses of teachers’ practices and the areas that need most support. This information would allow teacher education programs to redesign their curricula and, in the long run, improve education quality.

At the individual level, the implementers thought that the NTES would trigger self-regulation processes through which teachers would adopt professional standards, would be able to make significant progress in the practices identified as weak, and would internally reinforce, and thus maintain, good practices. In their view, the individual self-regulation process is triggered by participating in the evaluation process, not only by receiving the evaluation results. At the same time, the interviewees mentioned that the NTES provides useful external feedback to teachers in the form of individualized evaluation reports. These reports indicate the results of the evaluation and highlight individual strengths and weaknesses in teaching practice. In the eyes of the implementers, this information would benefit all teachers across the board, regardless of their NTES classification. Fig. 5 graphically summarizes the implementers’ program theory.

6. Discussion

Taking as a starting point the theory diagram that reflected the existing legal documents regarding the NTES, the process of reconstructing the underlying theory of the teacher evaluation program resulted in four different diagrams, one for each of the four groups of stakeholders we interviewed: Ministry of Education staff, Teacher Union representatives, representatives of the Municipalities Association, and representatives of the institution entrusted with implementing the evaluation system. The four diagrams contained expected short-term and long-term consequences that were sometimes similar and sometimes distinct.

It is interesting to highlight a few characteristics of the different stakeholder groups’ underlying theories before moving to an integration of these four theories to guide our empirical study of the NTES’ consequences. First of all, it is striking that all groups except the Municipalities Association highlight the importance of the professional teaching standards. To these groups, the standards constitute a key attribute that enables the evaluation system to reach its expected consequences. The standards, mainly because they were elaborated by the Ministry in consultation with the Teacher Union, give legitimacy to the evaluation, and they provide a transparent image of what it means to be a good teacher (Avalos & Assael, 2006). Similarly, it is striking that in terms of the long-term expected effects of the evaluation system, all groups except the Teacher Union mention improved student learning and achievement. All four groups agree that the short-term expected consequences of the evaluation should result in improved teaching practices, but only the Ministry, the implementers and the Municipalities Association see the improvement of student learning as a crucial next step. It is not surprising that the Teacher Union does not mention this final outcome: it has long rejected holding teachers accountable on the basis of student achievement (Avalos, 2004; Assael & Pavez, 2008).

We examined in more detail the wealth of immediate or medium-term expected consequences the interviewees shared with us. Table 1 shows a summary of the four groups’ expected consequences of the NTES at the system level. The table is meant to allow for an analysis of the overlap that exists among the groups.

Table 1

Expected short-term consequences of NTES at system level as mentioned by the four stakeholder groups.

Expected consequences. Stakeholder groups (columns): Implementing institution; Ministry of Education; Teacher Union; Municip. Assoc.
(1) offering social reinforcement of good teaching practice – –
(2) building capacity of teachers with shortcomings –
(3) informing selection of new teachers and exit of unsatisfactory teachers –
(4) informing the reform of teacher education programs – – –
(5) providing a base for peer-to-peer conversations about good practice – –
(6) improving teachers’ job prospects by providing incentives – –
(7) changing to a culture where teachers are professionals – – –
(8) diagnosing quality of teaching practice as basis for management decisions – –

The table shows that some stakeholder groups have more consequences in common than others. For example, the implementing institution and the Ministry of Education share three expected consequences, while the Municipalities Association shares only two consequences with other groups. There is no overlap at all between the expected consequences of the Teacher Union and those mentioned by the Municipalities Association, which is not surprising given the diverging political viewpoints these two entities represent (Avalos, 2004; Avalos & Assael, 2006). Two expected consequences were each mentioned by only one of the stakeholder groups.

When we asked ourselves how the four stakeholder groups’ consolidated program theories would guide our empirical study, it was clear that time and budget constraints made it impossible to examine four different program theories. Therefore, we decided to investigate those expected consequences that are shared by at least two of the four stakeholder groups, in addition to examining all the assumptions communicated in the legislation and official documentation regarding the NTES, as well as our own hypotheses as researchers. In fact, all consequences stated in the policy and legal documents were reiterated and agreed on by at least two stakeholder groups. Thanks to the interviews, we could add two intended effects that were not explicitly stated in the legislation but that proved important and interesting: peer effects and the use of the information provided by the NTES for educational management decisions.

As a next step, the research team expressed the expected consequences, as identified through the document analysis and stakeholder interviews, in terms of more detailed underlying assumptions or change mechanisms. This would help us further focus the data collection planned for the ensuing empirical study of the NTES’ consequences. The detailed underlying assumptions we developed can be found in Appendix A.

So far we have only discussed the theories the interviewees held about the NTES at the system level. We did the same exercise for the individual (i.e., evaluated teacher) level. At this level, all four stakeholder groups agreed that the evaluation process and the feedback of its results should first trigger self-regulatory and self-evaluation mechanisms. These internal mechanisms, in turn, were said to yield the expected consequences at the individual level that are listed in Table 2.

Following the same approach we used for the system level, we will empirically examine those expected consequences at the individual level that were mentioned by more than one stakeholder group, which also include all the consequences stipulated in the policy and legal documents. We again derived specific underlying assumptions that we can then examine in our empirical study (see Appendix A).

7. Conclusions and lessons learnt

Reflecting on the process of developing the program theory, we have various observations to share. First of all, we were constantly reminded that this program took shape as a compromise between three political groups after seven years of negotiation. We were reminded of how Weiss (1973) described the political reality of social program development and adoption, and the resulting multitude of visions regarding a given program’s goals and functioning. Clearly, the difficult negotiation process and fragile compromise described in detail by Avalos and Assael (2006) are reflected in our findings. While individual differences within a group could often be explained by the interviewees’ professional backgrounds (e.g., teacher versus economist versus psychologist), we also observed differences between the stakeholder groups based on their political or institutional affiliations. However, all the expected consequences reflected in the legal documents regarding the NTES were present and shared by at least two groups.

Second, during the interviews we observed that it was sometimes difficult to distinguish between what the interviewee wished the program theory had been (his or her “wishful thinking” about the program, had he or she been able to influence the negotiation process more noticeably) and his or her description of the program theory as it actually emerged from the negotiated compromise, leaving aside anecdotal evidence concerning the “real” consequences of the NTES to date.

Third, during the interviews we strove to obtain details on the mechanisms of change, i.e., exactly how the interviewees thought the expected consequences would be reached (e.g., intervening or moderator variables). In most cases, especially at the system level, we were only able to talk about what the expected consequences were; we were not able to move on to how these were supposed to be reached. However, we were somewhat successful in getting at the individual motivation theory underlying the evaluation system, which the interviewees (who are mostly psychologists or teachers by training) seemed to find easier to think about. The difficulty of getting program designers and implementers to explicate the precise, detailed mechanisms of change, hypotheses, or causal links underlying their programs has already been noted in the evaluation literature (see Weiss, 1973; Patton, 2008; Chen, 1990; Donaldson, 2007). In fact, the examples of program impact theories published in the literature closely resemble our diagrams in that they contain the program as input on the left-hand side, short-term or intermediate expected outcomes in the middle, and long-term consequences or impacts on the right-hand side (see Rossi et al., 1999; Donaldson, 2007). When working with stakeholders, it seems rare that complex input–mediator–moderator–outcome models of a program can be elaborated. Also, such diagrams generally need to be easy to grasp and communicate, which calls for a certain level of simplicity (Donaldson, 2007).

In summary, the diagrams presenting the theories underlying the NTES, although they do not reflect the complexity of all the variables at play, are not unusual for program theories elaborated jointly with stakeholders. Importantly, as a research team we (1) ensured that all effects mentioned in the policy and legal documents were reflected in the consolidated intended-effects tables guiding future empirical research, and (2) went one step further by inferring more detailed underlying assumptions regarding the program’s functioning, similar to the approach chosen by the Committee on Evaluation of Teacher Certification by the National Board for Professional Teaching Standards (National Research Council, 2008) (see Appendix A).²

Table 2

Expected short-term consequences of NTES at individual level as mentioned by the four stakeholder groups.

Expected consequences. Stakeholder groups (columns): Implementing institution; Ministry of Education; Teacher Union; Municip. Assoc.
(1) strengthening adoption of professional standards – – –
(2) triggering change of weak practices –
(3) maintaining good practices by triggering internal reinforcement of diagnosed strengths – –
(4) stimulating teacher learning – – –
(5) making individuals feel accountable for diagnosed weaknesses – – –

We found it useful to have consulted with stakeholders instead of only reviewing existing documents. Not only did this raise additional, valid expected consequences not explicated in the legislation, it will also help make our ensuing empirical research on the intended and unintended consequences of the NTES more relevant to stakeholders. These stakeholder groups in general, and some of the interviewees in particular, will be among the audience of our research findings. They will be the ones in a position to ask for modifications of the NTES so that it more effectively reaches its intended effects and avoids negative, unintended ones in the future.

Regarding the theory elaboration process we chose: while we found the stakeholder interviews beneficial in elaborating on the results of the document analysis and further illuminating the program’s intended consequences, we could have concluded by holding a large-group meeting in which each group’s version would be presented and discussed in terms of differences and commonalities, and comments from each group’s representatives would be solicited. This could have been done with the NTES’ technical advisory board, which is composed of representatives of the four stakeholder groups involved in our study. Such a meeting might have brought additional benefits, both for our own understanding of the program and, as a positive side effect, by triggering new insights for the stakeholders. It is important to remember, however, that we were dealing with a highly visible national educational policy. Besides the feasibility issues in arranging joint meetings with all interviewees, we must remember that this program exists as a remarkable yet fragile compromise following difficult negotiations between three social groups. Publicly resurfacing their different views on the program might also have had detrimental effects on the relative political stability of the existing agreement.

In explicating the NTES’ underlying theory, we became aware that no theory-based or logic modeling approach had been used in its design, at least not explicitly or in documented form. Applying program theory or logic modeling approaches when designing public policies has its benefits and drawbacks. Adopting such an approach makes program design more transparent and facilitates more effective and efficient program planning and implementation later on. On the other hand, transparency also makes existing disagreements more apparent, and it invites public scrutiny and accountability for intended effects. However, evaluations of the effectiveness of social programs can overall benefit from existing program logic models and assumptions, while hopefully not neglecting possible unintended effects. Policy-makers will likely need the facilitation skills and expertise of external consultants to help them develop program theories or logic models. Our experience and the literature agree that this is a difficult and time-consuming task, especially when there are no existing social science theories that can inform the process (Stufflebeam & Shinkfield, 2007). On the other hand, working jointly with evaluators on this task would provide a way to link program design with program evaluation and enhance the evaluability of social programs and policies (Wholey, Hatry & Newcomer, 1994).

Acknowledgments

This study is part of a larger research project supported by the Fondo Nacional de Desarrollo Científico y Tecnológico (Fondecyt, no. 1080135) of the Chilean government. We thank Patricia Thibaut for her competent research assistance. We also thank four anonymous reviewers and the editor for their helpful comments.

Appendix A

Examples of specific assumptions underlying NTES at system level:

(1) Offering social reinforcement of good teaching practice:
- NTES information is in fact used by superiors, peers and the public to provide social reinforcement to good teachers.
- Good teachers perceive the social reinforcement they receive as tied to the NTES.
- Social reinforcement is perceived as attractive and as improving job satisfaction.
- Offering social reinforcement to some teachers has no negative consequences, e.g., on those teachers not rewarded, or on the work climate in general.

(2) Building capacity of teachers with shortcomings:
- NTES informs the professional development planning efforts that are designed for teachers by the municipalities (PSP).
- The professional development courses in fact teach relevant knowledge and skills, tackling those areas where teachers are weakest.
- The professional development courses are taken by those teachers most in need of improvement.
- Those teachers most in need learn relevant knowledge and skills and transfer this knowledge and these skills to improve their teaching practice in the classroom.

(3) Informing selection of new teachers and exit of unsatisfactory teachers:
- NTES scores are taken into account in teacher selection processes at the municipal level.
- Teachers with competent or outstanding performance in the NTES are in fact more likely to be hired.
- The NTES provides the justification for facilitating the exit from the classroom of consistently unsatisfactory teachers.
- Teachers with consistently (i.e., three consecutive years of) unsatisfactory performance in the NTES do in fact leave the teaching profession.

(4) Providing a base for peer-to-peer conversations about good practice:
- NTES results and reports are discussed among peers or in teams of teachers.
- NTES results and reports provide the basis for talking among peers about what good practice is, and how to achieve it.

(5) Improving teachers’ job prospects by providing incentives:
- Monetary incentives associated with the NTES (AVDI) are perceived by teachers as attractive and positively influence their job satisfaction.
- Monetary incentives associated with the NTES in fact motivate teachers to continue teaching in the classroom.
- Non-monetary incentives are also known, and are perceived as attractive and as improving job commitment.

(6) Using teacher performance diagnostic to target educational management decisions:
- The reports the NTES provides to local educational decision-makers are informative and comprehensible, so that they can be taken as an input to the educational decision-making process.
- The NTES is perceived as a credible source of information about teacher performance.
- Local educational authorities have the know-how and experience to transform NTES results into local educational management and policy decisions.

Examples of specific assumptions underlying NTES at individual level:

(1) Triggering change of weak practices:- Teachers find the information on their weak areas of practice that iscontained in the individual evaluation reports to be clear, trustworthy,reliable and valid.

- Teachers reflect on the information and acknowledge that they do have the weaknesses identified by the evaluation.

- Teachers develop a strong intention to change these weaknesses.

- Teachers believe they are able to change these weaknesses (self-efficacy).

- Teachers develop concrete and useful strategies for bringing about the necessary changes in their teaching practice.

2 A pending task is to assess the quality of the assumptions we derived, based on the existing literature on the topic (see Rossi, Freeman & Lipsey, 1999). We plan to perform such a literature review, for example, on the effects of monetary incentive programs for teachers; the effects of teacher professional development programs on teaching practice and student learning; and self-regulatory and self-evaluative processes based on performance feedback.


- Teachers in fact act upon their intention, successfully implement their strategies, and finally are successful at changing their teaching practice (changes in behavior).

- Pointing out weak areas has no substantial negative effects on teachers, such as negative emotions (anger, depression) or diminished job motivation.

(2) Maintaining good practices by triggering internal reinforcement of diagnosed strengths:

- Teachers find the information on their strong areas of practice that is contained in the individual evaluation reports to be clear, trustworthy, reliable and valid.

- Teachers reflect on the information and acknowledge that they do have the strengths identified by the evaluation.

- This information triggers pride or a similar feeling of internal reinforcement.

- This internal reinforcement results in the teacher consciously making an effort to maintain these good practices in the classroom.

References

Assael, J., & Pavez, J. (2008). La construcción e implementación del sistema de evaluación del desempeño docente chileno: Principales tensiones y desafíos [The development and implementation of the Chilean teacher evaluation system: Key tensions and challenges]. Revista Iberoamericana de Evaluación Educativa, 1(2), 42–55.

Avalos, B. (2004). Teacher regulatory forces and accountability policies in Chile: From public servants to accountable professionals. Research Papers in Education, 19(1), 67–85.

Avalos, B., & Assael, J. (2006). Moving from resistance to agreement: The case of the Chilean teacher performance evaluation. International Journal of Educational Research, 45, 254–266.

Barber, M., & Mourshed, M. (2007). How the world’s best performing school systems come out on top. Retrieved October 30, 2009 from http://www.mckinsey.com/locations/UK_Ireland/Publications.aspx#Reports.

Bickman, L. (1987). The functions of program theory. New Directions for Program Evaluation, 33, 5–18.

Carvalho, S., & White, H. (2004). Theory-based evaluation: The case of social funds. American Journal of Evaluation, 25(2), 141–160.

Chen, H. T. (1990). Theory-Driven Evaluations. Newbury Park: Sage Publications.

Chen, H. T., & Rossi, P. H. (1983). Evaluating with sense: The theory-driven approach. Evaluation Review, 7, 283–302.

Chen, H. T., & Rossi, P. H. (1987). Theory-driven approach to validity. Evaluation and Program Planning, 10, 95–103.

Consejo Asesor Presidencial para la Calidad de la Educación. (2006). Informe final del consejo asesor presidencial para la calidad de la educación [Final report of the presidential committee on educational quality]. Retrieved October 20, 2009 from http://www.consejoeducacion.cl/articulos/Informefinal.pdf.

Cox, C. (2008). Educación en el Bicentenario: dos agendas y calidad de la política [Education in the Bicentenario: Two agendas and the quality of policies]. Retrieved October 20, 2009 from http://mt.educarchile.cl/mt/jjbrunner/archives/2008/01/cristian_cox_ed.html.

Danielson, C. (1996). Enhancing professional practice: A framework for teaching. Alexandria, VA: Association for Supervision and Curriculum Development.

Davidson, E. J. (2005). The ‘‘Baggaging’’ of theory-based evaluation. Journal of Multi-Disciplinary Evaluation, 4, iii–xiii.

Decreto No. 192 of Education, “Reglamento sobre Evaluación Docente [Regulations about Teacher Evaluation]”. (2004). Retrieved October 14, 2008, from http://www.docentemas.cl/documentos.php.

Donaldson, S. I. (2007). Program theory-driven evaluation science. Strategies and applications. New York: Lawrence Erlbaum Associates.

Ehren, M. C. M., Leeuw, F. L., & Scheerens, J. (2005). On the impact of the Dutch educational supervision act. Analyzing assumptions of primary education. American Journal of Evaluation, 26(1).

Forte Fast, E. F., Hebbler, S., & ASR-CAS Joint Study Group on Validity in Accountability Systems. (2004). A Framework for Examining Validity in State Accountability Systems. Council of Chief State School Officers.

Frechtling, J. A. (2007). Logic Modeling Methods in Program Evaluation. San Francisco: Jossey-Bass.

Greaney, V., & Kellaghan, T. (2008). Assessing National Achievement Levels in Education. Washington, DC: The World Bank.

Law 19.933. (2004). Retrieved October 14, 2008, from www.rmm.cl/usuarios/equiposite/doc/200501121332150.19933.pdf.

Law 19.961 “Evaluación Docente [Teacher Evaluation]”. (2004). Retrieved October 14, 2008, from www.docentemas.cl/docs/Ley_19961.pdf.

Leeuw, F. L. (2003). Reconstructing program theories: Methods available and problems to be solved. American Journal of Evaluation, 24(1), 5–20.

Manzi, J. (2009). Individual incentives and teacher evaluation: The Chilean case. Unpublished manuscript prepared for the Organisation for Economic Co-operation and Development (OECD), Paris, France.

Manzi, J., Preiss, D., Gonzalez, R., Flotts, P., & Sun, Y. (2008). Design and Implementation of a National Project of Teaching Assessment: The Chilean Experience. Paper presented at the annual meeting of the American Educational Research Association, March 24–28, 2008, New York City, USA.

McLaughlin, J. A., & Jordan, G. B. (1999). Logic models: A tool for telling your program’s performance story. Evaluation and Program Planning, 22, 65–72.

Ministry of Education. (2004). Marco para la Buena Enseñanza [Framework for Good Teaching]. Santiago, Chile: Ministry of Education of the Chilean Government.

Ministry of Education. (n.d.-a). Fortalecimiento de la profesión docente [Strengthening of the teaching profession]. Retrieved October 14, 2008, from www.rmm.cl/usuarios/equiposite/doc/200501121331010.politica%20fortprof%20docente.doc.

Ministry of Education. (n.d.-b). Políticas Educacionales durante los Gobiernos democráticos [Educational Policies by the democratic governments]. Retrieved October 14, 2008, from www.rmm.cl/usuarios/equiposite/doc/200501121331270.polt%20educ.doc.

National Research Council. (2008). Assessing accomplished teaching: Advanced-level certification programs. Committee on Evaluation of Teacher Certification by the National Board for Professional Teaching Standards. M. D. Hakel, J. A. Koenig, & S. W. Elliott (Eds.), Board on Testing and Assessment, Center for Education, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.

Organisation for Economic Co-operation and Development. (2004). Reviews of national policies for education: Chile. Paris, France: Organisation for Economic Co-operation and Development, Centre for Co-operation with Non-members.

Organisation for Economic Co-operation and Development. (2005). Teachers Matter: Attracting, Developing and Retaining Effective Teachers. Paris, France: Organisation for Economic Co-operation and Development.

Patton, M. (2008). Utilization-focused evaluation (4th ed.). Thousand Oaks, CA: Sage.

Rossi, P., Freeman, H., & Lipsey, M. (1999). Evaluation: A systematic approach. Thousand Oaks: Sage.

Stame, N. (2004). Theory-based evaluation and types of complexity. Evaluation, 10(1).

Stufflebeam, D., & Shinkfield, A. (2007). Evaluation theory, models & applications. San Francisco, CA: Jossey-Bass.

Weiss, C. H. (1972). Evaluation research: Methods for assessing program effectiveness. Englewood Cliffs, NJ: Prentice Hall.

Weiss, C. H. (1973). The politics of impact measurement. Policy Studies Journal, 1, 179–183.

Wholey, J. S., Hatry, H. P., & Newcomer, K. E. (Eds.). Assessing the feasibility and likely usefulness of evaluation. In: Handbook of Practical Program Evaluation (pp. 15–39). San Francisco: Jossey-Bass.

Sandy Taut earned her Ph.D. from the Graduate School of Education at the University of California Los Angeles (UCLA), with an emphasis on program evaluation. She currently leads the research unit at the Measurement and Evaluation Center (MIDE UC) of the Pontificia Universidad Católica de Chile.

Verónica Santelices received her Ph.D. from the Graduate School of Education at the University of California Berkeley. She currently works as an educational researcher at the Measurement and Evaluation Center (MIDE UC) of the Pontificia Universidad Católica de Chile.

Carolina Araya holds a master’s degree in psychology from the Pontificia Universidad Católica de Chile and works as an educational researcher at the Measurement and Evaluation Center (MIDE UC) of the Pontificia Universidad Católica de Chile.

Jorge Manzi holds a Ph.D. in psychology from the University of California Los Angeles (UCLA). He is a full professor at the School of Psychology at Pontificia Universidad Católica de Chile and the director of the Measurement and Evaluation Center (MIDE UC) of the Pontificia Universidad Católica de Chile.
