
A LITERACY PROFILE OF MAJORITY-LANGUAGE

DUAL-IMMERSION PARTICIPANTS

APPROVED BY SUPERVISING COMMITTEE:

________________________________________ Howard L. Smith, Ph.D., Chair

________________________________________

Belinda Bustos Flores, Ph.D.

________________________________________ Monica Lara, Ph.D.

Accepted: _________________________________________

Dean, Graduate School

Copyright 2014 Jeanne H. Sinclair All Rights Reserved

DEDICATION

This is dedicated to my husband, Chris Cessac. Words cannot express my gratitude for all your support in our years together.

A LITERACY PROFILE OF MAJORITY-LANGUAGE

DUAL-IMMERSION PARTICIPANTS

by

JEANNE SINCLAIR, B.A.

THESIS Presented to the Graduate Faculty of

The University of Texas at San Antonio in Partial Fulfillment of the Requirements

for the Degree of

MASTER OF ARTS IN BICULTURAL-BILINGUAL STUDIES

THE UNIVERSITY OF TEXAS AT SAN ANTONIO College of Education and Human Development

Department of Bicultural-Bilingual Studies May 2014


ACKNOWLEDGEMENTS

I would like to thank the Department of Bicultural-Bilingual Studies for providing me

with an unparalleled education over the past five years and for inspiring me to continue in the

field. Dr. Howard L. Smith patiently supported me through this project from start to finish – I

appreciate the countless hours you have dedicated! Dr. Belinda Bustos Flores helped me navigate

the statistics portion of this project, and I will always appreciate what you have done. I also want

to thank the Texas school district that provided the student data for this study. I hope the final

product can be useful to you.

Special thanks are in order for Dr. Francis Hult for introducing me to the field and being

a mentor ever since. I would also like to thank Dr. Monica Lara for offering advice on early

drafts and for reading with such careful attention to detail.

My children – Marie, Adras, and Etta – you are amazing and you inspire me! Thank you

for always being understanding about “college night”. And to my husband Chris – thank you for

being our family’s rock – we would all fall apart without you. My deepest thanks for all you

have done.

May 2014


A LITERACY PROFILE OF MAJORITY-LANGUAGE

DUAL-IMMERSION PARTICIPANTS

Jeanne Sinclair, M.A.

The University of Texas at San Antonio, 2014

Supervising Professor: Howard L. Smith, Ph.D.

This thesis establishes a numerical profile of native English-speaking students’

literacy in a Spanish-English two-way dual language immersion program, based on state

assessment data. This is a relevant area for research because such programs are increasingly

common, and yet there are relatively few investigations focusing on this group’s literacy. The

subjects are third-, fourth-, and fifth-grade students in a Central Texas school district.

Two-way immersion (TWI) programs integrate children from diverse language

backgrounds and provide academic instruction in two languages. In this case, the curriculum is

taught in Spanish and English. Approximately half of the students in these TWI programs are

classified as Limited English Proficient/English Language Learners (ELL/LEP), which indicates that their primary language is a language other than English and that, on norm-referenced tests, they have not demonstrated English language abilities commensurate with their age or grade level. The other half of the students participating in this TWI program are non-ELL/LEP, meaning that their home language is English (these students are also known as majority-language speakers). I analyze the non-ELL/LEP students’ scores on the State of Texas Assessments of Academic Readiness (STAAR) reading tests and compare them with the scores of ELL/LEP students in the same program, as well as with those of grade-level peers in monolingual (English-only) settings. I also investigate data trends

related to socioeconomic status.


TABLE OF CONTENTS

Acknowledgements........................................................................................................................ iv

Abstract ............................................................................................................................................v

List of Tables ............................................................................................................................... vii

List of Figures .............................................................................................................................. viii

Chapter One: Introduction ...............................................................................................................1

Chapter Two: Literature Review .....................................................................................................3

Chapter Three: Methodology.........................................................................................................34

Chapter Four: Findings ..................................................................................................................43

Chapter Five: Discussion ...............................................................................................................60

Chapter Six: Implications ..............................................................................................................66

Conclusion .....................................................................................................................................72

Appendices.....................................................................................................................................73

References......................................................................................................................................76

Vita


LIST OF TABLES

Table 1 STAAR 2012 Mean P-Values and Internal Consistency

Values By Reporting Category and Content Area.....................................31

Table 2 Independent and Dependent Variables ......................................................35

Table 3 T-Test Results for Non-ELL/LEP students in TWI and English

classrooms (independent variable: education program, dependent

variable: ratio score) ..................................................................................44

Table 4 Percentage of students meeting passing and advanced standards and

mean ratio scores, non-TWI/non-ELL/LEP and TWI/non ELL/LEP........46

Table 5 T-Test Results for ELL/LEP and Non-ELL/LEP students in TWI

program (independent variable: ELL/LEP status, dependent

variable: ratio score) ..................................................................................47

Table 6 Percentage of students meeting passing and advanced standards and

mean ratio scores, TWI and non-ELL/LEP and TWI and ELL/LEP.........49

Table 7 Descriptive Statistics for non-ELL/LEP Students (independent

variables: EcoDis/TWI, dependent variable: ratio score) ..........................51

Table 8 Between-group effects (ANOVA) of non-ELL/LEP students

(independent variables: EcoDis and TWI, dependent variable:

ratio score) ...............................................................................52

Table 9 Partial Eta Squared and Observed Power of non-ELL/LEP

students (independent variables: EcoDis and TWI, dependent

variable: ratio score) ..................................................................................52

Table 10 Post-hoc t-test of non-ELL/LEP students (independent variable:


EcoDis, dependent variable: ratio score) ...................................................54

Table 11 Descriptive Statistics for All Students (independent variables:

EcoDis/TWI, dependent variable: ratio score)...........................................54

Table 12 Between-group effects (ANOVA) of All Students (independent

variables: EcoDis and TWI, dependent variable: ratio score) ...................56

Table 13 Partial Eta Squared and Observed Power of non-ELL/LEP students

(independent variables: EcoDis and TWI, dependent variable: ratio

score)..........................................................................................................56

Table 14 Post-hoc t-test of all students (EcoDis is independent variable, ratio

score is dependent variable).......................................................................58


LIST OF FIGURES

Figure 1 Classification of bilingual programs.......................................................... 5

Figure 2 Relationship between STAAR Reading Scale Scores

and Ratio Scores ...................................................................................... 38

Figure 3 Bar Chart of mean ratio scores of non-ELL/LEP students in

TWI and English (mainstream) classrooms, Grades 3-5 ......................... 45

Figure 4 Estimated Marginal Means of Ratio Score of Two-Way

Immersion ELL/LEP and non-ELL/LEP Students in Grades 3-5 ........... 48

Figure 5 Summary of Program and Language Factors: Percentage students

meeting passing standard, Grades 3......................................................... 50

Figure 6 Estimated Marginal Means of Ratio Score of non-ELL/LEP

(independent variables: EcoDis and TWI, dependent variable:

ratio score) ............................................................................................... 53

Figure 7 Estimated Marginal Means of Ratio Score of All Students

(independent variables: EcoDis and TWI, dependent variable:

ratio score) ............................................................................................... 57


CHAPTER ONE: INTRODUCTION

The multicultural context of the U.S. requires innovative educational models to reach

students of diverse socioeconomic and linguistic backgrounds. Educators want to provide rich

opportunities for students who are not yet proficient in English (known as students with Limited

English Proficiency or English-language learners, ELL/LEP). Two-way immersion (TWI), a type

of bilingual education program, is essentially designed to meet these objectives by integrating

students from two or more linguistic backgrounds and teaching the school’s curriculum in two

languages. The number of TWI programs offered throughout the U.S. has risen quickly and substantially over the past decade, numbering over four hundred as of 2007

(Center for Applied Linguistics, 2007).

There are several large-scale studies demonstrating the positive effects of TWI programming

on ELL/LEP students (Lindholm-Leary, 2001; Lopez & McEneaney, 2012; Ramirez et al., 1991;

Thomas & Collier, 2002, 2004). However, there is little existing research on TWI’s impact on

majority-language (in this project’s case, English) speakers. For schools to continue to

meaningfully integrate children through dual-language education, parents, educators, and

policymakers need to be able to predict these students’ academic outcomes. Yet questions remain about whether majority-language students become bicultural and biliterate in TWI programs, as well as about the impact of integration on minority-language speakers’ academic

success (Valdés, 2011).

Given that literacy is the pillar of the U.S. educational experience, it is important to know

whether TWI participants are acquiring literacy at the same rate as other students. This project

seeks to develop a literacy profile of majority-language participants in dual-immersion programs,

using data from the Texas state-mandated reading and writing tests, the State of Texas Assessments of Academic Readiness (STAAR), in the upper elementary years (third, fourth, and

fifth grade). I seek to answer questions such as: Do the majority-language (English-speaking)

TWI participants read at the same level as their monolingual English peers (i.e., those in a

mainstream classroom)? How do the majority-language participants compare to the minority-

language students who are in the TWI program? Are students from different socioeconomic

backgrounds performing at similar levels?
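The first of these questions is addressed in later chapters with independent-samples t-tests on STAAR ratio scores (see, e.g., Table 3 in the List of Tables). Purely as an illustrative sketch of that kind of two-group comparison, and not as a description of the thesis's actual procedure, the short Python fragment below runs such a test on hypothetical data; the score values, group sizes, and the choice of the Welch variant (which does not assume equal variances) are all assumptions of mine.

# Illustrative only: comparing mean "ratio scores" of two hypothetical groups
# of non-ELL/LEP students, one in TWI classrooms and one in mainstream
# English-only classrooms. All data values below are invented.
from scipy import stats

twi_scores = [0.78, 0.82, 0.91, 0.74, 0.88, 0.69, 0.85]          # TWI classrooms
mainstream_scores = [0.80, 0.76, 0.93, 0.71, 0.84, 0.79, 0.90]   # English-only classrooms

# Welch's independent-samples t-test (equal variances not assumed)
result = stats.ttest_ind(twi_scores, mainstream_scores, equal_var=False)
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.3f}")

# A p-value above the chosen alpha (commonly .05) would indicate no detectable
# difference between the two group means in this toy sample.

A test of this form speaks only to differences in group means; the thesis's tables also report the percentages of students meeting passing and advanced standards, which a single t-test does not capture.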


CHAPTER TWO: LITERATURE REVIEW

Bilingual Education

In essence, bilingual education is a pedagogy that instructs students in two languages: a

language that students speak as their home or native language, and a second language that

students are learning. In the U.S., bilingual education has allowed English Language Learners or

students of Limited English Proficiency (ELL/LEP) to keep up academically through academic

content instruction in their native language, while also acquiring English through sheltered

instructional strategies. Of special note is the use of the term “ELL/LEP”. “English Language

Learner”, or ELL, is the term currently in use for Texas students who are learning English as a

second language and who have not acquired English proficiency. However, the older term,

“Limited English Proficient”, remains in use by government agencies and so it is present in the

data. Wherever possible I have used “ELL/LEP” to be inclusive of both terms and to avoid

confusion.

Currently, the majority of U.S. ELL/LEP students served by bilingual education

programs are Spanish-speaking immigrants, approximately half of whom live in poverty (Capps

et al., 2005). However, it was wealthy Cuban refugees fleeing Fidel Castro’s regime who created

the first official U.S. bilingual program. Seeing themselves as sojourners who would return to

Cuba after what they imagined would be Castro’s short-lived rule, they wanted their children to

maintain cultural and linguistic ties to their homeland. This desire, combined with political

influence, brought the Spanish-English dual language program at Coral Way Elementary School

to fruition in Miami in 1963 (Crawford, 1995).

Progressive actions at the federal level promoted the development of more bilingual

programs through the nineteen-sixties and seventies. The first Bilingual Education Act of 1968


encouraged schools to provide native language instruction to minority-language students. In

1974 the Supreme Court ruled in Lau v. Nichols that the San Francisco Unified School District

was effectively discriminating against language minorities by not providing instruction in such a

way that allowed ELL/LEP students to comprehend academic content. The Lau ruling and the

1974 Equal Education Opportunity Act stimulated the creation of new programs to serve

minority language students (Crawford, 1995).

Through the 1990s, the most common bilingual education model was transitional

bilingual education (TBE). TBE programs provide native language instruction to ELL/LEP

students in the early elementary grades, and then transition the students as quickly as possible to

mainstream English instruction. This is considered a subtractive program, because English

monolingualism and literacy are the central program goals, rather than bilingualism and

biliteracy; thus, the native language is “subtracted” from the student (May, 2008). TBE

programs have a negative reputation in some schools as “waiting rooms” for a “real” (i.e.,

English-only) classroom (De Jong & Howard, 2009, p. 83).

At the other end of the pedagogical spectrum are additive, or strong, bilingual programs.

These are designed to add to a student’s spoken and written language repertoire by developing

both the home language and the second language. Additive bilingual programs include

maintenance, heritage, and one- and two-way immersion programs (May, 2008). Students

enrolled in additive programs tend to attain higher academic success than those in subtractive

programs such as TBE, English as a Second Language (ESL), and Structured English Immersion

(SEI) (May, 2008). Please see Figure 1 for a diagram of different language program models.


Figure 1: Classification of bilingual programs (from May, 2008, adapted from Hornberger, 1991)

Two-Way Immersion

Two-way immersion models, the focus of this paper, are strong, or additive, bilingual models, because they enrich and develop literacy in the student’s L1 and L2. Two-way

immersion programs seek not to remediate what some educators wrongly view as students’

“language deficits”. Instead, these programs develop students’ linguistic resources through

content instruction in the two languages (Genesee, 1987).

TWI is the only bilingual model that purposefully integrates equal numbers of minority-

and majority-language speakers in a single classroom to develop bilingualism and biliteracy,

learn academic content, and develop cross-cultural competence (De Jong et al., 2011; Lindholm-

Leary, 2001). TWI requires teachers who have a developed skill set for teaching language

through content instruction and working with ELL/LEP students to build on academic and


linguistic strengths. TWI can offer students opportunities for authentic communication and real language use. Because the students can serve as language models for one another, there is

the possibility of less language error fossilization or plateauing than in a traditional foreign

language classroom (De Jong & Howard, 2009).

There is ample literature describing the success of TWI programs for ELL/LEP students.

Thomas and Collier (2002) performed a meta-analysis of outcomes in different program models,

and concluded that students in TWI programs outperformed those in transitional bilingual

programs in Spanish achievement tests. Thomas and Collier (2004) also researched 13,456

ELL/LEP students in TWI, transitional, and developmental programs in Houston Independent

School District, with similar conclusions: TWI had the best results, outperforming others in

English, and meeting or outperforming the others in Spanish.

De Jong (2004) researched ELLs’ English proficiency in transitional bilingual programs

(TBE) compared with TWI. In this study of 233 children in kindergarten through fifth grade, De

Jong found that “both programs demonstrated significant growth from grade to grade in oral

language development (particularly in the early grades, K-2) and in reading and writing (all

grades)” (p. 103). In essence, this study shows that participating in TWI does not negatively

impact English language development for ELL/LEP students. Also, TWI programs had superior

outcomes in reading and writing compared with the transitional bilingual program.

Lopez and Tashakkori (2006) looked at TWI and transitional programs for ELL/LEP

students who started in kindergarten or 1st grade and were tested in fifth grade. TWI students

acquired English faster than those in transitional programs, and they also showed more positive

attitudes toward bilingualism. These scholars found no significant difference in reading,


mathematics, and science outcomes for ELL/LEP students between the two programs, although

TWI did show superior Spanish reading skills.

Lindholm-Leary and Block (2009) compared the achievement of Hispanic ELL/LEP

students and Hispanic non-ELL/LEP students from low socioeconomic status (SES) backgrounds in

TWI and mainstream English programs. They found that dual language students from both

language backgrounds “achieve comparably or significantly higher than their mainstream peers

in tests of English reading/language arts and mathematics” (p. 55). They found that students in

TWI close the achievement gap in math and language arts faster than those in mainstream

English programs.

Similarly, Schouten (2006) compared low SES, Hispanic students’ outcomes in TWI and

mainstream programs. She found that third and fourth grade students in both programs

performed at similar levels, but that the TWI students showed increased gains by 5th grade. This

is evidence that cognitive and linguistic benefits are enhanced when students remain in TWI for

the full course of elementary school.

Lindholm-Leary and Hernández (2011) looked at different subgroups of upper

elementary and middle school Latino students in TWI who come from low-SES backgrounds:

native English speakers, current ELL/LEP students, and students reclassified as fluent English proficient (RFEP), i.e., former ELL/LEP students. RFEPs in the TWI program, who are considered an “at-risk” group, showed achievement at levels comparable to or higher than those of peers, including native English speakers, in

mainstream English classrooms.

Pérez and Flores (2002) investigated a 90/10 TWI program with 62 Spanish-dominant

and English-dominant third graders in a majority low-income school. In this program, reading

was taught only in Spanish from kindergarten to second grade, and English reading was


introduced in third grade. The results from a third grade English reading test showed both

language groups could transfer reading skills from Spanish. This is strong evidence in support of

Cummins’ Common Underlying Proficiency hypothesis.

Cobb (2009) investigated outcomes of three programs: TWI for native English and

Spanish speakers, ESL (for native Spanish speakers), and standard elementary (for native

English speakers). This longitudinal study followed 166 students from 3rd through 7th grade. Cobb found that “[d]ual language schooling, when implemented properly by

schools, must be considered at least equally as effective in core academic achievement areas as

traditional elementary schooling, and is probably more effective in the long term” (p. 41). The

results show TWI to have better outcomes than ESL and traditional schooling models, especially

in reading and writing.

Dow (2008) also performed a longitudinal study, with 200 students in grades 1-6. The

study compared ELL/LEP and non-ELL/LEP students’ achievement on standardized tests in one-way dual language, TWI, and monolingual English programs. For ELLs, two-way outcomes were slightly better than one-way outcomes. There was no difference for non-ELL/LEP students.

Barnett et al. (2007) looked at TWI and English Immersion programs for 131 ELL/LEP

and non-ELL/LEP preschool-aged children (ages 3 and 4). They concluded that the English

language measures were similar for both groups. TWI improved the Spanish language

development of ELL/LEP and native English-speaking children without losses in English

language learning. Among the native Spanish speakers, the TWI program produced large gains

in Spanish vocabulary compared to the English Immersion program. All participants had

“substantial gains in language, literacy, and mathematics” (p. 277).


Nakamoto et al.’s (2010) study of Spanish ELL/LEP outcomes provides an example of the importance of longitudinal data in drawing conclusions about language programming. They investigated

Spanish ELL/LEP student outcomes in TWI, transitional bilingual education, and monolingual

English classrooms in first, second, and third grade. As expected, the first and second graders in

the programs with Spanish instruction performed better on Spanish measures and worse on

English measures, and the students of the same age in the English classroom showed the opposite

effect. The authors found that, by third grade, the results were approximately equal for the

sample students in all three programs.

However, from this positive result, it is not possible to assume that all students continue

with an equal trajectory. Even if students continue in English-only instruction after third grade,

the effect of the primary language instruction in the early years may be latent or hidden,

providing linguistic and cognitive support in later grades when vocabulary is more difficult and

comprehension tasks more complex. Thus, research on older students could provide a more

comprehensive picture of literacy.

Successful TWI Programs. Howard and Sugarman (2007) found that successful TWI programs can vary in terms of the proportion of each language used in content instruction, and in the language and sequence of literacy instruction. Research suggests that no approach is intrinsically

better (Cummins, 2000; Howard & Sugarman, 2007). According to Lindholm-Leary (2001), comparing TWI designs is “complicated by the cultural, social class, and linguistic diversity in student populations and at different school sites” (p. 37).

There are several approaches to achieving biliteracy in these programs. In some, all

students learn to read in the minority language, regardless of native language. In others, literacy

is taught in the native language initially (i.e., English literacy for English speakers, and Spanish


literacy for Spanish speakers). In yet other program designs, initial literacy is taught in both

languages to all students. Programs also differ in how instructional time is divided between the languages: some TWI programs instruct in the target (minority) language substantially more than in the majority language. For example, the 90/10 model teaches in the target language for 90% of the instructional day in kindergarten, 80% in first grade, 70% in second grade, and so on. Another type of program, known as the 50/50 model, instructs for equal

time in both languages in all grade levels (Howard & Sugarman, 2007).
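Stated as a simple schedule (my own formalization of the pattern just described; the assumption that the decline levels off at an even 50/50 split in the later grades is mine and is not stated explicitly above):

\[ \text{target-language share in grade } g \;=\; \max(90 - 10g,\ 50) \text{ percent}, \qquad g = 0, 1, 2, \ldots \]

with \(g = 0\) denoting kindergarten. Under this schedule kindergarten is 90/10, first grade 80/20, second grade 70/30, and so on, whereas the 50/50 model simply holds the split constant across all grade levels.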

According to Christian et al.’s (1997) findings, none of these biliteracy approaches is

inherently better, and all can be successful, as long as the design takes into consideration

students’ access to academic support at home. The more academic support a student has at home,

the less L1 instruction that student needs. For example, students from middle- or upper-class

English-speaking homes have shown literacy success in programs that focus only on Spanish

literacy in kindergarten and first grade, because parents are available at home to offer support in

learning to read in English, these students’ native language.

On the other hand, Cloud et al. (2000) recommend teaching initial literacy in the minority

language for all TWI participants, although they do note that some programs opt to teach literacy

in the native language to both groups, to “capitalize on their existing oral skills” while focusing

on the minority language for academic content instruction. Lindholm-Leary leaves the question

open to further research: “it is not clear whether particular approaches to literacy result in better

outcomes than others for specific populations of students” (2001, p. 71).

Some scholars have posited that learners from both language groups in TWI programs

can be taught literacy in both languages in the early elementary years (Escamilla, 2013).

Cummins suggests that for populations of varying bilingual proficiencies, upon entry to the

program “…it may be more effective to promote literacy in both L1 and English


simultaneously…to work for transfer across languages from an early stage” (2000, p. 194).

Proctor et al. (2010) agree, and their results “suggest that biliterate outcomes may be optimized

by literacy instruction delivered in L1 and L2 simultaneously” (p. 17).

As seen here, programs given the name “two-way immersion” vary in their approaches

to biliteracy instruction. These variations “reflect both differences in community needs as well as

the population served by the schools” (Christian et al., 1997, p. 116). The key for a successful

program seems to be that student needs drive program decisions (Howard & Sugarman, 2007).

Theories of Second Language Acquisition and Biliteracy

This section presents scholarship on theories that support bilingual education. Included

here are seminal works that helped to define the path of second language acquisition teaching

and learning. These theories demonstrate the mechanisms that allow students to learn material

and become literate in their nonnative language.

Common Underlying Proficiency. Cummins’ common underlying proficiency (also

known as central processing system or interdependence hypothesis) posits that transfer of

academic skills and knowledge occurs across languages, under appropriate conditions of student

motivation and exposure to both languages. Cognitive abilities, specific linguistic features and

skills, literacy, general concepts, knowledge and schemata learned in L1 are accessible in L2,

once the learner has the proficiency to express them. Thus, learners do not have to relearn these skills,

because they are accessible through the common underlying proficiency. This theory, also

known as the “dual iceberg” theory, supports TWI because content taught in one language

becomes knowledge available in both languages (Cummins, 1981, 2000; Lindholm-Leary,

2001).


Academic and social language. Roger Shuy developed the iceberg metaphor (1981) to

describe surface and deep linguistic levels. The surface levels, such as grammar, are easily taught

and assessed, while the deeper levels include language comprehension, discourse, and

semantics, which are not easy to explicitly teach or assess. Shuy noticed that teachers were more

likely to teach lessons on the surface levels of language than the deep levels. He argued against

this superficial style of teaching, which focuses on form over function. TWI, by offering content

instruction in two languages, allows students to learn about the deeper functions of language

through authentic lessons and discourse in both languages.

Cummins (1981) similarly delineated language skills into two categories. Basic

interpersonal communicative skills (BICS) are related to surface oral fluency (also known as

“playground” language), and take a relatively short time to learn. Cognitive academic language

proficiency (CALP) requires active cognitive involvement and can take five or more years to

develop. Understanding the difference between BICS and CALP is pivotal in understanding

ELL/LEP students’ English development. Because the language of the classroom (CALP) is

cognitively demanding and context-reduced, students with only BICS fall behind in the

mainstream English setting. For example, in some cases, minority-language students who appear

to be fluent in English are transitioned to English-only classrooms where they struggle to keep

up with their English peers. The root of their struggle is often that they have acquired BICS,

which allow them to communicate adequately with peers, but lack CALP, which is the context-

reduced language of the classroom. Such students require sheltered instruction, a key component

of TWI. Sheltered instruction provides context for cognitively challenging lessons (also known

as scaffolding, i.e., Wood, Bruner, & Ross, 1976).


Language acquisition through the teaching of core content. There are two types of

language acquisition processes: subconscious language acquisition and conscious language

learning (Strevens, 1977). According to Krashen’s (1981) Monitor Theory, the subconscious process is the more effective of the two, because “formal rules, or conscious learning, play only a limited role in second language performance”. This theory predicts, for

instance, that an interactive lesson about the food chain that includes the use of grammatical

direct objects may be more effective than explicit instruction on direct objects.

This theory supports TWI education because it shows that language can be acquired

through content instruction. However, while content-based language learning is effective, there is

merit to providing authentic corrective feedback so that learners can improve their language form

(Lyster, 2004). Educators can achieve both goals (academic content and language form) through

the use of language and content objectives in each lesson.

Culturally relevant pedagogy. Cross-cultural understanding is a main tenet of TWI

classrooms, in which students can feel that their cultures are valued, which in turn promotes

student engagement and learning (Ladson-Billings, 1995). Diversity training can help educators

develop positive outlooks toward linguistic and cultural diversity (Flores & Smith, 2009).

Teacher preparation programs that emphasize the development of cultural competence prepare teachers to meet the needs of linguistically and culturally diverse students (Sheets et al., 2010). By

celebrating students’ unique cultural backgrounds in the choice of assignments and materials,

TWI classrooms make the curriculum more relevant, especially to minority-language learners.

Bourdieu’s theory of cultural capital predicts that students of the dominant culture are

successful academically at least in part because their culture aligns with testing culture (Lee &

Bowen, 2006). Due to the cultural nature of schools and tests, providing instruction and


assessments in minority-language students’ home language, while well intentioned, may not

address deeper cultural mismatches between students and the testing tool. Cultural capital may

be represented in assessment, despite translation or even transadaptations, as indicated by

Nelson-Barber and Trumbull: “Standardized tests can lack validity for many students from non-

dominant communities who do speak English [as an L1]” (2007, p. 138).

Cultural Motivation for TWI

Assimilation is a process some immigrants go through; as they fit into their adopted

culture, they leave behind their home language and cultural practices. In the United States,

assimilation is a metanarrative of the “authentic” American experience, exemplified by the

“melting pot” analogy (Nieto, 2009). However, assimilation can cause feelings of frustration, a

disconnection between generations, and the silencing of students’ cultural experiences and

linguistic resources in schools (Nieto, 2009).

The debate on how best to educate ELL/LEP students in the U.S. centers on assimilation versus pluralism, the idea that students should maintain their home language as well as learn the majority language (De Jong & Howard, 2009). TWI allows minority-language students to maintain, celebrate, and develop their home language and culture, and “strives to promote positive multicultural environments and attitudes” (De Jong & Howard, 2009, p. 84). It thus offers minority-language students an opportunity to maintain and promote their cultures in the school environment, while majority-language students can become more culturally aware.

TWI is designed to do more than simply provide a single space for diverse learners to

come together. These classrooms provide a third space in which the invisible tensions between

minority and majority languages become visible: students “confront, speak about, and

interactively redefine the relationship between the two languages” (Hadi-Tabassum, 2006).


Gaining cross-cultural competence is a key TWI goal, but it is difficult to achieve. In the world

outside the school’s walls, minority groups still struggle for equality.

Students from different backgrounds have different types of motivation for participating

in TWI. Gerena (2010) studied students’ and families’ motivations for joining TWI programs. She found that families from both minority- and majority-language groups desire opportunities, success,

and mutual understanding. However, minority-language parents were unique in that they were

motivated by a desire to maintain their heritage language and their cultural identity in the

community. English-speaking families, on the other hand, tended to be motivated by financial

opportunities, i.e., access to jobs that require bilingualism. The majority of TWI participants hold

one motivation in common regardless of their home language: the desire to be bilingual and

biliterate (Whiting & Feinauer, 2011).

Block (2012) found attitudinal differences between Latinos (both Spanish-dominant and

English-dominant) in TWI and mainstream classrooms. The students in TWI had more positive

attitudes toward the Spanish language (especially reading and speaking in public) and greater overall biculturalism than those in mainstream English classrooms. Additionally, students in the

TWI programs “grew substantially in their relationships with Spanish-speaking family and their

communication with community members during their years in elementary school” (Block,

2012, p. 252). These works show that TWI can have a positive effect on cultural attitudes.

Literacy

Literacy is the cornerstone of education, yet there is little consensus among policymakers

and educators on its definition. Even definitions of the two basic components of literacy, reading

and writing, are hotly contested – especially “reading”. Unfortunately, the analysis of writing in TWI programs is beyond the scope of this study, but writing is a critical component of literacy and warrants further research.

In the first half of the 20th century, the understanding of reading was simplistic: it was “the correlation of a sound image with its corresponding visual image” (Bloomfield, 1938, as cited in International Reading Association, 1995). In the ensuing decades, the debates

over the definition of reading and the best practices for literacy instruction have become highly

politicized (Lara, 2010), so much so that the debates have become known as the “Reading Wars”

(Valencia & Wixson, 2000). Essentially, one camp understands reading as primarily skill-based,

while the other views reading as a socially and culturally constructed process.

Because the basis for the current study is sociocultural, this section focuses on the latter

meaning. Two major reading research initiatives offer modern definitions: the RAND

Reading Study Group emphasizes the interactive nature of reading, which they define as

“extracting and constructing meaning through interaction and involvement with written

language” (Kirby, 2003, p. 1). The Progress in International Reading Literacy Study (PIRLS)

offers a more sociocultural perspective, defining reading as a basis for participation in society:

“the ability to understand and use those written forms required by society and/or valued by the

individual” (Elley, 1992, as cited in International Reading Association, 1995).

The “simple view of reading” (Hoover & Gough, 1990) is a theory developed in the

1980s in an attempt to explain bilingual students’ English reading performance. This theory

proposes that reading comprehension is equivalent to the product of the student’s decoding

ability and listening comprehension level. Although the “simple view” remains popular,

contemporary researchers have identified other factors key to reading achievement (e.g., Tilstra et al.,

2009), such as vocabulary, verbal proficiency, and fluency.
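In formula form, the simple view is conventionally written as a product (a standard rendering of the model; the notation below is mine rather than the thesis’s):

\[ \mathrm{RC} = \mathrm{D} \times \mathrm{LC} \]

where RC is reading comprehension, D is decoding ability, and LC is listening (linguistic) comprehension, each treated as a proportion between 0 and 1. Because the relationship is multiplicative rather than additive, a severe weakness in either component depresses comprehension regardless of strength in the other.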


Another key element of ELL/LEP students’ literacy development unexplored by the

simple view of reading is their level of literacy engagement (Cummins, 2011; Goldenberg et al.,

2006). Students’ literacy engagement is characterized by extensive reading, an enjoyment of

literacy, deep comprehension, and active pursuit of literacy activities within and outside of

school (Guthrie, 2004 in Cummins, 2011, p. 1977). Literacy engagement can be enhanced by the

inclusion of culturally relevant classroom activities and materials. Regardless of the formal

definition of reading, ELL/LEP students require between four and seven years to reach grade-

level standards in English literacy achievement (Bialystok, 2002).

Connections between L1 literacy and L2 literacy

Biliteracy. Biliteracy, broadly defined as the ability to read and write in two languages, is

a primary goal of TWI. However, as difficult as it is to understand reading in a single language,

the process of becoming literate in two languages is even more complicated. As Shanahan and

August write, “learning to read for the first time in a second language is arguably a difficult task,

particularly for children who have limited oral skills in that language and limited emergent

literacy skills in any language” (2008, p. 294). This is an added level of complexity on the

already complex continuum of biliteracy and bilingualism.

The debate on biliteracy has focused on whether monolingual (i.e., English-only) literacy

instruction or bilingual (i.e., L1 and English) literacy instruction better serves ELL/LEP students in their literacy development. In search of conclusive results, no fewer than five meta-analyses in

the past 30 years have focused on this theme: Greene, 1997; August & Shanahan, 2006; Rolstad,

Mahoney, & Glass, 2005; Slavin & Cheung, 2005; Willig, 1985; all cited in Goldenberg (2010).

These five meta-analyses each found that second language literacy is indeed supported by


literacy development in the primary language. Goldenberg states that “this might be one of the

strongest findings in the entire field of educational research. Period” (2010, p. 22).

The higher a student’s L1 literacy and academic knowledge, the more strongly they

predict L2 literacy and academic achievement (Genesee & Lindholm-Leary, 2012; Riches &

Genesee, 2006). Cummins’ Common Underlying Proficiency Theory explains this relationship:

students who access knowledge and literacy from L1 can apply them to knowledge and literacy learning in L2. Cummins (2011) also found that literacy in L1 promotes academic achievement, whether the instructional environment is monolingual or bilingual.

Biliteracy in TWI programs. Beyond the complicated milieu of biliteracy, TWI

programs have to navigate even more complex factors in the teaching of literacy. Not only are

ELL/LEP students learning to read in two languages (e.g., Spanish as L1 and English as L2), but

simultaneously native English speakers are also learning to read in two languages (e.g., English

as L1 and Spanish as L2). Students progress along the continuum of biliteracy (Hornberger,

2012), which demonstrates that students’ bilingualism and biliteracy abilities are “highly

complex and fluid” (p. 264), a complexity that problematizes the use of monolingual tests to assess

emerging bilinguals (Escamilla, 2006). It is through this lens of a bilingual continuum that this

project seeks to explore the assessment of emerging bilinguals.

Sociocultural perspectives in the curriculum and literacy

There are myriad sociocultural issues associated with the teaching of literacy to language

minorities. Too frequently, reading curricula are designed only with English speakers in mind,

and lack cultural relevance for language-minority students. On the other hand, curricula

designed specifically for minority-language speakers often come from a deficit perspective that

discounts these students’ substantial background knowledge. The resulting curriculum tends to sit at the lower levels of Bloom’s taxonomy, and is neither engaging nor authentic.

For example, reading instruction in bilingual classrooms has typically centered on

simplistic word reading and fluency, without cultivating students’ engagement with reading or

their “awareness of how meaning is encoded in text” (Cummins, 2011, p. 1974). Genesee and

Lindholm-Leary (2012) similarly state that there is too much emphasis on phonemic awareness

when teaching reading in two languages, and that ELL/LEP students need more exposure to and

emphasis on “complex genres of literacy” (p. 84).

These shortfalls in design can be remedied by culturally relevant pedagogy, which

explicitly and openly recognizes and gives credence to ELL/LEP students’ cultural backgrounds,

personal and familial histories, and funds of knowledge. This way of teaching seeks to counteract

the lack of positive portrayals of ELL/LEP students in the curriculum. It also attempts to

counteract the deficit mentality that minorities have endured in the school system. In support of

culturally relevant pedagogy, Cummins writes,

Power relations in the wider society express themselves in educational contexts through the negotiation of identities in the school; thus, students from communities whose identities have been devalued in the wider society will benefit from instruction that affirms their identities within the context of the school.

Cummins, 2011, p. 1975

The use of minority L1 in classrooms remains controversial despite many studies that

demonstrate its benefits. In this climate, a concerted, explicit effort to use and promote L1 in the

classroom “challenges the devaluation of their language and culture within the wider society”

(Cummins, 2011, p. 1987) and allows students to engage more deeply with the curriculum.

Incorporating L1 into classroom discourse allows ELL/LEP students to activate prior knowledge,

affirm cultural identities, and express ideas.


Goldenberg (2006) also explored studies of sociocultural factors and reading achievement

in language-minority children in a variety of educational environments. He found that attempts to

make the classroom culture more like home culture can increase engagement and participation.

Additionally, a culturally relevant reading curriculum encourages growth in reading

comprehension.

Essentially, examining literacy practices through a sociocultural lens shows that students

of diverse backgrounds benefit from the incorporation of culturally relevant materials, the

primary language, and an enriched, engaging reading curriculum. Although these findings are

based on research on ELL/LEP students, it is reasonable to expect that these practices also benefit majority-language speakers in TWI programs.

Language Structure and the Implications for Reading

One characteristic of every written language is the opacity or shallowness of its orthographic system. If the language is highly regular, and each phoneme consistently corresponds to a single letter, as in Spanish, it has a shallow orthography. On the other hand, if there is not a one-to-one correspondence between phonemes and letters, as in English, which is highly irregular, then it has

a deep or opaque orthography.

Ziegler and Goswami (2005) refer to this as the “grain size theory” – that the more

opaque the orthographic system, the more difficult it is to learn to read that language. They found

that children learning to read languages with shallow orthographies made rapid progress in

literacy acquisition. For example, in studies described by Prior (2012), students learning to read

Finnish, Greek, and German (which have shallow orthographies) quickly learned to decode. However, students learning to read more opaque orthographies (in this case French and

Danish) had higher rates of errors. Students learning to read in English also displayed relatively


high error rates, which can be “attributed to the great depth and opacity of the system” (p. 138).

These differences in orthographies affect both English- and Spanish-speakers in TWI programs.

Reading Comprehension and Vocabulary

Much of the existing research on reading comprehension of TWI program participants

focuses on students in kindergarten through second grade. While this research is quite important,

more studies are needed that follow students’ trajectories through the upper elementary grade

levels. Through longer-term analyses, scholars can investigate the impact of primary language

instruction as students proceed to more complex subjects.

Some studies of reading comprehension of older bilingual participants are available.

Mancilla-Martinez and Lesaux (2010) performed a longitudinal study of ELL/LEP students’

reading comprehension from preschool to fifth grade. They found that, although the majority of

these students had attended preschool, and thus had been in U.S. schools for seven years, their average reading level by fifth grade was equivalent to a second-grade level. In terms of specific reading

skills, the fifth graders’ word reading was relatively on par with age peers, but vocabulary and

comprehension were three years below grade level. These results imply that the instructional

emphasis for these students had been on simplistic word reading, as opposed to deeper

comprehension and vocabulary development.

Similarly, Geva and Farnia (2011) showed that fifth grade ELL/LEP students had fallen

significantly behind age peers in areas such as “vocabulary breadth, overall command of a

variety of syntactic skills, comprehension of spoken language….[and] complex language and

reading comprehension tasks, even though they are able to perform at par on word-level reading,

reading fluency, and cognitive component skills” (p. 1840). Proctor et al. (2011) also emphasize the importance of vocabulary depth as a measure for predicting English reading performance. These authors underscore a similar theme – that ELL/LEP students benefit from explicit vocabulary instruction and activities that help them develop deeper reading engagement and comprehension.

Assessment in Multilingual/Multicultural settings

Assessment of language-minority children poses many challenges. Although states and

districts have tried to make testing more fair for ELL/LEP students, there remains a lack of

conclusive research, and existing policies do not always rely on research-driven methods. Are

the literacy tests administered to dual-language learners, which are often designed for

monolingual students, able to accurately assess their literacy abilities? As little research exists on

the assessment of English-speaking students in TWI programs, this section focuses on a general

discussion of assessment issues for students of diverse linguistic backgrounds.

Of great concern is the achievement gap between ELL/LEP students and non-ELL/LEP

students (Abedi & Gándara, 2006; Borsato & Padilla, 2008; Choi & Wright, 2006; Fairbairn &

Fox, 2009; De Jong & Howard, 2009; Sánchez et al., 2013; Sandberg & Reschly, 2010; Solano-

Flores, 2008; Young et al., 2008). Prior to the federal education mandate No Child Left Behind

(NCLB, 2001), ELL/LEP students were often not included in reported assessment results.

Current federal law, however, mandates that all students participate in the NCLB high-

stakes assessments, ostensibly “to provide information about their needs so that schools can

address those needs and raise their achievement to at least adequate levels” (Gándara & Baca,

2008, p. 213).

While addressing minority-language students’ plight is a noble intention, the mandate has in practice served to further discriminate against these students through the use of inappropriate tests.

Minority language speakers are disproportionately penalized for failing (Menken, 2011), and are


framed as “problems” by the results and interpretation of these scores (Koyama & Menken,

2013). Minority language speakers tend to score lower on the English standardized tests than

monolingual peers, and their language proficiency test results can be misinterpreted. Sometimes

these “inaccurate” results (Borsato & Padilla, 2008, p. 486) are used to make graduation,

promotion, or retention decisions. For example, what may be an issue of low academic language

proficiency can be misdiagnosed as a learning disability, and often a disproportionate number of

ELL/LEP students are identified for special education (Borsato & Padilla, 2008; Sánchez et al.,

2013).

The reductive philosophy of the testing paradigm does not bode well for native-language

instruction for minority-language speakers. The narrowing of the assessments’ scope, combined with the increase in their power, in turn narrows the taught curriculum while magnifying its importance; enrichment programs such as TWI come to be considered superfluous. For example,

New York City has exemplary bilingual schools, but ESL programs are on the rise and the

bilingual programs are on the decline: “large numbers of these [bilingual] programs are being

eliminated and replaced with ESL programs in city schools” (Menken, 2011, p. 126).

Abedi et al. (2003) researched assessment data in individual school districts, as well as

two entire states, and found several trends related to the variation in achievement between

ELL/LEP students and non-ELL/LEP students. First, a student’s performance on content

assessments such as math and science is closely related to and often confounded with their

proficiency in English. Second, the gap between ELL/LEP and non-ELL/LEP achievement

increases as the complexity and amount of language in the assessment increases. Third, a high

language load in an assessment tool may be a source of measurement error. The test may lack

validity because it does not isolate the assessed content from language proficiency.

To address these disparities, Fairbairn and Fox (2009) recommend the following: that test language be accessible (i.e., not overly complex), that local educators develop tests for their unique populations, that tests be normed to different populations, that clear ELL/LEP assessment policies be created, and that more research be done on ELL/LEP test processing, feedback, and assessment.

The current testing paradigm conflates content assessment and language assessment

(Fairbairn & Fox, 2009). Such tests intend to measure content knowledge, but actually measure

language proficiency (Cummins, 2000). In other words, a minority-language student who has a

solid understanding of photosynthesis in his primary language might not be able to explain or

answer questions about that topic in English, due to his beginning English language proficiency.

Because of this construct-irrelevant variance, a student’s language abilities affect academic performance in complex ways (Solano-Flores, 2008; Sánchez et al., 2013).

A student’s English language proficiency plays a critical role in these assessments. Abedi

et al. (2003) found in their analysis of ELL/LEP students’ performance on high school tests that

the greater the English language complexity, or “English language load”, in the assessment tool,

the larger the gap between performance of ELL/LEP and non-ELL/LEP students. Because

“verbal and quantitative reasoning skills are measured less precisely for ELL/LEP students than

they are for non-ELL/LEP students” (Lakin & Lai, 2012, p. 151), there tends to be a greater

difference between ELL/LEP and non-ELL/LEP performance on verbal/language arts

assessments than math and non-verbal assessments.

Young (2009) recommends looking at the following eight indicators for test

comparability. For a given test, each of the factors should have a similar response for both

minority-language speakers and English speakers. Any response that differs between the two groups calls that test’s validity into question. The factors are (a) Reliability, the equal precision of measurement across examinee groups; (b) Factor Structure, the relationships among test items and components are similar across examinee groups; (c) Differential Item Functioning, no differential item difficulty is due to group membership; (d) Predictive Validity, no differential prediction is due to group membership; (e) Educational Decisions, no differential decision-making due to group membership; (f) Test Content, content and cognitive processes used are the same across examinee groups; (g) Testing Accommodations that are appropriate, perceived as such, and have minimal impact on scores for examinees who do not require them; and (h) Test Timing, with no differential speededness due to group membership (Young, 2009, p. 125).

To create a large-scale standardized test that evokes similar responses from both language groups, thereby meeting the criteria above for a valid test for bilingual students, is a challenge. To comply with these factors, teachers could instead create local, small-scale assessments that meet these criteria. This new, decentralized system of assessment would require coordination between “different kinds of professionals, from assessment development experts to cognitive scientists to linguists to cultural anthropologists” (Solano-Flores & Trumbull, 2003, p. 9). If the paradigm does shift to local assessments,

scholars warn of possible resistance from “the measurement community”, i.e., test-making

companies (Solano-Flores & Trumbull, 2003). In other words, this may disrupt the current

system of assessment development, because it would no longer be profit-driven.

The current assessments used for bilingual students are not adequate, and a full linguistic

profile in all four domains (reading, writing, speaking, and listening) in both languages is rarely

done (Solano-Flores, 2008). Solano-Flores and Trumbull suggest treating language itself as a

source of measurement error. For example, a minority-language student may perform well in certain domains in L1 and not in L2, while performing other tasks well in L2 but less so in L1 (Solano-Flores & Trumbull, 2003). Their research shows that for tests to accurately report

achievement, ELL/LEP students actually need to complete a higher number of test items in both

languages.

Cultural and Linguistic Issues in Test Development

The current testing paradigm is “one size fits all” (Abedi et al., 2004). In some states, all

students are expected to take exactly the same assessment, regardless of their unique linguistic

and cultural background. Assessments taken by ELL/LEP students are constructed for native

English speakers (Borsato & Padilla, 2008), using standards written for monolinguals, and are usually

piloted/normed on English-speaking students (Solano-Flores, 2008). This approach lacks

“texture” (Fairbairn & Fox, 2009, p. 11), because ELL/LEP students are not a monolithic group.

On the contrary, ELL/LEP students are extremely diverse, and vary greatly in their L1

proficiency and literacy, dialect usage, and family background. Whereas the current testing

mentality sees cultural background as a “nuisance variable” in validity testing (Abedi &

Gándara, 2006, p. 43), Solano and Trumbull argue that “culture-free tests cannot be constructed

because tests are inevitably cultural devices” (2003, p. 9). These scholars argue for cultural

awareness to be a critical aspect of the entire assessment process, from test development and

review, to test use and interpretation (Del Rosario Basterra et al., 2010).

It is not only content or literacy tests that have issues of cultural bias and validity.

MacSwan and Rolstad (2006) performed a study in which they analyzed Spanish oral language

tests for validity. Of the 145 Spanish-speaking children tested (none of whom was identified as a

special education student) 75%-90% had a result of “less than fluent” in Spanish, despite the last

half-century’s language acquisition research showing “all normal children to achieve linguistically and to do so effortlessly and in the absence of instruction” (Pinker, 1994, as cited in MacSwan & Rolstad, 2006). Further testing of these students showed that their error rates were

within the normal range, causing the researchers to conclude that “common native language tests

… do not correctly identify the true native language abilities of ELL/LEP students [and]

identified a majority of children as limited or nonspeakers of their native language” (p. 2322).

Some states do offer primary language reading and content area assessments, although

test development practices tend to start with the original English version and merely translate it

to the minority language, which can reduce the test’s validity and reliability (Lara, 2010). Instead

of translating the English test, Solano-Flores and Trumbull (2003) propose a concurrent

assessment development model where the test is developed in both languages at the same time:

changes are made to both interactively, and this allows groups of test developers to reach deeper

levels of analysis in their “discussion of language issues” (Solano-Flores & Trumbull, 2003, p.

6). This technique is beneficial not just for the development of a test in a language other than

English; the process allows the test developers to analyze the English test with a high “level of

specificity and depth” (Solano-Flores & Trumbull, 2003, p. 7).

Many states have policies that allow ELL/LEP students to test with accommodations if

they cannot test in their primary language. However, “many states currently use accommodations

without evidence of their validity” (Abedi et al., 2004, p. 15). For example, some

accommodations currently offered to ELL/LEP students are based on special education

accommodations, and they are not necessarily appropriate for ELL/LEP students (Fairbairn &

Fox, 2009). Another difficulty with accommodations is that students are often offered

accommodations under the assumption that they possess certain skills, such as L1 reading

proficiency or knowledge of how to use a dictionary (Solano-Flores, 2008).


Several scholars have concluded that an effective accommodation for ELL/LEP students

must only show benefit for ELL/LEP students and not for non-ELL/LEP students – if it shows

benefit for both groups, then it would not be fair to only offer it to ELL/LEP students (Abedi,

2004; Borsato & Padilla, 2008; Solano-Flores, 2008). This is the “interaction hypothesis”, and

there is at least one accommodation that meets this criterion: simplification of the language in

test items. This accommodation is considered to be valid, because it does not “appear to affect

the performance of English-proficient students” (Abedi et al., 2004, p. 17).

STAAR Test

Texas mandates assessments for all Texas public-school students in grades 3-12.

Students in grade 3-8 take the STAAR tests in reading and math every year, writing in grades 4

and 7, science in grades 5 and 8, and social studies in grade 8. According to the Texas Education

Agency (TEA), STAAR is a “rigorous program” that focuses on making sure that students are

ready for “subsequent grades and courses and, ultimately, for college and career” (STAAR

General Brochure).

The third-, fourth-, and fifth-grade STAAR reading tests are similar in many ways, but

have some key differences. The third-grade test has 5-6 passages, 48 questions, and a total

reading load of approximately 3,400 words. The fourth-grade test has 6-8 passages, 52 questions,

and a total reading load of approximately 3,900 words. The fifth-grade test has 6-8 passages, 54

questions, and a reading load of approximately 4,100 words. All exams test students’ ability to

read and understand fiction, literary nonfiction, poetry, expository, procedural, and media

literacy genres. The fourth- and fifth-grade tests include a dramatic passage, and paired passages

of multiple genres that treat the same theme (STAAR Reading Test Designs Grades 3, 4, and 5).

Students in both third and fourth grade are expected to be able to show literal understanding of the texts, understand the use of sensory language, and find the meaning of

unknown words by using context clues, roots, and affixes. Students in both grade levels must be

able to analyze a text for the theme, identify author’s purpose, summarize, determine order and

importance of the plot’s main events, draw conclusions about characters’ intentions, and make

plausible inferences. Fourth- and fifth-grade students are expected to also understand how

figurative language affects meaning, to use text features, and to recognize how the organization

of a text creates a relationship between ideas (State of Texas Assessments of Academic

Readiness Performance Level Descriptors Reading, Grades 3, 4, and 5).

Spanish Version of STAAR. Any Spanish-speaking ELL/LEP student is eligible to take the

STAAR reading exam in Spanish in grades 3-5. Non-ELL/LEP students who are in TWI

programs are also eligible for Spanish-language testing (Texas Education Agency, Student

Assessment Division, Training on the LPAC Decision-Making Process for the Texas Assessment

Program). Spanish-language tests are designed to be “linguistically and culturally appropriate for

the students tested and comparable to the English-version tests in content, rigor, and achievement

standards” (Chapter 5, English Language Learners (ELL/LEP students) and the State of Texas

Assessments of Academic Readiness Program). These tests are transadaptations rather than translations, a process designed to create a less culturally and linguistically biased assessment.

Linguistic Accommodations for ELL/LEP students. ELL/LEP students in grade 3-5

may utilize two types of linguistic accommodations when taking the STAAR reading exam in

English. The Language Proficiency Assessment Committee (LPAC) is charged with determining

the appropriate accommodations, if any, for each ELL. According to the TEA, only those

students who regularly use these accommodations in the classroom are eligible to use them


during testing. The accommodations are not available for students who are taking the Spanish

version of STAAR, and neither are they available to non-ELL/LEP students.

The first type of accommodation available to ELL/LEP students is extra time. The normal

time limit for STAAR exams is four hours. ELL/LEP students are eligible to have the testing

time limit extended to the entire school day. The second type of accommodation is the use of

dictionaries. Allowable dictionary types include Standard English, simplified English, bilingual,

monolingual in another language, and picture dictionaries (TEA Linguistic Accommodations for

ELL/LEP students participating in the STAAR Program). Additional accommodations are

available if the student takes the STAAR-L (linguistically accommodated version).

STAAR Reliability. A test’s reliability is the expectation that multiple administrations of

the same test yield similar results. The TEA provides reliability data in several ways, but the

primary method is by measuring internal consistency, which correlates students’ responses to

questions of the same construct within a single test. In other words, if a student understands

sensory language, it is assumed he will correctly answer most of the questions on that construct.

This aspect of reliability can be measured with Cronbach’s alpha. Values between .8 and .9 are

considered good, and values above .9 are considered excellent.
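The internal-consistency statistic described above can be illustrated with a short sketch; the item responses, values, and function below are hypothetical and are not drawn from STAAR data.

    import numpy as np

    def cronbach_alpha(items):
        """Cronbach's alpha for an (examinees x items) score matrix."""
        k = items.shape[1]                          # number of items
        item_vars = items.var(axis=0, ddof=1)       # variance of each item
        total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # Hypothetical responses: 6 examinees x 4 items scored 0/1 (not real STAAR data)
    scores = np.array([
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 0, 1, 1],
        [0, 1, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0],
    ])
    print(round(cronbach_alpha(scores), 3))  # values near .8-.9 would be considered good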

The results of internal reliability for the Spring 2012 administration are provided in Table

1. Overall, STAAR reliability coefficients are good or excellent, because they are at .8 or higher.

However, in all measures, students of White ethnicity had higher reliability ratings than students

of Hispanic ethnicity.


Table 1. STAAR 2012 Mean P-Values and Internal Consistency Values
By Reporting Category and Content Area

                                    Grade 3    Grade 4
  English Reading – Total Group     .891       .890
  English Reading – Hispanic        .879       .877
  English Reading – White           .884       .880
  Spanish Reading – Total Group     .869       .885
  Spanish Reading – Hispanic        .869       .884
  Spanish Reading – White           .901       .926

Other reliability data provided by the TEA are classic standard error of measurement,

which measures chance error such as student guessing; conditional standard error of measurement,

which measures how accurate the band score is for the number of correct answers; and accuracy

of classification, which identifies how accurately the scoring system classifies students based on

their test score (i.e., advanced, satisfactory, and unsatisfactory). Because correlations between

students’ scores on STAAR and other tests are not available, and because the test is confidential,

one cannot assume that the commonly understood definition of reliability applies, i.e., that a test

is “consistent and dependable” (Abeywickrama & Brown, 2010, p. 27). Furthermore, no data are

available that provide reliability evidence for ELL/LEP students who take the English STAAR

with linguistic accommodations.

STAAR Validity. Test validity is evidence of the extent to which a test measures what it purports to measure and of whether educators can appropriately make inferences about student performance from the test results. TEA provides validity evidence in five categories: test content,

response processes, internal structure, relations to other variables (criterion-related validity), and

consequences of testing (washback).

Content validity is the extent to which a test measures the content it purports to measure. The evidence TEA provides for content validity is its test development process: items are written and tests built to pre-defined criteria; items are reviewed more than once for appropriateness of content and identification of bias; items are field-tested and the field-test data reviewed; and university-level experts review high school assessments for content accuracy (STAAR Technical Digest, p. 66).

Another factor for validity is response processes, the cognitive processes necessary to

answer a test item. To be valid, these responses must provide an accurate measurement of the

given construct. The TEA maintains that field-testing of item types and formats, and analysis of

field test data such as “item difficulty, item-test correlations, and differential item functioning”

by educators and experts accurately measures the factor of response process (Standard Technical

Processes, p. 66). Unlike reading tests in the upper grades, those in grades 3 and 4 are

exclusively in multiple-choice format, which the TEA somewhat unconvincingly claims is

“because it most closely resembles what students typically experience in classroom testing”

(TEA Chapter 4, p. 112).

An additional factor given by the TEA as evidence for validity is internal structure, which

is supported by the internal consistency correlations reported as reliability evidence. If the internal

consistency (reliability) is high for all subpopulations, the TEA states that the internal structure

has a high level of homogeneity and therefore is valid. Another factor given is the high

comparability between the Spanish and English language versions of the Texas assessments.

However, the 2007 reference cited by TEA in support of this claim (Davies, O’Malley and Wu,

2007) may no longer apply, as Texas overhauled the previous assessment system (Texas

Assessment of Knowledge and Skills, or TAKS) and began STAAR in 2012.

An analysis of the relationship between scores on two measures is known as criterion-related validity. TEA cites research the agency has performed on the relationships between STAAR and other measures such as the SAT and ACT, grade correlation studies between STAAR and course grades, the correlation between STAAR EOC and college courses, the relationship of exams in a given grade across content areas, linking studies of a single content area across grade levels, and STAAR-to-TAKS comparisons in a single content area. The actual research studies were not named, nor were they available at the time of writing on the TEA website or via any of its links; the only information available is the timeline of the standards-setting process, which ends in Fall 2014.

The final factor in test validity provided by the TEA is consequential test validity, which

refers to intended and unintended consequences of test scores (this may also be referred to as

washback). While the STAAR is designed to have an effect on “curriculum, instructional

content, and delivery strategies” (Standard Technical Processes, p. 68), an example of an

unintended consequence is the narrowing of curriculum, or “teaching to the test”. The TEA

provides information about STAAR’s intended consequences, but claims that it is too soon to

study unintended consequences, evidence for which TEA claims “typically occurs after a

program has been in place for some time and is intended to continue in future years” (STAAR

Chapter 4, p. 118).

In sum, the information on the TEA’s website about the five measures of STAAR

validity does not, in its current state, suffice to clearly demonstrate this test’s validity. Despite

these concerns, this study uses STAAR data to develop a literacy profile of TWI participants

because it is an assessment administered consistently across the state and plays an important role

in students’ school careers.


CHAPTER THREE: METHODOLOGY

Hypotheses

The research tests these non-directional null hypotheses:

• There will be no statistically significant difference in STAAR reading scores between

English-speaking students in TWI programs and monolingual English programs.

• There will be no statistically significant difference in STAAR reading scores between students who are ELL/LEP and those who are not ELL/LEP.

It also tests this directional hypothesis:

• There will be a statistically significant difference in STAAR reading scores between

students who qualify for free or reduced lunch and those who do not qualify. Those who

qualify for free or reduced lunch will score significantly lower than those who do not

qualify.

Research subjects

The sites for data collection were selected using “typical case sampling” (Teddlie & Yu, 2007). Typical case sampling is used to “find instances that are representative or typical of a particular type of case on a dimension of interest” (ibid., p. 80), which in this case concerns the

research focus of creating a literacy profile of English-speaking TWI students.

Setting and participants

The research sites are three elementary schools in a suburban Central Texas school

district. The STAAR data selected for this study are from students enrolled in either TWI or

mainstream classrooms who were in third, fourth, or fifth grade in the 2012-2013 school year.

The third-grade students were born between September 1, 2003 and August 31, 2004. Fourth-

grade students were born between September 1, 2002, and August 31, 2003. Fifth grade students


were born between September 1, 2001, and August 31, 2002. The data include each student’s

STAAR score, education program (TWI or non-TWI), ELL/LEP status, grade level, and

socioeconomic status. All this information is housed in Texas’ Public Education Information

Management System (PEIMS) database.

Research Design

By comparing STAAR scores of individual classrooms to each other, as well as to a

statewide baseline, this study seeks to develop a numerical literacy profile (Ford et al., 2013) of

students in two-way immersion (TWI) programs. Do the majority-language (English-speaking)

TWI participants perform at the same level as their monolingual English peers (i.e., those in a

mainstream classroom)? Do these data change depending on the grade level (third, fourth, or

fifth)? How do the majority-language participants compare to the minority-language students

who are in the TWI program? Are students from different socioeconomic backgrounds

performing at similar levels?

Variables

The dependent variable in this study is the State of Texas Assessment of Academic

Readiness (STAAR) scores. The independent variables are the educational program (TWI or

non-TWI), ELL/LEP status, and socioeconomic status. Table 2 outlines the three different

independent variables, and the dependent variable, the STAAR reading score.

Table 2: Independent and Dependent Variables

  Independent Variables                                   Dependent Variable
  Education program (TWI or monolingual/English-only)     STAAR Reading Score
  ELL/LEP status
  Socioeconomic status


STAAR Reading Score. STAAR provides three different types of scores: raw, percent,

and scale scores. For the purpose of interpretation, raw scores (the number of correct responses on a given test) are divided by the total number of items on that test to yield percent test scores, and are also converted to scale scores.

The TEA states that the scale score takes into account the difficulty of the questions, by

using the Rasch scale (2011-2012 Technical Digest, “Chapter 3, Standard Technical Processes”).

By analyzing student responses to field-tested questions, the agency determines each question’s difficulty and sets the scale accordingly (please see Appendix for the Grades 3-5 Raw Score

Conversion Tables).
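To illustrate how such a conversion table behaves, the following sketch performs the table lookup for a raw score. Apart from the 20-correct = 1331 and 21-correct = 1341 values implied by the discussion below, the entries are invented and do not reproduce the actual STAAR conversion table.

    # Hypothetical raw-to-scale lookup in the style of the Grade 3 conversion table.
    # Only the 20 -> 1331 and 21 -> 1341 entries follow figures cited in the text;
    # the other values are invented for illustration.
    RAW_TO_SCALE_GRADE3 = {18: 1311, 19: 1321, 20: 1331, 21: 1341, 22: 1352}

    def scale_score(raw_score, table):
        """Convert a raw score (number correct) to a scale score via table lookup.
        The same raw score always maps to the same scale score, regardless of
        which particular items were answered correctly."""
        return table[raw_score]

    print(scale_score(20, RAW_TO_SCALE_GRADE3))  # 1331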

What remains uncertain is the possibility of accurate reporting of the scale score if the

test is scaled this way for difficulty. For example, whether a third grade student answers the 20

most difficult questions correctly, or the 20 easiest questions correctly, he achieves a 1331 scaled

score. If a certain raw score is always equivalent to a certain scale score, no matter the difficulty

of the questions that were answered correctly, it is unclear how the scaled score does actually

adjust for difficulty level. Perhaps the scaling process assumes that a student who answers the 20

most difficult questions correctly also answers the 20 easiest questions correctly, although this is

not addressed in the Technical Digest.

The Technical Digest states that scale scores may be used “across forms and test

administrations” (p. 49), meaning that the scores are scaled to be equivalent despite irregularities

in test difficulty from administration to administration of the same content test in same grade

level. The scale is also vertical to follow an individual student’s progress. The TEA explains in

the 2013 Texas Student Assessment Program Interpreting Assessment Reports, STAAR Grades

3-8 Assessments:

     The vertical scale score … can be used to evaluate a student’s progress across grades in a particular subject. The vertical scale score can also be used to determine whether a student achieved satisfactory performance or advanced performance, to compare one student to another taking the same grade/subject area assessment, and to compare cohorts of students taking the same grade/subject area assessment in different years. (p. 2.2)

Thus, STAAR scale scores can follow an individual student’s progress, or compare the

same grade level from year to year (for example, how 3rd grade performed in 2012, and how 3rd

grade performed in 2013). However, when grouping data across grade levels/cohorts (for

example, creating a subgroup of all TWI non-ELL/LEP students data in grades 3-5 for ANOVA),

scale values are not appropriate because the range of the scale and the relationship between the

ratio score and scale score change over grade levels. Thus, an ANOVA, which is a measure that

uses means (averages), could not accurately analyze scale scores from a subgroup that includes

scores from three grade levels, as the scale score has a different corresponding raw score for each

grade level.

Thus, STAAR percent test scores were treated as ratio scores for the purpose of this

study. Data can be classified as ratio when measuring proportion, magnitude, or count and has an

absolute zero, that is, the absence of what is being measured (Stevens, 1946). In essence, there

are no negative scores. Using the percent test scores as ratio scores allows for a continuous,

consistent value that can be analyzed across grade levels.
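A minimal sketch of the conversion adopted here, assuming only that the percent (ratio) score is the raw score divided by the number of items on the test; the item totals in the example are illustrative, not the official test lengths.

    # Sketch of the score conversion used in this study: raw score -> percent ("ratio") score.
    # The ratio score is a proportion with an absolute zero, so it can be pooled across
    # grade levels, unlike scale scores whose meaning shifts from grade to grade.

    def ratio_score(raw_score, items_total):
        """Percent of items answered correctly, treated as a ratio-level value."""
        return 100.0 * raw_score / items_total

    # Hypothetical examples (item totals are illustrative only):
    print(ratio_score(30, 40))  # 75.0
    print(ratio_score(39, 52))  # 75.0 -- comparable across tests of different lengths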

Figure 2 is a chart created from the Grades 3-5 Raw Score Conversion Tables in the

Appendix. Figure 2 demonstrates the relationship between ratio score (x-axis) and scale score (y-

axis). The relationship is not constant across the results spectrum. At the outer ends of the curve (i.e., between 0 and 1 question right, or 39 and 40 questions right) the curve is steep, meaning there is a bigger jump in the scale score at these edges. In the middle of the curve the slope is shallower. For example, a third grader gains 10 scale points between getting her 20th and 21st question right, yet gains 83 scale points between getting her 38th and 39th question right. In other words, the scale score does not increase in proportion to the raw or ratio score across the results spectrum. Thus, scale scores are not appropriate for means analyses that compare across grade levels, as outliers at the ends of the spectrum exert more influence on the mean than is desired.

Figure 2: Relationship between STAAR Reading Scale Scores and Ratio Scores

The nature of the scaling process results in a lack of compatibility with means analyses

(t-tests and ANOVA) across grade levels in subgroups that include more than one grade level.

Therefore, while ratio scores may have some undesired variability in terms of test form and test


administration difficulty, they are utilized in this study because of their relatively continuous,

consistent values.

Education program. The schools in this study offer TWI as a bilingual program.

English-speaking students are enrolled in the same classroom as Spanish-speakers. Both English

and Spanish are languages of academic content instruction, although the exact amount of time

spent in each language is unknown and beyond the scope of this study.

When ELL/LEP students enroll in school, they are offered enrollment in the TWI

program. Their parents or guardians can choose to place them in a bilingual classroom (which

schools are required to offer if there are more than 20 students who speak the same home

language in a district), English as a second language (ESL) classroom, or mainstream English

classroom. Schools with bilingual programs generally encourage ELL/LEP students to enroll in

the bilingual classroom, although parents can choose not to do so.

Non-ELL/LEP students must opt into the TWI program through a sign-up process, and sometimes by lottery if interested students outnumber available spaces. Some schools

have a required screening for English-speakers, to make sure they have an adequate level of

native language proficiency to succeed in the two-way immersion classroom. Other programs do

not require screenings. Monolingual English speakers do not need to follow any opt-in procedure

to enroll in a mainstream English classroom; this is the default program for these students.

The PEIMS database has data on students’ participation in educational programs, coded

as both “Dual Language Immersion/Two-Way” and “Parent Or Guardian Has Requested

Placement Of A Non-ELL/LEP Student In The Bilingual Program” for non-ELL/LEP students,

“Parent Or Guardian Has Approved Placement Of A LEP Student in The Bilingual Program” for


ELL/LEP students, and “Student does not participate in the Bilingual Education Program” for

students in the mainstream English classroom.

ELL/LEP status. ELL/LEP students are those whose “primary language is other than

English and whose English language skills are such that [they have] difficulty performing

ordinary classwork in English” (TEA, Limited English Proficiency Initiatives, Snapshot of

ELL/LEP students in Texas). Students are identified as ELL/LEP when they enroll in school if

they do both of the following: first, indicate on the home language survey (at enrollment) that a

language other than English is spoken at home, and second, score below proficient on a norm-

referenced English Oral Language Proficiency Test (OLPT) if they are in grades K-12 and also

score below the 40th percentile on an English reading test in grades 2-12. Chapter 29 of the Texas

Education code uses the term “LEP” and Chapter 89 of the code changed it to “ELL”; to avoid

confusion this study uses both (Dr. Monica Lara, personal correspondence). ELL/LEP status is

classified in the PEIMS database under “LEP Indicator Code”. The analysis does not include

students who are in exited or monitoring status.

Socioeconomic Status (SES). Students who qualify for free or reduced lunch are

considered to be of a low socioeconomic status (SES). In Texas, this would be an income of up

to 185% of the federal poverty guidelines, or an annual income below $49,969 ($961 per week)

for a family of five. Students’ SES status is available in the PEIMS database under “Economic

Disadvantage Code”.
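The arithmetic behind these figures can be reproduced as follows; the base poverty-guideline amount of $27,010 for a family of five is an assumption inferred from the $49,969 threshold itself (49,969 ≈ 1.85 × 27,010), not a cited figure.

    # Reproduce the free/reduced-lunch income threshold arithmetic.
    # ASSUMPTION: a federal poverty guideline of $27,010 for a family of five,
    # inferred from the $49,969 figure in the text (49,969 / 1.85 is roughly 27,010).
    poverty_guideline_family_of_5 = 27_010
    threshold_annual = poverty_guideline_family_of_5 * 1.85
    print(threshold_annual)               # 49968.5 -- roughly the $49,969 cited in the text
    print(round(threshold_annual / 52))   # 961 -- the weekly figure cited in the text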

Analysis Plan

A t-test examines between-group differences of non-ELL/LEP students in TWI and mainstream classrooms. A second t-test examines between-group differences of ELL/LEP and non-ELL/LEP students in TWI programs. Finally, a two-way ANOVA examines between-group differences to determine main effects of the program (TWI or non-TWI) and/or socioeconomic status (EcoDis or non-EcoDis) on STAAR scores. This two-way ANOVA is done first for only non-ELL/LEP students, and then for students of all language backgrounds. These analyses are done separately for grades 3, 4, and 5, where appropriate, and the findings are compared to determine whether between-group differences are equivalent across grade levels.

Since multiple t-tests and ANOVAs are conducted, a Bonferroni adjustment is used to reduce Type I error, setting the a priori alpha level at p < .0125, which is the original alpha level (p < .05) divided by the number of tests (4) (Bland, 1995). To be considered significant, an analysis must yield a p-value below the Bonferroni-adjusted level of .0125.
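A minimal sketch of this adjustment, with an invented example p-value to show how the threshold is applied; the function and names are illustrative, not part of the study’s procedure.

    # Bonferroni adjustment: divide the family-wise alpha by the number of planned tests.
    def bonferroni_alpha(alpha=0.05, n_tests=4):
        return alpha / n_tests

    adjusted = bonferroni_alpha()          # 0.05 / 4
    print(adjusted)                        # 0.0125

    # A result is treated as significant only if its p-value falls below the adjusted level.
    p_value = 0.004                        # hypothetical observed p-value
    print(p_value < adjusted)              # True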

Justification of Sample Size

The schools in this study were chosen using the purposeful sampling strategy known as “typical

case sampling” of elementary schools in the greater San Antonio area. It is difficult to calculate

an appropriate sample size because the Texas Assessment Agency does not appear to publish the

number of STAAR test takers who are enrolled in two-way dual language programs. Texas

public school educational data do show the number of test taking students who are of limited

English proficiency (ELL/LEP), those who are in a low SES bracket, and those who are

migrants, among other data categories. However, to determine sample size for this study, we

must estimate the number of two-way dual-language participants in the state.

As of 2005, there were 531 two-way immersion programs in the state of Texas, according

to the Texas Two-Way/Dual Language Consortium. Eighteen of these are non-elementary (6th-8th

grade), bringing the total to 513. The average elementary school size in the state of Texas is 549

students (U.S. Department of Education). This study only discusses grades three through five, so

the number of students is divided in half because elementary schools usually house six grade


levels including kindergarten. Thus, our estimate of the population of two-way dual-language

participants is 513 x 549 / 2 = 140,819 students. Using the Raosoft sample calculator, with a 5%

margin of error and a 95% level of confidence, my sample size needs to be at least 384 students.

A second calculation using the G*Power sample size calculator (reference) reveals that with

input parameters of 0.25 effect size, 0.05 error probability, and 6 sample groups, my sample size

needs to be 400 students. The actual sample size of 810 students is almost twice as large as the sample sizes the calculators indicate.
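The population estimate and the Raosoft-style sample-size figure can be sketched as follows, using the standard formula for estimating a proportion in a finite population; this reproduces the reasoning above, not the calculator’s internal code.

    import math

    # Estimated population of grade 3-5 two-way dual-language participants in Texas.
    programs = 531 - 18                           # elementary TWI programs
    avg_school_size = 549                         # average Texas elementary enrollment
    population = programs * avg_school_size / 2   # grades 3-5 as roughly half of a K-5 school
    print(population)                             # 140818.5 -- reported as ~140,819 in the text

    # Sample size for a proportion: n0 = z^2 * p(1-p) / e^2, with finite-population correction.
    z, p, e = 1.96, 0.5, 0.05                     # 95% confidence, worst-case proportion, 5% margin
    n0 = z**2 * p * (1 - p) / e**2
    n = n0 / (1 + (n0 - 1) / population)
    print(math.ceil(n))                           # 384, matching the Raosoft estimate cited above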


CHAPTER FOUR: FINDINGS

This project seeks to create a literacy profile for non-ELL/LEP students in TWI programs

by examining between-group differences of non-ELL/LEP students in mainstream English

programs and ELL/LEP students in TWI programs. It also examines the effects of low

socioeconomic status. Specifically the research questions were: Do majority-language TWI

participants perform at the same level on a state reading assessment as their non-ELL/LEP peers

in a mainstream English classroom? How do the non-ELL/LEP students’ reading scores compare

to the ELL/LEP students in the TWI program? Are students from different socioeconomic

backgrounds performing at similar levels? The findings are presented in this order.

Program Differences

The first analysis looks at between-group differences on STAAR ratio scores of non-

ELL/LEP (native English-speaking) students in TWI programs compared with their non-

ELL/LEP peers in monolingual English programs in third, fourth, and fifth grades. Total students

in this analysis are n=761, with n=63 students in TWI and n=698 students in mainstream

classrooms. This t-test does not include students who are ELL/LEP or in monitoring status. An

independent t-test was run with education program as independent variable and ratio score as

dependent variable. Levene’s test shows that equal variances can not be assumed (p. < 01), so the

appropriate t-test results are used. The results, as shown in Table 3, are significant (p < .001)

between the two programs with TWI performing at a higher mean ratio score. This result

supports a rejection of the null hypothesis and suggests more than a 6-point mean ratio score

difference between the two groups when considering all grade levels, with majority-language

speaking TWI students’ mean ratio score at 78.87 and majority-language speakers in mainstream

classrooms at 72.49.
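A sketch of this style of analysis using SciPy, assuming a pandas DataFrame named df with RatioScore and TwoWayDual columns (names chosen to mirror the labels in Table 3); it illustrates the procedure rather than reproducing the study’s actual analysis.

    import pandas as pd
    from scipy import stats

    # df is assumed to hold one row per student with columns "RatioScore" and "TwoWayDual".
    def compare_programs(df):
        twi = df.loc[df["TwoWayDual"] == "DL Imm/2w", "RatioScore"]
        mainstream = df.loc[df["TwoWayDual"] == "No", "RatioScore"]

        # Levene's test for equality of variances decides which t-test result to report.
        levene_stat, levene_p = stats.levene(twi, mainstream)
        equal_var = levene_p >= 0.05

        # Student's t-test if variances are equal, Welch's t-test otherwise.
        t_stat, p_value = stats.ttest_ind(twi, mainstream, equal_var=equal_var)
        return levene_p, t_stat, p_value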


Table 3: T-Test Results for Non-ELL/LEP students in TWI and English classrooms

(independent variable: education program, dependent variable: ratio score)

Group Statistics (RatioScore)

  Group        N     Mean    Std. Deviation   Std. Error Mean
  No           698   72.49   17.124%          0.648%
  DL Imm/2w    63    78.87   11.704%          1.475%

Independent Samples Test (RatioScore)

  Levene's Test for Equality of Variances: F = 11.893, Sig. = .001

                                 t        df       Sig. (2-tailed)   Mean Diff.   Std. Error Diff.   95% CI Lower   95% CI Upper
  Equal variances assumed       -2.898    759      .004              -6.384%      2.203%             -10.709%       -2.060%
  Equal variances not assumed   -3.964    87.981   .000              -6.384%      1.611%             -9.585%        -3.183%


Figure 3: Bar Chart of mean ratio scores of non-ELL/LEP students in TWI and

English (mainstream) classrooms, Grades 3-5

Figure 3 demonstrates mean ratio scores of non-ELL/LEP students in TWI and non-TWI

programs across grade levels. In third grade, the between group mean difference is 6 points, in

fourth grade 5 points, and fifth grade 7 points. While mainstream English classrooms appear to

gain 2 points from third through fifth grade, TWI students gain 7 points across the same years.

The number of students in both groups who met the passing standard and the advanced

standard are presented in Table 4. The percentage of TWI participants who met the passing and

advanced standards appears to be roughly 10 or more percentage points higher than that of the non-TWI students,

with the exception of one group: a greater percentage of non-TWI students met the advanced

standard in Grade 4 than did TWI students. In Grade 5, the percentage of TWI students who met

the advanced standard is approximately twice the percentage of non-TWI students.


Table 4: Percentage of students meeting passing and advanced standards and mean ratio

scores, non-TWI/non-ELL/LEP and TWI/non-ELL/LEP

Non-TWI and Non-ELL/LEP
            N     Met Passing Standard   Met Advanced Standard   Mean Ratio Score
  Grade 3   231   87%                    25%                     70.64
  Grade 4   210   83%                    32%                     73.48
  Grade 5   257   84%                    28%                     73.35

TWI and Non-ELL/LEP
            N     Met Passing Standard   Met Advanced Standard   Mean Ratio Score
  Grade 3   24    96%                    38%                     76.67
  Grade 4   22    100%                   27%                     77.55
  Grade 5   17    94%                    59%                     83.71

Language Background Differences

The next independent t-test analysis explores the performance of ELL/LEP and non-

ELL/LEP students in TWI classrooms, with language background as the independent variable

and ratio score as the dependent variable. Only students who are in the TWI program are

considered. This analysis includes students who took the STAAR in English or Spanish. An

assumption is made that these are equivalent tests. Total students in this analysis are n=109, with

n=46 ELL/LEP students and n=63 non-ELL/LEP students.

In examining the results, Levene’s test shows that equal variances cannot be assumed (p < .05), so the appropriate t-test results are used. The independent t-test (Table 5) results

appear to show a between-group difference (p < .001) between the two groups of language

speakers. This supports a rejection of the null hypothesis. Overall this suggests more than a 28-

point mean difference between the two groups (non-ELL/LEP mean ratio = 78.87 and ELL/LEP

speakers’ mean ratio score = 50.87.)


Table 5: T-Test Results for ELL/LEP and Non-ELL/LEP students in TWI program

(independent variable: ELL/LEP status, dependent variable: ratio score)

Group Statistics (RatioScore)

  Group      N    Mean    Std. Deviation   Std. Error Mean
  No         63   78.87   11.704%          1.475%
  ELL/LEP    46   50.87   16.601%          2.448%

Independent Samples Test (RatioScore)

  Levene's Test for Equality of Variances: F = 7.825, Sig. = .006

                                 t        df       Sig. (2-tailed)   Mean Diff.   Std. Error Diff.   95% CI Lower   95% CI Upper
  Equal variances assumed       10.333    107      .000              28.003%      2.710%             22.631%        33.376%
  Equal variances not assumed    9.800    76.298   .000              28.003%      2.857%             22.313%        33.694%


Figure 4: Estimated Marginal Means of Ratio Score of Two-Way Immersion ELL/LEP and

non-ELL/LEP Students in Grades 3-5

Figure 4 demonstrates visually the mean ratio scores for the two language groups across

grade levels. It shows what appear to be substantially greater mean ratio scores for the non-

ELL/LEP students than the ELL/LEP students in the TWI classrooms. The English-speaking

group shows mean ratio score gains in each grade level, with approximately 1 point growth

between third and fourth grades, and 6 points growth between fourth and fifth grade. The

ELL/LEP group shows a decrease of 2 points in mean ratio scores between third and fourth

grade. The ELL/LEP group is substantially lower than the non-ELL/LEP group. Data on

ELL/LEP fifth grade students are provided, but there are too few in the analysis to draw

conclusions (n=2). The data for the two t-tests are presented in Figure 5.


Table 6: Percentage of students meeting passing and advanced standards and

mean ratio scores, TWI and non-ELL/LEP and TWI and ELL/LEP

The number of students in both groups who met the passing standard and the advanced

standard are presented in Table 6. The percentage of non-ELL/LEP students who met the

passing standard is at least 44 percentage points higher than ELL/LEP students. A full 100% of

the non-ELL/LEP students met the passing standard in fourth grade, while only 37% of

ELL/LEP students did. It appears that 38% of third grade non-ELL/LEP students and 27 of

fourth grade non-ELL/LEP students met the advanced standard while only 4% of third grade

ELL/LEP students and 0% of fourth grade ELL/LEP students did. Fifth grade TWI/LEP results

are provided but cannot be included in the comparison because insufficient fifth grade ELL/LEP

student scores are available (n=2).

In summary, the percentage of students passing in each grade in TWI and ELL/LEP, TWI

and Non-ELL/LEP, and Non-TWI and Non-ELL/LEP are presented visually in Figure 5. (Note: there are insufficient data on fifth grade ELL/LEP students, with n=2.)

TWI and Non-ELL/LEP
            N    Met Passing Standard   Met Advanced Standard   Mean Ratio Test Score
  Grade 3   24   96%                    38%                     76.67
  Grade 4   22   100%                   27%                     77.55
  Grade 5   17   94%                    59%                     83.71

TWI and ELL/LEP
            N    Met Passing Standard   Met Advanced Standard   Mean Ratio Test Score
  Grade 3   35   52%                    4%                      52.04
  Grade 4   19   37%                    0%                      48.95
  Grade 5   2    50%                    0%                      50.67


Figure 5: Summary of Program and Language Factors:

Percentage students meeting passing standard, Grades 3-5

Socioeconomic Background Differences

This section presents two-way ANOVA comparisons that analyze the independent

variables of socioeconomic status and education program, with a dependent variable of STAAR

ratio score. There are two analyses: the first ANOVA includes only non-ELL/LEP students (native majority-language speakers), and the second ANOVA includes both ELL/LEP

and non-ELL/LEP students.

Non-ELL/LEP Only. This ANOVA analysis is of between-group differences in Non-

ELL/LEP students from economically disadvantaged (EcoDis) and non-economically

disadvantaged (non-EcoDis) backgrounds. The total number of ratio scores analyzed is n =761.

Table 7 shows group sizes (n) as well as means and standard deviations by group. English-

speaking economically disadvantaged students in the TWI program are the smallest group, with

n = 10, but this group is large enough to merit analysis (n > 5).


Table 7: Descriptive Statistics for non-ELL/LEP Students (independent variables:

EcoDis/TWI, dependent variable: ratio score)

The two-way ANOVA (Tables 8 and 9) appears to show that for non-ELL/LEP students,

only socioeconomic status is significant after Bonferroni’s correction. Being economically

disadvantaged (EcoDis) is a statistically significant factor with p <.001, and partial eta-squared,

ηp2 = .023, which can be interpreted as having a small effect size (Grimm & Yarnold, 2003;

Stevens, 1996) and an observed power of 0.989. The interaction effect of economically

disadvantaged and TWI participation was not statistically significant (p = 0.551). The degree of

significance and the strength of the power support the acceptance of the directional hypothesis

that there is a significant difference between the performance of low-SES students and non-low-

SES students on the STAAR reading exam, when the scores of only non-ELL/LEP students are

taken into consideration.
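A sketch of this kind of two-way ANOVA using statsmodels, again assuming a DataFrame df with RatioScore, EcoDis, and TwoWayDual columns (the names mirror the tables; this is an illustration, not the study’s code). Partial eta-squared for a factor is its sum of squares divided by that sum plus the residual sum of squares.

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    # df is assumed to have one row per student with columns:
    #   RatioScore (percent correct), EcoDis (yes/no), TwoWayDual (yes/no).
    def two_way_anova(df):
        model = ols("RatioScore ~ C(EcoDis) * C(TwoWayDual)", data=df).fit()
        table = sm.stats.anova_lm(model, typ=2)          # Type II sums of squares
        # Partial eta-squared: SS_effect / (SS_effect + SS_residual)
        ss_resid = table.loc["Residual", "sum_sq"]
        table["partial_eta_sq"] = table["sum_sq"] / (table["sum_sq"] + ss_resid)
        return table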


Table 8: Between-group effects (ANOVA) of non-ELL/LEP students (independent

variables: EcoDis and TWI, dependent variable: ratio score)

Table 9: Partial Eta Squared and Observed Power of non-ELL/LEP students (independent

variables: EcoDis and TWI, dependent variable: ratio score)

The mean ratio scores of the four groups (non-ELL/LEP only) are as follows: non-

TWI/non-EcoDis mean ratio score 76.19, non-TWI/EcoDis mean ratio score 62.67, TWI/non-

EcoDis mean ratio score 80.49, and TWI/EcoDis mean ratio score 70.30. These data appear to

demonstrate that TWI students who come from economically disadvantaged backgrounds score approximately 7 points higher than their EcoDis counterparts in the mainstream classroom. These data also appear

to show that the EcoDis/TWI (mean ratio score 70.30) group gains more than 50% of the

difference in mean ratio scores between their non-TWI/EcoDis counterparts (mean ratio score

62.67) and non-TWI/non-EcoDis students (76.19) by participating in the TWI program (76.19 –

62.67 = 13.52 points difference in mean ratio scores between EcoDis and non-EcoDis students in

mainstream classrooms; 50% of 13.52 is 6.76 points. The EcoDis/TWI students appear to score

70.30 – 62.67 = 7.63 points higher than their EcoDis peers in the mainstream classroom). These

data are presented visually in Figure 6.

Figure 6: Estimated Marginal Means of Ratio Score of non-ELL/LEP

(independent variables: EcoDis and TWI, dependent variable: ratio score)


The ANOVA is followed up by a post-hoc t-test which confirms the results of the

ANOVA by finding that socioeconomic status is statistically significant in the analysis of non-

ELL/LEP students’ performance.

Table 10: Post-hoc t-test of non-ELL/LEP students (EcoDis is independent variable, ratio

score is dependent variable)

ELL/LEP and Non-ELL/LEP Students (All Students). This section details a two-way

ANOVA of between-group differences in students from all language backgrounds from

economically disadvantaged (EcoDis) and non-economically disadvantaged (non-EcoDis)

backgrounds. The total number of scores analyzed is n=826. Table 11 shows group sizes (n) as well as means and standard deviations.

Table 11: Descriptive Statistics for All Students (independent variables: EcoDis/TWI,

dependent variable: ratio score)


The two-way ANOVA (Tables 12 and 13) appears to show that for all students, TWI

participation is not statistically significant, while economic disadvantage and the interaction of

economic disadvantage and TWI program participation are statistically significant. TWI

participation (TwoWayDual) has a p = .344 and an observed power of .157. Being economically

disadvantaged (EcoDis) is a main effect and is a statistically significant factor with p < .001, and

partial eta-squared, ηp2 = .134 (Grimm & Yarnold, 2003; Stevens, 1996), which can be

interpreted as having a medium effect size, and an observed power of 1.000. The interaction of

socioeconomic status and TWI participation was also statistically significant (p = 0.006) with

a negligible effect size (ηp2 = .009) and an observed power of 0.791. The degree of

significance, the eta, and the strength of the power support the acceptance of the directional

hypothesis that there is a significant difference between the performance of low-SES students

and non-low-SES students on the STAAR reading exam, when the scores of all students are

taken into consideration.


Table 12: Between-group effects (ANOVA) of All Students

(independent variables: EcoDis and TWI, dependent variable: ratio score)

Table 13: Partial Eta Squared and Observed Power of All Students

(independent variables: EcoDis and TWI, dependent variable: ratio score)

The mean ratio scores of the four groups (all language groups included) are as follows:

non-TWI/non-EcoDis mean ratio score 76.19, non-TWI/EcoDis mean ratio score 62.62,

TWI/non-EcoDis mean ratio score 79.11, and TWI/EcoDis mean ratio score 56.67. This two-way

ANOVA, which includes both ELL/LEP and non-ELL/LEP students, appears to show that non-

economically disadvantaged students perform approximately 3 points higher on mean ratio


scores in TWI than non-TWI, while economically disadvantaged students perform approximately

6 points lower in TWI than non-TWI settings. This is demonstrated visually in Figure 7.

Figure 7: Estimated Marginal Means of Ratio Score of All Students

(independent variables: EcoDis and TWI, dependent variable: ratio score)

The ANOVA is followed up by a post-hoc t-test (Table 14) which confirms the results of

the ANOVA by finding that socioeconomic status is statistically significant in the analysis of all

students’ performance.


Table 14: Post-hoc t-test of all students (EcoDis is independent variable, ratio score is

dependent variable)

These ANOVA analyses would lead to the conclusion that the literacy rates of

economically disadvantaged students, as measured by the STAAR test, are lower in TWI

programs than in mainstream programs. This is in contrast with the earlier, similar ANOVA that

analyzed only non-ELL/LEP students. This is likely due to a conflation of language and

socioeconomic factors, which is discussed in the next section.

Limitations

This study draws on a sample of n = 810 students, the majority of whom were non-ELL/LEP, from which one cannot make broad generalizations. Additionally, these data are from a single,

suburban school district; without data from urban and rural sites, it is even less generalizable. I

could not locate data on ELL/LEP students who are in mainstream English classrooms to run a

two-way ANOVA using ELL/LEP status and educational program as independent variables,

which would have provided deeper insight into the effect of these factors by themselves and also

in interaction.

This study utilizes ratio scores, which do not take test form or test administration

difficulty into consideration. The overall potential for variability in test difficulty between test

forms and between test administrations is unknown and is not clearly stated in the Technical

Digest.


To truly create a “literacy profile”, it would be ideal to include a more holistic assessment

of literacy, including reading fluency, vocabulary, comprehension, engagement, grammar,

writing, etc. STAAR tests are relatively consistently administered and so make ideal data for

comparisons; however, the literacy performance of a student cannot be fully measured by this

single test.

The education program variable can vary widely, even under an individual category. For

example, the PEIMS code “Dual Language Immersion/Two-Way” does not provide information

about the actual instruction in the classroom. It could mean 90% of the instruction is in Spanish,

or 50%, 20%, or none. Without qualitative data such as observations or interviews, it is

impossible to know how the program is being implemented.


CHAPTER FIVE: DISCUSSION

This section offers analysis of the findings in the previous chapter. To review, the project

goal was to test the non-directional null hypotheses that state that there will be no significant

difference in STAAR reading ratio scores between non-ELL/LEP students in TWI and non-TWI

settings, and also no significant difference between TWI participants who are ELL/LEP and

those who are not ELL/LEP. It also tested the directional hypotheses that stated that there would

be a statistically significant difference between TWI students who qualify as economically

disadvantaged and those who do not qualify, and those who qualify would score significantly

lower than those who do not qualify. The discussion is organized by the hypotheses.

Hypothesis 1. There will be no statistically significant difference in STAAR reading ratio

scores between English-speaking students in TWI programs and monolingual English programs.

This hypothesis concerns students who are non-ELL/LEP, and compares their

performance in TWI and non-TWI programs. This analysis appears to show that the literacy rates

of English-speaking participants in TWI are not impeded by the study and use of two languages;

rather, their literacy achievement appears to be higher than those who are in monolingual

settings.

In almost every category of analysis, TWI participants appear to have outperformed the

students in the mainstream English classroom. Over the three grade levels, these students had

mean STAAR ratio scores that were 6 points higher than the students in the mainstream English

program; in third grade the mean ratio scores for the TWI group were 6 points higher, in fourth

grade 5 points higher, and in fifth grade 7 points higher. The percentage of students passing in

the TWI group was at least 10% more than the non-TWI group in each grade level. The

percentage of students in TWI who met the advanced standards was 50% greater than the non-

61

TWI group in third grade, and twice as great in fifth grade (although it was slightly less in fourth

grade). The t-test indicates a statistically significant finding. The finding leads to a rejection of

this null hypothesis.

Although a possible explanation for the between-group difference is merely hypothetical

for this project, it deserves mention. In the existing literature there is evidence that the parents of

non-ELL/LEP students in TWI programs can be active advocates for their children’s success,

citing concerns among school personnel that “the two-way immersion classes have more than

their fair share of supportive middle class parents” (Scanlan & Palmer, 2009, p. 402). In other

words, parents from English-speaking homes must opt for their child to be in this special

language enrichment program. Opting-in requires a certain level of sophistication including

familiarity with the educational system and how to navigate the paperwork required for

participation, among other attributes. If parents who have the resources to elect for TWI do so –

whether for financial, cultural, or other reasons – then fewer “supportive” parents are involved in

the mainstream English classes. If parental support is a key element for student success, then this

may be a possible explanation for the TWI students’ higher scores. Another possible explanation

is that non-ELL/LEP TWI students are able to access higher cognitive thinking and language

skills due to exposure to two languages, although this benefit usually takes more time to realize

(Thomas & Collier, 2002).

Hypothesis 2. There will be no significant difference in STAAR ratio scores between TWI

program participants who are ELL/LEP students and those who are not ELL/LEP.

This hypothesis concerns only students who are in the TWI programs, and includes

students who qualify as ELL/LEP and non-ELL/LEP. These data appear to indicate that non-

ELL/LEP students substantially outperform their ELL/LEP classmates in TWI settings. Total


mean STAAR ratio scores were 28 points higher for the non-ELL/LEP group than the ELL/LEP

group. Whereas virtually all of the non-ELL/LEP students in the TWI program met the passing

standard, approximately half of ELL/LEP students did so in third grade. Fewer than four out of

ten ELL/LEP students met the passing standard in fourth grade. Of all 54 ELL/LEP students in

the TWI program whose data were analyzed, only 1 of these students met the advanced standard,

compared with approximately 40% of all non-ELL/LEP students in the TWI program (n=25).

The t-test indicates statistically significant results and supports the rejection of the null

hypothesis.

There are several possible factors that lead to the results in the findings. For one, the

design of the program at this district is unknown. It is possible that the program uses very little

Spanish instruction, which might lead to the results seen here. Instructional time spent in the

native language (Spanish) can lead to more academic success for ELL/LEP students. The

program may be a 50/50 model, or a 90/10 model, or something else. Additionally, there is the

question of program fidelity. It can be the case that a program designed to teach the majority of the content in Spanish does not, in practice, do so. Some teachers or administrators choose to

implement the language proportions in different ways than the program is designed.

Another possible factor is of the ELL/LEP students’ background. If the community is

composed mostly of recent immigrants, it is possible that they are unfamiliar with the culture of

U.S. schools and the culture of standardized testing. Such lack of familiarity would impede these

students’ ability to perform at a high rate on the STAAR test.

Yet more factors worth considering concern the teachers’ backgrounds, the curriculum,

and instructional materials. Does the teacher have adequate training and understanding to

successfully implement the dual-language program for minority-language speakers’ success? Is


her philosophy one of inclusiveness? Does the curriculum adequately address the learning needs

of students from diverse backgrounds? Are the lessons and instructional materials designed in a

way that minority-language students can sustain meaningful engagement?

These are some of many possible factors for the substantial underperformance of the

ELL/LEP students in this study. However, because there are no classroom observations or other

qualitative data available, it is not possible to indicate which may be contributing to this

discrepancy.

Hypothesis 3. There will be a statistically significant difference in STAAR ratio scores

between students who qualify for free or reduced lunch and those who do not qualify. Those who

qualify for free or reduced lunch will score significantly lower than those who do not qualify.

This portion of the analysis focuses on the between-group differences for students who

qualify as economically disadvantaged and those who do not. The hypothesis postulates that

there will be a statistically significant difference between the ratio scores of the two groups

(EcoDis and non-EcoDis). The data do appear to lead to acceptance of this hypothesis; however,

the findings appear to indicate that there is a substantial difference depending on whether a

student is both economically disadvantaged and ELL/LEP or just economically disadvantaged.

Non-ELL/LEP students only. As discussed in the findings, the TWI program appears to

allow students from economically disadvantaged homes to make considerable gains in reading

achievement compared with their peers in non-TWI settings. However, this may only extend to

non-ELL/LEP students. It is important to remember that only approximately 16% of the non-

ELL/LEP students in the TWI program are students from poverty (n=10 of a total of 63 non-

ELL/LEP students in TWI). There are three grade levels at three schools that contributed data to

this project, thus there is an average of approximately one economically disadvantaged non-


ELL/LEP student per classroom. Meanwhile, more than 1 out of 4 (27%) of students in the

mainstream English program (n =191 of 698 total) are classified as economically disadvantaged.

The ten students in the study sample who are economically disadvantaged, non-

ELL/LEP, TWI participants had a mean ratio score of 70, which as mentioned earlier, makes up

about 50% of the “distance” between their economically disadvantaged and non-economically

disadvantaged peers in the mainstream English classroom. These ANOVA results would lead to

the conclusion that TWI program participation does not inhibit literacy achievement (STAAR

ratio score) for non-ELL/LEP economically disadvantaged students. Importantly, the findings

reveal that the TWI program appears to improve their literacy achievement.

In terms of conceivable factors that might be contributing to these results, it is possible

that these economically disadvantaged non-ELL/LEP students are influenced by classroom peers

of a higher socioeconomic status who may be more readily able to navigate the classroom

environment. Through their interaction, they are able to take advantage of the enrichment

instruction. However, as mentioned earlier, the size of this group is small (n=10), so it is

possible that the data are misleading due to sample size.

Students from all language backgrounds in TWI programs. This section focuses on all

students who participate in the TWI program, and on economic disadvantage as a factor. The

two-way ANOVA shows that TWI participation does not have a statistically significant main effect on ratio scores, but economic disadvantage does. The interaction between TWI participation and EcoDis status also appears to be statistically significant.
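The following minimal sketch (not the author’s analysis code) shows how a two-way ANOVA of this kind could be specified in Python with pandas and statsmodels; the variable names ratio_score, twi, and ecodis, and the data values, are hypothetical stand-ins for the study’s STAAR data set.

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# One row per student: the STAAR ratio score plus two categorical factors,
# program placement (TWI vs. mainstream) and economic-disadvantage status.
# Values below are illustrative only.
df = pd.DataFrame({
    "ratio_score": [70, 85, 92, 60, 78, 95, 66, 88],
    "twi":         ["yes", "yes", "no", "no", "yes", "no", "yes", "no"],
    "ecodis":      ["yes", "no", "yes", "no", "yes", "no", "no", "yes"],
})

# Fit a model with both main effects and the TWI x EcoDis interaction,
# then produce the two-way ANOVA table (Type II sums of squares).
model = ols("ratio_score ~ C(twi) * C(ecodis)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))

A significant row for the C(twi):C(ecodis) term in the resulting table corresponds to the interaction effect described above.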

Importantly, 85% of the economically disadvantaged students in the TWI program are

also ELL/LEP (n=57 of 67 total EcoDis in TWI), while 15% of them are non-ELL/LEP (n=10 of

67 total EcoDis in TWI). In this particular analysis, language is not taken into consideration; only program status and economic status are considered. As such, it appears that for economically disadvantaged students, program placement may affect test performance, as indicated by the diverging lines in Figure 7.

As shown in the previous section, non-ELL/LEP students from economically

disadvantaged homes do appear to show gains in TWI when compared to peers in the

mainstream English classroom. However, the second ANOVA (all language backgrounds)

appears to show that the impact of TWI on students from poverty is negative. It can be inferred

that this discrepancy is due to the conflation of language background and socioeconomic

background, since the first ANOVA appears to show the positive gains for the non-ELL/LEP

group. Since 85% of the economically disadvantaged students in the TWI program are ELL/LEP,

and as shown earlier their group’s results are the lowest of any demographic presented here, the

main effect for this group appears to be language background. Unfortunately, this effect does not show the positive impact one would hope to see for language-minority students, for whose benefit bilingual education programs were originally developed. However, caution must again be taken in the interpretation of these results, as nothing is known in this study about

classroom instruction, students’ age of arrival, etc.


CHAPTER SIX: IMPLICATIONS

TWI programs in the U.S. have had to navigate the current political climate, standardized

testing pressures, the continued marginalization of minorities, and a lack of professional development. This chapter explores the research on issues of concern about the TWI model. Of

foremost concern is that the benefits of the program should extend equally to both minority and

majority language speakers. In theory, this is happening, but the “sociopolitical context of ELL

schooling and differences in acquisition contexts” undermine the possibility for equitable

education, even in a program specifically designed to be equitable for all participants (De Jong &

Howard, 2009, p. 86). There is evidence for these concerns in this project’s findings.

Continued marginalization. By integrating students of different cultures and languages,

dual-language programs have tried to neutralize the power struggle that has traditionally defined

their relationship. This integration of minority and majority speakers is widely held to be a

benefit of two-way immersion programs -- academically, linguistically, and culturally. However,

these benefits are generally assumed and not explicitly proven (De Jong & Howard, 2009).

Dominant-culture (Anglophone) families in TWI programs at times appropriate minorities’

linguistic and cultural assets, as described by Pimentel (2011): “[T]he presence of language-

majority students in a dual-language program, students who likely see Spanish as a commodity,

transforms the outlook of bilingualism within the larger school setting” (p. 348). This section

describes the literature on marginalization in dual-language programs.

Cultural marginalization. Although one of the main goals of TWI is cross-cultural

competency, there is evidence that schools still “reflect the societal power structure …[and]

reinforce the lies, distortions, and occasional truths upon which national and dominant-group

cultural identities are built” (Cummins in Valdés, 2011). Pimentel (2011) explains that “dual-


language programs may operate from a Whiteness frame of reference, wherein Latina/o students’

language and cultural practices come to be perceived in positive terms only because they serve as

commodities that can be enjoyed by White, English-speaking students” (p. 351). The fact that school celebrations draw exclusively from Hispanic culture “exoticizes the Latina/o students’

cultural practices” and suggests that they exist for the purpose of being “consumed by the White

families” (p. 351).

Some programs maintain implicit “gate-keeping” that effectively excludes minority

families from participating in TWI programs. Scanlan and Palmer (2009) describe a scenario in

which a school sets a deadline to sign up for the dual-language program lottery several months in

advance. The white middle-class families in the area are highly aware of this deadline. Minority

families (in this study, African-Americans) may not socialize in the same circles or use the same

media as those who know about the deadline, due to “institutional and cultural barriers that go

far beyond the reaches of this school” (p. 400). Thus, these minority families are effectively

unable to enroll their children in the program. Similarly, Dorner (2011) found that TWI

information was often disseminated by computer (website, listserv, etc.), marginalizing those

who do not have access to technology.

TWI programs are extending learning opportunities for the dominant majority (white upper- and middle-class families), but questions remain about the impact of TWI on the English-speaking children who are “left behind” in the monolingual classrooms. Because TWI programs draw in many “supportive middle class parents,” the monolingual English classrooms may be negatively impacted: the only students “left” are those whose parents have not opted to

participate in TWI. Such situations “heighten awareness in these school communities around


discrete dimensions of diversity while muting discourse around others” (Scanlan & Palmer,

2009, p. 412).

Dorner (2010) researched a Chicago area school district that was starting a dual-language

program. Her study explores “which voices are valued” in the debate over which elementary

school would offer the program (p. 578). In her analysis, Dorner found that low-income Mexican

families’ voices were marginalized and their desire for their neighborhood school to house the

dual language program was not heard.

Linguistic marginalization. TWI seems to offer the best of both worlds for minority

speakers – an environment in which to acquire literacy in the native language, and access to

school and community support (Valdés, 2011). Several researchers have examined who is able to

access linguistic benefits from cultural integration in TWI settings. De Jong & Howard (2009)

determined that the benefits of such student integration have largely been taken for granted, and

that successful outcomes are not necessarily guaranteed for all students in these programs. The

perceived equity between languages (as in, “we’re all second language learners here”) glosses

over actual unequal access to power (Fitts, 2006).

Because English is the dominant language both inside and outside of school, minority

speakers become fluent in English more quickly than the majority speakers become fluent in the

minority language. In the TWI classroom, students are likely to select the language of most

efficient communication, which is often English even if they are working academically in

Spanish (De Jong & Howard, 2009; Broner & Tedick, 2011). This asymmetrical use of language

is not just related to proficiency – it is also tied to the perception of English as the language of

power (Goldenberg, 2006; De Jong & Howard, 2009). In Spanish classrooms, teachers express

concern that they need to be “Spanish police”, strictly separating students’ languages (Fitts,


2006). The way many Spanish-speaking students actually talk – codeswitching, using borrowed

words – is not honored.

Valdés (2011) has shown that some teachers have different expectations for students

depending on language background. For instance, English speakers learning Spanish are

“applauded” in TWI programs, while it is “expected” that Spanish speakers learn English;

therefore Spanish-speakers do not receive as much praise. Language minority students are

expected to be language experts who can model the target language for the majority speakers,

even if the minority speakers are not familiar with the school language register (De Jong &

Howard, 2009). Teachers may set high expectations for these students without appropriately

scaffolding for them to be academic language models.

Academic rigor. Valdés (2011) found the Spanish used in the TWI classroom to be

“watered down” to accommodate the English-speakers. Teachers use short sentences, basic

comprehension questions, and “impoverished teacher input, teacher-student interaction,

questioning and lesson pacing as a result of accommodating for the presence of (beginning)

second language learners” (De Jong & Howard, 2009, p. 89). Lindholm-Leary (2001) found the same pattern with teacher questioning: 64% of questions required factual recall and only 36% were higher-order questions. Fitts (2006) explains that Spanish classrooms “tend to promote a teacher-controlled, Initiation-Response-Evaluation (I-R-E)” participation format, which is “hardly known for being student-centered or democratic” (p. 354).

On the other hand, the English TWI classrooms focus on more complex tasks. This is a

problem because “what is novel to native English speakers acquiring Spanish is not a learning

experience for native Spanish speakers” (De Jong & Howard, 2009, p. 88). If the Spanish being

used in TWI is being modified for English speakers, ELLs will have difficulty acquiring the


requisite “cognitive academic proficiency” needed for academic success (Valdés, 2011).

Spanish and English are neutral in this regard; neither lends itself “naturally” to certain classroom practices. Further research is needed on the teachers, the curriculum, and the classroom and school interactions that produce the less-than-ideal learning environments described here.

Even collaborative work, one of the cornerstones of TWI programs, can undermine

ELLs’ academic products, due to constant interruptions for translations by the majority-speakers,

which can “wear their partners out” (De Jong & Howard, 2009, p. 90). One suggestion De Jong

and Howard offer to remedy this situation is flexible grouping; sometimes this means

homogeneous grouping that allows minority speakers to engage in more challenging literacy

activities. For majority students, certain structures can help: teaching language chunks (social

language), teacher modeling of target language, and sentence starters. Minority speakers benefit

from explicit teaching of how to be a language model.

Teacher agency. Teachers are agents of social change, as they negotiate top-down

language policy on the ground level (Paciotto & Delany-Barmann, 2011). Teacher agency is

individual action that either advances or undermines policy implementation. This section

presents two situations from the literature: one in which agency is noticeably absent, and one in which teacher agency helped to advance bilingualism in a community.

Warhol and Mayer (2012) report on the dissolution of a TWI program in a large, poor,

inner-city elementary school in urban Connecticut. The school was underperforming on NCLB criteria, which brought “comprehensive school reform” (CSR) to the school. Bilingual teachers said the CSR “coaches” demanded that they not teach in Spanish, but the coaches claimed they had not. In this

case, explicit state language policy supports bilingual education, but the implicit local policy was

that bilingual education is ineffective and not to be used. Teachers spoke in interviews about the


students from a deficit perspective, and expressed beliefs that the program was “hindering”

learning (p. 158). Due to a lack of adequate leadership and training, teachers accepted the top-down approach and internalized “the prevailing language ideologies about language education” (p. 159); the program was ultimately dissolved (Warhol & Mayer, 2012).

The elementary teachers in Paciotto and Delany-Barmann’s 2011 study were teaching in

a rural town in Illinois. At the time, the ELLs, recent immigrants from Latin America, were in a

transitional bilingual education program. These teachers took it upon themselves to learn about

bilingual education best practices – they attended workshops, earned their credentials in bilingual

education, and even traveled to Mexico to learn more about their students’ culture. The educators

decided to create a more powerful educational experience for both minority- and majority-

speakers at the school by integrating them in a TWI program. The rural community had never

been exposed to TWI, so at first, administration and residents were resistant. This did not stop

the teachers; that first year they began bringing the two groups together for 30 minutes of shared instruction each day, increasing the integrated time each year until it reached a 50/50 split. The teachers’ agency in “selling” this

program to the community was critical, although a 3-year federal grant is what firmly

“legitimized” the TWI program. This bottom-up approach is an example of teacher agency that

successfully changed a remedial program into an enrichment program.

Despite these challenges, TWI programs are an excellent alternative to subtractive

language programs. Although TWI programs are currently implemented successfully in many schools, they remain viable only to the extent that there is political will to keep them alive. Therefore, this study

seeks to establish the academic profile of the typical majority-language speaker in a TWI

program. This profile can help educators know what is within the boundaries of “normal” for

non-ELLs, and hopefully prevent knee-jerk reactions that might put TWI programs in danger.


CONCLUSION AND FUTURE RESEARCH

This analysis appears to show that the literacy achievement of native English-speaking TWI participants is higher than that of both non-ELL/LEP students in monolingual settings and ELL/LEP students in TWI settings. Educational program and language background

are both statistically significant factors in the STAAR scores of this sample group.

Socioeconomic status also appears to be a main effect in student performance. However, future

research is needed to determine if these findings can be generalized to the greater population.

Statewide data are available that would provide a much fuller picture of the literacy profile of

students in TWI programs. Longitudinal studies that follow cohorts through the grade levels

could provide much more information about how student groups progress over time in relation to

each other. Finally, research on assessment is critical to determine a more effective way to assess

emerging bilingual students. Would it be feasible or desirable for the state to create bilingual

academic and language standards and assessments in two languages for TWI students? Would

results be different for classrooms that are considered to be model programs?

The original purpose of two-way dual language programs was to meet the cultural and

linguistic needs of minority-language speakers. Future research is needed to investigate how

educators can realize this important goal. Is there a way to integrate the dual purposes of dual-

language education?


APPENDIX

State of Texas Assessments of Academic Readiness: Raw Score Conversion Tables

Grade 3 Reading (Spring 2013): raw scores 0 through 40 convert to scale scores ranging from 729 to 1909.

Grade 4 Reading (Spring 2013): raw scores 0 through 44 convert to scale scores ranging from 811 to 1995.

Grade 5 Reading (March 2012): raw scores 0 through 46 convert to scale scores ranging from 829 to 2035.

Each conversion table also marks the scale-score ranges corresponding to the performance levels Level I: Unsatisfactory, Level II: Satisfactory, and Level III: Advanced under the Phase-In 1, Phase-In 2, and Recommended (*) standards.

* Level II: Satisfactory (Recommended)
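In an analysis, a conversion table like these functions as a simple lookup from raw score to scale score. The following minimal sketch (not taken from the thesis) illustrates the idea in Python with a few Grade 3 entries; the passing cut shown is a hypothetical placeholder, not an official STAAR standard.

# A few illustrative Grade 3 entries; the full table runs from raw score 0 to 40.
grade3_scale_scores = {
    0: 729,    # lowest possible raw score
    1: 871,
    40: 1909,  # highest possible raw score
}

HYPOTHETICAL_LEVEL_II_CUT = 1400  # placeholder value, not an official cut score

def scale_score(raw: int) -> int:
    """Convert a raw score to its scale score by table lookup."""
    return grade3_scale_scores[raw]

def met_standard(raw: int, cut: int = HYPOTHETICAL_LEVEL_II_CUT) -> bool:
    """Return True if the converted scale score meets or exceeds the cut."""
    return scale_score(raw) >= cut

print(scale_score(40), met_standard(40))  # 1909 True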


REFERENCES

Abedi, J. (2004). The No Child Left Behind Act and English Language Learners: Assessment

and Accountability Issues. Educational Researcher, 33(1), 4-14

Abedi, J., Hofstetter, C. H., and Lord, C. (2004). Assessment Accommodations for English

Language Learners: Implications for Policy-Based Empirical Research. Review of

Educational Research, 74(1), 1-28

Abedi, Jamal. (2010). Performance assessments for English language learners. Stanford, CA: Stanford University, Stanford Center for Opportunity Policy in Education. Retrieved July 20, 2013 from http://edpolicy.stanford.edu/sites/default/files/publications/performance-assessments-english-languagelearners.pdf

Abedi, Jamal, and Gándara, Patricia. (2006). Performance of English Language Learners as a

Subgroup in Large-Scale Assessment: Interaction of Research and Policy. Educational

Measurement: Issues and Practice, 25(4), 36-46

Abedi, Jamal, Leon, S, and Mirocha, J. (2003). Impact of student language background on

content-based performance: Analyses of extant data: Center for the Study of Evaluation,

National Center for Research on Evaluation, Standards, and Student Testing, Graduate

School of Education and Information Studies, University of California, Los Angeles

Abeywickrama, P., & Brown, H. D. (2010). Language assessment: Principles and classroom

practices. Pearson-Longman.


Barnett, W. Steven, Yarosz, Donald J., Thomas, Jessica, Jung, Kwanghee, and Blanco, Dulce.

(2007). Two-way and monolingual English immersion in preschool education: An

experimental comparison. Early Childhood Research Quarterly, 22(3), 277-293

Bialystok, Ellen. (2002). Acquisition of literacy in bilingual children: A framework for research.

Language Learning, 52(1), 159-199.

Bland, J. M., & Altman, D. G. (1995). Multiple significance tests: the Bonferroni method. BMJ,

310 (6973), 170.

Block, Nicholas C. (2012). Perceived impact of two-way dual immersion programs on Latino

students' relationships in their families and communities. International Journal of

Bilingual Education and Bilingualism, 15(2), 235-257

Borsato, Graciela N, and Padilla, Amado M. (2008). Educational assessment of English-language

learners. In Suzuki, Lisa A. (Ed); Ponterotto, Joseph G. (Ed), (2008). Handbook of

multicultural assessment: Clinical, psychological, and educational applications. 471-

489. San Francisco, CA, US: Jossey-Bass.

Broner, M., & Tedick, D. J. (2011). Talking in the fifth-grade classroom: Language use in an

early, total Spanish immersion program. Immersion education: Practices, policies,

possibilities, 166-186.

Capps, R., Fix, M., Murray, J., Passel, J.S., and Herwantoro, S. (2005). The New Demography of

America’s Schools: Immigration and the No Child Left Behind Act. Washington, DC:

The Urban Institute.

Center for Applied Linguistics. (2007). Directory of Two-Way Immersion Programs in the U.S.

Retrieved from http://www.cal.org/jsp/TWI/SchoolListings.jsp on February 16, 2014


Choi, Daniel and Wright, Wayne E. (2006). The Impact of Language and High-Stakes Testing

Policies on Elementary School English Language Learners in Arizona. Educational

Policy Analysis Archives. 14(13), 1-58.

Christian, Donna, Montone, Christopher, Lindholm, Katherine, and Carranza, Isolda. (1997).

Profiles in Two-Way Immersion Education. Language in Education: Theory and Practice

89: Center for Applied Linguistics.

Cloud, Nancy, Genesee, Fred, and Hamayan, Else V. (2000). Dual language instruction: A

handbook for enriched education: Heinle and Heinle Boston.

Cobb, B., Vega, D., and Kronauge, C. (2006). Effects of an Elementary Dual Language

Immersion School Program on Junior High School Achievement. Middle Grades

Research Journal, 1(1).

Crawford, James. (1995). Bilingual education: History, politics, theory, and practice (3rd ed.):

Crane Publishing Company Trenton, NJ.

Creswell, J. W. (2012). Qualitative inquiry and research design: Choosing among five

approaches. Sage.

Cummins, James. (1981). The role of primary language development in promoting educational

success for language minority students. Schooling and language minority students: A

theoretical framework, 3-49.

Cummins, Jim. (2000). Language, power, and pedagogy: Bilingual children in the crossfire

(Vol. 23): Clevedon [England]; Toronto: Multilingual Matters.


Cummins, Jim. (2011). The intersection of cognitive and sociocultural factors in the

development of reading comprehension among immigrant students. Reading and Writing,

25(8), 1973-1990.

De Jong, Ester. "L2 proficiency development in a two-way and a developmental bilingual

program." NABE Journal of Research and Practice 2.1 (2004): 77-108.

De Jong, E.J., Bearse, C.I., Tedick, D, Christian, D, and Fortune, T.W. (2011). The same

outcomes for all? High school students reflect on their two-way immersion program

experiences. Immersion education: Practices, policies, possibilities, 104-122.

De Jong, Ester. (2011). Toward a Monolingual USA? The Modern English-Only Movement.

http://www.colorincolorado.org/article/49656/ Accessed 10/31/2013.

De Jong, Ester, and Howard, Elizabeth. (2009). Integration in two-way immersion education:

equalising linguistic benefits for all students. International Journal of Bilingual

Education and Bilingualism, 12(1), 81-99.

Del Rosario Basterra, María, Trumbull, Elise, and Solano-Flores, Guillermo. (2010). Cultural

validity in assessment: Addressing linguistic and cultural diversity: Routledge.

Dorner, L. M. (2011). Contested communities in a debate over dual-language education: The

import of “public” values on public policies. Educational Policy, 25(4), 577-613.

Dow, Pauline. (2008). Dual-language education: A longitudinal study of students' achievement in an El Paso County, Texas school district (Doctor of Education dissertation). The University of Texas at El Paso, El Paso, TX.

Escamilla, K., et al. (2013) Biliteracy from the Start: Literacy Squared in Action. Philadelphia:

Caslon.


Escamilla, K. (2006) Monolingual Assessment and Emerging Bilinguals: A Case Study in the

US. In O. Garcia, T. Skutnabb-Kangas, & M. Torres-Guzman (Eds.), Imagining

Multilingual Schools Ch. 9: pp. 184-199. Clevedon, England: Multilingual Matters Ltd.

Fairbairn, Shelley B, and Fox, Janna. (2009). Inclusive achievement testing for linguistically and

culturally diverse test takers: Essential considerations for test developers and decision

makers. Educational Measurement: Issues and Practice, 28(1), 10-24.

Fitts, S. (2006). Reconstructing the status quo: Linguistic interaction in a dual-language school.

Bilingual Research Journal, 30(2), 337-365.

Flores, B. B., and Smith, H. L. (2009). Teachers’ characteristics and attitudinal beliefs about

linguistic and cultural diversity. Bilingual Research Journal, 31(1-2), 323-358.

Ford, K. L., Cabell, S. Q., Konold, T. R., Invernizzi, M., and Gartland, L. B. (2013). Diversity

among Spanish-speaking English language learners: profiles of early literacy skills in

kindergarten. Reading and Writing, 1-24.

Gándara, Patricia, and Baca, Gabriel. (2008). NCLB and California’s English language learners:

The perfect storm. Language Policy, 7(3), 201-216.

Genesee, Fred. (1987). Learning through two languages: Studies of immersion and bilingual

education (Vol. 163): Newbury House Cambridge, MA.

Genesee, F., & Lindholm-Leary, K. (2012). The education of English language learners. In K. R.

Harris, S. Graham, & T. Urdan (Eds.), APA Education Psychology Handbook: Vol. 3.

Application to learning and teaching (pp. 499–526). Washington, DC: American

Psychological Association.


Gerena, L. (2010). Parental Voice and Involvement in Cultural Context: Understanding

Rationales, Values, and Motivational Constructs in a Dual Immersion Setting. Urban

Education, 46(3), 342-370.

Geva, Esther, and Farnia, Fataneh. (2011). Developmental changes in the nature of language

proficiency and reading fluency paint a more complex view of reading comprehension in

ELL and EL1. Reading and Writing, 25(8), 1819-1845.

Goldenberg, C., Rueda, R., and August, D. (2006). Sociocultural influences on the literacy attainment of language-minority children and youth. In D. August and T. Shanahan (Eds.), Developing literacy in second language learners: Report of the National Literacy Panel on Language-Minority Children and Youth (pp. 269-318). Mahwah, NJ: Lawrence Erlbaum.

Goldenberg, C. (2010) Improving Achievement for English Learners: Conclusions from Recent

Reviews and Emerging Research. In Li, G., & Edwards, P. A. (Eds.). Best practices in

ELL instruction. Guilford Press.

Hadi-Tabassum, Samina. (2006). Language, space and power: A critical look at bilingual

education (Vol. 55): Multilingual Matters.

Hoover, Wesley A, and Gough, Philip B. (1990). The simple view of reading. Reading and

Writing, 2(2), 127-160.

Hornberger, N. H., & Link, H. (2012). Translanguaging and transnational literacies in

multilingual classrooms: A biliteracy lens. International Journal of Bilingual Education

and Bilingualism, 15(3), 261-278.

Howard, Elizabeth R, and Sugarman, Julie. (2007). Realizing the vision of two-way immersion:

Fostering effective programs and classrooms: Center for Applied Linguistics.


International Reading Association. (1995). The literacy dictionary: The vocabulary of reading

and writing. T. L. Harris, & R. E. Hodges (Eds.). International Reading Assoc.

Kirby, Sheila Nataraj. (2003). Developing an R&D Program to Improve Reading

Comprehension. RAND Reading Study Group.

Koyama, Jill, and Menken, Kate. (2013). Emergent Bilinguals: Framing Students as Statistical

Data? Bilingual Research Journal, 36(1), 82-99.

Krashen, Stephen D. (1981). Bilingual education and second language acquisition theory.

Schooling and language minority students: A theoretical framework, 51-79.

Ladson-Billings, G. (1995). Toward a theory of culturally relevant pedagogy. American

educational research journal, 32(3), 465-491.

Lakin, Joni M., and Lai, Emily R. (2012). Multigroup Generalizability Analysis of Verbal,

Quantitative, and Nonverbal Ability Tests for Culturally and Linguistically Diverse

Students. Educational and Psychological Measurement, 72(1), 139-158.

Lara, Monica. (2010). The Structure of an Early Reading Test in Grade 1: In Search of a Relationship with Reading in Spanish (Doctor of Education dissertation).

Lee, J. S., & Bowen, N. K. (2006). Parent involvement, cultural capital, and the achievement gap

among elementary school children. American Educational Research Journal, 43(2), 193-

218.

Lindholm-Leary, Kathryn, and Block, Nicholas. (2009). Achievement in predominantly low

SES/Hispanic dual language schools. International Journal of Bilingual Education and

Bilingualism, 13(1), 43-60.


Lindholm-Leary, Kathryn, and Hernández, Ana. (2011). Achievement and language proficiency

of Latino students in dual language programmes: native English speakers, fluent

English/previous ELL/LEP students, and current ELL/LEP students. Journal of

Multilingual and Multicultural Development, 32(6), 531-545.

Lindholm-Leary, Kathryn J. (2001). Dual Language Education. Clevedon, UK: Multilingual

Matters, Ltd.

Lopez, F., and McEneaney, E. (2012). State Implementation of Language Acquisition Policies

and Reading Achievement Among Hispanic Students. Educational Policy, 26(3), 418-

464.

López, M. G., & Tashakkori, A. (2006). Differential outcomes of two bilingual education

programs on English language learners. Bilingual Research Journal, 30(1), 123-145.

Lyster, Roy. (2004). Research on form-focused instruction in immersion classrooms:

implications for theory and practice. Journal of French Language Studies, 14(3), 321-

341.

MacSwan, J., & Rolstad, K. (2006). How language proficiency tests mislead us about ability:

Implications for English language learner placement in special education. The Teachers

College Record, 108(11), 2304-2328.

Mancilla-Martinez, J., and Lesaux, N. K. (2010). Predictors of Reading Comprehension for

Struggling Readers: The Case of Spanish-speaking Language Minority Learners. J Educ

Psychol, 102(3), 701-711.


May, Stephen. (2008). Bilingual immersion education: What the research tells us. In N. H. Hornberger and J. Cummins (Eds.), Encyclopedia of Language and Education (Vol. 2, pp. 19-34). Springer Science and Business Media.

McClarty, Katie Larsen, et al. (2013). Evidence-based standard setting: Establishing a validity framework for cut scores. Educational Researcher, 42(2), 78-88.

Menken, Kate. (2011). From policy to practice in the Multilingual Apple: bilingual education in

New York City. International Journal of Bilingual Education and Bilingualism, 14(2),

121-131.

Nakamoto, Jonathan, Lindsey, Kim A., and Manis, Franklin R. (2010). Development of reading

skills from K-3 in Spanish-speaking English language learners following three programs

of instruction. Reading and Writing, 25(2), 537-567.

Nelson-Barber, S., & Trumbull, E. (2007). Making assessment practices valid for Indigenous

American students. Journal of American Indian Education, 46(3), 132-147.

Nieto, David. (2009). A brief history of bilingual education in the United States. Perspectives on Urban Education, 6(1), 61-72.

No Child Left Behind (NCLB) Act of 2001, Pub. L. No. 107-110, § 115, Stat. 1425 (2002).

Paciotto, C., & Delany-Barmann, G. (2011). Planning micro-level language education reform in

new diaspora sites: two-way immersion education in the rural Midwest. Language Policy,

10(3), 221-243.

Perez. B. & Flores, B. (2002). Biliteracy development in two-way immersion classrooms:

Analysis of third grade Spanish and English reading. In Schallert, D., Fairbanks, C.,


Worthy, J., Maloch, B. & Hoffman, J. (Eds). 51st yearbook of the National Reading

Conference (pp. 357-367). Oak Creek, WI: National Reading Conference.

Pimentel, C. (2011). The color of language: The racialized educational trajectory of an emerging

bilingual student. Journal of Latinos and Education, 10(4), 335-353.

Prior, Anat. (2012). Reading in more than one language: Behavior and brain perspectives. In Reading, Writing, Mathematics and the Developing Brain: Listening to Many Voices (pp. 131-155). Springer.

Proctor, P., August, D., Snow, C., and Barr, C. (2010). The Interdependence Continuum: A

Perspective on the Nature of Spanish–English Bilingual Reading Comprehension.

Bilingual Research Journal, 33(1), 5-20.

Proctor, P., Silverman, R., Harring, J., and Montecillo, C. (2011). The role of vocabulary depth

in predicting reading comprehension among English monolingual and Spanish–English

bilingual children in elementary school. Reading and Writing, 25(7), 1635-1664.

Ramírez, J David, Yuen, Sandra D, and Ramey, Dena R. (1991). Final report: Longitudinal study

of structured English immersion strategy, early-exit and late-exit programs for language

minority children (Report submitted to the US Department of Education). San Mateo,

CA: Aguirre International.

Riches, Caroline, and Genesee, Fred. (2006). Crosslinguistic and crossmodal issues. Educating

English language learners: A synthesis of research evidence, 64-108.

Sheets, Rosa Hernandez, Araujo, Blanca, Calderon, Gloria, and Indiatsi, John. (2010). Developing cultural competency. In Flores, B. B., Sheets, R. H., & Clark, E. R. (Eds.), Teacher preparation for bilingual student populations: Educar para transformar. Routledge.


Sanchez, Serafin V., Rodriguez, Billie Jo, Soto-Huerta, Mary Esther, Villarreal, Felicia Castro,

Guerra, Norma Susan, and Flores, Belinda Bustos. (2013). A Case for Multidimensional

Bilingual Assessment. Language Assessment Quarterly, 10(2), 160-177.

Sandberg, K. L., and Reschly, A. L. (2010). English Learners: Challenges in Assessment and the

Promise of Curriculum-Based Measurement. Remedial and Special Education, 32(2),

144-154.

Scanlan, M., & Palmer, D. (2009). Race, power, and (in) equity within two-way immersion

settings. The Urban Review, 41(5), 391-415.

Schouten, Belinda Treviño. (2006). Working the System: Low-income Latino Student Achievement (Doctor of Education dissertation). ProQuest.

Shanahan, T., & August, D. (Eds.). (2008). Developing Reading and Writing in Second

Language Learners. Routledge.

Shuy, R. W. (1981). A Holistic View of Language. Research in the Teaching of English, 15(2),

101-11.

Solano-Flores, G., and Trumbull, E. (2003). Examining Language in Context: The Need for New

Research and Practice Paradigms in the Testing of English-Language Learners.

Educational Researcher, 32(2), 3-13.

Solano-Flores, Guillermo. (2008). Who Is Given Tests in What Language by Whom, When, and

Where? The Need for Probabilistic Views of Language in the Testing of English

Language Learners. Educational Researcher, 37(4), 189-199.

Stevens, J. (1996). Applied Multivariate Statistics for the Social Sciences. Mahwah, NJ: Lawrence Erlbaum Associates.


Strevens, P. (1977). New orientations in the teaching of English (Vol. 8). Oxford: Oxford

University Press.

Teddlie, C., & Yu, F. (2007). Mixed methods sampling a typology with examples. Journal of

mixed methods research, 1(1), 77-100.

Thomas, Wayne and Collier, Virginia. (2004). The Astounding Effectiveness of Dual Language Education for All. NABE Journal of Research and Practice, 2(1), 1-20.

Thomas, Wayne and Collier, Virginia. (2002). A National Study of School Effectiveness for Language Minority Students' Long-Term Academic Achievement. Center for Research on Education, Diversity and Excellence, UC Berkeley.

Tilstra, Janet, McMaster, Kristen, Van den Broek, Paul, Kendeou, Panayiota, and Rapp, David.

(2009). Simple but complex: components of the simple view of reading across grade

levels. Journal of Research in Reading, 32(4), 383-401.

Valdés, Guadalupe. (2011). Dual-language immersion programs: A cautionary note concerning

the education of language minority students. Research and Practice in Immersion

Education.

Valencia, S., & Wixson, K. (2000). Policy-oriented research on literacy standards and

assessment. In M.L. Kamil, P.B. Mosenthal, P.D. Pearson, & R. Barr (Eds.), Handbook

of reading research (Vol. 3, pp. 909–935). Mahwah, NJ: Erlbaum.

Warhol, L., & Mayer, A. (2012). Misinterpreting School Reform: The Dissolution of a Dual-

Immersion Bilingual Program in an Urban New England Elementary School. Bilingual

Research Journal, 35(2), 145-163.


Weinfurt, K. P. (2001). Multivariate Analysis of Variance. In L. G. Grimm & P. R. Yarnold

(Eds.), Reading and Understanding Multivariate Statistics (pp. 245-276). Washington, DC:

American Psychological Association.

Whiting, Erin F., and Feinauer, Erika. (2011). Reasons for enrollment at a Spanish–English two-

way immersion charter school among highly motivated parents from a diverse community.

International Journal of Bilingual Education and Bilingualism, 14(6), 631-651.

Wood, David, Bruner, Jerome S, and Ross, Gail. (1976). The role of tutoring in problem solving.

Journal of child psychology and psychiatry, 17(2), 89-100.

Wooldridge, B., & Haimes-Bartolf, M. (2006). The field dependence/field independence learning

styles: Implications for adult student diversity, outcomes assessment and accountability.

Learning styles and learning, 237-257

Young, John W., Cho, Yeonsuk, Ling, Guangming, Cline, Fred, Steinberg, Jonathan, and Stone,

Elizabeth. (2008). Validity and Fairness of State Standards-Based Assessments for

English Language Learners. Educational Assessment, 13(2-3), 170-192.

Young, J. W. (2009). A framework for test validity research on content assessments taken by

English language learners. Educational Assessment, 14(3-4), 122-138.

Ziegler, Johannes C, and Goswami, Usha. (2005). Reading acquisition, developmental dyslexia,

and skilled reading across languages: a psycholinguistic grain size theory. Psychological

bulletin, 131(1), 3.


VITA

Jeanne Sinclair is originally from Northfield, VT. She studied at the Gallatin School at

New York University and graduated summa cum laude with a B.A. in Individualized Study in

2000. In 2001 she joined AmeriCorps and worked with the non-profit organization The Rio

Grande Institute, first as a volunteer, and later as Assistant Director. In 2009 she began teaching

after receiving her certification from the Region 4 Alternative Teacher Certification Program in

Houston. She taught second and fourth grade in Central Texas schools and graduated with her

M.A. in Bicultural/Bilingual Studies from the University of Texas at San Antonio. Her future

plans include doctoral studies at the University of Toronto.