Challenging the Research Base of the Common Core State Standards: A Historical Reanalysis of Text Complexity

David A. Gamson (1), Xiaofei Lu (1), and Sarah Anne Eckert (2)

(1) The Pennsylvania State University, University Park, PA; (2) The Agnes Irwin School, Bryn Mawr, PA

Educational Researcher, Vol. 42 No. 7, pp. 381–391. DOI: 10.3102/0013189X13505684. © 2013 AERA. http://er.aera.net. October 2013.

The widely adopted Common Core State Standards (CCSS) call for raising the level of text complexity in textbooks and reading materials used by students across all grade levels in the United States; the authors of the English Language Arts component of the CCSS build their case for higher complexity in part upon a research base they say shows a steady decline in the difficulty of student reading textbooks over the past half century. In this interdisciplinary study, we offer our own independent analysis of third- and sixth-grade reading textbooks used throughout the past century. Our data set consists of books from 117 textbook series issued by 30 publishers between 1905 and 2004, resulting in a linguistic corpus of roughly 10 million words. Contrary to previous reports, we find that text complexity has either risen or stabilized over the past half century; these findings have significant implications for the justification of the CCSS as well as for our understanding of a “decline” within American schooling more generally.

Keywords: curriculum; educational reform; history; reading; textbooks

As the latest educational movement to sweep the nation, the Common Core State Standards (CCSS) have received an unprecedented amount of support, as evidenced by their rapid adoption in all but a handful of states. Developed by the National Governors Association Center for Best Practices (NGA) and the Council of Chief State School Officers (CCSSO), the CCSS are the most recent iteration of a quarter-century’s worth of efforts to clearly define what students should know and be able to do. Pragmatic in tone, the CCSS are designed “to be robust and relevant to the real world, reflecting the knowledge and skills that our young people need for success in college and careers” (NGA/CCSSO, 2010a).

In common with other American school reform initiatives, both past and present, the CCSS purport to be grounded in research studies that unambiguously signal the need for sharp increases in academic standards. As CCSS advocates see it, the problem is not merely that current curricula are weak, but also that today’s schools are burdened by the legacy of a decades-long decline in the quality of American education. Indeed, one of the cornerstone assumptions of the Common Core State Standards in English Language Arts (CCSS/ELA) is the belief that reading textbooks have trended downward in difficulty and sophistication over the past century. This alleged curricular decline is, in turn, rhetorically linked to a lapse in student preparedness for the future; in other words, CCSS proponents believe that American youth are less well-equipped to succeed in “college and careers” than they once were. In their view, the deterioration of textbook quality offers evidence of a languishing curriculum and thus serves as an explicit justification for the creation of a new set of common standards: “while the reading demands of college, workforce training programs, and citizenship have held steady or risen over the past fifty years or so,” the Standards authors assert, “K-12 texts have, if anything, become less demanding” (NGA/CCSSO, 2010b, p. 2).

The appendices that accompany the CCSS/ELA reference the few extant studies that report this drop in the rigor of reading materials. Appendix A cites, for example, a 1977 study by highly regarded reading researcher Jeanne Chall and her colleagues arguing, according to the CCSS authors, that the difficulty of school textbooks decreased from 1963 to 1975 (Chall, Conard, & Harris, 1977). The CCSS rely even more heavily on an influential 1996 study by Hayes, Wolfer, and Wolfe that found “precipitous declines” in average sentence length and vocabulary level in reading textbooks (NGA/CCSSO, 2010b, p. 3). After analyzing portions of 800 elementary, middle, and high school textbooks published between 1919 and 1991, Hayes et al. say they detected a “pervasive decline in the difficulty” of these schoolbooks (Hayes, Wolfer, & Wolfe, 1996, p. 489). Based on their review of studies such as those by Chall et al. (1977) and Hayes et al. (1996), the Standards’ authors ultimately conclude that today’s curricular materials in general, and reading textbooks in specific, pale in comparison to those students used before 1962. If true, these findings of a 50-year fall in textbook difficulty certainly offer sufficient cause for action. In fact, the creators of the Standards explicitly state that this evidence of curricular decline is “the impetus behind the Standards’ strong emphasis on increasing text complexity as a key requirement in reading” (NGA/CCSSO, 2010b, p. 2).

But what if this narrative of the decline and fall in textbook quality is inaccurate? We suggest that the story is incomplete at best and, worse yet, it may mislead and distract us from more compelling educational concerns. The rapid adoption of the CCSS has outstripped the kind of serious scrutiny that might normally attend the launch of such a major reform effort. Although most states have embraced the CCSS, the initial analyses of these new standards conducted to date are mixed, especially those that assess whether they represent an advance over current state standards (Loveless, 2012; Porter, McMaken, Hwang, & Yang, 2011; Schmidt & Houang, 2012). Given the influence the Standards are likely to exert on schools across the country, further close examination is certainly warranted.

The publication of the CCSS coincided with an interdisciplinary study we have been conducting on the complexity of elementary school reading textbooks.1 Were it not for the heavy reliance of the CCSS on a chronicle of curricular decline, we would have proceeded apace with our own analysis, discussing—but not necessarily highlighting—the weaknesses and gaps in the studies cited by the Standards. However, we feel it is advisable to explore the presumptions embedded within the CCSS, as some other scholars have begun to do (Hiebert & Mesmer, 2013). The findings from our investigation, though not completely inconsistent with previous studies, offer a far more nuanced portrait of curricular change. Moreover, we find that previous assertions of curricular decay have often overlooked key dimensions of the historical context in which textbook changes took place.

By offering our own independent analysis of widely used third- and sixth-grade reading textbooks published throughout the past century, we hope to contribute to the ongoing scholarly examination of the CCSS. The data set we have collected is composed of more than 100 years’ worth of books from 117 textbook series issued by 30 publishers; our text collection has resulted in a linguistic corpus of roughly 10 million words. Our study blends historical investigative methods with the analytic tools of linguistics. In terms of scope, we lengthen the time period, increase the number of texts, and expand the linguistic corpus beyond that covered by any previous study. For the purposes of this article, and as we describe further in our methods section and in our methodological appendix (available on the journal website), we apply a combination of older and newer analytic approaches to our corpus that allows us to assess the degree of sophistication of the words, the readability of the language, and the average length of the sentences used in each textbook.

The CCSS use a variety of terms to describe the relative challenge that textbooks pose to students, but in their prescriptions for the English Language Arts standards, the authors focus on the concept of “text complexity,” an issue that has received an increasing amount of attention over the past decade (Fisher, Frey, & Lapp, 2012; Hiebert & Mesmer, 2013; Snow, 2002; Williamson, Fitzgerald, & Stenner, 2013). As defined by the CCSS, text complexity refers to “the inherent difficulty of reading and comprehending a text combined with consideration of reader and task variables” (NGA/CCSSO, 2010b, p. 43). Although the CCSS call for a “tripartite system” of establishing text complexity, the sole measure of text difficulty originally referenced in the CCSS is the Lexile Framework (NGA/CCSSO, 2010a, 2010b), a fairly narrow measure by some accounts.2 In her 2011 report on establishing text complexity, Hiebert questions the effectiveness of the Lexile Framework and concludes that there is a need for additional measures of text complexity for the Common Core (Hiebert, 2011). We return to this issue at the end of the article.3

The measures we apply to our data set allow for direct comparison to the historical studies used by the CCSS, while at the same time they also enable us to offer newer insights and a more robust evaluation of claims concerning downward trends in textbook quality.4 On their own, the results of our linguistic analysis offer a more complicated view of textbook changes over time, and when combined with our historical analysis of the evolution of American schooling, which we discuss briefly below, our study yields something of an alternative picture of the history of the reading curriculum. In fact, our findings offer a direct challenge to the assertions of the CCSS.

Historical Context and Claims

No assessment of text, quantitative or qualitative, is elegant enough to overcome misinterpretations of past educational practice. In our examination of the literature on the historical changes in reading textbooks—especially that utilized by the CCSS—we discovered several invalid historical assumptions, if not some fairly egregious errors in interpretation.5 The Hayes et al. (1996) study referenced above has been cited repeatedly by purveyors of the view that American schools are in decline. The Hayes argument goes like this: the most challenging and rigorous American reading textbooks were published between 1860 and 1918; after World War I, reading textbooks for all grades were “generally simplified”; then, after World War II, the “mean levels of readers for all grades but third became even simpler” (p. 497). By the 1990s, Hayes et al. assert, textbooks had become so watered down that the modern sixth-, seventh-, and eighth-grade readers were less challenging than fifth-grade readers published before World War II.6 Ultimately, they conclude, the decline in textbook difficulty—a consequence of publishers shortening sentences and reducing the use of uncommon English words—has led, from fourth grade on, to the simplest school texts in American history (pp. 503–504).

A key primary source that Hayes et al. (1996) use for their pre–World War I assessment of reading textbooks is the set of books that constitute the well-known McGuffey’s Eclectic Readers. “By modern standards,” the authors conclude, “Professor McGuffy’s [sic] pre- and post-Civil War readers were very difficult.” Of course, they were very difficult, the observant historian counters: literary selections in the McGuffey’s were essentially compilations of adult reading materials cut and pasted for younger readers (McGuffey & Gorn, 1998; Sullivan, 1994; Westerhoff, 1978). Moreover, unlike the textbooks most of us are familiar with today, nineteenth- and early twentieth-century readers were designed for the purposes of elocution, not comprehension. In other words, children were not expected to understand what they read; instead, they were assessed on how well they could read the text aloud, in the tradition of declamation (Mathews, 1966; Pearson, 2009; Resnick & Resnick, 1977). If the McGuffey’s proved difficult to youthful readers, it was a challenge to their tongues.

As a consequence of treating the McGuffey’s Readers as the high watermark of text complexity, Hayes et al. (1996) artificially skew their own results; everything published thereafter looks bland in comparison. And because the Hayes study is one of the few systematic chronological studies of textbook difficulty, it has gained considerable cachet among researchers interested in understanding historical curricular changes, despite its flaws.7 Our interest is not in completely dismissing their study, for it offers some key data points. Rather, the McGuffey example simply serves to remind us that the research base of the CCSS offers shaky ground upon which to build a case for major national reform initiatives.

A second example of a problematic interpretation is demonstrated by the CCSS authors’ misreading of Jeanne Chall’s 1977 work. Though she was certainly a tough stickler for rigor, Chall did not necessarily argue, as the Standards’ authors contend, that her own textbook analysis revealed “a thirteen-year decrease from 1963 to 1975 in the difficulty of grade 1, grade 6, and (especially) grade 11 texts” (NGA/CCSSO, 2010b, p. 3). What Chall actually concluded is somewhat different. She and her colleagues studied a selection of textbook materials—basal readers, literature textbooks, teacher guides, and history textbooks—published between 1947 and 1975. Although Chall did find some declines in language difficulty, especially in the years between 1947 and 1962, she also identified several important reversals to these trends. In first-grade textbooks, she detected a shift toward greater challenge, starting in a rather limited way in the early 1960s but ultimately resulting in a dramatic increase in challenge between 1968 and 1975 (Chall et al., 1977, p. 16). Chall pointed out that she had previously detected a trend toward more rigor in her earlier work (1967), one that had been confirmed by Popp’s (1975) investigation of textbooks for beginning readers published between 1968 and 1975. Chall even suggested that these increases in the rigor of early reading textbooks may have already resulted in some observable outcomes in reading assessments. “Could it be,” she asked (p. 63), “that these more demanding reading programs are related to the fact that the early grades, 1 to 4, have not shown the recent decline in achievement scores?” Could it be, she wondered, that they also helped explain the positive jump in 1975 fourth-grade NAEP scores over those of 1970?

In her study of sixth-grade readers, Chall encountered something very similar, arriving at the qualified conclusion that “it seems as if we are dealing with almost a reverse bell-shaped curve, with the greater challenge and difficulty usually found in the 1936 edition, a decline in the 1944/51 and the 1955-6/62 editions and an increase in 1965” (p. 25). In 11th-grade literature books (although this takes us beyond the scope of our own study), Chall reported that the most striking finding was the “great consistency”—rather than a decline—in the difficulty of the four books she analyzed published between 1949 and 1968 (p. 43). Taken together, Chall’s studies found either a consistency or a noteworthy increase in textbook challenge and difficulty beginning in the early 1960s: the reverse bell-shaped curve.

Two other things trouble us about the CCSS research base beyond the errors suggested by the examples above. First, what we find in the literature utilized by the CCSS authors is, essentially, a tight and closed loop of researchers citing one another and leading, we suggest, to an artificially heightened sense of scholarly agreement about a decline in textbook complexity. Second, these perspectives collectively offer a nostalgic view of American education that is illusory and deceptive. It harkens back to a golden age of education, “the good old days,” when standards were high and schools ensured an intellectually rigorous education; this is a notion that historians have sought to dispel, for it represents a romanticized view of history, one that too easily ignores the realities of earlier phases of American schooling, overlooking its often elite and exclusory nature (see, e.g., Graham, 1992).

Focus and Design of Our Reading Study

Our study of reading textbooks, then, is an attempt to step back from the hyperbole of constant curricular decline that is persistently evoked by the CCSS/ELA authors, David Coleman and Susan Pimentel.8 Our research team has systematically collected a large data set of elementary school reading textbooks published between 1905 and 2004. We selected Grades 3 and 6 for our investigation, because these two grade levels tend to bookend the deliberate instruction of reading comprehension strategies in elementary schools. Although we amassed an even larger number of books, our analysis here utilizes 187 third-grade and 71 sixth-grade reading textbooks, for a total of 5,049,057 words and 4,939,458 words, respectively.

Our goal was to locate a sample of third- and sixth-grade reading textbooks that accurately represented the textbook market from roughly the 1890s to 2008. In preparation, we reviewed a variety of key sources in the field (including Chall, 1967, 1983, 1996; Durkin, 1978–1979; Durkin, 1981; Huey, 1908; Mathews, 1966; Pearson, 2009; Smith, 1934, 1965, 2002; Venezky, 1987), examined research studies produced throughout the century, and studied decades of relevant journal articles to identify important or popular basal reading textbooks and publishers.9

Analyzing the Linguistic Complexity of Textbooks

The data set used in this study consisted of “texts” (linguists use the term “texts” to refer to individual reading units within a textbook such as stories, nonfiction narratives, poems, etc.) from third- and sixth-grade reading textbooks that were widely adopted in American elementary schools over the past century. All textbooks were scanned and digitized using OmniPage, an optical character recognition program. To prepare texts for analysis, each individual text was manually checked to ensure accuracy and was annotated with a series of relevant meta-information, including publisher, print year, grade level, start page, end page, and genre. Table 1 summarizes the details of the data set. To ensure the reliability of the measures applied, we included only texts with at least 100 words. Altogether, there were 5,259 texts in the third-grade data set and 2,782 texts in the sixth-grade data set (sixth-grade texts are longer). Textbooks were grouped into 10 time periods, each of which covers one decade, beginning 5 years prior to the beginning of a decade and ending 4 years after the turn of the decade (e.g., 1955–1964).
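To make the decade-binning and length-filtering rules above concrete, a minimal sketch in Python follows (this is illustrative only, not the project’s actual code; the function names and field layout are our assumptions):

def assign_decade(print_year):
    """Map a publication year to its decade bin: each bin runs from 5 years
    before a decade boundary through 4 years after it (e.g., 1955-1964 -> 1960)."""
    return ((print_year + 5) // 10) * 10

def keep_text(word_count, min_words=100):
    """Retain only texts with at least 100 words, per the reliability criterion above."""
    return word_count >= min_words

# Quick checks of the binning rule:
assert assign_decade(1955) == 1960 and assign_decade(1964) == 1960
assert assign_decade(1965) == 1970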

Methods of Analysis

We used four different measures to understand changes in the texts over the century; two of these measures focus on lexical difficulty (LEX and word frequency band [WFB]), whereas the other two measures calculate the readability and mean sentence length of the texts. Lexical difficulty measures the degree of sophistication of the words in a text. Readability formulas have commonly defined difficult words in two ways: one based on word length or syllable structure, and the other based on word frequency or familiarity. What constitutes familiar or difficult words will vary from reader to reader; however, previous studies have reported experimental evidence that word frequency significantly affects reading comprehension (Marks, 1974; McGregor, 1989). The two measures of lexical difficulty adopted here are both based on an analysis of the frequency of the words in the text. The first measure, LEX, is taken directly from Hayes et al. (1996) to ensure direct comparability of our results with the findings of their study (readers are directed to Hayes et al. for details). Briefly, Hayes compared the distribution of frequency ranks of words in a textbook against that in major newspapers, which was used for the reference level of lexical difficulty. LEX scores were calculated using QANALYSIS 6.0 (developed by Hayes, 2003) for texts with 1,000 words or more (to ensure reliability, following the instructions of the software). As summarized in Table 1, a total of 2,151 third-grade texts and 1,718 sixth-grade texts satisfied this criterion.

LEX uses word frequency ranks derived from a fixed lexicon, namely, The American Heritage Word Frequency Book (Carroll, Davies, & Richman, 1971), and it therefore assumes that the most “common” words have remained stable across the century. In an effort both to refine the measure and to address the problem of employing an anachronistic baseline wordlist, we developed some novel modifications in applying our second measure of lexical difficulty, the WFB score. For the WFB, then, we analyzed frequency bands of the words in each text using a wordlist derived from texts published in the same period as the textbook in question. Specifically, we derived a separate wordlist for each of the 10 decades from the 1910s to the 2000s using the American English component of the Google Books Corpus (Michel et al., 2011).

Table 1. Details of the Third- and Sixth-Grade Data Sets

Grade | Decade | Years | Books | Texts | Words | Mean length (words) | SD | LEX texts
3 | 1910 | 1905–1914 | 10 | 489 | 346,401 | 708.39 | 795.00 | 117
3 | 1920 | 1915–1924 | 7 | 298 | 216,670 | 727.08 | 673.34 | 83
3 | 1930 | 1925–1934 | 16 | 638 | 471,188 | 738.54 | 631.13 | 163
3 | 1940 | 1935–1944 | 10 | 308 | 322,083 | 1045.72 | 615.41 | 154
3 | 1950 | 1945–1954 | 12 | 384 | 400,946 | 1044.13 | 638.58 | 166
3 | 1960 | 1955–1964 | 13 | 362 | 368,677 | 1018.44 | 683.29 | 148
3 | 1970 | 1965–1974 | 35 | 857 | 982,683 | 1146.65 | 735.49 | 470
3 | 1980 | 1975–1984 | 33 | 803 | 831,931 | 1036.03 | 711.29 | 354
3 | 1990 | 1985–1994 | 25 | 566 | 618,952 | 1093.55 | 727.96 | 294
3 | 2000 | 1995–2004 | 26 | 554 | 489,526 | 883.62 | 685.59 | 202
3 | All | 1905–2004 | 187 | 5,259 | 5,049,057 | 960.08 | 716.38 | 2,151
6 | 1910 | 1905–1914 | 3 | 133 | 158,503 | 1191.75 | 1086.15 | 61
6 | 1920 | 1915–1924 | 5 | 252 | 444,822 | 1765.17 | 2755.04 | 102
6 | 1930 | 1925–1934 | 6 | 357 | 427,087 | 1196.32 | 1294.04 | 150
6 | 1940 | 1935–1944 | 6 | 272 | 486,467 | 1788.48 | 1491.61 | 170
6 | 1950 | 1945–1954 | 4 | 160 | 306,149 | 1913.43 | 1404.17 | 116
6 | 1960 | 1955–1964 | 8 | 301 | 537,425 | 1785.47 | 981.39 | 243
6 | 1970 | 1965–1974 | 8 | 262 | 549,843 | 2098.64 | 2002.21 | 193
6 | 1980 | 1975–1984 | 14 | 456 | 798,805 | 1751.77 | 2340.00 | 303
6 | 1990 | 1985–1994 | 13 | 351 | 816,438 | 2326.03 | 2454.29 | 264
6 | 2000 | 1995–2004 | 4 | 238 | 413,919 | 1739.16 | 1664.93 | 116
6 | All | 1905–2004 | 71 | 2,782 | 4,939,458 | 1775.51 | 1940.12 | 1,718

Note. LEX texts denotes the number of texts with 1,000 or more words; LEX scores were computed for these texts only; only main reading texts with 100 or more words are included in the table.


The American English component of the Google Books Corpus consists of more than 118.88 billion words from more than 1.05 million books published in the period 1905–2004. For each text, we first determined the wordlist to be used based on its year of publication; we then scored each word in the text (excluding punctuation, symbols, numbers, and proper names) based on its frequency band, as follows: 0 for the top 1,000 most frequent words, 1 for the second 1,000, and so on, through 10 for words beyond the first 10,000. The WFB score of a text was the average score of all scored words in the text.
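The following is a minimal sketch of this band-scoring rule (our illustration, not the project’s code; the wordlist format and names are assumptions):

def frequency_band(word, rank_by_word):
    """rank_by_word maps a word to its 0-based frequency rank in the decade
    wordlist; return 0 for the top 1,000 words, 1 for the second 1,000, and
    so on, capping at 10 for words beyond the first 10,000 (or unlisted words)."""
    rank = rank_by_word.get(word.lower())
    if rank is None or rank >= 10000:
        return 10
    return rank // 1000

def wfb_score(tokens, rank_by_word):
    """Average band score over scorable tokens; punctuation, symbols, numbers,
    and proper names are assumed to have been excluded already."""
    bands = [frequency_band(t, rank_by_word) for t in tokens]
    return sum(bands) / len(bands) if bands else None

In use, rank_by_word would be built once per decade, for example rank_by_word = {w: i for i, w in enumerate(decade_wordlist)}, where decade_wordlist lists word types in descending frequency order.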

In addition to the two lexical difficulty measures, we also calculated mean length of sentence (MLS) and the New Dale-Chall readability index (Chall, 1995); these are also commonly used indices and, therefore, provide complementary approaches to verifying our results. Sentence and word segmentation was performed by the Stanford part-of-speech tagger (Toutanova, Klein, Manning, & Singer, 2003), which also provided part-of-speech information for each word (useful for other analyses not reported here). A simple Python script was then used to count the number of sentences and words in each text and subsequently to calculate its MLS. The New Dale-Chall readability index was calculated using an in-house tool that implemented the readability formula (for additional methodological details, see Lu, 2009).
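For readers unfamiliar with these two indices, here is a minimal sketch of how they are commonly computed; the constants come from the standard published form of the New Dale-Chall formula, the familiar-word list is assumed to be available, and this is not the authors’ in-house tool:

def mean_sentence_length(sentences):
    """MLS = total words / number of sentences, given pre-segmented,
    pre-tokenized sentences (lists of word tokens)."""
    total_words = sum(len(s) for s in sentences)
    return total_words / len(sentences) if sentences else 0.0

def new_dale_chall(sentences, familiar_words):
    """Commonly published form: 0.1579 * (% of words not on the familiar-word
    list) + 0.0496 * MLS, plus an adjustment of 3.6365 when difficult words
    exceed 5% of the text."""
    tokens = [w.lower() for s in sentences for w in s]
    difficult = sum(1 for w in tokens if w not in familiar_words)
    pct_difficult = 100.0 * difficult / len(tokens)
    score = 0.1579 * pct_difficult + 0.0496 * mean_sentence_length(sentences)
    if pct_difficult > 5.0:
        score += 3.6365
    return score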

Results of the Linguistic Analysis

LEX

The average LEX scores of third- and sixth-grade texts published in different decades are presented in Figure 1.

FIGURE 1. Mean LEX scores of third- and sixth-grade reading texts. Y-error bars indicate the 95% confidence interval for the mean.

Third grade. A one-way analysis of variance (ANOVA) showed significant between-decade differences in the mean LEX scores of third-grade texts, F(9, 2141) = 38.04, p < .001. Tukey’s honestly significant difference (HSD) post hoc tests revealed the following significant between-decade differences:

1. The 2000s had a significantly higher LEX score than all other decades (p < .001).

2. The 1990s had a significantly higher LEX score than all earlier decades (p < .05).

3. The 1940s had a significantly lower LEX score than all other decades (p < .001).

4. The 1930s had a significantly lower LEX score than the 1970s and 1980s (p < .005).

Sixth grade. A one-way ANOVA also revealed significant between-decade differences in the mean LEX scores of sixth-grade texts, F(9, 1708) = 9.641, p < .001. Tukey’s HSD post hoc tests revealed the following significant between-decade differences only:

1. The 1920s had a significantly higher LEX score than all other decades (p < .01).

2. The 1930s had a significantly higher LEX score than the 1980s and 1990s (p < .05).

Taken together, these results indicate that the mean LEX scores for third-grade texts declined from the 1910s through the 1940s but then increased steadily from the 1950s onwards, with the texts published in the 1990s and 2000s exhibiting the highest LEX scores among all decades. For sixth-grade texts, LEX scores were comparable across all decades but the 1920s and 1930s, the two decades that had the highest LEX scores. In other words, the lexical difficulty of sixth-grade texts did experience a decline after the peak of the 1920s, but it stabilized by the 1940s and remained relatively constant after that.
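As a point of reference for the analyses reported in this and the following sections, a minimal sketch of the decade-wise comparison (a one-way ANOVA followed by Tukey’s HSD post hoc tests) might look like the following; the DataFrame layout and column names are our assumptions, not the authors’ code:

import pandas as pd
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

def compare_decades(texts_df, score_col):
    """texts_df has one row per text, a 'decade' column, and a numeric score
    column (e.g., LEX, WFB, readability index, or MLS)."""
    groups = [g[score_col].values for _, g in texts_df.groupby("decade")]
    f_stat, p_value = f_oneway(*groups)  # omnibus one-way ANOVA
    tukey = pairwise_tukeyhsd(texts_df[score_col], texts_df["decade"])  # all pairwise decades
    return f_stat, p_value, tukey

Usage might be f_stat, p_value, tukey = compare_decades(third_grade_df, "lex"), followed by print(tukey.summary()) to list the significant between-decade pairs.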

WFB Scores

The mean WFB scores of third- and sixth-grade texts from different decades are summarized in Figure 2. WFB scores, notably, differ from the LEX scores in that they use a historically accurate reference lexicon; therefore, they take into account the words that a reader might have been expected to know at a given point in time.

FIGURE 2. Mean word frequency band (WFB) scores of third- and sixth-grade reading texts. Y-error bars indicate the 95% confidence interval for the mean.

Third grade. A one-way ANOVA showed significant between-decade differences in WFB scores of third-grade texts, F(9, 5249) = 60.243, p < .001. Tukey’s HSD post hoc tests revealed the following between-decade differences:

1. The 2000s had a significantly higher WFB score than all other decades (p < .001).

2. The 1990s had a significantly higher WFB score than the 1910s–1970s (p < .05).

3. The 1980s had a significantly higher WFB score than the 1920s and 1950s (p < .005).

4. The 1970s had a significantly higher WFB score than the 1950s (p < .001).

5. The 1940s had a significantly lower WFB score than all other decades (p < .001).

6. The 1930s had a significantly lower WFB score than all other decades (p < .01) except the 1920s, 1940s, and 1950s.

Sixth grade. A one-way ANOVA also showed significant between-decade differences in WFB scores of sixth-grade texts, F(9, 2772) = 5.109, p < .001. Tukey’s HSD post hoc tests revealed that the 1920s and 1990s both had significantly higher scores than the 1930s, 1940s, and 1950s (p < .05).

Similar to the results on the LEX scores, these results indicate that for third-grade texts, WFB scores declined from the 1910s to the 1940s and then increased steadily, notably from the 1970s onward, reaching the highest level in the 2000s. For sixth-grade texts, WFB scores stayed largely stable, with the exception of the 1920s and 1990s, the two decades with the highest WFB scores. In effect, the WFB scores trace the kind of reverse bell-shaped curve that Chall found in her research.

New Dale-Chall Readability Index

The mean Dale-Chall readability indices of third- and sixth-grade texts published in different decades are summarized in Figure 3.

FIGURE 3. Mean Dale-Chall readability index of third- and sixth-grade reading texts. Y-error bars indicate the 95% confidence interval for the mean.

Third grade. A one-way ANOVA showed significant between-decade differences in Dale-Chall readability indices of third-grade texts, F(9, 5249) = 108.339, p < .001. Tukey’s HSD post hoc tests revealed the following significant between-decade differences:

1. The 1910s had a significantly higher readability index than all other decades (p < .001).

2. The 1920s and 1930s both had significantly higher readability indices than the 1940s–2000s (p < .001).

3. The 1940s had a significantly higher readability index than the 1950s (p = .025).

4. The 2000s had a significantly higher readability index than the 1950s–1990s (p < .001).

Sixth grade. A one-way ANOVA also showed significant between-decade differences in Dale-Chall readability indices of sixth-grade texts, F(9, 2772) = 46.644, p < .001. Tukey’s HSD post hoc tests revealed the following significant between-decade differences:

1. The 1920s had a significantly higher readability index than all other decades (p < .001).

2. The 1910s and 1930s both had significantly higher readability indices than the 1940s–2000s (p < .001).

3. The 1940s had a significantly higher readability index than the 1960s, 1980s, and 1990s (p < .01).

Taken together, these results indicate that for both third- and sixth-grade texts the Dale-Chall readability index was significantly higher in the earlier decades (especially 1910s–1930s) than in the later decades (especially 1950s–1990s). No significant changes were found in the later decades (1950s–2000s), with the exception of a significant improvement in the 2000s for third-grade texts. However, because the Dale-Chall index combines the percentage of difficult words and the average sentence length, it is reasonable to suspect that the results on the readability index, especially those that appear to contradict our LEX and WFB scores, are a consequence of between-decade differences in mean sentence length rather than in lexical difficulty. We turn to this difference below.

Mean Length of Sentence

The MLS of third-grade and sixth-grade texts is summarized in Figure 4.

FIGURE 4. Mean length of sentence (MLS) of third- and sixth-grade reading texts. Y-error bars indicate the 95% confidence interval for the mean.

Third grade. A one-way ANOVA showed significant between-decade differences in MLS of third-grade texts, F(9, 5249) = 110.698, p < .001. Tukey’s HSD post hoc tests revealed similar patterns of between-decade differences in MLS as those in the Dale-Chall readability index:

1. The 1910s had a significantly higher MLS than all other decades (p < .001).

2. The 1920s and 1930s both had a significantly higher MLS than the 1940s–2000s (p < .001).

3. The 1940s had a significantly higher MLS than the 1950s and 1980s (p < .05).

4. The 2000s had a significantly higher MLS than the 1950s–1990s (p < .005).

Sixth grade. A one-way ANOVA also showed significant between-decade differences in MLS of sixth-grade texts, F(9, 2772) = 46.687, p < .001. Tukey’s HSD post hoc tests revealed the same patterns of between-decade differences in MLS as those in the Dale-Chall readability index:


1. The 1920s had a significantly higher MLS than all other decades (p < .005).

2. The 1910s and 1930s both had a significantly higher MLS than the 1940s–2000s (p < .005).

3. The 1940s had a significantly higher MLS than the 1960s, 1980s, and 1990s (p < .001).

Similar to the results on the readability index, these results indicate that for both third- and sixth-grade texts, MLS was significantly higher in the earlier decades (especially 1910s–1930s) than in the later decades (especially 1950s–1990s). No significant changes were found in the later decades (1950s–2000s), with the important exception of a significant improvement in the 2000s for third-grade texts. Given the increases we found in LEX and WFB, it is therefore likely that the between-decade differences in readability indices resulted from the differences in MLS, rather than those in lexical difficulty.

Analytic Differences Between Our Study and Previous Investigations

Before we discuss our findings, it may be worthwhile to reiterate briefly why our results are so at odds with those of Hayes et al. (1996), especially as their work has provided much of the evidence upon which other claims of schoolbook simplification are based. Our study differs from theirs both in terms of data sets and methods, even though we used one measure, the LEX, in common. First, our study was based upon a much larger corpus than the Hayes team used (9.98 million words vs. 1.14 million), and ours was more targeted (centering on third- and sixth-grade texts vs. spread across Grades 1 through 8). Hayes’s sample used 10 to 30 pages from each text, whereas we used whole texts. Hayes et al. chunked their findings into three broad time periods—1919 to 1945, 1946 to 1962, and 1963 to 1991—whereas we grouped our books into decades in an effort to better capture the subtleties of change across the century. We also attempted to gather a fully representative sample of texts for each decade across the century; Hayes felt that his sample of textbooks was relatively complete for the years 1963–1991 but acknowledged (p. 494) that there were many omissions for the period 1919 to the early 1960s.

In addition to the interpretative differences in understanding historical context we noted with the example of the McGuffey Readers, we attempted to contextualize our textbooks through our approach to determining the WFB score. We analyzed frequency bands of the words in each text using a wordlist derived for each of the 10 decades from the 1910s to the 2000s from texts published in the same period as the textbook (via the Google Books Corpus) rather than using a fixed lexicon (The American Heritage Word Frequency Book, 1971) for all texts. Our goal here, of course, is less to critique the Hayes et al. study per se (for it offers an important first effort at understanding changes to text complexity over time); rather, our larger concern is to challenge the scant research base and the questionable assumptions embedded in the CCSS.

Discussion and Conclusion

Our findings show a distinctly different pattern of historical shifts in text complexity than the simple declines reported by the authors of the CCSS. Regarding third-grade texts, although our measures do show some declines during the early decades of the century (most likely the by-product of the concerted shift in textbooks from reading-for-elocution toward reading-for-comprehension), the clear and compelling story is that the difficulty of reading textbooks has increased steadily over the past 70 years. The trends for sixth-grade texts, though not as noticeably buoyant in reversing earlier trends, reveal a remarkable stability, with some increases, since the 1940s. Put simply, our findings offer compelling evidence that the complexity of reading textbooks, at least at the third- and sixth-grade levels, has either increased or remained noticeably consistent over the past three-quarters of a century. Drawing upon this central conclusion, we discuss five implications of our study.

First, the blanket condemnation made by the CCSS authors that school reading texts have “trended downward in difficulty in the last half century” is inaccurate (Appendix A, p. 2). Given that our corpus of 10 million words is richer, that our time period is longer, and that our measures are more extensive than those employed by previous studies, our investigation offers serious challenges to the historical research embedded in the CCSS. In fact, our third-grade findings echo the reverse bell-shaped curve that Chall detected in the 1970s. They are also consistent with the findings of more recent studies, such as analyses by reading researcher Hiebert, who argues that “the claim that K-3 texts have been dumbed down over the past 50 years is simply not true” (Hiebert, 2011–2012, p. 26; Hiebert & Mesmer, 2013). Our findings also lend additional credence to the argument that elementary grades, more generally, have not experienced decline (Hiebert & Mesmer, 2013). In other words, if the CCSS authors wish to advocate for dramatic increases in text complexity, especially in the reading materials of the elementary grades, they will need to find justifications beyond assumed historical decline for doing so. Otherwise, we may be both hastily attempting to solve a problem that does not exist and elevating text complexity in a way that is ultimately harmful to students.

Second, the CCSS effort to quickly ratchet up text complexity in elementary reading materials seems unnecessarily, even irresponsibly, rushed. Our study indicates that text complexity is not the primary problem. Here again we agree with Hiebert (2011–2012, p. 27), who wonders why we allow ourselves to be “harried by another new standard that has yet to be validated.” According to a wide range of studies, the much more significant challenge is that many third- and fourth-graders are not reading proficiently at current complexity levels. Indeed, results from the 2011 NAEP indicate that only one third of our fourth-graders are reaching the current “proficient” or “advanced” levels, whereas another full third of American school children languish at “below basic” (Aud et al., 2012). Meanwhile, a recent report by ACT (2012) indicates that students who have fallen behind in elementary schools are less likely to catch up later. In other words, increasing text complexity without serious attention to concomitant instructional supports is likely to further broaden and exacerbate the achievement gap without addressing underlying causes.

A third and related point is that overemphasis on text complexity distracts us from educational problems that are arguably much more pressing. Why, for example, should we focus on elevating elementary text complexity rather than focusing on, say, continuing to improve instructional quality? For that matter, perhaps we should urge the reversal of recent school funding cuts in states across the nation, reductions that often undermine basic instructional services for the most vulnerable children. Shall we tinker with complexity levels while overlooking the egregious educational inequities and scandalous socioeconomic conditions that researchers have demonstrated are persistent causes of low academic performance? As we know, students in affluent communities are doing relatively well whereas students in high-poverty schools are the lowest performers (Berliner, 2006; Reardon, 2011; Rothstein, 1998); higher text complexity levels are likely to ignore this problem while further widening the achievement gap.

Fourth, although the CCSS tend to equate higher text complexity with greater academic rigor, higher complexity, in and of itself, is not always better, as the example of McGuffey shows. Chall, Conard, and Harris-Sharples (1991, p. 18) cautioned more than 20 years ago that it is one thing to use quantitative measures to determine how hard a text is, but “optimal difficulty”—that is, how hard it should be for each student—depends on many factors. Williamson, Fitzgerald, and Stenner (2013, p. 65) argue that consideration of the appropriate challenge for students ought to take into account how much challenge is beneficial to students; researchers, they note, have for decades sought a “sweet spot” that situates printed text at exactly the best level for any given individual student. Our results indicate, for example, that there are tradeoffs between such text features as lexical difficulty and sentence length. Would it not be better to engage teachers in an ongoing dual effort to uncover reading difficulties and determine outstanding texts?

Finally, text complexity, though important, is only one dimension of what makes for an excellent, robust, and engaging reading program. One hazard of focusing too intently on the CCSS notion of text complexity is that it ushers us into a rather rarified atmosphere, where Lexile rank—or any static single standard—becomes all-important. The danger is that we will lose sight of what researchers and practitioners view as high, but reasonable, expectations, or that we will overlook the kinds of texts that students find most exciting.

In sum, our findings raise implications for policy, research, and practice, especially as the nation pursues future curricular innovation. We call for a broader view of complexity that incorporates text, instruction, and a wider variety of materials, as well as for an assessment approach using measures that are less restrictive. Our study did not examine textbooks used in seventh through twelfth grade, and we suggest that research examining the changes in middle and high school texts would be highly fruitful, particularly if such an investigation examined not only readings used in English Language Arts but also those assigned in disciplines such as science and social studies. Such a study would provide an even more complete picture of variations in text complexity over time and a much firmer base upon which to build future curricular improvements.

NOTES

The research reported in this article was made possible by a grant from the Spencer Foundation. The views expressed are those of the authors and do not necessarily reflect the views of the Spencer Foundation. The authors wish to express their appreciation to the four anonymous reviewers who gave feedback on this article; here too, any errors or opinions expressed are our own.

1Investigators on our reading project include Robert J. Stevens, Hilary Knipe Swank, Steven L. Thorne, and David P. Baker. This project also parallels another study conducted at Penn State on the history of the mathematics curriculum (David P. Baker, principal investigator) (see Baker et al., 2010; Blair, Gamson, Thorne, & Baker, 2005). The authors thank the many graduate and undergraduate research assistants who aided us with the research reported here.

2Many of the dimensions that constitute “text complexity” have yet to be operationalized for the classroom. As originally conceived by the authors of the CCSS, text complexity is to be determined by a triad of features, including qualitative dimensions (e.g., structure, knowledge demands, levels of meaning, and language conventionality and clarity), quantitative dimensions (e.g., word length or frequency, sentence length, and text cohesion—all aspects of text that can be measured by computer software), and reader considerations (e.g., variables such as reader motivation, knowledge, and experiences; along with purpose and complexity of the task assigned and the questions posed). Hiebert and Mesmer (2013) report that since the original publication of the Standards, CCSS personnel now include additional measures of text features, beyond the Lexile Framework, in their presentations. However, these recommendations have not been officially integrated into Appendix A. Other scholars—including some who work for the organization that developed the Lexile Framework—have argued that the adoption of decisions about shifting quantitative text complexity levels in schools “requires more than the implementation of a single, static standard” (Williamson et al., 2013, p. 56); they suggest that there are multiple trajectories that can lead to the CCSS end-of-high-school target for text complexity exposure, thereby relieving practitioners of the burden of following a single static standard. For more details on the Lexile Framework, see www.lexile.com.

3Perhaps the most recent, and most thorough, effort to develop a working model of text complexity can be found in Mesmer, Cunningham, and Hiebert (2012).

4Researchers examining the relative difficulty of the language used in student textbooks tend to use a variety of descriptive terms interchangeably, such as difficulty, challenge, readability, sophistication, and rigor. Some terms—readability, for example—employ specific formulas to gauge the relative difficulty of any given text as a way to designate books as appropriate for certain grade levels. In their discussions of historical trends, the CCSS employ a variety of terms, but their remedies rely on a rather narrow conception of “text complexity”; and, as we point out, definitions of text complexity vary, as do definitions of academic rigor. Because the CCSS use the term “text complexity,” we use that term here. Mesmer et al. (2012) suggest that in future research, scholars should distinguish between text complexity and text difficulty.

5Space does not allow for an in-depth accounting of the analytic problems we found in the literature used by the CCSS authors, so we offer just two brief illustrations here. We anticipate addressing these concerns in more detail in our future work.

6One dimension not discussed by the CCSS authors is that both the Hayes et al. (1996) and the Chall et al. (1977) studies were motivated in part by an interest in explaining the reported declines in SAT scores that received a great deal of attention beginning in the 1970s. Indeed, Chall’s study was initiated at the behest of the College Board. For their part, Hayes et al. argue that the decline in SAT scores can be explained by a “knowledge deficit” that was caused in part by the declining challenge in the textbooks they examined; however, numerous studies have cautioned against using SAT scores to draw conclusions about the quality of American schools (see, e.g., Grissmer, 2000; Steelman & Powell, 1996).

7Even those who draw upon the Hayes et al. study to substantiate their own claims acknowledge that Hayes’s sample was rather limited (see, e.g., Adams, 2010–2011).

8Neither David Coleman nor Susan Pimentel is a trained educator, researcher, or reading specialist.

9For further discussion of our design and approach, see our methodological appendix (available on the journal website).

REFERENCES

ACT. (2012). Catching up to college and career readiness. Iowa City, IA: ACT.

Adams, M. J. (2010–2011). Advancing our students’ language and literacy: The challenge of complex tasks. American Educator, 34(4), 3–11, 53.

Aud, S., Hussar, W., Johnson, F., Kena, G., Roth, E., Manning, E., . . . Zhang, J. (2012). The Condition of Education 2012 (NCES 2012-045). Washington, DC: U.S. Department of Education, National Center for Education Statistics. Retrieved from http://nces.ed.gov/pubsearch

Baker, D., Knipe, H., Collins, J., Leon, J., Cummings, E., Blair, C., & Gamson, D. (2010). One hundred years of elementary school mathematics in the United States: A content analysis and cognitive assessment of textbooks from 1900 to 2000. Journal for Research in Mathematics Education, 41, 383–423.

Berliner, D. C. (2006). Our impoverished view of educational research. Teachers College Record, 108, 949–995.

Blair, C., Gamson, D., Thorne, S., & Baker, D. (2005). Rising mean IQ: Cognitive demand of mathematics education for young children, population exposure to formal schooling, and the neurobiology of the prefrontal cortex. Intelligence, 33, 93–106.

Carroll, J. B., Davies, P., & Richman, B. (1971). The American Heritage word frequency book. Boston, MA: Houghton Mifflin.

Chall, J. S. (1995). Readability revisited: The new Dale-Chall readability formula. Cambridge, MA: Brookline Books.

Chall, J. S. (1996). Learning to read: The great debate (3rd ed.). Fort Worth: Harcourt Brace College.

Chall, J. S., Conard, S., & Harris, S. (1977). An analysis of textbooks in relation to declining SAT scores. New York: Advisory Panel on the Scholastic Aptitude Test Score Decline.

Chall, J. S., Conard, S. S., & Harris-Sharples, S. (1991). Should textbooks challenge students? The case for easier or harder textbooks. New York, NY: Teachers College Press, Teachers College, Columbia University.

Durkin, D. (1978–1979). What classroom observations reveal about reading comprehension instruction. Reading Research Quarterly, 14, 481–533.

Durkin, D. (1981). Reading comprehension instruction in five basal reader series. Reading Research Quarterly, 16, 515–544.

Fisher, D., Frey, N., & Lapp, D. (2012). Text complexity: Raising rigor in reading. Newark, DE: International Reading Association.

Graham, P. A. (1992). S.O.S.: Sustain our schools. New York, NY: Hill and Wang.

Grissmer, D. (2000). The continuing use and misuse of SAT scores. Psychology, Public Policy, and Law, 6, 223–232.

Hayes, D. P. (2003). A guide to lexical analysis of natural texts using QLEX or QANALYSIS. Sociology Technical Report (Vol. 2033-1). Ithaca, NY: Cornell University.

Hayes, D. P., Wolfer, L. T., & Wolfe, M. F. (1996). Schoolbook simplification and its relation to the decline in SAT-Verbal scores. American Educational Research Journal, 33, 489–509.

Hiebert, E. H. (2011). Beyond single readability measures: Using multiple sources of information in establishing text complexity. Journal of Education, 191(2), 33.

Hiebert, E. H. (2011–2012). The Common Core’s staircase of text complexity: Getting the size of the first step right. Reading Today, 29(3), 26–27.

Hiebert, E. H., & Mesmer, H. A. E. (2013). Upping the ante of text complexity in the Common Core State Standards: Examining its potential impact on young readers. Educational Researcher, 42, 44–51.

Huey, E. B. (1908). The psychology and pedagogy of reading, with a review of the history of reading and writing and of methods, texts, and hygiene in reading. New York, NY: Macmillan.

Loveless, T. (2012). The 2012 Brown Center report on American education: How well are American students learning? With sections on predicting the effect of the Common Core State Standards, achievement gaps on the two NAEP tests, and misinterpreting international test scores (Vol. 3). Washington, DC: Brookings Institution.

Lu, X. (2009). Automatic measurement of syntactic complexity in child language acquisition. International Journal of Corpus Linguistics, 14(1), 3–28.

Marks, C. B. (1974). Word frequency and reading comprehension. Journal of Educational Research, 67, 259.

Mathews, M. M. (1966). Teaching to read, historically considered. Chicago: University of Chicago Press.

McGregor, A. K. (1989). The effect of word frequency and social class on children’s reading comprehension. Reading, 23, 105–115.

McGuffey, W. H., & Gorn, E. J. (1998). The McGuffey readers: Selections from the 1879 edition. Boston, MA: Bedford Books.

Mesmer, H. A., Cunningham, J. W., & Hiebert, E. H. (2012). Toward a theoretical model of text complexity for the early grades: Learning from the past, anticipating the future. Reading Research Quarterly, 47, 235–258.

Michel, J.-B., Shen, Y. K., Aiden, A. P., Veres, A., Gray, M. K., The Google Books Team, . . . Aiden, E. L. (2011). Quantitative analysis of culture using millions of digitized books. Science, 331(6014), 176–182.

National Governors Association Center for Best Practices & Council of Chief State School Officers (NGA/CCSSO). (2010a). The Common Core State Standards. Washington, DC: Author.

National Governors Association Center for Best Practices & Council of Chief State School Officers (NGA/CCSSO). (2010b). The Common Core State Standards for English Language Arts & Literacy in History/Social Studies, Science, and Technical Subjects, Appendix A: Research Supporting Key Elements of the Standards. Washington, DC: Author.

Pearson, P. D. (2009). The roots of reading comprehension instruction. In S. E. Israel & G. G. Duffy (Eds.), Handbook of research on reading comprehension (pp. 3–31). New York, NY: Routledge.

Popp, H. M. (1975). Current practices in the teaching of beginning reading. In J. B. Carroll & J. S. Chall (Eds.), Toward a literate society: The report of the Committee on Reading of the National Academy of Education with a series of papers commissioned by the committee (pp. 101–146). New York, NY: McGraw-Hill.

Porter, A., McMaken, J., Hwang, J., & Yang, R. (2011). Common Core Standards: The new U.S. intended curriculum. Educational Researcher, 40, 103–116.

Reardon, S. F. (2011). The widening academic achievement gap between the rich and the poor: New evidence and possible explanations. In G. J. Duncan & R. J. Murnane (Eds.), Whither opportunity? Rising inequality, schools, and children’s life chances (pp. 91–115). New York, NY: Russell Sage Foundation.


Resnick, D. P., & Resnick, L. B. (1977). The nature of literacy: An historical exploration. Harvard Educational Review, 47, 370–385.

Rothstein, R. (1998). The way we were? The myths and realities of America’s student achievement. New York, NY: Century Foundation.

Schmidt, W. H., & Houang, R. T. (2012). Curricular coherence and the Common Core State Standards for mathematics. Educational Researcher, 41, 294–308.

Smith, N. B. (2002). American reading instruction. Special edition. Newark, DE: International Reading Association.

Snow, C. (2002). Reading for understanding: Toward a research and development program for reading comprehension. Santa Monica, CA: Rand.

Steelman, L. C., & Powell, B. (1996). Bewitched, bothered, and bewildering: The use and misuse of state SAT and ACT scores. Harvard Educational Review, 66, 27.

Sullivan, D. P. (1994). William Holmes McGuffey: Schoolmaster to the nation. Rutherford, NJ: Associated University Presses.

Toutanova, K., Klein, D., Manning, C., & Singer, Y. (2003). Feature-rich part-of-speech tagging with a cyclic dependency network. Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (pp. 252–259). Edmonton, Alberta, Canada: The Association for Computational Linguistics.

Venezky, R. L. (1987). A history of the American reading textbook. Elementary School Journal, 87, 246–265.

Westerhoff, J. H. (1978). McGuffey and his readers: Piety, morality, and education in nineteenth-century America. Nashville: Abingdon.

Williamson, G. L., Fitzgerald, J., & Stenner, A. J. (2013). The Common Core Standards’ quantitative text complexity trajectory: Figuring out how much complexity is enough. Educational Researcher, 42(2), 59–69.

AUTHORS

DAVID A. GAMSON, PhD, is an associate professor of education in the Department of Education Policy Studies at The Pennsylvania State University, 300 Rackley Building, University Park, PA 16802; [email protected]. His research focuses on the history of American education, with a special interest in the history of reform, policy, school districts, and the curriculum.

XIAOFEI LU, PhD, is Gil Watz Early Career Professor in Language and Linguistics and Associate Professor of Applied Linguistics at The Pennsylvania State University, 304 Sparks Building, University Park, PA 16802; [email protected]. His research interests are primarily in corpus linguistics, computational linguistics, and intelligent computer-assisted language learning.

SARAH ANNE ECKERT, PhD, teaches history and provides research support to the Center for the Advancement of Girls at The Agnes Irwin School in Bryn Mawr, Pennsylvania; [email protected]. Her research focuses on teacher preparation for urban areas and education policy.

Manuscript received February 22, 2013
Revision received July 18, 2013

Accepted August 13, 2013