
L1/L2 Adolescent Vocabulary Use in a Learner Corpus: Academic Success and Lexical Sophistication/Diversity in Grade-12 Expository Writing

Geoffrey Pinchbeck, PhD (Medical Biochemistry), PhD Candidate
[email protected]
Werklund School of Education, University of Calgary

AAAL 2015, American Association for Applied Linguistics
Toronto, ON, March 21, 2015

L1/L2 Adolescent Vocabulary Use in a Learner Corpus: Academic Success and Lexical Sophistication/Diversity in Grade-12 Expository Writing
or: Academic English is no one’s L1

AAAL 2015, American Association for Applied Linguistics
Toronto, ON, March 22, 2015

Geoffrey Pinchbeck, PhD (Medical Biochemistry), PhD Candidate
Werklund School of Education, University of Calgary

Outline

• Academic English language use in:
  • 17-18 year olds
  • grade 12
  • mainstream, academic track
  • mono-, bi-, multilingual

• My data:
  • 1003 student essays
  • academic transcripts

• My analysis:
  • Vocabulary use vs. academic success
  • Vocabulary use vs. K-12 textbook/reader corpora
  • differences in grammar use

Research on Language Development

• prenatal - 4 year old children: L1 syntax & phonetics

• early primary grades: L1 literacy: decoding & orthography

• adult immigrants, foreign language learners: L2, L3

• language minority bilingual/multilingual learners

• L1 for adolescents: Junior High, Senior High, University, & beyond... ?

Research on (academic) Language Development

• Pedagogical intervention?

• Model of development?
  • Vocabulary
  • Morphology
  • Multiword units (MWUs)
  • Syntax
  • Discourse
  • Register / Genre

Alberta, Canada: Grade 12 English Language Arts 30-1 (ELA) Provincial “Diploma” Exam

English Language Arts (ELA) 30-1

• English literature course
• required for entry to all university programs
• taken by 45% of high school graduates

Provincial “Diploma” Exam

• Part A: Writing - 3 hours (50%)
  • Task 1: Personal Response to Texts (20%)
  • Task 2: Critical/Analytical Response to Literary Texts (30%)

• Part B: Reading - 3 hours (50%)
  • 70 multiple choice questions

This Study: Essays and Data from Alberta Education

• 1500 exam essays: 1150 typed, 350 hand-written

• 1002 typed essays included (so far)

• Associated data:
  • Provincial exam scores: Math, English/Social Studies
  • ESL coding history (grades 1-12)
  • Years in the provincial school system (1-12 years)
  • Essay scores and subscores (holistic rubric)

ELA Exam Writing Task 2 (30%)

Suggested time: approximately 1½ to 2 hours

“Choose from short stories, novels, plays, screenplays, poetry, films, or other literary texts that you have studied in English Language Arts 30–1. …”

“Discuss the idea(s) developed by the text creator in your chosen text about the role adversity plays in shaping an individual’s identity.”

This corpus: 186 topics

Essay Length

[Figure: histogram of essay length. x-axis: Tokens (0 to 4000); y-axis: Frequency (0 to 150). Mean = 943, Std. Dev. = 332, N = 1,003.]

Whole Population vs Sample

[Figure: distribution of essay scores for the sample (n = 1003) and the population (N = 16,303). x-axis: Essay Score ( /30), 8 to 30; y-axis: Number of Essays, 0 to 14.]

Research Questions (Exploratory)

• Does vocabulary use in exam essays correlate with “academic success” in high school?

• What is the range of vocabulary use in high-school student writing prior to starting university?

• What kind of vocabulary is uniquely (or disproportionately) used by academically successful students?

• Goal: define important language that is teachable

What is success?

“Academic Success”

Language-Dependent Academic Competencies:
• Morphosyntax / pronunciation / listening (?)
• Vocabulary
  • Breadth / Depth
  • Productive / Receptive
  • General-Academic / Subject-specific
• Genre / Register / Discourse

Language-Independent Academic Competencies:
• Working memory & other cognitive factors
• Socio-economic status
• Strategies
• Attitude
• Motivation
• Support

Measures:
• Math exam scores
• English + Social Studies exam scores

Correlations among Government exam scores

                                    English Language Arts   Social Studies   Academic Math   Applied Math
English Language Arts               1                       .810**           .432**          .438**
English + Social Studies Combined   .985**                  .983**           .478**          .473**

** p < .001

Next… Identification of language in student writing that is disproportionately associated with academic success…

Grade 12 students are not adult L2 learners
• rethink assumptions
• optimization of methods

Lexical Sophistication Measurement: Optimization

• improve signal-to-noise ratio:
  • remove: prompt words, proper nouns & compound nouns

• Lexical Profiles (frequency bands) vs. average frequency
  • (Laufer & Nation, 1995)
  • (Crossley, Cobb, & McNamara, 2013)

Lexical Profiles & Academic Success

[Figure: cumulative % of word families (95% CI; y-axis, 60 to 100, with 95 and 99 marked) by BNC-COCA 25 frequency band (x-axis, 1K to 13K), plotted separately by English + Social Studies average grade (A, B, C, F).]
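A lexical profile of this kind can be sketched in a few lines. The block below is only an illustration under stated assumptions: `band_of` is a hypothetical lookup from word family to BNC-COCA band number (1 for 1K, 2 for 2K, and so on), and coverage is counted over running tokens that have already been mapped to their family headwords. Whether the figure counts tokens or distinct families is not stated on the slide.

```python
from collections import Counter

def lexical_profile(families, band_of, n_bands=13):
    """Cumulative % coverage at each frequency band, 1K up to n_bands K.

    families: one family headword per running token (assumed input format).
    band_of:  hypothetical dict mapping a headword to its band number.
    Tokens whose family is not in band_of count toward the total but are
    never covered, so the last value can be below 100.
    """
    counts = Counter()
    for fam in families:
        band = band_of.get(fam)
        if band is not None and 1 <= band <= n_bands:
            counts[band] += 1
    total = len(families)
    profile = []
    running = 0
    for band in range(1, n_bands + 1):
        running += counts[band]
        profile.append(100.0 * running / total if total else 0.0)
    return profile

# Toy example; the band assignments here are invented for illustration.
toy_bands = {"the": 1, "go": 1, "adversity": 3}
print(lexical_profile(["the", "go", "the", "adversity"], toy_bands, n_bands=3))
```

Comparing the band at which each grade group reaches 95% or 99% coverage is one way to read such a profile.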

Lexical Sophistication

• “average word frequency”:

$$\text{average word frequency} = \frac{\text{sum of frequencies (in a reference corpus) of all essay words}}{\text{total number of words}}$$

• (Crossley, Cobb, & McNamara, 2013)
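The average word frequency defined above can be computed directly once a reference frequency list is available. This is a minimal sketch, not the procedure used in the study: `ref_freq` is a hypothetical dict of reference-corpus frequencies, `excluded` stands in for the prompt words and proper nouns that the slides say were removed, and treating words missing from the reference list as frequency 0 is an assumption.

```python
def average_word_frequency(tokens, ref_freq, excluded=frozenset()):
    """Mean reference-corpus frequency of the essay's words.

    tokens:   essay words, already lower-cased and tokenized (assumed).
    ref_freq: hypothetical dict word -> frequency in a reference corpus.
    excluded: words to drop before averaging (e.g. prompt words).
    Words absent from ref_freq are counted with frequency 0 (assumption).
    """
    kept = [t for t in tokens if t not in excluded]
    if not kept:
        raise ValueError("no tokens left after exclusions")
    return sum(ref_freq.get(t, 0.0) for t in kept) / len(kept)

# Toy frequencies, invented for illustration only.
toy_freq = {"the": 100.0, "cat": 10.0}
print(average_word_frequency(["the", "cat", "the"], toy_freq))
```

A lower average means the essay relies on rarer words in the reference corpus, which is how the slides use it as a sophistication index.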

Lexical Sophistication Measurement: Optimization

• improve signal-to-noise ratio:
  • remove: prompt words, proper nouns & compound nouns

• frequency bands vs. average frequency
  • (Laufer & Nation, 1995)
  • (Crossley, Cobb, & McNamara, 2013)

• Scale: What is a “sophisticated” word?
  • (choice of reference corpus/corpora)
  • SubtlexUS - subtitle corpus (TV & Movies)

• definition of “word”

Definition of ‘Word’

• meaning: “running a race” ; “the engine is running”

• Type = spelling: running

• Lemma = stem + inflections (POS tagging required)
  • run (verb): run, runs, ran, running

• Word Family = stem + inflections + derivations
  • run = run, running, runs, ran, runner, runners, runny, runnier, runniest

• Multiword units?: “run over” “on the run” “a run of bad luck”
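The effect of the counting unit can be seen by counting the same tokens three ways. The sketch below uses two small hypothetical lookup tables (`LEMMA_OF`, `FAMILY_OF`) built from the "run" example above. Real lemmatization needs POS tags, as the slide notes, so the table-based lemma lookup here is a deliberate simplification.

```python
from collections import Counter

# Hypothetical lookup tables covering only the slide's "run" example.
LEMMA_OF = {
    "run": "run", "runs": "run", "ran": "run", "running": "run",
    "runner": "runner", "runners": "runner",
    "runny": "runny", "runnier": "runny", "runniest": "runny",
}
FAMILY_OF = {form: "run" for form in LEMMA_OF}

def count_units(tokens):
    """Count the same tokens as types, lemmas and word families."""
    types = Counter(tokens)
    lemmas = Counter(LEMMA_OF.get(t, t) for t in tokens)
    families = Counter(FAMILY_OF.get(t, t) for t in tokens)
    return types, lemmas, families

types, lemmas, families = count_units(["running", "ran", "runner", "runners"])
print(len(types), len(lemmas), len(families))
```

The coarser the unit, the fewer distinct "words" a text contains, which changes every frequency-based index computed on top of it.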

Optimal definition of ‘Word’ (Grade 12 student writing)

• “Type”: spelling (“form”): running
  • meaning: “running a race” ; “the engine is running”

• “Lemma”: stem + inflections
  • run (verb): run, runs, ran, running

• Word Family: stem + inflections + derivations
  • run = run, running, runs, ran, runner, runners, runny, runnier, runniest

• Multiword units?: “run over” “on the run” “a run of bad luck”

Regression Model of Language-Dependent Academic Success (linear regression)

Dependent variable:
• ELA 30-1 & Social Studies averaged exam score

Predictors:
• Math
• no ESL coding history
• Lexical Sophistication
• MTLD (Lexical Diversity)
• Total Word Families

• adjusted R² = 0.534, F (784) = 150.5 (p < .001)

English-Social Studies averaged score (without essay score)

0.701

Regression Model

Predictor                                                                         Correlation (r)   Adjusted R Square
Lexical Sophistication: Average Word Frequency (rarity in Subtitle Corpus)        0.55              0.343

• F (1001) = 524.3 (p < .001)

Further Validation

• Essays truncated to 500, 750, 1000 words:
  • words taken from beginning
  • words taken from end
  • words randomized

• statistically weaker but the story is the same
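The three truncation conditions above can be sketched as one helper. The reading of "words randomized" as a random sample of the essay's tokens is an assumption, as are the function name and the fixed seed.

```python
import random

def truncate(tokens, n, mode, seed=0):
    """Return n tokens from an essay under one of three conditions.

    mode: "beginning" (first n tokens), "end" (last n tokens) or
          "random" (n tokens after shuffling; assumed interpretation).
    """
    if mode == "beginning":
        return tokens[:n]
    if mode == "end":
        return tokens[-n:] if n > 0 else []
    if mode == "random":
        rng = random.Random(seed)  # fixed seed keeps the sample reproducible
        shuffled = list(tokens)
        rng.shuffle(shuffled)
        return shuffled[:n]
    raise ValueError(f"unknown mode: {mode}")

essay = list("abcdefghij")  # stand-in for a tokenized essay
for mode in ("beginning", "end", "random"):
    print(mode, truncate(essay, 3, mode))
```

Recomputing the lexical indices on each truncated version shows whether the result depends on essay length or on where in the essay the words come from.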

Now what?

What kind of vocabulary distinguishes academically successful student writing?

Compare vocabulary used: high and low achieving students

• >80% ELA & Soc. Stud. Student Corpus (n = 116):
  • tokens: 135852
  • types: 8551
  • families: 4359

• <53% ELA & Soc. Stud. Student Corpus (n = 180):
  • tokens: 141820
  • types: 7188
  • families: 4163

Text-Lex Compare (lextutor.ca)

• Families unique to A-students?
• Families unique to D/F students?
• Families shared between A-students & D/F students?

Text-Lex Compare (lextutor.ca): results

• Families shared by both sub-corpora: 2645
• >80% students’ writing, unique families: 2028 (>80% ELA & Soc. Stud. Student Corpus: tokens: 135852, n = 116)
• <53% students’ writing, unique families: 671 (<53% ELA & Soc. Stud. Student Corpus: tokens: 141820, n = 180)
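The shared/unique split reported above is a set comparison over the word families found in each sub-corpus. A minimal sketch, assuming each sub-corpus has already been reduced to a list of family headwords (the function and key names are hypothetical, and this is not Text-Lex Compare's own code):

```python
def compare_families(families_high, families_low):
    """Split word families into shared and sub-corpus-unique sets."""
    high, low = set(families_high), set(families_low)
    return {
        "shared": high & low,        # families found in both sub-corpora
        "unique_high": high - low,   # only in the >80% students' writing
        "unique_low": low - high,    # only in the <53% students' writing
    }

# Toy headword lists, for illustration only.
result = compare_families(
    ["adversity", "identity", "show"],
    ["show", "kid", "identity", "guy"],
)
print({name: sorted(fams) for name, fams in result.items()})
```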

Text-Lex Compare sub-corpora: lexical profiles

[Figure: percentage (y-axis, 0% to 20%) by BNC-COCA frequency band (x-axis, 1K to 13K) for the families shared by both sub-corpora (2645), the families unique to >80% students’ writing (2028), and the families unique to <53% students’ writing (671).]

Keyword word families in poor vs. good student writing (Fisher exact test)

once seem upon that while demonstrate fail world return author without support arrive ultimate discover effect idea lead when more coward create display fire honest progress sane concern defeat persevere on flaw beauty isolate norm eventually experience true initial and a greater redemption inevitable nine to circumstance cut suffer depict in transform final success strength enable pursuit pressure likely

cause confine belief desire despair reach crush approve introduce journey between tear tragedy aspect drink one it instead importance mark attitude insight system vision ruthless resolve theory garnet daisy finance worth for view part incredible degree practise sole balance shame secure justice consequent during such less lifestyle drive with face highlight description benefit inherent large pulp capable community abandon

establish period overwhelm escape read guilty grow standard apparent order embrace poet dominate essential exception advance village subsequent observe method consume remove achieve superficial quality adversary reduce procrastinate justify defence condition abort catalyst sheer notion destruction effort cruel failure represent than the scene provide external she wealth newfound tragic construct primary idealistic kettle prominent constant reject pain physical yet

this cope loss warfare surmount stem strain citizen stifle plight profound pearl culminate emphasise superior moss tempt inquisition foster can human abuse continue entire above character form past woman violence react courage shift through self compassion passion possess victory exemplify universe act challenge belle recognize extreme doubt avoid street immediate deem plague conform principle distinct convention will identify illusion

maintain potential dilemma arm confidence who obsess current confront develop force repeat replace wallpaper population fundamental resist light perceive undergo common opinion describe present build sense solid remain increase detriment inflict succumb symbol destroy soldier direct oppose further clear interact able mature often overcome reveal term reinforce effective previous result retreat root romantic jig plot era dictate necessary from

appear require rather of aware behave commission facade triumph metaphor by an simple former as upper lack rise desperate adverse social class society refuse reflect thus compromise moral audience mere persona alcohol however improve regard allow despite oppress sale blind nature response attempt perhaps yank acknowledge source mrs attack evident culture narrate ideal contrast define

laird proctor barber buddy morphine existential mountain sergeant bother assault teddy dad tiger nice special climb mile cheat movie happen bomb kid lot you big secret get guy want huge roof hammer stocking torture fantasy mum scare girl go film warden always jail just red really know shape why store keep rock together

kill every because stay talk think tell catch horrible stone start letter direction about conclusion send help holocaust soon bank crazy look matter boy back different lie people walk total misfortune say affect each side wife up happy prison thing live text good bad there try trouble whole meet like some what let

could need money do would find race guard love way ask sister main hear die how forget protagonist concentrate brother out but friend along day directed listen time god if wrong reason never give today off any away make all misery feel hard very sick harm parent where change best father play camp

name game take end not so show have little close many hate important bring year we young around run man another though put pass i life house then be now he they save come before person also
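A keyword test of this kind asks whether a word family is used more often in one sub-corpus than the other, given the two corpus sizes. The block below is a sketch of a two-sided Fisher exact test on a 2x2 table, written with log-gamma so that corpus-sized counts do not overflow. It illustrates the test only; it does not reproduce the Minimal Ratio metric reported on the slides, and the toy counts are invented.

```python
from math import exp, lgamma

def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]].

    For keywords: a = family count in corpus 1, b = other tokens in corpus 1,
                  c = family count in corpus 2, d = other tokens in corpus 2.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def log_p(x):
        # Hypergeometric log-probability of x in the top-left cell.
        return _log_comb(row1, x) + _log_comb(row2, col1 - x) - _log_comb(n, col1)

    observed = log_p(a)
    total = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        lp = log_p(x)
        if lp <= observed + 1e-7:  # tables at least as extreme as observed
            total += exp(lp)
    return min(1.0, total)

# Toy counts: a family used 30 times in 1000 tokens vs. 10 times in 1000.
print(fisher_exact_two_sided(30, 970, 10, 990))
```

Running the test once per word family and keeping the families with small p-values gives a keyword list for each sub-corpus.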

BNC-COCA frequency bands (lextutor.ca)

Band     <53%   >80%
K-1      165    105
K-2      10     96
K-3      5      98
K-4      4      18
K-5      2      6
K-6      3      11
K-7      3      6
K-8      1      2
K-9      1      1
K-11     2      2
K-13     1      1
total    197    350

Additional Validation

Student writing sub-corpora comparisons with other large corpora:

• spoken corpora (UK/US)

• academic written corpora (UK/US)

• K-12 school texts and readers corpus (US)

Student Writing vs. Large Corpora of SPEECH

Spearman rank correlations of word family frequency

                                               BNC Speech (UK)   COCA Speech (US)   TV Soap-Operas (US)   SUBTLEX movie/tv sub-titles (US)   SUBTLEX tv sub-titles (UK)
above 80%                                      0.473             0.558              0.456                 0.448                              0.501
below 53%                                      0.554             0.621              0.595                 0.590                              0.595
Hotelling-Williams test, t-value (df = 2803)   7.9**             6.6**              14.0**                14.2**                             9.5**

(N = 2809 word families) ** correlations are significantly different (p < .001)

Student Writing vs. Large Corpora of Academic WRITING

Spearman rank correlations of word family frequency

                                               BNC Academic (UK)   COCA Academic (US)   K-12 texts & readers (US)
above 80%                                      0.562               0.564                0.485
below 53%                                      0.485               0.492                0.563
Hotelling-Williams test, t-value (df = 2803)   8.8**               8.0**                -4.0**

(N = 2809 word families) ** correlations are significantly different (p < .001)
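The Hotelling-Williams rows compare two dependent correlations that share one variable: here, a reference corpus correlated with each of the two student sub-corpora. The sketch below uses the commonly cited Williams t statistic for that situation. Treating it as the exact variant behind the slide values is an assumption, and it needs the correlation between the two sub-corpora (`r23`), which the slides do not report. In this form the degrees of freedom are n - 3.

```python
from math import sqrt

def williams_t(r12, r13, r23, n):
    """Williams t for H0: rho12 == rho13, with variable 1 shared.

    r12: correlation of the reference corpus with sub-corpus A
    r13: correlation of the reference corpus with sub-corpus B
    r23: correlation between sub-corpus A and sub-corpus B
    n:   number of paired observations (word families)
    Returns (t, degrees_of_freedom).
    """
    # Determinant of the 3x3 correlation matrix.
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    r_bar = (r12 + r13) / 2
    denom = 2 * ((n - 1) / (n - 3)) * det + r_bar**2 * (1 - r23) ** 3
    t = (r12 - r13) * sqrt((n - 1) * (1 + r23) / denom)
    return t, n - 3

# Illustrative inputs only; r23 = 0.5 is an invented value.
print(williams_t(0.6, 0.4, 0.5, 103))
```

The sign of t follows the direction of the difference, which is why a table can contain both positive and negative values.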

Student Writing vs. K-12 texts and readers

[Figure: Spearman rank correlation (y-axis, 0.25 to 0.7) between student writing (below 53% and above 80% sub-corpora) and K-12 (U.S.) school materials sub-corpora (x-axis, Gr 1 to Gr 13+). Hotelling-Williams difference: p < .05 (n = 1598 word families); t = 4.0 (2803), p < .001.]

Is the grammar also different?

compare part of speech (POS) tag frequency in >80% vs. <53% student writing

Part-of-Speech Tags frequent in failing students’ (<53%) writing

POS tag                                Example        >80% Student Writing Freq.   <53% Student Writing Freq.   Fisher exact test Minimal Ratio (<1 is sig. diff.)
verb ‘be’, past tense                  was, were      1099                         2153                         0.76
verb, past tense                       took           1974                         3497                         0.79
verb ‘have’, past tense                had            380                          769                          0.80
verb ‘be’, gerund/present participle   being          270                          438                          0.86
personal pronoun                       I, he, it      8153                         10771                        0.91
particle                               give up        681                          1061                         0.91
wh-adverb                              where, when    1136                         1562                         0.95
wh-pronoun                             who, what      892                          1194                         0.99

Past tense tags are key in failing students’ (<53%) writing.
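A per-essay past tense index can be derived from POS-tagged text. The sketch below assumes TreeTagger-style English tags, in which VBD, VHD and VVD mark the past tense of "be", "have" and other verbs. The choice of denominator (all tags beginning with "V") is an assumption, since the slides do not define the ratio.

```python
# Assumption: TreeTagger-style English tags for past tense verbs.
PAST_TENSE_TAGS = {"VBD", "VHD", "VVD"}

def past_tense_verb_ratio(tags):
    """Share of verb tags that are past tense.

    tags: one POS tag per token, in text order.
    Denominator: every tag starting with "V" (assumed definition).
    Returns 0.0 for a text with no verb tags.
    """
    verb_tags = [t for t in tags if t.startswith("V")]
    if not verb_tags:
        return 0.0
    past = sum(1 for t in verb_tags if t in PAST_TENSE_TAGS)
    return past / len(verb_tags)

# Toy tag sequence, for illustration only.
print(past_tense_verb_ratio(["PP", "VVD", "DT", "NN", "VBZ", "VHD"]))
```

A high ratio is consistent with narrative retelling, while expository writing about literature conventionally uses the present tense.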

Regression of English-Social Studies exams averaged (without English essay)

0.701

Regression Model                                          Correlation (r)   Standardized Beta   Adjusted R Square   R Square Change
Academic Math (YN)                                        0.32              0.215               0.307               +.309
Math Score                                                0.49              0.372
no ESL code                                               0.22              0.202               0.373               +.067
MTLD (Lexical Diversity)                                  0.15              0.018               0.389               +.017
log (total Word Families)                                 0.38              0.013               0.43                +.042
Average Word Frequency (rarity in Subtitle Corpus)        0.55              0.365               0.497               +.067
Past tense Verb ratio                                     0.17              0.045               0.499               +.013

• F (782) = 112.4 (p < .001)
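The two right-hand columns of a hierarchical regression table follow from the R² of each successive model. A minimal sketch of the standard formulas, with hypothetical function names and invented toy values rather than the study's data:

```python
def r_squared(observed, predicted):
    """Coefficient of determination for one fitted model."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Adjust R squared for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def r_squared_changes(r2_by_step):
    """R squared gained at each step as predictors are added in blocks."""
    changes = [r2_by_step[0]]
    for prev, cur in zip(r2_by_step, r2_by_step[1:]):
        changes.append(cur - prev)
    return changes

# Toy values: R squared after each of three blocks of predictors.
print(adjusted_r_squared(0.5, n=11, p=2))
print(r_squared_changes([0.30, 0.37, 0.40]))
```

Because the adjustment penalizes every added predictor, adjusted R² can rise by less than the raw R² change at a given step.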

Summary

• Lexical Sophistication is important, and is NOT the only factor

• development likely occurs in multiple overlapping dimensions:
  • narrative -> expository
  • spoken -> written
  • child-like -> adult-like
  • general English -> academic English + technical words
  • personal -> impersonal
  • ELLs -> proficient bilinguals

Vocabulary Use?

• the data captures just the appearance of words in essays

• frequency data can’t distinguish between vocabulary knowledge and ability to use academic register

• no claims about whether the rare words are used according to conventions
  • e.g. “Now, I will comprise[?] my final argument…”

Implications for Mainstream Education?

[Diagram: Vocabulary, Volume of Reading, Reading Comprehension. Adapted from Nagy (2005).]

• selection and teaching of individual words
• exposure to rich language
• morphological awareness
• matching student to text: 95%-98% coverage
• time to read
• fluency
• motivation
• comprehension strategies
• building background knowledge
• morphological awareness

Implications for Education?

Vocabulary Development: Tree model

[Diagram labels: Home life, Social Life & Unique life Experience; school life; General English Vocabulary (Very High Frequency, High Frequency, Mid-Frequency); General Academic Vocabulary; subject areas: English Language Arts, Social Studies, Biology, Chemistry, Physics, Math.]

Future Work?

• Maintain relationship with government

• Developmental / longitudinal L1 academic English “learner” corpus:
  • compounds / multi-word units / phraseology
  • morphology
  • syntax
  • discourse
  • register / genre markers
  • writing “errors”

• Developmental scales for Academic English

References

Anthony, L. (2009). AntWordProfiler [computer software]. Retrieved from http://www.antlab.sci.waseda.ac.jp/antwordprofiler_index.html

Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977–990. http://doi.org/10.3758/BRM.41.4.977

Cobb, T. (n.d.). Web VP/BNC-COCA-25 Vocabprofiler [Online computer software] (an adaptation of Heatley and Nation’s, 1994 Range). Available: http://www.lextutor.ca/vp/bnc

Crossley, S. A., Cobb, T., & McNamara, D. S. (2013). Comparing count-based and band-based indices of word frequency: Implications for active vocabulary research and pedagogical applications. System, 41(4), 965–981. http://doi.org/10.1016/j.system.2013.08.002

Edwards, R., & Collins, L. (2011). Lexical Frequency Profiles and Zipf’s Law. Language Learning, 61(1), 1-30. doi: 10.1111/j.1467-9922.2010.00616.x

Fisher, R. A. (1922). On the Interpretation of χ2 from Contingency Tables, and the Calculation of P. Journal of the Royal Statistical Society, pp. 87–94.

Heatley, A., & Nation, P. (1994). Range. Victoria University of Wellington, NZ. [Computer program, available at http://www.vuw.ac.nz/lals/.]

Kyle, K., & Crossley, S. A. (2014). Automatically Assessing Lexical Sophistication: Indices, Tools, Findings, and Application. TESOL Quarterly. doi: 10.1002/tesq.194

Laufer, B., & Nation, P. (1995). Vocabulary size and use: lexical richness in L2 written production. Applied Linguistics, 16, 307-322. doi: 10.1093/applin/16.3.307

Milička, J. (2012). Minimal Ratio: An Exact Metric for Keywords, Collocations etc. Czech and Slovak Linguistic Review, 1.

Nagy, W. E. (2005). Why vocabulary instruction needs to be long-term and comprehensive. In A. Hiebert & M. Kamil (Eds.), Teaching and learning vocabulary: Bringing Research to Practice (pp. 27-44). Mahwah, NJ: L. Erlbaum.

Nation, P., & Waring, R. (1997). Vocabulary size, text coverage and word lists. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description, acquisition and pedagogy (pp. 6-19). Cambridge: Cambridge University Press.

Schmid, H. (1994). Probabilistic Part-of-Speech Tagging Using Decision Trees. Proceedings of International Conference on New Methods in Language Processing, Manchester, UK.

Van Heuven, W. J. B., Mandera, P., Keuleers, E., & Brysbaert, M. (2014). SUBTLEX-UK: A new and improved word frequency database for British English. The Quarterly Journal of Experimental Psychology, 67(6), 1176–1190. http://doi.org/10.1080/17470218.2013.850521

Zeno, S. M., Ivens, S. H., Millard, R. T., & Duvvuri, R. (1995). The educator’s word frequency guide. New York: Touchstone Applied Science Associates, Inc.

Werklund School of Education

Acknowledgements

• Professor Hetty Roessingh

• Eric Eidelberg & Mike Clark, Dept. of Computer Science

• Nicole Neutzling, Calgary Board of Education

• Susan Elgie, University of Toronto

• Alberta Education, Assessment Sector

Please come visit us in sunny Alberta

Questions?