Measuring CEFR B1 Listening Proficiency: Cognitive Validity in a Listening Test

Caroline Shackleton
IATEFL TEASIG Event, Centro de Lenguas Modernas, Granada University, October 2014


A Construct Validity Study of the University of Granada B1 Listening Test

Weir’s (2005) socio-cognitive validation framework.

Three main pillars:

• Scoring Validity
• Context Validity
• Cognitive Validity

Cognitive Validity

Expertise in using language should be the focus of assessment (Buck, 2001; Field, 2013). Field (2012) goes on to argue that cognitive validity can be approached in two ways:

• Modelling the skill on target expert behaviour.

• Studying candidate behaviour using verbal reports.

Listening Construct

An inferential, interactive process:

• Bottom-up (speech perception and word recognition).

• Top-down (applying non-linguistic knowledge, schema, frames and background and topical knowledge).

The two are used in parallel in order to decode the message and build meaning (Buck, 2001).

The degree to which each is used depends on knowledge of the language, familiarity with the topic, and the purpose for listening (Vandergrift, 2003).

Authenticity

We must aim to test those aspects which are unique to listening (Buck, 2001). A scripted text, in contrast, would lack many of these characteristics (Field, 2008a).

The CEFR (CoE, 2001, p.165) states that ‘syntactic over-simplification of authentic texts may actually have the effect of increasing the level of difficulty’ because of the elimination of redundancies, clues to meaning, etc.

Field (2013, p.143) states that ‘if a test is to adequately predict how test takers will perform in normal circumstances, it is clearly desirable that the spoken input should closely resemble that of real-life conversational or broadcast sources’. According to Field, items should never be developed from a transcript; rather, the acoustic-phonetic signal should be placed centre stage in order to take account of the ‘relative salience of ideas’ (Field, 2013, p.150).

Taylor’s (2013) updated re-conceptualisation of Weir’s framework for cognitive validity.

COGNITIVE VALIDITY: COGNITIVE PROCESSES

• Decoding acoustic/phonetic (and visual) input
• Lexical search
• Syntactic parsing
• Establishing propositional meaning
• Constructing a meaning representation
• Constructing a discourse representation

Adapted from Taylor (2013, p. 28)

Description of UGR B1 Listening Test

A Priori Evidence

Expert judgement.

Text mapping. The process is recommended by Weir (2005, p.101) as a way of replicating one type of listening when developing listening tasks. It allows us to form a consensus on the meaning of any given spoken discourse. Consensus is defined as agreement among n − 1 test developers, where n is the total number of test developers.

A Posteriori Evidence

Verbal report: the present study used both concurrent ‘think alouds’ at the pre-listening stage, to assess goal setting, and stimulated recall methodology (Gass & Mackey, 2000) after task completion.

Participants: 5 volunteers from two B1-level groups, one lower-end B1 (N = 3) and one higher-end B1 (N = 2).

Coding Categories

• L = Lexical decoding and recognition (a single word only).
• IU = Idea unit, or proposition, implying successful syntactic parsing.
• CE = Cognitive environment (Buck, 2001): includes top-down processing and strategy use, such as pragmatic/sociolinguistic knowledge, drawing on co-text and context of situation, prior knowledge, and inference.
• TR = Text representation.
• M = Monitoring.

Exact agreement between coders was 89.86%, and inter-coder reliability was Cohen’s kappa = 0.831 (p < .001), 95% CI [0.71, 0.95].
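The two agreement figures reported above can be illustrated with a short sketch. The codings below are hypothetical, not the study’s data; they simply show how exact agreement and Cohen’s kappa are computed for two coders assigning the study’s categories (L, IU, CE, TR, M) to the same verbal-report segments.

```python
# Sketch: exact agreement vs. Cohen's kappa for two coders.
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: agreement between two coders, corrected for chance."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed agreement: proportion of segments coded identically.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement: chance of agreeing given each coder's
    # marginal category frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical codings of ten segments by two raters.
a = ["L", "IU", "CE", "CE", "TR", "M", "IU", "L", "CE", "IU"]
b = ["L", "IU", "CE", "TR", "TR", "M", "IU", "L", "CE", "CE"]
print(round(cohens_kappa(a, b), 3))  # → 0.744 (exact agreement is 0.8)
```

Kappa is lower than raw exact agreement because some of that agreement would be expected by chance alone, which is why both figures are reported in the study.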

Results: Pre-listening

‘Assessing the situation’ (Buck, 2001, p.104)

Items not only provide the candidate with a purpose for listening but also a context from which to activate schemata and generate hypotheses (Shohamy & Inbar, 1991).

Example:

Well I imagined that, me, for example, when I was a child I played volleyball or I played football in the playground. I don’t know, although they are questions that Jack will hear, I’ve made them personal and imagined the answers I would have to give. I don’t know why I did that. As if the questions were directed at me. So maybe to help me, if Jack gives a similar answer, if Jack says ‘when I was young I played football’, I say, ‘of course, like me’ and it’s easy to get. (Participant 2, L4)

After listening:

Here from the beginning I inferred that it was ‘What sports did he do when he was young’, because he said, ‘everybody played football’. And also, although it seems silly, when he was talking about his past he laughed and everybody laughs when they think about their past, when they were a child.

(Participant 2, L4)

Here, the candidate’s use of pragmatic knowledge on this item highlights the fact that the problem-solving process of answering items draws on non-linguistic cues as well as linguistic ones.

It was seen throughout that, along with monitoring, participants combined the accurate decoding of idea units with contextual clues, drawing on both bottom-up and top-down processes.

Example: In number 4 I doubted between two. The first is D, if individual or team. The other is H - which sport he did when he was young. I think it’s H because he said if you don’t play football you don’t have friends and something about giving it up when he was 15 years old. So yes, it’s H not B because he was young. ‘What sport did you do when you were a child?’ He says ‘football’ and then he says ‘that’s one’ so you know he’s done one, so if he was talking about B it doesn’t make sense that he says that’s one. Like I said, he says when he was 15 years old. He says he played football when he was young because if you wanted to have friends you had to play football.

(Participant 1, L4)

RESULTS

In line with Buck (1994), the verbal reports gave evidence that individuals use different approaches to arrive at a correct answer.

Reasons for incorrect responses

Mis-coding (see Field, 2008a)

Example: In this one I had lots of doubts. What I understood was, basically, he was talking about, he said something about ‘train’. When he said train I thought about places where you could do sport. I imagined training in a field. At the beginning I understood something but when I heard ‘train’ I sort of just thought about that. It can’t be two things at once so because I’m sure he said train, it’s what I’m most sure about so I chose this option. (Participant 2, L6)

No appropriate schema has been activated (Rost, 2011)

Example: Sarah thinks the flight to London takes... I didn’t hear this, I didn’t hear anything. But I think it’s less time than John says. First because I didn’t hear a time, a number of hours, and then by what John says I think the girl is more intelligent than John and it’s going to be less time than John says. (Participant 1, L12)

The methodology is also useful for explaining item statistics (e.g. Anckar, 2011).

Example, Item 5: by far the easiest item on the test, with a logit difficulty measure of −2.09 and acceptable fit statistics.
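As a rough illustration of what that figure means (assuming, as is usual for logit difficulty measures with fit statistics, that it comes from a Rasch analysis), the difficulty can be translated into a predicted probability of success for a test taker of any given ability:

```python
# Sketch: interpreting a Rasch logit difficulty as a success probability.
import math

def rasch_p_correct(ability, difficulty):
    """Rasch model: probability that a test taker at a given ability level
    (in logits) answers an item of a given difficulty (in logits) correctly."""
    return 1 / (1 + math.exp(difficulty - ability))

# Item 5's reported difficulty of -2.09 logits: a test taker of average
# ability (0 logits) is predicted to answer it correctly about 89% of the time.
print(round(rasch_p_correct(0.0, -2.09), 2))  # → 0.89
```

A strongly negative logit measure therefore corresponds directly to the observation that nearly all candidates got the item right.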

Example: In 5 he was talking about benefits. I think he was talking about benefits of doing sport. Because it allows you to get stronger, relax and he spoke about feelings and ‘emotional’ or emotions. It makes you feel good. He was saying to be strong and good for how you feel.

(Participant 4, L5)

The reason for this item being easy could be that the ideas of ‘stronger’ and ‘emotional balance’ were ‘key content words’, which could easily be matched to candidates’ schemata and world knowledge (e.g. Field, 2008b).

SOME CONCLUSIONS

Evidence to support Taylor’s (2013) cognitive validity model.

Construct-irrelevant test-wiseness strategies were not evidenced.

Communicative language ability is tested, i.e. there is evidence of linguistic, strategic, pragmatic and sociolinguistic competence.

Cognitive validity evidence provides us with some degree of certainty that test takers will be able to perform similar tasks in the TLU domain.

Study supports the view that gist/main ideas should be tested at lower levels (e.g. Field, 2013).

Authentic sound files in listening tests can have positive wash-back.

Some evidence showing that different types of listening elicit different behaviours. Important for construct coverage.

Listening comprehension is an individual process. Text-mapping useful in order to provide consensus on meaning of any given spoken discourse.

The reported scores provide a meaningful, user-friendly description of what a typical test taker can do at any given level (Alderson et al., 2006).

THANK YOU FOR LISTENING!

References

Alderson, J. C., Figueras, N., Kuijper, H., Nold, G., Takala, S., & Tardieu, C. (2006). Analysing tests of reading and listening in relation to the Common European Framework of Reference: The experience of the Dutch CEFR Construct Project. Language Assessment Quarterly, 3(1), 3-30.

Anckar, J. (2011). Assessing foreign language listening comprehension by means of the multiple-choice format: Processes and products. Retrieved March, 2013 from http://www.academia.edu/2969303/Assessing_foreign_language_listening_comprehension_by_means_of_the_multiple-choice_format_processes_and_products

Buck, G. (1994). The appropriacy of psychometric measurement models for testing second language listening comprehension. Language Testing, 11(2), 145-170.

Buck, G. (2001). Assessing listening. Cambridge: Cambridge University Press.

Council of Europe (2001). Common European Framework of Reference for Languages: Learning, teaching, assessment. Cambridge: Cambridge University Press.

Field, J. (2008a). Listening in the language classroom. Cambridge: Cambridge University Press.

Field, J. (2008b). Bricks or mortar: Which parts of the input does a second language listener rely on? TESOL Quarterly, 42(3), 411-432.

Field, J. (2012). Cognitive validity in language testing: Theory and practice. Presentation given at the CRELLA Summer Research Seminar. Retrieved September, 2013 from http://www.beds.ac.uk/__data/assets/pdf_file/0007/215845/Cognitive-valdity-summerseminar-Read-Only-Compatibility-Mode.pdf

Field, J. (2013). Cognitive validity. In A. Geranpayeh & L. Taylor (Eds.), Examining listening: Research and practice in assessing second language listening. Cambridge: Cambridge University Press.

Gass, S., & Mackey, A. (2000). Stimulated recall methodology in second language research. Mahwah, NJ: Lawrence Erlbaum Associates.

Rost, M. (2011). Teaching and researching listening (2nd ed.). Pearson Education.

Shohamy, E., & Inbar, O. (1991). Validation of listening comprehension tests: The effect of text and question type. Language Testing, 8(1), 23-40.

Taylor, L. (2013). Introduction. In A. Geranpayeh & L. Taylor (Eds.), Examining listening: Research and practice in assessing second language listening. Cambridge: Cambridge University Press.

Vandergrift, L. (2003). Orchestrating strategy use: Toward a model of the skilled second language listener. Language Learning, 53(3), 463-496.

Weir, C. J. (2005). Language testing and validation: An evidence-based approach. Basingstoke: Palgrave Macmillan.