
CONSTRUCT VALIDATION OF THE REVISED LISTENING COMPONENT OF THE MALAYSIAN

UNIVERSITY ENGLISH TEST

BY

ELIA MD JOHAR

A dissertation submitted in fulfilment of the requirement for the degree of Doctor of Philosophy in Education

Institute of Education International Islamic University Malaysia

JULY 2013


ABSTRACT

This study presents the findings for content- and construct-related evidence of the newly revised version of the listening skill test that is used as part of a university entrance requirement. The Malaysian University English Test (MUET) listening component was designed and conceptualized based on Bloom’s cognitive taxonomy, encompassing 18 objectives in a 20-item instrument to gauge the mastery of listening comprehension ability. The study attempted to define the construct theoretically and operationally under logical investigation and to find evidence of content relevance, content representation and technical quality of the items through empirical investigation. It employed item rating procedures involving four content experts in the area of English as a Second Language learning. Low Spearman’s rho values for inter-rater reliability indicated marked variations across the raters. Rating data were analysed using both the traditional item-objective congruence method and the adjusted item index equation, as the items were found to be measuring multiple objectives. Five items were rated as clearly measuring a single valid objective while six were measuring multiple objectives. Of these eleven items, nine tested the lowest cognitive level while two measured objectives under Analysis and Synthesis. To determine construct validity, data obtained from 250 MUET examinees were analysed using the Rasch model. Analysis of results showed that the listening construct was under-represented, as the number of items was insufficient in targeting differing abilities. Two items underfit the model and were deleted. The study also revealed that no single format is appropriate to address all listening abilities. It is recommended that more items be added to the test. The implications of the study call for improvements to reliability and validity with respect to the utility of Bloom’s taxonomy in test development.
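For readers unfamiliar with the congruence procedure mentioned above, the following is a minimal sketch, in Python, of the traditional index of item-objective congruence as it is commonly defined in the content validation literature (after Rovinelli and Hambleton). The function name, the +1/0/-1 rating convention and the worked numbers are illustrative assumptions rather than the study's own computation, and the adjusted index used for items measuring multiple objectives is a separate extension not reproduced here.

    # Illustrative sketch of the traditional item-objective congruence index.
    # Each expert rates an item +1 (clearly measures the objective), 0 (unsure)
    # or -1 (clearly does not) against each of N objectives; ratings are
    # averaged across experts before the index is computed.

    def ioc_index(mean_ratings, k):
        """Congruence of one item with objective k (hypothetical helper).

        mean_ratings: mean expert ratings, one per objective, each in [-1, 1].
        k:            index of the objective the item is intended to measure.
        Returns a value in [-1, 1]; values near +1 indicate congruence.
        """
        n = len(mean_ratings)
        others = sum(r for j, r in enumerate(mean_ratings) if j != k)
        return ((n - 1) * mean_ratings[k] - others) / (2 * (n - 1))

    # Hypothetical example with 18 objectives: an item rated +1 on its intended
    # objective and -1 on all others attains the maximum index of 1.0.
    print(ioc_index([1.0] + [-1.0] * 17, 0))  # -> 1.0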


ABSTRACT IN ARABIC

This research presents the findings of the content- and construct-related evidence for the revised version of the English listening test used as part of the university entrance requirements. The listening items were formulated based on Bloom’s cognitive taxonomy, comprising 18 objectives across 20 measurement units to gauge mastery of listening comprehension ability. The study attempted to define listening theoretically and operationally, in addition to providing principal evidence on content relevance, content representation and the technical quality of the items through an empirical approach. The study employed item rating procedures with the participation of four content experts in the field of teaching English as a second language. The low Spearman’s rho values for rating reliability indicated differences among the raters. The rating data were analysed using the traditional item-objective congruence method and the adjusted item index equation, as the items were found to measure multiple objectives. Five items were identified as measuring a single valid objective, while six items measured multiple objectives. Of these eleven items, nine were used to measure the lowest level of cognition, while the remaining two measured objectives within the frame of analysis and synthesis. To confirm construct validity, data were collected from a sample of 250 examinees of the Malaysian university English test and analysed using the “Rasch” model. The results of the analysis showed that the listening construct was under-represented, as the number of items was insufficient to reveal differing abilities. Two unsuitable items were singled out, as they did not contribute to the listening construct, and their deletion is recommended. The study also found that no single question format can assess all listening abilities, and the addition of further items to the test is recommended. In light of the problems and difficulties encountered, the study recommends improving reliability and validity with respect to the use of Bloom’s cognitive taxonomy in test development.


APPROVAL PAGE

The thesis of Elia Md Johar has been approved by the following:

_______________________ Ainol Madziah Zubairi

Supervisor

_______________________ Ismail Sheikh Ahmad

Supervisor

________________________ Mohamad Sahari Nordin

Supervisor

________________________ Zainurin Abd Rahman

Internal Examiner

_________________________ Abdul Halim Abdul Roaf

External Examiner

_________________________ Radwan Jamal Yousef El Atrash

Chairman


DECLARATION

I hereby declare that this dissertation is the result of my own investigations, except where otherwise stated. I also declare that it has not been previously or concurrently submitted as a whole for any other degrees at IIUM or other institutions.

Elia Md Johar

Signature …………………………………… Date ……………………..


INTERNATIONAL ISLAMIC UNIVERSITY MALAYSIA

DECLARATION OF COPYRIGHT AND AFFIRMATION OF FAIR USE OF UNPUBLISHED RESEARCH

Copyright © 2013 by Elia Binti Md Johar. All rights reserved.

CONSTRUCT VALIDATION OF THE REVISED LISTENING COMPONENT OF THE MALAYSIAN UNIVERSITY ENGLISH TEST

No part of this unpublished research may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of the copyright holder except as provided below.

1. Any material contained in or derived from this unpublished research may only be used by others in their writing with due acknowledgement.

2. IIUM or its library will have the right to make and transmit copies (print or electronic) for institutional and academic purposes.

3. The IIUM library will have the right to make, store in a retrieval system and supply copies of this unpublished research if requested by other universities and research libraries.

Affirmed by Elia Binti Md Johar.

…………………….. ……………
Signature Date


I dedicate my doctoral work to my children: Athirah Aiman, Adam Aiman and Muhammad Aiman, who have truly been my strength through the quest of its completion.


ACKNOWLEDGEMENTS

I wish to express my special thanks and appreciation to a number of individuals who provided assistance with this study. This research would not have been possible without the time and efforts of my doctoral committee members: Associate Professor Dr Ainol Madziah, Associate Professor Dr Ismail Sheikh Ahmad and Professor Dr Mohamad Sahari Nordin.

My special gratitude also goes to the officials of the Malaysian Examinations Council in charge of the Malaysian University English Test for their assistance and kind hospitality during my data entry period.

Special thanks go to the Dean and those at the Academy of Language Studies, Universiti Teknologi MARA, Shah Alam for their continual support and encouragement.

I would like to gratefully acknowledge the contributions of my coursework lecturers who, in one way or another, provided me with the knowledge and skills required to complete my PhD programme. I also wish to thank Puan Nuriza and Puan Nursiah at the Post-Graduate Office, who have always been very helpful to me during my PhD quest.

My deepest appreciation also goes to my closest PhD colleagues and friends who were there for me in times of adversity.

My deepest love and gratitude go to my children, Athirah Aiman, Adam Aiman and Muhammad Aiman, my mother, Hawa Mohamed Salleh and my siblings for their never-ending love and support.

Above all, I render my thankfulness to the Almighty Allah for blessing me with endurance and all the assistance provided along the way.


TABLE OF CONTENTS

Abstract ............................................................ ii
Abstract in Arabic .................................................. iii
Approval Page ....................................................... iv
Declaration Page .................................................... v
Copyright Page ...................................................... vi
Dedication .......................................................... vii
Acknowledgements .................................................... viii
List of Tables ...................................................... xiii
List of Figures ..................................................... xv
List of Abbreviations ............................................... xvi

CHAPTER ONE: INTRODUCTION ........................................... 1
    1.1 Introduction ................................................ 1
    1.2 The Testing of Listening Comprehension ...................... 1
    1.3 Background of the Malaysian University English Test ......... 4
    1.4 The Revised Listening Component ............................. 5
    1.5 The MUET Listening Test Design .............................. 10
    1.6 Statement of Problem ........................................ 12
    1.7 Purpose and Objectives of the Study ......................... 15
    1.8 Research Questions .......................................... 16
    1.9 Significance of the Study ................................... 16
    1.10 Scope of the Study ......................................... 17
    1.11 Theoretical Framework ...................................... 19
    1.12 Definitions of Terms ....................................... 22
    1.13 Conclusion ................................................. 27

CHAPTER TWO: REVIEW OF LITERATURE ............................................ 28

2.1 Introduction .................................................... 28
2.2 Theoretical Perspectives of Listening ........................... 28
    2.2.1 Listening Comprehension ................................... 29
    2.2.2 Framework of Language Competence .......................... 31
    2.2.3 Speech Reception Framework ................................ 34
2.3 Test Method Facets Framework .................................... 36
    2.3.1 Response Formats .......................................... 39
        2.3.1.1 Selected-Response Item Formats ...................... 40
        2.3.1.2 Constructed-Response Item Format .................... 42
2.4 Bloom’s Taxonomy and Issues ..................................... 44
    2.5.1 Construct Under-Representation ............................ 55
    2.5.2 Construct-Irrelevant Variance ............................. 55
2.6 Content Aspects of Construct Validity ........................... 58
    2.6.1 Content Relevance ......................................... 59
    2.6.2 Content Representation .................................... 60
    2.6.3 Technical Quality of Test Items ........................... 61
2.7 Approaches to Construct Validation in Relation to Listening Tests 62


2.8 Quantifying Judgmental Methods in Content Validation ............ 66
    2.8.1 Item-Objective Congruence Index ........................... 68
2.9 Reliability of Criterion-Referenced Test Scores ................. 70
2.10 Unidimensionality of Test Items ................................ 73
2.11 The Rasch Measurement Model (RMM) .............................. 76
    2.11.1 Properties of the Rasch Measurement Model ................ 79
    2.11.2 Reliability and Separation ............................... 81
    2.11.3 Item-Person Distribution Map ............................. 82
    2.11.4 Unidimensionality and Local Independence ................. 83
    2.11.5 Validity Indices ......................................... 86
    2.11.6 Evaluation of Fit ........................................ 86
2.12 Reviewing and Deleting Flawed Items ............................ 90
2.13 Relevant Studies ............................................... 91
    2.13.1 Validity Studies ......................................... 92
    2.13.2 Studies on Response Format ............................... 93
    2.13.3 Studies on Items of High and Low Cognitive Skill Levels .. 96

CHAPTER THREE: RESEARCH METHODOLOGY .................................. 99

3.1 Introduction .................................................... 99
3.2 Research Design Overview ........................................ 99
3.3 Logical Investigation ........................................... 102
    3.3.1 Theoretical Definition of the Listening Construct ......... 103
    3.3.2 Operational Definition of Listening Construct ............. 105
3.4 Population and Sample ........................................... 106
    3.4.1 Content-Related Evidence .................................. 106
    3.4.2 Construct-Related Evidence ................................ 107
3.5 Instruments ..................................................... 109
    3.5.1 MUET Listening Test Paper ................................. 109
    3.5.2 Relevance Rating Sheet .................................... 110
    3.5.3 Item-Objective Congruence Table ........................... 110
    3.5.4 Structural Difficulty Rating Sheet ........................ 111
    3.5.5 Suitability of Item Format Rating Sheet ................... 111
3.6 Data Gathering Procedures ....................................... 112
    3.6.1 Preliminary Activities .................................... 112
    3.6.2 Piloting .................................................. 113
    3.6.3 Administration of the Rating Sheets ....................... 113
3.7 Data Analysis Procedures ........................................ 114
    3.7.1 Sampling Adequacy ......................................... 117
    3.7.2 Reliability ............................................... 120
    3.7.3 Convergence ............................................... 121
    3.7.4 Discrimination ............................................ 122
    3.7.5 Validity of the Person Measures ........................... 123
3.8 The Pilot Study ................................................. 124

CHAPTER FOUR: FINDINGS .............................................. 131
4.1 Introduction .................................................... 131
4.2 Inter-Rater Reliability Analysis ................................ 131


4.3 Content-Related Evidence ........................................ 135
    4.3.1 Content Relevance ......................................... 135
    4.3.2 Content Representation .................................... 137
        4.3.2.1 Item-Objective Congruence ........................... 137
        4.3.2.2 Appropriateness of Item Format ...................... 145
        4.3.2.3 Structural Difficulty of the MUET Listening Items ... 146
4.4 Construct-Related Evidence ...................................... 149
    4.4.1 Reliability: Replicability of the MUET Listening Items .... 150
    4.4.3 Discrimination: Construct Representation .................. 161
4.5 Deletion of Problematic Items ................................... 170
4.6 Key Findings .................................................... 174

CHAPTER FIVE: DISCUSSION OF FINDINGS ........................................... 181

5.1 Introduction .................................................... 181
5.2 Research Question 1: What is the Content-Related Evidence of the MUET Listening Test? 181
    5.2.1 Inter-Rater Reliability ................................... 181
    5.2.2 Content Relevance ......................................... 182
    5.2.3 Content Representation .................................... 182
5.3 Research Question 2: What is the Construct-Related Evidence of the MUET Listening Test? 184
    5.3.1 Reliability ............................................... 184
    5.3.2 Unidimensionality ......................................... 185
    5.3.3 Construct Representation .................................. 186
    5.3.4 Construct-Irrelevant Variance ............................. 188
    5.3.5 Technical Quality of the Items ............................ 189

CHAPTER SIX: CONCLUSIONS AND RECOMMENDATIONS ................ 192

6.1 Introduction .................................................... 192
6.2 Overview of the Study ........................................... 192
6.3 Summary of the Findings ......................................... 193
    6.3.1 Findings on Content-Related Evidence of the MUET Listening Test 193
    6.3.2 Findings on Construct-Related Evidence of the MUET Listening Test 195
6.4 Implications for the MUET Listening Test Design and Development . 196
6.5 Contributions of the Study ...................................... 200
6.6 Limitations of the Study ........................................ 200
6.8 Conclusion ...................................................... 202

BIBLIOGRAPHY ........................................................ 205
APPENDIX I: LETTER OF CONSENT FROM THE MALAYSIAN EXAMINATIONS COUNCIL 219
APPENDIX II: INSTRUMENT - MUET LISTENING TEST PAPER ................. 220
APPENDIX III: MUET LISTENING SCRIPT ................................. 224
APPENDIX IV: INSTRUMENT – ITEM RELEVANCE RATING SHEET ............... 228
APPENDIX VI: STRUCTURAL DIFFICULTY RATING SHEET ..................... 232


APPENDIX VII: ITEM FORMAT RATING SHEET .............................. 233
APPENDIX VIII: MEAN RELEVANCE RATING ACROSS FOUR SUBJECT-MATTER EXPERTS 234


LIST OF TABLES

Table No. Page No.

1.1 Test Components with their Aggregated Scores and Weighting 8

1.2 A Comparison between the Old and Revised Listening Component Specifications 9

1.3 The Central Tendencies and Dispersion of the MUET Listening Test Aggregated Scores over Six Examination Sessions 13

2.1 Taxonomy Table 49

3.1 Summary of the Statistical Methods Employed in the Study According to the Research Questions 116

3.2 Classification of the MUET Listening Items across SMEs 125

3.3 Summary of the Classification of the MUET Listening Items across Listening Sub-skills 127

4.1 Inter-rater Reliability Analysis 133

4.2 Mean Relevance Indices across All Items 136

4.3 Index of Item-Objective Congruence Values and the Associated Average SMEs’ Ratings for Each Objective 138

4.4 The Adjusted Item-Objective Congruence Values, Valid Constructs and Construct Mean for 20 MUET Listening Items 142

4.5 Mean Appropriateness Ratings of Each Item across Types of Item Format 146

4.6 Ratings on Item Structural Difficulty 147

4.7 Rasch Examinee Summary Statistics 150

4.8 Rasch Item Summary Statistics 152

4.9 Item Statistics: Correlation Order 154

4.10 Standardized Residual Correlations for Item Local Independence Measure 155

4.11 Principal Component Analysis of Standardised Residual Correlations for Items (in Eigenvalue units) 156

4.12 The Rasch Good-fit Statistics 158


4.13 Examinee Statistics: Misfit Order 167

4.14 Point-Biserial Correlation and Fit Statistics of Problematic Items 171

4.15 Rasch Examinee Recalculated Summary Statistics 172

4.16 Rasch Item Recalculated Summary Statistics of 18 Items 173

4.17 The Recalculated Summary Statistics for the Rasch Analysis 173

4.18 Findings on Content-Related and Construct-Related Evidence of the MUET Listening Test 179


LIST OF FIGURES

Figure No. Page No.

1.1 Bloom’s Taxonomy of Educational Objectives 6

1.2 The Framework of the MUET Listening Test Design 11

1.3 The Theoretical Framework of the MUET Listening Component Validation Study 20

2.1 Some Components of Language Use and Language Test Performance 33

2.2 Speech Reception Framework 36

2.3 The Revised Bloom’s Taxonomy 47

3.1 The Research Design of the Study 100

3.2 Theoretical and Operational Definitions of the MUET Listening Construct 103

3.3 Item-Person Distribution Map 128

4.1 Structural Difficulty of the MUET Listening Items 148

4.2 The Observed Item Characteristic Curve 160

4.3 Wright Map of Person-Measures and Item Calibrations 162

4.4 Persons with Most Unexpected Responses in Terms of Logit Measure 168

4.5 Persons with Most Misfitting Responses in terms of OUTFIT MNSQ Values 170


LIST OF ABBREVIATIONS

et al.  (et alia): and others
CTT     Classical Test Theory
IELTS   International English Language Testing System
ILA     International Listening Association
MUET    Malaysian University English Test
RMM     Rasch Measurement Model
SME     Subject-Matter Expert
TOEFL   Test of English as a Foreign Language
MPM     Majlis Peperiksaan Malaysia (Malaysian Examinations Council)


CHAPTER ONE

INTRODUCTION

1.1 INTRODUCTION

This chapter introduces the revised listening component of the Malaysian University English Test (MUET). It highlights issues in the testing of listening comprehension as the background to the study. It also describes the MUET listening scores over six examination sessions statistically and, accordingly, justifies why construct validation of the listening test should be undertaken. The purpose, objectives and research questions are then laid down to provide the direction of the study. The chapter also deals with the significance, scope and limitations of the study. Lastly, it provides operational definitions of important testing terms to clearly define the path of the study.

1.2 THE TESTING OF LISTENING COMPREHENSION

No longer considered the Cinderella of the four language skills, listening is at present treated as equally important as the other language skills in the use of not just English but other varieties of spoken language in the era of globalization (Flowerdew & Miller, 2005). There has also been growing interest in the role it plays in language learning as well as language acquisition (Feyten, 1991; Nunan, 1998), and it is therefore taught in many classrooms as a basic skill, antecedent to the development of other language skills. Parallel to these developments, the testing of listening comprehension is viewed to have “undergone considerable changes in recent years as a result of a greater understanding of the processes involved in listening” (Lewkowicz, 1991, p. 26). However, commenting on the status of the assessment of listening abilities, Alderson and Bachman, the editors of the Cambridge Language Assessment Series, deem it in the preface to Buck’s (2001) volume one of the least understood and least developed areas in language testing and assessment.

The most prevalent issue in listening is the absence of an adequate definition of second language (L2) listening ability, which leaves L2 listening lacking conceptual clarity, a problem that may boil down to its very unique characteristics (Witkin & Trochim, 1997; Vandergrift, 2007). In fact, Buck (1991) reveals that a review of the literature on both first and second language listening portrays a similar picture. This situation triggers indecisiveness over what standardized basis should be referred to when designing and developing listening tests. He further points out that “in practice, test constructors are obliged to follow their instincts and just do the best they can when constructing tests of listening comprehension” (p. 67). In the same way, Dunkel, Henning and Chaudron (1993) argue that many current L2 listening comprehension tests have been criticized for being constructed with little or no explicit reference to any particular model or theory of listening comprehension, or to any of the taxonomies of listening sub-skills. This test construction practice is seen as a result of what they claim: “there exists…no general consensus on the best techniques for assessing that construct [listening comprehension]” (Dunkel et al., 1993, p. 178). What is more, according to Fulcher and Davidson (2007), tests can in fact be built from specifications rather than from frameworks or models, and this gives leeway for tests to be developed merely from skills lists and teaching syllabuses. This, indeed, runs contrary to the fact that test specifications are meant to make the underlying theoretical framework of a test explicit and to explicate the relationships among its constructs and between the theory and the test purpose (Alderson, Clapham & Wall, 1995).

However, the testing of listening comprehension can, to some extent, call for different measures despite sharing many important characteristics with reading (Dunkel et al., 1993; Buck, 2001). The unique characteristics of listening, being ephemeral, carrying a rich prosody and natural fast speech, and yet unobservable, make its measurement technically more complex. For that reason, the best way of measuring listening ability is to incorporate those aspects of proficiency and comprehension that are unique to listening (Buck, 1991; Rost, 2002, as cited in Weir, 2005). In this regard, the focus of testing listening would primarily be on special features of spoken texts such as phonology, accents, prosodic features, speech rate, hesitations and discourse structure. As such, Buck (2001) argues that the testing of listening should require fast, automatic, on-line processing of texts that have the linguistic characteristics of typical spoken language.

It is not surprising that the MUET listening test, one of the language components of the country’s placement test that seeks to determine whether candidates have an adequate level of English ability to follow undergraduate courses in their chosen field of study (Zuraidah Mohd Don, 2003), has incorporated all the elements proposed by Buck (1991). The quality of the test items must be determined through investigation of their psychometric qualities, which include reliability and validity. However, little is known about the reliability and validity of the MUET because they are not made public. Among the most recent validity studies of MUET in the local setting are an item-level evaluation of the MUET reading test (Yusup, 2012), the predictive validity of MUET as a placement test in relation to its bands (Rethinasamy & Chuah, 2011) and the construct validity of MUET in comparison with the English Placement Test of the International Islamic University Malaysia (Noor Lide Abu Kassim, Ainol Madziah Zubairi & Nuraihan Mat Daud, 2007).

Similarly, there is relatively little research literature relating to the reliability and validity of placement tests (Wall, Clapham & Alderson, 1994; Fulcher, 1997) despite the widespread use of such tests, with the exception of international placement tests like the Test of English as a Foreign Language (TOEFL) and the International English Language Testing System (IELTS). A move to establish the reliability and validity of MUET is thus of paramount importance. It would promote the credibility of MUET so that unwarranted comments, such as questioning the status of MUET as a high-stakes test that merely serves the general purpose of exposing students to some nuances of English (Nair, New Straits Times, February 1, 2005), can be laid to rest. In addition, it could also correct the perception that the use of MUET as a lever for change has failed, as it was claimed that nothing had been achieved beyond mere compliance (Lee, 2004). Above all, test validation practice will be able to provide evidence that MUET really measures what it is intended to measure, not only in the final stage of its development but also under operational conditions. This is in line with what McNamara (2000) proposes, namely that it is still necessary for data from actual tests to be systematically gathered and analysed to investigate their validity and usefulness.

1.3 BACKGROUND OF THE MALAYSIAN UNIVERSITY ENGLISH TEST

As indicated in the National Education Philosophy, MUET is seen as a continuing effort of the Ministry of Higher Education, for which the role of English as a second language in the country is a main concern in producing knowledgeable and competent Malaysians. It was first launched in 1999 with the objective of gauging the English language proficiency of those who intend to pursue tertiary education. It is therefore compulsory for pre-university students to sit for MUET before pursuing their higher education at degree level in local public universities.

MUET is under the management of the Malaysian Examinations Council, which has been responsible for its administration since the test’s inception in 1999. The test is administered twice a year, once in mid-year (April/May) and once at year-end (October/November).

MUET consists of four language components, namely listening, speaking, reading and writing. A revision of MUET was announced in March 2007, and the revised test took effect from the end of 2008. The MUET results are computed based on an aggregated score range of 0 to 300, which corresponds to a banding system ranging from Band 1 (Very Limited User) to Band 6 (Highly Proficient User) (Malaysian Examinations Council, 2006).

1.4 THE REVISED LISTENING COMPONENT

As a criterion-referenced test, the MUET listening test yields scores that define an individual’s ability or performance in terms of relative mastery of a domain, or of “successful completion of tasks from a set of domain of criterion tasks” (Bachman, 1990). Accordingly, the listening ability of MUET candidates is measured using the six categories prescribed in Bloom’s cognitive taxonomy of educational objectives, which can tap proficiency from the lowest cognitive level to the highest. Responses to the MUET listening tests will reflect what the candidates actually know and can do. Therefore, a sufficient number of test items should be determined for use in the interpretation of scores, so as to make it possible to describe test performance in terms of a student’s mastery or non-mastery of the tasks (Linn & Gronlund, 2000).

The teaching of the MUET listening skill is not content-based but is tailored towards listening enabling skills, which are classified under Bloom’s cognitive taxonomy of educational objectives. Figure 1.1 shows the classifications under Bloom’s taxonomy, namely six hierarchical cognitive categories: knowledge, comprehension, application, analysis, synthesis and evaluation.

Figure 1.1: Bloom’s Taxonomy of Educational Objectives

Accordingly, the listening skills grouped under each level are as follows (Malaysian Examinations Council, 2006):

(1) Knowledge
    Recalling information
    Recognizing main ideas
    Recognizing supporting details

(2) Comprehension
    Deriving meaning of words, phrases, sentences from context
    Paraphrasing

(3) Application
    Predicting outcomes
    Applying a concept to new situations

(4) Analysis
    Understanding language functions
    Distinguishing the relevant from the irrelevant
    Distinguishing fact from opinion
    Drawing inferences
    Identifying roles and relationships

(5) Synthesis
    Following the development of a point or an argument
    Summarizing information

(6) Evaluation
    Appraising information
    Making judgments
    Drawing conclusions
    Recognizing and interpreting speakers’ views, attitudes or intentions

All six headings (Knowledge, Comprehension, Application, Analysis, Synthesis and Evaluation) of listening skills shall be referred to as ‘cognitive operations’. The sub-skills that come under these headings are termed ‘listening skills’. The listening domain is therefore operationalised by means of Bloom’s taxonomy of listening skills, which are in fact cognitive operations at different levels of critical thinking skills, as illustrated in the sketch below.
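To make the structure of this specification concrete, the following is a minimal illustrative sketch in Python (not drawn from the dissertation itself) that encodes the six cognitive operations and their listening skills as data, and confirms that they comprise the 18 objectives, cited in the abstract, behind the 20-item instrument.

    # Illustrative only: the MUET listening specification above as data.
    # Six cognitive operations, each mapped to its listening skills.
    BLOOM_LISTENING_SKILLS = {
        "Knowledge": [
            "Recalling information",
            "Recognizing main ideas",
            "Recognizing supporting details",
        ],
        "Comprehension": [
            "Deriving meaning of words, phrases, sentences from context",
            "Paraphrasing",
        ],
        "Application": [
            "Predicting outcomes",
            "Applying a concept to new situations",
        ],
        "Analysis": [
            "Understanding language functions",
            "Distinguishing the relevant from the irrelevant",
            "Distinguishing fact from opinion",
            "Drawing inferences",
            "Identifying roles and relationships",
        ],
        "Synthesis": [
            "Following the development of a point or an argument",
            "Summarizing information",
        ],
        "Evaluation": [
            "Appraising information",
            "Making judgments",
            "Drawing conclusions",
            "Recognizing and interpreting speakers' views, attitudes or intentions",
        ],
    }

    # The sub-skills across the six headings are the test's objectives.
    total_objectives = sum(len(s) for s in BLOOM_LISTENING_SKILLS.values())
    print(total_objectives)  # -> 18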

Having been revised for the first time since its inception in 1999, MUET underwent minor changes in its test specifications involving all four components: listening, speaking, reading and writing. However, the allocation of aggregated scores for the four papers representing each language component did not change at all, as indicated in Table 1.1, which shows the distribution of the scores across the components.

Table 1.1
Test Components with their Aggregated Scores and Weighting

Paper Code   Test Component   Maximum Score (Aggregated Scores)   Weighting
800/1        Listening        45                                  15%
800/2        Speaking         45                                  15%
800/3        Reading          120                                 40%
800/4        Writing          90                                  30%
             Total:           300                                 100%

Band Achieved:
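As a check on the arithmetic of Table 1.1, the following is a minimal illustrative sketch in Python (band cut-off scores are deliberately omitted, as they are not given here) showing that the four component maxima sum to the aggregated total of 300 and that each component’s weighting equals its share of that total.

    # Illustrative only: composition of the aggregated MUET score in Table 1.1.
    COMPONENTS = {  # paper code: (component, maximum aggregated score)
        "800/1": ("Listening", 45),
        "800/2": ("Speaking", 45),
        "800/3": ("Reading", 120),
        "800/4": ("Writing", 90),
    }

    total_max = sum(maximum for _, maximum in COMPONENTS.values())
    assert total_max == 300  # the 0-300 aggregated score range

    for code, (name, maximum) in COMPONENTS.items():
        print(f"{code} {name}: weighting {maximum / total_max:.0%}")
    # 800/1 Listening: weighting 15%
    # 800/2 Speaking: weighting 15%
    # 800/3 Reading: weighting 40%
    # 800/4 Writing: weighting 30%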

With regard to the content of the revised MUET listening test specifications, a comparison between the old and the revised specifications for the listening component is shown in Table 1.2. In the new format of the listening component, there was an increase in the number of texts used and in the number of test items, as well as a change in the type of item format. The rest of the test item specifications remain the same.