
Adequacy of Assessment tools in Official English Certificates to the Guidelines provided by the Common European Framework for Languages: An analysis on the effective implementation of rubrics

Lucía Fraga Viñas

Tesis doctoral

2019


Adequacy of Assessment tools in Official

English Certificates to the Guidelines

provided by the Common European

Framework for Languages: An analysis on

the effective implementation of rubrics

Lucía Fraga Viñas

Tesis doctoral UDC / 2019

Director: Eduardo Barros Grela

Codirectora: María Bobadilla Pérez

Programa de doctorado en Estudios Ingleses Avanzados: Lingüística,

Literatura y Cultura.


ABSTRACT

The Council of Europe, through the Common European Framework of Reference for Languages: Learning, Teaching and Assessment (CEFR), has been promoting shifts in the teaching-learning process, such as the communicative approach. The current doctoral thesis has

analysed the exams and rubrics of the main Official Certificates in Spain to check how

efficiently they have implemented the CEFR guidelines. The results of an extensive examination show that more research is needed in the field, as inconsistencies and

shortcomings have been detected. One of the most significant findings to emerge from

this thesis is the verification that the certificates that must determine the learner’s

competence present faults in the implementation of the basic CEFR guidelines, and they

assess with rubrics which do not meet the efficiency and reliability requirements that are

expected of them. Moreover, significant research limitations stem from the lack of

transparency the certificates show in terms of assessment criteria, instruments and

reliability data. On the other hand, this study has allowed the establishment of patterns

between the assessment of the different skills and can contribute to the improvement of

the evaluation process from the perspective of exam creation and rubric design. The switch towards a more communicative assessment of the receptive skills can also be initiated on the basis of some of the premises of this dissertation.

Keywords: CEFR, assessment, EFL, rubrics, communicative approach, Official English Certificates


Resumen

El Consejo de Europa, a través del Marco Común Europeo de Referencia para las Lenguas: Enseñanza, Aprendizaje y Evaluación (MCER), ha impulsado cambios en el proceso de enseñanza y

aprendizaje de lenguas, tales como el enfoque comunicativo. Esta tesis doctoral ha

analizado los exámenes y las rúbricas de los principales Certificados Oficiales en España

para comprobar la eficiencia con la que han adoptado las directrices del MCER. Los

resultados de este estudio muestran que todavía se necesita más investigación en este

campo ya que se han detectado incongruencias y omisiones. Uno de los resultados más

significativos ha sido el hecho de que los certificados que determinan la competencia de

los estudiantes contienen errores en la implementación de las directrices del marco y

examinan con rúbricas que no cumplen con los requisitos de validez y eficiencia que se

espera de ellas. Además, muchas de las limitaciones de este estudio se derivan de la falta

de transparencia en cuanto a criterios de evaluación, instrumentos y datos de fiabilidad

de los certificados. Por otra parte, esta investigación ha permitido establecer patrones

entre la evaluación de las diferentes destrezas y puede contribuir a la mejora del proceso

de evaluación desde la perspectiva de creación de exámenes y diseño de rúbricas. El

cambio hacia una evaluación más comunicativa de las destrezas receptivas también puede

iniciarse a partir de algunas bases de la presente tesis.

Palabras clave: MCER, evaluación, inglés como lengua extranjera, rúbrica, enfoque

comunicativo, Certificados Oficiales de inglés


Resumo

O Consello de Europa, a través do Marco Común Europeo de Referencia para as Linguas: Ensino, Aprendizaxe e Avaliación (MCER), impulsou cambios no proceso de ensino e aprendizaxe

de linguas, tales como o enfoque comunicativo. Esta tese doutoral analizou os exames e

as rúbricas dos principais Certificados Oficiais en España para comprobar a eficiencia

coa que adoptaron as directrices do MCER. Os resultados deste estudo mostran que aínda

se necesita máis investigación neste campo xa que se detectaron incongruencias e

omisións. Un dos resultados máis significativos foi o feito de que os certificados que

determinan a competencia dos estudantes conteñen erros na implementación das

directrices do marco e examinan con rúbricas que non cumpren cos requisitos de validez

e eficiencia que se espera delas. Ademais, moitas das limitacións deste estudo derívanse

da falta de transparencia en canto a criterios de avaliación, instrumentos e datos de

fiabilidade dos certificados. Por outra banda, esta investigación permitiu establecer

patróns entre a avaliación das diferentes destrezas e pode contribuír á mellora do proceso

de avaliación desde a perspectiva de creación de exames e deseño de rúbricas. O cambio

cara a unha avaliación máis comunicativa das destrezas receptivas tamén pode iniciarse

a partir dalgunhas bases da presente tese.

Palabras clave: MCER, avaliación, inglés como lingua estranxeira, enfoque comunicativo, Certificados Oficiais de inglés


Spanish summary (long)

En los últimos años son muchas las investigaciones que se han realizado sobre la

evaluación en el aula de inglés como Lengua Extranjera. El establecimiento del Marco

Común Europeo de Referencia para las Lenguas: Enseñanza, Aprendizaje y Evaluación

(MCER) ha impulsado cambios en el proceso de enseñanza aprendizaje, tales como el

enfoque comunicativo o el trabajo de las cuatro destrezas dentro del aula: comprensión

oral, comprensión escrita, producción oral y producción escrita. Sin embargo, es

necesario analizar las principales Certificaciones Oficiales con el objetivo de comprobar si

sus exámenes son adecuados a las pautas marcadas por el MCER en cuanto a forma,

contenido y evaluación. Además, los métodos de evaluación tradicional todavía

predominantes en los centros educativos no son adecuados para evaluar estas destrezas

comunicativas, lo cual conlleva una necesidad de investigación en dicha área para

encontrar formas de evaluación capaces de medirlas y evaluarlas. En este ámbito, la

introducción de rúbricas como instrumento objetivo de evaluación o como herramienta

de enseñanza aprendizaje se ha convertido en una de las principales áreas de estudio.

A pesar de que en muchos países como Estados Unidos, las rúbricas son un instrumento

de evaluación común desde principios del siglo XX, en España no se les ha prestado

demasiada atención hasta hace una década. En los últimos años la aparición de rúbricas

en los libros de texto se ha hecho más frecuente, en parte debido a la aprobación de la ley

actual de educación, la LOMCE. De todas formas, todavía no se ha realizado suficiente

investigación y aún hay muchas cuestiones por contestar acerca de su aplicación,

efectividad, usos y objetividad.

No obstante, el MCER no solo ha ejercido una gran influencia sobre nuevas metodologías

y enfoques, sino que también lo ha hecho sobre otros aspectos, por ejemplo, aquellas


tareas que son adecuadas para poder examinar las distintas destrezas. En este sentido, el

propio marco es una gran fuente de referencia que detalla diferentes actividades

apropiadas para examinar a los estudiantes. Asimismo, el marco también es un punto de

referencia sobre los estándares de aprendizaje que sirve para la preparación de los

currículos y las programaciones escolares y a la vez para definir los contenidos de un

examen y los criterios que se deben usar para su evaluación.

La presente tesis doctoral se ha escrito tras una lectura amplia y exhaustiva de fuentes

primarias sobre la evaluación de ILE, en particular, sobre el uso de rúbricas y del MCER

para la creación de exámenes y herramientas de evaluación. Uno de los principales fines

de la presente tesis es facilitar a la comunidad educativa pruebas acerca de la efectividad

y validez con la que las normativas del MCER se han adaptado e implementado en los

principales exámenes de los certificados de inglés más comunes en España. Con tal

intención, se ha analizado cada uno de los exámenes diseñados para examinar cada

destreza, es decir, las tareas que incluyen, cuáles son sus objetivos y sus criterios para

averiguar si siguen las directrices del marco. Además, la investigación estudia qué destrezas se evalúan con una rúbrica y examina dichas

rúbricas para comprobar su efectividad. Este análisis de los exámenes que forman cada

uno de los certificados junto con el análisis de las rúbricas que utilizan pretende aportar

información a la comunidad educativa. Esta información puede utilizarse de diferentes

maneras, por ejemplo, para detectar qué ejercicios son más frecuentes para examinar una

destreza y qué criterios deben considerarse para la evaluación de tal destreza. También

nos permitió saber si el marco se está utilizando de manera correcta e identificar las

incoherencias en cuanto a sus recomendaciones y su implementación real. Por otra parte,

el análisis de rúbricas permitió establecer patrones entre ellas. De este modo, se podrá

averiguar qué tipos de rúbricas son las más comunes, si son efectivas o no y qué aspectos


pueden mejorarse para aumentar su fiabilidad. Finalmente, este detallado análisis ayudará

a la creación y diseño de rúbricas futuras.

En cuanto a las líneas de investigación, es evidente que el inglés es un campo de estudio

muy amplio y esta tesis doctoral mezcla distintas áreas. En primer lugar, la educación, la

asignatura de inglés como lengua extranjera y su relación con el MCER. También

podemos enmarcar esta investigación dentro del estudio de nuevos métodos de

evaluación, en concreto de rúbricas, o en el área de certificados de lenguas extranjeras.

El enfoque comunicativo que promueve el Consejo Europeo ha revolucionado las

metodologías de enseñanza y aprendizaje que existían antes de la última década. La

consolidación de la Unión Europea y la necesidad social de adaptarse a las demandas y

exigencias del mercado de la globalización ha llevado a centrarse en las destrezas

comunicativas. Tener fluidez en inglés implica que uno puede mantener una conversación

sobre un tema general y entender a cualquier persona sin demasiada dificultad. Por lo

tanto, es esencial que la destreza oral y la auditiva tengan prioridad. Igualmente, lograr

alcanzar una buena competencia comunicativa es el principal objetivo de cualquier

currículum de inglés como lengua extranjera. Llegados a este punto, es fundamental dejar

claro que las destrezas comunicativas se refieren a aquellas involucradas en la actuación

oral, es decir, la comprensión auditiva y la expresión oral. Por su parte, la competencia

comunicativa requiere que el estudiante sepa la lengua y tenga conocimientos suficientes

para saber qué es adecuado en una comunidad de hablantes en una situación concreta, así

como conocimientos sobre pragmática, discurso y cultura o estrategias para solventar

posibles dificultades. Todo esto requiere que la programación, tareas y enseñanza se

transformen y alejen de metodologías tradicionales centradas en conocimientos

gramaticales. Y para que sea posible, es necesario investigar para que la transformación


de la enseñanza sea efectiva y significativa y las nuevas metodologías e instrumentos

utilizados estén validados.

Todo esto es lógico si se tiene en cuenta que muchas de las metodologías imperantes

utilizan ejercicios de verdadero o falso o de respuesta múltiple, los cuales no necesitan de

una rúbrica para determinar si la respuesta es incorrecta. Sin embargo, para determinar si

una composición es buena o mala es fundamental contar con una herramienta fiable y

objetiva. La presencia de muchos o pocos errores gramaticales es solo uno de los criterios

que deben utilizarse junto con otros como el nivel y uso de vocabulario, la puntuación, la

coherencia y la conexión. No obstante, no cualquier rúbrica puede utilizarse porque no

todas las rúbricas son objetivas, válidas y fiables y de ahí que sea fundamental investigar

en esta área. Es necesario saber cómo diseñar y crear una rúbrica efectiva o al menos

cómo determinar si una ya hecha lo es o no.

Otra área de estudio es la tecnológica puesto que los nuevos dispositivos electrónicos son

una realidad innegable en la sociedad actual. Han tenido un impacto considerable en

prácticamente todos los aspectos de nuestra vida: el trabajo, la salud, las interacciones

sociales, la industria, la medicina… etc. y, por supuesto, también en la educación. Los

estudiantes de lenguas extranjeras tienen ahora acceso ilimitado a una gran cantidad de

input y pueden acceder a miles de muestras y ejemplos con solo un clic, así como a

múltiples diccionarios, tesauros o aplicaciones de vocabulario. Esta tesis doctoral quiere

hacer también una pequeña aportación al campo con la recopilación de recursos en línea

para crear rúbricas.

Otro campo de investigación relacionado es el de los niveles de competencia en una

lengua. Determinar el nivel de competencia de una persona en una lengua no es una tarea

fácil y el MCER ha hecho una gran labor elaborando una escala y unificando criterios

para que, por ejemplo, un nivel intermedio alto signifique lo mismo en un país que en


otro dentro de la Unión Europea. Los sistemas educativos europeos han ido adaptándose

a los nuevos requerimientos exigidos en el marco al igual que aquellos certificados

dedicados a otorgar diplomas de certificación de nivel en una lengua. Evidentemente, se

han producido muchos ajustes para adaptar los exámenes de certificación al marco y sus

directrices. Sin embargo, todavía queda mucho por hacer. Es en esta línea donde se sitúa

la contribución de esta investigación ya que el análisis de los exámenes y sus rúbricas

indica aquellos puntos en el que las recomendaciones del marco no se han implementado

correctamente para que puedan corregirse.

Una vez establecidos los objetivos y las líneas de investigación relacionadas con la

presente investigación, es momento de hablar de los resultados y conclusiones de la

misma. La presente investigación ha sido satisfactoria en muchos aspectos. Ha permitido

no solo una comprensión profunda del complejo sistema de evaluación sino también una

mejor percepción del MCER como piedra angular de la enseñanza de inglés como lengua

extranjera en Europa y como fuerza conductora del establecimiento del sistema

comunicativo en todos sus países miembros. No obstante, uno de los descubrimientos

más significativos que ha surgido de la investigación es la verificación de que los

certificados que determinan la competencia de los candidatos en lengua inglesa presentan

fallos en la implementación de las directrices básicas del MCER y en la evaluación con

rúbricas que, en ocasiones, no cumplen con los requisitos de eficiencia y fiabilidad que

se espera de ellas. Además, han surgido limitaciones a la investigación derivadas de la

falta de transparencia de los certificados en cuanto a criterios e instrumentos de evaluación o datos de fiabilidad. A pesar de su estatus oficial y nacional, algunos de los certificados analizados proporcionan información vaga sobre la estructura de sus pruebas y no muestran datos de fiabilidad de sus exámenes como el alfa de Cronbach.
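A modo de referencia, el alfa de Cronbach, el coeficiente de fiabilidad al que se alude aquí, suele definirse como

\[
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right)
\]

donde $k$ es el número de ítems de la prueba, $\sigma_i^{2}$ la varianza de las puntuaciones de cada ítem y $\sigma_X^{2}$ la varianza de la puntuación total; los valores próximos a 1 indican una alta consistencia interna del examen.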


Con referencia a los aspectos positivos, el análisis tanto de los exámenes como de sus

rúbricas de evaluación ha permitido realizar comparaciones que abren líneas nuevas e

interesantes de investigación. También la comunidad puede beneficiarse de los hallazgos

clave que han establecido patrones entre la evaluación de las destrezas y pueden contribuir

a la mejora del proceso de evaluación desde la perspectiva del diseño de rúbricas y la

creación de exámenes. El cambio hacia un sistema de evaluación más comunicativo que

se ajuste más a la metodología y enseñanza utilizadas en el enfoque comunicativo puede

partir también de algunas de las premisas de esta investigación.

La primera observación que debe hacerse con respecto a los descubrimientos tiene que

ver con la necesidad de promover una mayor transparencia. Conocer el formato de un

examen, así como sus objetivos y sus criterios de evaluación es fundamental para

cualquier candidato. Por este motivo, es recomendable que las instituciones responsables

de dichos certificados publiquen en sus páginas web las rúbricas con las que van a evaluar

los examinadores, así como instrucciones precisas sobre el tipo de actividades que

contendrán sus pruebas y los objetivos y criterios de las mismas. Siguiendo esta línea de

transparencia, algunos de los certificados no ofrecen estudios que apoyen la validez y

eficiencia de sus certificados. Unos certificados con tal relevancia a nivel nacional

deberían estar respaldados con investigación sobre sus resultados y validez. Mientras los

certificados FCE e IELTS aportan estos resultados, documentos y estudios, ninguno de

los demás certificados lo hace, lo cual ha impedido una comparación de certificados con

respecto a sus coeficientes de fiabilidad.

Otro resultado clave de esta investigación ha sido el hecho de que ninguna de las rúbricas

empleadas por ninguno de los certificados analizados ha superado el test de validez

elaborado para este estudio a partir de una rúbrica de rúbricas y las directrices del marco

al respecto. Aunque algunas rúbricas presentan más aspectos a mejorar que otras, lo cierto


es que el hecho de que ninguna de ellas se pueda considerar plenamente válida ejemplifica

la dificultad y complejidad que conlleva diseñar una rúbrica efectiva. Una de las

principales complicaciones que se ha descubierto en la comparación de resultados es la dificultad de alcanzar un equilibrio entre usar descriptores que sean lo bastante breves

para que la rúbrica sea práctica y que las explicaciones sean lo suficientemente claras y

detalladas para que no se consideren demasiado vagas o abiertas. Alguna de las rúbricas,

por ejemplo, utiliza demasiados criterios de evaluación y son, en consecuencia, poco

prácticas o manejables. Por otro lado, algunas rúbricas sí son prácticas porque miden solo

cuatro criterios, pero sus descriptores son demasiado vagos.

El hecho de que ninguna de las rúbricas analizadas haya superado todos los criterios de

fiabilidad y validez lleva a reflexionar y considerar si el propio MCER es realmente

factible. El marco aporta directrices sobre cómo evaluar las distintas destrezas y lo hace

de maneras diferentes. Para empezar, contiene cuadros con información acerca de las

características que un candidato debe demostrar en cada uno de los niveles y destrezas.

Esto permite que las distintas instituciones y los profesores puedan elaborar el currículo,

las programaciones y diseñar las evaluaciones y los tests para la evaluación de contenidos

o la determinación de un nivel. Asimismo, incluye rúbricas con descriptores para cada

destreza e incluso para tareas o actividades concretas, por ejemplo, hay descriptores para

la evaluación de un monólogo, de una presentación o de un anuncio público. Sin embargo,

el uso de las rúbricas del propio marco implicaría que las instituciones que se encargan

de diseñar certificados de nivel tendrían que usar una rúbrica distinta para cada tarea del

examen o una muy global para todas que no sería tan precisa. Aunque estos

inconvenientes no sean desventajas en sí mismos, pues asegurarían que todas las tareas

son adecuadas, al igual que sus instrumentos de evaluación, la cantidad de trabajo y

tiempo empleado se elevarían considerablemente. Otro posible inconveniente es que las


escalas proporcionadas por el marco son de tipo holístico, por lo que serían demasiado

genéricas para aquellos certificados que intenten determinar si un candidato posee o no

un determinado nivel y no podrían incluir notas sobre la actuación del mismo

(sobresaliente, notable, bien, suficiente…). Diversos estudios revisados han probado,

además, que las rúbricas de tipo analítico son más precisas que las holísticas. Por

consiguiente, son más fiables y, por eso, la mayoría de las rúbricas analizadas de los

certificados son analíticas a excepción de la empleada por el ISE II para evaluar la

compresión auditiva. Las instituciones que organizan y diseñan los certificados se

encargan también de la creación de rúbricas y éstas son, como se ha dicho, de tipo

analítico en su mayoría. Puesto que las escalas incluidas en el marco son holísticas, el

proceso de creación de las mismas es más complejo y, como resultado, las rúbricas

utilizadas por un certificado y otro son muy diferentes pese a ser del mismo nivel. Sin

embargo, la libertad de escoger los criterios, descriptores y escalas puede personalizar

mucho más las rúbricas al modelo de prueba propuesto, pero a su vez las distancia más

del MCER. Es probable que si el propio marco incluyese rúbricas de tipo analítico para

las distintas destrezas y niveles, las rúbricas creadas por instituciones o los propios

docentes serían más similares entre sí y posiblemente más fiables.

En línea con el diseño de rúbricas según las recomendaciones expuestas en el marco:

viabilidad de la rúbrica de acuerdo con el tamaño y el número de criterios, escritura

positiva e información breve pero no vaga en los descriptores; cabe destacar la dificultad

de redactar descriptores con información breve pero no vaga, ya mencionada. Además,

otro obstáculo de las rúbricas analizadas ha sido el de utilizar palabras positivas, ya que

conseguir redactar los descriptores de los niveles de actuación más bajos con palabras

positivas requiere un dominio absoluto de la lengua. Debido a esto, solo dos de las

rúbricas analizadas cumplen este apartado. Un mayor uso de palabras de connotación


positiva en el marco ayudaría a mejorar este aspecto proporcionando más ejemplos sobre

cómo realizar tan compleja tarea.

Teniendo en cuenta todo lo expuesto anteriormente, podría concluirse que el MCER, pese

a haber realizado una labor magnífica en la unificación de niveles y el impulso de la competencia comunicativa, puede no ser todo lo aplicable que debería. Una vez que ya se ha iniciado una reforma del sistema y de la evaluación de lenguas extranjeras, sería un

buen momento para revisar el propio marco e intentar mejorar las pequeñas deficiencias

que presenta.

Otra reflexión interesante de la investigación llevada a cabo es la relacionada con la

presencia o ausencia de rúbricas como herramienta de evaluación. Con respecto a esto, se

ha detectado que solo las destrezas productivas, es decir, la producción oral y la

producción escrita, son evaluadas con rúbricas. Tan sólo el ISE-II utiliza una rúbrica para

evaluar una destreza no productiva sino receptiva, la comprensión auditiva. Es verdad

que las tareas que tradicionalmente se utilizan para examinar las destrezas productivas

(una presentación, un monólogo, una redacción…) parecen a priori más fáciles de

examinar con una rúbrica. Sin embargo, nada impide que las destrezas receptivas puedan

evaluarse de manera objetiva y fiable con una rúbrica. No obstante, evaluar una prueba

de comprensión auditiva o escrita con una escala requeriría un cambio en los modelos de

examen y tareas que normalmente se utilizan para medir estas destrezas. Evaluar un

ejercicio de respuesta múltiple o verdadero o falso con una rúbrica no tiene ningún

sentido, pero si en lugar de este tipo de actividades se utilizasen otras en las que el alumno

tuviese que demostrar su comprensión elaborando algún texto oral o escrito, sí sería

posible utilizarlas. Esto es precisamente lo que ocurre en la prueba auditiva del certificado

ISE II, en el que el alumno, después de escuchar el audio, debe elaborar un resumen

oralmente y a continuación conversar con el examinador acerca de lo que ha entendido,


contestando a sus preguntas cuando sea necesario. El cambio hacia un tipo de tareas más

comunicativas también deberá llevar consigo un cambio en la metodología.

A pesar de los resultados y hallazgos de esta investigación, es fundamental tener en cuenta

las limitaciones de la misma. Una mayor transparencia por parte de las instituciones con

respecto a sus pruebas, objetivos, criterios y rúbricas habría facilitado esta investigación,

así como datos referentes a estudios previos relacionados con su fiabilidad.

Esta investigación también abre la puerta a nuevas líneas de investigación ya que la

investigación en el ámbito de la evaluación de lenguas extranjeras, el MCER y las rúbricas

todavía requieren de muchos más estudios. Entre las distintas investigaciones que se

podrían seguir a raíz de esta tesis doctoral las más importantes estarían relacionadas con

un estudio de la fiabilidad que comparase los distintos certificados, por ejemplo,

comprobando la nota obtenida por un mismo candidato presentándose al mismo nivel en

los distintos certificados o la nota obtenida por un mismo candidato en un examen si es

evaluado por varios examinadores independientes. Por último, un estudio

sobre la aplicabilidad del MCER sería de sumo interés, así como uno que analice hasta

qué punto se respetan y aplican los principios del marco en el sistema educativo actual.

Todas estas nuevas líneas de investigación apuntan a un gran espacio de investigación en

esta área para alcanzar un sistema que garantice inequívocamente que todos los

estudiantes puedan desarrollar al máximo su potencial. Son muchos los pasos que ya se

han dado en las últimas dos décadas; en el campo del aprendizaje de lenguas se han

realizado cambios importantes hacia un aprendizaje comunicativo real. Esto debería, por

tanto, ser fuente de inspiración para todos los investigadores, para demostrarles que los

hallazgos, la disciplina y la investigación de verdad pueden transformar la realidad en la

que vivimos.


Galician summary (long)

Nos últimos anos son moitas as investigacións que se realizaron sobre a avaliación na

aula de inglés como Lingua Estranxeira. O establecemento do Marco Común Europeo de

Referencia para as Linguas: Ensino, Aprendizaxe e Avaliación (MCER) impulsou

cambios no proceso de ensino aprendizaxe, tales como o enfoque comunicativo ou o

traballo das catro destrezas dentro da aula: comprensión oral, comprensión escrita,

produción oral e produción escrita. Porén, cómpre analizar as principais Certificacións

Oficiais co obxectivo de comprobar se os seus exames son adecuados ás pautas marcadas

polo MCER en canto a forma, contido e avaliación. Ademais, os métodos de avaliación

tradicional aínda predominantes nos centros educativos non son adecuados para avaliar

estas destrezas comunicativas, o cal da lugar a unha necesidade de investigación na

devandita área para atopar formas de avaliación capaces de medilas e avalialas. Neste

ámbito, a introdución de rúbricas como instrumento obxectivo de avaliación ou como

ferramenta de ensino aprendizaxe converteuse nunha das principais áreas de estudo.

A pesar de que en moitos países como Estados Unidos, as rúbricas son un instrumento de

avaliación común dende principios do século XX, en España non se lles prestou

demasiada atención ata hai unha década. Nos últimos anos a aparición de rúbricas nos

libros de texto fíxose máis frecuente, en parte debido á aprobación da lei actual de

educación, a LOMCE. De tódolos xeitos, aínda non se realizou suficiente investigación e

aínda hai moitas cuestións por contestar sobre a súa aplicación, efectividade, usos e

obxectividade.

Con todo, o MCER non só exerceu unha gran influencia sobre novas metodoloxías e

enfoques, senón que tamén o fixo sobre outros aspectos, por exemplo, aquelas tarefas que

son adecuadas para poder examinar as distintas destrezas. Neste sentido, o propio marco


é unha gran fonte de referencia que detalla diferentes actividades apropiadas para

examinar aos estudantes. Así mesmo, o marco tamén é un punto de referencia sobre os

estándares de aprendizaxe que serve para a preparación dos currículos e as programacións

escolares e ao mesmo tempo para definir os contidos dun exame e os criterios que se

deben usar para a súa avaliación.

A presente tese doutoral escribiuse tras unha lectura ampla e exhaustiva de fontes

primarias sobre a avaliación de ILE, en particular, sobre o uso de rúbricas e do MCER

para a creación de exames e ferramentas de avaliación. Un dos principais fins da presente

tese é facilitar á comunidade educativa probas sobre a efectividade e validez coa que as

normativas do MCER foron adaptadas e aplicadas nos principais exames dos certificados

de inglés máis comúns en España. Con tal intención, analizouse cada un dos exames

deseñados para examinar cada destreza, é dicir, as tarefas que inclúen, cales son os seus

obxectivos e os seus criterios para pescudar se seguen as directrices do marco. Ademais,

a investigación estudou os exames que utilizan unha rúbrica para a súa avaliación e

examina ditas rúbricas para comprobar a súa efectividade. Esta análise dos exames que

forman cada un dos certificados xunto coa análise das rúbricas que utilizan pretende

achegar información á comunidade educativa. Esta información pode utilizarse de

diferentes maneiras, por exemplo, para detectar os exercicios que son máis frecuentes

para examinar unha destreza e os criterios que deben considerarse para a avaliación de tal

destreza. Tamén permitiu saber se o marco está a utilizarse de maneira correcta e

identificar as incoherencias en canto ás súas recomendacións e a súa aplicación real. Por

outra banda, a análise de rúbricas fixo posible establecer patróns entre elas. Deste xeito,

pescudáronse os tipos de rúbricas que son as máis comúns, se son efectivas ou non e os

aspectos que poden mellorarse para aumentar a súa fiabilidade. Finalmente, esta detallada

análise axudará á creación e deseño de rúbricas futuras.


En canto ás liñas de investigación, é evidente que o inglés é un campo de estudo moi

amplo e esta tese doutoral mestura distintas áreas. En primeiro lugar, a educación, a

materia de Inglés como Lingua Estranxeira e a súa relación co MCER. Tamén podemos

enmarcar esta investigación dentro do estudo de novos métodos de avaliación, en concreto

das rúbricas, ou na área de certificados de linguas estranxeiras.

O enfoque comunicativo que promove o Consello Europeo revolucionou as metodoloxías

de ensino e aprendizaxe que existían antes da última década. A consolidación da Unión

Europea e a necesidade social de adaptarse ás demandas e esixencias do mercado da

globalización levou a centrarse nas destrezas comunicativas. Ter fluidez en inglés implica

que un pode manter unha conversación sobre un tema xeral e entender a calquera persoa

sen demasiada dificultade. Polo tanto, é esencial que a destreza oral e a auditiva teñan

prioridade. Igualmente, lograr alcanzar unha boa competencia comunicativa é o principal

obxectivo de calquera currículo de inglés como lingua estranxeira. Chegados a este punto,

é fundamental deixar claro que as destrezas comunicativas refírense a aquelas

involucradas na actuación oral, é dicir, a comprensión auditiva e a expresión oral. Pola

súa banda, a competencia comunicativa require que o estudante saiba a lingua e teña

coñecementos suficientes para saber o que é adecuado nunha comunidade de falantes

nunha situación concreta, así como coñecementos sobre pragmática, discurso e cultura ou

estratexias para esquivar posibles dificultades. Todo isto require que a programación,

tarefas e ensino se transformen e afasten de metodoloxías tradicionais centradas en

coñecementos gramaticais. E para que isto sexa posible, cómpre investigar para que a

transformación do ensino sexa efectiva e significativa e as novas metodoloxías e

instrumentos utilizados estean validados.

Todo isto é lóxico se se ten en conta que moitas das metodoloxías imperantes utilizan

exercicios de verdadeiro ou falso ou de resposta múltiple, os cales non necesitan dunha


rúbrica para determinar se a resposta é incorrecta. Non obstante, para determinar se unha

composición é boa ou mala é fundamental contar cunha ferramenta fiable e obxectiva. A

presenza de moitos ou poucos erros gramaticais é só un dos criterios que deben utilizarse

xunto con outros como o nivel e uso de vocabulario, a puntuación, a coherencia e a

conexión. No entanto, non calquera rúbrica pode utilizarse porque non todas as rúbricas

son obxectivas, válidas e fiables e por iso é polo que é fundamental investigar nesta área.

É necesario saber como deseñar e crear unha rúbrica efectiva ou polo menos como

determinar se unha xa feita o é ou non.

Outra área de estudo é a tecnolóxica posto que os novos dispositivos electrónicos son

unha realidade innegable na sociedade actual. Tiveron un impacto considerable en

practicamente todos os aspectos da nosa vida: o traballo, a saúde, as interaccións sociais,

a industria, a medicina… etc. e, por suposto, tamén na educación. Os estudantes de

linguas estranxeiras teñen agora acceso ilimitado a unha gran cantidade de input e poden

acceder a miles de mostras e exemplos con só un clic así como a múltiples dicionarios,

tesauros ou aplicacións de vocabulario. Esta tese doutoral quere facer tamén unha

pequena achega ao campo coa recompilación de recursos en liña para crear rúbricas.

Outro campo de investigación relacionado é o dos niveis de competencia nunha lingua.

Determinar o nivel de competencia dunha persoa nunha lingua non é unha tarefa fácil e o

MCER fixo un gran labor elaborando unha escala e unificando criterios para que, por

exemplo, un nivel intermedio alto signifique o mesmo nun país que noutro dentro da

Unión Europea. Os sistemas educativos europeos foron adaptándose aos novos

requirimentos esixidos no marco do mesmo xeito que aqueles certificados dedicados a

outorgar diplomas de certificación de nivel nunha lingua. Evidentemente, producíronse

moitos axustes para adaptar os exames de certificación ao marco e as súas directrices. Así

a todo, aínda queda moito por facer. É nesta liña onde se sitúa a contribución desta


investigación xa que a análise dos exames e as súas rúbricas indica aqueles puntos no que

as recomendacións do marco non se adaptaron correctamente para que poidan corrixirse.

Unha vez establecidos os obxectivos e as liñas de investigación relacionadas coa presente

investigación, é momento de falar dos resultados e conclusións da mesma. A presente

investigación foi satisfactoria en moitos aspectos. Permitiu non só unha comprensión

profunda do complexo sistema de avaliación senón tamén unha mellor percepción do

MCER como pedra angular do ensino de inglés como lingua estranxeira en Europa e

como forza condutora do establecemento do sistema comunicativo en todos os seus países

membros. No entanto, un dos descubrimentos máis significativos que xurdiu da

investigación é a verificación de que os certificados que determinan a competencia dos

candidatos en lingua inglesa presentan fallos na adaptación das directrices básicas do

MCER e na avaliación con rúbricas que, en ocasións, non cumpren cos requisitos de

eficiencia e fiabilidade que se espera delas. Ademais, xurdiron limitacións á investigación

derivadas da falta de transparencia dos certificados en canto a criterios e instrumentos de

avaliación ou datos de fiabilidade. A pesar da súa posición oficial e nivel nacional, algúns

dos certificados analizados proporcionan información vaga sobre a estrutura das súas

probas e non mostran datos de fiabilidade dos seus exames como o alfa de Cronbach.

Con referencia aos aspectos positivos, a análise tanto dos exames como das súas rúbricas

de avaliación permitiu realizar comparacións que abren liñas novas e interesantes de

investigación. Tamén a comunidade pode beneficiarse dos achados clave que permitiron

establecer patróns entre a avaliación das destrezas e poden contribuír á mellora do proceso

de avaliación dende a perspectiva do deseño de rúbricas e a creación de exames. O cambio

cara a un sistema de avaliación máis comunicativo que se axuste máis á metodoloxía e

ensino utilizadas no enfoque comunicativo pode partir tamén dalgunhas das premisas

desta investigación.


A primeira observación que debe facerse con respecto aos descubrimentos ten que ver

coa necesidade de promover unha maior transparencia. Coñecer o formato dun exame,

así como os seus obxectivos e os seus criterios de avaliación é fundamental para calquera

candidato. Por este motivo, é recomendable que as institucións responsables dos

devanditos certificados publiquen nas súas páxinas web as rúbricas coas que van avaliar

os examinadores, así como instrucións precisas sobre o tipo de actividades que conterán

as súas probas e os obxectivos e criterios dos mesmos. Seguindo esta liña de

transparencia, algúns dos certificados non ofrecen estudos que apoien a validez e

eficiencia dos seus certificados. Uns certificados con tal relevancia a nivel nacional

deberían estar apoiados con investigación sobre os seus resultados e validez. Mentres os

certificados FCE e IELTS achegan estes resultados, documentos e estudos, ningún dos

demais certificados o fai, o cal impediu unha comparación de certificados con respecto

aos seus coeficientes de fiabilidade.

Outro resultado clave desta investigación foi o feito de que ningunha das rúbricas

empregadas por ningún dos certificados analizados superou o test de validez elaborado

para este estudo a partir dunha rúbrica de rúbricas e as directrices do marco. Aínda que

algunhas rúbricas presentan máis aspectos a mellorar que outras, o certo é que o feito de

que ningunha delas se considerase plenamente válida exemplifica a dificultade e

complexidade que implica deseñar unha rúbrica efectiva. Unha das principais complicacións que se descubriu na comparación de resultados é a dificultade de alcanzar un equilibrio entre usar descritores que sexan o bastante breves para que a rúbrica

sexa práctica e que as explicacións sexan o suficientemente claras e detalladas para que

non se consideren demasiado vagas ou abertas. Algunha das rúbricas, por exemplo, utiliza

demasiados criterios de avaliación e son, en consecuencia, pouco prácticas ou


manexables. Doutra banda, algunhas rúbricas si son prácticas porque miden só catro

criterios, pero os seus descritores son demasiado vagos.

O feito de que ningunha das rúbricas analizadas superase todos os criterios de fiabilidade

e validez leva a reflexionar e considerar se o propio MCER é realmente factible. O marco

achega directrices sobre como avaliar as distintas destrezas de maneiras diferentes. Para

empezar, contén cadros con información sobre as características que un candidato debe

demostrar en cada un dos niveis e destrezas. Isto permite que as distintas institucións e os

profesores poidan elaborar o currículo, as programacións e deseñar as avaliacións e os

tests para a avaliación de contidos ou a determinación dun nivel. Así mesmo, inclúe

rúbricas con descritores para cada destreza e mesmo para tarefas ou actividades concretas,

por exemplo, hai descritores para a avaliación dun monólogo, dunha presentación ou dun

anuncio público. Agora ben, o uso das rúbricas do propio marco implicaría que as

institucións que se encargan de deseñar certificados de nivel terían que usar unha rúbrica

distinta para cada tarefa do exame ou unha máis global para todas que non sería tan

precisa. Aínda que estes inconvenientes non sexan desvantaxes en si mesmos, pois

asegurarían que todas as tarefas son adecuadas, do mesmo xeito que os seus instrumentos

de avaliación, a cantidade de traballo e tempo empregado elevaríanse considerablemente.

Outro posible inconveniente é que as escalas proporcionadas polo marco son de tipo

holístico, polo que serían demasiado xenéricas para aqueles certificados que tenten

determinar se un candidato posúe ou non un determinado nivel e non poderían incluír

notas sobre a actuación do mesmo (sobresaliente, notable, ben, suficiente…). Diversos

estudos revisados probaron, ademais, que as rúbricas de tipo analítico son máis precisas

que as holísticas. Por conseguinte, son máis fiables e, por iso, a maioría das rúbricas

analizadas dos certificados son analíticas salvo a empregada polo ISE II para avaliar a

compresión auditiva. As institucións que organizan e deseñan os certificados encárganse


tamén da creación de rúbricas e estas son, como se dixo, de tipo analítico na súa maioría.

Posto que as escalas incluídas no marco son holísticas, o proceso de creación das mesmas

é máis complexo e, como resultado, as rúbricas utilizadas por un certificado e outro son

moi diferentes a pesar de ser do mesmo nivel. Con todo, a liberdade de escoller os criterios, descritores e escalas pode personalizar moito máis as rúbricas ao modelo de proba proposto, pero á súa vez as distancia máis do MCER. É probable que se o propio marco

incluíse rúbricas de tipo analítico para as distintas destrezas e niveis, as rúbricas creadas

por institucións ou os propios docentes serían máis similares entre si e posiblemente máis

fiables.

En liña co deseño de rúbricas segundo as recomendacións expostas no marco: viabilidade

da rúbrica de acordo co tamaño e o número de criterios, escritura positiva e información

breve pero non vaga nos descritores; cabe destacar a dificultade de redactar descritores

con información breve pero non vaga como xa se mencionou. Ademais, outro obstáculo

das rúbricas analizadas foi o de utilizar palabras positivas, xa que conseguir redactar os

descritores dos niveis de actuación máis baixos con palabras positivas require un dominio

absoluto da lingua. Debido a isto, só dúas das rúbricas analizadas cumpren este apartado.

Un maior uso de palabras de connotación positiva no marco axudaría a mellorar este

aspecto proporcionando máis exemplos sobre como realizar tan complexa tarefa.

Tendo en conta todo o exposto anteriormente, podería concluírse que o MCER, a pesar

de realizar un labor magnífico na unificación de niveis e o impulso da competencia

comunicativa, pode non ser todo o aplicable que debería. Unha vez que xa se iniciou unha

reforma do sistema e da avaliación de linguas estranxeiras, sería un bo momento para

revisar o propio marco e tentar mellorar as pequenas deficiencias que presenta.

Outra reflexión interesante da investigación levada a cabo é a relacionada coa presenza

ou ausencia de rúbricas como ferramenta de avaliación. Con respecto a isto, detectouse


que só as destrezas produtivas, é dicir, a produción oral e a produción escrita, son

avaliadas con rúbricas. Tan só o ISE-II utiliza unha rúbrica para avaliar unha destreza non

produtiva senón receptiva, a comprensión auditiva. É verdade que as tarefas que

tradicionalmente se utilizan para examinar as destrezas produtivas (unha presentación, un

monólogo, unha redacción…) parecen a priori máis fáciles de examinar cunha rúbrica.

Porén, nada impide que as destrezas receptivas poidan avaliarse de maneira obxectiva e

fiable cunha rúbrica. Así a todo, avaliar unha proba de comprensión auditiva ou escrita

cunha escala requiriría un cambio nos modelos de exame e tarefas que normalmente se

utilizan para medir estas destrezas. Avaliar un exercicio de resposta múltiple ou

verdadeiro ou falso cunha rúbrica non ten ningún sentido, pero se en lugar deste tipo de

actividades se utilizasen outras nas que o alumno tivese que demostrar a súa comprensión

elaborando algún texto oral ou escrito, si sería posible utilizalas. Isto é precisamente o

que ocorre na proba auditiva do certificado ISE II, no que o alumno, despois de escoitar

o audio, debe elaborar un resumo oral e, a continuación, conversar co examinador sobre

o que entendeu, contestando as súas preguntas cando sexa necesario. O cambio cara a un tipo

de tarefas máis comunicativas tamén deberá levar consigo un cambio na metodoloxía.

A pesar dos resultados e achados desta investigación, é fundamental ter en conta as

limitacións da mesma. Unha maior transparencia por parte das institucións con respecto

ás súas probas, obxectivos, criterios e rúbricas facilitaría esta investigación, así como

datos referentes a estudos previos relacionados coa súa fiabilidade.

Esta investigación tamén abre a porta a novas liñas de investigación xa que a

investigación no ámbito da avaliación de linguas estranxeiras, o MCER e as rúbricas

aínda requiren de moitos máis estudos. Entre as distintas investigacións que se poderían

seguir por mor desta tese doutoral, as máis importantes estarían relacionadas cun estudo

da fiabilidade que comparase os distintos certificados, por exemplo, comprobando a nota


obtida por un mesmo candidato presentándose ao mesmo nivel nos distintos certificados

ou a nota obtida por un mesmo candidato nun exame se é avaliado por diversos

examinadores independentes. Por último, un estudo sobre a aplicabilidade do

MCER sería de sumo interese, así como un que analice ata que punto se respectan e aplican

os principios do marco no sistema educativo actual.

Todas estas novas liñas de investigación apuntan a un gran espazo de investigación nesta

área para alcanzar un sistema que garanta inequivocamente que todos os estudantes

poidan desenvolver ao máximo o seu potencial. Son moitos os pasos que xa se deron nas

últimas dúas décadas; no campo da aprendizaxe de linguas realizáronse cambios

importantes cara a unha aprendizaxe comunicativa real. Isto debería, polo tanto, ser fonte

de inspiración para todos os investigadores, para demostrarlles que os achados, a

disciplina e a investigación de verdade poden transformar a realidade na que vivimos.


Prologue

In the new global economy, English has become the most important and relevant language

in the world. However, English had acquired the status of lingua franca par excellence

long before the advances in transportation and technology enabled the intensification of

trade around the world. Thus, this new era, often known as globalisation, has resulted in

an upsurge in the significance of the language. This explains why the learning of English

as a Foreign Language1 (hereinafter, EFL) is nowadays a primary concern worldwide.

This need to acquire the language has spurred a proliferation of studies related to the

Teaching of English as a Foreign Language (henceforth, TEFL), and consequently many

different methodologies have emerged through the years, with varying success. While

some of them are still being implemented and coexist, others have been cast aside. At the

same time, new methods, tasks, instruments and assessment2 tools are appearing, most of

them inspired by the new technological advances. Along with these, new certificates,

titles and diplomas which certify an individual’s language competence and level of

proficiency have been created. This is partly trigged by the fact that demonstration of the

level of fluency in a language has become an essential requirement in order to apply for

a job or for a grant, or to enable somebody to study in other country. In addition, in

primary and secondary schools, and also at university and other academic contexts,

English is frequently evaluated and, most of the time, compulsory.

1 Foreign Language: according to Richards and Schmidt: a language which is not the native language of large numbers of people in a particular country or region, is not used as a medium of instruction, and is not widely used as a medium of communication in government, media, etc. Foreign languages are typically taught as school subjects for the purpose of communicating with foreigners or for reading printed materials in a language (206). As opposed to second language: a language that plays a major role in a particular country, though it may not be the first language of many people who use it (472).

2 Assessment and evaluation will be treated as synonyms owing to stylistic reasons, although the difference between them will be explained later.


In recent years, the Council of Europe, through the Common European Framework of

Reference for Languages: Learning, teaching and assessment (from now on CEFR), has

been promoting shifts in the teaching-learning process, such as the implementation of the

communicative approach3 or the practice of the four language skills in the classroom

(speaking, reading, listening and writing) rather than focusing solely on writing and

grammar. As a result, a great deal of research on assessment in the EFL classroom has

been conducted. Nevertheless, traditional assessment methods remain predominant and

are still used in most schools, even though they are not suitable for the assessment of

communicative language competence, which could be defined as the underlying

knowledge a speaker has of the rules of grammar, including phonology, orthography,

syntax, lexicon, and semantics, and the rules for their use in socially appropriate

circumstances (Hymes 269-293). This conveys the need for research in this area in order

to find new assessment tools capable of evaluating it. Traditional paper tests no longer work for

the evaluation of speaking and must be discarded in order to pave the way for new

assessment methods.

In light of this new situation, research into rubrics4 has become a central area of study. In

spite of the fact that many countries, such as the USA, have been using them as a common

assessment tool since the beginning of the 20th century, far too little attention has been

paid to them in Spain. In fact, the appearance of grading scales in school textbooks did not become frequent until recently, and their current presence has been partly prompted by the current education law, known as the LOMCE. So far,

3 Communicative approach: approach to foreign language learning and teaching which emphasises that the goal of language learning is communicative competence and which seeks to make meaningful communication and language use a focus of all classroom activities (Richards and Schmidt 90).

4 Rubrics: for stylistic purposes and due to the high frequency of appearance of the term, rubrics will sometimes be referred to as grading scales.


however, there has been little discussion of rubrics and there are still many open questions about their application, effectiveness, uses and objectivity which need to be addressed.

Not only has the CEFR had an influence on teaching methodologies and approaches, but

also on other aspects; for instance, it is a valuable source of information regarding

suitable assessment tasks for each of the skills. Furthermore, it provides useful knowledge

of Learning Standards. Those learning standards facilitate not only the preparation of the curriculum and the syllabus, but also the definition of the contents of an exam paper and the criteria used for its assessment.

This doctoral thesis has been written after an exhaustive and comprehensive reading of

primary sources on the assessment of EFL, particularly on the use of rubrics and the use of the CEFR for the creation of assessment tests and tools. The main aim of the

research is to provide the educational community with evidence on how appropriately the

CEFR guidelines have been implemented in the assessment papers of the most common

English Certificates in Spain. In order to do so, an analysis of each skill’s exam papers

will be carried out, and this will include the tasks, the objectives of each, and

whether or not they are adhering to the indications of the CEFR. Furthermore, the research

studies which skills are being assessed with a rubric and examines them to check their

effectiveness.

This analysis of the exam papers which form the English Certificates selected, together

with the rubrics employed for their grading, intends to provide the educational community

with a useful source of information which may be used for different purposes. One of

these could be the detection of which tasks are more frequent for the assessment of a

particular skill and which criteria are usually considered for their evaluation.

Furthermore, it will allow us to check whether the framework is being used correctly and

it will identify inconsistencies and omissions in reference to the framework's


recommendations. On the other hand, the analysis of rubrics will make it possible to

establish patterns among them. In this way the most common types could be detected.

Moreover, it will be determined if they are effective or not and in which aspects they can

be improved. Finally, this entire analysis will help in the creation and design of future

rubrics.

Objectives of the thesis

The entire thesis has been conceived with the intention of providing the educational

community with research that may help to improve the assessment of EFL.

The creation of the CEFR has encouraged a new approach in TEFL and the learning of

foreign languages. Old-fashioned methodologies, such as the grammar-translation

method5, have been replaced by communicative ones6, which prioritise speaking and listening skills and favour a practical focus. All of these changes imply a necessary switch

in the assessment and evaluation processes, since traditional paper-based exams are no

longer effective.

Taking this into account, the aim of this thesis is to facilitate the required change in

the assessment process. In order to do so, all the information related to assessment,

involving both traditional and new methodologies, needs to be gathered. It is essential to

understand the full dimension of assessment so that contributions to its improvement can be made.

5 Grammar-Translation Method: a method of foreign or second language teaching which makes use of translation and grammar study as the main teaching and learning activities (Richards and Schmidt 231).

6 All the methods derived from the communicative approach, such as Task-Based Language Teaching, Cooperative Language Learning, and Content-Based Instruction.


Secondly, it is also an intention of the current thesis to study in detail both the CEFR and

everything related to rubrics: the former because it is fundamental to analyse its

indications regarding the teaching of foreign languages and, more specifically, the

assessment of foreign languages; and the latter because the effective use thereof requires

detailed knowledge of them. This includes knowing how they work, what advantages or

possible drawbacks they have, which different types exist and how they must be designed.

Since making a specific contribution to the current assessment of EFL and the use of rubrics is the

main objective of the current thesis, the principal goal is to study and analyse how the

most important Certificates of competence in English assess the different skills in relation

to the framework and how rubrics are being used to validate these certificates.

With the above stated purpose, the structure and tasks of the different papers from each

of the certificates selected will be studied. It will be checked whether they follow the

CEFR and if they are adapting its recommendations to their exams. Moreover, their

effectiveness and reliability will be tested and the areas in which certain improvements

or corrections should be made will be identified.

Concerning rubrics, the study will determine which papers use a grading scale as an

assessment instrument. Furthermore, in those in which a rubric is employed, the rubrics

will be analysed in order to detect the most common types of rubrics, and to ascertain

whether they are effective and reliable. In addition, there will be a study on which aspects

can be improved and whether the framework guidance is being followed. The aim is to

establish relations and patterns among the different rubrics and skills, to give advice for

their correction or improvement, if needed, and to establish accurate instructions for the

creation of future ones.


Finally, the research also intends to establish future lines of investigation which can

contribute to the current one, or which may be opened up thanks to the present one.

The main objectives and sub-objectives of the current thesis are as follows:

1- To gather and organise information relating to the assessment of English as a

Foreign Language (EFL).

1.1. To study alternative evaluation methods and instruments of

evaluation.

2- To study the Common European Framework (CEFR) in relation to

assessment, Learning Standards and grading scales.

3- To do an in-depth study on the theory, definition, types and composition of

rubrics.

3.1. To provide information about online rubrics and instruments to design

them.

4- To analyse some of the most common English Certificates in relation to their

papers and rubrics.

4.1. To determine whether the implementation of the Framework is

correct.

4.2. To establish patterns and relationships among them.

4.3. To detect shortcomings or mistakes which could be improved.

5- To open new lines of research.


Table of contents
ABSTRACT ...................................................................................................................................... 1

Resumen ........................................................................................................................................ 3

Resumo .......................................................................................................................................... 5

Spanish summary (long) ................................................................................................................ 7

Galician summary (long).............................................................................................................. 17

Prologue ...................................................................................................................................... 27

Chapter 1: INTRODUCTION ..................................................................................................... 37

1.1. State of the art ............................................................................................................. 37

1.2. Lines of research ......................................................................................................... 43

1.3. Synopsis ...................................................................................................................... 47

Chapter 2: REVIEW OF THE LITERATURE ........................................................................... 51

2.1. A historical review of evaluation ..................................................................................... 53

2.2. Types of evaluation .......................................................................................................... 62

a) According to the moment of application: .................................................................... 62

b) According to its extension ........................................................................................... 63

c) According to the agent of evaluation (ibid. 40) ........................................................... 63

d) According to the scale ................................................................................................. 64

e) According to the purpose ............................................................................................ 64

f) According to the scoring ............................................................................................. 65

g) According to the delivery methods employed ............................................................. 65

h) According to the formality .......................................................................................... 66

i) Divergent/convergent .................................................................................................. 66

j) Process/ Product .......................................................................................................... 66

2.3. Dimensions of assessment ................................................................................................ 66

2.4. Importance and consequences of evaluation .................................................................... 68

2.5. Traditional Evaluation ...................................................................................................... 72

2.6. Alternative evaluation ...................................................................................................... 76

2.7. Assessment for Learning ................................................................................................. 80

2.8. Instruments ....................................................................................................................... 82

2.8.1. Portfolio ..................................................................................................................... 83

2.8.1.1. European Portfolio of Languages ............................................................................. 85

2.8.2. Oral presentations ...................................................................................................... 86

2.8.3. Journals ..................................................................................................................... 87

2.8.4. Projects ...................................................................................................................... 88

2.8.5. Interviews .................................................................................................................. 90


2.8.6. Progressive assessment chart..................................................................................... 91

2.8.7. Report ........................................................................................................................ 92

2.8.8. Rubrics ...................................................................................................................... 93

2.9. Language Assessment Literacy ........................................................................................ 93

Chapter 3: The CEFR .................................................................................................................. 95

3.1. Common Reference Levels .............................................................................................. 96

3.2. Learning Standards ........................................................................................................... 98

3.3. Chapter 9: Assessment ................................................................................................... 101

3.4. The CEFR on rating scales or checklist ......................................................................... 104

3.5. Evaluation of competences ............................................................................................. 107

Chapter 4: RUBRICS ................................................................................................................ 111

4.1. Definition .................................................................................................................. 111

4.2. Why use rubrics? ....................................................................................................... 113

4.3. Historical Overview. Rubrics in Education ............................................................... 116

4.4. Types ......................................................................................................................... 118

a) According to how it is measured ................................................................................. 119

b) According to the scoring type: .................................................................................. 120

c) According to its theme .............................................................................................. 120

d) According to its application ...................................................................................... 121

e) According to its function ........................................................................................... 121

f) According to the scorer ............................................................................................. 122

g) According to the channel ........................................................................................... 122

4.5. Parts of a rubric ......................................................................................................... 123

4.6. Advantages and disadvantages .................................................................................. 124

4.7. How to build a rubric ................................................................................................ 127

4.8. Online Tools for building a rubric ............................................................................. 128

Chapter 5: METHODOLOGY .................................................................................................. 139

5.1. Introduction .................................................................................................................... 139

5.2. Methodological approach ............................................................................................... 141

5.3. Research design .............................................................................................................. 149

5.3.1. Objectives and context ............................................................................................ 149

5.3.2. Definition of the units of analysis ........................................................................... 150

5.3.3. Number scheme rules .............................................................................................. 153

5.3.4. Categorisation and codification ............................................................................... 164

5.3.5. Reliability and Validity ........................................................................................... 168

5.3.6. Data Analysis .......................................................................................................... 168


5.4. Hypotheses ..................................................................................................................... 171

Chapter 6: RESEARCH ............................................................................................................ 173

6.1. Proficiency Test exam papers for the different language skills and their assessment rubrics ....... 173

6.2. Writing ...................................................................................................................... 174

6.2.1. Literature review ............................................................................................... 174

6.2.2. Assessment of Writing in the main English Certificates ......................................... 183

6.3. Speaking .................................................................................................................... 228

6.3.1. Literature Review .............................................................................................. 229

6.3.2. Assessment of Speaking in the main English Certificates of ESL .................... 234

6.4. Reading ..................................................................................................................... 279

6.4.1. Literature Review .............................................................................................. 280

6.4.2. Assessment of Reading in the main English Certificates of ESL ..................... 284

6.5. Listening .................................................................................................................... 304

6.5.1. Literature Review .............................................................................................. 305

6.5.2. Assessment of listening in the main English Certificates of ESL ..................... 307

6.6. Findings ..................................................................................................................... 325

Chapter 7: CONCLUSIONS ..................................................................................................... 339

7.1. Research implications ............................................................................................... 340

7.2. Research limitations .................................................................................................. 346

7.3. Research applicability and future implications ......................................................... 347

Chapter 8: BIBLIOGRAPHY ................................................................................................... 353

8.1. List of figures: ................................................................................................................ 370

APPENDICES ........................................................................................................................... 373

Appendix 1: Exam samples ................................................................................................... 373

FCE ................................................................................................................................... 373

IELTS ................................................................................................................................ 391

ISE II ................................................................................................................................. 413

ACLES .............................................................................................................................. 421

EOI .................................................................................................................................... 431

Appendix 2: Rubrics ................................................................................................................. 441

FCE ................................................................................................................................... 441

IELTS ................................................................................................................................ 443

ISE-II ................................................................................................................................. 447

ACLES .............................................................................................................................. 453

EOI .................................................................................................................................... 455


Appendix 3: CEFR ................................................................................................................ 459

Common Reference Levels: self-assessment grid ............................................................. 459

Common Reference Levels: qualitative aspects of spoken language use ......................... 461

Chapter Four scales for different skills and tasks .............................................................. 463


Chapter 1: INTRODUCTION

1.1. State of the art

The use of the CEFR within the European Union for TEFL and the assessment of EFL is

already a reality. The Spanish government has implemented the framework’s guidelines

through education acts, and the main English Certificates for assessing English

competence claim to follow the CEFR patterns. Such a meaningful change in foreign

language learning and EFL has boosted research on the CEFR. Furthermore, there are many

studies on how to implement the CEFR in other countries outside Europe. Nonetheless, although not exactly scarce, research on the effective implementation of the CEFR is not so prolific.

Mathea Simons and Josef Colpaert conducted a study on this issue with the aim of

shedding some light on how the CEFR is perceived and how it can be improved. Their

article “Judgemental Evaluation of the CEFR by stakeholders in language testing”

published in 2015 in Revista de Lingüística y Lenguas Aplicadas collects their findings.

They designed a survey which was discussed and answered by 138 users (teachers,

researchers, publishers, test developers and policy makers) who attended the International

Conference “Language Testing in Europe. Time for a Framework?” held in the University

of Antwerp in 2013. Among their findings, the fact that most of the responders use the

CEFR frequently in their jobs can be highlighted, although they do so in varying degrees

of detail. The results showed that responders make use of the framework for designing

language tests that corresponded to CEFR levels (58.7%), informing of the content of a

teaching syllabus or curriculum (49%) and designing teaching and learning tasks (46%).

In addition, and even though 56% admit that the institution they work for requires them

to use the CEFR, respondents state they use the CEFR because they have read research


studies that convinced them that the CEFR is important (59%). Overall, the perception of

the usefulness of CEFR is rather positive or very positive, but the practicality and degree

of detail do not entirely meet their expectations. Finally, among the recommendations

suggested, the most relevant could be the need for some control over the use that is made

of the framework in real educational settings.

Along the same lines, the article "Is CEFR Really over There?", written by Sibel Çagatay and Fatma Ünveren and published in Procedia. Social and Behavioural Sciences in 2015, deals with the research they conducted to explore English language instructors' knowledge of the CEFR and their perceptions of CEFR curricula. They

surveyed instructors of EFL subjects, 18 from a private university and 36 from a state university, during the academic year 2013-2014. The state instructors claimed to know about the CEFR but were neutral about understanding its contents. Furthermore, 68% had not taken a course or received any training concerning the CEFR and admitted that

they did not have sufficient knowledge about it. Concerning the impact of CEFR on

coursebooks and programmes, 52% did not think that their programme was CEFR-specific and only 36% felt that the CEFR had an impact on the coursebook they used. As

for the private instructors, all of them stated they had received training and were fully

aware of the uses and details of the CEFR. They also agreed on the impact the CEFR had

on their coursebooks and programmes. By way of conclusion, both private and state

instructors considered the CEFR to be useful.

Teachers’ perceptions of the CEFR were also analysed in the primary school context. An

article published in the International Online Journal of Education and Teaching (IOJET)

in 2018 comments on the findings of a comparative study carried out among EFL teachers

working at state and private schools. The researchers, Yilmaz and Ünveren, sent a

questionnaire to 105 school teachers around Turkey. The conclusions of their research


were that the majority of teachers had general knowledge of the CEFR but teachers from

private schools had taken courses or training on it and therefore had a sufficient amount

of knowledge, unlike their counterparts. Moreover, teachers from private schools felt that

the CEFR had an impact on course books, tests and language-teaching techniques, while

state teachers remained undecided on those issues. Yilmaz and Ünveren also conducted

a socio-demographic study and found novice teachers to be more aware of the CEFR’s

impact. Moreover, EFL teachers holding MA or PhD degrees had more knowledge on

the CEFR than those who only held a degree in English.

The use of rubrics is still limited within Spain, but it is progressively more frequent in

textbooks and official certificates. Despite this, there are many questions in relation to

them that need to be studied, although since the education act (LOMCE) was enacted,

more research in connection with rubrics has appeared. In 2015, Carolina Girón García

and Claudia Llopis Moreno, from the University Jaime I, published an article about the

use of rubrics for the assessment of speaking titled “Designing Oral-based Rubrics for

Oral Language Testing with undergraduate Spanish Students in ESP context” in The

Journal of Language Teaching and Learning (JLTL). Their article deals with research

conducted with students at their university which aimed to determine whether the fact

that students could choose their partner in their oral assessment (ideal partner) improved

their results in comparison to performing the same task with an undesired partner. A total

of 10 undergraduates were selected as subjects of the research, all from different

university degrees but taking the subject of scientific English during the academic year

2013/2014. The research began with the design of a questionnaire wherein the subjects had to indicate the classmate they would select as their "ideal partner" and the one with

whom they would prefer not to perform (“undesired partner”). The students’ level was

assessed with a monologue task and a rubric. The rubric was aimed at assessing the


students’ fluency, vocabulary, grammar, pronunciation, coherence and communicative

management. The scale was from 1 to 3 and then an average grade was calculated. After

the analysis of all the questionnaires and the level of the participants, they were paired

for the performance of a dialogue (role-play). Students had to perform the task firstly with

their ideal partner and then with the undesired one. Both performances were assessed with

the same rubric. Once all the data had been analysed, the researchers concluded that 20%

of the participants improved their performance and grade with their ideal partner, whereas

60% obtained a worse grade when they were with their undesired classmate. 20%

of the participants obtained the same result with both partners. Although the study

limitations are clear, the results underscore the need to continue studying the assessment

of communicative skills with rubrics in order to check their effectiveness and improve the

evaluation process.
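As a purely illustrative aside on the averaging described above, the short Python sketch below shows how an overall grade might be computed from per-criterion scores on a 1-3 scale. The criterion names and scores used here are hypothetical and are not taken from Girón García and Llopis Moreno's actual rubric; the sketch is only a minimal illustration of the analytic-to-overall aggregation the article describes.

# Minimal sketch (hypothetical data): averaging the per-criterion scores of an
# analytic rubric that rates each criterion on a 1-3 scale.
def rubric_average(scores, scale_min=1, scale_max=3):
    """Return the mean of the criterion scores after checking they fit the scale."""
    for criterion, score in scores.items():
        if not scale_min <= score <= scale_max:
            raise ValueError(f"{criterion}: score {score} is outside the {scale_min}-{scale_max} scale")
    return sum(scores.values()) / len(scores)

# A hypothetical performance rated on six criteria similar to those mentioned above.
example = {"fluency": 2, "vocabulary": 3, "grammar": 2,
           "pronunciation": 2, "coherence": 3, "communicative management": 2}
print(round(rubric_average(example), 2))  # prints 2.33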

In the article entitled “Assessment rubrics: towards cleared and more replicable design,

research and practice”, appearing in the journal Assessment and evaluation in Higher

Education published in November 2015, Phillip Dawson reflects on the different

meanings of the term “rubric”, from a secret assessment-sheet up to the articulation of the

assessment criteria expected from a written paper. He states that the research on rubrics

has risen considerably in recent years, as evinced by the fact that until 1997 there were

only 106 books whose main topic was rubrics, something which has clearly changed, as

at least 5,000 articles on rubrics were published worldwide up to 2013.

In the article, Dawson warns about the fact that many institutions are forced to use a rubric

when the definition thereof is not yet clear. His concerns stem from the fact that the lack

of agreement on what a rubric really is may lead to incoherence in its application and may

diminish the rubric’s effectivity and reliability. This issue originated research upon the

different ways of categorizing grading scales. One of those criteria might be purpose:


whether they have been designed for the assessment of one task or for the assessment of

a skill independently of the task. Other examples of criteria are whether the rubric

contains examples, or if it is holistic or analytic. However, Dawson claims there are other

good ways of categorising them, such as the way the descriptors are expressed, the

presentation of a rubric, who has designed the grading scale, etc. Finally, Dawson

mentions that some rubrics can also incorporate feedback.

Following Dawson’s line of research, Thikra K. Ghalib and Abdulghani A. Al-Hattami

also undertook research into different types of rubrics. In particular, they focused on

holistic and analytic rubrics as an instrument for the assessment of written tasks. The

results of their research are given in their article “Holistic versus Analytic Evaluation of

EFL writing: A Case Study” published in June 2015 in English Language Teaching

journal. The University of Taiz in Yemen was the location for their case study with 30

students taking the Writing Skills course during the academic year 2014/2015. Their

objective was to find out whether holistic or analytic rubrics could improve the reliability

of the evaluation process and whether there is any correlation between the grades obtained

and the use of one or the other type of rubric. The examiners used two rubrics: a holistic one,

with six points; and an analytic one, with five different criteria to assess cohesion,

vocabulary, syntactic structure, etc. Students had to write a 250-word descriptive text.

The examiners assessed the texts with each of the rubrics, leaving one month between the two rounds of scoring.

The analysis of the data showed that students obtained lower grades with the analytic

rubric but, at the same time, reliability among the examiners’ grades was higher when

using the analytic rubric.
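Since inter-rater reliability is the central point of this comparison, the following sketch illustrates one simple way in which agreement between two examiners could be quantified, namely a Pearson correlation over their scores. The scores shown are hypothetical, and this statistic is not necessarily the one Ghalib and Al-Hattami report; the example only makes the notion of "higher reliability among the examiners' grades" concrete.

# Minimal sketch (hypothetical scores): one way to quantify agreement between two
# examiners is the Pearson correlation of the grades they give to the same scripts.
import math

def pearson(x, y):
    """Pearson correlation between two equally long lists of numeric scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical grades given by two examiners to the same ten writing scripts.
examiner_1 = [4, 5, 3, 6, 4, 5, 2, 6, 3, 4]
examiner_2 = [4, 5, 4, 6, 3, 5, 2, 5, 3, 4]
print(round(pearson(examiner_1, examiner_2), 2))  # values closer to 1 indicate more consistent scoring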

A case study carried out by the Universities of Granada and Vigo (Gallego Arrufat and

Raposo-Rivas) concluded that, after using the rubric throughout a whole term in one

subject, students thought that its use increased their motivation and boosted cooperative


work. They expressed their opinions through a survey based on a Likert scale (211).

Another case study, carried out by Verano-Tacoronte et al., designed a rubric based on the specific literature reviewed. This rubric was validated by a panel of experts and later used to

assess undergraduate students. Students had access to the rubric in order to prepare their

presentations in pairs. They were assessed by a team of teachers using the rubric and the

results showed the rubric’s high reliability, as the scores given by the teachers were very

similar.

On the other hand, some authors have conducted research on rubrics by analysing the

work of other researchers on the issue. For instance, Jonsson and Svingby studied

seventy-five scientific studies on the reliability and validity of rubrics and they concluded

that scoring is more consistent when using them. Moreover, both the reliability and

validity of the assessment process increased when employing them as an assessment tool.

Panadero and Jonsson analysed twenty-one studies and found that rubrics provided

transparency to the assessment, reduced anxiety, aided with feedback, and helped to

improve students’ self-efficacy and self-regulation.

On the contrary, other researchers are more critical of the use of rubrics, although they do not discourage it. Reddy and Andrade are critical of how careless some researchers seem to be with the validity of their rubrics; they criticised the fact that many studies describe neither the rubric's development nor its content validity (cited in Cano 273). Panadero, Alfonso-Tapia and Huertas concluded that rubrics, in spite of improving feedback, do not boost learning by themselves (cited in Cano 270-274).


Another research project conducted by Velasco-Martinez and Tojar-Hurtado and

published in Investigar con y para la sociedad attempted to ascertain to what extent

teachers use rubrics to assess competences. To this end, they analysed 150 different

rubrics used by teachers of different universities in Catalonia, Aragón, Galicia,

Extremadura and Castilla y León. Among the results obtained, it was discovered that the

branch of social and legal sciences is the one in which rubrics are most used (34%), as

opposed to arts and humanities (only 4%). A further finding was that rubrics were

mainly used to assess written essays (36%) and hardly ever used to assess visual or

graphic resources (2.7%), which implies a traditional conceptualisation of knowledge as

something that can be looked up in a book. In addition to these findings, the authors also

provide the educational community with some other interesting data, such as the teaching

methodology the participant teachers applied in their lessons. These data showed that the traditional lecture is still the most widely used methodology (36.7%), while other, more innovative methodologies have no significant presence in those universities; for example, only 1.3% of the respondents used portfolios and 6.7% used case studies with their students (1396-1400).

1.2. Lines of research

In this section, the key lines of research will be stated. English is an extremely broad

subject of research and different areas related to it may be implicated in its study. The

current research connects the areas of Education and English Language with the


implementation of the CEFR. In addition, new methods of assessment, particularly rubrics, and English language proficiency tests7 are also involved in the research.

The communicative approach promoted by the Council of Europe has revolutionised the

teaching and learning methodologies existing prior to the last decade. The consolidation

of the European Union and the social necessity of adaptation to the economic market demands of globalisation have led to a focus on communicative skills. Being fluent in English means one can hold a conversation about any general topic and understand

anybody without much difficulty; thus, it is essential to treat speaking and listening skills

as a priority. In addition to this, the students’ achievement of communicative competence

is the main aim of TEFL. Here, it is relevant to clarify how the concepts of communicative

skills and communicative competence are used throughout this thesis. Communicative

skills refer to those used in oral performance; i.e. listening and speaking. In contrast,

communicative competence involves all the knowledge of a language and the knowledge

of whether it is feasible, appropriate, or done in a particular speech community (Richards

and Schmidt 90-91). The foregoing implies a complete transformation of syllabi, tasks

and teaching, all of which need extensive research. If a meaningful and effective

transformation is to be achieved, all of these new methodologies and instruments must initially

be checked and subsequently improved once they are in force.

The traditional methods of assessment that prevailed were designed to evaluate

traditional methodologies, but they are not suitable for the current communicative

approach. For example, speaking tasks cannot be assessed with either multiple-choice

tasks or with a true or false response, and, obviously, paper-based tests are not an option.

As a result, the need has arisen to apply new assessment methodologies and new

7 Proficiency tests: tests that measure how much of a language someone has learned (Richards and Schmidt 425). In this case, the standardised English Certificates which are under study.


assessment instruments. Portfolios, grading scales, journals and projects are some of the

examples which could be mentioned as new instruments of assessment currently used in

the evaluation process. Nevertheless, a great deal of research still needs to be conducted

in relation to them. Among these new popular instruments for alternative assessment,

rubrics can be highlighted. Although it can be argued that they are not so new, their use

in Spain is still scarce. Hence, this thesis provides an important opportunity for studying

how effective, valid and reliable a rubric can be for assessment, what types of rubrics

exist, and how the reliability and precision thereof can be improved. It is in this line of

research that the present thesis is located.

Another principal area of study is technology, since new devices and new technologies

have an undeniable presence in current society. They have had an impact and influence on almost all aspects of life, including work, relationships, industry, medicine,

etc., and education is no exception. Thus, there is a vast field of research, since those new

resources, applications and programmes can be used in the classroom and even in the

assessment process, by both students and teachers. Foreign language learners can have

access to an unlimited amount of input: reading, listening, speaking and writing samples

can be easily accessed with one click, as well as dictionaries, thesauri and vocabulary

applications. How to adapt all these new possibilities to the education system is a broad

issue to examine and research. It is here that this thesis aims to make a small

contribution through the inclusion of online rubrics and tools for their creation.

Another line of research connected with the learning of foreign languages and the new

educational approach is that of levels of competence. The necessity of learning a new language

and the corresponding need to assess competence in a foreign language are the result of

the aforementioned globalisation. Determining the level of competence which a learner

or user has in a foreign language is not an easy task. That is why the Council of Europe,


having detected this problem, urged research in the field. As a result of this, the CEFR

was developed.

Once the CEFR established the common levels of competence for languages in each skill

and provided guidelines on how to assess them, it was time to implement it across Europe.

Therefore, education systems had to be adapted to the new requirements and,

consequently, certificates to prove level of competence became especially significant.

While there have been a lot of adjustments in order to adapt the framework, there are still

many shortcomings in the adaptation thereof. Indeed, how to correctly implement the

Framework guidelines, how to assess competences and how to teach foreign languages according to them requires further research. This is the line of investigation which has

inspired the current research, as its intention is to ascertain how Certificates of

competence in English have adapted the Framework to their papers together with the new

assessment instruments.

In conclusion, this thesis follows several lines of research which are somehow related and

connected. Generally speaking, these lines are as follows:

- The study of English as a Foreign Language (EFL).

- The modernisation of teaching and assessment methods.

- The implementation of the Common European Framework of Reference (CEFR).

- The assessment of the levels of competence of a foreign language user.

- The tasks which may form a paper for the assessment of each of the skills.

- The use of rubrics as an instrument for assessing skills from the communicative

approach.


1.3. Synopsis

Owing to the foregoing, the current doctoral thesis intends to study rubrics as an

alternative tool for the assessment of EFL and the most popular English Certificates in

our country. With this in mind, the entire work has been organised into different chapters

which deal with important aspects necessary to achieve the final desired results.

After the current introduction section, the thesis starts in chapter two with a theoretical

framework on the assessment of EFL. This literature review encompasses its definition,

different types and possible classifications, dimensions, and importance. The framework

is sub-divided into various sections which deal with different relevant and related aspects.

First, there is a historical review of evaluation, which moves from a global perspective of the phenomenon to a more focused review of it in Spain. Next, the different types

of evaluation are explained, along with the different possible classifications which can be

made. Those schemes help to understand the dimensions of assessment and all the spheres

and agents related to it. This general theoretical framework finishes with the explanation of

different methods and assessment tools for the assessment of EFL, paying special

attention to the alternative ones, in particular, the rubric, which is one of the main subjects

of the following chapters of this work.

An alternative evaluation must be linked to the Learning Standards and undoubtedly to

the CEFR. These will be fully explained in chapter three, which begins with the Common

Reference Levels that are relevant for TEFL, the achievement of the communicative

competence and the assessment thereof, after which the concept of Learning Standards,

meaningful for the understanding of the current education system in Spain based on the

LOMCE, will be reviewed. Assessment is also one of the key issues in this thesis; hence,


the chapter of the CEFR that deals with assessment is also covered. Chapter three concludes with

the rating scales provided by the CEFR and the evaluation of competences.

Chapter four deals with the use of rubrics as a tool to assess the English competence of

foreign language learners. This chapter includes a comprehensive summary of the

literature and resources available. It also focuses on the description and explanation of a

rubric: how it can be defined, the importance of its use, its origin and history, the different

parts it may contain, its advantages and disadvantages and the multiple types of rubric

there are according to different criteria which might be used to classify them. Finally, the

chapter ends with an explanation of how a rubric must be built, and a review of the online

tools available to design them.

Chapter five contains the description of the research methodology. This focuses on the

methodological approach selected and how the research has been designed. In this section

the objectives of the research are explained as well as the selection of English Certificates

which are analysed, the criteria used to determine effective rubrics and the instruments

used to do so. Lastly, the hypotheses of the research are stated.

The research carried out is fully detailed in chapter six. This chapter is sub-divided into

six sections. The first section is the use of rubrics for the assessment of the different skills.

Then, there are four sections which correspond to the four different skills: writing,

speaking, listening and reading. These four sections follow the same structure: first there

is a review of the literature on the use of rubrics in order to assess each of the four different

skills, after which the test for the corresponding skill of each of the English certificates is

analysed. The analysis includes the explanation of the time and tasks used and the criteria

stated for its assessment, as well as a comparison with the tasks and criteria stated in the

CEFR, attempting to detect possible defects, shortcomings or incoherence. The rubric


used for the assessment of each paper (if any) is also analysed, according to the type and

effectiveness thereof.

The research’s chapter finishes with a comparison among all the papers and among all

the rubrics used to examine a particular skill. The comparisons help in the design of exam

papers and in the future development of rubrics for that skill. In addition, we could

contrast which kinds of rubrics are more likely to appear in the assessment of a certain skill and whether their use may or may not lead to the need to change the way a given skill is taught and assessed.

The conclusions of the thesis are found in chapter seven, which deals with the

implications of the research and the findings it has led to, reflecting on which aspects should be improved and corrected. The limitations of the research are also

mentioned in this chapter, which finishes with an explanation of its applicability; i.e., in

which ways this research may be used to make the most of it. Finally, several lines of

future research the current thesis opens are stated, with the hope that more research can

be conducted in relation to them in order to continue improving the assessment of

EFL, the implementation of the CEFR and the use of rubrics.

The final chapter gathers all the bibliography which has been used in the preparation of

the current thesis together with the articles, theses and online resources employed. The

appendices contain the exam papers of each of the certificates analysed and the rubrics

that each of them uses.


Chapter 2: REVIEW OF THE LITERATURE

The assessment of EFL is a matter of international importance, since English has reached the position of the world's principal and most widespread lingua franca. It is therefore not

just the main language used in trade and international communication but also one of the

most taught and studied languages. Its importance is undeniable and many institutions,

both public and private, deal with the granting of certificates, titles and diplomas which

certify and individual’s knowledge and level of proficiency. In many educational and

professional contexts, such as schools, universities and international firms, English is

frequently taught, and most often evaluated.

The determination of one’s linguistic competence is highly complex and arduous, which

complicates the evaluation process even further. Consequently, a great deal of research

has been dedicated to the evaluation of EFL.

Patel defines evaluation as “the process of determining the degree to which the goals of

a programme have been achieved” (3). Nevertheless, this definition is very partial as it

only refers to one type of assessment; i.e. summative assessment. Moreover, it is actually

conceptualising assessment and not evaluation. This example illustrates the inaccuracy

with which the term is commonly employed, owing to the lack of a universal term.

A better definition of assessment might be its conception as “all the activities teachers

use to help students learn and to cause student progress” (Black and William 43).

Concepts as basic as evaluation and assessment are often confused, even by professionals

within the educational world. Following Pérez-Paredes and Rubio’s distinction (cited in

Mclaren et al. 2005), assessment is a “general term we use to refer to the set of procedures

which are put into practice when gathering information about student’s communicative


competence or student’s language performance achievement” (606). On the other hand,

evaluation “considers the teaching and learning program as a whole and seeks to obtain

feedback that can serve different purposes for the different agents in education” (609-

610). As a result, evaluation encompasses assessment, among other aspects such as planning, programming, reporting, etc., whilst assessment refers strictly to testing.

However, throughout this thesis the terms will be used interchangeably for stylistic purposes.

The importance of evaluation is unquestionable. Some authors, such as Dikli, affirm that

it is “one of the crucial components of the instruction” (13). Abbas states that “is an

essential tool for verifying that educational goals have been met” (24).

Nevertheless, this proliferation of definitions is not due solely to the importance thereof,

but also because of the complexity of the term, which makes it possible to distinguish

between three different levels regarding evaluation (Castillo and Cabrerizo):

- technical: evaluation as the process of checking that the system is accomplishing

its functions;

- ideological: evaluation has two functions. One is to legitimise the cultural

inheritance and the other to eliminate what does not belong to the ideological

principles which are being transmitted;

- psychoeducational: evaluation applied to individual/particular students; (6)

In addition, evaluation involves a huge range of dimensions (ibid. 20), including criteria,

evaluators, functions, indicators, methods, objectives, processes, uses and variables which

must be considered.

With regard to the EFL classroom, traditional teaching methods, such as the Grammar-Translation Method, have been gradually replaced by communicative methods. Since 2001, the Council of Europe, through the CEFR, has been further promoting


communicative competence. As a result, the four language skills (speaking, listening,

reading and writing) started to be practised in the classroom, together with communicative strategies and techniques and the sociocultural context, and not just grammar, as was common previously.

2.1. A historical review of evaluation

Evaluation and assessment are not new concepts. In fact, according to Lavigne and Good

“forms of testing can be traced back to Chinese, Greek and Romans” (2). López Bautista

presents Cicero and Augustine of Hippo as examples of people who had already introduced educative approaches. However, it was not until the Middle Ages that university examinations started to become more formal. The tendency would continue in the ensuing centuries. The 18th century saw an increasing demand for access to education. This would lead to an increase in evaluations (1-2), but only in the form of entrance tests. The

education and evaluation processes back then were completely different from how we

currently conceive them, even far from our traditional concept of them. Actually, what

we currently know as traditional schooling was born and developed in the 19th century.

At that time, only memory ability was tested.

It was in the United States of America where the first oral and written forms of evaluation

(as they are now conceived) were found (Garrison cited in Lavigne and Good 2). The

beginning of formalisation of assessment arose from the moment when the first cheating

scandals emerged in the 1820s. It was argued that some students had received easier questions than others in order to manipulate the results (Lavigne and Good 2). This scandal brought about the implementation of certain measures to improve objectivity.


For instance, a number of public schools introduced committees to guarantee fairness

during the examination processes. As López Bautista states, in 1845 Horace Mann designed a performance test so that the schools of Boston could be evaluated. Some years later, from 1887 to 1898, another formal evaluation was carried out in the country. This time it was Joseph Rice who assessed the orthographical knowledge of thousands of

students from all over the country (2). At that time, evaluation was used as a tool of

control, authority and punishment.

It was back then that the questioning around assessment (which has persisted until the present day) started, and when the first doubts and criticisms of assessment appeared. Odell summarised these concerns back in 1928 (cited in Lavigne and Good 3). The main

concerns voiced included the belief that examinations were harmful for the students’

health as they provoked stress and fear. It was also thought that sometimes the contents

were not adjusted to the objectives. In addition, some critics claimed that, because passing

the exams became a goal in itself, examinations encouraged cheating and bluffing.

Another recurrent objection to assessment was the amount of time that was spent doing

examinations, time that could be better used for learning, reviewing, etc. Finally, it was

believed that exams were totally unnecessary since capable teachers should be able to

assess students on a daily basis through observation.

Evaluation continued to evolve throughout the next century. During the early decades,

some intelligence tests were designed, boosted by the findings of Charles Spearman in 1904, which identified a general factor of intelligence. Binet would design the first

intelligence test just one year later and Stern promulgated an intelligence quotient

formula. However, the tendency changed again in 1942, when Ralph W. Tyler defined


the term evaluation as a concept different and separate from measurement (López

Bautista 2-3). For him, the purpose of evaluation was to find out to what extent the

established criteria had been met. From that moment on, the approaches to evaluation

also focused on the syllabus and there was greater concern for the assessment process. In

1967, Scriven distinguished between formative and summative evaluation, as well as

between intrinsic and extrinsic evaluation. One of the major changes was brought about by Piaget's constructivist conception of evaluation. This approach sees the learner as an active subject, able to build his or her own knowledge. It also conceives that all new knowledge is generated from that previously acquired (ibid. 3).

Castillo and Cabrerizo point out seven different stages in the evolution of evaluation

through history (9). In the first stage, assessment was seen as a way of measuring the students to establish differences between them. Assessment subsequently became a measure to check the degree of achievement of the established objectives. Influenced

by the new trends and conceptions emerging in the United States in the 60s, evaluation

was considered as a complex process that affected every aspect of education, and

summative and formative evaluation started to be distinguished. New perspectives were

developed in the following decade and gave rise to reflection on assessment as a process

that should affect the decisions made. Moreover, criteria and educational objectives were

at their peak; thus, evaluation became a normative process. The fifth stage encompassed

a period where different models of evaluation proliferated. Most of them could be

included within qualitative and quantitative paradigms.

Concerning EFL, Liz Hamp-Lyons also locates the origins of formal large-scale

examinations in the US before and after the First World War. As for Britain, foreign

languages were assessed for achievement purposes, just as ancient Greek and Latin had

been examined in the previous centuries. In 1911, the Cambridge University Senate

suggested the creation of a teaching certificate in modern foreign languages. In 1913 the

Certificate of Proficiency in English was developed, prompted by the interest in improving

the relationships with colonies and former colonies. The test consisted of grammar and

translation exercises as well as phonetic transcriptions, essays and pronunciation. (15-17)

Another period in language testing and assessment took place in the late 1950s and early

1960s. John Carroll developed the Foreign Language Aptitude Battery, which was designed to determine to what extent a person would be able to master a language. At this time, two

proficiency tests were also developed in the United States: the Certificate of Proficiency

in English at the University of Michigan and the proficiency test of the American

University Language Centre in Washington D.C., which would lead to the now famous TOEFL. Nevertheless, these proficiency examinations were different from the British one mentioned above, since they were influenced by psychometric advances and made no assumptions about the learners' previous knowledge. The same period brought

significant changes in Britain, too. English was a very strong language owing to

commerce and politics, so universities received thousands of international applications.

Thereupon, it was necessary to determine if a student would be able to study in English

or how much English he or she would need to learn before being able to do so. The

English Proficiency Test Battery and the Test in English-Overseas were the two most

famous examinations at the time. (ibid. 15-16)

The next change occurred in 1979 and was brought about by the appearance of

communicative language teaching. The British Council required a more communicative

test to check proficiency within the academic context. The English Language Testing


Service (ELTS) was created, but it was too expensive, as well as hard to score and to

develop. As a result, it would be replaced by the IELTS, which was more generic as it did

not assess each individual according to the field he or she intended to study. (ibid. 17)

The following period was marked by the establishment of the Common European

Framework of Reference (CEFR), funded by the Council of Europe. The main purpose

was “to overcome the barriers to communication among professionals working in the field

of modern languages arising from the different educational systems in Europe” (Council

of Europe, cited in Hamp-Lyons 18) and “provid[ing] a common basis for the elaboration

language syllabuses, curriculum guidelines, examinations, textbooks, etc. across Europe”

(ibid.18)

Since the 1990s, notable interest in the assessment process has been evident, with the educational community focusing much more on the validity and reliability of assessment and the effect that examinations have on learners. Furthermore, teachers seem to be much more aware of the implications of their tests. Reflection and research on what is taught and the way it is taught, as a direct consequence of the implications and influences of testing, have also been unavoidable. Tsushima argues that "due to epistemological changes

in 2nd language acquisition and the increasing awareness that any language assessment

cannot be separated from the context, culture, and values of where it was created and

used” (106) qualitative approaches are now more common and mixed methods (MM)

have been demanded.

In Spain, assessment also developed gradually, and it was influenced by the changes and

movements which were happening in the other countries mentioned. However, political


issues delayed the changes and Spain seemed to be always one step behind. One of the

most significant changes in the evolution of the Spanish education system occurred when

the Constitution of 1812 was approved. Thereupon, education came to be organised,

financed and controlled by the State and not by the church. Written on the grounds of

freedom and equality, the constitution defended the universality of primary education and

gave the teaching programmes some homogeneity (MECD 1). General Elío's military uprising would subsequently return control to the church until 1820, when the liberals came to power for three years, enacted the Reglamento General de la Instrucción Pública (General Regulation of Public Instruction), and education was free again. The

following years would follow a similar pattern, with alternate periods of absolutism and

liberalism. During the reign of Isabel II, a well-known Spanish law for public instruction

was enacted in the year 1857. This is known as Ley Moyano and it achieved a consensus

between progressives and conservatives. This law established three levels of study:

elementary, general studies (6 years) and post-secondary. It also standardised teaching in

public and private schools, as well as teacher training and the teaching profession. The governance

thereof was divided among the State, the province and the local governments. (ibid. 2)

The next significant period in education occurred during the First Republic, when the

freedom of education was allowed. Nevertheless, it lasted for just one year, as the

Bourbon Restoration enacted a conservative constitution and education became the cause

of multiple fights between the two main political sides. The independence of some

colonies sparked off an internal crisis, which led to some changes being made in

education, such as the regulation of examinations. The Second Republic brought along a unified school system, free, secular and compulsory for everybody, and it regulated bilingualism

and the position of teachers as public servants. The progressive changes did not last long


as the dictatorship in 1939 gave the control over education back to the church, and

education was used to transmit the ideology of the regime. Education became elitist,

teaching methods became archaic and learners were separated by sex. It would not be until the 1950s that the dictatorship started to be somewhat less dogmatic with its regulations

over education. At the end of the regime, in 1970 the General Law of Education (Ley

General de Educación) established four levels of education: preschool, primary school (EGB), secondary school and higher education (ibid. 5-7).

As for the teaching of foreign languages in particular, it was not until the 20th century that the teaching of modern foreign languages was introduced. It was specifically in a Royal Decree of 1900 that the study of foreign languages, such as French, English or German, was considered an educational component (Morales et al. 8). At that moment, French was established as the compulsory foreign language, which had to be studied from the age of 12, whereas English and German were optional in certain academic years (ibid.

19). It was significant, however, that knowledge of French was a requirement to be able

to access Baccalaureate studies.

The introduction of foreign languages was quite brief, though, as only three years later the possibility of studying English or German as an optional subject was removed and the compulsory study of French was limited to the two years prior to Baccalaureate studies. The teaching of foreign languages was then entrusted to the Superior Schools of Trade (ibid. 19). In 1926, a reform drafted under the Plan Calleja slightly improved the situation of foreign language teaching by making the study of French compulsory for three years and offering the possibility of studying English or German for two years (ibid. 21). This reform survived with some modifications until 1938, when a further educational


reform, which increased the number of school hours, was implemented. This reform

included tests which students had to pass at the end of certain academic years and in which they had to prove their knowledge of foreign languages, among other subjects.

An important shift in the teaching of foreign languages happened in 1953, when a new law (Ley de Ordenación de la Enseñanza Media) was passed. The age for starting to learn foreign languages was initially set at 12 and then lowered to 11, albeit spread over more academic years. At the same time, the necessity of using the target language in the class was mentioned for the first time in a written law (ibid. 24-27).

The final major period of education in Spain starts with the Transition to democracy,

when the current Constitution was drafted (1978) and a new law was enacted (Ley

Orgánica del Estatuto de Centros Escolares, LOECE) only two years later. It established

the study of a foreign language for all students in primary education. Furthermore, it was

possible to choose either French or English as the main foreign language to study, and

English quickly became the predominant option. Since then, the different political parties

in power have made changes to the education system. In 1990, a new act, known as the LOGSE, was enacted and introduced a system based on continuous and integrative evaluation. It was grounded in Piaget’s constructivist ideas, according to which students should learn how to learn. The LOGSE also conceived evaluation as a subject in itself and established the assessment not only of students but also of teachers, educational centres, syllabuses, etc. (López Bautista 5). The aim was to use evaluation to obtain information for orientation and improvement. The final stage started with the Organic Law of Education (2006), or LOE


as it is known. From that moment, evaluation in Spain had another cornerstone: the assessment of competences and the implementation of the eight key competences:

- Competence in linguistic communication

- Mathematical competence

- Competence in knowledge of and interaction with the physical world

- Competence in processing information and use of ICT

- Competence in social skills and citizenship

- Cultural and artistic competence

- Learning to learn

- Autonomy and personal initiative

At the same time, competence in linguistic communication was sub-divided into different sub-competences, which included the linguistic (grammar), socio-cultural, logical, sociolinguistic, learning-to-learn and strategic sub-competences.

In 2013, a new government approved a new Organic Law, known as the LOMCE, which aimed to replace the LOE. This new education law also contains key competences, seven to be precise:

- Competence in linguistic communication

- Mathematical competence and science and technology basic competence

- Digital competence

- Learning to learn

- Competence in social skills and citizenship

- Leadership and entrepreneur spirit

- Awareness and cultural expression

The main novelty is the introduction of the so-called Learning Standards, a type of quantitative reference table for assessing to what extent a student has achieved a goal (standard) in the curriculum. Learning Standards have been used in many other countries for the evaluation of school subjects and are closely linked to rubrics or grading scales, as will be explained below in the Learning Standards section.

2.2. Types of evaluation

The complexity of evaluation implies a huge number of evaluation types, which may be categorised in different ways according to the aspect used to catalogue them. Among the different criteria which may be used, evaluation can be classified according to the moment of application, its purpose, its extension and the agent of evaluation (Castillo and Cabrerizo 32-48).

a) According to the moment of application:

❖ Diagnostic evaluation: some tests are used to “expose learner difficulties, gaps in

their knowledge, and skill deficiencies during a course” (Harmer 321).

❖ Placement evaluation: this consists of a test to determine in which level a student

should be placed. It is “usually based on syllabuses and materials the students will

follow and used once their level has been decided on” (ibid. 321).

❖ Initial evaluation: this is the one carried out by the teacher at the beginning of the academic year or at the beginning of a course in order to “know the previously acquired knowledge of their new students” (in McLaren, Madrid and Bueno 609).
❖ Progress evaluation: this kind of evaluation measures how much the student has progressed.
❖ Final evaluation: this is carried out at the end of a course to check the achievement of the objectives.


b) According to its extension

The extension refers to whether the evaluation assesses only one aspect (for instance, the academic result of an exam, the behaviour of the student in class, etc.) or several aspects at the same time (Castillo and Cabrerizo 39).

❖ Global evaluation: this encompasses all the components and dimensions of the learner, the educational centre, the programme, etc. When applied only to students, it measures skills, knowledge, attitude, competences, etc.
❖ Partial evaluation: this focuses solely on the measurement of one aspect or dimension: knowledge, skills, attitude, etc.
❖ Inner evaluation: this type of evaluation refers to the one carried out by the centre, its teachers or administrative staff to examine its internal workings.

c) According to the agent of evaluation (ibid. 40)

This kind of categorisation is based on the person who is in charge of the evaluation. It

allows evaluation to be divided into different kinds:

❖ Self-evaluation: that carried out by teachers when they evaluate their own work,

or by students when they evaluate themselves.

❖ Hetero-evaluation: this is evaluation as most people conceive it: the teacher evaluates the students, or the students evaluate the teacher. It is external when an evaluator from outside the school assesses the students.
❖ Co-evaluation: also known as peer evaluation, this is the kind of evaluation carried out by people who belong to the same level or status. In other words, it refers to the evaluation of teachers by their colleagues or to the evaluation of students by their classmates.


d) According to the scale

Normative evaluation is that which compares the results obtained by a group with the general average of a reference group: for instance, the average of another group at the same level, the centre’s average mark, or a comparison with other centres (ibid. 41-42).
On the other hand, criterial evaluation is based on evaluation criteria specified prior to the assessment and available for the students to consult.

e) According to the purpose

This is probably the best-known criterion. Evaluation may be used for three main purposes: diagnostic, formative and summative.
❖ Diagnostic evaluation has already been explained above (in section a).

❖ Formative evaluation: an evaluation “is considered to be formative if the primary

purpose is to provide information for program improvement” (Fitzpatrick et al.

16). It is related to the evaluation of the process and is used to make curricular

decisions on the content, the way of teaching, the reviewing, etc. It allows the teacher to check, reinforce and regulate learning, and the student to orientate themselves, receive feedback, monitor their progress, etc.

❖ Summative evaluation: this type of evaluation “is usually called final evaluation” and is “considered the main type of evaluation in school settings” (in McLaren, Madrid and Bueno 611). It is carried out at the end of the academic year in order to check the achievements reached. It normally has a penalising function, as it allows the teacher to decide whether or not the learner passes. Fitzpatrick, Sanders and Worthen define summative evaluation as “concerned with providing information to serve decisions or assist in making judgements about program adoption, continuation or expansion” (7).


In order to understand the significance and magnitude of evaluation, it is fundamental to know all the areas and aspects it influences, such as the school programme, the functioning of the educational centre, the teacher’s performance, the student’s learning, the didactic materials, the tools and procedures, the whole education system, the educational community and the evaluation itself (meta-evaluation) (Castillo and Cabrerizo 49).

f) According to the scoring

There are a large number of scales for scoring a test. The most common ones are probably the numerical scales, particularly the scale from one to ten, which is widely used in Spain, where 10 is the maximum grade and 1 is the minimum. Another common numerical scale runs from 1 to 100. Moreover, in some countries, for instance in Germany, the numerical scale is used in the opposite direction, so 1 is the highest score and the last number the lowest. Numerical scales can also be expressed as percentages. The qualitative scale is also widely used in Spain and consists of a verbal scale which scores qualitatively (sobresaliente, notable, bien…). The words can change (excellent, perfect, good, average, needs to improve…) but the idea is the same. Other countries, like the United States or the United Kingdom, use letter scales. Once again, the number of levels may vary. The most typical letter scale ranges from A, which is the top, to F, which is the bottom. Grades can be given by hand, by computer or through distributed scoring. “Distributed scoring is a model of performance scoring in which readers receive training and conduct scoring remotely (e.g., from home) rather than in a regionally-located performance scoring center” (Keng et al. 1).
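By way of illustration only, the correspondence between a numerical mark and the qualitative wording scale mentioned above can be expressed as a simple mapping. The following sketch is hypothetical rather than taken from any of the certificates analysed, and the cut-off points are the ones conventionally used in Spain, which may vary between institutions:

# Hypothetical sketch: mapping a numerical mark on the 1-10 scale to the
# Spanish qualitative labels mentioned above. Cut-offs are conventional
# and may vary between institutions.
def qualitative_label(mark):
    if not 1 <= mark <= 10:
        raise ValueError("mark must be between 1 and 10")
    if mark >= 9:
        return "sobresaliente"
    if mark >= 7:
        return "notable"
    if mark >= 6:
        return "bien"
    if mark >= 5:
        return "suficiente"
    return "insuficiente"

print(qualitative_label(8.5))  # notable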

g) According to the delivery methods employed

In this case, the test can be traditional, using pen and paper; it can be delivered by computer or online; or it can be taken orally through a presentation, interview, exposition, etc.


h) According to the formality

Amita Patel claims that with informal assessment “the judgements are integrated with other tasks” (4), and such judgements are commonly employed to provide students with formative feedback. This type of assessment is not very stressful or threatening for learners. In contrast, formal assessment is that which students know they are taking (ibid.).

i) Divergent/convergent

Divergent assessments are those “in which a range of answers or solutions might be considered correct” (ibid.). While they are more time-consuming, they are often more authentic means of assessing cognitive skills. On the other hand, tests with only one correct response are called convergent, and they are faster and easier to grade.

j) Process/ Product

Process assessment focuses on the steps followed by the students to achieve an ability or

to do a particular essay or task; it measures development. Product assessment scores or

measures only the outcome of a task or test and never the process which was followed.

With regard to assessment approaches in educational centres, Escudero Escorza classifies these into five groups: the first approach focuses on results, the second on the organisation of the centre itself, the third uses mixed criteria, the fourth seeks to assess cultural aspects related to the centre, and the fifth assesses the institution’s ability to transform itself (2).

2.3. Dimensions of assessment

As has been previously stated, assessment and evaluation are complex concepts because they encompass many different dimensions which must be taken into consideration. Castillo and Cabrerizo mention a huge range of dimensions, including criteria, evaluator, functions, indicators, methods, objectives, process, uses and variables, but there are even more.

The learner’s dimension is a good one to start with. Irrespective of whether the method used to assess is traditional or alternative, most of the time assessment involves the students. Assessment is a very important part of the teaching-learning process, whether it is performed for summative purposes or for any other. Assessment is supposed to help students, providing them with feedback and information on their achievement of goals, improvements, weaknesses, level, strengths, study techniques, work, abilities, skills and teamwork. This is the reason why they should always be taken into account when the assessment is being chosen or the evaluation is being planned. The number of students in the class, their possibly different levels or the type of learners they are, are some of the things which should concern the teacher. It is highly recommendable to allow them to take part in the process in some way, so that they will fully understand what they are facing and what is expected of them.

The evaluator is another essential and indispensable dimension of the process. Scoring is

probably the most difficult part of the teaching process and it requires a great deal of

capacity, ability and objectivity from the scorer. Furthermore, as will be explained later

in greater detail, teachers should have some language assessment literacy in order to be

good at assessing.

The criteria are another dimension of the assessment process. Besides being clearly defined, the criteria must be matched by a suitable test or assessment task and by a suitable assessment tool. It is not just a case of referring to the particular criteria for a task, but also to those for the whole academic year, as some criteria and standards are regulated by the curriculum or the syllabus. The criteria must be truly useful for assessing a particular skill or ability of the student.

The school is another dimension related to assessment. Some schools have special regulations to standardise examinations. In addition, evaluation grades might be used to compile large-scale statistics which classify the educational centre where they were awarded; for example, academic failure reports or rankings of the schools with the best average results in a particular skill, subject or course.

It should not be forgotten that parents are also part of the educational community. They should also be involved in their children’s education and sometimes, for better or worse, grades are the most important indicator for them.

The assessment task or method is obviously a dimension of the process. There are hundreds of different tasks for assessing students, but the decision must be taken with a view to the type of task, the context, how fast, reliable and effective the scoring is, and so on.
The assessment instrument, if one is used, may also be quite significant. Rubrics, checklists, interviews or reports might have a great influence on the assessment.

2.4. Importance and consequences of evaluation

Carrillo and Unigarro state that “it is not possible to consider assessment separately from

the teaching and learning processes” (36). According to Ahmadi Fatalaki, assessment allows the teacher to identify successful instruction, and it also affects the learning engagement of the students, since teachers become more aware of the learners’ level of proficiency (78). Assessment determines what is taught and how (Carrillo and Unigarro).


In today’s society, a society of information and knowledge in which students must be the principal actors in the learning process, it is necessary to teach them how to learn, to engage themselves in the process and to boost critical thinking, creativity, teamwork, initiative and reflection (Herrero et al.). The European Higher Education Area (EHEA)

has promoted formative education along with methodologies which help learners to

develop both academic and professional competences and to turn the student into an

active subject.

Moreover, during the last decade there has been an increasing awareness that assessment “cannot be separated from the context, culture, and values of where it was created and used” and, as a consequence, “hermeneutic and qualitative approaches have been adopted” (Tsushima 106). Heidari, Ketabi and Zonoobi explored the role of culture in the different language teaching methods from ancient times to the present day. They found that, because of globalisation, culture plays a much more significant role in modern teaching methods such as the Communicative Approach, Task-Based Language Teaching, Content-Based Language Instruction and the Intercultural Competence approach (para. 22).

Yamtim and Wongwanich place assessment among the five different components which “contribute to determining the quality of instruction”, together with students, teachers, resources and context (cited in Herrera and Macías 303). After all, the evaluative system provides the community with information on educational quality; it also contributes information for research, and it allows the most urgent areas for intervention to be identified. Furthermore, it allows the impact of specific programmes and political policies to be verified. Syllabi and curricula can also be improved on the basis of assessment results, as these can help in the incorporation of quality standards. Evaluation results may also be used to select students for access to certain programmes, such as university degrees. Finally, assessment measures the mastery of a competence (Carrillo and Unigarro 33), which is useful in determining the learner’s level, diagnosing where possible problems lie and checking the progress and goals achieved.

Popham devotes an entire chapter of his book, Mastering Assessment, to pointing out how assessment can help teaching. Among the reasons he gives, one could highlight how high-stakes tests (tests which involve important decisions) may determine whether or not a student is promoted. High-stakes tests also help to determine whether the instructional practice was good or not. Mastering Assessment also explains how pre-assessing students can save the teacher time, since topics which learners already know can be omitted. Additionally, the assessment of students during the teaching process allows formative assessment and progress-monitoring. Moreover, the teacher can determine whether or not he or she will need to explain the topic in a different way. However, Popham suggests not giving too much weight to these “en-route” assessments, so that students do not become intimidated. With regard to post-assessment, the most important piece of advice which stems from the book is that, rather than “dispensing student’s grades” (16), these assessments should answer the question: “How effective was my teaching?” (15).

Assessment is important and significant not only owing to its benefits and the positive influence it can have on teaching practice, but also because of the negative effects and consequences it might lead to. Carrillo and Unigarro identify three important factors in language learning: anxiety, motivation and self-confidence (21). A high level of the last two is essential for successful learning, while a low level of anxiety eases learning. As a result, the high level of anxiety that students commonly experience nowadays during language lessons hinders their learning. Horwitz points out three main sources of this anxiety: communicative apprehension, fear of a negative evaluation and exam anxiety (cited in ibid. 54).

The fear of a negative evaluation has a strong impact on the learner’s test performance.

That is because “there is abundant evidence that demonstrates that individuals’

intellectual performance is undermined in situations that remind them that they are

stereotyped, which is causing them to underperform” (Schmader cited in Ewing 7).

Furthermore, performance anxiety can lead some students to school refusal. According to research conducted in the United States, up to 5% of students may refuse to go to school because of anxiety or depression (Wimmer 1). These feelings of anxiety or depression are generally generated by the assessment and evaluative process.

Anxiety may be transmitted to students during the teaching process. Sometimes, teachers “may communicate the consequences of a test performance in a very different manner” (Von der Embse, Schultz and Draughn 622). For instance, they may use test results to “threaten students” in an attempt to motivate or encourage them to study and prepare well for the exam if they do not want to fail. “Fear appeals refer to messages that repeatedly remind students about the importance of passing exams and the consequences of failure” (ibid.). Those messages are usually reinforced by the ones many learners receive from their parents at home and by the students’ own self-imposed demands to pass the course, to enter the desired university or to obtain high marks.


The level of anxiety can be measured with several scales. One of the most important is the Foreign Language Classroom Anxiety Scale (FLCAS), which consists of 33 items rated on a Likert scale from 1 to 5. Another is the Cognitive Anxiety Scale, which has 27 items and also uses a Likert scale, this time from 1 to 4. It is advisable to check the level of anxiety of a class in order to take suitable measures to reduce it.
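Purely as an illustration of how the responses to such questionnaires are totalled, the following sketch sums the ratings of a Likert-type anxiety scale; the number of items, the reverse-scored items and the example answers are hypothetical placeholders rather than the actual content of the FLCAS:

# Hypothetical sketch of scoring a Likert-type anxiety questionnaire such as
# those described above. Item wording, reverse-scored items and cut-offs are
# illustrative placeholders only.
def likert_total(responses, scale_max=5, reverse_items=()):
    """Sum 1..scale_max ratings, reverse-scoring the items listed (0-based indices)."""
    total = 0
    for i, rating in enumerate(responses):
        if not 1 <= rating <= scale_max:
            raise ValueError(f"item {i} has an out-of-range rating: {rating}")
        # Positively worded items are flipped so that a higher total always means more anxiety.
        total += (scale_max + 1 - rating) if i in reverse_items else rating
    return total

# Example: a 33-item questionnaire rated 1-5, with items 1 and 4 (0-based) reverse-scored.
answers = [3] * 33
print(likert_total(answers, scale_max=5, reverse_items={1, 4}))  # 99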

Both the positive and negative consequences derived from assessment or the evaluation

process make it so significant and important.

2.5. Traditional Evaluation

What traditional evaluation really means may differ from one author to another. It is obvious that not everybody has the same conceptualisation of what is traditional and what is not. Moreover, the term itself is very broad. For the purposes of the current work, traditional evaluation corresponds to all the evaluation methods, techniques and

tools which are used by the teacher to carry out a final and summative evaluation at the

end of the course or term. This evaluation is only quantitative because it merely aims to

give a mark to students in order to establish how well they have achieved the learning of

the concepts from the syllabus. There is no intention of providing students with qualitative information about their learning, about what they can do to improve, or about how to learn better. It does not aim to give information about their strengths and weaknesses, or even try to measure their “real” performance, competence and abilities or skills, but only the memorised knowledge that they have acquired. When this kind of evaluation takes

place, there is normally nothing else to do, as with this evaluation the student has already

passed or failed the subject/term/course.


Dikli maintains that traditional assessment tools are usually “multiple-choice test,

true/false test, short answers and essays” (13). Multiple choice tests consist of different

questions or unfinished statements and various answers from among which the learner

has to choose. They are commonly used in high schools as they are “fast, easy and

economical to score” as well as “objective” (13). However, there are also many problems.

For example, “they are extremely difficult to write well, and the distractors may actually

put ideas into student’s heads that they did not have before they read them” (14). In

addition, it is possible to pass a multiple-choice test without having a really good

competence in writing or speaking the language. True/false tests require students to “make a decision and find out which of the two potential responses is true” (14). However, it may be difficult to determine whether the student really knows the correct answer, as he/she has a 50% chance of success (13-14). Short-answer tests try to obtain a brief written answer to a question. The main problem is that the questions are normally unrelated to one another and open to interpretation.

Some other common tools for summative assessment, especially in the teaching of foreign languages, are cloze procedures. Cloze tests are based on the “deletion of every nth word in a text (somewhere between every 5th or 9th word)” (Harmer 323). As a result, the choice of deleted words is effectively arbitrary and all kinds of words may be deleted. The main disadvantage of cloze procedures is that the learner “depends on the particular words that are deleted, rather than on any general English knowledge” (324).
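The mechanical nature of the procedure can be illustrated with a brief hypothetical sketch, not drawn from Harmer or from any of the certificates analysed, in which every nth word of a passage is replaced by a gap regardless of the kind of word it happens to be:

# Hypothetical sketch of the cloze procedure described above: every nth word
# is replaced by a blank, whatever kind of word it happens to be.
def make_cloze(text, n=7):
    """Return the gapped text and the answer key (the deleted words, in order)."""
    words = text.split()
    answers = []
    for i in range(n - 1, len(words), n):   # positions n, 2n, 3n, ... (1-based)
        answers.append(words[i])
        words[i] = "_____"
    return " ".join(words), answers

passage = ("The main disadvantage of this kind of test is that the learner depends "
           "on the particular words that happen to be deleted from the passage.")
gapped_text, answer_key = make_cloze(passage, n=7)
print(gapped_text)
print(answer_key)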


Summative and quantitative evaluation is still the most usual and common type of evaluation in Spain’s education system, even though teaching and education have recently undergone a significant evolution.

Castillo and Cabrerizo (427) have collected some of the disadvantages of traditional

evaluation:

❖ Only concepts are assessed: the quantitative marks given to students measure only their memorisation ability.
❖ Academic failure. The high and increasing academic failure rates are closely related to the teaching as well as to the assessment method. Traditional evaluation does not measure the real written and spoken competence of a learner in real and authentic situations, which leads to a lack of motivation in students. In the era of new technologies and communications, it is completely understandable that students have no interest in traditional exams or tasks. In addition, it has been argued that there are multiple intelligences and also multiple types of learners, which traditional evaluation does not take into account.

❖ The student is the only one being evaluated. This conceptualisation is completely old-fashioned. Within the educational community, the teacher should also be assessed (as should the syllabus, the evaluation programme, etc.) in order to improve the quality of the system. Traditional evaluation does not serve this purpose.

❖ Only the results are measured. In contrast to traditional evaluation, new and alternative forms of evaluation recognise the importance of evaluating not just the results but also the process, the rhythm, the effort, the strategies the learner is able to use, the methodology, the progress, etc.


❖ Remedial exams are merely repetitions of previous exams. Some authors criticise

the fact that a remedial exam only entails doing a very similar exam on a different

day instead of providing the student with new strategies to face the difficulties.

❖ Traditional evaluation focuses on what the learners do not know or on their

mistakes. This is a very negative perspective which does not motivate students at

all. Evaluation should be focused on what the students are able to do in order to

stimulate them.

❖ Evaluation is merely a score. As has already been explained, the way in which

evaluation is carried out nowadays is reduced to a number and it does not contain

any qualitative comment.

❖ The exam is the only assessment tool. Students have to gamble everything on one exam. What matters is not how they have performed during the entire course or what they have shown they know throughout the classes, but only what they are able to do in a traditional assessment on one day in a highly stressful situation.
❖ Traditional evaluation does not consider self-evaluation, whose benefits have been amply demonstrated.

❖ Traditional evaluation is normally used as a measure of repression or punishment

instead of being an instrument for improving learning.

Brooks states that “it has long been recognized that what teachers teach and the ways in which they teach are heavily influenced by particular types of assessment” (16). Pidgeon had already expounded (1992) how in many classes “children were ricocheting from one unrelated activity to another to ensure all attainment targets were covered”. This has meant that children do not learn in any “meaningful way” (cited in Brooks 17).


Besides some of the consequences already mentioned, Brooks also points out that traditional evaluation discourages skills such as critical reflection and speculation, as well as learning for personal improvement. Lavigne and Good claim that “teacher criticism was consistently shown to have a negative effect on student learning” (47). Zainab Abbas has also reflected on the effects of evaluation and assessment, and she states that “experience indicates that the process of evaluation has been misused by the majority of EFL instructors” (4) and that they have replaced it with a “monthly or regular selected responses test”. She also points out, as many other authors and researchers have found, that exams and tests are very stressful and harmful for students, and this fact does not benefit overall language proficiency.

2.6. Alternative evaluation

The teaching of EFL is no longer what it was two decades ago. The establishment of the CEFR and the promotion of communicative competence have driven reforms in the subject. The communicative approach is now strongly encouraged in lessons, so the main working language is English and not Spanish. The different skills (speaking, listening, reading and writing) are now put into practice alongside grammar.
However, the truth is that grammar normally still carries the greatest weight in the student’s mark, and the assessment tools used for the evaluation are, in most schools and high schools, very traditional. Despite the presence of formative evaluation in the classroom, it is not commonly applied. What most teachers do as “formative evaluation” is actually just to reserve 10% of the global quantitative mark to measure (also quantitatively) the attendance, attitude, work and homework of the student during the year. As a result, very important aspects of the evaluation, such as the progress made by the learners, their daily work and effort, or their real performance in a non-threatening and non-stressful situation, are reduced to 10% of their final mark.

By contrast, there is another type of evaluation already mentioned: formative evaluation.

Nevertheless, it is important to highlight, once again, that the kind of methodologies that

consider “periodical intermediate exams” as formative evaluation are not the ones this

essay refers to (Popham 17). Formative evaluation has been shown to be more effective than summative evaluation. Researchers such as Paul Black and Dylan Wiliam have conducted a great deal of research on formative evaluation, its implications and its results. One of their studies on formative evaluation was an assessment of formative evaluation itself. Their report was published in the journal Assessment in Education and was based on a wide variety of reports on the topic. These reports had been written by other researchers from different countries and were based on experience with pupils ranging from 5-year-olds to university students. According to the report written by Black and Wiliam, their research showed conclusively that formative evaluation improves learning. Numerous reviewers of this meta-analysis carried out by Black and Wiliam in 1998 conclude that the methodological rigour used and the quality of their judgements and conclusions endorse the reliability of the outcomes of the report (qtd. in Popham 23-26).

Once the benefits of implementing formative evaluation have been described, it seems

clear that formative and qualitative evaluation is more recommendable than summative

evaluation or, at least, that a combination of the two should be implemented in a fairer

and more equitable manner.


Notwithstanding that formative evaluation already exists today, the significant and immense changes our society has been experiencing in recent years make it necessary to go a little beyond that. Formative evaluation is required, but the methods, techniques and tools which can be used to implement it need to keep in step with the evolution and the advances society is undergoing. For this reason, it seems necessary to turn formative evaluation into a trans-formative evaluation and, in order to do so, it seems essential to provide teachers with a new or renewed variety of alternative methods, techniques and tools which will enable them to design and programme a new and genuinely alternative evaluation.

First of all, as was done with evaluation in the previous part, it is essential to start by

giving a definition of the concept “alternative assessment and evaluation”. Despite the

fact that it is a common concept in the current educational community, it is difficult to find an appropriate definition which captures the entire essence of what an alternative method of assessment or evaluation implies. In fact, the concept of alternative testing

was used to describe all those activities which were not formal tests but which could be

used for assessing learning performance, as alternatives to the traditional methods of

evaluating language.

Valencia and Pearson have created a quite similar conceptualisation of the term:

“alternative assessment consists of all of those efforts that do not adhere to the traditional

criteria of standardisation, cost-effectiveness, objectivity and machine-scorability” (cited

in Abbas 27). A much simpler definition of the term is that published by Glencoe/McGraw-Hill, which understands it as any “alternative to traditional paper-and-pencil tests” (para. 2).

Other authors consider alternative assessment to be the evaluation of what students can do instead of what they are able to recall or reproduce. It would be the checking of what students integrate and produce (Abbas 27).

The above definitions and explanations point out the complexity of the concept due to the

wide spectrum of manifold methods or techniques which could be encompassed under

this denomination. In her article on alternative assessment, Willa Louw mentions the view

of two authors, Mary and Combs, on the theme (23). Although the works of these authors

were published in 1997, their conceptualisation has been able to integrate the complexity

of the concept and is still relevant for this reason. They both conceive alternative

assessment as an amalgam of cognitive, demonstrative and affective methods carried out

in order to evaluate the students. According to this, alternative assessment implies not

simply assessing how the knowledge could be applied to the real world in authentic tasks

or assessing the performance of skills such as demonstration and simulation, but also

assessing the attitudes and values of the students.

However, whether everything which is not a written “traditional” test should be considered alternative assessment could be widely discussed. Is a non-written examination sufficient to be considered alternative assessment? What could be labelled as “alternative”? Nowadays, technological improvements and advances, as well as the changes in the world, are so relevant that what could be regarded as alternative ten years ago would now be far from what we understand as an alternative to the traditional, particularly taking into account the characteristics of Secondary Education students. These pupils were born in the Internet era, with a computer, a smartphone and a tablet at home, and they have been able to manage all of these devices from a very early age. Commonly referred to as “digital natives”, they are distant from the previous generations. The generation of students who witnessed the introduction of digital screens or projectors in schools and who owned a mobile phone from the age of 10 or 11 might have been somewhat impressed by the changes applied during those years but, in order to engage digital natives, we need more than a projector in the classroom, and other alternative methodologies and methods should be implemented. The evolution from the printed book to the book projected onto the screen could work for a couple of years with past generations, but it would be considered boring and old-fashioned by the new ones. Younger students would define these methodologies as traditional rather than alternative, as would be the case with most erstwhile alternative assessment methods.

2.7. Assessment for Learning

Kay Sambell, Liz McDowell and Catherine Montgomery, lecturers at Northumbria University in the United Kingdom, devote an entire volume to outlining what assessment for learning is. Despite the fact that they focus on higher education, the truth is that most of the concepts may also be applied to secondary education. Assessment is commonly thought of as a form of testing what learners know, what they can do, and which grade is associated with it. However, Assessment for Learning (hereinafter, AfL) is “the principle that all assessment, within the overall package, should contribute to helping students to learn and to succeed” (3). As a result, assessment for learning is neither summative nor


formative, nor any other single type, but a balanced combination of different types with the main purpose of improving students’ learning.
AfL is based on six principles. It must be rich in formal as well as informal feedback; emphasise authentic and complex assessment tasks; develop students’ abilities to assess their own progress; enable them to direct their own learning; offer extensive confidence-building opportunities and practice; and have a suitable balance of summative and formative assessment (5).

Active learners involved in their own learning need feedback in order to evaluate their own progress and improve their learning. For this reason, teachers’ comments, self-review logs, interviews and so forth are recommendable. Peer review of drafts, for

instance, is a suitable practice for providing students with informal feedback.

Collaborative approaches may help learners to learn together, revising, discussing,

sharing ideas, understanding different points of view or learning about new methods that

can be very helpful for them. It is also fundamental to allow students to practice or

rehearse so that they can improve, gain self-confidence or correct possible mistakes.

These opportunities can be generated by teachers during the teaching process. The

dominance of summative assessment should be reduced and balanced through a much more qualitative assessment if AfL is the principal target of the whole process. Finally, it is indispensable to design authentic tasks which address what is really important for the students to know instead of what is easier to score (6-7).

There is a huge range of practices that might be introduced in the class in order to assess for learning, all of which involve real-world contexts or practices beyond the academy (ibid. 13), with students being able to detect problems, solve them and work cooperatively. Consequently, teaching approaches must focus on independent thinking, problem-solving ability, originality and teamwork skills. Learners should have explained to them what they


are doing, why, and what the purpose is, so that they can understand the value and the

significance of what they are learning.

Besides understanding the relevance of assessments, it is important that students

see the link to the real world and develop personal engagement. In Trowler and Trowler’s

words (cited in ibid.) “individual student engagement in educationally purposive

activities leads to more favourable educational outcomes” (15).

With regard to TEFL, Christopher J. Hall warns that there is “a tendency to identify the language uniquely with a single ’standard‘ variety”, which means associating native-speaker English with the only valid variety of English and, as a result, with “the sole reference point for assessing the adequacy of non-native speaker forms” (378). If an authentic assessment is desired, Standard English cannot be the only accurate language allowed.

2.8. Instruments

Alternative evaluation encompasses a colossal and endless list of methods and techniques

which would be impossible to expound entirely. In the present work, an extensive range

of examples will be described.

It is essential to clarify that most alternative evaluation methods are grounded in formative evaluation. Some authors state that alternative assessment must be applied continuously in the class because classroom-based assessment immediately informs teachers and students, as well as parents, of student performance on an ongoing basis (Janisch et al. 222). Learning a foreign language implies being able to produce and to know the language all the time and not just in a specific exam. For this reason, students must be assessed in their day-to-day use, in order to correctly measure how they use the language in performance and authentic tasks. Dikli claims that “authentic assessment aims to relate the instruction to the real-world experience of the learners” (14). Along the same lines, Wangsatorntanakhun (cited in Dikli) states that the term performance-based assessment embraces both alternative and authentic assessment.

2.8.1. Portfolio

According to Nigel Miller, the portfolio “contains great potential for developing and demonstrating transferable skills as an ongoing process throughout the degree programme” (8). The quantity of research, works and articles written on the portfolio and its countless advantages and benefits is vast.

Paulson, Paulson, and Meyer (cited in Dikli) define portfolios as “a purposeful collection

of student work that exhibits the student’s efforts, progress, and achievements in one or

more areas” (14). The portfolio normally includes the entire body of work from the

student or a selection of the best pieces and any kind of reflection on it. This reflection is

based on the selection of works by the students and the reason why they have been chosen

or based on what the student has or has not learned through the use of any self-evaluation

tool. Within the current technological era, this alternative method of assessment could be improved by taking advantage of the new technological tools. The internet provides the student with numerous applications or websites specialised in creating online portfolios, for instance, Eufolio. However, other tools and applications available on the

Internet could be used with an identical purpose. Students could develop all their

creativity in order to design and construct their own portfolios by building them on a

webquest or a wiki, or by utilising applications such as Lino-it, Wall wisher, Smore or

Kerpoof, among many others.


Dikli states that “the practicality of e-portfolio use is highly dependent on instructor’s as well as learner’s knowledge of computer technology. The variety of information that could be included to e-portfolios is infinite” (15).

Among the great number of advantages of portfolios, the following ones could be

highlighted (Miller):

❖ Portfolios can be useful for students with work experience to claim credit for tasks

done at the workplace and to tailor work tasks in a way that promotes learning and

development. They can also be useful as a basis for interviews and promotion.

❖ A portfolio is usually a collection of work developed over time and may help the

student to think about what is being achieved in an ongoing way.

❖ Students have a degree of control over what goes into the portfolio.

❖ As evidence of a student’s achievements, a portfolio can foster confidence and a

sense of achievement.

❖ The process can foster dialogue with tutors and assessors.

As for the relationship between the portfolio and the evaluation process, a number of authors state that, through a portfolio, teachers can assess the progress of students and discover the process they have followed. It also allows assessment in a formative way, rather than through a final exam, while students can become more involved in the process. Finally, it shows the abilities they like the most instead of the ones chosen by the teacher (Castillo and Cabrerizo 218).
In addition, through portfolios students are engaged in self-evaluation and set goals for their learning. They are no longer “defenceless vessels waiting to be filled with facts” (Wasserstain cited in Janisch et al. 14). Instead, they are “masters of their own learning and sense making” (Graves 2002, cited in Janisch et al. 221).


Although a universal and unified model of portfolio does not exist, except for the one proposed by the Council of Europe, which will be explained below, Castillo and Cabrerizo (2010) recommend that the following aspects always be included: the subject’s aims, competences which must be acquired, general scheme of work, daily planning, register of key experiences, evaluation criteria and guidelines, self-assessment guidelines and conclusions (219).

On the other hand, portfolios might also entail some difficulties. Janisch, Liu and Akrofi discuss some of them: the principal one is the lack of motivation, self-initiative and self-reliance of the students, who are used to taking a passive role in learning, and it is sometimes difficult to get them to be more responsive and participatory (227).

Other disadvantages of using portfolios could be the following:

❖ Requiring extra time to plan an assessment system and conduct the assessment.

❖ Gathering all of the necessary data and work samples can make portfolios bulky

and difficult to manage.

❖ Developing a systematic and deliberate management system is difficult, but this

step is necessary in order to make portfolios more than a random collection of

student work.

❖ Scoring portfolios involves the extensive use of subjective evaluation procedures

such as rating scales and professional judgment, and this limits reliability.

❖ Scheduling individual portfolio conferences is difficult, and the length of each conference may interfere with other instructional activities (Venn 538, cited in Patel, P. 2).

2.8.1.1. European Portfolio of Languages

The European Portfolio of Languages (EPL) is the result of many projects and studies

developed by the Council of Europe in several EU countries aimed at setting up the basis

86

and common features for a standard portfolio valid in the whole territory. According to Little et al., the

EPL “summarizes the owner’s linguistic identity and his or her experience of

learning and using second/foreign languages; it also provides space for the owner

periodically to record his or her self-assessment of overall second/foreign

language proficiency.” (7)

This portfolio consists of three obligatory parts: the Language Passport, the Language

Biography and the Dossier.

The European Portfolio was designed mainly to make students aware of their own learning process, to promote their autonomy and to provide evidence of the learner’s competence, skills and ability in the target language (Little et al. 8).

The European Portfolio of Languages is closely related to the CEFR. As a result, the EPL uses the framework’s level scales for self-assessment. Moreover, the grid of levels could be used by teachers who wish to assess the level of their students in the different skills in order to mark the portfolio, if they want to implement its use in the evaluation process.

After all, “the ELP should be seen as a means of bringing the concerns, perspectives and

emphases of the CEFR down to the level of the learner in the language classroom” (Little

et al. 10).

2.8.2. Oral presentations

Oral presentations are another frequent alternative method of assessment. Fortunately, the communicative approach promoted by the European Union has brought oral presentations into English lessons. Oral presentations often consist of a speech made by the student on a topic, with or without additional aids such as presentation software. Students are normally required to prepare their speeches in advance, and that commonly includes researching a certain theme and being able to produce a communicative speech.


Ryan and Cook point out the following reasons for introducing this method in the

assessment process:

❖ Testing cognitive skills

❖ Allowing students to demonstrate their ability to generate and synthesise new

ideas

❖ Giving them the opportunity to demonstrate what they have learnt in an analytical

way

❖ Turning the tertiary classroom into an active learning environment

❖ Giving students the chance to learn from their peers and to share their knowledge

with them. (para. 2)

They also stress that students “need to use the task as a valuable opportunity to develop vital skills” (para. 5), and they emphasise that the practice gained in oral presentations enables students to understand, process and explain key topics, which will be tremendously valuable for their future careers and professions.

Slagell (n.d.) warns about the disadvantages: it may require external judges and much

time to evaluate the effect; it is a threatening situation that increases student anxiety and

may interfere with learning. Moreover, it involves complicated ethical issues and the

feedback is often limited; it attempts to divide delivery from content; it offers a false sense

of “objectivity” and suggests that effective speaking is a checklist of universal and distinct

specific behaviours. Finally, peer assessment is complicated; some peers are poor

listeners or rude respondents, there are temporal and administrative challenges, and

videotaping can heighten anxiety.

2.8.3. Journals

Journals are entries written by students in which they reflect on what they have done and learned in the class or in their assignments. Journals can be written on a daily, weekly or monthly basis, at the end of the lesson, or at home. On the Internet, a huge number of tools can be found to connect ICT with the lessons.

Richards and Renandya (cited in Abbas 32) state that journal keeping, being informal in nature, enables a student to gain extensive writing practice. The main advantages of journals are:
❖ It can be enjoyable, since it gives the student free rein to write on any topic on the spur of the moment.

❖ It offers students the privacy, freedom and safety to experiment and develop as a

writer.

❖ It contributes greatly to the humanistic approach to teaching and learning, an

example of which is the integration of values during the sharing sessions.

Lack of motivation would possibly be the most significant disadvantage of this method.

Students will not be particularly thrilled by this method and they may even consider it

boring and annoying. Moreover, it is not really an authentic task, and it only allows the

teacher to evaluate the writing skill. It could, however, be combined with another method.

2.8.4. Projects

Project-Based Learning, known as PBL, is a method that is currently in vogue within the educational community. It consists of a complex task or question that students must accomplish through a designed product. Students must work in teams in a cooperative way and need to plan the project, organise, research, discuss, negotiate and share the results (Moss and Van Duzer 3). Gökhan Bas (2008) states that “the need for education to adapt to a changing world is a primary reason that project-based learning is increasingly popular” (2). Fried-Booth notes that PBL creates a bridge between the English spoken in the class and real-life English, and that it places students in situations which require the use of the language in order to communicate, as well as the establishment of a trusting relationship among the team members (cited in Moss and Van Duzer). Information gap activities, learn-to-learn activities, role playing, interviews, research and planning are only some of the key words related to PBL (Gökhan 5).

In summary, it can be highlighted that Project-Based Learning is characterised by the following principles (Moss and Van Duzer):
❖ Builds on previous work and integrates speaking, listening, reading and writing skills.
❖ Incorporates collaborative teamwork, problem solving and negotiating.
❖ Requires learners to engage in independent work.
❖ Challenges learners to use English outside the class.
❖ Involves students in the planning process.
❖ Leads to clear outcomes.
❖ Incorporates self-evaluation, peer evaluation and teacher evaluation.

Other advantages that could be mentioned are that PBL allows all students to use the

learning method they prefer according to the type of intelligence they have. For instance,

a visual learner could create a poster to show the results, and a kinaesthetic learner might

prefer role-play or to give a presentation in order to share her/his results. PBL also

integrates all the curriculum areas and the social and cultural context easily and it is

adaptable to the skill levels of each of the students. In addition, learners develop good

learning habits and responsibilities and learn the contents (“know”) and how to use them

(“do”) at the same time.

Nevertheless, there are also some difficulties which may be considered as disadvantages

by some teachers, and that is the main reason why they do not introduce the method as

much as they should. These would include the following (Pozuelos, Rodríguez and Travé

14-15):


❖ The amount of workload that starting a project entails.

❖ Lack of resources and materials for developing the project.

❖ Rigid school timetables divided by subjects.

❖ Lack of practice or examples of PBL in the teacher’s environment.

❖ Climate of uncertainty the first time a project is carried out.

❖ Critiques made by the parents or teachers of the school because of the innovative

character of the method.

❖ The apparent inability to cover all the contents established by the curriculum.

2.8.5. Interviews

Abbas observes that interviews “are one-on-one sessions between the learner and the instructor” (31). Richards and Renandya (qtd. in ibid. 31) state that conferencing is an effective means of teacher response to student writing. It is a form of oral teacher feedback. A short 10 to 15-minute conference will enable the teacher to ask students about certain parts of the letter writing which are problematic. In fact, Fitzpatrick et al. comment that “helpful as they are, written documents cannot provide a complete or adequate basis

for describing the object of evaluation” (207). However, it is essential to highlight that

interviews are not ordinary conversations but conversations with very concrete pedagogical purposes. These purposes may be, as already mentioned above, to clarify parts of a piece of writing or to obtain feedback from the student on the subject, the teaching, the contents, etc., but they could also be to assess the student’s competence. In EFL, interviews can be extremely helpful in determining a learner’s speaking competence, but also in checking a certain grammar point or the specific vocabulary of a topic, in assessing the listening comprehension of a recording, or in determining whether the student has understood a reading.


Castillo and Cabrerizo (376) state that interviews have three functions: diagnostic (the goal of which is to identify problems or level), collection of information (about the student’s interests) and therapeutic (in order to put strategies of intervention into practice).

They also point out the advantages of the interview (ibid. 377):

- It allows the teacher to establish a closer relationship with the student.

- Interviews are adaptable to the student and his/her level.

- Interviews are flexible, the interviewer can adapt himself/herself to the

circumstances which can suddenly emerge.

- It provides a lot of information, both verbal and non-verbal.

- It is suitable for many students with special needs.

On the other hand, interviews also present a number of disadvantages:

- They are time-consuming, which is especially problematic with big groups.

- There could be problems with subjectivity.

- Interviewers need to be highly astute and perceptive.

2.8.6. Progressive assessment chart

There is a diverse range of tools which can be considered checklists. Some authors distinguish between category systems, checklists and progressive assessment charts (Castillo and Cabrerizo 362-364), whilst others conceive of them as different variants of the same tool or even mix them. The truth is that all of them may be combined into one by the teacher in order to adapt it to his or her concrete evaluation needs.

The difference between them lies in the way of measuring or assessing. The category system collects comments written by the teacher on previously defined, concrete categories. These categories must be exhaustive and mutually exclusive. Checklists, on the other hand, are not based on comments but on yes-no answers to the different aspects defined. The teacher must indicate whether or not an aspect is present in the student's behaviour.


Finally, progressive assessment charts score the different categories defined, depending on the degree to which each aspect is present or absent.

2.8.7. Report

As Fitzpatrick, Sanders and Worthen remind us, the main purpose of evaluation reports

is to improve the programme. However, “evaluation reports serve many purposes” (376),

among which the following could be highlighted:

❖ To help the teacher to make curricular decisions.

❖ To change attitudes and behaviours of the students.

❖ To improve communication between the teacher and the learners.

❖ To involve students in their learning process.

❖ To draw students' attention to their difficulties.

In summary, reports inform the learners and the teacher himself/herself about the

“findings and conclusions resulting from the collection, analysis and interpretation of

evaluation information” (ibid. 377). It is fundamental that the report contain relevant and

important content for the students. This implies that the content has “truth value and utility

value”. Truth value refers to the quality of the information reported and the utility value

stands for the significance and pertinence.

Although reports are traditionally written on paper, there are multiple forms of presenting

reports and the teacher should consider the best delivery process. Among the possibilities,

the information may be reported through “brochures, audiotape reports or a slide

presentation”. (ibid. 379).

As formative evaluation aims to help students in their learning process, and it also helps

the teachers to improve their teaching and curricular decisions, it is recommendable that

reports be delivered regularly.


2.8.8. Rubrics

Rubrics or grading scales are charts which allow the assessment of a task on the grounds of certain criteria and an established scale. Since rubrics constitute a meaningful and relevant part of the current thesis, they will be discussed at length in a separate chapter (Chapter 4).

2.9. Language Assessment Literacy

Language Assessment Literacy (LAL) is a fairly common concept in other countries, such

as the USA. Unfortunately, as of yet it is not particularly well-known in this country. As

quoted in Herrera and Macías, “developing LAL among EFL teachers must be a necessary

component of foreign language teacher education programs” (303). Assessment literacy

implies that the teacher understands and knows the meaning of high- and low-quality

assessment. LAL implies the knowledge of certain skills and how to assess them, but also

the knowledge of how to measure language competence. In addition, it is necessary to be

conversant with certain related principles, such as validity, reliability, objectivity or

ethics. Fatalaki (78) expounds fifteen skills that teachers of foreign language should gain.

According to him, it is important for a foreign language teacher to learn how to control,

administer and score a test, how to interpret statistical and raw data, and how to detect poorly performing test items and confounding factors unrelated to the skill being assessed. He or she should also be able to distinguish between correlation and causation and to compare two or more data sets. Furthermore, knowledge about different measures of the reliability and validity of a test is also important. A teacher must know how to


intervene suitably when students misbehave and understand measurement error and

confidence intervals. Communicating the results appropriately and having ethics and a

huge commitment to test improvement are also fundamental skills.

According to Herrera and Macías, a high level of LAL entails an appropriate

design of assessment, the selection of alternative assessment methods or techniques, the

analysis of the impact caused by standardised tests and a suitable connection between the

language teaching approach and the assessment practices. The introduction of LAL in

teacher training programmes could solve many problems related to or derived from the

evaluation process. Herrera and Macías suggest initially a “questionnaire to diagnose the

needs with regard to assessment literacy of not only EFL teachers but language teachers

in general” (304).


Chapter 3: The CEFR

The CEFR, as mentioned in chapters 1 and 2, was developed by the Council of Europe as

the result of twenty years of research with the intention of providing a comprehensive and

transparent basis for the creation and design of language syllabi and curriculum

guidelines, together with the design of teaching and learning textbooks, worksheets, tasks

and, in short, teaching materials, and the assessment of foreign language competence.

As cited in the framework, assessment is used in the document “in the sense of the

assessment of the proficiency of the language user” (177). The CEFR provides the entire

educational community with Learning Standards for language teaching. In addition, the

Framework itself is a valuable resource that can be consulted by teachers to construct the

specifications of a task or the construction of test items. For those purposes, information

is provided in section 4.1: ‘the context of language use’ (domains, conditions and

constraints, mental context), section 4.6: ‘Texts’, and Chapter 7: ‘Tasks and their Role in

Language Teaching’ (179) as well as in section 5.2: Communicative language

competences.

On the other hand, the use of the framework as a provider of Learning Standards and

guidelines for the construction of tasks and test items is not its sole function. The CEFR

also deals with the assessment of languages in many other different manners. This is

because “the scales provide a source for the development of rating scales for the

assessment of the attainment of a particular learning objective and the descriptors may

assist in the formulation of criteria” (179). The descriptors, which refer to the short text

that contains a description of what each level of reference consists of, can be used by

teachers for teacher assessment and also for student’s self-assessment. On the one hand,

the descriptors for communicative acts, for instance, may be particularly helpful for

feedback, as they can give the students an overall impression of their performance in a


task. Scales may also be a good tool for summative assessment since teachers can build

their grids or checklists on the grounds of the framework. As is well explained in the document, a descriptor such as "Can ask for and provide personal information" might be exploded into the implicit constituent parts "I can introduce myself; I can say where I live; I can say my address in French […]" (180).

On the other hand, the descriptors of proficiency can also assist in self-assessment and in

the creation of tools, such as checklists, grids or examination rating scales for

performance assessment.

The Framework is also a practical instrument to “relate national and institutional

frameworks to each other, through the medium of the Common Framework” and to “map

the objectives of particular examinations and course modules using the categories and

levels of the scales" (182). This means that the Learning Standards used at a given level in one institution are very similar to the ones used in another institution in a completely different geographical location. It also implies that a student with a certified B2 level has essentially the same competence in the language as any other citizen in the European Union holding the same certification.

Finally, the reference to the levels of the scales constitutes a clear invitation, made by the Framework itself, to use it as a reference for building grading scales or rubrics with which to assess language learners.

3.1. Common Reference Levels

Chapter 3 of the CEFR document deals with the six different levels which the Council of Europe establishes for the knowledge of a language. These levels are the following: A1

(Breakthrough), A2 (Waystage), B1 (Threshold), B2 (Vantage), C1 (Effective

Operational Proficiency) and C2 (Mastery). Sometimes, the six levels are referred to using


basic or elementary (lower and upper) for the lowest two (A1 and A2), intermediate

(lower and high/upper) for the ones in the middle of the scale (B1 and B2), advanced (C1)

and proficiency (C2). As the framework states, the levels can be read from the highest to

the lowest, or the reverse. However, the levels are always presented from top to bottom

for consistency. Moreover, “each level should be taken to subsume the levels below it on

scale. That is to say someone at B1 is considered also to be able to do whatever is stated

at A2, to be better than what is stated at A2” (36-37).

All the scales provided by the framework can be separated into three groups: user-

oriented, assessor-oriented and constructor-oriented scales. The Council of Europe defines user-oriented scales as those which "report typical or likely behaviour of learners at any given level" (37). These scales are always positively worded, and they state holistically what the student can do at each level:

“…can understand simple English spoken slowly and carefully to him/her and

catch the main points in short, clear, simple messages and announcements”

(Eurocentres Certificate Scale of language Proficiency cited in European

Framework of Reference, 38).

Assessor-oriented scales deal with how well the student performs. They intend to guide

the scoring process and are usually negatively worded. Nevertheless, the Framework

encourages examiners and scale-designers to avoid these negative statements and try to

use descriptors that are as positive as possible by describing key features of good performance

examples. The Framework also highlights that those scales which contain more than 5

different criteria “have been argued to be less appropriate for assessment because

assessors tend to find it difficult to cope with more than 3-5 categories” (38). These scales

can be holistic or analytic, though the latter are commonly used to determine the level (diagnosis-oriented).


Constructor-oriented scales “guide the construction of tests at appropriate levels.

Statements are expressed in terms of specific communication tasks the learner might be

asked to perform in tests” (39).

3.2. Learning Standards

The development of the CEFR has been the most ambitious attempt so far to establish

certain common Learning Standards in the European Community. As indicated by its own name, the CEFR's Learning Standards relate only to languages, unlike the American Common Core Learning Standards. Nevertheless, it is best to begin with a definition of

what they are.

“Learning Standards are concise, written descriptions of what students are expected to

know and be able to do at a specific stage of their education” (The glossary of education

reform, para. 1). Despite describing what students should have learned after a course (i.e.,

the educational objectives, what they can do at a specific level of their learning

progression), they do not mention any specific teaching method or curricula. Learning

Standards are normally organised by subject and most of them are common in the

different regions or states of a country.

In the United States, the development of Learning Standards represented highly

significant changes in the American education system. They appeared around the late

1980s and early 1990s and, by that time, each State had developed its own Learning

Standards for each grade. Since then, the so-called “Standards-based Education Reform”

has progressively introduced changes in the school system through different acts. One of

the most important changes was the development of the Common Core Learning

Standards, in 2009, by selecting the most effective Learning Standards which were

already in use in different States across the country. For that purpose, the National


Governors Association (NGA) and the Council of Chief State School Officers (CCSSO)

took into consideration the comments and proposals of thousands of teachers, parents,

members of the school community and citizens concerned with the issue. [The National

Governors’ Association Center for Best Practices (NGA Center) and the Council of Chief

State School Officers (CCSSO), n.d.]. Although the Common Core Learning Standards

have been adopted by 42 States and some other territories, they have not been free of

controversy, and some of their aspects have been under scrutiny and have been the object of

debate. Some of the concerns the critics have brought up are:

- Whether what students learn must be decided by the federal government rather

than by the local communities and parents of each school.

- If the Learning Standards are the most important and appropriate for all the

States and schools.

- Whether or not they are prescriptive enough.

- Whether they truly represent learning progressions.

In Europe, the CEFR has linked the Learning Standards to the learning of languages. The

Council of Europe points out the Framework's aim:

“it was designed to provide a transparent, coherent and comprehensive basis for

the elaboration of language syllabuses and curriculum guidelines, the design of

teaching and learning materials, and the assessment of foreign language

proficiency” (Council of Europe, para. 1).

The recommendations of the European Union through the CEFR resulted in significant

changes in the Spanish education system. The current education act, the Real Decreto

(Royal Decree) 1105/2014, of 26 December, linked to the Organic Law of Education,

known as the LOMCE, draws on the Framework guidelines. In fact, the Spanish


translation for Learning Standards “estándares de aprendizaje” is mentioned 374 times in

the text. The act points out that the Learning Standards provided must be taken as a reference for the curriculum programme and the syllabus design (Real Decreto 1105/2014 171).

Those Learning Standards, as cited in the text, “permiten definir los resultados de

aprendizaje, y que concretan lo que el estudiante debe saber, comprender y saber hacer

en cada asignatura; deben ser observables, medibles y evaluables y permitir graduar el

rendimiento o logro alcanzado8” (172). In this case, the Learning Standards provided refer

not just to language subjects, but to all of them. The influence of the CEFR in the Learning

Standards provided for the English as a foreign language subject is clear, though. For

example, the descriptor proposed by the CEFR for B1 competence in overall listening

comprehension stands as follows: “Can understand straightforward factual information

about common everyday or job-related topics, identifying both general messages and

specific details, provided speech is clearly articulated in a generally familiar accent”

(CEFR 66). The same level learning standard in the same skill (Year 2 of the

Baccalaureate, B1, listening comprehension) in the Spanish act is as follows:

“Comprende instrucciones, anuncios, declaraciones y mensajes detallados, dados cara a

cara o por otros medios, sobre temas concretos, en lenguaje estándar y a velocidad

normal9” (Real Decreto 1105/2014 442).

The Spanish Learning Standard for the oral production in the same level “Hace

presentaciones de cierta duración sobre temas de su interés académico o relacionados con

su especialidad10” (Real Decreto 1105/2014 443) corresponds to the B1 descriptor of the

8 They allow the learning outcomes to be defined and specify what the student must know, understand and know how to do in each subject; they must be observable, measurable and assessable, and allow the performance or achievement attained to be graded.
9 Understands instructions, announcements, statements and detailed messages, given face to face or through other means, on concrete topics, in standard language and at normal speed.
10 Gives presentations of a certain length on topics of his/her academic interest or related to his/her speciality.

CEFR for addressing audiences “Can give a prepared straightforward presentation on a

familiar topic within his/her field which is clear enough to be followed without difficulty

most of the time, and in which the main points are explained with reasonable precision.”

(CEFR 60)

The two aforementioned Learning Standards are just two examples to illustrate the clear

connection between the Learning Standards given by the Council of Europe through the

CEFR and the Learning Standards stated in the Spanish Royal Decree for the subject of

English as a Foreign Language. This connection can be found in all the Learning

Standards stated in the above mentioned Spanish Royal Decree.

3.3. Chapter 9: Assessment

The CEFR devotes an entire chapter to the assessment of foreign languages. More

specifically, to the use of the CEFR for assessment purposes. In this chapter, the

difference between assessment and evaluation is clarified. Furthermore, validity,

reliability and feasibility are three terms referred to as fundamental for any kind of

discussion in this area. Consequently, the framework can be a great help as the assessment

of a particular level should include the same criteria. Thus, if a learner passes a B1 exam

in one country, he/she should be able to pass a B1 exam in any other country. In the same

way, two different learners who have passed a B1 exam should have more or less the

same level. These two examples would prove that the exams were valid and reliable. As

the framework itself states, it can be used in order to specify the content of the tests, the

criteria to determine the attainment of an objective and for the description of levels of

proficiency, so that certain comparisons across different systems can be made. The tests


should also be feasible; i.e., they should be practical for the assessors. As the CEFR provides them with a useful point of reference, achieving feasibility in their examinations should be easier thanks to it.

With regard to the use of the framework for the preparation of tests and examinations, it

provides the examiner with numerous criteria that can be consulted when designing a

task. Furthermore, it contains a “sampling of the relevant types of discourse” (178).

Different domains, conditions and constraints as well as mental context can be found in

the Framework. The appearance of examples for all the levels is also very useful for this

task.

The huge number of descriptors provided and classified according to the level constitute

a key source for the development of rating scales and checklists. Hence, the descriptors

of the communicative activities, which can be found in chapter four of this thesis, can be

used for the definition of a specification for the design of an assessment task. The scales

are also helpful for reporting the results. Finally, teachers can assess themselves or use the scale to implement student self-assessment. For instance, they can create a checklist or a type of grid for continuous assessment or for summative assessment at the end of each lesson/unit or course. Nevertheless, if any changes are made, it is important to keep the scales positively worded, since negative wording is a typical flaw of such scales (181).

The scales of the Common Reference Levels aim to facilitate comparison among systems.

In this regard, if the same descriptors are used in the examination, different tests can be

compared as well as the results of those tests, so that both national and institutional

frameworks can be related.


Chapter nine also includes a list of the different types of assessment. In particular, it

contains 26 kinds which are classified in pairs of opposites:

- Achievement assessment/ proficiency assessment

- Norm-Referencing (NR) / criterion referencing (CR)

- Mastery CR/ Continuum CR

- Continuous assessment/fixed point assessment

- Formative assessment/summative assessment

- Direct assessment/ indirect assessment

- Performance assessment/ knowledge assessment

- Subjective assessment/objective assessment

- Rating on a scale/ rating on a checklist

- Impression/ guided judgment

- Holistic/ analytic

- Series assessment /category assessment

- Assessment by others/ self-assessment

All the kinds of assessment cited above are explained in the framework. Most are already

defined in the “Types of evaluation” section of the current work. Due to the importance

that the distinction between holistic and analytic assessment will have for the current

thesis, special attention has been paid to the definitions and explanations provided by the

framework. According to this, it can be stressed that a holistic assessment implies a

“global synthetic judgment” (190), whereas in an analytic assessment, different aspects

are analysed separately. The Framework also clarifies the fact that the distinction can be

made in terms of what is assessed; e.g., whether it is a global category, such as “writing”,

or if the examiner needs to assign separate scores to all the aspects involved. Another


example could be how the result is calculated, either with a holistic rating scale or with an

analytic grid.

3.4. The CEFR on rating scales or checklists

Advice on how to design and develop effective grading scales or checklists can be found

in chapter 9 and in the appendices. The recommendations contained range from the

suitable number of criteria to the formulation of the descriptors.

Chapter 9 also warns of the importance of choosing a feasible assessment tool; for

instance, if a rubric is selected as the assessment tool, the possible categories that it

includes must be feasible. The framework emphasizes that "more than 4 or 5

categories starts to cause cognitive overload and that 7 categories is psychologically an

upper limit” (193). As a result, if the criteria considered relevant for the assessment

exceed that limit, these features ought to be combined and renamed under a broader

category. The framework itself exemplifies the process by providing the following possible categories which could be used in the assessment of oral competence (193):

➢ Turn-taking strategies

➢ Cooperating strategies

➢ Asking for clarification

➢ Fluency

➢ Flexibility

➢ Coherence

➢ Thematic development

➢ Precision

➢ Sociolinguistic competence

➢ General Range


➢ Vocabulary range

➢ Grammatical accuracy

➢ Vocabulary control

➢ Phonological control

To illustrate the reduction process, four different scales are presented. These scales show how the aforementioned criteria have been reduced to five or six categories which encompass them all (194-195). One of the examples is the Cambridge Certificate in Advanced English (CAE), which contains five different criteria. The criteria encompassed under each one are specified within brackets (a schematic sketch of this grouping is given after the list):

• Fluency (fluency)

• Accuracy and range (general range, vocabulary range, grammatical accuracy and

vocabulary control)

• Pronunciation (phonological control)

• Task achievement (Coherence and sociolinguistic appropriacy)

• Interactive communication (turn-taking strategies, co-operative strategies,

thematic development)
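For readers who find a schematic representation helpful, the grouping above can be expressed as a simple mapping from each broad CAE category to the criteria it subsumes. The following short Python sketch is purely illustrative (the dictionary representation and the check are devices introduced for this example, not part of the CEFR or of Cambridge documentation); it encodes the list above and verifies that the number of categories stays within the 4-7 limit recommended in Chapter 9 of the Framework.

# Illustrative sketch only: the five broad CAE categories and the criteria
# each one subsumes, as listed above. The dictionary representation is an
# assumption made for the example, not an official specification.
CAE_CATEGORIES = {
    "Fluency": ["fluency"],
    "Accuracy and range": ["general range", "vocabulary range",
                           "grammatical accuracy", "vocabulary control"],
    "Pronunciation": ["phonological control"],
    "Task achievement": ["coherence", "sociolinguistic appropriacy"],
    "Interactive communication": ["turn-taking strategies",
                                  "co-operating strategies",
                                  "thematic development"],
}

# The Framework warns that more than 4 or 5 categories starts to cause
# cognitive overload and that 7 is psychologically an upper limit (193).
assert 4 <= len(CAE_CATEGORIES) <= 7
print(len(CAE_CATEGORIES), "categories covering",
      sum(len(criteria) for criteria in CAE_CATEGORIES.values()), "criteria")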

With regard to the formulation of descriptors, appendix A includes several specifications

on the best way to state them. The first remark is on positiveness. Previous research on

proficiency scales detected a tendency to formulate lower-level descriptors negatively.

The framework recognises the difficulty in doing so, “it is more difficult to formulate

proficiency at low levels in terms of what the learner can do rather than in terms of what

they can't do" (205), but it also encourages efforts to reverse that tendency. Some

examples are given, for instance, “can produce only formulaic utterances lists and

enumerations” could be expressed as follows: “produces and recognises a set of words


and short phrases learnt by heart” (206) in an attempt to reformulate the descriptor

positively.

Definiteness in the statements is also encouraged by the Council. Avoiding vagueness

and describing concrete tasks are essential for achieving effectiveness. However,

definiteness should not lead to the production of excessively long descriptors since, as the

framework notes, a “descriptor which is longer than a two-clause sentence cannot

realistically be referred to during the assessment process” (207). Brevity also helps the

independence of descriptors desired by the framework. Moreover, the descriptors must be

clear and transparent so that both the examiner and the learner can totally understand what

is expected in the assessment.

As for the ways in which descriptors of language proficiency can be assigned to different

levels, three different methods are possible: intuitive methods, qualitative methods and

quantitative methods. In the case of the last two, the process can start either from learner performances or from the descriptors themselves. If intuitive methods are used to design the scale,

there are three possible options:

1. An expert would be in charge of developing the rubric or checklist.

2. A committee develops the grading scale. In this case, a small group of experts would be in charge of developing the rubric. They may produce drafts and work on them.

3. If an experiential principle is chosen, a committee will develop the grid and

afterwards a systematic piloting and feedback could be implemented to check its

effectiveness.

Selecting a qualitative method for the creation would involve “small workshops with

groups of informants and a qualitative rather than statistical interpretation of the


information obtained” (209). On the other hand, if the method is quantitative it would

“involve a considerable amount of statistical analysis and careful interpretation of the

results” (210).

3.5. Evaluation of competences

In the document “Assessment of Key Competences in initial education and training:

Policy Guidance”, the European Commission defines a key competence as “a

combination of knowledge, skills and attitudes appropriate to a specific context" (6). It

is explained that these key competences encompass not only “traditional” competences

such as communication and digital competence but also some others such as learning to

learn, cultural awareness, initiative…etc.

The CEFR makes a distinction between general competences and communicative

language competences. The declarative knowledge, skills and know-how, “existential”

competence and the ability to learn are included as general competences.

Declarative knowledge includes knowledge of the world, which embraces "locations,

institutions and organisations, persons, objects, events, processes” (46) and “classes of

entities and their properties and relations” (102); and it also includes sociocultural

knowledge (everyday living, interpersonal relations, values, social convention or body

language among others) and intercultural awareness.

Social, living, professional and leisure skills will form the skills and know-how

competence. With regard to existential competence, this consists of “factors connected


with their individual personalities, characterised by the attitudes, motivations, values,

beliefs, cognitive styles and personality types which contribute to their personal identity”

(105). Finally, the last general competence, the ability to learn, concerns study and heuristic skills and communication awareness.

Communicative language competences, defined by the CEFR as "those which empower a

person to act using specifically linguistic means” (9), are classified into three types:

linguistic, sociolinguistic and pragmatic. The first is formed by the “traditional

competences”, such as lexical, grammatical, semantic, phonological, orthographic, etc.

Sociolinguistic competence refers to those abilities to use the language in a social

dimension. This means knowledge of different formulas, including greeting, introduction,

politeness and register, among others. Finally, pragmatic competence concerns the organisation of discourse, language functions, and fluency and propositional precision.

All of these competences must be taught, practised and assessed in the classroom. The CEFR has eased the process by providing the educational community with different grading scales in which descriptors of the different levels of each competence are formulated. Therefore, it can be checked to what extent students master a particular competence, either general or linguistic, and whether their level is suitable for the course.

It is also important to highlight the approach with which all these competences must be acquired: "an action-oriented one in so far as it views users and learners of a language primarily as 'social agents'" (9).

Besides general and communicative competences, José Ángel del Pozo states that professional competence is also fundamental in current society, where education is frequently linked to the world of work. In this regard, del Pozo gathers the competences from the different approaches mentioned and classifies them into three broad types. Basic competences are related to the previous knowledge that allows the student to enter the


world of work; cross-curricular competences are related to social skills, teamwork and methodological skills; and specific competences refer to the specific and technical

abilities required by a profession (16-17).

In chapter eight of the CEFR, two other competences are defined: plurilingual and

pluricultural competences. These competences are defined as the “ability to use languages

for the purposes of communication and to take part in intercultural interaction, where a

person, viewed as a social agent has proficiency, of varying degrees, in several languages

and experience of several cultures” (168). The plurilingual competence boosted by the

European Commission has been introduced slowly into the Spanish education system. In

recent years, plurilingual education systems have become commonplace in Spain. In

those schools, English is taught not just in the subject of English but also in some other

subjects, such as music, P.E., biology, arts or mathematics. However, this does not fit the CEFR's definition of the concept of "plurilingualism". It is in fact linked to another

educational movement, also currently in vogue, Content and Language Integrated

Learning, much better known by its acronym CLIL. The main idea is to learn a language

through the learning of contents of a different area.

The role of the Framework is intended to be open and dynamic, transparent,

comprehensive and coherent but non-dogmatic. As a result, it does not support any

particular language teaching method and it does not position itself in any current dispute

on language education (CEFR 18). This does not imply that the CEFR does not have a huge impact on education policies. On the contrary, it

“will enable them [users] to approach public examination syllabuses in a more

insightful and critical manner, raising their expectations of what information

examining bodies should provide concerning the objectives, content, criteria and

procedures for qualifying examinations at national and international level” (20).


So, for instance, if the Framework provides a description of the levels for different

competences and those are taken into consideration and selected as a learning objective,

it will lead to particular content choices in the curriculum and the syllabus.

Concerning assessment, the CEFR devotes an entire chapter (Chapter 9) to this important

subject. This chapter provides the education community with an extensive list of different

types of assessment, although it is not exhaustive, as it has been clarified in the document

itself. The CEFR can be taken as a reference or resource for assessment in multiple ways.

One of them is for the specification of the content for examinations or the criteria for the

attainment of a language objective. Furthermore, the descriptors may be used to construct

tasks, to give feedback, and also for self- or teacher-assessment, by using them as a sort

of checklist or grid.

The European Commission intends to promote the key competences in order to achieve

certain general objectives. These aims are meant to reduce early school drop-out, to increase early childhood education, to provide better support to teachers and to provide

students with high-quality learning based on significant and relevant curricula.


Chapter 4: RUBRICS

The use of rubrics is becoming increasingly popular within the educational community.

As opposed to traditional tools of evaluation, which normally measure the memorised knowledge and cognitive skills of the learner, rubrics are assessment tools which measure

the performance of the learner in a standardised way. K. Bujan (75) argues that rubrics

are being used in order to grant traditional qualifications a more authentic or real value.

Rubrics are similar to templates, file cards or forms which are used as a guide to assess

specific activities (Castillo and Cabrerizo 405). It is essential to highlight that rubrics

allow the teacher to assess not just intellectual skills, such as critical thinking, analysis,

opinion, or creativity, but also the learner’s attitude (Bujan 75). One of the strongest

points of using rubrics as an evaluation or assessment tool is the fact that they work as a

guide for both teachers and students. This means it is highly encouraged to allow students

to consult the rubrics the teacher uses before being assessed. Knowing what aspects are

going to be measured and what needs to be placed in one descriptor or the other could

help learners in many different ways. First of all, they know exactly what is going to be

assessed, they know what the assessment criteria will be, they can better understand the

process of evaluation and the process of assessment and they get more involved in it.

Furthermore, it could also be recommendable at a certain point to allow students to create

a rubric as this could help them to understand how it works and to reflect on the

assessment process. Students could also attempt to evaluate their own work using the rubric, arguing for and defending their opinions.

4.1. Definition


Rubrics can be defined and understood in slightly different ways. Brookhart defines

rubrics as "a coherent set of criteria for students' work that includes descriptions of levels of performance quality on the criteria" (4, cited in Wang). Melissa D. Henning gives a

definition in the same line, “a set of scoring guidelines that evaluate students' work and

provide a clear teaching directive”. As those definitions state, rubrics are mainly

descriptive and not evaluative. Heidi Andrade argues that rubrics "are often used to grade student work but they can serve another, more important, role as well" (1). According to the University of California, Berkeley's Center for Teaching and Learning, rubrics have three characteristics: the criteria students must achieve for a task; the indicators of quality which students should know and follow in order to pass the task; and their use as a scoring tool. However, the use of

rubrics is also highlighted not only as a summative or formative tool but also as a teaching tool which benefits teachers, students, and the entire teaching-learning process. Henning explains the process very clearly:

[Rubrics] “convey the teacher's expectations and they provide students with a

concrete print out or electronic file showing what they need to do for the specific

project. Typically, a teacher provides the rubric to students before an assignment

begins, so students can use the rubric as a working guide to success.”

Furthermore, they help the teacher during the assessment as they provide them with a

complete range of criteria and goals in different aspects, not just grammatical. They also

contain curriculum goals and standards. In addition, they enable students to understand

their scores by comparing them with the rubric used. Self-assessment and peer-

assessment may also be encouraged by the employment of rubrics.


4.2. Why use rubrics?

Research on rubrics in Spain, although scarce, underlines the necessity of using rubrics in line with the methodological changes framed by the Council of Europe. The establishment of the CEFR and the changes produced in post-secondary education brought about by the Bologna Process, together with the enactment of recent educational legislation, have shifted the focus within the teaching-learning process. The teacher is no longer the

centre of attention; neither is theoretical knowledge the aim of the process. Instead, pupils are now the main focus. Students are guided by a teacher who facilitates the pupils' own learning while working on different skills. The final objective is to ensure that

students are prepared for the labour market and to ease their integration into the

workforce.

These significant changes in the teaching-learning process entail an evaluation process with a different aim: not only the assessment of knowledge, but also assessment as an important tool for improving teaching. It demands student participation through self-assessment and peer assessment, as well as teacher feedback.

Rubrics may be a very useful tool to adapt the teaching-learning process to the recent

demands of the system. Cano gives reasons why teachers should support the use of

rubrics:

a) Because of their formative rather than summative value. Even though they are increasingly used for assessment nowadays, they were first associated with qualitative feedback.


b) They may guide the learning process. They are useful for the teachers as they

make sure that their teaching is in harmony with the criteria that they are going to

apply to assess the students. Moreover, students can know what is expected and

they may also learn what points they need to improve after the assessment.

c) Because of their constructive value. The participation of the students in the creation of the rubric allows greater involvement in their learning process and helps them to learn to learn.

d) Because of the monitoring of long-term development. Rubrics allow the

development of students over different courses to be checked.

e) Because of the scientific evidence. There is a great deal of scientific evidence of rubrics' benefits and validity (269-270).

The Center for Advanced Research on Language Acquisition (CARLA) offers up to twelve arguments for why the educational community needs rubrics. Some of them are very similar to the ones mentioned above, such as "helps instructors to clarify goals and improve teaching" or "learners can develop ability to judge quality, own work and peer's work", while others add further arguments in favour of the use of rubrics. For instance, using rubrics can "answer parents' questions", as parents can see what students must achieve and how they have performed; it allows the time spent evaluating performance and giving feedback to be reduced; it aligns evaluation criteria with standards, curriculum, instruction and the assessment task; and it increases "reliability, implies consistency and objectivity".

Positive outcomes with regard to rubrics have been found in several studies carried out on the use of rubrics in different educational centres. For example, a case study conducted at the Universities of Granada

and Vigo (Gallego Arrufat and Raposo-Rivas) concluded that, after using the rubric

during a whole term in one subject, students thought that its use increased their motivation

and boosted cooperative work. They expressed this view through a survey

based on a Likert scale (211). Another case study carried out by Verano-Tacoronte et al.

designed a rubric based on the specific literature reviewed. This rubric was validated by a panel

of experts and later used to assess undergraduate students. Students had access to the

rubric in order to prepare their presentations in pairs. They were assessed by a team of

teachers using the rubric, and the results showed a high reliability of the rubric as the scores

given by the teachers were fairly similar.

Some authors have investigated rubrics by analysing research conducted by other

researchers on the issue. For instance, Jonsson and Svingby reviewed seventy-five scientific field studies on the reliability and validity of rubrics and concluded that scoring

is more consistent when using them, and both reliability and validity of the assessment

process increased by employing them as an assessment tool. Panadero and Jonsson

analysed twenty-one studies finding that rubrics provide transparency to the assessment,

reduce anxiety, aid with feedback, and help to improve students’ self-efficacy and self-

regulation.

Nevertheless, many authors have also noted drawbacks. Verano-Tacoronte et al. warn of

the scarce training teachers still have (43). Other researchers are also critical of the use of rubrics, although they do not discourage it. Reddy and Andrade criticise the carelessness shown in some studies regarding the validity of the rubrics used, since these studies describe neither the rubric development process nor its content validity (cited in Cano 273).


Panadero, Alonso-Tapia and Huertas concluded that, in spite of improving the feedback,

rubrics do not boost learning by themselves (cited in Cano 270-274).

Furthermore, a study carried out by Velasco-Martinez and Tojar-Hurtado and published

in Investigar con y para la sociedad attempted to ascertain to what extent teachers are using rubrics to assess competences. With this in mind, they analysed 150 different rubrics used by teachers at different universities in Catalonia, Aragón, Galicia, Extremadura and Castilla y León. Among the results obtained, it was discovered that the

branch of social and legal sciences is the one in which rubrics are mostly used (34%) as

opposed to arts and humanities (only 4%). Another finding was that rubrics were mainly

used to assess essay writing (36%) and hardly ever used to assess visual or graphic

resources (2.7%), which implies a traditional conceptualisation of knowledge as

something that can be memorised. In addition to these findings, the authors also provide

the educational community with some other interesting data they gathered, such as the

teaching methodology the participant teachers usually apply in their lessons. These data

evinced that the teacher-centred lecture is still the most used methodology (36.7%) while

other innovative methodologies have no significant presence in those universities; for

example, only 1.3% of the respondents use portfolios and 6.7% use case studies with their

students. (1396-1400)

4.3. Historical Overview. Rubrics in Education

The word rubric dates back to the Middle Ages. Popham clarifies that the term was used

by Christian monks, who spent their time in monasteries copying scriptural literature in

Latin. They frequently used large red letters in order to mark the beginning of each major


section. The Latin modifier for red materials was called rubric, so “rubric” was employed

to name the label of a section and, by extension, a category (6). Ayhan and Türkyılmaz

also mention that "rubric" was "once used to signify the highlights of legal decision as well as

the dictations for conducting religious services” (82).

Gavin Brooks argues that rubrics were introduced in the L1 classroom in order to assess

students' writings (229). Until that moment, writings were scored according to the teacher's own criteria; the teacher had to come up with a mark without any specific guidelines to support his or her decision. According to Brooks, rubrics "were first proposed as a tool to analyse writing in 1912 when Noyes suggested the use of a rubric as a means of standardizing the evaluation of student compositions" (230). Noyes thought the scoring of students' compositions was too subjective and decided to create a rubric to grant more objectivity to the assessment process (cited in Gardner, Powell and Widmann 2). At that time, the purpose

of rubrics was simply the assessment of students by means of an objective scale and not

the improvement of students' writing. In the same year, Milo B. Hillegas created A Scale for the Measurement of Quality in English Composition by Young People, which would come to be known as the Hillegas Scale. As Turley and Gallagher commented, this scale, created by a professor at Columbia University, "offered a scientific way to quantify the

quality of student compositions fashioned on a statistical model of normal distribution”

(88). The Hillegas scale was used by many schools in the United States. Edward

Thorndike, who had been Hillegas’ professor at Columbia, improved his scale in 1915,

by “substituting new specimens for certain of the original samples and by including

several examples in the steps at or near the middle of the scale” (Hudelson, cited in Brooks

230). Noyes', Hillegas' and Thorndike's early rubrics were used to compare and rank

different schools all over the United States and some school headmasters even used them


in order to assess their teachers (Turley and Gallagher 88). That was probably the first

step towards using rubrics not only for the assessment of students but also for the

“assessment” of teachers. In the University of Detroit (Michigan), S. A. Courtis,

supervisor of educational research in the 1910s, used the scales in order to assess the

effectiveness of teaching methods. The teachers had to use the scale in their class so that

he could supervise the teachers' performance by comparing the writing scores of the

students. In this way, he could check if the scores were decreasing or increasing. In the

event there was a decrease, the supervisor would intervene and work with the teacher to

improve the results. (ibid. 89)

Rubrics are still used nowadays all over the world. While in Spain they are only starting to be used as an assessment tool and occasionally as a teaching tool, in the United States their use is widespread. As has been explained above, they were used initially for assessment purposes, but after the 1970s they began to be used to provide students with feedback (Brooks 230). However, they are still used in order to make comparisons between schools, and their effectiveness as a tool to improve students' writing has been questioned by many American scholars in several academic journals. On the other hand,

some scholars strongly support the use of rubrics.

4.4. Types

There is also a large variety of rubric types. This is due to the fact that different criteria

can be used to classify them. Hence, according to the criteria we choose, we can allocate

the grading scales under different labels. In the present thesis, only the most common

ones are mentioned:


a) According to how performance is measured:

❖ Holistic or Global: the different parts are not measured separately but together, as a whole. This way, performance is compared to the criteria in general. Popham notes that holistic scoring makes one overall, holistic judgment and that there is a simple reason for using such rubrics: holistic scoring "saves tons of time". Ayhan and Uğur define them as follows: "raters judge by forming an overall impression of learners' performance and matching it to best fitting column on the scale" (88). They also explain that each of the scales describes performance in relation to several criteria such as grammar, vocabulary, fluency, etc. Rubrics usually consist of 5 or 6 dimensions. Some of the positive aspects of using this kind of rubric are that they are time-saving, they can be used for different tasks, they are easier for children to understand and they focus on what students are able to do, rather than the other way around. On the other hand, the feedback provided is non-specific, and students may meet the criteria in some aspects and not in others, so it may be difficult to place them at a single level.

- Primary trait: this kind of holistic rubric focuses on only one individual characteristic.

❖ Analytic or Partial: the performance of the student is compared to each of the criteria established, separately. Sometimes, the separate marks can be added together in order to obtain a total mark. Analytic rubrics, in Popham's words, "supply diagnostic data of considerable utility" (17). Furthermore, they can show students' progress in different aspects as well as pointing out their strengths and weaknesses specifically (Ayhan and Uğur). On the other hand, they are more difficult to design, more difficult to use and more time-consuming.


- Multiple trait: rubrics of this kind are very similar to the analytic ones

and the terms are often used interchangeably. However, Ayhan and Uğur explain

that the difference lies in the fact that “analytic rubrics evaluate more

traditional and generic dimensions of language production, while

multiple trait rubrics focus on specific features of performance” (89).

b) According to the scoring type:

❖ Quantitative: when the rubric has been composed to give a numeric mark.

❖ Qualitative: the qualification of the rubric is not numeric.

- Words: the different aspects are assessed with comments such as good,

excellent, poor, etc.

- Alphabetic: when the rubric is designed to provide a result in the form of a letter: A, B, C, D, E.

- Graphic: the results of this sort of rubric are presented through squares or graphic symbols.

- Symbols: the results are shown by a picture or sign, such as an emoji or smiley. They are especially useful with young learners.

❖ Mixed: a combination of more than one of the above-mentioned types of rubrics.

c) According to their theme (Goldin 8-14)

❖ Domain-independent: this kind of rubric consists of criteria which apply to any theme, area or skill.

❖ Domain-relevant: they are the opposite of the above-mentioned. They are linked to an area, owing to which they use the vocabulary and terminology of that domain.


❖ Problem-specific: they are very specific; they are used not for a specific task but

to assess how the students address and solve a particular problem which they may

face in their future professional careers.

❖ Open-ended problem: this is very similar to the previous one but, in this case, the

kind of situation the students must face requires them to think more freely as there

is not a unique way of performing well.

d) According to their application

❖ Hypergeneral rubrics: they are, as Popham defines, “excessively inexact scoring

guides” (19). They contain excessively vague criteria, dimensions and descriptors

which may be applied to any kind of task, any skill and any domain.

❖ Task-specific: this kind of rubric is built just for the assessment of one particular activity. The main disadvantage is that they are "essentially worthless … because they deal with a student's response to a particular task" (21).

❖ Skill-focused rubrics: as the name explains, these rubrics assess one skill:

speaking, writing, listening or reading. If they are well built, they are very useful.

e) According to their function

❖ Proficiency/Diagnostic: their goal is to determine the learner's level. They may be used

to place a student in a concrete level or to prove his/her level to obtain a certificate.

❖ Achievement: this kind of rubric attempts to reveal to what extent the learner has

achieved the course goals or how he/she has performed in a task.


f) According to the scorer

❖ Peer review/ co-assessment: rubrics used by learners in order to assess their

classmates. They can assess other learners' work or even assess their colleagues'

work as part of a team.

❖ Teacher: rubrics used by the teacher to assess a student. Even though the teacher

is the one using them, it is highly recommended that he or she share them with the students and explain them before the assessment.

❖ Self-assessment: rubrics used by the students in order to assess their own work.

g) According to the channel

❖ Paper: traditional rubrics printed on paper.

❖ iRubric: online rubrics, shared online with all the students. Some types of software allow the teacher to provide feedback based on their assessment with

the rubric.

There are also specific rubrics for assessing behaviour, which can be classified according to the type of scale used (Marin-García et al. 51-54):

❖ Rating scales: the descriptions of the different dimensions are numbers. They are

easy and quick to use but they may lead to doubts on their interpretation, the

feedback is vague and they have validity and reliability problems.

❖ BARS (behaviourally anchored rating scales): this type of scale is used to assess behaviour. Key aspects of behaviour

are ordered from the least to the most efficient. They are very clear and more

reliable and objective, but they are difficult to create.

❖ BOS (behavioural observation scales): they gather frequency data from behaviour observations. They are difficult to apply but they assess every dimension separately.


❖ Paired comparison: two students are assessed at the same time. The rater decides

which of them is better in each of the dimensions. It is very fast and quite reliable

but it implies some ethical issues.

4.5. Parts of a rubric

A rubric traditionally consists of a grid with multiple cells. The names given to the different parts of a rubric vary from one author to another, but the parts mentioned are basically the same even if the terms used are not identical. Rubrics may have only two sections (if they are holistic) or four (when they are analytic).

When the rubric is composed of two main sections, there is one vertical column and one horizontal row. One corresponds to the language descriptions and the other to the scores. The two sections are frequently merged, forming a single column in which each cell contains the score and the language description of that level.

Rubrics which have four sections normally contain a task description at the top. They usually have many columns and cells. Either the left vertical column or the first horizontal row may be called the scale, and it can also be known as the scores or performance levels. These scale levels may be numbers, but most often they are expressions which indicate the level of achievement (Excellent, Good, Poor, etc.). If the scale is placed on the horizontal row, then the first column on the left will contain the dimensions, also called criteria. These indicate what is being measured (Grammar, Vocabulary, Cohesion, Coherence, Fluency, Clarity, Eye Contact, etc.). Finally, the remaining cells in the grid form the descriptors of the dimensions or the performance; i.e. they describe what is expected for a given dimension or category at a given scale level. Descriptors of the dimensions might contain many qualifying adjectives and even examples of performance. García-Sanz mentions that the Structure of Observed Learning Outcomes (SOLO) taxonomy, which was created by Biggs and Collis in 1982, is based on the progression from incompetence to competence in order to create the achievement scale (6). Atherton summarizes the five stages as follows:

“Pre-structural: here students are simply acquiring bits of unconnected

information, which have no organisation and make no sense. Unistructural: simple

and obvious connections are made, but their significance is not grasped. Multi-

structural: a number of connections may be made, but the meta-connections

between them are missed, as is their significance for the whole. Relational level:

the student is now able to appreciate the significance of the parts in relation to the

whole. At the extended abstract level, the student is making connections not only

within the given subject area, but also beyond it, able to generalise and transfer

the principles and ideas underlying the specific instance.” (cited in García-Sanz

93)

These stages can ease teachers' design of their own rubrics. They just need to take them into account to establish the performance scale and the descriptions of the different dimensions.
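Although the thesis does not formalise it in this way, the relationship between scale, dimensions and descriptors described above can be illustrated with a short data-structure sketch. All dimensions, level labels, descriptors and the scoring convention below are purely illustrative assumptions, not taken from any certificate analysed here:

# A minimal sketch of an analytic rubric: a scale of performance levels,
# a set of dimensions, and one descriptor per dimension and level.
SCALE = ["Poor", "Fair", "Good", "Excellent"]          # the scale (performance levels)

RUBRIC = {                                              # dimensions -> descriptor per level
    "Grammar": {
        "Poor": "Frequent basic errors impede understanding.",
        "Fair": "Errors are noticeable but rarely impede understanding.",
        "Good": "Good control; occasional slips on complex structures.",
        "Excellent": "Consistently accurate, including complex structures.",
    },
    "Vocabulary": {
        "Poor": "Very limited range; frequent repetition.",
        "Fair": "Sufficient range for familiar topics.",
        "Good": "Good range with some flexibility of expression.",
        "Excellent": "Wide range used precisely and naturally.",
    },
    "Coherence": {
        "Poor": "Ideas are disconnected; no clear organisation.",
        "Fair": "Some organisation, but links are often unclear.",
        "Good": "Clearly organised with mostly appropriate connectors.",
        "Excellent": "Fully coherent, well-signposted argument.",
    },
}

def total_score(marks: dict[str, str]) -> int:
    """Convert the level awarded in each dimension into points and add them up."""
    return sum(SCALE.index(level) + 1 for level in marks.values())

# Example: a learner rated on the three dimensions above.
marks = {"Grammar": "Good", "Vocabulary": "Fair", "Coherence": "Excellent"}
print(total_score(marks))   # 3 + 2 + 4 = 9 out of a maximum of 12

The sketch simply makes explicit that an analytic rubric is the cross-product of dimensions and scale levels, with one descriptor in each cell.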

4.6. Advantages and disadvantages

The main advantages of the usage of a rubric are as follows (Bujan 77; Castillo and Cabrerizo 407):


❖ The assessment is more objective, more standardised and more consistent, especially when a practical performance such as speaking or writing is being measured.

❖ The teacher clarifies the assessment criteria very specifically.

❖ The assessment is better adjusted to the criteria.

❖ Rubrics provide useful feedback on the educational process.

❖ They allow the teacher to supply documentary evidence of the student's progress.

❖ Rubrics ease students’ understanding of the results.

❖ They allow students to focus on what is expected from them and to review or adapt their performance to what is expected.

José Ángel del Pozo also adds:

❖ Rubrics indicate clearly the strengths and weaknesses of the students.

❖ They boost students' responsibility for their own learning (59-60).

Moreover, rubrics can be created by the students, by the teacher or by another person, and they can be used for a great variety of purposes (Castillo and Cabrerizo 207): self-assessment, peer-assessment and the assessment of teachers, oral speeches, written productions, individual works or essays, group works or essays, or a portfolio.

In this regard, Raposo-Rivas and Martínez-Figueira clarify that peer assessment has multiple advantages: it is an incentive for students, it allows them to develop interpersonal strategies and social abilities, and it helps them in the development of professional skills and of reflexive and critical thinking. For such an important and enriching practice, rubrics are an inestimably useful tool, as they enable students to assess their peers in an objective way (99-200).

Nevertheless, there may also be certain disadvantages. The CARLA mentions that the

“information provided by primary trait rubrics is limited and may not easily translate into

grades”. Besides, “task-specific rubrics cannot be applied to other tasks without

adaptation of at least one or more dimensions.” (cited in Ayhan and Uğur 90). Elena Cano

provides the educational community with reasons against the use or overuse of rubrics.

She specifies that attitudes may not always be correctly measured with rubrics as they

require long-term assessment. Furthermore, she disagrees with task-oriented rubrics as

they are too time-consuming, and she warns against the difficulty that creating a truly valid rubric implies. She defends her argument by indicating that there are many non-valid rubrics available which may be harmful for the assessment if they are used and that, at the same time, other tools may be used instead for the same purpose, such as checklists, interactive feedback, portfolios, etc. Tunner (cited in Goldin 2011) supports this idea

by saying “while there seems to be a general consensus that rubrics are important and that

they improve the peer review activity, there is not as much agreement on how they should

be implemented” (22). There is no standardised and shared conception of the term rubric

within the educational community, neither in the design process nor in the applications

thereof, and this is a clear disadvantage for its common use and utilisation (Castillo

Tabares et al. 75-76). Some research carried out on the use of a rubric in order to assess

team work in the University of Lleida (París et al.) mentioned among their conclusions

that the rubrics used were not very flexible and adaptable to different circumstances,

alternatives were not proposed and there were difficulties in the coordination among

members. However, they highlighted the implication of the students in the goals, tasks

and achievements together with a strong feeling of integration in their own teams (95).


Similarly, White and Winkwort state that rubrics "connect key types of partnership building […] with 3 key drivers that enable partnerships to grow" (5). Those key drivers are shared commitment, the capacity to sustain the collaboration and a common vision of what can be achieved through it.

4.7. How to build a rubric

There are several steps which must be followed in order to design or create a rubric from

scratch. José Ángel del Pozo points out that determining what is going to be measured is

the first step. Once the skill is decided, the task can be designed or chosen. Afterwards,

the type of rubric which will be used for scoring must be selected. The type of rubric

employed must be chosen on the grounds of the task and context (number of students,

space, etc.). With the aim of creating the performance levels, it is important to start by writing only three levels: the maximum, the minimum and an intermediate one. Once those levels have been written, they are used as a reference to write the remaining intermediate ones. The

following step is writing the description of each of the performance levels. What can be observed from the learner must appear; it is advisable to look for an "excellent" model to identify which characteristics define a good piece of work. It is important to avoid ambiguous words and to express the performance criteria in terms of observable student behaviours or product characteristics. If possible, the different criteria which will be assessed should be placed in the "order" in which they are likely to be observed, to ease the scoring task. Once the rubric has been designed, it is convenient to check whether it is really useful. A rubric could be used to evaluate the rubric created, such as the one mentioned above, or another teacher or scorer could be asked to check it. It is also recommendable to involve the learners in the process, i.e., by explaining and showing them the rubric, even allowing them to make suggestions and small changes. If the performance is oral, the rubric has to be used immediately after the performance. Furthermore, a good idea for improving the


teaching-learning process is to ask the students to evaluate their own work or their

classmates’ work. This way the teacher can compare his or her own score with the one

given by the learner and detect possible problems and misunderstandings. It would be

interesting to conduct an interview with the learners so that they can exchange impressions and give fully detailed feedback which helps both the student and the teacher.

4.8. Online Tools for building a rubric

The following section analyses different tools which are available on the Internet for

building a rubric. It will briefly describe their appearance and how they work, and also

what their main advantages and flaws are.

Annenberg Learner is a tool created by the Annenberg Foundation. This foundation

promotes excellent teaching in American schools and offers different multimedia

resources to help teachers improve their teaching methods. One of the resources they have created is the Annenberg Learner, which is basically a simple resource to build rubrics.

The web design is very plain, not over-elaborate, but easy to navigate through. The rubric

can be created in just seven steps: first, a title must be given; secondly, it is possible to

choose the format, either in form of table or a list. The third step deals with the scale.

Users can select one of six different scales, one of which is numerical. Next, the order of

the scale must be decided, from worst to best, or vice versa. The fifth step refers to the

assessment criteria; there are twenty to choose from and the user can include as many of

them as he or she wishes. Step six gives the chance to put the assessment criteria in the

order wished. The final step provides the opportunity to make amendments in any of the


previous steps and to generate a PDF with the created rubric. The strengths of this tool are its simplicity, its speed and the fact that registration is not required.

Fig. 1. Annenberg Learner. Rubric creator. Screenshot.

Essay Tagger Common Core Rubric Creation Tool is also a rubric creator with a simple

design and use. The creation starts with the selection of the target grade level, which is

an indicator of quality, as it takes into account the target audience. Its handicap is the fact

that the levels are solely based on the American school system, which is a problem for Spanish users. The second step of the rubric creation is the selection of the standards that will be applied. Since the tool is created in line with the US system, the standards that appear are those included in the American Common Core State Standards. However, this

should not be a problem, as most of the ones referring to speaking, listening, writing and

reading can be easily applied to the standards of the CEFR. For instance, “choose

language that expresses ideas precisely and concisely, recognizing and eliminating


wordiness and redundancy” (Common Core Standards, Language, Knowledge of

Language, as cited in Essay Tagger Common Core Rubric Creation). This resource also

allows the user to specify if the CC Standards will be applied to sentence, paragraph, or

whole document level. Finally, a title must be written, and an email address provided.

This way, the user will receive a link to the rubric created. Furthermore, some more rubric

elements which are not linked to the CC Standards can be edited once the rubric is created.

This tool is, as has already been said, not targeted at a Spanish context. Nevertheless, it is a useful model from which to create an equivalent tool in which the standard outcomes of the Spanish educational law could be included.

Fig. 2. EssayTagger.com. Essay Tagger Common Core Rubric Creation Tool. Screenshot

iRubric is a really interesting tool created by RCampus (Reazon Systems Inc.), a comprehensive education management system and collaborative learning environment. Besides providing teachers with the possibility of creating a class (attendance lists, score lists, schedules, sending messages to students, posting notes, etc.), it allows them to look for a rubric among the hundreds which other people have shared, to create a rubric from scratch, or even to edit one already made by other teachers. It also provides the teacher with the opportunity of sharing the rubric with the class in order to assess in a collaborative manner. RCampus is free for the educational community, but it does require subscription.

Fig. 3. iRubric by RCampus. Screenshot

Rubistar is probably one of the most famous online tools for creating rubrics. It belongs

to the University of Kansas (created by Altec). The rubric is created in just three easy

steps: selection of topic/area/skill, selection of a customisable rubric, and selection of

categories (they are created automatically, but they can be edited). The database provides the user with myriad rubrics to choose from, which is a very strong point in its favour. The downsides include the fact that the rubric can only be written in English, even though the rest of the web page can be read in Spanish. Additionally, up


to nineteen rows can be chosen for the categories, but only four columns can be created

for the scale.

Fig. 4. Rubistar. Screenshot

Teachnology General Rubric Generator (Teachnology Inc.) is an easy tool to manage and

it allows an image to be selected for the rubric. Nevertheless, if one wishes to choose

some of the options, a fee must be paid. Moreover, it does not offer any pre-designed material, so the creation is more time-consuming and difficult. Only four descriptors and five categories can be selected.


Fig. 5. Teachnology General Rubric Generator. Screenshot

Quick Rubric (Clever Prototypes LLC) has a highly visual and helpful design. In the main

screen the user can visualise a blank rubric in a table format. All the elements can be modified: the number of rows, the number of columns and the order of each of them. The

descriptors, the criteria, the categories and the scale can be introduced by the user. The


rubric can be saved and printed once it is finished. The creation process can be slow since

it does not provide any criteria example. Registration is also required.

Fig. 6. Quick Rubric. Screenshot

Rubric-O-Matic is Australian software created by Peter Evans. It is a much more complex

tool than the previous ones. It needs to be downloaded and installed as an add-on for Microsoft Word, but it provides many different functions. Besides the creation of rubrics and access to many examples of rubrics, it contains marking scales from several educational systems of Europe, Australia and the United States. It has an automated grading function, so that once the mark assigned to each criterion is given, it can automatically calculate the total score. Moreover, as explained on its web page, it allows the user "to

create and use detailed reusable feedback comment banks, insert audio comments into

assignments, highlight a phrase in the assignment and click a button to do a plagiarism

search or to highlight easily confusing phrases or words.”


Fig. 7. Rubric-O-Matic software by Peter Evans. Screenshot

Finally, Princippia: Innovación educativa (Princippia, Formación y Consultoría, S.L.) is an application built on Google Apps. It is a Google Docs spreadsheet which can be used as a template for rubric creation. The template contains one sheet with the instructions for building the rubric, another sheet for the class list, which can be edited, an editable rubric with the criteria, and another sheet for the scoring system. One of the sheets allows the user to select, for each of the pupils, the appropriate descriptor assigned to the different criteria which are being assessed. The final sheet provides the final assessment, with the final score for each student

together with each of the individual marks assigned for each criterion in different colours,

so that they can be easily seen at a glance. This tool can be particularly helpful if it is used

for the same class throughout the year, but it can be very time-consuming if many

modifications have to be introduced.


Fig. 8. Princippia by Princippia Formación y Consultoría, S.L. Rubric sample. Screenshot


Fig. 9. Princippia by Princippia Formación y Consultoría, S.L. Evaluation Criteria.

Screenshot

Fig. 10. Princippia by Princippia Formación y Consultoría, S.L. Final Evaluation.

Screenshot



Chapter 5: METHODOLOGY

5.1. Introduction

As has already been stated, this thesis intends to analyse the exams and rubrics used by

the most popular English certificates in Spain in order to examine whether or not they are

effective and, for the rubrics, to determine the most common types used for each skill.

Rubrics will be assessed according to their types, if they measure what they are supposed

to score, and if they are valid and reliable. It is also the intention of the current work to

examine the tasks which compose the papers of the four different skills in each test, with the purpose of deciding whether or not they follow the guidelines given by the Common European Framework of Reference for Languages.

Through research people inquire, question, observe and analyse so that the object of study

is found, verified or refuted. According to Hernández Sampieri et al., research can be defined as the set of systematic and empirical processes that are applied when a phenomenon

is being studied (20). Valid research must be based on scientific methodologies and

instruments to ensure that it is accurate and objective. Gil Pascual claims that researchers

use tools in order to be able to quantify the information or to transform it into figures (15).

Science subjects and areas have traditionally been the fields for research, whereas

educational ones were not considered valid for quantitative research. Nevertheless, Gil

points out the appearance of new research instruments which allow the quantification and

measurement of many aspects. Previously, these aspects were inconceivable in a

scientific investigation (15). Vez claims that the scientific orientation in the educational

area stems from Bloom’s structuralism and Skinner’s conductivism, the cornerstone of

the university knowledge platform (84-85).


Vez also expounds how the linguistic culture of the Didactics of Language Teaching

shifted linguistic material towards the didactic objectives level or the didactic contents.

This stems from the mentalist belief that there is always a transfer between conscious

knowledge and linguistic competence (85). Other advances in the linguistic investigation

made by Greenberg, Chomsky, Van Bruen, Fillmore and Andersons and Halliday in

different areas have allowed emphasis to be placed on (Vez 80):

- Semantics as opposed to morphosyntax

- Functionality as opposed to grammar inventory

- The communicative approach as opposed to non-communicative approaches.

According to Vez, it was the European Union which assumed the change of paradigm and achieved, with other institutional support, the mobilisation of research in a different direction: languages are no longer the main objective; rather, the focus is on the language users (87).

The Council of Europe was responsible for implementing a set of actions oriented towards the research of languages from a qualitative perspective and, at the symposium entitled "The Linguistic content, means of evaluation and their interaction in the teaching and learning of modern languages in adult education" (cited in Vez), held at Rüschlikon, Switzerland, between 3 and 7 May 1971, established the following objectives, among others:

• language teaching should specify worthwhile, appropriate and

realistic objectives based on a proper assessment of the needs,

characteristics and resources of learners;

• language teaching should be planned as a coherent whole, covering the

specification of objectives, the use of teaching methods and materials,


the assessment of learner achievement and the effectiveness of the

system, providing feedback to all concerned;

• effective language teaching involves the co-ordinated efforts of

educational administrators and planners, textbook and materials

producers, testers and examiners, school inspectors, teacher trainers,

teachers and learners, who need to share the same aims, objectives and

criteria of assessment. (Vez 89).

Finally, Puren (cited in Vez 95) indicates that, from the new epistemic approach,

theorisation must come from internal data, that is, empirical data collected within the educational framework by the actors of the teaching-learning process.

5.2. Methodological approach

There are several instruments of research in the educational field. This thesis will analyse

different rubrics as well as the exams and tasks they intend to measure and will compare

them with the guidelines explained in the European Framework.

To begin with, it is fundamental to describe the different sorts of research which exist.

Daniel Madrid notes the three basic types of research:

a) Basic or theoretical investigation: used for the construction of abstract theoretical

methods which explain the teaching and learning processes of a language.

b) Applied research: the application of the theoretical models to the educative areas.

c) Practical research: it makes practical use of the other two investigation types. It is

normally based on the premises established by the theoretical and applied research

when they are applied to practical situations in the classroom (12).


Taking into consideration the previous classification, the methodology of the current

research can be framed as basic or theoretical. Additionally, it can be stated that it will

employ the technique of “Analysis of Documents”. Hernández Sampieri et al. mention

research using organisational documents and materials such as reports, evaluations,

letters, plans, messages, etc. (434) as part of a qualitative approach. They state that this

approach encompasses a wide variety of conceptions, views, techniques and non-

quantitative studies (20). In addition, Dale T. Griffee perceives data as the lifeblood of

research which connects theory and practice (128) and defines the data collection

instrument (DCI) as “the means, either physical or nonphysical, of producing

quantitative or qualitative data to be analysed or interpreted” (128).

This research will also be qualitative. Madrid refers to this kind of research as an

investigation which does not use numerical data extracted from reality but instead tries to

interpret and describe the reality in detail through words (12). The qualitative method

tries to erase the subjectivism of the person who is analysing the documents (Gil 281).

Through the analysis of the documents’ content, different data are collected with the

typical objectivity and relevance which the scientific method denotes (ibid. 282). For

Berelson, the analysis of content is "a research technique for the objective, systematic and quantitative description of the content of the communication" (cited in Gil 282). It is therefore the transformation of content into quantitative data. According to Krippendorff, this instrument is bound to formulate valid inferences which can be applied to a context from certain data (cited in Gil 283). The aim of the analysis of content may be, for instance, to describe tendencies, analyse persuasive techniques, connect features, relate attributes, etc. (Gil 283).

Hernández et al. claim that the main purposes of the qualitative analysis are:

1) Explore the data


2) Give them a structure

3) Describe experiences

4) Discover concepts, categories, themes, patterns and links among the data in order

to give them sense, interpret and explain them with regard to the problem.

5) Understand the context which surrounds the data

6) Rebuild facts and stories

7) Connect the results with the available knowledge

8) Generate a theory based on data. (418)

In order to apply this methodology, it is fundamental to observe the conditions stated below (ibid. 283):

- Objectivity

- Systematisation (according to organised patterns or standards)

- Quantifiability

- Manifest content

Griffee suggests a similar process by which “a large amount of raw data is reduced” (128);

this data must subsequently be interpreted by the assignment of meaning. Then, it must

be validated. He defines validation as “an argument by which evidence shows that the

analysed interpretation based on data to some extent reflects the construct” (129).

Hernández et al. defend that qualitative approaches are open and expansive since they are

based on the literature review. Furthermore, they are normally applied to a lower number of cases. In conclusion, they are oriented towards learning from experiences and different points of view, valuing processes and generating theories (361).

In order to apply the method, certain procedures should be followed (Gil 284-287):


1) Objective and context

Besides defining the aims and the universe which will be the subject of study, it is

important to determine the type of documents which will be analysed.

2) Define the units of analysis

Among the different possible units of analysis (lexis, propositional,

argumentative, etc.), the units used in this research will be the thematic ones, as the documents investigated will all be rubrics and skill test papers.

3) Number scheme rules

The presence or absence of a code, its frequency, order of appearance, density,

concentration.

4) Categorisation

It is the classification of the elements of the text according to a previously established

criterion.

5) Codification

This is the allocation of codes to each category.

6) Reliability and validity

Reliability can be calculated through the percentage of times that the codes assigned by independent coders coincide (a minimal computation sketch is given after this list).

Validity will come about when all the information in the documents is covered by the categories and when all the categories show results rich enough to produce hypotheses and inferences.

7) Data analysis

Analysing the data requires:


- A descriptive phase where the frequency of the categories will be investigated

along with their internal and external variables.

- An inferential phase where conclusions will be drawn.

- A multivariate phase in which categories, complex structures and relationships

among the content blocks will be studied.
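As a minimal sketch of the inter-coder agreement referred to in step 6 above, reliability can be expressed as the percentage of analysis units to which two independent coders assigned the same code. The codes and units below are invented for illustration only:

def percentage_agreement(codes_a: list[str], codes_b: list[str]) -> float:
    """Share of analysis units coded identically by both coders."""
    assert len(codes_a) == len(codes_b), "Both coders must code the same units"
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

# Hypothetical codes assigned by two coders to six analysis units.
coder_1 = ["task", "criterion", "task", "descriptor", "criterion", "task"]
coder_2 = ["task", "criterion", "scale", "descriptor", "criterion", "task"]

print(f"Agreement: {percentage_agreement(coder_1, coder_2):.0%}")  # Agreement: 83%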

In addition, Gil argues that content analysis may be carried out through the study of the thematic content, the semantic content and network analysis. The analysis of thematic content may consist in word counting, i.e., the analysis of the number of times a word

appears. Afterwards, those with the same stem are grouped together. They can also be

classified by the context in which they appear. The analysis of the semantic content

studies the relationships among the terms. Thus, terms can be related in different ways,

for instance, if one term refers to a part of another, it would be an inclusion relationship;

if it is the output or the input of another, it would be a role relationship.
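As a small illustration of the word-counting approach Gil describes, the sketch below counts word frequencies and groups words that share a crudely approximated stem. The sample sentence and the list of stripped suffixes are illustrative assumptions, not part of the actual analysis carried out in this thesis:

import re
from collections import Counter

def crude_stem(word: str) -> str:
    """Very rough stemmer: strips a few common English suffixes."""
    for suffix in ("ations", "ation", "ings", "ing", "es", "s", "ed"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

text = ("The rubric assesses writing tasks; rubrics and assessments "
        "guide the assessment of each writing task.")

words = re.findall(r"[a-z]+", text.lower())
frequencies = Counter(crude_stem(w) for w in words)

print(frequencies.most_common(5))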

Nevertheless, research may not always be pure. Qualitative research does not use numeric

data but instead describes through words. Hence, the validation is quite significant, as has

already been mentioned. When the research methodology is not pure, it may be the case

that it is using triangulation. Triangulation is a technique which is normally used "to validate data collection instruments and meet the charge of subjective bias from single-method or single-observer studies" (Griffee 123). It is generally defined as a combination of methodologies. As Patton states (cited in Griffee 132), triangulation consists in verifying the consistency of different data sources within the same methods and in using multiple perspectives or theories to interpret the data. Moreover, Daniel Madrid remarks that the main principle of triangulation is the gathering and analysis of data from three angles or points of view so that they can be compared and contrasted (34). He affirms that it is especially useful for qualitative research since, for instance, data may be collected from different documents or instruments and then combined (ibid. 34).

A number of authors oppose triangulation, since they argue that one cannot mix methods.

However, Patton maintains that the purity of the method is not as fundamental as the

search for useful and relevant information. He adds that “the research cannot test

predetermined hypotheses and still remained open to what emerges from open-ended

phenomenological observation” (133).

Once the different methodological approaches have been briefly described, it is time to

itemise the current research with the aim of framing it under the right labels. As has

already been stated, the current research will be basic or theoretical since it will be

substantiated in the literature review of the European Framework of Reference for

Languages, Assessment and Evaluation, rubrics and English Certificates with the

intention of determining whether the exams measure what they are supposed to measure,

if the instruments of assessment are truly effective, and if the guidelines promoted by the

European Council are being followed. Accordingly, the research will be qualitative.

The main technique used will be the analysis of documents, in particular the rubrics

which are employed for the assessment of the different English Certificates analysed and

the exam papers which form each of the certificates. Following Hernández Sampieri et

al. (418), the main purposes of the research will be:

1) Explore the data: analyse the tasks and items of the exam papers of each of the

skills assessed and the rubrics (if any) which the examiners use for their assessment.

2) Structure data: all the data will be structured in recording tables.


3) Describe their structure, type and effectiveness: The tasks and items will be

described together with their objectives; the grading scales will be classified

according to different criteria and the effectiveness of both exams and rubrics will

be discussed.

4) Discover concepts, categories and patterns: The conclusions will make it possible

to check whether certain patterns exist according to the skill which is being

assessed. Those patterns will be ascertained with the help of a rubric created

specifically for this purpose.

5) Understand the context: all the data will be understood within the Common

European Framework context.

6) Rebuild facts: all the data gathered in the recording tables will be collected in a

new comparison rubric created specifically for this task.

7) Connect the results with knowledge: the conclusions obtained will be connected

to the knowledge available on assessment of foreign languages and didactics of

language teaching and learning.

8) Generate a theory based on data: The main errors presented both in the exam

papers and rubrics will be stated so that a number of future amendments can be

carried out.

The table below summarises the methodological approach:

- Investigation type: basic or theoretical (study of the European Framework together with assessment instruments, i.e. rubrics).

- Research type: qualitative (the data will be described and analysed through words).

- Instruments: analysis of documents (rubrics and exam papers).

The chart below illustrates the methodological purposes of the investigation as a sequence:

Study (evaluation theory, rubrics theory, European Framework for Languages guidelines) → Selection (English Certificates, rubrics, exam papers) → Analysis of documents (tables) → Results (placement of the data analysed in a rubric for the comparison and detection of patterns) → Conclusions (reflection, possible amendments).

This chart shows the methodological purposes of the investigation. First, the selection of

the English Certificates (with their respective exam papers and rubrics) based on the

previous study of evaluation and rubrics theory and the European Framework of

Reference for Languages guidelines. Second, the analysis of documents through the

recording tables created. Third, the arrangement of the data analysed in a rubric built for

the identification of patterns. Fourth, the reflection on the results, the possible error correction and the drawing of conclusions.

5.3. Research design

In order to plan the research, Gil’s scheme (284-287) will be followed. Thus, the first step

is to define the objectives and context of the research and also to determine the

documents which should be analysed.

1) Objective and context

2) Define the units of analysis

3) Number scheme rules

4) Categorisation

5) Codification

6) Reliability and validity

7) Data analysis

5.3.1. Objectives and context

The first step is the establishment of the objectives of the research and the context.

The objectives of the research are the following:

- Analyse the exam papers of the main English certificates in Spain to determine

whether or not they follow the European Framework of Reference for

Languages guidelines in regard to the assessment of each skill.


- Analyse the exam tasks in order to check whether or not they measure what

they are supposed to measure.

- Analyse the rubrics used for the assessment of the exam papers in order to

check their effectiveness and validity.

- Determine which types of rubrics are more common and whether some

patterns according to skills can be established.

The context in which the current research can be established is the detailed study of a

large body of theory related to the assessment and evaluation process which is gathered

in the literature review section (chapter 2), the study of the Common European

Framework of Reference (CEFR) (chapter 3) and the study of the theory of rubrics

(chapter 4). The information from all these three areas will determine the foundations on

which the current research is designed and established.

5.3.2. Definition of the units of analysis

The second step is the definition of the units of analysis. The documents which will be

employed for this research are the exam papers of the main English certificates and the handbooks provided by their institutions, as well as the grading scales which they use for the assessment of the different tasks or papers (if any).

5.3.2.1. Selection of English Certificates of ESL

In the current globalised world, English has achieved an important role as a lingua franca.

Hence, speaking fluent English has become imperative in many fields such as the

academic or economic ones. Furthermore, the assertion that someone can speak English

in an interview or a CV is no longer enough. As a result, the certification of the English


language level is nowadays a common requirement in order to apply for some jobs, to

study in some international universities or to aim for certain public positions. Roca Varela

and Palacios also describe how

“many Spanish universities, where, as a consequence of the Bologna Declaration

on the European Space of Higher Education (ESHE), undergraduates and

sometimes also graduate students need to show that they possess at least a B1

level of a foreign language, in most cases English, when they graduate” (55)

There is a wide range of English Certificates of ESL which aim to certify the level of English their candidates actually have. The most well-known certificates in Spain are the Cambridge University ESOL Certificates and IELTS, the Trinity College ISE, CERTACLES, and Spain's Official School of Languages certificates. The criteria for the selection of these certificates and not others respond to several factors. First of all, since the current research is contextualised under the European Framework, all the certificates should be valid in Europe and should be based on the CEFR and its levels. This criterion ruled out other famous English Certificates of ESL such as the TOEFL. The second criterion applied is connected to the popularity and acceptance of

the certificates in Spain. According to this criterion, the English Certificates of ESL

selected are the most highly regarded and accepted by most institutions such as

universities or the Spanish government (Ministerio de Trabajo, Migraciones y Seguridad

Social).

All those tests assess the different skills separately through different papers or

performance tasks. In Roca Varela and Palacios' words, they:

“measure the ability of non-native speakers to understand and use English in real-life

settings by examining their competence to understand and produce written and spoken


English. Examinees are generally given an overall mark according to their level of

performance on the whole range of tasks included in the tests” (55).

Besides the selection of certificates, it is relevant to comment on the Framework level

chosen. It has already been said that the tests chosen are based on the CEFR levels of

competence. Therefore, all of them except the IELTS assess the aforementioned levels

with separate certificates. This means that the candidate decides which level of

certification he or she wants to be assessed in. If the test is passed, he or she is granted the certification of the corresponding level. In contrast, if the candidate does not pass the

test, he or she does not receive any certification. In the case of the IELTS, there is only

one test whose results determine the level of competence the candidate has (from an A1

to a C2). In consequence, the candidate will always obtain a certificate after the

examination.

Concerning this research, it has been decided that the exam papers and rubrics analysed

correspond with the B2 level. This level is upper-intermediate, and it is the most

commonly required by companies and institutions to work or study. According to the

CEFR, a B2 user:

“Can understand the main ideas of complex text on both concrete and abstract

topics, including technical discussions in his/her field of specialisation. Can

interact with a degree of fluency and spontaneity that makes regular interaction

with native speakers quite possible without strain for either party. Can produce

clear, detailed text on a wide range of subjects and explain a viewpoint on a topical

issue giving the advantages and disadvantages of various options.” (24)

At this level, the user is already independent and is able to express himself or herself, as well as to understand, with a certain degree of ease on a broad range of topics.


All the certificates chosen use a rubric to score some of their papers and/or tasks:

Cambridge ESOL: First Certificate (FCE)
- Writing paper: one rubric for the assessment of all the tasks.
- Speaking part: one rubric for the assessment of all the tasks.
- Reading paper: no rubric.
- Listening paper: no rubric.

IELTS
- Writing paper: two different rubrics, one per task.
- Speaking part: one rubric for the assessment of all the tasks.
- Reading paper: no rubric.
- Listening paper: no rubric.

Trinity College London: ISE II
- Writing paper: two different rubrics for the assessment of the tasks, one per task.
- Speaking part: one rubric for the assessment of all the tasks.
- Reading paper: no rubric.
- Listening paper: one rubric for the assessment of the task (the same used for Speaking).

ACLES
- Writing paper: one rubric for the assessment of all the tasks.
- Speaking part: one rubric for the assessment of all the tasks.
- Reading paper: no rubric.
- Listening paper: no rubric.

EOI
- Writing paper: one rubric for the assessment of all the tasks.
- Speaking part: one rubric for the assessment of all the tasks.
- Reading paper: no rubric.
- Listening paper: no rubric.

In the following sections, how each of the English Certificates of ESL mentioned above assesses the different skills will be analysed, together with the rubrics they use, if any, to score them.

5.3.3. Number scheme rules

Concerning the number scheme rules, it is important to define which principles are going to be used in order to determine which rubrics are effective, valid and reliable, together with the criteria used to indicate whether the exam tasks and assessment criteria correspond to the ones stated by the CEFR.


5.3.3.1. CEFR tasks and assessment criteria

In the CEFR, tasks suitable for the assessment of each skill, together with the general

criteria of assessment for a B2 user in each of the skills are stated. The information

concerning the tasks, objectives and the criteria gathered from each of the certificates will

be compared to the information contained in the CEFR. This study seeks to state whether

they are effectively assessing the skill. The summary of the information contained in the

CEFR for each of the skills can be found below:

In terms of writing, the CEFR states in the global scale provided that a B2 learner "can produce clear, detailed text on a wide range of subjects and explain a viewpoint on a topical issue giving the advantages and disadvantages of various options" (61). Three illustrative scales for the writing skill can be found in section 4.4 of the framework, entitled 'Communicative language activities and strategies'. These scales can be used in order to ease the design of communicative tasks and the evaluation of the writing ability. The creative writing scale indicates that a B2 learner is able to "write a review of a film, book or play" (62), whereas the essay and report scale points out his or her ability to develop an argument by giving reasons and details or explaining the advantages or disadvantages, and to sum up information from different sources.

Tasks:

• completing forms and questionnaires

• writing articles for magazines, newspapers, newsletters, etc.

• producing posters for display

• writing reports, memoranda, etc.

• making notes for future reference


• taking down messages from dictation, etc.

• creative and imaginative writing

• writing personal or business letters

Criteria:

A B2 learner can produce clear, detailed text on a wide range of subjects and explain

a viewpoint on a topical issue giving the advantages and disadvantages

of various options.

Can write a review of a film, book or play.

Can write an essay or report.

Concerning speaking, the normal tasks used to assess speaking are presentations,

description of pictures, interviews, discussions and dialogues between students.

The CEFR points out that the oral production tasks should be (58):

• Public address (information, instructions, etc.)

• Addressing audiences (speeches at public meetings, university lectures,

sermons, entertainment, sports commentaries, sales presentations, etc.)

In addition, it is indicated that the assessment of the speaking skills can include some of

the following tasks: reading a written text aloud; speaking from notes, or from a written

text or visual aids (diagrams, pictures, charts, etc.); acting out a rehearsed role; speaking

spontaneously; singing (CEFR 58).

The CEFR provides the educational community with five different illustrative scales for

the speaking skill levels (58-61). Those scales can be used to assess the level of the:

• Overall spoken production

• Sustained monologue: describing experience


• Sustained monologue: putting a case (e.g. in debate)

• Public announcements

• Addressing audience

The criteria for the overall spoken production at B2 level state that a B2 learner "Can give

clear, systematically developed descriptions and presentations, with appropriate

highlighting of significant points, and relevant supporting detail” and he or she “can give

clear, detailed descriptions and presentations on a wide range of subjects related to his/her

field of interest, expanding and supporting ideas with subsidiary points and relevant

examples.” (CEFR 60). In addition, some other specifications for particular functions

such as describing an experience or doing a debate can be extracted from the other rubrics

provided. Thus, a B2 learner must be able to describe with details different topics, give

reasons and support them in discussions and debates or highlight the advantages or

disadvantages of different options. He or she has the ability to give presentations with

clarity and fluency and depart spontaneously from those when follow up questions are

posed, or interesting points raised by the audience.

Tasks:

• public address (information, instructions, etc.)

• addressing audiences (speeches at public meetings, university lectures, sermons, entertainment, sports commentaries, sales presentations, etc.)

• reading a written text aloud

• speaking from notes, or from a written text or visual aids (diagrams, pictures, charts, etc.)

• acting out a rehearsed role

• speaking spontaneously

• singing

Criteria:

• clear, systematically developed descriptions and presentations

• supporting ideas with subsidiary points and relevant examples

• describing an experience

• describe with details different topics

• give reasons and support them in discussions and debates

• highlight the advantages or disadvantages of different options

• give presentations with clarity and fluency

• depart spontaneously from discussion when follow-up questions are posed

As for reading, comprehension is most commonly assessed through multiple choice questions, true or false questions, sentence completion, open questions, gapped texts or summaries. The CEFR indicates that the tasks selected should focus on (69):

• reading for general orientation;

• reading for information, e.g. using reference works;

• reading and following instructions;

• reading for pleasure.

The language user may read:

• for gist;

• for specific information;

• for detailed understanding;

• for implications, etc.

According to the CEFR, the reading global scale provided states that a B2 candidate

is able to:

“read with a large degree of independence, adapting style and speed of reading to

different texts and purposes, and using appropriate reference sources selectively.

Has a broad active reading vocabulary but may experience some difficulty with

low frequency idioms” (69).

In addition, the CEFR contains four other reading scales:

• reading correspondence

• reading for orientation

• reading for information and argument

• reading instructions

The most important criteria comprised in those scales for the B2 users include the ability

to read correspondence, follow a text of instructions, understand specialised articles and

reports related to current issues. Among the functions that the candidate must be able to perform are scanning through long and complex texts, finding relevant details and identifying the most important information, ideas or opinions (CEFR 69-71).

Tasks:

• reading for general orientation

• reading for information, e.g. using reference works

• reading and following instructions

• reading for pleasure

• for gist


• for specific information

• for detailed understanding

• for implications, etc.

Criteria:

• read with a large degree of independence, adapting style and speed of reading to

different texts and purposes, and using appropriate reference sources selectively

• ability to read correspondence

• follow a text of instructions

• understand specialised articles and reports related to current issues.

• scan through long and complex texts

• find relevant details

• identification of the most important information, ideas or opinions

Finally, regarding the most common tasks used for the assessment of listening, they are

multiple choice questions, true or false listening tests, open questions and sentence

completion exercises.

The CEFR lists the following listening tasks to assess this skill (65):

• listening to public announcements (information, instructions, warnings, etc.)

• listening to media (radio, TV, recordings, cinema)

• listening as a member of a live audience (theatre, public meetings, public lectures,

entertainment, etc.)

• listening to overheard conversations, etc.


In the CEFR there are five different scales to assess the listening skill. In the overall one,

it is pointed out how a B2 user:

“Can understand the main ideas of propositionally and linguistically complex

speech on both concrete and abstract topics delivered in a standard dialect,

including technical discussions in his/her field of specialisation. Can follow

extended speech and complex lines of argument provided the topic is reasonably

familiar, and the direction of the talk is sign-posted by explicit markers” (66)

Besides this scale, four others are provided for:

• Understanding interaction between native speakers

• Listening as a member of a live audience

• Listening to announcements and instructions

• Listening to audio, media and recordings

Those scales provide the main criteria which a B2 user must master in relation to the

listening skills. The criteria encompass the ability to keep up with a conversation,

understand much of what is said in a discussion in which he/she is participating and be

able to participate, understand and follow lectures, talks and reports with academic

vocabulary, announcements and messages, and radio documentaries or broadcast audio. The B2 user can also identify viewpoints and attitudes of different speakers (CEFR 66-68).

Tasks:

• listening to public announcements (information, instructions, warnings, etc.)

• listening to media (radio, TV, recordings, cinema)

• listening as a member of a live audience (theatre, public meetings, public lectures,

entertainment, etc.)


• listening to overheard conversations, etc.

Criteria:

• main ideas of propositionally and linguistically complex speech

• can follow extended speech and complex lines of argument

• keep up with a conversation

• understand much of what is said in a discussion in which he/she is participating and

be able to participate

• understand and follow lectures, talks and reports with academic vocabulary,

announcements and messages and radio documentaries or broadcast audio

• identify viewpoints and attitudes of different speakers

5.3.3.2. Criteria to create an appropriate rubric or to determine which ones are effective

To begin with, it can generally be stated that a good rubric must be valid, reliable,

effective and relevant. Nevertheless, those factors are commonly absent in many of the

rubrics contained in some textbooks or available on the internet. It is essential that the

teacher or examiner bears all these criteria in mind when he or she builds the rubric, chooses it or supervises the one created by the students.

Whether the criteria included in the rubric are pertinent in order to determine if the student has achieved a specific skill or has completed the task successfully is a key factor to verify if the rubric is relevant. Another important issue is whether the task or what is

being assessed will really be useful for the learner in his or her future career.


Colin Phelan and Julie Wren argue that “validity refers to how well a test measures what

it is purported to measure" (para. 12); it is the accuracy of an assessment. In order to achieve validity, the goal of the assessment must be clear and very well defined and set. There are many different types of validity:

- Face validity: determines if the items assess the desired construct.

- Construct validity: is used to ensure that the rubric is actually measuring what

it is intended to measure, and no other variables.

- Criterion-Related validity: it consists in comparing the rubric with another one.

- Formative validity: it refers to the outcomes and tries to check to what extent

the rubric provides data to improve the learning program.

- Sampling validity: “ensures that the rubric covers the broad range of areas

within the concept under study” (para. 24).

It is also good advice to make sure the aims are very clear and operationalised. With the purpose of checking that the rubrics match the evaluation standards and criteria, it is important to try to get the students involved in the creation or choice of the rubric and to get them familiarised with it, and, if possible, to compare the rubric with other similar rubrics.

Validity may be threatened by several factors, such as poorly defined constructs, an

unsuitable selection of the task or an inappropriate context for the performance. The

teacher must make sure that the rubric chosen or created is adequate for what is going to

be measured.

Reliability is, according to Phelan and Wren, “the degree to which an assessment tool

produces stable and consistent results" (para. 1). This means, for instance, that a reliable rubric will score a text with the same mark if it is used twice with a one-month interval in between, or that two different teachers will score it with the same mark.

As in the case of validity, there are also different types of reliability. The first type is

called test-retest reliability. A rubric should score the same test with the same mark or a

similar one if it is used twice. Another sort of reliability is known as parallel forms and

consists of assessing the same test with two different rubrics which contain similar

descriptors and criteria. The score should be similar in order to obtain a high reliability

rate. When there is more than one examiner using the same rubric, the inter-rater

reliability can be checked. Finally, internal consistency reliability intends to evaluate to what extent different items of a test yield the same results. Phelan and Wren point out two subtypes: average inter-item correlation, "obtained by taking all of the items on a test that probe the same construct, determining the correlation coefficient for each pair of items, and finally taking the average of all of these correlation coefficients" (para. 10), and split-half reliability, which consists of dividing in half the items that measure the same skill: "the entire test is administered to a group of individuals, the total score for each "set" is computed, and finally the split-half reliability is obtained by determining the correlation between the two total "set" scores".
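The split-half procedure just described can be sketched numerically. The item scores below are invented, and the final Spearman-Brown correction is a common additional step for estimating full-test reliability, not something Phelan and Wren are quoted as prescribing here:

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Rows = candidates, columns = six items scored 0-5 that measure the same skill.
scores = [
    [4, 5, 3, 4, 4, 5],
    [2, 3, 2, 3, 2, 2],
    [5, 5, 4, 5, 5, 4],
    [3, 2, 3, 3, 3, 2],
    [1, 2, 1, 2, 1, 1],
]

odd_totals  = [sum(row[0::2]) for row in scores]   # totals for items 1, 3, 5
even_totals = [sum(row[1::2]) for row in scores]   # totals for items 2, 4, 6

r = pearson_r(odd_totals, even_totals)             # split-half correlation
spearman_brown = 2 * r / (1 + r)                   # estimate for the full-length test

print(round(r, 3), round(spearman_brown, 3))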

It is essential to check how effective a rubric is with the purpose of improving the

teaching-learning process. A truly effective rubric must provide students with feedback

on what they need to improve and how to do so, what they have done well and also

provide the teacher with information about the whole process and how to address the

problems detected.

Relevance is another important factor. It is necessary to be sure the rubric descriptors and

criteria are really relevant to assess a particular skill or task. In addition, the task must be


genuinely useful for the learner’s development. The teacher has to ask himself/herself

whether what is being assessed is significant for the students, if it will help them in their profession or if it really proves the mastery of a skill.

Popham proposes a rubric to evaluate rubrics. It is a very simple rubric with four

evaluative criteria which correspond to the following questions:

-Is the skill assessed actually worthwhile?

-Are the scoring criteria few but correctly labelled?

-Are the degrees of excellence described appropriately?

-Is the rubric presented in a clear and handy way? (27-28).

In addition, as has already been noted, the Common European Framework of Reference for Languages includes certain guidelines on how to construct good grading scales or checklists to use as assessment tools. The guidelines given highlight the necessity of building a feasible tool, so it is not advisable to build a rubric which contains more than five criteria to assess. With regard to the descriptors, they should be positively worded and brief, and they should avoid vagueness.

5.3.4. Categorisation and codification

The fourth and fifth steps are categorisation and codification. The instruments chosen for the research are recording tables. These charts have been designed specifically for this study, bearing in mind that the information they contain is going to be compared, examined and analysed. Accordingly, they were considered a helpful visual instrument which will be used in order to summarise and categorise the data extracted from the different exams and rubrics.

The recording table below has been designed in order to compare the tasks which each

paper contains with the ones proposed by the CEFR. Moreover, it allows the comparison

between the criteria stated to assess these tasks.

[English certificate name]
- Tasks: [tasks which form the paper]
- Criteria: [criteria for the assessment of the paper's tasks]

CEFR
- Tasks: [quotation of the tasks recommended by the CEFR for the assessment of the skill]
- Criteria: [criteria stated by the CEFR for the assessment of the skill at a determined level]

In order to classify the rubrics used by each of the certificates in the different skills, the

following chart has been conceived. It seeks to give a detailed classification of a rubric; consequently, it encompasses all the possible classification criteria, so that each rubric will be defined according to all of them.

Type of rubric used by [English certificate name], according to:
- How it is measured
- How it is scored
- Theme
- Application
- Function
- Scorer
- Channel

Moreover, the grading tables suggested by the Framework will be compared with the

rubrics used in the certificates according to the descriptors and criteria employed.

Finally, one recording table was designed to allow the summarisation of all the data

extracted and analysed. This chart is merely a compendium of the two previous ones, together with a table which summarises Popham's rubric for rubrics.

EXAM
- Tasks:
- Match CEFR tasks:
- Match CEFR criteria:

RUBRIC
- Type: measurement / scoring / theme / application / function / scorer / channel
- Relevant
- Valid
- Reliable

CEFR criteria
- Feasible
- Descriptors: positive / brief / not vague

Popham's rubric
- Skill worthwhile
- Scoring criteria (few and well labelled)
- Descriptors (well described)
- Clear and handy?

5.3.5. Reliability and Validity

With regard to the reliability and validity of each of the exams, the limitations of the current research made it impossible to carry out a new case study, as this would imply a whole new and extensive piece of research. Therefore, the data used to measure reliability and validity are those provided by the institutions themselves, based on studies and research carried out previously. The main coefficients used will be Cronbach's alpha and the standard error of measurement (SEM).
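Since the present research works from the coefficients published by each examining body rather than from raw item-level responses, the following short Python sketch is included purely as an illustration of how the two indices are conventionally obtained from a candidates-by-items score matrix. The function names and the small data matrix are hypothetical and do not correspond to any of the certificates analysed.

import numpy as np

def cronbach_alpha(scores):
    # scores: array of shape (candidates, items)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)        # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)    # variance of candidates' total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

def standard_error_of_measurement(scores, reliability):
    # SEM = standard deviation of the total scores * sqrt(1 - reliability)
    total_sd = scores.sum(axis=1).std(ddof=1)
    return total_sd * np.sqrt(1 - reliability)

# Hypothetical example: five candidates answering four dichotomously scored items.
scores = np.array([
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
])
alpha = cronbach_alpha(scores)
sem = standard_error_of_measurement(scores, alpha)
print(f"alpha = {alpha:.2f}, SEM = {sem:.2f}")

The closer the alpha is to 1, the more internally consistent the test section; the SEM then translates that consistency into the score units of the scale itself.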

5.3.6. Data Analysis

Finally, the data obtained and classified will be studied in order to establish comparisons and patterns among the tests and scales analysed. For this purpose, and with the aim of easing such comparison, some specifically designed recording tables will be used.
One recording table to compare all the certificates per skill (four tables in total, one per skill):

CEFR (corresponding skill)
  Tasks:
  Criteria:

English Certificate | FCE | IELTS (Band 6) | ISE II | ACLES | EOI
Time
Nº of tasks
Word length
Rubric?
Match CEFR tasks
Match CEFR criteria

One recording table in order to compare the analysis of the certificates in relation to the

suitability of their exam papers:

FCE | IELTS (Band 6) | ISE II | ACLES | EOI
Nº of exam papers
Match CEFR tasks: Writing / Speaking / Reading / Listening
Match CEFR criteria: Writing / Speaking / Reading / Listening
Reliability: Cronbach's alpha / SEM

One recording table which compares the certificates in relation to their rubrics (two in total: one for the comparison of the writing rubrics and another for the comparison of the speaking rubrics):

SKILL | FCE | IELTS | ISE II | ACLES | EOI
Type: Measurement / Scoring / Theme / Application / Function / Scorer / Channel
Relevant
Valid
CEFR: Feasible; Descriptors: Positive / Brief / Not vague
Popham's rubric: Skill worthwhile / Scoring criteria (few + well labelled) / Descriptors (well described) / Clear + handy?

5.4. Hypotheses

Taking into account the objectives of this thesis and the research which has been

explained and designed, it is fundamental at this stage to state what the hypotheses are:

H1.- The exam papers of the main English certificates in Spain follow the guidelines given by the Common European Framework of Reference for Languages.
H2.- The productive skills will be assessed with a rubric, while the assessment of the receptive skills will lack one.

H3.- Quantitative rubrics will be predominant in the main English Certificates.

H4.- The rubrics used will not be entirely valid, effective or reliable according to the

criteria stated by the CEFR.

H5.- The rubrics used will not be entirely effective or valid according to the criteria for

good rubrics.

H6.- Certain patterns could be extracted from the analysis of the different rubrics.
The above-mentioned hypotheses need to be tested through detailed research in order to check whether or not they are correct. Hence, the research design previously described is essential for accomplishing such a meaningful task. The next chapter presents the research carried out.


Chapter 6: RESEARCH

6.1. Proficiency Test exam papers for the different language skills and their

assessment rubrics

In the following sections, the main exams used to certify English competence in Spain will be analysed. Special attention will be paid to:
• The types of task contained in each paper and whether those tasks are recommended by the CEFR.
• Whether the assessment criteria match the criteria indicated by the European Framework.
• The rubrics used to assess those skills (if any), with the intention of finding out:
1. The types of rubric used, classified according to the different criteria.
2. Whether or not they are suitable.
3. Which types are the most common for each skill.
4. Whether the use of the rubric implies a change in methodology or in traditional assessment tasks.
5. Whether they are adjusted to the CEFR criteria.
6. Their validity.
7. Their relevance.
8. Their reliability.

As has been explained above, rubrics, grading scales or evaluation matrices are powerful tools for assessing whether a student has achieved the learning standards of a certain level, his or her proficiency in a subject, topic or language, his or her work and behaviour throughout a course, or his or her progress. In spite of the fact that they can be implemented in any subject, their use in the assessment of foreign languages is the main focus of this thesis.

As a result, it makes sense to explore the use of rubrics for the assessment of the four

different skills: speaking, writing, reading and listening. Even though it is true that rubrics

have been traditionally used for the assessment of productive skills such as speaking and

writing, it is also possible to assess the so-called receptive skills (i.e., reading and

listening) using rubrics.

6.2. Writing

Rubrics were originally created for the scoring of writing compositions. This is probably

the reason why grading scales for writing are the most diversified. Writing is commonly

assessed through the elaboration of a composition by the learner. Those compositions or essays are usually within a range of 40 to 220 words (in the English certificates analysed). Traditional tasks encompass the creation of a specific kind of text. Those text types correspond to formal or informal letters or emails, articles, reports, complaints or opinion essays (in the English certificates analysed). The CEFR refers to the

following types of tasks to assess this skill:

“completing forms and questionnaires; writing articles for magazines, newspapers,

newsletters, etc.; producing posters for display; writing reports, memoranda, etc.;

making notes for future reference; taking down messages from dictation, etc.; creative

and imaginative writing; writing personal or business letters, etc.” (61)

6.2.1. Literature review


Writing has been one of the main skills taught and assessed in language evaluation over the last few centuries. Ezza stated that "since the early 1870s, academic circles

in English-speaking countries, notably in the United States, have been attaching

heightened significance to writing instruction” (186).

A simple analysis of the general assessment practices enables the verification that writing

is present, either directly or indirectly, in most evaluations. It is obvious that the

assessment of writing in a Foreign Language or in a native language course implies the

student’s demonstration of his or her writing production and abilities. On the other hand,

it might not be an immediate first thought that any kind of paper assessment presupposes

the test-taker uses writing. For instance, in a biology exam the teacher is assessing the

student’s knowledge of the cell. Nevertheless, the student must prove that knowledge by

explaining and describing the way in which a cell works through words. If the student

does not explain himself or herself correctly, the teacher might consider that he or she

does not know the content, so his or her ability to write is somehow being assessed. This

is because traditional examination practices, still the most common ones in our

educational system, are carried out through paper-based exams which require learners to

prove their knowledge on a certain subject, issue or area by producing a written answer.

As a result, it can be concluded that “in educational settings, writing is the basis upon

which a candidate’s achievement, learning and intelligence are judged” (Ghalib 225). For

instance, the student’s knowledge of photosynthesis in a biology exam does not just

depend on his or her understanding of the process, but also on his or her ability to explain

it clearly and correctly to the teacher in the exam. For this reason, writing is somehow

being assessed as well.

Despite its importance, writing is a skill that is still not mastered by most students, whether native or non-native speakers. According to some researchers, such as Allen et al., "students in the US struggle to reach proficiency levels throughout their high school

years” (125). This may be due to the complexity and difficulty that the process of writing

implies since it “requires individuals to coordinate a number of cognitive skills and

knowledge sources, such as goal setting, discourse awareness, memory management

strategies and social cultural knowledge” (Flower and Hayes cited in Allen et al. 125).

L1 and L2 writing processes are similar as both imply the setting of goals and the

translation of ideas into words (125). However, they show differences in lower-level

processes. L1 writers' syntactic constructions and lexical access are largely automatised, while they are highly demanding tasks for L2 writers (De Keyser cited in Allen et al. 126). Consequently, non-native speakers spend less time on higher-level processes because of the emphasis they need to put on the tasks that native speakers perform automatically. Furthermore, L2 learners must tackle some extra constraints when writing; for instance, the translation they may mentally attempt when they write in their L2. Some studies related to this issue suggest L2 writings often include more t-units but fewer modifiers, subordinate clauses and cohesive mechanisms (Silva cited in Allen et al. 126).

With regard to evaluation of the writing skill of L2 learners, it is essential to understand

the difficulty and complexity of the process since it “requires accounting for multiple

factors to ensure a fair and accurate judgement of writer’s abilities” (Veerappan and

Tajularipin 143). As for the criteria that must be considered for the assessment of this

skill, there are several. Even though these criteria may be referred to with a different name

or label, the truth is that they are assessing the same aspects, as will be illustrated below:

the Somerset Local Educational Authority (LEA) in the United Kingdom takes into

account eight criteria: originality, vocabulary, elaboration, organization, syntactic

agreement, spelling, handwriting and layout (Wilkinson cited in Ezza 187). On the other


hand, the Australian Curriculum Assessment and Reporting Authority (ACARA) uses ten

scoring criteria: audience, structure, persuasive devices, cohesion, text, ideas, vocabulary,

paragraphing, sentence structure and spelling (cited in Ezza 187) and the City University

of New York relies on only five: critical response to the writing task; development of

ideas; structure of the response; sentence, word choice and grammar; and usage and

mechanics (187). According to Polio (cited in de Haan and Van Esch 2-3), the features to

review in the text are the following: overall quality (based on the linguistic accuracy),

syntactic complexity (variety of t-units and elaborate language structures), lexical

features, content (interest, referencing and argumentation), mechanics (spelling,

punctuation and capitalisation), coherence and discourse (organisation and emphatics and

cohesive devices), fluency and revision. All of the aforementioned examples illustrate

briefly the diversity of sets of criteria that may be used to evaluate the writing skill but

also the clear similarities among all the sets.

Bearing in mind the different criteria which can be used for the assessment of writing,

rubrics seem to be an appropriate tool which, as Mark Brook states “enables an evaluator

to convert a given quality of student work into a letter grade, percentage or level” (cited

in Frydrychova 391). The general advantages of using a rubric have already been stated

in the chapter devoted to rubrics. As for the advantages related specifically to the use of

rubrics for the assessment of writing, Ezza highlights they “ensure greater score validity”

(187). Concretely, holistic rubrics allow an “authentic reaction of the rater” whereas

analytic ones identify “writers’ strengths and weaknesses” (187). Ezza also warns about

some of the cons: holistic rubrics cannot give precise diagnostic information, and they focus and rely too much on the scorer rather than on the text. For their part, analytic rubrics might have other problems. For instance, the analytic rating of one criterion may influence the rating of

the other criteria. That is, the fact that one text contains many grammar mistakes does not


necessarily mean that the structure, cohesion, or range of vocabulary are not excellent.

However, this fact may affect the rater’s perception. Some other critics suggest that

rubrics may be “inhibiting in terms of their creativity” or that rubrics “describe minimal

standards rather than high standards” (432).

Some interesting studies on the use of rubrics for the evaluation of writing have been

carried out in recent years. Wang conducted an inquiry during an EFL writing course at a Chinese university about the effects of rubrics on the self-assessment of students' writings. The research consisted of six essays and several interviews with six informants among the 80 students who were doing the writing tasks. The students were shown one rubric in the first class, together with an explanation and samples which would fit each of the levels of the scale. During one lesson, the students were asked to write an essay, which was photocopied by the teacher afterwards. Later, the students had to assess their own essays.

In the following lesson, students were asked to peer-assess the same writings. Then,

students were shown both self- and peer-assessments. The same process was carried out

six times. The research demonstrated that the participants “embraced the rubric as an

instructional tool guiding them throughout the forethought, performance and reflection

stages of self-regulated learning” (9).

In an article for Assessing Writing, Sundeen noted the results of a study he carried out in

a high school in the western United States. A total of 89 learners were divided into three different

groups, each exposed to different conditions. One group could see the six-point rubric

they would be assessed with and it was also explained to them; the second group could

see the rubric, but no explanation was given to them; and the final group received no

instruction related to the assessment tool. The rubric which was used to assess the

students’ essays before and after the research measured the organisation, word choice,

ideas, sentence fluency and conventions of the persuasive essays. It was found that those students who had seen and studied the rubric obtained improved results. However, no significant differences were found in terms of the number of paragraphs, sentences and words, which was contrary to what was expected. Concerning the group which was only shown the rubric some minutes before the writing, it was observed that their writing quality was better than when they could not see the rubric, although the improvement was not as significant as in the group where the rubric had been shown and commented

on beforehand (78-88).

Inspired by Sundeen’s research, Becker carried out a similar study. He decided to

compare the performance of learners on a summary writing task and the impact that the involvement of students in the creation of the rubric could have on their results. A total of 96 ESL students with a high-intermediate level of English took part in the research. Three groups were labelled as treatment groups and one as the control group. There was no overall difference in the English proficiency levels of the groups. The first group, referred to as "A", participated in a workshop which lasted for 60 minutes. The aim of the workshop

was to develop a scoring rubric. With this intention, students were shown good and poor

examples of rubrics and they were asked to think about how a summary could be assessed.

Afterwards, they had to list the criteria of a demonstrably effective summary. Later, the

criteria were discussed in order to articulate levels of quality, and finally they were tasked

with the creation of a draft rubric. Furthermore, the revised version of the rubric was

posted so that it would be available for everybody. Class “B” did not create a rubric but

could see and discuss five different benchmark essays and, afterwards, they had the

opportunity to score 10-15 summaries from students of the other three groups. These

summaries were the ones which had achieved the same score from three professional

raters. Group “C” just had the chance of seeing the rubric 15 minutes before completing

the summary task. Group "D" was only asked to complete the task without any kind of access to the rubric. The results obtained by the research were in line with Sundeen's findings. Although the pre-test scores of the four groups were very similar, it was

found that the results of those students in group A in the post-test were significantly higher

in comparison to the other classes (B, C and D). Moreover, the results obtained by class

B learners were higher than the results achieved by the learners in groups C and D. As a

result, Becker concluded that the fact of providing the learners with a rubric is not enough

to improve their writing performance. It is therefore necessary to involve the students in

the creation of the rubric or, at least, to explain it to them and provide them with different

models and examples in order to improve their scores (15-24).

Simona Laurian and Carlton J. Fitzgerald tried to probe, through a small case study, the differences between students who had and had not been shown the rubric. The students had to do

two writing assignments, the second one with a copy of the rubric that would be later used

by the scorer. In addition, they answered a survey which consisted of fifteen questions

divided into three categories: the use of rubrics, standards, and positive and negative effects

of the rubrics. The students answered with a five-point rating scale. The results of the

case study were quite significant, as the average score of the students improved from 86.83 to 90.86 in the second essay. Eighteen of the students got a higher mark

in the second assignment. With regard to the survey, the most remarkable aspect is that

20 students out of 21 acknowledged that they had taken the rubric into account in order to complete their writing task when they had been allowed to see it. 19 out of 21 considered that

rubrics were useful for self-assessment before handing in an essay. 14 disagreed with the

statement that a rubric "stifles my creativity", which rejects the argument that students "found rubrics inhibiting in terms of their creativity" (432). The above-summarised studies

by Wang, Sundeen, Becker and Laurian and Fitzgerald illustrate the importance of giving

ESL learners access to the rubrics, involving them either in their creation or through the


explanation of their criteria and the illustration of students with examples of good and

bad models which would fit each of the scale levels.

Other studies on the use of rubrics as an assessment instrument deal with the use of

automated scoring systems. Perin and Lauterbach pointed out that those “cannot yet

interpret the meaning of a piece of writing, identify off-topic content, or determine

whether it is well argued". Nevertheless, they worked on the use of Coh-Metrix automated scoring in order to improve such systems. This system focuses on cohesion and the different linguistic variables or "cohesive cues" which are, as McNamara et al. argue, what "would be expected to characterize student writing judged to be of high quality" (62 cited

in Perin and Lauterbach). The cohesive cues used by different automated systems may

include connectives, lexical overlap, logical operators, causal cohesion, semantic co-

referentiality, anaphoric reference, polysemy, hypernymy, lexical diversity, word

formation, syntactic complexity, syntactic similarity and basic text measures. The

research drew on a corpus of persuasive essays and summaries from Mississippi State University, together with others written by low-skilled adults at community

colleges. All the essays were scored first by experienced human raters using a six-point

holistic rubric for the essay and a 16-point analytic scale for the summary. Some of the

criteria included in the rubric were the following: critical thinking, use of examples,

reasons provided, organization, coherence, accuracy of grammar and usage of mechanics.

The inter-rater reliability was r=.85. Afterwards, they were scored by the automatic rater

in order to find out whether the system could distinguish between trained and less-trained

students. The system made use of significant predictors in order to attempt to detect those

persuasive essays written by trained students. These predictors were the number of words

before the main verb, the textual lexical diversity and the CELEX logarithm frequency including all words. Using these three predictors, the Coh-Metrix system was unable to distinguish between the two groups of essays. However, when the researchers decided to use a Coh-Metrix system based on 52 different predictors, they found that it was capable of showing a number of significant differences between high and low essays in the lexical diversity and argument overlap predictors. As for the written summaries, the Coh-Metrix system found

differences in the content word overlap, adjacent sentences, proportional, standard

deviation and lexical diversity predictors.

A large number of studies outline different kinds of rubrics in order to find the most

reliable ones or the features essential for building an effective one. Ghalib and Al-Hattami

carried out a case study with 30 students of English at the Faculty of Arts of Taiz

University to compare the assessment with holistic and analytic rubrics. The case study

consisted of assessing a descriptive essay with a holistic rubric and with an analytic one.

Three raters were trained previously in a two-hour session in which the rating system was explained to them, they were advised on how to avoid bias, and the most common rating problems were discussed. The raters had to assess the 30 writing samples in two separate sessions. The

first time, they used a 6-point holistic grading scale, whereas in the second session (which

took place one month later) they used an analytic rubric with the following criteria:

content, cohesion, syntactic structure, vocabulary and mechanics. Furthermore, the

analytic rubric had several well-defined standards of performance points. This case study

allowed the researchers to measure the standard deviation of the scores, this being 3.12

with the holistic rubric and 2.82 with the analytic one. They also observed that the analytic

rubric was more rigorous according to a t-test. In addition, the analysis of variance showed

no significant differences among the three raters and the confidence interval using the

holistic rubric and the analytic one was 95%. As a result, the researchers concluded that,

using analytical scoring rubrics, raters "give lower scores than when using holistic scoring

rubrics” (230) but the analytic ones provide “more consistent scores” (131). It is also


worth highlighting that Ghalib and Al-Hattami’s article for English Language Teaching

contains a list of effective rubric features such as a "well-defined list of criteria for test-

takers to know what is expected” (226), “standards of excellence for the different levels

of performance” (226), “gradations of quality” based on the degree to which standards

have been met, and “modal exemplars of expected performance at the different levels of

the scale” (226). Moreover, they state that an effective rubric “is the one that is used by

different raters on a given assessment task and generates similar judgements/scores”

(226).
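The comparison of holistic and analytic scores for the same set of essays described above can be reproduced in outline with a paired-samples t-test (the study does not specify which test variant was used). The following Python sketch uses invented scores purely to show the mechanics of such a comparison; none of the numbers are Ghalib and Al-Hattami's data.

import numpy as np
from scipy import stats

# Hypothetical scores for the same 30 essays rated twice (holistic vs analytic);
# these values are invented for illustration only.
rng = np.random.default_rng(0)
holistic = rng.normal(loc=70, scale=3.12, size=30)
analytic = holistic - rng.normal(loc=1.5, scale=1.0, size=30)

# Spread of each set of scores (cf. the standard deviations reported above).
print(holistic.std(ddof=1), analytic.std(ddof=1))

# Paired t-test: do the two rubric types yield systematically different scores
# for the same essays?
t_stat, p_value = stats.ttest_rel(holistic, analytic)
print(t_stat, p_value)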

6.2.2. Assessment of Writing in the main English Certificates

The English Certificates analysed showed a certain degree of consensus on the way to

assess the writing skill. Thus, the tasks consist mainly of writing compositions.

Cambridge First Certificate

The Cambridge First Certificate (FCE) writing paper consists of two writing tasks. The

time allowed to take the exam is 1 hour and 20 minutes. In task 1 “candidates are given

input in the form of an essay title to respond to, along with accompanying notes to guide

their writing” (Cambridge English Language Assessment 27). The essay composition

must be 140-190 words long, and in it the candidate will agree or disagree with a given prompt. Candidates must prove their ability to give information, give opinions, give reasons, compare and contrast ideas and opinions, and draw a conclusion in

English. Some other specifications of the criteria include the candidates’ need to express

their ideas in a clear and logical way, the use of a variety of structures and vocabulary and

the appropriate use of cohesive devices and linking words. Task 2 consists of three different task types, from which one must be chosen. The three tasks proposed may be

an article, an informal or formal letter or email, a report or a review. In this part,

candidates “must be able to demonstrate appropriate use of one or more of the following

functions: describing, explaining, reporting, giving information, suggesting,

recommending, persuading” (28). Moreover, they must also adjust themselves to the

specific type of writing requirements in terms of format and register, as well as the use of

cohesive devices, appropriate vocabulary and structures and correct exposition of ideas.

Concerning writing, the CEFR states, in the global scale provided, that a B2 learner “can

produce clear, detailed text on a wide range of subjects and explain a viewpoint on a

topical issue giving the advantages and disadvantages of various options”

(61). Three illustrative scales for the writing skill can be found in section 4.4 of the framework, titled 'Communicative language activities and strategies', in order to ease the creation of communicative tasks and the evaluation of the writing ability. The creative writing scale indicates that a B2 learner is able to "write a review of a film, book or play" (62), whereas the essay and report scale points out his or her ability to develop an argument by giving reasons and details or explaining the advantages or disadvantages, and to sum up information from different sources. According to all this, the tasks proposed

and sum up information from different sources. According to all this, the tasks proposed

by the Cambridge First Certificate would be suitable to assess whether or not a candidate

has a B2 writing level. In addition, the tasks selected would be among those recommended

by the Council to assess writing. The table below summarises the Cambridge First

Certificate’s successful adaptation of the exam tasks to what the CEFR suggests.

FCE | CEFR

Tasks (FCE):
1. Essay
2. Informal letter/e-mail, article, report, formal letter/e-mail or review

Tasks (CEFR):
• completing forms and questionnaires
• writing articles for magazines, newspapers, newsletters, etc.
• producing posters for display
• writing reports, memoranda, etc.
• making notes for future reference
• taking down messages from dictation, etc.
• creative and imaginative writing
• writing personal or business letters

Criteria (FCE):
(Task 1) The candidate must prove his or her ability to give information, give opinions, give reasons, compare and contrast ideas and opinions, and draw a conclusion.
(Task 2) The candidate needs to demonstrate appropriate use of one or more of the following functions: describing, explaining, reporting, giving information, suggesting, recommending, persuading.

Criteria (CEFR):
A B2 learner can produce clear, detailed text on a wide range of subjects and explain a viewpoint on a topical issue giving the advantages and disadvantages of various options.
Can write a review of a film, book or play.
Can write an essay or report.

As for the instrument used to assess this skill, the Cambridge FCE examiner must use one rubric in order to score the two writing compositions. This rubric is analytic because it assesses four different criteria separately. These criteria are: content, communicative achievement, organisation and language. As for the scoring, it is quantitative, since it uses a numeric scale from 0 (minimum) to 5 (maximum). It is a rubric used only to assess the two writing tasks, so it is domain-relevant and skill-focused. The main purpose of these certificates is to determine whether or not the candidate has the B2 level, so it is a proficiency rubric. Although it is not the candidate's teacher who scores the tasks, the examiner takes the teacher's place in order to increase the objectivity of the scoring. Finally, it is a paper rubric even though it can also be found on the Internet.

Type of rubric according to Cambridge FCE

How it is measured Analytic
How it is scored Quantitative

Theme Domain-relevant

Application Skill-focused

Function Proficiency

Scorer Teacher (examiner)

Channel Paper

It is now time to determine whether the rubric used by the Cambridge certificate is

suitable or not. In order to do so, all the aspects included and explained in the section 'Criteria to create a good rubric or to determine which ones are good' will be analysed.

Afterwards, a recording table will summarise all the conclusions of the analysis.


To begin with, Popham's rubric will be employed. The criteria of this rubric for rubrics

correspond to the following questions: Is the skill assessed actually worthwhile? Are there

few scoring criteria but correctly labelled? Are there degrees of excellence described

appropriately? Is the rubric presented in a clear and handy way? (Popham 27-28).

➢ Is the skill assessed actually worthwhile?

Yes. The skill assessed is the writing skill. In the literature review section, the importance

of the writing skill has been fully addressed. Moreover, the CEFR includes the writing

skill among the skills used to determine someone’s language level.

➢ Are there few scoring criteria but correctly labelled?

Yes. The scoring criteria are few (four) as Popham recommends and are correctly

labelled.

➢ Are there degrees of excellence described appropriately?

No. There are multiple cells to describe the different criteria according to the scale. The

descriptors are appropriately described, but in some cases they are somewhat short and

they do not provide examples, which would be recommendable. Furthermore, the

descriptors for bands 4, 2 and 0 are very vague and imprecise, as they merely indicate a

performance between the bands above and below.

➢ Is the rubric presented in a clear and handy way?

Yes. The rubric is clear because it is not excessively long and the design is good and

handy.

With regard to the CEFR recommendations for good rubrics (the necessity of building a

feasible tool, descriptors positively worded, brief and not vague), the Cambridge rubric

for the FCE is feasible since it contains only 4 different criteria. Furthermore, the CEFR's advice on the reduction of criteria has been followed by the grouping of categories under

the same label. For example, under the label “Language” both the grammar and

vocabulary are assessed. The “Organisation” criterion encompasses the cohesion devices

together with the structure and format of the composition. In relation to descriptors, they

are positively worded, as encouraged by the Council of Europe. For instance, band 1 of

the language criteria, which corresponds to one of the lowest levels, states “uses everyday

vocabulary generally appropriately, while occasionally overusing certain lexis. Uses

simple grammatical forms with a good degree of control. While errors are noticeable,

meaning can still be determined.” (34). The descriptors are brief but, as has already been

stated, some of the bands are very vague as they include the indication 'performance shares features of Bands X and X' as the only descriptor.

Finally, are the rubric and the test relevant, valid and reliable? On the one hand, it can be

stated that the criteria the rubric uses are suitable as they are relevant to the skill. In fact,

the Council of Europe uses three of them (communicative achievement, organisation and

language) in the writing scale provided by the CEFR as the Cambridge Handbook for

teachers indicates (33). The validity can also be easily confirmed since the test, with the

help of the rubric, measures what it is supposed to assess, and the descriptors

provided match the evaluation standards included in the CEFR. A simple comparison

between the CEFR writing scale and the FCE writing rubric shows the similarities. (All

descriptors included in the following table have been taken from the Cambridge FCE

handbook 33-34)

Communicative Achievement
  CEFR (B2): Uses the conventions of the communicative task to hold the target reader's attention and communicate straightforward ideas.
  FCE (Band 5): Uses the conventions of the communicative task effectively to hold the target reader's attention and communicate straightforward and complex ideas, as appropriate.
Organisation
  CEFR (B2): Text is generally well organised and coherent, using a variety of linking words and cohesive devices.
  FCE (Band 5): Text is well organised and coherent, using a variety of cohesive devices and organisational patterns to generally good effect.
Language
  CEFR (B2): Uses a range of everyday vocabulary appropriately, with occasional inappropriate use of less common lexis. Uses a range of simple and some complex grammatical forms with a good degree of control. Errors do not impede communication.
  FCE (Band 5): Uses a range of vocabulary, including less common lexis, appropriately. Uses a range of simple and complex grammatical forms with control and flexibility. Occasional errors may be present but do not impede communication.

Reliability

Cambridge Assessment English controls the reliability of its certificates by using Cronbach's alpha (the closer the alpha is to 1, the more reliable the test section is) and also the Standard Error of Measurement (SEM), which shows the impact of reliability on the likely score of an individual: it indicates how close a test taker's score is likely to be to their 'true score', to within some stated probability. The results of these two measures are summarised in the following table, which Cambridge Assessment English publishes on its web page (the standard relation between the two indices is sketched after the table).

Cronbach’s alpha SEM

Reading 0.80 3.61

Writing 0.84 1.39

Use of English 0.84 3.18

191

Listening 0.81 2.16

Speaking 0.84 1.50

Total Score 0.94 2.78
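As a purely illustrative note, the two columns above are linked by the standard psychometric relation (this is textbook psychometrics, not a formula quoted from Cambridge's own documentation):

\[ \mathrm{SEM} = SD_X \sqrt{1 - r_{XX'}} \]

where \(SD_X\) is the standard deviation of the observed scores on the paper and \(r_{XX'}\) is its reliability coefficient, here estimated by Cronbach's alpha. For a given score spread, a higher alpha therefore implies a smaller SEM.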

The above extensive analysis of the Cambridge First Certificate writing exam and the

rubric shows that the exam paper is suitable and matches the CEFR indications and levels, and that the rubric used for the assessment of the exam is good and suitable in most aspects. However, it is too vague because of the omission of certain band descriptors and

the absence of examples. The following table summarises the whole analysis.

EXAM
  Tasks: 2
  Match CEFR tasks: Yes
  Match CEFR criteria: Yes
RUBRIC
  Type
    Measurement: Analytic
    Scoring: Quantitative
    Theme: Domain-relevant
    Application: Skill-focused
    Function: Proficiency
    Scorer: Examiner
    Channel: Paper
  Relevant: Yes
  Valid: Yes
  Reliable: Yes
CEFR criteria
  Feasible: Yes
  Descriptors
    Positive: Yes
    Brief: Yes
    Not vague: No
Popham's rubric
  Skill worthwhile: Yes
  Scoring criteria (few and well labelled): Yes
  Descriptors (well described): No
  Clear and handy? Yes

IELTS

The International English Language Testing System (IELTS) is aimed at measuring the

language proficiency of people who need a certificate to prove their capacity to study or

work in English. IELTS is jointly owned by the British Council, IDP: IELTS Australia

and Cambridge Assessment English. There are two types of tests available: academic and

general training. The former is more oriented to people who wish to study at university

in English while the latter is conceived for candidates who want to either work in English

or study Secondary Education in English. The main difference with other English

certificates, such as the Cambridge ESOL tests (PET, FCE, CAE) or the Trinity ISE, is

that the IELTS does not separate candidates by level; i.e., there are no separate tests to

achieve the certificate of a given level. Instead, all candidates take the same exam and

their results determine their level.

Academic writing is one of the four parts that make up the test. The 60-minute writing

test contains two tasks. In writing task 1, candidates must describe one graph or chart in an academic or semi-formal style. In writing task 2, they must write an essay giving their

opinion about a particular topic.


A sample for this task could be the following:

“The first car appeared on British roads in 1888. By the year 2000 there may be

as many as 29 million vehicles on British roads.

Alternative forms of transport should be encouraged, and international laws

introduced to control car ownership and use. To what extent do you agree or

disagree?” (IELTS web page. Sample Test Questions)

Whereas task 1 focuses on the “ability to identify the most important and relevant

information and trends in a graph”, task 2 “assesses the ability to present a clear, relevant,

well-organised argument, giving evidence or examples to support ideas and use language

accurately” (IELTS web page. Test Format).

As for the compatibility of the IELTS writing assessment with the CEFR guidelines, there

are no references to the description of any kind of graphic, diagram, table or chart as

recommended tasks for the assessment of writing. However, the CEFR recommends the

use of diagrams as a suitable task for the speaking exam or the reading comprehension.

Moreover, writing reports or memoranda may include the use of graphics and diagrams, too. The IELTS task 1 implies the combination of two skills: on the one hand, reading comprehension, since the candidate has to be able to read and understand the information contained in the visual stimulus; on the other, the writing skill, when he or she writes the description. The production of an essay does appear in the framework as a valid task

to assess the writing skill.

IELTS | CEFR

Tasks (IELTS):
1. Graphic/diagram description
2. Essay

Tasks (CEFR):
• completing forms and questionnaires
• writing articles for magazines, newspapers, newsletters, etc.
• producing posters for display
• writing reports, memoranda, etc.
• making notes for future reference
• taking down messages from dictation, etc.
• creative and imaginative writing
• writing personal or business letters

Criteria (IELTS):
(Task 1) Ability to identify the most important and relevant information and trends in a graph, chart, table or diagram, and to give a well-organised overview of it using language accurately in an academic style.
(Task 2) Ability to present a clear, relevant, well-organised argument, giving evidence or examples to support ideas and use language accurately.

Criteria (CEFR):
A B2 learner can produce clear, detailed text on a wide range of subjects and explain a viewpoint on a topical issue giving the advantages and disadvantages of various options.
Can write a review of a film, book or play.
Can write an essay or report.

The scoring of the IELTS writing test is based on the use of grading scales. In this case,

two different rubrics are used: one for the scoring of the first task and another one for

marking the second task. Both of them are analytic, as they assess individually four

different criteria: task achievement, coherence and cohesion, lexical resource,

grammatical range and accuracy. They are both quantitative with a numeric scale from 0

to 10. As they differ depending on the task, it can be stated that they are task-specific and domain-relevant. They are proficiency rubrics, as they are targeted at assessing the candidate's level or competence. Both are paper-based despite the fact that they can also be found online.

Type of rubric according to IELTS

How it is measured Analytic
How it is scored Quantitative

Theme Domain-relevant

Application Task-specific

Function Proficiency

Scorer Teacher (examiner)

Channel Paper

It is now time to determine whether the rubrics used by the IELTS are good or not.

➢ Is the skill assessed actually worthwhile?

Yes. The skill assessed is the writing skill. In the literature review section, the importance

of the writing skill has been fully addressed. Moreover, the CEFR includes the writing

skill among the skills used to determine someone’s language level.

➢ Are there few scoring criteria but correctly labelled?


Yes. The scoring criteria are few (four) as Popham recommends and are correctly

labelled.

➢ Are there degrees of excellence described appropriately?

Yes. All the degrees of excellence are fully explained. However, they do not contain any

example.

➢ Is the rubric presented in a clear and handy way?

No. Whereas it is clear, it is possibly not very handy because it is too extensive since there

are 9 degrees of excellence.

As far as CEFR recommendations for good rubrics are concerned (the necessity of

building a feasible tool, descriptors positively worded, brief and not vague), the IELTS

rubric is feasible as it only contains 4 different criteria, but it is perhaps excessively long

owing to the numeric scale that it uses. The CEFR advice on the reduction of criteria by grouping them under one clear label has been followed. In relation to descriptors, they are not always positively worded, as encouraged by the Council of Europe. For instance, bands 0 and 1 contain descriptors such as "does not attempt the task in any way" or "does not organise ideas logically", and band 5 includes "makes inadequate, inaccurate or over-

use of cohesive devices” (IELTS rubric writing task 1). The descriptors are brief, but they

do not contain any sort of example.

Finally, are the rubric and the test relevant, valid and reliable? On the one hand, it can be

stated that the criteria the rubric uses are suitable because they are relevant to the skill. In

fact, the Council of Europe uses some of them, although with different label names. For example, the "coherence and cohesion" category is equivalent to the CEFR "organisation", and the "lexical resource" and "grammatical range and accuracy" categories are labelled as "language" in the writing scale provided by the CEFR, as the

Cambridge Handbook for teachers indicates (33). However, communicative achievement

is not assessed in the IELTS exam. The validity can be confirmed in the organisation and

language criteria since the test, with the help of the rubric, measures what it is supposed

to assess and the descriptors provided match the evaluation standards included in the

CEFR.

The following table compares both of them:

Communicative Achievement
  CEFR (B2): Uses the conventions of the communicative task to hold the target reader's attention and communicate straightforward ideas.
  IELTS (Band 6): — (not assessed as a separate criterion in the IELTS writing rubric)
Organisation
  CEFR (B2): Text is generally well organised and coherent, using a variety of linking words and cohesive devices.
  IELTS (Band 6): Arranges information and ideas coherently and there is a clear overall progression; uses cohesive devices effectively, but cohesion within and/or between sentences may be faulty or mechanical. May not always use referencing clearly or appropriately.
Language
  CEFR (B2): Uses a range of everyday vocabulary appropriately, with occasional inappropriate use of less common lexis. Uses a range of simple and some complex grammatical forms with a good degree of control. Errors do not impede communication.
  IELTS (Band 6): (Lexical resource) Uses an adequate range of vocabulary for the task. Attempts to use less common vocabulary but with some inaccuracy. Makes some errors in spelling and/or word formation, but they do not impede communication. (Grammatical range) Uses a mix of simple and complex sentence forms. Makes some errors in grammar and punctuation but they rarely reduce communication.

Reliability

The IELTS also controls its reliability regularly. As stated on its own website, research conducted in 2015 investigated the reliability of the test using Cronbach's alpha and SEM. The results were the following (ielts.org):

Section | Cronbach's alpha | SEM
Listening Paper | 0.92 | 0.37
Academic Reading | 0.90 | 0.38

The above extensive analysis of the IELTS writing exam, tasks and rubric shows that the exam paper is suitable and matches the CEFR indications in some respects but not in others, and that the rubric used for the scoring is not effective or valid in many aspects.

The following table summarises the analysis.

EXAM
  Tasks: 2
  Match CEFR tasks: Task 1: No; Task 2: Yes
  Match CEFR criteria: Yes
RUBRIC
  Type
    Measurement: Analytic
    Scoring: Quantitative
    Theme: Domain-relevant
    Application: Task-specific
    Function: Proficiency
    Scorer: Examiner
    Channel: Paper
  Relevant: Yes
  Valid: Yes
  Reliable: Yes
CEFR criteria
  Feasible: No
  Descriptors
    Positive: No
    Brief: Yes
    Not vague: No
Popham's rubric
  Skill worthwhile: Yes
  Scoring criteria (few and well labelled): Yes
  Descriptors (well described): Yes
  Clear and handy? No

TRINITY COLLEGE ISE II

The Trinity College London Integrated Skills in English certificate used to certify a B2 competence (ISE II, equivalent to CEFR level B2) divides the exam into two modules: on the one hand, the reading and writing exam; and on the other, the listening and speaking exam. This division makes it different from all the other proficiency certificates and offers

a much more integrated approach. The first module lasts for two hours and contains four

tasks, two of which examine the writing ability of the candidate (tasks 3 and 4).

To begin with, task 3 is called “Reading into writing task” and connects the previous two

tasks (which focus principally on the assessment of the reading comprehension of the

candidates) with the following ones, aimed more at assessing writing production. The

candidate has to write a composition of around 180 words based on the four reading

texts in task 2. The task aims to measure the ability to:

“identify information that is relevant to the writing prompt; identify common

themes and links across multiple texts; paraphrase and summarise factual ideas,

opinions, arguments and/or discussion; synthesise such information to produce a

coherent response to suit the purpose” (Trinity College London 13)

The type of composition can be one of the following: descriptive essay, discursive essay,

argumentative essay, article (magazine or online), informal email or letter, formal email

or letter, review, report. The same genres can be the object of task 4, which is normally

referred as “extended writing”. In this task, the student must also answer in response to a

prompt in the same number of words as the previous task. The focus is here solely on the

“ability to produce a clear and detailed response to a prompt” (13). The writing topic of

this task may be related to one of the following issues:

• Society and living standards

• Personal values and ideals

• The world of work

• Natural environmental concerns


• Public figures past and present

• Education

• National customs

• Village and city life

• National and local produce and products

• Early memories

• Pollution and recycling

Candidates are advised to spend about 40 minutes on each of the writing tasks.

According to the CEFR guidelines for the assessment of the writing skill at B2 level,

the ISE II would be highly suitable for assessing this skill, due to both the tasks it uses

and the criteria they assess.

ISE-II | CEFR

Tasks (ISE-II):
3. Reading into Writing
4. Extended Writing
Types (both):
➢ Descriptive essay
➢ Discursive essay
➢ Argumentative essay
➢ Article (magazine or online)
➢ Informal email or letter
➢ Formal email or letter
➢ Review
➢ Report

Tasks (CEFR):
• completing forms and questionnaires
• writing articles for magazines, newspapers, newsletters, etc.
• producing posters for display
• writing reports, memoranda, etc.
• making notes for future reference
• taking down messages from dictation, etc.
• creative and imaginative writing
• writing personal or business letters

Criteria (ISE-II): A candidate who passes ISE II Writing can:
➢ synthesise and evaluate information and arguments from a number of sources
➢ express news and views effectively in writing and relate to the views of others
➢ write clear, detailed texts on a variety of subjects related to his or her interests, following established conventions of the text type concerned
➢ write clear, detailed descriptions of real or imaginary events and experiences, marking the relationship between ideas in clear, connected text
➢ write an essay or report that develops an argument systematically, gives reasons and relevant details, and highlights key points
➢ explain the advantages and disadvantages of various options
➢ evaluate different ideas or solutions to a problem
➢ summarise a range of factual and imaginative texts, e.g. news items, interviews or documentaries
➢ discuss and contrast points of view, arguments and the main themes
➢ summarise the plot and sequence of events in a film or play.

Criteria (CEFR):
A B2 learner can produce clear, detailed text on a wide range of subjects and explain a viewpoint on a topical issue giving the advantages and disadvantages of various options.
Can write a review of a film, book or play.
Can write an essay or report.

For the assessment of the writing tasks, Trinity College London examiners must use two

rubrics, one per task. The task 3 "reading into writing" scale is analytic and assesses four different criteria: reading for writing, task fulfilment, organisation and structure, and language control. As for the scoring, it is quantitative, with a numeric scale from 0 to 4. It is only used to assess the writing skill, so it is domain-relevant and also task-specific, since it is only used to assess task 3 in particular and cannot be used for assessing any other task. It is a proficiency rubric, owing to the nature of the certificate; it is assessed by an examiner; and it is a paper rubric, although it can also be found online.

Task 4 “extended writing” rating scale is also analytic but with only three criteria: task

fulfilment, organisation and structure and language control. The rest of the classifications

207

are the same (quantitative, domain-relevant, task-specific, proficiency, examiner and

paper).

Type of rubric according to ISE-II (Task 3 and 4)

How it is measured Analytic
How it is scored Quantitative

Theme Domain-relevant

Application Task-specific

Function Proficiency

Scorer Teacher (examiner)

Channel Paper

It is now time to analyse the rubric in depth:

➢ Is the skill assessed actually worthwhile?

Yes. The skill assessed is the writing skill. In the literature review section, the importance

of the writing skill has been fully addressed. Moreover, the CEFR includes the writing

skill among the skills used to determine someone’s language level.

➢ Are there few scoring criteria but correctly labelled?

Yes. The scoring criteria are few (three or four), as Popham recommends, and are correctly

labelled.

➢ Are there degrees of excellence described appropriately?

Yes, the descriptors are fully described. However, no examples are provided.

➢ Is the rubric presented in a clear and handy way?

No. Whereas it is clear, it is possibly not very handy because it is too extensive. The task 3 rubric is two pages long. The descriptors of the two rubrics are well described; however, they are too long, which means that the rubrics are not very handy.

Concerning the CEFR recommendation for good rubrics (the necessity of building a

feasible tool, descriptors positively worded, brief and not vague), the ISE-II rubrics are

feasible since they only contain 4 and 3 different criteria, but they are perhaps too long

owing to the long descriptors they contain. The CEFR advice on the reduction of criteria by grouping them under one clear label has been followed. In relation to descriptors, they are not always positively worded, as encouraged by the Council of Europe. For instance, score degree 1 says "poor achievement of the communicative aim" or "errors

frequently impede understanding” (43).

Finally, are the rubric and the test relevant, valid and reliable? On the one hand, it can be

stated that the criteria the rubric uses are suitable as they are relevant to the skill. In fact,

the Council of Europe uses some of them, although with different label names. For example, the "task fulfilment" category is equivalent to the CEFR "communicative achievement". The validity can be confirmed in the organisation and language criteria, since the test, with the help of the rubric, measures what it is supposed to assess and the descriptors provided

match the evaluation standards included in the CEFR.

The following table compares both of them:

Communicative Achievement / Task fulfilment
  CEFR (B2): Uses the conventions of the communicative task to hold the target reader's attention and communicate straightforward ideas.
  ISE-II: Excellent achievement of the communicative aim. Excellent awareness of the writer–reader relationship (i.e. appropriate use of standard style and register throughout the text). All requirements (i.e. genre, topic, reader, purpose and number of words) of the instruction appropriately met.
Organisation / Organisation and structure
  CEFR (B2): Text is generally well organised and coherent, using a variety of linking words and cohesive devices.
  ISE-II: Effective organisation of text. Very clear presentation and logical development of most ideas and arguments, with appropriate highlighting of significant points and relevant supporting detail. Appropriate format throughout the text. Effective signposting.
Language / Language control
  CEFR (B2): Uses a range of everyday vocabulary appropriately, with occasional inappropriate use of less common lexis. Uses a range of simple and some complex grammatical forms with a good degree of control. Errors do not impede communication.
  ISE-II: Wide range of grammatical items relating to the task with a good level of accuracy. Wide range of lexical items relating to the task with a good level of accuracy. Any errors do not impede understanding. Excellent spelling and punctuation.

Reliability

The Trinity College London ISE-II does not provide any data on the Cronbach's alpha coefficient or the average SEM of the test.

The above extensive analysis of the ISE-II writing exam, tasks and rubric shows that the exam paper is suitable and matches the CEFR indications in some respects but not in others, and that the rubric used for the scoring is not effective or valid in some aspects.

The following table summarises the analysis:

EXAM
  Tasks: 2
  Match CEFR tasks: Yes
  Match CEFR criteria: Yes
RUBRIC
  Type
    Measurement: Analytic
    Scoring: Quantitative
    Theme: Domain-relevant
    Application: Task-specific
    Function: Proficiency
    Scorer: Examiner
    Channel: Paper
  Relevant: Yes
  Valid: Yes
  Reliable: Unknown
CEFR criteria
  Feasible: No
  Descriptors
    Positive: No
    Brief: No
    Not vague: Yes
Popham's rubric
  Skill worthwhile: Yes
  Scoring criteria (few and well labelled): Yes
  Descriptors (well described): No
  Clear and handy? Yes

ACLES

The Association of Language Centres in Higher Education certifies whether or not a learner has a specific level through an exam in which the four skills are measured individually.

The B2 accreditation assesses separately the speaking expression and interaction, the

listening comprehension, the writing production and the reading comprehension.


The writing production paper lasts for 70-90 minutes. Candidates must write two texts

which can be descriptive, narrative, informative or argumentative. Each text must be at least 125 words long, and the two together must not exceed 450 words. Among the possible tasks are the following: writing an informal letter to tell about news, experiences or feelings; answering a professional letter or email; answering an advertisement; writing a CV and a cover letter; summarising and stating an opinion; writing a text for a magazine or forum; or writing instructions (ACLES. Estructura Exámenes).

In the ACLES accreditation model document, it is stated that the criteria for the task are

the same as those indicated in the CEFR. However, the criteria on which the writing tasks intend to focus are not really specified. As for the tasks, they are suitable and match those indicated by the Council of Europe.

ACLES
Tasks:
• informal letter to tell about news, experiences or feelings
• answer a professional letter or email; answer an advertisement
• write a CV and a cover letter
• summarise and state an opinion
• write a text for a magazine or forum, or write instructions
Criteria: Not specified.

CEFR
Tasks:
• completing forms and questionnaires
• writing articles for magazines, newspapers, newsletters, etc.
• producing posters for display
• writing reports, memoranda, etc.
• making notes for future reference
• taking down messages from dictation, etc.
• creative and imaginative writing
• writing personal or business letters
Criteria:
A B2 learner can produce clear, detailed text on a wide range of subjects and explain a viewpoint on a topical issue giving the advantages and disadvantages of various options. Can write a review of a film, book or play. Can write an essay or report.

A rubric is used to assess both tasks of the writing paper. This rubric is analytic and contains four different criteria (task adequacy; organisation and register; grammar and vocabulary; and orthography and punctuation). It is a quantitative rubric with a numeric scale from 1 to 10 divided into 5 degrees of excellence (1-2; 3-4; 5-6; 7-8; 9-10), and it also includes a qualitative word scale of 5 levels (from very deficient to very well for a B2 level). It is domain-relevant, since it is a writing scale, and it is also skill-focused. Furthermore, the rubric can be classified as a proficiency rubric, the scorer is an examiner and it is a paper rubric.
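As a purely illustrative sketch of how such an analytic layout can be operationalised (the criteria and the 1-10/five-degree layout follow the description above; the aggregation by a simple average is an assumption, since ACLES does not publish how the criteria are combined):

    from dataclasses import dataclass

    # Criteria and numeric layout as described for the ACLES B2 writing rubric.
    CRITERIA = ["task adequacy", "organisation and register",
                "grammar and vocabulary", "orthography and punctuation"]
    DEGREES = {1: (1, 2), 2: (3, 4), 3: (5, 6), 4: (7, 8), 5: (9, 10)}  # degree -> mark range

    @dataclass
    class WritingScore:
        marks: dict  # criterion -> numeric mark (1-10)

        def degree(self, criterion: str) -> int:
            """Map a numeric mark back to its degree of excellence (1-5)."""
            mark = self.marks[criterion]
            return next(d for d, (lo, hi) in DEGREES.items() if lo <= mark <= hi)

        def average(self) -> float:
            # Illustrative aggregation only: a plain mean of the four criteria.
            return sum(self.marks.values()) / len(self.marks)

    score = WritingScore({"task adequacy": 8, "organisation and register": 7,
                          "grammar and vocabulary": 6, "orthography and punctuation": 9})
    print(score.degree("grammar and vocabulary"), score.average())  # 3, 7.5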

Type of rubric according to ACLES
• How is measured: Analytic
• How is scored: Quantitative and Qualitative
• Theme: Domain-relevant
• Application: Skill-focused
• Function: Proficiency
• Scorer: Teacher (examiner)
• Channel: Paper

The in-depth analysis of the rubric is as follows:

➢ Is the skill assessed actually worthwhile?

Yes. The skill assessed is the writing skill. In the literature review section, the importance

of the writing skill has been fully addressed. Moreover, the CEFR includes the writing

skill among the skills used to determine someone’s language level.

➢ Are there few scoring criteria but correctly labelled?

No. The scoring criteria are few (four) as Popham recommends but they are not correctly

labelled. The orthography and punctuation criteria could be included under the “grammar

and vocabulary” category and punctuation under the “organisation and register” category.

➢ Are there degrees of excellence described appropriately?

Yes, the descriptors are fully described. However, no examples are provided.

➢ Is the rubric presented in a clear and handy way?

Yes, the rubric is easy to use and clear.

Regarding the CEFR recommendations for good rubrics (the necessity of building a feasible tool, with descriptors positively worded, brief and not vague), the ACLES rubric is feasible since it contains 4 different criteria, which are well described but not too long. In relation to the descriptors, they are not always positively worded as encouraged by the European Council. For instance, score degree 3-4 uses phrases such as "bastantes errores" (quite a few errors) or "dominio insuficiente" (insufficient command). Nevertheless, a clear attempt to use positively worded descriptors can be observed in most of them.

Finally, are the rubric and the test relevant, valid and reliable? The criteria the writing tasks intend to measure are not specified, so it is difficult to state whether they are relevant to the skill. Since the tasks are adapted from the ones mentioned in the CEFR and it is stated that the criteria followed are the same as those described in the framework, the paper should be relevant, but it is difficult to state this for sure. The validity can be confirmed for the organisation and language criteria, since the test, with the help of the rubric, measures what it is supposed to assess and the descriptors provided match the evaluation standards included in the CEFR.

The following table compares both of them:

Communicative Achievement
• CEFR (B2): Uses the conventions of the communicative task to hold the target reader's attention and communicate straightforward ideas.
• ACLES (9-10): "Tanto las ideas simples como las complejas se comunican con claridad." [Both simple and complex ideas are communicated clearly.]

Organisation
• CEFR (B2): Text is generally well organised and coherent, using a variety of linking words and cohesive devices.
• ACLES (9-10): "El texto está bien estructurado. Enlaza las frases y conceptos apropiadamente. El formato, el tono y el estilo son apropiados." [The text is well structured. Sentences and concepts are linked appropriately. Format, tone and style are appropriate.]

Language
• CEFR (B2): Uses a range of everyday vocabulary appropriately, with occasional inappropriate use of less common lexis. Uses a range of simple and some complex grammatical forms with a good degree of control. Errors do not impede communication.
• ACLES (9-10): "Demuestra un dominio muy bueno de estructuras gramaticales, simples y complejas. Usa un vocabulario amplio, conforme con lo esperado del nivel B2. Usa con acierto algunas poco comunes." [Shows a very good command of simple and complex grammatical structures. Uses a wide vocabulary, in line with what is expected at B2 level. Accurately uses some less common items.]

Reliability

The ACLES web page does not provide the Cronbach’s Alpha coefficient or the average

SEM; nor does it give any information about any reliability research data.


The following table summarises the whole analysis:

EXAM
• Tasks: 2
• Match CEFR tasks: Yes
• Match CEFR criteria: Unknown
RUBRIC (type)
• Measurement: Analytic
• Scoring: Quantitative and Qualitative
• Theme: Domain-relevant
• Application: Skill-focused
• Function: Proficiency
• Scorer: Examiner
• Channel: Paper
• Relevant: Unknown
• Valid: Yes
• Reliable: Unknown
CEFR criteria
• Feasible: Yes
• Descriptors positively worded: No
• Descriptors brief: Yes
• Descriptors not vague: Yes
Popham's rubric
• Skill worthwhile: Yes
• Scoring criteria (few and well labelled): No
• Descriptors (well described): Yes
• Clear and handy: Yes

EOI


The Official School of Languages (EOI) offers English courses ordered by level. The B2 level is called "Avanzado 2" and, if the exam is passed, it grants a title which certifies that level. Although the basics of the different Schools of Languages in Spain are the same, each autonomous community is responsible for the creation and evaluation of its exam certificate. The exam, criteria and rubrics analysed below correspond to the EOI Gijón (Principado de Asturias). The reason is that this School publishes its rubrics online so that all the students can access them, whereas the EOI of A Coruña does not publish any rubric online and does not allow the students to see them.

The paper for the assessment of writing expression and interaction consists of one or two tasks. The texts must be between 75 and 250 words long. One of the tasks might be completing or writing a text following the information given, and the other may be a free composition based on a given prompt. The writing paper lasts for 90 minutes. The possible types of tasks that can appear are the following: writing a personal letter or email describing feelings and experiences, writing a formal letter, writing a text about personal experiences or events, writing an opinion or argumentative text, writing about routines in the present or past, describing people, objects or places, describing a picture, telling a story, correcting or completing a letter, taking notes or summarising a conference or a film, removing illogical words from a text, cloze, ordering a text (Escuela Oficial de Idiomas de Gijón. Departamento de Inglés 175-176). As for the criteria that candidates must meet, the paper intends to assess their ability to: write notes to transmit simple information, write letters with news and points of view, write reports which develop an argumentation, write reviews, write down structured notes on relevant information from a lecture or conference, summarise texts, organise texts according to the text typology, use cohesive devices, use a wide range of vocabulary, use a wide range of grammar structures, and express opinions (174-175).


If the exam tasks and criteria are compared to those proposed in the framework for the assessment of the writing skill, it can be stated that the exam is suitable, since it uses most of the tasks contained in the CEFR and the criteria also match.

EOI
Tasks (1 or 2 among the following):
• writing a personal letter or email
• describing feelings and experiences
• writing a formal letter
• writing a text about personal experiences or events
• writing an opinion or argumentative text
• writing about routines in the present or past
• describing people, objects or places
• describing a picture
• telling a story
• correcting or completing a letter
• taking notes or summarising a conference or a film
• removing illogical words from a text
• cloze
• ordering a text
Criteria (ability to):
• write notes to transmit simple information
• write letters with news and points of view
• write reports which develop an argumentation
• write reviews
• write down structured notes on relevant information from a lecture or conference
• summarise texts
• organise texts according to the text typology
• use cohesive devices
• use a wide range of vocabulary and a wide range of grammar structures
• express opinions

CEFR
Tasks:
• completing forms and questionnaires
• writing articles for magazines, newspapers, newsletters, etc.
• producing posters for display
• writing reports, memoranda, etc.
• making notes for future reference
• taking down messages from dictation, etc.
• creative and imaginative writing
• writing personal or business letters
Criteria:
A B2 learner can produce clear, detailed text on a wide range of subjects and explain a viewpoint on a topical issue giving the advantages and disadvantages of various options. Can write a review of a film, book or play. Can write an essay or report.

The writing paper is assessed with an analytic rubric. It assesses seven different aspects or criteria (orthography and punctuation, grammar, vocabulary, register, interaction and discourse management), and two of them are subdivided into a further two. As for the scoring, it is both quantitative and qualitative, since the six levels of excellence are described both with a number (from 0 to 5) and with words. It is a domain-relevant and skill-focused rubric, and it is also a proficiency rubric. It is a paper rubric and it is used by an examiner.

Type of rubric according to EOI
• How is measured: Analytic
• How is scored: Quantitative and qualitative
• Theme: Domain-relevant
• Application: Task-specific
• Function: Proficiency
• Scorer: Teacher (examiner)
• Channel: Paper

Here is the in-depth analysis:

➢ Is the skill assessed actually worthwhile?


Yes. The skill assessed is the writing skill. In the literature review section, the importance

of the writing skill has been fully addressed. Moreover, the CEFR includes the writing

skill among the skills used to determine someone’s language level.

➢ Are there few scoring criteria but correctly labelled?

No, the number of scoring criteria is excessively high. Moreover, the labels are much too

long.

➢ Are there degrees of excellence described appropriately?

Yes, the descriptors are described suitably.

➢ Is the rubric presented in a clear and handy way?

No. Despite being clear, the rubric is excessively long, which makes it very unhandy.

As for the CEFR recommendations for good rubrics (the necessity of building a feasible tool, with descriptors positively worded, brief and not vague), the EOI rubric is not feasible since it contains 7 different criteria. Furthermore, the CEFR advice on reducing the number of criteria by grouping them under one clear label has not been followed. The labels should be regrouped and shortened to ease the examiner's task. For example, the orthography, vocabulary and grammar criteria could be assessed within the same criterion. In relation to the descriptors, they are generally positively worded as encouraged by the European Council, but there are also some negatively worded descriptors, for instance "the candidate is not able to exchange, ask or comment information" (Interaction, 0 points). The descriptors are quite long. Besides, there are so many of them that it is almost impossible for the examiner to score somebody fast or to distinguish between two close levels.
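The scale of the problem can be quantified with a quick count (a rough sketch; the regrouped version simply reflects the merging of orthography, vocabulary and grammar suggested above, not any official EOI proposal):

    # Descriptor cells an examiner potentially has to read for one candidate.
    eoi_criteria, eoi_levels = 7, 6        # current EOI rubric: 7 criteria, levels 0-5
    regrouped_criteria = 5                 # if orthography, vocabulary and grammar were merged

    print("current rubric cells:  ", eoi_criteria * eoi_levels)        # 42
    print("regrouped rubric cells:", regrouped_criteria * eoi_levels)  # 30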

Finally, are the rubric and the test relevant, valid and reliable? On the one hand, it can be stated that the criteria the rubric uses are suitable because they are relevant to the skill. In fact, the European Council uses three of them (communicative achievement, organisation and language) in the writing scale provided by the CEFR. The validity can also be easily confirmed, since the test, with the help of the rubric, measures what it is supposed to assess and the descriptors provided match the evaluation standards included in the CEFR.

Communicative Achievement
• CEFR (B2): Uses the conventions of the communicative task to hold the target reader's attention and communicate straightforward ideas.
• EOI (5 points): Describes, presents situations, necessities, facts and opinions, explains and gives reasons in a proficient way with no difficulty. Writes with fluency and ease.

Organisation
• CEFR (B2): Text is generally well organised and coherent, using a variety of linking words and cohesive devices.
• EOI (5 points): Builds a coherent and clear discourse which adjusts to the required text typology and to the paragraph and organisation conventions. Uses cohesive devices and key words and phrases with ease.

Language (Grammar/Vocabulary)
• CEFR (B2): Uses a range of everyday vocabulary appropriately, with occasional inappropriate use of less common lexis. Uses a range of simple and some complex grammatical forms with a good degree of control. Errors do not impede communication.
• EOI (5 points): Uses a wide range of grammar structures and precise and varied vocabulary.

Reliability

The web page of the School of Languages of Gijón does not give any data related to any reliability coefficient or research.

EXAM
• Tasks: 1-2
• Match CEFR tasks: Yes
• Match CEFR criteria: Yes
RUBRIC (type)
• Measurement: Analytic
• Scoring: Quantitative and qualitative
• Theme: Domain-relevant
• Application: Task-focused
• Function: Proficiency
• Scorer: Examiner
• Channel: Paper
• Relevant: Yes
• Valid: Yes
• Reliable: Unknown
CEFR criteria
• Feasible: No
• Descriptors positively worded: No
• Descriptors brief: No
• Descriptors not vague: Yes
Popham's rubric
• Skill worthwhile: Yes
• Scoring criteria (few and well labelled): No
• Descriptors (well described): Yes
• Clear and handy: No

6.3. Speaking

Productive skills such as writing and speaking are increasingly assessed with the help of a rubric. Although the appearance of the rubric is closely linked to the assessment of written compositions and essays, its use for the assessment of the speaking skill is currently becoming habitual. In Spain, the encouragement of the communicative approach and the development of the Common European Framework of Reference for Languages have led many teachers to implement this tool in order to evaluate the speaking performances of their students. The most common tasks used to assess speaking are presentations, descriptions of pictures, interviews, discussions and dialogues between students.

The CEFR points out that oral production tasks should include (58):
• public address (information, instructions, etc.)
• addressing audiences (speeches at public meetings, university lectures, sermons, entertainment, sports commentaries, sales presentations, etc.)
In addition, it is indicated that the assessment of the speaking skill can include some of the following tasks: reading a written text aloud; speaking from notes, or from a written text or visual aids (diagrams, pictures, charts, etc.); acting out a rehearsed role; speaking spontaneously; singing (CEFR 58).

6.3.1. Literature Review

The evaluation of speaking is relatively new since, in the past, traditional methodologies, such as the Grammar-Translation Method, were dominant in the teaching of languages in our country. That situation has fortunately changed in recent years, mainly driven by the popularity of the communicative approach and the establishment of the CEFR, as has already been explained.

Many authors have argued that "communication skills are critical for intellectual development, career trajectory, and civic engagement" (Schreiber et al. 207). The introduction of the speaking competence into the classroom brings along the need for significant changes in terms of teaching methodology and assessment. Students must practise this skill in class and the teacher needs to check their learning through the evaluation of this skill.

The assessment of a speaking task is completely different from the traditional assessment of grammar or vocabulary exercises. There are no right or wrong answers and many other factors may be taken into account. For example, in order to assess a speaking performance, there are some dimensions related to the language itself, such as grammar or vocabulary, and others strictly related to the performance or the message delivery, such as articulation or non-verbal behaviour.


Bygate states that "the process of speaking includes three main phases: conceptualizing the message content, formulating the message linguistically, and articulating the message" (cited in Baitman and Veliz 177). Some other authors, such as Alderson and Bachman, distinguish only two types of knowledge: language knowledge and textual knowledge (cited in Baitman and Veliz 179). The language type refers to syntax, phonology and vocabulary, and the textual type refers to organisation and cohesion.

The assessment of speaking in an L2 is particularly complex. Baitman and Veliz explained some of the problems which can be faced. Among them, the simulation of a real language-use situation might be hard to achieve, in addition to the fact that it is costly in terms of time and resources. Many studies have been conducted in order to provide the community with a better scope on the assessment of this skill. Schreiber et al., for instance, reviewed some of them, such as that carried out by Quianthy and Haffering. They considered that topic choice, fulfilment of the oral discourse, purpose determination, supporting materials and strategic word choice were the dimensions to be assessed with regard to the composition of the message, whereas vocal variety, articulation, language use and non-verbal behaviour must also be scored (cited in Schreiber et al. 208).

Through an extensive collection of speaking rubrics and studies on them, it was determined that some of the dimensions and criteria often used in the assessment of speaking are always the same or very similar, although they can be called by different names. The range of vocabulary, grammar (linguistic control), organisation or structure, linking devices (connectors, cohesion), content (topic development) and pronunciation (articulation, vocal expression) appear in 95% of the rubrics reviewed. As for those which include the performance or delivery, non-verbal behaviour, eye contact and the use of supporting materials are the most frequent dimensions.


With reference to the assessment of speaking, numerous studies have dealt with different problems or related issues. The one conducted by Emrah Ekmekçi and the one mentioned above by Brittany Baitman and Mauricio Veliz are just two examples of case studies which examined whether there are significant differences between the assessment by native-speaker teachers and non-native speakers. However, they did not reach the same conclusions. While the former, carried out with 80 EFL students and 6 teachers (3 NNES and 3 NES) and a 20-point analytic scale, found no difference (Ekmekçi 104), the latter, in which 12 teachers participated and scored 4 TOEFL independent tasks with an analytic scale, showed that NNES gave lower scores than NES (Baitman and Veliz 186). The rubric used was analytic and measured accuracy, fluency, pronunciation and vocabulary. Native-speaker teachers tended to give more importance to fluency and pronunciation, whereas non-native teachers scored grammar accuracy and vocabulary more stringently (ibid. 191). In the same line, the study conducted by Zhang and Elder in China with 30 test-takers and holistic numerical ratings from 39 examiners (19 NES and 20 NNES) showed no significant difference, although it did allow the researchers to ascertain that linguistic features were considered more relevant by NNES, whereas NES teachers tended to focus more on interaction, compensation strategies and demeanour (cited in Baitman and Veliz).

Another pertinent matter regarding the assessment of the oral skill seems to be the reliability and validity of peer-, self- and teacher-assessment of the oral productions of EFL learners, as shown by the high number of investigations which have dealt with this issue. With the intention of answering questions such as how reliable each of them is, and whether there is any correlation among the three types of assessment for oral and written productions, Salehi and Sayyar published an article in the International Journal of Assessment and Evaluation in Education in which their study is explained. 32 students from three English Language Teaching institutes acted as self- and peer-assessors, and two experienced teachers acted as the teacher assessors. The students received a seminar in which it was explained to them what self- and peer-assessment are and how to carry them out. The results were very positive in terms of reliability, as high inter-rater reliability was found through the comparison of all the peer assessments (14). The correlation between the two teachers was strong in the teacher assessment of the oral production. However, in the matter of the correlation between peer- and teacher-assessment and between self- and teacher-assessment, the results were dissimilar. Whilst in the assessment of the written production the correlation was found to be strong and high (r=.85 and r=.79 respectively), no significant correlation was noted between the scores of the oral productions given by the teachers and the self-assessment (r=.30). The correlation between peer-assessment and teacher-assessment was significant (r=.61), but still lower than the correlation achieved in the written production (16). The results suggest that peer-assessment is more reliable than self-assessment in oral production.
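The procedure behind these figures is simply the pairwise correlation of the score vectors produced by each type of rater. A minimal sketch with invented scores (the real study used 32 students and two teachers) would be:

    import numpy as np

    # Hypothetical oral-production scores for eight learners from three rater types.
    teacher = np.array([14, 16, 11, 18, 12, 15, 17, 10])
    peer    = np.array([15, 15, 12, 17, 13, 14, 18, 11])
    self_   = np.array([17, 15, 16, 18, 15, 17, 18, 14])

    def pearson_r(x, y):
        """Pearson correlation coefficient between two score vectors."""
        return float(np.corrcoef(x, y)[0, 1])

    print("peer vs teacher:", round(pearson_r(peer, teacher), 2))
    print("self vs teacher:", round(pearson_r(self_, teacher), 2))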

One of the main disadvantages often outlined when speaking about the evaluation of the speaking skill is how time-consuming it may be. Jackson and Ward state that "assessing public speaking on an individual basis, especially in larger cohorts, is very time demanding in terms of organizing sessions, staff availability and feedback" (2). Because of this, they wanted to create and test the use of a rubric and to identify which factors might affect the variation among markers, so that rubrics could be used in the future as a reliable tool able to speed up and standardise feedback. 32 international students participated and three academic markers scored their public speaking performances. The public speaking rubric used was the one developed by Schreiber et al. in the above-mentioned research, which included ten aspects related to the two main categories of content and delivery. It was found that the dimensions of design and development of the speech and of persona were those which showed the biggest differences among the three scorers' marks. Body language came second, although to a lesser extent.
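One simple way to locate the dimensions where markers diverge most is to compute, for each rubric dimension, the spread of the markers' scores. The sketch below is purely illustrative (invented scores, and only three of the ten dimensions of the Schreiber et al. rubric):

    import numpy as np

    dimensions = ["design and development", "persona", "body language"]
    # Hypothetical scores (0-4), shaped (marker, student, dimension).
    scores = np.array([
        [[3, 2, 3], [4, 3, 3], [2, 2, 2], [3, 3, 4], [4, 2, 3]],  # marker A
        [[2, 3, 3], [3, 4, 3], [3, 3, 2], [2, 2, 4], [3, 3, 3]],  # marker B
        [[4, 2, 3], [4, 2, 3], [2, 1, 2], [4, 3, 4], [3, 1, 3]],  # marker C
    ])

    # Max-min spread among markers for each student, averaged per dimension.
    spread = (scores.max(axis=0) - scores.min(axis=0)).mean(axis=0)
    for name, value in zip(dimensions, spread):
        print(f"{name}: mean marker spread = {value:.2f}")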

Hensley and Brand wrote an essay from the Communication Department at Millikin University (Illinois) to provide the educational community with information in order to "prepare and demonstrate the most effective ways to craft and delivered messages adopted to a wide variety of audiences" (2). They advise providing students with advanced or exemplary speeches (both verbal and non-verbal) so that they can learn how to perform them correctly by watching and examining different excellent examples. Another suggestion is the use of supporting materials which back up the ideas of the speech. As for the organisation of the speech, they recommend structuring it into different phases: an introduction which focuses on catching the audience's attention, the establishment of the thesis, and a good conclusion. Moreover, natural and fluent transitions must be used throughout all the phases.

Luu Trong Tuan focused on the use of analytic rubrics to assess speaking performance. Tuan collected positive and negative aspects of analytic rubrics mentioned in different works on the subject. Among the advantages is the fact that they are a useful diagnostic tool for students and also for teachers: learners can easily obtain feedback to improve their performances, and the teacher may, through the information gained from the use of rubrics, tailor the instruction to the needs of the students (674). Tuan also explained that inexperienced scorers might find the analytic rubric more useful, as it does not require as much expertise as a holistic rubric to obtain reliable results. As for the disadvantages collected, Tuan mentions the great amount of time required to use them as opposed to holistic rubrics, the decrease of the interconnectedness of spoken discourse, and the fact that the criterion scored first may have an effect on the evaluation of the subsequent criteria (674). After the literature review on the matter, Tuan carried out a study with 104 students divided into two groups. The experimental group (51 students) was scored with an analytic rubric while the control group (53 students) was assessed with a holistic one. The teacher responsible for scoring was the same for both groups. The final test of the previous semester was used as a "pre-test", while the six speaking tests taken during the semester formed the "post-test". The analytic rubric used contained five criteria: coherence, content, grammar and structure, language and organisation. Furthermore, a questionnaire consisting of six items on a six-point Likert scale was administered. The speaking competence of the students was almost the same according to the pre-test results. After the six tests of the post-test phase, it was found that the students from the control group (scored with a holistic rubric) did not improve much from the first test to the sixth: they obtained an average mark of 6.31 in the first test and 6.58 in the final one. However, the experimental group (scored with the analytic rubric) showed an increase from an average mark of 6.33 in the first test to 7.06 in the last. In terms of criteria, the improvement was highest in content. This suggests that analytic rubrics help students' improvement more than holistic ones.
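The size of the contrast is easy to check with the reported averages:

    # Average marks reported by Tuan for the first and the sixth speaking test.
    control_first, control_last = 6.31, 6.58            # holistic rubric
    experimental_first, experimental_last = 6.33, 7.06  # analytic rubric

    print("control gain:     ", round(control_last - control_first, 2))            # 0.27
    print("experimental gain:", round(experimental_last - experimental_first, 2))  # 0.73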

6.3.2. Assessment of Speaking in the main English Certificates of ESL

Cambridge First Certificate

The Cambridge First Certificate (FCE) speaking paper consists of four different tasks or

parts. These involve “short exchanges with the interlocutor; a 1-minute individual ‘long

turn’; a collaborative task involving the two candidates; a discussion” (Cambridge

English Language Assessment 71). The total duration of the test is 14 minutes.


Each part focuses on different language functions. Part one is a conversation between the examiner and the candidate in which the examiner asks the candidate different questions about general personal information (place of birth, family, hobbies, etc.). The aim of this part is to assess the candidate's ability to use general interactional and social language. In part 2, the candidate is asked to compare two pictures for around one minute. This task aims to assess whether the candidate is able to organise a larger unit of discourse and to compare, describe and express opinions. This task is also individual, whereas part 3 is collaborative and involves the two candidates. Candidates are given one written question with different written stimuli to discuss together. "The focus is on sustaining an interaction; exchanging ideas, expressing and justifying opinions, agreeing and/or disagreeing, suggesting, speculating, evaluating, reaching a decision through negotiation, etc." (Cambridge English Language Assessment 71). Finally, the last part of the exam is also collaborative and involves a discussion between the two candidates of some questions asked (orally) by the examiner and related to the previous task. Candidates are expected to prove their ability to express agreement and disagreement, give opinions and justify them, and speculate.

The CEFR provides the education community with five different illustrative scales for the speaking skill (58-61). Those scales can be used to assess the level of:

• Overall spoken production

• Sustained monologue: describing experience

• Sustained monologue: putting a case (e.g. in debate)

• Public announcements

• Addressing audiences


The criteria for the overall production of a B2 speaker state that a B2 learner "can give clear, systematically developed descriptions and presentations, with appropriate highlighting of significant points, and relevant supporting detail" and that he or she "can give clear, detailed descriptions and presentations on a wide range of subjects related to his/her field of interest, expanding and supporting ideas with subsidiary points and relevant examples" (CEFR 60). In addition, some other specifications for particular functions, such as describing an experience or taking part in a debate, can be extracted from the other scales provided. Thus, a B2 learner must be able to describe different topics in detail, give reasons and support them in discussions and debates, or highlight the advantages and disadvantages of different options. He or she has the ability to give presentations with clarity and fluency and to depart spontaneously from them when follow-up questions are posed or interesting points are raised by the audience.

Although only one of the tasks included in the speaking exam (part 1) is among those mentioned in the CEFR proposal of tasks to assess speaking (public address: information), the other three correspond to the activities that the assessment of speaking may involve: speaking from pictures (part 2), speaking from written texts (part 3) and speaking spontaneously (parts 1 and 4). With regard to the criteria the CEFR gives to assess a B2 learner, the tasks proposed by the FCE do assess them. However, it would be advisable to include a task involving an extended spoken production by the candidate, since that is the main task type advised in the framework. The following table summarises and compares both of them.

FCE
Tasks:
• Part 1: questions
• Part 2: monologue; description of a picture
• Part 3: collaborative task; discussion of a written question with stimuli
• Part 4: collaborative task; discussion of questions related to the previous task
Criteria:
• Part 1: ability to use general interactional and social language
• Part 2: ability to organise a larger unit of discourse; compare, describe and express opinions
• Part 3: sustaining an interaction; exchanging ideas, expressing and justifying opinions, agreeing and/or disagreeing, suggesting, speculating, evaluating, reaching a decision through negotiation, etc.
• Part 4: expressing and justifying opinions, agreeing and/or disagreeing, and speculating

CEFR (B2)
Tasks:
• public address (information, instructions, etc.)
• addressing audiences (speeches at public meetings, university lectures, sermons, entertainment, sports commentaries, sales presentations, etc.)
• reading a written text aloud
• speaking from notes, or from a written text or visual aids (diagrams, pictures, charts, etc.)
• acting out a rehearsed role
• speaking spontaneously
• singing
Criteria:
• clear, systematically developed descriptions and presentations
• supporting ideas with subsidiary points and relevant examples
• describing an experience
• describing different topics in detail
• giving reasons and supporting them in discussions and debates
• highlighting the advantages or disadvantages of different options
• giving presentations with clarity and fluency
• departing spontaneously from the discussion when follow-up questions are posed

As for the rubric the FCE uses to score the candidates in the speaking exam, it is an analytic rubric. It contains 4 different categories to assess: Grammar and Vocabulary, Discourse Management, Pronunciation and Interactive Communication. It is also a quantitative rubric with a numeric scale from 0 to 5, and it is domain-relevant and skill-focused since it is used to assess all the tasks of the speaking exam. It is a proficiency rubric because the certificate aims to determine the level of the candidate. The scorer is a trained examiner and it is a paper rubric, although it can also be found on the Internet.

Type of rubric according to Cambridge FCE
• How is measured: Analytic
• How is scored: Quantitative
• Theme: Domain-relevant
• Application: Skill-focused
• Function: Proficiency
• Scorer: Teacher (examiner)
• Channel: Paper

With reference to whether the rubric used by the FCE examiners is good or not, here is

the analysis.

➢ Is the skill assessed actually worthwhile?

Yes. The skill assessed is the speaking skill. In the literature review section, the

importance of the speaking skill has been fully addressed. Moreover, the CEFR boosts

the communicative approach so this is probably the most important skill.

➢ Are there few scoring criteria but correctly labelled?

Yes. The scoring criteria are few (four) as Popham recommends and are correctly

labelled.

➢ Are there degrees of excellence described appropriately?

No. There are multiple cells describing the different criteria across the scale. The descriptors are appropriately worded, but in some cases they are rather short and they do not provide examples, which would be advisable. Furthermore, the descriptors for bands 4, 2 and 0 are very vague and imprecise, as they merely indicate a performance between the bands above and below.

➢ Is the rubric presented in a clear and handy way?

Yes. The rubric is clear because it is not excessively long and the design is good and

handy.

In regard to the CEFR recommendations for good rubrics (the necessity of building a feasible tool, with descriptors positively worded, brief and not vague), the Cambridge rubric for the FCE is feasible since it contains only 4 different criteria. In relation to the descriptors, they are positively worded as encouraged by the European Council. For instance, band 1 of the language criterion, which corresponds to one of the lowest levels, states "Shows a good degree of control of simple grammatical forms" (Cambridge Language Assessment 82). The descriptors are brief but, as has already been stated, some of the bands are very vague because their only descriptor is "performance shares features of Bands X and X".

Finally, are the rubric and the test relevant, valid and reliable? On the one hand, it can be stated that the criteria the rubric uses are suitable as they are relevant to the skill. In fact, the European Council uses 5 different criteria to assess speaking (range, accuracy, fluency, interaction and coherence). Although the labels are not the same, the labels used by the FCE measure the same aspects, with one addition. Hence, the grammar and vocabulary category from the Cambridge rubric is equivalent to the CEFR's 'accuracy'; the assessment of 'range', 'coherence' and 'fluency' corresponds to 'discourse management' in the FCE; and 'interaction' is clearly represented by the 'interactive communication' category. On the other hand, the FCE proposes the assessment of pronunciation, which the CEFR does not. The validity can also be easily confirmed since the test, with the help of the rubric, measures what it is supposed to assess and the descriptors provided match the evaluation standards included in the CEFR. A simple comparison between the CEFR speaking scale and the FCE speaking rubric shows the similarities (the information has been taken from the CEFR (28) and the Cambridge First Handbook (82)).

Range, Fluency and Coherence / Discourse Management
• CEFR (B2): Has a sufficient range of language to be able to give clear descriptions and express viewpoints on most general topics. Can produce stretches of language with a fairly even tempo; although he/she can be hesitant, there are few noticeably long pauses. Can use a limited number of cohesive devices to link utterances.
• FCE (Band 5): Produces extended stretches of language with very little hesitation. Contributions are relevant and there is a clear organisation of ideas. Uses a range of cohesive devices and discourse markers.

Accuracy / Grammar and Vocabulary
• CEFR (B2): Shows a relatively high degree of grammatical control. Does not make errors which cause misunderstanding and can correct most of his/her mistakes.
• FCE (Band 5): Shows a good degree of control of a range of simple and some complex grammatical forms. Uses a range of appropriate vocabulary to give and exchange views on a wide range of familiar topics.

Interaction / Interactive Communication
• CEFR (B2): Can initiate discourse, take his/her turn when appropriate and end conversation when he/she needs to. Can help the discussion along on familiar ground, confirming comprehension, inviting others in, etc.
• FCE (Band 5): Initiates and responds appropriately, linking contributions to those of other speakers. Maintains and develops the interaction and negotiates towards an outcome.

Reliability

Reliability coefficients of the ESOL certificates have already been stated in the writing

section.

The above extensive analysis of the Cambridge First Certificate speaking exam, its tasks and its rubric proves that the exam paper is suitable and matches both the CEFR indications and levels, and that the rubric used for the assessment of the exam is good and suitable in most aspects. However, it is too vague owing to the omission of certain band descriptors and the absence of examples. The following table summarises the analysis.

EXAM
• Tasks: 4
• Match CEFR tasks: Yes
• Match CEFR criteria: Yes
RUBRIC (type)
• Measurement: Analytic
• Scoring: Quantitative
• Theme: Domain-relevant
• Application: Skill-focused
• Function: Proficiency
• Scorer: Examiner
• Channel: Paper
• Relevant: Yes
• Valid: Yes
• Reliable: Yes
CEFR criteria
• Feasible: Yes
• Descriptors positively worded: Yes
• Descriptors brief: Yes
• Descriptors not vague: No
Popham's rubric
• Skill worthwhile: Yes
• Scoring criteria (few and well labelled): Yes
• Descriptors (well described): No
• Clear and handy: Yes

IELTS


The IELTS speaking exam contains three tasks and has a total duration of 11-14 minutes. The first part lasts for 4 or 5 minutes. In this part the examiner introduces himself or herself and asks the candidate a few general questions on familiar topics such as family, interests, hobbies, etc. The aim of this part is to check the ability to communicate on everyday topics. In the second task of the test, the learner has to speak on his or her own for around two minutes on a particular topic given by the examiner on a card with some points to mention. The candidate is allowed one minute to prepare the task and make some notes. Afterwards, the examiner asks him or her some related questions. This part is about 4 minutes long. This task intends to measure the "ability to speak at length on a given topic (without further prompts from the examiner), using appropriate language and organising ideas coherently" (IELTS web page. Test format). Finally, task 3 is a discussion between the examiner and the candidate about a topic related to the previous part. It "focuses on the ability to express and justify opinions and to analyse, discuss and speculate about issues" (ibid.).

The IELTS test format and criteria coincide almost exactly with the CEFR guidelines on tasks and criteria for assessing the speaking performance of a candidate. According to the test format, the candidate must address audiences and speak from notes (part 2) and also speak spontaneously (parts 1 and 3). Furthermore, the long-turn exposition makes it possible to assess the ability of the candidate to give presentations and support his or her ideas (part 2), give reasons and support them (part 3), or depart spontaneously (part 3).

IELTS
Tasks:
• (Task 1) Interview
• (Task 2) Long turn (monologue)
• (Task 3) Discussion
Criteria:
• (Part 1) ability to communicate opinions and information on everyday topics and common experiences or situations by answering a range of questions
• (Part 2) ability to speak at length on a given topic (without further prompts from the examiner), using appropriate language and organising ideas coherently
• (Part 3) ability to express and justify opinions and to analyse, discuss and speculate about issues

CEFR (B2)
Tasks:
• public address (information, instructions, etc.)
• addressing audiences (speeches at public meetings, university lectures, sermons, entertainment, sports commentaries, sales presentations, etc.)
• reading a written text aloud
• speaking from notes, or from a written text or visual aids (diagrams, pictures, charts, etc.)
• acting out a rehearsed role
• speaking spontaneously
• singing
Criteria:
• clear, systematically developed descriptions and presentations
• supporting ideas with subsidiary points and relevant examples
• describing an experience
• describing different topics in detail
• giving reasons and supporting them in discussions and debates
• highlighting the advantages or disadvantages of different options
• giving presentations with clarity and fluency
• departing spontaneously from the discussion when follow-up questions are posed

As for the rubric the IELTS uses to score the candidates in the speaking exam, it is an analytic rubric. It contains 4 different categories to assess: Fluency and Coherence, Lexical Resource, Grammatical Range and Accuracy, and Pronunciation. It is also a quantitative rubric with a numeric scale from 0 to 9; it is domain-relevant and skill-focused since it is used to assess all the tasks of the speaking exam. It is a proficiency rubric because the certificate aims to determine the level of the candidate. The scorer is a trained examiner and it is a paper rubric, although it can also be found on the Internet.

Type of rubric according to IELTS
• How is measured: Analytic
• How is scored: Quantitative
• Theme: Domain-relevant
• Application: Skill-focused
• Function: Proficiency
• Scorer: Teacher (examiner)
• Channel: Paper

In the matter of whether the rubric used by the IELTS examiners is good or not, the

analysis can be found below.

➢ Is the skill assessed actually worthwhile?

Yes. The skill assessed is the speaking skill. In the literature review section, the

importance of the speaking skill has been fully addressed. Moreover, the CEFR boosts

the communicative approach so this is probably the most important skill.

➢ Are there few scoring criteria but correctly labelled?

Yes. The scoring criteria are few (four) as Popham recommends and are correctly

labelled.

➢ Are there degrees of excellence described appropriately?

Yes. There is enough description for all of them although they do not contain any

example.

➢ Is the rubric presented in a clear and handy way?

No, the grading scale is too long so the scoring process can be tedious and confusing.

As regards the CEFR recommendations for good rubrics (the necessity of building a feasible tool, with descriptors positively worded, brief and not vague), the IELTS rubric is feasible since it only contains 4 different criteria. Nevertheless, the length of the grading scale (ten different levels) makes it difficult to use. In relation to the descriptors, they are not always positively worded as encouraged by the European Council. For example, descriptors such as "cannot produce basic sentence forms" (band 2) or "cannot respond without noticeable pauses" can be read (IELTS. Speaking rubric). The descriptors are brief, although they do not provide the examiner with examples.


Finally, are the rubric and the test relevant, valid and reliable? On the one hand, it can be stated that the criteria the rubric uses are suitable because they are relevant to the skill. The European Council uses 5 different criteria to assess speaking (range, accuracy, fluency, interaction and coherence). There are no criteria to measure "interaction" in the IELTS rubric and, since there is a discussion task, they should be included. The other criteria match the ones used by the CEFR; they are simply grouped in a different way. The validity cannot be confirmed: the descriptors provided match the evaluation standards included in the CEFR, but the rubric does not measure everything it is supposed to assess, because there are no interaction criteria despite a discussion task being included in the exam format. A simple comparison between the CEFR speaking scale and the IELTS speaking rubric shows the similarities and differences.

Range, Fluency and Coherence / Discourse Management
• CEFR (B2): Has a sufficient range of language to be able to give clear descriptions and express viewpoints on most general topics. Can produce stretches of language with a fairly even tempo; although he/she can be hesitant, there are few noticeably long pauses. Can use a limited number of cohesive devices to link utterances.
• IELTS (Band 6), Fluency and Coherence: The candidate is willing to speak at length, though may lose coherence at times due to occasional repetition, self-correction or hesitation. He uses a range of connectives and discourse markers but not always appropriately.

Accuracy / Grammar and Vocabulary
• CEFR (B2): Shows a relatively high degree of grammatical control. Does not make errors which cause misunderstanding and can correct most of his/her mistakes.
• IELTS (Band 6), Lexical Resource: The candidate has a wide enough vocabulary to discuss topics at length and make meaning clear in spite of inappropriacies. He generally paraphrases successfully. Grammatical Range and Accuracy: He uses a mix of simple and complex structures, but with limited flexibility. He may make frequent mistakes with complex structures, though these rarely cause comprehension problems.

Interaction / Interactive Communication
• CEFR (B2): Can initiate discourse, take his/her turn when appropriate and end conversation when he/she needs to. Can help the discussion along on familiar ground, confirming comprehension, inviting others in, etc.
• IELTS (Band 6): (no equivalent interaction criterion in the IELTS rubric)

Reliability

The reliability data used to check the IELTS certificates have been discussed in the

writing section.

The above extensive analysis of the IELTS speaking exam and tasks and the rubric proves

that the exam paper is suitable and matches CEFR indications and levels but that the

rubric used for the assessment of the exam is not suitable in many aspects.

The following table summarises the analysis.


EXAM
• Tasks: 3
• Match CEFR tasks: Yes
• Match CEFR criteria: Yes
RUBRIC (type)
• Measurement: Analytic
• Scoring: Quantitative
• Theme: Domain-relevant
• Application: Skill-focused
• Function: Proficiency
• Scorer: Examiner
• Channel: Paper
• Relevant: Yes
• Valid: No
• Reliable: Unknown
CEFR criteria
• Feasible: No
• Descriptors positively worded: No
• Descriptors brief: Yes
• Descriptors not vague: No
Popham's rubric
• Skill worthwhile: Yes
• Scoring criteria (few and well labelled): Yes
• Descriptors (well described): Yes
• Clear and handy: No

ISE II

As happens with the reading and writing skills, the ISE-II assesses two skills together: both the listening and the speaking abilities of the candidate are evaluated in the same exam, following an integrated approach. This module exam consists of four tasks and lasts for 20 minutes. The first task is called the "topic task": the candidate is allowed to speak about a topic within his or her personal interests which he or she has previously prepared. Moreover, he/she is allowed to use notes or maps to ease the task, and candidates may also use an item such as a picture. The timing for this task is 4 minutes. The examiner will ask the candidate questions related to the topic chosen. The candidate is expected to be able to use these language functions:

• Initiating and maintaining the conversation

• Expressing and expanding ideas and opinions

• Highlighting advantages and disadvantages

• Speculating

• Giving advice

• Expressing agreement and disagreement

• Eliciting further information

• Establishing common ground

The second task is named the "collaborative task" and it is also 4 minutes long. In this part, the examiner poses a prompt in the form of a dilemma and the candidate asks questions to find out more information and keep the conversation going. The language functions the candidate is expected to manage are:

• Initiating and maintaining the conversation

• Expressing and expanding ideas and opinions

• Highlighting advantages and disadvantages

• Speculating

• Giving advice

• Expressing agreement and disagreement


• Eliciting further information

• Establishing common ground

Task 3 is the "conversation task" and has a duration of 2 minutes. The examiner asks the candidate questions on a subject (society and living standards, personal values and ideals, the world of work, national environmental concerns, and public figures past and present) and they start a conversation. The candidate must be able to demonstrate the following abilities:

• Initiating and maintaining the conversation

• Expressing and expanding ideas and opinions

• Highlighting advantages and disadvantages

• Speculating

• Giving advice

• Expressing agreement and disagreement

• Eliciting further information

• Establishing common ground

The last task of this module exam assesses listening exclusively, so it will be analysed in the listening section. With regard to the adequacy of the exam design according to the CEFR guidelines, it can be said that the ISE-II tasks match the tasks proposed by the framework as well as the criteria which must be measured to certify the speaking level of a B2 candidate. Therefore, the exam is suitable.


ISE-II
Tasks:
• (Task 1) Topic Task
• (Task 2) Collaborative Task
• (Task 3) Conversation Task
Criteria (identical for Parts 1, 2 and 3):
• Initiating and maintaining the conversation
• Expressing and expanding ideas and opinions
• Highlighting advantages and disadvantages
• Speculating
• Giving advice
• Expressing agreement and disagreement
• Eliciting further information
• Establishing common ground

CEFR (B2)
Tasks:
• public address (information, instructions, etc.)
• addressing audiences (speeches at public meetings, university lectures, sermons, entertainment, sports commentaries, sales presentations, etc.)
• reading a written text aloud
• speaking from notes, or from a written text or visual aids (diagrams, pictures, charts, etc.)
• acting out a rehearsed role
• speaking spontaneously
• singing
Criteria:
• clear, systematically developed descriptions and presentations
• supporting ideas with subsidiary points and relevant examples
• describing an experience
• describing different topics in detail
• giving reasons and supporting them in discussions and debates
• highlighting the advantages or disadvantages of different options
• giving presentations with clarity and fluency
• departing spontaneously from the discussion when follow-up questions are posed

This module is assessed with two rubrics, one for the first three tasks and another for the last one. Since the speaking tasks are the first three, only the first rubric will be analysed in the current section. The ISE-II Speaking and Listening rating scale is analytic and it contains four criteria (communicative effectiveness, interactive listening, language control and delivery). It is a quantitative rubric, too: the numeric scale used ranges from 0 to 4. It is domain-relevant and skill-focused since it is used to assess three different tasks, even though these aim to assess two skills. It is a proficiency rubric and it is paper based, despite the fact that it can also be found online. Finally, an examiner is responsible for the scoring.

Type of rubric according to ISE-II
• How is measured: Analytic
• How is scored: Quantitative
• Theme: Domain-relevant
• Application: Skill-focused
• Function: Proficiency
• Scorer: Teacher (examiner)
• Channel: Paper

The following is the in-depth analysis of the rubric:
➢ Is the skill assessed actually worthwhile?
Yes. The skill assessed is the speaking skill. In the literature review section, the importance of the speaking skill has been fully addressed. Moreover, the CEFR boosts the communicative approach, so this is probably the most important skill.
➢ Are there few scoring criteria but correctly labelled?
Yes. The scoring criteria are few (four) as Popham recommends and are correctly labelled.
➢ Are there degrees of excellence described appropriately?
Yes. All the descriptors are well explained. However, they do not contain any examples.
➢ Is the rubric presented in a clear and handy way?
No. Although the number of criteria is adequate and so are the descriptions, the latter are very extensive, which may make the rubric difficult to use.

The CEFR recommendations for good rubrics (the necessity of building a feasible tool, with descriptors positively worded, brief and not vague) have been partially followed. The ISE-II rubric is feasible since it contains only 4 different criteria, but it may not be very handy. In relation to the descriptors, they are not always positively worded. For instance, in scale 1 one can read "does not maintain and develop the interaction sufficiently" or "does not show adequate level of grammatical accuracy and lexical precision". The descriptors are not brief; on the contrary, they are quite long.

Finally, are the rubric and the test relevant, valid and reliable? On the one hand, it can be stated that the criteria the rubric uses are suitable because they are relevant to the skill. In fact, the European Council uses most of them. Although the labels are not the same, the labels used by the ISE-II measure the same aspects. Hence, the "delivery" category covers the CEFR's 'range', 'fluency' and 'coherence'; 'accuracy' is assessed as "language control" in the Trinity College rubric; and 'interaction' is more or less equivalent to "communicative effectiveness". The ISE-II includes one further criterion which does not appear in the CEFR scales, interactive listening, but this is only because the framework describes the assessment of each skill separately. The validity can also be easily confirmed since the test, with the help of the rubric, measures what it is supposed to assess and the descriptors provided match the evaluation standards included in the CEFR, as the table below shows:

Range, Fluency and Coherence / Delivery
• CEFR (B2): Has a sufficient range of language to be able to give clear descriptions, express viewpoints on most general topics. Can produce stretches of language with a fairly even tempo; although he/she can be hesitant, there are few noticeably long pauses. Can use a limited number of cohesive devices to link utterances.
• ISE-II (Scale 4): clearly intelligible; uses focal stress and intonation effectively; speaks promptly and fluently; requires no careful listening.

Accuracy / Language Control
• CEFR (B2): Shows a relatively high degree of grammatical control. Does not make errors which cause misunderstanding and can correct most of his/her mistakes.
• ISE-II (Scale 4): uses a wide range of grammatical structures/lexis flexibly to deal with topics at this level; consistently shows a high level of grammatical accuracy and lexical precision; errors do not impede communication.

Interaction / Communicative Effectiveness
• CEFR (B2): Can initiate discourse, take his/her turn when appropriate and end conversation when he/she needs to. Can help the discussion along on familiar ground, confirming comprehension, inviting others in, etc.
• ISE-II (Scale 4): fulfils the task very well; initiates and responds with effective turn-taking; effectively maintains and develops the interaction; solves communication problems naturally, if any.

Reliability

Trinity College London reliability data have been mentioned in the writing section.

The extensive analysis above of the ISE-II speaking exam tasks and rubric shows that the exam paper is suitable and matches CEFR indications and levels, but that the rubric used for the assessment of the exam is not suitable in some respects.

The following table summarises the analysis carried out:

EXAM
  Tasks: 3
  Match CEFR tasks: Yes
  Match CEFR criteria: Yes

RUBRIC
  Type
    Measurement: Analytic
    Scoring: Quantitative
    Theme: Domain-relevant
    Application: Skill-focused
    Function: Proficiency
    Scorer: Examiner
    Channel: Paper
  Relevant: Yes
  Valid: Yes
  Reliable: Yes

CEFR criteria
  Feasible: No
  Descriptors
    Positive: No
    Brief: No
    Not vague: Yes

Popham's rubric
  Skill worthwhile: Yes
  Scoring criteria (few and well labelled): Yes
  Descriptors (well described): Yes
  Clear and handy? No

ACLES

The ACLES speaking test for the B2 level lasts for 7-10 minutes. The candidates must perform a monologue and also an interaction in pairs. However, neither the tasks in detail nor the criteria they intend to measure are explained. Hence, it is difficult to determine whether or not they are suitable for the examination of the speaking skill. It is stated on the ACLES website that the criteria followed are those provided for the level in the CEFR, but as they are not clearly stated in any document it is difficult to say for sure.

For the assessment of the speaking paper, the examiner uses a speaking rubric. This rubric has three criteria (fluency/interaction; linguistic correction; and pronunciation). It can be classified as analytic, and it is both quantitative and qualitative as it uses a numeric scale (the same used for the ACLES writing scale) which also includes a qualitative word-scale. According to the theme, it is domain-relevant and according to the application it is skill-focused. It is a proficiency rubric; the scorer is an examiner and it is a paper rubric.

Type of rubric according to ACLES

How is measured: Analytic
How is scored: Quantitative and Qualitative
Theme: Domain-relevant
Application: Skill-focused
Function: Proficiency
Scorer: Teacher (examiner)
Channel: Paper

In-depth analysis of the rubric:

➢ Is the skill assessed actually worthwhile?

Yes. The skill assessed is the speaking skill. In the literature review section, the

importance of the speaking skill has been fully addressed. Moreover, the CEFR boosts

the communicative approach so this is probably the most important skill.

➢ Are there few scoring criteria but correctly labelled?

Yes. The scoring criteria are few (three) as Popham recommends and are correctly

labelled.

➢ Are there degrees of excellence described appropriately?

No. There are multiple cells to describe the different criteria according to the scale. The descriptors are appropriately described, but in some cases they are very vague and imprecise as they merely indicate a performance between the degrees of excellence above and below.

➢ Is the rubric presented in a clear and handy way?

Yes. The rubric is clear because it is not excessively long and the design is good and

handy.

In relation to the CEFR recommendations for good rubrics (the necessity of building a feasible tool, with descriptors positively worded, brief and not vague), the ACLES rubric is feasible since it contains only 3 different criteria. In relation to the descriptors, they are positively worded, as encouraged by the European Council. The descriptors are brief but, as has already been stated, some of the bands are very vague because "shares features of Bands X and X" (ACLES Speaking Rubric) is their only descriptor.

Finally, are the rubric and the test relevant, valid and reliable? On the one hand, it can be stated that the criteria the rubric uses are suitable because they are relevant to the skill. In fact, the European Council uses 5 different criteria to assess speaking (range, accuracy, fluency, interaction and coherence). Even if the labels are not the same, the labels used by ACLES measure the same aspects except one. The validity can also be easily confirmed since the test, with the help of the rubric, measures what it is supposed to assess and the descriptors provided match the evaluation standards included in the CEFR. This can be checked by means of a simple comparison between the CEFR speaking scale and the ACLES rubric:

Range, Fluency and Coherence / Fluency, interaction and adequacy
CEFR (B2): Has a sufficient range of language to be able to give clear descriptions, express viewpoints on most general topics. Can produce stretches of language with a fairly even tempo; although he/she can be hesitant, there are few noticeably long pauses. Can use a limited number of cohesive devices to link utterances.
ACLES (9-10), translated from Spanish: Communicates very fluently, even in long and complex stretches. Makes up for occasional gaps with appropriate paraphrases and circumlocutions. Pauses are scarce and do not hinder the communication of the thematic content. In interactive tasks the dialogue flows naturally, and the candidate proves capable of initiating turns, taking them when it is his or her turn and closing them with communicative effectiveness.

Accuracy / Linguistic correction
CEFR (B2): Shows a relatively high degree of grammatical control. Does not make errors which cause misunderstanding and can correct most of his/her mistakes.
ACLES (9-10), translated from Spanish: Excellent grammatical control, with few systematic errors and minor slips which do not cause misunderstanding and which he or she sometimes self-corrects. Abundant vocabulary, including some low-frequency words or idiomatic phrases appropriate to the task.

Interaction
CEFR (B2): Can initiate discourse, take his/her turn when appropriate and end conversation when he/she needs to. Can help the discussion along on familiar ground, confirming comprehension, inviting others in, etc.

Reliability

ACLES reliability has been discussed in the writing section.

The next table summarises the complete analysis:

EXAM
  Tasks: 2
  Match CEFR tasks: Unknown
  Match CEFR criteria: Unknown

RUBRIC
  Type
    Measurement: Analytic
    Scoring: Quantitative and Qualitative
    Theme: Domain-relevant
    Application: Skill-focused
    Function: Proficiency
    Scorer: Examiner
    Channel: Paper
  Relevant: Yes
  Valid: Yes
  Reliable: Unknown

CEFR criteria
  Feasible: Yes
  Descriptors
    Positive: Yes
    Brief: Yes
    Not vague: No

Popham's rubric
  Skill worthwhile: Yes
  Scoring criteria (few and well labelled): Yes
  Descriptors (well described): No
  Clear and handy? Yes

EOI

The speaking paper is around 10 minutes long. The candidate may have to hold a conversation with the teacher or another classmate, comment on and describe an image, discuss a current issue, express personal opinions, talk about a personal experience, or take part in a dialogue or role play on a familiar situation in which the candidate proves his or her ability to manage by himself or herself (Escuela Oficial de Idiomas de Gijón. Departamento de Inglés 169). The different possible types of task which can appear are (169):

- Answer and ask questions

- Describe people or objects with visual support.

- Narrate dreams, goals or feelings

- Inform on familiar topics

- Tell stories about familiar themes.

- Make hypotheses

- Make a presentation and answer audience’s questions.

- Exchange information on known matters about his or her profession or

interests

- Advise someone

- Complain about common situations

- Manage to communicate in daily situations or journeys

- Check correct information

- Summarise a plot

- Describe processes and procedures


The paper focuses on the candidates' ability to produce detailed and well-organised oral texts about different topics, both concrete and abstract, and on their ability to take part in conversations, both face-to-face and through electronic devices, with a clear pronunciation, correctness, fluency and spontaneity that allow them to be understood with no effort, despite sporadic mistakes.

If compared to the CEFR tasks and criteria, it could be stated that the paper is suitable for the assessment of the skill. However, it would be advisable to specify the exam tasks and the focus of each of them more clearly.

EOI

Tasks: some of the following tasks may be included:
- Answer and ask questions
- Describe people or objects with visual support
- Narrate dreams, goals or feelings
- Inform on familiar topics
- Tell stories about familiar themes
- Make hypotheses
- Make a presentation and answer the audience's questions
- Exchange information on known matters about his or her profession or interests
- Advise someone
- Complain about common situations
- Manage to communicate in daily situations or journeys
- Check correct information
- Summarise a plot
- Describe processes and procedures

Criteria: ability of the candidates:
- to produce detailed and well-organised oral texts about different topics, both concrete and abstract
- to take part in conversations, face-to-face and through electronic devices, with a clear pronunciation and correctness, fluency and spontaneity which allows understanding with no effort despite sporadic mistakes

CEFR (B2)

Tasks:
• public address (information, instructions, etc.)
• addressing audiences (speeches at public meetings, university lectures, sermons, entertainment, sports commentaries, sales presentations, etc.)
• reading a written text aloud
• speaking from notes, or from a written text or visual aids (diagrams, pictures, charts, etc.)
• acting out a rehearsed role
• speaking spontaneously
• singing

Criteria:
• clear, systematically developed descriptions and presentations
• supporting ideas with subsidiary points and relevant examples
• describing an experience
• describing different topics in detail
• giving reasons and supporting them in discussions and debates
• highlighting the advantages or disadvantages of different options
• giving presentations with clarity and fluency
• departing spontaneously from the discussion when follow-up questions are posed

The speaking paper is scored with the help of a rubric. The rubric is analytic and contains 6 different criteria (appropriate pronunciation, phonology, rhythm and intonation; grammar; vocabulary; organisation and register; interaction; and discourse management). It is quantitative and qualitative as it uses both a numeric and a word scale with six degrees of excellence. It is domain-independent and skill-focused since it is used only for the assessment of the speaking paper, but for any type of task. Finally, it is a proficiency rubric used by an examiner and paper based.

Type of rubric according to EOI

How is measured: Analytic
How is scored: Quantitative and Qualitative
Theme: Domain-relevant
Application: Skill-focused
Function: Proficiency
Scorer: Teacher (examiner)
Channel: Paper

➢ Is the skill assessed actually worthwhile?

Yes. The skill assessed is the speaking skill. In the literature review section, the

importance of the speaking skill has been fully addressed. Moreover, the CEFR boosts

the communicative approach so this is probably the most important skill.

➢ Are there few scoring criteria but correctly labelled?

No, there are too many criteria. Furthermore, the labels are too long.

➢ Are there degrees of excellence described appropriately?

Yes, the descriptors are well detailed although a bit long and no examples are provided.

➢ Is the rubric presented in a clear and handy way?

No. There are too many criteria and the descriptors are a bit long, so the rubric is very difficult to manage.

The CEFR recommendations for good rubrics (the necessity of building a feasible tool, with descriptors positively worded, brief and not vague) have not been followed: the EOI rubric is not feasible since it has too many criteria. Some of them could easily be grouped under the same label so that there would be fewer. In relation to the descriptors, they are

generally positively worded, as encouraged by the European Council, but there are some

descriptors that are negatively worded, especially in the lowest degrees of excellence. The

descriptors are not brief; they are quite long but well detailed.

Finally, are the rubric and the test relevant, valid and reliable? On the one hand, it can be stated that the criteria the rubric uses are suitable because they are relevant to the skill. In fact, the European Council uses 5 different criteria to assess speaking (range, accuracy, fluency, interaction and coherence). Even if the labels are not the same, 5 of the criteria used by the EOI rubric are also used in the CEFR. The validity can also be easily confirmed since the test, with the help of the rubric, measures what it is supposed to assess and the descriptors provided match the evaluation standards included in the CEFR. This can be checked by means of a simple comparison between the CEFR speaking scale and the EOI rubric:

Range, Fluency and Coherence
CEFR (B2): Has a sufficient range of language to be able to give clear descriptions, express viewpoints on most general topics. Can produce stretches of language with a fairly even tempo; although he/she can be hesitant, there are few noticeably long pauses. Can use a limited number of cohesive devices to link utterances.
EOI (5 points): Adjusts the level of formality precisely. Makes a coherent and clear speech. Expresses himself/herself with spontaneity and fluency.

Accuracy / Grammar and vocabulary
CEFR (B2): Shows a relatively high degree of grammatical control. Does not make errors which cause misunderstanding and can correct most of his/her mistakes.
EOI (5 points): Uses high-level grammar structures and communicates with many varied structures in an excellent way. Uses high-level vocabulary correctly and is able to use precise and varied vocabulary.

Interaction
CEFR (B2): Can initiate discourse, take his/her turn when appropriate and end conversation when he/she needs to. Can help the discussion along on familiar ground, confirming comprehension, inviting others in, etc.
EOI (5 points): Expresses himself/herself with autonomy and fluency. Uses expressions to start, maintain and conclude a conversation. Interaction is achieved without much effort and with spontaneity.

Reliability

Information related to reliability coefficients has been given in the writing section.

EXAM
  Tasks: Variable
  Match CEFR tasks: Yes
  Match CEFR criteria: Yes

RUBRIC
  Type
    Measurement: Analytic
    Scoring: Quantitative and Qualitative
    Theme: Domain-relevant
    Application: Skill-focused
    Function: Proficiency
    Scorer: Examiner
    Channel: Paper
  Relevant: Yes
  Valid: Yes
  Reliable: Unknown

CEFR criteria
  Feasible: No
  Descriptors
    Positive: No
    Brief: No
    Not vague: Yes

Popham's rubric
  Skill worthwhile: Yes
  Scoring criteria (few and well labelled): No
  Descriptors (well described): Yes
  Clear and handy? No

6.4. Reading

Reading rubrics are not very widely used, neither in Spain nor in countries such as the United States, where rubrics have been commonly implemented for many years. In fact, the well-known English as a foreign language certificates mentioned in the previous section (Cambridge Certificates, Trinity College London, IELTS, ACLES and the EOI) do not use any rubric to assess this skill. As has already been stated, rubrics were traditionally writing assessment tools. The success of the communicative approach facilitated their inclusion as part of the assessment of the speaking skill. However, despite not being frequently used for the evaluation of the receptive skills, rubrics can work as tools to evaluate the reading and listening skills too.

Reading comprehension is most commonly assessed through multiple-choice questions, true or false questions, sentence completion, open questions, gapped texts or summaries. The CEFR indicates that the tasks selected should focus on (69):
• reading for general orientation
• reading for information, e.g. using reference works
• reading and following instructions
• reading for pleasure
The language user may also read:
• for gist
• for specific information
• for detailed understanding
• for implications, etc.

6.4.1. Literature Review

The correct interpretation and comprehension of a text is what is often referred to as reading. According to Al-Ghazo, it is an "active and mental process that improves concentration and focus" (722). There are many reasons why the assessment of the reading skill is important, but most of the literature reviewed mentions the same ideas. Uribe-Enciso, among other authors, postulates that effective reading opens access to a huge amount of digital and print information (39), while Al-Ghazo stresses benefits such as the expansion of vocabulary or the improvement of other language aspects (grammar, writing) (722).

“In language learning reading promotes continuous expansion of vocabulary, full

awareness of syntactic structures and forms of written discourse, development of

cognitive skills and learner autonomy and increasing comprehensive knowledge

of any topic readers want to learn about” (Uribe-Enciso 39).

The process of reading implies many different sub-processes, such as anticipation, the intelligent guesses the reader makes at the beginning, the understanding of the main ideas and the comprehension of unknown words from context and, as Grabe states, all those processes "are performed according to the reader's language proficiency level, the text type, the reading purpose, the reader's motivation" (cited in Uribe-Enciso 40). Regarding some of those sub-processes or reading skills, the elt-resourceful web page created by Rachel Roberts contains a full blog post where their use and instruction by teachers is encouraged. The skills of prediction, reading for gist, reading for specific information, and skimming and scanning are explained there. The idea of encouraging prediction before reading is related to the activation of previously known vocabulary and knowledge, which might be connected to the topic of the text and eases comprehension. Similar to the prediction skill is the reading for gist process, which aims to provide students with an overview of the text to verify that they understand just its main idea, so that later on they can try to read for specific information. Finally, skimming and scanning are two strategies normally used with L2 learners, as they allow them to glean specific pieces of information (scanning) or the main ideas of a text (skimming) without reading it in detail. These strategies are very helpful in increasing the speed of reading with regard to a language certificate exam or in improving learners' synthesis ability. Grellet suggests these skills or processes are actually different types of reading, although some of the terminology she uses is different (extensive and intensive reading) (cited in Karppinen 4). Knowledge of the processes, skills or types of reading is essential for its assessment in L2, since L2 reading is more complex owing to the fact that "acquisition of systematic knowledge and development of reading skills occur simultaneously", in contrast to what happens in L1 reading (Uribe-Enciso 40). Therefore, the reinforcement and training of these skills will balance and facilitate the L2 learner's tasks.

The National Reading Panel (NRP) affirms there are three major factors which affect reading comprehension. Those factors are vocabulary instruction, the active-interactive strategic process and, finally, the preparation of the teacher. Thus, instructors must use meta-cognitive strategies, which deal with the planning, monitoring and evaluation of reading comprehension; cognitive strategies, associated with the incoming information; and social and affective strategies, linked to interaction (cited in Al-Ghazo 723). As for the methodological strategies which should be used in instruction, many studies have been conducted. Most of them agree on a division of the strategies into lower-level or bottom-up processes and higher-level or top-down processes. Kianiparsa and Vali define the former as those processes "related to grammar and vocabulary recognition", whereas the latter are "related to comprehension, schemata, and interpretation of a text" (9). The same division was made by Grabe and Stoller. They classify the tasks of word recognition, syntactic parsing, semantic proposition formation and working memory activation as lower-level processes, and the text model of reading comprehension, the situation model of reader interpretation and executive control processing as higher-level processes (cited in Karppinen 4-5). "Some bottom-up theorists, such as Abraham, Carrel and Eisterhold, claim that the lack of automaticity in accessing linguistic data causes poor skilled reading" (cited in Uribe-Enciso 40), so it is important to train this skill. However, Kianiparsa and Vali defend that "for being a competent reader we need a combination of both these processes [lower- and higher-level]" (10).

Some other strategies are mentioned in the instruction of reading. KWL (Know-Want to Know-Learned), for example, promotes planning, goal setting, monitoring and evaluation of the information contained in the text. CORI (Concept Oriented Reading Instruction) focuses on selecting topics based on personal interests, gathering information through reading and then working on a project. Strategies based on cooperative learning tasks, such as puzzles, constitute CSR (Collaborative Strategic Reading) (Uribe-Enciso 43). The evaluation of the reading skill should be a tool for gathering information on the learner's reading abilities, later used for the planning and implementation of better reading lessons. Thus, a wide variety of text and reading characteristics should be present in the evaluation, and different kinds of assessment methods and tools should be applied (Kianiparsa and Vali 10, 18). The most common reading assessment tasks encompass

“multiple-choice, written and oral recall, cloze, summary, sentence completion, short-

answer, open-ended question, true/false, matching activity, check list, ordering and fill-

in-the blank test” (14). Karppinen conducted research based on the kind of reading

activities and strategies used by Finnish ESL books. The results of her investigation show

that individual and pair activities are more common than group activities (only 7-10% of

the total) (14). It was also found that 75% of the reading activities were post-reading, in

contrast with a 25% of the activities conceived for pre-reading. Furthermore, no activities

were intended to be carried out during the reading (14). The most frequent tasks were

summary, open-ended questions and translation exercises (15). With regard to the training

of reading strategies, around 45% of the activities focused on careful or detailed reading,

only 20% worked on scanning, and a scant 3% dealt with skimming (16). As for purposes,

between 47% and 32% of the activities aimed to understand the core ideas of the text,

284

while around 30% aimed to elicit a personal response (17). Finally, close to 50% of the

reading activities where combined with speaking, and between a 34% and 44% were

combined with writing. The listening skill, however, is not often combined with any

reading activities (18).

The use of rubrics or grading scales for the assessment of the reading skill is virtually

non-existent, as is the literature on it. Nevertheless, Grabe argues that "learners

sometimes do not carry out successful reading tasks because they are not aware of the

reading purposes and, therefore, they do not know what strategies are more appropriate

for them" (cited in Al-Ghazo 44). As a result, research on the creation of rubrics and their validation with case studies could provide an interesting tool: students could know what the purposes and criteria are and address the reading task bearing them in mind.

6.4.2. Assessment of Reading in the main English Certificates of ESL

Cambridge First Certificate

The assessment of the reading skill in the Cambridge First Certificate is carried out

through a paper which includes both the Reading and the Use of English exam. The time

allowed is 1 hour and 15 minutes for a total of seven tasks. Three of those tasks intend to

measure the reading comprehension (Part 5, 6 and 7). “For Parts 5 to 7, the test contains

a range of texts and accompanying reading comprehension tasks” (Cambridge English

Language Assessment 7). Part 5 consists of a text followed up by 6 multiple choice

questions with 4 possible options each. This task intends to measure the

“detailed understanding of a text, including the expression of opinion, attitude,

purpose, main idea, detail, tone, implication and gist. Candidates are also tested


on their ability to recognise meaning from context and follow text organisation

features, such as exemplification, comparison and reference” (8)

Part 6 is a gapped text: some sentences from the original text are removed and the learner must identify which one goes in each gap. In this case, the focus is on the "text structure, cohesion and coherence, and candidates' ability to follow the development of a long text" (9). Finally, Part 7 consists of a long text divided into different parts, each labelled with a letter. There are ten questions and the candidate must indicate in which part of the text the required information can be found. The task aims to check whether the candidate can locate specific information and detail and recognise opinion and attitude in one long text or a group of short texts.

With regard to the CEFR, the reading global scale provided states that a B2 candidate is

able to:

“read with a large degree of independence, adapting style and speed of reading to

different texts and purposes, and using appropriate reference sources selectively.

Has a broad active reading vocabulary but may experience some difficulty with

low frequency idioms” (69).

In addition, the CEFR contains four other reading scales:

• Reading correspondence

• Reading for orientation

• Reading for information and argument

• Reading instructions

The most important criteria included in those scales for B2 users include the ability to read correspondence, follow a text of instructions, and understand specialised articles and reports related to current issues. Among the functions that the candidate has to be able to perform are scanning through long and complex texts, finding relevant details and identifying the most important information, ideas or opinions (CEFR 69-71). Thus,

identification of the most important information, ideas or opinions (CEFR 69-71). Thus,

the tasks included in the First Certificate for the assessment of reading match many of the

criteria of the framework. However, the understanding of texts of instructions is not

present and it could be included. The understanding of correspondence is not assessed

through a specific task of the reading paper. Nevertheless, it is checked indirectly in the

writing paper since one of the tasks may include part of a letter received which must be

answered.

FCE

Tasks:
  Part 5: multiple choice
  Part 6: gapped text
  Part 7: multiple matching

Criteria:
  Part 5: detailed understanding of a text, including the expression of opinion, attitude, purpose, main idea, detail, tone, implication and gist
  Part 6: text structure, cohesion and coherence, and candidates' ability to follow the development of a long text
  Part 7: locate specific information and detail, and recognise opinion and attitude

CEFR

Tasks:
  • reading for general orientation
  • reading for information, e.g. using reference works
  • reading and following instructions
  • reading for pleasure
  • for gist
  • for specific information
  • for detailed understanding
  • for implications, etc.

Criteria:
  • read with a large degree of independence, adapting style and speed of reading to different texts and purposes, and using appropriate reference sources selectively
  • ability to read correspondence
  • follow a text of instructions
  • understand specialised articles and reports related to current issues
  • scan through long and complex texts
  • find relevant details
  • identification of the most important information, ideas or opinions

Concerning the assessment tool, the First Certificate does not use any kind of rubric or

grading scale to assess the candidate’s performance in the reading paper.

IELTS

Three reading passages are used to assess reading comprehension in a 60-minute test. There is a total of 40 items. However, the three reading tasks are not always the same; they are drawn from a total of 11 different types. Task type 1 is a reading with "multiple-choice" questions. The candidate may have to decide either which is the best answer out of four options or which are the two best out of five. The number of questions for this test

is variable and it “tests a wide range of reading skills, including detailed understanding

of specific points or an overall understanding of the main points of the text” (IELTS web

page. Test format). The second reading task type is referred to as “identifying

information”. The text is followed by several true, false or not mentioned questions and

it aims to assess “the test takers’ ability to recognise particular points of information

conveyed in the text. It can thus be used with more factual texts”. The next task type is

called “Identifying writer’s views/claims”, and the candidates may have to decide

whether some statements match the author’s opinion or not or if the information is not

given. It “assesses the test takers’ ability to recognise opinions or ideas, and so it is often

used with discursive or argumentative texts”.

The “matching information” task type implies the localisation of specific information

within paragraphs marked with letters. It focuses on the “ability to scan for specific

information”. Meanwhile, the “Matching headings” task measures the ability to

“recognise the main idea or theme in the paragraphs or sections of a text, and to

distinguish main ideas from supporting ones” (IELTS web page. Test Format) and

consists in matching each paragraph with the right headline. The “Matching features” task

type requires matching some statements with a list of options in order to check the

candidate’s ability to recognise relationships and connections. Those reading questions

that involve matching the first half of a sentence with one form a list of several options

which are labelled under the “matching sentence endings” task type and are aimed at

testing the understanding of the core ideas in a sentence.

Candidates may also be asked to complete a sentence with a given number of words based

on the text information, so that their ability to locate specific information can be checked.

This sort of task is named “sentence completion”. When the candidates need to complete

the summary or a table with information drawn from the text, the task type is called

289

“Summary, note, table, flow-chart completion”. The “Diagram label completion” task

type is the same, but with a diagram. Both of these tasks attempt to assess the “ability to

understand a detailed description” (ibid.). Finally, the “short-answer questions” task type

consists in answering some questions based on the information of the text with a specific

number of words. The task’s focus is the “ability to locate and understand precise

information in the text.” (ibid.)

With regard to the CEFR, as has already been explained above, it advises the use of tasks that imply reading for different purposes and in different ways. The IELTS test

format, with three tasks from eleven possible types, broadly covers all the tasks indicated

in the CEFR.

IELTS (3 tasks from 11 possible types)

Task types:
  Task type 1: Multiple choice
  Task type 2: Identifying information
  Task type 3: Identifying writer's views/claims
  Task type 4: Matching information
  Task type 5: Matching headings
  Task type 6: Matching features
  Task type 7: Matching sentence endings
  Task type 8: Sentence completion
  Task type 9: Summary, note, table, flow-chart completion
  Task type 10: Diagram label completion
  Task type 11: Short-answer questions

Criteria:
  Multiple choice tests a wide range of reading skills, including detailed understanding of specific points or an overall understanding of the main points of the text.
  Identifying information assesses the test takers' ability to recognise particular points of information conveyed in the text. It can thus be used with more factual texts.
  Identifying writer's views/claims assesses the test takers' ability to recognise opinions or ideas, and so it is often used with discursive or argumentative texts.
  Matching information assesses the test takers' ability to scan for specific information. Unlike task type 5, Matching headings, it is concerned with specific information rather than with the main idea.
  Matching headings tests the test takers' ability to recognise the main idea or theme in the paragraphs or sections of a text, and to distinguish main ideas from supporting ones.
  Matching features assesses the test takers' ability to recognise relationships and connections between facts in the text and their ability to recognise opinions and theories. It may be used both with factual information and with opinion-based discursive texts. Test takers need to be able to skim and scan the text in order to locate the required information and to read for detail.
  Matching sentence endings assesses the test takers' ability to understand the main ideas within a sentence.
  Sentence completion assesses the test takers' ability to locate detail/specific information.
  Summary, note, table, flow-chart completion assesses the test takers' ability to understand details and/or the main ideas of a section of text. In the variations involving a summary or notes, test takers need to be aware of the type of word(s) that will fit into a given gap (for example, whether a noun is needed, or a verb, etc.).
  Diagram label completion assesses the test takers' ability to understand a detailed description, and to relate it to information presented in the form of a diagram.
  Short-answer questions assess the test takers' ability to locate and understand precise information in the text.

CEFR

Tasks:
  • reading for general orientation
  • reading for information, e.g. using reference works
  • reading and following instructions
  • reading for pleasure
  • for gist
  • for specific information
  • for detailed understanding
  • for implications, etc.

Criteria:
  • read with a large degree of independence, adapting style and speed of reading to different texts and purposes, and using appropriate reference sources selectively
  • ability to read correspondence
  • follow a text of instructions
  • understand specialised articles and reports related to current issues
  • scan through long and complex texts
  • find relevant details
  • identification of the most important information, ideas or opinions

As for the assessment tool, the IELTS does not use any kind of rubric or grading scale to

assess the candidate’s performance in the reading paper.

ISE-II

The Trinity College London reading paper for the B2 level is, as has already been mentioned and explained, taken along with the writing exam. Therefore, only the reading tasks are explained in this section. There are two exclusive tasks for the assessment of reading comprehension. Task 1 is a 500-word text divided into 5 paragraphs, which can be an article, a review, a magazine or textbook extract, or any other format the candidate is familiar with. After reading the text, the candidate must answer 15 different questions. The first 5 questions are for the candidate to demonstrate that he or she has understood the main ideas through the matching of some headlines with each of the paragraphs of the text. Afterwards, the student must find 5 true statements from a list of eight. This task intends to test the understanding of specific information. Questions 11 to 15 also deal with the understanding of specific and factual information: the learner has to complete some sentences with one word or a few words.

Task 2 of the reading paper is called "multi-text reading" and it consists of four reading texts presented together, followed by 15 questions. The four texts together are roughly 500 words. In the first items, the candidate must indicate which of the texts each question relates to, thus proving that he or she understands the main idea and purpose of each of them. The following questions are equivalent to the second section of questions in Task 1: the selection of 5 true statements from a list of 8. The last questions (26-30) require the completion of a summary with a number of words from the text.

As for the CEFR, it is advisable to use tasks that involve reading for different purposes and in different ways. The ISE-II test format, with two tasks which contain different question types and areas, covers most of the CEFR requirements.

ISE-II

Tasks:
  Task 1: Long reading
    - Title matching
    - Selecting the true statements
    - Completing sentences
  Task 2: Multi-text reading
    - Multiple matching
    - Selecting the true statements
    - Completing summary notes

Criteria:
  Task 1: Long reading
    Title matching: the candidate must demonstrate that he or she understands the main idea of each paragraph. Some useful reading subskills to practise for this section are: skimming; scanning; reading for gist; understanding the main idea of each paragraph.
    Selecting the true statements: the candidate demonstrates that he or she understands specific, factual information at the sentence level. Some useful reading subskills to practise for this section are: careful reading for specific information; comparing, evaluating and inferring; distinguishing principal statement from supporting examples or details; distinguishing fact from opinion; scanning.
    Completing sentences: the candidate demonstrates that he or she understands specific, factual information at the word and/or phrase level OR can infer and understand across paragraphs (e.g. writer's attitude, line of argument). Some useful reading subskills to practise for this section are: careful reading for comprehension; understanding cohesion patterns, lexis, grammar and collocation; deducing meaning; understanding across paragraphs.
  Task 2: Multi-text reading
    Multiple matching: the candidate demonstrates that he or she understands the main idea and purpose of each text. Some useful reading subskills to practise for this section are: skimming; scanning; reading for gist; reading for purpose or main ideas.
    Selecting the true statements: the candidate demonstrates that he or she understands specific, factual information at the sentence level. Some useful reading subskills to practise for this section are: careful reading for specific information; comparing, evaluating and inferring; distinguishing principal statement from supporting examples or details; distinguishing fact from opinion; scanning.
    Completing summary notes: the candidate demonstrates that he or she understands specific, factual information at the word and/or phrase level across the texts. Some useful reading subskills to practise for this section are: careful reading for comprehension at the word and/or phrase level across texts; inferring; summarising.

CEFR

Tasks:
  • reading for general orientation
  • reading for information, e.g. using reference works
  • reading and following instructions
  • reading for pleasure
  • for gist
  • for specific information
  • for detailed understanding
  • for implications, etc.

Criteria:
  • read with a large degree of independence, adapting style and speed of reading to different texts and purposes, and using appropriate reference sources selectively
  • ability to read correspondence
  • follow a text of instructions
  • understand specialised articles and reports related to current issues
  • scan through long and complex texts
  • find relevant details
  • identification of the most important information, ideas or opinions

The Trinity College London ISE-II certificate does not use any rubric for the assessment

of the reading tasks in the reading and writing paper.

ACLES

The reading comprehension paper may have between 2 and 4 tasks and it lasts between 60 and 70 minutes. The texts used are authentic, with a minimum of 1,300 words each and a maximum of 2,100 in total. The paper aims to check that the candidate is able to understand the main ideas of complex texts. The Association of Language Centres in Higher Education gives some guidelines, but it is each of the examination centres which decides certain aspects. For example, the University of A Coruña language centre decides whether to include 2 or 4 tasks. The criteria for the paper can be found on its website (Centro de Linguas, Universidade de A Coruña web page). The criteria included in the following table have been translated from that website:

ACLES

Tasks: 2-4 tasks
  • Understand the main ideas of complex texts

Criteria:
  • Able to read independently, adapting the style and the speed to different texts. The candidate has a wide range of vocabulary and may have some difficulties with uncommon terms
  • Able to read and understand the gist of texts related to his or her speciality
  • Can search for relevant details fast
  • Identifies the main contents in a piece of news, an article or a report about a wide range of professional issues
  • Understands specialised articles and can use a dictionary to confirm his or her interpretation of specific terms
  • Gets information, ideas and opinions from different specialised sources
  • Understands extensive, complex instructions, including details, conditions and warnings

CEFR

Tasks:
  • reading for general orientation
  • reading for information, e.g. using reference works
  • reading and following instructions
  • reading for pleasure
  • for gist
  • for specific information
  • for detailed understanding
  • for implications, etc.

Criteria:
  • read with a large degree of independence, adapting style and speed of reading to different texts and purposes, and using appropriate reference sources selectively
  • ability to read correspondence
  • follow a text of instructions
  • understand specialised articles and reports related to current issues
  • scan through long and complex texts
  • find relevant details
  • identification of the most important information, ideas or opinions

According to the above, the criteria stated by ACLES do not match the CEFR guidelines, since they are intended to check only that the candidate is able to understand the main ideas. Nevertheless, through the criteria stated by the language centre of the UDC, it can be deduced that the identification of details and the following of instructions are also being measured. Moreover, the criteria stated are virtually equivalent to those stated in the CEFR reading scales.

No rubric is used for the assessment of the reading paper.

EOI

The reading paper lasts for about 60 minutes. The number of tasks is variable, but they

may be one of the following (Escuela Oficial de Idiomas de Gijón. Departamento de

Inglés 173-174):

• Answer questions about general and specific reading comprehension (open-

ended, multiple choice, true or false)

• Find words or expressions in a text for the given definitions

• Match text fragments with ideas

• Complete dialogues

• Complete a text with given words

• Choose the best title for the text

• Indicate the main idea

• Match paragraphs with different titles

• Compare similarities and differences between two texts

• Identify the author’s purpose, intention or opinion

• Order a text

• Choose the best summary of the text

• Remove illogical words

• Reinsert sentences which have been removed from a text

• Complete a text with gaps

• Identify statements related to a text

• Use information from a text to solve a problem

• Ask questions to given answers

• Translate sentences

The texts used can be conversations or dialogues, application forms, public announcements, commercial adverts or informative leaflets, basic information on services, instructions, postcards, e-mails, faxes, descriptive texts (people, places, houses, work, etc.) or short stories (174).

As for the criteria, the exam intends to assess the reading comprehension of the candidates and to check their ability to (172-173):

- Understand extensive, complex instructions
- Identify content and main ideas in articles, reports and pieces of news quickly
- Understand letters and emails
- Understand articles and reports on current affairs in which the author expresses a concrete opinion or point of view
- Identify different points of view and main conclusions
- Identify the topic, argumentative line, main ideas and details
- Interpret the cultural features, social conventions and lifestyles which appear in the text

In regard to the guidelines given by the European Council in the framework, the paper

matches the tasks proposed and also the criteria to assess the candidate’s reading

comprehension ability.

EOI

Tasks (variable number):
  • Answer questions about general and specific reading comprehension (open-ended, multiple choice, true or false)
  • Find words or expressions in a text for the given definitions
  • Match text fragments with ideas
  • Complete dialogues
  • Complete a text with given words
  • Choose the best title for the text
  • Indicate the main idea
  • Match paragraphs with different titles
  • Compare similarities and differences between two texts
  • Identify the author's purpose, intention or opinion
  • Order a text
  • Choose the best summary of the text
  • Remove illogical words
  • Reinsert sentences which have been removed from a text
  • Complete a text with gaps
  • Identify statements related to a text
  • Use information from a text to solve a problem
  • Ask questions to given answers
  • Translate sentences

Criteria:
  - Understand extensive, complex instructions
  - Identify content and main ideas in articles, reports and pieces of news quickly
  - Understand letters and emails
  - Understand articles and reports on current affairs in which the author expresses a concrete opinion or point of view
  - Identify different points of view and main conclusions
  - Identify the topic, argumentative line, main ideas and details
  - Interpret the cultural features, social conventions and lifestyles which appear in the text

CEFR

Tasks:
  • reading for general orientation
  • reading for information, e.g. using reference works
  • reading and following instructions
  • reading for pleasure
  • for gist
  • for specific information
  • for detailed understanding
  • for implications, etc.

Criteria:
  • read with a large degree of independence, adapting style and speed of reading to different texts and purposes, and using appropriate reference sources selectively
  • ability to read correspondence
  • follow a text of instructions
  • understand specialised articles and reports related to current issues
  • scan through long and complex texts
  • find relevant details
  • identification of the most important information, ideas or opinions

6.5. Listening

Concerning the listening skill, grading scales are not commonly used either. As occurred with the other receptive skill, reading, the common English certificates mentioned in this piece of work (with the exception of the ISE-II) do not use any rubric to assess this skill. As for the most common tasks used for the assessment of listening, they are multiple-choice questions, true or false questions, open questions and sentence completion exercises.

The CEFR lists the following listening tasks to assess this skill (65):

• listening to public announcements (information, instructions, warnings, etc.)

• listening to media (radio, TV, recordings, cinema)

• listening as a member of a live audience (theatre, public meetings, public lectures, entertainment, etc.)

• listening to overheard conversations, etc.

6.5.1. Literature Review

The literature about the use of rubrics is, as occurs with the other receptive skill, virtually non-existent. As was the case with reading, most of the research deals with the difficulties of the process and appropriate strategies of instruction. Some of the information may be relevant for this dissertation, as it is important to take it into account for the construction of a rubric which can help in the assessment of the listening skill. Helgesen stated that "listening is an active, purposeful process of making sense of what we hear" (24). As Celce-Murcia highlights, the importance of the assessment of this skill comes from the fact that it is the skill most frequently used in daily life (cited in Solak and Altay 190). Another vital reason for the importance of listening comprehension is that no learning can take place without the correct comprehension of the input, which implies that listening is essential for the development of the productive skill of speaking (191).

Listening comprehension is a "highly problem-solving activity that can be broken down into a set of distinct sub-skills" (cited in Solak and Altay 191). As a result, many studies have stressed the necessity to integrate different processes, such as the phonetic, phonological, prosodic, lexical, syntactic, semantic and pragmatic ones, in order to be able to understand any spoken message (190). The difficulties within the process are vast; Hedge classifies them into internal and external (cited in ibid. 191). The learner's lack of motivation, his or her level of anxiety, the lack of knowledge of the topic discussed or the appearance of many unknown words are internal problems, whilst environmental noise or the speaker's characteristics are encompassed under the external problems label. Some other authors have also pointed out certain factors that may be considered obstacles, such as the speaker's accent, omissions, the length of the listening, the poor quality of the recording or even the distance between the recorder and the listeners. The lack of knowledge of listening strategies has also been mentioned as a factor which hinders comprehension (191).

Richards has collected some of the strategies which may be relevant for students when facing a listening task and which should be taught and trained. On the one hand, there are the cognitive strategies, "the mental activities related to the comprehending and storing in working memory or long-term memory for later retrieval" (10). On the other hand, the meta-cognitive strategies "are conscious or unconscious mental activities that perform an executive function in the management of cognitive strategies" (11). Solak and Altay conducted research on beliefs about English language listening comprehension problems with 12 prospective teachers. The findings suggest that the participants do not have difficulties finding the main ideas of the listening or eliciting knowledge related to the topic; rather, they have problems with words that are not pronounced clearly and with varied accents. In addition, the presence of many unknown words was found to be the most important reason for failure in listening comprehension.

Some research has studied various instruction strategies and activities which may be useful to train and assess this skill. In an article for The Internet TESL Journal, Lavelle explained how to make the most of the listening activities contained in an ESL textbook. He argues that there are basically four different phases for a lesson based on listening practice. In the Listening Phase, he suggests using both top-down and bottom-up strategies. His proposal includes a first listening approach through the ticking of certain key words or sentences. Once those words have been located, the learners listen again in order to try to understand their meanings. The second part of this phase might consist of questions on the understanding of the whole passage. The second phase is that of Grammaticalisation. Lavelle suggests Bastone's bottom-up method, which could be the use of the words previously learned in sentences and their combination through the application of suitable grammar. The third phase is named "Focus on Lexis" and will discuss different collocations related to the vocabulary learned. Finally, the Personalization Phase will lead students to discuss different questions in which they need to use the vocabulary and apply the grammar and collocations learned.

Walters and Chien recommend summarisation as a “high-skill” exercise for advanced

listening training as it requires learners to extract the main ideas and re-organise them

(313). The case study they carried out included eleven English listening and speaking

teachers and ten American native-English speaking college students. All members were

surveyed on their listening assessment preferences and the identification of how they

would assess some specific news text. The results suggest preferences for checking the

main ideas, identifying vocabulary and listening for both gist and detailed information. The selection of key ideas, vocabulary and a model summary for the design of a listening assessment enabled the researchers to conclude that both groups of participants

agreed on the main ideas and most of the key vocabulary. Thus, summarisation was

validated as a useful technique.

6.5.2. Assessment of listening in the main English Certificates of ESL

Cambridge First Certificate


The Cambridge First Certificate paper for the assessment of the listening skill consists of

four different tasks. The first task involves listening to eight different and unrelated short

extracts and one multiple choice question for each of them. Each of the extracts has a

different focus: main point, purpose or location of the speech, relationships between speakers, and the speakers' attitudes or opinions. Part 2 is a monologue and the completion

of ten sentences in order to assess specific and detailed information. Gist, detail, function,

attitude, purpose and opinion are assessed in the third task. In this part, the candidate must

match statements with one of the five speakers. The final task is an interview or a

conversation and there are seven multiple choice questions with four possible answers

each. The focus is on specific information, opinion, attitude, gist and main idea (Cambridge English Language Assessment 51).

In the CEFR there are five different scales to assess the listening skill. In the overall one,

it is pointed out how a B2 user:

“Can understand the main ideas of propositionally and linguistically complex

speech on both concrete and abstract topics delivered in a standard dialect,

including technical discussions in his/her field of specialisation. Can follow

extended speech and complex lines of argument provided the topic is reasonably

familiar, and the direction of the talk is sign-posted by explicit markers” (66)

Besides this scale, four other scales are provided for:

• Understanding interaction between native speakers

• Listening as a member of a live audience

• Listening to announcements and instructions

• Listening to audio, media and recordings

Those scales give the main criteria a B2 user must master in relation to the listening skill. The criteria encompass the ability to keep up with a conversation; to understand much of what is said in a discussion in which he/she is participating, and to be able to take part in it; and to understand and follow lectures, talks and reports with academic vocabulary, announcements and messages, and radio documentaries or broadcast audio. The B2 user can also identify viewpoints and attitudes of different speakers (CEFR 66-68). According to this, the tasks which form the listening exam would be suitable to assess the listening comprehension of the learner. Nevertheless, none of them implies the candidate's response to or participation in a discussion with native speakers, and this could be an interesting task for the certification of the level.

FCE

Tasks:
• Task 1: short extracts, multiple choice
• Task 2: sentence completion
• Task 3: multiple matching
• Task 4: interview, multiple choice

Criteria:
• Task 1: main point, purpose or location of the speech, relationships between speakers, attitude or opinion of the speakers
• Task 2: specific information, detail, stated opinion
• Task 3: gist, detail, function, attitude, purpose and opinion
• Task 4: specific information, opinion, attitude, gist and main idea

CEFR

Tasks:
• listening to public announcements (information, instructions, warnings, etc.)
• listening to media (radio, TV, recordings, cinema)
• listening as a member of a live audience (theatre, public meetings, public lectures, entertainment, etc.)
• listening to overheard conversations, etc.

Criteria:
• main ideas of propositionally and linguistically complex speech
• can follow extended speech and complex lines of argument
• keep up with a conversation
• understand much of what is said in a discussion in which he/she is participating and be able to participate
• understand and follow lectures, talks and reports with academic vocabulary, announcements and messages and radio documentaries or broadcast audio
• identify viewpoints and attitudes of different speakers

As for the assessment tool, the First Certificate does not use any kind of rubric or grading

scale to assess the candidate’s performance in the listening paper.

IELTS

The listening paper includes 4 different tasks with 40 question items and has a duration of 30 minutes. The 4 tasks are taken from a total of 6 different task types.

The multiple-choice task type, as its name indicates, consists of answering one question followed by three possible answers to check the candidate's ability to understand "specific points or an overall understanding of the main points of the listening text".

(IELTS web page. Test format). Another task type is the so-called “matching” type, and

test takers must match a list of items with a set of options to check the skill of listening

for detail. The “plan, map, diagram labelling” task type consists in completing a map or

diagram with the suitable words to assess the ability to understand instructions. A similar

task type is “Form, note, table, flow-chart, summary completion”, but in this case the

learner must complete a summary or a table with the same intention. Although it may

seem similar, the “sentence completion” task type is different from the previous ones

since the candidates are required to complete some sentences with the suitable words, but

the test items are not as visual as in the previous two types but take the form of a normal

text. However, the focus is the same. Finally, candidates have to answer one question briefly in the "short-answer" question type, which aims to test the "ability to listen for

concrete facts, such as places, prices or times, within the listening text.” (ibid.)

The CEFR gives different examples of tasks suitable for the assessment of the listening

comprehension. The wide variety of task types the IELTS uses makes it easy to confirm that the main criteria to determine the level of the speaker will be covered with this test type.

IELTS

Tasks (4 tasks taken from 6 possible task types):
• Task type 1 – Multiple choice
• Task type 2 – Matching
• Task type 3 – Plan, map, diagram labelling
• Task type 4 – Form, note, table, flow-chart, summary completion
• Task type 5 – Sentence completion
• Task type 6 – Short-answer questions

Criteria:
• Multiple choice questions are used to test a wide range of skills. The test taker may be required to have a detailed understanding of specific points or an overall understanding of the main points of the listening text.
• Matching assesses the skill of listening for detail and whether a test taker can understand information given in a conversation on an everyday topic, such as the different types of hotel or guest house accommodation. It also assesses the ability to follow a conversation between two people. It may also be used to assess test takers' ability to recognise relationships and connections between facts in the listening text.
• Plan, map, diagram labelling assesses the ability to understand, for example, a description of a place, and to relate this to a visual representation. This may include being able to follow language expressing spatial relationships and directions (e.g. straight on/through the far door).
• Form, note, table, flow-chart, summary completion focuses on the main points which a listener would naturally record in this type of situation.
• Sentence completion focuses on the ability to identify the key information in a listening text. Test takers have to understand functional relationships such as cause and effect.
• Short-answer questions focus on the ability to listen for concrete facts, such as places, prices or times, within the listening text.

CEFR

Tasks:
• listening to public announcements (information, instructions, warnings, etc.)
• listening to media (radio, TV, recordings, cinema)
• listening as a member of a live audience (theatre, public meetings, public lectures, entertainment, etc.)
• listening to overheard conversations, etc.

Criteria:
• main ideas of propositionally and linguistically complex speech
• can follow extended speech and complex lines of argument
• keep up with a conversation
• understand much of what is said in a discussion in which he/she is participating and be able to participate
• understand and follow lectures, talks and reports with academic vocabulary, announcements and messages and radio documentaries or broadcast audio
• identify viewpoints and attitudes of different speakers

No rubric is used for the assessment of this paper.

ISE II


As stated above, the listening and the speaking skills are assessed together in the listening

and speaking paper. Of the 4 tasks of this paper, the final one is specifically for the

assessment of listening, although it is also assessed in the speaking tasks.

The “independent listening task” lasts for 8 minutes. It consists in listening to a

monologue. The examiner asks the candidate some questions before and after the

listening. The first time the monologue is played the questions are intended to check the

listening for gist. The second time, the questions require listening for details. The

candidate answers orally but may take some notes the second time.

The CEFR does include listening to a monologue (listening as a member of a live

audience, theatre, public meetings, public lectures, entertainment, etc.) as a suitable task

to assess the listening comprehension of the students. Nevertheless, it would be

recommendable to include some other listening tasks with varied formats. Besides, the

fact that there is only one task makes it difficult to assess all the criteria the CEFR

recommends for the assessment of listening comprehension at the B2 level.

ISE-II

Tasks:
• Part 4: independent listening task (listening to a monologue twice)

Criteria:
• showing ability to process and report information, including main points and supporting detail
• placing information in a wider context
• inferring information not expressed explicitly
• reporting speaker's intentions
• inferring word meaning

CEFR

Tasks:
• listening to public announcements (information, instructions, warnings, etc.)
• listening to media (radio, TV, recordings, cinema)
• listening as a member of a live audience (theatre, public meetings, public lectures, entertainment, etc.)
• listening to overheard conversations, etc.

Criteria:
• main ideas of propositionally and linguistically complex speech
• can follow extended speech and complex lines of argument
• keep up with a conversation
• understand much of what is said in a discussion in which he/she is participating and be able to participate
• understand and follow lectures, talks and reports with academic vocabulary, announcements and messages and radio documentaries or broadcast audio
• identify viewpoints and attitudes of different speakers


The ISE-II uses two rubrics to assess the listening and speaking paper. On the one hand, the listening criteria have been included in the rubric for the speaking tasks. On the other hand, a rubric used exclusively for the independent listening task is employed for the assessment.

The rubric is holistic as the skill is assessed globally. It is quantitative with a numeric

scale from 0 to 4. As it is only used for the assessment of this particular task, it is a task-specific rubric, and its theme is domain-relevant. Moreover, it is a proficiency rubric, paper-based although it can also be found online, and it is used by an examiner.

Type of rubric according to the ISE-II:
• How it is measured: Holistic
• How it is scored: Quantitative
• Theme: Domain-relevant
• Application: Task-specific
• Function: Proficiency
• Scorer: Teacher (examiner)
• Channel: Paper

Below is the in-depth analysis:

➢ Is the skill assessed actually worthwhile?

Yes. The skill assessed is the listening skill. In the literature review section, the

importance of the listening skill has been fully addressed.

➢ Are there few scoring criteria but correctly labelled?

It is a holistic rubric so there are no individually assessed criteria.

➢ Are there degrees of excellence described appropriately?


Yes. All the descriptors are well explained. However, they do not contain any example.

➢ Is the rubric presented in a clear and handy way?

Yes, the rubric is brief and clear which eases its use.

The CEFR recommendations for good rubrics (the necessity of building a feasible tool,

descriptors positively worded, brief and not vague) have been partially adopted: the ISE-

II rubric is feasible since it does not contain individual criteria to be assessed and is very

clear and handy. In relation to the descriptors, they are positively worded as the CEFR

recommends.

Finally, are the rubric and the test relevant, valid and reliable? On the one hand, the

criteria for the task are relevant because the CEFR includes both the task and the listening

functions it intends to measure. However, since there is only one listening task, it lacks

many functions and types of recording the framework recommends. The validity can also

be easily confirmed since the test, with the help of the rubric, measures what it is supposed

to assess. The main problem is that due to the lack of varied listening tasks it cannot be

checked if the candidate can do all the things stated in the CEFR scale for a B2 learner.

CEFR (B2):
• Can understand standard spoken language, live or broadcast, on both familiar and unfamiliar topics normally encountered in personal, social, academic or vocational life. Only extreme background noise, inadequate discourse structure and/or idiomatic usage influences the ability to understand.
• Can understand the main ideas of propositionally and linguistically complex speech on both concrete and abstract topics delivered in a standard dialect, including technical discussions in his/her field of specialisation.
• Can follow extended speech and complex lines of argument provided the topic is reasonably familiar, and the direction of the talk is sign-posted by explicit markers.

ISE-II (Level 4):
• Identifies and reports all important points relevantly
• Shows full understanding of main points, and how they relate to the message as a whole
• Makes sense of connected English speech rapidly and accurately with confidence
• Fully infers meanings left unstated (e.g. speaker's viewpoints)

Reliability

Reliability information has been given in the writing section.

EXAM
• Tasks: 1 (task 3 of the speaking paper also assesses listening a little)
• Match CEFR tasks: No
• Match CEFR criteria: No

RUBRIC
• Type: Measurement: Holistic; Scoring: Quantitative; Theme: Domain-relevant; Application: Task-specific; Function: Proficiency; Scorer: Examiner; Channel: Paper
• Relevant: No
• Valid: No
• Reliable: Yes
• CEFR criteria: Feasible: Yes; Descriptors: Positive: Yes, Brief: Yes, Not vague: Yes
• Popham's rubric: Skill worthwhile: Yes; Scoring criteria (few and well labelled): Not relevant; Descriptors (well described): Yes; Clear and handy: Yes

ACLES

The listening paper included in the B2 certificate of ACLES lasts for around 30 to 40

minutes. There must be at least 2 tasks and a maximum of 4. The recordings can be in video or audio-only format and they must last at least 2 minutes but no more than 5. The UDC

centre of languages states that the paper aims to check the ability of the candidate to

understand face-to-face conversation and recorded speeches about different topics from

personal life to academic or professional issues. The criteria included in the following

table are the ones quoted in its web page. (Centro de Linguas Universidade da Coruña

web page)


ACLES

Tasks (two, three or four):
• ability of the candidate to understand face-to-face conversation and recorded speeches about different topics, from personal life to academic or professional issues

Criteria:
• understand any sort of speech unless there is excessive background noise or it contains too many specific terms or a bad structure
• understand main ideas of a complex speech about concrete or abstract topics
• understand complex lines of argumentation when the topic is familiar and is developed with explicit markers

CEFR

Tasks:
• listening to public announcements (information, instructions, warnings, etc.)
• listening to media (radio, TV, recordings, cinema)
• listening as a member of a live audience (theatre, public meetings, public lectures, entertainment, etc.)
• listening to overheard conversations, etc.

Criteria:
• main ideas of propositionally and linguistically complex speech
• can follow extended speech and complex lines of argument
• keep up with a conversation
• understand much of what is said in a discussion in which he/she is participating and be able to participate
• understand and follow lectures, talks and reports with academic vocabulary, announcements and messages and radio documentaries or broadcast audio
• identify viewpoints and attitudes of different speakers

According to the table, the listening paper is suitable to assess whether the candidate has

a B2 listening level. It would be advisable to include a listening task which involves a conversation between two speakers.

The ACLES listening exam paper does not use a rubric for the assessment.

EOI

The listening paper from the EOI advanced level lasts for at least 45 minutes and is formed by different recordings (number unspecified) that will be played twice, and a number of written tasks to check understanding. The kinds of tasks that may appear are multiple

choice, true or false, brief answers, relate different texts to headlines, put parts of a text

in the correct order, identify images, identify the main points or ideas in a conversation,

complete tables, drawings, maps, diagrams, recognise communicative situations and

follow instructions (Escuela Oficial de Idiomas de Gijón. Departamento de Inglés 137-

138). The paper aims to check the ability of the candidate to understand statements and

messages, warnings and instructions about abstract or concrete themes; understand the

main ideas in conferences, talks and reports; understand TV news or programmes about

current affairs; understand documentaries, live interviews and parts of TV and film

pieces; understand conversations among native speakers; understand face-to-face


conversations; understand discussions on issues related to his or her speciality; identify

context elements; and recognise terms, expressions and complex sentences in common

situations (137). With regard to the guidelines stated in the framework, the EOI paper

would be suitable to assess the listening skill since it may contain tasks proposed in the

framework itself and the criteria written there.

EOI

Tasks (number unspecified):
• multiple choice
• true or false
• brief answers
• relate different texts to headlines
• put parts of a text in the correct order
• identify images
• identify the main points or ideas in a conversation
• complete tables, drawings, maps, diagrams
• recognise communicative situations and follow instructions

Criteria (the ability of the candidate to):
• understand statements and messages, warnings and instructions about abstract or concrete themes
• understand the main ideas in conferences, talks and reports
• understand TV news or programmes about current affairs
• understand documentaries, live interviews and parts of TV and film pieces
• understand conversations among native speakers
• understand face-to-face conversations
• understand discussions on issues related to his or her speciality
• identify context elements; recognise terms, expressions and complex sentences in common situations

CEFR

Tasks:
• listening to public announcements (information, instructions, warnings, etc.)
• listening to media (radio, TV, recordings, cinema)
• listening as a member of a live audience (theatre, public meetings, public lectures, entertainment, etc.)
• listening to overheard conversations, etc.

Criteria:
• main ideas of propositionally and linguistically complex speech
• can follow extended speech and complex lines of argument
• keep up with a conversation
• understand much of what is said in a discussion in which he/she is participating and be able to participate
• understand and follow lectures, talks and reports with academic vocabulary, announcements and messages and radio documentaries or broadcast audio
• identify viewpoints and attitudes of different speakers

The EOI listening paper does not use a rubric for the assessment.


6.6. Findings

The previous, detailed analysis of both the exam papers and the rubrics used (if any) to assess some of the exam tasks enables a comparison between them from which some findings can be drawn.

The first step is the comparison between exam papers. In this case, the English Certificate tests analysed will be compared skill by skill and also in general.

Writing

CEFR (Writing)

Tasks:
• completing forms and questionnaires
• writing articles for magazines, newspapers, newsletters, etc.
• producing posters for display
• writing reports, memoranda, etc.
• making notes for future reference
• letters

Criteria:
• A B2 learner can produce clear, detailed text on a wide range of subjects and explain a viewpoint on a topical issue giving the advantages and disadvantages of various options.
• Can write a review of a film, book or play.
• Can write an essay or report.

FCE: time: 1h 20 min; number of tasks: 2; word length: 140-190; rubric: Yes; match CEFR tasks: Task 1: Yes, Task 2: Yes; match CEFR criteria: Yes.

IELTS (Band 6): time: 60 min; number of tasks: 2; word length: Task 1: 150-180, Task 2: 250-300; rubric: Yes; match CEFR tasks: Task 1: No, Task 2: Yes; match CEFR criteria: Yes.

ISE II: time: 2h (together with 2 reading tasks); number of tasks: 2; word length: 180; rubric: Yes; match CEFR tasks: Task 1: Yes, Task 2: Yes; match CEFR criteria: Yes.

ACLES: time: 70-90 min; number of tasks: 2; word length: at least 125 each and no more than 450 in total; rubric: Yes; match CEFR tasks: Task 1: Yes, Task 2: Yes; match CEFR criteria: No (unknown).

EOI: time: 90 min; number of tasks: 1 or 2; word length: 75-250; rubric: Yes; match CEFR tasks: Task 1: Yes, Task 2: Yes; match CEFR criteria: Yes.

Concerning the writing papers, the comparison shows how all the English Certificate

exams test the writing ability of the candidate in a separate paper except for the Trinity

ISE II, which tests it in a paper together with the reading. It can also be stated that almost

all the certificates have designed the paper taking into account the CEFR, since all the

tasks but one are mentioned in the framework as suitable tasks to assess this skill; and

also all of the criteria related to the level match the B2 criteria proposed by the Council of Europe, with the exception of the ACLES certificate. In this case, it is not known whether

or not the criteria match because although it is said that they are based on the CEFR, they

do not appear written specifically.


With regard to the IELTS writing task 1, it involves a graph or diagram,

a task not contemplated by the CEFR. However, the CEFR includes the completion of

questionnaires as a suitable task so it would not affect the reliability of the test. As for the

ACLES writing paper, it does not specify the assessment criteria for the writing, which

makes its comparison with the framework impossible, but does not necessarily mean that

the paper is not suitable. Nevertheless, it has already been explained how important it is

for the students or candidates to be aware of the assessment criteria, so it would be

recommendable to include them.

Speaking

CEFR (Speaking)

Tasks:
• public address (information, instructions, etc.)
• addressing audiences (speeches at public meetings, university lectures, sermons, entertainment, sports commentaries, sales presentations, etc.)
• reading a written text aloud
• speaking from notes, or from a written text or visual aids (diagrams, pictures, charts, etc.)
• acting out a rehearsed role
• speaking spontaneously

Criteria:
• clear, systematically developed descriptions and presentations
• supporting ideas with subsidiary points and relevant examples
• describing an experience
• describe different topics in detail
• give reasons and support them in discussions and debates
• highlight the advantages or disadvantages of different options
• give presentations with clarity and fluency
• depart spontaneously from the discussion when follow-up questions are posed

FCE: time: 15 min; number of tasks: 4; rubric: Yes; match CEFR tasks: Tasks 1-4: Yes; match CEFR criteria: Yes.

IELTS (Band 6): time: 11-14 min; number of tasks: 4; rubric: Yes; match CEFR tasks: Tasks 1-4: Yes; match CEFR criteria: Yes.

ISE II: time: 20 min (together with the listening paper); number of tasks: 3; rubric: Yes; match CEFR tasks: Tasks 1-3: Yes; match CEFR criteria: Yes.

ACLES: time: 7-10 min; number of tasks: 2; rubric: Yes; match CEFR tasks: No (unknown); match CEFR criteria: No (unknown).

EOI: time: 10 min; number of tasks: variable; rubric: Yes; match CEFR tasks: Yes; match CEFR criteria: Yes.

Speaking is assessed through an individual paper in all the exams but the ISE II, because of its integration concept, which measures the skills combined in pairs. It is

clear that the papers have been designed according to the CEFR tasks and criteria. Once


again, the ACLES paper gives no information about the criteria and the tasks, so it is

impossible to examine it.

Reading

CEFR (Reading)

Tasks:
• reading for general orientation
• reading for information, e.g. using reference works
• reading and following instructions
• reading for pleasure

Criteria:
• reading for gist
• reading for specific information
• reading for detailed understanding; for implications, etc.
• read with a large degree of independence, adapting style and speed of reading to different texts and purposes, and using appropriate reference sources selectively
• ability to read correspondence
• follow a text of instructions
• understand specialised articles and reports related to current issues
• scan through long and complex texts
• find relevant details
• identification of the most important information, ideas or opinions

FCE: time: 1h 15 min (together with Use of English); number of tasks: 3; rubric: No; match CEFR tasks: Tasks 1-3: Yes; match CEFR criteria: Yes.

IELTS (Band 6): time: 60 min; number of tasks: 3 (from 11 possible types); rubric: No; match CEFR tasks: Tasks 1-3: Yes; match CEFR criteria: Yes.

ISE II: time: 2h (together with the writing paper); number of tasks: 2; rubric: No; match CEFR tasks: Tasks 1-2: Yes; match CEFR criteria: Yes.

ACLES: time: 60-70 min; number of tasks: 2-4; rubric: No; match CEFR tasks: No; match CEFR criteria: No.

EOI: time: 60 min; number of tasks: variable; rubric: No; match CEFR tasks: Yes; match CEFR criteria: Yes.

From the comparison, it is fundamental to highlight the fact that none of them uses a

rubric in order to assess the exam paper. As has already been explained, the criteria stated

by ACLES do not match the CEFR guidelines. The reason is that they aim to check

if the candidate is able to understand just the main ideas, while the CEFR also suggests

the assessment of reading for details, following instructions, etc. Nevertheless, through

the criteria stated by the UDC Language Centre, it can be elucidated that the identification

of details and following of instructions are also being measured.


Listening

CEFR (Listening)

Tasks:
• listening to public announcements (information, instructions, warnings, etc.)
• listening to media (radio, TV, recordings, cinema)
• listening as a member of a live audience (theatre, public meetings, public lectures, entertainment, etc.)
• listening to overheard conversations, etc.

Criteria:
• main ideas of propositionally and linguistically complex speech
• can follow extended speech and complex lines of argument
• keep up with a conversation
• understand much of what is said in a discussion in which he/she is participating and be able to participate
• understand and follow lectures, talks and reports with academic vocabulary, announcements and messages and radio documentaries or broadcast audio
• identify viewpoints and attitudes of different speakers

FCE: time: 40 min; number of tasks: 4; rubric: No; match CEFR tasks: Tasks 1-4: Yes; match CEFR criteria: No.

IELTS (Band 6): time: 30 min; number of tasks: 4 (from 6 different types); rubric: No; match CEFR tasks: No; match CEFR criteria: Yes.

ISE II: time: 8 min; number of tasks: 1; rubric: Yes; match CEFR tasks: No; match CEFR criteria: No.

ACLES: time: 30-40 min; number of tasks: 2-4; rubric: No; match CEFR tasks: Yes; match CEFR criteria: No.

EOI: time: 45 min; number of tasks: variable; rubric: No; match CEFR tasks: Yes; match CEFR criteria: Yes.

The listening papers have in common the lack of a grading scale for the assessment of the

candidate, with the exception of the ISE II, which does include one. It is also worth noting that this is the skill for which the most incongruities between the framework and the papers are found. The FCE criteria do not match the CEFR criteria because none of the tasks requires the candidate to respond or participate in a discussion with native speakers, and this could be an interesting task for the certification of the level. As for the IELTS, the tasks proposed are different from the framework's examples of tasks for the assessment of listening comprehension. Another certificate which does not follow the criteria is the ACLES paper, as it does not include any listening task which involves a conversation between two speakers. The criteria for the assessment cannot therefore be

matched.


English Certificate paper’s comparison

FCE IELTS

(Band 6)

ISE II ACLES EOI

Nº of exam papers 4 4 2 4 4

Match

CEFR

tasks

Writing Yes Yes Yes Yes Yes

Speaking Yes Yes Yes No Yes

Reading Yes Yes Yes No Yes

Listening Yes No No Yes Yes

Match

CEFR

criteria

Writing Yes Yes Yes No Yes

Speaking Yes Yes Yes No Yes

Reading Yes Yes Yes No Yes

Listening No Yes No No Yes

Reliability Cronbach’s

alpha

0.94 (List)

0.92

(Wr)

0.90

Unknown Unknown Unknown

SEM 2.78 (List)

0.37

(Wr)

0.38

Unknown Unknown Unknown
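For readers less familiar with the two reliability indicators quoted above, the following is a minimal sketch, written in Python with invented figures rather than data from any of the certificates, of how Cronbach's alpha and the standard error of measurement (SEM) are conventionally computed from a matrix of item scores; the values published by the examination boards would of course come from far larger candidate samples.

import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Cronbach's alpha for a (candidates x items) matrix of scores."""
    k = item_scores.shape[1]                               # number of items
    item_variances = item_scores.var(axis=0, ddof=1)       # variance of each item
    total_variance = item_scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

def standard_error_of_measurement(total_scores: np.ndarray, reliability: float) -> float:
    """SEM = standard deviation of observed scores * sqrt(1 - reliability)."""
    return total_scores.std(ddof=1) * np.sqrt(1 - reliability)

# Hypothetical data: six candidates answering five dichotomously scored items.
scores = np.array([
    [1, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 0, 1, 1, 1],
    [0, 1, 0, 1, 0],
])
alpha = cronbach_alpha(scores)
sem = standard_error_of_measurement(scores.sum(axis=1), alpha)
print(f"Cronbach's alpha = {alpha:.2f}, SEM = {sem:.2f}")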

Rubric’s comparison


Comparison of the WRITING rubrics (values given in the order FCE, IELTS, ISE II, ACLES, EOI):
• Measurement: analytic, analytic, analytic, analytic, analytic
• Scoring: quantitative, quantitative, quantitative, quantitative + qualitative, quantitative + qualitative
• Theme: domain-relevant for all five
• Application: skill-focused, task-focused, task-focused, skill-focused, task-focused
• Function: proficiency for all five
• Scorer: examiner for all five
• Channel: paper for all five
• Relevant: Yes, Yes, Yes, Unknown, Yes
• Valid: Yes, Yes, Yes, Yes, Yes
• CEFR criteria: feasible: Yes, No, No, Yes, No
• CEFR descriptors positive: Yes, No, No, No, No
• CEFR descriptors brief: Yes, Yes, No, Yes, No
• CEFR descriptors not vague: No, No, Yes, Yes, Yes
• Popham's rubric: skill worthwhile: Yes, Yes, Yes, Yes, Yes
• Popham's rubric: scoring criteria (few and well labelled): Yes, Yes, Yes, No, No
• Popham's rubric: descriptors (well described): No, Yes, Yes, Yes, Yes
• Popham's rubric: clear and handy: Yes, No, No, Yes, No

From the comparison of the rubrics, a number of interesting findings may be commented

on. It is especially relevant that all the rubrics are analytic and none of them holistic.

Presumably, this is because the certificates attempt to give the most accurate score

possible. Analytic rubrics measure different aspects of the candidate's writing such as the

organisation, lexis, grammar, etc. Hence, it is easier for the marker to give a precise score.

Furthermore, all of them are quantitative except for the ones used by the ACLES and the

EOI, which contain both quantitative and qualitative scales. The main reason may be that these certificates give the candidates a numeric score in order to reflect somehow

not only whether they have the B2 level, but also how far they are from achieving it or

how well they have performed. Nevertheless, a qualitative approach could also be

feasible, although it would imply the inclusion of some kind of feedback or explanation

by the examiner.

In the matter of the application of the rubric, two of them are skill-focused (FCE and

ACLES), as they are used for the assessment of all the writing tasks, whereas the other

three are task-focused because they use rubrics specific for each of the tasks.

It is worth mentioning that none of the rubrics has passed all the criteria, with the FCE's and the ACLES's being the grading scales with the fewest failures. This illustrates the

complexity of the creation of a rubric and how difficult it is to design a perfect one.


Comparison of the SPEAKING rubrics (values given in the order FCE, IELTS, ISE II, ACLES, EOI):
• Measurement: analytic for all five
• Scoring: quantitative, quantitative, quantitative, quantitative + qualitative, quantitative + qualitative
• Theme: domain-relevant for all five
• Application: skill-focused for all five
• Function: proficiency for all five
• Scorer: examiner for all five
• Channel: paper for all five
• Relevant: Yes, Yes, Yes, Yes, Yes
• Valid: Yes, No, Yes, Yes, Yes
• CEFR criteria: feasible: Yes, No, No, Yes, No
• CEFR descriptors positive: Yes, No, No, Yes, No
• CEFR descriptors brief: Yes, Yes, No, Yes, No
• CEFR descriptors not vague: No, No, Yes, No, Yes
• Popham's rubric: skill worthwhile: Yes, Yes, Yes, Yes, Yes
• Popham's rubric: scoring criteria (few and well labelled): Yes, Yes, Yes, Yes, No
• Popham's rubric: descriptors (well described): No, Yes, Yes, No, Yes
• Popham's rubric: clear and handy: Yes, No, No, Yes, No

In the case of the speaking rubrics, similar results were encountered. The preferred rubric

is the analytic and quantitative and also the skill-focused one, since all the rubrics are

used for assessing all the speaking tasks of each of the exam papers. No rubric was found

that passed the whole analysis without failures. The FCE rubric and the ACLES rubric were the ones which failed in the fewest categories; their main shortcoming is that they do not contain any description for some of the bands.

It can be highlighted that the criteria were few and well labelled in all the rubrics except

for the EOI’s one. On the other hand, only three of them presented the descriptors

correctly described. In the case of the EOI’s rubric, the descriptors were precisely written.

However, there were too many, making the rubric not handy and not feasible. Something

very similar happens with the Trinity ISE-II's rubric. Even though its criteria are few and well labelled, the fact that the descriptors are so long makes the rubric very unhandy.


Chapter 7: CONCLUSIONS

This thesis can be concluded by making a general reflection upon what this research

means within the area studied and what it entails for the current educational community.

One of the main goals of the current doctoral thesis was to determine the degree of

implementation of the CEFR in the principal English certificates, as well as to examine

the use of rubrics in the assessment of different skills. The results of an extensive

examination of the exams and their grading scales show that more research is needed in

the field as incongruences and shortcomings have been detected.

The research undertaken has been satisfactory in many ways. It has enabled not only a

deeper understanding of the complex assessment process, but also a better perception of the CEFR as a cornerstone of TEFL in Europe and the main driving force towards

the establishment of a communicative system in each of its member States. However, one

of the most significant findings to emerge from this thesis is the verification that the

certificates that must determine the learner’s competence present faults in the

implementation of the basic CEFR guidelines, and they assess with rubrics which do not

meet the efficiency and reliability requirements that are expected of them. Moreover,

significant research limitations stem from the lack of transparency the certificates show

in terms of assessment criteria, instruments and reliability data. In spite of their official

status and national recognition, some certificates provide vague information on the

structure of their exams and do not supply data on the validity of their exams, such as the

SEM or the Cronbach’s alpha. As a result, the reflections of this conclusion must be

addressed from the need for additional extensive research in order to complement and

improve them.


With regard to the positive aspects, the analysis of both the exam papers and the grading

scales and the comparisons made open up interesting lines for future research.

Furthermore, the community can benefit from the key findings which have established

patterns between the assessment of the different skills and contribute to the improvement

of the evaluation process from the perspective of exam creation and rubrics design. The

switch towards a more communicative assessment of the receptive skills can also be initiated on the basis of some of the findings of the current dissertation.

7.1. Research implications

The research carried out allows reflections to be made related to many different aspects

or areas, such as the institutions in charge of the certificates, the development of exams,

the creation of rubrics and the CEFR. Each of those reflections derives from some

problem, obstacle or issue which should be addressed promptly for the sake of the

teaching-learning process.

The first observation deals with the evidence that the exam stipulations and criteria are

not always at the candidate’s disposal, as the exam structure is sometimes variable,

unclear or confusing. Indeed, among the tests studied, just three of the certificates provide

test candidates with complete information on the exam criteria and the tasks that will be

faced. Here it is essential to bear in mind that a great deal of research has been conducted

on the benefits of giving EFL learners access to the rubrics, the explanation of their

criteria and the provision of students with good and bad examples (Wang, Sundeen,

Becker and Laurian and Fitzgerald). The ACLES certificate, for instance, makes no

reference to the exam’s criteria for writing and speaking, even when they are specified

for the reading and listening paper. In the case of the EOI certificate, the tasks of the


writing paper are mentioned but not explained, as occurs with the exercises of the

speaking, listening and reading parts.

Despite this absence of concrete details in the institutions’ webpages, it has been assumed

that the official examiners of both certificates do handle this information; otherwise, this

would imply that the certificates have a complete lack of reliability. It is therefore

recommendable that the exam contents and assessment specifications be published so that

candidates can prepare the exams accordingly. It should be noted, however, that both

certificates depend partially on local institutions: the different universities where it is administered, in the case of the ACLES certificate, and the different schools and autonomous

communities in the case of the Schools of Languages. This may be the reason why criteria

and tasks vary from one exam to another, or it may explain why some of these institutions provide the specifications of their tests while others do not. For example, the EOI from A

Coruña does not publish its official assessment rubrics but the EOI from Gijón shares

them on its website. With the aim of improving the transparency of the exams, it

would be advisable to unify criteria, exam structures, tasks and rubrics, which would

further increase the reliability of the certificates.

Following this line of transparency and specification, this analysis found insufficient data

and research on the reliability of the certificates. Certificates of such significance at

national level ought to be endorsed with sufficient research on effectiveness and validity.

Nevertheless, whereas the FCE and the IELTS websites contain data concerning

reliability indicators (e.g. the Cronbach’s alpha and SEM) as well as articles and

documents explaining research and studies carried out regarding the reliability of their

certificates, the ISE-II, ACLES and EOI websites do not mention any empirical figure.

In the current doctoral dissertation, only the data available could be taken into account in

the examination and comparison of the different exams as research into the verification


of each of the papers’ reliability would imply a complete parallel and immeasurable

investigation.

Another implication which may be drawn from the study conducted is the fact that none

of the rubrics used for the assessment has passed the test of effectiveness. This clearly

illustrates the difficulty and complexity which designing a really effective and valid rubric

may entail. One of the main complications discovered in the comparison of results is the

achievement of a suitable balance between the brevity the level descriptors are supposed

to have and the clear and detailed explanations they should give. Some of the rubrics

show an exemplification of this obstacle; for instance, the writing rubric used by the EOI

measures many different criteria and, as a result, it is neither feasible nor clear and useful.

On the other hand, the FCE writing rubric is feasible and useful, and it contains a suitable

number of criteria, but the descriptors are too vague and not well described. Something

similar happens with the ISE II’s grading scale: the descriptors are suitable, well

explained and clear, but they are not brief.

The fact that none of the rubrics has passed all the assessment criteria leads to the

consideration of the real feasibility of the framework itself. The CEFR gives guidance on

how to assess the different skills and it does so in different ways. To begin with, it contains

charts which provide information on the characteristics a candidate must achieve in each

of the language levels for each of the skills. This allows teachers or institutions to prepare curricula and syllabi, and to design evaluations and tests for the assessment of the contents learned

or for diagnosis. Moreover, it includes grading scales with descriptors for each of the

skills and each of the levels and even for different tasks. Thus, criteria and descriptors are available for the assessment of speaking through a sustained monologue, a public announcement or an addressing-audiences task. Nevertheless, a maximum use of the framework

would imply that the institutions in charge of the certificates’ elaboration would either


have to use a different rubric for each of the tasks or use the overall one, which is not as

precise as the others. These drawbacks are not disadvantageous per se; in fact, it would be ensured that the tasks selected are appropriate and that the descriptors incorporated into

the scale would be much more precise. In contrast, the amount of work would rise together

with the time employed. Another possible inconvenience related to the previous one is

that the CEFR scales are holistic and they only indicate different levels of language. As a

result, those certificates which determine whether a candidate has a certain level or not

may find the descriptions too general since they do not contemplate different degrees of

competence within the same level; i.e., whether the candidate's performance is outstanding, very

good, solid, satisfactory, etc.

Many studies conducted (such as the previously mentioned one by Trong and that carried

out by Ghalib and Al-Hattami) have proved that analytic scales are more precise than the

holistic ones. Consequently, they are more reliable and therefore most of the rubrics

examined are analytic (except for the ISE-II’s listening one). As the institutions

themselves are responsible for creating their own grading scales and the framework just

contains holistic ones, the process of designing the rubrics is more complicated and the

resulting rubrics are more diverse despite being based on the framework. Each institution

must decide on the criteria, the scale and formulate the descriptor for each of the levels.

While this may result in a much more tailored rubric, it may also distance them from the

framework and hence the reliability thereof is likely to decrease. Having reached this

point, the inclusion of analytic rubrics in the CEFR would unify the rubrics, at least in

terms of criteria, which would increase the reliability of the certificates. The institutions

would only need to tailor these rubrics to their exam’s features and tasks, but the

assessment criteria and the basics of the descriptors would be more similar.


In connection with the designing of rubrics according to the framework’s guidance, it is

worth mentioning the recommendations given for the descriptors. Those suggestions are

the ones which have been mentioned and used for the examination of all the rubrics during

the research: the feasibility of the rubric according to the size and number of criteria used,

positively worded writing, and the brief and “not vague” descriptions of the levels. It has

already been stated above how difficult it is to achieve the right balance among those specifications, particularly the prescription of using brief descriptors which are at the same time "non-vague". Another obstacle is writing with positive wording, since

describing the lower levels of performance in an affirmative way requires being a true

master of language. In fact, only the FCE writing rubrics are positively worded, whereas

all the other rubrics analysed do not respect this instruction. As far as the speaking rubrics

are concerned, all but the ACLES and FCE’s rubrics use negative words. More illustration

of positively worded descriptors in the CEFR grading scales would help to ease their implementation in other rubrics.

The Common European Framework of Reference for Languages is extremely useful in the matter of assessment and has played a major role in the unification of levels and the promotion of communicative approaches. Nonetheless, it should be noted that its application might

not be as feasible as it should be, owing to the shortage of precise grading scales. None

of the rubrics involved in the assessment of the most prominent language certificates is

completely effective, valid and reliable, and all of them present contradictions to the

CEFR’s regulations. Although this phenomenon may be due to particular unrelated

reasons, it would be foolish not to hold the framework responsible for it to some extent.

It may be the case that the applicability of the CEFR is not a total reality and therefore, a

revision and improvement of it is highly recommendable.


Another interesting reflection arises from the presence or absence of rubrics for the

assessment of papers. All the grading scales analysed assess the writing or speaking

papers except for the ISE-II rubric which assesses the listening skill. This leads to the

conclusion that the productive skills (speaking and writing) are easier to assess with a

rubric than the receptive skills. Since the framework provides grading scales for reading

and listening and it is possible to assess those skills with rubrics, we need to look for an

explanation in the exam tasks. The evaluation of writing generally consists of writing

essays and speaking requires the production of a speech or discussion. These types of

tasks are resolved with open answers, very different from one candidate to another, and

hence the objectivity of their assessment depends on the use of a reliable tool (the rubric).

On the other hand, reading and listening exams contain tasks which are generally

answered with a multiple choice or with one or two concrete words. In these cases, a

rubric would be totally useless as, given that an answer can only be correct or incorrect,

there is no place for creative answers. A change in the type of task towards more open

answers will be required in order to measure competence with a rubric. The ISE II

listening paper, for instance, requires the candidate to answer questions orally in order to

assess whether he or she has understood the recording. Hence, it is possible to assess with a grading scale whether, for example, all the data from the listening have been correctly understood or just a part of them. Tasks of this type benefit from the use of a rubric. The

production of a summary in the reading paper to prove the correct understanding of the

general ideas and the sub-ideas, as well as the relationships established among them, is

another example of a task which could be assessed with a grading scale. Other possible

tasks include the oral or written explanation of a topic by using all the information given

either in the listening track or the reading texts. Nevertheless, the change of the traditional


forms of assessment in the listening and reading papers would necessarily involve a

complete shift in the teachers’ methodology too.

7.2. Research limitations

In spite of the fact that the research conducted has led to many interesting findings and

allows us to make some important reflections on the use of rubrics, the adaptation of the

CEFR and the English certificates, they must be taken with caution, since it is

fundamental to be aware of the multiple limitations of this thesis. With the intention of

analysing the results from a reasonable perspective and of being able to suggest future

actions to improve them, the following paragraphs contain a series of constraints that

should be taken into account.

Firstly, the data mentioned referring to the reliability of the tests have been taken directly

from the certificates’ official websites, but many of them did not provide those data, so a

real comparison between the reliability of certificates was not possible. Ideally, for a

comprehensive examination and contrast of all the exams, they should be properly tested

in terms of reliability. For instance, a group of participants should take all the certificates

and be assessed by at least two different examiners in order to compare the results. Thus,

data referring to the reliability of the test paper could be obtained and compared. If the

reliability were good, then the candidate should obtain the same score independently of

the examiner. At the same time, comparisons between exams could be made. Candidates

should obtain similar results in all the certificates in terms of level; for example, obtain a

B2 level in all the certificates. However, all this would imply carrying out another huge

parallel research project.
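As an illustration only, the following minimal sketch, written in Python with invented scores and an arbitrary choice of statistics (Pearson's correlation and an exact-agreement rate rather than, for instance, Cohen's kappa or an intraclass correlation), shows how the double marking proposed above could be summarised once such data were collected.

import numpy as np

# Hypothetical writing-band scores given by two examiners to the same eight candidates.
examiner_a = np.array([4, 3, 5, 2, 4, 3, 5, 4])
examiner_b = np.array([4, 3, 4, 2, 4, 2, 5, 4])

# Pearson correlation between the two examiners' scores (consistency).
r = np.corrcoef(examiner_a, examiner_b)[0, 1]

# Proportion of candidates who received exactly the same band (agreement).
exact_agreement = np.mean(examiner_a == examiner_b)

print(f"inter-rater correlation r = {r:.2f}")
print(f"exact agreement = {exact_agreement:.0%}")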

Another limitation of the study is related to the rubrics. As has been explained, the test

designed to control their effectiveness is a combination of the information contained in


the framework about the requirements for building a rubric and a rubric of rubrics

designed by Popham. Nevertheless, there is plenty of research on rubrics and how to build

effective and reliable ones. Although most authors agree on the basic requirements, there

may be certain variations about what a good rubric is and what criteria it must contain.

Experienced examiners could have been involved in the research, so that their perspective and knowledge about the use of the selected rubrics, the assessment of the different exam papers and the elaboration of their tasks could have been incorporated. Interviews or questionnaires could have been

used for the design of the research or the comments on their results and findings.

Even though the European Framework has been one of the principal cornerstones of this

research, the limitations of the framework itself have already been explained in the

previous section. As a result, the tasks suitable for assessing the different skills or the

criteria for assessing a candidate in each of the levels might not be perfect or totally exact.

On the other hand, the framework has been written based on many exhaustive

investigations by an entire panel of experts, and since it is the main frame of reference in

the teaching, learning and assessment of languages, it must be taken with the reliability it

deserves. It is fundamental, though, to consider that the project was undertaken more than

a decade ago and it was an enormous unprecedented work. It may thus be adjusted,

improved or modified in the future.

7.3. Research applicability and future implications

The current thesis intends to contribute to the area of language learning, especially to the

assessment of foreign languages with the use of rubrics. Despite the numerous limitations,

it may be used in several ways.


The first application of the research could be as an encouragement to improve the rubrics

used by the main certificates analysed. The examination carried out is conducive to the

simple detection of certain mistakes that could be corrected to a greater or lesser extent.

Since none of the rubrics passed all of the recommendations stated in the framework, nor the rubric of rubrics implemented, further revision should be conducted by their

institutions. This revision could be started with the current research; for instance, those

rubrics which do not use positive wording in their descriptors should correct this anomaly.

Similarly, those which use vague descriptors or excessively long ones, or perhaps too

many criteria, should be brought more into line with the framework guidelines.

It is particularly advisable to increase transparency in the information provided about the exams. The benefits

of providing students with the assessment criteria before the exam as well as allowing

them to be fully conversant with the tasks of the paper have been sufficiently proved.

Some of the certificate papers analysed did not provide candidates with specific data

about the paper tasks, or the information given was far too vague. Including these data would

be advisable and it would be positive to attach a number of past exam papers or models.

Furthermore, criteria for each of the tasks or papers are not always specified. Learners

need to know what is expected from them in order to be prepared for the exam, and to

know how far they are from being able to use the language functions a determined level

requires. It is also recommendable that the instruments of assessment be available, as well

as examples which correspond to different grades. Some of the rubrics analysed are not

published on the institutions’ websites or are not easy to find. A perfect scenario would

incorporate training and practice on rubrics into the certificate’s preparation courses. For

instance, learners could try to assess other candidates’ compositions or speaking

performance by using the same rubric. Candidates should have it explained to them how those rubrics work and be allowed to see different examples.


The research should also encourage a revision of the Common European Framework of Reference for Languages. It was published in 2001, so almost two decades have already passed. It is essential to check its results and effectiveness. Some research ought to be conducted on what the framework has meant over these years, and whether its aims have been

accomplished or not. From these possible future studies, some conclusions and reflection

could be derived. Firstly, this reflection would allow us to ascertain how the Framework

is being used and implemented. For example, if it is really being used as the cornerstone

of language teaching and learning and if the use which is being made of it is suitable.

Additionally, it should be considered whether textbooks or Language certificates do

actually follow the guidelines given or, finally, if the education curriculum and syllabus

designed by the different governments allow the learning of sufficient language

competence in each of the levels.

The results of those analyses could help to improve the framework itself. In the current

thesis, a number of gaps in the framework have been noticed. As a result, some

suggestions are humbly proposed. For example, it would be interesting to include information on the criteria the rubrics should incorporate for each skill, as well as analytic rubrics for each of the different levels and each of the skills. Furthermore, several exam

tasks could be added together with examples of model answers.

The test designed to check the validity of the exam papers and rubrics could be used to

assess other certificates or other rubrics. In addition, it could also be employed as a

guideline to create a new rubric or a new exam paper. Rubric creators should take into

account all the guidelines and check if they are really being followed. At the same time,

exam task recommendations and criteria should be considered when designing a syllabus

or an exam paper.


A broad area of research resulting from this thesis could be the analysis of reliability of

all the certificates, so that they can be compared, ranked and improved. This research

could also be subdivided into smaller areas. Improving the reliability of important English

Certificates would be incredibly helpful for society and the educational community.

Another future area of research is that concerning the assessment of receptive skills with

rubrics. This would lead to a wholesale transformation of teaching methodologies and

learning processes for the reading and listening skills. As has already been indicated, the

use of grading scales for the assessment of those skills would imply an implementation

of exam tasks different from the traditional ones. A change in the traditional tasks would

necessarily have to be accompanied by a change in the teaching-learning process.

Moreover, research on how to create and design effective and good rubrics for the tasks

or papers would have to be undertaken. Subsequently, research on the effectiveness of the new ways of teaching, learning and assessing these two skills could be conducted, and

hence also improvement in and adjustments to the process.

Further areas of study would include additional research on grading scales. To date, there

has been a lack of detailed studies on how to determine which rubrics are satisfactory.

More case studies would help to compare the different effects and implications of the use

of different types of rubrics. Most of the current studies deal with analytic and holistic

types, but case studies comparing rubric types according to other criteria are scant.

The following is a summary of some of the possible future lines of research:

- Analysis of the reliability of the most common English Certificates. Work

out what the index of reliability of each of them is, for example, Cronbach's

Alpha and SEM.


- Comparison of other English certificates.

- Detailed revision of the Common European Framework of Reference for Languages: applicability, consequences and possible deficiencies.

- Research on effective, reliable and valid rubrics. How to determine which

ones are good and improve them.

- Analysis, comparison and contrast of rubrics according to scale, function,

scorer and channel.

- Research on new ways of assessment for the receptive skills. New

methodologies and instruments of assessment.

- Research on the use of rubrics to assess the reading and listening papers.

These exciting new lines of research point towards the great deal of research which is still needed to improve the language learning process, to which the current doctoral thesis is merely a drop in the ocean. It might be overwhelming to consider how

much study is required until we achieve a system that unequivocally guarantees that its

students develop their maximum potential language competence. Nevertheless, one can

reflect on the meaningful milestones which over two decades the educational community

has accomplished and realise that the field of language learning has already undergone a

powerful and wholesale change towards a real communicative learning. This should be a

352

source of inspiration for all researchers, to prove to them that findings, foresight,

discipline, research and effort can actually transform the reality we live in.


Chapter 8: BIBLIOGRAPHY

Abbas, Zainab. “Difficulties in Using Methods of Alternative Assessment in Teaching

from Iraqi Instructors' Points of View.” Al-Faith Journal. University of Diyala, College

of Education-Diyala. No. 48. Feb. 11 Oct. 2016

www.iasj.net/iasj?func=fulltext&aId=39413

ACLES. “Certacles. Modelo de acreditación de exámenes de ACLES.” Acles.es,

www.acles.es/files/certacles-modelo-acreditacion-examenes-acles.pdf

--“Estructura exámenes certacles.” Acles.es,

www.acles.es/files/ckeditor/estructura_examenes_certacles_2016_2_1.pdf

Al-Ghazo, Abeer. “The assessment of Reading comprehension strategies: Practices of

Jordanian public teachers at secondary level.” International Journal of English Language,

Literature and Humanities, Vol. III, Issue V, Jul. 2015, pp. 721-742.

Allen, Laura K. et al. “L2 Writing Practice: Game enjoyment as a key to engagement.”

Language Learning and Technology, Vol. 18, No.2, Jun. 2014, pp.124-150.

Altec. “Crear rúbrica.” Rubistar, University of Kansas, powered by 4teachers.org,

rubistar.4teachers.org/index.php?screen=CustomizeTemplate&bank_rubric_id=57&sect

ion_id=12&

Andrade, Heidi. “What is a Rubric?” Rubistar. Create Rubrics for your Project-Based Learning Activities. 2008. rubistar.4teachers.org/index.php?screen=WhatIs

Annenberg Foundation. Annenberg Learner. Teacher resources and professional

development across the curriculum, St. Louis, MO, 2016, www.learner.org/


Ayhan, Ülkü and M. Uğur Türkyılmaz. “Key of language assessment: rubrics and rubric

design.” International Journal of Language and Linguistics, Vol. 2, No.2, Jun. 2015, pp.

82-92, ijllnet.com/journals/Vol_2_No_2_June_2015/12.pdf

Baitman, Brittany and Mauricio Veliz Cambos. “A Comparison of oral evaluation ratings

by native English teachers and non-native English speaker teachers.” Literatura y

Lingüística, Vol. 61, No. 3, 27th Aug. 2012, pp. 171-200.

Bas, Gökhan. Implementation of Multiple Intelligences Supported Project-Based

Learning in EFL/ESL Classrooms, Karatli Sehit Sahin Yilmaz Secondary School, 2008.

Becker, Anthony. “Student-generated scoring rubrics: Examining their formative value

for improving ESL students' writing performance.” Assessing Writing, Jul. 2016.

Black, Paul and Dylan Wiliam. “Inside the black box: Raising standards through

classroom assessment.” Phi Delta Kappan, Vol. 80, No. 2. 1998, pp. 139-148,

doi:10.1177/0031721009200119

British Council, IDP: IELTS Australia and Cambridge Assessment English, “Test

Format.” IELTS, 2018, www.ielts.org/about-the-test/test-format

--" Test performance 2017”, IELTS, www.ielts.org/teaching-and-

research/test-performance

--“Guide for teachers. Test format, scoring and preparing students for the

test”, UCLES, 2012.

Brooks, Gavin. “Assessment and Academic Writing: A look at the Use of Rubrics in the

Second Language Writing Classroom.” Kwansei Gakuin University Humanities Review,

Vol. 17, 2012, pp. 227-240, core.ac.uk/download/pdf/143638458.pdf


Brooks, Val. Assessment in Secondary Schools. The new teacher’s guide to monitoring,

assessment, recording, reporting and accountability. Buckingham, Philadelphia, Open

University Press, 2002.

Buján, Karmele et al. (coord.). La evaluación de competencias en la educación superior.

Las rúbricas como instrumento de evaluación. Editorial MAD, 2011.

Cambridge Assessment English. “Quality and Accountability.” Cambridge Assessment

English, www.cambridgeenglish.org/research-and-validation/quality-and-

accountability/

Cambridge English Language Assessment. “Cambridge English First Handbook for

Teachers for exams from 2016.” CambridgeEnglish.org,

www.cambridgeenglish.org/images/167791-cambridge-english-first-handbook.pdf

--IELTS 13 Academic with Answers, Cambridge University Press, 2015.

Cambridge University Press. “Research.” Cambridge Dictionary, 2018,

dictionary.cambridge.org/

Cano, Elena. “Las rúbricas como instrumento de evaluación de competencias en

educación superior: ¿Uso o abuso?” Profesorado. Revista de currículum y formación del

profesorado, Vol. 19, No. 2, May-Aug. 2015, pp. 265-280,

www.ugr.es/~recfpro/rev192COL2.pdf

Carrillo Zoque, Andrea del Pilar and Diana Rocío Unigarro Millan. La lúdica como

estrategia para transformar los procesos de evaluación tradicional de las estudiantes de

grado décimo en la clase de inglés en el Liceo Femenino Mercedes Nariño. Dissertation.

Tut. Dario Alexsander Chitiva Rodriguez. Bogota, Fundación Universitaria Los


Libertadores Vicerrectoría De Educación Virtual Y A Distancia Especialización En

Pedagogía De La Lúdica, 2015.

Castillo, Santiago and Jesús Cabrerizo. Evaluación educativa de aprendizajes y

competencias, Madrid, Pearson Education, 2010, ISBN: 978-84-8322-781-7

Castillo Tabares, R. et al. “Implicaciones de la evaluación continua a través de rúbricas

sobre las prácticas pedagógicas: evidencia empírica y aplicación de análisis

multidimensional.” Revista Horizontes Pedagógicos, No. 16. 2014, pp. 66-77.

Center for Advanced Research on Language Acquisition (CARLA).

www.carla.umn.edu/assessment/vac/Evaluation/res_1.html

Centro de Linguas Universidade de A Coruña. “Acles.” Centrodelinguas.gl,

http://www.centrodelinguas.gal/gl/pag/276/acles/

-- “Especificacións do nivel B2.” Centrodelinguas.gl,

http://www.centrodelinguas.gal/gl/pag/289/caja-estatica--acreditacion-acles--

especificacions-do-nivel-b2/

Clever Prototypes, LLC. Quick Rubric web page, 2016, www.quickrubric.com/r#/create-

a-rubric

Council of Europe. Common European Framework of Reference for Languages:

Learning, Teaching, Assessment. Cambridge: Cambridge University Press, 2001.

Council of Europe. “Education and Languages, Languages Policy.” Council of Europe,

2014, www.coe.int/t/dg4/linguistic/Cadre1_en.asp

Çağatay, Sibel and Fatma Ünveren. “Is CEFR Really over There?” Procedia - Social and Behavioral Sciences, Vol. 232, 2016, pp. 705-712.


Dawson, Phillip. “Assessment rubrics: towards clearer and more replicable design,

research and practice.” Assessment and Evaluation in Higher Education, Nov., Routledge,

doi: 10.1080/02602938.2015.1111294

Del Pozo Flórez, Jose Angel. Competencias Profesionales. Herramientas de evaluación:

el portafolios, la rúbrica y las pruebas situacionales, Narcea S.A. de Ediciones, 2012.

Dikli, Semire. “Assessment at a distance: Traditional vs. Alternative Assessment.” The

Turkish Online Journal of Education Technology, Vol. 2. Issue 3, art. 2. Florida State

University, 2003.

Ekmekçi, Emrah. “Comparison of Native and Non-native English language Teacher’s

Evaluation of EFL Learner’s Speaking Skills: Conflicting or Identical Rating

Behaviour?.” English Language Teaching, Vol. 9, No. 5, 2016, pp.98-105, DOI:

10.5539/elt.v9n5p98

Escudero Escorza, Tomás. “Enfoques Modélicos y estrategias en la evaluación de centros

educativos.” Revista Electrónica de Investigación y Evaluación Educativa (RELIEVE), Vol.

3, No.1, 1997.

Escuela Oficial de Idiomas de Gijón. “Criterios, procedimientos e instrumentos

de evaluación” eoigijon, 2017, eoigijon.com/wp-

content/uploads/2017/10/criterios-e-instrumentos-de-evaluacion.pdf

--“Modelos de Pruebas de Certificación de idiomas. Inglés. Nivel

Avanzado (NA).” eoigijon, www.educastur.es/estudiantes/idiomas/pruebas-

certificacion/modelos

--“Departamento de inglés. Programación 2017-2018.”


Essay Tagger LLC. EssayTagger.com. Common Core Rubric Creation Tool web page,

2016, www.essaytagger.com/commoncore

European Commission. Assessment of Key Competences in initial education and training:

Policy Guidance, Strasbourg, 20.11.2012 SWD(2012) 371 final, eur-lex.europa.eu/legal-

content/EN/TXT/PDF/?uri=CELEX:52012SC0371&from=EN

Ewing, Hannah. “Stereotype threat and assessment in schools.” Journal of Initial Teacher

Inquiry, Chris Asrall, Murray Fastier and Letitia Fickel (eds), Vol. 1, 2015, pp.7-9, ISSN

2463-4417

Ezza, El-Sadig Yahya. “Criteria for Assessing EFL Writing at Majma’ah University.”

Education in the Middle East and North Africa, S. Hidri and C. Coombe (eds.),

Springer International Publishing Switzerland, pp.185-200, DOI: 10.1007/978-3-319-

43234-2_112017

Fatalaki, Javad Ahmadi. “Teacher-Based Language Assessment.” International Letters

of Social and Humanistic Sciences, SciPress Ltd., Vol. 60., 2015, pp. 77-82.

Fitzpatrick, Jody L. et al. Program Evaluation, Alternative approaches and practical

guidelines. Pearson, 2004.

Frydrychova Klimova, Blanka. “Evaluating Writing in English as a Second Language.” Procedia - Social and Behavioral Sciences, Dec. 2011, pp. 390-394.

Gallego Arrufat, María Jesús and Manuela Raposo Rivas. “Compromiso del estudiante y

percepción del proceso evaluador basado en rúbricas.” REDU. Revista de docencia

universitaria. Vol. 12, No.1, Apr., 2014, pp. 197-215, doi:10.4995/redu.2014.6423.

García-Sanz, Mari Paz. “La evaluación de competencias en educación superior mediante

rúbricas: un caso práctico.” Revista Electrónica Interuniversitaria de Formación del


Profesorado, Vol. 17, No. 1, 2014, pp. 87-106, DOI:10.6018/reifop.17.1. 198861,

revistas.um.es/reifop/article/view/87

Gardner, Richard et al. Rubrics. A paper submitted in partial fulfilment of the

requirements of RES 5560 Appalachian State University, Nov. 30, 2009.

lesn.appstate.edu/.../Gardner,Powell.../Rubric%20Lit.%20Review-

%20Dr.%20Olson.doc

Ghalib, Thikra K. and Abdulghani Al-Hattami. “Holistic versus Analytic Evaluation of

EFL Writing. A Case Study.” English Language Teaching, Canadian Center of Science

and Education, Jun., Vol. 8, No. 7, 2015, pp. 225-236, doi:10.5539/elt.v8n7p225

Gil Pascual, Juan Antonio. Técnicas e instrumentos para la recogida de información.

Universidad Nacional de Educación a Distancia, ISBN: 978-84-362-6250-6, 2011

Girón-García, Carolina and Claudia Llopis Moreno. “Designing Oral-based Rubrics for

Oral Language Testing with Undergraduate Spanish Students in ESP Context.” The

Journal of Language Teaching and Learning. 2015-2, pp. 86-107,

dergipark.gov.tr/download/article-file/209019

Glencoe/McGraw-Hill. “Education up Close.” Teaching Today, Apr. 2005, Educational and Professional Publishing Group of the McGraw-Hill Companies, Inc., New York, 2005.

www.glencoe.com/sec/teachingtoday/educationupclose.phtml/32

The glossary of education reform. In S. Abbott (Ed.). “Hidden curriculum.” 24th Aug.

2004, edglossary.org/hidden-curriculum

Goldin, Ilya M. A focus on content: the use of rubrics in peer review to guide students

and instructors, University of Pittsburgh, submitted to the Graduate Faculty of Arts &


Sciences in partial fulfillment of the requirements for the degree of Doctor of Philosophy,

2011, core.ac.uk/download/pdf/95381728.pdf

Griffee, Dale T. An Introduction to Second Language Research Methods, Design and

Data, TESL-EJ Publications, 2012.

Hamp-Lyons, Liz, “Purposes of Assessment.” Handbook of Second Language

Assessment, Dina Tsagari, Jayanti Banerjee (eds.), Walter de Gruyter Inc., 2016.

de Haan, Pieter and Kees Van Esch. “Towards an instrument for the assessment of the

development of writing skills.” Language and Computers, Oct. 2004, pp. 1-14.

Harmer, Jeremy. How to teach English, Pearson Longman, 2007.

Heidari, Adeleh et al. “The Role of Culture Through the Eyes of Different Approaches to

and Methods of Foreign Language Teaching.” Journal of Intercultural Communication,

issue 34, Mar., 2014, ISSN 1404-1634

Helgesen, Marc. Listening in Practical Language Teaching, edited by David Nunan, McGraw-Hill, 2003.

Henning, Melissa D. “Rubrics to the Rescue: What are rubrics?” TeachersFirst. Thinking

Teachers Teaching Thinkers, www.teachersfirst.com/lessons/rubrics/what-are-

rubrics.cfm

Hensley, Brandon and Jeffrey Brand. “Public Speaking Assessment 2013 Report”,

Millikin University, 2012-2013, pp. 1-18.

Hernández Sampieri, Roberto et al. Metodología de la Investigación, 5ª Ed. McGraw Hill,

2010, ISBN: 978-607-15-0291-9


Herrera Mosquera, Leonardo and Diego Macías. “A call for language assessment literacy

in the education and development of teachers of English as a Foreign Language.” Colomb.

Appl. Linguist. J., Vol. 17, No. 2, pp. 302-312, 2015, doi:

10.14483/udistrital.jour.calj.2015.2.a09

Herrero Martínez, Rafaela Mª et al. “Evaluación de competencias con actividades

académicas interdisciplinares.” Etik@net, 12, Vol. I, pp. 106-126, 2012.

Hymes, Dell H. “On communicative competence”, Pride, J.B & Holmes, J. (eds),

Sociolinguistics, pp. 269-93, Penguin.

IELTS Home: http://www.ielts.org/

Jackson, Noel R. and Anthony E. Ward. “Assessing Public Speaking. A trial rubric to

speed up and standardise feedback.” 2014 Information Technology Based Higher

Education and Training (ITHET), York, 2014, pp. 1-5, doi:

10.1109/ITHET.2014.7155700

Janisch, Carole et al. “Implementing Alternative Assessment: Opportunities and

Obstacles.” The Educational Forum, Vol. 71, 2007.

Jonsson, Anders and Gunilla Svingby. “The use of scoring rubrics: Reliability, validity

and educational consequences.” Educational Research Review, 2, 2007, pp. 130–144, doi:

10.1016/j.edurev.2007.05.002

Karppinen, Tiia. “Reading Activities in EFL Textbooks: An analysis of upper secondary

school textbooks.” Bachelor’s Thesis, 11 Dec. 2013,

jyx.jyu.fi/bitstream/handle/123456789/44521/1/URN%3ANBN%3Afi%3Ajyu-

201411023158.pdf


Keng, Leslie et al. “A Comparison of Distributed and Regional Scoring.” Test,

Measurement & Research Services, Pearson Bulletin, Sep., Issue 17, 2010,

images.pearsonassessments.com/images/tmrs/tmrs_rg/TMRSBulletin17.pdf?WT.mc_id

=TMRS_A_Comparison_of_Distributed

Kianiparsa, Parnaz and Sara Vali. “What is the Best Method to Assess EFL Learners’

Reading Comprehension.” ELTWeekly, Vol. 2, Issue 75, 12th Dec. 2010, pp. 8-24.

Laurian, Simona and Carlton J. Fitzgerald. “Effects of using rubrics in a university

academic level Romanian literature class.” Procedia. Social and Behavioral Sciences,

Elsevier, 76, 2013, pp. 431-440.

Lavelle, Thomas. “Getting the Most from Textbook Listening Activities.” The Internet

TESL Journal, Nov. 2000.

Lavigne, Alyson Leah and Thomas L. Good. The Teacher and student evaluation: moving

beyond the failure of school system, Routledge, 2014.

López Bautista, Dolores. Evolución histórica de la evaluación educativa, 2010,

lahermandaddeeva.files.wordpress.com/2010/03/evolucion-historica-de-la-evaluacion-

educativa.pdf

Little, David et al. Training Teachers to use the European Portfolio, Council of Europe

Publishing, 2007.

Louw, Willa. “My Love Affair with Alternative Assessment: Integrating Quality

Assessment into OBE Courses for Distance Education.” Progressio, Vol. 25, Issue 2,

2003, pp. 21-28, ISSN: 0256-8853

Madrid, Daniel. “Introducción a la investigación en el aula de la lengua extranjera.”

Metodología de investigación en el área de filología inglesa, edited by María Elena


García Sánchez and María Sagrario Salaberri, Universidad de Almería, Secretariado de

Publicaciones, 2001, pp. 11-45,

www.ugr.es/~dmadrid/Publicaciones/Introduccion%20investigacion%20aula-

Sagrario%20y%20Elena.pdf

Marin-García, Juan A. et al. “Protocol: Comparing advantages and disadvantages of

Rating Scales, Behavior Observation Scales and Paired Comparison Scales for behavior

assessment of competencies in workers. A systematic literature review.” WPOM-Working

Papers on Operations Management, [S.l.], Vol. 6, No. 2, Nov. 2015, pp. 49-63,

DOI:10.4995/wpom.v6i2.4032, polipapers.upv.es/index.php/WPOM/article/view/4032

McLaren, Neil et al. (eds.) TEFL in Secondary Education: handbook and workbook,

Editorial Universidad de Granada, 2005, ISBN 84-338-3638-2

MECD Ministerio de Educación Cultura y Deporte. El Sistema Educativo español,

MECD/CIDE, Madrid, 2004, uom.uib.cat/digitalAssets/202/202199_6.pdf

Miller, Nigel. Alternative Forms of Formative and Summative Assessment, edited by John

Huston and David Whigham, Glasgow Caledonian University, 2002.

Ministerio de Trabajo, Migraciones y Seguridad Social. “Diplomas de Acreditación de

Conocimientos de idiomas (inglés).” empleo.gob.es, Gobierno de España,

www.empleo.gob.es/es/mundo/consejerias/reinoUnido/portalempleo/es/curriculum/acre

ditacion-idiomas/index.htm

Morales, Carmen et al. La enseñanza de las lenguas extranjeras en España. Secretaría

General Técnica. Centro de Publicaciones. Ministerio de Educación, Cultura y Deporte.

sede.educacion.gob.es/publiventa/la-ensenanza-de-las-lenguas-extranjeras-en-

espana/investigacion-educativa/8757


Moss, Donna and Carol Van Duzer. “Project-Based Learning for Adult English Language

Learners.” Eric Digest ED427556, 1998.

National Governors Association Center for Best Practices (NGA Center) and the Council

of Chief State School Officers (CCSSO) (n.d.) Common Core State Standards Initiative.

Preparing America’s Students for College & Careers. www.corestandards.org/about-the-

standards/development-process/

Oxford University Press. “Research.” Oxford English Dictionary, 2018,

en.oxforddictionaries.com/

Panadero, Ernesto and Anders Jonsson. “The use of scoring rubrics for formative

assessment purposes revisited: A review.” Educational Research Review V. 9, 2013, pp.

129–144., doi: 10.1016/j.edurev.2013.01.002

París Mañas, Georgina et al. “La evaluación de la competencia ‘Trabajo en equipo’ de

los estudiantes universitarios.” RIDU Revista d’Innovació Docent Universitaria, No. 8,

2016, pp. 86-97, DOI: 10.1344/RIDU2016.8.10

Patel, Amita. “Evaluation – A Challenge for a Language Teacher.” The Global Journal of

English Studies I, May Volume 1, Issue 1, 2015, ISSN: 2395 4795

Patel, Pratiksha. Portfolio Assessments, College of Education and Educational

Technology, Dec. 2001.

Perín, Dolores and Mark Lauterbach. “Assessing Text-Based Writing of Low Skilled

College Students.” International Artificial Intelligence in Education Society, Springer,

8th Nov. 2016.


Phelan, Colin and Julie Wren. Exploring reliability in academic assessment, UNI Office

of Academic Assessment, University of Northern Iowa, 2005-2006,

www.uni.edu/chfasoa/reliabilityandvalidity.htm

Popham, W. James. Mastering Assessment: a self-service system for educators,

Routledge, Oxon, 2006.

--Evaluación Trans-formativa, El poder transformador de la evaluación

formativa, Humanes, Narcea, 2013.

Pozuelos, Francisco José et al. Investigando la alimentación humana, Proyecto

Curricular Investigando Nuestro Mundo 6-12, Díada Editora, 2008, ISBN: 978-84-

96723-12-2

Princippia, Formación y Consultoría, S.L. Princippia. Una nueva forma de enseñar, una

nueva forma de aprender web page, 2016, princippia.com/www.princippia.com

Raposo-Rivas, Manuela and Mª Esther Martínez-Figueira. “Evaluación educativa

utilizando rúbrica: un desafío para docentes y estudiantes

universitarios.” Educ, Vol. 17, No. 3, 2014, pp. 499-513, DOI: 10.5294/edu.2014.17.3.6

Real Decreto 1105/2014, de 26 de diciembre, por el que se establece el currículo básico

de la Educación Secundaria Obligatoria y del Bachillerato. Ministerio de educación,

cultura y deporte. Madrid, España, 3 de enero de 2015.

www.boe.es/boe/dias/2015/01/03/pdfs/BOE-A-2015-37.pdf

Reazon Systems Inc. “irubric.” RCampus web page, 2016,

www.rcampus.com/rubricshellc.cfm?mode=studio&sms=build&#REQUEST.rsUrlToke

n#


Richards, Jack C. Teaching Listening and Speaking from Theory to Practice, Cambridge

University Press, 2009,

www.researchgate.net/publication/255634567_Teaching_Listening_and_Speaking_Fro

m_Theory_to_Practice

Richards, Jack C. and Richard Schmidt. Language Teaching & Applied Linguistics,

Longman, Pearson Education, 2002.

Roberts, Rachel. “What are Reading skills?─They’re not (only) what you think.” Elt-

resourceful web page, 1st Dec. 2015, elt-resourceful.com/2015/12/01/what-are-reading-

skills-theyre-not-only-what-you-think/

Roca-Varela, Mª Luisa and Ignacio M. Palacios. “How are spoken skills assessed in

proficiency tests of general English as a Foreign Language? A preliminary survey.”

International Journal of English Studies (IJES), Vol. 13, No. 2, 2013, pp. 53-68.

Salehi, Mohammad and Zahra Sayyar. “An Investigation of the Reliability and Validity

of Peer-, Self-, and Teacher Assessment in EFL Learner’s Written and Oral Production.”

International Journal of Assessment and Evaluation in Education, Vol. 6, Dec. 2016, pp.

9-23.

Sambell, Kay et al. Assessment for Learning in Higher Education, Oxon, Routledge,

2013.

Schreiber, Lisa M. et al., “The Development and Test of the Public Speaking Competence

Rubric.” Communication Education, Routledge, Vol. 61, No. 3, Jul. 2012, pp.205-233,

DOI: 10.1080/03634523.2012.6707709


Simons, Mathea and Jozef Colpaert. “Judgmental Evaluation of the CEFR by

stakeholders in language testing”, Revista de Lingüística y Lenguas Aplicadas, Vol. 10,

2015, pp. 66-77, DOI: 10.4995/rlyla.2015.3434

Slagell, Oral Presentations Evaluations: Pros and Cons, Fundamentals of Public

Speaking, Iowa State University,

isucomm.iastate.edu/files/pdf/OralPresentationEvaluation-ProsandCons.pdf

Solak, Ekrem and Firat Altay. “Prospective EFL Teachers' perceptions of listening

comprehension problems in Turkey.” The Journal of international social research, Vol.

7, No. 30, 2014, pp. 190-198.

Sundeen, Todd H. “Instructional rubrics: Effects of presentation on writing quality.” Assessing Writing, Elsevier, 1 Apr. 2014, pp. 74-87.

Teachnology Inc. “General Rubric Creator.” Teachnology. The online Teacher resource

web page, 2010, www.teach-nology.com/web_tools/rubrics/general/

Trinity College London. “Integrated Skills in English (ISE) Guide for Teachers — ISE II

(B2).” Trinity College web page, 2015, Online edition Jun. 2017.

Trong Tuan, Luu. “Teaching and Assessing Speaking Performance through Analytic

Scoring Approach.” Theory and Practice in Language Studies, Academy Publisher, Vol.

2, No. 4, April 2012, pp. 673-679, doi: 10.4304/tpls.2.4.673-679

Tsushima, Rika. “Methodological Diversity in Language Assessment Research: The Role

of Mixed Methods in Classroom-Based Language Assessment Studies” International

Journal of Qualitative Methods, University of Alberta, Vol.14, No.2, 2015, pp. 104-121,

DOI: 10.1177/160940691501400202


Turley, Eric D. and Chris Gallagher. “On the Uses of Rubrics: Reframing the Great Rubric

Debate.” The English Journal, Vol. 97, No.4, March 2008, National Council of Teachers,

pp. 87-92. DOI: 10.2307/30047253, www.jstor.org/stable/30047253

Uribe-Enciso, Olga. “Improving EFL students’ performance in Reading comprehension

through explicit instruction in strategies.” Rastros Rostros, Vol. 17, No. 31, 2015, pp. 37-

52, doi: 10.16925/ra.v17i31.1271

Velasco Martínez, Leticia and Juan Carlos Tójar Hurtado. “Evaluación por

competencias en educación superior. Uso y diseño de rúbricas por los docentes

universitarios.” AIDIPE (Ed.), Investigar con y para la sociedad, Bubok, Vol. 2, 2015,

pp. 1393-1405, avanza.uca.es/aidipe2015/libro/volumen2.pdf

Verano-Tacoronte, Domingo et al. “Valoración de la competencia de comunicación oral

de estudiantes universitarios a través de una rúbrica fiable y válida.” Revista Brasileira

de Educaçao, Vol. 21, No. 64, Jan.-Mar. 2016, pp. 39-60, doi: 10.1590/S1413-

24782016216403

Veerappan, Veeramuthu and Sulaiman Tajularipin. “A review on IELTS Writing Test, its

Tests Results and Inter Rater Reliability.” Theory and Practice in Language Studies,

Vol.2, No. 1, Jan. 2012, pp. 138-143, doi:10.4304/tpls.2.1.138-143

Vez, José Manuel. “La Investigación en Didáctica de las Lenguas Extranjeras.” Educatio

Siglo XXI, Vol. 29, No.1, 2011, pp. 81-108,

digitum.um.es/xmlui/bitstream/10201/27149/1/La%20Investigación%20en%20Didáctic

a%20de%20las%20Lenguas%20Extranjeras.pdf


Von der Embse, Nathaniel P. et al. “Readying students to test: The influence of fear and

efficacy appeals on anxiety and test performance.” School Psychology International, Vol.

36, No. 6, 2015, pp. 620–637, DOI: 10.1177/0143034315609094

Walters, Brent G. and Ching-ning Chien. “College EFL Teacher’s Perspectives on

Listening Assessment and Summarization for a Specific Task.” Journal of Language

Teaching and Research, Vol.5, No.2, Mar. 2014, pp. 313-322

WebFinance, Inc. “Effectiveness evaluation.” Businessdictionary,

http://www.businessdictionary.com/

Wang, Weiqiang. “Using rubrics in student self-assessment: student perceptions in the English as a foreign language writing context.” Assessment & Evaluation in Higher Education, 2016, DOI: 10.1080/02602938.2016.1261993.

White, Michael and Gail Winkworth. A Rubric for Building Effective Collaboration:

Creating and Sustaining Multi Service Partnerships to Improve Outcomes for Clients,

ISBN: 978-0-9873564-0-6, 2012.

Wimmer, Mary. “School Refusal: Information for Educators.” Helping Children at Home

and School III, National Association of School Psychologists, 2010.

Yilmaz, Burçak and Fatma Ünveren. “A comparative study of perceptions about the

Common European Framework of Reference among EFL Teachers working at state and

private schools”, International Online Journal of Education and teaching (IOJET), Vol.

5, No. 2, 2018, pp. 401-417.


8.1. List of figures:

Fig. 1. Annenberg Learner. Rubric creator. Screenshot.

Annenberg Foundation. Annenberg Learner. Teacher resources and professional

development across the curriculum, St. Louis, MO, 2016, www.learner.org/

Fig. 2. EssayTagger.com. Essay Tagger Common Core Rubric Creation Tool. Screenshot

Essay Tagger LLC. EssayTagger.com. Common Core Rubric Creation Tool web page,

2016, www.essaytagger.com/commoncore

Fig. 3. Irubric by RCampus. Screenshot

Reazon Systems Inc. “irubric” RCampus web page, 2016,

www.rcampus.com/rubricshellc.cfm?mode=studio&sms=build&#REQUEST.rsUrlToke

n#

Fig. 4. Rubistar. Screenshot

Altec. “Crear rúbrica.” Rubistar, University of Kansas, powered by 4teachers.org,

rubistar.4teachers.org/index.php?screen=CustomizeTemplate&bank_rubric_id=57&sect

ion_id=12&

Fig. 5. Teachnology General Rubric Generator. Screenshot

Teachnology Inc. “General Rubric Creator.” Teachnology. The online Teacher resource

web page, 2010, www.teach-nology.com/web_tools/rubrics/general/

Fig. 6. Quick Rubric. Screenshot

Clever Prototypes, LLC. Quick Rubric web page, 2016, www.quickrubric.com/r#/create-

a-rubric


Fig. 7. Rubric-O-Matic software by Peter Evans. Screenshot

Evans, Peter. “Rubric-O-Matic 2016.” eMarking Assistant, Helping teachers provide

detailed feedback, 2015, emarkingassistant.com/products/rubric-o-matic/

Fig. 8. Princippia by Princippia Formación y Consultoría, S.L. Rubric sample. Screenshot

Princippia, Formación y Consultoría, S.L. Princippia. Una nueva forma de enseñar, una

nueva forma de aprender web page, 2016, princippia.com/www.princippia.com

Fig. 9. Princippia by Princippia Formación y Consultoría, S.L. Evaluation Criteria.

Screenshot

Princippia, Formación y Consultoría, S.L. Princippia. Una nueva forma de enseñar, una

nueva forma de aprender web page, 2016, princippia.com/www.princippia.com

Fig. 10. Princippia by Princippia Formación y Consultoría, S.L. Final Evaluation.

Screenshot

Princippia, Formación y Consultoría, S.L. Princippia. Una nueva forma de enseñar, una

nueva forma de aprender web page, 2016, princippia.com/www.princippia.com


APPENDICES

Appendix 1: Exam samples

FCE


IELTS


ISE II

Listening and speaking


Reading and writing


ACLES


EOI


Appendix 2: Rubrics

FCE


IELTS


ISE-II


ACLES


EOI


Appendix 3: CEFR

Common Reference Levels: self-assessment grid


Common Reference Levels: qualitative aspects of spoken language use


Chapter Four scales for different skills and tasks
