Adequacy of Assessment tools in Official English Certificates to the Guidelines provided by the Common European Framework for Languages: An analysis on the effective implementation of rubrics
Lucía Fraga Viñas
Doctoral thesis
2019
Doctoral thesis, UDC / 2019
Supervisor: Eduardo Barros Grela
Co-supervisor: María Bobadilla Pérez
Doctoral Programme in Advanced English Studies: Linguistics, Literature and Culture
ABSTRACT

The Council of Europe, through the Common European Framework of Reference for Languages: Learning, Teaching, Assessment (CEFR), has been promoting shifts in the teaching and learning process, such as the communicative approach. This doctoral thesis has analysed the exams and rubrics of the main Official Certificates in Spain to determine how faithfully they have implemented the CEFR guidelines. The results of an extensive examination show that more research is needed in the field, since inconsistencies and shortcomings were detected. One of the most significant findings to emerge from this thesis is the verification that the certificates that are meant to determine a learner's competence present faults in their implementation of the basic CEFR guidelines and assess with rubrics that do not meet the efficiency and reliability requirements expected of them. Moreover, significant research limitations stem from the certificates' lack of transparency regarding assessment criteria, instruments and reliability data. On the other hand, this study has made it possible to establish patterns across the assessment of the different skills and can contribute to improving the evaluation process from the perspective of exam creation and rubric design. The shift towards a more communicative assessment of the receptive skills can also build on some of the premises of this dissertation.

Keywords: CEFR, assessment, EFL, rubrics, communicative approach, English Official Certificates
Spanish summary (long)
In recent years a great deal of research has been conducted on assessment in the English as a Foreign Language classroom. The establishment of the Common European Framework of Reference for Languages: Learning, Teaching, Assessment (CEFR) has driven changes in the teaching and learning process, such as the communicative approach and classroom work on the four skills: listening, reading, speaking and writing. Nevertheless, the main Official Certificates need to be analysed in order to determine whether their exams conform to the guidelines set by the CEFR in terms of format, content and assessment. Moreover, the traditional assessment methods that still predominate in schools are not suitable for evaluating these communicative skills, which creates a need for research in this area to find forms of assessment capable of measuring and evaluating them. In this context, the introduction of rubrics, whether as an objective assessment instrument or as a teaching and learning tool, has become one of the main areas of study.

Although in many countries, such as the United States, rubrics have been a common assessment instrument since the beginning of the twentieth century, in Spain they received little attention until about a decade ago. In recent years the presence of rubrics in textbooks has become more frequent, partly due to the passing of the current education law, the LOMCE. Even so, not enough research has yet been carried out, and many questions remain to be answered about their application, effectiveness, uses and objectivity.
However, the CEFR has not only exerted a great influence on new methodologies and approaches; it has also shaped other aspects, for instance, the tasks that are suitable for testing the different skills. In this respect, the framework itself is a major reference source that details activities appropriate for examining students. Likewise, the framework is also a benchmark for learning standards, serving both for the preparation of curricula and school syllabi and for defining the contents of an exam and the criteria that should be used for its assessment.
This doctoral thesis was written after a wide and thorough reading of primary sources on EFL assessment, in particular on the use of rubrics and of the CEFR for the creation of exams and assessment tools. One of its main aims is to provide the educational community with evidence of the effectiveness and validity with which the CEFR guidelines have been adapted and implemented in the exams of the English certificates most common in Spain. To that end, each of the exams designed to test each skill was analysed: the tasks they include, their objectives and their criteria, in order to find out whether they follow the guidelines of the framework. In addition, the research identifies which skills' exams use a rubric for their assessment and examines those rubrics to verify their effectiveness. This analysis of the exams that make up each certificate, together with the analysis of the rubrics they use, is intended to provide information to the educational community. This information can be used in different ways, for example, to detect which exercises are most frequently used to test a given skill and which criteria should be considered for its assessment. It also made it possible to establish whether the framework is being used correctly and to identify inconsistencies between its recommendations and their actual implementation. Furthermore, the analysis of rubrics made it possible to establish patterns among them, showing which types of rubric are the most common, whether or not they are effective, and which aspects can be improved to increase their reliability. Finally, this detailed analysis will assist the creation and design of future rubrics.
As regards research lines, English is clearly a very broad field of study and this doctoral thesis combines several areas: first, education, the subject of English as a Foreign Language and its relation to the CEFR. The research can also be framed within the study of new assessment methods, rubrics in particular, or within the area of foreign-language certificates.

The communicative approach promoted by the Council of Europe has revolutionised the teaching and learning methodologies that existed before the last decade. The consolidation of the European Union and the social need to adapt to the demands of a globalised market have led to a focus on communicative skills. Being fluent in English implies being able to hold a conversation on a general topic and to understand anyone without too much difficulty. It is therefore essential that speaking and listening be given priority. Likewise, achieving good communicative competence is the main objective of any EFL curriculum. At this point it is essential to make clear that communicative skills refer to those involved in oral performance, that is, listening comprehension and oral expression. Communicative competence, in turn, requires that learners know the language and have sufficient knowledge to judge what is appropriate in a given speech community and situation, as well as knowledge of pragmatics, discourse and culture, and strategies for overcoming potential difficulties. All of this requires that syllabi, tasks and teaching be transformed and move away from traditional methodologies centred on grammatical knowledge. And for that to be possible, research is needed so that the transformation of teaching is effective and meaningful and the new methodologies and instruments used are validated.
All this makes sense when one considers that many of the prevailing methodologies use true/false or multiple-choice exercises, which do not require a rubric to determine whether an answer is incorrect. However, to determine whether a composition is good or bad, a reliable and objective tool is essential. The presence of many or few grammatical errors is only one of the criteria to be used, alongside others such as the level and use of vocabulary, punctuation, coherence and cohesion. Nevertheless, not just any rubric will do, because not all rubrics are objective, valid and reliable, hence the importance of research in this area. It is necessary to know how to design and create an effective rubric, or at least how to determine whether an existing one is effective.
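A rubric that rates a composition against several criteria of this kind (grammar, vocabulary, coherence, cohesion) can be sketched as a small data structure. The criteria, bands and descriptor wording below are hypothetical, not taken from any certificate analysed in the thesis:

```python
# Minimal sketch of an analytic rubric for a written composition.
# Criteria, bands (1-4) and descriptor wording are illustrative only.
RUBRIC = {
    "grammar":    {4: "wide, accurate range", 3: "good control, minor slips",
                   2: "basic structures mostly accurate", 1: "emerging control"},
    "vocabulary": {4: "precise and varied", 3: "adequate range",
                   2: "simple but appropriate", 1: "very limited range"},
    "coherence":  {4: "fully coherent", 3: "generally coherent",
                   2: "occasional lapses", 1: "ideas loosely related"},
    "cohesion":   {4: "flexible linking devices", 3: "standard connectors",
                   2: "basic connectors only", 1: "isolated sentences"},
}

def score_composition(ratings: dict) -> int:
    """Validate one band per criterion and return the analytic total."""
    if set(ratings) != set(RUBRIC):
        raise ValueError("one rating per rubric criterion is required")
    for criterion, band in ratings.items():
        if band not in RUBRIC[criterion]:
            raise ValueError(f"invalid band {band} for {criterion}")
    return sum(ratings.values())
```

A holistic rubric, by contrast, would collapse these four columns into a single scale, trading precision for speed of marking.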
Another area of study is technology, since new electronic devices are an undeniable reality in today's society. They have had a considerable impact on practically every aspect of our lives: work, health, social interaction, industry, medicine and, of course, education. Foreign-language learners now have unlimited access to a vast amount of input and can reach thousands of samples and examples with a single click, as well as multiple dictionaries, thesauri and vocabulary applications. This doctoral thesis also seeks to make a small contribution to the field by compiling online resources for creating rubrics.
Another related field of research is that of language proficiency levels. Determining a person's level of proficiency in a language is no easy task, and the CEFR has done a great job of drawing up a scale and unifying criteria so that, for example, an upper-intermediate level means the same in one European Union country as in another. European education systems have gradually adapted to the new requirements set out in the framework, as have the bodies dedicated to awarding language-level certificates. Obviously, many adjustments have been made to adapt certification exams to the framework and its guidelines. However, much remains to be done. This is where the contribution of this research lies, since the analysis of the exams and their rubrics points out the areas in which the framework's recommendations have not been implemented correctly, so that they can be remedied.
Having established the objectives and the related research lines, it is time to discuss the results and conclusions. This research has been fruitful in many respects. It has allowed not only a deep understanding of the complex assessment system but also a better appreciation of the CEFR as the cornerstone of EFL teaching in Europe and as the driving force behind the establishment of the communicative approach in all its member states. Nevertheless, one of the most significant findings of the research is the verification that the certificates that determine candidates' competence in English present flaws in the implementation of the basic CEFR guidelines and assess with rubrics that, at times, do not meet the efficiency and reliability requirements expected of them. In addition, limitations to the research arose from the certificates' lack of transparency regarding assessment criteria, instruments and reliability data. Despite their official and national status, some of the certificates analysed provide only vague information about the structure of their tests and publish no reliability data for their exams, such as Cronbach's alpha.
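Cronbach's alpha, the reliability statistic mentioned here, is straightforward to compute from item-level scores. The following is a minimal sketch of the standard formula; the function name and sample data are illustrative, not drawn from any certificate's published figures:

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for internal consistency.

    scores: one row per examinee, each row a list of item scores.
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    """
    k = len(scores[0])                  # number of items
    items = list(zip(*scores))          # transpose to per-item columns
    sum_item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum_item_var / total_var)
```

Values close to 1 indicate that the items measure the same construct consistently; an examining body publishing alpha for each paper would allow exactly the kind of cross-certificate comparison the thesis found impossible.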
As for the positive aspects, the analysis of both the exams and their assessment rubrics has allowed comparisons that open new and interesting lines of research. The community can also benefit from the key findings, which have established patterns across the assessment of the skills and can contribute to improving the evaluation process from the perspective of rubric design and exam creation. The shift towards a more communicative assessment system, better aligned with the methodology and teaching used in the communicative approach, can also start from some of the premises of this research.
The first observation to be made with respect to the findings concerns the need to promote greater transparency. Knowing the format of an exam, as well as its objectives and assessment criteria, is fundamental for any candidate. For this reason, the institutions responsible for these certificates should publish on their websites the rubrics with which examiners will assess, together with precise instructions on the type of activities their tests will contain and on their objectives and criteria. Along the same lines of transparency, some of the certificates offer no studies supporting their validity and efficiency. Certificates of such national relevance should be backed by research on their results and validity. While the FCE and IELTS certificates provide such results, documents and studies, none of the other certificates does, which prevented a comparison of certificates with respect to their reliability coefficients.
Another key result of this research is that none of the rubrics employed by any of the certificates analysed passed the validity test developed for this study, which was based on a rubric of rubrics and on the framework's guidelines on the matter. Although some rubrics have more room for improvement than others, the fact that none of them can be considered fully valid illustrates the difficulty and complexity involved in designing an effective rubric. One of the main complications revealed by the comparison of results is the difficulty of striking a balance between descriptors that are brief enough for the rubric to be practical and explanations that are clear and detailed enough not to be considered too vague or open-ended. Some of the rubrics, for example, use too many assessment criteria and are consequently impractical and unwieldy. Others are practical because they measure only four criteria, but their descriptors are too vague.
The fact that none of the rubrics analysed met all the reliability and validity criteria invites reflection on whether the CEFR itself is really feasible. The framework provides guidelines on how to assess the different skills, and it does so in different ways. To begin with, it contains tables describing what a candidate must demonstrate at each level and in each skill. This allows institutions and teachers to draw up curricula and syllabi and to design assessments and tests, whether to evaluate content or to determine a level. It also includes rubrics with descriptors for each skill and even for specific tasks or activities; for example, there are descriptors for assessing a monologue, a presentation or a public announcement. However, using the framework's own rubrics would mean that the institutions in charge of designing level certificates would have to use a different rubric for each exam task, or a very global one for all tasks that would not be as precise. Although these drawbacks are not disadvantages in themselves, since they would ensure that all the tasks, like their assessment instruments, are appropriate, the amount of work and time required would increase considerably. Another possible drawback is that the scales provided by the framework are holistic, so they would be too generic for certificates that attempt to determine whether or not a candidate possesses a given level, and they could not include grades on the candidate's performance (outstanding, very good, good, pass and so on). Several of the studies reviewed have also shown that analytic rubrics are more precise than holistic ones. They are therefore more reliable, which is why most of the certificate rubrics analysed are analytic, with the exception of the one used by ISE II to assess listening comprehension. The institutions that organise and design the certificates are also in charge of creating the rubrics, and these are, as noted, mostly analytic. Since the scales included in the framework are holistic, the process of creating analytic rubrics from them is more complex, and as a result the rubrics used by one certificate and another differ greatly even at the same level. The freedom to choose criteria, descriptors and scales allows rubrics to be tailored much more closely to the proposed test model, but at the same time it distances them further from the CEFR. If the framework itself included analytic rubrics for the different skills and levels, the rubrics created by institutions or by teachers themselves would probably be more similar to one another and possibly more reliable.
Regarding rubric design according to the recommendations set out in the framework (feasibility of the rubric in terms of its size and number of criteria, positive wording, and descriptor information that is brief but not vague), the already mentioned difficulty of writing descriptors that are brief but not vague stands out. In addition, another obstacle in the rubrics analysed was the use of positive wording, since phrasing the descriptors of the lowest performance levels in positive terms requires absolute command of the language. For this reason, only two of the rubrics analysed satisfy this requirement. A greater use of positively connoted wording in the framework itself would help improve this aspect by providing more examples of how to carry out such a complex task.
Taking all of the above into account, it may be concluded that the CEFR, despite having done a magnificent job in unifying levels and promoting communicative competence, may not be as applicable as it should be. Now that a reform of the foreign-language system and its assessment is already under way, it would be a good moment to revise the framework itself and try to remedy the minor deficiencies it presents.
Another interesting reflection from the research concerns the presence or absence of rubrics as an assessment tool. In this respect, it was found that only the productive skills, that is, speaking and writing, are assessed with rubrics. Only the ISE II uses a rubric to assess a receptive rather than a productive skill, namely listening comprehension. It is true that the tasks traditionally used to test the productive skills (a presentation, a monologue, a composition and so on) seem, a priori, easier to assess with a rubric. However, nothing prevents the receptive skills from being assessed objectively and reliably with one. That said, assessing a listening or reading test with a scale would require a change in the exam models and tasks normally used to measure these skills. Assessing a multiple-choice or true/false exercise with a rubric makes no sense, but if such activities were replaced by others in which learners had to demonstrate their comprehension by producing an oral or written text, rubrics could indeed be used. This is precisely what happens in the listening test of the ISE II certificate, in which candidates, after listening to the audio, must give an oral summary and then converse with the examiner about what they have understood, answering questions where necessary. The shift towards more communicative task types will also have to bring with it a change in methodology.
Despite the results and findings of this research, it is essential to bear in mind its limitations. Greater transparency on the part of the institutions regarding their tests, objectives, criteria and rubrics would have facilitated this research, as would data from previous studies on their reliability.
This research also opens the door to new lines of inquiry, since the fields of foreign-language assessment, the CEFR and rubrics still require many more studies. Among the investigations that could follow from this doctoral thesis, the most important would involve a reliability study comparing the different certificates, for example, checking the mark obtained by the same candidate sitting the same level in the different certificates, or the mark obtained by the same candidate in one exam when assessed by several different independent examiners. Finally, a study on the applicability of the CEFR would be of great interest, as would one analysing the extent to which the framework's principles are respected and applied in the current education system. All these new lines of research point to a vast space for inquiry in this area, towards a system that unequivocally guarantees that all students can develop their full potential. Many steps have already been taken in the last two decades; in the field of language learning, important changes have been made towards genuinely communicative learning. This should therefore be a source of inspiration for all researchers, showing them that findings, discipline and genuine research can indeed transform the reality we live in.
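One way to quantify the proposed comparison of marks awarded by independent examiners is a chance-corrected agreement statistic such as Cohen's kappa. A minimal sketch follows; the scoring labels and data are illustrative:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same scripts.

    Assumes at least two distinct labels are in use, so that expected
    agreement is below 1 and the denominator is non-zero.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Expected agreement if both raters assigned labels independently.
    expected = sum(counts_a[lbl] * counts_b[lbl] for lbl in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

A kappa near 1 would indicate that two examiners applying the same rubric reach essentially the same verdicts, which is the kind of reliability evidence the thesis argues the certificates should publish.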
17
Galician summary (long)
Nos últimos anos son moitas as investigacións que se realizaron sobre a avaliación na
aula de inglés como Lingua Estranxeira. O establecemento do Marco Común Europeo de
Referencia para as Linguas: Ensino, Aprendizaxe e Avaliación (MCER) impulsou
cambios no proceso de ensino aprendizaxe, tales como o enfoque comunicativo ou o
traballo das catro destrezas dentro da aula: comprensión oral, comprensión escrita,
produción oral e produción escrita. Porén, cómpre analizar as principais Certificacións
Oficiais co obxectivo de analizar se os seus exames son adecuados ás pautas marcadas
polo MCER en canto a forma, contido e avaliación. Ademais, os métodos de avaliación
tradicional aínda predominantes nos centros educativos non son adecuados para avaliar
estas destrezas comunicativas, o cal da lugar a unha necesidade de investigación na
devandita área para atopar formas de avaliación capaces de medilas e avalialas. Neste
ámbito, a introdución de rúbricas como instrumento obxectivo de avaliación ou como
ferramenta de ensino aprendizaxe converteuse nunha das principais áreas de estudo.
A pesar de que en moitos países como Estados Unidos, as rúbricas son un instrumento de
avaliación común dende principios do século XX, en España non se lles prestou
demasiada atención ata hai unha década. Nos últimos anos a aparición de rúbricas nos
libros de texto fíxose máis frecuente, en parte debido á aprobación da lei actual de
educación, a LOMCE. De tódolos xeitos, aínda non se realizou suficiente investigación e
aínda hai moitas cuestións por contestar sobre a súa aplicación, efectividade, usos e
obxectividade.
Con todo, o MCER non só exerceu unha gran influencia sobre novas metodoloxías e
enfoques, senón que tamén o fixo sobre outros aspectos, por exemplo, aquelas tarefas que
son adecuadas para poder examinar as distintas destrezas. Neste sentido, o propio marco
18
é unha gran fonte de referencia que detalla diferentes actividades apropiadas para
examinar aos estudantes. Así mesmo, o marco tamén é un punto de referencia sobre os
estándares de aprendizaxe que serve para a preparación dos currículos e as programacións
escolares e ao mesmo tempo para definir os contidos dun exame e os criterios que se
deben usar para a súa avaliación.
A presente tese doutoral escribiuse tras unha lectura ampla e exhaustiva de fontes
primarias sobre a avaliación de ILE, en particular, sobre o uso de rúbricas e do MCER
para a creación de exames e ferramentas de avaliación. Un dos principais fins da presente
tese é facilitar á comunidade educativas probas sobre a efectividade e validez coa que as
normativas do MCER foron adaptadas e aplicadas nos principais exames dos certificados
de inglés máis comúns en España. Con tal intención, analizouse cada un dos exames
deseñados para examinar cada destreza, é dicir, as tarefas que inclúen, cales son os seus
obxectivos e os seus criterios para pescudar se seguen as directrices do marco. Ademais,
a investigación estudou os exames que utilizan unha rúbrica para a súa avaliación e
examina ditas rúbricas para comprobar a súa efectividade. Esta análise dos exames que
forman cada un dos certificados xunto coa análise das rúbricas que utilizan pretende
achegar información á comunidade educativa. Esta información pode utilizarse de
diferentes maneiras, por exemplo, para detectar os exercicios que son máis frecuentes
para examinar unha destreza e os criterios que deben considerarse para a avaliación de tal
destreza. Tamén permitiu saber se o marco está a utilizarse de maneira correcta e
identificar as incoherencias en canto ás súas recomendacións e a súa aplicación real. Por
outra banda, a análise de rúbricas fixo posible establecer patróns entre elas. Deste xeito,
pescudáronse os tipos de rúbricas que son as máis comúns, se son efectivas ou non e os
aspectos que poden mellorarse para aumentar a súa fiabilidade. Finalmente, esta detallada
análise axudará á creación e deseño de rúbricas futuras.
19
En canto ás liñas de investigación, é evidente que o inglés é un campo de estudo moi
amplo e esta tese doutoral mestura distintas áreas. En primeiro lugar, a educación, a
materia de Inglés como Lingua Estranxeira e a súa relación co MCER. Tamén podemos
enmarcar esta investigación dentro do estudo de novos métodos de avaliación, en concreto
das rúbricas, ou na área de certificados de linguas estranxeiras.
The communicative approach promoted by the Council of Europe revolutionised the teaching and learning methodologies that existed before the last decade. The consolidation of the European Union and the social need to adapt to the demands of a globalised market led to a focus on communicative skills. Being fluent in English implies being able to hold a conversation on a general topic and to understand anyone without too much difficulty. It is therefore essential that the speaking and listening skills take priority. Likewise, achieving good communicative competence is the main objective of any EFL curriculum. At this point, it is essential to make clear that communicative skills refer to those involved in oral performance, that is, listening comprehension and oral expression. Communicative competence, in turn, requires that learners know the language and have sufficient knowledge of what is appropriate within a community of speakers in a specific situation, as well as knowledge of pragmatics, discourse and culture, and strategies to overcome possible difficulties. All of this requires that syllabuses, tasks and teaching be transformed and move away from traditional methodologies centred on grammatical knowledge. And for this to be possible, research is needed so that the transformation of teaching is effective and meaningful and the new methodologies and instruments used are validated.
All of this is understandable considering that many of the prevailing methodologies use true/false or multiple-choice exercises, which do not need a rubric to determine whether an answer is incorrect. However, to determine whether a composition is good or bad, it is essential to have a reliable and objective tool. The presence of many or few grammatical errors is only one of the criteria that must be used, together with others such as the level and use of vocabulary, punctuation, coherence and cohesion. Nevertheless, not just any rubric can be used, because not all rubrics are objective, valid and reliable, which is why research in this area is essential. It is necessary to know how to design and create an effective rubric or, at least, how to determine whether an existing one is effective.
Another area of study is technology, since new electronic devices are an undeniable reality in today's society. They have had a considerable impact on practically every aspect of our lives (work, health, social interactions, industry, medicine) and, of course, on education too. Foreign language learners now have unlimited access to a vast amount of input and can reach thousands of samples and examples with a single click, as well as multiple dictionaries, thesauruses and vocabulary applications. This doctoral thesis also aims to make a small contribution to the field by compiling online resources for creating rubrics.
Another related field of research is that of language proficiency levels. Determining a person's level of competence in a language is no easy task, and the CEFR has done a great job of drawing up a scale and unifying criteria so that, for example, an upper-intermediate level means the same in one European Union country as in another. European education systems have been adapting to the new requirements set out in the framework, as have the bodies dedicated to awarding diplomas certifying a level in a language. Evidently, many adjustments have been made to adapt certification exams to the framework and its guidelines. Even so, much remains to be done. It is along these lines that this research makes its contribution, since the analysis of the exams and their rubrics points out those areas in which the framework's recommendations have not been adapted correctly, so that they can be rectified.
Once the objectives and related lines of research have been established, it is time to discuss the results and conclusions of this research. The present study was fruitful in many respects. It allowed not only a deep understanding of the complex assessment system but also a better perception of the CEFR as the cornerstone of EFL teaching in Europe and as the driving force behind the establishment of the communicative system in all its member countries. However, one of the most significant findings to emerge from the research is the verification that the certificates that determine candidates' competence in English present faults in the adaptation of the basic CEFR guidelines and in assessment with rubrics which, at times, do not meet the efficiency and reliability requirements expected of them. Moreover, limitations to the research arose from the certificates' lack of transparency regarding assessment criteria and instruments, as well as reliability data. Despite their official status at the national level, some of the certificates analysed provide only vague information about the structure of their papers and do not publish reliability data for their exams, such as Cronbach's Alpha.
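Cronbach's Alpha, the reliability coefficient referred to here, measures a test's internal consistency from the variance of its items relative to the variance of total scores. A minimal illustrative sketch follows; the function name and the sample scores are hypothetical and not taken from any of the certificates analysed:

```python
import statistics

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a test.

    item_scores: one list per item, each holding one score per candidate.
    alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)
    """
    k = len(item_scores)                 # number of items
    n = len(item_scores[0])              # number of candidates
    item_vars = [statistics.variance(item) for item in item_scores]
    # Total score per candidate, summed across items
    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    total_var = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Three items, four candidates; items vary together, so alpha is high.
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 4, 6, 8], [1, 2, 3, 4]])  # 0.9375
```

Values near 1 indicate that the items measure the same construct consistently, which is why examination boards are normally expected to publish this figure.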
With reference to the positive aspects, the analysis of both the exams and their assessment rubrics made it possible to draw comparisons that open up new and interesting lines of research. The community can also benefit from the key findings, which made it possible to establish patterns in the assessment of the different skills and can contribute to the improvement of the assessment process from the perspective of rubric design and exam creation. The shift towards a more communicative assessment system, better aligned with the methodology and teaching used in the communicative approach, can also build on some of the premises of this research.
The first observation to be made with respect to the findings concerns the need to promote greater transparency. Knowing the format of an exam, as well as its objectives and assessment criteria, is essential for any candidate. For this reason, it is advisable that the institutions responsible for these certificates publish on their websites the rubrics with which examiners will assess, together with precise instructions about the type of activities their papers will contain and the objectives and criteria thereof. Along the same lines of transparency, some of the certificates do not offer studies supporting their validity and efficiency. Certificates of such national relevance should be backed by research into their results and validity. While the FCE and IELTS certificates do provide such results, documents and studies, none of the other certificates does, which prevented a comparison of certificates with respect to their reliability coefficients.
Another key result of this research was the fact that none of the rubrics employed by any of the certificates analysed passed the validity test developed for this study on the basis of a rubric of rubrics and the framework's guidelines. Although some rubrics present more aspects to improve than others, the fact that none of them could be considered fully valid exemplifies the difficulty and complexity involved in designing an effective rubric. One of the main complications discovered when comparing results is the difficulty of striking a balance between descriptors that are brief enough for the rubric to be practical and explanations that are clear and detailed enough not to be considered too vague or open-ended. Some of the rubrics, for example, use too many assessment criteria and are consequently impractical or unmanageable. On the other hand, some rubrics are indeed practical because they measure only four criteria, but their descriptors are too vague.
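The balance just described can be made mechanical. The sketch below imitates the logic of a "rubric of rubrics" style checklist; the thresholds and messages are invented for illustration and are not the actual test instrument used in this study:

```python
# Hypothetical checklist that flags common rubric-design faults:
# too many criteria (impractical) and descriptors that are either
# overly long (unmanageable) or overly short (vague).

def check_rubric(criteria, descriptors):
    """criteria: list of criterion names; descriptors: list of descriptor texts."""
    issues = []
    if len(criteria) > 6:                # assumed practicality bound
        issues.append("impractical: too many criteria")
    for text in descriptors:
        words = text.split()
        if len(words) > 40:              # assumed upper bound on length
            issues.append("descriptor too long")
        elif len(words) < 4:             # assumed lower bound: likely vague
            issues.append("descriptor too vague")
    return issues

# A four-criteria rubric with one underspecified descriptor:
problems = check_rubric(
    ["grammar", "vocabulary", "coherence", "task fulfilment"],
    ["uses a wide range of structures with only minor slips", "good"],
)  # -> ["descriptor too vague"]
```

A real validity test would of course weigh qualitative judgements as well, but even this crude check shows how the practicality/vagueness trade-off can be operationalised.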
The fact that none of the rubrics analysed passed all the reliability and validity criteria invites reflection on whether the CEFR itself is really feasible. The framework provides guidelines on how to assess the different skills in different ways. To begin with, it contains tables with information about the characteristics a candidate must demonstrate at each level and in each skill. This allows institutions and teachers to draw up curricula and syllabuses and to design assessments and tests for the evaluation of contents or the determination of a level. It also includes rubrics with descriptors for each skill and even for specific tasks or activities; for example, there are descriptors for assessing a monologue, a presentation or a public announcement. However, using the framework's own rubrics would mean that the institutions in charge of designing level certificates would have to use a different rubric for each exam task, or a more global one for all of them, which would be less precise. Although these inconveniences are not drawbacks in themselves, since they would ensure that all the tasks, like their assessment instruments, are appropriate, the amount of work and time required would increase considerably. Another possible drawback is that the scales provided by the framework are holistic, so they would be too generic for certificates that try to determine whether or not a candidate has a given level, and they could not include grades on the candidate's performance (outstanding, very good, good, pass...). Moreover, several of the studies reviewed have shown that analytic rubrics are more precise than holistic ones. They are consequently more reliable, which is why most of the certificate rubrics analysed are analytic, except for the one used by the ISE II to assess listening comprehension. The institutions that organise and design the certificates are also in charge of creating rubrics, and these are, as stated, mostly analytic. Since the scales included in the framework are holistic, the process of creating analytic rubrics from them is more complex, and as a result the rubrics used by one certificate and another differ considerably despite being for the same level. That said, the freedom to choose criteria, descriptors and scales allows rubrics to be tailored far more closely to the proposed test model, although at the same time it distances them further from the CEFR. It is likely that, if the framework itself included analytic rubrics for the different skills and levels, the rubrics created by institutions or by teachers themselves would be more similar to one another and possibly more reliable.
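The analytic/holistic distinction can be made concrete: an analytic rubric scores each criterion on its own band scale and sums the parts, whereas a holistic one awards a single global band. A minimal sketch of an analytic rubric follows; the criteria, descriptors and band values are hypothetical and do not reproduce any of the certificates analysed:

```python
# Hypothetical analytic rubric: each criterion has its own band descriptors,
# and the final mark is the sum of the bands awarded per criterion.
RUBRIC = {
    "grammar": {0: "frequent errors impede meaning",
                1: "errors noticeable but meaning clear",
                2: "wide range of structures, only minor slips"},
    "vocabulary": {0: "very limited range",
                   1: "adequate range for the task",
                   2: "wide and precise range"},
    "coherence": {0: "ideas disconnected",
                  1: "mostly logical flow",
                  2: "fully cohesive text"},
    "task_fulfilment": {0: "off topic",
                        1: "partially addresses the task",
                        2: "fully addresses the task"},
}

def analytic_score(band_per_criterion):
    """Sum the band awarded for each criterion into a total mark."""
    return sum(band_per_criterion[c] for c in RUBRIC)

marks = {"grammar": 2, "vocabulary": 1, "coherence": 2, "task_fulfilment": 2}
total = analytic_score(marks)  # 7 out of a maximum of 8
```

A holistic scale, by contrast, would collapse all four dimensions into one descriptor per band, which explains why it is faster to apply but less diagnostic and, as the studies reviewed suggest, less reliable.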
With regard to designing rubrics according to the recommendations set out in the framework (feasibility of the rubric in terms of size and number of criteria, positive wording, and descriptor information that is brief but not vague), the difficulty of writing descriptors that are brief yet not vague should be highlighted, as already mentioned. Another obstacle in the rubrics analysed was the use of positive wording, since managing to phrase the descriptors for the lowest performance levels in positive terms requires absolute mastery of the language. Because of this, only two of the rubrics analysed fulfil this requirement. A greater use of positively connoted wording in the framework itself would help to improve this aspect by providing more examples of how to carry out such a complex task.
Taking all of the above into account, it could be concluded that the CEFR, despite doing a magnificent job in unifying levels and promoting communicative competence, may not be as applicable as it should be. Now that a reform of the system and of foreign language assessment is already under way, it would be a good time to revise the framework itself and try to remedy the minor shortcomings it presents.
Another interesting reflection arising from the research carried out relates to the presence or absence of rubrics as an assessment tool. In this respect, it was found that only the productive skills, that is, speaking and writing, are assessed with rubrics. Only the ISE II uses a rubric to assess a receptive rather than a productive skill, namely listening comprehension. It is true that the tasks traditionally used to examine the productive skills (a presentation, a monologue, a composition...) seem, a priori, easier to examine with a rubric. However, nothing prevents the receptive skills from being assessed objectively and reliably with a rubric. Even so, assessing a listening or reading comprehension paper with a scale would require a change in the exam models and tasks normally used to measure these skills. Assessing a multiple-choice or true/false exercise with a rubric makes no sense, but if, instead of this type of activity, others were used in which learners had to demonstrate their comprehension by producing an oral or written text, rubrics could indeed be applied. This is precisely what happens in the listening paper of the ISE II certificate, in which the candidate, after listening to the audio, must produce an oral summary and then talk to the examiner about what they understood, answering the examiner's questions where necessary. The shift towards more communicative task types must also bring with it a change in methodology.
Despite the results and findings of this research, it is essential to bear in mind its limitations. Greater transparency on the part of the institutions with respect to their papers, objectives, criteria and rubrics would facilitate this research, as would data from previous studies of their reliability.
This research also opens the door to new lines of research, since the fields of foreign language assessment, the CEFR and rubrics still require many more studies. Among the various investigations that could follow from this doctoral thesis, the most important would be related to a reliability study comparing the different certificates, for example, by checking the mark obtained by the same candidate sitting the same level in the different certificates, or the mark obtained by the same candidate in one exam when assessed by several different independent examiners. Finally, a study on the applicability of the CEFR would be of great interest, as would one analysing the extent to which the principles of the framework are respected and applied in the current education system.
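The inter-examiner comparison suggested here is usually quantified with an agreement coefficient such as Cohen's kappa, which corrects raw agreement between two raters for the agreement expected by chance. An illustrative sketch, with invented rater decisions:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical judgements.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    proportion of agreement and p_e the agreement expected by chance
    from each rater's marginal category frequencies.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two examiners grading the same four candidates (hypothetical data):
kappa = cohens_kappa(["pass", "pass", "pass", "fail"],
                     ["pass", "pass", "fail", "fail"])  # 0.5
```

Values near 1 indicate agreement well beyond chance; a low kappa for the same scripts marked by independent examiners would signal exactly the rubric reliability problems discussed above.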
All these new lines of research point to ample room for investigation in this area, in pursuit of a system that unequivocally guarantees that all students can develop their full potential. Many steps have already been taken over the last two decades, and important changes towards truly communicative learning have been made in the field of language learning. This should therefore serve as a source of inspiration for all researchers, showing them that findings, discipline and genuine research can indeed transform the reality in which we live.
Prologue
In the new global economy, English has become the most important and relevant language
in the world. However, English had acquired the status of lingua franca par excellence
long before the advances in transportation and technology enabled the intensification of
trade around the world. Thus, this new era, often known as globalisation, has resulted in
an upsurge in the significance of the language. This explains why the learning of English
as a Foreign Language1 (hereinafter, EFL) is nowadays a primary concern worldwide.
This need to acquire the language has spurred a proliferation of studies related to the
Teaching of English as a Foreign Language (henceforth, TEFL), and consequently many
different methodologies have emerged through the years, with varying success. While
some of them are still being implemented and coexist, others have been cast aside. At the
same time, new methods, tasks, instruments and assessment2 tools are appearing, most of
them inspired by the new technological advances. Along with these, new certificates,
titles and diplomas which certify an individual’s language competence and level of
proficiency have been created. This has been partly triggered by the fact that demonstrating
one's level of fluency in a language has become an essential requirement when applying for
a job or a grant, or to enable somebody to study in another country. In addition, in
primary and secondary schools, and also at university and other academic contexts,
English is frequently evaluated and, most of the time, compulsory.
1 Foreign Language: according to Richards and Schmidt, a language which is not the native language of large numbers of people in a particular country or region, is not used as a medium of instruction, and is not widely used as a medium of communication in government, media, etc. Foreign languages are typically taught as school subjects for the purpose of communicating with foreigners or for reading printed materials in a language (206). As opposed to second language: a language that plays a major role in a particular country, though it may not be the first language of many people who use it (472). 2 Assessment and evaluation will be treated as synonyms for stylistic reasons, although the difference between them will be explained later.
In recent years, the Council of Europe, through the Common European Framework of
Reference for Languages: Learning, teaching and assessment (from now on CEFR), has
been promoting shifts in the teaching-learning process, such as the implementation of the
communicative approach3 or the practice of the four language skills in the classroom
(speaking, reading, listening and writing) rather than focusing solely on writing and
grammar. As a result, a great deal of research on assessment in the EFL classroom has
been conducted. Nevertheless, traditional assessment methods remain predominant and
are still used in most schools, even though they are not suitable for the assessment of
communicative language competence, which could be defined as the underlying
knowledge a speaker has of the rules of grammar, including phonology, orthography,
syntax, lexicon, and semantics, and the rules for their use in socially appropriate
circumstances (Hymes 269-293). This conveys the need for research in this area in order
to find new assessment tools to evaluate them. Traditional paper tests no longer work for
the evaluation of speaking and must be discarded in order to pave the way for new
assessment methods.
In light of this new situation, research into rubrics4 has become a central area of study. In
spite of the fact that many countries, such as the USA, have been using them as a common
assessment tool since the beginning of the 20th century, far too little attention has been
paid to them in our country. By contrast, the appearance of grading scales in school
textbooks in Spain did not become frequent until recently, and their current presence has
been partly prompted by the current education law, known as the LOMCE. So far,
3 Communicative approach: approach to foreign language teaching which emphasises that the goal of language learning is communicative competence and which seeks to make meaningful communication and language use a focus of all classroom activities (Richards and Schmidt 90). 4 Rubrics: for stylistic purposes and due to the high frequency of appearance of the term, rubrics will sometimes be referred to as grading scales.
however, there has been little discussion on rubrics and there are still many enquiries
about their application, effectiveness, uses and objectivity which need to be addressed.
Not only has the CEFR had an influence on teaching methodologies and approaches, but
also on other aspects; for instance, it is a valuable source of information regarding
suitable assessment tasks for each of the skills. Furthermore, it provides useful knowledge
of Learning Standards. Those learning standards facilitate not only the preparation of the
curriculum and the syllabus, but also the contents of a paper and the criteria used for the
assessment thereof.
This doctoral thesis has been written after an exhaustive and comprehensive reading of
primary resources for the assessment of EFL, especially, about the use of rubrics and the
use of the CEFR for the creation of assessment tests and tools. The main aim of the
research is to provide the educational community with evidence on how appropriately the
CEFR guidelines have been implemented in the assessment papers of the most common
English Certificates in Spain. In order to do so, an analysis of each skill’s exam papers
will be carried out, and this will include the tasks, the objectives of each one thereof, and
whether or not they are adhering to the indications of the CEFR. Furthermore, the research
studies which skills are being assessed with a rubric and examines them to check their
effectiveness.
This analysis of the exam papers which form the English Certificates selected, together
with the rubrics employed for their grading, intends to provide the educational community
with a useful source of information which may be used for different purposes. One of
these could be the detection of which tasks are more frequent for the assessment of a
particular skill and which criteria are usually considered for their evaluation.
Furthermore, it will allow us to check whether the framework is being used correctly and
it will identify incoherencies and omissions with regard to the framework's
recommendations. On the other hand, the analysis of rubrics will make it possible to
establish patterns among them. In this way the most common types could be detected.
Moreover, it will be determined if they are effective or not and in which aspects they can
be improved. Finally, this entire analysis will help in the creation and design of future
rubrics.
Objectives of the thesis
The entire thesis has been conceived with the intention of providing the educational
community with research that may help to improve the assessment of EFL.
The creation of the CEFR has encouraged a new approach in TEFL and the learning of
foreign languages. Old-fashioned methodologies, such as the grammar-translation
method5, have been replaced by communicative ones6, which prioritise speaking and
listening skills and favour practical aims. All of these changes imply a necessary switch
in the assessment and evaluation processes, since traditional paper-based exams are no
longer effective.
Taking this into account, the aim of this thesis is to facilitate the required change in
the assessment process. In order to do so, all the information related to assessment,
involving both traditional and new methodologies, needs to be gathered. It is essential to
know the dimension of assessment so that contributions to its improvement can be
produced.
5 Grammar-Translation Method: a method of foreign or second language teaching which makes use of
translation and grammar study as the main teaching and learning activities (Richards and Schmidt 231) 6 All the methods derived from the communicative approach, such as Task-Based Language Teaching, Cooperative Language Learning, and Content-Based Instruction
Secondly, it is also an intention of the current thesis to study in detail both the CEFR and
everything related to rubrics: the former because it is fundamental to analyse its
indications towards the teaching of foreign languages and, more specifically, the
assessment of foreign languages; and the latter because the effective use thereof requires
detailed knowledge of them. This includes knowing how they work, what advantages or
possible drawbacks they have, which different types exist and how they must be designed.
As a specific contribution to the current assessment of EFL and the use of rubrics is the
main objective of the current thesis, the principal goal is to study and analyse how the
most important Certificates of competence in English assess the different skills in relation
to the framework and how rubrics are being used to validate these certificates.
With the above stated purpose, the structure and tasks of the different papers from each
of the certificates selected will be studied. It will be checked whether they follow the
CEFR and if they are adapting its recommendations to their exams. Moreover, their
effectiveness and reliability will be tested and the areas in which certain improvements
or corrections should be made will be identified.
Concerning rubrics, it will be studied which papers do use a grading scale as an
assessment instrument. Furthermore, in those in which a rubric is employed, the rubrics
will be analysed in order to detect the most common types of rubrics, and to ascertain
whether they are effective and reliable. In addition, there will be a study on which aspects
can be improved and whether the framework guidance is being followed. The aim is to
establish relations and patterns among the different rubrics and skills, to give advice for
their correction or improvement, if needed, and to establish accurate instructions for the
creation of future ones.
Finally, the research also intends to establish future lines of investigation which can
contribute to the current one, or which may be opened up thanks to the present one.
The main objectives and sub-objectives of the current thesis are as follows:
1- To gather and organise information relating to the assessment of English as a
Foreign Language (EFL).
1.1. To study alternative evaluation methods and instruments of
evaluation.
2- To study the Common European Framework (CEFR) in relation to
assessment, Learning Standards and grading scales.
3- To do an in-depth study on the theory, definition, types and composition of
rubrics.
3.1. To provide information about online rubrics and instruments to design
them.
4- To analyse some of the most common English Certificates in relation to their
papers and rubrics.
4.1. To determine whether the implementation of the Framework is
correct.
4.2. To establish patterns and relationships among them.
4.3. To detect shortcomings or mistakes which could be improved.
5- To open new lines of research.
Table of contents
ABSTRACT ...................................................................................................................................... 1
Resumen ........................................................................................................................................ 3
Resumo .......................................................................................................................................... 5
Spanish summary (long) ................................................................................................................ 7
Galician summary (long).............................................................................................................. 17
Prologue ...................................................................................................................................... 27
Chapter 1: INTRODUCTION ..................................................................................................... 37
1.1. State of the art ............................................................................................................. 37
1.2. Lines of research ......................................................................................................... 43
1.3. Synopsis ...................................................................................................................... 47
Chapter 2: REVIEW OF THE LITERATURE ........................................................................... 51
2.1. A historical review of evaluation ..................................................................................... 53
2.2. Types of evaluation .......................................................................................................... 62
a) According to the moment of application: .................................................................... 62
b) According to its extension ........................................................................................... 63
c) According to the agent of evaluation (ibid. 40) ........................................................... 63
d) According to the scale ................................................................................................. 64
e) According to the purpose ............................................................................................ 64
f) According to the scoring ............................................................................................. 65
g) According to the delivery methods employed ............................................................. 65
h) According to the formality .......................................................................................... 66
i) Divergent/convergent .................................................................................................. 66
j) Process/ Product .......................................................................................................... 66
2.3. Dimensions of assessment ................................................................................................ 66
2.4. Importance and consequences of evaluation .................................................................... 68
2.5. Traditional Evaluation ...................................................................................................... 72
2.6. Alternative evaluation ...................................................................................................... 76
2. 7. Assessment for Learning ................................................................................................. 80
2.8. Instruments ....................................................................................................................... 82
2.8.1. Portfolio ..................................................................................................................... 83
2.8.1.1. European Portfolio of Languages ............................................................................. 85
2.8.2. Oral presentations ...................................................................................................... 86
2.8.3. Journals ..................................................................................................................... 87
2.8.4. Projects ...................................................................................................................... 88
2.8.5. Interviews .................................................................................................................. 90
2.8.6. Progressive assessment chart..................................................................................... 91
2.8.7. Report ........................................................................................................................ 92
2.8.8. Rubrics ...................................................................................................................... 93
2.9. Language Assessment Literacy ........................................................................................ 93
Chapter 3: The CEFR .................................................................................................................. 95
3.1. Common Reference Levels .............................................................................................. 96
3.2. Learning Standards ........................................................................................................... 98
3.3. Chapter 9: Assessment ................................................................................................... 101
3.4. The CEFR on rating scales or checklist ......................................................................... 104
3.5. Evaluation of competences ............................................................................................. 107
Chapter 4: RUBRICS ................................................................................................................ 111
4.1. Definition .................................................................................................................. 111
4.2. Why use rubrics? ....................................................................................................... 113
4.3. Historical Overview. Rubrics in Education ............................................................... 116
4.4. Types ......................................................................................................................... 118
a) According to how it is measured: ................................................................. 119
b) According to the scoring type: .................................................................................. 120
c) According to its theme .............................................................................................. 120
d) According to its application ...................................................................................... 121
e) According to its function ........................................................................................... 121
f) According to the scorer ............................................................................................. 122
g) According to the channel ........................................................................................... 122
4.5. Parts of a rubric ......................................................................................................... 123
4.6. Advantages and disadvantages .................................................................................. 124
4.7. How to build a rubric ................................................................................................ 127
4.8. Online Tools for building a rubric ............................................................................. 128
Chapter 5: METHODOLOGY .................................................................................................. 139
5.1. Introduction .................................................................................................................... 139
5.2. Methodological approach ............................................................................................... 141
5.3. Research design .............................................................................................................. 149
5.3.1. Objectives and context ............................................................................................ 149
5.3.2. Definition of the units of analysis ........................................................................... 150
5.3.3. Number scheme rules .............................................................................................. 153
5.3.4. Categorisation and codification ............................................................................... 164
5.3.5. Reliability and Validity ........................................................................................... 168
5.3.6. Data Analysis .......................................................................................... 168
5.4. Hypotheses ..................................................................................................................... 171
Chapter 6: RESEARCH ............................................................................................................ 173
6.1. Proficiency Test exam papers for the different language skills and their assessment rubrics ........ 173
6.2. Writing ...................................................................................................................... 174
6.2.1. Literature review ............................................................................................... 174
6.2.2. Assessment of Writing in the main English Certificates ......................................... 183
6.3. Speaking .................................................................................................................... 228
6.3.1. Literature Review .............................................................................................. 229
6.3.2. Assessment of Speaking in the main English Certificates of ESL .................... 234
6.4. Reading ..................................................................................................................... 279
6.4.1. Literature Review .............................................................................................. 280
6.4.2. Assessment of Reading in the main English Certificates of ESL ..................... 284
6.5. Listening .................................................................................................................... 304
6.5.1. Literature Review .............................................................................................. 305
6.5.2. Assessment of listening in the main English Certificates of ESL ..................... 307
6.6. Findings ..................................................................................................................... 325
Chapter 7: CONCLUSIONS ..................................................................................................... 339
7.1. Research implications ............................................................................................... 340
7.2. Research limitations .................................................................................................. 346
7.3. Research applicability and future implications ......................................................... 347
Chapter 8: BIBLIOGRAPHY ................................................................................................... 353
8.1. List of figures: ................................................................................................................ 370
APPENDICES ........................................................................................................................... 373
Appendix 1: Exam samples ................................................................................................... 373
FCE ................................................................................................................................... 373
IELTS ................................................................................................................................ 391
ISE II ................................................................................................................................. 413
ACLES .............................................................................................................................. 421
EOI .................................................................................................................................... 431
Appendix 2: Rubrics ................................................................................................................. 441
FCE ................................................................................................................................... 441
IELTS ................................................................................................................................ 443
ISE-II ................................................................................................................................. 447
ACLES .............................................................................................................................. 453
EOI .................................................................................................................................... 455
Appendix 3: CEFR ................................................................................................................ 459
Common Reference Levels: self-assessment grid ............................................................. 459
Common Reference Levels: qualitative aspects of spoken language use ......................... 461
Chapter Four scales for different skills and tasks .............................................................. 463
Chapter 1: INTRODUCTION
1.1. State of the art
The use of the CEFR within the European Union for TEFL and the assessment of EFL is
already a reality. The Spanish government has implemented the framework’s guidelines
through education acts, and the main English Certificates for assessing English
competence claim to follow the CEFR patterns. Such a meaningful change in foreign
language learning and EFL has boosted research on the CEFR. Furthermore, there are many
studies on how to implement the CEFR in countries outside Europe. Nonetheless,
although not exactly scarce, research on the actual implementation of the CEFR is less
prolific.
Mathea Simons and Josef Colpaert conducted a study on this issue with the aim of
shedding some light on how the CEFR is perceived and how it can be improved. Their
article “Judgemental Evaluation of the CEFR by stakeholders in language testing”
published in 2015 in Revista de Lingüística y Lenguas Aplicadas collects their findings.
They designed a survey which was discussed and answered by 138 users (teachers,
researchers, publishers, test developers and policy makers) who attended the International
Conference “Language Testing in Europe. Time for a Framework?”, held at the University
of Antwerp in 2013. Among their findings, it can be highlighted that most respondents
use the CEFR frequently in their jobs, although with varying degrees of detail.
The results showed that respondents make use of the framework for designing
language tests that correspond to CEFR levels (58.7%), informing the content of a
teaching syllabus or curriculum (49%) and designing teaching and learning tasks (46%).
In addition, and even though 56% admit that the institution they work for requires them
to use the CEFR, respondents state they use it because they have read research
studies that convinced them that the CEFR is important (59%). Overall, the perception of
the usefulness of CEFR is rather positive or very positive, but the practicality and degree
of detail do not entirely meet their expectations. Finally, among the recommendations
suggested, the most relevant could be the need for some control over the use that is made
of the framework in real educational settings.
Along the same lines, the article “Is CEFR Really over There?”, written by Sibel Çagatay and
Fatma Ünveren and published in Procedia. Social and Behavioural Sciences in 2015,
deals with the research they conducted to explore English language instructors’
knowledge of the CEFR and their perceptions of CEFR-based curricula. They
surveyed instructors of EFL subjects, 18 from a private university and 36 from a state
university, during the academic year 2013-2014. The state instructors claimed
to know about the CEFR but were neutral about understanding its contents.
Furthermore, 68% had not taken any course or training on the CEFR and admitted that
they did not have sufficient knowledge about it. Concerning the impact of the CEFR on
coursebooks and programmes, 52% did not think that their programme was CEFR-specific
and only 36% felt that the CEFR had an impact on the coursebook they used. As
for the private instructors, all of them stated they had received training and were fully
aware of the uses and details of the CEFR. They also agreed on the impact the CEFR had
on their coursebooks and programmes. By way of conclusion, both private and state
instructors considered the CEFR to be useful.
Teachers’ perceptions of the CEFR were also analysed in the primary school context. An
article published in the International Online Journal of Education and Teaching (IOJET)
in 2018 comments on the findings of a comparative study carried out among EFL teachers
working at state and private schools. The researchers, Yilmaz and Ünveren, sent a
questionnaire to 105 school teachers around Turkey. The conclusions of their research
were that the majority of teachers had general knowledge of the CEFR but teachers from
private schools had taken courses or training on it and therefore had a sufficient amount
of knowledge, unlike their counterparts. Moreover, teachers from private schools felt that
the CEFR had an impact on course books, tests and language-teaching techniques, while
state teachers remained undecided on those issues. Yilmaz and Ünveren also conducted
a socio-demographic study and found novice teachers to be more aware of the CEFR’s
impact. Moreover, EFL teachers holding MA or PhD degrees had more knowledge of
the CEFR than those who held only degrees in English.
The use of rubrics is still limited within Spain, but it is progressively more frequent in
textbooks and official certificates. Despite this, there are many questions in relation to
them that need to be studied, although since the education act (LOMCE) was enacted,
more research in connection with rubrics has appeared. In 2015, Carolina Girón García
and Claudia Llopis Moreno, from the University Jaime I, published an article about the
use of rubrics for the assessment of speaking titled “Designing Oral-based Rubrics for
Oral Language Testing with undergraduate Spanish Students in ESP context” in The
Journal of Language Teaching and Learning (JLTL). Their article deals with research
conducted with students at their university which aimed to determine whether the fact
that students could choose their partner in their oral assessment (ideal partner) improved
their results in comparison to performing the same task with an undesired partner. A total
of 10 undergraduates were selected as subjects of the research, all from different
university degrees but taking the subject of scientific English during the academic year
2013/2014. The research consisted of designing a questionnaire wherein the subjects had
to point out the classmate they would select as their “ideal partner” and the one with
whom they would prefer not to perform (“undesired partner”). The students’ level was
assessed with a monologue task and a rubric. The rubric was aimed at assessing the
students’ fluency, vocabulary, grammar, pronunciation, coherence and communicative
management. The scale was from 1 to 3 and then an average grade was calculated. After
the analysis of all the questionnaires and the level of the participants, they were paired
for the performance of a dialogue (role-play). Students had to perform the task firstly with
their ideal partner and then with the undesired one. Both performances were assessed with
the same rubric. Once all the data had been analysed, the researchers concluded that 20%
of the participants improved their performance and grade with their ideal partner, whereas
60% obtained a worse grade when they were with their undesired classmate. The remaining
20% of the participants obtained the same result with both partners. Although the study’s
limitations are clear, the results underscore the need to continue studying the assessment
of communicative skills with rubrics in order to check their effectiveness and improve the
evaluation process.
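The scoring procedure just described — six criteria rated from 1 to 3, then averaged — can be sketched in a few lines of Python. This is an illustrative reconstruction only: the criterion names come from the article, but the function and sample scores are hypothetical.

```python
# Illustrative sketch of the scoring procedure described above: each of
# the six rubric criteria is rated on a 1-3 scale and the final grade is
# their mean. Criterion names follow the article; the rest is hypothetical.
CRITERIA = ["fluency", "vocabulary", "grammar",
            "pronunciation", "coherence", "communicative management"]

def rubric_average(scores):
    """Return the mean of the six criterion scores (each between 1 and 3)."""
    for criterion in CRITERIA:
        if criterion not in scores:
            raise ValueError(f"missing criterion: {criterion}")
        if not 1 <= scores[criterion] <= 3:
            raise ValueError(f"{criterion}: score must be between 1 and 3")
    return sum(scores[c] for c in CRITERIA) / len(CRITERIA)

# One examinee's hypothetical scores:
sample = {"fluency": 2, "vocabulary": 3, "grammar": 2,
          "pronunciation": 2, "coherence": 3, "communicative management": 2}
print(round(rubric_average(sample), 2))  # prints 2.33
```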
In the article entitled “Assessment rubrics: towards clearer and more replicable design,
research and practice”, which appeared in the journal Assessment & Evaluation in Higher
Education in November 2015, Phillip Dawson reflects on the different
meanings of the term “rubric”, from a secret assessment sheet up to the articulation of the
assessment criteria expected from a written paper. He states that research on rubrics
has risen considerably in recent years, as evinced by the fact that until 1997 there were
only 106 books whose main topic was rubrics, something which has clearly changed, as
at least 5,000 articles on rubrics were published worldwide up to 2013.
In the article, Dawson warns that many institutions are forced to use a rubric
when the definition thereof is not yet clear. His concern stems from the fact that the lack
of agreement on what a rubric really is may lead to incoherence in its application and may
diminish the rubric’s effectiveness and reliability. This issue has prompted research into the
different ways of categorising grading scales. One of those criteria might be purpose:
whether they have been designed for the assessment of one task or for the assessment of
a skill independently of the task. Other examples of criteria are whether the rubric
contains examples, or whether it is holistic or analytic. However, Dawson claims there are
other valid ways of categorising them, such as the way the descriptors are expressed, the
presentation of the rubric, who has designed the grading scale, etc. Finally, Dawson
mentions that some rubrics can also include feedback.
Following Dawson’s line of research, Thikra K. Ghalib and Abdulghani A. Al-Hattami
also undertook research into different types of rubrics. In particular, they focused on
holistic and analytic rubrics as an instrument for the assessment of written tasks. The
results of their research are given in their article “Holistic versus Analytic Evaluation of
EFL Writing: A Case Study”, published in June 2015 in the journal English Language
Teaching. The University of Taiz in Yemen was the location for their case study, with 30
students taking the Writing Skills course during the academic year 2014/2015. Their
objective was to find out whether holistic or analytic rubrics could improve the reliability
of the evaluation process and whether there is any correlation between the grades obtained
and the use of one or the other type of rubric. The examiners used two rubrics: a holistic
one, with six points; and an analytic one, with five different criteria to assess cohesion,
vocabulary, syntactic structure, etc. Students had to write a 250-word descriptive text.
The examiners assessed each text with both rubrics, with one month between the two
rounds of marking. The analysis of the data showed that students obtained lower grades
with the analytic rubric but, at the same time, reliability among the examiners’ grades was
higher when using the analytic rubric.
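The inter-rater reliability at issue in studies like this one — the degree to which two examiners’ scores for the same texts agree — is often operationalised as a correlation between their ratings. The sketch below computes a Pearson correlation over invented scores; it illustrates the concept only and is not the study’s actual method or data.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Invented scores given to the same eight essays by two examiners;
# a value close to 1 indicates high agreement between raters.
rater_a = [3, 4, 2, 5, 4, 3, 2, 4]
rater_b = [3, 4, 3, 5, 4, 2, 2, 4]
print(round(pearson(rater_a, rater_b), 3))  # prints 0.873
```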
A case study carried out by the Universities of Granada and Vigo (Gallego Arrufat and
Raposo-Rivas) concluded that, after using the rubric throughout a whole term in one
subject, students thought that its use increased their motivation and boosted cooperative
work. They expressed their opinions through a survey based on a Likert scale (211).
Another case study carried out by Verano-Tacoronte et al. designed a rubric based on
specific literature read. This rubric was validated by a panel of experts and later used to
assess undergraduate students. Students had access to the rubric in order to prepare their
presentations in pairs. They were assessed by a team of teachers using the rubric and the
results showed the rubric’s high reliability, as the scores given by the teachers were very
similar.
On the other hand, some authors have conducted research on rubrics by analysing the
work of other researchers on the issue. For instance, Jonsson and Svingby studied
seventy-five scientific studies on the reliability and validity of rubrics and they concluded
that scoring is more consistent when using them. Moreover, both the reliability and
validity of the assessment process increased when employing them as an assessment tool.
Panadero and Jonsson analysed twenty-one studies and found that rubrics provided
transparency to the assessment, reduced anxiety, aided with feedback, and helped to
improve students’ self-efficacy and self-regulation.
By contrast, other researchers are more critical of the use of rubrics, although
they do not discourage it. Reddy and Andrade criticise how careless some studies
seem to be with the validity of their rubrics: they describe
neither the rubric’s development nor its content validity (cited in Cano 273).
Panadero, Alonso-Tapia and Huertas concluded that rubrics, in spite of improving
feedback, do not boost learning by themselves (cited in Cano 270-274).
Another research project conducted by Velasco-Martinez and Tojar-Hurtado and
published in Investigar con y para la sociedad attempted to ascertain to what extent
teachers use rubrics to assess competences. To this end, they analysed 150 different
rubrics used by teachers of different universities in Catalonia, Aragón, Galicia,
Extramadura and Castilla y León. Among the results obtained, it was discovered that the
branch of social and legal sciences is the one in which rubrics are most used (34%), as
opposed to arts and humanities (only 4%). A further finding was that rubrics were
mainly used to assess written essays (36%) and hardly ever used to assess visual or
graphic resources (2.7%), which implies a traditional conceptualisation of knowledge as
something that can be looked up in a book. In addition to these findings, the authors also
provide the educational community with some other interesting data, such as the teaching
methodology the participant teachers applied in their lessons. These data denoted that
the master lecture is still the most widely used methodology (36.7%), while other innovative
methodologies have no significant presence in those universities; for example, only 1.3%
of the respondents used portfolios and 6.7% used case studies with their students
(1396-1400).
1.2. Lines of research
In this section, the key lines of research will be stated. English is an extremely broad
subject of research and different areas related to it may be implicated in its study. The
current research connects the areas of Education and English Language with the
implementation of the CEFR. Besides, new methods of assessment, particularly rubrics,
and English language proficiency tests7 are also involved in the research.
The communicative approach promoted by the Council of Europe has revolutionised the
teaching and learning methodologies existing prior to the last decade. The consolidation
of the European Union and the social necessity of adaptation to the economic market
demands of globalisation have led to a focus on communicative skills. Being fluent in
English means one can hold a conversation about any general topic and understand
anybody without much difficulty; thus, it is essential to treat speaking and listening skills
as a priority. In addition to this, the students’ achievement of communicative competence
is the main aim of TEFL. Here, it is relevant to clarify how the concepts of communicative
skills and communicative competence are used throughout this thesis. Communicative
skills refer to those used in oral performance; i.e. listening and speaking. In contrast,
communicative competence involves all the knowledge of a language and the knowledge
of whether it is feasible, appropriate, or done in a particular speech community (Richards
and Schmidt 90-91). The foregoing implies a complete transformation of syllabi, tasks
and teaching, all of which need extensive research. If a meaningful and effective
transformation is to be achieved, all of these new methodologies and instruments must initially
be checked and subsequently improved once they are in force.
The traditional methods of assessment that prevailed were designed to evaluate
traditional methodologies, but they are not suitable for the current communicative
approach. For example, speaking tasks cannot be assessed with either multiple-choice
or true-or-false items, and, obviously, paper-based tests are not an option.
As a result, the need has arisen to apply new assessment methodologies and new
7 Proficiency tests: tests that measure how much of a language someone has learned (Richards and Schmidt 425). In this case, the standardised English Certificates which are under study.
assessment instruments. Portfolios, grading scales, journals and projects are some of the
examples which could be mentioned as new instruments of assessment currently used in
the evaluation process. Nevertheless, a great deal of research still needs to be conducted
in relation with them. Among these new popular instruments for alternative assessment,
rubrics can be highlighted. Although it can be argued that they are not so new, their use
in Spain is still scarce. Hence, this thesis provides an important opportunity for studying
how effective, valid and reliable a rubric can be for assessment, what types of rubrics
exist, and how the reliability and precision thereof can be improved. It is in this line of
research that the present thesis is situated.
Another principal area of study is technology, since new devices and new technologies
have an undeniable presence in today’s society. They have had an impact and
influence in almost all aspects of life, including work, relationships, industry, medicine,
etc., and education is no exception. Thus, there is a vast field of research, since those new
resources, applications and programmes can be used in the classroom and even in the
assessment process, by both students and teachers. Foreign language learners can have
access to an unlimited amount of input: reading, listening, speaking and writing samples
can be easily accessed with one click, as well as dictionaries, thesauri and vocabulary
applications. How to adapt all these new possibilities to the education system is a broad
issue to examine and research. It is here where this thesis aims to make a small
contribution through the inclusion of online rubrics and tools for their creation.
Another line of research connected with the learning of foreign languages and the new
educational approach is that of levels of competence. The necessity of learning a new language
and the corresponding need to assess competence in a foreign language are the result of
the aforementioned globalisation. Determining the level of competence which a learner
or user has in a foreign language is not an easy task. That is why the Council of Europe,
having detected this problem, urged research in the field. As a result of this, the CEFR
was developed.
Once the CEFR established the common levels of competence for languages in each skill
and provided guidelines on how to assess them, it was time to implement it across Europe.
Therefore, education systems had to be adapted to the new requirements and,
consequently, certificates to prove level of competence became especially significant.
While there have been a lot of adjustments in order to adapt the framework, there are still
many shortcomings in the adaptation thereof. Indeed, how to correctly implement the
Framework guidelines, how to assess competences and how to teach foreign languages
according to them requires further research. This is the line of investigation which has
inspired the current research, as its intention is to ascertain how Certificates of
competence in English have adapted the Framework to their papers together with the new
assessment instruments.
In conclusion, this thesis follows several lines of research which are somehow related and
connected. Generally speaking, these lines are as follows:
- The study of English as a Foreign Language (EFL).
- The modernisation of teaching and assessment methods.
- The implementation of the Common European Framework of Reference (CEFR).
- The assessment of the levels of competence of a foreign language user.
- The tasks which may form a paper for the assessment of each of the skills.
- The use of rubrics as an instrument for assessing skills from the communicative
approach.
1.3. Synopsis
Owing to the foregoing, the current doctoral thesis intends to study rubrics as an
alternative tool for the assessment of EFL and the most popular English Certificates in
our country. With this in mind, the entire work has been organised into different chapters
which deal with important aspects necessary to achieve the final desired results.
After the current introduction section, the thesis starts in chapter two with a theoretical
framework on the assessment of EFL. This literature review encompasses its definition,
different types and possible classifications, dimensions, and importance. The framework
is sub-divided into various sections which deal with different relevant and related aspects.
First, there is a historical review of evaluation, which moves from a global perspective of the
phenomenon to a more focused review of it in Spain. Next, the different types
of evaluation are explained, along with the different possible classifications which can be
made. Those schemes help to understand the dimensions of assessment and all the spheres
and agents related to it. This general theoretical framework finishes with the explanation of
different methods and assessment tools for the assessment of EFL, paying special
attention to the alternative ones, in particular, the rubric, which is one of the main subjects
of the following chapters of this work.
An alternative evaluation must be linked to the Learning Standards and undoubtedly to
the CEFR. These will be fully explained in chapter three, which begins with the Common
Reference Levels that are relevant for TEFL, the achievement of the communicative
competence and the assessment thereof, after which the concept of Learning Standards,
meaningful for the understanding of the current education system in Spain based on the
LOMCE, will be reviewed. Assessment is also one of the key issues in this thesis; hence,
the chapter of the CEFR that deals with assessment is covered. Chapter three concludes with
the rating scales provided by the CEFR and the evaluation of competences.
Chapter four deals with the use of rubrics as a tool to assess the English competence of
foreign language learners. This chapter includes a comprehensive summary of the
literature and resources available. It also focuses on the description and explanation of a
rubric: how it can be defined, the importance of its use, its origin and history, the different
parts it may contain, its advantages and disadvantages and the multiple types of rubric
there are according to different criteria which might be used to classify them. Finally, the
chapter ends with an explanation of how a rubric must be built, and a review of the online
tools available to design them.
Chapter five contains the description of the research methodology. This focuses on the
methodological approach selected and how the research has been designed. In this section
the objectives of the research are explained as well as the selection of English Certificates
which are analysed, the criteria used to determine effective rubrics and the instruments
used to do so. Lastly, the hypotheses of the research are stated.
The research carried out is fully detailed in chapter six. This chapter is sub-divided into
six sections. The first section is the use of rubrics for the assessment of the different skills.
Then, there are four sections which correspond to the four different skills: writing,
speaking, listening and reading. These four sections follow the same structure: first there
is a review of the literature on the use of rubrics in order to assess each of the four different
skills, after which the test for the corresponding skill of each of the English certificates is
analysed. The analysis includes an explanation of the timing and tasks used and the criteria
stated for its assessment, as well as a comparison with the tasks and criteria stated in the
CEFR, attempting to detect possible defects, shortcomings or incoherence. The rubric
used for the assessment of each paper (if any) is also analysed, according to the type and
effectiveness thereof.
The research chapter finishes with a comparison among all the papers and among all
the rubrics used to examine a particular skill. The comparisons help in the design of exam
papers and in the future development of rubrics for each skill. In addition, we can
contrast which kinds of rubrics are more likely to appear in the assessment of a certain
skill and whether their use may lead to the need to change the way that skill is taught
and assessed.
The conclusions of the thesis are found in chapter seven, which deals with the
implications of the research and the findings it has led to by reflecting on which
aspects should be improved and corrected. The limitations of the research are also
mentioned in this chapter, which finishes with an explanation of its applicability; i.e., in
which ways this research may be used to make the most of it. Finally, the several lines of
future research that the current thesis opens are stated, with the hope that more research can
be conducted in relation to them in order to continue improving the assessment of
EFL, the implementation of the CEFR and the use of rubrics.
The final chapter gathers all the bibliography which has been used in the preparation of
the current thesis together with the articles, theses and online resources employed. The
appendices contain the exam papers of each of the certificates analysed and the rubrics
that each of them uses.
Chapter 2: REVIEW OF THE LITERATURE
The assessment of EFL is a matter of international importance, since English has reached the position of the world’s principal and most widespread lingua franca. It is therefore not just the main language used in trade and international communication but also one of the most taught and studied languages. Its importance is undeniable, and many institutions, both public and private, deal with the granting of certificates, titles and diplomas which certify an individual’s knowledge and level of proficiency. In many educational and professional contexts, such as schools, universities and international firms, English is frequently taught, and most often evaluated.
The determination of one’s linguistic competence is highly complex and arduous, which
complicates the evaluation process even further. Consequently, a great deal of research
has been dedicated to the evaluation of EFL.
Patel defines evaluation as “the process of determining the degree to which the goals of
a programme have been achieved” (3). Nevertheless, this definition is very partial, as it only refers to one type of assessment, i.e. summative assessment. Moreover, it is actually conceptualising assessment and not evaluation. This example illustrates the inaccuracy with which the term is commonly employed, owing to the lack of a universally agreed definition.
A better definition of assessment might be its conception as “all the activities teachers use to help students learn and to cause student progress” (Black and Wiliam 43).
Concepts as basic as evaluation and assessment are often confused, even by professionals
within the educational world. Following Pérez-Paredes and Rubio’s distinction (cited in McLaren et al. 2005), assessment is a “general term we use to refer to the set of procedures
which are put into practice when gathering information about student’s communicative
competence or student’s language performance achievement” (606). On the other hand,
evaluation “considers the teaching and learning program as a whole and seeks to obtain
feedback that can serve different purposes for the different agents in education” (609-
610). As a result, evaluation encompasses assessment, among other aspects such as planning, programming, reporting, etc., whilst assessment refers strictly to testing. However, throughout this thesis the terms will be used interchangeably for stylistic purposes.
The importance of evaluation is unquestionable. Some authors, such as Dikli, affirm that it is “one of the crucial components of the instruction” (13). Abbas states that it “is an essential tool for verifying that educational goals have been met” (24).
Nevertheless, this proliferation of definitions is not due solely to the importance of evaluation, but also to the complexity of the term, which makes it possible to distinguish between three different levels of evaluation (Castillo and Cabrerizo):
- technical: evaluation as the process of checking that the system is accomplishing
its functions;
- ideological: evaluation has two functions. One is to legitimise the cultural
inheritance and the other to eliminate what does not belong to the ideological
principles which are being transmitted;
- psychoeducational: evaluation applied to individual/particular students; (6)
In addition, evaluation involves a huge range of dimensions (ibid. 20), including criteria,
evaluator, functions, indicator, methods, objectives, process, uses and variables which
must be considered.
With regard to the EFL Classroom, traditional teaching methods, such as the Grammar-
Translation Method, have been gradually substituted by certain communicative methods.
Since 2001, the Council of Europe, through the CEFR, has been further promoting
communicative competence. As a result, the four language skills (speaking, listening, reading and writing) started to be practised in class, together with communicative strategies and techniques and the sociocultural context, and not just grammar, as had been common previously.
2.1. A historical review of evaluation
Evaluation and assessment are not new concepts. In fact, according to Lavigne and Good, “forms of testing can be traced back to Chinese, Greek and Romans” (2). López Bautista cites Cicero and Augustine of Hippo as examples of figures who had already introduced educative approaches. However, it was not until the Middle Ages that university examinations started to become more formal. The tendency would continue in the ensuing centuries. The 18th century saw an increasing demand for access to education, which would lead to an increase in evaluations (1-2), but only in the form of entrance tests. The education and evaluation processes back then were completely different from how we currently conceive them, even far from our traditional concept of them. Actually, what we currently know as traditional schooling was born and developed in the 19th century. At that time, only memory ability was tested.
It was in the United States of America where the first oral and written forms of evaluation (as they are now conceived) were found (Garrison, cited in Lavigne and Good 2). The formalisation of assessment began when the first cheating scandals emerged in the 1820s. It was argued that some students had received easier questions than others in order to manipulate the results (Lavigne and Good 2). This scandal brought about the implementation of certain measures to improve objectivity.
For instance, a number of public schools introduced committees to guarantee fairness during the examination processes. As López Bautista states, in 1845 Horace Mann designed a performance test so that the schools of Boston could be evaluated. Some years later, from 1887 to 1898, another formal evaluation was carried out in the country. This time it was Joseph Rice who assessed the orthographical knowledge of thousands of students from all over the country (2). At that time, evaluation was used as a tool of control, authority and punishment.
It was back then that the questioning around assessment (which has persisted until the present day) started, and the first doubts and criticism of assessment appeared. Odell summarised these concerns back in 1928 (cited in Lavigne and Good 3). The main
concerns voiced included the belief that examinations were harmful for the students’
health as they provoked stress and fear. It was also thought that sometimes the contents
were not adjusted to the objectives. In addition, some critics claimed that, because passing
the exams became a goal in itself, examinations encouraged cheating and bluffing.
Another recurrent objection to assessment was the amount of time that was spent doing
examinations, time that could be better used for learning, reviewing, etc. Finally, it was
believed that exams were totally unnecessary since capable teachers should be able to
assess students on a daily basis through observation.
Evaluation continued to evolve throughout the next century. During the early decades, intelligence tests were designed, boosted by the findings of Charles Spearman, who in 1904 identified a general factor of intelligence. Binet would design the first intelligence test just one year later, and Stern promulgated an intelligence quotient formula. However, the tendency changed again in 1942, when Ralph W. Tyler defined
the term evaluation as a concept different and separate from measurement (López Bautista 2-3). For him, the purpose of evaluation was to find out to what extent the established criteria had been met. From that moment on, approaches to evaluation also focused on the syllabus and there was greater concern for the assessment process. In 1967, Scriven distinguished between formative and summative evaluation, as well as between intrinsic and extrinsic evaluation. One of the major changes was brought about by Piaget’s constructivist conception of evaluation. This approach sees the learner as an active subject, able to build his or her own knowledge. It also conceives that all new knowledge is generated from that previously acquired (ibid. 3).
Castillo and Cabrerizo point out seven different stages in the evolution of evaluation through history (9). In the first stage, assessment was seen as a way of measuring students in order to establish differences between them. Assessment subsequently became a measure to check the degree of achievement of the established objectives. Influenced by the new trends and conceptions emerging in the United States in the 1960s, evaluation came to be considered a complex process that affected every aspect of education, and summative and formative evaluation started to be distinguished. New perspectives were developed in the following decade and gave rise to reflection on assessment as a process that should inform the decisions made. Moreover, criteria and educational objectives were at their peak; thus, evaluation became a normative process. The fifth stage encompassed a period in which different models of evaluation proliferated, most of which can be grouped into qualitative and quantitative paradigms.
Concerning EFL, Liz Hamp-Lyons also locates the origins of formal large-scale
examinations in the US before and after the First World War. As for Britain, foreign
languages were assessed for achievement purposes, just as ancient Greek and Latin had been examined in the previous centuries. In 1911, the Cambridge University Senate suggested the creation of a teaching certificate in modern foreign languages. In 1913, the Certificate of Proficiency in English was developed, prompted by the interest in improving relationships with colonies and former colonies. The test consisted of grammar and translation exercises as well as phonetic transcriptions, essays and pronunciation (15-17).
Another period in language testing and assessment took place in the late 1950s and early 1960s. John Carroll developed the Foreign Language Aptitude Battery, which was designed to determine to what extent a person would be able to master a language. At this time, two proficiency tests were also developed in the United States: the Certificate of Proficiency in English at the University of Michigan and the proficiency test of the American University Language Centre in Washington D.C., which would lead to the now famous TOEFL. Nevertheless, these proficiency examinations were different from the British one mentioned above, since they were influenced by psychometric advances and made no assumptions about the learners’ previous knowledge. The same period brought significant changes in Britain, too. English was a very strong language owing to commerce and politics, so universities received thousands of international applications. Thereupon, it was necessary to determine whether a student would be able to study in English or how much English he or she would need to learn before being able to do so. The English Proficiency Test Battery and the Test in English-Overseas were the two most famous examinations at the time (ibid. 15-16).
The next change occurred in 1979 and was brought about by the appearance of communicative language teaching. The British Council required a more communicative test to check proficiency within the academic context. The English Language Testing
Service (ELTS) was created, but it was too expensive, as well as hard to score and to
develop. As a result, it would be replaced by the IELTS, which was more generic as it did
not assess each individual according to the field he or she intended to study. (ibid. 17)
The following period was marked by the establishment of the Common European Framework of Reference (CEFR), developed by the Council of Europe. Its main purpose was “to overcome the barriers to communication among professionals working in the field of modern languages arising from the different educational systems in Europe” (Council of Europe, cited in Hamp-Lyons 18) and to “provid[e] a common basis for the elaboration of language syllabuses, curriculum guidelines, examinations, textbooks, etc. across Europe” (ibid. 18).
Since the 1990s, notable interest in the assessment process has been evident, with the educational community focusing much more on the validity and reliability of assessment and the effect that examinations have on learners. Furthermore, teachers seem to be much more aware of the implications of their tests. Reflection and research on what is taught and the way it is taught, as a direct consequence of the implications and influences of testing, has also been unavoidable. Tsushima argues that, “due to epistemological changes in 2nd language acquisition and the increasing awareness that any language assessment cannot be separated from the context, culture, and values of where it was created and used” (106), qualitative approaches are now more common and mixed methods (MM) have been demanded.
In Spain, assessment also developed gradually, and it was influenced by the changes and
movements which were happening in the other countries mentioned. However, political
issues delayed the changes and Spain seemed to be always one step behind. One of the
most significant changes in the evolution of the Spanish education system occurred when
the Constitution of 1812 was approved. Thereupon, education came to be organised,
financed and controlled by the State and not by the church. Written on the grounds of
freedom and equality, the constitution defended the universality of primary education and
gave the teaching programmes some homogeneity (MECD 1). General Elío’s military uprising would subsequently return control to the church until 1820, when the liberals came to power for three years, enacted the Reglamento General de la Instrucción Pública (General Regulation of Public Instruction), and made education free again. The
following years would follow a similar pattern, with alternate periods of absolutism and
liberalism. During the reign of Isabel II, a well-known Spanish law for public instruction
was enacted in the year 1857. This is known as Ley Moyano and it achieved a consensus
between progressives and conservatives. This law established three levels of study:
elementary, general studies (6 years) and post-secondary. It also standardised teaching in public and private schools, as well as teacher training and the teaching profession. The governance thereof was divided among the State, the provinces and the local governments (ibid. 2).
The next significant period in education occurred during the First Republic, when freedom of education was granted. Nevertheless, it lasted for just one year, as the Bourbon Restoration enacted a conservative constitution and education became the cause of multiple fights between the two main political sides. The independence of some colonies sparked off an internal crisis, which led to some changes being made in education, such as the regulation of examinations. The Second Republic brought along a unified school model, free and compulsory for everybody and secular, and it regulated bilingualism and the position of teachers as public servants. The progressive changes did not last long,
as the dictatorship in 1939 gave control over education back to the church, and education was used to transmit the ideology of the regime. Education became elitist, teaching methods became archaic and learners were separated by sex. It would not be until the 1950s that the dictatorship became somewhat less dogmatic in its regulations over education. At the end of the regime, in 1970, the General Law of Education (Ley General de Educación) established four levels of education: preschool, primary school (EGB), secondary school and higher education (ibid. 5-7).
As for the teaching of foreign languages in particular, it was not until the 20th century that modern foreign languages were introduced. It was specifically a Royal Decree of 1900 that considered the study of foreign languages, such as French, English or German, an educational component (Morales et al. 8). At that moment, French was established as the compulsory foreign language, which had to be studied from the age of 12, whereas English and German were optional in certain academic years (ibid. 19). It was significant, however, that knowledge of French was a requirement for access to Baccalaureate studies.
The introduction of foreign languages was quite brief, though, as only three years later the possibility of studying an optional subject of English or German was removed and the compulsory study of French was limited to the two years prior to Baccalaureate studies. The teaching of foreign languages was then granted to the Superior Schools of Trade (ibid. 19). In 1926, a reform drafted under the Plan Calleja slightly improved the situation of foreign language teaching by making the study of French compulsory for three years and offering the possibility of studying English or German for two years (ibid. 21). This reform survived with some modifications until 1938, when a further educational
reform, which increased the number of school hours, was implemented. This reform included tests which students had to pass at the end of certain academic years and in which they had to prove their knowledge of foreign languages, among other subjects.
An important shift in the teaching of foreign languages happened in 1953, when a new law (Ley de Ordenación de la Enseñanza Media) was passed. The age for starting to learn foreign languages was initially set at 12 and then lowered to 11, albeit for more academic years. At the same time, the necessity of using the target language in class was mentioned for the first time in a written law (ibid. 24-27).
The final major period of education in Spain starts with the Transition to democracy,
when the current Constitution was drafted (1978) and a new law was enacted (Ley
Orgánica del Estatuto de Centros Escolares, LOECE) only two years later. It established
the study of a foreign language for all students in primary education. Furthermore, it was
possible to choose either French or English as the main foreign language to study, and
English quickly became the predominant option. Since then, the different political parties
in power have made changes in the education system. In 1990, a new act, known as the
LOGSE, was enacted and introduced a system based on continuous and integrative
evaluation. It was based on Piaget’s constructivist ideas, so students should learn how to
learn. The LOGSE also conceived evaluation as a subject in itself and established the
assessment of students, but also of teachers, educational centres, syllabuses, etc. (López
Bautista 5). The aims were to use evaluation to obtain information for orientation and
improvement. The final stage started with the Organic Law of Education (2006), or LOE
as it is known. From this moment, evaluation in Spain gained another cornerstone: the
assessment of competences and the implementation of the eight key competences:
- Competence in linguistic communication
- Mathematical competence
- Competence in knowledge of and interaction with the physical world
- Competence in processing information and use of ICT
- Competence in social skills and citizenship
- Cultural and artistic competence
- Learning to learn
- Autonomy and personal initiative
At the same time, competence in linguistic communication was sub-divided into different sub-competences, including the linguistic (grammar), socio-cultural, logical, sociolinguistic, learning-to-learn and strategic components.
In 2013, a new government approved a new Organic Law, known as LOMCE, aimed at replacing the LOE. This new education law also contains key competences, seven to be precise:
- Competence in linguistic communication
- Mathematical competence and science and technology basic competence
- Digital competence
- Learning to learn
- Competence in social skills and citizenship
- Leadership and entrepreneurial spirit
- Awareness and cultural expression
The main novelty is the introduction of the so-called Learning Standards, a type of quantitative reference table for assessing to what extent a student has achieved a goal
(standard) in the curriculum. Learning Standards have been used in many other countries
for the evaluation of school subjects and are closely linked to rubrics or grading scales,
as will be explained below in the Learning Standards section.
2.2. Types of evaluation
The complexity of evaluation gives rise to a large number of evaluation types, which may be categorised in different ways according to the aspect used to catalogue them.
Among the different criteria which may be used, evaluation can be classified according to the moment of application, its purpose, its extension and the agent of evaluation (Castillo and Cabrerizo 32-48).
a) According to the moment of application:
❖ Diagnostic evaluation: some tests are used to “expose learner difficulties, gaps in
their knowledge, and skill deficiencies during a course” (Harmer 321).
❖ Placement evaluation: this consists of a test to determine in which level a student
should be placed. It is “usually based on syllabuses and materials the students will
follow and used once their level has been decided on” (ibid. 321).
❖ Initial evaluation: this is the one carried out by the teacher at the beginning of the
academic year or at the beginning of a course in order to “know the previously
acquired knowledge of their new students” (in McLaren, Madrid and Bueno 609).
❖ Progress evaluation: this kind of evaluation measures how much the student has
progressed.
❖ Final evaluation: it is the one which is done at the end of a course to check the
achievement of the objectives.
b) According to its extension
The extension refers to whether the evaluation assesses only one aspect (for instance, the academic result of an exam, the behaviour of the student in class, etc.) or several aspects at the same time (Castillo and Cabrerizo 39).
❖ Global evaluation: this encompasses all the components and dimensions of the learner, the educational centre, the programme, etc. Applied only to students, it would be a kind of evaluation which measures skills, knowledge, attitude, competences, etc.
❖ Partial evaluation: this focuses solely on the measurement of one aspect or dimension: knowledge, skills, attitude, etc.
❖ Inner evaluation: this refers to the evaluation carried out by the centre, its teachers or administrative staff to test its internal workings.
c) According to the agent of evaluation (ibid. 40)
This kind of categorisation is based on the person who is in charge of the evaluation. It
allows evaluation to be divided into different kinds:
❖ Self-evaluation: that carried out by teachers when they evaluate their own work, or by students when they evaluate themselves.
❖ Hetero-evaluation: this is evaluation as it is commonly conceived: when the teacher evaluates students or when students evaluate the teacher. It can be external if an agent of evaluation from outside the school evaluates the students.
❖ Co-evaluation: also known as peer-evaluation, this is the kind of evaluation carried out by people who belong to the same level or status. In other words, it refers to the evaluation of teachers by their colleagues or to the evaluation of students carried out by their classmates.
d) According to the scale
Normative evaluation is that which compares the results obtained by a group with a general average: for instance, the average of another group at the same level, the centre’s average mark, or a comparison with other centres (ibid. 41-42).
On the other hand, criterial evaluation is based on evaluation criteria specified prior to the assessment and available for the students to consult.
e) According to the purpose
This is probably the best-known criterion. Evaluation may be used for three main purposes: diagnostic, formative and summative.
❖ Diagnostic evaluation has already been explained above (under point a).
❖ Formative evaluation: an evaluation “is considered to be formative if the primary
purpose is to provide information for program improvement” (Fitzpatrick et al.
16). It is related to the evaluation of the process and is used to make curricular
decisions on the content, the way of teaching, the reviewing, etc. It allows the
teacher to check, reinforce, regulate the learning; and the student to orientate,
receive feedback, check, etc.
❖ Summative evaluation: this type of evaluation “is usually called final evaluation”
and is “considered the main type of evaluation in school settings” (in McLaren,
Madrid and Bueno 611). It is carried out at the end of the academic year in order
to check the achievements reached. It normally has a penalising function as it
allows the teacher to decide whether the learner passes or not. Fitzpatrick, Sanders
and Worthen define the summative evaluation as follows: it is “concerned with
providing information to serve decisions or assist in making judgements about
program adoption, continuation or expansion.” (7)
In order to understand the significance and magnitude of evaluation, it is fundamental to know all the areas and aspects it influences, such as the school programme, the working of the educational centre, the teacher’s performance, the student’s learning, the didactic materials, the tools and procedures, the whole education system, the educational community and evaluation itself (meta-evaluation) (Castillo and Cabrerizo 49).
f) According to the scoring
There are a large number of scales for scoring a test. The most common are probably the numerical scales, particularly the scale from one to ten, widely used in Spain, where 10 is the maximum grade and 1 the minimum. Another common numerical scale runs from 1 to 100. Moreover, in some countries, for instance Germany, the numerical scale is used in the opposite direction, so 1 is the highest score and the last number the lowest. Numerical scales can also be expressed as percentages. The qualitative scale is also widely used in Spain; it consists of a wording scale which scores qualitatively (sobresaliente, notable, bien…). The words can change (excellent, perfect, good, average, needs to improve…) but the idea is the same. Other countries, like the United States or the United Kingdom, use letter scales. Once again, the number of levels may vary; the most typical letter scale ranges from A, the highest, to F, the lowest. Grades can be given by hand, computer or distributed scoring. “Distributed scoring is a model of performance scoring in which readers receive training and conduct scoring remotely (e.g., from home) rather than in a regionally-located performance scoring center” (Keng et al. 1).
g) According to the delivery methods employed
In this case, the test can be traditional, using pen and paper; it can be delivered by computer or online; or it can be taken orally through a presentation, interview, exposition, etc.
h) According to the formality
Amita Patel claims that with informal assessment “the judgements are integrated with other tasks” (4), and that such judgements are commonly employed to provide students with formative feedback. This type of assessment is not very stressful or threatening for learners. In contrast, formal assessment is that which students are aware of taking (ibid.).
i) Divergent/convergent
Divergent assessments are those “in which a range of answers or solutions might be considered correct” (ibid.). While they are more time-consuming, they are often more authentic for assessing cognitive skills. On the other hand, tests with only one correct response are called convergent, and these are faster and easier to grade.
j) Process/ Product
Process assessment focuses on the steps followed by the students to achieve an ability or
to do a particular essay or task; it measures development. Product assessment scores or
measures only the outcome of a task or test and never the process which was followed.
With regard to assessment approaches in educational centres, Escudero Escorza classifies these into five groups: the first approach is focused on results, the second on the organisation of the centre itself, the third uses mixed criteria, the fourth seeks to assess cultural aspects related to the centre, and the fifth assesses the institution’s ability to self-transform (2).
2.3. Dimensions of assessment
As previously stated, assessment and evaluation are complex concepts because they encompass many different dimensions which must be taken into consideration. Castillo and Cabrerizo mention a huge range of dimensions, including criteria, evaluator, functions, indicator, methods, objectives, process, uses and variables, but there are even more.
The learner’s dimension is a good one to start with. Irrespective of whether the method used to assess is traditional or alternative, most of the time assessment involves the students. Assessment is a very important part of the teaching-learning process, whether it is performed for summative purposes or any other. Assessment is supposed to help students, providing them with feedback and information on their achievement of goals, improvements, weaknesses, level, strengths, study techniques, work, abilities, skills and teamwork. This is the reason why they should always be taken into account when the assessment is being chosen or the evaluation is being planned. The number of students in the class, their possibly different levels or the type of learners they are, are some of the things which should concern the teacher. It is highly recommendable to allow them to take part in the process in some way so that they will fully understand what they are facing and what is expected of them.
The evaluator is another indispensable dimension of the process. Scoring is probably the most difficult part of the teaching process and it requires a great deal of capacity, ability and objectivity from the scorer. Furthermore, as will be explained later in greater detail, teachers should have some language assessment literacy in order to assess well.
The criteria are another key element of the assessment process. Besides being clearly defined, the test or assessment task, as well as the assessment tool, must be suited to them. It is not just a case of referring to the particular criteria for a task, but also to those for the whole academic year, as some criteria and standards are regulated by the curriculum or the syllabus. The criteria must be truly useful for assessing a particular skill or ability of the student.
The school is another dimension related to assessment. Some schools have special regulations to standardise examinations. In addition, evaluation grades might be used to compile large-scale statistics which classify the educational centres where they were obtained: for example, academic failure reports or lists of the schools with the best average results in a particular skill, subject or course.
It should not be forgotten that parents are also part of the educational community. They should also be involved in their children’s education and sometimes, for better or worse, grades are the most important indicator for them.
The assessment task or method is obviously a dimension of the process. There are hundreds of different tasks for assessing students, but the decision must be taken with a view to the task type, the context, how fast, reliable and effective scoring is, and so on.
The assessment instrument, if one is used, may also be quite significant. Rubrics, checklists, interviews or reports might have a great influence on the assessment.
2.4. Importance and consequences of evaluation
Carrillo and Unigarro state that “it is not possible to consider assessment separately from the teaching and learning processes” (36). According to Ahmadi Fatalaki, assessment allows the teacher to identify successful instruction and it also affects the learning engagement of the students, since teachers are more aware of the learners’ level of proficiency (78). Assessment determines what is taught and how (Carrillo and Unigarro).
In the current society, a society of information and knowledge in which students must be
the principal actors in the learning process, it is necessary to teach them how to learn, to
engage them in the process and to boost critical thinking, creativity, teamwork,
initiative and reflection (Herrero et al.). The European Higher Education Area (EHEA)
has promoted formative education along with methodologies which help learners to
develop both academic and professional competences and to turn the student into an
active subject.
Moreover, during the last decade there has been an increasing awareness that assessment
“cannot be separated from the context, culture, values and where it was created and used”
and, as a consequence, “hermeneutic and qualitative approaches have been adopted”
(Tsushima 106). Heidari, Ketabi and Zonoobi explored the role of culture in the different
language teaching methods from ancient times to the present day. They discovered
that, because of globalisation, culture plays a much more significant role in
modern teaching methods such as the Communicative Approach, Task-Based Language
Teaching, Content-Based Language Instruction and Intercultural Competence (para. 22).
Yamtim and Wongwanich place assessment among the five different components which
“contribute determining the quality of instruction”, alongside students, teachers, resources
and context (cited in Herrera and Macías 303). After all, the evaluative system provides
the social community with information on educational quality, it also provides
information for research, and it allows the most urgent areas for intervention to be
identified. Furthermore, it allows the impact of specific programmes and policies
to be verified. Syllabi and curricula can also be improved based on assessment results, as
they can help in the incorporation of quality standards. Evaluation results may also be used
to select students for access to certain programmes, such as university degrees. Finally,
assessment measures the domain of competence (Carrillo and Unigarro 33), which is
useful in determining the learner's level, diagnosing where possible problems lie and
checking the progress made and goals achieved.
Popham devotes an entire chapter of his book, Mastering Assessment, to pointing out how
assessment can help teaching. Among the reasons he gives, one could highlight how high-
stakes tests (tests which carry important decisions) may affect whether the student is
promoted or not. But high-stakes tests also help to determine whether the instructional
practice was good or not. The book also explains how pre-assessing students can save the
teacher time, because topics which learners already know can be omitted. Additionally,
assessing students during the teaching process allows formative assessment and
progress-monitoring. Moreover, the teacher can determine whether he or she will need
to explain the topic in a different way. However, Popham suggests not giving too much
weight to these “en-route” assessments so that students do not become intimidated. With
regard to post-assessment, the most important advice that stems from the book is that,
more than “dispensing student's grades” (16), these assessments should answer the
question: “How effective was my teaching?” (15).
Not only is assessment important owing to its benefits and the positive influence it can
have on the teaching practice, but also because of the negative effects and consequences
it might lead to. Carrillo and Unigarro identify three important factors in language
learning: anxiety, motivation and self-confidence (21). A high level of the last two is
essential for successful learning, whereas it is a low level of anxiety that eases learning.
As a result, the high level of anxiety that students commonly experience nowadays
during the language lesson is hindering their learning. Horwitz points out three main
sources of this anxiety: communicative apprehension, fear of a negative evaluation and
exam anxiety (cited in ibid. 54).
The fear of a negative evaluation has a strong impact on the learner’s test performance.
That is because “there is abundant evidence that demonstrates that individuals’
intellectual performance is undermined in situations that remind them that they are
stereotyped, which is causing them to underperform” (Schmader cited in Ewing 7).
Furthermore, performance anxiety can lead some students to school refusal. According to
research conducted in the United States, up to 5% of students may refuse to go to school
because of anxiety or depression (Wimmer 1). These feelings of anxiety or depression are
generally generated by the assessment and evaluative process.
Anxiety may be transmitted to students during the teaching process. Sometimes, teachers
“may communicate the consequences of a test performance in a very different manner”
(Von der Embse, Schultz and Draughn 622). For instance, they may use test results to
“threaten students” in an attempt to motivate or encourage them to study and prepare for
the exam if they do not want to fail. “Fear appeals refer to messages that repeatedly
remind students about the importance of passing exams and the consequences of failure”
(ibid.). Those messages are usually reinforced by the ones many learners receive from
their parents at home and by the students' own self-imposed demands to pass the course,
to enter the desired university or to obtain high scores.
The level of anxiety can be measured with several scales. One of the most important ones
is the Foreign Language Classroom Anxiety Scale (FLCAS), which consists of 33 items
rated on a Likert scale from 1 to 5. Another is the Cognitive Anxiety Scale, which has 27
items and also uses a Likert scale, this time from 1 to 4. It is advisable to check the level
of anxiety of a class in order to take suitable measures to reduce it.
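Scoring such Likert-based scales amounts to summing the item responses, with some items reverse-keyed. The sketch below is only an illustration of that arithmetic, not the official FLCAS scoring key: the sample responses, the five-item excerpt and the set of reverse-keyed items are invented for the example (the real instrument specifies which of its 33 items are reverse-scored).

```python
# Minimal sketch of scoring a Likert-type anxiety scale (1-5 points per item).
# The reverse-keyed item numbers below are illustrative, NOT the official FLCAS key.

def score_likert(responses, reverse_keyed, points=5):
    """Sum 1..points responses, flipping reverse-keyed items (x -> points + 1 - x)."""
    total = 0
    for item, answer in responses.items():
        if not 1 <= answer <= points:
            raise ValueError(f"item {item}: answer {answer} outside 1-{points}")
        total += (points + 1 - answer) if item in reverse_keyed else answer
    return total

# Hypothetical 5-item excerpt (the real FLCAS has 33 items)
responses = {1: 4, 2: 2, 3: 5, 4: 1, 5: 3}
reverse_keyed = {2, 4}  # assumed for illustration
print(score_likert(responses, reverse_keyed))  # 4 + (6-2) + 5 + (6-1) + 3 = 21
```

A higher total would indicate higher reported anxiety; interpretation thresholds belong to the published instrument, not to this sketch.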
It is both the positive and the negative consequences derived from assessment that make
the evaluation process so significant and important.
2.5. Traditional Evaluation
What traditional evaluation really means may differ from one author to another. It is
obvious that not everybody has the same conceptualisation of what is traditional and
what is not. Moreover, the term itself is very broad. For the purposes of the current
work, traditional evaluation corresponds to all the evaluation methods, techniques and
tools used by the teacher to carry out a final, summative evaluation at the end of the
course or term. This evaluation is purely quantitative because it merely aims to give
students a mark in order to establish how well they have learnt the concepts in the
syllabus. There is no intention of providing students with qualitative information about
their learning, about what they can do to improve, or about how to learn better. It does
not aim to give information about their strengths and weaknesses, or even try to measure
their “real” performance, competence and abilities or skills, but only the memorised
knowledge they have acquired. When this kind of evaluation takes place, there is
normally nothing else to be done, as by then the student has already passed or failed the
subject, term or course.
Dikli maintains that traditional assessment tools are usually “multiple-choice test,
true/false test, short answers and essays” (13). Multiple-choice tests consist of different
questions or unfinished statements with various answers from among which the learner
has to choose. They are commonly used in high schools as they are “fast, easy and
economical to score” as well as “objective” (13). However, there are also many problems.
For example, “they are extremely difficult to write well, and the distractors may actually
put ideas into student's heads that they did not have before they read them” (14). In
addition, it is possible to pass a multiple-choice test without having a really good
competence in writing or speaking the language. True/false tests require students to
“make a decision and find out which of the two potential responses is true” (14). However,
it may be difficult to determine whether the student really knows the correct answer, as
he or she has a 50% chance of success (13-14). Short-answer tests seek a brief written
answer to a question. The main problem is that the questions are normally unrelated to
one another and open to interpretation.
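The reliability problem of true/false items can be made concrete with a little probability: a student answering entirely at random still has a surprisingly high chance of reaching half marks. The figures below are illustrative only; the item count and pass mark are arbitrary choices, not taken from the sources cited above.

```python
from math import comb

def prob_at_least(n_items, k_correct, p=0.5):
    """Probability of at least k_correct successes in n_items independent guesses."""
    return sum(comb(n_items, k) * p**k * (1 - p)**(n_items - k)
               for k in range(k_correct, n_items + 1))

# Chance of scoring at least 5/10 on a true/false test by pure guessing
print(round(prob_at_least(10, 5), 3))  # 0.623
```

In other words, on a short true/false test a blind guesser reaches the halfway mark more often than not, which is precisely why such scores say little about real competence.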
Some other common tools for summative assessment, especially in the teaching of foreign
languages, are cloze procedures. Cloze tests are based on the “deletion of every nth word
in a text (somewhere between every 5th or 9th word)” (Harmer 323). As a result, the
elimination of words is random and all kinds of words may be deleted. The main
disadvantage of cloze procedures is that the learner “depends on the particular words that
are deleted, rather than on any general English knowledge” (324).
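The fixed-interval deletion that Harmer describes can be sketched in a few lines. The function name, the blank marker and the choice of n are arbitrary here; a real cloze test would also need to handle punctuation and present the original words as an answer key.

```python
def make_cloze(text, n=7, blank="____"):
    """Replace every nth word of the text with a blank (classic cloze procedure)."""
    words = text.split()
    for i in range(n - 1, len(words), n):  # indices of every nth word
        words[i] = blank
    return " ".join(words)

sentence = "The quick brown fox jumps over the lazy dog and runs far away today"
print(make_cloze(sentence, n=5))
# The quick brown fox ____ over the lazy dog ____ runs far away today
```

Because the interval is fixed, the deleted words land on articles, verbs or proper nouns indiscriminately, which is exactly the arbitrariness the criticism above points to.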
Summative and quantitative evaluation is still the most common type of evaluation in
Spain's education system, even though teaching and education have recently undergone
a significant evolution.
Castillo and Cabrerizo (427) have collected some of the disadvantages of traditional
evaluation:
❖ Only concepts are assessed: the quantitative marks given to students measure only
their memory ability.
❖ Academic failure. The huge and increasing academic failure rates are closely related
to the teaching method as well as to the assessment method. Traditional evaluation
does not measure a learner's real written and spoken competence in real, authentic
situations, which leads to a lack of motivation in students. In the era of new
technologies and communications, it is completely understandable that students
have no interest in traditional exams or tasks. In addition, it has been shown that
there are multiple intelligences and multiple types of learners, which traditional
evaluation does not take into account.
❖ The student is the only one being evaluated. This conceptualisation is completely
old-fashioned. Within the educational community, the teacher should also be
assessed−as well as the syllabus, the evaluation programme, etc.− in order to
improve the quality of the system. Traditional evaluation does not serve this
purpose.
❖ Only the results are measured. In contrast to traditional evaluation, the new and
alternative forms of evaluation recognise the importance of evaluating not just the
results but also the process, the rhythm, the effort, the strategies the learner is able
to use, the methodology, the progress, etc.
❖ Remedial exams are merely repetitions of previous exams. Some authors criticise
the fact that a remedial exam only entails doing a very similar exam on a different
day instead of providing the student with new strategies to face the difficulties.
❖ Traditional evaluation focuses on what the learners do not know or on their
mistakes. This is a very negative perspective which does not motivate students at
all. Evaluation should be focused on what the students are able to do in order to
stimulate them.
❖ Evaluation is merely a score. As has already been explained, the way in which
evaluation is carried out nowadays is reduced to a number and it does not contain
any qualitative comment.
❖ The exam is the only assessment tool. Students have to gamble everything in one
exam. It does not matter how they have performed during the entire course, what
they have proved to know during all the classes, but only what they are able to do
in a traditional assessment one day in a highly stressful situation.
❖ Traditional evaluation does not consider self-evaluation, whose benefits have been
amply proven.
❖ Traditional evaluation is normally used as a measure of repression or punishment
instead of being an instrument for improving learning.
Brooks states that “it has long been recognized that what teachers teach and the ways in
which they teach are heavily influenced by particular types of assessment” (16). Pidgeon
had already expounded (1992) how in many classes “children were ricocheting from one
unrelated activity to another to ensure all attainment targets were covered”. This has
meant that children do not learn in any “meaningful way” (cited in Brooks 17).
Besides some of the consequences already mentioned, Brooks also points out that
traditional evaluation discourages skills such as critical reflection and speculation, as well
as learning for personal improvement. Lavigne and Good claim that “teacher criticism was
consistently shown to have a negative effect on student learning” (47). Zainab Abbas has
also reflected on the effects of evaluation and assessment, stating that “experience
indicates that the process of evaluation has been misused by the majority of EFL
instructors” (4) and that they have substituted it with a “monthly or regular selected
responses test”. She also points out, as many other authors and researchers have found,
that exams and tests are very stressful and harmful for students, which does not benefit
overall language proficiency.
2.6. Alternative evaluation
The teaching of EFL is not what it used to be two decades ago. The establishment of the
CEFR and the promotion of communicative competence have driven reforms in the
subject. The communicative approach is now strongly encouraged in lessons, so the main
working language is English and not Spanish. The different skills (speaking, listening,
reading and writing) are now put into practice alongside grammar.
However, the truth is that grammar normally still has the greatest weight in the student's
mark, and the assessment tools used for evaluation are, in most schools and high
schools, very traditional. Despite the presence of formative evaluation in the classroom,
it is not commonly applied. What most teachers do as “formative evaluation” is
actually just to reserve 10% of the global quantitative mark to measure (also
quantitatively) the attendance, attitude, classwork and homework done by the student
during the year. As a result, very important aspects of the evaluation, such as the progress
made by the learners, their daily work and effort, or their real performance in a
non-threatening, low-stress situation, are reduced to 10% of their final mark.
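The imbalance described above can be sketched numerically. The 90/10 split comes from the text; the function name and the marks themselves are invented for illustration.

```python
def final_mark(exam, continuous, exam_weight=0.9):
    """Weighted final mark out of 10: exam_weight on the exam, the rest on continuous work."""
    return exam_weight * exam + (1 - exam_weight) * continuous

# A student with excellent daily work (10/10) but a poor exam day (4/10)
print(final_mark(exam=4, continuous=10))  # 4.6
```

With only a 10% weight, the difference between a perfect continuous-work record and no continuous work at all is a single point out of ten, which is exactly the reduction criticised above.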
By contrast, there is another type of evaluation already mentioned: formative evaluation.
Nevertheless, it is important to highlight, once again, that the kinds of methodologies that
consider “periodical intermediate exams” to be formative evaluation are not the ones this
work refers to (Popham 17). Formative evaluation has been shown to be more effective
than summative evaluation. Researchers such as Paul Black and Dylan Wiliam have
conducted a great deal of research on formative evaluation, its implications and its results.
One of their studies was an assessment of formative evaluation itself. Their report was
published in the journal Assessment in Education and was based on a wide variety of
reports on the topic, written by other researchers from different countries and based on
experience with pupils ranging from 5-year-olds to university students. According to
Black and Wiliam, their research showed conclusively that formative evaluation improves
learning. Numerous reviewers of this 1998 meta-analysis conclude that the
methodological rigour used and the quality of the judgements and conclusions endorse
the reliability of the report's outcomes (qtd. in Popham 23-26).
Once the benefits of implementing formative evaluation have been described, it seems
clear that formative and qualitative evaluation is more recommendable than summative
evaluation or, at least, that a combination of the two should be implemented in a fairer
and more equitable manner.
Although formative evaluation already exists today, the significant and immense changes
our society has been experiencing in recent years make it necessary to go a little beyond
it. Formative evaluation is required, but the methods, techniques and tools used to
implement it need to keep in step with the evolution and advances society is undergoing.
For this reason, it seems necessary to turn formative evaluation into a trans-formative
evaluation and, in order to do so, it seems essential to provide teachers with a new or
renewed variety of alternative methods, techniques and tools which will enable them to
design and programme a new and real alternative evaluation.
First of all, as was done with evaluation in the previous section, it is essential to start by
giving a definition of the concept of “alternative assessment and evaluation”. Despite
being a common concept in the current educational community, it is difficult to find an
appropriate definition which captures the entire essence of what an alternative method of
assessment or evaluation entails. In fact, the concept of alternative testing was originally
used to describe all those activities which were not formal tests but which could be used
for assessing learning performance, as alternatives to the traditional methods of
evaluating language.
Valencia and Pearson offer a quite similar conceptualisation of the term:
“alternative assessment consists of all of those efforts that do not adhere to the traditional
criteria of standardisation, cost-effectiveness, objectivity and machine-scorability” (cited
in Abbas 27). A much simpler definition of the term is that published by
Glencoe/McGraw-Hill, which understands it as every “alternative to traditional
paper-and-pencil tests” (para. 2).
Other authors consider alternative assessment to be the evaluation of what students can
do instead of what they are able to recall or reproduce. It would be the checking of what
students can integrate and produce (Abbas 27).
The above definitions and explanations point out the complexity of the concept due to the
wide spectrum of manifold methods or techniques which could be encompassed under
this denomination. In her article on alternative assessment, Willa Louw mentions the view
of two authors, Mary and Combs, on the theme (23). Although their works were published
in 1997, their conceptualisation has been able to integrate the complexity of the concept
and is still relevant for this reason. They both conceive of alternative assessment as an
amalgam of cognitive, demonstrative and affective methods used to evaluate students.
Accordingly, alternative assessment implies assessing not simply how knowledge can be
applied to the real world in authentic tasks, or the performance of skills such as
demonstration and simulation, but also the attitudes and values of the students.
However, whether everything which is not a “traditional” written test should be
considered alternative assessment could be widely discussed. Is a non-written
examination sufficient to be considered alternative assessment? What could be labelled
as “alternative”? Nowadays, technological improvements and advances, as well as the
changes in the world, are so relevant that what could be regarded as alternative ten years
ago would now be far from what we understand as an alternative to the traditional, particularly
taking into account the characteristics of Secondary Education students. These pupils
were born in the Internet era, with a computer, a smartphone and a tablet at home,
and they have been able to manage all of these devices from a very early age. Commonly
referred to as “digital natives”, they are distant from previous generations. The
generation of students who witnessed the introduction of digital screens or projectors in
schools and who owned a mobile phone from the age of 10 or 11 might be slightly
impressed by the changes applied during these years but, in order to engage digital
natives, we need more than a projector in the classroom, and other alternative
methodologies and methods should be implemented. The evolution from the printed book
to the book projected onto the screen could work for a couple of years with past
generations, but it would be considered boring and old-fashioned by the new ones.
Younger students would define these methodologies as traditional rather than alternative,
as would be the case with most erstwhile alternative assessment methods.
2.7. Assessment for Learning
Kay Sambell, Liz McDowell and Catherine Montgomery, lecturers at Northumbria
University in the United Kingdom, devote an entire volume to outlining what assessment
for learning is. Despite the fact that they focus on higher education, the truth is that most
of the concepts may also be applied to secondary education. Assessment is commonly
thought of as a form of testing what learners know, what they can do, and which grade
is associated with it. However, Assessment for Learning (hereinafter, AfL) is “the principle
that all assessment, within the overall package, should contribute to helping students to
learn and to succeed” (3). As a result, assessment for learning is neither summative nor
formative, nor any other single type, but a balanced combination of different types with
the main purpose of improving students' learning.
AfL is based on six principles. It must be rich in formal but also in informal feedback;
emphasise authentic and complex assessment tasks; develop students' abilities to assess
their own progress; help them direct their own learning; offer extensive confidence-building
opportunities and practice; and keep a suitable balance of summative and formative
assessment (5).
Active learners involved in their own learning need feedback in order to evaluate their
own progress and improve their learning. For this reason, teacher’s comments and self-
review logs, interviews, and so forth, are recommendable. Peer review of drafts, for
instance, is a suitable practice for providing students with informal feedback.
Collaborative approaches may help learners to learn together, revising, discussing,
sharing ideas, understanding different points of view or learning about new methods that
can be very helpful for them. It is also fundamental to allow students to practice or
rehearse so that they can improve, gain self-confidence or correct possible mistakes.
These opportunities can be generated by teachers during the teaching process. The
dominance of summative assessment should be reduced and balanced through a much
more qualitative assessment if AfL is the principal target of the whole process. Finally, it
is indispensable to design authentic tasks that assess what is really important for the
students to know instead of what is easier to score (6-7).
There is a huge range of practices that might be introduced in class in order to assess
for learning, all of which involve real-world contexts or practices beyond the academy
(ibid. 13), with students being able to detect problems, solve them and work cooperatively.
Consequently, teaching approaches must focus on independent thinking, problem-solving
ability, originality and teamwork skills. It should be explained to learners what they
are doing, why, and what the purpose is, so that they can understand the value and
significance of what they are learning.
Besides the understanding of the relevance of assessments, it is important that students
see the link to the real world and develop personal engagement. In Trowler and Trowler’s
words (cited in ibid.) “individual student engagement in educationally purposive
activities leads to more favourable educational outcomes” (15).
With regard to TEFL, Christopher J. Hall warns that there is “a tendency to identify the
language uniquely with a single ’standard‘ variety”, that is, to treat native-speaker
English as the only valid variety of English and, as a result, as “the sole reference
point for assessing the adequacy of non-native speaker forms” (378). If authentic
assessment is desired, Standard English cannot be the only accurate language allowed.
2.8. Instruments
Alternative evaluation encompasses a colossal and endless list of methods and techniques
which would be impossible to expound entirely. In the present work, an extensive range
of examples will be described.
It is essential to clarify that most alternative evaluation methods are grounded in
formative evaluation. Some authors state that alternative assessment must be applied
continuously in class because classroom-based assessment immediately informs teachers
and students, as well as parents, of student performance on an ongoing basis (Janisch et
al. 222). Learning a foreign language implies being able to produce and use the language
all the time, and not just in a specific exam. For this reason, students must be assessed in
their day-to-day use of the language in order to correctly measure how they perform in
authentic tasks. Dikli claims that “authentic assessment aims to relate the instruction to
the real-world experience of the learners” (14). Along the same lines, Wangsatorntanakhun
(cited in Dikli) states that the term performance-based assessment embraces both
alternative and authentic assessment.
2.8.1. Portfolio
According to Nigel Miller, a portfolio “contains great potential for developing and
demonstrating transferable skills as an ongoing process throughout the degree
programme” (8). The quantity of research, works and articles written on the portfolio and
its countless advantages and benefits is vast.
Paulson, Paulson, and Meyer (cited in Dikli) define portfolios as “a purposeful collection
of student work that exhibits the student’s efforts, progress, and achievements in one or
more areas” (14). The portfolio normally includes the entire body of work from the
student or a selection of the best pieces and any kind of reflection on it. This reflection is
based on the selection of works by the students and the reason why they have been chosen
or based on what the student has or has not learned through the use of any self-evaluation
tool. In the current technological era, this alternative method of assessment could be
improved by taking advantage of new technological tools. The internet provides the
student with numerous applications or websites specialised in creating online
portfolios, for instance, Eufolio. However, other tools and applications available on the
Internet could be used with an identical purpose. Students could develop all their
creativity in order to design and construct their own portfolios by building them on a
webquest or a wiki, or by utilising applications such as Lino-it, Wall wisher, Smore or
Kerpoof, among many others.
Dikli states that “the practicality of e-portfolio use is highly dependent on instructor’s as
well as learner’s knowledge of computer technology. The variety of information that
could be included to e-portfolios is infinite”. (15)
Among the great number of advantages of portfolios, the following ones could be
highlighted (Miller):
❖ Portfolios can be useful for students with work experience to claim credit for tasks
done at the workplace and to tailor work tasks in a way that promotes learning and
development. They can also be useful as a basis for interviews and promotion.
❖ A portfolio is usually a collection of work developed over time and may help the
student to think about what is being achieved in an ongoing way.
❖ Students have a degree of control over what goes into the portfolio.
❖ As evidence of a student’s achievements, a portfolio can foster confidence and a
sense of achievement.
❖ The process can foster dialogue with tutors and assessors.
As for the relationship between the portfolio and the evaluation process, a number of
authors state that, through a portfolio, teachers can assess the progress of students and
discover the process they have followed. The portfolio also allows assessment to be carried
out in a formative way, rather than through a final exam, while students become more
involved in the process. Finally, it shows the abilities students like the most instead of the
ones chosen by the teacher (Castillo and Cabrerizo 218).
In addition, through portfolios students become engaged in self-evaluation and set goals
for their learning. They are no longer “defenceless vessels waiting to be filled with facts”
(Wasserstain cited in Janisch et al. 14). Instead, they are “masters of their own learning
and sense making” (Graves 2002, cited in Janisch et al. 221).
Although a universal and unified model of portfolio does not exist, except for the one
proposed by the Council of Europe, which will be explained below, Castillo and Cabrerizo
(2010) recommend that the following aspects always be included: the subject's aims, the
competences to be acquired, a general scheme of work, daily planning, a register of key
experiences, evaluation criteria and guidelines, self-assessment guidelines and
conclusions (219).
On the other hand, portfolios might also entail some difficulties. Janisch, Liu, and Akrofi
discuss some of them: the principal one is the lack of motivation, self-initiative and
self-reliance of students, who are used to taking a passive role in learning, and it is
sometimes difficult to get them to be more responsive and participatory (227).
Other disadvantages of using portfolios could be the following:
❖ Requiring extra time to plan an assessment system and conduct the assessment.
❖ Gathering all of the necessary data and work samples can make portfolios bulky
and difficult to manage.
❖ Developing a systematic and deliberate management system is difficult, but this
step is necessary in order to make portfolios more than a random collection of
student work.
❖ Scoring portfolios involves the extensive use of subjective evaluation procedures,
such as rating scales and professional judgement, and this limits reliability.
❖ Scheduling individual portfolio conferences is difficult, and the length of each
conference may interfere with other instructional activities. (Venn 538, cited in
Patel 2)
2.8.1.1. European Portfolio of Languages
The European Portfolio of Languages (EPL) is the result of many projects and studies
developed by the Council of Europe in several EU countries aimed at setting up the basis
and common features for a standard portfolio valid in the whole territory. According to
Little et al., the
EPL “summarizes the owner’s linguistic identity and his or her experience of
learning and using second/foreign languages; it also provides space for the owner
periodically to record his or her self-assessment of overall second/foreign
language proficiency.” (7)
This portfolio consists of three obligatory parts: the Language Passport, the Language
Biography and the Dossier.
The European Portfolio was designed mainly to make students aware of their own
learning process and promote their autonomy and to provide evidence of the learner’s
competence, skills and ability in the target language (Little et al. 8).
The European Portfolio of Languages is closely related to the CEFR. As a result, the EPL
uses the framework's level scales for self-assessment. Moreover, the grid of levels can be
used by teachers who wish to assess their students' level in the different skills and to mark
the portfolio if they want to incorporate it into the evaluation process. After all, “the ELP
should be seen as a means of bringing the concerns, perspectives and emphases of the
CEFR down to the level of the learner in the language classroom” (Little et al. 10).
2.8.2. Oral presentations
Oral presentations are another frequent alternative method of assessment. Fortunately, the
communicative approach promoted by the European Union has begun to introduce oral
presentations into English lessons. Oral presentations often consist of a speech made by
the student on a topic, with or without additional aids such as presentation software.
Students are normally required to prepare their speeches in advance, which commonly
includes researching a certain theme and being able to produce a communicative speech.
Ryan and Cook point out the following reasons for introducing this method in the
assessment process:
❖ Testing cognitive skills
❖ Allowing students to demonstrate their ability to generate and synthesise new
ideas
❖ Giving them the opportunity to demonstrate what they have learnt in an analytical
way
❖ Turning the tertiary classroom into an active learning environment
❖ Giving students the chance to learn from their peers and to share their knowledge
with them. (para. 2)
They also stress that students “need to use the task as a valuable opportunity to develop
vital skills” (para. 5) and emphasise that the practice gained in oral presentations
enables students to understand, process and explain key topics, which constitutes
tremendously valuable practice for their future careers and professions.
Slagell (n.d.) warns about the disadvantages: oral presentations may require external judges
and much time to evaluate; they create a threatening situation that increases student anxiety
and may interfere with learning. Moreover, they involve complicated ethical issues, and the
feedback is often limited; assessment attempts to separate delivery from content, offers a false
sense of “objectivity” and suggests that effective speaking is a checklist of universal and
distinct specific behaviours. Finally, peer assessment is complicated: some peers are poor
listeners or rude respondents, there are temporal and administrative challenges, and
videotaping can heighten anxiety.
2.8.3. Journals
Journals are entries written by students in which they reflect on what they have done and
learned in the class or in their assignments. Journals can be written on a daily, weekly or
monthly basis, at the end of the lesson, or at home. On the Internet, a great number of
tools can be found to connect ICT with the lessons.
Richards and Renandya (cited in Abbas 32) state that journal keeping, being informal in
nature, enables a student to gain extensive writing practice. The main advantages of
journals are:
❖ It can be enjoyable since it gives the student free rein to write on any topic on
the spur of the moment.
❖ It offers students the privacy, freedom and safety to experiment and develop as a
writer.
❖ It contributes greatly to the humanistic approach to teaching and learning, an
example of which is the integration of values during the sharing sessions.
Lack of motivation would possibly be the most significant disadvantage of this method.
Students will not be particularly thrilled by this method and they may even consider it
boring and annoying. Moreover, it is not really an authentic task, and it only allows the
teacher to evaluate the writing skill. It could, however, be combined with another method.
2.8.4. Projects
Project-Based Learning, known as PBL, is a method that is currently in vogue within the
educational community. Project-Based Learning is a method which consists of a complex
task or question that students must accomplish through a designed product. Students must
work in teams in a cooperative way and need to plan the project, organise, research,
discuss, negotiate and share the results (Moss and Van Duzer 3). Gökhan Baş
states that “the need for education to adapt to a changing world is a primary reason that
project-based learning is increasingly popular” (2). Fried-Booth (cited in Moss and Van
Duzer) notes that PBL creates a bridge between the English spoken in class and real-life
English, and that it places students in situations which require the use of the language in
order to communicate, as well as the establishment of a trusting relationship among the
team members. Information gap activities, learn-to-learn activities, role playing,
interviews, research and planning are only some of the key concepts related to PBL (Baş 5).
In summary, it can be highlighted that Project-Based Learning is characterised by the
following principles (Moss and Van Duzer):
❖ Builds on previous work and integrates speaking, listening, reading and writing skills.
❖ Incorporates collaborative teamwork, problem solving and negotiating
❖ Requires learners to engage in independent work
❖ Challenges learners to use English outside the class
❖ Involves students in the planning process
❖ Leads to clear outcomes
❖ Incorporates self-evaluation, peer evaluation and teacher evaluation.
Other advantages that could be mentioned are that PBL allows all students to use the
learning method they prefer according to the type of intelligence they have. For instance,
a visual learner could create a poster to show the results, and a kinaesthetic learner might
prefer role-play or to give a presentation in order to share her/his results. PBL also
integrates all the curriculum areas and the social and cultural context easily and it is
adaptable to the skill levels of each of the students. In addition, learners develop good
learning habits and responsibilities and learn the contents (“know”) and how to use them
(“do”) at the same time.
Nevertheless, there are also some difficulties which may be considered as disadvantages
by some teachers, and that is the main reason why they do not introduce the method as
much as they should. These would include the following (Pozuelos, Rodríguez and Travé
14-15):
❖ The amount of workload that starting a project entails.
❖ Lack of resources and materials for developing the project.
❖ Rigid school timetables divided by subjects.
❖ Lack of practice or examples of PBL in the teacher’s environment.
❖ Climate of uncertainty the first time a project is carried out.
❖ Criticism from parents or other teachers at the school because of the innovative
character of the method.
❖ The apparent impossibility of covering all the contents established by the curriculum.
2.8.5. Interviews
Abbas observes that interviews “are one-on-one sessions between the learner and the
instructor” (31). Richards and Renandya (qtd. in ibid. 31) state that conferencing is an
effective means of teacher response to student writing. It is a form of oral teacher
feedback. A short 10 to 15-minute conference will enable the teacher to ask students about
certain parts of their writing which are problematic. In fact, Fitzpatrick et al. comment
that “helpful as they are, written documents cannot provide a complete or adequate basis
for describing the object of evaluation” (207). However, it is essential to highlight that
interviews are not ordinary conversations but conversations with very concrete
pedagogical purposes. These purposes may be, as already mentioned above, to clarify
writing parts or to get feedback from the student on the subject, the teaching, the contents,
etc. But they could also serve to assess the student’s competence. In EFL, interviews could
be extremely helpful in determining a learner’s speaking competence, but also in checking a
certain grammar point or the specific vocabulary of a theme, in assessing the listening
comprehension of a recording, or in verifying whether the student has understood a reading.
Castillo and Cabrerizo (376) state that interviews have three functions: diagnostic (its
goal is to identify problems or determine the level), collection of information (about the student’s
interests) and therapeutic (in order to put strategies of intervention into practice).
They also point out the advantages of the interview (ibid. 377):
- It allows the teacher to establish a closer relationship with the student.
- Interviews are adaptable to the student and his/her level.
- Interviews are flexible: the interviewer can adapt himself/herself to the
circumstances which can suddenly emerge.
- It provides a lot of information, both verbal and non-verbal.
- It is suitable for many students with special needs.
On the other hand, interviews also present a number of disadvantages:
- They are time-consuming, which is especially problematic with big groups.
- There could be problems with subjectivity.
- Interviewers need to be highly astute and perceptive.
2.8.6. Progressive assessment chart
There is a diverse range of tools which can be considered checklists. Some authors
distinguish between category systems, checklists and progressive assessment charts
(Castillo and Cabrerizo 362-364), whilst others conceive of them as different sorts of the same
tool or even mix them. In truth, all of them may be combined into one by the
teacher in order to adapt it to his or her concrete evaluation needs.
The difference between them lies in the way of measuring or assessing. The category
system collects comments written by the teacher on previously defined, concrete
categories. These categories must be exhaustive and mutually exclusive. Checklists, on the other
hand, are based not on comments but on yes-no answers to the different aspects defined:
the teacher must indicate whether or not an aspect is present in the student’s behaviour.
Finally, progressive assessment charts measure the different categories defined so that each
can receive a score, depending on how present or absent that aspect is.
2.8.7. Report
As Fitzpatrick, Sanders and Worthen remind us, the main purpose of evaluation reports
is to improve the programme. However, “evaluation reports serve many purposes” (376),
among which the following could be highlighted:
❖ To help the teacher to make curricular decisions.
❖ To change attitudes and behaviours of the students.
❖ To improve communication between the teacher and the learners.
❖ To involve students in their learning process.
❖ To draw students’ attention to their difficulties.
In summary, reports inform the learners and the teacher himself/herself about the
“findings and conclusions resulting from the collection, analysis and interpretation of
evaluation information” (ibid. 377). It is fundamental that the report contain relevant and
important content for the students. This implies that the content has “truth value and utility
value”: truth value refers to the quality of the information reported, and utility value
stands for its significance and pertinence.
Although reports are traditionally written on paper, there are multiple forms of presenting
them, and the teacher should consider the best delivery process. Among the possibilities,
the information may be reported through “brochures, audiotape reports or a slide
presentation” (ibid. 379).
As formative evaluation aims to help students in their learning process, and it also helps
the teachers to improve their teaching and curricular decisions, it is recommendable that
reports be delivered regularly.
2.8.8. Rubrics
Rubrics or grading scales are charts which allow the assessment of a task on the grounds
of certain criteria and an established scale. Since rubrics constitute a meaningful and relevant
part of the current thesis, they will be discussed at length in a separate chapter (Chapter
4).
2.9. Language Assessment Literacy
Language Assessment Literacy (LAL) is a fairly common concept in other countries, such
as the USA. Unfortunately, it is not yet particularly well known in Spain. As
stated in Herrera and Macías, “developing LAL among EFL teachers must be a necessary
component of foreign language teacher education programs” (303). Assessment literacy
implies that the teacher understands and knows the meaning of high- and low-quality
assessment. LAL implies the knowledge of certain skills and how to assess them, but also
the knowledge of how to measure language competence. In addition, it is necessary to be
conversant with certain related principles, such as validity, reliability, objectivity or
ethics. Fatalaki (78) expounds fifteen skills that foreign language teachers should acquire.
According to him, it is important for a foreign language teacher to learn how to control,
administer and score a test, how to interpret statistical and raw data, and how to detect poorly
functioning test items and confounding factors unrelated to the skill being assessed. He or she
should also distinguish between correlation and causation or between two or more data
sets. Furthermore, the acquisition of knowledge about the different measures of the
reliability and validity of a test is also an important point. A teacher must know how to
intervene suitably when students misbehave and understand measurement error and
confidence intervals. Communicating the results appropriately, behaving ethically and
showing a strong commitment to test improvement are also fundamental skills.
According to Herrera and Macías, a high level of LAL entails an appropriate
design of assessment, the selection of alternative assessment methods or techniques, the
analysis of the impact caused by standardised tests and a suitable connection between the
language teaching approach and the assessment practices. The introduction of LAL into
teacher training programmes could solve many problems related to or derived from the
evaluation process. Herrera and Macías initially suggest a “questionnaire to diagnose the
needs with regard to assessment literacy of not only EFL teachers but language teachers
in general” (304).
Chapter 3: The CEFR
The CEFR, as mentioned in chapters 1 and 2, was developed by the Council of Europe as
the result of twenty years of research, with the intention of providing a comprehensive and
transparent basis for the creation and design of language syllabi and curriculum
guidelines, together with the design of teaching and learning materials (textbooks,
worksheets and tasks, in short), and the assessment of foreign language competence.
As stated in the framework, assessment is used in the document “in the sense of the
assessment of the proficiency of the language user” (177). The CEFR provides the entire
educational community with Learning Standards for language teaching. In addition, the
Framework itself is a valuable resource that can be consulted by teachers to construct the
specifications of a task or the construction of test items. For those purposes, information
is provided in section 4.1: ‘the context of language use’ (domains, conditions and
constraints, mental context), section 4.6: ‘Texts’, and Chapter 7: ‘Tasks and their Role in
Language Teaching’ (179) as well as in section 5.2: Communicative language
competences.
On the other hand, the use of the framework as a provider of Learning Standards and
guidelines for the construction of tasks and test items is not its sole function. The CEFR
also deals with the assessment of languages in many other different manners. This is
because “the scales provide a source for the development of rating scales for the
assessment of the attainment of a particular learning objective and the descriptors may
assist in the formulation of criteria” (179). The descriptors, short texts which describe
what each reference level consists of, can be used by teachers for assessment and also
for students’ self-assessment. On the one hand,
the descriptors for communicative acts, for instance, may be particularly helpful for
feedback, as they can give the students an overall impression of their performance in a
task. Scales may also be a good tool for summative assessment since teachers can build
their grids or checklists on the grounds of the framework. As is well explained in the
document, a descriptor such as “Can ask for and provide personal information” might be
exploded into the implicit constituent parts “I can introduce myself; I can say where I live;
I can say my address in French […]” (180).
On the other hand, the descriptors of proficiency can also assist in self-assessment and in
the creation of tools, such as checklists, grids or examination rating scales for
performance assessment.
The Framework is also a practical instrument to “relate national and institutional
frameworks to each other, through the medium of the Common Framework” and to “map
the objectives of particular examinations and course modules using the categories and
levels of the scales” (182). This means that the Learning Standards used at a given
level in one institution are very similar to the ones used in another institution in a
completely different geographical location. It also implies that a student with a certified B2
level has the same competence in a language as any other citizen in the European Union.
Finally, the levels of the scales constitute a clear reference, stated by the Framework
itself, for the building of grading scales or rubrics with which to assess
language learners.
3.1. Common Reference Levels
Chapter 3 of the CEFR document deals with the six different levels which the Council of
Europe establishes for the knowledge of a language. These levels are the following: A1
(Breakthrough), A2 (Waystage), B1 (Threshold), B2 (Vantage), C1 (Effective
Operational Proficiency) and C2 (Mastery). Sometimes, the six levels are referred to using
basic or elementary (lower and upper) for the lowest two (A1 and A2), intermediate
(lower and high/upper) for the ones in the middle of the scale (B1 and B2), advanced (C1)
and proficiency (C2). As the framework states, the levels can be read from the highest to
the lowest, or the reverse. However, the levels are always presented from top to bottom
for consistency. Moreover, “each level should be taken to subsume the levels below it on
the scale. That is to say someone at B1 is considered also to be able to do whatever is stated
at A2, to be better than what is stated at A2” (36-37).
All the scales provided by the framework can be separated into three groups: user-
oriented, assessor-oriented and constructor-oriented scales. The Council of Europe
defines user-oriented scales as those which “report typical or likely behaviour of learners
at any given level” (37). These scales are always positively worded, and they state
holistically what the student can do in each level:
“…can understand simple English spoken slowly and carefully to him/her and
catch the main points in short, clear, simple messages and announcements”
(Eurocentres Certificate Scale of Language Proficiency, cited in CEFR 38).
Assessor-oriented scales deal with how well the student performs. They intend to guide
the scoring process and are usually negatively worded. Nevertheless, the Framework
encourages examiners and scale designers to avoid these negative statements and to use
descriptors that are as positive as possible by describing key features of good performance
examples. The Framework also highlights that those scales which contain more than 5
different criteria “have been argued to be less appropriate for assessment because
assessors tend to find it difficult to cope with more than 3-5 categories” (38). These scales
can be holistic or analytic, though the latter are commonly used to determine
the level (they are diagnosis-oriented).
Constructor-oriented scales “guide the construction of tests at appropriate levels.
Statements are expressed in terms of specific communication tasks the learner might be
asked to perform in tests” (39).
3.2. Learning Standards
The development of the CEFR has been the most ambitious attempt so far to establish
certain common Learning Standards in the European Community. As indicated by its own
name, the CEFR’s Learning Standards relate only to languages, unlike the American
Common Core Learning Standards. Nevertheless, it is best to begin with a definition of
what they are.
“Learning Standards are concise, written descriptions of what students are expected to
know and be able to do at a specific stage of their education” (The glossary of education
reform, para. 1). Despite describing what students should have learned after a course (i.e.,
the educational objectives, what they can do at a specific level of their learning
progression), they do not mention any specific teaching method or curricula. Learning
Standards are normally organised by subject and most of them are common in the
different regions or states of a country.
In the United States, the development of Learning Standards represented highly
significant changes in the American education system. They appeared around the late
1980s and early 1990s and, by that time, each State had developed its own Learning
Standards for each grade. Since then, the so-called “Standards-based Education Reform”
has progressively introduced changes in the school system through different acts. One of
the most important changes was the development of the Common Core Learning
Standards, in 2009, by selecting the most effective Learning Standards which were
already in use in different States across the country. For that purpose, the National
Governors Association (NGA) and the Council of Chief State School Officers (CCSSO)
took into consideration the comments and proposals of thousands of teachers, parents,
members of the school community and citizens concerned with the issue (The National
Governors’ Association Center for Best Practices [NGA Center] and the Council of Chief
State School Officers [CCSSO], n.d.). Although the Common Core Learning Standards
have been adopted by 42 States and some other territories, they have not been free of
controversy, and some of their aspects have been under scrutiny and have been the object
of debate. Some of the concerns critics have raised are:
- Whether what students learn must be decided by the federal government rather
than by the local communities and parents of each school.
- If the Learning Standards are the most important and appropriate for all the
States and schools.
- Whether or not they are prescriptive enough.
- Whether they truly represent learning progressions.
In Europe, the CEFR has linked the Learning Standards to the learning of languages. The
Council of Europe points out the Framework’s aim:
“it was designed to provide a transparent, coherent and comprehensive basis for
the elaboration of language syllabuses and curriculum guidelines, the design of
teaching and learning materials, and the assessment of foreign language
proficiency” (Council of Europe, para. 1).
The recommendations of the European Union through the CEFR resulted in significant
changes in the Spanish education system. The current education act, the Real Decreto
(Royal Decree) 1105/2014, of 26 December, linked to the Organic Law of Education,
known as the LOMCE, draws on the Framework guidelines. In fact, the Spanish
translation of Learning Standards, “estándares de aprendizaje”, is mentioned 374 times in
the text. The act points out that the Learning Standards provided must be taken as a
reference for the curriculum programme and the syllabus design (Real Decreto 1105/2014 171).
Those Learning Standards, as cited in the text, “permiten definir los resultados de
aprendizaje, y que concretan lo que el estudiante debe saber, comprender y saber hacer
en cada asignatura; deben ser observables, medibles y evaluables y permitir graduar el
rendimiento o logro alcanzado8” (172). In this case, the Learning Standards provided refer
not just to language subjects but to all of them. The influence of the CEFR on the Learning
Standards provided for the subject of English as a Foreign Language is clear, though. For
example, the descriptor proposed by the CEFR for B1 competence in overall listening
comprehension stands as follows: “Can understand straightforward factual information
about common everyday or job-related topics, identifying both general messages and
specific details, provided speech is clearly articulated in a generally familiar accent”
(CEFR 66). The same level learning standard in the same skill (Year 2 of the
Baccalaureate, B1, listening comprehension) in the Spanish act is as follows:
“Comprende instrucciones, anuncios, declaraciones y mensajes detallados, dados cara a
cara o por otros medios, sobre temas concretos, en lenguaje estándar y a velocidad
normal9” (Real Decreto 1105/2014 442).
The Spanish Learning Standard for the oral production in the same level “Hace
presentaciones de cierta duración sobre temas de su interés académico o relacionados con
su especialidad10” (Real Decreto 1105/2014 443) corresponds to the B1 descriptor of the
CEFR for addressing audiences: “Can give a prepared straightforward presentation on a
familiar topic within his/her field which is clear enough to be followed without difficulty
most of the time, and in which the main points are explained with reasonable precision”
(CEFR 60).
8 They allow us to define the learning results and specify what the student must know, understand and be able to do in each subject; they must be observable, measurable and assessable, and must allow the grading of performance and achievement.
9 He/she understands instructions, announcements, statements and detailed messages, given face to face or through other means, on concrete topics, in standard language and at normal speed.
10 He/she gives presentations of a certain length on topics of his/her academic interest or related to his/her specialty.
The two aforementioned Learning Standards are just two examples to illustrate the clear
connection between the Learning Standards given by the European Council through the
CEFR and the Learning Standards stated in the Spanish Royal Decree for the subject of
English as a Foreign Language. This connection can be found in all the Learning
Standards stated in the above mentioned Spanish Royal Decree.
3.3. Chapter 9: Assessment
The CEFR devotes an entire chapter to the assessment of foreign languages; more
specifically, to the use of the CEFR for assessment purposes. In this chapter, the
difference between assessment and evaluation is clarified. Furthermore, validity,
reliability and feasibility are three terms referred to as fundamental for any kind of
discussion in this area. Consequently, the framework can be a great help as the assessment
of a particular level should include the same criteria. Thus, if a learner passes a B1 exam
in one country, he/she should be able to pass a B1 exam in any other country. In the same
way, two different learners who have passed a B1 exam should have more or less the
same level. These two examples would prove that the exams were valid and reliable. As
the framework itself states, it can be used in order to specify the content of the tests, the
criteria to determine the attainment of an objective and for the description of levels of
proficiency, so that certain comparisons across different systems can be made. The tests
should also be feasible; i.e., they should be practical for the assessors. As the CEFR
provides assessors with a useful point of reference, achieving feasibility in their
examinations should be easier.
With regard to the use of the framework for the preparation of tests and examinations, it
provides the examiner with numerous criteria that can be consulted when designing a
task. Furthermore, it contains a “sampling of the relevant types of discourse” (178).
Different domains, conditions and constraints as well as mental context can be found in
the Framework. The appearance of examples for all the levels is also very useful for this
task.
The huge number of descriptors provided and classified according to level constitutes
a key source for the development of rating scales and checklists. Hence, the descriptors
of the communicative activities, which can be found in chapter four of this thesis, can be
used for the definition of a specification for the design of an assessment task. The scales
are also helpful for reporting the results. Finally, teachers can assess themselves or
use the scale to implement student self-assessment. For instance, they can create a
checklist or a type of grid for continuous assessment or for summative assessment at the
end of each lesson/unit or course. Nevertheless, if any changes are made, it is important
to keep the scales positively worded, since negative wording is a typical fault of such
scales (181).
The scales of the Common Reference Levels aim to facilitate comparison among systems.
In this regard, if the same descriptors are used in the examination, different tests can be
compared as well as the results of those tests, so that both national and institutional
frameworks can be related.
Chapter nine also includes a list of the different types of assessment. In particular, it
contains 26 kinds which are classified in pairs of opposites:
- Achievement assessment/ proficiency assessment
- Norm-Referencing (NR) / criterion referencing (CR)
- Mastery CR/ Continuum CR
- Continuous assessment/fixed point assessment
- Formative assessment/summative assessment
- Direct assessment/ indirect assessment
- Performance assessment/ knowledge assessment
- Subjective assessment/objective assessment
- Rating on a scale/ rating on a checklist
- Impression/ guided judgment
- Holistic/ analytic
- Series assessment /category assessment
- Assessment by others/ self-assessment
All the kinds of assessment cited above are explained in the framework. Most are already
defined in the “Types of evaluation” section of the current work. Due to the importance
that the distinction between holistic and analytic assessment will have for the current
thesis, special attention has been paid to the definitions and explanations provided by the
framework. According to this, it can be stressed that a holistic assessment implies a
“global synthetic judgment” (190), whereas in an analytic assessment, different aspects
are analysed separately. The Framework also clarifies the fact that the distinction can be
made in terms of what is assessed; e.g., whether it is a global category, such as “writing”,
or if the examiner needs to assign separate scores to all the aspects involved. Another
example could be how the result is calculated, either with holistic rating scale or with an
analytic grid.
3.4. The CEFR on rating scales or checklists
Advice on how to design and develop effective grading scales or checklists can be found
in chapter 9 and in the appendices. The recommendations contained there range from the
suitable number of criteria to the formulation of the descriptors.
Chapter 9 also warns of the importance of choosing a feasible assessment tool; for
instance, if a rubric is selected as the assessment tool, the possible categories that it
includes must be feasible. The framework emphasises that “more than 4 or 5
categories starts to cause cognitive overload” and that “7 categories is psychologically an
upper limit” (193). As a result, if the criteria considered relevant for the assessment
exceed that limit, these features ought to be combined and renamed under a broader
category. The framework itself exemplifies the process by providing 14 possible
categories which could be used in the assessment of oral competence (193):
➢ Turn-taking strategies
➢ Cooperating strategies
➢ Asking for clarification
➢ Fluency
➢ Flexibility
➢ Coherence
➢ Thematic development
➢ Precision
➢ Sociolinguistic competence
➢ General Range
➢ Vocabulary range
➢ Grammatical accuracy
➢ Vocabulary control
➢ Phonological control
To illustrate the reduction process, four different scales are presented. These scales show
how the 14 aforementioned criteria have been reduced to five or six categories which
encompass them all (194-195). One of the examples is the Cambridge Certificate in
Advanced English (CAE), which contains five different criteria. The criteria
encompassed under each one are specified in parentheses:
• Fluency (fluency)
• Accuracy and range (general range, vocabulary range, grammatical accuracy and
vocabulary control)
• Pronunciation (phonological control)
• Task achievement (Coherence and sociolinguistic appropriacy)
• Interactive communication (turn-taking strategies, co-operative strategies,
thematic development)
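The reduction just described can be sketched as a simple mapping. The snippet below is an illustrative sketch only (category names are normalised to lowercase, and Cambridge’s “sociolinguistic appropriacy” is equated with the CEFR’s “sociolinguistic competence”); it computes which of the original CEFR categories are left uncovered by the five CAE criteria:

```python
# Illustrative sketch of the CAE reduction: five broad criteria, each
# encompassing one or more of the CEFR's original oral-assessment categories.
cefr_categories = {
    "turn-taking strategies", "cooperating strategies",
    "asking for clarification", "fluency", "flexibility", "coherence",
    "thematic development", "precision", "sociolinguistic competence",
    "general range", "vocabulary range", "grammatical accuracy",
    "vocabulary control", "phonological control",
}

cae_criteria = {
    "fluency": {"fluency"},
    "accuracy and range": {"general range", "vocabulary range",
                           "grammatical accuracy", "vocabulary control"},
    "pronunciation": {"phonological control"},
    "task achievement": {"coherence", "sociolinguistic competence"},
    "interactive communication": {"turn-taking strategies",
                                  "cooperating strategies",
                                  "thematic development"},
}

# Which original categories are absorbed by the five criteria, and which not?
covered = set().union(*cae_criteria.values())
not_covered = cefr_categories - covered
print(sorted(not_covered))
```

In this sketch, asking for clarification, flexibility and precision are not explicitly encompassed by any of the five CAE criteria, which illustrates that reducing categories involves selection as well as merging.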
With regard to the formulation of descriptors, appendix A includes several specifications on the best way to state them. The first remark concerns positive formulation. Previous research on proficiency scales detected a tendency to formulate lower-level descriptors negatively. The framework recognises the difficulty in doing so, since “it is more difficult to formulate proficiency at low levels in terms of what the learner can do rather than in terms of what they can’t do” (205), but it also encourages efforts to reverse that tendency. Some examples are given; for instance, “can produce only formulaic utterances lists and enumerations” could be expressed as “produces and recognises a set of words and short phrases learnt by heart” (206) in an attempt to reformulate the descriptor positively.
Definiteness in the statements is also encouraged by the Council. Avoiding vagueness and describing concrete tasks are essential for achieving effectiveness. However, definiteness should not lead to the production of excessively long descriptors since, as the framework notes, a “descriptor which is longer than a two-clause sentence cannot realistically be referred to during the assessment process” (207). Brevity also supports the independence of descriptors that the framework desires. Moreover, descriptors must be clear and transparent so that both the examiner and the learner can fully understand what is expected in the assessment.
As for the ways in which descriptors of language proficiency can be assigned to different levels, three different methods are possible: intuitive, qualitative and quantitative methods. In the case of the last two, the process can start either from samples of performance or from the descriptors themselves. If intuitive methods are used to design the scale, there are three possible options:
1. An expert is in charge of developing the rubric or checklist.
2. A committee develops the grading scale. In this case, a small group of experts is in charge of developing the rubric, possibly producing drafts and working on them.
3. If an experiential principle is chosen, a committee develops the grid and afterwards systematic piloting and feedback can be implemented to check its effectiveness.
Selecting a qualitative method for the creation would involve “small workshops with
groups of informants and a qualitative rather than statistical interpretation of the
information obtained” (209). On the other hand, if the method is quantitative it would
“involve a considerable amount of statistical analysis and careful interpretation of the
results” (210).
3.5. Evaluation of competences
In the document “Assessment of Key Competences in initial education and training: Policy Guidance”, the European Commission defines a key competence as “a combination of knowledge, skills and attitudes appropriate to a specific context” (6). It is explained that these key competences encompass not only “traditional” competences, such as communication and digital competence, but also others such as learning to learn, cultural awareness and initiative.
The CEFR makes a distinction between general competences and communicative language competences. Declarative knowledge, skills and know-how, “existential” competence and the ability to learn are included among the general competences.
Declarative knowledge includes knowledge of the world, which embraces “locations, institutions and organisations, persons, objects, events, processes” (46) and “classes of entities and their properties and relations” (102); it also includes sociocultural knowledge (everyday living, interpersonal relations, values, social conventions and body language, among others) and intercultural awareness.
Social, living, professional and leisure skills form the skills and know-how competence. With regard to existential competence, this consists of “factors connected with their individual personalities, characterised by the attitudes, motivations, values, beliefs, cognitive styles and personality types which contribute to their personal identity” (105). Finally, the last general competence, the ability to learn, concerns study skills, heuristic skills and communication awareness.
Communicative language competences, defined by the CEFR as “those which empower a person to act using specifically linguistic means” (9), are classified into three types: linguistic, sociolinguistic and pragmatic. The first is formed by the “traditional” competences, such as the lexical, grammatical, semantic, phonological and orthographic ones. Sociolinguistic competence refers to the ability to use the language in its social dimension; this means knowledge of different formulas, including greeting, introduction, politeness and register, among others. Finally, pragmatic competence concerns the organisation of discourse, language functions, fluency and propositional precision.
All of these competences must be taught, practised and assessed in the classroom. The CEFR has eased the process by providing the educational community with different grading scales in which descriptors for the different levels of each competence are formulated. Therefore, it can be checked to what extent a student masters a particular competence, either general or linguistic, and whether their level is suitable for the course. It is also important to highlight the approach with which all these competences must be acquired: “an action-oriented one in so far as it views users and learners of a language primarily as ‘social agents’” (9).
Besides general and communicative competences, José Ángel del Pozo states that professional competence is also fundamental in current society, where education is frequently linked to the world of work. In this regard, del Pozo gathers the competences from the different approaches mentioned and classifies them into three broad types: basic competences, related to the previous knowledge that allows the student to enter the world of work; cross-curricular competences, related to social skills, teamwork and methodological skills; and specific competences, which refer to the specific and technical abilities required by a profession (16-17).
In chapter eight of the CEFR, two other competences are defined: plurilingual and pluricultural competence. These are defined as the “ability to use languages for the purposes of communication and to take part in intercultural interaction, where a person, viewed as a social agent has proficiency, of varying degrees, in several languages and experience of several cultures” (168). The plurilingual competence promoted by the European Commission has been introduced slowly into the Spanish education system. In recent years, plurilingual schools have become commonplace in Spain. In those schools, English is taught not just in the subject of English but also in other subjects, such as music, P.E., biology, arts or mathematics. However, this does not fit the CEFR’s definition of the concept of “plurilingualism”; it is in fact linked to another educational movement, also currently in vogue: Content and Language Integrated Learning, much better known by its acronym CLIL. The main idea is to learn a language through the learning of content from a different subject area.
The role of the Framework is intended to be open and dynamic, transparent, comprehensive and coherent but non-dogmatic. As a result, it does not support any particular language teaching method and it does not position itself in any current dispute on language education (CEFR 18). This does not imply that the CEFR has no impact on education policies. On the contrary, it
“will enable them [users] to approach public examination syllabuses in a more
insightful and critical manner, raising their expectations of what information
examining bodies should provide concerning the objectives, content, criteria and
procedures for qualifying examinations at national and international level” (20).
Thus, for instance, if the Framework’s descriptions of the levels for different competences are taken into consideration and selected as learning objectives, this will lead to particular content choices in the curriculum and the syllabus.
Concerning assessment, the CEFR devotes an entire chapter (chapter 9) to this important subject. This chapter provides the education community with an extensive, though not exhaustive, list of different types of assessment, as the document itself clarifies. The CEFR can be taken as a reference or resource for assessment in multiple ways. One of them is the specification of the content for examinations or of the criteria for the attainment of a language objective. Furthermore, the descriptors may be used to construct tasks, to give feedback, and also for self- or teacher-assessment, by using them as a sort of checklist or grid.
The European Commission intends to promote the key competences in order to achieve certain general objectives. These aims are meant to reduce early school drop-out, to increase participation in early childhood education, to provide better support to teachers and to provide students with high-quality learning based on meaningful and relevant curricula.
Chapter 4: RUBRICS
The use of rubrics is becoming increasingly popular within the educational community. As opposed to traditional evaluation tools, which normally measure the memorised knowledge and cognitive skills of the learner, rubrics are assessment tools which measure the performance of the learner in a standardised way. K. Bujan (75) argues that rubrics are being used in order to grant traditional qualifications a more authentic or real value.
Rubrics are similar to templates, file cards or forms which are used as a guide to assess specific activities (Castillo and Cabrerizo 405). It is essential to highlight that rubrics allow the teacher to assess not just intellectual skills, such as critical thinking, analysis, opinion or creativity, but also the learner’s attitude (Bujan 75). One of the strongest points of using rubrics as an evaluation or assessment tool is the fact that they work as a guide for both teachers and students. For this reason, it is highly advisable to allow students to consult the rubrics the teacher uses before being assessed. Knowing what aspects are going to be measured and what distinguishes one descriptor from another can help learners in several ways: they know exactly what is going to be assessed and under which criteria, they gain a better understanding of the evaluation and assessment processes, and they become more involved in them. Furthermore, it could also be advisable at a certain point to allow students to create a rubric, as this could help them to understand how it works and to reflect on the assessment process. Students could also attempt to self-assess using the rubric, arguing for and defending their judgements.
4.1. Definition
Rubrics can be defined and understood in slightly different ways. Brookhart defines rubrics as “a coherent set of criteria for students' work that includes descriptions of levels of performance quality on the criteria” (4, cited in Wang). Melissa D. Henning gives a definition along the same lines: “a set of scoring guidelines that evaluate students' work and provide a clear teaching directive”. As those definitions state, rubrics are mainly descriptive and not evaluative. Heidi Andrade argues that rubrics “are often used to grade student work but they can serve another, more important, role as well” (1). According to the Berkeley Center for Teaching and Learning, rubrics have three functions: they set out the criteria students must achieve for a task, they act as an indicator of quality which students should know and follow in order to pass the task, and they serve as a scoring tool. However, rubrics are also highlighted not only as a summative or formative tool but also as a teaching tool which benefits teachers, students, and the entire teaching-learning process. Henning explains the process very clearly:
[Rubrics] “convey the teacher's expectations and they provide students with a
concrete print out or electronic file showing what they need to do for the specific
project. Typically, a teacher provides the rubric to students before an assignment
begins, so students can use the rubric as a working guide to success.”
Furthermore, they help the teacher during the assessment as they provide them with a
complete range of criteria and goals in different aspects, not just grammatical. They also
contain curriculum goals and standards. In addition, they enable students to understand
their scores by comparing them with the rubric used. Self-assessment and peer-assessment may also be encouraged by the employment of rubrics.
4.2. Why use rubrics?
Research on rubrics in Spain, although scarce, underlines the necessity of using rubrics in line with the methodological changes framed by the Council of Europe. The establishment of the CEFR and the changes produced in post-secondary education by the Bologna Process, together with the enactment of recent educational legislation, have shifted the focus within the teaching-learning process. The teacher is no longer the centre of attention, nor is theoretical knowledge the aim of the process. Instead, pupils are now the main focus: students are guided by a teacher who facilitates their own learning while they work on different skills. The final objective is to ensure that students are prepared for the labour market and to ease their integration into the workforce.
These significant changes in the teaching-learning process entail an evaluation process with a different aim: not only the assessment of knowledge, but also assessment as an important tool to improve teaching. It demands student participation through self-assessment and peer assessment, as well as teacher feedback.
Rubrics may be a very useful tool to adapt the teaching-learning process to the recent demands of the system. Cano gives several reasons why teachers should support the use of rubrics:
a) Because of their formative rather than summative value. Even though they are increasingly used for grading nowadays, they were first associated with qualitative feedback.
b) They may guide the learning process. They are useful for teachers, as they make sure that their teaching is in harmony with the criteria they are going to apply to assess the students. Moreover, students can know what is expected of them and may also learn what points they need to improve after the assessment.
c) Because of their constructive value. The participation of students in the creation of the rubric allows greater involvement in their learning process and helps them to learn to learn.
d) Because of the monitoring of long-term development. Rubrics allow the development of students over different courses to be checked.
e) Because of the scientific evidence. There is a great deal of scientific evidence on the benefits and validity of rubrics. (269-270)
The Center for Advanced Research on Language Acquisition (CARLA) offers up to twelve arguments on why the educational community needs rubrics. Some of them are very similar to the ones mentioned above, such as “helps instructors to clarify goals and improve teaching” or “learners can develop ability to judge quality, own work and peer’s work”, while others add further arguments in favour of the use of rubrics. For instance, using rubrics can “answer parents’ questions”, as parents can see what students must achieve and how they have performed; it reduces the time spent evaluating performance and giving feedback; it aligns evaluation criteria with standards, curriculum, instruction and assessment tasks; and it increases “reliability, implies consistency and objectivity”.
Positive outcomes with regard to rubrics have been found in several studies. A case study conducted at the Universities of Granada and Vigo (Gallego Arrufat and Raposo-Rivas) concluded that, after using the rubric during a whole term in one subject, students thought that its use increased their motivation and boosted cooperative work. These opinions were gathered through a survey based on a Likert scale (211). Another case study, carried out by Verano-Tacoronte et al., designed a rubric based on the specific literature reviewed. This rubric was validated by a panel of experts and later used to assess undergraduate students. Students had access to the rubric in order to prepare their presentations in pairs. They were assessed by a team of teachers using the rubric, and the results showed high reliability, as the scores given by the teachers were fairly similar.
Some authors have investigated rubrics by analysing research conducted by other researchers on the issue. For instance, Jonsson and Svingby reviewed seventy-five scientific field studies on the reliability and validity of rubrics and concluded that scoring is more consistent when rubrics are used, and that both the reliability and the validity of the assessment process increase when they are employed as an assessment tool. Panadero and Jonsson analysed twenty-one studies, finding that rubrics provide transparency to the assessment, reduce anxiety, aid feedback, and help to improve students’ self-efficacy and self-regulation.
Nevertheless, many authors have also noted drawbacks. Verano-Tacoronte et al. warn of the scarce training teachers still receive (43). Other researchers are also critical of the use of rubrics, although they do not discourage it. Reddy and Andrade criticise the carelessness of some studies about the validity of their rubrics, since they describe neither the rubric’s development nor its content validity (cited in Cano 273). Panadero, Alonso-Tapia and Huertas concluded that, in spite of improving feedback, rubrics do not boost learning by themselves (cited in Cano 270-274).
Furthermore, a study carried out by Velasco-Martínez and Tójar-Hurtado and published in Investigar con y para la sociedad attempted to ascertain to what extent teachers are using rubrics to assess competences. With this in mind, they analysed 150 different rubrics used by teachers at different universities in Catalonia, Aragón, Galicia, Extremadura and Castilla y León. Among the results obtained, it was discovered that the branch of social and legal sciences is the one in which rubrics are most used (34%), as opposed to arts and humanities (only 4%). Another finding was that rubrics were mainly used to assess essay writing (36%) and hardly ever used to assess visual or graphic resources (2.7%), which implies a traditional conceptualisation of knowledge as something that can be memorised. In addition to these findings, the authors also provide the educational community with other interesting data they gathered, such as the teaching methodology the participating teachers usually apply in their lessons. These data show that the teacher-centred lecture is still the most used methodology (36.7%), while other innovative methodologies have no significant presence in those universities; for example, only 1.3% of the respondents use portfolios and 6.7% use case studies with their students (1396-1400).
4.3. Historical Overview. Rubrics in Education
The word rubric dates back to the Middle Ages. Popham clarifies that the term was used by Christian monks, who spent their time in monasteries copying scriptural literature in Latin. They frequently used large red letters in order to mark the beginning of each major section. The Latin modifier for these red materials was “rubric”, so the word came to name the label of a section and, by extension, a category (6). Ayhan and Türkyılmaz also mention that “rubric” was “once used to signify the highlights of legal decision as well as the dictations for conducting religious services” (82).
Gavin Brooks argues that rubrics were introduced in the L1 classroom in order to assess students’ writing (229). Until that moment, writing was scored on the basis of the teacher’s own criteria; teachers had to come up with a mark without any specific guidelines to support their decisions. According to Brooks, rubrics “were first proposed as a tool to analyse writing in 1912 when Noyes suggested the use of a rubric as a means of standardizing the evaluation of student compositions” (230). Noyes thought students were too subjective when they scored and decided to create a rubric to grant more objectivity to the assessment process (cited in Gardner, Powell and Widmann 2). At that time, the purpose of rubrics was simply the assessment of students by means of an objective scale, not the improvement of their writing. In the same year, Milo B. Hillegas created A Scale
the improvement of students’ writing. In the same year, Milo B. Hillegas created A Scale
for the Measurement of Quality in English Composition by young people, which would
be known as the Hillegas Scale. As Turley and Gallagher commented, this scale created
by a professor of Columbia University College “offered a scientific way to quantify the
quality of student compositions fashioned on a statistical model of normal distribution”
(88). The Hillegas scale was used by many schools in the United States. Edward
Thorndike, who had been Hillegas’ professor at Columbia, improved his scale in 1915,
by “substituting new specimens for certain of the original samples and by including
several examples in the steps at or near the middle of the scale” (Hudelson, cited in Brooks
230). Noyes, Hillegas and Thorndike early rubrics were used to compare and rank
different schools all over the United States and some school headmasters even used them
118
in order to assess their teachers (Turley and Gallagher 88). That was probably the first
step towards using rubrics not only for the assessment of students but also for the
“assessment” of teachers. In the University of Detroit (Michigan), S. A. Courtis,
supervisor of educational research in the 1910s, used the scales in order to assess the
effectiveness of teaching methods. The teachers had to use the scale in their class so that
he could supervise the teacher’s performance by comparing the writing scores of the
students. In this way, he could check if the scores were decreasing or increasing. In the
event there was a decrease, the supervisor would intervene and work with the teacher to
improve the results. (ibid. 89)
Rubrics are still used nowadays all over the world. While in Spain they are only beginning to be used as an assessment tool, and occasionally as a teaching tool, in the United States their use is widespread. As explained above, they were initially used for assessment purposes, but after the 1970s they began to be used to provide students with feedback (Brooks 230). However, they are still used in order to make comparisons between schools, and their effectiveness as a tool to improve students’ writing has been questioned by many American scholars in several academic journals. On the other hand, some scholars strongly support the use of rubrics.
4.4. Types
There is a large variety of rubric types, since different criteria can be used to classify them. Hence, according to the criteria we choose, we can allocate the grading scales under different labels. In the present thesis, only the most common types are mentioned:
a) According to how performance is measured:
❖ Holistic or global: the different parts are not measured separately but together, as a whole. In this way, performance is compared to the criteria in general. Popham notes that holistic scoring makes one overall, holistic judgement and that there is a simple reason for using these rubrics: holistic scoring “saves tons of time”. Ayhan and Uğur define them as follows: “raters judge by forming an overall impression of learners´ performance and matching it to best fitting column on the scale” (88). They also explain that each of the scales describes performance in relation to several criteria, such as grammar, vocabulary or fluency; these rubrics usually consist of 5 or 6 dimensions. Some of the positive aspects of this kind of rubric are that they are time-saving, they can be used for different tasks, they are easier for children to understand, and they focus on what students are able to do rather than on what they cannot. On the other hand, the feedback provided is non-specific, and students may meet the criteria in some respects and not in others, so it may be difficult to place them at a single level.
- Primary trait: this kind of holistic rubric focuses on only one individual characteristic.
❖ Analytic or partial: the performance of the student is compared to each of the criteria established, separately. Sometimes, the separate marks can be added together in order to obtain a total mark. Analytic rubrics, in Popham’s words, “supply diagnostic data of considerable utility” (17). Furthermore, they can show students’ progress in different aspects as well as point out their specific strengths and weaknesses (Ayhan and Uğur). On the other hand, they are more difficult to design, more difficult to use and more time-consuming.
- Multiple trait: rubrics of this kind are very similar to analytic ones and the terms are often used interchangeably. However, Ayhan and Uğur explain that the difference lies in the fact that “analytic rubrics evaluate more traditional and generic dimensions of language production, while multiple trait rubrics focus on specific features of performance” (89).
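The contrast between holistic and analytic scoring can be illustrated with a minimal sketch; the criterion names and marks below are invented for the example, not taken from any official scale:

```python
# Hypothetical sketch: an analytic rubric assigns one mark per criterion and
# derives a total, whereas a holistic rubric records one overall judgement.
analytic_scores = {   # one mark (0-5) per criterion; values are invented
    "grammar": 4,
    "vocabulary": 3,
    "fluency": 5,
    "coherence": 4,
    "pronunciation": 4,
}

# Analytic result: the separate marks are combined (here, averaged).
analytic_total = sum(analytic_scores.values()) / len(analytic_scores)

# Holistic result: a single overall impression on the same 0-5 band.
holistic_score = 4

print(f"analytic: {analytic_total:.1f}, holistic: {holistic_score}")
```

The analytic version keeps the per-criterion marks, so it can report strengths and weaknesses separately; the holistic version discards that detail in exchange for speed.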
b) According to the scoring type:
❖ Quantitative: when the rubric has been composed to give a numeric mark.
❖ Qualitative: the qualification of the rubric is not numeric.
- Words: the different aspects are assessed with comments such as good,
excellent, poor, etc.
- Alphabetic: when the rubric is designed to provide a result in the form of a letter: A, B, C, D, E.
- Graphic: the results of this sort of rubric are presented through squares or graphic symbols.
- Symbols: the results are shown by a picture or sign, such as an emoji or smiley. They are especially useful with young learners.
❖ Mixed: a combination of more than one of the above-mentioned types.
c) According to their theme (Goldin 8-14)
❖ Domain-independent: this kind of rubric consists of criteria which apply to any theme, area or skill.
❖ Domain-relevant: the opposite of the above; they are linked to an area and therefore use the vocabulary and terminology of that domain.
❖ Problem-specific: these are used not for a specific task but to assess how students address and solve a particular problem which they may face in their future professional careers.
❖ Open-ended problem: very similar to the previous type but, in this case, the situation the students must face requires them to think more freely, as there is no single way of performing well.
d) According to their application
❖ Hypergeneral rubrics: they are, as Popham puts it, “excessively inexact scoring guides” (19). They contain excessively vague criteria, dimensions and descriptors which may be applied to any kind of task, any skill and any domain.
❖ Task-specific: this kind of rubric is built solely for the assessment of one particular activity. The main disadvantage is that they are “essentially worthless … because they deal with a student’s response to a particular task” (21).
❖ Skill-focused rubrics: as the name indicates, these rubrics assess one skill: speaking, writing, listening or reading. If they are well built, they are very useful.
e) According to their function
❖ Proficiency/diagnostic: their goal is to determine the learner’s level. They may be used to place a student at a particular level or to certify his or her level in order to obtain a certificate.
❖ Achievement: this kind of rubric attempts to reveal to what extent the learner has achieved the course goals or how he or she has performed in a task.
f) According to the scorer
❖ Peer review/co-assessment: rubrics used by learners in order to assess their classmates. They can assess another learner’s work or even their colleagues’ work as part of a team.
❖ Teacher: rubrics used by the teacher to assess a student. Even though the teacher is the one using them, it is highly recommended that he or she share them with the students and explain them before the assessment.
❖ Self-assessment: rubrics used by the students in order to assess their own work.
g) According to the channel
❖ Paper: traditional rubrics printed on paper
❖ IRubric: online rubrics, shared online with all the students. Some types of software allow the teacher to provide feedback based on their assessment with the rubric.
h) According to the type of scale used (Marín-García et al. 51-54); some of these scales are specifically designed for assessing behaviour:
❖ Rating scales: the descriptions of the different dimensions are numbers. They are easy and quick to use, but they may lead to doubts about their interpretation, the feedback is vague, and they present validity and reliability problems.
❖ BARS (behaviourally anchored rating scales): this type of scale is used to assess behaviour. Key aspects of behaviour are ordered from the least to the most efficient. They are very clear and more reliable and objective, but they are difficult to create.
❖ BOS (behavioural observation scales): they gather frequency data from behaviour observations. They are difficult to apply, but they assess every dimension separately.
❖ Paired comparison: two students are assessed at the same time. The rater decides which of them is better in each of the dimensions. It is very fast and quite reliable, but it entails some ethical issues.
4.5. Parts of a rubric
A rubric traditionally consists of a grid with multiple cells. The names given to the different parts of a rubric vary from one author to another, but the parts mentioned are basically the same even if the terms are not identical. Rubrics may have only two sections (if they are holistic) or four (when they are analytic).
When the rubric is composed of two main sections, there is a vertical column and a horizontal row: one corresponds to the language descriptions and the other to the scores. The two sections are frequently merged into a single column in which each cell contains both the score and the language description of that level.
Rubrics which have four sections normally contain a task description at the top. They usually have many columns and cells. Either the left vertical column or the first horizontal line may be called the scale, also known as the scores or performance levels. These scale levels may be numbers, but most often they are expressions which indicate the level of achievement (Excellent, Good, Poor, etc.). If the scale is placed on the horizontal line, then the first column on the left will contain the dimensions, also called criteria. These indicate what is being measured (grammar, vocabulary, cohesion, coherence, fluency, clarity, visual contact, etc.). Finally, the remaining cells in the grid form the descriptors of the dimensions or of the performance; i.e. they describe what is expected of a given dimension or category at a determined scale level. Descriptors of the dimensions might contain many qualifying adjectives and even examples of performance. García-Sanz mentions that the Structure of the Observed Learning Outcomes (SOLO) taxonomy, created by Biggs and Collis in 1982, is based on the progress from incompetence to competence in order to create the achievement scale (6). Atherton summarizes its five stages as follows:
“Pre-structural: here students are simply acquiring bits of unconnected
information, which have no organisation and make no sense. Unistructural: simple
and obvious connections are made, but their significance is not grasped. Multi-
structural: a number of connections may be made, but the meta-connections
between them are missed, as is their significance for the whole. Relational level:
the student is now able to appreciate the significance of the parts in relation to the
whole. At the extended abstract level, the student is making connections not only
within the given subject area, but also beyond it, able to generalise and transfer
the principles and ideas underlying the specific instance.” (cited in García-Sanz
93)
These stages can ease teachers' design of their own rubrics: they need only take them
into account to establish the performance scale and the descriptions of the different
dimensions.
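The analytic rubric structure described above can also be sketched as a simple data structure, purely for illustration; the dimensions, scale labels and descriptors below are invented examples, not taken from any certificate or author discussed in this thesis:

```python
# A minimal sketch of an analytic rubric: a scale of performance levels,
# a set of dimensions (criteria), and one descriptor per dimension per level.
# All labels and descriptor texts here are hypothetical.

RUBRIC = {
    "task": "Oral presentation (B1)",
    "scale": ["Poor", "Good", "Excellent"],  # performance levels
    "dimensions": {                          # criteria being measured
        "Grammar":  ["Frequent basic errors", "Occasional errors", "Consistently accurate"],
        "Fluency":  ["Long hesitations", "Some hesitation", "Speaks smoothly"],
        "Cohesion": ["Ideas disconnected", "Simple connectors", "Varied connectors"],
    },
}

def descriptor(rubric, dimension, level):
    """Return the descriptor for a given dimension at a given scale level."""
    idx = rubric["scale"].index(level)
    return rubric["dimensions"][dimension][idx]
```

In this representation, each cell of the grid is recovered by crossing a dimension with a scale level, e.g. `descriptor(RUBRIC, "Fluency", "Good")`.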
4.6. Advantages and disadvantages
The main advantages of the use of a rubric are the following (Bujan 77; Castillo and
Cabrerizo):
❖ The assessment is more objective, standardised and consistent, especially when a
practical performance such as a speaking or writing task is being measured.
❖ The teacher clarifies the assessment criteria very specifically.
❖ It adjusts the assessment more closely to the criteria.
❖ Rubrics provide useful feedback on the educational process.
❖ They allow the teacher to supply documentary evidence of the students' progress.
❖ Rubrics ease students’ understanding of the results.
❖ They allow students to focus on what is expected from them and to review or adapt
their performance to what is expected. (407)
José Ángel del Pozo also adds:
❖ Rubrics indicate clearly the strengths and weaknesses of the students.
❖ They boost students' responsibility for their own learning. (59-60)
Moreover, rubrics can be created by the students, by the teacher, or by another person,
and they can be used for a great variety of purposes (Castillo and Cabrerizo 207):
self-assessment, peer assessment, and the assessment of teachers, oral speeches, written
productions, individual or group essays, and portfolios.
In this regard, Raposo-Rivas and Martínez-Figueira clarify that peer assessment has
multiple advantages: it is an incentive for students, it allows them to develop
interpersonal strategies and social abilities, and it helps them to develop professional
skills and reflective, critical thinking. For such an important and enriching practice,
rubrics are an invaluable tool, as they enable students to assess their peers in an
objective way (99-200).
Nevertheless, there may also be certain disadvantages. The CARLA mentions that the
“information provided by primary trait rubrics is limited and may not easily translate into
grades”. Besides, “task-specific rubrics cannot be applied to other tasks without
adaptation of at least one or more dimensions.” (cited in Ayhan and Uğur 90). Elena Cano
provides the educational community with reasons against the use or overuse of rubrics.
She specifies that attitudes may not always be correctly measured with rubrics, as they
require long-term assessment. Furthermore, she disagrees with task-oriented rubrics
because they are too time-consuming, and she warns of the difficulty of creating a truly
valid rubric. She supports her argument by indicating that many non-valid rubrics are
available which may be harmful to assessment if used, while other tools, such as
checklists, interactive feedback or portfolios, may serve the same purpose. Tunner
(cited in Goldin 2011) supports this idea
by saying “while there seems to be a general consensus that rubrics are important and that
they improve the peer review activity, there is not as much agreement on how they should
be implemented” (22). There is no standardised, shared conception of the term rubric
within the educational community, either in the design process or in its applications,
and this is a clear disadvantage for its common use (Castillo Tabares et al. 75-76).
Research carried out on the use of a rubric to assess teamwork at the University of
Lleida (París et al.) concluded that the rubrics used were not very flexible or
adaptable to different circumstances, that no alternatives were proposed, and that there
were difficulties in coordination among team members. However, the authors highlighted
the students' involvement in the goals, tasks and achievements, together with a strong
feeling of integration in their own teams (95).
Similarly, White and Winkwort state that rubrics “connect key types of partnership
building […] with 3 key drivers that enable partnerships to grow” (5). Those key drivers
are shared commitment, the capacity to sustain collaboration, and a common vision of
what can be achieved through it.
4.7. How to build a rubric
There are several steps which must be followed in order to design or create a rubric from
scratch. José Ángel del Pozo points out that determining what is going to be measured is
the first step. Once the skill is decided, the task can be designed or chosen. Afterwards,
the type of rubric which will be used for scoring must be selected. The type of rubric
employed must be chosen on the grounds of the task and context (number of students,
space, etc.). When creating the performance levels, it is important to start by writing
only three: the maximum, the minimum and an intermediate one. Once those levels have
been written, they are used as a reference to draft the remaining ones. The
following step is writing the description of each performance level. Only what can be
observed in the learner must appear, and it is advisable to look for an “excellent”
model to identify which characteristics define a good piece of work. It is important to
avoid ambiguous words and to express the performance criteria in terms of observable
student behaviours or product characteristics. If possible, the different criteria to be
assessed should be placed in the “order” in which they are likely to be observed, to ease the scoring
task. Once the rubric has been designed, it is convenient to check whether it is really
useful. We could use a rubric to evaluate the rubric created, such as the one mentioned
above, or we could ask another teacher or scorer to check it. It is also advisable to
involve the learners in the process, i.e., by explaining and showing them the rubric,
and even allowing them to make suggestions and small changes. If the task is oral, the
rubric has to be used immediately after the performance. Furthermore, a good way of
improving the teaching-learning process is to ask the students to evaluate their own
work or their classmates' work. This way the teacher can compare his or her own score
with the one given by the learner and detect possible problems and misunderstandings. It
would also be interesting to conduct an interview with the learners so that they can
exchange impressions and give fully detailed feedback which helps both the student and
the teacher.
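Del Pozo's advice on drafting the performance levels can be illustrated with a minimal sketch: fix the minimum and maximum scores first, then interpolate the remaining levels evenly between them. The level names and the 0–10 score range below are hypothetical choices for the example, not prescribed by any author cited here:

```python
# A sketch of turning performance levels into evenly spaced scoring bands:
# the extremes anchor the scale, the remaining levels are interpolated.

def score_bands(minimum, maximum, n_levels):
    """Return n_levels evenly spaced scores from minimum to maximum."""
    step = (maximum - minimum) / (n_levels - 1)
    return [round(minimum + i * step, 2) for i in range(n_levels)]

levels = ["Poor", "Fair", "Good", "Very good", "Excellent"]
bands = dict(zip(levels, score_bands(0, 10, len(levels))))
# e.g. {"Poor": 0.0, "Fair": 2.5, "Good": 5.0, "Very good": 7.5, "Excellent": 10.0}
```

The same approach works for any number of intermediate levels, which is useful when the three anchor descriptions (maximum, minimum, intermediate) have been written and the rest of the scale must be filled in.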
4.8. Online Tools for building a rubric
The following section analyses different tools available on the Internet for building a
rubric. It briefly describes their appearance and how they work, as well as their main
advantages and flaws.
Annenberg Learner is a tool created by the Annenberg Foundation, which promotes
excellent teaching in American schools and offers different multimedia resources to help
teachers improve their teaching methods. It is basically a simple resource for building
rubrics.
The web design is very plain, not over-elaborate, but easy to navigate through. The rubric
can be created in just seven steps: first, a title must be given; secondly, it is possible to
choose the format, either in form of table or a list. The third step deals with the scale.
Users can select one of six different scales, one of which is numerical. Next, the order of
the scale must be decided, from worst to best, or vice versa. The fifth step refers to the
assessment criteria; there are twenty to choose from and the user can include as many of
them as he or she wishes. Step six gives the chance to put the assessment criteria in the
order wished. The final step provides the opportunity to make amendments in any of the
previous steps and to generate a PDF with the created rubric. The strengths of this tool
are its simplicity, its speed and the fact that registration is not required.
Fig. 1. Annenberg Learner. Rubric creator. Screenshot.
Essay Tagger Common Core Rubric Creation Tool is also a rubric creator with a simple
design and use. The creation starts with the selection of the target grade level, which is
an indicator of quality, as it takes into account the target audience. Its handicap is
the fact that the grade levels are solely based on the American school system, which is
a problem for Spanish users. The second step of the rubric creation is the selection of
the standards that will be applied. Since the tool is built around the US system, the
standards that appear are the American Common Core State Standards. However, this
should not be a problem, as most of the ones referring to speaking, listening, writing and
reading can be easily applied to the standards of the CEFR. For instance, “choose
language that expresses ideas precisely and concisely, recognizing and eliminating
wordiness and redundancy” (Common Core Standards, Language, Knowledge of
Language, as cited in Essay Tagger Common Core Rubric Creation). This resource also
allows the user to specify if the CC Standards will be applied to sentence, paragraph, or
whole document level. Finally, a title must be written, and an email address provided.
This way, the user will receive a link to the rubric created. Furthermore, some more rubric
elements which are not linked to the CC Standards can be edited once the rubric is created.
This tool is, as has already been said, not targeted at a Spanish context. Nevertheless,
it is a useful model for creating an equivalent tool in which the standard outcomes of
the Spanish educational law could be included.
Fig. 2. EssayTagger.com. Essay Tagger Common Core Rubric Creation Tool. Screenshot
iRubric is a really interesting tool created by RCampus (Reazon Systems Inc.), a
comprehensive education management system and collaborative learning environment.
Besides providing teachers with the possibility of creating a class
(attendance lists, score lists, schedules, sending messages to students, posting notes, etc.), it
allows them to look for a rubric out of the hundreds which other people have shared, to
create a rubric from scratch, or even to edit one already made by other teachers. It also
provides the teacher with the opportunity of sharing the rubric with the class in order
to assess in a collaborative manner. RCampus is free for the educational community, but
it does require subscription.
Fig. 3. Irubric by RCampus. Screenshot
Rubistar is probably one of the most famous online tools for creating rubrics. It was
created by ALTEC at the University of Kansas. The rubric is created in just three easy
steps: selection of topic/area/skill, selection of a customisable rubric, and selection
of categories (these are generated automatically but can be edited). The database
provides the user with myriad rubrics to choose from, which is a very strong point in
its favour. One downside is that the rubric can only be written in English, even though
the rest of the web page can be read in Spanish. Additionally, up to nineteen rows can
be chosen for the categories, but only four columns can be created for the scale.
Fig. 4. Rubistar. Screenshot
Teachnology General Rubric Generator (Teachnology Inc.) is an easy tool to manage, and
it allows an image to be selected for the rubric. Nevertheless, a fee must be paid to
access some of the options. Moreover, it does not offer any pre-designed material, so
the creation process is more time-consuming and difficult. Only four descriptors and
five categories can be selected.
Fig. 5. Teachnology General Rubric Generator. Screenshot
Quick Rubric (Clever Prototypes LLC) has a highly visual and helpful design. On the main
screen the user can visualise a blank rubric in table format. All the elements can be
modified: the number of rows, the number of columns and their order. The descriptors,
the criteria, the categories and the scale can all be introduced by the user. The rubric
can be saved and printed once it is finished. The creation process can be slow, since
the tool does not provide any example criteria. Registration is also required.
Fig. 6. Quick Rubric. Screenshot
Rubric-O-Matic is Australian software created by Peter Evans. It is a much more complex
tool than the previous ones. It needs to be downloaded and installed as an add-in for
Microsoft Word, but it provides many different functions. Besides the creation of
rubrics and access to many example rubrics, it contains marking scales from several
educational systems of Europe, Australia and the United States. It has an automated grading
function, so that once the assigned mark for a criterion is given, it can automatically
calculate the total score. Moreover, as explained in its web page, it allows the user “to
create and use detailed reusable feedback comment banks, insert audio comments into
assignments, highlight a phrase in the assignment and click a button to do a plagiarism
search or to highlight easily confusing phrases or words.”
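Rubric-O-Matic's internals are not documented here, so the following is only a generic sketch of the automated-grading idea just described: once each criterion has been assigned a mark, the total score is computed automatically, optionally weighting each criterion. The criterion names and weights are invented for the example:

```python
# A sketch of automated total-score calculation from per-criterion marks.
# If no weights are given, every criterion counts equally.

def total_score(marks, weights=None):
    """Sum per-criterion marks, optionally weighting each criterion."""
    if weights is None:
        weights = {criterion: 1 for criterion in marks}
    return sum(marks[c] * weights[c] for c in marks)

marks = {"Grammar": 7, "Vocabulary": 8, "Fluency": 6}
total_score(marks)  # → 21
```

Weighting lets an examiner make one dimension count more than another, e.g. `total_score(marks, {"Grammar": 2, "Vocabulary": 1, "Fluency": 1})`.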
Fig. 7. Rubric-O-Matic software by Peter Evans. Screenshot
Finally, Princippia: Innovación Educativa (Princippia, Formación y Consultoría, S.L.) is
an application built on Google Apps. It is a Google Docs spreadsheet which can be used
as a template for rubric creation. The template contains one sheet with the instructions
for building the rubric, an editable sheet for the class list, another with an editable
rubric containing the criteria, and another for the scoring system. One of the sheets
allows the user to select, for each pupil, the appropriate descriptor for each of the
criteria being assessed. The final sheet is the overall assessment, which provides the
final score for each student together with the individual marks assigned for each
criterion in different colours, so that they can be seen at a glance. This tool can be
particularly helpful if it is used
for the same class throughout the year, but it can be very time-consuming if many
modifications have to be introduced.
Fig. 9. Princippia by Princippia Formación y Consultoría, S.L. Evaluation Criteria.
Screenshot
Fig. 10. Princippia by Princippia Formación y Consultoría, S.L. Final Evaluation.
Screenshot
Chapter 5: METHODOLOGY
5.1. Introduction
As has already been stated, this thesis intends to analyse the exams and rubrics used by
the most popular English certificates in Spain in order to examine whether or not they
are effective and, in the case of the rubrics, to determine the most common types used
for each skill. Rubrics will be assessed according to their type, whether they measure
what they are supposed to score, and whether they are valid and reliable. It is also the
intention of the current work to examine the tasks which compose the papers for the four
different skills in each test, with the purpose of deciding whether or not they follow
the guidelines given by the Common European Framework of Reference for Languages.
Through research, people inquire, question, observe and analyse so that the object of
study is found, verified or refuted. According to Hernández Sampieri et al., research
can be defined as the set of systematic and empirical processes applied when a
phenomenon is studied (20). Valid research must be based on scientific methodologies and
instruments to ensure that it is accurate and objective. Gil Pascual claims that
researchers use tools in order to quantify information or to transform it into figures (15).
Science subjects and areas have traditionally been the fields of research, whereas
educational ones were not considered valid for quantitative research. Nevertheless, Gil
points out the appearance of new research instruments which allow the quantification and
measurement of many aspects that were previously inconceivable in a scientific
investigation (15). Vez claims that the scientific orientation in the educational area
stems from Bloom's structuralism and Skinner's behaviourism, the cornerstones of the
university knowledge platform (84-85).
Vez also expounds how the linguistic culture of the Didactics of Language Teaching
shifted the linguistic material towards the level of didactic objectives or didactic
contents. This stems from the mentalist belief that there is always a transfer between
conscious knowledge and linguistic competence (85). Other advances in linguistic
research made by Greenberg, Chomsky, Van Bruen, Fillmore, Anderson and Halliday in
different areas have allowed emphasis to be placed on (Vez 80):
- Semantics as opposed to morphosyntax
- Functionality as opposed to grammar inventory
- The communicative approach as opposed to non-communicative ones.
According to Vez, it was the European Union which assumed the change of paradigm and
achieved, with other institutional support, the mobilisation of research in a different
direction: languages are no longer the main objective, but rather the language users (87).
The Council of Europe was responsible for implementing a set of actions oriented towards
the research of languages from a qualitative perspective. In 1971, the symposium titled
“The Linguistic content, means of evaluation and their interaction in the teaching and
learning of modern languages in adult education” (cited in Vez), held at Rüschlikon,
Switzerland, between 3 and 7 May, established the following objectives, among others:
• language teaching should specify worthwhile, appropriate and
realistic objectives based on a proper assessment of the needs,
characteristics and resources of learners;
• language teaching should be planned as a coherent whole, covering the
specification of objectives, the use of teaching methods and materials,
the assessment of learner achievement and the effectiveness of the
system, providing feedback to all concerned;
• effective language teaching involves the co-ordinated efforts of
educational administrators and planners, textbook and materials
producers, testers and examiners, school inspectors, teacher trainers,
teachers and learners, who need to share the same aims, objectives and
criteria of assessment. (Vez 89).
Finally, Puren (cited in Vez 95) indicates that, under the new epistemic approach,
theorisation must come from internal data, i.e., empirical data collected within the
educational framework by the actors of the teaching-learning process.
5.2. Methodological approach
There are several research instruments in the educational field. This thesis will
analyse different rubrics, as well as the exams and tasks they intend to measure, and
will compare them with the guidelines set out in the European Framework.
To begin with, it is fundamental to describe the different sorts of research which exist.
Daniel Madrid notes three basic types of research:
a) Basic or theoretical research: used for the construction of abstract theoretical
models which explain the teaching and learning processes of a language.
b) Applied research: the application of the theoretical models to educational contexts.
c) Practical research: it makes practical use of the other two types. It is normally
based on the premises established by theoretical and applied research when applied to
practical classroom situations (12).
Taking into consideration the previous classification, the methodology of the current
research can be framed as basic or theoretical. Additionally, it can be stated that it will
employ the technique of “Analysis of Documents”. Hernández Sampieri et al. mention
research using organisational documents and materials such as reports, evaluations,
letters, plans, messages, etc. (434) as part of a qualitative approach. They state that this
approach encompasses a wide variety of conceptions, views, techniques and non-
quantitative studies (20). In addition, Dale T. Griffee perceives data as the lifeblood of
research which connects theory and practice (128) and defines the data collection
instrument (DCI) as “the means, either physical or nonphysical, of producing
quantitative or qualitative data to be analysed or interpreted” (128).
This research will also be qualitative. Madrid refers to this kind of research as an
investigation which does not use numerical data extracted from reality but instead tries to
interpret and describe the reality in detail through words (12). The qualitative method
tries to eliminate the subjectivity of the person who is analysing the documents (Gil
281). Through the analysis of the documents' content, different data are collected with
the objectivity and relevance which characterise the scientific method (ibid. 282). For
Berelson, content analysis is “a research technique for the objective, systematic and
quantitative description of the content of communication” (cited in Gil 282). It is
therefore the transformation of content into quantitative data. According to
Krippendorff, this instrument is bound to formulate valid inferences which can be
applied to a context from certain data (cited in Gil 283). The aim of content analysis
may be, for instance, to describe tendencies, analyse persuasive techniques, connect
features, relate attributes, etc. (Gil 283).
Hernández et al. claim that the main purposes of the qualitative analysis are:
1) Explore the data
2) Give them a structure
3) Describe experiences
4) Discover concepts, categories, themes, patterns and links among the data in order
to make sense of them, and to interpret and explain them with regard to the problem.
5) Understand the context which surrounds the data
6) Rebuild facts and stories
7) Connect the results with the available knowledge
8) Generate a theory based on data. (418)
In order to apply this methodology, it is fundamental to meet the following conditions
(ibid. 283):
- Objectivity
- Systematisation (following organised patterns or standards)
- Quantifiability
- Manifest content
Griffee suggests a similar process by which “a large amount of raw data is reduced”
(128); these data must subsequently be interpreted through the assignment of meaning,
and then validated. He defines validation as “an argument by which evidence shows that
the analysed interpretation based on data to some extent reflects the construct” (129).
Hernández et al. maintain that qualitative approaches are open and expansive, since they
are based on the literature review. Furthermore, they are normally applied to a smaller
number of cases. In short, they are oriented towards learning from experiences and
different points of view, valuing processes and generating theories (361).
In order to apply the method, certain procedures should be followed (Gil 284-287):
1) Objective and context
Besides defining the aims and the universe which will be subject of study, it is
important to determine the type of documents which will be analysed.
2) Define the units of analysis
Among the different possible units of analysis (lexical, propositional,
argumentative, etc.), the units used in this research will be thematic ones, since
the documents investigated will all be rubrics and skill test papers.
3) Enumeration rules
The presence or absence of a code, its frequency, order of appearance, density,
concentration.
4) Categorisation
It is the classification of the elements of the text according to a previously
established criterion.
5) Codification
This is the allocation of codes to each category.
6) Reliability and validity
Reliability can be calculated as the percentage of times that independent coders
coincide.
Validity will come about when all the information of the documents is covered in
the categories and when all the categories show rich results in order to produce
hypotheses and inferences.
7) Data analysis
Analysing the data requires:
- A descriptive phase where the frequency of the categories will be investigated
along with their internal and external variables.
- An inferential phase where conclusions will be drawn.
- A multivariate phase in which categories, complex structures and relationships
among the content blocks will be studied.
In addition, Gil indicates that content analysis may be carried out through the study of
the thematic content, the semantic content and network analysis. The study of thematic
content may consist of word counting, i.e., the analysis of the number of times a word
appears; afterwards, words with the same stem are grouped together. Words can also be
classified by the context in which they appear. The analysis of semantic content studies
the relationships among terms. Terms can be related in different ways: for instance, if
one term refers to a part of another, it is an inclusion relationship; if it is the
output or the input of another, it is a role relationship.
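The word-counting and stem-grouping steps of the thematic analysis can be sketched as follows. The fixed-length truncation used here as a stand-in for stemming is a deliberate simplification for illustration; a real analysis would use a proper stemmer:

```python
# A sketch of thematic content analysis: count word frequencies,
# then merge counts of words sharing a (crudely truncated) stem.

from collections import Counter

def word_counts(text):
    """Frequency of each word in a lowercased, whitespace-split text."""
    return Counter(text.lower().split())

def group_by_stem(counts, stem_len=6):
    """Merge counts of words sharing the same first stem_len characters."""
    grouped = Counter()
    for word, n in counts.items():
        grouped[word[:stem_len]] += n
    return grouped

counts = word_counts("assess assessment assessing rubric rubrics")
group_by_stem(counts)  # → Counter({'assess': 3, 'rubric': 2})
```

Classifying words by the context in which they appear, as Gil also mentions, would additionally require recording the words surrounding each occurrence.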
Nevertheless, research may not always be pure. Qualitative research does not use numeric
data but instead describes through words; hence, validation is quite significant, as has
already been mentioned. When the research methodology is not pure, it may be using
triangulation. Triangulation is a technique normally used “to validate data collection
instruments and meet the charge of subjective bias from single-method or single-observer
studies” (Griffee 123). It is generally defined as a combination of methodologies. As
Patton states (cited in Griffee 132), triangulation consists of verifying the
consistency of different data sources within the same methods and using multiple
perspectives or theories to interpret the data. Moreover, Daniel Madrid remarks that the
main principle of triangulation is the gathering and analysis of data from three angles
or points of view so that they can be compared and contrasted (34). He affirms that it
is especially useful for qualitative research since, for instance, data may be collected
from different documents or instruments and then combined (ibid. 34).
A number of authors oppose triangulation, arguing that one cannot mix methods. However,
Patton maintains that the purity of the method is not as fundamental as the search for
useful and relevant information, adding that “the research cannot test predetermined
hypotheses and still remain open to what emerges from open-ended phenomenological
observation” (133).
Once the different methodological approaches have been briefly described, it is time to
frame the current research under the right labels. As has already been stated, this
research will be basic or theoretical, since it will be grounded in the literature
review on the European Framework of Reference for Languages, assessment and evaluation,
rubrics and English certificates, with the intention of determining whether the exams
measure what they are supposed to measure, whether the instruments of assessment are
truly effective, and whether the guidelines promoted by the Council of Europe are being
followed. Accordingly, the research will be qualitative.
The main technique used will be the analysis of documents, in particular the rubrics
employed for the assessment of the different English certificates analysed and the exam
papers which make up each of them. Following Hernández Sampieri et al. (418), the main
purposes of the research will be:
1) Explore the data: analyse the tasks and items of the exam papers for each of the
skills assessed, together with the rubrics (if any) which the examiners use for their
assessment.
2) Structure data: all the data will be structured in recording tables.
3) Describe their structure, type and effectiveness: The tasks and items will be
described together with their objectives; the grading scales will be classified
according to different criteria and the effectiveness of both exams and rubrics will
be discussed.
4) Discover concepts, categories and patterns: The conclusions will make it possible
to check whether certain patterns exist according to the skill which is being
assessed. Those patterns will be ascertained with the help of a rubric created
specifically for this purpose.
5) Understand the context: all the data will be understood within the Common
European Framework context.
6) Rebuild facts: all the data gathered in the recording tables will be collected in a
new comparison rubric created specifically for this task.
7) Connect the results with knowledge: the conclusions obtained will be connected
to the knowledge available on assessment of foreign languages and didactics of
language teaching and learning.
8) Generate a theory based on data: The main errors presented both in the exam
papers and rubrics will be stated so that a number of future amendments can be
carried out.
The chart below summarises the methodological approach of the investigation:

- Investigation type: basic or theoretical (study of the European Framework together
with the assessment instruments, i.e. rubrics).
- Research type: qualitative (the data will be described and analysed through words).
- Instruments: analysis of documents (rubrics and exam papers).
- Study: evaluation theory, rubrics theory, and the European Framework for Languages
guidelines.
- Selection: English Certificates, rubrics and exam papers.
- Analysis of documents: recording tables.
- Results: placement of the data analysed in a rubric for the comparison and detection
of patterns.
- Conclusions: reflection and possible amendments.

This chart shows the methodological purposes of the investigation. First, the selection of
the English Certificates (with their respective exam papers and rubrics) based on the
previous study of evaluation and rubrics theory and the European Framework of
Reference for Languages guidelines. Second, the analysis of documents through the
recording tables created. Third, the arrangement of the data analysed in a rubric built for
the identification of patterns. Fourth, the reflection of the results and possible error
correction and the drawing of conclusions.
5.3. Research design
In order to plan the research, Gil’s scheme (284-287) will be followed. Thus, the first step
is to define the objectives and context of the research and also to determine the
documents which should be analysed.
1) Objective and context
2) Define the units of analysis
3) Enumeration rules
4) Categorisation
5) Codification
6) Reliability and validity
7) Data analysis
5.3.1. Objectives and context
The first step is the establishment of the objectives and the context of the research.
The objectives of the research are the following:
- Analyse the exam papers of the main English certificates in Spain to determine
whether or not they follow the European Framework of Reference for
Languages guidelines in regard to the assessment of each skill.
- Analyse the exam tasks in order to check whether or not they measure what
they are supposed to measure.
- Analyse the rubrics used for the assessment of the exam papers in order to
check their effectiveness and validity.
- Determine which types of rubrics are more common and whether some patterns can be
established according to skills.
The context in which the current research can be established is the detailed study of a
large body of theory related to the assessment and evaluation process which is gathered
in the literature review section (chapter 2), the study of the Common European
Framework of Reference (CEFR) (chapter 3) and the study of the theory of rubrics
(chapter 4). The information from these three areas will determine the foundations on
which the current research is designed and established.
5.3.2. Definition of the units of analysis
The second step is the definition of the units of analysis. The documents which will be
employed for this research are the exam papers of the main English certificates and
the handbooks provided by their institutions, as well as the grading scales they use
for the assessment of the different tasks or papers (if any).
5.3.2.1. Selection of English Certificates of ESL
In the current globalised world, English has achieved an important role as a lingua franca.
Hence, speaking fluent English has become imperative in many fields, such as the
academic or economic ones. Furthermore, the assertion that someone can speak English
in an interview or a CV is no longer enough. As a result, the certification of one’s English
language level is nowadays a common requirement in order to apply for some jobs, to
study at some international universities or to aim for certain public positions. Roca Varela
and Palacios also describe the situation of
“many Spanish universities, where, as a consequence of the Bologna Declaration
on the European Space of Higher Education (ESHE), undergraduates and
sometimes also graduate students need to show that they possess at least a B1
level of a foreign language, in most cases English, when they graduate” (55).
There is a wide range of English Certificates of ESL which aim to certify the level of
English their candidates actually have. The most well-known certificates in Spain are the
Cambridge University ESOL certificates and the IELTS, the Trinity College ISE,
CERTACLES, and the certificates of Spain’s Official Schools of Languages. The criteria
for the selection of these certificates and not others respond to several factors. First
of all, since the current research is contextualised within the European Framework,
all the certificates should be valid in Europe and should be based on the CEFR and its
levels. This criterion ruled out other famous English Certificates of ESL such as the
TOEFL. The second criterion applied is connected to the popularity and acceptance of
the certificates in Spain. According to this criterion, the English Certificates of ESL
selected are the most highly regarded and the ones accepted by most institutions, such as
universities or the Spanish government (Ministerio de Trabajo, Migraciones y Seguridad
Social).
All those tests assess the different skills separately through different papers or
performance tasks. In Roca Varela and Palacios’s words, they:
“measure the ability of non-native speakers to understand and use English in real-life
settings by examining their competence to understand and produce written and spoken
English. Examinees are generally given an overall mark according to their level of
performance on the whole range of tasks included in the tests” (55).
Besides the selection of certificates, it is relevant to comment on the Framework level
chosen. It has already been said that the tests chosen are based on the CEFR levels of
competence. All of them except the IELTS assess the aforementioned levels with
separate certificates. This means that the candidate decides which level of
certification he or she wants to be assessed in. If the test is passed, he or she is granted
the certification of the corresponding level. In contrast, if the candidate does not pass the
test, he or she does not receive any certification. In the case of the IELTS, there is only
one test, whose results determine the level of competence the candidate has (from an A1
to a C2). In consequence, the candidate will always obtain a certificate after the
examination.
Concerning this research, it has been decided that the exam papers and rubrics analysed
correspond to the B2 level. This level is upper-intermediate, and it is the one most
commonly required by companies and institutions for work or study. According to the
CEFR, a B2 user:
“Can understand the main ideas of complex text on both concrete and abstract
topics, including technical discussions in his/her field of specialisation. Can
interact with a degree of fluency and spontaneity that makes regular interaction
with native speakers quite possible without strain for either party. Can produce
clear, detailed text on a wide range of subjects and explain a viewpoint on a topical
issue giving the advantages and disadvantages of various options.” (24)
At this level, the user is already independent and is able to express himself or herself,
as well as to understand others, with a certain degree of ease on a broad range of topics.
All the certificates chosen use a rubric to score some of their papers and/or tasks, as
summarised in the table below. In the following sections, how each of the above-mentioned
English Certificates of ESL assesses each skill will be analysed, together with the rubrics
they use, if any, to score them.
5.3.3. Number scheme rules
Concerning the scheme rules, it is important to define which principles are
going to be used in order to determine which rubrics are effective, valid and reliable, as
well as the criteria to indicate whether the exam tasks and criteria correspond to those
stated by the CEFR.
Cambridge ESOL: First Certificate (FCE)
• Writing paper: one rubric for the assessment of all the tasks.
• Speaking part: one rubric for the assessment of all the tasks.
• Reading paper: no rubric.
• Listening paper: no rubric.

IELTS
• Writing paper: two different rubrics, one per task.
• Speaking part: one rubric for the assessment of all the tasks.
• Reading paper: no rubric.
• Listening paper: no rubric.

Trinity College London: ISE II
• Writing paper: two different rubrics for the assessment of the tasks, one per task.
• Speaking part: one rubric for the assessment of all the tasks.
• Reading paper: no rubric.
• Listening paper: one rubric for the assessment of the task (the same used for Speaking).

ACLES
• Writing paper: one rubric for the assessment of all the tasks.
• Speaking part: one rubric for the assessment of all the tasks.
• Reading paper: no rubric.
• Listening paper: no rubric.

EOI
• Writing paper: one rubric for the assessment of all the tasks.
• Speaking part: one rubric for the assessment of all the tasks.
• Reading paper: no rubric.
• Listening paper: no rubric.
5.3.3.1. CEFR tasks and assessment criteria
The CEFR states the tasks suitable for the assessment of each skill, together with the
general assessment criteria for a B2 user in each of them. The information concerning
the tasks, objectives and criteria gathered from each of the certificates will be compared
to the information contained in the CEFR. This comparison seeks to determine whether
they are effectively assessing each skill. A summary of the information contained in the
CEFR for each of the skills can be found below:
In terms of writing, the CEFR states in the global scale provided that a B2 learner “can
produce clear, detailed text on a wide range of subjects and explain a viewpoint on a
topical issue giving the advantages and disadvantages of various options” (61). Three
illustrative scales for the writing skill can be found in section 4.4. of the framework,
entitled ‘Communicative language activities and strategies’. These scales can be used in
order to ease the design of communicative tasks and the evaluation of the writing
ability. The creative writing scale indicates that a B2 learner is able to “write a review
of a film, book or play” (62), whereas the essay and report scale points out his or her
ability to develop an argument by giving reasons and details or explaining the
advantages or disadvantages, and to sum up information from different sources.
Tasks:
• completing forms and questionnaires
• writing articles for magazines, newspapers, newsletters, etc.
• producing posters for display
• writing reports, memoranda, etc.
• making notes for future reference
• taking down messages from dictation, etc.
• creative and imaginative writing
• writing personal or business letters
Criteria:
A B2 learner can produce clear, detailed text on a wide range of subjects and explain
a viewpoint on a topical issue giving the advantages and disadvantages of various
options.
Can write a review of a film, book or play.
Can write an essay or report.
Concerning speaking, the tasks normally used to assess this skill are presentations,
descriptions of pictures, interviews, discussions and dialogues between students.
The CEFR points out that oral production tasks include (58):
• Public address (information, instructions, etc.)
• Addressing audiences (speeches at public meetings, university lectures,
sermons, entertainment, sports commentaries, sales presentations, etc.)
In addition, it is indicated that the assessment of the speaking skills can include some of
the following tasks: reading a written text aloud; speaking from notes, or from a written
text or visual aids (diagrams, pictures, charts, etc.); acting out a rehearsed role; speaking
spontaneously; singing (CEFR 58).
The CEFR provides the educational community with five different illustrative scales for
the speaking skill levels (58-61). Those scales can be used to assess the level of the:
• Overall spoken production
• Sustained monologue: describing experience
• Sustained monologue: putting a case (e.g. in debate)
• Public announcements
• Addressing audiences
The criteria for B2-level overall spoken production state that a B2 learner “Can give
clear, systematically developed descriptions and presentations, with appropriate
highlighting of significant points, and relevant supporting detail” and that he or she “can
give clear, detailed descriptions and presentations on a wide range of subjects related to
his/her field of interest, expanding and supporting ideas with subsidiary points and
relevant examples” (CEFR 60). In addition, some other specifications for particular
functions, such as describing an experience or taking part in a debate, can be extracted
from the other scales provided. Thus, a B2 learner must be able to describe different
topics in detail, give reasons and support them in discussions and debates, or highlight
the advantages or disadvantages of different options. He or she has the ability to give
presentations with clarity and fluency and to depart spontaneously from a prepared text
when follow-up questions are posed or interesting points are raised by the audience.
Tasks:
• public address (information, instructions, etc.)
• addressing audiences (speeches at public meetings, university lectures, sermons,
entertainment, sports commentaries, sales presentations, etc.)
• reading a written text aloud
• speaking from notes, or from a written text or visual aids (diagrams, pictures,
charts, etc.)
• acting out a rehearsed role
• speaking spontaneously
• singing

Criteria:
• clear, systematically developed descriptions and presentations
• supporting ideas with subsidiary points and relevant examples
• describing an experience
• describing different topics in detail
• giving reasons and supporting them in discussions and debates
• highlighting the advantages or disadvantages of different options
• giving presentations with clarity and fluency
• departing spontaneously from a prepared text when follow-up questions are posed

As for reading, comprehension is most commonly assessed through multiple choice
questions, true or false questions, sentence completion, open questions, gapped texts or
summaries. The CEFR indicates that the tasks selected should focus on (69):
• reading for general orientation;
• reading for information, e.g. using reference works;
• reading and following instructions;
• reading for pleasure.
The language user may read:
• for gist;
• for specific information;
• for detailed understanding;
• for implications, etc.
According to the global reading scale provided in the CEFR, a B2 candidate
is able to:
“read with a large degree of independence, adapting style and speed of reading to
different texts and purposes, and using appropriate reference sources selectively.
Has a broad active reading vocabulary but may experience some difficulty with
low frequency idioms” (69).
In addition, the CEFR contains four other reading scales:
• reading correspondence
• reading for orientation
• reading for information and argument
• reading instructions
The most important criteria comprised in those scales for B2 users include the ability
to read correspondence, follow written instructions, and understand specialised articles
and reports related to current issues. Among the functions the candidate must be able to
perform are scanning through long and complex texts, finding relevant details, and
identifying the most important information, ideas or opinions (CEFR 69-71).
Tasks:
• reading for general orientation
• reading for information, e.g. using reference works
• reading and following instructions
• reading for pleasure
The language user may read:
• for gist
• for specific information
• for detailed understanding
• for implications, etc.
Criteria:
• read with a large degree of independence, adapting style and speed of reading to
different texts and purposes, and using appropriate reference sources selectively
• ability to read correspondence
• follow written instructions
• understand specialised articles and reports related to current issues
• scan through long and complex texts
• find relevant details
• identify the most important information, ideas or opinions
Finally, the most common tasks used for the assessment of listening are multiple
choice questions, true or false questions, open questions and sentence completion
exercises.
The CEFR lists the following listening tasks to assess this skill (65):
• listening to public announcements (information, instructions, warnings, etc.)
• listening to media (radio, TV, recordings, cinema)
• listening as a member of a live audience (theatre, public meetings, public lectures,
entertainment, etc.)
• listening to overheard conversations, etc.
In the CEFR there are five different scales to assess the listening skill. In the overall one,
it is pointed out how a B2 user:
“Can understand the main ideas of propositionally and linguistically complex
speech on both concrete and abstract topics delivered in a standard dialect,
including technical discussions in his/her field of specialisation. Can follow
extended speech and complex lines of argument provided the topic is reasonably
familiar, and the direction of the talk is sign-posted by explicit markers” (66)
Besides this scale, four others are provided for:
• Understanding interaction between native speakers
• Listening as a member of a live audience
• Listening to announcements and instructions
• Listening to audio, media and recordings
Those scales provide the main criteria which a B2 user must master in relation to the
listening skill. The criteria encompass the ability to keep up with a conversation;
understand much of what is said in a discussion in which he or she is participating, and
be able to take part in it; and understand and follow lectures, talks and reports with
academic vocabulary, as well as announcements, messages, radio documentaries and
broadcast audio. The B2 user can also identify the viewpoints and attitudes of different
speakers (CEFR 66-68).
Tasks:
• listening to public announcements (information, instructions, warnings, etc.)
• listening to media (radio, TV, recordings, cinema)
• listening as a member of a live audience (theatre, public meetings, public lectures,
entertainment, etc.)
• listening to overheard conversations, etc.
Criteria:
• main ideas of propositionally and linguistically complex speech
• can follow extended speech and complex lines of argument
• keep up with a conversation
• understand much of what is said in a discussion in which he/she is participating and
be able to participate
• understand and follow lectures, talks and reports with academic vocabulary,
announcements and messages and radio documentaries or broadcast audio
• identify viewpoints and attitudes of different speakers
5.3.3.2. Criteria to create an appropriate rubric or to determine which ones are effective
To begin with, it can generally be stated that a good rubric must be valid, reliable,
effective and relevant. Nevertheless, these factors are commonly absent in many of the
rubrics contained in some textbooks or available on the internet. It is essential that the
teacher or examiner bears all these criteria in mind when he or she builds a rubric,
chooses one or supervises those created by the students.
Whether the criteria included in a rubric are pertinent to determine if the student has
achieved a specific skill or completed the task successfully is a key factor in verifying
the rubric’s relevance. Another important issue is whether the task or what is being
assessed will really be useful for the learner in his or her future career.
Colin Phelan and Julie Wren argue that “validity refers to how well a test measures what
it is purported to measure” (para. 12); that is, the accuracy of an assessment. In order to
achieve validity, the goal of the assessment must be clear and very well defined.
There are many different types of validity:
- Face validity: determines whether the items appear to assess the desired construct.
- Construct validity: is used to ensure that the rubric is actually measuring what
it is intended to measure, and no other variables.
- Criterion-related validity: consists of comparing the rubric to another one.
- Formative validity: refers to the outcomes and tries to check to what extent
the rubric provides data to improve the learning programme.
- Sampling validity: “ensures that the rubric covers the broad range of areas
within the concept under study” (para. 24).
It is also good advice to make sure the aims are very clear and operationalised.
In order to check that a rubric matches the evaluation standards and criteria,
it is important to involve the students in the creation or choice of the rubric,
to familiarise them with it, and, if possible, to compare the rubric with other similar
ones.
Validity may be threatened by several factors, such as poorly defined constructs, an
unsuitable selection of tasks or an inappropriate performance context. The
teacher must make sure that the rubric chosen or created is adequate for what is going to
be measured.
Reliability is, according to Phelan and Wren, “the degree to which an assessment tool
produces stable and consistent results” (para. 1). This means, for instance, that a reliable
rubric will score a text with the same mark if it is used twice with a month in between,
or that two different teachers using it will give the same mark.
As in the case of validity, there are also different types of reliability. The first type is
called test-retest reliability: a rubric should give the same test the same mark, or a
similar one, if it is used twice. Another sort of reliability is known as parallel forms and
consists of assessing the same test with two different rubrics which contain similar
descriptors and criteria; the scores should be similar in order to obtain a high reliability
rate. When there is more than one examiner using the same rubric, the inter-rater
reliability can be checked. Finally, internal consistency reliability intends to evaluate
to what extent different items of a test produce the same results. Phelan and Wren point
out two subtypes: average inter-item correlation, “obtained by taking all of the items on
a test that probe the same construct, determining the correlation coefficient for each pair
of items, and finally taking the average of all of these correlation coefficients” (para. 10),
and split-half reliability, which consists of dividing in half the items that measure the
same skill: “the entire test is administered to a group of individuals, the total score for
each ‘set’ is computed, and finally the split-half reliability is obtained by determining the
correlation between the two total ‘set’ scores”.
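By way of illustration, the internal consistency measures just described can be computed directly from an examinees-by-items score matrix. The sketch below is only illustrative: the score data are invented, and the functions are a minimal implementation of Cronbach's alpha and Spearman-Brown-corrected split-half reliability, not part of any certificate's official procedure.

```python
from statistics import mean, variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of examinee rows (one column per item)."""
    k = len(scores[0])                          # number of items
    items = list(zip(*scores))                  # transpose: one tuple per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(scores):
    """Correlate odd- and even-item half totals, then apply the
    Spearman-Brown correction to estimate full-test reliability."""
    half1 = [sum(row[0::2]) for row in scores]
    half2 = [sum(row[1::2]) for row in scores]
    r = pearson_r(half1, half2)
    return 2 * r / (1 + r)

# Invented scores of five examinees on a four-criterion rubric (1-5 scale)
scores = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [5, 5, 4, 5],
]
print(round(cronbach_alpha(scores), 2))         # close to 1: consistent items
print(round(split_half_reliability(scores), 2))
```

The Spearman-Brown correction compensates for the fact that each half contains only half the items; without it, the raw split-half correlation would underestimate the reliability of the full test.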
It is essential to check how effective a rubric is with the purpose of improving the
teaching-learning process. A truly effective rubric must provide students with feedback
on what they need to improve and how to do so, what they have done well and also
provide the teacher with information about the whole process and how to address the
problems detected.
Relevance is another important factor. It is necessary to be sure that the rubric descriptors
and criteria are really relevant to assess a particular skill or task. In addition, the task must
be genuinely useful for the learner’s development. The teacher has to ask himself or
herself whether what is being assessed is significant for the students, whether it will help
them in their profession, and whether it really proves the mastery of a skill.
Popham proposes a rubric to evaluate rubrics. It is a very simple rubric with four
evaluative criteria which correspond to the following questions:
-Is the skill assessed actually worthwhile?
-Are the scoring criteria few but correctly labelled?
-Are the degrees of excellence described appropriately?
-Is the rubric presented in a clear and handy way? (27-28).
In addition, as has already been noted, the Common European Framework of Reference
for Languages includes certain guidelines on how to construct good grading scales or
checklists to use as assessment tools. The guidelines given highlight the necessity of
building a feasible tool, so it is not advisable to build a rubric which contains more than
five criteria to assess. With regard to the descriptors, they should be positively worded
and brief, and they should avoid vagueness.
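These guidelines can be operationalised as a quick automated check on a draft rubric. The helper below is a hypothetical sketch, not part of the CEFR itself: the five-criterion limit follows the feasibility guideline, while the word-count threshold for "brief" and the list of vague words are invented assumptions chosen for illustration.

```python
# Hypothetical helper: checks a draft rubric against the guidelines above.
# The 5-criterion limit follows the feasibility guideline; the 25-word
# threshold for "brief" and the vague-word list are invented assumptions.
VAGUE_WORDS = {"good", "bad", "adequate", "appropriate", "some", "several", "etc"}

def check_rubric(criteria):
    """criteria: dict mapping criterion name -> descriptor string.
    Returns a list of warnings; an empty list means the draft respects
    the guidelines. (Positive wording still needs human judgement.)"""
    warnings = []
    if len(criteria) > 5:
        warnings.append("More than 5 criteria: the rubric may not be feasible.")
    for name, descriptor in criteria.items():
        words = [w.strip(".,").lower() for w in descriptor.split()]
        if len(words) > 25:
            warnings.append(f"'{name}': descriptor is too long to scan quickly.")
        vague = VAGUE_WORDS.intersection(words)
        if vague:
            warnings.append(f"'{name}': vague wording ({', '.join(sorted(vague))}).")
    return warnings

# A descriptor with vague wording triggers a warning:
draft = {
    "Coherence": "Organises ideas into clear, well-linked paragraphs",
    "Accuracy": "Shows good control of some structures",
}
for warning in check_rubric(draft):
    print(warning)
```

A check of this kind can only flag surface problems such as length or obviously vague words; whether a descriptor is positively worded and genuinely observable still requires the examiner's judgement.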
5.3.4. Categorisation and codification
The fourth and fifth steps are categorisation and codification. The instruments chosen for
the research are recording tables. These charts have been designed specifically for this
study, bearing in mind that the information they contain is going to be compared,
examined and analysed. Accordingly, they were considered a helpful visual instrument
which will be used to summarise and categorise the data extracted from the different
exams and rubrics.
The recording table below has been designed in order to compare the tasks which each
paper contains with the ones proposed by the CEFR. Moreover, it allows the comparison
between the criteria stated to assess these tasks.
English certificate name:
• Tasks: [tasks which form the paper]
• Criteria: [criteria for the assessment of the paper’s tasks]

CEFR:
• Tasks: [tasks recommended by the CEFR for the assessment of the skill]
• Criteria: [criteria stated by the CEFR for the assessment of the skill at a determined level]
In order to classify the rubrics used by each of the certificates for the different skills, the
following chart has been conceived. It seeks to give a detailed classification of each
rubric; consequently, it encompasses all the possible classification criteria, so that the
rubric will be defined according to all of them.

Type of rubric according to: [English certificate name]
• How it is measured
• How it is scored
• Theme
• Application
• Function
• Scorer
• Channel
Moreover, the grading tables suggested by the Framework will be compared with the
rubrics used in the certificates according to the descriptors and criteria employed.
Finally, one recording table was designed to allow the summarisation of all the data
extracted and analysed. This chart is merely a compendium of the two previous ones
together with a table which summarises Popham’s rubric for rubrics.

EXAM
• Tasks:
• Match CEFR tasks:
• Match CEFR criteria:

RUBRIC
• Type: Measurement / Scoring / Theme / Application / Function / Scorer / Channel
• Relevant
• Valid
• Reliable

CEFR criteria
• Feasible
• Descriptors: Positive / Brief / Not vague

Popham’s rubric
• Skill worthwhile
• Scoring criteria (few and well labelled)
• Descriptors (well described)
• Clear and handy?
5.3.5. Reliability and Validity
With regard to the reliability and validity of each of the exams, the limitations of the
current research made it impossible to carry out a case study, as this would imply a whole
new and extensive research project. Therefore, the data used to measure reliability and
validity are those provided by the institutions, based on studies and research carried out
previously. The main coefficients used will be Cronbach’s alpha and the standard error
of measurement (SEM).
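Although both coefficients will simply be taken from the institutions' own reports, it is worth recalling how they relate: the SEM can be derived from a test's score standard deviation and its reliability coefficient. The figures in the sketch below are invented for illustration only.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r): the expected spread of observed scores
    around an examinee's true score."""
    return sd * math.sqrt(1 - reliability)

# Invented figures: score standard deviation of 4.5, Cronbach's alpha of 0.91
sem = standard_error_of_measurement(4.5, 0.91)
print(round(sem, 2))  # 1.35
```

Assuming normally distributed error, an examinee's true score lies within one SEM of the observed score about 68% of the time, which is why a higher reliability coefficient (and thus a lower SEM) makes pass/fail decisions near the cut score more trustworthy.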
5.3.6. Data Analysis
Finally, the data obtained and classified will be studied in order to establish comparisons
and patterns among the tests and scales analysed. For this purpose, and with the aim of
easing such comparison, some specifically designed recording tables will be used.
One recording table to compare all the certificates per skill (4 tables in total):
CEFR (corresponding skill)
• Tasks:
• Criteria:

English Certificate: FCE | IELTS (Band 6) | ISE II | ACLES | EOI
• Time
• Nº of tasks
• Word length
• Rubric?
• Match CEFR tasks
• Match CEFR criterion
One recording table in order to compare the analysis of the certificates in relation to the
suitability of their exam papers:
FCE | IELTS (Band 6) | ISE II | ACLES | EOI
• Nº of exam papers
• Match CEFR tasks: Writing / Speaking / Reading / Listening
• Match CEFR criteria: Writing / Speaking / Reading / Listening
• Reliability: Cronbach’s alpha / SEM
One recording table which compares the certificates in relation to their rubrics (two in
total, one for the comparison of the writing rubrics and another for the comparison of the
speaking rubrics):

SKILL: FCE | IELTS | ISE II | ACLES | EOI
• Type: Measurement / Scoring / Theme / Application / Function / Scorer / Channel
• Relevant
• Valid
• CEFR: Feasible
• Descriptors: Positive / Brief / Not vague
• Popham’s rubric: Skill worthwhile / Scoring criteria (few and well labelled) /
Descriptors (well described) / Clear and handy?
5.4. Hypotheses
Taking into account the objectives of this thesis and the research which has been
explained and designed, it is fundamental at this stage to state what the hypotheses are:
H1.- The exam papers of the main English Certificates in Spain follow the guidelines
given by the European Framework of Reference for Languages.
H2.- The productive skills will be assessed with rubrics, while the receptive skills will
present a lack of them.
H3.- Quantitative rubrics will be predominant in the main English Certificates.
H4.- The rubrics used will not be entirely valid, effective or reliable according to the
criteria stated by the CEFR.
H5.- The rubrics used will not be entirely effective or valid according to the criteria for
good rubrics.
H6.- Certain patterns could be extracted from the analysis of the different rubrics.
The above-mentioned hypotheses obviously need to be subjected to detailed research
in order to check whether or not they are correct. Hence, the research design previously
presented is essential for accomplishing such a meaningful task. The next chapter
presents the research carried out.
Chapter 6: RESEARCH
6.1. Proficiency Test exam papers for the different language skills and their
assessment rubrics
In the following sections, the main exams used to certify English competence in Spain
will be analysed. Special attention will be paid to:
o The types of task contained in each paper and whether those tasks are
recommended by the CEFR.
o If the assessment criteria match the criteria indicated by the European Framework.
o The rubrics used to assess those skills (if any) with the intention of finding out:
1. The types of rubrics used according to the different classification criteria.
2. Whether they are suitable or not.
3. Which types are the most common according to each skill.
4. If the use of rubrics implies a change in methodology or in traditional
assessment tasks.
5. If they are adjusted to the CEFR criteria.
6. Their validity.
7. Their relevance.
8. Their reliability.
As has been explained above, rubrics, grading scales or evaluation matrices are powerful
resources or tools to assess whether a student has achieved the learning standards of a
certain level, his or her level of proficiency in a subject, topic or language, his or her
work and behaviour throughout a course or essay, and his or her progress. In spite of the
fact that they can be implemented in any subject, their use in the assessment of foreign
languages is the main focus of this thesis.
As a result, it makes sense to explore the use of rubrics for the assessment of the four
different skills: speaking, writing, reading and listening. Even though it is true that rubrics
have been traditionally used for the assessment of productive skills such as speaking and
writing, it is also possible to assess the so-called receptive skills (i.e., reading and
listening) using rubrics.
6.2. Writing
Rubrics were originally created for the scoring of writing compositions. This is probably
the reason why grading scales for writing are the most diversified. Writing is commonly
assessed through the elaboration of a composition by the learner. Those compositions or
essays are usually within a range of 40 to 220 words in the English certificates analysed.
Traditional tasks encompass the creation of a specific kind of text, such as formal or
informal letters or emails, articles, reports, complaints or opinion essays. The CEFR
refers to the following types of tasks to assess this skill:
“completing forms and questionnaires; writing articles for magazines, newspapers,
newsletters, etc.; producing posters for display; writing reports, memoranda, etc.;
making notes for future reference; taking down messages from dictation, etc.; creative
and imaginative writing; writing personal or business letters, etc.” (61)
6.2.1. Literature review
Writing has been one of the main skills taught and assessed in the evaluation of
languages over the last few centuries. Ezza states that “since the early 1870s, academic
circles in English-speaking countries, notably in the United States, have been attaching
heightened significance to writing instruction” (186).
A simple analysis of general assessment practices makes it possible to verify that writing
is present, either directly or indirectly, in most evaluations. It is obvious that the
assessment of writing in a foreign language or in a native language course implies the
student’s demonstration of his or her writing production and abilities. On the other hand,
it might not be immediately obvious that any kind of paper-based assessment presupposes
that the test-taker uses writing. For instance, in a biology exam the teacher is assessing the
student’s knowledge of the cell. Nevertheless, the student must prove that knowledge by
explaining and describing the way in which a cell works through words. If the student
does not explain himself or herself correctly, the teacher might consider that he or she
does not know the content, so his or her ability to write is somehow being assessed. This
is because traditional examination practices, still the most common ones in our
educational system, are carried out through paper-based exams which require learners to
prove their knowledge on a certain subject, issue or area by producing a written answer.
As a result, it can be concluded that “in educational settings, writing is the basis upon
which a candidate’s achievement, learning and intelligence are judged” (Ghalib 225). For
instance, the student’s knowledge of photosynthesis in a biology exam does not just
depend on his or her understanding of the process, but also on his or her ability to explain
it clearly and correctly to the teacher in the exam. For this reason, writing is somehow
being assessed as well.
Despite its importance, writing is a skill that is still not mastered by most students,
whether native or non-native speakers. According to some researchers, such as Allen et al.,
“students in the US struggle to reach proficiency levels throughout their high school
years” (125). This may be due to the complexity and difficulty that the process of writing
implies since it “requires individuals to coordinate a number of cognitive skills and
knowledge sources, such as goal setting, discourse awareness, memory management
strategies and social cultural knowledge” (Flower and Hayes cited in Allen et al. 125).
L1 and L2 writing processes are similar, as both imply the setting of goals and the translation of ideas into words (125). However, they differ in lower-level processes. L1 writers' syntactic constructions and lexical access are largely automatised, whereas these are highly demanding tasks for L2 writers (De Keyser cited in Allen et al. 126). Consequently, non-native speakers spend less time on higher-level processes because of the effort they must devote to tasks that native speakers perform automatically. Furthermore, L2 learners must tackle some additional constraints when writing, such as the mental translation they may attempt when writing in their L2. Studies on this issue suggest that L2 writings often include more t-units but fewer modifiers, subordinate clauses and cohesive mechanisms (Silva cited in Allen et al. 126).
With regard to the evaluation of L2 learners' writing, it is essential to understand the difficulty and complexity of the process, since it "requires accounting for multiple factors to ensure a fair and accurate judgement of writer's abilities" (Veerappan and Tajularipin 143). Several sets of criteria may be considered for the assessment of this skill. Even though these criteria may be referred to under different names or labels, they assess largely the same aspects, as will be illustrated below. The Somerset Local Educational Authority (LEA) in the United Kingdom takes into account eight criteria: originality, vocabulary, elaboration, organization, syntactic agreement, spelling, handwriting and layout (Wilkinson cited in Ezza 187). The Australian Curriculum Assessment and Reporting Authority (ACARA), on the other hand, uses ten scoring criteria: audience, structure, persuasive devices, cohesion, text, ideas, vocabulary, paragraphing, sentence structure and spelling (cited in Ezza 187), while the City University of New York relies on only five: critical response to the writing task; development of ideas; structure of the response; sentence, word choice and grammar; and usage and mechanics (187). According to Polio (cited in de Haan and Van Esch 2-3), the features to review in a text are the following: overall quality (based on linguistic accuracy), syntactic complexity (variety of t-units and elaborate language structures), lexical features, content (interest, referencing and argumentation), mechanics (spelling, punctuation and capitalisation), coherence and discourse (organisation, emphatics and cohesive devices), fluency and revision. These examples briefly illustrate the diversity of criteria sets that may be used to evaluate the writing skill, but also the clear similarities among them.
Bearing in mind the different criteria which can be used for the assessment of writing, rubrics seem to be an appropriate tool which, as Mark Brook states, "enables an evaluator to convert a given quality of student work into a letter grade, percentage or level" (cited in Frydrychova 391). The general advantages of using a rubric have already been stated in the chapter devoted to rubrics. As for the advantages specifically related to the use of rubrics for the assessment of writing, Ezza highlights that they "ensure greater score validity" (187). More concretely, holistic rubrics allow an "authentic reaction of the rater", whereas analytic ones identify "writers' strengths and weaknesses" (187). Ezza also warns about some of the drawbacks: holistic rubrics cannot give precise diagnostic information and rely too heavily on the scorer rather than on the text. Analytic rubrics, for their part, may present other problems. For instance, the analytic rating of one criterion may influence the rating of the others: the fact that a text contains many grammar mistakes does not necessarily mean that its structure, cohesion or range of vocabulary is not excellent, yet it may affect the rater's perception. Other critics suggest that rubrics may be "inhibiting in terms of their creativity" or that they "describe minimal standards rather than high standards" (432).
Some interesting studies on the use of rubrics for the evaluation of writing have been carried out in recent years. Wang conducted an inquiry during an EFL writing course at a Chinese university into the effects of rubrics on the self-assessment of students' writing. The research consisted of six essays and several interviews with six informants among the 80 students who were doing the writing tasks. In the first class, the students were shown a rubric together with an explanation and samples that would fit each level of the scale. During one lesson, the students were asked to write an essay, which the teacher afterwards photocopied. The students then had to self-assess their work. In the following lesson, they were asked to peer-assess the same writings, after which they were shown both the self- and peer-assessments. The same process was carried out six times. The research demonstrated that the participants "embraced the rubric as an instructional tool guiding them throughout the forethought, performance and reflection stages of self-regulated learning" (9).
In an article for Assessing Writing, Sundeen reported the results of a study he carried out in a high school in the western United States. Eighty-nine learners were divided into three groups, each exposed to different conditions. One group could see the six-point rubric they would be assessed with, and it was also explained to them; the second group could see the rubric, but no explanation was given; and the final group received no instruction related to the assessment tool. The rubric used to assess the students' essays before and after the research measured the organisation, word choice, ideas, sentence fluency and conventions of the persuasive essays. The study showed that those students who had seen and studied the rubric obtained improved results. However, no significant differences were found in terms of the number of paragraphs, sentences and words, contrary to what was expected. As for the group which was only shown the rubric some minutes before writing, their writing quality was better than when they could not see the rubric, although the improvement was not as significant as in the group where the rubric had been shown and commented on beforehand (78-88).
Inspired by Sundeen's research, Becker carried out a similar study. He decided to compare the performance of learners on a summary writing task and the impact that involving students in the creation of the rubric could have on their results. Ninety-six ESL students with a high-intermediate level of English took part in the research. Three groups were labelled as treatment groups and one as the control group, and there was no overall difference in English proficiency among them. The first group, referred to as "A", participated in a 60-minute workshop whose aim was to develop a scoring rubric. To this end, students were shown good and poor examples of rubrics and asked to think about how a summary could be assessed. Afterwards, they had to list the criteria of a demonstrably effective summary. The criteria were then discussed in order to articulate levels of quality, and finally the students were tasked with the creation of a draft rubric. The revised version of the rubric was posted so that it would be available to everybody. Class "B" did not create a rubric but could see and discuss five different benchmark essays and afterwards had the opportunity to score 10-15 summaries from students of the other three groups; these summaries were the ones which had received the same score from three professional raters. Group "C" was only allowed to see the rubric 15 minutes before completing the summary task, and group "D" was asked to complete the task without any kind of access to the rubric. The results obtained were in line with Sundeen's findings. Although the pre-test scores of the four groups were very similar, the post-test results of the students in group A were significantly higher than those of the other classes (B, C and D). Moreover, the results obtained by class B learners were higher than those achieved by the learners in groups C and D. Becker therefore concluded that merely providing learners with a rubric is not enough to improve their writing performance: it is necessary to involve the students in the creation of the rubric or, at least, to explain it to them and provide them with different models and examples (15-24).
Simona Laurian and Carlton J. Fitzgerald tried to probe, through a small case study, the differences between students who had and had not been shown the rubric. The students had to complete two writing assignments, the second one with a copy of the rubric that would later be used by the scorer. In addition, they answered a survey consisting of fifteen questions divided into three categories: the use of rubrics, standards, and positive and negative effects of rubrics. The students answered on a five-point rating scale. The results of the case study were quite significant, as the students' average score improved from 86.83 to 90.86 in the second essay. Eighteen of the students got a higher mark in the second assignment. With regard to the survey, the most remarkable finding is that 20 students out of 21 acknowledged having taken the rubric into account to complete their writing task when they had been allowed to see it. Nineteen out of 21 considered rubrics useful for self-assessment before handing in an essay, and 14 disagreed with the statement that a rubric "stifles my creativity", which counters the argument that students "found rubrics inhibiting in terms of their creativity" (432).

The studies summarised above by Wang, Sundeen, Becker, and Laurian and Fitzgerald illustrate the importance of giving ESL learners access to rubrics, involving them either in their creation or through the explanation of their criteria and the provision of examples of good and bad models fitting each level of the scale.
Other studies on the use of rubrics as an assessment instrument deal with automated scoring systems. Perin and Lauterbach pointed out that these "cannot yet interpret the meaning of a piece of writing, identify off-topic content, or determine whether it is well argued". Nevertheless, they worked on the use of Coh-Metrix automated scoring in order to improve such systems. This system focuses on cohesion and on different linguistic variables or "cohesive cues", which, as McNamara et al. argue, are what "would be expected to characterize student writing judged to be of high quality" (62 cited in Perin and Lauterbach). The cohesive cues used by different automated systems may include connectives, lexical overlap, logical operators, causal cohesion, semantic co-referentiality, anaphoric reference, polysemy, hypernymy, lexical diversity, word formation, syntactic complexity, syntactic similarity and basic text measures. The research drew on a corpus of persuasive essays and summaries from Mississippi State University, together with others written by low-skilled adults at community colleges. All the essays were first scored by experienced human raters using a six-point holistic rubric for the essay and a 16-point analytic scale for the summary. Some of the criteria included in the rubric were the following: critical thinking, use of examples, reasons provided, organization, coherence, accuracy of grammar and usage of mechanics. The inter-rater reliability was r = .85. The essays were then scored by the automatic rater in order to find out whether the system could distinguish between trained and less trained students. The system made use of significant predictors in an attempt to detect those persuasive essays written by trained students: the number of words before the main verb, textual lexical diversity and the CELEX logarithm frequency including all words. Using these three predictors, Coh-Metrix was unable to distinguish between the two groups of essays. However, when the researchers used a Coh-Metrix system based on 52 different predictors, they found that it was capable of showing a number of significant differences between high- and low-scoring essays in the lexical diversity and argument overlap predictors. As for the written summaries, Coh-Metrix found differences in the content word overlap between adjacent sentences (proportional and standard deviation) and lexical diversity predictors.
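As a rough illustration of how such predictors can be operationalised, the sketch below computes toy versions of two of the predictor families mentioned above: lexical diversity as a simple type-token ratio, and content-word overlap between adjacent sentences. This is not the actual Coh-Metrix implementation; the tokenisation, stopword list and sample sentences are simplifying assumptions made for the example.

```python
import re

# Minimal stopword list used only for this illustration
STOPWORDS = frozenset({"the", "a", "an", "is", "are", "of", "to", "and"})

def type_token_ratio(text):
    """Crude lexical diversity: unique words divided by total words."""
    words = re.findall(r"[a-z]+", text.lower())
    return len(set(words)) / len(words) if words else 0.0

def adjacent_sentence_overlap(sentences):
    """Mean proportion of content words each sentence shares with the previous one."""
    overlaps = []
    for prev, curr in zip(sentences, sentences[1:]):
        prev_words = set(re.findall(r"[a-z]+", prev.lower())) - STOPWORDS
        curr_words = set(re.findall(r"[a-z]+", curr.lower())) - STOPWORDS
        if curr_words:
            overlaps.append(len(prev_words & curr_words) / len(curr_words))
    return sum(overlaps) / len(overlaps) if overlaps else 0.0

essay = ["Cars pollute cities.", "Cities should therefore limit cars."]
print(round(type_token_ratio(" ".join(essay)), 2))   # 0.75
print(round(adjacent_sentence_overlap(essay), 2))    # 0.4
```

Real systems refine both measures considerably (e.g. lemmatisation, length-corrected diversity indices such as MTLD), but the basic intuition of counting repeated content words across sentences is the same.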
A large number of studies examine different kinds of rubrics in order to find the most reliable ones, or the features essential for building an effective one. Ghalib and Al-Hattami carried out a case study with 30 students of English at the Faculty of Arts of Taiz University to compare assessment with holistic and analytic rubrics. The case study consisted of assessing a descriptive essay with a holistic rubric and with an analytic one. Three raters were previously trained in a two-hour session in which the rating system was explained, advice on how to avoid bias was given, and the most common rating problems were discussed. The raters had to assess the 30 writing samples in two separate sessions. The first time, they used a 6-point holistic grading scale, whereas in the second session (which took place one month later) they used an analytic rubric with the following criteria: content, cohesion, syntactic structure, vocabulary and mechanics. Furthermore, the analytic rubric had several well-defined standards of performance points. The case study allowed the researchers to measure the standard deviation of the scores: 3.12 with the holistic rubric and 2.82 with the analytic one. They also observed that the analytic rubric was more rigorous according to a t-test. In addition, an analysis of variance showed no significant differences among the three raters, and the confidence interval for both the holistic and the analytic rubric was 95%. The researchers therefore concluded that, when using analytic scoring rubrics, raters "give lower scores than when using holistic scoring rubrics" (230), but the analytic ones provide "more consistent scores" (131). It is also worth highlighting that Ghalib and Al-Hattami's article for English Language Teaching contains a list of effective rubric features, such as a "well-defined list of criteria for test-takers to know what is expected" (226), "standards of excellence for the different levels of performance" (226), "gradations of quality" based on the degree to which standards have been met, and "modal exemplars of expected performance at the different levels of the scale" (226). Moreover, they state that an effective rubric "is the one that is used by different raters on a given assessment task and generates similar judgements/scores" (226).
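The kind of statistics Ghalib and Al-Hattami report (the standard deviation of each score set and a paired t statistic for the same essays rated under the two rubrics) can be reproduced on invented data as follows. The score lists below are hypothetical and serve only to show the computation, not to reproduce the study's actual data.

```python
from statistics import mean, stdev

# Hypothetical scores for ten essays rated twice by the same raters
holistic = [4, 5, 3, 5, 2, 4, 5, 3, 4, 5]   # 6-point holistic scale
analytic = [3, 5, 3, 4, 2, 4, 4, 3, 4, 4]   # analytic totals rescaled to 0-5

# Spread of the scores under each rubric (sample standard deviation);
# the study itself reports 3.12 (holistic) vs 2.82 (analytic)
print(round(stdev(holistic), 2), round(stdev(analytic), 2))

# Paired t statistic: mean difference divided by its standard error
diffs = [h - a for h, a in zip(holistic, analytic)]
t = mean(diffs) / (stdev(diffs) / len(diffs) ** 0.5)
print(round(t, 2))
```

A positive t here indicates that the holistic scores tend to be higher than the analytic ones for the same essays, which is the direction of the effect the study describes.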
6.2.2. Assessment of Writing in the main English Certificates
The English certificates analysed show a certain degree of consensus on how to assess the writing skill: the tasks consist mainly of writing compositions.
Cambridge First Certificate
The Cambridge First Certificate (FCE) writing paper consists of two writing tasks, and the time allowed is 1 hour and 20 minutes. In task 1, "candidates are given input in the form of an essay title to respond to, along with accompanying notes to guide their writing" (Cambridge English Language Assessment 27). The essay must be 140-190 words long, and in it the candidate agrees or disagrees with a given prompt. Candidates must prove their ability to give information, express opinions, give reasons, compare and contrast ideas and opinions, and draw a conclusion in English. Further specifications of the criteria include the candidates' need to express their ideas in a clear and logical way, to use a variety of structures and vocabulary, and to make appropriate use of cohesive devices and linking words. Task 2 offers a choice among three task types, which may be an article, an informal or formal letter or email, a report or a review. In this part, candidates "must be able to demonstrate appropriate use of one or more of the following functions: describing, explaining, reporting, giving information, suggesting, recommending, persuading" (28). Moreover, they must also adjust to the requirements of the specific type of writing in terms of format and register, as well as the use of cohesive devices, appropriate vocabulary and structures, and a correct exposition of ideas.
Concerning writing, the CEFR states in its global scale that a B2 learner "can produce clear, detailed text on a wide range of subjects and explain a viewpoint on a topical issue giving the advantages and disadvantages of various options" (61). Three illustrative scales for the writing skill can be found in section 4.4 of the framework, titled 'Communicative language activities and strategies', intended to ease the creation of communicative tasks and the evaluation of the writing ability. The creative writing scale indicates that a B2 learner is able to "write a review of a film, book or play" (62), whereas the essays and reports scale points out his or her ability to develop an argument by giving reasons and details or explaining advantages and disadvantages, and to sum up information from different sources. In light of all this, the tasks proposed by the Cambridge First Certificate are suitable to assess whether or not a candidate has a B2 writing level, and the tasks selected are among those recommended by the Council to assess writing. The table below summarises the Cambridge First Certificate's successful adaptation of the exam tasks to what the CEFR suggests.
| | FCE | CEFR |
|---|---|---|
| Tasks | 1. Essay. 2. Informal letter/e-mail, article, report, formal letter/e-mail or review | Completing forms and questionnaires; writing articles for magazines, newspapers, newsletters, etc.; producing posters for display; writing reports, memoranda, etc.; making notes for future reference; taking down messages from dictation, etc.; creative and imaginative writing; writing personal or business letters |
| Criteria | (Task 1) The candidate must prove his or her ability to give information, give opinions, give reasons, compare and contrast ideas and opinions, and draw a conclusion. (Task 2) The candidate needs to demonstrate appropriate use of one or more of the following functions: describing, explaining, reporting, giving information, suggesting, recommending, persuading | A B2 learner can produce clear, detailed text on a wide range of subjects and explain a viewpoint on a topical issue giving the advantages and disadvantages of various options. Can write a review of a film, book or play. Can write an essay or report. |
As for the instrument used to assess this skill, the Cambridge FCE examiner uses a single rubric to score the two writing compositions. This rubric is analytic because it assesses four criteria separately: content, communicative achievement, organisation and language. The scoring is quantitative, since it uses a numeric scale from 0 (minimum) to 5 (maximum). The rubric is used only to assess the two writing tasks, so it is domain-relevant and skill-focused. Since the main purpose of the certificate is to determine whether the candidate has reached the B2 level, it is a proficiency rubric. Although it is not the candidate's teacher who scores the tasks, the examiner takes over the teacher's role in order to increase the objectivity of the assessment. Finally, it is a paper rubric, even though it can also be found on the Internet.
| Type of rubric | Cambridge FCE |
|---|---|
| How it is measured | Analytic |
| How it is scored | Quantitative |
| Theme | Domain-relevant |
| Application | Skill-focused |
| Function | Proficiency |
| Scorer | Teacher (examiner) |
| Channel | Paper |
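The structure of an analytic rubric of this kind can be sketched as a simple data structure: one band score per criterion, validated against the 0-5 scale. The conversion of the subscores into a total and a mean band shown below is a hypothetical simplification for illustration, not Cambridge's actual scaling procedure.

```python
# Illustrative encoding of an FCE-style analytic writing rubric:
# four criteria, each scored on a 0-5 band.
CRITERIA = ("content", "communicative achievement", "organisation", "language")

def score_writing(subscores):
    """Validate one subscore (0-5) per criterion; return total and mean band."""
    if set(subscores) != set(CRITERIA):
        raise ValueError(f"expected exactly these criteria: {CRITERIA}")
    for criterion, band in subscores.items():
        if not 0 <= band <= 5:
            raise ValueError(f"{criterion}: band {band} outside the 0-5 scale")
    total = sum(subscores.values())
    return total, total / len(CRITERIA)

total, mean_band = score_writing({
    "content": 4,
    "communicative achievement": 3,
    "organisation": 4,
    "language": 3,
})
print(total, mean_band)  # 14 3.5
```

Encoding the criteria explicitly makes the analytic character of the rubric visible: each dimension is judged on its own, and a global mark is derived only afterwards.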
It is now time to determine whether the rubric used by the Cambridge certificate is
suitable or not. In order to do so, all the aspects included and explained in the section
Criteria to create a good rubric or to determine which ones are good will be analysed.
Afterwards, a recording table will summarise all the conclusions of the analysis.
To begin with, Popham's 'rubric for rubrics' will be employed. Its criteria correspond to the following questions: Is the skill assessed actually worthwhile? Are the scoring criteria few but correctly labelled? Are the degrees of excellence described appropriately? Is the rubric presented in a clear and handy way? (Popham 27-28).
➢ Is the skill assessed actually worthwhile?
Yes. The skill assessed is the writing skill. In the literature review section, the importance
of the writing skill has been fully addressed. Moreover, the CEFR includes the writing
skill among the skills used to determine someone’s language level.
➢ Are there few scoring criteria but correctly labelled?
Yes. The scoring criteria are few (four) as Popham recommends and are correctly
labelled.
➢ Are there degrees of excellence described appropriately?
No. There are multiple cells describing the different criteria along the scale. The descriptors are appropriately worded but in some cases somewhat short, and they do not provide examples, which would be advisable. Furthermore, the descriptors for bands 4, 2 and 0 are very vague and imprecise, as they merely indicate a performance between the bands above and below.
➢ Is the rubric presented in a clear and handy way?
Yes. The rubric is clear because it is not excessively long and the design is good and
handy.
With regard to the CEFR recommendations for good rubrics (the necessity of building a feasible tool, with descriptors that are positively worded, brief and not vague), the Cambridge rubric for the FCE is feasible since it contains only four criteria. Furthermore, the CEFR advice on reducing the number of criteria has been followed by grouping categories under the same label: under "Language", both grammar and vocabulary are assessed, and the "Organisation" criterion encompasses cohesive devices together with the structure and format of the composition. The descriptors are positively worded, as encouraged by the Council of Europe. For instance, band 1 of the language criterion, which corresponds to one of the lowest levels, states: "uses everyday vocabulary generally appropriately, while occasionally overusing certain lexis. Uses simple grammatical forms with a good degree of control. While errors are noticeable, meaning can still be determined" (34). The descriptors are brief but, as has already been stated, some of the bands are very vague, as they only include the indication "performance shares features of Bands X and X" as their sole descriptor.
Finally, are the rubric and the test relevant, valid and reliable? On the one hand, the criteria the rubric uses are suitable, as they are relevant to the skill; in fact, the Council of Europe uses three of them (communicative achievement, organisation and language) in the writing scale provided by the CEFR, as the Cambridge Handbook for Teachers indicates (33). The validity can also be easily confirmed, since the test, with the help of the rubric, measures what it is supposed to assess, and the descriptors provided match the evaluation standards included in the CEFR. A simple comparison between the CEFR writing scale and the FCE writing rubric shows the similarities. (All descriptors included in the following table have been taken from the Cambridge FCE Handbook 33-34.)
| | Communicative Achievement | Organisation | Language |
|---|---|---|---|
| CEFR (B2) | Uses the conventions of the communicative task to hold the target reader's attention and communicate straightforward ideas | Text is generally well organised and coherent, using a variety of linking words and cohesive devices | Uses a range of everyday vocabulary appropriately, with occasional inappropriate use of less common lexis. Uses a range of simple and some complex grammatical forms with a good degree of control. Errors do not impede communication. |
| FCE (Band 5) | Uses the conventions of the communicative task effectively to hold the target reader's attention and communicate straightforward and complex ideas, as appropriate. | Text is well organised and coherent, using a variety of cohesive devices and organisational patterns to generally good effect. | Uses a range of vocabulary, including less common lexis, appropriately. Uses a range of simple and complex grammatical forms with control and flexibility. Occasional errors may be present but do not impede communication. |
Reliability

Cambridge Assessment English controls the reliability of its certificates using Cronbach's alpha (the closer the alpha is to 1, the more reliable the test section is) and the Standard Error of Measurement (SEM), which shows the impact of reliability on the likely score of an individual: it indicates how close a test taker's score is likely to be to their 'true score', to within some stated probability. The results of these two tools are summarised in the following table, which Cambridge Assessment English publishes on its web page.

| | Cronbach's alpha | SEM |
|---|---|---|
| Reading | 0.80 | 3.61 |
| Writing | 0.84 | 1.39 |
| Use of English | 0.84 | 3.18 |
| Listening | 0.81 | 2.16 |
| Speaking | 0.84 | 1.50 |
| Total Score | 0.94 | 2.78 |
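The two statistics in the table can be computed as sketched below. The candidate-by-item scores are invented for illustration, and population variance is used throughout for simplicity; operational analyses are run on full candidate populations with item-level response data.

```python
from statistics import pvariance, pstdev

scores = [  # rows: candidates; columns: items (invented data)
    [3, 4, 4, 5],
    [2, 3, 3, 4],
    [4, 4, 5, 5],
    [1, 2, 2, 3],
    [3, 3, 4, 4],
]

k = len(scores[0])
items = list(zip(*scores))            # column-wise (per-item) view
totals = [sum(row) for row in scores] # each candidate's total score

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)
item_var = sum(pvariance(col) for col in items)
alpha = k / (k - 1) * (1 - item_var / pvariance(totals))

# SEM: expected distance between an observed score and the 'true score'
sem = pstdev(totals) * (1 - alpha) ** 0.5

print(round(alpha, 2), round(sem, 2))
```

As the formula shows, the two measures are linked: the higher the reliability coefficient, the smaller the SEM, which matches the pattern in the Cambridge table (e.g. Writing's alpha of 0.84 alongside a SEM of only 1.39).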
The extensive analysis above of the Cambridge First Certificate writing exam and its rubric shows that the exam paper is suitable and matches the CEFR indications and levels, and that the rubric used for its assessment is good and suitable in most respects. However, it is too vague because of the omission of certain band descriptors and the absence of examples. The following table summarises the whole analysis.
| | | |
|---|---|---|
| EXAM | Tasks | 2 |
| | Match CEFR tasks | Yes |
| | Match CEFR criteria | Yes |
| RUBRIC | Type: Measurement | Analytic |
| | Type: Scoring | Quantitative |
| | Type: Theme | Domain-relevant |
| | Type: Application | Skill-focused |
| | Type: Function | Proficiency |
| | Type: Scorer | Examiner |
| | Type: Channel | Paper |
| | Relevant | Yes |
| | Valid | Yes |
| | Reliable | Yes |
| CEFR criteria | Feasible | Yes |
| | Descriptors: Positive | Yes |
| | Descriptors: Brief | Yes |
| | Descriptors: Not vague | NO |
| Popham's rubric | Skill worthwhile | Yes |
| | Scoring criteria (few and well labelled) | Yes |
| | Descriptors (well described) | NO |
| | Clear and handy? | Yes |
IELTS
The International English Language Testing System (IELTS) is aimed at measuring the language proficiency of people who need a certificate to prove their capacity to study or work in English. IELTS is jointly owned by the British Council, IDP: IELTS Australia and Cambridge Assessment English. Two types of test are available: Academic and General Training. The former is oriented to people who wish to study at university in English, while the latter is conceived for candidates who want either to work in English or to study secondary education in English. The main difference from other English certificates, such as the Cambridge ESOL tests (PET, FCE, CAE) or the Trinity ISE, is that IELTS does not separate candidates by level; that is, there are no separate tests for each level's certificate. Instead, all candidates take the same exam and their results determine their level.

Academic Writing is one of the four parts that make up the test. The 60-minute writing test contains two tasks. In writing task 1, candidates must describe a graph or chart in an academic or semi-formal style. In writing task 2, they must write an essay giving their opinion on a particular topic.
A sample for this task could be the following:
“The first car appeared on British roads in 1888. By the year 2000 there may be
as many as 29 million vehicles on British roads.
Alternative forms of transport should be encouraged, and international laws
introduced to control car ownership and use. To what extent do you agree or
disagree?” (IELTS web page. Sample Test Questions)
Whereas task 1 focuses on the “ability to identify the most important and relevant
information and trends in a graph”, task 2 “assesses the ability to present a clear, relevant,
well-organised argument, giving evidence or examples to support ideas and use language
accurately” (IELTS web page. Test Format).
As for the compatibility of the IELTS writing assessment with the CEFR guidelines, the framework makes no reference to the description of any kind of graphic, diagram, table or chart among the recommended tasks for the assessment of writing. However, the CEFR does recommend the use of diagrams as a suitable task for the speaking exam or for reading comprehension, and writing reports or memoranda may also involve the use of graphics and diagrams. IELTS task 1 in fact combines two skills: reading comprehension, since the candidate has to be able to read and understand the information contained in the visual stimulus, and writing, when he or she produces the description. The production of an essay does appear in the framework as a valid task to assess the writing skill.
| | IELTS | CEFR |
|---|---|---|
| Tasks | 1. Graphic/diagram description. 2. Essay | Completing forms and questionnaires; writing articles for magazines, newspapers, newsletters, etc.; producing posters for display; writing reports, memoranda, etc.; making notes for future reference; taking down messages from dictation, etc.; creative and imaginative writing; writing personal or business letters |
| Criteria | (Task 1) Ability to identify the most important and relevant information and trends in a graph, chart, table or diagram, and to give a well-organised overview of it using language accurately in an academic style. (Task 2) Ability to present a clear, relevant, well-organised argument, giving evidence or examples to support ideas, and to use language accurately. | A B2 learner can produce clear, detailed text on a wide range of subjects and explain a viewpoint on a topical issue giving the advantages and disadvantages of various options. Can write a review of a film, book or play. Can write an essay or report. |
The scoring of the IELTS writing test is based on grading scales. In this case, two different rubrics are used: one for the first task and another for the second. Both are analytic, as they individually assess four criteria: task achievement, coherence and cohesion, lexical resource, and grammatical range and accuracy. Both are quantitative, with a numeric band scale from 0 to 9. As they differ depending on the task, they can be described as task-specific as well as domain-relevant. They are proficiency rubrics, as they are targeted at assessing the candidate's level or competence. Both are paper-based, although they can also be found online.
| Type of rubric | IELTS |
|---|---|
| How it is measured | Analytic |
| How it is scored | Quantitative |
| Theme | Domain-relevant |
| Application | Task-specific |
| Function | Proficiency |
| Scorer | Teacher (examiner) |
| Channel | Paper |
It is now time to determine whether the rubrics used by the IELTS are good or not.
➢ Is the skill assessed actually worthwhile?
Yes. The skill assessed is the writing skill. In the literature review section, the importance
of the writing skill has been fully addressed. Moreover, the CEFR includes the writing
skill among the skills used to determine someone’s language level.
➢ Are there few scoring criteria but correctly labelled?
Yes. The scoring criteria are few (four) as Popham recommends and are correctly
labelled.
➢ Are there degrees of excellence described appropriately?
Yes. All the degrees of excellence are fully explained. However, they do not contain any examples.
➢ Is the rubric presented in a clear and handy way?
No. While it is clear, it is not very handy because it is too extensive, since there are ten band levels (0 to 9).
As far as CEFR recommendations for good rubrics are concerned (the necessity of building a feasible tool, with descriptors positively worded, brief and not vague), the IELTS rubric is feasible, as it only contains four different criteria, but it is perhaps excessively long owing to the numeric scale that it uses. The CEFR advice on reducing the number of criteria by grouping them under one clear label has been followed. In relation to descriptors, they are not always positively worded, as encouraged by the Council of Europe. For instance, bands 0 and 1 contain descriptors such as "does not attempt the task in any way" or "does not organise ideas logically", and band 5 contains "makes inadequate, inaccurate or over-use of cohesive devices" (IELTS rubric writing task 1). The descriptors are brief, but they do not contain any sort of example.
Finally, are the rubric and the test relevant, valid and reliable? It can be stated that the criteria the rubric uses are suitable because they are relevant to the skill. In fact, the Council of Europe uses some of them, although under different labels. For example, the "coherence and cohesion" category is equivalent to the CEFR "organisation", and the "lexical resource" and "grammatical range and accuracy" categories are grouped as "language" in the writing scale provided by the CEFR, as the Cambridge Handbook for teachers indicates (33). However, communicative achievement is not assessed in the IELTS exam. The validity can be confirmed in the organisation and language criteria, since the test, with the help of the rubric, measures what it is supposed to assess, and the descriptors provided match the evaluation standards included in the CEFR.
The following table compares both of them:

CEFR (B2)
• Communicative Achievement: Uses the conventions of the communicative task to hold the target reader's attention and communicate straightforward ideas.
• Organisation: Text is generally well organised and coherent, using a variety of linking words and cohesive devices.
• Language: Uses a range of everyday vocabulary appropriately, with occasional inappropriate use of less common lexis. Uses a range of simple and some complex grammatical forms with a good degree of control. Errors do not impede communication.

IELTS (Band 6)
• Communicative Achievement: not assessed.
• Organisation (Coherence and cohesion): Arranges information and ideas coherently and there is a clear overall progression; uses cohesive devices effectively, but cohesion within and/or between sentences may be faulty or mechanical. May not always use referencing clearly or appropriately.
• Language (Lexical resource): Uses an adequate range of vocabulary for the task. Attempts to use less common vocabulary but with some inaccuracy. Makes some errors in spelling and/or word formation, but they do not impede communication.
• Language (Grammatical range): Uses a mix of simple and complex sentence forms. Makes some errors in grammar and punctuation but they rarely reduce communication.
Reliability
The IELTS also monitors its reliability regularly. As stated on its own website, research conducted in 2015 investigated the reliability of the test using Cronbach's alpha and the SEM. The results were the following (ielts.org):
Alpha SEM
Listening Paper 0.92 0.37
Academic Reading 0.90 0.38
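To illustrate how these two coefficients relate, the sketch below computes Cronbach's alpha from a matrix of item scores and derives the SEM from the standard deviation of the total scores (SEM = SD × √(1 − reliability)). The score matrix is invented for illustration and is not taken from the IELTS study.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (candidates x items) score matrix."""
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def standard_error_of_measurement(scores: np.ndarray, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability)."""
    return scores.std(ddof=1) * np.sqrt(1 - reliability)

# Hypothetical scores of six candidates on four items (invented data)
scores = np.array([
    [6, 7, 6, 7],
    [5, 5, 6, 5],
    [8, 8, 7, 8],
    [4, 5, 4, 4],
    [7, 6, 7, 7],
    [6, 6, 5, 6],
])

alpha = cronbach_alpha(scores)
sem = standard_error_of_measurement(scores.sum(axis=1), alpha)
print(f"alpha = {alpha:.2f}, SEM = {sem:.2f}")  # alpha = 0.95, SEM = 1.07
```

The higher the alpha, the smaller the SEM relative to the score spread, which is why the two figures are usually reported together, as in the IELTS table above.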
The above extensive analysis of the IELTS writing exam, tasks and rubric shows that the exam paper is suitable and matches the CEFR in some indications and levels but not in others, and that the rubric used for the scoring falls short of the effectiveness and validity requirements in several aspects. The following table summarises the analysis.
EXAM Tasks: 2
Match CEFR tasks: Task 1: No
Task 2: Yes
Match CEFR criteria: Yes
RUBRIC Type Measurement Analytic
Scoring Quantitative
Theme Domain-relevant
Application Task-specific
Function Proficiency
Scorer Examiner
Channel Paper
Relevant Yes
Valid Yes
Reliable Yes
CEFR criteria Feasible NO
Descriptors Positive NO
Brief Yes
Not vague NO
Popham’s rubric Skill worthwhile Yes
Scoring Criteria
(few and well labelled)
Yes
Descriptors (well
described)
Yes
Clear and Handy? NO
TRINITY COLLEGE ISE II
The Trinity College London Integrated Skills in English (ISE II) certificate, which certifies a B2 competence (equivalent to the CEFR B2), divides the exam into two modules: on the one hand, the reading and writing exam; and on the other, the listening and speaking exam. This division makes it different from all the other proficiency certificates and offers a much more integrated approach. The first module lasts for two hours and contains four tasks, two of which examine the writing ability of the candidate (tasks 3 and 4).
To begin with, task 3 is called “Reading into writing task” and connects the previous two
tasks (which focus principally on the assessment of the reading comprehension of the
candidates) with the following ones, aimed more at assessing writing production. The
candidate has to write a composition of around 180 words based on the four reading
texts in task 2. The task aims to measure the ability to:
“identify information that is relevant to the writing prompt; identify common
themes and links across multiple texts; paraphrase and summarise factual ideas,
opinions, arguments and/or discussion; synthesise such information to produce a
coherent response to suit the purpose” (Trinity College London 13)
The type of composition can be one of the following: descriptive essay, discursive essay,
argumentative essay, article (magazine or online), informal email or letter, formal email
or letter, review, report. The same genres can be the object of task 4, which is normally referred to as "extended writing". In this task, the student must also write a response to a prompt with the same number of words as in the previous task. The focus here is solely on the "ability to produce a clear and detailed response to a prompt" (13). The writing topic of
this task may be related to one of the following issues:
• Society and living standards
• Personal values and ideals
• The world of work
• Natural environmental concerns
• Public figures past and present
• Education
• National customs
• Village and city life
• National and local produce and products
• Early memories
• Pollution and recycling
Candidates are advised to spend about 40 minutes on each of the writing tasks.
According to the CEFR guidelines for the assessment of the writing skill in a B2 level,
the ISE II would be highly suitable for assessing this skill, due to both the tasks it uses
and the criteria they assess.
ISE-II
Tasks:
3. Reading into Writing
4. Extended Writing
Types (both):
➢ Descriptive essay
➢ Discursive essay
➢ Argumentative essay
➢ Article (magazine or online)
➢ Informal email or letter
➢ Formal email or letter
➢ Review
➢ Report
Criteria:
A candidate who passes ISE II Writing can:
➢ synthesise and evaluate information and arguments from a number of sources
➢ express news and views effectively in writing and relate to the views of others
➢ write clear, detailed texts on a variety of subjects related to his or her interests, following established conventions of the text type concerned
➢ write clear, detailed descriptions of real or imaginary events and experiences, marking the relationship between ideas in clear, connected text
➢ write an essay or report that develops an argument systematically, gives reasons and relevant details, and highlights key points
➢ explain the advantages and disadvantages of various options
➢ evaluate different ideas or solutions to a problem
➢ summarise a range of factual and imaginative texts, e.g. news items, interviews or documentaries
➢ discuss and contrast points of view, arguments and the main themes
➢ summarise the plot and sequence of events in a film or play.

CEFR
Tasks:
• completing forms and questionnaires
• writing articles for magazines, newspapers, newsletters, etc.
• producing posters for display
• writing reports, memoranda, etc.
• making notes for future reference
• taking down messages from dictation, etc.
• creative and imaginative writing
• writing personal or business letters
Criteria:
A B2 learner can produce clear, detailed text on a wide range of subjects and explain a viewpoint on a topical issue giving the advantages and disadvantages of various options. Can write a review of a film, book or play. Can write an essay or report.
For the assessment of the writing tasks, Trinity College London examiners must use two rubrics, one per task. The task 3 "reading into writing" scale is analytic and assesses four different criteria: reading for writing; task fulfilment; organisation and structure; and language control. According to the scoring, it is quantitative, with a numeric scale from 0 to 4. It is only used to assess the writing skill, so it is domain-relevant, and also task-specific, since it is used only to assess task 3 in particular and cannot be used for assessing any other task. It is a proficiency rubric, owing to the nature of the certificate; it is applied by an examiner; and it is a paper rubric, although it can be found online.
The task 4 "extended writing" rating scale is also analytic but with only three criteria: task fulfilment; organisation and structure; and language control. The rest of the classifications
are the same (quantitative, domain-relevant, task-specific, proficiency, examiner and
paper).
Type of rubric according to ISE-II (Task 3 and 4)
How it is measured Analytic
How it is scored Quantitative
Theme Domain-relevant
Application Task-specific
Function Proficiency
Scorer Teacher (examiner)
Channel Paper
It is time to analyse the rubric in depth:
➢ Is the skill assessed actually worthwhile?
Yes. The skill assessed is the writing skill. In the literature review section, the importance
of the writing skill has been fully addressed. Moreover, the CEFR includes the writing
skill among the skills used to determine someone’s language level.
➢ Are there few scoring criteria but correctly labelled?
Yes. The scoring criteria are few (four or three) as Popham recommends and are correctly
labelled.
➢ Are there degrees of excellence described appropriately?
Yes, the descriptors are fully described. However, no examples are provided.
➢ Is the rubric presented in a clear and handy way?
No. While the rubrics are clear, they are not very handy: the task 3 rubric alone is two pages long, and although the descriptors are well described, their length makes the rubrics cumbersome to use.
Concerning the CEFR recommendations for good rubrics (the necessity of building a feasible tool, with descriptors positively worded, brief and not vague), the ISE-II rubrics are feasible, since they contain only four and three different criteria respectively, but they are perhaps too long owing to the long descriptors they contain. The CEFR advice on reducing the number of criteria by grouping them under one clear label has been followed. In relation to descriptors, they are not always positively worded, as encouraged by the Council of Europe. For instance, score degree 1 says "poor achievement of the communicative aim" or "errors frequently impede understanding" (43).
Finally, are the rubric and the test relevant, valid and reliable? It can be stated that the criteria the rubrics use are suitable, as they are relevant to the skill. In fact, the Council of Europe uses some of them, although under different labels. For example, the "task fulfilment" category is equivalent to the CEFR "communicative achievement". The validity can be confirmed in the organisation and language criteria, since the test, with the help of the rubric, measures what it is supposed to assess, and the descriptors provided match the evaluation standards included in the CEFR.
The following table compares both of them:

CEFR (B2)
• Communicative Achievement: Uses the conventions of the communicative task to hold the target reader's attention and communicate straightforward ideas.
• Organisation: Text is generally well organised and coherent, using a variety of linking words and cohesive devices.
• Language: Uses a range of everyday vocabulary appropriately, with occasional inappropriate use of less common lexis. Uses a range of simple and some complex grammatical forms with a good degree of control. Errors do not impede communication.

ISE-II
• Task fulfilment: Excellent achievement of the communicative aim. Excellent awareness of the writer–reader relationship (i.e. appropriate use of standard style and register throughout the text). All requirements (i.e. genre, topic, reader, purpose and number of words) of the instruction appropriately met.
• Organisation and structure: Effective organisation of text. Very clear presentation and logical development of most ideas and arguments, with appropriate highlighting of significant points and relevant supporting detail. Appropriate format throughout the text. Effective signposting.
• Language control: Wide range of grammatical items relating to the task with good level of accuracy. Wide range of lexical items relating to the task with good level of accuracy. Any errors do not impede understanding. Excellent spelling and punctuation.
Reliability
The Trinity College London ISE-II does not provide any data on the Cronbach’s Alpha
coefficient or the average SEM of the test.
The above extensive analysis of the ISE-II writing exam, tasks and rubrics shows that the exam paper is suitable and matches the CEFR in some indications and levels but not in others, and that the rubrics used for the scoring fall short of the effectiveness and validity requirements in some aspects.
The following table summarises the analysis:
EXAM Tasks: 2
Match CEFR tasks: Yes
Match CEFR criteria: Yes
RUBRIC Type Measurement Analytic
Scoring Quantitative
Theme Domain-relevant
Application Task-specific
Function Proficiency
Scorer Examiner
Channel Paper
Relevant Yes
Valid Yes
Reliable Unknown
CEFR criteria Feasible NO
Descriptors Positive No
Brief No
Not vague YES
Popham’s rubric Skill worthwhile Yes
Scoring Criteria
(few and well labelled)
Yes
Descriptors (well
described)
Yes
Clear and Handy? NO
ACLES
The Association of Language Centres in Higher Education (ACLES) certifies whether or not a learner has attained a given level through an exam in which the four skills are measured individually. The B2 accreditation assesses separately speaking expression and interaction, listening comprehension, writing production and reading comprehension.
The writing production paper lasts for 70-90 minutes. Candidates must write two texts, which can be descriptive, narrative, informative or argumentative. Each text must be at least 125 words long, and the two together must not exceed 450 words. Possible tasks include: an informal letter telling about news, experiences or feelings; answering a professional letter or email; answering an advertisement; writing a CV and a cover letter; summarising and stating an opinion; writing a text for a magazine or forum; or writing instructions (ACLES, Estructura Exámenes).
In the ACLES accreditation model document, it is stated that the criteria for the task are the same as those indicated in the CEFR. However, the specific criteria on which the writing tasks focus are not spelled out. As for the tasks, they are suitable and match those indicated by the Council of Europe.
ACLES
Tasks:
• informal letter to tell about news, experiences or feelings
• answer a professional letter or email; answer an advertisement
• write a CV and a cover letter
• summarise and state an opinion
• write a text for a magazine or forum or write instructions
Criteria: not specified.

CEFR
Tasks:
• completing forms and questionnaires
• writing articles for magazines, newspapers, newsletters, etc.
• producing posters for display
• writing reports, memoranda, etc.
• making notes for future reference
• taking down messages from dictation, etc.
• creative and imaginative writing
• writing personal or business letters
Criteria:
A B2 learner can produce clear, detailed text on a wide range of subjects and explain a viewpoint on a topical issue giving the advantages and disadvantages of various options. Can write a review of a film, book or play. Can write an essay or report.
A rubric is used to assess both tasks of the writing paper. This rubric is analytic and contains four different criteria (task adequacy; organisation and register; grammar and vocabulary; orthography and punctuation). It is a quantitative rubric, with a numeric scale from 1 to 10 grouped into five degrees of excellence (1-2; 3-4; 5-6; 7-8; 9-10), and also a qualitative word scale of five levels (from "very deficient" to "very well" for a B2 level). It is domain-relevant, since it is a writing scale, and also skill-focused. Furthermore, the rubric can be classified as a proficiency rubric; the scorer is an examiner; and it is a paper rubric.
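The double numeric-qualitative scale can be sketched as a simple mapping from the 1-10 score to its five degrees of excellence. Only the two endpoint labels are given in the ACLES documents; the intermediate English labels below are hypothetical placeholders added for illustration.

```python
# Mapping of the ACLES 1-10 numeric scale to its five degrees of excellence.
# The endpoint labels follow the ACLES wording; the intermediate labels are
# hypothetical placeholders, as only the endpoints are named in the source.
BANDS = {
    (1, 2): "very deficient",
    (3, 4): "deficient",
    (5, 6): "acceptable",
    (7, 8): "good",
    (9, 10): "very well for a B2 level",
}

def band_label(score: int) -> str:
    """Return the qualitative degree for a numeric score on the 1-10 scale."""
    for (low, high), label in BANDS.items():
        if low <= score <= high:
            return label
    raise ValueError(f"score {score} is outside the 1-10 scale")

print(band_label(9))  # -> "very well for a B2 level"
```

Pairing each numeric band with a verbal degree in this way is what makes the ACLES rubric both quantitative and qualitative at once.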
Type of rubric according to ACLES
How it is measured Analytic
How it is scored Quantitative and Qualitative
Theme Domain-relevant
Application Skill-focused
Function Proficiency
Scorer Teacher (examiner)
Channel Paper
The in-depth analysis of the rubric is as follows:
➢ Is the skill assessed actually worthwhile?
Yes. The skill assessed is the writing skill. In the literature review section, the importance
of the writing skill has been fully addressed. Moreover, the CEFR includes the writing
skill among the skills used to determine someone’s language level.
➢ Are there few scoring criteria but correctly labelled?
No. The scoring criteria are few (four), as Popham recommends, but they are not correctly labelled. Orthography could be included under the "grammar and vocabulary" category, and punctuation under the "organisation and register" category.
➢ Are there degrees of excellence described appropriately?
Yes, the descriptors are fully described. However, no examples are provided.
➢ Is the rubric presented in a clear and handy way?
Yes, the rubric is easy to use and clear.
Regarding the CEFR recommendations for good rubrics (the necessity of building a feasible tool, with descriptors positively worded, brief and not vague), the ACLES rubric is feasible, since it contains four different criteria which are well described but not too long. In relation to descriptors, they are not always positively worded, as encouraged by the Council of Europe. For instance, score degree 3-4 contains wordings such as "bastantes errores" ("quite a few errors") or "dominio insuficiente" ("insufficient command"). Nevertheless, a clear attempt to use positively worded descriptors is observed in most of them.
Finally, are the rubric and the test relevant, valid and reliable? The criteria the writing tasks intend to measure are not specified, so it is difficult to state whether they are relevant to the skill. Since the tasks are adapted from those mentioned in the CEFR, and the criteria are stated to be the same as those described in the framework, the paper should be relevant, but this cannot be stated with certainty. The validity can be confirmed in the organisation and language criteria, since the test, with the help of the rubric, measures what it is supposed to assess, and the descriptors provided match the evaluation standards included in the CEFR.
The following table compares both of them:

CEFR (B2)
• Communicative Achievement: Uses the conventions of the communicative task to hold the target reader's attention and communicate straightforward ideas.
• Organisation: Text is generally well organised and coherent, using a variety of linking words and cohesive devices.
• Language: Uses a range of everyday vocabulary appropriately, with occasional inappropriate use of less common lexis. Uses a range of simple and some complex grammatical forms with a good degree of control. Errors do not impede communication.

ACLES (9-10)
• Communicative Achievement: "Tanto las ideas simples como las complejas se comunican con claridad. El texto está bien estructurado." [Both simple and complex ideas are communicated clearly. The text is well structured.]
• Organisation: "Enlaza las frases y conceptos apropiadamente. El formato, el tono y el estilo son apropiados." [Sentences and concepts are linked appropriately. Format, tone and style are appropriate.]
• Language: "Demuestra un dominio muy bueno de estructuras gramaticales, simples y complejas. Usa un vocabulario amplio, conforme con lo esperado del nivel B2. Usa con acierto algunas poco comunes." [Shows a very good command of simple and complex grammatical structures. Uses a wide vocabulary, in line with what is expected at B2 level. Uses some less common items accurately.]
Reliability
The ACLES web page does not provide the Cronbach’s Alpha coefficient or the average
SEM; nor does it give any information about any reliability research data.
The following table summarises the whole analysis.
EXAM Tasks: 2
Match CEFR tasks: Yes
Match CEFR criteria: Unknown
RUBRIC Type Measurement Analytic
Scoring Quantitative and Qualitative
Theme Domain-relevant
Application Skill-focused
Function Proficiency
Scorer Examiner
Channel Paper
Relevant Unknown
Valid Yes
Reliable Unknown
CEFR criteria Feasible Yes
Descriptors Positive No
Brief YES
Not vague YES
Popham’s rubric Skill worthwhile Yes
Scoring Criteria
(few and well labelled)
NO
Descriptors (well
described)
Yes
Clear and Handy? Yes
EOI
The Official School of Languages (EOI) offers English courses ordered by level. The B2 level is called "Avanzado 2" and grants a title which certifies that level if the exam is passed. Although the basics of the different Schools of Languages in Spain are the same, each autonomous community is responsible for the creation and evaluation of the certificate exam. The exam, criteria and rubrics analysed below correspond to the EOI Gijón (Principado de Asturias), the reason being that this School publishes its rubrics online so that all students can access them, whereas the EOI of A Coruña does not publish any rubric online and does not allow students to see the rubrics.
The paper for the assessment of writing expression and interaction consists of one or two
tasks. The tasks will be between 75 words and 250 words long. One of the tasks might be
completing or writing a text following the information given, and the other task may be a
free composition about a prompt given. The writing paper lasts for 90 minutes. The
possible types of tasks that can appear are the following: writing a personal letter or email
describing feelings and experiences, writing a formal letter, writing a text about personal
experiences or events, writing an opinion or argumentative texts, writing about routines
in present or past, describing people, objects or places, describing a picture, telling a story,
correcting or completing a letter, taking notes or summarising a conference or a film,
removing illogical words from a text, cloze, ordering a text (Escuela Oficial de Idiomas
de Gijón. Departamento de Inglés 175-176). As for the criteria that the candidates must meet, the paper intends to assess their ability to: write notes to transmit simple
information, write letters with news and points of view, write reports which develop an
argumentation, write reviews, write down structured notes on relevant information from
a lecture or conference, summarise texts, organize texts according to the text typology,
use cohesive devices, use wide range of vocabulary, use a wide range of grammar
structures, express opinions (174-175).
If the exam tasks and criteria are compared to those proposed in the framework for the assessment of the writing skill, it can be stated that the exam is suitable, since it uses most of the tasks contained in the CEFR and the criteria also match.
EOI
Tasks (1 or 2 among the following):
• writing a personal letter or email describing feelings and experiences
• writing a formal letter
• writing a text about personal experiences or events
• writing an opinion or argumentative text
• writing about routines in present or past
• describing people, objects or places
• describing a picture
• telling a story
• correcting or completing a letter
• taking notes or summarising a conference or a film
• removing illogical words from a text
• cloze
• ordering a text
Criteria (ability to):
• write notes to transmit simple information
• write letters with news and points of view
• write reports which develop an argumentation
• write reviews
• write down structured notes on relevant information from a lecture or conference
• summarise texts
• organise texts according to the text typology
• use cohesive devices
• use a wide range of vocabulary and a wide range of grammar structures
• express opinions

CEFR
Tasks:
• completing forms and questionnaires
• writing articles for magazines, newspapers, newsletters, etc.
• producing posters for display
• writing reports, memoranda, etc.
• making notes for future reference
• taking down messages from dictation, etc.
• creative and imaginative writing
• writing personal or business letters
Criteria:
A B2 learner can produce clear, detailed text on a wide range of subjects and explain a viewpoint on a topical issue giving the advantages and disadvantages of various options. Can write a review of a film, book or play. Can write an essay or report.
The writing paper is assessed with an analytic rubric. It assesses seven different aspects or criteria (orthography, punctuation, grammar, vocabulary, register, interaction and discourse management), and two of them are subdivided into a further two. As for the scoring, it is both qualitative and quantitative, as the six levels of excellence are described with a number (from 0 to 5) and with words. It is a domain-relevant and skill-focused rubric, and it is also a proficiency rubric. It is a paper rubric and it is used by an examiner.
Type of rubric according to EOI
How it is measured Analytic
How it is scored Quantitative and qualitative
Theme Domain-relevant
Application Skill-focused
Function Proficiency
Scorer Teacher (examiner)
Channel Paper
Here is the in-depth analysis:
➢ Is the skill assessed actually worthwhile?
Yes. The skill assessed is the writing skill. In the literature review section, the importance
of the writing skill has been fully addressed. Moreover, the CEFR includes the writing
skill among the skills used to determine someone’s language level.
➢ Are there few scoring criteria but correctly labelled?
No, the number of scoring criteria is excessively high. Moreover, the labels are much too
long.
➢ Are there degrees of excellence described appropriately?
Yes, the descriptors are described suitably.
➢ Is the rubric presented in a clear and handy way?
No. Despite being clear, the rubric is excessively long, which makes it rather unhandy.
As for the CEFR recommendations for good rubrics (the necessity of building a feasible tool, with descriptors positively worded, brief and not vague), the EOI rubric is not feasible, since it contains seven different criteria. Furthermore, the CEFR advice on reducing the number of criteria by grouping them under one clear label has not been followed. The labels should be regrouped and shortened to ease the examiner's task. For example, the orthography, vocabulary and grammar criteria could be assessed within the same criterion. In relation to descriptors, they are generally positively worded, as encouraged by the Council of Europe, but there are also some negatively worded ones. For instance, "the candidate is not able to exchange, ask or comment information" (Interaction, 0 points). The descriptors are quite long. Besides, there are so many of them that it is almost impossible for the examiner to score a candidate quickly or to distinguish between two adjacent levels.
Finally, are the rubric and the test relevant, valid and reliable? It can be stated that the criteria the rubric uses are suitable because they are relevant to the skill. In fact, the Council of Europe uses three of them (communicative achievement, organisation and language) in the writing scale provided by the CEFR. The validity can also be easily confirmed, since the test, with the help of the rubric, measures what it is supposed to assess, and the descriptors provided match the evaluation standards included in the CEFR.
CEFR (B2)
• Communicative Achievement: Uses the conventions of the communicative task to hold the target reader's attention and communicate straightforward ideas.
• Organisation: Text is generally well organised and coherent, using a variety of linking words and cohesive devices.
• Language: Uses a range of everyday vocabulary appropriately, with occasional inappropriate use of less common lexis. Uses a range of simple and some complex grammatical forms with a good degree of control. Errors do not impede communication.

EOI (5 points)
• Communicative Achievement: Describes, presents situations, necessities, facts and opinions, explains and gives reasons in a proficient way with no difficulty. Writes with fluency and ease.
• Organisation: Builds a coherent and clear discourse which adjusts to the required text typology and the paragraph and organisation conventions. Uses with ease cohesive devices and key words and phrases.
• Language (Grammar/Vocabulary): Uses a wide range of grammar structures and uses precise and varied vocabulary.
Reliability
The web page of the School of Languages of Gijón does not give any data related to any reliability coefficient or research.
EXAM Tasks: 1-2
Match CEFR tasks: Yes
Match CEFR criteria: YES
RUBRIC Type Measurement Analytic
Scoring Quantitative and qualitative
Theme Domain-relevant
Application Skill-focused
Function Proficiency
Scorer Examiner
Channel Paper
Relevant Yes
Valid Yes
Reliable Unknown
CEFR criteria Feasible NO
Descriptors Positive NO
Brief NO
Not vague Yes
Popham’s rubric Skill worthwhile Yes
Scoring Criteria No
(few and well labelled)
Descriptors (well
described)
YES
Clear and Handy? NO
6.3. Speaking
Productive skills such as writing and speaking are increasingly assessed with the help of rubrics. Although the appearance of the rubric is closely linked to the assessment of written compositions and essays, its use for the assessment of the speaking skill is becoming commonplace. In Spain, the encouragement of the communicative approach and the development of the Common European Framework of Reference for Languages have led many teachers to implement this tool in order to evaluate the speaking performances of their students. The usual tasks used to assess speaking are presentations, descriptions of pictures, interviews, discussions and dialogues between students.
The CEFR points out that oral production tasks can include (58):
• public address (information, instructions, etc.)
• addressing audiences (speeches at public meetings, university lectures, sermons, entertainment, sports commentaries, sales presentations, etc.)
In addition, it is indicated that the assessment of the speaking skills can include some of
the following tasks: reading a written text aloud; speaking from notes, or from a written
text or visual aids (diagrams, pictures, charts, etc.); acting out a rehearsed role; speaking
spontaneously; singing (CEFR 58).
6.3.1. Literature Review
The evaluation of speaking is relatively new, since, in the past, traditional methodologies such as the Grammar-Translation Method were dominant in the teaching of languages in our country. That situation has fortunately changed in recent years, mainly thanks to the popularity of the communicative approach and the establishment of the CEFR, as has already been explained.
Many authors have argued that "communication skills are critical for intellectual development, career trajectory, and civic engagement" (Schreiber et al. 207). The introduction of the speaking competence in the classroom brings with it the necessity of significant changes in terms of teaching methodology and assessment. Students must practise this skill in class, and the teacher needs to check their learning through the evaluation of this skill.
The assessment of a speaking task is completely different from the traditional assessment
of grammar or vocabulary exercises. There are no right or wrong answers, and many
other factors may be taken into account. In order to assess a speaking performance,
some dimensions relate to the language itself, such as grammar or vocabulary, while
others are strictly related to the performance or the delivery of the message, such as
articulation or non-verbal behaviour.
Bygate states that "the process of speaking includes three main phases: conceptualizing
the message content, formulating the message linguistically, and articulating the message"
(cited in Baitman and Veliz 177). Other authors, such as Alderson and Bachman,
distinguish only two types of knowledge: language knowledge and textual knowledge
(cited in Baitman and Veliz 179). The language type refers to syntax, phonology and
vocabulary; the textual type refers to organisation and cohesion.
The assessment of speaking in an L2 is particularly complex. Baitman and Veliz have
explained some of the problems that can be faced. Among them, the simulation of a
real language-use situation may be hard to achieve, in addition to being costly in terms
of time and resources. Many studies have been conducted in order to provide the
community with a better scope on the assessment of this skill. Schreiber et al., for
instance, reviewed some of them, such as that carried out by Quianthy and Haffering,
who considered that topic choice, fulfilment of the oral discourse, purpose determination,
supporting materials and strategic word choice were the dimensions to be assessed with
regard to the composition of the message, whereas vocal variety, articulation, language
use and non-verbal behaviour must also be scored (cited in Schreiber et al. 208).
Through an extensive collection of speaking rubrics and studies on them, it was
determined that the dimensions and criteria used in the assessment of speaking are often
the same or very similar, although they may go by different names. Range of vocabulary,
grammar (linguistic control), organisation or structure, linking devices (connectors,
cohesion), content (topic development) and pronunciation (articulation, vocal expression)
appear in 95% of the rubrics reviewed. As for those which include the performance or
delivery, non-verbal behaviour, eye contact and the use of supporting materials are the
most frequent dimensions.
With reference to the assessment of speaking, numerous studies have dealt with different
problems and related issues. The one conducted by Emrah Ekmekçi and the one
mentioned above by Brittany Baitman and Mauricio Veliz are just two examples of case
studies which examined whether there are significant differences between the
assessments made by native-speaker and non-native-speaker teachers. However, they did
not reach the same conclusions. While the former, carried out with 80 EFL students and
6 teachers (3 NNES and 3 NES) and a 20-point analytic scale, found no difference
(Ekmekçi 104), the latter, in which 12 teachers participated and scored 4 TOEFL
independent tasks with an analytic scale, showed that NNES gave lower scores than NES
(Baitman and Veliz 186). The rubric used was analytic and measured accuracy, fluency,
pronunciation and vocabulary. Native-speaker teachers tended to give more importance
to fluency and pronunciation, whereas non-native teachers scored grammatical accuracy
and vocabulary more stringently
(ibid. 191). Along the same lines, the study conducted by Zhang and Elder in China with
30 test-takers and holistic numerical ratings from 39 examiners (19 NES and 20 NNES)
showed no significant difference, although it did allow the researchers to ascertain that
linguistic features were considered more relevant by NNES, while NES teachers tended
to focus more on interaction, compensation strategies and demeanour (cited in Baitman
and Veliz).
Another pertinent matter regarding the assessment of the oral skill seems to be the
reliability and validity of peer-, self- and teacher assessment of the oral productions of
EFL learners, as shown by the high number of investigations which dealt with this issue.
With the intention of answering questions such as how reliable each of them is, and
whether there exists any correlation among the three types of assessment for oral and
written productions, Salehi and Sayyar published an article in the International Journal of
Assessment and Evaluation in Education in which their study is explained. Thirty-two
students from three English Language Teaching institutes acted as self- and peer-
assessors, and two experienced teachers as the teacher assessors. The students attended a
seminar in which it was explained to them what self- and peer-assessment are and how to
carry them out. The results were very positive in terms of reliability, as high inter-rater
reliability was found when comparing all the peer assessments (14). The
correlation between the two teachers was strong in the teacher assessment of the oral
production. However, in the matter of correlation between peer- and teacher-assessment
and self-assessment and teacher-assessment, the results were dissimilar. Whilst in the
assessment of the written production the correlation was found to be strong and high
(r=.85 and r=.79 respectively), no significant correlation was noted in the scores of oral
productions given by the teachers and the self-assessment (r=.30). Correlation between
peer-assessment and teacher-assessment was significant (r=.61) but still lower than the
correlation achieved in the written production (16). The results suggest that peer-
assessment is more reliable than self-assessment in oral production.
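The inter-rater coefficients reported above can be illustrated with a short computation. The sketch below is not based on the authors' data: the two score lists are invented, and only the statistic itself (Pearson's r) corresponds to the kind of coefficient reported by Salehi and Sayyar.

```python
# Illustrative sketch: computing an inter-rater correlation coefficient of the
# kind reported by Salehi and Sayyar. The marks below are hypothetical.
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical marks (0-10) given by two teachers to the same oral performances
teacher_1 = [6, 7, 8, 5, 9, 6, 7, 8]
teacher_2 = [6, 8, 7, 5, 9, 7, 7, 8]

print(round(pearson_r(teacher_1, teacher_2), 2))  # → 0.88
```

With perfectly matching score lists the function returns 1.0; the closer the coefficient is to 0, the weaker the agreement between the two raters.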
One of the main disadvantages often outlined when speaking about the evaluation of the
speaking skill is how time-consuming it may be. Jackson and Ward state that “assessing
public speaking on an individual basis, especially in larger cohorts, is very time
demanding in terms of organizing sessions, staff availability and feedback” (2). Because
of this, they wanted to create and test the use of a rubric and to identify which factors
might affect the variation among markers so that rubrics could be used in the future as a
reliable tool able to speed up and standardise feedback. Thirty-two international students
participated, and three academic markers scored their public speaking performances. The
public speaking rubric used was the one developed by Schreiber et al. in the above-
mentioned research, which included ten aspects related to the two main categories of
content and delivery. It was found that the dimensions of speech design and development
and of persona were those which showed the greatest variation among the three scorers'
marks, followed, to a lesser extent, by body language.
Hensley and Brand wrote an essay from the Communication Department at Millikin
University (Illinois) to provide the educational community with information in order to
"prepare and demonstrate the most effective ways to craft and deliver messages
adapted to a wide variety of audiences" (2). They advise showing advanced or exemplary
speeches (both verbal and non-verbal) to the students so that they can learn how to
perform correctly by watching and examining excellent examples. Another suggestion is
the use of supporting materials which back up the ideas of the speech. As for the
organisation of the speech, they recommend structuring it into distinct phases: an
introduction which focuses on catching the audience's attention, the establishment of the
thesis, and a good conclusion. Moreover, natural and fluent transitions must be used
throughout all the phases.
Luu Trong Tuan focused on the use of analytic rubrics to assess speaking performance.
Tuan collected positive and negative aspects of analytic rubrics mentioned in different
works on the subject. Among the advantages is the fact of being a useful diagnostic tool
for students and also for teachers: learners can easily obtain feedback to improve their
performances, and the teacher may, through the information taken from the use of
rubrics, tailor the instruction according to the needs of the students (674). Tuan also
explained that inexperienced scorers might find the analytic rubric more useful, as it does
not require as much expertise as a holistic rubric to obtain reliable results. As for the
disadvantages collected, Tuan mentions the great amount of time it requires as opposed
to holistic rubrics, the decrease of the interconnectedness of spoken discourse, and the
fact that the criterion scored first may have an effect on the evaluation of the subsequent
criteria (674). After the literature review on the matter, Tuan carried out a study with 104
students divided into two groups. The experimental group (51 students) was scored with
an analytic rubric, while the control group (53 students) was assessed with a holistic one.
The teacher responsible for scoring was the same for both groups. The final test of the
previous semester was used as "pre-test", while the six speaking tests during the semester
formed the "post-test". The analytic rubric used contained five criteria: coherence,
content, grammar and structure, language, and organisation. Furthermore, a questionnaire
consisting of six items on a six-point Likert scale was administered. The speaking
competence of the students was almost the same according to the pre-test results. After
the six tests of the post-test phase, it was found that the students in the control group
(scored with a holistic rubric) did not improve much from the first test to the sixth: they
obtained an average mark of 6.31 in the first test and 6.58 in the final one. However, the
experimental group (scored with the analytic rubric) showed an increase from an average
mark of 6.33 in the first test to 7.06 in the last. The improvement was highest for the
content criterion. This suggests that analytic rubrics help students' improvement more
than holistic ones.
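The arithmetic behind this comparison is simply the difference between group averages. The sketch below uses the averages reported in Tuan's study; the helper function itself is an illustrative assumption, not part of the study.

```python
# Sketch of the pre-/post-test comparison reported by Tuan, using the group
# averages given in the study (6.31→6.58 holistic, 6.33→7.06 analytic).
def average_gain(pre_mean, post_mean):
    """Improvement of a group's average mark between pre-test and post-test."""
    return round(post_mean - pre_mean, 2)

holistic_gain = average_gain(6.31, 6.58)   # control group (holistic rubric)
analytic_gain = average_gain(6.33, 7.06)   # experimental group (analytic rubric)

print(holistic_gain, analytic_gain)  # → 0.27 0.73
```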
6.3.2. Assessment of Speaking in the main English Certificates of ESL
Cambridge First Certificate
The Cambridge First Certificate (FCE) speaking paper consists of four different tasks or
parts. These involve “short exchanges with the interlocutor; a 1-minute individual ‘long
turn’; a collaborative task involving the two candidates; a discussion” (Cambridge
English Language Assessment 71). The total duration of the test is 14 minutes.
Each part focuses on different language functions. Part 1 is a conversation between
the examiner and the candidate, in which the examiner asks the candidate
different questions about general personal information (place of birth, family,
hobbies, etc.). The aim of this part is the assessment of the candidate's ability to use
general interactional and social language. In part 2, the candidate is asked to compare two
pictures for around one minute. This task aims to assess how the candidate is able to
organise a larger unit of discourse; compare, describe and express opinions. This task is
also individual, whereas part 3 is collaborative and involves two candidates. Candidates
are given one written question with different written stimuli to discuss together. "The focus
is on sustaining an interaction; exchanging ideas, expressing and justifying opinions,
agreeing and/or disagreeing, suggesting, speculating, evaluating, reaching a decision
through negotiation, etc.” (Cambridge English Language Assessment, 71). Finally, the
last part of the exam is also collaborative and implies a discussion between the two
candidates of some questions asked (orally) by the examiner and related to the previous
task. Candidates are expected to prove their ability to express agreement and
disagreement, give opinions and justify them, and speculate.
The CEFR provides the education community with five different illustrative scales for
the speaking skill (58-61). Those scales can be used to assess the level of:
• Overall spoken production
• Sustained monologue: describing experience
• Sustained monologue: putting a case (e.g. in debate)
• Public announcements
• Addressing audiences
The criteria for B2-level overall spoken production state that a B2 learner "can give
clear, systematically developed descriptions and presentations, with appropriate
highlighting of significant points, and relevant supporting detail” and he or she “can give
clear, detailed descriptions and presentations on a wide range of subjects related to his/her
field of interest, expanding and supporting ideas with subsidiary points and relevant
examples” (CEFR 60). In addition, further specifications for particular functions, such
as describing an experience or engaging in a debate, can be extracted from the other scales
provided. Thus, a B2 learner must be able to describe different topics in detail, give
reasons and support them in discussions and debates or highlight the advantages or
disadvantages of different options. He or she has the ability to give presentations with
clarity and fluency and depart spontaneously from those when follow-up questions are
posed, or interesting points are raised by the audience.
Although only one of the tasks included in the speaking exam (part 1) is among those
mentioned in the CEFR proposal of tasks to assess speaking (public address:
information), the other three are mentioned among the activities that the assessment
of speaking may involve: speaking from pictures (part 2), speaking from written texts
(part 3) and speaking spontaneously (parts 1 and 4). With regard to the criteria the CEFR
gives to assess a B2 learner, the tasks proposed by the FCE do assess them. However, it
would be advisable to include a task involving sustained production of speech by the
candidate, since it is the main task advised in the framework. The following table
summarises and compares both of them.
FCE:
Tasks:
• Part 1: questions
• Part 2: monologue; description of a picture
• Part 3: collaborative task; discussion of a written question with stimuli
• Part 4: collaborative task; discussion of questions related to the previous task
Criteria:
• Part 1: ability to use general interactional and social language
• Part 2: ability to organise a larger unit of discourse; compare, describe and express opinions
• Part 3: sustaining an interaction; exchanging ideas, expressing and justifying opinions, agreeing and/or disagreeing, suggesting, speculating, evaluating, reaching a decision through negotiation, etc.
• Part 4: expressing and justifying opinions, agreeing and/or disagreeing, and speculating

CEFR (B2):
Tasks:
• public address (information, instructions, etc.)
• addressing audiences (speeches at public meetings, university lectures, sermons, entertainment, sports commentaries, sales presentations, etc.)
• reading a written text aloud
• speaking from notes, or from a written text or visual aids (diagrams, pictures, charts, etc.)
• acting out a rehearsed role
• speaking spontaneously
• singing
Criteria:
• clear, systematically developed descriptions and presentations
• supporting ideas with subsidiary points and relevant examples
• describing an experience
• describing different topics in detail
• giving reasons and supporting them in discussions and debates
• highlighting the advantages or disadvantages of different options
• giving presentations with clarity and fluency
• departing spontaneously from discussion when follow-up questions are posed

As for the rubric the FCE uses to score the candidates in the speaking exam, it is an
analytic rubric. It contains four different categories to assess: Grammar and Vocabulary,
Discourse Management, Pronunciation and Interactive Communication. It is also a
quantitative rubric with a numeric scale from 0 to 5, and it is domain-relevant and skill-
focused, since it is used to assess all the tasks of the speaking exam. It is a proficiency
rubric because the certificate aims to determine the level of the candidate. The scorer is a
trained examiner, and it is a paper rubric, although it can also be found on the Internet.

Type of rubric according to Cambridge (FCE):
How it is measured: Analytic
How it is scored: Quantitative
Theme: Domain-relevant
Application: Skill-focused
Function: Proficiency
Scorer: Teacher (examiner)
Channel: Paper
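The structure of an analytic rubric of this kind can be sketched in code. Only the four category names and the 0-5 scale come from the FCE rubric described above; the candidate's bands and the totalling logic are invented for illustration (Cambridge's actual mark aggregation is not reproduced here).

```python
# Minimal sketch of analytic scoring with the four FCE speaking categories.
# The band values and the candidate below are hypothetical.
FCE_CATEGORIES = (
    "Grammar and Vocabulary",
    "Discourse Management",
    "Pronunciation",
    "Interactive Communication",
)

def score_candidate(bands):
    """Check each band is on the 0-5 scale; return the total and the average."""
    for category in FCE_CATEGORIES:
        band = bands[category]
        if not 0 <= band <= 5:
            raise ValueError(f"{category}: band {band} outside the 0-5 scale")
    total = sum(bands[c] for c in FCE_CATEGORIES)
    return total, total / len(FCE_CATEGORIES)

candidate = {
    "Grammar and Vocabulary": 4,
    "Discourse Management": 3,
    "Pronunciation": 5,
    "Interactive Communication": 4,
}
total, average = score_candidate(candidate)
print(total, average)  # → 16 4.0
```

A holistic rubric, by contrast, would collapse these four judgements into a single band, which is precisely the difference the studies reviewed above examine.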
With reference to whether the rubric used by the FCE examiners is a good one, the
analysis is presented below.
➢ Is the skill assessed actually worthwhile?
Yes. The skill assessed is the speaking skill. In the literature review section, the
importance of the speaking skill has been fully addressed. Moreover, the CEFR boosts
the communicative approach so this is probably the most important skill.
➢ Are there few scoring criteria but correctly labelled?
Yes. The scoring criteria are few (four) as Popham recommends and are correctly
labelled.
➢ Are there degrees of excellence described appropriately?
No. There are multiple cells describing the different criteria along the scale. Most
descriptors are adequately worded, but in some cases they are rather short and provide
no examples, which would be advisable. Furthermore, the descriptors for bands 4, 2 and
0 are very vague and imprecise, as they merely indicate a performance between the
bands above and below.
➢ Is the rubric presented in a clear and handy way?
Yes. The rubric is clear because it is not excessively long and the design is good and
handy.
In regard to the CEFR recommendations for good rubrics (the necessity of building a
feasible tool, with descriptors positively worded, brief and not vague), the Cambridge
rubric for the FCE is feasible since it contains only four different criteria. The
descriptors are positively worded, as encouraged by the Council of Europe. For
instance, band 1 of the language criterion, which corresponds to one of the lowest levels,
states: "Shows a good degree of control of simple grammatical forms" (Cambridge
Language Assessment 82). The descriptors are brief but, as has already been stated,
some of the bands are very vague, since their only descriptor reads "performance shares
features of Bands X and X".
Finally, are the rubric and the test relevant, valid and reliable? On the one hand, it can be
stated that the criteria the rubric uses are suitable, as they are relevant to the skill. In fact,
the Council of Europe uses five different criteria to assess speaking (range, accuracy,
fluency, interaction and coherence). Although the labels are not the same, those used by
the FCE measure the same aspects, with one exception. The Grammar and Vocabulary
category of the Cambridge rubric is equivalent to the CEFR's 'accuracy'; the assessment
of 'range', 'coherence' and 'fluency' corresponds to 'discourse management' in the FCE;
and 'interaction' is clearly represented by the 'interactive communication' category. On
the other hand, the FCE proposes the assessment of pronunciation and the CEFR does
not.
The validity can also be easily confirmed, since the test, with the help of the rubric,
measures what it is supposed to assess, and the descriptors provided match the evaluation
standards included in the CEFR. A simple comparison between the CEFR speaking scale
and the FCE speaking rubric shows the similarities. The information has been taken from
the CEFR (28) and the Cambridge First handbook (82).
CEFR (B2):
Range, Fluency and Coherence: Has a sufficient range of language to be able to give
clear descriptions, express viewpoints on most general topics. Can produce stretches of
language with a fairly even tempo; although he/she can be hesitant, there are few
noticeably long pauses. Can use a limited number of cohesive devices to link utterances.
Accuracy: Shows a relatively high degree of grammatical control. Does not make errors
which cause misunderstanding and can correct most of his/her mistakes.
Interaction: Can initiate discourse, take his/her turn when appropriate and end
conversation when he/she needs to. Can help the discussion along on familiar ground
confirming comprehension, inviting others in, etc.

FCE (Band 5):
Discourse Management: Produces extended stretches of language with very little
hesitation. Contributions are relevant and there is a clear organisation of ideas. Uses a
range of cohesive devices and discourse markers.
Grammar and Vocabulary: Shows a good degree of control of a range of simple and
some complex grammatical forms. Uses a range of appropriate vocabulary to give and
exchange views on a wide range of familiar topics.
Interactive Communication: Initiates and responds appropriately, linking contributions
to those of other speakers. Maintains and develops the interaction and negotiates
towards an outcome.
Reliability
Reliability coefficients of the ESOL certificates have already been stated in the writing
section.
The above extensive analysis of the Cambridge First Certificate speaking exam, tasks
and rubric shows that the exam paper is suitable and matches both the CEFR
indications and levels, and that the rubric used for the assessment of the exam is good
and suitable in most aspects. However, it is somewhat vague owing to the omission of
certain band descriptors and the absence of examples. The following table summarises
the analysis.
EXAM
Tasks: 4
Match CEFR tasks: Yes
Match CEFR criteria: Yes

RUBRIC
Type: Measurement: Analytic / Scoring: Quantitative / Theme: Domain-relevant / Application: Skill-focused / Function: Proficiency / Scorer: Examiner / Channel: Paper
Relevant: Yes
Valid: Yes
Reliable: Yes

CEFR criteria
Feasible: Yes
Descriptors: Positive: Yes / Brief: Yes / Not vague: No

Popham's rubric
Skill worthwhile: Yes
Scoring criteria (few and well labelled): Yes
Descriptors (well described): No
Clear and handy: Yes
IELTS
The IELTS speaking exam contains three tasks and has a total duration of 11-14 minutes.
The first part lasts for 4 or 5 minutes. In this part the examiner introduces himself or
herself and asks the candidate a few general questions on familiar topics such as family,
interests, hobbies, etc. The aim of this part is to check the ability to communicate on
everyday topics. In the second task of the test, the learner has to speak on his or her own
for around two minutes on a particular topic given by the examiner on a card with some
points to mention. The candidate is allowed one minute to prepare the task and make
some notes. Afterwards, the examiner asks him or her some related questions. This part is
about 4 minutes long. This task intends to measure the “ability to speak at length on a
given topic (without further prompts from the examiner), using appropriate language and
organising ideas coherently” (IELTS web page. Test format). Finally, task 3 is a
discussion between the examiner and the candidate about a topic related to the previous
part. It “focuses on the ability to express and justify opinions and to analyse, discuss and
speculate about issues.” (ibid.)
The IELTS test format and criteria coincide almost exactly with the CEFR guidelines on
tasks and criteria to assess the speaking performance of a candidate. According to the test
format, the candidate must address audiences and speak from notes (part 2) and also speak
spontaneously (part 1 and 3). Furthermore, the long turn exposition makes it possible to
assess the ability of the candidate to give presentations, support his or her ideas (part 2),
give reasons and support them (part 3), or depart spontaneously (part 3).
IELTS:
Tasks:
• Task 1: interview
• Task 2: long turn (monologue)
• Task 3: discussion
Criteria:
• Part 1: ability to communicate opinions and information on everyday topics and common experiences or situations by answering a range of questions
• Part 2: ability to speak at length on a given topic (without further prompts from the examiner), using appropriate language and organising ideas coherently
• Part 3: ability to express and justify opinions and to analyse, discuss and speculate about issues

CEFR (B2):
Tasks:
• public address (information, instructions, etc.)
• addressing audiences (speeches at public meetings, university lectures, sermons, entertainment, sports commentaries, sales presentations, etc.)
• reading a written text aloud
• speaking from notes, or from a written text or visual aids (diagrams, pictures, charts, etc.)
• acting out a rehearsed role
• speaking spontaneously
• singing
Criteria:
• clear, systematically developed descriptions and presentations
• supporting ideas with subsidiary points and relevant examples
• describing an experience
• describing different topics in detail
• giving reasons and supporting them in discussions and debates
• highlighting the advantages or disadvantages of different options
• giving presentations with clarity and fluency
• departing spontaneously from discussion when follow-up questions are posed

As for the rubric the IELTS uses to score the candidates in the speaking exam, it is an
analytic rubric. It contains four different categories to assess: Fluency and Coherence,
Lexical Resource, Grammatical Range and Accuracy, and Pronunciation. It is also a
quantitative rubric with a numeric scale from 0 to 9; it is domain-relevant and skill-
focused, since it is used to assess all the tasks of the speaking exam. It is a proficiency
rubric because the certificate aims to determine the level of the candidate. The scorer is
a trained examiner, and it is a paper rubric, although it can also be found on the Internet.

Type of rubric according to IELTS:
How it is measured: Analytic
How it is scored: Quantitative
Theme: Domain-relevant
Application: Skill-focused
Function: Proficiency
Scorer: Teacher (examiner)
Channel: Paper
In the matter of whether the rubric used by the IELTS examiners is a good one, the
analysis can be found below.
➢ Is the skill assessed actually worthwhile?
Yes. The skill assessed is the speaking skill. In the literature review section, the
importance of the speaking skill has been fully addressed. Moreover, the CEFR boosts
the communicative approach so this is probably the most important skill.
➢ Are there few scoring criteria but correctly labelled?
Yes. The scoring criteria are few (four) as Popham recommends and are correctly
labelled.
➢ Are there degrees of excellence described appropriately?
Yes. There is enough description for all of them, although they do not contain any
examples.
➢ Is the rubric presented in a clear and handy way?
No, the grading scale is too long so the scoring process can be tedious and confusing.
As regards the CEFR recommendations for good rubrics (the necessity of building a
feasible tool, with descriptors positively worded, brief and not vague), the IELTS rubric
contains only four different criteria; nevertheless, the extension of the grading scale (ten
different levels) makes it difficult to use and compromises its feasibility. In relation to
the descriptors, they are not always positively worded as encouraged by the Council of
Europe. For example, descriptors such as "cannot produce basic sentence forms" or
"cannot respond without noticeable pauses" can be read in band 2 (IELTS, Speaking
rubric). The descriptors are brief, although they do not provide the examiner with
examples.
Finally, are the rubric and the test relevant, valid and reliable? On the one hand, it can be
stated that the criteria the rubric uses are suitable because they are relevant to the skill.
The Council of Europe uses five different criteria to assess speaking (range, accuracy,
fluency, interaction and coherence). There is no criterion to measure 'interaction' in the
IELTS rubric and, since there is a discussion task, it should be included. The other
criteria match those used by the CEFR; they are simply grouped in a different way.
The validity cannot be fully confirmed: although the descriptors provided match the
evaluation standards included in the CEFR, the rubric does not measure everything it is
supposed to assess, because there is no interaction criterion despite a discussion task
being included in the exam format. A simple comparison between the CEFR speaking
scale and the IELTS speaking rubric shows the similarities and differences.
CEFR (B2):
Range, Fluency and Coherence: Has a sufficient range of language to be able to give
clear descriptions, express viewpoints on most general topics. Can produce stretches of
language with a fairly even tempo; although he/she can be hesitant, there are few
noticeably long pauses. Can use a limited number of cohesive devices to link utterances.
Accuracy: Shows a relatively high degree of grammatical control. Does not make errors
which cause misunderstanding and can correct most of his/her mistakes.
Interaction: Can initiate discourse, take his/her turn when appropriate and end
conversation when he/she needs to. Can help the discussion along on familiar ground
confirming comprehension, inviting others in, etc.

IELTS (Band 6):
Fluency and Coherence: The candidate is willing to speak at length, though may lose
coherence at times due to occasional repetition, self-correction or hesitation. He uses a
range of connectives and discourse markers, but not always appropriately.
Lexical Resource: The candidate has a wide enough vocabulary to discuss topics at
length and make meaning clear in spite of inappropriacies. He generally paraphrases
successfully.
Grammatical Range and Accuracy: He uses a mix of simple and complex structures, but
with limited flexibility. He may make frequent mistakes with complex structures, though
these rarely cause comprehension problems.
Interaction: (no equivalent criterion in the IELTS rubric)
Reliability
The reliability data used to check the IELTS certificates have been discussed in the
writing section.
The above extensive analysis of the IELTS speaking exam and tasks and the rubric proves
that the exam paper is suitable and matches CEFR indications and levels but that the
rubric used for the assessment of the exam is not suitable in many aspects.
The following table summarises the analysis.
EXAM
Tasks: 3
Match CEFR tasks: Yes
Match CEFR criteria: Yes

RUBRIC
Type: Measurement: Analytic / Scoring: Quantitative / Theme: Domain-relevant / Application: Skill-focused / Function: Proficiency / Scorer: Examiner / Channel: Paper
Relevant: Yes
Valid: No
Reliable: Unknown

CEFR criteria
Feasible: No
Descriptors: Positive: No / Brief: Yes / Not vague: No

Popham's rubric
Skill worthwhile: Yes
Scoring criteria (few and well labelled): Yes
Descriptors (well described): Yes
Clear and handy: No
ISE II
As happens with the reading and writing skills, the ISE-II assesses two skills together:
both the listening and speaking abilities of the candidate are evaluated in the same
exam, following an integrated approach. This module exam consists of four tasks and
lasts for 20 minutes. The first task is called the "topic task": the candidate speaks about
a topic within his or her personal interests which he or she has previously prepared.
Moreover, he/she is allowed to use notes or maps to ease the task. Candidates may also
use an item such as a picture. The timing for this task is four minutes. The examiner
will ask the candidate questions related to the topic chosen.
candidate is expected to be able to use these language functions:
• Initiating and maintaining the conversation
• Expressing and expanding ideas and opinions
• Highlighting advantages and disadvantages
• Speculating
• Giving advice
• Expressing agreement and disagreement
• Eliciting further information
• Establishing common ground
The second task is named the "collaborative task" and is also four minutes long. In this
part, the examiner poses a prompt in the form of a dilemma, and the candidate asks
questions to find out more information and keep the conversation going. The language
functions expected to be managed by the candidate are:
• Initiating and maintaining the conversation
• Expressing and expanding ideas and opinions
• Highlighting advantages and disadvantages
• Speculating
• Giving advice
• Expressing agreement and disagreement
• Eliciting further information
• Establishing common ground
Task 3 is the "conversation task" and has a duration of two minutes. The examiner asks the candidate questions on a subject (society and living standards, personal values and ideals, the world of work, national environmental concerns, or public figures past and present) and they start a conversation. The candidate must demonstrate his or her ability in:
• Initiating and maintaining the conversation
• Expressing and expanding ideas and opinions
• Highlighting advantages and disadvantages
• Speculating
• Giving advice
• Expressing agreement and disagreement
• Eliciting further information
• Establishing common ground
The last task of this module exam assesses listening exclusively, so it will be analysed in the listening section.
With regard to the adequacy of the exam design according to the CEFR guidelines, it can be said that the ISE-II tasks match the tasks proposed by the framework, as well as the criteria which must be measured to certify the speaking competence of a B2 candidate. Therefore, the exam is suitable.
ISE-II
Tasks:
(Task 1) Topic task
(Task 2) Collaborative task
(Task 3) Conversation task
Criteria (identical for Parts 1, 2 and 3):
• initiating and maintaining the conversation
• expressing and expanding ideas and opinions
• highlighting advantages and disadvantages
• speculating
• giving advice
• expressing agreement and disagreement
• eliciting further information
• establishing common ground

CEFR (B2)
Tasks:
• public address (information, instructions, etc.)
• addressing audiences (speeches at public meetings, university lectures, sermons, entertainment, sports commentaries, sales presentations, etc.)
• reading a written text aloud
• speaking from notes, or from a written text or visual aids (diagrams, pictures, charts, etc.)
• acting out a rehearsed role
• speaking spontaneously
• singing
Criteria:
• clear, systematically developed descriptions and presentations
• supporting ideas with subsidiary points and relevant examples
• describing an experience
• describing different topics in detail
• giving reasons and supporting them in discussions and debates
• highlighting the advantages or disadvantages of different options
• giving presentations with clarity and fluency
• departing spontaneously from the discussion when follow-up questions are posed
This module is assessed with two rubrics, one for the first three tasks and another one
for the last one. Since the tasks related to speaking are the first, only that rubric will be
analysed in the current section.
The ISE-II speaking and listening rating scale is analytic, and it contains four criteria (communicative effectiveness, interactive listening, language control and delivery). It is also a quantitative rubric: the numeric scale used ranges from 0 to 4. It is domain-relevant and skill-focused, since it is used to assess three different tasks even though these aim to assess two skills. It is a proficiency rubric, paper based despite the fact that it can be found online. Finally, an examiner is responsible for the scoring.
Type of rubric according to ISE-II
  How it is measured: Analytic
  How it is scored: Quantitative
  Theme: Domain-relevant
  Application: Skill-focused
  Function: Proficiency
  Scorer: Teacher (examiner)
  Channel: Paper
The following is the in-depth analysis of the rubric:
➢ Is the skill assessed actually worthwhile?
Yes. The skill assessed is the speaking skill. In the literature review section, the
importance of the speaking skill has been fully addressed. Moreover, the CEFR boosts
the communicative approach, so this is probably the most important skill.
➢ Are there few scoring criteria but correctly labelled?
Yes. The scoring criteria are few (four) as Popham recommends and are correctly
labelled.
➢ Are the degrees of excellence described appropriately?
Yes. All the descriptors are well explained. However, they do not contain any examples.
➢ Is the rubric presented in a clear and handy way?
No. Although the number of criteria is adequate, and so are the descriptions, the latter are very extensive, which may make the rubric difficult to use.
The CEFR recommendations for good rubrics (the necessity of building a feasible tool, with descriptors that are positively worded, brief and not vague) have been partially followed. The ISE-II rubric is feasible, since it contains only four criteria, but it may not be very handy. The descriptors are not always positively worded: in band 1, for instance, one can read "does not maintain and develop the interaction sufficiently" or "does not show adequate level of grammatical accuracy and lexical precision". Nor are the descriptors brief; on the contrary, they are quite long.
Finally, are the rubric and the test relevant, valid and reliable? On the one hand, it can be stated that the criteria the rubric uses are suitable because they are relevant to the skill; in fact, the Council of Europe uses most of them. Although the labels are not the same, the labels used by the ISE-II measure the same aspects: the "delivery" category is equivalent to the CEFR's 'range', 'fluency' and 'coherence'; the CEFR's 'accuracy' is assessed as "language control" in the Trinity College rubric; and 'interaction' is roughly equivalent to "communicative effectiveness". The ISE-II includes one criterion which does not appear in the CEFR scales, interactive listening, but this is only because the guidelines describe the assessment of each skill separately. The validity can also be easily confirmed, since the test, with the help of the rubric, measures what it is supposed to assess, and the descriptors provided match the evaluation standards included in the CEFR, as the table below shows:
(The ISE-II criteria 'delivery', 'language control' and 'communicative effectiveness' correspond to the CEFR's 'range, fluency and coherence', 'accuracy' and 'interaction', respectively.)

CEFR (B2):
• Range, fluency and coherence: Has a sufficient range of language to be able to give clear descriptions, express viewpoints on most general topics. Can produce stretches of language with a fairly even tempo; although he/she can be hesitant, there are few noticeably long pauses. Can use a limited number of cohesive devices to link utterances.
• Accuracy: Shows a relatively high degree of grammatical control. Does not make errors which cause misunderstanding and can correct most of his/her mistakes.
• Interaction: Can initiate discourse, take his/her turn when appropriate and end conversation when he/she needs to. Can help the discussion along on familiar ground confirming comprehension, inviting others in, etc.

ISE-II (Scale 4):
• Delivery: clearly intelligible; uses focal stress and intonation effectively; speaks promptly and fluently; requires no careful listening.
• Language control: uses a wide range of grammatical structures/lexis flexibly to deal with topics at this level; consistently shows a high level of grammatical accuracy and lexical precision; errors do not impede communication.
• Communicative effectiveness: fulfils the task very well; initiates and responds with effective turn-taking; effectively maintains and develops the interaction; solves communication problems naturally, if any.
Reliability
Trinity College London reliability data have been mentioned in the writing section.
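Although reliability figures are reported by the examination boards themselves, the inter-rater reliability of a band-scored rubric such as this one can also be estimated independently with a chance-corrected agreement coefficient such as Cohen's kappa. The sketch below is purely illustrative: the band scores are hypothetical examples, not actual Trinity College data.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters scoring the same candidates."""
    n = len(rater_a)
    # Proportion of candidates to whom both raters assigned the same band
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance if each rater assigned bands at his or her own rates
    expected = sum(freq_a[band] * freq_b[band] for band in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical band scores (0-4) awarded by two examiners to ten candidates
examiner_1 = [4, 3, 3, 2, 4, 1, 3, 2, 4, 3]
examiner_2 = [4, 3, 2, 2, 4, 1, 3, 3, 4, 3]
print(round(cohens_kappa(examiner_1, examiner_2), 3))  # → 0.714
```

Coefficients above roughly 0.6 are conventionally read as substantial agreement; a figure of this kind is what the "Reliable" rows of the summary tables would ideally be grounded on.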
The extensive analysis above of the ISE-II speaking exam, its tasks and its rubric shows that the exam paper is suitable and matches the CEFR indications and levels, but that the rubric used for the assessment of the exam is unsuitable in some respects.
The following table summarises the analysis carried out:
EXAM
  Tasks: 3
  Match CEFR tasks: Yes
  Match CEFR criteria: Yes
RUBRIC
  Type (measurement): Analytic
  Type (scoring): Quantitative
  Theme: Domain-relevant
  Application: Skill-focused
  Function: Proficiency
  Scorer: Examiner
  Channel: Paper
  Relevant: Yes
  Valid: Yes
  Reliable: Yes
CEFR criteria
  Feasible: No
  Descriptors positive: No
  Descriptors brief: No
  Descriptors not vague: Yes
Popham's rubric
  Skill worthwhile: Yes
  Scoring criteria (few and well labelled): Yes
  Descriptors (well described): Yes
  Clear and handy: No
ACLES
The ACLES B2 speaking exam lasts between 7 and 10 minutes. The candidates must perform a monologue and an interaction in pairs. However, neither the tasks nor the criteria they intend to measure are described in detail, so it is difficult to judge whether or not they are suitable for the examination of the speaking skill. The ACLES website states that the criteria followed are those provided for the level in the CEFR, but as they are not clearly laid out in any document, this cannot be confirmed.
For the assessment of the speaking paper, the examiner uses a speaking rubric with three criteria (fluency/interaction, linguistic correction and pronunciation). It can be classified as analytic and as both quantitative and qualitative, since it uses a numeric scale (the same one used for the ACLES writing rubric) that also includes a qualitative word scale. According to the theme it is domain-relevant, and according to the application it is skill-focused. It is a proficiency rubric; the scorer is an examiner and it is paper based.
Type of rubric according to ACLES
  How it is measured: Analytic
  How it is scored: Quantitative and qualitative
  Theme: Domain-relevant
  Application: Skill-focused
  Function: Proficiency
  Scorer: Teacher (examiner)
  Channel: Paper
In-depth analysis of the rubric:
➢ Is the skill assessed actually worthwhile?
Yes. The skill assessed is the speaking skill. In the literature review section, the
importance of the speaking skill has been fully addressed. Moreover, the CEFR boosts
the communicative approach so this is probably the most important skill.
➢ Are there few scoring criteria but correctly labelled?
Yes. The scoring criteria are few (three) as Popham recommends and are correctly
labelled.
➢ Are the degrees of excellence described appropriately?
No. There are multiple cells describing the different criteria along the scale. Most descriptors are appropriately worded, but some are very vague and imprecise, as they merely indicate a performance between the degrees of excellence above and below.
➢ Is the rubric presented in a clear and handy way?
Yes. The rubric is clear because it is not excessively long and the design is good and
handy.
In relation to the CEFR recommendations for good rubrics (the necessity of building a feasible tool, with descriptors positively worded, brief and not vague), the ACLES rubric is feasible, since it contains only three criteria. The descriptors are positively worded, as encouraged by the Council of Europe. They are also brief but, as has already been stated, some of the bands are very vague, because "shares features of Bands X and X" (ACLES Speaking Rubric) is their only descriptor.
Finally, are the rubric and the test relevant, valid and reliable? On the one hand, it can be stated that the criteria the rubric uses are suitable because they are relevant to the skill. In fact, the Council of Europe uses five different criteria to assess speaking (range, accuracy, fluency, interaction and coherence). Although the labels are not the same, those used by ACLES measure the same aspects, all but one.
The validity can also be easily confirmed, since the test, with the help of the rubric, measures what it is supposed to assess, and the descriptors provided match the evaluation standards included in the CEFR. This can be checked by means of a simple comparison between the CEFR speaking scale and the ACLES rubric:
(The ACLES criterion 'fluency, interaction and adequacy' corresponds to the CEFR's 'range, fluency and coherence' and 'interaction', while 'linguistic correction' corresponds to 'accuracy'.)

CEFR (B2):
• Range, fluency and coherence: Has a sufficient range of language to be able to give clear descriptions, express viewpoints on most general topics. Can produce stretches of language with a fairly even tempo; although he/she can be hesitant, there are few noticeably long pauses. Can use a limited number of cohesive devices to link utterances.
• Accuracy: Shows a relatively high degree of grammatical control. Does not make errors which cause misunderstanding and can correct most of his/her mistakes.
• Interaction: Can initiate discourse, take his/her turn when appropriate and end conversation when he/she needs to. Can help the discussion along on familiar ground confirming comprehension, inviting others in, etc.

ACLES (9-10):
• Fluency, interaction and adequacy: Communicates very fluently, even in long and complex stretches of speech. Makes up for occasional gaps with appropriate paraphrases and circumlocutions. Pauses are scarce and do not hinder the communication of the thematic content. In the interactive tasks the dialogue flows naturally, and the candidate proves able to initiate turns, take them when appropriate and close them with communicative effectiveness.
• Linguistic correction: Excellent grammatical control, with few systematic errors and minor slips which do not cause incomprehension and which the candidate sometimes self-corrects. Abundant vocabulary, including some low-frequency words or idiomatic phrases appropriate to the task.
Reliability
ACLES reliability has been discussed in the writing section.
The next table summarises the complete analysis:
EXAM
  Tasks: 2
  Match CEFR tasks: Unknown
  Match CEFR criteria: Unknown
RUBRIC
  Type (measurement): Analytic
  Type (scoring): Quantitative and qualitative
  Theme: Domain-relevant
  Application: Skill-focused
  Function: Proficiency
  Scorer: Examiner
  Channel: Paper
  Relevant: Yes
  Valid: Yes
  Reliable: Unknown
CEFR criteria
  Feasible: Yes
  Descriptors positive: Yes
  Descriptors brief: Yes
  Descriptors not vague: No
Popham's rubric
  Skill worthwhile: Yes
  Scoring criteria (few and well labelled): Yes
  Descriptors (well described): No
  Clear and handy: Yes
EOI
The speaking paper is around 10 minutes long. The candidate may have to hold a conversation with the teacher or another classmate, comment on and describe an image, discuss a current issue, express personal opinions, recount a personal experience, or take part in a dialogue or role play based on a familiar situation in which the candidate proves his or her ability to manage (Escuela Oficial de Idiomas de Gijón. Departamento de Inglés 169). The possible types of task which can appear are the following (169):
- Answer and ask questions
- Describe people or objects with visual support
- Narrate dreams, goals or feelings
- Inform on familiar topics
- Tell stories about familiar themes
- Make hypotheses
- Make a presentation and answer the audience's questions
- Exchange information on known matters related to his or her profession or interests
- Advise someone
- Complain about common situations
- Manage to communicate in daily situations or journeys
- Check that information is correct
- Summarise a plot
- Describe processes and procedures
The paper focuses on the candidates' ability to produce detailed and well-organised oral texts on different topics, both concrete and abstract, and on their ability to take part in conversations, face to face and through electronic devices, with clear pronunciation and with the correctness, fluency and spontaneity that allow them to be understood with no effort, despite sporadic mistakes.
If compared with the CEFR tasks and criteria, it could be stated that the paper is suitable for the assessment of the skill. However, it would be advisable to specify the exam tasks and the focus of each of them more clearly.
EOI
Tasks (some of the following may be included):
- Answer and ask questions
- Describe people or objects with visual support
- Narrate dreams, goals or feelings
- Inform on familiar topics
- Tell stories about familiar themes
- Make hypotheses
- Make a presentation and answer the audience's questions
- Exchange information on known matters related to his or her profession or interests
- Advise someone
- Complain about common situations
- Manage to communicate in daily situations or journeys
- Check that information is correct
- Summarise a plot
- Describe processes and procedures
Criteria (the ability of the candidates):
- to produce detailed and well-organised oral texts on different topics, both concrete and abstract
- to take part in conversations, face to face and through electronic devices, with clear pronunciation and with the correctness, fluency and spontaneity that allow them to be understood with no effort, despite sporadic mistakes

CEFR (B2)
Tasks:
• public address (information, instructions, etc.)
• addressing audiences (speeches at public meetings, university lectures, sermons, entertainment, sports commentaries, sales presentations, etc.)
• reading a written text aloud
• speaking from notes, or from a written text or visual aids (diagrams, pictures, charts, etc.)
• acting out a rehearsed role
• speaking spontaneously
• singing
Criteria:
• clear, systematically developed descriptions and presentations
• supporting ideas with subsidiary points and relevant examples
• describing an experience
• describing different topics in detail
• giving reasons and supporting them in discussions and debates
• highlighting the advantages or disadvantages of different options
• giving presentations with clarity and fluency
• departing spontaneously from the discussion when follow-up questions are posed

The speaking paper is scored with the help of a rubric. The rubric is analytic and contains six criteria (appropriate pronunciation, phonology, rhythm and intonation; grammar; vocabulary, organisation and register; interaction and discourse management). It is quantitative and qualitative, as it uses both a numeric scale and a word scale with six degrees of excellence. It is domain-independent and skill-focused, since it is used only for the assessment of the speaking paper but for any type of task. Finally, it is a proficiency rubric, used by an examiner and paper based.

Type of rubric according to EOI
  How it is measured: Analytic
  How it is scored: Quantitative and qualitative
  Theme: Domain-relevant
  Application: Skill-focused
  Function: Proficiency
  Scorer: Teacher (examiner)
  Channel: Paper
➢ Is the skill assessed actually worthwhile?
Yes. The skill assessed is the speaking skill. In the literature review section, the
importance of the speaking skill has been fully addressed. Moreover, the CEFR boosts
the communicative approach so this is probably the most important skill.
➢ Are there few scoring criteria but correctly labelled?
No, there are too many criteria. Furthermore, the labels are too long.
➢ Are the degrees of excellence described appropriately?
Yes, the descriptors are well detailed, although somewhat long, and no examples are provided.
➢ Is the rubric presented in a clear and handy way?
No. There are too many criteria and the descriptors are somewhat long, so the rubric is very difficult to manage.
The CEFR recommendations for good rubrics (the necessity of building a feasible tool, with descriptors positively worded, brief and not vague) have not been followed: the EOI rubric is not feasible, since it has too many criteria, some of which could easily be grouped under the same label. As for the descriptors, they are generally positively worded, as encouraged by the Council of Europe, but some are negatively worded, especially in the lowest degrees of excellence. The descriptors are not brief; they are quite long, but well detailed.
Finally, are the rubric and the test relevant, valid and reliable? On the one hand, it can be stated that the criteria the rubric uses are suitable because they are relevant to the skill. In fact, the Council of Europe uses five different criteria to assess speaking (range, accuracy, fluency, interaction and coherence). Although the labels are not the same, five of the criteria used by the EOI rubric are also used in the CEFR.
The validity can also be easily confirmed, since the test, with the help of the rubric, measures what it is supposed to assess, and the descriptors provided match the evaluation standards included in the CEFR. This can be checked by means of a simple comparison between the CEFR speaking scale and the EOI rubric:
(The EOI criteria 'grammar' and 'vocabulary' correspond to the CEFR's 'accuracy'.)

CEFR (B2):
• Range, fluency and coherence: Has a sufficient range of language to be able to give clear descriptions, express viewpoints on most general topics. Can produce stretches of language with a fairly even tempo; although he/she can be hesitant, there are few noticeably long pauses. Can use a limited number of cohesive devices to link utterances.
• Accuracy: Shows a relatively high degree of grammatical control. Does not make errors which cause misunderstanding and can correct most of his/her mistakes.
• Interaction: Can initiate discourse, take his/her turn when appropriate and end conversation when he/she needs to. Can help the discussion along on familiar ground confirming comprehension, inviting others in, etc.

EOI (5 points):
• Range, fluency and coherence: Adjusts the level of formality precisely. Makes a coherent and clear speech. Expresses himself/herself with spontaneity and fluency.
• Grammar and vocabulary: Uses high-level grammar structures and communicates with many varied structures in an excellent way. Uses high-level vocabulary correctly and is able to use precise and varied vocabulary.
• Interaction: Expresses himself/herself with autonomy and fluency. Uses expressions to start, maintain and conclude a conversation. Interaction is achieved without much effort, with spontaneity.
Reliability
Information related to reliability coefficients has been given in the writing section.
EXAM
  Tasks: Variable
  Match CEFR tasks: Yes
  Match CEFR criteria: Yes
RUBRIC
  Type (measurement): Analytic
  Type (scoring): Quantitative and qualitative
  Theme: Domain-relevant
  Application: Skill-focused
  Function: Proficiency
  Scorer: Examiner
  Channel: Paper
  Relevant: Yes
  Valid: Yes
  Reliable: Unknown
CEFR criteria
  Feasible: No
  Descriptors positive: No
  Descriptors brief: No
  Descriptors not vague: Yes
Popham's rubric
  Skill worthwhile: Yes
  Scoring criteria (few and well labelled): No
  Descriptors (well described): Yes
  Clear and handy: No
6.4. Reading
Reading rubrics are not widely used, either in Spain or in countries such as the United States, where rubrics have been commonly implemented for many years. In fact, the well-known English as a foreign language certificates mentioned in the previous section (Cambridge certificates, Trinity College London, IELTS, ACLES and the EOI) do not use any rubric to assess this skill. As has already been stated, rubrics were traditionally writing assessment tools, and the success of the communicative approach facilitated their inclusion in the assessment of the speaking skill. However, despite not being frequently used for the evaluation of the receptive skills, rubrics can work as tools to evaluate reading and listening too.
Reading comprehension is most commonly assessed through multiple-choice questions, true or false questions, sentence completion, open questions, gapped texts or summaries. The CEFR indicates that the tasks selected should focus on (69):
• reading for general orientation
• reading for information, e.g. using reference works
• reading and following instructions
• reading for pleasure
The language user may read:
• for gist
• for specific information
• for detailed understanding
• for implications, etc.
6.4.1. Literature Review
The correct interpretation and comprehension of a text is what is often referred to as reading. According to Al-Ghazo, it is an "active and mental process that improves concentration and focus" (722). There are many reasons why the assessment of the reading skill is important, and most of the literature reviewed mentions the same ideas. Uribe-Enciso, among other authors, postulates that effective reading opens access to a huge amount of digital and print information (39), while Al-Ghazo stresses benefits such as the expansion of vocabulary or the improvement of other language aspects, such as grammar and writing (722).
“In language learning reading promotes continuous expansion of vocabulary, full
awareness of syntactic structures and forms of written discourse, development of
cognitive skills and learner autonomy and increasing comprehensive knowledge
of any topic readers want to learn about” (Uribe-Enciso 39).
The process of reading implies many different sub-processes, such as anticipation, the intelligent guesses the reader makes at the beginning, the understanding of the main ideas and the comprehension of unknown words through context; as Grabe states, all those processes "are performed according to the reader's language proficiency level, the text type, the reading purpose, the reader's motivation" (cited in Uribe-Enciso 40). With regard to some of those sub-processes or reading skills, the elt resourceful web page created by Rachel Roberts contains a full blog post in which their use and instruction by teachers is encouraged. The skills of prediction, reading for gist, reading for specific information, skimming and scanning are explained there. The idea of encouraging prediction before reading is related to the activation of previously known vocabulary and knowledge, which might be connected to the topic of the text and eases comprehension. Similar to the prediction skill is reading for gist, which aims to provide students with an overview of the text so that they understand just its main idea and can later try to read for specific information. Finally, skimming and scanning are two strategies normally used with L2 learners, as they allow them to glean specific pieces of information without reading in detail (scanning) or to get the main ideas of a text (skimming). These strategies are very helpful for increasing reading speed in a language certificate exam or for improving learners' ability to synthesise. Grellet suggests that these skills or processes are actually different types of reading, although some of the terminology she uses is different (extensive and intensive reading) (cited in Karppinen 4). Knowledge of the processes, skills or types of reading is essential for its assessment in an L2, since L2 reading is more complex owing to the fact that "acquisition of systematic knowledge and development of reading skills occur simultaneously", in contrast to what happens in L1 reading (Uribe-Enciso 40). Therefore, their reinforcement and training will balance and facilitate the L2 learner's tasks.
The National Reading Panel (NRP) affirms that there are three major factors which affect reading comprehension: vocabulary instruction, the active-interactive strategic process and, finally, the preparation of the teacher. Thus, instructors must use meta-cognitive strategies, which deal with the planning, monitoring and evaluation of reading comprehension; cognitive strategies, associated with the incoming information; and social and affective strategies, linked to interaction (cited in Al-Ghazo 723). As for the methodological strategies which should be used in instruction, many studies have been published. Most of them agree on a division of the strategies into lower-level or bottom-up processes and higher-level or top-down processes. Kianiparsa and Vali define the former as those processes "related to grammar and vocabulary recognition", whereas the latter are "related to comprehension, schemata, and interpretation of a text" (9). The same division was made by Grabe and Stoller, who classify word recognition, syntactic parsing, semantic proposition formation and working memory activation as lower-level processes, and the text model of reading comprehension, the situation model of reader interpretation and executive control processing as higher-level processes (cited in Karppinen 4-5). "Some bottom-up theorists, such as Abraham, Carrel and Eisterhold, claim that the lack of automaticity in accessing linguistic data causes poor skilled reading" (cited in Uribe-Enciso 40), so it is important to train this skill. However, Kianiparsa and Vali defend that "for being a competent reader we need a combination of both these processes [lower- and higher-level]" (10).
Some other strategies are mentioned for the instruction of reading. KWL (Know-Want to Know-Learned), for example, promotes planning, goal setting, monitoring and evaluation of the information contained in the text. CORI (Concept-Oriented Reading Instruction) focuses on selecting topics based on personal interests, gathering information through reading and then working on a project. Strategies based on cooperative learning tasks, such as puzzles, make up CSR (Collaborative Strategic Reading) (Uribe-Enciso 43).
The evaluation of the reading skill should be a tool for gathering information about the learner's reading abilities, later used for the planning and implementation of better reading lessons. Thus, a wide variety of texts and reading characteristics should be present in the evaluation, and different kinds of assessment methods and tools should be applied (Kianiparsa and Vali 10, 18). The most common reading assessment tasks encompass "multiple-choice, written and oral recall, cloze, summary, sentence completion, short-answer, open-ended question, true/false, matching activity, check list, ordering and fill-in-the blank test" (14). Karppinen conducted research on the kinds of reading activities and strategies used in Finnish ESL textbooks. The results of her investigation show that individual and pair activities are more common than group activities, which account for only 7-10% of the total (14). It was also found that 75% of the reading activities were post-reading, in contrast with 25% conceived as pre-reading; furthermore, no activities were intended to be carried out during the reading (14). The most frequent tasks were summaries, open-ended questions and translation exercises (15). With regard to the training of reading strategies, around 45% of the activities focused on careful or detailed reading, only 20% worked on scanning, and a scant 3% dealt with skimming (16). As for purposes, between 32% and 47% of the activities aimed at understanding the core ideas of the text, while around 30% aimed to elicit a personal response (17). Finally, close to 50% of the reading activities were combined with speaking, and between 34% and 44% were combined with writing. The listening skill, however, is not often combined with reading activities (18).
The use of rubrics or grading scales for the assessment of the reading skill is virtually non-existent, as is the literature on it. Nevertheless, Grabe argues that "learners sometimes do not carry out successful reading tasks because they are not aware of the reading purposes and, therefore, they do not know what strategies are more appropriate for them" (cited in Al-Ghazo 44). As a result, research on the creation of reading rubrics and their validation through case studies could prove valuable: students would know what the purposes and criteria are and could address the reading task bearing them in mind.
6.4.2. Assessment of Reading in the main English Certificates of ESL
Cambridge First Certificate
The assessment of the reading skill in the Cambridge First Certificate is carried out through a paper which includes both the Reading and the Use of English exams. The time allowed is 1 hour and 15 minutes for a total of seven tasks. Three of those tasks are intended to measure reading comprehension (Parts 5, 6 and 7). "For Parts 5 to 7, the test contains a range of texts and accompanying reading comprehension tasks" (Cambridge English Language Assessment 7). Part 5 consists of a text followed by six multiple-choice questions with four options each. This task is intended to measure the
“detailed understanding of a text, including the expression of opinion, attitude,
purpose, main idea, detail, tone, implication and gist. Candidates are also tested
on their ability to recognise meaning from context and follow text organisation
features, such as exemplification, comparison and reference” (8).
Part 6 is a gapped text: some sentences have been removed from the original text and the
learner must identify which one goes in each gap. In this case, the focus is on the “text
structure, cohesion and coherence, and candidates’ ability to follow the development of a
long text” (9). Finally, Part 7 consists of a long text divided into different parts, each
labelled with a letter. There are ten questions and the candidate must indicate in which
part of the text the information in question can be found. The task checks whether the
candidate can locate specific information and detail and recognise opinion and attitude
in one long text or a group of short texts.
With regard to the CEFR, the reading global scale provided states that a B2 candidate is
able to:
“read with a large degree of independence, adapting style and speed of reading to
different texts and purposes, and using appropriate reference sources selectively.
Has a broad active reading vocabulary but may experience some difficulty with
low frequency idioms” (69).
In addition, the CEFR contains four other reading scales:
• Reading correspondence
• Reading for orientation
• Reading for information and argument
• Reading instructions
The most important criteria included in those scales for B2 users comprise the ability
to read correspondence, follow a text of instructions, and understand specialised articles
and reports related to current issues. Among the functions the candidate must be able to
perform are scanning through long and complex texts, finding relevant details and
identifying the most important information, ideas or opinions (CEFR 69-71). Thus, the
tasks included in the First Certificate for the assessment of reading match many of the
criteria of the framework. However, the understanding of texts of instructions is not
present and could be included. The understanding of correspondence is not assessed
through a specific task of the reading paper. Nevertheless, it is checked indirectly in the
writing paper, since one of the tasks may include part of a received letter which must be
answered.
FCE

Tasks:
• Part 5: multiple choice
• Part 6: gapped text
• Part 7: multiple matching

Criteria:
• Part 5: detailed understanding of a text, including the expression of opinion, attitude, purpose, main idea, detail, tone, implication and gist
• Part 6: text structure, cohesion and coherence, and candidates’ ability to follow the development of a long text
• Part 7: locate specific information and detail, and recognise opinion and attitude

CEFR

Tasks:
• reading for general orientation
• reading for information, e.g. using reference works
• reading and following instructions
• reading for pleasure
• for gist
• for specific information
• for detailed understanding
• for implications, etc.

Criteria:
• read with a large degree of independence, adapting style and speed of reading to different texts and purposes, and using appropriate reference sources selectively
• ability to read correspondence
• follow a text of instructions
• understand specialised articles and reports related to current issues
• scan through long and complex texts
• find relevant details
• identification of the most important information, ideas or opinions
Concerning the assessment tool, the First Certificate does not use any kind of rubric or
grading scale to assess the candidate’s performance in the reading paper.
IELTS
Three reading passages are used to assess reading comprehension in a 60-minute test.
There is a total of 40 items. The three reading tasks are not always the same: they are
drawn from a total of 11 different types. The first task type is “multiple choice”. The
candidate may have to choose either the best answer from four options or the two best
from five. The number of questions for this type is variable and it “tests a wide range of
reading skills, including detailed understanding of specific points or an overall
understanding of the main points of the text” (IELTS web page. Test format). The second
reading task type is referred to as “identifying
information”. The text is followed by several true, false or not mentioned questions and
it aims to assess “the test takers’ ability to recognise particular points of information
conveyed in the text. It can thus be used with more factual texts”. The next task type is
called “Identifying writer’s views/claims”, and the candidates may have to decide
whether some statements match the author’s opinion or not or if the information is not
given. It “assesses the test takers’ ability to recognise opinions or ideas, and so it is often
used with discursive or argumentative texts”.
The “matching information” task type involves locating specific information within
paragraphs marked with letters. It focuses on the “ability to scan for specific
information”. Meanwhile, the “matching headings” task measures the ability to
“recognise the main idea or theme in the paragraphs or sections of a text, and to
distinguish main ideas from supporting ones” (IELTS web page. Test Format) and
consists in matching each paragraph with the right heading. The “matching features” task
type requires matching some statements with a list of options in order to check the
candidate’s ability to recognise relationships and connections. Reading questions that
involve matching the first half of a sentence with one from a list of several options are
labelled under the “matching sentence endings” task type and are aimed at testing the
understanding of the core ideas in a sentence.
Candidates may also be asked to complete a sentence with a given number of words based
on the text, so that their ability to locate specific information can be checked. This sort
of task is named “sentence completion”. When the candidates need to complete a
summary or a table with information drawn from the text, the task type is called
“summary, note, table, flow-chart completion”. The “diagram label completion” task
type is the same, but with a diagram. Both of these tasks attempt to assess the “ability to
understand a detailed description” (ibid.). Finally, the “short-answer questions” task type
consists in answering some questions based on the information of the text with a specific
number of words. The task’s focus is the “ability to locate and understand precise
information in the text” (ibid.).
With regard to the CEFR, as has already been explained above, it advises the use of tasks
that imply reading for different purposes and in different ways. The IELTS test format,
with three tasks drawn from eleven possible types, broadly covers all the tasks indicated
in the CEFR.
IELTS

Tasks (3 tasks from 11 possible types):
• Task type 1: multiple choice
• Task type 2: identifying information
• Task type 3: identifying writer’s views/claims
• Task type 4: matching information
• Task type 5: matching headings
• Task type 6: matching features
• Task type 7: matching sentence endings
• Task type 8: sentence completion
• Task type 9: summary, note, table, flow-chart completion
• Task type 10: diagram label completion
• Task type 11: short-answer questions

Criteria:
• Multiple choice tests a wide range of reading skills, including detailed understanding of specific points or an overall understanding of the main points of the text.
• Identifying information assesses the test takers’ ability to recognise particular points of information conveyed in the text. It can thus be used with more factual texts.
• Identifying writer’s views/claims assesses the test takers’ ability to recognise opinions or ideas, and so it is often used with discursive or argumentative texts.
• Matching information assesses the test takers’ ability to scan for specific information. Unlike task type 5, matching headings, it is concerned with specific information rather than with the main idea.
• Matching headings tests the test takers’ ability to recognise the main idea or theme in the paragraphs or sections of a text, and to distinguish main ideas from supporting ones.
• Matching features assesses the test takers’ ability to recognise relationships and connections between facts in the text and their ability to recognise opinions and theories. It may be used with factual information as well as with opinion-based discursive texts. Test takers need to be able to skim and scan the text in order to locate the required information and to read for detail.
• Matching sentence endings assesses the test takers’ ability to understand the main ideas within a sentence.
• Sentence completion assesses the test takers’ ability to locate detail/specific information.
• Summary, note, table and flow-chart completion assesses the test takers’ ability to understand details and/or the main ideas of a section of text. In the variations involving a summary or notes, test takers need to be aware of the type of word(s) that will fit into a given gap (for example, whether a noun is needed, or a verb, etc.).
• Diagram label completion assesses the test takers’ ability to understand a detailed description, and to relate it to information presented in the form of a diagram.
• Short-answer questions assess the test takers’ ability to locate and understand precise information in the text.

CEFR

Tasks:
• reading for general orientation
• reading for information, e.g. using reference works
• reading and following instructions
• reading for pleasure
• for gist
• for specific information
• for detailed understanding
• for implications, etc.

Criteria:
• read with a large degree of independence, adapting style and speed of reading to different texts and purposes, and using appropriate reference sources selectively
• ability to read correspondence
• follow a text of instructions
• understand specialised articles and reports related to current issues
• scan through long and complex texts
• find relevant details
• identification of the most important information, ideas or opinions
As for the assessment tool, the IELTS does not use any kind of rubric or grading scale to
assess the candidate’s performance in the reading paper.
ISE-II
The Trinity College London reading paper for the B2 level is, as has already been
mentioned, taken along with the writing exam. Therefore, only the reading tasks are
explained in this section. There are two exclusive tasks for the assessment of reading
comprehension. Task 1 is a 500-word text divided into 5 paragraphs, which can be an
article, a review, a magazine or textbook extract, or any other format the candidate is
familiar with. After reading the text, the candidate must answer 15 different questions.
The first 5 questions require the candidate to demonstrate that he or she has understood
the main ideas through the matching of headings with each of the paragraphs of the text.
Afterwards, the student must find 5 true statements from a list of 8. This task intends to
test the understanding of specific information. Questions 11 to 15 also address the
understanding of specific and factual information: the learner has to complete some
sentences with one word or a few words.
Task 2 of the reading paper is called “multi-text reading” and consists of four reading
texts presented together, followed by 15 questions. The four texts together total roughly
500 words. In the first items, the candidate must indicate to which of the texts each
question refers, thereby proving that he or she understands the main idea and purpose of
each of them. The following questions are equivalent to the second section of questions
of Task 1: selection of 5 true statements from a list of 8. The last questions (26-30)
require the completion of a summary with a number of words from the text.
As for the CEFR, it is advisable to use tasks that involve reading for different purposes
and in different ways. The ISE-II test format, with two tasks which contain different
question types and areas, covers most of the CEFR requirements.
ISE-II

Tasks:
• Task 1: long reading (title matching; selecting the true statements; completing sentences)
• Task 2: multi-text reading (multiple matching; selecting the true statements; completing summary notes)

Criteria:

Task 1: long reading
• Title matching: the candidate must demonstrate that he or she understands the main idea of each paragraph. Some useful reading subskills to practise for this section are: skimming; scanning; reading for gist; understanding the main idea of each paragraph.
• Selecting the true statements: the candidate must demonstrate that he or she understands specific, factual information at the sentence level. Some useful reading subskills to practise for this section are: careful reading for specific information; comparing, evaluating and inferring; distinguishing the principal statement from supporting examples or details; distinguishing fact from opinion; scanning.
• Completing sentences: the candidate must demonstrate that he or she understands specific, factual information at the word and/or phrase level, or can infer and understand across paragraphs (e.g. writer’s attitude, line of argument). Some useful reading subskills to practise for this section are: careful reading for comprehension; understanding cohesion patterns, lexis, grammar and collocation; deducing meaning; understanding across paragraphs.

Task 2: multi-text reading
• Multiple matching: the candidate must demonstrate that he or she understands the main idea and purpose of each text. Some useful reading subskills to practise for this section are: skimming; scanning; reading for gist; reading for purpose or main ideas.
• Selecting the true statements: the candidate must demonstrate that he or she understands specific, factual information at the sentence level. Some useful reading subskills to practise for this section are: careful reading for specific information; comparing, evaluating and inferring; distinguishing the principal statement from supporting examples or details; distinguishing fact from opinion; scanning.
• Completing summary notes: the candidate must demonstrate that he or she understands specific, factual information at the word and/or phrase level across the texts. Some useful reading subskills to practise for this section are: careful reading for comprehension at the word and/or phrase level across texts; inferring; summarising.

CEFR

Tasks:
• reading for general orientation
• reading for information, e.g. using reference works
• reading and following instructions
• reading for pleasure
• for gist
• for specific information
• for detailed understanding
• for implications, etc.

Criteria:
• read with a large degree of independence, adapting style and speed of reading to different texts and purposes, and using appropriate reference sources selectively
• ability to read correspondence
• follow a text of instructions
• understand specialised articles and reports related to current issues
• scan through long and complex texts
• find relevant details
• identification of the most important information, ideas or opinions
The Trinity College London ISE-II certificate does not use any rubric for the assessment
of the reading tasks in the reading and writing paper.
ACLES
The reading comprehension paper may have between 2 and 4 tasks and lasts between
60 and 70 minutes. The texts used are authentic, with a minimum of 1,300 words each and
a maximum of 2,100 in total. The paper aims to check that the candidate is able to
understand the main ideas of complex texts. The Association of Language Centres in
Higher Education gives some guidelines, but each examination centre decides certain
aspects. For example, the University of A Coruña centre of languages decides whether to
include 2 or 4 tasks. The criteria for the paper can be found on its website (Centro de
Linguas, Universidade de A Coruña web page). The criteria included in the following
table have been translated from that website:
ACLES

Tasks (2-4 tasks):
• understand the main ideas of complex texts

Criteria:
• Able to read independently, adapting the style and the speed to the different texts. The candidate has a wide range of vocabulary and may have some difficulties with uncommon terms.
• Able to read and understand the gist of texts related to his or her speciality.
• Can search for relevant details fast.
• Identifies the main contents in a piece of news, an article or a report about a wide range of professional issues.
• Understands specialised articles and can use a dictionary to confirm his or her interpretation of specific terms.
• Gets information, ideas and opinions from different specialised sources.
• Understands extensive, complex instructions, including details, conditions and warnings.

CEFR

Tasks:
• reading for general orientation
• reading for information, e.g. using reference works
• reading and following instructions
• reading for pleasure
• for gist
• for specific information
• for detailed understanding
• for implications, etc.

Criteria:
• read with a large degree of independence, adapting style and speed of reading to different texts and purposes, and using appropriate reference sources selectively
• ability to read correspondence
• follow a text of instructions
• understand specialised articles and reports related to current issues
• scan through long and complex texts
• find relevant details
• identification of the most important information, ideas or opinions
According to the above, the criteria stated by ACLES do not fully match the CEFR
guidelines, since they are intended to check only that the candidate is able to understand
the main ideas. Nevertheless, from the criteria stated by the centre of languages of the
UDC, it can be deduced that the identification of details and the following of instructions
are also being measured. Moreover, the criteria stated are virtually equivalent to those in
the CEFR reading scales.
No rubric is used for the assessment of the reading paper.
EOI
The reading paper lasts for about 60 minutes. The number of tasks is variable, but they
may be any of the following (Escuela Oficial de Idiomas de Gijón. Departamento de
Inglés 173-174):
• Answer questions about general and specific reading comprehension (open-ended, multiple choice, true or false)
• Find words or expressions in a text for the given definitions
• Match text fragments with ideas
• Complete dialogues
• Complete a text with given words
• Choose the best title for the text
• Indicate the main idea
• Match paragraphs with different titles
• Compare similarities and differences between two texts
• Identify the author’s purpose, intention or opinion
• Order a text
• Choose the best summary of the text
• Remove illogical words
• Reinsert sentences which have been removed from a text
• Complete a text with gaps
• Identify statements related to a text
• Use information from a text to solve a problem
• Ask questions to given answers
• Translate sentences
The texts used can be conversations or dialogues, application forms, public
announcements, commercial adverts or informative leaflets, basic information on
services, instructions, postcards, e-mails, faxes, descriptive texts (people, places, houses,
work, etc.) or short stories (174).
As for the criteria, the exam intends to assess the reading comprehension of the
candidates and to check their ability to (172-173):
- Understand extensive, complex instructions
- Identify content and main ideas in articles, reports and pieces of news quickly
- Understand letters and emails
- Understand articles and reports on current affairs in which the author expresses a concrete opinion or point of view
- Identify different points of view and main conclusions
- Identify the topic, argumentative line, main ideas and details
- Interpret the cultural features, social conventions and lifestyles which appear in the text
With regard to the guidelines given by the Council of Europe in the framework, the paper
matches the tasks proposed and also the criteria to assess the candidate’s reading
comprehension ability.
EOI

Tasks (variable number):
• Answer questions about general and specific reading comprehension (open-ended, multiple choice, true or false)
• Find words or expressions in a text for the given definitions
• Match text fragments with ideas
• Complete dialogues
• Complete a text with given words
• Choose the best title for the text
• Indicate the main idea
• Match paragraphs with different titles
• Compare similarities and differences between two texts
• Identify the author’s purpose, intention or opinion
• Order a text
• Choose the best summary of the text
• Remove illogical words
• Reinsert sentences which have been removed from a text
• Complete a text with gaps
• Identify statements related to a text
• Use information from a text to solve a problem
• Ask questions to given answers
• Translate sentences

Criteria:
• Understand extensive, complex instructions
• Identify content and main ideas in articles, reports and pieces of news quickly
• Understand letters and emails
• Understand articles and reports on current affairs in which the author expresses a concrete opinion or point of view
• Identify different points of view and main conclusions
• Identify the topic, argumentative line, main ideas and details
• Interpret the cultural features, social conventions and lifestyles which appear in the text

CEFR

Tasks:
• reading for general orientation
• reading for information, e.g. using reference works
• reading and following instructions
• reading for pleasure
• for gist
• for specific information
• for detailed understanding
• for implications, etc.

Criteria:
• read with a large degree of independence, adapting style and speed of reading to different texts and purposes, and using appropriate reference sources selectively
• ability to read correspondence
• follow a text of instructions
• understand specialised articles and reports related to current issues
• scan through long and complex texts
• find relevant details
• identification of the most important information, ideas or opinions
6.5. Listening
Concerning the listening skill, grading scales are not commonly used either. As occurred
with the other receptive skill, reading, the common English certificates mentioned in this
piece of work (with the exception of the ISE-II) do not use any rubric to assess this skill.
The most common tasks used for the assessment of listening are multiple-choice
questions, true or false items, open questions and sentence-completion exercises.
The CEFR lists the following listening tasks to assess this skill (65):
• listening to public announcements (information, instructions, warnings, etc.)
• listening to media (radio, TV, recordings, cinema)
• listening as a member of a live audience (theatre, public meetings, public lectures, entertainment, etc.)
• listening to overheard conversations, etc.
6.5.1. Literature Review
The literature about the use of rubrics for listening is, as occurs with the other receptive
skill, virtually non-existent. As was the case with reading, most of the research deals with
the difficulties of the process and appropriate strategies of instruction. Some of this
information is nonetheless relevant for this dissertation, as it must be taken into account
for the construction of a rubric that can support the assessment of the listening skill.
Helgesen stated that “listening is an active, purposeful process of making sense of what
we hear” (24). As Celce-Murcia highlights, the importance of the assessment of this skill
stems from the fact that it is the most frequently used skill in daily life (cited in Solak and
Altay 190). Another vital reason for the importance of listening comprehension is that
no learning can take place without the correct comprehension of the input, which implies
that listening is essential for the development of the productive skill of speaking (191).
Listening comprehension is a “highly problem-solving activity that can be broken
down into a set of distinct sub-skills” (cited 191). As a result, many studies have stressed
the necessity to integrate different processes, such as the phonetic, phonological,
prosodic, lexical, syntactic, semantic and pragmatic, in order to understand any spoken
message (190). The difficulties within the process are vast; Hedge classifies them into
internal and external (cited in ibid. 191). The learner’s lack of motivation, his or her
level of anxiety, the lack of knowledge of the topic discussed or the appearance of many
unknown words are internal problems, whilst environmental noise or the speaker’s
characteristics are encompassed under the external-problems label. Other authors have
also pointed out certain factors that may be considered obstacles, such as the speaker’s
accent, omissions, the length of the listening, the poor quality of the recording or even
the distance between the recorder and the listeners. The lack of knowledge of listening
strategies has also been mentioned as a factor which hinders comprehension (191).
Richards has collected some of the strategies which may be relevant for students facing
a listening task and which should be taught and trained. On the one hand, cognitive
strategies are “the mental activities related to the comprehending and storing in
working memory or long-term memory for later retrieval” (10). On the other hand,
meta-cognitive strategies “are conscious or unconscious mental activities that perform an
executive function in the management of cognitive strategies” (11). Solak and Altay
conducted research on beliefs about English listening comprehension problems with 12
prospective teachers. The findings suggest that the participants do not have difficulties
finding the main ideas of the listening or eliciting knowledge related to the topic; rather,
they have problems with words that are not pronounced clearly and with varied accents.
In addition, the presence of many unknown words was found to be the most important
reason for failure in listening comprehension.
Some of the research conducted has studied various instruction strategies and activities
which may be useful to train and assess this skill. In an article for The Internet TESL
Journal, Lavelle explained how to make the most of the listening activities contained in
an ESL textbook. He argues that there are basically four different phases for a lesson
based on listening practice. In the Listening Phase, he suggests using both top-down and
bottom-up strategies. His proposal includes a first listening approached through the
ticking of certain key words or sentences. Once those words have been located, the
learners listen again in order to try to understand their meanings. The second part of this
phase might consist of questions on the understanding of the whole passage. The second
phase is that of Grammaticalisation: Lavelle suggests Bastone’s bottom-up method,
which could involve using the words previously learned in sentences and combining
them through the application of suitable grammar. The third phase is named “Focus on
Lexis” and discusses different collocations related to the vocabulary learned. Finally,
the Personalisation Phase leads students to discuss different questions in which they
need to use the vocabulary and apply the grammar and collocations learned.
Walters and Chien recommend summarisation as a “high-skill” exercise for advanced
listening training, as it requires learners to extract the main ideas and re-organise them
(313). The case study they carried out included eleven English listening and speaking
teachers and ten American native-English-speaking college students. All participants
were surveyed on their listening assessment preferences and on how they would assess
a specific news text. The results suggest preferences for checking the main ideas,
identifying vocabulary and listening for both gist and detailed information. The selection
of key ideas, vocabulary and a model summary for the design of a listening assessment
enabled the researchers to conclude that both groups of participants agreed on the main
ideas and most of the key vocabulary. Thus, summarisation was validated as a useful
technique.
6.5.2. Assessment of listening in the main English Certificates of ESL
Cambridge First Certificate
The Cambridge First Certificate paper for the assessment of the listening skill consists of
four different tasks. The first task involves listening to eight different and unrelated short
extracts and answering one multiple-choice question for each of them. Each of the
extracts has a different focus: main point, purpose or location of the speech, relationships
between speakers, or the attitude or opinion of the speakers. Part 2 is a monologue
followed by the completion of ten sentences, in order to assess specific and detailed
information. Gist, detail, function, attitude, purpose and opinion are assessed in the third
task. In this part, the candidate must match statements with one of five speakers. The
final task is an interview or a conversation with seven multiple-choice questions, each
with four possible answers. The focus is on specific information, opinion, attitude, gist
and main idea (Cambridge English Language Assessment 51).
In the CEFR there are five different scales to assess the listening skill. The overall scale
points out that a B2 user:
“Can understand the main ideas of propositionally and linguistically complex
speech on both concrete and abstract topics delivered in a standard dialect,
including technical discussions in his/her field of specialisation. Can follow
extended speech and complex lines of argument provided the topic is reasonably
familiar, and the direction of the talk is sign-posted by explicit markers” (66)
Besides this scale, four other scales are provided for:
• Understanding interaction between native speakers
• Listening as a member of a live audience
• Listening to announcements and instructions
• Listening to audio, media and recordings
Those scales give the main criteria a B2 user must master in relation to the listening skill.
The criteria encompass the ability to keep up with a conversation; understand much of
what is said in a discussion in which he or she is participating, and be able to participate;
understand and follow lectures, talks and reports with academic vocabulary, as well as
announcements, messages and radio documentaries or broadcast audio. The user can also
identify viewpoints and attitudes of different speakers (CEFR 66-68). According to this,
the tasks which form the listening exam are suitable to assess the listening comprehension
of the learner. Nevertheless, none of them implies the candidate’s response or
participation in a discussion with native speakers, and this could be an interesting task
for the certification of the level.
FCE

Tasks:
• Task 1: short extracts, multiple choice
• Task 2: sentence completion
• Task 3: multiple matching
• Task 4: interview, multiple choice

Criteria:
• Task 1: main point, purpose or location of the speech, relationships between speakers, attitude or opinion of the speakers
• Task 2: specific information, detail, stated opinion
• Task 3: gist, detail, function, attitude, purpose and opinion
• Task 4: specific information, opinion, attitude, gist and main idea

CEFR

Tasks:
• listening to public announcements (information, instructions, warnings, etc.)
• listening to media (radio, TV, recordings, cinema)
• listening as a member of a live audience (theatre, public meetings, public lectures, entertainment, etc.)
• listening to overheard conversations, etc.

Criteria:
• understand the main ideas of propositionally and linguistically complex speech
• follow extended speech and complex lines of argument
• keep up with a conversation
• understand much of what is said in a discussion in which he/she is participating and be able to participate
• understand and follow lectures, talks and reports with academic vocabulary, announcements and messages, and radio documentaries or broadcast audio
• identify viewpoints and attitudes of different speakers
As for the assessment tool, the First Certificate does not use any kind of rubric or grading
scale to assess the candidate’s performance in the listening paper.
IELTS
The listening paper includes 4 different tasks with 40 question items and has a duration
of 30 minutes. The 4 tasks are taken from a total of 6 different task types.
The multiple-choice task type, as its name indicates, consists in answering a question
followed by three possible answers, to check the candidate’s ability to understand
“specific points or an overall understanding of the main points of the listening text”
(IELTS web page, Test format). Another task type is the so-called “matching” type, in which
test takers must match a list of items with a set of options to check the skill of listening
for detail. The “plan, map, diagram labelling” task type consists in completing a map or
diagram with the suitable words to assess the ability to understand instructions. A similar
task type is “form, note, table, flow-chart, summary completion”, but in this case the
learner must complete a summary or a table with the same intention. Although it may
seem similar, the “sentence completion” task type is different from the previous ones:
the candidates are required to complete some sentences with the suitable words, but
the test items are not as visual as in the previous two types and take the form of a running
text. However, the focus is the same. Finally, candidates have to answer one question
briefly in the “short-answer” task type, which aims to test the “ability to listen for
concrete facts, such as places, prices or times, within the listening text” (ibid.).
The CEFR gives different examples of tasks suitable for the assessment of listening
comprehension. The wide variety of task types the IELTS uses makes it easy to confirm
that the main criteria to determine the level of the candidate will be covered by this
test.
IELTS vs CEFR (Listening)

IELTS tasks (4 tasks from 6 possible task types):
• Task type 1: multiple choice
• Task type 2: matching
• Task type 3: plan, map, diagram labelling
• Task type 4: form, note, table, flow-chart, summary completion
• Task type 5: sentence completion
• Task type 6: short-answer questions

CEFR tasks:
• listening to public announcements (information, instructions, warnings, etc.)
• listening to media (radio, TV, recordings, cinema)
• listening as a member of a live audience (theatre, public meetings, public lectures, entertainment, etc.)
• listening to overheard conversations, etc.

IELTS criteria:
• Multiple-choice questions are used to test a wide range of skills. The test taker may be required to have a detailed understanding of specific points or an overall understanding of the main points of the listening text.
• Matching assesses the skill of listening for detail and whether a test taker can understand information given in a conversation on an everyday topic, such as the different types of hotel or guest house accommodation. It also assesses the ability to follow a conversation between two people, and may be used to assess test takers’ ability to recognise relationships and connections between facts in the listening text.
• Plan, map and diagram labelling assesses the ability to understand, for example, a description of a place, and to relate this to a visual representation. This may include being able to follow language expressing spatial relationships and directions (e.g. straight on/through the far door).
• Form, note, table, flow-chart and summary completion focuses on the main points which a listener would naturally record in this type of situation.
• Sentence completion focuses on the ability to identify the key information in a listening text. Test takers have to understand functional relationships such as cause and effect.
• Short-answer questions focus on the ability to listen for concrete facts, such as places, prices or times, within the listening text.

CEFR criteria:
• main ideas of propositionally and linguistically complex speech
• can follow extended speech and complex lines of argument
• keep up with a conversation
• understand much of what is said in a discussion in which he/she is participating and be able to participate
• understand and follow lectures, talks and reports with academic vocabulary, announcements and messages, and radio documentaries or broadcast audio
• identify viewpoints and attitudes of different speakers
No rubric is used for the assessment of this paper.
ISE II
As stated above, the listening and the speaking skills are assessed together in the listening
and speaking paper. Of the 4 tasks of this paper, the final one is specifically for the
assessment of listening, although it is also assessed in the speaking tasks.
The “independent listening task” lasts for 8 minutes. It consists in listening to a
monologue. The examiner asks the candidate some questions before and after the
listening. The first time the monologue is played the questions are intended to check the
listening for gist. The second time, the questions require listening for details. The
candidate answers orally but may take some notes the second time.
The CEFR does include listening to a monologue (listening as a member of a live
audience: theatre, public meetings, public lectures, entertainment, etc.) as a suitable task
to assess the listening comprehension of the students. Nevertheless, it would be
recommendable to include some other listening tasks with varied formats. Besides, the
fact that there is only one task makes it difficult to assess all the criteria the CEFR
recommends for the assessment of listening comprehension at the B2 level.
ISE-II vs CEFR (Listening)

ISE-II task:
• Part 4: independent listening task (listening to a monologue twice)

CEFR tasks:
• listening to public announcements (information, instructions, warnings, etc.)
• listening to media (radio, TV, recordings, cinema)
• listening as a member of a live audience (theatre, public meetings, public lectures, entertainment, etc.)
• listening to overheard conversations, etc.

ISE-II criteria:
• showing ability to process and report information, including main points and supporting detail
• placing information in a wider context
• inferring information not expressed explicitly
• reporting the speaker’s intentions
• inferring word meaning

CEFR criteria:
• main ideas of propositionally and linguistically complex speech
• can follow extended speech and complex lines of argument
• keep up with a conversation
• understand much of what is said in a discussion in which he/she is participating and be able to participate
• understand and follow lectures, talks and reports with academic vocabulary, announcements and messages, and radio documentaries or broadcast audio
• identify viewpoints and attitudes of different speakers
The ISE-II uses two rubrics to assess the listening and speaking paper. On the one hand,
the listening criteria have been included in the rubric for the speaking tasks. On the other
hand, a rubric exclusively for the independent listening task is used for the assessment.
The rubric is holistic, as the skill is assessed globally. It is quantitative, with a numeric
scale from 0 to 4. As it is only used for the assessment of this particular task, it is a
domain-relevant and task-specific rubric. Moreover, it is a proficiency rubric, paper-based
(although it can be found online), and it is used by an examiner.
Type of rubric (ISE-II independent listening task)
• How it is measured: Holistic
• How it is scored: Quantitative
• Theme: Domain-relevant
• Application: Task-specific
• Function: Proficiency
• Scorer: Teacher (examiner)
• Channel: Paper
Below is the in-depth analysis:
➢ Is the skill assessed actually worthwhile?
Yes. The skill assessed is the listening skill. In the literature review section, the
importance of the listening skill has been fully addressed.
➢ Are the scoring criteria few and correctly labelled?
It is a holistic rubric, so there are no individually assessed criteria.
➢ Are the degrees of excellence appropriately described?
Yes. All the descriptors are well explained. However, they do not contain any examples.
➢ Is the rubric presented in a clear and handy way?
Yes, the rubric is brief and clear, which eases its use.
The CEFR recommendations for good rubrics (the necessity of building a feasible tool,
with descriptors that are positively worded, brief and not vague) have been partially
adopted: the ISE-II rubric is feasible since it does not contain individual criteria to be
assessed and is very clear and handy. As for the descriptors, they are positively worded,
as the CEFR recommends.
Finally, are the rubric and the test relevant, valid and reliable? On the one hand, the
criteria for the task are relevant because the CEFR includes both the task and the listening
functions it intends to measure. However, since there is only one listening task, many
functions and types of recording that the framework recommends are missing. The validity
can also be easily confirmed, since the test, with the help of the rubric, measures what it
is supposed to assess. The main problem is that, due to the lack of varied listening tasks,
it cannot be checked whether the candidate can do all the things stated in the CEFR scale
for a B2 learner.
CEFR (B2):
• Can understand standard spoken language, live or broadcast, on both familiar and unfamiliar topics normally encountered in personal, social, academic or vocational life. Only extreme background noise, inadequate discourse structure and/or idiomatic usage influences the ability to understand.
• Can understand the main ideas of propositionally and linguistically complex speech on both concrete and abstract topics delivered in a standard dialect, including technical discussions in his/her field of specialisation.
• Can follow extended speech and complex lines of argument provided the topic is reasonably familiar, and the direction of the talk is sign-posted by explicit markers.

ISE-II (Level 4):
• Identifies and reports all important points relevantly
• Shows full understanding of main points, and how they relate to the message as a whole
• Makes sense of connected English speech rapidly and accurately with confidence
• Fully infers meanings left unstated (e.g. speaker’s viewpoints)
Reliability
Reliability information has been given in the writing section.
EXAM
• Tasks: 1 (task 3 of the speaking paper also assesses listening a little)
• Match CEFR tasks: No
• Match CEFR criteria: No

RUBRIC
• Type: Holistic (measurement); Quantitative (scoring); Domain-relevant (theme); Task-specific (application); Proficiency (function); Examiner (scorer); Paper (channel)
• Relevant: No
• Valid: No
• Reliable: Yes
• CEFR criteria: Feasible: Yes; Descriptors: positive: Yes; brief: Yes; not vague: Yes
• Popham’s rubric: Skill worthwhile: Yes; Scoring criteria (few and well labelled): Not relevant; Descriptors (well described): Yes; Clear and handy: Yes
ACLES
The listening paper included in the B2 certificate of ACLES lasts for around 30 to 40
minutes. There must be at least 2 tasks and a maximum of 4. The recordings can be in
video or audio-only format, and they must last at least 2 minutes but no more than 5. The
UDC Language Centre states that the paper aims to check the ability of the candidate to
understand face-to-face conversation and recorded speeches about different topics, from
personal life to academic or professional issues. The criteria included in the following
table are the ones quoted on its web page (Centro de Linguas, Universidade da Coruña).
ACLES vs CEFR (Listening)

ACLES tasks (two, three or four):
• tasks checking the ability of the candidate to understand face-to-face conversation and recorded speeches about different topics, from personal life to academic or professional issues

CEFR tasks:
• listening to public announcements (information, instructions, warnings, etc.)
• listening to media (radio, TV, recordings, cinema)
• listening as a member of a live audience (theatre, public meetings, public lectures, entertainment, etc.)
• listening to overheard conversations, etc.

ACLES criteria:
• understand any sort of speech unless there is excessive background noise, too many specific terms or a bad structure
• understand the main ideas of a complex speech about concrete or abstract topics
• understand complex lines of argumentation when the topic is familiar and is developed with explicit markers

CEFR criteria:
• main ideas of propositionally and linguistically complex speech
• can follow extended speech and complex lines of argument
• keep up with a conversation
• understand much of what is said in a discussion in which he/she is participating and be able to participate
• understand and follow lectures, talks and reports with academic vocabulary, announcements and messages, and radio documentaries or broadcast audio
• identify viewpoints and attitudes of different speakers
According to the table, the listening paper is suitable to assess whether the candidate has
a B2 listening level. It would nonetheless be advisable to include a listening task which
involves a conversation between two speakers.
The ACLES listening exam paper does not use a rubric for the assessment.
EOI
The listening paper from the EOI advanced level lasts for at least 45 minutes and is
formed by different recordings (number unspecified) that are played twice, together with
a number of written tasks to check understanding. The kinds of tasks that may appear are
multiple choice; true or false; brief answers; relating different texts to headlines; putting
parts of a text in the correct order; identifying images; identifying the main points or
ideas in a conversation; completing tables, drawings, maps or diagrams; and recognising
communicative situations and following instructions (Escuela Oficial de Idiomas de
Gijón, Departamento de Inglés 137-138). The paper aims to check the ability of the
candidate to understand statements and messages, warnings and instructions about
abstract or concrete themes; understand the main ideas in conferences, talks and reports;
understand TV news or programmes about current affairs; understand documentaries,
live interviews and parts of TV and film pieces; understand conversations among native
speakers; understand face-to-face conversations; understand discussions on issues related
to his or her speciality; identify context elements; and recognise terms, expressions and
complex sentences in common situations (137). With regard to the guidelines stated in
the framework, the EOI paper would be suitable to assess the listening skill, since it may
contain tasks proposed in the framework itself and the criteria written there.
EOI vs CEFR (Listening)

EOI tasks (number unspecified):
• multiple choice
• true or false
• brief answers
• relate different texts to headlines
• put parts of a text in the correct order
• identify images
• identify the main points or ideas in a conversation
• complete tables, drawings, maps, diagrams
• recognise communicative situations and follow instructions

CEFR tasks:
• listening to public announcements (information, instructions, warnings, etc.)
• listening to media (radio, TV, recordings, cinema)
• listening as a member of a live audience (theatre, public meetings, public lectures, entertainment, etc.)
• listening to overheard conversations, etc.

EOI criteria (the ability of the candidate to):
• understand statements and messages, warnings and instructions about abstract or concrete themes
• understand the main ideas in conferences, talks and reports
• understand TV news or programmes about current affairs
• understand documentaries, live interviews and parts of TV and film pieces
• understand conversations among native speakers
• understand face-to-face conversations
• understand discussions on issues related to his or her speciality
• identify context elements; recognise terms, expressions and complex sentences in common situations

CEFR criteria:
• main ideas of propositionally and linguistically complex speech
• can follow extended speech and complex lines of argument
• keep up with a conversation
• understand much of what is said in a discussion in which he/she is participating and be able to participate
• understand and follow lectures, talks and reports with academic vocabulary, announcements and messages, and radio documentaries or broadcast audio
• identify viewpoints and attitudes of different speakers
The EOI listening paper does not use a rubric for the assessment.
6.6. Findings
The previous, detailed analysis of both the exam papers and the rubrics used (if any) to
assess some of the exam tasks enables a comparison between them from which some
findings can be drawn.
The first step is the comparison between exam papers. The English Certificate tests
analysed will be compared skill by skill and then in general.
Writing
CEFR (Writing)

Tasks:
• completing forms and questionnaires
• writing articles for magazines, newspapers, newsletters, etc.
• producing posters for display
• writing reports, memoranda, etc.
• making notes for future reference
• letters

Criteria:
• A B2 learner can produce clear, detailed text on a wide range of subjects and explain a viewpoint on a topical issue giving the advantages and disadvantages of various options.
• Can write a review of a film, book or play.
• Can write an essay or report.
Writing paper comparison

FCE: Time: 1h 20 min. Nº of tasks: 2. Word length: 140-190. Rubric: Yes. Match CEFR tasks: Task 1: Yes; Task 2: Yes. Match CEFR criteria: Yes.

IELTS (Band 6): Time: 60 min. Nº of tasks: 2. Word length: Task 1: 150-180; Task 2: 250-300. Rubric: Yes. Match CEFR tasks: Task 1: No; Task 2: Yes. Match CEFR criteria: Yes.

ISE II: Time: 2h (together with 2 reading tasks). Nº of tasks: 2. Word length: 180. Rubric: Yes. Match CEFR tasks: Task 1: Yes; Task 2: Yes. Match CEFR criteria: Yes.

ACLES: Time: 70-90 min. Nº of tasks: 2. Word length: at least 125 each and no more than 450 in total. Rubric: Yes. Match CEFR tasks: Task 1: Yes; Task 2: Yes. Match CEFR criteria: No (unknown).

EOI: Time: 90 min. Nº of tasks: 1 or 2. Word length: 75-250. Rubric: Yes. Match CEFR tasks: Task 1: Yes; Task 2: Yes. Match CEFR criteria: Yes.
Concerning the writing papers, the comparison shows that all the English Certificate
exams test the writing ability of the candidate in a separate paper except for the Trinity
ISE II, which tests it in a paper together with the reading. It can also be stated that almost
all the certificates have designed the paper taking the CEFR into account, since all the
tasks but one are mentioned in the framework as suitable tasks to assess this skill;
likewise, all of the criteria related to the level match the B2 criteria proposed by the
Council of Europe, with the exception of the ACLES certificate. In this case, it is not
known whether or not the criteria match because, although they are said to be based on
the CEFR, they are not written out specifically.
With regard to the IELTS writing task 1, it implies the completion of a graphic or diagram,
a task not contemplated by the CEFR. However, the CEFR includes the completion of
questionnaires as a suitable task, so this would not affect the reliability of the test. As for
the ACLES writing paper, it does not specify the assessment criteria for the writing,
which makes its comparison with the framework impossible, but this does not necessarily
mean that the paper is unsuitable. Nevertheless, it has already been explained how
important it is for students or candidates to be aware of the assessment criteria, so it would
be recommendable to include them.
Speaking
CEFR (Speaking)

Tasks:
• public address (information, instructions, etc.)
• addressing audiences (speeches at public meetings, university lectures, sermons, entertainment, sports commentaries, sales presentations, etc.)
• reading a written text aloud
• speaking from notes, or from a written text or visual aids (diagrams, pictures, charts, etc.)
• acting out a rehearsed role
• speaking spontaneously

Criteria:
• clear, systematically developed descriptions and presentations
• supporting ideas with subsidiary points and relevant examples
• describing an experience
• describe different topics in detail
• give reasons and support them in discussions and debates
• highlight the advantages or disadvantages of different options
• give presentations with clarity and fluency
• depart spontaneously from the discussion when follow-up questions are posed
Speaking paper comparison

FCE: Time: 15 min. Nº of tasks: 4. Rubric: Yes. Match CEFR tasks: Tasks 1-4: Yes. Match CEFR criteria: Yes.

IELTS (Band 6): Time: 11-14 min. Nº of tasks: 4. Rubric: Yes. Match CEFR tasks: Tasks 1-4: Yes. Match CEFR criteria: Yes.

ISE II: Time: 20 min (together with the listening paper). Nº of tasks: 3. Rubric: Yes. Match CEFR tasks: Tasks 1-3: Yes. Match CEFR criteria: Yes.

ACLES: Time: 7-10 min. Nº of tasks: 2. Rubric: Yes. Match CEFR tasks: No (unknown). Match CEFR criteria: No (unknown).

EOI: Time: 10 min. Nº of tasks: variable. Rubric: Yes. Match CEFR tasks: Yes. Match CEFR criteria: Yes.
Speaking is assessed through an individual paper in all the exams but the ISE II, because
of its integrated concept, which measures the skills combined in pairs. It is clear that the
papers have been designed according to the CEFR tasks and criteria. Once again, the
ACLES paper gives no information about the criteria and the tasks, so it is impossible to
examine it.
Reading
CEFR (Reading)

Tasks:
• reading for general orientation
• reading for information, e.g. using reference works
• reading and following instructions
• reading for pleasure

Criteria:
• reading for gist, for specific information, for detailed understanding, for implications, etc.
• read with a large degree of independence, adapting style and speed of reading to different texts and purposes, and using appropriate reference sources selectively
• ability to read correspondence
• follow a text of instructions
• understand specialised articles and reports related to current issues
• scan through long and complex texts
• find relevant details
• identification of the most important information, ideas or opinions
Reading paper comparison

FCE: Time: 1h 15 min (together with Use of English). Nº of tasks: 3. Rubric: No. Match CEFR tasks: Tasks 1-3: Yes. Match CEFR criteria: Yes.

IELTS (Band 6): Time: 60 min. Nº of tasks: 3 (from 11 possible types). Rubric: No. Match CEFR tasks: Tasks 1-3: Yes. Match CEFR criteria: Yes.

ISE II: Time: 2h (together with the writing paper). Nº of tasks: 2. Rubric: No. Match CEFR tasks: Tasks 1-2: Yes. Match CEFR criteria: Yes.

ACLES: Time: 60-70 min. Nº of tasks: 2-4. Rubric: No. Match CEFR tasks: No. Match CEFR criteria: No.

EOI: Time: 60 min. Nº of tasks: variable. Rubric: No. Match CEFR tasks: Yes. Match CEFR criteria: Yes.
From the comparison, it is fundamental to highlight the fact that none of the certificates
uses a rubric to assess this exam paper. As has already been explained, the criteria stated
by ACLES do not match the CEFR guidelines. The reason is that they aim to check only
whether the candidate is able to understand the main ideas, while the CEFR also suggests
the assessment of reading for detail, following instructions, etc. Nevertheless, from the
criteria stated by the UDC Language Centre, it can be inferred that the identification of
details and the following of instructions are also being measured.
Listening
CEFR (Listening)

Tasks:
• listening to public announcements (information, instructions, warnings, etc.)
• listening to media (radio, TV, recordings, cinema)
• listening as a member of a live audience (theatre, public meetings, public lectures, entertainment, etc.)
• listening to overheard conversations, etc.

Criteria:
• main ideas of propositionally and linguistically complex speech
• can follow extended speech and complex lines of argument
• keep up with a conversation
• understand much of what is said in a discussion in which he/she is participating and be able to participate
• understand and follow lectures, talks and reports with academic vocabulary, announcements and messages, and radio documentaries or broadcast audio
• identify viewpoints and attitudes of different speakers
Listening paper comparison

FCE: Time: 40 min. Nº of tasks: 4. Rubric: No. Match CEFR tasks: Tasks 1-4: Yes. Match CEFR criteria: No.

IELTS (Band 6): Time: 30 min. Nº of tasks: 4 (from 6 different types). Rubric: No. Match CEFR tasks: No. Match CEFR criteria: Yes.

ISE II: Time: 8 min. Nº of tasks: 1. Rubric: Yes. Match CEFR tasks: No. Match CEFR criteria: No.

ACLES: Time: 30-40 min. Nº of tasks: 2-4. Rubric: No. Match CEFR tasks: Yes. Match CEFR criteria: No.

EOI: Time: 45 min. Nº of tasks: variable. Rubric: No. Match CEFR tasks: Yes. Match CEFR criteria: Yes.
The listening papers have in common the lack of a grading scale for the assessment of the
candidate, with the exception of the ISE II, which does include one. It is also worth noting
that this is the skill for which the most incongruities between the framework and the
papers are found. The FCE criteria do not match the CEFR criteria because none of the
tasks implies the response or participation of the candidate in a discussion with native
speakers, and this could be an interesting task for the certification of the level. As for the
IELTS, the tasks proposed are different from the framework’s examples of tasks for the
assessment of listening comprehension. Another certificate which does not follow the
criteria is the ACLES paper, as none of its tasks involves a conversation between two
speakers; the criteria for the assessment cannot therefore be matched.
English Certificate papers’ comparison

Nº of exam papers: FCE: 4; IELTS (Band 6): 4; ISE II: 2; ACLES: 4; EOI: 4.

Match CEFR tasks (FCE / IELTS / ISE II / ACLES / EOI):
• Writing: Yes / Yes / Yes / Yes / Yes
• Speaking: Yes / Yes / Yes / No / Yes
• Reading: Yes / Yes / Yes / No / Yes
• Listening: Yes / No / No / Yes / Yes

Match CEFR criteria (FCE / IELTS / ISE II / ACLES / EOI):
• Writing: Yes / Yes / Yes / No / Yes
• Speaking: Yes / Yes / Yes / No / Yes
• Reading: Yes / Yes / Yes / No / Yes
• Listening: No / Yes / No / No / Yes

Reliability (FCE / IELTS / ISE II / ACLES / EOI):
• Cronbach’s alpha: 0.94 (listening), 0.92 (writing) / 0.90 / Unknown / Unknown / Unknown
• SEM: 2.78 (listening), 0.37 (writing) / 0.38 / Unknown / Unknown / Unknown
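The reliability figures reported above, Cronbach’s alpha and the standard error of measurement (SEM), can be made concrete with a short computation. The following is an illustrative sketch only: the item scores used here are invented for demonstration and do not come from any of the certificates analysed.

```python
import math

def cronbach_alpha(items):
    """Cronbach's alpha = (k / (k - 1)) * (1 - sum of item variances / variance of totals).

    items: a list of k item-score lists, one per test item,
    each holding the scores of the same n candidates.
    """
    k = len(items)
    n = len(items[0])

    def variance(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / len(xs)

    sum_item_variances = sum(variance(item) for item in items)
    # Each candidate's total score across all items
    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Invented data: 4 items, 5 candidates, each item marked 0-5
scores = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [3, 3, 4, 2, 5],
    [4, 3, 5, 2, 3],
]
print(round(cronbach_alpha(scores), 2))  # 0.9 for this invented data
print(round(sem(10, 0.94), 2))           # 2.45: an SD of 10 with alpha 0.94
```

As the table shows, only the FCE and the IELTS publish figures of this kind; the remaining certificates leave them unknown, which is precisely the transparency problem discussed in the conclusions.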
Rubrics’ comparison

WRITING rubrics (FCE / IELTS / ISE II / ACLES / EOI):
• Measurement: Analytic / Analytic / Analytic / Analytic / Analytic
• Scoring: Quantitative / Quantitative / Quantitative / Quantitative and qualitative / Quantitative and qualitative
• Theme: Domain-relevant / Domain-relevant / Domain-relevant / Domain-relevant / Domain-relevant
• Application: Skill-focused / Task-focused / Task-focused / Skill-focused / Task-focused
• Function: Proficiency / Proficiency / Proficiency / Proficiency / Proficiency
• Scorer: Examiner / Examiner / Examiner / Examiner / Examiner
• Channel: Paper / Paper / Paper / Paper / Paper
• Relevant: Yes / Yes / Yes / Unknown / Yes
• Valid: Yes / Yes / Yes / Yes / Yes
• Feasible (CEFR): Yes / No / No / Yes / No
• Descriptors positive (CEFR): Yes / No / No / No / No
• Descriptors brief (CEFR): Yes / Yes / No / Yes / No
• Descriptors not vague (CEFR): No / No / Yes / Yes / Yes
• Skill worthwhile (Popham): Yes / Yes / Yes / Yes / Yes
• Scoring criteria, few and well labelled (Popham): Yes / Yes / Yes / No / No
• Descriptors well described (Popham): No / Yes / Yes / Yes / Yes
• Clear and handy (Popham): Yes / No / No / Yes / No
From the comparison of the rubrics, a number of interesting findings emerge. It is
especially relevant that all the rubrics are analytic and none of them holistic. Presumably,
this is because the certificates attempt to give the most accurate score possible. Analytic
rubrics measure different aspects of the candidate’s writing, such as organisation, lexis,
grammar, etc. Hence, it is easier for the marker to give a precise score.
Furthermore, all of them are quantitative except for the ones used by the ACLES and the
EOI, which contain both quantitative and qualitative scales. The main reason may be that
these certificates give the candidates a numeric score in order to reflect not only whether
they have the B2 level, but also how far they are from achieving it or how well they have
performed. Nevertheless, a qualitative approach could also be feasible, although it would
imply the inclusion of some kind of feedback or explanation by the examiner.
In the matter of the application of the rubric, two of them are skill-focused (FCE and
ACLES), as they are used for the assessment of all the writing tasks, whereas the other
three are task-focused because they use specific rubrics for each of the tasks.
It is worth mentioning that none of the rubrics has passed all the criteria, with the FCE’s
and the ACLES’s being the grading scales with the fewest failures. This illustrates the
complexity of the creation of a rubric and how difficult it is to design a perfect one.
SPEAKING rubrics (FCE / IELTS / ISE II / ACLES / EOI):
• Measurement: Analytic / Analytic / Analytic / Analytic / Analytic
• Scoring: Quantitative / Quantitative / Quantitative / Quantitative and qualitative / Quantitative and qualitative
• Theme: Domain-relevant / Domain-relevant / Domain-relevant / Domain-relevant / Domain-relevant
• Application: Skill-focused / Skill-focused / Skill-focused / Skill-focused / Skill-focused
• Function: Proficiency / Proficiency / Proficiency / Proficiency / Proficiency
• Scorer: Examiner / Examiner / Examiner / Examiner / Examiner
• Channel: Paper / Paper / Paper / Paper / Paper
• Relevant: Yes / Yes / Yes / Yes / Yes
• Valid: Yes / No / Yes / Yes / Yes
• Feasible (CEFR): Yes / No / No / Yes / No
• Descriptors positive (CEFR): Yes / No / No / Yes / No
• Descriptors brief (CEFR): Yes / Yes / No / Yes / No
• Descriptors not vague (CEFR): No / No / Yes / No / Yes
• Skill worthwhile (Popham): Yes / Yes / Yes / Yes / Yes
• Scoring criteria, few and well labelled (Popham): Yes / Yes / Yes / Yes / No
• Descriptors well described (Popham): No / Yes / Yes / No / Yes
• Clear and handy (Popham): Yes / No / No / Yes / No
In the case of the speaking rubrics, similar results were encountered. The preferred rubric
is the analytic, quantitative and skill-focused one, since all the rubrics are used for
assessing all the speaking tasks of each of the exam papers. No rubric was found that
passed the whole analysis without failures. The FCE rubric and the ACLES rubric were
the ones which failed in the fewest categories, since they only lack descriptions for some
of the bands.
It can be highlighted that the criteria were few and well labelled in all the rubrics except
for the EOI’s. On the other hand, only three of them presented correctly described
descriptors. In the case of the EOI’s rubric, the descriptors were precisely written;
however, there were too many of them, making the rubric neither handy nor feasible.
Something very similar happens with the Trinity ISE-II’s rubric: even though its criteria
are few and well labelled, the fact that the descriptors are so long makes the rubric very
unhandy.
Chapter 7: CONCLUSIONS
This thesis can be concluded by making a general reflection upon what this research
means within the area studied and what it entails for the current educational community.
One of the main goals of the current doctoral thesis was to determine the degree of
implementation of the CEFR in the principal English certificates, as well as to examine
the use of rubrics in the assessment of different skills. The results of an extensive
examination of the exams and their grading scales show that more research is needed in
the field as incongruences and shortcomings have been detected.
The research undertaken has been satisfactory in many ways. It has enabled not only a
deeper understanding of the complex assessment process, but also a better perception of
the CEFR as a cornerstone for TEFL in Europe and the main driving force towards the
establishment of a communicative system in each of its member states. However, one of
the most significant findings to emerge from this thesis is the verification that the
certificates that must determine the learner’s competence present faults in the
implementation of the basic CEFR guidelines, and that they assess with rubrics which do
not meet the efficiency and reliability requirements that are expected of them. Moreover,
significant research limitations stem from the lack of transparency the certificates show
in terms of assessment criteria, instruments and reliability data. In spite of their official
status and national recognition, some certificates provide vague information on the
structure of their exams and do not supply reliability data, such as the SEM or Cronbach’s
alpha. As a result, the reflections of this conclusion must be read in light of the need for
additional extensive research to complement and improve them.
With regard to the positive aspects, the analysis of both the exam papers and the grading
scales, together with the comparisons made, opens up interesting lines for future research.
Furthermore, the community can benefit from the key findings, which have established
patterns between the assessment of the different skills and contribute to the improvement
of the evaluation process from the perspective of exam creation and rubric design. The
switch towards a more communicative assessment of the receptive skills can also be
initiated on some of the bases of the current dissertation.
7.1. Research implications
The research carried out allows reflections to be made related to many different aspects
or areas, such as the institutions in charge of the certificates, the development of exams,
the creation of rubrics and the CEFR. Each of those reflections derives from some
problem, obstacle or issue which should be addressed promptly for the sake of the
teaching-learning process.
The first observation deals with the evidence that the exam stipulations and criteria are
not always at the candidate’s disposal, as the exam structure is sometimes variable,
unclear or confusing. Indeed, among the tests studied, just three of the certificates provide
test candidates with complete information on the exam criteria and the tasks that will be
faced. Here it is essential to bear in mind that a great deal of research has been conducted on the benefits of giving EFL learners access to rubrics, explaining their criteria and providing students with good and bad examples (Wang; Sundeen; Becker; Laurian and Fitzgerald). The ACLES certificate, for instance, makes no reference to the exam’s criteria for writing and speaking, even though they are specified
for the reading and listening paper. In the case of the EOI certificate, the tasks of the
writing paper are mentioned but not explained, as occurs with the exercises of the
speaking, listening and reading parts.
Despite this absence of concrete details on the institutions’ webpages, it has been assumed that the official examiners of both certificates do have access to this information; otherwise, the certificates would be completely unreliable. It is therefore recommendable that the exam contents and assessment specifications be published so that candidates can prepare for the exams accordingly. It should be noted, however, that both
certificates depend partially on local institutions: the different universities where it is
assessed, in the case of the ACLES certificate, and the different schools and autonomous
communities in the case of the Schools of Languages. This may be the reason why criteria
and tasks vary from one exam to another, or it may explain why some of these institutions
publish the specifications of their tests and others do not. For example, the EOI from A Coruña does not publish its official assessment rubrics, whereas the EOI from Gijón shares them on its website. To enhance the transparency of the exams, it would be advisable to unify criteria, exam structures, tasks and rubrics, which would further increase the reliability of the certificates.
Following this line of transparency and specification, this analysis found insufficient data
and research on the reliability of the certificates. Certificates of such significance at
national level ought to be endorsed with sufficient research on effectiveness and validity.
Nevertheless, whereas the FCE and the IELTS websites contain data concerning
reliability indicators (e.g. the Cronbach’s alpha and SEM) as well as articles and
documents explaining research and studies carried out regarding the reliability of their
certificates, the ISE-II, ACLES and EOI websites do not mention any empirical figure.
In the current doctoral dissertation, only the data available could be taken into account in the examination and comparison of the different exams, as verifying the reliability of each of the papers would imply a complete, immeasurable parallel investigation.
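As an illustration of the indicators just mentioned, the sketch below computes Cronbach’s alpha and the SEM from a toy matrix of item scores. All figures are invented for the example and are not taken from any of the certificates analysed:

```python
# Toy computation of the two reliability indicators discussed above.
# Rows are candidates, columns are exam items; all values are invented.
import statistics as st
import math

scores = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 5, 4],
    [3, 3, 3, 4],
    [1, 2, 2, 2],
]

k = len(scores[0])                     # number of items
items = list(zip(*scores))             # column view: all scores for one item
totals = [sum(row) for row in scores]  # each candidate's total score

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)
item_vars = sum(st.variance(item) for item in items)
alpha = k / (k - 1) * (1 - item_vars / st.variance(totals))

# Standard Error of Measurement: SD of totals * sqrt(1 - alpha)
sem = st.stdev(totals) * math.sqrt(1 - alpha)

print(round(alpha, 2), round(sem, 2))  # 0.94 0.97
```

With real score data, these two figures would permit exactly the kind of cross-certificate comparison that was not possible in the present study.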
Another implication which may be drawn from the study conducted is the fact that none of the rubrics used for the assessment has passed the effectiveness test. This clearly illustrates the difficulty and complexity that designing a truly effective and valid rubric entails. One of the main complications discovered in the comparison of results is striking a suitable balance between the brevity the level descriptors are supposed to have and the clear, detailed explanations they should give. Some of the rubrics exemplify this obstacle; for instance, the writing rubric used by the EOI measures so many different criteria that it is neither feasible nor clear and useful.
On the other hand, the FCE writing rubric is feasible and useful, and it contains a suitable
number of criteria, but the descriptors are too vague and not well described. Something
similar happens with the ISE II’s grading scale: the descriptors are suitable, well
explained and clear, but they are not brief.
The fact that none of the rubrics has passed all the assessment criteria leads to a consideration of the actual feasibility of the framework itself. The CEFR gives guidance on
how to assess the different skills and it does so in different ways. To begin with, it contains
charts which provide information on the characteristics a candidate must achieve in each
of the language levels for each of the skills. This allows teachers or institutions to prepare curricula and syllabi, and to design evaluations and tests for the assessment of the contents learned
or for diagnosis. Moreover, it includes grading scales with descriptors for each of the
skills and each of the levels, and even for different tasks. Thus, criteria and descriptors are available for the assessment of speaking through a sustained monologue, a public announcement or an addressing-an-audience task. Nevertheless, making maximum use of the framework
would imply that the institutions in charge of the certificates’ elaboration would either
have to use a different rubric for each of the tasks or use the overall one, which is not as
precise as the others. These drawbacks are not crippling per se; in fact, task-specific rubrics would ensure that the tasks selected are appropriate and that the descriptors incorporated into the scale are much more precise. On the other hand, the amount of work would rise, together with the time required. Another possible inconvenience related to the previous one is
that the CEFR scales are holistic and they only indicate different levels of language. As a
result, those certificates which determine whether a candidate has a certain level or not
may find the descriptions too general since they do not contemplate different degrees of
competence within the same level; i.e., whether the candidate’s performance is outstanding, very good, solid, satisfactory, etc.
Many studies (such as the previously mentioned one by Trong and that carried out by Ghalib and Al-Hattami) have proved that analytic scales are more precise than holistic ones and, consequently, more reliable; this is why most of the rubrics examined are analytic (except for the ISE-II’s listening one). As the institutions
themselves are responsible for creating their own grading scales and the framework just
contains holistic ones, the process of designing the rubrics is more complicated and the
resulting rubrics are more diverse despite being based on the framework. Each institution
must decide on the criteria, the scale and formulate the descriptor for each of the levels.
While this may result in a much more tailored rubric, it may also distance them from the
framework, and hence the reliability thereof is likely to decrease. At this point, the inclusion of analytic rubrics in the CEFR would unify the rubrics, at least in terms of criteria, which would increase the reliability of the certificates. The institutions
would only need to tailor these rubrics to their exam’s features and tasks, but the
assessment criteria and the basics of the descriptors would be more similar.
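The contrast between holistic and analytic scoring described in this section can be sketched as a small data structure. The criteria, weights and bands below are hypothetical and do not reproduce any of the rubrics examined:

```python
# Minimal sketch of an analytic rubric: each criterion receives its own 0-5
# band and the overall mark is a weighted average. A holistic rubric would
# instead collapse the whole performance into a single global band.

# Hypothetical criteria and weights (not those of any real certificate).
ANALYTIC_RUBRIC = {
    "task_fulfilment": 0.25,
    "coherence_and_cohesion": 0.25,
    "grammatical_range_and_accuracy": 0.25,
    "lexical_range": 0.25,
}

def analytic_score(band_per_criterion: dict) -> float:
    """Weighted average of the 0-5 bands awarded for each criterion."""
    return sum(ANALYTIC_RUBRIC[c] * band for c, band in band_per_criterion.items())

# The examiner awards a band per criterion rather than one global impression:
marks = {
    "task_fulfilment": 4,
    "coherence_and_cohesion": 3,
    "grammatical_range_and_accuracy": 4,
    "lexical_range": 5,
}
print(analytic_score(marks))  # 4.0
```

The per-criterion bands are what make the analytic format more precise: a weakness in cohesion remains visible in the record instead of being averaged away inside a single holistic judgement.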
In connection with the designing of rubrics according to the framework’s guidance, it is
worth mentioning the recommendations given for the descriptors. Those suggestions are
the ones which have been mentioned and used for the examination of all the rubrics during
the research: the feasibility of the rubric according to the size and number of criteria used, positive wording, and brief, non-vague descriptions of the levels. It has already been stated above how difficult it is to achieve the right balance among those specifications, particularly between the prescription of brief descriptors and that of non-vague ones. Another obstacle is positive wording, since describing the lower levels of performance in affirmative terms requires true mastery of the language. In fact, only the FCE writing rubrics are positively worded, whereas none of the other rubrics analysed respects this instruction. As far as the speaking rubrics are concerned, all but the ACLES and FCE rubrics use negative wording. More examples of positively worded descriptors in the CEFR grading scales would help to ease their implementation in other rubrics.
The European Framework of Reference for Languages is extremely useful in the matter of assessment and has accomplished a great deal in the unification of levels and the promotion of communicative approaches. Nonetheless, it should be noted that its application might not be as feasible as it should be, owing to the shortage of precise grading scales. None of the rubrics involved in the assessment of the most prominent language certificates is completely effective, valid and reliable, and all of them contradict the CEFR’s guidelines in some respect. Although this phenomenon may be due to particular unrelated reasons, it would be foolish not to hold the framework responsible for it to some extent. It may be the case that the applicability of the CEFR is not yet a total reality and, therefore, a revision and improvement of it is highly recommendable.
Another interesting reflection arises from the presence or absence of rubrics for the
assessment of papers. All the grading scales analysed assess the writing or speaking
papers except for the ISE-II rubric, which assesses the listening skill. This leads to the
conclusion that the productive skills (speaking and writing) are easier to assess with a
rubric than the receptive skills. Since the framework provides grading scales for reading
and listening and it is possible to assess those skills with rubrics, we need to look for an
explanation in the exam tasks. The evaluation of writing generally consists of writing
essays and speaking requires the production of a speech or discussion. These types of
tasks are resolved with open answers, very different from one candidate to another, and
hence the objectivity of their assessment depends on the use of a reliable tool (the rubric).
On the other hand, reading and listening exams contain tasks which are generally
answered with a multiple choice or with one or two concrete words. In these cases, a
rubric would be totally useless as, given that an answer can only be correct or incorrect,
there is no place for creative answers. A change in the type of task towards more open answers would be required in order to measure competence with a rubric. The ISE II
listening paper, for instance, requires the candidate to answer questions orally in order to
assess whether he or she has understood the recording. Hence, it is possible to assess with
a grading scale; for example, if all the data from the listening have been correctly
understood or just a part of them. Tasks of this type benefit from the use of a rubric. The production of a summary in the reading paper to prove correct understanding of the general ideas and sub-ideas, as well as the relationships established among them, is another example of a task which could be assessed with a grading scale. Other possible
tasks include the oral or written explanation of a topic by using all the information given
either in the listening track or the reading texts. Nevertheless, the change of the traditional
forms of assessment in the listening and reading papers would necessarily involve a
complete shift in the teachers’ methodology too.
7.2. Research limitations
In spite of the fact that the research conducted has led to many interesting findings and allows us to make some important reflections on the use of rubrics, the adaptation of the CEFR and the English certificates, these findings must be taken with caution, since it is fundamental to be aware of the multiple limitations of this thesis. With the intention of analysing the results from a reasonable perspective and of being able to suggest future actions to improve them, the following paragraphs present a series of constraints that should be taken into account.
Firstly, the data mentioned referring to the reliability of the tests have been taken directly
from the certificates’ official websites, but many of them did not provide those data, so a
real comparison between the reliability of certificates was not possible. Ideally, for a
comprehensive examination and contrast of all the exams, they should be properly tested in terms of reliability. For instance, a group of participants should sit all the certificate exams and be assessed by at least two different examiners so that the results could be compared. Thus, data referring to the reliability of each test paper could be obtained and compared. If the reliability were good, each candidate should obtain the same score regardless of the examiner. At the same time, comparisons between exams could be made. Candidates
should obtain similar results in all the certificates in terms of level; for example, obtain a
B2 level in all the certificates. However, all this would imply carrying out another huge
parallel research project.
Another limitation of the study is related to the rubrics. As has been explained, the test
designed to control their effectiveness is a combination of the information contained in
the framework about the requirements for building a rubric and a rubric of rubrics
designed by Popham. Nevertheless, there is plenty of research on rubrics and how to build
effective and reliable ones. Although most authors agree on the basic requirements, there may be certain variations as to what makes a good rubric and what criteria it must contain. Experienced examiners could have been involved in the research, so that their perspective and knowledge of the use of the selected rubrics, the assessment of the different exam papers and the design of their tasks could have been incorporated. Interviews or questionnaires could have been used in the design of the research or in the discussion of its results and findings.
Even though the European Framework has been one of the principal cornerstones of this
research, the limitations of the framework itself have already been explained in the
previous section. As a result, the tasks suitable for assessing the different skills or the
criteria for assessing a candidate in each of the levels might not be perfect or totally exact.
On the other hand, the framework was written on the basis of many exhaustive investigations by an entire panel of experts, and since it is the main frame of reference in the teaching, learning and assessment of languages, it must be treated with the credibility it deserves. It is fundamental, though, to consider that the project was undertaken more than a decade ago and was an enormous, unprecedented undertaking. It may thus be adjusted, improved or modified in the future.
7.3. Research applicability and future implications
The current thesis intends to contribute to the area of language learning, especially to the assessment of foreign languages through the use of rubrics. Despite its numerous limitations, it may be used in several ways.
The first application of the research could be as an encouragement to improve the rubrics
used by the main certificates analysed. The examination carried out readily reveals certain mistakes that could be corrected to a greater or lesser extent. Since none of the rubrics met all of the recommendations stated in the framework, nor those of the rubric of rubrics applied, further revision should be conducted by their institutions. This revision could start from the current research; for instance, those rubrics which do not use positive wording in their descriptors should correct this anomaly.
Similarly, those which use vague descriptors or excessively long ones, or perhaps too
many criteria, should be more harmonious with the framework guidelines.
Increasing the transparency of the information provided on the exams is particularly encouraged. The benefits of providing students with the assessment criteria before the exam, as well as of allowing them to be fully conversant with the tasks of the paper, have been sufficiently proved. Some of the certificate papers analysed did not provide candidates with specific data about the paper tasks, or the information given was far too vague. Including these data would
be advisable and it would be positive to attach a number of past exam papers or models.
Furthermore, criteria for each of the tasks or papers are not always specified. Learners
need to know what is expected from them in order to be prepared for the exam, and to
know how far they are from being able to use the language functions a determined level
requires. It is also recommendable that the instruments of assessment be available, as well
as examples which correspond to different grades. Some of the rubrics analysed are not
published on the institutions’ websites or are not easy to find. A perfect scenario would
incorporate training and practice on rubrics into the certificate’s preparation courses. For
instance, learners could try to assess other candidates’ compositions or speaking
performance by using the same rubric. It should be explained to candidates how those rubrics work, and they should be allowed to see different examples.
The research should also encourage a revision of the European Framework of Reference
for Languages. It was published in 2001, so almost two decades have already passed. It
is essential to check its results and effectiveness. Some research ought to be conducted on what the framework has meant over these years, and on whether its aims have been accomplished or not. From these possible future studies, some conclusions and reflections could be derived. Firstly, such reflection would allow us to ascertain how the Framework is being used and implemented: for example, whether it is really being used as the cornerstone of language teaching and learning, and whether the use being made of it is suitable. Additionally, it should be considered whether textbooks and language certificates actually follow the guidelines given and, finally, whether the education curricula and syllabi designed by the different governments allow learners to attain sufficient language competence at each of the levels.
The results of those analyses could help to improve the framework itself. In the current
thesis, a number of gaps in the framework have been noticed. As a result, some suggestions are humbly made. For example, it would be interesting to include information on the criteria the rubrics should incorporate for each skill, and to add analytic rubrics for each of the different levels and each of the skills. Furthermore, several exam
tasks could be added together with examples of model answers.
The test designed to check the validity of the exam papers and rubrics could be used to
assess other certificates or other rubrics. In addition, it could also be employed as a
guideline to create a new rubric or a new exam paper. Rubric creators should take into
account all the guidelines and check if they are really being followed. At the same time,
exam task recommendations and criteria should be considered when designing a syllabus
or an exam paper.
A broad area of research resulting from this thesis could be the analysis of reliability of
all the certificates, so that they can be compared, ranked and improved. This research
could also be subdivided into smaller areas. Improving the reliability of important English certificates would be incredibly helpful for society and the educational community.
Another future area of research is that concerning the assessment of receptive skills with
rubrics. This would lead to a wholesale transformation of teaching methodologies and
learning processes for the reading and listening skills. As has already been indicated, the
use of grading scales for the assessment of those skills would imply an implementation
of exam tasks different from the traditional ones. A change in the traditional tasks would
necessarily have to be accompanied by a change in the teaching-learning process.
Moreover, research on how to create effective, well-designed rubrics for the tasks or papers would have to be undertaken. Subsequently, research on the effectiveness of the new ways of teaching, learning and assessing these two skills could be conducted, leading in turn to improvements in and adjustments to the process.
Further areas of study include additional research on grading scales. To date, there
has been a lack of detailed studies on how to determine which rubrics are satisfactory.
More case studies would help to compare the different effects and implications of the use
of different types of rubrics. Most of the current studies deal with analytic and holistic
types, but case studies comparing rubric types according to other criteria are scant.
The following is a summary of some of the possible future lines of research:
- Analysis of the reliability of the most common English Certificates, working out the reliability index of each of them, for example Cronbach’s Alpha and the SEM.
- Comparison of other English certificates.
- Detailed revision of the European Framework of Reference for Languages,
applicability, consequences and possible deficiencies.
- Research on effective, reliable and valid rubrics. How to determine which
ones are good and improve them.
- Analysis, comparison and contrast of rubrics according to scale, function,
scorer and channel.
- Research on new ways of assessment for the receptive skills. New
methodologies and instruments of assessment.
- Research on the use of rubrics to assess the reading and listening papers.
These exciting new lines of research point to the great deal of research which is still needed to improve the language learning process, to which the current doctoral thesis is merely a drop in the ocean. It might be overwhelming to consider how much study is required before we achieve a system that unequivocally guarantees that students develop their maximum potential language competence. Nevertheless, one can reflect on the meaningful milestones which the educational community has accomplished over two decades and realise that the field of language learning has already undergone a powerful and wholesale change towards truly communicative learning. This should be a
source of inspiration for all researchers, to prove to them that findings, foresight,
discipline, research and effort can actually transform the reality we live in.
Chapter 8: BIBLIOGRAPHY
Abbas, Zainab. “Difficulties in Using Methods of Alternative Assessment in Teaching from Iraqi Instructors’ Points of View.” Al-Faith Journal. University of Diyala, College
of Education-Diyala. No. 48. Feb. 11 Oct. 2016
www.iasj.net/iasj?func=fulltext&aId=39413
ACLES. “Certacles. Modelo de acreditación de exámenes de ACLES.” Acles.es,
www.acles.es/files/certacles-modelo-acreditacion-examenes-acles.pdf
--“Estructura exámenes certacles.” Acles.es,
www.acles.es/files/ckeditor/estructura_examenes_certacles_2016_2_1.pdf
Al-Ghazo, Abeer. “The assessment of Reading comprehension strategies: Practices of
Jordanian public teachers at secondary level.” International Journal of English Language,
Literature and Humanities, Vol. III, Issue V, Jul. 2015, pp. 721-742.
Allen, Laura K. et al. “L2 Writing Practice: Game enjoyment as a key to engagement.”
Language Learning and Technology, Vol. 18, No.2, Jun. 2014, pp.124-150.
Altec. “Crear rúbrica.” Rubistar, University of Kansas, powered by 4teachers.org,
rubistar.4teachers.org/index.php?screen=CustomizeTemplate&bank_rubric_id=57&section_id=12&
Andrade, Heide. “What is a Rubric?” Rubistar. Create Rubrics for your Project-Based Learning Activities, 2008. rubistar.4teachers.org/index.php?screen=WhatIs
Annenberg Foundation. Annenberg Learner. Teacher resources and professional
development across the curriculum, St. Louis, MO, 2016, www.learner.org/
Ayhan, Ülkü, and M. Uğur Türkyılmaz. “Key of language assessment: rubrics and rubric
design.” International Journal of Language and Linguistics, Vol. 2, No.2, Jun. 2015, pp.
82-92, ijllnet.com/journals/Vol_2_No_2_June_2015/12.pdf
Baitman, Brittany and Mauricio Veliz Cambos. “A Comparison of oral evaluation ratings
by native English teachers and non-native English speaker teachers.” Literatura y
Lingüística, Vol. 61, No. 3, 27th Aug. 2012, pp. 171-200.
Bas, Gökhan. Implementation of Multiple Intelligences Supported Project-Based
Learning in EFL/ESL Classrooms, Karatli Sehit Sahin Yilmaz Secondary School, 2008.
Becker, Anthony. “Student-generated scoring rubrics: Examining their formative value
for improving ESL students’s writing performance.” Assessing Writing, Jul. 2016.
Black, Paul and Dylan Wiliam. “Inside the Black Box: Raising Standards through Classroom Assessment.” Phi Delta Kappan, Vol. 80, No. 2, 1998, pp. 139-148,
doi:10.1177/0031721009200119
British Council, IDP: IELTS Australia and Cambridge Assessment English, “Test
Format.” IELTS, 2018, www.ielts.org/about-the-test/test-format
--“Test performance 2017.” IELTS, www.ielts.org/teaching-and-
research/test-performance
--“Guide for teachers. Test format, scoring and preparing students for the
test”, UCLES, 2012.
Brooks, Gavin. “Assessment and Academic Writing: A look at the Use of Rubrics in the
Second Language Writing Classroom.” Kwansei Gakuin University Humanities Review,
Vol. 17, 2012, pp. 227-240, core.ac.uk/download/pdf/143638458.pdf
Brooks, Val. Assessment in Secondary Schools. The new teacher’s guide to monitoring,
assessment, recording, reporting and accountability. Buckingham, Philadelphia, Open
University Press, 2002.
Buján, Karmele et al. (coord.). La evaluación de competencias en la educación superior.
Las rúbricas como instrumento de evaluación. Editorial MAD, 2011.
Cambridge Assessment English. “Quality and Accountability.” Cambridge Assessment
English, www.cambridgeenglish.org/research-and-validation/quality-and-
accountability/
Cambridge English Language Assessment. “Cambridge English First Handbook for
Teachers for exams from 2016.” CambridgeEnglish.org,
www.cambridgeenglish.org/images/167791-cambridge-english-first-handbook.pdf
--IELTS 13 Academic with Answers, Cambridge University Press, 2015.
Cambridge University Press. “Research.” Cambridge Dictionary, 2018,
dictionary.cambridge.org/
Cano, Elena. “Las rúbricas como instrumento de evaluación de competencias en
educación superior: ¿Uso o abuso?” Profesorado. Revista de currículum y formación del
profesorado, Vol. 19, No. 2, Mai-Aug. 2015, pp. 265-280,
www.ugr.es/~recfpro/rev192COL2.pdf
Carrillo Zoque, Andrea del Pilar and Diana Rocío Unigarro Millan. La lúdica como
estrategia para transformar los procesos de evaluación tradicional de las estudiantes de
grado décimo en la clase de inglés en el Liceo Femenino Mercedes Nariño. Dissertation.
Tut. Dario Alexsander Chitiva Rodriguez. Bogota, Fundación Universitaria Los
Libertadores Vicerrectoría De Educación Virtual Y A Distancia Especialización En
Pedagogía De La Lúdica, 2015.
Castillo, Santiago and Jesús Cabrerizo. Evaluación educativa de aprendizajes y
competencias, Madrid, Pearson Education, 2010, ISBN: 978-84-8322-781-7
Castillo Tabares, R. et al. “Implicaciones de la evaluación continua a través de rúbricas
sobre las prácticas pedagógicas: evidencia empírica y aplicación de análisis
multidimensional.” Revista Horizontes Pedagógicos, No. 16. 2014, pp. 66-77.
Center for Advanced Research on Language Acquisition (CARLA).
www.carla.umn.edu/assessment/vac/Evaluation/res_1.html
Centro de Linguas Universidade de A Coruña. “Acles.” Centrodelinguas.gl,
http://www.centrodelinguas.gal/gl/pag/276/acles/
-- “Especificacións do nivel B2.” Centrodelinguas.gl,
http://www.centrodelinguas.gal/gl/pag/289/caja-estatica--acreditacion-acles--
especificacions-do-nivel-b2/
Clever Prototypes, LLC. Quick Rubric web page, 2016, www.quickrubric.com/r#/create-
a-rubric
Council of Europe. Common European Framework of Reference for Languages:
Learning, Teaching, Assessment. Cambridge: Cambridge University Press, 2001.
Council of Europe. “Education and Languages, Languages Policy.” Council of Europe,
2014, www.coe.int/t/dg4/linguistic/Cadre1_en.asp
Çaḡatay, Sibel and Fatma Ünveren. “Is CEFR Really over There?” Procedia. Social and
Behavioural Sciences, Vol. 232, 2016, pp. 705-712.
Dawson, Phillip. “Assessment rubrics: towards clearer and more replicable design,
research and practice.” Assessment and evaluation in Higher Education, Nov., Routledge,
doi: 10.1080/02602938.2015.1111294
Del Pozo Flórez, Jose Angel. Competencias Profesionales. Herramientas de evaluación:
el portafolios, la rúbrica y las pruebas situacionales, Narcea S.A. de Ediciones, 2012.
Dikli, Semire. “Assessment at a distance: Traditional vs. Alternative Assessment.” The
Turkish Online Journal of Education Technology, Vol. 2. Issue 3, art. 2. Florida State
University, 2003.
Ekmekçi, Emrah. “Comparison of Native and Non-native English Language Teachers’ Evaluation of EFL Learners’ Speaking Skills: Conflicting or Identical Rating Behaviour?” English Language Teaching, Vol. 9, No. 5, 2016, pp. 98-105, DOI:
10.5539/elt.v9n5p98
Escudero Escorza, Tomás. “Enfoques Modélicos y estrategias en la evaluación de centros
educativos.” Revista Electrónica de Investigación y Evaluación Educativa), Relieve, Vol.
3, No.1, 1997.
Escuela Oficial de Idiomas de Gijón. “Criterios, procedimientos e instrumentos
de evaluación” eoigijon, 2017, eoigijon.com/wp-
content/uploads/2017/10/criterios-e-instrumentos-de-evaluacion.pdf
--“Modelos de Pruebas de Certificación de idiomas. Inglés. Nivel
Avanzado (NA).” eoigijon, www.educastur.es/estudiantes/idiomas/pruebas-
certificacion/modelos
--“Departamento de inglés. Programación 2017-2018.”
Essay Tagger LCC. Essay Tagger.com. Common Core Rubric Creation Tool web page,
2016, www.essaytagger.com/commoncore
European Commission. Assessment of Key Competences in initial education and training:
Policy Guidance, Strasbourg, 20.11.2012 SWD(2012) 371 final, eur-lex.europa.eu/legal-
content/EN/TXT/PDF/?uri=CELEX:52012SC0371&from=EN
Ewing, Hannah. “Stereotype threat and assessment in schools.” Journal of Initial Teacher
Inquiry, Chris Asrall, Murray Fastier and Letitia Fickel (eds), Vol. 1, 2015, pp.7-9, ISSN
2463-4417
Ezza, El-Sadig Yahya. “Criteria for Assessing EFL Writing at Majma’ah University.”
Education in the Middle East and North Africa, S. Hidri and C. Coombe (eds.),
Springer International Publishing Switzerland, pp. 185-200, DOI: 10.1007/978-3-319-43234-2_11, 2017.
Fatalaki, Javad Ahmadi. “Teacher-Based Language Assessment.” International Letters
of Social and Humanistic Sciences, SciPress Ltd., Vol. 60., 2015, pp. 77-82.
Fitzpatrick, Jody L. et al. Program Evaluation, Alternative approaches and practical
guidelines. Pearson, 2004.
Frydrychova Klimova, Blanka. “Evaluating Writing in English as a Second Language.” Procedia - Social and Behavioral Sciences, Dec. 2011, pp. 390-394.
Gallego Arrufat, María Jesus and Manuela Raposo Rivas “Compromiso del estudiante y
percepción del proceso evaluador basado en rúbricas.” REDU. Revista de docencia
universitaria. Vol. 12, No.1, Apr., 2014, pp. 197-215, doi:10.4995/redu.2014.6423.
García-Sanz, Mari Paz. “La evaluación de competencias en educación superior mediante
rúbricas: un caso práctico.” Revista Electrónica Interuniversitaria de Formación del
Profesorado, Vol. 17, No. 1, 2014, pp. 87-106, DOI:10.6018/reifop.17.1. 198861,
revistas.um.es/reifop/article/view/87
Gardner, Richard et al. Rubrics. A paper submitted in partial fulfilment of the
requirements of RES 5560 Appalachian State University, Nov. 30, 2009.
lesn.appstate.edu/.../Gardner,Powell.../Rubric%20Lit.%20Review-
%20Dr.%20Olson.doc
Ghalib, Thikra K. and Abdulghani Al-Hattami. “Holistic versus Analytic Evaluation of
EFL Writing. A Case Study.” English Language Teaching, Canadian Center of Science
and Education, Jun., Vol. 8, No. 7, 2015, pp. 225-236, doi:10.5539/elt.v8n7p225
Gil Pascual, Juan Antonio. Técnicas e instrumentos para la recogida de información.
Universidad Nacional de Educación a Distancia, ISBN: 978-84-362-6250-6, 2011
Girón-García, Carolina and Claudia Llopis Moreno. “Designing Oral-based Rubrics for
Oral Language Testing with Undergraduate Spanish Students in ESP Context.” The
Journal of Language Teaching and Learning. 2015-2, pp. 86-107,
dergipark.gov.tr/download/article-file/209019
Glencoe/McGraw-Hill. “Education up Close.” Teaching Today, Apr. 2005, Educational
and Professional Publishing Group of the McGraw-Hill Companies, Inc., New York, 2005.
www.glencoe.com/sec/teachingtoday/educationupclose.phtml/32
The Glossary of Education Reform, edited by S. Abbott. “Hidden curriculum.” 24th Aug.
2004, edglossary.org/hidden-curriculum
Goldin, Ilya M. A focus on content: the use of rubrics in peer review to guide students
and instructors, University of Pittsburgh, submitted to the Graduate Faculty of Arts &
Sciences in partial fulfillment of the requirements for the degree of Doctor of Philosophy,
2011, core.ac.uk/download/pdf/95381728.pdf
Griffee, Dale T. An Introduction to Second Language Research Methods, Design and
Data, TESL-EJ Publications, 2012.
Hamp-Lyons, Liz. “Purposes of Assessment.” Handbook of Second Language
Assessment, Dina Tsagari, Jayanti Banerjee (eds.), Walter de Gruyter Inc., 2016.
de Haan, Pieter and Kees Van Esch. “Towards an instrument for the assessment of the
development of writing skills.” Language and Computers, Oct. 2004, pp. 1-14.
Harmer, Jeremy. How to teach English, Pearson Longman, 2007.
Heidari, Adeleh et al. “The Role of Culture Through the Eyes of Different Approaches to
and Methods of Foreign Language Teaching.” Journal of Intercultural Communication,
issue 34, Mar., 2014, ISSN 1404-1634
Helgesen, Marc. “Listening.” Practical English Language Teaching, edited by David
Nunan, McGraw-Hill, 2003.
Henning, Melissa D. “Rubrics to the Rescue: What are rubrics?” TeachersFirst. Thinking
Teachers Teaching Thinkers, www.teachersfirst.com/lessons/rubrics/what-are-
rubrics.cfm
Hensley, Brandon and Jeffrey Brand. “Public Speaking Assessment 2013 Report”,
Millikin University, 2012-2013, pp. 1-18.
Hernández Sampieri, Roberto et al. Metodología de la Investigación, 5ª Ed. McGraw Hill,
2010, ISBN: 978-607-15-0291-9
Herrera Mosquera, Leonardo and Diego Macías. “A call for language assessment literacy
in the education and development of teachers of English as a Foreign Language.” Colomb.
Appl. Linguist. J., Vol. 17, No. 2, pp. 302-312, 2015, doi:
10.14483/udistrital.jour.calj.2015.2.a09
Herrero Martínez, Rafaela Mª et al. “Evaluación de competencias con actividades
académicas interdisciplinares.” Etik@net, 12, Vol. I, pp. 106-126, 2012.
Hymes, Dell H. “On communicative competence.” Sociolinguistics, edited by J. B. Pride
and J. Holmes, Penguin, 1972, pp. 269-293.
IELTS Home: www.ielts.org/
Jackson, Noel R. and Anthony E. Ward. “Assessing Public Speaking. A trial rubric to
speed up and standardise feedback.” 2014 Information Technology Based Higher
Education and Training (ITHET), York, 2014, pp. 1-5, doi:
10.1109/ITHET.2014.7155700
Janisch, Carole et al. “Implementing Alternative Assessment: Opportunities and
Obstacles.” The Educational Forum, Vol. 71, 2007.
Jonsson, Anders and Gunilla Svingby. “The use of scoring rubrics: Reliability, validity
and educational consequences.” Educational Research Review, 2, 2007, pp. 130–144, doi:
10.1016/j.edurev.2007.05.002
Karppinen, Tiia. “Reading Activities in EFL Textbooks: An analysis of upper secondary
school textbooks.” Bachelor’s Thesis, 11 Dec. 2013,
jyx.jyu.fi/bitstream/handle/123456789/44521/1/URN%3ANBN%3Afi%3Ajyu-
201411023158.pdf
Keng, Leslie et al. “A Comparison of Distributed and Regional Scoring.” Test,
Measurement & Research Services, Pearson Bulletin, Sep., Issue 17, 2010,
images.pearsonassessments.com/images/tmrs/tmrs_rg/TMRSBulletin17.pdf?WT.mc_id
=TMRS_A_Comparison_of_Distributed
Kianiparsa, Parnaz and Sara Vali. “What is the Best Method to Assess EFL Learners’
Reading Comprehension.” ELTWeekly, Vol. 2, Issue 75, 12th Dec. 2010, pp. 8-24.
Laurian, Simona and Carlton J. Fitzgerald. “Effects of using rubrics in a university
academic level Romanian literature class.” Procedia. Social and Behavioral Sciences,
Elsevier, 76, 2013, pp. 431-440.
Lavelle, Thomas. “Getting the Most from Textbook Listening Activities.” The Internet
TESL Journal, Nov. 2000.
Lavigne, Alyson Leah and Thomas L. Good. The Teacher and student evaluation: moving
beyond the failure of school system, Routledge, 2014.
López Bautista, Dolores. Evolución histórica de la evaluación educativa, 2010,
lahermandaddeeva.files.wordpress.com/2010/03/evolucion-historica-de-la-evaluacion-
educativa.pdf
Little, David et al. Training Teachers to use the European Portfolio, Council of Europe
Publishing, 2007.
Louw, Willa. “My Love Affair with Alternative Assessment: Integrating Quality
Assessment into OBE Courses for Distance Education.” Progressio, Vol. 25, Issue 2,
2003, pp. 21-28, ISSN: 0256-8853
Madrid, Daniel. “Introducción a la investigación en el aula de la lengua extranjera.”
Metodología de investigación en el área de filología inglesa, edited by María Elena
García Sánchez and María Sagrario Salaberri, Universidad de Almería, Secretariado de
Publicaciones, 2001, pp. 11-45,
www.ugr.es/~dmadrid/Publicaciones/Introduccion%20investigacion%20aula-
Sagrario%20y%20Elena.pdf
Marin-García, Juan A. et al. “Protocol: Comparing advantages and disadvantages of
Rating Scales, Behavior Observation Scales and Paired Comparison Scales for behavior
assessment of competencies in workers. A systematic literature review.” WPOM-Working
Papers on Operations Management, [S.l.], Vol. 6, No. 2, Nov. 2015, pp. 49-63,
DOI:10.4995/wpom.v6i2.4032, polipapers.upv.es/index.php/WPOM/article/view/4032
McLaren, Neil et al. (eds.) TEFL in Secondary Education: handbook and workbook,
Editorial Universidad de Granada, 2005, ISBN 84-338-3638-2
MECD Ministerio de Educación, Cultura y Deporte. El Sistema Educativo español,
MECD/CIDE, Madrid, 2004, uom.uib.cat/digitalAssets/202/202199_6.pdf
Miller, Nigel. Alternative Forms of Formative and Summative Assessment, edited by John
Huston and David Whigham, Glasgow Calcedonian University, 2002.
Ministerio de Trabajo, Migraciones y Seguridad Social. “Diplomas de Acreditación de
Conocimientos de idiomas (inglés).” empleo.gob.es, Gobierno de España,
www.empleo.gob.es/es/mundo/consejerias/reinoUnido/portalempleo/es/curriculum/acre
ditacion-idiomas/index.htm
Morales, Carmen et al. La enseñanza de las lenguas extranjeras en España. Secretaría
General Técnica. Centro de Publicaciones. Ministerio de Educación, Cultura y Deporte.
sede.educacion.gob.es/publiventa/la-ensenanza-de-las-lenguas-extranjeras-en-
espana/investigacion-educativa/8757
Moss, Danna and Carol Van Duzer. “Project-Based Learning for Adult English Language
Learners.” Eric Digest ED427556, 1998.
National Governors Association Center for Best Practices (NGA Center) and the Council
of Chief State School Officers (CCSSO). Common Core State Standards Initiative.
Preparing America’s Students for College & Careers, n.d., www.corestandards.org/about-the-
standards/development-process/
Oxford University Press. “Research.” Oxford English Dictionary, 2018,
en.oxforddictionaries.com/
Panadero, Ernesto and Anders Jonsson. “The use of scoring rubrics for formative
assessment purposes revisited: A review.” Educational Research Review V. 9, 2013, pp.
129–144., doi: 10.1016/j.edurev.2013.01.002
París Mañas, Georgina et al. “La evaluación de la competencia ‘Trabajo en equipo’ de
los estudiantes universitarios.” RIDU Revista d’Innovació Docent Universitaria, No. 8,
2016, pp. 86-97, DOI: 10.1344/RIDU2016.8.10
Patel, Amita. “Evaluation – A Challenge for a Language Teacher.” The Global Journal of
English Studies I, May Volume 1, Issue 1, 2015, ISSN: 2395 4795
Patel, Pratiksha. Portfolio Assessments, College of Education and Educational
Technology, Dec. 2001.
Perin, Dolores and Mark Lauterbach. “Assessing Text-Based Writing of Low Skilled
College Students.” International Artificial Intelligence in Education Society, Springer,
8th Nov. 2016.
Phelan, Colin and Julie Wren. Exploring reliability in academic assessment, UNI Office
of Academic Assessment, University of Northern Iowa, 2005-2006,
www.uni.edu/chfasoa/reliabilityandvalidity.htm
Popham, James W. Mastering Assessment: a self-service system for educators,
Routledge, Oxon, 2006.
--Evaluación Trans-formativa, El poder transformador de la evaluación
formativa, Humanes, Narcea, 2013.
Pozuelos, Francisco José et al. Investigando la alimentación humana, Proyecto
Curricular Investigando Nuestro Mundo 6-12, Díada Editora, 2008, ISBN: 978-84-
96723-12-2
Princippia, Formación y Consultoría, S.L. Princippia. Una nueva forma de enseñar, una
nueva forma de aprender web page, 2016, www.princippia.com
Raposo-Rivas, Manuela and Mª Esther Martínez-Figueira. “Evaluación educativa
utilizando rúbrica: un desafío para docentes y estudiantes
universitarios.” Educación y Educadores, Vol. 17, No. 3, 2014, pp. 499-513, DOI: 10.5294/edu.2014.17.3.6
Real Decreto 1105/2014, de 26 de diciembre, por el que se establece el currículo básico
de la Educación Secundaria Obligatoria y del Bachillerato. Ministerio de educación,
cultura y deporte. Madrid, España, 3 de enero de 2015.
www.boe.es/boe/dias/2015/01/03/pdfs/BOE-A-2015-37.pdf
Reazon Systems Inc. “irubric.” RCampus web page, 2016,
www.rcampus.com/rubricshellc.cfm?mode=studio&sms=build&#REQUEST.rsUrlToke
n#
Richards, Jack C. Teaching Listening and Speaking from Theory to Practice, Cambridge
University Press, 2009,
www.researchgate.net/publication/255634567_Teaching_Listening_and_Speaking_Fro
m_Theory_to_Practice
Richards, Jack C. and Richard Schmidt. Language Teaching & Applied Linguistics,
Longman, Pearson Education, 2002.
Roberts, Rachel. “What are Reading skills?─They’re not (only) what you think.” Elt-
resourceful web page, 1st Dec. 2015, elt-resourceful.com/2015/12/01/what-are-reading-
skills-theyre-not-only-what-you-think/
Roca-Varela, Mª Luisa and Ignacio M. Palacios. “How are spoken skills assessed in
proficiency tests of general English as a Foreign Language? A preliminary survey.”
International Journal of English Studies (IJES), Vol. 13, No. 2, 2013, pp. 53-68.
Salehi, Mohammad and Zahra Sayyar. “An Investigation of the Reliability and Validity
of Peer-, Self-, and Teacher Assessment in EFL Learner’s Written and Oral Production.”
International Journal of Assessment and Evaluation in Education, Vol. 6, Dec. 2016, pp.
9-23.
Sambell, Kay et al. Assessment for Learning in Higher Education, Oxon, Routledge,
2013.
Schreiber, Lisa M. et al. “The Development and Test of the Public Speaking Competence
Rubric.” Communication Education, Routledge, Vol. 61, No. 3, Jul. 2012, pp.205-233,
DOI: 10.1080/03634523.2012.6707709
Simons, Mathea and Jozef Colpaert. “Judgmental Evaluation of the CEFR by
stakeholders in language testing”, Revista de Lingüística y Lenguas Aplicadas, Vol. 10,
2015, pp. 66-77, DOI: 10.4995/rlyla.2015.3434
Slagell, Oral Presentations Evaluations: Pros and Cons, Fundamentals of Public
Speaking, Iowa State University,
isucomm.iastate.edu/files/pdf/OralPresentationEvaluation-ProsandCons.pdf
Solak, Ekrem and Firat Altay. “Prospective EFL Teachers’ perceptions of listening
comprehension problems in Turkey.” The Journal of international social research, Vol.
7, No. 30, 2014, pp. 190-198.
Sundeen, Todd H. “Instructional rubrics: Effects of presentation on writing quality.”
Assessing Writing, Elsevier, 1st of Apr. 2014, pp. 74-87.
Teachnology Inc. “General Rubric Creator.” Teachnology. The online Teacher resource
web page, 2010, www.teach-nology.com/web_tools/rubrics/general/
Trinity College London. “Integrated Skills in English (ISE) Guide for Teachers — ISE II
(B2).” Trinity College web page, 2015, Online edition Jun. 2017.
Trong Tuan, Luu. “Teaching and Assessing Speaking Performance through Analytic
Scoring Approach.” Theory and Practice in Language Studies, Academy Publisher, Vol.
2, No. 4, April 2012, pp. 673-679, doi: 10.4304/tpls.2.4.673-679
Tsushima, Rika. “Methodological Diversity in Language Assessment Research: The Role
of Mixed Methods in Classroom-Based Language Assessment Studies” International
Journal of Qualitative Methods, University of Alberta, Vol.14, No.2, 2015, pp. 104-121,
DOI: 10.1177/160940691501400202
Turley, Eric D. and Chris Gallagher. “On the Uses of Rubrics: Reframing the Great Rubric
Debate.” The English Journal, Vol. 97, No. 4, March 2008, National Council of Teachers of English,
pp. 87-92. DOI: 10.2307/30047253, www.jstor.org/stable/30047253
Uribe-Enciso, Olga. “Improving EFL students’ performance in Reading comprehension
through explicit instruction in strategies.” Rastros Rostros, Vol. 17, No. 31, 2015, pp. 37-
52, doi: 10.16925/ra.v17i31.1271
Velasco Martínez, Leticia and Juan Carlos Tójar Hurtado. “Evaluación por
competencias en educación superior. Uso y diseño de rúbricas por los docentes
universitarios.” AIDIPE (Ed.), Investigar con y para la sociedad, Bubok, Vol. 2, 2015,
pp. 1393-1405, avanza.uca.es/aidipe2015/libro/volumen2.pdf
Verano-Tacoronte, Domingo et al. “Valoración de la competencia de comunicación oral
de estudiantes universitarios a través de una rúbrica fiable y válida.” Revista Brasileira
de Educação, Vol. 21, No. 64, Jan.-Mar. 2016, pp. 39-60, doi: 10.1590/S1413-
24782016216403
Veerappan, Veeramuthu and Sulaiman Tajularipin. “A review on IELTS Writing Test, its
Test Results and Inter-Rater Reliability.” Theory and Practice in Language Studies,
Vol.2, No. 1, Jan. 2012, pp. 138-143, doi:10.4304/tpls.2.1.138-143
Vez, José Manuel. “La Investigación en Didáctica de las Lenguas Extranjeras.” Educatio
Siglo XXI, Vol. 29, No.1, 2011, pp. 81-108,
digitum.um.es/xmlui/bitstream/10201/27149/1/La%20Investigación%20en%20Didáctic
a%20de%20las%20Lenguas%20Extranjeras.pdf
Von der Embse, Nathaniel P. et al. “Readying students to test: The influence of fear and
efficacy appeals on anxiety and test performance.” School Psychology International, Vol.
36, No. 6, 2015, pp. 620–637, DOI: 10.1177/0143034315609094
Walters, Brent G. and Ching-ning Chien. “College EFL Teachers’ Perspectives on
Listening Assessment and Summarization for a Specific Task.” Journal of Language
Teaching and Research, Vol.5, No.2, Mar. 2014, pp. 313-322
WebFinance, Inc. “Effectiveness evaluation.” Businessdictionary,
http://www.businessdictionary.com/
Wang, Weiqiang. “Using rubrics in student self-assessment: student perceptions in the
English as a foreign language writing context.” Assessment & Evaluation in Higher
Education, 2016, DOI: 10.1080/02602938.2016.1261993.
White, Michael and Gail Winkworth. A Rubric for Building Effective Collaboration:
Creating and Sustaining Multi Service Partnerships to Improve Outcomes for Clients,
ISBN: 978-0-9873564-0-6, 2012.
Wimmer, Mary. “School Refusal: Information for Educators.” Helping Children at Home
and School III, National Association of School Psychologists, 2010.
Yilmaz, Burçak and Fatma Ünveren. “A comparative study of perceptions about the
Common European Framework of Reference among EFL Teachers working at state and
private schools.” International Online Journal of Education and Teaching (IOJET), Vol.
5, No. 2, 2018, pp. 401-417.
8.1. List of figures:
Fig. 1. Annenberg Learner. Rubric creator. Screenshot.
Annenberg Foundation. Annenberg Learner. Teacher resources and professional
development across the curriculum, St. Louis, MO, 2016, www.learner.org/
Fig. 2. EssayTagger.com. Essay Tagger Common Core Rubric Creation Tool. Screenshot
Essay Tagger LCC. Essay Tagger.com. Common Core Rubric Creation Tool web page,
2016, www.essaytagger.com/commoncore
Fig. 3. Irubric by RCampus. Screenshot
Reazon Systems Inc. “irubric” RCampus web page, 2016,
www.rcampus.com/rubricshellc.cfm?mode=studio&sms=build&#REQUEST.rsUrlToke
n#
Fig. 4. Rubistar. Screenshot
Altec. “Crear rúbrica.” Rubistar, University of Kansas, powered by 4teachers.org,
rubistar.4teachers.org/index.php?screen=CustomizeTemplate&bank_rubric_id=57§
ion_id=12&
Fig. 5. Teachnology General Rubric Generator. Screenshot
Teachnology Inc. “General Rubric Creator.” Teachnology. The online Teacher resource
web page, 2010, www.teach-nology.com/web_tools/rubrics/general/
Fig. 6. Quick Rubric. Screenshot
Clever Prototypes, LLC. Quick Rubric web page, 2016, www.quickrubric.com/r#/create-
a-rubric
Fig. 7. Rubric-O-Matic software by Peter Evans. Screenshot
Evans, Peter. “Rubric-O-Matic 2016.” eMarking Assistant, Helping teachers provide
detailed feedback, 2015, emarkingassistant.com/products/rubric-o-matic/
Fig. 8. Princippia by Princippia Formación y Consultoría, S.L. Rubric sample. Screenshot
Princippia, Formación y Consultoría, S.L. Princippia. Una nueva forma de enseñar, una
nueva forma de aprender web page, 2016, www.princippia.com
Fig. 9. Princippia by Princippia Formación y Consultoría, S.L. Evaluation Criteria.
Screenshot
Princippia, Formación y Consultoría, S.L. Princippia. Una nueva forma de enseñar, una
nueva forma de aprender web page, 2016, www.princippia.com
Fig. 10. Princippia by Princippia Formación y Consultoría, S.L. Final Evaluation.
Screenshot
Princippia, Formación y Consultoría, S.L. Princippia. Una nueva forma de enseñar, una
nueva forma de aprender web page, 2016, www.princippia.com