
Systematic review of grading practice: Is there evidence of grade inflation?

Jayne H. Donaldson a,*, Morag Gray b,1

a School of Nursing, Midwifery and Social Care, Faculty of Health, Life and Social Science, Napier University, Comely Bank Campus, Edinburgh EH4 2LD, UK
b Educational and Research Consultant, Gray Academic, Woodville Court, 63/2 Canaan Lane, Edinburgh, EH10 4SG, UK

Article history: Accepted 2 October 2011

Keywords: Nurse education; Grading; Practice; Grade inflation

* Corresponding author. Tel.: +44 (0) 131 455 5368. E-mail addresses: [email protected] (J.H. Donaldson), [email protected] (M. Gray).
1 Tel.: +44 (0) 131 447 9168.

doi:10.1016/j.nepr.2011.10.007

Abstract

This paper describes the outcomes of a systematic review of literature pertaining to the grading of practice within nursing, midwifery, medicine and allied health professions. From a total of 215 papers, 147 were included and data were extracted using a systematic data extraction tool. The focus of this paper relates to one of the emerging themes: the issue of grade inflation. The paper examines the grade inflation phenomenon and discusses the reasons for grade inflation from a variety of perspectives. The paper reports on the suggestions made within the literature on how to control grade inflation; these, the authors conclude, are not fully evaluated and should be adopted only where rigorous evaluation can be carried out. It is imperative that evaluations include usefulness, reliability and validity testing of rubrics or any other solutions to grade inflation that are adopted by clinicians and educators.

© 2011 Elsevier Ltd. All rights reserved.

Introduction

Background

NHS Education for Scotland commissioned Lauder et al. (2008) to undertake a review of the pre-registration education of nurses and midwives over a two-year period (2006–2008). The research aim focused on identifying how the changing pre-registration programme impacts on the skills and competence of newly qualified nurses and midwives. Lauder et al.'s (2008) findings raised several important issues surrounding the assessment of students' theory and practice. They reported that there was variability in existing practices across Higher Education Institutions in relation to assessment processes and documentation. Lauder et al. (2008) suggested that these differences have the potential to impact on the student experience and the quality of mentor support, especially in clinical areas where students are placed from more than one university, each with its own systems and processes for practice assessment.


Lauder et al.'s report revealed various approaches/methods to assess practice being used in the pre-registration programmes in Scotland. For example, there were systems that:

• Did not grade practice, but simply used 'pass' or 'fail' criteria,
• Graded at the level of the total practice placement/module,
• Graded at the level of domains of practice,
• Graded at the level of specific practice outcomes/proficiencies.

In 2007, the UK Nursing and Midwifery Council (NMC, 2007) issued a new standard stating that midwifery practice must be graded and contribute to the final academic award. At present, the same professional regulatory standard does not apply for nursing in the UK. As a result of the Lauder et al. (2008) report, together with other developments across Scotland, NHS Education for Scotland commissioned a systematic review of the literature in 2009 with a specific remit of grading of practice. The authors undertook this review, which revealed some findings worthy of further discussion and debate. One of these findings relates to the issue of grade inflation, which is the focus of this paper.

The systematic review

The advantage of undertaking a systematic review is that it ensures all relevant literature sources are included, that the credibility of the evidence derived from the literature is assessed, and that the level of evidence of each literature source chosen for inclusion is graded (Parahoo, 2006).

The methods employed to undertake this systematic review will now be described, and the findings associated with grade inflation will then be discussed.

Methods

The aim of the overall systematic review was to undertake a literature review that focused on exploring issues of grading in practice, including reliability, validity, and the implications for mentor preparation and support.

Literature searching

Literature was collected through a systematic search. Table 1 summarises the search strategy employed, which included searching literature from all health professions (Fig. 1). Databases searched included:

• CINAHL
• British Nursing Index
• Medline
• ASSIA
• British Education Index
• AMED
• EMBASE
• Australian Education Index
• ERIC
• Index to Theses
• CERUK (Current Educational Research in the UK)
• Directory of Open Access Journals
• SCIRUS
• Google Scholar

The above search criteria revealed 164 literature sources. A further 48 sources were included in the first phase of the review as part of a hand searching exercise.

Table 1. Search strategy as recorded in EBSCO CINAHL, 31 July 2009.

# Query
1 (Grade or grading) and clinical practice
2 (Grade or grading) and tool*
3 Practice assess*
4 Assess* N2 practice
5 Assess* N2 tool*
6 Rubric*
7 (MH "Student Performance Appraisal+")
8 (MH "Student satisfaction/EV")
9 (MH "Student, Nursing/EV")
10 (MH "Student, Midwifery/EV")
11 (MH "Student, Allied Health/EV")
12 (MH "Student, Medical/EV")
13 (MH "Student, Physical Therapy/EV")
14 (MH "Competency Assessment")
15 (MH "Clinical Competence/EV")
16 (MH "Educational Measurement+")
17 (MH "Mentorship")
18 (MH "Preceptorship")
19 Mentor* or preceptor*
20 1 or 2 or 3 or 4 or 5 or 6
21 7 or 8 or 9 or 10 or 11 or 12 or 13
22 14 or 15 or 16
23 17 or 18 or 19
24 20 and 21
25 20 and 21 – limited from 1999 to 2009

E-mail contact asking for help in the identification of grey literature was made to the RCN Research and Development Co-ordinating Centre; the Scottish Heads of Academic Nursing & Allied Health Professions and the Council of Deans of Health; the International Council of Nurses (ICN) Research Network; the International Network for Doctoral Education in Nursing; and Nursing Knowledge International. As a result, 3 additional sources were included within the literature base.

Two hundred and fifteen literature sources were included within phase 1 of the literature review – an overview of the literature search can be viewed in Fig. 1.

Reviewing the literature for inclusion within the review: phase 1

Every piece of literature (full-text articles) was independently reviewed by the reviewers (MG and JD) and was included in or excluded from the study. The data extracted from each literature source were summarised by each reviewer and, where there was disagreement, the reviewers met to discuss and agree inclusion/exclusion. Data were extracted using a common data extraction tool designed and utilised by the reviewers. Table 2 provides an overview of the categories of data that were extracted from each literature source.

Exclusion criteria included literature that did not discuss a grading tool or system and/or was not available in the English language. An overview of excluded literature can be found in the full report (Gray and Donaldson, 2009).

None of the literature revealed use of randomised controlled trials, and the reviewers considered it appropriate to use literature which employed quantitative and qualitative methodology, descriptive accounts, text and expert opinion, and other literature reviews. The remaining 119 literature sources were then used in the second phase of the study.

Data extraction from included studies (phase 2)

Data from included studies were extracted and themed. Themes that emerged were:

1. What are the issues in grading of any professional practice?
   • Definition of graded practice
   • Argument in favour of grading
   • Argument against grading
   • Grade inflation
   • Grading tools
   • Importance of context
   • Associated dilemmas, difficulties, barriers, challenges

2. What are the issues related to reliability and validity in grading of professional practice?
   • Validity, reliability – depending on type of assessment tool used
   • Inter-rater reliability – depending on type of assessment tool used
   • Importance of context
   • Associated dilemmas, difficulties, barriers, challenges

3. What are the implications for mentor preparation and support in respect to grading of professional practice?
   • Training courses
   • Use of paper-based training packs
   • Associated dilemmas, difficulties, barriers, challenges

For the purposes of this paper, only the findings related to grade inflation will be discussed.

Fig. 1. Literature searching process.


Findings

Level of evidence on data extracted

Within systematic reviews it is important that some information is provided about the level of evidence available on the topic of interest. Table 3a shows the total number of included literature sources using quantitative methodology, qualitative methodology, literature reviews, and descriptive accounts, text and opinion. Table 3b shows the number of included literature sources associated with grade inflation in the same categories.

Appendix I provides an overview of the data extracted from each literature source which identified grade inflation.

Definition of grading

For clarity within this paper, individuals who assess students in practice (for example, mentors, co-mentors, sign-off mentors, preceptors, supervisors, consultants and programme directors) are referred to as assessors.

The grading of practice reflects the conclusion of a decision making process (Lanphear, 1999; Sadler, 2005). Grading practice involves making a decision based on the assessment of performance which allows recognition of merit or excellence beyond awarding a mere pass (Andre, 2000; Williams and Bateman, 2003; Hill et al., 2006).

"Grading … provides feedback to students and staff and because students like to know how they are getting on rather than only that they have passed or failed" (Moon, 2002: 73). Moon (2002: 85) defines a grade assessment criterion as "a standard of performance that a learner must reach in order to be allocated a particular grade within a hierarchy of grades. In this case there is likely to be a series of grade assessment criteria related to the different grades".

During the literature review it was apparent that some authors used 'grading' as a term to rate a single event (such as rating a single task undertaken), while others referred to this as a combination, or grade point average, of scores, marks or grades to indicate an overall result (Sadler, 2005). Grading within this paper will refer to any scale (numerical, alphabetical or descriptive) which has been used to rate any type of student performance, whether that was during a specific assessment task or continuous assessment.

Table 2. Categories of data that were extracted from each literature source, and the rationale for extracting it.

Category of data extraction – Rationale for extracting this category
Author(s) and date of publication – To identify the source
Aim(s) of the paper/study – To provide a focus of the paper/study
Type of study (quantitative, qualitative, text/opinion, literature review (systematic/non-systematic)) – To categorise the level of evidence
Sample size – To gauge the generalisation of the findings
Data collection method(s) – To assess the appropriateness of data collection techniques
Data analysis method(s) – To assess the appropriateness of data analysis techniques
Main findings – To summarise the main findings of the paper/study and provide a method of theming the findings for presentation within this literature review
Rationale for inclusion – To state why the literature source had been included within this systematic review, in order to achieve transparency of the decision for inclusion for the reader
Limitation(s) of the paper/study – To gauge the generalisation of the findings

Where papers/studies did not state any of the above this was noted as such (i.e. not stated), and where any of the above was not applicable this was stated as such (i.e. N/A).

Table 3b. Number of literature sources within each category associated with grade inflation.

Classification of study: Quantitative / Qualitative / Literature review / Descriptive account, text and opinion
Number of literature sources with grade inflation measured and/or described: 17 / 0 / 4 / 0
Total number of articles reviewed with grade inflation measured and/or described: 21


The argument in favour of grading practice

With the move of Nursing & Midwifery Education into Higher Education in the UK, Andre (2000) asserts that grading of practice became increasingly relevant. "A practice-based discipline such as nursing, that espouses the value of applying skills to practice, needs to consider how such value is communicated in academic form" (Andre, 2000: 672). Glover et al. (1997) argue that not to grade students' performance in practice is to devalue this important aspect of their education. Andre (2000) adds to this by stressing that by not grading practice, high achieving students are disadvantaged as their accomplishment is not overtly rewarded.

Table 3a. Number of literature sources within each category.

Classification of study: Quantitative / Qualitative / Literature review / Descriptive account, text and opinion
Number of literature sources: 66 / 6 / 19 / 28
Total number of articles reviewed: 119

Where studies used mixed methods, the study was counted as quantitative for the purposes of distinguishing between levels of evidence. A further 7 studies were not reviewed as they were unable to be sourced within the short time scale of this review.

"The … 'Holy Grail' of being able to fairly assess practice would achieve what may be seen as a very desirable goal for measuring the 'art' of midwifery as well as its science. Students who excel at practice might benefit from this aspect being formally acknowledged" (Darra et al., 2003: 44).

Others make an argument for grading practice in terms of improving the quality of the learning experience. Grading practice improves the learning experience since it provides students with detailed feedback on their performance (ElBadrawy and Korayem, 2007; Johnson, 2007, 2008), and allows both students and their assessors to rate students over time and note patterns of performance (Ben-David et al., 2004; Seldomridge and Walsh, 2006; Holaday and Buckley, 2008). A number of authors argue that grading of practice provides a strong incentive and motivational driver for students to perform at their best (Williams and Bateman, 2003; ElBadrawy and Korayem, 2007; Johnson, 2007, 2008). Johnson (2007: 27) also states that grading of competence based practice is particularly important in the context of "national and international moves towards developing unified frameworks for linking qualifications".

Argument against grading practice

There are two main arguments against the use of grades. Firstly, there are those opposed to grading on the grounds that it is not compatible with the principles of competency-based assessment (Andre, 2000; Williams and Bateman, 2003). Secondly, there are arguments that focus on the negative impact of grading on students and others, as illustrated in Sharp's opinion below:

"Medical degrees are usually awarded on a pass/fail or satisfactory/unsatisfactory basis since while most patients would be pleased to be treated by a first or possibly upper second class doctor, the presence of lower second or third class doctors in the NHS would do little to promote public confidence in the service. Doctors are either qualified to practice or they are not" (Sharp, 2006: 146).

Sharp (2006) questions whether the same argument would hold for other professions.

The negative impact of grading practice on students revolves around: the unfairness of this to less able students, who can become de-motivated by a sense of failure even though they are 'passing' (Williams and Bateman, 2003); the push for competitiveness between students rather than collaboration (Williams and Bateman, 2003); and the anxiety and stress caused in students (Ravelli and Wolfson, 1999).

In their comparative study measuring the impact of a change in the grading system in the first 2 years of medical school, from graded to pass/fail, on medical students' academic performance, attendance, residency match, satisfaction and psychological well-being, Bloodgood et al. (2009) found that the move from grading to pass/fail was not associated with a decline in students' academic performance or attendance. The change to a pass/fail grading system resulted in a statistically significant improvement in students' well-being (particularly for females). Rohe et al. (2006) suggested that the use of pass/fail rather than a grade reduces stress and anxiety in students and increases group cohesion.

Issues related to grading of nursing and midwifery practice

From the review of the literature, a number of issues were identified related to grading of nursing and midwifery practice: importance of context; grading tools; grade inflation; use of rubrics in grading tools; and challenges.

Context of grading nursing and midwifery practice

The context in which grading nursing and midwifery practice takes place is argued by a number of authors to be critical (Neary, 2000a,b; Clouder and Toms, 2005; Cowan et al., 2005; Yorke, 2005; Allen et al., 2008; Cassidy, 2009). There is increasing emphasis being placed on using authentic assessments. Mueller (n.d.) defines authentic assessment as a form of assessment in which students are asked to perform real-world tasks that demonstrate meaningful application of essential knowledge and skills. Yorke (2005) asserts that the more accurate terminology to use is the assessment of authentic performance.

Whilst acknowledging the importance of context in the grading of practice, Johnson (2007) warns of the inherent danger of inconsistency in different assessors' judgements:

"Context might interfere with consistency in at least two ways. First, context might interfere with an assessor's ability to position the qualities of two different performances on a common scale. Factors may exist that intrude on the process of casting consistent judgements (e.g. performance tasks involving interactions between individuals might be interpreted differently by judges who accommodate variations in social dynamics, such as dealing with 'tricky' as opposed to 'helpful' customers). Secondly, context can make it more difficult to infer the basis on which assessors' decisions are being made. Assessors in different contexts might make different judgements based on different foundations from each other because their understanding of competence is based on their different experiences" (Johnson, 2007: 29).

Grade inflation

Grade inflation is defined as occurring when there is a greater percentage of excellent or high scores than the students' actual performance warrants (Cacamese et al., 2007; Isaacson and Stacy, 2009). For example, in Cacamese et al.'s (2007) national survey, 80% of internal medicine sub-interns received an honours-level grade, although clerkship directors felt only around a third should have achieved it. The danger of grade inflation is that it allows students to erroneously believe that they are more competent than they perhaps are, even though they may have weaknesses that need to be addressed (Cacamese et al., 2007). Seldomridge and Walsh (2006) make the observation that grade inflation is not confined to nursing but occurs across all healthcare professions.

There are a number of documented reasons as to why grade inflation occurs. These can be mapped to those revolving around students, assessors, the student–assessor relationship and the grading tool itself.

Reasons for grade inflation: students

Students, according to North American authors, generate pressure on assessors to give them good grades regardless of the quality of their performance (Walsh and Seldomridge, 2005; Weaver et al., 2007).

Reasons for grade inflation: assessors

A number of possible reasons for grade inflation are attributed to assessors. Inexperienced assessors often have more difficulty in giving negative feedback (Cacamese et al., 2007; Walsh and Seldomridge, 2005), and it is suggested that they find it easier to avoid conflict by giving the student a good grade and relying on another assessor in the student's subsequent placement to properly assess the student, since they lack the confidence to do so themselves (Fordham, 2005; Yorke, 2005; Weaver et al., 2007). Thus, less experienced assessors are more likely to give students a 'second chance'. That said, it is not always inexperienced assessors who do so. For over thirty years now, the terms 'Hawk' and 'Dove' have been used to describe this phenomenon: Hawks are assessors who have high expectations, are more stringent and subsequently fail many students, while Doves err on the side of leniency and are more likely to pass students (Alexander, 1996; McManus et al., 2006; Seldomridge and Walsh, 2006; Panzarella and Manyon, 2007). The consequence can be that one student receives a higher grade than a peer either because of better performance or through luck in being assessed by a lenient marker (Iramaneerat and Yudkowsky, 2007).

More experienced assessors, however, are not immune to grade inflation. Again, North American authors state that many assessors who are non-tenured are reluctant to give low grades because they rely on good evaluations from their students for their continued employment. Assessors who are tenured are often reluctant to give low grades because they do not want to waste time in dealing with student appeals (Chambers, 1999; Walsh and Seldomridge, 2005; Gill et al., 2006; Isaacson and Stacy, 2009).

Reasons for grade inflation: student–assessor relationship

The nature of the student–assessor relationship is seen as a factor in causing grade inflation. In written assessments there is more distance between student and assessor, and it is possible to use anonymous marking so that it is less subject to bias. However, in grading practice a closer relationship develops (Cowan et al., 2005) and the assessor can be unduly influenced by other factors, leading to the 'halo effect' (Fisher and Parolin, 2000; Iramaneerat and Yudkowsky, 2007; Fletcher, 2008). It is difficult for assessors to give poor grades to students who model themselves on them, or to give high grades to students they perceive as being difficult to manage (Clouder and Toms, 2005; McGrath et al., 2006). Clouder and Toms (2005) highlight that assessors have their own picture of how they expect students to be, and if students fall short of this then their grades are likely to be lower than those of students who mirror the assessors' expectations. Brown (2000) found in his study that 76% (n = 115) of mentors made reference to personal characteristics when assessing students' performance. According to Calman et al. (2002), a student's practice assessment is often dependent on the assessor's personality and their knowledge of the student. Smith (2007) asserts that assessors are more likely to over-grade their student's work than under-grade it.

Reasons for grade inflation: tool design

Tool design can also contribute to grade inflation. If a tool uses the letters A–D, for example, with D being a minimum pass, there is a tendency to award weak students a C grade and average students a B grade (Isaacson and Stacy, 2009). Iramaneerat and Yudkowsky (2007) suggest that some assessors are more likely to cluster their grades around a particular portion of the scale – either the lenient or severe ends, or the mid-point – rather than using the whole scale available to them.

Table 4. Reasons for grade inflation.

Reason – Authors
Forming a bond with students – Brown (2000); Cacamese et al. (2007); Clouder and Toms (2005); Cowan et al. (2005); McGrath et al. (2006)
Having difficulty in giving negative feedback – Cacamese et al. (2007); Walsh and Seldomridge (2005)
Pressure of high student expectations – Cacamese et al. (2007); Walsh and Seldomridge (2005); Weaver et al. (2007)
Being unprepared to challenge students' self-assessment – Fordham (2005); Yorke (2005); Weaver et al. (2007)
Having low confidence/lack of experience in failing students – Fordham (2005); Yorke (2005); Weaver et al. (2007)
Failing a student reflects badly on the assessor and can affect tenure – Chambers (1999); Walsh and Seldomridge (2005); Gill et al. (2006); Isaacson and Stacy (2009)
'Halo effect' (where a student is highly rated in one area, the grade may be inflated to match) – Fisher and Parolin (2000); Clouder and Toms (2005); McGrath et al. (2006); Fletcher (2008)

Isaacson and Stacy (2009: 136) warn that grading tools that use 'equal weighting of objectives can lead to grade inflation because students can succeed overall while missing the bigger picture and important components of the course'. The reasons for grade inflation are summarised in Table 4.

Data extracted from literature sources included within this literature review which state/report/comment upon grade inflation are summarised in Appendix I.

Suggested ways of controlling grade inflation

A few authors report ways of controlling grade inflation. Chambers (1999) and Battistone et al. (2001) advocate the need to provide detailed evidence of why a particular grade has been awarded, as this had the effect of reducing grade inflation. Weaver et al. (2007) found that using explicit criteria for each grade helped to reduce grade inflation, although it did continue to persist. Some authors postulate that grade inflation is due to assessors being unable to discriminate effectively between grades (Hill et al., 2006; Iramaneerat and Yudkowsky, 2007; Hemmer et al., 2008), so including this aspect in the training of assessors may help to ameliorate grade inflation. Norcini (2007) suggests that the use of assessors who have not been the student's mentor reduces the amount of prior information available and reduces the assessor's personal stake in the trainee.

Schmahmann et al. (2008) suggest that no single test is accurate enough to predict the performance of medical interns in neurology, but that using a complementary set of grading tools to obtain a composite score could limit grade inflation.
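To make the composite-score idea concrete, consider a purely hypothetical arithmetic illustration (the numbers are invented and are not drawn from Schmahmann et al.'s actual scheme): if a mentor's rating of a student is an inflated 9 out of 10, but an independently administered bedside examination yields 6 out of 10, an equally weighted composite of (9 + 6)/2 = 7.5 tempers the inflated component. The more independent instruments that contribute to the composite, the less any single lenient rating can dominate the overall grade.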

Use of rubrics in grading tools

According to Truemper (2004), the word 'rubric' is derived from the Latin 'rubrica terra', referring to the early practice of using red soil to signal something of importance. Rubrics can be defined as "an assessment tool that uses clearly defined criteria and proficiency levels to gauge student achievement of those criteria. The criteria provide descriptions of each level of performance in terms of what students are able to do" (Montgomery, 2000: 325).

There are two types of grading rubrics: analytical and holistic (Truemper, 2004). "An analytical rubric allows for the separate evaluation of each component of the task, while a holistic one views all elements in a combined manner" (Truemper, 2004: 562).

Regardless of the type of rubric, they are made up of three key components:

1. Clearly defined performance criteria/elements
2. Detailed descriptions of what a performance looks like at each level of proficiency
3. Rating scale (commonly 3 or 4 point) (Moskal and Leydens, 2000; Walsh et al., 2008).
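To illustrate how these three components fit together, a minimal excerpt from a hypothetical analytical rubric for a single performance criterion might read as follows (the criterion and descriptors are invented for illustration and are not drawn from any tool in the reviewed literature):

Criterion: communicates effectively with patients
• 3 (exceeds standard) – consistently adapts language and tone to the patient and checks understanding without prompting;
• 2 (meets standard) – communicates clearly in routine situations, but occasionally needs prompting to check understanding;
• 1 (below standard) – communication is unclear or inconsistent and requires regular prompting and support.

In an analytical rubric each criterion would carry its own set of level descriptors of this kind, whereas a holistic rubric would combine all criteria into a single set of descriptors for each level.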

The advantages of using a grading rubric are cited as:

• A way to reduce subjectivity and improve objectivity and consistency in grading (Truemper, 2004; Walsh et al., 2008);
• A way to reduce the time burden on assessors (Andrade, 2000; Isaacson and Stacy, 2009);
• Helping students understand the reasons why they receive a particular grading (Montgomery, 2000; Truemper, 2004; Walsh et al., 2008);
• Well constructed, they can improve intra-rater and inter-rater reliability (Moskal and Leydens, 2000);
• When used repeatedly, they can be used formatively, as the student knows exactly what is expected at the highest level (Andrade, 2000; Truemper, 2004; Lasater, 2007; Walsh et al., 2008), and students can note patterns and/or improvements in their work (Isaacson and Stacy, 2009);
• Since they are clear and transparent to both assessors and students, they help to improve communication (Truemper, 2004; Walsh et al., 2008; Isaacson and Stacy, 2009);
• Providing "a more level playing field for increasingly diverse groups of students" (Lasater, 2007: 497), supported by Isaacson and Stacy (2009).

"Carefully designed analytic, holistic, task specific and general scoring rubrics have the potential to produce valid and reliable results" (Moskal and Leydens, 2000: 9). Isaacson and Stacy (2009) state that rubrics serve as a blueprint for grading.

Discussion

It is important to note that, by the very nature of the context and processes involved in the assessment of practice, it is impossible to completely remove subjectivity in making judgements of students' performance (McGrath et al., 2006; Isaacson and Stacy, 2009). Grading practice reflects the conclusion of a decision making process which indicates how well a student is progressing in respect to a standard or criteria and flags up areas where students can improve. There are a number of documented challenges when grading practice. Some of the challenges, such as the time available and the consistency and accountability of assessors, are not specific to grading practice. Those which are specific to grading relate to the validity and reliability of the tools used and to grade inflation.

The findings from this systematic literature review should be treated with some degree of caution; the limitations of the reported literature are noted within Appendix I. These include text/opinion articles, which can be biased. The literature reviews (Appendix I) have common issues, such as a lack of clarity around the inclusion and exclusion criteria for included/excluded studies, and many are non-systematic.


Reported quantitative studies were commonly carried out in a single institution, which may introduce geographical bias; used small sample sizes, limiting the generalisation of the findings; and reported (or failed to report) low response rates and sampling bias, which also limit the generalisation of the findings. The findings from this systematic literature review come from a wide range of professional literature and, as demonstrated in Tables 3a and 3b, the number of quantitative studies is relatively large; on closer inspection, however, these studies often have the limitations of being set in one geographical area and one profession, and of being based on relatively small sample sizes. Most quantitative studies used survey designs and/or presented descriptive statistics only. Therefore any generalisation of findings from these quantitative studies should necessarily be cautious.

For these reasons, the authors of this literature review have concluded that grade inflation is a well-documented problem in grading practice, but given the limitations of the reported literature on this topic it would be unwise to conclude that grade inflation is a generalised problem in every area or in every profession (it was not reported in every paper on grading practice included within this review).

There are some suggested methods (although not fully evaluated) from the literature to control grade inflation. The most promising of these appears to be the use of rubrics. Carefully constructed rubrics can ameliorate grade inflation. Rubrics are made up of three key components: clearly defined performance criteria or elements; detailed descriptions of what a performance looks like at each level (or grade) of proficiency; and a rating scale, most commonly of three or four points. Rubrics can be used both formatively and summatively.

If nursing and midwifery were to use grading within practice, there are a number of training requirements suggested by the literature for mentors: developing an understanding and interpretation of educational terminology; how to accurately assess learning; how to assign grades; how to use the tools consistently and effectively; how to write evidence to support the graded assessment given; how to deal with borderline students; and how to deliver effective feedback. The time and methods of training and updating mentors need to be considered. However, a full evaluation of the development, testing and use of rubrics within nursing and midwifery is advocated by the authors, given the lack of evidence on their usefulness, reliability and validity within the literature.

If the grading of practice is adopted, we recommend that consideration is given to the following:

• The development, testing and use of rubrics;
• The use of rubrics for formative as well as summative assessment;
• The use of a multi-method approach to assessment which can be graded;
• Comprehensive training and updating sessions for assessors;
• On-going evaluation and monitoring of the grading process used.


Acknowledgements

Sheena Moffat, Subject Librarian, Edinburgh Napier University; and NHS Education for Scotland for funding this project. The views expressed within this systematic literature review are not necessarily the views of the funding organisation.

Appendix I. Overview of included studies reporting/commenting upon grade inflation

For each included study, the following categories were extracted: author; aim; type of study; sample size; data collection methods; data analysis; main findings; rationale for inclusion; and limitations.

Author: Baulcombe and Watson (2003)
Aim: To evaluate several aspects of a measurement instrument (the APG), including internal consistency and inter-rater reliability.
Type of study: Quantitative: opportunistic study – a very preliminary piece of work.
Sample size: No number supplied. Convenience sample of all practitioners undertaking a programme of study within one university/BSc community nursing (post-registration).
Data collection methods: Assessment of practice grid (APG) – measures 13 items such as motivation, initiative, independence and self-direction.
Data analysis: SPSS – inferential statistics: t-test and Bland and Altman plot.
Main findings: Inter-rater reliability judged significant – individuals make the same measurement at 2 different time points. Marks were similar for the student over two trimesters – this perhaps suggests reliability, or that the instrument is not sensitive enough to detect changes. The authors note that at very high grades there is less reliability relative to the previous or next grade; lower and middle grades are more reliable. One would expect an improvement in grade over the next semester, but the authors argue that this could be due to a 'plateau in performance' as they are already experienced practitioners. The authors suggest further development and testing.
Rationale for inclusion: Interesting points for inter-rater reliability.
Limitations: Very small sample and not majorly tested.


Author: Cacamese et al. (2007)
Aim: Determine the existence, extent and possible causes of sub-internship grade inflation.
Type of study: Survey – 16 questions.
Sample size: 278 sample size; 141 returned the survey – 51% response rate. Sent to a national (USA) sample of internal medicine clerkship directors.
Data collection methods: Questionnaire – Likert scale and yes/no.
Data analysis: Inferential statistics using t tests, analysis of variance and Kruskal–Wallis tests.
Main findings: Grades described as honours/high pass/pass/low pass/fail, or A/B/C/D/E, or pass/fail, and 11 others. Grade inflation defined as a greater percentage of excellent scores than student performances warrant. 80% receive an honours level grade while directors feel that only one third should achieve this grade. Grade inflation exists. 18% of assessors admitted passing a student that they felt should have failed. Difficulty delivering negative feedback is the top explanation for grade inflation.
Rationale for inclusion: Authors conclude that grades are not particularly helpful in discriminating student performance.
Limitations: Response rate 51%; some questions were unanswered. Retrospective recall required.

Author: Calman et al. (2002)
Aim: To describe the methods of measuring progress in achieving competence of pre-registration Scottish nursing & midwifery students, and to describe the philosophy and approaches to competence assessment in HEIs & FE colleges.
Type of study: Phased, using quantitative and qualitative methods – 13 programmes across 7 HEIs.
Sample size: 13 directors of programmes (survey). 12 group interviews with students (6 nursing & 6 midwifery), comprising 72 students (36 of each group).
Data collection methods: Postal questionnaire; documentary analysis; interviews with key stakeholders.
Data analysis: Consensus views of students were summarised by the interviewer under the topic headings within the interview guide. No other information provided on analysis of data.
Main findings: Four key findings – competence assessment methods; preparation of practice assessors; consequences of failure to meet expected level of outcome; and students' views. At the time of the study 2 institutions awarded merits/distinction for academic excellence only. One institution had used a grading system for clinical practice but this was discontinued "because of a tendency for high grades to be awarded … there was concern too that the system was open to subjective bias of the assessor" [p. 520]. Also suggests lack of consistency in assessment.
Rationale for inclusion: Students' views suggest that there is little confidence in methods of clinical competence assessment. No formal reliability and validity testing.
Limitations: Sampling could be biased.

Author: Chambers (1999)
Aim: Address one dental school's efforts to address issues of competency-based evaluation.
Type of study: Quantitative.
Sample size: 77 faculty members.
Data collection methods: Quarterly rating system developed and used to replace daily grading. In year 1, 703 ratings were completed on 126 students – an average of 5.3 ratings per student. In year 2, 1967 ratings were completed on 142 students – more than 15 ratings per student, on average.
Data analysis: Inferential statistics.
Main findings: Grade inflation brought under control. Progression halted for incompetent students. More extensive information about remedial needs of students. Excellent face validity, and rater consistency in some cases.
Rationale for inclusion: Grade inflation seemed better controlled following implementation of this tool.
Limitations: Tested in one dental school in the USA.

Author: Coote et al. (2007)
Aim: Development of a Common Assessment Form (CAF) for grade assessing physiotherapy students from a number of HEIs on practice education placements in the Republic of Ireland.
Type of study: Quantitative. Piloted the CAF (common assessment tool) on a small number of sites, then used it.
Sample size: 54 practice educators.
Data collection methods: Pre-pilot questionnaire to 54 practice educators. Testing validity – 2 experienced practice educators graded students at the end of placement; 71 data sets returned for analysis. Testing inter-rater reliability – a practice tutor and a practice educator rated the same student on the CAF at the end of the placement; 43 data sets returned for analysis.
Data analysis: SPSS. Pearson correlation coefficients. Intraclass correlation coefficients. Bland and Altman method.
Main findings: The validity study suggested good face, content and construct validity, and the correlation coefficients and means of differences between scores suggest high reliability. The possibility of systematically higher scoring by the practice educators warrants further investigation.
Rationale for inclusion: Common assessment tool. Discusses validity and reliability of the CAF.
Limitations: No response rates stated.

Author: Gill et al. (2006)
Aim: Examine the applicability, validity and reliability of the Clinical Performance Assessment Tool (CPAT) for post-graduate critical care nurses.
Type of study: Quantitative: 3-phase descriptive correlational study.
Sample size: Phase 1 – 6 experienced clinical nurses. Phase 2 – 8 students and 8 clinical facilitators. Phase 3 – survey of 9 assessors and 13 students.
Data collection methods: CPAT; questionnaire; interviews – students & their assessors.
Data analysis: Descriptive statistics and thematic analysis using NUDIST.
Main findings: CPAT facilitated assessors to assist students to develop their clinical performance. However, substantial refinement was required to make it useful as a clinical assessment tool. The 5-point assessment rating scale was too complex and unrealistic for assessors and students to use. Grade inflation admitted.
Rationale for inclusion: Used 5-point rating scale.
Limitations: Small sample sizes; further evaluation not apparent from literature in subsequent years.

Author: Glover et al. (1997)
Aim: To evaluate and grade 3rd year undergraduate nursing students' clinical performance during practicum.
Type of study: Exploratory pilot study using both quantitative and qualitative methods.
Sample size: Total cohort of 160 3rd year student nurses. 97 preceptors completed the students' assessment and became a convenience sample. 36 out of 160 students completed the self-assessment tool.
Data collection methods: Assessment tool which had both quantitative and qualitative components.
Data analysis: Descriptive statistics. Content analysis.
Main findings: Preceptors rated student performance at a higher than expected level. Preceptors rated student performance slightly higher than students' self-assessment. Preceptors found it difficult to support their ranking of student performance using ANCI cues. Preceptors' grades were ranked higher than theoretical grades.
Rationale for inclusion: Grading/rating tool used.
Limitations: Difficult to tell how many grade points were used in the assessment tool. Domains converted to percentages but no detail as to how this was done.

Author: Hemmer et al. (2008)
Aim: Describe current evaluation methods, use of the Reporter-Interpreter-Manager/Educator (RIME) framework and grade assignment by internal medicine clerkship directors.
Type of study: Quantitative – survey.
Sample size: Clerkship directors (n = 84); response rate 77%.
Data collection methods: Survey.
Data analysis: Inferential statistics.
Main findings: Internal medicine clerkship directors continue to emphasise descriptive evaluation of trainees, and have also shifted to including a wide variety of examination and other methods, including observations with standardised and real patients. RIME gaining widespread acceptance. Triangulation of methods used to rate candidates. More grade inflation in criterion-referenced grading.
Rationale for inclusion: Interesting framework for descriptive evaluation combined with direct observation.
Limitations: Response rate.

Author: Hill et al. (2006)
Aim: To develop and implement a competency-based assessment process for the experiential component of a pharmacy education curriculum.
Type of study: Quantitative survey.
Sample size: Students (n = 74; 76% response rate). Faculty (n = 90; 30% response rate).
Data collection methods: Survey.
Data analysis: Inferential statistics.
Main findings: Faculty and student perceptions of the assessment process were generally positive. Moderately successful in reducing grade inflation.
Rationale for inclusion: Used a 5-point grading tool, providing an example.
Limitations: Response rate.


Author: Isaacson and Stacy (2009)
Aim: Discuss faculty and student concerns with regard to clinical evaluation and explore the use of rubrics as a tool to help objectify the clinical evaluation process.
Type of study: Literature review/opinion.
Sample size: N/A. Data collection methods: N/A. Data analysis: N/A.
Main findings: Rubrics provide one possible solution to the concerns of faculty and students related to effective clinical evaluation.
Rationale for inclusion: Rubrics could be worth further exploration.
Limitations: No research data.

Author: Johnson (2007)
Aim: Discusses some of the issues that surround the grading of competence based assessments, including grading and motivation; effects of grading on (mis)classification; grading and accountability, etc.
Type of study: Literature review/opinion.
Sample size: N/A. Data collection methods: N/A. Data analysis: N/A.
Main findings: Potential benefits of grading need to be balanced against its potential drawbacks.
Rationale for inclusion: Good summary paper of the pros and cons.
Limitations: No research data.

Author: Johnson (2008)
Aim: Literature review attempting to synthesise the positive and negative effects of grading and discuss the implications.
Type of study: Literature review.
Sample size: N/A. Data collection methods: N/A. Data analysis: N/A.
Main findings: Questions about the desirability of grading competency-based assessments are related to issues of validity, with the question hinging on the simultaneous existence of two mutually supporting factors: 'use value' and 'validity'.
Rationale for inclusion: Literature review on the process of grading, discussing implications.
Limitations: No details of search strategy or number of articles included/excluded.

Author: Kogan et al. (2003)
Aim: Determine the feasibility, reliability and validity of the mCEX when used to evaluate American medical students' clinical skills in a medical core clerkship.
Type of study: Quantitative.
Sample size: Students (n = 165) completed 9 mCEX during internship (89% completion rate).
Data collection methods: mCEX instrument.
Data analysis: Mean mCEX scores were correlated with exam scores and course grades.
Main findings: Data support the feasibility, reproducibility and validity of the mCEX in evaluating medical students' clinical skills.
Rationale for inclusion: 7-point scale used.
Limitations: Small sample size; single institution in the USA.


Author: Molenaar et al. (2004)
Aim: Comparison of graded written case reports with rating of clinical performance.
Type of study: Quantitative.
Sample size: 4 reviewers carried out rating of written case reports. Clinical supervisors (n = not defined) rated clinical performance. Analysis of grades of 710 case reports and 189 ratings.
Data collection methods: Scoring list to grade case reports.
Data analysis: Descriptive statistics.
Main findings: Ratings of clinical performance were skewed towards excellent. Ratings of case reports were normally distributed; there was therefore very low agreement between the two. Suggest that case reports measure primarily cognitive aspects as compared with overall clinical performance. Clinical performance was not discussed amongst raters, but 'moderation' was done for written case reports.
Rationale for inclusion: Comparison of different methods of rating competence.
Limitations: Small sample. One institution used.

Author: Murray et al. (2000)
Aim: Literature review to determine the extent to which currently available assessment approaches can measure potentially relevant medical education outcomes addressing practitioner performance.
Type of study: Literature review.
Sample size: MEDLINE and search words defined.
Data collection methods: N/A. Data analysis: N/A.
Main findings: Validity and reliability are likely to come from multiple sampling and triangulation of data.
Rationale for inclusion: Comparison of different methods of rating competence.
Limitations: No inclusion/exclusion criteria defined.

Author: Schmahmann et al. (2008)
Aim: Develop a grading system that assesses multiple skills and reflects proficiency.
Type of study: Quantitative longitudinal study.
Sample size: 409 medical students.
Data collection methods: Bedside Examination Exercise (BEE) completed alongside existing techniques.
Data analysis: Inferential statistics.
Main findings: Need for a composite score derived from different test instruments.
Rationale for inclusion: BEE helped to limit grade inflation by using a composite score between tests.
Limitations: One geographical area; small sample size.

Aim: Examine the relationship between grades earned in theory and practice.
Type of study: Quantitative.
Sample size: 10 paired courses were compared.
Data collection methods: Documentary evidence of grades achieved.
Data analysis: Wilcoxon rank tests.
Main findings: Grades for theory courses approached a normal distribution and had a slight positive skew, while grades for clinical courses did not have a normal distribution and were negatively skewed.
Rationale for inclusion: Used grading system.
Limitations: Number of students in each course not defined – aggregated data were used.

Author: Weaver et al. (2007)
Aim: Test the hypothesis that a simple change to the shift grading cards, using explicit criteria, would decrease grade inflation and help redistribute the shift evaluations.
Type of study: Quantitative: before-and-after study on an emergency medicine clinical clerkship.
Sample size: 1612 before-change evaluations; 1737 after-change evaluations. Students on emergency department clerkship.
Data collection methods: Completed shift evaluation cards.
Data analysis: Descriptive statistics.
Main findings: A simple change in shift evaluation cards to include more explicit grading criteria resulted in a significant change in grade distribution and greatly decreased grade inflation.
Rationale for inclusion: Part-remedy to grade inflation.
Limitations: One geographical area; no sample size given. Description of data collection and analysis sketchy.



Author: Wilkinson et al. (2008)
Aim: To evaluate the reliability and feasibility of assessing performance of medical specialist registrars using 3 methods.
Type of study: Quantitative: feasibility study.
Sample size: Trainees – 128, 59 and 230 for mini-CEX, DOPS (directly observed procedural skills) and MSF (multi-source feedback) assessments, respectively.
Data collection methods: Mini-clinical evaluation exercise (mini-CEX); DOPS; MSF.
Data analysis: Inferential statistics.
Main findings: The methods are feasible to conduct and can make reliable distinctions between doctors' performance. However, there were some doctors who scored satisfactory overall but obtained unsatisfactory for some aspects of care.
Rationale for inclusion: Grading system used and reported.


References

Alexander, H.A., 1996. Physiotherapy student clinical education: the influence of subjective judgements on observational assessment. Assessment and Evaluation in Higher Education 21 (4), 357–367.

Allen, P., Lauchner, K., Bridges, R.A., Francis-Johnson, P., McBride, S.G., Olivarez, A., 2008. Evaluating continuing competency: a challenge for nursing. Journal of Continuing Education in Nursing 39 (2), 81–85.

Andrade, H.G., 2000. Using rubrics to promote thinking and learning. Educational Leadership 57 (5), 13–19.

Andre, K., 2000. Grading student clinical practice performance: the Australian perspective. Nurse Education Today 20 (8), 672–679.

Battistone, M.J.P.B., Milnes, C., Battistone, M.L., Sande, M.A., Hemmer, P.A., Shomaker, T.S., 2001. Global descriptive evaluations are more responsive than global numeric ratings in detecting students' progress during the inpatient portion of an internal medicine clerkship. Academic Medicine 76 (10 Suppl.), S105–S107.

Baulcombe, S., Watson, R., 2003. The internal consistency and intra-rater reliability of an assessment practice grid for community nurses. Clinical Effectiveness in Nursing 7 (3/4), 168–170.

Ben-David, M.F., Snadden, D., Hesketh, A., 2004. Linking appraisal of PRHO professional competence of junior doctors to their education. Medical Teacher 26 (1), 63–70.

Bloodgood, R.A., Short, J.G., Jackson, J.M., Martindale, J.R., 2009. A change to pass/fail grading in the first two years at one medical school results in improved psychological well-being. Academic Medicine 84 (5), 655–662.

Brown, N., 2000. What are the criteria that mentors use to make judgements on the clinical performance of student mental health nurses? An exploratory study of the formal written communication at the end of clinical nursing practice modules. Journal of Psychiatric & Mental Health Nursing 7 (5), 407–416.

Cacamese, S.M., Elnicki, M., Speer, A.J., 2007. Grade inflation and the internal medicine subinternship: a national survey of clerkship directors. Teaching & Learning in Medicine 19 (4), 343–346.

Calman, L., Watson, R., Norman, I., Redfern, S., Murrells, T., 2002. Assessing practice of student nurses: methods, preparation of assessors and student views. Journal of Advanced Nursing 38 (5), 516–523.

Cassidy, S., 2009. Interpretation of competence in student assessment. Nursing Standard 23 (18), 39–46.

Chambers, D.W., 1999. Faculty ratings as part of a competency-based evaluation clinic grading system. Evaluation and the Health Professions 22 (1), 86–106.

Clouder, L., Toms, J., 2005. An Evaluation of the Validity of Assessment Strategies Used to Grade Practice Learning in Undergraduate Physiotherapy Students: Final Report to the Health Science and Practice Subject Centre of the Higher Education Academy. From: http://www.health.heacademy.ac.uk/projects/miniprojects/clouder.pdf.

Coote, S., Alpine, L., Cassidy, C., Loughnane, M., McMahon, S., Meldrum, D., O'Connor, A., O'Mahoney, M., 2007. The development and evaluation of a common assessment form for physiotherapy practice education in Ireland. Physiotherapy Ireland 28 (2), 6–10.

Cowan, D.T., Norman, I.J., Coopamah, V.P., 2005. Nurse competency. A project to establish a skills competency matrix for EU nurses. British Journal of Nursing (BJN) 14 (11), 613–617.

Darra, S., Hunter, B., McIvor, M., Webber, F., Morse, N., 2003. Education. Developing a midwifery skills framework. British Journal of Midwifery 11 (1), 43–47.

ElBadrawy, H., Korayem, M., 2007. The flexible requirement system for grading of clinical performance of undergraduate dental students. European Journal of Dental Education: Official Journal of the Association for Dental Education in Europe 11 (4), 208–215.

Fisher, M., Parolin, M., 2000. The reliability of measuring nursing clinical performance using a competency based assessment tool: a pilot study. Collegian 7 (3), 21–27.

Fletcher, P., 2008. Clinical competence examination: improvement of validity and reliability. International Journal of Osteopathic Medicine 11 (4), 137–141.

Fordham, A.J., 2005. Using a competency based approach in nursing education. Nursing Standard 19 (31), 41–48.

Gill, F., Leslie, G., Southerland, K., 2006. Evaluation of a clinical performance assessment tool (CPAT) within a critical care context. Australian Critical Care 19 (3), 105–113.

Glover, P., Ingham, E., Gassner, L.A., 1997. The development of an evaluation tool for grading clinical competence. Contemporary Nurse: A Journal for the Australian Nursing Profession 6 (3/4), 110–116.

Gray, M.A., Donaldson, J., 2009. Exploring Issues in the Use of Grading in Practice: A Literature Review. Final Report commissioned by NHS Education for Scotland. http://www.nes.scot.nhs.uk/practice_education/documents/Final_Report_Vol1.pdf.

Hemmer, P.A., Papp, K.K., Mechaber, A.J., Durning, S.J., 2008. Evaluation, grading, and use of the RIME vocabulary on internal medicine clerkships: results of a national survey and comparison to other clinical clerkships. Teaching & Learning in Medicine 20 (2), 118–126.

Hill, L.H., Delafuente, J.C., Sicat, B.L., Kirkwood, C.K., 2006. Development of a competency-based assessment process for advanced pharmacy practice experiences. American Journal of Pharmaceutical Education 70 (1), 1.


Holaday, S.D., Buckley, K.M., 2008. Chapter 7. A standardized clinical evaluation tool-kit: improving nursing education and practice. Annual Review of Nursing Education 6, 123–149.

Iramaneerat, C., Yudkowsky, R., 2007. Rater errors in a clinical skills assessment of medical students. Evaluation and the Health Professions 30 (3), 266–283.

Isaacson, J.J., Stacy, A.S., 2009. Rubrics for clinical evaluation: objectifying the subjective experience. Nurse Education in Practice 9 (2), 134–140.

Johnson, M., 2007. Is passing just enough? Some issues to consider in grading competence-based assessments. Research Matters 3, 27–30.

Johnson, M., 2008. Grading in competence-based qualifications: is it desirable and how might it affect validity? Journal of Further and Higher Education 32 (2), 175–184.

Kogan, J.R., Bellini, L.M., Shea, J.A., 2003. Feasibility, reliability and validity of the mini-clinical evaluation exercise (mCEX) in a medical core clerkship. Academic Medicine 78, S33–S35.

Lanphear, J.H., 1999. In support of grading systems. Education for Health 12 (1), 79–83.

Lasater, K., 2007. Clinical judgment development: using simulation to create an assessment rubric. Journal of Nursing Education 46 (11), 496–503.

Lauder, W., Watson, R., Holland, K., Roxburgh, M., Topping, K., 2008. The national evaluation of fitness for practice curricula: self-efficacy, support, and self-reported competence in pre-registration student nurses and midwives. Journal of Clinical Nursing 17, 1858–1867.

McGrath, P., Anastasi, J., et al., 2006. Collaborative voices: ongoing reflections on nursing competencies. Contemporary Nurse: A Journal for the Australian Nursing Profession 22 (1), 46–58.

McManus, I.C., Thompson, M., Mollon, J., 2006. Assessment of examiner leniency and stringency ('hawk-dove effect') in the MRCP (UK) clinical examination (PACES) using multi-facet Rasch modelling. BMC Medical Education 6 (42), 1–22.

Molenaar, W.J., Reinders, J.J., Koopmans, S.A., Talsma, M.D., van Essen, L.H., 2004. Written case reports as assessment of the elective student clerkship: consistency of central grading and comparison with ratings of clinical performance. Medical Teacher 26 (4), 301–304.

Montgomery, K., 2000. Classroom rubrics: systematizing what teachers do naturally. The Clearing House 73 (6), 324–328.

Moon, J., 2002. The Module & Programme Development Handbook. Kogan Page, London.

Moskal, B.M., Leydens, J.A., 2000. Scoring rubric development: validity and reliability. Practical Assessment, Research & Evaluation 7 (10), 1–11.

Mueller, J., n.d. What is Authentic Assessment? http://jonathan.mueller.faculty.noctrl.edu/toolbox/whatisit.htm (last accessed 01.09.10).

Murray, E., Gruppen, L., Catton, P., Hays, R., Woolliscroft, J.O., 2000. The accountability of clinical education: its definition and assessment. Medical Education 34 (10), 871–879.

Neary, M., 2000a. Responsive assessment of clinical competence: part 1. Nursing Standard 15 (9), 34–36.

Neary, M., 2000b. Responsive assessment of clinical competence: part 2. Nursing Standard 15 (10), 35–40.

Norcini, J.J., 2007. Workplace-based Assessment in Clinical Training. Association for the Study of Medical Education, Edinburgh.

Nursing and Midwifery Council, 2007. Grading of Clinical Practice for Pre-registration Midwifery Programmes. http://www.nmc-uk.org/Documents/Circulars/2007%20circulars/NMC%20circular%2025_2007.pdf.

Panzarella, K.J., Manyon, A.T., 2007. A model for integrated assessment of clinical competence. Journal of Allied Health 36 (3), 157.

Parahoo, K., 2006. Nursing Research: Principles, Process and Issues. Palgrave Macmillan, Houndmills, Basingstoke.

Ravelli, C., Wolfson, P., 1999. What is the 'ideal' grading system for the junior surgery clerkship? American Journal of Surgery 177, 140–144.

Rohe, D.E., Barrier, P.A., Clark, M.M., Cook, D.A., Vickers, K.S., Decker, P.A., 2006. The benefits of pass-fail grading on stress, mood, and group cohesion in medical students. Mayo Clinic Proceedings 81 (11), 1443–1448.

Sadler, D.R., 2005. Interpretations of criteria-based assessment and grading in higher education. Assessment and Evaluation in Higher Education 30 (3), 175–194.

Schmahmann, J.D., Neal, M., MacMore, J., 2008. Evaluation of the assessment and grading of medical students on a neurology clerkship. Neurology 70 (9), 706–712.

Seldomridge, L.A., Walsh, C.M., 2006. Evaluating student performance in undergraduate preceptorships. Journal of Nursing Education 45 (5), 169–176.

Sharp, S., 2006. The grading of placement in initial teacher education in Scotland. Scottish Educational Review 38 (2), 145–157.

Smith, J., 2007. Assessing and grading students' clinical practice: midwives' lived experience. Evidence Based Midwifery 5 (4), 112–118.

Truemper, C.M., 2004. Using scoring rubrics to facilitate assessment and evaluation of graduate-level nursing students. Journal of Nursing Education 43 (12), 562–564.

Walsh, C.M., Seldomridge, L.A., 2005. Clinical grades: upward bound. Journal of Nursing Education 44 (4), 162–168.

Walsh, C.M., Seldomridge, L.A., Badros, K.K., 2008. Developing a practical evaluation tool for preceptor use. Nurse Educator 33 (3), 113–117.

Weaver, C.S., Humbert, A.J., Besinger, B.R., Graber, J.A., Brizendine, E.J., 2007. A more explicit grading scale decreases grade inflation in a clinical clerkship. Academic Emergency Medicine 14 (3), 283–286.

Wilkinson, J.R., Crossley, J.G., Wragg, A., Mills, P., Cowan, G., Wade, W., 2008. Implementing workplace-based assessment across the medical specialties in the United Kingdom. Medical Education 42 (4), 364–373.

Williams, M., Bateman, A., 2003. Graded Assessment in Vocational Education and Training. Australian National Training Authority, Leabrook, Australia, pp. 1–66.

Yorke, M., 2005. Issues in the Assessment of Practice-based Professional Learning: A Report Prepared for the Practice-based Professional Learning CETL at the Open University. From: http://www.open.ac.uk/cetl-workspace/cetlcontent/documents/464428ed4aa20.pdf.