Responsiveness of the Neck Disability Index in patients with mechanical neck disorders


The Spine Journal 9 (2009) 802–808

Clinical Study

Responsiveness of the Neck Disability Index in patients with mechanical neck disorders

Brian A. Young, PT, DSc a,*, Michael J. Walker, PT, DSc a, Joseph B. Strunce, PT, DSc b, Robert E. Boyles, PT, DSc c, Julie M. Whitman, PT, DSc d, John D. Childs, PT, PhD, MBA a

a Department of Physical Therapy, Sheppard AFB, TX, USA
b Northern Navajo Medical Center, Shiprock, NM, USA
c School of Physical Therapy, University of Puget Sound, Tacoma, WA, USA
d Department of Physical Therapy, Regis University, Denver, CO, USA

Received 15 February 2008; revised 1 May 2009; accepted 6 June 2009

Abstract

PURPOSE: Report the test-retest reliability, construct validity, minimum clinically important difference (MCID), and minimal detectable change (MDC) for the Neck Disability Index (NDI).

STUDY DESIGN/SETTING: Cohort study of patients presenting to outpatient physical therapy clinics.

PATIENT SAMPLE: Ninety-one subjects with a primary complaint of neck pain, with or without concomitant upper extremity (UE) symptoms, who were participants in a randomized clinical trial.

OUTCOME MEASURES: NDI and the 15-point Global Rating of Change (GRC) self-report measures.

METHODS: All subjects completed the NDI at baseline and at a 3-week follow-up. Additionally, subjects completed the GRC scale, which was used to dichotomize patients into improved or stable groups. Changes in the NDI were used to assess test-retest reliability, construct validity, MCID, and MDC.

RESULTS: Test-retest reliability was moderate for the NDI (intraclass correlation coefficient, 0.64; 95% confidence interval, 0.19–0.84). For the NDI, the MCID was 7.5 points and the MDC was 10.2 points.

CONCLUSIONS: The NDI appears to demonstrate adequate responsiveness based on statistical reference criteria when used in a sample that approximates the high percentage of patients with neck pain and concomitant UE referred symptoms. Because the MCID is within the bounds of measurement error, a 10-point change (the MDC) should be used as the MCID. © 2009 Elsevier Inc. All rights reserved.

Keywords: Minimum clinically important difference; Reliability; Validity; Psychometric properties; Minimum detectable change; Standard error of measurement

FDA device/drug status: not applicable.
Author disclosures: none.
The opinions or assertions contained herein are the private views of the authors and are not to be construed as official or as reflecting the views of the U.S. Air Force, U.S. Public Health Service, U.S. Army, or Department of Defense.
The study was approved by the Brooke Army Medical Center/Wilford Hall Medical Center Joint Institutional Review Board.
* Corresponding author. 105 Jupiter Street, Sheppard AFB, TX 76311, USA. Tel.: (210) 386-2133.
E-mail address: [email protected] (B.A. Young)
1529-9430/09/$ – see front matter © 2009 Elsevier Inc. All rights reserved.
doi:10.1016/j.spinee.2009.06.002

Introduction

It is estimated that up to 54% of the population has experienced neck pain within the past 6 months [1], with up to 42% seeking care from general practitioners [2]. Fifteen percent of a physical therapist’s caseload consists of patients with neck pain [3]. Several self-report functional outcome or disability measures have been developed for the assessment of disability in patients with neck pain [4–10]. Of interest to clinicians is the clinical utility of self-report measures to accurately reflect patient-perceived status and identify when that status has changed through a course of treatment.

The Neck Disability Index (NDI), originally modeled after the Oswestry Low Back Pain Disability Questionnaire [4,11], is the most studied and well established of the outcome measures for neck pain [12] and assesses both subjective symptoms and activities of daily living. Several researchers [4,8,13,14] have assessed the reliability and validity of the NDI. Three studies [8,13,15] have reported on the responsiveness of the scale, that is, the ability of the scale to accurately detect when change has indeed occurred [16].

Context
The Neck Pain Disability Index (NPDI) is a popular standardized questionnaire providing a self-report of perceived neck symptoms and impact at one point in time.

Contribution
The authors found that baseline NPDI scores declined more in subjects who felt, compared to baseline, they were globally “better,” than in subjects who did not feel they were “better.” But there was considerable overlap. The statistically determined minimum detectable change (MDC), calculated from the standard error of measure, appeared to be larger than the minimal clinically important difference in the NPDI.

Implications
What constitutes meaningful improvement in spinal disorders remains a controversial area, and statistically derived values based upon a patient’s global report of being “better” apparently vary with the method of calculation and the population being studied.
The Editors


Responsiveness is often reported in two ways. The first is the minimum detectable change (MDC), which is defined as a change in a patient’s score that is greater than measurement error [17]. The second is the minimum clinically important difference (MCID), or the smallest change in an instrument that is perceived to be beneficial by the patient and thus would bring about a change in patient management [18]. Prior MDC scores for the NDI have been reported to be between 4.2 and 9.8 raw score points [8,13,15]. Prior MCID scores for the NDI have been reported to be between 5 and 9.5 raw score points [13,15].

Sixty-five percent of patients with pain referable to the cervical spine report both axial and upper extremity (UE) radicular symptoms [19]. However, the psychometric properties of the NDI have yet to be reported in a sample that approximates this high percentage of patients with neck pain and concomitant UE referred symptoms. Therefore, the primary purpose of this study was to report the psychometric properties of the NDI in a large sample of patients with a primary complaint of mechanical neck pain, presenting with or without concomitant UE radicular symptoms. A secondary purpose was to assess whether there is a difference in the amount of change in NDI scores relative to baseline scores between patients with and without unilateral UE symptoms who were rated as improved, as well as between those rated as stable.

Methods

Subjects

This study is a secondary analysis of a larger multicenter randomized clinical trial of physical therapy interventions for patients presenting with mechanical neck pain [20]. In the overall trial, patients were randomized to receive manual physical therapy and exercise or a treatment approach comprising advice to remain active, range-of-motion exercise, and subtherapeutic ultrasound. For the purpose of this secondary analysis, treatment groups were collapsed into a single cohort. Patients were included in the study if they had a primary complaint of neck pain with or without unilateral UE symptoms, age more than 18 years, a minimum score of 10 out of 50 on the NDI, and a minimum composite pain score of 30 mm on three separate 100-mm visual analog scales assessing cervical pain, UE pain, and average 24-hour pain. To ensure that patients had the opportunity to demonstrate clinically meaningful improvement over time in response to the interventions being tested in the main trial, the aforementioned threshold NDI and pain scores were established. Patients with a history of whiplash injury within the past 6 weeks, cancer, infection, spinal fracture or surgery, symptoms consistent with cervical spinal stenosis, bilateral UE symptoms, two or more positive neurologic findings (ie, altered muscle stretch reflexes, sensation, or strength), or pending legal action regarding their neck pain were excluded. The study was approved by a combined Institutional Review Board for Brooke Army Medical Center and Wilford Hall Air Force Medical Center, San Antonio, Texas, and all patients provided consent before participation.

Of the 94 patients who completed the randomized clinical trial [20], data from 91 patients are reported here. These data represent all the patients who had complete NDI data for both the initial and 3-week follow-up visits. A total of 10 military physical therapists at two hospital-based clinics and one outpatient-based clinic participated in data collection. Patients enrolled in the trial were randomly assigned to either a manual physical therapy and exercise group or a minimal intervention group. Patients in the manual physical therapy and exercise group received manual physical therapy interventions directed at the cervical and thoracic regions and a standardized exercise program. Patients in the minimal intervention group received cervical rotation range-of-motion exercises, advice to remain active, continued medication use as prescribed by the physician, and subtherapeutic ultrasound. All patients were treated twice weekly for 3 weeks. An independent examiner blinded to randomization status administered the self-reported outcome measures at the start of the study (before randomization) and again at completion of the 3-week treatment phase of the study. The NDI was administered at both data collection points, and 15-point Global Rating of Change (GRC) scales were completed by both patients and treating therapists at the completion of treatment. Treating therapists were blinded to patient scores on all self-report questionnaires. Data from both groups of the randomized trial were combined to form a single cohort for this secondary analysis.

Outcome measures

Neck Disability Index

The NDI [4,11] is a 10-item, 50-point index that assesses different aspects of daily functioning in patients with neck pain. The NDI assesses four items regarding subjective symptoms (pain intensity, headache, concentration, sleeping), four items regarding activities of daily living (lifting, work, driving, recreation), and two items regarding discretionary activities of daily living (personal care, reading) [8,13]. Each item is scored 0 to 5, with the total reported as either a raw score (0–50) [4,8,13,21–23] or as a percentage score [15,24]. Raw scores were used in this analysis.
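The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the percentage score is assumed to be the raw score expressed out of 100 (raw score × 2), and the handling of unanswered items, which the text does not describe, is deliberately left out.

```python
from typing import Sequence


def ndi_raw_score(items: Sequence[int]) -> int:
    """Sum the ten NDI item scores (each 0 to 5) into a raw score out of 50."""
    if len(items) != 10:
        raise ValueError("the NDI has exactly 10 items")
    if any(not 0 <= score <= 5 for score in items):
        raise ValueError("each NDI item is scored 0 to 5")
    return sum(items)


def ndi_percent_score(items: Sequence[int]) -> int:
    """Express the raw score on a 0 to 100 scale (assumed to be raw score x 2)."""
    return ndi_raw_score(items) * 2


# Hypothetical patient responses, one value per item.
example_items = [2, 1, 3, 0, 2, 1, 2, 1, 0, 3]
example_raw = ndi_raw_score(example_items)
example_percent = ndi_percent_score(example_items)
```

Under this assumption a raw change of 7.5 points corresponds to 15 points on the 100-point scale.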

Global Rating of Change

The 15-point GRC scale was used for both patients and clinicians to assess overall perceptions of improvement since the initiation of treatment [18]. The GRC scale ranges from −7 (“a very great deal worse”) to 0 (“about the same”) to +7 (“a very great deal better”). Incremental descriptors of worsening and improving are assigned values from −1 to −7 and from +1 to +7, respectively. The following classifications have been proposed regarding the GRC score: 0, 1, or −1 signifies no change; ±2 to 3 signifies minimal change; ±4 to 5 signifies moderate change; and ±6 to 7 signifies a large change in a patient’s condition. The GRC has been well validated and extensively used in research as an outcome measure and as an external reference standard to compare outcome measures [15,25–28].

Data analysis

Patients with an average GRC rating ≥3 (“somewhat better”) were considered to have improved. Patients with an average rating <3 to >−3 were considered to have remained stable. Patients with an average rating of ≤−3 (“somewhat worse”) or smaller were considered to have worsened. Because no patients were classified as worsened in our sample, we classified all the patients into two groups, improved and stable, for data analysis.
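As a minimal sketch of the grouping rule above, the following Python function averages the patient and clinician GRC ratings and applies the stated cutoffs. The function name and the range check are illustrative assumptions, not part of the study protocol.

```python
def classify_grc(patient_grc: int, clinician_grc: int) -> str:
    """Classify a patient from the average of the patient and clinician GRC ratings.

    Cutoffs follow the text: average >= 3 is improved, average <= -3 is
    worsened, and anything in between is stable.
    """
    for rating in (patient_grc, clinician_grc):
        if not -7 <= rating <= 7:
            raise ValueError("GRC ratings range from -7 to +7")
    average = (patient_grc + clinician_grc) / 2
    if average >= 3:
        return "improved"
    if average <= -3:
        return "worsened"
    return "stable"


# Hypothetical ratings: patient +4, clinician +2 gives an average of 3.
example_group = classify_grc(4, 2)
```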

Patient variables for the improved and stable groups were compared at baseline using independent t tests for continuous data and χ² tests for categorical data. The Pearson correlation coefficients were calculated between the patient and the clinician GRC scores. Construct validity was determined by comparing the change in scores for the stable and improved patients using one-way analysis of variance for the repeated measures at baseline and at the 3-week follow-up.

Responsiveness was calculated by constructing a receiver operating characteristic curve by plotting the sensitivity values (true positives) on the y-axis against the 1−specificity values (false positives) on the x-axis to distinguish patients who had improved from those who remained stable. The area under the curve (AUC) and 95% confidence intervals (CIs) were used as a quantitative method for assessing a scale’s ability to distinguish patients who had improved from those who remained stable based on the GRC [29–31]. The MCID was determined to be the magnitude of change associated with the uppermost left-hand corner of the curve, where both sensitivity and specificity are maximized [26]. The AUC for a diagnostic test is considered satisfactory when it exceeds 0.70 [25]. Responsiveness was also calculated by determining the correlation between change scores for the NDI and GRC in all patients.
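The receiver operating characteristic procedure can be illustrated with a short Python sketch. Several choices here are assumptions rather than the authors' method: the point "nearest the uppermost left-hand corner" is taken as the cutoff with the smallest Euclidean distance to (1−specificity = 0, sensitivity = 1), candidate cutoffs are the observed change scores rather than midpoints between them, the AUC is computed as the proportion of improved/stable pairs ranked correctly (ties counted as half), and the data are invented for illustration.

```python
from math import hypot


def roc_points(changes, improved):
    """Return (cutoff, sensitivity, specificity) for each observed change score.

    A patient is called 'improved' by the scale when change >= cutoff.
    """
    n_improved = sum(improved)
    n_stable = len(improved) - n_improved
    points = []
    for cutoff in sorted(set(changes)):
        true_pos = sum(1 for c, y in zip(changes, improved) if y and c >= cutoff)
        true_neg = sum(1 for c, y in zip(changes, improved) if not y and c < cutoff)
        points.append((cutoff, true_pos / n_improved, true_neg / n_stable))
    return points


def closest_to_top_left(points):
    """Pick the ROC point with the smallest distance to the top-left corner."""
    return min(points, key=lambda p: hypot(1 - p[1], 1 - p[2]))


def auc(changes, improved):
    """Share of (improved, stable) pairs in which the improved patient changed more."""
    pos = [c for c, y in zip(changes, improved) if y]
    neg = [c for c, y in zip(changes, improved) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical NDI change scores and GRC-based grouping (True = improved).
example_changes = [10, 9, 8, 7, 3, 6, 4, 2, 1, 8]
example_improved = [True, True, True, True, True, False, False, False, False, False]
example_cutoff = closest_to_top_left(roc_points(example_changes, example_improved))[0]
example_auc = auc(example_changes, example_improved)
```

Other rules for choosing the optimal point exist (for example, maximizing sensitivity plus specificity), and they need not agree on the same cutoff.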

MDC [32] was determined by first calculating the standard error of measurement (SEM) [33,34] of the NDI for those patients remaining stable, using the formula $\mathrm{SEM} = \mathrm{SD} \times (1 - r)^{1/2}$, where r is the test-retest reliability coefficient and SD is the standard deviation of the total variance. Test-retest reliability of the NDI was examined using the intraclass correlation coefficient (ICC) (2,1) [35] for those patients whose condition remained stable. Although no consensus exists as to how much change must occur to confidently exceed the bounds of measurement error, previous researchers have reported 1 SEM as the best measure of meaningful change on health-related quality-of-life measures [36]. To increase our confidence that we have exceeded the bounds of measurement error, we used 1.65 times SEM to calculate the statistically meaningful change, which represents the statistical amount of change necessary to confidently exceed measurement error at a 90% CI. This value was then multiplied by the square root of 2 to account for the error in taking two measurements [13,26].
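The two-step calculation above (SEM, then MDC) is short enough to sketch directly. The function names are hypothetical; the multipliers 1.65 and the square root of 2 are the ones stated in the text.

```python
from math import sqrt


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r), with r the test-retest reliability coefficient."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability is expected to lie between 0 and 1")
    return sd * sqrt(1.0 - reliability)


def minimal_detectable_change(sd: float, reliability: float, z: float = 1.65) -> float:
    """MDC = z * SEM * sqrt(2); z = 1.65 is the 90% multiplier used in the text."""
    return z * standard_error_of_measurement(sd, reliability) * sqrt(2.0)


# Inputs as reported in the Results: common SD of 7.2 points, ICC of 0.64.
example_sem = standard_error_of_measurement(7.2, 0.64)
example_mdc = minimal_detectable_change(7.2, 0.64)
```

With these rounded inputs the sketch gives an SEM of about 4.3 points and an MDC of roughly 10.1 points. The small gap from the reported 10.2 points may reflect rounding of the published SD and ICC, which is an assumption, since the source does not say.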

To assess for a difference in outcomes between those patients with unilateral UE symptoms and those without unilateral UE symptoms, a further subgrouping was performed. Change scores were calculated by subtracting the 3-week NDI score from the initial NDI scores for each of four groups: presence or absence of unilateral UE symptoms, delineated as improved or stable. A one-way analysis of variance was performed to assess for differences in change scores between the groups, with Bonferroni multiple comparisons tests used for post hoc comparisons. Of particular interest are the interactions between the improved groups and any mean change score differences, as well as the interactions between the stable groups and any mean change score differences.

Results

Data from 91 patients (mean age, 47.8 years; SD, 14.6 years; 61 female) were used in this analysis. The mean GRC (average of patient/clinician) for all patients was 3.9 (SD, 2.3), with an ICC of r = 0.69 (p<.000) between the patient and clinician GRC. At the 3-week follow-up, 66 patients (73%) were classified as having improved (≥3), and 25 patients (27%) were classified as remaining stable (<3 to >−3). No patients were classified as worsening. These groups did not differ in their baseline characteristics, as listed in Table 1. The mean initial and final NDI scores for each group are illustrated in Fig. 1 and Table 2. There was a significant interaction between groups for the change in pretest and posttest scores, indicating there was a difference in the NDI scores between stable and improved patients at the 3-week follow-up.

Table 1
Baseline characteristics

Variable                        Improved patients (n = 66)   Stable patients (n = 25)   p Value
Age (y)                         47.7 (13.6)                  48.1 (17.4)                .91
Gender (% females)              66.7                         68.0                       .90
Baseline NDI score (0–50)       15.8 (4.6)                   18.0 (7.1)                 .08
UE symptoms (%)                 60.6                         60.0                       .96
Baseline VAS pain (0–100 mm)    53.4 (21.3)                  50.2 (16.3)                .50
Symptom duration (d)            815 (1951)                   844 (1721)                 .95

NDI, Neck Disability Index; VAS, visual analog scale; UE, upper extremity.
Values are listed as mean (standard deviation) unless stated otherwise.

Table 2
Change scores

Measure                               Total group (n = 91)   Improved patients (n = 66)   Stable patients (n = 25)
Baseline NDI                          16.4 (5.5)             15.8 (4.6)                   18.0 (7.1)
Final NDI                             8.2 (6.4)              5.9 (4.5)                    14.2 (6.9)
NDI change score                      8.2 (6.1)              9.8 (5.6)                    3.8 (5.2)
Average GRC                           4.0 (2.3)              5.1 (1.2)                    0.8 (1.4)
ICC (2,1) (95% CI)                    N/A                    N/A                          0.64 (0.19; 0.82)
NDI change UE symptoms present        N/A                    9.3 (4.8); n = 40            3.7 (5.3); n = 15
NDI change UE symptoms not present    N/A                    10.7 (6.7); n = 26           4.0 (5.3); n = 10

NDI, Neck Disability Index; GRC, global rating of change; ICC, intraclass correlation coefficient; N/A, not applicable; CI, confidence interval; UE, upper extremity.
All data reported as mean (standard deviation) unless otherwise indicated.

The AUC for the NDI at 3 weeks was 0.79 (0.68, 0.89) (Fig. 2). The MCID for the NDI was 7.5 points. The correlation between change scores for the NDI and GRC in all the patients was 0.52 (p<.000).

Test-retest reliability of the NDI among patients whose status remained stable was moderate, with an ICC equal to 0.64 (95% CI, 0.19–0.84). With this ICC and a common SD of 7.2 points, the SEM for the NDI was 4.3 points, and the MDC for the NDI was 10.2 points.

The change scores for groups based on the presence or absence of unilateral UE symptoms are listed in Table 2 and shown graphically in Fig. 3. There was a significant interaction effect for group, F = 7.363, p<.000. In other words, there was a significant difference between those who had improved versus those who were stable. Post hoc analysis revealed no differences in NDI change scores between those patients who had improved and were experiencing unilateral UE symptoms versus those who improved but were not experiencing unilateral UE symptoms. Likewise, no differences in NDI change scores were found between those who were stable and not experiencing unilateral UE symptoms versus those who were stable and experiencing unilateral UE symptoms.

Fig. 1. Graph of initial and final Neck Disability Index (NDI) scores between patients rated as improved and stable on the global rating of change. There is a significant difference between the final NDI scores (p<.000). (Axes: Initial NDI and Final NDI on the x-axis; Neck Disability Index, 0–50, on the y-axis; series: Improved, Stable.)

Fig. 2. Receiver operating characteristic curve for the Neck Disability Index (NDI). The arrow marks the data point nearest the uppermost left hand corner of the graph. This point represents the minimum clinically important difference for the NDI. (Axes: 1−Specificity on the x-axis; Sensitivity on the y-axis.)

Fig. 3. Comparison of initial and 3-week Neck Disability Index scores based on improved versus stable and the presence or absence of unilateral upper extremity symptoms. (Axes: NDI Initial and NDI 3 week on the x-axis; NDI Score, 0–50, on the y-axis; series: UE Absent Stable, UE Absent Improved, UE Present Stable, UE Present Improved.)

Discussion

The results of this study show that a change of 7.5 out of 50 points (or 15 points if referencing a 100-point scale) is the MCID for the NDI when used to collectively assess outcomes in patients presenting with primary mechanical neck pain both with and without concomitant UE symptoms. However, the MCID is within the bounds of measurement error as indicated by an MDC of 10.2 points [26]. These results are similar to those of Cleland et al. [15] and are higher than those previously reported by Stratford et al. [13] and Westaway et al. [8] (Table 3). We will explore two possible explanations for the variability in reported MCID and MDC scores.

One difference between our study and the studies of Stratford et al. [13] and Westaway et al. [8] is the choice of which external standard was used to assess meaningful change. Stratford et al. [13] and Westaway et al. [8] used an a priori assessment by the evaluating or treating therapist in their NDI responsiveness studies. We, however, used the a posteriori GRC scale as our comparison standard for the NDI, as previously used in studies assessing cervical and low back pain functional outcome measure responsiveness [15,25,28]. To minimize the potential error in patient recall from their baseline status [37] and to account for treatment effect from the perspective of the therapist, we used the average of the therapist and patient global ratings [25,26].

Table 3
Comparison of current study results with prior reported results

Study                    N     MDC    MCID   Reliability
Current study            91    10.2   7.5    0.64
Cleland et al. [15]*     137   9.8    9.5    0.50
Stratford et al. [13]    50    5      5      0.94
Westaway et al. [8]      31    4.2    NR     0.89

MDC, minimal detectable change; MCID, minimum clinically important difference; NR, not reported.
* Data converted to raw score (0–50) from percent.

Although the GRC has been used previously, it is not without controversy [37,38]. The most common complaint regarding the use of a retrospective patient rating is that patients may forget their own baseline status. However, our results show a successful delineation between those who had improved versus those who remained stable based on the significant difference in NDI scores between groups after treatment. As previously reported by Fritz and Irrgang [26], this supports the construct validity for using the GRC as an external standard to assess meaningful change in outcome instruments.

As reported by Cleland et al. [15], selecting different cutoffs in a rating scale can vary the responsiveness values of a patient outcome measure. We arbitrarily chose a GRC rating ≥3 (“somewhat better”) as the cutoff to delineate improved versus stable patients. As we were looking at the responsiveness of the instrument to detect change when change has occurred after a period of treatment, we accepted that there would be some fluctuation in NDI scores for the stable group based on the selected GRC cutoff. However, we did not want to set our cutoff score too low and assume that a change significant to the patient had occurred when indeed it had not. Further work must be done to ascertain the best method of establishing external standards to assess meaningful change.

The second difference in reported MCID and MDC scores for the NDI may be due to the type of patients included in our study compared with those in other reports. Sixty percent of our patients presented with concomitant UE radicular symptoms, with nearly equal proportions in the improved and stable groups. Likewise, Cleland et al. [15] reported data that included 23% of patients with symptoms distal to the shoulder. Stratford et al. [13] included only patients with neck pain, and Westaway et al. [8] included only two patients (6%) with radicular symptoms. We found no significant difference in NDI change scores when comparing between the presence and absence of unilateral UE symptoms, suggesting that the NDI adequately accounts for UE symptoms in conjunction with neck pain.

Our study did show the NDI to be a responsive indicator of a change in health status when true change has occurred (AUC = 0.79). Therefore, an MCID of 7.5 points on the NDI corresponds to a clinically meaningful change when applied to patients with characteristics similar to those of our study sample. However, because the MCID of 7.5 points was within the bounds of measurement error (MDC = 10.2 points), a clinician should use 10 points (out of a maximum 50-point NDI score) to determine when clinically meaningful change has occurred. These results add to the growing body of literature regarding the responsiveness of the NDI. Although these values give guidelines to adjusting patient treatment based on functional outcome measurement tools, it is also important for clinicians to remember that clinical decision making should also be based on the patient’s perception of treatment efficacy [39].


A limitation of this secondary analysis is that the patients were enrolled into a larger clinical trial without specified minimum or maximum symptom duration and with or without UE symptoms. Although this does increase the external validity of these results, future research on the NDI could be designed to control for symptom duration and UE symptom status to assess for variances in outcome tool responsiveness with these variables. Given the relatively large change score necessary to detect true change (10 out of 50 points), further research is also needed to explore whether there are more responsive instruments for patients presenting with mechanical neck pain, both with and without UE symptoms.

Conclusions

The NDI demonstrates satisfactory responsiveness in our study. In application to clinical practice, one should consider NDI changes of 10 points to be clinically meaningful for patients with mechanical neck pain presenting both with and without concurrent UE symptoms.

References

[1] Cote P, Cassidy JD, Carroll L. The Saskatchewan health and back pain survey: the prevalence of neck pain and related disability in Saskatchewan adults. Spine 1998;23:1689–98.
[2] Picavet HS, Schouten JS. Musculoskeletal pain in the Netherlands: prevalences, consequences and risk groups, the DMC(3)-study. Pain 2003;102(1–2):167–78.
[3] Hackett GI, Hudson MF, Wylie JB, et al. Evaluation of the efficacy and acceptability to patients of a physiotherapist working in a health centre. Br Med J (Clin Res Ed) 1987;294:24–6.
[4] Vernon H, Mior S. The Neck Disability Index: a study of reliability and validity. J Manipulative Physiol Ther 1991;14:409–15.
[5] Leak AM, Cooper J, Dyer S, et al. The Northwick Park Neck Pain Questionnaire, devised to measure neck pain and disability. Br J Rheumatol 1994;33:469–74.
[6] Wheeler AH, Goolkasian P, Baird AC, Darden BV. Development of the Neck Pain and Disability Scale. Item analysis, face, and criterion-related validity. Spine 1999;24:1290–4.
[7] Jordan A, Manniche C, Mosdal C, Hindsberger C. The Copenhagen Neck Functional Disability Scale: a study of reliability and validity. J Manipulative Physiol Ther 1998;21:520–7.
[8] Westaway MD, Stratford PW, Binkley JM. The patient-specific functional scale: validation of its use in persons with neck dysfunction. J Orthop Sports Phys Ther 1998;27:331–8.
[9] BenDebba M, Heller J, Ducker TB, Eisinger JM. Cervical spine outcomes questionnaire: its development and psychometric properties. Spine 2002;27:2116–23.
[10] Feise RJ, Menke JM. Functional Rating Index: a new valid and reliable instrument to measure the magnitude of clinical change in spinal conditions. Spine 2001;26:78–87.
[11] Fairbank JC, Pynsent PB. The Oswestry Disability Index. Spine 2000;25:2940–52.
[12] Pietrobon R, Coeytaux RR, Carey TS, et al. Standard scales for measurement of functional outcome for cervical pain or dysfunction: a systematic review. Spine 2002;27:515–22.
[13] Stratford PW, Riddle DL, Binkley JM, et al. Using the Neck Disability Index to make decisions concerning individual patients. Physiother Can 1999;51:107–19.
[14] Chok B, Gomez E. The reliability and application of the Neck Disability Index in physiotherapy. Physiother Singapore 2000;3:16–9.
[15] Cleland JA, Childs JD, Whitman JM. Psychometric properties of the Neck Disability Index and Numeric Pain Rating Scale in patients with mechanical neck pain. Arch Phys Med Rehabil 2008;89:69–74.
[16] de Bruin AF, Diederiks JP, de Witte LP, et al. Assessing the responsiveness of a functional status measure: the Sickness Impact Profile versus the SIP68. J Clin Epidemiol 1997;50:529–40.
[17] Beaton DE. Understanding the relevance of measured change through studies of responsiveness. Spine 2000;25:3192–9.
[18] Jaeschke R, Singer J, Guyatt GH. Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials 1989;10:407–15.
[19] Daffner SD, Hilibrand AS, Hanscom BS, et al. Impact of neck and arm pain on overall health status. Spine 2003;28:2030–5.
[20] Walker MJ, Boyles RE, Young BA, et al. The effectiveness of manual physical therapy and exercise for mechanical neck pain: a randomized clinical trial. Spine 2008;33:2371–8.
[21] Hoving JL, Koes BW, de Vet HC, et al. Manual therapy, physical therapy, or continued care by a general practitioner for patients with neck pain. A randomized, controlled trial. Ann Intern Med 2002;136:713–22.
[22] Bronfort G, Evans R, Nelson B, et al. A randomized clinical trial of exercise and spinal manipulation for patients with chronic neck pain. Spine 2001;26:788–99.
[23] Kjellman G, Oberg B. A randomized clinical trial comparing general exercise, McKenzie treatment and a control group in patients with neck pain. J Rehabil Med 2002;34:183–90.
[24] George SZ, Fritz JM, Erhard RE. A comparison of fear-avoidance beliefs in patients with lumbar spine pain and cervical spine pain. Spine 2001;26:2139–45.
[25] Childs JD, Piva SR, Fritz JM. Responsiveness of the numeric pain rating scale in patients with low back pain. Spine 2005;30:1331–4.
[26] Fritz JM, Irrgang JJ. A comparison of a modified Oswestry Low Back Pain Disability Questionnaire and the Quebec Back Pain Disability Scale. Phys Ther 2001;81:776–88.
[27] Juniper EF, Guyatt GH, Willan A, Griffith LE. Determining a minimal important change in a disease-specific quality of life questionnaire. J Clin Epidemiol 1994;47:81–7.
[28] Whitman JM, Flynn TW, Childs JD, et al. A comparison between two physical therapy treatment programs for patients with lumbar spinal stenosis: a randomized clinical trial. Spine 2006;31:2541–9.
[29] Altman DG, Machin D, Bryant TN, et al. Statistics with confidence. 2nd ed. Bristol, UK: BMJ Books; 2000.
[30] Deyo RA, Centor RM. Assessing the responsiveness of functional scales to clinical change: an analogy to diagnostic test performance. J Chronic Dis 1986;39:897–906.
[31] Kopec JA, Esdaile JM. Functional disability scales for back pain. Spine 1995;20:1943–9.
[32] Wyrwich KW, Tierney WM, Wolinsky FD. Further evidence supporting an SEM-based criterion for identifying meaningful intra-individual changes in health-related quality of life. J Clin Epidemiol 1999;52:861–73.
[33] Eliasziw M, Young SL, Woodbury SG, et al. Statistical methodology for the concurrent assessment of interrater and intrarater reliability: using goniometric measurements as an example. Phys Ther 1994;74:777–88.
[34] Roebroeck ME, Halaar J, Lankhorst GJ. The application of generalizability theory to reliability assessment: an illustration using isometric force measurements. Phys Ther 1993;73:386–95.
[35] Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979;86:420–8.
[36] Wyrwich KW, Nienaber NA, Tierney WM, Wolinsky FD. Linking clinical relevance and statistical significance in evaluating intra-individual changes in health-related quality of life. Med Care 1999;37:469–78.
[37] Norman GR, Stratford PW, Regehr G. Methodological problems in the retrospective computation of responsiveness to change: the lesson of Cronbach. J Clin Epidemiol 1997;50:869–79.
[38] Stratford PW, Binkley FM, Riddle DL. Health status measures: strategies and analytic methods for assessing change scores. Phys Ther 1996;76:1109–23.
[39] Bellamy N, Carr A, Dougados M, et al. Towards a definition of “difference” in osteoarthritis. J Rheumatol 2001;28:427–30.