Clinimetrics in Parkinson's disease

Johan Marinus


Thesis Leiden University, June 11, 2003

ISBN: 90-6734-265-3

© 2003, J. Marinus, except (parts of) the following chapters:

chapter 2: John Wiley & Sons, Ltd.

chapter 3: BMJ Publishing Group.

chapters 5 and 9: Lippincott Williams & Wilkins.

chapter 6: American Psychiatric Publishing, Inc.

chapter 11: Elsevier Science.

No part of this book may be reproduced, stored in a retrieval system of any nature, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior permission in writing of the copyright owner.

Printed by: [OPTIMA], grafische communicatie, Rotterdam

Cover illustration: Han Marinus & [OPTIMA]


THESIS

submitted to obtain
the degree of Doctor at Leiden University,
on the authority of the Rector Magnificus Dr. D.D. Breimer,
professor in the Faculty of Mathematics and
Natural Sciences and that of Medicine,
by decision of the Doctorate Board,
to be defended on Wednesday 11 June 2003
at 14.15 hours

by

Johan Marinus

born in Leiden
in 1954

Doctoral committee

Promotor: Prof. Dr. R.A.C. Roos
Co-promotors: Dr. J.J. van Hilten, Dr. Ir. A.M. Stiggelbout
Referee: Prof. Dr. Ir. H.C.W. de Vet (Vrije Universiteit Amsterdam)
Member: Prof. Dr. J.C. van Houwelingen

The studies described in this thesis were performed at the Department of Neurology of the Leiden University Medical Center, Leiden, The Netherlands, and were financially supported by the Netherlands Organization for Scientific Research (NWO, project no. 0940-33-021) and the Leiden University Medical Center.

Financial support for this thesis has been provided by the Netherlands Organisation for Scientific Research (Nederlandse Organisatie voor Wetenschappelijk Onderzoek: NWO), Stichting Het Remmert Adriaan Laan Fonds, Boehringer Ingelheim BV, GlaxoSmithKline BV, and Roche Nederland BV.

To my father

Contents

1. Parkinson's disease, clinimetrics, and the 'disablement process': an introduction

2. Systematic evaluation of rating scales for impairment and disability in Parkinson's disease
C. Ramaker, J. Marinus, A.M. Stiggelbout, J.J. van Hilten
Mov Disord 2002;17:867-876

3. Health-related quality of life in Parkinson's disease: a systematic review of disease-specific instruments
J. Marinus, C. Ramaker, J.J. van Hilten, A.M. Stiggelbout
J Neurol Neurosurg Psychiatry 2002;72:241-248

4. Development of an instrument for the assessment of cognition in Parkinson's disease
J. Marinus, M. Visser, N.A. Verwey, F.R.J. Verhey, H.A.M. Middelkoop, A.M. Stiggelbout, J.J. van Hilten
Submitted for publication

5. Evaluation of the Hospital Anxiety and Depression Scale in patients with Parkinson's disease
J. Marinus, A.F.G. Leentjens, M. Visser, A.M. Stiggelbout, J.J. van Hilten
Clin Neuropharmacol 2002;25:318-324

6. The contribution of somatic symptoms to the diagnosis of depressive disorder in Parkinson's disease: a discriminant analytic approach
A.F.G. Leentjens, J. Marinus, J.J. van Hilten, R. Lousberg, F.R.J. Verhey
J Neuropsychiatry Clin Neurosci 2003;15:74-77

7. Development of a questionnaire for autonomic dysfunction in Parkinson's disease: the SCOPA-AUT
M. Visser, J. Marinus, A.M. Stiggelbout, J.J. van Hilten

8. Development of a short scale for motor impairments and disabilities in Parkinson's disease: the SPES/SCOPA
J. Marinus, M. Visser, A.M. Stiggelbout, J.M. Rabey, P. Martínez-Martín, U. Bonuccelli, P.H. Kraus, J.J. van Hilten

9. Activity-based diary for Parkinson's disease
J. Marinus, M. Visser, A.M. Stiggelbout, J.M. Rabey, U. Bonuccelli, P.H. Kraus, J.J. van Hilten
Clin Neuropharmacol 2002;25:43-50

10. Development of a questionnaire for sleep and sleepiness in Parkinson's disease
J. Marinus, M. Visser, J.J. van Hilten, G.J. Lammers, A.M. Stiggelbout

11. A short psychosocial questionnaire for patients with Parkinson's disease: the SCOPA-PS
J. Marinus, M. Visser, P. Martínez-Martín, J.J. van Hilten, A.M. Stiggelbout
J Clin Epidemiol 2003;56:61-67

12. Summary and conclusions

Summary and conclusions (in Dutch)

List of abbreviations

Afterword

List of publications

Curriculum vitae


1. Parkinson's disease, clinimetrics, and the 'disablement process': an introduction

Chapter 1


Parkinson's disease

Parkinson's disease (PD) is a chronic, progressive neurological disorder that affects almost two percent of the population aged over 65 years.1 The disease is named after the London physician James Parkinson, who reported on six patients with shaking palsy or paralysis agitans.2 Parkinson described the disease as "involuntary tremulous motion, with lessened muscle power, in parts not in action and even when supported; with a propensity to bend the trunk forward, and to pass from a walking to a running pace: the senses and intellects being uninjured."

The symptoms that accompany this disease are largely caused by a reduction of dopamine in the striatum, resulting from the selective loss of dopaminergic neurons in the substantia nigra. The first symptoms become apparent when approximately 70-80% of these neurons have been destroyed, implying that the symptomatic phase of the disease is probably preceded by a long asymptomatic phase. In PD, not only the dopaminergic pathways are affected; other neurotransmitter systems, namely the serotonergic, noradrenergic, and cholinergic systems, are also involved.3 The cause of the disease is still unknown. Most researchers take the view that the risk of developing PD is determined by genetic factors, environmental factors, or a combination of both.

PD is characterized by tremor, rigidity, bradykinesia, and postural disturbances.4 Non-motor features such as autonomic dysfunction, mood changes, and cognitive deterioration often complicate the course of the disease.

The onset of the disease generally occurs between the ages of 50 and 65, and the incidence increases with age. PD is associated with a two-fold higher relative risk of death,5,6 and a five-fold higher odds ratio of institutionalisation compared to age-matched controls.5 Patients die 3-6 years earlier than non-patients,7,8 after a mean disease duration of 11-15 years.6 Patients therefore live with the disease for a large part of their lives.

Current knowledge of the progression of PD is often imprecise, because assessment instruments for the various domains involved are lacking or display clinimetric problems. Furthermore, much of the information on disease progression is derived from clinical trials that only include patients without other serious conditions. PD generally occurs at ages over 40, and patients may develop other conditions associated with ageing (comorbidity). Most comorbid conditions are non-fatal, but they add to the disease burden. Information on disease progression in these 'unselected' patients is not available. Another problem is that the longitudinal population studies performed to date have only included a limited number of endpoints. Better insight into the course of the disease may be obtained by performing a longitudinal study in which all relevant domains are addressed in unselected patients with PD. A better understanding of the possible relations and interactions between these domains may, for instance, allow physicians or other care providers to time their interventions better.

It is important to realise that the quality of the information that is obtained in such a longitudinal study depends on the quality of the applied measurement instruments. The discipline that is concerned with the quality of health measurement scales is called clinimetrics.

Clinimetrics

The term clinimetrics was introduced by Alvan Feinstein, who defined it as the "domain concerned with indexes, rating scales, and other expressions that are used to describe or measure symptoms, physical signs, and other distinctly clinical phenomena in clinical medicine."9 If instruments are used to quantify disease characteristics, it is essential that they meet certain quality criteria in order to provide accurate answers. The most important criteria are reliability, validity, and responsiveness. Reliability is the extent to which an instrument is free of measurement error and is often defined as the proportion of variance that is attributable to 'true' differences between individuals.10 The concept includes the idea of reproducibility, but also refers to internal consistency or homogeneity of the components of an index.9 Validity is commonly defined as the extent to which a test measures what it is intended to measure.11 Responsiveness involves the ability of an instrument to reflect changes in health status.12
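The variance-ratio definition of reliability can be written out explicitly. The following is a sketch in classical test theory notation, assuming that an observed score X is the sum of a true score T and a random error E that is uncorrelated with T; the symbols are introduced here for illustration and are not taken from the cited sources.

```latex
\begin{aligned}
X &= T + E, \qquad \operatorname{Cov}(T, E) = 0,\\
R &= \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}.
\end{aligned}
```

R equals 1 when there is no measurement error and falls towards 0 as error variance dominates the variance between patients; for example, a true-score variance of 8 and an error variance of 2 give R = 8/10 = 0.80.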

Disablement Process

Many of the existing measurement scales in PD were developed without a clear framework to guide the construction process, often resulting in heterogeneous scales that combine items evaluating impairments with items assessing disabilities. The use of a framework would have facilitated the development of conceptually clear indices. Well-known frameworks are the International Classification of Functioning, Disability, and Health (ICF, formerly ICIDH),13 developed by the World Health Organization, and the model of the 'Disablement Process', put forward by Verbrugge and Jette.14 In our research project we used the latter framework to structure our research. Verbrugge and Jette proposed a sociomedical model of disability that is especially useful for epidemiological and clinical research. The model describes the pathway that links pathology with impairments, functional limitations, and disability, and proposes bi-directional relations between these entities. Extra- and intra-individual factors, and comorbidity as a special case of an intra-individual factor, may act upon this pathway and modify the course of the disease and its impact on the individual. Using this model may help to identify relevant aspects of the disease, to assign these aspects to the appropriate domains, and to evaluate whether the instruments are of sufficient clinimetric quality.

Aim of the study

This thesis reflects the first phase of a project that aims to provide data on disease progression in patients with PD. This first phase is concerned with obtaining the appropriate instruments; the second phase involves the use of these instruments in the longitudinal follow-up of patients, stratified by disease duration and age at onset of the disease. The project is called 'the assessment of the disablement process in Parkinson's Disease' and is financed by the Netherlands Organization for Scientific Research (NWO; project 940-33-021) and the Leiden University Medical Center.

The first phase of this project (called the SCOPA project, short for SCales for Outcomes in PArkinson's disease) involves the selection or development of practical and clinimetrically sound instruments, intended for comparing groups of patients or for clinical use in individual patients. Scales are considered clinimetrically sound if they measure a single aspect of the disease in a valid and reliable way, and practical if they can be administered swiftly, do not demand special equipment or skills, and provide an index that can easily be obtained and interpreted.

We followed a standard procedure for all scales. First, we evaluated whether appropriate instruments were available. Next, we assessed whether these instruments had already been tested in PD, and whether the results of these studies justified the use of the scale in the longitudinal phase. If a scale was available but had not been tested in PD, we assessed its clinimetric properties in this population. If appropriate scales were not available, we developed them.

On the impairment level, modules for the following domains were selected or constructed: cognition, mood, psychiatric complications, motor evaluation, motor complications, and autonomic dysfunction. On the disability level, modules for psychosocial disability, activities of daily living, and sleep were developed. Other modules that were evaluated or developed included comorbidity and costs. Applying the concept of the disablement process to our research project resulted in the model presented in figure 1. This thesis describes the clinimetric evaluation of the scales at the impairment and disability levels.

Figure 1. Model of the SCOPA project. Boxes shown: extra-individual factors (medical care & rehabilitation; medication / external support); intra-individual factors (lifestyle, behaviour, coping; activity accommodations); comorbidity; IMPAIRMENTS (cognition, mood, psychiatric dysfunction, motor, motor complications, autonomic); DISABILITY (sleep, physical, psychosocial); GLOBAL (global health, costs, utility).


References

1. De Rijk MC, Launer LJ, Berger K, Breteler MM, Dartigues JF, Baldereschi M, et al. Prevalence of Parkinson's disease in Europe: A collaborative study of population-based cohorts. Neurologic Diseases in the Elderly Research Group. Neurology 2000;54(11 Suppl 5):S21-S23.
2. Parkinson J. An essay on the shaking palsy. London: Sherwood, Neely, and Jones, 1817.
3. Jellinger K. Overview of morphological changes in Parkinson's disease. Adv Neurol 1987;45:1-18.
4. Gibb WR, Lees AJ. The relevance of the Lewy body to the pathogenesis of idiopathic Parkinson's disease. J Neurol Neurosurg Psychiatry 1988;51(6):745-752.
5. Berger K, Breteler MM, Helmer C, Inzitari D, Fratiglioni L, Trenkwalder C, et al. Prognosis with
6. Di Rocco A, Molinari SP, Kollmeier B, Yahr MD. Parkinson's disease: Progression and Mortality in the L-DOPA Era. In: Battistin L, Scarlato G, Caraceni T, Ruggieri S, editors. Parkinson's Disease. Philadelphia: Lippincott-Raven, 1996:3-11.
7. Wermuth L, Stenager EN, Stenager E, Boldsen J. Mortality in patients with Parkinson's disease. Acta Neurol Scand 1995;92(1):55-58.
8. Ben Shlomo Y, Marmot MG. Survival and cause of death in a cohort of patients with parkinsonism: possible clues to aetiology? J Neurol Neurosurg Psychiatry 1995;58(3):293-299.
9. Feinstein AR. Clinimetrics. 1st ed. New Haven: Yale University Press, 1987.
10. Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to their Development and Use. 2nd ed. Oxford: Oxford Medical Publications, 1995.
11. McDowell I, Newell C. Measuring Health: A Guide to Rating Scales and Questionnaires. 2nd ed. New York: Oxford University Press, 1996.
12. Kirshner B, Guyatt G. A methodological framework for assessing health indices. J Chronic Dis 1985;38(1):27-36.
13. World Health Organization. The international classification of functioning, disability, and health. Internet 2002. http://www3.who.int/icf/onlinebrowser/icf.cfm
14. Verbrugge LM, Jette AM. The Disablement Process. Soc Sci Med 1994;38(1):1-14.


2. Systematic evaluation of rating scales for impairment and disability in Parkinson's disease

Claudia Ramaker¹, Johan Marinus¹, Anne M. Stiggelbout², Jacobus J. van Hilten¹

Departments of ¹Neurology and ²Medical Decision Making, Leiden University Medical Center, Leiden, The Netherlands

Parts of this chapter were published in Movement Disorders 2002;17:867-876

Chapter 2


Abstract

We assessed the clinimetric characteristics of rating scales used for the evaluation of motor impairments and disabilities of patients with Parkinson's disease (PD) by conducting a systematic review of PD rating scales published from 1960 to the present. Thirty studies describing the clinimetrics of 11 rating scales used for PD were identified. Outcome measures included validity (including factor structure), reliability (internal consistency, inter-rater, and intra-rater), and responsiveness. We found three impairment scales (Webster, Columbia University Rating Scale [CURS], Parkinson's Disease Impairment Scale), four disability scales (Schwab and England, Northwestern University Disability Scale [NUDS], Intermediate Scale for Assessment of PD, Extensive Disability Scale), and four scales evaluating both impairments and disabilities (New York University, University of California Los Angeles, Unified Parkinson's Disease Rating Scale [UPDRS], Short Parkinson's Evaluation Scale). The scales showed large differences in the extent to which they represented items considered responsive to dopaminergic treatment or items covering symptoms that appear late in the disease course and lack responsiveness to treatment. Irrespective of the scale, there was a lack of consistency concerning the inter-rater reliability of bradykinesia, tremor, and rigidity. Overall, disability items displayed moderate to good inter-rater reliability. The available evidence indicated that the CURS, NUDS, and UPDRS have moderate to good reliability and validity. The majority of instruments demonstrated clinimetric shortcomings or had not been subjected to extensive clinimetric testing, despite their frequent use. The CURS, NUDS, and UPDRS have been evaluated most often, and these scales are also considered valid and reliable.

Rating scales in Parkinson's disease


Introduction

Parkinson's disease (PD) is a progressive neurological disorder that gradually results in accumulating disability. Because most of the motor features result from striatal dopamine deficiency, the treatment of patients with PD has focussed on the administration of dopaminergic drugs to alleviate symptoms. New insights into the pathophysiology of PD and an increasing awareness of factors that contribute to levodopa-induced motor complications have stimulated the development not only of new drugs, but also of very promising surgical techniques.1-3 Consequently, the increasing number of therapeutic interventions in PD has highlighted the importance of measuring clinical outcomes. In 1981, Marsden and Schachter4 reviewed the available methods for the assessment of extrapyramidal disorders and presented a comprehensive summary of subjective and objective assessments, regardless of their validity and reliability. Since the appearance of this review, the evaluation of patient outcomes, clinimetrics, has developed into a science of its own. Information on validity, reliability, and responsiveness is now considered essential to ensure the useful application of a rating scale.5 We conducted a systematic review of the clinimetric aspects of scales that are used by observers to evaluate the motor impairments and disabilities of patients with PD.

Methods

Studies were included if they evaluated clinimetric properties of a PD rating scale that addressed motor impairments or disabilities, evaluated by an observer. Self-report scales and quality-of-life measures were therefore excluded from this review.6 Scales that primarily assessed dyskinesias or motor fluctuations were also excluded. Impairment is defined as an abnormality in structure or function of a body organ or system, and disability as a reduction of a person's ability to perform a basic task.7,8

Search Strategy

The following sources were used to identify studies of interest: computerized searches of Medline and EMBase (using the text words [rating] scale, impairment, disability, clinimetrics, evaluation, and the individual scale names in combination with 'Parkinson' and related terms; search conducted December 2001), reference lists of the reviews found by the Medline and EMBase search strategy, SCIsearch, the Cochrane Library,9 symposia reports, PD handbooks, and reference lists of all included publications. Searches were not restricted to the English language.

Methods of Review

Two reviewers independently reviewed the identified publications according to a two-step review process. First, abstracts were reviewed for eligibility. Second, eligible reports were judged against a set of methodological criteria in which both the thoroughness (methodological and statistical) and the results of studies evaluating validity, reliability, and responsiveness were assessed. A checklist was used to record sample characteristics, outcome measures, appropriateness of statistical analysis, and methodological quality. The method of presenting the quality of scales was adopted from McDowell and Newell.10

In attempting to interpret the different indices of correlation and degrees of agreement, we noticed that there was no general agreement about how high they should be. Because a new rating scale is generally not designed to replicate exactly the existing method against which it is compared, the expected correlation should not be perfect, as a perfect correlation may indicate that the new scale is redundant. Few studies, however, state what levels of correlation are to be taken as demonstrating adequate validity or reliability.

We interpreted the different correlations and degrees of agreement for validity and reliability as follows. Spearman's coefficient rs, Pearson's coefficient r, Kendall's coefficient W or τ, the eta coefficient, and Cramér's coefficient V with values of 0.70 or lower were considered poor,11 whereas values over 0.70 were considered moderate to good. Values for kappa (K), weighted kappa (Kw), and the intra-class correlation coefficient (ICC) of 0.40 or lower were considered to indicate poor agreement, 0.41-0.60 moderate, 0.61-0.80 substantial, and ≥ 0.81 good to almost perfect agreement.12 A Cronbach's α ≤ 0.70 was considered poor, whereas values of 0.71-0.90 were considered moderate to good.10,13 If, however, α is too high (> 0.90), this may reflect redundancy, indicating that some of the items are unnecessary.11
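As an illustration only, the interpretation bands above can be written as small lookup functions. This is a sketch: the function names are hypothetical, and treating each published boundary (0.40, 0.60, 0.80 for agreement; 0.70 and 0.90 for correlations and α) as the upper limit of the lower band is an assumption for values that fall between the stated ranges, such as 0.805.

```python
"""Hypothetical helpers encoding the interpretation bands used in this review."""


def interpret_correlation(value: float) -> str:
    """Spearman rs, Pearson r, Kendall W or tau, eta, Cramer's V (validity)."""
    # 0.70 or lower: poor; above 0.70: moderate to good
    return "moderate to good" if value > 0.70 else "poor"


def interpret_agreement(value: float) -> str:
    """Kappa, weighted kappa, and ICC (reliability)."""
    if value <= 0.40:
        return "poor"
    if value <= 0.60:
        return "moderate"
    if value <= 0.80:
        return "substantial"
    return "good to almost perfect"


def interpret_alpha(alpha: float) -> str:
    """Cronbach's alpha (internal consistency)."""
    if alpha <= 0.70:
        return "poor"
    if alpha <= 0.90:
        return "moderate to good"
    # Above 0.90 may indicate that some items are redundant
    return "possibly redundant"


if __name__ == "__main__":
    for label, kw in [("item A", 0.35), ("item B", 0.72), ("item C", 0.85)]:
        print(label, interpret_agreement(kw))
    # item A poor / item B substantial / item C good to almost perfect
```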

The thoroughness of the evidence was classified as follows. If the appropriate statistical procedures were used, the sample size was considered large enough, and all circumstances were optimal (i.e., the PD population), it was classified as good. If less preferable statistical procedures were used or the circumstances were less optimal, it was classified as moderate. If questionable statistical procedures were used or the circumstances were somewhat doubtful or unclear, it was classified as fair, and if the statistical procedure or the circumstances were inadequate, it was classified as poor.



Studies were eligible when they calculated the following clinimetric characteristics of disease-specific impairment and disability instruments in Parkinson's disease: validity (content validity, criterion validity, and construct validity including factor structure), reliability (internal consistency, inter-rater reliability, intra-rater reliability), or responsiveness.

Validity

Validity is the extent to which an instrument measures what it is supposed to measure and does not measure what it is not supposed to measure. Three types of validity are frequently discussed: content validity, criterion validity, and construct validity.

Content validity. Content validity consists of a judgment of whether the instrument samples all the relevant topics of a domain. It relies on expert opinions and reviews of the literature.

Criterion validity. This is the demonstration of concordance between an assessment and a particular standard, the criterion. It is assessed using correlation coefficients of concordance or percentages of agreement. The most commonly used correlation coefficients of concordance are Spearman's coefficient rs, Pearson's coefficient r, Kendall's coefficient W, and Cramér's coefficient V. Pearson's r and Spearman's rs range from -1 (perfect inverse association) through 0 (no association) to +1 (perfect positive association); r reflects linear association and rs monotonic association. Kendall's W and Cramér's V range from 0 (no association) to 1 (perfect association). This concept is particularly useful when an obvious gold standard exists for use as a criterion.
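To make the distinction between Pearson's and Spearman's coefficients concrete, the sketch below computes both from scratch with the Python standard library. The function names and data are hypothetical; tied values receive the mean of their ranks (the usual convention for Spearman's rs), and constant input with zero variance is not handled.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson's r: strength of the linear association between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block of ties while the next sorted value is equal
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rs: Pearson's r computed on ranks (monotonic association)."""
    return pearson_r(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    new_scale = [1, 2, 3, 4, 5]    # hypothetical scores on a new scale
    criterion = [1, 4, 9, 16, 25]  # hypothetical criterion scores
    print(round(pearson_r(new_scale, criterion), 3))  # 0.981
    print(spearman_rs(new_scale, criterion))          # 1.0
```

In this example the relation is perfectly monotonic but not linear, so rs reaches 1 while r stays just below it.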

Construct validity. Construct validity is commonly used instead of criterion validity, because in most cases a gold standard is lacking. It is demonstrated by examining the relations between a newly created test and another test to show that the new test measures the same construct.

Factor analysis. Factor analysis is commonly used to study the internal structure of a scale that contains separate components, each reflecting a different aspect of the measured domain. Using this technique, a large number of interrelated items are reduced to a smaller number of common dimensions or factors (clusters of items). Unrelated items do not belong to the same factor.

Reliability

Reliability is the extent to which an instrument is free of measurement error. Reliability assessment aims to quantify the most important sources of measurement error, and includes both consistency among scale items and reproducibility within and between observers.


Internal consistency. Internal consistency estimates the extent to which all items are measuring the same construct. Cronbach's coefficient α, the most frequently used indicator of internal consistency, represents the average of all possible split-half reliabilities. Coefficient α will be equal to zero if there is no linear relationship between the items. If all items are perfectly reliable and measure the same aspect (true score), coefficient α is equal to 1. For clinical applications at a group level the minimum value is 0.70, whereas for use at the individual level a minimum of 0.90 is desirable.11
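The usual computational form of coefficient α is k/(k - 1) × (1 - sum of item variances / variance of the total score), where k is the number of items. The sketch below, with a hypothetical function name and data, implements that form with the Python standard library. It uses sample variances throughout, which leaves α unchanged because only a ratio of variances enters the formula.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a patients-by-items score matrix.

    scores: one row per patient, one column per item.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    sum_item_var = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])  # variance of total scores
    return k / (k - 1) * (1 - sum_item_var / total_var)


if __name__ == "__main__":
    # Hypothetical ratings of five patients on three items scored 0-3
    ratings = [
        [0, 1, 0],
        [1, 1, 2],
        [2, 2, 1],
        [3, 2, 3],
        [2, 3, 3],
    ]
    print(round(cronbach_alpha(ratings), 2))  # 0.86
```

Identical items give α = 1 and items with zero covariance give α = 0, matching the two limiting cases described above.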

Inter-rater reliability. Inter-rater reliability is the consistency among different observers performing the same assessment on the same individual. Inter-rater reliability is best assessed by the ICC or K.14 The ICC is a parametric measure of agreement and represents the proportion of variance among patients caused by true differences.15 Kappa, developed for the study of nonparametric (categorical) ratings by observers, measures agreement corrected for the extent of agreement expected by chance alone. Where categories are ordered, it may be preferable to give different weights to disagreements according to the magnitude of the discrepancy, using the weighted kappa (Kw).16 If a squared (quadratic) weighting scheme is used, Kw is equivalent to the ICC.
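The sketch below computes Cohen's kappa and the weighted kappa for two raters with the Python standard library. It is an illustration under stated assumptions: the function name is hypothetical, disagreement weights are scaled to lie between 0 and 1 (scaling does not change kappa), and the "quadratic" option corresponds to the squared weighting scheme mentioned above. The ICC itself is not computed here.

```python
def weighted_kappa(rater1, rater2, categories, weights="none"):
    """Cohen's kappa (weights="none") or weighted kappa ("linear"/"quadratic").

    categories: the ordered list of possible ratings, e.g. [0, 1, 2, 3].
    """
    if weights not in ("none", "linear", "quadratic"):
        raise ValueError("weights must be 'none', 'linear' or 'quadratic'")
    if len(categories) < 2 or len(rater1) != len(rater2) or not rater1:
        raise ValueError("need >= 2 categories and two equal, non-empty rating lists")
    c = len(categories)
    pos = {cat: i for i, cat in enumerate(categories)}
    n = len(rater1)

    # Observed proportions in each cell of the c x c agreement table
    observed = [[0.0] * c for _ in range(c)]
    for a, b in zip(rater1, rater2):
        observed[pos[a]][pos[b]] += 1 / n
    row_marg = [sum(observed[i]) for i in range(c)]
    col_marg = [sum(observed[i][j] for i in range(c)) for j in range(c)]

    def disagreement(i, j):
        d = abs(i - j) / (c - 1)  # 0 on the diagonal, 1 for the most distant cells
        if weights == "none":
            return 0.0 if i == j else 1.0
        return d if weights == "linear" else d * d

    cells = [(i, j) for i in range(c) for j in range(c)]
    obs_dis = sum(disagreement(i, j) * observed[i][j] for i, j in cells)
    exp_dis = sum(disagreement(i, j) * row_marg[i] * col_marg[j] for i, j in cells)
    # Kappa = 1 - observed disagreement / disagreement expected by chance
    return 1 - obs_dis / exp_dis


if __name__ == "__main__":
    # Hypothetical ratings of six patients by two observers on a 0-2 item
    r1 = [0, 0, 1, 1, 2, 2]
    r2 = [0, 0, 1, 2, 2, 2]
    print(round(weighted_kappa(r1, r2, [0, 1, 2]), 3))               # 0.75
    print(round(weighted_kappa(r1, r2, [0, 1, 2], "quadratic"), 3))  # 0.889
```

The single one-step disagreement costs less under quadratic weights, so Kw is higher than the unweighted kappa in this example.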

Intra-rater reliability. This measures the reproducibility of the assessment by the same examiner during repeat assessment. The intra-rater reliability is also best assessed by the ICC or the K statistic.

Responsiveness

Responsiveness or sensitivity to change is the ability of an instrument to reflect underlying change over time. In contrast to the assessment of differences in change between individuals, there is no clear consensus as to how this should be assessed for rating methods.15,17
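Because no consensus method exists, the following is offered only as an example of two indices often used elsewhere to express responsiveness: the effect size (mean change divided by the standard deviation of baseline scores) and the standardized response mean (mean change divided by the standard deviation of the change scores). Neither index was applied in the studies reviewed here; the function names and data are hypothetical.

```python
from statistics import mean, stdev


def change_scores(baseline, follow_up):
    """Within-patient change (follow-up minus baseline)."""
    return [f - b for b, f in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of the baseline scores."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the change scores."""
    change = change_scores(baseline, follow_up)
    return mean(change) / stdev(change)


if __name__ == "__main__":
    # Hypothetical impairment scores (higher = worse) before and after treatment
    before = [10, 12, 14, 16]
    after = [8, 11, 11, 14]
    print(round(effect_size(before, after), 2))                 # -0.77
    print(round(standardized_response_mean(before, after), 2))  # -2.45
```

On a scale where higher scores mean more impairment, negative values indicate improvement.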

Other information that was gathered included the type of scale, number of items, scoring method, and administration time. Whenever information on studies or scales was unclear or incomplete, we contacted the authors with the request to provide additional information.



Results

Description of Studies

Over the period from 1966 to December 2001, 30 studies were identified that described clinimetric characteristics of 11 rating scales for patients with PD. We excluded a study by Cutson et al.18 that dealt with the Duke University Parkinson's Rating Scale (DUPRS), because the scale could not be retrieved. We were unable to trace studies that evaluated responsiveness. Three impairment scales (Columbia University Rating Scale [CURS], Webster rating scale [Webster], and Parkinson's Disease Impairment Scale [PDIS]), four disability scales (Northwestern University Disability Scale [NUDS], Intermediate Scale for Assessment of Parkinson's disease [ISAPD], Schwab and England, and Extensive Disability Scale [EDS]), and four multimodular scales containing both impairment and disability sections (New York University Parkinson's disease evaluation [NYU], University of California Los Angeles scale [UCLA], Short Parkinson's Evaluation Scale [SPES], and Unified Parkinson's Disease Rating Scale [UPDRS]) were identified.

We report the clinimetric characteristics of individual impairment and disability items. Details on individual scales are followed by a comparison of their clinimetric characteristics.

Impairment

Content validity. In evaluating the content of impairment scales and impairment sections of multimodular scales, large differences emerged. Some impairment items were present in all (tremor and bradykinesia) or in the majority (rigidity and gait) of scales. Some items were unique to a particular scale (e.g., blepharospasm in the UCLA, short and extra steps in the PDIS). As the core features are not equally represented and defined in the different rating scales, the contribution of these symptoms to the total score varies from scale to scale (table 1). The contribution of items dealing with bradykinesia and hypokinesia (including finger and foot taps, successive hand movements, facial expression, body bradykinesia, akinesia, and arm swing) to the total impairment score varies from 17% (SPES) to 40% (Webster). For tremor, these values vary from 10% (Webster) to 33% (SPES), for rigidity from 0% (PDIS) to 20% (CURS), and for postural stability from 0% (Webster, UCLA, and NYU) to 10% (PDIS).


Table 1. Contribution of type of symptoms to the total impairment score (%)

                    WEBSTER  UCLA  CURS  NYU  UPDRS  PDIS  SPES
brady-/hypokinesia       40    23    28   16     37    30    17
tremor                   10    11    20   14     26    20    33
rigidity                 10     9    20   14     19     0    17
postural stability        0     0     4    0      4    10     8
other items              40    57    28   56     14    40    25

Values are percentages obtained by dividing the maximum score for that (group of) item(s) by the maximum possible score for that scale.
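The percentages in table 1 follow the rule stated in its footnote. As a sketch, the calculation can be written as below; the item list is hypothetical and does not reproduce any scale in the table. The optional weight, for scales in which items are weighted, simply multiplies an item's maximum score, which is an assumption about how weighting enters the maximum.

```python
from collections import defaultdict


def contribution_percentages(items):
    """Share (%) of the maximum total score contributed by each symptom group.

    items: (item_name, symptom_group, max_item_score, weight) tuples.
    """
    group_max = defaultdict(float)
    for _name, group, max_score, weight in items:
        group_max[group] += max_score * weight  # weighted maximum for the item
    total_max = sum(group_max.values())
    return {group: round(100 * m / total_max) for group, m in group_max.items()}


if __name__ == "__main__":
    # Hypothetical five-item impairment scale, each item scored 0-3, unweighted
    scale = [
        ("finger taps", "brady-/hypokinesia", 3, 1),
        ("arm swing", "brady-/hypokinesia", 3, 1),
        ("rest tremor", "tremor", 3, 1),
        ("rigidity", "rigidity", 3, 1),
        ("speech", "other items", 3, 1),
    ]
    print(contribution_percentages(scale))
    # {'brady-/hypokinesia': 40, 'tremor': 20, 'rigidity': 20, 'other items': 20}
```

Because each percentage is rounded separately, a column computed this way need not sum to exactly 100.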

Two scales used a weighting factor for each item. In the NYU, the maximum possible score for each sign determines the weighting; in the UCLA, for example, 'akinesia' is weighted nine times, whereas 'mask facies' is weighted only once. Several studies demonstrated that tremor behaves independently of all other items, not significantly contributing to the explained variance of a scale,19 nor to the construct validity (Hoehn and Yahr [H&Y] staging).20,21 Postural instability, another major feature of PD occurring in the later stages of the disease, is not evaluated in the Webster, the UCLA, and the NYU. Speech is present in five impairment scales or sections (Webster, UCLA, CURS, and the motor impairment sections of the UPDRS and SPES). Seborrhea and sialorrhea are evaluated in three (Webster, UCLA, and CURS) and two impairment scales (UCLA and CURS), respectively.

Another problem that emerged concerned the methods of evaluation that were applied; this was particularly noteworthy for bradykinesia.

Reliability. Nine studies reported inter-rater reliability of individual items, whereas only one study evaluated intra-rater reliability.22 This latter study reported moderate to good intra-rater reliability for all items of the CURS except rigidity, which was not reported because the study used video recordings, from which rigidity cannot be rated.

Regardless of the scale, there was a lack of consistency among the findings (ranging from poor to good) concerning inter-rater reliability of the core features bradykinesia, tremor, and rigidity, as well as for the item speech (table 2). The majority of the studies found a good inter-rater reliability for postural stability. Seborrhea and sialorrhea showed poor inter-rater reliability in the CURS22,23 and moderate inter-rater reliability in the UCLA.24



Disability

Content validity. The Schwab and England activities-of-daily-living scale is a staging system in which 100% represents complete independence and 0% a vegetative state. The remaining three disability scales and four disability sections of multimodular rating scales bear only some resemblance in item content. Dressing, walking, speech, hygiene, and feeding or eating (swallowing) are included in all scales. Turning in or getting out of bed, and getting out of a chair, are included in all scales except the NUDS. Handwriting is found in four scales (UCLA, NYU, UPDRS Activities-of-Daily-Living [ADL] section, and SPES ADL section) and climbing stairs in three (UCLA, EDS, and ISAPD).

Reliability. Eight studies reported inter-rater reliability of separate items, in contrast to intra-

rater reliability, which was only evaluated in one study.20 This study reported a moderate to

good intra-rater reliability for all items of the PDIS.

Overall, the disability items displayed moderate to good inter-rater reliability, with a few exceptions. Inter-rater reliability for speech was poor in two studies on the NUDS23,34 and in one study on the EDS.34 In the original publication of the UPDRS,27 Fahn et al. reported poor inter-rater reliability for walking, in contrast to two later studies that found substantial to excellent values for this item.21,34
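Descriptors such as 'poor', 'moderate', or 'substantial' are verbal labels attached to kappa values. One widely cited set of benchmarks is that of Landis and Koch (reference 12), sketched below as a small Python mapping. The four-level grading used in tables 2 and 3 (poor, fair, moderate, good) need not coincide with these cut-offs, so the function illustrates the convention and is not a restatement of the grading applied here.

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa coefficient to the Landis and Koch (1977) strength-of-agreement label."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0.0:
        return "poor"
    # Upper bounds of the remaining bands: 0.00-0.20, 0.21-0.40, 0.41-0.60, 0.61-0.80.
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"  # 0.81-1.00


for k in (-0.05, 0.15, 0.33, 0.55, 0.72, 0.90):
    print(k, landis_koch_label(k))
```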

Clinimetric characteristics of the included scales

Impairment scales. The three impairment scales (table 3), the Columbia University Rating

Scale (CURS), the Webster, and the Parkinson's Disease Impairment Scale (PDIS), vary in

number of items (25, 10, and 10, respectively) and response options (0-4, 0-3, and 0-3).

Parkinson's Disease Rating Scale by Webster. For a scale that has been used for a long time by many investigators, surprisingly little evidence has been published on its validity and reliability. Notably, the Webster includes one disability item (self-care) and nine impairment items, which makes the scale conceptually unclear. A factor analysis performed in one study yielded three factors: (I) arm swing, gait, self-care, and posture; (II) speech and facies; and (III) seborrhea.19 Four studies showed that the scale has poor to moderate inter-rater reliability.23-25,27,29,34

Chapter 2


Table 2. Interpretation of values for interrater reliability

WEBSTER UCLA CURS UPDRS SPES

Brady-/hypokinesia finger tap +1

++22 +27 ++26 +++21,27,28

+++21

foot tap -23 ++22

+25,27 +++21,26,28

successive movements +29 ++23

-/+23 ++22

-27 ++25,28 ++/+++21,26

facial expression -23,24 -24 +22,23 -26,27 +25 ++21,28

body bradykinesia -23 ++22

+25,27 ++26 +++21,28

akinesia +24 arm swing -23

+24

Tremor rest and postural ++23

+24,29 +24 -/++23

+29 ++22

rest +25 ++/+++21,28 +++26,27

+++21

postural ++21 ++/+++21 action -28

+21,25 ++26,27

Rigidity -23 +24,29

++24 -29 -/+23

+26,27 ++/+++21 +++25

+++21

Postural stability +++22 +26,27 +++25,28,30

+++21

Posture -23 +24,29

+24 ++22 -27 +25,26 ++21 +++28

Speech -23 +24

+24 -23 +22

-26 +25 ++21,27 +++28

++21

Seborrhoea -23 +24 -22,23

Sialorrhoea +24 +24 -23

Numbers in superscript correspond with publications in the reference list. For the NYU and the PDIS, no information on interrater reliability per item is available. - = poor; + = fair; ++ = moderate; +++ = good.

Table 3. Results of validity and reliability, and thoroughness (strength of evidence) of validity and reliability testing

Scale | Year | Type(a) | No. of items | Construct validity | Factor structure(b) | Inter-rater reliability | Intra-rater reliability | Internal consistency | No. of studies (references)(c)

CURS | 1969 | | 25 | ++(+)/+++ | /+++ | ++/+++ | +++/+++ | +++/+++ | 5 (22,23,29,31,32)

CURS-modified (Sydney) | 1993 | I | 11 | ++(+)/+++ | 0 | +++/+++ | +++/+++ | 0 | 1 (22)

CURS-modified | 1985 | | 8 | 0 | /- | +/+ | 0 | 0 | 1 (22)

EDS | 1991 | D | 21 | +++/+++ | 0 | +++/+++ | 0 | 0 | 1 (29)

ISAPD | 1987 | I,D | 13 | +++/+++ | /+++ | ++(+)/+++ | 0 | +++/+++ | 1 (33)

NUDS | 1980 | D | 6 | ++(+)/+++ | 0 | ++(+)/+++ | 0 | 0 | 6 (2,19,23,24,29,34)

NYU | 1980 | I,D | 6 | +++/+++ | 0 | 0 | 0 | 0 | 1 (35)

PDIS | 1987 | I | 10 | -(+)/+ | /- | 0 | ++(+)/++ | 0 | 1 (20)

SPES | 1997 | I,D | 25 | +++/+++ | /+++ | +++/++(+) | 0 | 0 | 1 (21)

UCLA | 1981 | I,D | 21 | 0 | 0 | ++(+)/+++ | 0 | 0 | 2 (24,34)

UPDRS | 1987 | | 31 | +++/+++ | /+++ | ++/+++ | 0 | +++/+++ | 4 (25-27,36)

UPDRS ADL | | I,D | 13 | +++/+++ | /+++ | 0 | 0 | +++/+++ | 2 (21,37)

UPDRS ME | | | 14 | +(+)/+(++) | /+++ | ++/++ | 0 | +++/+++ | 5 (21,28,37-39)

Webster | | I | 10 | ++/+ | /++ | -(+)/+++ | 0 | 0 | 6 (19,23,24,32,34,40)

Signs before the slash refer to the results of validity and reliability testing; signs after the slash refer to the thoroughness (strength of evidence) of that testing.

Results of validity and reliability testing: 0 = no numerical results reported; ? = results not interpretable; - = poor results; + = fair results; ++ = moderate results; +++ = good results.

Thoroughness of validity and reliability testing: 0 = no reported evidence; ? = results not interpretable; - = poor evidence; + = fair evidence; ++ = moderate evidence; +++ = good evidence.

(a) I, impairment scale; D, disability scale. (b) Thoroughness of testing only. (c) Numbers in parentheses correspond with the studies in the reference list.


Columbia University Rating Scale. Although the Columbia University Rating Scale (CURS) was used frequently in clinical studies before the introduction of the UPDRS in 1987, only a few studies have been published on the validity and/or reliability of this scale, mostly in combination with other PD rating scales.22,23,32,34 The available evidence shows the CURS to have moderate to good validity and reliability. The factor structure was evaluated in only one study, which included 95 patients with PD plus syndromes; this precludes a conclusion on this issue in PD.31 A modified version of the CURS, the Sydney scale, appears to be equally valid and reliable.22

Parkinson's Disease Impairment Scale. Only one study assessed validity and reliability of the

Parkinson's Disease Impairment Scale (PDIS). Due to an unclear factor analysis and the

subsequent assessment of the construct validity based on these factors, the validity of this

scale is questionable.20 The intra-rater reliability was moderate to good.

Disability Scales

Four disability scales, the Northwestern University Disability Scale (NUDS), the Intermediate Scale for Assessment of Parkinson's disease (ISAPD), the Schwab and England scale, and the Extensive Disability Scale (EDS), are difficult to compare because they vary substantially in scoring, grading, and the number and kind of items. Although the ISAPD is partly based on the NUDS, its grading differs: 0 to 3 instead of 0 to 10.

Schwab and England. The Schwab and England scale has become a standard assessment tool in PD and has been used in hundreds of studies. Its clinimetric properties, however, have never been established. Data from studies whose primary aim was to investigate the characteristics of other rating scales suggest moderate to substantial validity and good reliability.33,34,38

Northwestern University Disability Scale. Two studies found moderate to good construct validity.19,34 These studies showed that the NUDS score correlates highly with the scores of two impairment scales, the Webster (Kendall's W = 0.82)19 and the CURS (Spearman's rs = -0.78).34 The inter-rater reliability of the NUDS was found to be excellent by its designers41 but only moderate by others.23,24,34 An explanation for the latter could be the combined effect of the large number of severity gradations in this scale and the use of unweighted kappas, which count a one-grade disagreement as a complete disagreement. Although this scale is frequently used, no information is available on its internal consistency or intra-rater reliability.
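The effect of many severity grades on unweighted kappa can be shown numerically by comparing it with weighted kappa (Cohen, reference 16), which gives partial credit for near-misses. The Python sketch below uses invented ratings by two raters on an 11-point (0-10) scale for ten patients, where the raters mostly differ by a single grade. Linear and quadratic disagreement weights are two common choices and are assumptions here, not the weights used in the cited NUDS studies.

```python
from collections import Counter


def cohen_kappa(r1, r2, n_cat, weights=None):
    """Cohen's kappa for two raters scoring the same patients on categories 0..n_cat-1.

    weights: None for unweighted kappa, or "linear" / "quadratic" disagreement weights.
    """
    n = len(r1)
    if n == 0 or n != len(r2):
        raise ValueError("both raters must score the same, non-empty set of patients")
    if weights not in (None, "linear", "quadratic"):
        raise ValueError("weights must be None, 'linear' or 'quadratic'")

    def disagreement(i, j):
        # 0 for exact agreement, rising to 1 for the largest possible disagreement.
        if weights is None:
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (n_cat - 1)
        return d if weights == "linear" else d ** 2

    joint = Counter(zip(r1, r2))       # observed rating pairs
    m1, m2 = Counter(r1), Counter(r2)  # marginal distribution of each rater
    observed = sum(disagreement(i, j) * c for (i, j), c in joint.items()) / n
    expected = sum(disagreement(i, j) * m1[i] * m2[j]
                   for i in range(n_cat) for j in range(n_cat)) / n ** 2
    if expected == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - observed / expected


rater_a = [2, 3, 5, 6, 7, 8, 4, 9, 1, 10]
rater_b = [3, 3, 4, 6, 8, 7, 5, 9, 2, 10]
for w in (None, "linear", "quadratic"):
    print(w, round(cohen_kappa(rater_a, rater_b, 11, w), 2))
# None 0.33, linear 0.81, quadratic 0.96
```

With identical ratings, the unweighted coefficient is far lower than either weighted one because every one-grade difference counts as a full disagreement.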

Rating scales in Parkinson's disease


Intermediate Scale for Assessment of Parkinson's Disease. Evaluated only by its designers, the Intermediate Scale for Assessment of Parkinson's disease (ISAPD) showed a moderate to good correlation with the H&Y, the UPDRS, and the Schwab and England scale.33 In the same study, results were also excellent for internal consistency and good for inter-rater reliability. The administration time was 7 (± 3.70) minutes.33

Extensive Disability Scale. The Extensive Disability Scale (EDS) is a modified version of the

Minimal Record of Disability (MRD),42,43 which is used in patients suffering from multiple

sclerosis. This scale has only been used and tested by its developers, who found a moderate to

good construct validity and inter-rater reliability.34 The administration time by a trained

reviewer was stated to be 15-20 minutes.34

Impairment and Disability Sections in Multimodular Scales

Comparing the four scales with both impairment and disability sections, the New York University Parkinson's disease evaluation (NYU), the Short Parkinson's Evaluation Scale (SPES), the University of California Los Angeles scale (UCLA), and the Unified Parkinson's Disease Rating Scale (UPDRS), we noted a similarity in content: all include items on bradykinesia, tremor, rigidity, walking, eating, turning in bed, and handwriting.

New York University Parkinson's Disease Evaluation. For this scale only poor construct

validity with the H&Y was reported.35 The administration time by a trained examiner was

stated to be 10 minutes.35

University of California Los Angeles Scale. The UCLA scale has rarely been used in clinical trials, and beyond the work of Martínez-Martín,24 who found moderate to good inter-rater reliability, no further evidence for the reliability or validity of the scale has been published.

Unified Parkinson's Disease Rating Scale. The UPDRS has found broad acceptance for the evaluation of PD and has been used in many trials.44 Nine studies have extensively tested and evaluated this scale. Like the Webster, the UPDRS ADL section is conceptually unclear, since it includes several impairment items (salivation, falling, freezing, tremor, and sensory complaints). Nevertheless, the UPDRS demonstrates high internal consistency and inter-rater reliability, shows moderate construct validity, and has a stable factor structure.21,26,28,34,36-39 Even across off- and on-state examinations, the motor examination (ME) section of this scale has a stable factor structure and high internal consistency.38 The high internal consistency of the ADL and motor sections most likely indicates redundancy of items. This was underscored by a study that successfully reduced the ADL and motor sections of the UPDRS


to eight items each without a loss of reliability or validity.37 The time to administer the UPDRS was reported to be 10-20 minutes34 and was measured as 16.95 (± 7.98) minutes.34
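Internal consistency is commonly summarized with Cronbach's alpha, alpha = k/(k - 1) x (1 - sum of item variances / variance of the total score), where k is the number of items. The Python sketch below uses invented 0-4 scores for six patients on three items. It shows why a very high alpha can reflect redundancy: appending an exact copy of an existing item raises alpha, although the copy adds no new information. Population variances are used; the n/(n - 1) correction would cancel in the ratio anyway.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """items: one list of scores per item, all covering the same patients in the same order."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # total score per patient
    item_var = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))


# Invented scores (0-4) of six patients on three items.
items = [
    [0, 1, 2, 3, 4, 2],
    [1, 1, 2, 2, 4, 3],
    [0, 2, 1, 3, 3, 2],
]
alpha3 = cronbach_alpha(items)
alpha4 = cronbach_alpha(items + [items[0]])  # add an exact duplicate of the first item
print(round(alpha3, 2), round(alpha4, 2))    # 0.91 0.95
```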

Short Parkinson's Evaluation Scale. Evidence for the construct validity and inter-rater reliability of the SPES is good, but has only been reported by its original designers.21 An advantage of the SPES is that it is short and easy to administer (7-10 minutes by neurologists).21

Discussion

Considering their widespread clinical use for the assessment of impairment and disability in PD, rating scales have seldom been extensively evaluated for validity and reliability. The terms impairment and disability are derived from the WHO International Classification of Impairments, Disabilities, and Handicaps (ICIDH; http://www.who.int./icidh).7,8 The recently developed ICIDH-2 introduces new terms: 'body structures and function' are described both positively (functional and structural integrity) and negatively (impairment), and so are activities (activity versus activity limitation).

Systematically searching the literature, we found 30 studies describing clinimetric issues of 11 scales for rating impairment and disability in PD. A general criticism concerns the frequent choice of the H&Y as the gold standard for testing other scales because, to the best of our knowledge, no study has evaluated its clinimetric properties. Nevertheless, the H&Y is the most commonly used method of establishing the severity of PD.
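Construct validity in these studies is usually expressed as a correlation between a scale's total score and another measure, often the H&Y stage. Because H&Y stages produce many tied values, a rank correlation that assigns average ranks to ties is a natural choice; the text reports, for instance, Spearman's rs = -0.78 between the NUDS and the CURS. The Python sketch below computes Spearman's rho as the Pearson correlation of tie-averaged ranks. The H&Y stages and disability scores are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical H&Y stages and disability totals for eight patients.
hy = [1, 1, 2, 2, 2, 3, 3, 4]
score = [5, 8, 7, 12, 10, 15, 18, 22]
print(round(spearman_rho(hy, score), 2))  # 0.9
```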

In evaluating impairment items, the contribution of the core motor features of PD to the total impairment score varies from scale to scale. For instance, items dealing with bradykinesia and hypokinesia contribute almost 40% to the total score of the UPDRS ME section, which strongly affects the sum score of the impairment section and the total score. There are also large differences in the extent to which scales represent items considered responsive to dopaminergic treatment (e.g., bradykinesia, rigidity) and symptoms that appear late in the course of the disease and lack responsiveness to dopaminergic treatment (e.g., postural instability, swallowing, speech, and freezing). These differences in content should therefore be taken into consideration when selecting a scale for the evaluation of short-term dopaminergic treatment or for long-term follow-up, in which the occurrence of symptoms not responsive to dopaminergic treatment indicates disease progression. Generally, within the framework of impairments, items such as sialorrhea and seborrhea have limited clinical significance. Irrespective of the scale, the findings concerning inter-rater reliability of the core



features bradykinesia, tremor, and rigidity, as well as of the item speech, lacked consistency. The majority of the studies, however, found good inter-rater reliability for postural stability. Clearer item descriptions may help to improve inter-rater reliability. To avoid the problems with inter-rater reliability, objective measurements could be considered in assessing impairments in PD.30,45-49 It is remarkable that only one study evaluated intra-rater reliability, because it is a relevant issue in longitudinal studies performed by one assessor.

Although there is general agreement on the definition of disability (i.e., experienced difficulty in carrying out activities of daily living), there is no consensus on what should be measured. All evaluated disability scales and sections included the items of the NUDS (dressing, walking, speech, hygiene, feeding, and eating). Overall, disability items displayed moderate to good inter-rater reliability. The low inter-rater reliability values repeatedly found for speech and walking suggest that these items are difficult to score or lack clear anchors.

The identified PD rating scales can be divided into three groups: impairment scales, disability scales, and multimodular scales containing both impairment and disability sections. Comparing the three impairment scales, the Webster, CURS, and PDIS, we found evidence that the CURS has strong validity, whereas insufficient data on validity are available for the Webster and the PDIS. Whereas the overall reliability of the CURS is moderate to good, the inter-rater reliability of the Webster is poor to moderate. Therefore, although the Webster appears adequate as a brief rating method, the available clinimetric data on the CURS indicate that this scale may be preferred. The PDIS was inadequately evaluated by its designers, and owing to a lack of other information on its clinimetric properties, no recommendations can be given with respect to this scale. The four disability scales, the NUDS, ISAPD, Schwab and England, and EDS, bear hardly any resemblance to one another. Large differences between the scales are found in the scoring and grading of items. The Schwab and England disability scale takes a unique position, because it uses a different grading system and has never been the primary subject of a clinimetric evaluation. The construct validity and inter-rater reliability of the NUDS, ISAPD, and EDS were found to be moderate to good, indicating no particular preference. Only the NUDS was evaluated independently. The ISAPD, evaluated only by its designers, appears to be a highly valid and reliable disability scale that may be useful as a tool for the evaluation of disability in PD. Independent verification of its clinimetric characteristics, however, is recommended.

Of the scales containing both an impairment and a disability section, the UPDRS is the most

widely used and tested scale. The NYU, SPES, and UCLA have rarely been used and have


only been evaluated by their designers. The construct validity of the UPDRS is satisfactory in those studies that have used the H&Y for comparison. Important differences between these scales include the scoring and the contribution of the individual items to the subtotal and total scores. Some findings regarding the validity of the UPDRS deserve comment. The construct validity of the UPDRS must be considered very satisfactory. The UPDRS ADL section, however, is conceptually unclear, as it includes several impairment items. With respect to inter-rater reliability, the UPDRS, SPES, and UCLA should be considered reliable scales. The SPES and UCLA, however, were evaluated only by the designers of the scales. The UPDRS demonstrates very high internal consistency, but the effects of redundancy (several items focusing on the same aspect of the construct) should be kept in mind. Internal consistency increases with the number of items and depends substantially on the homogeneity of the items and on the inter-item correlation.
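The dependence of internal consistency on the number of items and on the inter-item correlation can be made explicit with the standardized form of Cronbach's alpha, a standard psychometric result in which k is the number of items and \bar{r} the mean inter-item correlation. The numbers below are illustrative only.

```latex
\begin{align*}
\alpha_{\mathrm{std}} &= \frac{k\,\bar{r}}{1 + (k-1)\,\bar{r}} \\[4pt]
k = 10,\ \bar{r} = 0.30: \quad \alpha_{\mathrm{std}} &= \frac{10(0.30)}{1 + 9(0.30)} = \frac{3.0}{3.7} \approx 0.81 \\
k = 20,\ \bar{r} = 0.30: \quad \alpha_{\mathrm{std}} &= \frac{20(0.30)}{1 + 19(0.30)} = \frac{6.0}{6.7} \approx 0.90 \\
k = 10,\ \bar{r} = 0.50: \quad \alpha_{\mathrm{std}} &= \frac{10(0.50)}{1 + 9(0.50)} = \frac{5.0}{5.5} \approx 0.91
\end{align*}
```

Doubling the number of items at the same mean inter-item correlation raises alpha from about 0.81 to 0.90, so a high alpha on its own does not show that every item is needed.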

Taken together, the evaluation of the impairment and disability sections as a whole shows that the UPDRS is a reliable and valid scale, although these sections include some redundant and unreliable items. The SPES appears to be a valid and reliable scale that might be considered for the evaluation of patients with PD. Nonetheless, independent verification of its clinimetric characteristics is recommended. Because the UCLA and NYU lack thorough clinimetric testing, no recommendations can be given for these scales.

Others have reviewed disease-specific PD scales,4,30,50,51 but only Mitchell and associates44 presented some clinimetric properties of the most commonly used scales (identified through a Medline search covering the period from 1966 to August 1998). In this study, the UPDRS was found to be the most thoroughly studied scale, with overall better clinimetric properties than other scales. As the authors indicated, one limitation of their study was its main focus, which was not to summarize the clinimetrics of scales but to examine the pattern of utilization of disease-specific clinical scales used as endpoints in PD trials. The summary of clinical properties they present is simple and intended to serve as a guide.

In summary, this review underscores that the clinimetric soundness of the majority of PD

assessment scales is questionable. Moreover, as these scales are generally used in trials on PD

patients who lack serious comorbidity, there is no information on the clinimetric behaviour of

the scales in unselected PD populations.



We emphasize the following critical notes regarding clinimetric issues:

1. The most important question in choosing a scale is how well it is suited to the task at

hand in terms of validity, reliability, and efficiency.

2. A greater number of items increases the internal consistency. Limiting the number of

items in a scale, however, contributes to simplicity and utility of the assessment, at the

expense of completeness, sensitivity, and reliability.

3. It is remarkable that none of the studies addressed differences in responsiveness between scales, a property required to ensure their usefulness in the longitudinal evaluation of PD. Responsiveness is an essential part of the statistical analysis because it concerns the ability of a measure to reflect change.

4. Video recordings may help to improve the assessment of inter- and intra-rater reliability in studies. These recordings have their limitations, however, as they can only be used to score items that are clearly visible or audible. Rigidity, seborrhea, and sialorrhea are difficult to discern on tape and should not be included if a scale is used for video assessments.
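Point 3 concerns responsiveness. Two common distribution-based indices, of the kind reviewed by Husted et al. (reference 17), are the effect size (mean change divided by the standard deviation of the baseline scores) and the standardized response mean (mean change divided by the standard deviation of the change scores). The Python sketch below computes both from invented baseline and follow-up totals; it illustrates the formulas only and implies nothing about the actual responsiveness of any scale discussed here.

```python
from statistics import mean, stdev


def change_scores(baseline, follow_up):
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must cover the same patients")
    return [f - b for b, f in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """Mean change divided by the standard deviation of the baseline scores."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the standard deviation of the change scores."""
    change = change_scores(baseline, follow_up)
    return mean(change) / stdev(change)


# Invented total scores (higher = worse) for six patients at baseline and follow-up.
baseline = [20, 25, 30, 22, 28, 35]
follow_up = [24, 27, 36, 25, 31, 41]
print(round(effect_size(baseline, follow_up), 2),
      round(standardized_response_mean(baseline, follow_up), 2))  # 0.73 2.39
```

The two indices share a numerator but differ in denominator, so the ranking of scales by responsiveness can depend on which index is used.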

Acknowledgements

We thank Dr. F. Durif, Dr. M. Hely, Prof. L. Henderson, Prof. Dr. C. Kennard, Dr. P.

Martínez-Martín, Prof. Dr. J. Opara, Dr. J. M. Rabey, and Dr. N.C. Reynolds Jr. for providing

additional information. C.R. is funded by the Prinses Beatrix Fonds (project no. 97-0205) and

J.M. is funded by the Netherlands Organization for Scientific Research (project no. 0940-33-021).

References

1. Rascol O, Brooks DJ, Korczyn AD, De Deyn PP, Clarke CE, Lang AE. A five-year study of the incidence

of dyskinesia in patients with early Parkinson's disease who were treated with ropinirole or levodopa. 056

Study Group. N Engl J Med 2000; 342(20):1484-1491.

2. Parkinson Study Group. Pramipexole vs levodopa as initial treatment for Parkinson disease: A

randomized controlled trial. JAMA 2000; 284(15):1931-1938.

3. Lang AE. Surgery for levodopa-induced dyskinesias. Ann Neurol 2000; 47(Suppl.):S193-S199.

4. Marsden CD, Schachter M. Assessment of extrapyramidal disorders. Br J Clin Pharmacol 1981;

11(2):129-151.

5. Handbook of neurological rating scales. New York: Demos Vermande, 1997.


6. Marinus J, Ramaker C, Van Hilten JJ, Stiggelbout AM. Health related quality of life in Parkinson's

disease: a systematic review of disease specific instruments. J Neurol Neurosurg Psychiatry 2002;

72(2):241-248.

7. World Health Organization. International Classification of impairments, disabilities, and handicaps: a

manual of classifications relating to the consequences of diseases. Geneva: World Health Organization,

1980.

8. Simeonsson RJ, Lollar D, Hollowell J, Adams M. Revision of the International Classification of

Impairments, Disabilities, and Handicaps: developmental issues. J Clin Epidemiol 2000; 53(2):113-124.

9. The Cochrane Controlled Trials Register. The Cochrane Library 2001; Issue 3. Update Software. Internet:

http://.update-software.com/cochrane.



11. Nunnally JC, Bernstein IH. Psychometric Theory. 3rd ed. New York: McGraw-Hill, Inc., 1994.

12. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;

33(1):159-174.

13. Feinstein AR. Clinimetrics. 1st ed. New Haven: Yale University Press, 1987.

14. Fleiss JL. The measurement of inter-rater agreement. In: Statistical methods for rates and proportions. New York: John Wiley, 1981: 212-236.



16. Cohen J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial

credit. Psychol Bull 1968; 70:213-220.

17. Husted JA, Cook RJ, Farewell VT, Gladman DD. Methods for assessing responsiveness: a critical review

and recommendations. J Clin Epidemiol 2000; 53(5):459-468.

18. Cutson TM, Sloane R, Schenkman M. Development of a clinical rating scale for persons with Parkinson's

disease. J Am Geriatr Soc 1999; 47(6):763-764.

19. Henderson L, Kennard C, Crawford TJ, Day S, Everitt BS, Goodrich S, et al. Scales for rating motor

impairment in Parkinson's disease: studies of reliability and convergent validity. J Neurol Neurosurg

Psychiatry 1991; 54:18-24.

20. Reynolds Jr. NC, Montgomery GK. Factor Analysis of Parkinson's Impairment: An Evaluation of the

Final Common Pathway. Arch Neurol 1987; 44:1013-1016.

21. Rabey JM, Bass H, Bonuccelli U, Brooks D, Klotz P, Korczyn AD, et al. Evaluation of the Short

Parkinson's Evaluation Scale: A New Friendly Scale for the Evaluation of Parkinson's Disease in Clinical

Drug Trials. Clin Neuropharmacol 1997; 20(4):322-337.

22. Hely MA, Wilson A, Williamson PM, O'Sullivan DJ, Rail D, Morris JGL. Reliability of the Columbia

Scale for Assessing Signs of Parkinson's Disease. Mov Disord 1993; 8(4):466-472.

23. Geminiani G, Cesana BM, Tamma F, Contri P, Pacchetti C, Carella F, et al. Interobserver Reliability

Between Neurologists in Training of Parkinson's Disease Rating Scales: A Multicenter Study. Mov

Disord 1991; 6(4):330-335.



24. Martínez-Martín P, Carrasco de la Pena JL, Ramo C, Antiguedad AR, Bermejo F. [Study of inter-observer

reliability in the use of qualitative scales assessing Parkinson disease (II)]. Arch de Neurobiol 1988;

51(5):287-291.

25. Martínez-Martín P, Gil-Nagel A, Morlán Gracia L, Balseiro Gómez J, Martínez-Sarriés FJ, Bermejo F, et

al. Unified Parkinson's Disease Rating Scale Characteristic and Structure. Mov Disord 1994; 9(1):76-83.

26. Richards M, Marder K, Cote L, Mayeux R. Interrater Reliability of the Unified Parkinson's Disease

Rating Scale for Motor Examination. Mov Disord 1994; 9(1):89-91.

27. Fahn S, Elton RL, and members of the UPDRS Development Committee. Unified Parkinson's Disease Rating Scale. In: Fahn S, Marsden CD, Goldstein M, Calne DB, editors. Recent Developments in Parkinson's Disease. Florham Park, NJ: Macmillan Healthcare Information, 1987: 153-163.

28. Goetz CG, Stebbins GT, Chmura TA, Fahn S, Klawans HL, Marsden CD. Teaching Tape for the Motor

Section of the Unified Parkinson's Disease Rating Scale. Mov Disord 1995; 10(3):263-266.

29. Ginanneschi A, Degl'Innocenti F, Maurello MT, Magnolfi S, Marini P, Amaducci L. Evaluation of

Parkinson's Disease: A New Approach to Disability. Neuroepidemiology 1991; 10:282-287.

30. Tervainen H, Calne D. Quantitative assessment of parkinsonian deficit. In: Rinne K, Linger M, Stamm G,

editors. Parkinson's disease: current progress, problems, and management. New York: Elsevier

Biomedical Press, 1980.

31. Baas H, Stecker K, Fischer PA. Value and appropriate use of rating scales and apparative measurement in

quantification of disability in Parkinson's disease. J Neural Transm 1993; 5:45-61.

32. Ginanneschi A, Degl'Innocenti F, Magnolfi S, Maurello MT, Catarzi L, Marini P, et al. Evaluation of

Parkinson's Disease: Reliability of Three Rating Scales. Neuroepidemiology 1988; 7:38-41.

33. Martínez-Martín P, Gil-Nagel A, Morlán Gracia L, Balseiro Gómez J, Martínez-Sarriés FJ, Bermejo F, et

al. Intermediate Scale for Assessment of Parkinson's Disease: Characteristic and Structure. Parkinsonism

Relat Disord 1995; 1(2):97-102.

34. Martínez-Martín P, Carrasco de la Pena JL, Ramo C, Antiguedad AR, Bermejo F. [Inter-observer reproducibility of qualitative scales in Parkinson disease (I)]. Arch de Neurobiol 1987; 50(5):309-314.

35. Lieberman A, Dziatolowki M, Gopinathan G, Kopersmith M, Neophytides A, Korein J. Evaluation of

Parkinson's disease. In: Goldstein M, editor. Ergot compounds and brain function: neuroendocrine and

neuropsychiatric aspects. New York: Raven Press, 1980: 277-286.

36. Nouzeilles MI, Merello M. Correlation Between Results of Motor Section of UPDRS and Webster Scale.

Mov Disord 1997; 12(4):613.

37. Van Hilten B, Van der Zwan AD, Zwinderman AH, Roos RAC. Rating Impairment and Disability in

Parkinson's Disease: Evaluation of the Unified Parkinson's Disease Rating Scale. Mov Disord 1994;

9(1):84-88.

38. Stebbins GT, Goetz CG. Factor structure of the Unified Parkinson's Disease Rating Scale: Motor

Examination section. Mov Disord 1998; 13(4):633-636.

39. Stebbins GT, Goetz CG, Lang AE, Cubo E. Factor Analysis of the Motor Section of the Unified Parkinson's Disease Rating Scale During the Off-State. Mov Disord 1999; 14(4):585-589.

40. Kennard C, Munro AJ, Park DM. The reliability of clinical assessment of Parkinson's disease. J Neurol

Neurosurg Psychiatry 1984; 47:322-323.


41. Canter GJ, De La Torre R, Mier M. A method for evaluating disability in patients with Parkinson's

disease. J Nerv Ment Dis 1961; 133:143-147.

42. LaRocca NG. Statistical and Methodological Considerations in Scale Construction. In: Munsat TL, editor.

Quantification of Neurologic Deficit. Boston: Butterworths, 1989: 49-67.

43. Slater RJ, LaRocca NG, Scheinberg LC. Development and testing of a minimal record of disability in

multiple sclerosis. Ann N Y Acad Sci 1984; 436:453-468.

44. Mitchell SL, Harper DW, Lau A, Bhalla R. Patterns of outcome measurement in Parkinson's disease

clinical trials. Neuroepidemiology 2000; 19(2):100-108.

45. Potvin AR, Tourtellotte WW, Syndulko K, Potvin J. Quantitative methods in assessment of neurologic

function. CRC Crit Rev Bioeng 1981; 6(3):177-224.

46. Jankovic J. Pathophysiology and clinical assessment of motor symptoms in Parkinson's disease. In: Koller

WC, editor. Handbook of Parkinson's disease. New York: Marcel Dekker, 1992: 99-126.

47. Ringendahl H. Standardization of a test of fine motor functions (Motorische leistungsserie MLS) in

Parkinson's disease. Nervenarzt 1998; 69(6):507-515.

48. Lauk M, Chow CC, Lipsitz LA, Mitchell SL, Collins JJ. Assessing muscle stiffness from quiet stance in

Parkinson's disease. Muscle Nerve 1999; 22(5):635-639.

49. Caligiuri MP, Galasko DR. Quantifying drug-induced changes in parkinsonian rigidity using an instrumental measure of activated stiffness. Clin Neuropharmacol 1992; 15(1):1-12.

50. Lang AET, Fahn S. Assessment of Parkinson's disease. In: Munsat TL, editor. Quantification of Neurologic Deficit. Boston: Butterworths, 1989: 285-309.

51. Martínez-Martín P. Rating Scales in Parkinson's Disease. In: Jankovic J, Tolosa E, editors. Parkinson's

Disease and Movement Disorders. Baltimore: Williams and Wilkins, 1993: 281-292.


3 Health-related quality of life in Parkinson's disease: a systematic review of disease-specific instruments

Johan Marinus¹, Claudia Ramaker¹, Jacobus J. van Hilten¹, Anne M. Stiggelbout²

¹Department of Neurology and ²Department of Medical Decision Making, Leiden University Medical Center, Leiden, The Netherlands

Published in the Journal of Neurology, Neurosurgery & Psychiatry 2002; 72:241-248

Chapter 3


Abstract

Objective. To compare and contrast disease-specific quality of life instruments in Parkinson's disease and assess their clinimetric properties. Methods. Two reviewers independently evaluated both the thoroughness and the results of studies regarding the clinimetric characteristics of identified scales. Results. Twenty studies were found reporting on the clinimetric properties of four scales. The content validity of the Parkinson's Disease Questionnaire-39 item version (PDQ-39), the Parkinson's Disease Quality of Life questionnaire (PDQL), and the 'Fragebogen Parkinson LebensQualität' (Parkinson Quality of Life questionnaire; PLQ) was adequate to good, but for the Parkinson's Impact Scale (PIMS) it was insufficient. Construct validity of both the PDQ-39 and the PDQL was good, but for the PLQ and the PIMS it was insufficiently evaluated. Internal consistency of all scale totals and of the subscale totals of the PDQL was good, whereas for the social support subscale of the PDQ-39 and four subscales of the PLQ it was inadequate. Test-retest reliability was not evaluated for the PDQL and was adequate for the other scales. Responsiveness was partially established for the PDQ-39 and not assessed for the other scales. The number of available translations, as well as the number of studies in which these instruments were used, differed considerably. Conclusions. The selection of an instrument partially depends on the goal of the study. In many situations, however, the PDQ-39 will probably be the most appropriate HRQoL instrument. The PDQL may be considered as an alternative, whereas the PLQ may be considered in studies involving German-speaking patients with Parkinson's disease. Use of the PIMS should be considered only as a means of identifying areas of potential problems.

Health related quality of life in Parkinson's disease


Introduction Quality of life (QoL) is a multidimensional concept that reflects a subjective evaluation of a

person’s satisfaction with life and concerns, among others, the relationships with family or

relatives, a person’s own health, the health of another close person, finances, housing,

independence, religion, social life, and leisure activities.1 Health contributes to QoL, and this

domain is often referred to as ‘health-related quality of life’ (HRQoL). The World Health

Organization (WHO) describes health as a state of complete physical, mental, social, and

spiritual wellbeing, and not merely the absence of disease or infirmity.2 This indicates that

psychological and social factors are an integral part of health. Sometimes ‘role functioning’ is

added as a separate entity to the concept of HRQoL. Bowling3 takes several definitions of

HRQoL into account and defines the concept as optimum levels of mental, physical, role (e.g.

work, parent, carer etc.), and social functioning, including relationships, and perceptions of

health, fitness, life satisfaction, and wellbeing. HRQoL has gained importance in the past three

decades and is considered to be an important outcome measure in studies involving patients

with chronic diseases. Although initially physician-based evaluations were chosen as primary

endpoints in clinical research, more recent studies often consider HRQoL as their main

outcome measure.

HRQoL can be assessed both with generic and disease-specific instruments. The generic

instruments (for example Medical Outcomes Study–Short Form 36, Sickness Impact Profile)

offer the possibility of comparing HRQoL across different diseases. These instruments

contain items of a more general nature, and therefore lack specificity. Disease-specific instruments generally tap the same domains, but the items are tailored to particular disease characteristics and may also include items dealing with side effects of therapy. Consequently, disease-specific instruments better reflect the consequences of that disease for a particular person and are generally more sensitive to change in perceived HRQoL.4

In Parkinson’s disease (PD), several disease-specific HRQoL instruments have become

available in the past few years. Investigators who want to use such an instrument are faced

with the choice between several scales, which differ in many respects. In the process of

selecting the appropriate instrument, a comparison of the quality of these scales can be

helpful. We therefore compared and contrasted HRQoL instruments in PD and evaluated their

clinimetric properties.

Chapter 3


Methods

Search strategy

We reviewed the literature from 1965 to 2000 and used the following sources to identify

studies of interest: Medline, Embase, SCIsearch, the Cochrane Library, symposia reports,

Parkinson's disease handbooks, and reference lists of included publications. We used the

following search terms: Parkinson Disease, quality of life, health status, PDQ39, PDQL,

Parkinson’s Disease Questionnaire, PIMS, Parkinson’s Impact Scale, PLQ, Fragebogen

Parkinson Lebensqualität, PDQUALIF. These terms were combined with the following terms:

clinimetric, psychometric, reliability, validity, internal consistency, factor analysis, factor

structure, responsiveness, and sensitivity to change. The list of publications regarding each

scale was sent to its developer, with the request to add any missing references.

Methods of the review

Two reviewers independently reviewed the identified publications according to a two-step

review process. Firstly, abstracts were reviewed for eligibility. Thereafter, eligible reports

were judged against a set of methodological criteria, in which both thoroughness

(methodological and statistical) and results of studies testing validity, reliability, and

responsiveness were assessed. To this end we used a checklist, evaluating sample

characteristics, outcome measures, appropriateness of statistical analysis, and methodological

quality. The method of presenting the quality of scales was adopted from McDowell and

Newell.5 For reliability, Cronbach’s α greater than 0.7 and intraclass correlation coefficients

(ICC) or kappa coefficients (κ) greater than 0.7 were considered a good result, and studies were judged

‘thorough’ if the appropriate statistical procedures were used and the sample size was

considered to be large enough. With respect to validity, the result of content validity was

considered ‘good’ if all relevant domains were covered and ‘thorough’ if unselected

(community based) PD patients were closely involved in both the generation and evaluation

of items. When only outpatient samples or samples from a Parkinson's disease society were

used in this phase, we considered thoroughness to be moderate and when the patients were not

involved at all, we considered thoroughness to be poor.

Discrepancies were registered and resolved by consensus with a third and fourth reviewer.



Studies were eligible when they evaluated the following clinimetric characteristics of disease-

specific HRQoL instruments in PD: validity (content validity and construct validity, including

factor structure), reliability (internal consistency, test-retest reliability), and responsiveness.

Content validity reflects the extent to which a scale covers all important topics or domains.6

Construct validity is assessed by measuring the extent to which a scale correlates positively

with other measures that address the same construct (convergent validity), or negatively with

measures that address opposite constructs (divergent validity), in situations where a gold

standard is not available.
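Convergent and divergent validity are usually expressed as correlation coefficients; table 3 reports both Pearson (r) and Spearman (rs) coefficients. The Python sketch below computes Spearman's rs as the Pearson correlation of tie-averaged ranks. The scores in the usage example are invented for illustration and do not come from any of the reviewed studies.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j correspond to ranks i+1..j+1; use their mean.
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented scores: a disease-specific scale where higher = worse HRQoL,
# and a generic scale where higher = better HRQoL.
disease_specific = [10, 25, 40, 55, 70]
generic = [90, 80, 60, 65, 30]
print(round(spearman(disease_specific, generic), 2))  # -0.9
```

Because the two invented scales run in opposite directions, a strong negative coefficient would be read here as convergent evidence; the sign convention has to be checked for every pair of instruments before a correlation is interpreted.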

Another method of construct validation is the analysis of ‘known groups’ differences. In this

method patients are grouped on the basis of some characteristic, for example disease severity

or difficulties in performing activities. Patients with higher disease severity or patients experiencing greater difficulty are expected to have lower HRQoL.
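The statistical test used for known-groups comparisons is not specified here. As one possibility, the sketch below applies a Mann–Whitney U test, a rank-based test suited to ordinal scale scores, to two groups of patients. It uses the large-sample normal approximation without a tie correction, which is a simplification; for small samples an exact test would be preferable. The group scores are invented.

```python
import math
from typing import Sequence


def mann_whitney_u(group_a: Sequence[float], group_b: Sequence[float]) -> tuple[float, float]:
    """Return (U, two-sided p) comparing group_a with group_b.

    U counts the pairs in which the group_a score exceeds the group_b score
    (ties count 0.5). The p-value uses the normal approximation and ignores
    the tie correction of the variance.
    """
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    n1, n2 = len(group_a), len(group_b)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u, p_two_sided


# Invented HRQoL scores (higher = worse) in patients with mild and severe disease.
mild = [10, 15, 20, 22, 30]
severe = [35, 40, 28, 50, 45]
u, p = mann_whitney_u(severe, mild)
print(u, round(p, 3))
```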

In a factor analysis, items that correlate highly with each other group together in clusters (factors), which are considered to reflect underlying common themes. Factor analysis may be used to construct subscales, or to analyze the construct of an instrument.

Adequate internal consistency is a prerequisite for scales developed to measure one particular

construct. When all the items within a scale correlate highly with each other, the scale

demonstrates good internal consistency, and thus measures one underlying construct. Internal

consistency is calculated using Cronbach’s α. Values range from 0–1, with higher scores

reflecting higher internal consistency. For group comparisons in research situations, internal

consistency is considered adequate when α exceeds 0.7.7
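Cronbach's α can be computed from the item variances and the variance of the total score: α = k / (k - 1) × (1 - sum of item variances / variance of the total score), where k is the number of items. The Python sketch below assumes complete responses (no missing items); the response matrix is invented.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a patients-by-items score matrix.

    scores[p][i] is the response of patient p to item i; complete data are
    assumed. Sample variances (n - 1 denominator) are used throughout, so the
    choice of denominator cancels out of the ratio.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Invented responses of five patients to three items scored 0-4.
responses = [
    [0, 1, 1],
    [1, 1, 2],
    [2, 2, 2],
    [3, 2, 4],
    [4, 3, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3), "adequate" if alpha > 0.7 else "inadequate")  # 0.936 adequate
```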

Test-retest reliability is assessed by determining the reproducibility of an instrument in stable patients over a relatively short time period and is best quantified by means of the κ coefficient or the ICC.
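Which ICC form the reviewed studies used is not stated. The sketch below assumes the two-way random-effects, absolute-agreement, single-measure form, often labelled ICC(2,1) after Shrout and Fleiss, and adds an unweighted Cohen's κ for a single categorical item. Weighted κ is often preferred for ordinal responses but is not shown. All data are invented.

```python
def icc_2_1(data: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    data[s][t] is the score of subject s on occasion t (e.g. test and retest).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[t] for row in data) / n for t in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def cohen_kappa(first: list[int], second: list[int]) -> float:
    """Unweighted Cohen's kappa for two ratings of the same patients."""
    n = len(first)
    categories = set(first) | set(second)
    p_observed = sum(a == b for a, b in zip(first, second)) / n
    p_expected = sum(
        (first.count(c) / n) * (second.count(c) / n) for c in categories
    )
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Invented total scores at test and retest for four stable patients.
test_retest = [[10, 12], [20, 19], [30, 33], [40, 38]]
print(round(icc_2_1(test_retest), 3))  # 0.986

# Invented responses to one item on two occasions (categories 0-2).
print(round(cohen_kappa([0, 1, 2, 2, 1, 0], [0, 1, 2, 1, 1, 0]), 2))  # 0.75
```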

Responsiveness (or sensitivity to change) is the ability of an instrument to accurately detect

change when it has occurred. Responsiveness in HRQoL instruments is preferably

demonstrated with both internal indicators of change (correlation with patient’s own

evaluation of change) and external indicators of change (correlation with external measures).
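The standardized response mean (SRM), reported for the PDQ-39 in table 3, is commonly computed as the mean change score divided by the standard deviation of the change scores. A minimal Python sketch with invented data follows; for a scale on which higher scores mean worse HRQoL, such as the PDQ-39, a positive SRM indicates deterioration.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline: list[float], follow_up: list[float]) -> float:
    """Mean of the individual change scores divided by their standard deviation."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up scores must be paired")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Invented subscale scores (0-100, higher = worse) in five patients.
baseline = [40, 50, 60, 45, 55]
follow_up = [45, 58, 62, 52, 58]
print(round(standardized_response_mean(baseline, follow_up), 2))  # 1.96
```

SRMs are often interpreted against Cohen's benchmarks of 0.2, 0.5, and 0.8 for small, moderate, and large change.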

Other information that was gathered included the procedure of item generation, type of scale,

number of items, response options, scoring method, available translations, availability of

instructions, conditions for use, administration time, and frequency of missing items.

Whenever information on studies or scales was unclear or incomplete, we contacted the

authors with the request to provide additional information.


Results

We found 21 studies addressing five scales. Five of these studies concerned translated

versions. One study, and consequently one scale (PDQUALIF)8, was excluded because

information on the format of this scale, as well as on the included items, was unavailable at

the time of our review. Therefore, 20 studies reporting on the clinimetric properties of four

scales were included in this review.

These scales were the Parkinson’s Disease Questionnaire-39 item version (PDQ-39),9 the

Parkinson’s Disease Quality of Life questionnaire (PDQL),10 the Parkinson’s Impact Scale

(PIMS),11 and the Fragebogen Parkinson LebensQualität (Parkinson Quality of Life questionnaire; PLQ).12 Some

common characteristics of the scales are considered first. Details on individual scales are

discussed later, followed by a comparison of the clinimetric characteristics.

Disease-specific HRQoL scales

The four questionnaires were developed between 1995 and 1998. The scales can be self-completed by the patient, but can also easily be administered by an interviewer.

All scales can be used freely for scientific purposes, but use of the PDQL requires permission from the developers. The PDQ-39 and the PDQL are subject to a license fee for commercial use. The administration time of these scales was never formally assessed, but is expected to range from 10 minutes (PIMS) to 15 or 20 minutes (PDQ-39, PDQL, PLQ).

All scales use a five-point ordinal scoring system. The number of available translations differs considerably between scales, ranging from one (PLQ) or two (PIMS) to 10 (PDQL) or 21 (PDQ-39). The number of studies in which these instruments have been used ranges from one (PLQ and PIMS) to at least five (PDQL) or 18 (PDQ-39). An instruction manual for scientific users is only available for the PDQ-39 and the PLQ.

PDQ-39

The Parkinson’s Disease Questionnaire (PDQ-39) was designed by Peto et al.9 The scale has

39 items. Higher scores reflect lower HRQoL. The PDQ-39 has eight subscales: mobility (10

items), activities of daily living (six items), emotional well-being (six items), stigma (four

items), social support (three items), cognitions (four items), communication (three items), and

bodily discomfort (three items). Items in each subscale,13 as well as in the total scale,14 can be

summarized into an index and transformed linearly to a 0–100 scale. A shorter summary

index (PDQ-8 SI) can also be calculated.15
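The linear transformation to a 0–100 scale can be sketched as follows. This Python example assumes that each item is coded 0 (never) to 4 (always), that items are numbered consecutively by subscale in the order listed above, and that the total-scale index is the mean of the eight subscale scores; these conventions should be checked against the PDQ-39 user manual before use. Missing items are not handled.

```python
# Subscale sizes as listed in the text; contiguous item numbering is an assumption.
SUBSCALES = [
    ("mobility", 10), ("activities of daily living", 6),
    ("emotional well-being", 6), ("stigma", 4), ("social support", 3),
    ("cognitions", 4), ("communication", 3), ("bodily discomfort", 3),
]
MAX_ITEM_SCORE = 4  # assumed coding: 0 = never ... 4 = always


def pdq39_scores(responses: list[int]) -> tuple[dict[str, float], float]:
    """Return 0-100 subscale scores and their mean (assumed summary index)."""
    if len(responses) != 39:
        raise ValueError("expected 39 item responses")
    scores = {}
    start = 0
    for name, n_items in SUBSCALES:
        items = responses[start:start + n_items]
        # Sum of item scores as a percentage of the maximum possible sum.
        scores[name] = 100 * sum(items) / (MAX_ITEM_SCORE * n_items)
        start += n_items
    summary_index = sum(scores.values()) / len(scores)
    return scores, summary_index


# Invented response pattern: maximal problems on mobility items only.
subscale_scores, summary_index = pdq39_scores([4] * 10 + [0] * 29)
print(subscale_scores["mobility"], summary_index)  # 100.0 12.5
```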



The scale has been formally validated in United States English,16 United Kingdom English,9

German,17 and Spanish.18,19 A French version is currently being validated.20 Translations are

available in Australian English, Canadian English, Canadian French, Czech, Danish, Dutch,

Finnish, Hebrew, Italian, Polish, Portuguese, Russian, Swedish, Greek, Japanese, and Serbian.

PDQL

The PDQL was developed by de Boer et al.10 This scale has 37 items. Higher scores reflect

better HRQoL. Four subscales are discerned: parkinsonian symptoms (14 items), systemic

symptoms (seven items), social function (seven items), and emotional function (nine items).

The PDQL has been formally validated in Dutch,10 United Kingdom English,21 German,22 and

French.22 Translations are available in Argentinean Spanish, Belgian Dutch, Italian,

Portuguese, and Spanish.

PIMS

The PIMS was developed by Calne et al.11 The scale has 10 items and is completed three

times, one month apart. The items in the PIMS are broadly formulated and concern domains

rather than specific situations. Higher scores reflect lower HRQoL. Stable patients score each item only once, whereas patients with fluctuations judge the negative impact for both 'on' and 'off' periods. The scale can be self-completed, but the developers recommend that patients be advised with respect to their disease state (stable or fluctuating). Guidelines for use by physicians are available.

The scale is only available in Canadian English and Canadian French. The scale contains two

optional items (sexuality and financial security) that were left unanswered in 32% and 13% of

the questionnaires, respectively.

PLQ

The PLQ was designed by van den Berg.12 The scale has 44 items. Items in the scale are

grouped in nine domains: depression (five items), physical achievement (five items),

concentration (four items), leisure (five items), restlessness (four items), activity limitation

(six items), insecurity (five items), social integration (five items), and anxiety (five items).

There are five types of standard questions and four categories of responses, worded in two

directions. Responses can be recoded with a spreadsheet program that is available from the

author. The scale has been validated only in German.
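Because PLQ items are worded in two directions, responses to items worded in one direction must be reversed before subscale scores are summed. The author's spreadsheet is not reproduced here; the Python sketch below only illustrates the principle, assuming the four response categories are coded 1–4. The set of reversed item numbers is hypothetical and does not reflect the actual PLQ.

```python
MIN_CODE, MAX_CODE = 1, 4  # assumed coding of the four response categories
REVERSED_ITEMS = {3, 17, 29}  # hypothetical item numbers, for illustration only


def recode(responses: dict[int, int]) -> dict[int, int]:
    """Reverse the responses to negatively worded items so all items point one way."""
    recoded = {}
    for item, value in responses.items():
        if not MIN_CODE <= value <= MAX_CODE:
            raise ValueError(f"item {item}: response {value} out of range")
        # Reversal maps 1 <-> 4 and 2 <-> 3.
        recoded[item] = MIN_CODE + MAX_CODE - value if item in REVERSED_ITEMS else value
    return recoded


print(recode({1: 2, 3: 1, 17: 4}))  # {1: 2, 3: 4, 17: 1}
```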


Scale development, scoring and time-frame

The method of item generation differed between scales. In the PIMS, items were decided

upon by consensus between 10 specialized nurses and tested in 167 patients. In the other three

scales, patients were directly involved in the generation and evaluation of items. Although the PDQ-39 and the PLQ relied solely on patient information for item generation, items in the PDQL were also obtained from interviews with neurologists and relatives of patients, and from the literature.

Items in the PDQ-39 were initially generated through interviews with 20 patients visiting an

outpatient neurology clinic. This resulted in a 65-item list that was reduced to 39 items on the

basis of a survey in 359 patients. Items in the PDQL were generated by means of interviews

with five patients and a relative, consulting neurologists, and reviewing the literature.

Seventy-three items were found and piloted in 13 inpatients and outpatients. Items endorsed

most often or rated as most important were selected for the final 37-item version, which was

tested in 384 patients. Items in the PLQ were generated by interviews with groups of patients.

This resulted in 113 items that subsequently were piloted in 61 inpatients and outpatients. The

questionnaire was then reduced to 44 items and tested in 405 patients (which constituted a

response rate of only 38%).

The PDQ-39 and the PDQL both assess the frequency with which patients experience

difficulties. The PIMS assesses the impact of the disease on patients’ lives, whereas the PLQ,

depending on the item of interest, assesses intensity, applicability, or quality.

Scales differ considerably in the period they refer to. The PIMS does not specify a time frame,

whereas the PDQL assesses the past three months, the PDQ-39 the past month, and the PLQ

the past week. The PDQL and PLQ assess the items ‘as is’, without asking the patient to

indicate whether this was due to PD, whereas the PDQ-39 and the PIMS relate the items to

having PD. In the PDQ-39 all items begin with: ‘Due to having PD, how much of the time did

you have trouble with…’. In the PIMS patients are asked to rate the negative impact of PD in

a particular domain.

Content validity

The content of the scales differs considerably. We grouped items thematically on the basis of

face value in domains reflecting physical, mental, and social or role functioning (table 1).

Whenever there was doubt regarding the correct allocation, items were assigned to domains according to subscale allocation or factor structure as reported in the original studies. Table 2 shows that about half of the items in the PDQ-39 and the PDQL concern physical features, whereas the PIMS has only two items in this domain. In the PIMS, by contrast, half of the items deal with the social domain. In the PLQ almost half of the items involve mental features.

In the physical domain, only transportation is addressed by all scales. In the PIMS, the only

other theme addressed in this domain concerns taking part in traffic. The PDQ-39, PDQL, and

PLQ share items on walking, motor features, and other disease features. Transfers are

addressed in detail in the PDQL, but are lacking in the PDQ-39 and PLQ. Items on self care

are assessed in detail in the PDQ-39, and as an overall item in the PLQ, but are lacking in the

PDQL. Many physical items in the PDQ-39 concern activities (‘disabilities’), whereas in the

PDQL and PLQ most items reflect impairments.

In the mental domain all scales include items on mood, feelings, and anxiousness. The PIMS

does not incorporate items on cognition, whereas the other scales address both concentration

and memory. The PLQ contains seven items addressing anxiousness.

In the social domain all scales address some aspect of relationships. Relationships with

partner, family, or friends are only addressed in the PDQ-39 and the PIMS. Sexuality is only

addressed in the PDQL and the PIMS. Social stigma is not assessed in the PIMS. Role

functioning is adequately assessed in the PIMS, but only marginally in the PDQ-39 and the

PLQ, whereas the PDQL does not address this theme at all.

Construct validity

Construct validity of both the PDQ-39 and the PDQL was thoroughly established using

generic HRQoL scales, disease specific instruments, ‘known groups’ comparisons, and other

health measures. The PLQ was less thoroughly assessed. Correlations with a generic HRQoL

scale and an ADL scale were adequate, but correlations with disease specific instruments

were poor, and known groups differences were not assessed. For the PIMS only known

groups comparisons were performed, demonstrating significant differences between stable

and fluctuating patients in their off situations (table 3).


Table 1. Content of the HRQoL scales

ITEMS PDQ-39 PDQL PIMS PLQ

Physical

Walking

-problems getting around house 6

-walking 4,5 24 25

-shuffling; initiation / shuffling 11 7

Transfers

-turning in bed 35

-getting up 32

-turning around while walking 14

self care

-self care 23

-washing 11

-dressing 12

-cutting food 15

-holding drinks without spilling 16

-doing up buttons / shoe laces 13

daily activities

-looking after home* 1 2 (5) 40

-carrying bags / shopping 3

-doing leisure activities* 1 1 (8) 7 (38)

-getting around in public 7

Transportation

-take part in traffic 29 6 21

-needed company when going out 8

motor features

-mobility 27

-slowness 1

-rigidity 1 3

-dexterity 9 26

-shaking hands; tremor 6 4

-sudden uncontrolled movements 30 2

-on-off periods 20 14

-speech; talking 34 22

-writing 14 16

Other disease features

-generally unwell 2

-extreme exhaustion 13

-worn out 7 20




-painful cramps or spasms 37

-pain in joints or body 38 6

-unexpected falling asleep 30

-problems sleeping (at night) 19 5

-difficulty sitting still 27

-drooling 25

-incontinence / frequent urinating 28

-constipation 33

-feeling unpleasantly hot or cold 39

-feeling that body parts don’t belong to oneself 44

Mental

Cognition

-concentration 31 31 8

-adjusting to circumstances 32

-memory 32 34 28

Mood

-depression 17 26 2*2 11,30

-weepy or tearful 19

feelings (positive)

-feelings; confidence 1 41

-self worth 1 29

feelings (negative)

-being tense; stress 4 2 10

-angry or bitter 20

-unsure due to physical limitations; trust body functions 5 34

-safety; doing what you want without harming yourself 8

-unsure around others 18

-feeling isolated / lonely 18

-feeling ignored 36

anxious / worry

-anxious 21 2

-worry loss cognitive capacity 18

-worry physical loss 19

-worry illness 42

-worry future; afraid progression 22 15 13,16,43

-worry operation 37

-fear side effects 17

-worry falling 9

Other features



-dependency on medication 15

-difficulty accepting illness 21

-confined to house more than liked 10

-distressing dreams / hallucinations 33 9

-disinterest / listlessness 12

Social and role functioning

Relationships

-problems with close relationships 27 3

-lacked support spouse / partner 28 3

-lacked support family / friends 29

-impact community relationships 4

-sexual relationship 36 10

-wanted to isolate oneself 35

-keeping up relationships 24

(fear of) social stigma

-embarrassed (in public) 25 10

-worried about others’ reactions 26

-felt had to conceal PD 23

-feeling that illness is noticed 36

-avoid eating / drinking in public 24

-signing name in public 23

role functioning

-work 5

-looking after home* 1 (2) 5

-financial security/support 9

-feeling dependent on others 31

social activities

-doing hobbies 3 39

-doing leisure activities* 1 (1) 8 (7) 38

-cancel social activities 12

-cancel important activities 33

-less able to go on holiday 17 37

-visit exhibitions 22

-problems with communication 35

Numbers correspond to the item number in the scale; numbers in parentheses indicate that the item is assessed but was primarily allocated to another section.

*1 ‘Leisure activities’ and ‘looking after home’ are present in both the physical and the social domain because of differences in subscale allocation among scales.

*2 Item numbers of the PIMS are sometimes presented more than once because of their broad formulation.



Table 2. Number of items per domain.

Domain                   PDQ-39   PDQL   PIMS   PLQ
Physical                     19     21      2    16
Mental                       12      9      3    19
Social                        8      7      5     9
Total number of items        39     37     10    44

Items are allocated to a domain on the basis of face value or, in case of ambiguity, on the basis of subscale allocation or factor structure, as presented by the scale developers in the original study.
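The proportions mentioned in the content validity section can be checked against table 2. The short Python calculation below uses only the counts in the table: it confirms that the domain counts add up to the scale totals and gives each domain's share of items, rounded to whole percentages.

```python
# Item counts per domain, copied from table 2.
ITEMS_PER_DOMAIN = {
    "PDQ-39": {"physical": 19, "mental": 12, "social": 8},
    "PDQL": {"physical": 21, "mental": 9, "social": 7},
    "PIMS": {"physical": 2, "mental": 3, "social": 5},
    "PLQ": {"physical": 16, "mental": 19, "social": 9},
}
TOTAL_ITEMS = {"PDQ-39": 39, "PDQL": 37, "PIMS": 10, "PLQ": 44}


def domain_shares(counts: dict[str, int]) -> dict[str, int]:
    """Percentage of a scale's items in each domain, rounded to whole numbers."""
    total = sum(counts.values())
    return {domain: round(100 * n / total) for domain, n in counts.items()}


for scale, counts in ITEMS_PER_DOMAIN.items():
    # Each row of domain counts should add up to the scale's total number of items.
    assert sum(counts.values()) == TOTAL_ITEMS[scale], scale
    print(scale, domain_shares(counts))
```

This gives physical shares of 49% (PDQ-39) and 57% (PDQL), a social share of 50% in the PIMS, and a mental share of 43% in the PLQ.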

Internal consistency

Cronbach’s α values for scale totals are all well over 0.8 (table 3). Alphas for subscales are higher than 0.7, except for social support in the UK version of the PDQ-39,9 social support16,23 and

cognitions23 in the US version of the PDQ-39, for cognitions and bodily discomfort in the

Spanish version of the PDQ-39,19 and for mood, concentration, restlessness, and social

integration in the PLQ.12

Test-retest reliability

Test-retest reliability was not assessed for the PDQL. In the PIMS an ICC of 0.72 was

reported for the total score. Reproducibility of subscales was assessed for the PDQ-39 and the

PLQ. Subscales with correlations lower than 0.7 concerned the social support subscale in the

PDQ-39 and the anxiety subscale in the PLQ.

Responsiveness

Responsiveness was not assessed for either the PDQL or the PIMS. In the PLQ it was assessed only in a small subset of 16 patients during a hospital stay. Paired t-tests were significant only for activity limitation and insecurity. When the tests were corrected for multiple comparisons, none of the changes in the nine subscales remained significant. Two studies reported on

the responsiveness of the PDQ-39. Fitzpatrick et al.24 found moderate standardized response

means for the mobility and ADL subscales in 51 patients who indicated their situation had

worsened over a period of four months. Change in the PDQ-39 score was significantly

correlated with self reported change and change in the SF-36. In the other study, Harrison et

al.25 found that four subscales of the PDQ-39 (mobility, ADL, stigma, social support) were

responsive to deterioration in health state.
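Which correction for multiple comparisons was applied in the PLQ study is not stated here. Assuming a Bonferroni correction, the significance level for nine subscale tests drops from 0.05 to 0.05/9, about 0.0056, so a change with p = 0.02 would no longer be significant. A minimal Python sketch follows; the p-values are invented and are not those of the PLQ study.

```python
def bonferroni(p_values: dict[str, float], alpha: float = 0.05) -> dict[str, bool]:
    """Flag tests that remain significant after a Bonferroni correction."""
    threshold = alpha / len(p_values)
    return {name: p < threshold for name, p in p_values.items()}


# Invented p-values for paired t-tests on nine subscales.
p_values = {
    "depression": 0.41, "physical achievement": 0.22, "leisure": 0.35,
    "concentration": 0.64, "social integration": 0.18, "insecurity": 0.02,
    "restlessness": 0.51, "activity limitation": 0.03, "anxiety": 0.27,
}
uncorrected = [name for name, p in p_values.items() if p < 0.05]
corrected = [name for name, ok in bonferroni(p_values).items() if ok]
print(uncorrected)  # ['insecurity', 'activity limitation']
print(corrected)    # []
```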

Table 3. Clinimetric characteristics of HRQoL scales

PDQ-39

Reliability, internal consistency:
• α total scale: 0.84 – 0.94 [14,23,27]
• α subscales: 0.69 – 0.94 [9], 0.66 – 0.95 [27], or 0.57 – 0.94 [23]
• item-total correlation: 0.67 – 0.91 [9]

Reliability, test-retest:
• r = 0.68 – 0.94 [9] (n = 167; 3-6 days)

Validity, content: +++ [9]

Validity, construct:
• generic HRQoL scales: range over PDQ-39 subscales vs SF-36: r = -0.34 to -0.80 [9]; summary indices of PDQ-39 and EQ-5D: rs = -0.75 [28]
• disease-specific scales: H&Y: rs = 0.60 [28,29]; H&Y vs PDQ-39 subscales: rs = 0.16 – 0.72 [24,27]; SES: rs = -0.66 [28,29]; UPDRS-ME: rs = 0.41 [29]; Columbia, range over subscales: rs = 0.08 – 0.58 [24,27]
• other measures: Beck DI with PDQ-39: rs = 0.68 [28]; Beck DI with PDQ-39 emotional: r = 0.73 [25]; Barthel Index with PDQ-39 ADL: rs = 0.3 [25]; MMSE: rs = -0.32 [28,29]
• analysis of group differences: by self-reported severity: s [9]; by H&Y (clinic sample): s (except social) [27] or s (except emotional, stigma, social, cognitions, bodily discomfort) [23]; by H&Y (population sample): s (except social, stigma) [30]

Validity, factorial: 8 factors [9]: mobility, ADL, emotional, stigma, social support, cognitions, communication, bodily discomfort

Responsiveness:
• SRM for mobility and ADL: 0.55 and 0.43, respectively (n = 51; 4-month interval) [24]
• change in PDQ-39 compared with change in [24]: Columbia ns; H&Y ns; SF-36 s; self-reported change s
• 4 subscales (mobility, ADL, stigma, social support) show significant deterioration; PDQ-39 more responsive than GHQ-28 and OPCS [25]

PDQL

Reliability, internal consistency:
• α total scale: 0.94 [10] – 0.95 [21]
• α subscales: 0.80 – 0.87 [10] or 0.77 – 0.87 [21]

Reliability, test-retest: 0

Validity, content: ++ [10]

Validity, construct:
• generic HRQoL scales: SF24, related domains: range r = 0.46 – 0.66 [10]
• disease-specific measures: ‘Webster contributes significantly to QoL’ [21]
• other measures: CES-D vs PDQL emotional: r = -0.79 [10]; MOS social support survey vs PDQL social: r = 0.13 [10]; ‘CAMCOG/GDS-15 contribute significantly to QoL’ [21]
• analysis of group differences: by SES, 3 levels: s [10]; by Webster, 3 levels: s (all, except emotional) [21]

Validity, factorial: 4 factors [10]: parkinsonian, systemic, social, emotional

Responsiveness: 0

PLQ

Reliability, internal consistency:
• α total scale: 0.95 [12]
• α subscales: 0.62 – 0.87 [12]
• correlation subscale – total scale: r = 0.73 – 0.86 [12] (n = 405)

Reliability, test-retest:
• total scale: r = 0.87 [12]; subscales: r = 0.69 – 0.86 [12] (n = 65; 14 days)

Validity, content: ++ [12]

Validity, construct:
• generic HRQoL scales: EORTC-QLQ30: r = 0.67 (n = 111) [12]
• disease-specific measures: H&Y: r = 0.27, ns (n = 21 – 29) [12]; SES: r = -0.27, ns (n = 21 – 29) [12]
• other measures: QoL-VAS: r = -0.28, ns (n = 21 – 29) [12]; ADL scale: r = 0.73 (n = 111) [12]

Validity, factorial: 9 subscales, 1 or 2 factors per subscale, >50% variance [12]: depression, physical achievement, leisure, concentration, social integration, insecurity, restlessness, activity limitation, anxiety

Responsiveness: 2-week interval, no external criterion (n = 16) [12]

PIMS

Reliability, internal consistency:
• α total scale: 0.90 [11]
• correlation among factors: r = 0.10 – 0.46 [11]

Reliability, test-retest: ICC = 0.72 [11] (n = 149; 1 month)

Validity, content: - [11] (consensus)

Validity, construct:
• analysis of group differences: by self-assessed fluctuations: s (only between stable patients and fluctuating patients at their worst) [11]

Validity, factorial: 4 factors, explaining 72% variance [11]: psychological, social, physical, financial

Responsiveness: 0

n = number of patients; α = Cronbach’s α; r = Pearson; rs = Spearman; ICC = intraclass correlation coefficient; +++ = good; ++ = moderate; + = fair; - = poor; 0 = no numerical results reported; ? = unclear; SES = Schwab and England scale; Beck DI = Beck Depression Inventory; s = significant; ns = not significant; SRM = standardized response mean; numbers in square brackets are reference numbers.

Table 4. Quality assessment table

          RELIABILITY                           VALIDITY
SCALE     internal consistency   test-retest    content     construct    RESPONSIVENESS
PDQ-39    +++ / +++              +++ / +++      ++ / +++    +++ / +++    ++ / +
PDQL      +++ / +++              0              ++ / ++     +++ / +++    0
PIMS      +++ / +++              +++ / +++      ? / -       ? / -        0
PLQ       +++ / +++              +++ / +++      ++ / +++    + / ++       - / -

Signs before the slash refer to the results of validity, reliability, and responsiveness testing; signs after the slash refer to the thoroughness (‘strength of evidence’) of that testing.

Results of testing: 0 = no numerical results reported; ? = results not interpretable; - = poor results; + = fair results; ++ = moderate results; +++ = good results.

Thoroughness of testing: 0 = no reported evidence; ? = results not interpretable; - = poor evidence; + = fair evidence; ++ = moderate evidence; +++ = good evidence.



Discussion

Scales differed considerably in content. This is probably largely the result of differences in the ways the items were generated and reduced; differences among the samples involved in generating and evaluating the items may also have added to the heterogeneity. In the PDQ-39 and the PLQ, items were derived only from interviews with patients, whereas the PDQL also used information from neurologists, relatives, and the literature. Items in the PIMS were obtained through consensus among specialized nurses.

To guarantee good content validity, patients should be closely involved in both item

generation and evaluation. For item generation other sources may be used as well. For the

evaluation of items, however, a large sample of patients should be involved. This sample

should ideally include patients attending a neurology clinic, patients living in nursing homes, and unselected patients living in the community. None of the scales applied this

method. The information on relevance of items in the item reduction phase was obtained from

patients who were members of a Parkinson’s disease society (PDQ-39), or from both

inpatients and outpatients of a neurology clinic (PDQL, PLQ). In the PDQL, only four

outpatients and one patient and a relative from the Parkinson’s disease society were involved

in the item generation process, whereas only thirteen inpatients and outpatients were involved

in the evaluation process. Both the small sample sizes and the fact that only a clinic-based sample was involved may have affected the final make-up considerably. For instance,

the item ‘feeling worried about a possible operation’ in the PDQL was not found in other

scales.

Different strategies for item reduction (psychometric or clinimetric) affected the final content as well. In the first strategy, considerations of the

measurement properties of scales prevail, whereas in the second the completeness of the

assessment is considered more important. In the PDQ-39 and the PLQ, the developers used a

predominantly psychometric strategy. In the PDQ-39, items were omitted when they were

considered redundant, had low item scale correlations, or clustered in subscales that could not

be meaningfully interpreted. In the PLQ, items with low item scale correlations, non-normal

frequency distributions, frequent missing values, floor or ceiling effects, or items that could not

clearly be assigned to subscales, were removed. The developers of the PDQL, however,

followed a more clinimetric strategy and included all items patients considered important in

the final scale. Items that loaded on more than one subscale were assigned to subscales on the

basis of face validity.


When the scales are compared in more detail, the differences in content become apparent. The

PDQ-39 lacks items addressing transfers and night-time sleep problems in the physical

domain, but covers all relevant themes in the mental domain. Role functioning is

insufficiently covered. Sexuality is not addressed in this scale.

The PDQL lacks items on self-care in the physical domain, covers all themes in the mental domain, and lacks items on close relationships and role functioning in the social domain.

Our findings regarding the content validity of the PDQ-39 and the PDQL largely agree with

Damiano et al.26 However, their criterion list did not contain items explicitly addressing

transfers and hobbies.

The PIMS lacks items on walking, transfers, self-care, motor features, and other disease

features in the physical domain, on cognition and ‘other features’ in the mental domain, and

on social stigma in the social domain.

The PLQ lacks items on transfers and communication in the physical domain, but in the

mental domain all relevant themes are addressed. Role functioning and relationships are

insufficiently covered in the social domain. The PLQ is the only scale that explicitly asks for

the consequences of being dependent on others.

The construct validity of both the PDQ-39 and the PDQL is well established. For the PLQ

this was less thoroughly demonstrated, whereas for the PIMS construct validation with other

measures was not performed.

All scales share factors on physical, mental, and social functions. The other factors that

emerged in the scales were very different.

The internal consistency for scale totals is adequate for all scales. All subscales of the PDQL

demonstrate good internal consistency, whereas the social support subscale in the PDQ-39

and four subscales in the PLQ showed insufficient internal consistency. Test-retest reliability

was not assessed for the PDQL and was found to be adequate for the other scales, except for

the anxiety subscale in the PLQ and, again, the social support subscale in the PDQ-39.

Responsiveness was not assessed at all for either the PDQL or the PIMS. For the PLQ

responsiveness was inadequately evaluated. There are indications that the PDQ-39 is capable

of detecting deterioration, but for improvement this still needs to be established.

A comparison of the clinimetric qualities of the scales is presented in table 4.

Apart from methodological considerations, other issues may influence the selection of an HRQoL instrument. For instance, the time frame is important. When short periods are assessed (for example, one week in the PLQ), day-to-day differences may affect the total score considerably, resulting in lower comparability over time. Assessing longer periods may therefore be preferable, as is done in both the PDQ-39 (one month) and the PDQL (three months).

Another factor that may affect the selection of a scale is the framing of questions. The PDQL and the PLQ evaluate health ‘as is’, regardless of whether complaints were caused by PD or not. The other two scales relate the health state to having PD. However, it may be difficult or even impossible for patients to judge whether a particular problem (for example, sleep problems or fatigue) is caused by PD or results from aging or a comorbid condition.

Other considerations that may guide the selection of an appropriate HRQoL instrument for a particular study are the languages in which the instrument is available and the number of studies in which it has been used. In this respect the PDQL, and especially the PDQ-39, are attractive candidates.

The intended sample may also influence the selection. The PLQ was tested only in a sample of patients who were members of a Parkinson’s disease society, whereas the PIMS was used only in an outpatient clinic sample. The PDQL was evaluated both in a Parkinson's disease society sample and in a community-based sample, whereas the PDQ-39 was evaluated in all the aforementioned populations.

The number of items is not a useful criterion for selection, because the numbers hardly differ

between the PDQ-39, the PDQL and the PLQ, whereas insufficient clinimetric support exists

for the only short scale, the PIMS.

In most other respects the scales differed only marginally, and therefore these factors are not

expected to play a role in selecting a scale.

The selection of an instrument will partly be based on the goal of the study. For certain

interventions, some domains of HRQoL may be more important to assess than others and may

thus influence the selection of the instrument. In many situations, however, the PDQ-39 will

probably be the most appropriate HRQoL instrument, because this scale has been tested most

thoroughly, has adequate clinimetric characteristics, has been used in the largest number of

studies, and is available in many languages. However, responsiveness of this scale still needs

to be assessed more thoroughly, especially with respect to situations where patients are

expected to improve (for example, intervention studies). The PDQ-39 lacks items on self-image, night-time sleep problems, sexual activity, and transfers. Reliability of the social support subscale (test-retest and internal consistency) is inadequate. The PDQL may be

considered as an alternative. Information on test-retest reliability and responsiveness,

however, is still lacking and the scale does not include items on self-care, role functions, and

close relationships. The PLQ may be considered in studies involving German-speaking

patients with PD. However, construct validity and responsiveness are insufficiently assessed,

and items concerning transfers and speech are missing, whereas relationships and role

functions are only scarcely addressed. The PIMS should be used only as a means of identifying areas of potential problems: its items lack specificity, its content validity is insufficiently founded, and its construct validity and responsiveness have not been assessed at all.

Acknowledgements

J.M. is supported by the Netherlands Organization for Scientific Research (project no. 0940-33-021) and C.R. is supported by the Prinses Beatrix Fund (project no. 97-0205). We thank C. Berne, S. Calne, A.G.E.M. de Boer, J.E. Harrison, V. Peto, M. van den Berg, and M.D. Welsh for providing additional information.

References

1. Bowling A. What things are important in people's lives? A survey of the public's judgements to inform scales of health related quality of life. Soc Sci Med 1995; 41(10):1447-1462.

2. World Health Organization. WHO-Constitution - the first two chapters. World Wide Web. 2001.

3. Bowling A. Health-related quality of life: a discussion of the concept, its use and measurement. Measuring disease. Buckingham: Open University Press, 1995: 1-19.

4. Guyatt GH, Bombardier C, Tugwell PX. Measuring disease-specific quality of life in clinical trials. CMAJ 1986; 134:889-895.

6. Kirshner B, Guyatt G. A methodological framework for assessing health indices. J Chronic Dis 1985; 38(1):27-36.

7. Bland JM, Altman DG. Cronbach's alpha. BMJ 1997; 314(7080):572.

8. Welsh M, McDermott M, Holloway R, Plumb S, Pfeiffer R, Hubble J. Development and Testing of the Parkinson's Disease Quality of Life Scale: The PDQUALIF. Mov Disord 1997; 12(5):836.

9. Peto V, Jenkinson C, Fitzpatrick R, Greenhall R. The development and validation of a short measure of functioning and well being for individuals with Parkinson's disease. Qual Life Res 1995; 4(3):241-248.

10. De Boer AG, Wijker W, Speelman JD, de Haes JC. Quality of life in patients with Parkinson's disease: development of a questionnaire. J Neurol Neurosurg Psychiatry 1996; 61(1):70-74.

11. Calne S, Schulzer M, Mak E, Guyette C, Rohs G, Hatchard S, et al. Validating a quality of life rating scale for idiopathic Parkinsonism: Parkinson's impact scale. Parkinsonism Relat Disord 1996; 2(2):55-61.

12. Van den Berg M. Leben mit Parkinson - Entwicklung und psychometrische Testung des Fragebogens PLQ. Neurologie & Rehabilitation 1998; 4(5):221-226.

13. Peto V, Jenkinson C, Fitzpatrick R. PDQ-39: a review of the development, validation and application of a Parkinson's disease quality of life questionnaire and its associated measures. J Neurol 1998; 245 Suppl 1:S10-S14.

14. Jenkinson C, Fitzpatrick R, Peto V, Greenhall R, Hyman N. The Parkinson's Disease Questionnaire (PDQ-39): development and validation of a Parkinson's disease summary index score. Age Ageing 1997; 26(5):353-357.

15. Jenkinson C, Fitzpatrick R, Peto V, Greenhall R, Hyman N. The PDQ-8: development and validation of a short-form Parkinson's disease questionnaire. Psychol Health 1997; 12:805-814.

16. Bushnell DM, Martin ML. Quality of life and Parkinson's disease: Translation and validation of the US Parkinson's Disease Questionnaire (PDQ-39). Qual Life Res 1999; 8(4):345-350.

17. Berger K, Broll S, Winkelmann J, Heberlein I, Muller T, Ries V. Untersuchung zur Reliabilität der deutschen Version des PDQ-39: Ein krankheitsspezifischer Fragebogen zur Erfassung der Lebensqualität von Parkinson-Patienten. Aktuel Neurol 1999; 26(4):180-184.

18. Martinez-Martin P, Frades PB. Quality of life in Parkinson's disease: validation study of the PDQ-39 Spanish version. The Grupo Centro for Study of Movement Disorders. J Neurol 1998; 245 Suppl 1:S34-S38.

19. Martinez-Martin P, Frades B, Jimenez Jimenez FJ, Pondal M, Lopez Lozano JJ, Vela L, et al. The PDQ-39 Spanish version: reliability and correlation with the short-form health survey (SF-36). Neurologia 1999; 14(4):159-163.

20. Mauduit N, Schuck S, Allain H, Chaperon J. Échelles et questionnaires dans la maladie de Parkinson. Rev Neurol (Paris) 2000; 156 Suppl 2Bis:63-69.

21. Hobson P, Holden A, Meara J. Measuring the impact of Parkinson's disease with the Parkinson's Disease Quality of Life questionnaire. Age Ageing 1999; 28(4):341-346.

22. Marquis P, Girod I, Berdeaux G, Peyro Saint-Paul H, Cialdella P. Psychometric analysis of French and German versions of the Parkinson's Disease Quality of Life Questionnaire (PDQL). Qual Life Res 1998; 7:632.

23. Damiano AM, McGrath MM, Willian MK, Snyder CF, LeWitt PA, Reyes PF, et al. Evaluation of a measurement strategy for Parkinson's disease: assessing patient health-related quality of life. Qual Life Res 2000; 9(1):87-100.

24. Fitzpatrick R, Peto V, Jenkinson C, Greenhall R, Hyman N. Health-related quality of life in Parkinson's disease: a study of outpatient clinic attenders. Mov Disord 1997; 12(6):916-922.

25. Harrison JE, Preston S, Blunt SB. Measuring symptom change in patients with Parkinson's disease. Age Ageing 2000; 29:41-45.

26. Damiano AM, Snyder C, Strausser B, Willian MK. A review of health-related quality-of-life concepts and measures for Parkinson's disease. Qual Life Res 1999; 8(3):235-243.

27. Jenkinson C, Peto V, Fitzpatrick R, Greenhall R, Hyman N. Self-reported functioning and well-being in patients with Parkinson's disease: comparison of the short-form health survey (SF-36) and the Parkinson's Disease Questionnaire (PDQ-39). Age Ageing 1995; 24(6):505-509.

28. Schrag A, Selai C, Jahanshahi M, Quinn NP. The EQ-5D - a generic quality of life measure - is a useful instrument to measure quality of life in patients with Parkinson's disease. J Neurol Neurosurg Psychiatry 2000; 69(1):67-73.

29. Schrag A, Jahanshahi M, Quinn N. What contributes to quality of life in patients with Parkinson's disease? J Neurol Neurosurg Psychiatry 2000; 69(3):308-312.

30. Schrag A, Jahanshahi M, Quinn N. How Does Parkinson's Disease Affect Quality of Life? A Comparison With Quality of Life in the General Population. Mov Disord 2000; 15(6):1112-1118.

4 Development of an instrument for the assessment of cognition in Parkinson's disease

Johan Marinus1, Martine Visser1, Nicolaas A. Verwey2, Frans R.J. Verhey3, Huub A.M. Middelkoop4,5, Anne M. Stiggelbout6, Jacobus J. van Hilten1

Departments of 1Neurology, 4Neuropsychology, 6Medical Decision Making, Leiden University

Medical Center, Leiden, The Netherlands; 2Faculty of Medicine, Utrecht University, Utrecht, The

Netherlands; 3Department of Psychiatry, University Hospital of Maastricht, Maastricht, The

Netherlands; 5Department of Cognitive Psychology, Leiden University, Leiden, The Netherlands

Submitted

Chapter 4

Abstract

The cognitive deficits in Parkinson's disease (PD) may be underestimated if classical instruments for cognition are used. These instruments generally emphasise cortical functions, some of which are relatively spared in PD, whereas they lack items on other cognitive functions that are frequently affected in PD. The objective of this study was to develop a short and practical instrument that is sensitive to the specific cognitive deficits in PD and is reliable and valid. Our aim was not to construct a screening tool or a diagnostic instrument; instead, the instrument is intended for comparing groups in research settings and for assessing change in individual functioning over time. A literature search was conducted to identify the most frequently affected cognitive domains in PD and to select or develop candidate items from these domains for the initial scale. This scale was then tested in 85 patients and 75 age-, education-, and sex-matched controls. Items that met predefined criteria for data quality, reproducibility, and discriminative properties were included in the final scale. This scale, the SCOPA-COG, consists of 10 items with a maximum score of 43, with higher scores reflecting better performance. The test-retest reliability of the sum score was 0.78 (intraclass correlation coefficient), and ranged from 0.40 to 0.75 for individual items (weighted kappa). Cronbach's α was 0.83. Construct validity of the scale was supported by the expected correlations with the CAMCOG and the MMSE, and by differences found between groups of participants classified by dementia status and between patients grouped by disease severity. The scale showed a clear trend towards lower cognition scores in patients with more advanced PD. This trend was more pronounced for the SCOPA-COG than for the CAMCOG and the MMSE, so use of this scale may provide better insight into the longitudinal development of cognitive deficits in PD. The coefficient of variation of the SCOPA-COG was higher than that of the CAMCOG or the MMSE, indicating a better ability to detect differences between individuals. The SCOPA-COG is a short, reliable, and valid instrument that is sensitive to the specific cognitive deficits in PD.

Assessment of cognition in Parkinson's disease

Introduction

Patients with Parkinson’s disease (PD) generally perform less well on cognition tests than age-, education-, and sex-matched controls.1-4 The reported prevalence of dementia in patients with this condition varies widely between studies, with values ranging from 4 to 93 percent.1 This wide range reflects differences in methodology, in the criteria applied for dementia, and in sample characteristics. The mean prevalence rate of 27 studies discussed in a review by Cummings was 39.9%.1 More recent cross-sectional community studies also reported prevalence rates of about 40 percent.5,6 The odds ratio for developing dementia compared with controls was estimated at 5.9 in a study by Aarsland et al.7 A longitudinal study by the same authors found an 8-year prevalence of 78.7% in a large, representative community-based cohort of patients with idiopathic PD.8

The pattern of cognitive decline in PD differs from that in Alzheimer’s disease (AD).1,9-11 According to the clinical diagnostic criteria of the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV), the diagnosis of dementia is made if memory is impaired in combination with a decline in at least one of the following four domains: language, gnosis, praxis, and executive functioning. The problems should be sufficiently severe to interfere with daily activities.12 The DSM-IV criteria are described in qualitative terms. Patients with PD generally have normal or only slightly decreased performance in language, gnosis, and praxis, whereas memory and executive functions are more prominently affected.1,13 This pattern of cognitive decline is often referred to as ‘subcortical dementia’. Some authors state that the current definition of dementia is biased towards cortical-type phenomenology.11,13 Therefore, applying the classical instruments for dementia to patients with PD may lead to underestimation of the cognitive deficit in PD.

A quantitative instrument that would focus on the most vulnerable cognitive functions in PD

would provide important additional information. Such an instrument may allow finer

discrimination, detect changes earlier in the course of the disease, and serve as an 'index of

severity'. Items in such a scale should be insensitive to the severity of the motor deficit.

The objective of this study was to develop a short and practical instrument, the SCOPA-COG, that is sensitive to the specific cognitive deficits in PD and is reliable and valid. Our aim was not to construct a screening tool or a diagnostic instrument; instead, the instrument is intended for comparing groups in research settings and for assessing change in individual functioning over time. The development of the SCOPA-COG is part of a larger research project, the SCales for Outcomes in PArkinson's disease (SCOPA), in which short, practical, and clinimetrically sound instruments for all relevant domains in PD are selected or developed.

Patients and methods

Scale development

We searched the literature for studies on cognitive functioning in PD and selected candidate items using the following procedure. First, the cognitive domains that were affected most frequently were identified. Next, tests within those domains that consistently yielded different scores for patients with PD and controls were selected, provided they could easily be administered in a clinical setting and did not involve fine motor activities, such as writing, drawing, and constructing. Multiple tests were selected for each domain, allowing us to identify which item had the best characteristics with respect to data quality, reproducibility, and ability to discriminate between patients and controls.

We used the functional classification of Lezak14 and distinguished the following cognitive domains: attention, memory and learning, executive functions, visuospatial functions, verbal functions, and thinking and reasoning. Findings from the literature are briefly reviewed below, and the concept of bradyphrenia is discussed.

Attention. Attention deficits are often found in patients with PD,1,13,15 and were hence addressed in the scale.

Memory and learning. In PD, free recall (‘active memory’) in particular is impaired, whereas cued recall (‘passive memory’) is largely unimpaired.1,4,13,16-19 This pattern clearly differs from that in AD, where both types are affected.19,20 The difference is found for both verbal and visual memory.17 Thus, it is not memory function per se that is hampered, but its functional use. Both immediate and delayed recall were found to be impaired in demented and non-demented patients with PD in a study by Richards et al.2 However, results from studies on immediate recall are contradictory,17 whereas results from studies on delayed recall are more consistent, with the vast majority of studies reporting impaired delayed recall. Patients with PD also frequently have more problems with learning tasks than controls.1,21-24 Orientation, on the other hand, is typically intact.25 Taking these findings into account, we decided to include items on free recall (both immediate and delayed, and visual and verbal) and on learning, whereas cued recall and orientation were not addressed.

Executive functions. Executive functions are defined as 'functions involved in the self-regulation of problem solving strategies' or as 'mental processes involved in goal-directed behaviour'.14 According to Lezak,14 executive functions can be conceptualised as having four components: volition, planning, purposive action, and effective performance. However, the concept is somewhat vague, and consequently there is debate about whether a particular function is executive or not. Patients with PD especially show problems with internally guided behaviour (spontaneous generation of task-specific planning), whereas externally guided behaviour is normal.23,26-28 Both maintaining an attentional set and set-shifting may be impaired.16,22 An attentional set is defined as a learned predisposition to attend to one dimension of multidimensional stimuli in order to guide subsequent responding. Shifting attentional set involves altering the rules that are currently guiding behaviour.29 Set-shifting may be impaired very early in the course of the disease.1,13,30-33 Other functions in this domain concern concept formation, verbal fluency, and alternating tasks. The latter two are often impaired in non-demented patients with PD, whereas concept formation may be preserved.2 Therefore, items on motor planning, set-shifting, alternating programs, and fluency tasks were included in the initial scale.

Visuospatial functions. Assessment of visuospatial and visuoconstructive skills is difficult in patients with profound motor impairment, but motor-free tasks suggest that patients with PD have visuospatial deficits in excess of their motor abnormalities.1,26 According to Boller et al.,26 a visuospatial deficit is characterized by difficulties in appreciating the relative position of stimulus-objects in space, in integrating those objects into a coherent spatial framework, and in performing mental operations involving spatial concepts. These authors described the impairment of visuospatial function in PD and found that simple visuospatial tests discriminated better between patients and controls than complex tests.26 Other reported problems concern angle matching, figure rotation, figure-ground discrimination, mental reconstruction of three-dimensional objects from line drawings, and the perception of spatial position, shape, and size.1,26 However, because not all visuospatial functions are impaired and because some of these findings have not been replicated in other studies, Brown and Marsden34 argue against a generalized visuospatial deficit.16 We included two motor-free items in the visuospatial domain of the initial scale.

Verbal functions. Patients with PD often perform within the normal range on tests of verbal function. Patients without overt dementia have naming-test scores one standard deviation below those of controls, but still within the normal range for the test. PD patients with dementia make more naming errors than non-demented patients, but their scores remain within acceptable limits.1,2,35,36 In general, verbal functions clearly tend to remain normal, and hence they were not addressed in the scale.

Thinking and reasoning. Thinking and reasoning have not been assessed thoroughly in PD, but the available information suggests that patients with PD tend to perform normally.14,30,37 No items from this domain were included.

Bradyphrenia. Slowing of information processing has frequently been reported,21,38 but not

consistently found.21,22,28 It is imperative to differentiate cognitive slowing from depressive

and motor slowing, both of which are frequently present in PD. Mayeux et al.39 reported that

psychomotor retardation is an important component of depression, which is found in

approximately 40% of the patients with PD. Cognitive slowing may be differentiated from motor slowing by keeping the motor demands of a task constant while increasing its cognitive complexity: a prolongation of response time that grows with task complexity then reflects cognitive slowing. Studies using the choice reaction time paradigm, however, mostly failed to find evidence of cognitive slowing. One explanation for this is that

the cognitive demands may be too simple.22,28 Dubois et al.22 stated that cognitive slowing has

only been clearly demonstrated on tests that require a high level of processing (e.g. Tower of

London, Stroop task). However, these findings may indicate a disturbance in cognitive

strategy, rather than a true slowing of central processing.22 In a review on this subject, Dubois

et al.22 concluded that bradyphrenia, which they defined as a non-specific lengthening of

information-processing time, is not demonstrated in PD, although cognitive slowing may

result from impaired executive functioning. Brown and Marsden28 arrived at the same

conclusion and stated that "cognitive slowing is not a useful or valid general description of the

characteristics of cognitive dysfunction observed in PD". Taking these arguments into

consideration, we decided not to address this topic separately.
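The logic of separating motor from cognitive slowing can be made concrete with a small numerical sketch. It assumes, purely for illustration, that mean response time rises linearly with task complexity (here, the number of response alternatives). Under that assumption, a group difference in the intercept corresponds to slowing that is present at every complexity level, as expected for motor slowing, whereas a group difference in the slope corresponds to slowing that grows with complexity, that is, cognitive slowing. The data and the function name `slope_intercept` are hypothetical.

```python
from statistics import mean


def slope_intercept(xs, ys):
    """Ordinary least-squares fit ys = a + b * xs; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical mean response times (ms) at four levels of cognitive complexity,
# e.g. one to four response alternatives in a choice reaction task.
complexity = [1, 2, 3, 4]
rt_controls = [400, 450, 500, 550]
rt_patients = [520, 570, 620, 670]

a_c, b_c = slope_intercept(complexity, rt_controls)
a_p, b_p = slope_intercept(complexity, rt_patients)
# Intercept difference: slowing present at every complexity level (motor slowing).
# Slope difference: slowing that grows with complexity (cognitive slowing).
print(f"intercept difference: {a_p - a_c:.0f} ms")   # intercept difference: 120 ms
print(f"slope difference: {b_p - b_c:.0f} ms per level")  # slope difference: 0 ms per level
```

With these invented numbers the patients are 120 ms slower at every level while the slopes are identical, a pattern consistent with motor slowing only; a steeper slope in the patient group would instead point to cognitive slowing.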

Taken together, the domains that were found to be affected most frequently were attention, memory, executive functions, and visuospatial functions, and items from these domains were included in the initial scale. Verbal functions and thinking and reasoning, on the other hand, tended to be normal, and no items from these domains were included. Altogether, this resulted in an initial scale of 19 items (see Appendix).

Participants

Patients. Patients with idiopathic PD who gave informed consent and fulfilled the United Kingdom Parkinson’s Disease Society Brain Bank criteria for idiopathic PD,40 were included. Patients with a history of brain surgery or deep brain stimulation, patients with other diseases of the central nervous system, and patients who were not able to read or understand Dutch were excluded. The patients were selected from the database of patients with idiopathic PD of the outpatient clinic of the Department of Neurology of the Leiden University Medical Center. We aimed to include 75 patients and 75 controls. The patient sample had to cover the whole spectrum of the disease, with at least 25 patients each with mild, moderate, and severe idiopathic PD.

Controls. Persons without idiopathic PD and without other diseases of the central nervous

system who had no history of brain surgery were included, provided that they gave informed

consent and were able to read and understand Dutch.

Recruitment. Eligible patients received a letter in advance, in which they were informed of the goal of the study and the procedures at the time of assessment. The letter also stated that an investigator would contact the patient and ask whether he or she would consider participating. Patients were also requested to provide the names of two persons, one male and one female, who consented to participate as control subjects. The age difference between the patient and his or her controls was not to exceed 10 years. The introductory letter emphasised that only the names of persons who had explicitly expressed their willingness to participate were to be provided. In recruiting the controls, a 'match-to-sample' procedure was followed, ensuring that the distribution of age, sex, and education level was similar in both groups.

The study was approved by the institutional review board.

Other assessment instruments

The other scales that were used for comparison in this study were the Dutch version of the

CAMCOG,41 the Mini Mental State Examination (MMSE),42 and the Hoehn and Yahr (H&Y)

scale.43 The CAMCOG is part of the Cambridge Examination of Mental Disorders of the

Elderly (CAMDEX),44 and evaluates cognitive functioning. The CAMCOG is administered

by a trained interviewer and has 60 items with a maximum score of 107, with higher scores

reflecting better performance. Eight subscales are distinguished: orientation, language,

memory, attention, praxis, calculation, abstract thinking, and perception. In our study we used

59 items with a maximum score of 106. One item (recognition of person, function) was

excluded, because it was difficult to execute in our test situation. As advised by the

developers, a cut-off value of 79/80 was used to discriminate between demented and non-

demented individuals.41 The CAMCOG has good validity and reliability45 and has previously

been used to assess cognition in patients with PD.46 The MMSE is also included in the

cognition section of the CAMDEX. The MMSE is administered by an interviewer and has 19

items with a maximum score of 30. Higher scores reflect better performance. There are five

subscales: orientation, registration, attention and calculation, recall, and language. For the

MMSE a cut-off point of 24/25 was used to separate demented from non-demented

participants. Fourteen items of the MMSE are included in the CAMCOG score. The H&Y

staging system was used to assess disease severity. H&Y 1 is the mildest stage with only

unilateral symptoms, whereas H&Y 5 represents the most severe stage, in which patients are

wheelchair-bound or bed-ridden. To discriminate between groups of patients with different

disease severity, we classified patients as mild (H&Y 1 and 2), moderate (H&Y 3), or severe

(H&Y 4 and 5). Because patients in H&Y stages 1 and 5 are usually underrepresented, stages

1 and 2 on the one hand, and 4 and 5 on the other hand, were collapsed. All the patients were

assessed when they had the full benefit of their medication (i.e., were 'on'). The participants

were examined by three trained investigators (JM, MV, NV).
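The cut-off conventions and the collapsing of Hoehn and Yahr stages described above can be written down as a few small functions. This is a minimal sketch: the function names are hypothetical, and reading a cut-off of 79/80 as "79 or lower is classified as demented" (and likewise 24/25 for the MMSE) is the usual interpretation of this notation rather than something spelled out in the text. The score ranges checked are those of the 59-item CAMCOG version (maximum 106) and the MMSE (maximum 30) used in this study.

```python
CAMCOG_CUTOFF = 79  # 79/80: a score of 79 or lower is classified as demented
MMSE_CUTOFF = 24    # 24/25: a score of 24 or lower is classified as demented


def camcog_demented(score):
    """Classify a total on the 59-item CAMCOG version used here (maximum 106)."""
    if not 0 <= score <= 106:
        raise ValueError("score outside the 0-106 range")
    return score <= CAMCOG_CUTOFF


def mmse_demented(score):
    """Classify an MMSE total (maximum 30)."""
    if not 0 <= score <= 30:
        raise ValueError("score outside the 0-30 range")
    return score <= MMSE_CUTOFF


SEVERITY = {1: "mild", 2: "mild", 3: "moderate", 4: "severe", 5: "severe"}


def severity_group(hy_stage):
    """Collapse Hoehn and Yahr stages 1-5 into mild (1-2), moderate (3), severe (4-5)."""
    if hy_stage not in SEVERITY:
        raise ValueError(f"unexpected Hoehn and Yahr stage: {hy_stage}")
    return SEVERITY[hy_stage]


# Usage: collapse the stage counts of the patients reported in table 1.
stage_counts = {1: 4, 2: 23, 3: 33, 4: 19, 5: 6}
totals = {}
for stage, count in stage_counts.items():
    group = severity_group(stage)
    totals[group] = totals.get(group, 0) + count
print(totals)  # {'mild': 27, 'moderate': 33, 'severe': 25}
```

The collapsed counts (27 mild, 33 moderate, 25 severe) meet the recruitment target of at least 25 patients per severity group.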

Outcome measures and statistical analysis

Item reduction. Items were reduced in a two-step process. First, the quality of the data was assessed, and only items that had missing values in less than 5% of the patients and showed adequate reproducibility (weighted kappa [Kw] ≥ 0.40; quadratic weights) were retained. The second step was to select, per domain, those items that best discriminated between patients and controls. To this end, the Mann-Whitney U test was used, with a significance level of 0.05.
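The two-step item reduction can be sketched as follows. The Mann-Whitney U statistic is computed from mid-ranks, and the two-sided p-value uses the tie-corrected normal approximation without continuity correction. In practice a statistics package would be used, and the procedure implemented in SPSS may differ in detail. `retain_item` is a hypothetical helper that reduces the second step to a significance threshold, whereas the study selected, per domain, the items that discriminated best.

```python
import math


def mann_whitney_u(x, y):
    """U statistic for sample x versus y and a two-sided p-value from the
    tie-corrected normal approximation (no continuity correction)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1                          # extend the block of tied values
        mid_rank = (i + j) / 2 + 1          # ranks are 1-based
        t = j - i + 1
        tie_term += t ** 3 - t
        rank_sum_x += mid_rank * sum(1 for _, g in pooled[i:j + 1] if g == 0)
        i = j + 1
    u = rank_sum_x - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:                          # every value tied: no evidence of a difference
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))   # two-sided p = 2 * (1 - Phi(|z|))


def retain_item(missing_fraction, kw, p_value, alpha=0.05):
    """Hypothetical encoding of the retention rule: <5% missing values,
    Kw >= 0.40, and a significant patient-control difference."""
    return missing_fraction < 0.05 and kw >= 0.40 and p_value < alpha
```

For an item, `mann_whitney_u(patient_scores, control_scores)` returns the U statistic and its approximate p-value, which can then be combined with the data-quality and reproducibility results in `retain_item`.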

Reliability. Test-retest reliability was assessed in 30 patients with an interval of six weeks. Since test-retest reliability is preferably assessed in stable patients, an external criterion (i.e., outside the SCOPA-COG) was chosen to determine whether patients were stable. Patients whose MMSE scores changed by 5 or more points were considered unstable, and their scores were not used for the reproducibility evaluation. For individual items Kw was used, whereas an intraclass correlation coefficient (ICC) was used for the reproducibility of sum scores. Since the kappa statistic is unstable in the case of limited variation,47 the percentage agreement was calculated for tests in which the chance-expected agreement for a particular response exceeded 0.85. Internal consistency was assessed with Cronbach's coefficient α.
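A minimal sketch of the item-level and scale-level reliability statistics named above follows, in plain Python. Quadratic weights are those stated for Kw. The switch to percentage agreement follows one reading of "chance-expected agreement for a particular response exceeded 0.85", namely that the product of the test and retest proportions for some response category exceeds 0.85; this interpretation, and all function names, are assumptions.

```python
from statistics import variance


def quadratic_weighted_kappa(r1, r2, categories):
    """Cohen's weighted kappa with quadratic disagreement weights.

    r1, r2: item scores of the same patients at test and retest.
    categories: the ordered response categories of the item.
    """
    if len(r1) != len(r2):
        raise ValueError("test and retest must have the same length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two response categories are needed")
    idx = {c: i for i, c in enumerate(categories)}
    n = len(r1)
    obs = [[0] * k for _ in range(k)]          # contingency table: test x retest
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2     # 0 on the diagonal, 1 for maximal disagreement
            observed += w * obs[i][j] / n
            expected += w * row[i] * col[j] / n ** 2
    if expected == 0:
        raise ValueError("kappa is undefined when responses do not vary")
    return 1 - observed / expected


def max_chance_agreement(r1, r2, categories):
    """Largest chance-expected agreement for any single response category."""
    n = len(r1)
    return max((r1.count(c) / n) * (r2.count(c) / n) for c in categories)


def percent_agreement(r1, r2):
    return 100 * sum(a == b for a, b in zip(r1, r2)) / len(r1)


def item_reproducibility(r1, r2, categories, threshold=0.85):
    """Kw, or percentage agreement when one response dominates (assumed rule)."""
    if max_chance_agreement(r1, r2, categories) > threshold:
        return "percent agreement", percent_agreement(r1, r2)
    return "weighted kappa", quadratic_weighted_kappa(r1, r2, categories)


def cronbach_alpha(items):
    """items: one list per item, each holding the scores of all respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))
```

Because the same (sample) variance is used for the items and the total, the choice of n or n-1 in the denominator cancels out of Cronbach's α.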

Validity. 'Known-groups' validity was assessed by comparing the SCOPA-COG scores of participants grouped by CAMCOG scores (cut-off point 79/80) and MMSE scores (cut-off point 24/25), using a t-test for independent samples. 'Known-groups' validity was also examined by comparing the SCOPA-COG scores of patients grouped by modified H&Y stages (mild, moderate, severe), using analysis of variance (ANOVA) and ordinal regression.

To assess the underlying structure, an exploratory factor analysis was performed on the final version of the scale. An extraction method with oblique rotation was used, because the factors were assumed to be correlated.
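The test statistics behind the two 'known-groups' comparisons can be sketched as below: a pooled-variance t statistic for two groups (for example, participants above and below the CAMCOG cut-off) and a one-way ANOVA F statistic for the three severity groups. Only the statistics are computed; p-values would come from the t and F distributions in a statistics package. The ordinal regression and the factor analysis are not covered, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Student's t for two independent samples, assuming equal variances."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))


def anova_f(*groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    all_scores = [v for g in groups for v in g]
    grand = mean(all_scores)
    k, n = len(groups), len(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = 0.0
    for g in groups:
        m = mean(g)
        ss_within += sum((v - m) ** 2 for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

For two groups the one-way ANOVA reduces to the pooled t-test (F equals t squared), which provides a quick consistency check between the two functions.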

Discrimination. The proportion of the scale range used by patients was calculated to evaluate the discriminative properties of the individual scales. The coefficients of variation (CV), that is, the standard deviation divided by the mean, were also calculated and compared. Higher values of CV reflect a greater ability to detect differences between individuals and thus allow finer discrimination.
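Both discrimination indices are simple to compute. The sketch below assumes the sample standard deviation (n-1 denominator) for the CV, which the text does not specify. Because the CV is unit-free, it allows comparison between scales with different maxima, such as the SCOPA-COG (43), the CAMCOG as used here (106), and the MMSE (30). The function names are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(scores):
    """Sample standard deviation divided by the mean."""
    m = mean(scores)
    if m == 0:
        raise ValueError("the CV is undefined for a zero mean")
    return stdev(scores) / m


def proportion_of_range_used(scores, scale_min, scale_max):
    """Observed score range as a fraction of the possible range of the scale."""
    return (max(scores) - min(scores)) / (scale_max - scale_min)
```

For example, `proportion_of_range_used(patient_scores, 0, 43)` would give the fraction of the SCOPA-COG range covered by a group of patients.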

The data were analysed with the Statistical Package for the Social Sciences, release 10 (SPSS

Inc, Chicago, Illinois).

Results

Participants

Eighty-five patients and 75 controls were assessed between 1 January 2002 and 1 July 2002.

Demographic data of patients and controls were similar for male-female ratio, age, and

education (table 1). The required number of patients per disease category (mild, moderate, severe) was included, although four patients with severe PD were recruited from another hospital (Medical Center Haaglanden, The Hague, The Netherlands). This was necessary because some of our patients with severe PD declined to participate, whereas others had previously been involved in pilot testing; we were therefore able to recruit only 21 patients with severe PD from our outpatient clinic.

Item selection

The percentage of missing values was very low, ranging from 0 to 2.3%.

Three patients who participated in the retest assessments were considered unstable, since their MMSE scores differed by 5 or more points between the two assessments, and their scores were excluded from the reproducibility evaluation. Reproducibility was therefore assessed in 27 patients. Seven items (2, 6, 10, 11, 14, 16, 17) had weighted kappa values lower than 0.40 and were removed (table 2). Of the remaining 12 items, two items (3, 8) did not discriminate between patients and controls and were also excluded (table 2). All three executions of item 5 discriminated between patients and controls, but the score gains from the first to the second and from the second to the third execution did not differ between these groups. Hence, no difference between groups was found in the capacity to 'learn' from additional presentations of these ten words, and we therefore decided to administer this item only once in the final version of the scale.

Table 1. Characteristics of participants

                                              Patients      Controls      p-value
number of participants                        85            75
% male                                        62            56            0.431¹
mean (SD) age (years)                         69.7 (10.5)   69.4 (12.3)   0.852²
mean (SD) years of education                  10.6 (3.9)    11.0 (3.7)    0.552²
median (IQ range)³ level of education⁴        3 (2,6)       4 (2,6)       0.085⁵
Hoehn and Yahr stages
  1                                           4 (4.7%)
  2                                           23 (27.1%)
  3                                           33 (38.8%)
  4                                           19 (22.4%)
  5                                           6 (7.1%)
mean (SD) disease duration (years)            9.5 (7.1)
number on levodopa                            71 (84%)
mean (SD) levodopa dose (mg)                  660 (298)
number on dopamine agonists                   49 (58%)
number on other antiparkinsonian medication   46 (54%)

¹ Chi-square test; ² t-test for independent samples; ³ interquartile range (between 25th and 75th percentile); ⁴ assessed with an ordinal scale ranging from 1 (primary school) to 8 (university degree); ⁵ Mann-Whitney U test



Thus, nine items were removed from the initial scale altogether. The remaining 10 items fulfilled our criteria for eligibility and displayed good data quality, reproducibility, and discriminative properties. Adequate content coverage was ensured, because items of sufficient quality were retained in all domains. In the memory domain, items addressed both visual and verbal memory, and both immediate and delayed recall were assessed. In the domain of executive functions, items were included that evaluated fluency, set-shifting, and motor planning. Two items were included in the area of attention and one in the area of visuospatial functions. The maximum score of the SCOPA-COG is 43, with higher scores reflecting better cognitive performance. The SCOPA-COG is administered by an interviewer and takes 10-15 minutes to complete.

Clinimetric characteristics of the SCOPA-COG

Reliability. Weighted kappa values for individual items ranged from 0.40 to 0.75 (table 2), and the ICC for the sum score was 0.78 (table 3). The reproducibility of the sum score of the SCOPA-COG was higher than that of the CAMCOG and the MMSE (0.72 and 0.66, respectively). The internal consistency of the SCOPA-COG and the CAMCOG was similar, with Cronbach's alphas of 0.83 and 0.84, respectively. These values were considerably higher than that of the MMSE (0.62; table 3). The corrected item-total correlations of the SCOPA-COG ranged from 0.34 (counting down by threes) to 0.66 (delayed recall). None of the items could be deleted without a decrease in Cronbach's α.

Validity. Thirteen subjects (10 patients and 3 controls) scored below the cut-off point of the CAMCOG and were considered demented according to this criterion. The mean CAMCOG score was 94.2 (SD 6.4) for the 147 non-demented participants and 68.2 (SD 12.2) for the demented subjects. The SCOPA-COG scores of these non-demented and demented persons were 28.8 (SD 5.8) and 13.3 (SD 4.0), respectively, and differed significantly (p < 0.001). Twenty-six individuals (21 patients and 5 controls) scored below the cut-off point of the MMSE. The mean MMSE score was 27.9 (SD 1.6) for the 134 non-demented participants and 21.5 (SD 3.5) for the demented participants. Their SCOPA-COG scores were 29.0 (SD 6.1) and 20.2 (SD 7.3), respectively, and differed significantly (p < 0.001).

The mean SCOPA-COG score was 30.7 (SD 5.6) for controls and 24.8 (SD 7.1) for patients

(table 4). This score difference was significant (p < 0.001). The SCOPA-COG scores of

patients with mild, moderate, and severe disease status also differed significantly (ANOVA; p

< 0.001), and measured 28.5 (SD 5.5), 24.1 (SD 6.5), and 21.7 (SD 7.7), respectively (table 5,

figure 1).


Table 2. Reproducibility and discrimination of SCOPA-COG scores

SCOPA item                       range   significance¹   Kw²
memory
 1. cubes*                       0-5     +                0.49
 2. visual retention             0-5     +                0.23
 3. digits forward               0-9     -                0.59
 4. digits backward*             0-8     +                0.50
 5. a. word recall*              0-5     +                0.40
    b. word recall               0-5     +
    c. word recall               0-5     +
attention
 6. counting down by ones        0-2     -               -0.02
 7. counting down by threes*     0-2     +                0.54
 8. serial sevens                0-5     -                0.52
 9. months backward*             0-2     +                0.44
executive functions
10. tapping                      0-3     +                0.29
11. go-no-go                     0-3     +                0.39
12. fist-edge-palm*              0-3     +                0.46
13. dices*                       0-3     +                0.50
14. letter-digit alternation     0-3     +                0.34
15. fluency animals*             0-6     +                0.70
16. odd-man-out                  0-4     +                0.37
visuospatial functions
17. line orientation             0-4     -                0.38
18. figure assembly*             0-5     +                0.45
delayed recall
19. delayed word recall*         0-5     +                0.75

¹ '+' indicates a significant difference (p ≤ 0.05) between patients and controls (Mann-Whitney U test); ² weighted kappa, assessed in 27 stable patients; * item included in the final version of the SCOPA-COG.
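Table 2 reports weighted kappa (Kw) for test-retest agreement on each item, computed only in patients whose MMSE score changed by less than 5 points. A minimal Python sketch of both steps follows. The weighting scheme is an assumption: this section does not say whether linear or quadratic weights were used, so both are offered. The function names and the record layout `(item_test, item_retest, mmse_test, mmse_retest)` are hypothetical.

```python
from collections import Counter


def stable_only(records, max_mmse_change=5):
    """Drop unstable patients: those whose MMSE changed by 5 or more points.

    Each record is (item_test, item_retest, mmse_test, mmse_retest).
    """
    return [r for r in records if abs(r[2] - r[3]) < max_mmse_change]


def weighted_kappa(test, retest, categories, weights="linear"):
    """Cohen's weighted kappa for test-retest scores on one ordinal item.

    categories: the ordered possible scores, e.g. range(6) for an item scored 0-5.
    weights: 'linear' or 'quadratic' disagreement weights (an assumption).
    """
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    cats = list(categories)
    k = len(cats)
    pos = {c: i for i, c in enumerate(cats)}
    n = len(test)
    power = 1 if weights == "linear" else 2

    def disagreement(i, j):
        # 0 on the diagonal, 1 for the largest possible disagreement
        return (abs(i - j) / (k - 1)) ** power

    observed = Counter((pos[a], pos[b]) for a, b in zip(test, retest))
    test_margin = Counter(pos[a] for a in test)
    retest_margin = Counter(pos[b] for b in retest)

    observed_dis = sum(disagreement(i, j) * count for (i, j), count in observed.items()) / n
    expected_dis = sum(
        disagreement(i, j) * test_margin[i] * retest_margin[j]
        for i in range(k)
        for j in range(k)
    ) / n**2
    return 1 - observed_dis / expected_dis


# Hypothetical records for four patients on a 0-5 item (not study data).
records = [(5, 5, 28, 27), (4, 3, 26, 26), (2, 4, 29, 23), (3, 3, 25, 24)]
stable = stable_only(records)  # the third patient (MMSE change of 6) is dropped
kw = weighted_kappa([r[0] for r in stable], [r[1] for r in stable], range(6))
print(round(kw, 2))  # 0.67
```

Kappa is undefined when every patient has the same score on both occasions, because the expected disagreement is then zero; the resulting `ZeroDivisionError` signals this case.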



Table 3. Reliability of sumscores of (sub)scales SCOPA-COG, CAMCOG, and MMSE

Scales and subscales           reproducibility¹   % agreement   Cronbach α
SCOPA-COG total score          0.78                             0.83
  memory (4)                   0.66
  attention (2)                0.59
  executive (3)                0.71
  visuospatial (1)             0.45
CAMCOG total score             0.72                             0.84
  orientation (10)             0.02²              50
  language (17)                0.53
  memory (13)                  0.61
  attention (2)                0.42
  praxis (8)                   0.33
  calculation (2)              -0.04³             89
  abstract thinking (4)        0.49
  perception (3)               0.53
MMSE total score               0.66                             0.62
  orientation (10)             0.02²              50
  registration (1)             0.00³              93
  attention & calculation (1)  0.52
  recall (1)                   0.51
  language (6)                 0.64

Numbers in round brackets indicate the number of items in that subscale. ¹ intraclass correlation coefficient used for sumscores and weighted kappa used for individual items; ² low value because of a very skewed distribution, so percentage agreement is indicated as well; ³ Kw statistic unstable because the expected probability for a particular score exceeds 0.85,47 so percentage agreement is indicated as well.
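Footnotes 2 and 3 describe a known limitation of kappa: when almost all scores fall into one category, the agreement expected by chance is itself very high, so kappa can be close to zero, or even negative, despite high percentage agreement. The sketch below illustrates this with an unweighted kappa on a 2 × 2 table. The counts are invented for illustration (they are not the registration or calculation data of table 3), and the function name is hypothetical.

```python
def agreement_and_kappa(both_max, test_max_only, retest_max_only, both_lower):
    """Percentage agreement, chance agreement, and unweighted kappa for a 2 x 2 table.

    Categories: 'maximum score' versus 'lower score', on test and on retest.
    """
    n = both_max + test_max_only + retest_max_only + both_lower
    p_observed = (both_max + both_lower) / n
    p_test_max = (both_max + test_max_only) / n
    p_retest_max = (both_max + retest_max_only) / n
    # Chance agreement is dominated by the crowded 'maximum' category.
    p_expected = p_test_max * p_retest_max + (1 - p_test_max) * (1 - p_retest_max)
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, p_expected, kappa


# Invented counts for 100 patients: 93 at the maximum on both occasions.
p_obs, p_exp, kappa = agreement_and_kappa(93, 4, 3, 0)
print(f"agreement {p_obs:.0%}, expected {p_exp:.3f}, kappa {kappa:.2f}")
# agreement 93%, expected 0.932, kappa -0.04
```

Here 96-97% of patients score the maximum on each occasion, so chance agreement (about 0.93) almost equals observed agreement and kappa ends up slightly negative. Reporting percentage agreement alongside kappa, as table 3 does, keeps such cases interpretable.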


Table 4. SCOPA-COG, CAMCOG, and MMSE scores of patients and controls

                          maximum¹   patients      controls      p-value   patients/controls²
                                     mean (SD)     mean (SD)
SCOPA sumscore            43         24.8 (7.1)    30.7 (5.6)    < 0.001   0.81
SCOPA domain scores:
  memory                  23         9.8 (3.8)     12.7 (3.6)    < 0.001   0.77
  attention               4          2.9 (1.2)     3.5 (0.8)     < 0.001   0.83
  executive               12         8.2 (2.2)     10.1 (1.9)    < 0.001   0.81
  visuospatial            5          3.9 (1.2)     4.4 (0.8)     0.004     0.89
CAMCOG sum                106        89.1 (11.2)   95.4 (7.0)    < 0.001   0.94
CAMCOG domain scores:
  orientation             10         9.5 (1.1)     9.7 (0.5)     0.045     0.98
  language                30         26.1 (3.1)    27.8 (2.2)    < 0.001   0.94
  memory                  27         21.0 (3.4)    22.6 (2.8)    0.002     0.93
  attention               7          5.6 (1.7)     6.1 (1.2)     0.045     0.92
  praxis                  12         10.4 (1.8)    11.4 (1.0)    < 0.001   0.91
  calculation             2          1.9 (0.3)     1.9 (0.2)     0.324     1.00
  abstraction             8          6.3 (1.9)     6.9 (1.3)     0.023     0.91
  perception              10         8.2 (1.6)     9.1 (1.3)     < 0.001   0.90
MMSE sumscore             30         26.0 (3.7)    27.7 (2.0)    < 0.001   0.94
MMSE domain scores:
  orientation             10         9.5 (1.1)     9.7 (0.5)     0.045     0.98
  registration            3          2.9 (0.4)     3.0 (0.1)     0.063     0.97
  attention               5          3.9 (1.0)     4.3 (1.4)     0.042     0.91
  recall                  3          1.6 (1.0)     2.3 (0.9)     < 0.001   0.70
  language                9          8.1 (1.3)     8.4 (0.7)     0.032     0.96

¹ maximum score per (sub)scale; ² mean score of patients expressed as a proportion of the mean score of controls

Ordinal regression analysis of the H&Y data on the SCOPA-COG scores of all participants resulted in a well-fitting model (p = 0.90) with a significant trend (p < 0.001), accounting for 29% of the variance. A regression analysis performed on the data of patients only resulted in a well-fitting model (p = 0.71) with a significant trend (p < 0.001), accounting for 25% of the variance (table 5). Including age as a covariate in this model produced a model with good fit and a significant trend that explained 38% of the variance. Age was included because it correlated significantly with the SCOPA-COG scores and differed significantly between groups of patients with different disease severity. Significant trends were also obtained for all domains of the SCOPA-COG (table 5). A similar analysis was performed on the CAMCOG data (table 6).

The correlations of the SCOPA-COG score with the CAMCOG and MMSE scores were 0.83 and 0.72, respectively (table 7). The SCOPA-COG and CAMCOG correlated equally strongly with the H&Y scores, whereas the correlation between the MMSE and H&Y was somewhat lower.

The SCOPA-COG scores of male and female patients were not significantly different (table 8). Cognition scores generally correlated more strongly with age than with education.

The factor analysis revealed two factors with an eigenvalue greater than 1, accounting for 50.5% of the variance (table 9). All memory items and two of the three executive function items (fist-edge-palm, animal fluency) loaded on the first factor, whereas the attention items and the third executive function item (dices) loaded on the second factor. The visuospatial task had similar loadings on both factors. The correlation between the two factors was 0.37.

Discrimination. Scale scores in the patient group ranged from 10 to 30 for the MMSE, 38 to 105 for the CAMCOG, and 8 to 37 for the SCOPA-COG. The proportion of the scale range used by patients was thus very similar, between 63% and 67%. If, however, the difference in mean scores between patients and controls is expressed as a percentage of the scale range, the values differ considerably: 6.5% for the MMSE, 7.1% for the CAMCOG, and 23.7% for the SCOPA-COG. If the mean scores of controls and of patients with different disease severity are expressed as percentages of the maximum possible score, the range spanned by these groups is 12% of the maximum for both the MMSE and the CAMCOG, and 21% for the SCOPA-COG (figure 2). The CV, which reflects the ability to detect differences between individuals, was calculated for the patient group and was 0.14 for the MMSE, 0.13 for the CAMCOG, and 0.29 for the SCOPA-COG.
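Two of the discrimination indices above can be recomputed directly from the tables: the CV (standard deviation divided by the mean, here from the patient means and SDs in table 4) and the range spanned by the group means expressed as percentages of the maximum score (from the means by disease severity in tables 5 and 6). The following Python sketch does this; the names are hypothetical, and the MMSE is left out of the second calculation because its means by disease severity are not tabulated here.

```python
# Patient means and SDs from table 4.
PATIENT_SCORES = {"MMSE": (26.0, 3.7), "CAMCOG": (89.1, 11.2), "SCOPA-COG": (24.8, 7.1)}

# Means of controls and of mild, moderate, and severe PD (tables 5 and 6),
# with the maximum possible score of each scale (table 4).
GROUP_MEANS = {
    "CAMCOG": ([95.5, 94.3, 89.7, 82.6], 106),
    "SCOPA-COG": ([30.7, 28.5, 24.1, 21.7], 43),
}


def coefficient_of_variation(mean, sd):
    """CV = SD / mean; a larger CV means scores are spread more widely relative to their level."""
    return sd / mean


def spread_of_maximum(means, maximum):
    """Range of the group means, in percentage points of the maximum possible score."""
    percentages = [100 * m / maximum for m in means]
    return max(percentages) - min(percentages)


for scale, (mean, sd) in PATIENT_SCORES.items():
    print(scale, round(coefficient_of_variation(mean, sd), 2))
# prints MMSE 0.14, CAMCOG 0.13, SCOPA-COG 0.29 (one per line)

for scale, (means, maximum) in GROUP_MEANS.items():
    print(scale, round(spread_of_maximum(means, maximum)))
# prints CAMCOG 12, SCOPA-COG 21 (one per line)
```

Both results agree with the values reported in the text after rounding.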

Table 5. Comparisons of SCOPA-COG scores between controls and patients classified by modified Hoehn and Yahr stage

                   controls  mild PD¹  moderate PD¹  severe PD¹  fit²   trend³    pseudo R²⁴  adj. pseudo R²⁵
SCOPA-COG total    30.7      28.5      24.1          21.7        0.85   < 0.001   0.23        0.38
memory             12.7      11.8      9.2           8.5         0.84   < 0.001   0.18        0.37
attention          3.5       2.9       3.1           2.6         0.83   < 0.001   0.12        0.18⁶
executive          10.1      9.3       7.8           7.5         0.37   < 0.001   0.23        0.34
visuospatial       4.4       4.6       3.9           3.3         0.77   < 0.001   0.14        0.20

¹ disease severity classified by modified Hoehn and Yahr (H&Y) stage: H&Y 1 and 2 'mild', H&Y 3 'moderate', and H&Y 4 and 5 'severe'; ² goodness of fit: p > 0.05 indicates that the model does not differ significantly from a model with a good fit; ³ p < 0.05 indicates that a trend is present (trend in the model significantly different from zero); p > 0.05 indicates absence of a trend; ⁴ pseudo R²: proportion of the variance accounted for by the model; ⁵ adjusted pseudo R²: proportion of the variance accounted for by the model with age included as a covariate; ⁶ R² not informative because of insufficient fit.

Table 6. Comparisons of CAMCOG scores between controls and patients classified by modified Hoehn and Yahr stage

                   controls  mild PD¹  moderate PD¹  severe PD¹  fit²   trend³    pseudo R²⁴  adj. pseudo R²⁵
CAMCOG total       95.5      94.3      89.7          82.6        0.61   < 0.001   0.18        0.30
orientation        9.7       9.8       9.6           9.0         0.83   0.027     0.06        0.08
language           27.8      27.6      26.0          24.6        0.65   < 0.001   0.17        0.24
memory             22.6      22.2      21.4          19.3        0.80   < 0.001   0.11        0.20
attention          6.1       5.7       5.9           5.0         0.53   0.202⁷    0.03⁸       0.03⁸
praxis             11.4      11.4      10.2          9.6         0.55   < 0.001   0.19        0.28
calculation        1.9       1.9       1.9           1.8         0.00⁶  0.444⁷    0.02⁸       0.02⁸
abstract thinking  6.9       6.8       6.2           6.0         0.33   0.145⁷    0.03⁸       0.07
perception         9.1       8.9       8.4           7.3         0.20   < 0.001   0.16        0.35⁸

¹ disease severity classified by modified Hoehn and Yahr (H&Y) stage: H&Y 1 and 2 'mild', H&Y 3 'moderate', and H&Y 4 and 5 'severe'; ² goodness of fit: p > 0.05 indicates that the model does not differ significantly from a model with a good fit; ³ p < 0.05 indicates that a trend is present (trend in the model significantly different from zero); p > 0.05 indicates absence of a trend; ⁴ pseudo R²: proportion of the variance accounted for by the model; ⁵ adjusted pseudo R²: proportion of the variance accounted for by the model with age included as a covariate; ⁶ insufficient fit; ⁷ trend not significant; ⁸ R² not informative because of insufficient fit or non-significant trend.


Table 7. Correlations among scales in patient group

             Hoehn and Yahr (rs)¹   SCOPA-COG (r)²   CAMCOG (r)²
MMSE         -0.28³                 0.72             0.83
CAMCOG       -0.40                  0.83
SCOPA-COG    -0.39

¹ Spearman's correlation coefficient (rho); ² Pearson's product-moment correlation coefficient; ³ correlation significant at p < 0.01; all other correlations significant at p < 0.001

Other findings

Linear regression analysis, using both the forward and the backward stepwise method on the scores of all three cognition scales, revealed that age, disease severity (i.e., H&Y), and level of education were the variables that contributed most to explaining the variance in scores among participants. The adjusted R² values of the resulting models were 0.23 for the MMSE, 0.42 for the CAMCOG, and 0.47 for the SCOPA-COG. When this regression method was applied to the data of the patients only, the same variables emerged, with adjusted R² values of 0.23 for the MMSE, 0.42 for the CAMCOG, and 0.34 for the SCOPA-COG.

Table 8. Relation between cognition and sex, age, and education in patients

             males        females      p-value   r age²   r years of          rs level of
             mean (sd)    mean (sd)    m/f¹               education²          education³
MMSE         26.2 (3.6)   25.8 (3.9)   0.60      -0.45    0.20 (p = 0.072)    0.19 (p = 0.081)
CAMCOG       90.7 (9.6)   86.4 (13.2)  0.09      -0.50    0.33 (p = 0.002)    0.43 (p < 0.001)
SCOPA-COG    25.1 (7.1)   24.3 (7.3)   0.60      -0.54    0.20 (p = 0.069)    0.30 (p = 0.005)

¹ p-value for differences in cognition scores between sexes; ² Pearson's correlation coefficient; all correlations with age significant (p < 0.001); ³ Spearman's correlation coefficient



Table 9. Factor analysis

                             factor 1   factor 2
1. cubes                     0.51
2. digits backwards          0.61
3. word recall               0.71
4. counting by threes                   0.64
5. months backwards                     0.71
6. fist-edge-palm            0.79
7. dices                                0.62
8. animal fluency            0.69
9. figure assembly           0.41       0.44
10. delayed recall words     0.63

Pattern matrix of exploratory factor analysis with oblique rotation; loadings < 0.40 not shown. Two factors with eigenvalue > 1, explaining 50.5% of the variance. Correlation between factors 0.37.

Figure 1. SCOPA-COG score and disease severity

Mean absolute SCOPA-COG scores of all participants. Controls included as

Hoehn and Yahr (H&Y) 0 (none, 'no signs of disease'; 75 persons). Disease

severity of patients categorised as mild (H&Y 1 and 2; 27 persons),

moderate (H&Y 3; 33 persons), or severe (H&Y 4 and 5; 25 persons).

[Plot: mean SCOPA-COG score (y-axis, 0-35) by modified Hoehn and Yahr stage (x-axis: none, mild, moderate, severe)]


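Figure 2. Scores expressed in percentages of maximum score

Mean relative MMSE, CAMCOG, and SCOPA-COG scores by modified H&Y stage, expressed as percentages of maximum score. Controls included as H&Y 0 ('no signs of disease'; here 'none').

[Plot: scale scores in percentages (y-axis, 0-100) by modified Hoehn and Yahr stage (x-axis: none, mild, moderate, severe); one series each for MMSE, CAMCOG, and SCOPA-COG]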

Discussion

We started this study by searching the literature for cognitive functions likely to be affected in PD, in order to select candidate items for a short, practical, and sensitive scale that evaluates the specific cognitive deficits in PD. We found that attention, active memory, executive functions, and visuospatial functions in particular were impaired, whereas verbal functions, and thinking and reasoning, were relatively spared. The pattern that emerged from this strategy is consistent with the concept of subcortical dementia. It could therefore be argued that the SCOPA-COG is an instrument for assessing subcortical functions in PD, although it was not our intention to create such an instrument. Instead, we focused on vulnerable functions, irrespective of whether these were considered cortical or subcortical.

We developed a short and practical scale that displayed good clinimetric properties in this

study. Reproducibility and internal consistency were good, indicating adequate reliability.

The construct validity of the scale was supported by the expected correlations with other

scales, and by differences found between categories of participants grouped by dementia


status, and among patients grouped by disease severity. The higher CV of the SCOPA-COG compared with the CAMCOG and the MMSE indicates that it permits finer discrimination between individuals. The scale shows a clear trend towards lower cognition scores in patients with more advanced PD, and this trend is more pronounced for the SCOPA-COG than for the other two scales (figure 2). Differences in CAMCOG and MMSE scores between controls and patients with mild PD are very small. On the SCOPA-COG these differences are larger, which may indicate that this scale is more sensitive to the changes in cognitive performance that occur early in the course of the disease. Male and female patients have similar scores, indicating that the SCOPA-COG is insensitive to the sex of the patient.

The SCOPA-COG correlates more strongly with the CAMCOG than with the MMSE, but the latter correlation is still substantial. Interestingly, the SCOPA-COG and MMSE correlate equally strongly with the CAMCOG, although the CAMCOG and SCOPA-COG share only one item, whereas the CAMCOG and MMSE share 14 items.

The first group of items that emerged from the factor analysis appears to reflect memory: all memory items load on this factor. The loading of the 'fist-edge-palm' item on this factor may be explained by the demand placed on memory to recall the sequence of movements. The same explanation applies to the 'animal fluency' task, in which memory must be activated to produce correct responses. The second factor is best characterised as attention, since both attention items load on this factor, and attention is also a prerequisite for correctly executing the 'dices' task. The 'figure assembly' task loads only moderately on both factors, which may indicate that this item is associated with a different cognitive domain altogether. This is not surprising, given the nature of the task. The finding that the executive functions do not emerge as a separate factor indicates that these functions are somewhat heterogeneous, which may be explained by the fact that functions within this domain, such as set-shifting, fluency, and alternating motor programs, are rather different. Apparently, their similarity with the other factors is stronger than the mutual resemblance among items in this domain.

Using a scale that selectively addresses some of the most frequently affected cognitive functions in PD may add to our knowledge of the cognitive difficulties that patients with this condition experience. In general, existing scales (such as the CAMCOG and MMSE) are based on a more 'cortical' concept. The CAMCOG scores in table 4 show that seven of its eight subscales differ significantly between patients and controls. This indicates that these so-called 'cortical' functions are also affected in PD, albeit to a lesser extent, as is apparent from the last column of this table. Some of the cortical functions, such as language and orientation, become affected later in the course of the disease (table 6). Using both types of scales in longitudinal research may provide important information on the interrelationship between these functions and may also shed light on several important topics.

Questions as to whether changes in 'cortical' and 'subcortical' functions occur simultaneously,

or whether one precedes the other, may be addressed. Other questions may concern the way

these changes interact, or whether changes in cortical and subcortical functions are related to

particular stages of the disease. These questions are all very relevant, even if the concept of

'subcortical dementia' as a distinctive entity is questioned.48,49 Arguments against this

distinction concern, among others, the overlap between the two dementias, the lack of a

consistent pattern among diseases that involve subcortical regions, the presence of subcortical

lesions in patients with cortical degeneration, and of cortical lesions in patients with subcortical degeneration. Brown and Marsden50 argue that the dense pattern of neural interconnections between

cortical and subcortical regions suggests that the functional organisation of the brain does not

respect such convenient anatomical distinctions. Cummings and Benson,51 however,

emphasise that subcortical dementia is a clinical and not an anatomical concept. Advocates of

the dichotomy between cortical and subcortical functions refer to the consistent pattern of

spared and impaired functions, the neuropsychological differences between patients with AD

and PD, and to the association between subcortical gray matter lesions and the usual

presentations of subcortical dementias.25,52,53 Two prominent researchers in the area of

subcortical dementia provide somewhat different definitions of this concept. Dubois and

Pillon13 define subcortical dementia as a progressive dysexecutive syndrome with memory

deficits, in the absence of aphasia, apraxia, or agnosia. According to Cummings,1 subcortical

dementia is characterised by four cardinal features: memory impairment, decline in

intellectual function, psychomotor retardation, and mood abnormalities, whereas language

function is spared and motor disturbances coexist with the dementia. In a study by Korten et

al.,54 41 clinicians were asked to indicate which cognitive domains were affected in cortical,

subcortical, and frontal dementia. Agreement with respect to the latter two was disappointing.

This once again underlines the lack of a gold standard for subcortical dementia and indicates

that it is impossible to construct a screening or diagnostic instrument at this point in time.

After all, how should a cut-off point be determined? The SCOPA-COG is hence intended as

an index of severity and may be used to compare groups in research situations.

Although there is no consensus in this area, we tend to agree with Dubois et al.22 and Lezak14 that the distinction between cortical and subcortical dementia may still be of heuristic value and may help research in this field. Our study should therefore be considered a first attempt to construct a cognitive scale that accounts for the typical phenomenology of subcortical dementia.

It should be realised that the SCOPA-COG was constructed on the basis of differences in this

sample of participants and thus may reflect certain characteristics of this particular group.

Replication of the results in a different sample is needed to add further support to this scale.

In conclusion, the SCOPA-COG is a short and practical instrument with good clinimetric

properties that focuses on the most vulnerable cognitive functions in PD. The use of this scale

may provide a better insight into specific cognitive deficits that patients with PD experience.

Acknowledgements

This study was financed by the Netherlands Organization for Scientific Research (project no.

0940-33-021). The authors thank professor R.A.C. Roos for his helpful comments on the first

version of this manuscript. Doctors W.V.M. Perquin and J. van Rossum are gratefully

acknowledged for their help in the recruitment of patients.

References

1. Cummings JL. Intellectual impairment in Parkinson's disease: clinical, pathologic, and biochemical

correlates. J Geriatr Psychiatry Neurol 1988; 1(1):24-36.

2. Richards M, Stern Y, Marder K, Cote L, Mayeux R. Relationships between extrapyramidal signs and

cognitive function in a community-dwelling cohort of patients with Parkinson's disease and normal

elderly individuals. Ann Neurol 1993; 33(3):267-274.

3. Goldman WP, Baty JD, Buckles VD, Sahrmann S, Morris JC. Cognitive and motor functioning in

Parkinson disease: subjects with and without questionable dementia. Arch Neurol 1998; 55(5):674-680.

4. Katzen HL, Levin BE, Llabre ML. Age of disease onset influences cognition in Parkinson's disease. J Int Neuropsychol Soc 1998; 4(3):285-290.

5. Mayeux R, Denaro J, Hemenegildo N, Marder K, Tang MX, Cote LJ, et al. A population-based

investigation of Parkinson's disease with and without dementia. Relationship to age and gender. Arch

Neurol 1992; 49(5):492-497.

6. Sutcliffe RL, Meara JR. Parkinson's disease epidemiology in the Northampton District, England, 1992.

Acta Neurol Scand 1995; 92(6):443-450.

7. Aarsland D, Andersen K, Larsen JP, Lolk A, Nielsen H, Kragh-Sorensen P. Risk of dementia in

Parkinson's disease: A community-based, prospective study. Neurology 2001; 56(6):730-736.

8. Aarsland D, Andersen K, Larsen JP. 8-year prevalence of dementia in Parkinson's disease. Mov Disord 2001; 16(Suppl 1):S57. Abstract.


9. Stern Y, Richards M, Sano M, Mayeux R. Comparison of cognitive changes in patients with Alzheimer's

and Parkinson's disease. Arch Neurol 1993; 50(10):1040-1045.

10. Mahieux F, Fenelon G, Flahault A, Manifacier MJ, Michelet D, Boller F. Neuropsychological prediction

of dementia in Parkinson's disease. J Neurol Neurosurg Psychiatry 1998; 64(2):178-183.

11. Scheltens Ph. Dementia in Parkinson's disease: subclinical Alzheimer's disease? In: Wolters ECh,

Scheltens Ph, Berendse HW, editors. Mental Dysfunction in Parkinson's disease II. Utrecht: Academic

Pharmaceutical Productions, 1999: 189-193.

12. American Psychiatric Association. Delirium, Dementia, and Amnestic and Other Cognitive Disorders.

Diagnostic and Statistical Manual of Mental Disorders. Washington: American Psychiatric Association,

1994: 123-163.

13. Dubois B, Pillon B. Dementia in Parkinson's disease. In: Wolters ECh, Scheltens Ph, Berendse HW,

editors. Mental Dysfunction in Parkinson's Disease II. Utrecht: Academic Pharmaceutical Productions,

1999: 165-176.

14. Lezak MD. Neuropsychological Assessment. 3rd ed. New York: Oxford University Press, 1995.

15. Brown GG, Rahill AA, Gorell JM, McDonald C, Brown SJ, Sillanpaa M, et al. Validity of the Dementia

Rating Scale in assessing cognitive function in Parkinson's disease. J Geriatr Psychiatry Neurol 1999;

12(4):180-188.

16. Brown RG, Marsden CD. Neuropsychology and cognitive function in Parkinson's disease: an overview.

In: Marsden CD, Fahn S, editors. Movement Disorders 2. London: Butterworths, 1987: 99-123.

17. Guillard A, Fenelon G, Mahieux F. Les altérations cognitives au cours de la maladie de Parkinson. Rev

Neurol (Paris) 1991; 147:337-355.

18. Pillon B, Deweer B, Agid Y, Dubois B. Explicit memory in Alzheimer's, Huntington's, and Parkinson's

diseases. Arch Neurol 1993; 50(4):374-379.

19. Barrett AM, Crucian GP, Schwartz RL, Heilman KM. Testing memory for self-generated items in

dementia: method makes a difference. Neurology 2000; 54(6):1258-1264.

20. Grober E, Buschke H, Crystal H, Bang S, Dresner R. Screening for dementia by memory testing.

Neurology 1988; 38(6):900-903.

21. Reid WG, Broe GA, Hely MA, Morris JG, Williamson PM, O'Sullivan DJ, et al. The neuropsychology of

de novo patients with idiopathic Parkinson's disease: the effects of age of onset. Int J Neurosci 1989;

48(3-4):205-217.

22. Dubois B, Boller F, Pillon B, Agid Y. Cognitive deficits in Parkinson's disease. In: Corkin S, Grafman J,

Boller F, editors. Handbook of Neuropsychology. Amsterdam: Elsevier Science Publishers, 1991: 195-

240.

23. Berger HJ, van Es NJ, Van Spaendonck KP, Teunisse JP, Horstink MW, 't Hof MA, et al. Relationship

between memory strategies and motor symptoms in Parkinson's disease. J Clin Exp Neuropsychol 1999;

21(5):677-684.

24. Faglioni P, Saetti MC, Botti C. Verbal learning strategies in Parkinson's disease. Neuropsychology 2000;

14(3):456-470.

25. Cummings JL. Subcortical dementia. Neuropsychology, neuropsychiatry, and pathophysiology. Br J

Psychiatry 1986; 149:682-697.



26. Boller F, Passafiume D, Keefe NC, Rogers K, Morrow L, Kim Y. Visuospatial impairment in Parkinson's

disease. Role of perceptual and motor factors. Arch Neurol 1984; 41(5):485-490.

27. Taylor AE, Saint-Cyr JA, Lang AE. Frontal lobe dysfunction in Parkinson's disease. The cortical focus of

neostriatal outflow. Brain 1986; 109(Pt 5):845-883.

28. Brown RG, Marsden CD. Cognitive function in Parkinson's disease: from description to theory. Trends

Neurosci 1990; 13(1):21-29.

29. Cools R, Swainson R, Owen AM, Robbins TW. Cognitive dysfunction in non-demented Parkinson's

disease. In: Wolters ECh, Scheltens Ph, Berendse HW, editors. Mental Dysfunction in Parkinson's Disease II. Utrecht: Academic Pharmaceutical Productions, 1999: 142-164.

30. Flowers KA, Robertson C. The effect of Parkinson's disease on the ability to maintain a mental set. J

Neurol Neurosurg Psychiatry 1985; 48(6):517-529.

31. Gotham AM, Brown RG, Marsden CD. 'Frontal' cognitive function in patients with Parkinson's disease

'on' and 'off' levodopa. Brain 1988; 111(Pt 2):299-321.

32. Van Spaendonck KP, Berger HJ, Horstink MW, Buytenhuijs EL, Cools AR. Executive functions and

disease characteristics in Parkinson's disease. Neuropsychologia 1996; 34(7):617-626.

33. Cools R, Barker RA, Sahakian BJ, Robbins TW. Mechanisms of cognitive set flexibility in Parkinson's

disease. Brain 2001; 124(Pt 12):2503-2512.

34. Brown RG, Marsden CD. Visuospatial function in Parkinson's disease. Brain 1986; 109(Pt 5):987-1002.

35. Maeshima S, Itakura T, Nakagawa M, Nakai K, Komai N. Visuospatial impairment and activities of daily

living in patients with Parkinson's disease: a quantitative assessment of the cube-copying task. Am J Phys

Med Rehabil 1997; 76(5):383-388.

36. Lee A, Harris J. Problems with perception of space in Parkinson's disease. Neuro-ophthalmology 1999;

22:1-15.

37. Brown RG, Marsden CD. How common is dementia in Parkinson's disease? Lancet 1984; 2(8414):1262-

1265.

38. Mayeux R, Stern Y, Sano M, Cote L, Williams JB. Clinical and biochemical correlates of bradyphrenia in

Parkinson's disease. Neurology 1987; 37(7):1130-1134.

39. Mayeux R, Stern Y, Rosen J, Leventhal J. Depression, intellectual impairment, and Parkinson disease.

Neurology 1981; 31(6):645-650.

40. Gibb WR, Lees AJ. The relevance of the Lewy body to the pathogenesis of idiopathic Parkinson's disease.


41. Derix MMA, Teunisse S, Hijdra A, Wens L, Hofstede AB, Walstra GJM, et al. CAMDEX-N: De

Nederlandse versie van de Cambridge Examination for Mental Disorders of the Elderly. Lisse: Swets &

Zeitlinger, 1992.

42. Folstein MF, Folstein SE, McHugh PR. "Mini-mental state". A practical method for grading the cognitive

state of patients for the clinician. J Psychiatr Res 1975; 12(3):189-198.

43. Hoehn MM, Yahr MD. Parkinsonism: onset, progression and mortality. Neurology 1967; 17(5):427-442.

44. Roth M, Tym E, Mountjoy CQ, Huppert FA, Hendrie H, Verma S, et al. CAMDEX. A standardised

instrument for the diagnosis of mental disorder in the elderly with special reference to the early detection

of dementia. Br J Psychiatry 1986; 149:698-709.


45. McDowell I, Newell C. Mental status testing. Measuring Health: A Guide to Rating Scales and Questionnaires. New York: Oxford University Press, 1996: 287-334.

46. Hobson P, Meara J. The detection of dementia and cognitive impairment in a community population of elderly people with Parkinson's disease by use of the CAMCOG neuropsychological test. Age Ageing 1999; 28(1):39-43.

47. Haas M. Statistical methodology for reliability studies. J Manipulative Physiol Ther 1991; 14(2):119-132.

48. Mayeux R, Stern Y, Rosen J, Benson F. Is "subcortical dementia" a recognizable clinical entity? Ann Neurol 1983; 14(3):278-283.

49. Whitehouse PJ. The concept of subcortical and cortical dementia: another look. Ann Neurol 1986; 19(1):1-6.

50. Brown RG, Marsden CD. 'Subcortical dementia': the neuropsychological evidence. Neuroscience 1988; 25(2):363-387.

51. Cummings JL, Benson DF. Subcortical dementia. Review of an emerging concept. Arch Neurol 1984; 41(8):874-879.

52. Huber SJ, Shuttleworth EC, Paulson GW, Bellchambers MJ, Clapp LE. Cortical vs subcortical dementia. Neuropsychological differences. Arch Neurol 1986; 43(4):392-394.

53. Cahn-Weiner DA, Grace J, Ott BR, Fernandez HH, Friedman JH. Cognitive and behavioral features discriminate between Alzheimer's and Parkinson's disease. Neuropsychiatry Neuropsychol Behav Neurol 2002; 15(2):79-87.

54. Korten EC, Verhey FR, Derix MM, Klinkenberg EL, Jolles J. [Consensus on the concepts of cortical, subcortical and frontal dementia. Research by national and international dementia experts]. Tijdschr Gerontol Geriatr 2001; 32(3):109-116.

55. Benton AL. The Revised Visual Retention Test. 4 ed. New York: Psychological Corporation, 1974.

56. Luria AR. Higher cortical functions in man. 1 ed. London: Tavistock Publications, 1966.

57. Benton AL, Varney NR, Hamsher KD. Visuospatial judgment. A clinical test. Arch Neurol 1978; 35(6):364-367.



Appendix

Memory and learning

1. Cubes*

Copying the order in which the examiner points to four cubes; 5 series.

(Knox Cube Test from the Arthur Point Scale of Performance battery)

2. Visual retention test

Selecting, from four similar figures, the one that was shown previously; 5 series. (Selected from Benton's Visual Retention Test.55)

3. Digit span forward

(3-9 digits); repeating a series of digits in the same order in which they were presented.

4. Digit span backward*

(2-8 digits); as item 3, but in reverse order.

5. Verbal recall*

Reading and recalling 10 words. The series is presented three times. Differences between consecutive trials reflect learning capacity.

Attention

6. Counting down by ones, 20 to 1

7. Counting down by threes, 30 to 0*

8. Serial sevens

Subtracting 7 from 100, and then again from each remaining total; five subtractions in all.42

9. Months backward*

Naming the months of the year in reverse order.

Executive functions

10. Tapping

"Tap twice when I tap once. Tap once when I tap twice"; series of 10.56

11. Go – No Go

"Tap once when I tap once. Do not tap when I tap twice"; series of 10.56

12. Fist-edge-palm*

Copying three consecutive hand movements; series of 10.56


13. Dice*

The subject is shown a die printed on a card and asked to say "Yes" if the number on the die is even and "No" if it is odd. Correct if necessary. A second trial is then performed, in which the subject is asked to say "Yes" if the number on the die is higher than the one shown previously and "No" if it is lower; series of 10. Only the second trial is scored.

14. Letter-digit alternation

Showing 10 pairs, each consisting of one letter and one digit. In the first series, only the letters are read aloud. Correct if necessary. In the second series, only the digits are read aloud. Only the second trial is scored.33

15. Fluency animals*

Naming as many animals as possible during one minute.

16. Odd-man-out

Have the subject indicate and explain which figure differs from the other two. This rule must then be applied when answering the next three sets of three figures. Correct if necessary. The same set is then presented again, and the subject should indicate and explain another way in which one of the figures differs from the other two. Answers to the next set should follow this second rule.30 Differences concern the size or shape of the figures.

Visuospatial functions

17. Angle matching

Indicating which of 12 lines have the same orientation as four presented line segments (from the Judgment of Line Orientation test57).

18. Figure assembly*

Indicating which figure parts are necessary to construct a presented figure; 5 series.

Memory

19. Verbal recall*

Delayed recall. Recalling the words of item 5.

* item included in final version of SCOPA-COG
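Because each attention item (6-9) has a single correct response sequence, a scoring key can be generated instead of written out by hand. The following is a hypothetical Python helper, not part of the published SCOPA-COG: it assumes five subtractions for serial sevens and uses English month names (the study population was Dutch), and it leaves out error-scoring rules, which the appendix does not specify.

```python
# Hypothetical scoring-key helper for SCOPA-COG attention items 6-9.
# It only lists the correct responses; how errors are scored is not
# specified in the appendix and is left out here.

# English month names; for Dutch-speaking patients the Dutch names
# would be substituted.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def count_down(start: int, stop: int, step: int) -> list[int]:
    """Numbers from `start` down to `stop` (inclusive) in steps of `step`."""
    return list(range(start, stop - 1, -step))


def serial_sevens(start: int = 100, subtractions: int = 5) -> list[int]:
    """Running totals when 7 is repeatedly subtracted from `start`."""
    return [start - 7 * i for i in range(1, subtractions + 1)]


ATTENTION_KEY = {
    6: count_down(20, 1, 1),   # 20, 19, ..., 1
    7: count_down(30, 0, 3),   # 30, 27, ..., 0
    8: serial_sevens(),        # 93, 86, 79, 72, 65
    9: MONTHS[::-1],           # December back to January
}

if __name__ == "__main__":
    for item, answers in ATTENTION_KEY.items():
        print(item, answers)
```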



Evaluation of the Hospital Anxiety and Depression Scale in patients with Parkinson’s disease

Johan Marinus1, Albert F.G. Leentjens2,3, Martine Visser1, Anne M. Stiggelbout4, Jacobus J. van Hilten1


Leiden; 2Department of Psychiatry, Maastricht University Hospital, Maastricht; 3Institute for Brain and Behaviour, Maastricht University, Maastricht, The Netherlands

Published in Clinical Neuropharmacology 2002;25:318-324

Chapter 5


Abstract

The purpose of this study was to evaluate the psychometric properties of the Hospital Anxiety and Depression Scale (HADS) in patients with Parkinson's disease (PD) and to assess the prevalence of symptoms of anxiety and depression in this population. The HADS was sent to 205 patients with PD, together with three quality-of-life (QoL) instruments: the Parkinson's Disease Questionnaire (PDQ-39), the EQ-5D, and a visual analogue scale (VAS). HADS scores were also compared with Hoehn and Yahr (H&Y) stages. Eighty-six percent of the patients returned the questionnaires. The quality of the data was good. Cronbach's α for the HADS was 0.88. Test-retest reliability over two weeks was 0.84 for the HADS sum score (intraclass correlation coefficient) and ranged from 0.42 to 0.76 for individual items (weighted kappa). Factor analysis revealed two factors, accounting for 51.6% of the variance, one representing anxiety and the other depression. Correlations with the PDQ-39, EQ-5D, VAS, and H&Y were 0.72, -0.59, -0.59, and 0.32, respectively (all p < 0.001).

Depression scores accounted for 52% of the variance in QoL, whereas disease severity explained 24%. Applying the cut-off values proposed by the developers indicated that possible and probable anxiety were present in 28.9% and 19.8% of the patients, respectively; the corresponding percentages for possible and probable depression were 21.6% and 16.5%. The psychometric performance of the HADS in patients with PD is satisfactory. In addition, almost 50% of the patients displayed symptoms of anxiety, whereas nearly 40% showed signs of depression.

Keywords: Parkinson's disease, Hospital Anxiety and Depression Scale, anxiety, depression, psychometrics.


Introduction

Depressive symptoms are common in patients with Parkinson's disease (PD). In 26 studies reviewed by Cummings, the mean prevalence of depressive symptomatology was estimated at 40%, with a range of 4-70%.1 Depression substantially affects quality of life (QoL) in patients with PD,2-4 and evaluating depression is therefore important. However, most depression rating scales include somatic items that overlap with features inherent to PD (e.g. psychomotor slowing, masked face, mental slowing, concentration difficulties, sleep disturbances, fatigue). Applying these scales may lead to overestimation of depressive symptoms in PD, and scales without somatic items may be more appropriate.

Anxiety is another common problem in PD, but it has received much less attention. Symptoms of anxiety may occur in up to 40% of patients with PD,5 with two of the largest studies reporting frequencies of about 20%.6,7 Several studies have indicated that anxiety and depression are related constructs.8,9 As with depression, anxiety may include somatic symptoms that can also be caused by PD (e.g. restlessness, being easily fatigued, difficulty concentrating, irritability, muscle tension, sleep disturbance). By the same reasoning, scales without somatic items may be preferable for use in PD.

A scale that addresses both anxiety and depression and lacks somatic items is the Hospital Anxiety and Depression Scale (HADS),10 which makes it an attractive candidate for assessing these phenomena in PD. The HADS has 14 items: seven address depression and seven address anxiety. Item scores can either be summed to give a total score or totalled per subscale to produce separate anxiety and depression scores. Each item has four graded response options, scored from 0 (absent) to 3 (present to an extreme degree). Items are phrased in both indicative and contra-indicative directions; scores on contra-indicative items are recoded before summing, so that higher scores always reflect greater problems.
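The scoring rules just described can be sketched in Python. This is a minimal illustration under stated assumptions, not the official HADS scoring key: raw responses are taken to be coded 0-3, the set of contra-indicative items is passed in as an argument because the text does not list them, and the subscales follow the odd (anxiety) and even (depression) item numbering used in Table 2.

```python
ANXIETY_ITEMS = range(1, 15, 2)     # items 1, 3, ..., 13
DEPRESSION_ITEMS = range(2, 15, 2)  # items 2, 4, ..., 14


def score_hads(responses: dict[int, int], reverse_items: set[int]) -> dict[str, int]:
    """Anxiety, depression, and total scores for one completed HADS.

    responses: item number (1-14) -> raw response code (0-3).
    reverse_items: contra-indicative items to recode as 3 - code
        (a hypothetical input; take the set from the official key).
    """
    if set(responses) != set(range(1, 15)):
        raise ValueError("expected a response for each of items 1-14")
    scored = {}
    for item, raw in responses.items():
        if not 0 <= raw <= 3:
            raise ValueError(f"item {item}: code {raw} is outside 0-3")
        # Recode so that a higher score always means a greater problem.
        scored[item] = 3 - raw if item in reverse_items else raw
    anxiety = sum(scored[i] for i in ANXIETY_ITEMS)
    depression = sum(scored[i] for i in DEPRESSION_ITEMS)
    return {"anxiety": anxiety, "depression": depression,
            "total": anxiety + depression}


if __name__ == "__main__":
    example = {item: 1 for item in range(1, 15)}
    print(score_hads(example, reverse_items={2, 4, 6, 7, 12, 14}))
```

With every raw response set to 1 and a hypothetical contra-indicative set {2, 4, 6, 7, 12, 14} (the positively worded items in Table 2), the sketch returns an anxiety score of 8, a depression score of 12, and a total of 20. Keeping the recoding set as a parameter avoids hard-coding a key that should be taken from the official scoring instructions.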

Experience with this scale in PD is scarce.2,11 We therefore conducted a study to assess the psychometric properties of the HADS in patients with PD and to evaluate the prevalence of symptoms of anxiety and depression in this population.


Materials and Methods

Patients

Patients who visited the outpatient neurology clinic of the Leiden University Medical Center and who fulfilled the United Kingdom Parkinson's Disease Society Brain Bank criteria for idiopathic PD12 were considered eligible. Patients with other diseases of the central nervous system were excluded. The study was approved by the institutional review board.

Methods

Questionnaires were sent to 205 patients. In addition to the HADS, the package included the Parkinson's Disease Questionnaire (PDQ-39),13 the EQ-5D,14 and a visual analogue scale (VAS) assessing quality of life with PD. The PDQ-39 is a disease-specific QoL instrument comprising 39 items with five ordinal response options, grouped into eight subscales (mobility, activities of daily living, emotional well-being, stigma, social support, cognitions, communication, and bodily discomfort). Summary indices can be calculated both for the subscales and for the total scale; higher scores reflect poorer QoL. The EQ-5D is a short generic QoL instrument comprising five items with three ordinal response options (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression). A summary index with a maximum of 1.00 (the best health state) can be derived from the five dimensions by conversion with a table of scores.15,16
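The chapter does not give the formulas behind the PDQ-39 indices. The sketch below assumes the scoring convention commonly used for the PDQ-39 (to be checked against its user manual before use): each dimension score is the sum of its item scores, coded 0 (never) to 4 (always), divided by four times the number of items and multiplied by 100, and the summary index is the unweighted mean of the eight dimension scores. The item counts per dimension are part of that assumption. The EQ-5D index is not sketched, because it depends on a table of scores that the chapter does not reproduce.

```python
# Items per PDQ-39 dimension (assumed from the developers' scoring
# convention, not stated in this chapter); together they make 39 items.
PDQ39_DIMENSIONS = {
    "mobility": 10,
    "activities of daily living": 6,
    "emotional well-being": 6,
    "stigma": 4,
    "social support": 3,
    "cognitions": 4,
    "communication": 3,
    "bodily discomfort": 3,
}


def dimension_score(item_scores: list[int]) -> float:
    """Dimension score on 0-100 from items coded 0 (never) to 4 (always)."""
    if any(not 0 <= s <= 4 for s in item_scores):
        raise ValueError("PDQ-39 item scores must lie between 0 and 4")
    return 100 * sum(item_scores) / (4 * len(item_scores))


def summary_index(responses: dict[str, list[int]]) -> float:
    """Unweighted mean of the eight dimension scores; higher means poorer QoL."""
    scores = []
    for name, n_items in PDQ39_DIMENSIONS.items():
        if len(responses[name]) != n_items:
            raise ValueError(f"{name}: expected {n_items} item scores")
        scores.append(dimension_score(responses[name]))
    return sum(scores) / len(scores)
```

Averaging the dimension scores, rather than summing all 39 items, gives each dimension equal weight regardless of its number of items.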

Patients were requested to complete the PDQ-39 first and then put it in one of the two enclosed return envelopes, which had to be sealed before the other scales were filled out. Because some PDQ-39 items resembled items in the other instruments, this procedure was intended to prevent patients from looking back at their earlier responses, which could have inflated the correlations between scales. Patients were asked to return the completed questionnaires within one week. After two weeks, we contacted all patients who had not returned their questionnaires and asked whether they still wanted to participate. Patients who returned their questionnaires within five days were asked to complete the HADS a second time two weeks later to evaluate test-retest reliability.

Information from the questionnaires was combined with information on disease severity and disease duration obtained from the patients' records. Disease severity was assessed at each follow-up visit with the Hoehn and Yahr (H&Y) staging system.17 H&Y 1 is the mildest stage, with only unilateral symptoms, whereas H&Y 5 is the most severe stage, in which patients are wheelchair-bound or bedridden.



Statistical analysis

Data were entered and analyzed with SPSS for Windows 10.0 (SPSS Inc, Chicago, IL, USA). Items were considered to perform adequately if they met the following data-quality criteria: missing values in less than 5% of the subjects, item-total correlations of 0.20 or higher,18 and absence of floor and ceiling effects (endorsement rates between 0.20 and 0.80).18

Reliability. Internal consistency was assessed with Cronbach's α. Values above 0.7 were considered adequate for group comparisons, and values above 0.9 adequate for clinical application.19 Test-retest reliability was assessed with a weighted kappa (Kw) for individual items and with an intraclass correlation coefficient (ICC) for the total score. Quadratic weights were used to calculate Kw because, with these weights, kappa is equivalent to the ICC,20 which facilitates interpretation of the results.
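Both reliability coefficients can be computed with the Python standard library. The sketch uses the textbook formula α = k/(k - 1) · (1 - Σ item variances / variance of the sum score) and Cohen's weighted kappa with quadratic disagreement weights (i - j)², the weighting named in the text. It is an illustrative re-implementation, not the SPSS routine used in the study, and it assumes complete data with responses coded 0 to n_categories - 1.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; `items` holds one list of scores per item."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-patient sum score
    sum_item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def quadratic_weighted_kappa(test: list[int], retest: list[int],
                             n_categories: int = 4) -> float:
    """Weighted kappa with quadratic disagreement weights (i - j) ** 2.

    Responses must be coded 0 .. n_categories - 1 (0-3 for HADS items).
    """
    n = len(test)
    table = [[0] * n_categories for _ in range(n_categories)]
    for a, b in zip(test, retest):
        table[a][b] += 1
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    observed = expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = (i - j) ** 2
            observed += weight * table[i][j] / n
            expected += weight * row_totals[i] * col_totals[j] / n ** 2
    return 1 - observed / expected
```

Perfect test-retest agreement gives a kappa of 1, and agreement no better than chance gives 0.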

Validity. A principal component factor analysis was performed to explore the factor structure of the HADS. An oblique rotation method was used because the factors were assumed to be correlated. Construct validity with respect to the PDQ-39, the EQ-5D, and the VAS was assessed with Pearson's correlation coefficient (r); for the correlation with the H&Y stage, Spearman's rho (rs) was used. Pearson's r was also used to assess the correlation of HADS scores with disease duration and age. Ordinal regression analysis was used to assess the relation between HADS scores and H&Y stages. The significance threshold was set at 0.05.
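The extraction step of this analysis can be sketched with NumPy: principal components are the eigenvectors of the item correlation matrix, and the number of components with an eigenvalue above 1 (the Kaiser criterion, the rule applied in the Results) suggests how many factors to retain. This is an illustration under assumptions, not the SPSS procedure: it stops before the oblique rotation, and the example data are simulated, not study data.

```python
import numpy as np


def principal_component_summary(scores: np.ndarray):
    """Eigenvalues of the item correlation matrix and the Kaiser count.

    scores: array of shape (n_patients, n_items), one column per item.
    Returns (eigenvalues, proportion_of_variance, n_eigenvalues_above_1),
    with eigenvalues sorted from largest to smallest.
    """
    corr = np.corrcoef(scores, rowvar=False)      # item-by-item correlations
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending
    # The trace of a correlation matrix equals the number of items, so an
    # eigenvalue divided by that number is its share of the total variance.
    proportion = eigenvalues / corr.shape[0]
    n_kaiser = int(np.sum(eigenvalues > 1.0))
    return eigenvalues, proportion, n_kaiser


if __name__ == "__main__":
    # Simulated data: two independent latent traits, each measured by
    # three noisy items.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(500, 2))
    items = np.repeat(latent, 3, axis=1) + rng.normal(scale=0.3, size=(500, 6))
    eig, prop, k = principal_component_summary(items)
    print(np.round(eig, 2), np.round(prop, 3), k)
```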

Prevalence of symptoms

The cut-off values proposed by the developers of the HADS10 were applied to determine the proportion of patients considered unimpaired (not anxious or not depressed; a score of ≤ 7 on the relevant subscale), possibly impaired (8-10), or probably impaired (≥ 11).
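Applying the cut-offs amounts to a three-way classification of each subscale score. The sketch below assumes integer subscale scores between 0 and 21 (seven items scored 0-3); the category labels are illustrative names.

```python
def classify_subscale(score: int) -> str:
    """Category for one HADS subscale score using the developers' cut-offs."""
    if not 0 <= score <= 21:
        raise ValueError("a HADS subscale score lies between 0 and 21")
    if score <= 7:
        return "unimpaired"
    if score <= 10:
        return "possible"
    return "probable"


def prevalence(scores: list[int]) -> dict[str, float]:
    """Percentage of patients in each category."""
    counts = {"unimpaired": 0, "possible": 0, "probable": 0}
    for score in scores:
        counts[classify_subscale(score)] += 1
    return {category: 100 * n / len(scores) for category, n in counts.items()}
```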


Results

Response rate and sample characteristics

One hundred and seventy-seven questionnaires were returned, a response rate of 86%. Sample characteristics are presented in table 1.

The first 59 patients were approached for the test-retest analysis. Fifty-four of these patients

(92%) returned the HADS the second time. Patient characteristics of the retest group were

similar to those of the total group.

Table 1. Patient characteristics

number of patients                      177
male / female ratio                     99 (56%) male / 78 (44%) female
mean (SD) age in years                  65.2 (11.1)
mean (SD) age at onset in years         55.7 (11.6)
mean (SD) disease duration in years     9.4 (5.6)
disease stage:
  Hoehn & Yahr 1 and 2 (mild)           74 (41.8%)
  Hoehn & Yahr 3 (moderate)             70 (39.5%)
  Hoehn & Yahr 4 and 5 (severe)         33 (18.6%)

Clinimetric evaluation of the HADS

Data quality. The quality of the data was good. There were no missing values. All item-total

correlations exceeded 0.20 (table 2) and none of the items showed floor or ceiling effects.

Reliability. Cronbach's α for the total HADS was 0.88. For the anxiety and depression subscales, α was 0.86 and 0.78, respectively. Deleting any of the anxiety items would have decreased Cronbach's α, whereas deleting item 14 from the depression subscale would have slightly increased α, to 0.80.

The test-retest reliability for sum scores, assessed with an ICC, was 0.84 for the total HADS, and 0.86 and 0.84 for the anxiety and depression subscales, respectively. The test-retest reliability for individual items (Kw) ranged from 0.42 to 0.76 (table 2).

Validity. Factor analysis with oblique rotation revealed three factors with an eigenvalue greater than 1, together explaining 59.0% of the variance. Factor one included all the original anxiety items except item 5, which loaded slightly more on the second (‘depression’) factor. The second factor included all the original depression items except items 10 and 14, which together constituted the third factor (eigenvalue 1.04), with loadings of 0.69 and 0.75, respectively. Inspection of the scree plot indicated that a two-factor solution might be more appropriate, because the third factor hardly explained more of the variance than the fourth and higher factors: the third factor accounted for 7.4% of the variance and the fourth for 6.2% (eigenvalues 1.04 and 0.87, respectively). We therefore performed a second factor analysis, forcing a two-factor solution (table 3), which accounted for 51.6% of the variance. All odd-numbered items (together constituting the original anxiety subscale) loaded on the first (‘anxiety’) factor, except item 7 (‘I can sit at ease and feel relaxed’). All even-numbered items (together constituting the original depression subscale) loaded on the second (‘depression’) factor, except item 8 (‘I feel as if I am slowed down’). Items 7 and 8 did not discriminate between subscales and loaded almost equally on both factors.
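The percentages of explained variance follow directly from the eigenvalues. In a principal component analysis of p standardized items, the total variance equals p, so each component's share is its eigenvalue divided by p = 14. Assuming the reported percentages refer to the unrotated extraction, this reproduces the figures above, and subtracting the third component from the three-factor total gives the percentage reported for the forced two-factor solution:

```latex
\begin{aligned}
\text{share of variance}_m &= \frac{\lambda_m}{p}, \qquad p = 14 \\
\frac{\lambda_3}{14} &= \frac{1.04}{14} \approx 0.074 \quad (7.4\%) \\
\frac{\lambda_4}{14} &= \frac{0.87}{14} \approx 0.062 \quad (6.2\%) \\
59.0\% - 7.4\% &= 51.6\%
\end{aligned}
```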

Table 2. HADS item scores

item                                            mean   SD     retest¹ A²     D²
 1 feel tense or wound up (A)                   1.40   0.83   0.70    0.70
 2 still enjoy things I used to enjoy (D)       0.91   0.83   0.62           0.64
 3 get a sort of frightened feeling (A)         0.55   0.72   0.42    0.61
 4 can laugh and see the funny side (D)         0.84   0.77   0.62           0.66
 5 worrying thoughts go through my mind (A)     1.22   0.96   0.63    0.66
 6 feel cheerful (D)                            0.79   0.81   0.56           0.61
 7 can sit at ease and feel relaxed (A)         1.48   0.72   0.61    0.60
 8 feel as if I am slowed down (D)              1.84   0.81   0.62           0.38
 9 frightened feeling in stomach (A)            0.61   0.71   0.59    0.54
10 lost interest in my appearance (D)           0.57   0.76   0.55           0.39
11 restless as if I have to be on the move (A)  1.31   0.83   0.76    0.65
12 look forward with enjoyment to things (D)    0.79   0.79   0.67           0.60
13 get sudden feelings of panic (A)             0.72   0.78   0.48    0.65
14 can enjoy book, radio, TV programme (D)      1.01   0.86   0.52           0.31

¹ test-retest reliability over 14 days, assessed with a quadratically weighted kappa
² item-total correlation, i.e. the correlation of an item with its own subscale total, for the anxiety subscale (A) and the depression subscale (D)


Table 3. Factor analysis

item                                         factor 1  factor 2
 1 feel tense or wound up                    0.74
 2 still enjoy things                                  0.70
 3 sort of frightened feeling                0.81
 4 can laugh and see funny side              0.26      0.64
 5 worrying thoughts                         0.61      0.29
 6 feel cheerful                                       0.68
 7 sit at ease and feel relaxed              0.42      0.47
 8 feel as if I am slowed down               0.35      0.29
 9 frightened feeling in stomach             0.70
10 lost interest in appearance                         0.70
11 restless as if I have to be on the move   0.72
12 look forward with enjoyment to things               0.62
13 sudden feelings of panic                  0.81
14 enjoy book, radio, TV                               0.55

• only loadings higher than 0.25 are shown
• extraction method: principal component analysis
• rotation method: oblique rotation

The distribution of the total HADS sum score did not differ significantly from a normal distribution (Kolmogorov-Smirnov Z = 1.00, p = 0.27). The correlation between the two HADS subscales was 0.61. The total HADS correlated substantially (r > 0.60) with the PDQ-39 summary index, the PDQ-39 emotional well-being subscale, and the ‘anxiety/depression’ item of the EQ-5D, whereas correlations with the complete EQ-5D and the VAS were moderate (table 4). The total HADS score generally correlated more highly with the PDQ-39 subscales and the EQ-5D items than the separate anxiety and depression scores did.

Patients in H&Y stages 1 and 5 were underrepresented, and we therefore categorized patients as mildly (H&Y 1 and 2, n = 74), moderately (H&Y 3, n = 70), and severely (H&Y 4 and 5, n = 33) affected (table 1). The correlation (rs) between H&Y stage and HADS total score was 0.32 (p < 0.001). Ordinal regression of HADS scores on the modified H&Y categories revealed a well-fitting model (χ² = 47.96, df = 58, p = 0.82) with a significant trend (p < 0.001).



The correlations of the HADS total score with age and disease duration were very low and non-significant (-0.04 and 0.08, respectively). The correlation between H&Y stage and the PDQ-39 summary index was 0.49 (rs).

Table 4. Correlation with other scales

                              HADS-total   HADS-anxiety   HADS-depression
PDQ-39¹ summary index         0.72⁴        0.66⁴          0.62⁴
PDQ-39 subscales:
  mobility                    0.53         0.47           0.48
  activities of daily living  0.42         0.38           0.38
  emotional well-being        0.73⁴        0.72⁴          0.59
  stigma                      0.50         0.49           0.41
  social support              0.39         0.36           0.33
  cognitions                  0.48         0.40           0.47
  communication               0.34         0.26           0.36
  bodily discomfort           0.47         0.52           0.32
EuroQol summary index         0.59         0.54           0.53
EuroQol items:
  mobility                    0.22         0.20           0.19³
  daily activities            0.49         0.40           0.48
  self-care                   0.31         0.25           0.30
  pain / discomfort           0.29         0.31           0.20
  anxiety / depression        0.67⁴        0.62⁴          0.59
QoL VAS²                      -0.59        -0.49          -0.58

¹ PDQ-39: Parkinson's Disease Questionnaire
² QoL VAS: visual analogue scale assessing quality of life
³ correlation significant at the 0.05 level; all other correlations significant at the 0.01 level
⁴ substantial correlation (r > 0.60)


Prevalence of anxiety and depression

When the cut-off scores proposed by the developers were applied to our population, 61.9% of the patients were considered not depressed, 21.6% possibly depressed, and 16.5% probably depressed. For anxiety, 51.1% of the patients were considered not anxious, whereas 28.9% and 19.8% were regarded as possibly and probably anxious, respectively (figure 1).

HADS total scores did not differ significantly between male and female patients (p = 0.64).

Figure 1. Percentage of patients with depression or anxiety for different cut-off values

[Figure: percentage of 'cases' (0-100) plotted against the subscale total (0-25), with separate curves for HADS-D and HADS-A scores. Vertical dashed lines indicate limits for non-cases (≤ 7), possible cases (8-10), and definite cases (≥ 11).]
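A curve of this kind can be computed by counting, for every possible cut-off value, the percentage of patients who score at or above it. The sketch assumes this is what 'percentage of cases' on the vertical axis represents; the chapter does not define it explicitly.

```python
def percent_cases_by_cutoff(scores: list[int], max_score: int = 21) -> dict[int, float]:
    """Percentage of patients scoring at or above each cut-off from 0 to max_score."""
    n = len(scores)
    return {cutoff: 100 * sum(s >= cutoff for s in scores) / n
            for cutoff in range(max_score + 1)}
```

Evaluating the returned curve at 8 and at 11 gives the percentages of possible-or-probable and of probable cases, respectively.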



Discussion

To date, sufficiently large studies of the psychometric properties of the HADS in patients with PD have been lacking. The high response rate in our study supports the notion that the reported data largely reflect the true scores of the target population.

The quality of the data in this study is high. The HADS shows adequate test-retest reliability and sufficient internal consistency for group comparisons, with values resembling those found in other populations.21-25 The internal consistency of 0.88 is just below the recommended threshold of 0.90 for clinical application and, consequently, the use of this scale for individual clinical evaluation may be disputed. In this respect, however, other depression rating scales generally perform no better than the HADS.26 The item-total correlations range from 0.31 to 0.70, in accordance with results from other studies.10,23,25,27 If the factor solution is forced into two factors, the factor structure is similar to that reported for other populations, that is, two factors accounting for approximately 50% of the variance, one representing anxiety and the other depression.21-23,25,27,28 The correlation between the HADS and scores on QoL instruments is consistent with the results of other studies that assessed the relation between depression and quality of life in PD.2-4 In our study, depression accounts for 52% of the variance in QoL in patients with PD. Interestingly, this percentage is much higher than that for disease severity, which accounts for only 24%. This underscores the importance of assessing depression in PD: in clinical care, disease severity is nearly always evaluated routinely, whereas depression is evaluated far less often.
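The variance percentages appear to be squared correlation coefficients; this is an inference, since the chapter does not state the calculation. The value 0.72 is the correlation between the total HADS score and the PDQ-39 summary index (Table 4), and 0.49 is Spearman's rs between H&Y stage and that index. For comparison, the depression subscale alone correlates 0.62 with the PDQ-39 summary index:

```latex
\begin{aligned}
r^2_{\text{HADS-total, PDQ-39 SI}} &= 0.72^2 \approx 0.52 \quad (52\%) \\
r^2_{s,\ \text{H\&Y, PDQ-39 SI}} &= 0.49^2 \approx 0.24 \quad (24\%) \\
r^2_{\text{HADS-D, PDQ-39 SI}} &= 0.62^2 \approx 0.38 \quad (38\%)
\end{aligned}
```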

Disease duration does not affect the level of depression in our population. The relation between depression and disease severity, as assessed by H&Y, shows a significant positive trend, but the correlation is low. Neither age nor gender correlates with HADS scores. These findings are in line with results from the literature.1

Almost fifty percent of the patients show symptoms of anxiety, with twenty percent displaying probable anxiety. This latter percentage agrees with the prevalences found by Aarsland et al.6 and Vazquez et al.7 Thirty-eight percent of the patients in our study display symptoms of depression, with 21 percent showing possible and 17 percent showing probable depressive symptomatology. These results agree with the prevalences reported by Cummings.1

Most rating scales for depression and anxiety were developed for use in psychiatric populations and invariably include somatic items. Because the somatic features of depression and anxiety overlap considerably with those of PD, the prevalence of these symptoms may be overestimated if such scales are used in this population. Consequently, scales without somatic items may be more appropriate. Some researchers argue that scales including somatic items may be used in physical diseases after all, provided that cut-off values are adjusted to account for the presence of somatic symptoms.29,30 This argument holds only if the severity of somatic symptoms co-varies with that of depressive symptoms. This may be questioned, however, as demonstrated by Leentjens et al.,31 who found that non-somatic items of the Hamilton Rating Scale for Depression (HAM-D) and the Montgomery-Åsberg Depression Rating Scale (MÅDRS) contributed more to the clinical diagnosis of depression than individual somatic items did. This illustrates the differential importance of these items with respect to depression in PD. The only somatic items that discriminated to some extent were ‘reduced appetite’ and ‘early morning wakening’, in agreement with results of studies in other non-psychiatric populations.32,33

Despite this obvious advantage of the HADS for use in PD, the instrument has hardly been used in this patient group. Hitherto, the HADS has been used in PD only for external validation of another scale2 or for describing baseline values for emotional state.11 Only Leentjens et al.34 assessed the screening and diagnostic properties of the HADS, in 55 non-demented patients with idiopathic PD, and concluded that the screening properties of this scale seem adequate, but that depression is better diagnosed with expert-administered depression scales, such as the HAM-D and the MÅDRS.35 This is in line with the intent of the developers, who aimed to construct a reliable screening test for psychiatric disorder in the physically ill that could help the busy clinician. In the study by Leentjens et al.,34 maximum discrimination between depressed and non-depressed patients was obtained when the total HADS was used with a cut-off of 18/19. However, as the sample size was small relative to the number of items, this finding should be viewed with some caution.

The developers of the HADS consider the scale scores insensitive to somatic problems and, from their perspective, adjusting cut-off values for particular populations is unnecessary.10 This statement may be questioned with respect to items 7 and 8 in our population. These items do not discriminate between subscales, and their mean values are higher than those usually found.25 This is most likely explained by a confounding effect of tremor and rigidity on item 7 (‘I can sit at ease and feel relaxed’) and of bradykinesia and rigidity on item 8 (‘I feel as if I am slowed down’). Nevertheless, the item-total correlations of both items are acceptable, and deleting them would have lowered Cronbach's α. Consequently, there is no reason to delete these items in studies of patients with PD, but our finding casts some doubt on the appropriate level of the cut-off value.

A limitation of this study was the lack of a gold standard, that is, psychiatric evaluation, which would have enabled us to evaluate the diagnostic properties of the HADS. In retrospect, we regret not including the Beck Depression Inventory in the survey as well; a direct comparison of these frequently used self-administered depression scales would have produced valuable information.

Although the absence of somatic items in scales that evaluate anxiety and depression in patients with physical diseases has its advantages, it is important to realize that there is a trade-off with face validity: the concurrent validity with criteria for depression and anxiety is decreased. For instance, five of the nine criteria for depression in the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV)36 concern somatic symptoms (insomnia, fatigue, reduced appetite, reduced concentration, psychomotor slowing). Furthermore, because the HADS focuses on the core symptoms of depression (mood and anhedonia), two of the remaining non-somatic DSM-IV criteria are not addressed either (guilt, suicidal thoughts). In this respect, the Beck Depression Inventory covers the domain of depression more thoroughly, since it addresses seven of the nine DSM criteria for depression.

Taken together, the psychometric properties of the HADS in a large population of patients with PD are satisfactory. Results regarding reliability and construct validity are adequate and agree with results in other populations. The internal consistency is just below the desired threshold for clinical evaluation, indicating that information at the individual level should be interpreted with some caution. Patients with PD may interpret two items ‘somatically’; removing these items from the scale is unnecessary, but a slight rise of the advised cut-off may increase the concurrent validity with the DSM criteria for depressive disorder. Taking this into account, the HADS can adequately be used to assess anxiety and depression in patients with PD. Although the screening properties of the HADS seem adequate, its diagnostic properties may be questioned, and it may be more appropriate to use the scale scores as an ‘index of severity’. An additional advantage of this scale is that patients can complete it without the help of an expert.


Depression plays an important role and accounts for more than 50% of the variance in quality of life in PD. This percentage is twice as high as that for disease severity, and evaluating depression should therefore be part of the routine clinical assessment of patients with PD.

Acknowledgements


0940-33-021). The authors thank Professor R.A.C. Roos for his helpful comments.

References

1. Cummings JL. Depression and Parkinson's disease: a review. Am J Psychiatry 1992; 149(4):443-454.



S38.

3. Schrag A, Jahanshahi M, Quinn N. What contributes to quality of life in patients with Parkinson's disease? J


4. Schrag A, Jahanshahi M, Quinn NP. What contributes to depression in Parkinson's disease? Psychol Med 2001; 31(1):65-73.

5. Richard IH, Schiffer RB, Kurlan R. Anxiety and Parkinson's disease. J Neuropsychiatry Clin Neurosci 1996; 8(4):383-392.

6. Aarsland D, Larsen JP, Lim NG, Janvin C, Karlsen K, Tandberg E, et al. Range of neuropsychiatric disturbances in patients with Parkinson's disease. J Neurol Neurosurg Psychiatry 1999; 67(4):492-496.

7. Vazquez A, Jimenez-Jimenez FJ, Garcia-Ruiz P, Garcia-Urra D. "Panic attacks" in Parkinson's disease. A long-term complication of levodopa therapy. Acta Neurol Scand 1993; 87(1):14-18.

8. Menza MA, Robertson-Hoffman DE, Bonapace AS. Parkinson's disease and anxiety: comorbidity with depression. Biol Psychiatry 1993; 34(7):465-470.

9. Henderson R, Kurlan R, Kersun JM, Como P. Preliminary examination of the comorbidity of anxiety and depression in Parkinson's disease. J Neuropsychiatry Clin Neurosci 1992; 4(3):257-264.

10. Zigmond AS, Snaith RP. The Hospital Anxiety and Depression Scale. Acta Psychiatr Scand 1983; 67(6):361-370.

11. Serra-Mestres J, Ring HA. Vulnerability to emotionally negative stimuli in Parkinson's disease: an investigation using the Emotional Stroop task. Neuropsychiatry Neuropsychol Behav Neurol 1999; 12(1):52-57.

12. Gibb WR, Lees AJ. The relevance of the Lewy body to the pathogenesis of idiopathic Parkinson's disease. J Neurol Neurosurg Psychiatry 1988; 51(6):745-752.






14. Brazier J, Jones N, Kind P. Testing the validity of the Euroqol and comparing it with the SF-36 health survey questionnaire. Qual Life Res 1993; 2(3):169-180.

15. Brooks R. EuroQol: the current state of play. Health Policy 1996; 37(1):53-72.

16. McDowell I, Newell C. Measuring Health: A Guide to Rating Scales and Questionnaires. 2 ed. New York: Oxford University Press, 1996.

18. Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to their Development and Use. 2 ed. Oxford: Oxford Medical Publications, 1995.

20. Fleiss JL, Cohen J. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement 1973; 33:613-619.

21. Moorey S, Greer S, Watson M, Gorman C, Rowden L, Tunmore R, et al. The factor structure and factor stability of the Hospital Anxiety and Depression Scale in patients with cancer. Br J Psychiatry 1991; 158:255-259.

22. Jack TM, Walker VA, Morley SJ, Hanks GW, Finlay-Mills BM. Depression, anxiety and chronic pain. Anaesthesia 1987; 42(11):1235-1236.

23. Spinhoven P, Ormel J, Sloekers PP, Kempen GI, Speckens AE, Van Hemert AM. A validation study of the Hospital Anxiety and Depression Scale (HADS) in different groups of Dutch subjects. Psychol Med 1997; 27(2):363-370.

24. Visser MC, Koudstaal PJ, Erdman RA, Deckers JW, Passchier J, van Gijn J, et al. Measuring quality of life in patients with myocardial infarction or stroke: a feasibility study of four questionnaires in The Netherlands. J Epidemiol Community Health 1995; 49(5):513-517.

25. Herrmann C. International experiences with the Hospital Anxiety and Depression Scale: a review of validation data and clinical results. J Psychosom Res 1997; 42(1):17-41.

26. McDowell I, Newell C. Depression. In: McDowell I, Newell C. Measuring Health: A Guide to Rating Scales and Questionnaires. New York: Oxford University Press, 1996: 238-286.

27. Fayers PM, Machin D. Factor analysis. In: Staquet MJ, Hays RD, Fayers PM, editors. Quality of Life

Assessment in Clinical Trials: Methods and Practice. Oxford: Oxford Univerity Press, 1998: 191-223.

28. Zesiewicz TA, Hauser RA. Depression in Patients with Parkinson's Disease: Epidemiology,

Pathophysiology and Treatment Options. CNS Drugs 2000; 13(4):253-264.

29. Lewis G, Wessely S. Comparison of the General Health Questionnaire and the Hospital Anxiety and

Depression Scale. Br J Psychiatry 1990; 157:860-864.

30. Bridges KW, Goldberg DP. The validation of the GHQ-28 and the use of the MMSE in neurological in-

patients. Br J Psychiatry 1986; 148:548-553.

31. Leentjens AFG, Marinus J, van Hilten JJ, Lousberg R, Verhey FRJ. The contribution of somatic symptoms

to the diagnosis of depressive disorder in Parkinson's disease: a discriminant analytic approach (accepted). J

Neuropsychiatry Clin Neurosci.

32. Moffic HS, Paykel ES. Depression in medical in-patients. Br J Psychiatry 1975; 126:346-353.

33. Clarke DC, Cavanaugh SA, Gibbons RD. The core symptoms of depression in medical and psychiatric

patients. J Nerv Ment Dis 1983; 171:705-713.

Chapter 5


34. Leentjens AFG, Lousberg R, Verhey FRJ. The psychometric properties of the Hospital Anxiety and Depression Scale in patients with Parkinson's disease. Acta Neuropsychiatrica 2001; 13(4):83-85.

35. Leentjens AF, Verhey FR, Lousberg R, Spitsbergen H, Wilmink FW. The validity of the Hamilton and Montgomery-Asberg depression rating scales as screening and diagnostic tools for depression in Parkinson's disease. Int J Geriatr Psychiatry 2000; 15(7):644-649.

36. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders. 4 ed. Washington: American Psychiatric Association, 1994.


6 The contribution of somatic symptoms to the diagnosis of depressive disorder in Parkinson’s disease: a discriminant analytic approach

Albert F.G. Leentjens¹,², Johan Marinus³, Jacobus J. van Hilten³, Richel Lousberg², Frans R.J. Verhey¹,²

¹ Department of Psychiatry, Maastricht University Hospital, Maastricht, The Netherlands; ² Institute for Brain and Behaviour, Maastricht University, Maastricht, The Netherlands; ³ Department of Neurology, Leiden University Medical Center, Leiden, The Netherlands

Published in the Journal of Neuropsychiatry and Clinical Neurosciences 2003;15:74-77

Chapter 6


Abstract

This study assessed the sensitivity of individual depressive symptoms and their relative contribution to the diagnosis of depressive disorder in patients with Parkinson’s disease. The Structured Clinical Interview for DSM-IV Depression and the Hamilton and Montgomery-Åsberg Depression Rating Scales (Ham-D, MADRS) were administered to 149 consecutive nondemented patients. The contribution of the individual items of these scales to the diagnosis of “depressive disorder” was calculated by discriminant analysis. The discriminant models based on the Ham-D and MADRS scores were both highly significant. Nonsomatic core symptoms of depression had the highest correlation coefficients. Somatic items mostly had low correlation coefficients, with the exception of reduced appetite and early morning wakening (late insomnia). Nonsomatic symptoms of depression, together with reduced appetite and early morning wakening, appear to be the most important for distinguishing between depressed and nondepressed patients with Parkinson’s disease.

Contribution of somatic symptoms to depression in Parkinson's disease


Introduction

The diagnosis of depressive disorder in patients with somatic diseases, such as Parkinson’s disease (PD), is often difficult because some of the physical symptoms of these diseases overlap with the somatic symptoms of depressive disorder. In PD, the “masked facies”, psychomotor retardation, mental slowing, fatigue, and sleep disturbances may give euthymic patients the appearance of being depressed. Much research has addressed the specificity of depressive symptoms in these patients, that is, the question of whether a certain symptom profile can be considered characteristic of depression in PD.1-4 Although there is some evidence for a more prominent role of anxiety symptoms in depression in PD, the results of different studies are inconsistent.1-4 From a clinical point of view, however, the sensitivity of depressive symptoms is more important. Clinicians are more interested in how to recognize depressive disorder in a patient with known PD than in how depressive symptoms in PD patients differ from those in patients with other chronic diseases or in otherwise healthy individuals. This study analyzes the contribution of individual depressive symptoms to the formal diagnosis of “depressive disorder” by using a discriminant analytic approach.

Patients and methods

As part of an ongoing research project on psychopathology in PD, 169 consecutive patients with primary PD, as defined by the clinical criteria of the United Kingdom Parkinson’s Disease Society Brain Bank (UK-PDS-BB), were referred from the neurological outpatient department for a protocolized mental status examination.5 This examination included the Structured Clinical Interview for DSM-IV Depression (SCID-D), which was used to confirm or reject the diagnosis of depressive disorder as defined by the DSM-IV criteria.6,7 The DSM-IV diagnosis of depressive disorder was considered the gold standard in this study. Patients fulfilling the DSM-IV criteria for dementia were excluded to prevent unreliable answers due to recollection bias. All patients completed the Hamilton Rating Scale for Depression (Ham-D),8 irrespective of the presence or absence of depression. For 111 patients, the score on the Montgomery-Åsberg Depression Rating Scale (MADRS)9 was also available. Thus, in this study, the Ham-D and MADRS were used as symptom checklists and not as diagnostic scales. The physical disability of patients was rated according to the Hoehn and Yahr staging system,10 which ranges from I (mild unilateral disease) to V (wheelchair bound or bedridden unless aided). The cognitive status of the patients was assessed with the Mini-Mental State Examination (MMSE).11

On the basis of the individual items of the Ham-D and MADRS, a discriminant model was calculated for each scale to predict optimally whether patients would fall into the depressed or nondepressed group according to the gold standard (i.e., DSM-IV criteria). Next, for each individual item of these scales, the correlation coefficient with this discriminant function was obtained. These correlation coefficients reflect the relative strength of association of each symptom with the discriminant function and can thus be considered an indirect measure of the contribution of these symptoms to the diagnosis of depressive disorder. Finally, the individual items were ranked in order of descending correlation coefficients, that is, in order of decreasing sensitivity. Wilks' lambda was calculated as a test of the discriminant function. As an indicator of the prevalence of individual depressive symptoms, the percentage of nonzero scores on each item of both scales is reported. Means are reported with standard deviations. All analyses were performed with the Statistical Package for the Social Sciences (SPSS) version 9.0.
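The item-level correlation coefficients described above can be made concrete with a short sketch. It is not the SPSS implementation: it assumes a two-group Fisher discriminant function built on the pooled within-group covariance matrix, and computes for each item the pooled within-groups correlation between that item and the discriminant score, which is how the SPSS output describes its structure matrix. The helper names and the toy item scores are hypothetical.

```python
import math


def column_means(rows):
    """Mean of each item (column) across patients (rows)."""
    return [sum(col) / len(col) for col in zip(*rows)]


def pooled_within_cov(groups):
    """Pooled within-group covariance matrix with divisor N - g."""
    p = len(groups[0][0])
    n_total = sum(len(g) for g in groups)
    cov = [[0.0] * p for _ in range(p)]
    for rows in groups:
        means = column_means(rows)
        for row in rows:
            dev = [x - m for x, m in zip(row, means)]
            for a in range(p):
                for b in range(p):
                    cov[a][b] += dev[a] * dev[b]
    divisor = n_total - len(groups)
    return [[c / divisor for c in line] for line in cov]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def structure_coefficients(depressed, nondepressed):
    """Pooled within-groups correlation of each item with the discriminant score."""
    w_cov = pooled_within_cov([depressed, nondepressed])
    diff = [a - b for a, b in zip(column_means(depressed), column_means(nondepressed))]
    weights = solve(w_cov, diff)  # Fisher direction: W^-1 (mean_dep - mean_nondep)
    p = len(weights)
    # Within-group variance of the score y = weights . x
    var_y = sum(weights[a] * w_cov[a][b] * weights[b] for a in range(p) for b in range(p))
    coefs = []
    for j in range(p):
        # Within-group covariance of item j with y is row j of W times the weights
        cov_jy = sum(w_cov[j][k] * weights[k] for k in range(p))
        coefs.append(cov_jy / math.sqrt(w_cov[j][j] * var_y))
    return coefs


# Hypothetical item scores (rows = patients, columns = three items)
depressed = [[3, 2, 1], [2, 3, 2], [3, 3, 1], [2, 2, 3]]
nondepressed = [[0, 1, 2], [1, 0, 1], [0, 0, 2], [1, 1, 3]]
print([round(r, 3) for r in structure_coefficients(depressed, nondepressed)])
```

Because each coefficient is a within-group correlation, rescaling the discriminant weights leaves it unchanged; reversing their direction only flips all signs.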

Results

Of the 169 referred patients, 20 (11.8%) were excluded because of dementia. The remaining 149 patients, 89 men and 60 women with an average age of 66.8 ± 10.2 years, were included in the analysis of the Ham-D items. Their average MMSE score was 27.4 ± 2.3. According to the Hoehn and Yahr scale, 16 patients were classified as stage I, 81 as stage II, 24 as stage III, 9 as stage IV, and none as stage V (19 patients were not staged). Thirty-three patients (22.1%) met the criteria for depressive disorder. The average Ham-D score was 8.3 ± 3.9 for nondepressed patients and 16.9 ± 5.5 for depressed patients. The results of the discriminant analysis are shown in Table 1. Wilks’ lambda, as a test of the discriminant function in this model, was highly significant (λ = 0.438, χ² = 108.502, df = 17, P < 0.001). In this model, 93.0% of the patients were correctly classified as depressed or nondepressed. In the discriminant model of the Ham-D, suicidality was the best discriminator between depressed and nondepressed patients. This item was followed, in descending order, by feelings of guilt, psychic anxiety, reduced appetite, depressed mood, and reduced work and interest. Most somatic items, such as psychomotor slowing, tiredness, physical anxiety, and early and middle insomnia, had low discriminative properties. The exceptions were reduced appetite and early morning wakening (late insomnia); these two somatic items had relatively high discriminative properties.

Table 1. Structure matrix of the Hamilton Rating Scale for Depression items

Item no.  Description                          Correlation  Prevalence (%)
3         suicidal thoughts                    0.537        20
2         feelings of guilt                    0.469        24
10        anxiety (psychic)                    0.371        46
12        loss of appetite (gastrointestinal)  0.362        18
1         depressed mood                       0.338        46
7         work and interest                    0.273        86
6         early morning wakening               0.257        28
9         agitation                            0.244        35
14        sexual interest                      0.239        35
8         retardation                          0.232        80
15        hypochondriasis                      0.221        38
16        insight                              0.171        11
13        fatigue                              0.162        79
11        anxiety (somatic)                    0.161        25
17        loss of weight                       0.158        3
4         insomnia initial                     0.105        17
5         insomnia middle                      -0.007       36

Items are ranked in descending order of correlation with the discriminant function. The percentage of nonzero scores on each item is shown as an indicator of the prevalence of the symptom.

The MADRS was completed by 111 patients (64 men and 47 women) with an average age of 68.1 ± 10.1 years. Fewer patients participated in this analysis than in the Ham-D analysis because the MADRS was added to the protocol later. Their average MMSE score was 27.2 ± 2.4. Twelve patients were classified as Hoehn and Yahr stage I, 65 as stage II, 19 as stage III, 8 as stage IV, and none as stage V (7 patients were not staged). Twenty-eight patients (25.2%) met the criteria for depressive disorder. The average MADRS score was 8.4 ± 5.1 for nondepressed and 20.5 ± 8.4 for depressed patients. The results of the discriminant analysis are shown in Table 2. In this model, too, Wilks’ lambda was highly significant (λ = 0.419, χ² = 90.560, df = 10, P < 0.001). This model classified 88.3% of all patients correctly.
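The χ² value reported with Wilks' lambda is commonly obtained from Bartlett's approximation, χ² = -[N - 1 - (p + g)/2] ln Λ with p(g - 1) degrees of freedom, where N is the number of patients, p the number of items, and g the number of groups. The text does not state which approximation SPSS applied, so this is an assumption. As a check, the sketch below plugs in the MADRS figures reported above (N = 111, p = 10, g = 2, Λ = 0.419) and obtains χ² ≈ 90.5 with 10 df, close to the reported 90.560 given that Λ is rounded to three decimals. To avoid external libraries, the P value uses the closed-form chi-square upper tail that exists only for even degrees of freedom; the function names are hypothetical.

```python
import math


def bartlett_chi_square(wilks_lambda, n, n_items, n_groups):
    """Bartlett's chi-square approximation for testing Wilks' lambda."""
    chi2 = -(n - 1 - (n_items + n_groups) / 2) * math.log(wilks_lambda)
    df = n_items * (n_groups - 1)
    return chi2, df


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with a positive, even number of df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k  # accumulates (x/2)^k / k!
        total += term
    return math.exp(-half) * total


# MADRS model as reported above: N = 111, 10 items, 2 groups, lambda = 0.419
chi2, df = bartlett_chi_square(0.419, n=111, n_items=10, n_groups=2)
p_value = chi2_upper_tail_even_df(chi2, df)
print(f"chi2 = {chi2:.1f}, df = {df}, P = {p_value:.1e}")
```

For models with an odd number of items, such as the 17-item Ham-D model, a general chi-square distribution function (for example from a statistics library) would be needed instead of the closed form.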


In the discriminant model of the MADRS, the two “core” symptoms of depression, depressed mood and anhedonia, had the highest correlation coefficients. The MADRS item that best distinguished between depressed and nondepressed individuals was the clinician’s judgment of “apparent sadness” (observed depression). Somatic items of the MADRS had low correlation coefficients, including the item “concentration difficulties”. However, as with the Ham-D, reduced appetite was a relatively important indicator of depression.

Table 2. Structure matrix of the Montgomery-Åsberg Depression Rating Scale items

Item no.  Description           Correlation  Prevalence (%)
1         observed depression   0.747        45
8         anhedonia             0.739        47
2         reported depression   0.739        56
10        suicidal thoughts     0.435        29
9         feelings of guilt     0.388        53
5         reduced appetite      0.385        21
7         fatigue               0.292        81
3         tension               0.272        64
6         concentration         0.170        58
4         insomnia              0.082        52

Items are ranked in descending order of correlation with the discriminant function. The percentage of nonzero scores on each item is shown as an indicator of the prevalence of the symptom.

A post hoc analysis was performed for both the Ham-D and MADRS to determine how the percentage of correctly classified patients would be affected if only nonsomatic symptoms of depression had been included in the analysis. After exclusion of the somatic items of the Ham-D (items 4, 5, 6, 8, 11-14, and 16), 86.6% of the patients were correctly classified as depressed or nondepressed. After exclusion of the somatic items of the MADRS (items 4-7), 88.3% of the patients were classified correctly.
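The post hoc analysis amounts to refitting the discriminant function on a subset of items and recounting the correct classifications. The sketch below assumes a two-group Fisher discriminant with a pooled within-group covariance matrix and equal prior probabilities, so that each patient is assigned to the nearer group mean on the discriminant score, and it classifies the same patients that were used for fitting; the text does not state exactly how the reported percentages were obtained. The item scores and the function names are hypothetical.

```python
def column_means(rows):
    """Mean of each item (column) across patients (rows)."""
    return [sum(col) / len(col) for col in zip(*rows)]


def pooled_within_cov(groups):
    """Pooled within-group covariance matrix with divisor N - g."""
    p = len(groups[0][0])
    n_total = sum(len(g) for g in groups)
    cov = [[0.0] * p for _ in range(p)]
    for rows in groups:
        means = column_means(rows)
        for row in rows:
            dev = [x - m for x, m in zip(row, means)]
            for a in range(p):
                for b in range(p):
                    cov[a][b] += dev[a] * dev[b]
    return [[c / (n_total - len(groups)) for c in line] for line in cov]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def percent_correct(depressed, nondepressed, items):
    """Percentage correctly classified by a two-group Fisher discriminant on the given items."""
    dep = [[row[j] for j in items] for row in depressed]
    non = [[row[j] for j in items] for row in nondepressed]
    m_dep, m_non = column_means(dep), column_means(non)
    weights = solve(pooled_within_cov([dep, non]), [a - b for a, b in zip(m_dep, m_non)])
    midpoint = [(a + b) / 2 for a, b in zip(m_dep, m_non)]

    def score(row):
        # Positive scores fall on the depressed side of the midpoint (equal priors)
        return sum(w * (x - m) for w, x, m in zip(weights, row, midpoint))

    hits = sum(score(r) > 0 for r in dep) + sum(score(r) <= 0 for r in non)
    return 100 * hits / (len(dep) + len(non))


# Hypothetical scores: item 0 separates the groups, item 1 does not
depressed = [[3, 1], [2, 2], [3, 0], [2, 1]]
nondepressed = [[0, 1], [1, 2], [0, 0], [1, 1]]
print(percent_correct(depressed, nondepressed, items=[0, 1]))  # 100.0
print(percent_correct(depressed, nondepressed, items=[0]))     # 100.0
print(percent_correct(depressed, nondepressed, items=[1]))     # 50.0
```

Dropping an item that carries little group information leaves the hit rate unchanged, which is the pattern the post hoc analysis examines for the somatic items.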



Discussion

We studied the sensitivity and discriminative properties of somatic and nonsomatic symptoms of depression in a large sample of patients with PD. The prevalence of dementia and depressive disorder in this population, as well as the scores of both depressed and nondepressed patients on the Ham-D and MADRS, are comparable to those of other studies.12,13

As expected, the discriminant analyses showed that core symptoms and other nonsomatic symptoms of depression were the most important symptoms for establishing the diagnosis of depressive disorder in patients with PD. The results also showed that most somatic items in both depression rating scales do not contribute substantially to the discriminant model. This was also illustrated by the post hoc analyses: excluding the somatic items from the models only modestly reduced the percentage of patients correctly classified as depressed or nondepressed (from 93.0% to 86.6% for the Ham-D) or left it unchanged (88.3% for the MADRS). However, in PD, not all somatic symptoms should be considered of little importance to the diagnosis of depression. Our analyses show that “reduced appetite” and “early morning wakening” contribute meaningfully to the discriminant model, and thus to the diagnosis of depressive disorder. To prevent the inclusion of nonspecific symptoms in the diagnosis of depression, some authors argue that all somatic symptoms should be eliminated.14-16 Our study does not support this often-advocated view. Instead, it shows that somatic symptoms differ among themselves in diagnostic sensitivity and that a more refined approach is warranted.

The discriminant analytic approach to the clinical problem of diagnosing depressive disorder in a patient with PD has both advantages and limitations. Less prevalent depressive symptoms may have high discriminative power, whereas common symptoms may not. This may limit the clinical applicability of our findings in the individual patient. Furthermore, the correlation coefficients within the discriminant models do not reflect absolute values but only the relative association of individual items with the discriminant model. Therefore, it is not possible to define cutoff values above which the contribution of a symptom should be considered relevant. For the same reason, correlation coefficients of items from different discriminant models, such as those of the Ham-D and MADRS, cannot be compared. The major advantage of discriminant analysis is that it looks at the sensitivity, rather than the specificity, of individual symptoms for the diagnosis of depression. It may therefore help the physician to assign clinical importance to certain symptoms when they are present. In this way, discriminant analysis facilitates a more refined approach to the diagnosis of depression in patients with physical diseases such as PD.
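For two groups, the relative nature of these coefficients can be shown directly. Write W for the pooled within-group covariance matrix of the items, d for the vector of differences between the mean item scores of depressed and nondepressed patients, and w = W^{-1}d for the discriminant weights. This notation is introduced here for illustration and assumes Fisher's two-group formulation. The pooled within-groups correlation of item j with the discriminant score then reduces to the item's mean difference, scaled by its within-group standard deviation, divided by the Mahalanobis distance D between the groups:

```latex
\[
r_j \;=\; \frac{\operatorname{cov}_W\!\left(x_j,\, \mathbf{w}^\top \mathbf{x}\right)}
               {\sqrt{W_{jj}\,\operatorname{var}_W\!\left(\mathbf{w}^\top \mathbf{x}\right)}}
\;=\; \frac{(W\mathbf{w})_j}{\sqrt{W_{jj}\,\mathbf{w}^\top W \mathbf{w}}}
\;=\; \frac{d_j}{\sqrt{W_{jj}}\;D},
\qquad D^2 = \mathbf{d}^\top W^{-1}\mathbf{d}.
\]
```

Every item in a model is divided by the same D, so the coefficients rank items within one model but shift whenever D changes. This is consistent with the point that no general cutoff can be defined and that coefficients from the Ham-D and MADRS models cannot be compared. The overall sign depends only on how the function is oriented.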


In conclusion, the recognition of depressive disorder in PD is often difficult because of the overlap between characteristic symptoms of this disease and the somatic symptoms of depression. The present discriminant analyses show that the core symptoms and other nonsomatic symptoms of depression are most important in distinguishing depressed from nondepressed PD patients. Most somatic items have only low discriminatory properties, with the notable exception of two symptoms that are relatively sensitive indicators of depression: reduced appetite and early morning wakening.

References

1. Robins AH. Depression in patients with Parkinsonism. Br J Psychiatry 1976; 128:141-145.

2. Gotham AM, Brown RG, Marsden CD. Depression in Parkinson's disease: a quantitative and qualitative analysis. J Neurol Neurosurg Psychiatry 1986; 49(4):381-389.

3. Huber SJ, Freidenberg DL, Paulson GW, Shuttleworth EC, Christy JA. The pattern of depressive symptoms varies with progression of Parkinson's disease. J Neurol Neurosurg Psychiatry 1990; 53(4):275-278.

4. Ehmann TS, Beninger RJ, Gawel MJ, Riopelle RJ. Depressive symptoms in Parkinson's disease: a comparison with disabled control subjects. J Geriatr Psychiatry Neurol 1990; 3(1):3-9.

5. Hughes AJ, Daniel SE, Kilford L, Lees AJ. Accuracy of clinical diagnosis of idiopathic Parkinson's disease: a clinico-pathological study of 100 cases. J Neurol Neurosurg Psychiatry 1992; 55(3):181-184.

6. Spitzer RL, Williams JBW, Gibbon M. Instruction manual for the Structured Clinical Interview for DSM-III-R. New York: New York State Psychiatric Institute, 1987.

7. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders. 4 ed. Washington: American Psychiatric Association, 1994.

8. Hamilton M. A rating scale for depression. J Neurol Neurosurg Psychiatry 1960; 23:56-62.

9. Montgomery SA, Asberg M. A new depression scale designed to be sensitive to change. Br J Psychiatry 1979; 134:382-389.

11. Folstein MF, Folstein SE, McHugh PR. "Mini-mental state". A practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res 1975; 12(3):189-198.

12. Cummings JL. Depression and Parkinson's disease: a review. Am J Psychiatry 1992; 149(4):443-454.

13. Leentjens AF, Verhey FR, Lousberg R, Spitsbergen H, Wilmink FW. The validity of the Hamilton and Montgomery-Asberg depression rating scales as screening and diagnostic tools for depression in Parkinson's disease. Int J Geriatr Psychiatry 2000; 15(7):644-649.

14. Endicott J. Measurement of depression in patients with cancer. Cancer 1984; 53(10 Suppl):2243-2249.

15. Rapp SR, Vrana S. Substituting nonsomatic for somatic symptoms in the diagnosis of depression in elderly male medical patients. Am J Psychiatry 1989; 146(9):1197-1200.


16. Zigmond AS, Snaith RP. The Hospital Anxiety and Depression Scale. Acta Psychiatr Scand 1983; 67(6):361-370.


7 Development of a questionnaire for autonomic dysfunction in Parkinson's disease: the SCOPA-AUT

Martine Visser¹, Johan Marinus¹, Anne M. Stiggelbout², Jacobus J. van Hilten¹

Departments of Neurology¹ and Medical Decision Making², Leiden University Medical Center, Leiden, The Netherlands.

Submitted

Chapter 7


Abstract

Objectives. Autonomic symptoms are common in Parkinson's disease (PD), but there is no validated questionnaire covering all relevant domains. The aim was to develop a disease-specific instrument, the SCOPA-AUT, that evaluates autonomic symptoms in patients with PD, and to assess its reliability and validity. Methods. First, items were generated based on an extensive literature search and expert opinion. Based on the results of a postal survey among 46 patients with PD, 21 patients with multiple system atrophy (MSA), and 8 movement disorder specialists, items were reduced according to the frequency, burden, and clinical relevance of the symptoms. The validity of the questionnaire was evaluated in a second postal survey among 140 patients with PD and 100 controls. Test-retest reliability was assessed in a subsample of 55 patients with PD, who received a second questionnaire two weeks after returning the first one. Results. The initial 45 items were reduced to 25 items, which addressed the following domains: gastrointestinal (7 items), urinary (6 items), cardiovascular (3 items), thermoregulatory (4 items), pupillomotor (1 item), and sexual (2 items for men and 2 for women). Test-retest reliability of domain scores and individual items was good, and each domain displayed high internal consistency. Most items and all domains, except sexual dysfunction in men and women, differentiated between the PD and control groups. Each domain had good content validity. Patients in the more advanced stages of PD had significantly more autonomic problems in all domains except sexual dysfunction. Conclusions. The SCOPA-AUT is a questionnaire that evaluates autonomic dysfunction in PD with good reliability and validity.

Autonomic dysfunction in Parkinson's disease


Introduction

Parkinson's disease (PD) has mainly been characterised in terms of motor impairments, but it is increasingly recognized that the clinical spectrum of PD is broader and also affects other areas such as cognition, mood, sleep, and autonomic functions. With respect to the latter, a broad spectrum of autonomic features, involving gastrointestinal, urinary, sexual, cardiovascular, thermoregulatory, respiratory, and pupillomotor functions, has been described in PD.1-7 The reported prevalence of these autonomic features varies considerably, ranging from 2% for urinary incontinence to 72% for constipation.8 These features have been related in part to disease duration, disease severity, or the use of antiparkinsonian drugs.9,10 For several autonomic symptoms, including gastrointestinal and urinary problems, orthostatic hypotension, and erectile dysfunction, therapeutic interventions have become available.11,12 In patients with PD, autonomic dysfunction is associated with depression and affects daily functioning and health-related quality of life.13,14 Autonomic failure is also a frequent and prominent manifestation of multiple system atrophy (MSA).15 Although MSA and PD are distinct diseases, their profiles of autonomic features are quite similar.16

Although autonomic dysfunction in PD has been studied extensively, no reliable and valid instrument exists that covers the full spectrum of autonomic problems in PD. Therefore, the primary aim of this study was to develop a reliable and valid questionnaire for autonomic dysfunction in PD. Additionally, because of the similarity in clinical presentation, the item generation and reduction process was designed to account for the autonomic features of MSA as well.

The development of the SCOPA-AUT is part of a larger research project, the SCales for Outcomes in PArkinson's disease (SCOPA), in which practical and clinimetrically sound instruments for all relevant domains in PD are selected or developed.

Methods

The SCOPA-AUT was developed in three stages: (1) item generation, based on a literature search and expert opinion; (2) item reduction, based on the results of a postal survey; and (3) clinimetric evaluation of the SCOPA-AUT, based on the results of a second postal survey.

The local medical ethical committee approved the study protocol.


Item generation

The items were generated by conducting an extensive review of the literature on autonomic symptoms in PD and MSA, including a review of questionnaires that evaluate specific domains of autonomic dysfunction or address symptoms related to PD. Clinicians specialized in autonomic functions from the areas of neurophysiology, gastro-enterology, gynaecology, urology, and sexology were consulted regarding the generation of relevant items. Each selected item was addressed by two questions. The first evaluated the frequency of the problem ("How often do you suffer from this problem?", with response options ranging from 0 = 'never' to 3 = 'often'). This was followed by a second question addressing the burden to the patient ("How much does this problem bother you?", with response options ranging from 1 = 'not at all' to 4 = 'very much'). The time frame for all questions was the past month, except for the item 'syncope', which addressed the past six months. The questionnaire was first reviewed for content by these professionals and then piloted in 16 patients for intelligibility of questions and response options, after which unclear questions were rephrased.

Item reduction

This initial questionnaire was sent to 55 patients with PD. All patients were selected from the database of the outpatient movement disorders clinic of the Department of Neurology of the Leiden University Medical Center, in which clinical information is regularly updated. All patients fulfilled the United Kingdom Parkinson's Disease Society Brain Bank (UKPDSBB) criteria for idiopathic PD.17 Equal numbers of patients with PD were selected from each Hoehn and Yahr (H&Y) stage18 to cover the full spectrum of disease severity. The questionnaire was also sent to 18 patients who fulfilled the criteria for MSA19 and were registered at the same clinic. Additionally, 20 patients with MSA who attended an MSA meeting organised by the Dutch Parkinson's Disease Society were asked to complete the questionnaire as well.

For both patient groups, the mean, standard deviation, and frequency distribution of each item were calculated for the frequency and burden of the problem, as well as for the product of frequency and burden. The aim was to select autonomic problems with a high frequency, a high burden, or a combination of high frequency and burden in patients with either PD or MSA. Items were selected for the SCOPA-AUT if at least one of the following criteria was fulfilled: (1) frequency: mean of 1 ('a little of the time') or higher; (2) burden: mean of 3 ('quite a bit') or higher; or (3) product of frequency and burden: mean of 2 or higher. Items were considered redundant and removed if the inter-item correlation exceeded 0.80. The items that best covered the domain remained in the questionnaire. Items were not removed if their removal would affect the content validity of the domain.

To ensure that uncommon problems with high clinical relevance were not excluded, the questionnaire was then sent to 10 movement disorder specialists, who rated the clinical relevance of the items on a scale from 0 ('not at all') to 3 ('very high'). Items with a mean of 2 ('rather important') or higher were selected.
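The selection rules above can be written as a small filter. This is a hypothetical sketch: the data layout (for each item and patient group, per-patient lists of frequency scores 0 to 3 and burden scores 1 to 4 in the same order), the item names, and the function names are assumptions, and burden is assumed to be recorded for every patient. An item is kept if, in either the PD or the MSA group, its mean frequency is at least 1, its mean burden at least 3, or the mean product of frequency and burden at least 2, or if the specialists' mean relevance rating is at least 2. Redundant pairs (inter-item correlation above 0.80) are only flagged, because deciding which item best covers the domain was a clinical judgement.

```python
from statistics import mean


def meets_patient_criteria(frequency, burden):
    """Patient-based selection rule for one item in one group.

    frequency: per-patient scores 0-3; burden: per-patient scores 1-4 (same order).
    """
    products = [f * b for f, b in zip(frequency, burden)]
    return mean(frequency) >= 1 or mean(burden) >= 3 or mean(products) >= 2


def select_items(pd_data, msa_data, relevance):
    """Keep an item if PD or MSA data meet a criterion, or specialists rate it >= 2."""
    kept = []
    for item in pd_data:
        by_patients = any(meets_patient_criteria(*group[item]) for group in (pd_data, msa_data))
        by_specialists = mean(relevance[item]) >= 2
        if by_patients or by_specialists:
            kept.append(item)
    return kept


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def redundant_pairs(scores, threshold=0.80):
    """Flag item pairs whose inter-item correlation exceeds the threshold."""
    names = list(scores)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if pearson(scores[a], scores[b]) > threshold]


# Hypothetical data: (frequency scores, burden scores) per item and group
pd_data = {"item_A": ([2, 3, 1], [3, 4, 2]), "item_B": ([0, 1, 0], [1, 2, 1])}
msa_data = {"item_A": ([3, 3, 2], [4, 4, 3]), "item_B": ([0, 0, 1], [1, 1, 2])}
relevance = {"item_A": [3, 2, 3], "item_B": [1, 1, 0]}
print(select_items(pd_data, msa_data, relevance))  # ['item_A']
```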

Evaluation of the SCOPA-AUT

The selected items were rephrased into a single question evaluating the frequency of the problem, with response options ranging from 0 ('never') to 3 ('often'). Extra response options were added for questions in the urinary and sexual domains. In the urinary domain, an extra response option, scored as the worst score of 3, allowed subjects to indicate that they used a catheter. In the sexual domain, an extra option, 'not applicable', allowed subjects to indicate that they had not been sexually active; this response option was not included in the domain or total score. This second questionnaire was piloted for intelligibility in 10 patients, and ambiguous or misleading items were rephrased.
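The scoring conventions for the extra response options can be sketched as follows. The response labels and function names are hypothetical. The sketch assumes that the catheter option is scored as 3, as described above, and that a 'not applicable' response is simply left out of the sum rather than replaced by another value.

```python
CATHETER = "catheter"              # hypothetical label for the extra urinary option
NOT_APPLICABLE = "not applicable"  # hypothetical label for the extra sexual option


def item_score(response):
    """Numeric score of one response, or None when the response is not scored."""
    if response == CATHETER:
        return 3  # anchored to the worst frequency score
    if response == NOT_APPLICABLE:
        return None  # left out of domain and total scores
    if response not in (0, 1, 2, 3):
        raise ValueError(f"unexpected response: {response!r}")
    return response


def domain_score(responses):
    """Sum of the scored responses in one domain."""
    scores = (item_score(r) for r in responses)
    return sum(s for s in scores if s is not None)


print(domain_score([1, CATHETER, 0]))     # urinary example: 1 + 3 + 0 = 4
print(domain_score([NOT_APPLICABLE, 2]))  # sexual example: 2
```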

A second postal survey was performed in 185 patients with PD, who were selected from the SCOPA database and fulfilled the UKPDSBB criteria for idiopathic PD, and in 112 controls. To ensure sufficient numbers of patients in all disease stages, 11 patients from H&Y stages 4 and 5 who had participated in the first study were included again in the second study. This was not expected to bias the results, because all items had been rephrased and the time interval between the two questionnaires was more than seven months. Each patient was asked to provide, if possible, the names and addresses of one man and one woman of approximately the same age (age difference less than 10 years) who agreed to participate in the control group. Partners were not allowed as control subjects. The survey also included demographic questions and a questionnaire on comorbid diseases, which addressed 20 common chronic disorders.20 Each patient questionnaire was labelled so that the results could be combined with information from patient records regarding disease severity, disease duration, and medication. Non-responders were reminded by telephone after two weeks. To determine the test-retest reliability of the questionnaire, a subsample consisting of the first 60 patients who returned the questionnaire received a second mailing two weeks after returning the first.


The following properties were examined for each item: mean, standard deviation, and frequency distribution of the responses. Test-retest reliability of individual items was assessed with a weighted kappa (quadratic weights). Differences in item scores between the two groups were analysed using the Mann-Whitney U test. Items that did not differentiate between the PD and control groups were removed, provided that adequate content validity of the domain was preserved. Means and standard deviations were calculated for the total score and for each domain, and differences between groups were analysed using a t-test. Test-retest reliability of the domains and total score was analysed with an intraclass correlation coefficient (ICC), and the internal consistency of the domains with Cronbach's alpha. 'Known-groups' analyses, using Kruskal-Wallis or ANOVA with post-hoc t-tests and Bonferroni correction, were performed to compare the domain and total scores of controls and patients grouped by disease severity.
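Two of the reliability statistics named above are short enough to write out. The sketch below illustrates the standard formulas and is not the software used in the study. It computes Cohen's weighted kappa with quadratic disagreement weights (i - j)²/(k - 1)² for a test and a retest rating of one item, and Cronbach's alpha for a domain. The SCOPA-AUT categories 0 to 3 are assumed as the default, the function names are hypothetical, and the ICC is not covered.

```python
from statistics import variance


def quadratic_weighted_kappa(test, retest, categories=(0, 1, 2, 3)):
    """Cohen's weighted kappa for two ratings, with quadratic disagreement weights."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(test)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(test, retest):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]                     # test marginals
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]  # retest marginals

    def weight(i, j):
        return (i - j) ** 2 / (k - 1) ** 2

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed_dis = sum(weight(i, j) * observed[i][j] for i, j in cells)
    expected_dis = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
    return 1 - observed_dis / expected_dis


def cronbach_alpha(items):
    """items: one list of scores per item, respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


print(round(quadratic_weighted_kappa([0, 1, 2], [0, 2, 2]), 3))  # 0.8
print(cronbach_alpha([[1, 2, 3], [1, 2, 3]]))                    # 1.0
```

With quadratic weights, weighted kappa is closely related to the intraclass correlation coefficient, which is one reason this weighting is commonly chosen for ordinal items.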

Results

Item generation and item reduction

Forty-five items in the following domains were selected for the initial questionnaire: gastro-

intestinal (14 items), urinary (8 items), cardiovascular (5 items), thermoregulatory (6 items),

pupillomotor (1 item), skin (1 item), respiratory (1 item), and sexual dysfunction (6 items for

men, 3 for women). An additional item assessed the use of medication in the aforementioned

domains. Forty-six of the 55 patients with PD (84%) returned the questionnaire in the first

postal survey. The mean age of these patients was 64.9 (SD 9.2) years and the disease

duration 10.6 (SD 5.9) years (table 1). Twenty-one patients with MSA (53%) returned the

questionnaire. The mean age of the MSA patients was 63.9 (SD 8.0) years and their disease

duration 6.8 (SD 2.7) years. Eight of the 10 movement disorder specialists returned the

questionnaire. Based on the results in patients with PD, 23 items fulfilled the criteria for frequency, burden, and the product of both. Two items were additionally included based on information from patients with MSA. The specialists indicated the clinical relevance of another 10 items, resulting in a total of 35 items. The item 'problems with orgasm', which did not meet the criteria in the female group, was retained to obtain similar coverage of the sexual

domain in both sexes. Two items in the urinary domain, two items involving orthostatic

hypotension, and three items concerning sexual dysfunction in men, were removed because of

redundancy. The item on medication was modified to match the selected items. Overall, the

item reduction process resulted in a total of 28 items, with one additional question on the use


of relevant medication. The selected items assessed the following domains: gastro-intestinal

(10 items), urinary (6 items), cardiovascular (3 items), thermoregulatory (4 items),

pupillomotor (1 item), and sexual dysfunction (2 items for men and 2 for women).

Evaluation of the SCOPA-AUT

In the second postal survey, 143 of the 185 patients (77%) returned the questionnaire. Three

patients were excluded from the analyses because more than 20% of their data was missing.

The response rate in the test-retest assessment was 92%. Data from two patients were

removed from this analysis because they had more than 20% missing values. In the control

group 100 of the 108 subjects (93%) returned the questionnaire. There were no differences

between the characteristics of the total PD group and the subsample used for test-retest

reliability. Compared to the PD group, subjects in the control group were younger and

included more females (table 1), and the analyses were therefore adjusted for age and sex.
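The exclusion rule used above (more than 20% missing data) can be sketched as follows. This is an illustrative Python helper, assuming that a missing answer is stored as `None`; the function names are hypothetical. Note that the rule is strict: a respondent with exactly 20% missing items is kept.

```python
def exceeds_missing_threshold(responses, threshold=0.20):
    """True when strictly more than `threshold` of one respondent's items are missing.

    responses: the respondent's item scores, with None marking a missing answer.
    """
    if not responses:
        raise ValueError("no items supplied")
    missing = sum(1 for r in responses if r is None)
    return missing / len(responses) > threshold


def usable_respondents(all_responses, threshold=0.20):
    """Keep only respondents whose proportion of missing items is within the threshold."""
    return [r for r in all_responses if not exceeds_missing_threshold(r, threshold)]
```

With a hypothetical 10-item questionnaire, two missing answers (20%) keep a respondent in the analysis, whereas three (30%) exclude them.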

Table 1. Characteristics of the PD and MSA groups in the first postal survey and of the PD and control groups in the second postal survey

                                     Postal survey 1            Postal survey 2
Variable                             PD-1          MSA          PD-2          Controls      p
N                                    46            21           140           100           -
Mean (SD) age (years)                64.9 (9.2)    63.9 (8.0)   65.6 (10.9)   61.4 (11.2)   0.005¹
Gender (M/F)                         26/19         12/9         84/56         48/52         0.066²
Mean (SD) disease duration (years)   10.6 (5.9)    6.8 (2.7)    9.9 (5.2)
Hoehn and Yahr stage (n)                           NA³                        NA³
  1                                  3                          4
  2                                  15                         55
  3                                  15                         52
  4                                  10                         26
  5                                  1                          3
  missing                            2                          0

¹t-test and ²chi-square test, both comparing the subjects (PD-2 and controls) of the second postal survey; ³not applicable

Chapter 7


Missing data were low, namely 0-1% per item in the control group and 0-4% in the PD group, except for the questions on sexual functioning. These four questions had the most missing values, especially in female patients (11-13%). In total, 46-50% of the women with PD and 39-40% of the female controls answered 'not applicable' to the two items for women, compared to 21-24% of the male patients and 10-17% of the male controls on the two items for men. Women who had missing values or answered 'not applicable' were significantly older, in both the PD and the control group.

Table 2. Frequency distribution and item characteristics of the PD group in the second survey

Item                                        0     1     2     3   extra   valid¹   Kw²
swallowing                                 55    75     9     1     -      140    0.69
salivation                                 34    63    31    12     -      140    0.86
dysphagia                                  83    52     4     0     -      139    0.74
gastric emptying                           68    50    17     5     -      140    0.73
constipation                               64    46    19    11     -      140    0.75
straining                                  24    66    29    21     -      140    0.74
faecal incontinence                       120    17     2     1     -      140    0.56
urgency                                    44    58    19    14     4³     139    0.79
urinary incontinence                       72    42    13     8     4³     139    0.84
incomplete emptying                        62    49    15     9     4³     139    0.69
weak stream of urine                       48    55    21     8     4³     136    0.87
frequency of urination                     14    56    46    20     3³     139    0.45
nocturia                                   13    24    55    43     5³     140    0.76
light-headed when standing up              68    55    12     3     -      138    0.75
light-headed when standing for some time   84    38    12     1     -      135    0.69
syncope                                   133     6     1     0     -      140    0.85
hyperhidrosis during the day               67    49    17     7     -      140    0.73
hyperhidrosis during the night             52    52    22    14     -      140    0.80
cold intolerance                           77    39    17     4     -      137    0.74
heat intolerance                           63    39    27     6     -      135    0.74
over-sensitive to bright light             54    58    20     8     -      140    0.74
Men: impotence                             25    15    14     8    18⁴      80    0.87
Men: ejaculation problem                   26    17    11     6    20⁴      80    0.73
Women: vaginal dryness                     12     7     1     3    26⁴      49    0.77
Women: orgasm problem                       7     9     4     2    28⁴      50    0.61

0-3: number of patients per response category (0 = never, 1 = sometimes, 2 = regularly, 3 = often); ¹number of valid responses; ²weighted kappa; ³extra response option (catheter); ⁴extra response option (not applicable).



Test-retest reliability for individual items was high, with weighted kappas ranging from 0.45 to 0.90 (table 2). According to the criteria of Landis and Koch,21 only two items showed moderate agreement (i.e., between 0.41 and 0.60): 'frequency of urination' (0.45) and 'faecal incontinence' (0.56). Compared to controls, patients had significantly higher scores on all items (p < 0.05), except for 'heartburn', 'diarrhoea', 'flatulence', and 'syncope'. Similar results emerged when the scores were corrected for age, or analysed separately for men and women. In men the items 'incomplete emptying', 'impotence', and 'ejaculation problem', and in women the items 'faecal incontinence', 'vaginal lubrication', and 'problems with orgasm', did not differentiate between patients and controls. Compared to controls, patients reported significantly higher use of medication for constipation (23% versus 4%); no differences in medication use were found for the other domains. After removal of the items 'heartburn', 'diarrhoea', and 'flatulence', the remaining seven items still adequately covered the gastrointestinal domain. The final version of the questionnaire includes 23 items for each sex (appendix). With each item scored 0-3, the scale has a score range of 0-69, with higher scores reflecting worse autonomic functioning.
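The verbal agreement labels used above can be made explicit with a small lookup. The Python sketch below encodes the Landis and Koch bands (below 0 poor, 0.00-0.20 slight, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 substantial, 0.81-1.00 almost perfect); the function name is hypothetical and boundary values are assigned to the lower band.

```python
def landis_koch_label(kappa):
    """Verbal label for a kappa value, following the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")
```

Applied to the kappas in table 2, only 'frequency of urination' (0.45) and 'faecal incontinence' (0.56) fall in the moderate band; every other item is labelled substantial or almost perfect.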

The test-retest reliability of the domain scores was good, with ICCs ranging from 0.67-0.90.

The ICC for the total score was 0.87 (table 3). The internal consistency was calculated in the

PD group and was high for most domains, with Cronbach's alpha ranging from 0.56-0.86

(table 3). The internal consistency of the pupillomotor domain could not be calculated, since it

includes only one item.
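The two reliability statistics reported above can be sketched in plain Python. Cronbach's alpha follows the usual formula k/(k-1) x (1 - sum of item variances / variance of the total). The chapter does not state which ICC form was used, so the second function assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement) purely for illustration. Both functions expect complete cases, and their names are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one domain.

    item_scores: one list of item scores per respondent (complete cases only).
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*item_scores))             # per-item columns
    totals = [sum(person) for person in item_scores]
    sum_item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores: one list of repeated measurements per subject,
    e.g. [test_domain_score, retest_domain_score].
    """
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(col) for col in zip(*scores)]

    # Two-way ANOVA sums of squares: subjects, occasions, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Because this ICC form measures absolute agreement, a systematic shift between test and retest lowers it: `icc_2_1([[1, 2], [2, 3], [3, 4]])` is 2/3 even though the two occasions are perfectly correlated. A consistency-type ICC would ignore such a shift.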

Patients had significantly higher scores than controls in all domains (p < 0.05), except for sexual dysfunction in men and in women analysed separately (table 3). We also compared the scores of controls with those of patients classified by H&Y stage as mild (H&Y 1 and 2), moderate (H&Y 3), or severe (H&Y 4 and 5). The total score and the domain scores differed significantly between these groups, except for sexual dysfunction (combined, men, and women; table 4). A significant trend, with more autonomic problems in groups with greater disease severity, was present in all domains except sexual dysfunction. Domain scores did not differ between patients with severe PD who participated in both the first and the second postal survey and patients who participated only in the second postal survey.
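The severity grouping and the Bonferroni correction described above can be sketched as follows. The mapping from H&Y stage to group follows the text; which pairs were compared post hoc is not stated, so the sketch assumes, hypothetically, that all pairs among controls and the three severity groups were tested. Function names are illustrative.

```python
from itertools import combinations


def severity_group(hy_stage):
    """Map a Hoehn and Yahr stage to the severity groups used in table 4."""
    if hy_stage in (1, 2):
        return "mild"
    if hy_stage == 3:
        return "moderate"
    if hy_stage in (4, 5):
        return "severe"
    raise ValueError(f"unexpected H&Y stage: {hy_stage!r}")


def bonferroni_alpha(groups, family_alpha=0.05):
    """All pairwise comparisons and the Bonferroni-adjusted per-comparison alpha."""
    pairs = list(combinations(groups, 2))
    return pairs, family_alpha / len(pairs)
```

Under this assumption, four groups give six comparisons, so each post-hoc t-test would be judged at 0.05 / 6 (about 0.0083). Applying `severity_group` to the H&Y counts of the second survey in table 1 gives 59 mild, 52 moderate, and 29 severe patients.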

Patients with PD reported more comorbid diseases than subjects in the control group. Some of these comorbidities, however, are inherent to PD, namely dizziness, urinary incontinence, and bowel dysfunction. After these symptoms were removed from the comorbidity questionnaire, no differences between the groups were found.


Table 3. Mean (SD) domain scores in patients and controls, and reliability in patients

Domain                               PD           Controls     p         Cronbach's α   ICC¹
Gastrointestinal (7 items)           5.3 (3.1)    1.4 (1.6)    <0.001²   0.66           0.90
Urinary (6 items)                    7.1 (4.2)    3.9 (2.4)    <0.001²   0.84           0.83
Cardiovascular (3 items)             1.2 (1.3)    0.3 (0.6)    <0.001²   0.57           0.83
Thermoregulatory (4 items)           3.1 (2.4)    1.8 (2.0)    <0.001²   0.68           0.82
Pupillomotor (1 item)                0.9 (0.9)    0.4 (0.7)    <0.001³   -              0.74⁴
Sexual, combined (2+2 items)         1.9 (1.8)    1.3 (1.6)    0.035²    -              -
Sexual, men (2 items)                2.0 (1.9)    1.3 (1.7)    0.055²    0.88           0.84
Sexual, women (2 items)              1.7 (1.5)    1.4 (1.5)    0.440²    0.56           0.68
Total autonomic score (25 items)     18.8 (8.5)   8.8 (5.4)    <0.001²   -              0.87

¹intraclass correlation coefficient; ²t-test; ³Mann-Whitney U test; ⁴weighted kappa statistic

Table 4. Comparisons between controls and patients grouped by Hoehn & Yahr stage

                                     Patients (H&Y group¹)
Domain                   Controls    1       2       3       p          Fit⁴    Trend⁵
Gastrointestinal         1.4         4.4     5.7     6.9     <0.001²    0.71    <0.001
Urinary                  3.9         6.2     7.5     8.1     <0.001²    0.38    <0.001
Cardiovascular           0.27        1.0     1.2     1.1     <0.001²    0.18    <0.001
Thermoregulatory         1.8         3.1     3.1     3.9     <0.001²    0.85    <0.001
Pupillomotor             0.4         0.8     0.9     0.8     <0.001³    0.75    <0.001
Sexual (combined)        1.3         1.6     2.3     1.5     0.055²     0.07    0.060
Sexual (men)             1.3         1.6     2.6     1.5     0.062²     0.40    0.065
Sexual (women)           1.4         1.4     1.8     1.6     0.890²     0.14    0.748
Total autonomic score    8.8         16.5    19.8    21.4    <0.001²    0.52    <0.001

¹modified Hoehn & Yahr stages: 1 = mild (H&Y 1+2), 2 = moderate (H&Y 3), 3 = severe (H&Y 4+5); ²ANOVA; ³Kruskal-Wallis test; ⁴goodness of fit: p > 0.05 indicates that the model does not differ significantly from a model with good fit; ⁵trend: p < 0.05 indicates that a trend is present and differs significantly from zero.



Discussion

This study was designed to develop and evaluate a questionnaire for autonomic dysfunction in patients with PD. Its results indicate that autonomic dysfunction is a prominent aspect of PD, present early in the disease and increasing with advancing H&Y stage. Although some studies have used questionnaires to assess autonomic dysfunction, these instruments were never thoroughly validated.22 The items in the SCOPA-AUT were selected because they were common, carried a high patient burden, were clinically relevant, and were more frequently present in patients with PD than in controls. The response rate in both surveys was high, which makes it likely that the respondents adequately reflect the intended sample; it may also indicate that these aspects are relevant to patients.

The evaluation of the validity of the SCOPA-AUT is based on content validity and 'known-

groups' comparisons. No gold standard or validated questionnaire exists against which the

SCOPA-AUT can be compared. Urodynamic tests,23 cardiovascular reflex measurements,24 or

anorectal manometry,25 are autonomic function tests that have been used in patients with PD.

These tests provide insight into specific aspects of autonomic failure, but cannot be used to validate this questionnaire, because the symptoms reported by patients do not necessarily correspond to the results of laboratory tests, or vice versa. In developing the SCOPA-AUT, we focussed on the content and clinical applicability of the questionnaire. Infrequent but serious symptoms such as syncope were therefore retained because of their clinical relevance. Taken together, the content validity of the SCOPA-AUT is good: it is based on the opinions of both experts and patients and thereby covers the relevant features of the different autonomic domains. The

instrument also shows adequate known-groups validity, as the instrument discriminates

between patients and controls, and between patients in mild, moderate, and severe stages. The

test-retest reliability of items, domains, and the total score is high. These are important

characteristics, especially for use in longitudinal studies and clinical trials. In the clinical

management of patients, the questionnaire can be used to identify problematic autonomic

areas that require attention or treatment.

To capture the spectrum of autonomic dysfunction in PD adequately, we aimed to include

sufficient numbers of patients from each H&Y stage. As patients from H&Y stages 4 and 5

were underrepresented, 11 patients who participated in the first survey were re-included in the

second survey. Potential bias in the final results is considered small, as all items were

rephrased and the interval between both surveys was more than seven months. This was


confirmed by a post-hoc analysis, which showed no differences between patients who

participated both times and those who participated in the second survey only.

Although PD and MSA are pathologically distinct diseases, the clinical spectrum of

dysautonomia overlaps.16 Severe autonomic dysfunction is often associated with MSA, but

Riley et al.26 found no differences between PD and MSA patients in five different tests of

autonomic function. Therefore, in the item selection and reduction process, the SCOPA-AUT

was designed to cover the relevant domains of autonomic dysfunction of both diseases,

allowing this instrument to be used in both patient groups. Patients with MSA were not

included in the final evaluation of the SCOPA-AUT, because our main objective was to

evaluate its use in PD. In the near future we aim to evaluate the SCOPA-AUT in a sample of

patients with MSA, to determine the reliability and validity of this instrument in patients with

this condition.

The domain scores are the sum of the individual item scores in that particular domain and

reflect the degree of dysfunction. The items evaluate different functions within these domains

and can even represent the extremes of a continuum, e.g., heat and cold intolerance. We

therefore were somewhat surprised that the internal consistencies of the domains, measured

with Cronbach's alpha, were still fairly high.
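The domain scoring described above can be sketched in Python. The item numbers per domain are taken from the appendix of this chapter; the function name and the input format (a dict from item number to a 0-3 score) are illustrative assumptions. The chapter does not describe how the extra options ('uses catheter', 'not applicable') enter the score, so this sketch accepts only the four ordinary response categories and rejects anything else.

```python
# Item numbers per domain, as listed in the appendix of this chapter.
DOMAINS = {
    "gastrointestinal": [1, 2, 3, 4, 5, 6, 7],
    "urinary": [8, 9, 10, 11, 12, 13],
    "cardiovascular": [14, 15, 16],
    "thermoregulatory": [17, 18, 20, 21],
    "pupillomotor": [19],
    "sexual_men": [22, 23],
    "sexual_women": [24, 25],
}


def domain_scores(responses, sex):
    """Sum 0-3 item scores per domain, plus a total, for one respondent.

    responses: dict mapping item number -> score (0 = never ... 3 = often).
    sex: "M" or "F"; only the sexual items for that sex are scored.
    Extra options (catheter, not applicable) are not handled here.
    """
    if sex not in ("M", "F"):
        raise ValueError("sex must be 'M' or 'F'")
    skip = "sexual_women" if sex == "M" else "sexual_men"
    scores = {}
    for domain, items in DOMAINS.items():
        if domain == skip:
            continue
        values = [responses[i] for i in items]
        if any(v not in (0, 1, 2, 3) for v in values):
            raise ValueError(f"{domain}: item scores must be 0-3")
        scores[domain] = sum(values)
    scores["total"] = sum(scores.values())   # 23 items per sex, so at most 69
    return scores
```

A respondent who answers 'often' (3) to every item reaches the maximum total of 69 for either sex, matching the stated score range of 0-69.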

In agreement with other studies,27 the questions in the domain of sexual dysfunction displayed

the most missing values, viz., 13% in female patients, in addition to 50% answering 'not

applicable'. Only the data of a small subsample of women was therefore available for analysis

and this revealed no differences in sexual dysfunction between patients and controls. Women

who did not answer these questions were significantly older than women who responded.

Different results may be found in a younger sample.

Patients with PD had significantly higher scores than controls in most items, most domains,

and the total score. These results differ from findings in previous studies, where only some of

these symptoms differed significantly between patients and controls.22,28 This discrepancy

may be explained by the comparatively large sample in our study. The aforementioned studies

included 48 patients and 32 controls, and 44 patients and 24 controls, respectively.

Additionally, patients in our study covered a broad range of disease severity and disease

duration. Within the PD group, patients with more advanced disease stages had higher domain

scores, except for sexual dysfunction. This suggests that the questionnaire may be able to measure change, although its responsiveness still needs to be evaluated.



Acknowledgements

The authors thank Prof. J.G. van Dijk, Dr. A.A.M. Masclee, Dr. P.T.M. Wijenborg, and Dr.

A.A.B. Lycklama à Nijeholt for their help in generating relevant items. We would like to thank Prof.

R.A.C. Roos for reviewing the manuscript.

References

1. Edwards LL, Pfeiffer RF, Quigley EM, Hofman R, Balluff M. Gastrointestinal symptoms in Parkinson's

disease. Mov Disord 1991; 6(2):151-156.

2. Lemack GE, Dewey Jr RB, Roehrborn CG, O'Suilleabhain PE, Zimmern PE. Questionnaire-based

assessment of bladder dysfunction in patients with mild to moderate Parkinson's disease. Urology 2000;

56(2):250-254.

3. Brown RG, Jahanshahi M, Quinn N, Marsden CD. Sexual function in patients with Parkinson's disease and

their partners. J Neurol Neurosurg Psychiatry 1990; 53(6):480-486.

4. Senard JM, Rai S, Lapeyre MM, Brefel C, Rascol O, Rascol A, et al. Prevalence of orthostatic hypotension

in Parkinson's disease. J Neurol Neurosurg Psychiatry 1997; 63(5):584-589.

5. Fischer M, Gemende I, Marsch WC, Fischer PA. Skin function and skin disorders in Parkinson's disease. J

Neural Transm 2001; 108(2):205-213.

6. Shill H, Stacy M. Respiratory function in Parkinson's disease. Clin Neurosci 1998; 5(2):131-135.

7. Armstrong RA. Parkinson's disease and the eye. Ophthalmic Physiol Opt 1997; 17 Suppl 2:S9-16.

8. Martignoni E, Pacchetti C, Godi L, Micieli G, Nappi G. Autonomic disorders in Parkinson's disease. J

Neural Transm Suppl 1995; 45:11-19.

9. Turkka JT. Correlation of the severity of autonomic dysfunction to cardiovascular reflexes and to plasma

noradrenaline levels in Parkinson's disease. Eur Neurol 1987; 26(4):203-210.

10. Kujawa K, Leurgans S, Raman R, Blasucci L, Goetz CG. Acute Orthostatic Hypotension When Starting

Dopamine Agonists in Parkinson's Disease. Arch Neurol 2000; 57(10):1461-1463.

11. Task force Movement Disorders Society. Drugs to treat autonomic dysfunction in Parkinson's disease. Mov

Disord 2002; 17 Suppl 4:S103-S111. Abstract.

12. Hussain IF, Brady CM, Swinn MJ, Mathias CJ, Fowler CJ. Treatment of erectile dysfunction with sildenafil

citrate (Viagra) in parkinsonism due to Parkinson's disease or multiple system atrophy with observations on

orthostatic hypotension. J Neurol Neurosurg Psychiatry 2001; 71(3):371-374.

13. Berrios GE, Campbell C, Politynska BE. Autonomic failure, depression and anxiety in Parkinson's disease.

Br J Psychiatry 1995; 166(6):789-792.

14. Damiano AM, Snyder C, Strausser B, Willian MK. A review of health-related quality-of-life concepts and

measures for Parkinson's disease. Qual Life Res 1999; 8(3):235-243.

15. Chaudhuri KR. Autonomic dysfunction in movement disorders. Curr Opin Neurol 2001; 14(4):505-511.

16. Magalhaes M, Wenning GK, Daniel SE, Quinn NP. Autonomic dysfunction in pathologically confirmed

multiple system atrophy and idiopathic Parkinson's disease--a retrospective comparison. Acta Neurol Scand

1995; 91(2):98-102.


17. Gibb WR, Lees AJ. The relevance of the Lewy body to the pathogenesis of idiopathic Parkinson's disease. J



19. Gilman S, Low PA, Quinn N, Albanese A, Ben Shlomo Y, Fowler CJ, et al. Consensus statement on the

diagnosis of multiple system atrophy. J Neurol Sci 1999; 163(1):94-98.

20. Van der Velden J, Abrahamse HPhH, Donker G, Van der Steen J, Van Sonsbeek JLA, Van den Bos GAM.

What do health interview surveys tell us about the prevalences of somatic chronic diseases? : A study into

concurrent validity. Eur J Public Health 1998; 8:52-58.


33(1):159-174.

22. Siddiqui MF, Rast S, Lynn MJ, Auchus AP, Pfeiffer RF. Autonomic dysfunction in Parkinson's disease: a

comprehensive symptom survey. Parkinsonism Relat Disord 2002; 8(4):277-284.

23. Araki I, Kitahara M, Oida T, Kuno S. Voiding dysfunction and Parkinson's disease: urodynamic

abnormalities and urinary symptoms. J Urol 2000; 164(5):1640-1643.

24. Turkka JT, Tolonen U, Myllyla VV. Cardiovascular reflexes in Parkinson's disease. Eur Neurol 1987;

26(2):104-112.

25. Stocchi F, Badiali D, Vacca L, D'Alba L, Bracci F, Ruggieri S, et al. Anorectal function in multiple system

atrophy and Parkinson's disease. Mov Disord 2000; 15(1):71-76.

26. Riley DE, Chelimsky TC. Autonomic nervous system testing may not distinguish multiple system atrophy

from Parkinson's disease. J Neurol Neurosurg Psychiatry 2003; 74(1):56-60.

27. Sakakibara R, Shinotoh H, Uchiyama T, Sakuma M, Kashiwado M, Yoshiyama M, et al. Questionnaire-

based assessment of pelvic organ dysfunction in Parkinson's disease. Auton Neurosci 2001; 92(1-2):76-85.

28. Singer C, Weiner WJ, Sanchez-Ramos JR. Autonomic dysfunction in men with Parkinson's disease. Eur

Neurol 1992; 32(3):134-140.



Appendix

SCOPA-AUT

The response options for all questions are: never, sometimes, regularly, often. In some domains extra response

options are added. The questions concerning medication have the response options: no and yes.

1. In the past month, have you had difficulty swallowing or have you choked?

2. In the past month, has saliva dribbled out of your mouth?

3. In the past month, has food ever become stuck in your throat?

4. In the past month, did you ever have the feeling during a meal that you were full very quickly?

5. Constipation is a blockage of the bowel, a condition in which someone has a bowel movement twice a

week or less. In the past month, have you had problems with constipation?

6. In the past month, did you have to strain hard to pass stools?

7. In the past month, have you had involuntary loss of stools?

8. In the past month, have you had difficulty retaining urine? (extra: use catheter)

9. In the past month, have you had involuntary loss of urine? (extra: use catheter)

10. In the past month, have you had the feeling that after passing urine your bladder was not completely

empty? (extra: use catheter)

11. In the past month, has the stream of urine been weak? (extra: use catheter)

12. In the past month, have you had to pass urine again within 2 hours of the previous time? (extra: use

catheter)

13. In the past month, have you had to pass urine at night? (extra: use catheter)

14. In the past month, when standing up have you had the feeling of becoming either light-headed, or no

longer being able to see properly or no longer being able to think clearly?

15. In the past month, did you become light-headed after standing for some time?

16. Have you fainted in the past 6 months?

17. In the past month, have you ever perspired excessively during the day?

18. In the past month, have you ever perspired excessively during the night?

19. In the past month, have your eyes ever been over-sensitive to bright light?

20. In the past month, how often have you had trouble tolerating cold?

21. In the past month, how often have you had trouble tolerating heat?

The following 3 questions are only for men:

22. In the past month, have you been impotent (unable to have or maintain an erection)? (extra: not

applicable)

23. In the past month, how often have you been unable to ejaculate?

(extra: not applicable)

23a. In the past month, have you taken medication for an erection disorder? (If so, which medicine?) (no;

yes: _______)


The following 2 questions are only for women:

24. In the past month, was your vagina too dry during sexual activity? (extra: not applicable)

25. In the past month, have you had difficulty reaching an orgasm? (extra: not applicable)

The following questions are for everyone:

26. In the past month, have you used medication for:

a. constipation? b. urinary problems? c. blood pressure? d. other symptoms (no; yes: _______)


8. Development of a short scale for motor impairments and disabilities in Parkinson's disease: the SPES/SCOPA

Johan Marinus¹, Martine Visser¹, Anne M. Stiggelbout², Jose Martin Rabey³, Pablo Martínez-Martín⁴, Ubaldo Bonuccelli⁵, Peter H. Kraus⁶, Jacobus J. van Hilten¹

Departments of ¹Neurology and ²Medical Decision Making, Leiden University Medical Center, Leiden, The Netherlands; ³Department of Neurology, Assaf Harofeh Medical Center, Zerefin, Israel; ⁴Neuroepidemiology Unit, Department of Applied Epidemiology, National Center for Epidemiology I.S. Carlos III, Madrid, Spain; ⁵Department of Neurology, University Hospital, Pisa, Italy; ⁶Department of Neurology, St. Josef University Hospital, Bochum, Germany

Submitted

Chapter 8


Abstract

Objective. To evaluate the reliability and validity of the SPES/SCOPA, a short scale developed to assess motor function in patients with Parkinson’s disease (PD). Methods. Eighty-five patients with PD were assessed with the SPES/SCOPA, the Unified Parkinson’s Disease Rating Scale (UPDRS), the Hoehn and Yahr (H&Y) scale, and the Schwab and England (S&E) scale. Thirty-four patients were examined twice by two different assessors who were blind to each other’s scores and test executions ('clinical assessment situation'). Additionally, six items of the motor section of the SPES/SCOPA were assessed in nine patients and recorded on videotape to evaluate inter-rater and intra-rater reliability in this situation ('video assessment situation'). Results. The reproducibility of the sum scores in the clinical assessments was high for all subscales of the SPES/SCOPA. Inter-rater reliability of individual items ranged from 0.27 to 0.83 in the motor impairment section, from 0.58 to 0.82 in the activities-of-daily-living section, and from 0.65 to 0.92 in the motor complications section. In the video assessments, inter-rater reliability of the motor items ranged from 0.69 to 0.87 and intra-rater reliability from 0.81 to 0.95. The correlations between related subscales of the SPES/SCOPA and the UPDRS were all higher than 0.85, and both scales showed similar correlations with other measures of disease severity. The mean time to complete the scales was 8.1 (SD 1.9) minutes for the SPES/SCOPA and 15.6 (SD 3.6) minutes for the UPDRS. Conclusions. The SPES/SCOPA is a short, reliable, and valid scale that is suitable for use in both research and clinical practice.

Motor impairments and disabilities in Parkinson’s disease


Introduction

Over the last 15 years, the Unified Parkinson's Disease Rating Scale (UPDRS) has become a standard tool in the clinical evaluation of patients with Parkinson's disease (PD). The UPDRS is the most frequently used scale in PD trials1 and has acceptable inter- and intra-rater reliability for most items.2,3 Its construct validity, as assessed against other scales, is adequate, but its content validity has been questioned, especially with respect to its conceptual clarity and the balance between items that represent symptoms responsive to dopaminergic therapy and those more resistant to this intervention.2 Other criticisms concern the length of the scale4 and the redundancy of items.5 The mean time to complete the scale is approximately 17 minutes for experienced users,4 which makes it less suitable for routine clinical use. Van Hilten et al.5 demonstrated that the UPDRS can be shortened by removing redundant items from the motor section and conceptually unclear items from the activities-of-daily-living (ADL) section, without negative consequences for reliability or validity. A shorter scale with similar clinimetric properties may therefore benefit patients, clinicians, and researchers.

As a result of the aforementioned considerations, the Short Parkinson's Evaluation Scale

(SPES) was developed.6 The scale is short, conceptually clear, and has good reliability and

validity.2,6 The instrument is considered easy to use by its evaluators,6 but has only been used

in a few studies.7,8

Careful inspection of the SPES, however, indicates that the consistency in the framing of

response options may be improved, and that the item 'swallowing' represents an impairment

and should be moved from the disability (ADL) section to the motor impairment (MI) section,

in order to be consistent with current methodological concepts of scale construction.9,10

Additionally, some clinimetric aspects of the SPES have not been addressed to date. These

involve intra-rater reliability and inter-rater reliability between two assessors who perform the

clinical assessments separately. Hence, we first modified the SPES according to the

aforementioned considerations and subsequently evaluated this scale, the SPES/SCOPA. The

development of this scale is part of a larger research project on SCales for Outcomes in

PArkinson’s disease (SCOPA),11 in which short, practical, and clinimetrically sound scales for all

relevant domains in PD are selected or developed.

The objective of this study was to evaluate the reliability (intra- and inter-rater reliability,

internal consistency) and construct validity (correlation with related scales, 'known-groups'

comparisons) of the SPES/SCOPA.


Methods

Development of the SPES/SCOPA

The SPES/SCOPA consists of three sections: motor impairments (MI), activities-of-daily-

living (ADL), and motor complications (MC). There are four response options, ranging from

0 (normal) to 3 (severe). In comparison with the original SPES, some modifications to the

three sections were made based on findings in the literature and empirical testing of some of

the items. The mental section was removed altogether, because we felt that these important

functions could not be assessed in a reliable and valid way by a few single questions. As a

part of the SCOPA project, we tested and developed separate instruments to evaluate these

functions.11

Motor impairments. Tremor. Pooled data of several studies8,12-15 (together including 1361

patients) revealed that in less than 4% of the patients the tremor score of the legs was higher

than that of the arms, and that in only 2% of the patients a tremor in the legs was present

while it was absent in the arms. For reasons of efficiency we decided to evaluate tremor only

in the upper extremities. Additionally, in the response options we linked the amplitude of the

tremor to a displacement in centimetres to improve the quantification of 'small', 'moderate',

and 'severe'. Bradykinesia. 'Finger tapping' was replaced by 'rapid alternating movements' on

the basis of the results of a study in which we compared five different tests for bradykinesia.16

This study revealed that this test had the highest intra- and inter-rater reliability and the

highest correlation with measures of disease severity. Both tremor and bradykinesia were

assessed for 20 seconds; the original SPES did not specify a time window. Rigidity.

The phrase 'detectable only on activation of the contralateral arm' was removed from the

response options, since a pilot study showed that the muscle tone in the ipsilateral arm also

increased in many healthy individuals if the contralateral arm was raised. Rigidity was now

evaluated by the 'perceived difficulty in trying to reach the end positions in elbow or wrist',

which proved to be a useful criterion in a pilot study we conducted in 17 patients with PD. Postural

stability. This item was modified on the basis of a study in which we compared six tests for

postural stability.17 The word 'retropulsion' was removed from the response options, and the

scoring was now determined by the number of steps patients took to restore balance, and by

whether a patient would fall or not. Arising from chair, gait and speech. These items only

underwent minimal changes compared with the original SPES and do not need detailed

discussion. Two other impairments were added, viz., swallowing and freezing during 'on'; both are assessed by history taking rather than by examination. We considered it more important to assign items to the appropriate


section than to assign items to sections on the basis of the way they are elicited (examination

versus history). 'Swallowing' was therefore moved from the ADL section to the MI section.

'Freezing during on' was added because it was considered a useful progression marker of PD

that is less responsive to dopaminergic interventions. We expected to achieve a better balance

between more dopamine-responsive and more dopamine-resistant features if this item was

added. The maximum score in the MI section is 42.

Activities of daily living. The response options were framed as uniformly as possible.

Responses reflected no difficulty (normal), some difficulty (no assistance needed),

considerable difficulty (possibly needing assistance), and unable (or needing complete or

almost complete assistance). As stated before, the item 'swallowing' (previously addressed

under 'eating') was removed from this section. 'Turning and getting out of bed' was extended

to 'changing positions', to include all important transfers of daily life. The maximum score in

this section is 21.

Motor complications. The original section on complications of therapy was modified and

now evaluates both the presence and the severity of dyskinesias and motor fluctuations. Freezing and

dystonia were removed, because these phenomena cannot solely be attributed to dopaminergic

therapy and hence do not purely reflect complications of therapy. Items were framed to

address the impairment level. The maximum score for both dyskinesias and motor

fluctuations is 6.
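Because every SPES/SCOPA rating runs from 0 (normal) to 3 (severe), the section maxima given above imply how many ratings each section contains: 42 / 3 = 14 ratings in the MI section, 21 / 3 = 7 in the ADL section, and 6 / 3 = 2 each for dyskinesias and motor fluctuations (consistent with separate presence and severity ratings). The Python sketch below makes that arithmetic explicit and sums a section under those inferred counts. The section keys and function names are hypothetical, and the sketch does not reproduce the item-by-item breakdown of the scale.

```python
# Section maxima as stated in the text; each rating is scored 0 (normal) to 3 (severe).
SECTION_MAX = {"MI": 42, "ADL": 21, "dyskinesias": 6, "motor_fluctuations": 6}
MAX_PER_RATING = 3


def ratings_per_section():
    """Number of 0-3 ratings implied by each section maximum (maximum / 3)."""
    return {s: m // MAX_PER_RATING for s, m in SECTION_MAX.items()}


def section_totals(ratings):
    """Sum the 0-3 ratings of each supplied section.

    ratings: dict mapping section name -> list of 0-3 ratings for that section.
    """
    expected = ratings_per_section()
    totals = {}
    for section, values in ratings.items():
        if len(values) != expected[section]:
            raise ValueError(f"{section}: expected {expected[section]} ratings")
        if any(v not in range(MAX_PER_RATING + 1) for v in values):
            raise ValueError(f"{section}: ratings must be 0-3")
        totals[section] = sum(values)
    return totals
```

Checking the number of ratings per section in this way catches incomplete score sheets before they silently lower a sum score.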

Patients

Eighty-five consecutive patients who visited the outpatient clinic of the Department of

Neurology of the Leiden University Medical Center, and who fulfilled the United Kingdom

Parkinson’s Disease Society Brain Bank criteria for idiopathic PD,18 were included in the

clinical assessments. Patients were excluded if they also had other diseases of the central

nervous system or were unable to understand Dutch. Nine patients who met the same eligibility criteria were included in the video assessments. The study was approved by the

medical ethics committee of the Leiden University Medical Center.

Assessment procedures

The patients were evaluated with the SPES/SCOPA (appendix), UPDRS parts II - IV,19 the

Hoehn and Yahr (H&Y) scale,20 and the Schwab and England (S&E) scale.21 The latter three


scales were included to evaluate the construct validity of the SPES/SCOPA. One global

question evaluated overall ADL functioning: patients were asked to indicate on a seven-point scale

(ranging from very good to very bad) how well they had been able to carry out

various daily activities in the past month. Patients also had to indicate whether they were 'on'

or 'off' at the time of assessment. Additional information gathered included

medication, disease duration, and comorbidity.

Reproducibility of the SPES/SCOPA was assessed in three ways: inter-rater

reliability of clinical assessments, inter-rater reliability of video assessments, and intra-rater

reliability of video assessments. In the clinical assessments, 34 patients were assessed twice

by the investigators (JM, MV), who examined the patients separately and were blind to each

other's scores and test executions. Reproducibility was calculated in those patients who had a

stable response to medication between the two assessments (i.e., who had no 'on-off' transitions). In the

video assessments, inter-rater reliability was assessed by an international panel of movement

disorders specialists (from Italy, Israel, Spain, Germany, and The Netherlands), who rated

nine videotaped patients who had a stable response to medication during the time the

recordings were made. The panel rated the patients twice, with an interval of 7-14 days. The

video recordings covered six items from the MI section of the SPES/SCOPA. The

other four items of this section were not suitable for video scoring, i.e., rigidity, speech,

and the two historical items. Rigidity cannot be evaluated from video for obvious reasons;

the other three items were not included because patients and raters spoke different languages.


Reproducibility was assessed with the intra-class correlation coefficient (ICC; one-way

random effects model) if two ratings were compared (inter-rater reliability in clinical

assessments and intra-rater reliability in video assessments), and with Kendall's coefficient of

concordance W if it concerned agreement between more than two raters (inter-rater reliability

in video assessments). The ICC is equivalent to weighted kappa if quadratic weights are

used,22 and we therefore used the 'strength-of-agreement' classification as proposed by Landis

and Koch.23 These authors classified strength of agreement as slight (0.00-0.20), fair (0.21-

0.40), moderate (0.41-0.60), substantial (0.61-0.80), or almost perfect (0.81-1.00). Internal

consistency of subscales was evaluated with Cronbach's alpha. Construct validity was

evaluated by determining the correlation between scales, using Pearson's r for the correlation

of SPES/SCOPA with UPDRS and S&E, and Spearman's rs for the correlation with H&Y and



'global functioning'. 'Known-groups' validity was assessed by comparing scores of patients

with different disease severity (by H&Y), using analysis of variance and ordinal regression.
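For k ratings per patient, the one-way random effects ICC compares the between-patient mean square (MSB) with the within-patient mean square (MSW): ICC = (MSB - MSW) / (MSB + (k - 1) MSW). The Python sketch below implements this textbook single-rating form together with the Landis and Koch labels quoted above. It is an illustration with hypothetical function names, not the authors' analysis code. Negative values, which the text does not list, receive Landis and Koch's 'poor' label. A zero denominator, where every rating is identical, is the extreme case of the 'insufficient dispersion' reported for two UPDRS tremor items.

```python
# Sketch: one-way random effects ICC (single rating) and Landis & Koch labels.
# Assumption: `ratings` holds one row per patient, each row with the same
# number k >= 2 of ratings; function names are illustrative only.


def icc_oneway(ratings: list[list[float]]) -> float:
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 patients, each with the same k >= 2 ratings")
    row_means = [sum(row) / k for row in ratings]
    grand_mean = sum(row_means) / n  # equal k, so this is the overall mean
    # Between-patient and within-patient sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("no dispersion in the ratings; ICC is undefined")
    return (ms_between - ms_within) / denominator


def landis_koch(coefficient: float) -> str:
    """Strength-of-agreement label for a kappa-type coefficient."""
    if coefficient < 0:
        return "poor"  # below the 0.00-1.00 bands listed in the text
    if coefficient <= 0.20:
        return "slight"
    if coefficient <= 0.40:
        return "fair"
    if coefficient <= 0.60:
        return "moderate"
    if coefficient <= 0.80:
        return "substantial"
    return "almost perfect"
```

For example, the ratings [[1, 2], [2, 2], [3, 3], [0, 1]] give MSB = 6.5/3 and MSW = 0.25, so the ICC is 23/29 (about 0.79), which `landis_koch` labels 'substantial'.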

Results

Characteristics of the patients who participated in the clinical assessment are presented in table 1.

The nine patients recorded on video comprised five men and four women, with a

mean age of 62.4 years and a mean disease duration of 12.4 years. Three of these patients were in

H&Y stage 2, three in H&Y stage 3, and three in H&Y stage 4 (data not shown).


Table 1. Characteristics of patients in the clinical assessment

n¹                                            85
mean (SD) age (years)                         65.4 (10.4)
number (%) males                              52 (61.2%)
Hoehn & Yahr stages:
   1                                          0
   2                                          25
   3                                          42
   4                                          18
   5                                          0
mean (SD) SPES/SCOPA-Motor impairments        13.8 (5.1)
mean (SD) SPES/SCOPA-ADL                      8.5 (3.2)
mean (SD) UPDRS-Motor evaluation              32.6 (10.9)
mean (SD) UPDRS-ADL                           15.9 (6.0)
mean (SD) Schwab & England                    75.7 (12.4)
n assessed while 'on' / 'off' / missing       78 / 5 / 2
n (%) on levodopa                             72 (84.7%)
mean (SD) levodopa dose in users (mg)         535 (407)
n (%) on dopamine agonists                    63 (74.1%)

¹ number of patients


Practicality. The mean time necessary to complete the scales was 8.1 (SD 1.9) minutes for the

SPES/SCOPA and 15.6 (SD 3.6) minutes for the UPDRS.

Reliability of clinical assessments. One patient changed from 'on' to 'off' between the two clinical

assessments and was excluded from the analysis; inter-rater reliability was therefore assessed in

33 patients. Inter-rater reliability coefficients of the motor sections of the clinical assessments

were all at least 'moderate' according to the Landis & Koch criteria, except for two items in

the SPES/SCOPA ('postural tremor right hand', 'rigidity right arm') and eight items in the

UPDRS (table 2). These eight UPDRS items included the two items that had 'fair' reliability in

the SPES/SCOPA. The mean reliability coefficient over the items shared

by both scales was 0.56 for both. The mean ICC over all the items of the motor

sections was 0.58 for the SPES/SCOPA and 0.50 for the UPDRS. For two items in the UPDRS

(rest tremor left and right leg), the ICC could not be estimated reliably because of insufficient

dispersion; percentage agreement for these items was, however, high (table 2). Results for the

ADL sections are presented in table 3. All agreements were at least 'substantial', except

'changing positions' in the SPES/SCOPA. The mean reliability coefficient over the

shared items of the ADL sections was 0.69 for the SPES/SCOPA and 0.71 for the UPDRS.

The mean ICC over all items of the ADL section of the UPDRS was 0.76. The MC sections of

both scales (table 4) shared only one item in both the dyskinesia and the motor fluctuation

section. The mean ICC for the items in the dyskinesia section was 0.83 for the SPES/SCOPA

and 0.75 for the UPDRS. The mean ICC for items in the motor fluctuation sections was 0.67

for the SPES/SCOPA and 0.60 for the UPDRS. The ICCs of the sumscores of the UPDRS

were generally somewhat higher than those of the SPES/SCOPA (table 5).

Reliability of video assessments. The inter-rater reliability coefficients for all items in the

video assessments were at least 'substantial' (> 0.60; table 6). Intra-rater reliability coefficients

were higher, with all items above 0.80 ('almost perfect').
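Kendall's W summarises agreement among m raters who each score the same n patients. Each rater's scores are converted to ranks, and W compares the spread of the per-patient rank sums with its maximum possible value. Ratings on a 0-3 scale contain many ties, so the Python sketch below uses average ranks and the usual tie correction, W = 12S / (m²(n³ - n) - m·ΣT), where S is the sum of squared deviations of the rank sums from their mean and T = Σ(t³ - t) over each rater's groups of t tied scores. It is an illustrative implementation with hypothetical names, not the study's analysis code.

```python
# Sketch: Kendall's coefficient of concordance W with tie correction.
# Assumption: `ratings` has one row per rater and one column per patient.
from collections import Counter


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def kendalls_w(ratings: list[list[float]]) -> float:
    m = len(ratings)      # number of raters
    n = len(ratings[0])   # number of patients
    rank_rows = [average_ranks(row) for row in ratings]
    rank_sums = [sum(row[j] for row in rank_rows) for j in range(n)]
    expected_sum = m * (n + 1) / 2  # mean rank sum per patient
    s = sum((r - expected_sum) ** 2 for r in rank_sums)
    # Tie correction: sum of (t^3 - t) over every tie group of every rater.
    ties = sum(t ** 3 - t for row in ratings for t in Counter(row).values())
    denominator = m ** 2 * (n ** 3 - n) - m * ties
    if denominator == 0:
        raise ValueError("every rater gave identical scores; W is undefined")
    return 12 * s / denominator
```

W ranges from 0 (no concordance) to 1 (all raters order the patients identically), which is why it can be read on the same 0-1 scale as the ICC values in the clinical assessments.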

Internal consistency. The internal consistencies of the SPES/SCOPA scales were higher than

those of the UPDRS, with the exception of the MI scale (table 5). 'Sensory symptoms' in the

UPDRS-ADL even had a negative corrected item-total correlation (-0.02). Other items with

corrected correlations below 0.20 involved rest tremor of the right hand (0.15) and

swallowing (0.18) in the SPES/SCOPA scales, and rest tremor of head, right hand, left hand,

and right leg (0.03, 0.11, 0.16, and 0.18, respectively) in the UPDRS motor section, and

tremor (0.05) in the UPDRS ADL section. The corrected item-total correlation of the tremor

of the left hand was considerably higher in both scales (0.40 in the SPES/SCOPA and 0.31 in

the UPDRS).
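Cronbach's alpha and the corrected item-total correlation can both be computed from a patients × items score matrix. Alpha is k/(k - 1) · (1 - Σ var(item) / var(total)), and the corrected item-total correlation is the Pearson correlation between an item and the sum of the *remaining* items, so an item is not correlated with itself. A minimal Python sketch with illustrative names follows.

```python
# Sketch: Cronbach's alpha and corrected item-total correlation.
# Assumption: `data` has one row per patient and one column per item.
from statistics import pvariance


def cronbach_alpha(data: list[list[float]]) -> float:
    k = len(data[0])
    item_columns = list(zip(*data))
    totals = [sum(row) for row in data]
    # pvariance is used throughout; the n vs n - 1 choice cancels in the ratio.
    sum_item_variances = sum(pvariance(column) for column in item_columns)
    return k / (k - 1) * (1 - sum_item_variances / pvariance(totals))


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(data: list[list[float]], item: int) -> float:
    """Correlate one item with the sum of all other items."""
    scores = [row[item] for row in data]
    rest = [sum(row) - row[item] for row in data]
    return pearson(scores, rest)
```

An item that varies independently of the rest of the scale, as the tremor items do here, yields a low or even negative corrected item-total correlation.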



Table 2. Inter-rater reliability (ICC) of motor impairment items in clinical assessment

Item                              SPES/SCOPA   UPDRS
Shared items
Rest tremor R                     0.63         0.53
Rest tremor L                     0.67         0.45
Postural tremor R                 0.34*        0.34*
Postural tremor L                 0.64         0.58
Rapid alternating movements R     0.58         0.59
Rapid alternating movements L     0.45         0.50
Rigidity R                        0.27*        0.38*
Rigidity L                        0.61         0.64
Rise from chair                   0.83         0.83
Postural instability              0.60         0.59
Gait                              0.60         0.63
Speech                            0.55         0.68
Non-shared items
Freezing during 'on'              0.57         -
Swallowing                        0.79         -
Facial expression                 -            0.36*
Rest tremor head                  -            0.21*
Rest tremor R leg                 -            0.00**
Rest tremor L leg                 -            0.00**
Rigidity head                     -            0.59
Rigidity R leg                    -            0.29*
Rigidity L leg                    -            0.44
Finger tap R                      -            0.46
Finger tap L                      -            0.39*
Hand movements R                  -            0.28*
Hand movements L                  -            0.67
Leg agility R                     -            0.53
Leg agility L                     -            0.61
Posture                           -            0.66
Body bradykinesia                 -            0.23*

All agreements at least 'moderate' (> 0.40), except those marked *, where agreement is 'fair' (ICC = 0.21-0.40).23
** estimates unreliable due to insufficient dispersion; % agreement 97.1 (R leg) and 91.2 (L leg)
NB: 8 items in the UPDRS and 2 items in the SPES/SCOPA with agreement ≤ 0.40


Table 3. Inter-rater reliability (ICC) of ADL items in clinical assessment

Item                                  SPES/SCOPA   UPDRS
Shared items
Speech                                0.66         0.68
Feeding                               0.63         0.61
Dressing                              0.80         0.76
Hygiene                               0.70         0.71
Changing positions                    0.58*        0.73
Walking                               0.61         0.66
Handwriting                           0.82         0.85
Non-shared items
Salivation                            -            0.91
Swallowing                            -            0.90
Falling (unrelated to freezing)       -            0.73
Freezing when walking                 -            0.77
Tremor                                -            0.83
Sensory complaints                    -            0.68

All agreements 'substantial' (> 0.60), except changing positions (S15, marked *), where agreement is 'moderate'

Table 4. Inter-rater reliability (ICC) of motor complication items in clinical assessment

Item                                  SPES/SCOPA   UPDRS
Dyskinesias
Presence dyskinesias                  0.92         0.96
Severity dyskinesias                  0.74         -
How disabling dyskinesias?            -            0.94
How painful dyskinesias?              -            0.71
Early morning dystonia?               -            0.39*
Motor fluctuations
Presence 'off' periods                0.69         0.65
Severity 'off' periods                0.65         -
Off periods predictable?              -            0.71
Off periods unpredictable?            -            0.41**
Sudden offs?                          -            0.62

NB: all agreements 'substantial', except for * U35 ('fair') and ** U37 ('moderate')



Table 5. Inter-rater reliability and internal consistency of SPES/SCOPA and UPDRS

Scale                      ICC sumscores   ICC items (range)   Cronbach's alpha   Item-total¹ (range)
SPES/SCOPA Motor           0.86            0.27 to 0.83        0.74               0.15 to 0.65
UPDRS Motor                0.90            0.00² to 0.83       0.88               0.03 to 0.65
SPES/SCOPA ADL             0.89            0.58 to 0.82        0.81               0.38 to 0.72
UPDRS ADL                  0.93            0.61 to 0.91        0.75               -0.02³ to 0.65
SPES/SCOPA Dyskinesias     0.89            0.74 to 0.92        0.92               NA⁴
UPDRS Dyskinesias          0.94            0.39 to 0.96        0.58               0.11 to 0.63
SPES/SCOPA Fluctuations    0.72            0.65 to 0.69        0.95               NA⁴
UPDRS Fluctuations         0.75            0.41 to 0.71        0.74               0.48 to 0.66

NB: Values obtained by clinical assessment in 85 patients. ¹ corrected item-total correlation; ² rest tremor R and L leg; estimates unreliable due to insufficient dispersion; ³ sensory symptoms; ⁴ not applicable (only two items)

Table 6. Reliability of SPES/SCOPA items assessed by video

Item                       Intra-rater¹   Inter-rater²
Rest tremor R              0.95           0.83
Rest tremor L              0.87           0.71
Postural tremor R          0.91           0.71
Postural tremor L          0.93           0.87
Rapid alt mov R            0.85           0.82
Rapid alt mov L            0.81           0.69
Rise from chair            0.86           0.72
Postural instability       0.87           0.87
Gait                       0.84           0.70

¹ assessed by weighted kappa (quadratic weights); 14 raters
² assessed by Kendall's coefficient of concordance W; 14 raters
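The intra-rater values in table 6 are weighted kappas with quadratic weights, which the methods treat as equivalent to the ICC.22 A minimal Python sketch for two sets of ratings of the same patients is shown below. It assumes integer categories 0 to c - 1 (c = 4 for SPES/SCOPA items), and the function name is illustrative.

```python
# Sketch: Cohen's weighted kappa with quadratic disagreement weights.
# Assumption: ratings are integers 0..n_categories-1 (0-3 for SPES/SCOPA items).


def quadratic_weighted_kappa(first: list[int], second: list[int],
                             n_categories: int = 4) -> float:
    if len(first) != len(second) or not first:
        raise ValueError("need two equally long, non-empty rating lists")
    c = n_categories
    n = len(first)
    # Observed joint proportions of (first rating, second rating).
    observed = [[0.0] * c for _ in range(c)]
    for a, b in zip(first, second):
        observed[a][b] += 1 / n
    row_totals = [sum(observed[i]) for i in range(c)]
    col_totals = [sum(observed[i][j] for i in range(c)) for j in range(c)]
    weighted_observed = weighted_expected = 0.0
    for i in range(c):
        for j in range(c):
            weight = (i - j) ** 2 / (c - 1) ** 2  # 0 on the diagonal
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j]
    if weighted_expected == 0:
        raise ValueError("no variation between categories; kappa is undefined")
    return 1 - weighted_observed / weighted_expected
```

Because disagreements are penalised by the squared distance between categories, a 0-versus-3 disagreement counts nine times as heavily as a 0-versus-1 disagreement.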


Validity. Correlations between related sections of the SPES/SCOPA and the UPDRS were

0.88 for motor impairments, 0.86 for ADL, 0.86 for dyskinesias, and 0.95 for motor

fluctuations. The correlations of these sections with the H&Y and S&E scale were all very

similar (table 7). The correlation between these sections and disease duration also bore strong

resemblance, with coefficients of 0.38 (SPES/SCOPA) and 0.23 (UPDRS) for the motor

sections, and 0.29 (SPES/SCOPA) and 0.36 (UPDRS) for the ADL sections. The correlation

with global ADL functioning was 0.49 for SPES/SCOPA ADL and 0.48 for the UPDRS

ADL.
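Spearman's rs, used for the correlations with the ordinal H&Y stages and the global ADL rating, is Pearson's r computed on ranks. It therefore captures any monotonic relation without assuming equal distances between H&Y stages. A small Python sketch using average ranks for ties follows; the function names are illustrative.

```python
# Sketch: Spearman's rank correlation as Pearson's r on average ranks.


def average_ranks(values: list[float]) -> list[float]:
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for position in range(start, end + 1):
            ranks[order[position]] = (start + end) / 2 + 1  # mean rank of tie group
        start = end + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x: list[float], y: list[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))
```

For a monotonic but non-linear relation such as [1, 2, 3, 4] against [1, 4, 9, 16], `spearman` returns 1.0 while `pearson` stays below 1.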

Table 7. Correlations with Hoehn & Yahr and Schwab & England (Spearman's rho)

Scale                          Hoehn & Yahr   Schwab & England
SPES/SCOPA Motor               0.50           -0.58
UPDRS Motor                    0.47           -0.53
SPES/SCOPA ADL                 0.47           -0.66
UPDRS ADL                      0.45           -0.62
SPES/SCOPA complications¹      0.26           -0.31
UPDRS IV complications¹        0.26           -0.41

¹ sum of dyskinesia and fluctuation scores

Mean scores of patients grouped by H&Y stage (table 8) differed significantly

between groups for both the motor and ADL sections (ANOVA; p < 0.001). Post-hoc

t-tests showed no significant differences between patients in H&Y 2 and H&Y 3 in either

scale, but differences between stages 2 and 4, and between stages 3 and 4, were significant.

Ordinal regression showed a significant trend in both sections of both scales, with higher scores for patients

with more advanced PD.
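The 'known-groups' ANOVA compares mean scale scores across H&Y stages through the F ratio of between-group to within-group mean squares. The Python sketch below computes F and its degrees of freedom. Turning F into a p-value requires the F distribution (for example `scipy.stats.f.sf(F, df_between, df_within)`), which is omitted to keep the example dependency-free. The group values are invented for illustration and are not study data.

```python
# Sketch: one-way ANOVA F statistic for scores grouped by H&Y stage.


def one_way_anova(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for two or more groups."""
    values = [x for group in groups for x in group]
    n_total, n_groups = len(values), len(groups)
    grand_mean = sum(values) / n_total
    means = [sum(group) / len(group) for group in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = n_groups - 1, n_total - n_groups
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Illustrative (invented) scores for three groups, not data from this study.
print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```

A significant F only indicates that at least one stage differs. Pairwise post-hoc tests, as in table 8, are needed to show that stages 2 and 3 do not differ while stage 4 does.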



Table 8. Scale scores grouped by Hoehn and Yahr stages ('known-groups' comparisons)

                   Mean score by H&Y      ANOVA    Post-hoc t-tests¹         Ordinal regression
                   2      3      4                 2-3    3-4     2-4        Trend²   Fit³   R-sq⁴
SPES/SCOPA motor   11.1   13.4   18.7     <0.001   0.11   <0.001  <0.001     <0.001   0.57   0.26
UPDRS motor        27.7   31.0   43.1     <0.001   0.51   <0.001  <0.001     <0.001   0.78   0.25
SPES/SCOPA ADL     7.2    7.9    11.8     <0.001   0.84   <0.001  <0.001     <0.001   0.33   0.30
UPDRS ADL          13.4   15.0   21.6     <0.001   0.72   <0.001  <0.001     <0.001   0.44   0.26

¹ p-values of differences in scores between patients in different Hoehn and Yahr stages
² a significant trend indicates that the trend differs significantly from zero (i.e., no trend)
³ goodness of fit: p > 0.05 indicates that the model does not differ significantly from a model with a good fit
⁴ R-square: proportion of the variance accounted for by the model

Discussion

We evaluated several aspects of the clinimetric performance of the SPES/SCOPA. Inter-rater

reliability (33 patients) and internal consistency (85 patients), both obtained by clinical assessment, were very

similar for the SPES/SCOPA and the UPDRS. The reproducibility of the sumscores of the

SPES/SCOPA was high. Two items in the motor section displayed less than moderate

agreement, i.e., 'postural tremor' and 'rigidity', both of the right arm. The same items on the

same side also performed only fairly in the UPDRS, and previous studies on this scale have

also found lower scores for these items.4,24,25 Inter-rater reliability of the SPES/SCOPA MI

section assessed from video was higher than in the clinical assessments, with all

values above 0.60. Intra-rater reliability was even higher, with all reliability coefficients

above 0.80. Items in the ADL section were evaluated only by history in the clinical

assessments, and all displayed at least 'substantial' agreement, with the exception of

'changing positions'. Items in the SPES/SCOPA MC section all showed at least 'substantial' reproducibility.

The results from our study agree with previous findings in which the SPES and UPDRS

were compared,6 with comparable items displaying similar reliability.

Internal consistency of all SPES/SCOPA scales was above 0.70, which is considered the

minimum for group comparisons.26 Item-total correlations of the tremor items were low,

indicating that these items behave rather independently, an observation previously

reported by Martínez-Martín et al.4


The correlation between the SPES/SCOPA and UPDRS is high, indicating that these scales

largely measure the same constructs. Moreover, the similarity of the correlations of the

SPES/SCOPA and the UPDRS with measures of disease severity, such as H&Y,

S&E, global ADL functioning, and disease duration, is striking. This further

supports the impression that the scales capture the same phenomena. Comparisons between

patients grouped by H&Y stage also gave very similar results for both scales.

Although the SPES/SCOPA contains only half the number of items and uses four response

options instead of five, reliability and validity are apparently preserved.

Inter-rater reliability in our study was generally lower than in other studies. This is

not surprising, given that most previous studies have used either video recordings or a design

in which several raters assess patients simultaneously, thereby excluding potential biases

caused by changes in the patient's state and differences in test execution. Video assessments

are useful because they provide information on reproducibility in standardised situations, and thus

offer the opportunity to locate weaker items that may benefit from clearer instructions

and descriptions. However, knowledge of the degree of reliability when patients are assessed on

separate occasions, by either the same or different assessors, provides additional

information, since it reflects the routine of studies and clinical practice. To the best of our

knowledge, only one study has assessed reproducibility of the UPDRS over separate clinical

examinations.3 That study evaluated intra-rater reliability over a two-week interval,

whereas we assessed inter-rater reliability in immediate succession. Compared with our

data, that study found somewhat higher values for reproducibility of items in the motor

section and somewhat lower values for ADL items. This may seem surprising, since

reliability is usually higher for intra- than for inter-rater assessments, and higher for

shorter than for longer intervals. One might therefore have expected their scores for both sections to be

either higher, because intra-rater reliability was assessed, or lower, because of the longer

interval. A possible explanation for this apparent discrepancy is that descriptions of ADL

items generally leave less room for interpretation bias than descriptions of motor items, so that

motor items benefit more from being scored by a single rater, whereas ADL items are affected more by the longer interval.

To summarise, the SPES/SCOPA is a reliable, valid, and conceptually clear scale that is

completed in half the time it takes to administer the UPDRS. Additionally, the SPES/SCOPA

has a better balance between early and late features of the disease and between items that are

more responsive to dopaminergic interventions and those that are more resistant. Altogether,

these advantages may favour the use of the SPES/SCOPA in evaluating motor function in

patients with PD.



Acknowledgements

Professor R.A.C. Roos is gratefully acknowledged for reviewing the manuscript. The authors

thank the following persons for their participation in evaluating the videotapes: Dr. S.

Agostini, Dr. S. Bernardini, Dr. C. Berti, Dr. D. Canteparo, Dr. E. Cubo, Dr. G. Gambacchi,

Dr. C. Klein, Dr. T. van Laar, Dr. C. Lucetti, and Dr. T. Prokhorov.

References

1. Mitchell SL, Harper DW, Lau A, Bhalla R. Patterns of outcome measurement in Parkinson's disease

clinical trials. Neuroepidemiology 2000; 19(2):100-108.

2. Ramaker C, Marinus J, Stiggelbout AM, Van Hilten BJ. Systematic evaluation of rating scales for

impairment and disability in Parkinson's disease. Mov Disord 2002; 17(5):867-876.

3. Siderowf A, McDermott M, Kieburtz K, Blindauer K, Plumb S, Shoulson I. Test-Retest reliability of the

Unified Parkinson's Disease Rating Scale in patients with early Parkinson's disease: Results from a

multicenter clinical trial. Mov Disord 2002; 17(4):758-763.

4. Martínez-Martín P, Gil-Nagel A, Morlán Gracia L, Balseiro Gómez J, Martínez-Sarriés FJ, Bermejo F, et

al. Unified Parkinson's Disease Rating Scale characteristics and structure. Mov Disord 1994; 9(1):76-83.



9(1):84-88.




7. Werber EA, Rabey JM. The beneficial effect of cholinesterase inhibitors on patients suffering from

Parkinson's disease and dementia. J Neural Transm 2001; 108(11):1319-1325.

8. Reichmann H, Brecht HM, Kraus PH, Lemke MR. [Pramipexole in Parkinson disease. Results of a

treatment observation]. Nervenarzt 2002; 73(8):745-750.




11. SCOPA homepage. Internet 2002. http://www.lumc.nl/2050/research/scopa_homepage.html

12. Guttman M. Double-blind comparison of pramipexole and bromocriptine treatment with placebo in

advanced Parkinson's disease. International Pramipexole-Bromocriptine Study Group. Neurology 1997;

49(4):1060-1065.

13. Hubble JP, Koller WC, Cutler NR, Sramek JJ, Friedman J, Goetz C, et al. Pramipexole in patients with

early Parkinson's disease. Clin Neuropharmacol 1995; 18(4):338-347.

14. Lieberman A, Ranhosky A, Korts D. Clinical evaluation of pramipexole in advanced Parkinson's disease:

results of a double-blind, placebo-controlled, parallel-group study. Neurology 1997; 49(1):162-168.


15. Pinter MM, Pogarell O, Oertel WH. Efficacy, safety, and tolerance of the non-ergoline dopamine agonist

pramipexole in the treatment of advanced Parkinson's disease: a double blind, placebo controlled,

randomised, multicentre study. J Neurol Neurosurg Psychiatry 1999; 66(4):436-441.

16. Gruber RA, Marinus J, Visser M, Van Hilten JJ. Inter- and intra-rater reliability and discriminative ability

of five measures of bradykinesia in subjects with and without Parkinson's disease. Mov Disord 2002;

17(supplement 5):S119. Abstract.

17. Visser M, Marinus J, Bloem BR, Kisjes H, Van den Berg BM, Van Hilten JJ. Clinical tests for postural

instability in patients with Parkinson's disease. Mov Disord 2002; 17(supplement 5):S120. Abstract.







21. Schwab RS, England AC, Jr. Projection technique for evaluating surgery in Parkinson's disease. In:

Gillingham FJ, Donaldson IML, editors. Third symposium on Parkinson's disease. Edinburgh: E. & S.

Livingstone, 1969: 152-157.

22. Fleiss JL, Cohen J. The equivalence of weighted kappa and the intraclass correlation coefficient as

measures of reliability. Educational and Psychological Measurement 1973; 33: 613-619.


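23. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977; 33(1):159-174.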

24. Prochazka A, Bennett DJ, Stephens MJ, Patrick SK, Sears-Duru R, Roberts T, et al. Measurement of

rigidity in Parkinson's disease. Mov Disord 1997; 12(1):24-32.

25. Richards M, Marder K, Cote L, Mayeux R. Interrater Reliability of the Unified Parkinson's Disease

Rating Scale for Motor Examination. Mov Disord 1994; 9(1):89-91.




Appendix

SPES/SCOPA scale

A. Motor evaluation

Clinical examination

1. Rest tremor
assess each arm separately during 20 seconds; hands rest on thighs; if tremor is not evident at rest, try to keep the patient attentive, e.g. by having him/her count backwards with eyes closed
0 = absent
1 = small amplitude (< 1 cm) occurring spontaneously, or obtained only while keeping patient attentive (any amplitude)
2 = moderate amplitude (1-4 cm), occurring spontaneously
3 = large amplitude (≥ 4 cm), occurring spontaneously

2. Postural tremor
check with arms outstretched, pronated and semipronated, and with index fingers of both hands almost touching each other (elbows flexed); assess each position during 20 seconds
0 = absent
1 = small amplitude (< 1 cm)
2 = moderate amplitude (1-4 cm)
3 = large amplitude (≥ 4 cm)

3. Rapid alternating movements of hands
rapid alternating pronation/supination movements of upper hand, each time slapping the palm of the horizontally held lower hand during 20 seconds; each hand separately
0 = normal
1 = slow execution, or mild slowing and/or reduction in amplitude; may have occasional arrests
2 = moderate slowing and/or reduction in amplitude or hesitations in initiating movements or frequent arrests in ongoing movements
3 = can barely perform task

4. Rigidity
assess passive movements of elbow and wrist over full range, with the patient relaxed in sitting position; ignore cogwheeling; check each arm separately
0 = absent
1 = mild rigidity over full range, no difficulty reaching end positions
2 = moderate rigidity, some difficulties reaching end positions
3 = severe rigidity, considerable difficulties reaching end positions

5. Rise from chair
patient is instructed to fold arms across chest; use straight-backed chair
0 = normal
1 = slowly; does not need arms to get up
2 = needs arms to get up (can get up without help)
3 = unable to rise (without help)

6. Postural stability
stand behind the patient and pull patient backwards, while s/he is standing erect with eyes open and feet spaced slightly apart; patient is not prepared
0 = normal, may take up to 2 steps to recover
1 = takes 3 or more steps; recovers unaided
2 = would fall if not caught
3 = spontaneous tendency to fall or unable to stand unaided


7. Gait
assess gait pattern; use walking aid or offer assistance, if necessary
0 = normal
1 = mild slowing and/or reduction of step height or length; does not shuffle
2 = severe slowing, or shuffles or has festination
3 = unable to walk

8. Speech
0 = normal
1 = slight loss of expression, diction and/or volume
2 = slurred; not always intelligible
3 = unintelligible always or most of the time

Historical information

9. Freezing during 'on'
Freezing is characterized by hesitation when trying to start walking or 'gluing' to the ground while walking.
0 = absent
1 = start hesitation only, occasionally present
2 = frequently present, may have freezing when walking
3 = severe freezing when walking

10. Swallowing
0 = normal
1 = some difficulty or slow; does not choke; normal diet
2 = sometimes chokes; may require soft food
3 = chokes frequently; may require soft food or alternative method of food intake

B. Activities of Daily Living

11. Speech
0 = normal
1 = some difficulty; may sometimes be asked to repeat sentences
2 = considerable difficulty; frequently asked to repeat sentences
3 = unintelligible most of the time

12. Feeding (cutting, filling cup, etc.)
0 = normal
1 = some difficulty or slow; does not need assistance
2 = considerable difficulty; may need some assistance
3 = needs almost complete or complete assistance

13. Dressing
0 = normal
1 = some difficulty or slow; does not need assistance
2 = considerable difficulty; may need some assistance (e.g. buttoning, getting arms into sleeves)
3 = needs almost complete or complete assistance

14. Hygiene (washing, combing hair, shaving, brushing teeth, using toilet)
0 = normal
1 = some difficulty or slow; does not need assistance
2 = considerable difficulty; may need some assistance
3 = needs almost complete or complete assistance



15. Changing position (turning over in bed, getting up out of bed, getting up out of a chair, turning around when standing)
0 = normal
1 = some difficulty or slow; does not need assistance with any change of position
2 = considerable difficulty; may need assistance with one or more changes of position
3 = needs almost complete or complete assistance with one or more changes of position

16. Walking
0 = normal
1 = some difficulty or slow; does not need assistance or walking aid
2 = considerable difficulty; may need assistance or walking aid
3 = unable to walk, or walks only with assistance and great effort

17. Handwriting
0 = normal
1 = some difficulty (e.g. slow, small letters); all words legible
2 = considerable difficulty; not all words legible; may need to use block letters
3 = majority of words are illegible

C. Motor Complications

18. Dyskinesias (presence)
0 = absent
1 = present some of the time
2 = present a considerable part of the time
3 = present most or all of the time

19. Dyskinesias (severity)
0 = absent
1 = small amplitude
2 = moderate amplitude
3 = large amplitude

20. Motor fluctuations (presence of 'off' periods)
What proportion of the waking day is the patient 'off' on average?
0 = none
1 = some of the time
2 = a considerable part of the time
3 = most or all of the time

21. Motor fluctuations (severity of 'off' periods)
0 = absent
1 = mild end-of-dose fluctuations
2 = moderate end-of-dose fluctuations; unpredictable fluctuations may occur occasionally
3 = severe end-of-dose fluctuations; unpredictable on-off oscillations occur frequently


9. Activity-based diary for Parkinson’s disease

Johan Marinus¹, Martine Visser¹, Anne M. Stiggelbout², Jose Martin Rabey³, Ubaldo Bonuccelli⁴, Peter H. Kraus⁵, Jacobus J. van Hilten¹

Departments of ¹Neurology and ²Medical Decision Making, Leiden University Medical Center, Leiden, The Netherlands; ³Department of Neurology, Assaf Harofeh Medical Center, Zerefin, Israel; ⁴Department of Neurology, University Hospital, Pisa, Italy; ⁵Department of Neurology, St. Josef University Hospital, Bochum, Germany

Published in Clinical Neuropharmacology 2002;25:43-50

Chapter 9


Abstract

The objective of this study was to develop a Parkinson's disease (PD) diary that evaluates a patient’s

difficulties in performing activities, as a substitute for recording the amount of 'on'- and 'off'-time, and to assess

its clinimetric qualities. In this study, 84 patients with PD kept a diary for two or three periods of five days.

Each day, five items were recorded across 11 time periods. Patients simultaneously recorded ‘on-off’ in

the traditional way. The diary was easily understood and median recording time was 5-10 minutes a

day. Clinimetric analysis showed that the diary could successfully be reduced to three days, in which

five items (walking, transfers, manual activities, dyskinesias, and sleep) with four response options

were assessed seven times daily. Sumscores of the first three items correctly predicted being 'on' or

'off' in 93% of the cases, making separate scoring of 'on' and 'off' unnecessary. The diary was

internally consistent and showed good reproducibility. Construct validity with external measures was

adequate, and comparisons between patients grouped by disease severity and by degree of fluctuations

revealed significant differences in the expected directions. Taken together, this PD diary has a sound

clinimetric basis and provides information on the extent of perceived disability, thereby accurately

reflecting both the severity of off-periods and the variability of motor fluctuations.


Introduction

Rating scales make it possible to record the clinical status of patients. However, they

provide a momentary impression that will generally describe the situation of stable patients

more accurately than that of patients with a fluctuating disease pattern. Information gathered

over a longer period of time reflects the status of fluctuating patients more accurately, and

diaries are therefore especially useful in clinical evaluation and research involving these

patients.

Most patients with Parkinson’s disease (PD) experience motor fluctuations (‘wearing off’,

‘on-off’ fluctuations) after a number of years of levodopa use. Therapy in this situation is

often aimed at reducing the amount of ‘off-time’, which is usually evaluated by recording the

time spent in ‘on’ and ‘off’ over a number of days. However, the clinimetric

characteristics of these ‘on-off diaries’ have never been reported, and the method has several

disadvantages. Firstly, some patients have difficulty establishing

whether they are ‘on’ or ‘off’.1 Secondly, treating ‘on-off’ as a dichotomy may

be too simplistic, since different gradations of ‘off’ exist, and transitions from ‘on’ to ‘off’

and vice versa may occur gradually. Thirdly, since patients usually have to make hourly or

half-hourly recordings during several days, the method is time-consuming and the large

number of evaluations may affect compliance.

Constructing a diary without these shortcomings is therefore of considerable importance.

We hypothesized that being ‘on’ or ‘off’ would affect the ability to perform activities, and that

ratings of this ability could replace ‘on-off’ evaluations. We decided to focus on activities

(‘disabilities’) rather than on signs and symptoms (‘impairments’), because studies

evaluating the reliability of self-reports of disabilities have shown good results,2,3 whereas

poor results were found for impairments.4 We therefore aimed to construct a diary that reflects

‘on’ and ‘off’ time, has a low respondent burden, and is nevertheless valid, reliable, and

responsive.

Materials and Methods

Patients

Consecutive, cognitively unimpaired PD-patients (MMSE ≥ 24), who visited the outpatient

neurology clinic of the Leiden University Medical Center, who were willing to co-operate,

and who fulfilled the United Kingdom Parkinson’s Disease Society Brain Bank criteria for


idiopathic PD,5 entered the study. Patients with other central neurological conditions were

excluded.

Diary Card

The development of the diary card (DC) is part of a larger research project, the SCOPA-project (SCales for Outcomes in PArkinson’s disease). The SCOPA-DC has five items that concern walking, the ability to make transfers, manual activities, dyskinesias, and quality of sleep (Appendix C). Patients simultaneously recorded whether they were ‘on’ or ‘off’. The SCOPA-DC was filled out during five consecutive days, and items were evaluated 11 times daily (after awakening, at breakfast, after breakfast, before lunch, at lunch, after lunch, before dinner, at dinner, after dinner, while going to bed, at night). In this way each day was split into time periods anchored to specific parts of the day rather than to separate hours. At awakening, patients recorded both the quality of sleep and the actual number of hours asleep during the previous night. The number of minutes of sleep during daytime was also recorded. Each item had four response options, and patients were asked to rate their average situation over the preceding time period. A new DC was completed each day. Instructions were given on the back of the card (Appendix B).

Methods of assessment

Patients made diary recordings during five consecutive days and brought their DCs to the clinic on the sixth day. On that day the patients first filled out a questionnaire evaluating their experiences with the use of the DC. Subsequently, patients were interviewed by an investigator, who gathered information from both the patient and, if present, the partner. The investigator evaluated the same DC items and recorded, on the basis of the obtained historical information, which response option best reflected the patient’s average situation over the past week. The partner was interviewed on the actual performance of the patient’s activities. The investigator also administered the Mini-Mental State Examination (MMSE),6 the Hoehn and Yahr scale (H&Y),7 and the Activities of Daily Living sections of both the Unified Parkinson’s Disease Rating Scale (UPDRS-ADL)8 and the Short Parkinson’s Evaluation Scale (SPES-ADL).9 The latter scale has four response options and was developed by a group of European neurologists with the intent to simplify the scoring of patients with PD. The investigator was blind to the patient’s diary recordings at the time these scales were administered. A neurologist independently assessed whether the patient suffered from motor fluctuations. Patients not experiencing motor fluctuations were considered stable. Stable patients were asked to fill out the DC during another period of five consecutive days two weeks later, and once again three months later. Fluctuating patients were only asked to fill out the DCs three months later.


Data were entered and analyzed with SPSS for Windows (release 9.0). The feasibility and acceptability of the DC were assessed using the percentages of patients who expressed difficulties with the DC. We assessed whether we could reduce both the number of recording days and the number of daily evaluations without substantial changes in item means, standard deviations, and test-retest reliability.

Reliability. Internal consistency was assessed using Cronbach’s α. Values higher than 0.7 were considered to reflect adequate internal consistency for group comparisons.10 For each patient an activity score was calculated, based on the sum of the scores on walking, transfers, and manual activities across five days. This score was linearly transformed to a 0-100 scale and labeled ‘DCsum’ (Appendix D). Higher values of DCsum reflect greater difficulties with activities, while its complement (100 minus DCsum) represents a percentage of ‘good functioning’. Test-retest reliability was established by computing intraclass correlation coefficients (ICC) for stable patients, both for a two-week and a three-month interval. Both the investigator’s ratings of item scores over the last week and the partner’s ratings of the actual performance of the patient’s activities were compared with the patient’s own diary recordings using Pearson’s correlation coefficient (r). These comparisons were considered to reflect inter-observer reliability.
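As an illustration of the internal-consistency criterion above, Cronbach’s α for k items can be computed from the item variances and the variance of the patients’ total scores. The following Python sketch is not the SPSS procedure used in this study; the function name `cronbach_alpha` and the input layout (one list of patient scores per item, no missing values) are assumptions made for the example.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for k items, each given as a list of scores per patient.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores),
    using sample variances (n - 1 denominator) throughout; no missing values allowed.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are needed")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("every item must have a score for every patient")
    # Total score per patient (e.g. walking + transfers + manual activities).
    totals = [sum(item[p] for item in items) for p in range(n)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var / total_var)
```

For the three activity items of the DC, `items` would hold one list each for walking, transfers, and manual activities, aligned by patient.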

Validity. Construct validity with external measures (H&Y, UPDRS-ADL, SPES-ADL) was

established using Pearson’s correlation coefficient. Construct validity using ‘known-groups’

comparisons based on disease severity (H&Y) was assessed with analysis of variance

(ANOVA), whereas patients with and without motor fluctuations were compared using an

independent samples t-test. The significance level was set at 0.05 and corrected for multiple

comparisons (Bonferroni) whenever appropriate.
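The known-groups comparison across the three severity groups and the Bonferroni correction of the post hoc tests can be sketched as follows. This minimal Python example computes only the one-way ANOVA F statistic and its degrees of freedom; converting F to a p-value requires the F distribution (for instance `scipy.stats.f.sf(F, df_between, df_within)` if SciPy is available). The helper names, and the choice to report Bonferroni-adjusted p-values rather than a reduced significance level (the two are equivalent), are assumptions for illustration.

```python
def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """One-way ANOVA: returns (F, df_between, df_within)."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


def bonferroni(p_values: list[float], alpha: float = 0.05) -> tuple[list[float], list[bool]]:
    """Bonferroni: multiply each p-value by the number of tests (capped at 1)."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return adjusted, [p < alpha for p in adjusted]
```

With three post hoc comparisons (mild versus moderate, mild versus severe, moderate versus severe), `m` is 3, so each raw p-value is tripled before being compared with 0.05.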

Responsiveness. Within-person standard deviations of the aggregated first three items were compared between stable and fluctuating patients using a t-test for independent samples. Smaller within-person standard deviations in stable patients than in fluctuating patients may be indicative of responsiveness. An independent samples t-test was also used to compare the within-subject coefficients of variation (CV) of stable and fluctuating patients. In the CV, the standard deviation of an item’s scores is expressed as a percentage of that item’s mean. Higher CVs indicate greater variability; consequently, fluctuating patients are expected to show significantly higher CVs than stable patients.
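A minimal sketch of the responsiveness analysis: for each patient the within-person standard deviation (and, analogously, the CV) of the aggregated item scores is computed, and the two groups are then compared with an independent samples t-test. The pooled-variance (Student) form of the t statistic is assumed here, since the text does not state whether equal variances were assumed; the function names are hypothetical.

```python
import math
from statistics import mean, stdev, variance


def within_person_sd(scores: list[float]) -> float:
    """Sample standard deviation of one patient's repeated aggregated item scores."""
    return stdev(scores)


def coefficient_of_variation(scores: list[float]) -> float:
    """Within-person SD relative to the patient's own mean (multiply by 100 for %)."""
    return stdev(scores) / mean(scores)


def pooled_t(group_a: list[float], group_b: list[float]) -> tuple[float, int]:
    """Independent samples t statistic with pooled variance; returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se, df
```

In use, one would first build `[within_person_sd(p) for p in stable_patients]` and the corresponding list for fluctuating patients (both lists hypothetical here), and pass the two lists to `pooled_t`.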

Prediction of ‘on’ and ‘off’. Logistic regression was used to assess whether DCsum could

adequately predict whether patients were ‘on’ or ‘off’.
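The text does not report the model details, but a single-predictor logistic regression of the patient’s own ‘on’/‘off’ rating on DCsum can be sketched as below. This pure-Python version fits the intercept and slope by Newton-Raphson and reports the proportion of correctly classified observations at a 0.5 probability cut-off. The coding (1 = ‘off’), the cut-off, and the function names are assumptions, not necessarily the SPSS settings used.

```python
import math


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = _sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                # information matrix (negative Hessian)
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system information * step = gradient.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


def accuracy(x, y, b0, b1, cutoff=0.5):
    """Proportion of observations whose predicted class matches the observed one."""
    hits = sum((_sigmoid(b0 + b1 * xi) >= cutoff) == bool(yi) for xi, yi in zip(x, y))
    return hits / len(y)
```

Newton-Raphson is used because it converges in a few iterations for a two-parameter model; with perfectly separable data the maximum likelihood estimate does not exist and the coefficients would keep growing.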

Results

Eighty-four eligible patients entered the study (table 1). Because patients in H&Y stages 1

and 5 were underrepresented, disease severity was categorized as mild (H&Y 1 and 2),

moderate (H&Y 3), or severe (H&Y 4 and 5). Thirty-four patients had a stable disease pattern, whereas 47 patients had motor fluctuations. Three patients could not unequivocally be assigned to either category.

Eighty-three percent of the patients indicated that the DC was easily understood, and 69% considered the DC easy to use. Seventy-nine percent indicated that completing the DC did not take much time. The median recording time was 5-10 minutes a day.

Data reduction

The DC was successfully reduced. Item means, standard deviations, and test-retest reliability did not change substantially when the number of days was reduced from five to three and the number of daily evaluations from 11 to seven, for either stable or fluctuating patients (table 2). We made no attempt to reduce the number of days further, to avoid the risk that day-to-day fluctuations would seriously affect the results. Nor did we attempt to reduce the number of daily evaluations further, as this would have left excessively large gaps between evaluations across the day.

Clinimetrics of the SCOPA Diary Card

The summary score DCsum was calculated and used as an index of severity, because it reflects the level of difficulty in performing activities. Items 4 (dyskinesias) and 5 (sleep) were considered separately, because they do not necessarily reflect disease severity. Dyskinesias and quality of sleep were significantly correlated with DCsum (r = 0.46 and 0.32, respectively; p < 0.01). The distribution of DCsum did not deviate significantly from normality (Kolmogorov-Smirnov Z = 0.59; p = 0.87).




Table 1. Patient characteristics

number of patients                              84 (50 male, 34 female)
modified Hoehn and Yahr
  mild (H&Y 1 and 2)                            n = 26 (H&Y 1: n = 6; H&Y 2: n = 20)
  moderate (H&Y 3)                              n = 37
  severe (H&Y 4 and 5)                          n = 21 (H&Y 4: n = 21; H&Y 5: n = 0)
mean (SD) age in years                          63.06 (8.05)
mean (SD) age at onset in years                 54.02 (9.92)
mean (SD) disease duration in years             8.91 (5.26)
number of patients on levodopa                  68 (81%)
mean (SD) years on levodopa (n = 68)            6.67 (4.92)
mean (SD) levodopa dose in mg (n = 68)          631 (367)
number of patients on dopamine agonists         69 (84%)
number on other antiparkinsonian medication     49 (60%)
mean (SD) MMSE                                  27.04 (2.21)
mean (SD) SPES-ADL                              9.68 (3.77)
mean (SD) UPDRS-ADL                             16.49 (6.52)

Reliability. Cronbach’s α, calculated over the first three items, was 0.80. Test-retest reliability, evaluated using sumscores per item across three days, was assessed in 34 stable patients. The mean ICC was 0.91 for a two-week interval (item range 0.84-0.95; table 3) and 0.68 for a three-month interval (item range 0.53-0.76). The mean reproducibility (ICC) over a three-month interval in 33 fluctuating patients was 0.37 (item range 0.30-0.44). The correlation (r) between the investigator’s ratings and the patient’s ratings was 0.81, and that between the partner’s ratings and the patient’s ratings was 0.70.
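The chapter does not state which ICC model was used. As one possibility, the following sketch computes the two-way random-effects, absolute-agreement, single-measure ICC (ICC(2,1) in the Shrout and Fleiss notation) from a patients × occasions table, such as the item sumscores of the first and the repeated diary period. The function name and the assumption of complete data are illustrative; applied to the six-subject, four-rater example data of Shrout and Fleiss it returns approximately 0.29.

```python
def icc_2_1(table: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    table[i][j] is the score of patient i on occasion j (complete data assumed).
    """
    n, k = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(table[i][j] for i in range(n)) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

If systematic differences between occasions were not to count as disagreement, the consistency form ICC(3,1) would drop the occasion term from the denominator.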

Validity. Correlations of DCsum with external measures (construct validity) were 0.46 for H&Y stage, 0.57 for the SPES-ADL, and 0.62 for the UPDRS-ADL. Diary recordings of patients grouped by disease severity revealed significant differences between groups (ANOVA, p < 0.001). Post hoc comparisons between the three groups indicated that the difference between mildly and moderately affected patients just failed to reach significance (p = 0.08), whereas both other differences (mild versus severe, moderate versus severe) were significant at the 0.001 level.


Table 2. Comparison of the long and short versions of the SCOPA Diary Card

parameter                               long version                short version
                                        (5 days, 11 evaluations)    (3 days, 7 evaluations)
mean (SD) aggregated items 1-3          0.84 (0.44)                 0.86 (0.44)
mean (SD) item 4 (dyskinesias)          0.49 (0.60)                 0.49 (0.60)
mean (SD) item 5 (sleep)                1.06 (0.55)                 1.04 (0.57)
mean (SD) hours of sleep at night       6.43 (1.52)                 6.40 (1.55)
mean (SD) hours of sleep during day     0.62 (0.66)                 0.65 (0.70)
ICC¹ aggregated items 1-3               0.93                        0.93
ICC item 4 (dyskinesias)                0.93                        0.94
ICC item 5 (sleep)                      0.84                        0.84
ICC hours slept at night                0.89                        0.89
ICC hours slept during day              0.86                        0.86

¹ ICC: intraclass correlation coefficient

‘Known-groups’ comparisons between stable and fluctuating patients revealed significant differences in DCsum (23.88 ± 13.05 versus 30.81 ± 14.28, respectively; p < 0.05).

Responsiveness. Within-person standard deviations of stable patients, calculated over the first

three items of three consecutive days, differed significantly from those of fluctuating patients

(1.58 ± 1.42 versus 2.81 ± 1.59, respectively; p < 0.001). Coefficients of variation of

fluctuating patients were significantly higher than those of stable patients (0.39 versus 0.17, p

< 0.001).

Prediction of ‘on’ and ‘off’. The mean correlation (r) between DCsum and the patient’s own estimation of ‘on’ and ‘off’ was 0.75. Binary logistic regression analysis showed that DCsum correctly predicted whether patients were ‘on’ or ‘off’ in 93% of the cases, making separate evaluation of this characteristic unnecessary. We therefore removed the ‘on-off’ item from the final version of the DC. This resulted in a DC with five items, to be assessed seven times daily during three consecutive days (Appendix A).



Table 3. Test-retest reliability of the Diary Card

stable patients                        2 weeks        3 months
item 1 (walking)                       ICC¹ = 0.95    ICC = 0.76
item 2 (transfers)                     ICC = 0.95     ICC = 0.70
item 3 (manual activities)             ICC = 0.89     ICC = 0.53
item 4 (dyskinesias)                   ICC = 0.94     ICC = 0.73
item 5 (sleep)                         ICC = 0.84     ICC = 0.66
number of hours sleep at night         ICC = 0.89     ICC = 0.79
number of hours sleep in daytime       ICC = 0.86     ICC = 0.44
DCsum²                                 ICC = 0.88     ICC = 0.73

¹ ICC: intraclass correlation coefficient
² DCsum: scores aggregated over the first three items of three days, linearly transformed to a 0-100 scale

Discussion

Despite the general use of ‘on-off’ diaries in drug trials involving PD-patients with motor

fluctuations, no information is available on the clinimetric characteristics of such diaries.

Information on other forms of self-report in PD is also sparse. The available studies indicate

that self-assessment of ‘disabilities’ (i.e., difficulties performing activities such as walking,

dressing, and eating) in PD-patients can produce reliable2,3 and valid2 results. The reliability and validity of self-reported ‘impairments’ (signs and symptoms) in PD, however, are less favorable.4 Diaries should therefore focus on disabilities rather than on impairments. Another reason for preferring disabilities over impairments is that disabilities are generally more meaningful to patients.11

We decided to select three key activities: walking, transfers, and manual activities. These

activities were chosen because they are important to people, are difficult to avoid, and occur

frequently during the day. Additional support for the choice of these activities is provided by four factor-analytic studies of ADL and disability in PD patients.12-15 These studies revealed a general pattern of one factor concerned with functions at the whole-body level (gait and mobility, ADL functions) and one factor predominantly dealing with manual activities. We therefore decided to focus on these factors. In our study the first factor (‘gross mobility’) was split into two items, one concerning walking, the other the ability to make transfers. Manual activities formed the third item. Dyskinesias were included as the fourth item because they reflect another important motor complication; this item was also addressed at the disability level. Although dyskinesias may affect the performance of activities, the item was not included in the sumscore because there is no clear-cut relation with the extent of disability: on the one hand, dyskinesias may aggravate disabilities, while on the other hand, dyskinesias frequently occur during ‘on’, when patients benefit from medication and are better able to perform activities. The fifth item concerned the quality of night-time sleep and was assessed once a day. The actual amount of sleep time, both during the night and during the day, was included in the diary as well.

We selected seven evaluation moments, anchored to usual breaks in the day, for the final version of the DC. The time points ‘mid-morning’ and ‘mid-afternoon’ can be linked to coffee and tea breaks, or be combined with other breaks, depending on the cultural setting. These natural pauses allow moments of reflection, and patients are asked to take an average over the previous period. This is less burdensome for patients than hourly evaluations, which may affect compliance. For people with other schedules, or whenever considered appropriate, an alternative timetable may be adopted, provided that the evaluations are spread evenly over the day.

The clinimetric analysis of the DC shows good results. Both the internal consistency and the reproducibility in stable patients over two weeks are good. As expected, reproducibility decreases over longer time intervals, especially in fluctuating patients. The lower reproducibility in fluctuating patients is in fact a favorable outcome, as it may indicate that the DC is potentially responsive: stable patients should produce consistent scores on the outcome measure and thus show high reproducibility, whereas fluctuating patients should show lower values because their condition changes.

The DC correlates adequately with external measures (scales, investigator, partner), which provides evidence for the construct validity of this assessment method. Construct validity is also supported by group comparisons, which showed differences in the expected directions between patients with different disease severity and between stable and fluctuating patients.

This study had some potential shortcomings. Firstly, the accuracy of patients’ own ratings of

‘on’ and ‘off’ was not independently established and therefore may lack validity. Secondly,

the responsiveness of the SCOPA-DC still needs to be established more thoroughly, for

instance in a study with an intervention of known efficacy.



It may be argued that the indices produced by this DC are less clear-cut than the reduction in ‘off’ time. However, the complement of DCsum represents a percentage of ‘good functioning’, and changes in this percentage reflect treatment efficacy. Additionally, as outlined before, it is too simplistic to consider ‘on-off’ as a dichotomy, since there are gradations of the ‘off’ state that are not accounted for in traditional on-off diaries.

This study is the first thorough attempt to evaluate the clinimetric characteristics of a PD diary. Pre-post comparisons of row scores and DCsum scores can be used to evaluate the efficacy of interventions. Correspondingly, column totals can be used to assess the stability of performance over time. To this end, the standard deviation of the column totals is expressed as a percentage of their mean, the ‘coefficient of variation’. Fluctuating patients will display greater variability in performance, resulting in a larger coefficient of variation. Consequently, a decrease of the coefficient of variation after an intervention indicates treatment efficacy, as it reflects a reduction of fluctuations.

Statistical analysis indicates that seven daily evaluations suffice; for that reason, the SCOPA Diary Card in the form outlined here is an appropriate outcome measure in both research and clinical practice. For the management of motor complications in daily practice, however, more frequent sampling is desirable. In this situation an alternative form with separate hourly intervals (but with the same items and response options) is more appropriate.

Acknowledgements

Financial support for this study was granted from the Netherlands Organization for Scientific

Research (NWO, project number 0940-33-021). The authors thank Prof. R.A.C. Roos and Dr.

A.H. Zwinderman for their help with the manuscript.

References

1. Goetz CG, Stebbins GT, Blasucci LM, Grobman MS. Efficacy of a patient-training videotape on motor fluctuations for on-off diaries in Parkinson's disease. Mov Disord 1997; 12(6):1039-1041.

2. Brown RG, MacCarthy B, Jahanshahi M, Marsden CD. Accuracy of self-reported disability in patients

with parkinsonism. Arch Neurol 1989; 46(9):955-959.

3. Louis ED, Lynch T, Marder K, Fahn S. Reliability of Patient Completion of the Historical Section of the

Unified Parkinson's Disease Rating Scale. Mov Disord 1996; 11(2):185-192.


4. Golbe LI, Pae J. Validity of a Mailed Epidemiological Questionnaire and Physical Self-Assessment in

Parkinson's Disease. Mov Disord 1988; 3(3):245-254.



6. Folstein MF, Folstein SE, McHugh PR. "Mini-mental state". A practical method for grading the

cognitive state of patients for the clinician. J Psychiatr Res 1975; 12(3):189-198.









11. Hogan T, Grimaldi R, Dingemanse J, Martin M, Lyons K, Koller W. The Parkinson's disease symptom

inventory (PDSI): a comprehensive and sensitive instrument to measure disease symptoms and treatment

side-effects. Parkinsonism Relat Disord 1999; 5:93-98.

12. Baas H, Stecker K, Fischer PA. Value and appropriate use of rating scales and apparative measurement

in quantification of disability in Parkinson's disease. J Neural Transm 1993; 5:45-61.

13. Henderson L, Kennard C, Crawford TJ, Day S, Everitt BS, Goodrich S, et al. Scales for rating motor

impairment in Parkinson's disease: studies of reliability and convergent validity. J Neurol Neurosurg

Psychiatry 1991; 54:18-24.


al. Intermediate Scale for Assessment of Parkinson's Disease. Characteristic and Structure. Parkinsonism

& Related Disorders 1995; 1(2):97-102.



9(1):84-88.

Date:                Name:                IDno:

Place a cross in the box which best reflects your situation over the previous time period.

Columns (time points): previous night; breakfast (or until 7.00); mid-morning (7.00 - 10.00); lunch (10.00 - 13.00); mid-afternoon (13.00 – 16.00); dinner (16.00 – 19.00); bed time (19.00 - 22.00); sum

Rows (QUESTIONS):
Slept greater part?            box ‘yes’ under each daytime column
1. Walking                     boxes 0 1 2 3 under each time point
2. Changing position           boxes 0 1 2 3 under each time point
3. Using your hands            boxes 0 1 2 3 under each time point
4. Uncontrollable movements    boxes 0 1 2 3 under each time point
5. Sleep                       boxes 0 1 2 3 (rated once, on waking)
   Number of hours sleep in previous night: ____ (to the nearest half hour)
   Number of minutes sleep during the day: ____ (to the nearest 15 minutes)
sum 1-3

For ‘walking’, ‘changing position’ and ‘using your hands’:
0 = no difficulty
1 = slight difficulty (somewhat slow, no help required)
2 = moderate difficulty (rather slow, some help required)
3 = severe difficulty (impossible or only with a lot of help)

For ‘uncontrollable movements’:
0 = absent or do not bother me
1 = bother me slightly
2 = bother me moderately
3 = bother me a lot

For ‘sleep’:
0 = slept very well
1 = slept rather well
2 = slept rather badly
3 = slept very badly

Appendix A. SCOPA Diary Card


Appendix B. Patient instruction

How to complete the SCOPA Diary Card?

• Fill in one Diary Card each day and indicate name and date.

• Indicate at each time point, by means of a cross, which description best reflects your situation over the

previous time period. For example: if you experienced slight difficulty walking during the period from

waking up until breakfast, place a cross in box ‘1’ in the row ‘walking’ under the heading ‘breakfast’.

Always estimate an average over the relevant time period. At ‘mid-morning’, place a cross in the box that

on average best reflects the period from breakfast until mid-morning.

• On waking in the morning, fill in the column ‘previous night’. You may not be able to assess some categories in the column ‘previous night’ (e.g. walking). In that case, do not fill anything in. If you went to the toilet during the night, however, you can assess walking.

• Also on waking, record in the category ‘Sleep’ how well you slept (by means of a cross in the appropriate

box), as well as the number of hours you slept the previous night. This means the number of hours you

actually slept and not the number of hours you spent in bed. At the end of the day (at ‘bed time’) record the

total number of minutes you slept during that day.

• In the other columns always indicate first whether you slept the greater part of the relevant time period. If

you slept during the greater part of a particular time period during the day, then do not place a cross in any

of the categories, but place a cross in the box ‘yes’ in the row ‘slept greater part?’. If you did not sleep

during the relevant time period, or only during the lesser part of the time, then fill in the categories 1 to 4

and do not place a cross in the row ‘slept greater part?’.

• If your daily time table is very different, e.g. because you do not eat breakfast in the morning, use the times

specified under the column headings.

• Do not write in the row labelled ‘sum 1-3’ or in the column labelled ‘sum’.



Appendix C. Item descriptions

WALKING

Which of the following descriptions of walking best reflects your situation?

0. No difficulty walking.

1. Slight difficulty walking (no help required).

2. Moderate difficulty walking (some help required).

3. Severe difficulty walking (impossible or only with a lot of help).

CHANGING POSITION

Which of the following descriptions of turning over in bed, getting up out of bed, getting up out of a chair and

turning around when standing best reflects your situation?

0. No difficulty with any of these actions.

1. Slight difficulty with one or more of these actions (somewhat slow, no help required).

2. Moderate difficulty with one or more of these actions (rather slow, some help required).

3. Severe difficulty with one or more of these actions (impossible or only with a lot of help).

USING YOUR HANDS

Which of the following descriptions of using your hands when eating, drinking, getting dressed, washing,

shaving, combing your hair and brushing your teeth best reflects your situation?

0. No difficulty with any of these actions.

1. Slight difficulty with one or more of these actions (somewhat slow, no help required).

2. Moderate difficulty with one or more of these actions (rather slow, some help required).

3. Severe difficulty with one or more of these actions (impossible or only with a lot of help).

UNCONTROLLABLE (SUPERFLUOUS) MOVEMENTS*

* These so-called ‘dyskinesias’ involve irregular, abrupt, jerking movements of arms, legs, trunk or head that occur involuntarily. They do not include shaking or trembling.

Which of the following statements about uncontrollable movements best reflects your situation?

0. Absent or does not bother me.

1. Present and bothers me slightly.

2. Present and bothers me moderately.

3. Present and bothers me a lot.

SLEEP

Which of the following statements about sleep best reflects your situation?

0. I have slept very well.

1. I have slept rather well.

2. I have slept rather badly.

3. I have slept very badly.


Appendix D. Guidelines for scientific use of the SCOPA Diary Card

1. The SCOPA Diary Card is recorded for a period of three consecutive days, e.g. three days at baseline and three days at follow-up assessments.

2. Per item, a total score is calculated over these three days and this score is transformed to a 0-100 scale (divide by 63 and multiply by 100; subtract 3 from 63 for each missing value). The maximum of 63 corresponds to 3 days × 7 time points × a maximum item score of 3. Do not calculate row scores for days where two or more time points are missing.

3. A sumscore (DCsum) is calculated over the aggregated first three items and transformed to a 0-100 scale.

To this end, first the score on each item (as computed in 2) is divided by 3, after which these scores are

summed. Higher scores of DCsum indicate greater difficulty, while its complement reflects the percentage

of ‘good functioning’.

4. Scores on dyskinesia and sleep are not included in the sumscore DCsum, but are evaluated separately.

Calculate as in 2.

5. In order to explore the stability of performance, calculate column totals over the first three items for each time point of each day. Then calculate the mean column total (over all 21 columns, or fewer in the case of missing values) and the standard deviation of the column totals. Divide the standard deviation by the mean. This yields the ‘coefficient of variation’ (CV). Stable patients are not expected to show large variations and will have a small CV, while fluctuating patients will have higher CVs. CVs greater than 0.30 are regarded as reflecting a fluctuating disease pattern. In order to calculate a column total, no scores should be missing in that particular column.

The CV becomes unstable when the mean is very low; therefore this parameter should not be computed if the mean column total is lower than 1. This presents no problem, as such values are found only in patients without disabilities and without fluctuations, in whom the DC is not meant to be used.
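Steps 2, 3, and 5 above can be expressed as a short Python sketch. The data layout (for each item, one list per day holding the seven time-point scores, with None for a missing value) and the function names are hypothetical. Two details that the guidelines leave open are assumptions here: a day with two or more missing time points is dropped from that item’s score, and the sample standard deviation (n - 1 denominator) is used for the CV.

```python
from statistics import mean, stdev
from typing import Optional

# Hypothetical layout: one list per day with the seven time-point scores (0-3) of a
# single item; None marks a missing value.
Day = list[Optional[int]]

MAX_RESPONSE = 3   # highest response option
TIME_POINTS = 7    # evaluations per day in the short version


def item_score(days: list[Day]) -> Optional[float]:
    """Item score on a 0-100 scale over the recorded days (step 2)."""
    total = 0
    answered = 0
    for day in days:
        missing = sum(v is None for v in day)
        if missing >= 2:
            # Assumption: a day with two or more missing time points is left out.
            continue
        total += sum(v for v in day if v is not None)
        answered += len(day) - missing
    if answered == 0:
        return None
    # Equivalent to dividing by 63 minus 3 per missing value, then multiplying by 100.
    return 100 * total / (MAX_RESPONSE * answered)


def dcsum(walking, transfers, hands) -> Optional[float]:
    """DCsum: each of items 1-3 (0-100) divided by 3, then summed (step 3)."""
    scores = [item_score(walking), item_score(transfers), item_score(hands)]
    if any(s is None for s in scores):
        return None
    return sum(s / 3 for s in scores)


def column_cv(walking, transfers, hands) -> Optional[float]:
    """CV of the column totals over items 1-3 (step 5); None if not computable."""
    totals = []
    for d in range(len(walking)):
        for t in range(TIME_POINTS):
            cells = [walking[d][t], transfers[d][t], hands[d][t]]
            if None in cells:
                continue  # a column total requires all three items
            totals.append(sum(cells))
    if len(totals) < 2:
        return None
    m = mean(totals)
    if m < 1:
        return None  # CV is unstable for very low means
    return stdev(totals) / m
```

A patient who records ‘1’ on every item at every time point obtains item scores of 33.3, a DCsum of 33.3 (so 66.7% ‘good functioning’), and a CV of 0, which is below the 0.30 threshold for a fluctuating pattern.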


10 Development of a questionnaire for sleep and sleepiness in Parkinson's disease

Johan Marinus1, Martine Visser1, Jacobus J. van Hilten1, Gert Jan Lammers1, Anne M. Stiggelbout2

Departments of Neurology1 and Medical Decision Making2, Leiden University Medical Center,

Leiden, The Netherlands.

Submitted

Chapter 10


Abstract

Objective. To develop a valid, reliable, and short questionnaire (SCOPA-SLEEP) that assesses night-

time sleep (NS) and daytime sleepiness (DS) in patients with Parkinson's disease (PD). Methods. A

postal survey including four instruments, the SCOPA-SLEEP NS (5 items) and DS (6 items), the

Pittsburgh Sleep Quality Index (PSQI), and the Epworth Sleepiness Scale (ESS) was completed by

142 patients with PD and 100 controls. Results. Reliability of the scales was high: internal consistency (Cronbach alpha) of the NS and DS scales was 0.88 and 0.91, respectively, and test-retest reliability (intraclass correlation coefficient) was 0.94 and 0.89, respectively. Scale scores differed significantly between patients and controls (p < 0.001). Construct validity was assessed by correlations with scales that addressed similar constructs. The correlation between the NS scale and the PSQI was 0.83 (p < 0.001), and the correlation between the DS scale and the ESS was 0.81 (p < 0.001). Factor analysis revealed a single factor for each scale, indicating that each scale measures one construct, which justifies the calculation of sumscores. The coefficient of variation of both the NS and the DS scale was

higher than that of the PSQI and the ESS, indicating a better ability to detect differences between

individuals. Conclusions. The SCOPA-SLEEP is a reliable, valid, and practical instrument for

assessing both night-time sleep and daytime sleepiness in patients with PD.

Sleep and sleepiness in Parkinson’s disease


Introduction

Insomnia and hypersomnia occur frequently in the general population, and their prevalence increases with age. Several studies found even higher prevalences of both types of sleep problems in patients with Parkinson's disease (PD).1-3 Poor night-time sleep is associated with lower quality of life of patients and their spouses,4-6 while excessive daytime sleepiness may be bothersome or even dangerous. A few studies reported the occurrence of 'sleep attacks' among patients with PD, potentially causing hazardous situations.7-9 Sleep problems therefore merit particular attention, which raises the question of how they should be evaluated.

Within the scope of selecting appropriate instruments for a longitudinal study of patients with PD, we were interested in a concise, practical, and clinimetrically sound instrument to assess both night-time sleep problems and daytime sleepiness in PD. The questionnaire should be appropriate for both research and clinical practice. However, none of the existing sleep scales matched our objective. Some scales lacked conceptual clarity and combined scores on items addressing different constructs into a sumscore (e.g., Pittsburgh Sleep Quality Index,10 Parkinson's Disease Sleep Scale11). Other scales had potential problems with face validity: they were either too short (Stanford Sleepiness Scale,12 Karolinska Sleepiness Scale13), lacked relevant items (Sleep Problems Scale14), or asked patients to indicate the chance of falling asleep in situations they possibly did not experience (Epworth Sleepiness Scale15). Still other scales were not suitable for clinical use because they were too long, required a complex calculation of sumscores (Pittsburgh Sleep Quality Index), or combined continuous and categorical responses (St. Mary's Hospital Sleep Questionnaire16). Additionally, a number of scales were not appropriate because they were diagnostic instruments (Sleep Disorders Questionnaire17), or were intended for particular patient groups (e.g., narcolepsy) or particular interventions (e.g., pharmacological interventions, as in the Leeds Sleep Evaluation Questionnaire18).

We therefore decided to develop and validate a new scale, the SCOPA-SLEEP, that evaluates

both night-time sleep (NS) and daytime sleepiness (DS) in patients with PD. The scale is

intended for comparing groups in research situations and for clinical use in individual

patients. The development of this scale is part of a larger project on SCales for Outcomes in

PArkinson’s disease (SCOPA; http://www.lumc.nl/2050/research/scopa_homepage.html), in

which short, practical, and clinimetrically sound scales for all relevant domains in PD are selected

or developed.

Chapter 10


Methods

Scale development

Items for the NS scale were selected from the literature; they evaluate whether patients experienced problems with their nocturnal sleep. It was hypothesized that, together, these items would reflect a patient's perceived sleep quality. The items were judged by experts and piloted among patients for comprehensibility and clarity. Testing was continued until no further problems were encountered and patients understood all items well. The DS scale was developed similarly; its items evaluate how often patients had fallen asleep during the day, how often they had experienced difficulty staying awake, and whether falling asleep during the day was considered a problem. The SCOPA-SLEEP thus consists of two parts. The NS subscale addresses night-time sleep problems in the past month and includes five items with four response options. Patients indicate how much they were bothered by particular sleep problems, on a scale ranging from 0 (not at all) to 3 (a lot). The five items address sleep initiation, sleep fragmentation, sleep efficiency, sleep duration, and early wakening. The maximum score of this scale is 15, with higher scores reflecting more severe sleep problems. One additional question evaluates overall sleep quality on a seven-point scale (ranging from 'slept very well' to 'slept very badly'). The score on this item is not included in the NS sumscore but is used separately as a global measure of sleep quality. The DS subscale evaluates daytime sleepiness in the past month and includes six items with four response options, ranging from 0 (never) to 3 (often). Subjects indicate how often they fell asleep unexpectedly, how often they fell asleep in particular situations (while sitting peacefully, while watching TV or reading, or while talking to someone), how often they had difficulty staying awake, and whether falling asleep during the day was considered a problem. The maximum score is 18, with higher scores reflecting more severe sleepiness.
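The scoring rules described above reduce to two simple sums. The following is a minimal Python sketch, not part of the published scale materials: the function names are hypothetical, item responses are assumed to be coded 0-3 (the overall sleep quality item 0-6), and because the chapter does not describe how single missing items are handled, the sketch simply rejects incomplete input.

```python
from typing import Sequence

NS_ITEMS = 5  # night-time sleep: five items, each scored 0-3 (maximum 15)
DS_ITEMS = 6  # daytime sleepiness: six items, each scored 0-3 (maximum 18)


def _sum_items(responses: Sequence[int], n_items: int) -> int:
    """Return the sum of the item scores after validating count and range."""
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} item scores, got {len(responses)}")
    for score in responses:
        if score not in (0, 1, 2, 3):
            raise ValueError(f"item score {score!r} is outside the 0-3 range")
    return sum(responses)


def score_ns(responses: Sequence[int]) -> int:
    """SCOPA-SLEEP night-time sleep (NS) sumscore: 0-15, higher = more problems."""
    return _sum_items(responses, NS_ITEMS)


def score_ds(responses: Sequence[int]) -> int:
    """SCOPA-SLEEP daytime sleepiness (DS) sumscore: 0-18, higher = more sleepiness."""
    return _sum_items(responses, DS_ITEMS)


def global_sleep_quality(score: int) -> int:
    """Overall sleep quality item (0-6); reported on its own, never added to NS."""
    if score not in range(7):
        raise ValueError(f"global sleep quality score {score!r} is outside 0-6")
    return score


# Example with one hypothetical respondent.
ns = score_ns([0, 1, 2, 1, 3])      # 7 out of 15
ds = score_ds([1, 1, 0, 0, 2, 0])   # 4 out of 18
quality = global_sleep_quality(2)   # kept separate from the NS sumscore
```

Keeping the overall sleep quality item out of `score_ns` mirrors the text, where that item serves as a separate global measure.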

Participants

Since patients with PD reported more sleep problems than controls in almost all previous studies, the scales should be able to detect such differences; therefore, subjects without PD were included as a control group.

Patients. Patients were included if they visited the outpatient clinic of the Department of Neurology of the Leiden University Medical Center and fulfilled the United Kingdom Parkinson's Disease Society Brain Bank criteria for idiopathic PD.19 Patients were excluded if they also had other diseases of the central nervous system or were not able to read or understand Dutch.

Controls. Subjects without PD who were able to read or understand Dutch were eligible as

controls, provided that they had no history of diseases of the central nervous system.

Recruitment. Questionnaires were sent to eligible patients. An introductory letter explained the goal of the study and asked patients to provide the names of two persons, one man and one woman, who had consented to participate as control subjects. The age difference between the patient and each of his or her controls was not to exceed 10 years. The introductory letter emphasized that only the names of persons who had explicitly expressed their willingness to participate were to be provided. Partners were not eligible as controls, since patients' nocturnal sleep problems could affect the partner's sleep pattern.6,20 Returning the questionnaire was interpreted as consent to participate. The study was approved by the medical ethics committee of the Leiden University Medical Center.

Scale evaluation

A postal survey was sent to potential participants. It included the SCOPA-SLEEP (appendix), the Pittsburgh Sleep Quality Index (PSQI),10 and the Epworth Sleepiness Scale (ESS).15 Eight additional questions evaluated use of sleep medication, sleep initiation time (minutes), time awake per night (hours), actual duration of night-time sleep (hours), duration of daytime sleep (minutes), and how often subjects had had 'planned naps' or 'unplanned naps' or had 'fallen asleep quite unexpectedly' in the past month. Response options for the latter three questions ranged from 'not at all' to 'every day'. The PSQI, the ESS, and the eight additional questions were included to assess the construct validity of the SCOPA-SLEEP. The PSQI and the ESS were chosen because they are widely used and have previously been applied in studies involving patients with PD. The PSQI evaluates several aspects of night-time sleep and consists of 19 self-rated questions and five questions rated by the bed partner or roommate.10 The latter five questions are used for clinical information only and are not included in the PSQI score. The item scores are first grouped into seven domains, each of which is then recoded to a 0-3 scale. The seven domains are subjective sleep quality, sleep latency, sleep duration, habitual sleep efficiency, sleep disturbances, use of sleep medication, and daytime dysfunction. Both the total score and the domain (subscale) scores can be used. The total score has a maximum of 21, with higher scores reflecting greater problems. The developers advise a cut-off score of 5/6 to separate good from poor sleepers.10
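Once the seven domain scores are available, the PSQI total and the 5/6 cut-off are straightforward to apply. A minimal Python sketch, assuming the domain scores have already been derived from the raw answers according to the developers' scoring instructions (that derivation is not reproduced here); the function names are hypothetical:

```python
from typing import Mapping

# The seven PSQI domains listed above; each is assumed to be already scored 0-3.
PSQI_DOMAINS = (
    "subjective sleep quality",
    "sleep latency",
    "sleep duration",
    "habitual sleep efficiency",
    "sleep disturbances",
    "use of sleep medication",
    "daytime dysfunction",
)


def psqi_total(domains: Mapping[str, int]) -> int:
    """Sum the seven 0-3 domain scores into the PSQI total (0-21)."""
    missing = [name for name in PSQI_DOMAINS if name not in domains]
    if missing:
        raise ValueError(f"missing PSQI domains: {missing}")
    for name in PSQI_DOMAINS:
        if domains[name] not in (0, 1, 2, 3):
            raise ValueError(f"{name!r} score {domains[name]!r} is outside 0-3")
    return sum(domains[name] for name in PSQI_DOMAINS)


def is_poor_sleeper(total: int, highest_good: int = 5) -> bool:
    """Apply a cut-off written 'a/b': totals <= a are good, totals >= b are poor.

    The default a = 5 corresponds to the developers' 5/6 cut-off.
    """
    return total > highest_good


total = psqi_total(dict.fromkeys(PSQI_DOMAINS, 1))  # 7
poor = is_poor_sleeper(total)                        # True with the 5/6 cut-off
```

Passing `highest_good=8` applies the 8/9 cut-off examined in the Results section of this chapter.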


The ESS evaluates daytime sleepiness. Respondents are asked to rate their chance of dozing off in eight different situations.15 There are four response options, ranging from 0 (would never doze) to 3 (high chance of dozing). The maximum score is 24, with higher scores reflecting more severe sleepiness. Healthy controls usually have scores ≤ 10; scores ≥ 11 are considered indicative of excessive sleepiness.21 Scores ≥ 16 indicate a high level of daytime sleepiness but are not, by themselves, diagnostic of a particular sleep disorder.15
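The ESS total and the interpretive bands quoted above can be sketched as follows. This is an illustrative Python sketch with hypothetical function names; the band labels paraphrase the text and, as noted there, describe the level of sleepiness rather than diagnose a disorder.

```python
from typing import Sequence

ESS_SITUATIONS = 8  # eight situations, each rated 0 (would never doze) to 3


def ess_total(ratings: Sequence[int]) -> int:
    """Sum the eight ESS ratings into a total score (0-24)."""
    if len(ratings) != ESS_SITUATIONS:
        raise ValueError(f"expected {ESS_SITUATIONS} ratings, got {len(ratings)}")
    if any(r not in (0, 1, 2, 3) for r in ratings):
        raise ValueError("ESS ratings must be 0, 1, 2, or 3")
    return sum(ratings)


def ess_category(total: int) -> str:
    """Map an ESS total onto the descriptive bands quoted in the text.

    The bands are nested (a total >= 16 is also >= 11), so the highest
    applicable band is reported. None of them is diagnostic.
    """
    if total >= 16:
        return "high level of daytime sleepiness"
    if total >= 11:
        return "excessive daytime sleepiness"
    return "within the range usually seen in healthy controls"
```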

Participants were asked to complete the questionnaires within one week. Non-responders were contacted by telephone after two weeks and asked whether they still intended to participate. Patients who returned their questionnaires within one week were asked to complete the SCOPA-SLEEP a second time two weeks later to evaluate test-retest reliability. Information from the questionnaires of participating patients was combined with information from patient records (disease severity, disease duration, and medication) to assess known-groups validity. Disease severity was assessed at each follow-up visit with the Hoehn and Yahr (H&Y) staging system.22 H&Y 1 is the mildest stage, with only unilateral symptoms, whereas H&Y 5 is the most severe stage, in which patients are wheelchair-bound or bedridden.


Data were entered and analyzed with SPSS for Windows 10.0 (SPSS Inc, Chicago, IL, USA).

Questionnaires were excluded if they had more than 20% missing values.

Data quality and score distribution. The quality of the data was considered acceptable if item

scores were missing in less than 10% of the patients23 and item-total correlations in the patient

group exceeded 0.20.24
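The exclusion rule and the two data-quality criteria can be sketched as follows, assuming responses are stored one row per respondent with `None` for a missing item. The chapter does not say how missing items were handled when computing item-total correlations; this sketch uses complete cases only, which is an assumption, and all names are hypothetical. "Corrected" means each item is correlated with the sum of the remaining items, so that the item does not inflate its own correlation.

```python
from math import sqrt
from typing import List, Optional, Sequence

Row = Sequence[Optional[int]]  # one respondent; None marks a missing item


def fraction_missing(row: Row) -> float:
    return sum(value is None for value in row) / len(row)


def usable_questionnaires(rows: Sequence[Row], max_missing: float = 0.20) -> List[Row]:
    """Drop questionnaires with more than 20% missing values."""
    return [row for row in rows if fraction_missing(row) <= max_missing]


def item_missing_rates(rows: Sequence[Row]) -> List[float]:
    """Proportion of respondents with a missing value, per item (target: < 0.10)."""
    n_items = len(rows[0])
    return [sum(row[j] is None for row in rows) / len(rows) for j in range(n_items)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def corrected_item_total(rows: Sequence[Row]) -> List[float]:
    """Correlate each item with the sum of the other items (target: > 0.20)."""
    complete = [row for row in rows if all(v is not None for v in row)]
    n_items = len(complete[0])
    correlations = []
    for j in range(n_items):
        item = [row[j] for row in complete]
        rest = [sum(row) - row[j] for row in complete]
        correlations.append(pearson(item, rest))
    return correlations
```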

Reliability. Internal consistency of the scales was assessed with Cronbach α. Test-retest

reliability for individual items was assessed with a weighted kappa (Kw; quadratic weights),

whereas for the total score an intraclass correlation coefficient (ICC) was used.
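For readers who want to check these statistics on their own data, Cronbach α and a quadratically weighted kappa can be written out in a few lines. This is a minimal sketch using only the Python standard library, assuming item scores coded 0-3 (four categories) and rows of complete responses; the function names are hypothetical. The ICC is omitted because the chapter does not state which ICC model was used.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the totals).

    rows: one list of item scores per respondent (complete cases only).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance(column) for column in zip(*rows)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def quadratic_weighted_kappa(first: Sequence[int], second: Sequence[int],
                             n_categories: int = 4) -> float:
    """Weighted kappa with quadratic disagreement weights (i - j)^2 / (K - 1)^2.

    first, second: the same respondents' ratings (0..K-1) on test and retest.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two equally long, non-empty rating lists")
    k, n = n_categories, len(first)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(first, second):
        observed[a][b] += 1 / n
    row_marg = [sum(observed[i]) for i in range(k)]
    col_marg = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagree_obs = disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2
            disagree_obs += w * observed[i][j]
            disagree_exp += w * row_marg[i] * col_marg[j]  # chance expectation
    return 1 - disagree_obs / disagree_exp
```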

Validity. Age, disease severity, and male-female ratio of responders were compared with those of non-responders using t-tests, Mann-Whitney U tests, and chi-square tests, respectively. Independent-samples t-tests were used to compare scores of patients and controls, and scores of patients who were on medication (levodopa, dopamine agonists, or sleep medication) versus those who were not. The significance threshold was set at 0.05.

Construct validity of the SCOPA-SLEEP was assessed by calculating the correlation between this scale and scales that address similar constructs, using Pearson's correlation coefficient (r). This coefficient was also used to explore the relation with disease duration. Spearman's correlation coefficient (rs) was used when the correlation involved the PSQI subscales or the 'global sleep quality' item. Known-groups validity was assessed by comparing the NS and DS scores of patients with different disease severity using analysis of variance (ANOVA). For this purpose, patients were classified as having mild (H&Y 1 and 2), moderate (H&Y 3), or severe (H&Y 4 and 5) disease. Stages 1 and 2, and stages 4 and 5, were collapsed because patients in H&Y stages 1 and 5 were underrepresented, a common finding in studies involving patients with PD. A principal component factor analysis with orthogonal rotation was performed to explore the underlying structure of the scales. Coefficients of variation (CV) were calculated to assess the discriminative properties of the scales. The CV is calculated by dividing the standard deviation of the score by its mean. Higher CV values indicate a better ability to detect differences between individuals.
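The severity grouping and the known-groups comparison can be sketched in plain Python. This is a minimal sketch with hypothetical names: it computes only the one-way ANOVA F statistic. Obtaining a p-value requires the F distribution, for which a statistics package (for example `scipy.stats.f_oneway`, which returns both the statistic and the p-value) would normally be used.

```python
from statistics import mean
from typing import Dict, Iterable, List, Sequence, Tuple

# Collapsing of Hoehn and Yahr stages as described in the text.
HY_TO_SEVERITY = {1: "mild", 2: "mild", 3: "moderate", 4: "severe", 5: "severe"}


def group_by_severity(patients: Iterable[Tuple[int, float]]) -> Dict[str, List[float]]:
    """patients: (H&Y stage, scale score) pairs -> scores per severity group."""
    groups: Dict[str, List[float]] = {"mild": [], "moderate": [], "severe": []}
    for stage, score in patients:
        groups[HY_TO_SEVERITY[stage]].append(score)
    return groups


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """F = between-group mean square / within-group mean square."""
    nonempty = [list(g) for g in groups if len(g) > 0]
    if len(nonempty) < 2:
        raise ValueError("need at least two non-empty groups")
    scores = [x for g in nonempty for x in g]
    grand = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in nonempty)
    ss_within = sum((x - mean(g)) ** 2 for g in nonempty for x in g)
    df_between = len(nonempty) - 1
    df_within = len(scores) - len(nonempty)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical patients: (H&Y stage, NS sumscore).
groups = group_by_severity([(1, 3), (2, 5), (3, 4), (3, 6), (4, 7), (5, 9)])
f_stat = one_way_anova_f(list(groups.values()))  # about 4.33
```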

Results

Response rate and sample characteristics

A postal survey was sent to 185 patients with PD and 112 controls. One hundred and forty-three patients returned their questionnaires. One questionnaire had more than 20% missing data and was excluded. Thus, 142 usable questionnaires remained, a response rate of 76.8%. One hundred and four controls returned their questionnaires. Four questionnaires were excluded because the age difference with the corresponding patient exceeded 10 years. Therefore, 100 usable questionnaires (89.3%) were available for analysis. Fifty-six of the 60 patients who returned their questionnaire within one week completed the SCOPA-SLEEP a second time. One of these questionnaires was subsequently removed from the analysis because too much data were missing, leaving a response rate of 91.7%.
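The usable response rates follow from dividing the number of usable questionnaires by the number sent (for the retest, by the 60 patients who were invited). A small arithmetic check in Python, with a hypothetical helper name and results rounded to one decimal:

```python
def usable_response_rate(returned: int, excluded: int, sent: int) -> float:
    """Percentage of distributed questionnaires that were returned and usable."""
    return 100 * (returned - excluded) / sent


# Counts taken from the paragraph above.
patients = usable_response_rate(returned=143, excluded=1, sent=185)
controls = usable_response_rate(returned=104, excluded=4, sent=112)
retest = usable_response_rate(returned=56, excluded=1, sent=60)
print(f"{patients:.1f}% {controls:.1f}% {retest:.1f}%")  # 76.8% 89.3% 91.7%
```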

Differences between responders and non-responders in the patient group were not significant for disease severity and age, but the proportion of women was significantly higher among non-responders (p < 0.05). The mean disease duration of the patients was 9.9 years (table 1). Disease severity was mild in 61 patients (43.0%), moderate in 52 patients (36.6%), and severe in 29 patients (20.4%). The male-female ratio did not differ significantly between patients and controls (p = 0.053), but controls were significantly younger (p = 0.004).


Table 1. Characteristics of participants

| | Patients | Controls | p |
|---|---|---|---|
| n | 142 | 100 | |
| % male | 60.5 | 48.0 | 0.053¹ |
| mean (SD) age (years) | 65.6 (10.8) | 61.4 (11.2) | 0.004² |
| H&Y 1 | 4 (2.8%) | | |
| H&Y 2 | 57 (40.1%) | | |
| H&Y 3 | 52 (36.6%) | | |
| H&Y 4 | 26 (18.3%) | | |
| H&Y 5 | 3 (2.1%) | | |
| n on levodopa | 73 (51%) | | |
| mean (SD) levodopa dose in users (mg) | 665 (361) | | |
| n on dopamine agonists | 60 (42%) | | |
| n on levodopa + dopamine agonists | 54 (38%) | | |
| n on sleep medication | 27 (19%) | 7 (7%) | < 0.001¹ |

¹ Chi-square test; ² t-test

Scale evaluation

Data quality and score distribution. The quality of the data was good. None of the items had

missing values in more than 10% of the patients, indicating good acceptability. All item-total

correlations exceeded 0.20. Patients used the full score range in both scales. Twenty-five

patients (17.7%) had a sumscore of 0 in the NS scale, whereas 2 patients (1.4%) scored 15.

Seventeen patients (12.1%) had a sumscore of 0 in the DS scale, whereas one patient (0.7%)

scored 18.

Reliability. Cronbach α for the NS subscale was 0.88, with corrected item-scale correlations ranging from 0.48 to 0.85 (table 2). Test-retest reliability for the sumscore of this scale was 0.94 (ICC), and Kw for the items ranged from 0.82 to 0.90. Cronbach α for the DS subscale was 0.91, with corrected item-scale correlations between 0.55 and 0.88. The ICC for the DS sumscore was 0.89, with Kw for the items ranging from 0.49 to 0.82.



Table 2. Reliability of SCOPA-SLEEP scales in patient group

| | Cronbach α | Item-total¹ | ICC sum² | Kw items³ |
|---|---|---|---|---|
| SCOPA-NS (night-time sleep) | 0.88 | 0.48-0.85 | 0.94 | 0.82-0.90 |
| 1: difficulty falling asleep | | 0.48 | | 0.90 |
| 2: been awake too often | | 0.77 | | 0.82 |
| 3: lying awake too long | | 0.85 | | 0.86 |
| 4: waking too early | | 0.72 | | 0.83 |
| 5: had too little sleep | | 0.77 | | 0.90 |
| Overall sleep quality | N/A | N/A | N/A | 0.91 |
| Pittsburgh Sleep Quality Index | 0.77 | 0.27-0.71 | * | * |
| SCOPA-DS (daytime sleepiness) | 0.91 | 0.55-0.88 | 0.89 | 0.49-0.82 |
| 1: falling asleep unexpectedly | | 0.88 | | 0.79 |
| 2: falling asleep while sitting | | 0.86 | | 0.78 |
| 3: falling asleep while watching TV | | 0.82 | | 0.81 |
| 4: falling asleep while talking | | 0.55 | | 0.78 |
| 5: difficulty staying awake | | 0.76 | | 0.82 |
| 6: falling asleep considered a problem | | 0.64 | | 0.49 |
| Epworth Sleepiness Scale | 0.86 | 0.56-0.71 | * | * |

¹ corrected item-total correlations; ² intraclass correlation coefficient for the sumscore (calculated over a 2-week interval); ³ weighted kappa of items, calculated over a 2-week interval; * reproducibility not assessed for this scale; N/A = not applicable

Validity. The scores on all items of both parts of the SCOPA-SLEEP differed significantly between patients and controls, with the exception of item NS1 (difficulty falling asleep) (table 3). Responses to seven of the eight additional questions also differed significantly between patients and controls (all p-values < 0.01). The one exception again concerned sleep initiation, with both groups reporting similar times before falling asleep.

Sumscores of patients and controls differed significantly on all included sleep scales (table 4). In the patient group, the correlation between NS and the PSQI sumscore was 0.83 (p < 0.001), and the correlations with the separate PSQI subscales ranged from 0.38 to 0.73 (all p-values < 0.001). The correlation between NS and the 'global sleep quality' score was 0.85 (p < 0.001), whereas the correlation between the PSQI and the global score was 0.78 (p < 0.001). In the patient group, the correlation between the DS scale and the ESS was 0.81 (p < 0.001). No significant differences were found in the scores of patients grouped by disease severity for any of the four scales (ANOVA). Disease duration showed similar results, with low and nonsignificant correlations.

Table 3. SCOPA-SLEEP item scores, median (interquartile range)

Patients Controls p p-adj1

Night-time Sleep (NS)

1. difficulty falling asleep 0 (1) 0 (1) 0.9592 0.771

2. been awake too often 1 (2) 1 (1) 0.0032 < 0.0012

3. lying awake too long 1 (2) 0 (1) < 0.0012 < 0.0012

4. waking too early 1 (2) 0 (1) < 0.0012 < 0.0012

5. had too little sleep 1 (2) 0 (1) < 0.0012 < 0.0012

'overall sleep quality' (0-6) 2 (2) 1 (1) < 0.0012 < 0.0012

Daytime sleepiness (DS)

1. falling asleep unexpectedly 1 (2) 0 (1) < 0.0012 < 0.0012

2. falling asleep while sitting 1 (2) 0 (1) < 0.0012 < 0.0012

3. falling asleep watching TV 1 (1) 0 (1) < 0.0012 < 0.0012

4. falling asleep while talking 0 (0) * < 0.0012 < 0.0012

5. difficulty staying awake 1 (1) 0 (1) < 0.0012 < 0.0012

6. sleepiness problematic 0 (1) 0 (0) < 0.0012 < 0.0012

Other sleep parameters

1. n (%) using sleep medication 27 (19) 7 (7) 0.0024 < 0.0012

2. sleep initiation time (minutes) 22 19 0.5343 0.71

3. time awake per night (hours) 1.9 0.6 0.0063 0.005

4. actual sleep per night (hours) 6.3 7.0 0.0013 0.007

5. sleep in daytime (minutes) 34 11 < 0.0013

6. median (IQ range) planned naps 2 (3) 0 (2) < 0.0012

7. median (IQ range) unplanned naps 1 (2) 0 (1) < 0.0012

8. median (IQ range) unexpected sleep 0 (1) 0 (0) < 0.0012

* all controls scored 0; 1 univariate analysis, adjusted for age and sex; 2 Mann-Whitney U test; 3 t-test; 4 Chi-square test



There were no significant differences in any of the four scale scores between patients who used levodopa and those who did not. We also found no significant correlation between levodopa dose and any of the scale scores in patients who took levodopa. Scores on both daytime sleepiness scales were higher in patients taking dopamine agonists, with the difference reaching significance for the ESS (8.8 versus 5.9; p = 0.04) but not for the DS (5.9 versus 4.1; p = 0.07). Subjects who used sleep medication had significantly higher NS and PSQI scores in both the patient and the control group (p < 0.001), but differences in DS and ESS scores were not significant.

If the proposed PSQI cut-off value (5/6) was used to discriminate between good and poor sleepers, 106 subjects (29 controls and 77 patients, i.e., 43.8% of the total sample) were classified as poor sleepers. Using this PSQI cut-off as an external criterion for the NS subscale resulted in an area under the receiver operating characteristic (ROC) curve of 0.90, with an optimal cut-off at 3/4, yielding a sensitivity of 0.82 and a specificity of 0.84. Since we considered the proportion of subjects classified as poor sleepers by this criterion exceptionally high in both groups, we also used responses to the 'global sleep quality' item as a criterion. This showed that only 32 subjects (3 controls, 29 patients) actually considered themselves poor sleepers, a finding that agrees better with the literature.1,25,26 When this global item was used to separate patients who slept badly (scores 4-6) from those who did not (scores 0-3), the best cut-off point for the NS subscale was 6/7, with an area under the ROC curve in patients of 0.94. This cut-off value showed a sensitivity of 0.97 and a specificity of 0.80. Applying the same 'global sleep quality' criterion to the PSQI suggested that a cut-off of 8/9 would be more appropriate, both in patients and in all subjects, resulting in an area under the ROC curve of 0.91, with a sensitivity of 0.93 and a specificity of 0.76.
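The ROC analyses above can be sketched as follows. The chapter does not state how the "optimal" cut-off was chosen; this Python sketch assumes Youden's index (sensitivity + specificity - 1) as the criterion, computes the area under the ROC curve through its rank (Mann-Whitney) interpretation, and uses hypothetical function names. A cut-off written 'a/b' treats scores of a or lower as negative and scores of b or higher as positive.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(scores: Sequence[int], is_case: Sequence[bool],
                            highest_negative: int) -> Tuple[float, float]:
    """Cut-off 'a/b' with a = highest_negative: scores above a count as positive."""
    tp = sum(1 for s, c in zip(scores, is_case) if c and s > highest_negative)
    fn = sum(1 for s, c in zip(scores, is_case) if c and s <= highest_negative)
    tn = sum(1 for s, c in zip(scores, is_case) if not c and s <= highest_negative)
    fp = sum(1 for s, c in zip(scores, is_case) if not c and s > highest_negative)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores: Sequence[int], is_case: Sequence[bool]) -> float:
    """Probability that a random case scores higher than a random non-case
    (ties count one half); this equals the area under the ROC curve."""
    cases = [s for s, c in zip(scores, is_case) if c]
    non_cases = [s for s, c in zip(scores, is_case) if not c]
    wins = 0.0
    for a in cases:
        for b in non_cases:
            wins += 1.0 if a > b else 0.5 if a == b else 0.0
    return wins / (len(cases) * len(non_cases))


def youden_cutoff(scores: Sequence[int], is_case: Sequence[bool]) -> Tuple[int, float, float]:
    """Return (a, sensitivity, specificity) for the 'a/(a+1)' cut-off that
    maximizes Youden's J; the lowest such cut-off wins ties."""
    best = None
    for a in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, is_case, a)
        if best is None or sens + spec > best[1] + best[2]:
            best = (a, sens, spec)
    return best
```

Here `is_case` would be derived from the external criterion, for example a 'global sleep quality' score of 4-6 as in the text.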

Three controls and 38 patients had an ESS score ≥ 11, whereas none of the controls and 16 of the patients scored ≥ 16. Using the ESS cut-off value of 10/11 to separate persons with excessive daytime sleepiness from those without indicated an optimal cut-off value of 4/5 for the SCOPA-DS. The area under the ROC curve was 0.93, with a sensitivity of 0.90 and a specificity of 0.82.


Table 4. Sumscores of patients and controls

| | Patients | Controls | p-adj¹ | CV² |
|---|---|---|---|---|
| mean (SD) SCOPA-NS (night-time sleep) | 4.9 (4.0) | 2.8 (2.7) | < 0.001 | 0.82 |
| mean (SD) Pittsburgh Sleep Quality Index | 7.2 (4.3) | 4.5 (3.3) | < 0.001 | 0.70 |
| mean (SD) SCOPA-DS (daytime sleepiness) | 5.2 (4.1) | 2.1 (2.0) | < 0.001 | 0.79 |
| mean (SD) Epworth Sleepiness Scale | 7.9 (5.3) | 4.1 (3.2) | < 0.001 | 0.67 |

¹ univariate analysis of variance, adjusted for age and sex; ² coefficient of variation, i.e., the standard deviation of the score divided by the mean of the score; higher values of CV indicate better ability to detect differences between individuals

Factor analysis of the SCOPA-NS revealed one factor, accounting for 68.1% of the variance. For the DS subscale, one factor also emerged, explaining 69.1% of the variance. The factor analysis of the PSQI was performed on the seven subscales and produced two factors accounting for 58.7% of the variance, with the sleep-pattern-related items (quality, duration, efficiency, and latency) loading on one factor, and daytime dysfunction and sleep disturbances loading on the other. The ESS also revealed two factors, together explaining 63.4% of the variance. Items that addressed more private situations at home (items 1, 2, 5, and 7) loaded on one factor, whereas items that evaluated more 'public' situations (car, public places, talking to someone; items 3, 4, 6, and 8) loaded on the other.

The CVs of both the NS and the DS scales were higher than those of the PSQI and the ESS (table 4).
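As a worked example of the CV defined in the Methods section, the ratio can be computed directly from the patient-group means and standard deviations in table 4. A minimal Python sketch; the function names are hypothetical, and `coefficient_of_variation` assumes the sample standard deviation when raw scores are available:

```python
from statistics import mean, stdev
from typing import Sequence


def coefficient_of_variation(scores: Sequence[float]) -> float:
    """CV = sample standard deviation / mean (higher = more spread between people)."""
    return stdev(scores) / mean(scores)


def cv_from_summary(mean_score: float, sd: float) -> float:
    """The same ratio, computed from a reported mean and standard deviation."""
    return sd / mean_score


# Patient-group mean (SD) values from table 4.
for scale, (m, sd) in {"SCOPA-NS": (4.9, 4.0),
                       "SCOPA-DS": (5.2, 4.1),
                       "ESS": (7.9, 5.3)}.items():
    print(f"{scale}: CV = {cv_from_summary(m, sd):.2f}")
# SCOPA-NS: CV = 0.82
# SCOPA-DS: CV = 0.79
# ESS: CV = 0.67
```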

Discussion

We developed a short questionnaire for the assessment of sleep problems in patients with PD,

consisting of two scales, one that evaluates night-time sleep (NS) and one that assesses

daytime sleepiness (DS). The scales displayed good acceptability, and substantial floor and

ceiling effects were absent. Both scales showed good internal consistency and reproducibility, indicating that both are reliable. Patients with PD had significantly higher scores than controls on both scales. Correlations with other scales that address similar constructs were high, supporting the construct validity of the SCOPA-SLEEP. The factor analysis revealed one factor for each SCOPA scale, indicating that each scale measures a single construct and thereby justifying the calculation of sumscores. The coefficient of variation of both SCOPA scales was higher than that of the PSQI and the ESS, indicating a better ability to detect differences between individuals. Responsiveness of the SCOPA scales remains to be evaluated.

Assessment of night-time sleep. Studies in other populations showed that the PSQI has adequate reliability and validity.10,27-30 For internal consistency, this was confirmed by the results of our study. The scale has previously been used in PD.31-33 Some comments regarding the PSQI are in order, however. First, the content validity of the PSQI may be questioned, especially with respect to its use in PD. The PSQI evaluates daytime dysfunction, but problems in this area can often be explained by symptoms of PD. The score on the 'daytime dysfunction' subscale is made up of two items: 'enthusiasm to get things done' (which could be affected by PD or by depression, present in approximately 25% of patients with PD34) and 'trouble staying awake' (which could be caused by PD or by antiparkinsonian medication). Second, the incorporation of 'trouble staying awake' and 'taking sleep medication' in the sumscore is questionable. These items address a clearly different construct from the other items, which evaluate aspects of the sleep pattern. This is partially confirmed by the factor analysis, in which daytime dysfunction (together with sleep disturbances) loads on one factor, whereas the sleep-pattern-related items (quality, duration, efficiency, and latency) load on the other. Third, calculating the PSQI sumscore is time-consuming, which makes it less suitable for clinical application. These arguments favor the use of the SCOPA-NS in patients with PD. Additionally, if the PSQI is used in patients with PD, a higher cut-off (8/9 in our study) may be more appropriate.

Assessment of daytime sleepiness. Studies in other populations have shown that the test-retest reliability and internal consistency of the ESS are adequate.15,35-37 The scale has been shown to discriminate successfully between healthy controls and patients with sleep disorders. The ESS has frequently been used in PD.8,9,31,38-45 Two comments regarding the use of the ESS are in order. First, patients are asked to rate their chance of dozing without necessarily having experienced dozing off in that particular situation. Three of the situations described in the ESS (sitting inactive in a public place, as a passenger in a car for an hour without a break, and in a car while stopped for a few minutes) may be experienced infrequently by more severely affected or older patients, which may further compromise the patient's appraisal of the situation. Second, two factors emerged in both the patient and the control group, suggesting that the scale does not measure a single construct. The SCOPA-DS may therefore be preferred in this population, since it is not subject to these objections.


The first disease-specific sleep scale in PD, the Parkinson's Disease Sleep Scale (PDSS),11 was published very recently and evaluates various aspects of nocturnal sleep problems. Unfortunately, this publication appeared after we had finished our data collection, and hence this scale was not included in our study. A direct comparison of these two disease-specific scales would have produced valuable information. The PDSS includes 15 items that evaluate overall sleep quality (1 item), insomnia (2 items), potential reasons for sleep disturbances (6 items), motor symptoms (4 items), sleep refreshment (1 item), and daytime dozing (1 item). Patients indicate on a 10 cm visual analogue scale (VAS) how well they slept or how often the described items applied to them, based on their experience during the past week. At face value, the scale appears to measure various constructs. A thorough clinimetric evaluation has not yet been published. Internal consistency and factor analysis were not reported, which makes it impossible to judge whether the calculation of sumscores is justified. The reproducibility of this scale seems adequate but was assessed in only 15 patients. The relation with other scales was assessed only by calculating the correlation between one item of this scale (unexpectedly falling asleep during the day) and a scale that addresses daytime sleepiness (ESS), not with scales that evaluate nocturnal sleep problems.

In conclusion, sleep problems occur frequently in PD and deserve appropriate attention.

However, patients with PD may experience problems in many domains that all may have to

be considered and, consequently, short and practical instruments have considerable

advantages. These instruments allow rapid detection of areas that need further attention. It is

nevertheless essential that these instruments have good clinimetric properties. The SCOPA-

SLEEP scale fulfils these criteria, and may therefore be included in the assessment schedule

of patients with this condition.

Acknowledgements Professor R.A.C. Roos is gratefully acknowledged for reviewing the manuscript. This study

was financed by the Netherlands Organization for Scientific Research (project number 0940-

33-021).



References

1. Tandberg E, Larsen JP, Karlsen K. A community-based study of sleep disorders in patients with

Parkinson's disease. Mov Disord 1998;13:895-899.

2. Tandberg E, Larsen JP, Karlsen K. Excessive daytime sleepiness and sleep benefit in Parkinson's disease:

a community-based study. Mov Disord 1999;14:922-927.

3. Factor SA, McAlarney T, Sanchez-Ramos JR, Weiner WJ. Sleep disorders and sleep effect in Parkinson's

disease. Mov Disord 1990;5:280-285.

4. Caap-Ahlgren M, Dehlin O. Insomnia and depressive symptoms in patients with Parkinson's disease.

Relationship to health-related quality of life. An interview study of patients living at home. Arch Gerontol

Geriatr 2001;32:23-33.

5. Pal PK, Calne S, Samii A, Fleming JAE. A review of normal sleep and its disturbances in Parkinson's

disease. Parkinsonism Relat Disord 1999;5:1-17.

6. Smith MC, Ellgring H, Oertel WH. Sleep disturbances in Parkinson's disease patients and spouses. J Am

Geriatr Soc 1997;45:194-199.

7. Frucht S, Rogers JD, Greene PE, Gordon MF, Fahn S. Falling asleep at the wheel: motor vehicle mishaps

in persons taking pramipexole and ropinirole. Neurology 1999;52:1908-1910.

8. Chaudhuri KR, Pal S, Brefel-Courbon C. 'Sleep attacks' or 'unintended sleep episodes' occur with

dopamine agonists: is this a class effect? Drug Saf 2002;25:473-483.

9. Hobson DE, Lang AE, Martin WR, Razmy A, Rivest J, Fleming J. Excessive daytime sleepiness and

sudden-onset sleep in Parkinson disease: a survey by the Canadian Movement Disorders Group. JAMA

2002;287:455-463.

10. Buysse DJ, Reynolds III CF, Monk TH, Berman SR, Kupfer DJ. The Pittsburgh Sleep Quality Index: a

new instrument for psychiatric practice and research. Psychiatry Res 1989;28:193-213.

11. Chaudhuri KR, Pal S, DiMarco A, Whately-Smith C, Bridgman K, Mathew R, et al. The Parkinson's

disease sleep scale: a new instrument for assessing sleep and nocturnal disability in Parkinson's disease. J

Neurol Neurosurg Psychiatry 2002;73:629-635.

12. Hoddes E, Zarcone V, Smythe H, Phillips R, Dement WC. Quantification of sleepiness: a new approach.

Psychophysiology 1973;10:431-436.

13. Gillberg M, Kecklund G, Akerstedt T. Relations between performance and subjective ratings of sleepiness

during a night awake. Sleep 1994;17:236-241.

14. Jenkins CD, Stanton BA, Niemcryk SJ, Rose RM. A scale for the estimation of sleep problems in clinical

research. J Clin Epidemiol 1988;41:313-321.

15. Johns MW. A new method for measuring daytime sleepiness: the Epworth sleepiness scale. Sleep

1991;14:540-545.

16. Ellis BW, Johns MW, Lancaster R, Raptopoulos P, Angelopoulos N, Priest RG. The St. Mary's Hospital

sleep questionnaire: a study of reliability. Sleep 1981;4:93-97.

17. Douglass AB, Bornstein R, Nino-Murcia G, Keenan S, Miles L, Zarcone Jr VP, et al. The Sleep Disorders

Questionnaire. I: Creation and multivariate structure of SDQ. Sleep 1994;17:160-167.

18. Parrott AC, Hindmarch I. The Leeds Sleep Evaluation Questionnaire in psychopharmacological

investigations - a review. Psychopharmacology (Berl) 1980;71:173-179.



19. J Neurol Neurosurg Psychiatry 1988;51:745-752.

20. Thommessen B, Aarsland D, Braekhus A, Oksengaard AR, Engedal K, Laake K. The psychosocial

burden on spouses of the elderly with stroke, dementia and Parkinson's disease. Int J Geriatr Psychiatry

2002;17:78-84.

21. Johns MW. Sensitivity and specificity of the multiple sleep latency test (MSLT), the maintenance of

wakefulness test and the Epworth Sleepiness Scale: failure of the MSLT as a gold standard. J Sleep Res

2000;9:5-11.

22. Hoehn MM, Yahr MD. Parkinsonism: onset, progression and mortality. Neurology 1967;17:427-442.

23. The World Health Organization Quality of Life Assessment (WHOQOL): development and general

psychometric properties. Soc Sci Med 1998;46:1569-1585.



25. Lees AJ, Blackburn NA, Campbell VL. The nighttime problems of Parkinson's disease. Clin

Neuropharmacol 1988;11:512-519.

26. Nausieda PA, Weiner WJ, Kaplan LR, Weber S, Klawans HL. Sleep disruption in the course of chronic

levodopa therapy: an early feature of the levodopa psychosis. Clin Neuropharmacol 1982;5:183-194.

27. Backhaus J, Junghanns K, Broocks A, Riemann D, Hohagen F. Test-retest reliability and validity of the

Pittsburgh Sleep Quality Index in primary insomnia. J Psychosom Res 2002;53:737.

28. Carpenter JS, Andrykowski MA. Psychometric evaluation of the Pittsburgh Sleep Quality Index. J

Psychosom Res 1998;45(1 Spec No):5-13.

29. Fichtenberg NL, Putnam SH, Mann NR, Zafonte RD, Millard AE. Insomnia screening in postacute

traumatic brain injury: utility and validity of the Pittsburgh Sleep Quality Index. Am J Phys Med Rehabil

2001;80:339-345.

30. Gentili A, Weiner DK, Kuchibhatla M, Edinger JD. Test-retest reliability of the Pittsburgh sleep quality

index in nursing home residents. J Am Geriatr Soc 1995;43:1317-1318. Letter.

31. Fabbrini G, Barbanti P, Aurilia C, Vanacore N, Pauletti C, Meco G. Excessive daytime sleepiness in de

novo and treated Parkinson's disease. Mov Disord 2002;17:1026-1030.

32. Iranzo A, Valldeoriola F, Santamaria J, Tolosa E, Rumia J. Sleep symptoms and polysomnographic

architecture in advanced Parkinson's disease after chronic bilateral subthalamic stimulation. J Neurol

Neurosurg Psychiatry 2002;72:661-664.

33. Shulman LM, Taback RL, Bean J, Weiner WJ. Comorbidity of the nonmotor symptoms of Parkinson's

disease. Mov Disord 2001;16:507-510.

34. Cummings JL. Depression and Parkinson's disease: a review. Am J Psychiatry 1992;149:443-454.

35. Bloch KE, Schoch OD, Zhang JN, Russi EW. German version of the Epworth Sleepiness Scale.

Respiration 1999;66:440-447.

36. Johns MW. Reliability and factor analysis of the Epworth Sleepiness Scale. Sleep 1992;15:376-381.

37. Johns MW. Sleepiness in different situations measured by the Epworth Sleepiness Scale. Sleep

1994;17:703-710.

38. Ondo WG, Dat VK, Khan H, Atassi F, Kwak C, Jankovic J. Daytime sleepiness and other sleep disorders

in Parkinson's disease. Neurology 2001;57:1392-1396.

39. Tan EK, Lum SY, Fook-Chong SM, Teoh ML, Yih Y, Tan L, et al. Evaluation of somnolence in

Parkinson's disease: Comparison with age- and sex-matched controls. Neurology 2002;58:465-468.

40. Arnulf I, Konofal E, Merino-Andreu M, Houeto JL, Mesnage V, Welter ML, et al. Parkinson's disease and

sleepiness: An integral part of PD. Neurology 2002;58:1019-1024.

41. Pal S, Bhattacharya KF, Agapito C, Chaudhuri KR. A study of excessive daytime sleepiness and its

clinical significance in three groups of Parkinson's disease patients taking pramipexole, cabergoline and

levodopa mono and combination therapy. J Neural Transm 2001;108:71-77.

42. Ondo WG, Vuong KD, Jankovic J. Exploring the relationship between Parkinson disease and restless legs

syndrome. Arch Neurol 2002;59:421-424.

43. Nieves AV, Lang AE. Treatment of excessive daytime sleepiness in patients with Parkinson's disease with

modafinil. Clin Neuropharmacol 2002;25:111-114.

44. O'Suilleabhain PE, Dewey Jr RB. Contributions of dopaminergic drugs and disease severity to daytime

sleepiness in Parkinson disease. Arch Neurol 2002;59:986-989.

45. Moller JC, Stiasny K, Hargutt V, Cassel W, Tietze H, Peter JH, et al. Evaluation of sleep and driving

performance in six patients with Parkinson's disease reporting sudden onset of sleep under dopaminergic

medication: a pilot study. Mov Disord 2002;17:474-481.

Chapter 10

APPENDIX

SCOPA-SLEEP scale

Aim of the questionnaire

By means of this questionnaire, we would like to find out to what extent in the past month you have had

problems with sleeping. Some of the questions are about problems with sleeping at night, such as, for example,

not being able to fall asleep or not managing to sleep on. Another set of questions is about problems with

sleeping during the day, such as dozing off (too) easily and having trouble staying awake.

First read these instructions before you answer the questions!

Place a cross in the box above the answer which best reflects your situation. If you wish to change an answer, fill

in the ‘wrong’ box and place a cross in the correct one. If you have been using sleeping tablets, then the answer

should reflect how you have slept while taking these tablets.

NS: Night-time sleep problems

response options: not at all – a little – quite a bit – a lot

In the past month, …

1. … have you had trouble falling asleep when you went to bed at night?

2. … to what extent do you feel that you have woken too often?

3. … to what extent do you feel that you have been lying awake for too long at night?

4. … to what extent do you feel that you have woken up too early in the morning?

5. … to what extent do you feel you have had too little sleep at night?

Overall, how well have you slept at night during the past month?

response options: very well – well – rather well – not well but not badly – rather badly – badly – very badly

DS: Daytime sleepiness

response options: never – sometimes – regularly – often

1. How often in the past month have you fallen asleep unexpectedly either during the day or in the evening?

2. How often in the past month have you fallen asleep while sitting peacefully?

3. How often in the past month have you fallen asleep while watching TV or reading?

4. How often in the past month have you fallen asleep while talking to someone?

5. In the past month, have you had trouble staying awake during the day or in the evening?

6. In the past month, have you experienced falling asleep during the day as a problem?

11 A short psychosocial questionnaire for patients with Parkinson’s disease: the SCOPA-PS

Johan Marinus¹, Martine Visser¹, Pablo Martínez-Martín², Jacobus J. van Hilten¹, Anne M. Stiggelbout³


Leiden, The Netherlands; ²Neuroepidemiology Unit, Department of Applied Epidemiology, National Center for Epidemiology I.S. Carlos III, Madrid, Spain

Published in the Journal of Clinical Epidemiology 2003;56:61-67

Chapter 11

Abstract

Purpose. To develop a short questionnaire for psychosocial functioning in patients with Parkinson’s disease (PD). Methods. The SCales for Outcomes in PArkinson's disease - PsychoSocial questionnaire (SCOPA-PS) was tested in a survey and compared with other instruments and with medical information. This survey was sent to 205 patients with idiopathic PD. Results. Eighty-six percent of the questionnaires were returned. Cronbach’s α was 0.83. Two-week test-retest reliability was 0.85 (intraclass correlation coefficient). Construct validity with other scales (Spearman's rho) was 0.82 for the Parkinson’s Disease Questionnaire - 39-item version (PDQ-39), 0.76 for the PDQ-8, 0.69 for the Hospital Anxiety and Depression Scale, -0.61 for the Euroqol, and -0.60 for a visual analogue scale evaluating Quality-of-Life. The summary index increased significantly with increasing disease severity. Conclusions. The SCOPA-PS is a new, short psychosocial questionnaire for patients with PD with good clinimetric properties.

Keywords: Parkinson disease, Quality of Life, Clinimetrics, Psychosocial, SCOPA

Psychosocial questionnaire for Parkinson’s disease

Introduction

Parkinson's disease (PD) is a chronic, progressive neurological disorder that affects almost two percent of the population over 65 years of age.1 Onset generally occurs between the ages of 50 and 65, and incidence increases with age. PD is characterized by tremor, rigidity, slowness of movement, and postural abnormalities. Non-motor features such as autonomic dysfunction, mood changes, and cognitive deterioration often complicate the course of the disease.

Chronic progressive illnesses like PD can have a considerable impact on the psychosocial

functioning of patients. Yet in PD, this impact is seldom evaluated separately. The

psychosocial consequences of PD are usually assessed with generic or disease-specific

health-related quality-of-life (HRQoL) instruments that include physical items in addition to

items that address psychosocial functioning. HRQoL is generally considered to consist of physical, mental, and social (including 'role') aspects of a disease.2 Item selection in HRQoL scales is usually based on interviews in which patients rate the importance of particular features of the disease; the most relevant items are then retained for the final version of the scale. Consequently, HRQoL instruments consist of a mixture of items that have in common only that patients consider them important and that they evaluate physical, mental, or social aspects. Additionally, these instruments often include items addressing impairment as well as items addressing disability, which further adds to their heterogeneity. It is often far from clear whether patients interpret certain impairments as 'physical' or 'psychosocial'. If, for instance, a patient indicates that he often has trouble with the shaking of his hands, this could mean that he has difficulty dressing (physical), that he feels ashamed (psychosocial), or both. Therefore, items that evaluate the extent to which patients are bothered by impairments provide no direct insight into the impact of these impairments on psychosocial functioning. However, the

distinction between physical and psychosocial is important. First, separate information on the

psychosocial consequences may provide a better understanding of the different consequences

of the disease to an individual. Second, neurologists may, because of their professional focus

on physical consequences of the disease, interpret particular problems somatically and hence

overlook psychosocial difficulties. Third, the distinction between physical and psychosocial

consequences may have implications for the choice of an intervention. After all, if the

difficulties are predominantly experienced at the physical level, this may call for

interventions that aim to improve motor functions, e.g., pharmacotherapy or physical therapy.

Although people with psychosocial difficulties will also benefit from optimally controlled

motor symptoms, additional support from other healthcare providers, such as psychologists or

social workers, may be indicated. To quantify the psychosocial impact of PD, an instrument

that only includes items that directly address perceived emotions and experienced difficulties

in social situations is needed.

To date, a short, clinimetrically sound scale that evaluates this psychosocial impact of PD is not available. We therefore conducted a study with the objective of developing a short psychosocial instrument for patients with PD, and of evaluating its reliability (internal consistency, test-retest reliability) and validity (construct validity, 'known-groups' comparisons).

The development of the SCOPA psychosocial questionnaire (SCOPA-PS) is part of a larger

research project, the SCales for Outcomes in PArkinson’s disease (SCOPA), in which short,

practical, and clinimetrically sound scales for all relevant domains in PD are selected or

developed.

Methods

Scale construction

Information on topics that were relevant for inclusion in the SCOPA-PS was derived from

two studies that evaluated the content validity of disease-specific HRQoL instruments in

PD.3,4 Next, articles that included information on the relevance of items to patients with PD

(frequency of occurrence, severity of problems, frequency-severity products, item-total

correlations) were studied.5-7 Items were included in the SCOPA-PS if they concerned social

or emotional consequences of PD, were comprehensive, and were important to patients. For a short scale to have good content validity, each item had to cover a broad area. This was achieved by formulating items less specifically, or by combining more than one element in a single item. The items were written to address the disability or handicap level. Together the

items aimed to cover the psychosocial domain in PD. The initial scale was piloted until no

further problems were encountered and patients understood all items well. This resulted in a

scale of 11 items, which evaluated the severity of a particular problem during the past month

on a scale from 0 (not at all) to 3 (very much). The scale was developed in Dutch, but an

English translation is also available (appendix). The scale can be used free of charge and is

available from the first author.

Patients

All patients who were registered at the outpatient neurology clinic of the Leiden University Medical Center on 1 January 2001, and who fulfilled the United Kingdom Parkinson's Disease Society Brain Bank criteria for idiopathic PD,8 were considered eligible. Patients with other disorders of the central nervous system and patients who were not capable of reading or understanding Dutch were excluded. Response was interpreted as consent to

participate.

Methods

A postal survey was sent to the 205 patients who fulfilled the aforementioned criteria. The survey included the SCOPA-PS, the Parkinson’s Disease Questionnaire (PDQ-39),6 the PDQ-8,9 the Euroqol (EQ-5D),10 the Hospital Anxiety and Depression Scale (HADS),11 and a visual analogue scale (VAS) assessing HRQoL with PD. The PDQ-39 is a disease-specific HRQoL instrument that includes 39 items with five ordinal response options, clustered in

eight subscales (mobility, activities of daily living, emotional well-being, stigma, social

support, cognitions, communication, and bodily discomfort). Summary indices can be

calculated both for the subscales and for the total scale. The scale addresses a conceptually

related (though not identical) construct. The PDQ-8 is the short version of the PDQ-39,

constructed by the developers by selecting the items with the highest item-total correlation

from each of the eight subscales. The EQ-5D is a short generic HRQoL instrument that

consists of five items (mobility, self-care, usual activities, pain/discomfort, and

anxiety/depression) with three ordinal response options. A summary index with a maximum

score of 1.00 (reflecting the best health state) can be derived from the five dimensions by

conversion with a table of scores.10,12 The scale was previously shown to perform well in a

population of patients with PD.13 The HADS was included because patients with PD show

elevated levels of anxiety and depression, and because depression accounts for approximately

50 percent of the variance in HRQoL in this patient group.14-16 The HADS has fourteen items,

with seven items addressing depression and seven anxiety. Scores on individual items can

either be summed to calculate a total score, or summed per subscale to produce separate

anxiety and depression scores. Each item has four graded response options, scored 0

(absence) to 3 (present to extreme).

Patients were requested to fill out the PDQ-39 first and to seal it into one of the two enclosed

return envelopes, before completing the other scales. This was done because some of the

items in the PDQ-39 resembled items in other instruments, and we wanted to ensure that

patients did not check their previous responses, in order to avoid overestimation of the

correlation between scales. Patients were asked to return the completed questionnaires within one week. After two weeks, we contacted the patients who had not returned their questionnaires and asked whether they were still considering participating. All patients who returned the questionnaires within five days were asked to complete the SCOPA-PS a second time two weeks later, to assess test-retest reliability.

Information from the questionnaires was combined with information on disease duration and

disease severity obtained by chart review. Disease severity was evaluated with the Hoehn and

Yahr (H&Y) staging system17 at each outpatient follow-up visit. H&Y 1 is the mildest stage with only

unilateral symptoms, whereas H&Y 5 represents the most severe stage, in which patients are

wheelchair-bound or bed-ridden.

The study protocol was approved by the institutional review board.


Data quality. Items were considered to perform adequately when they met the following criteria: missing values in fewer than 5% of the patients, an item-total correlation of 0.20 or higher,18 and absence of floor and ceiling effects of individual items (endorsement rates between 0.20 and 0.80).18
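The first two data-quality criteria can be checked mechanically. The following is a minimal Python sketch, not the authors' procedure. It assumes responses are stored as one list per patient with `None` for a missing answer, and it assumes a corrected item-total correlation (the item is removed from the total before correlating), which the text does not specify. The floor/ceiling criterion is left out because the text does not define how an endorsement rate is computed for a four-category item. The names `pearson` and `item_quality` are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_quality(rows, max_missing=0.05, min_item_total=0.20):
    """Check missing-value share and corrected item-total correlation per item.

    rows: one list of item scores per patient; None marks a missing answer.
    Correlations are computed on complete cases only (an assumption).
    """
    n_items = len(rows[0])
    complete = [r for r in rows if None not in r]
    report = []
    for j in range(n_items):
        missing = sum(r[j] is None for r in rows) / len(rows)
        item = [r[j] for r in complete]
        rest = [sum(r) - r[j] for r in complete]  # total score without item j
        r_it = pearson(item, rest)
        report.append({
            "item": j + 1,
            "missing": missing,
            "item_total": r_it,
            "adequate": missing < max_missing and r_it >= min_item_total,
        })
    return report
```

With real data, an item flagged as not adequate would then be inspected on content grounds, in line with the clinimetric approach described in the discussion.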

First a sum score was computed by adding the scores on the individual items, and then a

summary index (SI) was calculated by transformation of this sum score to a 0-100 scale.

Higher scores reflected greater psychosocial difficulties.
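As a concrete illustration of this transformation: with 11 items scored 0 to 3 the sum score ranges from 0 to 33, so SI = 100 × sum score / 33. The sketch below assumes that every item is answered; the text does not describe how missing items are handled, and `scopa_ps_summary_index` is a hypothetical name.

```python
def scopa_ps_summary_index(item_scores, max_item_score=3):
    """Linearly transform a SCOPA-PS sum score to a 0-100 summary index (SI).

    Assumes all items are answered and scored 0 (not at all) to 3 (very much);
    higher values reflect greater psychosocial difficulties.
    """
    if not item_scores:
        raise ValueError("no item scores given")
    if any(s < 0 or s > max_item_score for s in item_scores):
        raise ValueError("item score out of range")
    sum_score = sum(item_scores)
    max_sum = max_item_score * len(item_scores)  # 33 for the 11-item scale
    return 100.0 * sum_score / max_sum
```

For example, a patient who answers 1 ('a little') on all 11 items has a sum score of 11 and an SI of about 33.3.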

Reliability. Internal consistency was assessed using Cronbach’s α. The test-retest reliability

for individual items over a two-week interval was assessed with a weighted kappa (Kw;

quadratic weights), whereas for the total score an intraclass correlation coefficient (ICC) was

calculated.
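The three reliability statistics named above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the analysis actually run: Cronbach's α uses sample variances on complete cases; the weighted kappa uses quadratic disagreement weights for item scores coded 0 to 3; and because the text does not state which ICC form was used, the sketch assumes the two-way random-effects, absolute-agreement, single-measurement form (Shrout and Fleiss ICC(2,1)). All function names are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (complete cases only)."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def quadratic_weighted_kappa(first, second, n_categories=4):
    """Test-retest agreement for one item with scores coded 0..n_categories-1."""
    n = len(first)
    observed = [[0.0] * n_categories for _ in range(n_categories)]
    for a, b in zip(first, second):
        observed[a][b] += 1 / n  # joint proportions
    row_marg = [sum(row) for row in observed]
    col_marg = [sum(observed[i][j] for i in range(n_categories))
                for j in range(n_categories)]
    num = den = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            w = (i - j) ** 2 / (n_categories - 1) ** 2  # quadratic disagreement weight
            num += w * observed[i][j]
            den += w * row_marg[i] * col_marg[j]  # chance-expected proportions
    return 1 - num / den


def icc_agreement(test, retest):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement."""
    data = list(zip(test, retest))
    n, k = len(data), 2
    grand = mean(list(test) + list(retest))
    ss_rows = k * sum((mean(r) - grand) ** 2 for r in data)
    ss_cols = n * sum((m - grand) ** 2 for m in (mean(test), mean(retest)))
    ss_total = sum((x - grand) ** 2 for r in data for x in r)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)
```

The absolute-agreement form penalises a systematic shift between test and retest: if every retest score is exactly one point higher, this ICC drops below 1, whereas a consistency-type ICC would still equal 1.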

Validity. Construct validity with other scales was assessed using Spearman's correlation

coefficient (rs). This coefficient was also used to assess the correlation between the SCOPA-

PS SI and both disease duration and age. T-tests for independent samples were used to

compare the SCOPA-PS SI of male patients with those of female patients, and to compare the

mean age and disease duration of responders and non-responders. Differences in male-female ratio and disease severity between responders and non-responders were assessed using a Chi-square test and a Mann-Whitney U test, respectively. The significance threshold was

set at 0.05. 'Known-groups' validity was assessed by comparing the SCOPA-PS SI of patients

with different disease severities and different levels of anxiety and depression (assessed by

item 5 of the EQ-5D), using ordinal regression.
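Spearman's coefficient, used above for construct validity, is the Pearson correlation computed on ranks, with tied values receiving the mean of the ranks they occupy. The stdlib-only sketch below shows that definition; in practice a statistics package would be used (for example SciPy's `scipy.stats.spearmanr`). The helper names are illustrative, and the t-tests, Chi-square, Mann-Whitney and ordinal regression analyses are not reproduced here.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5
```

Working on ranks makes the coefficient suitable for ordinal scores such as SI values and HADS totals, because only the ordering of patients matters.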

Results

One-hundred-seventy-seven questionnaires (86%) were returned. The characteristics of the

patients are presented in table 1. Mean age and male-female ratio did not differ between

responders and non-responders. However, non-responders had significantly higher H&Y

stages (p = 0.008) and longer disease duration (12.7 versus 9.4 years; p = 0.017). The first 59

patients were approached for the test-retest analysis. Fifty-four of these patients (92%)

returned the SCOPA-PS the second time. The patient characteristics of the retest group were

similar to those of the total group.


Table 1. Patient characteristics

Number of patients: 177
Male / female: 99 (56%) / 78 (44%)
Mean age (SD), years: 65.2 (11.1)
Mean age at onset (SD), years: 55.7 (11.6)
Mean disease duration (SD), years: 9.4 (5.6)
Modified Hoehn and Yahr (H&Y) stage:
    mild (H&Y 1: 10, H&Y 2: 64): 74 (41.8%)
    moderate (H&Y 3): 70 (39.5%)
    severe (H&Y 4: 31, H&Y 5: 2): 33 (18.6%)

Data quality. None of the items had missing values in more than 5% of the subjects. There

were no floor or ceiling effects and all item-total correlations exceeded 0.20 (table 2).

Reliability. The internal consistency of the scale, calculated by Cronbach’s α, was 0.83. Item-total correlations ranged from 0.24 (problems with sexuality) to 0.67 (asking others for help too often) (table 2). The test-retest reliability (ICC) of the total scale was 0.85. For individual items, the test-retest reliability, measured as a weighted kappa, ranged from 0.46 (problems getting along with partner, family, or good friends) to 0.83 (problems with sexuality).

Table 2. Frequency distribution and item characteristics

Item no.   Frequency of responses (scores 0, 1, 2, 3)   Valid n¹   Item-total r²   Kw³

During the past month, have you ..

1 ..had difficulty with work, household, or other chores? 22 67 60 28 177 0.61 0.69

2 ..had difficulty with hobbies, sport, or leisure activities? 20 72 56 29 177 0.59 0.73

3 ..felt uncertain in your contact with others? 64 79 30 4 177 0.58 0.50

4 ..had problems getting along with your partner, family, or good friends? 122 42 11 2 177 0.34 0.46

5 ..had problems in the area of sexuality? 69 44 36 20 169 0.24 0.83

6 ..felt more house-bound than you would wish to be? 44 62 47 22 175 0.63 0.73

7 ..had the feeling you have had to ask others for help too often? 57 62 40 17 176 0.67 0.61

8 ..felt isolated and lonely? 85 64 21 6 176 0.61 0.71

9 ..had difficulty when having a conversation? 61 77 30 8 176 0.49 0.55

10 ..felt ashamed of your disease? 125 35 12 3 175 0.28 0.65

11 ..been concerned about the future? 30 76 51 20 177 0.49 0.68

1 number of valid responses 2 item-total correlations 3 weighted kappa (quadratic weights) for test-retest reliability of individual items

Validity. Correlations between the SCOPA-PS SI on the one hand, and the PDQ-39 SI and

PDQ-8 SI on the other hand, were 0.82 and 0.76, respectively. Correlations of the SCOPA-PS

with the HADS, the EQ-5D and the VAS were somewhat lower and remarkably similar to the

correlation of the PDQ-39 with these scales (table 3). The correlation of the SCOPA-PS SI

with the subscales of the PDQ-39 ranged from 0.42 (with the bodily discomfort subscale) to

0.68 (with the emotional well-being subscale). The correlation between both versions of the

PDQ was high, which might be expected since the PDQ-8 was derived from the PDQ-39.

Table 3. Correlation between SCOPA-PS and other scales (Spearman's rho)

               PDQ-39 SI¹  PDQ-8 SI¹  HADS²  HADS-A³  HADS-D⁴  EQ-5D SI⁵  VAS⁶
SCOPA-PS SI      0.82        0.76     0.69    0.61     0.62     -0.61     -0.60
PDQ-39 SI                    0.91     0.74    0.69     0.65     -0.69     -0.54
PDQ-8 SI                              0.74    0.69     0.64     -0.59     -0.46
HADS                                          0.92     0.88     -0.63     -0.58
HADS-A                                                 0.64     -0.55     -0.49
HADS-D                                                          -0.58     -0.56
EQ-5D SI                                                                   0.41

NB: all correlations significant (p < 0.001); 1 Parkinson’s Disease Questionnaire Summary Index, 39 and 8 item

version; 2 Hospital Anxiety and Depression Scale; 3 HADS anxiety subscale; 4 HADS depression subscale; 5

Euroqol summary index; 6 Visual analogue scale assessing quality of life.

The mean duration between the H&Y assessments and the time of the survey was 3.5 months.

Because patients in H&Y stages 1 and 5 were underrepresented (10 and 2 patients,

respectively), disease severity was categorised as mild (H&Y 1 and 2), moderate (H&Y 3), or

severe (H&Y 4 and 5). Ordinal regression yielded a well-fitting model (Chi-square 45.9, degrees of freedom 46; p = 0.48), showing a significant increase in psychosocial disability with increasing H&Y stage (p < 0.001). The mean SCOPA-PS SI of mildly affected patients was 25.7 (SD 12.7), whereas for moderately and severely affected patients it was 37.2 (SD 18.7) and 45.1 (SD 18.5), respectively. Ordinal regression of the SCOPA-PS SI on the scores of the anxiety/depression item of the EQ-5D also yielded a well-fitting model (Chi-square 37.5, degrees of freedom 48; p = 0.86), indicating a significant increase in psychosocial disability with higher anxiety/depression scores (p < 0.001). The mean SCOPA-PS SI of patients scoring 1 (no problem), 2 (some difficulty), or 3 (extreme difficulty) on this item of the EQ-5D was 26.5 (SD 14.6), 41.1 (SD 17.0), and 63.0 (SD 9.2), respectively.
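The descriptive part of this 'known-groups' comparison (grouping H&Y stages into mild, moderate, and severe and summarising the SI per group) can be sketched as below. The ordinal regression itself is not reproduced. The sketch assumes whole-number H&Y stages 1 to 5, as listed in table 1, and input as (stage, SI) pairs; `si_by_severity` is a hypothetical helper.

```python
from statistics import mean, stdev

# Grouping used in the text: mild = H&Y 1-2, moderate = H&Y 3, severe = H&Y 4-5
SEVERITY_BY_HY = {1: "mild", 2: "mild", 3: "moderate", 4: "severe", 5: "severe"}


def si_by_severity(patients):
    """Summarise SCOPA-PS SI per severity group.

    patients: iterable of (hy_stage, si) pairs with whole-number stages 1-5.
    Returns {group: (mean SI, SD of SI, n)}; SD is NaN for a single patient.
    """
    groups = {}
    for stage, si in patients:
        groups.setdefault(SEVERITY_BY_HY[stage], []).append(si)
    return {
        group: (mean(values), stdev(values) if len(values) > 1 else float("nan"), len(values))
        for group, values in groups.items()
    }
```

Collapsing the sparsely populated stages 1 and 5 into their neighbours, as the authors did, keeps each group large enough for a stable mean and SD.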

The SCOPA-PS SI correlated significantly with disease duration (rs = 0.20; p < 0.01), but not

with age (rs = 0.02; p = 0.78). No significant difference in SCOPA-PS SI was observed

between male and female patients.

Discussion

We have developed a short questionnaire that evaluates psychosocial functioning in patients

with PD. The response rate in this study was high and the quality of the data was good.

Patients who participated in this study attended an outpatient movement disorders clinic in the western part of The Netherlands. This clinic serves a regional function and treats patients with a wide range of disease severities in PD.

The internal consistency and retest reliability of the SCOPA-PS SI were good, and

convergent validity with other scales agreed with our expectations. ‘Known-groups’

comparisons indicated that patients with longer disease duration, higher disease severity, and

elevated levels of anxiety and depression had significantly higher SCOPA-PS scores. Age

and sex were not related to worse outcomes. Although the non-response rate in our study was low, SCOPA-PS scores in the full clinic population may be somewhat higher than those we observed, since non-responders had higher disease severity and longer disease duration, and both of these variables were associated with higher SCOPA-PS scores.

The use of this scale at the individual level is debatable. The minimum threshold

for reliability in group comparisons is often set at 0.70, whereas for use on the individual

level, generally more stringent demands are made. Nunnally and Bernstein19 state that for

important decisions on an individual level, a reliability of 0.90 is the minimum. Helmstadter20

quotes variable reliability values for various types of psychological tests intended for

individuals, the median for personality tests being 0.85, that for ability tests being 0.90, and

for attitude tests, 0.79. Hays et al.21 state that the 0.90 criterion for individual assessment may be too stringent and that many highly regarded instruments fail to meet this standard. Since the SCOPA-PS is not intended for diagnostic purposes (which could involve important decisions for individuals based on the patient’s score relative to a cut-off level), the demands for internal consistency may be somewhat less stringent. We believe that the scale can adequately be used for monitoring psychosocial functioning over time, and that it may serve as an indicator of potential problem areas.

All items of the scale had item-total correlations that exceeded the predefined criterion.

However, some items had only moderate item-total correlations; these are discussed in more detail below. They concerned item 4 ('getting along with partner, family, or good friends'), item 5 ('problems with sexuality'), and item 10 ('felt ashamed'), with values of 0.32, 0.24, and 0.28, respectively. We did not consider removing any of these items because, in our view,

test characteristics of items should not dominate the item selection process at the expense of

content validity. In this so-called clinimetric approach, it is important that all relevant

attributes of the construct are measured, and hence, content validity and reproducibility are

the most important parameters.22

The moderate item-total correlation for item 4 ('getting along with partner, family, or good

friends') is most likely explained by the low endorsement of this item. To examine whether

this may have been the result of the way this item was framed, we evaluated five different

versions of this item in 30 patients after the postal survey had been completed. However, no

differences between these versions were found, and we therefore decided to retain the

original description. Our findings regarding this item are supported by observations from

various studies on the PDQ-39, in which the 'social support' subscale has consistently been found to produce the lowest mean score, both in clinic and non-clinic samples and across populations from different countries.6,9,14,23,24 This subscale also showed the lowest correlations with measures of disease severity,6,9,14,25 suggesting that progression of the disease does not greatly affect the quality of close relationships.

The pattern of correlations and endorsement frequencies of item 10 ('felt ashamed') is very

similar to that of item 4 ('problems getting along with partner, family, or good friends').

Seventy-one percent of the patients indicated no problems with respect to this item. In several

studies this item showed low item means,6,14,23,24,26 and its correlations with measures of disease severity were also moderate (range 0.29-0.33).6,14,23,24,26

The moderate item-total correlation of the item on sexuality may be explained by physical

features of the disease, especially with respect to male patients. Problems with erectile

function may affect up to 60% of males,27 while on the other hand antiparkinsonian

medication may increase libido. Female patients more often report vaginal tightness and

involuntary urination, which may affect the quality of the sexual relationship.28 Evidently,

these physical problems do not necessarily correlate highly with other psychosocial

consequences of the disease. Although the origin of the problems may be physical, the

consequences for the psychosocial level are considerable and, consequently, an item

addressing this topic should be present in a psychosocial instrument. A large proportion of

our patients (59%) indicated problems in this area. Men had significantly higher scores than

women (item means 1.40 versus 0.56, respectively; p < 0.001), but the severity of the sexual

problems did not correlate significantly with disease severity or age.

Several disease-specific HRQoL instruments have become available in PD in the past few

years.5-7,9,29,30 These scales include items on physical, mental, and social aspects of the

disease. The instruments usually inquire how often patients had difficulty with certain aspects

of the disease, and often include both impairments and disabilities. As far as the impairments

are concerned, it is not always clear whether these difficulties are of a physical or a

psychosocial nature. We therefore propose to disentangle these levels by using separate

instruments to assess physical and psychosocial functions. These separate measures may

produce more meaningful and conceptually clear indices. Instruments that assess physical

disabilities in PD already exist,31-33 but to date a short scale that evaluates the psychosocial

consequences of PD was not available.

One could argue that the separate subscales of the two most frequently used disease-specific

HRQoL scales in PD, the PDQ-396 and the Parkinson's Disease Quality-of-Life questionnaire

(PDQL),5 could be used instead, since the PDQ-39 has subscales on emotional well-being, stigma, social support, and communication, while the PDQL has both a social and an emotional subscale. Apart from the fact that it is unusual to use only some subscales of an instrument, and that the relevant subscales of both instruments would still consist of 16 items each, other

arguments favour the use of the SCOPA-PS. As far as the PDQ-39 is concerned, both internal

consistency and test-retest reliability of the social support subscale have been found to be low

(Cronbach’s α and Pearson’s r < 0.70) in some studies,6,34 whereas role functioning is only

marginally addressed and sexuality is not addressed at all in this scale.3 The PDQL, on the other hand, provides no information on two important psychosocial items, viz. close relationships and role functioning.

If the basic assumption is to select a short disease-specific instrument, the SCOPA-PS may be

preferred over the other currently available short HRQoL instruments, the Parkinson’s Impact

Scale (PIMS, 10 items)29 and the PDQ-8,9 because of the potential problems with respect to

content validity of these scales. The items in the PIMS were obtained by consensus between

specialised nurses and did not involve patients in the item generation phase. The PDQ-8 was

obtained through statistical procedures in which the item with the highest item-total

correlation in each of the eight subscales of the PDQ-39 was selected for the short version.

This approach may lead to a scale that does not reflect the content of the original scale

adequately. In the PDQ-8, engagement in work, household, and leisure activities is not assessed, and items on sexuality and role functioning are also lacking. Furthermore, the

internal consistency of the PDQ-8 was somewhat low in our study (Cronbach's α 0.74).

Taken together, the SCOPA-PS fills the need for a short, clinimetrically sound scale. It

addresses the difficulties patients experience in social and emotional spheres. The scale can

adequately be used in research settings, whereas its use in clinical practice is open to debate. The internal consistency is, however, similar to that of comparable subscales of other disease-specific HRQoL instruments in PD. The distinction between physical difficulties on the one hand and emotional and social difficulties on the other provides clear insight into the difficulties patients perceive in these two distinct domains.

Acknowledgements


0940-33-021). The authors thank professor R.A.C. Roos for his helpful comments.

References

1. De Rijk MC, Launer LJ, Berger K, Breteler MM, Dartigues JF, Baldereschi M, et al. Prevalence of



2. Bowling A. Health-related quality of life: a discussion of the concept, its use and measurement.

Measuring disease. Buckingham: Open University Press, 1995: 1-19.

3. Marinus J, Ramaker C, Van Hilten JJ, Stiggelbout AM. Health related quality of life in Parkinson's


72(2):241-248.

4. Damiano AM, Snyder C, Strausser B, Willian MK. A review of health-related quality-of-life concepts

and measures for Parkinson's disease. Qual Life Res 1999; 8(3):235-243.

5. De Boer AG, Wijker W, Speelman JD, De Haes JC. Quality of life in patients with Parkinson's disease:

development of a questionnaire. J Neurol Neurosurg Psychiatry 1996; 61(1):70-74.



7. Van den Berg M. Leben mit Parkinson - Entwicklung und psychometrische Testung des Fragebogens

PLQ. Neurol Rehabil 1998; 4(5):221-226.



9. Jenkinson C, Fitzpatrick R, Peto V, Greenhall R, Hyman N. The PDQ-8: development and validation of a

short-form Parkinson's disease questionnaire. Psychol Health 1997; 12:805-814.

10. Brooks R. EuroQol: the current state of play. Health Policy 1996; 37(1):53-72.


67(6):361-370.



13. Schrag A, Selai C, Jahanshahi M, Quinn NP. The EQ-5D-a generic quality of life measure-is a useful

instrument to measure quality of life in patients with Parkinson's disease. J Neurol Neurosurg Psychiatry

2000; 69(1):67-73.



S38.

15. Schrag A, Jahanshahi M, Quinn N. What contributes to quality of life in patients with Parkinson's

disease? J Neurol Neurosurg Psychiatry 2000; 69(3):308-312.

16. Schrag A, Jahanshahi M, Quinn NP. What contributes to depression in Parkinson's disease? Psychol Med

2001; 31(1):65-73.




19. Nunnally JC, Bernstein IH. Psychometric Theory. 3 ed. New York: McGraw-Hill, Inc., 1994.

20. Helmstadter GC. Evaluation of tests - reliability. Principles of psychological measurement. London:

Methuen & Co Ltd, 1966: 58-86.

21. Hays RD, Anderson RT, Revicki D. Assessing reliability and validity of measurement in clinical trials. In:

Staquet MJ, Hays RD, Fayers PM, editors. Quality of Life Assessment in Clinical Trials. Oxford: Oxford

University Press, 1998: 169-182.

22. Wright JG, Feinstein AR. A comparative contrast of clinimetric and psychometric methods for

constructing indexes and rating scales. J Clin Epidemiol 1992; 45(11):1201-1218.

23. Berger K, Broll S, Winkelmann J, Heberlein I, Muller T, Ries V. Untersuchung zur Reliabilität der

deutschen Version des PDQ-39: Ein krankheitsspezifischer Fragenbogen zur Erfassung der

Lebensqualität von Parkinson-Patienten. Aktuel Neurol 1999; 26(4):180-184.



24. Fitzpatrick R, Peto V, Jenkinson C, Greenhall R, Hyman N. Health-related quality of life in Parkinson's

disease: a study of outpatient clinic attenders. Mov Disord 1997; 12(6):916-922.

25. Jenkinson C, Peto V, Fitzpatrick R, Greenhall R, Hyman N. Self-reported functioning and well-being in

patients with Parkinson's disease: comparison of the short-form health survey (SF-36) and the

Parkinson's Disease Questionnaire (PDQ-39). Age Ageing 1995; 24(6):505-509.

26. Jenkinson C, Fitzpatrick R, Peto V, Greenhall R, Hyman N. The Parkinson's Disease Questionnaire

(PDQ-39): development and validation of a Parkinson's disease summary index score. Age Ageing 1997;

26(5):353-357.

27. Brown RG, Jahanshahi M, Quinn N, Marsden CD. Sexual function in patients with Parkinson's disease

and their partners. J Neurol Neurosurg Psychiatry 1990; 53(6):480-486.

28. Welsh M, Hung L, Waters CH. Sexuality in women with Parkinson's disease. Mov Disord 1997;

12(6):923-927.

29. Calne S, Schulzer M, Mak E, Guyette C, Rohs G, Hatchard S, et al. Validating a quality of life rating

scale for idiopathic Parkinsonism: Parkinson's impact scale. Parkinsonism Relat Disord 1996; 2(2):55-61.

30. Welsh M, McDermott M, Holloway R, Plumb S, Pfeiffer R, Hubble J. Development and Testing of the

Parkinson's Disease Quality of Life Scale: The PDQUALIF. Mov Disord 1997; 12(5):836-836. Abstract.





al. Intermediate Scale for Assessment of Parkinson's Disease. Characteristic and Structure. Parkinsonism

Relat Disord 1995; 1(2):97-102.




34. Bushnell DM, Martin ML. Quality of life and parkinson's disease: Translation and validation of the US

Parkinson's Disease Questionnaire (PDQ-39). Qual Life Res 1999; 8(4):345-350.


Appendix: SCOPA-PS

Questionnaire on the psychosocial consequences of Parkinson’s Disease

In this questionnaire, we inquire about problems which you may encounter as a result of your illness in the areas

of (social) activities, contact with other people, and on an emotional level. When answering the following

questions, please think carefully about your personal situation during the past month, and consider to what

extent the situation described actually posed a problem for you. Tick the box above the answer which best

reflects your situation.

1. During the past month, have you had difficulty with work, household or other chores?

2. During the past month, have you had difficulty with hobbies, sport or leisure activities?

3. During the past month, have you felt uncertain in your contact with others?

4. During the past month, have you had problems getting along with your partner, family

or good friends?

5. During the past month, have you had problems in the area of sexuality?

6. During the past month, have you felt more house-bound than you would wish to be?

7. To what extent have you had the feeling that you have had to ask others for help too often during the past

month?

not at all a little quite a bit very much









8. To what extent have you felt isolated and lonely during the past month?

9. During the past month, have you had difficulty when having a conversation?

10. To what extent have you felt ashamed of your disease during the past month?

11. During the past month, have you been concerned about the future?

© This questionnaire is made available free of charge, with the permission of the authors, to all those

undertaking non-profit and profit making research. The authors may be requested to share data for psychometric

purposes. Use of this questionnaire in studies should be communicated to the developers. No changes may be

made to the questionnaire without written permission. For further information, please contact

[email protected]






12. Summary and conclusions

This thesis reflects the first phase of the project 'The assessment of the disablement process in Parkinson’s Disease', which aims to provide data on disease progression in patients with Parkinson's disease (PD). This first phase (called the SCOPA project, short for SCales for Outcomes in PArkinson's disease) is concerned with selecting and developing the appropriate instruments; the second phase involves the use of these instruments in the longitudinal follow-up of patients, stratified by disease duration and age at onset of the disease.

Chapter 1 is the introduction to this thesis. It addresses the problem that good instruments allowing clinicians and researchers to quantify the extent of a particular problem are lacking in many relevant domains of PD. We use the framework of the Disablement Process1 to test existing instruments for use in PD, or to develop new instruments that are valid, reliable, and clinimetrically sound. This framework proposes a sociomedical model of disability. It describes a pathway that links pathology with impairments, functional limitations, and disability, and it acknowledges bi-directional relations between these entities. Extra- and intra-individual factors, and comorbidity as a special case of an intra-individual factor, may act upon this pathway and modify the course of the disease and the extent to which an individual is affected. The following domains at the impairment level were considered sufficiently relevant to justify separate evaluation: cognition, mood, psychiatric complications, motor function, motor complications, and autonomic dysfunction (discussed in chapters 4-8). At the disability level, modules for psychosocial disability, activities of daily living (ADL), and sleep were developed (chapters 8-11). Other modules have been tested or developed but are not presented in this thesis; these concern comorbidity and costs.

Chapter 2 presents an overview of the clinimetric characteristics of existing rating scales that

evaluate impairments and disabilities in PD. The study involves a systematic review of 30

studies that report on the clinimetric characteristics of 11 rating scales. Outcome measures

included validity, reliability, and responsiveness. We found three impairment scales (Webster,

Columbia University Rating Scale [CURS], Parkinson's Disease Impairment Scale), four

disability scales (Schwab and England, Northwestern University Disability Scale [NUDS],

Intermediate Scale for Assessment of Parkinson's Disease, Extensive Disability Scale), and

another four scales that evaluated both impairments and disabilities (New York University

Parkinson's Disease Rating Scale, University of California Los Angeles Scale, Unified

Parkinson's Disease Rating Scale [UPDRS], Short Parkinson Evaluation Scale). The scales

showed large differences in their representation of items considered responsive to


dopaminergic treatment or of symptoms that appear late in the course of the disease and lack responsiveness to treatment. Irrespective of the scale, there was a lack of consistency in the inter-rater reliability of items addressing bradykinesia, tremor, and rigidity. Overall, disability items displayed moderate to good inter-rater reliability. Most instruments showed clinimetric shortcomings or had not been subjected to extensive clinimetric testing, despite their frequent use. The CURS, NUDS, and UPDRS have been evaluated most often, and the available evidence indicates that they have moderate to good reliability and validity.

In chapter 3 we compared and contrasted disease-specific quality-of-life instruments in PD and assessed their clinimetric properties. Two reviewers independently evaluated both the thoroughness and the results of 20 studies that reported clinimetric properties of four scales. The content validity of three scales was adequate to good: the Parkinson’s Disease Questionnaire-39 item version (PDQ-39), the Parkinson’s Disease Quality of Life questionnaire (PDQL), and the ‘Fragebogen Parkinson LebensQualität’ (Parkinson Quality of Life questionnaire; PLQ). For the Parkinson’s Impact Scale (PIMS), content validity was insufficient. Construct validity of both the PDQ-39 and the PDQL was good, but it had been insufficiently evaluated for the PLQ and the PIMS. Internal consistency of all scale totals and of the PDQL subscale totals was good, whereas it was inadequate for the social support subscale of the PDQ-39 and for four subscales of the PLQ. Test-retest reliability was not evaluated for the PDQL and was adequate for the other scales. Responsiveness was partially established for the PDQ-39, and not assessed for the other scales. The number of available translations, as well as the number of studies in which these instruments were used, differed considerably. We concluded that the choice of instrument depends on the goal of the study. In most situations, however, the PDQ-39 is probably the most appropriate HRQoL instrument currently available. The PDQL may be considered an alternative, and the PLQ may be considered in studies involving German-speaking patients with PD. The PIMS should be used only as a means of identifying potential problem areas.

Chapter 4 concerns the assessment of cognition in PD. We argue that cognitive deficits in PD may be underestimated if classical instruments for cognition are used. These instruments generally emphasize cortical functions, some of which are relatively spared in PD, whereas they lack items on other cognitive functions that are frequently affected in PD. The objective of this study was to develop a short and practical instrument that is sensitive to the specific cognitive deficits in PD and is valid and reliable. It was not our objective to construct a screening tool or a diagnostic instrument. Instead, the instrument is intended for comparing groups in research situations and for assessing change in individual functioning over time. First, we searched the literature to identify the cognitive domains most frequently affected in PD, and we selected or developed candidate items from these domains for the initial scale. This scale was then tested in 85 patients with PD and 75 age-, education-, and sex-matched controls. Items that met predefined criteria for data quality, reproducibility, and discriminative properties were included in the final scale. This scale, the SCOPA-COG, consists of 10 items with a maximum score of 43, with higher scores reflecting better performance. The test-retest reliability of the sumscore was 0.78 (intraclass correlation coefficient), and it ranged from 0.40 to 0.75 for individual items (weighted kappa). Cronbach's α was 0.83. Construct validity of the scale was supported by the expected correlations with the CAMCOG and the MMSE. It was further supported by differences between groups of participants classified by dementia status, and between patients grouped by disease severity. The scale showed a clear trend towards lower cognition scores in patients with more advanced PD. This trend was more pronounced for the SCOPA-COG than for the CAMCOG and the MMSE, so use of this scale could provide better insight into the longitudinal development of cognitive deficits in PD. The coefficient of variation of the SCOPA-COG was higher than that of the CAMCOG or the MMSE, indicating a better ability to detect differences between individuals. The SCOPA-COG is a short, reliable, and valid instrument that is sensitive to the specific cognitive deficits in PD and suitable for comparing groups in research situations.
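The item-level test-retest figures above are weighted kappas, which give partial credit when two ordinal ratings differ by only one or two categories. The summary does not state which weighting scheme was used. The Python sketch below therefore assumes Cohen's weighted kappa with either linear or quadratic disagreement weights. The function name, its parameters, and the example ratings are illustrative and are not taken from the study.

```python
def weighted_kappa(ratings_1, ratings_2, n_categories, weights="quadratic"):
    """Cohen's weighted kappa for two sets of ordinal ratings coded 0 .. n_categories - 1.

    kappa_w = 1 - sum(w * observed) / sum(w * expected), where w(i, j) is a
    disagreement weight: |i - j| / (m - 1) (linear) or its square (quadratic).
    """
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    if len(ratings_1) != len(ratings_2) or not ratings_1:
        raise ValueError("need two equally long, non-empty rating lists")
    m, n = n_categories, len(ratings_1)
    observed = [[0.0] * m for _ in range(m)]
    for a, b in zip(ratings_1, ratings_2):
        observed[a][b] += 1 / n                                      # joint proportions
    row = [sum(observed[i]) for i in range(m)]                       # marginals, occasion 1
    col = [sum(observed[i][j] for i in range(m)) for j in range(m)]  # marginals, occasion 2

    def w(i, j):
        d = abs(i - j) / (m - 1)
        return d if weights == "linear" else d * d

    obs_dis = sum(w(i, j) * observed[i][j] for i in range(m) for j in range(m))
    exp_dis = sum(w(i, j) * row[i] * col[j] for i in range(m) for j in range(m))
    if exp_dis == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1 - obs_dis / exp_dis


# Hypothetical scores of eight patients on one 0-3 item, tested twice.
first = [0, 1, 1, 2, 2, 3, 3, 1]
second = [0, 1, 2, 2, 1, 3, 2, 1]
print(round(weighted_kappa(first, second, 4), 2))
```

Quadratic weights penalise large disagreements more heavily than linear weights. The two schemes can therefore give noticeably different values for the same data, which is why the weighting should be reported.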

The purpose of the study presented in chapter 5 was to evaluate the psychometric properties of the Hospital Anxiety and Depression Scale (HADS) in patients with PD, and to assess the prevalence of symptoms of anxiety and depression in this population. The HADS was sent to 205 patients with PD, together with three quality-of-life (QoL) instruments, i.e., the Parkinson’s Disease Questionnaire (PDQ-39), the EQ-5D, and a visual analogue scale (VAS). HADS scores were also compared with Hoehn and Yahr (H&Y) scores. Eighty-six percent of the patients returned the questionnaires. Cronbach’s α for the HADS was 0.88. Test-retest reliability over two weeks was 0.84 for the HADS sum score (intraclass correlation coefficient), and it ranged from 0.42 to 0.76 for individual items (weighted kappa). Factor analysis revealed two factors, accounting for 51.9% of the variance: one represented anxiety, the other depression. Correlations of the HADS with the PDQ-39, EQ-5D, VAS, and H&Y were 0.72, -0.59, -0.59, and 0.32, respectively (all p-values < 0.001). Depression scores accounted for 52% of the variance in QoL, whereas disease severity explained 24%. Using the cut-off values proposed by the developers, possible and probable anxiety were present in 28.9% and 19.8% of the patients, respectively, and possible and probable depression in 21.6% and 16.5%. We conclude that the psychometric performance of the HADS in patients with PD is satisfactory. In addition, almost 50% of the patients display symptoms of anxiety, and nearly 40% show signs of depression.
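The prevalence figures above come from applying fixed cut-offs to each 0-21 HADS subscale score. The Python sketch below assumes the cut-offs commonly attributed to the scale's developers, Zigmond and Snaith: 8-10 for a possible case and 11 or more for a probable case. Adding the two bands gives the rounded totals in the text: 28.9% + 19.8% = 48.7% ('almost 50%') for anxiety, and 21.6% + 16.5% = 38.1% ('nearly 40%') for depression. The function names and the example scores are hypothetical.

```python
from collections import Counter


def hads_category(subscale_score):
    """Classify one HADS subscale score (anxiety or depression, range 0-21).

    Assumed cut-offs: 0-7 no case, 8-10 possible case, 11-21 probable case.
    """
    if not 0 <= subscale_score <= 21:
        raise ValueError("HADS subscale scores range from 0 to 21")
    if subscale_score >= 11:
        return "probable"
    if subscale_score >= 8:
        return "possible"
    return "none"


def prevalence(subscale_scores):
    """Percentage of respondents in the 'possible' and 'probable' bands."""
    counts = Counter(hads_category(s) for s in subscale_scores)
    n = len(subscale_scores)
    return {band: 100 * counts[band] / n for band in ("possible", "probable")}


# Hypothetical anxiety subscale scores of ten respondents.
anxiety = [3, 5, 8, 9, 12, 2, 14, 7, 10, 0]
print(prevalence(anxiety))  # {'possible': 30.0, 'probable': 20.0}
```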

In chapter 6 we assessed the sensitivity of individual depressive symptoms and their relative contribution to the diagnosis of depressive disorder in patients with PD. This study was conducted because the considerable overlap between the somatic symptoms of PD and those of depression makes it particularly difficult to diagnose depression in patients with PD. The significance of the somatic symptoms of depression in these patients is therefore unclear. To evaluate this, the Structured Clinical Interview for DSM-IV Depression and the Hamilton and Montgomery-Åsberg Depression Rating Scales (Ham-D, MADRS) were administered to 149 consecutive nondemented patients. The contribution of the individual items of these scales to the diagnosis of “depressive disorder” was calculated by discriminant analysis. The discriminant models based on the Ham-D and MADRS scores were both highly significant. Nonsomatic core symptoms of depression (anhedonia and mood) had the highest correlation coefficients. Somatic items mostly had low correlation coefficients, with the exception of reduced appetite and early morning awakening. We concluded that nonsomatic symptoms of depression, along with reduced appetite and early morning awakening, appear to be the most important for distinguishing between depressed and nondepressed patients with PD.
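In a two-group discriminant analysis, each item's contribution is usually judged from its correlation with the discriminant function, that is, the structure coefficients referred to above as correlation coefficients. The summary does not say how these were computed. The Python sketch below assumes Fisher's linear discriminant function and pooled within-group correlations, the form SPSS reports as its structure matrix. It relies on an identity: with weights w = S_w⁻¹d, the within-group covariance between item j and the discriminant score equals the mean difference d_j, and the score's within-group variance equals w·d. All function names and the example data are hypothetical.

```python
from statistics import mean


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("singular within-group covariance matrix")
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def structure_correlations(group_0, group_1):
    """Pooled within-group correlation of each item with Fisher's discriminant score.

    group_0, group_1: lists of patients, each a list of p item scores.
    """
    p = len(group_0[0])
    means = [[mean(col) for col in zip(*g)] for g in (group_0, group_1)]
    d = [means[1][j] - means[0][j] for j in range(p)]       # mean differences
    dof = len(group_0) + len(group_1) - 2
    s_w = [[0.0] * p for _ in range(p)]                      # pooled within-group covariance
    for g, mu in zip((group_0, group_1), means):
        for row in g:
            dev = [row[j] - mu[j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    s_w[i][j] += dev[i] * dev[j] / dof
    w = solve_linear(s_w, d)                                 # discriminant weights
    var_z = sum(wi * di for wi, di in zip(w, d))             # w' S_w w equals w' d
    return [d[j] / (s_w[j][j] * var_z) ** 0.5 for j in range(p)]


# Hypothetical scores on two items: item 0 separates the groups, item 1 does not.
not_depressed = [[0, 1], [1, 0], [0, 2], [1, 1]]
depressed = [[3, 1], [2, 2], [3, 0], [2, 1]]
print([round(r, 2) for r in structure_correlations(not_depressed, depressed)])  # [0.71, 0.0]
```

In this toy example the non-discriminating item gets a structure coefficient of zero. This mirrors the pattern reported for most somatic items, although real data will rarely be this clean.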

The objective of the study presented in chapter 7 was to develop and test a questionnaire for autonomic symptoms in PD, the SCOPA-AUT. First, items were generated by an extensive literature search and by consulting experts. A postal survey was then conducted in 46 patients with PD, 21 patients with multiple system atrophy (MSA), and 8 movement disorders specialists. Based on its results, items were reduced according to the frequency, burden, and clinical relevance of symptoms. The validity of the questionnaire was evaluated in a second postal survey in 140 PD patients and 100 controls. Medication was assessed in an additional question. Test-retest reliability was established in a subsample of 55 PD patients, who received the questionnaire twice. The initial 45 items were reduced to 25 items addressing the following domains: gastrointestinal (7 items), urinary (6 items), cardiovascular (3 items), thermoregulatory (4 items), pupillomotor (1 item), and sexual dysfunction (2 items for men and 2 for women). Test-retest reliability was good for all items and domain scores, and all domains demonstrated high internal consistency. Each domain had good content validity. All domains and most items differentiated between the PD and control groups. Autonomic problems increased significantly in patients in the more advanced disease stages in all autonomic domains except sexual dysfunction. We conclude that the SCOPA-AUT is a reliable and valid questionnaire that accurately evaluates autonomic disturbances in patients with PD.
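The domain counts above add up to the 25 items of the questionnaire. Because the sexual-dysfunction items are sex-specific, however, each respondent presumably answers only 23 of them, which matches the item count given for the SCOPA-AUT in Table 1 of the concluding remarks. The short Python check below makes this arithmetic explicit. The dictionary keys and the helper `items_answered` are illustrative names and are not part of the instrument.

```python
# Item counts per SCOPA-AUT domain, as listed in the chapter 7 summary.
DOMAIN_ITEMS = {
    "gastrointestinal": 7,
    "urinary": 6,
    "cardiovascular": 3,
    "thermoregulatory": 4,
    "pupillomotor": 1,
    "sexual dysfunction (men)": 2,
    "sexual dysfunction (women)": 2,
}


def items_answered(sex):
    """Number of items one respondent answers: the other sex's block is skipped."""
    if sex not in ("men", "women"):
        raise ValueError("sex must be 'men' or 'women'")
    other = "women" if sex == "men" else "men"
    return sum(n for domain, n in DOMAIN_ITEMS.items()
               if domain != f"sexual dysfunction ({other})")


total_items = sum(DOMAIN_ITEMS.values())
print(total_items, items_answered("men"), items_answered("women"))  # 25 23 23
```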

In chapter 8 we evaluated the reliability and construct validity of the SPES/SCOPA, a short scale developed to assess motor function in patients with PD. Eighty-five patients with PD were assessed with the SPES/SCOPA, the Unified Parkinson’s Disease Rating Scale (UPDRS), the Hoehn and Yahr (H&Y) scale, and the Schwab and England (S&E) scale. Thirty-four patients were examined twice by two different assessors, who were blinded to each other’s scores and test executions ('clinical assessment situation'). Additionally, six items of the motor section of the SPES/SCOPA were assessed in nine patients and recorded on videotape, to evaluate inter-rater and intra-rater reliability in this situation ('video assessment situation'). The reproducibility of the sumscores in the clinical assessments was high for all subscales of the SPES/SCOPA. Inter-rater reliability of individual items ranged from 0.27 to 0.83 in the motor impairment section, from 0.58 to 0.82 in the Activities-of-Daily-Living section, and from 0.65 to 0.92 in the motor complications section. In the video assessments, inter-rater reliability of the motor items ranged from 0.69 to 0.87, and intra-rater reliability from 0.81 to 0.95. Correlations between related subscales of the SPES/SCOPA and UPDRS were all above 0.85, and both scales showed similar correlations with other measures of disease severity. The mean time to complete the scales was 8.1 (SD 1.9) minutes for the SPES/SCOPA and 15.6 (SD 3.6) minutes for the UPDRS. We conclude that the SPES/SCOPA is a reliable, valid, and short scale that can adequately be used in both research and clinical practice.
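Reproducibility of sumscores between two assessors, as in the clinical assessment situation above, is commonly quantified with an intraclass correlation coefficient (ICC). The summaries in this thesis do not specify which ICC model was used. As an illustration only, the Python sketch below computes the two-way random-effects, absolute-agreement, single-measure form (ICC(2,1) in Shrout and Fleiss's notation) for a table of subjects × raters. Unlike a Pearson correlation, this form is lowered by a systematic offset between raters. The function name and the example data are hypothetical.

```python
def icc_2_1(table):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `table` is a list of subjects, each a list of k ratings (one per rater).
    ICC = (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)
    """
    n, k = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n * k)
    row_means = [sum(r) / k for r in table]
    col_means = [sum(r[j] for r in table) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for r in table for x in r)
    ss_error = ss_total - ss_rows - ss_cols                   # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical sumscores of five patients rated by two assessors.
scores = [[12, 13], [20, 19], [8, 9], [15, 17], [25, 24]]
print(round(icc_2_1(scores), 2))
```

If the second assessor scored every patient exactly one point higher, a Pearson correlation would still be 1. ICC(2,1) drops below 1, which is the behaviour wanted when raters are meant to be interchangeable.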

The objective of the study in chapter 9 was to develop a PD diary that evaluates a patient’s difficulties in performing activities, as a substitute for recording the amount of 'on' and 'off' time, and to assess its clinimetric qualities. In this study, 84 patients with PD kept a diary for two or three periods of five days. Each day, five items were recorded across 11 time periods. Patients simultaneously recorded being 'on' or 'off' in the traditional way. The diary was easily understood, and the median recording time was 5-10 minutes a day. Clinimetric analysis showed that the diary could successfully be reduced to three days, in which five items (walking, transfers, manual activities, dyskinesias, and sleep) with four response options were assessed seven times daily. Sumscores of the first three items accurately predicted being 'on' or 'off' in 93% of the cases, which makes separate scoring of 'on' and 'off' unnecessary. The diary was internally consistent and showed good reproducibility. Construct validity with external measures was adequate. Comparisons between patients grouped by disease severity and by degree of fluctuations revealed significant differences in the expected directions. Our conclusion is that the SCOPA Diary Card has a sound clinimetric basis. It provides information on the extent of perceived disability and thereby accurately reflects both the severity of off-periods and the variability of motor fluctuations.
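The 93% figure describes how often a sumscore of the walking, transfers, and manual-activities items agreed with the patient's own 'on'/'off' entry. The summary gives neither the scoring direction nor the cut-off. The Python sketch below therefore assumes that each item is scored from 0 (no difficulty) to 3 (severe difficulty) and that a time slot counts as 'off' when the sumscore reaches a threshold. It then picks the threshold with the best agreement. The item coding, the threshold search, and the example records are all hypothetical.

```python
def predicted_off(item_scores, threshold):
    """Classify one time slot as 'off' when the summed item scores reach the threshold."""
    return sum(item_scores) >= threshold


def agreement(records, threshold):
    """Proportion of time slots where the prediction matches the recorded on/off state.

    `records` holds (item_scores, recorded_off) pairs, where item_scores are the
    walking, transfers, and manual-activities scores for that slot.
    """
    hits = sum(predicted_off(items, threshold) == off for items, off in records)
    return hits / len(records)


def best_threshold(records, max_item_score=3, n_items=3):
    """Threshold (0 .. max sumscore + 1) with the highest agreement; ties go to the lowest."""
    candidates = range(max_item_score * n_items + 2)
    return max(candidates, key=lambda t: agreement(records, t))


# Hypothetical diary entries: (walking, transfers, manual activities), recorded 'off'?
diary = [
    ((0, 0, 1), False),
    ((1, 1, 0), False),
    ((2, 2, 1), True),
    ((3, 2, 3), True),
    ((1, 2, 1), False),
]
t = best_threshold(diary)
print(t, agreement(diary, t))  # 5 1.0
```

In practice a threshold chosen on one data set should be checked on independent recordings, because agreement measured on the same data it was tuned on is optimistic.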

Chapter 10 describes a study whose objective was to develop a valid, reliable, and short questionnaire (SCOPA-SLEEP) that assesses nighttime sleep (NS) and daytime sleepiness (DS) in patients with PD. A postal survey including four instruments was completed by 142 patients with PD and 100 controls. The instruments were the SCOPA-SLEEP NS (5 items) and DS (6 items) scales, the Pittsburgh Sleep Quality Index (PSQI), and the Epworth Sleepiness Scale (ESS). The reliability of the questionnaire was high. Internal consistency of the NS and DS scales was 0.88 and 0.91, respectively (Cronbach's alpha), and test-retest reliability was 0.94 and 0.89, respectively (intraclass correlation coefficient). Scale scores differed significantly between patients and controls. Construct validity was assessed by correlations with scales that address similar constructs. The correlation between the NS scale and the PSQI was 0.83, and that between the DS scale and the ESS was 0.81. Factor analysis revealed a single factor for each scale, indicating that each scale measures one construct, which justifies the calculation of sumscores. The coefficient of variation of both the NS and the DS scale was higher than that of the PSQI and the ESS, indicating a better ability to detect differences between individuals. Our conclusion is that the SCOPA-SLEEP is a practical, reliable, and valid instrument for assessing nighttime sleep and daytime sleepiness in patients with PD.
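The coefficient of variation (CV) used in this argument is the standard deviation expressed as a percentage of the mean. Other things being equal, a scale whose scores spread more widely relative to their mean can separate individuals better. The Python sketch below computes the CV with the sample standard deviation. The example scores are hypothetical. CV comparisons also assume that each scale has a meaningful zero point, because shifting a scale's origin changes its CV.

```python
from statistics import mean, stdev


def coefficient_of_variation(scores):
    """Sample standard deviation as a percentage of the mean."""
    m = mean(scores)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * stdev(scores) / m


# Hypothetical total scores of the same six patients on two sleep scales.
scale_a = [2, 5, 9, 4, 12, 7]
scale_b = [6, 8, 10, 7, 12, 9]
print(round(coefficient_of_variation(scale_a), 1),
      round(coefficient_of_variation(scale_b), 1))
```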

The purpose of the study presented in chapter 11 was to develop a short questionnaire on psychosocial functioning in patients with PD. The SCOPA-PS was tested in a survey and compared with other instruments and with medical information. This survey was sent to 205 patients with idiopathic PD. Eighty-six percent of the questionnaires were returned. Internal consistency and test-retest reliability were high, indicating good reliability. Correlations with other disease-specific QoL instruments (PDQ-39, PDQ-8), used to assess construct validity, were also high, and correlations with other scales (HADS, EQ-5D) were as expected. The summary index increased significantly with increasing disease severity. The conclusion of this study is that the SCOPA-PS is a new, short psychosocial questionnaire for patients with PD with good clinimetric properties.
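The summary does not define the SCOPA-PS summary index. As a hypothetical illustration only, the Python sketch below makes three assumptions: the four response options shown in the appendix are coded from 0 (not at all) to 3 (very much), the 11 item scores are summed (range 0-33), and the index expresses this sum as a percentage of the maximum. The coding and the function name are assumptions and are not the published scoring rules.

```python
# Assumed coding of the four response options printed in the appendix.
RESPONSE_CODES = {"not at all": 0, "a little": 1, "quite a bit": 2, "very much": 3}
N_ITEMS = 11


def summary_index(answers):
    """Hypothetical SCOPA-PS index: sum of 11 coded answers as % of the maximum (33)."""
    if len(answers) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} answers, got {len(answers)}")
    total = sum(RESPONSE_CODES[a] for a in answers)
    return 100 * total / (3 * N_ITEMS)


# Hypothetical answers of one patient to items 1-11.
patient = ["a little", "quite a bit", "not at all", "not at all", "a little",
           "quite a bit", "a little", "not at all", "a little", "not at all",
           "very much"]
print(round(summary_index(patient), 1))  # 33.3
```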

Concluding remarks

Looking back on this study, a few points need further clarification. When this study was conceived in 1997, a choice had to be made between two research frameworks, i.e., the International Classification of Impairments, Disabilities, and Handicaps, version I (ICIDH-1)2 and the Disablement Process of Verbrugge and Jette.1 The latter model was chosen because it allowed better insight into the course of the disease. The disablement process proposes bi-directional relations between domains. It also acknowledges the influence of extra- and intra-individual factors that may modify the course of the disease and its impact on the individual. The ICIDH-1 did not account for these influences. Its other disadvantages were the supposed linear and causal relationships and the lack of mutual exclusiveness between domains.3 For instance, is experienced difficulty with household work a disability (difficulty with performing basic tasks) or a handicap (experienced disadvantage in fulfilling social roles)? The updated version of the ICIDH-1, known as the ICF (International Classification of Functioning, Disability and Health),4 now also includes personal and environmental influences and bi-directional relationships between domains. As a result, the differences between the two models have become much smaller. Choosing between these models would now be more difficult because of these smaller differences, but also less essential, because the differences are less fundamental.

One impairment scale that is still being tested, and is therefore not presented in this thesis, is the Parkinson Psychosis Rating Scale (PPRS), which evaluates psychiatric complications. Its evaluation was delayed because we encountered some shortcomings of the scale later in the course of the evaluation process. The scale has now been modified in co-operation with the developers, and the modified version is currently being assessed. Two other scales, addressing costs of PD and comorbidity, have already been tested but will be presented in another thesis. For the calculation of costs, a questionnaire has been developed. For the assessment of comorbidity, an existing instrument, the Cumulative Illness Rating Scale–Geriatric (CIRS-G), will be used.



We have selected and developed instruments that met our initial criteria of practicality (short and easy to administer) and quality (valid and reliable). These scales will be used in the longitudinal phase of the study. An overview of the instruments is presented in table 1. Six instruments included in the SCOPA are completed by patients: the HADS (mood), SCOPA-AUT (autonomic disturbances), SCOPA Diary Card (motor complications), SCOPA-SLEEP (nighttime sleep and daytime sleepiness), SCOPA-PS (psychosocial disability), and the questionnaire on costs. The other six scales are administered by researchers or physicians: the three scales of the SPES/SCOPA (motor impairments, ADL, and motor complications), the SCOPA-COG (cognition), the PPRS (psychiatric complications), and the CIRS-G (comorbidity). Completing the full set of scales takes approximately one hour for patients and 45 minutes for investigators. We are currently comparing a self-administered version of the ADL scale of the SPES/SCOPA with the physician-administered version. The aim is to evaluate whether the potential loss of information outweighs the reduction in research burden for physicians or researchers.

Table 1. Scales for the longitudinal phase of the SCOPA project

Completed by               Domain                      Scale                           Items  Minutes
Patient                    mood                        HADS                               21       10
                           autonomic disturbances      SCOPA-AUT                          23       10
                           sleep                       SCOPA-SLEEP                        14        8
                           psychosocial disability     SCOPA-PS                           11        6
                           costs                       Questionnaire on costs             18       20
                           motor complications         SCOPA Diary Card                    5        8
Neurologist or researcher  motor evaluation            SPES/SCOPA-motor impairments       10        6
                           Activities of Daily Living  SPES/SCOPA-ADL                      7        5
                           motor complications         SPES/SCOPA-motor complications      4        2
                           cognition                   SCOPA-COG                          10       10
                           psychiatric complications   PPRS                                6       10
                           comorbidity                 CIRS-G                             14       10
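Summing the minutes column of Table 1 reproduces the administration times stated in the text. The patient-completed scales take 62 minutes ('approximately one hour'), and the scales administered by a neurologist or researcher take 43 minutes ('45 minutes'). The Python check below simply adds the tabulated values. The dictionary names are illustrative.

```python
# Minutes per scale, copied from Table 1.
PATIENT_MINUTES = {
    "HADS": 10,
    "SCOPA-AUT": 10,
    "SCOPA-SLEEP": 8,
    "SCOPA-PS": 6,
    "Questionnaire on costs": 20,
    "SCOPA Diary Card": 8,
}
CLINICIAN_MINUTES = {
    "SPES/SCOPA-motor impairments": 6,
    "SPES/SCOPA-ADL": 5,
    "SPES/SCOPA-motor complications": 2,
    "SCOPA-COG": 10,
    "PPRS": 10,
    "CIRS-G": 10,
}

print(sum(PATIENT_MINUTES.values()), sum(CLINICIAN_MINUTES.values()))  # 62 43
```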


Looking to the future, we hope that our multi-modular scale, the SCOPA, will fulfil the expectations we have for this instrument. The primary objective of this study has been to develop a toolbox of modules that can be used either separately or as a whole. The full scale will be used in the annual follow-up of a group of patients with PD, stratified by disease duration and age at onset of the disease. This will allow a more thorough assessment of the relations between various variables, such as personal characteristics, social and environmental influences, comorbidity, and disease parameters. Some of the previously supposed relations may be confirmed or rejected, and new associations may be found. Additionally, the complete assessment will give a more accurate description of the phenotype of patients and may allow a more detailed assessment of its relation with genetic make-up. Most importantly, however, we hope that this instrument will lead to a better understanding of PD and will help to indicate new directions for research and therapy, thus improving the future prognosis of patients with Parkinson's disease.

References




1980.





Samenvatting en conclusies (Summary and conclusions)

This thesis describes the first phase of a study of the 'disablement process' in Parkinson's disease (PD). The aim of this project is to provide data on the consequences of the disease process for patients with PD. This first phase (called the SCOPA project, an abbreviation of SCales for Outcomes in PArkinson's disease) is concerned with assembling the appropriate instruments; the second phase involves the use of these instruments in the longitudinal follow-up of patients.

Chapter 1 is the introduction to this thesis and describes how good measurement instruments are lacking in many relevant domains of PD. We use the model of the 'disablement process'1 either to test existing instruments for use in PD or to develop new instruments that are valid, reliable, and conceptually clear. 'Disability' is an English term for the extent of loss of human activities within a relevant sociocultural and societal context.2 Disability arises when a person, as a result of a medical condition, cannot meet the tasks expected of him or her in society. The framework of the 'disablement process' describes a pathway that links pathology with impairments, functional limitations (limitations in skills or movement), and disabilities (limitations in activities, occupations, or participation). The model assumes reciprocal relations between these entities. Extra- and intra-individual factors, and comorbidity as a special case of an intra-individual factor, act upon this pathway. They may modify the course of the disease and influence the extent to which a person is disabled.

The following domains at the impairment level were considered important enough to warrant separate attention: cognition, mood, motor function, motor complications, and autonomic function (chapters 4 to 8). At the disability level, modules were developed for the psychosocial consequences of PD, activities of daily living (ADL), and sleep problems (chapters 8 to 11). Other modules were tested or developed but are not described in this thesis; these concern comorbidity and costs.

Chapter 2 reviews the clinimetric characteristics of scales used to assess the motor impairments and disabilities of PD. The article comprises a literature review of 30 studies that together provide insight into the clinimetric characteristics of 11 different scales. The outcome measures were validity, reliability and responsiveness. We found three scales at the impairment level (Webster, Columbia University Rating Scale [CURS], Parkinson's Disease Impairment Scale), four at the disability level (Schwab and England, Northwestern University Disability Scale [NUDS], Intermediate Scale for Assessment of Parkinson's Disease, Extensive Disability Scale) and another four scales that evaluated both impairments and disabilities (New York University Parkinson's Disease Rating Scale, University of California Los Angeles Scale, Unified Parkinson's Disease Rating Scale [UPDRS], Short Parkinson Evaluation Scale). The scales differed widely in the ratio between the number of items considered sensitive to dopaminergic treatment (dopa-responsive) and the number of items that appear only later in the disease and respond less well to this treatment (dopa-resistant). Regardless of the scale, there was little consistency in the inter-rater reliability of items assessing bradykinesia, tremor and rigidity. Overall, the items showed moderate to good inter-rater reliability. Most instruments showed clinimetric shortcomings or had not been thoroughly tested, despite their often frequent use. The CURS, NUDS and UPDRS were the most extensively evaluated; the available information showed moderate to good reliability and validity for these three scales, which can therefore be considered reliable and valid.
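Item-level inter-rater agreement on ordinal rating-scale items like these is commonly expressed as Cohen's weighted kappa, the statistic reported for item-level agreement in chapters 4 and 5. The summary does not say which weighting scheme was used, so the minimal sketch below (plain Python, standard library only) lets the caller choose linear or quadratic disagreement weights; the function name and argument layout are illustrative assumptions, not the thesis's own code.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, categories, weighting="linear"):
    """Cohen's weighted kappa for two raters scoring the same subjects.

    rater_a, rater_b: scores given by each rater, in the same subject order.
    categories: all possible scores in ordinal order, e.g. [0, 1, 2, 3].
    weighting: "linear" or "quadratic" disagreement weights.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of subjects")
    if len(categories) < 2:
        raise ValueError("at least two ordinal categories are needed")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")

    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Joint counts of (score by A, score by B) and each rater's marginal counts.
    joint = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    def weight(i, j):
        # 0 for exact agreement, 1 for the largest possible disagreement.
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    observed = sum(weight(i, j) * joint[(i, j)] / n
                   for i in range(k) for j in range(k))
    expected = sum(weight(i, j) * (marg_a[i] / n) * (marg_b[j] / n)
                   for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("kappa is undefined when no disagreement is expected by chance")
    return 1 - observed / expected


# Two raters scoring eight patients on a 0-3 item (made-up data).
a = [0, 1, 1, 2, 2, 3, 3, 1]
b = [0, 1, 2, 2, 3, 3, 2, 1]
print(round(weighted_kappa(a, b, [0, 1, 2, 3]), 2))
```

With only two categories the weights are 0 and 1, so the result equals ordinary (unweighted) Cohen's kappa.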

In chapter 3 we compared four disease-specific quality-of-life (QoL) instruments and evaluated their clinimetric properties. Two reviewers independently assessed the methodological rigour and the results of 20 studies. The content validity of the Parkinson's Disease Questionnaire-39 (PDQ-39), the Parkinson's Disease Quality of Life questionnaire (PDQL) and the 'Fragebogen Parkinson LebensQualität' (PLQ) was adequate to good, but that of the Parkinson's Impact Scale (PIMS) was inadequate. The construct validity of both the PDQ-39 and the PDQL was good, but had been insufficiently examined for the PLQ and the PIMS. The internal consistency of all four full scales and of all PDQL subscales was good, whereas it was insufficient for the 'social support' subscale of the PDQ-39 and for four subscales of the PLQ. The test-retest reliability of the PDQL had not been examined, but it was adequate for the other scales. The responsiveness of the PDQ-39 had been partly examined, but had not yet been evaluated for the other scales. The number of available translations, as well as the number of studies in which the instruments had been used, varied considerably. We concluded that the choice of instrument naturally depends on the purpose of the study, but that in most cases the PDQ-39 is the most suitable instrument currently available. The PDQL can be considered an alternative, whereas use of the PLQ may be considered in a German-speaking population. The PIMS should at most be considered as a method for detecting possible problem areas.

Chapter 4 concerns the assessment of cognition in patients with PD. We argue that cognitive problems in PD may be underestimated when classic cognitive instruments are used. These instruments generally emphasize cortical functions. Some of these functions, however, are spared in Parkinson's disease, while other cognitive functions that are often affected in this disorder are missing from these scales. The aim of the study described in this chapter was to develop a short, practical scale that is sensitive to the specific cognitive problems of PD and that is valid and reliable. The intention was not to develop a screening or diagnostic instrument. The scale is meant for comparing groups in research settings and for examining changes in individual functioning over time.

First, we reviewed the literature to determine which cognitive domains are most often affected in PD and to select candidate items from these domains for the initial scale. This scale was then tested in 85 patients with PD and 75 controls who were comparable with respect to age, education and sex. Items that met predefined criteria for data quality, reproducibility and discriminative properties were included in the final scale. This scale, the SCOPA-COG, consists of 10 items with a maximum score of 43; a higher score indicates better performance. The test-retest reliability of the sum score was 0.78 (intraclass correlation coefficient), and that of the individual items ranged from 0.40 to 0.75 (weighted kappa). Cronbach's α was 0.83. The construct validity of the scale was supported by the expected relationships with the CAMCOG and the MMSE, and by the differences found between groups of participants classified by the presence or absence of dementia, as well as between groups of patients distinguished by disease severity. The scale showed a clear trend towards lower cognitive scores in patients with more advanced disease. This trend was more pronounced for the SCOPA-COG than for the CAMCOG and the MMSE. The coefficient of variation of the SCOPA-COG was larger than that of the CAMCOG and the MMSE, indicating greater discriminative ability. We concluded that the SCOPA-COG is a short, reliable and valid instrument that is sensitive to the specific cognitive impairments of PD and can be used to compare groups of patients in research settings.
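Cronbach's α, used here and in chapters 5 and 10 to express internal consistency, compares the sum of the item variances with the variance of the total score: α = k/(k - 1) · (1 - Σ var(item) / var(total)) for k items. A minimal sketch in plain Python follows; the respondents-by-items list layout and the function name are assumptions for illustration. Sample variances are used throughout (the ratio would be the same with population variances).

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    if len(scores) < 2:
        raise ValueError("at least two respondents are needed")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("every respondent needs the same number (>= 2) of items")

    items = list(zip(*scores))               # one tuple of scores per item
    totals = [sum(row) for row in scores]    # sum score per respondent
    sum_item_var = sum(variance(item) for item in items)
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("alpha is undefined when all sum scores are equal")
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Four respondents answering a three-item scale (made-up data).
data = [[2, 3, 3], [1, 1, 2], [3, 3, 3], [0, 1, 1]]
print(round(cronbach_alpha(data), 2))
```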

The aim of the study described in chapter 5 was to examine the psychometric properties of the Hospital Anxiety and Depression Scale (HADS) in patients with PD and to determine the prevalence of symptoms of anxiety and depression in this population. The HADS was mailed to 205 patients with PD, together with three QoL instruments: the PDQ-39, the EuroQol (EQ-5D) and a visual analogue scale (VAS). The HADS scores were also compared with Hoehn and Yahr (H&Y) stages. Eighty-six percent of the patients returned the questionnaires. Cronbach's α of the HADS was 0.88. The two-week test-retest reliability was 0.84 for the HADS sum score (intraclass correlation coefficient) and 0.42-0.76 for the individual items (weighted kappa). Factor analysis revealed two factors that together explained 51.9% of the variance; one represented anxiety, the other depression. The correlations with the PDQ-39, EQ-5D, VAS and H&Y were 0.72, -0.59, -0.59 and 0.32, respectively (all p < 0.001). Depression explained 52% of the variance in QoL, whereas disease severity explained 24%. Using the cut-off points proposed by the developers of the scale, 'possible' and 'probable' anxiety were present in 28.9% and 19.8% of the patients, respectively. The percentages for possible and probable depression were 21.6% and 16.5%. We concluded that the psychometric properties of the HADS in a population of patients with PD were satisfactory. In addition, almost 50% of the patients (28.9 + 19.8 = 48.7%) showed signs of anxiety and almost 40% (21.6 + 16.5 = 38.1%) showed signs of depression.
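The 'possible' and 'probable' categories come from banding each 7-item HADS subscale score (range 0-21). The summary does not list the cut-off points, so the sketch below assumes the commonly cited bands of 0-7 (no case), 8-10 (possible) and 11-21 (probable). Because these bands are mutually exclusive, the two prevalences add up to the share of patients with any signs. Function names are illustrative.

```python
from collections import Counter


def hads_band(subscale_score):
    """Classify one HADS subscale score (assumed bands: 0-7, 8-10, 11-21)."""
    if not 0 <= subscale_score <= 21:
        raise ValueError("a HADS subscale score lies between 0 and 21")
    if subscale_score >= 11:
        return "probable"
    if subscale_score >= 8:
        return "possible"
    return "none"


def band_prevalence(subscale_scores):
    """Percentage of respondents in each band."""
    counts = Counter(hads_band(s) for s in subscale_scores)
    n = len(subscale_scores)
    return {band: 100 * counts[band] / n for band in ("none", "possible", "probable")}


anxiety = [3, 9, 12, 7, 15, 8, 2, 10]   # made-up anxiety subscale scores
print(band_prevalence(anxiety))
# prints: {'none': 37.5, 'possible': 37.5, 'probable': 25.0}
```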

In chapter 6 we examined the sensitivity of individual depressive symptoms and their relative contribution to the diagnosis of depression in patients with PD. This study was undertaken because the somatic features of PD overlap considerably with those of depression, which makes depression particularly difficult to diagnose in patients with PD. The significance of these somatic symptoms of depression in people with PD is therefore unclear. To examine this further, the Structured Clinical Interview for DSM-IV Depression (SCID-D) and the Hamilton and Montgomery-Åsberg depression rating scales (Ham-D, MADRS) were administered to 149 consecutive non-demented patients. The contribution of the individual items of these scales to the diagnosis of depression was determined with discriminant analysis. The discriminant models based on the Ham-D and MADRS scores were both highly significant. The non-somatic core symptoms of depression (anhedonia and mood) had the highest correlation coefficients. The somatic items mostly had low correlation coefficients, except for 'reduced appetite' and 'early awakening'. We concluded that the non-somatic symptoms of depression, together with 'reduced appetite' and 'early awakening', appear to be the most important in distinguishing depressed from non-depressed patients.

The aim of the study described in chapter 7 was to develop a questionnaire for autonomic dysfunction in PD, the SCOPA-AUT. Items were first generated from the literature and by consulting experts. The items were then reduced on the basis of the results of a postal survey among 46 patients with PD, 21 patients with multiple system atrophy (MSA) and 8 movement disorder specialists. Frequency of occurrence, perceived inconvenience and clinical relevance determined whether items were retained. The validity of the resulting questionnaire was examined in a second postal survey among 140 patients with PD and 100 controls. Test-retest reliability was determined in a subgroup of 55 patients who received the questionnaire twice. The initial 45 items were reduced to 25 items covering the following areas: gastrointestinal (7 items), urogenital (6 items), cardiovascular (3 items), thermoregulatory (4 items), pupillomotor (1 item) and sexual dysfunction (2 items for men and 2 for women, so that each respondent answers 23 items). Medication use was asked in a separate item. The test-retest reliability of the domain scores and the individual items was good, and all domains showed high internal consistency. Each domain had good content validity. All domains and most items differentiated between patients and controls. We found a significant increase in autonomic problems with increasing disease severity in all domains except sexual dysfunction. We concluded that the SCOPA-AUT is a reliable and valid questionnaire that allows accurate evaluation of autonomic dysfunction in PD.
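The item structure reported above fits in a small lookup table, which also shows why the 25 items amount to 23 per respondent, the number listed for the SCOPA-AUT in Table 1: men and women each answer only their own two sexual-function items. The sketch assumes that a domain score is the plain sum of its item scores; the summary does not describe the scoring rule, and the domain keys and function names are illustrative.

```python
# Item counts per domain as listed above (the separate medication item is excluded).
SCOPA_AUT_DOMAINS = {
    "gastrointestinal": 7,
    "urogenital": 6,
    "cardiovascular": 3,
    "thermoregulatory": 4,
    "pupillomotor": 1,
    "sexual_men": 2,
    "sexual_women": 2,
}


def domains_for(sex):
    """Domains a respondent answers: all except the other sex's sexual items."""
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    skip = "sexual_women" if sex == "male" else "sexual_men"
    return [d for d in SCOPA_AUT_DOMAINS if d != skip]


def items_answered(sex):
    return sum(SCOPA_AUT_DOMAINS[d] for d in domains_for(sex))


def domain_scores(responses):
    """Sum score per domain; responses maps domain name -> list of item scores."""
    scores = {}
    for domain, items in responses.items():
        if len(items) != SCOPA_AUT_DOMAINS[domain]:
            raise ValueError(f"{domain} expects {SCOPA_AUT_DOMAINS[domain]} items")
        scores[domain] = sum(items)
    return scores


print(sum(SCOPA_AUT_DOMAINS.values()), items_answered("male"), items_answered("female"))
# prints: 25 23 23
```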



In chapter 8 we examined the reliability and construct validity of the SPES/SCOPA, a scale designed to assess motor function in patients with PD. Eighty-five patients with PD were examined with the SPES/SCOPA, the Unified Parkinson's Disease Rating Scale (UPDRS), the Hoehn and Yahr scale (H&Y) and the Schwab and England scale (S&E). Thirty-four patients were examined twice by two different investigators, each blinded to the other's scores and to how the other performed the tests. In addition, video recordings were made of nine patients to determine inter- and intra-rater reliability in this setting. The reproducibility of the sum scores in the clinical examination, calculated with an intraclass correlation coefficient (ICC), was high for all SPES/SCOPA subscales. The inter-rater reliability of the individual items ranged from 0.27 to 0.83 for the motor section, from 0.58 to 0.82 for the ADL section and from 0.65 to 0.92 for the motor complications section. In the video study, the inter-rater reliability of the individual motor section items ranged from 0.69 to 0.87 and the intra-rater reliability from 0.81 to 0.95. Correlations between related subscales of the SPES/SCOPA and the UPDRS were high, with all coefficients above 0.85. Both scales showed similar correlations with the various measures of disease severity. Completing the SPES/SCOPA took 8 minutes and completing the UPDRS 16 minutes. We concluded that the SPES/SCOPA is a short, reliable and valid scale with good properties for research and clinical practice.
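The summary does not state which ICC form was used for the sum scores. As an illustration, the sketch below computes the two-way random-effects, absolute-agreement, single-rater ICC (ICC(2,1) in Shrout and Fleiss's notation) from the ANOVA mean squares of a subjects-by-raters table. Other forms, for example consistency rather than absolute agreement, give different values on the same data. The function name and data layout are assumptions.

```python
def icc_2_1(table):
    """ICC(2,1): rows are subjects, columns are raters (complete data, no missing values)."""
    n, k = len(table), len(table[0])
    if n < 2 or k < 2 or any(len(row) != k for row in table):
        raise ValueError("need a complete table with at least 2 subjects and 2 raters")

    grand = sum(map(sum, table)) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Rater 2 scores every patient exactly one point higher than rater 1.
print(icc_2_1([[1, 2], [2, 3], [3, 4]]))
# about 0.67: the systematic offset lowers absolute agreement,
# whereas a consistency-type ICC would be 1.0 for these data.
```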

The aim of the study described in chapter 9 was to develop a Parkinson diary for patients with motor fluctuations. Rather than recording the amount of 'on' and 'off' time, this diary took as its starting point the difficulties a patient experiences in performing activities. In this study, 84 patients with PD kept a diary for two or three periods of five days. Each day, five items were rated for each of 11 periods. At the same time, patients recorded in the usual way whether they were 'on' or 'off'. Patients had no difficulty understanding the diary. The median completion time was 5-10 minutes per day. Statistical analyses showed that the diary could be reduced to three days, with five different items (walking, changes of position, manual activities, dyskinesias and sleep), each with four response options, to be rated seven times a day. On the basis of the sum score of the first three items, the 'on' or 'off' state could be predicted correctly in 93% of cases, which makes a separate on/off record unnecessary. The diary proved internally consistent and had good reproducibility. Construct validity with external measures was in line with expectations. Comparisons between patients classified by disease severity and by the severity of their fluctuations also showed significant differences in the expected directions. We concluded that the SCOPA Diary Card has a sound clinimetric basis and provides information on the perceived problems in performing activities, adequately reflecting both the severity of 'off' periods and the variability of motor fluctuations.
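The reduced diary card yields 3 days × 7 time points × 5 items = 105 ratings per patient. The sketch below shows how a predicted 'off' state could be derived from the sum of the first three items (walking, changes of position, manual activities) and compared with the patient's own on/off record. It assumes that each item is scored 0-3 with higher scores meaning more difficulty, and it uses a hypothetical `cutoff` parameter; the summary reports only the 93% agreement, not the scoring direction or the cut-off actually used.

```python
ITEMS = ("walking", "changes of position", "manual activities", "dyskinesias", "sleep")
DAYS, TIMES_PER_DAY = 3, 7
MOTOR_ITEMS = ITEMS[:3]  # the three items whose sum predicts the on/off state


def predicted_off(ratings, cutoff):
    """ratings: item -> score 0-3 for one time point (assumed: higher = more difficulty)."""
    return sum(ratings[item] for item in MOTOR_ITEMS) >= cutoff


def percent_agreement(time_points, cutoff):
    """time_points: list of (ratings, reported_off) pairs; share of matches in percent."""
    if not time_points:
        raise ValueError("no time points given")
    hits = sum(predicted_off(r, cutoff) == off for r, off in time_points)
    return 100 * hits / len(time_points)


print(DAYS * TIMES_PER_DAY * len(ITEMS))  # ratings per patient: 105
```

In practice the cut-off would be chosen from the data, for instance the value that maximizes agreement with the patients' own on/off records.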

Chapter 10 describes a study that aimed to develop a short, reliable and valid questionnaire, the SCOPA-SLEEP, which asks patients with PD about both night-time sleep (NS) and daytime sleepiness (DS). A postal survey with four instruments, the SCOPA-SLEEP NS (5 items) and DS (6 items), the Pittsburgh Sleep Quality Index (PSQI) and the Epworth Sleepiness Scale (ESS), was completed by 142 patients with PD and 100 controls. We found high reliability for this questionnaire: the internal consistency of the NS and DS scales was 0.88 and 0.91, respectively (Cronbach's alpha), and the test-retest reliability was 0.94 and 0.89, respectively (ICC). Sum scores on the scales differed significantly between patients and controls. Construct validity was assessed through correlations with scales that evaluate related constructs. The correlation between the NS scale and the PSQI was 0.83, and that between the DS scale and the ESS was 0.81. Factor analysis revealed one factor for each SCOPA scale, indicating that each scale measures a single construct, which justifies the calculation of sum scores. The coefficient of variation of both the NS and the DS scale was higher than that of the PSQI and the ESS, indicating a greater ability to detect differences between individuals. We concluded that the SCOPA-SLEEP is a reliable, valid and practical instrument for assessing both night-time sleep and daytime sleepiness in patients with PD.
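The coefficient of variation (CV) is the standard deviation divided by the mean; a larger value means that scores are spread more widely relative to their average, which is the sense in which chapters 4 and 10 read it as a greater ability to distinguish individuals. A minimal sketch with made-up data follows. Because the CV depends on where a scale's zero point lies, it is best suited to scores with a true zero and a positive mean.

```python
from statistics import mean, stdev


def coefficient_of_variation(scores):
    """Sample standard deviation divided by the mean."""
    m = mean(scores)
    if m <= 0:
        raise ValueError("the CV is only interpretable for a positive mean")
    return stdev(scores) / m


# Made-up sum scores of the same six patients on two sleep scales.
scale_a = [2, 5, 9, 12, 4, 15]
scale_b = [6, 7, 9, 10, 7, 11]
print(round(coefficient_of_variation(scale_a), 2), round(coefficient_of_variation(scale_b), 2))
```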

The aim of the study described in chapter 11 was to develop a short questionnaire on the psychosocial functioning of patients with PD. The SCOPA-PS was tested in a survey and compared with other instruments and with information from medical records. The survey was sent to 205 patients with idiopathic Parkinson's disease, and 86% of the questionnaires were returned. Internal consistency and test-retest reliability were high, indicating good reliability. Construct validity with other disease-specific QoL instruments (PDQ-39, PDQ-8) was also high, and the correlations with other scales (HADS, EQ-5D) were in line with our expectations. The sum score increased significantly with increasing disease severity. We concluded that the SCOPA-PS is a new, short psychosocial questionnaire for patients with PD with good clinimetric properties.

Concluding remarks

Looking back on this study, a number of points require further explanation. When the idea for this study arose in 1997, a choice had to be made between two research models: the International Classification of Impairments, Disabilities, and Handicaps, version 1 (ICIDH-1)³ and the Disablement Process model of Verbrugge and Jette.¹ The latter was chosen because it provided better insight into the course of the disease. The 'disablement process' assumes reciprocal relationships between domains and recognizes the influence of extra- and intra-individual factors that can alter the course of the disease and its impact on the individual. The ICIDH-1 did not take these influences into account. Other drawbacks of the ICIDH-1 were its assumed linear and causal relationships and the lack of mutual exclusiveness between domains.⁴ For example, are problems experienced with housekeeping a disability (difficulty performing a basic task) or a handicap (perceived disadvantage in fulfilling a social role)? The revised version of the ICIDH-1, known as the ICF (International Classification of Functioning, Disability and Health),⁵ now also recognizes personal and environmental influences and assumes reciprocal relationships between domains. The differences between the two models have therefore become much smaller. Choosing between them would now be more difficult, but also less essential, because the remaining differences are less fundamental.

One scale that is still being tested, and is therefore missing from this thesis, is the Parkinson Psychosis Rating Scale (PPRS), which assesses psychiatric complications. The delay arose because we identified some shortcomings of this scale only late in the evaluation process. The scale has since been modified in collaboration with its original developers, and the modified version is now being studied.



Two other scales, on comorbidity and on the costs of PD, have already been tested but will be described in another thesis. A questionnaire was developed to calculate costs, and an existing instrument, the Cumulative Illness Rating Scale–Geriatric (CIRS-G), will be used for comorbidity.

We selected and developed instruments that meet the criteria we set at the outset for usability (short and easy to administer) and quality (reliable and valid). These scales will be used in the longitudinal phase of the study. An overview of the instruments is given in Table 1.

Table 1. Instruments for the longitudinal phase of the SCOPA project

Completed by               Domain                      Scale                           Items  Minutes
Patient                    Mood                        HADS                               21       10
                           Autonomic dysfunction       SCOPA-AUT                          23       10
                           Sleep                       SCOPA-SLEEP                        14        8
                           Psychosocial disabilities   SCOPA-PS                           11        6
                           Costs                       Cost questionnaire PD              18       20
                           Motor complications         SCOPA Diary Card                    5        8
Neurologist or researcher  Motor examination           SPES/SCOPA motor section           10        6
                           Activities of daily living  SPES/SCOPA-ADL                      7        5
                           Motor complications         SPES/SCOPA motor complications      4        2
                           Cognition                   SCOPA-COG                          10       10
                           Psychiatric complications   PPRS                                6       10
                           Comorbidity                 CIRS-G                             14       10

Six of the instruments included in SCOPA are completed by the patients themselves: the HADS (mood), the SCOPA-AUT (autonomic dysfunction), the SCOPA Diary Card (motor complications), the SCOPA-SLEEP (sleep disturbances), the SCOPA-PS (psychosocial disabilities) and the cost questionnaire. The remaining six instruments are administered by a physician or researcher: the three sections of the SPES/SCOPA (motor impairments, ADL and motor complications), the SCOPA-COG (cognition), the PPRS (psychiatric complications) and the CIRS-G (comorbidity). Completing the full set of scales takes about one hour for patients and 45 minutes for researchers. We are currently comparing a patient-completed version of the ADL scale with the physician-completed version, to determine to what extent the possible loss of information is outweighed by the reduced assessment burden for physicians and researchers.
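Adding up the minutes in Table 1 shows where the quoted burden comes from: 62 minutes for the patient-completed instruments and 43 minutes for those administered by a physician or researcher, consistent with "about one hour" and "45 minutes". A small sketch of that tally, with the table transcribed into a dictionary (the data structure and function name are illustrative):

```python
# (completed by, instrument) -> minutes, transcribed from Table 1.
MINUTES = {
    ("patient", "HADS"): 10,
    ("patient", "SCOPA-AUT"): 10,
    ("patient", "SCOPA-SLEEP"): 8,
    ("patient", "SCOPA-PS"): 6,
    ("patient", "Cost questionnaire PD"): 20,
    ("patient", "SCOPA Diary Card"): 8,
    ("researcher", "SPES/SCOPA motor section"): 6,
    ("researcher", "SPES/SCOPA-ADL"): 5,
    ("researcher", "SPES/SCOPA motor complications"): 2,
    ("researcher", "SCOPA-COG"): 10,
    ("researcher", "PPRS"): 10,
    ("researcher", "CIRS-G"): 10,
}


def burden(who):
    """Total minutes for one type of respondent."""
    return sum(m for (role, _), m in MINUTES.items() if role == who)


print(burden("patient"), burden("researcher"))  # prints: 62 43
```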

Looking to the future, we hope that this multi-modular instrument, the SCOPA, will meet our expectations. The main aim of the study was to develop a toolbox containing modules that can be used both separately and as a whole. The full set of scales will be used in the annual follow-up of a cohort of patients with PD, stratified by disease duration and age at onset. This allows a thorough examination of the relationships between different variables, such as personal characteristics, social and environmental influences, comorbidity and disease parameters. Some of the previously hypothesized relationships can then be verified or falsified, while other relationships may emerge. The full study will also provide a more accurate description of the patients' phenotype and may thereby enable a more detailed study of its relationship with genetic make-up. Above all, however, we hope that use of this instrument will lead to a better understanding of PD and will thereby help to find new directions for research and treatment, so that the prognosis of patients with PD will improve in the future.

References


2. Helders PJM, Van der Net J, Engelbert RHH. Functionele diagnostiek en spreekkamerwerkelijkheid. Fysiopraxis 1999;(3):20-23.



1980.



5. World Health Organization. The international classification of functioning, disability, and health. Internet.



List of Abbreviations

AD Alzheimer's Disease

ADL Activities of Daily Living / Activiteiten van het Dagelijks Leven

ANOVA Analysis of variance

CAMCOG Cognition section of the Cambridge Examination of Mental Disorders of the Elderly

CAMDEX Cambridge Examination of Mental Disorders of the Elderly

CIRS-G Cumulative Illness Rating Scale - Geriatric

CURS Columbia University Rating Scale

CV Coefficient of Variation

DC Diary Card

DS scale Daytime Sleepiness subscale of the SCOPA-SLEEP questionnaire

DSM-IV Diagnostic and Statistical Manual of Mental Disorders, fourth edition

DUPRS Duke University Parkinson's Disease Rating Scale

EDS Extensive Disability Scale

EQ-5D EuroQol (five dimensions)

ESS Epworth Sleepiness Scale

H&Y Hoehn and Yahr scale

HADS Hospital Anxiety and Depression Scale

HAM-D Hamilton rating scale for depression

HRQoL Health-Related Quality of Life

ICC Intraclass Correlation Coefficient

ICF International Classification of Functioning, disability, and health

ICIDH International Classification of Impairments, Disabilities and Handicaps

ISAPD Intermediate Scale for Assessment of Parkinson's Disease

K Kappa

KvL Kwaliteit van Leven (Quality of Life)

Kw weighted Kappa

MADRS Montgomery-Åsberg Depression Rating Scale

MC Motor Complications

ME Motor Evaluation

MI Motor Impairments

MMSE Mini-Mental State Examination

MRD Minimal Record of Disability

MSA Multiple System Atrophy

NS scale Night-time Sleep subscale of the SCOPA-SLEEP questionnaire

NUDS Northwestern University Disability Scale


NWO Nederlandse Organisatie voor Wetenschappelijk Onderzoek; Netherlands Organization for Scientific Research

NYU New York University Parkinson's disease rating scale

PD Parkinson's disease

PDIS Parkinson's Disease Impairment Scale

PDQ-39, PDQ-8 Parkinson's Disease Questionnaire – 39 item version, 8 item version

PDQL Parkinson's Disease Quality of Life questionnaire

PDQUALIF Parkinson's Disease Quality of Life questionnaire

PIMS Parkinson's disease Impact Scale

PLQ Parkinson's disease 'LebensQualität Fragebogen' (Quality of Life questionnaire)

PPRS Parkinson Psychosis Rating Scale

PSQI Pittsburgh Sleep Quality Index

QoL Quality of Life

r Pearson's product moment correlation coefficient

ROC receiver operating characteristic

rs Spearman's rank correlation coefficient

S&E Schwab and England Scale

SCID-D Structured Clinical Interview for DSM-IV Depression

SCOPA SCale for Outcomes in PArkinson's disease

SCOPA-AUT SCale for Outcomes in PArkinson's disease, autonomic module

SCOPA-COG SCale for Outcomes in PArkinson's disease, cognition module

SCOPA-DC SCale for Outcomes in PArkinson's disease Diary Card

SCOPA-PS SCale for Outcomes in PArkinson's disease, psychosocial module

SCOPA-SLEEP SCale for Outcomes in PArkinson's disease, sleep questionnaire

SD Standard Deviation

SI Summary Index

SPES Short Parkinson's Evaluation Scale

SPES/SCOPA Short Parkinson's Evaluation Scale / SCale for Outcomes in PArkinson's disease

SPSS Statistical Package for the Social Sciences

SRM Standardized Response Mean

UCLA University of California Los Angeles Parkinson's disease rating scale

UKPDSBB United Kingdom Parkinson's Disease Society Brain Bank

UPDRS Unified Parkinson's Disease Rating Scale

VAS Visual Analogue Scale

WHO World Health Organization

ZvP Ziekte van Parkinson (Parkinson's disease)


Afterword

This thesis was made possible by the many patients, their partners and sometimes even their relatives, friends and acquaintances who were willing to take part in the studies described here. I am also greatly indebted to the people who took part in these studies as controls.


List of publications

• Marinus J, Niël CG, De Bie RA, Wiggenraad RG, Schoppink EM, Beukema LH. Measuring

radiation fibrosis: the interobserver reliability of two methods of determining the degree of

radiation fibrosis. Int J Radiat Oncol Biol Phys 2000; 47(5):1209-1217.

• Marinus J. Bestralingsfibrose: ontstaan, onderzoek en behandeling. Oedeminus 2000;3(1):18-25.

• Dijkstra PU, Van Wilgen PC, Buijs RP, Brendeke W, De Goede CJ, Kerst A, Koolstra M, Marinus

J, Schoppink EM, Stuiver MM, Van der Velde CF, Roodenburg JLN. Incidence of shoulder pain

after neck dissection: a clinical explorative study for risk factors. Head Neck 2001; 23(11):947-

953.

• Marinus J, Ramaker C, Van Hilten JJ, Stiggelbout AM. Health related quality of life in Parkinson's


72(2):241-248.

• Marinus J, Visser M, Stiggelbout AM, Rabey JM, Bonuccelli U, Kraus PH, Van Hilten JJ.

Activity-based diary for Parkinson's disease. Clin Neuropharmacol 2002; 25(1):43-50.

• Marinus J, Leentjens AFG, Visser M, Stiggelbout AM, Van Hilten JJ. Evaluation of the hospital

anxiety and depression scale in patients with Parkinson's disease. Clin Neuropharmacol 2002;

25(6):318-324.

• Ramaker C, Marinus J, Stiggelbout AM, Van Hilten BJ. Systematic evaluation of rating scales for

impairment and disability in Parkinson's disease. Mov Disord 2002; 17(5):867-876.

• Leentjens AFG, Marinus J, Van Hilten JJ, Lousberg R, Verhey FRJ. The Contribution of Somatic

Symptoms to the Diagnosis of Depressive Disorder in Parkinson's Disease: A Discriminant

Analytic Approach. J Neuropsychiatry Clin Neurosci 2003; 15(1):74-77.

• Marinus J, Visser M, Martínez-Martín P, Van Hilten JJ, Stiggelbout AM. A short psychosocial

questionnaire for patients with Parkinson's disease: the SCOPA-PS. J Clin Epidemiol 2003;

56(1):61-67.

• Visser M, Marinus J, Bloem BR, Kisjes H, Van den Berg B, Van Hilten JJ. Clinical tests for the

evaluation of postural instability in patients with Parkinson's disease. Arch Phys Med Rehabil

(accepted).


Curriculum vitae

Johan Marinus was born in Leiden on 7 February 1954. After completing secondary school (HBS-A), he studied physiotherapy at the Haagse Academie voor Fysiotherapie. After graduating in 1977, he worked in the Physiotherapy Department of the Westeinde Ziekenhuis in The Hague (now Medisch Centrum Haaglanden), where he became team leader in 1994. Between 1983 and 1991 he also taught at the Haagse Academie voor Fysiotherapie. In 1995 he began studying Human Movement Sciences at the Faculty of Health Sciences of Maastricht University, from which he graduated cum laude in 1999. His graduation research compared the inter-rater reliability of two methods for measuring the stiffness of irradiated breast tissue. After completing his studies, he took up a position as a PhD student (onderzoeker in opleiding) at the Department of Neurology of the Leiden University Medical Center. The theme of his PhD research was clinimetrics in Parkinson's disease, and its results are described in this thesis. The research was funded by the Netherlands Organization for Scientific Research (NWO) and the Leiden University Medical Center. During this period he attended various courses in epidemiology and biostatistics. His epidemiology training programme was approved by the Vereniging voor Epidemiologie in 2002, so that after the defence of this thesis he can be registered as a research epidemiologist (Wetenschappelijk onderzoeker Epidemiologie). Since 2001 he has taught in the post-graduate (post-HBO) physiotherapy programme of the Leidse Hogeschool. On 1 April 2003 he started as a researcher at the Department of Rheumatology of the Erasmus Medical Center in Rotterdam.
