PCA in studying coordination and variability: a tutorial



www.elsevier.com/locate/clinbiomech

Clinical Biomechanics 19 (2004) 415–428


Andreas Daffertshofer a,*, Claudine J.C. Lamoth a,b, Onno G. Meijer a, Peter J. Beek a

a Faculty of Human Movement Sciences, Institute for Fundamental and Clinical Human Movement Sciences, Van der Boechorststraat 9, Vrije Universiteit, 1081 BT Amsterdam, The Netherlands

b Department of Orthopedic Surgery, Medical Center Vrije Universiteit, Amsterdam, The Netherlands

Received 3 February 2003; accepted 12 January 2004

Abstract

Objective. To explain and underscore the use of principal component analysis in clinical biomechanics as an expedient, unbiased means for reducing high-dimensional data sets to a small number of modes or structures, as well as for teasing apart structural (invariant) and variable components in such data sets.

Design. The method is explained formally and then applied to both simulated and real (kinematic and electromyographic) data for didactical purposes, thus illustrating possible applications (and pitfalls) in the study of coordinated movement.

Background. In the sciences at large, principal component analysis is a well-known method to remove redundant information in multidimensional data sets by means of mode reduction. At present, principal component analysis is starting to penetrate the fundamental and clinical study of human movement, which amplifies the need for an accessible explanation of the method and its possibilities and limitations. Besides mode reduction, we discuss principal component analysis in its capacity as a data-driven filter, allowing for a separation of invariant and variant properties of coordination, which, arguably, is essential in studies of motor variability.

Methods. Principal component analysis is applied to kinematic and electromyographic time series obtained during treadmill walking by healthy humans.

Results. Common signal structures or modes are identified in the time series that turn out to be readily interpretable. In addition, the identified coherent modes are eliminated from the data, leaving a filtered, residual pattern from which useful information may be gleaned regarding motor variability.

Conclusions. Principal component analysis allows for the detection of modes (information reduction) in both kinematic and electromyographic data sets, as well as for the separation of invariant structure and variance in those data sets.

Relevance

Principal component analysis can be successfully applied to movement data, both as feature extractor and as data-driven filter. Its potential for the (clinical) study of human movement sciences (e.g., diagnostics and evaluation of interventions) is evident but still largely untapped.

© 2004 Elsevier Ltd. All rights reserved.

Keywords: Variability; Coordination; Principal component analysis

1. Introduction

While biomechanics is the application of mechanics in the study of animals, including man, clinical biomechanics is the application of biomechanics to (help) solve clinical problems. The large majority of the problems addressed in clinical biomechanics pertain to the diagnostic and functional evaluation of movement disorders resulting from a broad variety of motor impairments. In addressing such motor problems, clinical biomechanists routinely record kinematic, kinetic and electromyographic signals, which are either analyzed statistically or in terms of an explicit inverse dynamical model. Such analyses, in turn, may provide insight into certain relevant biomechanical aspects of particular movement disorders and, in some cases, the underlying motor impairment.

* Corresponding author. E-mail address: [email protected] (A. Daffertshofer).

0268-0033/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.

doi:10.1016/j.clinbiomech.2004.01.005

In general, in their endeavors, clinical biomechanists are confronted with several formidable challenges. We mention three. First, the human motor apparatus is a vastly complex system that is composed of many moving segments that are connected through more than a hundred joints, the vast majority with several axes of rotation, and powered by hundreds of muscles (Bernstein, 1967). Besides active muscular forces and external forces, any biomechanical analysis aspiring to some degree of validity will have to take inertial, reactive and intersegmental forces into account as well. As a consequence of this complexity, biomechanical models are usually limited to certain properties of a kinematic chain and seldom encompass whole-body motor activities such as walking or standing upright from a seated position. Second, the relationship between motor impairments and the resulting functional limitations is very difficult to assess in any straightforward or conclusive manner. The reason for this is that patients have the propensity to adapt to the motor impairments (primary disorders) they are afflicted with, resulting in movement patterns or strategies that may be viewed as functionally optimal given the constraints imposed by the motor impairments (Latash and Anson, 1996). As a consequence, pathological movement should not be viewed solely as an overt symptom of a certain underlying impairment but, at least to a degree, also as a functional adaptation to (the consequences of) that impairment. Furthermore, healthy, normal movement patterns cannot simply be used as a norm or standard for patients with a particular motor pathology. Third, human motor behavior is intrinsically variable, both within and between individuals (Newell and Corcos, 1993). By now, a substantial and rapidly increasing number of basic studies in motor control have shown that motor variability is not simply a reflection of random noise but often contains hidden features and regularities that may provide insight into motor control and that may be functionally beneficial (e.g., Collins et al., 1995; Priplata et al., 2002). As a consequence, there is a clear need to tease apart (truly) random components from deterministic components, which, as a rule, is a tall order.

To deal with these problems, the standard arsenal of concepts and tools of clinical biomechanics needs to be supplemented with methods aimed at identifying functional units of coordination in the form of synergies or coordinative structures at both the kinematic and the muscular level, as well as with methods to distinguish functional from purely random fluctuations. Besides accounting for the kinematic redundancy that is given by the presence of fixed, holonomic (e.g., skeletal) constraints and symmetry relations, these methods should allow for the detection of time-varying coherent patterns of coordination. The goal of the present article is to offer a tutorial on the use of advanced methods of analysis for pattern detection in multidimensional signals. Whereas these methods are well established in the sciences at large, they are only beginning to penetrate the field of clinical biomechanics. Here, we will explain these methods in detail, and highlight their potential for the (clinical) study of human movement as we go along.

Before going in medias res, we first list a few related studies. This list is by no means complete but is simply meant to underscore the relevance of the methods and examples to be discussed later. Focusing on the multidimensionality of the data sets, an obvious candidate to apply multivariate analysis techniques is the study of human gait patterns, which usually involves the recording and analysis of a large number of kinematic variables. For instance, for a simplified seven-segment sagittal plane model more than 80 variables are required to describe the horizontal, vertical, and rotational displacements, velocities and accelerations of the joints and segment centers of mass (Winter, 1983). Patently, such large numbers of variables hamper the clinical use of gait information and call for a significant reduction of data (Andriacchi and Alexander, 2000; Deluzio et al., 1999). In search for meaningful data reductions, several groups have tried to identify coordination patterns during walking. For instance, the kinematic properties of cyclic arm and leg movements (Diedrich and Warren, 1995; Donker et al., 2001; Wagenaar and van Emmerik, 2000; Wannier et al., 2001; Whithall and Caldwell, 1992), trunk coordination (Feipel et al., 2001; Lamoth et al., 2002; McGibbon and Krebs, 2001; Van Emmerik and Wagenaar, 1996; Vogt et al., 2001), and intersegmental coordination between pelvis, thigh, shank, and foot (Bianchi et al., 1998; Borghese et al., 1996) have been analyzed in terms of phase and/or frequency locking behavior. In a similar vein, various aspects of patterns of muscular activity have been examined, for instance, by comparing multidimensional activity profiles during human walking in terms of stride-to-stride and inter-subject variability (Winter and Yack, 1987), by looking at correlations between muscles in the lower extremities (Halliday et al., 2003; Hansen et al., 2001; Hof et al., 2002; Patla et al., 1985), and by examining activity patterns within individual muscles (Shiavi and Griffin, 1981; White and McNair, 2002; Wootten et al., 1990a,b) and relating those patterns to distinct phases of the gait cycle. Finally, at the kinetic level, multivariate covariance analyses have been applied (Sadeghi et al., 2000, 2002, 2003) to net sagittal moments at the hip, knee, and ankle of the lower limb during the stance phase in order to detect different levels of within and between muscle activities at each joint (local asymmetry) and between the legs (global symmetry).

Collectively, the listed studies testify to the basic premise of the present tutorial, namely, that clinical biomechanics may benefit from the application of sophisticated methods of data analysis aimed at mode reduction as well as the detection of invariant and variant properties of coordination. Since principal component analysis (PCA) forms the basis of a broad class of such methods, we will start with a formally succinct description of PCA, followed by a discussion of three simulated ('mock') examples for didactical purposes. Subsequently, we illustrate the use of PCA by applying it to two sets of gait data, one kinematic, and the other electromyographic. We then go on to show how the method may be used as a data-driven filter to tease apart deterministic and random components using the same sets of signals. Finally, we conclude with a summary and a brief outlook.

1 Generally speaking, there exists an arbitrary number of vector sets $\{\vec{w}^{(k)}\}$ that can be substituted into (1) so that one can alternatively express the vector $\vec{q}(t)$ as $\vec{q}(t) = \sum_{k=1}^{N} w_k(t)\,\vec{w}^{(k)}$; here the $\vec{w}^{(k)}$ always reflect $N$ linearly independent $N$-dimensional, constant vectors and $w_k = w_k(t)$ are scalar, time-dependent functions.

2. Principal component analysis

Recent advances in data acquisition have led to an enormous increase in number and length of empirical signals. In consequence, the a priori selection of only a few empirical quantities to describe and study processes and phenomena has taken a back seat. Making educated guesses for the experimental design has been replaced by off-line selection of common structures in recorded signals. As a result, the analysis of multi-dimensional signals has become a central issue across disciplines, highlighting the crucial question of how to define relevant features. What are these features? In the study of motor control, as in many other areas of research, this question cannot be answered in generality (due to, for instance, the task-specific form of coordination). However, irrespective of the explicit nature of the processes under study, the number of extracted features should always remain sufficiently small to allow for tracking them. In mathematical terms, a small number of features or variables implies that the system in question is low-dimensional, at least in essence. These dimensionality arguments, in turn, readily hint at the kind of mathematical tools that appear to be required to define and then extract main features. In all the studies listed in the introduction structurally similar values or evolutions are quantified using one or another correlation measure. High correlations are subsequently considered to represent relevant properties of the system under examination. Thus, an obvious candidate to achieve an unbiased feature extraction or dimensionality reduction can be found within the realm of (conventional) statistics for multivariate data. Statistics provides a wide variety of measures, e.g., covariance or correlation coefficients, which are commonly used to detect similarities across signals. Most of these statistical measures, however, are primarily designed for dealing with problems that are already low-dimensional, such as encountered in the study of uni- or bivariate processes, and thus do not need to be reduced further. Similar to the experimenter's choice to record specific variables, the application of uni- or bivariate techniques requires an a priori decision which signals and processes to study. That is, in contrast to unbiased methods one has to make the above-mentioned educated guess as to which variables represent the most important events of the process of interest. Using statistically driven techniques for pattern recognition one can avoid the need of making such a priori assumptions. A by now classical example of such a technique for multivariate signals is the so-called singular value decomposition (Golub and van Loan, 1990) or Karhunen/Loève expansion (Haken, 1996, Chapter 11.1); especially when applied to sets of time series these methods are commonly referred to as PCA.

2.1. Mathematical definitions

To allow for a general mathematical description, we introduce arbitrary multivariate data in the form of $N$ different real-valued, time-dependent variables denoted as $q_k(t)$. For instance, finite sets of Cartesian coordinates, e.g., $x_1(t), x_2(t), \ldots, x_N(t)$, or sets of electromyographic signals, e.g., $\mathrm{emg}_1(t), \mathrm{emg}_2(t), \ldots, \mathrm{emg}_N(t)$, are abbreviated as $q_1(t), q_2(t), \ldots, q_N(t)$. In order to be able to profit from algebraic or geometrical tools we combine these variables into a single $N$-dimensional, time-dependent vector $\vec{q} = \vec{q}(t)$, that is

$$
\begin{pmatrix} q_1(t) \\ 0 \\ \vdots \\ 0 \end{pmatrix}
+ \begin{pmatrix} 0 \\ q_2(t) \\ \vdots \\ 0 \end{pmatrix}
+ \cdots
+ \begin{pmatrix} 0 \\ 0 \\ \vdots \\ q_N(t) \end{pmatrix}
= \begin{pmatrix} q_1(t) \\ q_2(t) \\ \vdots \\ q_N(t) \end{pmatrix}
= \vec{q}(t) \tag{1}
$$

Operating from this form, one tries to transform the data using a set of $M$ linearly independent vectors or modes $\vec{v}^{(k)}$. Assuming the presence of redundancies in the data, the number $M$ of vectors needed to describe the data will be smaller than the number of original time series, that is, $M < N$. However, the data might not contain such redundancies and such a representation should therefore be seen as an approximation that formally reads (see footnote 1)

$$
\vec{q}(t) \approx \vec{q}^{(M)}(t) = \sum_{k=1}^{M<N} \xi_k(t)\,\vec{v}^{(k)} \tag{2}
$$

The proper choice of both the modes $\vec{v}^{(k)}$ and the corresponding time series $\xi_k = \xi_k(t)$ is the central concern of PCA.

Fig. 1. (a) Geometrical determination of the first principal mode of a three-dimensional data set: (left panel) time series $q_1, \ldots, q_3$; (middle panel) $q_1, \ldots, q_3$ as point distribution in the corresponding vector space and (right panel) projection of the data on the first mode resulting in the corresponding time series $\xi_1$; see text for further details. (b) Geometrical determination of the second and third modes corresponding to the three time series in (a): (left panel) point distribution in the vector space $[q'_1, q'_2]$ that is orthogonal to the one shown in Fig. 1a and (right panel) projection of the data on the second and third mode resulting in time series $\xi_2$ and $\xi_3$; cf. Fig. 2a and see text for further details.

2 To simplify the mathematics, the distance is quantified by its quadratic form.


More than a century ago Pearson (1901) identified a possible procedure by means of a geometrical solution: Suppose one has a three-dimensional set of data points recorded at discrete times $t_i$, i.e. $\{q_1(t_i), q_2(t_i), q_3(t_i)\}$, as shown in the left panel of Fig. 1a. Representing these data in the corresponding vector space results in a distribution of discrete points (Fig. 1a, middle panel) and the direction, along which this distribution spreads most, is referred to as first principal axis $\vec{v}^{(1)}$. Subsequently, projecting the data onto this direction yields a time series $\xi_1(t_i)$ that reflects the evolution along the first principal axis; see Fig. 1a, right panel. The remaining components are determined equivalently after projecting the data onto the, here two-dimensional, orthogonal subspace as shown in Fig. 1b.

In accordance with this geometrical view, but rendering the analysis more general, a criterion for determining optimal (principal) modes that approximate data by means of Eq. (2) can be given via the distance between the data $\vec{q}(t)$ and the approximation $\vec{q}^{(M)}(t)$. That is, dependent on the number of modes $M$, this distance, or the error, needs to be minimal. However, to account for the time span $0 \le t \le T$, during which $\vec{q}(t)$ evolves (or is recorded), the instantaneous distance (see footnote 2) is replaced by its mean over time. In practice, the mean computation $\langle \ldots \rangle_T$ for variables evolving continuously in time reads $\langle f(t) \rangle_T = \frac{1}{T}\int_0^T f(t)\,dt$, whereas for discrete (equally sampled) data, i.e. for $f(t) \to f(i \cdot \Delta t) = f_i$ with $i = 1, 2, \ldots, n$, we calculate the mean as $\langle f(t) \rangle_T = \frac{1}{n}\sum_{i=1}^{n} f_i$. Without loss of generality we here consider continuously evolving time series so that the least squares error minimization becomes

$$
\mathrm{Error}(M) = \frac{1}{T}\int_0^T \left[ \vec{q}(t) - \sum_{k=1}^{M} \xi_k(t)\,\vec{v}^{(k)} \right]^2 dt \overset{!}{=} \text{minimal} \tag{3}
$$
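
For equally sampled data the time average and the error of Eq. (3) reduce to plain sums. The following Python/NumPy sketch illustrates this discrete form; it is not part of the original analysis. The function names `time_mean` and `approximation_error`, the two mode vectors and the toy amplitudes are hypothetical choices made only for this illustration.

```python
import numpy as np

def time_mean(f):
    """Discrete time average <f>_T = (1/n) * sum_i f_i for equally spaced samples."""
    return np.mean(f, axis=0)

def approximation_error(q, modes, xi):
    """Discrete analogue of Eq. (3).

    q     : array (n, N), samples of the N signals
    modes : array (M, N), one mode vector per row
    xi    : array (n, M), time series belonging to each mode
    """
    residual = q - xi @ modes                    # q(t_i) - sum_k xi_k(t_i) v^(k)
    return time_mean(np.sum(residual ** 2, axis=1))

# Toy data (hypothetical): two modes with known time courses.
t = np.arange(1000) * 0.01                       # 10 s sampled at dt = 0.01 s
v1 = np.array([1.0, 0.5, 0.0])
v2 = np.array([0.0, 0.0, 1.0])
xi = np.column_stack([np.sin(2 * np.pi * t), 0.1 * np.cos(2 * np.pi * t)])
q = xi @ np.vstack([v1, v2])

err_two_modes = approximation_error(q, np.vstack([v1, v2]), xi)   # both modes kept
err_one_mode = approximation_error(q, v1[None, :], xi[:, :1])     # second mode dropped
print(err_two_modes, err_one_mode)
```

With both modes retained the error vanishes; dropping the weaker mode leaves exactly its mean squared amplitude, here 0.1² · ⟨cos²⟩ = 0.005.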

Karhunen (1946) and Loève (1945, 1948) showed that this minimization can be realized in terms of an eigenvalue problem, which can be solved algebraically; in fact, Pearson's geometrical (analytical) solution is basically mapped onto conventional linear algebra. In brief, the least square distances are combined into the so-called covariance matrix $\{\mathrm{Cov}_{ij}\}$, whose coefficients read

$$
\mathrm{Cov}_{ij} = \left\langle \left[ q_i(t) - \langle q_i(t) \rangle_T \right] \left[ q_j(t) - \langle q_j(t) \rangle_T \right] \right\rangle_T \tag{4}
$$

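Similar to the correlation matrix, the form (4) is commonly used in statistics to compare different data sets. Here, however, the covariance matrix is further rescaled to unit trace, that is, using (4) one normalizes every coefficient via the trace of the matrix, that is, $\sum_{i=1}^{N} \mathrm{Cov}_{ii}$. The eigenvalues $\lambda_k$ and eigenvectors $\vec{v}^{(k)}$ of the resulting matrix can be determined solving the equation

$$
\frac{1}{\sum_{i=1}^{N} \mathrm{Cov}_{ii}}
\begin{pmatrix}
\mathrm{Cov}_{11} & \mathrm{Cov}_{12} & \cdots & \mathrm{Cov}_{1N} \\
\mathrm{Cov}_{21} & \mathrm{Cov}_{22} & \cdots & \mathrm{Cov}_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}_{N1} & \mathrm{Cov}_{N2} & \cdots & \mathrm{Cov}_{NN}
\end{pmatrix} \cdot \vec{v}^{(k)}
= \lambda_k
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix} \cdot \vec{v}^{(k)} \tag{5}
$$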

The eigenvectors directly correspond with the principal modes introduced in the preceding. In addition, we find for the eigenvalues $\lambda_k$ the following boundaries and ranking

$$
\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_N \ge 0 \quad \text{with} \quad \sum_{k=1}^{N} \lambda_k = 1 \tag{6}
$$
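
Eqs. (4) to (6) translate almost line by line into a numerical routine. The sketch below is one possible Python/NumPy rendering, given here as an illustration under stated assumptions and not as the implementation used for the results in this tutorial. The function names and the random test signals are hypothetical; `numpy.linalg.eigh` is used because the matrix is real and symmetric.

```python
import numpy as np

def normalized_covariance(q):
    """Covariance matrix of Eq. (4), rescaled to unit trace.

    q : array (n, N) holding n equally spaced samples of N signals.
    """
    centered = q - q.mean(axis=0)                 # q_i(t) - <q_i(t)>_T
    cov = centered.T @ centered / q.shape[0]      # time average of the products
    return cov / np.trace(cov)                    # normalization by sum_i Cov_ii

def principal_modes(q):
    """Solve the eigenvalue problem of Eq. (5).

    Returns the eigenvalues in descending order and the modes as rows.
    """
    lam, vecs = np.linalg.eigh(normalized_covariance(q))   # ascending order
    order = np.argsort(lam)[::-1]
    return lam[order], vecs[:, order].T

# Hypothetical test signals: four linearly mixed random time series.
rng = np.random.default_rng(1)
q = rng.standard_normal((500, 4)) @ rng.standard_normal((4, 4))

lam, modes = principal_modes(q)
print("eigenvalues:", lam, "sum:", lam.sum())
```

The printed eigenvalues are non-negative, ranked, and sum to one as stated in Eq. (6), and the rows of `modes` are mutually orthogonal unit vectors.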

because the matrix is real, symmetric, and normalized. Importantly, each $\lambda_k$ represents a measure for the variance, deviation, or spread of the data along the corresponding mode $\vec{v}^{(k)}$. Indeed, the spectrum of eigenvalues agrees with the aforementioned geometrically driven iterative procedure, that is, with the subsequent search for the maximal spread of data points in the corresponding vector space. Remarkably, the one-to-one correspondence between the optimization procedure and the eigenvalue problem is only given when the eigenvectors $\vec{v}^{(k)}$ and, thus, the principal directions, are (assumed to be) orthogonal; under this constraint the optimization has a unique solution. Assuming orthogonality of the eigenvectors $\vec{v}^{(k)}$, however, one can immediately compute the evolution along each mode by means of a simple scalar product, that is,

$$
\xi_k(t) = \vec{v}^{(k)} \cdot \vec{q}(t) \tag{7}
$$

In sum, consecutively determined directions of maximal data spread define principal axes $\vec{v}^{(k)}$ along which the data evolve according to time series $\xi_k(t)$. The modes are ranked by means of decreasing contribution to the entire data set as can be quantified by the (normalized) variance, $\lambda_k$.

2.2. Simulated examples

After the inevitably rather abstract mathematical formulation of PCA we now turn to some explicit examples. To keep track of the procedure we restrict ourselves to the discussion of three three-dimensional problems that allow for an immediate visualization and may thus help the reader to get a feel for the mapping onto principal axes. In detail, the data consists of two differently correlated processes that mix within one of the three components. For the sake of simplicity, let one of the processes be dictated by a simple sine-function, i.e., $q_1(t) = \sin 2\pi t$, whereas the other is entirely random in terms of uncorrelated (white) noise with vanishing mean, i.e., $q_3(t) = \text{noise}$ with $\langle \text{noise} \rangle_T = 0$. The third signal, $q_2(t)$, is assumed to be an additive mixture of these two. As depicted in Fig. 2, plotting each of these (sampled) time series results in a data distribution in the form of a plane (Fig. 2 left panels). The orientation of the plane coincides with the principal axes $\vec{v}^{(1)}$, $\vec{v}^{(2)}$, and $\vec{v}^{(3)}$. Projecting the data onto these axes results in time series $\xi_k(t)$, in which $\xi_1$ exclusively contains the deterministic process (sine-function) and $\xi_2$ solely reflects the white noise. The last component, $\xi_3$, is irrelevant as can be appreciated from the fact that $\lambda_3 \approx 0$. In fact, because the third (and last) eigenvalue $\lambda_3$ vanishes, the system turns out to be effectively two-dimensional by means of two independent processes that evolve like $\xi_1$ and $\xi_2$.

Fig. 2. Example $[q_1, q_2, q_3] = [\sin 2\pi t,\ 0.5 \sin 2\pi t + \text{noise},\ \text{noise}]$, sampled with $\Delta t = 0.01$ s for 10 periods, i.e., $0\ \text{s} \le t \le 10\ \text{s}$. The extreme left panels show the three individual time series that are plotted in the corresponding phase space in the adjacent panel to the right. After determining the principal axes the projection results in time series $\xi_1$, $\xi_2$ and $\xi_3$ depicted in the extreme right panels. The contributions of each of the $\xi_k$ is given via the corresponding eigenvalue: here $\lambda_1 = 97\%$, $\lambda_2 = 3\%$ and $\lambda_3 = 0$. Since $\lambda_3$ vanishes the phase space becomes basically two-dimensional, that is $[\xi_1, \xi_2]$; see text for further details.

To complicate things a little further, we assume next that we have another three-dimensional system, whose components are basically described by a sinusoidal oscillation and an additive, individual noise term. Furthermore, we randomize the phase of every oscillator so that the resulting time series look like the ones shown in Fig. 3 (left panel). In contrast to the previous example, none of the three eigenvalues $\lambda_k$ vanishes but projection onto the principal axes reveals the system's contents: $\xi_1$ basically reflects the sinusoidal part (including the randomized phase), whereas all the three additive noise terms are combined in $\xi_2$ and $\xi_3$. Hence, although the system remains three-dimensional the deterministic sinusoidal process generates the most spread or variance in the data set and appears to be independent of the system's noise (apart from the random phase).

Fig. 3. Example $q_j = \sin(2\pi t + \text{noise}_{1,j}) + \text{noise}_{2,j}$, sampled with $\Delta t = 0.01$ s for 10 periods, i.e., $0\ \text{s} \le t \le 10\ \text{s}$; the contributions of $\xi_j(t)$ are given as $\lambda_1 = 76\%$ and $\lambda_2 = 14\%$ and $\lambda_3 = 10\%$; see text and compare with Fig. 2.

Focusing a bit more on the phase we realize that due to the use of the covariance as error measure, sine and cosine terms become independent (see footnote 3). Consequently, two oscillations with a phase difference of 90° should be viewed as two individual processes (which will become important later on when discussing oscillatory components during walking). As shown in Fig. 4 cosine and sine time series form a circle in phase space, whose description requires two dimensions. Accordingly, only one eigenvalue vanishes while the two other reflect the effective values of the sine and cosine functions. The system's evolution in the phase space spanned by $\xi_1$ and $\xi_2$ is shown in Fig. 4 (right panels).

Fig. 4. Example $[q_1, q_2, q_3] = [\sin 2\pi t,\ 0.5 \sin 2\pi t,\ \cos 2\pi t]$; cosine and sine function turn out to be independent so that the individual contributions of $\xi_1(t)$ and $\xi_2(t)$ are given as $\lambda_1 = 56\%$ and $\lambda_2 = 44\%$, respectively; see also text and compare with Figs. 2 and 3.

3 Remember the relation between correlation/covariance functions and the Fourier transform: when utilizing scalar product as given in (3), sine and cosine functions become orthogonal and so do, e.g., $\sin(\omega t)$, $\sin(2\omega t)$, $\sin(3\omega t)$, and so on (see also below).
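
The three simulated examples can be re-created approximately with a few lines of Python/NumPy. The sketch below is an illustration only, not the code used to produce Figs. 2 to 4. The noise amplitude `sigma`, the random seed, and the choice of independent white noise per sample for the phase term are assumptions (the text does not state them), and the projection of Eq. (7) is applied to the mean-subtracted data.

```python
import numpy as np

def pca(q):
    """Eigenvalues (descending, unit sum), modes (rows) and projections xi."""
    centered = q - q.mean(axis=0)
    cov = centered.T @ centered / q.shape[0]        # Eq. (4)
    lam, vecs = np.linalg.eigh(cov / np.trace(cov)) # Eq. (5), ascending order
    order = np.argsort(lam)[::-1]
    modes = vecs[:, order].T
    return lam[order], modes, centered @ modes.T    # last item: Eq. (7)

rng = np.random.default_rng(0)                      # arbitrary fixed seed
t = np.arange(1000) * 0.01                          # dt = 0.01 s, ten periods
sine = np.sin(2 * np.pi * t)
sigma = 0.1                                         # assumed noise amplitude

# First example: sine, additive mixture, white noise.
noise = sigma * rng.standard_normal(t.size)
ex1 = np.column_stack([sine, 0.5 * sine + noise, noise])

# Second example: three oscillators with noisy phase and additive noise.
phase = sigma * rng.standard_normal((t.size, 3))
ex2 = np.sin(2 * np.pi * t[:, None] + phase) + sigma * rng.standard_normal((t.size, 3))

# Third example: sine, scaled sine, cosine.
ex3 = np.column_stack([sine, 0.5 * sine, np.cos(2 * np.pi * t)])

spectra = {}
for name, data in [("ex1", ex1), ("ex2", ex2), ("ex3", ex3)]:
    lam, modes, xi = pca(data)
    spectra[name] = lam
    print(name, np.round(100 * lam, 1), "%")
```

For the third example the result does not depend on any assumed parameter: the two leading eigenvalues come out near 56% and 44% with a vanishing third one, in line with Fig. 4. The spectra of the first two examples depend on the assumed noise level and therefore need not match the percentages quoted in the captions.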

Before continuing with more realistic, empirical

examples some words of caution are in order. Analyzing

multivariate signals by means of principal componentsshould never be viewed as a simple exercise, and should

never be done cavalierly. The last example readily

Fig. 5. (Left panel) Positions of the 23 markers. Movements of the

head, shoulders, elbows, wrists, hands, pelvis, hip, knees, ankles, feet,

and the trunk were recorded during treadmill walking (a total of

23 · 3¼ 69 time series sampled with 60 Hz; three-dimensional record-

ing system Optotrak, Northern Digitale, Northern Digital Inc., On-

tario, Canada). (Right panel) the dynamics of the markers displayed as

trajectories. To facilitate the subsequent interpretation, we corrected

for eventual drifts in the center of mass by high-pass filtering the data

(cut-off at 0.2 Hz).

10-1

100

43.7

8%

21.4

6%

15.51

%

10.1

4%

.74%

λk

A. Daffertshofer et al. / Clinical Biomechanics 19 (2004) 415–428 421

showed that even when searching solely for the number of relevant features in the data, an exclusive look at the spectrum of eigenvalues $\lambda_k$ does not provide all the necessary information. Of course, the eigenvalues quantify the total strength of a certain mode within the entire data set. The modes in the example depicted in Fig. 5, however, are certainly not 'independent' from each other but only deviate by means of a simple phase shift. Thus, in order to properly analyze and interpret the modes, it is necessary to compare the resulting mode evolutions, i.e., the projections $\xi_k(t)$, with each other. The eigenvectors $\vec{v}^{(k)}$, or more specifically their coefficients $v_1^{(k)}, v_2^{(k)}, \ldots, v_N^{(k)}$, provide information about the degree to which the individual (original) signals contribute to the corresponding mode. Hence, it is always advisable to investigate all the available results from PCA, that is, eigenvalues $\lambda_k$, eigenvectors $\vec{v}^{(k)}$, and projections $\xi_k(t)$, and to compare them across modes. Only such a combined view will allow for the identification of distinct processes in the system under study.
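The three kinds of PCA output named above can be obtained together from one eigendecomposition of the covariance matrix. The following is a minimal sketch, assuming NumPy and a synthetic data set of a sine, a cosine, and their sum; the function name `pca` and the test signals are illustrative choices, not the authors' implementation.

```python
import numpy as np

def pca(data):
    """Illustrative PCA. data has shape (T, N): T time samples of N signals.

    Returns the eigenvalues (normalized to sum to 1, descending), the
    eigenvectors as columns, and the projections xi_k(t) as columns.
    """
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort to descending
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    projections = centered @ eigvecs            # xi_k(t) for every mode k
    return eigvals / eigvals.sum(), eigvecs, projections

# Synthetic example (assumed): three signals built from only two sources.
t = np.linspace(0.0, 10.0, 2000, endpoint=False)
s, c = np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)
signals = np.column_stack([s, c, s + c])

lam, v, xi = pca(signals)
# One eigenvalue vanishes because the third signal is redundant. The two
# remaining projections are uncorrelated at zero lag, yet they are the same
# oscillation shifted in phase, which only a look at xi_k(t) reveals.
zero_lag_corr = np.corrcoef(xi[:, 0], xi[:, 1])[0, 1]
```

Plotting `xi[:, 0]` against `xi[:, 1]` would give a closed curve in this toy case, which is the kind of relation between modes that the eigenvalue spectrum alone cannot show.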


Fig. 6. Eigenvalue spectrum on a logarithmic scale as explained in the text. The sum of the first 4 eigenvalues covers $\sum_{k=1}^{4} \lambda_k = 90.9\%$ of the variance in the data. After mode 4 there is a rapid, discontinuous drop in the eigenvalues, which indicates that only 4 modes are necessary to represent the main features of the data.

3. Data reduction: two examples

3.1. Kinematic data during walking

To identify different walking patterns or gait forms, or to evaluate changes therein due to pathology, recovery or intervention, both similarities and differences between gait recordings need to be assessed. Thus far, this has proven to be rather difficult (Cappozzo, 2002; Chau, 2001a,b; Chau and Rizvi, 2002; Leurgans et al., 1993). Here we apply PCA to the time series of the three Cartesian coordinates of 23 markers recorded during treadmill walking (in total $N = 69$ signals; see Fig. 5). In line with other studies on gait patterns, we expect to find a fairly small number of relevant modes that suffice to describe the essential features of (normal) gait (see, e.g., Chau, 2001a,b, for a review). Indeed, looking exclusively at the kinematics, walking appears to be composed of rather steady coordination modes, e.g., cyclic leg and arm movements that alternate at a fixed rate.

Interestingly, this small number of relevant modes does not increase when adding signals that are not necessarily linearly correlated with the arm/leg walking pattern, such as movements of the head. As mentioned earlier, the number of relevant modes can be determined by looking at the eigenvalues of the covariance matrix. Note that prior to the computation of the principal components, kinematic walking signals should be rescaled to unit variance because otherwise the first modes will always reflect the signals with the largest amplitudes, here the feet and hands. Treadmill walking results in an eigenvalue spectrum that even on a logarithmic scale displays a drastic decrease of the eigenvalues beyond mode four (see Fig. 6). Indeed, the first four modes cover about 90% of the data's spread and can thus be assumed to cover most, if not all, relevant (coherent) features of the signals.
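The effect of rescaling to unit variance on the eigenvalue spectrum can be illustrated with a small sketch. It assumes NumPy and an invented data set in which one large-amplitude signal (standing in for a foot marker) is recorded alongside three small-amplitude, mutually phase-shifted signals; the names `standardize` and `eigenvalue_spectrum` and all amplitudes are hypothetical.

```python
import numpy as np

def standardize(data):
    """Remove the mean and rescale every column (signal) to unit variance."""
    centered = data - data.mean(axis=0)
    return centered / centered.std(axis=0)

def eigenvalue_spectrum(data):
    """Eigenvalues of the covariance matrix, descending, normalized to sum to 1."""
    eigvals = np.linalg.eigvalsh(np.cov(data, rowvar=False))[::-1]
    return eigvals / eigvals.sum()

rng = np.random.default_rng(0)
t = np.linspace(0.0, 20.0, 1200, endpoint=False)
large = 50.0 * np.sin(2 * np.pi * t)                      # assumed large amplitude
small = np.column_stack([0.5 * np.sin(4 * np.pi * t + p)  # assumed small amplitudes
                         for p in (0.0, 0.3, 0.6)])
data = np.column_stack([large, small]) + 0.01 * rng.standard_normal((t.size, 4))

raw = eigenvalue_spectrum(data)                  # dominated by the large signal
scaled = eigenvalue_spectrum(standardize(data))  # amplitudes no longer decide
cumulative = np.cumsum(scaled)                   # fraction covered by first k modes
```

Without rescaling, the first mode is almost entirely the large-amplitude signal; after rescaling, the coherent group of small signals carries the first mode and `cumulative` can be read like the percentages quoted for Fig. 6.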

A closer look at these first four modes reveals that modes 1 and 3 primarily reflect the arm and foot movements including all the phase-locked components that oscillate at the stride or walking frequency (e.g., knee and hip positions). In contrast, modes 2 and 4 appear to oscillate at twice the basic movement frequency (i.e., the step frequency), reflecting knee and ankle bending, as well as body sway. In fact, all the phase-locked components that oscillate at this double frequency contribute to modes 2 and 4 as well (Fig. 7a). Notice that, although by definition all the individual principal modes are linearly independent, pair-wise combinations between modes present themselves. These pair-wise combinations are manifest because, like in the example discussed in Fig. 4, the signals within each identified pair (e.g., $\xi_1$ and $\xi_3$) evolve periodically at identical (fundamental) frequencies with a phase shift of 90°.

Obviously, these dynamics reflect three-dimensional, pendulum-like oscillations. The fact that the oscillations are pendulum-like implies that they are governed by holonomic constraints due to the skeleton, causing kinematic redundancies (Hazan and Thomas, 1999). One can eliminate these redundancies either by transforming the Cartesian coordinates into polar coordinates and restricting the analysis to (segment) angles (which boils down to an a priori reduction of information), or by performing subsequent analyses of the time series $\xi_k(t)$ aimed at disclosing the redundancies afterwards (see Fig. 7b). The latter method is indicated whenever there is reason to believe that the operative constraints are not strictly holonomic, as is the case

when the measured segment lengths are not constant (e.g., due to skin deformation and sliding markers).

The less obvious evolutions are found in the subsequent, higher modes. Recall the strong relation between power spectrum and covariance function that can be appreciated here by the orthogonality between cosine and sine function or between $\sin(\omega t)$ and $\sin(2\omega t)$. As such, modes 2 and 4 might be viewed as higher harmonics of modes 1 and 3. Here, however, this interpretation is inadequate because of the absence of any (substantial) higher harmonics in the arm swing (see Fig. 7a). In contrast, higher modes containing coordinates related to the feet, whose evolution is far from sinusoidal, may be readily interpreted in this way.

Remember that the dynamics of $\xi_k(t)$ should be interpreted with caution since spectral components may mix across modes (e.g., what appears to be frequency doubling may just be a reflection of higher harmonics). Apart from specific frequency components, higher modes may reveal slow processes like a drifting center of mass (with zero mean), other coherent signals that are not frequency-locked, and purely random fluctuations. Given that, in the present analysis, the higher modes do not show any further marked drops in the eigenvalues, we abstain from looking at them individually but rather combine them into an overall residual part (see Section 4). Before doing so, however, we first illustrate how PCA may be used to analyze entirely different signals, that is, electromyograms (EMG).

Fig. 7. (a) First four eigenmodes during treadmill walking. Notice that in order to compute the principal components each original time series was first rescaled to unit variance (see text). (b) Phase portraits of $\xi_1$ vs. $\xi_3$ and $\xi_2$ vs. $\xi_4$ hinting at oscillations under holonomic constraints; compare with Fig. 4 and see text for further details.
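The 90° phase relation between paired modes discussed in Section 3.1 can be checked numerically by comparing the Fourier phases of two projections at their shared dominant frequency. The sketch below assumes NumPy, the 60 Hz sampling rate mentioned for the kinematic recordings, and an invented stride frequency of 0.9 Hz; the helper `phase_lag_at_peak` and the synthetic projections are hypothetical stand-ins for $\xi_1(t)$ and $\xi_3(t)$.

```python
import numpy as np

def phase_lag_at_peak(x, y, fs):
    """Phase of y relative to x (degrees) at the dominant frequency of x.

    Returns (peak frequency in Hz, phase lag in degrees).
    """
    x = x - x.mean()
    y = y - y.mean()
    X = np.fft.rfft(x)
    Y = np.fft.rfft(y)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    peak = np.argmax(np.abs(X[1:])) + 1          # skip the zero-frequency bin
    lag = np.angle(Y[peak] * np.conj(X[peak]), deg=True)
    return freqs[peak], lag

fs = 60.0                                        # sampling rate in Hz
t = np.arange(1200) / fs                         # 20 s of data
stride_freq = 0.9                                # assumed value, for illustration
xi1 = np.cos(2 * np.pi * stride_freq * t)        # stand-in for xi_1(t)
xi3 = np.sin(2 * np.pi * stride_freq * t)        # stand-in for xi_3(t)

peak_freq, lag_deg = phase_lag_at_peak(xi1, xi3, fs)
```

For real projections the peak will not fall exactly on a frequency bin, so the estimated lag should be read as approximate; the phase portraits of Fig. 7b convey the same information graphically.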

3.2. EMG signals during walking

Compared to kinematic data, EMG signals are much more variable, with the degree of variability depending on the muscles, internal physiological factors and the prevailing task conditions. To briefly illustrate this kind of application, we apply the method to data from an experimental study aimed at assessing the effect of experimentally induced low back pain in healthy subjects on both structural (i.e., invariant) and variable properties of patterns of trunk coordination as well as back muscle activity during walking (Lamoth et al., in press). Here, we focus on the EMG traces of left and right thoracic and lumbar muscles (in total $N = 6$ signals) during two stride cycles. The activity patterns are rather consistent across muscles. As it turns out, two principal components are sufficient to describe almost 90% of the data ($\lambda_1 + \lambda_2 = 88\%$). The first eigenmode covers all the participating muscles, as they are collectively active at every instant of foot/ground contact (i.e., twice per stride cycle). The second mode, in contrast, predominantly contains the activities of the thoracic muscles, which are 1:1 frequency- and antiphase-locked with the strides. Thus, adding modes 1 and 2 for these muscles results in activations at both contact points with higher peaks contralateral to the foot touching the ground (Fig. 8).

Put differently, the second mode oscillates at half the frequency of the first (i.e., the dominant and homogeneous) mode. The second mode, thus, modulates the strides onto the step oscillation and allows for the discrimination between left and right movements, particularly for the thoracic muscles (the effect seems to be less pronounced for L2).

Fig. 8. Eigenvalue spectrum (left), projections (second column from the left), eigenmodes (third column from the left displays the coefficients of $\vec{v}^{(1)}$, upper panel, and $\vec{v}^{(2)}$, lower panel), and the corresponding frequency distributions (right) of six back muscle activities (left thoracic, L2, L4 and right thoracic, L2, L4). Two modes turn out to be most dominant: while the first one is homogeneous and oscillates with the step frequency, the second one reflects left/right asymmetry predominantly in the thoracic muscles oscillating with the stride frequency (see text for further details). Bipolar EMGs were recorded during treadmill walking, rectified and low-pass filtered (cut-off at 20 Hz); see Lamoth et al. (in press) for further details on recordings and off-line data processing.

As we already mentioned in the analysis of the kinematic data, the higher modes can be difficult to interpret. In the subsequent sections about the use of PCA as a data-driven filter, we show how these higher modes may provide a window into the study of variability and noise.

4. Data-driven filtering

In the field of motor control, it is by now well recognized that human movement is intrinsically variable and that in-depth analyses of movement variability may provide insight into underlying control structures. In a similar vein, there is growing recognition that detailed analyses of motor variability may be instrumental in understanding patterns of pathological movement (e.g., Collins and De Luca, 1993; Hausdorff et al., 1995; Newell and Corcos, 1993). Generally speaking, variability is a difficult and often elusive property of movement systems as it stems from both deterministic and stochastic processes. Thus, one is confronted with the challenge to tease apart deterministic and stochastic components of movement patterns.

Fig. 9. Global gait pattern and filtered (residual) pattern. While the sum of the first four (most dominant) eigenmodes, $\sum_{k=1}^{4} \xi_k(t)\,\vec{v}^{(k)}$, results in a walking pattern that is almost identical to the original data set (compare the pattern on the left-hand side with Fig. 5), the filtered pattern, $\sum_{k=5}^{69} \xi_k(t)\,\vec{v}^{(k)}$ (right-hand side), indicates that the largest residual variability is located in the foot, knee, and hand movements.

As we have shown in the preceding, PCA permits the detection of main features, referred to as principal components, resulting in a (usually desirable) reduction of dimensionality. On the other hand, PCA also allows for a separation of main and residual components within a data set. Viewing consistent features as coherent components implies that the mechanisms generating these common structures follow deterministic rules; otherwise they would not be consistent/coherent. In contrast, the residual components often contain a degree of randomness or stochasticity. In the following, we will eliminate the consistent, dominant features by subtracting them from the data, thus shifting the focus of the analysis to less coherent, more variable aspects of the signals. Based on approximation (2) we first define deviations of the data from the common pattern as residual components by means of

$$\vec{q}(t) \approx \vec{q}^{(M)}(t) \;\Rightarrow\; \vec{q}(t) - \vec{q}^{(M)}(t) = \vec{q}^{(\mathrm{residual})}(t) \qquad (8)$$

Especially when partitioning signals into deterministic and stochastic components, subtracting either the one or the other from the signal can be seen as filtering the noise or the common parts, respectively. In this spirit we recast (8) in terms of global patterns, $\vec{q}^{(\mathrm{global})}(t) = \vec{q}^{(M)}(t)$, and filtered components, $\vec{q}^{(\mathrm{filtered})}(t) = \vec{q}^{(\mathrm{residual})}(t)$. Put differently, we partition the data into two parts

$$\vec{q}(t) = \underbrace{\sum_{k=1}^{M<N} \xi_k(t)\,\vec{v}^{(k)}}_{\vec{q}^{(\mathrm{global})}(t)} + \underbrace{\sum_{k=M+1}^{N} \xi_k(t)\,\vec{v}^{(k)}}_{\vec{q}^{(\mathrm{filtered})}(t)} \qquad (9)$$

Since $\vec{q}^{(\mathrm{global})}(t)$ is given as the sum of (a few) dominant principal components, the resulting filter characteristic depends exclusively on the data itself, except for a single parameter, $M$, the number of modes. Of course, the number of modes that define the global pattern influences the filtered pattern. Indeed, the choice of $M$ is by no means trivial and, in view of the examples listed above, one has to take into account both the eigenvalue spectrum and the evolution of the extracted eigenmodes. While the former needs to show discontinuously decreasing eigenvalues by means of $\lambda_{k>M} \ll \lambda_{k \le M}$ (even on a logarithmic scale), the dynamics $\xi_k(t)$ might help to account for redundancies, such as those due to holonomic constraints.
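One simple way to turn the eigenvalue criterion into a number is to look for the largest drop between successive eigenvalues on a logarithmic scale. The sketch below assumes NumPy; the helper `modes_before_largest_drop` is hypothetical, the first four spectrum values are the labels readable in Fig. 6, and the remaining values are invented placeholders. It covers only the eigenvalue part of the choice of $M$; the inspection of $\xi_k(t)$ still has to be done separately.

```python
import numpy as np

def modes_before_largest_drop(eigvals):
    """Return M such that the largest log10 drop lies between modes M and M+1.

    eigvals must be strictly positive and sorted in descending order.
    """
    lam = np.asarray(eigvals, dtype=float)
    drops = np.log10(lam[:-1]) - np.log10(lam[1:])
    return int(np.argmax(drops)) + 1

# Percentages of variance: modes 1 to 4 as labeled in Fig. 6; the tail is
# an assumed, purely illustrative continuation.
spectrum = [43.78, 21.46, 15.51, 10.14, 2.7, 1.3, 0.9, 0.7, 0.5, 0.4]
M = modes_before_largest_drop(spectrum)
```

A single largest-drop rule is a heuristic; when several drops are of similar size, or when paired modes straddle the cut, the value of `M` should be adjusted by hand.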

Below we apply this data-driven filter to the data sets

used earlier in order to elucidate the main idea behind

this application.
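The partition into a global and a filtered part can be sketched in a few lines. The example assumes NumPy, mean-centered data, and a synthetic data set of six signals generated from two oscillatory sources plus noise; the function name `pca_filter`, the mixing weights, and the noise level are illustrative assumptions rather than the procedure used for the figures.

```python
import numpy as np

def pca_filter(data, n_modes):
    """Split mean-centered data (T, N) into global and filtered parts.

    global_part   = sum over the first n_modes of xi_k(t) v^(k)
    filtered_part = sum over the remaining modes
    """
    centered = data - data.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]   # descending eigenvalues
    xi = centered @ eigvecs                           # projections xi_k(t)
    global_part = xi[:, :n_modes] @ eigvecs[:, :n_modes].T
    filtered_part = xi[:, n_modes:] @ eigvecs[:, n_modes:].T
    return global_part, filtered_part

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 600, endpoint=False)
sources = np.column_stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
mixing = np.array([[1.0, 0.8, 0.5, 0.2, -0.4, -0.9],   # assumed weights
                   [0.1, 0.5, 0.9, 1.0, 0.7, 0.3]])
data = sources @ mixing + 0.05 * rng.standard_normal((t.size, 6)) + 3.0

global_part, filtered_part = pca_filter(data, n_modes=2)
reconstruction_ok = np.allclose(global_part + filtered_part,
                                data - data.mean(axis=0))
```

Because the eigenvectors form an orthonormal basis, the two parts always add up to the centered data, so nothing is lost by the split; only the choice of `n_modes` decides what counts as global and what as residual.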

4.1. Kinematic data during walking

As already indicated in Section 3.1, the first four principal components identified for the kinematic data during treadmill walking cover a high amount of variance: $\sum_{k=1}^{4} \lambda_k > 90\%$. Hence, one can expect the residual components to be rather small. As illustrated in Fig. 9, the filtered data indeed show fairly small amplitudes and are concentrated predominantly around the feet. To reiterate, during walking the trajectories of the feet contain numerous higher harmonics due to the cyclic contact with the ground. Therefore, the residual pattern appears to be less random and rather summarizes these remaining higher harmonics. A closer look at the individual coefficients further indicates that these higher harmonics are also present in the signals recorded at the knees but 'damp out' with increasing distance from the ground; in the present data set the markers placed on the back and head seem to move rather steadily. Interestingly, we also find a reasonable amount of residual movements in the hand, which cannot be explained by the ground contact points. These components might be caused by the relatively small inertia of the hand. This small inertia allows for subtle functional adjustments (e.g., in view of balance) as well as for a stronger expression of small (and fast) random fluctuations. Either way, the oscillations of the hand are less sinusoidal both in terms of higher harmonics and random fluctuations.

4.2. EMG signals during walking

As stated before, EMG signals are highly variable. Especially when studying EMG over longer periods one can expect that the global pattern, as extracted by PCA, does not cover all the relevant features. To emphasize this characteristic we extend the data set reported in Section 3.2 by adding several consecutive strides (assuming that the strides are independent from each other). That is, we do not only compare different muscles during walking but also examine the consistency of the activity patterns over a finite bout of walking. Notice that consecutive recordings are usually lumped together to improve the reliability of statistical estimates (e.g., by computing mean values), whereas here we leave the specific choice of an optimally weighted combination up to the PCA. In detail, we split the recordings as obtained at three walking velocities into subsequent strides containing signals of four lumbar muscles each, resulting in a total of 21 strides × 4 muscles × 3 velocities, i.e., $N = 252$ signals.
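Arranging strides as separate signals requires that all strides share a common time axis. The sketch below shows one way this could be done, assuming NumPy, known stride-onset sample indices, and linear resampling of every stride to 101 points (0 to 100% of the stride cycle). The text does not state how the strides were time-normalized, so the resampling step, the function `strides_to_columns`, and the random placeholder recordings are all assumptions.

```python
import numpy as np

def strides_to_columns(emg, stride_starts, n_points=101):
    """Cut emg (T, n_muscles) at stride onsets and resample each stride.

    stride_starts holds n_strides + 1 sample indices. The result has shape
    (n_points, n_strides * n_muscles), one column per stride and muscle.
    """
    phase = np.linspace(0.0, 1.0, n_points)
    columns = []
    for start, stop in zip(stride_starts[:-1], stride_starts[1:]):
        segment = emg[start:stop]
        seg_phase = np.linspace(0.0, 1.0, segment.shape[0])
        for muscle in range(emg.shape[1]):
            columns.append(np.interp(phase, seg_phase, segment[:, muscle]))
    return np.column_stack(columns)

rng = np.random.default_rng(1)
stride_starts = np.arange(0, 22 * 70, 70)      # 22 onsets, hence 21 strides
# Placeholder recordings: 3 velocities, 4 muscles each (random, not real EMG).
emg_by_velocity = [np.abs(rng.standard_normal((1500, 4))) for _ in range(3)]

blocks = [strides_to_columns(emg, stride_starts) for emg in emg_by_velocity]
data = np.hstack(blocks)                       # shape (101, 252)
```

With this layout each column is one of the $N = 252$ signals, so the PCA weights every stride, muscle, and velocity individually instead of averaging over strides beforehand.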

In this rather high-dimensional data set the first three modes cover about 80% of the spread, i.e., $(\lambda_1 = 62\%) + (\lambda_2 = 10\%) + (\lambda_3 = 8\%) = 80\%$. Similar to the example in Fig. 8, the first mode is quite homogeneous in that all trials, strides, and muscles contribute to a similar degree. Its corresponding time series, $\xi_1(t)$, shows activity peaks at each step (compare Figs. 8 and 10 and recall that, in this analysis, we only include the lumbar muscles). Both subsequent modes also show step-related patterns as can be appreciated by the evolutions $\xi_2(t)$ and $\xi_3(t)$. Similar to the thoracic muscles, mode 2 modulates the strides onto the step oscillation: increasing contralateral activity for $\mathrm{L2}_{\mathrm{left}}$ and $\mathrm{L2}_{\mathrm{right}}$ vs. increasing ipsilateral activity for $\mathrm{L4}_{\mathrm{left}}$ and $\mathrm{L4}_{\mathrm{right}}$. The effect of walking speed becomes visible in mode 3. At the low velocity (see the first four coefficients of the eigenvector of mode 3 in Fig. 10) adding mode 3 to the previous ones basically shifts the activation in time by decreasing the initial and increasing the final values of each peak. Thus, when walking slowly the back muscle activation is delayed in comparison to walking at a high speed. At the high speed, mode 3 is effectively subtracted from the previous ones, thus shifting the activation peaks slightly backwards in time (see the last four coefficients of the eigenvector of mode 3 in Fig. 10). At the intermediate velocity (coefficients 5–8 of the eigenvector of mode 3 in Fig. 10) the four muscles show differential effects, that is, both L2 muscle activations are delayed relative to both L4 muscle activations.

Fig. 10. PCA of EMG data as discussed in the text: we used four lumbar muscles × 21 strides × 3 velocities (3.8, 4.6, and 5.4 km/h), i.e., $N = 252$ signals. A significant decrease in eigenvalues after mode 3 (left panel) motivates the filter settings, i.e., $M = 3$, used in Fig. 11. See text for further details. [Panels: eigenvalue spectrum $\lambda_k$ vs. modes $k$; projections $\xi_1$, $\xi_2$, $\xi_3$ vs. % stride cycle; eigenvector coefficients averaged over strides, $\langle v_j^{(k)} \rangle_{\mathrm{strides}}$.]

Apart from these consistent global patterns, the EMG signals contain other important features that are evident in all residual modes, that is, in the filtered data (notice that we combined all remaining modes because further abrupt drops in the eigenvalues are absent). In Fig. 11 we show the original data, its global pattern (given as sum of the first three modes), and the filtered pattern.

In sum, the filtered data mainly consist of rather irregular deviations from the global pattern in the form of phase shifts and (apparently random) amplitude modifications. In view of these irregularities, an appropriate way to further analyze the filtered pattern would be to calculate conventional statistical measures like the overall variance (or standard deviation) and the coefficient of variation. The latter measure seems to be particularly interesting when investigating the effects of

experimental manipulations other than walking velocity. In general, the method discussed in the present section may be used to zoom in on subtle effects of the experimental manipulations that are barely visible, if at all, in the common patterns (or, for that matter, in standard statistics).

Fig. 11. (Left panel) Original time series of the EMG signals in Fig. 10; note that, for the sake of clarity of the plots, here we use only one velocity condition (5.4 km/h), i.e., 84 time series. (Right upper panel) Global pattern, $\sum_{k=1}^{3} \xi_k\,\vec{v}^{(k)}$, constructed via the sum of the first three principal components (each time series reflects a coefficient of the resulting $N$-dimensional vector). (Right lower panel) Filtered pattern, $\sum_{k=4}^{84} \xi_k\,\vec{v}^{(k)}$, by means of the residual: $\vec{q}^{(\mathrm{filtered})} = \vec{q} - \vec{q}^{(\mathrm{global})}$. In all the panels the black curve represents the mean of the individual time series.
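The statistical measures suggested for the filtered pattern in Section 4.2 could be computed as follows. This is a sketch under stated assumptions: NumPy is used, the coefficient of variation is taken as the standard deviation of the filtered part across signals divided by the magnitude of the mean global pattern at each time point (one possible definition, not one given in the text), and the small arrays are made-up numbers chosen so that the result can be verified by hand.

```python
import numpy as np

def residual_statistics(global_part, filtered_part):
    """Summary statistics of the filtered part; arrays have shape (T, N).

    Returns the overall variance, the standard deviation across signals at
    each time point, and that standard deviation relative to the magnitude
    of the mean global pattern (an assumed definition of the CV).
    """
    overall_var = filtered_part.var(ddof=1)
    sd = filtered_part.std(axis=1, ddof=1)
    mean_level = np.abs(global_part.mean(axis=1))   # must be non-zero
    return overall_var, sd, sd / mean_level

# Toy numbers: two time points, two signals.
global_part = np.array([[2.0, 2.0],
                        [4.0, 4.0]])
filtered_part = np.array([[1.0, -1.0],
                          [2.0, -2.0]])

overall_var, sd, cv = residual_statistics(global_part, filtered_part)
```

In this toy case the residual spread doubles with the global level, so the standard deviation differs between the two time points while the coefficient of variation stays the same.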

5. Summary and outlook

As explained in the present tutorial, the primary use of PCA is to obtain a non-redundant set of variables for a compact description of certain processes or phenomena (dimensionality reduction; see footnote 4). That is, PCA, as an unbiased or 'blind' approach, is applied to extract the 'relevant' information in high-dimensional data sets by considering only those principal components that explain sufficiently high fractions of the entire data set in terms of its spread or variance. In addition to this conventional use, we highlighted another use of PCA, namely as a data-driven filter involving the detection and successive extraction of the coherent signal structures, resulting in a residual with a substantial portion of random components. Then, the focal point of the remaining data is the residual variance, which is absent in the overall pattern.

Using both simulated and empirical examples, we illustrated the generality of these two kinds of application of PCA as well as their significance for studies of human movement. As regards possible applications in clinical settings, a host of research areas and topics come to mind that could benefit from the methods outlined here. For instance, afflictions like Parkinson's, Huntington's, and Binswanger's disease are all known to affect gross motor coordination, resulting in non-trivial modifications of movement patterns during whole-body activities such as walking or object manipulation. While comparisons between pathological and healthy coordination patterns abound, only modest advances have been made in precisely identifying and thoroughly quantifying the observed differences. A systematic use of PCA may help to strengthen such comparisons. Furthermore, applying PCA as a function of relevant experimental manipulations (e.g., medication, rehabilitation, or other interventions) may help to evaluate the induced changes in the coordination patterns. If components are identified that are affected most in comparison with healthy subjects, or are most susceptible to treatment, fundamental insights into the (mechanisms of the) underlying impairment might be obtained.

4 In the engineering notion, one seeks a normal reference model, here, to locate pathological deviations from this normal model and to analyze entire time dependencies or waveforms.

In addition to the extraction of dominant modes, PCA may also be successfully used in its capacity as a data-driven filter to closely examine the relation between pathological movement and variability. Especially when marked differences in the dominant modes are absent, this may be a worthwhile line of investigation to pursue. At present, there is growing recognition that the intrinsic variability of motor behavior contains important information about both healthy and pathological motor control (Collins and De Luca, 1993, 1994; Dingwell and Cavanagh, 2001; Dingwell et al., 1999; Giakas and Baltzopoulos, 1997; Goldberger et al., 2002; Hamill et al., 1999; Harris and Wolpert, 1998; Hausdorff et al., 1995, 1996, 1998, 2000; Maki, 1997; Miller et al., 1996; Riley and Turvey, 2002; Scholz et al., 2002; Scholz and Schöner, 1999; Vogt et al., 2001; West and Griffin, 1999). In the study of movement disorders this recognition has thus far mainly resulted in detailed analysis of the variability of coarse-grained variables like the center-of-pressure and stride length, which by definition collapse the properties of a large number of fine-grained variables. Although these studies have led to important insights, such as the observation that successive variations in stride length are more strongly correlated in healthy subjects than in patients with Huntington's disease, they do not unravel the generative, coordinative principles causing such differences. Viewed in this light, the application of multivariate analyses like PCA (see footnote 5) may provide a useful, if not necessary, technique for pinpointing and discriminating between generating components, and may thus contribute profoundly to this currently evolving clinical understanding of movement disorders.

Acknowledgements

This study was financially supported by grants from the Dutch organization for Scientific Research #904-65-090 (MW-NWO), the Dutch Association for Exercise Therapy Mensendieck (NVOM) and the Mensendieck Development Foundation (SOM).

References

Andriacchi, T.P., Alexander, E.J., 2000. Studies of human locomotion: past, present and future. J. Biomech. 33, 1217–1224.

Bernstein, N.A., 1967. The co-ordination and regulation of movements. Pergamon Press, Oxford.

Bianchi, L., Angelini, D., Orani, G., Lacquaniti, F., 1998. Kinematic coordination in human gait: relation to mechanical energy cost. J. Neurophysiol. 79, 2155–2170.

Borghese, N.A., Bianchi, L., Lacquaniti, F., 1996. Kinematic determinants of human locomotion. J. Physiol. (London) 494, 863–879.

Cappozzo, A., 2002. Minimum measured-input models for the assessment of motor ability. J. Biomech. 35, 437–446.

Chau, T., 2001a. A review of analytical techniques for gait data. Part 1: fuzzy, statistical and fractal methods. Gait Posture 13, 49–66.

Chau, T., 2001b. A review of analytical techniques for gait data. Part 2: neural network and wavelet methods. Gait Posture 13, 102–120.

5 In fact, there exists a plenitude of PCA-related decompositions or separations into modes or fundamental patterns. We here only list two quite recent and promising attempts that might become useful in the context of gait analysis. First, as we have shown, PCA modes may turn out to be correlated, which can be avoided by using so-called independent components (Jutten and Herault, 1994; Makeig et al., 1997). Second, especially when recording during short periods, gait data should be considered as non-stationary signals, which can be studied using the so-called Wavelet–Karhunen–Loève expansion (Starck and Querre, 2001).

Chau, T., Rizvi, S., 2002. Automatic stride interval extraction from long, highly variable and noisy gait timing signals. Hum. Mov. Sci. 21, 495–514.

Collins, J.J., De Luca, C.J., 1993. Open-loop and closed-loop control of posture: a random-walk analysis of center-of-pressure trajectories. Exp. Brain Res. 95, 308–318.

Collins, J.J., De Luca, C.J., 1994. Random walking during quiet standing. Phys. Rev. Lett. 73, 764–767.

Collins, J.J., Chow, C.C., Imhoff, T.T., 1995. Stochastic resonance without tuning. Nature 376, 236–238.

Deluzio, K.J., Wyss, U.P., Costigan, P.A., Sorbie, C., Zee, B., 1999. Gait assessment in unicompartmental knee arthroplasty patients: Principal component modelling of gait waveforms and clinical status. Hum. Mov. Sci. 18, 701–711.

Diedrich, F.J., Warren, W.H., 1995. Why change gaits? Dynamics of the walk-run transition. J. Exp. Psychol.: Hum. Percep. Perf. 21, 183–202.

Dingwell, J.B., Cavanagh, P.R., 2001. Increased variability of continuous overground walking in neuropathic patients is only indirectly related to sensory loss. Gait Posture 14, 1–10.

Dingwell, J.B., Ulbrecht, J.S., Boch, J., Becker, M.B., O'Gorman, J.T., Cavanagh, P.R., 1999. Neuropathic gait shows only trends towards increased variability of sagittal plane kinematics during treadmill locomotion. Gait Posture 10, 21–29.

Donker, S.F., Beek, P.J., Wagenaar, R.C., Mulder, T., 2001. Coordination between arm and leg movements during locomotion. J. Motor Beh. 33, 86–102.

Feipel, V., De Mesmaeker, T., Klein, P., Rooze, M., 2001. Three-dimensional kinematics of the lumbar spine during treadmill walking at different speeds. Eur. Spine J. 10, 16–22.

Giakas, G., Baltzopoulos, V., 1997. Time and frequency domain analysis of ground reaction forces during walking: An investigation of variability and symmetry. Gait Posture 5, 189–197.

Goldberger, A.L., Amaral, L.A., Hausdorff, J.M., Ivanov, P., Peng, C.K., Stanley, H.E., 2002. Fractal dynamics in physiology: alterations with disease and aging. Proc. Natl. Acad. Sci. USA 99 (Suppl 1), 2466–2472.

Golub, G.H., van Loan, C.F., 1990. Matrix computations. Johns Hopkins University Press, Baltimore.

Haken, H., 1996. Principles of Brain Functioning. Springer, Berlin.

Halliday, D.M., Conway, B.A., Christensen, L.O., Hansen, N.L., Petersen, N.P., Nielsen, J.B., 2003. Functional coupling of motor units is modulated during walking in human subjects. J. Neurophysiol. 89, 960–968.

Hamill, J., Emmerik van, R.E.A., Heiderscheit, B.C., Li, L., 1999. A dynamical system approach to lower extremity injuries. Clin. Biomech. 14, 297–308.

Hansen, N.L., Hansen, S., Christensen, L.O., Petersen, N.T., Nielsen, J.B., 2001. Synchronization of lower limb motor unit activity during walking in human subjects. J. Neurophysiol. 86, 1266–1276.

Harris, C.M., Wolpert, D.M., 1998. Signal-dependent noise determines motor planning. Nature 394, 780–784.

Hausdorff, J.M., Peng, C.K., Ladin, Z., Wei, J.Y., Goldberger, A.L., 1995. Is walking a random walk? Evidence for long-range correlations in stride interval of human gait. J. Appl. Physiol. 78, 349–358.

Hausdorff, J.M., Purdon, P.L., Peng, C.K., Ladin, Z., Wei, J.Y., Goldberger, A.L., 1996. Fractal dynamics of human gait: stability of long-range correlations in stride interval fluctuations. J. Appl. Physiol. 80, 1448–1457.

Hausdorff, J.M., Cudkowicz, M.E., Firtion, R., Wei, J.Y., Goldberger, A.L., 1998. Gait variability and basal ganglia disorders: Stride-to-stride variations of gait cycle timing in Parkinson's disease and Huntington's disease. Mov. Disord. 13, 428–437.

Hausdorff, J.M., Lertratanakul, A., Cudkowicz, M.E., Peterson, A.L., Kalition, D., Goldberger, A.L., 2000. Dynamic markers of altered gait rhythm in amyotrophic lateral sclerosis. J. Appl. Physiol. 88, 2045–2053.

Hazan, Z., Thomas, J.S., 1999. In: Binder, M.D. (Ed.), Peripheral and

Spinal Mechanisms in the Neural Control of Movement, vol. 123.

Elsevier, Amsterdam, pp. 379–387.

Hof, A.L., Elzinga, H., Grimmius, W., Halbertsma, J.P.K., 2002.

Speed dependence of averaged EMG profiles in walking. Gait

Posture 16, 78–86.

Jutten, C., Herault, J., 1994. Independent component analysis, a new

concept? Sig. Proc. 36, 287–314.

Karhunen, K., 1946. Zur spektralen Theorie stochastischer Prozesse.

Ann. Acad. Sci. Fennicae. Ser. A 34 (Math. Phys.).

Lamoth, C.J.C., Beek, P.J., Meijer, O.G., 2002. Pelvis-thorax coordi-

nation in the transverse plane during gait. Gait Posture 16, 101–114.

Lamoth, C.J.C., Daffertshofer, A., Meijer, O.G., Moseley, G., Wuis-

man, P.I.J.M., Beek, P.J., in press. Effects of experimentally

induced pain and fear of pain on trunk coordination and back

muscle activity during walking. Clin. Biomech.

Latash, M.L., Anson, J.G., 1996. What are ‘‘normal movements’’ in

atypical populations? Behav. Brain Sci. 19, 55–106.

Leurgans, S.E., Moyeed, R.A., Silverman, B.W., 1993. Canonical

correlation-analysis when the data are curves. J. R. Stat. Soc. B 55,

725–740.

Lo�eve, M., 1945. Fonctions alatoires du second ordre. Compte Rend.

Acad. Sci. (Paris) 220.

Lo�eve, M., 1948. In: L�evy, P. (Ed.), Processus Stochastic et Mouve-

ment Brownien. Gauthier Villars, Paris.

Makeig, S., Jung, T.P., Bell, A.J., Ghahremani, D., Sejnowski, T.J.,

1997. Blind separation of auditory event-related brain responses

into independent components. Proc. Natl. Acad. Sci. USA 94,

10979–10984.

Maki, B.E., 1997. Gait changes in older adults: predictors of falls or

indicators of fear. J. Am. Geriat. Soc. 45, 313–320.

McGibbon, C.A., Krebs, D.E., 2001. Age related changes in lower

trunk coordination and energy transfer during gait.

Miller, R.A., Thaut, M.H., McIntosh, G.C., Rice, R.R., 1996.

Components of EMG symmetry and variability in parkinsonian

and healthy elderly gait. Electroencephalogr. Clin. Neurophysiol.

101, 1–7.

Newell, K.M., Corcos, D., 1993. Variability and Motor Control.

Human Kinetics, Champaign-Urbana, IL.

Patla, A.E., Calvert, T.W., Stein, R.B., 1985. Model of a pattern generator

for locomotion in mammals. Am. J. Physiol. 248, R484–494.

Pearson, K., 1901. On lines and planes of closest fit to systems of

points in space. London Edin. Dublin Philos. Mag. J. Sci. 6, 559–572.

Priplata, A., Niemi, J., Salen, M., Harry, J., Lipsitz, L.A., Collins, J.J.,

2002. Noise-enhanced human balance control. Phys. Rev. Lett. 89,

238101.

Riley, M.A., Turvey, M.T., 2002. Variability and determinism in motor

behavior. J. Mot. Behav. 34, 99–125.

Sadeghi, H., 2003. Local or global asymmetry in gait of people without

impairments. Gait Posture 17 (3), 197–204.

Sadeghi, H., Prince, F., Sadeghi, S., Labelle, H., 2000. Principal

component analysis of the power developed in the flexion/extension

muscles of the hip in able-bodied gait. Med. Eng. Phys. 22, 703–710.

Sadeghi, H., Allard, P., Barbier, F., Sadeghi, S., Hinse, S., Perrault, R.,

Labelle, H., 2002. Main functional roles of knee flexors/extensors

in able-bodied gait using principal component analysis (I). Knee 9,

47–53.

Scholz, J.P., Danion, F., Latash, M.L., Schöner, G., 2002.

Understanding finger coordination through analysis of the structure of

force variability. Biol. Cybern. 86, 29–39.

Scholz, J.P., Schöner, G., 1999. The uncontrolled manifold concept:

identifying control variables for a functional task. Exp. Brain Res.

126, 289–306.

Shiavi, R., Griffin, P., 1981. Representing and clustering

electromyographic gait patterns with multivariate techniques. Med. Biol. Eng.

Comp. 19, 605–611.

Starck, J.-L., Querre, P., 2001. Multispectral data restoration by the

wavelet Karhunen-Loève transform. Sig. Proc. 81, 2449–2459.

Van Emmerik, R.E.A., Wagenaar, R.C., 1996. Effects of walking

velocity on relative phase dynamics in the trunk in human. J.

Biomech. 29, 1175–1184.

Vogt, L., Pfeifer, K., Portscher, M., Banzer, W., 2001. Influences of

nonspecific low back pain on three-dimensional lumbar spine

kinematics in locomotion. Spine 26, 1910–1919.

Wagenaar, R.C., van Emmerik, R.E.A., 2000. Resonant frequencies of

arms and legs identify different walking patterns. J. Biomech. 33,

853–861.

Wannier, T., Bastiaanse, C., Colombo, G., Dietz, V., 2001. Arm to leg

coordination in humans during walking, creeping and swimming

activities. Exp. Brain Res. 141, 375–379.

West, B.J., Griffin, L., 1999. Allometric control, inverse power laws

and human gait. Chaos Solitons & Fractals 10, 1519–1527.

White, S.G., McNair, P.J., 2002. Abdominal and erector spinae muscle

activity during gait: the use of cluster analysis to identify patterns

of activity. Clin. Biomech. 17, 177–184.

Whitall, J., Caldwell, G.E., 1992. Coordination of symmetrical and

asymmetrical human gait. J. Mot. Behav. 24, 339–353.

Winter, D.A., 1983. In: Winter, D.A. (Ed.), International Congress of

Biomechanics. Human Kinetics, Waterloo.

Winter, D.A., Yack, H.J., 1987. EMG profiles during normal walking:

stride-to-stride and inter-subject variability.

Electroencephalogr. Clin. Neurophysiol. 67, 402–411.

Wootten, M.E., Kadaba, M.P., Cochran, G.F.B., 1990a. Dynamic

electromyography I. Numerical representation using principal

component analysis. J. Orthop. Res. 8, 247–258.

Wootten, M.E., Kadaba, M.P., Cochran, G.F.B., 1990b. Dynamic

electromyography II. Normal patterns during gait. J. Orthop. Res.

8, 259–265.