
Pattern Recognition 41 (2008) 418–431

www.elsevier.com/locate/pr

Articulated motion reconstruction from feature points

B. Li a,∗, Q. Meng b, H. Holstein c

a Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, M1 5GD, UK
b Department of Computer Science, Loughborough University, Loughborough, LE11 3TU, UK
c Department of Computer Science, University of Wales, Aberystwyth, SY23 3DB, Wales, UK

Received 17 October 2006; received in revised form 25 March 2007; accepted 6 June 2007

Abstract

A fundamental task of reconstructing non-rigid articulated motion from sequences of unstructured feature points is to solve the problem of feature correspondence and motion estimation. This problem is challenging in high-dimensional configuration spaces. In this paper, we propose a general model-based dynamic point matching algorithm to reconstruct freeform non-rigid articulated movements from data presented solely by sparse feature points. The algorithm integrates key-frame-based self-initialising hierarchical segmental matching with inter-frame tracking to achieve computational effectiveness and robustness in the presence of data noise. A dynamic scheme of motion verification, dynamic key-frame-shift identification and backward parent-segment correction, incorporating temporal coherency embedded in inter-frames, is employed to enhance the segment-based spatial matching. Such a spatial–temporal approach ultimately reduces the ambiguity of identification inherent in a single frame. Performance evaluation is provided by a series of empirical analyses using synthetic data. Testing on motion capture data for a common articulated motion, namely human motion, gave feature-point identification and matching without the need for manual intervention, in buffered real-time. These results demonstrate the proposed algorithm to be a candidate for feature-based real-time reconstruction tasks involving self-resuming tracking for articulated motion.

© 2007 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

Keywords: Non-rigid articulated motion; Point pattern matching; Non-rigid pose estimation; Motion tracking and object recognition

1. Introduction

Visual interpretation of non-rigid articulated motion has

lately seen somewhat of a renaissance in computer vision

and pattern recognition. The motivation for directing existing

motion analysis of rigid objects towards non-rigid articulated

objects [1,2], especially human motion [3–5], is driven by

potential applications such as human–computer interaction,

surveillance systems, entertainment and medical studies. A

large body of research, dedicated to the task of structure and

motion analysis, utilises feature-based methods regardless of

parametrisation by points, lines, curves or surfaces. Among

these, concise feature-point representation, advantageously

abstracting the underlying movement, is usually used as an

∗ Corresponding author. Tel.: +44 161 247 3598; fax: +44 161 247 1483.

E-mail addresses: [email protected] (B. Li), [email protected]

(Q. Meng), [email protected] (H. Holstein).

0031-3203/$30.00 © 2007 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

doi:10.1016/j.patcog.2007.06.002

essential or intermediate correspondence towards the end-

product of motion and structure recovery [6–8].

In the context of vision cues via feature-point representa-

tion, the spatio–temporal information is notably reduced to a

sequence of unidentified points moving over time. To determine

the subject’s structure and therefore its underlying skeletal-style

movements for the purpose of high-level recognition, two fun-

damental problems of feature-point tracking and identification

need to be solved. Tracking feature points in successive frames

has been investigated extensively in the literature [9–12]. How-

ever, the identities of the subject feature points are not obtain-

able from inter-frame tracking alone.

Feature-point identification requires the determination of

which point in an observed data frame corresponds to which

point in its model, thus allowing recovery of structure. The task

addresses the difficult problem of automatic model matching

and identification, crucial at the start of tracking or on resump-

tion from tracking loss. Currently, most tracking approaches

simplify the problem to incremental pose estimation, relying


on manual model fitting at the start of tracking, or on an as-

sumption of initial pose similarity and alignment to the model,

or on pre-knowledge of a specific motion from which to infer

an initial pose [5,13]. In this sense, the general recovery of non-

rigid articulated motion solely from feature points still remains

an open problem. There is a relative dearth of algorithmic self-

initialisation for articulated motion reconstruction from only a

collection of sparse feature points.

Motivated by these observations, we present a dynamic

segment-based hierarchical point matching (DSHPM) algo-

rithm to address self-initialising articulated motion reconstruc-

tion from sparse feature points. The articulated motion we

are considering describes general segmental jointed freeform

movement. The motion of each segment can be considered as

rigid or nearly rigid, but the motion of the object as a whole

is high-dimensionally non-rigid. In our work, the articulated

model of an observed subject is a priori known, suggesting a

model-based approach. As a general solution to the problem,

the algorithm only assumes availability of feature-point motion

data, such as obtained in our experiments via a marker-based

motion capture system. We do not make the usual simplify-

ing assumptions of model-pose similarity or restricted motion

class for tracking initialisation, nor do we require absence of

data noise. The algorithm aims to establish one-to-one matches

between the model point-set and its freeform motion data to

reconstruct the underlying articulated movements in buffered

real-time.

2. Related work

The problem of automatically identifying feature points to re-

trieve underlying articulated movement can be inherently diffi-

cult for a number of reasons: (1) the possibility of globally high

dimensionality to depict the articulated structure; (2) relaxation

of segment rigidity to allow for limited distortion; (3) data cor-

ruption due to missing (occluded) and extra (via the process of

feature extraction) data; (4) unrestricted and arbitrary poses in

freeform movements; (5) requirements of self-initialising track-

ing and identification; and (6) computational cost compatible

with real-time. While few works have attempted to address all

these issues in the context of sparse feature-point representation

(an early exploratory paper was published in Ref. [14]), much

research has addressed a variety of aspects of the problem. In

this section, we review techniques related to the problem from

two categories: point pattern matching (PPM) and articulated

motion tracking.

PPM is a fundamental problem for object recognition, motion

estimation and image registration in a wide variety of circum-

stances [15,16]. Many of the approaches have focused on rigid,

affine or projective correspondence, using techniques such as

graphs, interpretation trees [17], Hausdorff distance [18], geo-

metric alignment and hashing [19]. In these cases, the developed

techniques are based on geometric invariance or constraint sat-

isfaction in affine transformations, yielding approximate map-

ping between objects and models [20]. However, these methods

cannot be easily extended to the high-configuration dimension-

ality of a complex articulated motion.

For modelling non-rigidity, elastic model [2,21], weighted-

graph matching [22] and thin-plate spline approaches [23] have

been developed to formulate densely distributed points into

high-level structural presentations of lines, curves or surfaces.

However, the necessary spatial data continuity is not available

in the case of sparse points representing skeletal structures.

Piecewise approaches [24,25] are probably the most appropri-

ate for segmental data. In our case, a set of piecewise affine

transformations with allowable distortion relaxation is sought

for matching to an articulated segment hierarchy under kine-

matic constraints.

A second category of literature deals with the tracking of

a particular type of articulated motion: human motion. Exist-

ing algorithms are commonly model-based, reconstructing pre-

cise poses from video images. The main challenge is to track

a large number of degrees of freedom in high-dimensional

freeform movements in the presence of image noise. To im-

prove the reliability and efficiency of motion tracking, a spatial

hierarchical search, using certain heuristics such as colour or

appearance consistency, has proved successful [26,27]. How-

ever, the spatial hierarchy and matching heuristic may not be

applicable in individual frames due to self-occlusion and im-

age noise. In that case, spatio–temporal approaches have been

shown advantageous in recent research. Sigal et al. [28] intro-

duced a “loose-limbed model” to emphasise motion coherency

in tracking. Lan and Huttenlocher [29] developed a “unified

spatio–temporal” model exploring both spatial hierarchy and

temporal coherency in articulated tracking from silhouette data.

Spatio–temporal methods have enabled robust limb tracking in

multi-target tracking [30], outdoor scene analysis [31] and 3D

reconstruction of human motion [32]. Our work benefits from

the spatio–temporal concept. However, methodologies based

on the rich information of images cannot be adapted to our

problem domain of motion reconstruction from concise feature-

point representation.

Marker-based motion capture systems exemplify point-

feature trackers [33]. Coloured markers, active markers, or

a set of specially designed marker patterns have been used

to code the identification information in some systems. Such

approaches sidestep the hard problem of marker identifica-

tion, but at the expense of losing application generality. The

generic PPM problem in articulated motion is exemplified by a

state-of-the-art optical MoCap system, e.g. Vicon [34], without

recourse to marker coding. However, auto-identification may

fail for complex motion. MoCap data normally need time-

consuming manual post-processing before they can be used

for actual applications.

Our previous baseline study [35] developed a segment-based

articulated point matching algorithm for identifying an ar-

bitrary pose of an articulated subject with sparse point fea-

tures from single-frame data. The algorithm provided a self-

initialisation phase of pose estimation, crucial at the beginning

or on resumption of tracking. It utilised an iterative “coarse-

to-fine” matching scheme, benefiting from the well-known it-

erative closest point (ICP) algorithm [36], to establish a set

of relaxed affine segmental correspondences between a model

point-set and an observed data set taken from one frame of ar-


ticulated motion. However, we argued the possibility of more

robust motion reconstruction by combining it with information

from inter-frame tracking, which would eventually reduce the uncer-

tainty inherent in the matching problem for single-frame data

[37].

Pursuing the cross-fertilisation of the research and existing

techniques, we extend our previous study on single-frame ar-

ticulated pose identification [35,37] into a dynamic context.

We propose a DSHPM algorithm, targeting the reconstruction

of articulated movement from motion sequences. The DSHPM

algorithm integrates inter-frame tracking and spatial hierarchi-

cal matching to achieve effective articulated PPM

in buffered real-time. The idea of segment-based articulated

matching, as computational substrate to explore the spatial hi-

erarchy [35,37], is enhanced by exploiting motion coherency

embedded in inter-frames that ultimately reduces the ambiguity

of identification in the presence of data noise.

3. Framework of the model-based DSHPM algorithm

The generic task under consideration arose from the need

to identify feature-point data to reconstruct underlying skeletal

structure of freeform articulated motion. We assume the data

capture rate is sufficiently high, as demanded in most real-world

applications. This allows feature-point trajectories to be obtained

in successive frames. However, the identities of feature

points (or trajectories) are not known.

3.1. The articulated model and motion data

The subject to be tracked is pre-modelled. A subject model

comprises $S$ segments with complete feature points. Each segment $P_s = \{p_{s,i} \mid i = 1, \ldots, M_s\}$ has $M_s$ identified feature points $p_{s,i}$. The feature points are sufficient in number and distribution to indicate the orientation and segment structure with the required detail. Segment non-rigidity is allowed within a threshold of a segmental distortion ratio $\varepsilon_s$. Articulation is indicated through

join-point commonality between two segments, suggesting a

segment-based hierarchy. To keep the algorithm general, each

segment undergoes independent motion constrained only by

joint points. We do not impose motion constraints such as fea-

sible biological poses for a specific subject type.
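For concreteness, the model just described might be encoded as in the following sketch. This is illustrative only (the paper's implementation was in Matlab); the field names below are ours, not the authors'.

```python
# Illustrative sketch only, not the authors' data structure. Each segment
# records its identified model points, its allowed distortion ratio, and
# the join points shared with its parent, so the segment tree can be
# walked hierarchically.
from dataclasses import dataclass, field
from typing import List, Optional
import numpy as np

@dataclass
class Segment:
    name: str
    points: np.ndarray                  # (M_s, 3) identified feature points p_{s,i}
    eps: float                          # segmental distortion ratio eps_s
    parent: Optional[str] = None        # parent segment in the hierarchy
    joins: List[int] = field(default_factory=list)   # indices of join points
    children: List[str] = field(default_factory=list)

# A model is then a dict {name: Segment}, rooted at a torso-like segment.
```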

The observed motion data of the subject is represented by

a sequence of point-sets, one per time frame $t$: $Q^t = \{q^t_j \mid j = 1, \ldots, N^t\}$, where the $N^t$ data points $q^t_j$ may be corrupted by missing data due to occlusion and by extra noise data arising from the process of feature extraction.

3.2. Outline of the DSHPM

To identify the massive volume of data within a complex motion sequence,

frame-by-frame model fitting would be computationally ex-

pensive and unnecessary. Ideally, initial entire model fitting

need only be attempted at some key-frames, in particular, at

the start of tracking or on the resumption from tracking fail-

ure. Subsequent identification of an individual feature point

success?

parent–segment correction

dynamic segment–based hierarchical identification in a key–frame range

recruitmenttracking &

backward

success?

pre–tracking &

pre–segmentation

segment

tracking and identity propagation

abandon the

NY

key–frame range

failed 2 times?

identification & recruitment

key–frame

dynamic key–frame–shift

has parent?

an articulated motion sequence presented by fearture points ......

CT-based iterative

segmental matching

Y

N

Y

N

YN

& recruitmentmotion verification

next segment (if any) depth–first along hierarchical tree

Fig. 1. Framework of the dynamic segment-based hierarchical point matching

(DSHPM) algorithm.

can be achieved by tracking over its trajectory. In the case

of broken tracks, re-identification costs are largely reduced

by reference to the known points whose identities are carried

forward.

The framework of the proposed DSHPM algorithm is shown

in Fig. 1. As computation substrate for initial model fitting,

it employs a hierarchical segmental mapping supported by

candidate-table (CT) optimisation at a key-frame (Section 4.2).

To reduce the inherent uncertainty of segmental match-

ing in a single key-frame, a dynamic scheme (Section 4.3)

incorporating temporal coherency in a key-frame range is

explored through inter-frame tracking. This includes three

phases: motion verification, dynamic key-frame-shift identi-

fication and backward parent-segment correction, as shown

in Fig. 1. Under CT-based iterative matching, the algorithm

first verifies that a proposed segmental match is consistent with

an affine transformation over a period of movement subject to

a relaxed geometric invariance defined by a segmental distor-

tion ratio $\varepsilon_s$. We name this process motion verification (Sec-

tion 4.3.1). If a segment cannot be identified or the segment

identification cannot be proved correct by motion verification,

reflecting poor segment data in the current key-frame, the al-

gorithm shifts the key-frame forwards a certain time period

to attempt re-identification. This process is denoted dynamic

key-frame-shift identification (Section 4.3.2). The final phase,

backward parent-segment correction, aims to correct any

wrongly identified parent-segment which could cause subse-

quent unsuccessful matches of child-segments (Section 4.3.3).


In the dynamic process, a temporal key-frame shift is always

accompanied by a recruitment procedure that forward propa-

gates already obtained identities and recruits any newly appear-

ing matches in order to maintain spatial hierarchy.

4. The DSHPM algorithm

Identification is carried out hierarchically segment by seg-

ment in a chosen key-frame containing e.g. over 90% of the

model points (Section 4.2), or in a key-frame range when nec-

essary, taking advantage of a dynamic scheme (Section 4.3). In

order to make temporal coherence of motion cues exploitable

and reduce the search space, feature-point pre-tracking and pre-

segmentation are carried out prior to segmental identification

(Section 4.1).

4.1. Pre-tracking and pre-segmentation

Feature-point data are tracked in a time period before identi-

fication. We denote this step as pre-tracking in Fig. 1. This pro-

cess allows propagating inter-frame correspondences of feature

points in a key-frame backwards and forwards along stacked

pre-tracked trajectories. It not only makes use of motion co-

herence for efficient segment retrieval, but also makes the key-

frame-based identification feasible.
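As an illustration of this pre-tracking step, the following minimal Python sketch links unidentified points across frames by greedy nearest-neighbour association. It is not the authors' tracker, and the gating radius max_step is an assumed parameter bounding plausible per-frame point motion.

```python
import numpy as np

def pre_track(frames, max_step):
    """frames: list of (N_t, 3) arrays of unidentified 3D points.
    Returns trajectories as lists of (frame_index, point_index) pairs."""
    trajectories = [[(0, j)] for j in range(len(frames[0]))]
    open_tracks = list(range(len(trajectories)))
    for t in range(1, len(frames)):
        prev, curr = frames[t - 1], frames[t]
        if len(curr) == 0:                   # total dropout: all tracks break
            open_tracks = []
            continue
        taken, still_open = set(), []
        for k in open_tracks:
            _, j_prev = trajectories[k][-1]
            d = np.linalg.norm(curr - prev[j_prev], axis=1)
            j = int(np.argmin(d))
            if d[j] < max_step and j not in taken:
                trajectories[k].append((t, j))   # track continues
                taken.add(j)
                still_open.append(k)
        for j in range(len(curr)):               # unmatched points start new
            if j not in taken:                   # (possibly broken) trajectories
                trajectories.append([(t, j)])
                still_open.append(len(trajectories) - 1)
        open_tracks = still_open
    return trajectories
```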

The pre-tracked trajectories exhibit relative motion cues of

individual points. To reduce the search space of a segment,

a pre-segmentation process is carried out prior to segmental

identification, as shown in Fig. 1. We group unidentified points

that maintain relatively constant distances during articulated

movements, as candidates for intra-segmental membership.

The pre-segmentation is subjected to criteria that depend on

the Euclidean distance $D^t_{i,j}$ between each pair of observed data points $(q_i, q_j)$ at frames $t = K + n\Delta$, $n = 0, 1, \ldots, 10$, starting from the key-frame $K$ and proceeding in intervals $\Delta$, where $\Delta$ denotes the motion relaxation interval, that is, the number of frames during which motion relaxation takes place, reflecting noticeable changes in pose. We determine intra-segmental point-pair candidature $(q_i, q_j)$ using a two-stage criterion with relaxation:

$$D^t_{i,j} < \left(1 + \max_s \varepsilon_s\right) \max_s D_s, \quad (1)$$

$$\frac{\max_n (D^{K+n\Delta}_{i,j}) - \min_n (D^{K+n\Delta}_{i,j})}{\operatorname{avg}_n (D^{K+n\Delta}_{i,j})} < \max_s \varepsilon_s, \quad (2)$$

where the segment distortion ratio $\varepsilon_s$ is determined by the relative variation of edge length among segmental point-pairs, reflecting the allowed “non-rigidity” of a segment in motion.

Criterion (1) indicates that the point-pair distance $D^t_{i,j}$ should

be less than the maximum intra-segmental point-pair distance

$D_s$ with maximum distortion relaxation. Criterion (2) requires

that the ratio between the extremal distance difference and the

average distance of the point-pair, over the intervals, should be

less than the maximum distortion ratio allowed in any segment.

If both criteria are satisfied, indicating that the point-pair $(q_i, q_j)$ maintains a consistent distance with allowed relaxation in articulated movements and may therefore belong to the same segment, we store this information in a segmentation matrix $Seg$ and set $Seg(i,j) = 1$; otherwise we set $Seg(i,j) = 0$ for an extra-segment pair.
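The two criteria translate directly into code. The sketch below is our paraphrase, assuming points have already been aligned along their pre-tracked trajectories so that index i denotes the same physical point in every sampled frame; eps_max stands for $\max_s \varepsilon_s$ and D_max for $\max_s D_s$.

```python
import numpy as np

def pre_segment(frames, K, Delta, eps_max, D_max):
    """Flag intra-segmental point-pair candidates over frames
    K, K + Delta, ..., K + 10*Delta (Eqs. (1) and (2))."""
    samples = np.stack([frames[K + n * Delta] for n in range(11)])  # (11, N, 3)
    N = samples.shape[1]
    Seg = np.zeros((N, N), dtype=int)
    for i in range(N):
        for j in range(i + 1, N):
            D = np.linalg.norm(samples[:, i] - samples[:, j], axis=1)
            crit1 = bool(np.all(D < (1.0 + eps_max) * D_max))     # Eq. (1)
            crit2 = (D.max() - D.min()) / D.mean() < eps_max      # Eq. (2)
            if crit1 and crit2:
                Seg[i, j] = Seg[j, i] = 1
    return Seg
```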

4.2. CT-based iterative segmental identification

Articulated motion maintains a relaxed geometric invariance

in near-rigid segments. Matching at segment level is therefore

preferable to brute-force global point-to-point searching. Initially nei-

ther correspondence nor motion transformation are known for

any segment or point. To identify a segment Ps in an articu-

lated structure at tracking start, we adapt the basic idea of CT

for pose identification developed in our previous work [35,37],

to the new context of motion sequence. We enhance the static

CT-based iterative segmental matching by exploring motion co-

herence embedded in inter-frames.

Briefly, CT-based matching proceeds in two stages: (1) CT generation

and optimisation augmented by pre-segmentation information

(Section 4.1), and (2) CT-based iterative matching (Section 4.2.2).

4.2.1. CT generation

As explained in Refs. [35,37], the CT of segment $P_s$ is determined by intra-segmental distance similarity, here augmented with heuristic rigidity cues available from pre-segmentation. To define the column ordering of a CT for a segment in which no point has been identified, arbitrarily choose a “pivot” reference point $p_{\mathrm{pivot}_s}$, and order the remaining model points $p_i$ by non-decreasing distance $D_{\mathrm{pivot}_s,i}$ from the pivot, giving a model pivot sequence for the segment.

To match the model pivot sequence, a sequential search is

applied, with the possibility of rapid rejection of false candi-

dates. Thus, from the unidentified data at key-frame $K$, arbitrarily choose an assumed pivot match $q^K_{a_{\mathrm{pivot}_s}}$ of $p_{\mathrm{pivot}_s}$, and calculate its distance $D^K_{a_{\mathrm{pivot}_s},j}$ to all other unidentified points $q^K_j$.

To exclude large outliers of the segment based on the cho-

sen pivot, a pivot-centred bounding box, relaxed by distortion

ratio $\varepsilon_s$, is applied [35]. Then, in the bounding subspace, the algorithm seeks match candidates of each model point based on distance similarity with reference to the assumed pivot $q^K_{a_{\mathrm{pivot}_s}}$, satisfying the distortion tolerance

$$\frac{\left| D_{\mathrm{pivot}_s,i} - D^K_{a_{\mathrm{pivot}_s},j} \right|_{Seg(a_{\mathrm{pivot}_s},j)=1}}{D_{\mathrm{pivot}_s,i}} < \varepsilon_s, \quad (3)$$

in which the candidate selection is restricted by the pre-segmentation point-pair rigidity criterion $Seg(a_{\mathrm{pivot}_s}, j) = 1$. We list such selected candidates in a table column as possible matches. The procedure is repeated for every element along the model pivot sequence, giving rise to an ordered matching sequence of columns that define the CT for the assumed pivot match $(p_{\mathrm{pivot}_s}, q^K_{a_{\mathrm{pivot}_s}})$.

Taking each unidentified point in turn as an assumed pivot

match, we generate a set of CTs for that model pivot choice.

Author's personal copy

422 B. Li et al. / Pattern Recognition 41 (2008) 418–431

Heuristically, the CT constructed with the correct pivot match,

if present in the data, should include more candidates than other

CTs. To economise the iterative search at the next stage (Section

4.2.2), CT prioritising by CT-culling, CT-ranking and candidate

ordering is applied to reduce the search space in which the

correct solution is likely to be found, as discussed in Ref. [35].

The use of CTs makes the assumption of small motion or pose

similarity, used e.g. in the ICP [36], unnecessary.

When a join point has already been identified during its

parent-segment identification, then this point is chosen as the

unique pivot. In this case, only one CT is generated, resulting

in a striking reduction of search space.
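A possible rendering of CT generation for a single assumed pivot match is sketched below. This is our paraphrase: the bounding-box pre-filter and the CT prioritising heuristics of Ref. [35] are omitted for brevity.

```python
import numpy as np

def build_ct(model_pts, pivot, data_pts, a_pivot, Seg, eps_s):
    """CT for the assumed pivot match (pivot -> a_pivot), Eq. (3);
    CT-culling/ranking and the bounding box of Ref. [35] are omitted."""
    d_model = np.linalg.norm(model_pts - model_pts[pivot], axis=1)
    order = [i for i in np.argsort(d_model) if i != pivot]   # model pivot sequence
    d_data = np.linalg.norm(data_pts - data_pts[a_pivot], axis=1)
    ct = []
    for i in order:
        col = [j for j in range(len(data_pts))
               if j != a_pivot
               and Seg[a_pivot, j] == 1                      # rigidity cue
               and abs(d_model[i] - d_data[j]) / d_model[i] < eps_s]  # Eq. (3)
        ct.append((i, col))
    return ct   # ordered columns of candidate matches
```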

4.2.2. Iterative segmental matching

To detect the presence or absence of a one-to-one intra-

segmental feature-point correspondence in a CT of the reduced

set of prioritised CTs, an iterative matching procedure is em-

ployed to seek the best transformation that interprets segmen-

tal movement under distortion relaxation. We take the first candidates in the top row of a CT to provide an assumed initial match $Q^K_s$ of model segment $P_s$. An affine transformation

$[R_s, T_s]$ is determined under this correspondence by a least-squares

SVD method [38]. If this transformation maps model

points into a rough alignment with their assumed matches, so

that the average mapping error satisfies the desired matching

quality bounded by the segmental distortion ratio,

$$e(P_s \to Q^K_s)\big|_{[R_s,T_s]} < \varepsilon_s, \quad (4)$$

then the assumed segmental match $(P_s \to Q^K_s)$ is taken as

correct. Otherwise, there must be pseudo pairs in the matching

assumption which need to be removed by coarse-to-fine error-

driven iteration. Pseudo pairs exaggerate individual matching

errors at wrong matches. Based on this cue, we remove the

worst match by replacing it with its next candidate in the CT, if it

exists; otherwise we omit its match from the currently assumed

correspondence in the CT. If this CT becomes exhausted before

the best segment match is found, then a new CT would be

interrogated [37].
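The least-squares SVD fit [38] and the coarse-to-fine error-driven refinement can be sketched as follows. The normalisation of the mean mapping error by a segment scale (seg_scale) to make it comparable with the dimensionless ratio $\varepsilon_s$ is our assumption, and the CT bookkeeping is simplified to one candidate list per model point.

```python
import numpy as np

def fit_rigid(P, Q):
    """Least-squares R, T with Q ~ R @ P + T (SVD registration, Ref. [38])."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    U, _, Vt = np.linalg.svd((P - cP).T @ (Q - cQ))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                 # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    T = cQ - R @ cP
    err = np.linalg.norm((P @ R.T + T) - Q, axis=1)
    return R, T, err

def match_segment(P, ct, data, eps_s, seg_scale):
    """ct: one candidate list per model point (a simplified CT)."""
    pos = [0 if c else None for c in ct]     # current candidate index per point
    while True:
        pairs = [(i, ct[i][k]) for i, k in enumerate(pos) if k is not None]
        if len(pairs) < 3:
            return None                      # too few pairs to estimate motion
        mi = [i for i, _ in pairs]
        R, T, err = fit_rigid(P[mi], data[[j for _, j in pairs]])
        if err.mean() / seg_scale < eps_s:   # matching quality, Eq. (4)
            return R, T, dict(pairs)
        worst = mi[int(np.argmax(err))]      # replace/drop the worst match
        pos[worst] = pos[worst] + 1 if pos[worst] + 1 < len(ct[worst]) else None
```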

To qualify as a whole segmental match, the transformation

under the assumed correspondence should also guarantee the

desired matching quantity fraction $\tau_s$,

$$\frac{\operatorname{number}(P_s \to Q^K_s)\big|_{[R_s,T_s]}}{M_s} > \tau_s, \quad (5)$$

where $M_s$ is the number of feature points in model segment

$P_s$. If the matching quantity criterion is not satisfied, the algo-

rithm will attempt to find the remaining matches of the segment,

which may have been dropped during iterations, or were ex-

cluded from the CT on grounds of limiting the search space via

stringent criteria equations (1)–(3). Finding remaining matches

is achieved by reassigning their nearest-neighbours in the data

under the current transformation $[R_s, T_s]$ (refer to segment recruit-

ment shown in Fig. 3). If no such closest neighbour is found,

we say the match of the point is lost.
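A sketch of this recruitment step (cf. Fig. 3), together with the quantity test of Eq. (5), follows; the acceptance radius gate for the nearest-neighbour reassignment is a hypothetical parameter.

```python
import numpy as np

def recruit(P, matches, R, T, data, tau_s, gate):
    """Reassign unmatched model points to nearest unclaimed data points
    under [R, T], then test the matching quantity fraction, Eq. (5)."""
    claimed = set(matches.values())
    for i in range(len(P)):
        if i in matches:
            continue
        d = np.linalg.norm(data - (R @ P[i] + T), axis=1)
        if claimed:
            d[list(claimed)] = np.inf        # data points already taken
        j = int(np.argmin(d))
        if d[j] < gate:                      # hypothetical acceptance radius
            matches[i] = j
            claimed.add(j)
    return matches, len(matches) / len(P) > tau_s
```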

Iterative motion estimation and refinement alternately update

the assumed matches until converging to a segmental match $(P_s \to Q^K_s)$ in the correct CT, leading to satisfaction of both the matching quality criterion (4) and the matching quantity criterion (5). In the event of no CT providing an acceptable match, the segment match will be deemed not to exist in the current key-frame.

[Fig. 2. Motion verification.]

In the case of segments with fewer than three matching pairs,

the SVD-based motion estimation cannot be applied. Segmental

identification becomes highly uncertain. We need to confirm

such a segment in the hierarchical chain depending on whether

its children or even grandchildren can be found (Section 4.4).

4.3. Dynamic identification

Single-frame spatial pose data alone may have inherent un-

certainty in determining the correct match from noisy data. In

the dynamic context of a motion sequence, geometric coher-

ence embedded in the temporal domain along pre-tracked tra-

jectories is used to improve the reliability of identification from

a key-frame. As shown in Fig. 1, after the CT-based iterative

segmental matching (Section 4.2), a dynamic scheme of motion

verification (Section 4.3.1), dynamic key-frame-shift identifica-

tion (Section 4.3.2) and backward parent-segment correction

(Section 4.3.3) is applied in a key-frame range to secure the

most likely correct segment identification.

4.3.1. Motion verification

In segment-based articulated motion, geometric invariance

of segmental “rigidity” should be maintained over movements.

The idea of motion verification is therefore to propagate the

segmental feature-point identities along their pre-tracked tra-

jectories, and to confirm that an affine transformation under such correspondence still satisfies the matching quality criterion (4) within the allowed distortion relaxation. As summarised in Fig. 2, the segmental matching obtained in the key-frame should be confirmed via its “rigidity” after a key-frame shift of one motion relaxation interval $\Delta$.

When the key-frame segment match $(P_s \to Q^K_s)$ is confirmed, the algorithm will attempt to retrieve newly appearing points at the observed frame $K + \Delta$ if the segment is incomplete. This is achieved by the segment recruitment procedure shown in Fig. 3. If more matches are found at the observed frame $K + \Delta$, reflecting good data quality, then the dynamic identification scheme favours a key-frame shift to $K + \Delta$, as described below.
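Motion verification then amounts to re-fitting the propagated matches after a $\Delta$-frame shift and re-testing criterion (4). The sketch below reuses fit_rigid from the matching sketch above; propagate stands for identity propagation along the pre-tracked trajectories and is assumed given.

```python
def verify_motion(P, matches, frames, K, Delta, eps_s, seg_scale, propagate):
    """Re-test the key-frame match after a Delta-frame shift (cf. Fig. 2);
    reuses fit_rigid from the matching sketch above."""
    later = propagate(matches, K, K + Delta)  # model idx -> data idx at K+Delta
    pairs = [(i, j) for i, j in later.items() if j is not None]
    if len(pairs) < 3:
        return False                          # too few surviving points
    mi = [i for i, _ in pairs]
    dj = [j for _, j in pairs]
    _, _, err = fit_rigid(P[mi], frames[K + Delta][dj])
    return err.mean() / seg_scale < eps_s     # "rigidity" (Eq. (4)) still holds?
```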


[Fig. 3. Segment recruitment.]

[Fig. 4. Recursive Parent_segment_correction.]

4.3.2. Dynamic key-frame-shift identification

The quality of some frame data for a segment may be very

poor, on account of excessive missing/extra data or distortion.

In this case, CT-based segmental identification may fail. This

will break off the hierarchical search and result in serious un-

certainty for successive child-segment identification. For this

reason, we do not confine the segment matching to the initially

chosen key-frame, but rather carry out matching in a key-frame

range. If the segment identification or verification fails, then the

dynamic key-frame-shift module is used to re-identify the seg-

ment in up to two successive key-frame-shifts, as shown in Fig. 1.

In order to maintain spatial hierarchy, after the key-frame-

shift, the recruitment procedure in Fig. 3 is applied to all in-

completely identified segments to encompass any previously

missed matches and forward propagate the obtained segmental

identities into the new key-frame.

4.3.3. Recursive parent-segment correction

If two successive key-frame-shift processes still fail to iden-

tify or verify a segment Ps , this may imply a wrong or highly

distorted joint pivot in use, derived from its parent-segment dur-

ing hierarchical searching. In this case, the algorithm attempts a

recursive backward parent-segment correction to check for the

join point and even its parent-segments, as described in Fig. 4.

If after a series of dynamic attempts, no parent join takes part

in the failed identification, we ultimately abandon the segment.

This indicates that the segment could have been occluded, or

have poor data quality even over a range of investigated frames.

4.4. Integrating temporal coherence with spatial hierarchy for

articulated match

Articulation at inter-segment joins is represented in a tree hi-

erarchy. Consistent matching of articulated segments is carried

out with respect to this tree. Such spatial hierarchy organises

segment identification in a parent–child ordering, thereby car-

rying forward identified join points in a parent to its children.

We assume that one of the segments of the articulated struc-

ture contains more points and has more segments linked to it

than most other segments. We treat such a segment as root,

seeking to identify it first. After the root has been located,

searching proceeds depth-first to child-segments along hierar-

chical chains, taking advantage of available joints which have

been located during parent-segment identification. This linkage

through join points considerably increases the reliability and

efficiency of child-segment identification. In the case of miss-

ing joint data on a parent segment, we recover a virtual joint if

at least three identified points are obtained in the parent. When

a parent has several children, searching prioritises the child

with the most model points, as its identification incurs the least

uncertainty from missing, extra or distorted data, and leads to

the greatest subsequent search space reduction. In the case of

broken search chains in the hierarchy due to a failed segment

identification or missing join point, identification will proceed

to other segments on other chains first and leave any remain-

ing child-segments on broken chains to be identified last under

conditions of a much reduced search space.
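The resulting search order can be illustrated with a simple depth-first traversal that always prefers the child with the most model points; the dynamic deferral of segments on broken chains is a run-time decision and is omitted from this sketch.

```python
def search_order(children, n_points, root):
    """children: segment -> list of child segments; n_points: segment ->
    number of model points. Depth-first, largest child first."""
    order, stack = [], [root]
    while stack:
        s = stack.pop()
        order.append(s)
        # push children in ascending size so the largest is popped first
        for c in sorted(children.get(s, []), key=lambda c: n_points[c]):
            stack.append(c)
    return order
```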

The segment-based hierarchical search operates dynamically

in a key-frame range, rather than being confined to a single

static data frame as in Refs. [35,37]. This lends robustness to

segment identification, as data may be poor in one frame while

good in another. When the hierarchical chain is broken in a

chosen frame, the algorithm could shift to a new frame to carry

on the search. Most existing feature-point identifications will

be carried forwards along pre-tracked trajectories to propagate

spatial continuity into the new frame. An obtained segment

match can be confirmed by geometric invariance presented

in the temporal domain over a motion relaxation interval $\Delta$

(Fig. 2). A failed child-segment identification caused by a

wrongly inherited pivot from its parent can be corrected by a

recursive procedure (Fig. 4). Meanwhile, the dynamic scheme

allows an efficient recruitment of reappearing segment points

by reference to the known points whose identities are carried

forward (Fig. 3). Our experimental results confirm that tem-

poral coherency integrated with spatial hierarchical cohesion

enhances identification of complex articulated motion in the

presence of data corruption (Section 5).

4.5. Identification with tracking for registering a whole

motion sequence

After segment-based dynamic hierarchical matching in a key-

frame range, the identity of each feature point can be propagated


along its trajectory by inter-frame tracking. The algorithm con-

tinues to track identified points forwards throughout the whole

motion sequence until missing data are encountered, causing

broken trajectories. For missing data, the algorithm attempts to

identify reappearing points or even segments and restart their

tracking. This is much easier by reference to the already identi-

fied points than at the initial stage of model fitting, because the

motion transformation can be obtained from partially identified

segment matches. In the case of an entire newly appearing seg-

ment, identification complexity is evidently much reduced in

the presence of previously established correspondences in the

articulation.

5. Experiments

The algorithm has been implemented in Matlab. We tested it

on articulated models, such as humans and robot manipulators

in various low densities and distributions of feature points. Hu-

man motion representing a typical articulated motion with seg-

ments of only near-rigidity makes the identification task more

difficult than in the case of robot manipulator motion with rigid

segments. To reflect this challenge, we report in this section ex-

perimental results on real-world human motion capture and its

overlays with synthetic noise for performance analysis of the

algorithm.

In our experiments, all 3D model data and human motion

data were acquired via a marker-based optical MoCap Vicon

512 system [34]. The measurement accuracy of this system is

to the level of a few millimetres in a control volume spanning

metres in linear extent. We attached markers as extrinsic feature

points on a subject at key sites, indicating segmental structure

with required detail. Marker attachment to tight clothing or

bony landmarks nevertheless introduced inevitable segmental

non-rigidity due to underlying soft body tissues. The sampling

rate of human MoCap was 60 frames per second (fps) in our

experiments.

5.1. Human motion reconstruction from MoCap data

A number of freeform movements captured from various

subjects in various point distributions were investigated. The

subject model is first generated off-line using one complete

frame of feature-point data captured in a clear pose. We manu-

ally identified the data in a 3D-interactive display and grouped

them into segments consistent with the underlying articulated

anatomy. This produced a “stick-figure” skeleton model of the

subject, as shown in the first of the figure sequences in Fig. 6(a)

and (b). Having attached markers to the subject and defined its

model, we proceeded with the capture of the subject’s freeform

motion using the Vicon MoCap 512 system.

5.1.1. Parameter setting

The segmental distortion ratio $\varepsilon_s$ in Eqs. (1)–(4) and the match-

ing quantity $\tau_s$ in Eq. (5) were pre-defined according to the

segment rigidity and the quality of MoCap data. Precise values

of the parameters are not required a priori, but algorithmic per-

formance will be compromised by very inappropriate choices.

[Fig. 5. Capturing static pose data for subject model generation.]

To provide experimental values of $\varepsilon_s$, we analysed a number of dynamic trials. We found that segmental distortion differs with different body parts and motion intensity. Thigh segments may give rise to large distortion, with $\varepsilon_s \approx 0.2$. A value of $\varepsilon_s \approx 0.05$ was found to be adequate for indicating the rigidity of the head, and an average $\varepsilon_s \approx 0.1$ for other body parts. We used one set of approximate distortion ratios, $\varepsilon_s = 0.05$–$0.2$, in all human motion experiments (Fig. 5).

For a rigid segment, a small value of $\varepsilon_s$ guarantees precise matching quality and provides a good ability to reject outliers. Therefore, we could reduce the matching quantity requirement $\tau_s$ to gain more tolerance of missing data. For a deformable segment, a high $\varepsilon_s$ value allows increased distortion, but at the cost of an increased candidate search space and possibly low matching quality. To compensate, we have to raise the matching size requirement $\tau_s$, with possibly compromised handling of missing data. Based on the quality of MoCap data and the rigidity of individual segments, we set the matching quantity $\tau_s = 0.80$–$0.90$. To reflect significant pose changes, we chose a motion relaxation interval $\Delta = 15$ frames, corresponding to 0.25 s at the MoCap rate of 60 fps.

5.1.2. Reconstruction results

Illustrative results of identified MoCap sequences from rep-

resentative full-body models of 27, 34 and 49 feature points in

15 segments are given in Table 1 and in Fig. 6. In Fig. 6, iden-

tified feature points are linked intra-segmentally. As shown in

Fig. 6, there is no assumption of pose similarity between the

model (the first frame in Fig. 6(a) and (b)) and their motion se-

quences, initially or during the movements. The captured mo-

tion sequences were subject to inevitable intermittent missing


points, extra noise points and segmental distortion in complex intensive motion.

Table 1
Identification examples of human motion (MoCap rate at 60 fps)

Activity              Sequence length       Number of      Identification   Efficiency (identified
                      in frames (seconds)   trajectories   rate (%)         frames per second)

Protocol 1: 27 feature points
Walking               600 (10)              32             96               300
Running               600 (10)              40             95               270
Freeform movement     1200 (20)             93             94               220

Protocol 2: 34 feature points
Walking               600 (10)              45             98               380
Running               600 (10)              54             97               320
Freeform movement     1200 (20)             106            94               250

Protocol 3: 49 feature points
Walking               600 (10)              56             96               280
Running               600 (10)              65             94               220
Freeform movement     1200 (20)             128            91               130

We observe that the proposed DSHPM algo-

rithm is capable of reconstructing an articulated motion rep-

resented by sparse or moderately dense feature points in the

presence of data noise. Even when some key points, such as

join points, or segments have been lost, the algorithm can still

carry on the identification process by taking advantage of the

proposed dynamic scheme.

This algorithm has been successfully used to identify a num-

ber of MoCap trials in a commercial project to produce the

game “Dance: UK”.1 Extracts from an identified dance se-

quence are shown in Fig. 7.

5.1.3. Performance analysis from the MoCap data

To demonstrate the performance of the DSHPM algorithm,

Table 1 gives results of human motion identification obtained

by applying the DSHPM algorithm on real-world MoCap

trials. The average missing and extra data in the real-world

MoCap examples is about 10–15%. Some general types of

activities, listed in the first column, were tested under three

types of marker protocol: 27, 34 and denser 49 feature points.

The freeform movement includes walking, running, jumping,

bending and dancing.

The results shown in each row of Table 1 were averaged

for a number of trials from different subjects performing sim-

ilar movements and with the same marker protocol. The av-

erage length of each type of activity, measured in frames as

shown in column 2, gives an indication of absolute motion pe-

riod at the 60 fps MoCap rate. The sequence length by itself

does not always indicate the difficulty of identification. In the

most favourable case of no tracking interruption, each feature

point need be identified only once, allowing its identity to be

propagated along its trajectory with minimum re-identification

cost. However, a trial with occlusions in complex movements

will lead to increased computational cost, as each reappearing

feature point is subjected to identification after tracking loss.

1 “Dance: UK” was developed in collaboration with Broadsword

Interactive Ltd [38,39]. It was released during Christmas 2004.

To indicate identification workload due to lost tracking, col-

umn 3 gives the number of trajectories, counting interruptions.

These are generally higher than the indicated number of feature

points, consistent with increasing identification difficulty.

Activities (first column) for each marker protocol are or-

dered by increasing movement complexity, accompanied by in-

creased identification difficulty due to more missing data and

extra noise data. This is reflected by the decreasing identifica-

tion rate in column 4. The identification rate is defined as the

percentage of the number of correctly identified trajectories in

relation to the total number of trajectories encountered. The

high trajectory-based rate emphasises correct identification obtained via segment-based articulated matching, rather than identification inherited only from inter-frame tracking, thus illustrating the effective nature of the algorithm.

We observed that the identification rate in Table 1 is in excess

of 90% in all motion types, whether with sparser or denser

feature points, even for large accelerative movements with big

segmental distortion, such as in jumping and dancing, and for

complex movements characterised by large numbers of broken

trajectories due to occlusion.

Reconstruction efficiency of the algorithm depends on the

complexity of the articulated model, but critically also on the

motion conditions: the level of segmental distortion associated

with movement intensity, the frequency and amount of miss-

ing/extra data associated with motion complexity. We indicated

an empirical reconstruction efficiency via “identified frames per

second” in Table 1. This indicator is defined as the length of

a trial (measured in frames) divided by the computational time

(measured in seconds) when the DSHPM identification was ex-

ecuted on a Compaq Pentium IV with 512 MB RAM in Matlab

code. We observe that for common activities of walking and

running, the identification efficiency in the type 2 marker pro-

tocol (34 feature points) is higher than for types 1 and 3 (27

and 49 feature points, respectively). Type 2 is a compromise

between having too few feature points (type 1) to allow unin-

terrupted hierarchical searching in the case of missing data, and

the denser data sets (type 3), with possible identification confusion and general increased computation cost.

[Fig. 6. Reconstructed human freeform movements: subject models of 15 segments followed by 7 sampled frames from their identified motion sequences: (a) human motion represented by 34 feature points; (b) human motion represented by 49 feature points.]

[Fig. 7. Dance trial reconstruction in the game project “Dance: UK”.]

[Fig. 8. Comparison of the static approach [35] with the DSHPM approach for motion reconstruction with additional synthetic distortion (identification rate versus additional distortion level, for 30- and 50-point sets): (a) static identification; (b) dynamic identification.]

For each type

of marker protocol, identification efficiency decreases with in-

creasing activity complexity involving more broken trajecto-

ries and data noise. In all cases, the identification efficiency

exceeded the MoCap rate of 60 fps by at least a factor of two, making

identification time competitive with real motion time. This sug-

gests reconstruction for an on-line tracker realised in buffered

real-time.

5.2. Evaluation based on synthetic distortion of real data

To evaluate the robustness and efficiency of the DSHPM al-

gorithm, we used two MoCap walking sequences. Each of them

has 600 frames corresponding to 10 s at a MoCap rate of 60 fps.

They were captured from the same subject for comparability,

in the sparse case of 30 points and the denser case of 50 points,

respectively. Both sequences are denoted as “ideal”, having no

missing or extra data, and minimal distortion by marker attach-

ment to the subject at tightly clothed key sites. Their identifi-

cation rate by the DSHPM is 100%.

5.2.1. Identification of distorted motion data: comparison of

dynamic and static schemes

In the first series of experiments, we compared identifica-

tion effectiveness by the proposed dynamic strategy with that

of a static identification scheme [35], under increasing mo-

tion distortion. In the latter, identification is carried out in

isolation at each frame, without considering any inter-frame

temporal coherence. To simulate the effect of distortion un-

der variable motion intensity, we augmented the “ideal” mo-

tion data with synthetic noise over its natural distortion, as

follows. Taking a pre-identified “ideal” walking sequence, we

added Gaussian noise $N(0, 0.5\delta l_s)/\sqrt{6}$ to the $x$, $y$ and $z$ coordinates of each point in sequences of over 600 frames, the standard deviation being parameterised by a dimensionless distortion level $\delta$ and an average segmental length $l_s$. The aver-

age identification rates (fraction of correctly identified points)

over 500 trials, versus the increasing distortion level of both

sparser 30 and denser 50 feature-point walking trials, using

either the static or dynamic identification scheme, are given

in Fig. 8(a) and (b).
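The noise model is straightforward to reproduce. In the sketch below, delta is the dimensionless distortion level and l_s the average segmental length, matching the $N(0, 0.5\delta l_s)/\sqrt{6}$ perturbation described above.

```python
import numpy as np

def distort(frames, delta, l_s, rng=None):
    """Add N(0, 0.5*delta*l_s)/sqrt(6) noise to every coordinate
    of every point, independently per axis."""
    rng = rng or np.random.default_rng()
    sigma = 0.5 * delta * l_s / np.sqrt(6.0)
    return [f + rng.normal(0.0, sigma, size=f.shape) for f in frames]
```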

We observe that increased distortion leads to more potential

for confusion among neighbouring data points, especially for

the denser point-set, and to greater loss of identification and

tracking difficulty. However, comparing Fig. 8(a) and (b), the

DSHPM algorithm achieves better identification rates than the

static method, at a given distortion level. This is more evident

with increasing distortion, as is to be expected from the DSHPM

robustness gained by exploiting motion coherence embedded

in inter-frames to survive spatial distorted data. The advantage

of using DSHPH is more obvious for the difficult situation of

the denser set.

5.2.2. Identification of corrupted data with missing/extra data

The second series of experiments studied the ability of the

DSHPM algorithm to identify “ideal” walking sequences sub-

jected to increasing missing or extra data. To obtain test motion

sequences with missing data, we removed feature-point data

randomly and evenly among segments, with gaps continuing

for 1–60 frames and average length $L_{\mathrm{corrupt}} = 30$ frames. To

generate an extra trajectory in a volume encompassing the ob-

served data, we randomly inserted two points, one in each of

two frames with 1–60 frames apart, and linearly interpolated

the trajectory in-betweens. The average length of an extra tra-

jectory is therefore Lcorrupt = 30 frames. The fraction of such

corrupted (missing or extra) data is defined as (Lcorrupt/L) ×(Ncorrupt/N), in which Ncorrupt denotes the number of miss-

ing or extra trajectories generated, where L is the frame

length of the sequence and N is the number of model feature

points.

Average identification rates of 500 trials at different data cor-

ruption levels are shown in Fig. 9. Fractions of missing or ex-

tra data added to the “ideal” walking sequences are indicated

by the bottom horizontal axis. The corresponding numbers of

broken trajectories encountered at each missing or extra noise

level, for 30 (and 50) point trials, are given along the top axis.

Comparing the left and right of the zero-line, the zero-line itself

indicating identification of the “ideal” sequences with the original

30 or 50 trajectories, we observe that the algorithm demon-

strates good robustness in rejecting large numbers of outliers,

but rapidly fails to survive the inherent difficulty of increasing

missing data. It is also evident that the denser set gains better

tolerance on missing data than the depleted sparser point-set.

However, more identification loss happens to the denser set for

extra data.


[Fig. 9. DSHPM identification with synthetic missing and extra data added in the “ideal” walking movements (identification rate versus fraction of missing (−) and extra (+) data; top axis: number of trajectories for the sparse (30) and dense (50) point sets).]

[Fig. 10. SVD-count versus missing and extra data (SVD-count, on a logarithmic scale, versus fraction of missing (−) and extra (+) data; top axis: number of trajectories for the sparse (30) and dense (50) point sets).]

5.2.3. Complexity

During dynamic hierarchical matching, the identification

step is computationally more intensive than inter-frame track-

ing. The motion transformation $[R_s, T_s]$ calculated by the SVD

[38] under an assumed segmental correspondence is the

most time consuming step, and is invoked in most essen-

tial modules, e.g. CT-based iterative segment match, motion

verification and recruitment. Therefore, in the last series of

experiments, we attempted to measure an empirical com-

plexity via the total number of SVD invocations, denoted

by SVD-count.

We undertook such a complexity analysis relating to the two

experiments of Section 5.2.2 above. SVD-counts versus miss-

ing/extra data in walking sequences, for both cases of sparser

and denser point sets, are monitored in Fig. 10. In both cases, we

observe that SVD-counts grow steadily with increasing extra

data in an approximate log-linear manner. Most SVD-counts are

spent at the initial key-frame identification stage. Ideally, when

all segments are identified without missing data, any outliers

need only be tracked without further identification cost. On

the left side of zero-line in Fig. 10, SVD load increases rapidly

with increasing missing data. However, the growth tendency is

Author's personal copy

430 B. Li et al. / Pattern Recognition 41 (2008) 418–431

restrained toward higher fractions of lost data. This is because

on the one hand, incomplete data causes more identification

and verification difficulties during initial segmental matching,

and the recruitment function that invokes the SVD is required

to encompass any newly appearing matches; on the other hand,

missing data reduce the number of points to be identified and

raise the possibility of segment abandon. Comparing the denser

and sparser cases, all have the same number of segments, but

the denser set will lead to more populated CTs. This is likely to

require greater numbers of match attempts, but the overall com-

plexity is seen to grow only at some low power of the extra data

measure.

6. Conclusion

The proposed dynamic segment-based hierarchical point matching (DSHPM) algorithm addresses a general and currently open problem in pattern recognition: non-rigid articulated motion reconstruction from low-density feature points. The algorithm has a crucial self-initialisation phase of pose estimation, benefiting from our previous work [35,37]. In the context of a dynamic sequence, the DSHPM algorithm integrates key-frame-based dynamic hierarchical matching with inter-frame tracking to achieve computational efficiency and robustness to data noise. The candidate-table optimisation heuristics are improved by exploiting the geometric coherency embedded in inter-frames. Segment-based articulated matching along a spatial hierarchy is significantly enhanced by a dynamic scheme, in the forms of motion-based verification, dynamic key-frame-shift identification and backward parent-segment correction. Performance analysis of the algorithm using synthetic data demonstrates the effectiveness of the dynamic scheme, which ultimately determines the robustness of articulated motion reconstruction and reduces the uncertainty inherent in the matching problem for a single frame.

We provided illustrative experimental results of human motion reconstruction using 3D real-world MoCap data. Identification rates for most common freeform movements reached 90% or higher without manual intervention to aid the identification. Identification proceeded at over twice the common MoCap rate of 60 fps. This suggests that the DSHPM algorithm is a candidate for self-initialising point-feature tracking and identification of articulated movement in real-time applications.

Acknowledgements

All model and motion data used in our experiments were obtained by a marker-based optical motion capture system (Vicon-512) installed at the Department of Computer Science, UWA. Some motion trials analysed in this paper were captured for the game project “Dance: UK” in collaboration with Broadsword Interactive Ltd. [39].

References

[1] J.K. Aggarwal, Q. Cai, W. Liao, B. Sabata, Articulated and elastic non-rigid motion: a review, in: Proceedings of the IEEE Workshop on Motion of Non-Rigid and Articulated Objects, Austin, TX, 1994, pp. 2–14.
[2] J. Maintz, M. Viergever, A survey of medical image registration, IEEE Eng. Med. Biol. Mag. 2 (1) (1998) 1–36.
[3] J. Deutscher, A. Blake, I. Reid, Articulated body motion capture by annealed particle filtering, in: Proceedings of the IEEE International Conference on CVPR, vol. 2, 2000, pp. 126–133.
[4] T. Moeslund, E. Granum, A survey of computer vision-based human motion capture, Comput. Vision Image Understanding 81 (3) (2001) 231–268.
[5] L. Wang, W. Hu, T. Tan, Recent developments in human motion analysis, Pattern Recognition 36 (3) (2003) 585–601.
[6] C. Cédras, M. Shah, A survey of motion analysis from moving light displays, in: Proceedings of the IEEE Computer Vision and Pattern Recognition, Washington, June 1994, pp. 214–221.
[7] C. Taylor, Reconstruction of articulated objects from point correspondences in a single uncalibrated image, Comput. Vision Image Understanding 80 (3) (2000) 349–363.
[8] J. Zhang, R. Collins, Y. Liu, Representation and matching of articulated shapes, in: Proceedings of the IEEE International Conference on CVPR, vol. 2, 2004, pp. 342–349.
[9] I. Cox, S. Hingorani, An efficient implementation of Reid's multiple hypothesis tracking algorithm and its evaluation for the purpose of visual tracking, IEEE Trans. Pattern Anal. Mach. Intell. 18 (2) (1996) 138–150.
[10] S. Deb, M. Yeddanapudi, K. Pattipati, Y. Bar-Shalom, A generalized S-D assignment algorithm for multisensor–multitarget state estimation, IEEE Trans. Aerosp. Electron. Syst. 33 (2) (1997) 523–538.
[11] C. Veenman, M. Reinders, E. Backer, Resolving motion correspondence for densely moving points, IEEE Trans. Pattern Anal. Mach. Intell. 23 (1) (2001) 54–72.
[12] Y. Wang, Feature point correspondence between consecutive frames based on genetic algorithm, Int. J. Robot. Autom. 21 (2006) 2841–2862.
[13] M. Ringer, J. Lasenby, Modelling and tracking articulated motion from multiple camera views, in: Proceedings of the British Machine Vision Conference, Bristol, UK, September 2000, pp. 172–182.
[14] B. Li, H. Holstein, Dynamic segment-based sparse feature-point matching in articulate motion, in: Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 2002.
[15] R. Campbell, P. Flynn, A survey of free-form object representation and recognition techniques, Comput. Vision Image Understanding 81 (2001) 166–210.
[16] B. Li, Q. Meng, H. Holstein, Point pattern matching and applications—a review, in: Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Washington, DC, USA, October 2003.
[17] V. Gaede, O. Günther, Multidimensional access methods, ACM Comput. Surv. 30 (2) (1998) 170–231.
[18] D.M. Mount, N.S. Netanyahu, J.L. Moigne, Efficient algorithms for robust feature matching, Pattern Recognition 32 (1999) 17–38.
[19] H.J. Wolfson, I. Rigoutsos, Geometric hashing: an overview, IEEE Comput. Sci. Eng. 4 (1997) 10–21.
[20] W.E.L. Grimson, T. Lozano-Perez, D. Huttenlocher, Object Recognition by Computer: The Role of Geometric Constraints, MIT Press, Cambridge, MA, 1990.
[21] E. Bardinet, L.D. Cohen, N. Ayache, A parametric deformable model to fit unstructured 3D data, Comput. Vision Image Understanding 71 (1) (1998) 39–54.
[22] A. Cross, E. Hancock, Graph matching with a dual-step EM algorithm, IEEE Trans. Pattern Anal. Mach. Intell. 20 (11) (1998) 1236–1253.
[23] H. Chui, A. Rangarajan, A new point matching algorithm for non-rigid registration, Comput. Vision Image Understanding 89 (2003) 114–141.
[24] A. Pitiot, G. Malandain, E. Bardinet, P. Thompson, Piecewise affine registration of biological images, in: Second International Workshop on Biomedical Image Registration, 2003.
[25] G. Seetharaman, G. Gasperas, K. Palaniappan, A piecewise affine model for image registration in nonrigid motion analysis, in: Proceedings of the IEEE International Conference on Image Processing, 2000, pp. 1233–1238.
[26] D. Forsyth, D. Ramanan, C. Sminchisescu, People tracking, in: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2006.
[27] D. Gavrila, L. Davis, Model-based tracking of humans in action: a multi-view approach, in: Proceedings of the IEEE Computer Vision and Pattern Recognition, San Francisco, 1996, pp. 73–80.
[28] L. Sigal, S. Bhatia, S. Roth, M. Black, M. Isard, Tracking loose-limbed people, in: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2004.
[29] X. Lan, D. Huttenlocher, A unified spatio–temporal articulated model for tracking, in: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2004.
[30] H. Nguyen, Q. Ji, A. Smeulders, Robust multi-target tracking using spatio–temporal context, in: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2006.
[31] T. Haga, K. Sumi, Y. Yagi, Human detection in outdoor scene using spatio–temporal motion analysis, in: Proceedings of the IEEE International Conference on Pattern Recognition, 2004.
[32] L. Kakadiaris, D. Metaxas, Model-based estimation of 3D human motion, IEEE Trans. Pattern Anal. Mach. Intell. 22 (12) (2000) 1453–1459.
[33] J. Richards, The measurement of human motion: a comparison of commercially available systems, Human Movement Sci. 18 (5) (1999) 589–602.
[34] Vicon Motion Systems, 〈www.vicon.com〉.
[35] B. Li, Q. Meng, H. Holstein, Reconstruction of segmentally articulated structure in freeform movement with low density feature points, Image and Vision Comput. 22 (10) (2004) 749–759.
[36] P.J. Besl, N.D. McKay, A method of registration of 3-D shapes, IEEE Trans. Pattern Anal. Mach. Intell. 14 (2) (1992) 239–255.
[37] B. Li, Q. Meng, H. Holstein, Articulated pose identification with sparse point features, IEEE Trans. Syst. Man Cybern. Part B Cybern. 34 (3) (2004) 1412–1423.
[38] K.S. Arun, T.S. Huang, S.D. Blostein, Least-squares fitting of two 3-D point sets, IEEE Trans. Pattern Anal. Mach. Intell. 9 (5) (1987) 698–700.
[39] Broadsword Interactive Ltd., 〈www.broadsword.co.uk〉.

About the Author—BAIHUA LI received the B.S. and M.S. degrees in electronic engineering from Tianjin University, China and the Ph.D. degree in computer science from the University of Wales, Aberystwyth in 2003. She is a Lecturer in the Department of Computing and Mathematics, Manchester Metropolitan University, UK. Her current research interests include computer vision, pattern recognition, human motion tracking and recognition, 3D modelling and animation.

About the Author—QINGGANG MENG received the B.S. and M.S. degrees in electronic engineering from Tianjin University, China and the Ph.D. degree in computer science from the University of Wales, Aberystwyth in 2003. He is a Lecturer in the Department of Computer Science, Loughborough University, UK. His research interests include biologically/psychologically inspired robot learning and control, machine vision and service robotics.

About the Author—HORST HOLSTEIN received the degree of B.S. in Mathematics from the University of Southampton, UK, in 1963, and obtained a Ph.D. in the field of rheology from the University of Wales, Aberystwyth, UK, in 1981. He is a Lecturer in the Department of Computer Science, University of Wales, Aberystwyth, UK. His research interests include motion tracking, computational bioengineering and geophysical gravi-magnetic modelling.