Author's personal copy
Pattern Recognition 41 (2008) 418–431
www.elsevier.com/locate/pr
Articulated motion reconstruction from feature points
B. Lia,∗, Q. Mengb, H. Holsteinc
a Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, M1 5GD, UK
b Department of Computer Science, Loughborough University, Loughborough, LE11 3TU, UK
c Department of Computer Science, University of Wales, Aberystwyth, SY23 3DB, Wales, UK
Received 17 October 2006; received in revised form 25 March 2007; accepted 6 June 2007
Abstract
A fundamental task of reconstructing non-rigid articulated motion from sequences of unstructured feature points is to solve the problem of
feature correspondence and motion estimation. This problem is challenging in high-dimensional configuration spaces. In this paper, we propose
a general model-based dynamic point matching algorithm to reconstruct freeform non-rigid articulated movements from data presented solely
by sparse feature points. The algorithm integrates key-frame-based self-initialising hierarchical segmental matching with inter-frame tracking
to achieve computational effectiveness and robustness in the presence of data noise. A dynamic scheme of motion verification, dynamic key-
frame-shift identification and backward parent-segment correction, incorporating temporal coherency embedded in inter-frames, is employed to
enhance the segment-based spatial matching. Such a spatial–temporal approach ultimately reduces the ambiguity of identification inherent in a
single frame. Performance evaluation is provided by a series of empirical analyses using synthetic data. Testing on motion capture data for a
common articulated motion, namely human motion, gave feature-point identification and matching without the need for manual intervention, in
buffered real-time. These results demonstrate the proposed algorithm to be a candidate for feature-based real-time reconstruction tasks involving
self-resuming tracking for articulated motion.
© 2007 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
Keywords: Non-rigid articulated motion; Point pattern matching; Non-rigid pose estimation; Motion tracking and object recognition
1. Introduction
Visual interpretation of non-rigid articulated motion has
lately seen something of a renaissance in computer vision
and pattern recognition. The motivation for directing existing
motion analysis of rigid objects towards non-rigid articulated
objects [1,2], especially human motion [3–5], is driven by
potential applications such as human–computer interaction,
surveillance systems, entertainment and medical studies. A
large body of research, dedicated to the task of structure and
motion analysis, utilises feature-based methods regardless of
parametrisation by points, lines, curves or surfaces. Among
these, concise feature-point representation, advantageously
abstracting the underlying movement, is usually used as an
∗ Corresponding author. Tel.: +44 161 247 3598; fax: +44 161 247 1483.
E-mail addresses: b.li@mmu.ac.uk (B. Li), q.meng@lboro.ac.uk
(Q. Meng), hoh@aber.ac.uk (H. Holstein).
0031-3203/$30.00 © 2007 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
doi:10.1016/j.patcog.2007.06.002
essential or intermediate correspondence towards the end-
product of motion and structure recovery [6–8].
In the context of vision cues via feature-point representa-
tion, the spatio–temporal information is notably reduced to a
sequence of unidentified points moving over time. To determine
the subject’s structure and therefore its underlying skeletal-style
movements for the purpose of high-level recognition, two fun-
damental problems of feature-point tracking and identification
need to be solved. Tracking feature points in successive frames
has been investigated extensively in the literature [9–12]. How-
ever, the identities of the subject feature points are not obtain-
able from inter-frame tracking alone.
Feature-point identification requires the determination of
which point in an observed data frame corresponds to which
point in its model, thus allowing recovery of structure. The task
addresses the difficult problem of automatic model matching
and identification, crucial at the start of tracking or on resump-
tion from tracking loss. Currently, most tracking approaches
simplify the problem to incremental pose estimation, relying
on manual model fitting at the start of tracking, or on an as-
sumption of initial pose similarity and alignment to the model,
or on pre-knowledge of a specific motion from which to infer
an initial pose [5,13]. In this sense, the general recovery of non-
rigid articulated motion solely from feature points still remains
an open problem. There is a relative dearth of algorithmic self-
initialisation for articulated motion reconstruction from only a
collection of sparse feature points.
Motivated by these observations, we present a dynamic
segment-based hierarchical point matching (DSHPM) algo-
rithm to address self-initialising articulated motion reconstruc-
tion from sparse feature points. The articulated motion we
are considering describes general segmental jointed freeform
movement. The motion of each segment can be considered as
rigid or nearly rigid, but the motion of the object as a whole
is high-dimensionally non-rigid. In our work, the articulated
model of an observed subject is a priori known, suggesting a
model-based approach. As a general solution to the problem,
the algorithm only assumes availability of feature-point motion
data, such as obtained in our experiments via a marker-based
motion capture system. We do not make the usual simplify-
ing assumptions of model-pose similarity or restricted motion
class for tracking initialisation, nor do we require absence of
data noise. The algorithm aims to establish one-to-one matches
between the model point-set and its freeform motion data to
reconstruct the underlying articulated movements in buffered
real-time.
2. Related work
The problem of automatically identifying feature points to re-
trieve underlying articulated movement can be inherently diffi-
cult for a number of reasons: (1) the possibility of globally high
dimensionality to depict the articulated structure; (2) relaxation
of segment rigidity to allow for limited distortion; (3) data cor-
ruption due to missing (occluded) and extra (via the process of
feature extraction) data; (4) unrestricted and arbitrary poses in
freeform movements; (5) requirements of self-initialising track-
ing and identification; and (6) computational cost compatible
with real-time. While few works have attempted to address all
these issues in the context of sparse feature-point representation
(an early exploratory paper was published in Ref. [14]), many
studies have addressed a variety of aspects of the problem. In
this section, we review techniques related to the problem from
two categories: point pattern matching (PPM) and articulated
motion tracking.
PPM is a fundamental problem for object recognition, motion
estimation and image registration in a wide variety of circum-
stances [15,16]. Many of the approaches have focused on rigid,
affine or projective correspondence, using techniques such as
graphs, interpretation trees [17], Hausdorff distance [18], geo-
metric alignment and hashing [19]. In these cases, the developed
techniques are based on geometric invariance or constraint sat-
isfaction in affine transformations, yielding approximate map-
ping between objects and models [20]. However, these methods
cannot be easily extended to the high-dimensional configuration
space of a complex articulated motion.
For modelling non-rigidity, elastic models [2,21], weighted-
graph matching [22] and thin-plate spline approaches [23] have
been developed to formulate densely distributed points into
high-level structural representations of lines, curves or surfaces.
However, the necessary spatial data continuity is not available
in the case of sparse points representing skeletal structures.
Piecewise approaches [24,25] are probably the most appropri-
ate for segmental data. In our case, a set of piecewise affine
transformations with allowable distortion relaxation is sought
for matching to an articulated segment hierarchy under kine-
matic constraints.
A second category of literature deals with the tracking of
a particular type of articulated motion: human motion. Exist-
ing algorithms are commonly model-based, reconstructing pre-
cise poses from video images. The main challenge is to track
a large number of degrees of freedom in high-dimensional
freeform movements in the presence of image noise. To im-
prove the reliability and efficiency of motion tracking, a spatial
hierarchical search, using certain heuristics such as colour or
appearance consistency, has proved successful [26,27]. How-
ever, the spatial hierarchy and matching heuristic may not be
applicable in individual frames due to self-occlusion and im-
age noise. In that case, spatio–temporal approaches have been
shown advantageous in recent research. Sigal et al. [28] intro-
duced a “loose-limbed model” to emphasise motion coherency
in tracking. Lan and Huttenlocher [29] developed a “unified
spatio–temporal” model exploring both spatial hierarchy and
temporal coherency in articulated tracking from silhouette data.
Spatio–temporal methods have enabled robust limb tracking in
multi-target tracking [30], outdoor scene analysis [31] and 3D
reconstruction of human motion [32]. Our work benefits from
the spatio–temporal concept. However, methodologies based
on the rich information of images cannot be adapted to our
problem domain of motion reconstruction from concise feature-
point representation.
Marker-based motion capture systems exemplify point-
feature trackers [33]. Coloured markers, active markers, or
a set of specially designed marker patterns have been used
to code the identification information in some systems. Such
approaches sidestep the hard problem of marker identifica-
tion, but at the expense of losing application generality. The
generic PPM problem in articulated motion is exemplified by a
state-of-the-art optical MoCap system, e.g. Vicon [34], without
recourse to marker coding. However, auto-identification may
fail for complex motion. MoCap data normally need time-
consuming manual post-processing before they can be used
for actual applications.
Our previous baseline study [35] developed a segment-based
articulated point matching algorithm for identifying an ar-
bitrary pose of an articulated subject with sparse point fea-
tures from single-frame data. The algorithm provided a self-
initialisation phase of pose estimation, crucial at the beginning
or on resumption of tracking. It utilised an iterative “coarse-
to-fine” matching scheme, benefiting from the well-known it-
erative closest point (ICP) algorithm [36], to establish a set
of relaxed affine segmental correspondences between a model
point-set and an observed data set taken from one frame of
articulated motion. However, we argued that more robust motion
reconstruction is possible by combining this with information
from inter-frame tracking, eventually reducing the uncertainty
inherent in the matching problem for single-frame data [37].
Pursuing the cross-fertilisation of this research with existing
techniques, we extend our previous study on single-frame ar-
ticulated pose identification [35,37] into a dynamic context.
We propose a DSHPM algorithm, targeting the reconstruction
of articulated movement from motion sequences. The DSHPM
algorithm integrates inter-frame tracking and spatial hierarchi-
cal matching to achieve effective articulated PPM in buffered
real-time. The idea of segment-based articulated
matching, as computational substrate to explore the spatial hi-
erarchy [35,37], is enhanced by exploiting motion coherency
embedded in inter-frames that ultimately reduces the ambiguity
of identification in the presence of data noise.
3. Framework of the model-based DSHPM algorithm
The generic task under consideration arose from the need
to identify feature-point data to reconstruct the underlying skeletal
structure of freeform articulated motion. We assume the data
capture rate is sufficiently high, as demanded in most real-world
applications. This allows feature-point trajectories to be obtained
in successive frames. However, the identities of feature
points (or trajectories) are not known.
3.1. The articulated model and motion data
The subject to be tracked is pre-modelled. A subject model
comprises S segments with complete feature points. Each seg-
ment P_s = {p_{s,i} | i = 1, …, M_s} has M_s identified feature points
p_{s,i}. The feature points are sufficient in number and distribution
to indicate the orientation and segment structure in the demanded
detail. Segment non-rigidity is allowed within a threshold of
segmental distortion ratio ε_s. Articulation is indicated through
join-point commonality between two segments, suggesting a
segment-based hierarchy. To keep the algorithm general, each
segment undergoes independent motion constrained only by
joint points. We do not impose motion constraints such as fea-
sible biological poses for a specific subject type.
The observed motion data of the subject are represented by
a sequence of point-sets, one per time frame t: Q^t = {q^t_j | j =
1, …, N^t}, where the N^t data points q^t_j may be corrupted by
missing data due to occlusion and by extra noise data arising from
the process of feature extraction.
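The model and data conventions above might be captured as follows. This is a minimal Python sketch under hypothetical names (the paper's implementation is in Matlab); it shows only the structure implied by the text, not the authors' code:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Segment:
    """One near-rigid segment P_s of the articulated model."""
    points: np.ndarray            # identified model points p_{s,i}, shape (M_s, 3)
    distortion_ratio: float       # eps_s: allowed segmental non-rigidity
    children: list = field(default_factory=list)  # child segments sharing a join point
    join_index: int = -1          # index of the join point shared with the parent (-1: root)

def make_frame(points_3d):
    """One observed frame Q^t: N^t unidentified points (rows), possibly with
    missing (occluded) points absent and extra noise points present."""
    return np.asarray(points_3d, dtype=float)
```
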
3.2. Outline of the DSHPM
To identify the large volume of data within a complex motion
sequence, frame-by-frame model fitting would be computation-
ally expensive and unnecessary. Ideally, initial whole-model fitting
need only be attempted at some key-frames, in particular, at
the start of tracking or on the resumption from tracking fail-
ure. Subsequent identification of an individual feature point
Fig. 1. Framework of the dynamic segment-based hierarchical point matching
(DSHPM) algorithm. [Flowchart: pre-tracking and pre-segmentation of an articulated motion sequence presented by feature points; then, depth-first along the hierarchical tree for each segment: CT-based iterative segmental matching, motion verification and recruitment, dynamic key-frame-shift identification and recruitment (abandoning the segment after two failed key-frame ranges, with backward parent-segment correction if a parent exists); finally, tracking and identity propagation.]
can be achieved by tracking over its trajectory. In the case
of broken tracks, re-identification costs are largely reduced
by reference to the known points whose identities are carried
forward.
The framework of the proposed DSHPM algorithm is shown
in Fig. 1. As the computational substrate for initial model fitting,
it employs a hierarchical segmental mapping supported by
candidate-table (CT) optimisation at a key-frame (Section 4.2).
To reduce the inherent uncertainty of segmental match-
ing in a single key-frame, a dynamic scheme (Section 4.3)
incorporating temporal coherency in a key-frame range is
explored through inter-frame tracking. This includes three
phases: motion verification, dynamic key-frame-shift identi-
fication and backward parent-segment correction, as shown
in Fig. 1. Under CT-based iterative matching, the algorithm
first verifies that a proposed segmental match is consistent with
an affine transformation over a period of movement subject to
a relaxed geometric invariance defined by a segmental distor-
tion ratio ε_s. We name this process motion verification (Sec-
tion 4.3.1). If a segment cannot be identified or the segment
identification cannot be proved correct by motion verification,
reflecting poor segment data in the current key-frame, the al-
gorithm shifts the key-frame forwards a certain time period
to attempt re-identification. This process is denoted dynamic
key-frame-shift identification (Section 4.3.2). The final phase,
backward parent-segment correction, aims to correct any
wrongly identified parent-segment which could cause subse-
quent unsuccessful matches of child-segments (Section 4.3.3).
In the dynamic process, a temporal key-frame shift is always
accompanied by a recruitment procedure that forward propa-
gates already obtained identities and recruits any newly appear-
ing matches in order to maintain spatial hierarchy.
4. The DSHPM algorithm
Identification is carried out hierarchically segment by seg-
ment in a chosen key-frame containing e.g. over 90% of the
model points (Section 4.2), or in a key-frame range when nec-
essary, taking advantage of a dynamic scheme (Section 4.3). In
order to make temporal coherence of motion cues exploitable
and reduce the search space, feature-point pre-tracking and pre-
segmentation are carried out prior to segmental identification
(Section 4.1).
4.1. Pre-tracking and pre-segmentation
Feature-point data are tracked in a time period before identi-
fication. We denote this step as pre-tracking in Fig. 1. This pro-
cess allows propagating inter-frame correspondences of feature
points in a key-frame backwards and forwards along stacked
pre-tracked trajectories. It not only makes use of motion co-
herence for efficient segment retrieval, but also makes the key-
frame-based identification feasible.
The pre-tracked trajectories exhibit relative motion cues of
individual points. To reduce the search space of a segment,
a pre-segmentation process is carried out prior to segmental
identification, as shown in Fig. 1. We group unidentified points
that maintain relatively constant distances during articulated
movements, as candidates for intra-segmental membership.
The pre-segmentation is subject to criteria that depend on
the Euclidean distance D^t_{i,j} between each pair of observed data
points (q_i, q_j) at frames t = K + nΔ, n = 0, 1, …, 10, starting from the
key-frame K and proceeding in intervals Δ, where Δ denotes
the motion relaxation interval, that is, the number of frames
during which motion relaxation takes place, reflecting notice-
able changes in pose. We determine intra-segmental point-pair
candidature (q_i, q_j) using a two-stage criterion with relaxation:

D^t_{i,j} < (1 + max_s ε_s) · max_s D_s,   (1)

(max_n D^{K+nΔ}_{i,j} − min_n D^{K+nΔ}_{i,j}) / avg_n D^{K+nΔ}_{i,j} < max_s ε_s,   (2)

where the segment distortion ratio ε_s is determined by the relative
variation of edge length among segmental point-pairs, reflect-
ing the allowed “non-rigidity” of a segment in motion.
Criterion (1) indicates that the point-pair distance D^t_{i,j} should
be less than the maximum intra-segmental point-pair distance
max_s D_s with maximum distortion relaxation. Criterion (2) requires
that the ratio between the extremal distance difference and the
average distance of the point-pair, over the intervals, should be
less than the maximum distortion ratio allowed in any segment.
If both criteria are satisfied, indicating that the point-pair (q_i, q_j)
maintains a consistent distance with allowed relaxation in
articulated movements and may therefore belong to the same
segment, we store this information in a segmentation matrix
Seg and set Seg(i,j) = 1; otherwise we set Seg(i,j) = 0 for an
extra-segment pair.
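Criteria (1) and (2) and the construction of the segmentation matrix Seg can be sketched as follows. This is an illustrative Python version under assumed names, not the authors' code; frames are assumed row-aligned across time by pre-tracking, and max_eps and max_D stand for max_s ε_s and max_s D_s:

```python
import numpy as np

def pre_segment(frames, K, delta, max_eps, max_D, n_samples=11):
    """Build the binary segmentation matrix Seg from criteria (1)-(2),
    sampling frames t = K + n*delta, n = 0..n_samples-1.

    frames: dict t -> (N, 3) array of tracked points, rows aligned across
    frames by pre-tracking; max_eps = max_s eps_s; max_D = max_s D_s.
    """
    samples = [np.asarray(frames[K + n * delta], float) for n in range(n_samples)]
    # Pairwise point distances at every sampled frame: shape (n_samples, N, N)
    D = np.stack([np.linalg.norm(F[:, None, :] - F[None, :, :], axis=-1)
                  for F in samples])
    crit1 = (D < (1.0 + max_eps) * max_D).all(axis=0)            # criterion (1)
    spread = (D.max(axis=0) - D.min(axis=0)) / np.maximum(D.mean(axis=0), 1e-12)
    crit2 = spread < max_eps                                      # criterion (2)
    Seg = (crit1 & crit2).astype(int)
    np.fill_diagonal(Seg, 0)    # a point is not paired with itself
    return Seg
```

Points that stay within a relaxed mutual distance over the whole sampled interval receive Seg(i,j) = 1 and become intra-segmental candidates.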
4.2. CT-based iterative segmental identification
Articulated motion maintains a relaxed geometric invariance
in near-rigid segments. Matching at segment level is therefore
preferable to brute-force global point-to-point searching. Initially
neither correspondence nor motion transformation is known for
any segment or point. To identify a segment P_s in an articu-
lated structure at tracking start, we adapt the basic idea of CT
for pose identification developed in our previous work [35,37],
to the new context of motion sequence. We enhance the static
CT-based iterative segmental matching by exploring motion co-
herence embedded in inter-frames.
Briefly, a CT is created and used in two stages: (1) CT generation
and optimisation augmented by pre-segmentation information
(Section 4.1), and (2) CT-based iterative matching (Section 4.2.2).
4.2.1. CT generation
As explained in Refs. [35,37], the CT of segment P_s is deter-
mined by intra-segmental distance similarity, here augmented
with heuristic rigidity cues available from pre-segmentation. To
define the column ordering of a CT for a segment in which no
point has been identified, we arbitrarily choose a “pivot” reference
point p_{pivot_s}, and order the remaining model points p_i by non-
decreasing distance D_{pivot_s,i} from the pivot, giving a model
pivot sequence for the segment.
To match the model pivot sequence, a sequential search is
applied, with the possibility of rapid rejection of false candi-
dates. Thus, from the unidentified data at key-frame K, we arbi-
trarily choose an assumed pivot match q^K_{a_{pivot_s}} of p_{pivot_s}, and
calculate its distance D^K_{a_{pivot_s},j} to all other unidentified points
q^K_j.
To exclude large outliers of the segment based on the cho-
sen pivot, a pivot-centred bounding box, relaxed by distortion
ratio ε_s, is applied [35]. Then, in the bounding subspace, the
algorithm seeks match candidates for each model point based on
distance similarity with reference to the assumed pivot q^K_{a_{pivot_s}},
satisfying the distortion tolerance

|D_{pivot_s,i} − D^K_{a_{pivot_s},j}| / D_{pivot_s,i} < ε_s,   (3)

in which candidate selection is restricted to points satisfying the pre-
segmentation point-pair rigidity criterion Seg(a_{pivot_s}, j) = 1.
We list such selected candidates in a table column as possible
matches. The procedure is repeated for every element along
the model pivotal sequence, giving rise to an ordered matching
sequence of columns that defines the CT for the assumed pivot
match (p_{pivot_s}, q^K_{a_{pivot_s}}).
Taking each unidentified point in turn as an assumed pivot
match, we generate a set of CTs for that model pivot choice.
Author's personal copy
422 B. Li et al. / Pattern Recognition 41 (2008) 418–431
Heuristically, the CT constructed with the correct pivot match,
if present in the data, should include more candidates than other
CTs. To economise the iterative search at the next stage (Section
4.2.2), CT prioritisation by CT-culling, CT-ranking and candidate
ordering is applied to reduce the search space in which the
correct solution is likely to be found, as discussed in Ref. [35].
The use of CTs makes the assumption of small motion or pose
similarity, used e.g. in the ICP [36], unnecessary.
When a join point has already been identified during its
parent-segment identification, this point is chosen as the
unique pivot. In this case, only one CT is generated, resulting
in a striking reduction of search space.
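CT generation for one assumed pivot match, including the criterion (3) tolerance test and the pre-segmentation restriction, might look like this. This is a hedged Python sketch with hypothetical names, not the authors' implementation; CT-culling, CT-ranking and candidate ordering are omitted:

```python
import numpy as np

def build_ct(model_seg, data, pivot_idx, a_pivot, Seg, eps_s):
    """Candidate table for one assumed pivot match (p_pivot, q^K_apivot).

    model_seg: (M_s, 3) model points; data: (N, 3) unidentified points at
    key-frame K; Seg: pre-segmentation matrix.  Returns the model pivot
    ordering and, per ordered model point, its candidate data indices.
    """
    d_model = np.linalg.norm(model_seg - model_seg[pivot_idx], axis=1)
    order = np.argsort(d_model)               # model pivot sequence
    d_data = np.linalg.norm(data - data[a_pivot], axis=1)
    ct = []
    for i in order:
        if i == pivot_idx:
            ct.append([a_pivot])              # the pivot matches itself
            continue
        ok = np.abs(d_data - d_model[i]) / d_model[i] < eps_s   # criterion (3)
        ok &= Seg[a_pivot] == 1               # rigidity restriction from Seg
        ct.append(list(np.flatnonzero(ok)))
    return order, ct
```

Repeating this over every assumed pivot match yields the set of CTs that the prioritising and iterative-matching stages then operate on.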
4.2.2. Iterative segmental matching
To detect the presence or absence of a one-to-one intra-
segmental feature-point correspondence in a CT from the reduced
set of prioritised CTs, an iterative matching procedure is em-
ployed to seek the best transformation that interprets segmen-
tal movement under distortion relaxation. We take the first can-
didates in the top row of a CT to provide an assumed ini-
tial match Q^K_s of model segment P_s. An affine transformation
[R_s, T_s] is determined under this correspondence by a least-
squares SVD method [38]. If this transformation maps model
points into a rough alignment with their assumed matches, so
that the average mapping error satisfies the desired matching
quality bounded by the segmental distortion ratio,

e(P_s → Q^K_s)|_{[R_s, T_s]} < ε_s,   (4)

then the assumed segmental match (P_s → Q^K_s) is taken as
correct. Otherwise, there must be pseudo pairs in the matching
assumption which need to be removed by coarse-to-fine error-
driven iteration. Pseudo pairs exaggerate individual matching
errors at wrong matches. Based on this cue, we remove the
worst match by replacing it with its next candidate in the CT, if it
exists; otherwise we omit its match from the currently assumed
correspondence in the CT. If this CT becomes exhausted before
the best segment match is found, then a new CT would be
interrogated [37].
To qualify as a whole segmental match, the transformation
under the assumed correspondence should also guarantee the
desired matching quantity fraction η_s,

number(P_s → Q^K_s)|_{[R_s, T_s]} / M_s > η_s,   (5)

where M_s is the number of feature points in model segment
P_s. If the matching quantity criterion is not satisfied, the algo-
rithm will attempt to find the remaining matches of the segment,
which may have been dropped during iterations, or were ex-
cluded from the CT on grounds of limiting the search space via
the stringent criteria (1)–(3). Finding remaining matches
is achieved by reassigning their nearest neighbours in the data
under the current transformation [R_s, T_s] (refer to segment recruit-
ment shown in Fig. 3). If no such closest neighbour is found,
we say the match of the point is lost.
Iterative motion estimation and refinement alternately update
the assumed matches until converging to a segmental match
(P_s → Q^K_s) in the correct CT, satisfying both the matching
quality criterion (4) and the matching quantity criterion (5).
In the event of no CT providing an acceptable match, the
segment match will be deemed not to exist in the current key-
frame.

Fig. 2. Motion verification.
In the case of segments with fewer than three matching pairs,
the SVD-based motion estimation cannot be applied. Segmental
identification becomes highly uncertain. We need to confirm
such a segment in the hierarchical chain depending on whether
its children or even grandchildren can be found (Section 4.4).
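The least-squares SVD estimation of [R_s, T_s] referred to above is commonly realised by the method of Arun et al. (Ref. [38]). A sketch follows; normalising the average mapping error of criterion (4) by a caller-supplied segment scale is an assumption for illustration, since the text does not spell out the normalisation:

```python
import numpy as np

def fit_segment_transform(P, Q):
    """Least-squares rigid transform [R, T] mapping model points P (n,3)
    onto matched data points Q (n,3), via the SVD method of Arun et al."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    T = cQ - R @ cP
    return R, T

def matching_quality(P, Q, R, T, scale):
    """Average mapping error of R*P + T against Q, normalised by a segment
    scale so that criterion (4) reads: matching_quality < eps_s."""
    return np.linalg.norm((R @ P.T).T + T - Q, axis=1).mean() / scale
```

The reflection guard is what makes at least three non-collinear matching pairs necessary, consistent with the remark above about segments with fewer than three pairs.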
4.3. Dynamic identification
Single-frame spatial pose data alone may have inherent un-
certainty in determining the correct match from noisy data. In
the dynamic context of a motion sequence, geometric coher-
ence embedded in the temporal domain along pre-tracked tra-
jectories is used to improve the reliability of identification from
a key-frame. As shown in Fig. 1, after the CT-based iterative
segmental matching (Section 4.2), a dynamic scheme of motion
verification (Section 4.3.1), dynamic key-frame-shift identifica-
tion (Section 4.3.2) and backward parent-segment correction
(Section 4.3.3) is applied over a key-frame range to yield the
most likely correct segment identification.
4.3.1. Motion verification
In segment-based articulated motion, geometric invariance
of segmental “rigidity” should be maintained over movements.
The idea of motion verification is therefore to propagate the
segmental feature-point identities along their pre-tracked tra-
jectories, and to confirm that an affine transformation under such
a correspondence still satisfies the matching quality criterion
(4) within the allowed distortion relaxation. As summarised
in Fig. 2, the segmental matching obtained in the key-frame
should be confirmed via its “rigidity” after a motion relaxation
interval, a Δ-frame shift of the key-frame.
When the key-frame segment match (P_s → Q^K_s) is con-
firmed, the algorithm will attempt to retrieve newly appearing
points at the observed frame K + Δ if the segment is incom-
plete. This is achieved by the segment recruitment procedure
shown in Fig. 3. If more matches are found at the observed
frame K + Δ, reflecting good data quality, then the dynamic
identification scheme favours a key-frame-shift to K + Δ, to be
described below.
Fig. 3. Segment recruitment.
Fig. 4. Recursive Parent_segment_correction.
4.3.2. Dynamic key-frame-shift identification
The quality of some frame data for a segment may be very
poor, on account of excessive missing/extra data or distortion.
In this case, CT-based segmental identification may fail. This
will break off the hierarchical search and result in serious un-
certainty for successive child-segment identification. For this
reason, we do not confine the segment matching to the initially
chosen key-frame, but rather carry out matching in a key-frame
range. If the segment identification or verification fails, then the
dynamic key-frame-shift module is used to re-identify the seg-
ment in up to two successive key-frame-shifts, as shown in Fig. 1.
In order to maintain spatial hierarchy, after the key-frame-
shift, the recruitment procedure in Fig. 3 is applied to all in-
completely identified segments to encompass any previously
missed matches and forward propagate the obtained segmental
identities into the new key-frame.
4.3.3. Recursive parent-segment correction
If two successive key-frame-shift processes still fail to iden-
tify or verify a segment Ps , this may imply a wrong or highly
distorted joint pivot in use, derived from its parent-segment dur-
ing hierarchical searching. In this case, the algorithm attempts
a recursive backward parent-segment correction to check the
join point and even its parent-segments, as described in Fig. 4.
If, after a series of dynamic attempts, no parent join is implicated
in the failed identification, we ultimately abandon the segment.
This indicates that the segment could have been occluded, or
have poor data quality even over a range of investigated frames.
4.4. Integrating temporal coherence with spatial hierarchy for
articulated match
Articulation at inter-segment joins is represented in a tree hi-
erarchy. Consistent matching of articulated segments is carried
out with respect to this tree. Such spatial hierarchy organises
segment identification in a parent–child ordering, thereby car-
rying forward identified join points in a parent to its children.
We assume that one of the segments of the articulated struc-
ture contains more points and has more segments linked to it
than most other segments. We treat such a segment as root,
seeking to identify it first. After the root has been located,
searching proceeds depth-first to child-segments along hierar-
chical chains, taking advantage of available joints which have
been located during parent-segment identification. This linkage
through join points considerably increases the reliability and
efficiency of child-segment identification. In the case of miss-
ing joint data on a parent segment, we recover a virtual joint if
at least three identified points are obtained in the parent. When
a parent has several children, searching prioritises the child
with the most model points, as its identification incurs the least
uncertainty from missing, extra or distorted data, and leads to
the greatest subsequent search space reduction. In the case of
broken search chains in the hierarchy due to a failed segment
identification or missing join point, identification will proceed
to other segments on other chains first and leave any remain-
ing child-segments on broken chains to be identified last under
conditions of a much reduced search space.
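The depth-first, largest-child-first traversal described above can be sketched as follows (an illustrative Python fragment with hypothetical names; broken-chain deferral is omitted for brevity):

```python
def hierarchical_order(point_counts, children, root):
    """Depth-first segment visit order from the root, descending to the
    child with the most model points first.

    point_counts: dict segment -> number of model points;
    children: dict segment -> list of child segments.
    """
    order, stack = [], [root]
    while stack:
        s = stack.pop()
        order.append(s)
        # push smaller children first so the largest is popped (visited) next
        for c in sorted(children.get(s, []), key=lambda c: point_counts[c]):
            stack.append(c)
    return order
```

With the root chosen as the segment with the most points and links, this ordering lets each child inherit join points already located in its parent.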
The segment-based hierarchical search operates dynamically
in a key-frame range, rather than being confined to a single
static data frame as in Refs. [35,37]. This lends robustness to
segment identification, as data may be poor in one frame while
good in another. When the hierarchical chain is broken in a
chosen frame, the algorithm can shift to a new frame to carry
on the search. Most existing feature-point identifications will
be carried forwards along pre-tracked trajectories to propagate
spatial continuity into the new frame. An obtained segment
match can be confirmed by the geometric invariance presented
in the temporal domain over a motion relaxation interval Δ
(Fig. 2). A failed child-segment identification caused by a
wrongly inherited pivot from its parent can be corrected by a
recursive procedure (Fig. 4). Meanwhile, the dynamic scheme
allows an efficient recruitment of reappearing segment points
by reference to the known points whose identities are carried
forward (Fig. 3). Our experimental results confirm that tem-
poral coherency integrated with spatial hierarchical cohesion
enhances identification of complex articulated motion in the
presence of data corruption (Section 5).
4.5. Identification with tracking for registering a whole
motion sequence
After segment-based dynamic hierarchical matching in a key-
frame range, the identity of each feature point can be propagated
along its trajectory by inter-frame tracking. The algorithm con-
tinues to track identified points forwards throughout the whole
motion sequence until missing data are encountered, causing
broken trajectories. For missing data, the algorithm attempts to
identify reappearing points or even segments and restart their
tracking. This is much easier by reference to the already identi-
fied points than at the initial stage of model fitting, because the
motion transformation can be obtained from partially identified
segment matches. In the case of an entirely new segment
appearing, identification complexity is evidently much reduced in
the presence of previously established correspondences in the
articulation.
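The identity propagation just described can be sketched as a gated nearest-neighbour assignment between consecutive frames. The frame representation (a dict of identities to 3D positions plus a list of unlabelled candidates) and the gating radius are illustrative assumptions, not the paper's exact tracker:

```python
# Minimal nearest-neighbour identity propagation along trajectories.
# Points with no candidate inside the gate are reported as lost and
# handed back to segment-based re-identification.
import math

def propagate_identities(labelled, next_points, gate):
    """Carry point identities from a labelled frame into the next frame."""
    matched, lost = {}, []
    free = list(next_points)
    for pid, pos in labelled.items():
        best, best_d = None, gate
        for cand in free:
            d = math.dist(pos, cand)
            if d < best_d:
                best, best_d = cand, d
        if best is None:
            lost.append(pid)       # trajectory broken here
        else:
            matched[pid] = best
            free.remove(best)      # enforce one-to-one assignment
    return matched, lost, free     # `free` holds unexplained extra points
```

The `lost` list marks where trajectories break; the `free` list collects candidate extra points, either noise or reappearing features awaiting re-identification.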
5. Experiments
The algorithm has been implemented in Matlab. We tested it
on articulated models, such as humans and robot manipulators,
at various low densities and distributions of feature points.
Human motion, a typical articulated motion whose segments are
only near-rigid, makes the identification task more difficult
than robot manipulator motion with rigid segments. To reflect
this challenge, we report in this section experimental results
on real-world human motion capture and its overlays with
synthetic noise, for performance analysis of the algorithm.
In our experiments, all 3D model data and human motion
data were acquired via a marker-based optical MoCap Vicon
512 system [34]. The measurement accuracy of this system is
at the level of a few millimetres in a control volume spanning
metres in linear extent. We attached markers as extrinsic feature
points on a subject at key sites, indicating segmental structure
with required detail. Marker attachment to tight clothing or
bony landmarks nevertheless introduced inevitable segmental
non-rigidity due to underlying soft body tissues. The sampling
rate of the human MoCap was 60 frames per second (fps) in our
experiments.
5.1. Human motion reconstruction from MoCap data
A number of freeform movements captured from various
subjects in various point distributions were investigated. The
subject model is first generated off-line using one complete
frame of feature-point data captured in a clear pose. We manu-
ally identified the data in a 3D-interactive display and grouped
them into segments consistent with the underlying articulated
anatomy. This produced a “stick-figure” skeleton model of the
subject, as shown in the first of the figure sequences in Fig. 6(a)
and (b). Having attached markers to the subject and defined its
model, we proceeded with the capture of the subject’s freeform
motion using the Vicon MoCap 512 system.
5.1.1. Parameter setting
The segmental distortion ratio εs in Eqs. (1)–(4) and the
matching quantity μs in Eq. (5) were pre-defined according to
segment rigidity and the quality of the MoCap data. Precise
a priori values of the parameters are not required, but algorithmic
performance will be compromised by grossly inappropriate choices.
Fig. 5. Capturing static pose data for subject model generation.
To provide experimental values of εs, we analysed a number of
dynamic trials. We found that segmental distortion varies with
body part and motion intensity. Thigh segments may give rise
to large distortion, with εs ≈ 0.2; a value of εs ≈ 0.05 was
found adequate for the near-rigid head, and an average εs ≈ 0.1
for other body parts. We used one set of approximate distortion
ratios, εs = 0.05–0.2, in all human motion experiments (Fig. 5).
For a rigid segment, a small value of εs guarantees precise
matching and good rejection of outliers; we can therefore
reduce the matching quantity requirement μs to gain more
tolerance of missing data. For a deformable segment, a high
εs value allows increased distortion, but at the cost of a larger
candidate search space and possibly lower matching quality.
To compensate, we must raise the matching quantity requirement
μs, with possibly compromised handling of missing data. Based
on the quality of the MoCap data and the rigidity of individual
segments, we set the matching quantity μs = 0.80–0.90. To
reflect significant pose changes, we chose a motion relaxation
interval of 15 frames, corresponding to 0.25 s at the MoCap
rate of 60 fps.
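These threshold roles can be illustrated with a minimal acceptance test. The symbol spellings (eps_s for the distortion ratio, mu_s for the matching quantity) and the combined criterion are a sketch reconstructed from the ranges quoted above, not the paper's exact Eqs. (1)–(5):

```python
# Illustrative segment-match acceptance: a match passes if its worst
# relative distortion stays within the per-part distortion ratio and
# its matched fraction reaches the matching-quantity requirement.
EPS_S = {"thigh": 0.2, "head": 0.05, "default": 0.1}  # distortion ratios
MU_S_RANGE = (0.80, 0.90)                             # matching quantity range

def accept_match(part, rel_distortion, matched, total, mu_s=0.85):
    """Sketch of the two-threshold acceptance test for one segment."""
    eps_s = EPS_S.get(part, EPS_S["default"])
    return rel_distortion <= eps_s and (matched / total) >= mu_s
```

Note the trade-off described above: lowering eps_s for a rigid segment permits a lower mu_s (more tolerance of missing data), whereas a deformable segment needs a higher mu_s to compensate for its looser distortion bound.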
5.1.2. Reconstruction results
Illustrative results of identified MoCap sequences from rep-
resentative full-body models of 27, 34 and 49 feature points in
15 segments are given in Table 1 and in Fig. 6. In Fig. 6, iden-
tified feature points are linked intra-segmentally. As shown in
Fig. 6, there is no assumption of pose similarity between the
model (the first frame in Fig. 6(a) and (b)) and their motion
sequences, either initially or during the movements. The captured
motion sequences were subject to inevitable intermittent missing
Table 1
Identification examples of human motion (MoCap rate 60 fps)

Activity             Sequence length       Number of      Identification   Efficiency (identified
                     in frames (seconds)   trajectories   rate (%)         frames per second)

Protocol 1: 27 feature points
Walking              600 (10)              32             96               300
Running              600 (10)              40             95               270
Freeform movement    1200 (20)             93             94               220

Protocol 2: 34 feature points
Walking              600 (10)              45             98               380
Running              600 (10)              54             97               320
Freeform movement    1200 (20)             106            94               250

Protocol 3: 49 feature points
Walking              600 (10)              56             96               280
Running              600 (10)              65             94               220
Freeform movement    1200 (20)             128            91               130
points, extra noise points and segmental distortion in complex
intensive motion. We observe that the proposed DSHPM algo-
rithm is capable of reconstructing an articulated motion rep-
resented by sparse or moderately dense feature points in the
presence of data noise. Even when some key points, such as
join points, or segments have been lost, the algorithm can still
carry on the identification process by taking advantage of the
proposed dynamic scheme.
This algorithm has been successfully used to identify a num-
ber of MoCap trials in a commercial project to produce the
game “Dance: UK”.1 Extracts from an identified dance se-
quence are shown in Fig. 7.
5.1.3. Performance analysis from the MoCap data
To demonstrate the performance of the DSHPM algorithm,
Table 1 gives results of human motion identification obtained
by applying the DSHPM algorithm on real-world MoCap
trials. The average missing and extra data in the real-world
MoCap examples is about 10–15%. Some general types of
activities, listed in the first column, were tested under three
types of marker protocol: 27, 34 and denser 49 feature points.
The freeform movement includes walking, running, jumping,
bending and dancing.
The results shown in each row of Table 1 were averaged
for a number of trials from different subjects performing
similar movements with the same marker protocol. The average
length of each type of activity, measured in frames as
shown in column 2, gives an indication of absolute motion pe-
riod at the 60 fps MoCap rate. The sequence length by itself
does not always indicate the difficulty of identification. In the
most favourable case of no tracking interruption, each feature
point needs to be identified only once, allowing its identity to be
propagated along its trajectory with minimum re-identification
cost. However, a trial with occlusions in complex movements
will lead to increased computational cost, as each reappearing
feature point is subjected to identification after tracking loss.
1 “Dance: UK” was developed in collaboration with Broadsword
Interactive Ltd. [39]. It was released at Christmas 2004.
To indicate identification workload due to lost tracking, col-
umn 3 gives the number of trajectories, counting interruptions.
These are generally higher than the indicated number of feature
points, consistent with increasing identification difficulty.
Activities (first column) for each marker protocol are or-
dered by increasing movement complexity, accompanied by in-
creased identification difficulty due to more missing data and
extra noise data. This is reflected by the decreasing identifica-
tion rate in column 4. The identification rate is defined as the
percentage of the number of correctly identified trajectories in
relation to the total number of trajectories encountered. The
high trajectory-based rate emphasises correct identification
obtained via segment-based articulated matching, rather than
identification inherited only from inter-frame tracking, thus
illustrating the effective nature of the algorithm.
We observed that the identification rate in Table 1 is in excess
of 90% in all motion types, whether with sparser or denser
feature points, even for large accelerative movements with big
segmental distortion, such as in jumping and dancing, and for
complex movements characterised by large numbers of broken
trajectories due to occlusion.
Reconstruction efficiency of the algorithm depends on the
complexity of the articulated model, but critically also on the
motion conditions: the level of segmental distortion associated
with movement intensity, and the frequency and amount of
missing/extra data associated with motion complexity. We indicated
an empirical reconstruction efficiency via “identified frames per
second” in Table 1. This indicator is defined as the length of
a trial (measured in frames) divided by the computational time
(measured in seconds) when the DSHPM identification was ex-
ecuted on a Compaq Pentium IV with 512 MB RAM in Matlab
code. We observe that for common activities of walking and
running, the identification efficiency in the type 2 marker pro-
tocol (34 feature points) is higher than for types 1 and 3 (27
and 49 feature points, respectively). Type 2 is a compromise
between having too few feature points (type 1) to allow unin-
terrupted hierarchical searching in the case of missing data, and
the denser data sets (type 3), with possible identification con-
[3D stick-figure plots: (a) subject model followed by frames 50, 200, 350, 500, 650, 800 and 950; (b) subject model followed by frames 60, 120, 180, 240, 300, 360 and 420.]
Fig. 6. Reconstructed human freeform movements: subject models of 15 segments followed by 7 sampled frames from their identified motion sequences: (a)
human motion represented by 34 feature points; (b) human motion represented by 49 feature points.
Fig. 7. Dance trial reconstruction in the game project “Dance: UK”.
[Two plots of identification rate (0.5–1) versus additional distortion level (0–0.27), each with curves for the 30-point and 50-point sets.]
Fig. 8. Comparison of the static approach [35] with the DSHPM approach for motion reconstruction with additional synthetic distortion: (a) static identification;
(b) dynamic identification.
fusion and general increased computation cost. For each type
of marker protocol, identification efficiency decreases with
increasing activity complexity involving more broken trajectories
and data noise. In all cases, the identification efficiency
exceeded the 60 fps MoCap rate by at least a factor of two,
making identification time competitive with real motion time.
This suggests that reconstruction for an on-line tracker could
be realised in buffered real-time.
5.2. Evaluation based on synthetic distortion of real data
To evaluate the robustness and efficiency of the DSHPM
algorithm, we used two MoCap walking sequences. Each has
600 frames, corresponding to 10 s at a MoCap rate of 60 fps.
They were captured from the same subject for comparability,
in the sparse case of 30 points and the denser case of 50 points,
respectively. Both sequences are denoted as “ideal”, having no
missing or extra data, and minimal distortion by marker attach-
ment to the subject at tightly clothed key sites. Their identifi-
cation rate by the DSHPM is 100%.
5.2.1. Identification of distorted motion data: comparison of
dynamic and static schemes
In the first series of experiments, we compared identifica-
tion effectiveness by the proposed dynamic strategy with that
of a static identification scheme [35], under increasing mo-
tion distortion. In the latter, identification is carried out in
isolation at each frame, without considering any inter-frame
temporal coherence. To simulate the effect of distortion un-
der variable motion intensity, we augmented the “ideal” mo-
tion data with synthetic noise over its natural distortion, as
follows. Taking a pre-identified “ideal” walking sequence, we
added Gaussian noise N(0, 0.5ηls)/√6 to the x, y and z
coordinates of each point over the 600-frame sequences, the
standard deviation being parameterised by a dimensionless
distortion level η and the average segmental length ls. The
average identification rates (fraction of correctly identified
points) over 500 trials, versus increasing distortion level, for
both the sparser 30- and the denser 50-feature-point walking
trials, using either the static or the dynamic identification
scheme, are given in Fig. 8(a) and (b).
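A minimal sketch of this noise injection, assuming the noise is drawn independently per coordinate with standard deviation 0.5·η·l_s/√6 (the function and argument names are illustrative):

```python
# Add zero-mean Gaussian distortion to every coordinate of every point,
# parameterised by the dimensionless distortion level eta and the
# average segmental length avg_seg_len.
import math
import random

def distort(points, eta, avg_seg_len, rng=random):
    """Return a copy of `points` (list of (x, y, z)) with synthetic noise."""
    sigma = 0.5 * eta * avg_seg_len / math.sqrt(6)
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in points]
```

With eta = 0 the sequence is unchanged; at the upper distortion level of 0.27 used in Fig. 8, a segment of average length l_s receives per-coordinate noise of standard deviation about 0.055·l_s.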
We observe that increased distortion leads to more potential
for confusion among neighbouring data points, especially for
the denser point-set, and hence to greater identification loss
and tracking difficulty. However, comparing Fig. 8(a) and (b),
the DSHPM algorithm achieves better identification rates than
the static method at a given distortion level. This is more
evident with increasing distortion, as is to be expected from the
DSHPM robustness gained by exploiting motion coherence embedded
in inter-frames to survive spatially distorted data. The advantage
of using DSHPM is more obvious in the difficult situation of
the denser set.
5.2.2. Identification of corrupted data with missing/extra data
The second series of experiments studied the ability of the
DSHPM algorithm to identify “ideal” walking sequences sub-
jected to increasing missing or extra data. To obtain test motion
sequences with missing data, we removed feature-point data
randomly and evenly among segments, with gaps continuing
for 1–60 frames and average length Lcorrupt = 30 frames. To
generate an extra trajectory in a volume encompassing the
observed data, we randomly inserted two points, one in each of
two frames 1–60 frames apart, and linearly interpolated the
trajectory in between. The average length of an extra trajectory
is therefore Lcorrupt = 30 frames. The fraction of such
corrupted (missing or extra) data is defined as
(Lcorrupt/L) × (Ncorrupt/N), where Ncorrupt denotes the number
of missing or extra trajectories generated, L is the frame
length of the sequence, and N is the number of model feature
points.
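As a worked instance of this definition: removing gaps of average length Lcorrupt = 30 frames from Ncorrupt = 36 trajectories of a 600-frame, 30-point sequence gives a missing-data fraction of (30/600) × (36/30) = 0.06. In code (the function name is illustrative):

```python
def corruption_fraction(l_corrupt, length, n_corrupt, n_points):
    """Fraction of corrupted (missing or extra) data, as defined above:
    (Lcorrupt / L) * (Ncorrupt / N)."""
    return (l_corrupt / length) * (n_corrupt / n_points)

# e.g. corruption_fraction(30, 600, 36, 30) evaluates to approximately 0.06
```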
Average identification rates of 500 trials at different data cor-
ruption levels are shown in Fig. 9. Fractions of missing or ex-
tra data added to the “ideal” walking sequences are indicated
by the bottom horizontal axis. The corresponding numbers of
broken trajectories encountered at each missing or extra noise
level, for 30 (and 50) point trials, are given along the top axis.
Comparing the left and right of the zero-line, which itself
corresponds to identification of the “ideal” sequences with the
original 30 or 50 trajectories, we observe that the algorithm
demonstrates good robustness in rejecting large numbers of
outliers, but rapidly fails under the inherent difficulty of
increasing missing data. It is also evident that the denser set
gains better tolerance of missing data than the depleted sparser
point-set. However, more identification loss occurs for the
denser set with extra data.
[Plot of identification rate (0.5–1.1) versus fraction of missing (−) and extra (+) data, from −0.24 to 1.0, with curves for the 30-point and 50-point sets; the top axis gives the corresponding number of trajectories for the sparse (30) and dense (50) point sets.]
Fig. 9. DSHPM identification with synthetic missing and extra data added in the “ideal” walking movements.
[Plot of SVD-count (log scale, 10–1000) versus fraction of missing (−) and extra (+) data, from −0.24 to 1.0, with curves for the 30-point and 50-point sets; the top axis gives the corresponding number of trajectories for the sparse (30) and dense (50) point sets.]
Fig. 10. SVD-count versus missing and extra data.
5.2.3. Complexity
During dynamic hierarchical matching, the identification
step is computationally more intensive than inter-frame track-
ing. Motion transformation [Rs, Ts] calculated by the SVD
[38] under an assumed segmental correspondence is the
most time-consuming step, and is invoked in most essen-
tial modules, e.g. CT-based iterative segment match, motion
verification and recruitment. Therefore, in the last series of
experiments, we attempted to measure an empirical com-
plexity via the total number of SVD invocations, denoted
by SVD-count.
We undertook such a complexity analysis relating to the two
experiments of Section 5.2.2 above. SVD-counts versus miss-
ing/extra data in walking sequences, for both cases of sparser
and denser point sets, are shown in Fig. 10. In both cases, we
observe that SVD-counts grow steadily with increasing extra
data in an approximately log-linear manner. Most SVD-counts are
spent at the initial key-frame identification stage. Ideally, when
all segments are identified without missing data, any outliers
need only be tracked, without further identification cost. On
the left side of the zero-line in Fig. 10, the SVD load increases
rapidly with increasing missing data. However, the growth tendency is
restrained toward higher fractions of lost data. This is because,
on the one hand, incomplete data cause more identification
and verification difficulties during initial segmental matching,
and the recruitment function that invokes the SVD is required
to encompass any newly appearing matches; on the other hand,
missing data reduce the number of points to be identified and
raise the possibility of segment abandonment. Comparing the denser
and sparser cases, both have the same number of segments, but
the denser set leads to more populated CTs. This is likely to
require greater numbers of match attempts, but the overall
complexity is seen to grow only at some low power of the
extra-data measure.
6. Conclusion
The proposed dynamic segment-based hierarchical point
matching (DSHPM) algorithm addresses a general and cur-
rently open problem in pattern recognition: non-rigid articu-
lated motion reconstruction from low-density feature points.
The algorithm has a crucial self-initialisation phase of pose
estimation, benefiting from our previous work [35,37]. In the
context of a dynamic sequence, the DSHPM algorithm
integrates key-frame-based dynamic hierarchical matching with
inter-frame tracking to achieve computational efficiency and
robustness to data noise. The candidate table optimisation
heuristics are improved by exploiting geometric coherency
embedded in inter-frames. Segment-based articulated match-
ing along a spatial hierarchy is significantly enhanced by
a dynamic scheme, in the forms of motion-based verifica-
tion, dynamic key-frame-shift identification and backward
parent-segment correction. Performance analysis of the
algorithm using synthetic data demonstrates the effectiveness
of the dynamic scheme, which ultimately determines the
robustness of articulated motion reconstruction and reduces
the uncertainty inherent in the matching problem for a single
frame.
We provided illustrative experimental results of human mo-
tion reconstruction using 3D real-world MoCap data. Identifica-
tion rates for most common freeform movements have achieved
90% or higher without requiring manual intervention to aid the
identification. Identification efficiency proceeded at over twice
the common MoCap rate of 60 fps. This suggests the DSHPM
algorithm has the potential for self-initialising point-feature
tracking and identification of articulated movement in real-time
applications.
Acknowledgements
All model and motion data used in our experiments
were obtained by a marker-based optical motion capture
system—Vicon-512, installed at the Department of Computer
Science, UWA. Some motion trials analysed in this paper were
captured for the game project “Dance: UK” in collaboration
with Broadsword Interactive Ltd. [39].
References
[1] J.K. Aggarwal, Q. Cai, W. Liao, B. Sabata, Articulated and elastic non-
rigid motion: a review, in: Proceedings of the IEEE Workshop on Motion
of Non-Rigid and Articulated Objects, Austin, TX, 1994, pp. 2–14.
[2] J. Maintz, M. Viergever, A survey of medical image registration, IEEE
Eng. Med. Biol. Mag. 2 (1) (1998) 1–36.
[3] J. Deutscher, A. Blake, I. Reid, Articulated body motion capture by
annealed particle filtering, in: Proceedings of the IEEE International
Conference on CVPR, vol. 2, 2000, pp. 126–133.
[4] T. Moeslund, E. Granum, A survey of computer vision-based human
motion capture, Comput. Vision Image Understanding 81 (3) (2001) 231
–268.
[5] L. Wang, W. Hu, T. Tan, Recent developments in human motion analysis,
Pattern Recognition 36 (3) (2003) 585–601.
[6] C. Cédras, M. Shah, A survey of motion analysis from moving light
displays, in: Proceedings of the IEEE Computer Vision and Pattern
Recognition, Washington, June 1994, pp. 214–221.
[7] C. Taylor, Reconstruction of articulated objects from point
correspondences in a single uncalibrated image, Comput. Vision Image
Understanding 80 (3) (2000) 349–363.
[8] J. Zhang, R. Collins, Y. Liu, Representation and matching of articulated
shapes, in: Proceedings of the IEEE International Conference on CVPR,
vol. 2, 2004, pp. 342–349.
[9] I. Cox, S. Hingorani, An efficient implementation of Reid’s multiple
hypothesis tracking algorithm and its evaluation for the purpose of visual
tracking, IEEE Trans. Pattern Anal. Mach. Intell. 18 (2) (1996) 138–150.
[10] S. Deb, M. Yeddanapudi, K. Pattipati, Y. Bar-Shalom, A generalized S-D
assignment algorithm for multisensor–multitarget state estimation, IEEE
Trans. Aerosp. Electron. Syst. 33 (2) (1997) 523–538.
[11] C. Veenman, M. Reinders, E. Backer, Resolving motion correspondence
for densely moving points, IEEE Trans. Pattern Anal. Mach. Intell. 23
(1) (2001) 54–72.
[12] Y. Wang, Feature point correspondence between consecutive frames based
on genetic algorithm, Int. J. Robot. Autom. 21 (2006) 2841–2862.
[13] M. Ringer, J. Lasenby, Modelling and tracking articulated motion from
multiple camera views, in: Proceedings of the British Machine Vision
Conference, Bristol, UK, September 2000, pp. 172–182.
[14] B. Li, H. Holstein, Dynamic segment-based sparse feature-point
matching in articulate motion, in: Proceedings of the IEEE International
Conference on Systems, Man and Cybernetics, 2002.
[15] R. Campbell, P. Flynn, A survey of free-form object representation and
recognition techniques, Comput. Vision Image Understanding 81 (2001)
166–210.
[16] B. Li, Q. Meng, H. Holstein, Point pattern matching and applications—a
review, in: Proceedings of the IEEE International Conference on Systems,
Man and Cybernetics, Washington, DC, USA, October 2003.
[17] V. Gaede, O. Günther, Multidimensional access methods, ACM Comput.
Surv. 30 (2) (1998) 170–231.
[18] D.M. Mount, N.S. Netanyahu, J.L. Moigne, Efficient algorithms for
robust feature matching, Pattern Recognition 32 (1999) 17–38.
[19] H.J. Wolfson, I. Rigoutsos, Geometric hashing: an overview, IEEE
Comput. Sci. Eng. 4 (1997) 10–21.
[20] W.E.L. Grimson, T. Lozano-Perez, D. Huttenlocher, Object Recognition
by Computer: The Role of Geometric Constraints, MIT Press,
Cambridge, MA, 1990.
[21] E. Bardinet, L.D. Cohen, N. Ayache, A parametric deformable model to
fit unstructured 3D data, Comput. Vision Image Understanding 71 (1)
(1998) 39–54.
[22] A. Cross, E. Hancock, Graph matching with a dual-step EM algorithm,
IEEE Trans. Pattern Anal. Mach. Intell. 20 (11) (1998) 1236–1253.
[23] H. Chui, A. Rangarajan, A new point matching algorithm for non-rigid
registration, Comput. Vision Image Understanding 89 (2003) 114–141.
[24] A. Pitiot, G. Malandain, E. Bardinet, P. Thompson, Piecewise affine
registration of biological images, in: Second International Workshop on
Biomedical Image Registration, 2003.
[25] G. Seetharaman, G. Gasperas, K. Palaniappan, A piecewise affine model
for image registration in nonrigid motion analysis, in: Proceedings of
the IEEE International Conference on Image Processing, 2000,
pp. 1233–1238.
[26] D. Forsyth, D. Ramanan, C. Sminchisescu, People tracking, in:
Proceedings of the IEEE International Conference on Computer Vision
and Pattern Recognition, 2006.
[27] D. Gavrila, L. Davis, Model-based tracking of humans in action: a
multi-view approach, in: Proceedings of the IEEE Computer Vision and
Pattern Recognition, San Francisco, 1996, pp. 73–80.
[28] L. Sigal, S. Bhatia, S. Roth, M. Black, M. Isard, Tracking loose-
limbed people, in: Proceedings of the IEEE International Conference on
Computer Vision and Pattern Recognition, 2004.
[29] X. Lan, D. Huttenlocher, A unified spatio–temporal articulated model
for tracking, in: Proceedings of the IEEE International Conference on
Computer Vision and Pattern Recognition, 2004.
[30] H. Nguyen, Q. Ji, A. Smeulders, Robust multi-target tracking using
spatio–temporal context, in: Proceedings of the IEEE International
Conference on Computer Vision and Pattern Recognition, 2006.
[31] T. Haga, K. Sumi, Y. Yagi, Human detection in outdoor scene
using spatio–temporal motion analysis, in: Proceedings of the IEEE
International Conference on Pattern Recognition, 2004.
[32] L. Kakadiaris, D. Metaxas, Model-based estimation of 3D human motion,
IEEE Trans. Pattern Anal. Mach. Intell. 22 (12) (2000) 1453–1459.
[33] J. Richards, The measurement of human motion: a comparison of
commercially available systems, Human Movement Sci. 18 (5) (1999)
589–602.
[34] 〈www.vicon.com〉. Vicon Motion Systems.
[35] B. Li, Q. Meng, H. Holstein, Reconstruction of segmentally articulated
structure in freeform movement with low density feature points, Image
and Vision Comput. 22 (10) (2004) 749–759.
[36] P.J. Besl, N.D. McKay, A method of registration of 3-D shapes, IEEE
Trans. Pattern Anal. Mach. Intell. 14 (2) (1992) 239–255.
[37] B. Li, Q. Meng, H. Holstein, Articulated pose identification with sparse
point features, IEEE Trans. Syst. Man Cybern. Part B Cybern. 34 (3)
(2004) 1412–1423.
[38] K.S. Arun, T.S. Huang, S.D. Blostein, Least-squares fitting of two 3-D
point sets, IEEE Trans. Pattern Anal. Mach. Intell. 9 (5) (1987) 698–700.
[39] 〈www.broadsword.co.uk〉. Broadsword Interactive Ltd.
About the Author—BAIHUA LI received the B.S. and M.S. degrees in electronic engineering from Tianjin University, China and the Ph.D. degree in computer science from the University of Wales, Aberystwyth in 2003. She is a Lecturer in the Department of Computing and Mathematics, Manchester Metropolitan University, UK. Her current research interests include computer vision, pattern recognition, human motion tracking and recognition, 3D modelling and animation.
About the Author—QINGGANG MENG received the B.S. and M.S. degrees in electronic engineering from Tianjin University, China and the Ph.D. degree in computer science from the University of Wales, Aberystwyth in 2003. He is a Lecturer in the Department of Computer Science, Loughborough University, UK. His research interests include biologically/psychologically inspired robot learning and control, machine vision and service robotics.
About the Author—HORST HOLSTEIN received the degree of B.S. in Mathematics from the University of Southampton, UK, in 1963, and obtained a Ph.D. in the field of rheology from the University of Wales, Aberystwyth, UK, in 1981. He is a Lecturer in the Department of Computer Science, University of Wales, Aberystwyth, UK. His research interests include motion tracking, computational bioengineering and geophysical gravi-magnetic modelling.