Model-based automatic tracking of articulated human movement



Technical note

Model-based automatic tracking of articulated human movement

M.R. Yeadon*, G. Trewartha† and J.P. Knight*

* School of Sport and Exercise Sciences, Loughborough University, United Kingdom.
† Department of Sport and Exercise Science, University of Bath, United Kingdom.

Abstract

This study applied a vision-based tracking approach to the analysis of articulated, three-dimensional (3D) whole-body human movements. A 3D computer graphics model of the human body was constructed from ellipsoid solids and customized to two gymnasts for size and colour. The model was used in the generation of model images from multiple camera views with simulated environments based on measurements taken on each of three synchronized video cameras and the lighting sources present in the original recording environment. A hierarchical procedure was used whereby the torso was tracked initially to establish whole-body position and orientation and subsequently body segments were added successively to the model to establish body configuration. An iterative procedure was used at each stage to optimize each new set of variables using a score based on the RGB colour difference between the model images and video images. Tracking experiments were carried out on movement sequences using both synthetic and video image data. Promising qualitative results were obtained, with consistent model matching in all sequences, including sequences involving whole-body rotational movements. Accurate tracking results were obtained for the synthetic image sequences. Automatic tracking results for the video images were also compared with kinematic estimates obtained via manual digitization, and favourable comparisons were obtained. It is concluded that, with further development, this model-based approach using colour matching should provide the basis of a robust and accurate tracking system applicable to data collection for biomechanics studies.

Keywords: motion capture, marker-free, human modelling, video image

Correspondence address:
Prof. M.R. Yeadon
School of Sport and Exercise Sciences
Loughborough University, Loughborough, Leicestershire
LE11 3TU, United Kingdom
Tel: +44 (0)1509-226307
Fax: +44 (0)1509-223971

© 2004 isea Sports Engineering (2004) 7, 53–63

Introduction

Technological advances have contributed to the increased interest in vision-based/marker-free approaches to human motion analysis. Such approaches may prove to be a less expensive and more flexible means of human motion capture than commercial opto-electronic systems or laser-based shape measurement systems. However, tracking the kinematics of human motion without artificial markers is a task that is non-trivial due primarily to the complex structure of the human body, the nature of the movements and the regular occurrence of occlusions of one body segment by another. The potential applications of marker-free human movement tracking systems are wide-ranging, including performance analysis, surveillance, and man–machine interfaces such as gesture recognition. For application to the collection of kinematic data for biomechanical analyses, the accuracy of the kinematic estimates obtained is of major importance.

The tracking of human movement generally aims to recover the 3D pose (position, orientation and configuration) of the human form over time. This often involves estimating multiple joint angles with respect to an object-centred co-ordinate system. Previous approaches have demonstrated successful tracking of movements of a constrained nature. For example, the movements tracked were essentially 2D or assumed to be limited to specific planes of motion (Yacoob & Davis 2000), the possible model configurations were pre-determined (Rohr 1994), or only simulated sequences were tracked (Chen & Lee 1992). A small number of studies have demonstrated success on less constrained movements with real data using image properties such as edges (Gavrila & Davis 1996), ellipsoid regions (Bregler & Malik 1998), texture (Lerasle et al. 1999) and silhouettes (Delamarre & Faugeras 2001). The last approach demonstrated promising tracking results on running movements, although some movements involving longitudinal rotations proved problematic to track.

Despite the large amount of research activity in this area, successful tracking results on general 3D movement from video are still limited (Gavrila 1999). Common threads among the more successful systems include a 3D model-oriented approach and the use of image information from multiple synchronized camera views. Both of these factors reduce the ambiguity of matching 2D features when describing 3D motion. A model-based tracking approach based on the matching of colour pixel values has been applied successfully to the tracking of 3D rigid body motion from synthetic and video data (Trewartha et al. 2003). Since human movement may be regarded as the motion of a hierarchical system of linked rigid bodies, it may be possible to extend this method by successively adding body segments after a rigid body tracking of the torso. The purpose of this study is to apply a model-based tracking approach in order to recover motion parameters from a number of synthetic and video sequences containing 3D human body movements. The performance of the tracking procedure is evaluated by comparing the tracked kinematic variable estimates with known values for the synthetic sequence and with values obtained from manual digitizing for the video sequences.

Method

The position and angle time histories (11 sets) for a half-twisting forward somersault with shoulder and hip configuration changes were generated using a simulation model of aerial movement (Yeadon et al. 1990). A total of 10 variables were altered in this movement: three pelvis positional co-ordinates, three pelvis orientation angles (representing whole-body orientation), two arm abduction angles and two thigh elevation angles. Image sequences of this movement were generated (Open Inventor, SGI) from three views (front, side and top) using a NURBS (non-uniform rational B-spline) surface-based graphics representation of an 18 segment human body model (Fig. 1a). The camera and lighting parameters used in the creation of the sequences were known.

An ‘ellipsoid’ volume-based version of the 18 segment human body model was constructed to be the tracking model (Fig. 1b). A total of 33 ellipsoid solids were used, with each body segment comprising between one and four solids. Orientation and position were each described by three variables, while body configuration was specified using 51 joint angles.

Figure 1 (a) NURBS model; (b) Ellipsoid model.

The tracking procedure, based around a ‘generate-and-test’ approach, has been described previously (Trewartha et al. 2003) but has the following basic structure. For each set of synchronized target images, associated model images containing the ellipsoid human body model within simulated environments are generated and matched onto the target images. Matching is evaluated by defining a ‘mask’ comprising the pixels corresponding to the model in the model image and identifying the corresponding pixels in the target image through the mask. A score based on the RGB colour difference between the model and target images is then calculated. Model configuration values are altered in an iterative manner to minimize the RGB difference score between the model images and target images. Final model values provide estimates for the position, orientation and configuration of the human figure at each time instant of a movement sequence.
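The mask-and-score step can be sketched as follows. This is an illustrative Python reconstruction (the paper gives no code): the function name, the way the mask is recovered by comparing the rendered image against the empty background, and the use of a mean Euclidean RGB distance are all assumptions.

```python
import numpy as np

def masked_rgb_score(model_img, target_img, background):
    """RGB colour-difference score between a rendered model image and a
    video target image, evaluated only through the model 'mask' (the
    pixels the rendered model occupies). Names are illustrative."""
    # One convenient way to recover the mask: every pixel where the
    # rendered model image differs from the empty simulated background.
    mask = np.any(model_img != background, axis=-1)
    diff = model_img[mask].astype(float) - target_img[mask].astype(float)
    # Mean Euclidean distance in RGB space over the masked pixels.
    return np.sqrt((diff ** 2).sum(axis=-1)).mean()
```

Minimizing this score over the model's pose variables drives the rendered figure onto the video figure.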

A hierarchical procedure was used to determine the tracking variables by first optimizing the six variables defining the position and orientation of the torso. Subsequently, segments were added outwards along the link system. At each stage of tracking, those model segments directly relating to the variables being optimized were successively included in the model images generated for the model-to-target comparisons. Thus in the initial tracking of the torso no limb segments were included in the model. This approach was adopted because attempting the simultaneous improvement of all variables describing body configuration may lead an automatic tracking approach into major difficulties: the search space is of relatively high dimension, and so locating a global optimum is an ill-conditioned problem.
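The hierarchical scheme might be organized along the following lines. In this hypothetical Python sketch, `render`, `score` and `optimise` stand in for the paper's generate-and-test machinery; all names and the exact staging interface are illustrative, not the authors' implementation.

```python
def track_frame(pose, stages, render, score, optimise, targets):
    """Hierarchical tracking of one frame. pose maps variable name to
    value; stages is an ordered list of (new_segments, variables) pairs,
    torso first, then limb segments outwards along the link system."""
    active_segments = []
    for new_segments, variables in stages:
        active_segments += new_segments      # grow the model outwards
        def cost(values):
            # Only the newly added variables are varied at this stage;
            # earlier stages' results are held fixed in pose.
            trial = dict(pose, **dict(zip(variables, values)))
            return score(render(trial, active_segments), targets)
        best = optimise(cost, [pose[v] for v in variables])
        pose.update(zip(variables, best))    # freeze before next stage
    return pose
```

Freezing each stage's result before adding the next set of segments keeps each optimization low-dimensional, which is the point of the hierarchy.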

A collection of downhill-stepping optimization algorithms (based on the RGB difference score) was implemented to control the iteration of the model configuration towards the best estimate of body posture. In early matching at each time step, algorithms that iterated a number of variables simultaneously towards an optimum (multi-parameter) were used. In later matching, algorithms that iterated a single variable at a time were used, together with a final quadratic refinement of the estimate corresponding to the minimum score.
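A single-variable downhill stepper with a final quadratic refinement, of the general class described, could look like the sketch below. The step-halving schedule and the three-point parabola fit are assumptions about the details, not the authors' published algorithm.

```python
def downhill_step_1d(cost, x0, step, tol=1e-6):
    """One-variable downhill stepping with a final quadratic refinement
    (illustrative). Steps in whichever direction lowers the cost; when
    neither direction improves, the step is halved, down to tol."""
    x = x0
    while step > tol:
        for trial in (x + step, x - step):
            if cost(trial) < cost(x):
                x = trial
                break
        else:
            step /= 2.0          # no improvement: refine the step size
    # Quadratic refinement: vertex of the parabola through the cost
    # values at (x - h, x, x + h).
    h = max(step, tol)
    f0, fm, fp = cost(x), cost(x - h), cost(x + h)
    denom = fm - 2.0 * f0 + fp
    if denom > 0:                # only refine when locally convex
        x += 0.5 * h * (fm - fp) / denom
    return x
```

For a locally quadratic score surface the final parabola fit lands on the minimum exactly, which is what makes the refinement worthwhile after coarse stepping.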

The 11 frame synthetic target sequences of articulated human motion were tracked using the ellipsoid model and employing the hierarchical structure for optimization of all 10 motion variables. In each set of time-matched frames the pelvis variables were tracked initially, followed by arm variables and then leg variables. This process was repeated with successively smaller increment steps being used in the algorithms to refine the optimization of the variables. The starting pose for each frame was defined by the extrapolated values from the optimized poses obtained for the previous two frames.
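The frame-to-frame initialization amounts to a linear extrapolation from the two preceding optimized poses; a minimal sketch (names illustrative):

```python
def extrapolated_start(prev, prev2):
    """Initial pose estimate for a new frame: linear extrapolation from
    the optimized poses of the two preceding frames."""
    return {k: 2.0 * prev[k] - prev2[k] for k in prev}
```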

The accuracy of tracking the synthetic half-twistingsomersault was assessed by comparing the values ofthe ten tracked variables with the known values usedto generate the sequences.

Two gymnasts, who gave informed consent, performed a number of aerial gymnastic movements from a trampette and were recorded by three genlocked Hi-8 video camera systems. The scene was illuminated by just two 2000 W spotlights, one to the front/right and one to the right/rear of the movement space (Fig. 2).

Parameters to be used for specifying the model cameras in the simulated environments were obtained by calibrating the video cameras via the direct linear transformation (DLT) procedure, and then back-calculating the initial geometric camera parameter estimates from the DLT parameters. A constrained multi-parameter optimization procedure (simulated annealing) was then used to obtain the seven restricted parameters (location, orientation, focal length) available to specify the model cameras. Model lighting parameters were obtained by estimating the position and orientation of the spotlights and substituting these values into the simulated environments.
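The DLT calibration step can be sketched as a linear least-squares problem. The following Python reconstruction of the standard 11 parameter DLT is illustrative (function names and the synthetic set-up are ours); the subsequent back-calculation of geometric camera parameters and the simulated annealing refinement are omitted.

```python
import numpy as np

def dlt_calibrate(world, image):
    """Solve for the 11 DLT parameters from >= 6 non-coplanar control
    points (world: Nx3 object-space points, image: Nx2 image points)
    by linear least squares."""
    A, b = [], []
    for (X, Y, Z), (u, v) in zip(world, image):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z]); b.append(u)
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z]); b.append(v)
    L, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return L

def dlt_project(L, point):
    """Project a 3D point to image co-ordinates with DLT parameters L."""
    X, Y, Z = point
    d = L[8] * X + L[9] * Y + L[10] * Z + 1.0
    return ((L[0] * X + L[1] * Y + L[2] * Z + L[3]) / d,
            (L[4] * X + L[5] * Y + L[6] * Z + L[7]) / d)
```

Given the 11 DLT parameters per camera, geometric quantities such as camera location, orientation and focal length can then be back-calculated, as described in the text.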


Figure 2 Layout of video data collection area for aerial movements: front light and side light; front camera (height = 1.2 m), side camera (height = 1.4 m) and top camera (height = 5.1 m, on the balcony); approximate direction of movement indicated; marked dimensions 18.4 m, 15.1 m and 7.8 m; X and Z axes shown.


Selected aerial movements were captured via an image capture board into SGI movie format. Post-processing (separation of fields and vertical interpolation) of captured video sequences resulted in three synchronized image sequences for each movement displayed at 768 × 576 pixels at 50 images per second. The following movements were tracked: a starjump movement by the female subject (37 images); a piked forward somersault by the female subject (43 images); a half-twisting forward somersault by the male subject (45 images).
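The field separation and vertical interpolation step can be sketched as follows; this is a simplified reconstruction that ignores the half-line spatial offset between the two fields of an interlaced frame, with all names illustrative.

```python
import numpy as np

def split_fields(frame):
    """Separate an interlaced frame (H x W [x C] array) into its two
    fields and restore full vertical resolution by linear interpolation
    between field lines, doubling the temporal sampling rate
    (25 frames/s -> 50 fields/s)."""
    fields = []
    for parity in (0, 1):
        field = frame[parity::2].astype(float)   # every other line
        full = np.repeat(field, 2, axis=0)       # duplicate lines back
        # Replace interior duplicated lines with the average of their
        # neighbouring field lines (linear vertical interpolation).
        full[1:-1:2] = 0.5 * (full[:-2:2] + full[2::2])
        fields.append(full)
    return fields
```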

The ellipsoid graphics model was customized to each of the gymnasts using anthropometric data to scale body segments and RGB sampling from the video images to select segment colouring (Fig. 3).

The hierarchical tracking approach was applied to the video sequences with RGB difference scores based on an equal weighting between colour differences obtained using original RGB pixel intensities and differences obtained using pixel intensities normalized for overall intensity (normalized RGB). A total of 15, 20 and 14 model variables were altered for the starjump, piked somersault and twisting somersault movements respectively. Initial model configurations were obtained from manual digitizing estimates of the first video fields. Following initialization in the first field the tracking process was fully automatic: there was no ‘boot-strapping’ procedure available to re-initialize the model configuration at any future stage. The final values from the first field were used as the initial estimates for the second field, and linear extrapolation was used to estimate the initial variables in subsequent fields.
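The combined raw/normalized RGB score might take the form below. The scaling of the raw term and the chromaticity-style normalization are assumptions, since the text specifies only an equal weighting of the two difference types.

```python
import numpy as np

def combined_rgb_difference(model_px, target_px, eps=1e-9):
    """Colour-difference score over a set of masked pixels, equally
    weighting (i) the difference in original RGB intensities and
    (ii) the difference after normalizing each pixel for overall
    intensity (normalized RGB). Illustrative reconstruction, not the
    authors' published formulation."""
    m = np.asarray(model_px, float)
    t = np.asarray(target_px, float)
    raw = np.abs(m - t).mean() / 255.0            # intensity term, scaled to [0, 1]
    m_chroma = m / (m.sum(axis=-1, keepdims=True) + eps)
    t_chroma = t / (t.sum(axis=-1, keepdims=True) + eps)
    chroma = np.abs(m_chroma - t_chroma).mean()   # intensity-invariant term
    return 0.5 * raw + 0.5 * chroma               # assumed equal weighting
```

The normalized term is insensitive to overall brightness, which is why it helps under uneven lighting: a pixel and a uniformly brightened copy of it score zero on the chroma component.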

The tracking performance for the three video-recorded movements was assessed by visual comparison of the original video sequences with the generated model images and by comparing estimates of the tracked variables with values obtained via manual digitization.

Results

The final model configurations of the synthetic half-twisting somersault returned by the tracking procedure were used to produce graphics sequences that were compared with the target sequences from the front and side views (Fig. 4).

The accuracy of the tracking of the synthetic movement was good, with root-mean-square (RMS) errors less than 5 mm and 1.2° (Table 1).

The original video target images of the movements were compared with the generated model images and the target-through-mask images, which showed the region of the target images covered by the model boundaries (Figs 5–7). The RMS differences in kinematic estimates obtained from the automatic


Figure 3 Model images of the ellipsoid human body model customized for anthropometry and colour for (a) the female subject and (b) the male subject.

Figure 4 Comparison of target and tracked image sequences from front and side views for the synthetic half-twisting forward somersault movement: original NURBS image sequence (front view); tracked ellipsoid model sequence (front view); original NURBS image sequence (side view); tracked ellipsoid model sequence (side view).


Figure 5 Video (upper sequence), model (middle sequence) and target-through-mask (lower sequence) images from the front camera view (selected fields) for the starjump movement.

Figure 6 Video (upper sequence), model (middle sequence) and target-through-mask (lower sequence) images from the side camera view (selected fields) for the piked somersault movement.

Figure 7 Video (upper sequence), model (middle sequence) and target-through-mask (lower sequence) images from the front camera view (selected fields) for the half-twisting somersault movement.


tracking procedure and manual digitizing were calculated (Table 2). Repeated digitization provided an independent estimate of the precision of the manual digitizing process. The RMS difference values obtained from repeated manual digitizing trials were 6–16 mm for position estimates, 1–3° for pelvis orientation angles, 3–5° for arm angles and 2° for leg angles, depending on the movement.
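The comparison metric here is a plain RMS difference between the two time histories of each tracked variable; for completeness, a minimal sketch:

```python
import numpy as np

def rms_difference(a, b):
    """Root-mean-square difference between two time histories of a
    tracked variable (e.g. automatic tracking vs. manual digitizing)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.sqrt(np.mean((a - b) ** 2))
```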

There was a tendency for the time histories of the tracked variables on the right side of the body to be in better correspondence with the digitized estimates than the variables on the left side of the body (Figs 8–10). This was due primarily to the placement of video cameras and light sources within the data collection environment (Fig. 2), which resulted in poor illumination of the left side of the body.

Although not presented, the time histories for the leg abduction and elevation variables for the twisting somersault movement had the same degree of correspondence (automatic tracking vs. manual digitizing) as the leg elevation variables from the piked somersault movement (Fig. 9).

Discussion

The present results on synthetic image data (Fig. 4 and Table 1) confirm that the model-based tracking approach is capable of returning accurate estimates when extended to the tracking of articulated movement. RMS error estimates were less than 5 mm for pelvis position estimates, less than 0.5° for pelvis orientation estimates and less than 1.2° for limb configuration angles. Despite the experiments being run on synthetically generated image data, these low error estimates are impressive considering the inter-frame steps taken by some of the variables. The entire movement was covered in only 11 frames, resulting in arm angle changes of up to 70° between consecutive frames. The iterative tracking algorithms were able to cope with such large changes despite using initial estimates that were constrained to lie within 8° of the final values from the previous frame. This result gives some confidence that the approach has the potential to produce very accurate estimates under good tracking conditions.

Comparing the target images to the final model images for each video-recorded movement (Figs 5–7), it is evident that the procedure was successful in tracking the prominent features of each movement. The model images obtained for the starjump and piked somersault movements exhibit extremely good correspondence with the video images, while the model images for the twisting somersault movement are generally good but show periods of inconsistent tracking. Similarly, the target-through-mask images demonstrate that the registration of the model onto the human subject at each stage is also occurring with good consistency for the vast majority of time steps for each movement.


Table 1 Tracking error estimates for the synthetic half-twisting somersault

Variable             Range     Max error   RMS error
pelvis vertical      790 mm    8 mm        4 mm
pelvis A–P           980 mm    9 mm        5 mm
pelvis lateral       10 mm     5 mm        2 mm
pelvis somersault    282°      0.7°        0.4°
pelvis tilt          6°        1.4°        0.5°
pelvis twist         175°      1.0°        0.4°
L-arm abduction      130°      1.1°        0.7°
R-arm abduction      130°      0.9°        0.4°
L-thigh elevation    40°       2.9°        1.1°
R-thigh elevation    40°       2.5°        1.0°

Note: A–P = anterior–posterior, L = left, R = right.

Table 2 RMS differences between estimates obtained from tracking and manual digitizing

                        RMS difference
Variable                Starjump   Piked som.   Twisting som.
pelvis vertical         11 mm      34 mm        41 mm
pelvis A–P              12 mm      30 mm        39 mm
pelvis lateral          10 mm      22 mm        26 mm
pelvis somersault       1.5°       6.1°         6.3°
pelvis tilt             2.9°       6.8°         4.4°
pelvis twist            –          8.6°         13.0°
L-thigh elevation       7.4°       6.4°         10.2°
L-thigh abduction       6.4°       11.1°        9.4°
R-thigh elevation       1.8°       5.9°         8.1°
R-thigh abduction       4.4°       9.1°         7.6°
L-upperarm angle*       12.1°      14.6°        28.1°
R-upperarm angle*       7.7°       20.8°        23.1°
L-forearm elevation                11.8°
R-forearm elevation                8.5°
L-calf elevation                   7.5°
R-calf elevation                   5.9°

* upperarm angle gives the difference in arm orientation after elevation and abduction.
A–P = anterior–posterior, L = left, R = right.


Figure 8 Time histories of the tracked variables during the starjump movement obtained from automatic tracking and manual digitizing. Panels: pelvis vertical, anterior–posterior and lateral position [m]; pelvis somersault orientation [°]; left and right thigh abduction and elevation [°]; manual and tracked curves plotted against frame number (0–40).


Figure 9 Time histories of the tracked variables for the piked somersault movement obtained from automatic tracking and manual digitizing. Panels: pelvis vertical, anterior–posterior and lateral position [m]; pelvis somersault orientation [°]; left and right thigh abduction and elevation [°]; manual and tracked curves plotted against frame number (0–45).


Figure 10 Time histories of the tracked pelvis variables for the half-twisting somersault movement obtained from automatic tracking and manual digitizing. Panels: vertical, anterior–posterior and lateral position [m]; somersault, tilt and twist orientation [°]; manual and tracked curves plotted against frame number (0–45).


When the tracked estimates and the estimates obtained from manual digitizing are compared (Table 2 and Figs 8–10), it is evident that the starjump movement seems to have been tracked with good accuracy, the RMS difference values approaching the limits of the manual digitizing precision. As the movement complexity increases, the RMS differences also increase. The RMS differences for the piked somersault movement are approximately twice as large as those observed for the starjump movement, and the RMS difference values for the twisting somersault movement are larger still. Generally, limb angles have larger RMS values than pelvis variables since the errors in estimating pelvis variables are propagated to the more distal variables. On some occasions limb variables must deviate from the true values in order to maintain a good model-to-target match due to poorly estimated pelvis variables.

The results for tracking movement from video sequences are particularly encouraging bearing in mind that the data collection environment was far from ideal. The distribution of the cameras and the concentration of the light sources on one side of the body led to difficulties in estimating values for body parts on the far side of the scene due to occlusion problems and shadowing on some regions of the body (see Figs 8–10). Although it was initially considered advantageous to brightly illuminate the subject and leave the background with little illumination, the shadowing problem for certain body parts negated any possible advantage. Moreover, since more successful tracking results have been obtained using some form of normalized colour representation rather than the intensity-based original RGB signal, it is anticipated that an environment well illuminated throughout the entire activity volume will prove more successful. Such an environment is arguably more likely to exist in a ‘real life’ setting, such as a gymnastics arena.

All vision-based tracking systems introduce a number of assumptions to make the problem tractable. The proposed tracking system is no different to others in that tracking will only be successful providing certain criteria are met. For example, the model-to-target comparison is based on a colour difference score and so it is vital that there is some colour contrast between the subject and the background, and preferably between different body segments. The pose estimate is improved if body segments are not occluded from a number of camera views, if the model is a better approximation of the human form, and if reliable information about the environment (e.g. lighting and camera conditions) is available.

The ‘generate-and-test’ optimization procedure used in this approach tends to be more computationally demanding than other pose estimation routines (Delamarre & Faugeras 2001). This procedure has been used successfully in a number of systems, however, and has been demonstrated to be a successful method in motion estimation. Since the system reported here is aimed at obtaining kinematic estimates for biomechanical analyses, robustness and accuracy considerations will initially take priority over speed of data availability. The run time for tracking the video sequences with a 600 MHz Pentium II CPU ranged from eight to nine hours. Processing speed may be expected to improve by a factor of around 20 when using current hardware with parallel processing of the data from each camera.

Lerasle et al. (1999) state that the errors involved in modelling the human form using approximate representations have a negative effect on the accuracy of pose estimation, and they recommend the use of more precise and deformable representations. In contrast, Gavrila (1999) states that models need not be highly representative providing robust model matching is possible. At present, the 18 segment (33 solids) ellipsoid model is considered sufficiently complex for tracking human movement from video images. Model segments have been sized according to anthropometric measurements and coloured according to a sampling of the video images. Future elaboration of the method may be necessary if the approach is to be used for tracking individuals where no information (other than video) is available on their physical characteristics, or if it is felt necessary to update the RGB content of model segments based on the most recent image data during tracking.

At present the approach has run into some inconsistent matching when tracking more complex movements, particularly twisting motions. Longitudinal rotations are notoriously difficult to estimate because relatively large changes in twist angle produce little change in the image. One step that may be expected to result in immediate improvements in this regard would be to increase the number of cameras (three were used in the present study) to provide more redundancy of data. Additionally, patterned clothing may improve the ability of the system to track torso variables accurately.

This study has presented successful tracking results on a number of aerial movements which have not been tracked in any other study, including whole-body somersaulting and twisting rotations. The tracked estimates have been evaluated against kinematic estimates obtained using an alternative method, a step that is often overlooked or avoided in tracking studies. The present results demonstrate the complexity of tracking possible using a relatively straightforward approach to the problem, with only a single image cue and minimal image processing. This provides a sound basis for future development. Further additions are required before this tracking system will provide a genuine solution for biomechanics research. It is envisaged, however, that this model-based ‘multi-view’ colour matching approach may form the basis of such a system.

References

Bregler, C. & Malik, J. (1998) Tracking people with twist and exponential maps. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 8–15. IEEE Computer Society, Los Alamitos, CA.

Chen, Z. & Lee, H.J. (1992) Knowledge-guided visual perception of 3D human gait from a single image sequence. IEEE Transactions on Systems, Man and Cybernetics, 22, 336–342.

Delamarre, Q. & Faugeras, O. (2001) 3D articulated models and multiview tracking with physical forces. Computer Vision and Image Understanding, 81, 328–357.

Gavrila, D.M. (1999) The visual analysis of human movement: A survey. Computer Vision and Image Understanding, 73, 82–98.

Gavrila, D.M. & Davis, L.S. (1996) 3-D model-based tracking of humans in action: A multi-view approach. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 73–80. IEEE Computer Society, Los Alamitos, CA.

Lerasle, F., Rives, G. & Dhome, M. (1999) Tracking of human limbs by multiocular vision. Computer Vision and Image Understanding, 75, 229–246.

Rohr, K. (1994) Towards model-based recognition of human movements in image sequences. CVGIP: Image Understanding, 59, 94–115.

Trewartha, G., Yeadon, M.R. & Knight, J.P. (2003) Colour based rigid body tracking using three-dimensional graphics models. Sports Engineering, 6, 139–148.

Yacoob, Y. & Davis, L.S. (2000) Learned models for estimation of rigid and articulated human motion from stationary or moving camera. International Journal of Computer Vision, 12, 5–30.

Yeadon, M.R., Atha, J. & Hales, F.D. (1990) The simulation of aerial movement – IV. A computer simulation model. Journal of Biomechanics, 23, 85–89.
