Reproduction of human arm movements using Kinect-based motion capture data
Abstract— Exploring the full potential of humanoid robots
requires their ability to learn, generalize and reproduce com-
plex tasks that will be faced in dynamic environments. In re-
cent years, significant attention has been devoted to recover-
ing kinematic information from the human motion using a
motion capture system. This paper demonstrates and evalu-
ates the use of a Kinect-based capture system that estimates
the 3D human poses and converts them into gesture imitation
by a robot. The main objectives are twofold: (1) to improve the
initially estimated poses through a correction method based on
constraint optimization, and (2) to present a method for com-
puting the joint angles for the upper limbs corresponding to
motion data from a human demonstrator. The feasibility of
the approach is demonstrated by experimental results showing
the upper-limb imitation of human actions by a robot model.
I. INTRODUCTION
PROGRAMMING robots to perform complex tasks and extend
their repertoire can be extremely tedious and time
consuming. Learning from demonstration is a promising
methodology that offers a more intuitive approach to teaching
a robot how to generate its own motor skills [1, 2]. To this
end, the robot should be able to estimate human poses while
a desired task is performed, as well as to translate the
skeleton data into appropriate motor commands. In recent
years, a large body of work has studied the use of
marker-based motion capture systems for extracting 3D poses as
input for training robots to perform complex motions [3-6].
Despite much research progress, these systems are usually
expensive, require careful calibration, and their
application is limited to rigid environments. To overcome these
limitations, the main challenge is to develop accurate
methods for extracting 3D human poses from image sequences
using low-cost systems as a valid alternative.
Recently, the field of markerless motion capture has ex-
perienced a strong evolution with the development of
high-speed and cheap depth cameras. In particular, the
depth data provided by the PrimeSense sensor opened up
new opportunities for extracting gesture-based interactions
with a more portable and less costly system. The publication
of the tracking algorithm of the Kinect Software
Development Kit [4] and the availability of several
development environments (e.g., Microsoft SDK) have contributed
to a growing interest in model-free approaches. However,
the success of these alternatives depends on the accuracy
and robustness required in each specific area of application.
J. Rosado is with the Department of Computer Science and Systems
Engineering, Coimbra Institute of Engineering, IPC, Coimbra, Portugal.
F. Silva is with the Department of Electronics, Telecommunications
and Informatics, Institute of Electronics and Telematics Engineering of
Aveiro, University of Aveiro, Portugal (email: [email protected]).
V. Santos is with the Department of Mechanical Engineering, Institute
of Electronics and Telematics Engineering of Aveiro, University of
Aveiro, Portugal (email: [email protected]).
Z. Lu is with the Institute of Electronics and Telematics Engineering of
Aveiro, University of Aveiro, Portugal (email: [email protected]).
This paper addresses the main concern associated with
the use of Kinect-based human motion capture in robotics:
the lack of a kinematic model to assure coherence in
the provided poses. The main objective is to demonstrate
and evaluate both a human action pose correction method
and an inverse kinematics technique. The former aims to
assure constant limb lengths over an entire sequence of
poses. The latter converts each of the 3D poses into the
corresponding angles for the upper-body joints, including a
validation test to deal with physical limits (e.g., joint
limits). The motivation of this work is to create a database of
classified motions for learning control in robotics.
In line with this, the remainder of the paper is organized
as follows: Section II presents the motion capture system
based on a single Kinect camera and the experimental con-
ditions. Section III describes the pose correction method
based on constraint optimization. Section IV focuses on the
kinematic mapping from 3D poses to joint angles. Section
V discusses the results achieved to validate the proposed
solutions. Finally, Section VI concludes the paper and pro-
poses future extensions.
II. HUMAN MOTION CAPTURE
The Kinect sensor provides a 640×480 depth image, at
30 frames per second, for the skeleton-based pose estima-
tion with depth resolution of a few centimeters. The human
skeleton estimated from the depth image includes a total of
20 body joints that will be the input for our approach.
These captured data consist of a set of Cartesian points in
the 3D volume for each human pose, which will be called
raw-data hereinafter. Several studies have assessed the ac-
curacy of the depth reconstruction and joint positions from
the Kinect pose estimation, including comparisons with
ground truth motion capture data [9-11]. In general, these
studies highlight the potential of the Kinect skeleton in con-
trolled body postures whenever self-occlusions are avoided.
In the experiments, we have used a single Kinect camera
positioned at about 3 meters from the human subject to
capture the whole body standing upright. The human pose
estimation is fully automatic and does not require calibration.
José Rosado, Filipe Silva, Vítor Santos and Zhenli Lu, Member, IEEE
Fig. 1. Frame-to-frame variation of the limb lengths for a static posture (left) and a reaching arm movement (right).
In this study the attention is dedicated to the upper limbs,
including the shoulder, elbow and wrist joints of both right
and left arms. In order to ensure the most convenient acqui-
sition conditions, the human subject was asked to prevent
lower trunk movements and to perform controlled scapular
motions. Precautions were also taken to avoid occlusions of
the upper limb parts.
Besides the accuracy and robustness of the skeletal poses,
a critical element is the stability of the estimated
frame-to-frame body geometry. As mentioned before, a
characteristic of the human body skeletonization with the Kinect
sensor is that the limb lengths are not kept constant through
the entire sequence and differ between the two arms. Fig. 1
illustrates the frame-to-frame variations of the limb lengths
for a static posture and a reaching arm movement. In the
static case, the mean length is about 268 mm for the arm and
233 mm for the forearm, while the standard deviation is
around 3.65 mm and 1.51 mm, respectively. These measures are
significantly different during the execution of a reaching
movement: a mean of 265 mm for the arm and 216 mm for the
forearm, with standard deviations around 15.9 mm and 8.8 mm,
respectively.
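The statistics quoted above can be reproduced from the raw joint positions with a few lines of code. The following is a minimal sketch, assuming the joints are available as NumPy arrays of shape (n_frames, 3) in millimetres; the function name and the synthetic data are illustrative and not part of the original work.

```python
import numpy as np

def limb_length_stats(proximal, distal):
    """Mean and standard deviation of the per-frame distance between two joints.

    proximal, distal: arrays of shape (n_frames, 3), e.g. shoulder and elbow.
    """
    segments = np.asarray(distal, dtype=float) - np.asarray(proximal, dtype=float)
    lengths = np.linalg.norm(segments, axis=1)  # one limb length per frame
    return lengths.mean(), lengths.std()

# Synthetic example (assumed data): a nominal 268 mm upper arm with sensor noise.
rng = np.random.default_rng(0)
shoulder = np.zeros((100, 3))
elbow = np.tile([0.0, 0.0, -268.0], (100, 1)) + rng.normal(0.0, 3.0, (100, 3))
mean_len, std_len = limb_length_stats(shoulder, elbow)
print(f"upper arm: mean = {mean_len:.1f} mm, std = {std_len:.1f} mm")
```

Applying the same function to the elbow and wrist trajectories would give the forearm statistics.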
III. CONSTRAINT-BASED MOTION FILTERING
The pose correction method aims to convert the motion
of a source human subject into a new motion, while satisfy-
ing a given set of kinematic constraints. These kinematic
constraints are formulated in order to assure a kinematic
model with constant limb lengths. The proposed method,
applied to each individual frame, can be divided into two
main steps:
• Static calibration: the first step is a static calibration of
the arms, prior to each data collection, to define the
reference model of the subject's anthropometry. Concretely,
the human subject was told to hold their arms fully
extended and aligned with the trunk (fundamental standing
position), while several frames are acquired. The distance
between consecutive joints (shoulder-elbow and
elbow-wrist) is calculated as the mean value taken over all
these frames for both arms. It should be pointed out that
this arm calibration is the basis for the joint-angle
calculations in Section IV: all joint angles are defined as zero
degrees at this calibration posture.
• Pose correction: the basic problem is to find the
configuration $X = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^{3 \times n}$, with
$x_1, \ldots, x_n \in \mathbb{R}^3$, closest to the measurements that are observed
over time, such that the distance between consecutive
points (i.e., link lengths) remains constant.
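The static calibration step amounts to averaging the segment lengths over the calibration frames. A minimal sketch follows, assuming each frame stores the shoulder, elbow and wrist positions in that order; the array layout and the function name are assumptions made for illustration.

```python
import numpy as np

def reference_link_lengths(frames):
    """Reference link lengths of one arm from calibration frames.

    frames: array of shape (n_frames, 3, 3) holding, for every frame, the
    shoulder, elbow and wrist positions (assumed ordering).
    Returns [shoulder-elbow length, elbow-wrist length], averaged over frames.
    """
    frames = np.asarray(frames, dtype=float)
    segments = np.diff(frames, axis=1)          # (n_frames, 2, 3) link vectors
    lengths = np.linalg.norm(segments, axis=2)  # (n_frames, 2) link lengths
    return lengths.mean(axis=0)

# Two hypothetical calibration frames with the arm hanging along -z (mm).
calibration = [
    [[0, 0, 0], [0, 0, -260], [0, 0, -490]],
    [[0, 0, 0], [0, 0, -276], [0, 0, -510]],
]
print(reference_link_lengths(calibration))
```

The returned values play the role of the reference link lengths used by the pose correction step.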
In line with this, we deal with the following optimization
problem:
$$\min_{X \in \Omega} \sum_{i=1}^{n} \lVert \hat{X} - X \rVert \qquad (1)$$
where $\hat{X}$ collects the measured joint positions, $\Omega$ is a
certain subset of $\mathbb{R}^{3 \times n}$ and $\lVert \cdot \rVert$ is an appropriate
matrix norm which measures goodness of fit. Here, we adopt
the Euclidean norm as the measure of closeness. The goal is to
minimize the objective function (1) by selecting a value of
X that satisfies all quadratic equality constraints defined by:
$$\lVert x_i - x_{i+1} \rVert = d_{i,i+1} \qquad (2)$$
where the left-hand side is the Euclidean distance between two
consecutive points and the right-hand side is the corresponding
link length in the reference model.
The constrained minimization problem was solved with
the OPTI toolbox, which handles the optimization of
a quadratic function of several variables subject to
quadratic constraints. Fig. 2 compares the human skeletons
obtained from the Kinect raw-data with those obtained after
the pose correction. Different poses are represented for a
movement sequence involving both the right and the left arm.
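For readers without access to the OPTI toolbox, the same kind of problem can be sketched with SciPy. The example below is an illustration under stated assumptions, not the implementation used in this work: it swaps in the SLSQP solver of `scipy.optimize.minimize`, uses the squared Euclidean distance to the measured points as the objective, and writes each link constraint in squared form so that it stays quadratic. The function name and the sample pose (in metres) are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def correct_pose(measured, link_lengths):
    """Closest chain of points to `measured` with prescribed link lengths.

    measured: (n, 3) measured joint positions of one frame.
    link_lengths: (n - 1,) reference distances between consecutive joints.
    """
    measured = np.asarray(measured, dtype=float)
    link_lengths = np.asarray(link_lengths, dtype=float)
    n = measured.shape[0]
    target = measured.ravel()

    def objective(flat):
        return np.sum((flat - target) ** 2)

    def objective_grad(flat):
        return 2.0 * (flat - target)

    def link_residuals(flat):
        # Squared link length minus squared reference length, one per link.
        seg = np.diff(flat.reshape(n, 3), axis=0)
        return np.sum(seg ** 2, axis=1) - link_lengths ** 2

    def link_jacobian(flat):
        seg = np.diff(flat.reshape(n, 3), axis=0)
        jac = np.zeros((n - 1, 3 * n))
        for i in range(n - 1):
            jac[i, 3 * i:3 * i + 3] = -2.0 * seg[i]
            jac[i, 3 * i + 3:3 * i + 6] = 2.0 * seg[i]
        return jac

    result = minimize(
        objective, target, jac=objective_grad, method="SLSQP",
        constraints=[{"type": "eq", "fun": link_residuals, "jac": link_jacobian}],
        options={"ftol": 1e-10, "maxiter": 200},
    )
    return result.x.reshape(n, 3), result.success

# Hypothetical noisy shoulder-elbow-wrist measurement (metres).
raw = [[0.0, 0.0, 0.0], [0.02, 0.0, -0.25], [0.05, 0.01, -0.47]]
corrected, ok = correct_pose(raw, [0.268, 0.233])
print(corrected, ok)
```

The measured pose is used as the starting point, which is a natural choice because the corrected pose is expected to lie close to it.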
Fig. 2. Overlap of the human skeletons extracted from the Kinect and
those after the constraint-based optimization at different frames (green
points and black lines are original Kinect data; red points and blue lines
are motion-constrained filtered data; green and red lines are the respective
trajectories of the wrists).
IV. KINEMATIC MAPPING
One of the main issues in using motion capture data for
training robots is to convert the 3D joint positions into joint
angles relative to a robot model. In this context, the human
skeleton is replaced by two 4 degree-of-freedom (DOF)
robot arms of the same dimensions. Then, an inverse kine-
matics algorithm generates the corresponding joint angles
of the robot for each pose. The problem is decomposed into
a per-frame inverse kinematics algorithm, followed by mo-
tion filtering and interpolation.
A. Inverse kinematics
The filtered movement data is the input for the inverse
kinematics module, in which the human arms are modeled
as two independent 4-DOF serial chains, each consisting of a
3-DOF shoulder (rotational joints with intersecting axes) and a
1-DOF elbow joint. The implementation of the inverse
kinematics follows some basic assumptions. First, the robot
model was defined to match the anthropometric measures
of the human subject, avoiding the retargeting problem (i.e.,
compensating for body differences). Second, the
perturbations in the movement data caused by the movement of the
subject's shoulder are ignored. Concretely, we consider that
all joint positions are uniformly affected by the
perturbations and the shoulders are at the origin of the reference
system with fixed coordinate frames. Third, the inverse
kinematics considers mechanical constraints on the joints,
such as physical limits both on the range of joint motions
(e.g., the elbow cannot invert the motion when fully
stretched) and on the maximum joint velocities.
Given the 3D positions of the shoulder, elbow and wrist,
the inverse kinematics algorithm is simplified: two degrees
of freedom completely describe the elbow when the
position of the shoulder is known (the elbow lies on the surface
of a sphere centered at the shoulder). Similarly, the wrist
can only lie on the surface of a sphere centered at the
elbow. Thus, the configuration of the arm is completely
represented by four variables (the joint angles). Attention was
devoted to avoiding discontinuous jumps near ±180° associated
with the use of inverse tangent functions.
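The geometric reasoning above can be sketched in code. The paper does not state the joint-axis convention, so the one used here is an assumption: the shoulder is at the origin, the arm hangs along -z at the calibration posture, the shoulder rotation is Ry(q1)·Rx(q2)·Rz(q3) and the elbow flexes about the local x axis with q4 ≥ 0. All function names are hypothetical. A forward kinematics routine is included only to check the inverse mapping.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

DOWN = np.array([0.0, 0.0, -1.0])  # arm direction at the calibration posture

def forward_kinematics(q, l_arm, l_forearm):
    """Elbow and wrist positions (shoulder at the origin) for angles q1..q4."""
    q1, q2, q3, q4 = q
    r_shoulder = rot_y(q1) @ rot_x(q2) @ rot_z(q3)
    elbow = l_arm * (r_shoulder @ DOWN)
    wrist = elbow + l_forearm * (r_shoulder @ rot_x(q4) @ DOWN)
    return elbow, wrist

def inverse_kinematics(elbow, wrist, eps=1e-6):
    """Joint angles from elbow and wrist positions relative to the shoulder."""
    elbow = np.asarray(elbow, dtype=float)
    wrist = np.asarray(wrist, dtype=float)
    e = elbow / np.linalg.norm(elbow)                    # upper-arm direction
    f = (wrist - elbow) / np.linalg.norm(wrist - elbow)  # forearm direction
    # Two angles place the elbow on the sphere centered at the shoulder.
    q2 = np.arcsin(np.clip(e[1], -1.0, 1.0))
    q1 = np.arctan2(-e[0], -e[2])
    # Elbow flexion is the angle between upper arm and forearm.
    q4 = np.arccos(np.clip(e @ f, -1.0, 1.0))
    # Rotation about the upper-arm axis, undefined when the arm is straight.
    v = (rot_y(q1) @ rot_x(q2)).T @ f
    q3 = np.arctan2(-v[0], v[1]) if np.sin(q4) > eps else 0.0
    return np.array([q1, q2, q3, q4])

def joint_trajectories(elbows, wrists):
    """Per-frame inverse kinematics followed by removal of 2*pi jumps."""
    q = np.array([inverse_kinematics(e, w) for e, w in zip(elbows, wrists)])
    return np.unwrap(q, axis=0)

elbow, wrist = forward_kinematics([0.3, -0.4, 0.5, 1.0], 0.268, 0.233)
print(inverse_kinematics(elbow, wrist))
```

With this convention the solution is singular when q2 reaches ±90° and the third shoulder angle is set to zero when the arm is straight; `np.unwrap` is one simple way to keep the angle trajectories continuous across the ±180° boundary.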
Additionally, the implemented algorithm includes a
validation test, since there may be motions where the robot's
joints are not able to approximate the human pose in a
reasonable way due to physical limitations. The proposed
strategy to cope with the joint velocity limits is to
slow down the task-space trajectory whenever the limits
are encountered. Thus, whenever the generated joint
velocities violate the speed limits of the joint actuators, the
trajectory is scaled in time by an appropriate constant that
simultaneously assures tracking of the desired arm path and the
fulfillment of the velocity constraints.
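One simple way to realize this time scaling is to stretch the sampling period by the largest ratio between the observed joint speeds and their limits. The sketch below assumes a uniformly sampled joint trajectory and finite-difference velocities; the function name and the numerical values are illustrative assumptions.

```python
import numpy as np

def time_scale_for_velocity_limits(q, dt, v_max):
    """Constant time-scaling factor that keeps joint speeds within limits.

    q: (n_frames, n_joints) joint angles sampled every `dt` seconds.
    v_max: (n_joints,) maximum joint speeds in rad/s.
    Returns (scale, scaled_dt); scale is 1.0 when no limit is violated.
    """
    q = np.asarray(q, dtype=float)
    v_max = np.asarray(v_max, dtype=float)
    velocities = np.diff(q, axis=0) / dt             # finite-difference speeds
    worst_ratio = np.max(np.abs(velocities) / v_max)
    scale = max(1.0, float(worst_ratio))
    # Replaying the same samples with a longer period preserves the path
    # and divides every joint speed by `scale`.
    return scale, dt * scale

q = [[0.0, 0.0], [0.2, 0.05], [0.4, 0.1]]  # hypothetical two-joint trajectory
scale, new_dt = time_scale_for_velocity_limits(q, 1.0 / 30.0, [3.0, 3.0])
print(scale, new_dt)
```

Because only the timing changes, the geometric path of the arm is unaffected.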
B. Filtering and Interpolation
The frame rate of the Kinect sensor and the high-frequency
components in the movement data required a post-processing
stage to refine the results. The exact procedure
combines basic interpolation and smoothing techniques. On
the one hand, the joint-angle trajectories are filtered using a
moving average algorithm to smooth out short-term
fluctuations based on predefined trial onset and termination
times. On the other hand, the strategy adopted to provide a
more detailed description of the action performed by the
human subject is to use spline interpolation over the set of
observations to satisfy the requirements of differentiability.
To evaluate the different steps of post-processing, we use a
measure based on jerk, the third time derivative of position,
to quantify smoothness at the level of the joint-angle
trajectories. Concretely, the particular jerk metric used to
quantify movement smoothness is the integrated squared
jerk [12] defined by:
$$\eta_{isj} = \int_{t_1}^{t_2} \dddot{x}(t)^2 \, dt \qquad (3)$$
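For sampled data, the integrated squared jerk can be approximated numerically. The sketch below is one possible discretization, assuming uniform sampling: the jerk is estimated with third-order finite differences and the integral with a rectangle rule. The function name is hypothetical.

```python
import numpy as np

def integrated_squared_jerk(x, dt):
    """Approximate integrated squared jerk of a uniformly sampled signal.

    x: 1-D array of joint angles sampled every `dt` seconds.
    """
    x = np.asarray(x, dtype=float)
    jerk = np.diff(x, n=3) / dt ** 3   # third finite difference
    return np.sum(jerk ** 2) * dt      # rectangle-rule integral

# Hypothetical joint-angle trajectory sampled at 30 Hz.
t = np.arange(0.0, 2.0, 1.0 / 30.0)
angle = 0.5 * np.sin(2.0 * np.pi * 0.5 * t)
print(integrated_squared_jerk(angle, 1.0 / 30.0))
```

Lower values indicate smoother trajectories, so the metric can be evaluated before and after each post-processing step.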
Movement smoothness was compared among the original signal
(after pose correction), the moving-average filtered signal,
the cubic spline interpolation and the fifth-order spline
interpolation (Fig. 3). The exact procedure to be followed
depends on the ultimate goal. In any case, the previous
considerations may be important in determining which
strategies are appropriate to the problem at hand.
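The two post-processing operations discussed in this subsection can be sketched as follows. This is an illustration under assumptions rather than the procedure used in the experiments: the moving average uses a centred odd window with edge padding, the interpolation uses `scipy.interpolate.CubicSpline` with its default boundary conditions, and the window length and upsampling factor are arbitrary.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def moving_average(x, window):
    """Centred moving average with edge padding; `window` must be odd."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    padded = np.pad(np.asarray(x, dtype=float), half, mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")

def upsample_cubic(t, x, factor):
    """Cubic spline through the samples, evaluated on a finer time grid."""
    spline = CubicSpline(t, x)
    t_fine = np.linspace(t[0], t[-1], factor * (len(t) - 1) + 1)
    return t_fine, spline(t_fine)

# Hypothetical noisy joint-angle trajectory sampled at 30 Hz.
rng = np.random.default_rng(1)
t = np.arange(60) / 30.0
angle = 0.5 * np.sin(2.0 * np.pi * 0.5 * t) + rng.normal(0.0, 0.01, t.size)
smoothed = moving_average(angle, 5)
t_fine, angle_fine = upsample_cubic(t, smoothed, 4)
print(smoothed.shape, angle_fine.shape)
```

The spline returned by `CubicSpline` is twice continuously differentiable, which gives well-defined velocities and accelerations on the finer grid.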
Fig. 3. Comparison of the smoothness measure for different motion post-
processing methods applied on the joint-angle trajectories (ordinate is
plotted in a logarithmic scale).
V. GESTURE IMITATION IN A ROBOT
Several real-time movements executed by a human sub-
ject were captured using the Kinect sensor to provide vali-
dation for our algorithms. Two different movements were
chosen to illustrate the results: a rhythmic motion repeated
many times and a discrete sequence of upper-limb move-
ments. In the first experiment, the human subject is asked to
repeat a circular path trying to keep, as much as possible, a
constant speed across all trials. Fig. 4 and Fig. 5 show the
variability always present in human movements, both in
task- and joint-spaces. Since the details vary, it seems nec-
essary to consolidate the demonstrated movements having
in mind the desired final result (i.e., the extent to which the
motor goal is reached).
Fig. 4. Variability of human movements in the task-space for the execution
of a circular path repeated many times.
Fig. 5. Variability of human movements in the joint-space for the execu-
tion of a circular path repeated many times.
The second experiment consists of a gesture imitation
task using the two arms in different configurations around
the T-pose. Fig. 6 compares the positions of the right and
left wrists as given by the filtered data and by the robot
simulation. The consistency between the two curves suggests
the efficacy of the proposed human motion reconstruction
algorithm.
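The agreement between the two wrist paths is assessed visually in Fig. 6. If a numerical summary is wanted, one possible choice (an assumption, not a metric reported here) is the root-mean-square distance between corresponding points of the two paths, as sketched below with a hypothetical function name.

```python
import numpy as np

def path_rmse(reference, reproduced):
    """Root-mean-square Euclidean distance between two sampled 3D paths.

    reference, reproduced: arrays of shape (n_frames, 3) with matching frames.
    """
    reference = np.asarray(reference, dtype=float)
    reproduced = np.asarray(reproduced, dtype=float)
    errors = np.linalg.norm(reference - reproduced, axis=1)  # per-frame error
    return np.sqrt(np.mean(errors ** 2))

# Hypothetical wrist paths (mm): filtered capture data and robot simulation.
captured = np.array([[0.0, 0.0, -500.0], [100.0, 0.0, -480.0], [200.0, 0.0, -450.0]])
simulated = captured + np.array([1.0, -2.0, 2.0])
print(path_rmse(captured, simulated))
```

The two paths must be sampled at the same instants for the frame-by-frame comparison to be meaningful.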
VI. CONCLUSIONS
In this paper, we have described and demonstrated the
potential of the Kinect sensor for gesture imitation by an
upper-body robot from demonstrations of a human teacher.
The implementation of the proposed ideas on the 4-DOF
robot model shows that human-demonstrated gestures are
well replicated by the robot. In this context, the approach is
useful for providing a natural and intuitive interface for a
user to teach complex movements to a robot. The main goal
is to create real data sets that, if combined with others, can
later be used for learning a compact representation of the
task. These data sets will assist in developing learning
techniques for manipulation/locomotion behaviors based on
examples of human demonstrations.
Fig. 6. Comparison of the motion capture data (left) with the gestures replicated by the robot (the end-effector path is represented in both cases).
ACKNOWLEDGMENT
This work is partially funded by FEDER through the Oper-
ational Program Competitiveness Factors - COMPETE and
by National Funds through FCT - Foundation for Science
and Technology in the context of the project FCOMP-01-
0124-FEDER-022682 (FCT reference Pest-C/EEI/UI0127/
2011). Zhenli Lu is supported by FCT under contract
CIENCIA 2007 (Post Doctoral Research Positions for the
National Scientific and Technological System).
REFERENCES
[1] Billard, A., Calinon, S., Dillmann, R., Schaal, S.: Robot
Programming by Demonstration. In: Siciliano, B., Khatib, O. (eds.)
Handbook of Robotics, Springer, New York, NY, USA (2008)
[2] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A Survey of
Robot Learning from Demonstration. Robotics and Autonomous
Systems, 57(5): 469-483 (2009)
[3] Dasgupta, A., Nakamura, Y.: Making Feasible Walking Motion of
Humanoid Robots from Human Motion Capture Data. In: IEEE
International Conference on Robotics and Automation, pp. 1044-1049
(1999)
[4] Elgammal, A., Lee, C.-S.: Tracking People on a Torus. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 31(3): 520-538
(2009)
[5] Inamura, T., Toshima, I., Tanie, H., Nakamura, Y: Embodied Sym-
bol Emergence Based on Mimesis Theory. International Journal of
Robotics Research, 23(4-5): 363-377 (2004)
[6] Kulic, D., Takano, J.W., Nakamura, Y.: Incremental Learning,
Clustering and Hierarchy Formation of Whole Body Motion Patterns
using Adaptive Hidden Markov Chains. International Journal of
Robotics Research, 27(7): 761-784 (2008)
[7] Shon, A.P., Grochow, K., Hertzmann, A., Rao, R.P.: Learning
Shared Latent Structure for Image Synthesis and Robotic Imitation.
In: Weiss, Y., Schölkopf, B., Platt, J.C. (eds.) Advances in Neural
Information Processing Systems, MIT Press, Cambridge, MA (2005)
[8] Shotton, J., Fitzgibbon, A.W., Cook, M., Sharp, T., Finocchio, M.,
Moore, R., Kipman, A., Blake, A.: Real-time Human Pose Recognition
in Parts from Single Depth Images. In: IEEE Computer Vision
and Pattern Recognition, Colorado Springs, USA (2011)
[9] Khoshelham, K., Elberink, S.O.: Accuracy and Resolution of Kinect
Depth Data for Indoor Mapping Applications. Sensors, 12(2): 1437-
1454 (2012)
[10] Smisek, J., Jancosek, M., Pajdla, T.: 3D with Kinect. In: Internation-
al Conference on Computer Vision Workshops, pp. 1154–1160, Bar-
celona, Spain (2011)
[11] Obdržálek, S., Kurillo, G., Ofli, F., Bajcsy, R., Seto, E., Jimison, H.,
Pavel, M.: Accuracy and Robustness of Kinect Pose Estimation in
the Context of Coaching of Elderly Population. In: International
Conference of the IEEE Engineering in Medicine and Biology Socie-
ty, pp. 1188-1193, California, USA (2012)
[12] Platz, T., Denzler, P., Kaden, B., Mauritz, K.-H.: Motor Learning
After Recovery from Hemiparesis. Neuropsychologia, 32: 1209-1223
(1994)