Learning task space control through goal directed exploration
-
Upload
independent -
Category
Documents
-
view
1 -
download
0
Transcript of Learning task space control through goal directed exploration
Learning Task Space Control throughGoal Directed Exploration
Lorenzo Jamone1, Lorenzo Natale2, Kenji Hashimoto1, Giulio Sandini2, Atsuo Takanishi1
Waseda University, Tokyo, Japan1
Robotics, Brain and Cognitive Science Department - Italian Institute of Technology, Genova, [email protected], [email protected],
k − [email protected], [email protected], [email protected]
Abstract—We present an autonomous goal-directed strategyto learn how to control a redundant robot in the task space.We discuss the advantages of exploring the state space throughgoal-directed actions defined in the task space (i.e. learningby trying to do) instead of performing motor babbling in thejoints space, and we stress the importance of learning to beperformed online, without any separation between training andexecution. Our solution relies on learning the forward model andthen inverting it for the control; different approaches to learnthe forward model are described and compared. Experimentalresults on a simulated humanoid robot are provided to supportour claims. The robot learns autonomously how to performreaching actions directed toward 3D targets in task space byusing arm and waist motion, not relying on any prior knowledgeor initial motor babbling. To test the ability of the system toadapt to sudden changes both in the robot structure and inthe perceived environment we artificially introduce two differentkinds of kinematic perturbations: a modification of the length ofone link and a rotation of the task space reference frame. Resultsdemonstrate that the online update of the model allows the robotto cope with such situations.
I. INTRODUCTION
This work investigates how to learn a kinematic model of
the robot and use it for control. Specifically, we learn the
forward model of a redundant robot and we invert it for the
control. Learning is performed online, without any separation
between system training and task execution; no prior knowl-
edge about the system is given. Our main contribution is to
combine advanced control using learned models [1], [2] and
goal directed exploration [3], [4]. Differently from previous
works, we want the robot to learn autonomously through a goal
directed exploration of the task space, without any previous
exploration of the motor space (motor babbling). This strategy
allows to restrict the exploration of the state space to the
regions which are more relevant to the task fulfillment; as soon
as the robot needs to perform tasks in other regions of the state
space the learned model is updated accordingly. Experimental
results show the advantages of this approach. Moreover, the
learned model have to be updated if changes in the robot
structure or in the perceived environment occur. We simulate
such situations by introducing artificially two different kinds
of kinematic perturbations: a modification of the size of one
link and a 45◦ CCW rotation of the task space reference frame.
Results demonstrate how the robot is able to cope with these
situations.
The paper is organized as follows. In section II we provide
general considerations about online learning and exploration,
and in section III we discuss how we applied those considera-
tions to learning task space control, also in relation to previous
works; then, in section IV we describe the details of our
implementation. Finally, in sections V and VI we report the
experimental results and we draw our conclusions.
II. ONLINE LEARNING AND EXPLORATION
Past works [5]–[7] tackled the learning problem by con-
sidering two distinct phases: an exploration phase in which
the robot collects data about the system followed by a later
exploitation phase in which the robot performs its actions
using a neural network which has been trained with the data
gathered previously. This approach has two problems. First, at
some point the robot (i.e. learning system) should decide to
stop learning and activate the controller that uses the network.
Besides the problem of individuating this special time (how
can the robot say that it has learned enough?), the process
would probably take too much time (i.e. the curse of dimen-
sionality [8]) and still such a learned model would be no longer
useful if the system changes. Finally, two different controllers
are needed to drive the robot actions during two different
phases, as the goal changes (i.e. during the exploration phase
the goal is just to gather data for learning). This is neither
intuitive nor practical. Indeed, the robot should explore the
learning space driven by the same motivations that characterize
its later actions (i.e. a goal directed exploration). This also
helps to reduce the size of such a space, by individuating the
areas which are more important to the task fulfillment, at least
at the current developmental stage. Recently, goal directed
exploration has been proposed by different authors for the case
of inverse kinematics learning [3], [4]. Here we investigate this
aspect for the case in which the forward kinematic model is
learned and then inverted for control. Furthermore, differently
from [4], we do not impose any particular choice of the goals,
as we assume that those are already given.
III. LEARNING TASK SPACE CONTROL
In this work we consider task space control of a redundant
robot. Task space (or operational space) is the space in which
generally robot tasks are described (e.g. the 3D Cartesian
space), as opposed to joints space (or motor space). Since the
702
Proceedings of the 2011 IEEEInternational Conference on Robotics and Biomimetics
December 7-11, 2011, Phuket, Thailand
robot needs to be controlled in joints space, an inverse model is
required which maps the desired task space behavior into the
appropriate joints space trajectories (i.e. motor commands).
When the joints space dimension is larger than the task
space dimension an infinity of solutions exists for the model
inversion: in this case, we say that the robot is redundant
with respect to the specified task. Given this scenario, there
are mainly two possibilities to exploit learning: to learn the
forward model and then invert it for the control, or to directly
learn the inverse model. Learning the forward model gives the
opportunity to choose the best inverse solution according to the
situation. Different methods can be exploited (e.g. extended
Jacobian [9], augmented Jacobian [10], local minimization
through null space projection [11]) to solve a secondary task
while performing the main one (i.e. end-effector positioning).
Possible choices for such a secondary task can be joint limits
avoidance, obstacle avoidance, or in general the optimization
of an arbitrary function. One main drawback of this approach
is that the model inversion can be problematic if the Jacobian
matrix becomes ill-conditioned (i.e. the system is close to sin-
gular configurations): joint velocities obtained by the inversion
of such a matrix become huge or even unfeasible. The problem
is avoided when the inverse model is learned directly from the
gathered velocity data: since they are generated by the robot
motion, they are physically feasible by definition. However,
when the inverse model is directly learned from data, the
system can be controlled only exploiting the specific inverse
solution which has been learned. A robotic implementation of
this latter strategy is proposed in [1]. For what concerns for-
ward model learning, we consider the interesting work in [2]
as a main reference. The authors propose an approach which
combines forward kinematics model learning with task space
control, exploiting null space projections to solve multiple
tasks simultaneously. After a motor babbling phase, the system
is controlled with the learned model, which is also constantly
updated online during the movements; results are provided in
simulation on a 3 DOF planar manipulator. The Jacobian is
derived analitically from the learned forward kinematic model,
and then inverted. As they claim, this computation propagates
the estimation errors of the learned model. An alternative
solution is to learn directly the non-linear Jacobian matrix from
the gathered velocity data, as we investigated in [12]. However,
as long as velocities are derived from position measurements
an estimation error is always introduced.
In this work we follow the same approach as in [2] and we
extends those results considering a more complex scenario, in
which a simulated humanoid robot with joints limits is con-
sidered; the robot performs reaching actions directed toward
3D target points in the cartesian space exploting both arm
and waist motion (7 DOFs). Differently from their work, we
do not perform any motor space exploration (babbling phase)
before the reaching actions start: the exploration of the state
space is fully realized during the execution of goal directed
movements in task space. The main goal of these movements
is to bring the end-effector (i.e. robot hand) into a specified
3D position, with the required accuracy. The secondary goal
is to keep the robot joints as far as possible from the physical
limits: this latter is achieved by local minimization through
null space projection, as in [11]. As a result of this strategy,
the motor space is not explored exhaustively; only the regions
which are more relevant for the current tasks are covered. This
allows a faster convergence of the estimation error in those
areas; then, as soon as the robot needs to perform actions in
other regions of the motor space, the learned model is updated
online. Also, to keep the system far from joints limits helps
to carry out a safe and efficient goal directed exploration. In
general, since in the beginning the model estimation can be
poor or even completely wrong, the robot can get stuck at
joint limits without finding a direction to move out: even if
a target in a direction which is opposite to the joint limits
is choosen, a wrong estimated Jacobian could push the robot
again toward the limits. Without motion the model can not
be updated, and the robot could get stuck forever. This effect
is consistently reduced by choosing joint limit avoidance as a
secondary task.
Finally, concerning the problems related to matrix inversion
near singularities, here we choose to rely on the damped least
squares method [13].
IV. PROPOSED SOLUTIONS
The solution we chose for the design of the task space
controller was originally proposed in [11]. Motor velocities
q are generated as follows:
q = Km · C1 +Ks · C2 (IV.1)
where
C1 = J†(q) xdm (IV.2)
C2 = (I − J†(q)J(q)) qds (IV.3)
where J(q) is the Jacobian matrix which maps from motor
velocities to task velocities, J†(q) is its Moore-Penrose gen-
eralized inverse, (I − J†(q)J(q)) is a null-space projector,
xdm is the desired task space velocity for the main task and
qds is the desired joints space velocity for the secondary task.
At every control step these desired velocities are chosen as
follows:
xdm = xd − x (IV.4)
qds = −Kg ∇M(q) (IV.5)
where xd and x are the desired and actual task space positions
and ∇M(q) is the gradient of M(q), the function we want
to minimize as a secondary task. Km, Ks and Kg are
positive definite diagonal gain matrices respectively for the
main task, the secondary task and the gradient descent. Since
our secondary task is to keep the joints as far as possible from
the limits we chose M(q) as in [11]:
M(q) =1
N
N∑i=0
(qi − ai
ai − qmaxi
)2
(IV.6)
ai =qmaxi + qmin
i
2(IV.7)
703
where N is the overall number of joints and qmin and qmax
are lower and upper joints limits.
If the secondary task is specified in a different space (i.e.
M(z)) an appropriate Jacobian is needed to transform the
desired velocities in the joints space; equation IV.5 is hence
modified as follows:
qds = −Kg J†
z (q) ∇M(z) (IV.8)
where Jz(q) is the Jacobian which maps velocities in the qspace to velocities in the z space. See [14] for details.
At every control step we check whether the system is close
to singularities by computing the smaller singular value of the
Jacobian through singular value decomposition (SVD). If the
smaller singular value δm is lower than a predefined threshold
δT we rely on the damped least squares solution [13]. In this
case J†(q) in equations IV.2 and IV.3 is computed as follows:
J†(q) = JT (q)(J(q)JT (q) + λ2I)−1 (IV.9)
where
λ2 =
[1−
(δmδT
)2]· λ2
MAX (IV.10)
In our implementation we chose δT = 0.0001 and λ2MAX =
0.00005.
In the following sections we describe the two approaches we
investigated to obtain a model of the Jacobian J(q) through
learning: derive it from a learned forward kinematic model or
learn it directly from velocity measurements.
A. Derive the Jacobian from the learned forward model
Following the approach of [2] a neural network is trained
using Locally Weighted Projection Regression (LWPR [15])
to estimate the forward kinematic model of the system,
x = f(q). Then the estimated model is derived to obtain
the Jacobian J(q), defined as ∂f∂q . The derivative computation
comes “for free” since the LWPR algorithm already performs
it every time the model is updated.
At every control step i the LWPR neural network is trained
with a couple < qi,xi > and the local Jacobian J(qi) is
computed. We will refer to this neural network as the kinematicmap.
B. Learn the Jacobian from velocity measurements
In [12] we proposed an algorithm to learn online a model
of the Jacobian. The estimation was based on Receptive
Field Weighted Regression (RFWR [16]) and incremental least
squares (ILS [17]). Here we use the same algorithm, but
we employ LWPR instead of RFWR; LWPR is actually an
enhanced version of RFWR. Its main advantage is that both
update and prediction should be faster (it is an O(n) algorithm,
where n is the input dimension, while RFWR is O(n2)) and
that the dimension is locally reduced with a Linear Projection
Regression that projects locally along relevant dimensions.
LWPR should outperform RFWR expecially in presence of
a big input space with irrelevant dimensions.
A local Jacobian matrix Ji is estimated from sensory measure-
ments (arm joints displacements, Δq, and hand displacements,
Δx) in the surroundings of qi. Then, a LWPR neural network
is trained with couples < qi, Ji >, for different qi. We will
refer to this neural network as the Jacobian map.
The Jacobian map is updated during the movement as follows:
1. collect new sample (Δx,Δq)i at time i2. retrieve Ji = LWPR(qi)3. update Ji with (Δx,Δq)i → Ji+1
4. train LWPR(qi, Ji+1)
where the matrix update at step 3 is done with ILS. At every
control step i the local Jacobian J(qi) is obtained by querying
the map with the current qi.
C. LWPR initialization
In general, the output of the LWPR neural network is null
if no local model is associated with the current input. This
always happens in the beginning, when still no samples have
been collected and used for training, but can occur also later,
if the system works in regions of the motor space that have
not been explored yet. If the estimated Jacobian J(q) is null,
the controller IV.1 only relies on C2 (since C1 becomes zero)
and drives the joints toward their mean value, in order to
minimize M(q). As soon as M(q) is minimized, the system
can then easily get stuck around that configuration: if the
output of the LWPR neural network is still null the robot
would not move anymore even if the reaching target changes
(C1 remains zero), it is not able to collect data for training
and therefore it can not learn the kinematic model. Therefore,
we need an initialization matrix for J(q), which is used
everytime the LWPR outputs zero. In our implementation we
chose a matrix with random non-null coefficients between zero
and one. The motion generated using such a matrix will not
necessary bring the hand closer to the target but will produce
exploratory movement. Indeed, a feature of the employed
learning algorithms is that we can use even a “wrong” matrix
as initialization, without preventing the converging of the
estimation error.
V. EXPERIMENTAL RESULTS
The experiments are carried out using the iCub Dynamic
Simulator [18]: a snapshot is displayed in figure 1. All the
software has been realized using YARP [19]. To the sake of
clarity, we recall the definition of joint space vector q and task
space vector x, as used hereinafter:
. q = [θsy θsp θsr θe θwy θwr θwp]T ∈ R
7
. x = [xh yh zh]T ∈ R
3
where θsy , θsp, θsr are the shoulder yaw, pitch and roll
rotations (elevation/depression, adduction/abduction and ro-
tation of the arm), θe is the elbow flexion/extension, θwy ,
θwr, θwp are the waist yaw, roll and pitch rotations (rotation,
adduction/abduction, elevation/depression of the trunk), and
xh, yh, zh are the three cartesian coordinates describing the
hand position with respect to a fixed reference frame placed
704
on the ground, in the middle between the robot feet.
The robot joints limits are defined as follows:
. qmin = [−90◦; 0◦;−35◦; 10◦;−50◦;−30◦;−10◦]T
. qmax = [90◦; 160◦; 95◦; 100◦; 50◦; 30◦; 70◦]T
We compare the two learning algorithms (kinematic map and
Jacobian map) in section V-A, and the two exploration strate-
gies (random motor babbling and goal directed exploration) in
section V-B. Then in section V-C we evaluate the learning and
adaptation capabilities of the system when using the kinematic
map algorithm and the goal directed exploration strategy.
Experiments presented in this latter section are organized as
follows: first the robot performs reaching movements directed
toward 3D targets (goal directed exploration) then we evaluate
the learning performances by executing some test movements,
which consist in following a given task space trajectory. It is
important to notice that this separation has been introduced
only to assess the quality of the model estimation with repeat-
able and significant tests: it does not mean that learning and
execution occur in different phases. The right image in figure
1 provide information about the 3D targets distribution and
the test trajectories. The three black spheres in the kinematic
sketch of the robot indicate the actuated joints: the 3-DOFs
waist, the 3-DOFs shoulder, the 1-DOFs elbow. The green dots
are a representative subset of the targets: they are all physically
reachable and most of them are located in front of the robot,
mainly on the right side. The two test trajectories are displayed
in blue: the first one is a solid line which depicts a cube-like
figure, the other one is a straight dashed line. We refer to the
first one as the cube test and to the second one as the line test.
Fig. 1. On the left: the iCub Simulator. On the right: a sketch of the robotkinematics, the target points of the goal directed exploration (green dots), thetest trajectories (blue solid line for the cube test, blue dashed line for the linetest).
A. Comparison of learning algorithms
We first compare the estimation performances of the two
proposed learning algorithms. Indeed, the quality of the task
space controller depends on the quality of the estimation. Dur-
ing the goal directed exploration data is collected at every time
step i in the form < qi,xi,Δqi,Δxi >, and used for traning
either the kinematic map, xi = f(qi), or the Jacobian map,
Δxi = J(qi)Δqi, as described in sections IV-A and IV-B.
We define the Jacobian estimation error Je = Δxi − Δxi,
where Δxi =∂f∂q (qi)Δqi in case of the kinematic map and
Δxi = J(qi)Δqi in case of the Jacobian map. We built a test
set of 30K samples to evaluate the estimation performances
of the learning algorithms. In figure 2 the Root Mean Squared
Error (RMSE) of the Jacobian estimation is depicted (i.e. the
average of the RMSE of the three components of Je). Then,
in figure 3 we show the hand trajectories during the cube
test performed with the two learned maps, trained with 150K
samples. The kinematic map learns faster than the Jacobian
map, and also the final RMSE (after 150K samples) appears
to be lower: 0.0033 for the kinematic map, 0.0037 for the
Jacobian map. As a consequence, the hand trajectory during
the cube test happens to be better (more straight) when using
the kinematic map for the control. A possible reason for this
result is that the velocity measurements used to train the
Jacobian map < Δqi,Δxi > are derived from the measured
positions (Δqi = qi − qi−1, Δxi = xi − xi−1) and this
computation introduces a further estimation error.
Therefore, we will rely on the kinematic map for all the tests
presented hereinafter.
Fig. 2. RMSE of the Jacobian estimation with training samples collectedduring the goal directed exploration, using the two presented approaches: thekinematic map, blue solid line, and the Jacobian map, blue dashed line.
Fig. 3. Hand trajectories during the cube test, after 150K samples have beencollected with the goal directed exploration and used for training: controllingwith the kinematic map, left image, and controlling with the Jacobian map,right image.
B. Comparison of exploration strategies
An important feature of the goal directed exploration is that
learning is restricted to the regions which are more relevant
for the control: learning such a reduced space should be
faster then learning the whole space. To support our claim
we compared the two approaches. We performed both a
random motor babbling phase (movements toward uniformly
distributed joints configurations were executed controlling the
705
system in joints space) and a goal directed exploration phase
(the hand was driven toward task space positions using the
task space controller IV.1). Training samples for the kinematic
map were collected during the movements: the distribution of
joints positions in both cases are shown in figure 4 and 5.
The distribution of hand positions in task space was the same
both in the random motor babbling and in the goal directed
exploration. Then we compared the Jacobian estimation RMSE
of the kinematic map in the two cases: using the random
motor babbling training samples and using the goal directed
exploration training samples. Results in figure 6 show that the
goal directed exploration allows to achieve a faster and more
precise estimation.
Moreover, thanks to the null-space projection exploited in the
controller, the joints are maintained as far as possible from the
limits. As already pointed out, this helps to carry out a safe
and effective exploration. As a comparison, we performed the
goal directed exploration again using controller IV.2) alone,
without null-space projection. The percentage of movements
ending at joints limits in this case were 65%, while it was only
10% when exploting the null-space projection: as a result the
collected data were highly biased on the joints limits and the
model estimation was poor.
Fig. 4. Joint positions distribution during the goal directed exploration. Jointsare maintained as far as possible from the physical limits.
Fig. 5. Joint positions distribution during the random motor babbling. Jointspositions are uniformly distributed over the whole range.
C. Model learning and adaptation
First we evaluated the performances of the controller in the
execution of the cube test as more data samples are acquired
and used to update the kinematic map with the goal directed
exploration. Results are depicted in figure 7. The system
Fig. 6. RMSE of the Jacobian estimation using the kinematic map, withtraining samples collected during the goal directed exploration, blue solidline, and the random motor babbling, blue dashed line. The test set sampleshave been collected during the goal directed exploration.
Fig. 7. Hand trajectory during the cube test using the kinematic map trainedwith increasing number of samples. From the left: 50K samples, 1000Ksamples, 7000K samples.
starts to be controllable after 50K samples are learned (almost
all the final positions are reached, even if with inaccurate
movements), and trajectories become almost straight around
1000K samples. Then performances further improve as more
samples are learned.
1) Unexplored space: The following experiment aims at
evaluating the ability of the learning algorithm to update the
model when a new region of the motor and the task space
is explored. The line test (as defined in the beginning of this
section) is executed for several iterations; data is collected
during the motion to update the model. The resulting hand
trajectories are displayed in figure 8.
A good estimation of the model is achieved around iteration
80, when about 150K samples have been collected and used for
training. In figure 9 the joints positions distribution is depicted.
Even if the controller tries to keep the joints far from the
limits, the solution of the main task (hand trajectory) for the
line test impose to drive some of the joints close to the limits.
Therefore, the motor space is explored in a region which was
not covered in the previous goal directed exploration (compare
to figure 4). The same holds for the task space (hand positions),
as it can be deduced from the right image in figure 1: the line
test reference trajectory lies in a region of the task space that
has been explored only partially.
2) Robot kinematic modification: After 1000K samples
have been learned during the goal directed exploration we
introduce a modification in the robot kinematics. Specifically,
we change the length of the forearm from 0.18 meters to
0.30 meters. This can simulate different situations, like the
replacement of a robot part with a different one or the use of
a tool for reaching. The goal directed exploration is executed
further with the new kinematics and the cube test is performed
at different learning stages (i.e. increasing number of training
706
samples). Results are shown in figure 10. The first plot on
the left shows the hand trajectory as soon as the kinematic
modification is introduced. After some learning is performed
hand trajectories become straight again, as in the center and
right plots. The learning process is faster if we train the system
on the specific trajectory. Figure 11 shows the improvements of
the controller if the cube test is repeated for several iterations.
In this case only 7 iterations of the test movement (about 100K
training samples) are sufficient to recover a good estimation
of the kinematic model. Of course in this case the new model
is learned only in this specific region of the workspace.
After 2800K samples have been learned with the new kinemat-
ics the modification is removed. The goal directed exploration
is continued with the original kinematics and the cube test is
performed at different learning stages. Figure 12 shows how
the system is able to recover a good estimation of the model.
Fig. 8. Hand trajectory during the execution of the line test (inside anunexplored region of the state space), at different learning stages. From thetop-left corner: after 1, 20, 40, 60, 80, 100 iterations of the test movement.
Fig. 9. Joint positions distribution during the execution of the line test.Movements are performed in a region of the motor space that was not exploredduring the goal directed exploration.
Fig. 10. Hand trajectories during the cube test, after a modification of therobot kinematics. The goal directed exploration is performed with the newrobot kinematics and new samples are collected to update the model. Fromthe left: 0 samples, 700K samples, 2800K samples.
Fig. 11. Hand trajectories during the cube test, after a modification of therobot kinematics. Several iterations of the cube test are performed with thenew robot kinematics and new samples are collected to update the model.From the left: 1 iteration, 7 iterations (about 100K samples), 14 iterations(about 200K samples).
Fig. 12. Hand trajectories during the cube test, after the modification ofthe robot kinematics is removed. The goal directed exploration is performedagain with the original robot kinematics and new samples are collected toupdate the model. From the left: 0 samples, 700K samples, 2800K samples.
Fig. 13. Hand trajectories during the cube test, after a 45◦ CCW rotationof the reference frame. The goal directed exploration is performed with therotated reference frame and new samples are collected to update the model.From the left: 0 samples, 700K samples, 2800K samples.
Fig. 14. Hand trajectories during the cube test, after a 45◦ CCW rotation ofthe reference frame. Several iterations of the cube test are performed with therotated reference frame and new samples are collected to update the model.From the left: 1 iteration, 10 iterations (about 150K samples), 25 iterations(about 375K samples).
Fig. 15. Hand trajectories during the cube test, after the 45◦ CCW rotation ofthe reference frame is removed. The goal directed exploration is performedagain with the original reference frame and new samples are collected toupdate the model. From the left: 0 samples, 700K samples, 2800K samples.
707
3) Rotated reference frame: The second modification we
introduce is a 45◦ CCW rotation around the y axis of the task
space reference frame. The perceived position of the hand is
modified as follows:
xMODh = cos 45◦xh − sin 45◦zh (V.1)
yMODh = yh (V.2)
zMODh = cos 45◦xh + sin 45◦zh (V.3)
The goal directed exploration is executed again with the
rotated reference frame and the cube test is performed at
different learning stages. Results are shown in figure 13. We
can observe a similar behavior to the one shown in the case
of the kinematic modification. Again, the learning process is
faster if we train the system on the specific trajectory, as shown
in figure 14. After 2800K samples have been learned with the
rotated reference frame the modification is removed. Again,
the plots in figure 15 show how the system is able to recover
a good estimation of the model.
VI. CONCLUSIONS
We presented a study about the control of a redundant robot
in the task space (operational space) through an autonomously
learned model, and provided experimental results on a sim-
ulated humanoid robot with joints limits. The robot starts
without any prior knowledge and learns online the kinematic
model necessary to control the end-effector in task space
by using arm and waist motion. Differently from previous
works, there is no separation between an exploration phase
and an exploitation phase; on the contrary, the robot shows a
continuous behavior whose performances evolve with time. We
showed the advantages provided by goal directed exploration
in the task space with respect to random motor babbling in the
joints space. Since the learning space is explored through goal
directed movements the part of the motor space that is actually
covered depends on the choice of the controller. Specifically,
we exploit the redundancy of the robot to address two tasks
simulaneously: to cancel the task space position error (main
task) and to keep the joints as far as possible from the limits
(secondary task). We realize this through Jacobian control
with null-space projection. As a consequence of this strategy
the motor space is not explored exhaustively, but only in the
regions that are more relevant for the tasks fulfillment: this
speeds up learning in those regions. As soon as actions need
to be performed in other areas the learned model is updated
accordingly. To test the ability of the system to adapt to
sudden changes both in the robot structure and in the perceived
environment we artificially introduced two different kinds of
kinematic perturbations: a modification of the length of one
link and a rotation of the task space reference frame. Results
demonstrate that the online update of the model allows the
robot to cope with such situations.
The proposed strategy is very general and can be applied to
any robot which needs to be controlled in a specific task
space. In particular, we are already implementing it on the
humanoid robot Kobian [20], in order to control the robot
hand for reaching and grasping purposes; in this case, the task
space variable is the hand position in the robot visual space
(i.e. uncalibrated camera coordinates).
REFERENCES
[1] A. D’Souza, S. Vijayakumar, and S. Schaal, “Learning inverse kine-matics,” in International Conference on Intelligent Robots and Systems(IROS), Maui, Hawaii, USA, Oct. 29 - Nov. 03 2001, pp. 298–303.
[2] C. Salaun, V. Padois, and O. Sigaud, “Control of reundant robots usinglearned models: an operational space control approach,” in InternationalConference on Intelligent Robots and Systems (IROS), St. Luis, USA,October, 11-15 2009, pp. 878–885.
[3] M. Rolf, J. J. Steil, and M. Gienger, “Goal babbling permits directlearning of inverse kinematics,” IEEE Transactions on AutonomousMental Development, vol. 2, no. 3, pp. 216–229, 2010.
[4] A. Baranes and P. Oudeyer, “Intrinsically motivated goal explorationfor active motor learning in robots: A case study,” in InternationalConference on Intelligent Robots and Systems (IROS), Taipei, Taiwan,2010.
[5] S. Rougeaux and Y. Kuniyoshi, “Robust tracking by a humanoid visionsystem,” in International Workshop on Humanoid and Human FriendlyRobotics, Tsukuba, Japan, 1998.
[6] D. Demers and K. Kreutz-delgado, “Learning global direct inversekinematics,” in Advances in Neural Information Processing Systems.Morgan Kaufmann, 1992, pp. 589–595.
[7] L. Natale, F. Nori, G. Metta, and G. Sandini, “Learning precise 3d reach-ing in a humanoid robot,” in International Conference of Developmentand Learning. London, UK: IEEE, 2007.
[8] R. E. Bellman, Dynamic Programming. Princeton University Press,1956.
[9] J. Baillieul, “Kinematic programming alternatives for redundant manipu-lators,” in International Conference on Robotics and Automation (ICRA),St. Louis, USA, 1985, pp. 722–728.
[10] L. Sciavicco and B. Siciliano, “A solution algorithm for the inversekinematic problem for redundant manipulator,” International Journal ofRobotics and Automation, no. 4, pp. 403–410, 1988.
[11] A. Ligeois, “Automatic supervisory control of the configuration andbehavior of multibody mechanisms,” Transactions on System, Man andCybernetics, no. 7, pp. 868–871, 1977.
[12] L. Jamone, L. Natale, M. Fumagalli, F. Nori, G. Metta, and G. Sandini,“Machine-learning based control of a human-like tendon driven neck,”in International Conference on Robotics and Automation. Anchorage,Alaska, USA: IEEE-RAS, 2010.
[13] Y. Nakamura and H. Hanafusa, “Inverse kinematic solutions withsingularity robustness for robot manipulator control,” Transactions ofthe ASME Journal of Dynamic Systems, Measurement and Control, no.108, pp. 163–171, 1986.
[14] S. Chiaverini, “Singularity-robust task-priority redundancy resolutionfor real-time kinematic control of robot manipulator,” Transaction onRobotics and Automation, no. 13(3), pp. 398–410, 1997.
[15] S. Vijayakumar and S. Schaal, “Locally weighted projection regression :An o(n) algorithm for incremental real time learning in high dimensionalspace,” in International Conference on Machine Learning (ICML), 2000,pp. 1079–1086.
[16] S. Schaal and C. G. Atkenson, “Constructive incremental learning fromonly local information,” Neural Computation, vol. 10(8), pp. 2047–2084,1998.
[17] K. Hosoda and M. Asada, “Versatile visual servoing without knowledgeof true jacobian,” in International Conference on Intelligent Robots andSystems (IROS). IEEE, 1994.
[18] V. Tikhanoff, P. Fitzpatrick, G. Metta, L. Natale, F. Nori, and A. Can-gelosi, “An open source simulator for cognitive robotics research:The prototype of the icub humanoid robot simulator,” in Workshopon Performance Metrics for Intelligent Systems, National Institute ofStandards and Technology, Washington DC, August 19-21 2008.
[19] G. Metta, P. Fitzpatrick, and L. Natale, “Yarp: yet another robotplatform,” International Journal on Advanced Robotics Systems, March2006, special Issue on Software Development and Integration inRobotics.
[20] N. Endo, K. Endo, M. Zecca, and A. Takanishi, “Modular design ofemotion expression humanoid robot kobian,” in 18th CISM-IFToMMSymposium on Robot Design, Dynamics, and Control (ROMANSY),Udine, Italy, 2010.
708