
Human Hand Posture Reconstruction for a Visual Control of an Anthropomorphic Robotic Hand

Ignazio Infantino 1, Antonio Chella 1,2, Haris Džindo 1, Irene Macaluso 1

1 ICAR-CNR sez. di Palermo, edif. 11, Viale delle Scienze, 90128, Palermo, Italy

{infantino,dzindo,macaluso}@pa.icar.cnr.it http://www.pa.icar.cnr.it

2 DINFO Università di Palermo, Viale delle Scienze, 90128, Palermo, Italy

[email protected] http://www.csai.unipa.it

Abstract. The paper deals with the vision-based control of an anthropomorphic robotic hand: our approach is to provide a Human-Computer Interface based on natural hand gesture imitation, with the operator simply showing his hand. We describe the design and the implementation of the visual control of a robotic system composed of an anthropomorphic hand and a stereo rig. The aim of the proposed system is to understand and reproduce the movements of a human hand in order to learn complex manipulation tasks. Moreover, the robotic system could receive tasks by carrying on a dialogue with human instructors involving gestures. A novel algorithm for robust and fast fingertip localization and tracking is presented. Moreover, a simulator is integrated in the system to give useful feedback to the users during operation and to provide a robust testing framework for real experiments.

1 Introduction

The control of robotic systems has reached a high level of precision and accuracy, but the high complexity and task specificity of such systems are often limiting factors for large-scale use. Today, robots are asked to be both “intelligent” and “easy to use”, allowing a natural and useful interaction with human operators and users. In this paper, we deal with the (remote) control of an anthropomorphic robotic hand: our approach is to provide a Human-Computer Interface [10],[15],[20] based on natural hand gesture imitation, with the operator simply showing his hand [8]. Different methods have been proposed to capture human hand motion. Rehg and Kanade [11],[12] introduced the use of a highly articulated 3D hand model for the tracking of a human hand (see also [14]). For tracking, the axes of the truncated cylinders that are used to model the phalanges are projected onto the image, and local edges are found. Fingertip positions are measured through a similar procedure. A nonlinear least squares method is used to minimize the error between the measured joint and tip locations and the locations predicted by the model. The system runs in real time; however, dealing with occlusions and handling

background clutter remains a problem. Heap and Hogg [6] used a deformable 3D hand shape model: the hand is modeled as a surface mesh constructed via PCA from training examples, and real-time tracking is achieved by finding the closest (possibly deformed) model matching the image. Stenger, Mendonça and Cipolla [18] presented a model-based hand tracking system using an Unscented Kalman filter. Wu and Huang [21] proposed a two-step algorithm to estimate the hand pose, first estimating the global pose and subsequently finding the configuration of the joints; their algorithm relies on the assumption that all fingertips are visible. Shimada et al. [15],[16] estimate all the finger joints and the 3-D palm orientation from the silhouette in real time, and use a Kalman filter for estimating 3-D shape and motion. Another promising approach [13] uses stochastic visual segmentation and non-linear supervised learning in order to recognize a gesture and classify it as a member of a predefined class. Our method uses the fingertips as features, considering each finger except the thumb as a planar manipulator; starting from this hypothesis we compute the inverse kinematics to control the robotic hand. This paper is structured as follows: section 2 describes the whole system and the experimental setup; section 3 deals with fingertip localization and tracking on image sequences; section 4 concerns the control of the robotic hand by inverse kinematics computation; section 5 reports the details of the various experiments; finally, some conclusions and future work are drawn in section 6.

2 The Implemented System Description

Our proposal deals with the design and implementation of a visual control of a robotic system composed of a dexterous anthropomorphic hand (the DIST-Hand [1]) and calibrated stereo cameras. The aim of the present work is to reproduce in real time the movements of a human hand in order to learn complex manipulation tasks. The human hand is placed in front of the stereo pair. A black panel is used as background and the illumination is controlled. The fingertips and the wrist are extracted at each frame and their 3D coordinates are computed by a standard triangulation algorithm. Given a hand model, inverse kinematics is used to compute the joint angles, which are translated into suitable commands sent to the DIST-Hand controller. The system (see figure 1) is composed of some common modules that work in parallel on the two image streams coming from the stereo rig:

1) image pre-processing operations (noise filtering, thresholding, edge detection);
2) fingertip tracking following the first estimation;
3) fingertip localization in the image based on the previous position.

The coordinates of the fingertips are the input of the other software modules, responsible for:

1) 3D coordinate computation;
2) inverse kinematics;
3) actuator command computation.

[Figure 1: system architecture block diagram; recoverable labels include x_real, x_pred, I*, X, Q_human and Q_robot.]

Fig. 1. The system architecture: two parallel sets of procedures separately estimate the fingertips in the right and left images; the extracted coordinates are the input of the procedures responsible for the control of the robotic hand.

3 Fingertip Localization and Tracking

This section deals with the localization of the fingertips in long image sequences. In order to achieve real-time performance, state-space estimation techniques are used: this reduces the search space from the full camera view to a smaller search window centered on the predicted future position of the fingertips. In the following, a novel algorithm is proposed to detect the fingertip position inside the search window. The visual tracking procedure starts with a preliminary setup phase based on various image processing operations. The procedure locates the hand/arm in the image, compares it with the robotic one and calculates the exact size ratios. Relevant features are tracked in the image sequences and reconstructed in three-dimensional space using standard computer vision algorithms [3]. These features are used as a guide to perform the 3D reconstruction of the simulated arm/hand and the positioning of the real system. The reconstructed movements are reproduced both by the simulator and by the robotic system in real time.

3.1 Fingertip Tracking

In this section we are concerned with the prediction of feature points in long image sequences. Since the motion of the observed scene is continuous, we should be able to make predictions about the motion of the image points, at any time, on the basis of their previous trajectories. The Kalman filter [2],[18] is the ideal tool for such tracking problems. The motion of a fingertip at each time instant (frame) can be characterized by its position and velocity. For a feature point, define the state vector at time t_k as

x(k) = [ r(k)  c(k)  v_r(k)  v_c(k) ]^T    (1)

where (r(k), c(k)) is the fingertip pixel position in the kth frame and (v_r(k), v_c(k)) its velocity in the r and c directions. Under the assumption that a fingertip moves with constant translation over the long sequence, and for a sufficiently small sampling interval T, the plant equation for the recursive tracking algorithm can be written as

x(k+1) = A x(k) + w(k)    (2)

where

A = \begin{pmatrix} 1 & 0 & T & 0 \\ 0 & 1 & 0 & T \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}    (3)

and the plant noise w(k) is assumed to be zero mean with covariance matrix Q(k). For each fingertip, the measurement used for the corresponding recursive tracking fil-ter at time tk is the image plane coordinates of the corresponding point in the kth im-age. Thus the measurement is related to the state vector as:

z(k) = H x(k) + v(k)    (4)

where

H = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}    (5)

and the measurement noise v(k) is assumed to be zero mean with covariance matrix R(k).

After the plant and measurement equations have been formulated, the Kalman filter can be applied to recursively estimate the motion between two consecutive frames and to track the feature point. Assuming that the trajectory of a feature point has been established up to the kth frame of a sequence, we describe the procedure for estimating the inter-frame motion between the kth and (k+1)th frames. First, for a fingertip, the predicted location of its corresponding point z(k+1|k) is computed by the KF. Subsequently, a window centered at z(k+1|k) is extracted from the (k+1)th image, and the fingertip extraction algorithm is applied to this window to identify salient feature points, as described in the next section. The same process is repeated for each fingertip. The initial fingertip and wrist coordinates are computed by assuming that the hand is placed as in figure 2; under this hypothesis it is straightforward to compute the fingertip coordinates.
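As an illustration, the following Python sketch implements the constant-velocity Kalman filter of Eqs. (1)-(5). The sampling interval T and the covariance values Q and R are placeholder assumptions, and the function names are ours, not those of the original implementation.

```python
import numpy as np

# Constant-velocity Kalman filter for one fingertip, following Eqs. (1)-(5):
# state x = [r, c, v_r, v_c]^T, measurement z = [r, c]^T.

T = 1.0                                    # one frame per time step (assumed)
A = np.array([[1, 0, T, 0],
              [0, 1, 0, T],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)  # plant matrix, Eq. (3)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)  # measurement matrix, Eq. (5)
Q = np.eye(4) * 1e-2                       # plant noise covariance (assumed)
R = np.eye(2) * 1.0                        # measurement noise covariance (assumed)

def kf_predict(x, P):
    """Predict state and covariance; H @ x_pred is the search-window center."""
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    return x_pred, P_pred

def kf_update(x_pred, P_pred, z):
    """Correct with the fingertip position measured in the new frame."""
    S = H @ P_pred @ H.T + R               # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    x = x_pred + K @ (z - H @ x_pred)
    P = (np.eye(4) - K @ H) @ P_pred
    return x, P
```

At each frame, kf_predict gives the center of the search window, and the fingertip position extracted there (section 3.2) is fed back through kf_update.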

Fig. 2. (a) Initial hand configuration used to calculate the first estimate of the fingertip positions (b).

3.2 Fingertip Extraction

This subsection describes a novel algorithm to perform robust and fast localization of the fingertips. Once a search window is determined for each fingertip (see figure 3.a), the feature point is searched for within that window. The overall shape of a human fingertip can be approximated by a semicircle. Based on this observation, fingertips are searched for by template matching with a set of circular templates, as shown in figure 3.c. Ideally, the size of the templates should differ for different fingers and different users. However, our experiments showed that a fixed window size works well for various users, a result also confirmed by other authors [9]. In our implementation we choose as a template for normalized correlation a square of 20x20 pixels containing a circle whose radius is 10 pixels.

Fig. 3. (a) Fingertip search window centered on the predicted location (b) of the fingertip; (c) circular templates used to search for fingertips; (d) the correlation with the five templates produces a series of local maxima; the points with high correlation response form three groups and provide three fingertip candidates; (e) the red point is the new estimate of the fingertip position, selected using the data association algorithm.

In order to find meaningful data, we first perform an adaptive threshold on the response images to obtain distinct regions. For each correlation we select the points with the highest matching scores (usually 2-3) which lie in unconnected regions of the thresholded image. Subsequently, nearby responses of different correlations are grouped. We select the groups characterized by the highest number of associated points, and the centroids of such groups are used to compute the fingertip candidates (see figure 3.d). In order to calculate those candidates, we must combine the information provided by each point in the group; however, only the points corresponding to semicircle correlation responses are used in this stage.

Each point is associated with a vector whose direction depends on the orientation of the corresponding semicircle (i.e. right = 0°, up = 90°, and so on) and whose magnitude is given by the correlation matching score. The direction in which the fingertip candidate should be searched for is computed as the angle formed by the vector sum of these vectors. Therefore, the fingertip candidate coordinate for each group lies on the line through the previously computed centroid, with the direction given by the above procedure.
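A minimal sketch of this correlation stage is given below, assuming OpenCV's matchTemplate for the normalized correlation on a grayscale search window. The 20x20 template size and 10-pixel radius follow the text, while the fixed threshold (the paper uses an adaptive one) and the function names are our assumptions.

```python
import numpy as np
import cv2

SIZE, RADIUS = 20, 10   # template size and circle radius from the text

def semicircle_template(angle_deg):
    """Half-disk template oriented along angle_deg (right = 0°, up = 90°)."""
    t = np.zeros((SIZE, SIZE), np.uint8)
    # OpenCV measures angles clockwise (image y grows downward)
    cv2.ellipse(t, (SIZE // 2, SIZE // 2), (RADIUS, RADIUS), 0,
                -angle_deg - 90, -angle_deg + 90, 255, -1)
    return t

def candidate_votes(window, thr=0.8):
    """Collect high-correlation points and the vector sum that gives the
    direction in which the fingertip candidate should be searched."""
    direction = np.zeros(2)
    points = []
    for ang in (0, 90, 180, 270):          # right, up, left, down
        resp = cv2.matchTemplate(window, semicircle_template(ang),
                                 cv2.TM_CCORR_NORMED)
        ys, xs = np.where(resp >= thr)     # fixed threshold; adaptive in the paper
        for y, x in zip(ys, xs):
            s = float(resp[y, x])
            # unit vector along the semicircle orientation, weighted by score
            direction += s * np.array([np.cos(np.radians(ang)),
                                       -np.sin(np.radians(ang))])
            points.append((x + SIZE // 2, y + SIZE // 2, s))
    return points, direction
```

The fingertip candidate of each group then lies on the line through the group centroid along the returned direction vector.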

Fig. 4. Results of fingertip extraction in different hand postures; the robustness of the algorithm has also been tested with artificial noise added.

We use data association techniques based on the previous measurements and the predicted state to choose the correct fingertip among the selected candidates (see figure 3.e). The robustness of this algorithm has been tested using various finger configurations: even in the more complicated case of two very close fingers against the background of the palm, the system follows the correct feature. The addition of artificial noise or the use of a lower image scale does not degrade the performance of the algorithm (see figure 4). No constraints on the possible hand postures are necessary, and gray-level video images are sufficient to obtain good results.
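The paper does not detail the data association rule; a simple gated nearest-neighbor scheme consistent with the description (previous measurements plus predicted state) might look as follows, with the gate radius an assumed parameter.

```python
import numpy as np

def associate(candidates, z_pred, gate=15.0):
    """Pick the fingertip candidate closest to the KF prediction.

    candidates: list of (row, col) candidate positions in the frame;
    z_pred: predicted measurement H @ x_pred from the tracking filter;
    gate: validation-gate radius in pixels (assumed value)."""
    best, best_d = None, gate
    for c in candidates:
        d = float(np.linalg.norm(np.asarray(c, float) - z_pred))
        if d < best_d:
            best, best_d = c, d
    return best     # None means the track is lost and must be re-initialized
```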

3.3 Three-Dimensional Coordinate Computation

Given the camera matrices it is possible to compute the fundamental (or essential) matrix of the stereo rig [3]. Epipolar geometry is used to find correspondences between the two views. In order to reduce the computational effort, matching features are searched for by normalized correlation along the epipolar line, inside the search window of the second image provided by the KF. The 3D fingertip coordinates are then computed by a standard triangulation algorithm.
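The text does not specify which triangulation method is used; a common choice is the linear (DLT) method sketched below, where P1 and P2 are assumed to be the 3x4 projection matrices of the calibrated stereo rig.

```python
import numpy as np

def triangulate(P1, P2, p1, p2):
    """Linear (DLT) triangulation of one stereo match.

    p1, p2: matched fingertip pixels (x, y) in the left/right images;
    returns the 3D point minimizing the algebraic reprojection error."""
    A = np.vstack([p1[0] * P1[2] - P1[0],
                   p1[1] * P1[2] - P1[1],
                   p2[0] * P2[2] - P2[0],
                   p2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)   # solution is the right null vector of A
    X = Vt[-1]
    return X[:3] / X[3]           # dehomogenize
```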

4 The Robotic Hand Control

The coordinates of the four fingertips and the wrist are used to solve the inverse kinematics problem for the joint angles, given a kinematic model of the human hand. The model adopted [7] is designed to remain simple enough for inverse kinematics to be done in real time, while still respecting human hand capabilities.

The whole hand is modeled by a 27 degrees of freedom (DOF) skeleton whose location is given by the wrist's middle point and whose orientation is that of the palm. Each finger consists of three joints, whose names are shown in figure 5.a, capable of two types of motion: abduction/adduction (AA) and flexion/extension (FE). Except for the thumb, there are 2 DOF for the metacarpophalangeal (MCP) joints (FE+AA), and 1 DOF each for the proximal interphalangeal (PIP) and distal interphalangeal (DIP) joints (FE). The thumb is modeled by a 5 DOF kinematic chain, with 2 DOF for the trapeziometacarpal (TM) and MCP joints (AA+FE) and 1 for the interphalangeal (IP) joint (FE).

Fortunately, hand motion is also highly constrained, so that hand gestures are not completely arbitrary [23]. Furthermore, we need to consider the kinematic structure of the DIST-Hand robot, which allows additional simplification of our model. In order to obtain meaningful configurations and to reduce the dimensionality of the joint space we include those constraints into the model. The constraints we have used are described in the following.

DIST-Hand robot constraints. Since the robotic hand has only four fingers, we are not interested in the posture of the pinky (4 DOF). Furthermore, it is not necessary to compute the orientation (3 DOF) of the human hand since the DIST-Hand has no wrist. Thus, the number of DOF can be reduced to 20.

Static constraints. They express the admissible range of joint motions resulting from the hand anatomy:

-15° ≤ q_MCP-AA ≤ 15° (except for the middle finger, where q_MCP-AA = 0°),
0° ≤ q_MCP-FE ≤ 90°,  0° ≤ q_PIP-FE ≤ 110°,  0° ≤ q_DIP-FE ≤ 90°,
q_TM-AA = 0°.    (6)

The first and the last equations reduce the number of DOF of the model to 18.

Fig. 5. (a) The human hand model adopted for calculating inverse kinematics. The whole hand is modeled by a 27 DOF skeleton whose location is given by the wrist's middle point and whose orientation is that of the palm. Each finger has 4 DOF, namely one in abduction/adduction and three in flexion/extension. Since the robotic hand has four fingers and is not placed on an arm, we are interested neither in the posture of the pinky nor in the position/orientation of the hand. Then, given the hand constraints, it is possible to reduce the number of DOF to 13. (b) The reference systems and transformation matrices used.

Dynamic constraints. They impose limits on the joints during finger motions. The most commonly used constraint is the one relating the DIP joints of fingers II-IV to the PIP joints of the same fingers. Since these two joints are driven by the same tendon, it can be stated that

q_DIP = (2/3) q_PIP.    (7)

This relationship reduces the number of DOF to 15.

Planarity constraint. The fingers II-IV can be modeled as planar manipulators once the AA joint angle has been set for these fingers.
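For illustration, constraints (6) and (7) for one of the fingers II-IV can be enforced as in the following sketch; the function and its argument layout are ours, with angles in degrees.

```python
import numpy as np

def apply_constraints(q_aa, q_mcp_fe, q_pip, middle=False):
    """Clamp one finger's joint angles to the static limits (6) and slave
    the DIP joint to the PIP joint through the dynamic constraint (7)."""
    q_aa = 0.0 if middle else float(np.clip(q_aa, -15.0, 15.0))
    q_mcp_fe = float(np.clip(q_mcp_fe, 0.0, 90.0))
    q_pip = float(np.clip(q_pip, 0.0, 110.0))
    q_dip = float(np.clip((2.0 / 3.0) * q_pip, 0.0, 90.0))  # Eq. (7)
    return q_aa, q_mcp_fe, q_pip, q_dip
```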

4.1 Inverse Kinematics

In order to solve the inverse kinematics problem, a calibration of the human hand has to be performed to adapt to different users. This process is done off-line using an interactive procedure. We use six frames to represent the finger kinematics (see figure 5.b): O_c (the camera frame), O_h (the hand frame, whose origin is the wrist's middle point and whose axes are those of the camera frame) and O_fr(I-IV) (the finger-root frames, whose origins are at the first flexion/extension joint of each finger and whose axes are defined by the palm orientation, with the z axis perpendicular to the palm plane and the x axis pointing towards the respective fingertip in the initial configuration). Under the assumption that the hand orientation does not change in time, we can compute the fixed transformation matrices T^h_fr(I-IV) during the initialization phase. In this way it is possible to express the coordinates of each fingertip in the finger-root frame and solve for the joint state vector:

ft_fr = T^h_fr T^c_h ft_c    (8)

For each finger except the thumb we compute the joint variable q_MCP-AA^(II-IV) as the angle between the projection of the fingertip on the palm plane and the x axis. By rotating the finger-root frame around the z axis by the angle q_MCP-AA^(II-IV), each finger II-IV can be represented as a 3-link planar manipulator. The joint angles q_MCP-FE, q_PIP and q_DIP = (2/3) q_PIP for the fingers II-IV are then found by solving the inverse kinematics problem for such a manipulator. The thumb joint angles are found directly by solving the inverse kinematics problem for its associated kinematic model. Given the fingertip coordinates x_i(k), the forward kinematics equation x_i(k) = k[q_i(k)], i = I...IV, and denoting by D(x, y) the distance between two points, each state vector q_i(k) may be computed by minimizing the following function:

F(q_i(k)) = D( x_i(k), k(q_i(k)) ),  i = I...IV    (9)

subject to the human hand constraints. The initial guess is set to the previously computed joint vector q_i(k-1), since under the slow-motion assumption we can state that the joint angles vary little from time k-1 to time k.

An iterative algorithm provides the correct solution in a few iterations (usually fewer than 10). The computed positions of the hand are sent to the human-to-robot mapping unit, which computes the corresponding robot joint angles. These angles are communicated over the local network to the real hand, or via the DDE mechanism to the simulator (see fig. 6.a).
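A minimal sketch of the per-finger step, assuming a numerical minimizer for Eq. (9) under constraints (6)-(7): the phalanx lengths would come from the off-line hand calibration, and the values and names below are placeholders, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

L1, L2, L3 = 45.0, 25.0, 20.0           # phalanx lengths in mm (assumed)

def forward(q):
    """Planar forward kinematics x = k[q] with q = [MCP-FE, PIP] in radians;
    the DIP joint is slaved to PIP through Eq. (7)."""
    q1, q2 = q
    q3 = (2.0 / 3.0) * q2               # dynamic constraint (7)
    a1, a2, a3 = q1, q1 + q2, q1 + q2 + q3
    return np.array([L1 * np.cos(a1) + L2 * np.cos(a2) + L3 * np.cos(a3),
                     L1 * np.sin(a1) + L2 * np.sin(a2) + L3 * np.sin(a3)])

def solve_ik(target_xy, q_prev):
    """Minimize F(q) = D(x, k(q)) of Eq. (9) within the static limits (6),
    starting from the previous joint vector q(k-1)."""
    bounds = [(0.0, np.radians(90)),    # MCP flexion/extension
              (0.0, np.radians(110))]   # PIP flexion/extension
    res = minimize(lambda q: np.linalg.norm(target_xy - forward(q)),
                   q_prev, bounds=bounds, method="L-BFGS-B")
    return res.x                        # converges in a few iterations
```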

Fig. 6. (a) A simulator is part of the system architecture, allowing the user to visualize the reconstructed movements on a virtual robotic hand. The 3D model used is very similar to the real robotic hand and receives the same commands. The simulator also includes the model of a PUMA arm that will complete our system architecture in the future. (b) The human hand calibration software. (c) An example of a configuration used in the experiments: one or two web cameras connected to a PC that performs the movement reconstruction, and the robotic hand with its controller accessible over the network.

Fig. 7. Some examples of postures imitated by the robotic hand. The reconstruction of the human hand posture allows the robotic hand to grasp the glass.

5 Experimental Results

We have tested the proposed method using two different stereo rigs: the first composed of two Sony cameras, the second of two USB web-cams. The software (the described system and the simulator) runs on a personal computer (Pentium III, 450 MHz) equipped with two video grabber cards and an Ethernet card. The movement commands are sent to the DIST-Hand robot over a TCP/IP link. The various procedures implemented to perform the visual control allow a frame rate of 10 images per second, permitting a qualitatively correct reproduction of the movements of the real hand presented. If the system finds inconsistencies in the fingertip positions or fails to follow them (mainly due to too rapid movements), the user can re-establish the visual control by showing the hand in the initialization position. A more reactive behavior of the system is obtained if only the images of a single camera are used: avoiding the 3D reconstruction of the fingertips, it is possible to control the robotic hand using the variations of the length ratios with respect to the initial position. Naturally, the precision is lower, but it is sufficient to execute manipulation or grasping tasks (see the example shown in figure 7).

Hand movement                              Success rate
Opened hand (initial position, OH)         100%
Thumb (T)                                   89%
Index (I)                                   92%
Middle (M)                                  91%
Ring (R)                                    90%
Two fingers (TI, TM, TR, IM, IR, MR)        85%
Three fingers (TIM, TIR, IMR, TMR)          80%
Four fingers (TIMR)                         74%
Closed hand (CH)                            73%

Fig. 8. We have tested the system performing 17 standard movements starting from the initial position. For each final posture, the percentage of success in reaching the correct final position was calculated by qualitative observation of the robotic hand. The results shown refer to 20 acquired sequences for each class of movements. Note: the letters T, I, M, R indicate the fingers (thumb, index, middle, ring).

6 Conclusions

This research demonstrates how to control a robotic hand in a “natural” way using only visual information. The proposed system is able to reproduce the movements of a human hand in order to perform telepresence tasks or to learn complex manipulation tasks. Moreover, a novel algorithm for robust and fast fingertip localization and tracking has been presented, and the results show the effectiveness of the proposed solution. Future work will extend the visual control to a complete robotic arm-hand system.

7 Acknowledgements

This research is partially supported by MIUR (Italian Ministry of Education, University and Research) under the project RoboCare (A Multi-Agent System with Intelligent Fixed and Mobile Robotic Components).

8 References

1. S. Bernieri, A. Caffaz, G. Casalino, “The DIST-Hand Robot”, IROS '97 Conference Video Proceedings, Grenoble, France, September 1997.

2. C. K. Chui, G. Chen, Kalman Filtering: With Real-Time Applications, Springer Series in Information Sciences, 1999.

3. O. Faugeras, Three-Dimensional Computer Vision, MIT Press, Cambridge, MA, 1993.

4. B.P.K. Horn, Robot Vision, MIT Press, Cambridge, MA, 1986.

5. F. Lathuilière, J.Y. Hervé, “Visual Hand Posture Tracking in a Gripper Guiding Application”, in Proc. of IEEE Int'l Conference on Robotics and Automation, San Francisco, April 2000.

6. A.J. Heap, D.C. Hogg, “Towards 3-D hand tracking using a deformable model”, in 2nd International Face and Gesture Recognition Conference, pp. 140-145, Killington, Vermont, USA, October 1996.

7. J. Lee, T.L. Kunii, “Model-based analysis of Hand Posture”, in IEEE Computer Graphics and Applications, pp. 77-86, 1995.

8. C. Nölker, H. Ritter, “Visual Recognition of Continuous Hand Postures”, IEEE Transactions on Neural Networks, Vol. 13, no. 4, July 2002.

9. K. Oka, Y. Sato, H. Koike, “Real-time tracking of multiple fingertips and gesture recognition for augmented desk interface systems”, in Proc. 2002 IEEE Int'l Conf. on Automatic Face and Gesture Recognition, pp. 429-434, May 2002.

10. V. Pavlovic, R. Sharma, T.S. Huang., “Visual interpretation of hand gestures for human-computer interaction: a review”, IEEE PAMI, 19(7):677–695, 1997.

11. J.M. Rehg, T. Kanade, “Digit-Eyes: Vision-Based Hand Tracking for Human-Computer Interaction”, in Proc. of the IEEE Workshop on Motion of Non-Rigid and Articulated Objects, Austin, Texas, Nov. 1994, pp. 16-22.

12. J.M. Rehg, T. Kanade, “Visual tracking of high DOF articulated structures: an application to human hand tracking”, in Proc. of Third European Conf. on Computer Vision, Stockholm, Sweden, 1994, pp. 35-46.

13. R. Rosales, V. Athitsos, L. Sigal, S. Sclaroff, “3D Hand Pose Reconstruction Using Specialized Mappings”, in Proc. IEEE Int'l Conf. on Computer Vision (ICCV'01), Canada, July 2001.

14. Y. Sato, M. Saito, H. Koike, “Real-Time Input of 3D Pose and Gestures of a User's Hand and Its Applications for HCI”, in Proc. of Virtual Reality 2001 Conference, pp. 13-17, March 2001, Yokohama, Japan.

15. N. Shimada, K. Kimura and Y. Shirai, “Real-time 3-D Hand Posture Estimation based on 2-D Appearance Retrieval Using Monocular Camera”, in Proc. Intl. Workshop on RATFG-RTS (satellite WS of ICCV2001), pp.23-30, 2001.

16. N. Shimada, Y. Shirai, Y. Kuno, J. Miura, “Hand Gesture Estimation and Model Refinement using Monocular Camera - Ambiguity Limitation by Inequality Constraints”, in Proc. of the 3rd Int'l Conference on Automatic Face and Gesture Recognition, pp. 268-273, 1998.

17. T. Starner, J. Weaver, A. Pentland, “Real-time American sign language recognition using desk- and wearable computer-based video”, IEEE PAMI, 20(12): 1371–1375, 1998.

18. B.D.R. Stenger, P.R.S. Mendonça, R. Cipolla, “Model-Based Hand Tracking Using an Unscented Kalman Filter”, in Proc. British Machine Vision Conference, Manchester, UK, September 2001.

19. B.D.R. Stenger, P.R.S. Mendonça, R. Cipolla, “Model based 3D tracking of an articulated hand”, in Proc. CVPR'01, pp. 310-315, 2001.

20. S. Waldherr, S. Thrun, R. Romero, “A gesture-based interface for human-robot interaction”, Autonomous Robots, 9(2), 2000, pp. 151-173.

21. Y. Wu, T.S. Huang, “View-independent Recognition of Hand Postures”, in Proc. of IEEE CVPR 2000, Vol. II, pp. 88-94, Hilton Head Island, SC, 2000.

22. Y. Wu, J.Y. Lin, T.S. Huang, “Capturing Natural Hand Articulation”, in Proc. of IEEE Int'l Conf. on Computer Vision, Vancouver, Canada, 2001.

23. Y. Wu, J.Y. Lin, T.S. Huang, “Modeling Human Hand Constraints”, in Proc. of Workshop on Human Motion (Humo2000), Austin, TX, Dec. 2000.