Hand Gesture Recognition of English Alphabets using Artificial Neural Network
Recent Trends in Information Systems (ReTIS-15), Kolkata
S. Bhowmick, S. Kumar and A. Kumar
Presented by: A. Kumar
Department of Electronics and Communication Engineering
Tezpur University, Assam
Anurag Kumar et al., ReTIS-15 (Paper Id. 246), July 10, 2015
Plan of Talk
• Introduction
• Motivation
• Theoretical Background
• Literature Survey
• Problems Identified
• Proposed Techniques and Approaches
• Experimental Results and Discussions
• Comparison with Previous Work
• Limitations
• Conclusion and Future Direction
• References
Introduction
• Gestures are all sorts of non-verbally communicated information [2].
• Gestures can be facial expressions, limb movements or any meaningful body state [1].
• Gestures are of two types: static and dynamic.
Figure: Gesture types
Motivation
• Free-hand based tracking is still a challenging issue.
• Recognising similar gestures and handling movement epenthesis in continuous sign-language conversation remain among the biggest challenges for researchers.
• This work is a step towards helping hearing- and speech-impaired people.
Figure: Similar gestures and movement epenthesis
Theoretical Background
Figure: Gesture recognition system block diagram
Theoretical Background (contd.)
• Colour image planes
I RGB, HSV, YCbCr etc.
• Segmentation
I Segmenting the region of interest (hand) from background and other body parts
Theoretical Background (contd.)
• Moment or centroid

I A moment is a gross characteristic of a contour computed by integrating or summing over all of the pixels of the contour [10]

    m_{p,q} = Σ_{x,y} I(x, y) x^p y^q    (1)

    (X, Y) = (m_{1,0}/m_{0,0}, m_{0,1}/m_{0,0})    (2)
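The centroid of Eqs. (1)-(2) can be computed directly on a binary mask. A minimal sketch, assuming NumPy (not the paper's actual implementation):

```python
import numpy as np

def centroid(mask):
    """Centroid (X, Y) of a binary mask from raw moments:
    X = m10/m00, Y = m01/m00 (Eqs. 1-2)."""
    ys, xs = np.nonzero(mask)      # coordinates of pixels with I(x, y) = 1
    m00 = xs.size                  # zeroth moment: number of set pixels
    if m00 == 0:
        return None                # empty mask: no centroid
    return xs.sum() / m00, ys.sum() / m00

# A 3x3 block of ones centred at (2, 2)
mask = np.zeros((5, 5), dtype=np.uint8)
mask[1:4, 1:4] = 1
print(centroid(mask))              # (2.0, 2.0)
```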
• Feature extraction
I Orientation, trajectory length and acceleration
• Classification
I Artificial neural network
Types of Gesture Recognition System
• Glove-based systems
I Extra sensors provide better accuracy.
I Tedious installation/set-up and cumbersome to use [2], [3], [5].
• Vision-based systems
I Easy capture of the gestures in front of a camera.
I No set-up required; natural and easy to use [2], [3], [5].
Figure: Glove-based and vision-based systems
Literature Survey
• “Dynamic Hand Gesture Recognition Using Hidden Markov Model," Z. Yang, Y. Li, W. Chen, and Y. Zheng [3].
• “Trajectory Guided Recognition of Hand Gestures having only Global Motions," M. K. Bhuyan, P. K. Bora, and D. Ghosh [5].
• “Continuous Gesture Trajectory Recognition System Based on Computer Vision," X. Wenkai and E. J. Lee [6].
• “Vision-Based Continuous Sign Language Recognition using Product HMM," S. H. Yu, C. L. Huang, S. C. Hsu, H. W. Lin, and H. W. Wang [7].
• “Gesture Recognition for Alphabets from Hand Motion Trajectory Using Hidden Markov Models," M. Elmezain, A. Hamadi, G. Krell, S. Etriby, and B. Michaelis [8].
• “Evaluation of HMM Training Algorithms for Letter Hand Gesture Recognition," N. Liu, B. C. Lovell, and P. J. Kootsookos [9].
Problems Identified
• Free hand-based tracking
• Recognition of similar gestures and movement epenthesis
Proposed Techniques and Approaches
• Select the best segmentation model
• Consider the best lighting condition
Proposed Techniques and Approaches (contd.)
• We have considered three hand-segmentation models
X Image intensity based model
X Background subtraction model
X HSV+YCbCr model based on skin-colour
• Three lighting conditions considered are
X Normal light
X Poor light
X Bright light
Results at Various Lighting Conditions
Figure: Results at various lighting conditions
Comparison of Results among the Models
Model                          Normal      Poor light   Bright light
Image-intensity                Working     Sensitive    Working
Background subtraction         Sensitive   Sensitive    Working
Skin-colour based HSV+YCbCr    Robust      Working      Robust

• The HSV+YCbCr model, having the best performance, has been adopted in this work.
The Steps/Algorithm of the Adopted System Model
• The input RGB frame is converted to the HSV and YCbCr planes,
• Skin pixels are extracted by thresholding both frames,
• Both frames are converted into binary frames,
• Morphological operations follow to remove unwanted noisy pixels,
• A logical AND operation is performed between the planes to remove background noise and to retain the most connected region,
• The centroid of the segmented palm region is calculated using moment calculation,
• Gesture trajectories are found by connecting the centroids of the segmented region across frames,
• Alphabets are drawn by hand gestures.
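The thresholding, AND, and trajectory steps above can be sketched as follows (assuming NumPy). The numeric threshold ranges are illustrative placeholders, since the paper does not list its actual HSV/YCbCr bounds; the morphological clean-up step is omitted for brevity:

```python
import numpy as np

def skin_mask(hsv, ycbcr):
    """Threshold the HSV and YCbCr planes into two binary frames and
    AND them to suppress background noise. The numeric ranges below
    are illustrative placeholders, not the paper's actual thresholds."""
    h, s = hsv[..., 0], hsv[..., 1]
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    mask_hsv = (h < 50) & (s > 58) & (s < 174)
    mask_ycbcr = (cb > 77) & (cb < 127) & (cr > 133) & (cr < 173)
    return mask_hsv & mask_ycbcr

def trajectory(masks):
    """Gesture trajectory: centroids of the segmented region, frame by frame."""
    points = []
    for m in masks:
        ys, xs = np.nonzero(m)
        if xs.size:                          # skip frames with no skin pixels
            points.append((xs.mean(), ys.mean()))
    return points
```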
Flowchart of the System Model
Figure: Flowchart of the system model
Feature Extraction Techniques
I The features considered in the work are
• Orientation feature
• Gesture trajectory length
• Velocity and acceleration
Feature Extraction Techniques (contd.)
• Orientation feature

    θ_i = arctan( (y_i − y_{i−1}) / (x_i − x_{i−1}) ),  i = 1, 2, …, N    (3)

• Gesture trajectory length [5]

    d_i = √( (x'_i − x'_{i−1})² + (y'_i − y'_{i−1})² )    (4)

    l_N = Σ_i √( (x'_i − x'_{i−1})² + (y'_i − y'_{i−1})² ) = Σ_i d_i    (5)
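A small sketch of the orientation and trajectory-length features of Eqs. (3)-(5), using atan2 in place of arctan so that vertical segments (x_i = x_{i−1}) are handled safely:

```python
import math

def orientation(points):
    """θ_i between successive centroids (Eq. 3); atan2 avoids division
    by zero when x_i = x_{i-1}."""
    return [math.atan2(y1 - y0, x1 - x0)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def trajectory_length(points):
    """Total trajectory length l_N as the sum of segment lengths d_i (Eqs. 4-5)."""
    return sum(math.hypot(x1 - x0, y1 - y0)
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

pts = [(0, 0), (3, 4), (6, 8)]       # two 3-4-5 segments
print(trajectory_length(pts))        # 10.0
```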
Feature Extraction Techniques (contd.)
• Velocity and acceleration

X Velocity (v) is defined as the rate of change of distance with time.

    v = ds/dt    (6)

X In the coordinate system, the velocity [5] is determined by

    v_i = (v_x, v_y) = ( (x_{i+1} − x_i)/(t_{i+1} − t_i), (y_{i+1} − y_i)/(t_{i+1} − t_i) )    (7)

X Acceleration (a) is defined as the rate of change of velocity.

    a = dv/dt    (8)

    a_i = (v_{i+1} − v_i)/(t_{i+1} − t_i)    (9)
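Eqs. (7) and (9) as finite differences over the centroid trajectory (an illustrative sketch, not the paper's code):

```python
def velocities(points, times):
    """v_i = ((x_{i+1}-x_i)/(t_{i+1}-t_i), (y_{i+1}-y_i)/(t_{i+1}-t_i)) — Eq. (7)."""
    return [((x1 - x0) / (t1 - t0), (y1 - y0) / (t1 - t0))
            for (x0, y0), (x1, y1), t0, t1
            in zip(points, points[1:], times, times[1:])]

def accelerations(vels, times):
    """a_i = (v_{i+1} - v_i)/(t_{i+1} - t_i), componentwise — Eq. (9)."""
    return [((u1 - u0) / (t1 - t0), (w1 - w0) / (t1 - t0))
            for (u0, w0), (u1, w1), t0, t1
            in zip(vels, vels[1:], times, times[1:])]

# Constant velocity along the trajectory gives zero acceleration —
# the property used later to spot symbol boundaries.
pts = [(0, 0), (1, 1), (2, 2), (3, 3)]
ts = [0, 1, 2, 3]
v = velocities(pts, ts)
a = accelerations(v, ts)
```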
Feature Extraction System Model
Figure: Feature extraction block diagram
Feature Extraction System Model (contd.)
Figure: Feature extraction techniques
Network Specifications of MLP for Isolated Gesture Alphabets
ANN type                   MLP
Training algorithm         traingda
Maximum number of epochs   200
Input layer size           128
MSE goal                   0.0001
Training time              18.05 seconds
Testing time               15.86 seconds
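traingda is MATLAB's gradient descent with an adaptive learning rate; the sketch below shows the analogous plain gradient-descent update for a small MLP in NumPy. The hidden-layer size and class count are illustrative assumptions (the paper does not state them), and the adaptive learning-rate schedule is only noted in a comment:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 128 inputs (matching the table), 20 hidden units,
# 8 output classes (one per isolated gesture alphabet).
W1 = rng.normal(0, 0.1, (128, 20)); b1 = np.zeros(20)
W2 = rng.normal(0, 0.1, (20, 8));   b2 = np.zeros(8)
lr = 0.01

def forward(x):
    """One hidden tanh layer followed by a softmax output."""
    h = np.tanh(x @ W1 + b1)
    z = h @ W2 + b2
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return h, e / e.sum(axis=-1, keepdims=True)

def train_step(x, y_onehot):
    """One plain gradient-descent step. traingda would additionally grow
    or shrink lr depending on whether the error decreased (omitted here)."""
    global W1, b1, W2, b2
    h, p = forward(x)
    dz = (p - y_onehot) / len(x)         # softmax cross-entropy gradient
    dW2, db2 = h.T @ dz, dz.sum(0)
    dh = (dz @ W2.T) * (1 - h ** 2)      # backprop through tanh
    dW1, db1 = x.T @ dh, dh.sum(0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
```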
Isolated Gesture Alphabets
Figure: A few isolated gestures considered in the work
Confusion Matrix for Isolated Gesture Alphabets
Gesture   S    T    CS   RS   WS   RR (%) = CS/T × 100
A         30   30   27   1    2    90.0%
B         30   30   26   2    2    86.7%
C         30   30   29   1    0    96.7%
D         30   30   25   2    3    83.3%
E         30   30   29   1    0    96.7%
F         30   30   29   0    1    96.7%
O         30   30   28   1    1    93.3%
Q         30   30   29   1    0    96.7%

• S is the total number of samples trained, T the number of samples tested, CS the number of correctly spotted gestures, RS the number of rejected gesture samples, WS the number of wrongly spotted gestures, and RR the recognition rate in %.

• Overall recognition rate = (240 − 18)/240 = 92.5%
Recognition Rate Results for Isolated Gesture Alphabets
No. of trained samples per gesture class      30
No. of tested samples per gesture class       30
No. of validation samples per gesture class   30
No. of gestures taken                         8
Total no. of gestures tested                  30 × 8 = 240
Total misclassified gestures                  18
Recognition rate (percent)                    222/240 = 92.5%
Best result                                   C, E, F, Q
Poorest result                                D
Hand Gesture Recognition for Continuous Gesture Alphabets
• Regular approaches for detecting isolated gesture symbols do not hold good for continuous gestures, as the problem of separating two gestures arises.
• The separation requires another powerful feature, such as acceleration, to detect the individual symbol boundary.
• It has been observed that the velocity (v) remains almost constant within a particular gesture symbol, so the acceleration (a) becomes zero: if v = constant, then a = dv/dt = 0.
• By selecting a suitable threshold for the acceleration, we can detect the symbol boundary.
• In this work, the acceleration and velocity features have been used to spot the start and end points of each individual gesture in the continuous gestures.
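The boundary-spotting idea can be sketched as a simple threshold test on the acceleration magnitude (the threshold value here is an illustrative placeholder; the paper selects a suitable one empirically):

```python
def symbol_boundaries(accel_mag, threshold=0.5):
    """Frame indices where the acceleration magnitude exceeds a threshold,
    taken as candidate start/end points between gesture symbols.
    The threshold value is an illustrative placeholder."""
    return [i for i, a in enumerate(accel_mag) if a > threshold]

# |a| stays near zero inside a symbol and spikes at the transition
accel = [0.0, 0.1, 0.05, 2.0, 0.1, 0.0]
print(symbol_boundaries(accel))      # [3]
```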
Hand Gesture Recognition for Continuous Gesture Alphabets (contd.)
• A Multi-Layer Perceptron (MLP) and a Focused Time-Delay Neural Network (FTDNN) are trained with this feature information to recognize the individual gestures within the continuous gestures.
Figure: Continuous gesture recognition techniques
Continuous Gesture Alphabet
Figure: Continuous gesture alphabet BA
Network Specifications of MLP for Continuous Gesture Alphabets
ANN type                   MLP
Training algorithm         traingda
Maximum number of epochs   200
Input layer size           256
MSE goal                   0.0001
Training time              25.45 seconds
Testing time               21.56 seconds
Confusion Matrix for Continuous Gesture Alphabets with MLP
Gesture   S    T    CS   RS   WS   RR (%) = CS/T × 100
BA        30   30   26   2    2    86.7%
AD        30   30   25   2    3    83.3%
CA        30   30   28   2    0    93.3%
CB        30   30   27   1    2    90.0%
CF        30   30   29   1    0    96.7%
DB        30   30   26   3    1    86.7%
AA        30   30   26   2    2    86.7%

• Overall recognition rate = (210 − 23)/210 = 89.05%
Recognition Rate Results for Continuous Gesture Alphabets with MLP
No. of trained samples per gesture class 30
No. of tested samples per gesture class 30
No. of validation samples per gesture class 30
No. of gestures taken 7
Total no. of gestures tested                  30 × 7 = 210
Total misclassified gestures                  23
Recognition rate (percent)                    187/210 = 89.05%
Best result                                   CF
Poorest result                                AD
Network Specifications of FTDNN for Continuous Gesture Alphabets
ANN type                   FTDNN
Training algorithm         traingda
Maximum number of epochs   200
Input layer size           256
MSE goal                   0.0001
Training time              26.07 seconds
Testing time               21.34 seconds
Confusion Matrix for Continuous Alphabets with FTDNN
Gesture   S    T    CS   RS   WS   RR (%) = CS/T × 100
BA        30   30   25   1    4    83.3%
AD        30   30   25   3    2    83.3%
CA        30   30   27   1    2    90.0%
CB        30   30   28   2    0    93.3%
CF        30   30   29   0    1    96.7%
DB        30   30   24   4    2    80.0%
AA        30   30   25   1    4    83.3%

• Overall recognition rate = (210 − 27)/210 = 87.14%
Recognition Rate Results for Continuous Gesture Alphabets with FTDNN
No. of trained samples per gesture class 30
No. of tested samples per gesture class 30
No. of validation samples per gesture class 30
No. of gestures taken 7
Total no. of gestures tested                  30 × 7 = 210
Total misclassified gestures                  27
Recognition rate (percent)                    183/210 = 87.14%
Best result                                   CF
Poorest result                                DB
Discussions
• The combined orientation and gesture-trajectory-length features for isolated gestures efficiently deal with the problem of recognizing similar gestures.
• The velocity and acceleration features enable us to recognize the continuous gestures, and in the process the problem of movement epenthesis is removed.
Comparison with Previous Work
• In [6], the recognition rate obtained for isolated Arabic numerals (0-9) is 93.72% using a Hidden Markov Model (HMM), increased to 95.76% with an HMM-FNN (Fuzzy Neural Network) hybrid framework.
• HMM and Conditional Random Field (CRF) models are more frequently used for pattern recognition than ANN, and [6] considers only isolated gestures; the recognition rate obtained in this work can therefore be said to be satisfactory.
Limitations
• The method is illumination sensitive.
• A skin-coloured background deteriorates the result.
• The gesturer must always wear full-sleeve clothing.
• The presence of multiple gesturers degrades the result.
• The method is not a real-time gesture recognition system.
Conclusion and Future Direction
• A free-hand-based gesture recognition system for both isolated and continuous gestures has been developed.
• The problems of recognizing similar gestures and of movement epenthesis have been eradicated.
• Future work will extend the system to bare hands, without the need for full-sleeve clothing.
References
[1] S. Mitra and T. Acharya, “Gesture Recognition: A Survey," in IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 37, no. 3, pp. 311-324, 2007.
[2] M. E. Al-Ahdal and N. M. Tahir, “Review in Sign Language Recognition Systems," in IEEE Symposium on Computers and Informatics, pp. 52-57, 2002.
[3] Z. Yang, Y. Li, W. Chen, and Y. Zheng, “Dynamic Hand Gesture Recognition Using Hidden Markov Model," in 7th International Conference on Computer Science Education, pp. 360-365, 2012.
[4] J. Appenrodt, A. Hamadi, and B. Michaelis, “Data Gathering for Gesture Recognition Systems Based on Single Color, Stereo Color and Thermal Cameras," in International Journal of Signal Processing, Image Processing and Pattern Recognition, vol. 3, pp. 37-50, 2010.
[5] M. K. Bhuyan, P. K. Bora, and D. Ghosh, “Trajectory Guided Recognition of Hand Gestures having only Global Motions," in World Academy of Science, Engineering and Technology, vol. 21, pp. 753-764, 2008.
[6] X. Wenkai and E. J. Lee, “Continuous Gesture Trajectory Recognition System Based on Computer Vision," in International Journal of Applied Mathematics and Information Science, pp. 339-346, 2012.
[7] S. H. Yu, C. L. Huang, S. C. Hsu, H. W. Lin, and H. W. Wang, “Vision-Based Continuous Sign Language Recognition using Product HMM," in First Asian Conference on Pattern Recognition (ACPR), pp. 510-514, 2011.
[8] M. Elmezain, A. Hamadi, G. Krell, S. Etriby, and B. Michaelis, “Gesture Recognition for Alphabets from Hand Motion Trajectory Using Hidden Markov Models," in The 7th IEEE International Symposium on Signal Processing and Information Technology, pp. 1209-1214, 2007.
[9] N. Liu, B. C. Lovell, and P. J. Kootsookos, “Evaluation of HMM Training Algorithms for Letter Hand Gesture Recognition," in Proceedings of the 3rd IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), pp. 648-651, 2003.
[10] G. Bradski and A. Kaehler, Learning OpenCV, 1st Ed., O'Reilly Media Inc., 2008.