
IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 56, NO. 12, DECEMBER 2009 2879

Sign Language Recognition Using Intrinsic-Mode Sample Entropy on sEMG and Accelerometer Data

Vasiliki E. Kosmidou, Student Member, IEEE, and Leontios J. Hadjileontiadis∗, Member, IEEE

Abstract—Sign language forms a communication channel among the deaf; however, automated gesture recognition could further expand their communication with hearers. In this work, data from a five-channel surface electromyogram and a 3-D accelerometer on the signer’s dominant hand were analyzed using intrinsic-mode entropy (IMEn) for the automated recognition of Greek sign language (GSL) isolated signs. Discriminant analysis was used to identify the effective scales of the intrinsic-mode functions and the window length for the calculation of the IMEn that contribute to the efficient classification of the GSL signs. Experimental results from the IMEn analysis applied to GSL signs corresponding to a 60-word lexicon repeated ten times by three native signers have shown more than 93% mean classification accuracy using IMEn as the only source of the classification feature set. This provides a promising test bed toward automated GSL gesture recognition.

Index Terms—Accelerometer, empirical mode decomposition (EMD), intrinsic-mode entropy (IMEn), sign recognition, surface electromyogram (sEMG).

I. INTRODUCTION

SIGN language (SL) is the native language of the deaf. Although they successfully communicate with each other when using SL, they face many difficulties when they try to communicate with hearers, especially those who are incompetent in SL; it is like sensing the existence of an invisible communication wall. Solutions such as pointing, writing, or lip-reading, combined with some new technological means, i.e., faxes, computers (e.g., e-mails), and mobile phones (e.g., SMSs), facilitate such communication. Nevertheless, SL interpreters are mainly those who could smooth out this communication barrier; yet, they are very few compared to the 700,000,000 deaf or hard-of-hearing people worldwide (World Health Organization) and the 143 existing different SLs (types with dialects). Being as complex as any spoken language, SL has many thousands of signs formed by specific gestures (related to the hands) and facial expressions, each differing from another by minor changes in hand motion, shape, position, and facial expression. This, along with

Manuscript received July 25, 2008; revised October 23, 2008. First published January 23, 2009; current version published November 20, 2009. This work was supported by the General Secretariat for Research and Technology (GSRT), Greek Ministry of Development, under Grant 44 (22195/December 16, 2005) within the 3rd Community Support Programme—Operational Programme “Information Society 2000-2008 A3-M3.3” (Funding: 75% European Regional Development Fund, 25% National Fund). Asterisk indicates corresponding author.

V. E. Kosmidou is with the Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki GR 541 24, Greece (e-mail: [email protected]).

∗L. J. Hadjileontiadis is with the Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki GR 541 24, Greece (e-mail: [email protected]).

Digital Object Identifier 10.1109/TBME.2009.2013200

the fact that there are many intra- and intersigner differences, makes the automated SL-recognition problem a challenging one.

The main issues in the automated SL-recognition problem refer to the data-acquisition system used to acquire the SL information, the methodology for achieving the highest classification rate, and the SL vocabulary size affordable by the system. For the latter issue, a middle-sized vocabulary includes around 40 signs, whereas an extended one includes over 100 signs.

The first reported attempts at machine SL recognition originate in the 1990s. In particular, Charayaphan and Marble [1] investigated the interpretation of hand motion in American SL (ASL) based on image processing. Starner [2], and Starner and Pentland [3], using colored gloves on both hands, applied hidden Markov models (HMMs) on data from a color camera for 40 ASL signs, achieving a hit ratio of over 92%. In [4] and [5], a system, namely GloveTalk, was developed using a VPL DataGlove with a Polhemus tracker attached for position and orientation tracking as input devices and employing neural networks for hand-gesture classification. Wexelblat [6] developed a gesture-recognition system using three Ascension Flock-of-Birds position trackers together with a CyberGlove on each hand. Takahashi and Kishino [7], by using a VPL DataGlove, managed to correctly recognize 30 of the 46 signs of the Japanese Kana alphabet. Murakami and Taguchi [8] used recurrent neural networks (RNNs) for SL recognition. They trained their system on 42 hand shapes in the Japanese finger alphabet using a VPL DataGlove, achieving a recognition rate of 98%. Kadous [9] introduced a system based on PowerGloves to recognize a set of 95 isolated Australian SL (Auslan) signs with 80% accuracy. Liang and Ouhyoung [10] used the DataGlove and HMMs for continuous recognition of Taiwan SL with a vocabulary between 71 and 250 signs. Vogler and Metaxas [11] used HMMs for continuous ASL recognition with a vocabulary of 53 signs and a completely unconstrained sentence structure, achieving 95%–98% recognition accuracy for different sets of features. They also described an approach to continuous, whole-sentence ASL recognition that used phonemes instead of whole signs as the basic units; they achieved a mean classification rate of around 91% for 22 signs [12], [13]. A Chinese SL (CSL) translating system was developed by Ma et al. [14] for the recognition of an extended vocabulary with the use of two CyberGloves.

More recently, a video-based approach for the classification of Greek SL (GSL) was proposed, using HMMs for both isolated and continuous signs [15]. In [16], camera vision systems along with wireless accelerometers mounted on a bracelet or watch to measure hand rotation and movement were used. Such a system was also employed in an educational software application, namely CopyCat, which is a gesture-based computer game for

0018-9294/$26.00 © 2009 IEEE

Authorized licensed use limited to: NATIONAL TAIWAN NORMAL UNIVERSITY. Downloaded on March 10,2010 at 22:03:41 EST from IEEE Xplore. Restrictions apply.


deaf children [17]. More complex capturing systems involved a combination of headgear, gloves, and a body vest to associate detailed data on the body motions and facial expressions with signs [18]. Although this data-acquisition system encumbers the user, it seemed more reliable and efficient than using cameras [18]. In [19], an attempt to break 164 signs down into visemes (in the same way that speech is broken into phonemes) was followed, applying HMMs on video-based data and resulting in a 74.3% classification rate. In the same vein, the work in [20] combined statistical dynamic time warping with discriminant classifiers for the recognition of 120 signs of the Dutch SL, based on 3-D hand-motion features derived from a video input, achieving a classification rate of 92%. Toward real-time hand-gesture recognition, a two-level approach was introduced [21], combining a posture-detection implementation using Haar-like features and the AdaBoost learning algorithm with syntactic analysis based on a stochastic context-free grammar. To handle the transition parts between two adjacent signs, which direct HMMs and context-dependent models cannot deal with, transition-movement models were proposed in [22], achieving an average accuracy of 91.9% over a large vocabulary of 5113 CSL signs using two CyberGloves and three Polhemus trackers.

A comparative study evaluating the classification power of HMMs and RNNs for gesture-recognition systems has shown an improvement of the discrimination results to 91.9% for 14 gestures when combining the HMMs with an RNN [23]. In [24], a sparse Bayesian classifier was employed, achieving a 90% recognition rate for ten gestures, whereas another approach [25], based on fuzzy maximum–minimum composition and a relational database management system, reports 98.2% classification for 20 gestures. Recently, hidden conditional random fields were introduced for gesture recognition, showing a great improvement compared with the commonly used HMMs [26].

An alternative SL information-acquisition approach was recently introduced, referring to the use of the surface electromyogram (sEMG) (acquired from the user’s forearm), either solely [27] or combined with data from a triaxial accelerometer (3-D-Acc) placed on the user’s wrist [28], [29]. sEMG signals were acquired with noninvasive surface electrodes on the skin above the muscles involved during gesturing. This concept was also adopted in cases of gesture recognition for prosthetic-control problems [30], [31]. In [27] and [28], the feature space consisted of integral absolute value, zero-crossing, and autoregressive coefficients, along with frequency-domain features, such as the first moment of the wavelet spectrum, achieving a hit ratio of over 90% for ten GSL signs.

This paper stems from the work reported in [29] and explores the classification power of one feature, i.e., the sample entropy [32] combined with the empirical mode decomposition (EMD) [33], namely the intrinsic-mode entropy (IMEn) [34], for the classification of 60 GSL signs, when applied to sEMG and 3-D-Acc data. This approach reduces the feature space for the classifier, as it tries to maximize the classification ability through the efficiency of IMEn in capturing the differences between the 60 GSL signs. IMEn is robust to changes in the low-frequency components of a signal when used to discriminate between different nonlinear signals. As the EMD is a greedy and complete transformation, the relevant information contained in different-frequency components is preserved, allowing an efficient search for discriminating features [34]. Experimental results have shown that the IMEn, when applied to sEMG and 3-D-Acc data for 60 GSL signs from three signers, provides efficient automated GSL gesture recognition, presenting low intra- and intersigner performance variability.

The rest of the paper is structured as follows. Section II provides background information on the IMEn and describes the proposed analysis. The experimental dataset used, alongside some implementation issues, is described in Section III. Section IV presents the performance of the proposed approach through experimental results and discusses its efficiency in addressing the GSL classification problem. Finally, Section V concludes the paper.

II. METHOD

A. Background

1) Sample Entropy: Entropy describes the tendency of systems to go from a state of high potential energy to a state of lower energy. So far, many methods have been introduced for the estimation of entropy in physiological systems. In 1981, Shaw [35] suggested that a measure of the rate of information generation of a chaotic system would be a useful parameter for characterizing such a system. Using a modification of the Kolmogorov–Sinai entropy of a time series by Eckmann and Ruelle [36], Pincus [37] introduced a statistical entropy, namely the approximate entropy (ApEn), as an efficient measure of the regularity of a time series, especially for short and noisy ones. However, Richman and Moorman [32] found that the method used to calculate ApEn introduced a bias, as the ApEn algorithm counts each sequence as matching itself, making it highly dependent on the signal length and relatively inconsistent for short records. In order to reduce this bias, they proposed a modified version of the ApEn algorithm, known as the sample entropy (SampEn). The SampEn of a time series measures the negative logarithm of the conditional probability that two sequences that are similar for m points remain similar at the next point, within a tolerance r, where self-matches are not included in calculating the probability.

In order to compute SampEn, the time series y(j), 1 ≤ j ≤ N, is first embedded in a delayed m-dimensional space, where the vectors are constructed as x(i) = [y(i + k)], k = 0, . . . , m − 1, i = 1, . . . , N − m + 1. The probability B^m(r) that two sequences match for m points is computed by counting the average number of vector pairs (in the embedded space) for which the distance is lower than the tolerance r. Similarly, A^m(r) is defined for an embedding dimension of m + 1 [32]. The sample entropy is then calculated as

SampEn(y(j), m, r) = −ln(A^m(r)/B^m(r)).   (1)
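As an illustration of (1) (not the authors' MATLAB implementation; the function name is ours), SampEn can be sketched in Python, assuming the Chebyshev distance and the exclusion of self-matches as in [32]:

```python
import numpy as np

def sampen(y, m, r):
    """Sample entropy per (1): -ln(A^m(r) / B^m(r)).
    Matches use the Chebyshev distance and exclude self-matches [32]."""
    y = np.asarray(y, dtype=float)
    n = len(y)

    def matches(dim):
        # Embed y in a delayed dim-dimensional space; the same number of
        # template vectors (n - m) is used for dim = m and dim = m + 1.
        x = np.array([y[i:i + dim] for i in range(n - m)])
        total = 0
        for i in range(len(x) - 1):
            d = np.max(np.abs(x[i + 1:] - x[i]), axis=1)  # later pairs only
            total += int(np.sum(d < r))
        return total

    B, A = matches(m), matches(m + 1)
    return -np.log(A / B) if A > 0 and B > 0 else float("inf")
```

With signals standardized to unit standard deviation (as in Section III-C), r = 0.2 corresponds to 20% of the standard deviation of the data.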

It can be easily shown [34] that SampEn is severely affected by the low-frequency components of the considered time series. Given that entropy is a nonlinear measure, the presence of local trends in the whole signal decreases the overall SampEn, even for those components of a signal that are highly random. This method is therefore ill-adapted for studying physiological signals, which are the outputs of several interacting control mechanisms, whereby applying SampEn as a discriminating tool may lead to erroneous findings. A consistent filtering operation is therefore needed in order to highlight discriminant nonlinear dynamics before calculating entropy.

2) Empirical Mode Decomposition (EMD): In this paper, the robustness of the estimation of the sample entropy is enhanced by the combination of SampEn with the EMD, as initially proposed in [34]. EMD is an intuitive, signal-dependent decomposition of a time series into waveforms modulated in amplitude and frequency [33]. The iterative extraction of these components is based on the local representation of the signal as the sum of a local oscillating component and a local trend. The first iteration of the algorithm consists of extracting a component, referred to as the intrinsic mode function (IMF), which represents the oscillations of the entire signal. The difference between the original signal and the IMF time series is the residual. The IMF component is obtained by a sifting process such that it satisfies the requirement that it is zero-mean and that the number of extrema and the number of zero-crossings are identical or differ by one. The same procedure is then applied to the residual to extract the second IMF; in this way, all IMFs are iteratively extracted. The nonstationary signal x(t) is then represented as a sum of IMFs and the residual component, x(t) = ∑_{i=1}^{K} c_i(t) + r_K(t), where c_i(t) denotes the ith extracted empirical mode and r_K(t) denotes the residual, which is a monotonic function without extrema and can either be the mean trend or a constant.

The EMD process can be summarized as follows [33]:
1) Identify all the extrema (minima and maxima) of x(t).
2) Interpolate between the minima and the maxima using cubic splines to produce the lower envelope e_min(t) and the upper envelope e_max(t), respectively.
3) Compute the average m(t) = (e_max(t) + e_min(t))/2.
4) Extract the IMF component c_1(t) = x(t) − m(t).
5) Iterate on the residual m(t).
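The five steps above can be sketched as follows; this is a minimal illustrative Python sifting loop (the names and stopping rules are ours — practical EMD implementations use more careful sifting criteria [33]):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def sift(x, max_iter=10):
    """Extract one IMF candidate from x by repeated envelope-mean removal."""
    t = np.arange(len(x))
    c = np.asarray(x, dtype=float).copy()
    for _ in range(max_iter):
        d = np.diff(c)
        # Step 1: interior maxima and minima.
        maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
        minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
        if len(maxima) < 3 or len(minima) < 3:
            break  # too few extrema left: c belongs to the residual trend
        # Step 2: cubic-spline upper and lower envelopes.
        upper = CubicSpline(maxima, c[maxima])(t)
        lower = CubicSpline(minima, c[minima])(t)
        # Steps 3 and 4: subtract the envelope mean.
        c = c - (upper + lower) / 2.0
    return c

def emd(x, K=12):
    """Step 5: iterate on the residual, collecting up to K IMFs."""
    residual = np.asarray(x, dtype=float)
    imfs = []
    for _ in range(K):
        imf = sift(residual)
        if np.allclose(imf, residual):
            break  # sifting changed nothing: only the trend remains
        imfs.append(imf)
        residual = residual - imf
    return imfs, residual
```

By construction, the extracted IMFs plus the final residual always sum back to the original signal, which is the completeness property the analysis relies on.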

B. Proposed Classification Analysis

1) Intrinsic-Mode Entropy (IMEn): The IMEn corresponds to the SampEn computed on the cumulative sums of the IMFs obtained by the EMD. In fact, the cumulative sums of the IMFs, beginning with the first IMF, yield a multilevel filtering of the original signal, starting from the finest scales and ending with the whole signal. The cumulative IMF sum up to order k, C^k_IMF, is defined by C^k_IMF(t) = ∑_{i=1}^{k} c_i(t).

If K is the total number of modes yielded by the EMD and the residual r_K(t) is considered as the (K + 1)th mode, then C^{K+1}_IMF corresponds to the original signal x(t). The IMEn can then be represented by

IMEn(k, m, r) = SampEn(C^k_IMF(t), m, r),  k = 1, . . . , K + 1   (2)

where m is the window length and r is the tolerance.

The use of the IMEn for discriminant analysis is the same as for SampEn, but over different oscillation levels. Computing the SampEn over the different levels enables the analysis of the evolution of the signal regularity, as lower frequency components are gradually added until the SampEn of the whole signal is reached. This procedure is robust to the variability of local trends that are frequent in biomedical data [34].

2) Classification Process and Setup: Discriminant analysis is used to determine which variables discriminate between two or more naturally occurring groups and, consequently, to reduce the number of variables considered adequate to fully discriminate the gestures [38]. The analytical procedure of discriminant analysis generates a discriminant function (or a set of discriminant functions). These functions are linear combinations of the predictor variables that provide the best discrimination between the groups (signs). Each discriminant function is orthogonal to the others. Obviously, the first function represents the most powerful differentiating dimension. The functions are determined by observing a sample of cases for which the group membership is known. After the functions are generated, they can be applied to new cases that have measurements for the predictor variables but unknown group membership.

The discriminant analysis used in this approach is based on the Mahalanobis distance criterion. The Mahalanobis distance is the distance between a case and the centroid of each group in attribute space (the n-dimensional space defined by n variables) [39]. A Mahalanobis distance is computed for each case, and each case is classified as belonging to the group for which the Mahalanobis distance is minimum. The statistical or Mahalanobis distance between two points x = (x_1, x_2, . . . , x_n)^t and y = (y_1, y_2, . . . , y_n)^t in the n-dimensional space R^n, drawn from the same distribution with covariance matrix C, is defined as

d_S(x, y) = sqrt((x − y)^t C^{−1} (x − y)).   (3)

Obviously, the Mahalanobis distance is the same as the Euclidean distance if the covariance matrix is the identity matrix. The introduction of the covariance matrix has two effects. First, the matrix normalizes the effects of the coordinates, i.e., when the coordinates are independent, the distance reduces to the “normalized” Euclidean distance. Second, the Mahalanobis distance down-weights coordinates that are highly correlated. Consider the artificial example where we replicate one coordinate l times, add some small random numbers (to avoid singularity), and append them to the inputs x and y. Whereas the Euclidean distance will inflate the influence of the coordinate (l + 1)-fold, in the Mahalanobis distance the weights of the replicated coordinates are adjusted by the covariance, so the total weight of the (l + 1) coordinates is (about) the same as that of a single coordinate. This feature is very useful when some coordinates are redundant due to the data-collection procedures [40], [41].
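Equation (3), together with the minimum-distance assignment rule described above, can be sketched as follows (illustrative Python; the names are ours):

```python
import numpy as np

def mahalanobis(x, y, C):
    """Statistical distance of (3): sqrt((x - y)^t C^{-1} (x - y))."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ np.linalg.solve(C, diff)))

def classify(case, centroids, C):
    """Assign a case to the group whose centroid has the minimum
    Mahalanobis distance from the case."""
    return min(centroids, key=lambda g: mahalanobis(case, centroids[g], C))
```

With C = I this reduces to the Euclidean distance, and strongly correlated coordinates (off-diagonal entries of C near 1) contribute roughly as much as a single coordinate, illustrating the down-weighting effect described above.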

The proposed classification analysis consists of the following configurations:

Setup 1: Estimation of the best (k, m) pairs per single signer. All acquired data (sEMG and Acc signals corresponding to 60 signs repeated ten times) from each signer were subjected to IMEn analysis, and IMEn(k, m, r)_i, i = 1, 2, . . . , S, where S denotes the number of signers, was calculated using (2). Then, a discriminant analysis was carried out on the resulting feature set to find those (k, m) pairs (r was kept constant at the value of 0.2 [31]), i.e., (k, m)_i^opt, that provide the maximum classification accuracy among the employed signs of the used vocabulary (see subsequent section) per signer, denoted as CAcc_i^max, i = 1, 2, . . . , S.

Setup 2: Classification accuracy based on a single signer. The resulting (k, m)_i^opt pairs from Setup 1 were used to evaluate the mean classification accuracy of the IMEn analysis, i.e., CAcc_ij, i, j = 1, 2, . . . , S, based on a single signer. Therefore, 70% and 30% of the feature set consisting of IMEn(k_i^opt, m_i^opt, r)_j, i, j = 1, 2, . . . , S, were used for training and testing, respectively. This sequence was repeated 100 times, and the classification accuracy was averaged out, resulting in CAcc_ij, i, j = 1, 2, . . . , S. By repeating the classification experiment many times with different (random) training and test sets, a robust estimate of the classifier performance was obtained.

Setup 3: Estimation of the best (k, m) pairs per signer pair. In this step, the estimated IMEn(k, m, r)_i, i = 1, 2, . . . , S, from Setup 1 were grouped into pairs of signers, i.e., IMEn(k, m, r)_(i,l), i, l = 1, 2, . . . , S, i ≠ l. A discriminant analysis was carried out on the resulting feature set to reveal the (k, m)_(i,l)^opt pairs that provide the maximum classification accuracy among the employed signs of the used vocabulary per signer pair, denoted as CAcc_(i,l)^max, i, l = 1, 2, . . . , S, i ≠ l.

Setup 4: Classification accuracy based on a signer pair. The resulting (k, m)_(i,l)^opt pairs from Setup 3 were used to evaluate the classification accuracy of the IMEn analysis, i.e., CAcc_(i,l)j, i, j, l = 1, 2, . . . , S, i ≠ l, based on a signer pair. Similar to Setup 2, 70% and 30% of the feature set consisting of IMEn(k_(i,l)^opt, m_(i,l)^opt, r)_j, i, j, l = 1, 2, . . . , S, i ≠ l, were used for training and testing, respectively. This sequence was repeated 100 times, and the classification accuracy was averaged out, resulting in CAcc_(i,l)j, i, j, l = 1, 2, . . . , S, i ≠ l.
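The repeated random-split evaluation of Setups 2 and 4 can be sketched as follows. This illustrative Python fragment uses synthetic data and a plain nearest-centroid rule as a stand-in for the Mahalanobis-based discriminant functions; all names are ours.

```python
import numpy as np

def fit_centroids(X, y):
    """Per-class mean feature vectors (stand-in for the discriminant functions)."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(model, X):
    """Assign each row of X to the class with the nearest centroid."""
    classes = list(model)
    D = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes])
    return np.array(classes)[np.argmin(D, axis=0)]

def repeated_split_accuracy(X, y, n_runs=100, train_frac=0.7, seed=0):
    """Setup 2 evaluation loop: random 70%/30% train/test splits,
    repeated n_runs times, with the accuracies averaged out."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_runs):
        idx = rng.permutation(len(y))
        cut = int(train_frac * len(y))
        train, test = idx[:cut], idx[cut:]
        model = fit_centroids(X[train], y[train])
        accs.append(float(np.mean(predict(model, X[test]) == y[test])))
    return float(np.mean(accs))
```

Averaging over many random splits, rather than reporting a single split, is what makes the reported accuracy a stable estimate of the classifier's performance.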

III. IMPLEMENTATION ISSUES

A. Dataset Construction

SL comprises facial expressions and hand gestures. Although facial expressions contribute toward the recognition of SL, information based on hand gestures can uniquely distinguish between most of the signs in the SL [42]. In this work, only the dominant hand was used for the identification of the GSL gestures, since it sufficiently accounts for more than 70% of the GSL signs [41]. The phonetic elements seen in a spoken language correspond, in the SL case, to movements of the arm (elbow and shoulder), the wrist, and the fingers, which constitute the gesture. As the gesture is directly connected to hand movement, measurement of the latter could contribute to the gesture representation in the recognition space. This was the motivation to explore the capabilities of sEMG and 3-D-Acc data in SL recognition. To this end, the movements of the arm can be captured with the help of a 3-D accelerometer, whereas the motions of the wrist and the fingers can be obtained from the corresponding muscles on the arm.

Fig. 1. (a) Acquisition device, bioplux8, and its mounting to a young signer’s dominant hand. (b) Schematic of the placement of the sEMG electrodes and the 3-D-Acc on the signer’s hand.

The dataset used in this work consists of a set of five-channel sEMG signals and three-channel 3-D-Acc data acquired at the Signal Processing and Biomedical Technology Unit, Aristotle University of Thessaloniki, Greece, with the wireless (Bluetooth) bioplux8 device (PLUX, Engenharia de Biosensores, Lisboa, Portugal) during signing. This device provides at least five analogue inputs for sEMG recordings (12 bit, at a rate of 1 kHz) and three additional ones for triaxial-accelerometer recordings (12 bit, at a rate of 125 Hz). Fig. 1(a) depicts the mounting of the bioplux8 (6 × 3 × 1.5 cm) on a young signer’s right hand. Unlike vision-based methods, wearing assistive tools sometimes encumbers the user during the performance of SL. The proposed acquisition system, however, is small-sized and lightweight, and is mounted on the signer’s arm in such a way [see Fig. 1(a)] that it causes negligible influence on the way a signer performs the SL signs. This was confirmed by all signers who participated in the study. After experimentation on the exact placement and type of the sEMG electrodes, a position that provides high signal quality and discrimination among the performed motions per channel was identified [27]. In particular, it was found that the muscles flexor carpi radialis (channel #1: wrist movements), flexor digitorum superficialis (channel #2: finger movements), flexor carpi ulnaris (channel #3: wrist movements), extensor digitorum communis (channel #4: finger movements), and extensor carpi ulnaris (channel #5: wrist movements) presented the best performance and provided the desired degrees of freedom (DOFs) per hand articulation. Extensive experiments have shown that the use of additional electrodes did not provide a significant rise in the classification accuracy, yet a reduction in the number of electrodes clearly affected the classification performance. Consequently, the aforementioned five channels were used as a trade-off between manageable data volume and discrimination accuracy. The surface electrodes used were of a wet, disposable, active differential type that sticks to the skin and obtains a high signal-to-noise ratio. Fig. 1(b) illustrates the configuration of the position of the electrodes; the reference electrode was placed on the signer’s wrist. The 3-D-Acc was also mounted on the signer’s wrist [see Fig. 1(b)], so as to be most sensitive to the hand’s spatial position and movements.

TABLE I. GSL 60-WORD LEXICON AND ITS INTRASIMILARITY INDEX

Sixty GSL signs were performed during the data acquisition, corresponding to the words listed in Table I. Three signers (S = 3), all deaf-born, experienced GSL teachers, were asked to perform each sign ten times, including a short pause of 2 s between each repetition. An example of the acquired signals during signing of the “towel,” “glass,” and “eat” GSL signs is depicted in Fig. 2.

B. Vocabulary Construction

The selected 60 GSL signs formed the vocabulary used in the classification analysis; they have an increased frequency of occurrence in everyday GSL use and are all included in the basic curriculum of primary special schools for deaf children. According to the aforementioned dataset configuration, we used the gesture information of these 60 GSL signs as expressed by the dominant hand only.

In order to examine the intrasimilarity of the vocabulary, in terms of the similar movements involved in each hand articulation, each sign was broken down into small motions, using three DOFs for modeling the motions of: 1) the arm (first DOF); 2) the fingers (second DOF); and 3) the wrist (third DOF). For each DOF, all possible motions [extension, flexion, left/right rotation, pause (inactivity)] were symbolized accordingly (see Table II). Using the symbols of Table II, each GSL sign was mapped to a sequence of symbols per DOF (see Fig. 3), used in the similarity analysis. The latter was based on the Needleman–Wunsch global alignment of two sequences, borrowed from the area of bioinformatics [43]. To achieve this, each symbol from the DOF sequences was represented by an amino acid letter (see Table II); the three resulting DOF amino acid letter sequences were concatenated, forming a unified sequence per sign, and the maximum sequence match per pair of signs was estimated. From the ratio of the number of matching amino acid letters per pair to the maximum length of the compared sequences, the percentage of similarity (PS) was calculated. The results of this comparison are depicted in Fig. 4, which shows the histogram of the PS. As is apparent from Fig. 4, the selected vocabulary follows an approximately Gaussian-type distribution (mean of ∼50% ± standard deviation (SD) of ∼11%), implying a sufficient confusion in the categorization of its GSL signs. From a finer perspective, the PS of each sign with all the other signs within the vocabulary, i.e., PS_s, was calculated, and the degree of similarity (DoS) for each sign was defined based on the first and third quartiles (R25 and R75) of the distribution of PS_s. In this way, the DoS for each sign was characterized as Low (L) when R25 ≤ 35%, as Medium (M) when R25 > 35% and R75 ≤ 65%, and as High (H) when R25 ≥ 40% and R75 > 65% (see Table I).
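The pairwise PS computation can be sketched as follows; this illustrative Python fragment uses a Needleman–Wunsch-style global-alignment dynamic program with an assumed scoring scheme (match = 1, mismatch = 0, no gap penalty — the exact parameters are not stated in the text, and with this scheme the score reduces to the longest common subsequence):

```python
def alignment_matches(a, b):
    """Maximum number of matched letters in a global alignment of a and b
    (match = 1, mismatch = 0, gap = 0 -- an assumed scoring scheme)."""
    n, k = len(a), len(b)
    # dp[i][j] = best number of matches aligning a[:i] with b[:j]
    dp = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1],
                           dp[i - 1][j - 1] + (a[i - 1] == b[j - 1]))
    return dp[n][k]

def percentage_similarity(a, b):
    """PS: matched letters over the maximum length of the compared sequences."""
    return 100.0 * alignment_matches(a, b) / max(len(a), len(b))
```

Applied to every pair of concatenated per-sign sequences, this yields the PS histogram of Fig. 4.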

C. Realization

The IMEn analysis was realized on a PC (Intel Core 2 CPU 6600 at 2.0 GHz) using MATLAB R2007a (The MathWorks, Inc., Natick, MA), whereas the statistical analysis was carried out with SPSS 16 (SPSS Inc., Chicago, IL). All signals were standardized to a unit standard deviation; the tolerance r was set equal to 0.2, implying a 20% relationship with the standard deviation of the data and allowing measurements on datasets with different amplitudes and short lengths to be compared [32]. The IMFs were estimated up to scale K = 12, whereas the m parameter used for the estimation of the SampEn varied from 1 to 7 points.


Fig. 2. Example of the eight acquired signals during signing of the “towel,” “glass,” and “eat” GSL signs by the second signer. Ch1–Ch5 and Ch6–Ch8 correspond to the sEMG and 3-D-Acc channels, respectively.

TABLE II. SYMBOLS ALONG WITH THE CORRESPONDING AMINO ACID LETTER PER DEGREE OF FREEDOM FOR MODELING THE 60 GSL SIGNS

Fig. 3. Sequence of symbols per degree of freedom (DOF), corresponding to the GSL sign “glass.” Three DOFs are used for modeling the motions of the arm (DOF1), the fingers (DOF2), and the wrist (DOF3), and five symbols model the motions of extension, flexion, left/right rotation, and pause (inactivity).

IV. RESULTS AND DISCUSSION

An indicative example of the IMEn analysis applied to the sEMG signal acquired from channel #1 during GSL gesturing of the three words “towel,” “glass,” and “eat” by the second signer (i = 2) is given in Fig. 5. In particular, the seven plots in Fig. 5 depict the mean (line with marker) ± standard deviation (gray area) of IMEn(k, m, 0.2)_2, k = 1, 2, . . . , 7; m = 1, 2, . . . , 7, throughout the ten repetitions of each sign. According to Table I, the signs “towel” and “glass” correspond to H in terms of DoS, whereas the sign “eat” corresponds to M. As can be noted from Fig. 5, the estimated IMEn(k, m, 0.2)_2 can easily discriminate the sign “eat” from the other two signs for all (k, m) pairs. Regarding the discrimination of the signs “towel” and “glass,” statistically significant discrimination (Kruskal–Wallis test, p < 0.05) in the IMEn(k, m, 0.2)_2 domain is achieved only for low values (<4) of the m parameter, as for higher ones all IMEn(k, m, 0.2)_2 curves tend to overlap. However, the valid discrimination remains independent of the value of the k parameter.


TABLE III. (k, m)_i^opt AND (k, m)_(i,l)^opt PAIRS FOR THE MAXIMUM RECOGNITION PERFORMANCE

The dependence of the IMEn performance on the (k, m) pairs is thoroughly examined in setup 1. Table III (upper part) presents the (k, m)opt_i, i = 1, 2, 3, that provide the maximum classification accuracy among the employed signs of the used vocabulary per channel and per signer. The discriminant analysis reduced the feature-set length from approximately 700 to fewer than 50 discriminant functions for every single-signer case. Fig. 6(a) visualizes how the first two functions discriminate between groups by plotting the individual scores for the two discriminant functions, as obtained for signer 2. The group centroids are the mean discriminant scores for each of the dependent-variable categories for each of the discriminant functions; the closer the means, the more classification errors there are likely to be. As depicted in Fig. 6, not all signs are completely discriminated by the first two discriminant functions (the first two functions offer a classification score of only 26%), justifying the need for all proposed discriminant functions to increase the discrimination power. These are based on the (k, m)opt_i, i = 1, 2, 3, listed in Table III (upper part). From this table, it is deduced that there is an important contribution to the classification performance from the low scales of kopt_i (1 or 2) for the first five channels, corresponding to the high-frequency components of the sEMG signals, and from the high scales of kopt_i (5–7) for the last three channels, corresponding to the low-frequency components of the acceleration data; this is noticed independently of the signers.
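The role of a discriminant function can be illustrated with a minimal two-class Fisher linear discriminant on synthetic data; the feature dimensionality, class layout, and classifier below are illustrative assumptions, not the paper's 700-feature IMEn setup.

```python
import numpy as np

def fisher_discriminant(X1, X2):
    """Two-class Fisher LDA direction: w proportional to Sw^{-1} (m1 - m2)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled within-class scatter matrix
    Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    # A small ridge keeps Sw invertible for ill-conditioned feature sets
    w = np.linalg.solve(Sw + 1e-6 * np.eye(Sw.shape[0]), m1 - m2)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 1.0, (50, 5))   # class 1: 50 samples, 5-D features
X2 = rng.normal(3.0, 1.0, (50, 5))   # class 2: same spread, shifted mean

w = fisher_discriminant(X1, X2)
# Classify by which side of the midpoint between projected class means a sample falls
t = 0.5 * ((X1 @ w).mean() + (X2 @ w).mean())
side1 = np.sign((X1 @ w).mean() - t)
acc1 = (np.sign(X1 @ w - t) == side1).mean()
acc2 = (np.sign(X2 @ w - t) == -side1).mean()
accuracy = 0.5 * (acc1 + acc2)
```

With well-separated classes the single projection already classifies almost perfectly; in the paper's multiclass setting, many such functions are needed, which is why all discriminant functions are retained.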


Fig. 4. Histogram of the percentages of similarity (PS) of pairs of the amino-acid letter sequences (see Table II) of the selected GSL signs. The distribution of the PS follows an approximately Gaussian-type distribution (mean ± SD: ~50% ± 11%).
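The percentage-of-similarity measure in Fig. 4 relies on Needleman–Wunsch global alignment [43] of the handshape letter sequences. A minimal sketch follows; the +1/-1/-1 scoring and the definition of PS as identical aligned positions over alignment length are illustrative assumptions, not necessarily the paper's exact parameters, and the sequences are hypothetical.

```python
def percent_similarity(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; PS = identical aligned positions / alignment length."""
    n, m = len(a), len(b)
    # Dynamic-programming score table
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            F[i][j] = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Traceback, counting identical aligned positions
    i, j, ident, length = n, m, 0, 0
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                F[i][j] == F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)):
            ident += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        length += 1
    return 100.0 * ident / length

# Hypothetical handshape letter sequences for two signs
ps = percent_similarity("BAOC", "BAC")  # aligns B A O C against B A - C
```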

Fig. 5. IMEn analysis results when applied to the sEMG signal acquired from channel #1 during GSL gesturing of the three signs “towel,” “glass,” and “eat” by the second signer (i = 2). Mean ± SD values of the IMEn(k, m, 0.2)_2, k = 1, 2, . . . , 7; m = 1, 2, . . . , 7, are denoted with a marker-solid line and a gray area, respectively, across the ten repetitions of the three signs.

Fig. 6. Scatterplot of the individual scores of the 60 signs for the first two discriminant functions, as obtained for the second signer (i = 2). The group centroids are the mean discriminant scores for each of the dependent-variable categories for each of the discriminant functions. Apparently, the first two functions offer a low classification score of around 26%.

Moreover, more than 70% of the mopt_i values in Table III (upper part) are smaller than 4, justifying the results shown in Fig. 5. This result is connected with the nature of the analyzed data, as small values of mopt_i increase the sensitivity of the IMEn to capture rapid time changes in the input signals and, hence, to better identify the differences among them, especially for the high-frequency sEMG signals (channels #1–5).

Using the (k, m)opt_i, i = 1, 2, 3, the maximum classification accuracy achieved, i.e., CAccmax_i, i = 1, 2, 3, ranges from 99% to 99.8% (see Table III, upper part), indicating a successful overall classification performance of the IMEn analysis for the employed vocabulary. These values denote the upper limit of the classification performance of the proposed feature set for the single-signer case.

The degree of dependency of the IMEn analysis on the signer is examined in setup 2, where the resulting (k, m)opt_i pairs from setup 1 were used to evaluate the CAcc^i_j, i, j = 1, 2, 3, based on a single signer. The mean and the standard deviation of the classification accuracy over 100 iterations of randomly selecting the train and test sets for each signer are listed in Table IV (left part). For example, the first column indicates the results when the classification test for each signer is performed using the (k, m)opt_1 pairs. From the inspection of Table IV, it is evident that the feature set based on the (k, m)opt_1 pairs results in the most accurate recognition of the 60 GSL signs across the three signers. Overall results show a mean discrimination accuracy of more than 89% with a standard deviation of around ±1.5%. This highlights the robustness of the IMEn analysis to the variability in signing across signers.
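The 100-iteration random-split protocol can be outlined as below. Only the evaluation loop is reproduced; the features are synthetic and a simple nearest-centroid classifier stands in for the paper's discriminant-analysis classifier, so every name and number here is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in: 3 "signs", 10 repetitions each, 6-D feature vectors.
n_signs, n_rep, dim = 3, 10, 6
centers = rng.normal(0.0, 4.0, (n_signs, dim))
X = np.vstack([c + rng.normal(0.0, 1.0, (n_rep, dim)) for c in centers])
y = np.repeat(np.arange(n_signs), n_rep)

def mean_sd_accuracy(X, y, train_frac=0.7, iters=100):
    """Mean and SD of classification accuracy over repeated random train/test splits."""
    accs = []
    for _ in range(iters):
        idx = rng.permutation(len(y))
        cut = int(train_frac * len(y))
        tr, te = idx[:cut], idx[cut:]
        # Train: one centroid per class; test: nearest-centroid assignment
        centroids = np.array([X[tr][y[tr] == c].mean(axis=0)
                              for c in range(n_signs)])
        dists = np.linalg.norm(X[te][:, None, :] - centroids[None, :, :], axis=2)
        accs.append(float((dists.argmin(axis=1) == y[te]).mean()))
    return float(np.mean(accs)), float(np.std(accs))

mean_acc, sd_acc = mean_sd_accuracy(X, y)
```

Reporting the mean with its standard deviation over the random splits, as in Table IV, exposes how sensitive the accuracy is to the particular choice of train and test repetitions.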

TABLE IV: PERCENTAGES OF CLASSIFICATION ACCURACY

The exploitation of the extension from single signer to signer pair is examined in setup 3, where a grouping procedure is adopted. Table III (lower part) presents the (k, m)opt_(i,l), i, l = 1, 2, 3, i ≠ l, pairs that provide the maximum classification accuracy CAccmax_(i,l), i, l = 1, 2, 3, i ≠ l. Similar to the procedure followed in setup 1, the discriminant analysis reduced the feature-set length from approximately 700 to around 60 discriminant functions for every signer-pair case. From Table III (lower part), a behavior similar to that of setup 1 is noticed, as again there is an important contribution to the classification performance from the low scales of kopt_(i,l) (1–3) for the first five channels and from the high scales of kopt_(i,l) (5–7) for the last three channels, independently of the signer pair. Moreover, more than 75% of the mopt_(i,l) values in Table III (lower part) are smaller than 4.

Using the (k, m)opt_(i,l), i, l = 1, 2, 3, i ≠ l, the maximum classification accuracy achieved, i.e., CAccmax_(i,l), ranges from 90.3% to 95.8% (see Table III, lower part), indicating a satisfactory overall classification performance for the signer-pair case and, simultaneously, setting the upper limit of the classification performance of the proposed feature set for that case.

The recognition performance based on a signer pair is examined in setup 4, where the (k, m)opt_(i,l) pairs are used for the estimation of the classification accuracy CAcc^(i,l)_j, i, j, l = 1, 2, 3, i ≠ l. Table IV (right part) presents the mean and the standard deviation of the classification accuracy over 100 random selections of the train and test feature sets for each signer. From Table IV, it is evident that the feature set based on the (k, m)opt_(1,3) pairs results in the most accurate recognition of the 60 GSL signs across the signer pairs. Overall results show a mean discrimination accuracy of more than 91% with a standard deviation of around ±1.5%, showing a robust behavior to the variability in signing across signers (both single and in pairs).

When comparing the values of CAcc^i_j and CAcc^(i,l)_j from Table IV, it is noticed that the IMEn analysis provides slightly better classification results in the single-signer case than in the signer-pair case, although fewer discriminant functions are used in the former. This indicates that, in general, the information from single-signer IMEn analysis is sufficient for achieving efficient classification across signers. Nevertheless, in some cases (see, for example, CAcc^(1,3)_1, CAcc^(1,3)_3, CAcc^(2,3)_2), the signer-pair IMEn analysis provides better classification performance than the single-signer case, leading to the conclusion that a combinatory approach could maximize the classification ability of the IMEn analysis.

In order to test the ability of the IMEn, against the use of simple SampEn, to provide better discrimination among the GSL gestures, the maximum classification accuracy of the SampEn was also estimated for the single-signer (setup 1) and signer-pair (setup 3) cases. The SampEn analysis followed similar steps and parameter configuration to the IMEn analysis; i.e., the acquired sEMG and 3-D-Acc data were initially subjected to SampEn analysis using (1) (r = 0.2; m = 1, 2, . . . , 7); then, discriminant analysis was applied to the estimated SampEn feature set and the principal discriminant functions were found. Using the latter, the maximum classification accuracy among the employed signs of the vocabulary used was estimated, both per single signer (see Table III, upper part) and per signer pair (see Table III, lower part). From these results, it can be deduced that in the single-signer case (setup 1), IMEn analysis (ranging between 99.0% and 99.8%) performs better than SampEn (ranging between 96.4% and 99.3%) for all the signers (i = 1, 2, 3) conducting the experiment. A clear distance in the discrimination efficiency of the IMEn over the SampEn is noticed in the signer-pair case (setup 3), where, unlike IMEn (IMEn: CAccmax_(i,l) = 90.3–95.8%, i, l = 1, 2, 3, i ≠ l), SampEn appears to be highly dependent on the intersigner differences (SampEn: CAccmax_(i,l) = 78.6–83.6%, i, l = 1, 2, 3, i ≠ l). This is a clear example of the advantage of the IMEn in accounting for the intersigner differences by using information from the effective oscillation levels (IMFs) of the acquired data during GSL gesturing; this information is not captured by the simple SampEn analysis, which thus exhibits lower discrimination performance due to the influence of local trends.
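The IMEn/SampEn contrast rests on computing SampEn either on the raw signal or on partial EMD reconstructions. A minimal sketch of both is given below, following the SampEn definition of [32] and the IMEn construction of [34]. The IMF array here is a synthetic stand-in; in practice the IMFs would come from an EMD implementation (e.g., the third-party PyEMD package, which is an assumption and was not used in the paper).

```python
import numpy as np

def sampen(x, m, r):
    """Sample entropy SampEn(m, r); r is given as a fraction of the series SD [32]."""
    x = np.asarray(x, dtype=float)
    tol = r * x.std()
    def match_count(length):
        templ = np.lib.stride_tricks.sliding_window_view(x, length)
        count = 0
        for i in range(len(templ) - 1):
            # Chebyshev distance to all later templates (no self-matches)
            d = np.abs(templ[i + 1:] - templ[i]).max(axis=1)
            count += int((d <= tol).sum())
        return count
    b, a = match_count(m), match_count(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

def imen(imfs, k, m, r):
    """IMEn(k, m, r): SampEn of the sum of the first k IMFs (scale k) [34]."""
    return sampen(np.sum(imfs[:k], axis=0), m, r)

t = np.linspace(0.0, 10 * 2 * np.pi, 500)
rng = np.random.default_rng(1)
# Synthetic "IMFs": a noisy fast component plus a smooth slow oscillation.
imfs = np.array([0.3 * rng.normal(size=t.size), np.sin(t)])
noise_en = imen(imfs, 1, 2, 0.2)   # scale 1: fast, irregular component only
scale2_en = imen(imfs, 2, 2, 0.2)  # scale 2: smooth oscillation dominates
```

Including the smooth IMF makes the partial reconstruction more regular, so the scale-2 entropy drops below the scale-1 value; tracking SampEn across such scales is what lets IMEn separate signals whose raw-signal SampEn is confounded by local trends.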

To elaborate upon the ability of the proposed approach to circumvent the problem of discrimination in extreme cases with high DoS, the CAcc^i_j and CAcc^(i,l)_j results referring to the signs from Table I that are characterized as H were isolated for further inspection; these are reported in Table V as H-CAcc^i_j and H-CAcc^(i,l)_j, respectively. From this table, it is seen that the signs with high DoS do not significantly affect the performance of the IMEn analysis, as the resulting hit ratios are, in general, greater than 90%. Nevertheless, some cases seem to be affected more than others [see, for example, H-CAcc^1_3 = 78.67% and H-CAcc^(1,2)_3 = 80% for wID = 35 (“towel”)], yet still with acceptable hit-ratio values.


TABLE V: CLASSIFICATION ACCURACY OF IMEn ANALYSIS FOR THE SIGNS WITH HIGH DoS

Furthermore, it is noticed that the standard deviation in the estimated H-CAcc^i_j and H-CAcc^(i,l)_j values is greater than that seen in the cases of CAcc^i_j and CAcc^(i,l)_j (see Table IV), denoting that the high DoS acts as a kind of noise in the IMEn classification procedure, especially in the case of similar trends. These cases, however, are efficiently addressed by the IMEn due to its embedded EMD analysis. The latter allows the computation of the SampEn over the different EMD levels, capturing the evolution of the signal regularity. This presents increased robustness to the variability of local trends in biomedical signals [33], such as sEMG, and fosters the classification ability of the IMEn analysis.

To examine the contribution of each channel to the classification accuracy of the IMEn approach, discriminant analysis was performed for the single-signer case to estimate the (k, m)opt_(i,ch), i = 1, 2, 3; ch = 1, 2, . . . , 8, that provide the maximum classification accuracy per channel. The results of this analysis are shown in Table VI, where the first column lists the classification accuracy for each channel, i.e., CAccmax_(i,ch), i = 1, 2, 3; ch = 1, 2, . . . , 8; the second lists the classification accuracy for grouped channels (two groups of channels corresponding to the sEMG and Acc signals), i.e., CAccmax_(i,1-5) and CAccmax_(i,6-8), i = 1, 2, 3; while the third reproduces the CAccmax_i values listed in Table IV. From the inspection of Table VI, it can be noticed that every channel participates almost equally in the classification of the 60 GSL signs, justifying the need for the acquisition of the sEMG signals from the proposed muscles and the use of the triaxial accelerometer, since their combination dramatically increases the maximum classification accuracy (e.g., from CAccmax_(3,1-5) = 86.3% and CAccmax_(3,6-8) = 77.3% to CAccmax_3 = 99%).

It is noteworthy that the current version of the proposed classification analysis does not support real-time implementation, as the focus was placed on the exploitation of the classification ability of the proposed features rather than on real-time optimization. Despite that, in the IMEn analysis the feature set is generated from a single process, i.e., EMD; hence, this could be advantageous when the focus is placed on the minimization of the computational burden and contributes to a fast realization of the IMEn analysis.

As deduced from the experimental results, the IMEn analysis proved to be a promising tool for isolated sign recognition, as it exhibited high classification accuracy for 60 signs across all tested signers. In a more general perspective, building a system that can understand different signers, with their own variations in signing, involves collecting data from many signers. Nevertheless, the results presented here establish a promising bed-set


TABLE VI: CONTRIBUTION OF EACH CHANNEL TO THE CLASSIFICATION ACCURACY

for converging toward independent SL recognition and generalization. A further enhancement of the IMEn analysis could be achieved by combining it with an SL grammar, embedding grammatical rules that support continuous SL recognition into the classifier. This could increase the flexibility and size of the vocabulary construction by creating sets of scenarios (e.g., related sets of signs) that comply with real-life situations. Although finger-spelling recognition was not addressed here, differences in finger articulations have been identified in the sEMG signals, as many words in the examined GSL vocabulary involve a variety of finger positions. Perhaps, in the case of a separate finger-spelling mode, additional electrodes should be placed on the muscles closer to the fingers to increase the fidelity of the relevant sEMG, targeting finger-spelling recognition. Finally, extension of the IMEn analysis to both hands could potentially cover a wider range of signs. Our future studies will focus on the aforementioned issues.

V. CONCLUSION

In this work, an SL recognition scheme was proposed based on the application of the IMEn to sEMG and 3-D-Acc data acquired from the dominant hand. The findings show that the IMEn analysis yields a high mean recognition accuracy (>93%) when recognition is performed on the 60 isolated GSL signs. Moreover, the proposed analysis has shown high robustness against increased similarity among the GSL signs and promising generalization ability across different signers. The combination of sEMG signals with 3-D-Acc ones introduced here sets a novel pathway for creating an efficient wireless wearable system that, combined with portable devices (e.g., a PocketPC) and text-to-speech/speech-to-text engines, could be integrated into a portable SL recognition device.

ACKNOWLEDGMENT

The authors would like to acknowledge the Greek Sign Language teachers, Mr. G. Petropoulos, Mr. T. Germanidis, and Mrs. M. Christoforidou, for performing the GSL gestures during the acquisition process. In addition, the authors thank Prof. C. Kotzamanidis, from the Department of Physical Education & Sports Science, Aristotle University of Thessaloniki, for his help and guidance during the muscle identification experiments.

The authors are grateful to Dr. V. Kourbetis, Consultant to the Special Education, Pedagogical Institute of Greece, and Mrs. A. Gouvatzi, Director of the Special School for the Deaf, Thessaloniki, for their contribution to the construction of the GSL vocabulary and their help in the data-acquisition experiments.

Finally, the authors would like to thank the anonymous reviewers for their constructive comments toward the improvement of the manuscript.

REFERENCES

[1] C. Charayaphan and A. Marble, “Image processing system for interpreting motion in American sign language,” J. Biomed. Eng., vol. 14, pp. 419–425, 1992.

[2] T. Starner, “Visual recognition of American sign language using hidden Markov models,” M.S. thesis, Massachusetts Inst. Technol., Media Lab., Cambridge, MA, Jul. 1995.

[3] T. Starner and A. Pentland, “Real-time American sign language recognition from video using hidden Markov models,” Percept. Comput. Sec., Massachusetts Inst. Technol., Media Lab., Cambridge, MA, Tech. Rep. 375, 1996.

[4] S. S. Fels and G. Hinton, “Glove-Talk: A neural network interface between a DataGlove and a speech synthesizer,” IEEE Trans. Neural Netw., vol. 4, no. 1, pp. 2–8, Jan. 1993.

[5] S. S. Fels, “Glove-TalkII: Mapping hand gestures to speech using neural networks—An approach to building adaptive interfaces,” Ph.D. dissertation, Comput. Sci. Dept., Univ. Toronto, Toronto, ON, Canada, 1994.

[6] A. D. Wexelblat, “A feature-based approach to continuous gesture analysis,” M.S. thesis, Massachusetts Inst. Technol., Media Lab., Cambridge, MA, 1993.

[7] T. Takahashi and F. Kishino, “Gesture coding based on experiments using a hand gesture interface device,” ACM SIGCHI Bull., vol. 23, no. 2, pp. 67–73, Apr. 1991.

[8] K. Murakami and H. Taguchi, “Gesture recognition using recurrent neural networks,” in Proc. SIGCHI Conf. Human Factors Comput. Syst.: Reaching Through Technology, New Orleans, LA, 1991, pp. 237–242.

[9] M. W. Kadous, “Machine recognition of Auslan signs using PowerGlove: Towards large-lexicon recognition of sign language,” in Proc. Workshop Integration Gesture Language Speech (WIGLS), Appl. Sci. Eng. Lab., Newark, DE, Oct. 1996, pp. 165–174.

[10] R.-H. Liang and M. Ouhyoung, “A real-time continuous gesture recognition system for sign language,” in Proc. 3rd Int. Conf. Autom. Face Gesture Recognit., Nara, Japan, 1998, pp. 558–565.

[11] C. Vogler and D. Metaxas, “Adapting hidden Markov models for ASL recognition by using three-dimensional computer vision methods,” in Proc. IEEE Int. Conf. Syst., Man, Cybern., Orlando, FL, 1997, pp. 156–161.

[12] C. Vogler and D. Metaxas, “ASL recognition based on a coupling between HMMs and 3D motion analysis,” in Proc. IEEE Int. Conf. Comput. Vis., Mumbai, India, 1998, pp. 363–369.

[13] C. Vogler and D. Metaxas, “Toward scalability in ASL recognition: Breaking down signs into phonemes,” presented at the Gesture Workshop, Gif-sur-Yvette, France, 1999.

[14] J. Ma, W. Gao, J. Wu, and C. Wang, “A continuous Chinese sign language recognition system,” in Proc. IEEE 4th Int. Conf. Autom. Face Gesture Recognit., Grenoble, France, Mar. 2000, pp. 428–433.

[15] V. N. Paschaloudi and K. G. Margaritis, “Towards an assistive tool for Greek sign language communication,” in Proc. IEEE 3rd Int. Conf. Adv. Learning Technol., Athens, Greece, Jul. 2003, pp. 125–129.

[16] H. Brashear, T. Starner, P. Lukowicz, and H. Junker, “Using multiple sensors for mobile sign language recognition,” in Proc. IEEE 7th Int. Symp. Wearable Comput., Oct. 2005, pp. 45–52.

[17] S. Lee, V. Henderson, H. Brashear, T. Starner, and S. Hamilton, “User-centered development of a gesture-based American sign language game,” presented at the Instruct. Technol. Educ. Deaf Symp., Rochester, NY, Jun. 2005.


[18] J. Hernandez-Rebollar, N. Kyriakopoulos, and R. Lindeman, “A new instrumented approach for translating American sign language into sound and text,” in Proc. IEEE 6th Int. Conf. Autom. Face Gesture Recognit., Seoul, Korea, May 2004, pp. 547–552.

[19] H. Cooper and R. Bowden, “Large lexicon detection of sign language,” in Human Computer Interaction, Proc. Int. Conf. Comput. Vision (ICCV) (Lecture Notes in Computer Science, vol. 4796), M. Lew, N. Sebe, T. S. Huang, and E. M. Bakker, Eds. Berlin, Germany: Springer-Verlag, 2007, pp. 88–97.

[20] J. F. Lichtenauer, E. A. Hendriks, and M. J. T. Reinders, “Sign language recognition by combining statistical DTW and independent classification,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 11, pp. 2040–2046, Nov. 2008, DOI: 10.1109/TPAMI.2008.123.

[21] Q. Chen, N. D. Georganas, and E. M. Petriu, “Hand gesture recognition using Haar-like features and a stochastic context-free grammar,” IEEE Trans. Instrum. Meas., vol. 57, no. 8, pp. 1562–1571, Aug. 2008.

[22] G. Fang, W. Gao, and D. Zhao, “Large-vocabulary continuous sign language recognition based on transition-movement models,” IEEE Trans. Syst., Man, Cybern. B, vol. 37, no. 1, pp. 1–9, Jan. 2007.

[23] C. W. Ng and S. Ranganath, “Real-time gesture recognition system and application,” Image Vis. Comput., vol. 20, no. 13–14, pp. 993–1007, Dec. 2002.

[24] S.-F. Wong and R. Cipolla, “Real-time interpretation of hand motions using a sparse Bayesian classifier on motion gradient orientation images,” in Proc. Br. Mach. Vis. Conf., Oxford, U.K., 2005, vol. 1, pp. 379–388.

[25] J.-H. Kim, D.-G. Kim, J.-H. Shin, S.-W. Lee, and K.-S. Hong, “Hand gesture recognition system using fuzzy algorithm and RDBMS for post PC,” in Fuzzy Systems and Knowledge Discovery (FSKD) (Lecture Notes in Artificial Intelligence, vol. 3614), L. Wang and Y. Jin, Eds. Heidelberg, Germany: Springer-Verlag, 2005, pp. 170–175.

[26] S. B. Wang, A. Quattoni, L.-P. Morency, and D. Demirdjian, “Hidden conditional random fields for gesture recognition,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., New York, Jun. 2006, vol. 2, pp. 1521–1527.

[27] V. E. Kosmidou, L. J. Hadjileontiadis, and S. M. Panas, “Evaluation of surface EMG features for the recognition of American sign language gestures,” in Proc. IEEE 28th Annu. Int. Conf. Eng. Med. Biol. Soc. (EMBS), New York, Aug. 2006, pp. 6197–6200.

[28] V. E. Kosmidou and L. J. Hadjileontiadis, “ICT-based biofeedback in everyday applications for people with disabilities,” presented at the 4th Personalized Health Conf., Porto Carras, Greece, Jun. 2007.

[29] V. E. Kosmidou and L. J. Hadjileontiadis, “Intrinsic mode entropy: An enhanced classification means for automated Greek sign language gesture recognition,” in Proc. IEEE 30th Annu. Int. Conf. Eng. Med. Biol. Soc. (EMBS), Vancouver, BC, Canada, Aug. 2008, pp. 5057–5060.

[30] G. Saridis and T. Gootee, “EMG pattern analysis and classification for a prosthetic arm,” IEEE Trans. Biomed. Eng., vol. BME-30, no. 6, pp. 18–29, Jun. 1983.

[31] A. B. Ajiboye and R. F. Weir, “A heuristic fuzzy logic approach to EMG pattern recognition for multifunctional prosthesis control,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 13, no. 3, pp. 280–291, Sep. 2005.

[32] J. S. Richman and J. R. Moorman, “Physiological time-series analysis using approximate entropy and sample entropy,” Amer. J. Physiol., Heart Circ. Physiol., vol. 278, pp. H2039–H2049, 2000.

[33] N. Huang, Z. Shen, S. Long, M. Wu, H. H. Shih, Q. Zheng, N. C. Yen, C. Tung, and H. Liu, “The empirical mode decomposition and Hilbert spectrum for nonlinear and nonstationary time series analysis,” Proc. R. Soc. Lond. A, vol. 454, pp. 903–995, 1998.

[34] H. Amoud, H. Snoussi, D. Hewson, M. Doussot, and J. Duchene, “Intrinsic mode entropy for nonlinear discriminant analysis,” IEEE Signal Process. Lett., vol. 14, no. 5, pp. 297–300, May 2007.

[35] R. Shaw, “Strange attractors, chaotic behavior, and information flow,” Z. Naturforsch., vol. 36A, pp. 80–112, 1981.

[36] J. P. Eckmann and D. Ruelle, “Ergodic theory of chaos and strange attractors,” Rev. Mod. Phys., vol. 57, pp. 617–654, 1985.

[37] S. M. Pincus, “Approximate entropy as a measure of system complexity,” Proc. Nat. Acad. Sci. USA, vol. 88, pp. 2297–2301, 1991.

[38] B. Tabachnick and L. Fidell, Using Multivariate Statistics, 4th ed. Boston, MA: Allyn & Bacon, 2001, ch. 11.

[39] H. Xu, “Mahalanobis distance based ARTMAP networks,” M.S. thesis, Dept. Comput. Sci., San Diego State Univ., CA, Oct. 2003.

[40] W. J. Krzanowski, Principles of Multivariate Analysis: A User’s Perspective (Oxford Statistical Science Series), rev. ed. New York: Oxford Univ. Press, Dec. 2000.

[41] G. A. F. Seber, Multivariate Observations (Wiley Series in Probability and Statistics). San Francisco, CA: Wiley, Sep. 2004.

[42] R. A. Tennant, M. Gluszak-Brown, and V. Nelson-Metlay, The American Sign Language Handshape Dictionary. Washington, DC: Clerc Books, Gallaudet Univ. Press, 1998, pp. 11–24.

[43] S. B. Needleman and C. D. Wunsch, “A general method applicable to the search for similarities in the amino acid sequence of two proteins,” J. Mol. Biol., vol. 48, no. 3, pp. 443–453, Mar. 1970.

Vasiliki E. Kosmidou (S’03) was born in Drama, Greece, in 1982. She received the Diploma in electrical and computer engineering in 2004 from Aristotle University of Thessaloniki, Thessaloniki, Greece, where she is currently working toward the Ph.D. degree.

Her current research interests include advanced signal processing, applied statistics, biomedical engineering, and gesture recognition techniques.

Leontios J. Hadjileontiadis (S’87–M’98) was born in Kastoria, Greece, in 1966. He received the Diploma in electrical engineering and the Ph.D. degree in electrical and computer engineering from Aristotle University of Thessaloniki, Thessaloniki, Greece, in 1989 and 1997, respectively, and the Ph.D. degree in music composition from the University of York, York, U.K., in 2004.

In December 1999, he joined the Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, as a faculty member, where he is currently an Assistant Professor, and is also involved in the fields of lung sounds, heart sounds, bowel sounds, ECG data compression, seismic data analysis, and crack detection at the Signal Processing and Biomedical Technology Unit, Telecommunications Laboratory. His current research interests include higher order statistics, alpha-stable distributions, higher order zero crossings, wavelets, polyspectra, fractals, and neurofuzzy modeling for medical, mobile, and digital signal processing applications.

Prof. Hadjileontiadis is a member of the Technical Chamber of Greece, the Higher-Order Statistics Society, the International Lung Sounds Association, and the American College of Chest Physicians. He was the recipient of the Second Award at the Best Paper Competition of the Ninth Panhellenic Medical Conference on Thorax Diseases ’97, Thessaloniki. He was also an open finalist at the Student Paper Competition (Whitaker Foundation) of the IEEE Engineering in Medicine and Biology Society (EMBS), Chicago, IL, 1997; a finalist at the Student Paper Competition (in memory of Dick Poortvliet) of the Mediterranean Conference on Medical and Biological Engineering (MEDICON), Lemesos, Cyprus, 1998; and a recipient of the Young Scientist Award of the 24th International Lung Sounds Conference, Marburg, Germany, 1999. In 2004, 2005, and 2007, he organized and served as a mentor to three five-student teams that ranked third, second, and seventh worldwide, respectively, at the Imagine Cup Competition (Microsoft), Sao Paulo, Brazil, 2004 / Yokohama, Japan, 2005 / Seoul, Korea, 2007, with projects involving technology-based solutions for people with disabilities. He is currently a Professor in composition at the State Conservatory of Thessaloniki, Greece.
