Mapping facial expression recognition algorithms on a low-power smart camera


MAPPING FACIAL EXPRESSION RECOGNITION ALGORITHMS ON A LOW-POWER SMART CAMERA

A.A. Abbo, V. Jeanne, M. Ouwerkerk, C. Shan, R. Braspenning

Philips Research Eindhoven

A. Ganesh, H. Corporaal

Eindhoven University of Technology

ABSTRACT

Recent developments in the field of facial expression recognition advocate the use of feature vectors based on Local Binary Patterns (LBP). Research on the algorithmic side addresses robustness issues when dealing with non-ideal illumination conditions. In this paper, we address the challenges related to mapping these algorithms on smart camera platforms. Algorithmic partitioning taking into account the camera architecture is investigated with a primary focus on keeping the power consumption low. Experimental results show that compute-intensive feature extraction tasks can be mapped on a massively-parallel processor with reasonable processor utilization. Although the final feature classification phase could also benefit from parallel processing, mapping on a general-purpose sequential processor would suffice.

Index Terms— Facial Expression Recognition, Low-power Smart Cameras, Video Scene Analysis, Parallel Processing.

1. INTRODUCTION

Advances in CMOS technology enable new application areas employing distributed sensing systems. Being part of this category, distributed smart camera systems (DSCS) help to add additional sensory input to a decision-making process. Due to the computation-intensive nature of video processing algorithms, the impact of progress in CMOS process technology is especially visible in smart camera applications. This is demonstrated by the low-cost yet good-quality image sensors that are integrated in mobile devices and by the advances in highly integrated smart camera modules that form the hardware infrastructure of DSCS.

The research challenges concerning DSCS cover a number of layers, ranging from optimal hardware design for the smart camera nodes to developing applications on a network of sensing nodes. With respect to camera systems targeted at analyzing facial expressions, the following challenges need to be addressed:

• Computationally efficient algorithms for robust facial-expression analysis under varying illumination and viewing angle.

• Optimal trade-off between on-node processing and streaming video for long operation times and small camera size.

• Energy-efficient partitioning of algorithms on the smart camera compute resources.

A number of approaches have been proposed in the field of facial expression recognition, with varying performance levels and computational complexity. Methods based on geometric features and appearance models [1] and Gabor filters [2] have been shown to have good recognition rates. Although the Gabor filter based approach appears to be robust in real-world applications, its improved performance comes at the expense of high computational complexity. Recently, in [3], application of Local Binary Patterns (LBP) has been proposed for extracting facial features, primarily motivated by the low computational complexity for equivalent recognition performance. The complete expression recognition task involves a feature classification stage in which the extracted feature is labeled as one of a set of expression classes (happy, sad, angry, etc.).

Depending on sensor resolution, compute resources on-board the camera and communication means, smart camera architectures take different forms [4, 5, 6, 7]. Despite architectural differences, the common principle in most designs is to bring video processing (analysis) as close as possible to the image sensor, thereby reducing the amount of video (data) that would otherwise need to be transferred to a central decision point. Due to increased component and system integration, smart cameras are getting more compact and dissipate less power without sacrificing functional performance.

In this work, we focus on issues related to mapping LBP-based facial expression recognition algorithms on a smart wireless camera platform. The primary objective is to identify a proper partitioning of the algorithm on the compute resources available on the camera node such that the overall power dissipation is kept minimal. Envisaged applications are in power-constrained systems such as mobile or re-deployable camera networks. An example of the latter is customer behavior (emotion) analysis in shopping malls. Since LBP features have been shown to be powerful in representing local and global object textures, the results reported here are relevant in a more general sense.

978-1-4244-2665-2/08/$25.00 © 2008 IEEE

The paper is organized as follows: in section 2, a brief description of current facial expression recognition techniques is given based on the LBP technique. The smart camera platform chosen for the mapping exercise is described in section 3, followed by a discussion on algorithm partitioning and mapping options in section 4. Experimental results are presented in section 5, based on which some conclusions are drawn in section 6.

2. FACIAL EXPRESSION RECOGNITION

Figure 1 shows a block diagram of an automatic facial expression recognition system. The two main stages in the system are: (i) feature extraction and (ii) feature classification. The feature extraction stage involves pre-processing steps such as face localization in the scene and scaling, in addition to extraction of a feature of some kind for the region-of-interest (ROI). The feature classification process involves comparing the generated feature vector with vectors selected to represent a set of expressions.

[Figure: pipeline from the video input through Face Detection, Local Feature Detection and Feature Vector Generation to Feature Classification against stored Feature Templates, yielding expression labels such as Angry, Happy and Neutral.]

Fig. 1. Facial Expression Analysis

There is active algorithmic research in the area of facial expression recognition with the objective of finding a robust recognition method with modest computational complexity. A recent proposal for facial expression recognition [3, 8] uses features derived from Local Binary Patterns. Originally proposed for texture analysis, LBP based methods have been successfully adopted for face recognition purposes [9]. While being computationally low-cost, LBP based feature vectors have been shown to have good descriptive power with respect to facial expressions. Motivated by these characteristics, this method has been chosen for the mapping exercise in this paper.

For the final feature classification stage, a number of approaches have been investigated in the literature, ranging from simple template matching using a weighted Chi-square distance measure [9] to Support Vector Machines (SVM) [10]. In [3] and [8], SVM is shown to have better discriminative power when operating on LBP based feature vectors.

2.1. Feature Extraction

2.1.1. Local Binary Patterns

Originally, the LBP method was introduced for texture analysis using a 3×3 pixel neighborhood [11]. Further research on this topic has led to kernels with different neighborhood sizes, enabling application of LBPs to areas like face recognition [9].

In Figure 2, the generation of an LBP(8,2) code is shown, where 8 neighbors at a circular distance of 2 pixels from the pivot pixel (P) are considered. Bi-linear interpolation on a 2×2 pixel window is used to generate intensity values for samples that do not lie at the center of the grids (those marked with X). The samples located in the green boxes are directly used in the LBP computation.

[Figure: a 5×5 grid of pixel intensities around the pivot pixel P; the four diagonal samples (marked x) are interpolated from surrounding 2×2 pixel windows with weights p1..p4, while the on-grid samples are used directly.]

Fig. 2. LBP code generation; the binary pattern is built via clockwise scanning with the least-significant bit at A, i.e., (11111001)2 = (249)10.

The contribution (Ci) of each neighbor (Ni) to the LBP(8,2) code is determined through a simple comparison (Equation 1) and binary weighting (Equation 3). Bilinear interpolation for the off-grid samples, i.e., neighbors 1, 3, 5, and 7, is done as shown in Equation 2. The pixels b = [b1, b2, b3, b4] lie around the new sampling point, with the corresponding weights w = [0.1716, 0.2426, 0.2426, 0.3431].

Ci = { 1 if Ni ≥ P; 0 otherwise },  i = 0..7    (1)

Ni = w · b,  i = 1, 3, 5, 7    (2)

LBP(8,2) = Σ_{i=0}^{7} Ci × 2^i    (3)
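For illustration, Equations 1-3 can be sketched in a few lines of Python. The function name, the neighbor ordering and the clockwise bit assignment below are our own conventions (the paper fixes the least-significant bit at position A in Fig. 2). Note that for diagonal samples at radius 2 the bilinear corner weights work out to exactly the set {0.1716, 0.2426, 0.2426, 0.3431} quoted above.

```python
import numpy as np

S2 = np.sqrt(2.0)
# Eight neighbors on a circle of radius 2 around the pivot, as (dy, dx).
# Even positions lie on the pixel grid; odd (diagonal) positions are
# off-grid and bilinearly interpolated (Equation 2).
OFFSETS = [(-2, 0), (-S2, S2), (0, 2), (S2, S2),
           (2, 0), (S2, -S2), (0, -2), (-S2, -S2)]

def lbp_8_2(img, y, x):
    """LBP(8,2) code for pivot pixel (y, x); illustrative sketch of Eqs. 1-3."""
    pivot = img[y, x]
    code = 0
    for i, (dy, dx) in enumerate(OFFSETS):
        ny, nx = y + dy, x + dx
        y0, x0 = int(np.floor(ny)), int(np.floor(nx))
        fy, fx = ny - y0, nx - x0
        # Bilinear interpolation over the 2x2 window around the sample point;
        # for on-grid samples fy = fx = 0 and this reduces to img[ny, nx].
        val = ((1 - fy) * (1 - fx) * img[y0, x0] +
               (1 - fy) * fx * img[y0, x0 + 1] +
               fy * (1 - fx) * img[y0 + 1, x0] +
               fy * fx * img[y0 + 1, x0 + 1])
        if val >= pivot:          # comparison, Equation 1
            code += 1 << i        # binary weighting, Equation 3
    return code
```

On a uniform patch every comparison succeeds, so the code is 255; a strict local maximum at the pivot yields code 0.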

2.1.2. Spatially Enhanced Histogram

The LBP codes are transformed into a feature vector by creating a so-called spatially-enhanced histogram. The vector is built by concatenating local histograms from partitions of the region-of-interest. Figure 3 shows a partitioning into 42 (7 rows by 6 columns) regions of 21×18 pixels. The bins of the histogram are selected based on a uniformity criterion defining the maximum number of bit transitions ('1' to '0' or vice versa) in the LBP code. For the case of uniformity order 2, there are 58 uniform codes that have at most 2 bit transitions. The remaining codes are lumped into one non-uniform bin. Thus, for the chosen partitioning, one finds a feature vector of length 2478 (= 7 × 6 × 59).

Fig. 3. Region-of-interest partitioning for spatially-enhanced histogram generation.
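The bin bookkeeping above can be verified with a short Python sketch (helper names are our own): counting the 8-bit codes with at most two circular bit transitions recovers the 58 uniform bins, and with one extra non-uniform bin the 7-by-6 partitioning yields a 2478-element feature vector.

```python
# Count the 8-bit LBP codes that are "uniform" of order 2, i.e. contain at
# most 2 bit transitions ('0'->'1' or '1'->'0') when scanned circularly.
# These receive individual histogram bins; all other codes share one bin.
def transitions(code):
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8))

uniform = [c for c in range(256) if transitions(c) <= 2]
n_bins = len(uniform) + 1          # 58 uniform bins + 1 non-uniform bin

# Spatially-enhanced histogram: one 59-bin histogram per region,
# concatenated over the 7-by-6 partitioning of the region-of-interest.
feature_length = 7 * 6 * n_bins
print(len(uniform), n_bins, feature_length)
```

The example code (11111001)2 = 249 from Fig. 2 has exactly two transitions and is therefore uniform.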

2.2. Feature Classification

In [3], a simple approach with modest recognition performance is described, in which feature templates are generated by taking the average histogram of different test images for a given class of expression. The weighted Chi-square metric is then applied to determine which expression template the measured facial feature comes closest to. The metric is given by Equation 4, where S and M represent the measured feature vector and one of the expression templates, respectively. The weighting vector w accentuates feature-rich regions and helps to improve the recognition performance. The classification stage selects the expression associated with the template that gives the smallest χ² value.

χ²(S, M) = Σ_{i,j} w_{i,j} (S_{i,j} − M_{i,j})² / (S_{i,j} + M_{i,j})    (4)

To improve the expression recognition accuracy further, [3, 8] propose feature classification based on Support Vector Machines. In this work, we restrict our focus to the template matching approach, since the reasoning derived from the algorithm mapping experiment can easily be extended to other classification methods.
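A minimal Python sketch of the template-matching classifier of Equation 4, assuming histogram vectors as plain arrays; the guard for empty bins (where S_{i,j} + M_{i,j} = 0) is our own convention, not specified in the paper.

```python
import numpy as np

def weighted_chi_square(S, M, w):
    """Weighted chi-square distance (Equation 4) between a measured feature
    vector S and a template M; w accentuates feature-rich regions."""
    S, M, w = (np.asarray(a, dtype=float) for a in (S, M, w))
    denom = S + M
    safe = np.where(denom > 0, denom, 1.0)      # avoid 0/0 for empty bins
    return float(np.sum(np.where(denom > 0, w * (S - M) ** 2 / safe, 0.0)))

def classify(S, templates, w):
    """Select the expression whose template gives the smallest chi-square."""
    return min(templates, key=lambda label: weighted_chi_square(S, templates[label], w))
```

For example, a feature vector identical to one of the templates has distance zero to it and is assigned that label.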

3. LOW-POWER SMART WIRELESS CAMERA ARCHITECTURE

Smart cameras have been proposed over the years for real-time vision applications. While early smart vision systems relied on PC-based video processing, the availability of high-performance Digital Signal Processors (DSPs) and vision processors has led to true smart cameras capable of locally interpreting video scenes and generating high-level information. The motivation behind smart cameras includes cost and size reduction compared to PC-based systems, ease of deployment, and low power operation due to the reduction in video transfer.

Equipped with high-performance embedded processors and image sensors, current mobile devices (e.g., phones) are effectively becoming smart cameras, allowing image and video processing at lower frame rates and resolutions. In [12], the feasibility of mapping an LBP based face recognition algorithm on a mobile phone equipped with an ARM9 processor is described.

Functional blocks of a typical smart camera architecture are depicted in Figure 4. The processing power on the smart camera is partitioned into an image processing block for the compute-intensive part and a micro-processor for handling high-level decision making algorithms.

[Figure: image sensor connected via interface and glue logic to an image processor, memory, micro-processor, power management, and communication (radio) blocks.]

Fig. 4. Camera functional block diagram

3.1. The WiCa Wireless Smart Camera Platform

Figure 5 shows a photo of the 2nd-generation wireless smart camera called WiCa [6]. The camera is capable of processing stereo images from the two image sensors and is equipped with a massively-parallel processor (Xetal-II [13]) for low-level and intermediate vision processing and an 8051 micro-controller for high-level decision making and camera control tasks. The two processors communicate via an SRAM, and decision results can be communicated wirelessly to other nodes via a ZigBee radio. A low-power Complex Programmable Logic Device (CPLD) is used to realize interface and glue-logic functionalities between the different components.

3.2. Xetal-II Parallel Processor

Top-level architecture of Xetal-II is shown in Figure 6. It is a massively-parallel single instruction multiple data (MP-SIMD) processor realized using 320 processing elements (PEs) organized as a linear processing array (LPA). A total of 10 Mbit of on-chip frame memory is allocated to the LPA to facilitate frame-iterative video processing tasks. Video interfacing to the chip is provided via a data input processor (DIP) and a data output processor (DOP). Both interfaces provide three independent video channels, each with 10-bit resolution and carrying up to 80 Mpixels/s per channel. Serial-to-parallel and parallel-to-serial stream conversion is done via two 320-element-wide sequential line memories. A global control processor (GCP) manages operation of the IC and provides sequential data processing capability.

[Figure: the WiCa board, showing the XETAL-II processor, SRAM, 8051 micro-controller, ZigBee radio, two image sensors, and CPLD.]

Fig. 5. Smart Stereo Wireless Camera (WiCa1.2.1)

[Figure: Xetal-II block diagram with the global control processor (program memory 16k × 56b, data memory 2k × 16b) and its I2C/GPI/GPO interfaces, the 320-PE linear processor array, sequential I/O memory (2 lines × 320 pixels), frame memory (2048 lines × 320 pixels), and the DIP/DOP video-in/video-out interfaces with input and output/LUT memories (240 kb each).]

Fig. 6. Xetal-II top-level architecture

When operating at 110 MHz, Xetal-II provides up to 140 GOPS of compute power while dissipating less than 785 mW, resulting in a computational efficiency of 177 GOPS/Watt. This high performance figure originates from the good match between architectural and data-level parallelism, which is common in low-level image processing tasks like LBP computation. Similar performance figures have been reported in recent vision processor designs [14, 15]. Xetal-II is realized in 90 nm CMOS technology and occupies 74 mm², a big part of which is allocated to the 10 Mbit video memory. A detailed architectural description of the IC can be found in [13].

4. ALGORITHM MAPPING

In this section, we address issues related to mapping the LBP-based facial expression recognition algorithm on the WiCa platform. We investigate a couple of task partitioning options and examine how effectively the two compute resources, i.e., Xetal-II and the 8051 micro-controller, are utilized. The goal is to find a partition in which the overall operation leads to low power consumption.

Although we address the algorithms for expression recognition, it should also be noted that there is an earlier phase in which detection of face(s) is done to determine the region-of-interest. In relation to the chosen smart camera platform, this topic has been addressed in [16], where real-time face detection is demonstrated using a facial feature detection algorithm running mainly on the MP-SIMD processor.

4.1. LBP Computation

Since the LBP operations given in Equations 1, 2 and 3 are executed on every pixel in the region-of-interest, they are naturally suited for parallel implementation. The code segment in Table 1 shows how the LBP computation can be coded in the XTC language of Xetal-II. The three operations, i.e., bi-linear interpolation, comparison and binary weighting, are shown, which together generate a decimal LBP code from the 8 neighbors of a pixel.

The .left and .right operators are used to access the left and right neighbors of the pivot pixel cr_o, respectively. As shown in Figure 2, a total of 5 image lines (ff, fs, cr, pf, ps) are involved in computing LBP(8,2). The extensions _o and _e are used to access the odd and even pixels of an image line.

In principle, since the LBP generation can be performed on a line basis, the amount of on-chip storage required for this algorithm is just the LBP row-span, i.e., 5 rows of VGA lines (640 pixels). This is equivalent to 10 Xetal-II line memories of 320 pixels each, which is only 0.5% of the available on-chip memory, implying that there is sufficient room for accommodating pre-processing (such as face detection) and post-processing (histogram generation and feature classification).
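The storage claim can be checked with a quick back-of-the-envelope calculation; the assumption of one 16-bit memory word per pixel is ours, chosen to make the arithmetic land at the quoted 0.5% of the 10 Mbit frame memory.

```python
# Back-of-the-envelope check of the on-chip storage claim for LBP(8,2).
rows, vga_width = 5, 640            # LBP(8,2) row-span on VGA lines
bits_per_pixel = 16                 # assumed line-memory word width

needed = rows * vga_width * bits_per_pixel          # bits of storage needed
available = 10 * 1024 * 1024                        # 10 Mbit frame memory
line_memories = (rows * vga_width) // 320           # 320-pixel Xetal-II lines

print(line_memories, round(100 * needed / available, 2))
```

Ten 320-pixel line memories hold the five VGA lines, and 51200 bits is indeed roughly 0.5% of 10 Mbit.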

4.2. Spatially-Enhanced Histogram Generation

4.2.1. LBP to Bin Conversion

This is the first step, in which the LBP code at each pixel location is transformed into one of the 58 bins representing uniform codes or lumped into a bin for the non-uniform codes. This could be realized using a look-up-table (LUT) in which the LBP code is used as an address to a table filled with bin indexes. With regard to the WiCa platform, the following three implementations are feasible: (i) using a bin table of 58 entries in the GCP data memory; (ii) using the LUT memory in the data output processor (DOP); (iii) off-loading the LUT computation to the sequential (8051) processor. We discuss the first option further to see how well histogram computation could be done by combined GCP-LPA operation on the MP-SIMD processor.

Table 1. LBP computation on a VGA size image line.
------------------------------------------------
void lbpo()
{
  lbpo = (((ff_o.left*0.1716 + ff_e.left*0.2426 +
            fs_o.left*0.2426 + fs_e.left*0.3431) >= cr_o) ? 128 : 0) +
         ((  ff_o              >= cr_o) ?  64 : 0) +
         (((fs_o.right*0.2426 + fs_e*0.3431 +
            ff_e*0.2426       + ff_o.right*0.1716) >= cr_o) ? 32 : 0) +
         ((  cr_o.right        >= cr_o) ?  16 : 0) +
         (((ps_e*0.2426       + pf_e*0.3431 +
            ps_o.right*0.1716 + pf_o.right*0.2426) >= cr_o) ?  8 : 0) +
         ((  ps_o              >= cr_o) ?   4 : 0) +
         (((ps_o.left*0.1716  + ps_e.left*0.2426 +
            pf_o.left*0.2426  + pf_e.left*0.3431) >= cr_o) ?  2 : 0) +
         ((  cr_o.left         >= cr_o) ?   1 : 0);
}
------------------------------------------------

4.2.2. Row-wise Aggregation

In this phase, each LBP line is compared with the 58 uniform bins, and every time the LBP code matches a bin, the corresponding histogram entry is incremented. For this purpose, one Xetal-II line is reserved per bin to keep the histogram count. Since the feature vector construction requires a separate histogram per region, in total 406 lines are needed to hold histograms for all the 7 horizontal partitions of the region-of-interest. The storage requirement can be reduced at the expense of more computation if the column-wise aggregation is done right after the row-wise aggregation per horizontal partition.

4.2.3. Column-wise Aggregation

Here, the objective is to cluster the bin contributions of all the columns of a region. Since each region has a horizontal span of 18 pixels, which are folded into 9 columns of odd and even pixels, the column-wise aggregation effectively has to collect pixels over 9 columns. The contribution of odd and even pixels is already merged during the row-wise aggregation.

Due to the limited neighbor connectivity of the Xetal-II processing elements, i.e., direct left and right neighbors only, column-wise operations are in general executed less efficiently. However, this can be managed to a certain extent by writing instructions that take into account the hardware connectivity and pipeline latency.

4.2.4. Sequential Histogram Implementation

This alternative shifts part or all of the histogram computation to a sequential processor, in the WiCa case the 8051 micro-controller. By making use of the LUT in the DOP module of Xetal-II, the LBP code can be transformed into one of the 59 histogram bins as it streams out. In this case, the task of the sequential processor becomes building separate histograms for the 42 partitions of the region-of-interest. Since data streams out of Xetal-II line by line, the sequential processor needs to identify the boundaries of the 21 × 18 LBP values per partition. Another option is to stream out the result of the row-wise aggregation, so that the less SIMD-friendly part of column aggregation is done on the 8051 core.
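Functionally, the sequential processor's bookkeeping amounts to slicing the streamed bin-index image into the 42 regions and histogramming each; the following NumPy sketch (function name and array layout are our own) illustrates the result such an implementation should produce.

```python
import numpy as np

def region_histograms(bin_image, region_h=21, region_w=18, n_bins=59):
    """Build one 59-bin histogram per 21x18 region of a bin-index image and
    concatenate them into the spatially-enhanced feature vector."""
    H, W = bin_image.shape
    feats = []
    for r0 in range(0, H, region_h):           # 7 row partitions for H = 147
        for c0 in range(0, W, region_w):       # 6 column partitions for W = 108
            region = bin_image[r0:r0 + region_h, c0:c0 + region_w]
            feats.append(np.bincount(region.ravel(), minlength=n_bins))
    return np.concatenate(feats)
```

For the 147×108 region-of-interest used in the experiments, this yields the 42 × 59 = 2478-element feature vector.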

4.3. Feature Classification

In principle, template matching based feature classification using the weighted χ² distance metric (see Equation 4) can be realized on the parallel processing hardware, which makes sense especially when the histogram calculation is also done on the parallel engine.

Assuming a LUT-based division operation, evaluating Equation 4 for a feature vector of length 2478 involves around 12390 basic arithmetic and memory operations. When checking for 7 facial expression classes at a video speed of 25 frames per second, the raw computational load amounts to about 2.2 MOPS. Depending on the compute power available, mapping on the sequential processor could be a possibility.
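The load estimate follows from simple arithmetic; the cost model of five basic operations per vector element is our reading of the 12390-operation figure for a 2478-element vector.

```python
# Rough operation count behind the 2.2 MOPS figure for Equation 4.
vector_length = 7 * 6 * 59        # spatially-enhanced histogram, 2478 entries
ops_per_element = 5               # subtract, square, weight, LUT-divide, accumulate (assumed)
ops_per_match = vector_length * ops_per_element

classes, fps = 7, 25              # 7 expression templates at 25 frames/s
load_mops = ops_per_match * classes * fps / 1e6
print(ops_per_match, round(load_mops, 2))
```

This reproduces the roughly 12390 operations per template match and about 2.2 MOPS of sustained load.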

5. EXPERIMENTAL RESULTS

Table 2 gives a summary of the execution cycle count for different parts of the LBP-based facial expression recognition algorithm on a region-of-interest of 108-by-147 pixels. Due to the limited compute power available in the micro-controller, all tasks other than the final template matching are mapped on the SIMD processor. The histogram computation task is split into two: Hist(1), where LBP code to histogram bin conversion and row-wise aggregation are done, and Hist(2), where column-wise aggregation is done to build the final feature vector.

While LBP computation maps efficiently on Xetal-II due to its high data-level parallelism, both histogram computation tasks are executed less efficiently. Due to the search over 58 bins during the mapping phase, Hist(1) takes up a large number of clock cycles. The column on array utilization gives the percentage of PEs that perform a useful task, which is rather low during the histogram computation phases. The figure obviously increases when multiple ROIs are handled in parallel. Related to the array utilization, the column on compute performance gives a measure of the useful computations delivered by the SIMD processor during execution of each task.

The last column gives an indication of the power consumed by the respective processor when working on the assigned tasks. In the case of Xetal-II, a total of 56588 execution cycles in one frame time (1/25 s = 40 ms) amounts to an average operation frequency of 1.41 MHz. When combined with a compute efficiency of 785 mW/110 MHz = 7.14 mW/MHz for the processor, this gives an average power dissipation of about 10 mW. It should be noted that the dissipation will be higher when a face detection task is also included. Since a large portion of the chip area is taken up by low-leakage frame memories, the idle power of Xetal-II is limited to about 1.7 mW.
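The power estimate can be reproduced directly from the numbers given above:

```python
# Reproduce the Xetal-II average-power estimate from the cycle counts.
cycles_per_frame = 56588
frame_time_s = 1 / 25                                  # 40 ms at 25 fps
avg_freq_mhz = cycles_per_frame / frame_time_s / 1e6   # average clock demand
mw_per_mhz = 785 / 110                                 # compute efficiency
avg_power_mw = avg_freq_mhz * mw_per_mhz
print(round(avg_freq_mhz, 2), round(avg_power_mw, 1))
```

This gives 1.41 MHz of average clock demand and about 10 mW of average dissipation, consistent with the text.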

The weighted χ² based template matching was mapped onto the 8051 micro-controller. The cycle count obtained from a simulator when matching the generated feature vector with 7 expression templates is around 29 MCycles, which translates into 1.2 seconds of processing time at an operating frequency of 24 MHz. The large cycle count is a consequence of the 8051 processor architecture, which has a simple 8-bit datapath and instructions that cost 12 or 24 clock cycles. Replacing the micro-controller with a high-performance general-purpose processor would improve the performance. For example, simulation on an ARM9E family processor with 32 kB data cache running at 100 MHz shows a processing time of 9 ms for the template matching task, which is well within a frame interval of 40 ms.

Table 2. Execution profile of facial expression recognition on the WiCa platform.

Processor | Algorithm (ROI)   | Cycle Count | Array Utiliz. [%] | Perf. @ 25 fps [MOPS] | Aver. Power [mW]
Xetal-II  | LBP               | 15582       | 17                | 22                    |
          | Hist (1)          | 34104       | 1.8               | 5                     |
          | Hist (2)          | 6902        | 1.8               | 1                     |
          | Total             | 56588       | 6                 | 32                    | 12
8051      | Template Matching | 29.1e6      | –                 | –                     | 73

6. CONCLUSIONS

In this paper, the issue of mapping facial expression recognition algorithms on a low-power smart camera platform is addressed. A recognition algorithm based on Local Binary Patterns (LBP) and a spatially-enhanced histogram has been investigated to determine how efficiently it maps on a platform with a massively-parallel single-instruction multiple-data (SIMD) processor (Xetal-II) and a general-purpose micro-controller (8051 family). Due to its pixel-parallel operation, the LBP algorithm maps well on the SIMD processor. However, with respect to the spatially-enhanced histogram computation, the SIMD processor utilization is about 1.8%, indicating the sequential nature of the algorithm. For the final phase, i.e., feature classification, a template matching algorithm based on the weighted χ² metric is considered. Since mapping this algorithm on the 8051 micro-controller reduces the achievable frame rate, upgrading to a high-performance general-purpose processor would be necessary.

7. ACKNOWLEDGMENTS

We would like to thank R. Kleihorst and B. Schueler of NXP Research for support regarding the WiCa smart camera platform.

8. REFERENCES

[1] Y. Tian, T. Kanade, and J.F. Cohn, "Recognizing action units for facial expression analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, 2001.

[2] Z. Zhang, M.J. Lyons, M. Schuster, and S. Akamatsu, "Comparison between geometry-based and Gabor-wavelets-based facial expression recognition using multi-layer perceptron," in Proceedings of the International Conference on Automatic Face and Gesture Recognition, April 1998.

[3] C. Shan, S. Gong, and P.W. McOwan, "Robust facial expression recognition using local binary patterns," in Proceedings of the International Conference on Image Processing (ICIP), Sept 2005.

[4] W. Wolf, B. Ozer, and T. Lv, "Smart camera as embedded system," IEEE Computer Magazine, vol. 35, pp. 48–53, 2002.

[5] R. Kleihorst, M. Reuvers, B. Krose, and H. Broers, "A smart camera for face recognition," in Proceedings of the International Conference on Image Processing (ICIP), 2004, pp. 2849–2852.

[6] R. Kleihorst, A.A. Abbo, B. Schueler, and A. Danilin, "Camera mote with a high-performance parallel processor for real-time frame-based video processing," in Proceedings of the International Conference on Distributed Camera Systems (ICDSC), Sep 2007.

[7] Y.M. Mustafah, A.W. Azman, A. Bigdeli, and B.C. Lovell, "An automated face recognition system for intelligence surveillance: Smart camera recognizing faces in the crowd," in Proceedings of the International Conference on Distributed Camera Systems (ICDSC), Sep 2007.

[8] V. Jeanne, C. Shan, T. Gritti, R. van Leeuwen, R. Braspenning, and G. de Haan, "Improved facial expression recognition using local binary patterns with confidence measures," submitted to ECCV 2008, 2008.

[9] T. Ahonen, A. Hadid, and M. Pietikäinen, "Face description with local binary patterns: Application to face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 2037–2041, 2006.

[10] G. Littlewort, M.S. Bartlett, I. Fasel, J. Susskind, and J. Movellan, "Dynamics of facial expression extracted automatically from video," Image and Vision Computing Journal, vol. 24, pp. 615–625, 2006.

[11] T. Maenpaa, T. Ojala, M. Pietikäinen, and M. Soriano, "Robust texture classification by subsets of local binary patterns," in Proc. of 15th International Conference on Pattern Recognition, 2000, vol. 3, pp. 935–938.

[12] A. Hadid, J.Y. Heikkila, O. Silven, and M. Pietikäinen, "Face and eye detection for person authentication in mobile phones," in Proceedings of the International Conference on Distributed Camera Systems (ICDSC), Sep 2007.

[13] A.A. Abbo et al., "Xetal-II: A 107 GOPS, 600 mW massively parallel processor for video scene analysis," IEEE Journal of Solid-State Circuits, vol. 43, no. 1, January 2008.

[14] S. Arakawa et al., "A 512 GOPS fully-programmable digital image processor with full HD 1080p processing capabilities," in ISSCC Dig. of Tech. Papers, Feb 2008, pp. 312–313.

[15] C. Cheng et al., "An intelligent visual sensor SoC with 2790 fps CMOS image sensor and 205 GOPS/W vision processor," in ISSCC Dig. of Tech. Papers, Feb 2008, pp. 306–307.

[16] V. Jeanne, F. Jegaden, R. Kleihorst, A. Danilin, and B. Schueler, "Real-time face detection on a dual-sensor smart camera using smooth edges technique," in Workshop on Distributed Smart Cameras, October 2006.