
Transcript of Real-time face location on gray-scale static images

*Corresponding author. Tel.: +39-(051)-2093547; fax: +39-(051)-2093540.

E-mail address: [email protected] (D. Maio).

Pattern Recognition 33 (2000) 1525–1539

Real-time face location on gray-scale static images

Dario Maio a,b,*, Davide Maltoni a,b

a Corso di Laurea in Scienze dell'Informazione, Università di Bologna, via Sacchi 3, 47023 Cesena, Italy
b DEIS - CSITE-CNR, Facoltà di Ingegneria, Università di Bologna, viale Risorgimento 2, 40136 Bologna, Italy

Received 14 October 1998; accepted 19 May 1999

Abstract

This work presents a new approach to automatic face location on gray-scale static images with complex backgrounds. In a first stage our technique approximately detects the image positions where the probability of finding a face is high; during the second stage the location accuracy of the candidate faces is improved and their existence is verified. The experimentation shows that the algorithm performs very well both in terms of detection rate (just one missed detection on 70 images) and of efficiency (about 13 images/s can be processed on an Intel Pentium II 266 MHz). © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Face location; Directional image; Generalized Hough transform; Ellipse detection; Elliptical fitting; Deformable template

1. Introduction

Automatic face location is a very important task which constitutes the first step of a large area of applications: face recognition, face retrieval by similarity, face tracking, surveillance, etc. (e.g. Ref. [1]). In the opinion of many researchers, face location is the most critical step towards the development of practical face-based biometric systems, since its accuracy and efficiency have a direct impact on the system usability. Several factors contribute to making this task very complex, especially for applications that must operate in real time on gray-scale static images. Complex backgrounds, illumination changes, pose and expression changes, head rotation in 3D space and different distances between the subject and the camera are the main sources of difficulty.

Many face-location approaches have been proposed in the literature, depending on the type of images (gray-scale images, color images or image sequences) and on the constraints considered (simple or complex background, scale and rotation changes, different illuminations, etc.). Giving a brief summary of the conspicuous number of works requires a pre-classification; unfortunately, due to the large number of different techniques used by researchers, this is not an easy task. While we are aware of the unavoidable inaccuracies, we have tried to make a tentative classification:

• Methods based on template matching with static masks and heuristic algorithms which use images taken at different resolutions (multiresolution approaches) [2,3].

• Computational approaches based on deformable templates which characterize the human face [4] or internal features [5–8]: eyes, nose, mouth. These methods can be conceived as an evolution of the previous class, since the templates can be adapted to the different shapes characterizing the searched objects. The templates are generally defined in terms of geometric primitives like lines, polygons, circles and arcs; a fitness criterion is employed to determine the degree of matching.

• Face and facial parts detection using dynamic contours or snakes [6,9–11]. These techniques involve a constrained global optimization, which usually gives very accurate results but at the same time is computationally expensive.

• Methods based on elliptical approximation and on face searching via least-squares minimization [12], incremental ellipse fitting [13] and elliptic region growing [14].

• Approaches based on the Hough transform [7] and the adaptive Hough transform [15].

• Methods based on the search for a significant group of features (triplets, constellations, etc.) in the context considered: for example, two eyes and a mouth suitably located constitute a significant group in the context of a face [7,16–19].

• Face search on the eigenspace determined via PCA [20] and face location approaches based on information theory [21,22].

• Neural network approaches [23–30]. The best results have been obtained by using feed-forward networks to classify image portions normalized with respect to scale and illumination. During training, examples of face objects and non-face objects are presented to the network. The high computational cost, induced by the need to process all the possible positions of a face in the image at different resolutions, is the main drawback of these methods.

• Face location on color images through segmentation in a color space: YIQ, YES, HSI, HSV, Farnsworth, etc. [27,31–36]. Generally, color information greatly simplifies the localization task: a simple spectrographic analysis shows that the face skin pixels are usually clustered in a color space, and then an ad hoc segmentation allows the face to be isolated from the background or, at least, the amount of information which must be processed during the successive stages to be drastically reduced.

• Face detection on image sequences using motion information: optical flow, spatio-temporal gradient, etc. [27,33,37].

Since in several applications it is mandatory (or preferable) to deal with static gray-scale images, we believe it is important to develop a method which does not exploit additional information like color and motion. For example, most of the surveillance cameras nowadays installed in shops, banks and airports are still gray-scale cameras (due to their lower cost), and the electronic processing of mug-shot or identity-card databases may require detecting faces from static gray-scale pictures printed on paper. Unfortunately, if we discard color- and motion-based approaches, the most robust methods are generally time consuming and cannot be used in real-time applications.

The aim of this work is to provide a new method which is capable of processing gray-scale static images in real time. The algorithm must operate with structured backgrounds and must tolerate illumination changes, scale variations and small head rotations.

Our approach (Fig. 1(a)) is based on a location technique which starts by approximately detecting the image positions (or candidate positions) where the probability of finding a face is high (module AL) and then, for each of them, improves the location accuracy and verifies the presence of a true face (module FLFV). Actually, most of the applications in the field of biometric systems require the detection of just one object in the image (i.e. the foreground object): under this hypothesis, a more efficient implementation of our method is reported in Fig. 1(b), where at each step the module AL passes only the most likely position to FLFV, and FLFV continues to request a new position until a valid face is detected or no more candidates are available. It should be noted that, even in this case, the system could be used to detect more faces in an image, assuming that the iterative process is not prematurely interrupted.

Fig. 1. Two different functional schemes of our approach.

Fig. 2. Two images and the corresponding directional images. The vector lengths are proportional to their moduli.

Although AL and FLFV have been implemented in very different manners, both modules work on the same kind of data, that is, the directional image extracted from the starting gray-scale image.

In Section 2 the directional image is defined and some comments about its computation are reported. Section 3 describes the module AL, which is based on the search for elliptical blobs in the directional image by means of the generalized Hough transform. In Section 4 we present the dynamic-mask-based technique used for fine location and face verification (module FLFV), and in Section 5 we discuss how to practically combine AL and FLFV in order to implement the functional schema of Fig. 1(b). Section 6 reports the results of our experimentation over a 70-image database; finally, in Section 7, we present our conclusions and discuss future research.

2. Directional image

Most of the face location approaches perform an initial edge extraction by means of a gradient-like operator; few methods also exploit other additional features like directional information, intensity maxima and minima, etc. Our technique strongly relies on the edge phase angles contained in a directional image.

A directional image is a matrix defined over a discrete grid, superimposed on the gray-scale image, whose elements are in correspondence with the grid nodes. Each element is a vector lying on the xy plane. The vector direction¹ represents the tangent to the image edges in a neighborhood of the node, and its modulus is determined as a weighted sum of the contrast (edge strength) and the consistency (direction reliability) (Fig. 2). In this work the directional image is computed by means of the method proposed by Donahue and Rokhlin [38]. Each directional image element is calculated over a local window where a gradient-type operator is employed to extract several directional estimates (2D sub-vectors), which are averaged by least-squares minimization to control noise. This technique is more robust than the standard operators used for computing the gradient phase angle and enables the contrast and the consistency to be calculated with a very small overhead. In particular, the contrast is derived from the magnitude of the 2D sub-vectors, and the consistency is in inverse proportion to the residual resulting from the least-squares minimization [38] (in fact, a high reliability corresponds to a low residual, that is, all the 2D sub-vectors are nearly parallel).

¹ The vector direction is unoriented and lies in the range [−90°, +90°[.

Fig. 3. The template T (in dark gray) is constituted by those points which are possible centers of ellipses capable of originating, in [x0, y0], a vector d with direction u.
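As an illustration, the following Python sketch computes a directional image in the spirit described above. It is our approximation, not the paper's code: it replaces the exact least-squares formulation of Donahue and Rokhlin [38] with the classic averaged squared-gradient estimator, and the window size and the equal weighting of contrast and consistency are assumptions.

import numpy as np

def directional_image(gray, step=7, win=7):
    # Hedged sketch: gray is a 2D numpy array of intensities. The paper
    # uses Donahue-Rokhlin least squares; the averaged squared gradient
    # below is a common approximation yielding the same three quantities
    # (unoriented direction, contrast, consistency).
    gy, gx = np.gradient(gray.astype(float))
    vx = gx * gx - gy * gy          # squared-gradient vector: doubling the
    vy = 2.0 * gx * gy              # angle makes opposite gradients reinforce
    mag = np.hypot(gx, gy)
    h, w = gray.shape
    D = []
    for y in range(win // 2, h - win // 2, step):
        row = []
        for x in range(win // 2, w - win // 2, step):
            sl = (slice(y - win // 2, y + win // 2 + 1),
                  slice(x - win // 2, x + win // 2 + 1))
            sx, sy = vx[sl].sum(), vy[sl].sum()
            contrast = mag[sl].mean()
            norm = (mag[sl] ** 2).sum()
            # Consistency: 1 when all sub-vectors are parallel, 0 when random.
            consistency = np.hypot(sx, sy) / norm if norm > 0 else 0.0
            theta = 0.5 * np.arctan2(sy, sx) + np.pi / 2     # edge tangent
            theta = (theta + np.pi / 2) % np.pi - np.pi / 2  # wrap to [-90, 90)
            # 50/50 weighting of contrast and consistency is our assumption.
            row.append((theta, 0.5 * contrast + 0.5 * consistency))
        D.append(row)
    return D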

3. AL - approximate location

The analysis of a certain number of directional images suggested the formulation of a simple method for detecting faces. In particular, we noted that when a face is present in an image, the corresponding directional image region is characterized by vectors producing an elliptical blob. For this reason, the module AL is based on the search for ellipses in the directional image. Several techniques could be used for this purpose, for example multiresolution template matching [39] and least-squares ellipse fitting [12]. We adopted a new approach based on the generalized Hough transform [40] which performs very well in terms of efficiency. The method is capable of detecting the approximate position of all the ellipses within a certain range of variation defined according to pre-fixed scale and rotation changes. The basic idea is to perform a generalized Hough transform by using an elliptical annulus C as template. Actually, the directional information allows the transform to be implemented very efficiently, since the template used for updating the accumulator array can be reduced to only two sectors of the elliptical annulus.

Formally, let a and b be the lengths of the semi-axes of an ellipse used as reference, and let ρ_r and ρ_e be, respectively, the reduction and expansion coefficients defining the scale range (and hence the elliptical annulus C):

a_min = ρ_r · a,  b_min = ρ_r · b,  a_max = ρ_e · a,  b_max = ρ_e · b

(Fig. 3). Let D be the directional image and let A be the accumulator array; then the algorithm can be sketched as:

Reset A;
∀ vector d ∈ D {
    [x0, y0] = origin(d);
    u = direction(d);
    p = modulus(d);
    T = current_template([x0, y0], u);
    ∀ pixel [x, y] ∈ T {
        A[x, y] = A[x, y] + p · weight_T([x, y]);
    }
}

The high-score A cells are good candidates for ellipse centers.

The direction(d) and the modulus(d) of the directional elements d are calculated off-line as described in Section 2; current_template([x0, y0], u) determines the current template T as a function of the direction u of the vector centered in [x0, y0]. The points [x1, y1] and [x2, y2] in Fig. 3 are the only two points where an ellipse tangent to d in [x0, y0] with semi-axes a, b could be centered. Since we are interested in all the ellipses whose semi-axes are in the ranges [a_min .. a_max] and [b_min .. b_max], we must take into account all the points lying on the two segments determined by the intersection between the straight line defined by [x1, y1], [x2, y2] and the elliptical annulus C. Finally, if we assume a maximum angular variation² θ on the directional information, the geometric locus T of the possible centers becomes:

T = { [x, y] | ρ_r² ≤ ((x − x0)/a)² + ((y − y0)/b)² ≤ ρ_e²  and  Δangle(arctan((y − y0)/(x − x0)), φ) ≤ θ/2 }

where Δangle(α, β): [−90°, +90°[ × [−90°, +90°[ → [0°, 90°] returns the smaller angle determined by the directions α, β; the angle φ can be computed as a function of u by deriving the tangent vector expression from the parametric equation of the ellipse:

φ = arctan( −b / (a · tan(u)) ).

The function weight_T: T → [0, 1] associates, with each point [x, y] of T, a weight which decreases with the angular distance between the straight line defined by [x, y], [x0, y0] and the direction φ:

weight_T(x, y) = 1 − 2 · Δangle(arctan((y − y0)/(x − x0)), φ) / θ.

² The angular variation θ is introduced to compensate for both small ellipse rotations and inaccuracies in the computation of u.

Fig. 4. (a) Graphic representation of an ideal template T, and (b) its corresponding discrete version induced by the discretization of A.

Fig. 4 shows a representation of a template T whose elements are associated with gray levels proportional to their weights (the light pixels within the sectors are associated with larger weights).
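To make the template construction concrete, here is a hedged Python sketch (the function name, the pixel-level grid and the numerical guards are ours, not the paper's) that enumerates the discrete template T for one quantized direction u by applying the annulus constraint and the weight function defined above.

import math

def build_template(u, a=34, b=45, rho_r=0.6, rho_e=1.2, theta=math.radians(30)):
    # Hedged sketch: offsets are expressed in pixels here, whereas the
    # paper works directly on the 7x7-pixel accumulator grid.
    t = math.tan(u)
    # Paper's formula: phi = arctan(-b / (a * tan(u))); a horizontal
    # tangent (tan(u) = 0) puts the center straight above/below.
    phi = math.atan(-b / (a * t)) if abs(t) > 1e-9 else math.pi / 2

    def dangle(alpha, beta):
        # Smaller angle between two unoriented directions, in [0, pi/2].
        d = abs(alpha - beta) % math.pi
        return min(d, math.pi - d)

    cells = []
    r = int(rho_e * max(a, b)) + 1
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            q = (dx / a) ** 2 + (dy / b) ** 2
            if not (rho_r ** 2 <= q <= rho_e ** 2):
                continue                      # outside the elliptical annulus
            alpha = math.atan2(dy, dx)        # direction of the center offset
            if dangle(alpha, phi) > theta / 2:
                continue                      # outside the two angular sectors
            cells.append((dx, dy, 1.0 - 2.0 * dangle(alpha, phi) / theta))
    return cells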

An efficient implementation of the module AL has been obtained by adopting the following tricks:

• The grid which defines the accumulator array A has been set equal to that defining D. In particular, the granularity used is 7×7 pixels, that is, the directional image elements are calculated every 7×7 pixels. Therefore, the size of both the directional image and the accumulator array associated with an X×Y pixel image is ⌊X/7⌋ × ⌊Y/7⌋.

• The directions of the elements in D have been discretized (256 values).

• The templates T have been pre-computed (in our experiments we chose a = 34, b = 45, ρ_r = 0.6, ρ_e = 1.2, θ = 30°); by using relative coordinates with respect to the ellipse center, the number of different templates corresponds to the number of different directions.

• The algorithm has been implemented in integer arithmetic (a sketch of the resulting accumulation loop is given below).
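A minimal sketch of the accumulation loop, assuming the data layout of the previous sketches rather than the paper's actual structures:

def hough_accumulate(D, templates, shape):
    # D: iterable of (x0, y0, k, p) directional elements already expressed
    # on the accumulator grid, where k is the quantized direction index
    # and p the modulus. templates[k]: list of (dx, dy, w) as produced by
    # build_template for direction k (256 entries).
    rows, cols = shape
    A = [[0.0] * cols for _ in range(rows)]
    for (x0, y0, k, p) in D:
        for (dx, dy, w) in templates[k]:
            x, y = x0 + dx, y0 + dy
            if 0 <= x < cols and 0 <= y < rows:
                # The paper uses integer arithmetic; a fixed-point version
                # would scale w by, e.g., 256 and shift back at the end.
                A[y][x] += p * w
    return A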

Fig. 5. An artificial image and the corresponding transform.

Fig. 5 shows an example of ellipse detection as performed by AL: the starting image contains five differently shaped ellipses; the accumulator array resulting from the Hough transform clearly exhibits a local maximum corresponding to each ellipse center.

4. FLFV - fine location and face verification

Different strategies can be adopted in order to improve the location accuracy and to verify whether an elliptical object is really a face or not. Some of the alternatives we explored are reported in the following:

• Improving the ellipse center location through the AHT (Adaptive Hough Transform) [41,42], which requires the granularity of the hot accumulator cells to be gradually refined.

• Local optimization of the center [x_c, y_c], of the semi-axes a and b and of the ellipse tilt angle ξ through a local optimization algorithm (steepest descent, downhill simplex method [43], etc.) which searches for the best-fitting ellipse in the parameter space [x_c, y_c, a, b, ξ].

• Position optimization and face verification through the detection of a symmetry axis [44]; a symmetry axis can be extracted both from the original and the directional image.

Fig. 6. The figure shows the projection, on the vertical axis, of the directional image vectors belonging to the region delimited by the white rectangle. The local minima, generated by the presence of horizontal vectors in the eye and mouth regions, could be used for the registration according to the internal feature positions.

³ Actually, the border region constitutes an exception: in this case we fixed a priori the total number n_b of elements within the region and we created an element every 360/n_b degrees by snapping it to the grid point closest to the border ellipse. In our simulation we chose n_b = 30.

• Projection of the image portion containing the ellipse on the symmetry axis and on its orthogonal one. Several authors [34,35,45,46] demonstrated that these projections are characterized by local intensity minima in the regions corresponding to the eyes and the mouth. The projection method can also be applied to the directional image: in this case local maxima and minima are present in the eye, nose and mouth regions due to the presence of horizontal and vertical vectors (Fig. 6).

In our preliminary work [47] the fine location and face verification sub-tasks were performed sequentially. From the experimental evidence we argued that better results could be achieved by executing both operations simultaneously. In fact, a precise face location cannot always be obtained by using an elliptical template, since particular physiognomies, hairstyles or illuminations sometimes make the face only coarsely elliptical. On the other hand, the reliability of a face verification method (for example the projection approach in Fig. 6) strongly depends on the face location accuracy. Therefore, in order to confer greater robustness on the FLFV module, we developed a global approach which attempts to satisfy the two different aims at the same time.

The face is locally searched in a small portion of the directional image by starting from a candidate position resulting from AL; to this purpose, a mask F, describing the global aspect of a human face and defined in terms of directional elements, is employed. The local search is performed through an orientation-based correlation between F and the D portion of interest. Actually, after the approximate location step, D is locally refined (i.e. recomputed on a 3×3 pixel grid) in a neighborhood of the candidate position according to a multiresolution strategy (in the following, for the sake of exposition, D′ denotes the new directional image portion).

The mask F is defined as a set of directional elements n_i, i = 1, …, n_max, each of which is characterized by an origin, an unoriented direction (in the range [−90°, +90°[) and a modulus. The element origins are determined by superimposing a grid having the same granularity as D′ (3×3) on the template reported in Fig. 7. In particular, an element n_i is defined for each grid point lying within one of the template gray regions (which correspond to the salient face features). Since the template is parametrically defined according to the sizes a and b, whereas the grid granularity is fixed, a different number n_max of elements is created varying a and b.³

As to directions, all the elements within the mouth, eyes and eyebrows regions have horizontal direction, the nose elements have vertical direction and each element belonging to the border region has the direction of the tangent (in that point) to the external ellipse. Each of the seven regions in Fig. 7 has a global modulus which is equally partitioned among its elements. If we consider the common face stereotype, the shape and the position of the nose in Fig. 7 appear a bit unusual; in particular, the nose is too short and moved upward. Actually, this choice is motivated by the fact that in the nostril region the edge directions are not strictly vertical but can be rather chaotic.

Fig. 7. The template used for the construction of the mask F used by the FLFV. The sizes and ratios of the template have been calculated by analyzing some face images and by taking into account the indications reported by other authors [19,35,36]. The region global moduli are: border = 390, mouth = 110, nose = 60, eyes = 35 + 35, eyebrows = 25 + 25.

By discretizing a and b in the ranges [a_min .. a_max] and [b_min .. b_max] it is possible to pre-compute a set F = {F_1, F_2, …, F_m} of static masks (Fig. 8), where the element positions are given in relative coordinates with respect to the mask center. It should be noted that the set of masks in F does not explicitly model face rotation; in fact, since in our current application only small head rotation is allowed, a set of "vertical" masks proved to be adequate; anyway, more in general, F could be expanded to cope with tilted faces by including two or more rotated copies of each mask.

The masks in F allow a correlation degree at each position in D′ to be efficiently computed. Matching directional elements requires an ad hoc correlation (or distance) operator capable of dealing with the discontinuity (−90° ↔ +90°) in the definition of directions (e.g. Ref. [48]). Good results have been obtained in this work with a distance operator defined as an average sum of direction-difference absolute values. The distance between the mask F_i ∈ F and the portion of D′ centered in [x, y] is computed as:

Distance(F_i, D′, [x, y])
{
    Distance = 0; ModuliSum = 0;
    ∀ element n ∈ F_i {
        [x_n, y_n] = origin(n);
        let d ∈ D′ be the element with origin [x, y] + [x_n, y_n];
        u_n = direction(n); p_n = modulus(n);
        u_d = direction(d); p_d = modulus(d);
        Distance = Distance + p_n · p_d · Δangle(u_n, u_d);
        ModuliSum = ModuliSum + p_n · p_d;
    }
    Distance = Distance / ModuliSum;
}
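A hedged Python rendering of this operator (data layout and names are ours; the key point is that the Δangle helper must respect the −90°/+90° discontinuity of unoriented directions):

import math

def delta_angle(u1, u2):
    # Smaller angle between two unoriented directions, in [0, pi/2];
    # correct across the -90/+90 degree discontinuity.
    d = abs(u1 - u2) % math.pi
    return min(d, math.pi - d)

def mask_distance(mask, Dp, x, y):
    # mask: list of (xn, yn, un, pn) directional elements (origin,
    # direction, modulus); Dp: dict mapping origins to (ud, pd) elements
    # of the refined directional image D'. Both are our assumed layouts.
    dist, moduli_sum = 0.0, 0.0
    for (xn, yn, un, pn) in mask:
        d = Dp.get((x + xn, y + yn))
        if d is None:
            continue              # element falls outside the refined window
        ud, pd = d
        dist += pn * pd * delta_angle(un, ud)
        moduli_sum += pn * pd
    return dist / moduli_sum if moduli_sum else float('inf')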

Let [x0, y0] be a candidate position as resulting from the module AL; then FLFV determines the face position [x*, y*] and the semi-axes a* and b* by minimizing the Distance function over a discrete state space:

d_min = min { Distance(F_i, D′, [x, y]) :  F_i ∈ F,  x ∈ [x0 − Δx, x0 + Δx],  y ∈ [y0 − Δy, y0 + Δy] }

where Δx and Δy define a neighborhood of [x0, y0], and a*, b* coincide with the semi-axes of the best-fitting mask. In our simulations we set Δx = Δy = 4 and therefore the total number of states is 12 × 9 × 9 = 972. An efficient implementation (in integer arithmetic) allowed an exhaustive strategy for determining the optimum to be adopted. In the case of a large number of states, the use of less expensive optimization techniques (steepest descent, downhill simplex method, etc.) should be investigated.
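The exhaustive minimization itself is then a triple loop over masks and positions; a sketch under the same assumptions as the mask_distance sketch above:

def flfv_search(masks, Dp, x0, y0, dx=4, dy=4):
    # masks: list of (mask, a, b) triples, one per pre-computed mask size;
    # returns (d_min, x*, y*, a*, b*). With 12 masks and dx = dy = 4 this
    # visits 12 x 9 x 9 = 972 states, as in the paper.
    best = (float('inf'), None, None, None, None)
    for mask, a, b in masks:
        for x in range(x0 - dx, x0 + dx + 1):
            for y in range(y0 - dy, y0 + dy + 1):
                d = mask_distance(mask, Dp, x, y)
                if d < best[0]:
                    best = (d, x, y, a, b)
    return best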

Once the best-fitting position has been determined, the face verification sub-task can be simply performed by comparing the distance d_min with a pre-fixed threshold. Fig. 9 shows the results of the intermediate steps of a face location example.

5. Combining AL and FLFV

Depending on the application requirements, there are several ways of adjusting and combining the AL and FLFV modules. Since at this stage our aim is to develop a method capable of efficiently detecting the foreground face, we adopted the functional schema of Fig. 1(b). In particular, the algorithm searches for just one face in the image; it returns the face position [x_f, y_f] and sizes a_f, b_f in case of detection, and null otherwise. A pseudo-code version of the whole face detection method is reported:

Compute directional image D;
Perform generalized Hough transform;
[x0, y0] = get first candidate position on A;
d_f = ∞;
while ([x0, y0] is not NULL) and (d_f > T1) {
    D′ = refine the directional image in a neighborhood of [x0, y0];
    d_min = min { Distance(F_i, D′, [x, y]) :  F_i ∈ F,  x ∈ [x0 − Δx, x0 + Δx],  y ∈ [y0 − Δy, y0 + Δy] };
    // [x*, y*], a*, b* determine the state corresponding to d_min
    if (d_min < d_f) {
        d_f = d_min; [x_f, y_f] = [x*, y*]; a_f = a*; b_f = b*;
    }
    [x0, y0] = get next candidate position on A;
}
if (d_f < T2) { return [x_f, y_f], a_f, b_f; }
else { return null; }

Fig. 8. A graphical representation of the 12 masks constituting the set F (m = 12) used in our experimentation (the length of the mask elements is proportional to their modulus). It should be noted that, since the region global moduli are constant, the element moduli in the smaller masks are generally larger since their number is lower. The only exception concerns the border region, where the number of elements is constant and therefore their modulus is independent of the mask size.

Two different thresholds T1 and T2 (T1 < T2) are used by the algorithm: T1 stops the iterative search process as soon as a good match has been found (d_f ≤ T1). In case all the candidate positions have been examined (in our experiments the candidate positions are the three best local maxima of A), the iterative process is interrupted, and a valid face is returned if the smallest distance computed is less than T2. Using a pair of thresholds allows a larger number of candidate positions to be analyzed in case the current one is not sufficiently reliable, and at the same time the temporarily discarded faces can be reconsidered if no more-likely candidates have been found.

6. Experimentation

Experimental results have been produced on a database of 70 images, each of which contains at least one human face (Fig. 11). All the images (384×288 pixels, 256 gray levels) were acquired in some offices and laboratories of our department, under different illuminations (sometimes rather critical: backlighting, semidarkness, …) and with the subject at different distances from the camera. In 10 images people wear spectacles. The subjects were required to gaze at the camera. Each of the 70 images was manually labeled by indicating, with a mouse, the eye positions (le and re) and the mouth center (mc) of the face in the foreground (Fig. 10). The following formulae were used to derive the features (center c and semi-axes a, b) of an ellipse approximating a face:

a"1.1 ) DDle-reDD;

b"1.3 ) (DDle-mcDD#DDre-mcDD)/2;

c"[x#, y

#]; mc"[x

.#, y

.#]; x

#"x

.##cos m ) 0.54 ) b,

y#"y

.##sin m ) 0.53 ) b

1532 D. Maio, D. Maltoni / Pattern Recognition 33 (2000) 1525}1539

Fig. 9. The results of the intermediate steps in a face location example: directional image computation, generalized Hough transformand selection of the best candidate position (module AL), directional image local re"ning, determination of the best-"tting mask andposition (module FLFV).

Fig. 10. The geometric construction used for de"ning the face ellipse (c, a, b) starting from the eye centers le, re and the mouth center mc.

where m"(a#b)/2, with a"angle(ec, mc) andb"angle(el, mc)

Module AL: The approximate location module detected the correct face position as the global maximum of A (i.e. the first candidate position) in 65 cases (92.86%). In the remaining five images the face position was detected as the second or third candidate position. Fig. 12 shows some images and the corresponding transforms, where the global maximum determined by the elliptical face shape is well visible (an ellipse of semi-axes a and b was superimposed on each maximum). Fig. 13 reports an example where the transform global maximum does not coincide with the face position, but is determined by a different circular object (an apple within a poster). This false alarm, as can be seen in Fig. 14 (first row, third column), is removed by the module FLFV.

Fig. 11. The database used in our experimentation.

Fig. 13. A false alarm caused by an elliptical object in the scene. In any case, the local maximum determined by the face is well evident and constitutes the second-best choice.

The whole approach (AL + FLFV): The whole face detection algorithm has been applied to the 70 images: in 69 cases the foreground face was correctly detected; only in one case no face was found. Eleven false alarms (in 7 images) generated by AL were correctly discarded by FLFV: if we consider the 69 images, we can conclude that on average about 0.16 false alarms per image are processed and correctly discarded by FLFV before the correct face position is detected.


Fig. 12. Some images and the corresponding transforms as computed by the module AL.


Fig. 14. Some examples of face location. The thin-border ellipses denote AL outputs, whereas the best-fitting masks are reported as FLFV outputs.


Fig. 15. The only missed detection error on our database: none of the three candidate positions was accepted by the module FLFV; the upper candidate position is not close enough to the face to allow FLFV to converge.

The location accuracy was measured by considering, for the 69 faces correctly detected, the percentage error on the center position (c_err) and on the semi-axes lengths (a_err, b_err). Let c, a and b be the center and the semi-axes of the foreground face in an image (as manually indicated during the database labeling) and let [x_f, y_f], a_f and b_f be the center and the semi-axes returned by the algorithm; then:

c_err = ||[x_f, y_f] − c|| / (a + b)
a_err = |a_f − a| / a
b_err = |b_f − b| / b

The average values obtained (c_err = 8.70%, a_err = 8.66%, b_err = 10.47%) denote a good location accuracy. It is worth remarking that a face cannot be exactly described by an ellipse using the geometric model of Fig. 10, and therefore the measured errors should not be exclusively attributed to the detection algorithm. In fact, a qualitative analysis of the results confirmed that in most cases the mask positioned by the module FLFV perfectly fits the underlying face, especially in the eye and mouth regions.

Fig. 14 shows some examples; the thin-border ellipses denote AL outputs; the false alarms produced by AL can be easily noted since no masks are associated with the corresponding ellipses. Fig. 15 shows the only image which produced a missed detection error, probably due to the lateral illumination which hides a significant part of the face, thus inducing the module AL to poorly estimate the face position.

The method discussed in this paper is capable of processing a 384×288 gray-scale image in 0.078 s on an Intel Pentium II 266 MHz: in particular, 0.031 s are necessary for the directional image computation, 0.007 s for the Hough transform, 0.015 s for the directional image refining and 0.019 s for the final template matching. By increasing the last two terms by 16% (in order to take into account the average number of false alarms generated by AL) and by summing all the contributions, we obtain an average processing time of 0.031 + 0.007 + 1.16 × (0.015 + 0.019) ≈ 0.078 s, which corresponds approximately to 13 frames per second. We disregard the computation of the 256 templates for the Hough transform and the computation of the masks in F, since they are performed off-line in 0.086 s.

7. Conclusions

This work proposes a two-stage approach to face location on gray-scale static images with complex backgrounds. Both modules operate on the elements constituting the directional image, which has proved to be very effective in providing reliable information even in the presence of critical illumination and semidarkness.

The approximate location module searches for the most likely positions in the image by means of a particular implementation of the generalized Hough transform. A great computational saving is obtained with respect to a correlation-based technique. In fact, in the former case just one directional image scan allows the "hot" positions to be extracted, whereas in the latter several elliptical templates (resembling the different elliptical shapes and sizes) should be shifted everywhere on the directional image to discover the high-correlation points. Numerically, if n_T is the average number of cells updated by the current template T and n_D is the number of elements in D (in our implementation n_T ≈ 20 and n_D ≈ 2000), O(n_T · n_D) operations are necessary for calculating the Hough transform. If we assume that n_T also denotes the average number of elements in a hypothetical elliptical template, then O(m · n_T · n_D) operations are necessary for the template matching with m templates. According to our parameter choices, a reasonable value for m is 12 (please refer to Fig. 8); hence the GHT implementation gives a considerable saving (roughly a factor of m) with respect to a correlation-based approach.

The "ne location and face veri"cation module analysessmall-re"ned portions of directional image attempting todetect the exact position and size of a face together witha con"dence value about its presence. This is performedby means of a set of masks resembling the human facestereotype; since the masks are de"ned in terms of direc-tions, their matching is robust and very little biased bylight conditioning.

Very good results have been achieved both in terms of face detection rate (just one missed detection on 70 images) and efficiency (about 13 images/s can be processed on an Intel Pentium II 266 MHz running Windows 95™). Furthermore, a more efficient version could be obtained by applying appropriate code optimization.

As to future research, we are going to investigate how to combine the modules AL and FLFV with predictive analysis techniques (for example Kalman filtering) in the context of face tracking applications. Furthermore, we are gathering a larger database of images which will allow us to characterize the performance of our approach more precisely and to draw a ROC curve showing the false detection/missed detection tradeoff.

References

[1] R. Chellappa, S. Sirohey, C.L. Wilson, C.S. Barnes, Human and machine recognition of faces: a survey, Tech. Report CS-TR-3339, Computer Vision Laboratory, University of Maryland, 1994.

[2] I. Craw, H. Ellis, J.R. Lishman, Automatic extraction of face-features, Pattern Recognition Lett. 5 (1987) 183–187.

[3] G. Yang, T.S. Huang, Human face detection in a complex background, Pattern Recognition 27 (1994) 53–63.

[4] I. Craw, D. Tock, A. Bennet, Finding face features, Proceedings of ECCV, 1992.

[5] A. Yuille, D. Cohen, P. Hallinan, Facial features extraction by deformable templates, Tech. Report 88-2, Harvard Robotics Laboratory, 1988.

[6] C. Huang, C. Chen, Human facial feature extraction for face interpretation and recognition, Pattern Recognition 25 (1992) 1435–1444.

[7] G. Chow, X. Li, Towards a system for automatic facial feature detection, Pattern Recognition 26 (1993) 1739–1755.

[8] K. Lam, H. Yan, Locating and extracting the eye in human face images, Pattern Recognition 29 (1996) 771–779.

[9] A. Lanitis, C.J. Taylor, T.F. Cootes, T. Ahmed, Automatic interpretation of human faces and hand gestures using flexible models, Proceedings of the International Workshop on Automatic Face and Gesture Recognition, Zurich, 1995, pp. 98–103.

[10] R. Funayama, N. Yokoya, H. Iwasa, H. Takemura, Facial component extraction by cooperative active nets with global constraints, Proceedings of the 13th ICPR, v. B, 1996, pp. 300–304.

[11] S.R. Gunn, M.S. Nixon, Snake head boundary extraction using global and local energy minimisation, Proceedings of the 13th ICPR, v. B, 1996, pp. 581–585.

[12] S.A. Sirohey, Human face segmentation and identification, Tech. Report CAR-TR-695, Center for Automation Research, University of Maryland, 1993.

[13] A. Jacquin, A. Eleftheriadis, Automatic location tracking of faces and facial features in video sequences, Proceedings of the International Workshop on Automatic Face and Gesture Recognition, 1995, pp. 142–147.

[14] R. Herpers, H. Kattner, H. Rodax, G. Sommer, GAZE: an attentive processing strategy to detect and analyze the prominent facial regions, Proceedings of the International Workshop on Automatic Face and Gesture Recognition, 1995, pp. 214–220.

[15] X. Li, N. Roeder, Face contour extraction from front-view images, Pattern Recognition 28 (1995) 1167–1179.

[16] V. Govindaraju, S.N. Srihari, D.B. Sher, A computational model for face location, Proceedings of the 3rd ICCV, 1990, pp. 718–721.

[17] H.P. Graf, T. Chen, E. Petajan, E. Cosatto, Locating faces and facial parts, Proceedings of the International Workshop on Automatic Face and Gesture Recognition, 1995, pp. 41–46.

[18] M.C. Burl, T.K. Leung, P. Perona, Face localization via shape statistics, Proceedings of the International Workshop on Automatic Face and Gesture Recognition, 1995, pp. 154–159.

[19] S. Jeng, H.M. Liao, Y. Liu, M. Chern, An efficient approach for facial feature detection using geometrical face model, Proceedings of the 13th ICPR, v. C, 1996, pp. 426–430.

[20] B. Moghaddam, A. Pentland, Maximum likelihood detection of faces and hands, Proceedings of the International Workshop on Automatic Face and Gesture Recognition, 1995, pp. 122–128.

[21] M.S. Lew, N. Huijsmans, Information theory and face detection, Proceedings of the 13th ICPR, v. C, 1996, pp. 601–605.

[22] A.J. Colmenarez, Face and facial feature detection with information-based maximum discrimination, NATO ASI Conference on Faces, 1997.

[23] D. Valentin, H. Abdi, A.J. O'Toole, G.W. Cottrell, Connectionist models of face processing: a survey, Pattern Recognition 27 (1994) 1209–1230.

[24] G. Burel, D. Carel, Detection and localization of faces on digital images, Pattern Recognition Lett. 15 (10) (1994) 963–967.

[25] K. Sung, T. Poggio, Example-based learning for view-based human face detection, A.I. Memo 1521, CBCL Paper 112, MIT, 1994.

[26] H.A. Rowley, S. Baluja, T. Kanade, Human face detection in visual scenes, Tech. Report CMU-CS-95-158R, Carnegie Mellon University, 1995.

[27] B. Schiele, A. Waibel, Gaze tracking based on face color, Proceedings of the International Workshop on Automatic Face and Gesture Recognition, 1995, pp. 344–349.

[28] N. Intrator, D. Reisfeld, Y. Yeshurun, Extraction of facial features for recognition using neural networks, Proceedings of the International Workshop on Automatic Face and Gesture Recognition, 1995, pp. 260–265.

[29] R. Feraud, A conditional ensemble of experts applied to face detection, NATO ASI Conference on Faces, 1997.

[30] F.F. Soulie, Connectionist methods for human face processing, NATO ASI Conference on Faces, 1997.

[31] H. Wu, Q. Chen, M. Yachida, An application of fuzzy theory: face detection, Proceedings of the International Workshop on Automatic Face and Gesture Recognition, 1995, pp. 314–319.

[32] Y. Dai, Y. Nakano, Face-texture model based on SGLD and its application in face detection in a color scene, Pattern Recognition 29 (1996) 1007–1017.

[33] C.H. Lee, J.S. Kim, K.H. Park, Automatic face location in a complex background using motion and color information, Pattern Recognition 29 (1996) 1877–1889.

[34] K. Sobottka, I. Pitas, Extraction of facial regions and features using color and shape information, Proceedings of the 13th ICPR, v. C, 1996, pp. 421–425.

[35] H. Sako, A.V.W. Smith, Real-time expression recognition based on features position and dimension, Proceedings of the 13th ICPR, v. C, 1996, pp. 643–648.

[36] E. Saber, A. Murat Tekalp, Face detection and facial feature extraction using color, shape and symmetry-based cost functions, Proceedings of the 13th ICPR, v. C, 1996, pp. 654–657.

[37] B. Leroy, I.L. Herlin, L.D. Cohen, Face identification by deformation measure, Proceedings of the 13th ICPR, v. C, 1996, pp. 633–637.

[38] M.J. Donahue, S.I. Rokhlin, On the use of level curves in image analysis, Image Understanding 57 (1993) 185–203.

[39] P. Seitz, M. Bichsel, The digital doorkeeper: automatic face recognition with the computer, Proceedings of the 25th IEEE Carnahan Conference on Security Technology, 1991.

[40] D.H. Ballard, Generalizing the Hough transform to detect arbitrary shapes, Pattern Recognition 13 (1981) 111–122.

[41] L.S. Davis, Hierarchical generalized Hough transform and line segment based generalized Hough transforms, Pattern Recognition 15 (1982) 277–285.

[42] J. Illingworth, J. Kittler, The adaptive Hough transform, IEEE Trans. Pattern Anal. Mach. Intell. 9 (1987) 690–697.

[43] W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery, Numerical Recipes in C, Cambridge University Press, Cambridge, 1992.

[44] D. Reisfeld, H. Wolfson, Y. Yeshurun, Detection of interest points using symmetry, Proceedings of the 3rd ICCV, 1990, pp. 62–65.

[45] R. Brunelli, T. Poggio, Face recognition: features versus templates, IEEE Trans. Pattern Anal. Mach. Intell. 15 (1993) 1042–1052.

[46] H. Wu, Q. Chen, M. Yachida, Facial features extraction and face verification, Proceedings of the 13th ICPR, v. C, 1996, pp. 484–488.

[47] D. Maio, D. Maltoni, Fast face location in complex backgrounds, NATO ASI Conference on Faces, 1997.

[48] A. Crouzil, L. Massip-Pailhes, S. Castan, A new correlation criterion based on gradient fields similarity, Proceedings of the 13th ICPR, v. A, 1996, pp. 632–636.

About the Author: DARIO MAIO is Full Professor at the Computer Science Department, University of Bologna, Italy. He has published in the fields of distributed computer systems, computer performance evaluation, database design, information systems, neural networks, biometric systems and autonomous agents. Before joining the Computer Science Department, he received a fellowship from the C.N.R. (Italian National Research Council) for participation in the Air Traffic Control Project. He received the degree in Electronic Engineering from the University of Bologna in 1975. He is an IEEE member. He is with CSITE - C.N.R. and with DEIS; he teaches database and information systems at the Computer Science Dept., Cesena.

About the Author: DAVIDE MALTONI is an Associate Researcher at the Computer Science Department, University of Bologna, Italy. He received the degree in Computer Science from the University of Bologna, Italy, in 1993. In 1998 he received his Ph.D. in Computer Science and Electronic Engineering at DEIS, University of Bologna, with the research theme "Biometric Systems". His research interests also include autonomous agents, pattern recognition and neural nets. He is an IAPR member.
