
Background Learning for Robust Face Recognition With PCA in the Presence of Clutter

A. N. Rajagopalan, Rama Chellappa, Fellow, IEEE, and Nathan T. Koterba

Abstract—We propose a new method within the framework of principal component analysis (PCA) to robustly recognize faces in the presence of clutter. The traditional eigenface recognition (EFR) method, which is based on PCA, works quite well when the input test patterns are faces. However, when confronted with the more general task of recognizing faces appearing against a background, the performance of the EFR method can be quite poor. It may miss faces completely or may wrongly associate many of the background image patterns to faces in the training set. In order to improve performance in the presence of background, we argue in favor of learning the distribution of background patterns and show how this can be done for a given test image. An eigenbackground space is constructed corresponding to the given test image and this space, in conjunction with the eigenface space, is used to impart robustness. A suitable classifier is derived to distinguish nonface patterns from faces. When tested on images depicting face recognition in real situations against cluttered background, the performance of the proposed method is quite good with fewer false alarms.

Index Terms—Clutter, eigenface, eigenbackground, Fisher's linear discriminant (FLD), principal component analysis (PCA).

I. INTRODUCTION

FACE recognition by machines is an active area of research and spans several disciplines such as image processing, pattern recognition, computer vision, and neural networks. The interest in this field is motivated by a broad range of potential applications for systems that can code and interpret face images. Developing a computational model for face recognition is quite a difficult task because faces are a natural class of complex, multidimensional objects. The computational approaches available today can only suggest broad constraints for this problem.

Over the last ten years, many approaches have been attempted to solve the face recognition problem [1]–[13]. One of the very successful and popular face recognition methods is based on principal components analysis (PCA) [1]. In 1987, Sirovich and Kirby [1] showed that if the eigenvectors corresponding to a set of training face images are obtained, any image in that database can be optimally reconstructed using a linear weighted combination of these eigenvectors. Their work explored the representation of human faces in a lower-dimensional subspace.

Manuscript received December 30, 2002; revised June 4, 2004. This work was supported in part by DARPA HID under Contract N00014-03-1-0520. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Christine Guillemot.

A. N. Rajagopalan is with the Image Processing and Computer Vision Laboratory, Department of Electrical Engineering, Indian Institute of Technology, Madras, Chennai 600 036, India (e-mail: [email protected]).

R. Chellappa and N. T. Koterba are with the Center for Automation Research, University of Maryland, College Park, MD 20742 USA (e-mail: [email protected]; [email protected]).

Digital Object Identifier 10.1109/TIP.2005.847288

In 1991, Turk and Pentland [2] used these eigenvectors (or eigenfaces, as they are called) for face recognition. PCA was used to yield projection directions that maximize the total scatter across all faces in the training set. They also extended their approach to real-time recognition of a moving face image in a video sequence [14]. Another popular scheme for dimensionality reduction in face recognition is due to Belhumeur et al. [8], Etemad and Chellappa [9], and Swets and Weng [10]. It is based on Fisher's linear discriminant (FLD) analysis. The FLD uses class membership information and develops a set of feature vectors in which variations of different faces are emphasized while different instances of a face due to illumination conditions, facial expressions, and orientations are de-emphasized. The FLD method deals directly with discrimination between classes whereas the eigenface recognition (EFR) method deals with the data in its entirety without paying any particular attention to the underlying class structure. It is generally believed that algorithms based on FLD are superior to those based on PCA when sufficient training samples are available. But, as shown in [15], this is not always the case.

Methods such as EFR and FLD work quite well provided the input test pattern is a face, i.e., the face image has already been cropped out of a scene. The problem of recognizing faces in still images with a cluttered background is more general and difficult, as one does not know where a face pattern might appear in a given image. A good face recognition system must possess the following two properties. It should: 1) detect and recognize all the faces in a scene, and 2) not misclassify background patterns as faces. Since faces are usually sparsely distributed in images, even a few false alarms will render the system ineffective. Also, the performance should not be too sensitive to any threshold selection. Some attempts to address this situation are discussed in [2], [7], where the use of distance from eigenface space (DFFS) and distance in eigenface space (DIFS) is suggested to detect and eliminate nonfaces for robust face recognition in clutter. In this study, we show that DFFS and DIFS by themselves (in the absence of any information about the background) are not sufficient to discriminate against arbitrary background patterns. If the threshold is set high, traditional EFR invariably ends up missing faces. If the threshold is lowered to capture the face, the technique incurs many false alarms. Thus, the scheme is quite sensitive to the choice of the threshold value.

One possible approach to handle clutter in still images is to use a good face detection module to find face patterns and then feed only these patterns as inputs to the traditional EFR scheme. Face detection is a research problem in itself and various approaches exist in the literature [16]–[18]. Most of the works assume the pose to be frontal.


For a recent and comprehensive survey on face detection techniques, see [19], [20]. In this paper, we propose a new methodology within the PCA framework to robustly recognize faces in a given test image with background clutter. Toward this end, we construct an "eigenbackground space" which represents the distribution of the background images corresponding to the given test image. The background is learned "on the fly" and provides a sound basis for eliminating false alarms. An appropriate pattern classifier is derived and the eigenbackground space, together with the eigenface space, is used to simultaneously detect and recognize faces. Results are given on several test images to validate the proposed method.

The organization of the paper is as follows. In Section II, we briefly describe the eigenface recognition method and examine the effect of background on the performance of traditional EFR. The problem of background representation is taken up in Section III. A classifier based on the Kullback–Leibler measure is derived in Section IV to discriminate between face and nonface patterns. A robust face recognition scheme is proposed in Section V. Experimental results are given in Section VI, while Section VII concludes the paper.

II. EIGENFACE RECOGNITION IN CLUTTER

Consider a training set of face images $x_1, x_2, \ldots, x_M$ (all row-ordered), where $M$ is the total number of training images. Let $N$ be the dimension (in pixels) of the training images. The mean of these face images is given by $\mu = \frac{1}{M}\sum_{i=1}^{M} x_i$. Let the difference image from the mean value be given by the vector $\phi_i = x_i - \mu$. PCA yields projection directions that maximize the total scatter across all the images. PCA solves for a set of orthonormal vectors $u_k$, where the $k$th vector is chosen such that $\lambda_k = \frac{1}{M}\sum_{i=1}^{M}(u_k^T \phi_i)^2$ is maximum subject to $u_l^T u_k = \delta_{lk}$. It turns out that the vectors $u_k$ and scalars $\lambda_k$ are the eigenvectors and eigenvalues, respectively, of the covariance matrix $C$ which is given by

$$C = \frac{1}{M}\sum_{i=1}^{M}\phi_i \phi_i^T = AA^T \tag{1}$$

where $A = \frac{1}{\sqrt{M}}[\phi_1\ \phi_2\ \cdots\ \phi_M]$. Let $v_k$ be the eigenvector of the matrix $A^T A$. Then, by definition, $A^T A v_k = \lambda_k' v_k$, where $\lambda_k'$ is the eigenvalue corresponding to eigenvector $v_k$. Premultiplying both sides by $A$, we have

$$AA^T(Av_k) = \lambda_k'(Av_k).$$

Clearly, $Av_k$ is the eigenvector of the matrix $AA^T$ and the corresponding eigenvalue is $\lambda_k'$. Since the dimension of the matrix $A^T A$ is only $M \times M$, it is easier to solve for the eigenvectors of $A^T A$ first and then determine the eigenvectors of $C$ as $u_k = Av_k$.
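To make the computation concrete, the following is a minimal NumPy sketch of the eigenface construction just described; the array layout (training faces as rows), the function name, and all variable names are our own choices, not the paper's.

```python
import numpy as np

def compute_eigenfaces(X, M_prime):
    """Eigenfaces from M row-ordered training faces X (M x N, with M << N),
    solved via the small M x M matrix instead of the N x N covariance."""
    M, N = X.shape
    mu = X.mean(axis=0)                     # mean face
    A = X - mu                              # rows are difference images phi_i
    G = (A @ A.T) / M                       # M x M counterpart of the covariance
    evals, V = np.linalg.eigh(G)            # eigenpairs, ascending order
    order = np.argsort(evals)[::-1][:M_prime]
    U = A.T @ V[:, order]                   # map back to N-dimensional eigenvectors
    U /= np.linalg.norm(U, axis=0)          # orthonormalize the eigenfaces
    return mu, U, evals[order]
```

Each column of `U` is one eigenvector $u_k$ of the covariance matrix, kept in decreasing order of eigenvalue.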

Of the $M$ eigenvectors, only $M'$ ($M' < M$) need be used for recognition, i.e., eigenvectors that have higher eigenvalues which account for most of the variations among the training patterns. These significant eigenvectors are called "eigenfaces" [2], [4] and each face image in the training set can be expressed as a weighted linear combination of the eigenfaces. The weight vector $\Omega = [w_1\ w_2\ \cdots\ w_{M'}]^T$ corresponding to a pattern $x$ is calculated as the projection onto the $M'$-dimensional eigenface space, i.e.,

$$w_k = u_k^T(x - \mu) \qquad \text{where} \qquad k = 1, \ldots, M'. \tag{2}$$

When a new test pattern is presented to the system for recognition, its weight vector $\Omega$ is determined with respect to the eigenface space. In order to perform recognition, the difference error between this weight vector and the mean weight vector corresponding to every person in the training set is computed. This error is called the distance in face space (DIFS). The DIFS value of the test pattern with respect to the $i$th person is given by

$$\mathrm{DIFS}_i = \|\Omega - \bar{\Omega}_i\| \tag{3}$$

where $\bar{\Omega}_i$ is the mean weight vector of the $i$th person. That face class in the training set for which the DIFS is minimum is declared as the recognized face, provided the difference error is less than an appropriately chosen threshold.

In the above discussion, the input test image was that of a human face. The problem involving a face image among nonface images is a difficult one. If motion information (such as video data) is available, then one could use it to detect the presence of a face. The case of a still image containing a face against background is much more complex and some attempts have been made to tackle it [2], [7]. In [2], the authors advocate the use of distance from face space (DFFS) to reject nonface patterns. If $\hat{x}$ is the projection of the mean-subtracted input image pattern in the eigenface space, then $\hat{x}$ can be expressed as $\hat{x} = \sum_{k=1}^{M'} w_k u_k$, where $w_k$ is the $k$th component of the weight vector $\Omega$. The DFFS is then defined as

$$\mathrm{DFFS} = \epsilon^2(x) = \|(x - \mu) - \hat{x}\|^2. \tag{4}$$

It can be looked upon as the error in the reconstruction of a pattern. It has been pointed out in [2] that a threshold $T_{\mathrm{DFFS}}$ could be chosen such that it defines the maximum allowable distance from the face space. If $\mathrm{DFFS} > T_{\mathrm{DFFS}}$, then the test pattern is classified as a nonface image. In a more recent work [7], DFFS together with DIFS has been suggested to improve performance. A test pattern is classified as a face and recognized provided its DFFS as well as DIFS values are less than suitably chosen thresholds $T_{\mathrm{DFFS}}$ and $T_{\mathrm{DIFS}}$, respectively.
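The two distances are cheap to evaluate jointly. The sketch below (our naming; `omega_bar` is assumed to hold one mean weight vector per person) returns the DFFS of (4) together with the DIFS of (3) against every face class.

```python
import numpy as np

def dffs_and_difs(x, mu, U, omega_bar):
    """DFFS (4) of pattern x plus its DIFS (3) to every stored face class.
    omega_bar is a P x M' array of per-person mean weight vectors."""
    phi = x - mu
    w = U.T @ phi                                  # weight vector, as in (2)
    dffs = np.sum(phi ** 2) - np.sum(w ** 2)       # residual; U is orthonormal
    difs = np.linalg.norm(omega_bar - w, axis=1)   # distance to each person
    return dffs, difs
```

A pattern passes the test of [7] if its DFFS and the smallest entry of `difs` both fall below their respective thresholds.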

Although DFFS and DIFS have been suggested as possible candidates for discriminating against background patterns, it is difficult to conceive that by learning just the face class we can segregate any arbitrary background pattern against which face patterns may appear. It may not always be possible to come up with threshold values that will result in no false alarms and yet can detect all the faces. To better illustrate this point, we show some examples in Fig. 1(a) where faces appear against background. Our training set contains faces of these individuals. Details of the PCA used in this experiment are given in Section VI. The idea is to locate and recognize these individuals in the test images when they appear against clutter. For every subimage pattern in a test image, its DFFS and DIFS values were calculated and an attempt was made to recognize a pattern (as a face or otherwise) based on its DFFS and DIFS values as suggested in [7].


Fig. 1. (a) Test images captured inside the laboratory. Results for (b) traditional EFR and (c) the proposed method. Note that traditional EFR has many false alarms.

The subimage patterns were of the same size as the training images. The threshold value was chosen to be the smallest for which real faces are not missed. It turns out that not only do we detect faces but we also end up with many false alarms [see Fig. 1(b)] since information about the background is completely ignored. It is interesting to note that some of the background patterns have been wrongly identified as one of the individuals in the training set. If the threshold values are made smaller to eliminate false alarms, we end up missing some of the faces. In fact, for difficult poses, the thresholds for DFFS and DIFS need to be set even higher which, in turn, will increase the possibility of more false alarms. Thus, the performance of the EFR technique is quite sensitive to the threshold values chosen. In practice, it is very difficult to come up with a good threshold value for a given test image.

What would really be desirable is to have a way of setting the threshold high so that face images are seldom missed, while at the same time potential false alarms are correctly discriminated. In order to achieve this, properties of the background scene local to a given test image must be extracted and utilized.

III. BACKGROUND REPRESENTATION

In the absence of any information about the background, many naturally occurring nonface patterns in the real world could well be confused as face patterns. If only the eigenface space is learnt, then background patterns with relatively small DFFS and DIFS values will pass for faces and this can result in an unacceptable number of false alarms. We argue in favor of learning the distribution of background images specific to a given scene. It must be mentioned that in [16], [17], attempts have been made to learn the distribution of nonface patterns. These approaches are quite data intensive in nature as they attempt to learn a "universal" background class from hundreds of thousands of nonface patterns. Since one is usually interested in recognizing faces in a given image, we believe that it is more useful to learn the background distribution "local" to a given test image than to learn a universal background class. A locally learnt distribution can be expected to be more effective in capturing the background characteristics of the given test image. By constructing the eigenbackground space for the given test image and comparing the proximity of an image pattern to this subspace versus the eigenface subspace, background patterns can be rejected. We conjecture that utilization of both face and background distributions would reduce false alarms and also decrease the sensitivity to the choice of threshold.

A. Eigenbackground Space

We now describe a simple but effective technique for constructing the "eigenbackground space" (a term that we coin akin to the eigenface space). It is assumed that faces are sparsely distributed in a given image, which is a reasonable assumption. Given a test image, the background is learnt "on the fly" from the test image. Initially, the test image is scanned for those image patterns that are very unlikely to belong to the "face class." Simple thresholding is used to separate "prominent" background patterns using the a priori statistical knowledge-base of faces, i.e., the eigenface space. This is achieved as follows.

• A window pattern in the test image is classified (positively) as a background pattern if its distance from the eigenface space is greater than a certain (high) threshold $T_b$.

Note that we use DFFS to initially segregate only the most likely background patterns. Since the background usually constitutes a major portion of the test image, it is possible to obtain a sufficient number of samples for learning the "background class" even if the threshold $T_b$ is chosen to be large for higher confidence. Patterns thus obtained represent a reasonable sampling of the background scene. Since the number of background patterns is likely to be very large, these patterns are distributed into clusters using simple $K$-means clustering so that $K$ pattern centers are returned. It may be mentioned here that choosing too few clusters can result in an under-representation of the "background class." On the other hand, one cannot use too many clusters because the number of training samples is limited. The mean and covariance estimated from these clusters allow us to effectively extrapolate to other background patterns in the image (not picked up due to the high value of $T_b$) as well.

The $K$ pattern centers, which are much fewer in number as compared to the number of background patterns, are then used as training images for learning the eigenbackground space. Although the pattern centers belong to different clusters, they are not totally uncorrelated with respect to one another and further dimensionality reduction is possible. The procedure that we follow is similar to that used to create the eigenface space. We first find the principal components of the background pattern centers or the eigenvectors of the covariance matrix $C_b$ of the set of background pattern centers. These eigenvectors can be thought of as a set of features which together characterize the variation among pattern centers of the background space. The subspace spanned by the eigenvectors corresponding to the $M'_b$ largest eigenvalues of the covariance matrix is called the eigenbackground space. The significant eigenvectors of the matrix $C_b$, which we call "eigenbackground images," form a basis for representing the background image patterns in the given test image. A new image $x$ can be transformed into its eigenbackground components by

$$w_k^b = \langle u_k^b, x - \mu_b \rangle \tag{5}$$

where $\langle \cdot, \cdot \rangle$ denotes inner product, $u_k^b$ is the $k$th eigenbackground image, and $\mu_b$ is the mean value of all the background pattern centers. The projections $w_k^b$ form the weight vector $\Omega_b = [w_1^b\ w_2^b\ \cdots\ w_{M'_b}^b]^T$ which describes the contribution of each eigenbackground image in representing an input image pattern.

Note that the number of eigenfaces and eigenbackground images need not be the same.
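The whole on-the-fly construction fits in a short sketch. This is our reading of the procedure, with scikit-learn's `KMeans` standing in for the unspecified K-means implementation, the defaults K = 600 and 100 eigenbackground images borrowed from Section VI, and all names our own; it also assumes the image yields at least K prominent background windows.

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_eigenbackground(image, mu_f, U_f, T_b, n, K=600, Mb=100):
    """Learn the eigenbackground space from one test image: (i) collect n x n
    windows whose DFFS exceeds T_b, (ii) reduce them to K pattern centers with
    K-means, (iii) run PCA on the centers to get Mb eigenbackground images."""
    H, W = image.shape
    patches = []
    for r in range(H - n + 1):
        for c in range(W - n + 1):
            x = image[r:r + n, c:c + n].ravel().astype(float)
            phi = x - mu_f
            if np.sum(phi ** 2) - np.sum((U_f.T @ phi) ** 2) > T_b:
                patches.append(x)                  # prominent background window
    centers = KMeans(n_clusters=K).fit(np.array(patches)).cluster_centers_
    mu_b = centers.mean(axis=0)
    A = centers - mu_b
    evals, V = np.linalg.eigh((A @ A.T) / K)       # K x K, as in Section II
    order = np.argsort(evals)[::-1][:Mb]
    U_b = A.T @ V[:, order]                        # eigenbackground images
    U_b /= np.linalg.norm(U_b, axis=0)
    return mu_b, U_b, evals[order]
```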

IV. CLASSIFIER

In this section, we derive a classifier which uses information from both the eigenface subspace and the eigenbackground subspace to discriminate between face and nonface patterns. Let the face class be denoted by $\omega_f$ and the background class be denoted by $\omega_b$. Assuming the conditional density function for the two classes to be Gaussian, we have

$$p(x \mid \omega_i) = \frac{\exp\left(-\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1}(x - \mu_i)\right)}{(2\pi)^{N/2}\,|\Sigma_i|^{1/2}}, \qquad i \in \{f, b\}$$

where $\mu_f$ and $\mu_b$ are the means while $\Sigma_f$ and $\Sigma_b$ are the covariance matrices of the face and the background class, respectively. If the image pattern $x$ is of size $n \times n$, then $N = n^2$. Let

$$d(x) = \phi^T \Sigma^{-1} \phi \qquad \text{where} \qquad \phi = x - \mu. \tag{6}$$

Diagonalization of $\Sigma$ results in

$$\Sigma = \Psi \Lambda \Psi^T$$

where $\Psi$ is a matrix containing eigenvectors of $\Sigma$ and is of the form $\Psi = [\psi_1\ \psi_2\ \cdots\ \psi_N]$. The column vector $\psi_i$ is the $i$th eigenvector. The matrix $\Lambda$ is diagonal and its elements $\lambda_i$ are the eigenvalues of $\Sigma$ [21]. Finally, $y = \Psi^T \phi$ is the weight vector obtained by projecting the vector $\phi$ onto the subspace spanned by the eigenvectors in $\Psi$. Written in scalar form, $d(x)$ becomes

$$d(x) = \sum_{i=1}^{N} \frac{y_i^2}{\lambda_i}.$$

Since $d(x)$ is approximated using only $M'$ principal projections, we seek to formulate an estimator for $d(x)$. Following [7], we derive the estimator as

$$\hat{d}(x) = \sum_{i=1}^{M'} \frac{y_i^2}{\lambda_i} + \frac{\epsilon^2(x)}{\rho} \tag{7}$$


where $\rho$ is a weight that must be optimally estimated using a suitable criterion and $\epsilon^2(x)$ is the reconstruction error in $x$ with respect to the eigenface space. This is because $d(x)$ can be written as

$$d(x) = \sum_{i=1}^{M'} \frac{y_i^2}{\lambda_i} + \sum_{i=M'+1}^{N} \frac{y_i^2}{\lambda_i}$$

where $\hat{x}$ is the estimate of $x$ when projected onto the eigenface space. Because $\hat{x}$ is computed using only $M'$ principal projections in the eigenface space, we have

$$\epsilon^2(x) = \sum_{i=M'+1}^{N} y_i^2$$

as the $\psi_i$s are orthonormal.

Unlike in [7], we proceed to estimate the distribution of the background also. Based on the estimates of the densities of the face and the background class, we arrive at the desired classifier. For the background, since $d_b(x)$ is approximated using only $M'_b$ principal projections, we have

$$\hat{d}_b(x) = \sum_{i=1}^{M'_b} \frac{(y_i^b)^2}{\lambda_i^b} + \frac{\epsilon_b^2(x)}{\rho_b} \tag{8}$$

where $\epsilon_b^2(x)$ is the reconstruction error in $x$ with respect to the eigenbackground space and

$$\epsilon_b^2(x) = \|(x - \mu_b) - \hat{x}_b\|^2.$$

Here, $\hat{x}_b$ is the estimate of $x$ when projected onto the eigenbackground space and the weight $\rho_b$ is optimally estimated as described below.

From (6) and (7), the density estimate based on $\hat{d}(x)$ can be written as the product of two marginal and independent Gaussian densities in the face space $F$ and its orthogonal complement $\bar{F}$, i.e.,

$$\hat{p}(x \mid \omega_f) = \left[\frac{\exp\left(-\frac{1}{2}\sum_{i=1}^{M'} y_i^2/\lambda_i\right)}{(2\pi)^{M'/2}\prod_{i=1}^{M'}\lambda_i^{1/2}}\right]\left[\frac{\exp\left(-\epsilon^2(x)/2\rho\right)}{(2\pi\rho)^{(N-M')/2}}\right] = p_F(x \mid \omega_f)\,\hat{p}_{\bar{F}}(x \mid \omega_f). \tag{9}$$

Here, $p_F(x \mid \omega_f)$ is the true marginal density in the face space while $\hat{p}_{\bar{F}}(x \mid \omega_f)$ is the estimated marginal density in $\bar{F}$.

Along similar lines, the density estimate for the background class can be expressed as

$$\hat{p}(x \mid \omega_b) = \left[\frac{\exp\left(-\frac{1}{2}\sum_{i=1}^{M'_b} (y_i^b)^2/\lambda_i^b\right)}{(2\pi)^{M'_b/2}\prod_{i=1}^{M'_b}(\lambda_i^b)^{1/2}}\right]\left[\frac{\exp\left(-\epsilon_b^2(x)/2\rho_b\right)}{(2\pi\rho_b)^{(N-M'_b)/2}}\right] = p_B(x \mid \omega_b)\,\hat{p}_{\bar{B}}(x \mid \omega_b). \tag{10}$$

Here, $p_B(x \mid \omega_b)$ is the true marginal density in the background space $B$ while $\hat{p}_{\bar{B}}(x \mid \omega_b)$ is the estimated marginal density in $\bar{B}$.

The optimal values of $\rho$ and $\rho_b$ can be determined by minimizing the Kullback-Leibler distance [22] between the true density and its estimate. The cost to be minimized is

$$J(\rho) = E\left[\ln\frac{p(x \mid \omega_f)}{\hat{p}(x \mid \omega_f)}\right]$$

where $E[\cdot]$ is the expectation operator. Using $p(x \mid \omega_f)$ and its estimate $\hat{p}(x \mid \omega_f)$ and the fact that $E[y_i^2] = \lambda_i$, it can be shown that

$$J(\rho) = \frac{1}{2}\sum_{i=M'+1}^{N}\left[\frac{\lambda_i}{\rho} - 1 + \ln\frac{\rho}{\lambda_i}\right]$$

and

$$J(\rho_b) = \frac{1}{2}\sum_{i=M'_b+1}^{N}\left[\frac{\lambda_i^b}{\rho_b} - 1 + \ln\frac{\rho_b}{\lambda_i^b}\right].$$

By solving $\partial J/\partial \rho = 0$ and $\partial J/\partial \rho_b = 0$, the optimal values for $\rho$ and $\rho_b$ turn out to be

$$\rho^* = \frac{1}{N - M'}\sum_{i=M'+1}^{N}\lambda_i \tag{11}$$

and

$$\rho_b^* = \frac{1}{N - M'_b}\sum_{i=M'_b+1}^{N}\lambda_i^b. \tag{12}$$

Thus, once we select the $M'$-dimensional principal subspace $F$, the optimal density estimate $\hat{p}(x \mid \omega_f)$ has the form given by (9), where $\rho$ is given by (11). A similar argument applies to the background space also.
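Under the stated Gaussian assumptions, (11) and (9) are straightforward to evaluate once the eigenvalue spectrum of a class is available. A sketch (our names; `evals` is assumed sorted in decreasing order and to cover the full spectrum):

```python
import numpy as np

def optimal_rho(evals, M_prime):
    """rho* from (11): the arithmetic mean of the eigenvalues left out of
    the M_prime-dimensional principal subspace."""
    return evals[M_prime:].mean()

def log_density(x, mu, U, evals, rho):
    """Log of the estimate (9): an exact Gaussian marginal over the principal
    subspace plus an isotropic Gaussian on the residual, weighted by rho."""
    M_prime, N = U.shape[1], x.size
    phi = x - mu
    y = U.T @ phi                                  # principal projections
    eps2 = np.sum(phi ** 2) - np.sum(y ** 2)       # residual energy epsilon^2(x)
    in_sub = -0.5 * (np.sum(y ** 2 / evals[:M_prime])
                     + M_prime * np.log(2 * np.pi)
                     + np.sum(np.log(evals[:M_prime])))
    residual = -0.5 * (eps2 / rho + (N - M_prime) * np.log(2 * np.pi * rho))
    return in_sub + residual
```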

Note that in (7) and (8), $\hat{d}(x)$ consists of two components: the component of $x$ which lies in the feature space and a component in the orthogonal subspace. Since both $\epsilon^2(x)$ and $\epsilon_b^2(x)$ yield a measure of the reconstruction error, we propose to use these marginal densities for classifying an image pattern as face or background. Assuming equal a priori probabilities, an image pattern $x$ is classified as a face if

$$h(x) = \ln \hat{p}_{\bar{F}}(x \mid \omega_f) - \ln \hat{p}_{\bar{B}}(x \mid \omega_b) \tag{13}$$

is positive, else not. When $M' = M'_b$, i.e., when the number of eigenfaces and eigenbackground patterns are the same, and


when $\rho = \rho_b$, i.e., when the arithmetic mean of the eigenvalues in the orthogonal subspaces is the same, the above classifier interestingly simplifies to

$$h(x) \propto \epsilon_b^2(x) - \epsilon^2(x) \tag{14}$$

which is a function of only the reconstruction errors. Clearly, the face space would favor a better reconstruction of face patterns while the background space would favor the background patterns. A background image represented with the eigenbackground images can be expected to have a smaller error as compared to the case when it is reconstructed using eigenfaces.
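With these two simplifications, the classifier needs nothing beyond the two reconstruction errors. A minimal sketch, assuming orthonormal eigenface and eigenbackground matrices (names are ours):

```python
import numpy as np

def recon_error(x, mu, U):
    """Squared residual after projecting x onto the subspace spanned by U."""
    phi = x - mu
    return np.sum(phi ** 2) - np.sum((U.T @ phi) ** 2)

def is_face(x, mu_f, U_f, mu_b, U_b):
    """Classifier (14): declare a face when the eigenbackground space
    reconstructs the pattern worse than the eigenface space does."""
    return recon_error(x, mu_b, U_b) > recon_error(x, mu_f, U_f)
```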

V. PROPOSED METHOD

In this section, we propose a scheme that recognizes faces by searching a given test image for patches of image patterns of faces appearing against a cluttered background. The different stages of the method are as follows: estimation of the eigenface space, construction of the eigenbackground space, and recognition. Training data samples of image patterns of faces are used to create the eigenface space. Given a test image, the background is learnt on the fly and the eigenbackground space corresponding to that test image is derived. Finally, the system classifies a subimage as being either a known face or a background pattern by using the knowledge of both the eigenface space and the eigenbackground space. Construction of the eigenface space and the eigenbackground space has already been discussed in Sections II and III, respectively. We now describe the recognition process.

A. Image Recognition

Once the eigenface space and the eigenbackground space are learnt using training images (of size $n \times n$ pixels), the test image is examined again, but now for the presence of faces at all points in the image. At every pixel in the test image, a subimage of size $n \times n$ is cropped about that pixel to obtain the test patterns. For each of these test window patterns, the classifier proposed in Section IV is used to determine whether a pattern is a face or not. Ideally, one must use (13) but for computational simplicity we use (14), which is the difference in the reconstruction error. The classifier works quite well despite this simplification.

To express the operations mathematically, let the subimage pattern under consideration in the test image be denoted as $x$. The vector $x$ is projected onto the eigenface space as well as the eigenbackground space to yield estimates of $x$ as $\hat{x}_f$ and $\hat{x}_b$, respectively. If

$$\epsilon^2(x) < \epsilon_b^2(x) \qquad \text{and} \qquad \epsilon^2(x) < T_{\mathrm{DFFS}} \tag{15}$$

where $T_{\mathrm{DFFS}}$ is an appropriately chosen threshold, then recognition is carried out based on its DIFS value. The weight vector $\Omega$ corresponding to pattern $x$ in the eigenface space is compared (in the Euclidean sense) with the prestored mean weights $\bar{\Omega}_i$ of each of the face classes. The pattern $x$ is recognized as belonging to the $k$th person if

$$\|\Omega - \bar{\Omega}_k\| = \min_{i=1,\ldots,P} \|\Omega - \bar{\Omega}_i\| \qquad \text{and} \qquad \|\Omega - \bar{\Omega}_k\| < T_{\mathrm{DIFS}} \tag{16}$$

where $P$ is the number of face classes or people in the database and $T_{\mathrm{DIFS}}$ is a suitably chosen threshold.

In the above discussion, since a background pattern will be better approximated by the eigenbackground images than by the eigenface images, it is to be expected that $\epsilon_b^2(x)$ would be less than $\epsilon^2(x)$ for a background pattern $x$. On the other hand, if $x$ is a face pattern, then it will be better represented by the eigenface space than the eigenbackground space. Thus, learning the eigenbackground space helps to reduce the false alarms considerably. Moreover, the threshold value $T_{\mathrm{DFFS}}$ can now be raised comfortably without generating false alarms because the reconstruction error of a background pattern would continue to remain a minimum with respect to the background space only. Thus, even if $T_{\mathrm{DFFS}}$ is raised, it will not result in a spurt of false alarms. Hence, knowledge of the background leads to improved performance (fewer misses as well as fewer false alarms) and reduces sensitivity to the choice of threshold values (properties that are highly desirable in a recognition scenario). Results corroborating these observations will be presented in the section on experimental results.
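Putting the detection rule (15) and the recognition rule (16) together, a sliding-window scan over the test image might look like the sketch below (our naming and layout; `omega_bar` again holds one mean weight vector per person).

```python
import numpy as np

def detect_and_recognize(image, mu_f, U_f, mu_b, U_b,
                         omega_bar, n, T_dffs, T_difs):
    """Scan every n x n window; keep those accepted by (15) and label each
    with the nearest face class, subject to the DIFS threshold in (16)."""
    hits = []
    H, W = image.shape
    for r in range(H - n + 1):
        for c in range(W - n + 1):
            x = image[r:r + n, c:c + n].ravel().astype(float)
            phi_f = x - mu_f
            w = U_f.T @ phi_f
            eps_f = np.sum(phi_f ** 2) - np.sum(w ** 2)
            phi_b = x - mu_b
            eps_b = np.sum(phi_b ** 2) - np.sum((U_b.T @ phi_b) ** 2)
            if eps_f < eps_b and eps_f < T_dffs:          # test (15)
                difs = np.linalg.norm(omega_bar - w, axis=1)
                k = int(np.argmin(difs))
                if difs[k] < T_difs:                       # test (16)
                    hits.append((r, c, k))                 # person k at (r, c)
    return hits
```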

The important stages in the proposed scheme can be summarized as follows.

Step 1 Compute Eigenfaces. Acquire the initial set of training images of faces. Calculate the eigenfaces, keeping only the $M'$ images that correspond to the highest eigenvalues. These images define the eigenface space.

Step 2 Identify Prominent Background Images. Using a high enough threshold $T_b$, those subimage patterns in the given test image which are sufficiently far from the eigenface space are marked as background images.

Step 3 Calculate Background Pattern Centers. The classical $K$-means algorithm is used to calculate the $K$ pattern centers so as to bring down the number of background patterns that we need to deal with.

Step 4 Obtain Eigenbackground Images. The $K$ pattern centers returned by the $K$-means algorithm are used as training images for the background. The eigenvectors that correspond to the highest eigenvalues are computed from these training patterns. These eigenvectors define the eigenbackground space.

Step 5 Detect and Recognize Faces in the Scene. Given a subimage of the scene, the classifier proposed in Section IV is used to check if it is a face pattern, and the distance in eigenface space (DIFS) of that pattern is used to simultaneously recognize that person.

In Fig. 2, we give a block schematic of the architecture of the proposed scheme.


Fig. 2. Architecture of the proposed system. (a) Computation of eigenfaces. (b) Construction of eigenbackground space. (c) Face detection and recognition.

Fig. 3. (a) Test case where a person appears naturally against a cluttered scene. (b) Results for the traditional EFR technique. (c) Results using the proposed method. (d) Some of the background pattern centers returned by the K-means algorithm. (e) First eight eigenbackground images for the background local to the test image. (f) Typical eigenfaces.

VI. EXPERIMENTAL RESULTS

In this section, we demonstrate the performance of the proposed scheme and compare it with the traditional EFR technique [7]. We generated our own face database for this purpose. The training set consisted of images of 60 subjects with ten images per subject. The set contained frontal as well as profile images. The face images were cropped to 27 × 27 pixel arrays for training and the eigenface space was constructed from this set.

For traditional EFR, after some experiments, the number of significant eigenfaces was chosen to be 60. The recognition accuracy was 98% on the training set. Since PCA yields projections that maximize the total scatter across all classes, the first few (2 to 3) principal components mainly capture variations due to lighting. If these components are ignored, then PCA is known to be less sensitive to illumination [8]. In our implementation of the PCA, we neglected the top two eigenfaces.
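For instance, a hypothetical helper for this selection (the paper gives no code; the indexing convention is ours) could be:

```python
def select_eigenfaces(U_all, evals_all, n_drop=2, n_keep=60):
    """Drop the leading n_drop eigenfaces, which mostly capture lighting,
    and keep the next n_keep; columns are assumed sorted by decreasing
    eigenvalue."""
    return U_all[:, n_drop:n_drop + n_keep], evals_all[n_drop:n_drop + n_keep]
```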


Fig. 4. (a) Test images with different complex backgrounds. Results for (b) traditional EFR and (c) the proposed method.

For the purpose of testing, we captured images in which subjects in the database appeared against different types of background. Some of the images were captured within the laboratory, where the background consisted of computers, furniture, wall curtains, etc. For other types of clutter, we used big posters with different types of complex background. Pictures of the individuals in our database were then captured with these posters in the backdrop. Although this is not the same as capturing images outdoors, it nevertheless serves the purpose of demonstrating the effectiveness of the proposed scheme. We captured about 400 such test images, each of size 120 × 160 pixels. Finally, we also took outdoor images of some of the individuals in the training set. These images were taken against real backgrounds under natural lighting conditions. We give results on all the above data sets.


Fig. 5. Representative results for the proposed method on some more test images.

The test image is scanned for the presence of a face at all points in the image. If a face pattern is recognized by the system, a box is drawn at the corresponding location in the output image.

For the proposed method, the eigenbackground space was learnt using the methodology discussed in Section III. The decision to associate a test window pattern with a known face was based on the approach discussed in Section V. Thresholds $T_{\mathrm{DFFS}}$ and $T_{\mathrm{DIFS}}$ were chosen to be the maximum of all the DFFS and DIFS values, respectively, among the faces in the training set

(which is a reasonable thing to do). The threshold values were kept the same for all the test images and for both schemes as well. The number of background patterns and background centers depends on image resolution. The higher the resolution, the higher the number required for an effective representation of the background. For the proposed scheme, the number of background pattern centers was chosen to be 600 while the number of eigenbackground images was chosen to be 100, and these were kept fixed for all the test images.
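One way to realize this threshold selection, assuming the training faces and their class labels are at hand (names are ours):

```python
import numpy as np

def thresholds_from_training(X, labels, mu_f, U_f, omega_bar):
    """Set T_DFFS and T_DIFS to the largest DFFS and DIFS values observed
    on the training faces themselves; labels[i] is the class of row i."""
    Phi = X - mu_f
    W = Phi @ U_f                                   # one weight vector per row
    dffs = np.sum(Phi ** 2, axis=1) - np.sum(W ** 2, axis=1)
    difs = np.linalg.norm(W - omega_bar[labels], axis=1)
    return dffs.max(), difs.max()
```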


Fig. 6. (a) A few of the test cases where the proposed method had false alarms. (b) Test cases where the person is not in the training set.

The number of eigenbackground images was arrived at based on the accuracy of reconstruction of the background patterns. The scheme is, however, not very sensitive to this choice.

Results for the test images that were captured inside our laboratory are shown in Figs. 1 and 3 for both the traditional EFR technique and the proposed method. Some representative background pattern centers are shown in Fig. 3(d). It is clear from the figure that the pattern centers do indeed belong to the background. Some typical eigenbackground and eigenface images are shown in Fig. 3(e) and (f), respectively. Note that the eigenbackground images look very different from eigenfaces.

The two methods were also tested on hundreds of poster images with individuals appearing against different types of complex background. Due to space constraints, we have given only a few representative results here (Figs. 4 and 5). All the figures are quite self-explanatory. From these results, we observe that traditional EFR (which does not utilize background information) confuses too many background patterns with faces in the training set. If $T_{\mathrm{DFFS}}$ is decreased to reduce the false alarms, then it ends up missing many of the faces. On the other hand, the proposed scheme works quite well and recognizes faces with very few false alarms, if any. When tested on all 400 test images, the proposed method has a detection capability of 80% with no false alarms, and the recognition rate on these detected images is 78%. A face is missed when either the pose is very different or when there is a significant change in scale or illumination from that of the images in the training set. If $T_{\mathrm{DFFS}}$ is increased to capture slightly difficult poses, we have observed that the increase in the number of false alarms is very marginal. This can be attributed to the fact that the proximity of a background pattern continues to remain with respect to the background space despite changes in $T_{\mathrm{DFFS}}$. In Fig. 6(a), we show some of the few cases when the proposed method had a false alarm.

As an aside, the proposed method was also checked for the case when the face in the test image was that of an unknown person [Fig. 6(b)]. Interestingly, in the few images that we tried, the algorithm could detect faces. However, as expected, it did not recognize them. In Fig. 6(b), the boxes correspond to face detection results only and not recognition. There is, of course, no claim that the method will detect any unknown face in general.

Finally, we considered outdoor images in which some of the people in the training set appear against real backgrounds. These images serve to demonstrate the effectiveness of the proposed scheme in recognizing individuals in natural conditions and with single/multiple people present in the images. Representative recognition results corresponding to these situations are given in Fig. 7. We note that the method can handle side views quite well [Fig. 7(a)]. It is robust to reasonable changes in illumination, as shown in Fig. 7(b). It works even when there are several persons within the same image [Fig. 7(c)]. Thus, we observe that the proposed method performs quite well even on outdoor images.

In Fig. 8, we compare the overall detection performance of the proposed method with that of the traditional EFR method by plotting the detection rate as a function of the false alarm rate (FAR). The parameters for traditional EFR and the proposed method are as described earlier in this section. From the figure, we note that the detection versus false alarm tradeoff for the proposed method is much superior. Due to background learning, the detection accuracy is quite good even at low values of FAR. The traditional EFR method, on the other hand, is quite susceptible to false alarms.

The complexity of background learning is of the same order as that of the $K$-means algorithm. In our experiments, the $K$-means algorithm converged in about 15–20 iterations. For each iteration, the complexity is O(KBD), where K is the number of clusters, B is the number of background samples,


Fig. 7. Some results for the proposed method on outdoor images. (a) Examples of side views of faces. (b) Different illumination conditions for two individuals. (c) Example images containing several people within the same image.

Fig. 8. Detection rate versus FAR for the proposed method and the traditional EFR method.

and D is the dimension of the sample vector. The complexity of testing is roughly twice that of the traditional EFR as there are now two subspaces to work with instead of one. On a Pentium IV PC with a 1.7-GHz clock speed and 256 MB of RAM, the method takes about 10 s for background learning. For testing, it takes about 3 s.

There are some simple ways to reduce the global computational complexity of the algorithm. For example, instead of processing each and every pixel, one could process every alternate pixel along rows and columns. For further speedup, one could skip processing of some of the pixels in the immediate neighborhood of an already identified face. We also wish to point out that for situations where people appear against a relatively constant or slowly changing clutter, background learning need be done only once or very infrequently.

VII. CONCLUSION

In the literature, the eigenface technique has been demonstrated to be very useful for face recognition. However, when the scheme is directly extended to recognize faces in the presence of background clutter, its performance degrades as it cannot satisfactorily discriminate against nonface patterns. In this paper, we have presented a robust scheme for recognizing faces in still images of natural scenes. We argue in favor of constructing an eigenbackground space from the background images of a given scene. The background space, which is created "on the fly" from the test image, is shown to be very useful in distinguishing nonface patterns. The scheme outperforms the traditional EFR technique and gives very good results with almost no false alarms, even on fairly complicated scenes. Several results on real images have been given for the purpose of validation.

In our analysis, the distributions were assumed to be Gaussian. One could explore the possibility of coming up with a classifier that would take even higher-order statistics [23] of the data into account for classification. For background learning, one must decide the number of background centers based on the resolution of the image. It would be very useful to develop a formal methodology to arrive at the number of background centers for a given image.


ACKNOWLEDGMENT

The authors would like to thank the reviewers for their useful suggestions that helped in enhancing the presentation of this paper.

REFERENCES

[1] L. Sirovich and M. Kirby, "Low-dimensional procedure for the characterization of human faces," J. Opt. Soc. Amer. A, vol. 4, pp. 519–524, 1987.

[2] M. Turk and A. Pentland, "Eigenfaces for recognition," J. Cogn. Neurosci., vol. 3, pp. 71–86, 1991.

[3] R. Brunelli and T. Poggio, "Face recognition: Features vs. templates," IEEE Trans. Pattern Anal. Mach. Intell., vol. 15, no. 8, pp. 1042–1053, Aug. 1993.

[4] A. Pentland, B. Moghaddam, and T. Starner, "View-based and modular eigenspaces for face recognition," in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, 1994, pp. 84–91.

[5] F. Samaria and S. Young, "HMM-based architecture for face identification," Image Vis. Comput., pp. 537–543, 1994.

[6] R. Chellappa, C. L. Wilson, and S. Sirohey, "Human and machine recognition of faces: A survey," Proc. IEEE, vol. 83, no. 5, pp. 705–740, May 1995.

[7] B. Moghaddam and A. Pentland, "Probabilistic visual learning for object representation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 5, pp. 696–710, May 1997.

[8] P. Belhumeur, J. Hespanha, and D. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 5, pp. 711–720, May 1997.

[9] K. Etemad and R. Chellappa, "Discriminant analysis for recognition of human face images," J. Opt. Soc. Amer. A, vol. 14, pp. 1724–1733, 1997.

[10] D. L. Swets and J. Weng, "Using discriminant features for image retrieval," IEEE Trans. Pattern Anal. Mach. Intell., vol. 18, no. 6, pp. 831–836, Jun. 1996.

[11] W. Zhao, A. Krishnaswamy, R. Chellappa, D. L. Swets, and J. Weng, "Discriminant analysis of principal components for face recognition," in Face Recognition: From Theory to Applications, H. Wechsler, P. J. Phillips, V. Bruce, F. F. Soulie, and T. S. Huang, Eds. New York: Springer-Verlag, 1998, pp. 73–85.

[12] C. Liu and H. Wechsler, "Evolutionary pursuit and its application to face recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 6, pp. 570–582, Jun. 2000.

[13] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss, "The FERET evaluation methodology for face-recognition algorithms," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 1090–1103, Aug. 2000.

[14] A. Pentland, "Looking at people: Sensing for ubiquitous and wearable computing," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 1, pp. 107–119, Jan. 2000.

[15] A. M. Martinez and A. C. Kak, "PCA versus LDA," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 2, pp. 228–233, Feb. 2001.

[16] K. Sung and T. Poggio, "Example-based learning for view-based human face detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 1, pp. 39–51, Jan. 1998.

[17] H. A. Rowley, S. Baluja, and T. Kanade, "Neural network-based face detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 1, pp. 23–38, Jan. 1998.

[18] A. N. Rajagopalan, K. S. Kumar, J. Karlekar, R. Manivasakan, M. M. Patil, U. B. Desai, P. G. Poonacha, and S. Chaudhuri, "Locating human faces in a cluttered scene," Graph. Models Image Process., vol. 62, pp. 323–342, 2000.

[19] E. Hjelmas and B. K. Low, "Face detection: A survey," Comput. Vision Image Understanding, vol. 83, pp. 236–274, 2001.

[20] M. Yang, D. J. Kriegman, and N. Ahuja, "Detecting faces in images: A survey," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 1, pp. 34–58, Jan. 2002.

[21] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.

[22] K. Fukunaga, Introduction to Statistical Pattern Recognition. New York: Academic, 1991.

[23] A. N. Rajagopalan and R. Chellappa, "Higher-order statistics-based detection of vehicles in still images," J. Opt. Soc. Amer. A, vol. 18, pp. 3037–3048, Dec. 2001.

A. N. Rajagopalan received the Ph.D. degree in electrical engineering from the Indian Institute of Technology (IIT), Bombay, in 1998.

During the summer of 1998, he was a Visiting Scientist at the Image Communication Laboratory, University of Erlangen, Erlangen, Germany. He joined the Center for Automation Research, University of Maryland, College Park, in October 1998 and was on the research faculty as an Assistant Research Scientist until September 2000. Since October 2000, he has been an Assistant Professor in the Department of Electrical Engineering, IIT, Madras. His research interests include depth recovery from defocus, image restoration, superresolution, face detection and recognition, and higher-order statistical learning. He is a co-author of the book Depth From Defocus: A Real Aperture Imaging Approach (New York: Springer-Verlag, 1999).

Rama Chellappa (S’78–M’79–SM’83–F’92) received the B.E. (Hons.) degree from the University of Madras, Madras, India, in 1975 and the M.E. (Distinction) degree from the Indian Institute of Science, Bangalore, in 1977. He received the M.S.E.E. and Ph.D. degrees in electrical engineering from Purdue University, West Lafayette, IN, in 1978 and 1981, respectively.

Since 1991, he has been a Professor of electrical engineering and an affiliate Professor of computer science at the University of Maryland, College Park. He is also affiliated with the Center for Automation Research (Director) and the Institute for Advanced Computer Studies (permanent member). Prior to joining the University of Maryland, he was an Assistant Professor (1981 to 1986), an Associate Professor (1986 to 1991), and Director of the Signal and Image Processing Institute (1988 to 1990) with the University of Southern California (USC), Los Angeles. Over the last 22 years, he has published numerous book chapters and peer-reviewed journal and conference papers. He has edited a collection of Papers on Digital Image Processing (Los Alamitos, CA: IEEE Computer Society Press, 1992), coauthored a research monograph on Artificial Neural Networks for Computer Vision (with Y. T. Zhou) (New York: Springer-Verlag, 1990), and co-edited a book on Markov Random Fields: Theory and Applications (with A. K. Jain) (New York: Academic, 1993). His current research interests are face and gait analysis, 3-D modeling from video, automatic target recognition from stationary and moving platforms, surveillance and monitoring, hyperspectral processing, image understanding, and commercial applications of image processing and understanding.

Dr. Chellappa has served as an Associate Editor of the IEEE TRANSACTIONS ON SIGNAL PROCESSING, IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, IEEE TRANSACTIONS ON IMAGE PROCESSING, and IEEE TRANSACTIONS ON NEURAL NETWORKS. He was Co-Editor-in-Chief of Graphical Models and Image Processing. He is now serving as the Editor-in-Chief of the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE. He served as a member of the IEEE Signal Processing Society Board of Governors from 1996 to 1999. Currently, he is serving as the Vice President of Awards and Membership for the IEEE Signal Processing Society. He has served as a General and Technical Program Chair for several IEEE international and national conferences and workshops. He has received several awards, including the National Science Foundation (NSF) Presidential Young Investigator Award, an IBM Faculty Development Award, the 1990 Excellence in Teaching Award from the School of Engineering at USC, the 1992 Best Industry Related Paper Award from the International Association of Pattern Recognition (with Q. Zheng), and the 2000 Technical Achievement Award from the IEEE Signal Processing Society. He was elected as a Distinguished Faculty Research Fellow (1996 to 1998) at the University of Maryland, he is a Fellow of the International Association for Pattern Recognition, and he received a Distinguished Scholar-Teacher Award from the University of Maryland in 2003.

Nathan T. Koterba received the B.S. degree in computer engineering from the University of Maryland, College Park, in 2004.

During the summers of 2002 and 2003, he participated in the University of Maryland's Maryland Engineering Research Internship Teams (MERIT) program, working in the Center for Automation Research under the supervision of Dr. R. Chellappa and Dr. A. N. Rajagopalan. He helped research new face detection and recognition schemes and collected data for an article in the November 2003 issue of National Geographic. He is currently with the Johns Hopkins Applied Physics Laboratory and plans to pursue the Ph.D. degree in human computer interaction and human systems engineering.