Deterioration of visual information in face classification using Eigenfaces and Fisherfaces


Machine Vision and Applications (2006) 17(1): 68–82, DOI 10.1007/s00138-006-0016-4

ORIGINAL PAPER

Gabriel Jarillo Alvarado · Witold Pedrycz · M. Reformat · Keun-Chang Kwak

Deterioration of visual information in face classification using Eigenfaces and Fisherfaces

Received: 18 July 2005 / Accepted: 9 January 2006 / Published online: 11 March 2006
© Springer-Verlag 2006

Abstract In the area of biometrics, face classification has become one of the most appealing and commonly used approaches to personal identification. There is an ongoing quest to design systems that exhibit high classification rates and significant robustness. This feature becomes of paramount relevance when dealing with noisy and uncertain images. The design of face recognition classifiers capable of operating on deteriorated (noise-affected) face images requires a careful quantification of the deterioration of the existing approaches vis-à-vis the anticipated forms and levels of image distortion. The objective of this experimental study is to reveal some general relationships characterizing the performance of two commonly used face classifiers (namely Eigenfaces and Fisherfaces) in the presence of deteriorated visual information. The findings of our study are crucial for identifying the levels of noise at which the face classifiers can still be considered valid; such prior knowledge helps us develop adequate face recognition systems. We investigate several typical models of image distortion, such as Gaussian noise, salt-and-pepper noise, and the blurring effect, and demonstrate their impact on the performance of the two types of classifiers. Several distance models drawn from the Minkowski family of distances are investigated with respect to the resulting classification rates. The experimental environment relies on a well-known standard in face biometrics, the FERET database. The study reports on the performance of the classifiers, based on a comprehensive suite of experiments, and delivers several design hints supporting further development of face classifiers.

Keywords Face recognition · Deterioration of visual information · Principal component analysis · Linear discriminant analysis · Fisherfaces and Eigenfaces · FERET face database

G. J. Alvarado (B) · W. Pedrycz · M. Reformat · K.-C. Kwak
Department of Electrical and Computer Engineering, University of Alberta, 9107-116 Street, Edmonton, Alberta, Canada
E-mail: {gabrielj, pedrycz, reform, kwak}@ece.ualberta.ca

1 Introduction

In biometrics today, face recognition is one of the most important and appealing technologies being investigated and applied in practice. Face images are widely used and generally accepted as a vehicle of personal identification. As such, they are used in official documents, for instance passports and drivers' licences. Although there are some reliable methods of biometric personal identification, such as fingerprinting and iris scanning, these methods usually require expensive specialized equipment and the evident cooperation of the participants, conditions that are not always at hand. Although researchers in psychology, the neural sciences, engineering, image processing, and computer vision have investigated a number of issues related to face recognition by humans and machines, it is still difficult to design an automatic system capable of addressing the efficacy and diversity of this task [1]. In the face recognition process, the main idea is to capture the distinctiveness of a face without being overly sensitive to noise such as lighting conditions and facial expression. Usually a face image (or face, for brevity) is transformed into a space spanned by basis image functions, just as a Fourier transform projects an image onto basis images of the fundamental frequencies. In the case of the Eigenfaces method the basis functions, also known as "Eigenfaces", are the eigenvectors of the covariance matrix of a set of training images [2]. In a similar manner, the "Fisherfaces" method takes as basis functions the discriminant vectors of a matrix formed from the between-class and within-class scatter of a training set.

In general, the amount of information present in an image is too large to implement the statistical approaches in a straightforward manner; that is, the number of pixels in an image usually makes the computation of the covariance matrix infeasible, raising the need for dimensionality reduction. A technique now commonly used is Principal Component Analysis (PCA). PCA techniques, also known as Karhunen-Loève (KL) methods, select a dimensionality-reducing linear projection that maximizes the scatter of all projected samples [3].

A second issue that commonly arises in face recognition systems is the "small sample size" problem [4, 5]. It refers to the frequent scenario of having a number of available samples far smaller than the dimensionality of the samples, a situation that leads to a poor representation of the entire space and results in high error rates of face classification. Another aspect to be aware of relates to the lighting of the environment: significant differences in lighting conditions may make a person look like a different individual and lead to misclassification. A well-known method that can alleviate varying lighting conditions exploits Fisherfaces [3], which takes advantage of Linear Discriminant Analysis (LDA). The Fisherfaces method is a class-driven approach in the sense that it attempts to split the categories according to the scatter of images occurring between the classes and the scatter within the classes.

In practical applications, image classification usually takes place in environments in which we have fairly limited control over the affecting noise. For instance, consider face recognition in a system involving security cameras, or in places where access is quite inconvenient or simply restricted. This situation gives rise to a genuine need for careful quantification, analysis, and design recommendations concerning the issues of coping with the deteriorated character of visual information in face classifiers. It is of interest to us to identify how various types of noisy environments affect the deterioration of existing classification algorithms, particularly Eigenfaces and Fisherfaces. Our intention is to carefully quantify the levels of noise these classifiers are able to tolerate, and to reveal the behavior, in terms of error rates, of such classifiers in the presence of deteriorated information. The findings of our study are crucial for identifying the levels of noise at which the face classifiers can still be considered valid; therefore we do not consider preprocessing steps that may improve the classification rates, for instance by means of denoising algorithms. Such prior knowledge helps us develop adequate face recognition systems. The two classifiers are tested in the presence of deteriorated visual information and changes in lighting conditions, while at the same time being exposed to the small sample size problem. To come up with a quantitative assessment of the classifiers, we are concerned with several typical models of image distortion, such as Gaussian noise, salt-and-pepper noise, and the blurring effect caused by misfocused lenses. Likewise, a number of distances (viz. Euclidean, Hamming, and Tschebyshev) coming from the general class of Minkowski distances are considered as possible components of the classification architectures. We show and characterize dependencies between the proposed models of image deterioration, along with different distances, and the consequent classification results. The comments and guidelines provided as a result of this experimental study may serve other researchers in their work concerning face recognition; a direct example may be when dealing with super-resolution [6].

This paper is arranged in the following manner. Section 2 concisely summarizes the required formal background knowledge of the Eigenfaces and Fisherfaces classification techniques. Section 3 focuses on a complete description of the classifier adopted in this study. In the sequel, Sect. 4 introduces the FERET dataset and elaborates on the experimental conditions, showing how various scenarios forming a noisy classification environment are built. Section 5 continues with a description of the detailed experimental setup. The crux of the experimental results is covered in Sect. 6. Finally, in Sect. 7 we draw conclusions and offer some general comments.

2 Face recognition – feature extraction

In general, face recognition from still images can be divided into two categories: geometric matching and template matching. In the case of geometric matching, the geometric characteristics of faces are compared. For template matching, an image feature is extracted from an array of pixels (the face image) and compared to a set of feature templates stored in a database. Many of the template matching approaches use principal component analysis (PCA) or linear discriminant analysis (LDA) as a vehicle leading to the development of highly representative image features (a feature space) while reducing the high dimensionality of the original images (faces). In general, PCA and LDA have the advantage of achieving good dimensionality reduction of the original face image at a reasonable computational cost. Given the popularity and omnipresence of such approaches, we consider it especially important to assess their robustness against different types of visual deterioration. In what follows, we briefly recall the fundamentals of these ideas and introduce all the required notation.

2.1 Eigenfaces method

To complete the training process of face classifiers within the framework of Eigenfaces, it is required to compute the eigenvectors and eigenvalues of the covariance matrix of the training image set. The set of computed eigenvectors defines a new face space in which the images are represented. Given the collection of eigenvectors, we construct a feature vector for each image. To fix the required notation, let us introduce the following symbols.

Let z(x, y) represent an image as an array of pixels; equivalently, such an image can be represented as a vector z of n pixels. Consider a set Z = {z_1, z_2, \ldots, z_N} of N image vectors formed in an n-dimensional space. In total we assume the existence of c possible individuals (classes). The covariance matrix R of the training image set is defined in the usual manner

R = \frac{1}{N} \sum_{i=1}^{N} (z_i - \bar{z})(z_i - \bar{z})^T   (1)

where \bar{z} is the mean vector of all images occurring in the training set. Let us denote the term (z_i - \bar{z}) in (1) by \Phi_i, and let A = [\Phi_1, \Phi_2, \ldots, \Phi_N]; R can therefore be expressed as

R = \frac{1}{N} A A^T   (2)

Since R ∈ R^{n×n}, its computation becomes infeasible and expensive even for small images; dimensionality reduction becomes a must. Fortunately, we can achieve this reduction by solving for a smaller matrix T ∈ R^{N×N} [7]; many researchers in the field have followed this approach since the early era of face recognition, see for instance [8]. T is described as

T = B^T B   (3)

where B = A / \sqrt{N}. The nonzero eigenvalues of R are the same as those of T, and the eigenvectors of R are obtained (up to normalization) by multiplying the eigenvectors of T by B. B can be decomposed into B = U W V^T by applying the singular value decomposition (SVD). Substituting B in (3), T becomes

T = V W^T W V^T   (4)

where V is the matrix of eigenvectors of T. By the definition of eigenvectors, if v_i is an eigenvector of T corresponding to an eigenvalue \lambda_i, then T v_i = \lambda_i v_i, i.e., B^T B v_i = \lambda_i v_i. Multiplying both sides on the left by B gives B B^T (B v_i) = \lambda_i (B v_i), that is, R B v_i = \lambda_i B v_i.

Thus B v_i is the ith eigenvector of the covariance matrix R. Given that V is the matrix of eigenvectors of T, it follows that E = B V is the matrix of eigenvectors of R, with the eigenvalues of R computed as W^T W. The outlined approach is feasible and far less expensive than solving the eigenproblem for the original R. If the eigenvectors are depicted as images, they show a set of ghostly faces commonly known as Eigenfaces.

Once the face space has been constructed, the feature vectors are formed as linear combinations of the eigenvectors of the covariance matrix R, which also makes it possible to reconstruct the original images Z. We project an image z_i into the face space through the transformation

w_i = E^T (z_i - \bar{z})   (5)

where w_i is a vector of weights associated with the eigenvectors in E. One can experiment with different numbers of eigenvectors when computing the weight vectors.
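The small-matrix trick of Eqs. (2)–(5) can be sketched in a few lines of NumPy. This is an illustrative implementation of ours, not the authors' code; the function names and the use of `numpy.linalg.eigh` are assumptions.

```python
import numpy as np

def train_eigenfaces(Z, num_components):
    """Eigenfaces training via the small N x N matrix T (Eqs. 2-4).

    Z: (N, n) array, one flattened training image per row.
    Returns the mean image z_bar and the (n, num_components) matrix E
    whose columns are the leading (normalized) eigenfaces.
    """
    N = Z.shape[0]
    z_bar = Z.mean(axis=0)
    A = (Z - z_bar).T                   # n x N matrix of centered images
    B = A / np.sqrt(N)                  # chosen so that R = B B^T
    T = B.T @ B                         # small N x N matrix (Eq. 3)
    eigvals, V = np.linalg.eigh(T)      # eigenpairs of T, ascending order
    order = np.argsort(eigvals)[::-1][:num_components]
    E = B @ V[:, order]                 # B v_i are eigenvectors of R
    E /= np.linalg.norm(E, axis=0)      # normalize each eigenface
    return z_bar, E

def project(z, z_bar, E):
    """Weight vector of an image in the face space (Eq. 5)."""
    return E.T @ (z - z_bar)
```

Note the payoff of the trick: for a 400-image training set, T is only 400 × 400 regardless of the number of pixels n.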

2.2 Fisherfaces method

The Fisherfaces method comes as a refinement of Eigenfaces; its major improvement is its robustness against lighting conditions and facial expressions, although some claim that under particular circumstances Eigenfaces outperforms Fisherfaces [9]. Fisherfaces is a class-specific approach in the sense that it attempts to maximize the separability of the classes within the linear subspace. This is accomplished by maximizing the ratio of the scatter between classes to the scatter within classes. The discriminatory power of Fisherfaces has been extensively studied and quantified, see [10]. One can complete dimensionality reduction using a linear projection and still preserve linear separability, which is a strong argument in favor of using linear methods for dimensionality reduction [3]. The scatter between classes reads as

S_B = \sum_{i=1}^{c} N_i (m_i - m)(m_i - m)^T   (6)

where c is the number of classes, N_i is the number of images belonging to class i, m_i is the mean vector of the PCA-transformed images belonging to class i, and m is the mean vector of all PCA-transformed images. The scatter within classes is defined as

S_W = \sum_{i=1}^{c} \sum_{w_j \in i} (w_j - m_i)(w_j - m_i)^T   (7)

The ratio to be maximized comes in the form

D_{LDA} = \arg\max_{D} \frac{|D^T S_B D|}{|D^T S_W D|} = [d_1, d_2, \ldots, d_m]   (8)

where \{d_i\}_{i=1,2,\ldots,m} is the set of generalized eigenvectors of S_B and S_W associated with the set of the largest eigenvalues \{\lambda_i\}_{i=1,2,\ldots,m}, and m is the rank of D_{LDA}. In other words, S_B d_i = \lambda_i S_W d_i for i = 1, 2, \ldots, m, meaning that

S_W^{-1} S_B d_i = \lambda_i d_i   (9)

where d_i is the ith eigenvector of S_W^{-1} S_B. As the rank of S_B is c - 1, we arrive at D_{LDA} = [d_1, d_2, \ldots, d_{c-1}]. To compute a Fisher feature vector v_i we form a new linear combination of the PCA-transformed features with D_{LDA}, that is

v_i = D_{LDA}^T w_i = D_{LDA}^T E^T (z_i - \bar{z})   (10)
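Under the same caveats (an illustrative NumPy sketch of ours, not the paper's code), the scatter matrices of Eqs. (6)–(7) and the eigenproblem of Eq. (9) can be written as:

```python
import numpy as np

def train_fisherfaces(W, labels):
    """Fisherfaces step (Eqs. 6-9) on PCA-transformed feature vectors.

    W: (N, m) array of PCA feature vectors w_i; labels: class index per row.
    Returns D_LDA, the (m, c-1) matrix of discriminant vectors.
    Assumes S_W is invertible (enough samples relative to m).
    """
    labels = np.asarray(labels)
    classes = np.unique(labels)
    c = len(classes)
    m_all = W.mean(axis=0)
    dim = W.shape[1]
    S_B = np.zeros((dim, dim))
    S_W = np.zeros((dim, dim))
    for cls in classes:
        Wi = W[labels == cls]
        mi = Wi.mean(axis=0)
        S_B += len(Wi) * np.outer(mi - m_all, mi - m_all)  # Eq. (6)
        S_W += (Wi - mi).T @ (Wi - mi)                     # Eq. (7)
    # Generalized eigenvectors via Eq. (9); keep the c-1 largest
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1][:c - 1]         # rank(S_B) = c-1
    return eigvecs[:, order].real
```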

3 Face recognition – a nearest neighbor classifier design

The classification task is cast in the standard framework of pattern recognition. Given the specificity of the task at hand, it is instructive to consider one of the simplest forms of classifier, the nearest neighbor classification rule (NN classifier). We envision the nearest neighbor classifier serving as a reference classifier.

As outlined in Sect. 2, the feature extraction methods employed in this study implicitly rely on the Euclidean distance in their derivation. The Euclidean distance is a member of a general family of distances formally known as the Minkowski distances. In this study we take a broader look at the distance model and analyze its impact on face recognition; we are concerned with the exploration of various distance models derived from the family of Minkowski distances, expressed as [11]

d(v', v) = \left( \sum_{j=1}^{n} |v'_j - v_j|^p \right)^{1/p}   (11)


where v_j is the jth component of the feature vector. In particular, for p = 1 the relationship yields the Hamming distance, and for p = 2, (11) returns the Euclidean distance. Finally, for p = ∞ we arrive at the Tschebyshev distance. In fact, (11) describes an infinite number of distances, depending on the value of p; nevertheless, the three mentioned above are the most common ones.
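Equation (11) and its three special cases can be transcribed directly (our own sketch; NumPy assumed):

```python
import numpy as np

def minkowski(v1, v2, p):
    """Minkowski distance of order p (Eq. 11).

    p=1 gives the Hamming (city-block) distance, p=2 the Euclidean
    distance, and p=np.inf the Tschebyshev (Chebyshev) distance.
    """
    diff = np.abs(np.asarray(v1, float) - np.asarray(v2, float))
    if np.isinf(p):
        return diff.max()                 # limit p -> infinity
    return (diff ** p).sum() ** (1.0 / p)
```

For the vectors (0, 0) and (3, 4) this yields 7 for p = 1, 5 for p = 2, and 4 for p = ∞.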

As discussed in [12], the k-NN rule (in our case concerning k = 1) can depend critically upon the distance used in the construct, especially if there are few examples. To briefly outline the NN classifier, let us consider a set of N feature pairs (v_1, c_1), \ldots, (v_N, c_N), where v_i is a feature vector in some feature space and c_i takes values in the set \{1, 2, \ldots, M\}. Each c_i is the index of the class to which the ith pattern belongs. We call v'_n \in \{v_1, v_2, \ldots, v_N\} a nearest neighbor of v if the relationship \min_i d(v_i, v) = d(v'_n, v) holds for i = 1, 2, \ldots, N. The nearest neighbor rule assigns v to the category c'_n of its nearest neighbor v'_n [13].
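The 1-NN rule above reduces to a few lines; this sketch (ours, with an arbitrary pluggable distance function) is only meant to make the rule concrete:

```python
import numpy as np

def nn_classify(v, train_vectors, train_labels, dist):
    """Assign v the label of its nearest training vector under `dist`."""
    distances = [dist(v, vi) for vi in train_vectors]
    return train_labels[int(np.argmin(distances))]
```

Any of the Minkowski distances discussed above can be passed in as `dist`, which is how the classifier architectures of this study differ.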

4 Dataset and noisy conditions

This section elaborates on the experimental data and discusses the conditions of the experiments involving image degradation.

4.1 The FERET dataset

The faces come from the Face Recognition Technology (FERET) program database of facial images. The FERET evaluation procedure is an independently administered test of face recognition algorithms; for details see [14]. The image dataset comprises variations in illumination and lighting conditions, as well as changes in facial expression. The FERET dataset has been widely used and referred to in the face recognition area; examples are [15–20].

In our experimental setup we use three images per person from a group of 200 individuals, making a total of 600 8-bit grayscale images of 256 × 384 pixels. The images were down-sampled by a factor of two and trimmed to show only the face area, as presented in Fig. 1. The down-sampled and trimmed images are 80 × 100 pixels at 8 bits per pixel.

4.2 Noisy environments

In this study we are concerned with revealing the essential relationships between the deterioration of the visual information available to the classifier (the images) and the resulting error rates. The forms of noise are chosen to mimic real-world situations; therefore we concentrate on three essential mechanisms of information distortion, namely Gaussian noise, salt-and-pepper noise, and the blurring effect, occurring at different levels of degradation.

We admit the existence of several other important types of noise affecting face recognition, for instance occlusion. At this point, we argue that occlusion is not necessarily a type of noise but a particular manner in which individuals are presented; in other words, the environment does not play a role in the occlusion of images. In what follows, we briefly characterize the noise models used in this study.

Fig. 1 FERET dataset, samples of trimmed images

4.2.1 Salt and pepper noise

This type of noise is present when an image is coded and transmitted over a noisy channel or degraded by electrical sensor noise [21]; as such, it is commonly found in scanned images. Salt-and-pepper noise consists of white and black dots (salt and pepper) distributed randomly throughout an image. To reproduce this noise, the values of randomly selected pixels are changed to either white or black; a probability density function specifies the quantity of pixels to be modified, with half of the pixels changed to white and the remaining half to black.
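A possible implementation of this corruption procedure follows. This is a sketch under the stated 50/50 split; the 8-bit extremes 0 and 255 and the function name are our assumptions.

```python
import numpy as np

def salt_and_pepper(image, density, rng=None):
    """Corrupt a `density` fraction of pixels: half to white, half to black.

    image: uint8 array; density: fraction of pixels to modify (e.g. 0.3).
    """
    rng = np.random.default_rng(rng)
    noisy = image.copy()
    n = image.size
    k = int(round(density * n))
    idx = rng.choice(n, size=k, replace=False)  # pixels to corrupt
    flat = noisy.reshape(-1)                    # view into `noisy`
    flat[idx[:k // 2]] = 255                    # salt (white)
    flat[idx[k // 2:]] = 0                      # pepper (black)
    return noisy
```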

4.2.2 Gaussian noise

Noise with a Gaussian-like distribution is very often encountered in acquired data. For example, as an image is literally a flow of energy, it is necessary to use some recording mechanism that captures the energy flow at a particular time and makes a permanent record; such mechanisms are always imperfect and add some noise to the images [21]. To simulate this form of noise, the values of all pixels are perturbed by additive Gaussian noise with zero mean and a given variance.
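A minimal sketch of this corruption, assuming intensities scaled to [0, 1] (the scaling and the clipping of out-of-range values are our assumptions; the paper does not specify them):

```python
import numpy as np

def add_gaussian_noise(image, variance, rng=None):
    """Add zero-mean Gaussian noise of the given variance to an image
    with intensities in [0, 1], clipping the result to the valid range."""
    rng = np.random.default_rng(rng)
    noise = rng.normal(0.0, np.sqrt(variance), image.shape)
    return np.clip(image + noise, 0.0, 1.0)
```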

4.2.3 Blurring effect

Fig. 2 FERET dataset, illustration of uncorrupted and corrupted images

An image can be degraded by the blurring effect due to misfocus of lenses, motion, or atmospheric turbulence; essentially, the high-frequency details of the image are reduced [21]. In this study, out-of-focus blur is considered. The blurring is generated by computing the average of the pixels around a particular pixel: the block size defines the number of neighboring pixels forming the block, and the averaged value is assigned to the pixel at the center of the block. The blurring effect increases as the block size increases. To complete the set of experiments, we consider several levels of degradation of the visual information by admitting the following scenarios:

– Gaussian noise with zero mean and variance of 0.01, 0.02, . . . , 0.05.
– Salt-and-pepper noise with density equal to 10%, 20%, . . . , 50%.
– Blurring effect with square averaging blocks of sizes 3 × 3, 5 × 5, . . . , 13 × 13.
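The out-of-focus blur described in Sect. 4.2.3 amounts to box filtering with the stated block sizes; a minimal sketch (edge replication at the borders is our assumption, as the paper does not specify border handling):

```python
import numpy as np

def box_blur(image, block):
    """Replace each pixel by the mean of its block x block neighborhood
    (odd block sizes assumed, e.g. 3, 5, ..., 13)."""
    pad = block // 2
    padded = np.pad(image.astype(float), pad, mode='edge')
    out = np.zeros(image.shape, dtype=float)
    h, w = image.shape
    for dy in range(block):          # accumulate shifted copies
        for dx in range(block):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (block * block)
```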

Figure 2 shows examples of the FERET dataset with superimposed distortion.

5 Experimental setup

The setup is geared towards a thorough and in-depth investigation of the robustness of the Eigenfaces and Fisherfaces classifiers operating under different noise scenarios, involving both the type of deterioration and its intensity. We report on the performance of the classifiers in terms of error rate in the usual manner [22–24]. Two different experimental scenarios are considered in our investigation.

5.1 Training with uncorrupted images

This scenario involves comprehensive training of the Eigenfaces and Fisherfaces classifiers using clean (uncorrupted) images, while the testing is performed with both uncorrupted and corrupted images.

Table 1 Levels of image distortion in the training sets

Distortion model   Blurring effect   Density of salt and   Gaussian noise
                   (pixel block)     pepper noise (%)      variance
Low                3 × 3             10                    0.01
Medium             7 × 7             30                    0.03
High               9 × 9             50                    0.05

5.2 Training with corrupted and uncorrupted images

For this scenario, the training set includes the collection of uncorrupted images mixed with some corrupted images, while the testing is performed on both uncorrupted and corrupted images.

The levels of image deterioration occurring in the training sets are referred to as low, medium, and high; refer to Table 1. To obtain the representative behavior of the classifiers, an average is taken over a number of experiments. A 3-fold cross-validation across individuals is performed (two randomly selected images to train and the one remaining image to test). All classes are considered in both the training and the testing sets. The training and testing sets of the first scenario comprise 400 images and 200 images, respectively, while in the second scenario there are 800 images in the training set (400 uncorrupted images plus the same images corrupted by a single type of distortion at a single level of deterioration) and 200 images in the testing set. In summary, Table 2 presents the characteristics of both scenarios.
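The 3-fold cross-validation across individuals described above can be sketched as follows. This is illustrative only; the defaults of 200 people and 3 images per person match the setup in Sect. 4.1, but the function itself is ours.

```python
import numpy as np

def three_fold_splits(images_per_person=3, num_people=200, rng=None):
    """For each fold, hold out one image per person for testing and use
    the remaining two for training, following a random per-person order.

    Returns a list of 3 (train_indices, test_indices) pairs, where
    train_indices has shape (num_people, 2) and test_indices (num_people,).
    """
    rng = np.random.default_rng(rng)
    # one random ordering of the 3 image indices for every person
    perms = np.array([rng.permutation(images_per_person)
                      for _ in range(num_people)])
    folds = []
    for f in range(images_per_person):
        test = perms[:, f]                    # held-out image per person
        train = np.delete(perms, f, axis=1)   # remaining two per person
        folds.append((train, test))
    return folds
```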

As a first step, the Eigenfaces classifier is trained and later tested using only uncorrupted images. The error rates are computed for different numbers of eigenvectors, starting from the eigenvectors with the highest eigenvalues up to the maximum possible number of eigenvectors. Later on, the classifier considers only the selected eigenvectors during the classification process. Eigenfaces is then tested with corrupted images affected by different levels of degradation.

Table 2 Description of training and testing sets

Scenario 1: training with uncorrupted images
  Training set: 3 sets of 400 uncorrupted images
  Testing set: 3 sets of 200 images; each level of distortion, including uncorrupted images, is considered independently in the testing sets

Scenario 2: training with corrupted and uncorrupted images
  Training set: 3 sets of 400 uncorrupted images combined with 400 corrupted images at a low level of distortion; 3 sets of 400 uncorrupted images combined with 400 corrupted images at a medium level of distortion; 3 sets of 400 uncorrupted images combined with 400 corrupted images at a high level of distortion
  Testing set: 3 sets of 200 images; each level of distortion, including uncorrupted images, is considered independently in the testing sets

In the same fashion, Fisherfaces is trained and later tested with only uncorrupted images. Subsequently, the error rates are computed for varying numbers of discriminant vectors. This approach allows us to determine the optimal discriminant vectors for computing the feature vectors. Fisherfaces then classifies taking into account only the chosen discriminant vectors. It is later tested with corrupted images at different levels of image distortion, all following the 3-fold cross-validation technique.

The size of the vectors of the PCA-transformed images used to train the Fisherfaces method is based upon the portion of the total variance that their eigenvectors account for; in this study, Fisherfaces was trained using the eigenvectors that account for 90% (or the closest possible value) of the total variance. Other selections of variance have been used to compute the weights of the images; for example, Bartlett et al. [25] considered the eigenvectors that account for 98% of the variance, approximately 200 eigenvectors. However, we found that the eigenvectors accounting for 90% of the variance provide lower error rates in the case of the Fisherfaces method.
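Picking the number of leading eigenvectors that accounts for a given fraction of the total variance can be sketched as follows (our own helper, not from the paper):

```python
import numpy as np

def components_for_variance(eigenvalues, fraction=0.90):
    """Number of leading eigenvectors whose eigenvalues account for
    at least the requested fraction of the total variance."""
    vals = np.sort(np.asarray(eigenvalues, float))[::-1]  # descending
    cumulative = np.cumsum(vals) / vals.sum()
    return int(np.searchsorted(cumulative, fraction) + 1)
```

For eigenvalues (5, 3, 1, 1), the leading three components account for 90% of the variance, so the helper returns 3.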

In general, the error rate decreases as we increase the number of eigenvectors and discriminant vectors. It is of interest to find the generally optimal number of eigenvectors that adequately represents the images in the face space without damaging the classification rates. This reduces the computational effort of the face classification methods and provides a simpler representation of the images in the face space.

6 Results

This section reports on the experimental results of the Eigenfaces and Fisherfaces classifiers under the scenarios outlined above. We start with the first scenario (training with uncorrupted images) and then move on to the second one (involving corrupted images in the training phase).

6.1 Training with uncorrupted images

Figure 3 summarizes the error rates of the Eigenfaces and Fisherfaces methods obtained for the first scenario using several distances. To make the plots readable, we show error rates for up to 100 eigenvectors. In the case of Eigenfaces, the numbers of eigenvectors that provide the lowest error rates are 73, 200, and 30 for the Euclidean, Hamming, and Tschebyshev distances, respectively. In the case of Fisherfaces, the classifier was trained using the portion of the PCA-transformed faces that accounts for approximately 90% of the total variance.

Fig. 3 Average error rates of (a) Eigenfaces (error rate vs. number of eigenvectors) and (b) Fisherfaces (error rate vs. number of discriminant vectors) classifiers for the Euclidean, Hamming, and Tschebyshev distances

Please observe that in Fisherfaces there are at most c − 1 discriminant vectors available; therefore the previous plots depict Fisherfaces only up to that extent along the independent axis. For further insight refer to [3] or to Sect. 2.2 of this paper.

In Fig. 3 we notice that the error rate goes up as the number of discriminant vectors increases; this may suggest that some over-fitting has happened in the training phase. As there are only two images per class and 200 classes, we fall into the small sample size problem, not having enough images to adequately "cover" the entire Fisherface space.

Each discriminant vector adds an extra dimension in the Fisher space to represent the images; however, most of the variance is captured by the primary discriminant vectors. As we add more discriminant vectors to compute the feature vectors, we also add extra information that may not be relevant to distinguish between individuals, and at the same time we expand the dimensionality of the space. If the variances attributed to the added discriminant vectors are small compared to the rest, then we are basically not providing relevant information to the feature vectors.

Fig. 4 Impact of image distortion on Eigenfaces method

Fig. 5 Impact of image distortion on Fisherfaces method

The distance function may offer some insight at this point. For example, the classification using the Tschebyshev distance shows the most pronounced increase in the error rate; this can be interpreted as if the discriminant vectors that produce the elements in the feature vectors are those that do not introduce relevant information for classification, and they could very well be those used to compute the distance, shadowing the relevant ones.

Figures 4 and 5 show the impact of image distortion over the testing sets using the Eigenfaces and Fisherfaces classifiers. The data are organized according to the types and levels of noise in the testing sets. A positive standard deviation is also depicted on top of each bar. Table 3 lists the numbers of eigenvectors and discriminant vectors that provided the lowest error rates in the classification of uncorrupted images; these eigenvectors and discriminant vectors were used to compute the error rates presented in the figures.

Table 3 Number of eigenvectors and discriminant vectors providing the lowest error rate

                      Euclidean   Hamming   Tschebyshev
Eigenvectors          73          200       30
Discriminant vectors  24          27        21

Table 4 Number of eigenvectors providing the lowest error rates considering low, medium, and high image distortion levels occurring in the training set

                                      Distance type
Distortion model   Distortion level   Euclidean   Hamming   Tschebyshev
Blur               Low                78          358       29
                   Medium             61          373       11
                   High               145         139       24
Salt and pepper    Low                86          175       31
                   Medium             89          247       17
                   High               90          206       36
Gaussian           Low                75          155       48
                   Medium             71          163       31
                   High               71          423       31

6.2 Training with corrupted and uncorrupted images

The main results are collected across the various image distortion models. Table 4 presents the number of eigenvectors that provides the lowest error rates for the Eigenfaces method; the reported results are obtained using a collection of uncorrupted images combined with corrupted images exhibiting low, medium, and high levels of distortion in the training sets.
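For concreteness, the three distortion models studied throughout (Gaussian noise, salt and pepper, and blurring) can be sketched as follows. This is our own illustrative implementation; the function names, the mean-filter blur kernel, and the toy parameters are assumptions, not the authors' exact settings:

```python
import numpy as np

def add_gaussian(img, sigma):
    """Additive zero-mean Gaussian noise, clipped to the [0, 255] range."""
    return np.clip(img + np.random.normal(0.0, sigma, img.shape), 0, 255)

def add_salt_pepper(img, density):
    """Set a fraction `density` of the pixels to 0 (pepper) or 255 (salt)."""
    out = img.copy()
    mask = np.random.random(img.shape) < density
    out[mask] = np.random.choice([0.0, 255.0], size=int(mask.sum()))
    return out

def blur(img, k):
    """Mean-filter blur with a k x k averaging window (edge-padded)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

# toy 8 x 8 "image" of mid-gray pixels
face = np.full((8, 8), 128.0)
noisy = add_salt_pepper(add_gaussian(face, 10.0), 0.05)
smooth = blur(face, 3)
```

Increasing `sigma`, `density`, or `k` produces the low, medium, and high distortion levels referred to in the tables.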

Table 5 describes the number of discriminant vectors that provide the lowest error rates over uncorrupted images. These discriminant vectors are the ones used by Fisherfaces when carrying out testing with corrupted images.

Table 5 Number of discriminant vectors that provide the lowest error rates considering low, medium, and high image distortion in the training set

                                      Distance type
Distortion model   Distortion level   Euclidean   Hamming   Tschebyshev
Blur               Low                18          21        17
                   Medium             25          26        11
                   High               23          23        12
Salt and pepper    Low                29          29        21
                   Medium             10          14        15
                   High               6           5         6
Gaussian           Low                26          24        23
                   Medium             34          44        31
                   High               26          20        28

Figures 6 and 7 show the impact of the blurring effect on the Eigenfaces and Fisherfaces methods, respectively; they contemplate the Euclidean, Hamming, and Tschebyshev distances. The performance of Eigenfaces and Fisherfaces is scrutinized in terms of error rate and standard deviation. The independent axis shows three main groups labeled as low, medium, and high distortion levels; these are the distortion levels accounted for in the training sets. Each of these groups contains several distortion levels that correspond to the image distortion in the testing sets. Tables 4 and 5 list the numbers of eigenvectors and discriminant vectors that provided the lowest error rates in the classification over uncorrupted images; these eigenvectors and discriminant vectors were used to compute the error rates presented in the figures.

In the same fashion, Figs. 8 and 9 present the impact of salt and pepper noise on the Eigenfaces and Fisherfaces methods, respectively.

Lastly, Figs. 10 and 11 present the impact of Gaussian distortion on the Eigenfaces and Fisherfaces methods, respectively.

From the results it is evident that the Fisherfaces method outperforms the Eigenfaces method for classification tasks; this is expected as the images present differences in lighting conditions and facial expressions. However, an interesting effect occurs: the distance functions affect the Eigenfaces and Fisherfaces methods in different manners. In general, the Hamming distance provides lower error rates in the Eigenfaces face space, and the Euclidean distance in the Fisherfaces space. The Tschebyshev distance is definitely a bad choice in any situation, because it only considers the largest difference along a single dimension between feature vectors. However, it generally follows a trend in accordance with the deterioration level.

Some important hypotheses can be drawn from these findings. As the face space delineated by Eigenfaces is relatively large, and having in mind that we incur the small sample size problem, the feature vectors may lie relatively close to one another in such a large space, making the classification task difficult. Therefore, a distance measure that separates the feature vectors (or spreads them out along the Eigenfaces space) may help to decrease the computed error rates. Let us take a closer look at the Minkowski family of distances described in (11) with a value of p = 1, a.k.a. the Hamming distance. We see that the feature vectors may be separated along particular axes (dimensions) of the face space when the values of particular features (elements) that form the feature vectors assume negative quantities, i.e., v′_j < 0 or v_j < 0; such an attribute cannot be pulled off by any other distance used in this study.
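For reference, the three distances compared above all come from the Minkowski family in (11); a minimal sketch (our own illustration; note that the "Hamming" distance of this paper corresponds to the p = 1, city-block case):

```python
import numpy as np

def minkowski(u, v, p):
    """Minkowski distance: p = 1 -> Hamming (city-block),
    p = 2 -> Euclidean, p = inf -> Tschebyshev."""
    d = np.abs(np.asarray(u, dtype=float) - np.asarray(v, dtype=float))
    if np.isinf(p):
        return d.max()          # largest single-coordinate difference
    return float((d ** p).sum() ** (1.0 / p))

u, v = np.array([1.0, -2.0, 3.0]), np.array([4.0, 2.0, 3.0])
print(minkowski(u, v, 1))       # → 7.0
print(minkowski(u, v, 2))       # → 5.0
print(minkowski(u, v, np.inf))  # → 4.0
```

The toy vectors make the contrast visible: the Tschebyshev value retains only the single largest coordinate difference, discarding the rest of the feature vector.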

In the case of Fisherfaces, the face space is considerably narrower than it is in Eigenfaces; therefore using a distance model that separates the feature vectors within the Fisherfaces space may not be as favorable as it is in the Eigenfaces space. Let us not forget that Fisherfaces already separates the feature vectors according to classes.


Fig. 6 Impact of blurring effect on Eigenfaces method with uncorrupted and corrupted images in the training sets

Fig. 7 Impact of blurring effect on Fisherfaces method with uncorrupted and corrupted images in the training sets

Fig. 8 Impact of salt and pepper noise on Eigenfaces method with uncorrupted and corrupted images in the training sets


Fig. 9 Impact of salt and pepper noise on Fisherfaces method with uncorrupted and corrupted images in the training sets

Fig. 10 Impact of Gaussian noise on Eigenfaces classifier with uncorrupted and corrupted images in the training sets

Fig. 11 Impact of Gaussian noise on Fisherfaces classifier with uncorrupted and corrupted images in the training sets


Table 6 Suggested distances for blurring effect in Eigenfaces and Fisherfaces classifiers

                                                    Testing image set
Training image set                 Uncorrupted          Low distortion       Medium distortion    High distortion
Uncorrupted         Eigenfaces     Hamming              Hamming/Euclidean    Euclidean            Euclidean
                    Fisherfaces    Euclidean            Euclidean            Euclidean            Euclidean/Hamming
Uncorrupted with    Eigenfaces     Hamming              Hamming              Hamming              Euclidean
low distortion      Fisherfaces    Euclidean/Hamming    Euclidean/Hamming    Euclidean            Euclidean
Uncorrupted with    Eigenfaces     Hamming              Hamming              Hamming              Hamming
medium distortion   Fisherfaces    Euclidean            Euclidean            Euclidean            Euclidean/Hamming
Uncorrupted with    Eigenfaces     Hamming              Hamming              Hamming              Hamming
high distortion     Fisherfaces    Hamming              Euclidean            Hamming              Euclidean/Hamming

Table 7 Suggested distances for salt and pepper noise in Eigenfaces and Fisherfaces classifiers

                                                    Testing image set
Training image set                 Uncorrupted          Low distortion       Medium distortion    High distortion
Uncorrupted         Eigenfaces     Hamming              Hamming              Hamming              Any distance^a
                    Fisherfaces    Euclidean            Euclidean/Hamming    Hamming              Any distance^a
Uncorrupted with    Eigenfaces     Hamming              Hamming              Euclidean            Any distance^a
low distortion      Fisherfaces    Euclidean/Hamming    Euclidean/Hamming    Euclidean/Hamming    Any distance^a
Uncorrupted with    Eigenfaces     Hamming              Hamming              Hamming              Hamming
medium distortion   Fisherfaces    Euclidean            Euclidean            Euclidean            Any distance
Uncorrupted with    Eigenfaces     Hamming              Hamming              Euclidean/Hamming    Hamming
high distortion     Fisherfaces    Any distance         Hamming              Hamming              Any distance

^a Very high error rate.

Table 8 Suggested distances for Gaussian noise in Eigenfaces and Fisherfaces classifiers

                                                    Testing image set
Training image set                 Uncorrupted          Low distortion       Medium distortion    High distortion
Uncorrupted         Eigenfaces     Hamming              Hamming              Hamming              Hamming
                    Fisherfaces    Euclidean            Euclidean/Hamming    Euclidean/Hamming    Euclidean/Hamming
Uncorrupted with    Eigenfaces     Hamming              Hamming              Euclidean            Hamming
low distortion      Fisherfaces    Euclidean/Hamming    Euclidean/Hamming    Euclidean/Hamming    Euclidean/Hamming
Uncorrupted with    Eigenfaces     Hamming              Hamming              Hamming              Hamming
medium distortion   Fisherfaces    Euclidean            Euclidean            Euclidean            Euclidean
Uncorrupted with    Eigenfaces     Hamming              Hamming              Hamming              Hamming
high distortion     Fisherfaces    Euclidean/Hamming    Euclidean/Hamming    Euclidean/Hamming    Euclidean


Fig. 12 Example of misclassified individuals when using Eigenfaces method

Recall also that the Fisherfaces transformation matrix D_LDA is implicitly computed using the Euclidean distance. From this perspective it becomes reasonable that the Hamming distance performs better within the Eigenfaces space and the Euclidean distance within the Fisherfaces space.
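Both classifiers reduce, in their respective face spaces, to the nearest-neighbor rule [13]; a minimal sketch of that final step (our own illustration, with toy feature vectors and labels):

```python
import numpy as np

def nearest_neighbor(train_feats, train_labels, probe, dist):
    """1-NN rule in the face space: label the probe with the class of
    its closest training feature vector under the chosen distance."""
    d = [dist(t, probe) for t in train_feats]
    return train_labels[int(np.argmin(d))]

feats = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
labels = ["A", "B"]
tscheb = lambda u, v: np.abs(u - v).max()   # Tschebyshev distance
print(nearest_neighbor(feats, labels, np.array([4.0, 6.0]), tscheb))  # → B
```

Swapping `dist` for a city-block or Euclidean function reproduces the distance comparisons reported in Tables 6–8.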

An interesting tendency can also be observed in the impact of the blurring effect on the Eigenfaces method: for low levels of image distortion, in either training or testing sets, the Hamming distance provides lower error rates; however, as the distortion level increases, the Euclidean distance takes over, providing lower error rates.

The error rates presented in the second scenario show a tendency to diminish when the corruption in the classification sets is similar to the corruption in the training sets; this is not surprising if we consider that the feature vectors in the training set already contain the information, in terms of variance, of the corruption in the testing sets. Probably the clearest example can be observed in the error rates shown in Fig. 7, where the error rates of training with a low distortion level show the lowest values at the point where the classification sets contained low distortion levels as well. On the other hand, the error rates of training with a medium distortion level are minimal at the points where testing is performed over uncorrupted images and corrupted images with medium distortion levels.

Fig. 13 Example of misclassified individuals using Fisherfaces method

Based on our results, we suggest introducing image distortion in the training sets to improve the classification performance of Eigenfaces and Fisherfaces; there is a significant reduction of the error rate when following this approach.

So far we have introduced only one image distortion model in the training sets at a time; however, it would be of interest to assess the performance of the classifiers when the training phase is carried out with several types of distortion. Such an approach may eventually reduce the error rates when classifying images with a combined distortion effect. Tables 6–8 summarize the findings as to the role of the distance function depending upon the level and intensity of noise. We have identified the type of distance that is favored in the sense that it exhibits a tendency of maintaining the lowest error rates.

Some examples of the misclassified individuals when using the Eigenfaces method are presented in Fig. 12; here we show the actual individual being classified along with the individual suggested by the classifier. Likewise, the results in Fig. 13 concern the misclassified individuals in the case of the Fisherfaces method (classifier).

7 Conclusions

This paper has delivered an extensive experimental study on the performance of the Eigenfaces and Fisherfaces methods (classifiers) completed under noisy conditions; in particular we have investigated the blurring effect, salt and pepper noise, and Gaussian noise. Two general design scenarios have been proposed. When designing the classifiers, we experimented with three distances used to form the nearest neighbor classifier.

We have quantified the effect of noise and arrived at several design guidelines that can be helpful for forming face classifiers in anticipation of various noise conditions and their intensities. In general, we have found that the introduction of image distortion in the training sets improves the classification performance of the Eigenfaces and Fisherfaces methods. In this study we obtained error rates as low as 1.5%, which, as far as we know, has not been accomplished before with the FERET dataset.

Acknowledgements The experimental research used the FERET database of facial images collected under the FERET program. Support from the Natural Sciences and Engineering Research Council (NSERC) and the Canada Research Chair (W. Pedrycz) is gratefully acknowledged.

References

1. Joo Er, M., Wu, S., Lu, J., Lye Toh, H.: Face recognition with radial basis function (RBF) neural networks. IEEE Trans. Neural Netw. 13(3), 697–710 (2002)

2. Bolle, R.M. et al.: Guide to Biometrics. Springer-Verlag, Berlin Heidelberg New York (2004)

3. Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J.: Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Trans. Pattern Anal. Machine Intell. 19(7), 711–720 (1997)

4. Lu, J., Plataniotis, K.N., Venetsanopoulos, A.N.: Face recognition using kernel direct discriminant analysis algorithms. IEEE Trans. Neural Netw. 14(1), 117–126 (2003)

5. Lu, J., Plataniotis, K.N., Venetsanopoulos, A.N.: Face recognition using LDA-based algorithms. IEEE Trans. Neural Netw. 14(1), 195–208 (2003)

6. Baker, S., Kanade, T.: Limits on super-resolution and how to break them. IEEE Trans. Pattern Anal. Machine Intell. 24(9), 1167–1183 (2002)

7. Turk, M., Pentland, A.: Face recognition using Eigenfaces. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 586–591 (1991)

8. Sirovich, L., Kirby, M.: Low-dimensional procedure for the characterization of human faces. J. Opt. Soc. Amer. A 4(3), 519–524 (1987)

9. Martínez, A.M., Kak, A.C.: PCA versus LDA. IEEE Trans. Pattern Anal. Machine Intell. 23(2), 228–233 (2001)

10. Etemad, K., Chellappa, R.: Discriminant analysis for recognition of human face images. J. Opt. Soc. Amer. A 14(8), 1724–1733 (1997)

11. Cios, K., Pedrycz, W., Swiniarski, R.: Data Mining Methods for Knowledge Discovery. Kluwer Academic Publishers, second printing (2000)

12. Ripley, B.D.: Pattern Recognition and Neural Networks. Cambridge University Press (1996)

13. Cover, T.M., Hart, P.E.: Nearest neighbor pattern classification. IEEE Trans. Inf. Theory IT-13, 21–27 (1967)

14. Phillips, P.J. et al.: The FERET database and evaluation procedure for face recognition algorithms. Image Vis. Comput. J. 16(5), 295–306 (1998)

15. Pentland, A., Choudhury, T.: Face recognition for smart environments. IEEE Computer J. 33(2), 50–55 (2000)

16. Frey, B.J., Colmenarez, A., Huang, T.S.: Mixtures of local linear subspaces for face recognition. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 32–37 (1998)

17. Liu, C., Wechsler, H.: A unified Bayesian framework for face recognition. In: Proceedings of the IEEE International Conference on Image Processing, vol. 1, pp. 151–155 (1998)

18. Perlibakas, V.: Distance measure for PCA-based face recognition. Pattern Recognit. Lett. 25(6), 711–724 (2004)

19. Zhao, W., Chellappa, R.: Robust image based face recognition. In: Proceedings of the IEEE International Conference on Image Processing, vol. 1, pp. 41–44 (2000)

20. Zhao, W., Chellappa, R., Krishnaswamy, A.: Discriminant analysis of principal components for face recognition. In: Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition, pp. 336–341 (1998)

21. Ekstrom, M.P.: Digital Image Processing Techniques. Academic Press (1984)

22. Samaria, F.S., Harter, C.: Parameterisation of a stochastic model for human face identification. In: Proceedings of the Second IEEE Workshop on Applications of Computer Vision, pp. 138–142 (1994)

23. Stainvas, I., Intrator, N.: Blurred face recognition via a hybrid neural architecture. In: Proceedings of the 15th IEEE International Conference on Pattern Recognition, vol. 2, pp. 805–808 (2000)

24. McGuire, P., D'Eleuterio, G.M.T.: Eigenpaxels and a neural network approach to image classification. IEEE Trans. Neural Netw. 12(3), 625–635 (2001)

25. Stewart Bartlett, M., Movellan, J.R., Sejnowski, T.J.: Face recognition by independent component analysis. IEEE Trans. Neural Netw. 13(6), 1450–1464 (2002)

Gabriel Jarillo Alvarado obtained his B.Sc. degree in Biomedical Engineering from the Universidad Iberoamericana, Mexico. In 2003 he obtained his M.Sc. degree from the University of Alberta at the Department of Electrical and Computer Engineering; he is currently enrolled in the Ph.D. program at the same university. His research interests involve machine learning, pattern recognition, and evolutionary computation, with particular interest in biometrics for personal identification.


Witold Pedrycz is a Professor and Canada Research Chair (CRC) in Computational Intelligence in the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada. His research interests involve Computational Intelligence, fuzzy modeling, knowledge discovery and data mining, fuzzy control including fuzzy controllers, pattern recognition, knowledge-based neural networks, relational computing, and Software Engineering. He has published numerous papers in this area. He is also an author of 9 research monographs. Witold Pedrycz has been a member of numerous program committees of conferences in the area of fuzzy sets and neurocomputing. He currently serves on the editorial boards of numerous journals, including IEEE Transactions on Systems, Man and Cybernetics, Pattern Recognition Letters, IEEE Transactions on Fuzzy Systems, Fuzzy Sets & Systems, and IEEE Transactions on Neural Networks. He is an Editor-in-Chief of Information Sciences.

Marek Reformat received his M.Sc. degree from the Technical University of Poznan, Poland, and his Ph.D. from the University of Manitoba, Canada. His interests were related to simulation and modeling in the time domain, as well as evolutionary computing and its application to optimization problems. For three years he worked for the Manitoba HVDC Research Centre, Canada, where he was a member of a simulation software development team. Currently, Marek Reformat is with the Department of Electrical and Computer Engineering at the University of Alberta. His research interests lie in the application of Computational Intelligence techniques, such as neuro-fuzzy systems and evolutionary computing, as well as probabilistic and evidence theories, to intelligent data analysis leading to translating data into knowledge. He applies these methods to conduct research in the areas of Software and Knowledge Engineering. He has been a member of program committees of several conferences related to Computational Intelligence and evolutionary computing.

Keun-Chang Kwak received B.Sc., M.Sc., and Ph.D. degrees in the Department of Electrical Engineering from Chungbuk National University, Cheongju, South Korea, in 1996, 1998, and 2002, respectively. During 2002–2003, he worked as a researcher in the Brain Korea 21 Project Group, Chungbuk National University. His research interests include biometrics, computational intelligence, pattern recognition, and intelligent control.