Contourlet Appearance Model for Facial Age Estimation

Khoa Luu 1,3, Keshav Seshadri 2,3, Marios Savvides 2,3, Tien D. Bui 1, and Ching Y. Suen 1 *

Abstract

In this paper we propose a novel Contourlet Appearance Model (CAM) that is more accurate and faster at localizing facial landmarks than Active Appearance Models (AAMs). Our CAM also has the ability to not only extract holistic texture information, as AAMs do, but can also extract local texture information using the Nonsubsampled Contourlet Transform (NSCT). We demonstrate the efficiency of our method by applying it to the problem of facial age estimation. Compared to previously published age estimation techniques, our approach yields more accurate results when tested on various face aging databases.

1. Introduction

Face based age estimation studies have become prominent in recent years due to their potential role in enhancing facial recognition systems, in assisting law enforcement, etc. Human face age progression can be divided physically into two categories, i.e. growth and development (childhood to pre-adulthood), which primarily affects the physical structure of the craniofacial complex in the form of lengthening and widening of bony tissue, and adult aging, where the primary impacts are in the formation of wrinkles, lines and soft tissue morphology. Active Appearance Models (AAMs) [3] are a commonly used tool in face age estimation [19] because they can successfully represent both shape, which is prominent in the growth and development period, and texture, which is prominent in the adult aging period.

1.1. Motivation

There are two main driving reasons for carrying out this work. Firstly, prior work on AAM implementations indicates that their ability to generalize to automatically fit facial landmarks on unseen images that exhibit illumination, pose or expression variation is quite poor [4]. Active Shape Models (ASMs) [5] have undergone several changes since the classical ASM was proposed in [6] and have recently become more robust [14, 20, 32] and hence more popular for accomplishing this task, as they are not only faster, but also more accurate when dealing with unseen variations that are not present in training images.

1 Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada {kh_lu,bui,suen}@encs.concordia.ca

2 Department of Electrical and Computer Engineering (ECE), Carnegie Mellon University, Pittsburgh, USA [email protected], [email protected]

3 CyLab Biometrics Center, Carnegie Mellon University, Pittsburgh, USA

978-1-4577-1359-0/11/$26.00 ©2011 IEEE

Secondly, although AAMs are an excellent model based approach for face related problems [19, 24], they only represent holistic and not local information. This concern is something that local based approaches, such as Local Binary Patterns (LBPs) [1] and Gabor features, have addressed and hence their use in many applications has increased tremendously over the last few years. Compared to LBPs and Gabor features, the Nonsubsampled Contourlet Transform (NSCT) [8] can extract both local and directional information. Additionally, compared to LBPs, which deal poorly with noisy images, the NSCT can distinguish between the noise and the high frequency components, which contain edges and directional information, in an image.

This paper proposes a new Contourlet Appearance Model (CAM) that combines the robust facial landmarking capability of the Modified Active Shape Model (MASM) implementation [25] with NSCT features. Once the landmarking task is accomplished, both gray-scale pixels, which represent the holistic features prominent in childhood, and Contourlet features, which represent local features (wrinkles) that are especially prominent during adult aging, are used to extract facial features from given images. These combined features are used as the discriminative facial inputs to a face aging function and are found to yield promising results on unseen images across two databases.

The rest of this paper is organized as follows. In section 2, we review prior work carried out on AAMs and age estimation. In section 3, we describe the Contourlet Transform and the NSCT. In section 4, we describe how our Contourlet Appearance Model is used to extract shape and texture information from facial images, and go on to describe how we set up face aging functions for age estimation in section 5. Finally, we compare our approach against existing age estimation methods in section 6 and offer some conclusions on this work in section 7.

Figure 1. Sample ASM fitting results. The images in the first row are the initialization provided to the ASMs while the images in the second row show the corresponding fitting results under such initialization conditions. (a) An example of poor initialization, (b) Accurate initialization provided to the classical ASM implementation, (c) Accurate initialization provided to MASM, (d) Fitting results produced by MASM under poor initialization conditions, (e) Fitting results produced by classical ASM under accurate initialization conditions, (f) Fitting results produced by MASM under accurate initialization conditions.

2. Background

In this section, we will first review prior work carried out on AAMs and then describe previous studies on age estimation.

2.1. Prior work on AAMs

AAMs were first presented by Cootes et al. in 1998 [3]. Since then, there have been numerous studies on possible improvements to them, mainly focused on improving either their texture modeling stage or their fitting stage.

In the texture modeling stage, it is necessary to deal with high-dimensional texture representations. When representing high resolution images, the twin problems of storage and computational requirements are often encountered. Therefore, two solutions were suggested to reduce redundancy in representation, i.e. image compression and local textures. In the first approach, wavelet based approaches were proposed to compress the texture to eliminate redundant pixels with small variances [29]. However, the compression ratio is still limited at the fitting stage. In the second approach, since local textures around feature points are more important than others at the model fitting stage, Cristinacce and Cootes [7] suggested replacing the global textures by local ones to avoid dealing with high-dimensional texture. Kahraman et al. [17] used two independent subspaces, illumination and identity, to combat variations in illumination. Christoudias and Darrell [2] presented a nonlinear modeling approach combining a Gaussian Mixture Model (GMM) and a Nearest Neighbor Model (NNM) to model objects in nonlinear subspaces.

At the fitting stage, searching algorithms can suffer from local minima problems due to poor initial model parameters and, as shown in Figure 1, initialization is extremely important for obtaining accurate fitting results. Therefore, the initial parameters have to be within a small radius of convergence. To solve this problem, Wimmer et al. [28] used ASMs to initialize the AAM fitting stage more precisely. Their approach yields both a large radius of convergence and accurate fitting. However, their method is still an AAM based fitting algorithm and, as we have outlined in section 1.1, if ASM fitting results are good, we only need to locate facial landmarks using an ASM and then apply Principal Component Analysis (PCA) on the region bounded by their convex hull to determine the best AAM parameters, without actually needing to use an AAM for fitting the image. This is a key idea in our approach, in which we use the MASM implementation, which has been shown to have a reasonably high fitting accuracy, to obtain the landmark locations. This is highlighted by Figure 1, which shows that MASM obtains more accurate fitting results than the classical ASM under identical initialization conditions.

2.2. Prior work on age estimation

Age estimation techniques can be broadly classified under two headings, which are described in the following paragraphs.

In holistic features based approaches, facial features are globally extracted from the entire facial region. Lanitis et al. [18] used AAM features with a Weighted Aging Function (WAS method) set up by a genetic algorithm to address the age estimation problem. Ricanek et al. [24] developed a multi-ethnic face age determination system to deal with race variations. In this implementation, each facial image is annotated with 161 landmarks. After extracting AAM features, Least Angle Regression (LAR) is used to determine the best features for age prediction. Luu et al. [19] used AAMs and Support Vector Regression (SVR) to build an age estimation framework.

In local features based approaches, facial features are locally extracted from windows around key facial landmarks, which often encode age information (such as wrinkles around the corners of the eyes, etc.). Yan et al. [31] used Spatially Flexible Patches (SFPs) and GMMs to set up a method known as Regression from Patch Kernels (RPK). Each SFP integrates local features with coordinate information to overcome problems such as occlusions, illumination effects or slight head pose variations. Suo et al. [26] designed a multi-resolution hierarchical graphical face model for age estimation. Facial images are first decomposed into detailed components from coarse to fine resolutions. AAMs are then applied to the overall face and to each type of facial component. Moreover, skin zones such as wrinkles and blobs are also considered in this graph. Exploiting the strength of Gabor features, Gao and Ai [11] used them for facial image representation and used Linear Discriminant Analysis (LDA) for building an age classifier. They also proposed using fuzzy LDA to cope with age ambiguity. Their method was tested on a dataset containing 4787 consumer images with four age groups: babies, children, adults and the elderly and was found to exhibit a classification rate of around 91%. Recently, Guo et al. [16] investigated Biologically Inspired Features (BIFs) derived from a feed-forward model of the primate visual object recognition pathway. The advantage of this approach is that small translations, rotations and scale changes can be well handled by the new set of features. Guo et al. [15] then improved on this method by integrating an age manifold into the aging process. Instead of dealing with each image independently, Geng et al. [12] introduced an Aging Pattern Subspace (AGES) that uses all facial images of each subject. Their idea was to model the aging pattern, a sequence of an individual's facial images sorted chronologically, by constructing a representative subspace. A detailed literature review on age estimation methods can be found in [10].

Figure 2. Schematic diagram of the Contourlet Transform.

3. Contourlet Transform and Nonsubsampled Contourlet Transform

This section will review the Contourlet Transform and the Nonsubsampled Contourlet Transform, an extension of the Contourlet Transform that allows redundancy in order to achieve better decomposed images.

3.1. Contourlet Transform

Do and Vetterli introduced the Contourlet Transform [9], which has the ability to represent images with smooth contours. Unlike other wavelet transforms, originally developed in the continuous domain and then applied to discretized signals, the Contourlet Transform is designed specifically to discover the 2D geometry of a given digital image; it not only captures the main features of wavelets, e.g. multi-resolution, localization and critical sampling, but also offers a high degree of directionality and anisotropy. The transform is based on an efficient 2D multi-scale and multi-direction image expansion using Directional Filter Banks (DFBs).

The Contourlet Transform is constructed by using a double filter bank to decompose an image into several directional sub-bands at multiple scales. This double filter bank is a combination of the Laplacian Pyramid (LP) with a DFB at each scale. The LP is first used to decompose the image into low frequency and high frequency sub-bands, which capture point discontinuities. Then the DFB is applied on the high frequency sub-bands to link point discontinuities into linear structures, resulting in several directional sub-bands. Repeating these processes on the low frequency sub-bands results in a multi-scale and directional decomposition of images. The schematic diagram of the Contourlet scheme is shown in Figure 2.
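The LP stage described above can be sketched numerically. This is a minimal illustration only: it uses a Gaussian low-pass filter as a stand-in for the actual pyramid filters, and it omits the DFB step that would further split each high-frequency band into directional sub-bands.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def laplacian_pyramid(image, levels=3, sigma=2.0):
    """Toy multi-scale decomposition in the spirit of the LP stage:
    at each level, split the current band into a low-pass approximation
    and a high-pass residual that captures point discontinuities.
    (A real contourlet would next apply a DFB to each high band.)"""
    lows, highs = image.astype(float), []
    for _ in range(levels):
        low = gaussian_filter(lows, sigma)   # low-frequency sub-band
        highs.append(lows - low)             # high-frequency residual
        lows = low                           # recurse on the low-pass band
    return lows, highs

# The decomposition telescopes: summing the residuals and the final
# low-pass band recovers the original image exactly.
img = np.random.rand(32, 32)
low, highs = laplacian_pyramid(img)
recon = low + sum(highs)
```

Because no subsampling is performed, this toy pyramid is perfectly invertible by simple summation, which mirrors the redundancy that makes the nonsubsampled variant shift-invariant.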

3.2. Nonsubsampled Contourlet Transform (NSCT)

The Nonsubsampled Contourlet Transform [8] is a fully shift-invariant, multi-scale, and multi-directional transform that can separate strong edges, weak edges and noise from a given image. It is constructed using a Nonsubsampled Pyramid (NSP), responsible for the multi-scale information, and the Nonsubsampled Directional Filter Bank (NSDFB), which is responsible for conducting the directional information. Given an input image I_i represented at the i-th level, 1 ≤ i ≤ J, where J is the maximum level of the decomposition, I_i is divided into a low-pass sub-band I_i^(L) and a high-pass sub-band I_i^(H) by using the NSP equipped with a low-pass filter H^(L) and a high-pass filter H^(H) as shown in (1) and (2), in which * is the convolution operator.

I_i^(L) = H^(L) * I_i    (1)

I_i^(H) = H^(H) * I_i    (2)

The high-pass sub-band I_i^(H) is then decomposed into directional sub-bands by using the NSDFB, which is constructed in cascades comprising parallelogram filters and two-channel fan filter banks. The low-pass sub-band I_i^(L) is in turn used for the next decomposition. Consequently, the directional sub-bands I_{i,k} can be computed by using the equivalent filter F_k^(eq) using (3), in which 2^{l_i} is the number of directional sub-bands at the i-th level.

I_{i,k} = F_k^(eq) * I_i^(H),  1 ≤ k ≤ 2^{l_i}    (3)

The procedure is in turn applied to the low-pass sub-band to obtain the next level decomposition. Generally, the decomposition procedure of the NSCT is formulated as shown in (4), in which I is the given image, I_J^(L) is the final low-pass sub-band and (I_{i,k})_{i,k} are the directional sub-band coefficients.

NSCT(I) = { I_J^(L), (I_{i,k}) : 1 ≤ i ≤ J, 1 ≤ k ≤ 2^{l_i} }    (4)

Figure 3. Features extracted by different texture extraction methods. (a) Original facial image, (b) Noisy image with std. dev. of noise σ set to 0.1, (c)-(f) Low-pass, strong edge, weak edge and noise components respectively obtained after applying LNSCT on the noisy image, (g) LBP map of the noisy image, (h) Noisy image filtered using a Gabor filter.

Both noise and weak edges have low magnitude coefficients in the frequency domain and because of this most image enhancement methods cannot separate weak edges from noise. However, weak edges have geometric structure while noise does not. The NSCT can therefore be used to discriminate between them. This is illustrated by Figure 3, which shows how the NSCT is able to separate out these three components from a noisy image, which can in turn enable the extraction of more potentially discriminative feature vectors compared to conventional texture extraction methods such as LBPs or Gabor filters.

The strong edges in an image are the pixels with the largest magnitude coefficients in all sub-bands. Weak edges correspond to those pixels having both large and small magnitude coefficients in directional sub-bands within the same scale. Noise corresponds to those pixels that have small magnitude coefficients in all sub-bands. To enhance the range of values of a processed image, the image is converted from the spatial domain into the logarithmic domain before applying the NSCT [30]. The modified Logarithmic NSCT (LNSCT) coefficients x are nonlinearly mapped by v(x) using (5), in which m̄ and m̂ are respectively the mean and the maximum magnitude of the coefficients for each pixel across the directional sub-bands, σ is the standard deviation of noise of the sub-bands at a specific pyramidal level, and p and c are constants with 1 ≤ c ≤ 5.

v(x) = 1                     if cσ ≤ m̄            (strong-edge pixels)
v(x) = max((m̄/|x|)^p, 1)    if cσ > m̄, cσ ≤ m̂   (weak-edge pixels)
v(x) = 0                     if cσ > m̄, cσ > m̂   (noise)
                                                   (5)

The steps involved in extracting Contourlet features are as follows:

1. Given an image I, calculate its logarithm: L = log(I)

2. Apply NSCT on L for N levels

3. Estimate the noise standard deviation

4. For each level of the pyramid:

(a) Estimate the noise variance

(b) At each pixel, compute m and m of the corre­sponding coefficients in all directional sub-bands at this level, and apply (5)

5. Apply the inverse NSCT to get the processed image
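The per-pixel classification and mapping of (5) can be sketched as follows. Note the branch values follow the usual keep/amplify/suppress convention of NSCT-based enhancement (strong edges kept, weak edges amplified, noise zeroed); the exact gain used in the original formulation may differ, so treat this as an assumption.

```python
import numpy as np

def vmap(x, m_mean, m_max, sigma, p=2.0, c=3.0):
    """Sketch of the nonlinear mapping in (5), applied per pixel.
    x: coefficient magnitude at this pixel;
    m_mean, m_max: mean and maximum magnitude of the coefficients for
    this pixel across the directional sub-bands at one pyramid level;
    sigma: noise std. dev. at this level; p, c: constants (1 <= c <= 5)."""
    t = c * sigma
    if t <= m_mean:                          # strong-edge pixel: keep
        return 1.0
    elif t <= m_max:                         # weak-edge pixel: amplify
        return max((m_mean / abs(x)) ** p, 1.0)
    else:                                    # noise pixel: suppress
        return 0.0
```

In step 4(b) of the procedure above, this mapping would be evaluated for every pixel of every directional sub-band before the inverse NSCT is applied.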

4. Contourlet Appearance Model (CAM)

This section outlines our proposed Contourlet Appearance Model for extraction of features to be used later by face aging functions for age estimation. The three key stages in building the CAM are shape modeling, texture modeling and finally combining the resultant shape and texture parameters into a form that can be used for regression based age estimation.

4.1. CAM shape modeling

Our CAM first requires the automatic localization of several facial landmarks before the texture information from within their convex hull is extracted. This process, referred to as the shape modeling stage, is accomplished by using an ASM. Once an ASM is trained on images with a set of manually annotated landmarks (that lie along the facial boundary, eyes, nose, etc.), it can exhibit a fairly high degree of accuracy in locating the same landmarks in unseen faces.

During the training stage an ASM builds a point distribution model to capture the variation in shape exhibited by several faces with manually positioned landmarks. The shapes in the training set are aligned using Generalized Procrustes Analysis [13] before the mean shape x̄ is calculated. PCA is then applied on the aligned shapes to capture the main axes of shape variation. It now becomes possible to approximate any shape x by the shape model equation given by (6), in which P_s is a matrix of the principal eigenvectors of the covariance matrix of the aligned shapes and b_s is a vector of the PCA coefficients of x.

x ≈ x̄ + P_s b_s    (6)
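The point distribution model of (6) can be sketched in a few lines. This is a minimal illustration assuming the training shapes have already been Procrustes-aligned and flattened into vectors; function names are ours, not the authors'.

```python
import numpy as np

def build_shape_model(aligned_shapes, n_modes=5):
    """Build the shape model of (6): mean shape plus the principal
    modes of variation of the aligned training shapes.
    aligned_shapes: (n_samples, 2 * n_landmarks) array."""
    mean = aligned_shapes.mean(axis=0)
    X = aligned_shapes - mean
    # Principal eigenvectors of the covariance matrix, via SVD of X.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P_s = Vt[:n_modes].T                  # columns = principal eigenvectors
    return mean, P_s

def approximate_shape(x, mean, P_s):
    """Project a shape into the model and reconstruct: x ~ mean + P_s b_s."""
    b_s = P_s.T @ (x - mean)
    return mean + P_s @ b_s, b_s

# Synthetic sanity check: shapes drawn from a 3-mode model are
# reconstructed exactly by a 3-mode shape model.
rng = np.random.default_rng(0)
B = rng.normal(size=(3, 10))
shapes = rng.normal(size=(50, 3)) @ B + 5.0
mean, P_s = build_shape_model(shapes, n_modes=3)
recon, b_s = approximate_shape(shapes[0], mean, P_s)
```

Constraining the entries of b_s (typically to a few standard deviations of the corresponding mode) is what keeps generated shapes face-like during fitting.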

The final stage in the training process involves building statistical models of the local pixel variation in regions around each landmark across all training images. This information is represented by the mean ḡ and covariance matrix S_g of the profiles around a landmark, which are constructed by sampling the image intensities along lines perpendicular to the direction of the shape boundary at a point (1D profiles) or in square regions around it (2D profiles) at multiple resolutions of the face in an image pyramid.

When an unseen face is to be fitted by an ASM, a face detector is first used to initialize its parameters and roughly align the mean shape over the face. The landmarks are then iteratively moved at the lowest resolution image in the image pyramid into locations whose surrounding profiles g bear the lowest squared Mahalanobis distance f(g) to the mean profile ḡ for the respective landmark, as shown in (7).

f(g) = (g − ḡ)^T S_g^{-1} (g − ḡ)    (7)

The PCA coefficients for the shape in question are then calculated and constrained, so that the shape generated is representative of a typical human face. The process is repeated until the landmark locations converge at the lowest resolution and are then scaled and again iteratively updated at the higher image resolutions. Thus, a coarse-to-fine adjustment of the landmark positions is made possible and, once convergence is reached at the highest resolution, the final set of landmark coordinates is obtained.
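The candidate-selection step driven by (7) amounts to a small amount of linear algebra per landmark. A minimal sketch, assuming profiles are already sampled into vectors and the inverse covariance S_g^{-1} is precomputed:

```python
import numpy as np

def profile_cost(g, g_mean, S_inv):
    """Squared Mahalanobis distance of a sampled profile g to the mean
    profile, as in (7): f(g) = (g - g_mean)^T S_g^{-1} (g - g_mean)."""
    d = g - g_mean
    return float(d @ S_inv @ d)

def best_candidate(candidate_profiles, g_mean, S_inv):
    """Return the index of the candidate landmark position whose
    surrounding profile has the lowest cost f(g)."""
    costs = [profile_cost(g, g_mean, S_inv) for g in candidate_profiles]
    return int(np.argmin(costs))
```

During fitting, this selection is run for every landmark at every iteration, at each level of the image pyramid.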

4.1.1 Modified Active Shape Model (MASM)

The ASM implementation used in our CAM is the Modified Active Shape Model (MASM) proposed in [25]. MASM uses only 2D profiles and 2D search regions in order to determine the best location for each landmark. The use of 2D profiles results in a better modeling of the pixel intensities in regions around each landmark compared to the 1D profiles used by the classical ASM described by Cootes et al. in [6]. MASM also goes one step further than the classical ASM by building a PCA subspace of the pixel variation around each landmark. At the test stage, a 2D profile around a potential landmark location is projected onto this subspace to obtain a vector of projection coefficients and is then reconstructed with these coefficients. The reconstruction error between this reconstructed profile and the original profile is calculated, and the candidate location with the lowest reconstruction error is used as the new location for the landmark in question. These improvements contribute to the fairly high fitting accuracy of MASM and enable it to outperform the classical ASM, as demonstrated in [25].

Figure 4. Comparison of the fitting results produced by an AAM and CAM. (a) Input image, (b) and (c) Fitting result and texture reconstruction obtained by an AAM, (d) and (e) Fitting result and texture reconstruction obtained by CAM.

4.2. CAM texture modeling

After finding the facial region bounded by the convex hull of the landmarks determined by MASM, CAM texture parameters must be extracted from this region. These CAM texture parameters are extracted using methods similar to those used by AAMs. During the training stage, a PCA subspace P_t is built on the "shape-free" and illumination normalized gray-level texture patches enclosed by the convex hulls of several training faces. The phrase "shape-free" is used as all texture patches are warped to a common mean shape, thus ensuring that the resultant patches are all of the same shape and contain the same number of pixels. In addition to this gray-level texture subspace, we also build a PCA subspace P_w of the warped weak edges texture vectors obtained after processing the images using the NSCT as outlined in subsection 3.2. When a test image has been landmarked, we obtain a "shape-free" gray-level texture vector t and a "shape-free" weak edges texture vector w, which are normalized using the same process followed during training. These texture vectors are then projected onto their respective subspaces to obtain two sets of texture coefficients b_t and b_w using (8) and (9), in which t̄ and w̄ are the mean texture vector obtained from the original training images and the mean texture vector obtained from the processed images respectively.

b_t = P_t^T (t − t̄)    (8)

b_w = P_w^T (w − w̄)    (9)

Figure 5. 68 point landmarking scheme used in our experiments.
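The projections in (8) and (9) are straightforward once the two subspaces exist. A minimal sketch, assuming P_t, P_w and the mean vectors have already been computed from training patches (the function name is ours):

```python
import numpy as np

def texture_coefficients(t, w, t_mean, w_mean, P_t, P_w):
    """Compute the two sets of CAM texture coefficients of (8) and (9):
    b_t = P_t^T (t - t_mean) for the gray-level texture vector and
    b_w = P_w^T (w - w_mean) for the NSCT weak-edge texture vector."""
    b_t = P_t.T @ (t - t_mean)
    b_w = P_w.T @ (w - w_mean)
    return b_t, b_w

# Toy example: 4-pixel patches, 2-dimensional subspaces.
P_t = P_w = np.eye(4)[:, :2]
t = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.5, 0.0, 0.0, 0.0])
b_t, b_w = texture_coefficients(t, w, np.zeros(4), np.zeros(4), P_t, P_w)
```

In our CAM these two coefficient vectors are later concatenated with the shape coefficients b_s to form the combined feature vector used for age regression.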

4.3. Combining CAM shape and texture parameters

The three sets of parameters that we obtain from the CAM shape and texture modeling are combined and serve as our input to a regression based age estimator function. This is where our implementation differs from the process followed by conventional AAM based age estimators. During their training stages, AAM based methods extract the shape parameters b_s and texture parameters b_t from landmarked images, combine them into a single vector b, and then go on to build a third PCA subspace P_c for this set of combined features so that they can be represented by (10), in which c is the set of PCA coefficients of the combined appearance model.

b = P_c c    (10)

When fitting an unseen image, the search is driven by updates to c and it is the final set of combined appearance parameters that is used for regression based age analysis. This is where the frailties of AAM based methods emerge. Since the search is driven by the combined appearance parameters and not the texture parameters themselves, the reconstructed "shape-free" patches are not as close to the original gray-level textures as those obtained by our CAM. This is clearly illustrated in Figure 4. It is here that our major contribution lies. By not relying on the combined appearance parameters to drive the landmark search, we obtain better estimates of the texture parameters as well as better texture reconstructions compared to AAM based methods. As a result, our age estimates also improve, as the coefficients used in our age estimation functions are more accurate. This is borne out by the results we obtain in section 6.

Figure 6. Sample images from the FG-NET (top row) and the PAL (bottom row) face aging databases showing the quality of images in them. As can be seen, the images are non-ideal in that they contain slight pose and illumination effects.

5. Using CAM for age estimation

To use our CAM for age estimation, we construct two aging functions: one for youths, using training images with subjects aged between 0 and 20, and another for adults, using training images in which the subjects are aged between 21 and 69. The face aging functions are built using Support Vector Regression (SVR). Given an input image, the combined features are first extracted using our CAM as described in section 4. We then use a Support Vector Machine (SVM) with a Gaussian (RBF) kernel on the hybrid feature vectors to decide whether the facial image is that of a child or an adult. Based on the output of the binary SVM classifier, the appropriate aging function is used to determine the age of the face.
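The gate-then-regress pipeline above can be sketched with off-the-shelf components. This is a hedged illustration using scikit-learn with placeholder random features; the paper does not specify its SVM/SVR hyperparameters, so the defaults here are assumptions.

```python
import numpy as np
from sklearn.svm import SVC, SVR

# Placeholder training data standing in for CAM feature vectors and ages.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))          # hypothetical combined CAM features
ages = rng.uniform(0, 69, size=200)     # hypothetical ground-truth ages
is_adult = (ages > 20).astype(int)      # 0: youth (0-20), 1: adult (21-69)

# Binary RBF-kernel SVM decides which aging function to apply.
gate = SVC(kernel="rbf").fit(X, is_adult)

# One SVR aging function per age group.
svr_child = SVR(kernel="rbf").fit(X[is_adult == 0], ages[is_adult == 0])
svr_adult = SVR(kernel="rbf").fit(X[is_adult == 1], ages[is_adult == 1])

def estimate_age(x):
    """Route a feature vector through the gate, then regress its age."""
    model = svr_adult if gate.predict(x[None, :])[0] == 1 else svr_child
    return float(model.predict(x[None, :])[0])
```

Splitting the regression by age group lets each SVR model the growth-and-development and adult-aging regimes separately, at the cost of sensitivity to gate misclassifications near the 20/21 boundary.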

6. Experimental Results

For our experiments, we use a landmarking scheme consisting of 68 facial landmarks [20], as shown in Figure 5. MASM is trained on a set of 500 images of 115 subjects manually annotated with these 68 landmarks and drawn from the National Institute of Standards and Technology (NIST) Multiple Biometric Grand Challenge 2008 (MBGC-2008) still face challenge database [22, 23], which consists of 34,729 frontal images of 810 subjects. At the fitting stage, the Viola-Jones face detector [27] is used to detect faces in test images and initialize MASM.

To test the Contourlet Appearance Model on age estimation we used two face aging datasets, i.e. FG-NET [18] and PAL [21]. The FG-NET database contains 1002 face images of 82 subjects with ages ranging from 0 to 69. PAL consists of 443 neutral facial images of adults with ages ranging from 18 to 91. Sample images from the two databases are shown in Figure 6. The Mean Absolute Error (MAE), defined as the average of the absolute error between the estimated ages and the ground truths, is computed using (11) and is used as our performance metric to compare the different age estimation techniques.

MAE = (1/N_t) Σ_{i=1}^{N_t} |x̂_i − x_i|    (11)

In (11), x̂_i is the estimated age for the i-th testing sample, x_i is the corresponding ground truth age (the true age), and N_t is the total number of testing samples.

Methods      MAE (years)
CAM          4.12
AAM          6.77
AAM + LBP    6.17
BIFs [16]    4.77
AGES [12]    6.77

Table 1. Comparison of MAE values (in years) obtained by different age estimation methods when the LOPO scheme was used on the FG-NET database.

Methods      MAE (years)
AAM + LBP    7.17
CAM          6.0

Table 2. MAE values (in years) obtained by using different age estimation methods on the PAL database.

In our experiments, the Leave One Person Out (LOPO) scheme is set up to experimentally evaluate the different methods on the FG-NET database. In this scheme, all facial images of one subject are used as the testing set and the remaining images are used as the training set for training the CAM texture model as well as the face aging functions. This process is applied in turn to all 82 subjects in the FG-NET database. Finally, the overall MAE is computed. Table 1 lists the MAE values obtained by several age estimation methods when the LOPO scheme was used on the FG-NET database, while Table 2 compares the MAEs obtained by two additional age estimation techniques when the same testing scheme was used on the PAL database.

Some previously published results used the Cumulative Score (CS) rather than MAE to evaluate their estimation results. It is computed using (12), in which N_{e \le e} is the number of testing samples whose absolute error is less than or equal to e years. Figure 7 compares the cumulative scores obtained by our method against such approaches when tested on the FG-NET database.

CumScore(e) = \frac{N_{e \le e}}{N_t} \times 100    (12)
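For example (again a sketch, with made-up error values), the cumulative score of (12) at a given error level can be computed as:

```python
def cumulative_score(abs_errors, e):
    """CumScore(e) from (12): percentage of testing samples whose
    absolute age-estimation error is at most e years."""
    n_e = sum(1 for err in abs_errors if err <= e)
    return 100.0 * n_e / len(abs_errors)

errors = [0.5, 1.2, 3.0, 4.8, 7.1, 9.6]  # hypothetical absolute errors in years
print(cumulative_score(errors, 5))       # 4 of 6 samples fall within 5 years
```

Sweeping e from 0 to 10 years and plotting CumScore(e) produces a curve like the one in Figure 7; a higher curve means a larger fraction of test samples estimated within each error tolerance.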

From Tables 1 and 2 and Figure 7 it is clear that our CAM based approach obtains more accurate age estimates than several existing methods.


Figure 7. A comparison of the cumulative scores obtained by different age estimation methods at error levels from 0 to 10 years when tested on the FG-NET database.

7. Conclusions

We have proposed a novel Contourlet Appearance Model to localize facial landmarks and extract texture based features from the convex hull bounded by them. Our CAM is shown to be more accurate at reconstructing unseen texture compared to conventional AAMs because of its use of a Modified Active Shape Model for locating the facial landmarks. By decoupling the fitting of facial landmarks from the extraction of texture based features and by the use of the Nonsubsampled Contourlet Transform, we are able to obtain discriminative features for age estimation using regression based aging functions. The results of using our CAM based age estimation technique on databases such as FG-NET and PAL are quite promising and highlight the improved accuracy of our method over several existing age estimation techniques.

Acknowledgements

This research was supported by CyLab at Carnegie Mellon under grants DAAD 19-02-1-0389 and W911NF-09-1-0273 from the Army Research Office.

References

[1] T. Ahonen, A. Hadid, and M. Pietikainen. Face Recognition with Local Binary Patterns. In Proc. of the 8th European Conf. on Computer Vision, volume 3021, pages 469-481, May 2004.

[2] C. M. Christoudias and T. Darrell. On Modeling Nonlinear Shape and Texture Appearance Manifolds. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, volume 2, pages 1067-1074, June 2005.

[3] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Interpreting Face Images using Active Appearance Models. In Proc. of the 3rd Intl. Conf. on Automatic Face and Gesture Recognition, pages 300-305, Apr. 1998.

[4] T. F. Cootes and C. J. Taylor. Statistical Models of Appearance for Computer Vision. Technical report, Imaging Science and Biomedical Engineering, University of Manchester, Mar. 2004.

[5] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham. Active Shape Models - Their Training and Application. Computer Vision and Image Understanding, 61(1):38-59, Jan. 1995.

[6] T. F. Cootes, C. J. Taylor, and A. Lanitis. Active Shape Models: Evaluation of a Multi-Resolution Method for Improving Image Search. In Proc. of the 5th British Machine Vision Conf., pages 327-336, 1994.

[7] D. Cristinacce and T. F. Cootes. A Comparison of Shape Constrained Facial Feature Detectors. In Proc. of the 6th Intl. Conf. on Automatic Face and Gesture Recognition, pages 375-380, May 2004.

[8] A. L. da Cunha, J. P. Zhou, and M. N. Do. The Nonsubsampled Contourlet Transform: Theory, Design, and Applications. IEEE Trans. on Image Processing, 15(10):3089-3101, Oct. 2006.

[9] M. N. Do and M. Vetterli. The Contourlet Transform: An Efficient Directional Multiresolution Image Representation. IEEE Trans. on Image Processing, 14(12):2091-2106, Dec. 2005.

[10] Y. Fu, G. Guo, and T. S. Huang. Age Synthesis and Estimation via Faces: A Survey. IEEE Trans. on Pattern Analysis and Machine Intelligence, 32(11):1955-1976, Nov. 2010.

[11] F. Gao and H. Ai. Face Age Classification on Consumer Images with Gabor Feature and Fuzzy LDA Method. In Proc. of the 3rd Intl. Conf. on Advances in Biometrics, volume 5558, pages 132-141, June 2009.

[12] X. Geng, Z. Zhou, and K. Smith-Miles. Automatic Age Estimation Based on Facial Aging Patterns. IEEE Trans. on Pattern Analysis and Machine Intelligence, 29(12):2234-2240, Dec. 2007.

[13] J. C. Gower. Generalized Procrustes Analysis. Psychometrika, 40(1):33-51, Mar. 1975.

[14] L. Gu and T. Kanade. A Generative Shape Regularization Model for Robust Face Alignment. In ECCV '08: Proc. of the 10th European Conf. on Computer Vision: Part I, pages 413-426, Oct. 2008.

[15] G. Guo, G. Mu, Y. Fu, C. Dyer, and T. S. Huang. A Study on Automatic Age Estimation Using a Large Database. In Proc. of the 12th IEEE Intl. Conf. on Computer Vision, pages 1986-1991, Oct. 2009.

[16] G. Guo, G. Mu, Y. Fu, and T. S. Huang. Human Age Estimation Using Bio-Inspired Features. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pages 112-119, June 2009.

[17] F. Kahraman, M. Gokmen, S. Darkner, and R. Larsen. An Active Illumination and Appearance (AIA) Model for Face Alignment. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pages 1-7, June 2007.

[18] A. Lanitis, C. J. Taylor, and T. F. Cootes. Modeling the Process of Ageing in Face Images. In Proc. of the 7th IEEE Intl. Conf. on Computer Vision, volume 1, pages 131-136, Sept. 1999.

[19] K. Luu, T. D. Bui, K. Ricanek, and C. Y. Suen. Age-determination using Active Appearance Models and Support Vector Machine Regression. In BTAS '09: Proc. of the 3rd IEEE Intl. Conf. on Biometrics: Theory, Applications and Systems, pages 314-318, Sept. 2009.

[20] S. Milborrow and F. Nicolls. Locating Facial Features with an Extended Active Shape Model. In ECCV '08: Proc. of the 10th European Conf. on Computer Vision: Part IV, pages 504-513, Oct. 2008.

[21] M. Minear and D. C. Park. A Lifespan Database of Adult Facial Stimuli. Behavior Research Methods, Instruments and Computers, 36(4):630-633, Nov. 2004.

[22] NIST. National Institute of Standards and Technology: Multiple Biometric Grand Challenge (MBGC) - 2008. http://face.nist.gov/mbgc, 2008.

[23] P. J. Phillips, P. J. Flynn, J. R. Beveridge, W. T. Scruggs, A. J. O'Toole, D. Bolme, K. W. Bowyer, B. A. Draper, G. H. Givens, Y. M. Lui, H. Sahibzada, J. A. Scallan III, and S. Weimer. Overview of the Multiple Biometrics Grand Challenge. In Proc. of the 3rd IAPR/IEEE Intl. Conf. on Biometrics, pages 705-714, June 2009.

[24] K. Ricanek, Y. Wang, C. Chen, and S. J. Simmons. Generalized Multi-Ethnic Face Age-Determination. In BTAS '09: Proc. of the 3rd IEEE Intl. Conf. on Biometrics: Theory, Applications and Systems, pages 127-132, Sept. 2009.

[25] K. Seshadri and M. Savvides. Robust Modified Active Shape Model for Automatic Facial Landmark Annotation of Frontal Faces. In BTAS '09: Proc. of the 3rd IEEE Intl. Conf. on Biometrics: Theory, Applications and Systems, pages 319-326, Sept. 2009.

[26] J. Suo, T. Wu, S. Zhu, S. Shan, X. Chen, and W. Gao. Design Sparse Features for Age Estimation Using Hierarchical Face Model. In Proc. of the 8th Intl. Conf. on Automatic Face and Gesture Recognition, pages 1-6, Sept. 2008.

[27] P. Viola and M. J. Jones. Robust Real-Time Face Detection. Intl. Journal of Computer Vision, 57(2):137-154, May 2004.

[28] M. Wimmer, S. Fujie, F. Stulp, T. Kobayashi, and B. Radig. An ASM Fitting Method Based on Machine Learning that Provides a Robust Parameter Initialization for AAM Fitting. In Proc. of the 8th Intl. Conf. on Automatic Face and Gesture Recognition, pages 1-6, Sept. 2008.

[29] C. Wolstenholme and C. J. Taylor. Wavelet Compression of Active Appearance Models. In Proc. of the 2nd Intl. Conf. on Medical Image Computing and Computer-Assisted Intervention, pages 544-554, Sept. 1999.

[30] X. Xie, J. H. Lai, and W. Zheng. Extraction of Illumination Invariant Facial Features from a Single Image Using Nonsubsampled Contourlet Transform. Pattern Recognition, 43(12):4177-4189, Dec. 2010.

[31] S. Yan, X. Zhou, M. Liu, M. Hasegawa-Johnson, and T. S. Huang. Regression from Patch-Kernel. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pages 1-8, June 2008.

[32] Y. Zhou, L. Gu, and H.-J. Zhang. Bayesian Tangent Shape Model: Estimating Shape and Pose Parameters via Bayesian Inference. In CVPR '03: Proc. of the 21st IEEE Intl. Conf. on Computer Vision and Pattern Recognition, pages 109-116, June 2003.