Contourlet Appearance Model for Facial Age Estimation
Khoa Luu1,3, Keshav Seshadri2,3, Marios Savvides2,3, Tien D. Bui1, and Ching Y. Suen1
Abstract
In this paper we propose a novel Contourlet Appearance Model (CAM) that is more accurate and faster at localizing facial landmarks than Active Appearance Models (AAMs). Our CAM not only extracts holistic texture information, as AAMs do, but also extracts local texture information using the Nonsubsampled Contourlet Transform (NSCT). We demonstrate the efficiency of our method by applying it to the problem of facial age estimation. Compared to previously published age estimation techniques, our approach yields more accurate results when tested on various face aging databases.
1. Introduction
Face based age estimation has become a prominent research topic in recent years due to its potential role in enhancing facial recognition systems and in assisting law enforcement. Human facial aging can be divided into two periods: growth and development (childhood to pre-adulthood), which primarily affects the physical structure of the craniofacial complex in the form of lengthening and widening of bony tissue, and adult aging, whose primary effects are the formation of wrinkles and lines and changes in soft tissue morphology. Active Appearance Models (AAMs) [3] are a commonly used tool in face age estimation [19] because they can successfully represent both shape, which is prominent in the growth and development period, and texture, which is prominent in the adult aging period.
1.1. Motivation
There are two main reasons for carrying out this work. Firstly, prior work on AAM implementations indicates that their ability to generalize when automatically fitting facial landmarks on unseen images that exhibit illumination, pose or expression variation is quite poor [4]. Active Shape Models (ASMs) [5] have undergone several changes since the classical ASM was proposed in [6] and have recently become more robust [14, 20, 32] and hence more popular for this task, as they are not only faster but also more accurate when dealing with variations not present in training images.

1 Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada {kh_lu,bui,suen}@encs.concordia.ca
2 Department of Electrical and Computer Engineering (ECE), Carnegie Mellon University, Pittsburgh, USA [email protected], [email protected]
3 CyLab Biometrics Center, Carnegie Mellon University, Pittsburgh, USA
978-1-4577-1359-0/11/$26.00 ©2011 IEEE
Secondly, although AAMs are an excellent model based approach for face related problems [19, 24], they represent only holistic and not local information. This is a concern that local approaches such as Local Binary Patterns (LBPs) [1] and Gabor features have addressed, and hence their use in many applications has increased tremendously over the last few years. Compared to LBPs and Gabor features, the Nonsubsampled Contourlet Transform (NSCT) [8] can extract both local and directional information. Additionally, unlike LBPs, which deal poorly with noisy images, the NSCT can distinguish between the noise and the high frequency components of an image, which contain its edges and directional information.
This paper proposes a new Contourlet Appearance Model (CAM) that combines the robust facial landmarking capability of the Modified Active Shape Model (MASM) implementation [25] with NSCT features. Once the landmarking task is accomplished, facial features are extracted both as gray-scale pixels, which represent the holistic features prominent in childhood, and as Contourlet features, which represent the local features (such as wrinkles) that are especially prominent in the adult aging period. These combined features are used as the discriminative inputs to a face aging function and are found to yield promising results on unseen images across two databases.
The rest of this paper is organized as follows. In section 2, we review prior work carried out on AAMs and age estimation. In section 3, we describe the Contourlet Transform and the NSCT. In section 4, we describe how our Contourlet Appearance Model is used to extract shape and texture information from facial images, and we go on to describe how we set up face aging functions for age estimation in section 5. Finally, we compare our approach against existing age estimation methods in section 6 and offer some conclusions in section 7.
Figure 1. Sample ASM fitting results. The images in the first row are the initialization provided to the ASMs while the images in the second row show the corresponding fitting results under such initialization conditions. (a) An example of poor initialization, (b) Accurate initialization provided to the classical ASM implementation, (c) Accurate initialization provided to MASM, (d) Fitting results produced by MASM under poor initialization conditions, (e) Fitting results produced by classical ASM under accurate initialization conditions, (f) Fitting results produced by MASM under accurate initialization conditions.
2. Background
In this section, we will first review prior work carried out on AAMs and will then describe previous studies carried out on age estimation.
2.1. Prior work on AAMs
AAMs were first presented by Cootes et al. in 1998 [3]. Since then, numerous studies have proposed improvements to them, mainly focused on either the texture modeling stage or the fitting stage.
In the texture modeling stage, it is necessary to deal with high-dimensional texture representations. When representing high resolution images, the twin problems of storage and computational requirements are often encountered. Two kinds of solutions have therefore been suggested to reduce redundancy in the representation: image compression and local textures. In the first approach, wavelet based methods were proposed to compress the texture and eliminate redundant pixels with small variances [29]; however, the compression ratio remains limited at the fitting stage. In the second approach, since local textures around feature points matter more than others at the model fitting stage, Cristinacce and Cootes [7] suggested replacing the global textures with local ones to avoid dealing with high-dimensional texture. Kahraman et al. [17] used two independent subspaces, illumination and identity, to combat variations in illumination. Christoudias and Darrell [2] presented a nonlinear modeling approach combining a Gaussian Mixture Model (GMM) and a Nearest Neighbor Model (NNM) to model objects in nonlinear subspaces.
At the fitting stage, searching algorithms can suffer from local minima due to poor initial model parameters; as shown in Figure 1, initialization is extremely important for obtaining accurate fitting results, so the initial parameters have to lie within a small radius of convergence. To address this, Wimmer et al. [28] used ASMs to initialize the AAM fitting stage more precisely, yielding both a large radius of convergence and accurate fitting. However, their method is still an AAM based fitting algorithm. As outlined in section 1.1, if ASM fitting results are good, we only need to locate facial landmarks using an ASM and then apply Principal Component Analysis (PCA) on the region bounded by their convex hull to determine the best AAM parameters, without actually needing an AAM to fit the image. This is a key idea in our approach, in which we use the MASM implementation, which has been shown to have a reasonably high fitting accuracy, to obtain the landmark locations. This is highlighted by Figure 1, which shows that MASM obtains more accurate fitting results than classical ASM under identical initialization conditions.
2.2. Prior work on age estimation
Age estimation techniques can be broadly classified under two headings which are described in the following paragraphs.
In holistic features based approaches, facial features are extracted globally from the entire facial region. Lanitis et al. [18] used AAM features with a Weighted Aging Function (the WAS method), set up by a genetic algorithm, to address the age estimation problem. Ricanek et al. [24] developed a multi-ethnic face age determination system to deal with race variations. In this implementation, each facial image is annotated with 161 landmarks; after extracting AAM features, Least Angle Regression (LAR) is used to determine the best features for age prediction. Luu et al. [19] used AAMs and Support Vector Regression (SVR) to build an age estimation framework.
In local features based approaches, facial features are extracted locally from windows around key facial landmarks, which often encode age information (such as wrinkles around the corners of the eyes). Yan et al. [31] used Spatially Flexible Patches (SFPs) and GMMs to set up a method known as Regression from Patch Kernels (RPK). Each SFP integrates local features with coordinate information to overcome problems such as occlusions, illumination effects or slight head pose variations. Suo et al. [26] designed a multi-resolution hierarchical graphical face model for age estimation. Facial images are first decomposed into detailed components from coarse to fine resolutions; AAMs are then applied to the overall face and to each type of facial component, and skin zones such as wrinkles and blobs are also considered in this graph. Exploiting the strength of Gabor features, Gao and Ai [11] used them for facial image representation and built an age classifier with Linear Discriminant Analysis (LDA), also proposing fuzzy LDA to cope with age ambiguity. Their method was tested on a dataset containing 4787 consumer images with four age groups (babies, children, adults and the elderly) and achieved a classification rate of around 91%. Recently, Guo et al. [16] investigated Biologically Inspired Features (BIFs) derived from a feed-forward model of the primate visual object recognition pathway; the advantage of this approach is that small translations, rotations and scale changes are well handled by the new set of features. Guo et al. [15] then improved on this method by integrating an age manifold into the aging process. Instead of dealing with each image independently, Geng et al. [12] introduced an Aging Pattern Subspace (AGES) that uses all facial images of each subject; their idea was to model the aging pattern, a sequence of an individual's facial images sorted chronologically, by constructing a representative subspace. A detailed literature review on age estimation methods can be found in [10].

Figure 2. Schematic diagram of the Contourlet Transform.
3. Contourlet Transform and Nonsubsampled Contourlet Transform
This section reviews the Contourlet Transform and the Nonsubsampled Contourlet Transform, an extension of the Contourlet Transform that introduces redundancy to achieve better image decompositions.
3.1. Contourlet Transform
Do and Vetterli introduced the Contourlet Transform [9], which has the ability to represent images with smooth contours. Unlike other wavelet transforms, originally developed in the continuous domain and then applied to discretized signals, the Contourlet Transform is specifically designed to discover the 2D geometry of a digital image. It not only captures the main features of wavelets, e.g. multi-resolution, localization and critical sampling, but also offers a high degree of directionality and anisotropy. The transform is based on an efficient 2D multi-scale and multi-direction image expansion using Directional Filter Banks (DFBs).
The Contourlet Transform is constructed by using a double filter bank to decompose an image into several directional sub-bands at multiple scales. This double filter bank is a combination of the Laplacian Pyramid (LP) with a DFB at each scale. The LP is first used to decompose the image into low frequency and high frequency sub-bands, the latter capturing point discontinuities. The DFB is then applied on the high frequency sub-bands to link point discontinuities into linear structures, resulting in several directional sub-bands. Repeating this process on the low frequency sub-bands results in a multi-scale and directional decomposition of the image. The schematic diagram of the Contourlet scheme is shown in Figure 2.
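The multi-scale (pyramid) half of this scheme can be sketched as follows. This is a simplified illustration only: it uses a Gaussian low-pass filter as a stand-in for the actual pyramid filters, omits downsampling, and omits the directional filter bank stage entirely, so reconstruction reduces to a plain sum.

```python
import numpy as np
from scipy import ndimage

def pyramid_decompose(img, levels=3):
    """Split an image into one low-pass band plus one band-pass (detail)
    band per level. The band-pass bands capture the point discontinuities
    that a DFB would then link into directional structures."""
    bands = []
    low = img.astype(float)
    for _ in range(levels):
        smoothed = ndimage.gaussian_filter(low, sigma=1.0)
        bands.append(low - smoothed)  # high-frequency residual at this scale
        low = smoothed                # recurse on the low-pass band
    return low, bands

def pyramid_reconstruct(low, bands):
    """Without downsampling the decomposition is a telescoping sum,
    so adding all bands back recovers the image exactly."""
    for band in bands:
        low = low + band
    return low
```

Because each detail band is the difference between successive low-pass images, the sum of the final low-pass band and all detail bands telescopes back to the original image.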
3.2. Nonsubsampled Contourlet Transform (NSCT)
The Nonsubsampled Contourlet Transform [8] is a fully shift-invariant, multi-scale, and multi-directional transform that can separate strong edges, weak edges and noise in a given image. It is constructed using a Nonsubsampled Pyramid (NSP), responsible for the multi-scale information, and the Nonsubsampled Directional Filter Bank (NSDFB), which is responsible for the directional information. Given an input image I_i represented at the i-th level, 1 ≤ i ≤ J, where J is the highest level of the decomposition, I_i is divided into a low-pass sub-band I_i^(L) and a high-pass sub-band I_i^(H) by the NSP, equipped with a low-pass filter H^(L) and a high-pass filter H^(H), as shown in (1) and (2), in which * is the convolution operator.

I_i^(L) = H^(L) * I_i    (1)

I_i^(H) = H^(H) * I_i    (2)
The high-pass sub-band I_i^(H) is then decomposed into directional sub-bands by the NSDFB, which is constructed in cascades comprising parallelogram filters and two-channel fan filter banks. The low-pass sub-band I_i^(L) is in turn used for the next decomposition. The directional sub-bands I_{i,k} can thus be computed using the equivalent filter F_k^(eq) as in (3), in which 2^(l_i) is the number of directional sub-bands at the i-th level.

I_{i,k} = F_k^(eq) * I_i^(H)    (3)
The procedure is in turn applied to the low-pass sub-band to obtain the next level of the decomposition. Overall, the NSCT decomposition is formulated as shown in (4), in which I is the given image, I_J^(0) is the low-pass sub-band and (I_{i,k})_{i,k} are the directional sub-band coefficients.

NSCT(I) = { I_J^(0), (I_{i,k})_{1 ≤ i ≤ J, 1 ≤ k ≤ 2^(l_i)} }    (4)

Figure 3. Features extracted by different texture extraction methods. (a) Original facial image, (b) Noisy image with noise standard deviation σ set to 0.1, (c)-(f) Low-pass, strong edge, weak edge and noise components respectively, obtained after applying LNSCT on the noisy image, (g) LBP map of the noisy image, (h) Noisy image filtered using a Gabor filter.
Both noise and weak edges have low magnitude coefficients in the frequency domain and because of this most image enhancement methods cannot separate weak edges from noise. However, weak edges have geometric structure while noise does not. NSCT can therefore be used to discriminate between them. This is illustrated by Figure 3, which shows how NSCT is able to separate out these three components from a noisy image which can in turn enable the extraction of more potentially discriminative feature vectors compared to conventional texture extraction methods such as LBPs or Gabor filters.
The strong edges in an image are the pixels with the largest magnitude coefficients in all sub-bands. Weak edges correspond to pixels having both large and small magnitude coefficients in directional sub-bands within the same scale. Noise corresponds to pixels that have small magnitude coefficients in all sub-bands. To enhance the range of values of a processed image, the image is converted from the spatial domain into the logarithmic domain before applying the NSCT [30]. The modified Logarithmic NSCT (LNSCT) coefficients x are nonlinearly mapped by v(x) using (5), in which m̄ and m are respectively the mean and the maximum magnitude of the coefficients for each pixel across the directional sub-bands, σ is the standard deviation of the noise of the sub-bands at a specific pyramidal level, and p and c are constants with 1 ≤ c ≤ 5.

v(x) = x                         if cσ ≤ m̄             (strong-edge pixels)
     = max((m̄/|x|)^p, 1) · x     if cσ > m̄ and cσ ≤ m   (weak-edge pixels)
     = 0                         if cσ > m̄ and cσ > m   (noise)
                                                                       (5)
The steps involved in extracting Contourlet features are as follows:
1. Given an image I, calculate its logarithm: L = log(I)
2. Apply NSCT on L for N levels
3. Estimate the noise standard deviation
4. For each level of the pyramid:
(a) Estimate the noise variance
(b) At each pixel, compute m̄ and m of the corresponding coefficients in all directional sub-bands at this level, and apply (5)
5. Apply the inverse NSCT to get the processed image
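The coefficient mapping applied in step 4(b) can be sketched as below, assuming m̄, m and σ have already been computed; the exact form of the weak-edge amplification follows our reading of (5) and should be treated as an assumption.

```python
import numpy as np

def v_map(x, m_bar, m_max, sigma, c=3.0, p=0.5):
    """Nonlinear mapping of the LNSCT coefficients: keep strong edges,
    amplify weak edges, suppress noise. x, m_bar (per-pixel mean magnitude)
    and m_max (per-pixel max magnitude) are arrays of the same shape;
    sigma is the noise std. dev. at this pyramid level."""
    out = np.zeros_like(x, dtype=float)
    strong = c * sigma <= m_bar
    weak = (c * sigma > m_bar) & (c * sigma <= m_max)
    out[strong] = x[strong]            # strong-edge pixels: kept unchanged
    gain = np.maximum(
        (m_bar[weak] / np.maximum(np.abs(x[weak]), 1e-12)) ** p, 1.0)
    out[weak] = gain * x[weak]         # weak-edge pixels: amplified
    return out                         # noise pixels stay zero
```

The constants c and p are illustrative defaults; the paper only constrains c to lie in [1, 5].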
4. Contourlet Appearance Model (CAM)
This section outlines our proposed Contourlet Appearance Model for extraction of features to be used later by face aging functions for age estimation. The three key stages in building CAM are shape modeling, texture modeling and finally combining the resultant shape and texture parameters into a form that can be used for regression based age estimation.
4.1. CAM shape modeling
Our CAM first requires the automatic localization of several facial landmarks before the texture information from within their convex hull is extracted. This process, referred to as the shape modeling stage, is accomplished by using an ASM. Once an ASM is trained on images with a set of manually annotated landmarks (that lie along the facial boundary, eyes, nose, etc.), it can exhibit a fairly high degree of accuracy in locating the same landmarks in unseen faces.
During the training stage an ASM builds a point distribution model to capture the variation in shape exhibited by several faces with manually positioned landmarks. The shapes in the training set are aligned using Generalized Procrustes Analysis [13] before the mean shape x̄ is calculated. PCA is then applied to the aligned shapes to capture the main axes of shape variation. Any shape x can then be approximated by the shape model equation (6), in which P_s is a matrix of the principal eigenvectors of the covariance matrix of the aligned shapes and b_s is a vector of the PCA coefficients of x.

x = x̄ + P_s b_s    (6)
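The shape model of (6) amounts to a PCA of the aligned shape vectors. A minimal sketch (the 95% retained-variance cutoff is an illustrative assumption):

```python
import numpy as np

def train_shape_model(shapes, var_kept=0.95):
    """shapes: (N, 2L) array of aligned landmark vectors (x1, y1, ..., xL, yL).
    Returns the mean shape and the matrix P_s of principal eigenvectors."""
    x_bar = shapes.mean(axis=0)
    _, s, vt = np.linalg.svd(shapes - x_bar, full_matrices=False)
    var = s ** 2
    k = int(np.searchsorted(np.cumsum(var) / var.sum(), var_kept)) + 1
    return x_bar, vt[:k].T  # P_s has shape (2L, k)

def approximate_shape(x, x_bar, P_s):
    """Project a shape into the model and back: x ≈ x_bar + P_s b_s."""
    b_s = P_s.T @ (x - x_bar)  # PCA coefficients of x
    return x_bar + P_s @ b_s
```

Because the projection is orthogonal, the reconstruction error of any shape never exceeds its original deviation from the mean.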
The final stage in the training process involves building statistical models of the local pixel variation in regions around each landmark across all training images. This information is represented by the mean profile ḡ and covariance matrix S_g of the profiles around a landmark, which are constructed by sampling the image intensities along lines perpendicular to the direction of the shape boundary at a point (1D profiles) or in square regions around it (2D profiles), at multiple resolutions of the face in an image pyramid.
When an unseen face is to be fitted by an ASM, a face detector is first used to initialize its parameters and roughly align the mean shape over the face. The landmarks are then iteratively moved, at the lowest resolution image in the image pyramid, to the locations whose surrounding profiles g have the lowest squared Mahalanobis distance f(g) to the mean profile for the respective landmark, as shown in (7).

f(g) = (g − ḡ)^T S_g^{-1} (g − ḡ)    (7)
The PCA coefficients for the shape in question are then calculated and constrained so that the generated shape is representative of a typical human face. The process is repeated until the landmark locations converge at the lowest resolution; they are then scaled and iteratively updated again at the higher image resolutions. A coarse-to-fine adjustment of the landmark positions is thus made possible, and once convergence is reached at the highest resolution, the final set of landmark coordinates is obtained.
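The candidate scoring in (7) can be sketched as follows: among the profiles sampled around a landmark's current position, pick the one closest, in the Mahalanobis sense, to the trained mean profile. The pseudo-inverse is a robustness assumption for near-singular covariance matrices.

```python
import numpy as np

def profile_cost(g, g_bar, S_inv):
    """f(g) = (g - g_bar)^T S_g^{-1} (g - g_bar), the squared
    Mahalanobis distance of profile g from the mean profile g_bar."""
    d = g - g_bar
    return float(d @ S_inv @ d)

def best_candidate(candidates, g_bar, S_g):
    """Return the index of the candidate profile with the lowest cost."""
    S_inv = np.linalg.pinv(S_g)  # pseudo-inverse: stability assumption
    return int(np.argmin([profile_cost(g, g_bar, S_inv) for g in candidates]))
```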
4.1.1 Modified Active Shape Model (MASM)
The ASM implementation used in our CAM is the Modified Active Shape Model (MASM) proposed in [25]. MASM uses only 2D profiles and 2D search regions in order to determine the best location for each landmark. The use of 2D profiles results in a better modeling of the pixel intensities in regions around each landmark than the 1D profiles used by the classical ASM described by Cootes et al. in [6]. MASM also goes one step further than classical ASM by building a PCA subspace of the pixel variation around each landmark. At the test stage, a 2D profile around a potential landmark location is projected onto this subspace to obtain a vector of projection coefficients and is then reconstructed with these coefficients. The reconstruction error between this reconstructed profile and the original profile is calculated, and the candidate location with the lowest reconstruction error is used as the new location for the landmark in question. These improvements contribute to the fairly high fitting accuracy of MASM and enable it to outperform classical ASM, as demonstrated in [25].

Figure 4. Comparison of the fitting results produced by an AAM and CAM. (a) Input image, (b) and (c) Fitting result and texture reconstruction obtained by an AAM, (d) and (e) Fitting result and texture reconstruction obtained by CAM.
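MASM's landmark scoring can be sketched as a PCA reconstruction error; the subspace P and mean profile are assumed to have been learned per landmark during training.

```python
import numpy as np

def masm_score(profile, mean_profile, P):
    """Project a flattened 2D profile onto the landmark's PCA subspace P,
    reconstruct it, and return the reconstruction error. The candidate
    location with the lowest score becomes the landmark's new position."""
    d = profile - mean_profile
    coeffs = P.T @ d                   # projection coefficients
    recon = mean_profile + P @ coeffs  # reconstruction from the subspace
    return float(np.linalg.norm(profile - recon))
```

Profiles lying inside the learned subspace score near zero, while profiles with components orthogonal to it are penalized by exactly the norm of that orthogonal part.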
4.2. CAM texture modeling
After finding the facial region bounded by the convex hull of the landmarks determined by MASM, CAM texture parameters must be extracted from this region, using methods similar to those used by AAMs. During the training stage, a PCA subspace P_t is built on the "shape-free" and illumination normalized gray-level texture patches enclosed by the convex hulls of several training faces. The phrase "shape-free" is used because all texture patches are warped to a common mean shape, ensuring that the resultant patches are all of the same shape and contain the same number of pixels. In addition to this gray-level texture subspace, we also build a PCA subspace P_w of the warped weak-edge texture vectors obtained after processing the images using the NSCT as outlined in subsection 3.2. When a test image has been landmarked, we obtain a "shape-free" gray-level texture vector t and a "shape-free" weak-edge texture vector w, which are normalized using the same process followed during training. These texture vectors are then projected onto their respective subspaces to obtain two sets of texture coefficients b_t and b_w using (8) and (9), in which t̄ and w̄ are the mean texture vectors obtained from the original and the processed training images respectively.

b_t = P_t^T (t − t̄)    (8)

b_w = P_w^T (w − w̄)    (9)

Figure 5. The 68-point landmarking scheme used in our experiments.
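The projections (8) and (9) translate directly to code; a sketch, assuming P_t, P_w and the mean texture vectors come from training:

```python
import numpy as np

def texture_coefficients(t, w, P_t, t_bar, P_w, w_bar):
    """b_t = P_t^T (t - t_bar) and b_w = P_w^T (w - w_bar):
    project the shape-free gray-level and weak-edge texture vectors
    onto their respective PCA subspaces."""
    b_t = P_t.T @ (t - t_bar)
    b_w = P_w.T @ (w - w_bar)
    return b_t, b_w
```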
4.3. Combining CAM shape and texture parameters
The three sets of parameters that we obtain from the CAM shape and texture modeling are combined and serve as our input to a regression based age estimation function. This is where our implementation differs from the process followed by conventional AAM based age estimators. During their training stages, AAM based methods extract the shape parameters b_s and texture parameters b_t from landmarked images, combine them into a single vector b, and then build a third PCA subspace P_c for this set of combined features, so that they can be represented by (10), in which c is the set of PCA coefficients of the combined appearance model.

b = P_c c    (10)
When fitting an unseen image, an AAM's search is driven by updates to c, and it is the final set of combined appearance parameters that is used for regression based age analysis. This is where the frailties of AAM based methods emerge. Since the search is driven by the combined appearance parameters and not the texture parameters themselves, the reconstructed "shape-free" patches are not as close to the original gray-level textures as those obtained by our CAM, as clearly illustrated in Figure 4. It is here that our major contribution lies. By not relying on the combined appearance parameters to drive the landmark search, we obtain better estimates of the texture parameters as well as better texture reconstructions than AAM based methods. As a result, our age estimates also improve, as the coefficients used in our age estimation functions are more accurate. This is borne out by the results we obtain in section 6.

Figure 6. Sample images from the FG-NET (top row) and the PAL (bottom row) face aging databases showing the quality of images in them. As can be seen, the images are non-ideal in that they contain slight pose and illumination effects.
5. Using CAM for age estimation
To use our CAM for age estimation, we construct two aging functions: one for youths, trained on images of subjects aged between 0 and 20, and another for adults, trained on images of subjects aged between 21 and 69. The face aging functions are built using Support Vector Regression (SVR). Given an input image, the combined features are first extracted using our CAM as described in section 4. We then use a Support Vector Machine (SVM) with a Gaussian (RBF) kernel on the hybrid feature vectors to decide whether the facial image is that of a child or an adult. Based on the output of this binary SVM classifier, the appropriate aging function is used to determine the age of the face.
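A sketch of this two-function scheme using scikit-learn; the library choice and the default hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.svm import SVC, SVR

class TwoStageAgeEstimator:
    """An RBF-kernel SVM routes each CAM feature vector to a youth (0-20)
    or adult (21-69) SVR aging function, mirroring the scheme above."""

    def fit(self, X, ages):
        adult = ages >= 21
        self.router = SVC(kernel="rbf").fit(X, adult)          # child/adult classifier
        self.youth_svr = SVR(kernel="rbf").fit(X[~adult], ages[~adult])
        self.adult_svr = SVR(kernel="rbf").fit(X[adult], ages[adult])
        return self

    def predict(self, X):
        route = self.router.predict(X)
        out = np.empty(len(X))
        for flag, model in ((False, self.youth_svr), (True, self.adult_svr)):
            mask = route == flag
            if mask.any():
                out[mask] = model.predict(X[mask])
        return out
```

In practice the youth/adult split lets each SVR specialize on its age range rather than fitting one regressor across the whole lifespan.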
6. Experimental Results
For our experiments, we use a landmarking scheme consisting of 68 facial landmarks [20] as shown in Figure 5. MASM is trained on a set of 500 images of 115 subjects manually annotated with these 68 landmarks and drawn from the National Institute of Standards and Technology (NIST) Multiple Biometric Grand Challenge - 2008 (MBGC-2008) still face challenge database [22, 23], which consists of 34,729 frontal images of 810 subjects. At the fitting stage, the Viola-Jones face detector [27] is used to detect faces in test images and initialize MASM.
To test the Contourlet Appearance Model on age estimation we used two face aging datasets: FG-NET [18] and PAL [21]. The FG-NET database contains 1002 face images of 82 subjects with ages ranging from 0 to 69. PAL consists of 443 neutral facial images of adults with ages ranging from 18 to 91. Sample images from the two databases are shown in Figure 6. The Mean Absolute Error (MAE), defined as the average of the absolute errors between the estimated ages and the ground truths, is computed using (11) and is used as our performance metric to compare the different age estimation techniques.

MAE = (1/N_t) Σ_{i=1}^{N_t} |x̂_i − x_i|    (11)

In (11), x̂_i is the estimated age for the i-th testing sample, x_i is the corresponding ground truth age (the true age), and N_t is the total number of testing samples.

Table 1. Comparison of MAE values (in years) obtained by different age estimation methods when the LOPO scheme was used on the FG-NET database.

Method       MAE (years)
CAM          4.12
AAM          6.77
AAM + LBP    6.17
BIFs [16]    4.77
AGES [12]    6.77

Table 2. MAE values (in years) obtained by using different age estimation methods on the PAL database.

Method       MAE (years)
AAM + LBP    7.17
CAM          6.00
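The MAE of (11) as a minimal sketch:

```python
import numpy as np

def mean_absolute_error(predicted, truth):
    """MAE = (1/N_t) * sum |x_hat_i - x_i| over the N_t test samples."""
    predicted = np.asarray(predicted, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.mean(np.abs(predicted - truth)))
```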
In our experiments, the Leave One Person Out (LOPO) scheme is used to evaluate the different methods on the FG-NET database. In this scheme, all facial images of one subject are used as the testing set and the remaining images are used as the training set for the CAM texture model as well as the face aging functions. This process is applied in turn to all 82 subjects in the FG-NET database, and the overall MAE is then computed. Table 1 lists the MAE values obtained by several age estimation methods when the LOPO scheme was used on the FG-NET database, while Table 2 compares the MAEs obtained by two additional age estimation techniques when the same testing scheme was used on the PAL database.
Some previously published results used the Cumulative Score (CS) rather than the MAE to evaluate their estimation results. It is computed using (12), in which N_{e≤ε} is the number of testing samples with absolute errors less than or equal to ε. Figure 7 compares the cumulative scores obtained by our method against such approaches when tested on the FG-NET database.

CumScore(ε) = (N_{e≤ε} / N_t) × 100    (12)
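The Cumulative Score of (12) counts the fraction of test samples whose absolute error is at most ε years; a minimal sketch:

```python
import numpy as np

def cumulative_score(predicted, truth, eps):
    """CumScore(eps) = 100 * N_{e <= eps} / N_t, where N_{e <= eps} counts
    test samples whose absolute age error does not exceed eps years."""
    errors = np.abs(np.asarray(predicted, float) - np.asarray(truth, float))
    return 100.0 * float(np.mean(errors <= eps))
```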
From Tables 1 and 2 and Figure 7 it is clear that our CAM based approach obtains more accurate age estimates than several existing methods.
Figure 7. A comparison of the cumulative scores obtained by different age estimation methods at error levels from 0 to 10 years when tested on the FG-NET database.
7. Conclusions
We have proposed a novel Contourlet Appearance Model to localize facial landmarks and extract texture based features from the region bounded by their convex hull. Our CAM is shown to be more accurate at reconstructing unseen texture than conventional AAMs because of its use of a Modified Active Shape Model for locating the facial landmarks. By decoupling the fitting of facial landmarks from the extraction of texture based features, and by the use of the Nonsubsampled Contourlet Transform, we are able to obtain discriminative features for age estimation using regression based aging functions. The results of our CAM based age estimation technique on the FG-NET and PAL databases are promising and highlight the improved accuracy of our method over several existing age estimation techniques.
Acknowledgements
This research was supported by CyLab at Carnegie Mellon under grants DAAD 19-02-1-0389 and W911NF-09-1-0273 from the Army Research Office.
References
[1] T. Ahonen, A. Hadid, and M. Pietikainen. Face Recognition with Local Binary Patterns. In Proc. of the 8th European Conf. on Computer Vision, volume 3021, pages 469-481, May 2004.
[2] C. M. Christoudias and T. Darrell. On Modeling Nonlinear Shape and Texture Appearance Manifolds. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, volume 2, pages 1067-1074, June 2005.
[3] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Interpreting Face Images using Active Appearance Models. In Proc. of the 3rd Intl. Conf. on Automatic Face and Gesture Recognition, pages 300-305, Apr. 1998.
[4] T. F. Cootes and C. J. Taylor. Statistical Models of Appearance for Computer Vision. Technical report, Imaging Science and Biomedical Engineering, University of Manchester, Mar. 2004.
[5] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham. Active Shape Models - Their Training and Application. Computer Vision and Image Understanding, 61(1):38-59, Jan. 1995.
[6] T. F. Cootes, C. J. Taylor, and A. Lanitis. Active Shape Models: Evaluation of a Multi-Resolution Method for Improving Image Search. In Proc. of the 5th British Machine Vision Conf., pages 327-336, 1994.
[7] D. Cristinacce and T. F. Cootes. A Comparison of Shape Constrained Facial Feature Detectors. In Proc. of the 6th Intl. Conf. on Automatic Face and Gesture Recognition, pages 375-380, May 2004.
[8] A. L. da Cunha, J. P. Zhou, and M. N. Do. The Nonsubsampled Contourlet Transform: Theory, Design, and Applications. IEEE Trans. on Image Processing, 15(10):3089-3101, Oct. 2006.
[9] M. N. Do and M. Vetterli. The Contourlet Transform: An Efficient Directional Multiresolution Image Representation. IEEE Trans. on Image Processing, 14(12):2091-2106, Dec. 2005.
[10] Y. Fu, G. Guo, and T. S. Huang. Age Synthesis and Estimation via Faces: A Survey. IEEE Trans. on Pattern Analysis and Machine Intelligence, 32(11):1955-1976, Nov. 2010.
[11] F. Gao and H. Ai. Face Age Classification on Consumer Images with Gabor Feature and Fuzzy LDA Method. In Proc. of the 3rd Intl. Conf. on Advances in Biometrics, volume 5558, pages 132-141, June 2009.
[12] X. Geng, Z. Zhou, and K. Smith-Miles. Automatic Age Estimation Based on Facial Aging Patterns. IEEE Trans. on Pattern Analysis and Machine Intelligence, 29(12):2234-2240, Dec. 2007.
[13] J. C. Gower. Generalized Procrustes Analysis. Psychometrika, 40(1):33-51, Mar. 1975.
[14] L. Gu and T. Kanade. A Generative Shape Regularization Model for Robust Face Alignment. In ECCV '08: Proc. of the 10th European Conf. on Computer Vision: Part I, pages 413-426, Oct. 2008.
[15] G. Guo, G. Mu, Y. Fu, C. Dyer, and T. S. Huang. A Study on Automatic Age Estimation Using a Large Database. In Proc. of the 12th IEEE Intl. Conf. on Computer Vision, pages 1986-1991, Oct. 2009.
[16] G. Guo, G. Mu, Y. Fu, and T. S. Huang. Human Age Estimation Using Bio-Inspired Features. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pages 112-119, June 2009.
[17] F. Kahraman, M. Gokmen, S. Darkner, and R. Larsen. An Active Illumination and Appearance (AIA) Model for Face Alignment. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pages 1-7, June 2007.
[18] A. Lanitis, C. J. Taylor, and T. F. Cootes. Modeling the Process of Ageing in Face Images. In Proc. of the 7th IEEE Intl. Conf. on Computer Vision, volume 1, pages 131-136, Sept. 1999.
[19] K. Luu, T. D. Bui, K. Ricanek, and C. Y. Suen. Age Determination using Active Appearance Models and Support Vector Machine Regression. In BTAS '09: Proc. of the 3rd IEEE Intl. Conf. on Biometrics: Theory, Applications and Systems, pages 314-318, Sept. 2009.
[20] S. Milborrow and F. Nicolls. Locating Facial Features with an Extended Active Shape Model. In ECCV '08: Proc. of the 10th European Conf. on Computer Vision: Part IV, pages 504-513, Oct. 2008.
[21] M. Minear and D. C. Park. A Lifespan Database of Adult Facial Stimuli. Behavior Research Methods, Instruments and Computers, 36(4):630-633, Nov. 2004.
[22] NIST. National Institute of Standards and Technology: Multiple Biometric Grand Challenge (MBGC) - 2008. http://face.nist.gov/mbgc, 2008.
[23] P. J. Phillips, P. J. Flynn, J. R. Beveridge, W. T. Scruggs, A. J. O'Toole, D. Bolme, K. W. Bowyer, B. A. Draper, G. H. Givens, Y. M. Lui, H. Sahibzada, J. A. Scallan III, and S. Weimer. Overview of the Multiple Biometrics Grand Challenge. In Proc. of the 3rd IAPR/IEEE Intl. Conf. on Biometrics, pages 705-714, June 2009.
[24] K. Ricanek, Y. Wang, C. Chen, and S. J. Simmons. Generalized Multi-Ethnic Face Age-Determination. In BTAS '09: Proc. of the 3rd IEEE Intl. Conf. on Biometrics: Theory, Applications and Systems, pages 127-132, Sept. 2009.
[25] K. Seshadri and M. Savvides. Robust Modified Active Shape Model for Automatic Facial Landmark Annotation of Frontal Faces. In BTAS '09: Proc. of the 3rd IEEE Intl. Conf. on Biometrics: Theory, Applications and Systems, pages 319-326, Sept. 2009.
[26] J. Suo, T. Wu, S. Zhu, S. Shan, X. Chen, and W. Gao. Design Sparse Features for Age Estimation Using Hierarchical Face Model. In Proc. of the 8th Intl. Conf. on Automatic Face and Gesture Recognition, pages 1-6, Sept. 2008.
[27] P. Viola and M. J. Jones. Robust Real-Time Face Detection. Intl. Journal of Computer Vision, 57(2):137-154, May 2004.
[28] M. Wimmer, S. Fujie, F. Stulp, T. Kobayashi, and B. Radig. An ASM Fitting Method Based on Machine Learning that Provides a Robust Parameter Initialization for AAM Fitting. In Proc. of the 8th Intl. Conf. on Automatic Face and Gesture Recognition, pages 1-6, Sept. 2008.
[29] C. Wolstenholme and C. J. Taylor. Wavelet Compression of Active Appearance Models. In Proc. of the 2nd Intl. Conf. on Medical Image Computing and Computer-Assisted Intervention, pages 544-554, Sept. 1999.
[30] X. Xie, J. H. Lai, and W. Zheng. Extraction of Illumination Invariant Facial Features from a Single Image Using Nonsubsampled Contourlet Transform. Pattern Recognition, 43(12):4177-4189, Dec. 2010.
[31] S. Yan, X. Zhou, M. Liu, M. Hasegawa-Johnson, and T. S. Huang. Regression from Patch-Kernel. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pages 1-8, June 2008.
[32] Y. Zhou, L. Gu, and H.-J. Zhang. Bayesian Tangent Shape Model: Estimating Shape and Pose Parameters via Bayesian Inference. In CVPR '03: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pages 109-116, June 2003.