A Framework for Long Distance Face Recognition
using Dense- and Sparse-Stereo Reconstruction
Ham Rara, Shireen Elhabian, Asem Ali, Travis Gault, Mike Miller, Thomas Starr,
and Aly Farag
CVIP Laboratory, University of Louisville, KY, USA {hmrara01,syelha01,amali003,travis.gault,wmmill06,tlstar01,
aafara01}@louisville.edu
Abstract. This paper introduces a framework for long-distance face recognition
using both dense- and sparse-stereo reconstruction. Two methods are used to
determine correspondences of the stereo pair: (a) dense global stereo matching
using maximum-a-posteriori Markov Random Field (MAP-MRF) algorithms and
(b) Active Appearance Model (AAM) fitting of both images of the stereo pair,
with the fitted AAM mesh vertices serving as the sparse correspondences.
Experiments are performed on different features extracted from these vertices
for face recognition, and the two approaches (a) and (b) are compared. The
cumulative match characteristic (CMC) curves generated using the proposed
framework confirm the feasibility of the proposed work for long-distance
recognition of human faces.
1 Introduction
Automatic face recognition is a challenging task that has been an attractive research
area over the past three decades (for more details, see [1]). At the outset, most efforts
were directed towards 2D facial recognition which utilizes the projection of the 3D
human face onto the 2D image plane acquired by digital cameras. The face recogni-
tion problem is then formulated as follows: given a still image, identify or verify one
or more persons in the scene using a stored database of face images. The main theme
of the solutions provided by different researchers involves detecting one or more
faces from the given image, followed by facial feature extraction which can be used
for recognition. Challenges involving 2D face recognition are well-documented in the
literature. Intra-subject variations such as illumination, expression, pose, makeup, and
aging can severely affect a face recognition system.
To address pose and illumination, researchers have recently focused on 3D face
recognition [2]. 3D face geometry can either be acquired using 3D sensing devices
such as laser scanners [3] or reconstructed from one or more images [4-6]. Although
3D sensing devices have been proven to be effective in 3D face recognition [7], their
high cost, limited availability and controlled environment settings have created the
need for methods that extract 3D information from acquired 2D face images.
Recently, there has been interest in face recognition at a distance. Yao et al. [8]
created a face video database acquired at long distances and high magnifications,
both indoors and outdoors, under uncontrolled surveillance conditions. Medioni et al.
[9] presented an approach to identify non-cooperative individuals at a distance by
inferring 3D shape from a sequence of images.
Motivated by these objectives and the current lack of existing facial stereo databases,
we constructed our own passive stereo acquisition setup [10]. The setup consists of a
stereo pair of high-resolution cameras (with telephoto lenses) and an adjustable
baseline. It is designed such that the user can remotely pan, tilt, zoom, and focus the
cameras so that the centers of the cameras' fields of view converge on the subject's
nose tip. This system is used to capture stereo pairs of 30 subjects at various distances
(3-, 15-, and 33-meter ranges).
The paper is organized as follows: Section 2 discusses stereo reconstruction me-
thods (dense and sparse), Section 3 shows the experimental results, Section 4 vali-
dates the best method in Sec. 3 using the FRGC database, and later sections deal with
discussions and limitations of the proposed approaches, conclusions and future work.
2 Stereo Matching-Based Reconstruction
Dense, Global Stereo Matching: The objective of the classical stereo problem is to
find the pair of corresponding points p and q that result from the projection of the
same scene point (X, Y, Z) to the two images of the stereo-pair. Currently, the state-of-
the-art in stereo matching is achieved by global optimization algorithms [11], where
the problem is formulated as a maximum-a-posteriori Markov Random Field (MAP-
MRF) scenario. Given the left and right images, the goal is to find the disparity map
D, where at each pixel p the disparity is $d_p = p_x - q_x$. To correctly solve this
problem, the constraints of visual correspondence should be satisfied: (a) uniqueness,
where each pixel in the left image corresponds to at most one pixel in the right image,
and (b) occlusion, where some pixels have no correspondence. To achieve these
constraints, similar to Kolmogorov's approach [12], we treat the two images
symmetrically by computing the disparity maps for both images simultaneously. The
disparity map D is computed by minimizing the energy function
$E(D) = E_{data}(D) + E_{smooth}(D) + E_{vis}(D)$, whose terms are the data penalty,
smoothness, and visibility constraint terms [13][14]. To fill the occluded regions, we
propose to interpolate between
the correctly reconstructed pixels of each scan line using the cubic splines [15] inter-
polation model. Finally, after getting a dense disparity map from which we get a set of
correspondence points, we reconstruct the 3D points of the face [10]. To remove some
artifacts of the reconstruction, an additional surface fitting step is done [16].
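As an illustration of the occlusion-filling and reconstruction steps above, a minimal NumPy sketch follows; it uses linear interpolation as a simple stand-in for the cubic-spline model of [15], and assumes a rectified pinhole stereo geometry with illustrative calibration values (not the authors' implementation):

```python
import numpy as np

def fill_occlusions(scanline_disp, occluded_mask):
    """Fill occluded pixels on one scan line by interpolating between the
    correctly matched pixels (linear interpolation here, as a simple
    stand-in for the paper's cubic-spline model)."""
    cols = np.arange(len(scanline_disp))
    visible = ~occluded_mask
    filled = scanline_disp.copy()
    filled[occluded_mask] = np.interp(cols[occluded_mask],
                                      cols[visible], scanline_disp[visible])
    return filled

def back_project(disp, focal_px, baseline, cx, cy):
    """Rectified-stereo triangulation of a dense disparity map:
    depth Z = f*B/d, with X and Y from the pinhole model. Calibration
    values passed by callers are illustrative assumptions."""
    rows, cols = np.indices(disp.shape)
    Z = focal_px * baseline / np.maximum(disp, 1e-6)
    X = (cols - cx) * Z / focal_px
    Y = (rows - cy) * Z / focal_px
    return np.stack([X, Y, Z], axis=-1)
```

A scan line with occluded (zero-disparity) pixels is first completed, and the resulting dense map is then back-projected to 3D points.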
Sparse-Stereo Reconstruction: The independent AAM version of [17] is used to
find sparse correspondences between the left and right images of the stereo pair. The
shape $s$ can be expressed as the sum of a base shape $s_0$ and a linear combination
of $n$ shape vectors $s_i$: $s = s_0 + \sum_{i=1}^{n} p_i s_i$, where the $p_i$ are the
shape parameters. Similarly, the appearance $A(\mathbf{x})$ can be expressed as the
sum of the base appearance $A_0(\mathbf{x})$ and a linear combination of basis
images $A_i(\mathbf{x})$: $A(\mathbf{x}) = A_0(\mathbf{x}) + \sum_i \lambda_i A_i(\mathbf{x})$,
where the pixels $\mathbf{x}$ lie on the base mesh $s_0$. Fitting the AAM to an input
image involves minimizing the error between the input image warped to the base
mesh and the appearance, that is,
$\sum_{\mathbf{x} \in s_0} \left[ A_0(\mathbf{x}) + \sum_i \lambda_i A_i(\mathbf{x}) - I(W(\mathbf{x}; p)) \right]^2$.
For this work, the error image is minimized using the project-out version of the
inverse compositional image alignment (ICIA) algorithm [17].
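The linear shape model above can be illustrated with a toy sketch; the base shape, single shape mode, and parameter value below are invented for illustration and are not a trained AAM:

```python
import numpy as np

def synthesize_shape(s0, shape_modes, p):
    """Linear AAM shape model: s = s0 + sum_i p_i * s_i.
    s0: (2N,) base shape; shape_modes: (n, 2N) vectors s_i; p: (n,) params."""
    return s0 + shape_modes.T @ p

# Toy model: three 2-D landmarks (flattened) and one shape mode.
s0 = np.array([0.0, 0.0, 1.0, 0.0, 0.5, 1.0])
modes = np.array([[0.0, 0.0, 0.1, 0.0, 0.0, 0.1]])
s = synthesize_shape(s0, modes, np.array([2.0]))
```

Varying the parameter vector p moves the mesh vertices along the learned shape modes.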
To facilitate a successful fitting process, the AAM mesh is initialized according to
detected face landmarks (eyes, mouth center, and nose tip). After detecting these
facial features, the AAM base mesh is warped to these points.
The detection of facial features starts with identifying the possible facial regions in
the input image, using a combination of the Viola-Jones detector [18] and the skin
detector of [19]. The face is then divided into four equal parts to establish a geometric
constraint on the face. The face landmarks are then identified using variants of
the Viola-Jones detector, i.e., the face detector is replaced with the corresponding
facial landmark detector (e.g., an eye detector) [20]. False detections are then
removed by taking into account the geometric structure of the face (i.e., expected
facial feature locations).
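The quadrant-based geometric constraint can be sketched as follows; the helper names and the specific rule (eyes must lie in the top half of the face box) are illustrative assumptions rather than the exact logic of [20]:

```python
def expected_quadrant(face_box, point):
    """Return which quarter of the face box a detected landmark falls in,
    as ('left'/'right', 'top'/'bottom'). face_box is (x, y, w, h)."""
    x, y, w, h = face_box
    px, py = point
    horiz = 'left' if px < x + w / 2 else 'right'
    vert = 'top' if py < y + h / 2 else 'bottom'
    return horiz, vert

def filter_eye_detections(face_box, candidates):
    # Discard false detections: keep only eye candidates in the top half.
    return [c for c in candidates if expected_quadrant(face_box, c)[1] == 'top']
```

A detection labeled "eye" in the lower half of the face region is rejected as geometrically implausible.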
3 Experimental Results
The 3D acquisition system in [10] is used to build a human face database of 30 sub-
jects at different ranges in controlled environments. The database consists of a gallery
at 3 meters and three different probe sets at the 3-, 15-, and 33-meter ranges. Table I
shows the system parameters at different ranges.
Table I: Stereo-based acquisition system parameters
Dense, Global Stereo 3D Face Reconstructions: The gallery is constructed by cap-
turing stereo pairs for the 30 subjects at the 3-meter range. We reconstruct the 3D face
of each subject using the approach that is described in Section 2. Fig. 1(a) illustrates a
sample from this gallery for different subjects. This figure shows the left image of
each subject and two different views for the 3D reconstruction with and without the
textures.
For the dense, global 3D reconstruction approach, only the images from the 3-meter
and 15-meter probe sets are considered. The reason is that the methodology from
Sec. 2.1 fails to determine acceptable correspondences for the stereo-pair images at
the 33-meter range, leading to unacceptable 3D reconstructions. This result led the
authors to propose the second method (sparse stereo) to deal with stereo pairs from
which dense correspondences are difficult to extract. Fig. 1(b) and 1(c) illustrate
samples from these probe sets.
Sparse-Stereo 3D Face Reconstructions: The gallery and probe sets are similar to
above, except that the 33-meter images are now included in the probe set. The train-
ing of the AAM model involves images from the gallery.
Figure 1: Dense 3D reconstructions: (a) 3-meter gallery, (b) 3-meter, and (c) 15-meter probes.
The vertices of the final AAM mesh on both left and right images can be considered as a set of corresponding
pairs of points, which can be used for stereo reconstruction. To further refine the
correspondences, a local NCC search around each point is performed, using the left
image as the reference. Fig. 2 shows stereo reconstruction
results of three subjects, visualized with the x-y, x-z, and y-z projections, after rigid
alignment to one of the subjects. Notice that in the x-y projections, the similarity (or
difference) of 2D shapes coming from the same (or different) subject is enhanced.
This is the main reason behind the use of x-y projections as features in Sec. 3 (Recog-
nition).
Figure 2: Reconstruction results. The 3D points are visualized as projections in the x-y, x-z,
and y-z planes. Circle and diamond markers belong to the same subject, while the square
markers belong to a different subject.
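The local NCC refinement of the AAM correspondences can be sketched as follows; window and search sizes are illustrative assumptions, not the paper's values:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equal-size patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def refine_match(left, right, pt, half=3, search=4):
    """Refine a right-image correspondence by maximizing NCC against the
    left-image patch centered at pt = (row, col); the left image is the
    reference, and the search is a horizontal scan over +/- `search` pixels."""
    r, c = pt
    ref = left[r - half:r + half + 1, c - half:c + half + 1]
    best, best_c = -2.0, c
    for dc in range(-search, search + 1):
        cand = right[r - half:r + half + 1, c + dc - half:c + dc + half + 1]
        if cand.shape != ref.shape:
            continue  # skip windows clipped by the image border
        score = ncc(ref, cand)
        if score > best:
            best, best_c = score, c + dc
    return r, best_c
```

For each fitted AAM vertex, the right-image location is nudged to the column whose patch correlates best with the left-image patch.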
Recognition: For face recognition, we use five approaches based on the 3D face
vertices derived from dense- and sparse-stereo (3D-AAM) reconstruction to identify
probe images against the gallery: (a) a moment-based approach for dense 3D reconstructions, (b)
feature vectors derived from Principal Component Analysis (PCA) of 3D-AAM ver-
tices, (c) goodness-of-fit criterion (Procrustes) after rigidly registering the 3D-AAM
vertices of a probe with that of a gallery subject, (d) feature vectors from PCA of x-y
plane projections of the 3D-AAM vertices, and (e) the same procedure as (c) but us-
ing the x-y plane projections of the 3D-AAM vertices of both probe and gallery, after
frontal pose normalization.
Moment-based Recognition: For the dense 3D reconstructions, feature vectors used to
compare gallery and probe sets are derived from moments [21] of the 3D vertex
coordinates. The moments are computed as $\eta_{rst} = \sum_{(X,Y,Z)} X^r Y^s Z^t$,
where the sum runs over the reconstructed vertices.
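Under this reading of the moment formula, a feature vector might be assembled as follows; the maximum order and feature layout are assumptions for illustration:

```python
import numpy as np

def moment_features(vertices, max_order=2):
    """Moments eta_rst = sum over vertices of X^r * Y^s * Z^t, collected
    for all (r, s, t) with r + s + t <= max_order. vertices: (N, 3) array."""
    X, Y, Z = vertices[:, 0], vertices[:, 1], vertices[:, 2]
    feats = []
    for r in range(max_order + 1):
        for s in range(max_order + 1 - r):
            for t in range(max_order + 1 - r - s):
                feats.append((X**r * Y**s * Z**t).sum())
    return np.array(feats)
```

The zeroth moment is simply the vertex count; higher-order moments summarize the spatial distribution of the reconstructed face.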
Principal Component Analysis (PCA): To apply PCA [22] for feature classification,
the primary step is to solve for the matrix P of principal components from a training
database, using a number of matrix operations. The feature vector is then computed
as $Y = P^T X$, where $X$ is the centered input sample. The similarity measure used
for recognition is the L2 norm.
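A minimal sketch of the PCA feature extraction and L2 matching; the eigendecomposition route and top-k truncation are standard choices, not necessarily the authors' exact procedure:

```python
import numpy as np

def pca_features(train, x, k):
    """Project a centered sample onto the top-k principal components,
    Y = P^T X; `train` holds one sample per row."""
    mean = train.mean(axis=0)
    cov = np.cov(train - mean, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    P = vecs[:, np.argsort(vals)[::-1][:k]]  # top-k eigenvectors
    return P.T @ (x - mean)

def match_l2(gallery_feats, probe_feat):
    """Rank gallery entries by L2 distance to the probe feature vector."""
    d = np.linalg.norm(gallery_feats - probe_feat, axis=1)
    return np.argsort(d)
```

The first index returned by `match_l2` is the rank-1 identity used in the CMC curves.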
Goodness-of-fit (Procrustes): The Procrustes distance [23] between two shapes is a
least-squares type of metric that requires one-to-one correspondence between shapes.
After preprocessing steps that compute the centroids, rescale each shape to unit size,
and align the shapes with respect to translation and rotation, the squared Procrustes
distance between two shapes $\mathbf{x}_1$ and $\mathbf{x}_2$ is the sum of squared
point distances, $P_d^2 = \|\mathbf{x}_1 - \mathbf{x}_2\|^2$.
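The distance can be sketched as follows, using the standard orthogonal-Procrustes solution for the aligning rotation; this is a sketch of the textbook procedure, not the authors' code:

```python
import numpy as np

def procrustes_distance(x1, x2):
    """Squared Procrustes distance between two (N, 2) shapes: center,
    scale to unit Frobenius norm, rotate x2 optimally onto x1 (orthogonal
    Procrustes), then sum squared point distances."""
    a = x1 - x1.mean(axis=0)
    a = a / np.linalg.norm(a)
    b = x2 - x2.mean(axis=0)
    b = b / np.linalg.norm(b)
    U, _, Vt = np.linalg.svd(b.T @ a)
    R = U @ Vt  # optimal orthogonal alignment of b onto a
    return np.sum((a - b @ R) ** 2)
```

Two shapes that differ only by translation, scale, and rotation have a distance of (numerically) zero.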
Figure 3: Cumulative match characteristic (CMC) curves of the: (a) 3-meter probe set, (b) 15-
meter probe set, and (c) 33-meter probe set. Note that only the 3-m and 15-m probe sets use
moment-based recognition.
Discussion of Results: Fig. 3 shows the cumulative match characteristic (CMC)
curves for the five feature extraction methods of the previous section (Sec. 3), using
the 3-, 15-, and 33-meter probes. We can draw four conclusions: (a) both 2D
Procrustes (i.e., the x-y projection of 3D-AAM) and 2D PCA outperform both 3D
Procrustes and 3D PCA, (b) the goodness-of-fit criterion (Procrustes) slightly
outperforms PCA in both 2D and 3D, (c) recognition degrades at increased distances,
and (d) the moment-based methods perform poorly at lower ranks but shoot up
quickly to 100% at rank-5 and rank-8 for the 3-m and 15-m probe sets, respectively.
The conclusion in (a) can be explained with the help of Fig. 4. The diagram shows the
top view of a simple stereo system. Ol and Or are centers of projection, and pl, pr, ql, qr
are points on the left and right images.
Figure 4: Simple stereo illustration. 3D reconstruction is sensitive to the correspondence prob-
lem but projection to the x-y plane minimizes the error.
Assume that the y-coordinates of the four image points are equal. The pair (pl, pr)
reconstructs P, the pair (ql, qr) reconstructs Q, and so on. Notice that a small change
in the correspondence changes the xyz reconstruction substantially, i.e., the Euclidean
distance between the reconstructed points is large. But when the points are projected
onto the x-y plane, their 2D Euclidean distances are considerably smaller. This
scenario likely occurs with the correspondences of the AAM vertices between the left
and right images.
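This argument can be made concrete with a small worked example; the focal length, baseline, and disparities below are illustrative numbers, not the system's calibration:

```python
import numpy as np

# A 1-pixel correspondence error changes depth Z = f*B/d substantially,
# but the x-y projection of the reconstructed point far less.
f, B = 1000.0, 0.1   # focal length (px) and baseline (m), illustrative
u = 50.0             # pixel offset from the principal point

def reconstruct(d):
    """Rectified-stereo back-projection of one point at disparity d."""
    Z = f * B / d
    X = u * Z / f
    Y = 0.0
    return np.array([X, Y, Z])

P = reconstruct(10.0)  # true disparity
Q = reconstruct(9.0)   # with a 1-pixel matching error
err3d = np.linalg.norm(P - Q)        # full 3D displacement
err2d = np.linalg.norm((P - Q)[:2])  # displacement after x-y projection
```

Here the 3D error is dominated by the depth change, while the x-y projection shrinks the error by more than an order of magnitude, matching the intuition of Fig. 4.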
The conclusion in (b) is related to the primary purpose of PCA, which is optimal
reconstruction, in the mean-square-error (MSE) sense, rather than discrimination.
Projection to a low-dimensional space may remove the discriminative potential of the
vectors. Rigid alignment using Procrustes involves no dimensionality change; similar
shapes are expected to have a smaller Procrustes distance after rigid alignment, and
the geometric information of faces (e.g., distance ratios between face parts) is
maintained.
Results are expected to degrade with distance since the images are captured under
less ideal conditions (although recognition using the 2D x-y projections remains
stable). The work of Medioni et al. [9] deals with identification at a distance using
dense 3D shape recovered from a sequence of captured images. Our results at the 15-
and 33-meter ranges (Fig. 3(b,c)) are comparable to (and slightly better than) their
9-meter results; however, their experimental setting may be less controlled than ours.
The authors cannot find a concrete reason behind conclusion (d), other than that the
dense reconstructed shapes may be overfitted by the gridfit procedure of [16].
This part of the framework is still a work-in-progress and more elaborate 3D recogni-
tion methods will be incorporated as future work (see Conclusions).
Sensitivity Analysis: This section performs a sensitivity analysis of the recognition
performance with respect to errors in the AAM fitting of the stereo pair of images. To
simulate errors introduced to the system, the fitted AAM vertices of the stereo pair are
randomly perturbed with additive white Gaussian noise of a certain variance. Fig. 5
shows the plot of rank-1 recognition rates versus point sigmas, for the 3-, 15-, and 33-
meter ranges. The recognition method used is the 2D Procrustes approach of Fig. 3.
These findings reinforce the fact that accurate AAM fitting is necessary to obtain
satisfactory correspondences, which in turn lead to reliable recognition.
Figure 5: Sensitivity analysis of recognition with respect to errors in the AAM fitting of the
stereo pair of images. Notice that the recognition rates are fairly stable across various values of
σ for the 3- and 15-meter probe sets. However, for the 33-meter probe set, the recognition
performance severely deteriorates after σ = 5.
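The perturbation used in this analysis can be sketched as follows; the vertex-array shape and seeded generator are assumptions made for reproducibility:

```python
import numpy as np

def perturb_vertices(vertices, sigma, seed=0):
    """Add white Gaussian noise (std. dev. sigma per coordinate) to the
    fitted AAM vertices, simulating fitting errors as in the sensitivity
    analysis; the fixed seed is for repeatable experiments."""
    rng = np.random.default_rng(seed)
    return vertices + rng.normal(0.0, sigma, size=vertices.shape)
```

Sweeping sigma and re-running recognition on the perturbed vertices yields the rank-1 rate curves of Fig. 5.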
4 Validation with FRGC Database
The main purpose of this section is to test whether the 2D (x-y projection of 3D-AAM)
Procrustes results of Fig. 3 carry over to the much larger FRGC database. Since the
FRGC database contains close-to-perfect dense 3D information, the advantage of 2D
Procrustes over its 3D equivalent can also be investigated. The section of the
FRGC database [24] with range (3D) data is used. Each range is accompanied by a
corresponding texture image. 115 subjects (three images each, 345 images in total)
were chosen from the database, the number being limited by the manual annotation
required for the AAM training data.
Fig. 6 may provide some insights regarding the better performance of 2D Pro-
crustes over 3D Procrustes. Note that the 2D+3D (range, depth) partition of the
FRGC database contains 2D video images with corresponding range values for each
pixel. After the manual/automatic fitting of AAM vertices, the corresponding range
(depth) value is extracted for each vertex. In Fig. 6, the red dots represent the ex-
tracted depth values using the 2D coordinates of the fitted vertices. Notice that some
depth values are undesirable; they do not contain the intended depth of the facial
feature points (e.g., nose area of Fig. 6(a) and face boundary of Fig. 6(b)). The 2D
coordinates of the AAM vertices (except for the face boundary) are adjusted accord-
ing to the COSMOS framework [25]; specifically, the vertices are adjusted along a
local neighborhood in the horizontal (𝑥) direction, according to some extremum val-
ues defined by [25]. The green dots represent the adjusted 3D vertices. The face
boundaries are adjusted using ad-hoc methods that investigate the most acceptable
face vertex depth of the whole face boundary. Fig. 7 shows both the 2D and 3D Pro-
crustes methods using the FRGC database. Similar to Fig. 3, the 2D approach outper-
forms the 3D version.
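A much-simplified stand-in for this vertex adjustment, which moves a vertex along a horizontal neighborhood of the range image toward an extremum depth (here the minimum, i.e., the point closest to the sensor), might look like this; the window size and min-depth rule are assumptions, not the exact COSMOS criterion of [25]:

```python
import numpy as np

def adjust_vertex_col(range_row, col, half=5):
    """Nudge a fitted vertex's column within a local horizontal window of
    one row of the range image toward the extremum depth (the minimum,
    e.g., the nose tip as the closest point). Simplified sketch only."""
    lo = max(0, col - half)
    hi = min(len(range_row), col + half + 1)
    return lo + int(np.argmin(range_row[lo:hi]))
```

Applied per vertex, this moves the red (extracted) dots of Fig. 6 toward the green (adjusted) positions along the horizontal direction.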
Figure 6: Using the original 2D coordinates of the fitted AAM vertices in the range image does
not give the desired feature locations; hence the adjustment of the vertices: movement of the
red dot to the green dot at the (a) nose location and (b) face boundary, indicated by the arrow.
The vertex movement is not noticeable in the corresponding 2D video image.
Figure 7: Cumulative match characteristic (CMC) curve of the FRGC experiment. Notice the
superiority of the 2D Procrustes approach over the 3D version. Classification is performed
using the leave-one-out methodology of [22]. The reason behind this is related to Fig. 6; the
movement from the red to green dots represents a large variation in the 3D coordinate system,
but when projected to the 2D x-y plane, the planar movement is much smaller, e.g., there is no
noticeable pixel difference in the 2D image after the vertex movement in the nose of Fig. 6(a).
5 Conclusions and Future Work
This paper described a framework for identifying faces remotely, specifically at
distances of 3, 15, and 33 meters. The best approach used AAM to get correspondences
between the left and right images of the stereo pair. This study found that recognition
using the point-to-point distance between 2D x-y projected shapes (after rigid
registration, Procrustes) outperforms the others. The advantage of the 2D Procrustes
approach over its 3D version carried over to the much larger FRGC database (with
115 chosen subjects). Using our database, we have illustrated the potential of using
these few vertices, as opposed to the whole set of points of the human face (dense
reconstruction).
The authors are aware of more elaborate methods of 3D shape classification for face
recognition (even in the presence of facial expressions), such as [7]. However, for this
application (identifying faces at far distances), a close-to-perfect dense 3D scan of
the face is difficult to obtain; therefore, this study currently deals with the sparse 3D
points given by the AAM vertices. The next step of this work is to densify the AAM
mesh to obtain a reconstruction that closely resembles a dense 3D scan but still
contains fewer vertices than conventional 3D scans. This study does not currently
consider facial expression (since the sparse 3D reconstruction can only do so much),
but expression will be addressed as future work once the densification of the AAM
vertices is in place. Additionally, the authors plan to increase the database size (for
better statistical significance) and capture images at further distances (with the help of
state-of-the art equipment).
References
1. Zhao, W., Chellappa, R., Rosenfeld, A.: Face recognition: a literature survey. ACM Com-
puting Surveys 35 (2003) 399–458
2. Kittler, J., Hilton, A., Hamouz, M., Illingworth, J.: 3D assisted face recognition: A survey
of 3D imaging, modelling and recognition approaches. In: Proc. of CVPR. (2005)
3. Pan, G., Han, S.,Wu, Z.,Wang, Y.: 3D face recognition using mapped depth images. In:
Proc. of CVPR- Workshop on Face Recognition Grand Challenge Experiments. (2005)
4. Blanz, V., Vetter, T.: Face recognition based on fitting a 3D morphable model. IEEE Trans.
on PAMI 25 (2003) 1063–1074
5. Hu, Y., Jiang, D., Yan, S., Zhang, L., Zhang, H.: Automatic 3D reconstruction for face
recognition. In: Proc. of Sixth IEEE International Conference on Face and Gesture Recog-
nition. (2004) 843–848
6. Chowdhury, A.R., Chellappa, R., Krishnamurthy, S., Vo, T.: 3D face reconstruction from
video using a generic model. In: Proc. of IEEE International Conference on Multimedia and
Expo. (2002) 449–452
7. Bronstein, A.M., Bronstein, M.M., Kimmel, R.: Three-dimensional face recognition. In:
Intl. Journal of Computer Vision (IJCV). (2005) 5–30
8. Yao, et al.: Improving long range and high magnification face recognition: Database acqui-
sition, evaluation, and enhancement. In: Computer Vision and Image Understanding
(CVIU). (2008)
9. Medioni, G., Jongmoo, C., Cheng-Hao, K., Choudhury, A., Li, Z., Fidaleo, D.: Noncoo-
perative persons identification at a distance with 3D face modeling. In: Proc. of IEEE Inter-
national Conference on Biometrics: Theory, Applications, and Systems (BTAS’07). (2007)
10. Rara, H., Elhabian, S., Ali, A., Miller, M., Starr, T., and Farag, A.: Face recognition at-a-
distance based on sparse-stereo reconstruction. In: Proc. of CVPR- Biometrics Workshop.
(2009)
11. Szeliski, R., Zabih, R., Scharstein, D., Veksler, O., Kolmogorov, V., Agarwala, A., Tappen,
M.F., Rother, C.: A comparative study of energy minimization methods for markov random
fields. In: Proc. of ECCV. (2006) 16–29
12. Kolmogorov, V., Zabih, R.: Multi-camera scene reconstruction via graph cuts. In: Proc. of
ECCV. (2002) 82–96
13. Birchfield, S., Tomasi, C.: A pixel dissimilarity measure that is insensitive to image sam-
pling. IEEE Trans. on PAMI 20 (1998) 401–406
14. Boykov, Y., Kolmogorov, V.: An experimental comparison of min-cut/max-flow algo-
rithms for energy minimization in vision. IEEE Trans. on PAMI 26 (2004) 1124–1137
15. Knott, G.D.: Interpolating Cubic Splines. Springer-Verlag (2000)
16. D'Errico, J.: Surface Fitting using gridfit. In: Matlab Central File Exchange
(http://www.mathworks.com/matlabcentral/fileexchange/). (2006)
17. Matthews, I., Baker, S.: Active Appearance Models Revisited. In: International Conference
on Computer Vision (ICCV). (2004)
18. Viola, P., Jones, M. J.: Robust real-time face detection. In: International Journal of Com-
puter Vision (IJCV). (2004) 151–173
19. Jones, M., Rehg, J.: Statistical color models with application to skin detection. In: Interna-
tional Journal of Computer Vision (IJCV). (2002) 81–96
20. Castrillon-Santana, M., Deniz-Suarez, O., Anton-Canalis, L., Lorenzo-Navarro, J.: Face
and facial feature detection evaluation: Performance evaluation of public domain Haar
detectors for face and facial feature detection. In: VISAPP. (2008)
21. Elad, M., Tal, A., Ar, S.: Content based retrieval of VRML objects: an iterative and interactive
approach. In: Proceedings of the sixth Eurographics workshop on Multimedia. (2001) 107–
118
22. Belhumeur, P., Hespanha J., Kriegman, D.: Eigenfaces vs. Fisherfaces: Recognition using
Class Specific Linear Projection. In: IEEE Trans. PAMI. (1998)
23. Cootes, T.F., Taylor, C.J.: Statistical Models of Appearance for Computer Vision. In:
Technical Report, University of Manchester, UK. (2004)
24. Chang, K., Bowyer, K., Flynn, P.: An Evaluation of Multimodal 2D+3D Face Biometrics.
In: IEEE Trans. PAMI. (2005)
25. Dorai, C., Jain, A.K.: COSMOS - A Representation Scheme for 3D Free-Form Objects. In:
IEEE Trans. PAMI. (1997)