Computer Vision and Image Understanding 111 (2008) 111–125

Improving long range and high magnification face recognition: Database acquisition, evaluation, and enhancement

Yi Yao a, Besma R. Abidi a,*, Nathan D. Kalka b, Natalia A. Schmid b, Mongi A. Abidi a

a Imaging, Robotics, and Intelligent Systems Laboratory, The University of Tennessee, Knoxville, TN 37996, USA
b West Virginia University, Morgantown, WV 26506, USA

Received 23 April 2007; accepted 4 September 2007. Available online 29 September 2007.

Abstract

In this paper, we describe a face video database, UTK-LRHM, acquired from long distances and with high magnifications. Both indoor and outdoor sequences are collected under uncontrolled surveillance conditions. To our knowledge, it is the first database to provide face images from long distances (indoor: 10–16 m and outdoor: 50–300 m). The corresponding system magnifications range from 3× to 20× for indoor and up to 284× for outdoor. This database has applications in experimentations with human identification and authentication in long range surveillance and wide area monitoring. Deteriorations unique to long range and high magnification face images are investigated in terms of face recognition rates based on the UTK-LRHM database. Magnification blur is shown to be a major degradation source, the effect of which is quantified using a novel blur assessment measure and alleviated via adaptive deblurring algorithms. A comprehensive processing algorithm, including frame selection, enhancement, and super-resolution, is introduced for long range and high magnification face images with a large variety of resolutions. Experimental results using face images of the UTK-LRHM database demonstrate a significant improvement in recognition rates after assessment and enhancement of degradations.
© 2007 Elsevier Inc. All rights reserved.

Keywords: Face recognition; Long range surveillance; Face database; Image quality; Image enhancement; Image restoration; Super-resolution

1. Introduction

Substantial developments have been made in face recognition research over the last two decades and multiple face databases have been collected. These include the FERET [1], FRGC/FRVT [2], AR [3], BANCA [4], CMU FIA [5], and CMU PIE [6], to name a few. In these databases, face images or videos, typically visible RGB or monochromatic, are recorded under different resolutions, illuminations, poses, expressions, and occlusions. The data is mostly collected from close distances and with low and constant camera zoom and is only well suited for close range applications, such as identity verification at access points. The rapidly increasing need for long range surveillance and wide area monitoring calls for a passage from close-up distances to long distances and accordingly from low and constant camera zoom to high and adjustable zoom. This enables the detection and verification of humans from long distances and provides critical early threat assessment and situational awareness. The work and database described herewith serve this purpose and provide the research community with a standard testing foundation for long range face related research. Fig. 1 illustrates the optical design, outdoor location, and range of the data we collected. A definition of different ranges of system magnifications and observation distances for near-ground surveillance (both indoor and outdoor) is given in Table 1.

Our database has the following distinguishing characteristics [7]: (1) According to the above definitions, most existing face databases fall into the category of low magnification with a few of them achieving medium magnifications. For instance, the Human ID gait video database [8] collected by the University of South Florida and the database collected by the University of Texas at Dallas (UTD) [9] involve medium distance sequences.

Fig. 1. (a) Composite imaging system used for outdoor sequence collection. (b) A zoomed overview from a satellite image of the location for outdoor sequence collection. The green spot is the camera's location and the orange dots are the subjects' locations, with uniformly distributed observation distances in the range of 50–300 m. (c) A picture of the subject and the camera at an observation distance of 300 m.

Table 1
Definition of magnification/distance ranges

Range            Low/short    Medium     High/long    Extreme
Magnification    1×–3×        3×–10×     10×–35×      >35×
Distance         <3 m         3–10 m     10–100 m     >100 m
Availability     Most existing databases | UTK-LRHM database (high/long and extreme ranges)


The parallel walking videos in the UTD database start from a maximum distance of 13.6 m and their perpendicular walking videos are collected from a distance of 10.4 m. However, the camera's zoom in both databases remains low and constant. In comparison, our database aims at high to extreme magnifications and long to extreme distances. For indoor sequences high magnifications (10×–20×) are used while for outdoor sequences extreme magnifications (up to 284×) are achieved. As a result, degradations induced by high magnification and long distance, such as magnification blur, are systematically present in the data. (2) Our database portrays actual near-ground surveillance conditions (illumination changes caused by non-uniform ceiling light for indoor sequences, weather conditions and air turbulence for outdoor sequences, and subject motions). More importantly, our UTK-LRHM database includes the effect of camera zooming, which is commonly absent in other databases available to date. Sequences with various combinations of still/moving subjects and constant/adjustable camera zoom are collected for the study of the individual and combined effects of target and camera motions.

In face detection and tracking, criterion functions are used to describe the probability of an area being a face image. The term face quality assessment was first explicitly used by Identix [10], where a face image is evaluated according to the confidence of detectable eyes, frontal face geometry, resolution, illumination, occlusion, contrast, focus, etc. Kalka et al. applied quality assessment metrics originally proposed for iris, such as lighting (illumination), occlusion, inter-ocular distance (resolution), and image blurriness, to face images [11]. Xiong and Jaynes developed a metric based on bilateral symmetry, color, resolution, and expected aspect ratio (frontal face geometry) to determine whether a detected face image in a surveillance video is suitable to be added to an on-the-fly database [12]. Weber reviewed existing quality measures for face images in [13]. A comparison of human and automatic measurement of the quality of biometric samples can be found in [14].

Long distance and high magnification conditions introduce severe and non-uniform blur, which is unique to our database in comparison with most existing databases collected from close distances and with low magnifications. Thus, in addition to the aforementioned metrics, our first priority is to examine the effect of high magnification blur on face image quality and eventually the overall face recognition rate. We begin our study with the assessment of high magnification blur and then describe the corresponding enhancement techniques with a focus on image deblurring. We also consider the combination of image deblurring and super-resolution (SR) for face images with low resolution.

Apart from general-purpose deblurring algorithms, such as adaptive unsharp masking (UM) [15] and regularized image deconvolution [16], algorithms have been proposed specifically for face deblurring by making use of known facial structures. Fan et al. incorporated prior statistical models of the shape and appearance of a face into the regularized image restoration formulation [17]. A hybrid recognition and restoration architecture was described by Stainvas and Intrator [18], where a neural network is trained on both sharp and blurred face images. Liao and Lin applied Tikhonov regularization to Eigen-face subspaces to overcome the algorithm's sensitivity to image noise [19].

In this paper, an enhancement algorithm particularly designed to process magnification blur is introduced and its effectiveness is verified via extensive experimental results. A wavelet based method is selected to achieve a balance between image deblurring and denoising. A restoration algorithm based on norm-1 Lasso regularization [20] is employed for image deblurring. Lasso regularized image deconvolution features a reduced computational complexity compared with the total variation method and an improved ability in preserving edges compared with the conventional norm-2 regularization. It is a promising choice since a reduced computational complexity facilitates real-time applications and the improved ability in preserving facial features helps in increasing face recognition rates [21].

In addition to hardware designs used to achieve the required resolution, super-resolution is a software based method that has received intensive research interest. Since the publication of the pioneering work of Tsai and Huang [22], various SR algorithms have been proposed [23–30]. Borman and Stevenson reviewed existing SR algorithms in [31]. A more detailed and updated review was later given by Park et al. [32]. In addition to general SR methods, algorithms designed particularly for face images also exist in the literature, such as hallucinating faces [33] and Eigenfaces [34]. Baker and Kanade employed a pyramid based algorithm to learn a prior of the derivatives for reconstructing high resolution images [33]. A layered predictor network was employed for face SR based on a Resampling-Maximum Likelihood model [35]. Jia and Gong addressed the reconstruction of SR face images using multiple occluded images of different resolutions [36], which are commonly encountered in surveillance videos.

In this paper, we also study and compare the performance of SR algorithms when applied to long range and high magnification face images. Image sets with different resolutions, measured by the inter-ocular distance in pixels, are selected from the UTK-LRHM database. These images undergo combinations of processing, including SR and SR followed by enhancement. The output data are fed into two commercial recognition engines, FaceIt® [37] and VeriLook® [38], and the resulting face recognition rates tabulated and compared. FaceIt® employs local feature analysis (LFA) while VeriLook® utilizes Gabor wavelets. Face recognition engines based on different approaches are selected to validate the generality of our conclusions.

The major contributions of this paper are as follows. (1) A long range and high magnification face video database is established, which lays the foundation for long range face related research work. The observation distances and system magnifications reach up to 300 m and 284×, respectively. (2) The influence of magnification blur on face recognition rates is studied and the resulting degradation is quantified using adaptive sharpness measures [39]. (3) A wavelet based deconvolution algorithm is proposed where a norm-1 regularization is applied to reduce computational complexity and preserve image edges, especially facial features. Significant improvements in face recognition rates are achieved.

Fig. 2. Illustration of (a) a small expression change and (b) a small pose variation.


The remainder of this paper is organized as follows. Section 2 elaborates on our UTK-LRHM database. Face quality measures and enhancement algorithms are discussed in Sections 3 and 4. Section 5 presents our experimental results and Section 6 concludes this paper.

2. Database description

Our database collection, including indoor and outdoor sessions, began in February 2006 and ended in October 2006. The data set contains frontal view face images and videos collected with various system magnifications (10×–284×), observation distances (10–300 m), indoor (office ceiling light and side light) and outdoor (sunny and cloudy) illuminations, still/moving subjects, and constant/varying camera zooms. Small expression and pose variations are also included in the video sequences of our database, as shown in Fig. 2, closely resembling the variations encountered in uncontrolled surveillance applications.

2.1. Indoor sequences

For the indoor sequence collection, the observation distance was varied from 10 m to 16 m. Given this distance range and an image resolution of 640 × 480, a 22× system magnification is sufficient to yield a face image with an inter-ocular distance of 60 pixels. This resolution is recommended by FaceIt® for successful recognition. Therefore, a commercially available PTZ camera (Panasonic WV-CS854) was used.

Our indoor database includes both still images (eight images per subject) and video sequences (six sequences per subject). Still images are collected at uniformly distributed distances in the range of 10–16 m with an interval of approximately 1 m. The corresponding system magnification varies from 10× to 20× with an increment of 2×, achieving an approximately constant inter-ocular distance to eliminate the effect of resolution. Still images with low magnification (1×) are also taken from a close distance (1 m) as a reference image set. The achievable face recognition rate using this image set provides an ideal performance reference for evaluating degradations caused by high magnification.




The observation distance and system magnification are the two major factors to which this effort is devoted. Meanwhile, the effect of composite target and camera motions is included to achieve a close resemblance to practical surveillance scenarios. Therefore, the indoor video sequences are recorded under the following conditions: (1) constant distance & varying system magnification, (2) varying distance (the subject walks at a normal speed towards the observation camera) & constant system magnification, and (3) varying distance & varying system magnification. Conditions 1 and 2 concentrate on the individual effects of camera zoom and subject motion while the combined effect can be observed in condition 3. In addition, the system magnification in condition 3 is varied so that a constant inter-ocular distance can be maintained. These video sequences can be used for studies of the effects of resolution, subject motion, and camera zoom. Fig. 3 shows example face images degraded by blurs from subject motion, camera zoom, and camera auto-focus.

The above still images and video sequences are collected under fluorescent ceiling lights with full intensity (approximately 500 lx) and include a certain degree of illumination changes caused by a varied distribution of the ceiling lights. Our indoor database also considers large amounts of illumination change. A halogen side light (approximately 2500 lx) is added and a sequence is recorded as the intensity of the ceiling lights is decreased from 100% to 0%, which creates a visual effect of a rotating light source.

The gallery images are collected by a Canon A80 camera under a controlled indoor environment from a distance of 0.5 m. The image resolution is 2272 × 1704 pixels and the camera's focal length is 114 mm (magnification: 2.28×).

Fig. 3. Illustration of various types of blurs captured in our database in addition to magnification blur. (a)–(c) reference images. Blurred images due to: (d) subject's motion, (e) camera's zoom motion, and (f) camera's focus motion.

Figs. 4 and 5 illustrate sample images of one data record in the database. A data record is a series of images of a given subject under all shooting conditions. Table 2 summarizes the data set specifications.

The indoor session has 55 participants (78% male and 22% female). Their ethnic distribution consists of 73% Caucasian, 13% Asian, 9% Asian Indian, and 5% of African descent. The image resolution is 640 × 480 pixels. For the video sequences, our database provides uncompressed frames in the format of BMP files at a rate of 30 frames per second as well as AVI files compressed using the Microsoft MPEG 2.0 codec. Each video sequence lasts 9 s. The total physical size for storage is 84 GB, with 1.53 GB per subject.

2.2. Outdoor sequences

For the outdoor sequence collection, a composite imaging system was built where a Meade ETX-90 telescope (focal length: 1250 mm) was coupled with a JVC MG-37U camcorder (focal length: 2.3–73.6 mm) via a Celestron 40 mm eyepiece using an afocal connection (Fig. 1(a)). The achievable system magnification therefore ranges from 22× to 659×.

Our outdoor database includes both still images (two images per subject) and video sequences (twelve sequences per subject). Two sequences per subject are collected at uniformly distributed distances from 50 m to 300 m with an interval of about 50 m. The corresponding system magnification varies from 66× to 284× with an increment of 44×, achieving an approximately constant inter-ocular distance. The two sequences are collected with different subject motions, one with the subject standing still and the other with the subject walking a short distance.


Fig. 4. A set of still images in one data record: (a) gallery image, (b) 1× reference, 60 p, (c) 10×, 9.5 m, 57 p, (d) 12×, 10.4 m, 57 p, (e) 14×, 11.9 m, 58 p, (f) 16×, 13.4 m, 60 p, (g) 18×, 14.6 m, 60 p, and (h) 20×, 15.9 m, 60 p. Face images in (b)–(h) have approximately the same resolution with an inter-ocular distance around 60 pixels. The inter-ocular distance is obtained by averaging over all the face images across different subjects in each data set.

Fig. 5. A set of sample frames from collected sequences in one data record. (a) Condition 1: 20× → 10×, 13.4 m, constant observation distance. (b) Condition 2: 10×, 15.9 m → 9.5 m, constant system magnification. (c) Condition 3: 20× → 10×, 15.9 m → 9.5 m, constant inter-ocular distance. (d) Varying illumination, 20×, 15.9 m.


Table 2
Indoor sequence specifications

Still images:
Magnification (×)                1     10     12     14     16     18     20
Distance (m)                     1     9.5    10.4   11.9   13.4   14.6   15.9
Inter-ocular distance (pixel)    60    57     57     58     60     60     60

Video sequence conditions:                                        Magnification (×)   Distance (m)
1. Constant distance & varying system magnification               10 → 20             13.4 and 15.9
2. Varying distance & constant system magnification               10 and 15           9.5 → 15.9
3. Varying distance & varying system magnification                10 → 20             9.5 → 15.9
Varying illumination, constant distance & system magnification    20                  15.9

Table 3
Outdoor sequence specifications

Magnification (×)                66    109    153    197    241    284
Distance (m)                     50    100    150    200    250    300
Inter-ocular distance (pixel)    79    76     79     76     78     78


One still image per subject is also taken from a close distance (1 m at 1×) for a reference image set. The gallery images are collected by a Nikon camera under a controlled indoor environment from a distance of 1 m. The image resolution is 2560 × 1920 pixels. Fig. 6 illustrates sample images of one data record in the outdoor database and Table 3 summarizes the various sequence specifications.

The outdoor session has 48 participants (83.3% male and 16.7% female). Their ethnic distribution consists of 41.7% Caucasian, 39.5% Asian, and 18.8% Asian Indian. The image resolution is 640 × 480 pixels. Each video sequence lasts 5 s. The total physical size for storage is 724.8 MB, with 15.7 MB per subject. The outdoor database has sequences of 20 subjects (41.7%) who also participated in the indoor session.

Fig. 6. A set of sample frames from standing sequences in one outdoor data record: (a) indoor gallery image, (b) 1× reference, (c) 66×, 50 m, 79 p, (d) 109×, 100 m, 76 p, (e) 153×, 150 m, 79 p, (f) 197×, 200 m, 76 p, (g) 241×, 250 m, 78 p, and (h) 284×, 300 m, 78 p. Face images in (c)–(h) have approximately the same resolution with an inter-ocular distance of 80 pixels.

3. Face image quality assessment

In this section, we study the degradations in face recognition rates introduced by an increased system magnification and observation distance. Apart from illumination, pose, and expression, magnification blur is identified as another major deteriorating source for long range data. To describe the corresponding degradations, an adaptive face image quality measure is developed based on image sharpness measures.



3.1. Face recognition rates vs. system magnification

We first examine the relation between face recognition rates and system magnifications/observation distances. The gallery image sets (Figs. 4a and 6a) are compared against different sets of probe images with a constant inter-ocular distance, each set consisting of face images collected at the same observation distance and with the same system magnification. We use FaceIt® [37] and VeriLook® [38] as our evaluation tools and the rank-one recognition rate as the comparison criterion. In the following experiments, including the experiments in Section 5, we are able to obtain similar conclusions using either recognition engine. We will only show the plots obtained from FaceIt®.

The CMC curves with respect to various system magnifications are illustrated in Fig. 7 and Table 4. Image deterioration from limited fine facial details causes the recognition rate to drop gradually as the system magnification increases. For indoor images, the rank-one recognition rate declines from 61.8% to 47.3% as the system magnification increases from 10× to 20×. There exists a significant performance gap between the low (1×) and high (20×) magnification probes. Similar observations apply to the outdoor session, where the rank-one recognition rate declines from 51.1% to 29.8% as the observation distance increases from 50 m to 300 m, which reveals that magnification blur is an additional major degrading factor in face recognition. As expected, the rank-one recognition rate from VeriLook® also decreases as the system magnification increases: it drops from 41.8% to 32.7% for the indoor session and from 58.7% to 10.9% for the outdoor session. This performance gap needs to be compensated for by image enhancement.
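For readers reproducing such comparisons, the sketch below shows one way to compute CMC curves and rank-one rates from a probe-by-gallery similarity matrix (Python/NumPy; the matrix layout and the assumption that probe i's mate is gallery entry i are our illustration, not part of FaceIt® or VeriLook®):

    import numpy as np

    def cmc_curve(similarity, max_rank=10):
        """Cumulative Match Characteristic from a probe-by-gallery
        similarity matrix; assumes the true identity of probe i is
        gallery entry i."""
        n_probes = similarity.shape[0]
        # Gallery entries sorted from most to least similar, per probe.
        order = np.argsort(-similarity, axis=1)
        # Rank (1 = best) at which the correct gallery entry appears.
        ranks = np.argmax(order == np.arange(n_probes)[:, None], axis=1) + 1
        return np.array([100.0 * np.mean(ranks <= r)
                         for r in range(1, max_rank + 1)])

    # cmc_curve(S)[0] is the rank-one recognition rate in percent.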

3.2. Image sharpness vs. system magnification

Since image noise levels increase with system magnification, conventional sharpness measures, commonly used to evaluate out-of-focus blur [40], are sensitive to image noise and therefore cannot be directly applied to high magnification images [21].

Fig. 7. CMC comparison across probes with different system magnifications and observation distances: (a) indoor and (b) outdoor sessions. The face recognition rate drops gradually as the system magnification increases. The CMC curves are obtained using FaceIt®.

To avoid artificially elevated sharpness values due to image noise, adaptive sharpness measures were proposed by the authors and applied to high magnification systems to quantify magnification blur [39].

To justify the use of adaptive sharpness measures, the noise characteristics of the image sets at various magnifications are investigated. The standard deviation of the uniform background is used to characterize the image noise level and is computed with respect to system magnification, as shown in Fig. 8. The image noise level increases as the system magnification increases. Therefore, to exclude the artificially increased sharpness measure caused by increased noise levels, adaptive sharpness measures are used.
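As a minimal illustration of this characterization (our own sketch, with a hand-picked uniform background patch rather than the authors' exact protocol):

    import numpy as np

    def background_noise_level(image, region):
        """Noise level estimated as the standard deviation of a uniform
        background patch; region = (y0, y1, x0, x1) is assumed to
        contain no scene structure, e.g., a blank wall."""
        y0, y1, x0, x1 = region
        return image[y0:y1, x0:x1].astype(float).std()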

Adaptive sharpness measures assign different weights to pixel gradients according to their local activities. For pixels in smooth areas, small weights are used. For pixels adjacent to strong edges, large weights are allocated. Adaptive sharpness measures can be divided into two groups, separable and non-separable. Separable adaptive sharpness measures only focus on horizontal and vertical edges while non-separable measures include the contributions from diagonal edges. For separable approaches, two weight signals are constructed, a horizontal

$L_x(x, y) = [I(x+1, y) - I(x-1, y)]^p$    (1)

and a vertical

$L_y(x, y) = [I(x, y+1) - I(x, y-1)]^p;$    (2)

where $I(x, y)$ denotes the image intensity at pixel $(x, y)$ and $p$ is a power index determining the degree of noise suppression. In general, larger power indices are used when image noise levels are high. The adaptive Tenengrad sharpness measure, for instance, becomes

$S = \sum_{x=1}^{M} \sum_{y=1}^{N} \left[ L_x(x, y) I_x^2(x, y) + L_y(x, y) I_y^2(x, y) \right],$    (3)

where $M$/$N$ denotes the number of image rows/columns and $I_x(x, y)$/$I_y(x, y)$ represents the horizontal/vertical gradient at pixel $(x, y)$ obtained via the Sobel filter. For non-separable methods, the weights are given by

$L(x, y) = [I(x-1, y) + I(x+1, y) - I(x, y-1) - I(x, y+1)]^p,$    (4)

and the corresponding adaptive Tenengrad is formulated as

$S = \sum_{x=1}^{M} \sum_{y=1}^{N} L(x, y) \left[ I_x^2(x, y) + I_y^2(x, y) \right].$    (5)

Table 4
Comparison of rank-one recognition rates across probes with different system magnifications and observation distances

Indoor session
System magnification (×)    1      10     12     14     16     18     20
FaceIt® (%)                 65.5   61.8   60.0   58.2   56.4   49.1   47.3
VeriLook® (%)               63.6   41.8   40.0   38.2   34.6   32.7   32.7

Outdoor session
Observation distance (m)    1      50     100    150    200    250    300
FaceIt® (%)                 80.9   51.1   46.8   38.3   35.6   34.0   29.8
VeriLook® (%)               76.1   58.7   41.3   34.8   30.4   21.7   10.9

Fig. 8. Image noise levels for face images collected with different system magnifications. The dots represent the noise deviation computed from face images of different subjects. The mean noise level increases as the system magnification increases.

In the following experiments, each data set consists of face images collected from the same observation distance and with the same system magnification. The sharpness measures of these face images are computed and the mean sharpness values are obtained by averaging them across different subjects within one image set. Fig. 9 shows two plots of the computed sharpness values (NSATP2: non-separable adaptive Tenengrad measure with p = 2) and their means. These mean sharpness values present a clearer view of how image quality responds to system magnification/observation distance. As expected, image sharpness decreases as system magnification/observation distance increases for both indoor and outdoor image sets. The decrease in image sharpness is consistent with the decrease in face recognition rates caused by magnification blur as seen in Fig. 7. Therefore, we could use the adaptive Tenengrad as an indicator not only for the degree of magnification blur but also for the recognition rates.
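To make Eqs. (4) and (5) concrete, the following is a minimal sketch of the non-separable adaptive Tenengrad (NSATP2 corresponds to p = 2); the Sobel convention and the boundary handling are our assumptions rather than details prescribed in the text:

    import numpy as np
    from scipy.ndimage import sobel

    def adaptive_tenengrad(image, p=2):
        """Non-separable adaptive Tenengrad sharpness, Eqs. (4)-(5).
        Larger p suppresses the gradient contribution of noisy pixels
        in smooth regions."""
        I = image.astype(float)
        # Non-separable weight L(x, y), Eq. (4); borders are left zero.
        L = np.zeros_like(I)
        L[1:-1, 1:-1] = (I[:-2, 1:-1] + I[2:, 1:-1]
                         - I[1:-1, :-2] - I[1:-1, 2:]) ** p
        # Horizontal/vertical gradients via the Sobel filter.
        Ix = sobel(I, axis=1)
        Iy = sobel(I, axis=0)
        # Weighted gradient energy, Eq. (5).
        return np.sum(L * (Ix ** 2 + Iy ** 2))

Within one image set, these per-image values are averaged across subjects to obtain the mean sharpness plotted in Fig. 9.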

To illustrate the effectiveness of the newly developed NSATP2, Fig. 10 presents a comparison of NSATP2 against the focus component in the "faceness" measure from FaceIt®, which also evaluates degradation from blur. For the outdoor session, both NSATP2 and the focus component from FaceIt® can properly describe the degradation from magnification blur. However, for the indoor session, since the sampled magnification interval is relatively dense, the focus component, corrupted by increased noise, is unable to represent the influence of magnification blur. Its value increases as the system magnification increases from 12× to 20×. In comparison, the NSATP2 measure decreases consistently as the system magnification increases, which agrees with both visual inspection and the behavior of the face recognition rates shown in Fig. 7. Therefore, our NSATP2 measure is able to detect the artificially elevated sharpness caused by the increased noise introduced by high magnification and thus produces a robust and accurate evaluation of magnification blur.

4. Face image enhancement

As illustrated in Section 3, high magnification images suffer from both increased image blur and high noise levels. In general, deblurring algorithms are sensitive to image noise unless an appropriate probabilistic model is used. Denoising algorithms, on the other hand, smooth out valuable image details. The resulting images are either low on details or too noisy. Since most face recognition engines are sensitive to both types of degradations, a good balance needs to be found to achieve an optimal recognition rate. A multi-scale processing based on wavelet transforms is used and proves to be effective in restoring and enhancing data with high magnification values.

4.1. Algorithm description

The sharpness of each probe image is computed and its value compared with a pre-learned threshold S_th. Based on the assumption of a Gaussian distribution and using Maximum Likelihood (ML) estimation, S_th can be obtained by averaging the mean sharpness measures of the 1× and 10× image sets. If the sharpness value of the current probe image is smaller than S_th, image enhancement is performed. Otherwise, no processing is conducted. In so doing, only those images which may deteriorate the overall recognition rate are processed. Images with acceptable sharpness are fed to the face recognition engine directly.

Fig. 9. Sharpness measures for face images collected with different system magnifications/observation distances: (a) indoor and (b) outdoor sessions. The dots represent the sharpness measures computed from face images of different subjects. The mean sharpness value decreases as the system magnification/observation distance increases.

Fig. 10. Comparison with the focus component in the "faceness" measure from FaceIt®: (a) indoor and (b) outdoor sessions. For fair comparison, the measures are normalized with respect to the values at 1× and 1 m for the indoor and outdoor sessions, respectively.


The importance of choosing an efficient measure of face image quality becomes evident in assessing and sorting the probe images. Another advantage of using the face quality measure described in Section 3 is its reduced computational complexity, which is also crucial for real-time applications. The block diagram of the proposed algorithm is depicted in Fig. 11.

Fig. 11. Block diagram of the enhancement algorithm for long range and high magnification face images.

The proposed algorithm proceeds as follows.

(1) Detect the face in the input probe image. If a face is detected, go to step (2). Otherwise, wait for the next frame.

(2) Compute the inter-ocular distance. If the inter-ocular distance is greater than 60 pixels, go to step (4).

(3) Perform super-resolution using consecutive face frames.

(4) Compute the sharpness measure of the input face image: S.

(5) If S < S_th, go to step (6). Otherwise, output the face image and go to step (1) for the next probe.

(6) Decompose the face image via the Haar wavelet transform of level 1.

(7) Apply deblurring algorithms to the approximation coefficients.

(8) Apply denoising algorithms to the vertical/horizontal/diagonal detail coefficients.

(9) Apply adaptive grey level contrast stretching.

(10) Reconstruct the output image via the corresponding inverse wavelet transform.

Face detection and inter-ocular distance computation are implemented using existing OpenCV functions [41] and the FaceIt® SDK [42]. SR algorithms can be divided into registration and interpolation stages. In our implementation, the Vandewalle method is chosen for registration [23] and the cubic spline method is selected for 2D interpolation [45]. Global thresholding is used for denoising. Two types of deblurring algorithms, UM and regularized deconvolution, are implemented for the enhancement of the approximation image. The UM method follows its traditional implementation using a Laplacian filter. The regularized deconvolution utilizes the Lasso regularization, as discussed in the following section [20]. A code sketch of the wavelet stage, steps (6)–(10), is given below.
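The sketch assumes the PyWavelets package; unsharp masking stands in for the deblurring of the approximation band (the Lasso variant is sketched in Section 4.2), a fixed global threshold stands in for the learned one, the contrast stretch is applied after reconstruction for simplicity, and all parameter values are illustrative only:

    import numpy as np
    import pywt
    from scipy.ndimage import gaussian_filter

    def wavelet_enhance(face, noise_thresh=10.0, um_amount=1.5):
        """Steps (6)-(10): deblur the approximation band, denoise the
        detail bands, stretch contrast, and reconstruct."""
        img = face.astype(float)
        # (6) Level-1 Haar decomposition.
        cA, (cH, cV, cD) = pywt.dwt2(img, 'haar')
        # (7) Deblur the approximation band (unsharp masking stand-in).
        cA = cA + um_amount * (cA - gaussian_filter(cA, sigma=1.0))
        # (8) Global soft-thresholding of the detail bands.
        cH, cV, cD = (pywt.threshold(c, noise_thresh, mode='soft')
                      for c in (cH, cV, cD))
        # (10) Inverse transform.
        out = pywt.idwt2((cA, (cH, cV, cD)), 'haar')
        # (9) Simple grey level contrast stretch to [0, 255].
        lo, hi = np.percentile(out, (1, 99))
        return np.clip((out - lo) / (hi - lo + 1e-9) * 255.0, 0, 255)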

4.2. Regularization approach

A typical Tikhonov regularized deconvolution uses the norm-2 definition and solves the following optimization problem:



$f_\lambda = \arg\min \left\{ \|Af - f_b\|_{L_2}^2 + \lambda \|Lf\|_{L_2}^2 \right\},$    (6)

where $f$ and $f_b$ are the original and blurred images in vector format, $A$ and $L$ represent the blurring filter and a pre-defined mask in vector format, and $\lambda$ denotes the regularization parameter. Various forms of $L$ can be found in the literature. The identity matrix and the Laplacian filter are two popular choices [43,44]. The Tikhonov regularization does not allow discontinuities in the solution, leading to overall smoothed edges in the restored images. The total variation regularization was proposed to preserve edges in the reconstructed images [16]. A norm-1 definition is adopted:

$f_\lambda = \arg\min \left\{ \|Af - f_b\|_{L_2}^2 + \lambda \left\| \sqrt{f_x^2 + f_y^2} \right\|_{L_1} \right\},$    (7)

where $f_x$ and $f_y$ denote the vertical and horizontal image gradients in vector format. The total variation regularization is capable of preserving edges but suffers from a significantly increased computational complexity. In our implementation, we utilize the Lasso regularization:

$f_\lambda = \arg\min \left\{ \|Af - f_b\|_{L_2}^2 + \lambda \|f\|_{L_1} \right\}.$    (8)

The Lasso regularization achieves similar edge preservation as the total variation regularization with substantially reduced computations.
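The text cites [20] for the solver; one standard way to minimize Eq. (8) is the iterative shrinkage-thresholding algorithm (ISTA), sketched below under our own simplifying assumption that A is circular convolution with a known, centered PSF:

    import numpy as np

    def soft(x, t):
        # Soft-thresholding: the proximal operator of the L1 penalty.
        return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

    def lasso_deconv(f_b, psf, lam=0.01, n_iter=200):
        """ISTA for Eq. (8): min ||A f - f_b||_2^2 + lam * ||f||_1,
        with A implemented as circular convolution with psf."""
        # Embed the centered PSF in an image-sized kernel for FFT filtering.
        pad = np.zeros_like(f_b, dtype=float)
        kh, kw = psf.shape
        pad[:kh, :kw] = psf
        pad = np.roll(pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))
        H = np.fft.fft2(pad)
        A = lambda x: np.real(np.fft.ifft2(np.fft.fft2(x) * H))
        At = lambda x: np.real(np.fft.ifft2(np.fft.fft2(x) * np.conj(H)))
        # Step size 1/L, where L = 2 max|H|^2 bounds the gradient's
        # Lipschitz constant.
        step = 1.0 / (2.0 * np.max(np.abs(H)) ** 2 + 1e-12)
        f = f_b.astype(float).copy()
        for _ in range(n_iter):
            f = soft(f - step * 2.0 * At(A(f) - f_b), step * lam)
        return f

Each iteration costs a few FFTs and one thresholding pass, consistent with the reduced computational complexity argued above.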

5. Experimental results

In this section, we examine the performance of different image sets processed via various enhancement and SR algorithms, as well as their combinations when necessary. These image sets are sampled from the UTK-LRHM database and represent scenarios with various observation distances, system magnifications, and resolutions. The magnifications, distances, and inter-ocular distances tested in our experiments vary between 10×–284×, 10–300 m, and 35–85 pixels, respectively. The image sets considered in this section can be categorized into two distinct groups: (1) low resolution with an inter-ocular distance less than 60 pixels and (2) high resolution with an inter-ocular distance larger than or equal to 60 pixels. Based on our experimental results, suitable combinations of image processing procedures are recommended for these two groups of data so as to achieve the optimal recognition rate.

Before continuing with our discussion, we define the following notation. The probe set with face images taken at a magnification M, a distance D in meters, and with an inter-ocular distance of C pixels is denoted as M× D m C p (e.g., 10× 9 m 60 p).

5.1. Image enhancement

Frames selected from video sequences by applying the adaptive Tenengrad algorithm are used in the following experiments to exclude blurs from other sources such as subject motion, camera zooming, and improper focus. Different probe sets are obtained by enhancing the same image set via various enhancement methods, including UM, deconvolution (Deconv), Liao and Lin's Eigen-face [19], and our wavelet based method. In addition, two probe sets, the unprocessed face images and the 1× reference face images, are also included and their performances serve as references for comparison. The same experiments are repeated for image sets at different magnifications and observation distances. Since similar observations are obtained, in the interest of space, we select the comparisons based on the 10× 9 m 60 p, 20× 16 m 60 p, 109× 100 m 80 p, and 284× 300 m 80 p data sets for illustration (Fig. 12 and Table 5).



Our wavelet based algorithms are able to achieve the most improvement, with an increase of 16.3% in the rank-one recognition rate for the 20× 16 m 60 p data set, yielding a performance comparable to the 1× reference. It is seen from these results that, with proper processing, the degradation in face recognition rate caused by magnification blur can be successfully compensated for. In this work, we also want to make use of the adaptive Tenengrad to predict face recognition rates at different system magnifications and determine whether an image enhancement step is necessary. From Fig. 9(a), we have S_th = 16,600 by averaging the mean sharpness measures of the 1× and 10× image sets. Using sharpness measure selection (SMS), two (3.6%) and eight (14.5%) samples from the 20× 16 m 60 p and 10× 9 m 60 p image sets, respectively, meet the minimum sharpness requirement and hence do not need preprocessing before the recognition step.

Fig. 12. Sample face images for the 20× 16 m 60 p data set: (a) original image, (b) enhanced by UM, (c) enhanced by wavelet + Lasso without contrast stretching, and (d) enhanced by wavelet + Lasso with contrast stretching. CMC comparison across probes processed by different enhancement algorithms. Indoor data sets: (e) 10× 9 m 60 p and (f) 20× 16 m 60 p. Outdoor data sets: (g) 109× 100 m 80 p and (h) 284× 300 m 80 p. The performances of the wavelet Lasso/UM algorithm with and without SMS are identical for the 20× 16 m 60 p data set. Only the CMC curves of wavelet Lasso/UM SMS are shown in (f). All CMC curves are obtained using FaceIt®.

The resulting face recognition rates are identical to the case where all images are processed for the 20× 16 m 60 p image set, while an improved recognition rate is achieved for the 10× 9 m 60 p image set when the images with S ≥ S_th are not processed via wavelet Lasso. This verifies the suitability of the derived threshold and the effectiveness of the SMS method.

For the outdoor data sets, our enhancement algorithm can improve the rank-one recognition rate from 46.8% to 61.1% for the 109× 100 m 80 p data set and from 29.8% to 36.9% for the 284× 300 m 80 p data set. As the system magnification increases, the improvement in recognition rates decreases. Different from the indoor session, where a performance similar to the 1× reference is achieved after image quality assessment and enhancement, the gap between the 1× reference and the high magnification data sets is still present for the outdoor session, especially for data sets with system magnifications beyond 100×.



Table 5
Comparison of rank-one recognition rates across probes processed by different enhancement algorithms

Indoor                                          10× 9 m 60 p    20× 16 m 60 p
Original                                        61.8            47.3
Eigen-face [19]                                 55.2            48.2
UM                                              54.5            50.9
Deconv                                          54.5            56.4
Wavelet + Deconv                                52.7            56.4
Wavelet + Lasso SMS (no contrast stretching)    67.2            59.7
Wavelet + UM                                    65.5            65.5
Wavelet + UM SMS                                63.6            65.5
Wavelet + Lasso                                 63.6            63.6
Wavelet + Lasso SMS                             69.1            63.6
1× reference                                    65.5

Outdoor                                         109× 100 m 80 p    284× 300 m 80 p
Original                                        46.8               29.8
UM                                              53.2               40.4
Deconv                                          50.4               34.0
Wavelet + UM SMS                                59.0               36.9
Wavelet + Lasso SMS                             61.1               36.9
1× reference                                    89.4

Results are obtained using FaceIt®.


In addition to magnification blur, the outdoor images also experience non-uniform blur due to air turbulence. In our current algorithm, a uniform PSF is estimated and used for deblurring in the wavelet space. To overcome the degradations from non-uniform blur, the PSF should be adaptively estimated according to different regions within one image. This will be addressed in our future work.

To verify the aforementioned observations obtained by using FaceIt® as a recognition engine, the same experiments are conducted using another publicly available engine, VeriLook® [38]. With similar conclusions obtained for all data sets, only the results of the 20× 16 m 60 p data set are compared in Table 6. As expected, our methods consistently generate the most significant improvements compared with the original data sets.

Table 6
Comparison of rank-one recognition rates using FaceIt® and VeriLook® for the 20× 16 m 60 p data set

Probe set               VeriLook®    FaceIt®
Original                32.7         47.3
Eigen-face [19]         43.6         48.2
UM                      47.3         50.9
Deconv                  49.1         56.4
Wavelet + Lasso SMS     54.5         63.6
1× reference            63.6         65.5

5.2. Super-resolution

We evaluated and compared the performance of various SR algorithms in terms of face recognition rates. The SR algorithm based on Vandewalle registration [23] and cubic spline interpolation [45] was selected. Following the convention in [45], an SR implementation with an SR factor of m and using n consecutive frames is denoted as SR(m,n). The SR factor is the ratio between the height/width of the output high resolution image and the height/width of the input low resolution image.
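As a rough illustration of such an SR(m, n) pipeline (not the authors' implementation), the sketch below registers frames with phase correlation, a frequency domain method in the spirit of [23] as provided by scikit-image, scatters their pixels onto an m-times finer grid, and fills the remaining holes by nearest-neighbor propagation instead of the cubic spline interpolation used in the paper:

    import numpy as np
    from scipy import ndimage
    from skimage.registration import phase_cross_correlation

    def shift_and_add_sr(frames, factor=2):
        """SR(factor, len(frames)): register low resolution frames to
        the first one and scatter their pixels onto a finer grid."""
        ref = frames[0]
        H, W = ref.shape
        acc = np.zeros((factor * H, factor * W))
        cnt = np.zeros_like(acc)
        for frame in frames:
            # Subpixel shift mapping this frame into the reference grid.
            shift, _, _ = phase_cross_correlation(ref, frame,
                                                  upsample_factor=10)
            ys = np.rint((np.arange(H)[:, None] + shift[0]) * factor).astype(int)
            xs = np.rint((np.arange(W)[None, :] + shift[1]) * factor).astype(int)
            yy, xx = np.broadcast_arrays(ys, xs)
            ok = (yy >= 0) & (yy < factor * H) & (xx >= 0) & (xx < factor * W)
            np.add.at(acc, (yy[ok], xx[ok]), frame[ok])
            np.add.at(cnt, (yy[ok], xx[ok]), 1)
        out = np.divide(acc, cnt, out=np.zeros_like(acc), where=cnt > 0)
        holes = cnt == 0
        if holes.any():
            # Fill unobserved grid points from the nearest observed one.
            idx = ndimage.distance_transform_edt(
                holes, return_distances=False, return_indices=True)
            out = out[tuple(idx)]
        return out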

In the following experiment, probe sets with different inter-ocular distances, 10× 16 m 35 p, 10× 9 m 60 p, and 20× 13 m 85 p, processed by SR(2,4) are compared against the same gallery set. Fig. 13 depicts the resulting CMC curves and Table 7 compares the improvements in the rank-one recognition rate. For probes with an inter-ocular distance less than 60 pixels (the minimum resolution required by FaceIt®), an obvious improvement is achieved by applying SR, indicated by an increase of 14.5% in the rank-one recognition rate. However, as the inter-ocular distance increases, this improvement decreases.

Fig. 13. Illustration of results using SR images. (a), (b), and (c): original images shown at 50% of the original image size for the 10× 16 m 35 p, 10× 9 m 60 p, and 20× 13 m 85 p data sets. (d), (e), and (f): SR images shown at 25% of the original image size for the 10× 16 m 35 p, 10× 9 m 60 p, and 20× 13 m 85 p data sets. (g) CMC comparisons across probes processed by SR(2,4) using image sets with different inter-ocular distances. The CMC curves are obtained using FaceIt®.

Table 7
Comparison of rank-one recognition rates across probes processed by SR(2,4)

Probe             10× 16 m 35 p    10× 9 m 60 p    20× 13 m 85 p
FaceIt®
  Original (%)    9.1              49.1            58.2
  SR(2,4) (%)     23.6             54.6            65.5
VeriLook®
  Original (%)    7.3              32.7            54.5
  SR(2,4) (%)     18.2             41.8            61.8


When the inter-ocular distance exceeds 85 pixels, the improvement drops to 7.3%. With further increased inter-ocular distance, the advantage of using SR diminishes. Although a required minimum inter-ocular distance is not specified for VeriLook®, the improvement in the rank-one recognition rate gained from SR also degrades as the inter-ocular distance increases.

We also apply enhancement algorithms to super resolved probes, aiming at further improved face recognition rates. Two probes are super resolved and then enhanced by UM and wavelet Lasso, respectively. Four reference probes are used: (1) the original unprocessed image set, (2) the optically zoomed image set with the same inter-ocular distance as the SR processed images, (3) the image set processed by wavelet Lasso only, and (4) the image set processed by SR only. Fig. 14 shows the resulting CMC curves, with the corresponding rank-one recognition rates listed in Table 8.

Fig. 14. (a) Original image from the 10× 16 m 35 p image set. Enhanced images: (b) wavelet Lasso, (c) SR(2,4), (d) SR(2,4) + UM, and (e) SR(2,4) + wavelet Lasso. (a) and (b) are shown at 50% of the original image size. (c)–(e) are shown at 25% of the original image size. (f) CMC comparisons across probes processed by different combinations of SR and enhancement algorithms. The CMC curves are obtained using FaceIt®.

Table 8
Comparison of rank-one recognition rates across probes processed by different combinations of SR and enhancement algorithms

Probe set (variants of 10× 16 m 35 p)
                 Original    Wavelet    SR      SR + UM    SR + wavelet    Optical zoom (20× 16 m 60 p)
FaceIt® (%)      9.1         14.6       23.6    23.6       36.4            47.3
VeriLook® (%)    7.3         9.1        18.2    20.0       27.3            34.5

The wavelet based processing has proved efficient and robust for enhancing long range and high magnification face images, as shown in Fig. 12. Compared with SR, it yields marginal improvement for face images with an inter-ocular distance less than the minimum requirement of 60 pixels. However, it can substantially elevate the recognition rate of super resolved face images and leads to an increase of 12.8% in the rank-one recognition rate. The combination of SR and wavelet Lasso achieves recognition rates comparable to the optically zoomed probe, which demonstrates the feasibility and effectiveness of using a purely software based approach to increase face resolution and recognition rates.

Based on our experimental results, we conclude that for image sets with an inter-ocular distance greater than 60 pixels, our wavelet based processing algorithm is sufficient and produces an increase of 16.3% in the rank-one recognition rate, with values similar to the 1× images. For image sets with smaller inter-ocular distances, SR processing is recommended. The combination of SR and wavelet based restoration achieves an increase of up to 27.3% in the rank-one recognition rate. Fig. 15 illustrates the best performances obtained for the 20× 16 m 60 p and 10× 16 m 35 p image sets.

Fig. 15. CMC comparisons across the best processing methods using the 20× 16 m 60 p and 10× 16 m 35 p image sets. Sample face images can be found in Figs. 12 and 14. The CMC curves are obtained using FaceIt®.

6. Conclusions

A unique face database with both still images and video sequences collected from long distances and with high system magnifications was established. This database features various types of degradations encountered in practical long range surveillance, with emphasis on magnification blur, which was identified as a major degradation source in face recognition for long range applications. Apart from existing face quality measures, a special metric evaluating face quality degradations caused by high magnification was applied, and its efficiency in distinguishing low and high magnification images and predicting face recognition rates was illustrated. A wavelet Lasso algorithm that can efficiently compensate for the degradations in face recognition rates caused by magnification blur was introduced and implemented. A relative improvement of up to 34.5% in the rank-one recognition rate was achieved via deblurring, as compared to the original image. For face images with an inter-ocular distance less than 60 pixels, SR is able to produce a relative improvement over the original image of up to 300% in the rank-one recognition rate and, when further enhanced via wavelet based processing, achieves performance comparable to images with equivalent optical zoom.

Acknowledgments

This work was supported by the DOE University Research Program in Robotics under Grant #DOE-DEFG02-86NE37968 and NSF-CITeR Grant #01-598B-UT. Special appreciation goes to Trey Bohon, Tim Grundman, Doug Kiesler, Kevin Lynn, Jason Nitzberg, and Gwang Son for their work in collecting the face video database as part of their Capstone Senior Design project.

References

[1] P.J. Phillips, H. Wechsler, J. Huang, P.J. Rauss, The FERET database and evaluation procedure for face-recognition algorithms, Image and Vision Computing 16 (1998) 295–306.

[2] P.J. Phillips, P.J. Flynn, T. Scruggs, K.W. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, W. Worek, Overview of the face recognition grand challenge, in: IEEE International Conference on Computer Vision and Pattern Recognition, San Diego, CA, 2005, pp. 947–954.

[3] A.M. Martinez, R. Benavente, The AR Face Database, CVC Technical Report #24, 1998.

[4] E. Bailly-Bailliere, S. Bengio, F. Bimbot, M. Hamouz, J. Kittler, J. Mariethoz, J. Matas, K. Messer, V. Popovici, F. Poree, B. Ruiz, J.-P. Thiran, The BANCA database and evaluation protocol, in: Audio- and Video-Based Biometric Person Authentication, Guildford, UK, 2003, pp. 625–638.

[5] R. Goh, L. Liu, X. Liu, T. Chen, The CMU face in action database, in: Int'l Workshop on Analysis and Modeling of Faces and Gestures, in conjunction with International Conference on Computer Vision, Beijing, China, 2005, pp. 254–262.

[6] T. Sim, S. Baker, M. Bsat, The CMU pose, illumination, and expression database, IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (2003) 1615–1618.

[7] Y. Yao, B. Abidi, N. Kalka, N. Schmid, M. Abidi, High magnification and long distance face recognition: database acquisition, evaluation, and enhancement, in: Biometrics Symposium, Baltimore, MD, 2006.

[8] J.D. Shutler, M.G. Grant, M.S. Nixon, J.N. Carter, On a large sequence-based human gait database, in: International Conference on Recent Advances in Soft Computing, Nottingham, UK, 2002, pp. 66–71.

[9] A.J. O'Toole, J. Harms, S.L. Snow, D.R. Hurst, M.R. Pappas, J.H. Ayyad, H. Abdi, A video database of moving faces and people, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2005) 812–816.

[10] P. Griffin, Understanding the face image format standards, in: ANSI/NIST Workshop, Gaithersburg, MD, 2005.

[11] N. Kalka, J. Zuo, N.A. Schmid, B. Cukic, Image quality assessment for iris biometric, in: SPIE Symposium on Defense and Security, Conference on Human Identification Technology III, Orlando, FL, 2006, Article No. 62020D.

[12] Q. Xiong, C. Jaynes, Mugshot database acquisition in video surveillance networks using incremental auto-clustering quality measures, in: IEEE Conference on Advanced Video and Signal Based Surveillance, Miami, FL, 2003, pp. 191–198.

[13] F. Weber, Some quality measures for face images and their relationship to recognition performance, in: NIST Biometric Quality Workshop, Gaithersburg, MD, 2006.

[14] A. Adler, T. Dembinsky, Human vs. automatic measurement of biometric sample quality, in: Canadian Conference on Electrical and Computer Engineering, Ottawa, Canada, 2006, pp. 2090–2093.

[15] G. Ramponi, A cubic unsharp masking technique for contrast enhancement, Signal Processing 67 (1998) 211–222.

[16] T.F. Chan, C.K. Wong, Total variation blind deconvolution, IEEE Transactions on Image Processing 7 (1998) 370–375.

[17] X. Fan, Q. Zhang, D. Liang, L. Zhao, Face image restoration based on statistical prior and image blur measure, in: International Conference on Multimedia and Expo, Baltimore, MD, 2003, pp. 297–300.

[18] I. Stainvas, N. Intrator, Blurred face recognition via a hybrid network architecture, in: International Conference on Pattern Recognition, Barcelona, Spain, 2000, pp. 805–808.

[19] Y. Liao, X. Lin, Blind image restoration with eigen-face subspace, IEEE Transactions on Image Processing 14 (2005) 1766–1772.

[20] V. Agarwal, A.V. Gribok, M. Abidi, Image restoration using L1 norm penalty function, Technical Report, University of Tennessee, Knoxville, 2005.

[21] Y. Yao, B. Abidi, M. Abidi, Quality assessment and restoration of face images in long range/high zoom video, in: Multi-Sensory Multi-Modal Face Biometrics for Personal Identification, Springer-Verlag, New York, 2007.

[22] R.Y. Tsai, T.S. Huang, Multiframe image restoration and registration, Advances in Computer Vision and Image Processing 1 (1984) 317–339.

[23] P. Vandewalle, S. Susstrunk, M. Vetterli, A frequency domain approach to registration of aliased images with applications to super-resolution, EURASIP Journal on Applied Signal Processing, 2006, Article No. 71459.

[24] R.S. Wagner, D.E. Waagen, M.L. Cassabaum, Image super-resolution for improved automatic target recognition, in: SPIE Defense & Security Symposium, Orlando, FL, 2004, pp. 188–196.

[25] B. Marcel, M. Briot, R. Muttieta, Calcul de translation et rotation par la transformation de Fourier, Traitement du Signal 14 (1997) 135–149.

[26] D. Keren, S. Peleg, R. Brada, Image sequence enhancement using sub-pixel displacement, in: IEEE Conference on Computer Vision and Pattern Recognition, Ann Arbor, MI, 1988, pp. 742–746.

[27] P. Ferreira, Interpolation and the discrete Papoulis-Gerchberg algorithm, IEEE Transactions on Signal Processing 42 (1994) 2596–2606.

[28] M. Irani, S. Peleg, Improving resolution by image registration, CVGIP: Graphical Models and Image Processing 53 (1991) 231–239.

[29] M.S. Alam, J.G. Bognar, R.C. Hardie, B.J. Yasuda, Infrared registration and high resolution reconstruction using multiple translationally shifted aliased video frames, IEEE Transactions on Instrumentation and Measurement 49 (2000) 915–923.

[30] H. Ur, D. Gross, Improved resolution from sub-pixel shifted pictures, CVGIP: Graphical Models and Image Processing 54 (1992) 186–191.

[31] S. Borman, R.L. Stevenson, Super-resolution from image sequences - a review, in: Midwest Symposium on Circuits and Systems, Notre Dame, IN, 1998, pp. 9–12.

[32] S.C. Park, M.K. Park, M.G. Kang, Super-resolution image reconstruction: a technical overview, IEEE Signal Processing Magazine 20 (2003) 21–36.

[33] S. Baker, T. Kanade, Hallucinating faces, in: International Conference on Automatic Face and Gesture Recognition, Grenoble, France, 2000, pp. 83–88.

[34] B.K. Gunturk, A.U. Batur, Y. Altunbasak, M.H. Hayes, R.M. Mersereau, Eigenface-domain super-resolution for face recognition, IEEE Transactions on Image Processing 12 (2003) 597–606.

[35] D. Lin, W. Liu, X. Tang, Layered local prediction network with dynamic learning for face super-resolution, in: IEEE International Conference on Image Processing, Genova, Italy, 2005, pp. 885–888.

[36] K. Jia, S. Gong, Face super-resolution using multiple occluded images of different resolutions, in: IEEE Conference on Advanced Video and Signal Based Surveillance, Como, Italy, 2005, pp. 614–619.

[37] P.J. Phillips, P. Grother, R.J. Micheals, D.M. Blackburn, E. Tabassi, M. Bone, Face Recognition Vendor Test 2002, Evaluation Report, http://www.frvt.org/FRVT2002/documents.htm.

[38] VeriLook® SDK, http://www.neurotechnologija.com/vl_sdk.html.

[39] Y. Yao, B. Abidi, M. Abidi, Digital imaging with extreme zoom: system design and image restoration, in: IEEE Conference on Computer Vision Systems, New York, 2006, pp. 52–58.

[40] E.P. Krotkov, Active Computer Vision by Cooperative Focus and Stereo, Springer-Verlag, New York, 1989.

[41] R. Lienhart, J. Maydt, An extended set of Haar-like features for rapid object detection, in: IEEE International Conference on Image Processing, Rochester, NY, 2002, pp. 900–903.

[42] FaceIt® SDK Developer's Guide, Software version 6.1, 2005.

[43] C.W. Groetsch, The Theory of Tikhonov Regularization of Fredholm Integral Equations of the First Kind, Pitman, Boston, 1984.

[44] A.N. Tikhonov, V.Y. Arsenin, Solution of Ill-posed Problems, John Wiley, New York, 1977.

[45] Y. Yao, B. Abidi, N.D. Kalka, N. Schmid, M. Abidi, Super-resolution for high magnification face images, in: SPIE Defense and Security Symposium, Biometric Technology for Human Identification IV, Orlando, FL, 2007, Article No. 65390G.