Human and Computer Evaluations of Face Sketches with Implications for Forensic Investigations


Yong Zhang, Member, IEEE, Christine McCullough, John R. Sullins, Member, IEEE, and Christine R. Ross

Abstract: Because sketches represent the original faces in a much more concise yet recognizable form, they play an important role in criminal investigations, human perception and biometrics. In this work, we compared the performance of humans and a PCA-based algorithm in recognizing face sketches. A total of 250 sketches of 50 subjects were involved. All sketches were drawn manually by five artists (each artist drew 50 sketches, one for each subject). Experiments were carried out by matching sketches in a probe set to photos in a gallery set. This study resulted in the following findings: (i) a large inter-artist variation in sketch recognition rate was observed; (ii) fusing sketches from different artists significantly improved performance; (iii) human performance appears to be correlated with that of the algorithm; (iv) the algorithm was superior with sketches of less distinctive features, while humans used tonality (or pigmentation) cues more efficiently.

I. INTRODUCTION

Face sketching is a technique that has been routinely used in criminal investigations [1]. The successful use of face sketches to capture fugitives has often been publicized in the media, especially in high-profile cases [2]. As a specialized forensic art, face sketching is traditionally done manually by police artists. As a result of rapid advances in computer graphics and face biometrics, sophisticated facial composite kits have been developed and adopted by law enforcement agencies. A national survey indicates that 80% of police departments in the U.S. have used composite software and 43% of them relied on forensic artists [3].

However, there are concerns about the accuracy of face sketches, especially those generated by composite software. Evaluation studies showed that computerized systems produced poor results and were inferior to well-trained sketch artists [4, 5]. In fact, law enforcement agencies still consider hand-drawn sketches more reliable. One drawback of computerized composite systems is that they follow a "piecemeal" approach, adding individual facial features in isolation. In contrast, artists tend to use a more "holistic" strategy when they draw sketches.

This work was supported in part by Youngstown State University Research Council Grant No. 08 - #8.

Y. Zhang and J. R. Sullins are with the Department of Computer Science and Information Systems, Youngstown State University, Youngstown, OH 44555.

C. McCullough is with the Department of Arts, Youngstown State University, Youngstown, OH 44555.

C. Ross is with the Criminal Intelligence Unit, Ohio Bureau of Criminal Identification and Investigation, Youngstown, OH 44503.

The quality of a sketch (whether produced by software or by an artist) depends on many factors, such as the artist's skill and experience, the exposure time to a face, the distinctiveness of the face, and the memory and emotional state of eyewitnesses. The impact of these factors and their complex interrelationships are not yet well understood quantitatively. In this paper, we examined the effectiveness of sketches by comparing the performance of humans and an algorithm. To simplify the task, we used sketches obtained under an "ideal" condition: artists drew sketches while looking at photos, without a time constraint. This type of sketch allows us to address some fundamental issues of interest to criminal investigators and to researchers in biometrics and psychology: (i) Does the sketch recognition rate vary greatly from one artist to another? If so, can the inter-artist variation be harnessed with a fusion method? (ii) The "ideal" sketches can establish a baseline for benchmarking sketches drawn under more forensically realistic conditions. (iii) In sketch-photo matching, do humans rely on certain cues more than algorithms do, and how can forensic artists and composite software developers benefit from the findings?

II. RELATED WORKS

There is a large body of literature on face recognition [6]. We briefly review three groups of papers that are most relevant to this study.

It has long been recognized that a better understanding of how humans perform face recognition, especially under challenging conditions, is a critical step towards designing more robust and efficient machine vision systems. An excellent review of key findings in human face perception can be found in [7], which offers insightful discussions of the human visual system's tolerance of face degradation and of the connection between high-frequency features and the recognition of line drawings. Bruce et al. [8] addressed the issue of incorporating psychologically plausible components into computational face models by examining the correlations between human performance and two algorithms. More recently, O'Toole et al. [9] compared seven algorithms with humans using a large data set. Their results showed that the best algorithms surpassed humans on both "easy" and "difficult" faces. They also conducted a human-algorithm fusion test and obtained nearly perfect classification [10]. In another study, Adler and Schuckers [11] reported that 29.2% of humans performed better than the best algorithm and 37.5% of humans performed worse. They used a different measuring method and did not prescreen subjects based on gender and ethnicity. Note that all of the aforementioned comparative experiments used untrained individuals as evaluators and used only photos as inputs.

978-1-4244-2730-7/08/$25.00 ©2008 IEEE

Sketch recognition research is strongly motivated by its forensic applications. Earlier work includes a study of matching police sketches to mug-shot photographs [12], in which sketches were transformed into pseudo-photographs through a series of standardizations and compared with photos in an eigenspace. Tang et al. [13] reported a more comprehensive investigation of face sketch recognition. They developed a photo-to-sketch transformation method that synthesizes sketches from the original photos, which increases the similarity between the sketches drawn by artists and the synthesized sketches. They also found that the algorithms performed competitively with humans on those sketches. More recently, a study on searching mug-shot databases with sketches was reported [14]. Sketch-photo matching was performed using extracted local features and global measurements. The sketches were produced with composite software, and no transformation was applied to the sketches or photos.

Fusion is an important technique for improving biometric performance [15]. Bowyer et al. [16] demonstrated that a multi-sample approach and a multi-modal approach can achieve the same level of performance. Large increases in face recognition accuracy were also reported for multi-frame fusion of videos [17, 18]. It is therefore reasonable to expect that fusing multiple sketches may also increase the chance of finding a correct sketch-photo match. Multi-sketch fusion can use sketches from the same artist or sketches from different artists. We chose the second option because it potentially offers more diverse information about a face.

III. EXPERIMENTS

A. Photos

Face photos of 100 subjects were randomly selected from a database of images and videos of 176 subjects. The original color photos were converted to grayscale with a resolution of 480 x 720 pixels. All faces are frontal views with a neutral expression. The 100 subjects include 73 males, 27 females, 16 African Americans, 4 Asians, and 80 Caucasians. There are two age groups: a younger group (18-25 years, 87 subjects) and an older group (40+ years, 13 subjects).

The 100 photos were divided evenly into two sets: Photo set A was used as training samples in the PCA tests, and Photo set B was used for sketching. Photo set B includes 37 males, 13 females, 8 African Americans, 2 Asians and 40 Caucasians, with 45 in the younger age group and 5 in the older age group.

B. Sketches

Five artists participated in the sketch drawing sessions. They are faculty members and students from the Department of Arts at Youngstown State University. All artists have received formal training in sketching, painting and human facial anatomy. After consultation with experienced professionals from the Ohio Bureau of Criminal Identification and Investigation, a general sketching guideline was established. Within this guideline, however, each artist was free to follow his or her own drawing style, which allows us to study the impact of inter-artist variation on the sketch recognition rate. Several photo and sketch samples are shown in Fig. 1. Information on the participating subjects is summarized in Table I.

[Fig. 1 panels: Photos; Sketches by artist 1; Sketches by artist 2; Sketches by artist 3; Sketches by artist 4; Sketches by artist 5.]

Fig. 1. Photos and the corresponding sketches drawn by five artists. Both photos and sketches have been normalized using two eye coordinates.

TABLE I
INFORMATION OF SUBJECTS IN PHOTOS AND SKETCHES

Photos:
- 50 subjects in set A; 50 subjects in set B.
- Photos were taken under normal indoor light.
- Faces have frontal views with a neutral expression.
- 73 males, 27 females, 16 African Americans, 4 Asians, 80 Caucasians.
- 87 in the younger age group, 13 in the older age group.

Sketches:
- 50 subjects of set B.
- Five artists participated.
- Each artist drew 50 sketches, one of each subject.
- Following a general guideline, artists drew in their own styles.
- 37 males, 13 females, 8 African Americans, 2 Asians and 40 Caucasians.
- 45 in the younger age group, 5 in the older age group.

C. Computer Evaluations

An eigenface method based on Principal Component Analysis (PCA) was used in the computer evaluations [19, 20]. Five evaluation tests were designed, one for each artist (see Table II). For example, in Test-1, Photo set B was used as the gallery and the corresponding sketches by artist 1 were used as the probe. The training samples were composed of Photo set A and the sketches by the other artists. This arrangement ensures that the training samples are independent of the testing data (gallery and probe). Including both photos and sketches in the training samples is necessary to avoid a skewed eigenspace and hence biased matching scores.

To evaluate the effectiveness of sketches under conditions closer to routine practice in criminal investigations, we did not apply to the sketches and photos any of the advanced enhancing transformations discussed in [13]. The only preprocessing of the sketches was a simple geometric normalization that cropped faces with an elliptical mask according to the two eye coordinates.
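The geometric normalization can be sketched as a similarity transform (rotation, scale, translation) that maps the two eye centers to fixed positions, followed by an elliptical mask. The output size, eye placement, nearest-neighbor sampling and the optional zero-mean, unit-variance `pixel_norm` step below are assumptions chosen for illustration; the exact settings used in the study are not given here.

```python
import math


def normalize_face(img, left_eye, right_eye, out_h=150, out_w=130,
                   eye_row=0.35, eye_dist=0.45, pixel_norm=False):
    """Eye-based geometric normalization plus an elliptical mask (hypothetical settings).

    img: grayscale image as a list of rows; eyes are (x, y) = (column, row).
    The eyes end up on a horizontal line at row eye_row * out_h, eye_dist * out_w
    pixels apart and centered horizontally. Pixels outside the ellipse inscribed
    in the output are set to 0. With pixel_norm, pixels inside the ellipse are
    rescaled to zero mean and unit variance.
    """
    (lx, ly), (rx, ry) = left_eye, right_eye
    dx, dy = rx - lx, ry - ly
    src_dist = math.hypot(dx, dy)
    scale = src_dist / (eye_dist * out_w)        # input pixels per output pixel
    cos_t, sin_t = dx / src_dist, dy / src_dist  # orientation of the eye line
    cx_in, cy_in = (lx + rx) / 2, (ly + ry) / 2  # eye midpoint in the input
    cx_out, cy_out = out_w / 2, eye_row * out_h  # eye midpoint in the output
    h_in, w_in = len(img), len(img[0])
    ex, ey = (out_w - 1) / 2, (out_h - 1) / 2    # ellipse center
    out, mask = [], []
    for v in range(out_h):
        row, mrow = [], []
        for u in range(out_w):
            du, dv = u - cx_out, v - cy_out
            # Inverse mapping: output pixel -> input location, nearest neighbor.
            x = cx_in + scale * (cos_t * du - sin_t * dv)
            y = cy_in + scale * (sin_t * du + cos_t * dv)
            xi = min(max(round(x), 0), w_in - 1)
            yi = min(max(round(y), 0), h_in - 1)
            inside = ((u - ex) / (out_w / 2)) ** 2 + ((v - ey) / (out_h / 2)) ** 2 <= 1.0
            row.append(float(img[yi][xi]) if inside else 0.0)
            mrow.append(inside)
        out.append(row)
        mask.append(mrow)
    if pixel_norm:
        vals = [p for r, m in zip(out, mask) for p, k in zip(r, m) if k]
        mu = sum(vals) / len(vals)
        sd = math.sqrt(sum((p - mu) ** 2 for p in vals) / len(vals)) or 1.0
        out = [[(p - mu) / sd if k else 0.0 for p, k in zip(r, m)]
               for r, m in zip(out, mask)]
    return out, mask
```

Inverse mapping (iterating over output pixels and looking up their source) avoids holes in the result; a real pipeline would normally use bilinear interpolation instead of nearest-neighbor sampling.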

TABLE II
TRAINING, GALLERY AND PROBE DATA IN PCA TESTS

- Test-1. Training: Photo set A and the sketches by artists 2, 3, 4 and 5. Gallery: Photo set B. Probe: sketches of Photo set B by artist 1.
- Test-2. Training: Photo set A and the sketches by artists 1, 3, 4 and 5. Gallery: Photo set B. Probe: sketches of Photo set B by artist 2.
- Test-3. Training: Photo set A and the sketches by artists 1, 2, 4 and 5. Gallery: Photo set B. Probe: sketches of Photo set B by artist 3.
- Test-4. Training: Photo set A and the sketches by artists 1, 2, 3 and 5. Gallery: Photo set B. Probe: sketches of Photo set B by artist 4.
- Test-5. Training: Photo set A and the sketches by artists 1, 2, 3 and 4. Gallery: Photo set B. Probe: sketches of Photo set B by artist 5.

D. Human Evaluations

Human evaluations were carried out in three steps: (i) a gallery sheet (50 photos) and a probe sheet (50 sketches by one artist) were printed on two separate sheets of paper; (ii) a group of volunteers was asked to match each sketch in the probe sheet to a photo in the gallery sheet; (iii) the matching results were then converted into a distance matrix similar to the one generated by the PCA algorithm. This evaluation process was repeated five times, each time with a new probe sheet from a different artist. In each probe sheet, the sketches were randomly positioned. Examples of the gallery sheet and the probe sheet by artist 2 are given in Fig. 2 and Fig. 3.

Fig. 2. The gallery sheet (of photo set B) used in human evaluations.

Fig. 3. The probe sheet (by artist 2) used in human evaluations. Note that the positions of the sketches were shuffled with respect to those in the gallery sheet.

Thirty-two volunteers participated in the experiments, primarily undergraduate students whose demographic distribution was similar to that of the gallery and probe sheets. Two requirements were imposed on selecting a volunteer: (i) he or she does not have advanced knowledge of biometrics; (ii) he or she does not know any of the subjects in the gallery and probe sheets. The second requirement is necessary because humans have an impressive ability to recognize familiar faces. To minimize the impact of fatigue on recognition accuracy, light refreshments were provided during each test and no time limit was imposed. Most volunteers were able to finish a test within one hour. A sketch could be matched to multiple photos and vice versa. If a particular match was replaced by a presumably better one, the change was also marked on the sheets. About 140 of the original matches were altered, indicating that volunteers second-guessed their initial decisions and were willing to correct them.

In recent studies on face recognition by humans [9, 11], volunteers were asked to rate the similarity of photo pairs displayed on a computer screen (one pair at a time). We used the gallery and probe sheets with no time limit for the following reasons: (1) volunteers can examine the entire gallery and probe sets before making a match, and have the opportunity to correct a match; this protocol is more compatible with the situation in which a victim or an investigator tries to link a sketch to a set of photos of suspects; (2) since volunteers go visually and mentally through a many-to-many matching process, their ratings and the derived distance matrix are comparable to those of PCA, which include all possible gallery-probe pairs.

Human evaluations can be quantified with two measures: an Absolute Recognition Ratio (ARR) and a Relative Matching Distance (RMD). ARR is simply the percentage of correct matches:

$$\mathrm{ARR}_i = \frac{S_i}{S_T}$$

where $S_T$ is the total number of sketches in a probe sheet, and $S_i$ denotes the number of sketches that are correctly recognized by the ith volunteer. RMD is formulated as an inverted match count:

$$\mathrm{RMD}_{ij} = \frac{1}{\sum_{k=1}^{N} A_{ij}(k) + \varepsilon}, \qquad i, j \in [1, M]$$

where $A_{ij}(k)$ denotes a match between the ith sketch and the jth photo made by the kth volunteer, $\varepsilon$ is a small number that prevents division by zero when the sum of $A_{ij}(k)$ is zero, $N$ is the number of volunteers, and $M$ is the number of sketches or photos used in the test. $A_{ij}(k)$ can take a value between 0 and 1, depending on the volunteer's confidence in a match. In this study, a binary rating scale was used: 1 for a match and 0 for a no-match. This is essentially a voting process: the more volunteers vote for the match between the ith sketch and the jth photo, the larger the sum of $A_{ij}(k)$ and the smaller $\mathrm{RMD}_{ij}$. As a result, RMD represents a fusion of multiple human classifiers. A distance matrix made of all $\mathrm{RMD}_{ij}$ (an $M \times M$ matrix) can then be compared with the PCA results.
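Both measures are straightforward to compute from the volunteers' answer sheets. The sketch below assumes, as a simplification, that each volunteer gives exactly one photo index per sketch (the study also allowed multiple matches) and that sketch i belongs to photo i. It takes RMD as the reciprocal of the vote count plus ε, with ε = 0.001 as an arbitrary choice; the function names are hypothetical.

```python
def arr_per_volunteer(answers):
    """ARR = S_i / S_T for each volunteer.

    answers[k][i] is the photo index that volunteer k matched to sketch i;
    sketch i is assumed to depict the subject of photo i.
    """
    s_t = len(answers[0])
    return [sum(1 for i, j in enumerate(row) if i == j) / s_t for row in answers]


def vote_counts(answers, m):
    """counts[i][j] = sum over volunteers k of A_ij(k), with binary A."""
    counts = [[0] * m for _ in range(m)]
    for row in answers:
        for i, j in enumerate(row):
            counts[i][j] += 1
    return counts


def rmd_matrix(counts, eps=1e-3):
    """RMD_ij = 1 / (sum_k A_ij(k) + eps): more votes give a smaller distance."""
    return [[1.0 / (c + eps) for c in row] for row in counts]
```

Because unvoted pairs all receive the same large distance 1/ε, the resulting matrix contains many ties, which any ranking step downstream has to handle explicitly.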

IV. RESULTS AND DISCUSSIONS

A. Performances of Human and PCA

We used the Cumulative Match Characteristic (CMC) curves and their rank-1 values in the performance analysis. Table III gives the rank-1 values of the five PCA tests, the rank-1 values of the human evaluations using RMD, and the statistics of ARR. These results are also plotted in Fig. 4. The CMC curves of the individual tests and the fusion tests are shown in Fig. 5.
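For reference, a CMC curve can be computed from any square distance matrix (PCA distances or the RMD matrix) whose true matches lie on the diagonal. This sketch breaks ties pessimistically, which matters for RMD because every unvoted pair shares the same large distance; that tie rule and the function name are our assumptions, not something stated in the study.

```python
def cmc_curve(dist):
    """CMC from a square distance matrix with true matches on the diagonal.

    dist[i][j] is the distance between probe i and gallery j. The rank of a
    true match counts every gallery entry at least as close as it (ties are
    resolved pessimistically). cmc[r - 1] is the fraction of probes whose
    true match lies within the top r; cmc[0] is the rank-1 rate.
    """
    m = len(dist)
    ranks = [sum(1 for d in row if d <= row[i]) for i, row in enumerate(dist)]
    return [sum(1 for r in ranks if r <= k) / m for k in range(1, len(dist[0]) + 1)]
```

An optimistic rule (counting only strictly closer entries) would inflate the human rank-1 values whenever a volunteer group left the true match unvoted.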

The recognition rates of PCA and humans show large variations with respect to the artists. Sketches by artist 1 had the lowest rates, while sketches by artists 2 and 4 gave much better results. We emphasize that the recognition rate is more likely related to an artist's drawing style than to his or her talent. In fact, the sketches by artist 1 exhibited somewhat caricature-like features that exaggerated certain facial parts (nose and eyes, see Fig. 1), which makes their recognition more challenging. Interestingly, humans seem better equipped than PCA to handle this type of geometric distortion (a rank-1 rate of 0.09 for PCA versus 0.44 for human RMD, and a mean of 0.30 for human ARR). As discussed in [7], "some departures from veridicality are actually beneficial for human face recognition". Our results appear to support this argument.

O'Toole et al. [9] found that algorithms outperformed humans on "easy" faces, while humans were comparatively more competitive in recognizing "difficult" faces. The "difficult" faces were selected based on their lower-than-average similarity scores in a prescreening test and were associated with illumination changes. If we regard the sketches by artist 1 as "difficult" ones (though the difficulty is caused by shape distortion rather than illumination change), our results are consistent with their findings. It would certainly be desirable to understand how humans tolerate such degradations and to incorporate that mechanism into an algorithm.

TABLE III
HUMAN AND PCA PERFORMANCES

            Computer (PCA)   Human (RMD)   Human (ARR)
            Rank-1           Rank-1        Mean   Min    Max    Dev
Artist 1    0.09             0.44          0.30   0.06   0.42   0.10
Artist 2    0.58             0.81          0.48   0.16   0.72   0.13
Artist 3    0.42             0.51          0.32   0.13   0.49   0.11
Artist 4    0.49             0.93          0.61   0.30   0.82   0.16
Artist 5    0.37             0.84          0.47   0.18   0.76   0.19

[Fig. 4 plot: values for Artist 1 to Artist 5; series: Human RMD (rank-1), Human ARR (mean), PCA (rank-1).]

Fig. 4. Performances of humans and PCA with respect to the sketches of five artists. Rank-1 data of PCA is correlated with both RMD and ARR.

[Fig. 5 plots: CMC curves for the sketches of each artist and for the fusion test. (a) CMC curves of PCA evaluations and fusion test. (b) CMC curves of human evaluations and fusion test.]

Fig. 5. PCA tests and human evaluations that used sketches of an individual artist in comparison to the fusion tests. Note the performance improvements in both PCA and human fusion tests.

As shown in Fig. 4, PCA results correlate well with the human RMD and ARR ratings, although the relationships are not strongly linear. This suggests that there may be common elements in the recognition strategies employed by humans and PCA. The biological plausibility of PCA was also demonstrated in a comparison study [8], which found that PCA may be a good model of human picture memory, especially for unfamiliar faces.
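Fig. 4 shows the correlation visually. One way to quantify it, sketched below under the assumption that the Table III columns are the quantities of interest, is to compute Pearson's r (linear association) and Spearman's rho (rank association). With only five artists, each coefficient rests on five points and should be read as suggestive at best.

```python
import math

# Values transcribed from Table III, artists 1 to 5.
PCA_RANK1 = [0.09, 0.58, 0.42, 0.49, 0.37]
HUMAN_RMD_RANK1 = [0.44, 0.81, 0.51, 0.93, 0.84]
HUMAN_ARR_MEAN = [0.30, 0.48, 0.32, 0.61, 0.47]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(x):
    """Rank positions (0 = smallest); assumes no ties, which holds for these columns."""
    order = sorted(range(len(x)), key=x.__getitem__)
    r = [0] * len(x)
    for pos, idx in enumerate(order):
        r[idx] = pos
    return r


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks (no ties)."""
    return pearson(ranks(x), ranks(y))
```

Comparing the two coefficients is one way to make "correlated, but not strongly linear" concrete: a rank-based measure is insensitive to the non-linear shape of the relationship.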

B. Multi-Sketch Fusion

The common practice in criminal investigations is to call in an artist to draw one or two sketches of a suspect. As discussed above, the quality of sketches (measured by the recognition rate) varies from one artist to another. Would the use of multiple sketches from different artists increase the chance of solving a crime? With this question in mind, we conducted two score-level fusion tests, one using the human RMD data and one using the Mahalanobis distances of the PCA tests. First, the distance matrix of each individual test was normalized using its maximum and minimum entries. Then, following the sum rule, all corresponding entries of the five normalized matrices were added to create a new fused matrix. The fusion results are shown in Fig. 5. Both human and PCA fusion showed large performance improvements compared with any individual test. Perfect recognition was achieved at rank-3 in both cases. With more participating artists, perfect recognition at rank-1 might be attainable. Of course, the cost of hiring multiple police artists could be an issue, but from the perspective of capturing a criminal, multi-sketch fusion is clearly preferable.
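The fusion procedure (min-max normalization of each distance matrix followed by the sum rule) can be sketched as follows. The function names are hypothetical, and the toy matrices in the note below are invented for illustration only.

```python
def min_max_normalize(dist):
    """Rescale a distance matrix to [0, 1] using its minimum and maximum entries."""
    flat = [d for row in dist for d in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # guard against a constant matrix
    return [[(d - lo) / span for d in row] for row in dist]


def sum_rule_fusion(matrices):
    """Add the corresponding entries of the min-max normalized matrices."""
    normed = [min_max_normalize(d) for d in matrices]
    m, n = len(normed[0]), len(normed[0][0])
    return [[sum(d[i][j] for d in normed) for j in range(n)] for i in range(m)]


def best_matches(dist):
    """Index of the closest gallery photo for each probe (row)."""
    return [min(range(len(row)), key=row.__getitem__) for row in dist]
```

With `d1 = [[0.3, 0.2], [0.9, 0.1]]` and `d2 = [[0.0, 0.9], [0.3, 0.4]]`, each matrix alone picks a wrong photo for one probe (`best_matches` gives `[1, 1]` and `[0, 0]`), while `best_matches(sum_rule_fusion([d1, d2]))` gives `[0, 1]`, because each matrix is confident exactly where the other one errs. Min-max normalization matters here because PCA distances and RMD values live on unrelated scales.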

The multi-sketch method is similar to the multi-sample method used in video frame fusion [17, 18]. But multi-sketch fusion is unique in that the information diversity of a face stems from the variation in artists' perceptual abilities rather than from the photo itself. The multi-sample method exploits photometric and geometric variations among different photos. In multi-sketch fusion, artists use the same photo (or verbal descriptions) to draw sketches, but one artist may be more observant of certain facial features than the others. Combining the diverse yet complementary information in the sketches by different artists results in a higher recognition rate.

C. The Most Recognized Sketches

CMC or ROC curves present an overall measurement of a classifier's performance, but they do not reveal its effectiveness with respect to specific cues. To gain more insight into what makes a sketch more recognizable to humans or PCA, we identified three groups of sketches: (a) sketches most recognized by both PCA and humans; (b) sketches most recognized by PCA but not by humans; (c) sketches most recognized by humans but not by PCA. The three groups of sketches drawn by artist 1 and artist 2 are shown in Fig. 6 and Fig. 7.
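The study does not state the exact criterion used to form the three groups. As a hypothetical illustration only, one could rank each sketch's true photo under both the PCA and the human distance matrices and apply two cut-offs, `top` and `miss`, whose default values below are arbitrary.

```python
def true_match_ranks(dist):
    """Rank of the true (diagonal) match for each probe, ties counted pessimistically."""
    return [sum(1 for d in row if d <= row[i]) for i, row in enumerate(dist)]


def group_sketches(pca_dist, human_dist, top=1, miss=5):
    """Split sketch indices into groups (a), (b) and (c) under a hypothetical rule.

    A sketch counts as 'most recognized' by a classifier if its true photo is
    ranked within `top`, and as 'not recognized' if it is ranked worse than `miss`.
    """
    rp, rh = true_match_ranks(pca_dist), true_match_ranks(human_dist)
    idx = range(len(rp))
    both = [i for i in idx if rp[i] <= top and rh[i] <= top]           # group (a)
    pca_only = [i for i in idx if rp[i] <= top and rh[i] > miss]       # group (b)
    human_only = [i for i in idx if rh[i] <= top and rp[i] > miss]     # group (c)
    return both, pca_only, human_only
```

Leaving a gap between `top` and `miss` keeps borderline sketches out of groups (b) and (c), so those groups contain only clear disagreements between the two classifiers.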

(a) The most recognized by both PCA and humans.

(b) The most recognized by PCA but not by humans.

(c) The most recognized by humans but not by PCA.

Fig. 6. Sketches that characterize the recognition preferences of PCA and humans. All sketches were drawn by artist 2.

(a) The most recognized by both PCA and humans.

(b) The most recognized by PCA but not by humans.

(c) The most recognized by humans but not by PCA.

Fig. 7. Sketches that characterize the recognition preferences of PCA and humans. All sketches were drawn by artist 1.

Group (a) contains faces with hair or mustaches, a slightly non-frontal pose, or Asian ethnicity (two females). It is evident that both human vision and PCA are sensitive to patterns that deviate from the "average" face of a sample population. Forensic artists have certainly exploited this phenomenon: they often pay special attention to facial marks, scars, moles, tattoos, or any other unusual facial features.

Group (b) consists of faces with less distinctive photometric or shape cues. In other words, they are more similar to each other than are the faces in Group (a). They constitute a large portion of the subjects in the tests. Many human volunteers had difficulty with this group (except one female volunteer who consistently achieved a high recognition rate). Tang et al. [13] and O'Toole et al. [9] also observed that algorithms exceeded human performance on such similar faces. One possible explanation is that PCA can pick up the subtle differences between similar faces and use the image cues more efficiently than humans. This advantage of PCA is probably more pronounced when dealing with a large number of subjects. However, it may be unfair to compare humans and algorithms in this context, because human vision may switch to a drastically different approach when facing a large crowd of people. For example, it has been suggested that human vision might employ a rejection mechanism rather than the systematic ranking-based metric used in most algorithms [21]. After all, the human visual system might be more "opportunistic" than we thought.

Sinha et al. [7] and Russell et al. [22] showed that pigmentation cues are as important as shape cues in face recognition. (Because we worked exclusively with grayscale images, we use the term "tonality" instead of "pigmentation"; the latter is more accurate for characterizing the reflectance properties of skin in color images.) As can be seen in the faces of Group (c), human vision tends to use tonality cues more efficiently than PCA. We also toggled the pixel normalization option during sketch preprocessing and found no noticeable change in the PCA results. The lack of tonality cues in sketches may partially explain the low recognition rate of PCA for this group, because a sketch-photo pair in Group (c) was likely to be projected far apart in the eigenspace owing to their intensity disparity. The small number of subjects with darker skin tones in the training set may also have affected the performance of PCA. In contrast, humans seem able to overcome the tonality loss in sketches and still find a correct match with relative ease.

In Fig. 6 and Fig. 7, it can be seen that the sketches by artist 2 have more shading and finer touches between major lines than the sketches by artist 1. These drawing details may contribute to the higher recognition rate of the sketches by artist 2. The importance of "mass" in line drawings for face recognition was also reported in [23], where it was argued that the skillful inclusion of such photometric cues could make a sketch more recognizable. Although no quantitative data are available on how much detail can be added without jeopardizing the fidelity of a sketch, we still recommend that forensic artists add more detail when the situation allows.

V. SUMMARY

As biometric technologies steadily improve in performance, their involvement in traditional criminal investigations becomes increasingly important. This is especially true in forensic art, where both skilled sketch artists and composite software are routinely used. In this paper, we presented a study of human and computer recognition of face sketches, with particular interest in how forensic artists and law enforcement professionals can benefit from the research. We summarize our findings as follows:

1) There is a large inter-artist variation in sketch recognition rate, which is likely related to drawing style rather than talent.

2) Since multi-sketch fusion can significantly improve the recognition rate, using multiple artists in a criminal investigation is recommended.

3) Beyond the accuracy of the major sketch lines, pictorial details such as shading and skin texture are also important.

4) Humans performed better with the more caricature-like (considered more difficult) sketches.

5) Human and PCA performances seem correlated, though experiments involving more artists are needed to confirm this.

6) PCA does a better job of recognizing sketches of less distinctive features, while humans use tonality cues more efficiently.

Finally, it is worth mentioning that sketch-photo matching is more challenging than photo-photo matching, because a sketch is not a simple duplicate of a face but rather the face as perceived and reconstructed by an artist. Therefore, we may have much to gain by examining how humans and computers recognize sketches and caricatures.

REFERENCES

[1] K. T. Taylor, "Forensic Art and Illustration", CRC Press, 2000.

[2] L. Gibson and D. F. Mills, "Faces of Evil: Kidnappers, Murderers, Rapists and the Forensic Artist Who Puts Them Behind Bars", New Horizon Press, 2006.

[3] D. McQuiston-Surrett, L. D. Topp, and R. S. Malpass, "Use of facial composite systems in US law enforcement agencies", Psychology, Crime & Law, 12(5), pp. 505-517, 2006.

[4] C. D. Frowd, D. Carson, H. Ness, J. Richardson, L. Morrison, S. McLanaghan, and P. J. B. Hancock, "A forensically valid comparison of facial composite systems", Psychology, Crime & Law, 11(1), pp. 33-52, 2005.

[5] C. D. Frowd, D. McQuiston-Surrett, S. Anandaciva, C. E. Ireland, and P. J. B. Hancock, "An evaluation of US systems for facial composite production", Ergonomics, 50, pp. 562-585, 2007.

[6] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld, "Face recognition: A literature survey", ACM Computing Surveys, 35(4), pp. 399-458, 2003.

[7] P. Sinha, B. J. Balas, Y. Ostrovsky, and R. Russell, "Face recognition by humans: 19 results all computer vision researchers should know about", Proceedings of the IEEE, 94(11), pp. 1948-1962, 2006.

[8] V. Bruce, P. J. B. Hancock, and A. M. Burton, "Comparisons between human and computer recognition of faces", Third IEEE International Conference on Automatic Face and Gesture Recognition, Washington, DC, pp. 408-413, 1998.

[9] A. J. O'Toole, P. J. Phillips, F. Jiang, J. H. Ayyad, N. Penard, and H. Abdi, "Face Recognition Algorithms Surpass Humans Matching Faces Over Changes in Illumination", IEEE Trans. Pattern Anal. Mach. Intell., 29(9), pp. 1642-1646, 2007.

[10] A. J. O'Toole, H. Abdi, F. Jiang, and P. J. Phillips, "Fusing Face-Verification Algorithms and Humans", IEEE Transactions on Systems, Man, and Cybernetics, Part B, 37(5), pp. 1149-1155, 2007.

[11] A. Adler and M. E. Schuckers, "Comparing Human and Automatic Face Recognition Performance", IEEE Trans. on Systems, Man, and Cybernetics, Part B, 37(5), pp. 1248-1255, 2007.

[12] R. G. Uhl and N. V. Lobo, "A framework for recognizing a facial image from a police sketch", Proceedings of the Conference on Computer Vision and Pattern Recognition, pp. 586-593, San Francisco, 1996.

[13] X. Tang and X. Wang, "Face sketch recognition", IEEE Transactions on Circuits and Systems for Video Technology, 14(1), pp. 50-57, 2004.

[14] P. C. Yuen and C. H. Man, "Human Face Image Searching System Using Sketches", IEEE Transactions on Systems, Man, and Cybernetics, Part A, 37(4), pp. 493-504, 2007.

[15] A. A. Ross, K. Nandakumar, and A. K. Jain, Handbook of Multibiometrics, Springer, 2006.

[16] K. W. Bowyer, K. Chang, P. J. Flynn, and X. Chen, "Face recognition using 2-D, 3-D and Infrared: Is multimodal better than multisample?", Proceedings of the IEEE, 94(11), pp. 2000-2012, 2006.

[17] S. J. Canavan, M. P. Kozak, Y. Zhang, S. R. Sullins, M. A. Shreve, and D. B. Goldgof, "Face Recognition by Multi-Frame Fusion of Rotating Heads in Videos", First IEEE International Conference on Biometrics: Theory, Applications, and Systems, Washington, DC, Sept. 27-29, 2007.

[18] D. Thomas, K. W. Bowyer, and P. J. Flynn, "Multi-frame approaches to improve face recognition", IEEE Workshop on Motion and Video Computing, Austin, TX, 2007.

[19] M. Turk and A. Pentland, "Eigenfaces for recognition", Journal of Cognitive Neuroscience, 3(1), pp. 71-86, 1991.

[20] www.cs.colostate.edu/evalfacerec/

[21] B. Kamgar-Parsi, S. Krawczyk, E. Lawson, and R. Stanchak, "Toward A Human-Like Similarity Measure for Face Recognition", Biometric Consortium Conference: Biometrics Symposium, Baltimore, MD, 2006.

[22] R. Russell, P. Sinha, I. Biederman, and M. Nederhouser, "Is pigmentation important for face recognition? Evidence from contrast negation", Perception, 35, pp. 749-759, 2006.

[23] V. Bruce, E. Hanna, N. Dench, P. Healy, and A. M. Burton, "The importance of 'mass' in line drawings of faces", Applied Cognitive Psychology, 6, pp. 619-628, 1992.