
IEEE-32331

International Conference on Science, Engineering and Management Research (ICSEMR 2014) 978-1-4799-7613-3/14/$31.00 ©2014 IEEE

Face Detection for Video Summary using Enhancement-based Fusion Strategy under Varying Illumination Conditions

Richa Mishra and Dr. Ravi Subban
Department of Computer Science
School of Engineering and Technology, Pondicherry University, Puducherry, India
[email protected], [email protected]

Abstract—Biometric-based techniques have emerged as a promising approach for many real-time applications, including security systems, video surveillance, human-computer interaction and more. Among all biometric methods, face recognition offers more benefits compared to the others. Detecting human faces and localizing them in images or videos is the first step of tracking and recognition. However, the performance of face detection is limited by certain factors, namely lighting conditions, pose variation, occlusion, low-resolution images and complex backgrounds. To overcome these problems, this paper examines a fusion strategy in an enhancement-based skin-color segmentation approach that can improve the performance of a face detection algorithm. The method is robust against complex backgrounds, ethnicity and lighting variations. It consists of three steps. The first step applies spatial transform techniques in parallel to enhance the contrast of the image, converts the color space of the enhanced images to YCbCr, applies a skin segmentation technique and yields binary segmented images. The second step ascertains the weight of accuracy (WoA) of each segmented image and feeds it into the fusion strategy to get the final skin-detected region. Finally, the last step localizes the human face. The methodology is not constrained to frontal face detection; it is invariant to diverse head poses, illumination conditions and face sizes. The experimental results demonstrate an improvement in accuracy and precision along with a reduction in FPR as compared to other enhancement classifiers.

Keywords—enhancement techniques; face detection; fusion strategy; illumination; skin detection

I. INTRODUCTION

Computer vision is the artificial implementation of human vision. Human eyes are capable of visualizing a wide range of frequencies of the visual spectrum. The objects in an image captured under unpredictable or unconstrained environments can be easily recognized by human eyes, but this remains one of the major challenges in computer vision. Face recognition, an application of computer vision, is the central concern of this paper due to its popularity among several applications, including sophisticated security systems, access control, content-based video search, and many more. It concerns extracting information from an image or video frames to perform the task of finding and identifying human faces. It is a system in which an arbitrary image is matched against a database to identify the people localized in the input image. Under a controlled environment, face recognition systems have a high recognition rate for identifying a person's details. In an uncontrolled environment, however, the systems face many challenges caused by factors such as illumination changes, varying gestures, pose and occlusion [1] that degrade their performance. These challenges are still unresolved, and intense research is ongoing to resolve them.

The robustness and reliability of a face recognition system depend to a great extent upon the accuracy of the face detection system [2], one of the biometric methods used in security systems [3]. Face detection is the preliminary step of the face recognition process. However, detecting and localizing human faces in captured images or recorded videos under unpredictable environments is still a most challenging task. The intent of any face detection approach is to spot face regions within an image. The success of real-time face detection techniques depends on the reliability of facial feature selection under an unconstrained environment. The facial features comprise the eyes, mouth, nostrils, skin color and texture. The identification of these attributes under a controlled environment is a simple task, but it becomes complex and difficult under varying lighting conditions, facial expressions, head poses and occlusions [2].

Several approaches have been proposed for resolving the previously mentioned challenges [4, 5, 6]. One of the widely used approaches to perform face detection is skin detection [1]. It is one of the most beneficial and simplest ways to detect human faces in color images. Despite its benefits, skin detection also faces major challenges due to factors like heavy shadows, overexposure, complex backgrounds, skin ethnicity and lighting variations. These factors change the appearance of human skin color and cause the color to be represented differently in the captured image [7]. In order to improve the accuracy of skin detection, several image processing techniques have been developed. For making machine vision perform like human eyes, the concept of the spatial transform technique has been taken into account. It is a foremost step in applications like skin detection, face detection, tracking and recognition. Its basic purpose is to improve the visual quality of digital images that exhibit bright lights and dark shadows. It also helps in reducing false positives and speeding up the detection process. Thus, the improvement of face detection techniques is based on the accuracy of skin-color identification, which in turn depends on the enhancement of an image. Spatial transformations can be categorized into intensity- and spatially-dependent routines like contrast stretching, gamma correction, histogram equalization, smoothing filters and unsharp masking. However, the performance of these conventional methods is limited with respect to global processing schemes in skin or face detection applications [8]. Therefore, an improved algorithm is proposed to overcome the limitations of these methods and to enhance the accuracy of skin and face detection.

Veltech Multitech Dr.Rangarajan Dr.Sakunthala Engineering College, Avadi, Chennai (Sponsors)

In this paper, an efficient and robust face detection algorithm based on skin-color detection is presented, using the fusion strategy of an enhancement-based approach together with morphological processing to overcome the above-mentioned limitations. It starts with the enhancement of an input image using five different enhancement techniques, applied in parallel to obtain five enhanced images. These enhanced images are converted into the YCbCr color space, followed by the classification of skin and non-skin pixels using a piecewise linear decision boundary algorithm. All of the binary segmented images are then used as input to the fusion phase to yield the final skin-detected region. A morphological operator is also applied to remove noise. Finally, the segmented components are labelled, their areas and centroids are calculated, and bounding boxes are drawn around the detected human faces. This paper extends the work of [9] to different datasets. The rest of the paper is organized as follows: Section 2 discusses the related work. Section 3 explains the steps of the proposed algorithm. Section 4 narrates the experimental results, and the conclusion is given in Section 5.

II. RELATED WORK

Detecting the human face is the preliminary step of face tracking and recognition systems. In the last two decades, a variety of face detection algorithms [4, 5, 6] have been proposed with the aim of improving the detection rate in images or videos and reducing the error rate. Under a controlled environment, the performance of face detection systems is very high, with a low error rate. But in real-time settings, the situation is reversed due to lighting conditions, complex backgrounds, pose variations and occlusion [10]. To address these challenges, many researchers [10, 11] have used a skin detection algorithm as their first step to detect human faces. It provides invariance to geometric transformations, limits the search space and gives high robustness. Ban et al. [10] proposed a boosting-based face detection method utilizing color information to emphasize skin color while avoiding parametric fitting and morphological operations. Lin [2] developed an effective and robust face detection scheme that could significantly decrease the execution time of the algorithm for images with complex backgrounds. They used a color- and triangle-based segmentation approach to localize the human face and verified it with a multilayer feed-forward neural network. They detected multiple faces in a complex background and varying illumination, and also handled changes in pose and facial expression. Using the AR face database, they achieved a 98.2% success rate with a 2% false rejection rate.

The above literature clearly shows that face detection systems directly or indirectly depend on morphological or illumination compensation algorithms along with skin-color detection techniques. But the process is affected by varying lighting conditions, complex backgrounds, physical characteristics and the limited availability of training data. The reason is that overexposure or variations in lighting conditions change the appearance of skin color, which degrades the performance of the face recognition task. Work has been done to resolve these factors by normalizing color, transforming color spaces, using multiple color channel ratios and eliminating luminance components. In addition to these challenges, cameras may also produce images with poor visibility, which makes the recognition of objects either difficult or impossible [3], since the quality of images is heavily affected by the movement of objects, low-resolution cameras, lighting conditions, etc. This may introduce noise into the image and degrade the performance of the detection and recognition process. It is therefore crucial to use enhancement techniques as a pre-processing step to improve the quality of images used for biometric recognition. Han et al. [1] studied twelve representative illumination pre-processing methods over the YaleBExt, CMU-PIE, CAS-PEAL and FRGC V2.0 datasets using six face matching algorithms. They categorized illumination pre-processing approaches by their principles into three classes: gray-level transformation, gradient and edge extraction, and face reflection field estimation.

Images captured by display devices in real-world scenes are usually affected by poor visibility due to overexposure to sunlight or shadows and low contrast, which makes the image difficult to interpret even with the naked eye. Tao et al. [8] addressed the problem by proposing an image enhancement method with dynamic range compression, a neighbourhood-dependent intensity transformation, followed by local contrast enhancement to enhance the luminance in dark shadows. They compared their algorithm to histogram equalization and multiscale Retinex with color restoration for improving face detection in complex lighting environments. They experimented on the FRGC database, which contains upright frontal human faces captured under normal visual quality, dim brightness, dark faces with bright backgrounds, and very low brightness. Similar work was done by Unaldi et al. [12] using a different approach: a wavelet-based image enhancement algorithm combining dynamic range compression and a low-contrast enhancement algorithm. Lim and Ibrahim [13] proposed two new enhancement techniques, CHEGIC and a block-based contrast enhancement method. They compared their results with three conventional enhancement techniques, namely gamma intensity correction, histogram equalization and homomorphic unsharp masking. Yun et al. [14] used illumination compensation and morphological processing for face detection, and proposed an empirical face ratio derived from the golden ratio.


Fig. 1. Flow diagram of the proposed system

They used the centroid and area to evaluate the starting and ending points of the face. Thus, in order to improve the performance of real-time human face detection and recognition systems, improvements in image enhancement and skin detection techniques should be addressed.

III. PROPOSED SYSTEM

The process of face detection initiates with the localization of human faces, followed by their extraction and verification. The detection steps proceed efficiently with the selection of attributes. The attributes selected in this paper are skin color, the area of the facial region and its centroid. Figure 1 shows the flow diagram of the proposed efficient, robust and reliable face detection algorithm based on skin-color detection, using the fusion strategy of an enhancement-based approach and morphological operators to remove noise. The following subsections give a detailed description of the proposed approach.

A. Skin Detection Technique

The purpose of the skin detection process is to extract skin regions from an image, which helps in localizing human faces within the image by limiting the search space. It starts with the selection of a color space, followed by a skin detector model that labels the pixels as skin or non-skin. Commonly, it is based on the theory that a color model can effectively help in modelling skin color, which then guides the segmentation of skin regions [15]. But the overlapping of skin and non-skin pixels limits its accuracy. It may also result in multiple errors and unsuccessful detection if the skin regions are detected with errors at the initial stage of the process. Thus, the initial step of the process must be performed with utmost accuracy so that the reliability and efficiency of the system are maintained. It has been observed in the literature that the accuracy of the technique can be enhanced by adding additional features that help in labelling the pixels more accurately. The additional feature considered in this paper is a set of image enhancement techniques used as a pre-processing step, discussed in the succeeding subsections.

1) Pre-processing step: This step involves the application of spatial transformation techniques to improve the visibility of the image to the machine. It helps in contrast manipulation and sharpening of the image, since the appearance of human skin under different lighting conditions increases the complexity of the skin detection process and results in high false detection rates [7]. Thus, in this paper, the pre-processing phase consists of two disjoint parts. In the first part, the ground truth of the corresponding sample images is prepared. This operation has been performed manually on the Caltech and Indian face datasets. In the second part, the color images are converted into different enhanced images using the respective enhancement transformation functions. The techniques considered for improving the details of the images are contrast stretching (CS), gamma correction (GC), histogram equalization (HE), smoothing filter (SF) and unsharp masking (UM). The basic idea behind CS (also called normalization) is to increase the dynamic range of the grey levels of an image. It is a linear scaling function applied to the image pixel values. The pixel values of the enhanced image can be obtained from the corresponding original image using the following basic equation [3]:

E(x, y) = (I(x, y) - a) · (d - c) / (b - a) + c        (1)

Here, the range of pixel values is stretched from [a, b] to [c, d]. Gamma correction is a non-linear operation and is useful for general-purpose contrast manipulation. It is defined by the power-law expression given in equation (2) [13]:

s = c · r^γ        (2)

Here, c and γ are positive constants. By changing the value of gamma, the appearance of the image changes from darker to brighter or vice versa, as it changes not only the intensity values of the pixels but also the ratios of red to green to blue in a color image. HE attempts to improve the intensity variation by redistributing the histogram in a roughly uniform fashion throughout the whole range [0, 255]. The increase in contrast resulting from HE is often enough to render visible intensity differences that were previously indistinguishable. SF attempts to reduce sharp transitions in the intensities of an image, as random noise consists of sharp transitions in intensity levels. It also helps in removing irrelevant details from the image. In this paper, a 3 × 3 SF is used to remove the irrelevant details of the sample images. UM attempts to sharpen the edges of the image.
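As a concrete illustration, the five spatial transforms above can be sketched with NumPy roughly as follows; the function names and default parameters are illustrative choices, not those used in the paper:

```python
import numpy as np

def contrast_stretch(img, c=0.0, d=255.0):
    """Linearly stretch pixel values from [min, max] to [c, d] (Eq. 1)."""
    a, b = float(img.min()), float(img.max())
    if b == a:
        return np.full(img.shape, c, dtype=np.float64)
    return (img - a) * (d - c) / (b - a) + c

def gamma_correct(img, gamma=0.5, c=1.0):
    """Power-law transform s = c * r**gamma on a [0, 255] image (Eq. 2)."""
    r = img / 255.0
    return 255.0 * c * r ** gamma

def hist_equalize(img):
    """Redistribute the histogram roughly uniformly over [0, 255]."""
    img = img.astype(np.uint8)
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum() / img.size
    return (255.0 * cdf)[img]

def smooth(img):
    """3x3 box (smoothing) filter: averages each pixel with its neighbours."""
    p = np.pad(img.astype(np.float64), 1, mode='edge')
    return sum(p[i:i + img.shape[0], j:j + img.shape[1]]
               for i in range(3) for j in range(3)) / 9.0

def unsharp_mask(img, k=1.0):
    """Sharpen by adding back the detail removed by smoothing."""
    img = img.astype(np.float64)
    return np.clip(img + k * (img - smooth(img)), 0, 255)
```

Each function operates on one channel; as the paper describes, the transforms would be applied to every channel of the color image, for all five techniques in parallel.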

[Fig. 1 block labels: Input Image → {Contrast Stretching, Gamma Correction, Histogram Equalization, Smoothing Filter, Unsharp Masking} → Transformation to YCbCr → Skin Segmentation Technique → Binary Segmented Images → weights w1-w5 (computed against the Ground Truth Image) → Fusion Strategy → Final Skin Labeled Image → Morphological Processing → get the labeled image, calculate its area, obtain the centroid, make a bounding box around the face]


The input image, I(x, y), is fed to each of the techniques simultaneously and converted into different enhanced images, generated by applying the transformation function of each technique to every channel of the color space. The mathematical representation of pre-processing the image pixels is given in equation (3).

E(x, y) = T[I(x, y)]        (3)

Here, E(x, y) represents the enhanced version of the original image I(x, y), and T represents the transformation function applied to the input image to produce the enhanced image. The above equation represents the transformation function applied to a grayscale image. In this paper, color images are considered; thus, we have modified the equation as shown in equations (4) and (5).

E_i^k(x, y) = T_i[I^k(x, y)]        (4)

E_i(x, y) = [E_i^R(x, y), E_i^G(x, y), E_i^B(x, y)]        (5)

Fig. 2. Enhanced images of the sample image

Here, k ∈ {R, G, B} denotes the channel and i = 1 to 5 indexes the techniques. The purpose of using the above-mentioned techniques lies in their benefits in different situations. Contrast stretching increases the appearance of large-scale light-dark transitions. The power-law transformation (gamma correction) is used when expansion of the dynamic range of an image is required. Unsharp masking increases the appearance of small-scale edges. Figure 2 shows the effect on an original image of applying the five different transformation functions independently. Fig. 2a shows the original sample RGB color image; Fig. 2b the corresponding ground truth image; Fig. 2c the enhanced image after contrast stretching; Fig. 2d the gamma-corrected image; Fig. 2e the histogram-equalized image; Fig. 2f the smoothed image obtained with the smoothing filter; and Fig. 2g the unsharp-masked image.

2) Color Space: The rationale behind detecting human faces in color images is that they carry more information than gray-level images [16]. This implies that color images are capable of improving detection accuracy. Ravi Subban and Richa Mishra [17] showed that orthogonal color spaces (YCbCr, YUV, YIQ, YCgCr, etc.) perform well in discriminating skin from non-skin regions. In this paper, YCbCr is used because skin color clusters more compactly in it than in other color spaces, and the overlap between skin and non-skin regions under varying illumination is small. It is widely used in video compression techniques with some benefits [2]. Ban et al. [10] used skin-color information in the YCbCr color space to propose a boosting-based face detection method. It has been observed in the literature [10, 14] that the YCbCr color space consistently yields better results than other color spaces in discriminating skin colors as well as human faces. That is why it is used here to convert the color space of the enhanced images to YCbCr. Equation (6) represents the transformation formula.

Fig. 3. Color converted image of contrast stretched image

Y  = 0.299 R + 0.587 G + 0.114 B
Cb = 128 + 0.564 (B - Y)
Cr = 128 + 0.713 (R - Y)        (6)

Figure 3 shows the color-converted image of the original image after applying the contrast stretching enhancement technique. Fig. 3(a) shows the original image, Fig. 3(b) the enhanced image after contrast stretching, and Fig. 3(c) the change in color space of the enhanced image from RGB to YCbCr.
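A minimal sketch of the RGB-to-YCbCr conversion, using the common BT.601-style coefficients; the exact constants used in the paper's equation (6) may differ:

```python
import numpy as np

def rgb_to_ycbcr(img):
    """Convert an RGB image (HxWx3) to YCbCr using BT.601-style
    coefficients. A gray pixel maps to (Y, 128, 128)."""
    img = img.astype(np.float64)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 + 0.564 * (b - y)   # 0.564 ≈ 0.5 / (1 - 0.114)
    cr = 128.0 + 0.713 * (r - y)   # 0.713 ≈ 0.5 / (1 - 0.299)
    return np.stack([y, cb, cr], axis=-1)
```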

3) Skin segmentation: After the color transformation is complete, a skin segmentation process is performed. A piecewise linear decision boundary is used to label each pixel as skin or non-skin. The logic behind using this approach is its popularity among researchers due to its simplicity and fast computation. The approach takes the transformed images one at a time and converts each into a binary skin-segmented image using multiple thresholds. This means that five binary segmented images are obtained, one for each transformed image, as shown in figure 4. The limitation of this approach is that it may produce a high detection rate along with a high false positive rate. In order to overcome this problem, the fusion strategy is used, which makes a decision before labeling a pixel as skin or non-skin.
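A piecewise linear decision boundary of this kind can be sketched as fixed bounds on the chrominance channels. The threshold values below are the commonly cited Chai-Ngan ranges, used here only as a stand-in for the paper's actual thresholds:

```python
import numpy as np

def segment_skin(ycbcr, cb_range=(77, 127), cr_range=(133, 173)):
    """Label each pixel as skin (1) or non-skin (0) with fixed
    piecewise-linear bounds on the Cb and Cr channels. The bounds
    are the widely cited Chai-Ngan ranges, not necessarily the
    thresholds used in the paper."""
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    mask = ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))
    return mask.astype(np.uint8)
```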

4) Fusion strategy: Fusion implies combining multiple results or features obtained from different algorithms so that more information can be obtained than from a single algorithm. This is referred to as multi-algorithmic fusion. It is performed either at the score level or at the decision level. Score-level fusion makes the final decision on the basis of a combination of the match scores generated by each classifier, producing a new match score. It is applied to the score outputs of the multiple classifiers and includes the sum rule, product rule, exponential sum rule and tan-hyperbolic sum rule. Decision-level fusion makes a yes/no decision; majority voting, weighted voting, and the "AND" and "OR" rules are decision-level fusion strategies [18]. Wang et al. [19] developed a fused face-iris verification system to remove the difficulties faced by the individual classifiers. They used the weighted sum rule to examine the performance of each classifier and stated that the weighted sum rule outperforms the sum rule and the Fisher rule. Wang et al. [20] proposed a visual recognition algorithm based on the fusion of static and dynamic body biometric features. They used the Procrustes shape analysis method to extract static features and a condensation framework to track the walker and recover the joint-angle trajectories of the lower limbs as dynamic features. Zhou et al. [21] introduced a video-based recognition method integrating side-face and gait biometric information at the match score level. They constructed the ESFI (Enhanced Side Face Image) as the face template and the GEI (Gait Energy Image) as the gait template, and tested the methods on video sequences of 46 people. They concluded that the best performance was achieved by using the sum rule and the max rule to integrate the information obtained from ESFI and GEI. Xiao [18] used the sum rule with a fuzzy approach to detect human faces using a skin detector, an eye detector and a face shape detector.

From the above discussion, it is clear that in biometric recognition the sum rule generally produces the best overall performance compared to other methods, and that, to the best of our knowledge, the fusion strategy has not previously been used in the skin detection process. Thus, a sum-rule-based fusion strategy is used to enhance the accuracy of the face detection process by improving the skin detection process. A weight parameter is associated with the score-level fusion to represent the performance of the different detectors. It requires the FPR (false positive rate) and FNR (false negative rate) of each classifier to compute the respective weights: the larger the weight, the smaller the error rate. We therefore refer to this parameter as the weight of accuracy (WoA). The weight for each classifier is computed using (7).

w_k = (1 / (FPR_k + FNR_k)) / Σ_{j=1..5} (1 / (FPR_j + FNR_j))        (7)

FPR_k = FP_k / (FP_k + TN_k)        (8)

FNR_k = FN_k / (FN_k + TP_k)        (9)

where k = 1 to 5, and FPR_k and FNR_k represent the false positive rate and false negative rate of the kth classifier, respectively, calculated using (8) and (9). The manually labeled ground truth of the image, constructed in the pre-processing phase, is used for estimating the FPR and FNR of each classifier. After computing the weights, S_f is calculated for each pixel using (10).

S_f(i) = Σ_{k=1..5} w_k · s_k(i),   i = 1, …, N        (10)

where s_k(i) represents the ith pixel of the binary skin-segmented image produced by the kth classifier, and N = m × n is the total number of pixels in the input image. This implies that the S_f value is calculated for every pixel of the image. If the computed value is greater than a predefined threshold τ, the pixel is labelled as skin (1), else non-skin (0).

label(i) = 1 if S_f(i) > τ, else 0        (11)
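The weighted-sum fusion can be sketched as follows. The exact form of the weight formula is an assumption here (weights inversely proportional to each classifier's total error, normalised to sum to one), as is the default threshold `tau`:

```python
import numpy as np

def fusion_weights(fpr, fnr):
    """Weight of accuracy (WoA): one illustrative choice, with each
    classifier weighted inversely to its total error rate and the
    weights normalised to sum to 1."""
    inv_err = 1.0 / (np.asarray(fpr) + np.asarray(fnr))
    return inv_err / inv_err.sum()

def fuse(masks, weights, tau=0.5):
    """Weighted-sum (score-level) fusion of binary skin masks:
    S_f = sum_k w_k * s_k per pixel, thresholded at tau."""
    masks = np.asarray(masks, dtype=np.float64)   # shape (5, H, W)
    s_f = np.tensordot(weights, masks, axes=1)    # shape (H, W)
    return (s_f > tau).astype(np.uint8)
```

Classifiers with low FPR and FNR thus dominate the per-pixel score, so a pixel is kept as skin only when the reliable classifiers agree on it.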

5) Morphological operator: A morphological operator is used to remove the noise that may be added during the segmentation process. Eliminating the noise from the binary image by deleting all connected components that have fewer than a predefined number of pixels results in another binary skin-segmented image. The opening operation, i.e., erosion followed by dilation, is then applied to get the final skin-segmented image. The general representation of the opening of A by B is denoted as

A ∘ B = (A ⊖ B) ⊕ B        (12)

Using the opening operation, the noise can be removed, making the process more robust.
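A self-contained sketch of the opening operation; the paper does not state the structuring element, so a 3 × 3 square is an assumption:

```python
import numpy as np

def erode(mask):
    """Binary erosion with a 3x3 square structuring element."""
    p = np.pad(mask, 1, mode='constant', constant_values=0)
    out = np.ones_like(mask)
    for i in range(3):
        for j in range(3):
            out &= p[i:i + mask.shape[0], j:j + mask.shape[1]]
    return out

def dilate(mask):
    """Binary dilation with a 3x3 square structuring element."""
    p = np.pad(mask, 1, mode='constant', constant_values=0)
    out = np.zeros_like(mask)
    for i in range(3):
        for j in range(3):
            out |= p[i:i + mask.shape[0], j:j + mask.shape[1]]
    return out

def opening(mask):
    """A ∘ B = (A ⊖ B) ⊕ B: erosion followed by dilation (Eq. 12),
    which removes speckle smaller than the structuring element while
    restoring the shape of larger regions."""
    return dilate(erode(mask))
```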

B. Face Localization

This step involves the localization of face candidates from the skin-segmented regions. As stated earlier, increasing TPR while reducing FPR to improve the localization of human faces is the main objective of this paper. The task is split into two steps: computing the facial area and identifying the face region. First, the holes in the segmented region are filled. Then, the regions are labeled using 8-connected components. The area of each connected component is calculated and compared with a predefined threshold to test whether it is a face or a non-face. If it is a face, the centroid of the facial area is computed and the coordinates for drawing the bounding box around it are extracted.
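The labeling, area test, centroid and bounding-box steps can be sketched as a simple 8-connected breadth-first labeling; `min_area` is an illustrative threshold, not the paper's value:

```python
import numpy as np
from collections import deque

def face_boxes(mask, min_area=100):
    """Label 8-connected components of a binary skin mask, discard
    regions smaller than min_area, and return (centroid, bounding box)
    for each remaining face candidate."""
    h, w = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    results = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                # Breadth-first flood fill over the 8-neighbourhood.
                q, pixels = deque([(sy, sx)]), []
                seen[sy, sx] = True
                while q:
                    y, x = q.popleft()
                    pixels.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and mask[ny, nx] and not seen[ny, nx]):
                                seen[ny, nx] = True
                                q.append((ny, nx))
                if len(pixels) >= min_area:
                    ys = [p[0] for p in pixels]
                    xs = [p[1] for p in pixels]
                    centroid = (sum(ys) / len(ys), sum(xs) / len(xs))
                    box = (min(xs), min(ys), max(xs), max(ys))
                    results.append((centroid, box))
    return results
```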

IV. EXPERIMENTAL RESULTS AND DISCUSSIONS

The proposed approach has been validated using images collected from the Internet, the Caltech face database [22] and the Indian face database [23]. The Caltech dataset contains 450 frontal face images of 27 subjects, collected by Markus Weber at the California Institute of Technology. The images are captured under different lighting conditions, expressions and backgrounds; they are 896 × 592 pixels in size and stored in JPEG format. The Indian face dataset contains images of 40 subjects, both men and women, with eleven different poses (looking front, left, right, up, up towards left, up towards right and down) for each individual. The images are captured against bright homogeneous backgrounds with the subjects in upright, frontal positions; they are 640 × 480 pixels in size and stored in JPEG format. These datasets were chosen because together they cover images captured under complex backgrounds, lighting variations, pose changes and expression changes. To check the performance of the proposed approach in real-world conditions, 103 images collected from the Internet are also used. The approach has been compared with five conventional enhancement techniques (CS, GC, HE, SF and UM). The performance metrics considered are TPR, FPR, Precision and Accuracy, obtained using the following equations:

TPR = TP / (TP + FN)    (13)

Precision = TP / (TP + FP)    (14)

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (15)
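These pixel-wise metrics can be computed directly from a predicted mask and its ground truth. The sketch below uses the conventional definitions of TPR, FPR, Precision and Accuracy; the function name and example arrays are illustrative.

```python
import numpy as np

def skin_metrics(pred, truth):
    """Pixel-wise detection metrics for a binary skin map against ground truth."""
    tp = int(np.sum((pred == 1) & (truth == 1)))   # skin correctly detected
    fp = int(np.sum((pred == 1) & (truth == 0)))   # non-skin flagged as skin
    tn = int(np.sum((pred == 0) & (truth == 0)))   # non-skin correctly rejected
    fn = int(np.sum((pred == 0) & (truth == 1)))   # skin missed
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return tpr, fpr, precision, accuracy

# Toy example: one pixel in each of the four outcome cells
pred = np.array([[1, 1], [0, 0]])
truth = np.array([[1, 0], [1, 0]])
tpr, fpr, precision, accuracy = skin_metrics(pred, truth)
```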

TABLE I. EXPERIMENTAL RESULTS OF THE SKIN DETECTION

Dataset            Metric (%)   CS     GC     HE     SF     UM     Proposed
Caltech face       TPR          90.1   79.4   95.5   86.2   83.1   94.1
database           FPR          15.1    8.1   82.5   79.7   12.3    7.9
                   Precision    56.1   68.7   19.5   18.6   59.5   74.3
                   Accuracy     85.2   89.8   31.0   31.8   86.9   90.9
Indian face        TPR          93.9   95.3   95.2   97.5   70.5   92.8
dataset            FPR           4.7    4.9   80.8   81.6   34.2    3.2
                   Precision    83.3   83.0   23.0   23.3   47.1   87.6
                   Accuracy     95.0   95.1   34.6   34.3   66.0   95.9
Internet images    TPR          73.6   46.0   79.1   47.9   57.0   61.2
                   FPR          10.1    7.5   27.1   36.6   32.7    2.6
                   Precision    50.1   49.4   14.7   17.12  46.1   55.6
                   Accuracy     88.2   90.9   73.1   62.7   88.8   95.0

Here, TP (True Positive) represents the number of pixels correctly detected as skin; FP (False Positive), the number of non-skin pixels detected as skin; TN (True Negative), the number of pixels correctly detected as non-skin; and FN (False Negative), the number of skin pixels detected as non-skin. An unconstrained environment commonly results in some incorrectly labeled pixels, which in turn leads to incorrect face detection. The aim of any skin classifier is therefore to maintain a trade-off between TPR and FPR: the larger the TPR, the smaller the FPR should be. The experimental results in Table I show that the proposed approach achieves an accuracy of around 90.9% on the Caltech face dataset with 7.9% FPR, 95.9% on the Indian face dataset with 3.2% FPR, and 95% on the Internet images with 2.6% FPR, illustrating its qualitative performance over the different datasets. The experimental situations are as follows. Figure 4 shows the experimental results for a color image containing a human face with dark complexion against a homogeneous background. Fig. 4a shows the original sample RGB color image, followed by the binary skin region obtained using the fusion strategy and the truly detected human face bounded by a red bounding box. Fig. 4b shows the enhanced image obtained by applying the contrast stretching approach, followed by the skin region extracted using multiple thresholds; it yields the true detection along with two false detections caused by hair, because contrast stretching increases the intensity of the image and causes the skin and non-skin regions to overlap. Fig. 4c shows the gamma-corrected image followed by its correctly classified binary segmented image, which results in the true detection of the human face. Fig. 4d shows the histogram-equalized image followed by its binary segmented image, whose high FPR completely misguides the face detection process. SF also misguides the face detection process, as shown in Fig. 4e. Fig. 4f depicts the unsharp-masked image, whose FPR is high, although lower than that of SF and HE; it still misses the human face in the input image.

Figure 5 shows the experimental result for a color image in which the human face lies in shadow relative to its background.

Image Type                  Binary Segmented Image       Face Detected Image
Original Image              TPR: 92.36%, FPR: 1.08%      True Detection
Contrast Enhanced Image     TPR: 93.99%, FPR: 3.42%      True Detection with two False Detections
Gamma Corrected Image       TPR: 96.62%, FPR: 1.74%      True Detection
Histogram Equalized Image   TPR: 86.52%, FPR: 83.62%     No Detection
Smooth Filtered Image       TPR: 99.31%, FPR: 92.44%     No Detection
Unsharp Masked Image        TPR: 85.23%, FPR: 58.89%     No Detection

Fig. 4. Experimental results of color image with homogeneous background

Image Type                  Binary Segmented Image       Face Detected Image
Original Image              TPR: 94.07%, FPR: 19.64%     True Detection
Contrast Enhanced Image     TPR: 62.73%, FPR: 24.26%     False Detection
Gamma Corrected Image       TPR: 26.31%, FPR: 3.95%      Partial Detection
Histogram Equalized Image   TPR: 95.95%, FPR: 82.61%     No Detection
Smooth Filtered Image       TPR: 22.54%, FPR: 65.75%     No Detection
Unsharp Masked Image        TPR: 3.94%, FPR: 17.58%      False Detection

Fig. 5. Experimental results of color image with complex background

Image Type                  Binary Segmented Image       Face Detected Image
Original Image              TPR: 75.94%, FPR: 0.87%      True Detection with two False Detections
Contrast Enhanced Image     TPR: 82.39%, FPR: 3.78%      True Detection with three False Detections
Gamma Corrected Image       TPR: 81.10%, FPR: 2.01%      True Detection with four False Detections
Histogram Equalized Image   TPR: 40.79%, FPR: 29.15%     No Detection
Smooth Filtered Image       TPR: 59.79%, FPR: 51.80%     False Detection
Unsharp Masked Image        TPR: 82.90%, FPR: 7.80%      True Detection with five False Detections

Fig. 6. Experimental results of color image with varying lighting condition

Fig. 5a shows the original sample RGB color image, followed by the binary skin region obtained using the fusion strategy and the truly detected human face bounded by a red bounding box. Fig. 5b shows the enhanced image obtained by applying the contrast stretching approach, followed by the skin region extracted using multiple thresholds; it yields the true detection of the human face along with the complex background, because contrast stretching increases the intensity of the image and causes the skin and non-skin regions to overlap. Fig. 5c shows the gamma-corrected image followed by its binary segmented image, which is badly affected by the shadow and results in only a partial detection of the human face. Fig. 5d shows the histogram-equalized image followed by its binary segmented image, whose high FPR completely misguides the face detection process. SF also misguides the face detection process, although with a lower FPR than HE, as shown in Fig. 5e. Fig. 5f depicts the unsharp-masked image, which has a high FNR rather than a high FPR and misses the human face in the input image.

Figure 6 shows the experimental result for a color image containing a human face under varying lighting, along with its background. Fig. 6a shows the original sample RGB color image, followed by the binary skin region obtained using the fusion strategy and the truly detected human face bounded by a red bounding box, along with two false detections. Fig. 6b shows the enhanced image obtained by applying the contrast stretching approach, followed by the skin region extracted using multiple thresholds; it yields the true detection along with false detections, because contrast stretching increases the intensity of the image and causes the skin and non-skin regions to overlap, and because the color of the wooden table in the scene is close to skin color. Fig. 6c shows the gamma-corrected image followed by its binary segmented image, which is affected by the skin-like colors of the hair and the wooden table, resulting in the true detection of the human face along with four false detections. Fig. 6d shows the histogram-equalized image followed by its binary segmented image, which misses the face in the image. SF also misguides the face detection process, as shown in Fig. 6e. Fig. 6f depicts the unsharp-masked image, which truly detects the human face in the input image but with false detections.

V. CONCLUSION

Face detection based on geometrical features is proposed in this paper. We have used an enhancement approach together with a fusion strategy to locate human faces robustly in images. The skin-segmented regions obtained after the enhancement techniques are affected by complex backgrounds, ethnicity, varying lighting and camera characteristics. To resolve this problem, the fusion strategy is adopted to reduce the FPR and increase the TPR. Since the accuracy of face detection depends on the accuracy of skin-region extraction from an image, the proposed algorithm fixes the delocalization problem of the bounding box by correctly identifying skin color with minimal FPR and then locating the human face. The advantages of our approach are its simplicity and low complexity. Future work includes a comparative study of state-of-the-art skin detection techniques, improving the results of the face detection system, and using this face detection system as a pre-processing step in a face recognition system.


REFERENCES

[1] H. Han, S. Shan, X. Chen, and W. Gao, “A comparative study on illumination preprocessing in face recognition”, Pattern Recognition, vol. 46, pp. 1691–1699, 2013.

[2] C. Lin, “Face detection in complicated backgrounds and different illumination conditions by using YCbCr color space and neural network”, Pattern Recognition Letters, vol. 28, pp. 2190–2200, 2007.

[3] K. Iqbal, M.O. Odetayo, and A. James, “Face detection of ubiquitous surveillance images for biometric security from an image enhancement perspective”, J. Ambient Intelligence and Humanized Computing, pp. 1-14, 2012.

[4] C. Zhang, and Z. Zhang, “A survey of recent advances in face detection”, Technical Report, Microsoft Research, 2010.

[5] E. Hjelmås and B.K. Low, “Face detection: A survey”, J. Computer Vision and Image Understanding, vol. 83, pp. 236–274, 2001.

[6] M.-H. Yang, D. Kriegman, and N. Ahuja, “Detecting faces in images: A survey”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 34–58, 2002.

[7] A. Zaidan, H.A. Karim, N. Ahmad, G.M. Alam, B. Zaidan, “A novel hybrid module of skin detector using grouping histogram technique for bayesian method and segment adjacent-nested technique for neural network”, Int. J. Phys. Sci, vol. 5, pp. 2471–2492, 2010.

[8] L. Tao, M.-J. Seow, and V. K. Asari, “Nonlinear image enhancement to improve face detection in complex lighting environment”, International Society for Optics and Photonics Electronic Imaging, pp. 606416–606416, 2006.

[9] R. Mishra and R. Subban, “Face detection for video summary using enhancement based fusion strategy”, International Journal of Research in Engineering and Technology, 2014.

[10] Y. Ban, S.-K. Kim, K.-A. Toh, and S. Lee, “Face detection based on skin color likelihood”, Pattern Recognition, vol. 47, pp. 1573-1585, 2014.

[11] H. Yao, and W. Gao, “Face detection and localization based on skin chrominance and lip chrominance transformation from color images”, Pattern Recognition, vol. 34, pp. 1555-1564, 2001.

[12] N. Unaldi, P. Sankaran, V.K. Asari, and Z.-U. Rahman, “Image enhancement for improving face detection under non-uniform lighting conditions”, 15th International Conference on Image Processing, pp. 1332–1335, 2008.

[13] C.-S. Lim and H. Ibrahim, “Image enhancement for face images using spatial domain processing”, International Journal of Advanced Research in Computer Science and Electronics Engineering, vol. 2, pp-714, 2013.

[14] J.-U. Yun, H.-J. Lee, A.K. Paul, J.-H. Baek, “Face detection for video summary using illumination-compensation and morphological processing”, Pattern Recognition Letters, vol. 30, pp. 856–860, 2009.

[15] M. Kawulok, J. Kawulok, and J. Nalepa, “Spatial-based skin detection using discriminative skin-presence features”, Pattern Recognition Letters, 2013.

[16] I.-S. Hsieh, K.-C. Fann, and C. Lin, “A statistical approach to the detection of human faces in color nature scene”, Pattern Recognition, vol. 35, pp. 1583–1596, 2002.

[17] R. Subban and R. Mishra, “Combining color spaces for human skin detection in color images using skin cluster classifier”, Int. Conf. on Advances in Recent Technologies in Electrical and Electronics, 2013.

[18] Q. Xiao, “Using fuzzy adaptive fusion in face detection”, 2011 IEEE Workshop on Computational Intelligence in Biometrics and Identity Management (CIBIM), pp. 157–162, 2011.

[19] Y. Wang, T. Tan, and A.K. Jain, “Combining face and iris biometrics for identity verification”, Audio and Video-Based Biometric Person Authentication, pp. 805–813, 2003.

[20] L. Wang, H. Ning, T. Tan, and W. Hu, “Fusion of static and dynamic body biometrics for gait recognition”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, pp. 149–158, 2004.

[21] X. Zhou and B. Bhanu, “Integrating face and gait for human recognition”, Computer Vision and Pattern Recognition Workshop, pp. 55–55, 2006.

[22] M. Weber, “Computational vision - Caltech face database”, Retrieved from: http://www.vision.caltech.edu/html-files/archive.html, 2005.

[23] V. Jain and A. Mukherjee, “The Indian face database”, Retrieved from: http://vis-www.cs.umass.edu/~vidit/IndianFaceDatabase/, 2002.