
3D Face Recognition Using Pose and Illumination Compensation

Dr. A. Muthukumaravel
Department of MCA, Bharath Institute of Science and Technology, Chennai, TN, India

Abstract

The paper describes a face recognition system using a combination of color and depth images. To cope with illumination and pose variations, 3D information is used for the normalization of the input images. The proposed pose compensation algorithm is based on a robust 3D face detection and pose estimation technique, while illumination compensation exploits depth data to recover the illumination of the scene and relight the image under frontal lighting. When normalized images, depicting upright orientation and frontal lighting, are used for classification, significantly higher recognition rates are achieved, as established on a face database with more than 2000 images.

 

INTRODUCTION

Recent public face recognition tests demonstrated that the accuracy of state-of-the-art algorithms degrades significantly for images exhibiting pose and illumination variations. Current research efforts strive to achieve insensitivity to such variations.
The paper describes and evaluates a complete face identification system using a combination of 2D color and 3D range images captured in real time. We present several novel techniques which, taking as input a pair of 2D and 3D images, produce a pair of normalized images depicting frontal pose and illumination. The efficiency and robustness of the proposed system is demonstrated on a data set of significant size and compared with state-of-the-art compensation techniques.
Although the 3D structure of the human face conveys important discriminatory information, only a few techniques employing range images have been proposed. This is mainly due to the high cost of available 3D digitizers and the fact that they do not operate in real time (e.g. time-of-flight laser scanners) or produce inaccurate depth information (e.g. stereo vision). The work presented in this paper is partly motivated by the recent development of novel low-cost sensors that are capable of real-time 3D acquisition [1].

A common approach adopted towards 3D face recognition is based on the extraction of 3D facial features by means of differential geometry techniques [2–4]. A few techniques [5, 6] also employ grayscale images, but mainly for augmenting the detection of features such as the eyes that are harder to detect in the range image. Although feature-based techniques are robust to pose variations, they rely on accurate 3D maps of faces, usually extracted by expensive off-line 3D scanners. Thus their applicability to real-world applications with highly noisy data is questionable. The recognition rates claimed by the above techniques were estimated using databases of limited size and without significant variations of the faces. Only recently was an experiment conducted [7] with a database of significant size (275 persons) containing both grayscale and range images, producing comparative results of face identification using eigenfaces for 2D, 3D and their combination, and for varying image quality. That test, however, considered only frontal images captured under constant illumination conditions. For this work we have recorded a face database containing several appearance variations. These variations are compensated before reaching the classifier, thus leading to high recognition rates.

ACQUISITION OF 3D DATA

The proposed system is based on real-time, quasi-synchronous color and 3D image acquisition using the color structured-light approach [1]. The sensor is built from low-cost devices: an off-the-shelf CCTV color camera and a standard slide projector. The average depth accuracy of the system, optimized for an access-control application, is about 0.5 mm. The spatial resolution of the range images is approximately equal to the color camera resolution.
Using the above setup a face database was recorded. For each subject several images depicting different appearance variations were acquired: three facial expressions, three types of illumination (left/right side spot lights and overhead light), two pose variations (±20 degrees), two images with and without glasses, and three frontal images. The database contains 20 persons recorded in two sessions, with a time lapse of about 10 days between sessions (2200 image pairs in total).

POSE COMPENSATION

The aim of the pose compensation algorithm described in this section is to generate, given a pair of color and depth images, novel corresponding color and depth images depicting a frontal, upright face orientation. In addition, the center of the face in the input image is aligned with the center of the face in the gallery images of the same person with pixel accuracy.
The proposed technique uses the range image only for face detection and pose estimation, and is therefore robust, especially under varying pose and illumination conditions, as demonstrated by the experimental results.
The detection of the face in the image is the first step of the algorithm. Segmentation of the head from the body relies on statistical modelling of the head and torso points under a mixture-of-Gaussians assumption. The parameters of the model are then estimated by means of the Expectation-Maximization algorithm and by incorporation of a-priori constraints on the relative dimensions of the body parts, as described in detail in [8].
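To make the segmentation step concrete, here is a minimal sketch (not the implementation from [8]) of fitting a two-component Gaussian mixture with EM to separate head from torso points; the use of scikit-learn's GaussianMixture, the choice of the vertical coordinate as the feature, and the highest-mean heuristic are all illustrative assumptions.

```python
# Illustrative sketch only: head/torso separation with a two-component
# Gaussian mixture fitted by EM (scikit-learn assumed, not the paper's code).
import numpy as np
from sklearn.mixture import GaussianMixture

def segment_head(points_3d):
    """points_3d: (N, 3) array of foreground 3D points (x, y, z)."""
    # Model the vertical coordinate with two Gaussians, one per body part.
    y = points_3d[:, 1].reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, covariance_type="full").fit(y)
    labels = gmm.predict(y)
    # Assume the head component has the greater mean height
    # (flip the argmax if y grows downward, as in image coordinates).
    head = int(np.argmax(gmm.means_.ravel()))
    return points_3d[labels == head]
```

The paper additionally constrains EM with a-priori relative body dimensions [8], which this sketch omits.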
The estimation of 3D head pose, performed next, is based on the detection of the nose [8]. After the tip of the nose is localized, a 3D line is fitted to the 3D coordinates of pixels on the ridge of the nose. This 3D line fixes two of the three degrees of freedom of the face orientation. The third degree of freedom, the rotation angle around the nose axis, is then estimated by finding the 3D plane that cuts the face into two bilaterally symmetric parts. The error of the above pose estimation algorithm, tested on more than 2000 images, is less than 2 degrees.
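As an illustration of the line-fitting step, a least-squares 3D line fit to already-detected nose-ridge pixels can be done with an SVD; this is a standard choice, not necessarily the method of [8].

```python
# Illustrative sketch: least-squares 3D line fit to nose-ridge points.
import numpy as np

def fit_nose_axis(ridge_points):
    """ridge_points: (N, 3) array of 3D points on the nose ridge.
    Returns (centroid, unit_direction) of the best-fit 3D line."""
    centroid = ridge_points.mean(axis=0)
    # The leading right-singular vector of the centered points gives the
    # direction minimizing the sum of squared orthogonal distances.
    _, _, vt = np.linalg.svd(ridge_points - centroid)
    return centroid, vt[0]
```

The remaining degree of freedom (rotation about this axis) would then come from the bilateral-symmetry plane search, which the sketch omits.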
Once the tip of the nose and the pose of the face have been estimated, a 3D coordinate frame aligned with the face is defined, centered on the tip of the nose. A warping procedure is subsequently applied to the input depth image to align this local coordinate frame with a reference coordinate frame, defined during training using the gallery images, bringing the face into upright orientation. The transformation between the local and reference coordinate frames is further refined to pixel accuracy by applying the ICP surface registration algorithm [9] between the warped depth image and a reference (gallery) depth image corresponding to the claimed person ID.
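For concreteness, a bare-bones point-to-point ICP loop in the spirit of [9] might look as follows; the KD-tree correspondence search and the Kabsch solution for the rigid transform are standard choices, not details taken from the paper.

```python
# Illustrative sketch: point-to-point ICP refinement of the warped depth
# points against a reference (gallery) point set.
import numpy as np
from scipy.spatial import cKDTree

def icp_refine(src, ref, n_iters=20):
    """Rigidly align src (N, 3) to ref (M, 3); returns the refined src."""
    tree = cKDTree(ref)
    for _ in range(n_iters):
        # 1. Closest-point correspondences.
        _, idx = tree.query(src)
        matched = ref[idx]
        # 2. Optimal rigid transform via the Kabsch/Procrustes solution.
        mu_s, mu_m = src.mean(axis=0), matched.mean(axis=0)
        H = (src - mu_s).T @ (matched - mu_m)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T
        t = mu_m - R @ mu_s
        # 3. Apply the update and iterate.
        src = src @ R.T + t
    return src
```

A production version would add convergence and outlier-rejection tests.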
The rectified depth image contains missing pixel values that are interpolated in a series of steps. Some of the missing values are determined simply by copying the corresponding symmetric pixel values from the other side of the face. Remaining missing pixel values are linearly interpolated from neighboring points. The interpolated depth map is subsequently used to rectify the associated color image, also using 3D warping (Fig. 1).
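A minimal sketch of the two-stage hole filling, under the assumptions that missing depth pixels are marked as NaN and that the facial symmetry axis is a known image column; both conventions are illustrative.

```python
# Illustrative sketch: fill missing depth values by mirroring across the
# symmetry axis, then linearly interpolating whatever remains.
import numpy as np
from scipy.interpolate import griddata

def fill_holes(depth, axis_col):
    """depth: (H, W) array with NaN at missing pixels;
    axis_col: column index of the facial symmetry axis."""
    filled = depth.copy()
    _, w = filled.shape
    # Stage 1: copy the mirrored pixel when it holds a valid value.
    rows, cols = np.where(np.isnan(filled))
    mirror = 2 * axis_col - cols
    ok = (mirror >= 0) & (mirror < w)
    src = filled[rows[ok], mirror[ok]]
    valid = ~np.isnan(src)
    filled[rows[ok][valid], cols[ok][valid]] = src[valid]
    # Stage 2: linear interpolation from neighboring valid pixels
    # (pixels outside the convex hull of valid data stay NaN).
    known = ~np.isnan(filled)
    missing = np.argwhere(~known)
    if missing.size:
        filled[~known] = griddata(np.argwhere(known), filled[known],
                                  missing, method="linear")
    return filled
```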
The proposed pose compensation algorithm is very accurate, as demonstrated in the experimental results, and also computationally efficient, with a total running time of less than 1 sec on a Pentium III 1 GHz computer.

ILLUMINATION COMPENSATION

In this section an algorithm is described that compensates illumination by generating from the input image a novel image relit from a frontal direction. Our approach is inspired by recent work on image-based scene relighting used for rendering realistic images. Image relighting relies on inverting the rendering equation, i.e. the equation that relates the image brightness with the object material and geometry and the illumination of the scene. Given several images of the scene under different conditions, this equation may be solved (although an ill-posed problem) to recover the illumination distribution, which is then used to re-render the scene under novel illumination.

The first step is therefore to recover the scene illumination from a pair of color and depth images. Assuming that the scene is illuminated by a single light source, a technique is adopted that learns the non-linear relationship between the image brightness and the light source direction $L$ using a set of artificially generated bootstrap images.
For each subject in our database we use the reference pose-compensated depth image $I_r$ to render $N$ virtual views of the face illuminated from different directions. The set of light source directions is uniformly sampled from a section of the positive hemisphere. To decrease the dimensionality of the problem, from each rendered image a feature vector is extracted containing locally weighted averages of image brightness over $M$ preselected image locations ($M = 30$ in our experiments). The sample locations are chosen so as to include face areas with similar albedo (i.e. the skin). Feature vectors $x_i$, $i = 1, \dots, N$, extracted from all the images and normalized to have zero mean and unit variance, are then used as samples of the $M$-dimensional illuminant direction function $L = G(x)$. Approximating this function by $\tilde{G}$ using the samples is a regression problem that may be efficiently solved using Support Vector Machines (SVM) [10].

Assume now that we want to compute the similarity between a pose-compensated probe image and the gallery images of a person $j$ in the gallery. A feature vector $x$ is computed from the probe image as described previously. Then an estimate of the light source direction is given by $\tilde{L} = \tilde{G}_j(x)$, where $\tilde{G}_j$ is the SVM regression function computed for person $j$ during the training phase.
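As a sketch of how the regression step could be set up: scikit-learn's SVR, one regressor per direction component, and the normalization details below are assumptions layered on the paper's description, not its actual implementation.

```python
# Illustrative sketch: learning the illuminant direction function L = G(x)
# with SVM regression, one SVR per component of the light direction.
import numpy as np
from sklearn.svm import SVR

def train_illuminant_regressor(features, directions):
    """features: (N, M) brightness feature vectors from rendered views;
    directions: (N, 3) unit light-source direction of each rendering."""
    # Normalize features (per dimension here; the paper normalizes the
    # vectors to zero mean and unit variance).
    mu, sigma = features.mean(axis=0), features.std(axis=0)
    X = (features - mu) / sigma
    models = [SVR(kernel="rbf").fit(X, directions[:, k]) for k in range(3)]
    return mu, sigma, models

def estimate_light(models, mu, sigma, x):
    """x: (M,) probe feature vector -> estimated unit light direction."""
    z = ((x - mu) / sigma).reshape(1, -1)
    L = np.array([m.predict(z)[0] for m in models])
    return L / np.linalg.norm(L)
```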
Given the estimate of the light source direction $L$, relighting the input image with frontal illumination $L_0$ is straightforward. Let $I_C$, $I_D$ be respectively the input pose-compensated color and depth images and $\tilde{I}_C$ the illumination-compensated image. Then the image irradiance for each pixel $u$ is approximated by

$$I_C(u) \approx A(u)\, R(I_D, L; u) \quad (1)$$

where $A$ is the unknown face albedo or texture function (the geometry-independent component) and $R$ is a rendering of the surface with constant albedo. Equation (1) may then be rewritten as

$$\tilde{I}_C(u) = I_C(u)\, \frac{R(I_D, L_0; u)}{R(I_D, L; u)}$$

i.e. the illumination-compensated image is given by multiplication of the input image with a ratio image.
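Assuming a renderer $R$ that produces a constant-albedo shading image from a depth map and a light direction, the ratio-image relighting reduces to a few lines; the Lambertian stand-in renderer below is a hypothetical simplification of the paper's rendering engine, applied to one channel at a time.

```python
# Illustrative sketch: ratio-image relighting of a single-channel image.
import numpy as np

def render_lambertian(depth, light):
    """Stand-in for R: constant-albedo Lambertian shading from a depth
    map and a unit light direction, with attached shadows clamped to 0."""
    gy, gx = np.gradient(depth.astype(float))
    normals = np.dstack([-gx, -gy, np.ones_like(depth, dtype=float)])
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    return np.clip(normals @ light, 0.0, None)

def relight_frontal(image, depth, L_est, eps=1e-3):
    """Multiply the input by the ratio of frontal to estimated shading."""
    L0 = np.array([0.0, 0.0, 1.0])  # frontal light direction
    ratio = render_lambertian(depth, L0) / (render_lambertian(depth, L_est) + eps)
    return image * ratio
```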
The same relighting procedure is also applied to the training images. It is then expected that illumination-compensated probe and gallery images of the same person will differ only up to a scale factor, since the intensity of the light source cannot be recovered. This scale factor is cancelled by taking the logarithm of the images (which makes the factor additive instead of multiplicative) and subsequently subtracting the mean value.
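The scale cancellation itself is a one-liner; a sketch, assuming strictly positive pixel values:

```python
# Illustrative sketch: cancel the unknown light-intensity scale factor.
import numpy as np

def cancel_scale(img, eps=1e-6):
    # The logarithm turns the multiplicative factor into an additive one;
    # subtracting the mean then removes it entirely.
    log_img = np.log(img + eps)
    return log_img - log_img.mean()
```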
Although the description of the above relighting technique considers a single-channel image, color images may be handled equally well by applying the same procedure (illuminant estimation and relighting) separately to each channel.
An important advantage of the previously described algorithm is its flexibility in coping with complex illumination conditions by adaptation of the rendering function $R$ above. For example, accounting for attached shadows may be achieved simply by activating shadowing in the rendering engine. On the other hand, from our experience with different rendering models, good results may also be obtained with relatively simple renderings.

EXPERIMENTAL RESULTS

The focus of the experimental evaluation was to investigate the improvement achieved by incorporating the proposed pose and illumination compensation schemes into state-of-the-art 2D face recognition algorithms. We have therefore used the Embedded Hidden Markov Model algorithm [11] as the baseline classification algorithm. Two such classifiers are used, one for color images (for practical reasons only the red component of the color images was used) and one for depth images. The result of each classifier is a similarity measure for every person in the database. Using the similarity measures associated with the color and depth images respectively, a combined similarity measure is obtained using the product rule [12].
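A sketch of the product-rule fusion [12], under the assumption that each classifier outputs per-person similarity scores scaled to (0, 1]:

```python
# Illustrative sketch: product-rule fusion of color and depth similarities.
import numpy as np

def fuse_product(sim_color, sim_depth):
    """sim_*: (P,) similarity per enrolled person, assumed in (0, 1].
    Returns the fused scores and the index of the best match."""
    fused = sim_color * sim_depth
    return fused, int(np.argmax(fused))
```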
We have performed several experiments using images of the recorded face database. Training of the classifier was performed using images from the first recording session. On average 3 images per subject depicting different facial expressions were used for training. Testing was performed using all images of the second recording session.
Table 1 demonstrates the recognition rates achieved with the proposed compensation scheme. This is compared with the case where no compensation is performed (the face detection algorithm in [11] was applied in this case), and with manual pose normalization, i.e. three points over the eyes and mouth were selected by a human operator and used to rectify the images. Rectification in this case is performed either by 2D affine warping of the images or by 3D warping using depth information as described in the pose compensation section. As shown in Table 1, the proposed scheme results in significant improvements in recognition accuracy, and it is very close to the accuracy achieved by manual image normalization.
Very good results were also obtained with the proposed illumination compensation technique (Table 2).

CONCLUSIONS

In summary, we have proposed a new approach to 3D face recognition based on automatic image normalization algorithms exploiting the availability of 3D information. Significant improvements in face classification accuracy were obtained using this scheme. We expect further improvement of these results in the future from model-based image warping for pose compensation and from the investigation of efficient reflectance estimation techniques to further enhance illumination compensation.

Tables at a glance

Table 1 · Table 2

Figures at a glance

Figure 1 · Figure 2

References