FACE RECOGNITION USING PCA, LDA AND VARIOUS DISTANCE CLASSIFIERS

Kuldeep Singh Sodhi; Madan Lal

University College of Engineering, Punjabi University, Patiala, Punjab, India.
Assistant Professor, University College of Engineering, Punjabi University, Patiala, Punjab, India.

Visit for more related articles at Journal of Global Research in Computer Sciences

Abstract

Face recognition has become a major field of interest. Face recognition algorithms are used in a wide range of applications, such as security control, crime investigation, entrance control in buildings, access control at automatic teller machines, passport verification, and identifying faces in a given database. This paper discusses the steps involved in face recognition using Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), and the different distance measures that can be used in face recognition.

Keywords

Face Recognition, Principal Component Analysis, Linear Discriminant Analysis, LDA, PCA, distance measures.

INTRODUCTION

Face recognition is a form of biometric identification in which a person's face is scanned and matched against a library of known faces; that is, a person is identified from an image of their face. The success of any recognition method depends heavily on the particular choice of features used by the classifier. A good feature extractor should select features that are not sensitive to arbitrary environmental variations such as orientation and illumination [1]. Recognition systems can operate in well-controlled or uncontrolled environments. Image recognition in well-controlled environments, where the imaging conditions of the training images as well as the probe images are fixed, is a relatively mature field of research [2]. Research in uncontrolled environments is much less mature, and results from well-controlled environments cannot be assumed to hold in uncontrolled environments. Recognition in controlled environments can be time- and cost-intensive and can be impractical for real-world use [2]. As part of this research, the main emphasis is on assessing the suitability of image recognition systems in uncontrolled environments and their usability in the real world.

A wide variety of methods for image recognition (fig. 1), especially for face image recognition, are reported in the literature [3]. In this survey, the methods for image recognition are categorized as holistic methods [4-6], feature-based methods [7-9] and hybrid methods [10]. Holistic methods use the whole face region as the raw input to a recognition system [3]. One of the most widely used representations of the face region is eigenfaces, which are based on principal component analysis and use a nearest neighbour classifier [4]. Fisherfaces use linear/Fisher discriminant analysis (FLD/LDA) to best discriminate between face images of different classes [5-6]. In feature-based (structural) matching methods, local features such as the eyes, nose and mouth are first extracted, and their locations and local statistics (geometric and/or appearance) are fed into a structural classifier [3]. Earlier methods in the structural matching category use the distances and angles between the eye corners, mouth extremes, nostrils and chin top [7]. Hidden Markov Model (HMM) based methods use strips of pixels that cover the forehead, eyes, nose, mouth and chin [8]. The Elastic Bunch Graph Matching (EBGM) algorithm stores spectral information about the neighbourhoods of facial features by convolving these areas with Gabor wavelets (masks) [9]. Hybrid methods, like the human perception system, use both local features and the whole face region to recognize a face. One can argue that these methods could potentially offer the best of the two types of methods [3].

One of the methods of this category is based on recent advances in component-based detection/recognition and 3D morphable models. The basic idea of component-based methods is to decompose a face into a set of facial components such as mouth and eyes that are interconnected by a flexible geometrical model. The 3D morphable face model is applied to generate arbitrary synthetic images under varying pose and illumination. Only three face images (frontal, semi-profile, profile) of a person are needed to compute the 3D face model [10]. The techniques used in this paper are based on holistic approaches.

VARIOUS STEPS IN FACE RECOGNITION

Image Acquisition:

The method for acquiring face images depends upon the underlying application. For instance, surveillance applications may best be served by capturing face images by means of a video camera while image database investigations may require static intensity images taken by a standard camera. Some other applications, such as access to top security domains, may even necessitate the forgoing of the nonintrusive quality of face recognition by requiring the user to stand in front of a 3D scanner or an infra-red sensor [21].

Discrete Wavelet Transform (DWT):

The wavelet transform concentrates the energy of the image signal into a small number of wavelet coefficients. It has a good time-frequency localization property [11]. The fundamental idea behind wavelets is to analyse a signal according to scale. The wavelet transform was developed as an alternative to the short-time Fourier transform to overcome problems related to its frequency and time resolution properties [12]. The wavelet transform decomposes a signal into a set of basis functions. These basis functions are obtained from a mother wavelet by translation and dilation.
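In the usual textbook formulation (stated here as the standard definition, not as a formula taken from the cited sources), the family of basis functions generated from a mother wavelet ψ can be written as:

```latex
\psi_{a,b}(t) = \frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{t-b}{a}\right), \qquad a \neq 0
```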

Here a and b are real numbers that quantify the scaling and translation operations, respectively [13]. The advantage of the DWT over the DFT and DCT is that the DWT performs a multi-resolution analysis of the signal with localization in both time and frequency. In addition, functions with discontinuities and sharp spikes require fewer wavelet basis vectors in the wavelet domain than sine-cosine basis vectors to achieve a comparable approximation [14].

The symbols L and H refer to the low-pass and high-pass filters, respectively. LL represents the approximation sub-band, and LH, HL and HH are the detail sub-bands. LL is the low-frequency sub-band and gives a global description of an image [15]. The horizontal coefficients (LH) correspond to the low-frequency component in the horizontal direction and the high-frequency component in the vertical direction [16].
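The sub-band decomposition can be illustrated with a minimal sketch of a single-level 2-D transform. The Haar wavelet is used here only because it is the simplest choice; the paper does not state which mother wavelet is intended, and the function name `haar_dwt2` is hypothetical. The sub-band naming follows the convention in the text (first letter: horizontal direction, second letter: vertical direction).

```python
import numpy as np

def haar_dwt2(image):
    """Single-level 2-D Haar DWT. Returns the sub-bands (LL, LH, HL, HH)."""
    x = np.asarray(image, dtype=float)
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("image height and width must be even")
    s = np.sqrt(2.0)
    # Filter along the horizontal direction: combine neighbouring columns.
    low_h = (x[:, 0::2] + x[:, 1::2]) / s    # low-pass
    high_h = (x[:, 0::2] - x[:, 1::2]) / s   # high-pass
    # Filter along the vertical direction: combine neighbouring rows.
    ll = (low_h[0::2, :] + low_h[1::2, :]) / s    # approximation sub-band
    lh = (low_h[0::2, :] - low_h[1::2, :]) / s    # low horizontal, high vertical
    hl = (high_h[0::2, :] + high_h[1::2, :]) / s  # high horizontal, low vertical
    hh = (high_h[0::2, :] - high_h[1::2, :]) / s  # high in both directions
    return ll, lh, hl, hh
```

Each sub-band has half the height and half the width of the input, so feeding only LL to the feature extractor reduces the number of pixels by a factor of four.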

Feature Extraction using PCA or LDA:

Principal Component Analysis (PCA):

PCA is also known as the Karhunen-Loeve projection. PCA calculates the eigenvectors of the covariance matrix and projects the original data onto a lower-dimensional feature space, which is defined by the eigenvectors with large eigenvalues. PCA has been used in face representation and recognition, where the calculated eigenvectors are referred to as eigenfaces. In gel images, even more than in human faces, the dimensionality of the original data is vast compared to the size of the dataset, suggesting PCA as a useful first step in analysis. Approaches to face recognition range from the PCA approach (also known as eigenfaces) to prediction through feature matching. The idea of feature selection and point matching has been used to track human motion, and eigenfaces have been used to track human faces. Eigenface methods use a principal component analysis approach to store a set of known patterns in a compact subspace representation of the image space, where the subspace is spanned by the eigenvectors of the training image set.

PCA is a useful statistical technique that has found application in fields such as face recognition and image compression, and is a common technique for finding patterns in high-dimensional data. The basic goal here is to implement a simple face recognition system based on well-studied and well-understood methods, going into depth on only one of them: PCA. It is one of the more successful face recognition techniques and is easy to understand and describe mathematically. The method involves the use of eigenfaces.

The first step is to produce a feature detector (dimension reduction). Principal Component Analysis (PCA) was chosen because, among linear techniques, it is the most efficient method of dimension reduction in terms of data compression. This allows the high-dimensional data, the images, to be represented by lower-dimensional data, which should reduce the complexity of grouping the images [19]. PCA aims to maximize the total scatter of the projected data; it does not use class labels, so between-class separation is captured only indirectly [17]. It works by finding a new coordinate system for a set of data, where the axes (or principal components) are ordered by the variance contained within the training data [14]. A brief view of PCA is given below [4].
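The eigenface computation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `train_eigenfaces` and `project` are hypothetical, each face is assumed to be flattened into one row, and the well-known trick of diagonalising the small sample-by-sample matrix instead of the full pixel covariance is used because there are usually far fewer images than pixels.

```python
import numpy as np

def train_eigenfaces(images, num_components):
    """images: array of shape (n_samples, n_pixels), one flattened face per row."""
    X = np.asarray(images, dtype=float)
    n_samples = X.shape[0]
    if not 1 <= num_components <= n_samples - 1:
        raise ValueError("num_components must be between 1 and n_samples - 1")
    mean_face = X.mean(axis=0)
    A = X - mean_face                          # centred training faces
    # Diagonalise the small (n_samples x n_samples) matrix; it shares its
    # non-zero eigenvalues with the (n_pixels x n_pixels) covariance matrix.
    small = A @ A.T / (n_samples - 1)
    eigvals, eigvecs = np.linalg.eigh(small)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:num_components]
    eigvals = eigvals[order]
    # Map the eigenvectors back to pixel space and normalise them.
    eigenfaces = A.T @ eigvecs[:, order]       # shape (n_pixels, num_components)
    eigenfaces /= np.linalg.norm(eigenfaces, axis=0)
    return mean_face, eigenfaces, eigvals

def project(image, mean_face, eigenfaces):
    """Project one flattened face onto the eigenface subspace."""
    return (np.asarray(image, dtype=float) - mean_face) @ eigenfaces
```

A test image is projected with the same `mean_face` and `eigenfaces` as the training set, and the resulting coefficient vectors are then compared with a distance measure.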

Linear Discriminant Analysis (LDA):

Linear discriminant analysis is a classical technique in pattern recognition, where it is used to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier or, more commonly, for dimensionality reduction before later classification [26].

In computerized face recognition, each face is represented by a large number of pixel values. Linear discriminant analysis is primarily used here to reduce the number of features to a more manageable number before classification. Each of the new dimensions is a linear combination of pixel values, which forms a template. The linear combinations obtained using Fisher's linear discriminant are called Fisherfaces, while those obtained using the related principal component analysis are called eigenfaces [26].

Linear Discriminant Analysis easily handles the case where the within-class frequencies are unequal, and its performance has been examined on randomly generated test data. The method maximizes the ratio of between-class variance to within-class variance in any particular data set, thereby aiming for maximal separability. Data sets can be transformed, and test vectors can be classified in the transformed space, by two different approaches.

a. Class-dependent transformation: This type of approach involves maximizing the ratio of between-class variance to within-class variance. The main objective is to maximize this ratio so that adequate class separability is obtained. The class-specific approach uses two optimizing criteria to transform the data sets independently.

b. Class-independent transformation: This approach involves maximizing the ratio of overall variance to within-class variance. It uses only one optimizing criterion to transform the data sets, and hence all data points, irrespective of their class identity, are transformed using this transform. In this type of LDA, each class is considered as a separate class against all other classes [26].
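The class-independent variant can be sketched with the usual Fisher scatter matrices. This is a hedged illustration, not the formulation from [26]: the function name `fit_lda` is hypothetical, the between-class scatter is taken as the scatter of the class means around the overall mean, and a pseudo-inverse is used so that the sketch still runs when the within-class scatter is singular (which is common when images outnumber pixels only after a PCA step).

```python
import numpy as np

def fit_lda(features, labels, num_components):
    """Return a projection matrix of shape (n_features, num_components)."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    s_w = np.zeros((d, d))   # within-class scatter
    s_b = np.zeros((d, d))   # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        centred = Xc - class_mean
        s_w += centred.T @ centred
        diff = (class_mean - overall_mean)[:, None]
        s_b += len(Xc) * (diff @ diff.T)
    # Single optimizing criterion for all classes: eigenvectors of inv(S_w) S_b.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(s_w) @ s_b)
    order = np.argsort(eigvals.real)[::-1][:num_components]
    return eigvecs[:, order].real
```

With C classes, at most C - 1 of the eigenvalues are non-zero, so `num_components` larger than C - 1 adds no discriminative directions.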

Difference between PCA and LDA:

The prime difference between LDA and PCA is that LDA deals directly with discrimination between classes, whereas PCA deals with the data in its entirety without paying any particular attention to the underlying class structure [27]. In PCA, the shape and location of the original data sets change when they are transformed to a different space, whereas LDA does not change the location but only tries to provide more class separability and to draw a decision region between the given classes. The goal of Linear Discriminant Analysis (LDA) is to find an efficient way to represent the face vector space. PCA constructs the face space from the face training data as a whole, without using the face class information. LDA, on the other hand, uses class-specific information that best discriminates among classes. LDA produces an optimal linear discriminant function that maps the input into the classification space, in which the class of the sample is decided based on some metric such as Euclidean distance. LDA takes into account the different variables of an object and works out which group the object most likely belongs to [26].

In Figure 6, two different classes are represented by two different Gaussian-like distributions. However, only two samples per class are supplied to PCA or LDA. In this conceptual depiction, the classification result of the PCA procedure (using only the first eigenvector) is more desirable than the result of LDA. D_PCA and D_LDA represent the decision thresholds obtained by using nearest-neighbour classification [27].

One characteristic of both PCA and LDA is that they produce spatially global feature vectors. In other words, the basis vectors produced by PCA and LDA are non-zero for almost all dimensions, implying that a change to a single input pixel will alter every dimension of its subspace projection. At one level, PCA and LDA are very different: LDA is a supervised learning technique that relies on class labels, whereas PCA is an unsupervised technique [28].

LDA and PCA optimize the transformation T with different intentions. LDA optimizes T by maximizing the ratio of between-class variation to within-class variation. PCA obtains T by searching for the directions that have the largest variation. Therefore, LDA and PCA project parameter vectors along different directions. Figure 7 shows the difference between the projecting directions of LDA and PCA when projecting the parameter vectors from a two-dimensional parametric space onto a one-dimensional feature space [29].

The comparison table between PCA and LDA is given in Figure 8.

Distance Measures:

Various distance measures can be used as a similarity measure to compare the feature vector of the test image with those of the training images. All the training images as well as the test image are projected onto the feature space of the training dataset. The distances between the projected test image and the projections of all centred training images are then calculated. The test image is expected to have the minimum distance to its corresponding image in the training dataset.

Types of Distance Measures:

The various types of distance measures that can be used in face recognition are explained below [22]:

City Block distance:

The sum of absolute differences between two vectors is called the L1 distance, or city-block distance. This is a true distance function since it obeys the triangle inequality. The reason why it is called the city-block distance, and also the Manhattan distance or taxicab distance, is that going from a point A to a point B is achieved by walking "around the block", compared to the Euclidean "straight line" distance [23].

Where λ_i is the i-th eigenvalue corresponding to the i-th eigenvector.
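The four measures named in this paper (city block, Euclidean, angle and Mahalanobis) can be sketched as below. Only the city-block description is taken from the text above; the other three follow their standard definitions and are assumptions, since variants exist in the face recognition literature. In particular, the Mahalanobis form shown assumes that x and y are PCA coefficient vectors, so that the covariance is diagonal with the eigenvalues λ_i on its diagonal. All function names are hypothetical.

```python
import numpy as np

def city_block(x, y):
    """L1 distance: sum of absolute differences."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y)))

def euclidean(x, y):
    """L2 distance: straight-line distance."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2)))

def angle_distance(x, y):
    """Negative cosine of the angle, so that smaller still means more similar."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(-np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def mahalanobis(x, y, eigenvalues):
    """Standard Mahalanobis distance in PCA space (diagonal covariance assumed)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    lam = np.asarray(eigenvalues, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2 / lam)))

def nearest_neighbour(test_vector, gallery, distance):
    """Index of the gallery vector with the minimum distance to test_vector."""
    return int(np.argmin([distance(test_vector, g) for g in gallery]))
```

To use the Mahalanobis measure with `nearest_neighbour`, the eigenvalues can be bound with a small wrapper such as `lambda a, b: mahalanobis(a, b, eigenvalues)`.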

Combining distance measures:

Rather than using a single distance classifier for finding the distance between images, some combination of the standard distance measures (City Block, Euclidean, angle and Mahalanobis) might outperform the individual distance measures. The simplest mechanism for combining distance measures is to add them.
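Adding the measures can be sketched as a plain sum over a list of distance functions. The definitions used here are the standard ones and all names are hypothetical. Note that the individual measures live on different scales (the angle measure is bounded, the others are not), so a practical system might normalise each measure before summing; that refinement is an assumption and is not part of the sketch.

```python
import numpy as np

def city_block(x, y):
    return float(np.sum(np.abs(x - y)))

def euclidean(x, y):
    return float(np.sqrt(np.sum((x - y) ** 2)))

def angle_distance(x, y):
    return float(-np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def make_mahalanobis(eigenvalues):
    """Bind the PCA eigenvalues so the result has the same (x, y) signature."""
    lam = np.asarray(eigenvalues, dtype=float)
    def mahalanobis(x, y):
        return float(np.sqrt(np.sum((x - y) ** 2 / lam)))
    return mahalanobis

def combined_distance(x, y, measures=(city_block, euclidean, angle_distance)):
    """Unweighted sum of the selected distance measures."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(sum(measure(x, y) for measure in measures))
```

For example, `combined_distance(x, y, measures=(city_block, make_mahalanobis(eigenvalues)))` adds only the city-block and Mahalanobis terms.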

Rotation of test image:

The recognition accuracy of the face recognition system may be improved by rotating the test image through different angles, such as 90°, 180° and 270°, and repeating the match.
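A minimal sketch of this retry step is given below. It assumes that a match is accepted when the smallest distance falls below a threshold, and it compares raw pixel arrays with the Euclidean distance purely to stay self-contained; in the pipeline described here the comparison would be made on the projected feature vectors. The function name `match_with_rotation` and the `threshold` parameter are hypothetical.

```python
import numpy as np

ANGLES = (0, 90, 180, 270)

def match_with_rotation(test_image, gallery, threshold):
    """Try the test image unrotated first, then at 90, 180 and 270 degrees.

    gallery has shape (n_images, height, width). Returns a tuple
    (gallery_index, angle, distance) for the first rotation that gives a
    match within threshold, or None if no rotation matches.
    """
    test_image = np.asarray(test_image, dtype=float)
    gallery = np.asarray(gallery, dtype=float)
    for quarter_turns, angle in enumerate(ANGLES):
        rotated = np.rot90(test_image, k=quarter_turns)  # counter-clockwise
        if rotated.shape != gallery.shape[1:]:
            continue  # non-square images change shape at 90 and 270 degrees
        distances = np.sqrt(((gallery - rotated) ** 2).sum(axis=(1, 2)))
        best = int(np.argmin(distances))
        if distances[best] <= threshold:
            return best, angle, float(distances[best])
    return None
```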

CONCLUSION

The major steps involved in face recognition are image acquisition, applying the DWT, feature extraction using PCA or LDA, selecting a distance measure and, finally, rotating the image through different angles if a match is not found. PCA is an unsupervised learning technique that is best suited to databases of images without class labels, whereas LDA is a supervised learning technique that relies on class labels and is well suited to distributed classes in small datasets. Different distance measures or classifiers may be used for finding the distance between the test image and the database images, such as the Euclidean distance, city-block distance, angle distance and Mahalanobis distance. Rather than using a single distance classifier for finding the distance between images, some combination of the standard distance measures (City Block, Euclidean, angle and Mahalanobis) might outperform the individual distance measures. The simplest mechanism for combining distance measures is to add them.

References

Rabab M. Ramadan, Rehab F. Abdel – Kader “Face Recognition Using Particle Swarm Optimization-Based Selected Features”, International Journal of Signal Processing, Image Processing and Pattern Recognition, Vol. 2, No. 2, June 2009,pp.51-65
Charles Schmitt, Sean Maher, “The Evaluation and Optimization of Face Recognition Performance Using Experimental Design and Synthetic Images”, A Research Report prepared by RTI International-Institute of Homeland Security Solutions , U.S.A., June 2011
W.Zhao, R.Chellappa, P. J. Phillips, A. Rosenfeld, “Face Recognition: A Literature Survey”, ACM Computing Surveys, Vol. 35, No. 4, December 2003, pp. 399–458
M. Turk, A. Pentland, “Eigenfaces for Recognition”, Journal of Cognitive Neuroscience, Vol. 3, No. 1, 1991, pp. 71-86
W. Zhao, R. Chellappa, A. Krishnaswamy, “Discriminant Analysis of Principal Components for Face Recognition”, Proceedings of the 3rd IEEE International Conference on Face and Gesture Recognition, FG'98, 14-16 April 1998, Nara, Japan, pp. 336-341
P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, “Eigenfaces vs. Fisherfaces: Recognition using Class Specific Linear Projection”, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 19, 1997, pp. 711–720
T .Kanade, “Computer recognition of human faces”, Birkhauser, Basel, Switzerland, and Stuttgart, Germany,1977
A. V. Nefian, M.H. Hayes III, “Hidden Markov models for face recognition”, Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing,12-15 May 1998,Vol.5,pp.2721-2724
L. Wiskott, J. M. Fellous, N. Kruger, C. Malsburg, “Face Recognition by Elastic Bunch Graph Matching”, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 19, 1997, pp. 775–779
J. Huang, B. Heisele, V. Blanz, “Component-based Face Recognition with 3D Morphable Models”, Proceedings of the 4th International Conference on Audio and Video Based Biometric Person Authentication (AVBPA), 09-11 June 2003, Guildford, UK, pp. 27-34
SHI Dongcheng, JIANG Jieqing, “The Method of Facial Expression Recognition Based on DWT-PCA/LDA”, Proceedings of IEEE 3rd International Congress on Image and Signal Processing (CISP), 16-18 October 2010,Yantai, Vol.4, pp.1970-1974
Kamarul Hawari Ghazali, Mohd. Marzuki Mustafa, Aini Hussain, “Image Classification using Two Dimensional Discrete Wavelet Transform”, Proceedings of International Conference on Instrumentation, Control & Automation (ICA),20-22 October 2009, Bandung, Indonesia,pp.71-74
Li Xian Wei, Yang Sheng, Wang Qi, Li Ming, “Face Recognition Based on Wavelet Transform and PCA”, Proceedings of the IEEE Pacific-Asia Conference on Knowledge Engineering and Software Engineering (KESE'09), 19-20 December 2009, Shenzhen, pp. 136-138
Paul Nicholl, Abbes Amira, Djamel Bouchaffra, “Multiresolution Hybrid Approaches for Automated Face Recognition”, Proceedings of the IEEE 2nd NASA/ESA Conference, 5-8 August 2007, Edinburgh, pp. 89-96
Meihua Wang, Hong Jiang and Ying Li, “Face Recognition based on DWT/DCT and SVM”, Proceedings of the IEEE International Conference on Computer Application and System Modeling (ICCASM), 22-24 October 2010, Taiyuan,Vol.3, pp.V3-507-V3-510
Yee Wan Wong, Kah Phooi Seng, Li-Minn Ang, “M-Band Wavelet Transform in Face Recognition System”, Proceedings of IEEE 5th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI) Conference,14-17 May 2008,Krabi ,Vol.1, pp.453-458
Jan Mazanec, Martin Melisek, Milos Oravec, Jarmila Pavlovicova, “Support Vector Machines, PCA and LDA in face recognition” Journal of Electrical Engineering, Vol. 59, No. 4, 2008, pp.203-209
Bruce A. Draper, Kyungim Baek, Marian Stewart Bartlett, J. Ross Beveridge, “Recognizing faces with PCA and ICA”,
Srinivasulu Asadi, Dr .Ch.D.V.SubbaRao, V.Saikrishna, “A Comparative study of Face Recognition with Principal Component Analysis and Cross-Correlation Technique”, International Journal of Computer Applications (0975 – 8887), November 2010, Vol. 10, No.8,pp.17-21
Rahul Garg, Varun Gulshan, “PCA : Face Recognition”, a Project Report , October 29, 2005
Mini Singh Ahuja, Sumit Chhabra, “Effect of distance measures in PCA based face recognition”, International Journal of Enterprise Computing and Business Systems (2230-8849), July 2011, Vol. 1 Issue 2.
Wendy S. Yambor, Bruce A. Draper, J. Ross Beveridge, “Analyzing PCA-based Face Recognition Algorithms: Eigenvector Selection and Distance Measures”.
http://www.econ.upf.edu/~michael-/stanford./maeb5.pdf
http://en.wikipedia.org/wiki/Mahalanobis_distance
Hussein Rady, “Face Recognition using Principle Component Analysis with Different Distance Classifiers”, International Journal of Computer Science and Network Security, October 2011, Vol. 11, No. 10.
M.N.Shah Zainudin., Radi H.R., S. Muniroh Abdullah., Rosman Abd. Rahim.,M.Muzafar Ismail., MIdzdihar Idris., H.A.Sulaiman., Jaafar A. , “Face recognition using Principal Component Analysis and Linear Discriminant Analysis”, International Journal of Electrical & Computer Sciences IJECS-IJENS,October 2012 Vol 12 No 5.
Aleix M. Martínez, and Avinash C. Kak, “PCA versus LDA”, IEEE Transactions on Pattern Analysis and Machine Intelligence(PAMI) , February 2001, Vol. 23, No. 2
Bruce A. Draper, Kyungim Baek, Marian Stewart Bartlett, and J. Ross Beveridge, “Recognizing faces with PCA and ICA”, Journal of Computer Vision and Image Understanding , 11 February 2003,91, pp. 115–137.
Xuechuan Wang, “Feature Extraction and Dimensionality Reduction in Pattern Recognition and Their Application in Speech Recognition”, PhD dissertation, School of Microelectronical Engineering, Griffith University, November 2002.