PCA, EigenFace and All That


July 27, 2013

PCA (Principal Component Analysis) is one of the most commonly used unsupervised learning algorithms; it is used to compress data, to extract features, and even for dimensionality reduction. It has quite a lot of applications.


What does it do in the first place?

Functions of PCA

Why do we choose PCA over other transformations that turn the original variables into a representation depending on orthogonal bases, say the Fourier Transform?

What about the disadvantages?

Assumptions that PCA makes

Eigenvalues and Eigenvectors

What does PCA do (in a nutshell)?


By modifying the variance of the variables (scaling), it is possible to attribute different importance to them. In this way, prior information or beliefs about the importance of the attributes can be preserved even in the PCA stage.
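As a small illustration of this point (using NumPy and synthetic data, not part of the original post), rescaling one variable inflates its variance and hence its weight in the leading principal component:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two correlated variables with comparable spread.
X = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=500)

def first_pc(data):
    """Unit eigenvector of the sample covariance with the largest eigenvalue."""
    S = np.cov(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(S)  # eigh returns ascending eigenvalues
    return eigvecs[:, -1]                 # direction of maximal variance

X_scaled = X.copy()
X_scaled[:, 0] *= 10.0   # pretend we believe variable 0 matters more

v1 = first_pc(X)         # weights the two variables roughly equally
v2 = first_pc(X_scaled)  # dominated by the rescaled variable
print(v1, v2)
```

Multiplying a column by 10 multiplies its variance by 100, so the first component swings toward that variable even though the underlying correlation structure is unchanged.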

Definition of Principal Components

Given a sample of $n$ observations on a vector of $p$ variables
$$x = (x_1, x_2, \ldots, x_p)$$
define the principal component of the sample by a linear transformation
$$z = a^T x = \sum_{i=1}^p a_i x_i$$
where $a = (a_1, a_2, \ldots, a_p)$ is chosen to maximize the variance of $z$, subject to $\mathrm{cov}[z_k, z_l] = 0$ for $k > l \geq 1$ and to $a^T a = 1$.

How do we derive the coefficients $a$?

The variance is:
$$\mathrm{var}[z] = \langle z^2 \rangle - \langle z \rangle^2 = \sum_{i,j=1}^p a_i a_j \langle x_i x_j \rangle - \sum_{i,j=1}^p a_i a_j \langle x_i \rangle \langle x_j \rangle = \sum_{i,j=1}^p a_i a_j S_{ij}$$
where $S_{ij} = \langle x_i x_j \rangle - \langle x_i \rangle \langle x_j \rangle$ is the covariance matrix of $x$.
To find the $a$ that maximizes $\mathrm{var}[z]$ subject to $a^T a = 1$, let us introduce a Lagrange multiplier $\lambda$. The quantity to maximize becomes
$$a^T S a - \lambda (a^T a - 1)$$
Taking the derivative with respect to $a$ and setting it to zero gives
$$S a - \lambda a = 0 \quad \Rightarrow \quad (S - \lambda I) a = 0$$
Therefore, $a$ is an eigenvector of $S$ with corresponding eigenvalue $\lambda$.
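The derivation above maps directly onto code. A minimal NumPy sketch (on synthetic data, not the post's dataset): compute the sample covariance $S$, take its eigenvectors, and check that the variance of the first component equals the largest eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data with a known covariance structure (illustrative only).
X = rng.multivariate_normal(
    [0.0, 0.0, 0.0],
    [[4.0, 1.5, 0.5],
     [1.5, 2.0, 0.3],
     [0.5, 0.3, 1.0]],
    size=2000,
)

S = np.cov(X, rowvar=False)            # sample covariance matrix S_ij
eigvals, eigvecs = np.linalg.eigh(S)   # eigh, since S is symmetric
order = np.argsort(eigvals)[::-1]      # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

a = eigvecs[:, 0]                      # first principal direction, a^T a = 1
z = (X - X.mean(axis=0)) @ a           # scores of the first principal component

# By the derivation, var[z] = a^T S a = lambda, the largest eigenvalue of S.
print(np.var(z, ddof=1), eigvals[0])
```

The same check holds for every component: the variance of the $k$-th component is the $k$-th largest eigenvalue, and the components are uncorrelated because the eigenvectors of the symmetric matrix $S$ are orthogonal.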

In short, we are interested in representing most of the variation in the data by transforming the original variables into principal components. These components are orthogonal and ordered by the magnitude of their eigenvalues, so that the first few of them explain most of the variation in the original observations.


PCA has a well-known application in the computer vision domain called EigenFace. An eigenface is an eigenvector that acts as a component of the face images themselves. Eigenfaces have been used for face recognition, where the directions of greatest variation are considered the most important. The method was quite successful for some 20 years, although it has since been replaced by other methods. In this implementation, I used a particular subset of the Yale Face Database.
The images that I used are given below:

If we average the faces used for PCA, we get the following mean face:

The eigenfaces generated from the 11 faces are given below:

The eigenface with the most variation (almost half of it) is shown below; note the illumination variation:

The cumulative sum of the first 10 eigenvalues is given below:

As can be seen from the cumulative plot, the top 4 eigenfaces explain 95% of the variance in the faces.
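The eigenface pipeline itself can be sketched in a few lines of NumPy. The block below uses random arrays as stand-ins for the 11 Yale faces (the actual image loading lives in the Notebook), but the steps are the same: mean face, centering, SVD, cumulative explained variance.

```python
import numpy as np

# Random arrays stand in for 11 grayscale faces flattened to 64*64 pixels;
# with the real Yale subset, each row would be one loaded image.
rng = np.random.default_rng(2)
faces = rng.random((11, 64 * 64))

mean_face = faces.mean(axis=0)     # the "average face" shown above
centered = faces - mean_face

# SVD of the centered rows avoids forming the 4096 x 4096 covariance matrix;
# the rows of Vt are the eigenfaces.
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
eigenfaces = Vt
eigvals = s**2 / (len(faces) - 1)  # eigenvalues of the pixel covariance

# Cumulative explained-variance ratio; with the real faces the post reports
# ~95% from the top 4 eigenfaces (random data will not reproduce that number).
explained = np.cumsum(eigvals) / eigvals.sum()
print(explained[:4])
```

Working with the SVD of the $11 \times 4096$ data matrix rather than the pixel covariance is the standard trick for eigenfaces: with far fewer images than pixels, only 10 non-trivial eigenvectors exist anyway.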

The program that I used to generate the images is available, and see the Notebook for the flow of the overall program.
