
UFLDL Tutorial: Preprocessing - PCA and Whitening

PCA

Contents

  • 1 Introduction
  • 2 Example and Mathematical Background
  • 3 Rotating the Data
  • 4 Reducing the Data Dimension
  • 5 Recovering an Approximation of the Data
  • 6 Number of components to retain
  • 7 PCA on Images
  • 8 References

Introduction

Principal Components Analysis (PCA) is a dimensionality reduction algorithm that can be used to significantly speed up your unsupervised feature learning algorithm. More importantly, understanding PCA will enable us to later implement whitening, which is an important pre-processing step for many algorithms.

Suppose you are training your algorithm on images. Then the input will be somewhat redundant, because the values of adjacent pixels in an image are highly correlated. Concretely, suppose we are training on 16x16 grayscale image patches. Then the inputs $x \in \mathbb{R}^{256}$ are 256 dimensional vectors, with one feature $x_j$ corresponding to the intensity of each pixel. Because of the correlation between adjacent pixels, PCA will allow us to approximate the input with a much lower dimensional one, while incurring very little error.

Example and Mathematical Background

For our running example, we will use a dataset $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$ with $n = 2$ dimensional inputs, so that $x^{(i)} \in \mathbb{R}^2$. Suppose we want to reduce the data from 2 dimensions to 1. (In practice, we might want to reduce data from 256 to 50 dimensions, say; but using lower dimensional data in our example allows us to visualize the algorithms better.) Here is our dataset:

[Figure: scatter plot of the 2D dataset]

This data has already been pre-processed so that each of the features $x_1$ and $x_2$ have about the same mean (zero) and variance.

For the purpose of illustration, we have also colored each of the points one of three colors, depending on their $x_1$ value; these colors are not used by the algorithm, and are for illustration only.

PCA will find a lower-dimensional subspace onto which to project our data. From visually examining the data, it appears that $u_1$ is the principal direction of variation of the data, and $u_2$ the secondary direction of variation:

[Figure: the dataset with the directions $u_1$ and $u_2$ drawn on top]

I.e., the data varies much more in the direction $u_1$ than $u_2$. To more formally find the directions $u_1$ and $u_2$, we first compute the matrix $\Sigma$ as follows:

$$\Sigma = \frac{1}{m} \sum_{i=1}^{m} \left(x^{(i)}\right)\left(x^{(i)}\right)^T.$$

If $x$ has zero mean, then $\Sigma$ is exactly the covariance matrix of $x$. (The symbol "$\Sigma$", pronounced "Sigma", is the standard notation for denoting the covariance matrix. Unfortunately it looks just like the summation symbol, as in $\sum_{i=1}^n i$; but these are two different things.)

It can then be shown that $u_1$, the principal direction of variation of the data, is the top (principal) eigenvector of $\Sigma$, and $u_2$ is the second eigenvector.

Note: If you are interested in seeing a more formal mathematical derivation/justification of this result, see the CS229 (Machine Learning) lecture notes on PCA (link at bottom of this page). You won't need to do so to follow along this course, however.

You can use standard numerical linear algebra software to find these eigenvectors (see Implementation Notes). Concretely, let us compute the eigenvectors of $\Sigma$, and stack the eigenvectors in columns to form the matrix $U$:

$$U = \begin{bmatrix} | & | & & | \\ u_1 & u_2 & \cdots & u_n \\ | & | & & | \end{bmatrix}$$

Here, $u_1$ is the principal eigenvector (corresponding to the largest eigenvalue), $u_2$ is the second eigenvector, and so on. Also, let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the corresponding eigenvalues.

The vectors $u_1$ and $u_2$ in our example form a new basis in which we can represent the data. Concretely, let $x \in \mathbb{R}^2$ be some training example. Then $u_1^T x$ is the length (magnitude) of the projection of $x$ onto the vector $u_1$.

Similarly, $u_2^T x$ is the magnitude of $x$ projected onto the vector $u_2$.

Rotating the Data

Thus, we can represent $x$ in the $(u_1, u_2)$-basis by computing

$$x_{\rm rot} = U^T x = \begin{bmatrix} u_1^T x \\ u_2^T x \end{bmatrix}$$

(The subscript "rot" comes from the observation that this corresponds to a rotation (and possibly reflection) of the original data.) Let's take the entire training set, and compute $x_{\rm rot}^{(i)} = U^T x^{(i)}$ for every $i$. Plotting this transformed data $x_{\rm rot}$, we get:

[Figure: the training set plotted in the rotated coordinates $x_{\rm rot}$]

This is the training set rotated into the $u_1$, $u_2$ basis. In the general case, $U^T x$ will be the training set rotated into the basis $u_1$, $u_2$, ..., $u_n$.

One of the properties of $U$ is that it is an "orthogonal" matrix, which means that it satisfies $U^T U = U U^T = I$. So if you ever need to go from the rotated vectors $x_{\rm rot}$ back to the original data $x$, you can compute

$$x = U x_{\rm rot},$$

because $U x_{\rm rot} = U U^T x = x$.

Reducing the Data Dimension

We see that the principal direction of variation of the data is the first dimension $x_{{\rm rot},1}$ of this rotated data. Thus, if we want to reduce this data to one dimension, we can set

$$\tilde{x}^{(i)} = x_{{\rm rot},1}^{(i)} = u_1^T x^{(i)} \in \mathbb{R}.$$

More generally, if $x \in \mathbb{R}^n$ and we want to reduce it to a $k$ dimensional representation $\tilde{x} \in \mathbb{R}^k$ (where $k < n$), we would take the first $k$ components of $x_{\rm rot}$, which correspond to the top $k$ directions of variation.

Another way of explaining PCA is that $x_{\rm rot}$ is an $n$ dimensional vector, where the first few components are likely to be large (e.g., in our example, we saw that $x_{{\rm rot},1}^{(i)} = u_1^T x^{(i)}$ takes reasonably large values for most examples $i$), and the later components are likely to be small (e.g., in our example, $x_{{\rm rot},2}^{(i)} = u_2^T x^{(i)}$ was more likely to be small). What PCA does is drop the later (smaller) components of $x_{\rm rot}$, and just approximate them with 0's. Concretely, our definition of $\tilde{x}$ can also be arrived at by using an approximation to $x_{\rm rot}$ where all but the first $k$ components are zeros. In other words, we have:

$$\tilde{x} = \begin{bmatrix} x_{{\rm rot},1} \\ \vdots \\ x_{{\rm rot},k} \\ 0 \\ \vdots \\ 0 \end{bmatrix} \approx \begin{bmatrix} x_{{\rm rot},1} \\ \vdots \\ x_{{\rm rot},k} \\ x_{{\rm rot},k+1} \\ \vdots \\ x_{{\rm rot},n} \end{bmatrix} = x_{\rm rot}$$

In our example, this gives us the following plot of $\tilde{x}$ (using $n = 2$, $k = 1$):

[Figure: plot of $\tilde{x}$ for $n = 2$, $k = 1$]

However, since the final $n - k$ components of $\tilde{x}$ as defined above would always be zero, there is no need to keep these zeros around, and so we define $\tilde{x}$ as a $k$-dimensional vector with just the first $k$ (non-zero) components.

This also explains why we wanted to express our data in the $u_1, u_2, \ldots, u_n$ basis: deciding which components to keep becomes just keeping the top $k$ components. When we do this, we also say that we are "retaining the top $k$ PCA (or principal) components."

Recovering an Approximation of the Data

Now, $\tilde{x} \in \mathbb{R}^k$ is a lower-dimensional, "compressed" representation of the original $x \in \mathbb{R}^n$. Given $\tilde{x}$, how can we recover an approximation $\hat{x}$ to the original value of $x$? From an earlier section, we know that $x = U x_{\rm rot}$. Further, we can think of $\tilde{x}$ as an approximation to $x_{\rm rot}$, where we have set the last $n - k$ components to zeros. Thus, given $\tilde{x} \in \mathbb{R}^k$, we can pad it out with $n - k$ zeros to get our approximation to $x_{\rm rot} \in \mathbb{R}^n$. Finally, we pre-multiply by $U$ to get our approximation to $x$. Concretely, we get

$$\hat{x} = U \begin{bmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_k \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \sum_{i=1}^{k} u_i \tilde{x}_i.$$

The final equality above comes from the definition of $U$ given earlier. (In a practical implementation, we wouldn't actually zero pad $\tilde{x}$ and then multiply by $U$, since that would mean multiplying a lot of things by zeros; instead, we'd just multiply $\tilde{x} \in \mathbb{R}^k$ with the first $k$ columns of $U$ as in the final expression above.) Applying this to our dataset, we get the following plot for $\hat{x}$:

[Figure: plot of the recovered approximation $\hat{x}$]

We are thus using a 1 dimensional approximation to the original dataset.
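
A minimal sketch of this recovery step, assuming xTilde was computed from the first k columns of U as in the sketch above:

xHat = U(:, 1:k) * xTilde;      % n-by-m approximation to the original data x
% mean(sum((x - xHat).^2, 1))   % average squared reconstruction error; small if k captures most of the variance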

If you are training an autoencoder or other unsupervised feature learning algorithm, the running time of your algorithm will depend on the dimension of the input. If you feed $\tilde{x} \in \mathbb{R}^k$ into your learning algorithm instead of $x$, then you'll be training on a lower-dimensional input, and thus your algorithm might run significantly faster. For many datasets, the lower dimensional $\tilde{x}$ representation can be an extremely good approximation to the original, and using PCA this way can significantly speed up your algorithm while introducing very little approximation error.

Number of components to retain

How do we set $k$; i.e., how many PCA components should we retain? In our simple 2 dimensional example, it seemed natural to retain 1 out of the 2 components, but for higher dimensional data, this decision is less trivial. If $k$ is too large, then we won't be compressing the data much; in the limit of $k = n$, we're just using the original data (but rotated into a different basis). Conversely, if $k$ is too small, then we might be using a very bad approximation to the data.

To decide how to set $k$, we will usually look at the percentage of variance retained for different values of $k$. Concretely, if $k = n$, then we have an exact approximation to the data, and we say that 100% of the variance is retained. I.e., all of the variation of the original data is retained. Conversely, if $k = 0$, then we are approximating all the data with the zero vector, and thus 0% of the variance is retained.

More generally, let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the eigenvalues of $\Sigma$ (sorted in decreasing order), so that $\lambda_j$ is the eigenvalue corresponding to the eigenvector $u_j$. Then if we retain $k$ principal components, the percentage of variance retained is given by:

$$\frac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{n} \lambda_j}.$$

In our simple 2D example above, $\lambda_1 = 7.29$ and $\lambda_2 = 0.69$. Thus, by keeping only $k = 1$ principal components, we retained $7.29/(7.29 + 0.69) = 0.913$, or 91.3% of the variance.

A more formal definition of percentage of variance retained is beyond the scope of these notes. However, it is possible to show that $\lambda_j = \frac{1}{m} \sum_{i=1}^{m} \left(x_{{\rm rot},j}^{(i)}\right)^2$. Thus, if $\lambda_j \approx 0$, that shows that $x_{{\rm rot},j}$ is usually near 0 anyway, and we lose relatively little by approximating it with a constant 0. This also explains why we retain the top principal components (corresponding to the larger values of $\lambda_j$) instead of the bottom ones. The top principal components $x_{{\rm rot},j}$ are the ones that are more variable and that take on larger values, and for which we would incur a greater approximation error if we were to set them to zero.

In the case of images, one common heuristic is to choose $k$ so as to retain 99% of the variance. In other words, we pick the smallest value of $k$ that satisfies

$$\frac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{n} \lambda_j} \geq 0.99.$$

Depending on the application, if you are willing to incur some additional error, values in the 90-98% range are also sometimes used. When you describe to others how you applied PCA, saying that you chose $k$ to retain 95% of the variance will also be a much more easily interpretable description than saying that you retained 120 (or whatever other number of) components.

PCA on Images

For PCA to work, usually we want each of the features $x_1, x_2, \ldots, x_n$ to have a similar range of values to the others (and to have a mean close to zero). If you've used PCA on other applications before, you may therefore have separately pre-processed each feature to have zero mean and unit variance, by separately estimating the mean and variance of each feature $x_j$. However, this isn't the pre-processing that we will apply to most types of images. Specifically, suppose we are training our algorithm on natural images, so that $x_j$ is the value of pixel $j$. By "natural images," we informally mean the type of image that a typical animal or person might see over their lifetime.

Note: Usually we use images of outdoor scenes with grass, trees, etc., and cut out small (say 16x16) image patches randomly from these to train the algorithm. But in practice most feature learning algorithms are extremely robust to the exact type of image they are trained on, so most images taken with a normal camera, so long as they aren't excessively blurry and don't have strange artifacts, should work.

When training on natural images, it makes little sense to estimate a separate mean and variance for each pixel, because the statistics in one part of the image should (theoretically) be the same as any other. This property of images is called stationarity.

In detail, in order for PCA to work well, informally we require that (i) the features have approximately zero mean, and (ii) the different features have similar variances to each other. With natural images, (ii) is already satisfied even without variance normalization, and so we won't perform any variance normalization. (If you are training on audio data, say spectrograms, or on text data, say bag-of-word vectors, we will usually not perform variance normalization either.) In fact, PCA is invariant to the scaling of the data, and will return the same eigenvectors regardless of the scaling of the input. More formally, if you multiply each feature vector $x$ by some positive number (thus scaling every feature in every training example by the same number), PCA's output eigenvectors will not change.

So, we won't use variance normalization. The only normalization we need to perform then is mean normalization, to ensure that the features have a mean around zero. Depending on the application, very often we are not interested in how bright the overall input image is. For example, in object recognition tasks, the overall brightness of the image doesn't affect what objects there are in the image. More formally, we are not interested in the mean intensity value of an image patch; thus, we can subtract out this value, as a form of mean normalization.

Concretely, if $x^{(i)} \in \mathbb{R}^{n}$ are the (grayscale) intensity values of a 16x16 image patch ($n = 256$), we might normalize the intensity of each image $x^{(i)}$ as follows:

$$\mu^{(i)} := \frac{1}{n} \sum_{j=1}^{n} x^{(i)}_j$$

$$x^{(i)}_j := x^{(i)}_j - \mu^{(i)}, \quad \text{for all } j$$

Note that the two steps above are done separately for each image $x^{(i)}$, and that $\mu^{(i)}$ here is the mean intensity of the image $x^{(i)}$. In particular, this is not the same thing as estimating a mean value separately for each pixel $x_j$.

If you are training your algorithm on images other than natural images (for example, images of handwritten characters, or images of single isolated objects centered against a white background), other types of normalization might be worth considering, and the best choice may be application dependent. But when training on natural images, using the per-image mean normalization method as given in the equations above would be a reasonable default.

References

http://cs229.stanford.edu

Whitening

Contents

  • 1 Introduction
  • 2 2D example
  • 3 ZCA Whitening
  • 4 Regularization

Introduction

We have used PCA to reduce the dimension of the data. There is a closely related preprocessing step called whitening (or, in some of the literature, sphering) which is needed for some algorithms. If we are training on images, the raw input is redundant, since adjacent pixel values are highly correlated. The goal of whitening is to make the input less redundant; more formally, our desiderata are that our learning algorithm sees a training input where (i) the features are less correlated with each other, and (ii) the features all have the same variance.

2D example

We will first describe whitening using our previous 2D example. We will then describe how this can be combined with smoothing, and finally how to combine this with PCA.

How can we make our input features uncorrelated with each other? We had already done this when computing $x_{\rm rot}^{(i)} = U^T x^{(i)}$. Repeating our previous figure, our plot for $x_{\rm rot}$ was:

[Figure: plot of $x_{\rm rot}$, repeated from the earlier section]

The covariance matrix of this data is given by:

$$\begin{bmatrix} 7.29 & 0 \\ 0 & 0.69 \end{bmatrix}$$

(Note: Technically, many of the statements in this section about the "covariance" will be true only if the data has zero mean. In the rest of this section, we will take this assumption as implicit in our statements. However, even if the data's mean isn't exactly zero, the intuitions we're presenting here still hold true, and so this isn't something that you should worry about.)

It is no accident that the diagonal values are $\lambda_1$ and $\lambda_2$. Further, the off-diagonal entries are zero; thus, $x_{{\rm rot},1}$ and $x_{{\rm rot},2}$ are uncorrelated, satisfying one of our desiderata for whitened data (that the features be less correlated).

To make each of our input features have unit variance, we can simply rescale each feature $x_{{\rm rot},i}$ by $1/\sqrt{\lambda_i}$. Concretely, we define our whitened data $x_{\rm PCAwhite} \in \mathbb{R}^n$ as follows:

$$x_{{\rm PCAwhite},i} = \frac{x_{{\rm rot},i}}{\sqrt{\lambda_i}}.$$

Plotting $x_{\rm PCAwhite}$, we get:

[Figure: plot of $x_{\rm PCAwhite}$]

This data now has covariance equal to the identity matrix $I$. We say that $x_{\rm PCAwhite}$ is our PCA whitened version of the data: the different components of $x_{\rm PCAwhite}$ are uncorrelated and have unit variance.

Whitening combined with dimensionality reduction. If you want to have data that is whitened and which is lower dimensional than the original input, you can also optionally keep only the top $k$ components of $x_{\rm PCAwhite}$. When we combine PCA whitening with regularization (described later), the last few components of $x_{\rm PCAwhite}$ will be nearly zero anyway, and thus can safely be dropped.

ZCA Whitening

Finally, it turns out that this way of getting the data to have covariance identity $I$ isn't unique. Concretely, if $R$ is any orthogonal matrix, so that it satisfies $R R^T = R^T R = I$ (less formally, if $R$ is a rotation/reflection matrix), then $R \, x_{\rm PCAwhite}$ will also have identity covariance. In ZCA whitening, we choose $R = U$. We define

$$x_{\rm ZCAwhite} = U x_{\rm PCAwhite}$$

Plotting $x_{\rm ZCAwhite}$, we get:

[Figure: plot of $x_{\rm ZCAwhite}$]

It can be shown that out of all possible choices for $R$, this choice of rotation causes $x_{\rm ZCAwhite}$ to be as close as possible to the original input data $x$.

When using ZCA whitening (unlike PCA whitening), we usually keep all $n$ dimensions of the data, and do not try to reduce its dimension.

Regularization

When implementing PCA whitening or ZCA whitening in practice, sometimes some of the eigenvalues $\lambda_i$ will be numerically close to 0, and thus the scaling step where we divide by $\sqrt{\lambda_i}$ would involve dividing by a value close to zero; this may cause the data to blow up (take on large values) or otherwise be numerically unstable. In practice, we therefore implement this scaling step using a small amount of regularization, and add a small constant $\epsilon$ to the eigenvalues before taking their square root and inverse:

$$x_{{\rm PCAwhite},i} = \frac{x_{{\rm rot},i}}{\sqrt{\lambda_i + \epsilon}}.$$

When $x$ takes values around $[-1, 1]$, a value of $\epsilon \approx 10^{-5}$ might be typical.

For the case of images, adding $\epsilon$ here also has the effect of slightly smoothing (or low-pass filtering) the input image. This also has a desirable effect of removing aliasing artifacts caused by the way pixels are laid out in an image, and can improve the features learned (details are beyond the scope of these notes).

ZCA whitening is a form of pre-processing of the data that maps it from $x$ to $x_{\rm ZCAwhite}$. It turns out that this is also a rough model of how the biological eye (the retina) processes images. Specifically, as your eye perceives images, most adjacent "pixels" in your eye will perceive very similar values, since adjacent parts of an image tend to be highly correlated in intensity. It is thus wasteful for your eye to have to transmit every pixel separately (via your optic nerve) to your brain. Instead, your retina performs a decorrelation operation (this is done via retinal neurons that compute a function called "on center, off surround/off center, on surround") which is similar to that performed by ZCA. This results in a less redundant representation of the input image, which is then transmitted to your brain.

Implementing PCA/Whitening

In this section, we summarize the PCA, PCA whitening and ZCA whitening algorithms, and also describe how you can implement them using efficient linear algebra libraries.

First, we need to ensure that the data has (approximately) zero mean. For natural images, we achieve this (approximately) by computing the mean of each image patch and subtracting it from that patch. In Matlab, we can do this by using

avg = mean(x, 1);     % Compute the mean pixel intensity value separately for each patch. 
x = x - repmat(avg, size(x, 1), 1);
      

Next, we need to compute $\Sigma = \frac{1}{m} \sum_{i=1}^{m} \left(x^{(i)}\right)\left(x^{(i)}\right)^T$. If you're implementing this in Matlab (or even if you're implementing this in C++, Java, etc., but have access to an efficient linear algebra library), doing it as an explicit sum is inefficient. Instead, we can compute this in one fell swoop as

sigma = x * x' / size(x, 2);
      

(Check the math yourself for correctness.) Here, we assume that x is a data structure that contains one training example per column (so, x is an $n$-by-$m$ matrix).

Next, PCA computes the eigenvectors of Σ. One could do this using the Matlab eig function. However, because Σ is a symmetric positive semi-definite matrix, it is more numerically reliable to do this using the svd function. Concretely, if you implement

[U,S,V] = svd(sigma);
      

then the matrix U will contain the eigenvectors of Sigma (one eigenvector per column, sorted in order from top to bottom eigenvector), and the diagonal entries of the matrix S will contain the corresponding eigenvalues (also sorted in decreasing order). The matrix V will be equal to transpose of U, and can be safely ignored.

(Note: The svd function actually computes the singular vectors and singular values of a matrix, which for the special case of a symmetric positive semi-definite matrix---which is all that we're concerned with here---is equal to its eigenvectors and eigenvalues. A full discussion of singular vectors vs. eigenvectors is beyond the scope of these notes.)

Finally, you can compute $x_{\rm rot}$ and $\tilde{x}$ as follows:

xRot = U' * x;          % rotated version of the data. 
xTilde = U(:,1:k)' * x; % reduced dimension representation of the data, 
                        % where k is the number of eigenvectors to keep
      

This gives your PCA representation of the data in terms of $x_{\rm rot}$ and $\tilde{x}$. Incidentally, if x is an $n$-by-$m$ matrix containing all your training data, this is a vectorized implementation, and the expressions above work too for computing xRot and $\tilde{x}$ for your entire training set all in one go. The resulting xRot and $\tilde{x}$ will have one column corresponding to each training example.

To compute the PCA whitened data $x_{\rm PCAwhite}$, use

xPCAwhite = diag(1./sqrt(diag(S) + epsilon)) * U' * x;
      

Since S's diagonal contains the eigenvalues $\lambda_i$, this turns out to be a compact way of computing $x_{{\rm PCAwhite},i} = \frac{x_{{\rm rot},i}}{\sqrt{\lambda_i}}$ simultaneously for all $i$.

Finally, you can also compute the ZCA whitened data $x_{\rm ZCAwhite}$ as:

xZCAwhite = U * diag(1./sqrt(diag(S) + epsilon)) * U' * x;
      

Exercise:PCA in 2D

Contents

  • 1 PCA, PCA whitening and ZCA whitening in 2D
    • 1.1 Step 0: Load data
    • 1.2 Step 1: Implement PCA
      • 1.2.1 Step 1a: Finding the PCA basis
      • 1.2.2 Step 1b: Check xRot
    • 1.3 Step 2: Dimension reduce and replot
    • 1.4 Step 3: PCA Whitening
    • 1.5 Step 4: ZCA Whitening

PCA, PCA whitening and ZCA whitening in 2D

In this exercise you will implement PCA, PCA whitening and ZCA whitening, as described in the earlier sections of this tutorial, and generate the images shown in the earlier sections yourself. You will build on the starter code that has been provided at pca_2d.zip. You need only write code at the places indicated by "YOUR CODE HERE" in the files. The only file you need to modify is pca_2d.m. Implementing this exercise will make the next exercise significantly easier to understand and complete.

Step 0: Load data

The starter code contains code to load 45 2D data points. When plotted using the scatter function, the results should look like the following:

[Figure: scatter plot of the 45 raw 2D data points]

Step 1: Implement PCA

In this step, you will implement PCA to obtain xRot, the matrix in which the data is "rotated" to the basis comprising $u_1, u_2, \ldots, u_n$, made up of the principal components. As mentioned in the implementation notes, you should make use of MATLAB's svd function here.

Step 1a: Finding the PCA basis

Find $u_1$ and $u_2$, and draw two lines in your figure to show the resulting basis on top of the given data points. You may find it useful to use MATLAB's hold on and hold off functions. (After calling hold on, plotting functions such as plot will draw the new data on top of the previously existing figure rather than erasing and replacing it; and hold off turns this off.) You can use plot([x1,x2], [y1,y2], '-') to draw a line between (x1,y1) and (x2,y2). Your figure should look like this:

[Figure: the data points with the two basis vectors $u_1$ and $u_2$ drawn on top]

If you are doing this in Matlab, you will probably get a plot that's identical to ours. However, eigenvectors are defined only up to a sign. I.e., instead of returning $u_1$ as the first eigenvector, Matlab/Octave could just as easily have returned $-u_1$, and similarly instead of $u_2$ Matlab/Octave could have returned $-u_2$. So if you wound up with one or both of the eigenvectors pointing in a direction opposite (180 degrees difference) from what's shown above, that's okay too.

Step 1b: Check xRot

Compute xRot, and use the scatter function to check that xRot looks as it should, which should be something like the following:

[Figure: scatter plot of xRot]

Because Matlab/Octave could have returned $-u_1$ and/or $-u_2$ instead of $u_1$ and $u_2$, it's also possible that you might have gotten a figure which is "flipped" or "reflected" along the $x$- and/or $y$-axis; a flipped/reflected version of this figure is also a completely correct result.

Step 2: Dimension reduce and replot

In the next step, set k, the number of components to retain, to be 1 (we have already done this for you). Compute the resulting xHat and plot the results. You should get the following (this figure should not be flipped along the $x$- or $y$-axis):

[Figure: scatter plot of xHat]

Step 3: PCA Whitening

Implement PCA whitening using the formula from the notes. Plot xPCAWhite, and verify that it looks like the following (a figure that is flipped/reflected on either/both axes is also correct):

[Figure: scatter plot of xPCAWhite]

Step 4: ZCA Whitening

Implement ZCA whitening and plot the results. The results should look like the following (this should not be flipped/reflected along the $x$- or $y$-axis):

[Figure: scatter plot of xZCAWhite]

Contents

  • Step 0: Load data
  • Step 1a: Implement PCA to obtain U
  • Step 1b: Compute xRot, the projection on to the eigenbasis
  • Step 2: Reduce the number of dimensions from 2 to 1.
  • Step 3: PCA Whitening
  • Step 4: ZCA Whitening
  • Congratulations! When you have reached this point, you are done!
close all

%%================================================================
      

Step 0: Load data

We have provided the code to load data from pcaData.txt into x.
x is a 2 * 45 matrix, where the kth column x(:,k) corresponds to
the kth data point. You do not need to change the code below.
x = load('pcaData.txt','-ascii');
figure(1);
scatter(x(1, :), x(2, :));
title('Raw data');


%%================================================================
      

Step 1a: Implement PCA to obtain U

Implement PCA to obtain the rotation matrix U, which is the eigenbasis
of sigma.
% -------------------- YOUR CODE HERE --------------------
u = zeros(size(x, 1)); % You need to compute this
[n m] = size(x);
x = x-repmat(mean(x,2),1,m); % preprocessing: make each feature zero-mean
sigma = (1.0/m)*x*x';
[u s v] = svd(sigma);


% --------------------------------------------------------
hold on
plot([0 u(1,1)], [0 u(2,1)]); % plot the first eigenvector
plot([0 u(1,2)], [0 u(2,2)]); % plot the second eigenvector
scatter(x(1, :), x(2, :));
hold off

%%================================================================
      

Step 1b: Compute xRot, the projection on to the eigenbasis

Now, compute xRot by projecting the data on to the basis defined
by U. Visualize the points by performing a scatter plot.      
% -------------------- YOUR CODE HERE --------------------
xRot = zeros(size(x)); % You need to compute this
xRot = u'*x;


% --------------------------------------------------------

% Visualise the rotated data with a scatter plot; the points should now be
% spread mainly along the horizontal axis.
figure(2);
scatter(xRot(1, :), xRot(2, :));
title('xRot');

%%================================================================
      

Step 2: Reduce the number of dimensions from 2 to 1.

Compute xRot again (this time projecting to 1 dimension).
Then, compute xHat by projecting xRot back onto the original axes
to see the effect of dimension reduction.
% -------------------- YOUR CODE HERE --------------------
k = 1; % Use k = 1 and project the data onto the first eigenbasis
xHat = zeros(size(x)); % You need to compute this
xHat = u*([u(:,1),zeros(n,1)]'*x); % keep only the first component, then rotate back to the original basis


% --------------------------------------------------------
figure(3);
scatter(xHat(1, :), xHat(2, :));
title('xHat');


%%================================================================
      

Step 3: PCA Whitening

Compute xPCAWhite and plot the results.
epsilon = 1e-5;
% -------------------- YOUR CODE HERE --------------------
xPCAWhite = zeros(size(x)); % You need to compute this
xPCAWhite = diag(1./sqrt(diag(s)+epsilon))*u'*x;



% --------------------------------------------------------
figure(4);
scatter(xPCAWhite(1, :), xPCAWhite(2, :));
title('xPCAWhite');

%%================================================================
      

Step 4: ZCA Whitening

Compute xZCAWhite and plot the results.
% -------------------- YOUR CODE HERE --------------------
xZCAWhite = zeros(size(x)); % You need to compute this
xZCAWhite = u*diag(1./sqrt(diag(s)+epsilon))*u'*x;

% --------------------------------------------------------
figure(5);
scatter(xZCAWhite(1, :), xZCAWhite(2, :));
title('xZCAWhite');
      

Congratulations! When you have reached this point, you are done!

You can now move onto the next PCA exercise. :)      

Exercise:PCA and Whitening

Contents

  • 1 PCA and Whitening on natural images
    • 1.1 Step 0: Prepare data
      • 1.1.1 Step 0a: Load data
      • 1.1.2 Step 0b: Zero mean the data
    • 1.2 Step 1: Implement PCA
      • 1.2.1 Step 1a: Implement PCA
      • 1.2.2 Step 1b: Check covariance
    • 1.3 Step 2: Find number of components to retain
    • 1.4 Step 3: PCA with dimension reduction
    • 1.5 Step 4: PCA with whitening and regularization
      • 1.5.1 Step 4a: Implement PCA with whitening and regularization
      • 1.5.2 Step 4b: Check covariance
    • 1.6 Step 5: ZCA whitening

PCA and Whitening on natural images

In this exercise, you will implement PCA, PCA whitening and ZCA whitening, and apply them to image patches taken from natural images.

You will build on the MATLAB starter code which we have provided in pca_exercise.zip. You need only write code at the places indicated by "YOUR CODE HERE" in the files. The only file you need to modify is pca_gen.m.

Step 0: Prepare data

Step 0a: Load data

The starter code contains code to load a set of natural images and sample 12x12 patches from them. The raw patches will look something like this:

[Figure: sample of raw 12x12 image patches]

These patches are stored as column vectors x^{(i)} (each in R^144) in the 144 x 10000 matrix x.

Step 0b: Zero mean the data

First, for each image patch, compute the mean pixel value and subtract it from that patch, thus centering the patch around zero. You should compute a different mean value for each image patch.
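
A minimal sketch of this step, assuming (as in the starter code) that each patch is a column of x:

avg = mean(x, 1);                     % 1 x 10000: mean pixel value of each patch
x = x - repmat(avg, size(x, 1), 1);   % subtract each patch's own mean (bsxfun(@minus, x, avg) also works)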

Step 1: Implement PCA

Step 1a: Implement PCA

In this step, you will implement PCA to obtain xrot, the matrix in which the data is "rotated" to the basis comprising the principal components (i.e. the eigenvectors of Σ). Note that in this part of the exercise, you should not whiten the data.
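
A minimal sketch, assuming x has already been zero-meaned as in Step 0b (variable names follow the exercise text):

sigma = x * x' / size(x, 2);   % 144 x 144 covariance matrix of the patches
[u, s, v] = svd(sigma);        % columns of u are the eigenvectors (principal directions)
xrot = u' * x;                 % data expressed in the eigenbasis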

Step 1b: Check covariance

To verify that your implementation of PCA is correct, you should check the covariance matrix for the rotated data xrot. PCA guarantees that the covariance matrix for the rotated data is a diagonal matrix (a matrix with non-zero entries only along the main diagonal). Implement code to compute the covariance matrix and verify this property. One way to do this is to compute the covariance matrix, and visualise it using the MATLAB command imagesc. The image should show a coloured diagonal line against a blue background. For this dataset, because of the range of the diagonal entries, the diagonal line may not be apparent, so you might get a figure like the one shown below, but this trick of visualizing using imagesc will come in handy later in this exercise.
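
A possible version of this check, continuing the sketch from Step 1a:

covar = xrot * xrot' / size(xrot, 2);   % covariance of the rotated data
figure; imagesc(covar);                 % off-diagonal entries should be (close to) zero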

[Figure: visualisation of the covariance matrix of xrot]

Step 2: Find number of components to retain

Next, choose k, the number of principal components to retain. Pick k to be as small as possible, but so that at least 99% of the variance is retained. In the step after this, you will discard all but the top k principal components, reducing the dimension of the original data to k.
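
One possible way to pick k, sketched below using the singular values from the SVD in Step 1a (these equal the eigenvalues of the covariance matrix):

lambda = diag(s);                                    % eigenvalues, in decreasing order
k = find(cumsum(lambda) / sum(lambda) >= 0.99, 1);   % smallest k retaining at least 99% of the variance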

Step 3: PCA with dimension reduction

Now that you have found k, compute \tilde{x}, the reduced-dimension representation of the data. This gives you a representation of each image patch as a k dimensional vector instead of a 144 dimensional vector. If you are training a sparse autoencoder or other algorithm on this reduced-dimensional data, it will run faster than if you were training on the original 144 dimensional data.

To see the effect of dimension reduction, go back from \tilde{x} to produce the matrix \hat{x}, the dimension-reduced data but expressed in the original 144 dimensional space of image patches. Visualise \hat{x} and compare it to the raw data, x. You will observe that there is little loss due to throwing away the principal components that correspond to dimensions with low variation. For comparison, you may also wish to generate and visualise \hat{x} for when only 90% of the variance is retained.

[Figures: raw images; PCA dimension-reduced images (99% variance); PCA dimension-reduced images (90% variance)]
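
A minimal sketch of the reduction and reconstruction described above, reusing u and k from the earlier steps (the name xTilde is just for illustration):

xTilde = u(:, 1:k)' * x;       % k x 10000 reduced-dimension representation
xHat   = u(:, 1:k) * xTilde;   % back in the original 144-dimensional patch space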

Step 4: PCA with whitening and regularization

Step 4a: Implement PCA with whitening and regularization

Now implement PCA with whitening and regularization to produce the matrix xPCAWhite. Use the following parameter value:

epsilon = 0.1
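
A minimal sketch of this step, assuming u and s from the SVD of the covariance matrix:

epsilon = 0.1;
xPCAWhite = diag(1 ./ sqrt(diag(s) + epsilon)) * u' * x;   % rescale each rotated coordinate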
      

Step 4b: Check covariance

Similar to using PCA alone, PCA with whitening also results in processed data that has a diagonal covariance matrix. However, unlike PCA alone, whitening additionally ensures that the diagonal entries are equal to 1, i.e. that the covariance matrix is the identity matrix.

That would be the case if you were doing whitening alone with no regularization. However, in this case you are whitening with regularization, to avoid numerical/etc. problems associated with small eigenvalues. As a result of this, some of the diagonal entries of the covariance of your xPCAWhite will be smaller than 1.

To verify that your implementation of PCA whitening with and without regularization is correct, you can check these properties. Implement code to compute the covariance matrix and verify this property. (To check the result of PCA without whitening, simply set epsilon to 0, or close to 0, say 1e-10). As earlier, you can visualise the covariance matrix with imagesc. When visualised as an image, for PCA whitening without regularization you should see a red line across the diagonal (corresponding to the one entries) against a blue background (corresponding to the zero entries); for PCA whitening with regularization you should see a red line that slowly turns blue across the diagonal (corresponding to the 1 entries slowly becoming smaller).
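
A possible check, along the same lines as Step 1b:

covar = xPCAWhite * xPCAWhite' / size(xPCAWhite, 2);
figure; imagesc(covar);   % near-identity when epsilon is ~0; with regularization the diagonal decays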

[Figures: covariance for PCA whitening with regularization; covariance for PCA whitening without regularization]

Step 5: ZCA whitening

Now implement ZCA whitening to produce the matrix xZCAWhite. Visualize xZCAWhite and compare it to the raw data, x. You should observe that whitening results in, among other things, enhanced edges. Try repeating this with epsilon set to 1, 0.1, and 0.01, and see what you obtain. The example shown below (left image) was obtained with epsilon = 0.1.
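
A minimal sketch, reusing u and xPCAWhite from Step 4 (display_network and randsel come from the starter code):

xZCAWhite = u * xPCAWhite;                        % rotate the whitened data back into the original basis
figure; display_network(xZCAWhite(:, randsel));   % compare against the raw patches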

[Figures: ZCA whitened images; raw images]

Contents

  • Step 0a: Load data
  • Step 0b: Zero-mean the data (by row)
  • Step 1a: Implement PCA to obtain xRot
  • Step 1b: Check your implementation of PCA
  • Step 2: Find k, the number of components to retain
  • Step 3: Implement PCA with dimension reduction
  • Step 4a: Implement PCA with whitening and regularisation
  • Step 4b: Check your implementation of PCA whitening
  • Step 5: Implement ZCA whitening

Step 0a: Load data

Here we provide the code to load natural image data into x.
x will be a 144 * 10000 matrix, where the kth column x(:, k) corresponds to
the raw image data from the kth 12x12 image patch sampled.
You do not need to change the code below.      
x = sampleIMAGESRAW();
figure('name','Raw images');
randsel = randi(size(x,2),204,1); % A random selection of samples for visualization
display_network(x(:,randsel)); % why can x be displayed even though it contains negative values?

%%================================================================
      

Step 0b: Zero-mean the data (by row)

You can make use of the mean and repmat/bsxfun functions.      
% -------------------- YOUR CODE HERE --------------------
x = x-repmat(mean(x,1),size(x,1),1); % subtract each patch's (column's) own mean
%x = x-repmat(mean(x,2),1,size(x,2)); % alternative: per-feature (row) mean, not used here

%%================================================================
      

Step 1a: Implement PCA to obtain xRot

Implement PCA to obtain xRot, the matrix in which the data is expressed
with respect to the eigenbasis of sigma, which is the matrix U.      
% -------------------- YOUR CODE HERE --------------------
xRot = zeros(size(x)); % You need to compute this
[n m] = size(x);
sigma = (1.0/m)*x*x';
[u s v] = svd(sigma);
xRot = u'*x;


%%================================================================
      

Step 1b: Check your implementation of PCA

The covariance matrix for the data expressed with respect to the basis U
should be a diagonal matrix with non-zero entries only along the main
diagonal. We will verify this here.
Write code to compute the covariance matrix, covar.
When visualised as an image, you should see a straight line across the
diagonal (non-zero entries) against a blue background (zero entries).      
% -------------------- YOUR CODE HERE --------------------
covar = zeros(size(x, 1)); % You need to compute this
covar = (1./m)*xRot*xRot';

% Visualise the covariance matrix. You should see a line across the
% diagonal against a blue background.
figure('name','Visualisation of covariance matrix');
imagesc(covar);

%%================================================================
      

Step 2: Find k, the number of components to retain

Write code to determine k, the number of components to retain in order
to retain at least 99% of the variance.      
% -------------------- YOUR CODE HERE --------------------
k = 0; % Set k accordingly
ss = diag(s); % eigenvalues of sigma, in decreasing order
% cumsum(ss)/sum(ss) gives the fraction of variance retained by the first
% 1, 2, 3, ... components; pick the smallest k for which it reaches 0.99.
k = find(cumsum(ss)/sum(ss) >= 0.99, 1);

%%================================================================
      

Step 3: Implement PCA with dimension reduction

Now that you have found k, you can reduce the dimension of the data by
discarding the remaining dimensions. In this way, you can represent the
data in k dimensions instead of the original 144, which will save you
computational time when running learning algorithms on the reduced
representation.      
Following the dimension reduction, invert the PCA transformation to produce
the matrix xHat, the dimension-reduced data with respect to the original basis.
Visualise the data and compare it to the raw data. You will observe that
there is little loss due to throwing away the principal components that
correspond to dimensions with low variation.      
% -------------------- YOUR CODE HERE --------------------
xHat = zeros(size(x));  % You need to compute this
xHat = u*[u(:,1:k)'*x;zeros(n-k,m)]; % project onto the top k components, then rotate back to the original basis

% Visualise the data, and compare it to the raw data
% You should observe that the raw and processed data are of comparable quality.
% For comparison, you may wish to generate a PCA reduced image which
% retains only 90% of the variance.

figure('name',['PCA processed images ',sprintf('(%d / %d dimensions)', k, size(x, 1)),'']);
display_network(xHat(:,randsel));
figure('name','Raw images');
display_network(x(:,randsel));

%%================================================================
      

Step 4a: Implement PCA with whitening and regularisation

Implement PCA with whitening and regularisation to produce the matrix
xPCAWhite.      
epsilon = 0.1;
xPCAWhite = zeros(size(x));

% -------------------- YOUR CODE HERE --------------------
xPCAWhite = diag(1./sqrt(diag(s)+epsilon))*u'*x;
figure('name','PCA whitened images');
display_network(xPCAWhite(:,randsel));

%%================================================================
      

Step 4b: Check your implementation of PCA whitening

Check your implementation of PCA whitening with and without regularisation.
PCA whitening without regularisation results a covariance matrix
that is equal to the identity matrix. PCA whitening with regularisation
results in a covariance matrix with diagonal entries starting close to
1 and gradually becoming smaller. We will verify these properties here.
Write code to compute the covariance matrix, covar.      
Without regularisation (set epsilon to 0 or close to 0),
when visualised as an image, you should see a red line across the
diagonal (one entries) against a blue background (zero entries).
With regularisation, you should see a red line that slowly turns
blue across the diagonal, corresponding to the one entries slowly
becoming smaller.      
% -------------------- YOUR CODE HERE --------------------
covar = (1./m)*xPCAWhite*xPCAWhite';

% Visualise the covariance matrix. You should see a red line across the
% diagonal against a blue background.
figure('name','Visualisation of covariance matrix');
imagesc(covar);

%%================================================================
      

Step 5: Implement ZCA whitening

Now implement ZCA whitening to produce the matrix xZCAWhite.
Visualise the data and compare it to the raw data. You should observe
that whitening results in, among other things, enhanced edges.      
xZCAWhite = zeros(size(x));

% -------------------- YOUR CODE HERE --------------------
xZCAWhite = u*xPCAWhite;

% Visualise the data, and compare it to the raw data.
% You should observe that the whitened images have enhanced edges.
figure('name','ZCA whitened images');
display_network(xZCAWhite(:,randsel));
figure('name','Raw images');
display_network(x(:,randsel));
      

Published with MATLAB® 7.11
