Variance, okay, we know the formula for that: it's the sum of the squared deviations from the mean, divided by n minus one. But if the mean of the original data coordinates is zero, so is the mean of their projections onto the new direction. So this term I call mu is zero, and this is the thing we're trying to maximize.

Now I'm going to show you how to write this in a more condensed fashion, which you will see if you're reading about PCA. First, notice that each of these quantities is a number, so stacking them gives a vector, and the variance above is just the squared length of that vector, divided by n minus one. Now let's look at the thing inside the double bars. If you don't follow this step, try writing it down for yourself and multiplying it out. We can write it as a matrix, where x_1 is the first row, x_2 is the second row, and x_3 is the third row, times the w vector. Remember that the norm of w is just the length of the w vector, so it's just a number; I'll pull it out over here.

Now let's write this in an even more condensed form. The matrix on the left is just our data matrix, where the rows are the observations, so the number of rows is the number of data points, and the columns are the features, or dimensions. The vector on the right is just our w vector. Another way you might see this written is like this. Be sure to go through these steps on your own if you find you're a little fuzzy on them.

Now we've arrived where we wanted to: this last quantity is called a Rayleigh quotient. It's exactly the quantity we try to maximize in PCA when we're looking for w, the direction of the first principal component.
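To make the condensed form concrete, here is a minimal NumPy sketch (the names and the toy data are mine, not from the lecture): it computes the Rayleigh quotient w^T X^T X w / (w^T w) for a centered data matrix X, and checks that the top eigenvector of X^T X, which is the first principal direction, gives the largest value.

```python
import numpy as np

# Toy data matrix X: rows are observations, columns are features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X = X - X.mean(axis=0)  # center so the mean of each feature is zero

def rayleigh_quotient(X, w):
    # (w^T X^T X w) / (w^T w): proportional to the variance of the
    # projections of the rows of X onto the direction w, since the
    # projections have mean zero.
    return (w @ X.T @ X @ w) / (w @ w)

# The maximizer of the Rayleigh quotient is the eigenvector of X^T X
# with the largest eigenvalue, i.e. the first principal direction.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)  # eigh sorts eigenvalues ascending
w_best = eigvecs[:, -1]

# Any other direction gives a smaller (or equal) quotient.
w_other = np.array([1.0, 0.0, 0.0])
assert rayleigh_quotient(X, w_best) >= rayleigh_quotient(X, w_other)
```

Note that scaling w doesn't change the quotient (the w^T w in the denominator cancels it), which is why PCA can equivalently constrain w to have unit length and just maximize the numerator.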