Let’s summarize. What I’m trying to tell you is that the goal of maximizing these lengths, the projections of our data onto the first direction, can be written this way, where the quantity in here is, up to a constant factor, the variance of the projections of the X vectors onto the W direction. Then, we choose W so as to maximize that variance. This is how we choose the first principal component.

To find the next principal component, we subtract from each data vector its component in the direction of the first principal component. Remember that when you subtract one vector from another, you line up their tails; the difference then points from the tip of the subtracted vector to the tip of the original. This gives us a new set of data vectors that live in the subspace orthogonal to the first principal component. In our example here, this new space is just a straight line orthogonal to the first principal component. If we were working in three dimensions and had found the direction of the first principal component, we’d find the component of each data point along that direction and subtract it from the data point; these new data vectors would all live in the 2D plane perpendicular to the first principal component. Then we’d apply the same procedure to find each subsequent principal component, until we had one for each dimension.
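Since the lecture only gestures at this procedure, here is a minimal sketch of the whole deflation loop in Python. It assumes the data points are the rows of an array `X`; the function name `pca_by_deflation`, the power-iteration inner loop used to find each variance-maximizing direction, and the iteration budget are all illustrative choices, not something fixed by the lecture.

```python
import numpy as np

def pca_by_deflation(X, n_components, n_iter=500, seed=0):
    """Recover principal components one at a time by deflation.

    Assumes X is an (n_samples, n_features) array. The power-iteration
    inner loop is just one way to find the variance-maximizing direction.
    """
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)  # center, so projections measure variance about the mean
    components = []
    for _ in range(n_components):
        # Power iteration: repeatedly applying X^T X (the covariance matrix,
        # up to a constant factor) drives w toward the unit direction that
        # maximizes the variance of the projections w . x_i.
        w = rng.normal(size=X.shape[1])
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            w = X.T @ (X @ w)
            w /= np.linalg.norm(w)
        components.append(w)
        # Deflation: subtract each data vector's component along w, so the
        # remaining data live entirely in the subspace orthogonal to w.
        X = X - np.outer(X @ w, w)
    return np.array(components)
```

Each pass through the outer loop mirrors the step described above: find the direction of maximum variance in whatever subspace the data currently occupy, record it, then project it out before looking for the next one.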