PCA amounts to finding a new and special basis for your dataset. But hang on, what does it mean to find a new basis? I’m guessing that you’ve heard about this and even done it yourself before. But before we move on, let’s briefly review exactly what it means.
Let’s return to thinking about vectors as directions in space, and start in the 2D plane. You know that we can represent a point living in the 2D plane with two coordinates. The coordinates tell you how to get to the point from the origin. For example, the vector 1,2 means go one unit to the right and two units up to get to this point. So, we see that we can decompose the overall movement represented by the vector into two movements: one to the right, and one up. Once we know the directions to go, we can specify how far to go in each direction.
Towards this goal, let’s write down two vectors, i hat and j hat. The first one, we write as 1,0, which means move one unit to the right. The second one, we write as 0,1, which represents moving one unit up. Our original vector is made up of one i hat and two j hats. We can write it as 1 times 1,0 plus 2 times 0,1, or 1,2. In other words, our vector can be written as a linear combination of i hat and j hat.
We call i hat and j hat a set of basis vectors for the 2D plane. A set of vectors is a basis for a space if no vector in the set can be written as a linear combination of the others, and any vector in the space can be written as a linear combination of vectors in the set. If we want to express this using matrix multiplication, we could write it this way: the matrix whose columns are i hat and j hat, multiplied by the column vector 1,2. Now, we see that the number in the top position of this vector means how much of i hat, or how much of the first basis vector, while the number in the bottom position means how much of the second basis vector. So, 1,2 is how we write our vector in the i hat, j hat basis.
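To make the linear combination and the matrix form concrete, here is a minimal NumPy sketch. The variable names (`i_hat`, `j_hat`, `basis`, `coords`) are illustrative choices, not part of the lecture, and the rank check is just one convenient way to confirm that two vectors form a basis for the plane.

```python
import numpy as np

# Standard basis vectors for the 2D plane.
i_hat = np.array([1.0, 0.0])  # move one unit to the right
j_hat = np.array([0.0, 1.0])  # move one unit up

# The vector 1,2 as a linear combination: one i hat plus two j hats.
v = 1 * i_hat + 2 * j_hat

# The same statement as a matrix multiplication: the basis vectors are
# the columns of the matrix, and the coordinates say how much of each
# basis vector to take.
basis = np.column_stack([i_hat, j_hat])
coords = np.array([1.0, 2.0])
v_matrix = basis @ coords

# Two vectors form a basis for the plane when neither is a multiple of
# the other, which is the same as this 2x2 matrix having rank 2.
rank = np.linalg.matrix_rank(basis)

print(v, v_matrix, rank)
```

Both `v` and `v_matrix` hold the same vector, which is the point: the coordinates are just the weights on the basis vectors.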
Although we use i hat and j hat most of the time, there are other bases we can choose for the 2D plane, because there are different ways to get to our point. We could shoot out over here, and then go back over here. Remember that our choice of basis also effectively sets up a coordinate grid for us to represent our vectors in. We are used to seeing a grid made up of square boxes, the usual X, Y coordinate system, with the basis vectors i hat and j hat. But if we use these red and blue vectors as our basis vectors, a more sensible grid would be made up of these parallelograms. Thus, we can represent our original vector with a different combination of basis vectors. We can think of this as another language for writing down our original vector, because we can write down a new set of instructions for creating the original vector. Now, all we need is a way to translate between the two languages: the original language, where we use i hat and j hat, and this one with the blue and red vectors.
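As a rough illustration of writing the same vector in a second basis, the sketch below picks two hypothetical vectors to stand in for the red and blue ones. The transcript does not give their values, so `red = (2, 1)` and `blue = (-1, 1)` are assumptions chosen only to keep the arithmetic simple. Solving a small linear system is one way to find the new coordinates.

```python
import numpy as np

# Our original vector, written in the i hat, j hat basis.
v = np.array([1.0, 2.0])

# Hypothetical stand-ins for the red and blue basis vectors
# (assumed values; the lecture does not specify them).
red = np.array([2.0, 1.0])
blue = np.array([-1.0, 1.0])

# Put the new basis vectors in the columns of a matrix.
new_basis = np.column_stack([red, blue])

# Find how much of red and how much of blue is needed to reach v,
# that is, solve new_basis @ new_coords = v.
new_coords = np.linalg.solve(new_basis, v)

# Following the new instructions lands on the same point.
rebuilt = new_coords[0] * red + new_coords[1] * blue

print(new_coords, rebuilt)
```

With these assumed vectors, the point that is 1,2 in the i hat, j hat language becomes one red plus one blue in the new language; the point itself has not moved.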