11 – 03 Training Memory V1

Last time, we defined a model. Next, I want to actually instantiate it and train it using our training data. First, I’ll specify my model hyperparameters. The input and output sizes will both be one, since we’re processing and outputting just one sequence at a time. Then I’ll specify a hidden dimension, which is just the number of features I expect to generate with the RNN layer. I’ll set this to 32, but for a small dataset like this, I may even be able to go smaller. I’ll set n_layers to one for now, so I’m not stacking any RNN layers. I’ll create this RNN and print it out, and I should see the variables that I expect: my RNN layer with an input size and hidden dimension, and a linear layer with an input number of features and an output number.
Before training, I’m defining my loss and optimization functions. In this case, we’re training our model to generate data points that are basically coordinate values. So to compare a predicted and a ground truth point like this, we’ll use a regression loss, because this is just a quantity rather than something like a class probability. For the loss function, I’m going to use mean squared error loss, which measures the average squared difference between the predicted and target points. I’ll use an Adam optimizer, which is standard for recurrent models, passing in my parameters and our learning rate.
Next, I have a function, train, that’s going to take in an RNN, a number of steps to train for, and a parameter that determines how often it prints out loss statistics. At the very start of this function, I’m initializing my hidden state. At first, this is going to be nothing, and it will default to a hidden state of all zeros. Then let’s take a look at our batch loop. This is a little unconventional, but I’m just generating data on the fly here, according to how many steps we will train for. So, in these lines, I’m generating a sequence of 20 sine wave values at a time, as we saw when I generated data at the start.
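The setup described so far can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: the `RNN` class is a hypothetical stand-in for the model defined in the previous video, and the learning rate of 0.01 is an assumed value, since the transcript does not state one.

```python
import torch
from torch import nn


class RNN(nn.Module):
    """Hypothetical stand-in for the model defined in the previous video."""

    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super().__init__()
        self.hidden_dim = hidden_dim
        # batch_first=True means inputs are shaped (batch, seq_len, input_size)
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x, hidden):
        r_out, hidden = self.rnn(x, hidden)
        # Flatten (batch, seq_len, hidden_dim) to (batch * seq_len, hidden_dim)
        r_out = r_out.reshape(-1, self.hidden_dim)
        return self.fc(r_out), hidden


# Hyperparameters as described in the transcript
input_size = 1
output_size = 1
hidden_dim = 32
n_layers = 1

rnn = RNN(input_size, output_size, hidden_dim, n_layers)
print(rnn)  # shows the RNN layer and the linear layer with their sizes

# Regression loss and optimizer; lr=0.01 is an assumed value
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(rnn.parameters(), lr=0.01)
```

Printing the model is a quick way to confirm that the layer sizes match the hyperparameters before any training starts.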
Here, I’m getting my input x and a target y that’s just shifted by one time step into the future. Then I’m converting this data into tensors and unsqueezing the first dimension of our x_tensor to give it a batch size of one. Then I can pass my input tensor into my RNN model. So this is taking in my x input tensor and, at first, my initial hidden state. It produces a predicted output and a new hidden state.
Next is an important part. I want to feed this new hidden state into the RNN as input at the next time step, when we loop around once more. So I’m just copying the values from this produced hidden state into a new variable. This essentially detaches the hidden state from its history, so I will not have to backpropagate through a series of accumulated hidden states. This is what’s going to be passed as input to the RNN at the next time step, or next point in our sequence. Then I have the usual training commands: I zero out any accumulated gradients, calculate the loss, and perform a backpropagation and optimization step. Down here, I have some code to print out the loss and show what our input and predicted outputs are. Finally, this function returns a trained RNN, which will be useful if you want to save a model, for example.
So, let’s run this code. I’ll choose to train the RNN that we defined above for 75 steps, and I’ll print out the result every 15 steps. We can see the mean squared error loss here and the difference between our red input and our blue output values. Recall that we want the blue output values to be one time step in the future when compared to the red ones. It starts out pretty incorrect. Then we can see the loss decreases quite a lot after the first 15 steps, and our blue line is getting closer to our red one. As we train, the blue predicted line gets closer to what we know our target is. At the end of 75 steps, our loss is pretty low, and our blue line looks very similar to what we know our output should be.
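The training loop walked through above might look something like the sketch below. Several details are assumptions rather than the notebook's code: the `RNN` class is a hypothetical stand-in for the earlier model, the sine window for each step (one half-period per step), the learning rate, and the returned `losses` list are illustrative choices, and the plotting is left out. The hidden state is detached with `hidden.detach()`, which is one common way to cut it off from its history.

```python
import numpy as np
import torch
from torch import nn


class RNN(nn.Module):
    """Hypothetical stand-in for the model defined in the previous video."""

    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x, hidden):
        r_out, hidden = self.rnn(x, hidden)
        r_out = r_out.reshape(-1, self.hidden_dim)
        return self.fc(r_out), hidden


def train(rnn, n_steps, print_every, seq_length=20, lr=0.01):
    """Train on sine data generated on the fly; lr and the data window are assumed."""
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(rnn.parameters(), lr=lr)
    hidden = None  # nn.RNN treats None as a hidden state of all zeros
    losses = []

    for step in range(n_steps):
        # seq_length + 1 points, so x and y can be offset by one time step
        time_steps = np.linspace(step * np.pi, (step + 1) * np.pi, seq_length + 1)
        data = np.sin(time_steps).reshape(seq_length + 1, 1)
        x = data[:-1]  # all but the last point
        y = data[1:]   # all but the first point: x shifted one step ahead

        x_tensor = torch.from_numpy(x).float().unsqueeze(0)  # (1, seq_length, 1)
        y_tensor = torch.from_numpy(y).float()               # (seq_length, 1)

        prediction, hidden = rnn(x_tensor, hidden)

        # Keep the values but drop the graph history, so backpropagation
        # does not run through every earlier step
        hidden = hidden.detach()

        loss = criterion(prediction, y_tensor)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        losses.append(loss.item())
        if step % print_every == 0:
            print(f"step {step}: loss {loss.item():.4f}")

    return rnn, losses


torch.manual_seed(0)
trained_rnn, losses = train(RNN(1, 1, 32, 1), n_steps=75, print_every=15)
```

Without the detach step, the second call to `loss.backward()` would try to backpropagate through the previous step's graph, which PyTorch has already freed, so carrying the hidden state forward requires cutting it from its history in some way.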
If we look at the same time step for a red input dot and a blue output dot, we should see that the blue output is shifted one time step into the future. It’s pretty close. You could imagine getting even better performance after training for more steps, or if you added more layers to your RNN. So, in this video, I wanted to demonstrate the basic structure of a simple RNN and show you how to keep track of the hidden state and represent memory over time as you train. You could imagine doing something very similar with data about world temperature or stock prices, which are a little bit more complicated than this, but it would be really interesting to see if you could predict the future given that kind of data.
Okay, so this is just an example. You can check out this code in our program GitHub, which is linked below. I encourage you to play around with these model parameters until you have a good handle on the dimensions of an RNN input and output and on how hyperparameters might change how this model trains. Next, Matt and I will go over an exercise in generating text.
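One way to get a handle on those dimensions is to pass a dummy sequence through a bare `nn.RNN` layer and print the shapes. This is a small sketch with assumed values (a random input, a batch size of one, and one versus two stacked layers), not code from the program repository.

```python
import torch
from torch import nn

seq_length, input_size, hidden_dim = 20, 1, 32

# Dummy input shaped (batch, seq_len, input_size) because batch_first=True
x = torch.randn(1, seq_length, input_size)

shapes = {}
for n_layers in (1, 2):
    layer = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
    out, hidden = layer(x)  # omitting the hidden state defaults it to zeros
    shapes[n_layers] = (tuple(out.shape), tuple(hidden.shape))
    print(n_layers, shapes[n_layers])

# 1 ((1, 20, 32), (1, 1, 32))
# 2 ((1, 20, 32), (2, 1, 32))
```

The output keeps one feature vector of size `hidden_dim` per time step, while the hidden state holds one vector per stacked layer, so only its first dimension grows when layers are added.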