To introduce you to RNNs in PyTorch, I’ve created a notebook that will show you how to do simple time series prediction with an RNN. Specifically, we’ll look at some data and see if we can create an RNN that accurately predicts the next data point given a current data point, and this is easiest to see in an example. So, let’s get started. I’m importing our usual resources, and then I’m going to create some simple input and target training data. A classic example is to use a sine wave as input because it has enough variance and shape to make an interesting task, but it’s also very predictable. So, I want to create a sample input and target sequence of data points of length 20, which I specify here as the sequence length. Recall that RNNs are meant to work with sequential data, and the sequence length is just the length of a sequence that the network will look at as input. Often, the sequence length indicates the number of words in a sentence, or just some length of numerical data, as is the case here. So, in these two lines, I’m generating the start of a sine wave over time steps in the range from zero to pi. First, I create a number of points equal to the sequence length, 20, plus one. Then I reshape my sine wave data to give it one extra dimension, the input size, which is just going to be one. Then, to create an input and target sequence of the length I want, I say that the input x is equal to all but the last point in the data, and the target y is equal to all but the first point. So, x and y should each contain 20 data points and have an input size of one. Finally, I display this data on the same x-axis: the input x is in red, and the target y, shifted over by one, is in blue. If we look at a single time step, y is basically x shifted one time step into the future, and that’s exactly what we want. So, now we have our training data, and the next step is defining an RNN to learn from this data. 
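The data-generation steps described above might look something like this. This is a sketch, not the notebook code itself, and the variable names (`seq_length`, `time_steps`, `x`, `y`) are my own guesses:

```python
import numpy as np

seq_length = 20

# seq_length + 1 evenly spaced time steps from 0 to pi
time_steps = np.linspace(0, np.pi, seq_length + 1)
data = np.sin(time_steps)

# add an extra dimension for the input size, which is just 1
data = data.reshape((seq_length + 1, 1))

x = data[:-1]  # input: all but the last data point
y = data[1:]   # target: all but the first (x shifted one step ahead)

print(x.shape, y.shape)  # (20, 1) (20, 1)
```

Plotting `x` and `y` against the same time steps would show the blue target curve shifted one step ahead of the red input curve, as described.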
We can define an RNN as usual, which is to say as a class using PyTorch’s nn library. The syntax will look similar to how we’ve defined CNNs in the past. Let’s actually click on the RNN documentation to read about the parameters that our recurrent layer takes in as input. So, here’s the documentation for an RNN layer. We can see that this layer is responsible for calculating a hidden state based on its inputs. To define a layer like this, we have these parameters: an input size, a hidden size, a number of layers, and a few other arguments. The input size is just the number of input features, and in our specific case we’re going to have inputs that are sequences of 20 values, each with an input size of one feature. This is like when we thought about the depth of an input image when we made CNNs. Next, we have a hidden size that defines how many features the output of the RNN, and its hidden state, will have. We also have a number of layers, which, if it’s greater than one, just means we’re going to stack RNNs on top of each other. Lastly, I want you to pay attention to this batch_first parameter. If it is true, the input and output tensors that we provide will have the batch size as the first dimension, which will be true in most cases that we go through. So, this is how you define an RNN layer, and later, in the forward function, we’ll see that it takes in an input and an initial hidden state, and it produces an output and a new hidden state. Back to our notebook. Here, I’m defining an RNN layer, self.rnn. This RNN takes in an input size and a hidden dimension that defines how many features the output of the RNN will have. Then it takes in a number of layers, which allows you to create a stacked RNN if you want; this value is typically kept between one and three. Finally, I’m setting batch_first to true because I’m shaping the input so that the batch size will be the first dimension. Okay. 
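As a quick illustration of the layer parameters just described, here is a minimal standalone sketch of an `nn.RNN` layer (the specific sizes here are illustrative, not taken from the notebook):

```python
import torch
from torch import nn

# input_size=1 feature per time step, hidden_size=10 output features,
# num_layers=2 stacked RNNs, batch_first=True for (batch, seq, feature) input
rnn = nn.RNN(input_size=1, hidden_size=10, num_layers=2, batch_first=True)

# with batch_first=True, input is (batch_size, seq_length, input_size)
x = torch.randn(1, 20, 1)

# the initial hidden state defaults to zeros if not provided
out, hidden = rnn(x)

print(out.shape)     # torch.Size([1, 20, 10])  (batch, seq, hidden_size)
print(hidden.shape)  # torch.Size([2, 1, 10])   (num_layers, batch, hidden_size)
```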
Then, to complete this model, I have to add one more layer: a final fully-connected layer. This layer is responsible for producing the number of outputs, output_size, that I want, given the output of the RNN. So, all of these parameters are just going to be passed into our RNN when we create it. You’ll also note that I’m storing the value of our hidden dimension so I can use it later in the forward function. In the forward function, I specify how a batch of input sequences will pass through this model. Note that this forward function takes in an input x and a hidden state. The first thing I’m doing is grabbing the batch size of our input by calling x.size(0). Then I’m passing my initial input and hidden state into the RNN layer. This produces the RNN output and a new hidden state. Then I call view on the RNN output to shape it into the size I want: in this case, batch size times sequence length rows and hidden dimension columns. This is a flattening step where I’m preparing the output to be fed into a fully-connected layer. So, I pass this reshaped output to the final fully-connected layer, and return my final output here along with the hidden state generated by the RNN. Now, as a last step, I’m going to create some test data to test this RNN and see if it’s working as I expect. The most common error I get when programming RNNs is that I’ve messed up a data dimension somewhere, so I’m just going to check that the dimensions are as I expect. Here, I’m creating a test RNN with an input and output size of one, a hidden dimension of 10, and a number of layers equal to two; you can change the hidden dimension and the number of layers. I basically just want to see that this is producing outputs of the shape I expect. So, here I’m creating some test data that is one sequence length long. 
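The model structure described above, an RNN layer followed by a fully-connected layer, might look something like this. The class and parameter names here are my assumptions about the notebook, not its exact code:

```python
import torch
from torch import nn

class SimpleRNN(nn.Module):
    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super().__init__()
        # store the hidden dimension for use in forward
        self.hidden_dim = hidden_dim
        # batch_first=True: inputs are (batch_size, seq_length, input_size)
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
        # final fully-connected layer maps hidden features to output_size
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x, hidden):
        batch_size = x.size(0)
        # pass the input and hidden state through the RNN layer
        r_out, hidden = self.rnn(x, hidden)
        # flatten to (batch_size * seq_length, hidden_dim) for the fc layer
        r_out = r_out.view(-1, self.hidden_dim)
        output = self.fc(r_out)
        return output, hidden

# a test RNN with the sizes mentioned in the walkthrough
test_rnn = SimpleRNN(input_size=1, output_size=1, hidden_dim=10, n_layers=2)
```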
I’m converting that data into a tensor datatype, and I’m unsqueezing the first dimension to give it a batch size of one as the first dimension. Then I’m going to print out this input size, and I’ll pass it into our test RNN as input. Recall that this takes an initial hidden state, and the initial one here is just going to be None. This should return an output and a hidden state, and I’m going to print out those sizes as well. Okay. So, our input is a 3D tensor, which is exactly what I expect. The first dimension is one, our batch size; then 20, our sequence length; and finally our number of input features, which is just one, as we specified here. The output is a 2D tensor. This is because, in the forward function of our model definition, we actually smooshed the batch size and sequence length into one dimension. So, batch size times sequence length is 20, and then we have an output size of one. Finally, we have our hidden state. The first dimension here is the number of layers that I specified in the model definition, two. Next, we have the value one, which is just the batch size of our input. Finally, the last dimension here is 10, which is just our hidden dimension. So, all of these look good and as I expect, and I can proceed. Next, I’ll show you how to train a model like this.
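The dimension check just described can be sketched end to end like this. To keep it self-contained, I build the RNN layer and fully-connected layer directly rather than reusing a model class, so the names here are illustrative, but the printed shapes match the narration:

```python
import numpy as np
import torch
from torch import nn

hidden_dim, n_layers = 10, 2
rnn = nn.RNN(input_size=1, hidden_size=hidden_dim, num_layers=n_layers,
             batch_first=True)
fc = nn.Linear(hidden_dim, 1)

# test data one sequence length long, reshaped to (seq_length, 1)
seq_length = 20
time_steps = np.linspace(0, np.pi, seq_length)
data = np.sin(time_steps).reshape((seq_length, 1))

# convert to a tensor and unsqueeze a batch dimension of 1 at the front
test_input = torch.Tensor(data).unsqueeze(0)
print(test_input.size())   # torch.Size([1, 20, 1]) — batch, seq, features

# initial hidden state of None defaults to zeros
r_out, hidden = rnn(test_input, None)

# flatten batch and sequence dims together, as in the model's forward
out = fc(r_out.reshape(-1, hidden_dim))
print(out.size())          # torch.Size([20, 1]) — batch*seq, output_size
print(hidden.size())       # torch.Size([2, 1, 10]) — layers, batch, hidden
```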