9 – Other Architectures

In this video, I will show you a pair of similar architectures that also work well, but there are many variations of LSTMs and we encourage you to study them further. Here’s a simple architecture which also works well. It’s called the gated recurrent unit, or GRU for short. It combines the forget and the …
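
As a rough illustration of the GRU idea, here is a minimal scalar sketch in plain Python. It follows the common textbook formulation (an update gate, a reset gate, and a tanh candidate), which is an assumption and may not match the video's notation. The names `gru_step`, `update`, `reset`, `candidate`, and the weight layout are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(h, x, w):
    """One scalar GRU step; `w` maps gate name -> (w_h, w_x, bias)."""
    def pre(name, h_in):
        w_h, w_x, bias = w[name]
        return w_h * h_in + w_x * x + bias

    update = sigmoid(pre("update", h))   # how much of the state to overwrite
    reset = sigmoid(pre("reset", h))     # how much old state feeds the candidate
    candidate = math.tanh(pre("candidate", reset * h))
    # Blend the old state with the candidate (some texts swap the two roles).
    return (1.0 - update) * h + update * candidate

# Hypothetical all-zero weights: every sigmoid is 0.5 and the candidate is 0.
zero_weights = {name: (0.0, 0.0, 0.0) for name in ("update", "reset", "candidate")}
h_next = gru_step(h=1.0, x=0.0, w=zero_weights)
```

Note that the GRU carries a single state `h`, unlike the LSTM, which keeps separate long-term and short-term memories.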

8 – Putting It All Together

So here we go. As we’ve seen before, here is the architecture for an LSTM with the four gates. There is the forget gate, which takes the long-term memory and forgets part of it. The learn gate puts the short-term memory together with the event as the information we’ve recently learned. The remember gate joins …
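
To see how the four gates might fit together, here is a minimal scalar sketch in plain Python. It uses the standard LSTM equations, which is an assumption: the video's use gate may be wired slightly differently. The names `lstm_step`, `ltm`, `stm`, and the weight dictionary are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(ltm, stm, event, w):
    """One scalar LSTM step; `w` maps gate name -> (w_stm, w_event, bias)."""
    def pre(name):
        w_stm, w_event, bias = w[name]
        return w_stm * stm + w_event * event + bias

    # Forget gate: keep a fraction of the long-term memory.
    forgotten = ltm * sigmoid(pre("forget"))
    # Learn gate: new information from short-term memory and event, partly ignored.
    learned = math.tanh(pre("candidate")) * sigmoid(pre("learn"))
    # Remember gate: join what was kept with what was learned.
    new_ltm = forgotten + learned
    # Use (output) gate: derive the new short-term memory, which is also the output.
    new_stm = math.tanh(new_ltm) * sigmoid(pre("use"))
    return new_ltm, new_stm

# Hypothetical all-zero weights: every sigmoid is 0.5 and the candidate is 0.
weights = {name: (0.0, 0.0, 0.0) for name in ("forget", "candidate", "learn", "use")}
new_ltm, new_stm = lstm_step(ltm=1.0, stm=0.0, event=0.0, w=weights)
```

In a real network each of these quantities is a vector and each weight is a matrix, but the flow between the gates is the same.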

7 – LSTM 7 Use Gate

And finally, we come to the use gate or output gate. This is the one that uses the long-term memory that just came out of the forget gate and the short-term memory that just came out of the learn gate to come up with a new short-term memory and an output. These …

6 – Remember Gate

And now we’re going to learn the Remember Gate. This one is the simplest. It takes the long-term memory coming out of the Forget Gate and the short-term memory coming out of the Learn Gate and simply combines them. And how does this work mathematically? Again, very simple. We just take the outputs coming …
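
A minimal sketch of that combination, assuming (as in the standard LSTM) that "combining" means element-wise addition of the two gate outputs. The function name `remember_gate` and the sample numbers are hypothetical.

```python
def remember_gate(forget_out, learn_out):
    """Add the Forget Gate and Learn Gate outputs element by element."""
    return [f + l for f, l in zip(forget_out, learn_out)]

# Illustrative values only: two memory entries per gate.
new_long_term_memory = remember_gate([0.5, -1.0], [0.25, 0.5])
```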

5 – Forget Gate

Now we go to the Forget Gate, which works as follows: it takes the long-term memory and decides which parts to keep and which to forget. In this case, the show is about nature and science, and the forget gate decides to forget that the show is about science and keep the fact …
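
One common way to implement "keep some, forget some" is to multiply the long-term memory by a factor between 0 and 1 computed from the short-term memory and the event. The sketch below assumes that standard formulation with scalar values. `forget_gate` and its weight parameters are hypothetical names.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forget_gate(ltm, stm, event, w_stm, w_event, bias):
    """Scale the long-term memory by a keep factor in (0, 1)."""
    keep = sigmoid(w_stm * stm + w_event * event + bias)
    return ltm * keep

# With zero input the keep factor is sigmoid(0) = 0.5, so half the memory survives.
half_kept = forget_gate(2.0, stm=0.0, event=0.0, w_stm=1.0, w_event=1.0, bias=0.0)
```

A large positive bias pushes the keep factor toward 1 (remember almost everything), and a large negative one pushes it toward 0.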

4 – Learn Gate

So, let’s keep this as our base case. We have a long-term memory, which is that the show we’re watching is about nature and science. We also have a short-term memory, which is what we’ve recently seen: a squirrel and a tree. And finally, we have our current event, which is a picture we …

3 – LSTM Architecture

So in order to study the architecture of an LSTM, let’s quickly recall the architecture of an RNN. Basically what we do is we take our event E_t and our memory M_t-1, coming from the previous point in time, and we apply a simple tanh or sigmoid activation function to obtain the output and then …
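
The RNN step described above can be sketched in a few lines of plain Python. This is a scalar toy version: `rnn_step` and the weight values are hypothetical, and a real network would use matrices and vectors instead of single numbers.

```python
import math

def rnn_step(event, memory, w_event, w_memory, bias):
    """Combine the event E_t with the previous memory M_t-1 and squash with tanh."""
    return math.tanh(w_event * event + w_memory * memory + bias)

# Feed a short sequence through the cell; each output becomes the next memory.
memory = 0.0
for event in [1.0, 0.5, -0.3]:
    memory = rnn_step(event, memory, w_event=0.8, w_memory=0.5, bias=0.0)
```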

2 – LSTM Basics

So let’s recap. We have the following problem: we are watching a TV show and we have a long-term memory, which is that the show is about nature and science and lots of forest animals have appeared. We also have a short-term memory, which is what we have recently seen, which is squirrels …

18 – 08 Making Predictions V3

Now, the goal of this model is to train it so that it can take in one character and produce the next character, and that’s what this next step, Making Predictions, is all about. We basically want to create functions that can take in a character and have our network predict the next character. Then, …

17 – 07 CharRNN Solution V1

We wanted to define a character RNN with a two-layer LSTM. Here in my solution, I am running this code on a GPU, and here’s my code for defining our character-level RNN. First, I defined an LSTM layer, self.lstm. This takes in an input size, which is going to be the length of a …

16 – 06 Defining Model V2

All right. So, we have our mini-batches of data and now it’s time to define our model. This is a little diagram of what the model will look like. We’ll have our characters put into our input layer and then a stack of LSTM cells. These LSTM cells make up our hidden recurrent layer …

15 – 05 Batching Data V1

So, this is my complete get_batches code that generates mini-batches of data. So, the first thing I wanted to do here is get the total number of complete batches that we can make in batches. To do that, I first calculated how many characters were in a complete mini-batch. So, in one mini-batch, there’s going …
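
The code from the video is not shown in this excerpt, so the following is only a sketch of how a get_batches generator along these lines might look, written with plain Python lists instead of arrays. It assumes each mini-batch holds batch_size * seq_length characters and that targets are the inputs shifted by one character, wrapping around at the end of each row. The parameter names and these details are assumptions.

```python
def get_batches(arr, batch_size, seq_length):
    """Yield (x, y) mini-batches from a list of encoded characters."""
    chars_per_batch = batch_size * seq_length
    n_batches = len(arr) // chars_per_batch
    arr = arr[: n_batches * chars_per_batch]  # drop the incomplete tail

    # Split the data into batch_size rows of equal length.
    row_len = n_batches * seq_length
    rows = [arr[i * row_len:(i + 1) * row_len] for i in range(batch_size)]

    for start in range(0, row_len, seq_length):
        x = [row[start:start + seq_length] for row in rows]
        # Targets are the inputs shifted one step to the left.
        y = [row[start + 1:start + seq_length + 1] for row in rows]
        for row, target in zip(rows, y):
            if len(target) < seq_length:
                target.append(row[0])  # last window: wrap to the row's first item
        yield x, y

batches = list(get_batches(list(range(20)), batch_size=2, seq_length=3))
```

With 20 items, a batch size of 2, and a sequence length of 3, only 18 items fit into complete batches, so the last two are discarded and three batches are produced.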

14 – 04 Implementing CharRNN V2

This is a notebook where you’ll be building a character-wise RNN. You’re going to train this on the text of Anna Karenina, which is a really great but also quite sad book. The general idea behind this is that we’re going to be passing one character at a time into a recurrent neural network. …

13 – Sequence-Batching

One of the most difficult parts of building networks for me is getting the batches right. It’s more of a programming challenge than anything specific to deep learning. So here I’m going to walk you through how batching works for RNNs. With RNNs we’re training on sequences of data like text, stock values, audio, etc. By …

12 – Character-Wise RNN

Coming up in this lesson you’ll implement a character-wise RNN. That is, the network will learn about some text one character at a time and then generate new text one character at a time. Let’s say we want to generate new Shakespeare plays. As an example, take “to be or not to be.” We’d pass the …
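
To make "one character at a time" concrete, here is a small sketch of how the example text could be turned into integer codes and then into input/target pairs, where the target for each character is simply the character that follows it. The names `char2int`, `int2char`, `inputs`, and `targets` are hypothetical and chosen for illustration.

```python
text = "to be or not to be"

# Build a vocabulary of the unique characters and map each one to an integer.
chars = sorted(set(text))
char2int = {ch: i for i, ch in enumerate(chars)}
int2char = {i: ch for ch, i in char2int.items()}

# Encode the text, then pair every character with the one that comes next.
encoded = [char2int[ch] for ch in text]
inputs, targets = encoded[:-1], encoded[1:]
```

During training the network would see each entry of `inputs` and be asked to predict the matching entry of `targets`.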

11 – 03 Training Memory V1

Last time, we defined a model, and next, I want to actually instantiate it and train it using our training data. First, I’ll specify my model hyperparameters. The input and output will just be one; it’s just one sequence at a time that we’re processing and outputting. Then I’ll specify a hidden dimension which is …

10 – 02 Time Series Prediction V2

To introduce you to RNNs in PyTorch, I’ve created a notebook that will show you how to do simple time series prediction with an RNN. Specifically, we’ll look at some data and see if we can create an RNN to accurately predict the next data point given a current data point, and this is really …


Okay, so let’s say we have a regular neural network which recognizes images and we feed it this image. And the neural network guesses that the image is most likely a dog, with a small chance of being a wolf and an even smaller chance of being a goldfish. But, what if this image is …