12 – Character-Wise RNN

Coming up in this lesson, you’ll implement a character-wise RNN. That is, the network will learn about some text one character at a time and then generate new text one character at a time. Let’s say we want to generate new Shakespeare plays. As an example, take “to be or not to be.” We’d pass the sequence into our RNN one character at a time. Once trained, the network will generate new text by predicting the next character based on the characters it’s already seen. So, to train this network, we want it to predict the next character in the input sequence. In this way, the network will learn to produce a sequence of characters that looks like the original text.
Let’s consider what the architecture of this network will look like. First, let’s unroll the RNN so we can see how this all works as a sequence. Here, we have our input layer, where we’ll pass in the characters as one-hot encoded vectors. These vectors go to the hidden layer. The hidden layer is built with LSTM cells, where the hidden state and cell state pass from one cell to the next in the sequence. In practice, we’ll actually use multiple layers of LSTM cells. You just stack them up like this. The output of these cells goes to the output layer. The output layer is used to predict the next character. We want the probabilities for each character, the same way you did image classification with the convnet. That means that we want a softmax activation on the output. Our target here will be the input sequence but shifted over by one, so that each character is predicting the next character in the sequence. Again, we’ll use cross-entropy loss for training with gradient descent.
When this network is trained up, we can pass in one character and get out a probability distribution for the likely next character. Then we can sample from that distribution to get the next character. Then we can take that character, pass it in, and get another one. We keep doing this, and eventually we’ll build up some completely new text.
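To make the data side of this concrete, here is a minimal sketch of the encoding, the shifted targets, and the sampling step, using only the Python standard library. It is not the lesson's implementation: the names `char_to_int`, `one_hot`, `softmax`, and `sample_next_char` are illustrative assumptions, and the trained LSTM is left out, so the sampling function takes hand-written output scores instead of real network outputs.

```python
import math
import random

text = "to be or not to be"

# Build a vocabulary: map each distinct character to an integer index.
chars = sorted(set(text))
char_to_int = {ch: i for i, ch in enumerate(chars)}
int_to_char = {i: ch for ch, i in char_to_int.items()}


def one_hot(index, size):
    """Return a vector of zeros with a single 1.0 at position `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


# Encode the text as integers, then build inputs and targets.
# The targets are the input sequence shifted over by one character,
# so each input character is paired with the character that follows it.
encoded = [char_to_int[ch] for ch in text]
inputs = encoded[:-1]
targets = encoded[1:]
input_vectors = [one_hot(i, len(chars)) for i in inputs]


def softmax(scores):
    """Turn raw output scores into a probability distribution."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_char(scores, rng=random):
    """Sample the next character from the softmax of the output scores.

    `scores` stands in for the output layer of a trained network
    (an assumption here; no network is trained in this sketch).
    """
    probs = softmax(scores)
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return int_to_char[index]
```

To generate text with a real model, you would repeat the last step in a loop: feed a character in, sample the next one from the output distribution, and feed that sampled character back in as the next input.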
We’ll be training this network on the text from Anna Karenina, one of my favorite books. It’s in the public domain so it’s free to use however you want. Also, it’s an amazing novel.