14 – 04 Implementing CharRNN V2

This is a notebook where you’ll be building a character-wise RNN. You’re going to train this on the text of Anna Karenina, which is a really great but also quite sad book. The general idea is that we’re going to be passing one character at a time into a recurrent neural network. We’re going to do this for a whole bunch of text, and at the end our network is going to be able to generate new text, one character at a time.

This is the general structure. We have our input characters and we want to one-hot encode them. This one-hot vector will be fed into a hidden recurrent layer, and the hidden layer has two outputs. First, it produces some RNN output, and it also produces a hidden state, which will continue to change and be fed back into this hidden layer at the next time step in the sequence. We saw something similar in the last code example. So, our recurrent layer keeps track of our hidden state, and its output goes to a final, fully connected output layer. This linear output layer will produce a series of character class scores. This output will be as long as our input vector, and we can apply a softmax function to get a probability distribution for the most likely next character. This network is based on Andrej Karpathy’s post on RNNs, which you can find here. It’s a really good post, and you can check out these links to read more about RNNs.

This notebook is broken into a small series of exercises that you can implement yourself. For each exercise, I’m also going to provide a solution to consult. I recommend that you open the exercise notebook in one window and watch the videos in another; that way you can work alongside me.

Okay, so first things first: loading in and taking a look at our text data. Here, I’m loading in the Anna Karenina text file and printing out the first 100 characters. The characters are everything from letters, to spaces, to newline characters, and we can see the classic first line: “Happy families are all alike; every unhappy family is unhappy in its own way.”

Next, I’ll want to turn our text into numerical tokens. This is because our network can only learn from numerical data, so we want to map every character in the text to a unique index. First off, we can create a unique vocabulary from the text as a set. Sets are a built-in Python data structure, and building one from the text will look at every character in the passed-in string, separate each one out, and get rid of any duplicates. So, chars is going to be a set of all our unique characters; this is also sometimes referred to as a vocabulary.

Then, I’m creating a dictionary, int2char, that maps a unique integer to each of our characters. So, it’s just giving a numerical value to each of our unique characters and storing that mapping in a dictionary. Then I’m doing this the other way around with char2int, a dictionary that goes from characters to integers. Recall that any dictionary is made of a set of key and value pairs. In the int2char case, the keys are going to be integers and the values are going to be string characters. In the char2int case, our keys are going to be the characters and our values are going to be their unique integers. So, these basically give us a way to encode text as numbers, and here I’m doing just that: I’m encoding each character in the text as an integer.
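For reference, here’s a minimal sketch of what this loading and encoding step might look like. The file path and variable names (text, chars, int2char, char2int, encoded) are assumptions based on the description above, so the notebook’s actual code may differ slightly.

```python
import numpy as np

# Load the text file (path is an assumption; point this at your copy of the data)
with open('data/anna.txt', 'r') as f:
    text = f.read()

# Unique characters in the text -- our "vocabulary"
chars = tuple(set(text))

# int2char: unique integer -> character
int2char = dict(enumerate(chars))
# char2int: character -> unique integer (the reverse mapping)
char2int = {ch: ii for ii, ch in int2char.items()}

# Encode every character in the text as an integer
encoded = np.array([char2int[ch] for ch in text])

print(text[:100])      # first 100 characters
print(encoded[:100])   # first 100 encoded values
```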
This creates an encoded text, and just like I printed the first 100 characters before, I can print the first 100 encoded values. If you look at the length of our set of unique characters, you’ll see that we have 83 unique characters in the text, so our encoded values will fall in this range. You can also see some repeating values here, like 82, 82, 82 and 19, 19. If we scroll back up to our actual text, we can surmise that the repeated 82s are probably the newline characters, and 19 is maybe a “p”. Okay, so our encodings are working, and now what we want to do is turn these encodings into one-hot vectors that our RNN can take in as input, just like in our initial diagram.

Here, I’ve actually written a function that takes in an encoded array and turns it into one-hot vectors of some specified length. I can show you what this does with an example below. I’ve made a short test sequence, 3, 5, 1, and a vector length that I specify, 8. So, I’m passing this test sequence and the number of labels that I expect into our one-hot function, and I can see that the result is an array of three one-hot vectors. All of these vectors are of length eight, and the indices 3, 5, and 1 are on for their respective encodings. Now, for our vocabulary of 83 characters, these are just going to be much longer vectors.

Cool. So, we have our preprocessing functions and data in place, and now your first task will be to take our encoded characters and actually turn them into mini-batches that we can feed into our network. As Matt mentioned before, the idea is that we actually want to run multiple sequences through our network at a time, where one mini-batch of data contains multiple sequences. So, here’s an example starting sequence. If we say we want a batch size of two, we’re going to split this data into two batches. Then, we’ll have these sequence-length windows that specify how big we want our sequences to be. In this case, we have a sequence length of three, and so our window will be three in width. For a batch size of two and a sequence length of three, these values will make up our first mini-batch, and we’ll just slide this window over by three to get the next mini-batch. So, each mini-batch is going to have the dimensions batch size by sequence length; in this case, we have a two-by-three window on our encoded array that we pass into our network.

If you scroll down, I have more specific instructions. The first thing you’re going to be doing is taking in an encoded array, and you’ll want to discard any values that don’t fit into completely full mini-batches. Then, you want to reshape this array into batch-size number of rows. Finally, once you have that batch data, you’re going to want to create a window that iterates over the batches, a sequence length at a time, to get your mini-batches. So, here’s the skeleton code. Your array is going to be some encoded data, then you have a batch size and a sequence length. Basically, you want to create an input x that should be a sequence length (or number of time steps) wide and a batch size tall. This will make up our input data, and you’ll also want to provide targets. The targets y for this network are going to be just like the input characters x, only shifted over by one. That’s because we want our network to predict the most likely next character for some input sequence. So, you’ll have your input sequence x and your targets y shifted over by one.
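Before you tackle the batching exercise, here’s a minimal sketch of a one-hot encoding function like the one described above, along with the small test case. The function name one_hot_encode is an assumption; the notebook’s helper may be named or structured a little differently.

```python
import numpy as np

def one_hot_encode(arr, n_labels):
    """Turn an array of integer tokens into an array of one-hot vectors."""
    # Start with all zeros: one row per token, one column per possible label
    one_hot = np.zeros((arr.size, n_labels), dtype=np.float32)

    # Flip the column that matches each token's integer value to 1
    one_hot[np.arange(arr.size), arr.flatten()] = 1.

    # Reshape so the one-hot dimension is appended to the original shape
    return one_hot.reshape((*arr.shape, n_labels))

# The example from above: tokens 3, 5, 1 with 8 possible labels
test_seq = np.array([[3, 5, 1]])
print(one_hot_encode(test_seq, 8))
```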
Then finally, to actually produce the batches, we’re going to create a generator that iterates through our array and returns x and y with the yield command. Okay, I’ll leave implementing this batching function up to you; you can find more information about how you could do this in the notebook. There’s some code for testing out your implementation below, and in fact, this is what your batches should look like when you run this code. If you need any help, or you just want to see my solution, go ahead and check out the solution video next.
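If you’d like something to compare your own attempt against, here’s a minimal sketch of one way such a generator could be written. It assumes the encoded array from earlier, and the solution video may organize the code differently.

```python
import numpy as np

def get_batches(arr, batch_size, seq_length):
    """Yield mini-batches of shape (batch_size, seq_length) from an encoded array."""
    # How many characters one full mini-batch covers, and how many full batches fit
    chars_per_batch = batch_size * seq_length
    n_batches = len(arr) // chars_per_batch

    # Keep only enough characters to make full mini-batches,
    # then reshape into batch_size rows
    arr = arr[:n_batches * chars_per_batch]
    arr = arr.reshape((batch_size, -1))

    # Slide a window of width seq_length across the columns
    for n in range(0, arr.shape[1], seq_length):
        # Inputs: the current window
        x = arr[:, n:n + seq_length]
        # Targets: the same window shifted over by one character
        y = np.zeros_like(x)
        try:
            y[:, :-1], y[:, -1] = x[:, 1:], arr[:, n + seq_length]
        except IndexError:
            # At the very end of the array there is no "next" character,
            # so wrap around to the first column
            y[:, :-1], y[:, -1] = x[:, 1:], arr[:, 0]
        yield x, y

# Example usage: grab one batch of 8 sequences, 50 characters each
x, y = next(get_batches(encoded, 8, 50))
```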
