10 – 08 Making Predictions V3

Now, the goal of this model is to train it so that it can take in one character and produce the next character, and that’s what this next step, Making Predictions, is all about. We basically want to create functions that can take in a character and have our network predict the next character. Then, we want to take that character, pass it back in, and get more and more predicted next characters. We’ll keep doing this until we generate a bunch of text.

So, you’ve been given this predict function, which will help with this. This function takes in a model and a character, and its job is to give us back the encoded value of the predicted next character and the hidden state that’s produced by our model. So, let’s see what it’s actually doing step by step. It’s taking in our input character and converting it into its encoded integer value. Then, as part of pre-processing, we’re turning that into a one-hot encoded representation and then converting these inputs into a tensor. These inputs we can then pass to our model, and then you’ll see a couple of steps that are really similar to what we saw in our training loop. We put our inputs on a GPU if it’s available, and we detach our hidden state from its history here. Then, we pass in the inputs and the hidden state to our model, which returns an output and a new hidden state.

Next, we’re processing the output a little more. We’re applying a softmax function to get p, the probabilities for the likely next character. So, p is a probability distribution over all the possible next characters given the input character x. Now, we can generate more sensible characters by only considering the k most probable characters. So, here we’re giving you a couple of lines of code to use top-k sampling, which finds us the k most likely next characters. Then, here we’re adding an element of randomness, something that selects from among those top likely next characters.
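To make the last part of predict concrete, here is a minimal sketch of softmax followed by top-k sampling. It is written in plain Python with only the standard library, as a stand-in for the tensor operations in the lesson’s PyTorch code, so it is not the notebook’s actual implementation. The names `softmax`, `top_k_sample`, `vocab`, and `scores` are made up for illustration, and the scores are invented numbers rather than real model output.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_sample(scores, k, rng=random):
    """Pick an index at random from the k most probable entries."""
    p = softmax(scores)
    # Indices of the k most probable characters, most probable first.
    top = sorted(range(len(p)), key=lambda i: p[i], reverse=True)[:k]
    weights = [p[i] for i in top]
    # random.choices treats the weights as relative, so the k
    # probabilities do not need to be renormalized by hand.
    return rng.choices(top, weights=weights, k=1)[0]

# Hypothetical scores over a 5-character vocabulary (assumed values).
vocab = ["a", "e", "t", "h", "z"]
scores = [2.0, 1.5, 0.3, -1.0, -3.0]

idx = top_k_sample(scores, k=3)
print(vocab[idx])  # one of "a", "e", "t"
```

With k equal to one this always picks the single most probable character, and as k grows the output gets more varied but also more likely to include poor choices.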
So, then we have a likely next character, and we’re actually returning the encoded value of that character and the hidden state produced by our model. But we’ll basically want to call the predict function several times, generating one character as output, then passing that in as input and predicting the next and next characters. That brings me to our next function, sample. Sample will take in our trained model and the size of text that we want to generate. It will also take in prime, which is going to be a set of characters that we want to start our model off with. Lastly, it will take in a value for top k, which gets passed along to our predict function so that it only samples from our k most probable characters. So, in here, we’re starting off by moving our model to GPU if it’s available, and here we’re also initializing the hidden state with a batch size of one because, for the one character that we’re inputting at a time, the batch size will be one. In this way, prediction is quite different from training a model. Then, you’ll see that we’re getting each character in our prime word. The prime word basically helps us answer the question, how do we start to generate text? We shouldn’t just start out randomly. So, what is usually done is to provide a prime word or a set of characters. Here the default prime set is just “the,” T-H-E, but you can pass in any set of characters that you want as the prime. The sample function first processes these characters in sequence, adding them to a list of characters. It then calls predict on these characters, passing in our model, each character, and the hidden state, and this returns the next character after our prime sequence and the hidden state. So, here we have all our prime characters. In the default case, this is going to be T, H, and E, and then we’re going to append the next likely character. So, we’re basically building up a list of characters here, then we’re going to generate more and more characters.
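The priming step and the generation loop can be sketched like this. It is an illustration under loud assumptions, not the lesson’s code: the “model” is a hypothetical lookup table called `NEXT` instead of a trained RNN, the hidden state is a dummy step counter, and this `predict` returns the character itself rather than its encoded value. Only the shape of `sample` (prime first, then feed the last character back in) follows the description above, and it assumes the prime is not empty.

```python
import random

# Hypothetical stand-in for the trained network: for each character,
# a string of likely next characters, most likely first.
NEXT = {"t": "h", "h": "e", "e": " r", " ": "t", "r": "e"}

def predict(model, char, h=None, top_k=None):
    """Return (next_char, hidden), mimicking the shape of the lesson's predict."""
    candidates = model.get(char, " ")
    if top_k is not None:
        candidates = candidates[:top_k]  # keep only the k most likely
    h = (h or 0) + 1  # dummy hidden state: counts the steps taken so far
    return random.choice(candidates), h

def sample(model, size, prime="the", top_k=None):
    chars = list(prime)
    h = None
    # Prime: run every prime character through predict to build up
    # the hidden state. Only the last prediction is kept.
    for ch in prime:
        char, h = predict(model, ch, h, top_k=top_k)
    chars.append(char)  # first generated character after the prime
    # Generate: feed the last character back in, `size` more times.
    for _ in range(size):
        char, h = predict(model, chars[-1], h, top_k=top_k)
        chars.append(char)
    return "".join(chars)

text = sample(NEXT, size=20, prime="the", top_k=2)
print(text)
```

Written this way, the returned text contains the prime, one character predicted from the prime, and then `size` further characters.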
In this loop, we’re passing in our model and the last character in our character list. This returns the next character and the hidden state. This character is appended to our list, and the cycle starts all over again. So, predict is generating a likely next character, which is appended to our list, and then that goes back as input into our predict function. The effect is that we’re getting next and next and next characters and adding them to our characters list, that is, until we reach our desired text length. Finally, we join all these characters together to return a sample text, and here I’ve generated a couple of samples. You can see that I’ve passed in my model that was trained for 20 epochs, and I said, generate a text that’s 1,000 characters long, starting with the prime word Anna. I’ve also passed in a value for top k equal to five. You can see that this starts with the prime word and generates what might be thought of as a paragraph of text in a book. Even with just a few prime characters, our model is definitely making complete and real words that make sense. The structure and spelling look pretty good, even if the content itself is a little confusing. And here’s another example where I’ve loaded in a model by name, and I’m using this loaded model to generate a longer piece of text, starting with the prime words, “And Levin said.” So, this is pretty cool. A well-trained model can actually generate some text that makes some sense. It learned, just from looking at long sequences of characters, what characters were likely to come next. Then, in our sampling and prediction code, we used top-k sampling and some randomness in selecting a likely next character. You can train a model like this on any other text data. For example, you could try it on generating Shakespeare sonnets or another text of your choice. Great job on getting this far. You’ve really learned a lot about implementing RNNs in PyTorch.
