16 – 06 Defining Model V2

All right. So, we have our mini-batches of data, and now it’s time to define our model. This is a little diagram of what the model will look like. We’ll have our characters put into our input layer and then a stack of LSTM cells. These LSTM cells make up our hidden recurrent layer, and when they look at a mini-batch of data as input, they’ll look at one character at a time and produce an output and a hidden state. So, we will pass an input character into our first LSTM cell, which produces a hidden state. Then at the next time step, we’ll look at the next character in our sequence and pass that into this LSTM cell, which will see the previous hidden state as input.

So far you have seen this behavior in a one-layer RNN, but in this case we plan on using a two-layer model with stacked LSTM layers. That means that the output of this LSTM layer is going to go to the next one as input, and each of these cells is sharing its hidden state with the next cell in the unrolled series. Finally, the output of the last LSTM layer will be turned into a set of character class scores whose length is the size of our vocabulary. We’ll put this through a Softmax activation function, which gives us a probability distribution that we’ll use to predict the most likely next character.

So, to start you off on this task, you’ve been given some skeleton code for creating a model. First, we’re going to check to see if a GPU is available for training. Then you’ll see this character RNN class. You can see that this class has our usual init and forward functions, and later you’ve been given some code to initialize the hidden state of an LSTM layer, which I’ll go over in a moment. You can definitely take a look at this given code and how we’re creating our initial character dictionaries, but you won’t need to change it. We also have several parameters that are going to be passed in when a character RNN is instantiated, and I’ve saved some of these as class variables.
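To make that last Softmax step concrete, here is a minimal sketch of how raw class scores become a probability distribution. The four-character vocabulary and the score values are made up for illustration and do not come from the lesson.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtracting the max score keeps exp() numerically stable
    # and does not change the resulting probabilities.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and scores: one score per character.
vocab = ["a", "b", "c", " "]
scores = [2.0, 0.5, -1.0, 0.1]

probs = softmax(scores)

# The most likely next character is the one with the highest probability.
most_likely = vocab[probs.index(max(probs))]
```

Note that the list of scores has the same length as the vocabulary, which is why the final layer of the model has to output one value per character.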
So, using these input parameters and variables, it will be up to you to create our model layers and complete the forward function. You’ll first create an LSTM layer, which you can read about in the documentation here. We can see that an LSTM layer is created using our usual parameters: an input size, hidden size, number of layers, and a batch first parameter. We’ll also add a dropout value. This introduces a dropout layer in between the outputs of LSTM layers if you’ve decided to stack multiple layers. So, after you define an LSTM layer, I’ll ask you to define two more layers: one dropout layer and a final fully-connected layer for getting our desired output size.

Once you’ve defined these layers, you’ll move on to define the forward function. This takes in an input x and a hidden state. You’ll pass this input through the layers of the model and return a final output and hidden state. You’ll have to make sure to reshape the LSTM output so that it can be fed into the last fully-connected layer.

Okay. Then at the bottom here, you’ll see this function for initializing the hidden state of an LSTM. An LSTM has a hidden state and a cell state that are saved together as a tuple, hidden. The shape of the hidden and cell state is defined first by the number of layers in our model, then the batch size of our input, and then the hidden dimension that we specified in model creation. In this function, we’re initializing the hidden and cell states all to zero and moving them to the GPU if it’s available.

Okay, so you don’t need to change any of the code that you see; you just need to define the model layers and feedforward behavior. If you’ve implemented this correctly, you should be able to set your model hyperparameters and proceed with training and generating some sample text. Try this out on your own, and then check out my solution.
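For reference, here is one possible sketch of the pieces described above, written with PyTorch. It is not the solution from the lesson: the class name `CharRNNSketch`, the parameter names (`n_chars`, `n_hidden`, `n_layers`, `drop_prob`), and their default values are all assumptions chosen for illustration, and the input is assumed to be one-hot encoded so that the input size equals the vocabulary size.

```python
import torch
import torch.nn as nn

class CharRNNSketch(nn.Module):
    """Illustrative character-level model; all names here are hypothetical."""

    def __init__(self, n_chars, n_hidden=16, n_layers=2, drop_prob=0.5):
        super().__init__()
        self.n_hidden = n_hidden
        self.n_layers = n_layers

        # Stacked LSTM; `dropout` is applied between the stacked layers.
        self.lstm = nn.LSTM(n_chars, n_hidden, n_layers,
                            dropout=drop_prob, batch_first=True)
        # Separate dropout layer applied to the LSTM output.
        self.dropout = nn.Dropout(drop_prob)
        # Final fully-connected layer: one score per character.
        self.fc = nn.Linear(n_hidden, n_chars)

    def forward(self, x, hidden):
        # x: (batch_size, seq_length, n_chars) because batch_first=True
        r_output, hidden = self.lstm(x, hidden)
        out = self.dropout(r_output)
        # Flatten to (batch_size * seq_length, n_hidden) for the linear layer.
        out = out.contiguous().view(-1, self.n_hidden)
        out = self.fc(out)
        return out, hidden

    def init_hidden(self, batch_size):
        # Zero tensors of shape (n_layers, batch_size, n_hidden), created on
        # whatever device the model's parameters live on (an assumption that
        # stands in for the explicit GPU check described in the lesson).
        device = next(self.parameters()).device
        shape = (self.n_layers, batch_size, self.n_hidden)
        return (torch.zeros(shape, device=device),
                torch.zeros(shape, device=device))

# Usage with made-up sizes.
batch_size, seq_length, n_chars = 4, 10, 30
model = CharRNNSketch(n_chars)
x = torch.zeros(batch_size, seq_length, n_chars)
hidden = model.init_hidden(batch_size)
out, (h, c) = model(x, hidden)
```

With these sizes, `out` has one row of character scores for every character position in the batch, and `h` and `c` keep the (number of layers, batch size, hidden dimension) shape described above.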
