4 – Building The RNN 1

(speaker) All right. So now we’re going to build the graph. The first thing we need to do is define our hyperparameters. The first one is the LSTM size. This is the number of units in the hidden layers in the LSTM cells. LSTM cells actually have four different network layers in them: three sigmoid layers and one tanh layer. So this is the number of units in each of those layers. If you set this to 256, there are 256 units in each of those four layers. Otherwise, you can basically just think of this as setting the number of units in your hidden layer. That’s kind of all it is — LSTM cells are way more complicated than a normal hidden layer, but it’s the same idea: you’re just setting the number of units in your hidden layer. And as you probably know by now, the more units you have in your hidden layer, the better performance you’re going to get out of your network, at the expense of computation, of course. Next is our lstm_layers hyperparameter. This is just the number of LSTM layers you’re using. This is deep learning, so we’re setting the number of hidden layers of LSTM cells we stack up. Again, the more layers you have, typically the better performance you’ll get, but then you also run into overfitting. What I’ve seen as good practice is to make your network large, accept that it will overfit, and then add regularization like dropout to control the overfitting. That being said, I’d start with 1 and see how it does. Then, if you feel like you need another layer, add it and see how the validation accuracy responds. Next is the batch size. This is just the number of reviews we’re going to feed to our network in one training pass. Typically, you want it as high as you can go without running out of memory.
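Written out, the hyperparameters described above might look like the following. These specific values are just reasonable starting points, not tuned results:

```python
# Hyperparameters for the sentiment RNN (starting values, not tuned results)
lstm_size = 256      # units in each of the four internal layers of an LSTM cell
lstm_layers = 1      # start with one layer; add more if validation accuracy stalls
batch_size = 500     # as large as memory allows
learning_rate = 0.001
```

From here, making lstm_size or lstm_layers bigger trades computation (and overfitting risk) for capacity, exactly as described above.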
The reason for this is that TensorFlow is really good at doing calculations on matrices, so the fewer batches we have to pass in, the less computation time it’s going to take. However, if you have a lot of weights, a lot of connections in your network, then you’re going to quickly hit the memory limit on your GPU or computer. So basically, the way to do this is just to make the batch size as large as you can before you run out of memory. And finally, we have the learning rate. So here is the first step in creating your network: defining placeholders for the inputs and the labels. We’re also going to be using dropout here, so we want to make a placeholder for the keep probability in our dropout layer. OK, I’ll leave this up to you. The next thing is going to be our embedding layer. This layer is colored yellow in our diagram. Again, we’re dealing with something like 74,000 words in our vocabulary, so if you try to one-hot encode this, it’s going to be massively inefficient. Instead, what we’re going to do is just pass in integers and use an embedding layer as a lookup table. You saw how we did this in the word2vec lesson. What you could do is actually train up an embedding layer with word2vec and then import it here. However, it turns out that in most cases, unless you have massive amounts of data, it’s usually just fine to make a new embedding layer and let the network train up those weights and find representations on its own. So here I’ll let you create the embedding matrix with tf.Variable and use tf.nn.embedding_lookup to get the vectors we’re going to pass to our LSTM cells. If you need some help with this, go back to the word2vec lesson and see how I did it there. You can also read the documentation on embedding_lookup by clicking this link. Next, we’re going to define our LSTM cells.
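The "embedding layer as lookup table" idea can be sketched in plain NumPy. This is just a conceptual illustration (the sizes and the random initialization here are made-up stand-ins); in the actual network you’d build the matrix with tf.Variable and index it with tf.nn.embedding_lookup:

```python
import numpy as np

n_words = 74000   # vocabulary size, roughly what we have in this dataset
embed_size = 300  # dimension of each word vector (an illustrative choice)

# The embedding matrix: one row per word in the vocabulary
embedding = np.random.uniform(-1, 1, size=(n_words, embed_size))

# A "review" is just a sequence of integer word IDs
review = np.array([12, 507, 3])

# The lookup is plain row indexing -- no giant one-hot multiplication needed
embed = embedding[review]
print(embed.shape)  # (3, 300): one 300-dim vector per word
```

This is why passing integers is so much cheaper than one-hot encoding: looking up a row is free compared to multiplying by a 74,000-wide one-hot vector.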
So we’re not actually going to be building the network in this part; we’re just going to be building what’s inside these LSTM cells. You can check out the documentation on TensorFlow’s recurrent network functions here. The first one we’re going to use is just a basic LSTM cell, which you can get with tf.contrib.rnn.BasicLSTMCell. This is what the function documentation looks like: basically, you create the cell and tell it the number of units to use. In our case, I called this lstm_size above. So you can just write lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size), and that’s all you need. Next, we’re going to add dropout to our LSTM cells. That’s really useful because it makes your network better with very little effort. If you remember, dropout is a regularization technique that helps prevent overfitting. With these LSTM units, it’s really easy to use dropout with the class tf.contrib.rnn.DropoutWrapper. Basically, you pass in your cell and your keep probability, which we defined as a placeholder above, and then you have this drop cell. It acts exactly like an LSTM cell, except now your LSTM cell has dropout on the output. So again, you have the option to add more layers, and TensorFlow also makes this really simple using tf.contrib.rnn.MultiRNNCell. Here you just pass in a list of cells. In our case, since drop is just a normal LSTM cell wrapped to add dropout on the output, we can create a list of these cells that’s however many layers long and pass it to MultiRNNCell, and it will stack up the layers for us. So here, use all of this to build the LSTM cell.
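To make the "four layers inside a cell" idea concrete, here is a single LSTM step written out in NumPy, with dropout applied to the output the way DropoutWrapper does. This is a conceptual sketch with tiny random stand-in weights, not TensorFlow’s implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lstm_size = 4    # tiny, just for illustration
input_size = 3
rng = np.random.default_rng(0)

# One weight matrix per internal layer: the forget, input, and output gates
# (sigmoid) plus the candidate layer (tanh) -- the "four layers" in a cell.
W = {name: rng.normal(size=(input_size + lstm_size, lstm_size))
     for name in ("forget", "input", "candidate", "output")}

def lstm_step(x, h, c, keep_prob=1.0):
    z = np.concatenate([x, h])        # current input plus previous hidden state
    f = sigmoid(z @ W["forget"])      # sigmoid layer: what to erase from the cell state
    i = sigmoid(z @ W["input"])       # sigmoid layer: what to write into it
    g = np.tanh(z @ W["candidate"])   # tanh layer: candidate values to write
    o = sigmoid(z @ W["output"])      # sigmoid layer: what to expose as output
    c_new = f * c + i * g             # updated cell state
    h_new = o * np.tanh(c_new)        # new hidden state / output
    # DropoutWrapper-style dropout: mask the output only, scale by keep_prob
    mask = rng.random(lstm_size) < keep_prob
    return h_new * mask / keep_prob, c_new

h, c = np.zeros(lstm_size), np.zeros(lstm_size)
h, c = lstm_step(rng.normal(size=input_size), h, c, keep_prob=0.5)
print(h.shape, c.shape)  # (4,) (4,)
```

Notice that lstm_size sets the width of all four internal layers at once, which is exactly what the LSTM size hyperparameter above controls.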
So here you’re going to create your basic LSTM cell, add dropout to it, and then stack it up into multiple LSTM layers. Then I’m just using the cell to get an initial state. If you remember, LSTMs have cell states that are passed between LSTM cells, and this is just creating that initial cell state as all zeros. It’s going to get updated through training and through passing data in as a sequence. So finally, we’re going to build our RNN forward pass. What this means is that we’re going to pass in our data, it’s going to go through the network, and then we actually calculate the outputs. The whole reason I’m using an RNN here is that we can get information about the sequence of words in our reviews. You might remember from Andrew Trask’s lesson that he just used a feedforward network. In that case, the network only knows about the individual words that were in the review. But in this case, we have the individual words that were in the review, and also the sequence they appeared in. The individual words come in through the embedding layer, so they have this kind of vertical path through the network. But then, since we’re using a recurrent neural network, we also have this sequential information getting passed through the hidden layer. So our output here, “positive,” knows that “best” and “movie” came before “ever,” and now we have information about the sequence of words in our review. This is what lets recurrent neural networks perform better than normal feedforward networks on this kind of problem. The way we’re going to implement actually running our data through the RNN is with tf.nn.dynamic_rnn. What you need to do here is pass in your cell, which we already created, and the inputs, which in our case come from our embedding layer and go into the hidden layer. So that’s what those inputs mean.
It’s not the inputs to the whole network, but the inputs to the LSTM cells — the input to this hidden layer. Then you also give it the initial state. What this does is actually go through the network, calculate the state for each hidden layer and pass it to the next one, and also calculate the output of each of the hidden layers. When it’s done doing that, it gives us this list of outputs and the final state. So here I’ll leave it up to you to build the dynamic RNN and calculate the outputs and the final state.
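What the dynamic RNN call does over the time steps can be sketched as a plain Python loop: feed each element of the sequence through the cell, collect the output at every step, and keep the final state. The toy cell here is a made-up stand-in just to show the data flow, not a real LSTM:

```python
import numpy as np

def toy_cell(x, state):
    """Stand-in for an LSTM cell: mixes the current input into the state."""
    new_state = np.tanh(state + x)
    return new_state, new_state   # (output at this step, new state)

seq_len, lstm_size = 5, 3
rng = np.random.default_rng(1)
inputs = rng.normal(size=(seq_len, lstm_size))  # one embedded word per step

state = np.zeros(lstm_size)       # the all-zeros initial state
outputs = []
for t in range(seq_len):          # this unrolling is what the framework handles for us
    out, state = toy_cell(inputs[t], state)
    outputs.append(out)

outputs = np.stack(outputs)
final_state = state
print(outputs.shape)  # (5, 3): an output for every step in the sequence
```

This is the sense in which the network "knows" word order: the state carried through the loop depends on every earlier step, so the final output reflects the sequence, not just the bag of words.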
