10 – Mini Project 5 Solution

All right, so in this section we've made our network more efficient. We've done this by getting rid of the multiplication by 1, because it doesn't change anything, and by skipping the processing of any word whose input is 0 altogether. This should increase the speed of both training and testing, and of running our network, by a significant amount, which we'll find out in a second.

Now, in order to make this change, notice that the top part of this class didn't really change. We still initialize the network, we still create the weight matrices, and we still create the same word-index lookup. I left the old methods around, but we're not going to use them. Then we get to the train method, and things really start to change. I added a pre-processing step to the train method: it takes each of the training_reviews_raw, so the raw text, and creates a set of indices. It converts each word to a number, the row of the weight matrix it corresponds to. This is like what we did up here with the 4 and 9: once we have that list of indices, we can just sum those rows into the vector, and that ends up being really, really fast. What we do here is just a bigger version of that: for each review, we convert the words to the rows of our weight matrix that they correspond to. Our inputs correspond to rows in the matrix and our outputs correspond to columns; that's how the matrix multiplication works out.

That allows us, down here, to completely skip generating the input vector. We don't do that in forward propagation at all; we jump straight to generating our hidden layer. All we do is iterate through each index in the review and sum that row of weights_0_1 into layer_1. So we take layer_1, which has been zeroed out right here, and say: for each index, add that row of weights_0_1 into the hidden layer. Add the row for "horrible," it's this one; add the row for "terrible," it's this one; add it in here. We skip entirely the 70,000-odd other words that weren't mentioned in this review. That is a huge, huge time saving, and I'm really excited to see how much speed it gives us.

Now, this next part is exactly the same: layer_2_error is exactly the same, layer_2_delta is exactly the same, layer_1_error and layer_1_delta are exactly the same, because nothing there has changed. The only thing that's different is the way we update the weights, which is basically the inverse of how we populated the hidden layer. We iterate through the same indices and update only the weights we actually forward propagated, leaving all the rest alone. This makes even more sense when you look at how back propagation works: to compute the weight adjustment, you multiply the input value by the delta on the hidden layer. So to update these four rows, it's 1 times the delta that's right here, and you subtract that from the weights. If the input is 0, that subtraction is always going to be 0, so we can skip it on the way back as well, which is great.
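Here is a minimal sketch of that idea, pulled out as standalone functions rather than the course's class methods. The function names review_to_indices and train_step, the signatures, and the default learning rate are my own assumptions for illustration; weights_0_1, weights_1_2, layer_1, and word2index follow the names used in the project.

```python
import numpy as np

def review_to_indices(review, word2index):
    """Convert raw review text into the set of weights_0_1 rows it touches."""
    return {word2index[word] for word in review.split(" ") if word in word2index}

def train_step(review_indices, target, weights_0_1, weights_1_2, learning_rate=0.1):
    """One training step that only touches the rows for words in the review.

    review_indices: set of row indices, one per unique word in the review
    target: 1 for a positive review, 0 for a negative one
    """
    hidden_size = weights_0_1.shape[1]

    # Forward pass: instead of multiplying a mostly-zero input vector by
    # weights_0_1, just sum the rows for the words that are present.
    layer_1 = np.zeros((1, hidden_size))
    for index in review_indices:
        layer_1 += weights_0_1[index]

    layer_2 = 1 / (1 + np.exp(-layer_1.dot(weights_1_2)))  # sigmoid output

    # Backward pass: the deltas are computed exactly as before.
    layer_2_error = layer_2 - target
    layer_2_delta = layer_2_error * layer_2 * (1 - layer_2)
    layer_1_error = layer_2_delta.dot(weights_1_2.T)
    layer_1_delta = layer_1_error  # no nonlinearity on the hidden layer

    # Weight updates: weights_1_2 as usual; for weights_0_1, only the rows we
    # forward propagated are touched (the input is 1, so the adjustment is
    # just the hidden delta times the learning rate).
    weights_1_2 -= layer_1.T.dot(layer_2_delta) * learning_rate
    for index in review_indices:
        weights_0_1[index] -= layer_1_delta[0] * learning_rate

    return layer_2  # prediction, handy for tracking accuracy
```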
So that's how we do our weight update. All the evaluation logic is the same, the test logic is the same, the run logic is the same. Now we want to try it out. So we go ahead and create the network, put it right there, and grab the training call as a reference; this trained at around 100 reviews per second before. Go ahead and run that, and grab the test as well, because we want to see that too. A thousand reviews a second, over ten times faster. Look at that, 1,300 and still converging at the same rate. 1,300 reviews per second, that is over an order of magnitude increase in speed, because we got rid of all that wasted computation. And the testing, you almost didn't even notice it happened. Cool, awesome. And we're getting a great score, which should be exactly identical, because mathematically nothing has changed. This is so much faster, awesome.

If we wanted to, we could train for multiple iterations. So if I multiply the training data by 2 and hit train, the second it starts it's converting everything into indices, and then it starts training, so I guess I lose a little bit there. But if you keep doing more iterations, it uses the same indices: it converts them once and keeps going. Wow, reviews per second is getting up to 1,500, it's crazy fast. So we can keep training and the training accuracy keeps going up. The faster your network goes, the more iterations you can do in a given period of time. And the way we accomplished this was by stripping out the work the neural net was doing that wasn't actually helping it predict, or really even learn. That's what this strategy is all about.

Now, in the next section we're going to go back to our data and ask: can we improve the modeling again? Earlier we improved the way we frame the problem; in this project we improved the way the neural net does its forward and back propagation given that framing. Because the input just says which vocabulary words are present, we could trim out a lot of wasteful computation the neural net was doing. Now we're going to go back and ask: can we reframe the problem again to reduce even more noise, and potentially cut down the weight updates we have to do even further, to speed up how fast it learns and how fast it computes? So we'll do one more iteration of this in Project 6. I'll see you there.
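To illustrate the point about extra iterations reusing the same indices, here is a toy usage of the sketch above. The vocabulary, matrix sizes, label, and review text are made up for the example; only the pattern of converting once and training repeatedly mirrors what's described here.

```python
import numpy as np

np.random.seed(1)
word2index = {"horrible": 4, "terrible": 9, "great": 12}   # toy vocabulary
weights_0_1 = np.zeros((20, 10))                           # vocab rows x hidden
weights_1_2 = np.random.normal(0.0, 0.1, (10, 1))          # hidden x output

# Convert the raw text to indices once, then reuse them for every iteration.
indices = review_to_indices("this movie was horrible horrible terrible", word2index)
for _ in range(2):                                          # e.g. dataset repeated twice
    prediction = train_step(indices, 0, weights_0_1, weights_1_2)
print(prediction)
```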
