9 – Understanding Inefficiencies in our Network

All right. So in the last section, we optimized our neural network to better find the correlation in our data set by removing some distracting noise, and the network attended to the signal much better. It trained to 83% accuracy on the training data, and the testing accuracy was up to 85%. And this was after just one iteration. So we probably could have kept training it and squeezed out a little more, but we're going to keep using this one-iteration benchmark to see how fast we can get the neural net to train. I mean, this is up from 60% before, so this is a huge gain in accuracy and in training speed for our neural network. That was a lot of progress.

However, the actual raw computational speed, the number of seconds it takes to do a full pass, is still pretty slow. What I want to do here is attack this network and ask, okay, what is this thing doing that is wasteful on the computation side? Before, we had wasteful data; now I want to find what is wasteful inside the neural net itself. Because, you know, it's funny: you could do a lot of things on the theory side to try to make it learn faster. But the truth is, the network before was learning; it was just taking a really long time. So we also could have tried to optimize the computational side so that it trains much faster while still learning the same things. The faster you can get your neural net to train, then, to be honest, the longer you'll let it train before you get bored, and the more interesting stuff you'll find. And people who train neural nets can kind of just keep training. There's no natural finish. It's unlike probabilistic graphical models, or many of them anyway, where you do a discrete count of lots of different things, and when it's done, it's done. Neural nets can kind of just keep improving in accuracy as they train, right?
But the faster you can get it to train, the more data you can put into it, and the stronger it can be. So what we're going to do here is analyze what's going on in our network and look for things we can shave off that will let our neural net go faster.

Now, there's one thing that stands out to me right away. We're creating a really big vector for layer 0. It's 74,000-something values, right? And only a handful of them are being turned on to 1. Why does this matter? Well, this forward propagation step is a weighted sum. We take this 1, we multiply it by these weights, and we add the result into layer 1. Then we take the next value, a 0, we multiply it by these weights, and we add that into layer 1. Whoa, wait a minute. We take a zero, multiply it by these weights, and then add the result. That means every time there's a zero, when we take this vector and do a big matrix multiplication to create layer 1, all these zeros aren't doing anything, because zero times anything is still just zero. So zero times this row of weights added into layer 1 doesn't change layer 1 from what it was before. That, to me, is the biggest source of inefficiency in this network.

To show you computationally, and sort of prove to you, that this is the case, check this out. We have a fake layer 0 that only has 10 values; we'll picture it here. Then we say layer_0[4] = 1 and layer_0[9] = 1, kind of pretending that we put a few words in here. Now, looking at layer 0 again, it looks like that, right? Then weights_0_1 is just a random weight matrix. And then we compute layer_0.dot(weights_0_1). Okay, so that's the output. Now, what if instead we only summed the relevant rows in here? We just say, okay, 1 times this row goes in here. So if we have these two indices, 4 and 9, we need a new layer, right?
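The toy setup described above can be sketched like this; the sizes (10 inputs, 5 hidden units) and the variable names layer_0 and weights_0_1 follow the walkthrough, while the random seed is just an assumption for reproducibility:

```python
import numpy as np

np.random.seed(1)

# Toy version of layer 0: 10 "vocabulary" slots, mostly zeros.
layer_0 = np.zeros(10)
layer_0[4] = 1  # pretend words 4 and 9 appeared in the review
layer_0[9] = 1

# A small random weight matrix from the 10 inputs to 5 hidden units.
weights_0_1 = np.random.randn(10, 5)

# The full vector-matrix multiplication: every zero still gets
# multiplied by a whole row of weights before being summed away.
layer_1 = layer_0.dot(weights_0_1)
print(layer_1)
```

Because only positions 4 and 9 are nonzero, this output is exactly rows 4 and 9 of the weight matrix added together; the other eight rows contributed nothing but wasted multiplications.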
So layer_1 equals np.zeros, so it's empty, and it's got 5 values. And we say: for index in indices, layer_1 += 1 * weights_0_1[index], because a 1 is sitting at that position in the data. Boom. Exactly the same values, look at that. And the cool thing here is we only actually worked with part of the matrix. So if this is, you know, two words out of 70,000 words, then we just saved having to perform this multiplication and this sum with all the other words. That should be a pretty great savings. We'll see how much it actually works out to in the end, but it should be really positive. I'll be curious to see how it works out.

Now let's take a look at the neural net again and look for some more efficiency. The other thing that's inefficient is that one times anything is just itself, so this whole "one times" is kind of a waste. What if instead we change this to just be a sum? The "one times" we can just eliminate, right? Run it again, and layer 1 is still the same. Awesome. So we got rid of this multiplication, we got rid of doing all of these altogether, and we're still getting the same hidden state that we were getting when we did the full dot product, the full vector-matrix multiplication. I'm really liking this. I think this is a ton to build upon.

And most of the weights are over here, right? There are only four weights that go from the hidden layer to the output. Well, in our case it's the hidden layer size; I think we have a bigger layer, but most of the computation is here: 74,000 by whatever the hidden layer size is. This is the beefy part of training and running our neural net. So that brings us to the next project. Project five is about installing this into the neural network from before. So I'm going to go ahead and let you take a stab at that, and then we're going to come back and talk about how to do it. All right, best of luck.
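Putting both steps together, here is a self-contained sketch of the trick: compute the hidden layer the slow dense way, then the sparse way with the "1 *" dropped, and check they match. Names and sizes mirror the toy example in the walkthrough; the seed is an assumption:

```python
import numpy as np

np.random.seed(1)
weights_0_1 = np.random.randn(10, 5)

# Dense version: a full dot product with a mostly-zero layer_0.
indices = [4, 9]          # the two "words" present in this review
layer_0 = np.zeros(10)
for i in indices:
    layer_0[i] = 1
dense = layer_0.dot(weights_0_1)

# Sparse version: only touch the rows for words that are present,
# and skip the multiplication entirely since 1 * row is just row.
layer_1 = np.zeros(5)
for index in indices:
    layer_1 += weights_0_1[index]

print(np.allclose(dense, layer_1))  # prints True
```

Same hidden state, but the sparse loop does work proportional to the number of words present rather than the vocabulary size, which is where the speedup in Project 5 comes from.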
