All right, so in project three we’re going to build our neural network that predicts whether a movie review has positive or negative sentiment, using the counts of the words inside the review. The first change I made was to create a pre-process-data function that pulls in all the little snippets we built and tested above: word-to-index, the different vocabularies, the vocabulary sizes, all the variables we used in our training dataset generation logic. I wanted it in one pre-processing function so that it’s all self-contained in the variables on this class. The next thing I did was split the setup code off into an init-network function, just to keep things clean. It needs to know the number of input nodes, which is based on the number of unique words in our reviews, and the number of output nodes, which is based on the number of labels we have. It’s nice to have these in separate functions so you can read the flow of the code, and I just like keeping it clean that way. The next thing I did was take update-input-layer and set-target-for-label, the functions we played with before, and move them into the class so that it’s all self-contained and together. That makes the class portable, so I can use it somewhere else. All right, so now the train method, this is where most of the action is, right? The first thing I check is that the number of training reviews we have is the same as the number of labels. On the off chance someone passes in something that doesn’t line up correctly, we want to let them know rather than see weird behavior around that. The next thing we do is initialize correct-so-far; we’re going to keep track of how many predictions we get right and wrong while we’re training.
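To make that structure concrete, here is a minimal sketch of how such a class could be laid out. This is my own illustration based on the description above; the class name, method names, and layer sizes are assumptions, not necessarily the exact ones used in the project.

```python
import numpy as np

class SentimentNetwork:
    """Illustrative sketch; names and sizes are assumptions, not project code."""

    def __init__(self, reviews, labels, hidden_nodes=10, learning_rate=0.1):
        self.pre_process_data(reviews, labels)
        self.init_network(len(self.review_vocab), hidden_nodes, 1, learning_rate)

    def pre_process_data(self, reviews, labels):
        # Build the vocabularies and the word -> index lookup once, so
        # every later method can share them as instance variables.
        self.review_vocab = sorted({w for r in reviews for w in r.split(' ')})
        self.label_vocab = sorted(set(labels))
        self.word2index = {w: i for i, w in enumerate(self.review_vocab)}

    def init_network(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Input size comes from the vocabulary, output size from the labels.
        self.learning_rate = learning_rate
        self.weights_0_1 = np.zeros((input_nodes, hidden_nodes))
        self.weights_1_2 = np.random.normal(0.0, hidden_nodes ** -0.5,
                                            (hidden_nodes, output_nodes))
        self.layer_0 = np.zeros((1, input_nodes))

    def train(self, training_reviews, training_labels):
        # Sanity check: reviews and labels have to line up one-to-one.
        assert len(training_reviews) == len(training_labels)
        correct_so_far = 0  # running count of correct predictions
        # ... forward pass, backprop, and logging go here ...
```

Keeping the preprocessing and setup in their own methods means the training loop reads as just the learning logic.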
This is a useful metric that I like to watch to understand how the neural net is doing during the training process. Is it getting better, is it not getting better at all, is it getting worse? These things are the basics of understanding how you’re doing so that you can adjust. Next we select a review and a label out of our training reviews, and we update the input layer. This is the same as before: we propagate from layer zero to layer one, from the input layer to the hidden layer. However, in this case we have to generate our input data from the review first before we can do this propagation. Then we generate the hidden layer the same way as before, except without a nonlinearity, and the last layer we generate with a nonlinearity. So that’s our forward propagation step. For our backpropagation step, the first thing we do is ask: by how much did we miss? This is where we use the function we created. We take our prediction minus the target, and then, because we have a nonlinearity on this layer, our layer 2 delta has to be multiplied by the derivative of the sigmoid, which is sigmoid times (1 minus sigmoid). Then we continue to backpropagate in this way. One thing you’ll see we skip here: because there’s no nonlinearity on layer one (it’s a linear layer), we don’t do that multiplication step, unlike before, so we don’t have to adjust for the slope of a nonlinearity. Once we have our layer 2 and layer 1 deltas, we’re ready to update our weights, which we do in exactly the same way as in our previous neural network. Then I add a little bit of logic to log our progress, as well as how fast we’re training and how many predictions we got correct. Now, how am I deciding whether we got something correct or not? What I’m looking at is the absolute value of the error of our prediction.
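Putting that forward and backward pass into code, a single training step might look like the following sketch. The variable names (layer_0, weights_0_1, and so on) are my assumptions for illustration; the key points are the linear hidden layer, the sigmoid-slope term on the output delta, and its absence on the hidden delta.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train_step(layer_0, target, weights_0_1, weights_1_2, learning_rate):
    """One forward/backward pass; updates the weight matrices in place."""
    # Forward pass: hidden layer is linear (no nonlinearity),
    # output layer applies the sigmoid.
    layer_1 = layer_0.dot(weights_0_1)
    layer_2 = sigmoid(layer_1.dot(weights_1_2))

    # Backward pass: by how much did we miss, and in what direction?
    layer_2_error = layer_2 - target
    # The output layer has a nonlinearity, so scale by the sigmoid slope.
    layer_2_delta = layer_2_error * layer_2 * (1 - layer_2)
    layer_1_error = layer_2_delta.dot(weights_1_2.T)
    # The hidden layer is linear, so there is no slope adjustment here.
    layer_1_delta = layer_1_error

    # Weight updates, exactly as in the earlier network.
    weights_1_2 -= layer_1.T.dot(layer_2_delta) * learning_rate
    weights_0_1 -= layer_0.T.dot(layer_1_delta) * learning_rate
    return layer_2
```

Repeatedly calling this on one example with its target should pull the prediction toward that target, which is a quick way to sanity-check the gradients.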
So up here we calculate the difference between what our prediction should be and what it was. If the network predicts exactly 0.5, it’s totally ambiguous: halfway between positive and negative, it didn’t pick either. But if it’s closer to the right prediction, this error measure will be less than 0.5. That’s how I can count how many classifications we got right, as opposed to just watching the loss, by computing this on the fly and logging as we go. Now, the other thing I want to be able to do here is test the network, which is really just a matter of taking the forward-pass logic and the evaluation logic and putting them in one function, which I did here. Then I added another function for running, where we can put in a text review; it converts that text into input data, forward-propagates, and gives a POSITIVE or NEGATIVE label. So we can test it on the whole dataset, or we can throw in some examples and see whether we like the results. So now that we’ve got this, let’s go ahead and create one. Here I’m actually selecting the first 24,000 reviews to train on, and the last 1,000 reviews will be our test dataset. You could pick a different train/test split; there’s actually another 25,000 reviews in the IMDB dataset you could use. But just for the sake of making it easy, I think we’re going to go with this. We’re going to initialize it this way, with our default learning rate, and print it so we can see it. The other thing I like to do before we get started is actually test it. Our weights are initialized randomly right now, so it shouldn’t really predict well at all.
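As a small sketch of the decision rule just described (my own illustration, assuming predictions are sigmoid outputs between 0 and 1, with targets of 1 for POSITIVE and 0 for NEGATIVE):

```python
def is_correct(prediction, target):
    # Exactly 0.5 is ambiguous and counts as wrong; anything on the
    # correct side of 0.5 has an absolute error below 0.5.
    return abs(prediction - target) < 0.5

def to_label(prediction):
    # Turn a raw sigmoid output into a human-readable label, the way
    # the run-style helper described above would.
    return "POSITIVE" if prediction >= 0.5 else "NEGATIVE"
```

Counting `is_correct` over a dataset gives the classification accuracy, which is often easier to interpret at a glance than the raw loss.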
So in this case, testing accuracy is exactly 50%, which is what you should get if you just guessed randomly between positive and negative, and that’s actually what we see here. Which is a good place to start. Especially when you have a neural net with only two possible predictions, I really like to see it start off not being biased one way. If I initialize my weights in such a way that it always predicts one class, or always predicts the other, or doesn’t get any of them right, then I scratch my head, think something’s probably broken, and go investigate that first. But as we can see, it’s guessing randomly, and it doesn’t seem to have any real predictive power at the moment. So now we’re going to try to train our network. Something I threw in here a little bit later is that every 2,500 predictions it prints a new line, so we can see not just the current progress but how it changes over time. Now, when I’m watching it train, there are a few things I’m looking at. First is speed, trying to gauge how long I’m going to be sitting here. Then I’m also looking at the training accuracy. So far it’s actually not predicting particularly well; it’s doing about as well as random, or even slightly worse than before we trained. At this point, when we’re 14% of the way through the training dataset and it hasn’t learned anything yet, I’m really starting to think something is probably wrong here. There are a few types of neural nets where at this point you would expect it to keep predicting randomly, especially in reinforcement learning, but on this dataset we’re looking for a direct correlation, so I should be seeing some change by now. I’m just going to go ahead and quit this. We could wait longer, but I just don’t think it’s going to be worth it, so we’re going to hit stop.
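That 50% starting point is exactly what an input-ignoring guesser earns on a balanced two-class dataset, which is why it’s a reassuring baseline. A quick standalone simulation (illustrative, not part of the project code):

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
labels = ["POSITIVE", "NEGATIVE"] * 500  # balanced two-class labels
guesses = [random.choice(["POSITIVE", "NEGATIVE"]) for _ in labels]
accuracy = sum(g == t for g, t in zip(guesses, labels)) / len(labels)
# accuracy lands near 0.5, matching the untrained network's behavior
```

If an untrained network scores far from 50% on balanced data, that itself is a signal worth investigating before training.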
The natural thing for me to do here is think: okay, the learning rate’s too high, right? When things behave like this, maybe it’s diverging, who knows. So let’s go ahead and adjust this learning rate to be lower. A good way to feel things out is to first move by orders of magnitude, so I’m going to divide it by ten, reinitialize the network, and train again. Man, I’m starting to see the same behavior; it’s not really getting better. We’ll let it train for a second while we talk about why I was lowering the learning rate. The learning rate, if you remember from before, is the step size: how big of a jump the network tries to take to reduce the error. A standard reason why this kind of thing happens is overshooting, where the network ends up not really any closer to solving the problem than when it started, because it’s jumping too far. Undershooting means the network trains very, very slowly, but it does tend to make progress. This could be "very, very slowly", but it just doesn’t look like it’s training at all. It’s camping out right near 50%, and that’s really concerning. We’re 20% of the way through here, and I should be seeing something at this point. So we’re going to cancel this and go again, one more time. Okay, still camping out. Come on, buddy. It’s funny, eventually these types of metrics become really entertaining to watch. And man, I’m actually still kind of surprised it’s not happening. Here we go, okay, it’s starting to learn a little bit, so this is a good sign, right? It’s starting to find correlation, but it’s still going pretty slowly. Not only is it slow in terms of reviews per second (it’s only processing about 100 reviews per second), it’s also not converging very quickly.
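The "move by orders of magnitude" habit is just dividing the rate by ten on each retry before doing any fine-tuning. A sketch of generating that sweep (the 0.1 starting value is an assumption for illustration; the project’s actual default may differ):

```python
# Step the learning rate down by powers of ten to bracket the useful
# range quickly, rather than nudging it by small amounts.
base_rate = 0.1  # assumed starting rate
candidate_rates = [base_rate / (10 ** i) for i in range(4)]
# candidate_rates is approximately [0.1, 0.01, 0.001, 0.0001];
# you would reinitialize the network and retrain once per rate.
```

Once a power of ten that trains well is found, you can nudge within that decade if it seems worth the time.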
And I can keep knocking down the learning rate, but the truth is, the more you knock down the learning rate, the slower the learning happens, right? Before it was overshooting, and I could keep knocking the rate down. So one thing I could do here is continue to tweak the learning rate; I could spend all day doing that, and I would get incremental improvements. But we’re so early on, and we haven’t refined anything, that it’s worth posing some bigger framing questions and really re-evaluating our neural network. Can we frame this problem so that the correlation is a little more clear? So right now I’m going back and thinking, okay, up here, this is our setup, right? We’re counting the words, putting the counts into the input layer, and it’s making a prediction. What about this is so difficult for the network? It is converging, but it’s just not going very quickly. Is there anything we can do to make it more obvious for the network to identify the words that we validated earlier (well, not those lists, those were the raw counts) so that it finds these words more easily? There are two things I typically do here. One is I start changing stuff and see if anything works, and the other is I dig deeper into exactly what’s going on: take a look at a few training examples, and make sure the pattern I think should be in there is actually showing up, or maybe I have a mistake in my logic. Nine times out of ten, when something’s not training correctly, it means there’s something simple in here that I got backwards, rather than something needing a big complicated change. But sometimes it does need a big complicated change. Meanwhile, this is still training really, really slowly. I mean, if we extrapolated this, you know things can train fast in the beginning and then slow down and taper off.
So I don’t really see this getting much past 61 or 62 percent, something in that range, even if we keep training. All right, so now I’m going to ask my question: okay, how can I make this simpler? What is the signal in my training data, and what is the noise in my training data? That’s going to be the topic of the next section, where we’re going to analyze the data and then try to see if we can get this training to happen faster. So feel free to let this train all the way; I don’t think it’ll get too much past this. And I really feel that we’re going to be able to build a better classifier here in a minute.