9 – Understanding Neural Noise

Okay, so in this section we're going to talk about noise versus signal. Now, the network's job is to look for correlation, and neural nets can be very, very good at that. But once again, this is about framing the problem so that the neural net has the biggest advantage and can train on and understand the most complicated patterns possible.

What we saw in our last section is that this thing really wasn't training very quickly. It seemed like there's a lot more signal in words like "excellent" versus "terrible" versus "moving" — words that are really positive or negative. If I as a human were looking at this text, I could predict with better than 60% accuracy using just the words. So before we start getting fancy with crazy regularization or fancy things in the neural net, we want to go back to the data. The data is where all of the gold is. The neural net is just the backhoe that we're going to dig the gold out of the ground with. If we're not finding much gold, especially in the beginning, it's probably not a problem with our backhoe. It's probably a problem with where we're digging, the dirt that we're choosing, and how we're manipulating it.

So what I want to talk about here is noise versus signal, because that's the whole game. The whole game is saying: we have a ton of data, we know there's a pattern in here, and we want the neural net to be able to find it. Now, earlier, by kind of sheer luck, when we created this update_input_layer, I saw an 18 in the input layer. At the time, that set something off in my mind: wow, that's really high. Consider for a moment what it means to have an 18 right here. Forward propagation here is a weighted sum: there are four weights coming out of each input node, each input value gets multiplied by its four weights, and those vectors are summed into the hidden layer. Well, it's actually weighted sums of weighted sums, but whatever. This weight vector is characteristic of "horrible" — and when I say vector, I mean the list of weights: this weight, this weight, and this weight have a certain impact on these four hidden nodes. And it's funny, the input value and the weights kind of interplay with each other. How high this input number is affects how dominantly these weights control the hidden layer, and these weights control how dominantly this input affects the hidden layer. They're multiplied by each other, so they both interplay in that way. However, if this input is multiplied by 18 and this one is multiplied by 1, the 18 is going to dominate: the hidden-layer vector is basically going to be exactly these four weights times 18. As a percentage of the energy in these hidden nodes, it's going to be mostly this one word. So I was looking at this and going, okay, which word is being weighted by 18? Well, if we look it up — vocab[0], because this is the zeroth position — okay, so this is it. It's nothing. One of the words is a nothing word, the empty string, and there were 18 of them in this review. One of them would probably be fine, and the neural net could sort that out. The little sketch below shows how a single large count like that drowns out everything else.
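Here's a tiny numpy sketch of that effect. The three-word vocabulary and all the numbers are made up purely for illustration; the names layer_0 and weights_0_1 just mirror the names used in the earlier projects:

```python
import numpy as np

# Toy numbers, just to illustrate the point: 3 vocabulary words, 4 hidden nodes.
# Row i of weights_0_1 holds the four weights leaving input node i.
weights_0_1 = np.array([[ 0.10, -0.20,  0.05,  0.30],   # word 0: '' (the empty string)
                        [ 0.40,  0.10, -0.30,  0.20],   # word 1: e.g. "insightful"
                        [-0.10,  0.20,  0.10, -0.40]])  # word 2: e.g. "welcome"

# Count-based input: the empty string showed up 18 times, the real words once each.
layer_0 = np.array([18.0, 1.0, 1.0])

layer_1 = layer_0.dot(weights_0_1)   # the forward pass is just this weighted sum
print(layer_1)                       # roughly [ 2.1, -3.3,  0.7,  5.2]
print(18 * weights_0_1[0])           # roughly [ 1.8, -3.6,  0.9,  5.4]
# The hidden layer is almost exactly 18 * (the empty string's weight row);
# the two meaningful words barely register.
```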
But the network is seeing mostly "nothing" in this vector, with the real words showing up only very softly. So let's look at the rest of it: reviews[0].split(' '). Empty, empty, empty — look at that, empty, empty. At first I'm thinking, okay, maybe a tokenization error, but then there's a whole bunch of periods in here too, so period happens a bunch. I wonder what the distribution is? So: review_counter = Counter(), then for each word in reviews[0].split(' ') we increment review_counter[word]. Once again, using these counters — I love these counters. Then review_counter.most_common(): show me. Wow, look at that. The dominant words have nothing to do with sentiment. There are some meaningful words further down — "insightful" is right there — but when you look at this, most of this review is completely irrelevant filler words like "the", "to", "I", "is", "of", "a". And this weighting is causing them to have a dominant effect on the hidden layer, and the hidden layer is all the output layer gets to use to try to make a prediction. So if this hidden layer doesn't have rich information, the output layer is going to struggle.

So now I'm sitting here going, okay, wait — we decided to use counts earlier. Maybe counts were a bad idea, because the counts don't highlight the signal. When I look at these counts, it seems like they highlight the noise. And when I say "highlight", what I mean is: weight it most heavily. Neural nets are just weights and functions. You take this set of values, you re-weight them into these four hidden nodes, and then you run a function — in this case we don't apply one here, it's a linear layer — and then we re-weight them again, apply a function, and that's our prediction. So if our weighting is off because of how we're creating our input data, it's going to make it really hard to find the signal. That's noise. It means the way we're framing the problem is adding a significant amount of noise, because in these weights the neural net has to learn to be like: hey, period, quiet down; hey "the", "to", "I", "is", "of", "a", quiet down; I need to hear "insightful", I need to hear "welcome", I need to hear the other positive words, because this is a positive review. (If it were a negative review we could look at that too.) This neural net is trying to quiet down all the words that aren't relevant and listen more attentively to the words that are relevant, but we're not helping it when the heaviest weighting goes to the things that are most frequent.

So I think we should try eliminating this, and in Project 4 we're going to try exactly that. Let me go ahead and describe what that's going to be. We're going to grab the code that we had for Project 3, go down to Project 4, and say: okay, Project 4 — Reducing Noise in Our Input Data. We're going to take this network and ask, how can we modify it so that we don't weight the input by these counts anymore? Well, if we don't weight it by the counts, that means each input would always be a one or a zero — just a representation of which vocabulary words are present. If we did that, it should still work, because the period and "the" and "to" and "I" will still be in there, and the neural net still has to decide which words are most important. (For reference, the frequency check from a moment ago looks roughly like the snippet below.)
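A minimal sketch of that frequency check, assuming reviews is the list of raw review strings loaded in the earlier projects:

```python
from collections import Counter

# Count how often each token appears in the first review.
review_counter = Counter()
for word in reviews[0].split(' '):
    review_counter[word] += 1

print(review_counter.most_common(10))
# Tops out with '' (from splitting on spaces), '.', and filler like
# 'the', 'to', 'I', 'is', 'of', 'a' -- high-frequency tokens with no sentiment.
```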
But it doesn't have to deal with "period times 27" — 27 times the weights for period going into the hidden layer is a lot of signal to push back down. Whereas if we just use ones and zeros, a binary representation, that should be a lot less noisy and a lot easier for the network to figure out. Now, that's actually going to be pretty easy to change, so I'll just do that project right here. It's in our update_input_layer: before, we were incrementing each value; if we just get rid of that plus sign, we set each value in layer_0 equal to one if that vocabulary term exists (see the sketch at the end of this section). Okay, so let's rebuild that. Then let's grab our training cell from up here — we'll do our original one — and we need our test cell too. Copy that, get rid of that, go down here, create our network with this basically new class, and hit train.

Look at that — it's already at 60% after 2% of progress. This is amazing progress. We eliminated a lot of our noise by getting rid of this weighting, and the neural net was able to find correlation so much faster. Look at that, 70%, and we're only 9% into training. See, this is what increasing the signal and reducing the noise is all about. It's about making the pattern more obvious for your neural net, so that it can get to work handling the signal, combining it in interesting ways, and looking for more difficult patterns — and you just get rid of the noise in your training data. We could have spent days and days up here tweaking our little alpha, moving it around, lowering it down, trying to get training to happen slowly, but in reality we can have a big fat alpha and make huge steps and progress really quickly if we just get rid of this really silly noise. We're trying to train our neural nets to do interesting stuff. Interesting stuff is not "ignore periods." Interesting stuff is identifying which words are relevant, identifying which combinations of words are relevant — finding interesting representations, doing interesting things in this hidden layer to really understand the vocabulary of what's being said in the review. That's what we want to be happening.

Man, look at that, almost 80%. So I'll go ahead and let this keep training — and go ahead and train this yourself. This is looking great. The next thing I'm noticing, maybe just because this is a video, is that I would like this to be training a lot faster. So the next thing I'd like us to do is take a look inside the neural net, understand what's going on, and see if we can crank out a little bit more speed. But for now, I'm going to let this go ahead and train.
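For reference, here's roughly what the Project 4 change to update_input_layer looks like — a minimal sketch, assuming vocab, word2index, and the (1 x vocab_size) layer_0 array are set up as in the earlier projects:

```python
import numpy as np

# Assumed to exist from the earlier projects:
#   vocab      -- list of all words seen in the reviews
#   word2index -- {word: column index}
layer_0 = np.zeros((1, len(vocab)))

def update_input_layer(review):
    """Project 4 version: mark each vocabulary word as present (1) or absent (0)
    instead of counting how many times it appears."""
    global layer_0
    layer_0 *= 0                                  # clear out the previous review
    for word in review.split(' '):
        if word in word2index:
            layer_0[0][word2index[word]] = 1      # was `+= 1` in Project 3
```

With that one-line change, retraining the same network is what produces the jump to 60–70% accuracy early in training described above.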
