13 – Analysis: What's Going on in the Weights

Welcome back. So in this section we're going to talk a little more about what's going on, a little more theory, a little less project based. It's really just about trying to understand what these weights are doing. What can I attach my mind to when I'm thinking about a neural net training that's going to help me debug it, help me find better signal and cut out more noise, and just give me a better framework to work from?

So, things that we know: we're taking this 1 that represents X1, we're summing it into layer_1, and we're making a prediction. Now there's an interesting phenomenon that happens, especially when you have linear layers right here. I talked about this very briefly before, when I said that we wanted a linear layer here and that I'd explain why later, and now's the time. When we have a linear layer like this making a prediction, what's really happening is that these four weights, or however many you decide there should be, are feature detectors. What they're doing is trying to detect a certain state in this neuron, or this neuron, or this neuron. So if this weight is a big negative number, it's looking for this neuron to be a big negative number. But if that neuron is zero, or if it's a positive number, it's going to have the opposite effect. These weights are looking for a certain set of states; here it's looking for 0.5, 0.9, 1, and negative 2. So there are certain hidden layer states that cause this to output a high number and certain hidden layer states that cause this to output a low number. And it's just a weighted sum, which is actually pretty simple. If these neurons are positive in all the same places that these weights are positive, this outputs a high number. But if the neuron is positive where the weight is negative, and negative where the weight is positive, all the way down, then this ends up being a really, really low number. So when these have the same polarity, and have high values in all the same places, this makes a high prediction.

So what does that mean for all this stuff back here? Well, think about what commonly happens in our data set. Horrible and terrible are associated with negativity, right? And excellent and fantastic, if fantastic were another word up here, are constantly trying to predict a 1. They're all trying to relate to this output in a certain way. So what's interesting is that these words, which are just rows in the matrix, are being trained to manipulate this output in the same way, to create a high value or a low value. What does that mean for what gets learned here? We backpropagate the gradients and we update these weights so that both words exhibit the same phenomenon: horrible and terrible are both supposed to affect this node in the same way. And as we update their weights to make that happen, what happens? Their weights become similar. Why? Because both of them have the same goal. If this one predicts positively, this node is going to go, no, no, no, update your weights, mister horrible vector, and be a little more like this, because this is what causes me to predict negative. That's what backpropagation is all about: this node telling these weights how to be better, how to be more accurate. And because it's sending the same signal to horrible and terrible, be negative, they end up congregating together, and it's really quite cool.
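To make that weighted-sum intuition concrete, here's a minimal numpy sketch, not the project code itself. The names layer_1 and weights_1_2 are just illustrative, the detector values are the 0.5, 0.9, 1 and negative 2 from the example above, and I'm assuming a sigmoid on the output neuron:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical output-layer weights: the "feature detector" from the example,
# looking for hidden values around 0.5, 0.9, 1 and -2.
weights_1_2 = np.array([[0.5], [0.9], [1.0], [-2.0]])

# A hidden state whose signs line up with the detector -> high prediction.
layer_1_aligned = np.array([[0.6, 1.1, 0.8, -1.5]])

# A hidden state whose signs are flipped everywhere -> low prediction.
layer_1_opposed = np.array([[-0.6, -1.1, -0.8, 1.5]])

print(sigmoid(layer_1_aligned.dot(weights_1_2)))  # close to 1
print(sigmoid(layer_1_opposed.dot(weights_1_2)))  # close to 0
```

The whole prediction is really just a measure of how well the hidden state's polarity matches the detector's.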
And so check this out. We've got kind of a heuristic for seeing how similar the different vectors are for each word, and now we can go through and say, okay, show me the words that have the most similar vector to excellent. I'm just using a simple dot product. Excellent, perfect, amazing, today, wonderful, fun, great, best, and this is on a trained neural net. Look at that! After we trained, because these words, excellent, perfect, amazing, were all supposed to create the same effect, or a very similar effect, on the output, they were given similar weights. Inversely, if we say the opposite, if we say terrible: worst, awful, waste, poor, terrible, dull, poorly, disappointment, fail, disappointing, boring, unfortunate. I mean, it just goes on, and on, and on. This is actually an even better filter than our original counts. Our counts had these weird names in them, and they were just noisy, but this is awesome. It's clear, it's evident, that the network has figured out that these words are related. These words are all trying to create the same effect on the output.

Now, to be clear, we can't overstate this. It doesn't know that these words have the same meaning generally. All it knows is that they have the same meaning in the context of this one output neuron, because it's the only output neuron there; they exist to create the same effect on that neuron. If we had a bunch of different neurons, each pushing these words to be the same or different in a bunch of different ways, this could get a lot more complicated. But basically it grouped words by sentiment. It said, hey, all you negative words get a really similar vector, and all you positive words get a really similar vector. And that's a really powerful intuition, because it means you can look at a neural net like this and say, hm, okay, we're going to train it on this, so it's probably going to group these vectors like this and those vectors like that, and you can start to think about what's going on under the hood and how you can, once again, identify signal and noise. Because that's what framing the problem is really all about.

Now, to add a bit of extra jazz to this, what I like to do is something called t-SNE. What t-SNE does is take a high dimensional vector, and in our case high dimensional might just be four, and project the vectors down into two dimensions, or however many you pick, so we can plot them on an X-Y graph and see how they're naturally clustered in the higher dimension. And the cool thing is that we can do that here. So what I'm going to do is cluster them, but then I'm also going to say, hey, the words whose ratio was really positive, I'm going to make them green, and the ones that were really negative, I'm going to make them black. And in theory, because all these vectors are really similar, it should show how they're clustered already. And look at that. I mean, there's a couple that are scattered in between, but the neural net has clearly separated those. There's this big, long negative cluster, and then these really nice positive clusters. Let's take a look. I think we can add labels here; I left them in, and it got kind of messy.
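In case it helps to see that similarity heuristic in code, here's a rough sketch of the dot-product check. It isn't the exact project code: similar, word2index (the word-to-row lookup) and weights_0_1 (the trained word-to-hidden weight matrix) are assumed names here.

```python
import numpy as np
from collections import Counter

def similar(target, word2index, weights_0_1, top_n=10):
    """Score every word by the dot product of its embedding row
    with the target word's row, and return the closest matches."""
    target_vec = weights_0_1[word2index[target]]
    scores = Counter()
    for word, index in word2index.items():
        scores[word] = np.dot(weights_0_1[index], target_vec)
    return scores.most_common(top_n)

# e.g. similar('excellent', word2index, weights_0_1)
#      similar('terrible',  word2index, weights_0_1)
```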
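And here's a rough sketch of the t-SNE plot, again with assumed names (plot_word_clusters, positive_words) and using sklearn and matplotlib, which may not match the exact plotting setup used in the project:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_word_clusters(words, word2index, weights_0_1, positive_words):
    """Project each word's weight vector down to 2-D with t-SNE and
    color it by sentiment: green for positive, black for negative."""
    vectors = np.array([weights_0_1[word2index[w]] for w in words])
    coords = TSNE(n_components=2, random_state=0).fit_transform(vectors)

    colors = ['green' if w in positive_words else 'black' for w in words]
    plt.scatter(coords[:, 0], coords[:, 1], c=colors, s=10)
    for (x, y), word in zip(coords, words):
        plt.annotate(word, (x, y), fontsize=6)  # the labels that get messy
    plt.show()
```

Calling something like plot_word_clusters(vocab, word2index, weights_0_1, positive_words) would produce the kind of green and black scatter described here.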
So if we click that and zoom in, you can kind of look around. Wow, interesting. So: unconvincing, dimensional, lousy. Look at this: extraordinary, lovable, provoking, award, touched. It's beautiful, it's awesome. So we can kind of look around; maybe we'll call this a vector space. And actually, you see the ones that are kind of ambiguous are the ones with those names in them, so this is a bit of a noisier area. It's like Dan right there. [LAUGH] But still, it did a really good job of clustering vectors by sentiment automatically. I guess I told it to do this, because I gave it training data that allowed it to do this. But all it was trying to do was predict accurately, and implicitly, while it was trying to predict accurately, it had to group these words to have similar or different vectors so that it could classify them in aggregate, when they're all summed together, as being more positive or more negative. And that's what the neural network did. It's really cool to see it behave this way.

So I guess, to close, this is what neural nets are all about. This is what framing the problem is all about. It's about understanding what's going on under the hood so you can reduce the noise as much as possible and increase the signal as much as possible, so that the network can find the structure that you're interested in it finding. We didn't tell it that lovable and extraordinary were similar terms in this context. All we said was, hey, all these reviews are positive, all these reviews are negative, figure it out. And it figured out which terms were similar and which terms were different, and it was able to do that to make accurate predictions. So I hope you've enjoyed this segment. We'll see you soon, and continue to enjoy your course.
