Same exercise for the optimization step, this time for the center below. I give you a couple of hypotheses, four in total. Pick the one that minimizes the total rubber band length; that should be easy now.
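The quiz comes down to scoring each hypothesis by its total rubber band length and keeping the smallest. Here is a minimal Python sketch. It assumes the usual k-means convention that each length is a squared Euclidean distance. The four points and four candidate centers are made-up values, not the ones in the diagram, and the function names are illustrative.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def total_squared_length(points: List[Point], center: Point) -> float:
    """Sum of squared rubber band lengths from each point to one center."""
    return sum((x - center[0]) ** 2 + (y - center[1]) ** 2 for x, y in points)


def pick_best_center(points: List[Point], candidates: List[Point]) -> Point:
    """Return the candidate center with the smallest total squared length."""
    return min(candidates, key=lambda c: total_squared_length(points, c))


# Hypothetical points attached to one center, and four hypothetical candidates.
assigned = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
hypotheses = [(1.0, 1.0), (0.0, 0.0), (2.0, 2.0), (3.0, 3.0)]

best = pick_best_center(assigned, hypotheses)
print(best)  # (1.0, 1.0), the mean of the four points
```

Because the lengths are squared, the winner is the mean of the attached points. That is why the optimize step can move straight to the mean instead of testing hypotheses.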
So after the assign step, you can now see these little blue lines over here that marry data points to cluster centers. Now we’re going to think of these lines as rubber bands, rubber bands that like to be as short as possible. In the optimize step, we’re now allowed to move …
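In code, the optimize step moves each center to the mean of the points its rubber bands are attached to. The mean is the position that minimizes the total squared length. The following is a minimal sketch with a hypothetical helper name and made-up data. Leaving an empty cluster's center where it is counts as one common convention, not the only one.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def optimize_step(points: List[Point], labels: List[int],
                  centers: List[Point]) -> List[Point]:
    """Move every center to the mean of the points assigned to it."""
    new_centers = []
    for i, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == i]
        if not members:
            # No rubber bands attached: leave this center where it is.
            new_centers.append(old)
            continue
        mean_x = sum(x for x, _ in members) / len(members)
        mean_y = sum(y for _, y in members) / len(members)
        new_centers.append((mean_x, mean_y))
    return new_centers


# Made-up example: five points already assigned to two centers.
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0), (8.0, 8.0), (10.0, 8.0)]
labels = [0, 0, 0, 1, 1]
print(optimize_step(points, labels, [(5.0, 5.0), (5.0, 5.0)]))
# [(1.0, 1.0), (9.0, 8.0)]
```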
So now we know that these four points correspond to cluster center one, the one that was randomly chosen. And these three points over here correspond to the cluster center in the middle. That’s the assignment step. Obviously that’s not good enough. Now we have to optimize. And what we are optimizing is, you are minimizing …
And the answer is: this guy is closer to this center, and these guys over here are closer to the other one. The way to see this is to draw a line between the two cluster centers and then draw a second line that is equidistant from both centers and orthogonal to the first. That second line separates the space into a half space that’s closer to center number one, which …
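The equidistant, orthogonal line is the perpendicular bisector of the segment joining the two centers. Start from |p - c1|^2 < |p - c2|^2 and expand both sides. The |p|^2 terms cancel, leaving the linear test 2 p·(c2 - c1) < |c2|^2 - |c1|^2, which says that p lies on c1's side of that line. The Python sketch below uses illustrative function names and made-up centers. It checks that this half-space test agrees with comparing the two distances directly.

```python
from typing import Tuple

Point = Tuple[float, float]


def closer_by_distance(p: Point, c1: Point, c2: Point) -> bool:
    """Direct test: is p strictly closer to c1 than to c2?"""
    d1 = (p[0] - c1[0]) ** 2 + (p[1] - c1[1]) ** 2
    d2 = (p[0] - c2[0]) ** 2 + (p[1] - c2[1]) ** 2
    return d1 < d2


def closer_by_bisector(p: Point, c1: Point, c2: Point) -> bool:
    """Half-space test: is p strictly on c1's side of the perpendicular bisector?"""
    nx, ny = c2[0] - c1[0], c2[1] - c1[1]  # direction from c1 to c2 (the line's normal)
    lhs = 2 * (p[0] * nx + p[1] * ny)
    rhs = (c2[0] ** 2 + c2[1] ** 2) - (c1[0] ** 2 + c1[1] ** 2)
    return lhs < rhs


# Made-up centers; compare both tests on a grid of integer points.
c1, c2 = (1.0, 2.0), (4.0, -1.0)
grid = [(float(x), float(y)) for x in range(-5, 6) for y in range(-5, 6)]
print(all(closer_by_distance(p, c1, c2) == closer_by_bisector(p, c1, c2) for p in grid))
# True
```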
In k-means, you randomly draw cluster centers and say our first initial guess is, say, over here and over here. These are obviously not the correct cluster centers; you’re not done yet. K-means now operates in two steps: step number one is assign, and step number two is optimize. So let’s talk about the assignment. …
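The two steps alternate until the centers stop moving. The following is a minimal pure-Python sketch. It assumes squared Euclidean distance. For reproducibility it starts from a fixed, deliberately poor initial guess instead of a random draw. The helper names and data are illustrative.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def assign_step(points: List[Point], centers: List[Point]) -> List[int]:
    """Step 1: attach each point to its nearest center (ties go to the lower index)."""
    return [min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            for p in points]


def optimize_step(points: List[Point], labels: List[int],
                  centers: List[Point]) -> List[Point]:
    """Step 2: move each center to the mean of its points (empty clusters stay put)."""
    new_centers = []
    for i, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == i]
        if members:
            new_centers.append((sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members)))
        else:
            new_centers.append(old)
    return new_centers


def kmeans(points: List[Point], centers: List[Point], max_iter: int = 100):
    labels: List[int] = []
    for _ in range(max_iter):
        labels = assign_step(points, centers)
        new_centers = optimize_step(points, labels, centers)
        if new_centers == centers:  # nothing moved: converged
            break
        centers = new_centers
    return centers, labels


# Two made-up blobs; both initial guesses start inside the left blob.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0), (9.0, 9.0)]
centers, labels = kmeans(points, [(0.0, 0.0), (1.0, 0.0)])
print(centers)  # [(0.5, 0.5), (8.5, 8.5)]
print(labels)   # [0, 0, 0, 0, 1, 1, 1, 1]
```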
Okay, and I would argue it’s two. There’s a cluster over here and a cluster over here, and the cluster centers lie right over here and somewhere over here, respectively. Those are the places we would like to find to characterize the data.
Perhaps the most basic algorithm for clustering, and by far the most used, is called k-means. I’m going to work through the algorithm with you, with many, many quizzes along the way. Here is our data space, and suppose we are given this type of data. The first question is, intuitively, how many clusters …
And I would say the answer’s yes. You could make it so that the cluster centers sit one right above the other, and the separation line looks like this. All the top points are then associated with the top cluster center, and all the bottom points are associated with the bottom cluster center. Granted, …
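To see such a local minimum concretely, here is a made-up data set with a left and a right cluster, each consisting of a bottom point and a top point. The sketch runs a minimal pure-Python k-means with hypothetical helper names and squared Euclidean distance. Started from a top/bottom pair of centers, it never leaves that split. A left/right start finds the much cheaper solution.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points: List[Point], centers: List[Point], max_iter: int = 100):
    """Alternate assign/optimize from fixed starting centers; return (centers, cost)."""
    for _ in range(max_iter):
        labels = [min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
                  for p in points]
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(old)
        if new_centers == centers:
            break
        centers = new_centers
    # Total squared rubber band length of the final solution.
    cost = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, cost


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

stuck_centers, stuck_cost = kmeans(points, [(5.0, 0.0), (5.0, 1.0)])  # top/bottom start
good_centers, good_cost = kmeans(points, [(0.0, 0.0), (10.0, 0.0)])   # left/right start
print(stuck_centers, stuck_cost)  # [(5.0, 0.0), (5.0, 1.0)] 100.0
print(good_centers, good_cost)    # [(0.0, 0.5), (10.0, 0.5)] 1.0
```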
Let me give another example and ask you a quiz. Suppose we have data just like this over here. Do you think there could be a local minimum if you initialize this data set with two cluster centers? Is there a stable solution in which the two cluster centers would not end up one over here …
And the answer is yes, and I’ll prove it to you. Suppose you put one cluster center right between those two points over here and the other two somewhere in here. It doesn’t even have an error. In your assignment step, you will find that pretty much everything left of this line would be allocated …
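Here "stable" means that one more assign step followed by one more optimize step leaves every center exactly where it was. The sketch below checks that fixed-point property on made-up data: three clusters of two points each, with illustrative function names. Parking one center between the left and middle clusters and putting two centers inside the right cluster gives a fixed point. The natural one-center-per-cluster solution is also a fixed point, but the first configuration's total squared length is far larger.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def one_round(points: List[Point], centers: List[Point]) -> List[Point]:
    """One assign step followed by one optimize step."""
    labels = [min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
              for p in points]
    new_centers = []
    for i, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == i]
        if members:
            new_centers.append((sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members)))
        else:
            new_centers.append(old)
    return new_centers


def cost(points: List[Point], centers: List[Point]) -> float:
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


points = [(0.0, 0.0), (0.0, 1.0),    # left cluster
          (10.0, 0.0), (10.0, 1.0),  # middle cluster
          (20.0, 0.0), (20.0, 1.0)]  # right cluster

local = [(5.0, 0.5), (20.0, 0.0), (20.0, 1.0)]    # one center between two clusters
natural = [(0.0, 0.5), (10.0, 0.5), (20.0, 0.5)]  # one center per cluster

for name, centers in (("local", local), ("natural", natural)):
    print(name, one_round(points, centers) == centers, cost(points, centers))
# local True 101.0
# natural True 1.5
```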
So let’s make another data set. In this case, you’re going to pick three cluster centers, and, conveniently, we’ll draw three clusters onto my diagram. Obviously, with three cluster centers, you want one center here, one right here, and one right over here. So my question is, is it possible that all these data …
And the answer is no, as I will illustrate to you. K-means is what’s called a hill-climbing algorithm, and as a result its output is very dependent on where you put your initial cluster centers.
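Each run only climbs to the nearest stable solution. A common remedy is to restart k-means from several random initializations and keep the run with the smallest total squared length. scikit-learn's `KMeans` does this through its `n_init` parameter. The pure-Python sketch below only illustrates the idea and is not scikit-learn's code. It seeds each run with k distinct data points (so-called Forgy initialization). The data and helper names are made up.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points: List[Point], centers: List[Point], max_iter: int = 100):
    """One hill-climbing run from fixed starting centers; returns (centers, cost)."""
    for _ in range(max_iter):
        labels = [min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
                  for p in points]
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, sum(min(sq_dist(p, c) for c in centers) for p in points)


def kmeans_with_restarts(points: List[Point], k: int, n_restarts: int = 10, seed: int = 0):
    """Run k-means from several random starts; return the best run and all costs."""
    rng = random.Random(seed)
    runs = [kmeans(points, rng.sample(points, k)) for _ in range(n_restarts)]
    best = min(runs, key=lambda run: run[1])
    return best, [run_cost for _, run_cost in runs]


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (20.0, 0.0), (20.0, 1.0)]
(best_centers, best_cost), costs = kmeans_with_restarts(points, k=3, n_restarts=20)
print(sorted(set(costs)))  # different starts can end at different costs
print(best_centers, best_cost)
```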
So now we’ll look at the limits of what k-means can and cannot do, and you’re going to try to break it. Specifically, we’ll talk about local minima, and to do this, I want to ask you a question that you can think about to see if you get the answer right. Suppose you use …
So here’s an example that should make it intuitively clear that clustering sometimes makes sense. Take Katie and me: we both have a movie collection at home. Just imagine that both of us look at each other’s movies, at all the movies, and Katie gets to rank them from really, really bad to great. …
Now this wraps up what we’re going to talk about in terms of the k-means algorithm. What I’ll have you do is practice the coding aspects of this much more in the mini project. But before we do that, here are a few thoughts on things that k-means is very valuable for and a few …
Now that I’ve explained the theory of k-means clustering to you, I’m going to show you how to use the scikit-learn implementation to deploy it in your own studies. So I start over here at Google, and I find that there’s a whole page on clustering in scikit-learn. The first thing that I notice when …
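For reference, here is a minimal sketch of the scikit-learn estimator for this algorithm, `sklearn.cluster.KMeans`. The data are made up, and the parameter values (`n_clusters=2`, `n_init=10`, `random_state=0`) are illustrative choices, not the lecture's. Passing `n_init` explicitly avoids depending on its default, which has changed between scikit-learn releases.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up 2-D data: two well-separated groups of three points each.
X = np.array([
    [1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
    [8.0, 8.0], [8.5, 9.0], [9.0, 8.0],
])

# n_clusters is k; n_init reruns k-means from different seeds and keeps the
# run with the lowest inertia; random_state makes the seeding reproducible.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(X)

print(km.labels_)           # cluster index of each training point
print(km.cluster_centers_)  # one row per cluster center
print(km.inertia_)          # sum of squared distances to the nearest center
print(km.predict(np.array([[0.0, 0.0], [10.0, 10.0]])))  # assign new points
```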
Now I’m going to show you another set of data that won’t work out quite so perfectly, but you can still see how k-means clustering behaves. The type of data that I’ll use in this example is uniform points. This is what uniform points look like: they’re just scattered everywhere. So I wouldn’t look …
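To try the uniform-points case numerically, here is a small pure-Python sketch. It draws points uniformly in a 10-by-10 square (a made-up setup) and clusters them with k = 3 from two different random starts. The helper names are illustrative. K-means always returns three centers and a label for every point. The data have no real cluster structure, so there is no reason for the two runs to agree on where the boundaries go.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def lloyd(points: List[Point], centers: List[Point], max_iter: int = 100):
    """Standard assign/optimize loop; returns (centers, labels)."""
    labels: List[int] = []
    for _ in range(max_iter):
        labels = [min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
                  for p in points]
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


data_rng = random.Random(42)
uniform = [(data_rng.uniform(0, 10), data_rng.uniform(0, 10)) for _ in range(300)]

# Same data, two different random starting guesses (3 distinct data points each).
for seed in (1, 2):
    start = random.Random(seed).sample(uniform, 3)
    centers, labels = lloyd(uniform, start)
    sizes = [labels.count(i) for i in range(3)]
    print(seed, [(round(x, 2), round(y, 2)) for x, y in centers], sizes)
```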
One of the things that’s immediately apparent once I start assigning my centroids, with these colored regions, is that every point is going to be associated with one of the centroids, that is, with one of the clusters. So you can see that the blue cluster is probably already in reasonably good shape. I would say that …
And I hope you said three; it’s pretty obvious that there should be three centroids here. So let’s add three: one, two, three. They’re all starting out right next to each other, but we’ll see how, as the algorithm progresses, they end up in the right place.
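Here is a toy pure-Python analogue of that demo, with made-up data and hypothetical helper names. It uses three well-separated clusters of two points each and three starting centers placed right next to each other in the middle. In this toy case, one assign/optimize round moves each center onto its own cluster, and the next round confirms that nothing moves. Close-together starts are not guaranteed to work out this well in general.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points: List[Point], centers: List[Point], max_iter: int = 100) -> List[Point]:
    """Alternate assign and optimize steps until the centers stop moving."""
    for _ in range(max_iter):
        labels = [min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
                  for p in points]
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (20.0, 0.0), (20.0, 1.0)]
start = [(9.0, 0.5), (10.0, 0.5), (11.0, 0.5)]  # all three start close together
print(kmeans(points, start))  # [(0.0, 0.5), (10.0, 0.5), (20.0, 0.5)]
```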
Now I want to show you a visualization tool that I found online that I think does a really great job of helping you see what k-means clustering does, which should give you a good intuition for how it works. I’d like to give a special shout-out to Naftali Harris, who wrote …