# 5 – K-means Implementation

Here I’ve read in an image of a monarch butterfly, and I want to segment this image into a few pieces, just enough to separate the green background scenery from the orange and black butterfly. To perform K-means segmentation, I’m going to focus on one distinguishing feature: the color of each pixel. I’ll need to reshape this image so that I’m feeding a 2D array into the K-means algorithm. It should be m by 3 in dimension, where m is the number of pixels and 3 is the number of color channels. I’ll also convert these values to a float type. This is all done to prepare the data for K-means clustering, because OpenCV’s K-means function expects this kind of data.

Then, to perform K-means, I just use the cv2.kmeans function. This takes in the m by 3 array of pixel values that we just created. Then a value for K, which I’ve initially set to two. Then any labels we want, none in this case. Then our stop criteria, which I’ll have to define. Then a number of attempts, and finally how we choose our initial center points, which is randomly.

But let’s go back to our criteria for a moment. I’m actually going to define this above the function call. The criteria tell the algorithm when to stop. Here, I’m telling it to stop when it reaches either a value of epsilon or a maximum number of iterations. The maximum number of iterations is 10, and epsilon is a value we briefly talked about earlier: if the cluster centers move less than this amount during an iteration, the algorithm stops because it has reached convergence.

Now, I’m choosing a low value of K at first because it will take less time to process, and a K equal to two should produce a binary image of the two dominant colors in the image. K-means then returns the cluster label for each pixel, 0 or 1 in this binary case, along with the center point of each cluster.

Next, you’ll want to display this result. To display the segments, I’ll need to convert the data back into an 8-bit image. I’ll also reshape this segmented data into the original shape of the image copy.
Now I’ll be able to plot the segmented image as I would any other image. It takes a short amount of time, and you can see that this image is now broken down into two distinct segments. One cluster is this orange value and the other is a sort of dark, swampy green for the background.

I can actually visualize the cluster labels by displaying them one at a time, like a mask. Let’s see the label equal to one. Here we can see that cluster one contains the colorful butterfly and flower pixels. This is our orange cluster. I can even go so far as to use this information to mask the image at this segment. Here you can see I’ve blocked out that whole segment.

But let’s go back to our K-means step up here, and instead of having K equal to two, let’s up K to six and run it again. This took longer to process, but now you can see six distinct color segments. There is a brighter orange for the butterfly’s wings, colors for the flowers, and different greens for the background. I can do anything I want with each of these segments, like masking or performing some analysis just on the wings of the butterfly or on the background. So segmentation can be a powerful tool.