2 – 02 YOLO Output V2

When we talked about localization in images, we talked about creating a CNN that could output a predicted class for an object in an image and a predicted bounding box for that object. In the CNN examples that we’ve seen, these outputs are analyzed separately, and the network is trained using a weighted combination of classification and regression losses. Another way to process these outputs is to merge them into a single output vector, which is what the YOLO algorithm does.

Let’s see this in an example. Let’s assume I want to train a CNN to detect three classes: a person, a cat, and a dog. In this case, because we only have three classes, the output vector y will only have three elements, c1, c2, and c3, each of which is a class score, or a probability that the image is of a person, cat, or dog. If you have more classes, this vector will get longer.

For this image, we want to train the CNN so that it can identify the person in the image and locate that person within a bounding box. We can do this by adding some box parameters to our output vector. We can add four more numbers, x, y, w, and h, that determine the position and size of the bounding box. x and y are the coordinates of the center of the box, and w and h are its width and height, so the full output vector now has seven elements: three class scores and four box parameters.

Once you’ve trained your CNN to output class probabilities and bounding box coordinates, you’re one step closer to being able to detect objects in any given image. Next we’ll briefly go over the sliding-windows approach that you’ve seen before, with our example output vector in mind. Then you’ll see how YOLO improves upon sliding windows and breaks an image into a grid for efficient object detection.
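To make the merged output vector concrete, here is a minimal Python sketch of how the three class scores and four box parameters could be packed into one vector and read back out. The element order (class scores first, then x, y, w, h), the use of coordinates normalized to the range 0 to 1, and the names `Box`, `make_output_vector`, `decode_output_vector`, and `box_corners` are all assumptions made for illustration; the transcript does not specify them.

```python
from dataclasses import dataclass
from typing import List, Tuple

# The three classes from the example, in the assumed order c1, c2, c3.
CLASSES = ["person", "cat", "dog"]


@dataclass
class Box:
    """Bounding box given by its center (x, y), width w, and height h.

    Values are assumed to be normalized to the image size (0 to 1).
    """
    x: float
    y: float
    w: float
    h: float


def make_output_vector(class_scores: List[float], box: Box) -> List[float]:
    """Merge class scores and box parameters into one vector:
    [c1, c2, c3, x, y, w, h] (assumed ordering)."""
    if len(class_scores) != len(CLASSES):
        raise ValueError("expected one score per class")
    return list(class_scores) + [box.x, box.y, box.w, box.h]


def decode_output_vector(vec: List[float]) -> Tuple[str, Box]:
    """Split a merged vector back into the top-scoring class and its box."""
    n = len(CLASSES)
    scores, (x, y, w, h) = vec[:n], vec[n:]
    best = max(range(n), key=lambda i: scores[i])
    return CLASSES[best], Box(x, y, w, h)


def box_corners(box: Box) -> Tuple[float, float, float, float]:
    """Convert center/size form to (x_min, y_min, x_max, y_max)."""
    return (box.x - box.w / 2, box.y - box.h / 2,
            box.x + box.w / 2, box.y + box.h / 2)


# A training target for an image containing a person near the center.
target = make_output_vector([1.0, 0.0, 0.0], Box(x=0.5, y=0.5, w=0.25, h=0.5))
label, predicted_box = decode_output_vector(target)
```

With more classes, only `CLASSES` grows; the four box parameters stay at the end of the vector, which is why the vector "gets longer" as classes are added.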
