4 – 04 Using A Grid To Improve Localization V2

The sliding-window implementation is very slow, but it can be made faster by choosing a stride so that each window covers a new part of the image with no overlap. Inspired by this approach, YOLO uses a grid instead of sliding windows. So, let's see how this works. In this example, we're using a 7 by 10 grid. In the YOLO algorithm, a much finer grid is used, but the overall process is the same.

Now you might be wondering: how can we get an accurate bounding box out of a grid? This was one of the challenges with sliding windows, too. How can you account for the fact that these grid cells are unlikely to match the bounding box of an object? The idea is that we assign an output vector to each grid cell, so each cell has an associated vector that tells us (1) whether an object is in that cell, (2) the class of that object, and (3) the predicted bounding box for that object. This way, the bounding box coordinates do not have to be contained within a grid cell.

Assuming we have an input image with two labeled objects and their bounding boxes, we can then train a CNN to produce the correct output vector for each of these grid cells. We'll call the output vector for the nth grid cell g_n. This output vector contains the same parameters as the output vector y we saw in previous videos.

For the first cell, the vector g_1 looks like this: there are no objects in this grid cell, so p_c equals zero. The vector has zeros for the class scores and some arbitrary values for the box coordinates. Those values don't matter much, because we'll discard vectors with too low a p_c value. We get the same output vector for every grid cell that has no object in it. Now, what about this cell on top of the person in our image? Its output vector, numbered by its grid cell, looks like this: p_c equals one because there is an object in the grid cell, and c_1 equals one because the object is a person.
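To make the per-cell vectors concrete, here is a minimal sketch of building the training target for a 7 by 10 grid with NumPy. It assumes a hypothetical vector layout [p_c, c_1, c_2, x, y, w, h] with two classes (c_1 = person, c_2 = some other class), and it assigns each labeled object to the single cell that contains its box center, which is the convention from the original YOLO paper rather than something stated in this lesson. The function name `build_target_grid` and the pixel-based box encoding are illustrative assumptions.

```python
import numpy as np

# Hypothetical layout of each grid-cell vector g_n: [p_c, c_1, c_2, x, y, w, h]
NUM_CLASSES = 2                      # c_1 = person, c_2 = a second, assumed class
VECTOR_LEN = 1 + NUM_CLASSES + 4     # p_c + class scores + box (x, y, w, h)


def build_target_grid(image_w, image_h, objects, rows=7, cols=10):
    """Return a (rows, cols, VECTOR_LEN) array holding one target vector per cell.

    `objects` is a list of (class_index, x_min, y_min, x_max, y_max) in pixels.
    Assumption: each object belongs to the cell containing its box center.
    """
    # Empty cells keep p_c = 0, zero class scores, and (don't-care) zero box values.
    targets = np.zeros((rows, cols, VECTOR_LEN), dtype=np.float32)
    cell_w = image_w / cols
    cell_h = image_h / rows
    for class_index, x_min, y_min, x_max, y_max in objects:
        cx = (x_min + x_max) / 2
        cy = (y_min + y_max) / 2
        col = min(int(cx // cell_w), cols - 1)
        row = min(int(cy // cell_h), rows - 1)
        vec = targets[row, col]              # a view, so writes update `targets`
        vec[0] = 1.0                         # p_c: an object is in this cell
        vec[1 + class_index] = 1.0           # one-hot class score
        # Box as center and size in image pixels; it may extend past the cell.
        vec[1 + NUM_CLASSES:] = (cx, cy, x_max - x_min, y_max - y_min)
    return targets


# Example: a 640 x 448 image (64-pixel cells) with a person and a second object.
grid = build_target_grid(640, 448, [(0, 300, 50, 420, 400), (1, 20, 200, 180, 300)])
print(grid[3, 5])   # the cell on top of the person's box center
```

Only two of the 70 cells end up with p_c = 1; every other cell holds the all-zero "no object" vector, matching the g_1 case described above.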
The output vector also holds the predicted bounding box coordinates, and we'll see how these are generated later in this lesson. Now that we know what the g_n vectors look like, let's see how we can use them to train a CNN.
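Going in the other direction, the sketch below reads detections back out of a grid of predicted g_n vectors by discarding every cell whose p_c is too low, as the lesson describes for empty cells. It reuses the same hypothetical [p_c, c_1, c_2, x, y, w, h] layout, and the cutoff of 0.5 is an assumed value; the lesson does not specify a threshold. The name `detections_from_grid` is illustrative.

```python
import numpy as np

NUM_CLASSES = 2   # assumed layout: [p_c, c_1, c_2, x, y, w, h]


def detections_from_grid(grid, threshold=0.5):
    """Keep cells whose p_c meets the (assumed) threshold; discard all others.

    Returns a list of ((row, col), p_c, class_index, (x, y, w, h)).
    """
    rows, cols = np.nonzero(grid[..., 0] >= threshold)
    detections = []
    for r, c in zip(rows, cols):
        vec = grid[r, c]
        class_index = int(np.argmax(vec[1:1 + NUM_CLASSES]))   # highest class score
        box = tuple(float(v) for v in vec[1 + NUM_CLASSES:])
        detections.append(((int(r), int(c)), float(vec[0]), class_index, box))
    return detections


# A toy 7 x 10 prediction: one confident cell and one low-p_c cell with junk box values.
pred = np.zeros((7, 10, 1 + NUM_CLASSES + 4))
pred[3, 5] = [0.9, 0.8, 0.2, 360, 225, 120, 350]
pred[0, 0] = [0.1, 0.3, 0.7, 5, 5, 40, 40]
dets = detections_from_grid(pred)
```

Because the low-p_c cell is dropped, its arbitrary box values never matter, which is why the "no object" vectors can carry any coordinates.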
