9 – 09 NonMaximal Suppression V1

For a test image broken down into a grid, how can we handle the case in which our grid CNN produces multiple grid cell vectors and multiple bounding boxes for the same object? To account for this, we use a technique called non-maximal suppression. This uses the IOU between two predicted bounding boxes to select …
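As a rough illustration of the idea in this excerpt, here is a minimal sketch of greedy non-maximal suppression in Python. The excerpt is cut off before the details, so everything specific here is an assumption rather than the lesson's own code: the box format `(x1, y1, x2, y2)`, the per-box confidence `scores`, the `iou_threshold` value of 0.5, and the function names `iou` and `non_max_suppression`.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest score first.

    A box is kept only if its IoU with every already-kept box is at or
    below iou_threshold (an assumed, tunable value).
    """
    # Visit boxes from most to least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical example: two overlapping boxes on one object, one box elsewhere.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

In this made-up example the second box overlaps the first heavily and has a lower score, so it is the one suppressed; the third box does not overlap either and is kept.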

8 – 08 Intersection Over Union IOU V1

Before learning about non-maximal suppression, we’ll need to learn about intersection over union or IOU, which is a technique used in non-maximal suppression to compare how good two different bounding boxes are for a given object. It’s easiest to see how to calculate IOU in an example. Take these two bounding boxes. We define the …
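For reference, the usual definition of IOU can be written out as below. The excerpt stops before its own example, so the two boxes used here (A spanning 0 to 2 on both axes, B spanning 1 to 3 on both axes) are made-up values for illustration only.

```latex
\mathrm{IoU}(A, B)
  = \frac{\operatorname{area}(A \cap B)}{\operatorname{area}(A \cup B)}
  = \frac{\operatorname{area}(A \cap B)}
         {\operatorname{area}(A) + \operatorname{area}(B) - \operatorname{area}(A \cap B)}

% Assumed example: A = [0,2] \times [0,2], \quad B = [1,3] \times [1,3]
% A \cap B = [1,2] \times [1,2], so \operatorname{area}(A \cap B) = 1
\mathrm{IoU}(A, B) = \frac{1}{4 + 4 - 1} = \frac{1}{7} \approx 0.14
```

An IOU of 1 means the two boxes coincide exactly, and an IOU of 0 means they do not overlap at all.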

7 – 07 Too Many Boxes V2

One of the problems with this grid-based method for object detection is that a trained CNN, when faced with a new test image, will often produce multiple grid cell vectors that are all trying to detect the same object. This means lots of output vectors that all contain slightly different bounding boxes for the same …

4 – 04 Using A Grid To Improve Localization V2

The implementation of sliding windows is very slow, but it can be faster if you choose a stride so that each window covers a new part of an image and there’s no overlap. Inspired by this approach, YOLO uses a grid instead of sliding windows. So, let’s see how this works. In this example, we’re …

3 – 03 A Convolutional Approach To Sliding Windows V3

Since objects can be anywhere in a given image, you can make sure to detect all of them by sliding a small window over the entire image and checking for objects within each of the created windows. This is the Sliding Windows approach. Let’s see how this works in detail. Suppose I’ve trained my CNN …

2 – 02 YOLO Output V2

When we talked about localization in images, we talked about creating a CNN that could output a predicted class for an object in an image and a predicted bounding box for that object. In the CNN examples that we’ve seen, these outputs are analyzed separately, and the network trains by using a weighted combination of classification …

10 – 10 Anchor Boxes V3

From what we’ve seen, YOLO can work well for multiple objects where each object is associated with one grid cell. But what about the case of overlap, in which one grid cell actually contains the center points of two different objects? We can use something called anchor boxes to allow one grid cell to …

1 – 01 Introduction V3

Now you’ve learned about a number of region-based methods for recognizing and locating multiple objects in a scene. Architectures like Faster R-CNN are accurate, but the model itself is quite complex, with multiple outputs that are each a potential source of error. Once trained, they’re still not fast enough to run in real time. In …