3-2. YOLO Archives

9 – 09 NonMaximal Suppression V1

2021-08-08 by Dr. Serendipity

For a test image broken down into a grid, how can we handle the case in which our grid CNN produces multiple grid cell vectors and multiple bounding boxes for the same object? To account for this, we use a technique called non-maximal suppression. This uses the IOU between two predicted bounding boxes to select … Read more
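
The excerpt above is cut off before it describes the selection rule, so the following is only a minimal sketch of the common greedy form of non-maximal suppression, not necessarily the exact procedure in the full post. The `(x1, y1, x2, y2)` box format, the function names, and the `iou_threshold` default of 0.5 are all assumptions made for illustration.

```python
def iou(a, b):
    # Boxes are (x1, y1, x2, y2) corner coordinates (an assumed convention).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    # Visit boxes from highest to lowest confidence score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a box
        # that was already kept (which necessarily has a higher score).
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two near-duplicate boxes around one object, plus one separate object.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

Here the second box overlaps the first with an IOU of about 0.68, so it is suppressed, and `kept` holds the indices of the first and third boxes.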

8 – 08 Intersection Over Union IOU V1

2021-08-08 by Dr. Serendipity

Before learning about non-maximal suppression, we’ll need to learn about intersection over union or IOU, which is a technique used in non-maximal suppression to compare how good two different bounding boxes are for a given object. It’s easiest to see how to calculate IOU in an example. Take these two bounding boxes. We define the … Read more
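
The excerpt is truncated before the definition, so for reference, this is the standard definition of IOU for two boxes A and B (the symbols are chosen here for illustration and may differ from those in the full post):

```latex
\mathrm{IoU}(A, B)
  = \frac{\operatorname{area}(A \cap B)}{\operatorname{area}(A \cup B)}
  = \frac{\operatorname{area}(A \cap B)}
         {\operatorname{area}(A) + \operatorname{area}(B) - \operatorname{area}(A \cap B)}
```

As a worked example, take two 10 × 10 boxes where the second is shifted by 1 unit in both x and y. The intersection is 9 × 9 = 81, the union is 100 + 100 − 81 = 119, and the IOU is 81 / 119 ≈ 0.68. Identical boxes give an IOU of 1, and boxes that do not overlap give 0.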

7 – 07 Too Many Boxes V2

2021-08-08 by Dr. Serendipity

One of the problems with this grid-based method for object detection is that a trained CNN, when faced with a new test image, will often produce multiple grid cell vectors that are all trying to detect the same object. This means lots of output vectors that all contain slightly different bounding boxes for the same … Read more

6 – 06 Generating Bounding Boxes V3

2021-08-08 by Dr. Serendipity

How does YOLO find a correct bounding box when it looks at an image broken up by a grid? The trick is that it assigns the ground-truth bounding box for one object in an image to only one grid cell in the training image. So, only one grid cell is meant to locate the object. Now, … Read more
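
The excerpt stops before saying which cell is chosen. A common convention, assumed in this sketch, is that the cell containing the center of the ground-truth box is the one responsible for it. The function name, the pixel-coordinate box format, and the 448 × 448 image with a 7 × 7 grid are illustrative choices only.

```python
def responsible_cell(box, image_w, image_h, grid_size):
    """Return (row, col) of the grid cell containing the box center.

    box is (x1, y1, x2, y2) in pixels (an assumed format).
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    # Scale the center into grid units; clamp so a center lying exactly
    # on the right or bottom image edge still maps to the last cell.
    col = min(int(cx / image_w * grid_size), grid_size - 1)
    row = min(int(cy / image_h * grid_size), grid_size - 1)
    return row, col


# Box centered at (150, 250) in a 448 x 448 image split into a 7 x 7 grid.
cell = responsible_cell((100, 200, 200, 300), 448, 448, 7)
```

With these numbers the center falls in row 3, column 2 (zero-indexed), so only that cell's target vector would carry this object's box.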

5 – 05 Training On A Grid V2

2021-08-08 by Dr. Serendipity

Training on grid cells requires a very specific kind of training data. To train a network to output a predicted vector of class scores and box coordinates for each cell, we need to have a true vector to compare it to. So, for each training image, we have to break it into a grid, and … Read more
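
To make the idea of a per-cell "true vector" concrete, here is a hedged sketch of how one such vector might be laid out. The layout `[p_c, x, y, w, h, c_1, ..., c_C]`, with a leading object-present flag followed by box coordinates and one-hot class scores, is an assumption for illustration; the full post may order or normalize these values differently.

```python
def cell_target(num_classes, obj=None):
    """Build the target vector for one grid cell.

    obj is None for a cell with no object, otherwise a tuple
    (class_index, x, y, w, h). Layout is assumed, not from the source.
    """
    vector = [0.0] * (5 + num_classes)
    if obj is not None:
        class_index, x, y, w, h = obj
        vector[0] = 1.0                    # object-present flag
        vector[1:5] = [x, y, w, h]         # box coordinates
        vector[5 + class_index] = 1.0      # one-hot class score
    return vector


empty = cell_target(num_classes=3)
with_object = cell_target(num_classes=3, obj=(1, 0.5, 0.5, 0.2, 0.4))
```

An empty cell gets an all-zero vector, while a cell that owns an object gets the flag, the box, and a single 1 in the position of its class.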

4 – 04 Using A Grid To Improve Localization V2

2021-08-08 by Dr. Serendipity

The implementation of sliding windows is very slow, but it can be faster if you choose a stride so that each window covers a new part of an image and there’s no overlap. Inspired by this approach, YOLO uses a grid instead of sliding windows. So, let’s see how this works. In this example, we’re … Read more
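
The effect of the stride can be seen by simply counting windows. This is a small sketch under assumed numbers (a 448 × 448 image and a 64 × 64 window); the function name is hypothetical.

```python
def window_origins(image_w, image_h, window, stride):
    """Top-left corners of every square window that fits in the image."""
    xs = range(0, image_w - window + 1, stride)
    ys = range(0, image_h - window + 1, stride)
    return [(x, y) for y in ys for x in xs]


# Small stride: heavily overlapping windows, many positions to evaluate.
dense = window_origins(448, 448, window=64, stride=8)

# Stride equal to the window size: no overlap, windows tile the image.
tiled = window_origins(448, 448, window=64, stride=64)
```

With a stride of 8 there are 49 × 49 = 2401 windows to classify, while a stride equal to the window size leaves 7 × 7 = 49 non-overlapping windows, which is exactly a grid over the image.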

3 – 03 A Convolutional Approach To Sliding Windows V3

2021-08-08 by Dr. Serendipity

Since objects can be anywhere in a given image, you can make sure to detect all of them by sliding a small window over the entire image and checking for objects within each of the created windows. This is the Sliding Windows approach. Let’s see how this works in detail. Suppose I’ve trained my CNN … Read more

2 – 02 YOLO Output V2

2021-08-08 by Dr. Serendipity

When we talked about localization in images, we talked about creating a CNN that could output a predicted class for an object in an image and a predicted bounding box for that object. In the CNN examples that we’ve seen, these outputs are analyzed separately, and the network trains by using a weighted combination of classification … Read more

11 – 11 YOLO Algorithm V3

2021-08-08 by Dr. Serendipity

Let’s see how YOLO takes in an input image and detects multiple objects. Say we have a CNN that’s been trained to recognize several classes, including a traffic light, a car, a person, and a truck. We give it two types of anchor boxes, a tall one and a wide one, so that it … Read more

10 – 10 Anchor Boxes V3

2021-08-08 by Dr. Serendipity

From what we’ve seen, YOLO can work well for multiple objects where each object is associated with one grid cell. But what about in the case of overlap, in which one grid cell actually contains the center points of two different objects? We can use something called anchor boxes to allow one grid cell to … Read more
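
One way to picture how anchor boxes separate two objects in the same cell is to match each object to the anchor whose shape it resembles most. The sketch below assumes two anchors, a tall one and a wide one, and compares shapes by the IOU of their widths and heights alone (as if the boxes shared a corner). This matching rule, the function names, and the sizes are assumptions for illustration, not details taken from the excerpt.

```python
def shape_iou(wh_a, wh_b):
    # IOU of two boxes aligned at a common corner, using only (w, h).
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union


def best_anchor(box_wh, anchors):
    # Index of the anchor whose shape overlaps the box shape the most.
    return max(range(len(anchors)), key=lambda i: shape_iou(box_wh, anchors[i]))


anchors = [(1.0, 3.0), (3.0, 1.0)]   # index 0: tall, index 1: wide

person = best_anchor((0.8, 2.5), anchors)  # tall, narrow object
car = best_anchor((2.8, 1.2), anchors)     # short, wide object
```

The person-shaped box matches the tall anchor and the car-shaped box matches the wide one, so a single cell can hold both objects in separate anchor slots of its output vector.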

1 – 01 Introduction V3

2021-08-08 by Dr. Serendipity

Now you’ve learned about a number of region-based methods for recognizing and locating multiple objects in a scene. Architectures like Faster R-CNN are accurate, but the model itself is quite complex, with multiple outputs that are each a potential source of error. Once trained, they’re still not fast enough to run in real time. In … Read more