Let’s see how YOLO takes in an input image and detects multiple objects. Say we have a CNN that’s been trained to recognize several classes, including a traffic light, a car, a person, and a truck. We give it two types of anchor boxes, a tall one and a wide one, so that it can handle overlapping objects of different shapes. Once the CNN has been trained, we can detect objects by feeding it new test images.

The test image is first broken up into a grid, and the network then produces output vectors, one for each grid cell. These vectors tell us whether a cell contains an object, what class the object is, and the bounding boxes for the object. Since we’re using two anchor boxes, we’ll get two predicted anchor boxes for each grid cell. Some, in fact most, of the predicted anchor boxes will have a very low PC (object-confidence) value.

After producing these output vectors, we use non-maximal suppression to get rid of unlikely bounding boxes. For each class, non-maximal suppression first discards the bounding boxes whose PC value is lower than some given threshold. It then selects the bounding box with the highest PC value and removes any bounding boxes that are too similar to it. It repeats this until all of the non-maximal bounding boxes have been removed for every class. The end result will look like this: we can see that YOLO has effectively detected many objects in the image, such as cars and people.

Now that you know how YOLO works, you can see why it’s one of the most widely used object detection algorithms today. Next, you’ll get to work with the code implementation of the YOLO algorithm, and really see how it detects objects in different scenes and with varying levels of confidence.
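The per-class suppression procedure described above can be sketched in plain Python. This is a minimal illustration, not the course’s actual implementation: the function names, the corner-coordinate box format `(x1, y1, x2, y2)`, and the default thresholds are all assumptions made here, and “too similar” is measured with the standard intersection-over-union (IoU) overlap.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    xa1, ya1, xa2, ya2 = box_a
    xb1, yb1, xb2, yb2 = box_b
    inter_w = max(0.0, min(xa2, xb2) - max(xa1, xb1))
    inter_h = max(0.0, min(ya2, yb2) - max(ya1, yb1))
    inter = inter_w * inter_h
    area_a = (xa2 - xa1) * (ya2 - ya1)
    area_b = (xb2 - xb1) * (yb2 - yb1)
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, score_threshold=0.5, iou_threshold=0.5):
    """Return indices of the boxes kept for a single class.

    Thresholds are illustrative defaults, not values from the lecture.
    """
    # Step 1: discard boxes whose confidence (PC) is below the threshold.
    idxs = [i for i in range(len(boxes)) if scores[i] >= score_threshold]
    # Step 2: consider remaining boxes from highest to lowest confidence.
    idxs.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    while idxs:
        best = idxs.pop(0)       # the highest-scoring box survives
        keep.append(best)
        # Step 3: remove boxes that overlap the survivor too much.
        idxs = [i for i in idxs if iou(boxes[best], boxes[i]) < iou_threshold]
    # Repeat (the loop) until no candidate boxes remain.
    return keep
```

Running this over the predictions for each class in turn leaves only the strongest, non-overlapping detections, which is exactly the cleanup step YOLO applies before drawing its final boxes.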