Our first task is object detection. And for that, we can use bounding boxes. They are a simpler method of scene understanding compared to segmentation. In neural network, just has to figure out where an object is and draw a type box around it. There are already great open source state of the art solutions, such as YOLO and SSD models. These models perform extremely well even at high frame per second. They’re useful for detecting different objects such as cars, people, traffic lights, and other objects in the scene. However, burning boxes have their limits. Imagine drawing and bounding box around a curvy road, the forest, or the sky, this quickly becomes problematic or even impossible to convey the true shape of an object. At best, bounding boxes can only hope to achieve partial seen understanding.