6 – 05 Scale And Rotation Invariance V2

ORB uses FAST to detect key points in an image, and it goes through a couple of extra steps to make sure that it can detect objects no matter their size or location in an image. Given an image, the ORB algorithm starts by building an image pyramid. An image pyramid is a multi-scale representation of a single image that consists of a sequence of images, all of which are versions of the original image at different resolutions. Each level in the pyramid consists of a downsampled version of the image in the previous level. Downsampled means that the image resolution has been reduced. In this example, the image was downsampled by a factor of two, and so a portion that was initially a four by four square area is now a two by two square area. A downsampled version of an image contains fewer pixels and has been reduced in size by this factor of two.

Here we see an example of an image pyramid with five levels. At each level, the image is downsampled by a factor of two, and by level four we have an image that is one-sixteenth the resolution of the original image along each dimension.

Once ORB has created the image pyramid, it uses the FAST algorithm to quickly locate the key points in the differently sized images at each of these levels. Since each level of the pyramid consists of a smaller version of the original image, any objects in the original image are also going to be reduced in size at each level of the pyramid. So, by locating key points at each level, ORB is effectively locating key points for the objects at different scales. In this way, ORB is partially scale invariant. This is of great importance because objects are unlikely to appear at the exact same size in every image. This is especially true of something like a cat, which may be close to the camera at one time and very far away, or even hiding, at another.

So now, ORB has key points associated with each level of this image pyramid. After the key points in all the levels of the pyramid have been located, ORB now assigns an orientation to each key point.
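
The pyramid construction described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names (`downsample_by_two`, `build_pyramid`, `to_original_coords`) are made up for this example, the downsampling is a simple 2x2 block average, a synthetic 32x32 gradient stands in for a real image, and the FAST detector itself is left out.

```python
def downsample_by_two(image):
    """Halve width and height by averaging each 2x2 block of pixels."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def build_pyramid(image, num_levels=5):
    """Level 0 is the original image; each later level halves the previous one."""
    pyramid = [image]
    for _ in range(num_levels - 1):
        pyramid.append(downsample_by_two(pyramid[-1]))
    return pyramid


def to_original_coords(x, y, level):
    """Map a pixel position found at a pyramid level back to level 0."""
    scale = 2 ** level
    return x * scale, y * scale


# A synthetic 32x32 gradient stands in for a real photo (assumption).
image = [[float(r + c) for c in range(32)] for r in range(32)]
pyramid = build_pyramid(image, num_levels=5)

sizes = [(len(level), len(level[0])) for level in pyramid]
print(sizes)                        # [(32, 32), (16, 16), (8, 8), (4, 4), (2, 2)]
print(to_original_coords(3, 1, 2))  # (12, 4)
```

Level four is 2 pixels wide where level zero is 32, which is the one-sixteenth figure from the example. A key point detected at a given level corresponds to a region of the original image that is larger by the same power of two, and that is what lets the detector respond to objects at different scales. Implementations often use a smaller step between levels than the factor of two used here. Whichever factor is used, every key point found this way is then given an orientation.
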
An orientation is a direction, like left- or right-facing, that depends on how the levels of intensity change around that key point. Let's see how this is done in detail. ORB will start by selecting the image in level zero of the pyramid. For this image, it will calculate the orientation of its key points by first computing the intensity centroid inside a box centered at the key point. The intensity centroid can be thought of as the position of the average pixel intensity in a given patch, that is, the center of the patch when each pixel is weighted by its brightness. Once the intensity centroid has been calculated, the orientation of the key point is obtained by drawing a vector from the key point to the intensity centroid, as shown here. The orientation of this particular key point is down and towards the left because the brightness in this region of the image increases in that direction.

Once an orientation has been assigned to each key point in the image at level zero of the pyramid, ORB repeats the same process for the images at all the other pyramid levels. It's important to note that the patch size is not reduced at each level of the image pyramid. Therefore, because the images get smaller, the same patch covers a larger portion of the image at each higher level. This results in key points having different sizes, which can be seen here. In this image, the circles represent the size of each key point. Key points with the bigger size were found in higher levels of the pyramid.

After having located and assigned an orientation to the key points, ORB now uses a modified version of BRIEF to create the feature vectors. This modified version of BRIEF is called rBRIEF, or Rotation-Aware BRIEF, which can create the same vector for key points no matter the orientation of an object. This makes the ORB algorithm rotation invariant, meaning it can detect the same key points in an image that's rotated at any angle. rBRIEF starts out in the same way as BRIEF, by selecting 256 random pairs of pixels inside a defined patch around a given key point to construct a 256-bit vector.
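
The intensity centroid step described above can be sketched as follows. This is a minimal illustration, not a reference implementation: `keypoint_orientation` is a hypothetical name, the patch is a small square box with the key point at its center pixel, and the 5x5 test patch is synthetic, built so that brightness increases down and to the left.

```python
import math


def keypoint_orientation(patch):
    """Return (cx, cy, angle_degrees) for a square patch centered on a key point.

    cx and cy are the offsets of the intensity centroid from the key point,
    in image coordinates (x to the right, y downward).
    """
    size = len(patch)
    half = size // 2
    m00 = m10 = m01 = 0.0
    for row in range(size):
        for col in range(size):
            intensity = patch[row][col]
            dx, dy = col - half, row - half  # offsets from the key point
            m00 += intensity                 # total brightness
            m10 += dx * intensity            # brightness-weighted x
            m01 += dy * intensity            # brightness-weighted y
    if m00 == 0:
        return 0.0, 0.0, 0.0                 # flat black patch: no direction
    cx, cy = m10 / m00, m01 / m00
    angle = math.degrees(math.atan2(cy, cx))
    return cx, cy, angle


# Synthetic patch (assumption): brighter toward the bottom-left corner.
patch = [[float((4 - col) + row) for col in range(5)] for row in range(5)]
cx, cy, angle = keypoint_orientation(patch)
print(cx, cy, round(angle, 1))  # -0.5 0.5 135.0
```

The centroid sits half a pixel to the left of the key point and half a pixel below it, so the vector from the key point to the centroid points down and to the left. Because y grows downward in image coordinates, that direction comes out as 135 degrees. This angle is what rBRIEF uses next, once it has taken its 256 random pairs of pixels.
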
rBRIEF then rotates these random pairs of pixels by the orientation angle of the key point, so as to align the random points with the orientation of the key point. Finally, rBRIEF compares the brightness of the random pairs of pixels and assigns ones and zeros accordingly, creating the corresponding feature vector. The set of all the feature vectors for all the key points found in an image is known as the ORB descriptor.
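
The rotate-then-compare idea can be tried out on a toy patch. The sketch below follows the description above and makes several assumptions: all function names are hypothetical, the patch is 11x11 with pair offsets kept within 3 pixels of the key point, the patch contents are random numbers, and the two orientation angles (0 and 90 degrees) are supplied by hand rather than computed from an intensity centroid. A quarter turn is used so that rotated offsets land exactly on pixels.

```python
import math
import random


def make_pairs(num_pairs=256, radius=3, seed=0):
    """Pick random pairs of (x, y) offsets within `radius` of the key point."""
    rng = random.Random(seed)

    def offset():
        return (rng.randint(-radius, radius), rng.randint(-radius, radius))

    return [(offset(), offset()) for _ in range(num_pairs)]


def rotate_offset(offset, angle_deg):
    """Rotate an (x, y) offset about the key point and snap it to a pixel."""
    theta = math.radians(angle_deg)
    x, y = offset
    return (round(x * math.cos(theta) - y * math.sin(theta)),
            round(x * math.sin(theta) + y * math.cos(theta)))


def describe(patch, pairs, angle_deg):
    """Build the bit vector: one brightness comparison per rotated pair."""
    half = len(patch) // 2
    bits = []
    for first, second in pairs:
        x1, y1 = rotate_offset(first, angle_deg)
        x2, y2 = rotate_offset(second, angle_deg)
        bits.append(1 if patch[half + y1][half + x1] < patch[half + y2][half + x2] else 0)
    return bits


def rotate_patch_90(patch):
    """Rotate a square patch a quarter turn clockwise as displayed."""
    return [list(row) for row in zip(*patch[::-1])]


rng = random.Random(1)
patch = [[rng.randint(0, 255) for _ in range(11)] for _ in range(11)]
turned = rotate_patch_90(patch)

pairs = make_pairs()
original_bits = describe(patch, pairs, angle_deg=0)    # orientation assumed to be 0
steered_bits = describe(turned, pairs, angle_deg=90)   # pairs turned with the patch
unsteered_bits = describe(turned, pairs, angle_deg=0)  # pairs left unrotated

print(len(original_bits))             # 256
print(steered_bits == original_bits)  # True
# Hamming distance when the pairs are not rotated:
print(sum(a != b for a, b in zip(unsteered_bits, original_bits)))
```

When the pairs are rotated by the same angle as the patch, each comparison reads the same two pixels as before, so the 256-bit vector is unchanged. Leaving the pairs unrotated compares different pixels, and the resulting vector no longer matches the original.
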
