Now that you know how the algorithm works, you may be wondering how exactly we can use ORB descriptors to perform object recognition. Let's look at an example that shows how ORB can detect the same object at different scales and orientations. Suppose I want to be able to detect this person's face in other images, say in this image of multiple people. We'll call this first image the training image. This second image, in which I want to perform face detection, will be called the query image. So, given this training image, I want to find similar features in this query image.

The first step will be to calculate the ORB descriptor for the training image and save it in memory. The ORB descriptor will contain the binary feature vectors that describe each key point in this training image. The second step will be to compute and save the ORB descriptor for the query image. Once we have the descriptors for both the training and query images, the final step is to perform key point matching between the two images using their corresponding descriptors. This matching is usually performed by a matching function, whose aim is to match key points in two different images by comparing their descriptors and seeing whether they are close enough together to make for a match.

When a matching function compares two key points, it rates the quality of the match according to some metric, something that represents the similarity of the key point feature vectors. You can think of this metric as being similar to the standard Euclidean distance between two key points. Some metrics simply ask: do the feature vectors contain a similar order of ones and zeros? It's important to keep in mind that different matching functions will have different metrics for determining the quality of a match. For binary descriptors like the ones used by ORB, the Hamming metric is usually used because it can be computed extremely fast.
The Hamming metric determines the quality of the match between two key points by counting the number of dissimilar bits between their binary descriptors. When comparing the key points in the training image with the ones in the query image, the pair with the smallest number of differing bits is considered the best match. Once the matching function has finished comparing all the key points in the training and query images, it returns the best matching pairs of key points. The best matching points between our training image and our query image are displayed here, and we can see that they almost all correspond to the face in the training image. There are one or two features that don't quite match up, which may have been chosen because of similar patterns of intensity in that area of the image. Since most points correspond to the face in the training image, we can say that the matching function has correctly recognized this face in the query image.
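The bit-counting idea is easy to see on a toy example. The descriptors below are made-up two-byte vectors, not real ORB output; they just show how counting dissimilar bits picks the best match.

```python
def hamming(a: bytes, b: bytes) -> int:
    """Count the dissimilar bits between two binary descriptors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Hypothetical two-byte descriptors for illustration only.
train_desc = bytes([0b10110100, 0b01100011])
query_descs = [
    bytes([0b10110100, 0b01100001]),  # differs in 1 bit
    bytes([0b00010100, 0b11100011]),  # differs in 3 bits
]

# The query descriptor with the fewest differing bits is the best match.
best = min(query_descs, key=lambda d: hamming(train_desc, d))
print(hamming(train_desc, best))  # → 1
```

Because XOR and bit counting map directly onto single machine instructions, this comparison is far cheaper than computing a Euclidean distance, which is why the Hamming metric is the usual choice for binary descriptors.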