When we train a CNN to classify a set of images, we compare the predicted class with the true class label and see if they match. We typically use cross-entropy to measure the error between these classes, because cross-entropy loss decreases as the predicted class, which has some uncertainty associated with it, gets closer and closer to the true class label. But when we compare a set of points, say locations or points on a face or points that define a specific region in an image, we need a loss function that measures the similarity between these coordinate values. This is not a classification problem; this is a regression problem. Classification is about predicting a class label, and regression is about predicting a quantity. For regression problems, like predicting (x, y) coordinate locations, we need a loss function that compares these quantities and gives us a measure of their closeness.

It’s also interesting to note that with classification problems, we have an idea of accuracy. If our predicted class matches the true class label, then our model is accurate. But with regression problems, we can’t really say whether a point is accurate or not. We can only evaluate quantities by looking at something like the mean squared error between them. So for regression problems, we often talk about models with a small error rather than models that are accurate.

To measure the error between two quantities, we have a few different types of loss functions that we can use. The simplest measure is L1 loss, which measures the element-wise absolute difference between a predicted output, which I’ll call P, and a target T.
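As a rough illustration of that definition, here is a minimal pure-Python sketch of L1 loss. The function name `l1_loss` and the sample values are hypothetical, and averaging over the elements is an assumption (it mirrors the default "mean" reduction of PyTorch's `nn.L1Loss`); a sum over elements would be an equally valid choice.

```python
def l1_loss(pred, target):
    """Mean of the element-wise absolute differences between pred and target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


# Hypothetical values: a predicted output P and a target T with two elements each.
P = [52.0, 31.0]
T = [50.0, 35.0]

loss = l1_loss(P, T)
print(loss)  # (|52 - 50| + |31 - 35|) / 2 = 3.0
```

The loss is zero only when every element of P equals the matching element of T, and it grows linearly as they move apart.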
Say we’re predicting just one point P, an (x, y) coordinate that indicates the center of an object in an image. In this case, the loss function will look at the predicted point P that was generated by a CNN and the true target location T of the center of the object, and L1 loss will return a value that represents the distance between the predicted and true points. We also have MSE loss, which measures the mean squared error between the elements in a prediction P and a target T.

Both of these methods can be good for measuring the distance between points, but all loss functions have their strengths and weaknesses. You may consider that L1 loss can become negligible for small error values, and that MSE loss responds the most to large errors, so it may end up amplifying errors that are big but infrequent, also known as outliers. There’s also Smooth L1 loss, which uses a squared error function for small differences between predicted and true values and L1 loss for larger errors. So Smooth L1 loss tries to combine the best aspects of MSE and L1 loss. It will really be up to you to try these different loss functions, look at how they decrease during training, and choose the best one for a given regression task.
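To make that comparison concrete, here is a small pure-Python sketch of the three losses applied to one small error and one large, outlier-like error. The function names, the sample points, and the switch-over threshold `beta=1.0` are assumptions for illustration; the transcript does not say where Smooth L1 changes from squared error to L1, and the form used here follows the common definition (for example, the one in PyTorch's `nn.SmoothL1Loss`).

```python
def l1_loss(pred, target):
    """Mean absolute difference between matching elements."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse_loss(pred, target):
    """Mean squared difference between matching elements."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def smooth_l1_loss(pred, target, beta=1.0):
    """Squared error below the (assumed) threshold beta, L1-style above it."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < beta:
            total += 0.5 * diff ** 2 / beta   # quadratic region for small errors
        else:
            total += diff - 0.5 * beta        # linear region for large errors
    return total / len(pred)


# Hypothetical target point and two predictions: one close, one far off.
T = [0.0, 0.0]
P_small = [0.5, 0.5]    # each coordinate is off by 0.5
P_large = [10.0, 10.0]  # each coordinate is off by 10

for name, P in (("small error", P_small), ("large error", P_large)):
    print(name, l1_loss(P, T), mse_loss(P, T), smooth_l1_loss(P, T))
# small error 0.5 0.25 0.125
# large error 10.0 100.0 9.5
```

With these made-up numbers, a 20-fold increase in the error raises L1 loss 20-fold but MSE loss 400-fold, which is the outlier sensitivity described above. Smooth L1 stays close to L1 for the large error while still behaving quadratically for the small one.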