24 – Confusion Matrix-Question 1

So after we develop a model, we want to find out how good it is. This is a difficult question, but in this section we'll learn a few different metrics that tell us how good our model is. We're going to look at two main examples. The first is a model that helps us detect a particular illness and tell whether a patient is healthy or sick. The second is a spam detector, which helps us determine whether an email is spam or not. For example, on the left you have an email from your grandma, which you don't want to label as spam. On the right you have an email that is clearly spam, and you want to send that to the spam folder.

So let's look at the model for diagnosing an illness. There are four possible cases. When a patient is sick and the model correctly diagnoses them as sick, this is a sick patient whom we'll send in for further examination or treatment. We call this case a true positive. When a patient is healthy and the model correctly diagnoses them as healthy, this is a healthy patient whom we'll send home. We call this case a true negative. When a patient is sick and the model incorrectly diagnoses them as healthy, this is a mistake, and it means we'll be sending a sick patient back home with no treatment. This is called a false negative. And finally, when a patient is healthy and the model incorrectly diagnoses them as sick, this is also a mistake, and it means we'll be sending a healthy person for further examination or treatment. This is called a false positive.

Now we'll introduce what's called the confusion matrix. This is a table that describes the performance of a model. For this model, we have 10,000 patients. A thousand of them are sick and have been correctly diagnosed as sick; we call these true positives. 200 of them are sick and have been incorrectly diagnosed as healthy, so we call them false negatives. 800 patients are healthy and have been incorrectly diagnosed as sick; we call these false positives. And finally, 8,000 patients are healthy and have been correctly diagnosed as healthy; we call these true negatives. The confusion matrix is a simple table that stores these four values.

Now let's look at the model for detecting spam email. Again, there are four possible cases. When we get a spam email and the classifier correctly sends it to the spam folder, this is a true positive. When we get a spam email and the classifier incorrectly sends it to our inbox, this is a false negative. When we get a good email, for example from our grandma, and the classifier incorrectly sends it to our spam folder, this is a false positive. And finally, when we get a good email and the classifier correctly sends it to our inbox, this is a true negative.

We can also find the confusion matrix for this model. Here, we have a pool of a thousand emails. Out of these emails, 100 spam emails have been correctly sent to the spam folder (true positives), 170 spam emails have been incorrectly sent to the inbox (false negatives), 30 non-spam emails have been incorrectly sent to the spam folder (false positives), and finally, 700 non-spam emails have been correctly sent to the inbox (true negatives). So here is the confusion matrix.

Now it's your turn to create a confusion matrix. Look at this data, where the blue points are positive and the red points are negative. The model we've trained is the line that separates them, with the positive region at the top and the negative region at the bottom. Please fill in the four blanks in the confusion matrix: the number of true positives, true negatives, false positives, and false negatives.
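The four cases described above come down to comparing each actual label with the model's prediction. Here is a minimal sketch of that counting in plain Python, checked against the two tables from this section. It treats `True` as the positive class (sick, or spam) and lays the matrix out with rows = actual (positive, negative) and columns = predicted (positive, negative). That layout is an assumption of this sketch and may differ from the one on the slides. The helper names `confusion_counts`, `confusion_matrix`, and `labels_from_counts` are hypothetical.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count TP, FN, FP, TN for binary labels (True = positive, e.g. sick or spam)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a and p:
            counts["TP"] += 1  # positive case, correctly flagged
        elif a and not p:
            counts["FN"] += 1  # positive case, missed (sick sent home, spam in inbox)
        elif not a and p:
            counts["FP"] += 1  # negative case, wrongly flagged (grandma's email in spam)
        else:
            counts["TN"] += 1  # negative case, correctly passed
    return {key: counts[key] for key in ("TP", "FN", "FP", "TN")}


def confusion_matrix(counts):
    """Assumed layout: rows = actual (pos, neg), columns = predicted (pos, neg)."""
    return [[counts["TP"], counts["FN"]],
            [counts["FP"], counts["TN"]]]


def labels_from_counts(counts):
    """Hypothetical helper: rebuild label lists that reproduce the given counts."""
    actual, predicted = [], []
    for key, a, p in (("TP", True, True), ("FN", True, False),
                      ("FP", False, True), ("TN", False, False)):
        actual += [a] * counts[key]
        predicted += [p] * counts[key]
    return actual, predicted


# The two tables from this section.
medical = {"TP": 1000, "FN": 200, "FP": 800, "TN": 8000}
spam = {"TP": 100, "FN": 170, "FP": 30, "TN": 700}

for name, table in (("medical", medical), ("spam", spam)):
    recovered = confusion_counts(*labels_from_counts(table))
    assert recovered == table
    print(name, confusion_matrix(recovered), "total =", sum(recovered.values()))
```

Running this prints `medical [[1000, 200], [800, 8000]] total = 10000` and `spam [[100, 170], [30, 700]] total = 1000`. The four counts always add up to the size of the pool, because every patient or email falls into exactly one case. If you use scikit-learn's `sklearn.metrics.confusion_matrix` instead, note that it sorts the labels. With 0/1 labels, the negative class therefore comes first, giving `[[TN, FP], [FN, TP]]`.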