12 – Labeled Data and Accuracy

After exploring the day and night image data, you may have noticed a part of the data that we haven't gone over yet: a label associated with each image. So, what exactly is a label, and why do we need it?

A label is like a tag that's attached to a specific image and tells you something about that image. You can think of a label sort of like a name tag. I wear a name tag to events when I meet new people, and my name tag labels me as Cezanne. An image can have multiple labels that describe it, which would be like me having several labels, such as human or wears glasses, that each describe something about me. But for this lesson, we'll be working with one label per image.

These labels separate the image data into classes, and classes are like general categories. So the label for me might be human, which is a category distinct from a label like table or car, and more general than a label like Cezanne. For the image data sets we work with, we should have as many labels as we have classes. In the case of our day and night images, we have two labels: day and night.

Now, why do we need these labels? You can tell whether an image shows day or night, but a computer cannot unless we tell it explicitly with a label. This becomes especially important when we're testing the accuracy of a classification model. A classifier takes in an image as input and outputs a predicted label that tells us the predicted class of that image. When we load in data like you've seen, we also load in what are called the true labels, and a true label is just the correct label for that image. To check the accuracy of our classifier, we compare the predicted and true labels. If they match, we've classified the image correctly. If they don't match, we've misclassified the image.
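To make the comparison of predicted and true labels concrete, here is a minimal Python sketch. The function name `compare_labels` and the five sample labels are hypothetical, made up for illustration rather than taken from the course code. It assumes one true label and one predicted label per image, stored in matching order, and returns the positions of the misclassified images.

```python
def compare_labels(true_labels, predicted_labels):
    """Return the indices of images whose predicted label differs from the true label.

    Hypothetical helper: assumes both lists hold one label per image, in the same order.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("Each image needs exactly one true and one predicted label")
    misclassified = []
    for index, (true_label, predicted_label) in enumerate(zip(true_labels, predicted_labels)):
        # A mismatch between the true and predicted label means a misclassification
        if true_label != predicted_label:
            misclassified.append(index)
    return misclassified


# Made-up labels for five images; image 2 is really "day" but was predicted "night"
true_labels = ["day", "night", "day", "night", "day"]
predicted_labels = ["day", "night", "night", "night", "day"]

print(compare_labels(true_labels, predicted_labels))  # [2]
```

Keeping track of which images were misclassified, rather than just counting them, is what later lets us look at those images and learn from the classifier's mistakes.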
After looking at many, many images, we can measure the accuracy of a classifier, which is defined as the number of correctly classified images (those for which the predicted label matches the true label) divided by the total number of images. So, say we tried to classify 100 images and correctly classified 81 of them, meaning we misclassified 19. That would give us an accuracy of 81/100 = 0.81, or 81 percent. We can only have a computer check the accuracy of a classifier when we have both predicted and true labels to compare. We can also learn from any mistakes the classifier makes, as we'll see later in this lesson.

As a note, it's good practice to use numerical labels instead of strings or categorical labels. Numbers are easier to track and compare, so for our binary day and night example, instead of the labels day and night, we'll use the numerical labels 0 for night and 1 for day.

Okay, now you're familiar with the day and night image data, and you know what a label is and why we use them. You're ready for the next steps: we'll be building a classification pipeline from start to end. Let's first brainstorm what steps we'll take to classify these images.
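The accuracy definition and the numerical labels can be sketched together in a few lines of Python. This is a hedged illustration, not the course's implementation: `LABEL_TO_NUMBER`, `encode`, and `accuracy` are hypothetical names, and the 100 labels are fabricated only so that exactly 81 predictions match, mirroring the example of 81 correct and 19 misclassified images. The encoding follows the lesson's choice of 0 for night and 1 for day.

```python
# Hypothetical mapping following the lesson's convention: 0 for night, 1 for day
LABEL_TO_NUMBER = {"night": 0, "day": 1}


def encode(labels):
    """Convert string labels such as "day"/"night" into numerical labels."""
    return [LABEL_TO_NUMBER[label] for label in labels]


def accuracy(true_labels, predicted_labels):
    """Fraction of images whose predicted label matches the true label."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("Need a non-empty, equal-length list of true and predicted labels")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


print(encode(["day", "night", "day"]))  # [1, 0, 1]

# Made-up data: 100 images, where the first 81 predictions are right
# and the last 19 are flipped to the wrong class.
true_numeric = [1] * 50 + [0] * 50
predicted_numeric = true_numeric[:81] + [1 - label for label in true_numeric[81:]]

print(accuracy(true_numeric, predicted_numeric))  # 0.81
```

With 0 and 1 as labels, flipping a prediction to the other class is just `1 - label`, which is one small example of why numbers are easier to work with than strings here.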
