Questions about accuracy threshold in lesson 3 planet exercise

pulse201 · July 29, 2019, 11:08pm

Hello, first post here, let’s see how this goes.
So I’m having trouble understanding a few things about the accuracy threshold. First, Jeremy says that metrics are not part of your model and that they don’t affect predictions. Now, if you set your threshold to 0.2, that means that if the final probability for a particular class is higher than 0.2, the model will classify the image as having that class, right? Then doesn’t that affect your predictions? Shouldn’t the threshold be a parameter as well in that case? And if so, how do you choose it?
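To make the question concrete: in a multi-label setup like the planet dataset, the model outputs one probability per tag, and the threshold is a separate step applied to those outputs afterwards. The pure-Python sketch below uses hypothetical tag names and probabilities (not real model output) to show that the same outputs give different label sets at different thresholds, while the model itself is untouched.

```python
# Hypothetical sigmoid outputs for one satellite image (one probability per tag).
# Tag names and numbers are made up for illustration.
probs = {"clear": 0.91, "primary": 0.64, "agriculture": 0.27, "haze": 0.12}

def predict_tags(probs, thresh):
    """Return the tags whose probability is strictly above the threshold."""
    return sorted(tag for tag, p in probs.items() if p > thresh)

# Same model outputs, two different decision thresholds.
print(predict_tags(probs, 0.5))  # ['clear', 'primary']
print(predict_tags(probs, 0.2))  # ['agriculture', 'clear', 'primary']
```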

What am I missing here?

Any help would be appreciated.

blueharen · July 30, 2019, 1:56pm

Not totally sure I understand your question but I’ll give it a shot.

The metrics are just things like accuracy or F-beta scores; basically, they are a quick way to see how your model is performing according to a particular criterion.
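The lesson uses fastai’s `accuracy_thresh` and `fbeta` metrics. The functions below are simplified pure-Python approximations written only for illustration, not the library’s code: one computes the fraction of (image, tag) entries where the thresholded prediction matches the label, the other the mean per-image F-beta score (beta=2 here, assumed to match the planet setup).

```python
def accuracy_thresh(probs, targets, thresh=0.5):
    """Fraction of (image, tag) entries where the thresholded prediction matches the target."""
    total = correct = 0
    for row_p, row_t in zip(probs, targets):
        for p, t in zip(row_p, row_t):
            correct += int((p > thresh) == bool(t))
            total += 1
    return correct / total

def fbeta(probs, targets, thresh=0.5, beta=2.0, eps=1e-9):
    """Mean per-image F-beta score of the thresholded predictions."""
    b2 = beta ** 2
    scores = []
    for row_p, row_t in zip(probs, targets):
        preds = [p > thresh for p in row_p]
        tp = sum(1 for pr, t in zip(preds, row_t) if pr and t)
        precision = tp / (sum(preds) + eps)
        recall = tp / (sum(row_t) + eps)
        scores.append((1 + b2) * precision * recall / (b2 * precision + recall + eps))
    return sum(scores) / len(scores)

# Hypothetical batch: two images, three tags each.
probs = [[0.9, 0.6, 0.3], [0.2, 0.8, 0.7]]
targets = [[1, 1, 0], [0, 1, 0]]
print(accuracy_thresh(probs, targets))  # 5 of 6 entries correct
print(fbeta(probs, targets))
```

Both functions only read the probabilities and labels and return a number to look at; nothing they compute is fed back into training. That is the sense in which metrics are not part of the model.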

You are correct about how the threshold works. If the model finds an object with 55% certainty, it will classify the image as having the object, since the default threshold is 50%. So changing the threshold really only changes the likelihood of classes being identified. For example, in one of my projects, my model usually gives about 99% probability to a class when it is present. Often, though, it also gives around 80% to a class that isn’t there. To fix this, I raised the threshold to 90% to discard these false positives while keeping the correctly labeled classes.
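A quick sketch of that example, using made-up scores that mirror the numbers above (around 0.99 when a class is present, around 0.80 when it is absent), shows why raising the threshold to 0.9 removes the false positives without losing the true ones.

```python
# Hypothetical scores mirroring the example: ~0.99 for present classes,
# ~0.80 for classes that are not actually in the image.
present = [0.99, 0.98, 0.99, 0.97]  # detections we want to keep
absent = [0.80, 0.82, 0.79]         # spurious detections

def count_positives(scores, thresh):
    """Number of scores strictly above the threshold."""
    return sum(1 for s in scores if s > thresh)

for thresh in (0.5, 0.9):
    kept = count_positives(present, thresh)
    false_pos = count_positives(absent, thresh)
    print(f"thresh={thresh}: kept {kept}/{len(present)} true classes, {false_pos} false positives")
```

This only works because the two groups of scores are well separated; if the model’s scores for present and absent classes overlapped, a higher threshold would start dropping true classes too.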

pulse201 · July 30, 2019, 11:21pm

But if it changes the likelihood of classes being identified, it is part of your model, right? In that case, could we parametrize the threshold and incorporate it into our model to improve accuracy? Why do we set it manually?
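One common way to choose the threshold, offered here as a sketch rather than what the lesson does, is to treat it as a hyperparameter: try a range of values on validation data and keep the one with the best metric. It cannot simply be learned by gradient descent alongside the weights, because thresholding is a step function whose gradient is zero almost everywhere. The validation data below is hypothetical, and pooled (micro) F2 is assumed as the metric.

```python
def micro_fbeta(probs, targets, thresh, beta=2.0):
    """F-beta over all (image, tag) pairs pooled together."""
    tp = fp = fn = 0
    for row_p, row_t in zip(probs, targets):
        for p, t in zip(row_p, row_t):
            pred = p > thresh
            if pred and t:
                tp += 1
            elif pred and not t:
                fp += 1
            elif not pred and t:
                fn += 1
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0

def best_threshold(probs, targets, candidates):
    """Pick the candidate threshold with the highest F-beta on held-out data."""
    return max(candidates, key=lambda th: micro_fbeta(probs, targets, th))

# Hypothetical validation-set outputs and labels (three images, three tags).
val_probs = [[0.9, 0.35, 0.1], [0.25, 0.8, 0.3], [0.6, 0.2, 0.4]]
val_targets = [[1, 1, 0], [1, 1, 0], [1, 0, 1]]
candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
print(best_threshold(val_probs, val_targets, candidates))  # 0.2
```

The threshold should be tuned on validation data rather than the training set, so that the chosen value reflects how the model behaves on images it has not been fit to.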

blueharen · July 31, 2019, 11:14am

That is a fair point and I did not think of it in that way.

But I think we set it manually because the right value depends on what we are looking for and what our needs are. Say, for example, I was doing medical research and detecting cancer cells from images. In a case like this, I would rather have false positives than false negatives: we don’t want to tell people who have cancer that they don’t, and if we tell people who don’t have cancer that they might, a quick checkup would resolve it. A task like this would need a lower threshold.
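A small sketch of that trade-off, with hypothetical screening scores (label 1 = cancer present, 0 = absent), shows lowering the threshold eliminating missed cases at the cost of more false alarms.

```python
# Hypothetical screening scores and ground-truth labels.
scores = [0.95, 0.70, 0.40, 0.30, 0.60, 0.35, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

def confusion(scores, labels, thresh):
    """Return (false negatives, false positives) at the given threshold."""
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= thresh)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > thresh)
    return fn, fp

for thresh in (0.5, 0.25):
    fn, fp = confusion(scores, labels, thresh)
    print(f"thresh={thresh}: {fn} missed cancers, {fp} false alarms")
```

Which of these outcomes is acceptable is a judgement about the cost of each kind of error, which is why the threshold is usually set with the application in mind rather than left to the model.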