Straightforward change from softmax to sigmoid


Has anyone figured out how to change from softmax to sigmoid in a ResNet trained only on single-label images? I would like to know how confident the model is, and skip the prediction at inference time if it is uncertain.


Hi Best,

I think this plan will not give you what you want. If you change the activation/loss function, the model would need to be retrained, and the trained weights would come out different from the ones you already have.

What you can do is apply a threshold after softmax: accept the top predicted class only if its probability clears the threshold, rather than always taking the maximum class as is conventionally done.
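A minimal PyTorch sketch of this thresholding idea (the 0.8 cutoff and the class names are invented for illustration):

```python
import torch
import torch.nn.functional as F

def predict_with_threshold(logits, classes, threshold=0.8):
    """Return the top class only if its softmax probability clears the threshold."""
    probs = F.softmax(logits, dim=-1)
    conf, idx = probs.max(dim=-1)
    if conf.item() < threshold:
        return "unknown", conf.item()
    return classes[idx.item()], conf.item()

classes = ["cat", "dog", "bird"]
confident = torch.tensor([4.0, 0.5, 0.2])   # peaked logits -> accepted
uncertain = torch.tensor([1.0, 0.9, 0.8])   # flat logits -> rejected

print(predict_with_threshold(confident, classes))  # ('cat', ~0.95)
print(predict_with_threshold(uncertain, classes))  # ('unknown', ~0.37)
```

The threshold value itself is a tuning knob; you would pick it by looking at how the model's top probabilities are distributed on a validation set.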

Even so, and this is an interpretation (opinion), I don’t think this method tells you exactly “how confident the model is” that it is a dog, but rather something like “given that the photo is either a cat or a dog, the probability that it is a dog is 0.6”.
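This relative nature of softmax is easy to demonstrate: shifting every logit by the same constant leaves the probabilities unchanged, so the output reflects only the *differences* between classes, not the absolute strength of the evidence (the logit values here are invented):

```python
import torch
import torch.nn.functional as F

weak = torch.tensor([2.0, 1.0])        # small logits
strong = torch.tensor([102.0, 101.0])  # much larger logits, same gap

# Both print ~[0.73, 0.27]: softmax is invariant to a constant shift
print(F.softmax(weak, dim=0))
print(F.softmax(strong, dim=0))
```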


Hi Pomo,

You are probably right. My frustration was/is that I can’t figure out how to edit my learner so that I can test it. It seems like I should just have to change a variable name in the code.

But I did try adding a “none” class containing 54 images that I snapped randomly with my phone, and it actually works much better than expected. Before I added the none class I had the 3 classes I wanted to detect. But since I’m analyzing a live video feed from a moving arm, I also see images that should not be classified as any of the three classes, yet are not easily defined.

When I added the none class, it felt kind of magical.

But I still want to know how to calculate other probability metrics.

Thank you for your input!

I have not looked at that part of the fastai code in many months, and can’t advise on exactly where to edit. I think the softmax and the selection of the prediction happen in get_preds, not in the model itself. You might try copying the get_preds code and editing it to give the desired outcome. But maybe a fastai expert can chime in with a better approach.
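If you can get at the raw logits before the softmax is applied, you can pass them through a sigmoid at inference time without retraining anything. This is a sketch in plain PyTorch, not fastai’s actual get_preds (the linear head stands in for a trained model), and note that sigmoid outputs from a softmax-trained model are not calibrated probabilities, just unnormalized per-class scores:

```python
import torch
import torch.nn as nn

# Stand-in for a trained classifier head; in practice you'd use your
# trained model (e.g. learn.model in fastai) to get the raw logits
model = nn.Linear(10, 3)

x = torch.randn(1, 10)
with torch.no_grad():
    logits = model(x)

# Independent per-class scores in (0, 1); unlike softmax, they
# are not forced to sum to 1 across the classes
scores = torch.sigmoid(logits)
print(scores)
```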

This question of how to get “don’t know” is the most natural question to ask after learning about ML classifiers. It’s surely in need of a definitive, comprehensive unpacking and explanation. That’s beyond my expertise.

I suspect that your “none” class is learning features common to all backgrounds, including the targets’ backgrounds. Then the presence of those features in the absence of the targets’ features would classify the target as “none”. If you train only on the targets, the model would never need to learn the background features and would only classify by the relative strengths of the targets’ features, giving lots of false positives even when using a threshold.

The above is entirely speculation, not based on experiment or book learnin’, so please don’t take as authoritative.
