Why is Fastai creating four classes (multilabel) when there are only three?

BenHamm · July 8, 2022, 10:52pm

I’m training a ResNet50 classifier to detect when my cat is trying to bring a dead animal into my house. There are three nested classes possible:

cat in frame,
cat face visible (if score is high enough, this is the cue to save the images for later)
cat with prey (if score is high enough, this is the cue to lock the cat door)
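
A minimal sketch of how the two cues could map onto actions, assuming the model returns one score per label in the range 0 to 1. The function name, the label keys, and both thresholds are illustrative assumptions, not the poster's actual code:

```python
def decide_actions(scores, save_threshold=0.8, lock_threshold=0.8):
    """Map per-label scores to actions.

    `scores` is a dict such as {"cat": 0.9, "cat_face": 0.7, "cat_with_prey": 0.1}.
    Both thresholds are hypothetical placeholders and would need tuning.
    """
    actions = []
    # A visible cat face is the cue to save the images for later.
    if scores.get("cat_face", 0.0) >= save_threshold:
        actions.append("save_images")
    # A cat with prey is the cue to lock the cat door.
    if scores.get("cat_with_prey", 0.0) >= lock_threshold:
        actions.append("lock_door")
    return actions


print(decide_actions({"cat": 0.99, "cat_face": 0.95, "cat_with_prey": 0.91}))
print(decide_actions({"cat": 0.10, "cat_face": 0.02, "cat_with_prey": 0.01}))
```

Because the labels are scored independently, the "no cat" case simply produces no actions here, without needing a score of its own.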

Here is what my dataframe looks like when everything is prepped and ready:

You’ll notice that one category is “null”, i.e. no cat is in frame and no label is assigned. I thought this would result in a classifier with three classes, in which all were unactivated in the case of no cat being present. However, in practice FastAI creates a fourth “null” class to represent “no cat”:

Any idea what I’m doing wrong? Netron tells me I am using Sigmoid! But I still see Softmax-y behavior:

Learner setup code below:

from fastai.vision.all import *

# splitter, get_x, get_y and df are defined elsewhere (not shown in this post)
dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   splitter=splitter,
                   get_x=get_x,
                   get_y=get_y,
                   item_tfms=Resize(224, ResizeMethod.Squish))

dls = dblock.dataloaders(df, num_workers=0)

# Multi-label setup: binary cross-entropy on logits, one output per class
learn = vision_learner(dls, resnet50, pretrained=True, metrics=[accuracy_multi])
learn.loss_func = BCEWithLogitsLossFlat()
learn.fine_tune(epochs=10, base_lr=0.001, freeze_epochs=1)

bencoman · July 9, 2022, 1:34am

I’m a novice just completing the 2022 Part 1 lecture series (which IIUC may be released in a few weeks), so you should wait for a more authoritative answer, but just to test myself…

It’s because the final predicted inference values need to sum to 1. So when there is physically no cat present, without a no-cat category you’d end up with either:

One of three categories having a strong false-positive prediction; or,
Three categories weakly predicted, i.e. in the extreme, ~33% each

The first makes for a poor decision, and the latter is awkward to action, needing additional explicit imperative programming by yourself.

Having a “not-a-cat” category allows the NN to do its magic and “learn” to strongly predict that category, with the other three categories tending to pred=0%, which provides a high-confidence situation to action.

BenHamm · July 9, 2022, 2:11am

Hmm, I’m no expert but that cannot be right. It’s the final softmax layer that enforces the “sum to 1” behavior you’re describing, and that layer has been removed for my multi-label use case. Indeed, it’s quite common for me to have inferences that sum to >1 in this model. See example below, where my cat actually has an animal in his jaws:
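
The difference between the two activations can be checked with a few lines of plain Python. This is only an illustrative sketch with made-up logits, not output from the model in this thread:

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability; outputs always sum to 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    # Each output is squashed independently; no constraint on the sum.
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


logits = [4.0, 3.0, 2.5]            # made-up values: all three labels "on"
soft = softmax(logits)
sig = sigmoid(logits)
quiet = sigmoid([-5.0, -5.0, -5.0])  # made-up values: all three labels "off"

print(sum(soft))   # sums to 1 (up to floating point)
print(sum(sig))    # well above 1
print(max(quiet))  # every label close to 0
```

With sigmoid outputs, "cat with prey" can score high alongside "cat" and "cat face", and an empty frame can score low on all three at once, which is what a three-class multi-label setup would rely on.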

bencoman · July 9, 2022, 2:43am

A few questions to help my learning…

What is your total number of training images?
Can you describe the structure of your training data on disk?

BenHamm · July 10, 2022, 9:44pm

About 21,000 images between training and validation. I’m not sure why the structure on the disk matters–they’re all in a single directory. Labels are not assigned by directory structure but by a JSON. Ultimately, what matters is that it all enters the dataframe → datablock → dataloader correctly, which AFAIK it has. dls.show_batch() provides valid outputs:

msivanes · July 11, 2022, 10:52am

You’ll notice that one category is “null”, i.e. no cat is in frame and no label is assigned. I thought this would result in a classifier with three classes, in which all were unactivated in the case of no cat being present. However, in practice FastAI creates a fourth “null” class to represent “no cat”:

You can customize this behavior using MultiCategoryBlock. There are samples with no labels present. In this type of scenario, it is better to add_na to your category vocab indicating the absence of labels as a separate class (dls.vocab should help to validate).

MultiCategoryBlock(add_na=True)

dls.show_batch and the Learner look good to me.

BenHamm · July 12, 2022, 12:52am

Thanks for the reply! However, it looks like your suggestion simply added the ‘#na#’ label to vocab without changing the labels for the blank ones. See below:

zonkyo · July 12, 2022, 12:30pm

Hi there.

This is the intended behavior for fastai: you provide four classes to train against, you get four possible results.

If you do not want to see results for your null class (or do not want to name it ‘no cat’), the best way is to exclude it from the training.
As you can see here, Lesson 3 - Unknown Labels (Pets Revisited) | walkwithfastai, if the algorithm attempts to recognize something unknown (in your case, no cat at all), it should return [].

Regarding this question, you could read up on “How to use BCEWithLogitsLossFlat in lesson1-pets.ipynb” or even this one: “Handle data that belongs to classes not seen in training or testing - #28 by cudawarped”.

Regarding the sigmoid/softmax-y behaviour: from what I know, ResNet50 has a softmax as its last layer. It should be a sigmoid when using BCEWith…, according to the last link above, but could you check this again with learn.summary()?

bencoman · July 12, 2022, 1:31pm

Here is the thing I’m trying to understand…
You say you have three categories, but your dataframe clearly shows you have four categories.

Does having a category labelled with an empty string really make a difference?
That would seem to require special-case handling of empty-strings to condense categories that would make the library code more complicated. The simplest library code would be that empty-strings are treated no different from any other string.

I thought categories were processed “by position” anyway, rather than by “string content”? In the limited examples I’ve seen so far, the last step usually seems to be to reattach the vocabulary to the predictions by their index.

So my naive understanding is that FastAI, rather than “creating a fourth ‘null’ class to represent ‘no cat’”, is just passing your empty-string category through untouched, and the deeper consideration of sigmoid and softmax seems a red herring.
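
The thread never shows get_y, so the following is only one possible explanation, offered as an assumption rather than a diagnosis. If the labels are stored as a space-separated string and split with an explicit separator, an empty string yields a list containing one empty-string label, and a vocab built from the data would then count it as a fourth class. The helper names below are made up for illustration:

```python
def labels_with_sep(label):
    # Explicit separator: "" becomes [""], i.e. one label that is the empty string.
    return label.split(" ")


def labels_default(label):
    # No separator argument: "" becomes [], i.e. no labels at all.
    return label.split()


def build_vocab(label_lists):
    # Sorted set of every label seen; a stand-in for deriving a vocab from data.
    return sorted({lab for labs in label_lists for lab in labs})


rows = ["", "cat", "cat cat_face", "cat cat_face cat_with_prey"]

vocab_sep = build_vocab(labels_with_sep(r) for r in rows)
vocab_default = build_vocab(labels_default(r) for r in rows)

print(vocab_sep)      # four entries, including the empty string
print(vocab_default)  # three entries
```

Checking what get_y returns for an unlabeled row, and what dls.vocab contains, would show whether something like this is happening.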

I’ve yet to work through it in detail, but I notice Fastbook Chapter 6 says…
Our list of categories [for MultiCategoryBlock] is not encoded in the same way that it was for the regular CategoryBlock. In that case, we had a single integer representing which category was present, based on its location in our vocab. In this case, however, we instead have a list of zeros, with a one in any position where that category is present.

TensorMultiCategory([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.])

Is that similar to how you are doing multi-category ?
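
For reference, the encoding described in that passage can be sketched in plain Python. This is not fastai's implementation, and the helper name multi_hot is made up; it uses the three labels from this thread as an assumed vocab:

```python
def multi_hot(labels, vocab):
    # One slot per vocab entry: 1.0 where the label is present, 0.0 elsewhere.
    present = set(labels)
    return [1.0 if name in present else 0.0 for name in vocab]


vocab = ["cat", "cat_face", "cat_with_prey"]

print(multi_hot(["cat", "cat_face"], vocab))  # cat and face present, no prey
print(multi_hot([], vocab))                   # no labels at all: every slot is zero
```

With a three-entry vocab, an image with no labels is simply the all-zero target, so no extra class is needed to represent it.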

zonkyo · July 13, 2022, 5:29pm

I am currently attempting to build my own data set for the same purpose by downloading images from Google, so I would be happy to take your images into account as well.

I think this project is interesting, since a buddy of mine told me of his cat that brought in mice, other rodents, bunnies, once a pheasant and ate it. On the kitchen table.

So locking out this kitty would be beneficial, and even if he were okay with the cat bringing in whatever, I would have built something funny.

@bencoman
Yes, it does; the category with the null string will be a fourth category that, in this case, contains images of no cat (or in general, things other than cats, for example raccoons, dogs, velociraptors, …).
The vocab is just “what can be seen in the image”. To be more explicit, there are different versions of “what can be seen”: there are, for example, files that contain not only the label of an object but also its “exact” position as a bounding box. With these, an object detection model can be trained to infer more than one object in an image. If twelve cats suddenly appeared in one of @BenHamm's images, looking intently at the cam, it would still register as “cat cat_face”, whereas the bounding box approach would/should yield 12 bounding boxes with “cat cat_face”.
I have to admit that I am not too firm on this approach, by the way!

Now, regarding the last few paragraphs:
I am not quite sure I get what you mean, but I’ll try!
As far as I understand you want to use only three classes, i.e., creating a dataloader containing only the classified images. Then you want to use during training (?) also the images without cats, so actually the class without classes (or the ‘other’ class or ‘none’ class).
This should yield a model with only three classes, in @BenHamm 's case the ‘cat’, ‘cat cat_face’, and ‘cat cat_face cat_with_prey’ case. The “faulty” images would then be categorized as [0,0,0] since they should be not similar at all.

Did I understand you correctly?

The last part, about MultiCategory, is essentially already used by @BenHamm, as you can see in his first post. Here, the difference is that @BenHamm removed the output layer and grabs the layer before that. But, if everything works well, he should get, for ‘cat approaching the cat door but having nothing in its fangs’, the results [0, 1, 1, 0], i.e. ‘’=0, ‘cat’=1, ‘cat_face’=1, ‘cat_with_prey’=0, IFF he were using the last layer.
The MultiCategoryBlock should allow categorizing “multiple objects”, i.e. a cat and maybe a dog, in an image. So if you have, for example, the categories cat, dog, giraffe, whale, rhino and have, in the layer before the result layer, the values [0, 1.34, 0.53, 0.1, 0.55], it might result in [0, 1, 0, 0, 1], which would indicate no cats, a(t least one) dog, no giraffe, no whale, a(t least one) rhino!

BenHamm · July 13, 2022, 5:52pm

@zonkyo to clarify, this is already a project that I’ve had working for several years. Was my first coding project! You can see how I did it a long time ago: Cats, Rats, A.I., Oh My! - Ben Hamm - YouTube

I have changed a lot about my stack since then–basically ditching all the Amazon components and migrating to FastAI. Currently I’m using a multi-label image classification model (NOT object detection) successfully. But I am trying to make it better.

If you’re trying to build something similar, I’d love to chat! I am not sure you’ll be successful using internet-scraped data, but I could be very wrong since you are way more skilled than I.

Regarding some of the questions/response in this thread: If having a “null” class is expected behavior, then I am good. I thought perhaps the model would be more performant if I constrained everything to three classes, and had “no_cat” mean that the activations were zero.

zonkyo · July 23, 2022, 9:28am

So, I started dabbling with this project on the gathered data (just ~2000 images), which I separated into different classes, namely “cat” and “cat with prey”, and started out with the single-label classification idea.

The reasoning behind that is the following: a “cat cat_face” and a “cat” should yield the same result, namely, cat door unlocked. I do not think it is important whether the cat is leaving or coming ^^ but that is just my preference (and labeling/pushing to different directories is a pain).

This is not that bad, I guess, especially considering that I actually am not well versed in DL with images.

@BenHamm if you would not mind letting me get my hands on your data set, I would like to see what happens when a network is trained on some random cats vs. your well-collected dataset.

Currently, since I just started out, it still recognizes random things as one of the classes (a donkey as a house cat, oops). So this has to be fixed, but since I assume it is more like “prey”=locked and otherwise open, this would not be too bad (as a baseline).

zonkyo · July 24, 2022, 5:12pm

So, solved it on my end - easy enough for the first try!

For the fine-tuning, set the threshold to something “low”, like 0.5; for the testing, increase it to something like 0.9, i.e.
learn.loss_func = BCEWithLogitsLossFlat(thresh=0.9)
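
To make the effect of the threshold concrete, here is a small plain-Python sketch of turning raw outputs into labels: apply a sigmoid to each logit and keep the labels whose probability clears the threshold. The helper name decode and the logits are invented for illustration; this is not fastai's code:

```python
import math


def decode(logits, vocab, thresh=0.5):
    # Independent sigmoid per label, then keep everything above the threshold.
    probs = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    return [name for name, p in zip(vocab, probs) if p > thresh]


vocab = ["cat", "cat_face", "cat_with_prey"]
logits = [3.0, 1.0, -2.0]  # made-up: sigmoid gives roughly 0.95, 0.73, 0.12

print(decode(logits, vocab, thresh=0.5))   # lenient threshold
print(decode(logits, vocab, thresh=0.9))   # strict threshold keeps fewer labels
print(decode([-4.0, -4.0, -4.0], vocab))   # nothing clears the threshold
```

A stricter threshold trades missed detections for fewer false alarms, and an image where no label clears it decodes to an empty list.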

Besides that, you can change your get_y method! Sorry that this occurred only now to me, after dabbling a bit around with it myself.
get_y = lambda label: [1, 0, 0] if label=='cat' else [1, 1, 0] if label=='cat cat_face' else [1,1,1] if label=='cat cat_face cat_with_prey' else [0, 0, 0]

This should work as well. This way, you could also use images without targets, if you want to.
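
The same mapping can be written as a lookup table, which is a bit easier to read and extend. This is a sketch: the name LABEL_TO_TARGET is made up, and with fastai the pre-encoded targets would presumably need MultiCategoryBlock(encoded=True, vocab=[...]) rather than the default block, which is an assumption worth checking against the docs:

```python
# Hypothetical lookup table: label string -> [cat, cat_face, cat_with_prey]
LABEL_TO_TARGET = {
    "cat": [1, 0, 0],
    "cat cat_face": [1, 1, 0],
    "cat cat_face cat_with_prey": [1, 1, 1],
}


def get_y(label):
    # Unknown or empty labels fall through to the all-zero "nothing present" target.
    # list(...) returns a copy so callers cannot mutate the table.
    return list(LABEL_TO_TARGET.get(label, [0, 0, 0]))


print(get_y("cat cat_face"))
print(get_y(""))
```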

So sometimes it really misses the point (lower center, cat next to a corpse of… something), and sometimes in my mini dataset it learned the wrong concept (cat box = cat, oops).

But as a proof of concept, it is possible to enhance the data set you gathered over a longer time with your cat with some random cats from the web! Fun stuff!

(In the end I had around 1200 images: 700 ‘normal’ cats, 430 cats with prey, and some random images. Whether you want to let in a tiger or a lion, as long as they have no prey in their fangs, is your decision; you could add a category ‘Bencat’ against a ‘cat’ category.)

zonkyo · September 19, 2022, 8:52am

Apparently, other people think this is a real issue.

Here is an article (in German) about “Lion’s den” (I think the American version of the show is “Shark Tank”), where someone is attempting to sell his ‘invention’ that does the exact same thing.
So, your problem seems to concern others as well!