From my understanding, normal cross entropy can't be used for multi-label problems, because softmax measures the class probabilities relative to one another (they have to sum to 1).
(If you were confused like me at first: multi-class means multiple classes with exactly one correct answer, whereas multi-label means multiple classes where several answers can be correct at once.)
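To make that concrete for myself, here's a tiny plain-PyTorch sketch, with made-up logits, of the difference: softmax ties the class probabilities together so they sum to 1, while sigmoid gives each class its own independent probability, which is what multi-label needs.

import torch

# made-up raw model outputs (logits) for one item over 4 classes
logits = torch.tensor([2.0, 2.0, -1.0, 0.5])

# softmax: probabilities compete and sum to 1, so classes are scored relative to each other
print(torch.softmax(logits, dim=0))        # roughly [0.44, 0.44, 0.02, 0.10]
print(torch.softmax(logits, dim=0).sum())  # 1.0

# sigmoid: each class gets its own independent probability, so any subset can be "present"
print(torch.sigmoid(logits))               # roughly [0.88, 0.88, 0.27, 0.62]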
To be sure, here’s a (not working) multi-label learner:
from fastai.vision.all import *
from random import randint
vocab = ['this', 'is', 'a', 'label'] # have some random classes just so we're able to create a DataLoader
path = untar_data(URLs.BIWI_HEAD_POSE)
dls = DataBlock(
    blocks=(ImageBlock, MultiCategoryBlock),
    get_y=lambda x: [vocab[randint(0, 3)]],  # MultiCategoryBlock expects a list of labels per item
    get_items=get_image_files
).dataloaders(path)
learn = vision_learner(dls, resnet18)
print(learn.loss_func)
FlattenedLoss of BCEWithLogitsLoss()
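As far as I can tell, BCEWithLogitsLoss just applies a sigmoid and then binary cross entropy to every class independently, against a multi-hot target. Here's a rough sketch of what I think it's computing (plain PyTorch, made-up numbers, not fastai's actual code):

import torch
import torch.nn.functional as F

# made-up logits for one image over our 4-class vocab, and a multi-hot target
# saying classes 0 and 3 are both present
logits = torch.tensor([[2.0, -1.0, 0.5, 1.5]])
target = torch.tensor([[1.0, 0.0, 0.0, 1.0]])

# the built-in version
print(F.binary_cross_entropy_with_logits(logits, target))

# the same thing spelled out: per-class sigmoid + binary cross entropy, averaged over classes
probs = torch.sigmoid(logits)
print(-(target * probs.log() + (1 - target) * (1 - probs).log()).mean())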
If we instead use a CategoryBlock, then the output is
FlattenedLoss of CrossEntropyLoss()
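Which makes sense to me, because CrossEntropyLoss wants exactly one correct class per item: it softmaxes across the classes and takes the negative log probability of the single target index. A quick sketch of that, again plain PyTorch with made-up numbers:

import torch
import torch.nn.functional as F

# made-up logits for one image over 4 classes; the target is a single class index (class 3)
logits = torch.tensor([[2.0, -1.0, 0.5, 1.5]])
target = torch.tensor([3])

# the built-in version
print(F.cross_entropy(logits, target))

# the same thing spelled out: softmax, then negative log probability of the correct class
probs = torch.softmax(logits, dim=1)
print(-probs[0, 3].log())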
Since fast.ai picks BCE with logits for the multi-label case, I guess it's safe to assume that it's the best overall loss for this type of problem.
Admittedly, I haven't been able to test the performance of a multi-label learner with BCE yet (the PASCAL_2012 download is taking quite a while). It also doesn't help that whenever I Google the uses of BCE, multi-label is not mentioned:
Binary Cross-Entropy is widely used in training neural networks for binary classification problems
And the one Medium article I found on it is paywalled…
If BCE works when multiple classes are correct, why shouldn't it also work when only one is? Or does it work okay in that case, BUT (normal?) cross entropy just happens to be better? And if so, why?
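To make the question concrete: by "BCE when only one class is correct" I mean something like the sketch below, where a one-hot target stands in for the multi-hot one (plain PyTorch, made-up numbers). Both calls run and give a finite loss, so I don't see what would rule BCE out on paper.

import torch
import torch.nn.functional as F

# made-up single-label situation: exactly one correct class (class 0)
logits = torch.tensor([[2.0, -1.0, 0.5, 1.5]])

# cross entropy: the target is a class index
print(F.cross_entropy(logits, torch.tensor([0])))

# BCE with logits: the same target, just one-hot encoded
print(F.binary_cross_entropy_with_logits(logits, torch.tensor([[1.0, 0.0, 0.0, 0.0]])))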
For context, I'm beginning part 7 of the course, but I'm dragging 5 and 6 along as I try to understand loss functions. Please feel free to correct any mistaken assumptions.