Is Pseudo-Labeling possible with the fastai library?

eugeneware · January 16, 2018, 2:00pm

Hi, I’ve completed the Part 1 v1 course, and am now doing v2.

In v1 @jeremy shows how to use pseudo labeling to predict the probabilities on the test set and then incorporate them into the training data to improve results.

By using the probabilities and not just the highest class predicted, we also used a technique called “Knowledge Distillation”.

Is this possible with the fastai library? It seems that the API (from_csv, for example) assumes only a single class per image.

Also, the PyTorch loss functions such as torch.nn.NLLLoss only seem to take categorical indexes, and not probability tensors or one-hot encodings, which is ideally what I’d like to pass as target labels.
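To make the question concrete, here is a minimal sketch of a loss that accepts a full probability vector per example instead of a class index. The name soft_cross_entropy is hypothetical (it is not part of fastai or PyTorch), and the tensors are made-up toy values.

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(logits, target_probs):
    """Cross-entropy against soft targets (hypothetical helper).

    logits       : (batch, n_classes) raw model outputs
    target_probs : (batch, n_classes) rows that sum to 1, e.g. pseudo-label
                   probabilities or one-hot encoded true labels
    """
    log_probs = F.log_softmax(logits, dim=1)
    # Weight each class's log-probability by its target probability,
    # sum over classes, then average over the batch.
    return -(target_probs * log_probs).sum(dim=1).mean()

torch.manual_seed(0)
logits = torch.randn(4, 3)                      # toy batch: 4 examples, 3 classes

hard = torch.tensor([0, 2, 1, 2])               # ordinary class-index labels
one_hot = F.one_hot(hard, num_classes=3).float()

soft = torch.tensor([[0.7, 0.2, 0.1],           # made-up pseudo-label probabilities
                     [0.1, 0.1, 0.8],
                     [0.2, 0.6, 0.2],
                     [0.3, 0.3, 0.4]])

loss_hard = soft_cross_entropy(logits, one_hot)
loss_soft = soft_cross_entropy(logits, soft)
```

With one-hot targets this reduces to the usual cross-entropy on class indexes, so real labels and soft pseudo-labels could in principle share one loss.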

Has anyone successfully implemented Pseudo-Labeling with fastai?

Ideally pseudo-labeling would be built into the fastai API directly as it’s a very useful tool to have, at least in the context of Kaggle Competitions!

jerimma · January 16, 2018, 7:52pm

I haven’t tried it myself but could you not just either:

save your pseudo-labeled data with the other csv file?
or maybe easier, load the csv data into an array, add the pseudo-labeled data to it and use the from_array function.

Hope this helps!

Partha · January 18, 2018, 8:42am

Hello, can you share the link to the Part 1 v1 videos, please?
Regards
Partha

jamesrequa · January 23, 2018, 1:28am

@eugeneware Yes, it’s definitely possible to implement pseudo-labeling with the fastai library.

Simply put, it’s just a matter of combining a subset of your predictions with the training set. So you would use something like pandas to concat your predicted labels to your training labels, and add the corresponding test images to your training images. Save all of this to one csv file and then point to this file using from_csv.
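A rough sketch of that concat step, under some assumptions: the column names "id" and "label", the 0.9 confidence threshold, and the helper name add_pseudo_labels are all hypothetical, and the probabilities are toy values rather than real model output.

```python
import numpy as np
import pandas as pd

def add_pseudo_labels(train_df, test_ids, test_probs, classes, threshold=0.9):
    """Append confidently predicted test rows to the training labels.

    train_df   : DataFrame with columns "id" and "label" (assumed names)
    test_ids   : test image ids, aligned with the rows of test_probs
    test_probs : (n_test, n_classes) predicted probabilities
    classes    : class names, in the column order of test_probs
    """
    test_probs = np.asarray(test_probs)
    best = test_probs.argmax(axis=1)                 # most likely class per row
    confident = test_probs.max(axis=1) >= threshold  # keep only a confident subset
    pseudo_df = pd.DataFrame({
        "id": np.asarray(test_ids)[confident],
        "label": np.asarray(classes)[best[confident]],
    })
    return pd.concat([train_df, pseudo_df], ignore_index=True)

train_df = pd.DataFrame({"id": ["train_0", "train_1"], "label": ["cat", "dog"]})
test_ids = ["test_0", "test_1", "test_2"]
test_probs = [[0.97, 0.03], [0.55, 0.45], [0.08, 0.92]]

combined = add_pseudo_labels(train_df, test_ids, test_probs, classes=["cat", "dog"])
# To hand the result to from_csv, write it out, e.g.:
# combined.to_csv("train_plus_pseudo.csv", index=False)
```

Here test_1 is dropped because its top probability (0.55) is below the threshold. The pseudo-labeled test images would also need to be readable from the same image folder the csv refers to.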

Do you mean multi-label and not multi-class? from_csv does support multi-label classification; you can check the planet competition (lesson 2 notebook) for how to go about doing that.