get_preds on an unlabelled dataset


I have a hard time understanding the output of the get_preds method, and even more so why you would use it on unlabelled data.

Am I correct that the first value returned by get_preds is a matrix with one row per image, holding a score for each class the model could assign to that image?

If I use get_preds on a test dataset of unlabelled images, the second value it returns is None. What can I do with this output, which contains only the predicted classes for each image in the test dataset? Could it be viewed as a measure of how well the model predicts classes, even though there is no actual class/label to check each prediction against?

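get_preds() is for generating predictions on data that is not your training set.
(Think of it as the reason you trained this model in the first place, or as the way to make a submission to Kaggle.) Therefore, you do not expect to have labels for that data, and the predictions alone cannot tell you how accurate the model is: there is nothing to compare them against.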
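To make the difference concrete, here is a plain-Python sketch (no fastai involved; `accuracy` and `argmax` are hypothetical helpers, not fastai's metrics). It assumes predictions are rows of class probabilities and shows that an accuracy figure only exists when targets are available; with unlabelled data, where get_preds gives None as the second value, there is nothing to score.

```python
# Plain-Python sketch: why predictions on unlabelled data cannot tell you
# how good the model is. `argmax` and `accuracy` are hypothetical helpers.

def argmax(row):
    """Index of the largest value in a list of class probabilities."""
    return max(range(len(row)), key=row.__getitem__)

def accuracy(preds, targs):
    """Fraction of rows whose top class matches the target.
    Returns None when there are no targets (unlabelled data)."""
    if targs is None:
        return None
    hits = sum(argmax(row) == t for row, t in zip(preds, targs))
    return hits / len(targs)

# Made-up probabilities for 3 items over 2 classes.
preds = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]

labelled_targs = [0, 1, 1]   # validation-style data: labels known
unlabelled_targs = None      # test data: no labels to compare against

print(accuracy(preds, labelled_targs))    # 2 of the 3 top classes match
print(accuracy(preds, unlabelled_targs))  # None
```

The predictions themselves are identical in both calls; only the availability of targets decides whether a score can be computed.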

Correct. preds has one row per item, and each row is a vector with one element per category, giving the probability of that category.

To get the predicted category, take the highest probability in each row:

import torch
max_preds, max_cats = torch.max(preds.squeeze(-1), dim=1)

This returns, for each item, the highest probability and the index of the category that has it.
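The sketch below spells out what that torch.max call computes, in plain Python so it runs without fastai or PyTorch. It assumes a single-label classifier whose raw outputs are turned into probabilities with softmax (fastai applies the loss function's activation, softmax for its default cross-entropy loss, inside get_preds); the logits and the helper names `softmax` and `row_max` are made up for illustration.

```python
import math

def softmax(logits):
    """Turn one row of raw scores into probabilities that sum to 1."""
    m = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def row_max(probs):
    """Plain-Python analogue of torch.max(probs, dim=1):
    returns (max value, index of the max) for each row."""
    vals, idxs = [], []
    for row in probs:
        i = max(range(len(row)), key=row.__getitem__)
        vals.append(row[i])
        idxs.append(i)
    return vals, idxs

# Hypothetical raw model outputs for 2 images over 3 classes.
logits = [[2.0, 0.5, 0.1], [0.2, 0.3, 3.0]]
preds = [softmax(r) for r in logits]       # one probability vector per image

max_vals, max_cats = row_max(preds)
print(max_cats)   # [0, 2]
```

Because softmax preserves the ordering of the scores, the winning index is the same whether you take the max over logits or over probabilities; the probability is just easier to interpret as a confidence.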


Alternatively, let get_preds do the decoding for you:

preds, _, decodes = learn.get_preds(dl=dl, with_decoded=True)

For a single-label classifier, decodes holds, for each item, the index of the category with the highest probability.
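As a rough illustration, assuming a single-label classifier, decoding amounts to an argmax per row, and the resulting indices can be looked up in the vocabulary to get readable labels. This is a dependency-free sketch: `decode_single_label`, the probabilities, and the vocab list are hypothetical stand-ins, not fastai's implementation.

```python
# Made-up probabilities for 3 images over 3 classes, and a made-up vocab
# standing in for learn.dls.vocab (labels listed in index order).
preds = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.25, 0.5, 0.25]]
vocab = ["cat", "dog", "horse"]

def decode_single_label(probs):
    """Assumed single-label decoding: index of the highest probability per row."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]

decodes = decode_single_label(preds)
labels = [vocab[i] for i in decodes]   # index -> human-readable label
print(decodes)  # [0, 2, 1]
print(labels)   # ['cat', 'horse', 'dog']
```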

I usually do:

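import torch

dl = learn.dls.test_dl(df_inf, bs=bs_inf, num_workers=0)  # DataLoader using the same transforms as training
preds, _ = learn.get_preds(dl=dl)                          # targets are None for unlabelled data
max_vals, max_cats = torch.max(preds.squeeze(-1), dim=1)   # top probability and its category index per row

df_inf['PREDICT_PROB_1x'] = max_vals.numpy()
df_inf['PREDICT_ID_1'] = max_cats.numpy()
# vocab lists the labels in index order, so enumerate() gives an index -> label map
df_inf['PREDICT_LABEL_1'] = df_inf['PREDICT_ID_1'].map(dict(enumerate(learn.dls.vocab)))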
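A note on the last line: in fastai's CategoryMap, `o2i` ("object to index") maps each label to its index, which is the opposite of what you need for turning predicted IDs into labels. The plain-Python sketch below uses a made-up vocab to show both directions; `i2o` and `i2o_from_o2i` are hypothetical names. Enumerating the vocab and inverting `o2i` give the same index-to-label map.

```python
# Made-up vocab in index order, standing in for learn.dls.vocab.
vocab = ["cat", "dog", "horse"]

# o2i maps label -> index (the direction CategoryMap.o2i uses).
o2i = {label: i for i, label in enumerate(vocab)}

# To turn predicted IDs into labels we need index -> label.
i2o = dict(enumerate(vocab))
i2o_from_o2i = {i: label for label, i in o2i.items()}   # same map, built by inverting o2i

pred_ids = [2, 0, 0, 1]                  # e.g. the values in a PREDICT_ID_1 column
pred_labels = [i2o[i] for i in pred_ids]
print(pred_labels)  # ['horse', 'cat', 'cat', 'dog']
```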

OK, great, thanks for the answer. So get_preds is used for batch predictions on many items at once, as opposed to predicting a single item with, for example, the predict() method?

Correct. But…
predict() internally takes the single item, calls test_dl() to wrap it in a DataLoader, and then calls get_preds() on that.
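The relationship can be sketched without fastai: single-item prediction is just batch prediction on a batch of one, followed by decoding. Everything here is hypothetical (`fake_model`, `get_preds_sketch`, `predict_sketch` are not fastai APIs); the return order of `predict_sketch` only loosely mirrors the (label, index, probabilities) triple that predict() gives for a single-label classifier.

```python
def fake_model(item):
    """Stand-in model: returns class probabilities for one integer item."""
    return [0.8, 0.2] if item % 2 == 0 else [0.3, 0.7]

def get_preds_sketch(batch, targets=None):
    """Batch inference: one probability row per item.
    Targets pass through unchanged, so they are None for unlabelled data."""
    return [fake_model(x) for x in batch], targets

def predict_sketch(item, vocab):
    """Single-item inference: wrap the item in a batch of one,
    reuse the batch path, then decode the only row."""
    preds, _ = get_preds_sketch([item])
    probs = preds[0]
    idx = max(range(len(probs)), key=probs.__getitem__)
    return vocab[idx], idx, probs

print(predict_sketch(3, ["even", "odd"]))   # ('odd', 1, [0.3, 0.7])
print(get_preds_sketch([0, 1, 2]))          # three rows, targets None
```

Since every predict() call builds its own one-item DataLoader, looping predict() over many items repeats that setup per item, which is one reason to prefer a single test_dl() plus get_preds() for bulk inference.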

Also, I think this is a good write-up that consolidates a lot of the info …
