Get_preds on unlabelled dataset

krullmizter · May 15, 2023, 9:07pm

Hi,

I have a hard time understanding the get_preds method's output, and even more so why you would use it on unlabelled data.

Am I correct that the first value returned by get_preds is a matrix with one row per image, holding the probability the model assigns to each class for that image?

If I use get_preds on a test dataset of unlabelled images, the second return value is None. What can I do with this output of just the predicted classes for each image in the test dataset? Could it be viewed as a measure of how well the model predicts classes, without cross-referencing the actual class/label of each image?

meanpenguin · May 16, 2023, 4:51am

get_preds() is for generating predictions on data that is not your training set.
(Think of it as the reason you trained the model, or as the way to make a submission to Kaggle.) Therefore, you do not expect to have labels for that data.

Correct. preds holds one row per item, and each row is a vector (1 element for each category) of the probability of each category.

Find the highest prob in each row to get the predicted category:

max_preds, max_cats = torch.max(preds.squeeze(-1), axis=1)

Returns the max prob and the category which has the max prob for each item

or

preds,_,decodes = learn.get_preds(dl=dl, with_decoded=True)

decodes will be the index of the highest probability
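
To make the shapes concrete, here is a plain-Python sketch (no torch needed) of the row-wise max that the snippets above compute. The probability values and the helper name `row_max` are made up for illustration; this is not fastai or PyTorch code.

```python
# Hypothetical probabilities: one row per image, one column per class.
preds = [
    [0.10, 0.70, 0.20],
    [0.80, 0.15, 0.05],
    [0.25, 0.25, 0.50],
]

def row_max(rows):
    """Return (max_values, max_indices) with one entry per row."""
    max_vals, max_idxs = [], []
    for row in rows:
        # Index of the largest probability in this row.
        idx = max(range(len(row)), key=row.__getitem__)
        max_vals.append(row[idx])
        max_idxs.append(idx)
    return max_vals, max_idxs

max_preds, max_cats = row_max(preds)
print(max_preds)  # [0.7, 0.8, 0.5]
print(max_cats)   # [1, 0, 2]
```

Here max_cats plays the role of decodes: the column index of the highest probability for each item.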

I usually do:

dl = learn.dls.test_dl(df_inf, bs=bs_inf, num_workers=0)
preds,_ = learn.get_preds(dl=dl)
max_vals, max_cats = torch.max(preds.squeeze(-1), axis=1)

df_inf['PREDICT_PROB_1x'] = max_vals
df_inf['PREDICT_ID_1'] = max_cats
df_inf['PREDICT_LABEL_1'] = df_inf['PREDICT_ID_1'].map(dict(enumerate(learn.dls.vocab.o2i)))
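
The PREDICT_LABEL_1 line relies on the fact that iterating a Python dict yields its keys in insertion order, so enumerating a label-to-index dict gives back an index-to-label mapping. A minimal sketch of that step, assuming a hypothetical three-class vocab and an `o2i` dict built in vocab order (plain Python stand-ins, not the real fastai objects):

```python
# Hypothetical vocab; o2i maps label -> index, built in vocab order.
vocab = ['bird', 'cat', 'dog']
o2i = {label: i for i, label in enumerate(vocab)}

# Enumerating a dict walks its keys, so this gives index -> label.
idx_to_label = dict(enumerate(o2i))

# Made-up predicted class indices, one per item.
max_cats = [1, 0, 2]
labels = [idx_to_label[i] for i in max_cats]
print(labels)  # ['cat', 'bird', 'dog']
```

Under that assumption, enumerating the vocab itself would give the same mapping.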

krullmizter · May 16, 2023, 8:28pm

Ok great thx for the answer. So get_preds is then used for batch predictions on data that contains more than one item at once, instead of single predictions using the predict() method for example?

meanpenguin · May 17, 2023, 1:46am

Correct. But…
predict() internally takes the single item and calls test_dl() to make a dataloader and then calls get_preds()

https://github.com/fastai/fastai/blob/74fd6ea8cca582cc878d89658541873761ecb617/fastai/learner.py#L320

            res = cb.all_tensors()
            pred_i = 1 if with_input else 0
            if res[pred_i] is not None:
                res[pred_i] = act(res[pred_i])
                if with_decoded: res.insert(pred_i+2, getcallable(self.loss_func, 'decodes')(res[pred_i]))
            if reorder and hasattr(dl, 'get_idxs'): res = nested_reorder(res, tensor(idxs).argsort())
            return tuple(res)
        self._end_cleanup()

    def predict(self, item, rm_type_tfms=None, with_input=False):
        dl = self.dls.test_dl([item], rm_type_tfms=rm_type_tfms, num_workers=0)
        inp,preds,_,dec_preds = self.get_preds(dl=dl, with_input=True, with_decoded=True)
        i = getattr(self.dls, 'n_inp', -1)
        inp = (inp,) if i==1 else tuplify(inp)
        dec = self.dls.decode_batch(inp + tuplify(dec_preds))[0]
        dec_inp,dec_targ = map(detuplify, [dec[:i],dec[i:]])
        res = dec_targ,dec_preds[0],preds[0]
        if with_input: res = (dec_inp,) + res
        return res

    def show_results(self, ds_idx=1, dl=None, max_n=9, shuffle=True, **kwargs):
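
The pattern in the excerpt above (wrap one item in a one-element batch, run the batch path, take the first result) can be shown with a toy class. Everything here is hypothetical: `ToyLearner`, `toy_scores`, and the two-class vocab are stand-ins for illustration, not fastai's implementation.

```python
class ToyLearner:
    """Hypothetical stand-in for a Learner; not fastai code."""

    def __init__(self, vocab, score_fn):
        self.vocab = vocab
        self.score_fn = score_fn  # maps one item to a list of probabilities

    def get_preds(self, items):
        # Batch path: one row of probabilities per item.
        return [self.score_fn(item) for item in items]

    def predict(self, item):
        # Single-item path: wrap the item in a one-element batch and reuse get_preds.
        probs = self.get_preds([item])[0]
        idx = max(range(len(probs)), key=probs.__getitem__)
        # Same ordering as the excerpt: decoded label, class index, probabilities.
        return self.vocab[idx], idx, probs

def toy_scores(item):
    # Made-up scoring rule: short strings score as 'cat', long ones as 'dog'.
    return [0.9, 0.1] if len(item) < 5 else [0.2, 0.8]

learn = ToyLearner(['cat', 'dog'], toy_scores)
batch = learn.get_preds(['abc', 'abcdefg'])  # two rows of probabilities
single = learn.predict('abc')                # ('cat', 0, [0.9, 0.1])
```

So for one item, predict() and get_preds() produce the same probabilities; predict() just adds the decoding step.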

Also, I think this is a good writeup which consolidates a lot of the info …