Learn.get_preds batch size and number of items mismatch error

When I use

learn.get_preds(dl=dls)

and the number of items in the dataloader is not exactly divisible by the batch size, I get an error. For example, with 101 items in the dataloader and a batch size of 8, when it gets to the last batch I get:


IndexError                                Traceback (most recent call last)
<ipython-input-...> in <module>
----> 1 preds = learn.get_preds(dl=dls)

~/anaconda3/envs/fastai2/lib/python3.8/site-packages/fastai/learner.py in get_preds(self, ds_idx, dl, with_input, with_decoded, with_loss, act, inner, reorder, cbs, **kwargs)
    240             res[pred_i] = act(res[pred_i])
    241             if with_decoded: res.insert(pred_i+2, getattr(self.loss_func, 'decodes', noop)(res[pred_i]))
--> 242             if reorder and hasattr(dl, 'get_idxs'): res = nested_reorder(res, tensor(idxs).argsort())
    243         return tuple(res)
    244         self._end_cleanup()

~/anaconda3/envs/fastai2/lib/python3.8/site-packages/fastai/torch_core.py in nested_reorder(t, idxs)
    651     "Reorder all tensors in t using idxs"
    652     if isinstance(t, (Tensor,L)): return t[idxs]
--> 653     elif is_listy(t): return type(t)(nested_reorder(t_, idxs) for t_ in t)
    654     if t is None: return t
    655     raise TypeError(f"Expected tensor, tuple, list or L but got {type(t)}")

~/anaconda3/envs/fastai2/lib/python3.8/site-packages/fastai/torch_core.py in <genexpr>(.0)
    651     "Reorder all tensors in t using idxs"
    652     if isinstance(t, (Tensor,L)): return t[idxs]
--> 653     elif is_listy(t): return type(t)(nested_reorder(t_, idxs) for t_ in t)
    654     if t is None: return t
    655     raise TypeError(f"Expected tensor, tuple, list or L but got {type(t)}")

~/anaconda3/envs/fastai2/lib/python3.8/site-packages/fastai/torch_core.py in nested_reorder(t, idxs)
    650 def nested_reorder(t, idxs):
    651     "Reorder all tensors in t using idxs"
--> 652     if isinstance(t, (Tensor,L)): return t[idxs]
    653     elif is_listy(t): return type(t)(nested_reorder(t_, idxs) for t_ in t)
    654     if t is None: return t

IndexError: index 98 is out of bounds for dimension 0 with size 96

I’ve found that if I make the batch size 1 it always works, but inference is then slower. Is this a known issue? Am I doing something wrong?

Thanks


This seems to be a bug. Thanks for catching it and for suggesting the (slow) workaround of setting the batch size to one. I’ve also raised a GitHub issue for it here.
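For anyone else hitting this in the meantime, here is a minimal sketch of that batch-size-1 workaround, assuming learn is your trained Learner; the files variable and the path below are just illustrative:

from fastai.vision.all import *

# Illustrative: whatever items you want predictions for
files = get_image_files('path/to/test_images')

# Force a batch size of 1 so there is never a partial batch -- slow, but avoids the error
test_dl = learn.dls.test_dl(files, bs=1)
probs, _ = learn.get_preds(dl=test_dl)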


It’s a bug. I tried choosing the batch size so that the number of items in the dataloader is divisible by it, but I still get the exception (for any batch size > 1).
What I find strange is that the same code works without any issues for training and validation.

I noticed that this summary has the right incantation for getting predictions:

imgs, probs, classes, clas_idx = learn.get_preds(dl=learn.dls.test_dl(files), with_input=True, with_decoded=True)

where files is a list of files. The documentation for test_dl is accurate (“Create a test dataloader from test_items using validation transforms of dls”), but perhaps does not link to get_preds and the inference task as clearly as it could.
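Putting that together, the full inference flow looks roughly like this (the path and variable names are illustrative, and classes should come back as None for an unlabelled test set):

from fastai.vision.all import *

files = get_image_files('path/to/test_images')    # illustrative path
test_dl = learn.dls.test_dl(files)                # built with the validation transforms of dls

# with_input returns the transformed inputs, with_decoded the decoded predictions
imgs, probs, classes, clas_idx = learn.get_preds(dl=test_dl, with_input=True, with_decoded=True)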

That’s what I am doing – with_input and with_decoded don’t change anything. The problem is within get_preds():

dl = learn.dls.test_dl(files)
print(len(dl.get_idxs()))

19500

probs, _ = learn.get_preds(dl=dl, reorder=False)
print(len(probs))

19460

I found the solution!
dl = learn.dls.test_dl(files, drop_last=False)
It looks like drop_last defaults to True somewhere.
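Here is a sketch of the fix end to end, with a length check as a sanity test that the final partial batch is no longer dropped (files as in the earlier examples):

test_dl = learn.dls.test_dl(files, drop_last=False)   # keep the final partial batch
probs, _ = learn.get_preds(dl=test_dl)

# Every item should now get a prediction
assert len(probs) == len(files)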
