Learn.get_preds not giving predictions for all records

Seemant · May 14, 2019, 11:38am

When doing image classification, I have 381 training images and I want to get predictions for these 381 images. But I am getting predictions only for 376 images.

Is anyone else facing this issue?

Why am I getting predictions on a randomly selected subset of my training data?

msandroid · May 14, 2019, 12:14pm

Is your batch size larger than 5? If so, you have probably set up the dataloader in such a way that only full batches are used and the remaining images are skipped.

Seemant · May 14, 2019, 12:23pm

Yes, my batch_size is 8. How can I set up the dataloader so that no images are skipped?

msandroid · May 14, 2019, 12:37pm

There is no easy way to do it. If you look at the create function of the DataBunch in fastai/basic_data.py, you will see this line:

dls = [DataLoader(d, b, shuffle=s, drop_last=s, num_workers=num_workers, **dl_kwargs) for d,b,s in zip(datasets, (bs,val_bs,val_bs,val_bs), (True,False,False,False)) if d is not None]

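drop_last determines whether the last, incomplete batch is dropped. For the training set it is set to True, together with shuffle, so a different handful of leftover images is skipped on each pass, which is why the subset looks random. If you override this function with your own, you can change that behaviour. But be careful: depending on your loss function, the incomplete batch could interfere with your training. As you can see, the validation and test sets don't drop the incomplete batch, so perhaps the easiest option for you would be to use one of those.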
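
To make the numbers in this thread concrete, here is a small sketch of the arithmetic only. It does not call fastai or PyTorch, and the function name predictions_returned is made up for illustration. It assumes one prediction per image and that drop_last simply discards the final partial batch.

```python
def predictions_returned(n_items, batch_size, drop_last):
    """Count how many items a batched loader would yield in one pass."""
    # Split the dataset into full batches plus a leftover partial batch.
    full_batches, leftover = divmod(n_items, batch_size)
    if drop_last:
        # The partial batch is discarded, so only full batches count.
        return full_batches * batch_size
    # Otherwise every item is yielded, including the partial batch.
    return full_batches * batch_size + leftover


# 381 images with a batch size of 8: 47 full batches and 5 leftover images.
print(predictions_returned(381, 8, drop_last=True))   # 376
print(predictions_returned(381, 8, drop_last=False))  # 381
```

This matches the 376 predictions reported in the first post.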

Seemant · May 14, 2019, 12:40pm

Okay. Thanks a lot for the reply.

sgugger · May 14, 2019, 12:59pm

There is actually an easy way to do it:
data.train_dl = data.train_dl.new(drop_last=False)
Note that you shouldn’t train in this configuration as it can cause problems with batchnorm layers.
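
Since training in this configuration is discouraged, one option is to swap the loader only while collecting predictions and restore it afterwards. The helper below is a rough sketch, not part of fastai: the name keep_last_batch is hypothetical, and it only assumes that data.train_dl has the new(drop_last=False) method used in the line above.

```python
from contextlib import contextmanager


@contextmanager
def keep_last_batch(data):
    """Temporarily use a training loader that keeps the final partial batch.

    `data` is assumed to expose a `train_dl` attribute whose `new(...)`
    method returns a copy of the loader with the given settings overridden.
    """
    original = data.train_dl
    data.train_dl = original.new(drop_last=False)
    try:
        yield data
    finally:
        # Put the original loader back so later training drops the
        # incomplete batch again.
        data.train_dl = original


# Hypothetical usage, assuming `learn` is a fastai v1 Learner:
#
#     with keep_last_batch(learn.data):
#         preds, targets = learn.get_preds(ds_type=DatasetType.Train)
```

Outside the with block the original loader is back in place, so a later training run behaves as before.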

Seemant · May 14, 2019, 2:18pm

Hey, it seems to be working now with this code. Thanks for the reply.