Missing Predictions

GiantSquid · July 7, 2019, 7:41pm

I’m working on the Kaggle TMDB challenge. I created a DataBunch like so:

all_data = (TabularList
            .from_df(x, cat_names=cat_names, cont_names=cont_names, procs=procs)
            .split_none()
            .label_from_df(cols=dep_var, label_cls=FloatList, log=True)
            .add_test(test)
            .databunch())

I was trying to train on all data so I used split_none(). The train dataset in my databunch has 3000 items, which is as expected. I then created and trained a model, which seemed to be working as expected. But when I try to get the predictions…

train_pred, _ = learn.get_preds(DatasetType.Train)

I get back something with length 2944. What could be happening to the other 56 items?

kushaj · July 8, 2019, 5:25pm

Maybe the last batch is getting dropped.
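
A quick way to check that guess, as a minimal sketch: assuming a batch size of 64 (the default mentioned later in this thread) and a loader that discards a trailing partial batch, the number of predictions returned is the largest multiple of the batch size that fits into the dataset. The helper name `items_returned` is made up for illustration and is not part of fastai.

```python
def items_returned(n_items: int, batch_size: int, drop_last: bool) -> int:
    """Number of items a batched loader yields over one pass."""
    if drop_last:
        # Only full batches are kept; a trailing partial batch is discarded.
        return (n_items // batch_size) * batch_size
    return n_items


n_items, batch_size = 3000, 64  # batch size of 64 is an assumption here
kept = items_returned(n_items, batch_size, drop_last=True)
print(kept)            # 2944
print(n_items - kept)  # 56
```

Under those assumptions, 46 full batches of 64 give exactly the 2944 predictions reported, and the 3000 - 2944 = 56 remaining items sit in the partial batch that never gets processed.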

Deepak_S · March 24, 2020, 8:16pm

I know this is old, but for the sake of others like me…

It happened to me as well. As @kushaj indicated, the last batch is being dropped.

My dataset had 15620 rows, and 15620 % 64 (the default batch size) = 4, so I got no predictions for the last, partial batch of 4 rows.

To get around it, I factored the dataset size and picked the largest divisor I could afford to compute with quickly, so that every batch is full and nothing gets dropped.
15620 = 2^2 x 5 x 11 x 71, so its divisors include 2, 4, 5, 10, 11, 20, 22, 44, 55, 71, 110, 142 and 220.

i.e. I set

learn.data.batch_size = 71

and it worked.
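
For anyone who wants to repeat this on a different dataset size, here is a rough sketch of the divisor search in plain Python. The function names and the cap of 100 are my own choices for illustration; pick whatever cap fits your memory and speed budget.

```python
from typing import List


def divisors(n: int) -> List[int]:
    """Return all positive divisors of n in ascending order."""
    small, large = [], []
    d = 1
    while d * d <= n:
        if n % d == 0:
            small.append(d)
            if d != n // d:
                large.append(n // d)
        d += 1
    return small + large[::-1]


def largest_divisor_at_most(n: int, cap: int) -> int:
    """Largest divisor of n that does not exceed cap (assumes cap >= 1)."""
    return max(d for d in divisors(n) if d <= cap)


print(divisors(15620))
print(largest_divisor_at_most(15620, 100))  # 71
```

With a batch size that divides the dataset size evenly there is no partial batch, so nothing is left to drop. The downside is that a dataset with a prime (or nearly prime) number of rows has no convenient divisor.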

kushaj · March 25, 2020, 1:16pm

By default in fastai, drop_last=True for the training set and drop_last=False for the validation set, so a trailing partial batch is dropped only when iterating over the training set.