Trying to figure out whether this is a bug or something I missed. I'm solving a collaborative filtering problem using `CollabDataBunch` and `collab_learner`.
When I run `get_preds` on the validation dataset, the number of predictions is consistent with the data being fed in:
data.valid_ds.x.codes.T.size, data.valid_ds.x.codes.T.size, len(preds), len(targets)
>>> (191, 191, 191, 191)
However, when I run `get_preds` on the training dataset, the last batch (which is smaller than my `batch_size`) doesn't get predictions. Here's what I do:
preds, targets = learn.get_preds(ds_type=DatasetType.Train)
data.train_ds.x.codes.T.size, data.train_ds.x.codes.T.size, len(preds), len(targets)
>>> (1725, 1725, 1664, 1664)
What's happening is that my `batch_size` is 64, but the last batch has 61 rows (1725 − 1664) and no predictions are generated for it. I tested with `batch_size=32` and got the analogous result (29 predictions are missing):
(1725, 1725, 1696, 1696)
Did anyone else encounter this problem? Is there a solution that I’m missing, or is it likely a bug?
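For what it's worth, the counts above are exactly what you'd see if the final partial batch is being dropped: the number of predictions is the largest multiple of `batch_size` that fits in the dataset. A minimal arithmetic sketch (plain Python, independent of fastai; the function name is just illustrative):

```python
def preds_with_drop_last(n_rows, batch_size):
    """Number of predictions returned when the final partial batch is dropped."""
    return (n_rows // batch_size) * batch_size

# Reproduces the counts from the post:
print(preds_with_drop_last(1725, 64))  # 1664 -> 61 rows missing
print(preds_with_drop_last(1725, 32))  # 1696 -> 29 rows missing
```

So both observations (61 missing at `batch_size=64`, 29 missing at `batch_size=32`) are consistent with a single cause rather than two separate bugs.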
If you constructed the learner with a `TextLMDataBunch` as the dataset, then the default is to leave out the last batch if it doesn't match the batch size. I believe you can pass the argument `drop_last=False` to the `TextLMDataBunch` constructor.
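For intuition, the effect of a `drop_last` flag (as on PyTorch's `DataLoader`) can be sketched in plain Python; the function here is illustrative, not fastai's implementation:

```python
def batches(items, batch_size, drop_last=True):
    """Yield consecutive batches; optionally skip a short final batch."""
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        if drop_last and len(batch) < batch_size:
            break  # the partial final batch is silently skipped
        yield batch

rows = list(range(1725))
print(sum(len(b) for b in batches(rows, 64, drop_last=True)))   # 1664
print(sum(len(b) for b in batches(rows, 64, drop_last=False)))  # 1725
```

Dropping the last batch makes sense during training (it keeps batch statistics uniform), which is why it's the default there, but it is surprising when you just want predictions for every row.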
I seem to be getting an even worse outcome. If I set `ds_type=DatasetType.Train` I miss the last batch as well, but I also get the data back out of order. @gene, do you get that problem too? Setting it to `DatasetType.Fix` resolves both problems, so I am using that now, but it's not intuitive in this context. (Like you, I got it from the forums.)
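On the ordering issue: with a shuffled training DataLoader, predictions come back in the sampler's visiting order, not dataset order. If you had the sampled indices you could invert the permutation yourself; a hypothetical sketch in plain Python (not fastai API):

```python
import random

data = [f"row{i}" for i in range(10)]
indices = list(range(len(data)))
random.shuffle(indices)                      # order a shuffled loader visits rows in
preds = [data[i].upper() for i in indices]   # stand-in "predictions", shuffled order

# Invert the permutation: put each prediction back at its dataset position
restored = [None] * len(preds)
for pos, i in enumerate(indices):
    restored[i] = preds[pos]

print(restored[0], restored[1])  # ROW0 ROW1
```

`DatasetType.Fix` sidesteps all of this because it iterates the training set without shuffling (and without dropping the last batch), so no manual reordering is needed.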
@bfarzin, I found a better solution (at least for my use case), with two options depending on your needs.
Option 1 is to pass the same dataset for training and test:
`learn = collab_learner(data, test=data, n_factors=50, pct_val=0.1)`, then call
`preds, _ = learn.get_preds(ds_type=DatasetType.Test)` after training. More details here.
Option 2 is to export the learner using
`learn.export()` and load it with the training dataset as the test dataset:
`learn = load_learner(path, test=data)`. Then you can again use
`preds, _ = learn.get_preds(ds_type=DatasetType.Test)`. More details here.
Important: this won't work with fastai versions below 1.0.40, so you might need to update. Hope it's useful.