Inconsistent results from get_preds

I am facing some inconsistencies using the get_preds function.

_, y_train = learn.get_preds(ds_type=src.train_ds) # 7897 good
_, y_train = learn.get_preds(ds_type=DatasetType.Train) # 7818 bad

_, y_valid = learn.get_preds(ds_type=src.valid_ds) # 7897 bad
_, y_valid = learn.get_preds(ds_type=DatasetType.Valid) # 1974 good

The number after each # is the length of the returned tensor. I wonder why this is the case. Any help is appreciated. Thank you.

src.valid_ds and src.train_ds are not valid values for ds_type, so you shouldn’t use them.
Note that the training dataloader drops the last batch if it has fewer than batch-size items, which is why you get the wrong length. ds_type=DatasetType.Fix will give you the training set un-shuffled and with that last batch included.
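The length mismatch above is pure batching arithmetic. Here is a minimal, non-fastai sketch of what drop_last does; the batch size of 64 is just an assumed example (the actual value depends on how the DataBunch was built, and 7897 − 7818 = 79 items were dropped in the post above):

```python
def n_items_seen(n_items, bs, drop_last):
    """How many items a dataloader yields when the last partial
    batch is either dropped (training) or kept (Fix/Valid)."""
    full_batches, remainder = divmod(n_items, bs)
    if drop_last and remainder:
        return full_batches * bs  # partial last batch discarded
    return n_items

# 7897 training items with an assumed bs=64:
print(n_items_seen(7897, 64, drop_last=True))   # fewer than 7897
print(n_items_seen(7897, 64, drop_last=False))  # all 7897, as with DatasetType.Fix
```

So the shuffled training dataloader silently sees fewer items, while DatasetType.Fix (or the validation dataloader) sees all of them.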


Hi!

I’m facing a somewhat similar issue using get_preds.

I’m passing learn.get_preds() a test set of length 28000, but I’m getting out predictions of length 33600. Any ideas why that might be?

Code:

torch.tensor(test_df.values).float().shape

torch.Size([28000, 784])

preds, labels = learn.get_preds(torch.tensor(test_df.values).float())
preds.shape, labels.shape

(torch.Size([33600, 10]), torch.Size([33600]))

get_preds doesn’t directly accept data, but rather a DatasetType, such as DatasetType.Valid or DatasetType.Test, which tells it which dataset in the DataBunch to run on. If you need to add a test set to your DataBunch, there are a couple of different ways to do it.
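To illustrate the pitfall: the first positional parameter of get_preds is ds_type, so the tensor you pass just occupies that slot and is never used as input data; predictions still come from a dataset already inside the DataBunch (33600 items here). Below is a toy stand-in, not fastai source, that mimics this behavior; in fastai v1 the actual fix is to attach the test set when building the DataBunch (e.g. add_test in the data block API) and then call learn.get_preds(ds_type=DatasetType.Test):

```python
from enum import Enum

class DatasetType(Enum):
    Train = 1
    Valid = 2
    Test = 3

class ToyLearner:
    """Toy stand-in (NOT fastai source) for Learner.get_preds."""
    def __init__(self, sizes):
        # pretend item counts for the datasets in the DataBunch
        self.sizes = sizes

    def get_preds(self, ds_type=DatasetType.Valid):
        # whatever is passed positionally lands in ds_type;
        # a tensor is not a DatasetType, so it is effectively ignored
        if isinstance(ds_type, DatasetType) and ds_type in self.sizes:
            n = self.sizes[ds_type]
        else:
            n = self.sizes[DatasetType.Valid]  # silent fallback
        return [0.0] * n  # dummy predictions

learn = ToyLearner({DatasetType.Valid: 33600, DatasetType.Test: 28000})
print(len(learn.get_preds([0.0] * 28000)))     # tensor ignored: not 28000
print(len(learn.get_preds(DatasetType.Test)))  # 28000, as intended
```

The same silent fallback is why the original call returned 33600 predictions instead of erroring out.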
