Hi,
I’m a little bit confused by how the DeviceDataLoader works.
So I created a DataBunch from a pandas DataFrame of 500 rows, and I split it into 400 rows for training and 100 for validation:
from fastai.tabular import *  # brings in TabularList, DatasetType, etc.

data = (
    TabularList.from_df(df, path=admission, cat_names=cat_names,
                        cont_names=cont_names, procs=procs)
    .split_by_idx(valid_idx=range(400, 500))
    .label_from_df(cols=dep_var)
    .databunch()
)
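As a sanity check, the split sizes come out as expected (assuming fastai v1's train_ds/valid_ds attributes, which I believe hold the labelled datasets):

print(len(data.train_ds), len(data.valid_ds))   # should print: 400 100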
With the default batch size of 64, I found that there are 6 batches in the training dataloader:
dataloader = data.dl(DatasetType.Train)
for x, y in dataloader:
    cat, cont = x
    print(cat.shape, cont.shape, y.shape)
[Output:]
torch.Size([64, 1]) torch.Size([64, 6]) torch.Size([64])
torch.Size([64, 1]) torch.Size([64, 6]) torch.Size([64])
torch.Size([64, 1]) torch.Size([64, 6]) torch.Size([64])
torch.Size([64, 1]) torch.Size([64, 6]) torch.Size([64])
torch.Size([64, 1]) torch.Size([64, 6]) torch.Size([64])
torch.Size([64, 1]) torch.Size([64, 6]) torch.Size([64])
My question is: what happens to the 16 leftover rows of training data (400 = 6 × 64 + 16)? Are they ignored during training?
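My guess is that the training DataLoader is created with PyTorch's drop_last=True, which would explain the 6 full batches. Here is a minimal sketch with a plain PyTorch DataLoader that reproduces the same batch count (this is just my assumption, I haven't checked the fastai source):

import torch
from torch.utils.data import DataLoader, TensorDataset

# 400 fake training rows, batch size 64, dropping the last partial batch
train_ds = TensorDataset(torch.randn(400, 6))
train_dl = DataLoader(train_ds, batch_size=64, drop_last=True)
print(len(train_dl))   # 6 -- the remaining 16 rows never appear in a batch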
Interestingly, the validation dataset is treated differently: the 100 rows are split into two batches, one of size 64 and one of size 36.
dataloader = data.dl(DatasetType.Valid)
for x, y in dataloader:
    cat, cont = x
    print(cat.shape, cont.shape, y.shape)
[Output:]
torch.Size([64, 1]) torch.Size([64, 6]) torch.Size([64])
torch.Size([36, 1]) torch.Size([36, 6]) torch.Size([36])
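The same kind of sketch with drop_last=False reproduces the validation behavior, which makes me suspect the two dataloaders simply use different drop_last settings (again, an assumption on my part):

import torch
from torch.utils.data import DataLoader, TensorDataset

# 100 fake validation rows, batch size 64, keeping the last partial batch
valid_ds = TensorDataset(torch.randn(100, 6))
valid_dl = DataLoader(valid_ds, batch_size=64, drop_last=False)   # the PyTorch default
print([xb[0].shape[0] for xb in valid_dl])   # [64, 36] -- all 100 rows are used

Is that what is actually going on, and if so, why are the leftover training rows safe to drop?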