Hi,
I’m a little bit confused by how the DeviceDataLoader works.
So I created a DataBunch from a pandas DataFrame of 500 rows, and I split it into 400 rows for training and 100 for validation:
from fastai.tabular import *  # brings in TabularList, DatasetType, etc.

data = (
    TabularList.from_df(df, path=admission, cat_names=cat_names,
                        cont_names=cont_names, procs=procs)
    .split_by_idx(valid_idx=range(400, 500))
    .label_from_df(cols=dep_var)
    .databunch()
)
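As a sanity check, the split sizes come out as expected (assuming fastai v1's train_ds/valid_ds attributes, which I believe hold the labelled datasets):

print(len(data.train_ds), len(data.valid_ds))   # should print: 400 100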
With the default batch size of 64, I found that there are 6 batches in the training dataloader:
dataloader = data.dl(DatasetType.Train)
for x, y in dataloader:
    cat, cont = x
    print(cat.shape, cont.shape, y.shape)
[Output:]
torch.Size([64, 1]) torch.Size([64, 6]) torch.Size([64])
torch.Size([64, 1]) torch.Size([64, 6]) torch.Size([64])
torch.Size([64, 1]) torch.Size([64, 6]) torch.Size([64])
torch.Size([64, 1]) torch.Size([64, 6]) torch.Size([64])
torch.Size([64, 1]) torch.Size([64, 6]) torch.Size([64])
torch.Size([64, 1]) torch.Size([64, 6]) torch.Size([64])
My question is: what happens to the 16 leftover rows of training data (400 = 6 × 64 + 16)? Are they ignored during training?
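My guess is that the training DataLoader is created with PyTorch's drop_last=True, which would explain the 6 full batches. Here is a minimal sketch with a plain PyTorch DataLoader that reproduces the same batch count (this is just my assumption, I haven't checked the fastai source):

import torch
from torch.utils.data import DataLoader, TensorDataset

# 400 fake training rows, batch size 64, dropping the last partial batch
train_ds = TensorDataset(torch.randn(400, 6))
train_dl = DataLoader(train_ds, batch_size=64, drop_last=True)
print(len(train_dl))   # 6 -- the remaining 16 rows never appear in a batch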
Interestingly, the validation dataset is treated differently: the 100 rows are split into two batches, one of size 64 and one of size 36.
dataloader = data.dl(DatasetType.Valid)
for x, y in dataloader:
    cat, cont = x
    print(cat.shape, cont.shape, y.shape)
[Output:]
torch.Size([64, 1]) torch.Size([64, 6]) torch.Size([64])
torch.Size([36, 1]) torch.Size([36, 6]) torch.Size([36])
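The same kind of sketch with drop_last=False reproduces the validation behavior, which makes me suspect the two dataloaders simply use different drop_last settings (again, an assumption on my part):

import torch
from torch.utils.data import DataLoader, TensorDataset

# 100 fake validation rows, batch size 64, keeping the last partial batch
valid_ds = TensorDataset(torch.randn(100, 6))
valid_dl = DataLoader(valid_ds, batch_size=64, drop_last=False)   # the PyTorch default
print([xb[0].shape[0] for xb in valid_dl])   # [64, 36] -- all 100 rows are used

Is that what is actually going on, and if so, why are the leftover training rows safe to drop?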