I’m working on semantic segmentation, and my dataset contains 704 images with 704 corresponding masks.
I’m loading the dataset using the following code:
```python
src = (SegmentationItemList
       .from_folder(path=path_img, extensions=['.png'], recurse=False)
       .filter_by_func(get_mask_imgs_filter)
       .split_by_rand_pct(valid_pct=0.1)
       .label_from_func(get_y_fn, classes=codes))
data = (src
        .transform(get_transforms(), size=512, tfm_y=True)
        .databunch(bs=4)
        .normalize(imagenet_stats))
```
I have two questions:
- Why does `len(data.train_ds)` = 634 but `len(data.train_dl)` = 158?
- Why does `data.one_item(data.train_ds.x[0])[1].shape` = `torch.Size([1])` but `data.one_batch()[1][0].shape` = `torch.Size([1, 512, 512])`?
Please help me understand the reason for these mismatches.
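For the first question, my current guess is that `len(train_dl)` counts batches rather than items, and that the last incomplete training batch gets dropped (an assumption about fastai's defaults). A plain-PyTorch sketch of that arithmetic, with a dummy dataset standing in for mine:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# 634 dummy samples standing in for the training set
ds = TensorDataset(torch.zeros(634, 1))

# A DataLoader's len() is the number of batches, not items.
# With bs=4 and drop_last=True (assumption: the training loader
# drops the last incomplete batch), 634 // 4 = 158 batches.
dl = DataLoader(ds, batch_size=4, drop_last=True)

print(len(ds))  # 634
print(len(dl))  # 158
```

If that assumption is right, the 634 vs. 158 numbers are consistent (634 // 4 = 158, with two samples dropped each epoch), but I'd like confirmation.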