Help using TfmdDL

joseadolfo · February 14, 2020, 9:44pm

TfmdDL has two parameters (among others): bs=64 and batch_size=None. Can anyone explain the difference between the two and how they are meant to be used? A subsequent show_batch() call crashes whenever batch_size is set to anything other than 1.

jeremy · February 14, 2020, 9:48pm

It’s from DataLoader:

So it’s just for compatibility. Use bs. (No idea why setting batch_size would cause a problem, however).

joseadolfo · February 14, 2020, 10:07pm

I have the following code that loads the COCO_TINY dataset:

Data acquisition

ds_items = get_image_files(ds_source/'train')
ds_split = RandomSplitter(seed=SEED_VL, valid_pct=0.2)(ds_items)
train, valid = (ds_items[i] for i in ds_split)
ds_bbox = lambda o: get_x_y[o.name][0]
ds_label = lambda o: get_x_y[o.name][1]

Datasets

def to_np(x): return np.array(x, dtype=np.float32)
dsets = Datasets(ds_items, [PILImage.create, [ds_bbox, to_np, TensorBBox.create], [ds_label, MultiCategorize(add_na=True)]], splits=ds_split, n_inp=1)

Transformations and dataloaders (WORKS!)

aft_itm = [BBoxLabeler(as_item=False), PointScaler(y_first=False), ToTensor()]
aft_btch = [IntToFloatTensor(), AffineCoordTfm(size=SZ)]
tdl_trn = TfmdDL(dsets, bs=1, num_workers=4, after_item=aft_itm, after_batch=aft_btch, device=default_device())

dldr = DataLoaders(tdl_trn)

If bs is set to 1 in TfmdDL, the code works fine: tdl_trn.one_batch() runs correctly and tdl_trn.show_batch(figsize=(5,5)) shows the image. However, with bs=2 (or any number greater than 1), tdl_trn.one_batch() crashes with the following error:

RuntimeError Traceback (most recent call last)
in ()
----> 1 x,y,z = tdl_trn.one_batch()
2 type(x), type(y), type(z), x.shape, y

11 frames
/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/collate.py in default_collate(batch)
53 storage = elem.storage()._new_shared(numel)
54 out = elem.new(storage)
---> 55 return torch.stack(batch, 0, out=out)
56 elif elem_type.__module__ == 'numpy' and elem_type.__name__ != 'str_' \
57 and elem_type.__name__ != 'string_':

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 1 and 2 in dimension 1 at /pytorch/aten/src/TH/generic/THTensor.cpp:689

Any help solving this issue would be appreciated.
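
A note on reading the error: torch.stack adds a new leading batch dimension, so "dimension 1" in the message most likely corresponds to the first dimension of each individual item. Image tensors have their channel count there, so sizes of 1 and 2 look more like the bounding-box or label tensors, i.e. an image with one box batched together with an image that has two. This is an inference from the message, not something confirmed in the thread. If that is the cause, boxes and labels need to be padded to a common length before collation. fastai has a bb_pad function that appears to be meant for this as a before_batch step, but its exact usage should be checked against the docs. The sketch below only illustrates the padding idea with plain Python lists. pad_boxes and the sample data are hypothetical, and label 0 is assumed to be the background class that add_na=True reserves.

```python
# Sketch only: plain lists stand in for tensors; pad_boxes is a hypothetical helper,
# not part of fastai or PyTorch.
def pad_boxes(samples, pad_label=0):
    """Pad each sample's boxes and labels to the longest sample in the batch.

    samples: list of (boxes, labels) pairs, where boxes is a list of
    [x1, y1, x2, y2] lists and labels is a list of ints of the same length.
    """
    max_len = max(len(boxes) for boxes, _ in samples)
    padded = []
    for boxes, labels in samples:
        n_missing = max_len - len(boxes)
        # Dummy zero-area boxes, each a fresh list to avoid aliasing.
        extra_boxes = [[0.0, 0.0, 0.0, 0.0] for _ in range(n_missing)]
        # Padding labels use the assumed background class.
        extra_labels = [pad_label] * n_missing
        padded.append((boxes + extra_boxes, labels + extra_labels))
    return padded


batch = [
    ([[0.1, 0.2, 0.5, 0.6]], [3]),                           # one object
    ([[0.0, 0.0, 0.4, 0.4], [0.3, 0.3, 0.9, 0.8]], [1, 2]),  # two objects
]
padded = pad_boxes(batch)

# After padding, every sample has the same number of boxes and can be stacked.
box_counts = [len(boxes) for boxes, _ in padded]
```

With every sample padded to the same number of boxes, stacking along a new batch dimension no longer sees mismatched sizes.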

jeremy · February 14, 2020, 11:43pm

That’s pretty hard to read. Can you please use markdown code formatting and try to make your post as clear as possible?

joseadolfo · February 15, 2020, 9:32pm

My apologies, Sir. I am pretty inept at formatting things. I have edited the code above and hope it is clearer.

muellerzr · February 15, 2020, 10:05pm

IIRC coco_tiny images are not all the same size. You should include a Resize method first in your item_tfms (after_item) to get it working.

joseadolfo · February 15, 2020, 10:08pm

That’s the reason I set AffineCoordTfm(size=SZ). At least I thought so.
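
One likely reason AffineCoordTfm(size=SZ) does not help here: it is passed as after_batch, and batch transforms run on the already collated batch. The traceback shows the failure inside default_collate, so the batch transform is never reached. Anything that equalizes sizes has to run per item (after_item) or on the list of items (before_batch). Below is a minimal sketch of that ordering in plain Python. run_pipeline, strict_collate and the lambdas are hypothetical stand-ins, not fastai code, and list length stands in for tensor size.

```python
# Sketch only: mimics the order in which a TfmdDL-style loader applies its stages.
def run_pipeline(items, after_item, before_batch, collate, after_batch):
    samples = [after_item(it) for it in items]  # per item; sizes may still differ
    samples = before_batch(samples)             # list of items; last chance to equalize
    batch = collate(samples)                    # stacking requires equal sizes
    return after_batch(batch)                   # operates on the whole batch


def strict_collate(samples):
    # Stand-in for stacking: refuses items of different length.
    sizes = {len(s) for s in samples}
    if len(sizes) > 1:
        raise RuntimeError(f"cannot stack items of sizes {sorted(sizes)}")
    return list(samples)


items = [[1], [1, 2]]  # two items of different length


def identity(x):
    return x


def resize_batch(batch):
    # Batch-level "resize": would fix the sizes, but runs too late.
    return [row[:1] for row in batch]


try:
    run_pipeline(items, identity, identity, strict_collate, resize_batch)
    failed = False
except RuntimeError:
    failed = True  # collation fails before resize_batch is ever called


def crop_item(item):
    # Item-level "resize": the role an item transform plays in after_item.
    return item[:1]


result = run_pipeline(items, crop_item, identity, strict_collate, identity)
```

The first call fails even though the batch-level step would have fixed the sizes, while the second succeeds because the sizes are equalized before collation.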