Need an opinion on this:
With GPU-based batch transforms we can parallelize the transform step, so augmentation runs faster. But don't we get fewer distinct combinations of examples per epoch than with transforms applied at the item level? For example, with the RandomErasing augmentation, I see that we erase the exact same portion of every image in the batch. Doesn't that seem redundant? Won't it affect few-shot learning techniques?
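To make the concern concrete, here's a minimal NumPy sketch (not fastai code, just an emulation of the idea): when the random parameters are sampled once per batch, every image loses exactly the same pixels.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = np.ones((4, 1, 8, 8))  # 4 fake images, 1 channel, 8x8

# Batch-level RandomErasing: sample ONE 4x4 region for the whole batch
y, x = rng.integers(0, 4, size=2)  # top-left corner of the patch
batch[:, :, y:y+4, x:x+4] = 0      # every image loses the same pixels

# All 4 erase masks come out identical
masks = (batch[:, 0] == 0)
assert all((m == masks[0]).all() for m in masks)
```

So within one batch the erasure adds no diversity between examples; diversity only comes from batch to batch.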
I think this may be better suited for the fastai2 channels; alright if I move it there? (There are a few fastai2-specific questions.)
I'm not familiar enough with how the old version worked 100%, but if you tried RandomErasing on v1, would each particular image be different? (Also moved to v2!)
Me neither. But I suppose it's possible to put it in after_item (assuming you've resized your images to the same size), and it will then act on each image differently. The question is: how much does this affect performance? There is a multitude of augmentations in albumentations, which are certainly fuelling many competition-winning approaches, and all of them run on the CPU.
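For contrast with the batch-level case, here's the item-level version of the same NumPy sketch, the way an after_item-style transform would behave: parameters are sampled fresh for each image.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = np.ones((4, 1, 8, 8))  # 4 fake images, 1 channel, 8x8

# Item-level RandomErasing: sample a fresh 4x4 region for EACH image
for img in batch:
    y, x = rng.integers(0, 4, size=2)
    img[:, y:y+4, x:x+4] = 0

# Each image still loses exactly 16 pixels, but the erased
# regions can now differ from image to image
masks = (batch[:, 0] == 0)
```

The trade-off is that the per-image loop runs on the CPU, which is exactly the speed-vs-diversity question being debated here.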
It makes sense to do preprocessing steps like normalization or whitening on the GPU, and any other redundant step could go there as well, but I'm not sure about augmentations.
Again, this isn't specific to fastai2's design; I'm looking for empirical results that support one side over the other.
If you are looking for a batch-wise GPU option, I would recommend kornia: https://kornia.readthedocs.io/en/latest/augmentation.html
This looks great! If I understood the very first example correctly, we generate parameters for the whole batch: for instance, with a RandomRotation transform we generate batch_size random degrees and then apply them, so we're still performing the transform batch-wise on the GPU but with different parameters for each example.
This behavior can be controlled by the same_on_batch argument, which uses the same transform parameters for the whole batch.
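Roughly, this is what the same_on_batch toggle changes, sketched in NumPy rather than kornia's actual implementation (the function name and signature here are illustrative, not kornia's API):

```python
import numpy as np

def sample_angles(batch_size, degrees=30.0, same_on_batch=False, seed=0):
    """Sample rotation angles the way a batch augmentation might:
    either one angle shared by the whole batch, or one per example."""
    rng = np.random.default_rng(seed)
    n = 1 if same_on_batch else batch_size
    angles = rng.uniform(-degrees, degrees, size=n)
    # broadcast the single shared angle out to batch_size entries
    return np.broadcast_to(angles, (batch_size,)) if same_on_batch else angles

shared = sample_angles(8, same_on_batch=True)       # one parameter, repeated
per_example = sample_angles(8, same_on_batch=False) # a parameter per image
assert np.all(shared == shared[0])
```

With same_on_batch=False you keep the GPU batch-level speed while recovering the per-example diversity that item-level transforms give.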