Batch transforms (GPU) vs Item transforms (CPU)

Need an opinion on this:

With GPU-based batch transforms we can parallelize the transform step, which makes it faster. But aren’t we getting fewer distinct combinations of augmented examples per epoch than with transforms applied at the item level? For example, with the RandomErasing augmentation, I see that the exact same region is erased across the whole batch. Doesn’t that seem redundant? Won’t it affect few-shot learning techniques?


I think this may be better suited for the fastai2 channels, alright if I move it there? (As there’s a few fastai2 specific questions :slight_smile: )


I’m not familiar enough with exactly how the old version worked, but if you tried RandomErasing on v1, would each image be erased differently? (Also moved to v2!)

Me neither. But I suppose it’s possible to put RandomErasing in item_tfms (specifically after_item, assuming you’ve already resized your images to the same size), and then it’ll work on each image differently. The question is how much that affects performance. albumentations offers a multitude of augmentations, which are certainly fuelling many competition-winning approaches, and all of them run on the CPU.
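To illustrate what "each image differently" means at the item level, here is a minimal NumPy sketch. It is not fastai2's RandomErasing or its item_tfms machinery; `random_erase_item` and its `max_frac` parameter are hypothetical names made up for this example, and it assumes the images are already resized to the same (C, H, W) shape.

```python
import numpy as np

def random_erase_item(img, rng, max_frac=0.5):
    """Hypothetical item-level erase: zero one random rectangle in a (C, H, W) image."""
    _, h, w = img.shape
    # Draw this image's own rectangle size and position.
    eh = rng.integers(1, int(h * max_frac) + 1)
    ew = rng.integers(1, int(w * max_frac) + 1)
    top = rng.integers(0, h - eh + 1)
    left = rng.integers(0, w - ew + 1)
    out = img.copy()  # leave the input image untouched
    out[:, top:top + eh, left:left + ew] = 0.0
    return out

rng = np.random.default_rng(0)
images = [np.ones((3, 32, 32), dtype=np.float32) for _ in range(8)]

# One call per image, as a CPU item transform would do, then collate.
erased = [random_erase_item(im, rng) for im in images]
item_batch = np.stack(erased)
```

Every image gets its own parameters, at the cost of one Python-level call per item before the batch is collated.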

It makes sense to do preprocessing steps like normalization or whitening on the GPU, and any redundant step could go there as well, but I’m not sure about augmentations.

Again, this isn’t specific to fastai2’s design, and I’m looking for some empirical results that support one side over the other.


If you’re looking for batch-wise transforms on the GPU, I would recommend kornia.


This looks great :star_struck: If I understood the very first example correctly, the parameters are generated per batch: for a RandomRotation transform, for instance, it generates as many random degrees as the batch size and then applies them. So we’re still performing the transform batch-wise on the GPU, but with different parameters for each example.

This behavior can be controlled with the same_on_batch argument, which uses the same transform parameters for the whole batch.
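A rough sketch of that idea, applied to erasing rather than rotation: draw one set of parameters per example, then apply them to the whole batch in a single vectorized operation. This is not kornia's implementation. It is written in NumPy so it runs anywhere, on the assumption that the same broadcasting pattern carries over to tensors on a GPU. `random_erase_batch` and `max_frac` are hypothetical names, and the `same_on_batch` flag here only mimics the behavior described above.

```python
import numpy as np

def random_erase_batch(batch, rng, same_on_batch=False, max_frac=0.5):
    """Hypothetical batch-level erase on an (N, C, H, W) array, with no per-item loop."""
    n, _, h, w = batch.shape
    k = 1 if same_on_batch else n  # how many parameter sets to draw
    eh = rng.integers(1, int(h * max_frac) + 1, size=k)  # rectangle heights
    ew = rng.integers(1, int(w * max_frac) + 1, size=k)  # rectangle widths
    top = rng.integers(0, h - eh + 1)   # one offset per parameter set
    left = rng.integers(0, w - ew + 1)

    rows = np.arange(h)[None, :, None]  # shape (1, H, 1)
    cols = np.arange(w)[None, None, :]  # shape (1, 1, W)
    t, l = top[:, None, None], left[:, None, None]
    hh, ww = eh[:, None, None], ew[:, None, None]
    # mask[i] marks the rectangle for parameter set i, shape (k, H, W)
    mask = (rows >= t) & (rows < t + hh) & (cols >= l) & (cols < l + ww)

    out = batch.copy()
    # Broadcast over channels, and over the batch when k == 1.
    out[np.broadcast_to(mask[:, None], batch.shape)] = 0
    return out

rng = np.random.default_rng(0)
x = np.ones((8, 3, 32, 32), dtype=np.float32)
shared = random_erase_batch(x, rng, same_on_batch=True)       # one rectangle for all
per_sample = random_erase_batch(x, rng, same_on_batch=False)  # one rectangle each
```

With `same_on_batch=True` every image loses the same region, which is the redundancy raised at the top of the thread. With the default, the work is still done batch-wise but each example gets its own rectangle.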
