Custom transformations

I want to add a custom augmentation, and decided to start with a simple experiment by adding

def st(x): return x.flip(2)

as a function to vision/ and ‘st’ to the __all__ list in the same file, but my new transform doesn’t show up after importing.

What else should I do to make this transform available?

Are you doing this in Jupyter or in a Python file? If Jupyter, have you restarted the kernel?


I had

%reload_ext autoreload
%autoreload 2

and thought I didn’t need to restart. Turns out I was wrong. Thanks!

Yeah, autoreload is quite buggy with fastai, I don’t know why. Always restart your notebook/reload in case of doubt.
Adding a new transform is as simple as what you coded, so it should work perfectly.
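
To see what the st function above actually does to an image, here is a minimal sketch. It uses NumPy as a stand-in for the PyTorch tensor (an assumption, so that it runs without fastai installed) and assumes the image is laid out as (channels, height, width), so axis 2 is the width and flipping it mirrors the image left to right.

```python
import numpy as np

def st(x):
    # NumPy spelling of the thread's `x.flip(2)`: reverse axis 2 (the width).
    return np.flip(x, axis=2)

# A tiny 1-channel, 2x3 "image" laid out as (channels, height, width).
img = np.arange(6).reshape(1, 2, 3)
flipped = st(img)
print(flipped[0])
```

Each row comes back reversed, and applying st twice returns the original image.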

@sgugger I tried to use mixup for multi-label classification, but I get tensor mismatch at

   if self.stack_y:
       new_target = torch.cat([last_target[:,None].float(), y1[:,None].float(), lambd[:,None].float()], 1)

in class MixUpCallback, since last_target and y1 have shape (batch size, number of classes), while lambd has shape (batch size).

Plus a few lines earlier

   if self.stack_x:
       new_input = [last_input, last_input[shuffle], lambd]

Shouldn’t new_input be a tensor and not a list?
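
The shape mismatch in the stacked-target line can be reproduced without fastai. The sketch below uses NumPy arrays as stand-ins for the tensors (an assumption; the indexing with [:, None] behaves the same way in PyTorch) and made-up sizes of 4 samples and 3 classes.

```python
import numpy as np

bs, n_classes = 4, 3  # illustrative sizes only
last_target = np.zeros((bs, n_classes), dtype=np.float32)  # multi-label targets
y1 = np.zeros((bs, n_classes), dtype=np.float32)           # shuffled targets
lambd = np.random.rand(bs).astype(np.float32)              # one weight per sample

parts = [last_target[:, None], y1[:, None], lambd[:, None]]
shapes = [p.shape for p in parts]
print(shapes)

try:
    np.concatenate(parts, axis=1)
    stacked_ok = True
except ValueError:
    # The first two pieces are 3-D and the last one is 2-D,
    # so they cannot be concatenated along axis 1.
    stacked_ok = False
print(stacked_ok)
```

With 1-D targets of shape (batch size), all three pieces would be (batch size, 1) and the concatenation would go through, which is why the problem only shows up with 2-D targets.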


For multi-label classification, targets are one-hot encoded, so you should use the argument stack_y=False.

I still had a tensor mismatch even with stack_y=False, but when I added

lambd = lambd.unsqueeze(1)

right before

new_target = last_target * lambd + y1 * (1-lambd)

in, it would run normally.

Ah, yes. Your y would have two dimensions then, so it’s a bug. Thanks for the fix, feel free to put it in a PR, otherwise I’ll add it tonight or tomorrow.

In the end I modified it a bit to

if len(last_target.shape) == 2:
    lambd = lambd.unsqueeze(1)

so it doesn’t have an effect otherwise.
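
A minimal sketch of that guarded fix, again with NumPy standing in for PyTorch (an assumption; lambd[:, None] here plays the role of lambd.unsqueeze(1)). The function name mix_targets is hypothetical and only mirrors the two lines discussed above, not the full callback.

```python
import numpy as np

def mix_targets(last_target, y1, lambd):
    # Same guard as in the thread: only add an axis for 2-D targets.
    if len(last_target.shape) == 2:
        lambd = lambd[:, None]  # NumPy spelling of lambd.unsqueeze(1)
    return last_target * lambd + y1 * (1 - lambd)

lambd = np.array([0.75, 0.5])

# Multi-label case: targets are (batch size, number of classes).
a = np.array([[1., 0., 1.], [0., 1., 0.]])
b = np.array([[0., 1., 1.], [1., 1., 0.]])
mixed_2d = mix_targets(a, b, lambd)

# 1-D targets take the unchanged path, so the guard has no effect there.
mixed_1d = mix_targets(np.array([2., 4.]), np.array([6., 8.]), lambd)
print(mixed_2d, mixed_1d)
```

Without the extra axis, a (2, 3) target multiplied by a (2,) weight vector cannot broadcast, which is the mismatch reported above.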

What about simple multi-class (1 class per image)? I just use .mixup() then, right?

So I wanted to implement AutoAugment transforms and am looking for some advice.
AA works with PIL images using 14 basic functions implemented via Image.transform(Image.AFFINE), PIL.ImageEnhance and PIL.ImageOps. A few of these functions already exist among the fastai transforms, but most are not yet in the library. So the question is: does it make sense to just wrap the original AA with transforms from tensor to PIL image and back and use it that way, especially since it can be used as the only augmentation? Or is splitting AA into its basic functions and reassembling them into a whole transform the only reasonable way?

The best would be to add the functions that are missing in fastai v1: if you look at the source code of vision.transform, it’s really easy to code new transforms (though maybe a few of those functions are difficult).

Otherwise, you should just wrap original AA with transforms tensor to PIL image and back and just use it that way as you said.

I added the senet family of models locally and made it work the same way as tvm.resnets by

  • adding def _se_resnet_split, _se_resnet_meta and expanding model_meta in vision/

  • using from v0.7 with modified models’ functions and putting it into the vision/models

Should I PR it? Can also add that slight fix to callbacks/

The slight fix for the mixup model is more than welcome in a direct PR to fastai.
For the senet file, let me check if the 0.7 version can be refactored with our new tools first, and once I’ve added it to the library, your PR with the metadata will be more than welcome too (note that my refactoring might potentially change the indexes you have).

senet models are very similar to resnets so changes to are rather simple:

For every model-defining function, use

(pretrained=False, num_classes=1000)

instead of

(num_classes=1000, pretrained='imagenet')

to adjust to learners calling models with a “True” argument, and

settings = pretrained_settings['se_resnext50_32x4d']['imagenet']

instead of

settings = pretrained_settings['se_resnext101_32x4d'][pretrained]

to account for the change of pretrained to False instead of 'imagenet'.
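
Why the argument order matters can be shown in plain Python. In this sketch old_style and new_style are hypothetical stand-ins for a model-defining function before and after the change, and the call with a single positional True is an assumption based on the description above of how the learner invokes it.

```python
def old_style(num_classes=1000, pretrained='imagenet'):
    # Original ordering: the first positional argument is num_classes.
    return {'num_classes': num_classes, 'pretrained': pretrained}

def new_style(pretrained=False, num_classes=1000):
    # Reordered so the first positional argument is the pretrained flag.
    return {'num_classes': num_classes, 'pretrained': pretrained}

# A caller that passes the flag positionally.
old_result = old_style(True)
new_result = new_style(True)
print(old_result)
print(new_result)
```

With the old ordering the True lands in num_classes, so the flag never reaches pretrained. Once pretrained is a boolean it can no longer serve as a dictionary key, hence the hard-coded 'imagenet' lookup.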

Hi, can you share your code for autoaugment in Thx

I did a bit of experimenting (mostly fruitless), and for now settled on

policy = ImageNetPolicy()
def _autoaugment(x):
    # tensor -> PIL image
    pil_img = PIL.Image.fromarray(image2np(x*255).astype('uint8'))
    # apply the AutoAugment policy
    x = policy(pil_img)
    # PIL image -> tensor
    x = pil2tensor(x,np.float32)
    return x

autoaugment = TfmPixel(_autoaugment)

It does the trick. In fact, I played around with training on Imagenet and started the training with the standard get_transforms(). Then I changed the transforms to

[RandTransform(tfm=TfmCrop (crop_pad), kwargs={'row_pct': (0, 1), 'col_pct': (0, 1), 'padding_mode': 'reflection'}, p=1.0, resolved={}, do_run=True, is_random=True, use_on_y=False),
RandTransform(tfm=TfmPixel (flip_lr), kwargs={}, p=0.5, resolved={}, do_run=True, is_random=True, use_on_y=False)]

(basically RandomCrop + AutoAugment + RandomHorizontalFlip)
and got about 5% speed up and a third less GPU memory taken!

Edit: ok, reduction in GPU memory intake was extremely suspicious, and it turned out to be a simple case of not noticing a change in the architecture. :grimacing:


Hi. I made a small change to your code and now the autoaugment seems to produce the expected results:

def _autoaugment(x):
    npim = image2np(x)*255 # convert to numpy array in range 0-255
    npim = npim.astype(np.uint8)
    pil_img = PIL.Image.fromarray(npim)
    transformed = policy(pil_img)
    return pil2tensor(transformed, dtype=np.float32)/255

I think we were supposed to multiply the array after using image2np.

Hope it helps.
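
The scaling in that round trip can be checked in isolation. The sketch below uses only NumPy; to_uint8_hwc and to_float_chw are hypothetical stand-ins for the image2np and pil2tensor steps (the PIL policy call in between is left out), and it assumes the input image is a float array in [0, 1] laid out as (channels, height, width).

```python
import numpy as np

def to_uint8_hwc(x):
    # (C, H, W) float in [0, 1] -> (H, W, C) uint8 in [0, 255]
    return (np.transpose(x, (1, 2, 0)) * 255).astype(np.uint8)

def to_float_chw(arr):
    # (H, W, C) uint8 -> (C, H, W) float32 scaled back into [0, 1]
    return np.transpose(arr, (2, 0, 1)).astype(np.float32) / 255

x = np.linspace(0, 1, 60).reshape(3, 4, 5).astype(np.float32)

# With the final division the image returns to its original range.
restored = to_float_chw(to_uint8_hwc(x))

# Without it, the values stay in 0-255.
unscaled = np.transpose(to_uint8_hwc(x), (2, 0, 1)).astype(np.float32)
print(restored.max(), unscaled.max())
```

The restored array differs from the input only by the uint8 quantisation, while the unscaled one ends up 255 times brighter than the rest of the pipeline expects.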

I still get the error for stack_y.
Maybe we need a similar fix for stack_y=True? My target shape is (bs, 1, h, w).

if self.stack_y:
    new_target = torch.cat([last_target[:,None].float(), y1[:,None].float(), lambd[:,None].float()], 1)
else:
    if len(last_target.shape) == 2:
        lambd = lambd.unsqueeze(1).float()
    new_target = last_target.float() * lambd + y1.float() * (1-lambd)
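
For targets with more than two dimensions, such as the (bs, 1, h, w) shape mentioned here, one possible approach is to reshape lambd so it broadcasts against a target of any rank. This is only a sketch of that idea, not the fix used in fastai: it uses NumPy as a stand-in for PyTorch, the function name mix_any_rank is hypothetical, and it covers only the non-stacked blending path.

```python
import numpy as np

def mix_any_rank(last_target, y1, lambd):
    # Give lambd one singleton axis per extra target dimension,
    # e.g. (bs,) -> (bs, 1, 1, 1) for targets shaped (bs, 1, h, w).
    lambd = lambd.reshape(-1, *([1] * (last_target.ndim - 1)))
    return last_target * lambd + y1 * (1 - lambd)

bs, h, w = 2, 3, 3  # illustrative sizes only
masks_a = np.ones((bs, 1, h, w))
masks_b = np.zeros((bs, 1, h, w))
mixed = mix_any_rank(masks_a, masks_b, np.array([0.25, 0.5]))
print(mixed.shape)
```

Each sample's mask is blended with its own weight, and 1-D or 2-D targets go through the same code without a special case.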