Proposal for v2


I’ve been playing with the transforms API for some time. It is hard to bend it to some use cases, and it has some glitches that I think could be avoided. In particular, it is hard to augment multiple objects at once, and very hard to undo the transformations for Test Time Augmentation if your labels were transformed. (I mention a few use cases at the end of the post.)

I would like to suggest an improvement to the current API: allow a custom function that applies a set of transformations to the output of a Dataset.

Such a function could look as follows:

def mask_n_coord_apply(t, x, labels):
    bbs,m = labels
    return t(x, TfmType.PIXEL, is_y=False), t(bbs, TfmType.COORD, is_y=True), t(m, TfmType.MASK, is_y=True)

The default function that implements the current behavior would look as follows:

def default_apply_transform(self, t, x, y):
    if y is None: return t(x, TfmType.PIXEL, is_y=False)
    return t(x, TfmType.PIXEL, is_y=False), t(y, self.tfm_y, is_y=True)

The ‘t’ passed to this function applies a set of transformations in a deterministic way, i.e. if the set contains RandomCrop, the cropping box stays fixed for the duration of the apply function.
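A minimal sketch of what "deterministic" could mean here. `RandomShift`, its `determ()` method and the list-based data are hypothetical names made up for illustration, not the actual fastai implementation: the random parameter is drawn once, and every later call reuses it.

```python
import random

class RandomShift:
    """Hypothetical random transform: circularly shifts a sequence."""
    def __init__(self, max_shift, rng=None):
        self.max_shift = max_shift
        self.rng = rng or random.Random()

    def determ(self):
        # Draw the random parameter once; the returned function reuses it.
        shift = self.rng.randint(-self.max_shift, self.max_shift)
        def apply(seq):
            n = len(seq)
            # Circular shift, so no element is lost in this toy example.
            return [seq[(i - shift) % n] for i in range(n)]
        return apply

def apply_transform(t, x, y):
    # Both calls see the same frozen shift, so x and y stay aligned.
    return t(x), t(y)

tfm = RandomShift(max_shift=3, rng=random.Random(0))
t = tfm.determ()
x = [0, 1, 2, 3, 4, 5]
y = ["a", "b", "c", "d", "e", "f"]
new_x, new_y = apply_transform(t, x, y)
```

Calling `tfm.determ()` again would draw a new shift, which is the behavior one would want once per sample.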

I’ve made a proposal that reuses the current “set_state” to simulate this behavior, and I think I see how we could rewrite the current transformations to get simpler code and to implement the undo transformation that TTA needs.
I’ve presented it as a PR with a working example and a proposed future API for writing transforms.

Jeremy would like to have the conversation take place in the forum so if you are interested in commenting/participating then use this conversation instead of the PR.

Use cases & Glitches:


Imagine you want to train yolo / retinanet, and you are given a mask. To avoid issues with the bounding box being deformed after rotation, it is best to augment the mask and the image first and then generate the bounding box.
In the current API one needs two Datasets to get this working.

Here is how this can be done in the proposed solution:

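def mask_to_coord_apply(t, im, mask):
    # Augment the image and the mask with the same deterministic transform.
    im = t(im, TfmType.PIXEL, is_y=False)
    mask = t(mask, TfmType.PIXEL, is_y=False)
    # Generate the boxes from the already augmented mask.
    bb = generate_bb_out_of_multicolor_mask(mask)
    return im, (mask, bb)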
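The helper `generate_bb_out_of_multicolor_mask` is not shown in the post. One possible sketch, assuming the mask is a 2-D grid of integer instance ids with 0 as background, and that boxes are `(top, left, bottom, right)` tuples (both are assumptions, not the author's code):

```python
def generate_bb_out_of_multicolor_mask(mask, background=0):
    """Return {instance_id: (top, left, bottom, right)} for a 2-D integer mask."""
    boxes = {}
    for r, row in enumerate(mask):
        for c, label in enumerate(row):
            if label == background:
                continue
            if label not in boxes:
                # First pixel of this instance: the box is a single cell.
                boxes[label] = (r, c, r, c)
            else:
                # Grow the box so it also covers the current pixel.
                top, left, bottom, right = boxes[label]
                boxes[label] = (min(top, r), min(left, c),
                                max(bottom, r), max(right, c))
    return boxes

mask = [
    [0, 1, 1, 0],
    [0, 1, 0, 2],
    [0, 0, 0, 2],
]
boxes = generate_bb_out_of_multicolor_mask(mask)
```

Because the boxes are computed after augmentation, they stay tight around each object even when the mask was rotated.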


Imagine you want to implement a variation of Unet that predicts masks and borders (for easier cutting). It is almost impossible to have the mask and the border transformed together with the image. The only way to do this is to write another Dataset and generate the border after all the transformations.

Here is how this can be done in the proposed solution:

def mask_n_border_apply(t, im, labels):
    mask, borders = labels
    im = t(im, TfmType.PIXEL, is_y=False)
    mask = t(mask, TfmType.PIXEL, is_y=False)
    borders = t(borders, TfmType.PIXEL, is_y=False)
    return im, (mask, borders)


Imagine you want to predict masks, bounding boxes and key points (joints, limbs etc.). You have your images annotated, but I don’t see how you can randomly augment them in this scenario. Maybe the key points could be set as special pixels on a mask, but then they can be lost in the process.

Here is how this can be done in the proposed solution:

def mask_bb_kps_apply(t, im, labels):
    m, bbs, kps = labels
    im = t(im, TfmType.PIXEL, is_y=False)
    m = t(m, TfmType.PIXEL, is_y=False)
    bbs = t(bbs, TfmType.COORD, is_y=False)
    # Transform the key points themselves, not the bounding boxes again.
    kps = t(kps, TfmType.COORD, is_y=False)
    return im, (m, bbs, kps)


transform_on_side etc. work only with classification tasks. For regression, we need to define tfm_y in multiple places.

If we pass TfmType as a parameter to t, we won’t need tfm_y at all, which removes the source of these issues.


Test Time Augmentation works wonders, but with the existing API it is very hard to implement for regression models, especially if we use any random transformations.
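To illustrate why an undo step matters for TTA on regression targets, here is a small sketch using a horizontal flip and (x, y) point predictions. `HFlip`, `undo_coords` and `fake_model` are hypothetical names invented for this example; the proposal does not define this interface.

```python
class HFlip:
    """Hypothetical invertible transform for images of a known width."""
    def __init__(self, width):
        self.width = width

    def image(self, img):
        # img is a list of rows; flip each row left-to-right.
        return [row[::-1] for row in img]

    def undo_coords(self, pts):
        # Map predicted (x, y) points back into the original image frame.
        return [(self.width - 1 - x, y) for x, y in pts]

def fake_model(img):
    # Stand-in for a regression model: returns (x, y) of the brightest pixel.
    best = max((v, x, y) for y, row in enumerate(img) for x, v in enumerate(row))
    return [(best[1], best[2])]

img = [[0, 0, 9, 0],
       [0, 0, 0, 0]]
flip = HFlip(width=4)
plain = fake_model(img)
flipped = flip.undo_coords(fake_model(flip.image(img)))
# Average predictions only after they are back in the same coordinate frame.
tta = [((a[0] + b[0]) / 2, (a[1] + b[1]) / 2) for a, b in zip(plain, flipped)]
```

Without `undo_coords`, the flipped prediction would point at the mirrored location and averaging would pull the result away from the true point. With a random transform, the undo step additionally needs access to the parameters that were drawn.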


@groverpr @kcturgutlu @yinterian this is likely to impact you - would love to hear your thoughts (and anyone else’s!)

I see the problems with the current approach. I have little experience designing APIs, but this looks good.

@groverpr @kcturgutlu @yinterian, can we get to some consensus here? I know that this probably isn’t the most exciting thing to discuss, but as far as I understand @jeremy, we need your opinion to go forward with the proposal.

I’ve used the new API to implement UNet with weights for the Data Science Bowl 2018. The API allowed me to return the mask and the weights separately and have them augmented correctly, without having to represent them as different color channels, which caused some issues during scaling.

The code looked as follows:

def apply_transforms(t, x, y):
    x = t(x, TfmType.PIXEL, is_y=False)
    m,w = y
    m = t(m, TfmType.MASK, is_y=False)
    w = t(w, TfmType.MASK, is_y=False)
    return x, (m,w)

The only issue I’ve observed is that learn.predict_with_targs() breaks, as it assumes that y is a tensor and not a tuple, but that can be easily corrected.

So if you have some time let us know what you think, I would love to move forward with this suggestion.

Can you share your notebook or examples that we can try out with our own data, or even with DSBowl? How do you define the dataset now, is it different? I think this will be a really great change, but it would be nice to have line-by-line code, not just the source, where we define our augmentations, create the data and plot it with and without augmentation.


@piotr.czapla Sorry, I wanted to comment on it a few days back when I tried to look at the code, but I didn’t, because I wanted to spend some more time understanding it better before commenting (time I couldn’t find given other stuff). Based on your comments and on a first look, the suggested changes look very useful and the API looks good too.

Honestly, I have not gone into fastai’s code before, so it was taking me even more time to understand it. So yeah, a notebook with examples of each problem and the proposed solution would be great. (if you have time to do that :slight_smile: )

Yes I think that’s why I’m having trouble understanding this too - without a simple walk-thru of what it does and how it’s different and what problem it solves I’m having trouble wrapping my head around it.

@jeremy @groverpr @kcturgutlu I was away this week, but I will clean up my notebooks tomorrow (European time) and share them with you. This exercise could actually be super useful for my own learning, as I had a really hard time getting the newly implemented use_clr_beta to perform comparably to Adam, so maybe you can spot what I was doing wrong.

I am looking forward to it, especially transformations on different tasks!

@kcturgutlu, @groverpr, @jeremy It took me a bit longer, as I had created custom datasets for the Data Science Bowl competition and had to rewrite them to make them more generic, but hopefully I now have something that shows how one could use the new API for this competition.

Here is the notebook, I’ve extracted all relevant code there for your convenience.


Thank you, I am on it now!

As per my understanding, below is what it is doing. Correct me wherever I’m wrong:

  1. You define transformations as you normally would.
  2. Then you use PadToSz(Transform) that is just padding, right? Basically if the image is smaller than sz, it is padding it to sz size.
  3. You transform the mask and the weights. I didn’t completely understand the calc_weight function, but the idea is to augment them separately, right?

Had to dig into your dsb2018 repo to understand :smiley:

Generally, this idea of allowing a custom function to apply transformations sounds good. The API looks good too.


I haven’t put the transformations into the notebook as they aren’t that relevant for the API, and I underestimated how deep you were going to dig into the code. I’m flattered that you took the effort, thank you.

  • re. calc_weight: it is from the UNet paper. They assign a weight to each pixel, putting 10x more loss on the background pixels between objects that are close together, in the hope that the network learns to output separated masks.
  • re. PadToSz: I didn’t want to scale the images, which come in variable sizes and shapes in this competition. So I implemented padding (for small images) and used cropping to get a constant shape at the end. I used WeightedRandomSampler to pick large images more frequently for a batch, but this is not yet shown in the notebook as I haven’t reimplemented it yet.
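For reference, the per-pixel weight described in the U-Net paper is w(x) = w_c(x) + w0 * exp(-(d1(x) + d2(x))^2 / (2 * sigma^2)), where d1 and d2 are the distances to the nearest and second-nearest object border, with w0 = 10 and sigma of about 5 pixels. The sketch below only evaluates that formula for given distances; the function and parameter names are made up here, and it is not the calc_weight from the notebook.

```python
import math

def unet_pixel_weight(d1, d2, class_weight=1.0, w0=10.0, sigma=5.0):
    """Weight of one pixel following the U-Net paper's formula.

    d1, d2: distances to the nearest and second-nearest object border.
    class_weight: the class-balancing term w_c (assumed to be 1.0 here).
    """
    return class_weight + w0 * math.exp(-((d1 + d2) ** 2) / (2 * sigma ** 2))

near = unet_pixel_weight(1, 1)    # thin gap between two touching objects
far = unet_pixel_weight(20, 25)   # background far away from object pairs
```

Pixels in narrow gaps get a weight close to `class_weight + w0`, while distant background pixels fall back to roughly `class_weight`.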

@jeremy I’ve split the PR so that travis-ci can be merged separately. If you are okay with the API, I will add more tests to ensure I haven’t broken anything, and then we can consider getting it merged.

Then we can deprecate tfm_y, and deprecate set_state in favour of determ, but that should be done once we have good test coverage.

@piotr.czapla I’ll make it a priority to look at this after Monday’s class.

@kcturgutlu, @groverpr, @jeremy. I went ahead with the proposal and implemented a modified version. The modification removes the need for the is_y and tfmtype parameters and replaces them with the ability to modify the transformation parameters during execution.

So instead of having to code ifs around sz_y=512, you can simply overwrite sz for all transformations as follows: (t(x), t(y, sz=512)).
Or if you don’t like cv2.INTER_AREA in your scaling because you are transforming a mask, you can simply write:
(t(x), t(y, interpolation=cv2.INTER_NEAREST))

We have shortcuts for the most common parameters, so to transform masks you would write:
(t(x), t(y, **TfmParams.CLASS))

The above examples are the body of apply_transform.
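A rough sketch of how such per-call overrides might be routed, assuming a hypothetical `Tfms` pipeline whose steps are plain functions with keyword parameters. The toy `scale` and `pad` steps only record their settings; none of this is the code from the pull request.

```python
import inspect

class Tfms:
    """Hypothetical pipeline: each step is a function with keyword parameters."""
    def __init__(self, *steps):
        self.steps = steps

    def __call__(self, x, **overrides):
        for fn in self.steps:
            accepted = inspect.signature(fn).parameters
            # Pass each override only to the steps that declare that parameter.
            kwargs = {k: v for k, v in overrides.items() if k in accepted}
            x = fn(x, **kwargs)
        return x

def scale(x, sz=224, interpolation="area"):
    # Toy stand-in for resizing: record the settings instead of touching pixels.
    return x + [("scale", sz, interpolation)]

def pad(x, sz=224, pad_mode="reflect"):
    return x + [("pad", sz, pad_mode)]

t = Tfms(scale, pad)

def apply_transform(t, x, y):
    # The image keeps the defaults; the mask gets a new size and interpolation.
    return t(x), t(y, sz=512, interpolation="nearest")

new_x, new_y = apply_transform(t, [], [])
```

Here `sz=512` reaches both steps because both declare `sz`, while `interpolation` only reaches `scale`.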

All of the parameters are visible if you run repr(tfms[0]).

One thing that I haven’t quite figured out yet is a nice way to disable transformations as needed.
So far I’ve added a disable parameter that turns off any transformation.

So TfmParams.CLASS looks as follows:

{'disable': ('', '', '', '',    ''),
 'interpolation': 0,
 'pad_mode': 0}

This might become a pain to maintain. I was thinking about reintroducing TfmType, but it is a leaky abstraction and one that isn’t easy to amend. I guess we can stay with the above for a while.

Please have a look at this pull request. I’ve tried to keep the diff clean even though there are a lot of changes. For that reason, I left Transform & TransformCoord in place (the latter is simply an alias for the new RndTfm).

The changes are backward compatible as far as I was able to test.

@jeremy if you like the change, let me know and I will write more compatibility tests, and let’s try to merge it so that it does not stay open as a PR for long, since porting fixes from the old to the new API is manual and time-consuming.

Thanks for the interesting ideas! Without tfmtype, how do you ensure that the right details are used for each transform? (We wouldn’t want the user to have to know all those details themselves.)

You have TfmParams.CLASS or TfmParams.PIXEL etc., all of them with the default values for the transformation.

The idea is to use it as follows:

t(x, **TfmParams.CLASS)

I was thinking about reusing TfmType for this purpose and simply converting the numbers to dicts, so that you would write t(x, **TfmType.CLASS), but I decided against it in the first proposal as the ‘disable’ parameter uses names of functions in and I thought it is better to write, than '' that is prone to typing errors. But those are details.

So the options for the syntax are the following:

t(y, **TfmParams.CLASS)
t(y, **TfmType.CLASS)
t(y, TfmType.CLASS)   # I can add  a positional argument that we convert to something like: `**TfmParams.from_type(tfm_type)`, so that there is no need for the transformation functions to handle this.
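The third option could be sketched roughly as below. `TfmParams.from_type` and `call_with_type` are hypothetical helpers, the enum values are placeholders, the empty PIXEL preset is an assumption, and the 'disable' entry is left out because its contents are not spelled out above. The value 0 matches cv2.INTER_NEAREST and cv2.BORDER_CONSTANT.

```python
from enum import Enum

class TfmType(Enum):
    # Member values are placeholders for this sketch.
    PIXEL = "pixel"
    CLASS = "class"

class TfmParams:
    """Hypothetical presets keyed by the TfmType member name."""
    CLASS = {"interpolation": 0, "pad_mode": 0}
    PIXEL = {}  # assumption: pixels keep every transform's own defaults

    @classmethod
    def from_type(cls, tfm_type):
        # Copy, so callers can modify the result without touching the preset.
        return dict(getattr(cls, tfm_type.name))

def call_with_type(t, x, tfm_type=None, **overrides):
    # Expand the positional type into keywords; explicit overrides win.
    params = TfmParams.from_type(tfm_type) if tfm_type is not None else {}
    params.update(overrides)
    return t(x, **params)

def t(x, **params):
    # Stand-in for the deterministic transform: echo what it received.
    return x, params

result = call_with_type(t, "mask", TfmType.CLASS, interpolation=1)
```

With this shape, the transformation functions themselves only ever see keyword parameters and never need to know about TfmType.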

The idea is that we have 4 levels of understanding of the API:

  • 0 - Nothing changes; everyone uses the API as before, except that they specify TfmType only once.
  • 1 - People know about the apply_transforms hook, where they can change how x & y are interpreted, and they use TfmType / TfmParams to add parameters to the transformation function.
  • 2 - People know that they can overwrite any parameter of any partially applied function in the transformations, and they use that to implement very specific use cases.
  • 3 - People know what .determ() is for, how to implement randomized transformations, and how to extend the library.

Currently, the API only supports levels 0 and 3 (and level 3 is limited, as we don’t have hooks in the right places).

@jeremy I’ve just realised you may not remember what “t” is. The t stands for the deterministic version of the transformations, which is obtained by calling tfms.determ(). It is only relevant if a user is on level 1 and wants to implement a custom apply_transforms hook. This is the place where we replaced TfmType and is_y with parameters. TfmType stays as it was on level 0, where you simply use the public API as shown in your videos.

I’m focusing on keeping the level 0 API unchanged, so that it stays compatible with the current notebooks and videos.

After much thinking I’ve decided that I should try to re-write fastai from scratch prior to the next version of part 1. I’d definitely like to integrate some of these new fastai.transforms ideas. With the upcoming NVIDIA DALI library I’m thinking that much of the process may need some re-thinking, since much more will be happening on the GPU.

My plan is to create a branch, and try to write fastai v1.0 there, including tests, docs, and a clean API for everything we covered in parts 1 & 2 of the course. My stretch goal is to be able to cover every application from both parts of the course in the revised part 1! :smiley: :open_mouth:

So I probably won’t incorporate anything other than bug fixes into the master branch in the meantime.

How does that all sound?

PS: Since someone is bound to bring up second system syndrome, I’ll just mention now that I’m aware of it, and think that it is avoidable with care and awareness…