Applying random transforms to validation and test sets

mjj · November 5, 2019, 10:43am

Hello, I recently moved to fastai_dev because it can run transforms on the GPU. This is a truly distinctive and helpful feature, so thanks to everyone involved!

I do have a question about test-time augmentation (TTA). I’m trying to apply the training transforms (including random rotations and flips) to the test set, but if I create the loader with test_dl:

tdl = test_dl(db, test_items)

Only the validation transforms are applied, as expected, since RandomTransform-derived transforms are applied only to the training set.

What’s the easiest way to apply train transforms to the test and validation sets?
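TTA means predicting on several randomly augmented copies of each test item and combining the predictions, which is why the training transforms are needed on the test set. A minimal, framework-free sketch of that step, assuming simple mean aggregation; `tta_predict`, `random_flip` and `left_half_mean` are hypothetical names, not fastai APIs:

```python
import random
from statistics import mean


def tta_predict(predict, augment, item, n_aug=4, seed=None):
    """Average `predict` over `n_aug` randomly augmented copies of `item`."""
    rng = random.Random(seed)  # seeded RNG so the augmentations are reproducible
    preds = [predict(augment(item, rng)) for _ in range(n_aug)]
    return mean(preds)  # simple mean; other aggregations are possible


# Toy stand-ins (hypothetical, not fastai APIs): an "image" is a list of pixel
# values, the training-time augmentation is a random horizontal flip, and the
# "model" returns the mean of the left half of the image.
def random_flip(pixels, rng):
    return pixels[::-1] if rng.random() < 0.5 else list(pixels)


def left_half_mean(pixels):
    half = pixels[: len(pixels) // 2]
    return sum(half) / len(half)


image = [0.0, 0.0, 1.0, 1.0]
# Each flipped copy scores 1.0 and each unflipped copy 0.0, so the TTA
# prediction is the fraction of augmented copies that were flipped.
print(tta_predict(left_half_mean, random_flip, image, n_aug=8, seed=0))
```

In a real setup, `augment` would be the training pipeline (random rotations, flips) and `predict` the trained model.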

I had to rewrite these two functions from data/core.py:

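def tta_set(dsrc, test_items, rm_tfms=0):
    "Create a test set from `test_items` using **TRAIN** transforms of `dsrc`"
    # Copy each input TfmdList with split_idx=0, so RandomTransforms behave as on the training set
    test_tls = [tl._new(test_items, split_idx=0) for tl in dsrc.tls[:dsrc.n_inp]]
    # Expand rm_tfms to one count per input, then drop that many leading transforms from each pipeline
    rm_tfms = tuplify(rm_tfms, match=test_tls)
    for i,j in enumerate(rm_tfms): test_tls[i].tfms.fs = test_tls[i].tfms.fs[j:]
    return DataSource(tls=test_tls)

def tta_dl(dbunch, test_items, rm_type_tfms=0, **kwargs):
    "Create a test dataloader from `test_items` using **TRAIN** transforms of `dbunch`"
    # Fall back to the raw items if the validation dataset is not a DataSource
    test_ds = (tta_set(dbunch.valid_ds, test_items, rm_tfms=rm_type_tfms)
               if isinstance(dbunch.valid_ds, DataSource) else test_items)
    # Build a new DataLoader with the validation DataLoader's settings for the test dataset
    return dbunch.valid_dl.new(test_ds, **kwargs)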

And then create the dataloader with:

tdl = tta_dl(db, test_items)

But it’s a very awkward hack.

I’m using fastai_dev as of commit 54a9c28 (Nov 1).

sgugger · November 5, 2019, 1:23pm

Note that TTA in v2 hasn’t landed yet. Your hack doesn’t seem unreasonable in the meantime.

mjj · November 5, 2019, 10:41pm

Thanks for your answer.

If it helps, I like the new Transforms-centric way of modelling the dataset, but I think hard-wiring RandomTransforms to apply only to the training set is a mistake; that behaviour should be easy to change. I liked v1’s approach of explicitly passing two lists of transforms (train/valid), and I think v2 should keep it.

sgugger · November 5, 2019, 11:57pm

It is: just set a transform’s split_idx to None to apply it to both sets, to 0 to apply it to the training set only, or to 1 to apply it to the validation set only.
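The split_idx rule can be modelled in a few lines of plain Python. This is a hypothetical sketch, not fastai’s code: `Tfm`, `run_pipeline`, `TRAIN` and `VALID` are illustrative names, and it assumes a transform whose split_idx does not match the split being processed simply returns its input unchanged:

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional

TRAIN, VALID = 0, 1  # split indices: 0 = training set, 1 = validation set


@dataclass
class Tfm:
    """Hypothetical stand-in for a transform tagged with a split_idx."""
    fn: Callable[[Any], Any]
    split_idx: Optional[int] = None  # None = apply on both sets

    def __call__(self, x, split_idx: Optional[int]):
        # Run when untagged, or when the tag matches the split being processed;
        # otherwise pass the input through unchanged.
        if self.split_idx is None or self.split_idx == split_idx:
            return self.fn(x)
        return x


def run_pipeline(tfms: List[Tfm], x, split_idx: Optional[int]):
    """Apply each transform in order for the given split."""
    for t in tfms:
        x = t(x, split_idx)
    return x


# Usage: one transform for both sets, one training-only "augmentation",
# and one validation-only transform.
pipeline = [
    Tfm(lambda x: x * 2),                     # both sets
    Tfm(lambda x: x + 100, split_idx=TRAIN),  # training set only
    Tfm(lambda x: -x, split_idx=VALID),       # validation set only
]
print(run_pipeline(pipeline, 1, TRAIN))  # → 102
print(run_pipeline(pipeline, 1, VALID))  # → -2
```

Under this model, getting random augmentations onto the test set means either tagging them with split_idx=None or processing the items with split_idx=0, which is what the tta_set workaround earlier in the thread does.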