How to use TTA on a new test dataset? And what are the default TTA transforms?

DavidBressler · February 27, 2019, 3:19am

I’m using TTA in an image classification task like so:

my_tfms = get_transforms(do_flip=True, flip_vert=True, max_rotate=45, max_zoom=1.2, max_lighting=0.2,
                     max_warp=None, p_affine=1, p_lighting=.5)
data = ImageDataBunch.from_csv(path=DATAPATH, folder='images_resized',
                               csv_labels='my_labels.csv', valid_pct=.15, header=None,
                               fn_col=0, label_col=1, ds_tfms=my_tfms, size=224, bs=40)
learner = create_cnn(data, models.resnet34, metrics=accuracy)
learner.fit_one_cycle(1)
accuracy(*learner.TTA())

Two questions:

1. What are the transforms that TTA uses for its augmentations? I’m guessing it’s not the “my_tfms” that were used to define the original databunch, right? Is there a default set of transforms that TTA uses?
2. I’d like to run TTA on a new dataset (found in a new csv, e.g. my_labels2.csv). How would I do this?

sgugger · February 27, 2019, 3:35pm

TTA uses the transforms you defined on the training set (with some tweaks to make sure to look at the four corners).
To run it on a new dataset, define it as the test set of your DataBunch, then run learner.TTA(ds_type=DatasetType.Test).

DavidBressler · February 28, 2019, 5:36am

Thanks @sgugger !

This forced me to look a little bit deeper into how data is loaded into the learner. In case anyone’s interested, I wound up loading my ‘test’ data into the ‘validation’ slot of the learner, as seen below (need to first run the code above):

df_data = pd.read_csv(DATAPATH/'my_labels.csv')
main_image_list=ImageItemList.from_df(df=df_data,path=DATAPATH,folder='images_resized', cols=0,suffix='')
df_test_data = pd.read_csv(DATAPATH/'my_test_labels.csv')
test_image_list=ImageItemList.from_df(df=df_test_data,path=DATAPATH,folder='images_resized', cols=0,suffix='')
data_test=(main_image_list
      .split_by_list(main_image_list,test_image_list)
      .label_from_df(cols=1)
      .transform(my_tfms, size=224)  # my_tfms is the transform set defined in the first post
      .databunch(bs=40))

learner.data = data_test
accuracy(*learner.TTA())

sgugger · February 28, 2019, 2:08pm

Oh? Using add_test didn’t work?

DavidBressler · March 1, 2019, 12:15am

@sgugger , from what I understand, fastai test datasets have no labels (https://docs.fast.ai/data_block.html#Add-a-test-set), so I decided to add the test set as the validation set, so I could include labels with it.
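
An alternative, if you would rather keep the held-out images in the test slot, is to hold the labels outside the DataBunch and score the predictions yourself. The sketch below is plain Python and rests on an assumption not stated in this thread: that predictions come back in the same order as the rows of the CSV the test set was built from. The CSV contents, class list, probabilities, and the helper names `load_labels` and `score` are all hypothetical.

```python
import csv
import io

def load_labels(csv_file):
    """Read headerless (filename, label) rows, preserving row order."""
    return [(row[0], row[1]) for row in csv.reader(csv_file) if row]

def score(pred_probs, labels, classes):
    """Fraction of rows whose highest-probability class matches the label."""
    correct = 0
    for probs, (_, label) in zip(pred_probs, labels):
        best = max(range(len(probs)), key=probs.__getitem__)
        correct += classes[best] == label
    return correct / len(labels)

# Stand-in for a labels file such as my_test_labels.csv (made-up rows).
fake_csv = io.StringIO("img_001.jpg,cat\nimg_002.jpg,dog\nimg_003.jpg,dog\n")
labels = load_labels(fake_csv)

# Made-up class order and per-image probabilities, one row per CSV row.
classes = ["cat", "dog"]
pred_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]

print(score(pred_probs, labels, classes))
```

In fastai v1 the class order would come from data.classes, and the probabilities from the first element returned by learner.TTA(ds_type=DatasetType.Test).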

sgugger · March 1, 2019, 12:18am

Ah, got it.

DavidBressler · April 30, 2019, 5:10am

@sgugger is there somewhere I can find a list of the exact transforms that TTA uses? You mentioned “tweaks to make sure to look at the four corners” as one difference from that used on the train set…

I’d like to use them on the validation set with get_preds so I can save the outputs of each of the 8 variations. Currently I’m doing:

data_test.valid_ds.tfms=data_test.train_ds.tfms

but it gives a slightly worse accuracy when averaging predictions compared to doing learner.TTA()

sgugger · April 30, 2019, 12:52pm

You should check the source code for more details. I didn’t write the TTA part, so I don’t know more than I told you already.
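
For readers who land here looking for that list: the sketch below reflects one reading of the fastai v1 source (fastai/vision/tta.py) and is not confirmed anywhere in this thread, so treat it as an assumption and check it against your installed version. On that reading, TTA keeps the training transforms except the random crop, flip, dihedral, and zoom ones, then runs 8 passes that pair a fixed zoom-and-crop anchored at each of the four corners with horizontal flip off or on. The helper name `tta_variants` is hypothetical and only enumerates those combinations.

```python
# Sketch of the 8 TTA passes as understood from fastai v1 (assumption:
# verify against fastai/vision/tta.py in your installed version).
def tta_variants(n_passes=8):
    """Enumerate (row_pct, col_pct, flip) for each pass.

    row_pct/col_pct of 0 or 1 pin the crop to one corner of the image;
    flip marks the passes that add a horizontal flip.
    """
    variants = []
    for i in range(n_passes):
        row_pct = 1 if i & 1 else 0   # bit 0 picks the row anchor
        col_pct = 1 if i & 2 else 0   # bit 1 picks the column anchor
        flip = bool(i & 4)            # bit 2: passes 4-7 are flipped
        variants.append((row_pct, col_pct, flip))
    return variants

for i, (row, col, flip) in enumerate(tta_variants()):
    print(f"pass {i}: row_pct={row} col_pct={col} flip_lr={flip}")
```

If this reading is right, the passes are deterministic corner crops rather than fresh random draws, which would explain why averaging several runs of the random training transforms comes close to learner.TTA() without matching it exactly.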

DavidBressler · April 30, 2019, 3:41pm

ok, thanks anyway. Averaging the 8 versions gets pretty close, so I’ll probably stick with that. In case anyone’s wondering, here’s the code I’m using:

# Apply the (random) training transforms to the validation set
data_test1.valid_ds.tfms = data_test1.train_ds.tfms
learner1 = cnn_learner(data_test1, models.resnet34, metrics=accuracy, pretrained=True)
# Each get_preds call draws fresh random augmentations, so run it 8 times
all_preds = [learner1.get_preds(DatasetType.Valid) for _ in range(8)]
# Sum the 8 prediction tensors; element [1] of any run holds the targets
comb_output = torch.sum(torch.stack([preds for preds, _ in all_preds]), dim=0)
accuracy(comb_output, all_preds[0][1])
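
Two details of the snippet above can be checked in isolation. Summing the 8 prediction tensors rather than averaging them does not change the predicted class, because dividing by 8 preserves the argmax. And part of the remaining gap to learner.TTA() may come from blending: as far as the fastai v1 source appears to show (an assumption to verify against your version), TTA returns beta * plain_preds + (1 - beta) * averaged_augmented_preds with beta=0.4 by default. The sketch below uses plain Python lists with made-up probabilities; `argmax`, `sum_preds`, `mean_preds`, and `blend` are hypothetical helpers, not fastai functions.

```python
def argmax(row):
    """Index of the largest value in a list of class scores."""
    return max(range(len(row)), key=row.__getitem__)

def sum_preds(passes):
    """Element-wise sum over passes; each pass is [n_items][n_classes]."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*passes)]

def mean_preds(passes):
    """Element-wise mean over passes."""
    n = len(passes)
    return [[v / n for v in row] for row in sum_preds(passes)]

def blend(plain, averaged, beta=0.4):
    """Weighted mix of un-augmented and averaged augmented predictions."""
    return [[beta * p + (1 - beta) * a for p, a in zip(pr, ar)]
            for pr, ar in zip(plain, averaged)]

# Made-up probabilities: 3 augmented passes, 3 images, 2 classes.
passes = [
    [[0.60, 0.40], [0.30, 0.70], [0.55, 0.45]],
    [[0.70, 0.30], [0.45, 0.55], [0.40, 0.60]],
    [[0.52, 0.48], [0.20, 0.80], [0.35, 0.65]],
]
# Made-up predictions on the un-augmented images.
plain = [[0.80, 0.20], [0.40, 0.60], [0.70, 0.30]]

summed = sum_preds(passes)
averaged = mean_preds(passes)
blended = blend(plain, averaged)

print([argmax(r) for r in summed])    # classes from the summed scores
print([argmax(r) for r in averaged])  # classes from the averaged scores
print([argmax(r) for r in blended])   # classes after blending with plain preds
```

The first two printed lists always agree; the third can differ from them whenever the un-augmented prediction disagrees strongly enough with the augmented average.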

aipitch · August 8, 2019, 7:25pm

I was also wondering how to use TTA in the 2019 version of Fastai and came across this thread. While it was covered in the 2018 course and notebooks, I don’t recall it being mentioned or used in the 2019 course.
Is it still useful and effective in improving prediction accuracy with the 2019 version of fastai?