Add testset without applying transforms

#1

Hi !

I’m trying to use the add_test method from the fastai library to make my final predictions on a test set for a Kaggle competition, but I can’t make it work. I keep getting the same error:

/work/stages/schwob/.conda/envs/pytorch/lib/python3.6/site-packages/fastai/data_block.py in add_test(self, items, label)
    550         if isinstance(items, ItemList): items = self.valid.x.new(items.items, inner_df=items.inner_df).process()
    551         else: items = self.valid.x.new(items).process()
--> 552         self.test = self.valid.new(items, labels)
    553         return self
    554 

/work/stages/schwob/.conda/envs/pytorch/lib/python3.6/site-packages/fastai/data_block.py in new(self, x, y, **kwargs)
    616     def new(self, x, y, **kwargs)->'LabelList':
    617         if isinstance(x, ItemList):
--> 618             return self.__class__(x, y, tfms=self.tfms, tfm_y=self.tfm_y, **self.tfmargs)
    619         else:
    620             return self.new(self.x.new(x, **kwargs), self.y.new(y, **kwargs)).process()

/work/stages/schwob/.conda/envs/pytorch/lib/python3.6/site-packages/fastai/data_block.py in __init__(self, x, y, tfms, tfm_y, **kwargs)
    589         self.y.x = x
    590         self.item=None
--> 591         self.transform(tfms, **kwargs)
    592 
    593     def __len__(self)->int: return len(self.x) if self.item is None else 1

/work/stages/schwob/.conda/envs/pytorch/lib/python3.6/site-packages/fastai/data_block.py in transform(self, tfms, tfm_y, **kwargs)
    709         _check_kwargs(self.x, tfms, **kwargs)
    710         if tfm_y is None: tfm_y = self.tfm_y
--> 711         if tfm_y: _check_kwargs(self.y, tfms, **kwargs)
    712         self.tfms,  self.tfmargs   = tfms,kwargs
    713         self.tfm_y, self.tfmargs_y = tfm_y,kwargs

/work/stages/schwob/.conda/envs/pytorch/lib/python3.6/site-packages/fastai/data_block.py in _check_kwargs(ds, tfms, **kwargs)
    581         try: x.apply_tfms(tfms, **kwargs)
    582         except Exception as e:
--> 583             raise Exception(f"It's not possible to apply those transforms to your dataset:\n {e}")
    584 
    585 class LabelList(Dataset):

Exception: It's not possible to apply those transforms to your dataset:
 Not implemented: you can't apply transforms to this type of item (EmptyLabel)

There are several things I don’t understand:

  • Why does it add the test set using the valid attribute?
  • Why does it try to apply my train transforms to the test set? I obviously don’t want the same transforms applied to my test set (the one I submit) as to my training set.
  • Even stranger, why does it try to apply them to the label list when it is obviously None (the docs even specify that it has to be None)?

My wild guess is that, since I already added a validation set, which has transforms applied to labels as well (cropping, for instance), it tries to apply them to the test set because it uses the valid attribute. So I guess it all comes down to my first question: is there a way to avoid using valid and its transforms?
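
The failure mode is easier to see in miniature. Here is a self-contained sketch, modeled on the traceback above; the class and function names are illustrative stand-ins, not fastai’s actual code:

```python
class EmptyLabel:
    """Stand-in for fastai's EmptyLabel: the placeholder label a test item gets."""
    def apply_tfms(self, tfms, **kwargs):
        raise NotImplementedError(
            "you can't apply transforms to this type of item (EmptyLabel)")

def check_kwargs(item, tfms, **kwargs):
    """Mimics _check_kwargs from the traceback: try the tfms on one item,
    and wrap any failure in a friendlier exception."""
    try:
        item.apply_tfms(tfms, **kwargs)
    except Exception as e:
        raise Exception(
            f"It's not possible to apply those transforms to your dataset:\n {e}")

# tfm_y=True (inherited from the validation set) makes the check run on the
# labels too, and the labels of a test set are EmptyLabel -> same exception
# as in the traceback above.
tfm_y = True
if tfm_y:
    try:
        check_kwargs(EmptyLabel(), tfms=[])
    except Exception as e:
        print(e)
```

This is why the error fires even with label=None: the check is driven by the inherited tfm_y flag, not by whether real labels exist.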

Thanks in advance !

0 Likes

(Laurent) #2

I ran into this as well; the logic in that function is a bit odd. I know Sylvain already fixed up parts of it, but I’m not sure about the logic that handles the test set in particular. I’d recommend filing an issue on GitHub to bring it to their attention.

To immediately answer your question: just set up your data bunch again without using the .transform() method. So you might have something in your code like:

data = (ImageList.from_folder(path)
        .split_by_pct()
        .label_from_folder()
        .transform(tfms)
        .databunch()
        .normalize() )

Just comment out the transform() line, then substitute the new data into your learner with

learner.data = data

and you should be good to go.

To step through your questions:

  • It adds the test set using the valid attribute because that defines how the data is handled (for example, there’s no backpropagation for either valid or test).
  • It tries to apply transforms because it handles the test set as if it were a validation set (as your previous question highlights). I don’t think that’s the intended behaviour, though.
  • It doesn’t really “apply” the label list; rather, it thinks it is handling a validation set, which should have transformable labels, so it naively tries to handle that.
0 Likes

#3

Thanks!
That is indeed what I thought was happening. This behavior is pretty strange; maybe it should allow 3 sets of transforms (train, valid and test), so that the test set doesn’t inherit the valid transforms. Besides, it is probably worth catching the case where labels are None. I’ll go check GitHub in case someone already raised the issue, and I’ll create one if necessary.

0 Likes

(Laurent) #4

it should allow 3 sets of transforms to be applied (train, valid and test)

Indeed. I think there’s actually something like this available, because the library explicitly allows for different sets as visible here; it’s just that for some reason the logic isn’t checking/taking advantage of those semantics. I don’t have the code fresh in mind, so I’m not sure where/how it checks; I just know that differentiated semantics for train/valid/test are available.

Besides, it is probably worth catching the case where labels are None.

Good point! I think they’d be grateful for a PR for this if you have spare time :wink:

0 Likes

#5

Yes, this enum is used, for instance, when calling db.dl(type), which returns the corresponding data loader. It is used at other points in the code, but the transforms were not implemented using it. I actually think it would be a good idea to somehow enable the use of DatasetType to add transforms to a specific dataset.

Yes, I’ll try to work on it if needed; I have some spare time. I also have other ideas to make the add_test function more robust. For instance, it calls the __init__ method of the ItemList, so if I created my test_list with specific arguments and then used add_test(test_list), it actually adds the same list but without the arguments I added, which forces me to do db.test_ds.x.arg = arg.

0 Likes

#6

Sylvain fixed it on master: we can now pass custom tfms to add_test, which is perfect.
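
The core of the fix can be sketched with toy classes (these are illustrative stand-ins, not fastai’s actual implementation, and the real add_test signature may differ between versions): the test set only gets transforms you pass explicitly, instead of inheriting the valid ones.

```python
class ToyLabelList:
    """Toy stand-in for a fastai LabelList: items plus their transforms."""
    def __init__(self, items, tfms):
        self.items, self.tfms = items, tfms

class ToyDataBunch:
    """Toy stand-in for a DataBunch with train/valid/test sets."""
    def __init__(self, train, valid):
        self.train, self.valid, self.test = train, valid, None

    def add_test(self, items, tfms=None):
        # The fix in spirit: the test set receives only the transforms
        # passed in explicitly; it no longer inherits self.valid.tfms.
        self.test = ToyLabelList(items, tfms if tfms is not None else [])
        return self

db = ToyDataBunch(ToyLabelList([1, 2], ["train_tfm"]),
                  ToyLabelList([3], ["valid_tfm"]))
db.add_test([4, 5])                      # no transforms at all
print(db.test.tfms)                      # []
db.add_test([4, 5], tfms=["test_tfm"])   # custom test transforms
print(db.test.tfms)                      # ['test_tfm']
```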

0 Likes

(jaideep v) #7

Hi Laurent,
I face the same issue, but the above method doesn’t work for me. Below is what I get when I print the new dataset into which I have put the test images path. With this DataBunch I fail during iteration: next(iter(data.train_dl)). I use a custom open function which is defined in my custom ItemList class; I am not sure why it doesn’t take that.
Could you please help out?

test_data = (RSNAobjectItemList.from_folder(path_t)
        .split_none()
        .label_from_func(get_y_func)
        .databunch(path=path_t, bs=56, collate_fn=bb_pad_collate1)
        .normalize(imagenet_stats))
test_data

ImageDataBunch;

Train: LabelList (0 items)
x: RSNAobjectItemList

y: ObjectCategoryList

Path: ../input/rsna-pneumonia-detection-challenge/stage_2_test_images;
0 Likes

(Laurent) #8

So I think this fails because fastai’s basic assumption about test data is that it doesn’t have any labels. From what I can tell, your code above violates that assumption (you assign labels to your test set by calling label_from_func()).

At any rate, I’m a bit surprised to see that you’re creating a DataBunch out of purely test data; I think that’s going to cause you quite some pain, as it breaks a couple of fastai’s assumptions.

Check out the docs on adding a test set; hopefully that’ll help you get an idea of how to approach this. You don’t need to create an entirely new DataBunch; all you need to do is add an add_test() call.

0 Likes

(jaideep v) #9

Actually, I also tried without putting any labels; it threw an error saying my databunch has not got any labels…

0 Likes

(Laurent) #10

That’s the code working as intended. DataBunch is designed to handle labeled data, i.e. train and validation data. Test data, which is unlabeled by definition, is a special case, and you should use the add_test() API that was designed with test sets specifically in mind.
Check the link I posted above for more clarification.

0 Likes

(jaideep v) #11

I’m actually following that…
With the original DataBunch object I can’t use add_test because of tfm_y=True. Version 54, which I use, doesn’t have the updated add_test that can take a tfms param for the test set.
How can we customize the LabelList class to include the modified add_test in version 54?

0 Likes

(jaideep v) #12

I think I figured out the way…
All I need is the test_dl from the DataBunch. Hopefully it will work at inference time on the test set.

data_test = (RSNAobjectItemList.from_df(train_df, path=path2)
        .split_by_rand_pct(0.15, seed=42)
        .label_from_func(get_y_func)
        .transform(tfms, size=284)
        .add_test(test_fnames)
        .databunch(path=path2, bs=56)
        .normalize(imagenet_stats))
0 Likes

(Laurent) #13

I’m not sure whether Sylvain’s patches have made it to master/your fastai install yet. If they have, then indeed, this should be fine.
If not, this will throw an error (because it will try to apply the transforms to the labels, which your test set doesn’t have). If it throws those errors, just create your DataBunch like you did above but remove the transform() line, or replace its transforms with two empty lists (one for train, one for valid). That should allow it to run on the test_dl.

0 Likes

#14

Sylvain’s patch is on master, but I’m not sure it has been released yet. Basically, you can create an ItemList for your test set and then use add_test on your databunch to add it, even with tfm_y=True, since you can pass custom transforms for your test set. The commit for the change is this one.

0 Likes

#15

Hi, I’m using the current build from GitHub (with Sylvain’s patch), but I don’t think my transforms are being applied.

I call

learn.data.add_test(imagelist, tfms = my_tfms)

where

imagelist = ImageList.from_df(samplecsv,base_dir,folder="test_images",suffix='.png')

has my test images, and my_tfms = [customtfm()] points to a custom transform I created (which works properly when visualizing train data or during training).

I’ve added prints inside customtfm so I can tell when it runs, but the transform does not run when I use preds, y = learn.get_preds(DatasetType.Test). Is there another way to apply specific transforms to the test set before prediction?

0 Likes

#16

Normally you can check the tfms on your test set with learn.data.test_ds.tfms.

1 Like

#17

Thanks, I was able to fix my problem with that.

0 Likes