Test Set with no Labels

lungen · August 8, 2019, 7:07am

Hi, this looks like a simple task, so I’m definitely missing something.
I’m creating a data bunch with a test set in it, but the test set does not have labels:

.add_test(test_paths, label=None)
.transform(get_transforms(), size=image_size, tfm_y=True)

It seems that tfm_y=True makes it apply the transformations to the labels of every set, both training (which is correct) and test:

raise Exception(f"It's not possible to apply those transforms to your dataset:\n {e}")
594 
595 class LabelList(Dataset):

Exception: It's not possible to apply those transforms to your dataset:
Not implemented: you can't apply transforms to this type of item (EmptyLabel)

How can I tell it to do so only on the training set? For training, the transformations must be applied to both the images and their labels (segmentation masks), otherwise the two no longer match. For the test set, however, I need resizing (probably not the others, like flips), and I need it not to try applying them to the (non-existent) labels.
What’s the right way?
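The requirement described above (augment image and mask together) can be illustrated with a toy, dependency-free sketch. This is not fastai’s implementation: images and masks are plain lists of rows, and the helper names `hflip` and `paired_random_hflip` are hypothetical. The key point is that one random draw decides the flip for both the image and its mask, which is what tfm_y=True is meant to guarantee.

```python
import random


def hflip(grid):
    """Mirror a 2-D grid (list of rows) left to right."""
    return [row[::-1] for row in grid]


def paired_random_hflip(image, mask, p=0.5, rng=random):
    """Flip image and mask together: a single random draw decides for both."""
    if rng.random() < p:
        return hflip(image), hflip(mask)
    return image, mask


image = [[1, 2, 3],
         [4, 5, 6]]
mask = [[0, 0, 1],
        [0, 1, 1]]

# p=1.0 forces the flip so the effect is visible.
img_t, mask_t = paired_random_hflip(image, mask, p=1.0)
print(img_t)   # [[3, 2, 1], [6, 5, 4]]
print(mask_t)  # [[1, 0, 0], [1, 1, 0]]
```

If the image and mask each drew their own random number, roughly half of the augmented pairs would end up with a mask that no longer lines up with its image.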

sgugger · August 8, 2019, 7:16am

You shouldn’t have any transforms on your test set, as you won’t be able to reverse them on the predictions you get so that they match your original images. The error you see is a defense mechanism the library uses for exactly that purpose.

If your images for inference have various sizes, you should use predict on each of them individually.

lungen · August 8, 2019, 7:26am

Thanks for pointing out that this is a defense mechanism.
Still, instead of predicting one by one, I’m thinking of loading the test set separately:

imagelist = ImageList.from_df(...)

and then adding it to data bunch as an ImageList:

data.add_test(imagelist, tfms=my_tfms)

This way I can apply only the transformations I need (resize) to the test set, and set tfm_y=False.
Is this the right approach?

sgugger · August 8, 2019, 7:28am

You will need a post-processing script to put your predictions back to the size of your original images, so it’s not ideal.
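A minimal sketch of such a post-processing step, assuming each prediction has already been turned into a 2-D grid of integer class ids (for example by an argmax over the class dimension) and that the original height and width of each test image were recorded. Plain Python lists stand in for tensors; with PyTorch, `torch.nn.functional.interpolate(..., mode='nearest')` does the same job. The function name `resize_nearest` is hypothetical.

```python
def resize_nearest(mask, out_h, out_w):
    """Nearest-neighbour resize of a 2-D label grid (list of rows).

    Nearest-neighbour keeps class ids as valid integers; bilinear
    interpolation would blend neighbouring labels into meaningless values.
    Each output pixel (y, x) copies source pixel
    (floor(y * in_h / out_h), floor(x * in_w / out_w)).
    """
    in_h, in_w = len(mask), len(mask[0])
    return [
        [mask[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


# A 2x2 prediction made at training resolution, mapped back to a 3x5 original.
pred = [[0, 1],
        [1, 1]]
print(resize_nearest(pred, 3, 5))
# [[0, 0, 0, 1, 1], [0, 0, 0, 1, 1], [1, 1, 1, 1, 1]]
```

The output only approximates the mask a full-resolution model would produce: detail lost when the image was shrunk cannot be recovered, which is one reason this approach is "not ideal".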

lungen · August 8, 2019, 7:33am

True. Definitely not ideal.
But I already have that code: I use it when I check/visualize my predictions on the validation set. So I’ll probably live with that.
Thanks, really appreciate your remarks!

florobax · August 8, 2019, 7:48am

Btw, you don’t need to pass a resize transform, as by default the test set is automatically resized to the same size as the training set.

lungen · August 8, 2019, 8:58am

Useful info, thanks.
But I got stuck here.

class SegLabelList(SegmentationLabelList):
    def open(self, fn): return open_mask_image(fn)
    
class SegItemList(SegmentationItemList):
    _label_cls = SegLabelList

data = (SegItemList
        .from_folder(dir_data_train, extensions=['.dcm'], recurse=True)
        .split_by_rand_pct(valid_pct=0.2, seed=7)
        .label_from_func(lambda x : str(x), classes=[0, 1])
        .transform(get_transforms(), size=image_training_size, tfm_y=True)
        .databunch(bs=batch_size)
        .normalize(imagenet_stats)
)

test_list = (ImageList
        .from_folder(dir_data_test, extensions=['.dcm'], recurse=True)
        .split_none()
)

data.add_test(test_list, label=None, tfm_y=False)

I get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-fc08dbbead4b> in <module>
     78 )
     79 
---> 80 data.add_test(test_list, label=None, tfm_y=False)
     81 
     82 print(get_time(), 'DataBunch created. Train/valid/test size:', len(data.train_ds), '/', len(data.valid_ds))

/opt/conda/lib/python3.6/site-packages/fastai/basic_data.py in add_test(self, items, label, tfms, tfm_y)
    156     def add_test(self, items:Iterator, label:Any=None, tfms=None, tfm_y=None)->None:
    157         "Add the `items` as a test set. Pass along `label` otherwise label them with `EmptyLabel`."
--> 158         self.label_list.add_test(items, label=label, tfms=tfms, tfm_y=tfm_y)
    159         vdl = self.valid_dl
    160         dl = DataLoader(self.label_list.test, vdl.batch_size, shuffle=False, drop_last=False, num_workers=vdl.num_workers)

/opt/conda/lib/python3.6/site-packages/fastai/data_block.py in add_test(self, items, label, tfms, tfm_y)
    555         "Add test set containing `items` with an arbitrary `label`."
    556         # if no label passed, use label of first training item
--> 557         if label is None: labels = EmptyLabelList([0] * len(items))
    558         else: labels = self.valid.y.new([label] * len(items)).process()
    559         if isinstance(items, MixedItemList): items = self.valid.x.new(items.item_lists, inner_df=items.inner_df).process()

TypeError: object of type 'ItemLists' has no len()

Where does ItemLists come from, if I’m passing an ImageList (singular) to the add_test function?

florobax · August 8, 2019, 9:08am

When you call split_none(), it converts your list into an ItemLists that contains a train list and an (empty) valid list. Just don’t split, and do:

test_list = ImageList.from_folder(dir_data_test, extensions=['.dcm'], recurse=True)
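The mechanism behind the original error can be reproduced without fastai. In the traceback above, add_test calls `len(items)` when no label is passed, and Python's `len()` raises a TypeError for any object whose class defines no `__len__`. The classes below are hypothetical toy stand-ins for ImageList and ItemLists, not the real fastai classes; only the `len()` behaviour is being demonstrated.

```python
class ToyItemList:
    """Hypothetical stand-in for a single list of items (defines __len__)."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)


class ToyItemLists:
    """Hypothetical stand-in for a train/valid container (no __len__)."""

    def __init__(self, train, valid):
        self.train, self.valid = train, valid


def count_items(items):
    # Mirrors the `len(items)` call made when no label is passed.
    return len(items)


single = ToyItemList(["a.dcm", "b.dcm"])
pair = ToyItemLists(single, ToyItemList([]))  # like split_none(): empty valid

print(count_items(single))  # 2
try:
    count_items(pair)
except TypeError as e:
    print(e)  # object of type 'ToyItemLists' has no len()
```

Passing the unsplit list works because the single-list class is sized; the container returned by a split is not.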

lungen · August 8, 2019, 9:39am

Indeed, this worked, and the test images were also resized to the size of the training images. Really appreciate your help.