Empty label list for test dataset using from_folder

Hi, I have a problem with ImageDataBunch. I am trying to do the following:

from fastai.vision import *

data = ImageDataBunch.from_folder(train_path,
                                  train="train",
                                  test="test",
                                  valid_pct=0.1,
                                  ds_tfms=get_transforms(),
                                  size=224, num_workers=4).normalize(imagenet_stats)

It runs without errors, but when I look at the data, the labels for the test dataset are empty. Here is the output:

ImageDataBunch;

Train: LabelList (180 items)
x: ImageList
Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224)
y: CategoryList
Class A,Class A,Class A,Class A,Class A
Path: dataset;

Valid: LabelList (20 items)
x: ImageList
Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224)
y: CategoryList
Class A,Class B,Class A,Class A,Class B
Path: dataset;

Test: LabelList (120 items)
x: ImageList
Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224)
y: EmptyLabelList
,,,,
Path: dataset
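The 180/20 split in that output is consistent with valid_pct=0.1 applied to 200 training images: a random 10% of the items is held out as the validation set. A minimal stdlib sketch of that idea, assuming a plain list of file names; random_split_by_pct is a hypothetical helper, not fastai's API, and fastai's own rounding may differ:

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")

def random_split_by_pct(items: Sequence[T], valid_pct: float = 0.2,
                        seed: Optional[int] = None) -> Tuple[List[T], List[T]]:
    """Hold out a random `valid_pct` fraction of `items` as a validation set.

    Hypothetical helper illustrating what valid_pct does; not fastai code.
    """
    rng = random.Random(seed)
    idx = list(range(len(items)))
    rng.shuffle(idx)                      # random permutation of the indices
    cut = int(valid_pct * len(items))     # number of validation items
    valid_idx = set(idx[:cut])
    train = [x for i, x in enumerate(items) if i not in valid_idx]
    valid = [x for i, x in enumerate(items) if i in valid_idx]
    return train, valid

if __name__ == "__main__":
    files = [f"img_{i:03d}.jpg" for i in range(200)]
    train, valid = random_split_by_pct(files, valid_pct=0.1, seed=42)
    print(len(train), len(valid))  # 180 20
```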

As you can see, the labels for the Test set are empty. Here is my folder structure:

dataset:
|- train
   |- Class A
   |- Class B
   |- Class C
   |- Class D
|- test
   |- Class A
   |- Class B
   |- Class C
   |- Class D
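This layout already carries the labels for test: as I understand fastai's label_from_folder, each image is labelled with the name of the folder it sits in. A minimal stdlib sketch of that rule, assuming images are .jpg/.jpeg/.png files inside the class folders; label_from_parent_folder is a hypothetical name, not a fastai function:

```python
from pathlib import Path
from typing import List, Tuple

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumption: only these extensions count as images

def label_from_parent_folder(root: Path) -> List[Tuple[Path, str]]:
    """Return (image_path, label) pairs, where label is the image's parent folder name."""
    pairs = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
            pairs.append((path, path.parent.name))
    return pairs
```

For example, label_from_parent_folder(Path("dataset/test")) would pair every image under dataset/test/Class A with the label "Class A", and so on for the other class folders.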

What should I do to get labels for the Test set? Should I change my folder structure, or is something wrong with my code?

Thank you.

In fastai the test data is expected to be unlabelled (in v1 at least; I think v2 has more flexible support here). So it's "test data" in the sense Kaggle uses it, the data on which you'll be evaluated, rather than in the sense of an extra holdout, kept apart from your validation data and used more sparingly.

There doesn't seem to be any easy way around this in the API. You might be able to create the train/valid sets as usual and then assign a labelled test dataset and loader to data.test_ds/data.test_dl, and that might work, though I'm not sure whether any functions will actually use them. If it does work, you could also use DataBunch.create, which takes a test dataset parameter.
Though I suspect that won't really help, as nothing will expect labelled test data. So you might be better off creating a separate DataBunch with the test data as the validation set. Then you can evaluate against that, and fastai should do what you want (it expects the validation set to be labelled). I think passing that DataLoader into the various Learner methods that accept one (like Learner.validate) should work; I don't think it has to be one of the Learner's own loaders.
I'm not sure whether any of the methods for duplicating the various item list classes (ItemList, ItemLists for split, LabelLists for split+labelled) will let you copy over the parameters from the main DataBunch, but maybe.


TomB is right: v2 will allow labeled test sets easily. Until then, please see my notebook here on how to work around this and use a labeled test set in v1 :slight_smile:

https://github.com/muellerzr/fastai-Experiments-and-tips/blob/master/Test%20Set%20Generation/Labeled_Test_Set.ipynb


Thank you for your answer. I can compute the accuracy manually by comparing the true and predicted labels. However, I want to use TTA, and since I don't know how to write my own TTA, I need learn.TTA() from the fastai library.
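Computing the accuracy by hand from predicted and true labels comes down to taking the highest-scoring class in each prediction row and counting matches. A minimal pure-Python sketch, assuming predictions are per-class probability rows and targets are integer class indices (as far as I know, the same layout learn.TTA returns, just as tensors); accuracy_from_probs is a hypothetical name:

```python
from typing import Sequence

def accuracy_from_probs(probs: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Fraction of rows whose highest-probability class equals the target index."""
    if len(probs) != len(targets):
        raise ValueError(f"length mismatch: {len(probs)} predictions vs {len(targets)} targets")
    if not targets:
        raise ValueError("no targets to score")
    correct = 0
    for row, target in zip(probs, targets):
        predicted = max(range(len(row)), key=row.__getitem__)  # arg-max over classes
        correct += int(predicted == target)
    return correct / len(targets)

if __name__ == "__main__":
    print(accuracy_from_probs([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]], [0, 1, 1]))  # 0.6666666666666666
```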

I will consider your suggestion.

Thank you. I will study your notebook.

I have solved it.

Thank you very much @muellerzr for your notebook. It really helps me a lot.

First, I create an ImageDataBunch for training.

data = ImageDataBunch.from_folder(train_path/"train", 
                                  train=".",
                                  valid_pct=0.1, 
                                  ds_tfms=get_transforms(), 
                                  size=224, num_workers=4).normalize(imagenet_stats)

Next, I create an ImageDataBunch for the test set.

src = (ImageList.from_folder(train_path/"test")
            .split_none()
            .label_from_folder())

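tfms = get_transforms()
# Use the validation transforms (tfms[1]) for this split too, so the test images are not randomly augmented
data_test = src.transform((tfms[1], tfms[1]), size=224)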

data_test.valid = data_test.train
data_test = data_test.databunch().normalize(imagenet_stats)
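With this approach it is worth checking that the test DataBunch maps class names to the same indices as the training one, since label_from_folder builds its class list from the labels it actually finds under test. As far as I know, if test had no Class C images, Class D would get a different index and the accuracy would be silently wrong. In fastai v1 the two lists to compare would be data.classes and data_test.classes. A minimal pure-Python sketch of the comparison; index_mismatches is a hypothetical helper:

```python
from typing import List, Sequence

def index_mismatches(train_classes: Sequence[str], test_classes: Sequence[str]) -> List[str]:
    """List classes whose index (position) in test_classes differs from train_classes.

    An empty result means every test class has the same index as in training.
    """
    train_idx = {name: i for i, name in enumerate(train_classes)}
    problems = []
    for i, name in enumerate(test_classes):
        if name not in train_idx:
            problems.append(f"{name!r} is not a training class")
        elif train_idx[name] != i:
            problems.append(f"{name!r} is index {i} in test but {train_idx[name]} in train")
    return problems

if __name__ == "__main__":
    train = ["Class A", "Class B", "Class C", "Class D"]
    print(index_mismatches(train, ["Class A", "Class B", "Class D"]))
    # ["'Class D' is index 2 in test but 3 in train"]
```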

After that, I can run TTA predictions on my test set with the following:

# Point the learner's validation loader at the labelled test set, then run TTA on it
learn.data.valid_dl = data_test.valid_dl
y_preds, y = learn.TTA(ds_type=DatasetType.Valid)
if y.shape[0] == len(data_test.valid_ds):
    print(accuracy(y_preds, y))
else:
    print(f'Size mismatch: y_preds has shape {y_preds.shape}, y has shape {y.shape}')
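For reference, as far as I can tell from the fastai v1 source, learn.TTA makes one plain prediction pass using the evaluated dataset's own transforms, makes 8 augmented passes, averages the augmented ones, and blends the two as beta * plain + (1 - beta) * mean(augmented), with beta=0.4 by default. Because the plain pass uses whatever transforms that dataset carries, it should use the validation transforms rather than random augmentation. A minimal pure-Python sketch of just the blending step, assuming you already have the per-pass probability rows; tta_blend is a hypothetical name:

```python
from typing import List, Sequence

Row = Sequence[float]

def tta_blend(plain: Sequence[Row], augmented: Sequence[Sequence[Row]],
              beta: float = 0.4) -> List[List[float]]:
    """Blend plain predictions with the mean of several augmented passes.

    plain:     one probability row per item (no augmentation)
    augmented: one list of rows per augmented pass, same shape as `plain`
    Returns beta * plain + (1 - beta) * mean(augmented), element-wise.
    """
    if not augmented:
        raise ValueError("need at least one augmented pass")
    n_passes = len(augmented)
    blended = []
    for i, row in enumerate(plain):
        out_row = []
        for j, p in enumerate(row):
            aug_mean = sum(pass_rows[i][j] for pass_rows in augmented) / n_passes
            out_row.append(beta * p + (1 - beta) * aug_mean)
        blended.append(out_row)
    return blended
```

With plain = [[1.0, 0.0]] and two augmented passes [[0.8, 0.2]] and [[0.2, 0.8]], the augmented mean is [0.5, 0.5] and the blend is approximately [[0.7, 0.3]].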