How to classify data on a test set

Hi,

I have trained my model, and now I’m trying to use it to predict data on a test set (which I have under a ‘test’ folder).

I’ve looked in the forums and I’ve seen mentions of adding a test_name field to a data object with this method:

data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz), test_name='test')

But when I try to do it, I get the following error:

NameError: name 'ImageClassifierData' is not defined

(Not sure if the function/method is deprecated…)

How can I classify that data with my model?


ImageClassifierData doesn’t seem to exist anymore (at least not in fastai 1.0.42).

One simple way to add test images from a test folder is:

data = ImageDataBunch.from_folder(path, ds_tfms=get_transforms(), size=224, test='test', bs=64).normalize(imagenet_stats)

or with the data block API:

tfms = get_transforms()                 #Standard fastai transforms
data = (ImageItemList.from_folder(path) #Where to find the data? -> in path and its subfolders
         .split_by_folder()              #How to split in train/valid? -> use the folders
         .label_from_folder()            #How to label? -> depending on the folder of the filenames
         .add_test_folder()              #Optionally add a test set (here default name is test)
         .transform(tfms, size=64)       #Data augmentation? -> use tfms with a size of 64
         .databunch())                   #Finally? -> use the defaults for conversion to ImageDataBunch

You can even check whether your DataBunch has the test data with:
data.test_ds.x[0]

Then you can train your Learner (let’s call it learner), and once it has been trained you can run it on the test dataset:
preds, _ = learner.get_preds(ds_type=DatasetType.Test)
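
If you want actual class labels rather than the raw probabilities that get_preds returns, here is a quick sketch (assuming a standard classification DataBunch, so learner.data.classes holds the class names):

pred_idx = preds.argmax(dim=1)                                       # most probable class index per test image
pred_classes = [learner.data.classes[i] for i in pred_idx.tolist()]  # map indices back to class names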


Thank you!

I see that it would work, but I think I would need to train the model with that dataset, right?

In my case I had already trained my model and saved it.

I tried to create a new data2 after training my model, and then load the model with that data2, but I got an error about the dimensions of the data.

So I ended up loading the images and predicting them one by one, which takes a lot of time and, since we are using GPUs, is probably a huge waste of time and resources. But it worked :)
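
Roughly what I did, in case it helps anyone (just a sketch: open_image and learner.predict are the fastai v1 single-image calls, and I’m assuming path is a pathlib Path with only images directly under path/'test'):

from fastai.vision import open_image

preds_by_file = {}
for fname in (path/'test').iterdir():                  # walk the test folder
    img = open_image(fname)                            # load a single image
    pred_class, pred_idx, probs = learner.predict(img) # predict one image at a time
    preds_by_file[fname.name] = str(pred_class)        # filename -> predicted class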

I’ve peeked at lesson 2 and I think Jeremy will cover this there.

You would not need to train with that data.

For example, you load your learner either with learner.load (after creating a Learner) or with load_learner.
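
A minimal sketch of both options (assuming a resnet34 model built with create_cnn, with weights saved earlier as 'stage-1' via learner.save, or an export.pkl produced by learner.export; adjust the names to whatever you used):

from fastai.vision import *

# option 1: rebuild the Learner on a DataBunch (the one you trained with, or the one built below),
# then load the saved weights
learner = create_cnn(data, models.resnet34)
learner.load('stage-1')        # reads models/stage-1.pth saved with learner.save('stage-1')

# option 2: restore everything from an exported Learner
learner = load_learner(path)   # reads export.pkl from path by default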

Afterwards, you can create a DataBunch with:

tfms = get_transforms()                 #Standard fastai transforms
data = (ImageItemList.from_folder(path) #Where to find the data? -> in path and its subfolders
         .split_by_folder()              #How to split in train/valid? -> use the folders
         .label_from_folder()            #How to label? -> depending on the folder of the filenames
         .add_test_folder()              #Optionally add a test set (here default name is test)
         .transform(tfms, size=64)       #Data augmentation? -> use tfms with a size of 64
         .databunch())                   #Finally? -> use the defaults for conversion to ImageDataBunch

and then set your learner’s data with
learner.data = data

Then you can do learner.get_preds
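
Concretely, that is the same call as in the reply above:

preds, _ = learner.get_preds(ds_type=DatasetType.Test)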

Another way is to do:

test = ImageItemList.from_folder(os.path.join(path, 'test'))
learner = load_learner(path, fname='export.pkl', test=test)
preds, _ = learner.get_preds(ds_type=DatasetType.Test)
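
If you then want to match each prediction to its file, here is a small sketch (the test items are the image paths, accessible via learner.data.test_ds.x.items in fastai v1, as far as I know):

pred_idx = preds.argmax(dim=1)
results = {fn.name: learner.data.classes[i]               # filename -> predicted class name
           for fn, i in zip(learner.data.test_ds.x.items, pred_idx.tolist())}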

Thank you very much for this :slight_smile:
