How to add test data to ImageDataBunch and predict labels for test data

After taking lesson 1, I tried out image classification with the Stanford Cars dataset.

I was having trouble using the test set with ImageDataBunch. This is how my ImageDataBunch looks now.

    data = ImageDataBunch.from_df(path/'car_ims', train_df, ds_tfms=get_transforms(), size=224, bs=bs).normalize(imagenet_stats)
    test_data = ImageList.from_df(test_df, path/'car_ims')
    data.add_test(test_data)
    ImageDataBunch;

    Train: LabelList (6516 items)
    x: ImageList
    Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224)
    y: CategoryList
    AM General Hummer SUV 2000,AM General Hummer SUV 2000,AM General Hummer SUV 2000,AM General Hummer SUV 2000,AM General Hummer SUV 2000
    Path: /home/jupyter/.fastai/data/stanford-cars/car_ims;

    Valid: LabelList (1628 items)
    x: ImageList
    Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224)
    y: CategoryList
    Chevrolet Avalanche Crew Cab 2012,Ford F-150 Regular Cab 2007,Audi TT Hatchback 2011,Ferrari 458 Italia Convertible 2012,MINI Cooper Roadster Convertible 2012
    Path: /home/jupyter/.fastai/data/stanford-cars/car_ims;

    Test: LabelList (8041 items)
    x: ImageList
    Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224)
    y: EmptyLabelList
    ,,,,
    Path: /home/jupyter/.fastai/data/stanford-cars/car_ims

After training the model, I used the following code to get the test predictions. Is there a way to predict labels for all images in the test set without iterating over them?

    test_predictions = []
    for test_image in test_data:
        test_predictions.append(learn.predict(test_image)[0])

This gave the label for each image in the test set.
Is there a more straightforward way to achieve this?


Here’s how I did it.

    test_imgs = (path/'cars_test/').ls()
    test_imgs.sort(key=lambda x: x.stem)
    data.add_test(test_imgs)
    learn.data = data
    preds = learn.get_preds(ds_type=DatasetType.Test)

You can use get_preds.

After fitting, call these:

    predictions, *_ = learner.get_preds(DatasetType.Test)
    labels = np.argmax(predictions, 1)

I encountered an issue where all predictions were the same. Using the second line solved it.
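The second line above is a row-wise argmax: for each test image it picks the index of the class with the highest predicted probability. The sketch below shows the same idea in plain Python, with made-up probabilities standing in for the output of get_preds. A likely reason for the "all the same" symptom is that the second value returned by get_preds holds targets, and an unlabeled test set only has placeholder targets; that is an assumption about fastai v1 behaviour, not something confirmed in this thread.

```python
# Hypothetical per-class probabilities for three test images and three classes.
# In the thread these would come from learner.get_preds(DatasetType.Test).
probabilities = [
    [0.10, 0.70, 0.20],
    [0.80, 0.15, 0.05],
    [0.25, 0.25, 0.50],
]


def argmax_rows(rows):
    """Return, for each row, the index of its largest value (first one on ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


label_indices = argmax_rows(probabilities)
print(label_indices)  # [1, 0, 2]
```

The resulting indices still have to be mapped to class names using the class list that the model was trained with.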


learn.get_preds(ds_type=DatasetType.Test) worked. Thanks!
But I’m facing a few more issues.

  1. The labels returned by get_preds are all the same, as @abyaadrafid mentioned, so I got the predictions using np.argmax instead.
  2. The predicted labels differ between get_preds and iterating with predict. The accuracy was better when iterating with predict.

Here is the code.

    preds = learn.get_preds(DatasetType.Test)
    # same label for all items
    len(set([int(x) for x in preds[1]]))
    > 1
    label_index = np.argmax(preds[0], 1)
    test_predictions_direct = [class_names[x] for x in label_index]
    test_predictions = []
    for test_image in test_data:
        test_predictions.append(learn.predict(test_image)[0])
    test_df['predictions_direct'] = test_predictions_direct
    test_df['predictions'] = test_predictions
    test_df['predictions'] = test_df['predictions'].apply(str)
    test_df['predictions_direct'] = test_df['predictions_direct'].apply(str)
    test_df.head()

| |name|label|predictions|predictions_direct|
|---|---|---|---|---|
|45|000046.jpg|AM General Hummer SUV 2000|AM General Hummer SUV 2000|AM General Hummer SUV 2000|
|46|000047.jpg|AM General Hummer SUV 2000|AM General Hummer SUV 2000|AM General Hummer SUV 2000|
|47|000048.jpg|AM General Hummer SUV 2000|AM General Hummer SUV 2000|AM General Hummer SUV 2000|
|48|000049.jpg|AM General Hummer SUV 2000|AM General Hummer SUV 2000|AM General Hummer SUV 2000|
|49|000050.jpg|AM General Hummer SUV 2000|AM General Hummer SUV 2000|AM General Hummer SUV 2000|

    # different predictions
    test_df[test_df['predictions'] != test_df['predictions_direct']].shape[0]
    > 5922

    # accuracy with predict iteration
    (test_df['label'] == test_df['predictions']).sum()/test_df.shape[0]
    > 0.5743066782738465

    # accuracy with get_preds
    (test_df['label'] == test_df['predictions_direct']).sum()/test_df.shape[0]
    > 0.1708742693694814

Is your class_names same as the one created by fastai?


Thanks, class_names and data.classes were not in the same order. I’m looking into why.

test_predictions_direct = [data.classes[int(x)] for x in label_index] fixed the issue.
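To illustrate why the class list matters, here is a minimal sketch with made-up labels. It assumes the hand-built class_names followed the order in which labels first appeared, while the library's list is sorted; both orderings are assumptions for illustration, since the thread does not say how class_names was built.

```python
# Hypothetical labels in the order they appear in a training table.
labels_in_file_order = [
    "Ford F-150", "Audi TT", "Ford F-150", "AM General Hummer", "Audi TT",
]

# Hand-built list: order of first appearance (dict keeps insertion order).
class_names = list(dict.fromkeys(labels_in_file_order))

# Assumed library behaviour: classes kept in sorted order, as data.classes might be.
library_classes = sorted(set(labels_in_file_order))

# The same predicted indices decode differently depending on the list used.
label_index = [0, 1, 2]
decoded_by_hand = [class_names[i] for i in label_index]
decoded_by_library = [library_classes[i] for i in label_index]

mismatches = sum(a != b for a, b in zip(decoded_by_hand, decoded_by_library))
print(decoded_by_hand)     # ['Ford F-150', 'Audi TT', 'AM General Hummer']
print(decoded_by_library)  # ['AM General Hummer', 'Audi TT', 'Ford F-150']
print(mismatches)          # 2
```

Decoding with the list the learner actually used avoids this kind of silent mislabeling.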


I suppose it depends on what gets loaded first in the databunch: the sequence of classes is then inferred from what it sees first. There’s no guarantee that the databunch loads 000001.jpg followed by 000002.jpg and so on, which is why I sorted my test items in my previous response.
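As a small illustration of the sorting step, the sketch below orders a made-up file listing by stem so that the test items always come out in the same sequence. The file names are hypothetical and only mimic the zero-padded names in the dataset.

```python
from pathlib import Path

# Hypothetical directory listing, returned in arbitrary order.
test_imgs = [
    Path("cars_test/000010.jpg"),
    Path("cars_test/000002.jpg"),
    Path("cars_test/000001.jpg"),
]

# Sort by file name without extension; zero-padded names sort in numeric order.
test_imgs_sorted = sorted(test_imgs, key=lambda p: p.stem)

names = [p.name for p in test_imgs_sorted]
print(names)  # ['000001.jpg', '000002.jpg', '000010.jpg']
```

If the names were not zero-padded, a numeric key such as int(p.stem) would be needed to get the same order.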

You can check this thread out.

Ordering of output

Consider predictions calculated as:

    predictions = learn.get_preds(ds_type=DatasetType.Test)

Do they follow the same order as these items?

    data.test_ds.items

Here data is an ImageDataBunch.

Details on data and learn below:

    train, test = [ImageList.from_df(df, path=path, cols='id', folder=folder, suffix='.jpg')
                   for df, folder in zip([df_train_sub, df_test_sub], ['train_images', 'test_images'])]

    data = (train.split_by_rand_pct(0.2, seed=123)
            .label_from_df(cols='category_id_name')
            .add_test(test)
            .transform(get_transforms(), size=64)
            .databunch(path=Path('.'), bs=64).normalize())

    learn = cnn_learner(data, models.resnet34)
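The excerpt above does not answer the ordering question. If the predictions do come back in the order of data.test_ds.items (an assumption worth checking on a few images with learn.predict), they can be paired with the items by position. The sketch below uses made-up stand-ins for the items, the argmax indices and data.classes; pair_items_with_labels is a hypothetical helper, not a fastai function.

```python
def pair_items_with_labels(items, label_indices, classes):
    """Pair each item with its decoded class name, matching by position."""
    if len(items) != len(label_indices):
        raise ValueError("items and predictions differ in length")
    return [(str(item), classes[i]) for item, i in zip(items, label_indices)]


# Hypothetical stand-ins for data.test_ds.items, argmax indices and data.classes.
items = ["test_images/a.jpg", "test_images/b.jpg", "test_images/c.jpg"]
label_indices = [2, 0, 1]
classes = ["cat", "dog", "fox"]

rows = pair_items_with_labels(items, label_indices, classes)
for name, label in rows:
    print(name, label)
```

The length check catches the case where the test set attached to the learner is not the one the item list came from.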