How to make predictions on a list of "new" images


In lesson 1 (breeds), we are given a way to make predictions on individual images towards the end of the Jupyter notebook, as follows:

Method 1.
trn_tfms, val_tfms = tfms_from_model(arch, sz)
ds = FilesIndexArrayDataset([fn], np.array([0]), val_tfms, PATH)
dl = DataLoader(ds)
preds = learn.predict_dl(dl)

Method 2.
trn_tfms, val_tfms = tfms_from_model(arch, sz)
im = val_tfms(open_image(PATH + fn)) # open_image() returns numpy.ndarray
preds = learn.predict_array(im[None])

I am wondering if there is a way to generalize the approach to make predictions on a list of images and not just a single one.

I know I could accomplish the task by making a function and looping through all the images I want to predict on, but it seems to me that the fastai library is smarter than that and probably already has a method to do what I want.
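For reference, the loop-based fallback mentioned above could look roughly like this. It is only a sketch: predict_batch is a hypothetical stand-in for whatever call produces predictions for a batch of files, not a fastai function, and the helper names are made up for illustration.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_in_batches(
    items: Sequence,
    predict_batch: Callable[[Sequence], List],
    batch_size: int = 32,
) -> List:
    """Call predict_batch on each chunk and collect the results in input order."""
    results: List = []
    for batch in chunked(items, batch_size):
        results.extend(predict_batch(batch))
    return results


# Stand-in "model" that returns the length of each file name (illustration only).
file_names = ["a.jpg", "bb.jpg", "ccc.jpg"]
lengths = predict_in_batches(file_names, lambda batch: [len(fn) for fn in batch], batch_size=2)
```

Batching this way keeps the predictions in the same order as the input list, which makes it easy to pair them with file names afterwards.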

Any advice?




Anyone? @jeremy @rachel Thanks, guys.

In the example below, I have two folders, train and test, containing the images. The argument test=test_folder specifies the folder in which to look for the test images.

path = ''
train_folder = 'train'
test_folder = 'test'

data = ImageDataBunch.from_df(path=path, df=df_train, folder=train_folder, test=test_folder, ds_tfms=get_transforms(), size=224, bs=10)

After you are done training and validating the model, you do this:

log_preds, y = learn.TTA()
accuracy(log_preds, y)

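log_preds_test = learn.get_preds(is_test=True)

# Compute probabilities before taking the argmax (assumes the outputs are log-probabilities, as the name suggests)
probs = np.exp(log_preds_test[0])
pred_idxs = np.argmax(log_preds_test[0], axis=1)
preds_classes = [data.classes[i] for i in pred_idxs]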

submission = pd.DataFrame({'imgname': os.listdir('test'), 'label': preds_classes})
submission.to_csv('test_classification_results.csv', index=False)

Please note that the above example is a multi-class classification (one label per image, chosen by argmax). The generated CSV file will have two columns, ‘imgname’ and ‘label’.
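One thing to watch when building the CSV: os.listdir returns names in arbitrary order, so it is safer to pair predictions with the same file list the predictions were computed from. Here is a minimal sketch of that pairing in plain Python. The function names and the idea of passing the file list explicitly are assumptions for illustration, not part of fastai.

```python
import csv
import io


def label_rows(filenames, scores, classes):
    """Pair each file name with the class that has the highest score."""
    if len(filenames) != len(scores):
        raise ValueError("need exactly one row of scores per file name")
    rows = []
    for name, row in zip(filenames, scores):
        best = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        rows.append({"imgname": name, "label": classes[best]})
    return rows


def to_csv(rows):
    """Render the rows as CSV text with the two columns used in the post."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["imgname", "label"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Toy log-probabilities for two images and two classes (made-up numbers).
rows = label_rows(["a.jpg", "b.jpg"], [[-0.1, -2.3], [-3.0, -0.05]], ["cat", "dog"])
csv_text = to_csv(rows)
```

The length check fails fast if the file list and the predictions ever get out of sync, which is the mistake that is otherwise hard to spot in the final CSV.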

Thanks @AjitB. However, I think that in your suggested approach, the “test” images in the learn object are the images used during training. In my case, I want to predict on a new set of images which were not available during training, so they were never passed to the learn object.

I did figure out a way to do what I want, which essentially is to create a new data object as follows:

def get_data2(sz, bs): # sz: image size, bs: batch size
    tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
    data = ImageClassifierData.from_csv(PATH, 'train', f'{PATH}labels.csv', test_name='myTest',
                                   val_idxs=val_idxs, suffix='.jpg', tfms=tfms, bs=bs)
    return data
data2 = get_data2(sz, bs)
trn_tfms, val_tfms = tfms_from_model(arch, sz)
ds = data2.test_ds
dl = DataLoader(ds)
preds = learn.predict_dl(dl)

The key change in this code is that the test_name argument inside the get_data2 function now points to a new directory containing my new images.

One “problem” with this approach is that I am not applying TTA to the new predictions, which is not ideal. I am still trying to figure out how to do that. Any thoughts?

Does anyone know how to update the learn object without training? In essence, I want to update the test images in it to my new set of images, so that I can call learn.TTA to make predictions?


I figured it out…
It is just a matter of calling learn.set_data(get_data2(sz, bs)), with the get_data2 function pointing to the directory that holds the new test data, and then following the process detailed in the notebook, that is:

log_preds, y = learn.TTA(is_test=True)
probs = np.mean(np.exp(log_preds),0)
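
In case the second line looks cryptic: it converts the log-probabilities back to probabilities and averages them over the augmented copies of each image. Below is a plain-Python sketch of the same computation, written without NumPy so the axis being averaged is explicit. It assumes log_preds is indexed as [augmentation][image][class], which is what averaging over axis 0 implies; the function name is made up.

```python
import math


def average_tta(log_preds):
    """Average class probabilities over augmentations.

    log_preds[a][i][c] is the log-probability of class c for image i
    under augmentation a. Returns probs[i][c].
    """
    n_aug = len(log_preds)
    n_images = len(log_preds[0])
    n_classes = len(log_preds[0][0])
    return [
        [
            sum(math.exp(log_preds[a][i][c]) for a in range(n_aug)) / n_aug
            for c in range(n_classes)
        ]
        for i in range(n_images)
    ]


# Two augmentations of a single image, two classes (made-up numbers).
toy_log_preds = [
    [[math.log(0.8), math.log(0.2)]],
    [[math.log(0.6), math.log(0.4)]],
]
toy_probs = average_tta(toy_log_preds)
```

Note that the exponential is taken before the mean, so the average is over probabilities rather than over log-probabilities.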

Hope this helps someone else in the future …



Hi @reyemarr,
I have verified that the images in the test folder are not used for training or validation.
It can be confirmed by looking at the values of the following:


len(data.train_ds) + len(data.valid_ds) = Number of images in the train folder
len(data.test_ds) = Number of images in the test folder
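
The same check can be scripted against the folders on disk. This is a rough sketch under some assumptions: images sit directly inside each folder, the suffix list covers your files, and the dataset lengths are passed in as plain integers (for example from len(data.train_ds)). The helper names are hypothetical.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of image extensions


def image_names(folder):
    """Return the set of image file names directly inside folder."""
    return {
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    }


def check_split(train_folder, test_folder, n_train, n_valid, n_test):
    """Compare dataset sizes with folder contents and look for shared file names."""
    train_names = image_names(train_folder)
    test_names = image_names(test_folder)
    return {
        "train_count_matches": len(train_names) == n_train + n_valid,
        "test_count_matches": len(test_names) == n_test,
        "no_shared_names": train_names.isdisjoint(test_names),
    }


# Tiny demo on throwaway folders: three training images, one test image.
with tempfile.TemporaryDirectory() as root:
    train_dir, test_dir = Path(root, "train"), Path(root, "test")
    train_dir.mkdir()
    test_dir.mkdir()
    for name in ("a.jpg", "b.jpg", "c.jpg"):
        (train_dir / name).touch()
    (test_dir / "d.jpg").touch()
    report = check_split(train_dir, test_dir, n_train=2, n_valid=1, n_test=1)
```

Matching names only show that no file was copied under the same name into both folders; they say nothing about duplicate images saved under different names.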

Definition of the ‘class ImageDataBunch’ can be found here:

The definition of the fit method can be found here: