What's the fastest way to make predictions on a directory of images?

Suppose I trained an image classifier and I want to use it to make predictions:

path = '/path/to/my/model'
pred_dir = '/path/to/directory/full/of/images/i/want/to/predict'

First, load the model:

data = ImageDataBunch.from_folder(path, ds_tfms=get_transforms(), size=224).normalize(imagenet_stats)
learn = create_cnn(data, models.resnet34, metrics=accuracy)
learn.load("model1000v1")

The following works, but it is quite awkward since learn.predict only makes a prediction on a single image at a time:

preds = []
for f in Path(pred_dir).iterdir():
    _,x,_ = learn.predict(open_image(f))
    preds.append(int(x))

I tried the following but I only got 64 predictions (I assume 64 is the default batch size):

pred_files = list(Path(pred_dir).iterdir())
imglist = ImageItemList(items=pred_files)

I also tried the following and got 4319 predictions, even though there are only 79 images in the directory:

pred_files = list(Path(pred_dir).iterdir())
imglist = ImageItemList(items=pred_files)
len(pred_files) #answer: 79
preds = learn.get_preds(imglist)
preds[1].shape #answer: 4319

Any ideas?

You can’t pass a Dataset to get_preds; it expects a DatasetType. If you have multiple predictions to make, you should just use the test set, as documented here.
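
A minimal sketch of that approach, assuming the fastai v1 API used above, the pred_dir placeholder from the question, and the usual fastai star-import for DatasetType:

learn.data.add_test(ImageItemList.from_folder(pred_dir))  # attach the directory of images as the test set
preds, _ = learn.get_preds(ds_type=DatasetType.Test)      # pass a DatasetType, not a dataset
pred_classes = preds.argmax(dim=1)                         # one predicted class index per image

Alternatively, if you know the prediction folder up front, ImageDataBunch.from_folder also takes a test argument, so the test set can be created when the DataBunch is built.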

When you set a test dataset as per the docs, are the images given the same set of transforms (I’m mainly concerned that the image size is correct) and the same normalization?

If not, what is the best way to perform predictions on a large set of images while also resizing and normalizing?

They get the same transforms as the validation set, yes.
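
If you want to double-check, something along these lines (a quick sketch using standard fastai v1 calls, not from the original answer) shows the shape and statistics of one test batch:

x, _ = learn.data.one_batch(ds_type=DatasetType.Test, denorm=False)  # keep the batch normalized
x.shape            # expect something like torch.Size([64, 3, 224, 224]) with the default batch size
x.mean(), x.std()  # roughly zero mean / unit std after normalize(imagenet_stats)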


Great – thanks!