What's the fastest way to make predictions on a directory of images?

Suppose I trained an image classifier and I want to use it to make predictions:

path = '/path/to/my/model'
pred_dir = '/path/to/directory/full/of/images/i/want/to/predict'

First, load the model:

data = ImageDataBunch.from_folder(path, ds_tfms=get_transforms(), size=224).normalize(imagenet_stats)
learn = create_cnn(data, models.resnet34, metrics=accuracy)
learn.load("model1000v1")

The following works, but it is quite awkward since learn.predict only makes a prediction on a single image at a time:

preds = []
for f in Path(pred_dir).iterdir():
    _,x,_ = learn.predict(open_image(f))
    preds.append(int(x))

I tried the following but I only got 64 predictions (I assume 64 is the default batch size):

pred_files = list(Path(pred_dir).iterdir())
imglist = ImageItemList(items=pred_files)

I also tried the following and got 4319 predictions, even though there are only 79 images in the directory:

pred_files = list(Path(pred_dir).iterdir())
imglist = ImageItemList(items=pred_files)
len(pred_files) #answer: 79
preds = learn.get_preds(imglist)
preds[1].shape #answer: 4319

Any ideas?

You can’t pass a Dataset to get_preds; it expects a DatasetType. If you have multiple predictions to make, you should just use the test set, as documented here.
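
A minimal sketch of that approach, assuming the fastai v1 API used above, the pred_dir placeholder from the question, and the usual fastai star-import for DatasetType:

learn.data.add_test(ImageItemList.from_folder(pred_dir))  # attach the directory of images as the test set
preds, _ = learn.get_preds(ds_type=DatasetType.Test)      # pass a DatasetType, not a dataset
pred_classes = preds.argmax(dim=1)                         # one predicted class index per image

Alternatively, if you know the prediction folder up front, ImageDataBunch.from_folder also takes a test argument, so the test set can be created when the DataBunch is built.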

When you set a test dataset as per the docs, are the images given the same set of transforms (I’m mainly concerned that the image size is correct) and the same normalization?

If not, what is the best way to perform predictions on a large set of images while also resizing and normalizing?

They get the same transforms as the validation set, yes.
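
If you want to double-check, something along these lines (a quick sketch using standard fastai v1 calls, not from the original answer) shows the shape and statistics of one test batch:

x, _ = learn.data.one_batch(ds_type=DatasetType.Test, denorm=False)  # keep the batch normalized
x.shape            # expect something like torch.Size([64, 3, 224, 224]) with the default batch size
x.mean(), x.std()  # roughly zero mean / unit std after normalize(imagenet_stats)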


Great – thanks!