How are you going about predicting categories for images in a test directory? One option is to loop over the files and grab them one by one, but that seems slow. A better idea is to use a generator and then let the files flow through it in batches. A test directory has no subdirectories (we don’t know the classes), but the Keras directory-generator method does require subdirectories. As a quick fix, I copy the test folder into a pretest folder and point the generator at pretest, and it does work well.
But is there a less hackish way to efficiently compute a large number of predictions from a directory? (I’m on lesson 4, having a try at the fisheries competition.) I also want to avoid loading everything into an X_test array and feeding that to the model, as I’d guess that would be less efficient.
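For what it’s worth, the batching idea itself is independent of Keras: it’s just chunking the file list. A minimal sketch (the filenames here are made-up stand-ins, not real competition files):

```python
def batches(paths, batch_size):
    """Yield successive fixed-size batches of file paths."""
    for i in range(0, len(paths), batch_size):
        yield paths[i:i + batch_size]

# stand-in filenames; in practice you'd use glob('test/*.jpg')
paths = ['img%d.jpg' % i for i in range(10)]

for batch in batches(paths, 4):
    print(batch)  # batches of 4, 4, then 2 paths
```

Each batch can then be read, preprocessed, and passed to the model before the next one is loaded, so the whole test set never sits in memory at once.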
Try dask array and dask delayed for processing images into NumPy arrays.
For prediction, you don’t need to worry about augmentation (usually).
You can actually run model.predict(X) directly on a dask array, IIRC.
That said, it’s a lot easier just to create a subfolder.
import dask.array as da
import numpy as np
from dask.delayed import delayed
from glob import glob
# note: imread/imresize were removed from scipy.misc in newer SciPy
# releases; on recent installs use imageio and PIL/skimage instead
from scipy.misc import imread, imresize

sharks = glob('/data/the-nature-conservancy-fisheries-monitoring/train/SHARK/*')

@delayed
def preproc(fp):
    # read one image and resize it to 256x256
    img = imread(fp)
    # cast to float32 so the result matches the dtype declared below
    return imresize(arr=img, size=(256, 256)).astype(np.float32)

# one lazy (256, 256, 3) chunk per file; nothing is read yet
dsets = [da.from_delayed(preproc(fp), shape=(256, 256, 3), dtype=np.float32)
         for fp in sharks]

# you can now treat this almost like a normal array,
# e.g. in model.predict()
a = da.stack(dsets)
a.shape
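If you’d rather not hand the whole array to the model at once, prediction over a stacked array can also be chunked by hand. A sketch of that idea below uses plain NumPy and a dummy scoring function standing in for model.predict (predict_in_batches and dummy_predict are hypothetical names, not Keras or dask API):

```python
import numpy as np

def predict_in_batches(predict_fn, X, batch_size=32):
    """Run predict_fn over X in fixed-size chunks and concatenate the results."""
    preds = [predict_fn(X[i:i + batch_size])
             for i in range(0, len(X), batch_size)]
    return np.concatenate(preds)

# dummy stand-in for model.predict: one mean score per image
def dummy_predict(X):
    return X.reshape(len(X), -1).mean(axis=1)

X = np.random.rand(10, 256, 256, 3).astype(np.float32)
out = predict_in_batches(dummy_predict, X, batch_size=4)
```

With a dask array you would slice lazily the same way, calling .compute() on each chunk just before passing it to the model, so only one batch of images is materialised at a time.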