How are you going about predicting categories for images in a test directory? One option is to loop over the files and grab them one by one, but that seems slow. A better idea is to use a generator and then let the files flow through it in batches. A test directory has no subdirectories (we don’t know the classes), but the Keras directory-generator method does require subdirectories. As a quick fix, I copy the test folder into a pretest folder and point the generator at pretest, and it does work well.
But is there a less hackish way to efficiently compute a large number of predictions from a directory? (I’m on lesson 4, having a try at the fisheries competition.) I also want to avoid loading everything into an X_test array and feeding that to the model, as I’d guess that would be less efficient.
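For what it’s worth, the batching idea itself is independent of Keras: it’s just chunking the file list. A minimal sketch (the filenames here are made-up stand-ins, not real competition files):

```python
def batches(paths, batch_size):
    """Yield successive fixed-size batches of file paths."""
    for i in range(0, len(paths), batch_size):
        yield paths[i:i + batch_size]

# stand-in filenames; in practice you'd use glob('test/*.jpg')
paths = ['img%d.jpg' % i for i in range(10)]

for batch in batches(paths, 4):
    print(batch)  # batches of 4, 4, then 2 paths
```

Each batch can then be read, preprocessed, and passed to the model before the next one is loaded, so the whole test set never sits in memory at once.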
Try dask array and dask delayed for processing images into NumPy arrays.
For prediction, you don’t need to worry about augmentation (usually).
You can actually run model.predict(X) directly on a dask array, IIRC.
That said, it’s a lot easier just to create a subfolder.
import dask.array as da
import numpy as np
from dask.delayed import delayed
from glob import glob
# note: imread/imresize were removed from scipy.misc in newer SciPy
# releases; on recent installs use imageio and PIL/skimage instead
from scipy.misc import imread, imresize

sharks = glob('/data/the-nature-conservancy-fisheries-monitoring/train/SHARK/*')

@delayed
def preproc(fp):
    # read one image and resize it to 256x256
    img = imread(fp)
    # cast to float32 so the result matches the dtype declared below
    return imresize(arr=img, size=(256, 256)).astype(np.float32)

# one lazy (256, 256, 3) chunk per file; nothing is read yet
dsets = [da.from_delayed(preproc(fp), shape=(256, 256, 3), dtype=np.float32)
         for fp in sharks]

# you can now treat this almost like a normal array,
# e.g. in model.predict()
a = da.stack(dsets)
a.shape
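If you’d rather not hand the whole array to the model at once, prediction over a stacked array can also be chunked by hand. A sketch of that idea below uses plain NumPy and a dummy scoring function standing in for model.predict (predict_in_batches and dummy_predict are hypothetical names, not Keras or dask API):

```python
import numpy as np

def predict_in_batches(predict_fn, X, batch_size=32):
    """Run predict_fn over X in fixed-size chunks and concatenate the results."""
    preds = [predict_fn(X[i:i + batch_size])
             for i in range(0, len(X), batch_size)]
    return np.concatenate(preds)

# dummy stand-in for model.predict: one mean score per image
def dummy_predict(X):
    return X.reshape(len(X), -1).mean(axis=1)

X = np.random.rand(10, 256, 256, 3).astype(np.float32)
out = predict_in_batches(dummy_predict, X, batch_size=4)
```

With a dask array you would slice lazily the same way, calling .compute() on each chunk just before passing it to the model, so only one batch of images is materialised at a time.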