Loading test data into memory before predicting

melonkernel · November 9, 2018, 9:06am

Currently, when I want to make a CSV submission to Kaggle and have the CSV loaded as a dataframe, I run:

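def applied_fn(s):
    # s is a raw row array: s[0] holds the image id, s[1] receives the prediction
    s[1] = predict_this_id(s[0])
    return s

# apply returns a new dataframe, so keep the result
df_predict = df_predict.apply(applied_fn, axis=1, raw=True)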

Here predict_this_id loads the image from its path on disk and then runs the prediction, one image at a time, which makes it rather slow.

I was wondering whether fastai v1 has a tool for this, one that loads all the images, or loads them in bunches, so that the predictions can then be done from memory. Or does it not make much difference on an SSD anyway?
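
To make the "in bunches" idea concrete, here is a minimal sketch in plain Python. It is not fastai API: load_image and predict_batch are hypothetical callables standing in for whatever reads an image from disk and whatever runs the model on a list of images, and the batch size of 64 is an arbitrary assumption.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def predict_in_batches(
    paths: Iterable[str],
    load_image: Callable[[str], Any],            # hypothetical: reads one image from disk
    predict_batch: Callable[[List[Any]], list],  # hypothetical: one model call per batch
    batch_size: int = 64,                        # assumed value, tune to available memory
) -> list:
    """Load images one batch at a time and predict on each batch from memory."""
    predictions: list = []
    for batch_paths in batched(paths, batch_size):
        images = [load_image(p) for p in batch_paths]  # disk reads for this batch only
        predictions.extend(predict_batch(images))      # predictions stay in input order
    return predictions


# Stand-ins so the sketch runs without a model or any image files.
fake_paths = [f"img_{i}.png" for i in range(5)]
result = predict_in_batches(
    fake_paths,
    load_image=len,                                    # pretend the "image" is the path length
    predict_batch=lambda imgs: [x * 2 for x in imgs],  # pretend model
    batch_size=2,
)
```

The predictions come back in the same order as the paths, so they could be assigned to the dataframe column in one step instead of row by row. Whether this beats the per-row version depends on how much the model gains from batched calls.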

Checking GitHub before posting this, I realized that there is 'get_image_files' in fastai.vision.data, which loads a FilePathList using get_files from fastai.data_block.

How does fastai load images into memory when creating the databunch? Although I guess that loads them into GPU memory…

sgugger · November 9, 2018, 1:14pm

You should use a test set if you want to predict on a lot of images; it’s there for that purpose.

melonkernel · November 12, 2018, 11:52am

Ah, of course.