Batch inference on labelled test set w/ fastaiv2

tleyden · October 10, 2020, 7:45pm

I’m trying to do batch prediction on a CSV file with ground truth labels and image filenames. The csv looks like this:

It’s from the german traffic signs dataset, and the filenames are names of image files to do inference on, and the class ids are the types of signs (eg, 14 == stop sign).

I’m looking for the best fastai function(s) to do the following steps:

I want to do inference on an already trained model, and would rather not have to retrain the model and pass in the labelled test set as an additional dataset to the learner (assuming that’s even possible).

This code snippet “works” but isn’t efficient at all:

num_correct_preds = 0
num_wrong_preds = 0

for row in df.itertuples():
    test_img = testing_path/row.Filename
    img = PILImage.create(test_img)
    class_id, _, probs = learn_inf.predict(img)
    class_id, probs
    is_correct = int(class_id) == int(row.ClassId)
    if is_correct:
        num_correct_preds += 1
    else:
        num_wrong_preds += 1

I’ve looked at the docs for both Learner.validate() and Learner.get_preds() but it’s not immediately clear how to use either of these for this task. In particular, how would I set up the DataLoader object to pass into Learner.get_preds()? The ImageDataLoaders.from_* API calls all seem to be geared towards training, because they expect to be told how to do a training/validation split, which isn’t needed for batch inference on a labelled test set.

I started to look into the lower-level Datablock API to create a DataLoader from that, but haven’t had any luck so far. Is Datablock + Learner.get_preds() the best approach here? I don’t need to do the inference and calculate the accuracy in one single step here, I’m mainly interest in doing the batch inference, and I could calculate the accuracy metric separately.