Fastai v2 Recipes (Tips and Tricks) - Wiki

farid · April 13, 2020, 1:35pm

Inference (Prediction)

This post describes how to get predictions from a test dataset, pretty-print them, and plot the corresponding confusion matrix when the test dataset has labels.

First of all, I would like to point out that this post summarizes several posts that I gathered from the forum. Credit therefore goes to the original contributors: @sgugger, @VishnuSubramanian, @sut, @chengwliu, @LessW2020, @vijayabhaskar, and @muellerzr. If I missed any contributor, please DM me and I will update the list.

The Learner's get_preds(dl=dl_object) method expects a DataLoader object, so we need to create a test DataLoader. There are 2 ways to create one:

Option 1: Creating a test loader at the same time as the train and valid DataLoader objects
Splits are used in Datasets, TfmdLists, and DataBlock. They split a dataset (or a list of items) into several chunks called subsets. If we split our dataset into 3 subsets, we end up with the following:

1- subset(0): the train dataset, which has the alias `train`
2- subset(1): the valid dataset, which has the alias `valid`
3- subset(2): the test dataset, which has no alias
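
To make the mapping from splits to subsets concrete, here is a minimal pure-Python sketch. It does not use fastai: `make_subsets` is a hypothetical helper, and the three index lists simply stand in for whatever a splitter would produce.

```python
# Illustration only: plain Python, not the fastai implementation.
# `make_subsets` is a hypothetical helper name introduced for this sketch.

def make_subsets(items, splits):
    """Return one list of items per split (each split is a list of indices)."""
    return [[items[i] for i in idxs] for idxs in splits]

items = ["img0.jpg", "img1.jpg", "img2.jpg", "img3.jpg", "img4.jpg", "img5.jpg"]
splits = [[0, 1, 2], [3, 4], [5]]  # train, valid, and test indices

subsets = make_subsets(items, splits)
# Positions 0, 1, 2 mirror subset(0), subset(1), subset(2)
train, valid, test = subsets
```

The point is only that the third split has a position (index 2) but no name, which is why it has to be reached by index.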

If we create a DataLoaders object called dls, it can be indexed like an array and holds the following elements:

1- dls[0], which has the alias `dls.train`, is the `train` DataLoader
2- dls[1], which has the alias `dls.valid`, is the `valid` DataLoader
3- dls[2], which has no alias, is the `test` DataLoader

Therefore, the test DataLoader is dls[2].
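
The indexing and alias behaviour can be pictured with a toy class. This is a sketch under stated assumptions, not fastai's DataLoaders class: `ToyDataLoaders` is a hypothetical name, and plain strings stand in for real DataLoader objects.

```python
# Illustration only: a toy stand-in, not fastai's DataLoaders class.
class ToyDataLoaders:
    def __init__(self, *loaders):
        self.loaders = list(loaders)

    def __getitem__(self, i):
        # Index access, like dls[0], dls[1], dls[2]
        return self.loaders[i]

    @property
    def train(self):
        # Alias for the first loader
        return self[0]

    @property
    def valid(self):
        # Alias for the second loader
        return self[1]

toy_dls = ToyDataLoaders("train_dl", "valid_dl", "test_dl")
third = toy_dls[2]  # no alias exists for this one, so we index it
```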

Option 2: Creating a test loader after creating the DataLoaders object dls
In this case, we assume 2 splits, which give us the train and valid DataLoader objects described above.
In this example, we use the vision module to illustrate how to create a test DataLoader (we assume that our test data have labels, hence the with_labels=True argument):

test_files = get_image_files('/path/to/test/data')
test_dl = learn.dls.test_dl(test_files, with_labels=True) # see the **Note** below

Once we have a test DataLoader object (either dls[2] or test_dl), we can pass it to the Learner's get_preds() method. Below we use the test_dl object obtained in Option 2; we could have used dls[2] had we opted for Option 1.

In this example, we get the predictions and pretty-print them by displaying the predicted class, the confidence percentage, and the image name:

# get_preds returns a tuple; preds[0] holds one row of class scores per test item
preds = learn.get_preds(dl=test_dl)
for index, item in enumerate(preds[0]):
    # map the index of the highest score back to its class name
    prediction = dls.categorize.decode(np.argmax(item)).upper()
    confidence = float(max(item))
    print(f"Prediction: {prediction} - Confidence: {confidence*100:.2f}% - Image: {test_dl.items[index].name}")
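
The formatting logic of that loop can be tried without a trained model. The sketch below is an illustration only: plain Python lists stand in for the tensors returned by get_preds(), and `format_predictions`, `vocab`, `probs`, and `file_names` are hypothetical names with made-up values.

```python
# Illustration only: plain lists stand in for the tensors returned by get_preds().
# `format_predictions` is a hypothetical helper introduced for this sketch.

def format_predictions(probs, vocab, file_names):
    """Build one summary line per item: class name, confidence, file name."""
    lines = []
    for row, name in zip(probs, file_names):
        # argmax: index of the highest probability in this row
        best = max(range(len(row)), key=row.__getitem__)
        lines.append(
            f"Prediction: {vocab[best].upper()} - "
            f"Confidence: {row[best] * 100:.2f}% - Image: {name}"
        )
    return lines

vocab = ["cat", "dog"]                 # made-up class names
probs = [[0.9, 0.1], [0.25, 0.75]]     # made-up per-class probabilities
file_names = ["a.jpg", "b.jpg"]        # made-up file names

lines = format_predictions(probs, vocab, file_names)
for line in lines:
    print(line)
```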

As a bonus, we can also store the test_dl object in the DataLoaders object dls as a second validation DataLoader like this:

dls.loaders.append(test_dl)

and then use it to display the corresponding confusion matrix like this:

interp = ClassificationInterpretation.from_learner(learn, ds_idx=2)
interp.plot_confusion_matrix()
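
For readers who want to see what a confusion matrix tabulates, here is a minimal pure-Python sketch. It is not how ClassificationInterpretation computes it internally; `confusion_matrix` is a hypothetical helper, the label lists are made up, and the rows-are-actual, columns-are-predicted layout is a convention chosen for this sketch.

```python
# Illustration only: counts (actual, predicted) pairs with plain lists.
# `confusion_matrix` is a hypothetical helper introduced for this sketch.

def confusion_matrix(actual, predicted, n_classes):
    """Return an n_classes x n_classes table: rows = actual, columns = predicted."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        cm[a][p] += 1
    return cm

actual = [0, 0, 1, 1, 1]      # made-up true class indices
predicted = [0, 1, 1, 1, 0]   # made-up predicted class indices

cm = confusion_matrix(actual, predicted, n_classes=2)
```

Diagonal cells count correct predictions; off-diagonal cells count the mistakes. This is also why the test DataLoader needs labels for this step.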

Note: test_dl can be created using either of these 2 equivalent calls:

test_dl = learn.dls.test_dl(test_files, with_labels=True)

or

test_dl = test_dl(learn.dls, test_files, with_labels=True)

We can do that because test_dl() is defined with the following @patch decorator (source code):

@patch
def test_dl(self:DataLoaders, test_items, rm_type_tfms=None, with_labels=False, **kwargs):
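
The idea behind such a decorator can be sketched in a few lines. This is a simplified stand-in, not fastcore's actual implementation: `toy_patch`, `ToyLoaders`, and `count_items` are hypothetical names, and the sketch assumes the decorator returns the plain function so that both call styles work.

```python
# Illustration only: a simplified stand-in for the idea behind @patch,
# not fastcore's actual implementation.

def toy_patch(f):
    """Attach `f` as a method to the class that annotates its first parameter."""
    first_param = next(iter(f.__annotations__))
    cls = f.__annotations__[first_param]
    setattr(cls, f.__name__, f)
    return f  # assumption of this sketch: the plain function stays callable too

class ToyLoaders:
    def __init__(self, items):
        self.items = items

@toy_patch
def count_items(self: ToyLoaders, extra=0):
    return len(self.items) + extra

loaders = ToyLoaders(["a", "b"])
as_method = loaders.count_items(extra=1)      # method form
as_function = count_items(loaders, extra=1)   # function form, same result
```

Whether the module-level function form remains callable may depend on the fastcore version in use, so the method form `learn.dls.test_dl(...)` is the safer one to rely on.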