Inference (Prediction)
This post describes how to get predictions for a test dataset, pretty-print them, and plot the corresponding confusion matrix when the test dataset has labels.
First of all, I would like to point out that this post is a summary of several posts that I gathered in the forum. The credit therefore goes to the original contributors: @sgugger, @VishnuSubramanian, @sut, @chengwliu, @LessW2020, @vijayabhaskar, and @muellerzr. If I missed any contributor, please DM me and I will update the list.
The Learner's get_preds(dl=dl_object) method expects a DataLoader object, so we need to create a test DataLoader. There are 2 options for creating one:
Option 1: Creating a test loader at the same time as the train and valid DataLoaders object
Splits are used in Datasets, TfmdLists, and DataBlock. They split a dataset (or a list of items) into several chunks called subsets. If we split our dataset into 3 subsets, we end up with the following:
1- subset(0): the train dataset, with the alias `train`
2- subset(1): the valid dataset, with the alias `valid`
3- subset(2): the test dataset, which has no alias
If we create a DataLoaders object called dls from these 3 subsets, it can be indexed like a list and holds the following elements:
1- dls[0], with the alias `dls.train`, is the `train` DataLoader
2- dls[1], with the alias `dls.valid`, is the `valid` DataLoader
3- dls[2], which has no alias, is the `test` DataLoader
Therefore, our test DataLoader is dls[2].
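To make the idea of 3 splits concrete, here is a minimal plain-Python sketch that builds three lists of indices. The function name three_way_split and its percentages are my own assumptions for illustration, not part of fastai. In fastai, index lists of this shape would typically be handed to Datasets through its splits argument.

```python
import random

def three_way_split(n_items, valid_pct=0.2, test_pct=0.1, seed=42):
    """Return three disjoint index lists (train, valid, test) covering range(n_items)."""
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)   # seeded shuffle so the split is reproducible
    n_test = round(n_items * test_pct)
    n_valid = round(n_items * valid_pct)
    test = idxs[:n_test]
    valid = idxs[n_test:n_test + n_valid]
    train = idxs[n_test + n_valid:]
    # Order matters: position 0 is train, 1 is valid, 2 is test
    return train, valid, test

splits = three_way_split(100)
print([len(s) for s in splits])  # [70, 20, 10]
```

The order of the returned lists is what decides which subset ends up as subset(0), subset(1), and subset(2).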
Option 2: Creating a test loader after creating the DataLoaders dls object
In this case, we assume 2 splits, which give us the train and valid DataLoader objects described above.
In this example, we will use the vision module to illustrate how to create a test DataLoader (let's assume that our test data have labels, hence the with_labels=True argument):
test_files = get_image_files('/path/to/test/data')
test_dl = learn.dls.test_dl(test_files, with_labels=True) # see the Note below
Once we have a test DataLoader object (either dls[2] or test_dl), we can pass it to the Learner's get_preds() method. In the following case, we are using the test_dl object (obtained in Option 2). We could have used dls[2] had we opted for Option 1.
In this example, we get the predictions and pretty-print them by displaying the prediction, the confidence percentage, and the image name:
preds = learn.get_preds(dl=test_dl)  # returns a tuple: (predictions, targets)
for index, item in enumerate(preds[0]):
    # item holds the per-class scores for one image
    prediction = dls.categorize.decode(np.argmax(item)).upper()
    confidence = float(max(item))
    print(f"Prediction: {prediction} - Confidence: {confidence*100:.2f}% - Image: {test_dl.items[index].name}")
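For anyone who wants to try the printing logic without a trained model, here is a small sketch in plain Python. The vocab, probs, and file_names values are made-up stand-ins for what the Learner and the test DataLoader would provide, and top_class is a hypothetical helper name.

```python
# Made-up stand-ins: in the post these come from learn.get_preds() and test_dl.items
vocab = ["cat", "dog", "horse"]
probs = [
    [0.10, 0.85, 0.05],
    [0.70, 0.20, 0.10],
]
file_names = ["img_001.jpg", "img_002.jpg"]

def top_class(row, vocab):
    """Return (label, confidence) for one row of class probabilities."""
    best = max(range(len(row)), key=row.__getitem__)  # plain-Python argmax
    return vocab[best].upper(), row[best]

lines = []
for name, row in zip(file_names, probs):
    label, conf = top_class(row, vocab)
    lines.append(f"Prediction: {label} - Confidence: {conf * 100:.2f}% - Image: {name}")
print("\n".join(lines))
```

The index of the highest score selects the label from the vocabulary, and that same score is reported as the confidence.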
As a bonus, we can also store the test_dl object in the DataLoaders object dls as a second validation DataLoader like this:
dls.loaders.append(test_dl)
and then use it to display the corresponding confusion matrix like this:
interp = ClassificationInterpretation.from_learner(learn, ds_idx=2)
interp.plot_confusion_matrix()
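In case it helps to see what the plotted matrix contains, here is a minimal sketch that counts the cells by hand. It is not how ClassificationInterpretation is implemented. The confusion_matrix function and the sample class indices are assumptions made up for illustration.

```python
def confusion_matrix(targets, predictions, n_classes):
    """Build a matrix where matrix[actual][predicted] counts items with that pair."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in zip(targets, predictions):
        matrix[actual][predicted] += 1
    return matrix

# Made-up class indices for six test items
targets     = [0, 0, 1, 1, 2, 2]
predictions = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(targets, predictions, n_classes=3)
for row in cm:
    print(row)
# [1, 1, 0]
# [0, 2, 0]
# [1, 0, 1]
```

Rows are the actual classes and columns are the predicted ones, so the diagonal holds the correct predictions. This is also why the test data need labels for this step.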
Note: test_dl can be created using these 2 equivalent methods:
test_dl = learn.dls.test_dl(test_files, with_labels=True)
or
test_dl = test_dl(learn.dls, test_files, with_labels=True)
We can do that because test_dl() is defined with the following @patch decorator (source code):
@patch
def test_dl(self:DataLoaders, test_items, rm_type_tfms=None, with_labels=False, **kwargs):
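To illustrate why a function written this way can also be called as a method, here is a toy decorator that attaches a function to the class named in its first parameter's annotation. This is only a sketch of the idea, not fastcore's implementation. toy_patch, Loaders, and describe are hypothetical names.

```python
import inspect

def toy_patch(func):
    """Attach func as a method on the class annotating its first parameter."""
    first_param = next(iter(inspect.signature(func).parameters.values()))
    cls = first_param.annotation
    setattr(cls, func.__name__, func)
    # This toy version returns func, so the plain-function call style keeps working
    return func

class Loaders:  # hypothetical stand-in for DataLoaders
    def __init__(self, name):
        self.name = name

@toy_patch
def describe(self: Loaders, suffix=""):
    return f"{self.name}{suffix}"

dls = Loaders("dls")
print(dls.describe("!"))   # method style
print(describe(dls, "!"))  # function style, same result
```

Both call styles work here only because toy_patch returns the original function. The real decorator may behave differently in that respect depending on the fastcore version, so the method form learn.dls.test_dl(...) is the safer one to rely on.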