Inference (Prediction)
This post describes how to get predictions from a test dataset, pretty-print them, and plot the corresponding confusion matrix when the test dataset has labels.
First of all, I would like to point out that this post is a summary of several posts that I gathered from the forum. Therefore, the credit goes to the original contributors: @sgugger, @VishnuSubramanian, @sut, @chengwliu, @LessW2020, @vijayabhaskar, @muellerzr. If I missed any other contributor, please DM me and I will update the list.
The `Learner`'s `get_preds(dl=dl_object)` method expects a `DataLoader` object. Therefore, we need to create a test `DataLoader` object, `test_dl`. There are 2 options to create it:
Option 1: Creating the test loader together with the train and valid loaders in the `DataLoaders` object
Splits are used in `Datasets`, `TfmdLists`, and `DataBlock`. They allow us to split a dataset (or a list of items) into several chunks called subsets. If we split our dataset into 3 subsets, we end up with the following 3 subsets:
1- subset(0): the train dataset, which has the alias `train`
2- subset(1): the valid dataset, which has the alias `valid`
3- subset(2): the test dataset, which has no alias
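To make the idea of splits concrete, here is a minimal plain-Python sketch. It is not fastai's implementation: `make_subsets` is a hypothetical helper and the file names are made up. It only shows how one list of indices per split turns a list of items into ordered subsets (train, valid, test).

```python
# Plain-Python illustration of splits; `make_subsets` is a hypothetical helper,
# not part of fastai.
def make_subsets(items, splits):
    """Return one list of items per list of indices in `splits`."""
    return [[items[i] for i in idxs] for idxs in splits]

items = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg", "f.jpg"]  # made-up file names
splits = [[0, 1, 2], [3, 4], [5]]  # train, valid, and test indices

subsets = make_subsets(items, splits)
train, valid, test = subsets  # positions 0, 1, and 2
```

The third subset sits at position 2, which is why the test data is reached by index rather than by a named alias.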
If we create a `DataLoaders` object called `dls`, it can be indexed like an array and holds the following elements:
1- dls[0], which has the alias `dls.train`, is the `train` DataLoader
2- dls[1], which has the alias `dls.valid`, is the `valid` DataLoader
3- dls[2], which has no alias, is the `test` DataLoader
Therefore, our test `DataLoader` is `dls[2]`.
Option 2: Creating a test loader after creating the `DataLoaders` object `dls`
In this case, we assume we have 2 splits, and therefore the `train` and `valid` `DataLoader` objects described above.
In this example, we will use the `vision` module to illustrate how to create a test `DataLoader` (let's assume that our test data have labels, hence the `with_labels=True` argument):
test_files = get_image_files('/path/to/test/data')
test_dl = learn.dls.test_dl(test_files, with_labels=True) # see the Note below
Once we have a test `DataLoader` object (either `dls[2]` or `test_dl`), we can pass it to the `Learner`'s `get_preds()` method. In the following case, we use the `test_dl` object (obtained in Option 2). We could have used `dls[2]` had we opted for Option 1.
In this example, we get the predictions and pretty-print them by displaying the prediction, the confidence percentage, and the image name:
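import numpy as np

# get_preds returns a tuple whose first element holds the predictions
preds = learn.get_preds(dl=test_dl)
for index, item in enumerate(preds[0]):
    # decode the index of the highest score into its class name
    prediction = dls.categorize.decode(np.argmax(item)).upper()
    confidence = max(item)
    percent = float(confidence)
    print(f"Prediction: {prediction} - Confidence: {percent*100:.2f}% - "
          f"Image: {test_dl.items[index].name}")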
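The formatting logic can be tried without a trained model. The following is a minimal sketch under stated assumptions: `vocab`, `probs`, and `names` are made-up stand-ins for the class vocabulary, the prediction rows, and the test file names, and `format_prediction` is a hypothetical helper rather than a fastai function.

```python
# Made-up stand-ins for the vocabulary, the model output, and the file names.
vocab = ["cat", "dog"]
probs = [[0.91, 0.09], [0.25, 0.75]]
names = ["img_001.jpg", "img_002.jpg"]

def format_prediction(row, vocab, name):
    """Build one report line from a row of class probabilities."""
    # index of the largest probability in the row (a plain-Python argmax)
    best = max(range(len(row)), key=row.__getitem__)
    return (f"Prediction: {vocab[best].upper()} - "
            f"Confidence: {row[best] * 100:.2f}% - Image: {name}")

lines = [format_prediction(row, vocab, name) for row, name in zip(probs, names)]
for line in lines:
    print(line)
```

Building the lines in a list before printing keeps the formatting easy to check or to write to a file instead.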
As a bonus, we can also store the `test_dl` object in the `DataLoaders` object `dls` as a second validation `DataLoader`, like this:
dls.loaders.append(test_dl)
and then use it to display the corresponding confusion matrix like this:
interp = ClassificationInterpretation.from_learner(learn, ds_idx=2)
interp.plot_confusion_matrix()
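For readers who want to see what a confusion matrix counts, here is a minimal plain-Python sketch. It is not how `ClassificationInterpretation` computes it internally: `confusion_matrix` is a hypothetical helper, the labels are made up, and the layout (rows for actual classes, columns for predicted classes) is a choice of this sketch.

```python
def confusion_matrix(actual, predicted, n_classes):
    """Count samples for each (actual, predicted) pair of class indices."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1  # row = actual class, column = predicted class
    return matrix

# Made-up class indices for five test samples.
actual = [0, 0, 1, 1, 1]
predicted = [0, 1, 1, 1, 0]

cm = confusion_matrix(actual, predicted, n_classes=2)
for row in cm:
    print(row)
```

The diagonal holds the correctly classified samples, and every off-diagonal cell counts one kind of mistake. This is only possible when the test data has labels.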
Note: test_dl can be created using these 2 equivalent methods:
test_dl = learn.dls.test_dl(test_files, with_labels=True)
or
test_dl = test_dl(learn.dls, test_files, with_labels=True)
We can do that because `test_dl()` is defined with the following `@patch` decorator (source code):
@patch
def test_dl(self:DataLoaders, test_items, rm_type_tfms=None, with_labels=False, **kwargs):
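The idea behind such a decorator can be sketched in a few lines. This is an illustration only, not fastcore's implementation: `simple_patch`, `Loaders`, and `describe` are hypothetical names. The sketch reads the type annotation of the first parameter and attaches the function to that class. Returning the function, so that the plain call style keeps working, is a choice made in this sketch.

```python
import inspect

def simple_patch(func):
    """Attach `func` as a method of the class annotating its first parameter."""
    first_param = next(iter(inspect.signature(func).parameters.values()))
    cls = first_param.annotation  # the class named in `self: SomeClass`
    setattr(cls, func.__name__, func)
    return func  # sketch-only choice: keep the plain function usable too

class Loaders:  # hypothetical stand-in class
    def __init__(self, name):
        self.name = name

@simple_patch
def describe(self: Loaders, suffix="!"):
    return f"{self.name}{suffix}"

loaders = Loaders("dls")
as_method = loaders.describe()    # method-style call
as_function = describe(loaders)   # function-style call
```

Both calls run the same function, which is what makes the two ways of writing the call interchangeable in this sketch.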