Update: Never mind. It seems I was using from_folder incorrectly. Given my folder structure, I need to use from_name_re instead. With that change it works as expected.
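For anyone landing here with the same problem, here is a minimal sketch of the idea behind from_name_re: the label is taken from the first capture group of a regular expression matched against each file path. The file names and the pattern below are hypothetical (my actual naming scheme is not shown in this post), and the helper label_from_name is only an illustration, not a fastai function.

```python
import re

# Hypothetical file names: the label is the part before the trailing index.
fnames = [
    "data/train/daisy_001.jpg",
    "data/train/daisy_002.jpg",
    "data/train/tulip_001.jpg",
]

# Hypothetical pattern: capture everything after the last "/" up to "_<digits>.jpg".
pat = re.compile(r"/([^/]+)_\d+\.jpg$")


def label_from_name(fname, pattern):
    """Return the first capture group of `pattern` in `fname` as the label."""
    match = pattern.search(fname)
    if match is None:
        raise ValueError(f"pattern did not match {fname!r}")
    return match.group(1)


labels = [label_from_name(f, pat) for f in fnames]
print(labels)
```

Checking the extracted labels on a handful of file names like this, before building the data bunch, is a cheap way to confirm that the number of distinct labels matches the number of classes you expect.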
I have a single image classification dataset with 714 samples across 11 classes.
data = ImageDataBunch.from_folder(
    './data', train='train', valid_pct=0.2,
    ds_tfms=get_transforms(), size=size, bs=bs,
).normalize(imagenet_stats)
I fit the learner, build the interpretation object, and plot the confusion matrix:
learner = cnn_learner(data, resnet34, metrics=error_rate)
learner.fit_one_cycle(4)
interp = ClassificationInterpretation.from_learner(learner)
interp.plot_confusion_matrix(figsize=(10, 10), dpi=60)
I notice that the confusion matrix shows far more samples than the dataset contains. For example, my first class has 11 samples, but the confusion matrix shows 55 on the main diagonal for that class.
If I call get_preds and look at the shape of the result, I get (1496, 11):
preds, _, _ = interp.get_preds(with_loss=True)
preds.shape
Where does that 1496 number come from?
What samples are considered to plot the confusion matrix? Is there some resampling happening? Are the results being accumulated across cycles/epochs?
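As a sanity check on the numbers, here is a small sketch in plain Python (no fastai needed). It assumes the interpretation object is built from the validation set, which I believe is the default, and that the hold-out size is the sample count times valid_pct, truncated to an integer. The helper names and the toy labels are made up for illustration.

```python
def expected_valid_size(n_samples, valid_pct):
    """Hold-out size, assuming the count is truncated to an integer."""
    return int(n_samples * valid_pct)


def confusion_matrix(targets, predictions, n_classes):
    """Count (target, prediction) pairs; rows are targets, columns predictions."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(targets, predictions):
        cm[t][p] += 1
    return cm


n_samples, valid_pct = 714, 0.2
n_valid = expected_valid_size(n_samples, valid_pct)
print("expected validation samples:", n_valid)

# Toy data: every prediction lands in exactly one cell, so the cells
# always sum to the number of samples that were predicted.
targets = [0, 0, 1, 1, 2, 2, 2]
predictions = [0, 1, 1, 1, 2, 0, 2]
cm = confusion_matrix(targets, predictions, n_classes=3)
total = sum(sum(row) for row in cm)
print("confusion matrix total:", total, "of", len(predictions))

# The reported prediction count from this post, for comparison.
reported_rows = 1496
print("more rows than the whole dataset:", reported_rows > n_samples)
```

Under those assumptions the matrix should sum to roughly 142, not 1496. A row count larger than the whole dataset points at how the files are being collected and labelled rather than at anything accumulating across epochs.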
Thanks