Why is my batch much smaller than my sample?

Hi! I have set up my model to download a total of 150 images, but I end up with around 80 per batch (no idea why). What bothers me is that when I pass my data to the DataLoaders and train, the confusion matrix later tells me there are around 30 images in my dataset. Why is this happening?

Here is my work so you can take a look.

I ran your notebook.

Your batch size is 64 (the fastai default). You can check it with `dls.bs` or `learn.dls.bs`.
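A minimal sketch of checking and overriding it (assuming a standard fastai setup, with `laminoids` and `path` as in your notebook):

dls = laminoids.dataloaders(path)         # bs defaults to 64 when not passed
print(dls.bs)                             # -> 64

dls = laminoids.dataloaders(path, bs=32)  # pass bs explicitly for a different batch size
print(dls.bs)                             # -> 32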

The confusion matrix plots the model’s predictions on the validation set. When creating your training and validation datasets, you set aside 20% of the data for validation:

laminoids = DataBlock(
    ...,
    splitter=RandomSplitter(valid_pct=0.2, seed=80),
    ...
)

You downloaded around 170 images, and 20% of 170 is about 34, hence the roughly 30 images displayed in the matrix. :slightly_smiling_face:
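If you want to confirm the split sizes directly, here is a quick sketch (again assuming the `dls` built from `laminoids` above):

# with ~170 images and valid_pct=0.2, roughly 136 items land in the
# training set and ~34 in the validation set
print(len(dls.train_ds), len(dls.valid_ds))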
