Why am I training on validation set with Kaggle dog breed identification dataset?

Hey there, I have a strange issue (or maybe I’m just not creating a proper ImageDataBunch, we’ll see). When I train on the Kaggle dog breed identification dataset, the progress counters seem to say that I’m training on the validation set.

This is my split (it looks ok to me):

But when I train:


the model is training on the valid_ds, isn’t it?

And when it validates… it validates on a sub-split of the valid_ds, isn’t it?


My databunch is done this way:

data = (ImageList.from_folder(imgpath)
        .split_by_rand_pct()
        .label_from_folder()
        .add_test_folder(imgpath+'/../test')
        .transform(get_transforms(), size=img_size)
        .databunch(bs=bs)
        .normalize(imagenet_stats))

So, I don’t know… the creation of the databunch looks ok, especially because the splits (as you can see above) are ok. So, what am I doing wrong?

Thanks! (Fast.ai is awesome!)

P.S. I forgot to mention that I re-arranged the original dataset into a hierarchy of folders named after the class labels, with each folder containing the numbered images for that class.

Like this but with more images.

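Oh my gosh, the fit_one_cycle() counters count batches, not single images! Given that I have bs=4 (because of my crappy GPU with only 2 GB of VRAM), the 8178 training images make 8178÷4 = 2044.5 batches and the 2044 validation images make 2044÷4 = 511 batches. So the training counter only looked like the size of the validation set by coincidence, and everything is working as it should!*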

* Well, not really, because I can’t manage to go beyond 73% accuracy, but that is another problem.
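
For anyone who hits the same confusion, the batch arithmetic can be checked with a few lines of plain Python. This is only a sketch: `batches_per_epoch` is a hypothetical helper (not part of fastai), and it assumes the training loader drops the final incomplete batch while the validation loader keeps it, which is what I believe fastai v1 does by default.

```python
import math

def batches_per_epoch(n_items: int, bs: int, drop_last: bool) -> int:
    """Hypothetical helper: number of batches a loader yields per epoch."""
    if drop_last:
        # The final incomplete batch is discarded.
        return n_items // bs
    # The final incomplete batch is kept.
    return math.ceil(n_items / bs)

# Numbers from this thread: 8178 training images, 2044 validation images, bs=4.
n_train, n_valid, bs = 8178, 2044, 4

# Assumption: training drops the last partial batch, validation does not.
train_batches = batches_per_epoch(n_train, bs, drop_last=True)
valid_batches = batches_per_epoch(n_valid, bs, drop_last=False)

print(train_batches, valid_batches)  # prints "2044 511"
```

So a training counter of 2044 means 2044 batches of 4 images, not the 2044 images of the validation set.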