More Samples in Databunch than Original Data

Thanks for your attention.

I used the “ImageDataBunch.from_folder” method to create a databunch from my training set of about 300 images, with valid_pct=0.8. Everything went fine, but what confused me is that the number of samples in the databunch is around 1000 (about 800 in the training set and 200 in the validation set).

I am curious about the expansion of the databunch. I thought it was the result of the transforms, but the same number came out even when I set “ds_tfms=None”.

Could anybody tell me why there are more samples in the databunch than in my original image set? And how can I restrict the databunch to the original image set, with exactly the same number of samples?

Thanks again for your help.

Hi! You should only have 300 if that's the total number of images in your dataset folder (and subfolders). Please share your code.

@yuyang Also check whether you are pointing to the correct folder. If you accidentally point to the parent of the desired folder, that parent might contain other folders with different datasets, and together they could add up to 1000. (I think this is unlikely, though, since the from_folder class method seems to require a specific directory layout to work, unless the parent folder happens to have that layout too.)

Thanks for the suggestions above! I finally found that the problem was caused by the folder path I passed in.

I set the “path” parameter to, say, “d:/mywork”, which has three subfolders named “train”, “valid” and “test”. Then I set the “train” parameter to “./train”. I thought only the pictures in “d:/mywork/train” would be counted, but it turned out that the images from all three subfolders were loaded. I changed the path to “d:/mywork/train” and the problem was solved.
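For anyone hitting the same thing: as far as I can tell from the fastai v1 source, when valid_pct is given, from_folder gathers every image found recursively under “path” and splits them at random, so the “train” and “valid” arguments are not used. A quick way to check what a given path will pick up is to count the image files recursively before building the databunch. Below is a minimal sketch using only the Python standard library. The helper names (count_images, count_by_subfolder), the extension set, and the small temporary folder layout are my own assumptions for illustration and are not part of fastai.

```python
from pathlib import Path
import tempfile

# Assumed set of image extensions, for illustration only.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def count_images(root):
    """Count image files under root, recursing into every subfolder."""
    return sum(
        1
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )


def count_by_subfolder(root):
    """Return {subfolder name: recursive image count} for each direct child folder."""
    return {
        d.name: count_images(d)
        for d in sorted(Path(root).iterdir())
        if d.is_dir()
    }


# Hypothetical layout mimicking a root folder with train/valid/test inside it.
layout = {"train/cats": 3, "train/dogs": 2, "valid/cats": 1, "test": 2}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for sub, n in layout.items():
        folder = root / sub
        folder.mkdir(parents=True)
        for i in range(n):
            (folder / f"img_{i}.jpg").touch()  # empty placeholder files

    total = count_images(root)                 # what a recursive scan of the root sees
    train_only = count_images(root / "train")  # what a scan of train/ alone sees
    per_folder = count_by_subfolder(root)

print("under root:", total)
print("under root/train:", train_only)
print("per subfolder:", per_folder)
```

If the count under the root is larger than the count under the training folder, pointing “path” at the training folder itself (as I did above) keeps the databunch limited to those images.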

Thanks for your help again!