ImageDataBunch versus Data Block API


I am trying to focus on creating my databunches and fully understand how they work (and stop copying/pasting from the lessons) before going forward.

From what I understood, learning the data block API would let me create my databunches in a “generic” way (always using the data block code).

After downloading images from Google into 2 separate folders named by label, without train or valid subfolders, I would like to understand how to create my databunch with the data block API:

data = (ImageList.from_folder(path)

To get the same result as in lesson 2 with the shortcut method:

data = ImageDataBunch.from_folder(path, train='.', valid_pct=0.2, ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)

I don’t really get it…
Let me know


I think I found the answer myself.
My goal is really to use the generic data block API all the time and understand exactly what’s happening (and stop copying/pasting from lessons).

Can anybody confirm that:

data = (ImageItemList.from_folder(path).random_split_by_pct(valid_pct=0.2, seed=4).label_from_folder().transform(get_transforms(), size=224, num_workers=4).databunch())

and

data = ImageDataBunch.from_folder(path, train='.', valid_pct=0.2, ds_tfms=get_transforms(), size=224, num_workers=4)

are exactly the same thing?

Am I correct?

Hi Michael,
You are almost correct. The difference is that path in ImageDataBunch.from_folder refers to the folder where it expects to find train, valid, and test subfolders, which is why you need to specify train='.'. In ImageItemList.from_folder, path is simply a folder where it can find image files (it fetches images recursively). Also, arguments like num_workers and bs go into the databunch method, e.g. .databunch(num_workers=4, bs=4), not into transform. Other than that, they are exactly the same thing.
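
For reference, here is a minimal sketch of the data block version with that correction applied, i.e. num_workers moved from transform to databunch. It assumes a fastai v1 release that still uses the names from this thread (ImageItemList, random_split_by_pct); as far as I know, later v1 releases renamed these to ImageList and split_by_rand_pct. The helper name build_databunch and the bs=64 default are my own choices for illustration, not something from the lesson.

```python
def build_databunch(path, valid_pct=0.2, seed=4, size=224, num_workers=4, bs=64):
    """Build an ImageDataBunch from label-named folders with the data block API.

    Hypothetical helper: `path` is a folder whose subfolders are named by label,
    with no train/valid split on disk.
    """
    # Imported here so that defining the helper does not itself require fastai.
    from fastai.vision import ImageItemList, get_transforms, imagenet_stats

    return (ImageItemList.from_folder(path)                       # collect images recursively
            .random_split_by_pct(valid_pct=valid_pct, seed=seed)  # random train/valid split
            .label_from_folder()                                  # label = parent folder name
            .transform(get_transforms(), size=size)               # augmentation and resize only
            .databunch(bs=bs, num_workers=num_workers)            # DataLoader arguments go here
            .normalize(imagenet_stats))                           # same as the lesson 2 shortcut
```

Calling data = build_databunch(path) should then play the same role as the ImageDataBunch.from_folder(...).normalize(imagenet_stats) shortcut from lesson 2. The final normalize call is only there to match that shortcut; leave it off if you want to compare against the un-normalized version.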