ImageDataBunch versus Data Block API

mturbot · March 11, 2019, 4:37pm

Hello,

I am trying to focus on creating my databunches and to fully understand how this works (and to stop copying/pasting from the lessons) before going forward.

From what I understand, learning the data block API should let me create my databunches in a “generic” way (always using the data block code).

After downloading images from Google into 2 separate folders named by label, without train or valid subfolders, I would like to understand how to create my databunch with the data block API:

data = (ImageList.from_folder(path)
.split_by_folder()
.label_from_folder()
.transform()
.databunch())

This should give the same result as in lesson 2 with the shortcut method:

data = ImageDataBunch.from_folder(path, train='.', valid_pct=0.2, ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)

I don’t really get it…
Let me know
thanks,
Michael

mturbot · March 12, 2019, 9:58am

Hi,

I think I found the answer myself.
My goal is really to use the generic data block API all the time and understand exactly what’s happening (and stop copying/pasting from the lessons).

Can anybody confirm that

data = (ImageItemList.from_folder(path).random_split_by_pct(valid_pct=0.2, seed=4).label_from_folder().transform(get_transforms(), size=224, num_workers=4).databunch())

and

data = ImageDataBunch.from_folder(path, train='.', valid_pct=0.2, ds_tfms=get_transforms(), size=224, num_workers=4)

are exactly the same thing?

Am I correct?

bharat0to · March 12, 2019, 4:12pm

Hi Michael,
You are almost correct. The path in ImageDataBunch refers to the folder where it expects to find the train, valid and test folders, which is why you need to specify train='.'. The path in ImageItemList, on the other hand, is simply a folder where it can find image files (it fetches images recursively). Also, arguments like num_workers=4 and bs=4 go into the databunch method, as in .databunch(num_workers=4, bs=4), not into transform. Other than that, they are exactly the same thing.
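
Putting the points above together, here is a rough sketch of how the full data block pipeline could look. It assumes a later fastai v1 release, where ImageItemList is called ImageList and random_split_by_pct is called split_by_rand_pct; on an older version, swap the old names back in. The helper name build_databunch and its default values are only for illustration and are not part of fastai.

```python
def build_databunch(path, valid_pct=0.2, seed=4, size=224, bs=64, num_workers=4):
    """Hypothetical helper: build an image databunch from label-named folders.

    Assumes fastai v1 is installed. The import sits inside the function so
    that merely defining the helper does not require fastai.
    """
    from fastai.vision import ImageList, get_transforms, imagenet_stats

    return (ImageList.from_folder(path)            # collect image files recursively under path
            .split_by_rand_pct(valid_pct=valid_pct, seed=seed)  # random train/valid split
            .label_from_folder()                   # label = name of the parent folder
            .transform(get_transforms(), size=size)  # augmentation and resize only
            .databunch(bs=bs, num_workers=num_workers)  # loader options go here
            .normalize(imagenet_stats))            # as in the lesson 2 shortcut
```

You would then call it as data = build_databunch(path). The final .normalize(imagenet_stats) is only needed if you want to match the first shortcut call in this thread, which includes it.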