I am trying to focus on creating my databunches and to fully understand how they work (and stop copying/pasting from the lessons) before going forward.
From what I understood, learning the data block API should let me create my databunches in a "generic" way (always using the data block code).
After downloading images from Google into 2 separate folders named by label, without train or valid subfolders, I would like to understand how to create my databunch with the data block API:
data = (ImageList.from_folder(path)
so that I get the same result as in lesson 2 with the shortcut method:
data = ImageDataBunch.from_folder(path, train='.', valid_pct=0.2, ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)
I don’t really get it…
Let me know
I think I found the answer myself.
My goal is really to use the generic data block API all the time and to understand exactly what is happening (and stop copying/pasting from the lessons).
Can anybody confirm that:
data = (ImageItemList.from_folder(path).random_split_by_pct(valid_pct=0.2, seed=4).label_from_folder().transform(get_transforms(), size=224, num_workers=4).databunch())
and
data = ImageDataBunch.from_folder(path, train='.', valid_pct=0.2, ds_tfms=get_transforms(), size=224, num_workers=4)
are exactly the same thing?
Am I correct?
You are almost correct, but:
ImageDataBunch refers to the folder where it expects to find train, valid and test folders; that is why you need to specify
path in ImageItemList expects a folder where it can find image files (it fetches images recursively). And arguments like
.databunch(num_workers=4, bs=4) go into the databunch method. Other than that, they are exactly the same thing.
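
If it helps to see what the first three steps of the chain are doing conceptually, here is a rough plain-Python sketch of "collect files recursively, split randomly by percentage, label by parent folder". This is not fastai's implementation, only an illustration under my own assumptions: the function names (collect_images, random_split, label_from_folder, build_splits) and the extension list are made up for the example, and the exact shuffling will not match what fastai produces for the same seed.

```python
import random
from pathlib import Path

# Assumed set of image extensions, for illustration only.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def collect_images(root):
    """Recursively gather image files under root (the 'from_folder' idea)."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def random_split(items, valid_pct=0.2, seed=None):
    """Shuffle the items and hold out valid_pct of them for validation."""
    rng = random.Random(seed)  # local RNG so the global one is left untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(valid_pct * len(shuffled))
    return shuffled[cut:], shuffled[:cut]  # (train, valid)


def label_from_folder(item):
    """Use the name of the file's parent folder as its label."""
    return Path(item).parent.name


def build_splits(root, valid_pct=0.2, seed=None):
    """Return (train, valid) lists of (path, label) pairs."""
    train, valid = random_split(collect_images(root), valid_pct, seed)
    return (
        [(p, label_from_folder(p)) for p in train],
        [(p, label_from_folder(p)) for p in valid],
    )
```

Calling build_splits(path, valid_pct=0.2, seed=4) on a folder laid out like yours (one subfolder per label) gives two lists of (file, label) pairs. The transform and databunch steps then sit on top of that: transforms describe how each image is augmented and resized, and the databunch step is where batching options such as num_workers and bs belong.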