I am trying to focus on creating my databunches and fully understand how they work (and stop copying/pasting from the lessons) before going forward.
From what I understood, understanding the data block API would let me create my databunches in a “generic” way (always using the data block code).
After downloading images from Google into 2 separate folders named by label, without train or valid subfolders, I would like to understand how to create my databunch with the data block API:
data = (ImageList.from_folder(path)
.split_by_folder()
.label_from_folder()
.transform()
.databunch())
To get the same result as in lesson 2 with the shortcut method:
data = ImageDataBunch.from_folder(path, train='.', valid_pct=0.2, ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)
I don’t really get it…
Let me know
thanks,
Michael
I think I found the answer myself.
My goal is really to always use the generic data block API and understand exactly what’s happening (and stop copying/pasting from the lessons).
Can anybody confirm that these two are equivalent:
data = (ImageItemList.from_folder(path).random_split_by_pct(valid_pct=0.2, seed=4).label_from_folder().transform(get_transforms(), size=224, num_workers=4).databunch())
and
data = ImageDataBunch.from_folder(path, train='.', valid_pct=0.2, ds_tfms=get_transforms(), size=224, num_workers=4)
Hi Michael,
You are almost correct, but path in ImageDataBunch refers to the folder where it expects to find the train, valid and test folders, which is why you need to specify train='.'. In ImageItemList, path is simply a folder where it can find image files (it fetches images recursively). Also, arguments like num_workers and bs go into the databunch method, e.g. .databunch(num_workers=4, bs=4). Other than that, they are exactly the same thing.
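To make the three data block steps less of a black box, here is a rough plain-Python sketch of what "collect files recursively", "random split by percentage" and "label from folder" amount to conceptually. This is not fastai's implementation; the helper names (find_images, random_split, label_from_parent) and the extension list are made up for illustration.

```python
import random
from pathlib import Path

# Assumption: a small, illustrative set of image extensions (not fastai's list).
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def find_images(root):
    """Recursively collect image files under root, sorted for reproducibility."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )


def random_split(items, valid_pct=0.2, seed=None):
    """Shuffle a copy of items and hold out valid_pct of them for validation."""
    rng = random.Random(seed)  # a fixed seed gives the same split every run
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    return shuffled[n_valid:], shuffled[:n_valid]  # (train, valid)


def label_from_parent(path):
    """Use the name of the containing folder as the label."""
    return Path(path).parent.name
```

With that picture in mind, and assuming the fastai v1 names used in this thread (ImageItemList, random_split_by_pct), the data block call with num_workers moved into databunch would presumably read: data = (ImageItemList.from_folder(path).random_split_by_pct(valid_pct=0.2, seed=4).label_from_folder().transform(get_transforms(), size=224).databunch(num_workers=4).normalize(imagenet_stats)).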