Hello,
This is not my first post about the Data Block API. I’m a bit stubborn, sorry, but I really don’t want to go further in the lessons until I understand how to create my databunches.
My question is about the first lessons, which use URLs.MNIST_TINY, and the examples in the “Look at data” documentation. All of the examples use ImageDataBunch, but I am trying to understand how to do the same with the Data Block API, which many people say should be the usual way of creating databunches.
I have a folder :
Mnist_tiny :
– models
– test
– train :
— 3
— 7
– valid
– labels.csv
It would be really helpful if you could guide me through creating the databunch in different ways:
ImageItemList.from_folder :
data = (ImageItemList.from_folder(path).split_by_folder().label_from_folder().transform(tfms, size=24).databunch())
This one works ok.
Questions:
.split_by_folder: if nothing is in the brackets, does it check whether the data is structured ImageNet-style, with valid and train folders?
Does it then go into the train folder, see the two folders ‘3’ and ‘7’, and split them automatically? What would have happened (and what would need to change) if the folders were called ‘train-files’ and ‘valid-files’?
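To make the question concrete, here is a plain-Python sketch of what I assume the split and label steps do with this folder layout. It is not fastai's code: the helper `split_and_label`, its `train`/`valid` parameters and the file names are all made up for illustration, and the paths are taken to be relative to the Mnist_tiny folder.

```python
from pathlib import Path

def split_and_label(files, train="train", valid="valid"):
    """Hypothetical helper: split by top-level folder, label by parent folder."""
    train_items, valid_items = [], []
    for f in files:
        p = Path(f)
        item = (f, p.parent.name)      # label = name of the folder holding the file
        if p.parts[0] == train:        # first path component decides the split
            train_items.append(item)
        elif p.parts[0] == valid:
            valid_items.append(item)
        # anything else (e.g. test/, models/) is left out of both sets
    return train_items, valid_items

# Invented file names that mimic the layout above.
files = ["train/3/a.png", "train/7/b.png", "valid/3/c.png", "test/d.png"]
train_items, valid_items = split_and_label(files)

# Same idea with renamed folders: only the two folder names change.
renamed = ["train-files/3/a.png", "valid-files/7/b.png"]
train_renamed, valid_renamed = split_and_label(
    renamed, train="train-files", valid="valid-files"
)
```

If this picture is right, I would guess that renamed folders only require passing the two folder names to the split step, but that is exactly the part I would like confirmed.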
Another question:
What would the calls need to look like if my folders were structured like this:
Mnist_tiny :
– models
– test
– 3 :
— train
— valid
– 7 :
— train
— valid
– labels.csv
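For this second layout, this is the logic I imagine I would need, again as a plain-Python sketch and not as fastai code. The helper name `split_and_label_class_first` and the file names are invented. My assumption is that the class is the first folder and the split is the second one, so the label can no longer come from the immediate parent folder.

```python
from pathlib import Path

def split_and_label_class_first(files, train="train", valid="valid"):
    """Hypothetical helper for a <class>/<split>/<file> layout."""
    train_items, valid_items = [], []
    for f in files:
        parts = Path(f).parts
        if len(parts) != 3:            # skip labels.csv and other stray files
            continue
        label, split = parts[0], parts[1]
        if split == train:
            train_items.append((f, label))
        elif split == valid:
            valid_items.append((f, label))
    return train_items, valid_items

# Invented file names that mimic the class-first layout.
files = ["3/train/a.png", "3/valid/b.png", "7/train/c.png",
         "7/valid/d.png", "labels.csv"]
train_items, valid_items = split_and_label_class_first(files)
```

I assume that in the Data Block API this would mean splitting and labelling with my own functions instead of the folder-based defaults, but I am not sure which calls are the intended ones.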
ImageItemList.from_csv :
I can’t create my databunch here with the Data Block API.
data = (ImageItemList.from_csv(path, 'labels.csv')) works,
but I really don’t know how to split, label and create my databunch from the CSV file.
Should I create a DataFrame from the CSV with df = pd.read_csv(path/'labels.csv'), or can I build it from the CSV file directly?
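Here is what I think the CSV route has to achieve, written with only the standard library so that it does not depend on any fastai call. The column names `name` and `label`, the sample rows and the helper `load_labelled_split` are assumptions for illustration; I am assuming each row holds a path relative to Mnist_tiny and that the leading folder tells train from valid.

```python
import csv
import io

# Invented rows; the real labels.csv may use other columns or values.
SAMPLE_CSV = """name,label
train/3/a.png,0
train/7/b.png,1
valid/3/c.png,0
"""

def load_labelled_split(csv_text, valid_prefix="valid/"):
    """Hypothetical helper: label from the CSV column, split by path prefix."""
    train_items, valid_items = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = (row["name"], row["label"])   # label is read as a string
        if row["name"].startswith(valid_prefix):
            valid_items.append(item)
        else:
            train_items.append(item)
    return train_items, valid_items

train_items, valid_items = load_labelled_split(SAMPLE_CSV)
```

So the label would come from the second column and the split from the path in the first column; what I am missing is how to say the same thing with the Data Block steps.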
Please help me. I know that once I understand the Data Block API and what it is trying to do at each step, the rest will easily follow.
It would be really cool if the “Look at data” docs explained the examples both with ImageDataBunch and with ImageItemList and the Data Block API, for a better understanding.
Thanks a lot,
Michael