When using ImageDataBunch, the outputs are image paths and labels. I didn’t see any code in Learner that reads these paths, so I’m wondering when exactly these images are loaded. Thanks!
Learner is not responsible for loading data.
ItemList has a get method, which is responsible for returning an item. This get method is invoked by __getitem__, which is just the PyTorch Dataset method that the PyTorch DataLoader invokes during iteration in the training loop (fastai/basic_train.py:99) or the validation loop (fastai/basic_train.py:57).
Learner only invokes the training/validation loop.
ImageList overrides this method (fastai/vision/data.py:268) so that get loads the file (using the open_image function, fastai/vision/image:388) instead of returning the item.
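To make the override concrete, here is a minimal sketch in plain Python. The class names ItemListSketch and FileListSketch are made up for illustration and are not fastai's classes; reading raw bytes stands in for open_image. It only shows the pattern described above: the base get returns the stored item, and the subclass get loads the file.

```python
import tempfile
from pathlib import Path


class ItemListSketch:
    """Simplified stand-in for fastai's ItemList (not the real class)."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

    def get(self, i):
        # Base behaviour: hand back the stored item unchanged.
        return self.items[i]

    def __getitem__(self, i):
        # Indexing with an integer is routed through get().
        return self.get(i)


class FileListSketch(ItemListSketch):
    """Stand-in for ImageList: get() opens the file instead of returning the path."""

    def get(self, i):
        path = super().get(i)            # still a path at this point
        return Path(path).read_bytes()   # assumption: bytes stand in for open_image(path)


with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / "img0.bin"
    file_path.write_bytes(b"fake image bytes")

    plain = ItemListSketch([file_path])
    files = FileListSketch([file_path])

    stored = plain[0]   # the path itself
    loaded = files[0]   # file contents, read only at indexing time
```

Note that files.items still holds the path after indexing, which is why inspecting the dataset shows paths rather than images.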
If you have a tar file, try the code below (be sure to omit the file extension). It sets path_img as the folder that ImageDataBunch loads from, and the learner then receives the result as “data”.
```python
from fastai.vision import *

bs = 64  # batch size; assumed value, the original post does not define bs

# URL is given without the file extension
path = untar_data('https://download.com/tarfilewithoutext')
path_img = path
data = ImageDataBunch.from_folder(path=path_img, valid_pct=0.3, ds_tfms=get_transforms(),
                                  bs=bs, size=224, num_workers=8).normalize(imagenet_stats)
learn = cnn_learner(data, models.resnet101, metrics=[accuracy, error_rate])
```
Thanks Kornel, that makes a lot of sense. But when you say the get method is invoked by __getitem__, where exactly does that happen? Is it a Python default thing?
I didn’t see get being called in the __getitem__ method of PyTorch’s Dataset.
I think my question is more about where the image is loaded, since if you check the databunch, what is inside the dataset or dataloader is still a path. Thanks
In PyTorch, the DataLoader is used for joining and splitting data items into batches. It is also iterable, so calling next(iter(dl)) returns the next batch, which is what the fastai training loop does.
The Dataset only tells the DataLoader how to get a single item, which you do by implementing __getitem__.
You can check the PyTorch source code, but the guide is clear enough: https://pytorch.org/tutorials/beginner/data_loading_tutorial.html
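As a rough illustration of that division of labour, here is a sketch in plain Python. TinyLoader is a hypothetical stand-in for torch.utils.data.DataLoader (no shuffling, workers, or tensor collation), and SquaresDataset is a made-up dataset; the only point is that the loader asks the dataset for one index at a time and groups the results. The sketch uses next(iter(dl)) because a loader is iterable rather than an iterator itself.

```python
class SquaresDataset:
    """Map-style dataset: only knows how to return ONE item by index."""

    def __init__(self, n):
        self.n = n
        self.calls = []   # record which indices were requested

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        self.calls.append(i)
        return i * i


class TinyLoader:
    """Hypothetical stand-in for a DataLoader: batches items, nothing else."""

    def __init__(self, dataset, batch_size):
        self.dataset = dataset
        self.batch_size = batch_size

    def __iter__(self):
        batch = []
        for i in range(len(self.dataset)):
            batch.append(self.dataset[i])   # this triggers Dataset.__getitem__
            if len(batch) == self.batch_size:
                yield batch
                batch = []
        if batch:                           # last, smaller batch
            yield batch


ds = SquaresDataset(5)
dl = TinyLoader(ds, batch_size=2)

first_batch = next(iter(dl))   # one batch, built from two __getitem__ calls
all_batches = list(dl)         # a fresh pass over the whole dataset
```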
Here is what happens, step by step:
- You get a dataloader from your ImageDataBunch for the set type you want, for example my_dl = data.dl(ds_type=DatasetType.Train).
- You trigger the __iter__ method by iterating over the dataloader or by calling it directly.
- There are several levels of wrappers here, but the call basically propagates down through them.
- At the end of that chain, __getitem__() is called to load a single image, which after a couple more calls invokes open_image, which uses the PIL library to load the actual image from disk.
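The propagation through the wrappers can be sketched as follows. OuterLoader, InnerLoader, and PathDataset are invented names standing in for a wrapper such as DeviceDataLoader, the PyTorch DataLoader, and the dataset; the real classes do considerably more. The call log shows that nothing is loaded until the first batch is requested.

```python
call_log = []


class PathDataset:
    """Holds paths only; the 'image' is produced when an index is requested."""

    def __init__(self, paths):
        self.paths = paths

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        call_log.append(f"__getitem__({i})")
        return f"pixels of {self.paths[i]}"   # placeholder for the decoded image


class InnerLoader:
    """Stand-in for the PyTorch DataLoader (batch size 1 for brevity)."""

    def __init__(self, dataset):
        self.dataset = dataset

    def __iter__(self):
        call_log.append("InnerLoader.__iter__")
        for i in range(len(self.dataset)):
            yield [self.dataset[i]]


class OuterLoader:
    """Stand-in for a wrapper around the loader: it only delegates."""

    def __init__(self, dl):
        self.dl = dl

    def __iter__(self):
        call_log.append("OuterLoader.__iter__")
        for batch in self.dl:
            yield batch   # a real wrapper could post-process the batch here


my_dl = OuterLoader(InnerLoader(PathDataset(["a.png", "b.png"])))
log_before = list(call_log)   # empty: building the loaders loads nothing
first = next(iter(my_dl))     # now the chain runs down to __getitem__(0)
```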
Hi Kornel, I know PyTorch and use it very often. The question is why __getitem__ calls the get method. Thanks for your patience!
I think you mean DataLoader->DeviceDataLoader->LabelList->ImageList. (I’m wrong)
Can you point me to where this happens? “load a single image which after a couple of more calls invokes open_image that uses the PIL library”
The dataloaders order was correct: the ImageDataBunch holds an instance of DeviceDataLoader, which in turn holds PyTorch’s DataLoader. You can easily verify things like that in a notebook.
The order of calls to get an item is:
I finally found it: get is called in ItemList’s __getitem__.
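For anyone landing here with the same question, a small sketch of that last hop. That obj[i] calls __getitem__ is standard Python; the step from __getitem__ to get is explicit library code, not a Python default. XListSketch and LabelListSketch are made-up stand-ins for ItemList and LabelList, and the real classes do more (transforms, for example).

```python
from numbers import Integral

trace = []


class XListSketch:
    """Stand-in for an ItemList holding inputs (for images, the file paths)."""

    def __init__(self, items):
        self.items = list(items)

    def get(self, i):
        trace.append("get")
        return f"loaded:{self.items[i]}"      # placeholder for opening the image

    def __getitem__(self, idx):
        trace.append("ItemList.__getitem__")
        if isinstance(idx, Integral):
            return self.get(idx)              # single index -> one loaded item
        return XListSketch(self.items[idx])   # slice -> a new, still-lazy list


class LabelListSketch:
    """Stand-in for LabelList: pairs an x list with a y list."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __getitem__(self, idx):
        trace.append("LabelList.__getitem__")
        return self.x[idx], self.y[idx]


train_ds = LabelListSketch(XListSketch(["cat.jpg", "dog.jpg"]), ["cat", "dog"])
sample = train_ds[1]   # what a loader does for every index in a batch
```

After this runs, trace lists the calls in the order they happened, ending with get.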