When are images loaded?

When using ImageDataBunch, the outputs are image paths and labels. I didn’t see any code in Learner that reads these paths, so I’m wondering when exactly the images are loaded. Thanks!

Learner is not responsible for loading data.

ItemList has a get method that is responsible for returning an item. get is invoked by __getitem__, the standard PyTorch Dataset method, which in turn is invoked by the PyTorch DataLoader during iteration in the training loop (fastai/basic_train.py:99) or the validation loop (fastai/basic_train.py:57). Learner only invokes the training/validation loop.

ImageList overrides this method (fastai/vision/data.py:268), so its get loads the file (using the open_image function, fastai/vision/image:388) instead of returning the stored item.
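To make the override pattern concrete, here is a minimal pure-Python sketch. PathList and LazyFileList are hypothetical stand-ins, not the fastai classes, and a plain binary read takes the place of open_image; the only point is that the list keeps paths and the file is touched only when an item is indexed.

```python
import os
import tempfile


class PathList:
    """Simplified stand-in for fastai's ItemList (not the real class)."""

    def __init__(self, items):
        self.items = list(items)  # only paths are stored here

    def get(self, i):
        # Default behaviour: return the stored item unchanged.
        return self.items[i]

    def __getitem__(self, i):
        # Indexing always goes through get(), so subclasses can customise it.
        return self.get(i)


class LazyFileList(PathList):
    """Stand-in for ImageList: get() opens the file instead of returning the path."""

    def get(self, i):
        path = super().get(i)
        with open(path, "rb") as f:  # the real ImageList calls open_image here
            return f.read()


# Demo with a temporary file standing in for an image.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.bin")
with open(demo_path, "wb") as f:
    f.write(b"pixels")

plain = PathList([demo_path])
lazy = LazyFileList([demo_path])
print(plain[0] == demo_path)  # base class hands back the path itself
print(lazy.items)             # the list still holds only the path
print(lazy[0])                # indexing triggers the read
```

This is why inspecting the dataset shows paths: the stored items never change, and the loaded data exists only as the return value of indexing.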


If you have a tar file, try the code below (be sure to omit the file extension). It sets path_img, loads it into an ImageDataBunch, and the learner reads that in as “data”.

from fastai.vision import *  # fastai v1 API

bs = 64  # batch size; assumed value, not given in the original post
path = untar_data('https://download.com/tarfilewithoutext'); path
path_img = path
data = ImageDataBunch.from_folder(path=path_img, valid_pct=0.3, ds_tfms=get_transforms(), bs=bs, size=224, num_workers=8).normalize(imagenet_stats)
learn = cnn_learner(data, models.resnet101, metrics=[accuracy, error_rate])

Thanks Kornel, that makes a lot of sense. But when you say the get method is invoked by __getitem__, where exactly does that happen? Is it a Python default?
I didn’t see get being called in the __getitem__ method of PyTorch’s Dataset.

Thanks!

Hi saltdoc,
I think my question is more about where the image is loaded, since if you check the databunch, what is inside the dataset or dataloader is still a path. Thanks

In PyTorch, the DataLoader is used to group data items into batches. It is also an iterable, so calling next() on its iterator returns the next batch, which is what the fastai training loop does.
The Dataset only tells the DataLoader how to get a single item, which you do by implementing __getitem__.

You can check the PyTorch source code, but the guide is clear enough: https://pytorch.org/tutorials/beginner/data_loading_tutorial.html
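The division of labour between the two can be sketched without PyTorch. TinyLoader below is a hypothetical, heavily simplified stand-in for torch.utils.data.DataLoader (no shuffling, workers, or collation), and SquaresDataset is a toy dataset that records which indices were actually requested.

```python
class SquaresDataset:
    """Toy dataset: item i is computed only when it is requested."""

    def __init__(self, n):
        self.n = n
        self.loaded = []  # records which indices were actually fetched

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        self.loaded.append(i)
        return i * i


class TinyLoader:
    """Simplified stand-in for a DataLoader: groups items into batches."""

    def __init__(self, dataset, batch_size):
        self.dataset, self.batch_size = dataset, batch_size

    def __iter__(self):
        n = len(self.dataset)
        for start in range(0, n, self.batch_size):
            stop = min(start + self.batch_size, n)
            # Each item is fetched through the dataset's __getitem__.
            yield [self.dataset[i] for i in range(start, stop)]


ds = SquaresDataset(5)
batches = iter(TinyLoader(ds, batch_size=2))
first = next(batches)
loaded_after_first = list(ds.loaded)  # only the first batch's indices so far
rest = list(batches)
```

Nothing is fetched when the loader is constructed; items are requested one batch at a time as the iterator advances, which is the same reason images are read from disk only during the training or validation loop.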

Here is step by step what happens:

  • You get a dataloader from your ImageDataBunch for the set type you want, for example my_dl = data.dl(ds_type=DatasetType.Train).
  • You trigger the __iter__ method by iterating over the dataloader or by directly calling next(my_dl.__iter__()).
  • There are several levels of wrappers here, but the call basically propagates DeviceDataLoader->DataLoader->LabelList->ImageList.
  • On the ImageList it calls __getitem__() to load a single image, which after a couple more calls invokes open_image, which uses the PIL library to load the actual image from disk.
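The propagation through the wrappers can be traced with a small mock. All four classes below are hypothetical stand-ins named after the role they play (DeviceLoader for DeviceDataLoader, Loader for DataLoader, Labelled for LabelList, Inputs for ImageList); none of this is fastai code, and a string is returned where the real code would open an image.

```python
calls = []  # records the order in which the layers are reached


class Inputs:
    """Stand-in for ImageList: holds paths, loads on indexing."""

    def __init__(self, paths):
        self.paths = paths

    def __getitem__(self, i):
        calls.append("Inputs.__getitem__")
        return f"loaded:{self.paths[i]}"  # the real code would open the image here


class Labelled:
    """Stand-in for LabelList: pairs an input list with a label list."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __len__(self):
        return len(self.y)

    def __getitem__(self, i):
        calls.append("Labelled.__getitem__")
        return self.x[i], self.y[i]


class Loader:
    """Stand-in for DataLoader: batches items from the dataset."""

    def __init__(self, dataset, batch_size):
        self.dataset, self.batch_size = dataset, batch_size

    def __iter__(self):
        calls.append("Loader.__iter__")
        n = len(self.dataset)
        for start in range(0, n, self.batch_size):
            stop = min(start + self.batch_size, n)
            yield [self.dataset[i] for i in range(start, stop)]


class DeviceLoader:
    """Stand-in for DeviceDataLoader: wraps a loader and passes batches on."""

    def __init__(self, dl):
        self.dl = dl

    def __iter__(self):
        calls.append("DeviceLoader.__iter__")
        for batch in self.dl:
            yield batch  # the real class moves the batch to the device here


my_dl = DeviceLoader(Loader(Labelled(Inputs(["a.png", "b.png"]), [0, 1]), batch_size=2))
batch = next(iter(my_dl))
print(calls)
```

The recorded order runs from the outermost wrapper down to the input list, once per item in the batch for the two innermost layers.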

Hi Kornel, I know PyTorch and use it very often. The question is why __getitem__ calls the get method. Thanks for your patience!

Thanks slawekbiel!
I think you mean DataLoader->DeviceDataLoader->LabelList->ImageList. (I’m wrong)
Can you point me to where this happens? “load a single image which after a couple of more calls invokes open_image that uses the PIL library”

The dataloader order was correct: the ImageDataBunch holds an instance of DeviceDataLoader, which in turn holds PyTorch’s DataLoader. You can easily verify things like that in a notebook.

type(data.train_dl), type(data.train_dl.dl)
(fastai.basic_data.DeviceDataLoader, torch.utils.data.dataloader.DataLoader)

The order of calls to get an item is:

LabelList::__getitem__()
ItemList::__getitem__()
ImageList::get()
ImageList::open()
open_image()


Thanks slawekbiel!
I finally found that get is called in ItemList’s __getitem__.

Hey, I have been using the fastai v1 object detection library (GitHub - ChristianMarzahl/ObjectDetection: Some experiments with object detection in PyTorch) and I have a question about how to plot the items of the “data” databunch. I need to know the distribution of images in the train and validation sets (for example, how many hard negatives and hard positives). How can I do that?
This is my code for loading the data:


import numpy as np
train_samples_per_scanner = 3000
val_samples_per_scanner = 1000

train_images = list(np.random.choice(training_set, train_samples_per_scanner))
valid_images = list(np.random.choice(valid_set, val_samples_per_scanner))
batch_size = 64

do_flip = True
flip_vert = True 
max_rotate = 90 
max_zoom = 1.1 
max_lighting = 0.2
max_warp = 0.2
p_affine = 0.75 
p_lighting = 0.75 

tfms = get_transforms(do_flip=do_flip,
                      flip_vert=flip_vert,
                      max_rotate=max_rotate,
                      max_zoom=max_zoom,
                      max_lighting=max_lighting,
                      max_warp=max_warp,
                      p_affine=p_affine,
                      p_lighting=p_lighting)

train, valid ,test = ObjectItemListSlide(train_images), ObjectItemListSlide(valid_images), ObjectItemListSlide(test_images)
item_list = ItemLists(".", train, test)
lls = item_list.label_from_func(lambda x: x.y, label_cls=SlideObjectCategoryList)
lls = lls.transform(tfms, tfm_y=True, size=patch_size)
data = lls.databunch(bs=batch_size, collate_fn=bb_pad_collate,num_workers=0).normalize()

Here the training set and validation set are lists of object_detection_fastai.helper.wsi_loader.SlideContainer objects.
I want a plot of how many of the items have which class ([0,1,2]=[‘background’, ‘hard negative’, ‘mitotic figure’]).
All suggestions, patch code, and notebooks are welcome; please share whichever resources are available to you for this problem.
Thank you in advance,
Harshit

Data as shown by the Learner object:

learn

data=ImageDataBunch;

Train: SlideLabelList (3000 items)
x: ObjectItemListSlide
Image (3, 256, 256),Image (3, 256, 256),Image (3, 256, 256),Image (3, 256, 256),Image (3, 256, 256)
y: SlideObjectCategoryList
ImageBBox (256, 256),ImageBBox (256, 256),ImageBBox (256, 256),ImageBBox (256, 256),ImageBBox (256, 256)
Path: .;

Valid: SlideLabelList (1000 items)
x: ObjectItemListSlide
Image (3, 256, 256),Image (3, 256, 256),Image (3, 256, 256),Image (3, 256, 256),Image (3, 256, 256)
y: SlideObjectCategoryList
ImageBBox (256, 256),ImageBBox (256, 256),ImageBBox (256, 256),ImageBBox (256, 256),ImageBBox (256, 256)
Path:
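
For the class distribution asked about above, one possible starting point is a plain counter over per-item labels. This is only a sketch under assumptions: it presumes the integer class ids of each item can be extracted as a list, and how to pull them out of data.train_ds or data.valid_ds depends on the ObjectDetection library and is not shown. class_distribution and the example labels are hypothetical; only the class names come from the post.

```python
from collections import Counter

CLASS_NAMES = ["background", "hard negative", "mitotic figure"]  # from the post above


def class_distribution(labels_per_item, class_names=CLASS_NAMES):
    """Count how many items contain each class at least once."""
    counts = Counter()
    for labels in labels_per_item:
        for class_id in set(labels):  # count each class once per item
            counts[class_names[class_id]] += 1
    # Report every class, including those that never occur.
    return {name: counts.get(name, 0) for name in class_names}


# Hypothetical labels for four patches; real ones would come from the dataset.
example = [[2, 2], [1], [], [1, 2]]
dist = class_distribution(example)
print(dist)
```

The resulting dictionary can be passed to any bar chart, for example matplotlib's plt.bar(list(dist.keys()), list(dist.values())), once for the training labels and once for the validation labels.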