When using ImageDataBunch, the outputs are image paths and labels. I didn't see any code in Learner that reads these paths, so I'm wondering when exactly these images are loaded. Thanks!
Learner is not responsible for loading data. ItemList has a get method, which is responsible for returning an item. This get method is invoked by __getitem__, which is just the PyTorch Dataset method that the PyTorch DataLoader invokes during iteration in the training loop (fastai/basic_train.py:99) or the validation loop (fastai/basic_train.py:57). Learner only invokes the training/validation loop.
ImageList overrides the get method (fastai/vision/data.py:268) so that it loads the file (using the open_image function, fastai/vision/image:388) instead of returning the stored item.
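The mechanism described above can be sketched in plain Python. This is a simplified stand-in, not fastai's actual source: ToyItemList, ToyImageList, and the temporary files are hypothetical names made up for illustration, and reading raw bytes stands in for open_image. The point is only that the list stores paths, and a file is read when an item is indexed.

```python
import tempfile
from pathlib import Path


class ToyItemList:
    """Simplified stand-in for fastai's ItemList (hypothetical, for illustration)."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

    def get(self, i):
        # Base behaviour: return the stored item unchanged.
        return self.items[i]

    def __getitem__(self, i):
        # Indexing delegates to get(), so subclasses only need to override get().
        return self.get(i)


class ToyImageList(ToyItemList):
    """Stores paths; overrides get() so the file is read only on access."""

    def get(self, i):
        path = super().get(i)      # still just a path at this point
        return path.read_bytes()   # stand-in for open_image(path)


# Create two small files to play the role of images.
tmp_dir = Path(tempfile.mkdtemp())
paths = []
for name, payload in [("a.bin", b"first"), ("b.bin", b"second")]:
    p = tmp_dir / name
    p.write_bytes(payload)
    paths.append(p)

plain = ToyItemList(paths)
images = ToyImageList(paths)

print(type(plain[0]).__name__)  # a Path subclass: nothing was loaded
print(images.items[0].name)     # the list itself still holds paths
print(images[1])                # the file content, read at indexing time
```

Inspecting images.items shows paths only, which matches what you see when you look inside the databunch; the content appears only once an index is requested.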
If you have a tar file, try the code below (be sure to omit the file extension from the URL). It sets path_img, loads it into an ImageDataBunch, and the learner reads it in as "data".
from fastai.vision import *  # fastai v1

bs = 64  # batch size; assumed value, not given in the original post
path = untar_data('https://download.com/tarfilewithoutext'); path  # URL without the file extension
path_img = path
data = ImageDataBunch.from_folder(path=path_img, valid_pct=0.3, ds_tfms=get_transforms(), bs=bs, size=224, num_workers=8).normalize(imagenet_stats)
learn = cnn_learner(data, models.resnet101, metrics=[accuracy, error_rate])
Thanks Kornel, that makes a lot of sense. But when you say the get method is invoked by __getitem__, where exactly does that happen? Is it a Python default?
I didn't see get being called in the __getitem__ method of PyTorch's Dataset.
Thanks!
Hi saltdoc,
I think my question is more about where the image is loaded, since if you check the databunch, what is inside the dataset or dataloader is still a path. Thanks
In PyTorch, the DataLoader is used to join and split data items into batches. It is also iterable, so calling next(iter(dl)) returns the next batch, which is what the fastai training loop does when it iterates over it.
The Dataset only tells the DataLoader how to get a single item, which you do by implementing __getitem__.
You can check the PyTorch source code, but the guide is clear enough: https://pytorch.org/tutorials/beginner/data_loading_tutorial.html
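To make that division of labour concrete, here is a minimal sketch in plain Python. ToyDataset and ToyLoader are hypothetical stand-ins for torch.utils.data.Dataset and DataLoader (the real DataLoader also handles shuffling, collation into tensors, and worker processes). It shows that __getitem__ is called only while batches are being pulled, and that the loader is an iterable rather than an iterator.

```python
class ToyDataset:
    """Knows how to return one item; records every access."""

    def __init__(self, n):
        self.n = n
        self.accessed = []

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        self.accessed.append(i)
        return i * 10


class ToyLoader:
    """Groups dataset items into batches, in order (no shuffling)."""

    def __init__(self, dataset, batch_size):
        self.dataset = dataset
        self.batch_size = batch_size

    def __iter__(self):
        for start in range(0, len(self.dataset), self.batch_size):
            stop = min(start + self.batch_size, len(self.dataset))
            # This is where the dataset's __getitem__ is invoked.
            yield [self.dataset[i] for i in range(start, stop)]


ds = ToyDataset(5)
dl = ToyLoader(ds, batch_size=2)
print(ds.accessed)            # nothing has been fetched yet

first_batch = next(iter(dl))  # the loader is iterable, so wrap it in iter()
print(first_batch, ds.accessed)

all_batches = list(dl)        # a fresh pass over the whole dataset
print(all_batches)
```

Creating the loader touches no items at all; each item is requested only when the batch containing it is produced.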
Here is what happens, step by step:
- You get a dataloader from your ImageDataBunch for the set type you want, for example my_dl = data.dl(ds_type=DatasetType.Train).
- You trigger the __iter__ method by iterating over the dataloader or by directly calling next(my_dl.__iter__()).
- There are several levels of wrappers here, but it basically propagates DeviceDataLoader->DataLoader->LabelList->ImageList.
- On the ImageList it calls __getitem__() to load a single image, which after a couple more calls invokes open_image, which uses the PIL library to load the actual image from disk.
Hi Kornel, I know PyTorch and use it very often. The question is why __getitem__ would call the get method. Thanks for your patience!
Thanks slawekbiel!
I think you mean DataLoader->DeviceDataLoader->LabelList->ImageList. (I'm wrong)
Can you point me to where this happens? "load a single image which after a couple of more calls invokes open_image that uses the PIL library"
The dataloader order was correct: the ImageDataBunch holds an instance of DeviceDataLoader, which in turn holds PyTorch's DataLoader.
You can easily verify things like that in a notebook.
type(data.train_dl), type(data.train_dl.dl)
(fastai.basic_data.DeviceDataLoader, torch.utils.data.dataloader.DataLoader)
The order of calls to get an item is:
LabelList::__getitem__()
ItemList::__getitem__()
ImageList::get()
ImageList::open()
open_image()
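The call order above can be traced with a simplified sketch. The Toy* classes below are hypothetical stand-ins that mirror only the method names in the list (fastai's real classes do much more, such as applying transforms), and fake_open_image replaces the real open_image so that no image file or PIL install is needed.

```python
trace = []  # records the order in which the calls happen


def fake_open_image(fn):
    # Stand-in for fastai's open_image; returns a placeholder instead of pixels.
    trace.append("open_image")
    return f"<image loaded from {fn}>"


class ToyItemList:
    def __init__(self, items):
        self.items = list(items)

    def get(self, i):
        return self.items[i]

    def __getitem__(self, i):
        # Defined once here and inherited; logs the runtime class name.
        trace.append(f"{type(self).__name__}.__getitem__")
        return self.get(i)


class ToyImageList(ToyItemList):
    def get(self, i):
        trace.append("ToyImageList.get")
        return self.open(super().get(i))

    def open(self, fn):
        trace.append("ToyImageList.open")
        return fake_open_image(fn)


class ToyLabelList:
    """Pairs an input list (x) with a label list (y)."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __getitem__(self, i):
        trace.append("ToyLabelList.__getitem__")
        return self.x[i], self.y[i]


x = ToyImageList(["cat.jpg", "dog.jpg"])
y = ToyItemList(["cat", "dog"])
item = ToyLabelList(x, y)[0]

print(item)
print(" -> ".join(trace))
```

Note that __getitem__ is written only on the base list class; the image list inherits it and changes behaviour purely by overriding get, which is why searching the image list class alone does not reveal where get is called.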
Thanks slawekbiel!
I finally found that get is called in ItemList's __getitem__.
Hey, I have been using the fastai v1 object detection library (GitHub - ChristianMarzahl/ObjectDetection: Some experiments with object detection in PyTorch) and I have a question about how to plot the items of the "data" databunch. I need to know the distribution of images in the train and validation sets (for example, how many hard negatives and hard positives). How can I do that?
This is my code for loading data:
import numpy as np
train_samples_per_scanner = 3000
val_samples_per_scanner = 1000
train_images = list(np.random.choice(training_set, train_samples_per_scanner))
valid_images = list(np.random.choice(valid_set, val_samples_per_scanner))
batch_size = 64
do_flip = True
flip_vert = True
max_rotate = 90
max_zoom = 1.1
max_lighting = 0.2
max_warp = 0.2
p_affine = 0.75
p_lighting = 0.75
tfms = get_transforms(do_flip=do_flip,
                      flip_vert=flip_vert,
                      max_rotate=max_rotate,
                      max_zoom=max_zoom,
                      max_lighting=max_lighting,
                      max_warp=max_warp,
                      p_affine=p_affine,
                      p_lighting=p_lighting)
train, valid, test = ObjectItemListSlide(train_images), ObjectItemListSlide(valid_images), ObjectItemListSlide(test_images)
item_list = ItemLists(".", train, test)
lls = item_list.label_from_func(lambda x: x.y, label_cls=SlideObjectCategoryList)
lls = lls.transform(tfms, tfm_y=True, size=patch_size)
data = lls.databunch(bs=batch_size, collate_fn=bb_pad_collate, num_workers=0).normalize()
Here the training set and validation set are lists of object_detection_fastai.helper.wsi_loader.SlideContainer objects.
I want a plot of how many of the items have which class ([0, 1, 2] = ['background', 'hard negative', 'mitotic figure']).
All suggestions, code snippets, and notebooks are welcome; please share whatever resources you have for this problem.
Thank you in advance,
Harshit
DATA AS SHOWN BY LEARNER OBJECT
learn
data=ImageDataBunch;
Train: SlideLabelList (3000 items)
x: ObjectItemListSlide
Image (3, 256, 256),Image (3, 256, 256),Image (3, 256, 256),Image (3, 256, 256),Image (3, 256, 256)
y: SlideObjectCategoryList
ImageBBox (256, 256),ImageBBox (256, 256),ImageBBox (256, 256),ImageBBox (256, 256),ImageBBox (256, 256)
Path: .;
Valid: SlideLabelList (1000 items)
x: ObjectItemListSlide
Image (3, 256, 256),Image (3, 256, 256),Image (3, 256, 256),Image (3, 256, 256),Image (3, 256, 256)
y: SlideObjectCategoryList
ImageBBox (256, 256),ImageBBox (256, 256),ImageBBox (256, 256),ImageBBox (256, 256),ImageBBox (256, 256)
Path: