Cannot read Pascal 2007 into a Fastai2 dataloader

I am trying to port a Pascal 2007-based object detection application to fastai2, but it fails when reading the Pascal dataset into the dataloader.

I have created a dictionary mapping image names to targets, a DataBlock, and a dataloader:

from fastai.vision.all import *

# Map image file name -> (bboxes, labels)
img_y_dict = dict(zip(tot_img_names, tot_truths))

data = DataBlock(blocks=(ImageBlock, BBoxBlock, BBoxLblBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(),
                 getters=[noop, lambda o: img_y_dict[o.name][0], lambda o: img_y_dict[o.name][1]],
                 item_tfms=Resize(sz),
                 batch_tfms=aug_transforms(),
                 n_inp=1)

dlrs = data.dataloaders(path)

It fails in the getters:

in <lambda>(o)
2 get_items=get_image_files,
3 splitter=RandomSplitter(),
----> 4 getters=[noop, lambda o: img_y_dict[o.name][0], lambda o: img_y_dict[o.name][1]],
5 #getters=[truth_data_func],
6 item_tfms=Resize(sz),

KeyError: '008673.jpg'

It fails on at least these images: "008673.jpg", "005939.jpg", and "004236.jpg" in test.json, and "008359.jpg" in valid.json. I know that "008359.jpg" has a header but is missing the "segmentation" entry. The other image/target tuples may have errors as well.

The error says KeyError: '008673.jpg', so I inspected the img_y_dict dictionary:

for k in img_y_dict.keys():
    print(k)

None of the failing images appear in the dictionary. Have they been filtered out (by get_annotations?)
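For illustration, here is a minimal stdlib sketch of how a file-name -> (bboxes, labels) dict can be built from a COCO-style JSON. It is not fastai's get_annotations: the helper name build_img_y_dict is hypothetical, and it assumes the usual COCO keys ("images", "annotations", "categories") with boxes stored as [x, y, width, height]. Because entries are created only for images that appear in the annotations list, any image without an annotation never gets a key, and looking it up raises KeyError.

```python
from collections import defaultdict

def build_img_y_dict(coco):
    """Map file_name -> (bboxes, labels) from a COCO-style annotation dict.

    Hypothetical helper: boxes are converted from [x, y, w, h] to
    [x_min, y_min, x_max, y_max]; images with no annotations are skipped.
    """
    classes = {c["id"]: c["name"] for c in coco["categories"]}
    bboxes, labels = defaultdict(list), defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]
        bboxes[ann["image_id"]].append([x, y, x + w, y + h])
        labels[ann["image_id"]].append(classes[ann["category_id"]])
    # Only images that received at least one box end up as keys
    return {img["file_name"]: (bboxes[img["id"]], labels[img["id"]])
            for img in coco["images"] if img["id"] in bboxes}

# Toy data (not real Pascal annotations): img_b.jpg has no annotation
sample = {
    "images": [{"id": 1, "file_name": "img_a.jpg"},
               {"id": 2, "file_name": "img_b.jpg"}],
    "annotations": [{"image_id": 1, "bbox": [155, 96, 196, 174], "category_id": 7}],
    "categories": [{"id": 7, "name": "car"}],
}
d = build_img_y_dict(sample)
print(d)                  # {'img_a.jpg': ([[155, 96, 351, 270]], ['car'])}
print("img_b.jpg" in d)   # False
```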


This may help you. It's slightly outdated (DataBunch is now DataLoaders, etc.): https://github.com/muellerzr/A-walk-with-fastai2/blob/master/12_Object_Detection.ipynb

I'm going through this now and seeing it too; let me investigate @joseadolfo :slight_smile: The issue is that we are getting the images from the folder rather than from our JSON document. Here is how I adjusted it:

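from fastai.vision.all import *

# imgs: image file names listed in the training JSON
# img2bbox: file name -> (bboxes, labels)
getters = [lambda o: path/'train'/o, lambda o: img2bbox[o][0], lambda o: img2bbox[o][1]]

def get_train_imgs(source):
    # Ignore the folder DataBlock passes in; use only the annotated file names
    return imgs

pascal = DataBlock(blocks=(ImageBlock, BBoxBlock, BBoxLblBlock),
                   splitter=RandomSplitter(),
                   get_items=get_train_imgs,
                   getters=getters,
                   item_tfms=item_tfms,
                   batch_tfms=batch_tfms,
                   n_inp=1)
dls = pascal.dataloaders(path/'train')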
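An alternative to returning the JSON list directly is to keep listing the folder but drop every file that has no annotation entry. Below is a minimal, framework-agnostic sketch, assuming the annotations are already in a dict keyed by file name (like img_y_dict in the question); annotated_items is a hypothetical helper, and in fastai the list of files passed in would come from get_image_files.

```python
from pathlib import Path

def annotated_items(files, annotations):
    """Keep only the image paths whose file name has an annotation entry."""
    return [f for f in files if Path(f).name in annotations]

# Toy data: img_b.jpg is on disk but has no annotation
files = [Path("train/img_a.jpg"), Path("train/img_b.jpg"), Path("train/img_c.jpg")]
annotations = {"img_a.jpg": ([[0, 0, 10, 10]], ["dog"]),
               "img_c.jpg": ([[5, 5, 20, 20]], ["cat"])}
print([f.name for f in annotated_items(files, annotations)])  # ['img_a.jpg', 'img_c.jpg']
```

In a DataBlock this might be wired up as get_items=lambda p: annotated_items(get_image_files(p), img_y_dict), though that is a sketch rather than something checked against fastai. The trade-off versus returning the JSON list is that unannotated images are silently dropped instead of raising KeyError.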

I'm working on a full notebook now and I'll upload it once it's done :slight_smile:

Thank you for the helpful information. I believe there is a bug specific to reading the Pascal dataset: when I switch to reading COCO, the DataBlock and dataloaders work perfectly.

It's not a bug; it has to do with how the folder structure is set up. If we look at the length of get_image_files on the train folder, you can see it's much larger than the number of training images listed in the JSON.
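A quick way to check this is to compare the folder listing against the annotation keys. Here is a minimal stdlib sketch; annotation_coverage is a hypothetical name, and in practice the file list would come from get_image_files(path/'train') and the annotated names from the training JSON.

```python
from pathlib import Path

def annotation_coverage(files, annotated_names):
    """Return (n_files, n_annotated, sorted names of files with no annotation)."""
    names = {Path(f).name for f in files}
    missing = sorted(names - set(annotated_names))
    return len(names), len(names) - len(missing), missing

# Toy data: four files on disk, only two of them annotated
files = ["train/a.jpg", "train/b.jpg", "train/c.jpg", "train/d.jpg"]
n_files, n_annotated, missing = annotation_coverage(files, ["a.jpg", "c.jpg"])
print(n_files, n_annotated, missing)  # 4 2 ['b.jpg', 'd.jpg']
```

Any name in the missing list is one that a getter like img_y_dict[o.name] would fail on.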

@joseadolfo if you need it, I have the start of a RetinaNet notebook (it's still training, but I wanted you to have something end to end) :slight_smile: https://github.com/muellerzr/Practical-Deep-Learning-for-Coders-2.0/blob/master/Computer%20Vision/06_Object_Detection.ipynb

Thanks, very kind of you. I wrote an end-to-end object detection notebook in fastai1 that applies Google's AutoAugment data augmentation policy, and the results were very impressive. I am now porting the notebook to fastai2 to benchmark performance. Aside from my problems reading Pascal, I am struggling to access the image/target tuple at the mini-batch level, something similar to the dl_tfms parameter of the fastai1 DataBunch. If you are interested, the fastai1 notebook is at: (https://github.com/jav0927/course-v3/blob/master/SSD_Object_Detection_RS50_V2_0_AutoAugmented.ipynb)


Absolutely! :slight_smile: I don't have much time to work on it, but I can try to help. Where in the code is this (where you're trying to grab the mini-batch)?

@joseadolfo I believe you're talking about applying transforms at the batch level. That is the batch_tfms parameter, or after_batch if you're adjusting it after the fact :slight_smile: Let me know if you need help from there (but I hope that gives a hint on where to start).
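To see why a box-aware augmentation needs the whole (image, bboxes, labels) tuple at once, here is a framework-agnostic sketch of a horizontal flip applied over a batch of samples. It is not fastai's batch_tfms/after_batch API: images are assumed to be nested lists of pixel rows, boxes are [x_min, y_min, x_max, y_max] in pixel coordinates, and hflip_sample/flip_batch are hypothetical names. fastai may store batch boxes in a different coordinate convention, so the arithmetic there could differ.

```python
def hflip_sample(image, boxes, labels):
    """Mirror one image left-right and move its boxes to match.

    image: list of pixel rows; boxes: [x_min, y_min, x_max, y_max] in pixels.
    """
    width = len(image[0])
    flipped_image = [list(reversed(row)) for row in image]
    # x_min and x_max swap roles after mirroring around the image width
    flipped_boxes = [[width - x_max, y_min, width - x_min, y_max]
                     for x_min, y_min, x_max, y_max in boxes]
    return flipped_image, flipped_boxes, labels  # labels are unchanged

def flip_batch(batch):
    """Apply the flip to every (image, boxes, labels) sample in a batch."""
    return [hflip_sample(*sample) for sample in batch]

image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
batch = [(image, [[0, 0, 1, 2]], ["cat"])]
print(flip_batch(batch))
# [([[4, 3, 2, 1], [8, 7, 6, 5]], [[3, 0, 4, 2]], ['cat'])]
```

Flipping only the image, or only the boxes, would leave them misaligned, which is why this kind of transform has to see the image and its targets together.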


Thanks very much.

Hi Zach, thank you for your great notebook. After training, calling learn.show_results() gives me "TypeError: object of type 'int' has no len()".

Yes, show_results and predict will not work. I'm not very familiar with object detection, though I am aware there is a thread on it for fastai v2 where they handled some of these issues.

Thank you!

Thank you for your great tutorials. Here's the updated link for the object detection tutorial: https://walkwithfastai.com/Object_Detection
