Cannot read Pascal 2007 into a Fastai2 dataloader

joseadolfo · January 30, 2020, 1:42am

I am trying to port a Pascal 2007-based object detection application to fastai2, but it fails reading the Pascal data set into the dataloader.

I have created a dictionary of images and targets, a DataBlock, and a dataloader:

img_y_dict = dict(zip(tot_img_names, tot_truths))

data = DataBlock(blocks=(ImageBlock, BBoxBlock, BBoxLblBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(),
                 getters=[noop, lambda o: img_y_dict[o.name][0], lambda o: img_y_dict[o.name][1]],
                 item_tfms=Resize(sz),
                 batch_tfms=aug_transforms(),
                 n_inp=1)

dlrs = data.dataloaders(path)

It fails in the getters

in (o)
2 get_items=get_image_files,
3 splitter=RandomSplitter(),
----> 4 getters=[noop, lambda o: img_y_dict[o.name][0], lambda o: img_y_dict[o.name][1]],
5 #getters=[truth_data_func],
6 item_tfms=Resize(sz),

KeyError: '008673.jpg'

It fails with at least these images: "008673.jpg", "005939.jpg", and "004236.jpg" in test.json, and "008359.jpg" in valid.json. I know that "008359.jpg" has a header but is missing the "segmentation" entry. The other image/target tuples may have errors as well.

The error says KeyError: '008673.jpg', so I inspected the img_y_dict dictionary:

for k in img_y_dict.keys():
    print(k)

None of the failing images appears in the dictionary. They seem to have been filtered out (by get_annotations, perhaps?)
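One way to confirm this is to compare the file names found on disk with the dictionary keys. The sketch below uses only the standard library. `find_unannotated` is a hypothetical helper (not part of fastai), and it assumes the dictionary is keyed by bare file names such as '008673.jpg', as in the getters above.

```python
from pathlib import Path

def find_unannotated(image_dir, img_y_dict, suffixes=(".jpg", ".jpeg", ".png")):
    """Return the sorted names of image files under image_dir that have no key in img_y_dict.

    Hypothetical helper for illustration only; not part of fastai.
    """
    # Collect the bare file names of every image found recursively under image_dir
    names = {p.name for p in Path(image_dir).rglob("*")
             if p.is_file() and p.suffix.lower() in suffixes}
    # Anything on disk that is not a dictionary key would raise KeyError in the getters
    return sorted(names - set(img_y_dict))
```

Calling `find_unannotated(path, img_y_dict)` should list every file that would trigger the KeyError, which makes it easier to tell a handful of broken annotations apart from a whole folder of images that were never in the annotation file.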

muellerzr · January 30, 2020, 3:30am

This may help you. It’s slightly outdated (DataBunch is now DataLoaders, etc.): https://github.com/muellerzr/A-walk-with-fastai2/blob/master/12_Object_Detection.ipynb

muellerzr · January 30, 2020, 5:27pm

@joseadolfo I’m going through this now and seeing it too, so let me investigate. The issue is that we are getting the images from the folder rather than from our JSON document. Here is how I went about adjusting it:

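# imgs, img2bbox, item_tfms and batch_tfms are assumed to be defined earlier in the notebook.
# Each item is an image file name taken from the JSON annotations, so it always has an entry in img2bbox.
getters = [lambda o: path/'train'/o, lambda o: img2bbox[o][0], lambda o: img2bbox[o][1]]

def get_train_imgs(noop):
  # Ignore the source path that DataBlock passes in and return the annotated image names
  return imgs

pascal = DataBlock(blocks=(ImageBlock, BBoxBlock, BBoxLblBlock),
                 splitter=RandomSplitter(),
                 get_items=get_train_imgs,
                 getters=getters,
                 item_tfms=item_tfms,
                 batch_tfms=batch_tfms,
                 n_inp=1)
dls = pascal.dataloaders(path/'train')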

I’m working on an actual notebook now but I’ll upload it once it’s done
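An alternative, sketched here under assumptions: keep a folder-based `get_items`, but drop any file whose name has no annotation. `make_annotated_getter` is a hypothetical helper, not a fastai function. It uses only the standard library and assumes `img2bbox` is keyed by bare file names.

```python
from pathlib import Path

def make_annotated_getter(img2bbox, suffixes=(".jpg", ".jpeg", ".png")):
    """Build a get_items-style function that returns only annotated image paths.

    Hypothetical helper for illustration only; not part of fastai.
    """
    def get_items(source):
        # Walk the source folder recursively and keep image files only
        files = (p for p in Path(source).rglob("*")
                 if p.is_file() and p.suffix.lower() in suffixes)
        # Keep a file only if its bare name has an annotation entry
        return sorted(p for p in files if p.name in img2bbox)
    return get_items
```

With this variant the items are Path objects rather than file names, so the getters would index the dictionary by `o.name`, as in the first post, instead of by `o`.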

joseadolfo · January 30, 2020, 6:39pm

Thank you for the helpful information. I believe there is a bug particular to reading the Pascal dataset. When I switch to reading COCO, the DataBlock and dataloaders work perfectly.

muellerzr · January 30, 2020, 6:40pm

It’s not a bug; it has to do with how the folder structure is set up. If you look at the length of get_image_files's output inside the train folder, you can see it’s much larger than the number of train images.
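A rough sketch of that check, using only the standard library: count the image files in the folder and the images listed in the JSON file. It assumes the annotation file is COCO-style, with a top-level "images" list whose entries carry a "file_name" field. `count_mismatch` is a hypothetical helper, not part of fastai.

```python
import json
from pathlib import Path

def count_mismatch(image_dir, json_file):
    """Return (files on disk, images listed in the JSON, files on disk that are not listed).

    Hypothetical helper; assumes a COCO-style JSON with an "images" list.
    """
    # Bare names of the .jpg files sitting directly in image_dir
    on_disk = {p.name for p in Path(image_dir).iterdir() if p.suffix.lower() == ".jpg"}
    # File names that the annotation file knows about
    with open(json_file) as f:
        listed = {img["file_name"] for img in json.load(f)["images"]}
    return len(on_disk), len(listed), len(on_disk - listed)
```

For example, `count_mismatch(path/'train', path/'train.json')` would return three counts; a large third number means the folder holds many images that the annotation file never mentions.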

muellerzr · January 30, 2020, 7:13pm

@joseadolfo if you need it, I have the start of a RetinaNet notebook (It’s still training but wanted you to have something end to end) https://github.com/muellerzr/Practical-Deep-Learning-for-Coders-2.0/blob/master/Computer%20Vision/06_Object_Detection.ipynb

joseadolfo · January 30, 2020, 9:17pm

Thanks, very kind of you. I wrote an end-to-end object detection notebook in Fastai1 that applies Google’s AutoAugment data augmentation policy. Results were very impressive. I am now porting the notebook to Fastai2 to benchmark performance. Aside from my problems reading Pascal, I am struggling to get access to the image/target tuple at the mini-batch level, something similar to the dl_tfms parameter of the Fastai1 databunch. If you are interested, the Fastai1 notebook is at: https://github.com/jav0927/course-v3/blob/master/SSD_Object_Detection_RS50_V2_0_AutoAugmented.ipynb

muellerzr · January 30, 2020, 9:19pm

Absolutely! I don’t have much time to work on it, but I can try to help. Whereabouts in the code is this (where you’re trying to grab the mini-batch)?

muellerzr · January 30, 2020, 9:38pm

@joseadolfo I believe you’re talking about accessing the transforms at the batch level. That is the batch_tfms parameter, or after_batch if you’re adjusting it after the fact. Let me know if you need help from there (but I hope that gives a hint on where to start).

joseadolfo · January 30, 2020, 9:54pm

Thanks very much.

bousejin · July 15, 2020, 2:49pm

Hi Zach. Thank you for your great notebook. After training when I call learn.show_results() it gives me “TypeError: object of type ‘int’ has no len()” error!

muellerzr · July 15, 2020, 4:10pm

Yes. show_results and predict will not work. I’m not very familiar with Object Detection, though I am aware there is a thread on it for fastai v2 where they handled some issues

bousejin · July 15, 2020, 5:34pm

Thank you!

dreamflasher · January 12, 2021, 1:39pm

Thank you for your great tutorials. Here’s the updated link for the object detection tutorial: https://walkwithfastai.com/Object_Detection