Making a subset dataset from Open Images Dataset V4 for your app

daisukelab · January 2, 2019, 3:57am

Hi everyone,

I’m sharing something I’ve been working on recently. It helps with:

Creating a subset of the big Open Images Dataset V4 (OIDV4) for training your application’s model.
Filtering annotations by class and by bounding box (bbox) size.
Finding better-balanced annotations.
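
To make these ideas concrete, here is a minimal pandas sketch of filtering and balancing. It is not the utility from the repository. It assumes the column names of the OIDV4 bounding-box annotation CSVs (ImageID, LabelName, XMin, XMax, YMin, YMax, with coordinates normalized to 0..1). The function names, the toy rows, and the label strings are hypothetical placeholders.

```python
import pandas as pd


def filter_annotations(df, class_ids, min_area=0.01):
    """Keep boxes of the wanted classes whose normalized area is >= min_area."""
    # Coordinates are fractions of image width/height, so area is also a fraction.
    area = (df["XMax"] - df["XMin"]) * (df["YMax"] - df["YMin"])
    return df[df["LabelName"].isin(class_ids) & (area >= min_area)]


def balance_by_class(df, max_per_class, seed=0):
    """Downsample every class to at most max_per_class boxes."""
    parts = [
        group.sample(n=min(len(group), max_per_class), random_state=seed)
        for _, group in df.groupby("LabelName")
    ]
    return pd.concat(parts).sort_index()


# Toy annotations standing in for a real OIDV4 annotation CSV (hypothetical data).
annotations = pd.DataFrame({
    "ImageID":   ["a", "a", "b", "c", "d", "e"],
    "LabelName": ["/m/cls_a", "/m/cls_b", "/m/cls_a", "/m/cls_a", "/m/cls_b", "/m/cls_c"],
    "XMin": [0.1, 0.40, 0.0, 0.2, 0.5, 0.1],
    "XMax": [0.5, 0.45, 0.6, 0.7, 0.9, 0.9],
    "YMin": [0.1, 0.40, 0.2, 0.3, 0.1, 0.1],
    "YMax": [0.5, 0.45, 0.7, 0.7, 0.5, 0.9],
})

wanted = ["/m/cls_a", "/m/cls_b"]
subset = filter_annotations(annotations, wanted, min_area=0.01)  # drops the tiny box and cls_c
balanced = balance_by_class(subset, max_per_class=1)             # one box per remaining class
print(balanced[["ImageID", "LabelName"]])
```

With a real annotation file, the toy DataFrame would be replaced by `pd.read_csv(...)` on the downloaded CSV. Capping the count per class is only one simple way to balance; other strategies may suit your data better.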

It’s great that big datasets like OIDV4 are now available for free, but that brings a new problem: it’s tough to reshape them for use in our own applications.

So I made a utility to mitigate this problem. Here is an example notebook that shows how to make a subset and also trains a simple object detection model, borrowing from courses/dl2/pascal.ipynb.
–> Link to notebook Example_Open_Images_Dataset_V4_fast.ai.v0.7.pascal.ipynb

Please note that this notebook uses the old fast.ai library (v0.7).

I’m planning to migrate it to fast.ai v1 after the new course becomes available.

This is part of my project repository, which collects utilities and examples for practical image applications.

Link to daisukelab/dl-image

Hope it helps.