[Proposal] VisionDataBunch class

tnisonoff · October 9, 2018, 12:29am

In an experimental commit (linked below*), I create a VisionDataBunch in vision/data.py that overrides #create and #labels_to_csv, rather than creating private functions and overwriting DataBunch’s versions of these on import.

The current implementation has the major drawback that once you import
fastai.vision.data, you can no longer use DataBunch.create on
non-image datasets.

Additionally, this seems cleaner and easier to reason about.
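
To make the idea concrete, here is a rough sketch of the subclass approach. It is not the code from the commit: the DataBunch below is a simplified stand-in, and its constructor signature and the way ds_tfms is applied to each item are assumptions for illustration only.

```python
class DataBunch:
    """Simplified stand-in for the base class (hypothetical, not fastai's real code)."""

    def __init__(self, train_ds, valid_ds):
        self.train_ds = train_ds
        self.valid_ds = valid_ds

    @classmethod
    def create(cls, train_ds, valid_ds, **kwargs):
        # Generic creation: knows nothing about images or transforms.
        return cls(train_ds, valid_ds, **kwargs)


class VisionDataBunch(DataBunch):
    """Vision-specific subclass that overrides create instead of patching the base."""

    @classmethod
    def create(cls, train_ds, valid_ds, ds_tfms=None, **kwargs):
        # Assumption: ds_tfms is a single callable applied to every item.
        if ds_tfms is not None:
            train_ds = [ds_tfms(x) for x in train_ds]
            valid_ds = [ds_tfms(x) for x in valid_ds]
        # Defer to the base implementation for everything else.
        return super().create(train_ds, valid_ds, **kwargs)


# The base class is untouched, so non-image data goes through the generic path.
tabular = DataBunch.create([1, 2], [3])
images = VisionDataBunch.create([1, 2], [3], ds_tfms=lambda x: x * 10)
```

With this layout the vision behaviour lives in its own class, so it is always clear which create is being called.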

If people think this is an appropriate change I’m happy to modify the existing notebooks / create a notebook for this change, but I was hesitant to go through that process if the change would be rejected.

I’m interested to hear others’ thoughts!

https://github.com/tylernisonoff/fastai/commit/1901da586b284608faa533f52d76397d15b8db6f

sgugger · October 9, 2018, 1:38am

Why is that? The method is monkey-patched but its old behavior isn’t erased.
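
As a minimal sketch of what monkey-patching without erasing the old behavior can look like (the class, the _vision_create name, and the ds_tfms handling here are hypothetical and simplified, not the library’s actual code):

```python
class DataBunch:
    """Simplified stand-in for the base class (hypothetical)."""

    def __init__(self, train_ds, valid_ds):
        self.train_ds = train_ds
        self.valid_ds = valid_ds

    @classmethod
    def create(cls, train_ds, valid_ds, **kwargs):
        return cls(train_ds, valid_ds, **kwargs)


# --- what a vision module could do at import time ---

# Keep a reference to the original underlying function before replacing it.
_base_create = DataBunch.create.__func__


def _vision_create(cls, train_ds, valid_ds, ds_tfms=None, **kwargs):
    # Assumption: ds_tfms is a single callable applied to every item.
    if ds_tfms is not None:
        train_ds = [ds_tfms(x) for x in train_ds]
        valid_ds = [ds_tfms(x) for x in valid_ds]
    # The original behavior still runs; the patch only adds a step before it.
    return _base_create(cls, train_ds, valid_ds, **kwargs)


# Replace the method on the class with the wrapped version.
DataBunch.create = classmethod(_vision_create)

# Without ds_tfms, the patched method behaves like the original one.
plain = DataBunch.create([1, 2], [3])
transformed = DataBunch.create([1, 2], [3], ds_tfms=lambda x: x + 1)
```

Because ds_tfms defaults to None, callers with non-image data get the same result from the patched method as from the original.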

tnisonoff · October 9, 2018, 2:17am

Ah I see now that they’re essentially the same implementation, but the second one adds onto it.

How about we just merge the ds_tfms keyword arg into the original DataBunch.create?

This way we don’t have to maintain two versions of the same code, and it’s a lot easier to reason about (I got tripped up debugging some vision code because I was looking at the wrong create method).

sgugger · October 9, 2018, 12:01pm

There is a dependency reason that makes it that way. fastai.data shouldn’t depend on fastai.vision.data.