Fastai_v1, adding features

You may already create a custom dataloader; see my post about the sparse dataset above. The way fastai v1.0 handles data is about as flexible as any library built on PyTorch can get. If you have a particular use case, someone may be able to help on the forums if you create a thread about it.

Sorry for the late response :sweat_smile:

I was probably a bit unclear in my message. Yes, I agree, fastai includes a great list of standard datasets and makes working with them really simple. However, here is what I am talking about. Consider the following snippet:

from torchvision.datasets import MNIST
from fastai.vision import *

path = Path.home()/'data'/'MNIST'
train_ds = MNIST(path, train=True)
valid_ds = MNIST(path, train=False)
bunch = ImageDataBunch.create(train_ds, valid_ds)
learn = create_cnn(bunch, models.resnet18)
learn.fit_one_cycle(1)

The snippet cannot be used as-is, because a plain torchvision MNIST dataset doesn’t work with the library directly:

AttributeError: 'MNIST' object has no attribute 'c'

I mean that the fastai library is not directly compatible with the Dataset interface used by PyTorch. So I was thinking it would be great to be able to construct a DataBunch instance from “native” datasets, because right now implementing __getitem__ and __len__ is definitely not enough to build a custom fastai-ready class.

Also, the most recent version of the library (from master) seems to be very focused on the data block API, which makes it a bit difficult to construct a data bunch “manually”, I would say.


I hope my thoughts are clear :slight_smile:

Of course, I understand that the library builds a lot of additional abstractions on top of “plain” PyTorch capabilities. I would only like to note that making the library more “friendly” to PyTorch classes would be really helpful for someone who builds a lot of things manually.

Hi,

You may check my previous post just above yours :slight_smile: If you look at the screenshot of my Jupyter notebook, ds is a native PyTorch torch.utils.data.Dataset with the methods you mentioned, and dl is a native torch.utils.data.DataLoader.

All that DataBunch cares about is a DataLoader instance, which has __iter__ and __len__. Check this page out: https://pytorch.org/docs/stable/_modules/torch/utils/data/dataloader.html#DataLoader

Hope this helps.

class DataBunch():
    "Bind `train_dl`,`valid_dl` and`test_dl` to `device`. tfms are DL tfms (normalize). `path` is for models."
    def __init__(self, train_dl:DataLoader, valid_dl:DataLoader, test_dl:Optional[DataLoader]=None,
                 device:torch.device=None, tfms:Optional[Collection[Callable]]=None, path:PathOrStr='.',
                 collate_fn:Callable=data_collate):

As you see, it takes DataLoaders during construction.
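For instance, here is a minimal sketch of that route, using plain PyTorch objects end to end (the fastai.basic_data import path is an assumption on my part, since the module name moved around in early v1 releases):

import torch
from torch.utils.data import TensorDataset, DataLoader
from fastai.basic_data import DataBunch

# toy tensors standing in for a real dataset
x_train, y_train = torch.randn(512, 10), torch.randint(0, 2, (512,))
x_valid, y_valid = torch.randn(128, 10), torch.randint(0, 2, (128,))

# native PyTorch Datasets wrapped in native PyTorch DataLoaders
train_dl = DataLoader(TensorDataset(x_train, y_train), batch_size=64, shuffle=True)
valid_dl = DataLoader(TensorDataset(x_valid, y_valid), batch_size=64)

# constructed directly through the __init__ quoted above
data = DataBunch(train_dl, valid_dl)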

Best

Hi Kerem! Yes, you’re right, good example! I’m writing custom datasets as well :smile: So my point was to introduce this behaviour into the library to make interoperation with “standard” PyTorch classes simpler, i.e., to make it possible to drop in plain PyTorch loaders and datasets (maybe models as well?) in place of the fastai ones.

Hi Ilia,

Maybe I am not able to understand, but what is the exact use case that you are not able to find a way of doing with v1.0, for example?

That way it might make more sense, I guess, at least to me :slight_smile:

Best

Yes, sure, please check the snippet I’ve shared above. As far as I can understand, the DataBunch class declares (via type annotations) that it accepts torch.utils.data.Dataset instances:

class DataBunch():
    ...

    @classmethod
    def create(cls, train_ds:Dataset, valid_ds:Dataset, test_ds:Dataset=None, path:PathOrStr='.', bs:int=64,
               num_workers:int=defaults.cpus, tfms:Optional[Collection[Callable]]=None, device:torch.device=None,
               collate_fn:Callable=data_collate)->'DataBunch':

Following OOP principles, I read that signature as: I can pass any object that complies with the Dataset interface, including its subclasses. However, that is not really the case with the library. For example, I can’t do the following:

from torchvision.datasets import MNIST
from fastai.vision import *

path = Path.home()/'data'/'MNIST'
train_ds = MNIST(path, train=True)
valid_ds = MNIST(path, train=False)

# the objects don't have property `c` and cannot be directly passed into `create`
bunch = ImageDataBunch.create(train_ds, valid_ds)
learn = create_cnn(bunch, models.resnet18)
learn.fit_one_cycle(1)

I would say it expects a FastaiDataset interface that extends the original definition with additional properties. I mean, here is what we effectively have now:

# mock up interface to illustrate my idea
class FastaiDataset(Dataset):

    ... # some other properties

    @property
    def c(self):
        return len(self.classes)

# and then it should be more like this
class DataBunch():
    ...

    @classmethod
    def create(cls, train_ds:FastaiDataset, valid_ds:FastaiDataset, test_ds:FastaiDataset=None, path:PathOrStr='.', bs:int=64,
               num_workers:int=defaults.cpus, tfms:Optional[Collection[Callable]]=None, device:torch.device=None,
               collate_fn:Callable=data_collate)->'DataBunch':
        ...

Probably I am overcomplicating things :smile: The idea is that it would be great to be able to take any class compatible with the torch.utils.data.Dataset interface and pass it into DataBunch without additional manual wrappers and decorators. I mean that the type annotation is a bit misleading :sweat_smile: At least, from my point of view. Since the library strictly annotates every argument and return value, it is probably important for those annotations to reflect the interfaces that are actually expected.

Of course, it is only my personal opinion.

I’m unsure why this is an issue: why would you load MNIST like this when you have a convenience function to download the original dataset in a form fastai can open, which not only gets you create_cnn but plenty of other convenience functions like show_batch, Learner.predict, etc.?
The idea, when we say DataBunch can take any PyTorch Dataset, is that it will give you a data object you can pass into a Learner with your model, and then benefit from all the default training parameters the library offers.

As you saw, if you want to use a custom Dataset with create_cnn, you have to add a c property to it. That’s never going to change. If you load MNIST this way, the library’s data augmentation can’t work on it, and nothing we add can change that.
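In case it helps, adding that property is a one-liner; a minimal sketch with a hypothetical subclass name, not anything fastai provides:

from torchvision.datasets import MNIST

class MNISTWithC(MNIST):
    "A plain torchvision MNIST that also exposes the `c` attribute fastai looks for."
    @property
    def c(self):
        # number of classes, which fastai uses to size the model head
        return len(self.classes)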

The good answer is to adapt your Dataset to the fastai pipeline, because that’s always going to give you better results with the library.

OK, understood! Yes, I agree. I just very often find myself running into small issues when trying to make something “on the edge” between fastai and PyTorch. MNIST is just an example :smiley:

As an unrelated example, you can easily pass pandas DataFrames into scikit-learn classes even though scikit-learn works with NumPy arrays. All transformations are applied seamlessly thanks to interface compatibility.

But I agree, it is always possible to patch custom stuff to fit into fastai. I only wanted to clarify how the library is going to evolve, so that I don’t end up reinventing the wheel :smile:

You’re looking at the signature of DataBunch, which I believe can take arbitrary Datasets. However, create_cnn needs an ImageDataBunch, which needs c defined.

Yeah, that’s correct. I was thinking that if ImageDataBunch is OK with accepting arbitrary Dataset instances, then the result would work with the other library classes all the way down to the training loop. It was a bit unexpected to me that this is not the case :smile:

Ok, sorry for the misinterpretation.

Hi,
I’m wondering if the tabular data class could be extended to provide the ability to use sample weights for each entry when computing losses?
For the data I work with (high energy physics) these weights are necessary in order to allow the simulated data we train on to match reality.
Currently I use Keras for my work, which has such a feature, but having followed the DL courses I’m looking to move to using the Fast.AI library.

Having looked on the forums, there only seem to be a few topics on balancing classes via hard-coded weights and a custom loss function; with sample weights, though, the weighting depends on the particular batch being passed to the loss function, so it’s a bit more tricky.
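For reference, here is a rough sketch of the kind of thing I mean, in plain PyTorch (the column convention for the targets is made up for illustration and is not a fastai API):

import torch
import torch.nn.functional as F

def weighted_ce_loss(preds, target_and_weight):
    # hypothetical convention: column 0 holds the class index,
    # column 1 holds the per-sample weight for that entry
    target = target_and_weight[:, 0].long()
    weight = target_and_weight[:, 1]
    per_sample = F.cross_entropy(preds, target, reduction='none')
    # weight each entry's loss before averaging over the batch
    return (per_sample * weight).mean()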

Cheers


Adding an option to auto-save model state after each fit_one_cycle() cycle

Hi,
I am wondering whether an option to fit_one_cycle that autosaves the model state after each cycle would be useful? On several cloud providers I had the problem that my notebook died while training a domain language model, so I reran the 10-cycle step about 4 times…
Maybe one could specify a flag autosave=True or autosave=“modelname” and the function would dump the state automatically into modelname-01.pth, modelname-02.pth, …

If you think this sounds worthwhile, I might take a look at it. Hints on where to start (hooks?) are welcome…

Hi Christian,
Have you taken a look at the SaveModelCallback? It might already be what you need.
https://docs.fast.ai/callbacks.tracker.html#SaveModelCallback
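Roughly, the usage would be something like this (a sketch based on the linked docs page; argument defaults and the exact checkpoint naming may differ between v1 releases):

from fastai.callbacks.tracker import SaveModelCallback

# save a checkpoint at the end of every epoch of the cycle
learn.fit_one_cycle(10, callbacks=[
    SaveModelCallback(learn, every='epoch', name='modelname')
])
# checkpoints end up as modelname_0.pth, modelname_1.pth, ... under learn.path/models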

Did not know that one!

And yes, seems like it does just that!

Hi, and sorry if this request has already been discussed somewhere else!
I think that fastai is fantastic, but as an AllenNLP user I find it quite convenient to instantiate objects from Jsonnet blobs. In ablation studies such a declarative syntax allows an entire experiment to be specified using JSON; moreover, it allows architectures to be changed without changing code. It would be great to have experiment configuration files in fastai too. Thanks!!
For the unfamiliar reader, check this out to get an idea:

I’m using ImageDataBunch.from_df, which randomly splits the data frame into train/validation sets, with no option to pass a random seed.

I would like to add a seed=None argument, which would be passed on to random_split_by_pct to allow a reproducible split. Do you think it’s necessary?

You can do it with the data block API (which you should learn since it’s more flexible than the factory methods).
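For example, something along these lines; a sketch only, since class and method names shifted between v1 releases, and the path and the 'name'/'label' columns are placeholders for whatever your data frame actually contains:

import numpy as np
import pandas as pd
from fastai.vision import *

path = Path.home()/'data'/'my_images'   # placeholder image folder
df = pd.read_csv(path/'labels.csv')     # placeholder df with 'name' and 'label' columns

np.random.seed(42)  # fixes the random split below, making it reproducible
data = (ImageItemList.from_df(df, path, cols='name')
        .random_split_by_pct(0.2)
        .label_from_df(cols='label')
        .databunch(bs=64))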


How do I build a custom layer? Is there any tutorial?
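(For context, by a custom layer I mean a plain PyTorch nn.Module along these lines; the Swish activation is just an arbitrary illustration.)

import torch
import torch.nn as nn

class Swish(nn.Module):
    "x * sigmoid(x), usable anywhere an nn.Module is expected"
    def forward(self, x):
        return x * torch.sigmoid(x)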

I’m currently working on Object Detection and I’m thinking about developing an object similar to fastai’s ClassificationInterpretation to interpret the model.

Would that be a welcome addition to fastai? If so, do you have any particular requests with respect to the features?

(originally posted on Developer chat, removed to post here)

It would be welcome, yes.
