Yes, sure, please check the snippet I’ve shared above. As I understand it, the DataBunch
class declares (via type annotations) that it accepts torch.utils.data.Dataset
instances:
class DataBunch():
    ...
    @classmethod
    def create(cls, train_ds:Dataset, valid_ds:Dataset, test_ds:Dataset=None, path:PathOrStr='.', bs:int=64,
               num_workers:int=defaults.cpus, tfms:Optional[Collection[Callable]]=None, device:torch.device=None,
               collate_fn:Callable=data_collate)->'DataBunch':
Following OOP principles, I read this signature as: I can take any object that complies with the Dataset
interface, including its subclasses. However, that is not really the case with the library. For example, I can’t do the following:
from torchvision.datasets import MNIST
from fastai.vision import *
path = Path.home()/'data'/'MNIST'
train_ds = MNIST(path, train=True)
valid_ds = MNIST(path, train=False)
# the objects don't have property `c` and cannot be directly passed into `create`
bunch = ImageDataBunch.create(train_ds, valid_ds)
learn = create_cnn(bunch, models.resnet18)
learn.fit_one_cycle(1)
I would say it actually expects a FastaiDataset
interface that extends the original definition with additional properties. I mean, here is what is effectively expected now:
# mock-up interface to illustrate my idea
class FastaiDataset(Dataset):
    ...  # some other properties
    @property
    def c(self):
        return len(self.classes)

# and then it should be more like this
class DataBunch():
    ...
    @classmethod
    def create(cls, train_ds:FastaiDataset, valid_ds:FastaiDataset, test_ds:FastaiDataset=None, path:PathOrStr='.', bs:int=64,
               num_workers:int=defaults.cpus, tfms:Optional[Collection[Callable]]=None, device:torch.device=None,
               collate_fn:Callable=data_collate)->'DataBunch':
        ...
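The same idea could also be expressed structurally rather than through inheritance. Below is a minimal, dependency-free sketch using `typing.Protocol` (Python 3.8+). The names `SupportsC`, `PlainDataset`, `LabelledDataset` and `check_dataset` are hypothetical and not part of fastai or PyTorch. The two toy classes only stand in for real datasets, and `c` is the only extra property modelled here.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SupportsC(Protocol):
    """Hypothetical interface: a map-style dataset that also exposes `c`."""

    @property
    def c(self) -> int: ...

    def __len__(self) -> int: ...

    def __getitem__(self, idx: int) -> Any: ...


class PlainDataset:
    """Stand-in for a generic dataset that has no `c` property."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]


class LabelledDataset(PlainDataset):
    """Stand-in for a dataset that carries class labels and exposes `c`."""

    def __init__(self, items, classes):
        super().__init__(items)
        self.classes = list(classes)

    @property
    def c(self):
        return len(self.classes)


def check_dataset(ds) -> None:
    """Fail early, with a clear message, if `ds` lacks the extended interface."""
    if not isinstance(ds, SupportsC):
        raise TypeError(f"{type(ds).__name__} does not provide `c`, `__len__` and `__getitem__`")


plain = PlainDataset([1, 2, 3])
labelled = LabelledDataset([1, 2, 3], classes=["a", "b"])
print(isinstance(plain, SupportsC), isinstance(labelled, SupportsC))
```

With an annotation like `train_ds: SupportsC`, the signature would say exactly what is required, and any class providing those members would qualify without inheriting from a library-specific base class.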
Probably I am complicating things too much. The idea is that it would be great to be able to take any class that is compatible with the
torch.utils.data.Dataset
interface and pass it into DataBunch
without additional manual wrappers and decorators. I mean that the type annotation is a bit misleading, at least from my point of view. Since the library strictly annotates every argument and return value, it is probably important to have the appropriate interfaces.
Of course, it is only my personal opinion.
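For illustration, the kind of manual wrapper mentioned above could look roughly like the sketch below. `WithClasses` is a hypothetical name, not a fastai or PyTorch class. To keep the example dependency-free it wraps a plain list instead of a real dataset. In practice it would presumably subclass torch.utils.data.Dataset and wrap something like MNIST. Whether `c` and `classes` alone are enough for `create_cnn` is an assumption that is not verified here.

```python
class WithClasses:
    """Hypothetical adapter: wraps any map-style dataset and adds `classes` and `c`."""

    def __init__(self, dataset, classes):
        self.dataset = dataset
        self.classes = list(classes)

    @property
    def c(self):
        # Number of classes, mirroring the mock-up interface in the post
        return len(self.classes)

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        return self.dataset[idx]

    def __getattr__(self, name):
        # Only called when normal lookup fails. The guard avoids infinite
        # recursion if the wrapper's own attributes are not set yet
        # (for example while the object is being unpickled).
        if name in ("dataset", "classes"):
            raise AttributeError(name)
        # Forward everything else to the wrapped dataset
        return getattr(self.dataset, name)


# Toy data: (features, label) pairs standing in for a real dataset
toy = [([0.0, 1.0], 0), ([1.0, 0.0], 1), ([0.5, 0.5], 1)]
wrapped = WithClasses(toy, classes=["zero", "one"])
print(wrapped.c, len(wrapped), wrapped[0])
```

Having to write this adapter by hand for every third-party dataset is exactly the extra step that a more precise annotation, or a more permissive `create`, would make unnecessary.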