I’m trying to use a custom weighted sampler for my classifier because my data is imbalanced. Is there any way to do that? I thought about passing `**dl_kwargs` when calling the `databunch()` factory method, but it actually has two problems:
1. `shuffle` is forced to `True` for `train_dl` and to `False` for the other dataloaders, which prevents me from setting it to `False` for `train_dl` manually (the argument can’t be passed twice).
2. I can’t pass a sampler that will only be used for `train_dl`, since `**dl_kwargs` is passed to all the dataloaders.
I assume I’ll have to create the dataloaders myself and use `DataBunch.__init__`, but I wanted to make sure I wasn’t missing something simpler first.
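For context, the standard plain-PyTorch recipe is to compute one weight per training sample, inversely proportional to its class frequency, and hand those weights to `torch.utils.data.WeightedRandomSampler`. Here is a minimal sketch of just the weight computation (the helper name is my own, not a fastai API):

```python
from collections import Counter

def balanced_sample_weights(labels):
    """Return one weight per sample, inversely proportional to its
    class frequency, so each class is drawn equally often on average."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]

labels = [0, 0, 0, 1]  # imbalanced toy labels: class 1 is rare
weights = balanced_sample_weights(labels)
# each class-0 sample gets weight 1/3, the lone class-1 sample gets 1.0
```

These weights would then be wrapped in `WeightedRandomSampler(weights, num_samples=len(weights))` and passed as `sampler=` to the training `DataLoader` (with `shuffle` left off, since a sampler and shuffling are mutually exclusive).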
@sgugger I’m having this exact problem too, and I like this solution, but I’m not sure how I would give my data to the new dataloader. I believe I need to get the PyTorch dataset out of the fastai data block.
```
TypeError                                 Traceback (most recent call last)
<ipython-input-24-1e5ac58e2822> in <module>
----> 1 data.train_dl = data.train_dl.new(train_dataset, shuffle=False, sampler=sampler)

TypeError: new() takes 1 positional argument but 2 were given
```
I also think it might be easier to build the PyTorch datasets and dataloaders directly with PyTorch and then create a DataBunch from them. However, with this option I’m not sure how to create the initial PyTorch datasets, since I can’t find how fastai does it for tabular problems.
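One way to sidestep the factory method entirely is to turn your (already processed) feature and target arrays into a `TensorDataset` and build the training loader yourself. A sketch, assuming toy tensors in place of real tabular data:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, WeightedRandomSampler

# Assumed toy data: 8 samples, 3 features, binary target where class 1 is rare.
X = torch.randn(8, 3)
y = torch.tensor([0, 0, 0, 0, 0, 0, 1, 1])

# Inverse-frequency weight for each sample.
counts = torch.bincount(y)
weights = 1.0 / counts[y].float()

sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
train_ds = TensorDataset(X, y)
# No shuffle here: PyTorch forbids combining shuffle=True with a custom sampler.
train_dl = DataLoader(train_ds, batch_size=4, sampler=sampler)

xb, yb = next(iter(train_dl))  # one rebalanced mini-batch
```

From loaders like these you could then construct the bunch yourself (e.g. `DataBunch(train_dl, valid_dl)` in fastai v1), which avoids the `**dl_kwargs` limitation altogether.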
Well, I don’t know why it says your dataloader is empty while at the same time saying it has 338562 elements. Did you try a smaller batch size, as the message suggests?
If you want to give it the dataset, you can access it through `data.train_ds`.