I’m working on a binary classification problem. My dataset lives in an external database, so I wrote my own Dataset subclass:
class MyDS(Dataset):
    ...
    def __getitem__(self, index):
        # the external database is queried here
        the_x = torch.tensor(...)
        the_y = bool(...)
        return torch.unsqueeze(the_x, 0), torch.unsqueeze(torch.tensor(the_y).float(), 0)
Would it be wrong to let __getitem__() return a whole minibatch? That would reduce the overhead of the external queries, and PyTorch forces me to unsqueeze() anyhow, adding the sample-within-minibatch dimension.
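For what it's worth, here is a minimal sketch of the idea, with a made-up BatchDS standing in for my class and random tensors standing in for the database query. Passing batch_size=None to DataLoader disables automatic batching, so each item (already a full minibatch) passes through without an extra outermost dimension:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class BatchDS(Dataset):
    """Hypothetical sketch: each __getitem__ call returns a whole minibatch."""
    def __init__(self, n_batches=4, batch_size=8):
        self.n_batches = n_batches
        self.batch_size = batch_size

    def __len__(self):
        # length counts minibatches, not individual samples
        return self.n_batches

    def __getitem__(self, index):
        # one simulated "database query" yields a whole minibatch
        x = torch.randn(self.batch_size, 1, 10)                 # (B, C, features)
        y = torch.randint(0, 2, (self.batch_size, 1)).float()   # (B, 1) labels
        return x, y

# batch_size=None turns off automatic batching: items are not collated,
# so no singleton dimension is prepended
loader = DataLoader(BatchDS(), batch_size=None)
x, y = next(iter(loader))
print(x.shape, y.shape)  # torch.Size([8, 1, 10]) torch.Size([8, 1])
```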
To get this working in fastai, I use a batch size of one in DataBunch.create(..., bs=1). The Learner still adds a singleton outermost dimension to x and y, which I remove in the on_batch_begin callback.
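At the tensor level, the callback's job amounts to the following sketch (shapes are illustrative: B=8 samples, 1 channel, 10 features):

```python
import torch

# with bs=1, the framework hands the callback x of shape (1, B, C, F)
# and y of shape (1, B, 1); squeezing dim 0 recovers the real minibatch
x = torch.randn(1, 8, 1, 10)
y = torch.randint(0, 2, (1, 8, 1)).float()

x, y = x.squeeze(0), y.squeeze(0)
print(x.shape, y.shape)  # torch.Size([8, 1, 10]) torch.Size([8, 1])
```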