Hey,
I am currently redoing the translation part and am trying to do it in fastai v1. I want to train an image caption generator, and I want to start by training a seq2seq autoencoder, i.e. a model that learns to map a sentence onto itself.
I am having trouble creating the Dataset in a way that the function pad_collate accepts a batch.
The dataset was created the following way in the fastai v0.7 translate notebook:
def A(*a):
    """convert iterable object into numpy array"""
    return np.array(a[0]) if len(a)==1 else [np.array(o) for o in a]

class Seq2SeqDataset(Dataset):
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __getitem__(self, idx):
        return A(self.x[idx], self.y[idx])
    def __len__(self):
        return len(self.x)
Problem:
trn_dl = DataLoader(dataset=trn_ds, batch_size=bs, sampler=trn_sampler, collate_fn=pad_collate)
batch = next(iter(trn_dl))
fails with:
can’t convert np.ndarray of type numpy.object_. The only supported types are: double, float, float16, int64, int32, and uint8.
I have tried to break my problem down:
trn_ds is such a dataset, and [trn_ds[0]] gives:
[[array([ 9, 411, 1019, 700, 498, 1]),
array([ 9, 411, 1019, 700, 498, 1])]]
And pad_collate([trn_ds[0]], pad_idx=1, pad_first=False)
gives:
(tensor([[ 9, 411, 1019, 700, 498, 1]]),
tensor([[ 9, 411, 1019, 700, 498, 1]]))
So far so good.
However, if I try to pass more than one sentence to pad_collate, it fails with the same error message as above.
[trn_ds[0], trn_ds[2]] is:
[[array([ 9, 411, 1019, 700, 498, 1]),
array([ 9, 411, 1019, 700, 498, 1])],
[array([ 51, 4386, 68, 193, 12, 107, 11, 9, 2768, 1]),
array([ 51, 4386, 68, 193, 12, 107, 11, 9, 2768, 1])]]
And pad_collate([trn_ds[0], trn_ds[2]], pad_idx=1, pad_first=False)
gives the same error:
can’t convert np.ndarray of type numpy.object_. The only supported types are: double, float, float16, int64, int32, and uint8.
This line causes the error:
tensor(np.array([trn_ds[0][1], trn_ds[1][1]]))
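To illustrate why that line fails, here is a small NumPy-only sketch using the two sentences shown above. Arrays of equal length stack into a regular 2-D integer array, while arrays of different lengths can only be held in an array of dtype object, which is the type the error message complains about. Older NumPy versions built such an object array silently; as far as I know, newer ones raise a ValueError unless dtype=object is requested, so the sketch builds it explicitly. The variable names are only for illustration.

```python
import numpy as np

a = np.array([9, 411, 1019, 700, 498, 1])
b = np.array([51, 4386, 68, 193, 12, 107, 11, 9, 2768, 1])

# Two sequences of equal length stack into a regular 2-D integer array.
same = np.array([a, a])
print(same.shape)

# Two sequences of different length cannot form a rectangular array.
# The only container NumPy offers is a 1-D array of Python objects.
ragged = np.empty(2, dtype=object)
ragged[0], ragged[1] = a, b
print(ragged.dtype, ragged.shape)
```

The second array has no numeric dtype, so there is nothing a tensor constructor could map it to.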
OK, it is fairly obvious that this can’t be converted to a tensor, since the two arrays have different lengths. But how do I have to construct the input to pad_collate so that it accepts a batch?
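One possible direction, as a rough sketch rather than a confirmed fix: the failing line suggests that pad_collate pads only the first element of each sample and tries to stack the second element as-is, which works for class labels but not for target sentences. A custom collate function could pad both sides instead. The names pad_sequences and seq2seq_collate below are made up for illustration and are not fastai functions. The batch-first layout is also an assumption and may need a transpose depending on what the model expects.

```python
import numpy as np

def pad_sequences(seqs, pad_idx=1, pad_first=False):
    """Pad variable-length integer sequences into one (batch, max_len) int64 array."""
    max_len = max(len(s) for s in seqs)
    out = np.full((len(seqs), max_len), pad_idx, dtype=np.int64)
    for i, s in enumerate(seqs):
        if len(s) == 0:
            continue  # nothing to copy, the row stays all padding
        if pad_first:
            out[i, -len(s):] = s  # padding goes in front of the tokens
        else:
            out[i, :len(s)] = s   # padding goes after the tokens
    return out

def seq2seq_collate(samples, pad_idx=1, pad_first=False):
    """Hypothetical collate_fn: pads inputs and targets separately."""
    import torch  # imported here so pad_sequences also works without PyTorch
    xs, ys = zip(*samples)  # split the list of (x, y) pairs into two tuples
    return (torch.from_numpy(pad_sequences(xs, pad_idx, pad_first)),
            torch.from_numpy(pad_sequences(ys, pad_idx, pad_first)))
```

It would then be passed in place of pad_collate, e.g. DataLoader(dataset=trn_ds, batch_size=bs, sampler=trn_sampler, collate_fn=seq2seq_collate). A different pad_idx could be bound with functools.partial.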
Thanks in advance!
F