Even though creating the dataloaders works as expected and I can train the network, inspecting the pipeline with .summary() still gives me an error:
Final sample: (TensorText([ 2, 8, 285, 8, 23, 8, 283, 15, 8, 9, 55, 16, 274, 14,
48, 74, 17, 97, 157, 22, 0]),)
Setting up after_item: Pipeline: ToTensor
Setting up before_batch: Pipeline:
Setting up after_batch: Pipeline:
Building one batch
Applying item_tfms to the first sample:
Pipeline: ToTensor
starting from
(TensorText of size 21)
applying ToTensor gives
(TensorText of size 21)
Adding the next 3 samples
No before_batch transform to apply
Collating items in a batch
Error! It's not possible to collate your items in a batch
Could not collate the 0-th members of your tuples because got the following shapes
torch.Size([21]),torch.Size([59]),torch.Size([59]),torch.Size([19])
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-73-1b741f3e22f1> in <module>()
----> 1 db_lm.summary(data)
6 frames
/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/collate.py in default_collate(batch)
53 storage = elem.storage()._new_shared(numel)
54 out = elem.new(storage)
---> 55 return torch.stack(batch, 0, out=out)
56 elif elem_type.__module__ == 'numpy' and elem_type.__name__ != 'str_' \
57 and elem_type.__name__ != 'string_':
RuntimeError: stack expects each tensor to be equal size, but got [21] at entry 0 and [59] at entry 1
I would like to understand what is going on, i.e. why do the subsequent steps work while .summary() fails?
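To make the collation error above concrete, here is a minimal pure-Python sketch of the shape problem, not fastai's actual code path. It assumes that stacking needs every sample to have the same length, as the RuntimeError states. The helper name `pad_collate` and the padding index `1` are hypothetical, chosen only for illustration.

```python
def pad_collate(samples, pad_idx=1):
    """Right-pad every sequence to the longest length so the batch is rectangular.

    pad_idx=1 is an assumed padding id used only for this illustration.
    """
    max_len = max(len(s) for s in samples)
    return [list(s) + [pad_idx] * (max_len - len(s)) for s in samples]


# Four fake token-id sequences with the lengths reported in the error message.
samples = [[2] * n for n in (21, 59, 59, 19)]

# Several distinct lengths: a plain stack of these samples cannot work.
print(sorted({len(s) for s in samples}))

# After padding, every row has the same length, so the rows form a regular grid.
batch = pad_collate(samples)
print(sorted({len(row) for row in batch}))
```

If this picture is right, .summary() collates the raw, unequal-length texts directly, while the training dataloader presumably reshapes or pads them first. That would explain why training works, but I have not confirmed it in the library source.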
Building a classifier
I am trying to adapt this example for my application:
This is what I have so far:
def get_y(s):
    return s['Classification']

dls_clas = DataBlock(
    blocks=(TextBlock.from_df('Answered Questions', vocab=dls_lm.vocab, seq_len=dls.seq_len), CategoryBlock),
    get_y=get_y,
    splitter=RandomSplitter(0.1)).dataloaders(data, bs=128)
The error it returns is:
/usr/local/lib/python3.6/dist-packages/fastai2/text/data.py in <listcomp>(.0)
43 self.o2i = defaultdict(int, {v:k for k,v in enumerate(self.vocab) if v != 'xxfake'})
44
---> 45 def encodes(self, o): return TensorText(tensor([self.o2i [o_] for o_ in o]))
46 def decodes(self, o): return L(self.vocab[o_] for o_ in o if self.vocab[o_] != PAD)
47
TypeError: unhashable type: 'L'
My main concern is how to specify that, of the several columns in my dataframe, x is 'Answered Questions' and y is 'Classification'. The remaining columns can simply be ignored. How can I specify this correctly?
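For reference, the TypeError above can be reproduced outside fastai. This is a minimal sketch that assumes the numericalization step is handed a whole row (a container whose first element is itself a list of tokens) rather than the token list alone. The toy `vocab`, the `numericalize` function and the `row` layout are hypothetical stand-ins, and a plain Python list reports 'list' where fastai's L reports 'L'.

```python
from collections import defaultdict

# Toy vocabulary and token-to-index mapping, mirroring the o2i lookup in the traceback.
vocab = ["xxunk", "xxpad", "xxbos", "what", "is", "this"]
o2i = defaultdict(int, {tok: i for i, tok in enumerate(vocab)})


def numericalize(tokens):
    """Map each token to its vocabulary index (unknown tokens fall back to 0)."""
    return [o2i[t] for t in tokens]


# Works: the input is a flat list of string tokens.
tokens = ["xxbos", "what", "is", "this"]
ids = numericalize(tokens)

# Fails: the input is a whole row, so the first "token" is itself a list,
# and a list cannot be hashed for the dictionary lookup.
row = [tokens, "some_label"]
try:
    numericalize(row)
    error = None
except TypeError as exc:
    error = str(exc)

print(ids)
print(error)
```

If that assumption holds, the missing piece is a getter that selects only the tokenized text column for x. The fastai text examples pass get_x=ColReader('text') together with a ColReader on the label column for get_y, which may be what is needed here, though I have not verified it against this dataframe.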