Not sure if I am missing something really basic here, but when I create dataloaders from a DataBlock for a language model, my valid_dl shows a different seq_len than my train_dl.
Here is what I mean:
dls_lm = DataBlock(
    ...  # blocks, get_items, splitter omitted
).dataloaders(path, path=path, bs=128, seq_len=60)
This is expected
>>> x, y = first(dls_lm.train)
>>> x.shape, y.shape, len(dls_lm.train)
(torch.Size([128, 60]), torch.Size([128, 60]), 53)
This is NOT expected (e.g. expecting [128, 60] and getting [128, 72]):
>>> x, y = first(dls_lm.valid)
>>> x.shape, y.shape, len(dls_lm.valid)
(torch.Size([128, 72]), torch.Size([128, 72]), 5)
What am I not getting?
Thanks all and happy hacking!
Uh oh! I can reproduce and confirm this is a bug!
ah so this is not me!
Thanks for the super prompt reply.
Fixed in master. Thanks for flagging!
I wanted to add a second part to the above question, but I refrained, as I was not even sure the first one made sense.
Now I will go all-in:
How can the training part not fail (it works perfectly fine)?
Specifically, the validation step: the model gets trained on batches of
torch.Size([128, 60]) but then validated on batches of torch.Size([128, 72]):
learn = language_model_learner(
    dls_lm, AWD_LSTM, drop_mult=0.3)
You are the Speedy Gonzales of Deep Learning!
Thanks a ton!
seq_len is just the length of the sequence you feed the model at each step. Thanks to its hidden state, the model will produce exactly the same results whichever
seq_len you use. It is the gradients that will be different (computed over 60 timesteps instead of 72 in this example).
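To see why the chunk length does not change the outputs, here is a minimal sketch with a plain `nn.LSTM` (not the fastai AWD_LSTM, just an assumed stand-in): feeding a sequence in one pass or in smaller chunks while carrying the hidden state forward produces the same activations.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(2, 12, 8)  # batch of 2 sequences, 12 timesteps each

# Full sequence in a single pass
full_out, _ = lstm(x)

# Same sequence in chunks of 4 timesteps, carrying the hidden state across chunks
h = None
chunks = []
for i in range(0, 12, 4):
    out, h = lstm(x[:, i:i + 4], h)
    chunks.append(out)
chunked_out = torch.cat(chunks, dim=1)

# Identical up to floating-point noise
print(torch.allclose(full_out, chunked_out, atol=1e-6))
```

Gradients, on the other hand, are only backpropagated within a chunk (truncated BPTT), which is why a different seq_len changes training but not the forward results.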