Hi all,
Not sure if I'm missing something really basic here, but when I create dataloaders from a DataBlock for a language model, my train_dl and valid_dl show different seq_len (bptt) values.
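For reference, here is a minimal sketch of the kind of setup I mean (the DataFrame `df` and its 'text' column are just placeholders, not my actual data):

```python
from fastai.text.all import *

# Placeholder data: a DataFrame `df` with a 'text' column
dls = DataBlock(
    blocks=TextBlock.from_df('text', is_lm=True),
    get_x=ColReader('text'),
    splitter=RandomSplitter(valid_pct=0.1)
).dataloaders(df, bs=128, seq_len=72)

xb, yb = dls.train.one_batch()
print(xb.shape)   # e.g. torch.Size([128, 72])
xb, yb = dls.valid.one_batch()
print(xb.shape)   # can show a different seq_len than the training batches
```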
I wanted to add a second part to the above question, but I held back as I was not even sure the first one made sense.
Now I will go all-in:
How can training not fail (it works perfectly fine)?
Specifically, how does the validation step work when the model gets trained on batches of torch.Size([128, 60]) but then validated on batches of torch.Size([128, 72])?
seq_len is just the length of the chunk you feed the model at each step. The model will produce exactly the same results whichever seq_len you use, thanks to its hidden state being carried over from one chunk to the next. It is only the gradients that will be different, since backpropagation through time is truncated at the chunk boundary (computed over 60 timesteps instead of 72 in this example).
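To make that concrete, here is a small standalone sketch (plain PyTorch, not fastai internals): feeding one long token stream to an LSTM in chunks of 60 versus 72 gives identical outputs, as long as the hidden state is carried across chunks. Only the gradient computation would see a different window.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
emb = nn.Embedding(100, 16)
lstm = nn.LSTM(16, 32, batch_first=True)

# One long token sequence; 360 is divisible by both 60 and 72
tokens = torch.randint(0, 100, (1, 360))

def run_in_chunks(seq_len):
    outs, hidden = [], None
    for i in range(0, tokens.shape[1], seq_len):
        chunk = tokens[:, i:i + seq_len]
        # The hidden state is carried over between chunks, so the model
        # sees one continuous sequence regardless of the chunk length
        out, hidden = lstm(emb(chunk), hidden)
        outs.append(out)
    return torch.cat(outs, dim=1)

# Same outputs whether the stream is fed in chunks of 60 or 72
print(torch.allclose(run_in_chunks(60), run_in_chunks(72), atol=1e-6))  # True
```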