Language model: different seq_len in train and valid dl

Hi all,
Not sure if I am missing something really basic here, but when I create DataLoaders from a DataBlock for a language model, my train_dl and valid_dl show different seq_len (bptt).

Here is what I mean:

from fastai.text.all import *

# `path` is the folder containing the text files
dls_lm = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True),
    get_items=get_text_files,
    splitter=RandomSplitter(0.1)
).dataloaders(path, path=path, bs=128, seq_len=60)

This is expected:

>>> x, y = first(dls_lm.train)
>>> x.shape, y.shape, len(dls_lm.train)

(torch.Size([128, 60]), torch.Size([128, 60]), 53)

This is NOT expected (expecting [128, 60] but getting [128, 72]):

>>> x, y = first(dls_lm.valid)
>>> x.shape, y.shape, len(dls_lm.valid)

(torch.Size([128, 72]), torch.Size([128, 72]), 5)

What am I not getting?
Thanks all and happy hacking!

Uh oh! Can reproduce and confirm this is a bug!


Ah, so it's not just me! :smiley:
Thanks for the super prompt reply.

Fixed in master. Thanks for flagging!


I wanted to add a second part to the above question, but I held back as I was not even sure the first one made sense.
Now I will go all-in :slight_smile:

How does training not fail (it works perfectly fine)?
Specifically the validation step: the model gets trained on batches of torch.Size([128, 60]) but then validated on batches of torch.Size([128, 72]).

learn = language_model_learner(
    dls_lm, AWD_LSTM, drop_mult=0.3, 
    metrics=[accuracy, Perplexity()]).to_fp16()

learn.fit_one_cycle(1, 2e-2)

Oh god!
You are the Speedy Gonzales of Deep Learning!
Thanks a ton!

seq_len is just the length of the chunk you feed the model at each step. The model will produce exactly the same results whichever seq_len you use, thanks to its hidden state being carried from one chunk to the next. It is only the gradients that will be different (computed over 60 timesteps instead of 72 in this example).
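To make that concrete, here is a minimal sketch in plain PyTorch (not the fastai API; the toy nn.LSTM sizes and the run_in_chunks helper are just for illustration): because the hidden state is passed from one chunk to the next, the forward pass over a long sequence is identical whether you slice it into chunks of 60 or 72 timesteps.

import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(1, 360, 8)  # one long sequence of 360 timesteps

def run_in_chunks(seq_len):
    # Feed the sequence in chunks of `seq_len`, carrying the hidden state
    # from one chunk to the next (in the same spirit as the LM training loop).
    outputs, hidden = [], None
    for start in range(0, x.shape[1], seq_len):
        out, hidden = lstm(x[:, start:start + seq_len], hidden)
        outputs.append(out)
    return torch.cat(outputs, dim=1)

# Chunking by 60 or by 72 gives the same per-timestep outputs; only the
# window over which gradients would be truncated (BPTT) differs.
print(torch.allclose(run_in_chunks(60), run_in_chunks(72)))  # True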


Great, thank you!