Hi @jeremy, recently I’ve been working on Kaggle’s toxic comment dataset, trying out different methods with PyTorch and the fastai library. There are two things that have me confused:
First, what’s your rationale behind building the new `fastai/text.py` module without using `torchtext`? I saw a post here where you agreed with @Deb that there’s a problem with `torchtext`'s sequential tokenization. Could you shed some light on this issue?
Second, one of the competitors in Kaggle’s toxic comment competition posted the following thought:
> **Padding as a regularizer**
>
> I built all the models in PyTorch. This gives you huge flexibility, but I struggled for a long time to replicate the results people were achieving with simple GRU models in Keras. It turns out the biggest difference was sequence padding. My PyTorch code used variable-length sequences (data split into buckets and then padded). Padding all sequences to the same length appears to have a significant regularising effect, so my best results were achieved by using a single or very small number of buckets.
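If I understand his setup correctly, the contrast is between padding each batch only to the length of its longest sequence (what bucketing gives you) versus padding everything to one global fixed length. Here is a minimal sketch of the two, just to make sure I'm reading him right; the token ids, the pad index 0, and the `max_len` of 100 are my own assumptions:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Toy batch: token-id tensors of different lengths.
seqs = [torch.tensor([3, 7, 2]), torch.tensor([5, 1]), torch.tensor([9, 4, 8, 6])]

# Variable-length style: pad only within the batch, so each batch is
# padded to the length of its longest sequence (little padding overall).
batch_padded = pad_sequence(seqs, batch_first=True, padding_value=0)
print(batch_padded.shape)  # torch.Size([3, 4])

# Fixed-length style: pad every sequence to one global max length (as
# keras.preprocessing.sequence.pad_sequences does), adding far more pad tokens.
max_len = 100  # assumed global maximum length
fixed = torch.zeros(len(seqs), max_len, dtype=torch.long)
for i, s in enumerate(seqs):
    fixed[i, :len(s)] = s  # post-padding; pre-padding would right-align instead
print(fixed.shape)  # torch.Size([3, 100])
```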
Another competitor then replied with the following message:
> I remember reading, a few months back somewhere in Keras’s GitHub issue discussions, that @jhoward commented on PyTorch vs Keras padding and the effect it has on regularization, as well as the effect of pre- vs post-padding. I wish you had seen that, it would have saved you some trouble ;-)
I couldn’t find the GitHub issue, but my Google-fu brought me to one of your tweets from January:
> Turns out having lots of padding was somehow regularizing the model. It took more epochs to train, and ended with a better accuracy. I’ve now increased the dropout on the fixed model, and get the same performance.
>
> Something interesting going on there…
Since then, have you had any additional thoughts on why padding provides this regularizing effect? Also, how exactly do you pad your text to get the effect? Does this effect tie back to rebuilding the new `fastai/text.py` module?
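For reference, this is roughly how I’m padding at the moment. It's a minimal sketch of my own code, not fastai’s or Keras’s actual implementation; `max_len`, the pad index, and the `pre` flag are my own choices:

```python
import torch

def pad_batch(seqs, max_len, pad_idx=0, pre=True):
    """Pad a list of 1-D LongTensors to a single fixed length.

    pre=True left-pads (pad tokens before the text, as Keras does by
    default); pre=False right-pads. Sequences longer than max_len keep
    their last / first max_len tokens respectively.
    """
    out = torch.full((len(seqs), max_len), pad_idx, dtype=torch.long)
    for i, s in enumerate(seqs):
        s = s[-max_len:] if pre else s[:max_len]
        if pre:
            out[i, max_len - len(s):] = s   # pre-padding: text at the end
        else:
            out[i, :len(s)] = s             # post-padding: text at the start
    return out

seqs = [torch.tensor([3, 7, 2]), torch.tensor([5, 1, 9, 4, 8, 6])]
print(pad_batch(seqs, max_len=8, pre=True))
print(pad_batch(seqs, max_len=8, pre=False))
```

Is pre- vs post-padding like this the distinction you were referring to, or is there more to it?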