In the current implementation, the RNN output pooling takes the average over the entire output tensor, padding included. This makes predictions inconsistent when texts are batched together, because the amount of padding a given text receives changes with the batch.
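To illustrate the issue (a toy sketch with made-up tensors standing in for the RNN outputs, not the library's code):

```python
import torch

# The same 3 real timesteps, padded to two different batch lengths
out = torch.randn(3, 5)                             # outputs for 3 real timesteps
padded_short = torch.cat([out, torch.zeros(1, 5)])  # padded to length 4
padded_long = torch.cat([out, torch.zeros(5, 5)])   # padded to length 8

print(padded_short.mean(dim=0))  # these two differ, even though the
print(padded_long.mean(dim=0))   # underlying text is exactly the same
```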
This PR replaces the previous padding scheme with PyTorch's PackedSequence to pack and pad the texts in the RNN core. This change also improves RNN speed.
We also use the sequence lengths to compute the average pooling while ignoring the padding.
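Roughly, the idea looks like this (a simplified sketch, not the actual PR code; the layer sizes and lengths are made up):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

rnn = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
x = torch.randn(4, 7, 10)             # batch of 4 texts, padded to length 7
lengths = torch.tensor([7, 5, 3, 2])  # true lengths, sorted descending

# Pack so the RNN skips the padding entirely
packed_out, _ = rnn(pack_padded_sequence(x, lengths, batch_first=True))
out, _ = pad_packed_sequence(packed_out, batch_first=True)  # padding positions come back as zeros

# Average pooling that divides by the true length, not the padded one
avg_pool = out.sum(dim=1) / lengths.unsqueeze(1).float()
```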
Note: in the current implementation of MultiBatchRNN, predicting a batch that contains texts whose lengths differ by more than 1*bptt is not recommended.
Here is the commit with my changes:
I’d appreciate your feedback!
p.s. @jeremy @sgugger - this is a follow-up to my PR, this time without breaking the current implementation.
Thanks! What we actually need is a Jupyter notebook showing your solution, how it's used, and tests. We'll then integrate it into the library and credit you in the CHANGES doc. You can share the notebook using the 'gist it' extension. Hope that works for you.
Yes please. We would mostly need the full IMDB notebook to check it reaches the same accuracy (94.7-94.8%). Here is the draft that was in dev to get the values of the hyper-parameters; the rest should be the same with the full dataset instead of the sample.
Sorry for the late response. Here is the IMDB notebook with all the adaptations for fastai v1.
I reached an accuracy of 92%, but I only ran one training cycle for the language model (it's quite slow and I'm cheap). I believe we can reach the same accuracy with more training.
So I tried it on the full IMDB dataset. It does seem to help a tiny bit with the final accuracy (though that could be random), but it's actually slower than the current implementation. Shouldn't it be faster?
Actually, that wasn't the point, imo. The reason I suggested this change was that prediction results in production were unstable when predicting a batch of inputs together: the output for a single input varied depending on the amount of padding needed.
I had also hoped it would speed things up.
If I understand you correctly, the question you are trying to address is: if I changed the amount of padding for the same piece of text, would I get varying predictions? I love your idea of dividing by the length as opposed to a blind average.
If that's the case, the real test is whether your predictions remain consistent when you change the padding on a single line of text you are trying to predict on (and, comparatively, whether the current fastai model generates inconsistent predictions).
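Something along these lines could serve as that check (a hypothetical helper; `model`, `ids`, and `pad_idx` are assumptions about how the inputs are encoded, not fastai API):

```python
import torch

def predictions_consistent(model, ids, pad_idx, atol=1e-6):
    # Hypothetical helper: `model` maps a (1, seq_len) tensor of token ids
    # to predictions. Encode the same text with two amounts of padding.
    x_short = torch.tensor([ids + [pad_idx] * 2])
    x_long = torch.tensor([ids + [pad_idx] * 20])
    with torch.no_grad():
        p_short, p_long = model(x_short), model(x_long)
    return torch.allclose(p_short, p_long, atol=atol)
```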
Now, I haven't read your code, so I'm not sure why you wanted to do the padding in the RNN core instead of PackedSequence, but if that is what's slowing the machine down, could you potentially revert to the original version?
You're right (if I understand you correctly): changing the amount of padding affects the predictions. It's only relevant if you want to predict a batch of inputs, because otherwise you don't need to pad the input.
I think you're right again :). We can probably revert to the original version without the PackedSequence and keep only the new pooling. I'll have to look into it, though.
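Keeping only the pooling change might look something like this (a sketch, assuming `out` is the padded RNN output of shape `(batch, seq_len, hidden)` and `lengths` holds each text's true length):

```python
import torch

def masked_avg_pool(out, lengths):
    # Zero out padding positions, then divide by the true length of each text
    mask = torch.arange(out.size(1))[None, :] < lengths[:, None]  # (batch, seq_len)
    masked = out * mask.unsqueeze(-1).float()
    return masked.sum(dim=1) / lengths[:, None].float()
```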