Average max pooling in PoolingLinearClassifier (torch.text)


I’m trying to port ULMFiT to the AllenNLP framework.
I have noticed that for text classification, the PoolingLinearClassifier uses PyTorch’s adaptive_avg_pool1d. However, this does not take into account the padding required for batching, so the average pooling is slightly off: the sum is divided by a slightly larger number (the padded length) than it should be (the true sequence length).
Is my understanding correct?
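To make the issue concrete, here is a minimal sketch (the tensor values and shapes are hypothetical, not taken from the actual PoolingLinearClassifier) comparing naive average pooling over a padded sequence with a length-aware mean:

```python
import torch
import torch.nn.functional as F

# Hypothetical batch: one real sequence of length 3, zero-padded to length 5.
# Shape (batch, channels, seq_len), as adaptive_avg_pool1d expects.
x = torch.tensor([[[1.0, 2.0, 3.0, 0.0, 0.0]]])  # positions 3-4 are padding
lengths = torch.tensor([3])

# Naive pooling divides by the padded length (5), not the true length (3).
naive = F.adaptive_avg_pool1d(x, 1).squeeze()  # (1+2+3)/5 = 1.2

# A length-aware mean divides by the actual sequence length instead.
masked = x.sum(dim=2).squeeze() / lengths.float()  # (1+2+3)/3 = 2.0
```

The longer the padding relative to the true sequence, the larger the discrepancy.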

It is! There is a discussion about this in this topic.