Hi all, this is my first post, so I'll try to format this correctly.
There is a line of code in Learner.py, in `masked_concat_pool`, that just doesn't make sense to me.
Currently it looks like this:

```python
avg_pool = output.masked_fill(mask[:, :, None], 0).mean(dim=1)
avg_pool *= output.size(1) / (output.size(1) - mask.type(avg_pool.dtype).sum(dim=1))[:, None]
```
The average pool is first computed with `mean`, then scaled by the ratio of the full sequence length to the number of non-masked positions. Since that ratio grows as the sequence gets shorter, the shorter the sequence, the more the mean gets scaled up. I wonder if this is a deliberate trick to improve performance, or a bug where someone assumed the first line used `sum` instead of `mean`.
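To make the question concrete, here is a minimal numeric sketch of what the two lines appear to compute, using plain Python on a single toy sequence (the values and lengths are made up for illustration):

```python
# Toy example: one sequence of length 5 whose last 2 positions are padding.
# masked_fill has already zeroed the padded positions.
output = [1.0, 2.0, 3.0, 0.0, 0.0]
seq_len = 5
n_masked = 2

# Line 1: mean over the FULL length, zeros included
avg_pool = sum(output) / seq_len             # 6.0 / 5 = 1.2

# Line 2: rescale by seq_len / (seq_len - n_masked)
avg_pool *= seq_len / (seq_len - n_masked)   # 1.2 * 5/3 = 2.0

# 2.0 happens to equal the mean of just the 3 real tokens [1, 2, 3]
print(avg_pool)
```

So in this toy case the rescale seems to cancel the zeros that `masked_fill` mixed into the denominator, turning the full-length mean back into a mean over only the real tokens. Maybe that is the intent, but I'd like to confirm I'm not misreading it.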
I don't believe the code looked like this in the course notebooks, and it also wasn't like this back when fastai was still using PyTorch's adaptive 1d pooling. Am I missing something?