Average max pooling in PoolingLinearClassifier (torch.text)


I’m trying to port ULMFiT to the AllenNLP framework.
I have noticed that for text classification, the PoolingLinearClassifier uses PyTorch’s adaptive_avg_pool1d. However, this does not take into account the padding required for batching, so the average pooling is slightly off: the sum is divided by a slightly larger number (the padded length) than it should be (the true sequence length).
Is my understanding correct?
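To make the issue concrete, here is a minimal sketch (the tensor values and shapes are hypothetical, not taken from the actual PoolingLinearClassifier) comparing naive average pooling over a padded sequence with a length-aware mean:

```python
import torch
import torch.nn.functional as F

# Hypothetical batch: one real sequence of length 3, zero-padded to length 5.
# Shape (batch, channels, seq_len), as adaptive_avg_pool1d expects.
x = torch.tensor([[[1.0, 2.0, 3.0, 0.0, 0.0]]])  # positions 3-4 are padding
lengths = torch.tensor([3])

# Naive pooling divides by the padded length (5), not the true length (3).
naive = F.adaptive_avg_pool1d(x, 1).squeeze()  # (1+2+3)/5 = 1.2

# A length-aware mean divides by the actual sequence length instead.
masked = x.sum(dim=2).squeeze() / lengths.float()  # (1+2+3)/3 = 2.0
```

The longer the padding relative to the true sequence, the larger the discrepancy.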

It is! There is a discussion about this in this topic.