Tabular Autoencoder?

I’ve implemented one before in fastai v0.7 and I’m working on porting it to v1.0. There’s a great discussion of this on Kaggle, where a denoising autoencoder was used to generate the features behind the winning solution to the Porto Seguro Safe Driver Prediction competition.

The two trickiest parts are the data shuffling (the swap noise), for which I’ve got a nice trick (partly because it would have been so complex to integrate into fastai otherwise), and the dataloader, which I think should be easier in v1 using label_by_func, though I haven’t implemented that yet.

For the data shuffling, I originally implemented it in the dataloader, but that’s not as efficient or as easy to integrate as my new solution, which is to do the swapping within the batch as a module. Here’s some code to get you started:

import torch
import torch.nn as nn

class BatchSwapNoise(nn.Module):
    """Swap Noise module: with probability p, replaces each value with the
    value from the same column of a randomly chosen row in the batch."""

    def __init__(self, p):
        super().__init__()
        self.p = p

    def forward(self, x):
        if self.training:
            # Elements selected (with probability p) to be swapped
            mask = torch.rand(x.size()) > (1 - self.p)
            # For each selected element, shift its flattened index by a random
            # whole number of rows, i.e. same column, different row
            idx = torch.add(torch.arange(x.nelement()),
                            (torch.floor(torch.rand(x.size()) * x.size(0)).type(torch.LongTensor) *
                             (mask.type(torch.LongTensor) * x.size(1))).view(-1))
            # Wrap indices that run past the end of the flattened tensor
            idx[idx >= x.nelement()] = idx[idx >= x.nelement()] - x.nelement()
            return x.view(-1)[idx].view(x.size())
        else:
            return x
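In case it helps, here’s a minimal usage sketch (just my assumption of how you might wire it up, not part of the original code): BatchSwapNoise as the first layer of a small denoising autoencoder over continuous features. The layer sizes, swap probability, and n_features are all placeholders.

# Hypothetical usage sketch: a tiny denoising autoencoder over a table of
# continuous features, reusing the BatchSwapNoise module defined above.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_features = 20                          # placeholder: number of columns

model = nn.Sequential(
    BatchSwapNoise(0.15),                # swap ~15% of values within the batch
    nn.Linear(n_features, 128), nn.ReLU(),
    nn.Linear(128, 32), nn.ReLU(),       # bottleneck / learned features
    nn.Linear(32, 128), nn.ReLU(),
    nn.Linear(128, n_features),          # reconstruct the original row
)

x = torch.randn(64, n_features)          # a batch of 64 rows
loss = F.mse_loss(model(x), x)           # reconstruct the clean (un-noised) input

Because the swapping only happens while self.training is True, calling model.eval() gives you clean reconstructions (or bottleneck features) at inference time.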

There’s more discussion on the forum here:
