Passing missingness as mask to model (Tabular Autoencoder)

Postradamus · July 4, 2022, 1:04pm

Hi, I hope this is the right place for this question.

Context
I am working on a problem where I want to impute missing values in tabular data. I came across the walkwithfastai lesson on tabular autoencoders, which already helped me a lot. I got it working on my problem, but now I want to modify it so that I can try model architectures that make direct use of the missingness. I am unsure how to move forward.

What I want to do is pass two additional tensors to my model, on top of the continuous and categorical tensors, indicating missingness in both continuous and categorical features.
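To make concrete what I mean by the two mask tensors, here is a minimal sketch in plain pandas and NumPy. The toy frame, the column names, and the variable names `conts_missing` and `cats_missing` are made up for illustration; nothing here is fastai API. The masks are computed before any fill-missing step, since imputation would erase the information.

```python
import numpy as np
import pandas as pd

# Toy frame; the column names are hypothetical.
df = pd.DataFrame({
    "age":    [23.0, np.nan, 41.0],
    "income": [np.nan, 52000.0, 61000.0],
    "city":   ["Oslo", None, "Bergen"],
})
cont_names = ["age", "income"]
cat_names = ["city"]

# 1.0 where the original value was missing, 0.0 otherwise.
# One mask column per feature, one mask row per data row.
conts_missing = df[cont_names].isna().to_numpy().astype(np.float32)
cats_missing = df[cat_names].isna().to_numpy().astype(np.float32)
```

Each mask has the same shape as the block of features it describes, so the model could concatenate it with the corresponding inputs or use it to weight the reconstruction loss.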

The question
What is the best way to return two additional tensors indicating missingness from a modified TabularPandas class, and which places would need modification?

Currently I’m thinking of making the change in ReadTabBatchIdentity:

from fastai.tabular.all import *

class ReadTabBatchIdentity(ItemTransform):
    "Read a batch of data and return the inputs as both `x` and `y`"
    def __init__(self, to): store_attr()

    def encodes(self, to):
        # Inputs and targets are the same tensors, since the autoencoder reconstructs its input
        if not to.with_cont: res = (tensor(to.cats).long(),) + (tensor(to.cats).long(),)
        else: res = (tensor(to.cats).long(),tensor(to.conts).float()) + (tensor(to.cats).long(), tensor(to.conts).float())
        # Move the whole tuple to the target device, if one is set
        if to.device is not None: res = to_device(res, to.device)
        return res

In encodes, one would “just” need to add references to something like to.cats_missing or to.conts_missing. But I’m not sure whether that’s the best way to go, or how I’d actually do it.
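A rough sketch of the batch layout I have in mind, written in plain PyTorch so it runs without a TabularPandas object. The function `read_batch_with_masks` and its arguments are hypothetical stand-ins for a modified `encodes` reading the attributes mentioned above; they are not existing fastai API. As far as I understand, the Learner splits each batch into inputs and targets at `dls.n_inp`, so that value would presumably have to become 4 here.

```python
import torch

def read_batch_with_masks(cats, conts, cats_missing, conts_missing):
    """Hypothetical stand-in for a modified `encodes`.

    Returns four input tensors (cats, conts, and their missingness masks)
    followed by the two reconstruction targets (cats, conts).
    """
    x_cat = torch.as_tensor(cats).long()
    x_cont = torch.as_tensor(conts).float()
    m_cat = torch.as_tensor(cats_missing).float()
    m_cont = torch.as_tensor(conts_missing).float()
    return (x_cat, x_cont, m_cat, m_cont) + (x_cat, x_cont)

# Toy values: three rows, one categorical and two continuous features.
batch = read_batch_with_masks(
    cats=[[1], [0], [2]],
    conts=[[23.0, 0.0], [0.0, 52000.0], [41.0, 61000.0]],
    cats_missing=[[0], [1], [0]],
    conts_missing=[[0, 1], [1, 0], [0, 0]],
)

n_inp = 4  # assumption: the first four tensors are inputs, the rest are targets
xb, yb = batch[:n_inp], batch[n_inp:]
```

With this layout the model's forward would take four arguments instead of two, while the loss still compares the reconstruction against the original cats and conts only.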

Any help is greatly appreciated!