Hi, I hope this is the right place for this question.
I am working on a problem where I want to impute missing values in tabular data. While looking into this I came across the walkwithfastai page on tabular autoencoders, which has already helped me a lot. I got it working on my problem, but now I want to modify it, because I want to try model architectures that make direct use of the missingness. I’m unsure how to move forward, though.
What I want to do is pass two additional tensors to my model, on top of the continuous and categorical tensors, indicating missingness in both continuous and categorical features.
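To make the target interface concrete, here is a minimal PyTorch sketch of a model whose `forward` accepts the two mask tensors alongside the usual categorical and continuous inputs. The class name `MaskAwareEncoder`, the layer sizes, and the choice to simply concatenate the masks as extra 0/1 features are illustrative assumptions, not the walkwithfastai architecture.

```python
import torch
import torch.nn as nn


class MaskAwareEncoder(nn.Module):
    "Hypothetical encoder: embeds categoricals, then appends both missingness masks as 0/1 features."

    def __init__(self, cardinalities, n_cont, emb_dim=4, hidden=16):
        super().__init__()
        # One embedding table per categorical column
        self.embeds = nn.ModuleList([nn.Embedding(c, emb_dim) for c in cardinalities])
        n_cat = len(cardinalities)
        # Embeddings + continuous values + one mask bit per categorical and per continuous column
        n_in = n_cat * emb_dim + n_cont + n_cat + n_cont
        self.body = nn.Sequential(nn.Linear(n_in, hidden), nn.ReLU())

    def forward(self, x_cat, x_cont, cat_miss, cont_miss):
        embs = [emb(x_cat[:, i]) for i, emb in enumerate(self.embeds)]
        # Masks arrive as bool tensors; cast to float so they can be concatenated
        feats = torch.cat(embs + [x_cont, cat_miss.float(), cont_miss.float()], dim=1)
        return self.body(feats)


model = MaskAwareEncoder(cardinalities=[3, 5], n_cont=2)
x_cat = torch.tensor([[0, 1], [2, 4]])
x_cont = torch.randn(2, 2)
cat_miss = torch.tensor([[True, False], [False, False]])
cont_miss = torch.tensor([[False, True], [False, False]])
out = model(x_cat, x_cont, cat_miss, cont_miss)
print(out.shape)  # torch.Size([2, 16])
```

Because the masks are separate arguments, the model is free to use them in other ways (for example, to exclude missing cells from a reconstruction loss) without any change to the data pipeline.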
How should I best approach returning two additional tensors indicating missingness from a modified `TabularPandas` class, and which places need modification?
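Wherever the extra tensors end up being read, the masks have to be computed from the raw DataFrame before any preprocessing fills in or encodes the NaNs. A minimal pandas sketch; the helper name `add_missing_masks` and the `_missing` column suffix are hypothetical choices for illustration:

```python
import numpy as np
import pandas as pd


def add_missing_masks(df, cat_names, cont_names, suffix="_missing"):
    "Hypothetical helper: add one boolean NaN-mask column per feature and return the new column names."
    df = df.copy()  # leave the caller's raw DataFrame untouched
    cat_miss_names = [f"{c}{suffix}" for c in cat_names]
    cont_miss_names = [f"{c}{suffix}" for c in cont_names]
    for col, miss_col in zip(list(cat_names) + list(cont_names), cat_miss_names + cont_miss_names):
        # True wherever the raw value is missing (NaN or None)
        df[miss_col] = df[col].isna()
    return df, cat_miss_names, cont_miss_names


raw = pd.DataFrame({
    "color": ["red", None, "blue"],
    "age": [31.0, np.nan, 45.0],
    "income": [np.nan, 2.5, 3.1],
})
df, cat_miss, cont_miss = add_missing_masks(raw, ["color"], ["age", "income"])
print(df[cat_miss + cont_miss])
```

Keeping the masks as extra columns of the same DataFrame, rather than as separate arrays, should keep them aligned with the rows they describe when the data is split, shuffled and batched. That assumes the extra columns are carried along in `to.items` and are not listed in `cat_names` or `cont_names`, so the procs leave them alone; both points are worth verifying.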
Currently I’m thinking of making the change in `ReadTabBatchIdentity`:
```python
from fastai.tabular.all import *

class ReadTabBatchIdentity(ItemTransform):
    "Read a batch of data and return the inputs as both `x` and `y`"
    def __init__(self, to): store_attr()

    def encodes(self, to):
        if not to.with_cont:
            # Categorical features only: x = (cats,), y = (cats,)
            res = (tensor(to.cats).long(),) + (tensor(to.cats).long(),)
        else:
            # x = (cats, conts), y = (cats, conts)
            res = (tensor(to.cats).long(), tensor(to.conts).float()) + (tensor(to.cats).long(), tensor(to.conts).float())
        if to.device is not None: res = to_device(res, to.device)
        return res
```
One would “just” need to add references to something like `to.conts_missing`. But I’m not sure whether that’s the best way to go, or how I’d actually do it.
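In fastai, `to.cats` and `to.conts` essentially select the `cat_names` and `cont_names` columns from the batch’s DataFrame, so a `to.conts_missing`-style reference could do the same with mask columns. Below is a framework-free sketch of what an extended `encodes` body might compute, assuming the masks are stored as boolean columns of that DataFrame. The function name, the mask column names and the `SimpleNamespace` stand-in for a batch are hypothetical, and the `with_cont=False` branch and the `to_device` call from the original are left out for brevity.

```python
from types import SimpleNamespace

import pandas as pd
import torch


def encode_with_masks(to, cat_miss_names, cont_miss_names):
    "Sketch: return (cats, conts, cat_miss, cont_miss) as inputs and (cats, conts) as targets."
    # Same columns the original encodes reads via to.cats / to.conts
    cats = torch.as_tensor(to.cats.to_numpy()).long()
    conts = torch.as_tensor(to.conts.to_numpy()).float()
    # Hypothetical boolean mask columns living in the same DataFrame as the batch rows
    cat_miss = torch.as_tensor(to.items[cat_miss_names].to_numpy()).bool()
    cont_miss = torch.as_tensor(to.items[cont_miss_names].to_numpy()).bool()
    xs = (cats, conts, cat_miss, cont_miss)
    ys = (cats, conts)
    return xs + ys


# Stand-in for a batch: already-encoded cats/conts plus the mask columns
df = pd.DataFrame({
    "color": [1, 0], "age": [0.5, -0.2],
    "color_missing": [False, True], "age_missing": [True, False],
})
batch = SimpleNamespace(items=df, cats=df[["color"]], conts=df[["age"]])
out = encode_with_masks(batch, ["color_missing"], ["age_missing"])
print(len(out))  # 6
```

With six tensors per batch, the first four are inputs and the last two are targets. fastai’s `Learner` splits each batch into inputs and targets using an `n_inp` value, so that would presumably need to be 4 here; where exactly to set it is worth checking against your fastai version.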
Any help is greatly appreciated!