Auto feature engineering for tabular data

In competitions, it’s normal to spend a lot of time on manual feature engineering, but that feels odd when you’re using neural networks. I think plain linear layers struggle to discover feature interactions easily, and that’s why the CNN idea is so powerful: convolutions give you built-in feature engineering.
I did some searching and found this paper: https://arxiv.org/abs/1807.00311
It applies point-wise multiplication after the embeddings, but I don’t see why we couldn’t get the same effect by feeding the logs of the embedding outputs into additional linear layers, since a weighted sum of logs is the log of a product of powers.
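Here is a minimal sketch of what I mean (the class name, shapes, and hyperparameters are my own made-up assumptions, not anything from the paper): a linear layer applied to log-embeddings, followed by exp, can represent multiplicative interactions, because w1*log(x1) + w2*log(x2) = log(x1^w1 * x2^w2).

```python
import torch
import torch.nn as nn


class LogLinearInteractions(nn.Module):
    """Hypothetical sketch: multiplicative interactions via a
    linear layer over log-embeddings (not from either paper)."""

    def __init__(self, n_fields: int, emb_dim: int, n_interactions: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        # One linear map over the flattened log-embeddings.
        self.mix = nn.Linear(n_fields * emb_dim, n_interactions)

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        # emb: (batch, n_fields, emb_dim); entries must be positive for
        # log to work, so in practice you'd pass them through softplus.
        x = torch.log(emb.clamp_min(self.eps)).flatten(1)
        # exp of a weighted sum of logs = product of powers of entries.
        return torch.exp(self.mix(x))


# Usage: 10 categorical fields, 8-dim embeddings, 32 learned interactions.
emb = nn.functional.softplus(torch.randn(4, 10, 8))
print(LogLinearInteractions(10, 8, 32)(emb).shape)  # torch.Size([4, 32])
```

One catch with this trick is that embeddings have to be constrained to be positive, which is why I clamp and suggest softplus above.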
Another paper on the topic: https://export.arxiv.org/pdf/1810.11921
Here the idea is to use regular embeddings but apply multi-head self-attention on top of them.
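A rough sketch of that idea in PyTorch (the names and hyperparameters are illustrative assumptions, not the paper’s exact architecture): each feature field becomes one embedding “token”, and self-attention lets the fields attend to each other, which is where the interactions come from.

```python
import torch
import torch.nn as nn


class AttnFeatureInteractions(nn.Module):
    """Hypothetical sketch of self-attention over per-field embeddings,
    in the spirit of the second paper."""

    def __init__(self, n_fields: int, emb_dim: int, n_heads: int = 2):
        super().__init__()
        self.attn = nn.MultiheadAttention(emb_dim, n_heads, batch_first=True)
        self.head = nn.Linear(n_fields * emb_dim, 1)

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        # emb: (batch, n_fields, emb_dim) -- one embedding per feature field.
        out, _ = self.attn(emb, emb, emb)  # fields attend to each other
        return self.head(out.flatten(1))   # pool interactions into a score


emb = torch.randn(4, 10, 8)  # batch of 4, 10 fields, 8-dim embeddings
print(AttnFeatureInteractions(10, 8)(emb).shape)  # torch.Size([4, 1])
```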
What do you think about the topic itself? And why does it seem so unpopular?
