Auto feature engineering for tabular data

In competitions, it’s normal to spend a lot of time on manual feature engineering, but that feels a bit odd when you’re using neural networks. I think regular linear layers have a hard time learning feature interactions, and that’s why the idea behind CNNs is so powerful: convolutions provide built-in feature engineering.
I did some searching and found this paper:
It applies point-wise multiplications after the embeddings, but I don’t see why we couldn’t get the same effect by feeding the logs of the embedding outputs into additional linear layers.
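
To make the log idea concrete, here is a minimal sketch of the identity it relies on: a linear layer applied to logs, followed by an exponential, computes a weighted product of its inputs. The function name `log_linear_interaction` and the sample values are made up for illustration and are not taken from the paper. One likely catch is that the log only exists for positive inputs, while embedding outputs are usually signed, so the embeddings would need to be constrained to be positive first.

```python
import math

def log_linear_interaction(x, w):
    """Return exp(sum_i w_i * log(x_i)), which equals prod_i x_i ** w_i for x_i > 0."""
    if any(v <= 0 for v in x):
        # Signed embedding outputs would fail here: log needs positive inputs.
        raise ValueError("log is only defined for positive inputs")
    return math.exp(sum(wi * math.log(xi) for xi, wi in zip(x, w)))

# Hypothetical positive embedding outputs for two features (one dimension each).
x = [2.0, 3.0]

# Weights of 1 recover the plain product x[0] * x[1].
product = log_linear_interaction(x, [1.0, 1.0])

# Weights (1, -1) give the ratio x[0] / x[1].
ratio = log_linear_interaction(x, [1.0, -1.0])
```

With learned weights, a layer like this could in principle pick which features to multiply or divide, which is the kind of interaction people usually build by hand.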
Here is another paper on this topic:
Here the idea is to use regular embeddings, but combined with “multi-head self-attention”.
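
For reference, this is roughly what multi-head self-attention over per-feature embeddings looks like in NumPy. It is a generic scaled dot-product sketch, not necessarily the exact architecture from the paper. The shapes, the random weights, and the function names are all assumptions for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(E, Wq, Wk, Wv):
    """E: (n_fields, d) feature embeddings; Wq, Wk, Wv: (n_heads, d, d_head).

    Returns the concatenated head outputs, shape (n_fields, n_heads * d_head),
    and the attention weights, shape (n_heads, n_fields, n_fields).
    """
    # Project every field embedding into per-head queries, keys and values.
    Q = np.einsum("fd,hdk->hfk", E, Wq)
    K = np.einsum("fd,hdk->hfk", E, Wk)
    V = np.einsum("fd,hdk->hfk", E, Wv)
    d_head = Q.shape[-1]
    # Each field scores every other field; this is where interactions come in.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)
    attn = softmax(scores, axis=-1)
    out = attn @ V
    # Concatenate the heads for each field.
    return out.transpose(1, 0, 2).reshape(E.shape[0], -1), attn

# Hypothetical sizes: 4 tabular columns, 8-dim embeddings, 2 heads of size 4.
rng = np.random.default_rng(0)
n_fields, d, n_heads, d_head = 4, 8, 2, 4
E = rng.normal(size=(n_fields, d))
Wq, Wk, Wv = (rng.normal(size=(n_heads, d, d_head)) for _ in range(3))
out, attn = multi_head_self_attention(E, Wq, Wk, Wv)
```

Each row of `attn` shows how strongly one column attends to the others, so the attention weights play the role of learned pairwise interactions between features.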
What do you think about the topic itself? Why is it so unpopular?