Applying part 2 to tabular data

Nice post.

I have not looked at the dataset in any detail, but here are some things I would investigate.

Is there a class imbalance? If so, adjust for it by oversampling the underrepresented classes.
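Something like this is what I have in mind for the oversampling. It is only a rough sketch, assuming the features and labels are already NumPy arrays; the function name `oversample_to_largest` and the toy data are made up for illustration and have nothing to do with the actual dataset.

```python
import numpy as np

def oversample_to_largest(X, y, seed=0):
    """Resample every class (with replacement) up to the size of the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx_parts = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        # Keep every original row, then draw the missing ones with replacement.
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        idx_parts.append(np.concatenate([cls_idx, extra]))
    # Shuffle so the classes are not grouped together in the output.
    idx = rng.permutation(np.concatenate(idx_parts))
    return X[idx], y[idx]

# Toy data (hypothetical): 8 rows of class 0, 2 rows of class 1.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
X_bal, y_bal = oversample_to_largest(X, y)
```

If you try it, oversample only the training split after you have set the validation set aside, otherwise copies of the same row can end up on both sides.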

Run the tabular data through a CNN by treating each product id the same way we would an image: rows are the entries for the matching id, columns are the features activated in that row, and there is 1 channel.
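Roughly, building those "images" could look like the sketch below. I am assuming an array of ids and a numeric feature matrix with one row per entry; `to_single_channel_images` and `max_rows` are names I invented, and zero-padding or truncating to a fixed number of rows is my own assumption, since ids will usually have different numbers of entries.

```python
import numpy as np

def to_single_channel_images(ids, features, max_rows):
    """Stack each id's rows into a (1, max_rows, n_features) image, zero-padded."""
    unique_ids = np.unique(ids)
    n_features = features.shape[1]
    images = np.zeros((len(unique_ids), 1, max_rows, n_features), dtype=np.float32)
    for i, pid in enumerate(unique_ids):
        # Truncate ids that have more than max_rows entries.
        rows = features[ids == pid][:max_rows]
        images[i, 0, :len(rows), :] = rows
    return unique_ids, images

# Toy data (hypothetical): 5 entries, 3 features, 2 distinct ids.
ids = np.array([7, 7, 3, 7, 3])
features = np.arange(15, dtype=np.float32).reshape(5, 3)
unique_ids, images = to_single_channel_images(ids, features, max_rows=4)
```

The result has the usual (batch, channels, height, width) layout, so it can go straight into a 2D conv layer.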

If that doesn’t work, try feeding the CNN different representations. For example: rows are the entries for the matching id, columns are the features activated in that row, and each feature is also its own channel, with its value repeated across all columns.
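Here is one possible reading of that multi-channel idea as a sketch. I am assuming the input is already a (batch, 1, rows, features) array, and that "repeated across all columns" means channel c holds feature c's value for each row, copied into every column. The name `to_feature_channels` is made up.

```python
import numpy as np

def to_feature_channels(single_channel):
    """(N, 1, R, F) -> (N, F, R, F): channel c holds feature c in every column."""
    x = single_channel[:, 0]                    # (N, R, F)
    per_channel = np.transpose(x, (0, 2, 1))    # (N, F, R)
    n_features = x.shape[2]
    # Copy each row's value across a new trailing column axis of width F.
    return np.repeat(per_channel[..., None], n_features, axis=3)

# Toy input (hypothetical): 2 ids, 4 rows each, 3 features.
rng = np.random.default_rng(0)
single = rng.random((2, 1, 4, 3)).astype(np.float32)
multi = to_feature_channels(single)
```

Note that this multiplies the memory per sample by the number of features, so it is probably only practical for fairly narrow tables.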

Replies to the replies:

@RogerS49 - thanks for the git advice! I hope I’ll soon overcome my time/courage constraints and get to it; I know it’s going to improve my life. I’m pretty sure that if you write about your journey into git as a beginner, it will be relevant to many of us here.

@tomsthom - following Jeremy’s reply, namely that BN and LSUV are NOT redundant, I’ll definitely make an effort to make them work together. Your idea to normalize activations after the linear layer sounds like the easiest solution, as it won’t require iterating at all. I hope to find the time soon to report on this. Do let me know what you find with the relu sub!

@maral - Thanks for the advice. As for class imbalance - I tried to keep preprocessing to a minimum here, as my interest is in checking the modeling methods we learn, and I don’t want to mix in more factors and hyperparameters such as sampling methods. The CNN sounds like a great idea, along the lines of the link that was posted above by knesgood. As I said, my first priority now is to follow the course’s methods, but I hope that afterwards I’ll get to checking CNNs on tabular data.

Thanks everyone for your replies!
I’m having some busy days now, so it will probably take a bit longer until my next code post. Meanwhile, if anyone wants to share her/his experiences, insights, failures, or ideas with tabular data, please do!