This has been very helpful!
I tried converting the signals into images and doing image classification.
@karthik.subraveti I would love to see your image classification. I tried the same thing and will try to share it this week. It was HORRIBLE (it scored worse than just predicting all 0s).
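In case it helps the comparison, here is a rough sketch of one way to do the "signals into images" step (render each 1D signal as a spectrogram and save it as a picture for an image classifier). The signal, sampling rate, and filename below are all made up, and this may well not be how either of us actually did it:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import spectrogram

fs = 1000                                    # assumed sampling rate (Hz), placeholder
t = np.arange(0, 10, 1 / fs)
sig = np.sin(2 * np.pi * 50 * t) + 0.5 * np.random.randn(t.size)  # toy 50 Hz tone + noise

f, tt, Sxx = spectrogram(sig, fs=fs, nperseg=256)
plt.figure(figsize=(3, 3))
plt.pcolormesh(tt, f, np.log1p(Sxx), shading='auto')  # log scale keeps faint bands visible
plt.axis('off')
plt.savefig('signal_0.png', bbox_inches='tight', pad_inches=0)  # one image per signal chunk
plt.close()
```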
@tanyaroosta My big takeaway from the Rossmann paper was that embeddings are amazing and help every model. Unfortunately, it came out a year before LightGBM, so no LightGBM results appear in it; we would have to go back and run LightGBM ourselves to compare. Even so, GBM scored .71 while the NN scored .70 in Table 3, where the data was split randomly. On the time-series split (Table 4) the NN did much better than the trees, which seems to echo a lesson from Practical Machine Learning lesson 3 or 4.
As I’m typing, this seems to echo the observations from @devforfu and @edwardjross that some of these Kaggle competitions might not play to NN strengths at this time (none of them are time series, and all have strange quirks in the data).
So, a new way ahead!
Scrap #1, or just use a callback to stop training on AUROC… (early-stopping sketch after this list)
#2 Run tests on the number and size of layers (layer-sweep sketch below)
#3 Run more tests on oversampling (sampler sketch below)
#4 Do more cross-validation (k-fold sketch below)
#5 Try to recreate lr_find(), but for hyperparameters (random-search sketch below)
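For #1, a minimal early-stopping sketch, assuming fastai v2 and a tiny synthetic table standing in for the real data; the column names, layer sizes, and patience are all placeholders:

```python
import numpy as np
import pandas as pd
from fastai.tabular.all import *

# Tiny synthetic stand-in: two continuous features, ~5% positives
df = pd.DataFrame({'f0': np.random.randn(2000),
                   'f1': np.random.randn(2000),
                   'target': (np.random.rand(2000) < 0.05).astype(int)})
dls = TabularDataLoaders.from_df(df, cont_names=['f0', 'f1'], y_names='target',
                                 y_block=CategoryBlock(), procs=[Normalize], bs=64)

learn = tabular_learner(dls, layers=[200, 100], metrics=RocAucBinary())
# 'monitor' must match the metric's column name in the training log ('roc_auc_score' here)
learn.fit_one_cycle(20, cbs=EarlyStoppingCallback(monitor='roc_auc_score', patience=3))
```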
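For #2, a crude sweep over the number and size of hidden layers, reusing the synthetic `dls` from the sketch above; the candidate configurations are just examples:

```python
from fastai.tabular.all import *

results = {}
for layers in ([200], [200, 100], [500, 250], [1000, 500, 250]):
    learn = tabular_learner(dls, layers=list(layers), metrics=RocAucBinary())
    with learn.no_bar():                                  # silence the progress bar during the sweep
        learn.fit_one_cycle(5)
    results[str(layers)] = learn.recorder.values[-1][-1]  # AUROC from the final epoch

for cfg, auc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f'{auc:.4f}  layers={cfg}')
```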
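For #3, one way to oversample the rare class is a weighted sampler. This is plain PyTorch rather than fastai so the idea stands on its own; `X` and `y` are placeholder arrays:

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader, WeightedRandomSampler

# Placeholder data: swap in the real feature matrix and 0/1 labels
X = np.random.randn(1000, 20).astype('float32')
y = (np.random.rand(1000) < 0.05).astype('int64')      # ~5% positives

class_counts = np.bincount(y)
sample_weights = 1.0 / class_counts[y]                  # rare class gets proportionally larger weight
sampler = WeightedRandomSampler(torch.as_tensor(sample_weights, dtype=torch.double),
                                num_samples=len(y), replacement=True)

ds = TensorDataset(torch.from_numpy(X), torch.from_numpy(y))
dl = DataLoader(ds, batch_size=64, sampler=sampler)     # batches now come out roughly class-balanced
```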
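For #4, a stratified k-fold loop so every fold keeps the same positive rate. The model inside the loop is just a stand-in (logistic regression on toy data); the point is the fold structure, which any of the learners above could drop into:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)  # imbalanced toy data

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = []
for fold, (trn_idx, val_idx) in enumerate(skf.split(X, y)):
    model = LogisticRegression(max_iter=1000).fit(X[trn_idx], y[trn_idx])
    auc = roc_auc_score(y[val_idx], model.predict_proba(X[val_idx])[:, 1])
    print(f'fold {fold}: AUROC {auc:.4f}')
    scores.append(auc)
print(f'mean {np.mean(scores):.4f} +/- {np.std(scores):.4f}')
```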
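For #5, a poor man's "lr_find() for hyperparameters": random search over a couple of knobs (dropout and weight decay here), each scored with a short run, again reusing the synthetic `dls` from the first sketch; the search space is made up:

```python
import random
from fastai.tabular.all import *

space = {'ps': (0.0, 0.1, 0.3, 0.5),        # dropout in the tabular model
         'wd': (0.0, 0.01, 0.1)}            # weight decay

trials = []
for _ in range(10):
    cfg = {k: random.choice(v) for k, v in space.items()}
    learn = tabular_learner(dls, layers=[200, 100],
                            config=tabular_config(ps=cfg['ps']),
                            metrics=RocAucBinary())
    with learn.no_bar():
        learn.fit_one_cycle(3, wd=cfg['wd'])              # short run, like lr_find's quick sweep
    trials.append((learn.recorder.values[-1][-1], cfg))   # (AUROC, config)

for auc, cfg in sorted(trials, key=lambda t: t[0], reverse=True)[:3]:
    print(f'{auc:.4f}  {cfg}')
```

A closer analogue to lr_find() would plot the metric against each parameter's values, but the ranked printout above gets most of the way there.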