LightGBM Discussion

I wasn’t 100% sure where this topic fit since it isn’t covered anywhere else, but I was curious whether anybody has worked with LightGBM and whether certain parameters have some intuition behind them.

My thought was that it would be cool to add LightGBM compatibility to the fastai library for tabular data. I keep seeing LightGBM pop up on Kaggle as doing better than XGBoost, which is what piqued my interest. Here is some information about the main parameters that can be tuned; for a fastai approach to this, I’m planning on picking decent defaults and letting the user change them if they know what they want. Ideally a user would be able to pass in the same data they would give a Random Forest and it would just work.
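To make that concrete, here is the rough shape I’m imagining. This is only a sketch: `df` and `y` stand in for a DataFrame and target that have already been processed the way you would for a Random Forest (categoricals encoded, missing values filled), and the parameter values are plausible starting points rather than tuned defaults.

```python
# Rough sketch: feed LightGBM the same kind of DataFrame you'd hand to a
# Random Forest. `df` and `y` are placeholders for already-processed data
# (categoricals encoded as ints, missing values filled).
import lightgbm as lgb
from sklearn.model_selection import train_test_split

X_train, X_valid, y_train, y_valid = train_test_split(df, y, test_size=0.2, random_state=42)

model = lgb.LGBMRegressor(
    n_estimators=1000,     # many trees; early stopping decides how many to keep
    learning_rate=0.05,    # smaller than the 0.1 default, usually a safer start
    num_leaves=31,         # the main complexity knob (LightGBM grows trees leaf-wise)
    min_child_samples=20,  # minimum rows per leaf, guards against overfitting
)
model.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)],
    eval_metric="rmse",
    callbacks=[lgb.early_stopping(stopping_rounds=50)],
)
print("best iteration:", model.best_iteration_)
```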

People seem to use it quite a bit more as it seems to be faster than xgboost, at least on the CPU. One idea for a workflow is to start with LightGBM, do feature selection / engineering, and only then use XGBoost to construct your final model or for ensembling. In terms of actual performance, based on what I have read the two seem very similar, and if there is any edge it is probably in XGBoost's direction. It is really hard to say though, as so many things go into the mix, and any universal statement that one is better than the other is probably too strong.
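In code, that workflow might look roughly like this. It is only schematic: the train/validation split and the classification target are placeholders, and the importance cutoff is arbitrary.

```python
# Schematic version of the workflow above: quick LightGBM pass for feature
# importance, then XGBoost on the columns that survive the cut.
# X_train, X_valid, y_train, y_valid are placeholders for your own split.
import lightgbm as lgb
import xgboost as xgb
import pandas as pd

lgbm = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.1)
lgbm.fit(X_train, y_train)

# Rank columns by split-based importance and keep the ones that were used at all
imp = pd.Series(lgbm.feature_importances_, index=X_train.columns).sort_values(ascending=False)
keep = imp[imp > 0].index

xgb_model = xgb.XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1)
xgb_model.fit(X_train[keep], y_train)
preds = xgb_model.predict_proba(X_valid[keep])[:, 1]
```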

Thanks for the reply Radek. That is an interesting observation. So maybe a mixture of the two is the better rule of thumb: use LightGBM first, since it is faster, to get a good base of the columns that matter and to build new features during that phase, then try both XGBoost and LightGBM once that initial piece is done and probably ensemble them together at that point.
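If it helps, here is a toy version of the ensembling step I have in mind, just averaging the two models' predicted probabilities on a held-out set (again, the data variables are placeholders):

```python
# Toy version of the "try both, then blend" step on a binary classification
# split. X_train, X_valid, y_train, y_valid are placeholders.
import lightgbm as lgb
import xgboost as xgb
from sklearn.metrics import roc_auc_score

lgbm = lgb.LGBMClassifier(n_estimators=500, learning_rate=0.05).fit(X_train, y_train)
xgbc = xgb.XGBClassifier(n_estimators=500, learning_rate=0.05).fit(X_train, y_train)

p_lgb = lgbm.predict_proba(X_valid)[:, 1]
p_xgb = xgbc.predict_proba(X_valid)[:, 1]
blend = (p_lgb + p_xgb) / 2  # equal-weight average; the weights could be tuned too

for name, p in [("lightgbm", p_lgb), ("xgboost", p_xgb), ("blend", blend)]:
    print(name, roc_auc_score(y_valid, p))
```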