I have been using the fastai tabular learner in some competitions and am getting pretty good scores, but not the best ones. With other algorithms like XGBoost and LightGBM, getting an excellent score requires a lot of parameter fine-tuning. In fastai, the only fine-tuning I could pick up from Jeremy’s lessons is setting the learning rate. Are there any other parameters we can fine-tune, and how do we do it? The LR is the most important parameter, but what other parameters impact the accuracy?
It really depends on which aspect of the model you are trying to improve.
If you are trying to reduce the training loss, you could train a bigger and deeper model, simply train longer, or do a hyperparameter search. Hyperparameters are a huge topic, but you are on the right track: the learning rate is generally considered the most important one. If you believe you have fiddled with it enough, you can try the momentum \beta and the mini-batch size. I would usually stop there. At the very end, you could always customize the architecture, but for most people that is not necessary.
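To make the search concrete, here is a minimal random-search sketch over the three knobs mentioned above (learning rate, momentum \beta, batch size). The `validation_loss` function is a made-up stand-in; in practice it would train your fastai tabular learner with those settings and return the validation loss:

```python
import math
import random

# Hypothetical stand-in for "train the model, return validation loss".
# The bowl-shaped surface below is purely illustrative.
def validation_loss(lr, beta, bs):
    return ((math.log10(lr) + 2) ** 2          # pretend optimum near lr=1e-2
            + (beta - 0.9) ** 2 * 10           # pretend optimum near beta=0.9
            + (math.log2(bs) - 8) ** 2 * 0.1)  # pretend optimum near bs=256

random.seed(0)
trials = []
for _ in range(50):
    lr = 10 ** random.uniform(-4, -1)        # sample LR on a log scale
    beta = random.uniform(0.8, 0.99)         # momentum beta
    bs = random.choice([64, 128, 256, 512])  # mini-batch size
    trials.append((validation_loss(lr, beta, bs), lr, beta, bs))

best = min(trials)  # lowest validation loss wins
```

Sampling the learning rate on a log scale matters more than the number of trials; a uniform sample would waste most of the budget on rates of the same order of magnitude.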
If there is a big gap between your training loss and your validation loss, meaning that your model has high variance and is not generalizing well, you could try regularization techniques like dropout and data augmentation.
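For intuition on what dropout actually does, here is a minimal numpy sketch of inverted dropout (the variant used by modern frameworks, where survivors are rescaled at train time so inference needs no change). In fastai's tabular learner the equivalent knobs are the layer dropout and embedding dropout probabilities:

```python
import numpy as np

rng = np.random.default_rng(42)
activations = rng.standard_normal((4, 8))  # toy layer output

def dropout(x, p, training=True, rng=rng):
    # Inverted dropout: zero each unit with probability p,
    # scale the survivors by 1/(1-p) to keep the expected value.
    if not training or p == 0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1 - p)

out = dropout(activations, p=0.5)  # train mode: roughly half the units zeroed
```

At inference time (`training=False`) the activations pass through unchanged, which is why the rescaling happens during training rather than at test time.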
What accuracy did you manage to achieve with gradient boosting?
I used the fastai tabular learner, but my accuracy on the test set maxed out at about 0.86.
I will leave a link to the kernel below; I hope everyone can check it out.
I was working exclusively with the tabular learner for the past three weeks.
Here are my three take-aways:
No other tuning than stage-wise learning rate is required. (*)
[EDIT: This might not be true. Redo experiments.] For predicting numbers or forecasting values in a time series, predicting the raw number or the percentage change makes no difference at all.
Feature engineering makes the most difference. Specifically, by converting implicit knowledge into explicit variables. For instance, for a time series, adding another feature that measures the distance from the moving average immediately improves accuracy.
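The third take-away, turning implicit knowledge into an explicit variable, can be sketched in a few lines of pandas. The column names and window size here are made up for illustration:

```python
import pandas as pd

# Toy price series; in practice this is your raw time-series column.
df = pd.DataFrame({"price": [10.0, 11.0, 12.0, 11.5, 13.0, 14.0, 13.5, 15.0]})

# Explicit feature: distance from the 3-step moving average.
# min_periods=1 avoids NaNs at the start of the series.
ma3 = df["price"].rolling(window=3, min_periods=1).mean()
df["dist_from_ma3"] = df["price"] - ma3
```

The model could in principle learn this relationship on its own, but handing it the distance directly saves it from having to rediscover the moving average from raw lags.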
After having applied all three lessons, my tabular learner went from ~70% rmspe to about 95% rmspe with a mean absolute percentage error (mape) of 2 - 3%.
Looking back, my case is really about the same as the Rossmann example, in the sense that it was first and foremost a feature engineering problem.
(*) Stage-wise LR means you start with a very aggressive rate, then use ten times the optimal rate, and finally use the optimal rate to fine-tune. Example:
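Here is a toy illustration of the three stages on a quadratic loss with plain gradient descent; the three rates are made-up stand-ins, and in fastai this would be successive `fit_one_cycle` calls with decreasing max learning rates:

```python
# Toy loss f(w) = 0.5 * w**2, so the gradient is simply w.
def train_stage(w, lr, steps):
    for _ in range(steps):
        w = w - lr * w  # plain gradient-descent update
    return w

w = 10.0
w = train_stage(w, lr=0.5,  steps=5)   # stage 1: aggressive "mega" rate
w = train_stage(w, lr=0.1,  steps=10)  # stage 2: ten times the fine-tune rate
w = train_stage(w, lr=0.01, steps=20)  # stage 3: optimal rate to fine-tune
```

The first stage covers most of the distance cheaply, and the later, smaller rates settle the weights without overshooting.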
I should have mentioned that I actually built an experiment pipeline and spent a few days testing every possible combination of features in and out on the actual model until I got a decent combination.
From my point of view, making results reproducible remains the first step before tuning them, because that way you can separate cause from effect across many experiments.
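The usual first step toward reproducibility is pinning every random seed the pipeline touches. A minimal sketch, assuming a numpy-based pipeline (with PyTorch/fastai you would additionally seed torch):

```python
import random

import numpy as np

def set_seed(seed=42):
    # Seed every RNG the pipeline uses so runs are repeatable.
    random.seed(seed)
    np.random.seed(seed)
    # With PyTorch/fastai, also: torch.manual_seed(seed)

set_seed(7)
a = np.random.rand(3)
set_seed(7)
b = np.random.rand(3)  # identical to a: the run is repeatable
```

Once two runs with the same seed produce the same numbers, any remaining difference between experiments comes from the change you made, not from randomness.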
However, what I have learned over the past few days is that the tabular learner can be very prone to overfitting, so I am currently rebuilding the pipeline to redo the experiments w.r.t. the best results from automated validation on clean data excluded from training. Looking at my code base, I have about 10x more LOC on procs and automation than on actual DL…
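The "clean data excluded from training" part boils down to a held-out validation split that the model never sees. A minimal index-based sketch (the sizes and fraction are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                      # number of rows in the dataset
valid_frac = 0.2             # fraction held out for validation

idx = rng.permutation(n)     # shuffle row indices once, reproducibly
n_valid = int(n * valid_frac)
valid_idx, train_idx = idx[:n_valid], idx[n_valid:]
# Train only on train_idx rows; score experiments only on valid_idx rows.
```

Note that for time-series data a random split leaks the future into training, so there the holdout should be the most recent slice rather than a shuffled sample.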
Not sure whether it’s an art, science, or just plain engineering PITA.