I am currently working on the Statoil iceberg challenge, and after getting near 10th rank, I have reached a plateau and my score isn't improving further.
I am currently in the ensembling phase, where I have 5-10 models, each scoring 0.15-0.16 on the LB, with roughly 75-85% correlation between them. At the moment I take the average of 2-3 models and try it on the LB; sometimes this improves the score, sometimes it doesn't.
I am looking for two kinds of suggestions:
How do I find the best combination? I can't try all combinations.
What is the best way to stack all these models? I have used XGBoost and LogReg, and both overfit grossly to the training data.
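(One common answer to the combination-search question is greedy forward selection, as in Caruana et al.'s ensemble selection: instead of trying every subset, repeatedly add whichever model most improves the averaged validation loss. A minimal sketch, not any particular competitor's method; the function names and the synthetic-data test are mine:)

```python
import numpy as np

def log_loss(y_true, p, eps=1e-15):
    # Binary log loss, matching the competition metric.
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def greedy_ensemble(preds, y_true, n_rounds=10):
    """Greedy forward selection: at each round, add (with replacement)
    the model whose inclusion most improves the averaged loss on a
    held-out set. `preds` is a list of per-model probability arrays."""
    chosen = []
    best_score = np.inf
    for _ in range(n_rounds):
        round_best, round_score = None, best_score
        for i in range(len(preds)):
            avg = np.mean([preds[j] for j in chosen + [i]], axis=0)
            score = log_loss(y_true, avg)
            if score < round_score:
                round_best, round_score = i, score
        if round_best is None:
            break  # no model improves the ensemble any further
        chosen.append(round_best)
        best_score = round_score
    return chosen, best_score
```

Because models are picked with replacement, a strong model can be selected several times, which implicitly weights it higher in the final average.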
Any help would be appreciated.
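(On the stacking question: meta-learners like XGBoost or LogReg usually overfit when they are fit on predictions made for rows the base models were trained on. The standard fix is to train the meta-learner only on out-of-fold base-model predictions, and to generate its own predictions out-of-fold as well. A hedged sketch assuming scikit-learn; `stack_out_of_fold` is a name I made up:)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

def stack_out_of_fold(model_preds, y, n_splits=5, seed=42):
    """Cross-validated stacking. `model_preds` is an
    (n_samples, n_models) array of base-model probabilities that were
    themselves produced out-of-fold, so no row was seen by the base
    model that predicted it. The meta-learner is likewise fit and
    evaluated on disjoint folds, giving an honest estimate of the
    stacked score."""
    oof = np.zeros(len(y), dtype=float)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for tr_idx, va_idx in skf.split(model_preds, y):
        # A small C (strong L2 regularization) often helps here,
        # since the inputs are highly correlated model outputs.
        meta = LogisticRegression(C=1.0)
        meta.fit(model_preds[tr_idx], y[tr_idx])
        oof[va_idx] = meta.predict_proba(model_preds[va_idx])[:, 1]
    return oof
```

Scoring `oof` against `y` gives a leak-free local estimate of what the stack should do on unseen data.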
Sounds like you’re doing fantastically well! Congrats. I suspect you’ve seen these already, but I wanted to make sure you were aware of these fantastic guides to ensembling:
Also, remember that the public leaderboard isn’t your main guide as to your performance - especially in this competition, where they’ve added thousands of synthetic (augmented) images to the public leaderboard dataset, but not the private leaderboard dataset. So be sure to have a good local validation method, and trust it!
Also, have you looked at the various regularization methods that have greatly improved CIFAR10 results recently, such as shake-shake and shake-drop? And of course snapshot ensembling with SGDR?
Thanks for the reply, Jeremy.
Well, I didn’t know that the synthetic images are ONLY in the public leaderboard. I had synthesized images in training to match the test dataset, but it improved performance only from 0.1626 to 0.1594. Maybe the data augmentation took care of that already.
I am doing 3-fold CV, but 5-fold gives much better performance. One problem I am facing — and I think a few others in top positions whom I’ve talked to face it too — is that training performance is near 0.14 and validation performance is near 0.19, yet the model still scores about 0.15-0.16 on the LB.
Thanks for pointing out the new techniques; I haven’t used them yet. I will implement them and post the improvement here.
An easier technique to implement, BTW, is mixup regularization. That might be worth trying, and you could probably throw it together pretty quickly.
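(Mixup really is only a few lines: each example is blended with a randomly paired one, and the same mixing coefficient λ ~ Beta(α, α) is applied to both the inputs and the labels. A minimal NumPy sketch — the function name and the batch shapes are illustrative, not from any particular implementation:)

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Mixup (Zhang et al.): return convex combinations of randomly
    paired examples. `x` is a batch of inputs, `y` the (soft) labels;
    both are mixed with the same lambda drawn from Beta(alpha, alpha)."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))     # random pairing within the batch
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y + (1 - lam) * y[perm]
    return x_mix, y_mix
```

The loss is then the usual log loss against the soft targets `y_mix`; smaller `alpha` keeps most mixed samples close to one of the two originals.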
Great job securing such a high spot in this competition!!
That being said, I would be a little concerned about getting 0.19 on the validation set while scoring much better on the public LB, at least for this competition. Like Jeremy says, there are thousands of machine-generated images, so it could be that your model is learning to predict well on synthetic images but doesn’t perform as well on real ones.
For me, it’s the opposite situation at the moment: on training/validation I consistently get 0.05-0.06 loss (0.048 with TTA), yet on the leaderboard my score is never below 0.2. I am hoping this is because my model simply performs very poorly on synthetic images, which is fine by me if the private LB doesn’t have any. Of course, I could be terribly wrong with my assumptions and still end up at the bottom of the private LB lol
Anyway, just some food for thought!
I’m glad I didn’t enter this comp - unreliable public leaderboards are totally terrifying and keep me up at night!
Maybe you’re right, but if you look at the synthetic images, most of them are just the images we’d generate with standard data augmentation techniques. So it shouldn’t make much difference, because we do that anyway while training.
@devm2024, @jamesrequa Guys, are you adapting the datasets to be able to use the default functionality of the fastai lib?