Improving the Model further

Although we have already discussed several practices built around transfer learning (SoTA approaches) for getting better results, there is often still room to squeeze out more performance, especially on Kaggle. I have tried many ways to improve my models, but it still isn't enough.

Things which usually work:

  • Data augmentation (chosen to suit the dataset)
  • Custom loss functions (in some cases, though standard ones are usually enough)
  • Adding more parameters (more layers, more computation); not helpful in every case, but it usually works
  • Adding more data (larger mini-batches, if RAM allows)
  • Researching the architecture in more depth
  • Finding bottlenecks (often hard and time-consuming)
  • Ensembling (proven to work really well, especially in competitions; more info needed, though)
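Since ensembling is the item that needs more info, here is a minimal sketch of the simplest form, averaging the class probabilities predicted by several models. The function names `average_ensemble` and `predict_labels` and the toy numbers are my own assumptions for illustration; competition ensembles often use weighted averages or stacking instead.

```python
def average_ensemble(prob_lists):
    """Average class probabilities across models.

    prob_lists[m][i][c] is model m's probability for sample i, class c.
    All models are assumed to score the same samples and classes.
    """
    n_models = len(prob_lists)
    n_samples = len(prob_lists[0])
    n_classes = len(prob_lists[0][0])
    averaged = []
    for i in range(n_samples):
        averaged.append([
            sum(model[i][c] for model in prob_lists) / n_models
            for c in range(n_classes)
        ])
    return averaged


def predict_labels(probs):
    """Pick the class with the highest (averaged) probability per sample."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


# Toy example: two hypothetical models, two samples, two classes.
model_a = [[0.6, 0.4], [0.3, 0.7]]
model_b = [[0.2, 0.8], [0.4, 0.6]]
ensemble_probs = average_ensemble([model_a, model_b])
ensemble_labels = predict_labels(ensemble_probs)
```

In the toy example the two models disagree on the first sample, and the average settles the disagreement in favour of the more confident model.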

I want to gather more information on which other directions (apart from these) we should be looking into to get even better results, up to the point where current research saturates. Please share whatever has worked for you, whether from Kaggle or from implementing research papers.


Yes, I have also been thinking that Kaggle winners do a lot of creative pre-processing, ensembling, etc. What can we do on top of our models to get that extra boost? Obviously such things would be non-standard and data-specific. For me, in one case dilation helped by increasing the receptive field size, and so did hard mining or focal loss after a lot of epochs. Focal loss is worth covering. It basically says: don't spend effort learning from easy examples, focus on the ones you get wrong by a large margin. I think we should compile a list of handy techniques to build on after the usual starting point.
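To make the focal loss idea concrete, here is a rough sketch of the binary case in plain Python, using the usual form FL(p_t) = -(1 - p_t)^gamma * log(p_t). The default gamma = 2.0, the `eps` clamp, and the function names are assumptions for illustration, and I have left out the optional class-balancing alpha term.

```python
import math


def focal_loss(p, y, gamma=2.0, eps=1e-7):
    """Binary focal loss for a single prediction.

    p: predicted probability of the positive class.
    y: true label, 0 or 1.
    gamma: focusing parameter; gamma = 0 gives plain cross-entropy.
    """
    p = min(max(p, eps), 1.0 - eps)       # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p        # probability given to the true class
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def cross_entropy(p, y, eps=1e-7):
    """Standard binary cross-entropy, i.e. focal loss with gamma = 0."""
    return focal_loss(p, y, gamma=0.0, eps=eps)


# Easy example (true class predicted with 0.9) vs hard example (0.1).
easy_ratio = focal_loss(0.9, 1) / cross_entropy(0.9, 1)
hard_ratio = focal_loss(0.1, 1) / cross_entropy(0.1, 1)
```

The ratio is just (1 - p_t)^gamma, so the easy example keeps about 1% of its cross-entropy loss while the hard one keeps about 81%. Easy examples are down-weighted rather than ignored outright.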

Just a correction, @prajjwal1: more RAM is required if you want to train with larger mini-batches. If you only increase the size of the training set without changing the batch size, you just have more iterations per epoch to go through the whole training set.

More RAM is also needed to train larger architectures with more weights to store in memory.
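A quick arithmetic sketch of the point about data size versus batch size. The helper name `iterations_per_epoch` and the dataset sizes are made up for illustration, and it assumes the last partial batch is kept.

```python
import math


def iterations_per_epoch(n_samples, batch_size):
    """Number of mini-batches needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


# Doubling the data at a fixed batch size doubles the iterations,
# but each batch (and so the per-step memory) stays the same size.
base = iterations_per_epoch(10_000, 64)         # 157
more_data = iterations_per_epoch(20_000, 64)    # 313

# Doubling the batch size halves the iterations, but each step now
# has to hold twice as many samples and activations in memory.
bigger_batch = iterations_per_epoch(10_000, 128)  # 79
```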

Thanks, fixed now