Learning Rate Pattern for Structured Data

(Matthew Teschke) #1

I’ve been working on applying the fast.ai library to structured-data binary classification problems (for example, the Porto Seguro Kaggle competition).

In each case so far, when I use the learning rate finder and plot the results, the loss always decreases and levels off, but it never curves back up again.
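To make the shape of that plot concrete, here is a minimal, self-contained sketch of an LR range test on a toy binary-classification problem. This is not the fastai implementation; the function name `lr_range_test` and all details are illustrative. It trains a logistic-regression model with mini-batch SGD while exponentially increasing the learning rate, recording the loss at each step, which is the quantity the LR finder plots against the learning rate.

```python
import numpy as np

def lr_range_test(X, y, lr_start=1e-5, lr_end=10.0, steps=100):
    """Illustrative LR range test: one SGD step per learning rate,
    sweeping the rate exponentially from lr_start to lr_end."""
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    lrs = np.geomspace(lr_start, lr_end, steps)
    losses = []
    for lr in lrs:
        # sample a mini-batch of 32 examples
        idx = rng.integers(0, len(X), 32)
        xb, yb = X[idx], y[idx]
        p = 1.0 / (1.0 + np.exp(-xb @ w))  # sigmoid predictions
        eps = 1e-12  # guard against log(0)
        loss = -np.mean(yb * np.log(p + eps) + (1 - yb) * np.log(1 - p + eps))
        losses.append(loss)
        grad = xb.T @ (p - yb) / len(yb)   # gradient of the logistic loss
        w -= lr * grad                     # SGD update at the current rate
    return lrs, np.array(losses)

# toy, roughly linearly separable data
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = (X @ np.array([1.0, -2.0, 0.5, 0.0, 1.5]) > 0).astype(float)

lrs, losses = lr_range_test(X, y)
```

Plotting `losses` against `lrs` (log scale) shows the curve in question: the loss falls as the rate grows, then flattens; on an easy problem like this it may never shoot back up within the tested range, matching the pattern described above.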


Two questions:

  1. Is this consistent with what other folks are seeing?
  2. Is there an intuitive explanation for why this might be the case?

For reference, my starting point was the excellent work by @kcturgutlu in the Structured Learner post

(Matthew Teschke) #2

I think the pattern I observed reflects “super-convergence”, which Leslie Smith describes as a regime “where the test loss and accuracy remain nearly constant for this LR range test, even up to very large learning rates”. See his full paper on tuning neural networks here.

Based on what I’ve seen on structured-learning problems and my interpretation of his work, this seems to indicate that large batch sizes combined with large learning rates can get you to the best results quickly.