Lesson 8 - Official topic

harish3110 · May 11, 2020, 2:17am

Your question was answered in last year’s course. You can check the notes here:

How do we find out that it’s 2.6? Jeremy ran lots and lots of different models using many different sets of hyperparameters of various types (dropout, learning rates, discriminative learning rates, and so forth). He then built a random forest, a kind of model, to predict how accurate his NLP classifier would be based on those hyperparameters. Finally, he used random forest interpretation methods to figure out what the optimal parameter settings were, and found that the answer for this number was 2.6. That is not something he has published, and Jeremy doesn’t think he has even talked about it before, so this is a new piece of information. A few months after Jeremy did this, Stephen Merity and somebody else did publish a paper describing a similar approach, so the basic idea may be out there already.

The magic value of 2.6 was officially introduced in the ULMFiT paper!
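
For context, the ULMFiT paper uses 2.6 as the factor for discriminative fine-tuning: each lower layer gets the learning rate of the layer above it divided by 2.6. Here is a minimal sketch of that rule in plain Python. The function name `discriminative_lrs`, the four layer groups, and the top learning rate of 1e-2 are assumptions made up for illustration, not fastai API:

```python
def discriminative_lrs(top_lr, n_groups, divisor=2.6):
    """Return per-group learning rates, ordered from the lowest group to the top one.

    Hypothetical helper: each group below the top gets the learning rate
    of the group above it divided by `divisor`.
    """
    return [top_lr / divisor ** (n_groups - 1 - i) for i in range(n_groups)]


# Assumed setup for illustration: 4 layer groups, top learning rate 1e-2.
lrs = discriminative_lrs(top_lr=1e-2, n_groups=4)
for i, lr in enumerate(lrs):
    print(f"group {i}: lr = {lr:.6f}")
```

The last entry is the learning rate for the top group, and every step down the list shrinks it by the same factor of 2.6.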