Blank lr plot returned

Divya_Bhargavi · February 9, 2019, 12:39am

Hi All!

This is the first time I am running into this issue, and I would love some direction on what I am getting wrong.
The functions below execute in seconds and the lr plot returned is blank.

learn.lr_find(start_lr=1e-7, end_lr=1e-3)
learn.recorder.plot()

Any ideas why?

Thanks!

Claus · April 26, 2019, 9:26am

Hi,

I have the same problem if I use:

learn.lr_find(start_lr = 1e-7,
             end_lr = 1,
             num_it = 100,
             stop_div = True, # stop on divergence
             wd = None)

learn.recorder.plot()

I get a blank plot with a wrong x-axis range, between 1e+00 and 1e+01.

Did you already figure out what was wrong?

Thanks!

Claus · April 26, 2019, 10:45am

I changed it to:

learn.lr_find(stop_div=False, num_it=200)

and now it works fine. I have no clue what was wrong before.

Divya_Bhargavi · April 26, 2019, 5:25pm

Hi,

I came across this a couple of times.
Fixes that worked for me: fixing the path to the validation set, and giving a different range of learning rates.
I think it stops executing when the metrics don’t improve.

Hope it helps.

jesbuddy7 · April 26, 2019, 6:11pm

I have bumped into this many, many times, particularly in transfer learning when trying to fine-tune. From my observation, this tends to happen in later epochs. For example, you find the appropriate learning rate during the initial training (i.e. when the network is still frozen and only the last few layers are being trained) and then train for, say, 40 epochs.

During the fine-tuning process, you would run lr_find again, starting from the results of the initial training. After only the first few epochs of initial training, lr_find may produce a plot just fine, but after later epochs (e.g. 20+ epochs) the empty plot appears more often when calling lr_find.

During fine-tuning, I also observe:

If I rerun lr_find multiple times, I will eventually get a plot. Sometimes I need to rerun lr_find more than 10 or even 20 times to get one.
When I do get a plot, it is not always the same plot. For the initial training, I always get the same plot. So I ended up creating a candidate list of LR values (15 to 20 of them), sorting them in ascending order, and picking the 3 or 4 that appear most often in my lr_find trials.
The smaller the lr, the more often the empty plot seems to appear when calling lr_find.
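
The candidate-list step above could be sketched roughly like this. It is only an illustration: the function name and the sample values are made up, and it assumes the learning rates read off each plot have already been rounded so that equal picks compare equal.

```python
from collections import Counter

def most_frequent_lrs(candidates, top_n=3):
    """Return the top_n most frequent learning rates, sorted ascending.

    `candidates` holds one hand-picked value per lr_find run, already
    rounded so that equal picks compare equal (an assumption of this sketch).
    """
    counts = Counter(candidates)
    top = [lr for lr, _ in counts.most_common(top_n)]
    return sorted(top)

# Hypothetical values read off nine separate lr_find plots.
picked = [1e-5, 3e-5, 1e-5, 1e-4, 3e-5, 1e-5, 1e-6, 1e-4, 3e-5]
shortlist = most_frequent_lrs(picked)
print(shortlist)
```

The shortlist is just a starting point; each value would still need to be tried in an actual training run.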

I’m interested to see what other people are doing. Not sure if this is an ideal way.

joerg · May 31, 2019, 7:37am

I’m having the same problem when calling lr_find for fine-tuning after learn.unfreeze().
Calling lr_find twice in a row made the plot appear. Thanks for the help. It’s still strange, though.

NathanHub · May 31, 2019, 8:10am

This can happen after you have already trained your model, because of the stop condition of lr_find, which is:

if self.stop_div and (math.isnan(loss) or loss>self.best*4):

Indeed, sometimes your loss is so small that even the slightest increase makes it 4 times bigger than your ‘best loss’, so the learning rate finder exits without anything to plot.

This is why I think some of you have to run the cell several times: sometimes you are lucky and the loss doesn’t fluctuate that much at the beginning of lr_find, so it returns some values.
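
To make the effect concrete, here is a small stand-alone simulation of that check. It is not the fastai code, just a sketch of the condition quoted above; the function name and the loss values are invented for illustration.

```python
import math

def iterations_before_stop(losses, stop_div=True, factor=4.0):
    """Count how many losses are kept before the divergence check fires.

    Mimics the quoted condition: stop when the loss is NaN or exceeds
    `factor` times the best (lowest) loss seen so far.
    """
    best = math.inf
    recorded = 0
    for loss in losses:
        if loss < best:
            best = loss
        if stop_div and (math.isnan(loss) or loss > best * factor):
            break
        recorded += 1
    return recorded

# Hypothetical losses: a freshly initialised model vs. an already trained one.
fresh = [2.3, 2.4, 2.2, 2.6, 2.5]
trained = [0.010, 0.012, 0.009, 0.041, 0.011]

print(iterations_before_stop(fresh))                    # 5
print(iterations_before_stop(trained))                  # 3
print(iterations_before_stop(trained, stop_div=False))  # 5
```

With the large losses of a fresh model, a jump of 0.4 is nowhere near 4 times the best value, while for the trained model a jump of about 0.03 is already enough to stop. As far as I know, recorder.plot() in fastai v1 also skips the first 10 and last 5 recorded points by default (skip_start, skip_end), so a run that stops this early would leave nothing to draw.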

rpcoelho · June 24, 2019, 7:11pm

I was having the same problem using the tabular_learner on a data set similar to the Rossmann data set. I changed the metrics to MAE and set log=False when creating the databunch, and then it worked. But at this point I am getting an MAE much worse than with an XGBRegressor. I might need to tweak the embedding layer sizes. Let’s see what happens…

pankaj_kvhld · October 21, 2019, 6:00pm

Wow, it worked like a charm.
Could you please explain why this works?

Claus · October 22, 2019, 6:47am

To be honest, I’m not 100% sure why this works.
I think the reason is the difference between stop_div = True in my first post and stop_div = False in the other one.
According to https://docs.fast.ai/train.html

"If stop_div, stops when loss diverges."

and

https://medium.com/@hiromi_suenaga/deep-learning-2-part-2-lesson-13-43454b21a5d0

"stop_div basically means that it’ll use whatever schedule you asked for but when the loss gets too bad, it’ll stop training."

This means that in our cases, where stop_div = True, the learner decides that the loss has become too high and stops early, so you get the blank plot.
I think it would be interesting for us if you could add the plot that you get with stop_div = False.
If you find out why this happens, please let us know, so that we know it is not just magic.

kevinh · June 26, 2020, 1:07pm

I’m replying because I encountered this issue with tabular data.

In my case, the problem was that some columns had inf values (-inf actually after a log transformation). After replacing these values, the model was able to learn and the plot was not empty anymore.
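
A quick check along these lines would have caught it earlier. This is a plain-Python sketch with made-up helper and column names, assuming the columns are available as lists of floats:

```python
import math

def non_finite_columns(table):
    """Map each column name to its count of inf, -inf or NaN values.

    `table` is a dict of column name -> list of floats; columns with
    only finite values are left out of the result.
    """
    counts = {}
    for name, values in table.items():
        bad = sum(1 for v in values if not math.isfinite(v))
        if bad:
            counts[name] = bad
    return counts

def replace_non_finite(values, fill):
    """Return a copy of `values` with every non-finite entry set to `fill`."""
    return [v if math.isfinite(v) else fill for v in values]

# Hypothetical data: a log-transformed column that picked up a -inf.
table = {"sales_log": [2.3, -math.inf, 1.1], "price": [1.0, 2.0, 3.0]}
print(non_finite_columns(table))
table["sales_log"] = replace_non_finite(table["sales_log"], fill=0.0)
print(non_finite_columns(table))
```

With a pandas DataFrame, the usual equivalent is df.replace([np.inf, -np.inf], np.nan) followed by whatever fill strategy suits the data. Which fill value makes sense depends on the column; 0.0 here is only a placeholder.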