I've bumped into this many times, particularly in transfer learning when trying to fine-tune. From my observation, it tends to happen in the later epochs. For example, suppose you find an appropriate learning rate during the initial training (i.e. when the network is still frozen and only the last few layers are being trained) and train for, say, 40 epochs.
During fine-tuning, you would run lr_find again on the model trained in that initial stage. After only a few epochs of initial training, lr_find might still produce a plot just fine, but once you've trained for more epochs (e.g. 20+), the empty plot appears more and more often when calling lr_find.
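For context, here's roughly the workflow I mean (a minimal sketch assuming the fastai v1-style API; `data`, the architecture, the LR, and the epoch counts are placeholders, not a recommendation):

```python
from fastai.vision import *

# Initial training: backbone frozen, only the head is trained.
learn = cnn_learner(data, models.resnet34, metrics=accuracy)
learn.lr_find()
learn.recorder.plot()    # pick an LR from this plot
learn.fit_one_cycle(40, max_lr=1e-3)

# Fine-tuning: unfreeze and run lr_find again on the trained model.
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()    # this is where the plot often comes up empty
```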
During fine-tuning, I also observe:
- If I rerun lr_find multiple times, I eventually get a plot. Sometimes I need to rerun it more than 10 or even 20 times before one appears.
- When I do get a plot, it isn't always the same plot (for the initial training, I always get the same one). So I ended up building a candidate list of LR values (15 to 20 of them), sorting them in ascending order, and picking the 3 or 4 that appear most often across my lr_find trials; see the sketch after this list.
- The smaller the LR, the more often lr_find seems to produce an empty plot.
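Here's a rough sketch of that tallying workaround (again assuming the fastai v1 API, where `lr_find()` populates `learn.recorder` and `recorder.plot(suggestion=True)` stores its pick in `recorder.min_grad_lr`; the log-bucketing, trial count, and error handling are my own choices):

```python
from collections import Counter
import math

def collect_lr_candidates(learn, n_trials=20, top_k=4):
    """Rerun lr_find repeatedly and tally the suggested LRs.

    Runs that fail to produce a usable plot/suggestion are skipped,
    mirroring the "empty plot" behavior described above.
    """
    suggestions = []
    for _ in range(n_trials):
        learn.lr_find()
        try:
            learn.recorder.plot(suggestion=True)
            suggestions.append(learn.recorder.min_grad_lr)
        except Exception:
            continue  # an "empty plot" run; just try again

    # Bucket by rounded log10 so near-identical suggestions
    # (e.g. 2.9e-3 vs 3.1e-3) count as the same candidate.
    buckets = Counter(round(math.log10(lr), 1) for lr in suggestions)

    # Keep the top_k most frequent buckets, in ascending LR order.
    return sorted(10 ** b for b, _ in buckets.most_common(top_k))
```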
I'm interested to hear what other people are doing; I'm not sure this is an ideal approach.