I ran learn.lr_find(start_lr=1e-8, end_lr=1e2, num_it=100) to find the optimal learning rate for fine-tuning a language model. However, the validation loss is nan and the curve I get when running learn.recorder.plot() is very flat. Does anyone have an explanation for either of these issues?
Thanks for any suggestion!
Thanks so much for the clarification and hints! The graph looks a lot better now with the new settings!
Thanks for the clarification, I was also wondering the same thing. But this could be made clearer in the documentation.
#na# looks like something is wrong.
When I run fit, what is this number highlighted in yellow? And when the epoch is finished, it does a small run as well. What is the significance of these numbers?
The screenshot is from lesson 6 - Rossmann.
A little old, but thought I’d answer since I just came across this thread. The highlighted number in yellow is the total number of items / batch size = # of batches to run through. It’s a more granular view than epochs, and less granular than the total number of items processed.
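To make that concrete, here is a small sketch of the arithmetic (the function name `num_batches` is just for illustration, not a fastai API). Note that if the number of items doesn’t divide evenly by the batch size, the last partial batch is usually still counted, unless the dataloader is set to drop it:

```python
import math

def num_batches(n_items: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches per epoch, i.e. the count shown in the progress bar."""
    if drop_last:
        # the incomplete final batch is skipped
        return n_items // batch_size
    # a partial final batch still counts as one batch
    return math.ceil(n_items / batch_size)

print(num_batches(1000, 64))             # → 16
print(num_batches(1000, 64, drop_last=True))  # → 15
```

The smaller run after each epoch is the validation pass, which iterates over the validation set’s batches the same way.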