Got a Q about lr_find() as discussed in DL1. The plot of loss vs. learning rate looks very smooth compared to mine. Is this due to batch_size? I used 32. Does anyone know what batch size they used in DL1?
I found this great article by Sylvain Gugger that explains this and more in detail.
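As I understand the article, one reason the course plot looks so clean is that the learning rate range test plots an exponentially smoothed, bias-corrected loss, not the raw per-batch loss. The learning rate also grows geometrically from a small to a large value over the run. Below is a minimal plain-Python sketch of those two pieces. `lr_schedule` and `smooth_losses` are hypothetical helper names, not fastai API, and `beta=0.98` is an assumption (I believe it is the value the article uses):

```python
def lr_schedule(init_lr, final_lr, num_steps):
    """Geometrically increasing learning rates from init_lr to final_lr (inclusive)."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    # Constant factor applied at every step so the last value lands on final_lr.
    mult = (final_lr / init_lr) ** (1 / (num_steps - 1))
    return [init_lr * mult ** i for i in range(num_steps)]


def smooth_losses(raw_losses, beta=0.98):
    """Exponential moving average of the losses, with bias correction.

    beta=0.98 is an assumed default; a larger beta gives a smoother curve.
    """
    avg = 0.0
    smoothed = []
    for step, loss in enumerate(raw_losses, start=1):
        # Running average that weights recent batches more heavily.
        avg = beta * avg + (1 - beta) * loss
        # Divide out the bias toward 0 caused by starting avg at 0.
        smoothed.append(avg / (1 - beta ** step))
    return smoothed


if __name__ == "__main__":
    lrs = lr_schedule(1e-7, 10.0, 100)
    # Fake, deterministic "noisy" losses just to show the smoothing effect.
    raw = [1.0 + (0.5 if i % 2 else -0.5) for i in range(100)]
    for lr, r, s in list(zip(lrs, raw, smooth_losses(raw)))[::20]:
        print(f"lr={lr:.2e}  raw={r:.3f}  smoothed={s:.3f}")
```

If you record your raw per-batch losses and plot both versions against `lrs` on a log x-axis, the smoothed curve should look much closer to the one in the course. Smaller batches tend to make raw per-batch losses noisier, so batch size may still play a part.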