Got a question about lr_find() as discussed in DL1. The plot of loss vs. learning rate shown there looks very smooth compared to the one I get. Is this due to batch_size? I used 32. Does anyone know what batch size they used in DL1?