I wonder which learning rate I should choose based on the learning rate finder plots (below). Of course, I can run multiple experiments to see which learning rate works best, but that takes quite some time on my machine, hence my question here.
I tried different values of the num_it argument, but I am still not sure. I think 1e-03 looks like a good candidate.