I am creating an ImageList with more than 40k categories and noticed that the lr_find curve has completely flattened out, with the min-numerical-gradient suggestion at 2.00E-11. While I've managed to get accuracy from 1% up to 30%, I seem to have hit a wall and can't get it to improve much further.
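Assuming this is fastai v1 (ImageList suggests so), my understanding is that `learn.recorder.plot(suggestion=True)` marks the learning rate at which the recorded loss falls most steeply, i.e. the argmin of `np.gradient(losses)`. The printed number is therefore a learning rate, not a gradient value. The stdlib sketch below imitates that rule on two hypothetical toy curves. It is not the library's code, and it leaves out the loss smoothing and the start/end trimming fastai applies. On a curve that is flat apart from noise, the "steepest" point is wherever the noise happens to dip, so the suggestion carries little information.

```python
import math
import random


def numerical_gradient(values):
    """Pure-Python equivalent of np.gradient with unit spacing:
    one-sided differences at the ends, central differences inside."""
    n = len(values)
    grads = [values[1] - values[0]]
    for i in range(1, n - 1):
        grads.append((values[i + 1] - values[i - 1]) / 2)
    grads.append(values[-1] - values[-2])
    return grads


def suggest_lr(lrs, losses):
    """Return the learning rate where the loss drops most steeply
    (the index of the most negative numerical gradient)."""
    grads = numerical_gradient(losses)
    best = min(range(len(grads)), key=lambda i: grads[i])
    return lrs[best]


# Hypothetical LR sweep: 100 log-spaced values from 1e-7 to 1e0.
lrs = [10 ** (-7 + 7 * i / 99) for i in range(100)]

# Toy curve with a clear drop centred around lr = 1e-3.
clear = [5.0 - 2.0 / (1 + math.exp(-4 * (math.log10(lr) + 3))) for lr in lrs]

# Toy curve that is flat apart from small random noise.
rng = random.Random(0)
flat = [5.0 + rng.uniform(-0.01, 0.01) for _ in lrs]

print(f"clear curve suggestion: {suggest_lr(lrs, clear):.2E}")
print(f"flat curve suggestion:  {suggest_lr(lrs, flat):.2E}")
```

With the clear curve the pick lands near 1e-3; with the flat one it depends only on the noise seed.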
I did notice that reducing the batch size and switching from fp16 to fp32 gave a little more wiggle room, but after 4 epochs accuracy only got worse.
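One plausible reason fp32 leaves more room (an assumption, not something measured on this model) is that half precision cannot represent very small magnitudes, so tiny gradients round to zero. Mixed-precision training usually counters this with loss scaling. The sketch uses the half-precision format code `'e'` of Python's `struct` module to round values to fp16. The gradient value and the scale factor are hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("e", struct.pack("e", x))[0]


# The smallest positive fp16 value (a subnormal) is 2**-24, about 5.96e-08,
# so anything well below that underflows to zero.
tiny_grad = 1e-8
print(to_fp16(tiny_grad))

# Loss scaling multiplies the loss (and hence the gradients) by a constant,
# moving the value into fp16's representable range; dividing afterwards
# recovers approximately the original magnitude.
scale = 1024.0
print(to_fp16(tiny_grad * scale) / scale)
```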
When the curve is this flat, does it even matter which lr I pick? Should I keep it closer to 1e-04, or keep using a slice around the suggested value?
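For reference, this is how I understand fastai v1 turns a `slice` into per-layer-group learning rates (`Learner.lr_range`). `slice(lo, hi)` is spread geometrically from `lo` to `hi` across the layer groups, while `slice(hi)` gives `hi/10` to every group except the last, which gets `hi`. The stdlib sketch below re-implements that rule. The function name is hypothetical, and so is the choice of three layer groups (a common layout for a pretrained CNN with a new head).

```python
def lr_range(lr, n_groups):
    """Per-group learning rates from a float or a slice
    (a re-implementation of fastai v1's rule as I understand it)."""
    if not isinstance(lr, slice):
        # A plain float applies to every layer group.
        return [lr] * n_groups
    if n_groups == 1:
        # Guard added here; with one group only the top LR makes sense.
        return [lr.stop]
    if lr.start:
        # Geometric spacing from start to stop, like fastai's even_mults.
        step = (lr.stop / lr.start) ** (1 / (n_groups - 1))
        return [lr.start * step ** i for i in range(n_groups)]
    # Only stop given: earlier groups get stop/10, the head gets stop.
    return [lr.stop / 10] * (n_groups - 1) + [lr.stop]


print(lr_range(slice(1e-6, 1e-4), 3))  # roughly [1e-06, 1e-05, 1e-04]
print(lr_range(slice(1e-4), 3))        # roughly [1e-05, 1e-05, 1e-04]
print(lr_range(1e-4, 3))
```

So a slice mostly decides how much smaller the early-layer rates are relative to the head; the top value still sets the overall scale.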
The other parameter I'm tinkering with is weight decay (wd); I have moved it from 0.4 to 0.00001, but neither value seems to help much.
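As I understand it, fastai v1 applies weight decay in decoupled form by default (`true_wd=True`), shrinking each weight by a factor of `1 - lr * wd` per optimizer step. The effect of wd is therefore scaled by the learning rate. The sketch below only does that arithmetic for the two wd values mentioned and ignores the gradient update. The learning rate, step count, and function name are hypothetical.

```python
def weight_after_decay(w0, lr, wd, steps):
    """Apply only the decoupled weight-decay shrink w <- w * (1 - lr * wd)
    for `steps` optimizer steps, ignoring the gradient update."""
    w = w0
    for _ in range(steps):
        w *= 1 - lr * wd
    return w


lr = 1e-4        # hypothetical learning rate
steps = 10_000   # hypothetical number of optimizer steps
for wd in (0.4, 1e-5):
    w = weight_after_decay(1.0, lr, wd, steps)
    print(f"wd={wd:g}: weight 1.0 -> {w:.6f} after {steps} steps")
```

Under these assumptions wd=0.4 shrinks an untouched weight to about exp(-0.4) ≈ 0.67, while wd=0.00001 leaves it essentially unchanged.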