I’ve been doing a lot of model training recently, and something that seems to work better than just Ranger + fit_flat_cos is what I’ll call stair-stepping descent with fit_flat_cos.
Like this:
1 - 3 epochs @ 8e-3, fit_flat_cos
2 - 5 epochs @ 1e-3, fit_flat_cos
3 - 8 epochs @ 8e-4, fit_flat_cos
etc. Basically: a short flat-then-anneal run, then another, then another, with the starting lr lowered on each run.
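To make the shape of the schedule concrete, here is a rough pure-Python sketch of the lr over time. It reads each line of the list above as (epochs, starting lr), which is my interpretation. The helper names `flat_cos_lr` and `stair_step_schedule`, the `steps_per_epoch` value, and the `pct_start=0.75` / `div_final=1e5` settings are assumptions for illustration, not something taken from a library.

```python
import math

def flat_cos_lr(pos, lr, pct_start=0.75, div_final=1e5):
    """LR at relative position pos in [0, 1) of one flat-then-cosine run."""
    if pos < pct_start:
        return lr  # flat phase: hold the starting lr
    # Cosine anneal from lr down to lr / div_final over the remaining fraction.
    t = (pos - pct_start) / (1 - pct_start)
    lr_end = lr / div_final
    return lr_end + (lr - lr_end) * (1 + math.cos(math.pi * t)) / 2

def stair_step_schedule(stages, steps_per_epoch=100, pct_start=0.75):
    """Concatenate one flat-cos run per (n_epochs, lr) stage into a per-step lr list."""
    lrs = []
    for n_epochs, lr in stages:
        total = n_epochs * steps_per_epoch
        lrs += [flat_cos_lr(i / total, lr, pct_start) for i in range(total)]
    return lrs

# Assumed reading of the list above: (epochs, starting lr) per run.
stages = [(3, 8e-3), (5, 1e-3), (8, 8e-4)]
schedule = stair_step_schedule(stages)
```

Each run restarts flat at its own (lower) lr and then slides down, which gives the stair-step shape when you plot `schedule`. With a fastai `Learner`, I'd expect the equivalent to be a simple loop calling `learn.fit_flat_cos(n_epochs, lr)` once per stage, but treat that as a sketch too.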
My theory is that the repeated run and drop, run and drop helps the model steadily work its way down into a nice valley, compared with running one longer fit_flat_cos.
I’ll keep testing it and hopefully turn it into a single callback (feel free to beat me to it). I just wanted to throw it out there, since this approach seems to be working quite well on my private datasets.