I am trying to train an image detection convnet (YOLO-inspired) and to use the technique behind lr_find(…) to identify the max LR. I am using Keras for this investigation. I am constrained by hardware to a batch size of 16. My training set is composed of images containing 4-20 objects each, with their bounding boxes. It seems that with a random sample of just 16 images, batch composition has significant implications for the loss: the network has a much lower loss when the number of objects per image is low (like 3), but a very high loss when it is over 20. As a result, my loss is extremely noisy per step and only shows a clear downward trend over several epochs.
I am trying various things to smooth this out, hoping to get a loss vs. LR plot that's as clean as the ones in Lesson 1. If anyone has faced this issue while trying out this idea, please let me know how you dealt with it. Currently, the loss fluctuation is so high that the plot is useless for identifying a max LR.
I'm not sure if this forum supports images; I will post an example of how it looks.
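One thing I've started experimenting with is smoothing the recorded per-batch losses with a bias-corrected exponential moving average before plotting them against LR; as far as I understand, this is what fastai's lr_find does under the hood. A minimal sketch (the beta value and the synthetic noisy losses below are just placeholders, not my actual data):

```python
import numpy as np

def smooth_losses(losses, beta=0.98):
    """Bias-corrected exponential moving average of per-batch losses.
    Higher beta -> smoother curve but more lag behind the raw trend."""
    avg = 0.0
    smoothed = []
    for i, loss in enumerate(losses, start=1):
        avg = beta * avg + (1 - beta) * loss
        smoothed.append(avg / (1 - beta ** i))  # correct early-step bias
    return smoothed

# Synthetic example: a slowly decreasing loss buried in heavy noise,
# mimicking the fluctuation caused by varying object counts per batch.
rng = np.random.default_rng(0)
trend = 2.0 - 0.001 * np.arange(1000)
noisy = trend + rng.normal(0.0, 0.5, size=1000)
smoothed = smooth_losses(noisy)
```

Plotting `smoothed` instead of the raw losses against the (log-scaled) LR makes the downward trend and the eventual blow-up much easier to see, at the cost of the minimum appearing slightly later than it really occurs.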