Across several small image-classification projects I keep running into the same problem: picking the right learning rate. Currently I'm working on classifying about 10,000 images into 7 categories. The validation set contains 25% of the data, and the architecture is resnet50.
Initially I find a good learning rate (see picture for parameters):
In the second step there is a big gap between the training and validation loss, and it seems to be underfitting: the training loss is not improving much. Is my learning rate too small?
Some questions about picking the learning rate:
- What is the use of the lower limit in `max_lr=slice(x1, x2)`? My understanding is that the slice defines limits for the maximum learning rate and the library figures out the best one within that range?
- Sometimes training seems to perform better when I don't use the lower limit.
- The second run of `fit_one_cycle` tends to be more rough/unpredictable. I try to create a bump/initial increase in the loss by picking a higher `max_lr`, because this prevents settling into a local minimum, with good effect. But sometimes the loss doesn't recover from that bump and the process ends at around the same error rate.
- Is it advisable to use weight decay with a resnet50 CNN?
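To make the `slice` question concrete, here is my mental model of what `slice(x1, x2)` does, as a plain-Python sketch (this is my understanding of the behavior, not fastai's actual code; the function name `lr_range` and the geometric spacing are my assumptions): the earliest layer group gets `x1`, the head gets `x2`, and the groups in between get geometrically spaced rates.

```python
def lr_range(x1, x2, n_groups):
    """Sketch of discriminative learning rates: spread n_groups values
    geometrically from x1 (earliest layers) to x2 (head).
    NOTE: illustrative only, not fastai's internal implementation."""
    if n_groups == 1:
        return [x2]
    # geometric interpolation: each group's lr is a constant ratio
    # larger than the previous group's
    ratio = (x2 / x1) ** (1 / (n_groups - 1))
    return [x1 * ratio ** i for i in range(n_groups)]

# three layer groups (typical for a fastai resnet learner)
print(lr_range(1e-5, 1e-3, 3))  # [1e-05, 1e-04, 1e-03]
```

If this model is right, the lower limit mainly matters when the network is unfrozen: it keeps the pretrained early layers on a much smaller rate than the head.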
If anyone has advice about the questions above I'd like to learn more. In particular, there seems to be an interaction between the number of epochs, `max_lr` and the cycle length which I haven't fully grasped yet. `fit_one_cycle` and `max_lr` induce an initial increase in the learning rate, and the number of epochs determines how many times the data is evaluated? Is there a possibility to use multiple cycles as well, to create a saw-tooth pattern for the learning rate over multiple epochs? Any other considerations for
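To illustrate what I mean by the saw-tooth idea versus one cycle, here is a rough plain-Python sketch of the two schedule shapes (the function names, `pct_start=0.3` and `div=25` defaults are my assumptions, loosely modeled on `fit_one_cycle`; the exact fastai internals may differ):

```python
import math

def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3, div=25.0):
    """One cycle: cosine warmup from max_lr/div up to max_lr over the
    first pct_start of training, then cosine decay toward zero.
    Illustrative sketch, not fastai's actual scheduler."""
    start_lr = max_lr / div
    warm = int(total_steps * pct_start)
    if step < warm:
        t = step / max(1, warm)
        return start_lr + (max_lr - start_lr) * (1 - math.cos(math.pi * t)) / 2
    t = (step - warm) / max(1, total_steps - warm)
    return max_lr * (1 + math.cos(math.pi * t)) / 2

def saw_tooth_lr(step, cycle_len, max_lr, min_lr):
    """Saw-tooth (cyclical) schedule: linear ramp from min_lr to max_lr
    within each cycle, then reset to min_lr at the cycle boundary."""
    t = (step % cycle_len) / cycle_len
    return min_lr + (max_lr - min_lr) * t

sched = [one_cycle_lr(s, 100, 1e-3) for s in range(100)]
print(max(sched))  # peaks at exactly max_lr, at the end of warmup
```

With this picture, one `fit_one_cycle(n)` call stretches a single rise-and-fall over all `n` epochs, so calling it twice gives two cycles back to back rather than a longer single cycle, which might explain why the second run behaves differently.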