Hi Miriam, thanks for this detailed discussion, and the reminder that accuracy does not always track the loss! Your understanding is already more sophisticated than mine in a lot of ways, and I’m learning a lot from you.
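In case a concrete picture helps, here is a tiny numpy sketch (all probabilities are made up) of why the two can diverge: accuracy only checks which side of the decision boundary each prediction lands on, while cross-entropy loss also scores how confident the prediction is.

```python
import numpy as np

# Hypothetical predicted probabilities for the true class on 4 examples.
# Both runs classify every example correctly, so accuracy is identical...
confident = np.array([0.99, 0.98, 0.97, 0.99])
hesitant = np.array([0.55, 0.60, 0.52, 0.58])

for name, p in [("confident", confident), ("hesitant", hesitant)]:
    acc = np.mean(p > 0.5)         # accuracy: only the argmax matters
    loss = -np.mean(np.log(p))     # cross-entropy: confidence matters too
    print(f"{name}: accuracy={acc:.2f}, loss={loss:.3f}")

# ...but the cross-entropy loss differs by an order of magnitude,
# because loss rewards confidence while accuracy ignores it.
```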
You’ll definitely want to go with the learning rate that works best in practice. “Going back one notch” is a rule of thumb, not a hard-and-fast rule, so if you find a smaller learning rate gives better results, go for it.
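If it helps, here is a minimal sketch of that “just try it” approach, using a toy PyTorch model and synthetic data as stand-ins for your actual setup: train a fresh copy at each candidate rate and keep whichever gives the best validation loss.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy synthetic binary-classification data (stand-in for your dataset).
X = torch.randn(600, 10)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).long()
X_train, y_train = X[:400], y[:400]
X_val, y_val = X[400:], y[400:]

def val_loss_for(lr, epochs=30):
    """Train a fresh tiny model at this learning rate, return final val loss."""
    model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X_train), y_train).backward()
        opt.step()
    with torch.no_grad():
        return loss_fn(model(X_val), y_val).item()

# Candidate rates, e.g. the finder's suggestion and a notch or two smaller.
for lr in [1e-1, 1e-2, 1e-3]:
    print(f"lr={lr:.0e}: val loss={val_loss_for(lr):.4f}")
```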
As for the other question, what causes the “validation loss oscillating, validation accuracy increasing” pattern, that is beyond my knowledge, and hopefully others can give more insight. A good starting point seems to be this thread, including a couple of linked articles.
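That said, one mechanism I have seen mentioned (sketched below with made-up probabilities) is that a model can become correct on more examples while growing overconfident on the few it still gets wrong; a single confident mistake contributes a huge -log term, so the mean loss can rise or oscillate even as accuracy climbs.

```python
import numpy as np

def summarize(name, p_true):
    """p_true: predicted probability of the correct class per example."""
    acc = np.mean(p_true > 0.5)
    loss = -np.mean(np.log(p_true))
    print(f"{name}: accuracy={acc:.2f}, loss={loss:.3f}")

# Hypothetical validation probabilities at two checkpoints.
# Earlier epoch: hesitant everywhere, 3/5 correct.
summarize("earlier", np.array([0.60, 0.55, 0.45, 0.45, 0.60]))
# Later epoch: 4/5 now correct (accuracy up), but the remaining mistake
# is made with high confidence, and that single -log(0.02) term
# dominates the mean loss (loss up).
summarize("later", np.array([0.90, 0.90, 0.60, 0.55, 0.02]))
```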
I have also browsed (emphasis on browsed, not studied) a few other related articles on Stack Exchange and the like, but could not find a real consensus on this phenomenon. Overfitting is often mentioned as a possibility, but there are numerous other explanations that probably depend on context. In the end, I have to shrug my shoulders and encourage you to explore this rabbit hole further if you have time - it is quite fascinating. Good luck, and thanks for this opportunity for adventure!