What is the optimal learning rate dependent on?

I have been playing around with a couple of datasets and noticed that if I change the batch size or the image size, the optimal learning rate stays about the same, but changing the architecture shifts it…

Have others observed this phenomenon? Is the optimal learning rate dependent on the dataset and architecture but not on the batch size and image size? How do data transformations affect the optimal learning rate?

In general, if you change anything about your model or data, the optimal learning rate can change. Of those factors, batch size is probably the one the optimal learning rate is most robust to, but even that isn't guaranteed.
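You can see a simplified version of this effect on a toy problem. The sketch below (my own illustration, not from the thread; `best_lr` and the quadratic loss are made up for the example) grid-searches the step size on f(w) = 0.5 · curvature · w², where the curvature loosely stands in for the loss-surface geometry a particular architecture induces. Changing the curvature shifts which learning rate works best, analogous to how changing the architecture shifts the optimal learning rate:

```python
def best_lr(curvature, lrs):
    """Grid-search the step size that reaches the lowest loss after a
    fixed budget of gradient steps on the toy quadratic
    f(w) = 0.5 * curvature * w**2."""
    best, best_loss = None, float("inf")
    for lr in lrs:
        w = 1.0                      # fixed starting point
        for _ in range(20):          # 20 steps of plain gradient descent
            w -= lr * curvature * w  # df/dw = curvature * w
        loss = 0.5 * curvature * w ** 2
        if loss < best_loss:
            best, best_loss = lr, loss
    return best

lrs = [i / 100 for i in range(1, 101)]  # candidate learning rates 0.01 .. 1.00
# A sharper quadratic (a steeper loss surface) prefers a smaller step:
# for this toy problem the optimum is lr = 1/curvature.
print(best_lr(2.0, lrs))   # near 0.5
print(best_lr(10.0, lrs))  # near 0.1
```

This is why learning-rate range tests are usually rerun after any change to the model, rather than reusing a value tuned on a different architecture.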