OK, let me try to answer; I might be wrong. The learning rate is how big a step you take downhill toward the (hopefully global) minimum, i.e. toward high accuracy or low error. The bigger the step, the faster you tend to move toward the right weights, so your DL model is good to go; but the downside is that the system can keep overshooting the minimum, and then the DL model fails to converge.
So to summarize: how fast or how slow your model converges, and whether it converges at all, is what choosing the learning rate is all about.
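You can see this overshooting behavior even on a toy problem. Here's a minimal sketch of my own (not from any framework): plain gradient descent on f(w) = w², whose minimum is at w = 0, run with a small and a too-big learning rate.

```python
# Toy illustration of learning rate behavior: gradient descent on
# f(w) = w**2, minimum at w = 0. The gradient is f'(w) = 2*w,
# so each step is w <- w - lr * 2 * w.

def descend(lr, steps=50, w=1.0):
    """Run plain gradient descent on f(w) = w**2 and return the final w."""
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

small = descend(lr=0.1)  # each step shrinks w by a factor of 0.8: converges
big = descend(lr=1.1)    # each step multiplies w by -1.2: overshoots, diverges

print(abs(small))  # tiny, close to the minimum
print(abs(big))    # huge, the model "fails to converge"
```

With lr=0.1 the iterate decays toward 0; with lr=1.1 every step jumps past the minimum to the other side and farther away, which is exactly the overshooting described above.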
Now, batch size, from what I understand, is a way to run the DL computation economically. DL computation is expensive: you need GPUs, etc. Instead of updating the parameters (weights and biases) every time a single data point goes through the network, you kind of 'cheat' by updating the parameters in batches. If you choose batch size = 128 in an image classification model, you run 128 images through the network but you don't update the parameters after every single image, only after each batch of 128. Technically, the per-image gradients are averaged. This costs memory, because the computer needs to hold the computations (the forward passes) for all 128 images before it can compute the averaged parameter update in one go. So batch size also affects time + cost, not just memory: if you increase the batch size you save time + cost, but you need more memory, and your model might not turn out as good (because your batch size is too big!).
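Here's a hypothetical sketch of that "update once per batch" idea, on a one-parameter linear model y = w * x with squared-error loss. The names (`batched`, `sgd_step`) and the learning rate are mine, not from any particular library.

```python
# Mini-batch gradient descent sketch: accumulate per-example gradients
# over a batch, then apply ONE averaged parameter update.

def batched(data, batch_size):
    """Yield successive batches of the data set."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

def sgd_step(w, batch, lr=0.01):
    """Forward pass on every example in the batch, then one averaged update."""
    grad_sum = 0.0
    for x, y in batch:                    # per-example results are held until
        pred = w * x                      # the whole batch is done; this is
        grad_sum += 2 * (pred - y) * x    # the memory cost of large batches
    w -= lr * grad_sum / len(batch)       # average gradient, single update
    return w

# Synthetic data from the true relationship y = 3 * x.
data = [(x, 3.0 * x) for x in range(1, 9)]

w = 0.0
for _ in range(200):                               # epochs
    for batch in batched(data, batch_size=4):      # 2 updates per epoch, not 8
        w = sgd_step(w, batch)
print(round(w, 3))  # close to 3.0
```

With batch_size=4 the loop makes 2 parameter updates per pass over the 8 examples instead of 8, which is the fewer-updates-per-epoch saving described above.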
To summarize, I don't think there is any direct relationship between the two. But I do think both hyperparameters affect the cost + time of the DL model.
This is, to me, very intuitive and experiential. You kind of have to play with these hyperparameters to get a feeling for how low or how high to set them. That's part of the job of a DL engineer: play with the hyperparameters and build intuition about them.