I don’t think it’s as simple as that.
Keep in mind that in stochastic gradient descent (SGD), the gradient of the loss function is computed over the current batch only, not the whole dataset. If you have a very small batch size, that gradient estimate will be all over the place and pretty noisy, because in the extreme case you are learning image by image.
With a large batch size, you get more “accurate” gradients because you are now optimizing the loss over a larger set of images simultaneously. So while you are right that you get more frequent updates with a smaller batch size, those updates aren’t necessarily better. The trade-off is then: many “bad” updates versus few “good” updates.
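To make that concrete, here is a minimal NumPy sketch (mine, not from any framework; the synthetic least-squares data and sample counts are just assumptions for illustration) that measures how far mini-batch gradients scatter from the full-batch gradient at a few batch sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))                     # synthetic inputs standing in for images
y = X @ rng.normal(size=20) + 0.1 * rng.normal(size=10_000)
w = np.zeros(20)                                      # current parameters

def grad(idx):
    """Gradient of the mean squared error over the rows in idx."""
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)

full = grad(np.arange(len(X)))                        # the "true" full-batch gradient

for batch_size in (1, 32, 1024):
    # Average distance of mini-batch gradient estimates from the full-batch gradient.
    errs = [np.linalg.norm(grad(rng.choice(len(X), batch_size, replace=False)) - full)
            for _ in range(200)]
    print(f"batch_size={batch_size:5d}  mean gradient error={np.mean(errs):.3f}")
```

You should see the error shrink as the batch size grows: the small-batch updates are frequent but noisy, the large-batch ones are rare but close to the “true” direction.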
At the extreme end you’d have batch size = training-set size (probably not feasible for image problems, but certainly possible with a small structured data set). In that case you get one update per epoch, and that update is supposed to be “globally” good, i.e., good for the entire dataset. In a way, it’s not SGD any more, but rather just GD without the “stochastic” part.
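A rough loop sketch of that limit (again my own toy example, with made-up data and hyperparameters): the number of updates per epoch is len(X) / batch_size, so setting the batch size to the dataset size gives exactly one deterministic update per epoch, i.e. plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))                         # small structured data set
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.05 * rng.normal(size=500)

def train(batch_size, epochs=50, lr=0.05):
    w = np.zeros(5)
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):    # len(X) / batch_size updates per epoch
            idx = order[start:start + batch_size]
            g = 2.0 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            w -= lr * g
    return w

w_sgd = train(batch_size=32)        # many noisy updates per epoch
w_gd  = train(batch_size=len(X))    # one "global" update per epoch -- ordinary GD
print(np.round(w_sgd, 2), np.round(w_gd, 2))
```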