I think the main disadvantage is that the parameters are only updated after each batch.
So say you have 1k images and a batch size of 100.
The parameters will be updated 10 times over the course of 1 epoch.
If you were to set it to something like 500, then the parameters would only get updated twice per epoch, so it would take longer to reach good values.
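To make that arithmetic concrete, here's a rough numpy sketch (toy data and made-up numbers, not any particular framework) that counts how many parameter updates you get in one epoch for different batch sizes:

```python
import numpy as np

# Toy setup (hypothetical numbers): 1,000 samples, a linear model, mean-squared-error loss.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))        # think of these as 1,000 "images" flattened to 32 features
y = X @ rng.normal(size=32)

def updates_in_one_epoch(batch_size, lr=0.01):
    w = np.zeros(X.shape[1])
    updates = 0
    for start in range(0, len(X), batch_size):
        xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        grad = 2 * xb.T @ (xb @ w - yb) / len(xb)   # gradient of the MSE over this batch only
        w -= lr * grad                              # the parameters change once per batch
        updates += 1
    return updates

print(updates_in_one_epoch(batch_size=100))   # 10 updates in one epoch
print(updates_in_one_epoch(batch_size=500))   # 2 updates in one epoch
```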
I don’t think it’s as simple as that.
Realize that in stochastic gradient descent (SGD), the gradient of the loss function is computed over the entire batch. If you have a very small batch size, your gradient will be noisy and jump all over the place, because each update is based on only a handful of images (in the extreme, learning happens image by image).
With a large batch size you get more “accurate” gradients, because now you are optimizing the loss simultaneously over a larger set of images. So while you are right that you get more frequent updates with a smaller batch size, those updates aren’t necessarily better. The trade-off is then: many “bad” updates versus few “good” updates.
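You can see this noise-versus-batch-size effect with a quick simulation (again a toy least-squares problem in numpy, purely illustrative): estimate the gradient at a fixed parameter point from many random batches and look at how much the estimate fluctuates.

```python
import numpy as np

# Toy least-squares problem (assumed, for illustration only): how much does the
# batch gradient jump around from one random batch to the next, per batch size?
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))
y = X @ rng.normal(size=32) + rng.normal(size=1000)   # label noise so per-image gradients disagree
w = np.zeros(32)                                       # fixed point at which we measure the gradient

def batch_gradient(idx):
    xb, yb = X[idx], y[idx]
    return 2 * xb.T @ (xb @ w - yb) / len(idx)          # MSE gradient averaged over the batch

for batch_size in (1, 10, 100, 1000):
    grads = np.array([batch_gradient(rng.choice(len(X), size=batch_size, replace=False))
                      for _ in range(200)])
    # Spread of the gradient estimate across 200 random batches: smaller batch, noisier gradient
    print(batch_size, np.linalg.norm(grads.std(axis=0)))
```

The printed spread shrinks as the batch size grows, and at batch size 1000 (the whole toy dataset) it is essentially zero, which leads to the extreme case below.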
At the extreme end you’d have batch size = training-set size (probably not feasible for image problems, but certainly possible with a small structured data set). In that case you get one update per epoch, and that update is supposed to be “globally” good, i.e., good for the entire dataset. In a way it’s not SGD any more, but just GD without the “stochastic” part.
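For completeness, here's what that extreme looks like (toy numpy setup, assumed numbers): with the batch equal to the whole dataset, every epoch is exactly one deterministic update.

```python
import numpy as np

# Minimal sketch (toy data, assumed): batch size == dataset size is just plain
# full-batch gradient descent, i.e. one deterministic update per epoch, no randomness.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = X @ rng.normal(size=8)

w = np.zeros(8)
for epoch in range(50):
    grad = 2 * X.T @ (X @ w - y) / len(X)   # gradient over the entire dataset
    w -= 0.1 * grad                          # exactly one parameter update per epoch
print(np.mean((X @ w - y) ** 2))             # loss after 50 full-batch updates
```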