I’m a little confused about the batch size. Does a batch size of 64 mean that we divide the whole training set into 64 pieces (batches), feed one batch at a time into the neural net, calculate the gradient, and then update the weights accordingly? Or does it mean that one batch consists of 64 training examples?
Under the first interpretation, wouldn’t increasing the batch size to, let’s say, 128 actually *halve* the size of each batch? The individual batches would be half as big as before, because the whole training set is now divided into twice as many pieces.
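To make the question concrete, here is a minimal NumPy sketch of how I currently picture the *second* interpretation (batch size = examples per batch). The data shapes and variable names are just toy assumptions of mine, not from any particular framework:

```python
import numpy as np

# Toy training set: 1,024 examples with 10 features each (made-up numbers).
X = np.random.randn(1024, 10)
y = np.random.randn(1024)

batch_size = 64  # under this interpretation: 64 examples per batch

# One epoch: walk through the training set in chunks of batch_size.
# Each chunk would trigger one gradient computation and one weight update.
num_batches = 0
for start in range(0, len(X), batch_size):
    X_batch = X[start:start + batch_size]
    y_batch = y[start:start + batch_size]
    # ... forward pass, gradient on this batch, weight update ...
    num_batches += 1

print(num_batches)  # 1024 / 64 = 16 batches per epoch
```

So with this reading, raising `batch_size` from 64 to 128 would make each batch *bigger* and the number of batches per epoch *smaller*, which is the opposite of the first interpretation. Is that the right way to think about it?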
Second question: how does the batch size affect the neural net’s ability to converge?