I’m a little bit confused about the batch size.

Does a batch size of `64` mean that we divide the whole training data into `64` pieces (batches) and feed one batch at a time into the neural net, calculate the gradient, and then update the weights accordingly? Or does it mean that one batch consists of `64` training examples?
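To make the two interpretations concrete, here is a rough sketch of how I picture them (plain NumPy with made-up toy data; the array names and shapes are just for illustration):

```python
import numpy as np

# Made-up toy data: 1024 training examples, 10 features each.
X = np.random.rand(1024, 10)
batch_size = 64

# Interpretation A: each batch contains 64 examples,
# so 1024 examples give 1024 / 64 = 16 batches.
num_batches = len(X) // batch_size
for i in range(num_batches):
    batch = X[i * batch_size : (i + 1) * batch_size]
    # ...forward pass, gradient computation, weight update would go here...
    assert batch.shape == (64, 10)

# Interpretation B: the whole set is divided into 64 pieces,
# so each piece contains 1024 / 64 = 16 examples.
num_pieces = 64
piece_size = len(X) // num_pieces
```

Which of these two is what "batch size of 64" refers to?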

Isn’t increasing the batch size to, let's say, `128` actually halving the size of each batch, because now the individual batches are just half as big as before (you now divide the whole training set into twice as many batches)?

Second question: How does the batch size affect the neural net's ability to converge?