Batch size calculator?

I wonder, is it possible to compute the batch size in advance, taking into account image resolution, the number of GPUs, model size, etc.? I mean, I guess it is possible as soon as all the parameters are known in advance. The only question is how difficult it is to get a generic solution.

I know there is an interesting context manager that helps to manage GPUs while working in Jupyter. However, as far as I understand, it helps you work with the GPU without kernel restarts. But I guess it should be possible to pre-compute an optimal batch size depending on the hardware and data setup, even if the parameters have to be entered manually.

Because so far, every once in a while, I have issues with picking the right value and getting CUDA out-of-memory errors when, for example, unfreezing the model and continuing training: there are more parameters to train, so it fails. Probably not a big deal if you're working in a notebook, but it becomes a problem when writing end-to-end training scripts :smile:
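For the scripted case, the rough workaround I have in mind is just catching the OOM and retrying with a smaller batch. A minimal sketch (the `try_fit` helper and `learn_fn` callable are hypothetical names, not from any library):

```python
import torch

def try_fit(learn_fn, batch_size, min_batch_size=2):
    """Call a training function, halving the batch size on CUDA OOM."""
    bs = batch_size
    while bs >= min_batch_size:
        try:
            return learn_fn(bs)
        except RuntimeError as e:
            if "out of memory" not in str(e):
                raise
            # free cached blocks before retrying with a smaller batch
            torch.cuda.empty_cache()
            bs //= 2
            print(f"CUDA OOM, retrying with batch size {bs}")
    raise RuntimeError("Could not fit even the minimum batch size")
```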


I think there were already talks about achieving just that. I think it has to be done through experimentation: since PyTorch is dynamic, you can't easily calculate the value. Moreover, when you change the batch size you might want to change the learning rate as well, so you want to keep control over the process. But indeed, having a tool to find the optimal (memory-wise) batch size would be quite interesting.
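A minimal sketch of that experimental approach, assuming you know the model and the per-sample input shape up front (function name and the doubling schedule are just illustrative), could probe growing batch sizes with a dummy forward/backward pass until it hits an OOM:

```python
import torch

def find_max_batch_size(model, input_shape, max_power=10, device="cuda"):
    """Probe increasing batch sizes with a dummy forward/backward pass
    until CUDA runs out of memory; return the last size that worked."""
    model = model.to(device)
    best = 0
    for p in range(max_power + 1):
        bs = 2 ** p
        try:
            x = torch.randn(bs, *input_shape, device=device)
            model(x).sum().backward()  # allocate activations and gradients
            best = bs
        except RuntimeError as e:
            if "out of memory" not in str(e):
                raise
            break
        finally:
            model.zero_grad(set_to_none=True)
            torch.cuda.empty_cache()
    return best

# e.g. find_max_batch_size(torchvision.models.resnet34(), (3, 224, 224))
```

A binary search between the last working and first failing size would refine the answer, but powers of two are usually what you want anyway.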

Yeah, agreed, it is always a good idea to carry out interactive experimentation.

Talking about batch size computation with a dynamic computational graph, I guess we need to know how much space a single training sample occupies, and then gather all trainable tensors from the model, right? I mean, we can at least get an upper bound in this case, at least for “simple” models. Models with a lot of branches will probably be overestimated.
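A rough sketch of that kind of estimate, assuming fp32 weights, Adam-style optimizer state (two extra buffers per parameter), and ignoring fragmentation and cuDNN workspace overhead (the function name and the `safety` margin are just illustrative):

```python
import torch

def estimate_batch_size(model, input_shape, device="cuda", safety=0.9):
    """Estimate an upper bound on batch size from parameter memory plus a
    measured per-sample activation cost (likely off for branchy models)."""
    model = model.to(device)
    bytes_per_param = 4  # fp32
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    param_bytes = n_params * bytes_per_param
    # weights + grads + two Adam moment buffers (assumption)
    fixed = param_bytes * 4

    # measure peak memory for a single-sample forward/backward pass
    torch.cuda.reset_peak_memory_stats(device)
    x = torch.randn(1, *input_shape, device=device)
    model(x).sum().backward()
    # peak includes weights + grads, so subtract those to isolate activations
    per_sample = max(torch.cuda.max_memory_allocated(device) - 2 * param_bytes, 1)
    model.zero_grad(set_to_none=True)
    torch.cuda.empty_cache()

    free, _total = torch.cuda.mem_get_info(device)
    return int(safety * (free - fixed) / per_sample)
```

Measuring per-sample memory empirically like this sidesteps tracing the graph by hand, but it still assumes activation memory scales roughly linearly with batch size, which is only approximately true.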