Simulating a smaller batch size without losing the speedup of a larger batch size

For my current project, I’ve noticed that smaller batch sizes significantly outperform larger ones. I think the reason might be the small dataset. However, smaller batch sizes take longer to train per epoch. Is there a way to effectively use a smaller batch size while still loading data and doing passes through the model with a larger batch size?

Hi keeplearning. A few observations:

  1. It’s always risky, in science, in the forums, and in reality, to claim that something is not possible. Nonetheless…

  2. What you are seeing is that a series of smaller steps, each following the local gradient more closely, finds a steeper descent path through the loss landscape than one large step. By implication, your loss landscape has a fast-changing gradient. That is not always the case, but it appears to apply to your project.

  3. To find the local gradient, you must compute a complete forward pass over the mini-batch. As far as I can see, there is no way to break up this unitary process except to use several smaller mini-batches, so the answer seems to be no. (The reverse is possible: you can accumulate gradients over several mini-batches and take a single step, effectively training with a larger batch; see the sketch after this list.)

  4. If you can do, for example, 1.5 large-batch epochs in the time it takes for one small-batch epoch, the larger batch may still win in terms of wall-clock time. Wall-clock time is what matters to most people, not performance per epoch.

  5. You might experiment with different optimizers. Some are more responsive to a fast-changing gradient, like the one you probably have. Searching these forums will turn up some impressive cutting-edge optimizer research.
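
For what it’s worth, here is a minimal sketch of the reverse direction mentioned in point 3 (gradient accumulation): small mini-batches are loaded and forwarded one at a time, but the optimizer only steps once every few of them, approximating a larger effective batch. The names `model`, `loader`, `optimizer`, `loss_fn`, and `accum_steps` are placeholders, not anything from your project:

```python
import torch

# Gradient-accumulation sketch: the DataLoader yields small mini-batches,
# but we only call optimizer.step() every `accum_steps` batches, which
# approximates training with a batch `accum_steps` times larger.
accum_steps = 4  # effective batch size = loader batch size * accum_steps

model.train()
optimizer.zero_grad()

for i, (inputs, targets) in enumerate(loader):
    outputs = model(inputs)
    # Scale the loss so the accumulated gradient matches the mean over
    # the larger effective batch.
    loss = loss_fn(outputs, targets) / accum_steps
    loss.backward()          # gradients accumulate in .grad across iterations

    if (i + 1) % accum_steps == 0:
        optimizer.step()     # one update per `accum_steps` mini-batches
        optimizer.zero_grad()
```

Note that this goes in the opposite direction from what you asked for: it builds a larger effective batch out of smaller forward passes, rather than splitting a large forward pass into smaller steps.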

HTH, Malcolm
