Has anybody looked into ways of automating batch size selection? It feels low-tech and antiquated to wait for a CUDA out-of-memory failure and then decrease the batch size by trial and error until training goes through… surely there must be a better way?
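For context, the manual process I'm describing looks roughly like this (just a sketch of the trial-and-error loop, not real training code — `fake_step` is a stand-in for an actual training step that raises on OOM; in PyTorch a CUDA OOM surfaces as a `RuntimeError` whose message contains "out of memory"):

```python
def find_batch_size(train_step, start=1024, min_size=1):
    """Halve the batch size on OOM until train_step succeeds; return the working size."""
    bs = start
    while bs >= min_size:
        try:
            train_step(bs)  # attempt one step at this batch size
            return bs
        except RuntimeError as e:
            # Only swallow OOM errors; re-raise anything else.
            if "out of memory" not in str(e):
                raise
            bs //= 2  # back off and retry
    raise RuntimeError("no batch size fits in memory")

# Hypothetical stand-in: pretend anything above 300 samples runs out of memory.
def fake_step(bs):
    if bs > 300:
        raise RuntimeError("CUDA out of memory")

print(find_batch_size(fake_step))  # prints 256 (1024 -> 512 -> 256)
```

Automating exactly this loop is what I'd like to avoid hand-rolling every time.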