Out of memory and restarting kernel during inference

In this kernel, memory usage during training looks fine as far as I can see, but during inference I get either a "DataLoader worker is killed" error or "CUDA out of memory".

I tried decreasing the batch size and the number of workers, but neither helped.
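
For reference, here is a stripped-down sketch of my inference setup. The dataset and model below are just placeholders so the snippet runs on its own, not my actual kernel code, but the DataLoader settings are the kind of values I tried:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data and model; my real kernel uses a larger dataset and network.
test_ds = TensorDataset(torch.randn(1024, 3, 224, 224))
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 224 * 224, 10),
).cuda()

# These are the two knobs I lowered: batch_size and num_workers.
test_loader = DataLoader(test_ds, batch_size=8, num_workers=0, shuffle=False)

model.eval()
preds = []
for (xb,) in test_loader:
    out = model(xb.cuda())   # forward pass on GPU
    preds.append(out.cpu())  # move results back to CPU
preds = torch.cat(preds)
```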

Any advice?