In this kernel, as far as I can see, memory usage during training is fine, but during inference I get "DataLoader worker killed" or "CUDA out of memory" errors.
I tried decreasing the batch size and the number of workers, but neither helped.
Any advice?
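
For context, here is a minimal sketch of the kind of inference setup involved, with `batch_size` and `num_workers` being the two parameters I already reduced. The dataset, model, and exact values shown are placeholders, not the actual kernel code.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset and model just to make the sketch self-contained;
# the real kernel's dataset and model would go here instead.
test_dataset = TensorDataset(torch.randn(64, 10))
model = nn.Linear(10, 2).cuda()

# The two knobs already lowered: batch_size and num_workers.
test_loader = DataLoader(
    test_dataset,
    batch_size=8,       # reduced from the training value
    num_workers=2,      # reduced, but the worker still gets killed
    shuffle=False,
    pin_memory=True,
)

model.eval()
predictions = []
with torch.no_grad():   # no gradients at inference, so activations are freed per batch
    for (batch,) in test_loader:
        out = model(batch.cuda())
        predictions.append(out.cpu())  # move outputs off the GPU immediately

preds = torch.cat(predictions)
```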