When I run my training script on Windows I get a CUDA out of memory error, but the exact same script runs fine on Ubuntu. What's stranger is that the first epoch (training + validation) completes fine, and the OOM only occurs a few iterations into the second epoch. I noticed the problem was over 1 GB of cached memory building up when running on Windows. I was able to work around it by adding a callback that calls `torch.cuda.empty_cache()` in `on_batch_end()`. However, as you can imagine, this greatly slows down training. I tried calling `torch.cuda.empty_cache()` only in `on_epoch_begin()` or `on_epoch_end()`, but neither worked. Any idea what is going on? I installed PyTorch and fastai with Anaconda on Windows.
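For reference, the workaround looks roughly like this. This is a minimal sketch, not fastai's actual `Callback` class: the class name and keyword-argument signature are my assumptions modeled on fastai's callback API, and the guards make it a no-op on machines without PyTorch or CUDA.

```python
class ClearCacheCallback:
    """Sketch of a callback the trainer would invoke after every batch."""

    def on_batch_end(self, **kwargs):
        try:
            import torch
        except ImportError:
            return  # PyTorch not installed; nothing to clear
        if torch.cuda.is_available():
            # Returns cached-but-unused blocks to the driver. This is what
            # makes the "cached" GB disappear, at the cost of re-allocating
            # those blocks on the next batch (hence the slowdown).
            torch.cuda.empty_cache()
```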
fastai currently only supports Linux, as noted in the install docs here, so that's probably why you're seeing weird behavior. This thread over at PyTorch suggests that the extra cached memory is not wasted space: PyTorch is actually using it, and will call `empty_cache()` on its own if needed.
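To see whether that cached memory is really in use, you can compare what PyTorch has allocated to live tensors against what its caching allocator is merely holding for reuse. A hedged sketch (the helper name is mine; it assumes a CUDA build of PyTorch and returns `None` on CPU-only machines):

```python
def report_gpu_memory():
    """Return (allocated_bytes, reserved_bytes), or None if CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    allocated = torch.cuda.memory_allocated()  # bytes held by live tensors
    reserved = torch.cuda.memory_reserved()    # bytes held by the caching allocator
    return allocated, reserved
```

The gap between `reserved` and `allocated` is the "cached" memory that nvidia-smi still counts as used; `empty_cache()` shrinks `reserved` back toward `allocated`.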
Given that you are seeing this on Windows, it's possible (likely?) that Windows is using your GPU for other things, like OS rendering and maybe computation. I'm not familiar with working on Windows - can you run `nvidia-smi` and see what the results are? Either way, you should be able to get it to work by lowering the batch size.
I find you need a smaller batch size on Windows compared to Linux. My Windows machine has a GPU with 11 GB of memory, and I tend to get CUDA memory errors around 9 GB. Not sure if this is a fastai thing, a PyTorch thing, or a CUDA thing.