CUDA Illegal Memory Access

In the new FastAI update I encounter the ‘CUDA Error: illegal memory access encountered’ error every time I first call learner.predict with a forward LSTM and then learner.predict with a backward LSTM. I have tried everything and found it can be fixed by reloading the torch.cuda module after the first learner.predict call, which boils down to:

import torch
from importlib import reload

# first prediction with the forward-LSTM learner
learn_fwd.predict()

# reloading torch.cuda between the two calls avoids the illegal memory access
reload(torch.cuda)

# second prediction with the backward-LSTM learner
learn_bwd.predict()

I thought I should leave this somewhere, as it cost me a lot of time to figure out where the error came from and to try out many workarounds.

I hope this helps someone. If I have posted this in the wrong category, please let me know.


I run into this error when running the IMDB training in the first chapter of the fastbook, even if I decrease the batch size to bs=16. Training the image classification and segmentation models went fine, though.
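For context, the code I'm running is essentially the chapter 1 IMDB example with the batch size lowered via the bs argument (a sketch from memory, not my exact notebook):

from fastai.text.all import *

# fastbook chapter 1 IMDB sentiment example, with a reduced batch size
dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test', bs=16)
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(4, 1e-2)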

I ran into similar problems, and it seemed there was something wrong in the reset method of the AWD_LSTM, where the “raw” layer weights are copied back into the model. However, just reading the code, everything seems to make perfect sense. Still, I suspect there is some PyTorch/CUDA caching behaviour at play that leads to this problem.
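For anyone unfamiliar with the pattern in question, the idea behind the “raw” weights is roughly the following weight-dropout scheme. This is a minimal plain-PyTorch sketch with illustrative names, not fastai's actual implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightDroppedLSTM(nn.Module):
    # Sketch of the "raw weight" pattern: the hidden-to-hidden weight is
    # kept as a separate *_raw parameter, and a (dropped-out) copy is
    # written back onto the wrapped LSTM before each forward pass.
    def __init__(self, input_size, hidden_size, weight_p=0.5):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.weight_p = weight_p
        w = self.lstm.weight_hh_l0
        delattr(self.lstm, 'weight_hh_l0')            # remove the real parameter
        self.weight_hh_l0_raw = nn.Parameter(w.data)  # keep the raw copy

    def _set_weights(self):
        # copy the raw weights back into the wrapped module, with dropout applied
        w = F.dropout(self.weight_hh_l0_raw, p=self.weight_p, training=self.training)
        setattr(self.lstm, 'weight_hh_l0', w)

    def forward(self, x, hx=None):
        self._set_weights()
        return self.lstm(x, hx)

If one of these copies ends up on the wrong device or pointing at stale memory, an illegal memory access on the CUDA side would be a plausible symptom.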

In other threads about that particular CUDA error, I have found that it may happen if PyTorch makes wrong deductions about which device a tensor is on. For me, simply calling learn.model.to("cuda") seems to be a workaround as well. Of course, that doesn’t make any more sense than reloading torch.cuda via importlib, but so far I haven’t managed to truly understand the issue :frowning:
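In case it helps anyone debug, a quick way to check which device the model actually lives on before predicting is something like this (plain PyTorch, with learn standing in for your fastai Learner):

import torch

# check which device the model parameters are on
print(next(learn.model.parameters()).device)

# the workaround: explicitly move the whole model to the GPU
learn.model.to("cuda")

# verify that CUDA is available and which device is current
print(torch.cuda.is_available(), torch.cuda.current_device())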