I encountered the error you may see below a couple of times on two different systems.
Both times, that error produced the following consequences:
Occupied cuda memory was not released following a kernel restart
The gpu remained with some load on it, with no other process using it (it’s not even, in both systems, connected to a monitor).
The notebook server refused to exit on Ctrl-C
Once I manually killed the notebook server, the system hanged up completely.
Any idea about what could have caused that error? Note that it almost completed the epoch, and memory occupation rose to 80% but remained stable till the error.
I’ve been using fast.ai v0.7 following this guide, and the v0.7 library seems like using the older version of CUDA and cuDNN? So you just upgrade CUDA and cuDNN then the problem just go away?