I have run into the error shown below a couple of times, on two different systems.
Both times, the error had the following consequences:
Occupied CUDA memory was not released after a kernel restart.
The GPU stayed under some load even though no other process was using it (on neither system is it even connected to a monitor).
The notebook server refused to exit on Ctrl-C
Once I manually killed the notebook server, the system hung completely.
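A minimal sketch of how one could check whether any process is still holding GPU memory after the kernel restart. It assumes `nvidia-smi` is on the PATH and that the driver reports per-process memory; the helper names `parse_compute_apps` and `list_gpu_processes` are hypothetical, made up for this illustration.

```python
import subprocess

# Assumption: nvidia-smi is installed and supports these query fields.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse nvidia-smi CSV rows into (pid, name, used_mib) tuples.

    used_mib is None when the driver does not report a number
    (for example "[N/A]").
    """
    rows = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # pid is the first field and memory the last; the process
        # name in between is kept whole even if it contains commas.
        pid, rest = line.split(",", 1)
        name, used = rest.rsplit(",", 1)
        try:
            used_mib = int(used.strip())
        except ValueError:
            used_mib = None
        rows.append((int(pid.strip()), name.strip(), used_mib))
    return rows


def list_gpu_processes():
    """Run nvidia-smi and return the compute processes it reports."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)
```

Calling `list_gpu_processes()` on the affected machine returns an empty list when no compute process is registered on the GPU. If memory still shows as occupied in that case, the allocation is not attributed to any live process that the driver can see.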
Any idea what could have caused that error? Note that the epoch had almost completed, and memory usage rose to 80% but remained stable until the error.
Something similar was already asked in Cudnn_status_execution_failed, but no one answered, so let me try again.