
OK, so here is an update on this story.

It’ll take the ipython developers some time to sort this out. My patch can’t be applied as is because it would break the %debug magic, so they will have to make the behavior configurable. Let’s see how and when it gets resolved. In particular, we need a simple magic to reset the traceback that %tb reports.

Meanwhile, fastai (git master) now has the following features, which give you a solution to this problem today:

  1. Under a non-ipython environment, it doesn’t do anything special.
  2. Under ipython, by default it strips the traceback (tb) only for the “CUDA out of memory” exception. So the %debug magic works in all other cases, but in those cases memory leaks until the tb is reset.
  3. The env var FASTAI_TB_CLEAR_FRAMES changes this behavior when running under ipython, depending on its value:
      • “0”: never strip the tb (%debug magic always works, but memory leaks)
      • “1”: always strip the tb (no need to worry about leaks, but %debug won’t work)

Here “ipython” means any of ipython, ipython-notebook, or jupyter-notebook.
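The policy above boils down to a small decision function. This is a hypothetical sketch for illustration, not fastai’s actual code: the name should_clear_frames, the in_ipython flag, and treating any value other than “0” or “1” like an unset variable are all assumptions.

```python
import os
from typing import Mapping, Optional

OOM_MARKER = "CUDA out of memory"


def should_clear_frames(exc: BaseException, in_ipython: bool,
                        env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: decide whether to strip the traceback frames of `exc`."""
    if not in_ipython:
        return False  # plain Python: nothing special happens
    env = os.environ if env is None else env
    setting = env.get("FASTAI_TB_CLEAR_FRAMES")
    if setting == "0":
        return False  # never strip: %debug always works, memory may leak
    if setting == "1":
        return True   # always strip: no leaks, but %debug has nothing to inspect
    # Default (unset, or any other value, which is an assumption): strip only on CUDA OOM
    return OOM_MARKER in str(exc)
```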

At the moment we are only doing this for the fit() family of functions. If you find other fastai APIs that need this, please let us know.
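One way to apply such a policy to fit()-like functions is a decorator that catches the exception, clears the local variables held by the traceback’s frames with the standard library’s traceback.clear_frames(), and re-raises. This is a minimal sketch under that assumption. clear_tb_on_exception and its should_clear parameter are hypothetical names, not part of fastai’s API.

```python
import functools
import traceback


def clear_tb_on_exception(should_clear=lambda exc: True):
    """Hypothetical decorator: on an exception, release the locals held by the
    traceback's frames (if should_clear(exc) is true), then re-raise."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except BaseException as exc:
                if should_clear(exc):
                    # Frames that have already exited (func and anything it called)
                    # get their locals cleared. This wrapper's own frame is still
                    # executing, so clearing it raises RuntimeError, which
                    # clear_frames silently ignores.
                    traceback.clear_frames(exc.__traceback__)
                raise  # the exception and its message still propagate unchanged
        return wrapper
    return decorator
```

For example, decorating a hypothetical training function with @clear_tb_on_exception(lambda exc: "CUDA out of memory" in str(exc)) would mimic the default behavior described above. Only the frames’ locals (e.g. the tensors) are released, which is also why %debug can no longer inspect them afterwards.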

You can set os.environ['FASTAI_TB_CLEAR_FRAMES']="0" (or "1") in your code, or set the variable in the shell when you start jupyter, e.g. FASTAI_TB_CLEAR_FRAMES=1 jupyter notebook.

Let me know if I have missed any special cases, so that we can get this sorted out before we release 1.0.42.

Of course, all the tricks posted in my original message still apply.

I will end this with an easy-to-remember tip: if everything else fails, or perhaps you’re not using fastai and can’t recover from OOM in your notebook, just run a cell with this content:

1/0

and you should be back in the game w/o needing to restart the kernel.

This whole subject matter is now documented here: https://docs.fast.ai/troubleshoot.html#memory-leakage-on-exception

If you encounter any related issues, you can discuss them here: A guide to recovering from CUDA Out of Memory and other exceptions
