@bfarzin, fyi, I made a whole bunch of improvements for https://docs.fast.ai/utils.mem.html#GPUMemTrace (git master required or 1.0.47 when it gets released).
This includes a new decorator https://docs.fast.ai/utils.mem.html#gpu_mem_trace, so now you can sprinkle it above methods and functions and get automatic reporting. Here is some output from a unet learner debug I'm in the middle of:
△Used Peaked MB:      0      0 (UnetBlock.forward: exit)
△Used Peaked MB:      0      0 (UnetBlock.forward: exit)
△Used Peaked MB:      0    154 (UnetBlock.forward: exit)
△Used Peaked MB:    372     64 (UnetBlock.forward: exit)
△Used Peaked MB:    128    282 (FeatureLoss.make_features: exit)
△Used Peaked MB:  1,220      0 (FeatureLoss.make_features: exit)
△Used Peaked MB:  1,508     32 (FeatureLoss.forward: exit)
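For illustration, here is a minimal sketch of how a tracing decorator along these lines can be structured. This is not fastai's actual implementation; `read_used_mb` is a stand-in for the real GPU memory query, and `mem_trace` is an illustrative name:

```python
import functools

def read_used_mb():
    # Stand-in for a real GPU memory query, e.g. something like
    # torch.cuda.memory_allocated() // 2**20; returns MBs used.
    return 0

def mem_trace(func):
    """Report the memory delta across a function call, tagged with the
    function's qualified name -- similar in spirit to gpu_mem_trace."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        before = read_used_mb()
        try:
            return func(*args, **kwargs)
        finally:
            delta = read_used_mb() - before
            print(f"△Used MB: {delta:6,d} ({func.__qualname__}: exit)")
    return wrapper

@mem_trace
def make_features(x):
    return x * 2
```

Calling `make_features(3)` then returns 6 and prints one report line on exit, even if the function raises.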
I also changed the output format to make it easier to have stacks of those.
I know the column on the left looks redundant, but remember that each of these prints is unrelated to the others, and various other output may appear in between.
Here I assumed that a fixed width of 5 digits (6 with the thousands separator) should be enough for now, as I don't know anybody with 100GB+ cards yet.
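As a sketch, that fixed-width, comma-grouped column can be produced with Python's format spec (the labels and column width are taken from the output above; the function name is illustrative):

```python
def fmt_trace(used_delta, peaked_delta, note):
    # Width 6 fits 5-digit MB values plus one thousands separator,
    # e.g. "1,220"; smaller values are right-aligned with spaces.
    return f"△Used Peaked MB: {used_delta:6,d} {peaked_delta:6,d} ({note})"

print(fmt_trace(1220, 0, "FeatureLoss.make_features: exit"))
```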
Other important changes in GPUMemTrace:
- no need to call start(); tracing starts automatically
- the context manager prints its report automatically
- added context and subcontext to the reports, so you can easily tell where a report came from, while only needing to set the main context in the constructor. Example:
m1 = GPUMemTrace(ctx='foo')
m2 = GPUMemTrace(ctx='bar')
m1.report('sample1')
m1.report('sample2')
m2.report('sample1')
m2.report('sample2')
gives:
△Used Peaked MB:      0      0 (foo: sample1)
△Used Peaked MB:      0      0 (foo: sample2)
△Used Peaked MB:      0      0 (bar: sample1)
△Used Peaked MB:      0      0 (bar: sample2)
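The automatic report on context-manager exit can be sketched with a plain Python context manager. Again, this is an illustrative analogue, not the fastai code; `_read_used_mb` stands in for the real measurement:

```python
class MemTrace:
    """Print a report automatically when the `with` block exits,
    tagged with the ctx passed to the constructor."""
    def __init__(self, ctx=None):
        self.ctx = ctx
        self.used_delta = 0

    def _read_used_mb(self):
        # Stand-in for the real GPU memory query (returns MBs used).
        return 0

    def __enter__(self):
        self.start_used = self._read_used_mb()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.used_delta = self._read_used_mb() - self.start_used
        tag = f" ({self.ctx}: exit)" if self.ctx else " (exit)"
        print(f"△Used MB: {self.used_delta:6,d}{tag}")
        return False  # don't swallow exceptions

with MemTrace(ctx='foo'):
    pass  # report prints automatically here
```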
Have a look at the doc, lots of examples there.
As you use it, please let me know if anything could be improved. The idea is to type as little as possible and get intelligible output that quickly helps find leaks and inefficient code.
For example, with the decorator it should be possible to turn debug traces on and off without touching the code (once the decorators are in place). It still needs some tweaking; it's a work in progress, so please start using it and send feedback if you run into problems. Thank you.
p.s. see the note in the doc about peak measurements being unreliable, since we have no control over the thread that performs the measurement. We need pytorch support for this to always give correct numbers - please vote for this feature request.