IPyExperiments: Getting the most out of your GPU RAM in jupyter notebook


(Stas Bekman) #21

Thank you for your feedback, @piotr.czapla.

I’d like to make the amount of output customizable, since some people would probably prefer it to be more terse. It should probably never be completely silent, since otherwise it’d be hard to tell whether it ran at all. So by default it could be terse, printing the consumed/reclaimed data in a tight one- or two-liner, and anyone who wants the verbose output as it is now could enable it via a constructor argument.

Any other data to collect? time, duration?


(Piotr Czapla) #22

I would print the names of the variables that you remove from the global scope, to minimise surprises. Actually, we could replace the global variables with a proxy object that, when printed, would tell the user what just happened to their variable, or raise an error when a member is accessed on it. What do you think?
I’m not sure regarding time, as this might depend on the user.
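
To make the proxy idea concrete, here is a minimal sketch. The class name DeletedVariableProxy and its messages are hypothetical, made up for illustration; nothing like this exists in ipyexperiments as far as this thread shows.

```python
class DeletedVariableProxy:
    """Hypothetical placeholder left behind for a variable that was removed."""

    def __init__(self, name):
        self._name = name

    def __repr__(self):
        # Printing the placeholder explains what happened to the variable.
        return f"<variable '{self._name}' was deleted when the experiment finished>"

    def __getattr__(self, attr):
        # __getattr__ runs only when normal lookup fails, so any member the
        # original object had (fit, shape, ...) ends up here.
        name = object.__getattribute__(self, "_name")
        raise AttributeError(
            f"'{name}' was deleted when the experiment finished; "
            f"cannot access '{attr}'"
        )


learn = DeletedVariableProxy("learn")
description = repr(learn)
try:
    learn.fit
    access_error = None
except AttributeError as err:
    access_error = str(err)
```

The trade-off is that the name stays bound, so code that only passes the variable around, without touching its members, will not fail at the point of use.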


(Stas Bekman) #23

Good idea.

Actually, we could replace the global variables with a proxy object that, when printed, would tell the user what just happened to their variable, or raise an error when a member is accessed on it. What do you think?

I think that if you access a variable and get an error saying it doesn’t exist, that’s the best telltale sign, no? Replacing it with something else would be more confusing: if users forget that they wanted the variables annihilated and try to use them, rather than print them, the error would be even more confusing or misleading.

Basically, it’d behave exactly as if you were to jump into the middle of a notebook and run a cell that uses variables that were supposed to be initialized in earlier cells, so you would get the same error here, which is very consistent. Most seasoned Jupyter notebook users will instantly ask themselves: did I run the cells above?
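
A small sketch of that consistency, emulating notebook cells with exec on a shared namespace dict. The run_cell helper is hypothetical and exists only for this illustration.

```python
def run_cell(source, namespace):
    """Run one 'cell' and return the NameError message, or None on success."""
    try:
        exec(source, namespace)
        return None
    except NameError as err:
        return str(err)


ns = {}

# Case 1: the cell that defines `learn` was never run.
skipped_error = run_cell("learn.fit", ns)

# Case 2: `learn` was defined and then deleted, as at the end of an experiment.
run_cell("learn = object()", ns)
run_cell("del learn", ns)
deleted_error = run_cell("learn.fit", ns)
```

Both cases produce the same NameError message, so a deleted variable looks exactly like one whose defining cell was skipped.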


(Stas Bekman) #24

OK, besides refactoring, I made several changes:

  1. it now prints which variables got deleted
  2. added .get_stats() and .finish() methods, so that the user can get the numbers programmatically for even better experimentation
  3. it now reports the elapsed wallclock time

See the 3rd experiment in the demo notebook here to see the new methods in action.

The API is still wide-open so any suggestions for improvement are welcome.
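
As a rough illustration of how these three pieces could fit together, here is a toy sketch. It is not the ipyexperiments implementation: the MiniExperiment class, its constructor argument and the keys of the returned dict are all assumptions, and it tracks only variable names and time, not memory.

```python
import time


class MiniExperiment:
    """Toy stand-in: tracks new variables in a namespace and the elapsed time."""

    def __init__(self, namespace):
        self.namespace = namespace
        self.start_names = set(namespace)      # variables that existed before
        self.start_time = time.perf_counter()
        self.elapsed = None
        self.deleted = []

    def get_stats(self):
        # While running, report the time so far; after finish(), the final time.
        elapsed = self.elapsed
        if elapsed is None:
            elapsed = time.perf_counter() - self.start_time
        return {"elapsed_secs": elapsed, "deleted_vars": list(self.deleted)}

    def finish(self):
        # Delete every variable created since the experiment started.
        new_names = sorted(set(self.namespace) - self.start_names)
        for name in new_names:
            del self.namespace[name]
        self.deleted = new_names
        self.elapsed = time.perf_counter() - self.start_time
        print("Deleted variables:", ", ".join(new_names) or "none")
        print(f"Elapsed wallclock time: {self.elapsed:.3f}s")
        return self.get_stats()


ns = {"kept": 1}
exp = MiniExperiment(ns)
ns["x"] = [0] * 1000
ns["y"] = "tmp"
stats = exp.finish()
```

In a notebook the namespace would be the interactive one rather than a plain dict; the dict keeps the sketch runnable anywhere.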


(Stas Bekman) #25

Added:

  1. a way to prevent some local variables from deletion
  2. context manager support

See experiments 3 and 4 in the demo notebook here to see the new features in action.
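
The general pattern behind both additions can be sketched with the standard library alone. The experiment function and its keep parameter below are hypothetical names for illustration, not the package's actual API.

```python
from contextlib import contextmanager


@contextmanager
def experiment(namespace, keep=()):
    """Delete variables created inside the block, except those named in `keep`."""
    before = set(namespace)
    try:
        yield
    finally:
        # Runs even if the block raises, so cleanup is never skipped.
        for name in set(namespace) - before - set(keep):
            del namespace[name]


ns = {}
with experiment(ns, keep=["result"]):
    ns["tmp"] = list(range(1000))
    ns["result"] = sum(ns["tmp"])

remaining = dict(ns)
```

After the block, tmp is gone and only result survives.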


(Stas Bekman) #26
  • replaced GPUtil with the much faster nvidia-ml-py3

I have no idea whether it works on Windows, but I see no reason why it shouldn’t, since it accesses the NVML library directly.
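
For reference, a minimal sketch of querying GPU memory through pynvml, the module that nvidia-ml-py3 installs. The gpu_memory_mb helper is a hypothetical name, and it returns None when pynvml or an NVIDIA driver is not available; this is not how ipyexperiments itself is written.

```python
def gpu_memory_mb(index=0):
    """Return total/used/free memory in MB for one GPU, or None if unavailable."""
    try:
        import pynvml  # provided by the nvidia-ml-py3 package
    except ImportError:
        return None
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return None  # no driver or no NVML library on this machine
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        info = pynvml.nvmlDeviceGetMemoryInfo(handle)  # values are in bytes
        return {
            "total": info.total / 2**20,
            "used": info.used / 2**20,
            "free": info.free / 2**20,
        }
    except pynvml.NVMLError:
        return None  # e.g. no GPU with this index
    finally:
        pynvml.nvmlShutdown()


memory = gpu_memory_mb()
```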


(Stas Bekman) #27
  • added a test suite
  • made the package available on pypi and conda
  • some minor fixes

(Piotr Czapla) #28

I found some time today to look at the cyclic references in the learner. Adding weak references to the callbacks fixes the issue, but we still have a cyclic reference in the scipy module that cannot be easily fixed. I’ve updated the test to reflect this: https://github.com/fastai/fastai/pull/1375
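
A simplified sketch of why a weak reference from the callback back to the learner helps. The Learner and Callback classes here are toy stand-ins, not the fastai classes, and the immediate-release behaviour relies on CPython reference counting.

```python
import gc
import weakref


class Callback:
    def __init__(self, learner):
        # A weak proxy gives access to the learner without keeping it alive.
        self.learner = weakref.proxy(learner)


class Learner:
    def __init__(self):
        # The learner owns its callbacks; with a strong back-reference this
        # would form a cycle that only the garbage collector could free.
        self.callbacks = [Callback(self)]


gc.disable()  # prove that no cycle collection is needed
try:
    learner = Learner()
    probe = weakref.ref(learner)
    del learner
    freed_without_gc = probe() is None
finally:
    gc.enable()
```

With a strong reference in Callback, the learner (and any GPU memory it holds) would stay allocated until gc.collect() runs.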


(Stas Bekman) #29

Recent changes:

  • when the GPU backend loads, it reports the ID, name and total RAM of the selected GPU
  • print_state now gives an easier to read report

Some breaking changes in the last release:

  • made the module into proper subclasses, with no more global function aliases. So now use the desired backend directly as the experiments module: IPyExperimentsCPU or IPyExperimentsPytorch. It should be trivial now to add other backends.
  • the get_stats method has been replaced with the data property, which now returns one or more IPyExperimentMemory named tuple(s), depending on the subclass used.

Latest API is here: https://github.com/stas00/ipyexperiments#api
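
To illustrate the shape of the second change, here is a toy sketch of a data property that returns one named tuple for a CPU-only backend and two for a GPU backend. The field names (consumed, reclaimed, available), the Toy classes and the numbers are assumptions for illustration only; check the API link above for the real definitions.

```python
from collections import namedtuple

# Field names are assumed for this sketch.
IPyExperimentMemory = namedtuple(
    "IPyExperimentMemory", ["consumed", "reclaimed", "available"]
)


class ToyExperimentCPU:
    def __init__(self):
        self._cpu = IPyExperimentMemory(consumed=10, reclaimed=8, available=500)

    @property
    def data(self):
        # CPU-only backend: a single named tuple.
        return self._cpu


class ToyExperimentGPU(ToyExperimentCPU):
    def __init__(self):
        super().__init__()
        self._gpu = IPyExperimentMemory(consumed=200, reclaimed=200, available=7000)

    @property
    def data(self):
        # GPU backend: one named tuple per memory type.
        return self._cpu, self._gpu


cpu_only = ToyExperimentCPU().data
cpu_data, gpu_data = ToyExperimentGPU().data
```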


(Stas Bekman) #30

It was painful to maintain two somewhat similar systems, so I integrated both into one.

So the big change is: ipygpulogger got integrated into ipyexperiments.

I’d like to finalize the API and to make sure that all the reported numbers and their names make sense and are intuitive. So if you get a chance please kindly play with the latest version and let me know if anything is unclear/confusing/can be improved/etc.

Thank you.