I’m trying to optimize the memory consumption of my network, and I was wondering whether there is an (easy?) way to discard some feature maps during the forward pass (selecting which layers to discard), then recompute them on the fly when they’re needed during backpropagation.
Something like this paper:
They use shared memory for that, and they mention that some of these features are already supported by PyTorch. Has anyone tried this with fastai already? Any insight/examples to show me what I’d have to do to mimic this idea?
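For reference, here is a minimal sketch of what I think the PyTorch-native piece looks like, using `torch.utils.checkpoint.checkpoint` (this is my guess at the relevant API, not something from the paper; the toy `block` and shapes are made up, and `use_reentrant=False` requires a reasonably recent PyTorch):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# A toy block whose activations we'd rather recompute than store.
block = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16))

x = torch.randn(4, 16, requires_grad=True)

# checkpoint() runs the block's forward pass without keeping its
# intermediate activations; they are recomputed on the fly during
# the backward pass, trading compute for memory.
y = checkpoint(block, x, use_reentrant=False)

loss = y.sum()
loss.backward()

print(x.grad.shape)  # gradients flow as usual: torch.Size([4, 16])
```

If something like this is the right direction, I’d still like to know how to wire it into a fastai `Learner` cleanly.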