tl;dr
Under certain conditions, Learner.get_preds() can eat all your RAM and kill your kernel, even on a large machine. An in-place sort looks like an easy fix for a 50% reduction in memory use, with potentially far greater reductions available depending on the use-case.
/tl;dr
A problem that appears to have cropped up for many users over many versions of fastai (though I haven’t checked comprehensively) is Learner.get_preds() chewing up memory. This is also the root cause of ImageClassifierCleaner chewing up memory (see “Out of memory when execute class ImagesCleaner” and “Learn.get_preds() running out of RAM at completion” for examples).
The issue is that Learner.get_preds() appears to store the results for all batches in memory, and then produce a sorted copy of the whole lot using nested_reorder().
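To see why that sort step doubles memory, here is a toy illustration. I am assuming nested_reorder() works roughly like this, applying fancy indexing over every leaf tensor of the nested result and thereby allocating a complete second copy:

```python
import torch

# Toy stand-in for the gathered results: a tuple of per-item tensors
res = (torch.arange(6), torch.arange(6) * 10)
perm = torch.tensor([5, 4, 3, 2, 1, 0])

# Fancy indexing allocates brand-new tensors for every leaf, so until the
# original `res` is dropped, both the original and the sorted copy are alive
sorted_res = tuple(t[perm] for t in res)
```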
To replicate, edit your copy of fastai/learner.py (the auto-generated, “do not edit” version) and insert the following helper at the top:
```python
def getMemStats(prefix=""):
    "Print system memory currently in use, as reported by /proc/meminfo (Linux only)."
    linux_filepath = "/proc/meminfo"
    try:
        meminfo = dict((i.split()[0].rstrip(":"), int(i.split()[1]))
                       for i in open(linux_filepath).readlines())
        # /proc/meminfo values are in kB, so divide by 1024**2 to get GB
        memused = (meminfo["MemTotal"] - meminfo["MemAvailable"]) / (1024 * 1024)
        print(prefix + "memused=%0.1fGB" % memused)
    except Exception:
        print(prefix + "getMemStats() failed")
```
Then, every couple of lines in the definition of get_preds(), insert a call to getMemStats("xxx"), replacing xxx with a hint as to where you are in the code. In particular, do this before and after the lines below (see the sketch that follows them):
```python
self._do_epoch_validate(dl=dl)
```
and
```python
if reorder and hasattr(dl, 'get_idxs'): res = nested_reorder(res, tensor(idxs).argsort())
```
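For example, the instrumented section might look like this (only the getMemStats() calls are additions; the other lines are the library’s own, quoted above):

```python
getMemStats("before _do_epoch_validate: ")
self._do_epoch_validate(dl=dl)
getMemStats("after _do_epoch_validate: ")
# ...
getMemStats("before nested_reorder: ")
if reorder and hasattr(dl, 'get_idxs'): res = nested_reorder(res, tensor(idxs).argsort())
getMemStats("after nested_reorder: ")
```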
Save that, then open the 2020 course notebook 01, and after the first active cell (where we train on cats-vs-dogs) add a cell with the following:
```python
from fastai.vision.widgets import *
cleaner = ImageClassifierCleaner(learn)
```
Restart your kernel (to force a reload of the edited fastai/learner.py), and run these two cells.
You should see memory use increase by roughly 3.3GB across the _do_epoch_validate line, and by the same amount again across the nested_reorder line.
For datasets larger than the Oxford Pets dataset, this can easily cause even the largest machine to run out of memory.
The easy win here is the nested_reorder line: it appears to build a new, sorted copy of the results rather than reordering them in place, and so uses twice as much memory as the task requires. Reordering in place would therefore cut this step’s memory use by 50% immediately.
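As a rough sketch of what that could look like, here is a hypothetical nested_reorder_inplace() (my own code, not fastai’s, and it assumes the results are plain lists/tuples of tensors). It reorders one leaf tensor at a time, so the transient copy is never larger than a single tensor rather than the entire result set:

```python
import torch

def nested_reorder_inplace(res, perm):
    "Hypothetical sketch: reorder each leaf tensor individually, in place."
    if isinstance(res, torch.Tensor):
        # res[perm] still makes a transient copy, but only of this one
        # tensor, and it is immediately written back over the original
        res[:] = res[perm]
        return res
    if isinstance(res, (list, tuple)):
        return type(res)(nested_reorder_inplace(o, perm) for o in res)
    return res
```

Note this halves the peak overhead rather than eliminating it; a true zero-copy permutation (e.g. following the permutation’s cycles) would be more intricate.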
The step I am less sure of is how to reduce the memory used by _do_epoch_validate: the question here is whether the entire result set of _do_epoch_validate needs to be kept for all use-cases. My knowledge of the library and its use-cases is not good enough to confirm this.
However, for the case of ImageClassifierCleaner it is clear that this should not be necessary: it should be possible to loop through the batches, as get_preds does, and accumulate just the indices (as opposed to the actual images), targets and losses, which take up far less memory per item (a sketch follows below). An even greater saving would be to keep only some maximum number of top losses per category, ejecting from memory any item that isn’t in at least one category’s top-losses list. That would be much more complex to implement, so I assume the former is the quick win.
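To illustrate the first suggestion, here is a hypothetical helper (again my own sketch, not fastai API; it assumes a classification Learner with a cross-entropy-style loss and a single input per batch):

```python
import torch
import torch.nn.functional as F

def gather_losses(learn, ds_idx=1):
    "Hypothetical sketch: keep only per-item losses and targets, batch by batch."
    dl = learn.dls[ds_idx]  # the validation DataLoader is not shuffled by default
    losses, targets = [], []
    learn.model.eval()
    with torch.no_grad():
        for xb, yb in dl:
            preds = learn.model(xb)
            # reduction='none' gives one loss per item; move results to the CPU
            # and keep nothing else from the batch (no inputs, no predictions)
            losses.append(F.cross_entropy(preds, yb, reduction='none').cpu())
            targets.append(yb.cpu())
    return torch.cat(losses), torch.cat(targets)

# Usage: items sorted by descending loss, as ImageClassifierCleaner needs
# losses, targets = gather_losses(learn)
# top_idxs = losses.argsort(descending=True)
```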
Hopefully someone with more extensive experience editing the library will step in to confirm/refute my suggestions above.