Problem with ClassificationInterpretation and ImageClassifierCleaner

Hi there!

ClassificationInterpretation and ImageClassifierCleaner make the kernel crash all the time. The problem is I am not getting any error… The kernel just crashes while the learner iterates through the files…

Any idea why that might be the case?

Note that the model fine_tunes without any problem, which I believe rules out a corrupt file. Also, I tried changing the batch_size, to no avail. Finally, show_results() works without problem as well… very curious… maybe my hardware is screwed somehow?

I’ve kept investigating this a little. I’ve taken 1/10th of the data, tried ClassificationInterpretation and ImageClassifierCleaner and they both worked. So I did that for the rest of the data, 1/10th at a time. Everything worked perfectly…
@sgugger sorry for the at-mentioning, but I believe it is possible that something causes OOM errors with those two (both have in common that they make predictions on the entire dataset; since the transforms happen on the GPU before prediction (I believe), maybe this is where things crash?)

edit: also, I tried to use the datablock.dataloaders() method with the ‘sample’ and ‘n’ parameters to see if I could take only a few items from the folders without having to reorganize them. Neither seemed to work. Maybe I got their intended use wrong, but in that case maybe something to do just that should exist (i.e. you have two folders with your classes, but would like at first to run a quick model on only 10% of the available files for each class).
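One workaround, since DataBlock doesn’t document ‘sample’/‘n’ parameters on dataloaders(): filter the file list yourself before building the DataLoaders. A plain-Python sketch (the helper name and folder contents here are made up for illustration):

```python
import random

def sample_per_class(files_by_class, frac=0.1, seed=42):
    """Keep roughly `frac` of the files in each class (at least one)."""
    rng = random.Random(seed)
    return {label: rng.sample(files, max(1, int(len(files) * frac)))
            for label, files in files_by_class.items()}

# Hypothetical layout: two class folders with 20 files each.
files = {"cats": [f"cat_{i}.jpg" for i in range(20)],
         "dogs": [f"dog_{i}.jpg" for i in range(20)]}
small = sample_per_class(files)  # 2 files per class at frac=0.1
```

The resulting smaller file list can then be fed to whatever item getter your DataBlock uses.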

If the kernel crashes, it means Python has crashed. Is there an error message in the shell window that you launched Jupyter from?

If you have lots and lots of pictures, it’s possible your kernel crashes because it runs out of RAM. You should try applying it to a smaller dataloader.
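The mechanics of that, sketched in plain Python (only the existence of a `dl` argument on from_learner comes from the snippets in this thread; the helper below is illustrative): sample a subset of the item list, then build the smaller loader from it and pass it in.

```python
import random

def subset_items(items, n, seed=0):
    """Return a random sample of at most n items."""
    rng = random.Random(seed)
    return rng.sample(list(items), min(n, len(items)))

# Hypothetical item list; in fastai this would be something like
# learn.dls.valid.items, and the subset would become the smaller dl.
items = [f"img_{i}.jpg" for i in range(100)]
small = subset_items(items, 20)
```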


This definitely seems to be worse in fastai2 - I suspect due to it storing inputs now too. Altering the function not to store inputs, as I don’t need this, uses orders of magnitude less memory:

@classmethod
def from_learner(cls, learn, ds_idx=1, dl=None, act=None):
    "Construct interpretation object from a learner"
    if dl is None: dl = learn.dls[ds_idx]
    preds, targs, decoded, losses = learn.get_preds(dl=dl, with_input=False, with_loss=True, with_decoded=True, act=None)
    # Pass an empty list where the stored inputs used to go
    return cls(dl, [], preds, targs, decoded, losses)
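If you’d rather not edit the library source, a replacement classmethod like this can be applied at runtime by plain monkey-patching. A generic sketch (the class and return values here are illustrative stand-ins, not fastai’s):

```python
class Interp:
    @classmethod
    def from_learner(cls, learn):
        return "original behaviour"

# A replacement with the same signature, assigned over the old classmethod.
def patched_from_learner(cls, learn):
    return "patched behaviour"

Interp.from_learner = classmethod(patched_from_learner)
```

After the assignment, every call site picks up the patched version without any import changes.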

Thanks @aaerox - that worked for me.

To add a little context: I was noticing that while ClassificationInterpretation.from_learner was running, my RAM usage was slowly increasing until it maxed out and the kernel died.


While using ClassificationInterpretation my notebook crashed due to high RAM consumption. How do I solve this issue?


I’m confused. If this doesn’t happen during training, why would it happen when doing interpretation on what is probably only 20% of the training data? Does a bug report need to be submitted, or is this already in a queue? I ask because I am using the latest version and it appears to have an issue with RAM usage.

This still happens in fastai 2.2.5: I can only complete ClassificationInterpretation.from_learner() with 20k images (I’m on a cluster node with 128 GB of RAM). With just a few more images, the process gets killed by the OOM killer. I’ll try @aaerox’s modification, then open a pull request (if there isn’t one yet).

You all should go through this thread; it talks about the same issue:


@jimmiemunyi - correct. ImageClassifierCleaner is now fixed in repo, but ClassificationInterpretation is outstanding.

@muellerzr is working on a “proper” fix for the Interpretation class adding several nice features, but in the meantime you can patch your own code using the snippets here to fix the problem: Learn.get_preds() memory inefficiency - quick fix


Issue with Interpretation still happening in fastai v2.4
@muellerzr any idea when the fix will be applied to the main repo?

Just to add, ClassificationInterpretation.from_learner(learn) in v2.4 and v2.5 currently fails with TypeError: __init__() missing 1 required positional argument: 'losses'

Please provide a reproducer so we know how you are setting things up.

@muellerzr currently using a private Paperspace environment (8 CPUs, 30GB RAM) over a ~53,000-image dataset … the notebook looks like below

learn_segmented = cnn_learner(data_segmented, torchvision.models.densenet121, metrics=[error_rate, accuracy], model_dir=model_path)
interp_segmented = ClassificationInterpretation.from_learner(learn_segmented)
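For scale, a back-of-envelope calculation (assuming 224×224 RGB float32 tensors, which may not match the actual transforms in this notebook): storing decoded inputs for every image, as from_learner does with with_input=True, adds up fast.

```python
# Each decoded 224x224x3 float32 image tensor:
bytes_per_image = 224 * 224 * 3 * 4            # ~0.6 MB per image
total_gb = 53_000 * bytes_per_image / 1024**3  # whole dataset held in RAM at once
print(round(total_gb, 1))                      # ~29.7 GB, about this machine's RAM
```

Under those assumptions the stored inputs alone roughly fill a 30GB machine, which would explain the crash.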


I could not recreate the issue with the pets dataset. Please make sure your data is labelled and that everything looks correct. What I used:

from fastai.vision.all import *

set_seed(99, True)
path = untar_data(URLs.PETS)/'images'
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2,
    label_func=lambda x: x[0].isupper(), item_tfms=Resize(224))

learn = cnn_learner(dls, resnet34, metrics=error_rate)
interp = ClassificationInterpretation.from_learner(learn)

So, judging by the class view, I believe the labeling doesn’t have an issue

Initially, the kernel just crashed during interpretation. However, after excluding inputs with with_input=False based on the fix above, the error TypeError: __init__() missing 1 required positional argument: 'losses' started to happen
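That TypeError follows directly from the signature mismatch: with with_input=False, get_preds returns one fewer value, so *-unpacking into a constructor that still expects inputs comes up one argument short. A minimal illustration (the class here is a stand-in, not fastai’s actual one):

```python
class Interp:
    def __init__(self, dl, inputs, preds, targs, decoded, losses):
        self.losses = losses

returned = ("preds", "targs", "decoded", "losses")  # no inputs element
try:
    Interp("dl", *returned)  # five arguments where six are expected
except TypeError as err:
    message = str(err)       # "...missing 1 required positional argument: 'losses'"
```

Hence the fixed snippets in this thread pass a placeholder (an empty list or dl.dataset.items) where the inputs used to go.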

Unless you’re using that version I posted, you will have issues. I also don’t know if that version works anymore

@muellerzr Thanks, which version do you recommend I try?

@muellerzr I have now tried v2.5.1 with the fix below, and it worked OK

@classmethod
def from_learner(cls, learn, ds_idx=1, dl=None, act=None):
    "Construct interpretation object from a learner"
    if dl is None: dl = learn.dls[ds_idx].new(shuffled=False, drop_last=False)
    # return cls(dl, *learn.get_preds(dl=dl, with_input=True, with_loss=True, with_decoded=True, act=None))
    return cls(dl, dl.dataset.items, *learn.get_preds(dl=dl, with_input=False, with_loss=True, with_decoded=True, act=None))

# new Interpretation.plot_top_losses in fastai/
def plot_top_losses(self, k, largest=True, **kwargs):
    losses,idx = self.top_losses(k, largest)
    if not isinstance(self.inputs, tuple): self.inputs = (self.inputs,)
    if isinstance(self.inputs[0], Tensor): inps = tuple(o[idx] for o in self.inputs)
    # else: inps = self.dl.create_batch(self.dl.before_batch([tuple(o[i] for o in self.inputs) for i in idx]))
    else: inps = (first(to_cpu(self.dl.after_batch(to_device(first(self.dl.create_batches(idx)))))),)
    b = inps + tuple(o[idx] for o in (self.targs if is_listy(self.targs) else (self.targs,)))
    x,y,its = self.dl._pre_show_batch(b, max_n=k)
    b_out = inps + tuple(o[idx] for o in (self.decoded if is_listy(self.decoded) else (self.decoded,)))
    x1,y1,outs = self.dl._pre_show_batch(b_out, max_n=k)
    if its is not None:
        plot_top_losses(x, y, its, outs.itemgot(slice(len(inps), None)), self.preds[idx], losses, **kwargs)
    #TODO: figure out if this is needed
    #its None means that a batch knows how to show itself as a whole, so we pass x, x1
    #else: show_results(x, x1, its, ctxs=ctxs, max_n=max_n, **kwargs)
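For reference, what top_losses(k, largest) conceptually hands back in the snippet above, sketched in plain Python (fastai’s version operates on tensors, but the selection logic is the same idea):

```python
def top_losses(losses, k, largest=True):
    """Return the k most extreme per-item losses and their dataset indices."""
    order = sorted(range(len(losses)), key=losses.__getitem__, reverse=largest)
    idx = order[:k]
    return [losses[i] for i in idx], idx

vals, idx = top_losses([0.1, 2.5, 0.7, 1.9], 2)
# vals == [2.5, 1.9], idx == [1, 3]
```

The indices are what plot_top_losses then uses to pull the matching inputs, targets, decoded predictions, and probabilities.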