RuntimeError during ClassificationInterpretation.from_learner

Hello everyone! I’ve been having some problems with the ClassificationInterpretation.from_learner() class method: I keep getting a RuntimeError when I try to use it. I’ll explain below.

I have an image dataset split between 2 folders. The location and the label of each image are stored in a pandas dataframe:
[image: preview of the dataframe with image file locations and labels]

I’m using sklearn’s StratifiedKFold to split the indices for cross-validation, building one DataBlock per fold with the DataBlock API:

[image: DataBlock definitions built from the StratifiedKFold splits]
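(Since the code screenshot may not render, here is a rough sketch of this kind of setup; the column names 'path' and 'label', the Resize transform, and the use of IndexSplitter are illustrative assumptions, not the exact original code:)

from sklearn.model_selection import StratifiedKFold
from fastai.vision.all import *

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
data_cv = []
for _, valid_idx in skf.split(data_df.index, data_df['label']):
    # one DataBlock per fold, validating on that fold's indices
    data_cv.append(DataBlock(
        blocks=(ImageBlock, CategoryBlock),
        get_x=ColReader('path'),
        get_y=ColReader('label'),
        splitter=IndexSplitter(valid_idx),
        item_tfms=Resize(224)))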

So my data_cv list contains 5 DataBlock objects, one per fold. When I need to make a DataLoaders object out of one of them, I just use:

split = 0
fold_dataloader = data_cv[split].dataloaders(data_df, bs=BS, num_workers=0)

data_df is the previous dataframe.

Then I successfully created the learner and ran the .fine_tune() method, which yielded some pretty good results for my purposes (I’m training in mixed precision):

[image: fine_tune training results]

And here’s the problem. I tried to create the interpretation object with ClassificationInterpretation.from_learner(learn), where learn is the name of my learner.
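For reference, the exact call (it also appears at the bottom of the traceback below):

interp = ClassificationInterpretation.from_learner(learn)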

When I run the code, it begins evaluating my validation set, which has about 1515 batches (my batch size is 384, the maximum I could fit on my RTX 2070). It goes fine until batch 180/1515, where it throws this ginormous error:


RuntimeError                              Traceback (most recent call last)
~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\learner.py in _with_events(self, f, event_type, ex, final)
    154     def _with_events(self, f, event_type, ex, final=noop):
--> 155         try:       self(f'before_{event_type}')       ;f()
    156         except ex: self(f'after_cancel_{event_type}')

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\learner.py in all_batches(self)
    160         self.n_iter = len(self.dl)
--> 161         for o in enumerate(self.dl): self.one_batch(*o)
    162

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\learner.py in one_batch(self, i, b)
    175         self._split(b)
--> 176         self._with_events(self._do_one_batch, 'batch', CancelBatchException)
    177

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\learner.py in _with_events(self, f, event_type, ex, final)
    154     def _with_events(self, f, event_type, ex, final=noop):
--> 155         try:       self(f'before_{event_type}')       ;f()
    156         except ex: self(f'after_cancel_{event_type}')

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\learner.py in __call__(self, event_name)
    132
--> 133     def __call__(self, event_name): L(event_name).map(self._call_one)
    134

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastcore\foundation.py in map(self, f, *args, **kwargs)
    382                      else f.__getitem__)
--> 383         return self._new(map(g, self))
    384

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastcore\foundation.py in _new(self, items, *args, **kwargs)
    332     def _xtra(self): return None
--> 333     def _new(self, items, *args, **kwargs): return type(self)(items, *args, use_list=None, **kwargs)
    334     def __getitem__(self, idx): return self._get(idx) if is_indexer(idx) else L(self._get(idx), use_list=None)

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastcore\foundation.py in __call__(cls, x, *args, **kwargs)
     46
---> 47         res = super().__call__(*((x,) + args), **kwargs)
     48         res._newchk = 0

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastcore\foundation.py in __init__(self, items, use_list, match, *rest)
    323         if (use_list is not None) or not _is_array(items):
--> 324             items = list(items) if use_list else _listify(items)
    325         if match is not None:

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastcore\foundation.py in _listify(o)
    259     if isinstance(o, str) or _is_array(o): return [o]
--> 260     if is_iter(o): return list(o)
    261     return [o]

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastcore\foundation.py in __call__(self, *args, **kwargs)
    225         fargs = [args[x.i] if isinstance(x, _Arg) else x for x in self.pargs] + args[self.maxi+1:]
--> 226         return self.fn(*fargs, **kwargs)
    227

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\learner.py in _call_one(self, event_name)
    136         assert hasattr(event, event_name), event_name
--> 137         [cb(event_name) for cb in sort_by_run(self.cbs)]
    138

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\learner.py in <listcomp>(.0)
    136         assert hasattr(event, event_name), event_name
--> 137         [cb(event_name) for cb in sort_by_run(self.cbs)]
    138

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\callback\core.py in __call__(self, event_name)
     43         res = None
---> 44         if self.run and _run: res = getattr(self, event_name, noop)()
     45         if event_name=='after_fit': self.run=True #Reset self.run to True at each end of fit

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\callback\core.py in before_batch(self)
     94     def before_batch(self):
---> 95         if self.with_input: self.inputs.append((to_detach(self.xb)))
     96

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\torch_core.py in to_detach(b, cpu, gather)
    184         return x.cpu() if cpu else x
--> 185     return apply(_inner, b, cpu=cpu, gather=gather)
    186

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\torch_core.py in apply(func, x, *args, **kwargs)
    162     "Apply `func` recursively to `x`, passing on args"
--> 163     if is_listy(x): return type(x)([apply(func, o, *args, **kwargs) for o in x])
    164     if isinstance(x,dict):  return {k: apply(func, v, *args, **kwargs) for k,v in x.items()}

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\torch_core.py in <listcomp>(.0)
    162     "Apply `func` recursively to `x`, passing on args"
--> 163     if is_listy(x): return type(x)([apply(func, o, *args, **kwargs) for o in x])
    164     if isinstance(x,dict):  return {k: apply(func, v, *args, **kwargs) for k,v in x.items()}

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\torch_core.py in apply(func, x, *args, **kwargs)
    164     if isinstance(x,dict):  return {k: apply(func, v, *args, **kwargs) for k,v in x.items()}
--> 165     res = func(x, *args, **kwargs)
    166     return res if x is None else retain_type(res, x)

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\torch_core.py in _inner(x, cpu, gather)
    183         if gather: x = maybe_gather(x)
--> 184         return x.cpu() if cpu else x
    185     return apply(_inner, b, cpu=cpu, gather=gather)

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\torch_core.py in _f(self, *args, **kwargs)
    297             cls = self.__class__
--> 298             res = getattr(super(TensorBase, self), fn)(*args, **kwargs)
    299             return retain_type(res, self, copy_meta=True)

RuntimeError: [enforce fail at ...\c10\core\CPUAllocator.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 231211008 bytes. Buy new RAM!

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\torch_core.py in to_concat(xs, dim)
    234     #   in this case we return a big list
--> 235     try:    return retain_type(torch.cat(xs, dim=dim), xs[0])
    236     except: return sum([L(retain_type(o_.index_select(dim, tensor(i)).squeeze(dim), xs[0])

RuntimeError: [enforce fail at ...\c10\core\CPUAllocator.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 45317357568 bytes. Buy new RAM!

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<ipython-input-...> in <module>
----> 1 interp = ClassificationInterpretation.from_learner(learn)

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\interpret.py in from_learner(cls, learn, ds_idx, dl, act)
     27         "Construct interpretation object from a learner"
     28         if dl is None: dl = learn.dls[ds_idx]
---> 29         return cls(dl, *learn.get_preds(dl=dl, with_input=True, with_loss=True, with_decoded=True, act=None))
     30
     31     def top_losses(self, k=None, largest=True):

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\learner.py in get_preds(self, ds_idx, dl, with_input, with_decoded, with_loss, act, inner, reorder, cbs, **kwargs)
    230             if with_loss: ctx_mgrs.append(self.loss_not_reduced())
    231             with ContextManagers(ctx_mgrs):
--> 232                 self._do_epoch_validate(dl=dl)
    233                 if act is None: act = getattr(self.loss_func, 'activation', noop)
    234                 res = cb.all_tensors()

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\learner.py in _do_epoch_validate(self, ds_idx, dl)
    183         if dl is None: dl = self.dls[ds_idx]
    184         self.dl = dl
--> 185         with torch.no_grad(): self._with_events(self.all_batches, 'validate', CancelValidException)
    186
    187     def _do_epoch(self):

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\learner.py in _with_events(self, f, event_type, ex, final)
    155         try:       self(f'before_{event_type}')       ;f()
    156         except ex: self(f'after_cancel_{event_type}')
--> 157         finally:   self(f'after_{event_type}')        ;final()
    158
    159     def all_batches(self):

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\learner.py in __call__(self, event_name)
    131     def ordered_cbs(self, event): return [cb for cb in sort_by_run(self.cbs) if hasattr(cb, event)]
    132
--> 133     def __call__(self, event_name): L(event_name).map(self._call_one)
    134
    135     def _call_one(self, event_name):

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastcore\foundation.py in map(self, f, *args, **kwargs)
    381              else f.format if isinstance(f,str)
    382              else f.__getitem__)
--> 383         return self._new(map(g, self))
    384
    385     def filter(self, f, negate=False, **kwargs):

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastcore\foundation.py in _new(self, items, *args, **kwargs)
    331     @property
    332     def _xtra(self): return None
--> 333     def _new(self, items, *args, **kwargs): return type(self)(items, *args, use_list=None, **kwargs)
    334     def __getitem__(self, idx): return self._get(idx) if is_indexer(idx) else L(self._get(idx), use_list=None)
    335     def copy(self): return self._new(self.items.copy())

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastcore\foundation.py in __call__(cls, x, *args, **kwargs)
     45             return x
     46
---> 47         res = super().__call__(*((x,) + args), **kwargs)
     48         res._newchk = 0
     49         return res

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastcore\foundation.py in __init__(self, items, use_list, match, *rest)
    322         if items is None: items = []
    323         if (use_list is not None) or not _is_array(items):
--> 324             items = list(items) if use_list else _listify(items)
    325         if match is not None:
    326             if is_coll(match): match = len(match)

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastcore\foundation.py in _listify(o)
    258     if isinstance(o, list): return o
    259     if isinstance(o, str) or _is_array(o): return [o]
--> 260     if is_iter(o): return list(o)
    261     return [o]
    262

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastcore\foundation.py in __call__(self, *args, **kwargs)
    224             if isinstance(v,_Arg): kwargs[k] = args.pop(v.i)
    225         fargs = [args[x.i] if isinstance(x, _Arg) else x for x in self.pargs] + args[self.maxi+1:]
--> 226         return self.fn(*fargs, **kwargs)
    227
    228 # Cell

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\learner.py in _call_one(self, event_name)
    135     def _call_one(self, event_name):
    136         assert hasattr(event, event_name), event_name
--> 137         [cb(event_name) for cb in sort_by_run(self.cbs)]
    138
    139     def _bn_bias_state(self, with_bias): return norm_bias_params(self.model, with_bias).map(self.opt.state)

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\learner.py in <listcomp>(.0)
    135     def _call_one(self, event_name):
    136         assert hasattr(event, event_name), event_name
--> 137         [cb(event_name) for cb in sort_by_run(self.cbs)]
    138
    139     def _bn_bias_state(self, with_bias): return norm_bias_params(self.model, with_bias).map(self.opt.state)

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\callback\core.py in __call__(self, event_name)
     42                (self.run_valid and not getattr(self, 'training', False)))
     43         res = None
---> 44         if self.run and _run: res = getattr(self, event_name, noop)()
     45         if event_name=='after_fit': self.run=True #Reset self.run to True at each end of fit
     46         return res

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\callback\core.py in after_validate(self)
    117         "Concatenate all recorded tensors"
    118         if not hasattr(self, 'preds'): return
--> 119         if self.with_input:     self.inputs  = detuplify(to_concat(self.inputs, dim=self.concat_dim))
    120         if not self.save_preds: self.preds   = detuplify(to_concat(self.preds, dim=self.concat_dim))
    121         if not self.save_targs: self.targets = detuplify(to_concat(self.targets, dim=self.concat_dim))

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\torch_core.py in to_concat(xs, dim)
    229     "Concat the element in `xs` (recursively if they are tuples/lists of tensors)"
    230     if not xs: return xs
--> 231     if is_listy(xs[0]): return type(xs[0])([to_concat([x[i] for x in xs], dim=dim) for i in range_of(xs[0])])
    232     if isinstance(xs[0],dict):  return {k: to_concat([x[k] for x in xs], dim=dim) for k in xs[0].keys()}
    233     #We may receives xs that are not concatenatable (inputs of a text classifier for instance),

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\torch_core.py in <listcomp>(.0)
    229     "Concat the element in `xs` (recursively if they are tuples/lists of tensors)"
    230     if not xs: return xs
--> 231     if is_listy(xs[0]): return type(xs[0])([to_concat([x[i] for x in xs], dim=dim) for i in range_of(xs[0])])
    232     if isinstance(xs[0],dict):  return {k: to_concat([x[k] for x in xs], dim=dim) for k in xs[0].keys()}
    233     #We may receives xs that are not concatenatable (inputs of a text classifier for instance),

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\torch_core.py in to_concat(xs, dim)
    235     try:    return retain_type(torch.cat(xs, dim=dim), xs[0])
    236     except: return sum([L(retain_type(o_.index_select(dim, tensor(i)).squeeze(dim), xs[0])
--> 237                           for i in range_of(o_)) for o_ in xs], L())
    238
    239 # Cell

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\torch_core.py in <listcomp>(.0)
    235     try:    return retain_type(torch.cat(xs, dim=dim), xs[0])
    236     except: return sum([L(retain_type(o_.index_select(dim, tensor(i)).squeeze(dim), xs[0])
--> 237                           for i in range_of(o_)) for o_ in xs], L())
    238
    239 # Cell

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastcore\foundation.py in __call__(cls, x, *args, **kwargs)
     45             return x
     46
---> 47         res = super().__call__(*((x,) + args), **kwargs)
     48         res._newchk = 0
     49         return res

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastcore\foundation.py in __init__(self, items, use_list, match, *rest)
    322         if items is None: items = []
    323         if (use_list is not None) or not _is_array(items):
--> 324             items = list(items) if use_list else _listify(items)
    325         if match is not None:
    326             if is_coll(match): match = len(match)

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastcore\foundation.py in _listify(o)
    258     if isinstance(o, list): return o
    259     if isinstance(o, str) or _is_array(o): return [o]
--> 260     if is_iter(o): return list(o)
    261     return [o]
    262

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\torch_core.py in <genexpr>(.0)
    235     try:    return retain_type(torch.cat(xs, dim=dim), xs[0])
    236     except: return sum([L(retain_type(o_.index_select(dim, tensor(i)).squeeze(dim), xs[0])
--> 237                           for i in range_of(o_)) for o_ in xs], L())
    238
    239 # Cell

~\Anaconda3\envs\FastaiV2\lib\site-packages\fastai\torch_core.py in _f(self, *args, **kwargs)
    296         def _f(self, *args, **kwargs):
    297             cls = self.__class__
--> 298             res = getattr(super(TensorBase, self), fn)(*args, **kwargs)
    299             return retain_type(res, self, copy_meta=True)
    300         return _f

RuntimeError: index_select(): Expected dtype int64 for index

The first error was
RuntimeError: DefaultCPUAllocator: not enough memory: you tried to allocate 231211008 bytes. Buy new RAM!
followed by
RuntimeError: DefaultCPUAllocator: not enough memory: you tried to allocate 45317357568 bytes. Buy new RAM!
and finally
RuntimeError: index_select(): Expected dtype int64 for index

I would be very grateful if anyone could help me figure out why this is happening, because right now I have no idea!

Thank you.

Well, the error itself and the fact that it happened halfway through processing seem to indicate you don’t have enough RAM on your machine for this method to work. I’m not an expert on its inner workings; perhaps it needs to hold the entire dataset in RAM in order to work. Regardless, try running it on fewer batches (10, 50, 100, 200, etc.) and watch your machine’s RAM consumption, as in the sketch below; that should indicate whether that is indeed the problem or not.
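A minimal sketch of that test, reusing the names from the post above (the dl= argument of from_learner is visible in the traceback; note that if your DataBlock’s splitter uses fixed indices, you may need to sample from the validation rows instead of the whole dataframe):

small_df = data_df.sample(n=BS * 50, random_state=42)  # ~50 batches instead of 1515
small_dls = data_cv[split].dataloaders(small_df, bs=BS, num_workers=0)
interp = ClassificationInterpretation.from_learner(learn, dl=small_dls.valid)  # watch RAM usage here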


Thanks for the reply! I indeed thought that was it, but with fastai v1 I was using this method on this exact same dataset with the same configs, and it worked pretty smoothly. Now I get this error regardless of the batch size (I tried from 64 up to 384; the same error remained). I also watched my RAM usage during the method: it maxes out around the 30th batch (my PC also gets really slow and I can’t use any other program) and stays maxed until the error is raised around the 180th batch.

One thing I did notice is that ClassificationInterpretation.from_learner() calls learn.get_preds() internally and passes with_input=True by default. This one argument is what makes my RAM go crazy: I tested calling get_preds() without it and no errors occurred, and my RAM usage didn’t spike. I don’t know exactly what with_input=True does in this case, but I believe it is the source of the problem.

I managed to create the interpretation object without with_input=True, which let me plot the confusion matrices, but I am unable to plot the top losses. I guess the inputs are the images themselves being loaded into memory as you said, and since my validation set consists of half a million images, my RAM cannot hold it :frowning: . I just don’t understand how it worked in fastai v1.
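For anyone hitting the same wall, this is roughly what that workaround looks like; the constructor signature (dl, inputs, preds, targs, decoded, losses) is inferred from the from_learner source shown in the traceback, so verify it against your fastai version:

dl = learn.dls[1]  # the validation dataloader
preds, targs, decoded, losses = learn.get_preds(dl=dl, with_decoded=True, with_loss=True, act=None)
# pass None in the inputs slot: confusion matrices work, but plot_top_losses
# (which needs the decoded input images) will not
interp = ClassificationInterpretation(dl, None, preds, targs, decoded, losses)
interp.plot_confusion_matrix()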

Hey Christian,

Sounds like your analysis is correct. If you would like to invest some more time and effort into solving this (and potentially contributing your solution back to the community), you could either go back to the fastai v1 source code and compare it with the current one to see if you get any insights, or you could just calculate the losses yourself using a dataloader with batches and without shuffling (so the standard way, basically) and then get the top losses from those, as in the sketch below :slight_smile:
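A rough sketch of that second option (illustrative only; get_preds with with_loss=True already returns one loss per item without keeping the decoded inputs around):

# per-item losses over the validation set, without with_input=True
preds, targs, losses = learn.get_preds(ds_idx=1, with_loss=True)
top_losses, top_idx = losses.topk(9)  # the 9 worst predictions
for i, l in zip(top_idx, top_losses):
    x, y = learn.dls.valid_ds[int(i)]  # lazily load one item at a time
    print(int(i), y, float(l))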


I just ran into the same error. In my case I had to halve the batch size from 32 down to 16: I saved the model trained with bs=32, restarted the notebook, initialized a bs=16 dataloader, loaded the previously saved model, and then the classification interpreter worked.
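In code, that workaround might look roughly like this ('stage1' is an illustrative name, and cnn_learner/resnet34 stand in for whatever learner was actually used):

learn.save('stage1')                                   # after training with bs=32
# restart the notebook/kernel, then rebuild with a smaller batch size:
dls = data_cv[split].dataloaders(data_df, bs=16, num_workers=0)
learn = cnn_learner(dls, resnet34, metrics=accuracy)   # same architecture as before
learn.load('stage1')
interp = ClassificationInterpretation.from_learner(learn)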

