Hi, I hope I'm posting this in the right forum.
I am running into a problem with the `learn.load` function (using a `cnn_learner`) and was wondering if anyone else has hit this. I keep getting a "RecursionError: maximum recursion depth exceeded" when running `learn.load(base_dir + '[model_name]')`.
I don't really understand what is happening here. I've seen this a few times with other library calls too, and it seems to happen regularly when I re-run functions on the model after the initial training run. In my particular case at the moment, to reproduce:
- I can save a model initially.
- I can then load that model the first time.
- But when I try to re-load the model after that, having changed some parameters (for example, the number of epochs in a fit cycle), I get the error above. I've been trying to find a solution but can't get past it.
The only way I can really recover from this is to create a new data bunch and a new learner, which isn't ideal: I'd like to resume from my save point rather than start from scratch every time.
I am on the latest version of fastai (1.0.51), running my own ML rig on Ubuntu 18 with an RTX 2070 (I ssh in from my macOS laptop and connect to the running Jupyter server). I'm also training in fp16. I've pasted the stack trace below in case it provides more context.
Thanks everyone!
```
RecursionError                            Traceback (most recent call last)
<ipython-input-...> in <module>
----> 1 learn.load(base_dir + 'stage-1-50')

~/fastai/fastai/basic_train.py in load(self, name, device, strict, with_opt, purge, remove_module)
    259                 remove_module:bool=False):
    260         "Load model and optimizer state (if `with_opt`) `name` from `self.model_dir` using `device`."
--> 261         if purge: self.purge(clear_opt=ifnone(with_opt, False))
    262         if device is None: device = self.data.device
    263         elif isinstance(device, int): device = torch.device('cuda', device)

~/fastai/fastai/basic_train.py in purge(self, clear_opt)
    310
    311         tmp_file = get_tmp_file(self.path/self.model_dir)
--> 312         torch.save(state, open(tmp_file, 'wb'))
    313         for a in attrs_del: delattr(self, a)
    314         gc.collect()

~/anaconda3/lib/python3.7/site-packages/torch/serialization.py in save(obj, f, pickle_module, pickle_protocol)
    217         >>> torch.save(x, buffer)
    218     """
--> 219     return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
    220
    221

~/anaconda3/lib/python3.7/site-packages/torch/serialization.py in _with_file_like(f, mode, body)
    142         f = open(f, mode)
    143     try:
--> 144         return body(f)
    145     finally:
    146         if new_fd:

~/anaconda3/lib/python3.7/site-packages/torch/serialization.py in <lambda>(f)
    217         >>> torch.save(x, buffer)
    218     """
--> 219     return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
    220
    221

~/anaconda3/lib/python3.7/site-packages/torch/serialization.py in _save(obj, f, pickle_module, pickle_protocol)
    290     pickler = pickle_module.Pickler(f, protocol=pickle_protocol)
    291     pickler.persistent_id = persistent_id
--> 292     pickler.dump(obj)
    293
    294     serialized_storage_keys = sorted(serialized_storages.keys())

~/fastai/fastai/callback.py in __getattr__(self, k)
     61
     62     #Passthrough to the inner opt.
---> 63     def __getattr__(self, k:str)->Any: return getattr(self.opt, k, None)
     64     def __setstate__(self,data:Any): self.__dict__.update(data)
     65

... last 1 frames repeated, from the frame below ...

~/fastai/fastai/callback.py in __getattr__(self, k)
     61
     62     #Passthrough to the inner opt.
---> 63     def __getattr__(self, k:str)->Any: return getattr(self.opt, k, None)
     64     def __setstate__(self,data:Any): self.__dict__.update(data)
     65

RecursionError: maximum recursion depth exceeded
```
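For what it's worth, my (possibly wrong) reading of the trace: `purge` saves state while deleting attributes, and when `torch.save` pickles the object, the `__getattr__` passthrough in `callback.py` forwards unknown attribute lookups to `self.opt`; if `opt` itself is missing, looking it up re-enters `__getattr__`, which looks up `opt` again, and so on forever. A minimal standalone sketch of that failure mode (no fastai involved; `OptimWrapper` here is my own toy class, not the real one):

```python
class OptimWrapper:
    """Toy stand-in for a wrapper whose __getattr__ forwards to self.opt."""
    def __getattr__(self, k):
        # If 'opt' was never set (or was deleted), evaluating self.opt
        # re-enters __getattr__ looking for 'opt', recursing until the limit.
        return getattr(self.opt, k, None)

w = OptimWrapper()   # note: w.opt is never assigned
err = None
try:
    w.lr             # any attribute access now triggers the loop
except RecursionError as e:
    err = e
print(type(err).__name__)   # prints: RecursionError
```

So it looks like any attribute access on the wrapper after its `opt` is gone would blow the stack, which matches the repeated `__getattr__` frames above.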