learn.fit_one_cycle(4): Interrupted - Runtime Error

Has anyone experienced the following runtime error? I've been working on it for a day now, running the lesson1-pets Jupyter notebook.


Full dump:

Interrupted


RuntimeError                              Traceback (most recent call last)
<ipython-input-…> in <module>
----> 1 learn.fit_one_cycle(4)

~/.local/lib/python3.6/site-packages/fastai/train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, final_div, wd, callbacks, tot_epochs, start_epoch)
     20     callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor, pct_start=pct_start,
     21                                        final_div=final_div, tot_epochs=tot_epochs, start_epoch=start_epoch))
---> 22     learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
     23
     24 def lr_find(learn:Learner, start_lr:Floats=1e-7, end_lr:Floats=10, num_it:int=100, stop_div:bool=True, wd:float=None):

~/.local/lib/python3.6/site-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    200             callbacks = [cb(self) for cb in self.callback_fns + listify(defaults.extra_callback_fns)] + listify(callbacks)
    201             self.cb_fns_registered = True
--> 202         fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
    203
    204     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

~/.local/lib/python3.6/site-packages/fastai/basic_train.py in fit(epochs, learn, callbacks, metrics)
     99         for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):
    100             xb, yb = cb_handler.on_batch_begin(xb, yb)
--> 101             loss = loss_batch(learn.model, xb, yb, learn.loss_func, learn.opt, cb_handler)
    102             if cb_handler.on_batch_end(loss): break
    103

~/.local/lib/python3.6/site-packages/fastai/basic_train.py in loss_batch(model, xb, yb, loss_func, opt, cb_handler)
     24     if not is_listy(xb): xb = [xb]
     25     if not is_listy(yb): yb = [yb]
---> 26     out = model(*xb)
     27     out = cb_handler.on_loss_begin(out)
     28

~/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    489             result = self._slow_forward(*input, **kwargs)
    490         else:
--> 491             result = self.forward(*input, **kwargs)
    492         for hook in self._forward_hooks.values():
    493             hook_result = hook(self, input, result)

~/.local/lib/python3.6/site-packages/torch/nn/modules/container.py in forward(self, input)
     95     def forward(self, input):
     96         for module in self._modules.values():
---> 97             input = module(input)
     98         return input
     99

~/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    489             result = self._slow_forward(*input, **kwargs)
    490         else:
--> 491             result = self.forward(*input, **kwargs)
    492         for hook in self._forward_hooks.values():
    493             hook_result = hook(self, input, result)

~/.local/lib/python3.6/site-packages/torch/nn/modules/container.py in forward(self, input)
     95     def forward(self, input):
     96         for module in self._modules.values():
---> 97             input = module(input)
     98         return input
     99

~/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    489             result = self._slow_forward(*input, **kwargs)
    490         else:
--> 491             result = self.forward(*input, **kwargs)
    492         for hook in self._forward_hooks.values():
    493             hook_result = hook(self, input, result)

~/.local/lib/python3.6/site-packages/torch/nn/modules/container.py in forward(self, input)
     95     def forward(self, input):
     96         for module in self._modules.values():
---> 97             input = module(input)
     98         return input
     99

~/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    489             result = self._slow_forward(*input, **kwargs)
    490         else:
--> 491             result = self.forward(*input, **kwargs)
    492         for hook in self._forward_hooks.values():
    493             hook_result = hook(self, input, result)

~/.local/lib/python3.6/site-packages/torchvision/models/resnet.py in forward(self, x)
     43         identity = x
     44
---> 45         out = self.conv1(x)
     46         out = self.bn1(out)
     47         out = self.relu(out)

~/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    489             result = self._slow_forward(*input, **kwargs)
    490         else:
--> 491             result = self.forward(*input, **kwargs)
    492         for hook in self._forward_hooks.values():
    493             hook_result = hook(self, input, result)

~/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py in forward(self, input)
    320     def forward(self, input):
    321         return F.conv2d(input, self.weight, self.bias, self.stride,
--> 322                         self.padding, self.dilation, self.groups)
    323
    324

~/.local/lib/python3.6/site-packages/torch/utils/data/_utils/signal_handling.py in handler(signum, frame)
     63         # This following call uses waitid with WNOHANG from C side. Therefore,
     64         # Python can still get and update the process status successfully.
---> 65         _error_if_any_worker_fails()
     66         if previous_handler is not None:
     67             previous_handler(signum, frame)

RuntimeError: DataLoader worker (pid 7481) is killed by signal: Killed.

Try using a single worker in the dataloader and see if that fixes your problem.

Thanks for the reply! I will try this over the weekend.

Yes, I got the same error. Did you find a solution?

Hi NavidPanchi,

I got the same error and saw your suggestion, but I don't fully understand what you meant; I am new to this.

Can you please elaborate on "Try using a single worker in the dataloader"?

This is a common error.
I usually encounter it when there's a shortage of RAM (not just with fastai): when memory runs out, the operating system kills the DataLoader worker process, which is why the traceback ends with "killed by signal: Killed".

A single worker means setting num_workers=1 when you create the DataBunch.
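
With the DataBunch from the lesson1-pets notebook it could look like the sketch below. This is only a minimal sketch for fastai v1: it assumes path_img, fnames, and pat are already defined as in the notebook, and bs=16 is just an illustrative value (a smaller batch size also helps when RAM is tight); num_workers=0 goes one step further and loads batches in the main process.

    from fastai.vision import (ImageDataBunch, get_transforms, imagenet_stats,
                               models, cnn_learner, error_rate)

    data = ImageDataBunch.from_name_re(
        path_img, fnames, pat,              # same arguments as in the lesson notebook
        ds_tfms=get_transforms(), size=224,
        bs=16,                              # smaller batches reduce memory pressure
        num_workers=1,                      # single worker; use 0 to load in the main process
    ).normalize(imagenet_stats)

    learn = cnn_learner(data, models.resnet34, metrics=error_rate)
    learn.fit_one_cycle(4)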

A more elaborate discussion can be found here.

Thanks @chatuur