Prediction on test image raises RuntimeError about DataLoader worker pids exiting unexpectedly

I am trying to predict on a test image with the model I trained. I ran into a lot of errors getting the model onto the CPU but managed to fix them. However, now that I am trying to predict, it throws an error about DataLoader worker pids exiting unexpectedly. I have been on this for hours and I can't work out what the issue is. I'm a beginner, and this is my first time training a model, using the Lego dataset from Kaggle. Please help!

Fastai version: 2.0.11

The following is displayed:

---------------------------------------------------------------------------
Empty                                     Traceback (most recent call last)
~\anaconda3\envs\FastDL\lib\site-packages\torch\utils\data\dataloader.py in _try_get_data(self, timeout)
    778         try:
--> 779             data = self._data_queue.get(timeout=timeout)
    780             return (True, data)

~\anaconda3\envs\FastDL\lib\multiprocessing\queues.py in get(self, block, timeout)
    107                     if not self._poll(timeout):
--> 108                         raise Empty
    109                 elif not self._poll():

Empty: 

During handling of the above exception, another exception occurred:

    RuntimeError                              Traceback (most recent call last)
    <ipython-input-152-35522ec3692e> in <module>
          1 fnames=get_image_files(path/'Test')
          2 fnames
    ----> 3 pred_class,pred_idx,outputs = l.predict(fnames[0])
          4 pred_class

    ~\anaconda3\envs\FastDL\lib\site-packages\fastai\learner.py in predict(self, item, rm_type_tfms, with_input)
        246     def predict(self, item, rm_type_tfms=None, with_input=False):
        247         dl = self.dls.test_dl([item], rm_type_tfms=rm_type_tfms, num_workers=0)
    --> 248         inp,preds,_,dec_preds = self.get_preds(dl=dl, with_input=True, with_decoded=True)
        249         i = getattr(self.dls, 'n_inp', -1)
        250         inp = (inp,) if i==1 else tuplify(inp)

    ~\anaconda3\envs\FastDL\lib\site-packages\fastai\learner.py in get_preds(self, ds_idx, dl, with_input, with_decoded, with_loss, act, inner, reorder, cbs, n_workers, **kwargs)
        233         if with_loss: ctx_mgrs.append(self.loss_not_reduced())
        234         with ContextManagers(ctx_mgrs):
    --> 235             self._do_epoch_validate(dl=dl)
        236             if act is None: act = getattr(self.loss_func, 'activation', noop)
        237             res = cb.all_tensors()

    ~\anaconda3\envs\FastDL\lib\site-packages\fastai\learner.py in _do_epoch_validate(self, ds_idx, dl)
        186         if dl is None: dl = self.dls[ds_idx]
        187         self.dl = dl
    --> 188         with torch.no_grad(): self._with_events(self.all_batches, 'validate', CancelValidException)
        189 
        190     def _do_epoch(self):

    ~\anaconda3\envs\FastDL\lib\site-packages\fastai\learner.py in _with_events(self, f, event_type, ex, final)
        153 
        154     def _with_events(self, f, event_type, ex, final=noop):
    --> 155         try:       self(f'before_{event_type}')       ;f()
        156         except ex: self(f'after_cancel_{event_type}')
        157         finally:   self(f'after_{event_type}')        ;final()

    ~\anaconda3\envs\FastDL\lib\site-packages\fastai\learner.py in all_batches(self)
        159     def all_batches(self):
        160         self.n_iter = len(self.dl)
    --> 161         for o in enumerate(self.dl): self.one_batch(*o)
        162 
        163     def _do_one_batch(self):

    ~\anaconda3\envs\FastDL\lib\site-packages\fastai\data\load.py in __iter__(self)
        101         self.randomize()
        102         self.before_iter()
    --> 103         for b in _loaders[self.fake_l.num_workers==0](self.fake_l):
        104             if self.device is not None: b = to_device(b, self.device)
        105             yield self.after_batch(b)

    ~\anaconda3\envs\FastDL\lib\site-packages\torch\utils\data\dataloader.py in __next__(self)
        361 
        362     def __next__(self):
    --> 363         data = self._next_data()
        364         self._num_yielded += 1
        365         if self._dataset_kind == _DatasetKind.Iterable and \

    ~\anaconda3\envs\FastDL\lib\site-packages\torch\utils\data\dataloader.py in _next_data(self)
        972 
        973             assert not self._shutdown and self._tasks_outstanding > 0
    --> 974             idx, data = self._get_data()
        975             self._tasks_outstanding -= 1
        976 

    ~\anaconda3\envs\FastDL\lib\site-packages\torch\utils\data\dataloader.py in _get_data(self)
        939         else:
        940             while True:
    --> 941                 success, data = self._try_get_data()
        942                 if success:
        943                     return data

    ~\anaconda3\envs\FastDL\lib\site-packages\torch\utils\data\dataloader.py in _try_get_data(self, timeout)
        790             if len(failed_workers) > 0:
        791                 pids_str = ', '.join(str(w.pid) for w in failed_workers)
    --> 792                 raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
        793             if isinstance(e, queue.Empty):
        794                 return (False, None)

    RuntimeError: DataLoader worker (pid(s) 9960, 22048, 4456, 20800, 13168, 8732, 19872, 9612, 20492, 20564, 14396, 20644) exited unexpectedly

Can you try this again?

I updated and tried again. The previous error is gone, but now the prediction is ‘Test’, which is the name of the folder containing the test image:

fnames=get_image_files(path/'Test')
fnames
pred_class,pred_idx,outputs = l.predict(fnames[0])
pred_class

'Test'

What are your classes?

They are the names of Lego figures:

SPIDER-MAN
VENOM
AUNT MAY
GHOST SPIDER
YODA
LUKE SKYWALKER
R2-D2
MACE WINDU
GENERAL GRIEVOUS
KYLO REN
THE MANDALORIAN
CARA DUNE
KLATOOINIAN RAIDER 1
KLATOOINIAN RAIDER 2
MYSTERIO
FIREFIGHTER
SPIDER-MAN
HARRY POTTER
RON WEASLEY
BLACK WIDOW
YELENA BELOVA
TASKMASTER
CAPTAIN AMERICA
OUTRIDER 1
OUTRIDER 2
OWEN GRADY
TRACKER TRAQUEUR RASTREADOR
IRON MAN MK 1

I can’t recreate this on my machine. Currently running the Pets example on 2.0.12 of fastai. I did the following just like you did:

fnames = get_image_files(path/'images')
name, idx, probs = learn.predict(fnames[0])

Which returns:
chihuahua for name.

What is the output of l.dls.vocab? I’m wondering whether your vocab actually matches what you think it is. We’ll also need to see how you built your DataBlock to get a real hint of what’s going on; otherwise we’re just guessing.
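One quick way to act on that question is to compare the vocab against the class names you expect. This is only a rough sketch in plain Python that does not call fastai: `vocab` stands in for whatever `l.dls.vocab` prints, the values are made up for illustration, and the helper name `unexpected_labels` is hypothetical.

```python
def unexpected_labels(vocab, expected):
    """Return the vocab entries that are not among the expected class names, in order."""
    expected_set = set(expected)
    return [label for label in vocab if label not in expected_set]


# Stand-in values (assumption); in the notebook, vocab would be list(l.dls.vocab).
expected = ["SPIDER-MAN", "VENOM", "YODA", "R2-D2"]
vocab = ["R2-D2", "SPIDER-MAN", "Test", "VENOM", "YODA"]

extra = unexpected_labels(vocab, expected)
print(extra)
```

Anything that shows up in the result is a label the model learned that you did not intend to be a class.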

It was an error on my end. After training the model, I created another folder named Test inside the path and trained the model again, which is why it was giving Test as the label. I moved the Test folder to the parent folder and now it works!
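For anyone hitting the same thing, here is a small sketch of why an extra folder turns into a class. It assumes labels are taken from the name of each image's parent folder (the same idea as fastai's `parent_label`); the DataBlock was never posted in this thread, so that part is an assumption. It uses only the standard library, and the helper names `label_from_parent` and `build_vocab` as well as the `train` folder are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def label_from_parent(file_path):
    """Label an image by the name of the folder that contains it."""
    return Path(file_path).parent.name


def build_vocab(root):
    """Collect the sorted, unique folder-derived labels for every .jpg under root."""
    return sorted({label_from_parent(p) for p in Path(root).rglob("*.jpg")})


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: one folder per class, plus a Test folder created by mistake.
    train_root = Path(tmp) / "train"
    for folder in ["R2-D2", "YODA", "Test"]:
        (train_root / folder).mkdir(parents=True)
        (train_root / folder / "001.jpg").touch()

    vocab_before = build_vocab(train_root)

    # Move Test up to the parent folder so it is no longer scanned as a class.
    shutil.move(str(train_root / "Test"), str(Path(tmp) / "Test"))
    vocab_after = build_vocab(train_root)

print(vocab_before)
print(vocab_after)
```

With the Test folder inside the training path, it appears in the vocab like any other class; once it is moved out, the vocab contains only the real class folders.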

Though the model gives 10% loss, I’m happy I made my first DL application on a real-world dataset!

Thank you so much for helping to resolve the initial error!

I’ll share the result as well, even though it is a negative one:

test_01

fnames=get_image_files('.')
pred_class,pred_idx,outputs = l.predict(fnames[0])
pred_class

'R2-D2'