Very very very strange behaviour while predicting on CPU

gianferrarif · November 7, 2018, 2:33pm

Hi all.
I am seeing very strange behaviour while running my email image classification on CPU (fastai 1.0.19).
Non-deterministically, sometimes I can run the model and get the predictions, and sometimes I cannot.
Once the model is loaded, I can make predictions every time. But when I re-run the script, sometimes the next run works and sometimes it does not.
Sometimes I get different errors.
How often this happens also depends on the .pth file that I load.

If somebody wants to test it, the repo is: https://github.com/francescogianferraripini/qcarcrash

To run it, use 'python qcarcrash.py test'.
Uncomment the section at line 37 or 48 of qcarcrash.py to test the different models.
Just to prove the erratic behaviour, https://github.com/francescogianferraripini/qcarcrash/blob/master/Different%20Errors contains the output of several executions of a model that almost never works.
All models are resnet50, although trained on different datasets.

I am very very puzzled.

jeremy · November 7, 2018, 3:24pm

It’s due to a pytorch bug. We’ve just pushed a change to work around it.

gianferrarif · November 7, 2018, 3:58pm

I just updated to 1.0.20 using conda.
I still face the same random behaviour.
When you see " <starlette.responses.JSONResponse object at 0x7fb6d538c2e8>" it is working

(fastai-cpu) gianferrarif@fgplaptop:~/Dev/qcarcrash$ python qcarcrash.py test
['/damage_1.jpg', '/whole_1.jpg']
Traceback (most recent call last):
  File "qcarcrash.py", line 197, in <module>
    qcrashLearners = [generateLearner(md) for md in qcrashModelDefs]
  File "qcarcrash.py", line 197, in <listcomp>
    qcrashLearners = [generateLearner(md) for md in qcrashModelDefs]
  File "qcarcrash.py", line 96, in generateLearner
    size=modelDefition['imageSize'],bs=32
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastai/vision/data.py", line 353, in from_name_re
    return cls.from_name_func(path, fnames, _get_label, valid_pct=valid_pct, test=test, **kwargs)
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastai/vision/data.py", line 347, in from_name_func
    return cls.from_lists(path, fnames, labels, valid_pct=valid_pct, test=test, **kwargs)
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastai/vision/data.py", line 342, in from_lists
    return cls.create(*datasets, path=path, **kwargs)
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastai/vision/data.py", line 289, in create
    zip(datasets, (bs,bs*2,bs*2), (True,False,False))]
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastai/vision/data.py", line 288, in <listcomp>
    dls = [DataLoader(*o, num_workers=num_workers) for o in
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 802, in __init__
    sampler = RandomSampler(dataset)
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 64, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integeral value, but got num_samples=0
(fastai-cpu) gianferrarif@fgplaptop:~/Dev/qcarcrash$ python qcarcrash.py test
['/damage_1.jpg', '/whole_1.jpg']
1.0.20
<starlette.responses.JSONResponse object at 0x7fb6d538c2e8>
<starlette.responses.JSONResponse object at 0x7fb6d1fa77b8>
<starlette.responses.JSONResponse object at 0x7fb6d1fa77b8>
<starlette.responses.JSONResponse object at 0x7fb6d1fa77b8>
<starlette.responses.JSONResponse object at 0x7fb6d1fa77b8>
<starlette.responses.JSONResponse object at 0x7fb6d1fa77b8>
<starlette.responses.JSONResponse object at 0x7fb6d1fa77b8>
<starlette.responses.JSONResponse object at 0x7fb6d1fa77b8>
<starlette.responses.JSONResponse object at 0x7fb6d1fa77b8>
1.0.20
(fastai-cpu) gianferrarif@fgplaptop:~/Dev/qcarcrash$ python qcarcrash.py test
['/damage_1.jpg', '/whole_1.jpg']
1.0.20
/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastprogress/fastprogress.py:95: UserWarning: Your generator is empty.
  warn("Your generator is empty.")
Traceback (most recent call last):
  File "qcarcrash.py", line 200, in <module>
    print(str(predict_image(Path('./6b7c64a6-e1e6-11e8-a965-99eec267d82d.jpg'))))
  File "qcarcrash.py", line 154, in predict_image
    pred,_,_=qcrashLearners[0].predict(img)
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastai/vision/learner.py", line 51, in predict
    res = self.pred_batch()[0]
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastai/basic_train.py", line 217, in pred_batch
    preds,_ = self.get_preds(ds_type, with_loss=False, n_batch=1, pbar=pbar)
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastai/basic_train.py", line 210, in get_preds
    activ=_loss_func2activ(self.loss_func), loss_func=lf, n_batch=n_batch, pbar=pbar)
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastai/basic_train.py", line 38, in get_preds
    zip(*validate(model, dl, cb_handler=cb_handler, pbar=pbar, average=False, n_batch=n_batch))]
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastai/basic_train.py", line 49, in validate
    for xb,yb in progress_bar(dl, parent=pbar, leave=(pbar is not None)):
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastprogress/fastprogress.py", line 63, in __iter__
    self.update(0)
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastprogress/fastprogress.py", line 78, in update
    self.update_bar(0)
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastprogress/fastprogress.py", line 96, in update_bar
    self.on_update(0, '100% [0/0]')
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastprogress/fastprogress.py", line 258, in on_update
    filled_len = int(self.length * val // self.total)
ZeroDivisionError: integer division or modulo by zero
(fastai-cpu) gianferrarif@fgplaptop:~/Dev/qcarcrash$

devforfu · November 7, 2018, 4:07pm

I was getting this error when I didn’t provide the correct path to the validation dataset. I didn’t get it while making a single-image prediction. However, in my case, inference was a bit too slow.
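
Both tracebacks above are consistent with an empty dataset reaching the DataLoader (num_samples=0 in the first, a progress bar total of 0 in the second). Below is a minimal sketch of a fail-fast check, assuming the images sit directly inside one folder. The names list_images, require_images, and IMAGE_SUFFIXES are hypothetical helpers for illustration, not part of fastai or PyTorch.

```python
from pathlib import Path

# Assumption: these are the extensions used by the data; adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_images(folder):
    """Return the sorted image files found directly inside `folder`."""
    folder = Path(folder)
    if not folder.is_dir():
        raise FileNotFoundError(f"not a directory: {folder}")
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def require_images(folder):
    """Fail early, with a readable message, if `folder` holds no images."""
    files = list_images(folder)
    if not files:
        raise ValueError(f"no image files found in {folder}")
    return files
```

Calling require_images on the data folder before building the data object should turn a wrong or empty path into an immediate, readable error rather than a failure deep inside the sampler or the progress bar.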

gianferrarif · November 7, 2018, 4:08pm

Did it happen at random?

devforfu · November 7, 2018, 4:25pm

No, in my case, the execution is slow on each try.

gianferrarif · November 7, 2018, 4:30pm

I do not have speed issues. I get random errors.

gianferrarif · November 7, 2018, 4:54pm

@jeremy let me know if I should open an issue on GitHub.

jeremy · November 7, 2018, 8:37pm

@gianferrarif - no, without a reproducible error there’s nothing I can do, I’m afraid. If you can find a small amount of code that reliably fails on a standard platform, then we can work to fix it.

gianferrarif · November 7, 2018, 8:42pm

In the repo I posted, the first scenario fails 90% of the time.
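
For an intermittent failure like this, a failure rate is easier to report when it is measured. Here is a minimal sketch, assuming the script exits with a non-zero code when it fails (an uncaught Python exception exits with code 1). The helper tally_runs is a hypothetical name for illustration, not part of the repo.

```python
import subprocess
import sys
from collections import Counter


def tally_runs(cmd, runs=10):
    """Run `cmd` `runs` times and count how often each exit code occurs."""
    codes = Counter()
    for _ in range(runs):
        # Output is discarded; only the exit code is recorded.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        codes[result.returncode] += 1
    return codes
```

From the repo directory, something like tally_runs([sys.executable, "qcarcrash.py", "test"], runs=20) would give a count of exit code 0 against the rest, which can be quoted directly in a bug report.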

gianferrarif · November 7, 2018, 11:05pm

I thoroughly recreated all the environments and now it seems to be working. I will test again.

Benudek · December 4, 2018, 3:38pm

Francesco, I seem to get that error when running through the docs to create a language model.

It seems to be linked to the progress bar.

Did you find a solution?

Error:

/opt/anaconda3/lib/python3.6/site-packages/fastprogress/fastprogress.py:95: UserWarning: Your generator is empty.
  warn("Your generator is empty.")

Python docs:
https://docs.fast.ai/text.html
Lines triggering the failure:
learn = language_model_learner(data_lm, pretrained_model=URLs.WT103, drop_mult=0.5)
learn.fit_one_cycle(1, 1e-2)

gianferrarif · December 4, 2018, 3:43pm

I had the issue in image recognition at inference time. Therefore it is a different scenario.

Benudek · December 4, 2018, 3:45pm

Probably. I just string-searched, I have to confess. Thanks for the fast reply anyway.