Very very very strange behaviour while predicting on CPU

Hi all.
I am seeing very strange behaviour while running my email image classification on CPU (fastai 1.0.19).
The behaviour is non-deterministic: sometimes I can run the model and get predictions, sometimes not.
Once the model is loaded, I can make predictions every time. But if I re-run the script, the next run sometimes works and sometimes does not.
Sometimes I get different errors.
How often this happens also depends on the .pth file that I load.

If somebody wants to test it, the repo is:

To run it, run ‘python test’
Uncomment the section on row 37 or 48 of to test the different models.
Just to prove the erratic behaviour, contains the output of some execution of the model that almost never works.
All models are resnet50, although trained on different datasets.

I am very very puzzled.

It’s due to a pytorch bug. We’ve just pushed a change to work around it.


I just updated to 1.0.20 using conda.
I still face the same random behaviour.
When you see " <starlette.responses.JSONResponse object at 0x7fb6d538c2e8>" it is working

(fastai-cpu) gianferrarif@fgplaptop:~/Dev/qcarcrash$ python test
['/damage_1.jpg', '/whole_1.jpg']
Traceback (most recent call last):
  File "", line 197, in <module>
    qcrashLearners = [generateLearner(md) for md in qcrashModelDefs]
  File "", line 197, in <listcomp>
    qcrashLearners = [generateLearner(md) for md in qcrashModelDefs]
  File "", line 96, in generateLearner
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastai/vision/", line 353, in from_name_re
    return cls.from_name_func(path, fnames, _get_label, valid_pct=valid_pct, test=test, **kwargs)
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastai/vision/", line 347, in from_name_func
    return cls.from_lists(path, fnames, labels, valid_pct=valid_pct, test=test, **kwargs)
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastai/vision/", line 342, in from_lists
    return cls.create(*datasets, path=path, **kwargs)
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastai/vision/", line 289, in create
    zip(datasets, (bs,bs*2,bs*2), (True,False,False))]
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastai/vision/", line 288, in <listcomp>
    dls = [DataLoader(*o, num_workers=num_workers) for o in
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/torch/utils/data/", line 802, in __init__
    sampler = RandomSampler(dataset)
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/torch/utils/data/", line 64, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integeral value, but got num_samples=0
(fastai-cpu) gianferrarif@fgplaptop:~/Dev/qcarcrash$ python test
['/damage_1.jpg', '/whole_1.jpg']
<starlette.responses.JSONResponse object at 0x7fb6d538c2e8>
<starlette.responses.JSONResponse object at 0x7fb6d1fa77b8>
<starlette.responses.JSONResponse object at 0x7fb6d1fa77b8>
<starlette.responses.JSONResponse object at 0x7fb6d1fa77b8>
<starlette.responses.JSONResponse object at 0x7fb6d1fa77b8>
<starlette.responses.JSONResponse object at 0x7fb6d1fa77b8>
<starlette.responses.JSONResponse object at 0x7fb6d1fa77b8>
<starlette.responses.JSONResponse object at 0x7fb6d1fa77b8>
<starlette.responses.JSONResponse object at 0x7fb6d1fa77b8>
(fastai-cpu) gianferrarif@fgplaptop:~/Dev/qcarcrash$ python test
['/damage_1.jpg', '/whole_1.jpg']
/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastprogress/ UserWarning: Your generator is empty.
  warn("Your generator is empty.")
Traceback (most recent call last):
  File "", line 200, in <module>
  File "", line 154, in predict_image
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastai/vision/", line 51, in predict
    res = self.pred_batch()[0]
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastai/", line 217, in pred_batch
    preds,_ = self.get_preds(ds_type, with_loss=False, n_batch=1, pbar=pbar)
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastai/", line 210, in get_preds
    activ=_loss_func2activ(self.loss_func), loss_func=lf, n_batch=n_batch, pbar=pbar)
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastai/", line 38, in get_preds
    zip(*validate(model, dl, cb_handler=cb_handler, pbar=pbar, average=False, n_batch=n_batch))]
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastai/", line 49, in validate
    for xb,yb in progress_bar(dl, parent=pbar, leave=(pbar is not None)):
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastprogress/", line 63, in __iter__
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastprogress/", line 78, in update
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastprogress/", line 96, in update_bar
    self.on_update(0, '100% [0/0]')
  File "/home/gianferrarif/anaconda3/envs/fastai-cpu/lib/python3.6/site-packages/fastprogress/", line 258, in on_update
    filled_len = int(self.length * val //
ZeroDivisionError: integer division or modulo by zero
(fastai-cpu) gianferrarif@fgplaptop:~/Dev/qcarcrash$

I was getting this error when I didn’t provide the correct path to the validation dataset. I didn’t get this error when making a single image prediction. However, in my case, inference was a bit too slow.
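
Both tracebacks earlier in the thread are consistent with an empty file list reaching the data loader: `num_samples=0` in the sampler and `100% [0/0]` in the progress bar. A minimal sketch of a pre-flight check follows, assuming the images sit in one flat directory. `find_images` and `IMAGE_EXTENSIONS` are hypothetical names for illustration and are not part of fastai.

```python
from pathlib import Path

# Hypothetical set of extensions; adjust to the files you actually have.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_images(folder):
    """Return the sorted image files in `folder`, or raise if there are none."""
    folder = Path(folder)
    if not folder.is_dir():
        raise FileNotFoundError(f"Not a directory: {folder}")
    files = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    if not files:
        # Fail here with a readable message instead of deep inside the
        # sampler or the progress bar.
        raise ValueError(f"No image files found in {folder}")
    return files
```

Calling something like this before building the data object turns a silently empty list into an immediate, readable error. It does not explain why the list would be empty on some runs only.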

Did it happen at random for you?

No, in my case, the execution is slow on each try.

I do not have issues of speed. I get random errors.

@jeremy let me know if I have to open an issue on github.

@gianferrarif - no, without a reproducible error there’s nothing I can do, I’m afraid. If you can find a small amount of code that reliably fails on a standard platform, then we can work to fix it.
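
One way to put a number on an intermittent failure is to run the script many times in fresh processes and count the outcomes. This is a rough sketch using only the standard library. `tally_runs` is a hypothetical helper, and the command passed to it is whatever launches the failing script.

```python
import subprocess
import sys
from collections import Counter


def tally_runs(cmd, runs=10):
    """Run `cmd` in a fresh process `runs` times and count the outcomes.

    The key is "ok" for a zero exit code, otherwise the last non-empty
    line of stderr (for a Python traceback, that is the exception line).
    """
    outcomes = Counter()
    for _ in range(runs):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # decode output as text
        )
        if proc.returncode == 0:
            outcomes["ok"] += 1
        else:
            lines = [ln for ln in proc.stderr.splitlines() if ln.strip()]
            key = lines[-1] if lines else "exit %d" % proc.returncode
            outcomes[key] += 1
    return outcomes
```

For the script in this thread, the call would be something like `tally_runs([sys.executable, "test"], runs=20)` from the project directory. The resulting counts per exception line make it easier to say which error occurs and how often.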

In the repo I posted the first scenario fails 90% of the time.

I recreated all the environments from scratch and now it seems to be working. I will test again.

Francesco, I seem to get that error when running through the docs to create a language model.

It seems to be linked to the progress bar.

Did you find a solution?


/opt/anaconda3/lib/python3.6/site-packages/fastprogress/ UserWarning: Your generator is empty.
  warn("Your generator is empty.")

Lines from the docs that trigger the failure:
learn = language_model_learner(data_lm, pretrained_model=URLs.WT103, drop_mult=0.5)
learn.fit_one_cycle(1, 1e-2)


I had the issue in image recognition at inference time. Therefore it is a different scenario.

Probably. I just string-searched, I have to confess :wink: Thanks for the fast reply anyway.