How to run learner.predict in parallel

I’ve trained a ResNet-34-based image classification learner with something like:

learn = create_cnn(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(4, max_lr=slice(1e-6, 1e-4))

Then I run prediction sequentially on every image in the test folder:

labels = []
for path in
    cls, label, probs = learn.predict(open_image(path.as_posix()))

I realized after running the cell that this is going to be terribly slow!

How could I parallelize this so that it runs more efficiently?


Hi. Ever got an answer to this? I’m in need of the same… thanks!

Me too!

You can put your images in a test set and predict them all with get_preds. This works if you already have a bunch of images that you want to label with your model.


Hi. Thank you for the reply. It does not work yet, though, so let me show you what I did.

First, here is what works without multiprocessing. The makePrediction function takes the path of an image as input and prints the predicted class labels.

def makePrediction(image_loc):
    img = open_image(image_loc)
    pred_class,pred_idx,outputs = learn_gender_eth.predict(img)
    thresh = 0.4
    labelled_preds = [' '.join([learn_gender_eth.data.classes[i] for i,p in enumerate(pred) if p > thresh]) for pred in outputs[None]]
    print (labelled_preds)

My input is some image:


This works perfectly!

Now I want to use multiprocessing, so I wrote the following code.

from multiprocessing import Pool
p = Pool(processes=20)
data = p.map(makePrediction, Path(path/"utkface/crop_part1/").ls()[0:6])

Here, Path(path/"utkface/crop_part1/").ls()[0:6] is the list of images shown below:


But it gives an error:

RemoteTraceback                           Traceback (most recent call last)
Traceback (most recent call last):
  File "/media/raghav/workspace/anaconda3/lib/python3.7/multiprocessing/", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/media/raghav/workspace/anaconda3/lib/python3.7/multiprocessing/", line 44, in mapstar
    return list(map(*args))
  File "<ipython-input-55-2edee360c795>", line 3, in makePrediction
    pred_class,pred_idx,outputs = learn_gender_eth.predict(img)
  File "/media/raghav/workspace/anaconda3/lib/python3.7/site-packages/fastai/", line 374, in predict
    batch =
  File "/media/raghav/workspace/anaconda3/lib/python3.7/site-packages/fastai/", line 181, in one_item
    return self.one_batch(ds_type=DatasetType.Single, detach=detach, denorm=denorm, cpu=cpu)
  File "/media/raghav/workspace/anaconda3/lib/python3.7/site-packages/fastai/", line 168, in one_batch
    try:     x,y = next(iter(dl))
  File "/media/raghav/workspace/anaconda3/lib/python3.7/site-packages/fastai/", line 75, in __iter__
    for b in self.dl: yield self.proc_batch(b)
  File "/media/raghav/workspace/anaconda3/lib/python3.7/site-packages/torch/utils/data/", line 348, in __next__
    data = _utils.pin_memory.pin_memory(data)
  File "/media/raghav/workspace/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/", line 51, in pin_memory
    return [pin_memory(sample) for sample in data]
  File "/media/raghav/workspace/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/", line 51, in <listcomp>
    return [pin_memory(sample) for sample in data]
  File "/media/raghav/workspace/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/", line 43, in pin_memory
    return data.pin_memory()
RuntimeError: cuda runtime error (3) : initialization error at /tmp/pip-req-build-58y_cjjl/aten/src/THC/THCCachingHostAllocator.cpp:296

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
<ipython-input-58-28845f1c4938> in <module>
      1 from multiprocessing import Pool
      2 p = Pool(processes=20)
----> 3 data = p.map(makePrediction, Path(path/"utkface/crop_part1/").ls()[0:6])
      4 p.close()
      5 print(data)

/media/raghav/workspace/anaconda3/lib/python3.7/multiprocessing/ in map(self, func, iterable, chunksize)
    266         in a list that is returned.
    267         '''
--> 268         return self._map_async(func, iterable, mapstar, chunksize).get()
    270     def starmap(self, func, iterable, chunksize=None):

/media/raghav/workspace/anaconda3/lib/python3.7/multiprocessing/ in get(self, timeout)
    655             return self._value
    656         else:
--> 657             raise self._value
    659     def _set(self, i, obj):

RuntimeError: cuda runtime error (3) : initialization error at /tmp/pip-req-build-58y_cjjl/aten/src/THC/THCCachingHostAllocator.cpp:296

What should I do?

mschmit5 is correct. The way to process multiple images at a time using fastai is to use get_preds. It will batch the images and process an entire batch at once on the GPU.
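To illustrate why batching helps, here is a minimal sketch in plain Python. The names batched and fake_model are hypothetical stand-ins, not fastai functions; the only point is that the model is called once per batch instead of once per image.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def fake_model(batch: List[str]) -> List[int]:
    """Hypothetical stand-in for one forward pass over a whole batch."""
    return [len(name) for name in batch]  # one dummy "prediction" per item


paths = [f"img_{i}.jpg" for i in range(10)]

# Ten images with a batch size of 4 need three model calls instead of ten.
batches = list(batched(paths, batch_size=4))
preds = [p for batch in batches for p in fake_model(batch)]
```

With a real model, each call carries fixed overhead, so fewer and larger calls are usually cheaper than many single-image calls.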

You can add a test set to an existing databunch by calling either add_test or add_test_folder, like so: add_test_folder('utkface/crop_part1'). Alternatively, you could add the test set to the learner when calling load_learner. Then call learn.get_preds(DatasetType.Test) to get predictions for the entire test dataset.
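As a rough sketch of that workflow, assuming fastai v1 and a learner whose get_preds output is one row of per-class probabilities per image: the helper names labels_above_threshold and predict_paths are hypothetical, and the fastai part has not been run against the model in this thread.

```python
from typing import List, Sequence


def labels_above_threshold(probs: Sequence[Sequence[float]],
                           classes: Sequence[str],
                           thresh: float = 0.4) -> List[List[str]]:
    """For each row of probabilities, keep the class names scoring above `thresh`."""
    return [[classes[i] for i, p in enumerate(row) if p > thresh] for row in probs]


def predict_paths(learn, paths, thresh: float = 0.4) -> List[List[str]]:
    """Batch-predict `paths` with a fastai v1 Learner (assumed API, see lead-in)."""
    # Lazy import so the pure-Python helper above works without fastai installed.
    from fastai.basic_data import DatasetType

    learn.data.add_test(paths)                            # attach the images as the test set
    probs, _ = learn.get_preds(ds_type=DatasetType.Test)  # batched inference over the test set
    return labels_above_threshold(probs.tolist(), learn.data.classes, thresh)
```

Under those assumptions, something like predict_paths(learn_gender_eth, Path(path/"utkface/crop_part1/").ls()) would replace both the per-image loop and the multiprocessing Pool.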


Hi, thank you! Okay, I will try that. However, I am using the CPU for inference, not the GPU, so I will see whether the speed changes. So it seems we are not supposed to set the num_workers option manually, and just using get_preds will run it in parallel. Is that right?

