How to run learner.predict in parallel

I’ve trained a ResNet-34-based image classification learner with something like:

learn = create_cnn(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(4, max_lr=slice(1e-6, 1e-4))

Then I run prediction sequentially on every image in the test folder:

labels = []
for path in test_path.ls():
    cls, label, probs = learn.predict(open_image(path.as_posix()))
    labels.append(label)

I realized after running the cell that this is going to be terribly slow!

How could I parallelize this so that it runs more efficiently?


Hi. Ever got an answer to this? I’m in need of the same… thanks!


Me too!

You can put your images in a test set and predict them with get_preds. This will work if you already have a bunch of images that you want to label with your model.
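A minimal sketch of that idea (fastai v1; `learn` is assumed to be the trained learner from above, and the folder path is a placeholder):

```python
from fastai.vision import *

# Attach the images you want to label as a test set on the existing databunch.
# 'path/to/images' is an assumption -- point it at your own folder.
learn.data.add_test(ImageList.from_folder(Path('path/to/images')))

# One batched forward pass over the whole set, instead of per-image predict()
preds, _ = learn.get_preds(ds_type=DatasetType.Test)

# For single-label classification, take the argmax of each row of probabilities
pred_classes = [learn.data.classes[i] for i in preds.argmax(dim=1)]
```

The order of `preds` matches the order of items in `learn.data.test_ds.items`, so you can zip predictions back to file paths.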


Hi. Thank you for the reply. It does not work yet, though; let me show you what I did.

First, let me show what works without multiprocessing. The makePrediction function takes an image path as input and prints the class labels.

def makePrediction(image_loc):
    img = open_image(image_loc)
    pred_class, pred_idx, outputs = learn_gender_eth.predict(img)
    # Keep every class whose predicted probability exceeds the threshold
    thresh = 0.4
    labelled_preds = [' '.join(learn_gender_eth.data.classes[i]
                               for i, p in enumerate(pred) if p > thresh)
                      for pred in outputs[None]]  # outputs[None] adds a batch dimension
    print(labelled_preds)

I call it on a single image:

makePrediction(PosixPath('utkface/crop_part1/64_1_0_20170110140833569.jpg.chip.jpg'))

This works perfectly!

Now I want to use multiprocessing, so I wrote the following code.

from multiprocessing import Pool
p = Pool(processes=20)
data = p.map(makePrediction, Path(path/"utkface/crop_part1/").ls()[0:4])
p.close()
print(data)

Here, Path(path/"utkface/crop_part1/").ls()[0:4] is a list of images, as shown below:

[PosixPath('utkface/crop_part1/64_1_0_20170110140833569.jpg.chip.jpg'),
 PosixPath('utkface/crop_part1/26_1_0_20170104165749289.jpg.chip.jpg'),
 PosixPath('utkface/crop_part1/1_0_2_20161219162852759.jpg.chip.jpg'),
 PosixPath('utkface/crop_part1/43_0_4_20170104000923085.jpg.chip.jpg')]

But it gives an error:

---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/media/raghav/workspace/anaconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/media/raghav/workspace/anaconda3/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "<ipython-input-55-2edee360c795>", line 3, in makePrediction
    pred_class,pred_idx,outputs = learn_gender_eth.predict(img)
  File "/media/raghav/workspace/anaconda3/lib/python3.7/site-packages/fastai/basic_train.py", line 374, in predict
    batch = self.data.one_item(item)
  File "/media/raghav/workspace/anaconda3/lib/python3.7/site-packages/fastai/basic_data.py", line 181, in one_item
    return self.one_batch(ds_type=DatasetType.Single, detach=detach, denorm=denorm, cpu=cpu)
  File "/media/raghav/workspace/anaconda3/lib/python3.7/site-packages/fastai/basic_data.py", line 168, in one_batch
    try:     x,y = next(iter(dl))
  File "/media/raghav/workspace/anaconda3/lib/python3.7/site-packages/fastai/basic_data.py", line 75, in __iter__
    for b in self.dl: yield self.proc_batch(b)
  File "/media/raghav/workspace/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 348, in __next__
    data = _utils.pin_memory.pin_memory(data)
  File "/media/raghav/workspace/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/pin_memory.py", line 51, in pin_memory
    return [pin_memory(sample) for sample in data]
  File "/media/raghav/workspace/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/pin_memory.py", line 51, in <listcomp>
    return [pin_memory(sample) for sample in data]
  File "/media/raghav/workspace/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/pin_memory.py", line 43, in pin_memory
    return data.pin_memory()
RuntimeError: cuda runtime error (3) : initialization error at /tmp/pip-req-build-58y_cjjl/aten/src/THC/THCCachingHostAllocator.cpp:296
"""

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
<ipython-input-58-28845f1c4938> in <module>
      1 from multiprocessing import Pool
      2 p = Pool(processes=20)
----> 3 data = p.map(makePrediction, Path(path/"utkface/crop_part1/").ls()[0:6])
      4 p.close()
      5 print(data)

/media/raghav/workspace/anaconda3/lib/python3.7/multiprocessing/pool.py in map(self, func, iterable, chunksize)
    266         in a list that is returned.
    267         '''
--> 268         return self._map_async(func, iterable, mapstar, chunksize).get()
    269 
    270     def starmap(self, func, iterable, chunksize=None):

/media/raghav/workspace/anaconda3/lib/python3.7/multiprocessing/pool.py in get(self, timeout)
    655             return self._value
    656         else:
--> 657             raise self._value
    658 
    659     def _set(self, i, obj):

RuntimeError: cuda runtime error (3) : initialization error at /tmp/pip-req-build-58y_cjjl/aten/src/THC/THCCachingHostAllocator.cpp:296

What should I do?

mschmit5 is correct. The way to process multiple images at a time with fastai is to use get_preds, which batches the images and processes an entire batch at once on the GPU.

You can add a test set to an existing databunch by calling either add_test or add_test_folder, like so: learn.data.add_test_folder(test_folder='utkface/crop_part1'). Alternatively, you can attach the test set to the learner when calling load_learner. Then call learn.get_preds(DatasetType.Test) to get predictions for the entire test dataset.
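A hedged sketch of the load_learner variant (fastai v1; the export file name 'export.pkl', the working directory, and the threshold logic are assumptions based on the earlier posts):

```python
from fastai.vision import *

# Load an exported learner and attach the test images in one step.
# Assumes learn.export() was run earlier, producing 'export.pkl' in '.'.
test_items = ImageList.from_folder(Path('utkface/crop_part1'))
learn = load_learner(Path('.'), 'export.pkl', test=test_items)

# get_preds runs the whole test set through the model in batches,
# so the GPU (or CPU) processes many images per forward pass.
preds, _ = learn.get_preds(ds_type=DatasetType.Test)

# Multi-label style: keep every class above the threshold, as in
# the makePrediction function above
thresh = 0.4
labels = [' '.join(learn.data.classes[i] for i, p in enumerate(row) if p > thresh)
          for row in preds]
```

Note that the test set is unlabeled by design, which is why the second value returned by get_preds is ignored here.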


Hi, thank you! Okay, I will try that. However, I am not using a GPU for inference but the CPU; I will try it and see whether the speed changes. So it seems we are not supposed to set the num_workers option manually, and just using get_preds will run it in parallel. Is that so?
