Use of "parallel" function in fastai.core

I have tried to use the fastai.core.parallel function multiple times in different situations, but every time it fails: the progress bar reaches 100%, yet none of the work is actually completed. Could someone please help me understand what I am doing wrong?

For context, I am using the image_slicer library to slice images into 64 pieces each and save them in a different directory.

import os
import pathlib
import image_slicer
from fastai.core import parallel

path_og = pathlib.Path('/home/jupyter/images')

def my_func(j):
    path_og = pathlib.Path('/home/jupyter/images')
    try:
        tiles = image_slicer.slice(path_og/j, 64, save=False)
        image_slicer.save_tiles(tiles, directory='/home/jupyter/images_sliced', prefix=j.replace('.jpg',''), format='jpeg')
    except Exception:
        pass  # skip images that fail to slice

parallel(my_func, os.listdir(path_og))

As additional context, my_func(os.listdir(path_og)[0]) runs successfully on its own, so I know there is nothing wrong inside the function itself.


I have added an example to the documentation to illustrate the function’s correct usage. In short, my_func must accept two arguments: the value of the array element and the index of that element, e.g. def my_func(value, index).
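To see what that signature change looks like against the original snippet, here is a minimal sketch (pure Python, with a return value standing in for the image_slicer work, so it runs without fastai installed):

```python
def my_func(filename, index):
    # fastai's parallel calls my_func(arr[index], index), so both
    # parameters are required even if the index goes unused.
    return f"{index}: sliced {filename}"

# Sequential stand-in showing the same call signature that
# parallel(my_func, files) would use for each element:
files = ['cat.jpg', 'dog.jpg']
results = [my_func(f, i) for i, f in enumerate(files)]
print(results)  # → ['0: sliced cat.jpg', '1: sliced dog.jpg']
```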


Are you on Windows? The parallel function doesn’t work for me on Windows. I either use a single-threaded process or write my own multithreading wrapper with ThreadPoolExecutor.
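A minimal sketch of such a wrapper using the standard library’s ThreadPoolExecutor (the names parallel_threads and square are mine, not fastai’s); it mirrors fastai’s func(value, index) calling convention and, unlike fastai’s version, returns results in input order:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_threads(func, arr, max_workers=4):
    # Thread-based stand-in for fastai's parallel: submit
    # func(value, index) for each element, then collect the
    # futures in submission order so results match the input.
    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        futures = [ex.submit(func, v, i) for i, v in enumerate(arr)]
        return [f.result() for f in futures]

def square(value, index):
    return value * value

print(parallel_threads(square, [1, 2, 3]))  # → [1, 4, 9]
```

Threads avoid the Windows process-spawning issues, at the cost of the GIL; that is usually fine for I/O-bound work like reading and writing image files.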


I could not get parallel() to work on Windows either, but it did work on Linux.


@KarlH I am on a Mac, where it does work.

Unrelated to the problems listed here, but I spent some time figuring something out with parallel, so I’ll share:

  1. Processes don’t share global variables, so you can’t append results to a global list from within the function you parallelize.

  2. But the solution is in the source code of fastai.core.parallel: if your function returns a result, parallel will return a list of those results. The results aren’t guaranteed to come back in input order (that didn’t matter for my use case), but you can include the corresponding index in each result and reorder afterwards.

I guess if I modify the current example in the doc, we have:

def my_func(value, index):
    return [index, value]

my_array = [i*2 for i in range(5)]
results = parallel(my_func, my_array, max_workers=3)
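Since the results may come back out of order, the stored index lets you restore input order afterwards. A small sketch (plain Python, with a hand-written out-of-order list standing in for what parallel might return):

```python
# Simulated out-of-order results, each item shaped [index, value]
# as returned by the my_func above:
results = [[2, 4], [0, 0], [1, 2]]

# Sort on the stored index to recover the original input order
results.sort(key=lambda pair: pair[0])
values_in_order = [value for _, value in results]
print(values_in_order)  # → [0, 2, 4]
```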

I got it working on Windows. All you have to do is call the function inside an if __name__ == '__main__': guard.
Example:

if __name__ == '__main__':
    parallel(process, arr=range(1500))