How to disable all forms of multiprocessing in FastAI and PyTorch? Why does my custom transform block cause a CUDA multiprocessing error?

sirgarfield · January 17, 2021, 12:20am

I'm trying to define a custom transform block that converts an audio file to a tensor, but I'm encountering a series of errors.

Custom Transform

import soundfile as sf
class AudioTransform(Transform):
    def __init__(self, period, stride):
        self.period = period 
        self.stride = stride 
        self.sr = 48000
        
    def encodes(self, record_id):
        if (data_dir/f'train/{record_id}.flac').is_file():
            y, sr = sf.read(data_dir/f'train/{record_id}.flac')
        else:
            y, sr = sf.read(data_dir/f'test/{record_id}.flac')
            y_ = []
            i = 0
            effective_length = self.period * self.sr
            stride = self.stride * self.sr
            y = np.stack([y[i:i+effective_length].astype(np.float32) for i in range(0, 60*self.sr+stride-effective_length, stride)])
       
        return torch.tensor(y).cuda()
       
def AudioBlock(): return TransformBlock(type_tfms=AudioTransform(period=10, stride=5).encodes)

Usage

data = DataBlock(blocks=(AudioBlock, MultiCategoryBlock(vocab=[str(i) for i in range(24)])),
                       splitter=IndexSplitter(val_index),
                       get_x=ColReader('recording_id'),
                       get_y=ColReader('species_id', label_delim=' ')
                     )
dls = data.dataloaders(train_tp,bs=16, worker=1)

learn = Learner(dls, model, metrics=[lwlrap],
              loss_func=nn.BCELoss(),
              opt_func=Adam)
learn.fit(20, lr)

The usage block produces:

RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

Error fix attempt 1:

If I remove the content of AudioTransform.encodes and just return the input, as below, the CUDA multiprocessing error goes away:

    def encodes(self, record_id):
        return record_id

Can someone help me understand why?

Error fix attempt 2:

Use

torch.multiprocessing.set_start_method('spawn', force=True)

as suggested by the error message. The CUDA multiprocessing error goes away, but I get a new error related to pickling my AudioTransform.encodes function above.

Questions:

1. How do I disable all forms of multiprocessing in both torch and FastAI?
2. Any idea how to fix the CUDA multiprocessing error and the pickling error without disabling multiprocessing?
3. What is wrong with the content of AudioTransform.encodes? Why does removing the function body suppress the CUDA multiprocessing error?

muellerzr · January 17, 2021, 12:30am

Set num_workers to 0 in that DataLoaders call, not worker. The other solution is simply not to move the tensor to CUDA inside that transform. Fastai should do that for you automatically.
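
For readers landing here later, a minimal sketch of both suggestions, written against plain PyTorch rather than the DataBlock from the question so that it runs on its own. The error message in the question says CUDA cannot be re-initialized in a forked subprocess, and a transform that calls .cuda() does exactly that when it runs inside a data-loading worker. The sketch keeps the transform output on the CPU and loads data in the main process with num_workers=0. The names frame_audio and ToyClips, the silent clips, and the 100 Hz sample rate are hypothetical stand-ins, not part of the original code. In fastai, the equivalent of the second part should be passing num_workers=0 to the data.dataloaders(...) call, as suggested above.

```python
import torch
from torch.utils.data import DataLoader, Dataset


def frame_audio(y, sr, period, stride, total_seconds=60):
    """Cut a 1-D signal into overlapping windows. The result stays on the CPU."""
    y = torch.as_tensor(y, dtype=torch.float32)
    win = period * sr   # window length in samples
    hop = stride * sr   # step between window starts in samples
    # Same window starts as the list comprehension in the question.
    starts = range(0, total_seconds * sr + hop - win, hop)
    return torch.stack([y[s:s + win] for s in starts])  # note: no .cuda() here


class ToyClips(Dataset):
    """Hypothetical stand-in for the real recordings: silent 60-second clips."""

    def __init__(self, n_clips, sr):
        self.n_clips, self.sr = n_clips, sr

    def __len__(self):
        return self.n_clips

    def __getitem__(self, idx):
        clip = torch.zeros(60 * self.sr)
        return frame_audio(clip, self.sr, period=10, stride=5)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# num_workers=0 loads every item in the main process, so no worker is forked.
loader = DataLoader(ToyClips(n_clips=4, sr=100), batch_size=2, num_workers=0)

for batch in loader:
    # Move data to the GPU here, in the main process, not inside the transform.
    batch = batch.to(device)
```

Because the transform never touches CUDA, it should also be safe to raise num_workers again later. Only the loop in the main process moves data to the GPU.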

dokuboyejo · December 30, 2021, 12:35am

I have a similar issue to this one, as well as the one here. There seems to be a similar issue when using something like supervisor within a virtualised environment (Docker).
Upon further checking, it seems there might be a bug in the following

An issue has been created as well.

cc: @muellerzr