Very slow loading of the Convnet pretrained model on lesson 1

Ok, so we are using PyTorch’s torch.utils.data.DataLoader, which takes a parameter called num_workers to fetch data with multiple subprocesses.

num_workers (int, optional) – how many subprocesses to use for data loading.
 0 means that the data will be loaded in the main process (default: 0)

(the default set in the fastai lib is 4)
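
For anyone unfamiliar with the parameter, here is a minimal plain-PyTorch sketch (the toy dataset and shapes are just for illustration, not from the lesson):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Toy dataset, only to show the num_workers parameter in isolation.
    ds = TensorDataset(torch.randn(256, 3, 224, 224), torch.randint(0, 2, (256,)))

    # num_workers=0: batches are loaded in the main process.
    # num_workers=4 (the fastai default): four worker subprocesses fetch batches.
    dl = DataLoader(ds, batch_size=64, shuffle=True, num_workers=4)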

The stall happens when we try to get the next batch from the DataLoader’s iterator:

fastai/dataset.py line 218


215    def __next__(self):
216        if self.i>=len(self.dl): raise StopIteration
217        self.i+=1
218        return next(self.it)
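
If you want to confirm where it stalls, one rough check (assuming `data` is the ImageClassifierData object from the notebook) is to pull batches from the training DataLoader by hand:

    # Pull one batch manually from the training DataLoader.
    it = iter(data.trn_dl)
    x, y = next(it)   # with num_workers > 0 this is where the notebook appears to hang
    print(x.size(), y.size())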

This might just be a bug in torch.utils.data.DataLoader.

Temp Solution (I think this should work for now)

In your notebook, you can set num_workers to 0 to avoid spawning subprocesses, which sidesteps the locking issue.

Do this by passing num_workers=0 to the ImageClassifierData.from_paths() function:

ImageClassifierData.from_paths(PATH, bs=2, tfms=tfms, num_workers=0)

(in the 3 to 4 places in the notebook where it is called)
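
For context, a hedged sketch of what a lesson 1 style call might look like with the workaround applied (arch, sz, PATH and the transforms are whatever your notebook already uses; these names are illustrative):

    # Lesson 1 style setup with the workaround applied.
    arch = resnet34
    data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz), num_workers=0)
    learn = ConvLearner.pretrained(arch, data, precompute=True)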

It should not lock now (I tried twice after this change and it did not stall/lock). Let me know if this works for you as a temporary fix.

@jeremy this should be okay for the meantime, right? Or is setting num_workers to 4 important?

In the meantime, we will need to see what’s happening with data fetching.
