Ok, so we are using torch's torch.utils.data.DataLoader, which takes a num_workers parameter to fetch data with multiple subprocesses. From the PyTorch docs:
num_workers (int, optional) – how many subprocesses to use for data loading.
0 means that the data will be loaded in the main process (default: 0)
(the default set in the fastai lib is 4)
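For reference, here's a minimal standalone sketch of the two modes (the dataset is a toy I made up for illustration, not anything from the notebook):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Toy dataset, purely for illustration
    ds = TensorDataset(torch.randn(100, 3), torch.randint(0, 2, (100,)))

    # num_workers=0: batches are fetched in the main process (no subprocesses)
    dl_main = DataLoader(ds, batch_size=2, num_workers=0)

    # num_workers=4: four worker subprocesses fetch batches in parallel
    dl_workers = DataLoader(ds, batch_size=2, num_workers=4)

    for x, y in dl_main:
        pass  # iteration looks the same either way; only the loading strategy differs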
The stalling happens when we try to get the next batch from the DataLoader's iterator, at fastai/dataset.py line 218:
215 def __next__(self):
216     if self.i>=len(self.dl): raise StopIteration
217     self.i+=1
218     return next(self.it)
Since self.it here is just an iterator over the wrapped torch DataLoader, the hang is happening inside torch's batch fetching; it just might be a bug in torch.utils.data.DataLoader.
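If anyone wants to test that theory, a bare-torch repro along these lines (a sketch, I have not run it against this exact setup) should tell us whether the hang lives in torch itself or in fastai's wrapper:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Stand-in data, sized like image batches but entirely synthetic
    ds = TensorDataset(torch.randn(64, 3, 224, 224), torch.randint(0, 2, (64,)))
    dl = DataLoader(ds, batch_size=2, num_workers=4)

    it = iter(dl)
    for i in range(10):
        x, y = next(it)  # the reported stall happens on a call like this
        print('got batch', i, x.shape)

    # If this loop also stalls with num_workers=4 but runs fine with
    # num_workers=0, the bug is in torch's DataLoader, not in fastai.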
Temporary solution (I think this should work for now):
In your notebook, you can set num_workers to 0 so that no subprocesses are spawned, which circumvents the need to lock.
Do this by passing num_workers=0 to the ImageClassifierData.from_paths() function:
ImageClassifierData.from_paths(PATH, bs=2, tfms=tfms, num_workers=0)
(in the 3 to 4 places the notebook calls it)
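After the change you can sanity-check that the setting took effect. The attribute path below is my guess from the fastai snippet above (trn_dl being the wrapper object, .dl the wrapped torch DataLoader), so adjust if it differs:

    data = ImageClassifierData.from_paths(PATH, bs=2, tfms=tfms, num_workers=0)
    # trn_dl wraps the torch DataLoader in .dl (per the snippet above); guessed path
    print(data.trn_dl.dl.num_workers)  # expect 0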
It should not lock now (I ran it twice after this change and it did not stall/lock). Let me know if this solves it temporarily.
@jeremy this should be okay in the meantime, right? Or is setting num_workers to 4 important?
Meanwhile, we'll need to dig into what's actually happening with the data fetching.