How to solve RuntimeError: DataLoader worker (pid(s)) exited unexpectedly?

I’m a beginner with deep learning, and I’m using Google Colab to run my code.
(PyTorch 1.4.0, torchvision 0.5.0)
I generated my databunch using this code:

tfms = get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.5)

src = (ImageList.from_folder(path=data_folder)

img_data = (src.transform(tfms, size=128)

When I try to run

model = cnn_learner(img_data, models.resnet34, metrics=[accuracy, error_rate])

I get this error: RuntimeError: DataLoader worker (pid(s) XXX) exited unexpectedly

RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/torch/utils/data/ in _try_get_data(self, timeout)
    760         try:
--> 761             data = self._data_queue.get(timeout=timeout)
    762             return (True, data)

13 frames
RuntimeError: DataLoader worker (pid 303) is killed by signal: Segmentation fault. 

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/torch/utils/data/ in _try_get_data(self, timeout)
    772             if len(failed_workers) > 0:
    773                 pids_str = ', '.join(str( for w in failed_workers)
--> 774                 raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
    775             if isinstance(e, queue.Empty):
    776                 return (False, None)

RuntimeError: DataLoader worker (pid(s) 303) exited unexpectedly

I’ve tried reducing my batch_size, but that didn’t help.
I’ve also searched for this error, and the usual advice is to set num_workers=0, but my code never creates a DataLoader directly.
How can I solve this problem?
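
For context on the num_workers advice, here is a minimal sketch in plain PyTorch, assuming a toy TensorDataset as a hypothetical stand-in for the real images. With num_workers=0 the batches are loaded in the main process, so there is no worker subprocess that can be killed. As far as I know, fastai v1's databunch(...) call also accepts num_workers and passes it on to the DataLoader it creates, but that part is not shown here.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 8 fake RGB "images" of size 4x4 with binary labels.
images = torch.zeros(8, 3, 4, 4)
labels = torch.tensor([0, 1, 0, 1, 0, 1, 0, 1])
dataset = TensorDataset(images, labels)

# num_workers=0 means batches are loaded in the main process,
# so no worker subprocess is started at all.
loader = DataLoader(dataset, batch_size=4, shuffle=False, num_workers=0)

# Iterate once over the loader and record the shape of each image batch.
batch_shapes = [tuple(x.shape) for x, _ in loader]
print(batch_shapes)
```

Loading in the main process is usually slower than using workers, so this is more of a way to check whether the workers are the problem than a permanent fix.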


By the way, I’ve also tried PyTorch 1.5.0 + torchvision 0.6.0. It doesn’t raise this error, but I get a warning:

UserWarning: The default behavior for interpolate/upsample with float scale_factor will change in 1.6.0 to align with other frameworks/libraries, and use scale_factor directly, instead of relying on the computed output size. 
If you wish to keep the old behavior, please set recompute_scale_factor=True. 
See the documentation of nn.Upsample for details. 
warnings.warn("The default behavior for interpolate/upsample with float scale_factor will change ")

Also, with PyTorch 1.4.0 one epoch took about 20~30 minutes, but with PyTorch 1.5.0 one epoch takes over an hour.

OK, it seems that the root of the problem is the PyTorch version.
After trying everything I could find on Google, I used the default versions on Colab (PyTorch 1.7.0 + torchvision 0.8.0) to run my code, and it worked. But the warnings are still there.
So I used

import warnings

to ignore them.
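
Only the import line is shown above. A minimal sketch of one way to silence that specific warning with the standard warnings module follows. The message prefix and the helper name suppress_interpolate_warning are assumptions for illustration, not necessarily the exact call that was used.

```python
import warnings

# Start of the warning text quoted earlier in this post. filterwarnings treats
# it as a regular expression matched against the beginning of the message.
INTERPOLATE_WARNING_PREFIX = "The default behavior for interpolate/upsample"


def suppress_interpolate_warning():
    """Ignore only UserWarnings whose message starts with the prefix above."""
    warnings.filterwarnings(
        "ignore",
        message=INTERPOLATE_WARNING_PREFIX,
        category=UserWarning,
    )


# Call once, before training starts.
suppress_interpolate_warning()
```

Filtering by message keeps other warnings visible, which seems safer than a blanket warnings.filterwarnings("ignore").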
The time per epoch also became reasonable. What a strange problem.
Maybe fastai v2 caused the problem, or maybe Colab’s PyTorch.
That’s the end of this question.

