I think Kaggle still doesn’t set a high enough shared memory limit on its Docker containers.
Some options:
- reduce your batch size, e.g. to bs=16 instead of the default 64.
- reduce the number of workers. Fewer worker processes means less data passed through shared memory, but it will also slow down data loading and therefore your training (see the sketch below for how to set both).
- train on Colab instead of Kaggle. Colab fixed this issue in fall 2018.
I would favor option #1 or #3.
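For reference, here’s a minimal sketch of options #1 and #2 using a plain PyTorch DataLoader. `my_dataset` is just a placeholder for whatever Dataset you’re already using; if you’re on fastai, the analogous keyword arguments when building your data are `bs` and `num_workers`.

```python
# Minimal sketch, not a drop-in fix: `my_dataset` stands in for your own Dataset.
from torch.utils.data import DataLoader

train_dl = DataLoader(
    my_dataset,
    batch_size=16,   # option #1: smaller batches -> less data in shared memory at once
    num_workers=0,   # option #2: 0 loads data in the main process, bypassing shared memory
    shuffle=True,
)
```

With `num_workers=0` there are no worker processes at all (they are what use the shared memory), so you can start there and raise the batch size or worker count until you hit the limit again.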