I think Kaggle still doesn’t set a high enough shared memory limit on its Docker containers.
Some options:
- reduce your batch size, e.g. to bs=16 instead of the default 64.
- reduce the number of workers. Fewer worker processes means less data passed through shared memory, but it will also slow down data loading and therefore your training (see the sketch below for how to set both).
- train on Colab instead of Kaggle. Colab fixed this issue in fall 2018.
I would favor option #1 or #3.
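For reference, here’s a minimal sketch of options #1 and #2 using a plain PyTorch DataLoader. `my_dataset` is just a placeholder for whatever Dataset you’re already using; if you’re on fastai, the analogous keyword arguments when building your data are `bs` and `num_workers`.

```python
# Minimal sketch, not a drop-in fix: `my_dataset` stands in for your own Dataset.
from torch.utils.data import DataLoader

train_dl = DataLoader(
    my_dataset,
    batch_size=16,   # option #1: smaller batches -> less data in shared memory at once
    num_workers=0,   # option #2: 0 loads data in the main process, bypassing shared memory
    shuffle=True,
)
```

With `num_workers=0` there are no worker processes at all (they are what use the shared memory), so you can start there and raise the batch size or worker count until you hit the limit again.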