Sometimes when I try to use the DataLoader it just takes forever (minutes without loading a single batch). Most of the time the same code in the same environment works fine (a few seconds).
When it is slow, it normally gets back to normal after just restarting the environment. Other times, I need to reboot the machine. This is so weird. Any idea what is going on? The problem is that the behavior is not always the same, so I don’t know how to debug this.
If you interrupt your notebook kernel, you can see from the stack trace where it is hanging.
How can I do that? Is there a YT video you could point me to that shows how? Just to clarify: the loader seems to be working, it just keeps running and the time goes by. Something that normally runs in 4 seconds keeps running for minutes.
I’ve noticed that on my machine it seems to re-download the dataset or rewrite the cache every time, seemingly ignoring the download_mode and ignore_verification settings.
Loading fashion_mnist takes 6 minutes on Py3.8 and 12 minutes on Py3.9.
I’d love to know why it is so slow.
miniai doesn’t have its own DataLoader - it just uses the PyTorch one.
The easiest way to debug it is to set num_workers=0. Exceptions often cause hangs in DataLoader when using multiple workers.
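In case it helps, here is a minimal sketch of that check. The time_first_batch helper and the random stand-in dataset are made up for illustration; the only part taken from the advice above is passing num_workers=0 to the PyTorch DataLoader.

```python
import time


def time_first_batch(loader):
    """Fetch one batch from any iterable loader and return (batch, seconds)."""
    start = time.perf_counter()
    batch = next(iter(loader))
    return batch, time.perf_counter() - start


def demo():
    # Imported here so the helper above has no hard dependency on PyTorch.
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Hypothetical stand-in data; swap in your own dataset.
    ds = TensorDataset(torch.randn(256, 1, 28, 28), torch.randint(0, 10, (256,)))

    # num_workers=0 loads batches in the main process, so an exception in the
    # dataset or collate function shows up as an ordinary traceback.
    dl = DataLoader(ds, batch_size=64, num_workers=0)
    (xb, yb), secs = time_first_batch(dl)
    print(f"first batch {tuple(xb.shape)} in {secs:.3f}s")
```

Calling demo() with your own dataset in place of the random tensors should tell you whether the first batch is slow even without worker processes. If it is fast here, the slowdown is more likely in the workers or in something that runs before the DataLoader.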
I noticed the problem is not in the PyTorch DataLoader but seems to be in HF load_dataset, which does not find the cached dataset but still looks for it. If I reboot the machine it works again.
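One way to check whether the cache is really being reused is to time two consecutive loads and look at the cache directory. This is only a rough sketch: hf_datasets_cache_dir, timed_load and report are hypothetical helpers, and the cache location logic assumes the usual defaults (the HF_DATASETS_CACHE and HF_HOME environment variables, falling back to ~/.cache/huggingface/datasets), which may differ on your setup.

```python
import os
import time
from pathlib import Path


def hf_datasets_cache_dir():
    """Best guess at the HF datasets cache location (assumed defaults)."""
    if "HF_DATASETS_CACHE" in os.environ:
        return Path(os.environ["HF_DATASETS_CACHE"])
    hf_home = os.environ.get("HF_HOME", Path.home() / ".cache" / "huggingface")
    return Path(hf_home) / "datasets"


def timed_load(name):
    """Time a single load_dataset call; needs the `datasets` package."""
    from datasets import load_dataset

    start = time.perf_counter()
    ds = load_dataset(name)
    return ds, time.perf_counter() - start


def report(name="fashion_mnist"):
    cache = hf_datasets_cache_dir()
    print("cache dir:", cache, "| exists:", cache.exists())
    # If the cache is reused, the second load should be much faster.
    for attempt in (1, 2):
        _, secs = timed_load(name)
        print(f"load {attempt}: {secs:.1f}s")
```

If both loads in report() take about the same (long) time, the cached copy is probably not being picked up, which would match what is described above.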
You can push the interrupt button in Jupyter and see where your code is hanging. If you want to experiment with what is happening, you can call %debug in the next cell. Hope it helps
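If interrupting by hand is awkward, a different technique (not the interrupt button or %debug mentioned above) is Python's built-in faulthandler module, which can dump the stack of every thread once a call runs longer than a timeout. A minimal sketch, where run_with_hang_dump and the log file name are made up for illustration:

```python
import faulthandler


def run_with_hang_dump(fn, timeout=30.0, log_path="hang_trace.log"):
    """Run fn(); if it is still running after `timeout` seconds, write the
    stack traces of all threads to `log_path` and keep going."""
    with open(log_path, "w") as log:
        # The log file has to stay open until the watchdog is cancelled.
        faulthandler.dump_traceback_later(timeout, file=log)
        try:
            return fn()
        finally:
            faulthandler.cancel_dump_traceback_later()
```

For example, run_with_hang_dump(lambda: next(iter(dl)), timeout=20), where dl is your own DataLoader, should leave a traceback in hang_trace.log showing which line the main thread was stuck on. It only covers the current process, so it will not show what DataLoader worker processes are doing.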