Sometimes when I use the DataLoader it just takes forever (minutes without loading a single batch). Most of the time the same code in the same environment works fine (a few seconds).
When it is slow, it usually goes back to normal if I just restart the environment. Other times, I need to reboot the machine. This is so weird. Any idea what is going on? The problem is that the behavior is not consistent, so I don't know how to debug this.
How can I do that? Is there a YouTube video you could point me to that shows how? Just to clarify: the loader seems to be working, it keeps running and the time goes by. Something that normally runs in 4 seconds keeps running for minutes.
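To put a number on "seconds vs. minutes", one option is to time only the first batch, since that is where the long wait shows up. This is a minimal sketch, not a PyTorch utility: `time_first_batch` is a hypothetical helper, and it works on any iterable, including a `DataLoader`. Creating the iterator is included in the measurement because that is when a multi-worker `DataLoader` starts its worker processes.

```python
import time
from typing import Any, Iterable, Tuple


def time_first_batch(loader: Iterable[Any]) -> Tuple[Any, float]:
    """Return the first batch from `loader` and how many seconds it took to get it."""
    start = time.perf_counter()
    # iter() is timed too: for a DataLoader with num_workers > 0 this is
    # where the worker processes get started.
    batch = next(iter(loader))
    elapsed = time.perf_counter() - start
    return batch, elapsed


if __name__ == "__main__":
    # Stand-in for a real DataLoader; any iterable works the same way.
    fake_loader = [[1, 2, 3], [4, 5, 6]]
    batch, seconds = time_first_batch(fake_loader)
    print(f"first batch {batch} after {seconds:.3f}s")
```

Running it on a fresh session and again when the slowdown appears, and once with `num_workers=0` versus a higher value, can help narrow down whether the delay comes from worker startup or from the data source itself. That split is a suggestion, not something confirmed in this thread.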
I've noticed that on my machine it seems to re-download the dataset / rewrite the cache every time, seemingly ignoring the download_mode and ignore_verification settings.
Loading fashion_mnist takes 6 minutes on Python 3.8 and 12 minutes on Python 3.9.
I noticed the problem is not in the PyTorch DataLoader; it seems to be in HF load_dataset, which does not find the cached dataset but still keeps looking for it. If I reboot the machine, it works again.
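One way to check whether the cache is actually there between sessions is to look at the `datasets` cache folder directly. This is a rough sketch using only the standard library; the helpers are hypothetical, and the path resolution assumes the usual defaults (`HF_DATASETS_CACHE` if set, else `$HF_HOME/datasets`, else `~/.cache/huggingface/datasets`). Folder names inside the cache differ between `datasets` versions, so a substring match on the dataset name is used as a heuristic.

```python
import os
from pathlib import Path
from typing import List, Optional


def datasets_cache_dir() -> Path:
    """Best-guess location of the Hugging Face `datasets` cache (assumed defaults)."""
    explicit = os.environ.get("HF_DATASETS_CACHE")
    if explicit:
        return Path(explicit).expanduser()
    hf_home = os.environ.get("HF_HOME")
    if hf_home:
        return Path(hf_home).expanduser() / "datasets"
    # Fall back to $XDG_CACHE_HOME, then ~/.cache.
    xdg = os.environ.get("XDG_CACHE_HOME")
    base = Path(xdg) if xdg else Path.home() / ".cache"
    return base / "huggingface" / "datasets"


def find_cached_entries(name: str, cache_dir: Optional[Path] = None) -> List[Path]:
    """List top-level cache folders whose name contains `name` (heuristic match)."""
    root = cache_dir if cache_dir is not None else datasets_cache_dir()
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.is_dir() and name in p.name)


if __name__ == "__main__":
    root = datasets_cache_dir()
    print(f"cache dir: {root}")
    for entry in find_cached_entries("fashion_mnist", root):
        print("  found:", entry)
```

If the folder is present but load_dataset still rebuilds it, one thing worth comparing is whether the cache path (environment variables or a `cache_dir` argument) is the same in the slow session and the fast one. That is a guess at a possible cause, not a diagnosis.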
You can press the interrupt button in Jupyter and see from the traceback where your code is hanging. If you want to experiment with what is happening, you can run %debug in the next cell. Hope it helps!
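Outside Jupyter there is no interrupt button. A different technique (swapped in here, not part of the answer above) is Python's standard `faulthandler` module, which can print the stack of every thread if a block is still running after a timeout, without stopping it. A minimal sketch; `dump_stacks_if_slow` is a hypothetical helper name.

```python
import faulthandler
import sys
from contextlib import contextmanager
from typing import IO, Iterator, Optional


@contextmanager
def dump_stacks_if_slow(timeout_s: float, file: Optional[IO[str]] = None) -> Iterator[None]:
    """Dump all thread stacks if the body is still running after timeout_s seconds.

    The body is not interrupted; the dump only shows where it currently is.
    """
    # faulthandler writes directly to the file descriptor, so the target
    # must have a real fileno() (a terminal or a file opened on disk).
    target = file if file is not None else sys.stderr
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=target, exit=False)
    try:
        yield
    finally:
        # Disarm the watchdog once the body finishes (or raises).
        faulthandler.cancel_dump_traceback_later()


# Hypothetical usage in a plain script (`loader` is your own DataLoader):
#     with dump_stacks_if_slow(30.0):
#         batch = next(iter(loader))
```

The dump covers the threads of the current process only, so it shows where the main process is waiting but not what DataLoader worker subprocesses are doing. In a notebook, passing a file opened on disk is safer than the default, since the notebook's stderr may not have a real file descriptor.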