How to debug miniai DataLoader?

fredguth · February 3, 2023, 2:19pm

Sometimes when I try to use the DataLoader it just takes forever (minutes without loading a single batch). Most of the time the same code in the same environment works fine (a few seconds).

When it is slow, it normally gets back to normal by just restarting the environment. Other times, I need to reboot the machine. This is so weird. Any idea what is going on? The problem is that the behavior is not always the same. I don’t know how to debug this.

dhoa · February 3, 2023, 3:15pm

If you interrupt your notebook kernel, can you see from the stack trace where it is hanging?

fredguth · February 3, 2023, 3:26pm

How can I do that? Is there a YT video you could point me to that shows how? Just to clarify: the loader seems to be working, it just keeps running while the time goes by. Something that normally runs in 4 seconds keeps running for minutes.

AllenK · February 4, 2023, 12:08am

I’ve noticed that on my machine it seems to re-download the dataset or rewrite the cache every time, seemingly ignoring the download_mode & ignore_verification settings.
Loading fashion_mnist takes 6 minutes on Py3.8 and 12 minutes on Py3.9.

I’d love to know why it is so slow.

jeremy · February 4, 2023, 2:26am

miniai doesn’t have its own DataLoader - it just uses the PyTorch one.

The easiest way to debug it is to set num_workers=0. Exceptions often cause hangs in DataLoader when using multiple workers.
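To tell a slow loader from a hung one, it can help to time how long the first batch takes. Here is a minimal sketch of that idea. The helper name `time_first_batch` is made up for illustration, and a plain list stands in for the loader so the snippet runs without PyTorch installed.

```python
import time
from typing import Any, Iterable, Tuple


def time_first_batch(loader: Iterable[Any]) -> Tuple[Any, float]:
    """Fetch the first item from `loader` and report how many seconds it took."""
    start = time.perf_counter()
    batch = next(iter(loader))  # for a DataLoader, this is where the loading work starts
    return batch, time.perf_counter() - start


# Stand-in for a real loader (assumption: any iterable of batches works the same way).
batches = [[0, 1], [2, 3]]
first, seconds = time_first_batch(batches)
print(f"first batch {first} took {seconds:.4f}s")
```

With PyTorch you would pass something like `DataLoader(ds, batch_size=..., num_workers=0)` instead of the list. With `num_workers=0` the loading runs in the main process, so an exception should show up as a normal traceback rather than a hang.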

fredguth · February 5, 2023, 12:34pm

I noticed the problem is not in the PyTorch DataLoader but seems to be in HF load_dataset, which does not find the cached dataset but still keeps looking for it. If I reboot the machine it works again.

dhoa · February 5, 2023, 9:33pm

You can push the interrupt button in Jupyter and see where your code is hanging. If you want to experiment with what is happening, you can call %debug in the next cell. Hope it helps.
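If interrupting by hand is awkward, a related option (not the interrupt button itself, but the standard library's `faulthandler` module) is to have Python write the stack trace to a file once a step runs longer than expected. This is only a sketch: the helper `dump_stack_if_slow`, the file name, and `slow_step` are made up for illustration, and `slow_step` stands in for whatever call is hanging.

```python
import faulthandler
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def dump_stack_if_slow(seconds: float, path: str):
    """Write all thread stacks to `path` if the wrapped block runs longer than `seconds`."""
    with open(path, "w") as f:
        # faulthandler needs a real file (with a file descriptor) to write to.
        faulthandler.dump_traceback_later(seconds, file=f)
        try:
            yield path
        finally:
            # Cancel the pending dump if the block finished in time.
            faulthandler.cancel_dump_traceback_later()


def slow_step():
    # Hypothetical stand-in for the call that hangs (e.g. fetching a batch).
    time.sleep(0.5)


trace_file = os.path.join(tempfile.gettempdir(), "hang_trace.txt")
with dump_stack_if_slow(0.1, trace_file) as trace_path:
    slow_step()

with open(trace_path) as f:
    trace = f.read()
print(trace)  # shows which function the main thread was in when the timer fired
```

The trace is written to a file rather than to the notebook output because `faulthandler` needs a real file descriptor, which the notebook's output stream may not provide.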