Dataloader on Windows with num_workers>0

I can run fastai locally just fine with num_workers = 0 on Windows, but that is painfully slow. The GPU load keeps fluctuating and training takes 3~4x longer than it should.

Is there any tutorial, guide, or set of best practices for running fastai on Windows? I am not asking about installation, CUDA, or anything like that. My only issue is with the dataloader.

On a side note, I managed to run almost the same pipeline using PyTorch Lightning by wrapping my call inside

if __name__ == '__main__':
    main_train_loop()

but this didn’t work with fastai for me. I keep getting pickling errors related to my augmentation functions (I am not using any lambda functions).
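
In case it helps narrow things down, here is a minimal sketch of how one might check which callables fail to pickle. It assumes the errors come from Windows starting worker processes with "spawn", which means anything handed to the workers has to be picklable. The function names below are hypothetical stand-ins, not my actual augmentations, and the sketch uses only the standard library, not fastai itself.

```python
import pickle


def top_level_aug(x):
    # Hypothetical augmentation defined at module level.
    # pickle stores functions by qualified name, so this form is importable by a worker.
    return x * 2


def make_nested_aug():
    # Hypothetical factory: the inner function lives in a local scope,
    # so pickle cannot look it up by name and refuses to serialize it.
    def nested_aug(x):
        return x * 2
    return nested_aug


def is_picklable(obj):
    # Return True if pickle can serialize obj, False otherwise.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == '__main__':
    candidates = {
        'top_level_aug': top_level_aug,
        'nested_aug': make_nested_aug(),
    }
    for name, fn in candidates.items():
        print(name, 'picklable:', is_picklable(fn))
```

If a check like this flags one of the augmentation functions, moving it to the top level of an importable module (rather than defining it inside another function or a notebook cell) would be the first thing to try, though I have not confirmed that this is the cause in my case.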

Thanks in advance. Any tips would be helpful.