I’m interested in training some models from scratch on ImageNet, and I’d like it to be as fast as possible. I have access to some good machines with multiple GPUs and lots of RAM. I’d like to be able to run the training on a single machine, probably on 4 GPUs (or maybe 8). Unfortunately they aren’t Volta GPUs, so I probably can’t use half-precision to good effect (although I may be able to get access to a machine with Voltas if I have to).
I’ve read the blog posts about the DAWNBench challenges, and I’ve looked at this repo. I’m wondering if there are any new best practices or things people have done in the last year that make use of updated PyTorch/fastai libraries or other new approaches for fast training.
I’d also welcome any advice or experiences people have had in doing things like this. I’d like to keep extra development and fiddling with things like the distributed training process to a minimum, if at all possible. Thanks in advance for any help/thoughts on this!
Also check out this thread, which looks at reproducing some work on improving the DAWNBench speed. There aren’t many concrete results in the thread, but the original work should be worth a look. Any contributions to the thread would also be great. It seems a few people are interested in optimising training times, so collaboration could be good (and needn’t involve big code changes, just sharing experiences would help)…
I was playing around earlier today and noticed that training was taking a really long time (using 1 GPU, just to get a sense of things). It seems that the data loading is a bottleneck. I went looking around and found someone pointing to the tensorpack DataFlow library as a faster alternative to the PyTorch dataloaders. I think I’m going to try it out and see if it makes a difference.
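For anyone wanting to check whether data loading really is the bottleneck before switching libraries, here is a rough sketch of how one might measure it. It is not from any of the libraries mentioned; `profile_loading`, `fake_batches` and `step_fn` are made-up names for illustration. It uses only the standard library and works on any iterable of batches, so a PyTorch `DataLoader` could presumably be passed in place of the fake generator.

```python
import time


def profile_loading(batches, step_fn, max_batches=50):
    """Split wall-clock time into 'waiting for the next batch' and 'running the step'.

    batches: any iterable that yields batches (hypothetically, a DataLoader).
    step_fn: callable that does the per-batch work (forward/backward/etc.).
    """
    load_s = 0.0
    step_s = 0.0
    n = 0
    it = iter(batches)
    while n < max_batches:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent here is time blocked on data
        except StopIteration:
            break
        t1 = time.perf_counter()
        # Assumption: step_fn blocks until its work is done. On a GPU, kernels
        # launch asynchronously, so step_fn should end with
        # torch.cuda.synchronize() or the step time will be under-counted.
        step_fn(batch)
        t2 = time.perf_counter()
        load_s += t1 - t0
        step_s += t2 - t1
        n += 1
    total = load_s + step_s
    return {
        "batches": n,
        "load_s": load_s,
        "step_s": step_s,
        "load_fraction": load_s / total if total > 0 else 0.0,
    }


def fake_batches(n, delay_s):
    """Stand-in for a slow data pipeline: each batch takes delay_s to produce."""
    for i in range(n):
        time.sleep(delay_s)
        yield [i, i + 1, i + 2]


# Toy run: loading is deliberately slow and the "training step" is trivial.
stats = profile_loading(fake_batches(5, 0.01), step_fn=sum)
print(stats)
```

If `load_fraction` comes out close to 1, the loop is mostly waiting on data, and a faster loader (or more loader workers) is likely to help; if it is close to 0, the GPU work dominates and changing the data pipeline probably won’t make much difference.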