As you know, Amazon offers P3 instances with an enormous amount of GPU compute. As Jeremy mentioned in one of his posts about competing with other companies on training speed, they were able to train an ImageNet model in approximately 3 hours for about $70.
So my question is: what do I need to know, from a programming point of view, to train my own ImageNet weights on that kind of AWS instance? Do I just write the usual training loop, as I would for a single-GPU instance, and upload the program/container to the host, or do I have to handle distributing the model across the GPUs myself? I know that TensorFlow has some kind of dispatcher that allows running a model in parallel on several chips, but does anybody have a real code example that I could use to train a model on 8 GPUs? That is: upload the program to the host, download ImageNet, configure paths and variables, and run the computation?
I would like to train my own ImageNet model using TensorFlow/Keras or PyTorch, but I'm not sure where to start. So far I have only used a single GPU on my local machine, and I'm not sure what difficulties to expect when handling distributed models.
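To make the question more concrete, here is my current mental model of what "distributing across 8 GPUs" would have to do, written as a toy pure-Python sketch. It assumes synchronous data parallelism: each worker gets an equal shard of the batch, computes a gradient on its shard, and the averaged gradient updates one shared weight. Nothing here touches a GPU or any framework API; the one-parameter model and the names `grad_mse`, `shard`, and `data_parallel_step` are made up for illustration, and real frameworks may well organize this differently.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair


def grad_mse(w: float, batch: List[Sample]) -> float:
    """Gradient of the mean squared error for the toy model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def shard(batch: List[Sample], n_workers: int) -> List[List[Sample]]:
    """Split the batch into n_workers equal shards (hypothetical helper)."""
    if len(batch) % n_workers != 0:
        raise ValueError("batch size must be divisible by n_workers")
    size = len(batch) // n_workers
    return [batch[i * size:(i + 1) * size] for i in range(n_workers)]


def data_parallel_step(w: float, batch: List[Sample],
                       n_workers: int = 8, lr: float = 0.005) -> float:
    """One synchronous step: per-shard gradients, averaged, then one update."""
    # Each "worker" (standing in for one GPU) sees only its own shard.
    local_grads = [grad_mse(w, s) for s in shard(batch, n_workers)]
    # Stand-in for the all-reduce that averages gradients across devices.
    avg_grad = sum(local_grads) / n_workers
    return w - lr * avg_grad


# Toy data generated from y = 3 * x, 16 samples so 8 workers get 2 each.
batch = [(float(i), 3.0 * float(i)) for i in range(1, 17)]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, batch)
```

With equal shard sizes, the averaged per-shard gradient matches the full-batch gradient, so in this toy case the 8-worker step behaves like a single-device step on the whole batch. What I can't tell is how much of this bookkeeping the frameworks do for me on a real 8-GPU instance.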