How to use Multiple GPUs?

For anything more than 2 GPUs, PCIe lanes become an important factor.
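For a rough sense of scale: PCIe 3.0 x16 tops out around 16 GB/s per direction, and most single-socket CPUs expose only 40-48 lanes, so with 4 or more cards each GPU often falls back to x8 (roughly 8 GB/s), which is where gradient exchange between GPUs can start to dominate step time.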

Also, I have experimented a bit with different GPU counts on Google GCP, and anything beyond 2 GPUs in parallel seemed handicapped by PCIe lane speed on the 8x V100 instances. The NVLink topology there was laid out in a way that did not help scaling to 4 GPUs in parallel. Of course, I could still get a better speedup by running 4 separate models across the 8 GPUs (4 models, each on its own 2-GPU pair); that pattern is sketched below.
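As an illustration of the "4 separate models, 2 GPUs each" setup, here is a minimal PyTorch sketch. The model is a placeholder and this is just the general pattern, not my actual training code:

```python
import torch
import torch.nn as nn

def make_model():
    # Placeholder network; substitute your own.
    return nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))

# 4 independent replicas, one per GPU pair: (0,1), (2,3), (4,5), (6,7).
replicas = []
for first in range(0, 8, 2):
    pair = [first, first + 1]
    model = make_model().cuda(pair[0])  # parameters live on the pair's first GPU
    # DataParallel splits each batch across only these two GPUs, so the
    # gradient traffic stays inside the pair instead of crossing all 8 cards.
    replicas.append(nn.DataParallel(model, device_ids=pair))
```

Each replica then trains on its own data shard in its own process or thread, with no cross-pair synchronization.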

I don't know what the AWS V100 NVLink topology looks like, but I guess there are topologies better tuned for DL training. Specifically, the DGX-2 topology seems better than GCP's.
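On any instance you can inspect the actual topology yourself: `nvidia-smi topo -m` prints the GPU-to-GPU connection matrix. A small Python wrapper, assuming the NVIDIA driver is installed:

```python
import subprocess

# In the printed matrix, NV1/NV2/... entries mean that GPU pair is linked
# over NVLink; PIX/PHB/SYS mean the traffic goes over PCIe (and possibly
# through the CPU or across sockets).
out = subprocess.run(["nvidia-smi", "topo", "-m"],
                     capture_output=True, text=True, check=True)
print(out.stdout)
```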

Here are my 3 posts with the details of my 8-GPU analysis on GCP.

I wonder how you connected 6 GPUs. What motherboard and CPU do you have, and how many PCIe lanes?