Model Parallelism and VRAM pooling

Hi folks. I just wanted to ask whether any of you have had the chance to run experiments with consumer RTX cards and NVLink.

There are a lot of articles comparing performance with and without NVLink, but AFAIK they all cover data parallelism (e.g. PyTorch's DataParallel), and they show very modest speedups with NVLink compared to the PCI Express bus.

However, true model parallelism (where the graph is split across multiple GPUs) seems more interesting when it comes to NVLink, particularly if the bridge allows the memory to be pooled, so that the model effectively sees a single, 'big' GPU. For concreteness, here's a minimal sketch of the kind of split I mean (made-up layer sizes, assumes two visible GPUs, not a benchmark):
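```python
import torch
import torch.nn as nn

class TwoGPUNet(nn.Module):
    """First half of the graph on GPU 0, second half on GPU 1."""
    def __init__(self):
        super().__init__()
        self.part1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.part2 = nn.Sequential(nn.Linear(4096, 10)).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        # the activation crosses the GPU-to-GPU link here (NVLink if bridged)
        return self.part2(x.to("cuda:1"))

model = TwoGPUNet()
out = model(torch.randn(32, 1024))  # output (and any targets) live on cuda:1
```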

Thank you.

Did you find out more about this? I have two Titan RTXs with NVLink and I would love to be able to share memory.


Note that model parallelism doesn't offer a speed-up per se; rather, it is a way to train a model too big to fit on a single GPU. If the model fits on a single GPU, splitting it across several GPUs on a single host will actually slow training down, because the GPUs execute the stages serially. The tutorial works around this by pipelining micro-batches (rough sketch below). See https://pytorch.org/tutorials/intermediate/model_parallel_tutorial.html#single-machine-model-parallel-best-practices.
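Here is roughly the pipelining idea from that tutorial, assuming a model split into a `part1` on `cuda:0` and a `part2` on `cuda:1` like the sketch above (split size and shapes are placeholders):

```python
import torch

def pipelined_forward(model, x, split_size=8):
    # Split the batch into micro-batches so both GPUs work concurrently:
    # while cuda:1 finishes micro-batch i, cuda:0 already starts i+1.
    splits = iter(x.split(split_size, dim=0))
    s_prev = model.part1(next(splits).to("cuda:0")).to("cuda:1")
    outputs = []
    for s_next in splits:
        outputs.append(model.part2(s_prev))                    # runs on cuda:1
        s_prev = model.part1(s_next.to("cuda:0")).to("cuda:1")  # runs on cuda:0
    outputs.append(model.part2(s_prev))
    return torch.cat(outputs, dim=0)
```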

But for data parallelism, this post suggests NVLink may help.
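For completeness, a minimal DataParallel sketch (placeholder shapes): the model is replicated on both GPUs, each replica processes a slice of the batch, and outputs/gradients are gathered back on the default device. That inter-GPU traffic is where the extra NVLink bandwidth could matter.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10))
model = nn.DataParallel(model, device_ids=[0, 1]).to("cuda:0")

x = torch.randn(64, 1024, device="cuda:0")  # batch is scattered across the GPUs
out = model(x)                              # outputs gathered back on cuda:0
loss = out.sum()
loss.backward()                             # gradients reduced onto cuda:0
```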


There are also different models of parallelism, e.g. master-slave, work pool, and task parallelism.

Ref: https://www.nileshblog.tech/question/parallel-algorithm-models-data-task-work-pool-and-master-slave-model/