Setting up a GPU cluster with Horovod

Has anyone tried setting up a cluster of GPU machines using Uber’s Horovod?

I’m curious to hear about people’s experiences with this, for example, things you wish you had known that weren’t clearly spelled out in the documentation.

I know this is a super late response, but we’ve done quite a bit with Horovod using both TensorFlow/Keras and PyTorch. It was really awesome! We clustered together 100 V100s and it went very smoothly. Powers of 2 tend to scale more efficiently, but 100 GPUs running at somewhat lower efficiency still beat 64 GPUs at “peak” efficiency. I can’t remember the exact numbers, but there was a not-insignificant loss of efficiency per GPU; overall, though, the additional parallelism made it worthwhile. I’ve been interested in seeing how well a fastai model would do, and exactly what it would take, but unfortunately haven’t been able to make time for it yet.
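To make the “100 less-efficient GPUs vs. 64 at peak” trade-off concrete, here is a back-of-the-envelope sketch. It assumes aggregate throughput is roughly GPU count × single-GPU throughput × scaling efficiency (a simple weak-scaling model with a fixed per-GPU batch size). The per-GPU rate and both efficiency values are made-up placeholders for illustration, not numbers from the runs described above.

```python
def effective_throughput(num_gpus: int, per_gpu_rate: float, scaling_efficiency: float) -> float:
    """Approximate aggregate samples/sec under a simple linear-scaling model.

    scaling_efficiency is the fraction of single-GPU throughput each GPU
    retains once gradient communication overhead is accounted for.
    """
    return num_gpus * per_gpu_rate * scaling_efficiency


def break_even_efficiency(n_small: int, eff_small: float, n_large: int) -> float:
    """Efficiency the larger cluster needs just to match the smaller one."""
    return n_small * eff_small / n_large


# Hypothetical values for illustration only; not measured results.
PER_GPU_RATE = 1000.0  # assumed samples/sec for one V100 on some model
EFF_64 = 0.90          # assumed "peak" efficiency for a power-of-2 cluster
EFF_100 = 0.75         # assumed lower efficiency for 100 GPUs

small = effective_throughput(64, PER_GPU_RATE, EFF_64)
large = effective_throughput(100, PER_GPU_RATE, EFF_100)

print(f"64 GPUs:  {small:,.0f} samples/sec")
print(f"100 GPUs: {large:,.0f} samples/sec")
print(f"100 GPUs win as long as efficiency stays above "
      f"{break_even_efficiency(64, EFF_64, 100):.1%}")
```

Under these assumed numbers, the 100-GPU cluster only has to keep each GPU above roughly 58% efficiency to outrun 64 GPUs at 90%, which is why a noticeable per-GPU loss can still be worth it.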