How to use multiple GPUs

Has anybody out there figured out how to use multiple GPUs with language_model_learner in fast.ai version 2? Links to code examples would really help.

I am using a Jupyter notebook on AzureML with 4 GPUs, but only 1 GPU gets utilized. I am trying to adapt the code from https://github.com/fastai/fastbook/blob/master/10_nlp.ipynb to work with multiple GPUs.

For example, my changes look like this (to_parallel comes from fastai.distributed):
from fastai.distributed import *
learn.model.cuda()                       # move the model to the default GPU
dls_lm.cuda()                            # move the DataLoaders' batches to GPU
learn.to_parallel(device_ids=[0,1,2,3])  # wrap the model in nn.DataParallel

I then encounter: “RuntimeError: Input and hidden tensors are not at the same device, found input tensor at cuda:1 and hidden tensor at cuda:0”.
As I understand it, DataParallel scatters the input batch across all four GPUs, but the model's hidden state stays on cuda:0, so the replicas on the other GPUs fail. Whatever I try, I still see only 1 GPU being used.
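To convince myself I'm reading the error right, I sketched the scatter step in plain Python (no GPU or torch needed; the device strings and the hidden-state dict are just stand-ins for real tensors, not fastai or PyTorch objects):

```python
# A torch-free sketch of my understanding of DataParallel's scatter step.
# "cuda:0".."cuda:3" are simulated device names; the RNN hidden state is
# a plain dict entry rather than a real tensor.

def scatter(batch, devices):
    """Split a batch into one chunk per device, tagging each chunk
    with the device it would be copied to."""
    chunk = max(1, len(batch) // len(devices))
    shards = []
    for i, dev in enumerate(devices):
        part = batch[i * chunk:(i + 1) * chunk]
        if part:
            shards.append({"device": dev, "data": part})
    return shards

def rnn_forward(shard, hidden):
    """Mimic an RNN layer with a cached hidden state: raise the same
    kind of mismatch PyTorch reports when the scattered input and the
    cached hidden state live on different devices."""
    if hidden["device"] != shard["device"]:
        raise RuntimeError(
            "Input and hidden tensors are not at the same device, "
            f"found input tensor at {shard['device']} "
            f"and hidden tensor at {hidden['device']}"
        )
    return sum(shard["data"])

devices = ["cuda:0", "cuda:1", "cuda:2", "cuda:3"]
hidden = {"device": "cuda:0"}           # hidden state initialized once, on GPU 0
shards = scatter(list(range(8)), devices)

print(rnn_forward(shards[0], hidden))   # fine: input and hidden both on cuda:0
try:
    rnn_forward(shards[1], hidden)      # input on cuda:1, hidden on cuda:0
except RuntimeError as e:
    print("RuntimeError:", e)
```

If this mental model is right, the fix presumably has to move (or re-create) the hidden state on each replica's device, which is what I don't know how to do through the fast.ai API.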

Any pointers on getting fast.ai to use all four GPUs would be really appreciated.