GPU memory leakage when training LM or text classification models

I have 2 GPUs on my local DL rig and I'm selecting which one I want to use like so:

import torch

torch.cuda.set_device(1)
print(f'Using GPU #{torch.cuda.current_device()}')
# => Using GPU #1

Everything works fine until I start training the model or run learn.lr_find. At that point, nvidia-smi shows that, in addition to GPU #1 being used as expected, about 1 GB of memory gets allocated on GPU #0 as well.
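
In case it's useful, something like this should report what PyTorch itself has allocated on each device from inside the process (a minimal sketch; torch.cuda.memory_allocated only covers PyTorch's caching allocator, so it won't match the nvidia-smi numbers exactly):

import torch

# Report PyTorch's own allocations per visible device.
# Note: this only covers the caching allocator, not the overhead of a
# bare CUDA context, which nvidia-smi does include.
for i in range(torch.cuda.device_count()):
    mb = torch.cuda.memory_allocated(i) / 1024 ** 2
    print(f'GPU #{i}: {mb:.1f} MB allocated by this process')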

Anyway, this is causing problems because another developer is using GPU #0.

Any ideas on how to fix this?