Bug? Multi-GPU training RuntimeError when a tensor is passed into the loss function init

Does anyone know how to troubleshoot a RuntimeError during multi-GPU training when passing a weight tensor to initialize CrossEntropyLossFlat or BCEWithLogitsLossFlat?

Exception
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper_CUDA_nll_loss_forward)

Code

from fastai.vision.all import *
from fastai.text.all import *
from fastai.tabular.all import *
from fastai.collab import *
from accelerate import notebook_launcher
from fastai.distributed import *

path = untar_data(URLs.PETS)/'images'

def train():
    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path), valid_pct=0.2,
        label_func=lambda x: x[0].isupper(), item_tfms=Resize(224))

    wgt = tensor([1.,2.]) # <---- This breaks multi-gpu training
    learn = vision_learner(dls, resnet34, metrics=error_rate, loss_func=CrossEntropyLossFlat(weight=wgt)).to_fp16()

    with learn.distrib_ctx(in_notebook=True, sync_bn=False):
        learn.fine_tune(1)

notebook_launcher(train, num_processes=4)

Observations:

  1. Setting wgt = None results in successful training.
  2. Setting wgt = tensor([1.,2.]).to(1) results in the same error, but strangely it now complains about ...found at least two devices, cuda:2 and cuda:0!, and cuda:2 is not even wgt’s assigned device!
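One thing I am considering trying, based on observation 2: since notebook_launcher runs the same train() in every process, a fixed .to(1) pins the weight to cuda:1 on all ranks, while each rank's model lives on its own GPU. So the weight might need to be built per process, on that process's own current device. A rough sketch of the idea (untested in the multi-GPU setup; make_weight is my own helper, not a fastai API, and the CPU fallback is just so it runs anywhere):

```python
import torch
import torch.nn.functional as F

# Hypothetical fix: construct the class-weight tensor inside train(), on
# whichever CUDA device the current process owns, instead of a tensor that
# is captured on CPU or pinned to one fixed GPU.
def make_weight(vals):
    if torch.cuda.is_available():
        # Each process launched by notebook_launcher should have its own
        # current CUDA device (rank 0 -> cuda:0, rank 1 -> cuda:1, ...).
        device = torch.device("cuda", torch.cuda.current_device())
    else:
        device = torch.device("cpu")  # fallback so the sketch runs without a GPU
    return torch.tensor(vals, device=device)

wgt = make_weight([1.0, 2.0])

# Sanity check: with logits/targets on the same device as the weight,
# the weighted cross-entropy computes without a device-mismatch error.
logits = torch.randn(4, 2, device=wgt.device)
targets = torch.randint(0, 2, (4,), device=wgt.device)
loss = F.cross_entropy(logits, targets, weight=wgt)
```

In the original code this would mean replacing wgt = tensor([1.,2.]) with wgt = make_weight([1., 2.]) inside train(), so each spawned process gets a weight on its own GPU. I have not confirmed this works, so corrections welcome.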