I trained a simple model on a multi-GPU server and, to manage resources, I used torch.cuda.set_device(1) during training.
Then I exported the model and loaded it on another server (the inference server), where I realised I couldn't load the model on cuda:0. So I wrote my own load_learner using the map_location argument:
```python
def my_load_learner(fname, cpu=True, pickle_module=pickle, map_location='cuda:0'):
    "Load a `Learner` object in `fname`, optionally putting it on the `cpu`"
    distrib_barrier()
    res = torch.load(fname, map_location='cpu' if cpu else map_location,
                     pickle_module=pickle_module)
    if hasattr(res, 'to_fp32'): res = res.to_fp32()
    if cpu: res.dls.cpu()
    return res
```
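For context, here is a minimal, self-contained sketch of how torch.load's map_location works. It accepts a device string, a dict that remaps storages (e.g. `{'cuda:1': 'cuda:0'}`), or a callable; the snippet below uses 'cpu' so it runs on any machine, and the file name `tmp_tensor.pt` is just an illustration:

```python
import torch

# Save a tensor, then load it remapped onto the CPU regardless of the
# device it originally lived on. On a GPU box you could instead pass
# map_location={'cuda:1': 'cuda:0'} to move cuda:1 storages to cuda:0.
torch.save(torch.arange(4), 'tmp_tensor.pt')
t = torch.load('tmp_tensor.pt', map_location='cpu')
print(t.device)
```

Note that map_location only remaps the tensors inside the checkpoint; anything that stores a device elsewhere (such as a Learner's DataLoaders) still has to be moved separately.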
This doesn't work when I try to predict with the loaded model. I get this error:
```
[…] File "/opt/conda/lib/python3.8/site-packages/torch/tensor.py", line 995, in __torch_function__
    ret = func(*args, **kwargs)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
```
Why can't I train on cuda:1 and then load the model on cuda:0 for prediction? Is there something I'm missing?