From what I can tell, this is the problem:
In data/load.py, in `__iter__`:

```python
# fix issue 2899. If the process start method isn't fork, the data will be copied to cuda in learner one_batch.
if self.device is not None and multiprocessing.get_start_method().lower() == "fork":
    b = to_device(b, self.device)
```
On Windows 10 the start method is `spawn`, not `fork`, so the condition is False and the batch is never moved to the device here. `one_batch` also does not get called before the error happens, I imagine because the transforms are running on the batch at this point and they expect the data to already be on the GPU.
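A quick standalone check (not fastai code) shows why the guard never fires on Windows:

```python
import multiprocessing

# On Windows the default start method is "spawn" (fork is not available),
# so the condition above evaluates to False and the batch stays on the CPU.
print(multiprocessing.get_start_method())            # "spawn" on Windows
print(multiprocessing.get_start_method() == "fork")  # False
```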
Changing it to this makes it work:
```python
if self.device is not None:
    b = to_device(b, self.device)
```
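For context, here is a rough standalone sketch of what the unconditional move buys you; `to_device` below is a minimal stand-in for fastai's helper, not the real implementation:

```python
import torch

def to_device(b, device):
    # Minimal stand-in for fastai's to_device: recursively move tensors in a batch.
    if isinstance(b, (list, tuple)):
        return type(b)(to_device(o, device) for o in b)
    return b.to(device) if isinstance(b, torch.Tensor) else b

device = "cuda" if torch.cuda.is_available() else "cpu"
batch = (torch.randn(4, 3), torch.tensor([0, 1, 2, 3]))

# With the unconditional check, the batch is moved before the batch transforms run,
# so GPU-side transforms see tensors on the device they expect.
batch = to_device(batch, device)
print([t.device for t in batch])  # both on `device`
```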