Make inference run on the GPU

I was stuck on this for a couple of days (weeks, even, though I abandoned the project in the meantime and am only now returning to it).

I am running a fairly complicated model and I want to use a fastai pre-trained model as part of it. The problem is that the model trains on the GPU, so shuffling data back and forth between GPU and CPU makes the whole thing impractical. Inference still does not run on the GPU despite setting cpu=False in load_learner.

To replicate:

  1. load any model e.g. camvid:
    learn = load_learner('stage1', cpu=False)
  2. create a test tensor:
    test_v1 = torch.tensor(np.ndarray((512,512,3))).to('cuda')
  3. learn.predict(test_v1) yields:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-2886587ca7d8> in <module>()
----> 1 learn.predict(test_v1)

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai2/learner.py in predict(self, item, rm_type_tfms)
    331     def predict(self, item, rm_type_tfms=None):
    332         dl = self.dls.test_dl([item], rm_type_tfms=rm_type_tfms)
--> 333         inp,preds,_,dec_preds = self.get_preds(dl=dl, with_input=True, with_decoded=True)
    334         i = getattr(self.dls, 'n_inp', -1)
    335         full_dec = self.dls.decode_batch((*tuplify(inp),*tuplify(dec_preds)))[0][i:]

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai2/learner.py in get_preds(self, ds_idx, dl, with_input, with_decoded, with_loss, act, **kwargs)
    319             for mgr in ctx_mgrs: stack.enter_context(mgr)
    320             self(_before_epoch)
--> 321             self._do_epoch_validate(dl=dl)
    322             self(_after_epoch)
    323             if act is None: act = getattr(self.loss_func, 'activation', noop)

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai2/learner.py in _do_epoch_validate(self, ds_idx, dl)
    278             dl,old,has = change_attrs(dl, names, [False,False])
    279             self.dl = dl;                                    self('begin_validate')
--> 280             with torch.no_grad(): self.all_batches()
    281         except CancelValidException:                         self('after_cancel_validate')
    282         finally:

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai2/learner.py in all_batches(self)
    246     def all_batches(self):
    247         self.n_iter = len(self.dl)
--> 248         for o in enumerate(self.dl): self.one_batch(*o)
    249 
    250     def one_batch(self, i, b):

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai2/data/load.py in __iter__(self)
     95         self.randomize()
     96         self.before_iter()
---> 97         for b in _loaders[self.fake_l.num_workers==0](self.fake_l):
     98             if self.device is not None: b = to_device(b, self.device)
     99             yield self.after_batch(b)

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
    817             else:
    818                 del self.task_info[idx]
--> 819                 return self._process_data(data)
    820 
    821     next = __next__  # Python 2 compatibility

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _process_data(self, data)
    844         self._try_put_index()
    845         if isinstance(data, ExceptionWrapper):
--> 846             data.reraise()
    847         return data
    848 

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/_utils.py in reraise(self)
    367             # (https://bugs.python.org/issue2651), so we work around it.
    368             msg = KeyErrorMessage(msg)
--> 369         raise self.exc_type(msg)

TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 34, in fetch
    data = next(self.dataset_iter)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai2/data/load.py", line 106, in create_batches
    yield from map(self.do_batch, self.chunkify(res))
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastcore/utils.py", line 270, in chunked
    res = list(itertools.islice(it, cs))
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai2/data/load.py", line 119, in do_item
    try: return self.after_item(self.create_item(s))
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai2/data/load.py", line 125, in create_item
    def create_item(self, s):  return next(self.it) if s is None else self.dataset[s]
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai2/data/core.py", line 265, in __getitem__
    res = tuple([tl[it] for tl in self.tls])
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai2/data/core.py", line 265, in <listcomp>
    res = tuple([tl[it] for tl in self.tls])
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai2/data/core.py", line 242, in __getitem__
    return self._after_item(res) if is_indexer(idx) else res.map(self._after_item)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai2/data/core.py", line 206, in _after_item
    def _after_item(self, o): return self.tfms(o)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastcore/transform.py", line 185, in __call__
    def __call__(self, o): return compose_tfms(o, tfms=self.fs, split_idx=self.split_idx)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastcore/transform.py", line 136, in compose_tfms
    x = f(x, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastcore/transform.py", line 71, in __call__
    def __call__(self, x, **kwargs): return self._call('encodes', x, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastcore/transform.py", line 82, in _call
    if self.use_as_item or not is_listy(x): return self._do_call(f, x, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastcore/transform.py", line 87, in _do_call
    return x if f is None else retain_type(f(x, **kwargs), x, f.returns_none(x))
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastcore/dispatch.py", line 98, in __call__
    return f(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai2/vision/core.py", line 87, in create
    if isinstance(fn,Tensor): fn = fn.numpy()
TypeError: can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
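
The last frame of the traceback is the relevant one: Tensor.numpy() only works on host memory, so a CUDA tensor has to be copied back with .cpu() first. A minimal sketch in plain PyTorch, without fastai (the helper name to_numpy_safe is made up for illustration, and the code falls back to the CPU when no GPU is present):

```python
import numpy as np
import torch


def to_numpy_safe(t: torch.Tensor) -> np.ndarray:
    """Copy a tensor to host memory (if needed) and return it as a NumPy array."""
    # .detach() drops autograd history; .cpu() returns the tensor unchanged
    # when it already lives on the CPU.
    return t.detach().cpu().numpy()


# Use the GPU when there is one, otherwise stay on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.zeros(4, 4, 3, device=device)

arr = to_numpy_safe(x)
print(type(arr), arr.shape)
```

This only removes the conversion error; the data still makes a round trip through host memory, which is exactly what the post is trying to avoid.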

Things I have tried:

  • I have looked into fastai2 and it does not have the defaults or config, unlike (it seems?) v1.
  • torch.device does not seem to help
  • setting self.learner.model = self.learner.model.to('cuda')
  • searching through the forums if anyone has had a similar issue

If there are any questions or details I could add to make this clearer, please let me know! :slight_smile:

I really do not feel like I (yet) know enough about the fastai2 internals to solve this on my own. So any help would be deeply appreciated!

First, are you doing batch inference or individual? :slight_smile:

Second, what does the following give you:

  • learn.model.device
  • learn.dls.device

So far I’d be happy getting either working. But of course, batch would be preferable.

learn.dls.device yields device(type='cuda', index=0)
learn.model.device does not work because it does not have attribute device
dir(learn.model) yields:

['_apply',
 '_backend',
 '_backward_hooks',
 '_buffers',
 '_construct',
 '_forward_hooks',
 '_forward_pre_hooks',
 '_get_name',
 '_load_from_state_dict',
 '_load_state_dict_pre_hooks',
 '_modules',
 '_named_members',
 '_parameters',
 '_register_load_state_dict_pre_hook',
 '_register_state_dict_hook',
 '_save_to_state_dict',
 '_slow_forward',
 '_state_dict_hooks',
 '_tracing_name',
 '_version',
 'add_module',
 'append',
 'apply',
 'buffers',
 'children',
 'cpu',
 'cuda',
 'double',
 'dump_patches',
 'eval',
 'extend',
 'extra_repr',
 'float',
 'forward',
 'half',
 'has_children',
 'insert',
 'layers',
 'load_state_dict',
 'modules',
 'named_buffers',
 'named_children',
 'named_modules',
 'named_parameters',
 'parameters',
 'register_backward_hook',
 'register_buffer',
 'register_forward_hook',
 'register_forward_pre_hook',
 'register_parameter',
 'requires_grad_',
 'sfs',
 'share_memory',
 'state_dict',
 'summary',
 'to',
 'train',
 'training',
 'type',
 'zero_grad']

Ah yes sorry :slight_smile:
Use next(learn.model.parameters()).is_cuda

(This says what device your model is on)
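
An nn.Module has no device attribute of its own, which is why learn.model.device fails; the device is a property of each parameter. A small sketch of the check, using a throwaway nn.Linear as a stand-in for learn.model (the helper name model_device is hypothetical):

```python
import torch
from torch import nn


def model_device(model: nn.Module) -> torch.device:
    """Return the device of the model's first parameter."""
    # Assumes the model has at least one parameter and that all
    # parameters live on the same device.
    return next(model.parameters()).device


model = nn.Linear(8, 2)  # stand-in for learn.model
print(model_device(model), next(model.parameters()).is_cuda)
```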

Oddly, next(learn.model.parameters()).is_cuda yields True

But still the same error.

So I am really confused. It seems that dls is on GPU. The model is on GPU. Is there any chance the transformations take it to CPU? Why would inference and training transformations behave differently? If anything, augmentations do not happen.

EDIT: I will re-run the whole thing in colab, just to make sure this is not just some issue with my environment and will report as soon as that is done.

What I can tell you is that batch inference should work fine (I do it all the time). I have not tried cuda .predict out though :confused:

Yeah I mean inference normally works for me too.

Ok, so I was re-running the inference in Colab, but the camvid example errored out on me, so I will come back to it tomorrow.

EDIT: So I was fighting with Colab for most of the morning and afternoon, but I kept getting this convoluted error no matter what fastai2 or pytorch version I had:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-3-c3b8c47cf081> in <module>()
----> 1 learn = load_learner('xihelm-sem-seg-6 (1)')
      2 test_v1 = torch.tensor(np.ndarray((512,512,3))).to('cuda')
      3 learn.predict(test_v1)

2 frames
/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in load_learner(fname, cpu)
    595 def load_learner(fname, cpu=True):
    596     "Load a `Learner` object in `fname`, optionally putting it on the `cpu`"
--> 597     res = torch.load(fname, map_location='cpu' if cpu else None)
    598     if hasattr(res, 'to_fp32'): res = res.to_fp32()
    599     if cpu: res.dls.cpu()

/usr/local/lib/python3.6/dist-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
    424         if sys.version_info >= (3, 0) and 'encoding' not in pickle_load_args.keys():
    425             pickle_load_args['encoding'] = 'utf-8'
--> 426         return _load(f, map_location, pickle_module, **pickle_load_args)
    427     finally:
    428         if new_fd:

/usr/local/lib/python3.6/dist-packages/torch/serialization.py in _load(f, map_location, pickle_module, **pickle_load_args)
    611     unpickler = pickle_module.Unpickler(f, **pickle_load_args)
    612     unpickler.persistent_load = persistent_load
--> 613     result = unpickler.load()
    614 
    615     deserialized_storage_keys = pickle_module.load(f, **pickle_load_args)

ModuleNotFoundError: No module named 'utils'

forgive me for jumping in - hope i don’t send you off in the weeds.

to me - your traceback makes it look like you are maybe missing an:

import utils

somewhere?


@313V great observation! Try doing !pip install utils @leaf (I only knew about that cause I needed that package the other day. Didn’t put 1+1)

Right, thank you both for your input but the utils is just a surrogate problem I feel.

Also note that this problem only occurs in Colab. I re-ran all of the above on my problem of interest and it yields the same results. So I am not sure what is not CUDA.

The target issue is here:

def create(cls, fn:(Path,str,Tensor,ndarray,bytes), **kwargs)->None:
        "Open an `Image` from path `fn`"
        if isinstance(fn,TensorImage): fn = fn.permute(1,2,0).type(torch.uint8)
        if isinstance(fn,Tensor): fn = fn.numpy()
        if isinstance(fn,ndarray): return cls(Image.fromarray(fn))
        if isinstance(fn,bytes): fn = io.BytesIO(fn)
        return cls(load_image(fn, **merge(cls._open_args, kwargs)))

EDIT:

So I guess while creating items the error occurs. But I guess my tensor is already on the GPU. However both:
self.model.predict(tensor.cpu())
and self.model.predict(tensor.cpu().numpy()) yield something like:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 34, in fetch
    data = next(self.dataset_iter)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai2/data/load.py", line 106, in create_batches
    yield from map(self.do_batch, self.chunkify(res))
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastcore/utils.py", line 268, in chunked
    res = list(itertools.islice(it, cs))
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai2/data/load.py", line 119, in do_item
    try: return self.after_item(self.create_item(s))
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai2/data/load.py", line 125, in create_item
    def create_item(self, s):  return next(self.it) if s is None else self.dataset[s]
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai2/data/core.py", line 265, in __getitem__
    res = tuple([tl[it] for tl in self.tls])
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai2/data/core.py", line 265, in <listcomp>
    res = tuple([tl[it] for tl in self.tls])
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai2/data/core.py", line 242, in __getitem__
    return self._after_item(res) if is_indexer(idx) else res.map(self._after_item)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai2/data/core.py", line 206, in _after_item
    def _after_item(self, o): return self.tfms(o)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastcore/transform.py", line 188, in __call__
    def __call__(self, o): return compose_tfms(o, tfms=self.fs, split_idx=self.split_idx)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastcore/transform.py", line 136, in compose_tfms
    x = f(x, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastcore/transform.py", line 71, in __call__
    def __call__(self, x, **kwargs): return self._call('encodes', x, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastcore/transform.py", line 82, in _call
    if self.use_as_item or not is_listy(x): return self._do_call(f, x, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastcore/transform.py", line 87, in _do_call
    return x if f is None else retain_type(f(x, **kwargs), x, f.returns_none(x))
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastcore/dispatch.py", line 98, in __call__
    return f(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai2/vision/core.py", line 88, in create
    if isinstance(fn,ndarray): return cls(Image.fromarray(fn))
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/PIL/Image.py", line 2647, in fromarray
    raise TypeError("Cannot handle this data type")
TypeError: Cannot handle this data type
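
Pillow's Image.fromarray looks up the array's dtype and trailing shape in a fixed table, and a three-channel array is only accepted as uint8. np.ndarray((512,512,3)) defaults to float64, which would explain the "Cannot handle this data type" error once the tensor is on the CPU. A sketch of the conversion in NumPy only (the helper name to_uint8_image is made up, and it assumes the float values are meant to lie in [0, 1]):

```python
import numpy as np


def to_uint8_image(arr: np.ndarray) -> np.ndarray:
    """Scale a float (H, W, 3) array with values in [0, 1] to uint8 in [0, 255]."""
    if arr.ndim != 3 or arr.shape[-1] != 3:
        raise ValueError(f"expected shape (H, W, 3), got {arr.shape}")
    # Clip first so out-of-range values cannot wrap around when cast to uint8.
    return (np.clip(arr, 0.0, 1.0) * 255).round().astype(np.uint8)


img = to_uint8_image(np.random.rand(512, 512, 3))
print(img.dtype, img.shape)
```

An array in this form is the kind of input Image.fromarray accepts for an RGB image.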

For some strange reason, I think the batch does not get loaded correctly; predict, and subsequently get_preds, is only getting one item:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/PIL/Image.py", line 2645, in fromarray
    mode, rawmode = _fromarray_typemap[typekey]
KeyError: ((1, 1, 224, 224), '<f4')

During handling of the above exception, another exception occurred:

EDIT: as a side-note, why is pred a Tensor rather than a function?

Okay I managed to replicate this end-2-end with a camvid example. I think the utils was just something weird with colab.


Aha! I think I am closer:
dset = Datasets(test_v1) causes this issue (assuming that is what happens under the hood: slicing (3,512,512) -> 3x (1,512,512)).

That may be due to it thinking you have a batch of 3 instead of 1 three channel image. Is it possible to use PILImage.create when doing it? Or TensorImage? (To which you pass in your data)?
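
The "batch of 3" reading can be illustrated in plain PyTorch: anything that iterates over or indexes a tensor walks its first dimension, so a channels-first image looks like three separate items. Whether Datasets does exactly this internally is an assumption here; the sketch only shows the tensor behaviour and two ways to keep the image as a single item:

```python
import torch

img = torch.zeros(3, 512, 512)  # one 3-channel image, channels first

# Iterating a tensor walks dimension 0, yielding one slice per channel.
slices = list(img)
print(len(slices), slices[0].shape)

# Option 1: wrap the image in a list so it counts as exactly one item.
items = [img]

# Option 2: add an explicit batch dimension in front of the channels.
batch = img.unsqueeze(0)
print(len(items), batch.shape)
```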

Well I understand that part :slight_smile:

PILImage I’d avoid (because shuffling the data & dataformats during training is impractical), but TensorImage could work. Though already ds takes a long time to create. I wonder why that is.

For posterity: this was solved by manually assigning the model to the appropriate device:
e.g. learn.model.to
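
The post stops short of the full call, so the following is only a sketch of what assigning a model to a device usually looks like in plain PyTorch, not the poster's exact code. The small nn.Sequential is a stand-in for learn.model, and the input is created directly on the target device so nothing has to be copied back and forth:

```python
import torch
from torch import nn

# Use the GPU when there is one, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for learn.model; any nn.Module is moved the same way.
model = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.ReLU())
model = model.to(device).eval()

# The batch is created on the same device as the model.
batch = torch.rand(2, 3, 64, 64, device=device)

# No gradients are needed for inference.
with torch.no_grad():
    preds = model(batch)

print(preds.shape, preds.device)
```

Calling the model directly on a batch that is already on the device skips the item-level transforms that predict runs, which is where the earlier tracebacks originated.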
