Windows: RuntimeError: cuda runtime error (801) & RuntimeError: Expected object of scalar type Long but got scalar type

amritv · October 28, 2019, 5:44am

I am using fastai v2 on a Windows system and testing on the pets notebook.

Current version:

Cuda: True
GPU: GeForce GTX 1060
Python version: 3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]
Pytorch version: 1.3.0

I got the following error after running this code:

pets = DataBlock(types=(PILImage, Category),
                 get_items=get_image_files,
                 splitter=RandomSplitter(),
                 #get_y=RegexLabeller(pat = r'/([^/]+)_\d+.jpg$'))
                 get_y = RegexLabeller(pat = r'\\([^\\]+)_\d+.jpg$')) #For windows

dbunch = pets.databunch(untar_data(URLs.PETS)/"images", item_tfms=RandomResizedCrop(460, min_scale=0.75), bs=32,
                        batch_tfms=[*aug_transforms(size=224, max_warp=0), Normalize(*imagenet_stats)])

learn.fit_one_cycle(4)

results in the following error:

RuntimeError: cuda runtime error (801) : operation not supported at C:\w\1\s\tmp_conda_3.7_183424\conda\conda-bld\pytorch_1570818936694\work\torch/csrc/generic/StorageSharing.cpp:245

According to the Windows FAQ in the PyTorch 2.4 documentation, multiprocessing on CUDA tensors is not supported on Windows. It offers two alternatives, one being to set num_workers to 0, which I did:

dbunch = pets.databunch(untar_data(URLs.PETS)/"images", item_tfms=RandomResizedCrop(460, min_scale=0.75), bs=32,
                        batch_tfms=[*aug_transforms(size=224, max_warp=0), Normalize(*imagenet_stats)], num_workers=0)

This then resulted in a different error when running fit_one_cycle:

RuntimeError: Expected object of scalar type Long but got scalar type Int for argument #2 'target' in call to _thnn_nll_loss_forward

The error was being generated here:

~\Anaconda3\envs\fastai_v2_1.3\lib\site-packages\torch\nn\functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
1837 if dim == 2:
-> 1838 ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
1839 elif dim == 4:
1840 ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)

so I added a line:

if dim == 2:
    target = target.long() #new input
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)

and now the notebook works fine.

However, is there a better way to fix this?
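
A side note on the get_y pattern in the DataBlock above: rather than keeping one regex per operating system, a single pattern can accept either path separator. This is a minimal sketch using only the standard library, not fastai's RegexLabeller; LABEL_PAT, label_from_path, and the sample paths are made up for illustration, and the underscore before the image number is assumed from the pets file naming.

```python
import re
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical pattern: the character class [\\/] accepts a backslash or a
# forward slash, so one expression covers Windows and POSIX path strings.
LABEL_PAT = re.compile(r'[\\/]([^\\/]+)_\d+\.jpg$')

def label_from_path(path):
    """Return the class name encoded in a file name such as 'Abyssinian_12.jpg'."""
    match = LABEL_PAT.search(str(path))
    if match is None:
        raise ValueError(f"no label found in {path!r}")
    return match.group(1)

# Illustrative paths only; the directories are assumptions.
posix_path = PurePosixPath("/data/oxford-iiit-pet/images/great_pyrenees_173.jpg")
windows_path = PureWindowsPath(r"C:\data\oxford-iiit-pet\images\Abyssinian_12.jpg")

print(label_from_path(posix_path))    # great_pyrenees
print(label_from_path(windows_path))  # Abyssinian
```

Matching against the file name alone (for example Path.name) would avoid separators entirely, if the labeller being used supports it.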

sgugger · October 28, 2019, 1:34pm

This is because we removed the automatic conversion to long ints in the data preprocessing pipeline, since we wondered why it was there (now we know). There’ll be a fix today.

amritv · October 28, 2019, 2:55pm

Awesome thanks!

cudawarped · November 5, 2019, 10:30am

@amritv Does the latest version work for you now? I am getting a similar error on Windows after the first epoch completes in the pets notebook:

RuntimeError: Expected object of scalar type Int but got scalar type Long for argument #2 'other'

coming from

fastai_dev\fastai2\torch_core.py in _f(self, *args, **kwargs)
155 def _f(self, *args, **kwargs):
156 cls = self.__class__
--> 157 res = getattr(super(TensorBase, self), fn)(*args, **kwargs)
158 return cls(res) if isinstance(res,Tensor) else res
159 return _f

amritv · November 5, 2019, 4:37pm

Hey @cudawarped, just tried it out and it worked after a

git pull

and

conda env update

However, I notice that your error was generated elsewhere.

cudawarped · November 5, 2019, 5:18pm

Hey @amritv thanks a million for checking

conda env update

has removed the errors and it's training without issue.

s.s.o · January 29, 2020, 8:49am

@sgugger, the issue still exists in the recent release on Windows. I use torch 1.2, since 1.3 is not available on their site with CUDA, and the pets example is not working:

d:\conda3\lib\site-packages\fastai2\learner.py in accumulate(self, learn)
431 def accumulate(self, learn):
432 bs = find_bs(learn.yb)
--> 433 self.total += to_detach(self.func(learn.pred, *learn.yb))*bs
434 self.count += bs
435 @property

d:\conda3\lib\site-packages\fastai2\metrics.py in error_rate(inp, targ, axis)
79 def error_rate(inp, targ, axis=-1):
80 "1 - accuracy"
---> 81 return 1 - accuracy(inp, targ, axis=axis)
82
83 # Cell

d:\conda3\lib\site-packages\fastai2\metrics.py in accuracy(inp, targ, axis)
74 "Compute accuracy with targ when pred is bs * n_classes"
75 pred,targ = flatten_check(inp.argmax(dim=axis), targ)
---> 76 return (pred == targ).float().mean()
77
78 # Cell

d:\conda3\lib\site-packages\fastai2\torch_core.py in _f(self, *args, **kwargs)
270 def _f(self, *args, **kwargs):
271 cls = self.__class__
--> 272 res = getattr(super(TensorBase, self), fn)(*args, **kwargs)
273 return retain_type(res, self)
274 return _f

RuntimeError: Expected object of scalar type Int but got scalar type Long for argument #2 'other'

sgugger · January 29, 2020, 3:03pm

fastai v2 has not been tested on Windows, and Windows support is not a priority right now (first, let’s finish it and document it). I don’t think this will be dealt with until March.

ThomVett · March 22, 2020, 10:58pm

Linking another answer to this thread, which allowed me to fix this specific CUDA runtime error 801 in the intro notebook of fastbook.
Setting num_workers=0 in ImageDataLoaders.from_name_func made it work for me.

acorbellini · September 14, 2020, 2:56am

Hey, besides adding num_workers=0 to run on Windows, I created an auxiliary loss function in my notebook (instead of modifying the fastai library code):

def loss_aux(input, target, **kwargs):
  target = target.long()
  return F.cross_entropy(input, target, **kwargs)

And set it up in the learner:

learn = Learner(dls, simple_cnn, loss_func=loss_aux, metrics=accuracy)

I didn’t come up with this myself; other posts suggested doing the same for the metrics argument. I also tried updating to the latest PyTorch and fastai (2.0.9) before doing this.

Hope this helps someone.
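
For the metrics side mentioned above, the same cast can be applied before the comparison that fails in the accuracy traceback earlier in the thread. This is only a sketch in plain PyTorch: accuracy_long is a made-up name, not a fastai function, and it assumes predictions of shape (batch, n_classes) with integer class targets.

```python
import torch

def accuracy_long(inp, targ, axis=-1):
    """Accuracy that casts both sides to int64 before comparing them."""
    pred = inp.argmax(dim=axis).reshape(-1).long()
    targ = targ.reshape(-1).long()  # Int targets become Long here
    return (pred == targ).float().mean()

# Toy batch: three samples, two classes, targets stored as 32-bit ints.
logits = torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
targets = torch.tensor([1, 1, 1], dtype=torch.int32)

print(accuracy_long(logits, targets))  # two of three predictions match
```

It could then be passed alongside the loss wrapper, e.g. Learner(dls, simple_cnn, loss_func=loss_aux, metrics=accuracy_long).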

magic · December 30, 2020, 7:20pm

Yeah, set num_workers=0

path = untar_data(URLs.PETS)/'images'

def is_cat(x): return x[0].isupper()
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224),
    num_workers=0)

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)

Apparently CUDA on Windows 10 doesn’t support this feature (GPU memory sharing?).
See here: Windows FAQ — PyTorch 1.7.0 documentation
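
If the same notebook has to run on both Windows and Linux, one option is to choose the worker count at runtime instead of hard-coding 0. A minimal sketch, assuming the only goal is to disable worker processes on Windows; pick_num_workers is a hypothetical helper, not part of fastai or PyTorch.

```python
import os
import platform

def pick_num_workers(requested=None):
    """Return 0 on Windows, otherwise the requested worker count (or the CPU count)."""
    if platform.system() == "Windows":
        return 0  # keep data loading in the main process
    if requested is not None:
        return requested
    return os.cpu_count() or 1

print(pick_num_workers(4))
```

The result can be passed as num_workers=pick_num_workers() in the ImageDataLoaders.from_name_func call above.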
