I am using fastai v2 on a Windows system and testing with the pets notebook.
Current setup:
Cuda: True
GPU: GeForce GTX 1060
Python version: 3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]
Pytorch version: 1.3.0
I got an error after running this code:
pets = DataBlock(types=(PILImage, Category),
                 get_items=get_image_files,
                 splitter=RandomSplitter(),
                 # get_y=RegexLabeller(pat=r'/([^/]+)_\d+.jpg$'))   # original version with forward slashes
                 get_y=RegexLabeller(pat=r'\\([^\\]+)_\d+.jpg$'))   # for Windows path separators

dbunch = pets.databunch(untar_data(URLs.PETS)/"images", item_tfms=RandomResizedCrop(460, min_scale=0.75), bs=32,
                        batch_tfms=[*aug_transforms(size=224, max_warp=0), Normalize(*imagenet_stats)])

learn.fit_one_cycle(4)
which results in the following error:
RuntimeError: cuda runtime error (801) : operation not supported at C:\w\1\s\tmp_conda_3.7_183424\conda\conda-bld\pytorch_1570818936694\work\torch/csrc/generic/StorageSharing.cpp:245
Looking at the Windows FAQ in the PyTorch documentation, it points out that multiprocessing with CUDA tensors is not supported and offers two alternatives, one being to set num_workers to 0, which I did:
dbunch = pets.databunch(untar_data(URLs.PETS)/"images", item_tfms=RandomResizedCrop(460, min_scale=0.75), bs=32,
                        batch_tfms=[*aug_transforms(size=224, max_warp=0), Normalize(*imagenet_stats)], num_workers=0)
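The FAQ's other alternative is to keep everything the DataLoader workers produce on the CPU and only move batches to the GPU in the main process, so no CUDA storage is ever shared between processes. As a rough plain-PyTorch sketch (not fastai code, and the dataset here is just a dummy placeholder), it would look something like this:

import torch
from torch.utils.data import Dataset, DataLoader

class DummyCpuDataset(Dataset):
    """Placeholder dataset that only ever returns CPU tensors (no .cuda() in __getitem__)."""
    def __init__(self, n=128):
        self.x = torch.randn(n, 3, 64, 64)
        self.y = torch.randint(0, 37, (n,), dtype=torch.long)
    def __len__(self):
        return len(self.x)
    def __getitem__(self, i):
        return self.x[i], self.y[i]

if __name__ == "__main__":  # needed for multiprocessing DataLoader workers on Windows
    dl = DataLoader(DummyCpuDataset(), batch_size=32, num_workers=2)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    for xb, yb in dl:
        # the GPU transfer happens here, in the main process, not in the workers
        xb, yb = xb.to(device), yb.to(device)
        break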
With num_workers=0 the original error went away, but running fit_one_cycle then produced a different error:
RuntimeError: Expected object of scalar type Long but got scalar type Int for argument #2 'target' in call to _thnn_nll_loss_forward
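The mismatch is just a dtype issue: nll_loss expects an int64 (Long) target, while the labels arriving from the pipeline are int32 (Int). A minimal standalone reproducer, unrelated to the pets data:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 37)                                  # fake model output for 37 classes
target_int = torch.randint(0, 37, (4,), dtype=torch.int32)   # int32 ("Int") labels
# F.cross_entropy(logits, target_int)                        # raises the "expected Long but got Int" error
loss = F.cross_entropy(logits, target_int.long())            # works once the target is cast to int64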
The error was being generated here:
~\Anaconda3\envs\fastai_v2_1.3\lib\site-packages\torch\nn\functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
   1837     if dim == 2:
-> 1838         ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
   1839     elif dim == 4:
   1840         ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
so I added a line:
if dim == 2:
    target = target.long()  # new line: cast the Int target to Long
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
and now the notebook works fine.
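A less invasive way to get the same cast, without editing functional.py inside site-packages, would presumably be to wrap the loss function and hand it to the learner. Something like this sketch (cross_entropy_long is just a name I made up, and the commented-out cnn_learner call is only illustrative):

import torch.nn.functional as F

def cross_entropy_long(input, target, **kwargs):
    # cast int32 ("Int") labels to int64 ("Long") before the underlying nll_loss call
    return F.cross_entropy(input, target.long(), **kwargs)

# learn = cnn_learner(dbunch, resnet34, loss_func=cross_entropy_long, metrics=error_rate)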
However, are there better suggestions for how to fix this?