PyTorch variable type mismatch (cuda.LongTensor vs cuda.IntTensor) when running lesson1.ipynb

Hi everyone,

I am new to v2 of the fast.ai course. I am running lesson1.ipynb with fastai (version 2) and get this error when running the ‘3 lines of code’ on the dogs and cats dataset:

arch=resnet34
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 3)

The error is: RuntimeError: Expected object of type Variable[torch.cuda.LongTensor] but found type Variable[torch.cuda.IntTensor] for argument #1 ‘target’

Full error messages below.

Epoch
0% 0/3 [00:00<?, ?it/s]
0%| | 0/360 [00:00<?, ?it/s]

RuntimeError Traceback (most recent call last)
in ()
2 data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
3 learn = ConvLearner.pretrained(arch, data, precompute=True)
----> 4 learn.fit(0.01, 3)

~\Anaconda3\envs\python3\lib\site-packages\fastai\learner.py in fit(self, lrs, n_cycle, wds, **kwargs)
97 self.sched = None
98 layer_opt = self.get_layer_opt(lrs, wds)
---> 99 self.fit_gen(self.model, self.data, layer_opt, n_cycle, **kwargs)
100
101 def lr_find(self, start_lr=1e-5, end_lr=10, wds=None):

~\Anaconda3\envs\python3\lib\site-packages\fastai\learner.py in fit_gen(self, model, data, layer_opt, n_cycle, cycle_len, cycle_mult, cycle_save_name, metrics, callbacks, **kwargs)
87 n_epoch = sum_geom(cycle_len if cycle_len else 1, cycle_mult, n_cycle)
88 fit(model, data, n_epoch, layer_opt.opt, self.crit,
---> 89 metrics=metrics, callbacks=callbacks, reg_fn=self.reg_fn, clip=self.clip, **kwargs)
90
91 def get_layer_groups(self): return self.models.get_layer_groups()

~\Anaconda3\envs\python3\lib\site-packages\fastai\model.py in fit(model, data, epochs, opt, crit, metrics, callbacks, kwargs)
82 for (*x,y) in t:
83 batch_num += 1
---> 84 loss = stepper.step(V(x),V(y))
85 avg_loss = avg_loss * avg_mom + loss * (1-avg_mom)
86 debias_loss = avg_loss / (1 - avg_mom**batch_num)

~\Anaconda3\envs\python3\lib\site-packages\fastai\model.py in step(self, xs, y)
41 if isinstance(output,(tuple,list)): output,*xtra = output
42 self.opt.zero_grad()
---> 43 loss = raw_loss = self.crit(output, y)
44 if self.reg_fn: loss = self.reg_fn(output, xtra, raw_loss)
45 loss.backward()

~\Anaconda3\envs\python3\lib\site-packages\torch\nn\functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce)
1047 weight = Variable(weight)
1048 if dim == 2:
-> 1049 return torch._C._nn.nll_loss(input, target, weight, size_average, ignore_index, reduce)
1050 elif dim == 4:
1051 return torch._C._nn.nll_loss2d(input, target, weight, size_average, ignore_index, reduce)

RuntimeError: Expected object of type Variable[torch.cuda.LongTensor] but found type Variable[torch.cuda.IntTensor] for argument #1 ‘target’

From my understanding, the ‘target’ (the image’s label) is of type IntTensor, but the output of PyTorch’s resnet (‘input’) is LongTensor instead. I am not sure how to cast them properly. Any help or guidance on this would be appreciated. Thank you!

Edit: I am running it on my local machine, not on AWS or another cloud service. I installed fast.ai via pip, and PyTorch runs fine in my other non-fastai projects.
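
A note on the cast asked about above: the traceback shows nll_loss rejecting ‘target’ because it is a 32-bit integer tensor (IntTensor), while the loss expects 64-bit class indices (LongTensor). One likely cause, which is an assumption here and not something the thread confirms, is that NumPy's default integer type was 32-bit on Windows (before NumPy 2.0), so labels built from NumPy arrays arrive as IntTensor. The sketch below uses current PyTorch on the CPU rather than the 0.3-era Variable API; `labels_np` and `log_probs` are made-up stand-ins, not fastai internals.

```python
import numpy as np
import torch
import torch.nn.functional as F

# Hypothetical labels stored as 32-bit integers, as happens when
# NumPy's default integer type is int32.
labels_np = np.array([0, 1, 1, 0], dtype=np.int32)

# Stand-in for the model output: log-probabilities over 2 classes.
log_probs = F.log_softmax(torch.randn(4, 2), dim=1)

target_int32 = torch.from_numpy(labels_np)  # dtype torch.int32 ("IntTensor")
target_int64 = target_int32.long()          # dtype torch.int64 ("LongTensor")

# nll_loss takes class indices as int64, so pass the cast tensor.
loss = F.nll_loss(log_probs, target_int64)
print(target_int32.dtype, target_int64.dtype, loss.item())
```

The cast has to happen before the labels reach the loss function. In fastai that means patching the library or updating it, since the labels are converted inside its own data pipeline.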

Yup it’s a known problem on Windows, where fastai isn’t supported as yet. Do a forum search and you’ll find a few solutions to this issue.

FYI if you’re fairly new to DL I’d strongly suggest using Linux instead of Windows, since neither pytorch nor fastai is supported on Windows as yet. (Will be from pytorch 0.4 however)


Thank you Jeremy! I just set up Ubuntu as a dual boot alongside Windows with help from another thread, and I have to admit it is way easier to set up and runs well too!


This issue was fixed on fastai’s GitHub on Dec 30th: https://github.com/fastai/fastai/issues/71. You just have to pull the latest sources.

Hi,
I configured fastai on Windows 10 and the first and second lessons ran without a hitch. But once I installed a new Windows 10 Insider Preview over my current Windows 10, I had to reinstall Anaconda and recreate the fastai environment. I set everything up as I did the first time, but this time I started getting this error, even though I’ve updated all the sources/libraries/packages. I’m not sure how to fix it. Could you please explain what you meant by pulling the latest sources?

Dear Jeremy,
I have been working with PyTorch on Windows 10 since version 0.2. Moreover, 99% of my code runs without any changes on both Linux and Windows 10. I also recently upgraded to PyTorch version 0.3 on Windows and did not have to change anything.

The only limitation is when using data loaders on Windows, you have to set the number of workers to 0:

t_ds, v_ds = trainTestSplit(train_ds)

t_loader = torch.utils.data.DataLoader(t_ds, batch_size=batch_size, shuffle=False, num_workers=0)
v_loader = torch.utils.data.DataLoader(v_ds, batch_size=batch_size, shuffle=False, num_workers=0)

But that is not a major limitation for me.
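
For anyone who wants to try the num_workers setting on its own, here is a minimal self-contained sketch. It swaps in a random TensorDataset for the `train_ds` split above, and `pick_num_workers` is a hypothetical helper made up for illustration, not part of PyTorch or fastai.

```python
import sys

import torch
from torch.utils.data import DataLoader, TensorDataset


def pick_num_workers(platform=sys.platform, preferred=4):
    """Hypothetical helper: no worker processes on Windows, `preferred` elsewhere."""
    return 0 if platform.startswith("win") else preferred


# Random stand-in data: 10 samples with 3 features and a binary label each.
dataset = TensorDataset(torch.randn(10, 3), torch.randint(0, 2, (10,)))

# num_workers=0 loads every batch in the main process.
loader = DataLoader(dataset, batch_size=4, shuffle=False,
                    num_workers=pick_num_workers("win32"))

batch_sizes = [x.shape[0] for x, y in loader]
print(batch_sizes)
```

With 0 workers the data is loaded in the main process, which is slower for heavy preprocessing but needs no multiprocessing support from the platform.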

As for LongTensor: when the target is binary and you use BCELoss, this works perfectly on Windows 10:

if use_cuda:
    lgr.info("Using the GPU")
    # BCELoss expects float targets
    Y_tensor = Variable(torch.from_numpy(y_data_np)).type(torch.FloatTensor).cuda()
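
To relate this to the error at the top of the thread: BCELoss takes float targets, whereas the class-index losses (NLLLoss, CrossEntropyLoss) take int64 targets, so the right cast depends on the loss. A rough sketch in current PyTorch on the CPU, with made-up data (`y_data_np` here is a hypothetical label array, not the one from the snippet above):

```python
import numpy as np
import torch
import torch.nn as nn

y_data_np = np.array([0, 1, 1, 0], dtype=np.int32)  # hypothetical binary labels

# BCELoss compares predicted probabilities with float targets of the same shape.
probs = torch.sigmoid(torch.randn(4))
bce = nn.BCELoss()(probs, torch.from_numpy(y_data_np).float())

# CrossEntropyLoss uses the targets as class indices, so they must be int64.
logits = torch.randn(4, 2)
ce = nn.CrossEntropyLoss()(logits, torch.from_numpy(y_data_np).long())

print(bce.item(), ce.item())
```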

Best,

Where exactly do you set these? Sorry, I’m new to deep learning.