Hey guys,
I’ve been playing with the Unet code from Lesson 14 and applying it to other image segmentation tasks. Occasionally I hit a rather strange, inconsistent error:
```
RuntimeError                              Traceback (most recent call last)
<ipython-input-49-da31222bd713> in <module>()
      1 learn.load('Unet-ish')
----> 2 out = learn.predict(is_test=True)
      3 out.shape

~/Constructs/TSGSalt/fastai/learner.py in predict(self, is_test, use_swa)
    370         dl = self.data.test_dl if is_test else self.data.val_dl
    371         m = self.swa_model if use_swa else self.model
--> 372         return predict(m, dl)
    373
    374     def predict_with_targs(self, is_test=False, use_swa=False):

~/Constructs/TSGSalt/fastai/model.py in predict(m, dl)
    247
    248 def predict(m, dl):
--> 249     preda,_ = predict_with_targs_(m, dl)
    250     return np.concatenate(preda)
    251

~/Constructs/TSGSalt/fastai/model.py in predict_with_targs_(m, dl)
    259     if hasattr(m, 'reset'): m.reset()
    260     res = []
--> 261     for *x,y in iter(dl): res.append([get_prediction(to_np(m(*VV(x)))),to_np(y)])
    262     return zip(*res)
    263

~/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    355             result = self._slow_forward(*input, **kwargs)
    356         else:
--> 357             result = self.forward(*input, **kwargs)
    358         for hook in self._forward_hooks.values():
    359             hook_result = hook(self, input, result)

<ipython-input-26-1c6459a27813> in forward(self, x)
     14         inp = x
     15         x = F.relu(self.rn(x))
---> 16         x = self.up1(x, self.sfs[3].features)
     17         x = self.up2(x, self.sfs[2].features)
     18         x = self.up3(x, self.sfs[1].features)

~/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    355             result = self._slow_forward(*input, **kwargs)
    356         else:
--> 357             result = self.forward(*input, **kwargs)
    358         for hook in self._forward_hooks.values():
    359             hook_result = hook(self, input, result)

<ipython-input-25-8fd292dc4fbf> in forward(self, up_p, x_p)
     10         up_p = self.tr_conv(up_p)
     11         x_p = self.x_conv(x_p)
---> 12         cat_p = torch.cat([up_p,x_p], dim=1)
     13         return self.bn(F.relu(cat_p))

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 64 and 16 in dimension 0 at /pytorch/torch/lib/THC/generic/THCTensorMath.cu:111
```
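For context on what the error is checking: `torch.cat([up_p, x_p], dim=1)` requires every dimension *except* dim 1 to agree, and dimension 0 is the batch axis, so "64 and 16 in dimension 0" means the upsampled path and the saved skip-connection features have different batch sizes. Here's a minimal sketch of the same rule using NumPy (the shapes are illustrative, not taken from my actual model):

```python
import numpy as np

# torch.cat(tensors, dim=1) and np.concatenate(arrays, axis=1) share the same
# rule: every dimension except the concatenation axis must match exactly.
up_p = np.zeros((64, 256, 16, 16))  # e.g. upsampled path from a batch of 64
x_p = np.zeros((16, 256, 16, 16))   # e.g. skip features from a batch of 16

try:
    np.concatenate([up_p, x_p], axis=1)
except ValueError as err:
    # Fails because dimension 0 (the batch axis) is 64 vs 16.
    print("mismatch:", err)

# With matching batch sizes, the channel-wise concat works fine.
ok = np.concatenate([up_p, np.zeros((64, 256, 16, 16))], axis=1)
print(ok.shape)  # (64, 512, 16, 16)
```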
A few odd things about it:
- It only appears when using the trained model to predict; training runs fine.
- If I restart the kernel and rerun, the dimension mismatch changes for no apparent reason (i.e. the error changes even though the code doesn't). Above it's 64 vs 16; sometimes it's 32 vs 64; other times 64 vs 3.
- Sometimes the model will successfully predict an output on the training dataset but fail when I call learn.predict(is_test=True)
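One thing I've been using to narrow it down (just my own debugging helper, not fastai code): a small check of `torch.cat`'s shape rule, which reports which axis diverged. In the traceback above it's axis 0, the batch axis, which is why I suspect the two tensors come from differently sized batches:

```python
def cat_shape_mismatch(shape_a, shape_b, dim=1):
    """Return the first axis (other than `dim`) where two shapes disagree,
    mimicking torch.cat's rule; None means concatenation along `dim` is legal.
    Returns -1 if the shapes don't even have the same number of dimensions."""
    if len(shape_a) != len(shape_b):
        return -1
    for axis, (a, b) in enumerate(zip(shape_a, shape_b)):
        if axis != dim and a != b:
            return axis
    return None

# The traceback's 64-vs-16 case: dimension 0 (the batch axis) disagrees.
print(cat_shape_mismatch((64, 256, 16, 16), (16, 256, 16, 16)))  # 0
# Channel counts (dim 1) are allowed to differ, so this one is legal.
print(cat_shape_mismatch((64, 256, 16, 16), (64, 128, 16, 16)))  # None
```

Dropping a check like this (on `up_p.shape` and `x_p.shape`) just before the `torch.cat` in the UnetBlock makes the failing pair show up immediately instead of deep in the cat kernel.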
Anybody else come across this?