Strange Error Tweaking L14 Unet

Hey guys,

I’ve been playing with the Unet code from Lesson 14 and applying it to other image segmentation tasks, and I’ve noticed a rather strange, inconsistent error that crops up occasionally:

RuntimeError                              Traceback (most recent call last)
<ipython-input-49-da31222bd713> in <module>()
      1 learn.load('Unet-ish')
----> 2 out = learn.predict(is_test=True)
      3 out.shape

~/Constructs/TSGSalt/fastai/learner.py in predict(self, is_test, use_swa)
    370         dl = self.data.test_dl if is_test else self.data.val_dl
    371         m = self.swa_model if use_swa else self.model
--> 372         return predict(m, dl)
    373 
    374     def predict_with_targs(self, is_test=False, use_swa=False):

~/Constructs/TSGSalt/fastai/model.py in predict(m, dl)
    247 
    248 def predict(m, dl):
--> 249     preda,_ = predict_with_targs_(m, dl)
    250     return np.concatenate(preda)
    251 

~/Constructs/TSGSalt/fastai/model.py in predict_with_targs_(m, dl)
    259     if hasattr(m, 'reset'): m.reset()
    260     res = []
--> 261     for *x,y in iter(dl): res.append([get_prediction(to_np(m(*VV(x)))),to_np(y)])
    262     return zip(*res)
    263 

~/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    355             result = self._slow_forward(*input, **kwargs)
    356         else:
--> 357             result = self.forward(*input, **kwargs)
    358         for hook in self._forward_hooks.values():
    359             hook_result = hook(self, input, result)

<ipython-input-26-1c6459a27813> in forward(self, x)
     14         inp = x
     15         x = F.relu(self.rn(x))
---> 16         x = self.up1(x, self.sfs[3].features)
     17         x = self.up2(x, self.sfs[2].features)
     18         x = self.up3(x, self.sfs[1].features)

~/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    355             result = self._slow_forward(*input, **kwargs)
    356         else:
--> 357             result = self.forward(*input, **kwargs)
    358         for hook in self._forward_hooks.values():
    359             hook_result = hook(self, input, result)

<ipython-input-25-8fd292dc4fbf> in forward(self, up_p, x_p)
     10         up_p = self.tr_conv(up_p)
     11         x_p = self.x_conv(x_p)
---> 12         cat_p = torch.cat([up_p,x_p], dim=1)
     13         return self.bn(F.relu(cat_p))

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 64 and 16 in dimension 0 at /pytorch/torch/lib/THC/generic/THCTensorMath.cu:111
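
For reference, the code it’s dying in is essentially the Lesson 14 UnetBlock, fed by the SaveFeatures forward hooks that stash the encoder activations (that’s what self.sfs[3].features is in the outer forward). The forward methods below match the traceback; the __init__ layer definitions are reconstructed from the lesson notebook, so they may differ slightly from what I’m actually running:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SaveFeatures():
    """Forward hook that stashes a layer's output so the decoder can reuse it."""
    features = None
    def __init__(self, m): self.hook = m.register_forward_hook(self.hook_fn)
    def hook_fn(self, module, input, output): self.features = output
    def remove(self): self.hook.remove()

class UnetBlock(nn.Module):
    def __init__(self, up_in, x_in, n_out):
        super().__init__()
        up_out = x_out = n_out // 2
        self.x_conv  = nn.Conv2d(x_in, x_out, 1)                       # 1x1 conv on the hooked encoder features
        self.tr_conv = nn.ConvTranspose2d(up_in, up_out, 2, stride=2)  # upsample the decoder path
        self.bn = nn.BatchNorm2d(n_out)

    def forward(self, up_p, x_p):
        up_p = self.tr_conv(up_p)               # current decoder activations
        x_p  = self.x_conv(x_p)                 # encoder activations saved by the hook (sfs[i].features)
        cat_p = torch.cat([up_p, x_p], dim=1)   # <-- the cat that raises the RuntimeError
        return self.bn(F.relu(cat_p))
```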

A few odd things about it:

  1. It only appears when using the trained model to predict; training runs fine.
  2. If I restart and rerun the kernel, the dimension mismatch changes even though the code doesn’t. Above it’s 64 vs 16; sometimes it’s 32 vs 64; other times 64 vs 3 (see the shape-check sketch below).
  3. Sometimes the model will successfully predict an output on the training dataset but fail when I set learn.predict(is_test=True).
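
Dimension 0 is the batch dimension, which makes me think up_p and x_p are disagreeing about batch size rather than spatial size (64, 32, 16 and 3 all look like plausible batch or last-batch sizes). A quick way to see it would be to print both shapes just before the failing cat, something like this (hypothetical debugging tweak, not part of the lesson code, subclassing the UnetBlock above):

```python
class UnetBlockDebug(UnetBlock):
    """Same block, but reports the shapes feeding the cat that blows up."""
    def forward(self, up_p, x_p):
        up_p = self.tr_conv(up_p)
        x_p  = self.x_conv(x_p)
        if up_p.size(0) != x_p.size(0):   # dim 0 = batch size
            print('batch mismatch -> up_p:', up_p.size(), 'x_p:', x_p.size())
        cat_p = torch.cat([up_p, x_p], dim=1)
        return self.bn(F.relu(cat_p))
```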

Anybody else come across this?