Chapter 13 Learner error

pony999 · December 6, 2020, 10:54pm

I’m following the book code in Colab for Ch13 CNN. I get errors with the Learner created for the simple_cnn model, which is first introduced in the Creating the CNN section and used from there onwards. The same errors appear when I run the official code provided by fast.ai (Chapter 13, Convolutions), which shows that it’s not my bad spelling.

I’m using fastai version 2.1.8, and running !pip install fastbook --upgrade didn’t help.

This seems to be specific to Ch13, as I just managed to run the entire code for the following Chapter 14: ResNets. The TypeError (details below) suggests that torch.nn.functional.cross_entropy is not happy with fastai.torch_core.TensorImageBW and/or fastai.torch_core.TensorCategory.

# Learner for chapter 13
learn = Learner(dls, simple_cnn, loss_func=F.cross_entropy, metrics=accuracy)

learn.model on its own seems to be defined as expected:

Sequential(
  (0): Sequential(
    (0): Conv2d(1, 4, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): ReLU()
  )
  (1): Sequential(
    (0): Conv2d(4, 8, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): ReLU()
  )
  (2): Sequential(
    (0): Conv2d(8, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): ReLU()
  )
  (3): Sequential(
    (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): ReLU()
  )
  (4): Conv2d(32, 2, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (5): Flatten(full=False)
)

Error details

Error (1):

learn.summary()

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-32-bc39e9e85f86> in <module>()
----> 1 learn.summary()

6 frames
/usr/local/lib/python3.6/dist-packages/fastai/callback/hook.py in _print_shapes(o, bs)
    163 def _print_shapes(o, bs):
    164     if isinstance(o, torch.Size): return ' x '.join([str(bs)] + [str(t) for t in o[1:]])
--> 165     else: return str([_print_shapes(x, bs) for x in o])
    166 
    167 # Cell

TypeError: 'int' object is not iterable

Error (2):

learn.fit_one_cycle(2, 0.01)

epoch	train_loss	valid_loss	accuracy	time
0	0.000000	00:00
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-33-99af2a5e6729> in <module>()
----> 1 learn.fit_one_cycle(2, 0.01)

13 frames
/usr/local/lib/python3.6/dist-packages/torch/overrides.py in handle_torch_function(public_api, relevant_args, *args, **kwargs)
   1069     raise TypeError("no implementation found for '{}' on types that implement "
   1070                     '__torch_function__: {}'
-> 1071                     .format(func_name, list(map(type, overloaded_args))))
   1072 
   1073 def has_torch_function(relevant_args: Iterable[Any]) -> bool:

TypeError: no implementation found for 'torch.nn.functional.cross_entropy' on types that implement __torch_function__: [<class 'fastai.torch_core.TensorImageBW'>, <class 'fastai.torch_core.TensorCategory'>]

Error (3):

def fit(epochs=1):
    learn = Learner(dls, simple_cnn(), loss_func=F.cross_entropy,
                    metrics=accuracy, cbs=ActivationStats(with_hist=True))
    learn.fit(epochs, 0.06)
    return learn

learn = fit()

epoch	train_loss	valid_loss	accuracy	time
0	0.000000	00:00
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-50-a7897b19c59c> in <module>()
----> 1 learn = fit()

13 frames
/usr/local/lib/python3.6/dist-packages/torch/overrides.py in handle_torch_function(public_api, relevant_args, *args, **kwargs)
   1069     raise TypeError("no implementation found for '{}' on types that implement "
   1070                     '__torch_function__: {}'
-> 1071                     .format(func_name, list(map(type, overloaded_args))))
   1072 
   1073 def has_torch_function(relevant_args: Iterable[Any]) -> bool:

TypeError: no implementation found for 'torch.nn.functional.cross_entropy' on types that implement __torch_function__: [<class 'fastai.torch_core.TensorImageBW'>, <class 'fastai.torch_core.TensorCategory'>]

muellerzr · December 6, 2020, 10:59pm

So the first (learn.summary()) is expected; there’s an active bug report on it: learn.summary() raises an `TypeError: 'int' object is not iterable` · Issue #3011 · fastai/fastai · GitHub
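To see how the two lines shown in that traceback can produce this message, here is a minimal torch-free sketch. It assumes the shape reaching `_print_shapes` is a plain tuple such as `(64, 2)` rather than a `torch.Size`; `Size` below is a hypothetical stand-in for `torch.Size` (which is itself a tuple subclass), and the function body mirrors the traceback, not necessarily the current fastai source.

```python
class Size(tuple):
    """Hypothetical stand-in for torch.Size so the sketch runs without torch."""


def _print_shapes(o, bs):
    # Same logic as the two lines quoted in the traceback.
    if isinstance(o, Size):
        return ' x '.join([str(bs)] + [str(t) for t in o[1:]])
    # Anything that is not a Size is assumed to be a collection of Sizes.
    return str([_print_shapes(x, bs) for x in o])


# A real Size is formatted directly.
ok = _print_shapes(Size((64, 2)), 64)

# A list of Sizes is handled by the recursive branch.
nested = _print_shapes([Size((64, 2)), Size((64, 4))], 64)

# A plain tuple of ints falls into the recursive branch too,
# and the recursion then tries to iterate over the int 64.
try:
    _print_shapes((64, 2), 64)
    error = None
except TypeError as e:
    error = str(e)

print(ok, nested, error)
```

Under that assumption, the failure comes from the recursive branch treating each int as something to iterate over.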

The second is an issue with the newest PyTorch: it no longer readily accepts arbitrary tensor types the way it used to (previously anything that was a tensor worked). I would recommend using CrossEntropyLossFlat() as your loss function instead, since we need to do some magic to convert our tensors to a base Tensor type that will work.
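The conversion to a base Tensor type mentioned above could look roughly like this in plain PyTorch. This is only a sketch, not fastai's implementation: `TaggedTensor`, `to_plain` and `plain_cross_entropy` are hypothetical names, and `TaggedTensor` merely stands in for subclasses such as TensorImageBW and TensorCategory.

```python
import torch
import torch.nn.functional as F


class TaggedTensor(torch.Tensor):
    """Hypothetical stand-in for a fastai tensor subclass."""


def to_plain(t):
    # View the same data as a base torch.Tensor (no copy is made).
    return t.as_subclass(torch.Tensor)


def plain_cross_entropy(pred, targ):
    # Strip the subclass from both arguments before calling the loss.
    return F.cross_entropy(to_plain(pred), to_plain(targ))


logits = torch.tensor([[2.0, 0.5], [0.1, 1.5]]).as_subclass(TaggedTensor)
targets = torch.tensor([0, 1]).as_subclass(TaggedTensor)

loss = plain_cross_entropy(logits, targets)
print(type(logits).__name__, type(loss).__name__, loss.item())
```

A function like `plain_cross_entropy` could then be passed as `loss_func`, although CrossEntropyLossFlat() remains the recommended route.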

pony999 · December 6, 2020, 11:05pm

Thank you, Zachary, for the swift answer.

However, CrossEntropyLossFlat leads me to another error now:

learn = Learner(dls, simple_cnn, loss_func=CrossEntropyLossFlat, metrics=accuracy)

learn.fit_one_cycle(2, 0.01)

epoch	train_loss	valid_loss	accuracy	time
0	0.000000	00:00
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-33-99af2a5e6729> in <module>()
----> 1 learn.fit_one_cycle(2, 0.01)

19 frames
/usr/local/lib/python3.6/dist-packages/torch/tensor.py in __torch_function__(cls, func, types, args, kwargs)
    993 
    994         with _C.DisableTorchFunction():
--> 995             ret = func(*args, **kwargs)
    996             return _convert(ret, cls)
    997 

RuntimeError: Boolean value of Tensor with more than one value is ambiguous

muellerzr · December 7, 2020, 12:23am

You need to use an instance of the class as your loss function. Right now you’re passing in the class itself, not an instance (i.e., do loss_func=CrossEntropyLossFlat()).
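The difference between passing the class and passing an instance can be shown with a small plain-Python sketch. `FlatLoss` is a hypothetical stand-in (it is not CrossEntropyLossFlat and computes a simple mean squared error); the point is only what happens when something calls `loss_func(pred, targ)`.

```python
class FlatLoss:
    """Hypothetical loss class: configured in __init__, applied in __call__."""

    def __init__(self, *args, scale=1.0):
        # Positional arguments are treated as configuration, not as data.
        self.init_args = args
        self.scale = scale

    def __call__(self, pred, targ):
        # Mean squared error, scaled.
        return self.scale * sum((p - t) ** 2 for p, t in zip(pred, targ)) / len(pred)


pred, targ = [0.5, 1.0], [0.0, 1.0]

# Passing the class: calling it with (pred, targ) runs __init__ and
# returns a new object, not a loss value.
loss_func = FlatLoss
wrong = loss_func(pred, targ)

# Passing an instance: calling it runs __call__ and returns the loss.
loss_func = FlatLoss()
right = loss_func(pred, targ)

print(type(wrong).__name__, right)
```

With the real class, the constructor receives the prediction tensor as a configuration argument, which is presumably where the unrelated-looking RuntimeError comes from.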

pony999 · December 7, 2020, 3:04pm

It looks like loss_func=CrossEntropyLossFlat() doesn’t work in this case either.
It behaves differently this time: previously it always stopped at the beginning of the first epoch.
Now it trains the first epoch and fails as soon as the epoch finishes.
I’m not sure whether it’s worth trying something else or just waiting for a future update.

learn.fit_one_cycle(2, 0.01)

epoch	train_loss	valid_loss	accuracy	time
0	0.061375	0.045542	None	00:11
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-33-99af2a5e6729> in <module>()
----> 1 learn.fit_one_cycle(2, 0.01)

22 frames
/usr/local/lib/python3.6/dist-packages/torch/overrides.py in handle_torch_function(public_api, relevant_args, *args, **kwargs)
   1069     raise TypeError("no implementation found for '{}' on types that implement "
   1070                     '__torch_function__: {}'
-> 1071                     .format(func_name, list(map(type, overloaded_args))))
   1072 
   1073 def has_torch_function(relevant_args: Iterable[Any]) -> bool:

TypeError: no implementation found for 'torch.tensor.eq' on types that implement __torch_function__: [<class 'fastai.torch_core.TensorImageBW'>, <class 'fastai.torch_core.TensorCategory'>]

def fit(epochs=1):
    learn = Learner(dls, simple_cnn(), loss_func=CrossEntropyLossFlat(),
                    metrics=accuracy, cbs=ActivationStats(with_hist=True))
    learn.fit(epochs, 0.06)
    return learn

learn = fit()

epoch	train_loss	valid_loss	accuracy	time
0	2.304835	2.284931	None	00:57
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-51-a7897b19c59c> in <module>()
----> 1 learn = fit()

22 frames
/usr/local/lib/python3.6/dist-packages/torch/overrides.py in handle_torch_function(public_api, relevant_args, *args, **kwargs)
   1069     raise TypeError("no implementation found for '{}' on types that implement "
   1070                     '__torch_function__: {}'
-> 1071                     .format(func_name, list(map(type, overloaded_args))))
   1072 
   1073 def has_torch_function(relevant_args: Iterable[Any]) -> bool:

TypeError: no implementation found for 'torch.tensor.eq' on types that implement __torch_function__: [<class 'fastai.torch_core.TensorImageBW'>, <class 'fastai.torch_core.TensorCategory'>]

tsm_tau · December 14, 2020, 5:53am

Have you been able to solve this issue? Please share the solution if so. Thanks.

wlw · December 16, 2020, 4:46pm

I’m getting the same issue. A solution would be great.

ram_cse · December 26, 2020, 1:53pm

I found one solution: if we put an nn.Linear() module after Flatten(), the code above works perfectly. I have not been able to find an explanation for why nn.Linear() works, though.

hey_un · December 29, 2020, 2:22pm

I am having the same issues with learner.fine_tune and CrossEntropyLossFlat(). Does anyone have a solution to keep the program from crashing?

mauricetk · January 4, 2021, 9:35am

I had the same issue, but combining the advice from @muellerzr and @ram_cse fixes it.

Changing loss_func from F.cross_entropy to CrossEntropyLossFlat() lets training start, but it crashes just as the first epoch finishes.

If you then follow the proposal from @ram_cse and add nn.Linear(2,2) after Flatten(), training no longer crashes.

So keep in mind that you have to change two things: the loss function and the model’s Sequential.

Be mindful that adding the linear layer adds 6 additional parameters (4 weights and 2 biases), so the model is changed. The solution is clean in my view, but I do not know its implications. Accuracy seems to be similar to that in the book.

I also cannot explain why adding a layer works. I did notice, though, that adding the linear layer changes the output of the model from:
(64, 2) to torch.Size([64, 2])
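For anyone wondering how much nn.Linear(2,2) changes the model: a fully connected layer holds in_features × out_features weights plus, with the default bias, out_features bias terms. A minimal sketch of that arithmetic in plain Python, where `linear_param_count` is a hypothetical helper and not a PyTorch function:

```python
def linear_param_count(in_features, out_features, bias=True):
    """Count the learnable values of a fully connected layer."""
    weights = in_features * out_features      # one weight per input/output pair
    biases = out_features if bias else 0      # one bias per output, if enabled
    return weights + biases


with_bias = linear_param_count(2, 2)                  # like nn.Linear(2, 2)
without_bias = linear_param_count(2, 2, bias=False)   # like nn.Linear(2, 2, bias=False)

print(with_bias, without_bias)
```

Either way the extra layer is tiny compared with the convolutional layers before it.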

mauricetk · January 4, 2021, 9:58am

I opened a bug report on GitHub: https://github.com/fastai/fastai/issues/3123

Hoping this is helpful for the people who know how to fix this. Otherwise just delete.

peiyi · January 4, 2021, 1:45pm

It seems that the == operation between <class 'fastai.torch_core.TensorImageBW'> and <class 'fastai.torch_core.TensorCategory'> is broken or missing.

The accuracy metric uses == for element-wise comparison, so you can replace accuracy with a metric that avoids the tensor-level == and still lets your learner compute accuracy.

For example, like this:

def my_accuracy(y_pred, y_true):
    y_pred = torch.argmax(y_pred, axis=1).float()
    equ = [1 if i == t else 0 for i, t in zip(y_pred, y_true)]
    return np.mean(equ)

learn = Learner(dls, simple_cnn, loss_func=CrossEntropyLossFlat(), metrics=my_accuracy)

I think this may work.
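An alternative sketch, assuming the problem is only the subclass types and not the comparison itself: view both tensors as base torch.Tensor first, then use the usual vectorized ==. `TaggedTensor` and `plain_accuracy` are hypothetical names, and this has not been checked against the fastai version discussed in this thread.

```python
import torch


class TaggedTensor(torch.Tensor):
    """Hypothetical stand-in for TensorImageBW / TensorCategory."""


def plain_accuracy(pred, targ):
    # Drop the subclass so == is dispatched on base tensors.
    pred = pred.as_subclass(torch.Tensor)
    targ = targ.as_subclass(torch.Tensor)
    # Predicted class is the index of the largest activation per row.
    return (pred.argmax(dim=1) == targ).float().mean()


logits = torch.tensor([[2.0, 0.5], [0.1, 1.5], [0.3, 0.2]]).as_subclass(TaggedTensor)
labels = torch.tensor([0, 1, 1]).as_subclass(TaggedTensor)

acc = plain_accuracy(logits, labels)
print(acc.item())
```

This avoids the Python-level loop over the batch, which may matter for larger validation sets.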

peiyi · January 4, 2021, 2:19pm

Also, I believe nn.Linear(2,2) works because it changes the type of your output to a plain tensor, while Flatten() only changes the shape of your output.

@module(full=False)
def Flatten(self, x):
    "Flatten `x` to a single dimension, e.g. at end of a model. `full` for rank-1 tensor"
    return x.view(-1) if self.full else x.view(x.size(0), -1)

mauricetk · January 4, 2021, 6:28pm

Excellent! Amazing analysis and amazing fix.

I verified with both the examples (binary, and multi-class classification) and both work.

I will update the bug report with your information.

Thanks!

muellerzr · January 4, 2021, 7:30pm

This has popped up a number of times on the forums. This is a pytorch bug that’s been fixed on master. In the interim, until the next release, do the following:

!pip install git+https://github.com/fastai/fastai
!pip install git+https://github.com/fastai/fastcore

mauricetk · January 4, 2021, 7:37pm

Cool! Was not aware of this. Will close the bug report now.

Thanks!

rawmean · May 1, 2021, 7:48pm

This problem still exists and is not fixed in the latest version from GitHub (May 2021).
To fix it, I defined a new loss function that casts its arguments.

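# assumes the usual fastai star import, which provides cast, Tensor, Learner, Dice and Adam
from fastai.vision.all import *

loss = nn.BCEWithLogitsLoss()

def my_loss(input, target):
    # cast fastai's tensor subclasses to plain Tensors before computing the loss
    inp = cast(input, Tensor)
    t = cast(target, Tensor)
    return loss(inp, t)

# dls, net and callbacks are defined elsewhere
learn = Learner(dls, net, loss_func=my_loss, cbs=callbacks,
                metrics=[Dice()], opt_func=Adam)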

Rushirajsinh · August 12, 2021, 5:09am

This helped solve my problem. If anyone is still facing similar issues, modify your loss function accordingly:

class DiceBCELoss(nn.Module):
    def __init__(self, weight=None, size_average=True):
        super(DiceBCELoss, self).__init__()

    def forward(self, inputs, targets, smooth=1):
        
        inputs = cast(inputs, Tensor)
        targets = cast(targets, Tensor)
        
        # comment this out if your model already ends in a sigmoid or equivalent activation layer
        inputs = torch.sigmoid(inputs)  # F.sigmoid is deprecated in favor of torch.sigmoid
        
        #flatten label and prediction tensors
        inputs = inputs.view(-1)
        targets = targets.view(-1)
        
        intersection = (inputs * targets).sum()                            
        dice_loss = 1 - (2.*intersection + smooth)/(inputs.sum() + targets.sum() + smooth)  
        
        BCE = F.binary_cross_entropy(inputs.float(), targets.float(), reduction='mean')
        Dice_BCE = BCE + dice_loss
        
        return Dice_BCE
    
dice_bce_loss = DiceBCELoss()