Lesson 1 official topic

AssertionError: Exception occured in Recorder when calling event after_batch:

For lab1 (Is it a bird?) I’m trying to modify the learner to work in a multiclass setting, i.e. with 3 categories instead of 2.
According to the tutorial in the docs this is straightforward, but I’m having trouble with fine-tuning the model.

When re-training with an optimal learning rate for 3 epochs, the following rather complex error message was produced:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/tmp/ipykernel_18/568062170.py in <module>
     10 learn2=vision_learner(dls2, resnet18, metrics=[partial(accuracy_multi, thresh=0.5), fbeta_macro, fbeta_sample] )
     11 learn2.lr_find() #find the optimal learning rate
---> 12 learn2.fine_tune(3, 1e-3) #args: n epochs, learning rate. Not in documentation for some reason...ugh

/opt/conda/lib/python3.7/site-packages/fastai/callback/schedule.py in fine_tune(self, epochs, base_lr, freeze_epochs, lr_mult, pct_start, div, **kwargs)
    163     "Fine tune with `Learner.freeze` for `freeze_epochs`, then with `Learner.unfreeze` for `epochs`, using discriminative LR."
    164     self.freeze()
--> 165     self.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99, **kwargs)
    166     base_lr /= 2
    167     self.unfreeze()

/opt/conda/lib/python3.7/site-packages/fastai/callback/schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt, start_epoch)
    117     scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
    118               'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
--> 119     self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd, start_epoch=start_epoch)
    120 
    121 # %% ../../nbs/14_callback.schedule.ipynb 50

/opt/conda/lib/python3.7/site-packages/fastai/learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt, start_epoch)
    262             self.opt.set_hypers(lr=self.lr if lr is None else lr)
    263             self.n_epoch = n_epoch
--> 264             self._with_events(self._do_fit, 'fit', CancelFitException, self._end_cleanup)
    265 
    266     def _end_cleanup(self): self.dl,self.xb,self.yb,self.pred,self.loss = None,(None,),(None,),None,None

/opt/conda/lib/python3.7/site-packages/fastai/learner.py in _with_events(self, f, event_type, ex, final)
    197 
    198     def _with_events(self, f, event_type, ex, final=noop):
--> 199         try: self(f'before_{event_type}');  f()
    200         except ex: self(f'after_cancel_{event_type}')
    201         self(f'after_{event_type}');  final()

/opt/conda/lib/python3.7/site-packages/fastai/learner.py in _do_fit(self)
    251         for epoch in range(self.n_epoch):
    252             self.epoch=epoch
--> 253             self._with_events(self._do_epoch, 'epoch', CancelEpochException)
    254 
    255     def fit(self, n_epoch, lr=None, wd=None, cbs=None, reset_opt=False, start_epoch=0):

/opt/conda/lib/python3.7/site-packages/fastai/learner.py in _with_events(self, f, event_type, ex, final)
    197 
    198     def _with_events(self, f, event_type, ex, final=noop):
--> 199         try: self(f'before_{event_type}');  f()
    200         except ex: self(f'after_cancel_{event_type}')
    201         self(f'after_{event_type}');  final()

/opt/conda/lib/python3.7/site-packages/fastai/learner.py in _do_epoch(self)
    246     def _do_epoch(self):
    247         self._do_epoch_train()
--> 248         self._do_epoch_validate()
    249 
    250     def _do_fit(self):

/opt/conda/lib/python3.7/site-packages/fastai/learner.py in _do_epoch_validate(self, ds_idx, dl)
    242         if dl is None: dl = self.dls[ds_idx]
    243         self.dl = dl
--> 244         with torch.no_grad(): self._with_events(self.all_batches, 'validate', CancelValidException)
    245 
    246     def _do_epoch(self):

/opt/conda/lib/python3.7/site-packages/fastai/learner.py in _with_events(self, f, event_type, ex, final)
    197 
    198     def _with_events(self, f, event_type, ex, final=noop):
--> 199         try: self(f'before_{event_type}');  f()
    200         except ex: self(f'after_cancel_{event_type}')
    201         self(f'after_{event_type}');  final()

/opt/conda/lib/python3.7/site-packages/fastai/learner.py in all_batches(self)
    203     def all_batches(self):
    204         self.n_iter = len(self.dl)
--> 205         for o in enumerate(self.dl): self.one_batch(*o)
    206 
    207     def _backward(self): self.loss_grad.backward()

/opt/conda/lib/python3.7/site-packages/fastai/learner.py in one_batch(self, i, b)
    233         b = self._set_device(b)
    234         self._split(b)
--> 235         self._with_events(self._do_one_batch, 'batch', CancelBatchException)
    236 
    237     def _do_epoch_train(self):

/opt/conda/lib/python3.7/site-packages/fastai/learner.py in _with_events(self, f, event_type, ex, final)
    199         try: self(f'before_{event_type}');  f()
    200         except ex: self(f'after_cancel_{event_type}')
--> 201         self(f'after_{event_type}');  final()
    202 
    203     def all_batches(self):

/opt/conda/lib/python3.7/site-packages/fastai/learner.py in __call__(self, event_name)
    170 
    171     def ordered_cbs(self, event): return [cb for cb in self.cbs.sorted('order') if hasattr(cb, event)]
--> 172     def __call__(self, event_name): L(event_name).map(self._call_one)
    173 
    174     def _call_one(self, event_name):

/opt/conda/lib/python3.7/site-packages/fastcore/foundation.py in map(self, f, *args, **kwargs)
    154     def range(cls, a, b=None, step=None): return cls(range_of(a, b=b, step=step))
    155 
--> 156     def map(self, f, *args, **kwargs): return self._new(map_ex(self, f, *args, gen=False, **kwargs))
    157     def argwhere(self, f, negate=False, **kwargs): return self._new(argwhere(self, f, negate, **kwargs))
    158     def argfirst(self, f, negate=False):

/opt/conda/lib/python3.7/site-packages/fastcore/basics.py in map_ex(iterable, f, gen, *args, **kwargs)
    838     res = map(g, iterable)
    839     if gen: return res
--> 840     return list(res)
    841 
    842 # %% ../nbs/01_basics.ipynb 336

/opt/conda/lib/python3.7/site-packages/fastcore/basics.py in __call__(self, *args, **kwargs)
    823             if isinstance(v,_Arg): kwargs[k] = args.pop(v.i)
    824         fargs = [args[x.i] if isinstance(x, _Arg) else x for x in self.pargs] + args[self.maxi+1:]
--> 825         return self.func(*fargs, **kwargs)
    826 
    827 # %% ../nbs/01_basics.ipynb 326

/opt/conda/lib/python3.7/site-packages/fastai/learner.py in _call_one(self, event_name)
    174     def _call_one(self, event_name):
    175         if not hasattr(event, event_name): raise Exception(f'missing {event_name}')
--> 176         for cb in self.cbs.sorted('order'): cb(event_name)
    177 
    178     def _bn_bias_state(self, with_bias): return norm_bias_params(self.model, with_bias).map(self.opt.state)

/opt/conda/lib/python3.7/site-packages/fastai/callback/core.py in __call__(self, event_name)
     60             try: res = getcallable(self, event_name)()
     61             except (CancelBatchException, CancelBackwardException, CancelEpochException, CancelFitException, CancelStepException, CancelTrainException, CancelValidException): raise
---> 62             except Exception as e: raise modify_exception(e, f'Exception occured in `{self.__class__.__name__}` when calling event `{event_name}`:\n\t{e.args[0]}', replace=True)
     63         if event_name=='after_fit': self.run=True #Reset self.run to True at each end of fit
     64         return res

/opt/conda/lib/python3.7/site-packages/fastai/callback/core.py in __call__(self, event_name)
     58         res = None
     59         if self.run and _run:
---> 60             try: res = getcallable(self, event_name)()
     61             except (CancelBatchException, CancelBackwardException, CancelEpochException, CancelFitException, CancelStepException, CancelTrainException, CancelValidException): raise
     62             except Exception as e: raise modify_exception(e, f'Exception occured in `{self.__class__.__name__}` when calling event `{event_name}`:\n\t{e.args[0]}', replace=True)

/opt/conda/lib/python3.7/site-packages/fastai/learner.py in after_batch(self)
    558         if len(self.yb) == 0: return
    559         mets = self._train_mets if self.training else self._valid_mets
--> 560         for met in mets: met.accumulate(self.learn)
    561         if not self.training: return
    562         self.lrs.append(self.opt.hypers[-1]['lr'])

/opt/conda/lib/python3.7/site-packages/fastai/learner.py in accumulate(self, learn)
    480     def accumulate(self, learn):
    481         bs = find_bs(learn.yb)
--> 482         self.total += learn.to_detach(self.func(learn.pred, *learn.yb))*bs
    483         self.count += bs
    484     @property

/opt/conda/lib/python3.7/site-packages/fastai/metrics.py in accuracy_multi(inp, targ, thresh, sigmoid)
    199 def accuracy_multi(inp, targ, thresh=0.5, sigmoid=True):
    200     "Compute accuracy when `inp` and `targ` are the same size."
--> 201     inp,targ = flatten_check(inp,targ)
    202     if sigmoid: inp = inp.sigmoid()
    203     return ((inp>thresh)==targ.bool()).float().mean()

/opt/conda/lib/python3.7/site-packages/fastai/torch_core.py in flatten_check(inp, targ)
    785     "Check that `inp` and `targ` have the same number of elements and flatten them."
    786     inp,targ = TensorBase(inp.contiguous()).view(-1),TensorBase(targ.contiguous()).view(-1)
--> 787     test_eq(len(inp), len(targ))
    788     return inp,targ
    789 

/opt/conda/lib/python3.7/site-packages/fastcore/test.py in test_eq(a, b)
     35 def test_eq(a,b):
     36     "`test` that `a==b`"
---> 37     test(a,b,equals, cname='==')
     38 
     39 # %% ../nbs/00_test.ipynb 25

/opt/conda/lib/python3.7/site-packages/fastcore/test.py in test(a, b, cmp, cname)
     25     "`assert` that `cmp(a,b)`; display inputs and `cname or cmp.__name__` if it fails"
     26     if cname is None: cname=cmp.__name__
---> 27     assert cmp(a,b),f"{cname}:\n{a}\n{b}"
     28 
     29 # %% ../nbs/00_test.ipynb 16

AssertionError: Exception occured in `Recorder` when calling event `after_batch`:
	==:
96
32

Here is my code:

#establish which DL model to use
#here we need to change the performance metric as a unidimensional error rate won't work
#brie_pop=BrierScoreMulti(thresh=0.4) did not work due to non-mutual exclusivity of strictly proper scoring rule?
fbeta_macro=FBetaMulti(beta=1.1, thresh=0.5, average='macro')
fbeta_macro.name='FBeta (macro)'
fbeta_sample=FBetaMulti(beta=1.1, thresh=0.5, average='samples')
fbeta_sample.name='FBeta (samples)'

#multiclass modification
learn2=vision_learner(dls2, resnet18, metrics=[partial(accuracy_multi, thresh=0.5), fbeta_macro, fbeta_sample] )
learn2.lr_find() #find the optimal learning rate
learn2.fine_tune(3, 1e-3) #args: n epochs, learning rate. Not in documentation for some reason...ugh

Has anyone else experienced a similar error, or does anyone have the API expertise to find the root cause of this issue?
Thanks for your time!

I’m not very familiar with this, but I’ll give it a go…

Your error report shows you using three custom metrics:

metrics=[partial(accuracy_multi, thresh=0.5),
         fbeta_macro, fbeta_sample]

The first thing that would help is to try each one separately, to narrow down the issue.

You don’t say if you already had a minimal metric implementation working.

The minimal metric example is:

def custom_accuracy(preds, targets, threshold=0.5):
    preds2 = (preds > threshold).float()           #binarise predictions at the threshold
    accuracy = (preds2 == targets).float().mean()  #fraction of elements that match
    return accuracy

learn = vision_learner(dls, resnet34, metrics=[custom_accuracy])

I notice the FBetaMulti signature does not have preds and targets parameters, so something might clash there.

FBetaMulti (beta, thresh=0.5, sigmoid=True, labels=None, pos_label=1,
             average='macro', sample_weight=None)

I’m not familiar with the FBetaMulti function, but its documentation says it’s for “multi-label classification problems”, which is different from “multi-class classification” with a single label.

IIUC…

  • Multi-class classification: Each instance belongs to exactly one class from a set of mutually exclusive classes.
  • Multi-label classification: Each instance can have multiple labels, and the classes are not mutually exclusive.

Pay attention to the warning under Single-label classification.
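
If each image belongs to exactly one of your three categories, the single-label metrics may be a better fit than the multi-label ones. A minimal sketch, assuming dls2 is a single-label, 3-class DataLoaders (note the 96 vs 32 in the assertion looks like 3 classes × batch size 32 predictions against 32 single-label targets):

from fastai.vision.all import *

#single-label metrics expect one class index per item
learn2 = vision_learner(dls2, resnet18,
                        metrics=[error_rate, FBeta(beta=1.1, average='macro')])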

So an approach to solve this would be:

  1. Drop back to a single custom metric
  2. Get the basic example working
  3. Wrap your calls to FBeta inside your basic example, so you can sprinkle debug prints showing the type and content of the data going in and out between the “basic example” and “FBeta” (see the sketch below).
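
For step 3, a throwaway sketch of the kind of probe I mean (shape_probe is a made-up name; it only reports what the Learner actually passes to a metric, so you can compare that with what FBetaMulti expects):

def shape_probe(preds, targets):
    #temporary "metric": print what the Learner passes in, return a dummy value
    print(f"preds: {tuple(preds.shape)}, targets: {tuple(targets.shape)}")
    return 0.0

learn2 = vision_learner(dls2, resnet18, metrics=[shape_probe])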

Thanks Ben, your suggestion worked well - the basic implementation got the job done, and I’ll play around with more complicated metrics for a little while before moving on to the next lesson. I appreciate your clear exposition of the problem.

Take care


The lesson 1 cats and dogs “running your first notebook” model example screenshot shows an error rate under 0.10. (https://github.com/fastai/fastbook/blob/master/01_intro.ipynb)

But when I try to run it locally, the error rate is always 0.331529. It seems like it should at least change from epoch to epoch, and eventually get lower?

In the “Limitations Inherent To Machine Learning” section, one of the points mentioned is that “This learning approach only creates predictions, not recommended actions.”

What kind of actions did he have in mind? And how are they different from predictions (which can theoretically inform actions)?

I’m not sure where the error is in my training, but I tried to change bird/forest to seaweed/car and I am getting odd results. While it is correctly detecting seaweed or a car, the probability is really off. For my baseline seaweed.jpg, for example, it is saying ‘0.00000’. What am I overlooking?
Also, what is returned in probs[0] vs probs[1]?


Can you print out the full output from learn.predict and see what that shows? My guess is that “seaweed” is actually the second class (with the first being “car”) and so the probs value corresponding to that would be probs[1]. probs is the predicted value for each class, with probs[0] being the prediction for the first class, and probs[1] being the prediction of the second class. Since this is a classification problem, probs is the probability of each class that the model predicts for the image.

Also, the second value returned by learn.predict returns the index of the class with the highest probability, so printing that out you can see if it’s 0 or 1 and compare that with the probs tensor.

You are correct that it could be reversed, but how do I determine which class is where? I looked at the previous example, where I replaced forest with car and bird with seaweed respectively, but that example also used probs[0] to confirm bird.jpg. Here is my screenshot.

This is the bird code; it expects the bird class to be in probs[0].
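
From memory, the snippet in the notebook looks something like this (reconstructed, assuming the bird.jpg test image from the lesson):

is_bird,_,probs = learn.predict(PILImage.create('bird.jpg'))
print(f"This is a: {is_bird}.")
print(f"Probability it's a bird: {probs[0]:.4f}")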


Are the original instructions incorrect?

This is a common confusion, since the example is extra-simplified and ignores the middle value of the three values returned by learn.predict().
Try…

pred, ndx, probs = learn.predict(...)
print(f"This is a {pred} with probability {probs[ndx]}.")

Ah, that helps. So basically the function comes back with a prediction (pred) of the two things it understands (bird, forest), and then tells you where in the tensor (probs) you can see the probability, with ndx as the key?


Hi, the images in the markdown cells of the notebook do not render when opened in Colab. Any idea how I can get them displayed?

Hi, I have a question. I am on lesson 1, and when I test with an image that is neither a bird nor a forest, the model seems to interpret everything as a bird. Is this ok? Can it only distinguish birds and forests? Thanks in advance.

Hi @renzoide, as far as I understand, you can train the model to recognize many things… but it cannot recognize a desert if you did not include deserts in your training set. Basically, you will have to replace ‘bird’ with ‘desert’ starting from the part where you build the dataset, like so:
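
A sketch of that change, assuming the download/resize loop from the notebook (where search_images is the small helper defined earlier in it):

searches = 'desert','forest'   #swap 'bird' for 'desert', or add it as a third category
path = Path('desert_or_not')

for o in searches:
    dest = (path/o)
    dest.mkdir(exist_ok=True, parents=True)
    download_images(dest, urls=search_images(f'{o} photo'))
    resize_images(dest, max_size=400, dest=dest)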


For anyone else running into the same issue, I thought I could post my problem and solution here:

Google Colab was really slow for me while running the first model in the Chapter 1 notebook - I got bored of waiting after half an hour. I solved this by changing the hardware accelerator to GPU under Runtime > Change runtime type.

There are some tutorials online on how to run these notebooks with a GPU, but it seems like it became a lot easier since those were written 🙂

cleaner = ImageClassifierCleaner(learn)
cleaner

We have the option to delete or keep the images that ImageClassifierCleaner lists as being misclassified, in accordance with what we learned in class.
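
(For reference, the notebook applies the cleaner’s selections with a small loop along these lines - my recollection of the chapter 2 snippet:)

import shutil

#unlink the images marked for deletion; move the re-labelled ones to their new class folder
for idx in cleaner.delete(): cleaner.fns[idx].unlink()
for idx,cat in cleaner.change(): shutil.move(str(cleaner.fns[idx]), path/cat)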

I have photographs that pertain to multiple classes; is it possible to delete all of the images for each class at once?
For instance: the widget shows photographs of cats. When I click on Train, I can select the images I want to delete, and when I click on Test, I can similarly select the images I want to delete.
Also, why do photographs selected for deletion display the keep tag again when I go back to the Train section?

Hi @Liannnnn, what you say makes sense: since the model can only predict from the training data, if it is given another type of image that was not part of the training data, it will try to “label” it as one of the two groups. Thanks.


Hello,
I tried training the sample sentiment analysis using Colab with a GPU. At the end of the training, I gave it a simple movie review of my own - “I really did not like that movie” - and the model said it was a positive sentiment with ~88% confidence.
I am attaching the screenshot for your reference below.
This output really makes me wonder what went wrong here.
Is the model relying on the keyword “like” in deciding that it is a positive sentiment?
It would be great if you could help explain this.

That’s a great way to test your model and learn about its limitations!

No one here can say exactly what went wrong just by looking at the notebook, but here are some ways to approach it:

  • Your accuracy peaks at 93%, meaning 7% of the samples were incorrectly classified. Can you figure out a way to display those? (See the sketch after this list.)
  • Add a few negative examples with the word “like” to your training set, train another model and see how the model’s behaviour changes. Be careful however not to then test your model with the same exact examples.
  • Probably a bit overkill this early in the course, but there are ways to extract, from any given inference run, indicators of which parts of the input data influenced the output most.
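
For the first bullet, a minimal sketch, assuming learn is your trained text classifier:

#inspect which validation samples the model gets wrong, and how
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()   #which classes get confused with which
interp.plot_top_losses(9)        #the 9 samples the model was most wrong about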

Other things to note:

  • Between epoch 2 and 3, your model shows signs of overfitting. Perhaps the hyperparameters can be tuned better?
  • You could also read up on the pretrained model weights provided with AWD_LSTM, and try to assess how suitable they are for this task.

Hello Axel,
Thanks for your suggestions. While I am barely starting out, your suggestions make a lot of sense. I will certainly try to gain more expertise and poke around this a little bit. I think you are alluding to things like the confusion matrix to dig a little deeper, but it is going to take me a few weeks to get there. I appreciate the help!