Lesson 2 - Non-beginner discussion

While running the first cat/dog example from 01_intro, I noticed that when I trained the model, the second stage of fine_tune (where the whole model is updated) seems to overfit: the error rate increases and the validation loss increases while the training loss drops significantly. Is that overfitting an oversight, or expected behavior?

A link to this awesome website seems very appropriate:


I have a requirement to compare inference metrics (output metrics such as accuracy, F1 score, etc.) of various models on a certain task (e.g. text classification) and pick the best model. How do you do it? Has anybody tried using statistical significance tests for this? Thanks.

Maybe we should build this into fastai at some point, at least the easy incarnation involving rotating the images.


Generally I’d imagine you compare their results on a held-out test set. If you wanted to use a significance test you could; just make sure to do multiple runs with your models if you can (i.e. 3 or 5 times).
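As a minimal sketch of what that could look like (not from this thread; the accuracy numbers below are made up), you could run each model a few times on the same held-out test set and do a paired t-test on the recorded metric:

```python
# Minimal sketch: paired t-test over several runs of two models.
# The accuracy values are hypothetical placeholders.
from scipy.stats import ttest_rel

model_a_acc = [0.912, 0.905, 0.918, 0.909, 0.915]
model_b_acc = [0.901, 0.898, 0.907, 0.899, 0.903]

t_stat, p_value = ttest_rel(model_a_acc, model_b_acc)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
# A small p-value suggests the gap between the models is unlikely to be
# explained by run-to-run variance alone.
```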


I see that DataLoader subclasses GetAttr. Can you explain it a bit?
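In case a small illustration helps while waiting for an answer: this is a simplified sketch of fastcore’s GetAttr delegation pattern, not fastai’s actual DataLoader source. GetAttr makes failed attribute lookups fall back to the object named in `_default`, so (if I recall correctly) DataLoader can expose attributes of its underlying dataset without writing pass-through properties.

```python
# Simplified sketch of the GetAttr delegation pattern (hypothetical classes).
from fastcore.all import GetAttr

class Dataset:
    def __init__(self): self.vocab = ['cat', 'dog']

class MyLoader(GetAttr):
    _default = 'dataset'          # unknown attributes are looked up on self.dataset
    def __init__(self, dataset): self.dataset = dataset

dl = MyLoader(Dataset())
print(dl.vocab)                   # ['cat', 'dog'] -- delegated to dl.dataset.vocab
```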

Can you share the intuition behind watching a metric vs. the loss on the validation set during training? I thought a metric like accuracy is much more volatile, especially if the validation set is small, so choosing checkpoints based on minimizing validation loss seemed like a good idea to me.

One thing to consider: for simplicity, let’s take a classification setting. The value of a cross-entropy loss depends not only on whether the image is classified correctly, but also on the confidence that the model has in the prediction. So your loss can increase if the model is getting more things wrong, OR if the model is becoming less confident about some predictions.

Intuitively, the second thing might not necessarily be bad: if the model was overconfident for some reason earlier, it’s ok if it becomes less confident now (and so the loss increases) as long as the prediction is still correct. If you think in these terms, you see how you might get a loss that’s increasing and an accuracy that is improving.

For example, the model might now be learning how to correctly classify some data points that it was getting wrong earlier (which would decrease the loss by a certain amount A), and in order to do so it might need to become less confident about other examples that it was already getting right (which would increase the loss by B). If B > A then you get a net increase in the loss, but also an improved accuracy.
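A toy numeric illustration of that point (made-up probabilities, plain Python rather than anything fastai-specific):

```python
import math

def avg_ce(p_true):
    "Average cross-entropy given the probability assigned to the true class."
    return sum(-math.log(p) for p in p_true) / len(p_true)

# Before: one very confident correct prediction, one wrong (p(true) = 0.45)
print(avg_ce([0.95, 0.45]))   # ~0.42, accuracy 50%
# After: both correct, but with modest confidence
print(avg_ce([0.55, 0.55]))   # ~0.60, accuracy 100% -- loss went up, so did accuracy
```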


Hey, can anyone point me in the right direction with this?
In chapter one of fastbook, there is this statement:

The importance of pretrained models is generally not recognized or discussed in most courses, books, or software library features, and is rarely considered in academic papers. As we write this at the start of 2020, things are just starting to change, but it’s likely to take a while.

My interest is in just how things are changing. Are there any papers tackling this that you can point us to, or any interesting ideas you can share regarding this?

An example of this is the ULMFiT paper :slight_smile:

But if I am correct, ULMFiT is a 2018 paper.

Just because it’s old (three years) doesn’t mean it doesn’t work any more :wink: ULMFiT was the start of utilizing transfer learning for text data. MultiFiT just came out in the last year or so, which uses this approach for multilingual problems.

I didn’t mean to say that it doesn’t work; what I am trying to say is that, from the book:

As we write this at the start of 2020, things are just starting to change, but it’s likely to take a while.

It says that things are just starting to change in 2020, so I wanted to be shown the things that are happening now in that area.

I had asked this question yesterday and @sgugger had replied, but I wanted to follow up here with some clarifications too. Basically, I would like to get an intuition about transfer learning and its relationship with neural nets/DL. Per my understanding, the concept of transfer learning predates DL. The way @jeremy explained it yesterday was about using a base ResNet (trained for a different task) to improve the book/pets detector, which could be argued for classical ML classifiers too. So is there something inherent in the architecture of neural networks that makes them more effective for transfer learning compared to, say, random forests / GBTs etc.? @jeremy @sgugger Thanks!

I am trying to build a databunch/dataloader from already pre-processed data (stored as numpy arrays: x_train, x_test, y_train, y_test). However, I am not sure how to do that, as fastai2 expects a ‘path’ (of image file names) as input… Is there any way I can feed this already pre-processed data directly to a learner?

Yes, you can totally do that. Look at the Data block API (https://docs.fast.ai/data_block.html). Jeremy gave an overview at the end of last lesson.

That is referring to fastai v1; I need it for v2. Moreover, going through the document I didn’t see how to pass the numpy arrays as input to create a dataloader.

In chapter 7, in the progressive resizing section, it says that one should be careful using this on pre-trained models: if the transfer-learning dataset is similar to the original dataset in terms of images and sizes, the weights will not change much, and so training on smaller images might damage the already learnt pre-trained weights.
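For reference, the progressive resizing recipe in fastai v2 looks roughly like this (a sketch only; `get_dls(size, bs)` is a hypothetical helper that rebuilds your DataLoaders at a given image size, not a fastai function):

```python
from fastai.vision.all import *

# get_dls(size, bs) is assumed to rebuild the DataBlock/DataLoaders at the
# requested image size -- define it for your own dataset.
learn = cnn_learner(get_dls(size=128, bs=64), resnet34, metrics=error_rate)
learn.fine_tune(4)                     # train at the small size first

learn.dls = get_dls(size=224, bs=32)   # swap in larger images
learn.fine_tune(6)                     # continue training at the bigger size
```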

Is this what Jeremy referred to as “catastrophic forgetting in transfer learning” in lesson 2?

Have a look here:

The idea with the synthetic Gabor filters seems to be super useful (see the last picture in the blog post).

Be sure to check the publication, as they investigated other interesting concepts too.


The docs are fastai v1, but the same functionality is present in v2 (although slightly different). Look here:

(minute 1:23:37)

You just need to create your own get_items (so remove get_image_files and insert your own).
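As a rough sketch (not a tested recipe) for the numpy-array case above: one way is to make the items be row indices and let get_x/get_y index into your arrays. Everything here assumes x_train is an array of images (N, H, W, C) and y_train an array of labels, already in memory.

```python
from fastai.vision.all import *
import numpy as np

# Items are just row indices into the pre-processed arrays.
def get_items(_): return list(range(len(x_train)))

dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_items,
    get_x=lambda i: PILImage.create(x_train[i].astype(np.uint8)),
    get_y=lambda i: y_train[i],
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
)
dls = dblock.dataloaders(None, bs=64)   # the source is unused, so pass None
```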