Lesson 2 Download nb - underfitting

I’ve been trying to eliminate underfitting in this nb with my dog breed dataset. This discussion moved here from:

I’m copying over some of the posts from there…

Cool. I noticed that in your last experiment you are still underfitting, but your validation loss is extremely low. In fact, your error_rate is 0, so your model is classifying every single example in your validation set correctly.

Bad news: it will be difficult for your training loss to get that low.
Good news: you are basically done with your training.

If you want to keep improving your model, I would increase the size of your validation set and try again.

BTW: what is the size of your validation set?

Here’s the full post for the record:

The funny thing is, I just ran:
learn = create_cnn(data, models.resnet34, ps=0.1, wd=0.04, metrics=error_rate)
for 24 epochs this time, and finally got train_loss < valid_loss!

So that feels like success, but I don’t understand it! Does it make sense to you?

I’ll run with wd=0.001 now.


Only 75 in validation:

I have been wondering if this underfitting behavior has more to do with the data than anything else.

So yeah, basically decreasing wd and ps in your last experiment helped your model achieve a great validation loss. Notice that your error rate does not decrease. Since your validation set has 75 images and 1/75 ≈ 0.0133, an error rate at that value means your model is classifying 74 images correctly and one incorrectly. Since what you care about is performance on your validation set, that is already an excellent result. Your validation loss keeps decreasing because the loss (cross-entropy) is computed from the probabilities the model assigns to each category, not from the final classification. Think of it as how sure the model is of each prediction: classifying correctly with high confidence gives a lower loss than classifying correctly with a lot of doubt. So once your validation loss is very low and your accuracy is near perfect, you are practically done.
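To make the loss-versus-error-rate point concrete, here is a small plain-Python sketch. The helper names are hypothetical and this is not fastai's implementation: fastai computes these with PyTorch tensors, and PyTorch's cross-entropy starts from logits and averages over the batch. Two predictions that are both correct give the same error rate but very different cross-entropy:

```python
import math

def cross_entropy(probs, target):
    """Cross-entropy for one example: -log of the probability given to the true class."""
    return -math.log(probs[target])

def error_rate(batch_probs, targets):
    """Fraction of examples whose highest-probability class is not the true class."""
    wrong = sum(
        1 for probs, t in zip(batch_probs, targets)
        if max(range(len(probs)), key=probs.__getitem__) != t
    )
    return wrong / len(targets)

# Two hypothetical 3-class predictions for the same image (true class = 0).
confident = [0.95, 0.03, 0.02]
hesitant = [0.40, 0.35, 0.25]

print(error_rate([confident], [0]), error_rate([hesitant], [0]))  # 0.0 0.0: both correct
print(round(cross_entropy(confident, 0), 3))  # 0.051: sure and right, small loss
print(round(cross_entropy(hesitant, 0), 3))   # 0.916: right but doubtful, larger loss
```

So the validation loss can keep falling while error_rate sits still, simply because the model is becoming more confident about predictions it already gets right.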

I would suggest increasing your dataset size to check whether the model performs as well on a larger validation set. And if it does, move on to another problem!
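As a rough sketch of why a bigger validation set helps, assuming each validation image is an independent draw so the error rate behaves like a binomial proportion: with n images, one mistake moves error_rate by 1/n, and the standard error of the measured rate scales like sqrt(p(1 - p)/n). The 2% "true" error rate below is a hypothetical value, not something from this thread.

```python
import math

def error_rate_step(n_valid):
    """Smallest change a single image can make to error_rate on n_valid images."""
    return 1 / n_valid

def error_rate_std_error(p, n_valid):
    """Approximate standard error of an error rate p measured on n_valid independent images."""
    return math.sqrt(p * (1 - p) / n_valid)

p_assumed = 0.02  # hypothetical true error rate, for illustration only
for n in (75, 750):
    print(f"n={n}: one image = {error_rate_step(n):.4f}, "
          f"std error = {error_rate_std_error(p_assumed, n):.4f}")
```

Ten times more validation images makes each image count ten times less and shrinks the standard error by about sqrt(10) ≈ 3.2, so a good result on the larger set is much stronger evidence.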


Thanks, that’s helpful advice. Sadly, I just re-ran my best case for 30 epochs to see if it would continue improving - and it’s worse now!

learn = create_cnn(data, models.resnet34, ps=0.1, wd=0.04, metrics=error_rate)


Does this inconsistency make any sense??

I’d like to try adding to my dataset. But I think you’re right - I need to move on to Lesson 3!

Francisco, thanks very much for all your help!


When the dataset is this small, you can see noticeable differences in results between training runs. I wouldn’t worry about it too much, but if you want to be really sure, you can increase your dataset’s size and the results should vary less between runs.
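One source of run-to-run differences is the random train/validation split itself: if the DataBunch is rebuilt without a fixed seed, you get a different set of validation images each time. Here is a stdlib-only sketch of a seeded split; the helper and file names are hypothetical. In fastai v1 I believe the random split draws from NumPy's RNG, so setting something like `np.random.seed(42)` before building the DataBunch plays the same role. Fixing the split does not remove the other randomness (head initialisation, augmentation, batch shuffling), so small-data runs can still differ.

```python
import random

def split_train_valid(items, valid_pct=0.2, seed=None):
    """Shuffle a copy of items and split off valid_pct of them; a fixed seed repeats the split."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_valid = round(len(shuffled) * valid_pct)
    return shuffled[n_valid:], shuffled[:n_valid]  # (train, valid)

# Hypothetical file names; 375 images at valid_pct=0.2 gives 75 validation images.
files = [f"dog_{i:03d}.jpg" for i in range(375)]

_, valid_a = split_train_valid(files, seed=42)
_, valid_b = split_train_valid(files, seed=42)
_, valid_c = split_train_valid(files, seed=7)

print(valid_a == valid_b)  # True: same seed, same validation images
print(len(set(valid_a) & set(valid_c)), "of 75 shared with a different seed")
```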


Thanks, that makes sense. I guess as more proof of that: I reset the DataBunch and ran for 36 epochs with the same ps, wd, etc. and got:


Weird! Next time I’ll be sure to use more data.

Also I hereby recognize you as a TA Superstar!!


In the previous version of the fastai library we used the cycle_mult parameter, which helps mitigate underfitting. I don’t know if there is something like that in the new version.
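For reference, here is a plain-Python sketch of what cycle_mult did conceptually in fastai 0.7's `learn.fit(lr, n_cycle, cycle_len=..., cycle_mult=...)`: SGDR-style cosine annealing where each restart cycle is cycle_mult times longer than the previous one, so n_cycle=3, cycle_len=1, cycle_mult=2 trains for 1 + 2 + 4 = 7 epochs. This is a per-epoch simplification with hypothetical helper names, not the library's code (which updates the learning rate per batch), and lr_max is an arbitrary example value.

```python
import math

def sgdr_cycle_lengths(n_cycle, cycle_len=1, cycle_mult=2):
    """Epochs per cycle when each cycle is cycle_mult times longer than the previous one."""
    return [cycle_len * cycle_mult ** i for i in range(n_cycle)]

def sgdr_lr(epoch_pos, lengths, lr_max):
    """Cosine-annealed learning rate at a (fractional) epoch position, restarting each cycle."""
    start = 0.0
    for length in lengths:
        if epoch_pos < start + length:
            progress = (epoch_pos - start) / length  # 0 at cycle start, approaches 1 at the end
            return lr_max * (1 + math.cos(math.pi * progress)) / 2
        start += length
    return 0.0  # past the end of the schedule

lengths = sgdr_cycle_lengths(n_cycle=3, cycle_len=1, cycle_mult=2)
print(lengths, sum(lengths))  # [1, 2, 4] 7
for e in (0, 0.5, 1, 2, 3, 5):
    print(e, round(sgdr_lr(e, lengths, lr_max=0.01), 5))  # jumps back to lr_max at 1 and 3
```

Longer later cycles give the model more time at low learning rates before each restart, which is presumably why it seemed to help with underfitting.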


@lesscomfortable when can you say the validation loss is very low: is it < 0.001, < 0.01, or something else? And if train_loss - valid_loss = e, at what value of e should we stop tearing our hair out and stop trying to push down the train_loss? Another issue comes up here: what if the dataset is huge? Adding a few epochs at a time in the hope of reducing the train_loss isn’t feasible, and even for that we have to throw away the weights we just trained, or else the loss starts blowing up instead of coming down. If I had to do what @ricknta did (i.e. 30 epochs, but with a training set of 100,000 samples), it would be overkill, and since I use Colab it would have been impossible to train the model.

If your validation loss is low and your accuracy high enough, you don’t need to worry much if you cannot get your model to overfit, since it is performing well out of sample (on images it has not trained on). You try to overfit at the beginning because you first want to confirm that your model trains properly on your training set; you can always increase regularization afterwards to improve performance on the validation set.
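The rule of thumb in this thread can be written down as a tiny hypothetical helper: train_loss above valid_loss suggests the model is still underfitting, but if the validation error already meets whatever target you care about, further tuning is optional. The target_error threshold is an assumption you would pick for your own problem; there is no universal cutoff.

```python
def diagnose(train_loss, valid_loss, error_rate, target_error=None):
    """Hypothetical helper: read one epoch's numbers using this thread's rules of thumb."""
    notes = []
    if train_loss > valid_loss:
        # Training loss above validation loss: the model is still underfitting.
        notes.append("underfitting: try more epochs or less regularization (wd, ps)")
    else:
        notes.append("fitting: the model fits the training set; regularization can be raised later")
    if target_error is not None and error_rate <= target_error:
        # Out-of-sample performance is what matters; if it is good enough, stop tuning.
        notes.append("validation error meets your target: further tuning is optional")
    return notes

# Hypothetical epoch: still underfitting, but already at 0% validation error.
for note in diagnose(train_loss=0.30, valid_loss=0.05, error_rate=0.0, target_error=0.02):
    print(note)
```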

As for the other question, you need to train for at least two or three epochs to be able to tweak your hyperparameters (lr, wd, ps) in a meaningful way, regardless of dataset size.

Sorry, I think I framed my second question poorly. Can you please check this thread and answer here?

Maybe I can help a bit, although I’m not an expert! I spent hours trying many combinations of lr and epochs, and then also ps (dropout) and wd (weight decay), and couldn’t consistently get train_loss < valid_loss. What Francisco helped me understand is that this was partly due to not having enough data, and that it doesn’t really matter, since the model is actually performing very well, with very low losses and error rate. Hope this helps. Francisco, please correct me if needed!

Sure. Thanks for the explanation; I understand what you and Francisco were saying. Unfortunately I have the exact opposite problem: an enormous amount of data (not as huge as ImageNet, though), so training is very slow (about an hour for just 2 epochs on an 11 GB K80 card). So if, midway through training (say at epoch 9 of 14), you realise that 3 more epochs might have saved you from underfitting, you still have to wait for the remaining 5 epochs, throw that result away, and train again for 17 epochs (almost 9 hours), when a bit of foresight would have saved you roughly 7 hours.
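The cost in that scenario is simple arithmetic. A sketch using the numbers from the post (about 0.5 hours per epoch, 14 epochs planned, 17 actually needed), assuming the too-short run is discarded entirely rather than resumed; the helper name is hypothetical:

```python
def restart_cost(hours_per_epoch, planned_epochs, needed_epochs):
    """Hours spent if a too-short run is finished, discarded, and retrained from scratch,
    versus planning enough epochs from the start."""
    with_restart = (planned_epochs + needed_epochs) * hours_per_epoch
    with_foresight = needed_epochs * hours_per_epoch
    return with_restart, with_foresight, with_restart - with_foresight

# From the post: ~1 hour per 2 epochs, 14 epochs planned, 17 needed.
with_restart, with_foresight, wasted = restart_cost(0.5, 14, 17)
print(with_restart, with_foresight, wasted)  # 15.5 8.5 7.0
```

The wasted time is exactly the abandoned run (14 epochs at 0.5 hours each).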