Does This Loss Curve Mean I Need More Dropout or Fewer Epochs?

I’m training a ULMFiT language model on around 80K tweets using the following code:

from fastai.text import *  # fastai v1; provides language_model_learner and AWD_LSTM

# data_lm and path are defined earlier
learn = language_model_learner(data_lm, arch=AWD_LSTM, drop_mult=0.8)

# first train only the last layer group
learn.freeze()
learn.fit_one_cycle(cyc_len=1, max_lr=1e-2, moms=(0.8, 0.7))

# then fine-tune the whole model
learn.unfreeze()
learn.fit_one_cycle(cyc_len=20, max_lr=1e-2, moms=(0.8, 0.7))

# plot losses
losses_lm_fig = learn.recorder.plot_losses(return_fig=True)
losses_lm_fig.savefig(path / 'losses_lm_fig.jpg', dpi=1000, bbox_inches='tight')

Should I increase the drop_mult or lower the number of epochs?

As you are clearly overfitting, you have several options:

  • Increasing dropout is one, but be careful not to push it too high, or your network will learn nothing.
  • You can add more weight decay (the wd argument of fit_one_cycle), which keeps your weights from growing too large (see the sketch after this list).
  • You can use a smaller/simpler model.
  • You can add more data.
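For the first two options, here is a minimal sketch of where those knobs live in fastai v1; the values drop_mult=1.0 and wd=0.1 are purely illustrative, not recommendations:

# higher drop_mult scales up all the dropout probabilities inside the AWD_LSTM (illustrative value)
learn = language_model_learner(data_lm, arch=AWD_LSTM, drop_mult=1.0)
# wd applies weight decay during training (illustrative value)
learn.fit_one_cycle(cyc_len=1, max_lr=1e-2, moms=(0.8, 0.7), wd=0.1)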

The last of these is clearly the best option, but also the hardest, since you can’t get infinite data. Just keep in mind that sometimes there is nothing more for your model to learn and your dataset simply isn’t good enough for the task you are trying to solve.

Finally, on the number of epochs: you can reduce it, but that won’t get you a better validation loss than the best one you reach here. Just use a callback to save the weights whenever the validation loss hits a new best and you should be fine. You might also want to do more epochs before unfreezing.
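For instance, with fastai v1’s SaveModelCallback (the name 'best_lm' is just a placeholder):

from fastai.callbacks import SaveModelCallback
# saves the model each time valid_loss improves and loads the best weights back at the end of training
learn.fit_one_cycle(cyc_len=20, max_lr=1e-2, moms=(0.8, 0.7),
                    callbacks=[SaveModelCallback(learn, monitor='valid_loss', every='improvement', name='best_lm')])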

Yes, I will try using a callback to save the best version. Why do you recommend more epochs before unfreezing? In lesson 4 of the fastai course, the language model demonstrated for IMDb text classification only trains for 1 epoch before unfreezing.

When frozen, the model is effectively smaller and therefore doesn’t tend to overfit as much. I personally like to see how good a model can get while frozen before trying to unfreeze. Most of the time, unfreezing doesn’t make the model better; it just makes it overfit. So as long as your model keeps learning, I suggest not unfreezing it. Once it is as good as it can get, you can try unfreezing to see whether you make any meaningful progress.
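In code, that strategy could look roughly like this; the epoch counts and learning rates are made up for illustration:

# keep training frozen for as long as validation loss keeps improving
learn.freeze()
learn.fit_one_cycle(cyc_len=8, max_lr=1e-2, moms=(0.8, 0.7))
# only unfreeze once the frozen model has plateaued, then fine-tune gently
learn.unfreeze()
learn.fit_one_cycle(cyc_len=2, max_lr=1e-3, moms=(0.8, 0.7))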

That makes sense. Do you also think a similar approach should be used when training the classifier? Right now I am doing the following:

# gradual unfreezing
# stage 1: classifier still frozen, so only the last layer group trains
learn.fit_one_cycle(cyc_len=1, max_lr=1e-3, moms=(0.8, 0.7))

# stage 2: unfreeze the last two layer groups, with discriminative learning rates
learn.freeze_to(-2)
learn.fit_one_cycle(1, slice(1e-2 / (2.6 ** 4), 1e-2), moms=(0.8, 0.7))

# stage 3: unfreeze the last three layer groups
learn.freeze_to(-3)
learn.fit_one_cycle(1, slice(5e-3 / (2.6 ** 4), 5e-3), moms=(0.8, 0.7))

# stage 4: unfreeze everything and keep the model with the best f_beta
learn.unfreeze()
learn.fit_one_cycle(4, slice(1e-3 / (2.6 ** 4), 1e-3), moms=(0.8, 0.7),
                    callbacks=[callbacks.SaveModelCallback(learn, monitor='f_beta', name='model')])

Would you recommend doing [1, 1, 4] epochs across the stages instead of [1, 1, 1, 4]? Or maybe something even more extreme, like [2, 2, 2]?

Well, I honestly don’t know; you have to try and see what works best.

Yeah, fair enough.