How to split the epochs when fine-tuning my language model

I want to fine-tune my language model. However, running it for 8 epochs takes so long that my online environment automatically disconnects before it finishes. I therefore need a way to split the epochs into smaller chunks and save the language model in between.

learn_lm.fit_one_cycle(8, 1e-3, moms=(0.8,0.7))

This is the code for 8 epochs without saving in between epochs.
How do I have to change the second and third parameters in order to make this work when I want something like this:

learn_lm.fit_one_cycle(4, 1e-3, moms=(0.8,0.7))
learn_lm.save('fine_tuned_4')
learn_lm.fit_one_cycle(4, 1e-3, moms=(0.8,0.7))

You may want to read this thread on how to use fit_one_cycle in practice.

To answer your question directly: you’ll need to do a little guesswork. Use lr_find to look at the learning rates that would fit, and see whether they actually need to change. If I were you, I’d stick with the same arguments as before, let it train, and see how the model performs. Then, perhaps, I’d go back to the start, train with the initial arguments for 4 epochs, and afterwards check whether a new learning rate is appropriate.
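That two-stage experiment might look like the sketch below (assuming fastai v1 and an existing language-model learner called learn_lm; the checkpoint name 'fine_tuned_4' is just a placeholder):

```python
# Sketch, fastai v1: first half of training with the original hyperparameters.
learn_lm.fit_one_cycle(4, 1e-3, moms=(0.8, 0.7))
learn_lm.save('fine_tuned_4')   # checkpoint so a disconnect costs nothing

# --- after reconnecting, recreate the learner, then reload the checkpoint ---
learn_lm.load('fine_tuned_4')
learn_lm.lr_find()              # optionally re-check the learning rate
learn_lm.recorder.plot()        # pick a new lr from the plot, or keep 1e-3
learn_lm.fit_one_cycle(4, 1e-3, moms=(0.8, 0.7))
```

Note that this runs two separate 4-epoch one-cycle schedules rather than one 8-epoch cycle, which is exactly why the learning rate may or may not need adjusting between runs.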

Besides that, does your online environment disconnect because it believes the session is idle? What kind of environment are you on? Is this in a notebook of some kind?


Thanks a lot for your answer! I will do as you suggested and try the two options: same parameters vs rerun learning rate finder after 4 epochs.
Do the momentum parameters not have to be adapted?

I am using the free GPUs provided by Paperspace Gradient. They run for a maximum of 6 hours. Since my language model was still improving after 6 hours, I want to try more epochs and save the model state in between.

If you’re using v1, there’s a start_pct parameter. Include a SaveModelCallback to save after each epoch; then you can see how many epochs your last run got through, adjust your start_pct accordingly, and continue training from there.

(this isn’t in v2 yet)


Link to callback since I was just reading about it myself.


Thanks a lot, I did not know about the SaveModelCallback feature!
I cannot find the start_pct parameter you mentioned. fit_one_cycle() has a parameter pct_start, but I am not sure from the documentation what it does. It also has a parameter start_epoch, which I think might be the solution here?

You want pct_start

According to this post, pct_start stands for the “Percentage of total number of epochs when learning rate rises during one cycle”.

I do not understand how I would relate this to picking up where I left off after 4 epochs when I want 8 in total. What percentage would I set it to?


@lewa sorry you’re right, it is start_epoch :slight_smile:
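Putting it together, resuming could look like this sketch (fastai v1; it assumes SaveModelCallback was run with every='epoch' and name='lm', so checkpoints are named 'lm_0', 'lm_1', …, and that the session died after the 4th epoch):

```python
# Sketch, fastai v1: after reconnecting, reload the last completed checkpoint.
# Epoch checkpoints are zero-indexed, so 'lm_3' is the 4th epoch.
learn_lm.load('lm_3')

# Resume the same 8-epoch one-cycle schedule from epoch 4, so the learning
# rate and momentum follow the remainder of the original cycle.
learn_lm.fit_one_cycle(8, 1e-3, moms=(0.8, 0.7), start_epoch=4)
```

Because start_epoch continues the original schedule, you keep the same second and third arguments (learning rate and momentums) rather than guessing new ones for a fresh cycle.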
