I’m using ULMFiT to fine-tune a language model and a text classifier. With gradual unfreezing on the text classifier learner, I’m noticing that accuracy starts low again each time layers are unfrozen. It’s almost as if fine-tuning is starting over.
What is the expected behavior? Is it expected that accuracy dips until the newly unfrozen layers are fine-tuned, or should accuracy always be increasing?
My code for training the text classifier (after fine-tuning the language model) looks like this:
data_clas = TextClasDataBunch.from_df(path="", train_df=df_trn, valid_df=df_val, vocab=data_lm.train_ds.vocab, min_freq=1, bs=32)
data_clas.save()
clearn = text_classifier_learner(data_clas, arch=model, drop_mult=0.5) # get the learner
clearn.load_encoder('ft_enc') # load the encoder fine-tuned on the language model
clearn.freeze() # train only the classifier head first
clearn.purge()
torch.cuda.empty_cache()
clearn.fit_one_cycle(cyc_len=400, max_lr=1e-2, moms=(0.8, 0.7))
Accuracy starts out very low after each of the unfreezing steps below:
torch.cuda.empty_cache()
clearn.freeze_to(-2) # unfreeze the last two layer groups
clearn.fit_one_cycle(20, slice(1e-4,1e-2), moms=(0.8,0.7))
torch.cuda.empty_cache()
clearn.freeze_to(-3) # unfreeze the last three layer groups
clearn.fit_one_cycle(20, slice(1e-5,5e-3), moms=(0.8,0.7))
torch.cuda.empty_cache()
clearn.unfreeze() # unfreeze the whole model
clearn.fit_one_cycle(500, slice(1e-5,1e-3), moms=(0.8,0.7))
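For context, here is my mental model of what `freeze_to` and the `slice(...)` learning rates do per layer group. This is a minimal standalone sketch, not fastai's actual source; the layer-group names and both helper functions are hypothetical stand-ins:

```python
def freeze_to(groups, n):
    """Return {group_name: trainable}, where only the layer groups from
    index n onward are trainable (negative n counts from the end).
    Sketch of Learner.freeze_to semantics as I understand them."""
    k = n if n >= 0 else len(groups) + n
    return {g: i >= k for i, g in enumerate(groups)}

def lr_slice(start, stop, n_groups):
    """Spread learning rates geometrically from start (earliest group)
    to stop (head) -- my understanding of what slice(start, stop)
    produces for discriminative fine-tuning."""
    mult = (stop / start) ** (1 / (n_groups - 1))
    return [start * mult ** i for i in range(n_groups)]

groups = ["embedding", "rnn1", "rnn2", "rnn3", "head"]  # hypothetical names

print(freeze_to(groups, -2))   # only rnn3 and head trainable
print(lr_slice(1e-4, 1e-2, len(groups)))  # per-group lrs, smallest first
```

If this matches what the library does, each `freeze_to` call hands freshly trainable groups over to the optimizer, which would explain a temporary dip while those groups adapt.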