Can anyone help me understand why using use_clr causes the validation loss to decrease and then start increasing again once training passes roughly the halfway point? I tried different dropouts and changed the other parameters, but this always happens. I am using a small dataset (trn = 100 text samples, val = 100 text samples) first to see which parameters work best.
wd=1e-4
bptt = 60  # 70 causes a memory error after some time
bs = 52
opt_fn = partial(optim.Adam, betas=(0.8, 0.99))
drops = np.array([0.25, 0.1, 0.2, 0.02, 0.15])*2
lr=1e-3
lrs = lr
trn_dl = LanguageModelLoader(np.concatenate(trn_lm1), bs, bptt)
val_dl = LanguageModelLoader(np.concatenate(val_lm1), bs, bptt)
md = LanguageModelData(PATH, 1, vs, trn_dl, val_dl, bs=bs, bptt=bptt)
learner = md.get_model(opt_fn, em_sz, nh, nl,
                       dropouti=drops[0], dropout=drops[1], wdrop=drops[2],
                       dropoute=drops[3], dropouth=drops[4])
learner.metrics = [accuracy]
learner.model.load_state_dict(wgts)
# Changing cyclic momentum
# using cyclic momentum and AdamW optimizer
learner.freeze_to(-1)
learner.fit(lrs/2, 1, wds=wd, use_clr=(10,5), cycle_len=1, use_wd_sched=True)
Epoch
100% 1/1 [00:46<00:00, 46.44s/it]
epoch trn_loss val_loss accuracy
0 6.573314 5.392568 0.234703
[array([5.39257]), 0.23470321856439114]
After tuning the last layer we train further and get the behaviour below. Can anyone help?
# Testing next layer
learner.unfreeze()
learner.fit(lrs, 1, wds=wd, use_clr=(10,5), cycle_len=15, use_wd_sched=True)
Epoch
100% 15/15 [13:04<00:00, 52.29s/it]
epoch trn_loss val_loss accuracy
0 6.070682 5.12017 0.23257
1 5.783371 5.019183 0.235936
2 5.530017 4.944405 0.239494
3 5.322582 4.936019 0.239319
4 5.151774 4.953646 0.239379
5 5.025274 4.937507 0.241399
6 4.932359 4.935177 0.243282
7 4.843357 4.93804 0.242848
8 4.762087 4.942481 0.243768
9 4.705473 4.950382 0.244738
10 4.645226 4.961413 0.244852
11 4.602684 4.969392 0.245466
12 4.567122 4.975381 0.244395
13 4.524189 4.986904 0.244889
14 4.506185 4.978909 0.24621
[array([4.97891]), 0.2462101262062788]
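For reference, my understanding of use_clr=(div, cut_div) in fastai v0.7 is that it builds a slanted-triangular schedule: the learning rate ramps from lr/div up to lr over the first 1/cut_div of the cycle, then decays back down over the rest. So with use_clr=(10,5) the LR peaks early (around epoch 3 of 15) and the later epochs run at a shrinking LR, which is roughly where my val_loss bottoms out. A minimal standalone sketch of that schedule (my own reimplementation, not the library code):

```python
import numpy as np

def clr_schedule(lr_max, n_iter, div=10, cut_div=5):
    """Sketch of a slanted-triangular CLR schedule: linear ramp from
    lr_max/div up to lr_max over the first n_iter/cut_div iterations,
    then linear decay back toward lr_max/div."""
    cut = n_iter // cut_div                # iteration where LR peaks
    lrs = np.empty(n_iter)
    for t in range(n_iter):
        if t < cut:
            pct = t / cut                  # ramp-up fraction, 0 -> 1
        else:
            pct = 1 - (t - cut) / (n_iter - cut)  # decay fraction, 1 -> 0
        lrs[t] = lr_max / div + pct * (lr_max - lr_max / div)
    return lrs

# 100 iterations at lr=1e-3, mirroring use_clr=(10,5)
lrs = clr_schedule(1e-3, 100)
print(lrs.argmax())  # LR peaks 1/5 of the way through the cycle
```

If that is right, the second-half rise in val_loss is probably not the schedule itself but overfitting on the tiny 100-sample set once the LR has already peaked, though I would appreciate confirmation.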