Can anyone help me understand why using use_clr causes the validation loss to decrease and then start increasing again once training passes roughly the halfway point? I tried different dropouts and changed the other parameters, but this always happens. I am using a small dataset (trn = 100 text samples, val = 100 text samples) first to see which parameters work best.
wd=1e-4
bptt = 60  # 70 causes a memory error after some time
bs = 52
opt_fn = partial(optim.Adam, betas=(0.8, 0.99))
drops = np.array([0.25, 0.1, 0.2, 0.02, 0.15])*2
lr=1e-3
lrs = lr
trn_dl = LanguageModelLoader(np.concatenate(trn_lm1), bs, bptt)
val_dl = LanguageModelLoader(np.concatenate(val_lm1), bs, bptt)
md = LanguageModelData(PATH, 1, vs, trn_dl, val_dl, bs=bs, bptt=bptt)
learner = md.get_model(opt_fn, em_sz, nh, nl,
                       dropouti=drops[0], dropout=drops[1], wdrop=drops[2],
                       dropoute=drops[3], dropouth=drops[4])
learner.metrics = [accuracy]
learner.model.load_state_dict(wgts)
# Changing cyclic momentum
# using cyclic momentum and AdamW optimizer
learner.freeze_to(-1)
learner.fit(lrs/2, 1, wds=wd, use_clr=(10,5), cycle_len=1, use_wd_sched=True)
Epoch
100% 1/1 [00:46<00:00, 46.44s/it]
epoch trn_loss val_loss accuracy
0 6.573314 5.392568 0.234703
[array([5.39257]), 0.23470321856439114]
After tuning the last layer we train further and get the behaviour below. Can anyone help?
# Testing next layer
learner.unfreeze()
learner.fit(lrs, 1, wds=wd, use_clr=(10,5), cycle_len=15, use_wd_sched=True)
Epoch
100% 15/15 [13:04<00:00, 52.29s/it]
epoch trn_loss val_loss accuracy
0 6.070682 5.12017 0.23257
1 5.783371 5.019183 0.235936
2 5.530017 4.944405 0.239494
3 5.322582 4.936019 0.239319
4 5.151774 4.953646 0.239379
5 5.025274 4.937507 0.241399
6 4.932359 4.935177 0.243282
7 4.843357 4.93804 0.242848
8 4.762087 4.942481 0.243768
9 4.705473 4.950382 0.244738
10 4.645226 4.961413 0.244852
11 4.602684 4.969392 0.245466
12 4.567122 4.975381 0.244395
13 4.524189 4.986904 0.244889
14 4.506185 4.978909 0.24621
[array([4.97891]), 0.2462101262062788]
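For reference, my understanding of use_clr=(div, cut_div) in fastai v0.7 is that it builds a slanted-triangular schedule: the learning rate ramps from lr/div up to lr over the first 1/cut_div of the cycle, then decays back down over the rest. So with use_clr=(10,5) the LR peaks early (around epoch 3 of 15) and the later epochs run at a shrinking LR, which is roughly where my val_loss bottoms out. A minimal standalone sketch of that schedule (my own reimplementation, not the library code):

```python
import numpy as np

def clr_schedule(lr_max, n_iter, div=10, cut_div=5):
    """Sketch of a slanted-triangular CLR schedule: linear ramp from
    lr_max/div up to lr_max over the first n_iter/cut_div iterations,
    then linear decay back toward lr_max/div."""
    cut = n_iter // cut_div                # iteration where LR peaks
    lrs = np.empty(n_iter)
    for t in range(n_iter):
        if t < cut:
            pct = t / cut                  # ramp-up fraction, 0 -> 1
        else:
            pct = 1 - (t - cut) / (n_iter - cut)  # decay fraction, 1 -> 0
        lrs[t] = lr_max / div + pct * (lr_max - lr_max / div)
    return lrs

# 100 iterations at lr=1e-3, mirroring use_clr=(10,5)
lrs = clr_schedule(1e-3, 100)
print(lrs.argmax())  # LR peaks 1/5 of the way through the cycle
```

If that is right, the second-half rise in val_loss is probably not the schedule itself but overfitting on the tiny 100-sample set once the LR has already peaked, though I would appreciate confirmation.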