Training & Validation Loss Increases then Decreases

I’m working with the Stanford Dogs 120 dataset and have noticed the following pattern with ResNet-50 and ResNet-101: in the second epoch the training and validation loss both increase, and then they decrease over the following epochs.

[plot: training and validation loss over the training epochs]

I am using lr_find() to select a learning rate where the slope is steepest, and have experimented with different weight decay (wd) and dropout (ps) values, but the pattern still happens. I’m wondering whether this is normal or whether it means there’s a setting I should change.

Hey, first of all, welcome to the community. This shouldn’t be happening, as far as I can tell. Can you share more of your code?

Sure. I am constructing my data using the data block API:

src = (ImageList.from_folder(img_path, extensions='.jpg')
       .split_by_folder(train='train', valid='valid')
       .label_from_folder())
data = (src.transform(tfms, size=size, resize_method=ResizeMethod.PAD, padding_mode='zeros')
        .databunch(bs=batchsize)
        .normalize(imagenet_stats))
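
(tfms, size and batchsize are defined further up in my notebook, roughly along these lines; the exact values here are only examples:)

tfms = get_transforms()  # fastai v1 default augmentations
size = 224               # image size fed to the model (example value)
batchsize = 64           # example value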

I am using padding_mode='zeros' because reflection and border error out, and it produces more accurate results than squishing or cropping. tfms are the defaults. Then I create the model and call lr_find():

learn = cnn_learner(data, models.resnet101, metrics=[error_rate, accuracy], path=path)
learn.lr_find()
learn.recorder.plot()

[learning rate finder plot: loss vs. learning rate]
I then choose a learning rate between 3e-3 and 6e-3:

lr = 6e-3
learn.fit_one_cycle(10, lr)

This results in an output similar to the one I posted. I’ve tried increasing weight decay (wd) and dropout (ps), but it still results in the same pattern.
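
For reference, I’m passing them roughly like this (the specific values below are just placeholders, not the exact ones I used):

learn = cnn_learner(data, models.resnet101, metrics=[error_rate, accuracy],
                    ps=0.6,   # dropout probability for the head (placeholder value)
                    wd=1e-1,  # weight decay forwarded to the Learner (placeholder value)
                    path=path)
learn.fit_one_cycle(10, lr, wd=1e-1)  # wd can also be overridden per call to fit_one_cycle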

Did you figure out the reason?

Is your dataset balanced? I have a similar issue.

Isn’t that the expected behaviour of the one-cycle policy?

01h18m55s in lesson 3 (2019)
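
With fit_one_cycle the learning rate ramps up over roughly the first part of the cycle before annealing, so an early bump in training and validation loss is normal. You can confirm this on your own run by plotting the schedule and the losses from the Recorder (a minimal sketch, reusing your learn object from above):

learn.recorder.plot_lr(show_moms=True)  # learning rate rises then anneals; momentum follows the inverse shape
learn.recorder.plot_losses()            # the loss bump should line up with the high-learning-rate phase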


TL;DR: your model is working as expected

I saw that you split by folder.
The behaviour you’ve described is probably due to the specific validation set you’ve chosen: it is not randomly sampled from the same distribution as the training set, but is fixed and pre-defined (perhaps a holdout set).

I’m pretty sure that if you randomly split the data into training and validation sets (80%/20%) and re-train the model, you’ll see the accuracy increase up to its maximum as expected.
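
For example, in your data block you could swap the folder split for a random split, something like this (the seed is only for reproducibility):

src = (ImageList.from_folder(img_path, extensions='.jpg')
       .split_by_rand_pct(valid_pct=0.2, seed=42)  # random 80/20 split instead of the fixed folders
       .label_from_folder())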

That’s right. I forgot about that when I posted the original question. Thanks!