Help: my train_loss is getting worse, valid_loss is up and down, and error_rate is decreasing most of the time

Update 1

I updated my dataset to have a balanced number of images in the birds and others folders.
Now the dataset folder structure is:

  • minidata
    • train
      • birds
      • others
    • validation
      • birds
      • others

I also removed valid_pct = 0.2 in favour of using the folder structure:
data = ImageDataBunch.from_folder(path=path_img, train=train_folder, valid=valid_folder,.....)
so that when images are picked up, birds and others have a good mix.

However, the train_loss and valid_loss are still getting worse and worse.

I used lr_find() to find the best LR and applied it.

Unfortunately, the train_loss and valid_loss are still not decreasing.

Here is my code:

from fastai.vision import *
from fastai.metrics import error_rate
from fastai.callbacks import EarlyStoppingCallback,SaveModelCallback
from datetime import datetime as dt
from functools import partial

path_img = '/path to Datasets/BirdsNotBirds/minidata'
train_folder = 'train'
valid_folder = 'validation'

# use the default augmentations, but allow a stronger zoom
tunedTransform = partial(get_transforms, max_zoom=1.5)

data = ImageDataBunch.from_folder(path=path_img, train=train_folder, valid=valid_folder, ds_tfms=tunedTransform(), 
                                  size=(299, 450), bs=40, classes=['birds', 'others'], 
                                  resize_method=ResizeMethod.SQUISH)
data = data.normalize(imagenet_stats)
data.show_batch(rows=6, figsize=(14,12))

learn = cnn_learner(data, models.resnet50, metrics=error_rate)

learn.lr_find()
learn.recorder.plot()
# stop training if error_rate has not improved for 5 epochs
callbacks = [EarlyStoppingCallback(learn, monitor='error_rate', mode='min', min_delta=1e-5, patience=5)]

learn.fit_one_cycle(30, callbacks=callbacks, max_lr=slice(3e-5,5e-2))

Please help.


I have 2 classes: birds (100K images) and others (7K images)

folder structure:

  • data
    • birds (100K images)
    • others (7K images)

data = ImageDataBunch.from_folder(path=path_img, bs=40, valid_pct=0.2, ds_tfms=get_transforms(),
                                  size=(299, 450), classes=['birds', 'others'],
                                  resize_method=ResizeMethod.SQUISH)
data = data.normalize(imagenet_stats)
learn = cnn_learner(data, models.resnet50, metrics=error_rate)
callbacks = [EarlyStoppingCallback(learn, monitor='error_rate', min_delta=1e-5, patience=5)]

learn.fit_one_cycle(30, callbacks=callbacks, max_lr=slice(1e-5,8e-2))

According to @jeremy's advice in Lesson 2:
if train_loss > valid_loss, it means the learning rate is too high or there are too few epochs

I then adjusted my LR several times, but I am still facing this issue: train_loss is getting worse.

Please help~!

I don’t think that’s stated quite right.

In general, if training loss is higher than validation loss, your model is underfitting, which usually means you need to train longer or with a higher learning rate. But in that case you should be seeing both decrease, not increase.

If validation loss increases, your model is probably overfitting and your learning rate is too high.

In this case, since both your training loss and validation loss are increasing (cyclically increasing for valid), I’d guess the learning rate is too high. Among a few other items to consider:

Your data has a large class imbalance, so splitting the data randomly might not be creating a representative validation set. Rachel Thomas has a good article on choosing a validation set, which I would recommend reading.
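
For example, you could build a stratified split yourself, so each class keeps its proportion in the validation set. A rough sketch using fastai v1's data block API plus sklearn (this pipeline is my assumption, not your exact code):

from fastai.vision import *
from sklearn.model_selection import train_test_split
import numpy as np

# Hypothetical stratified 20% validation split: both classes keep their
# proportions, instead of relying on a purely random valid_pct split.
il = ImageList.from_folder(path_img)
labels = [o.parent.name for o in il.items]   # class = parent folder name
_, valid_idx = train_test_split(np.arange(len(il.items)), test_size=0.2,
                                stratify=labels, random_state=42)
data = (il.split_by_idx(valid_idx)
          .label_from_folder()
          .transform(get_transforms(), size=(299, 450),
                     resize_method=ResizeMethod.SQUISH)
          .databunch(bs=40)
          .normalize(imagenet_stats))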

Error rate isn’t necessarily the most informative metric when dealing with class imbalance. AUROC would be one example of a better choice.

Using lr_find can help you select a good learning rate.


Thanks for your help.

When I change the monitor from error_rate to accuracy, I get an error saying:

only train_loss, valid_loss and error_rate are available.

So what is AUROC? And I cannot use it anyway, because it is not available for monitoring.

Also, in order to use

learn.lr_find()

do I need to run

learn.fit_one_cycle()

before the lr_find()?

Or can I use lr_find() immediately after
learn = cnn_learner() is created?

Also, I have updated my question to include my folder structure.

I have separate folders for birds and others, so I guess it should be fine?

Also, I watched the Lesson 2 video again and found this:

https://course.fast.ai/videos/?lesson=2
at time: 1:10:00

@jeremy explicitly answers that an imbalanced data set always works.

So now, I am confused. Who is correct? :weary:

You can use it. You need to add it to your metrics. Fastai has pretty good documentation.
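
For instance, a minimal sketch (AUROC is a class-based metric in fastai v1, so it needs to be instantiated):

from fastai.vision import *
from fastai.metrics import error_rate, AUROC

# add AUROC alongside error_rate; it then appears in the training table
learn = cnn_learner(data, models.resnet50, metrics=[error_rate, AUROC()])
# once it is a metric, callbacks can monitor it; check the printed column
# header (likely 'auroc') for the exact name to pass as monitor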

To find a good learning rate before training, you’d run lr_find before fit_one_cycle.
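
For example (the max_lr value below is purely illustrative, not a recommendation):

learn = cnn_learner(data, models.resnet50, metrics=error_rate)
learn.lr_find()                        # LR range test on the freshly created learner
learn.recorder.plot()                  # pick a rate where the loss still falls steeply
learn.fit_one_cycle(5, max_lr=1e-3)    # then train with the chosen rate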

Both? Nothing I wrote contradicts what Jeremy said in the course. I mentioned that your validation set might not be representative with a random split and your choice of metric isn’t necessarily the best given the class imbalance. I never wrote that training a model wouldn’t work.

Thanks @bwarner

To use AUROC: it is restricted to binary classification tasks.
In my case, that should be fine, because I'm trying to use an AI model to tell whether there are birds or not.
However, how do I do binary classification? Currently, I believe my code does categorical classification.

For lr_find(): thanks, yes, I need to use it before training happens.

For the validation set, my understanding of setting valid_pct = 0.2 is that the learner will pick up 20% of the images from the birds folder and 20% from the others folder, and then use them to validate the trained model.
That means there will be
100K * 0.2 = 20K bird images and
7K * 0.2 = 1.4K other images,

so both birds and others will be represented in the validation process.
So I do not quite get what you mean:

your validation set might not be representative with a random split

As to:

your choice of metric isn’t necessarily the best given the class imbalance

what is the best choice of metric in this case, then?

Once again, I appreciate your help and efforts in explaining things to me.

Btw, how did you quote my posts in your answer???

Regarding Update 1: your learning rate is still too high. You should watch (or rewatch) Lesson 3.