Help: my train_loss is getting worse, valid_loss is up and down, and error_rate is decreasing most of the time

Update 1

I updated my data set so the birds and others folders have a balanced number of images.
Now the dataset folder structure is:

  • minidata
    • train
      • birds
      • others
    • validation
      • birds
      • others

I also removed valid_pct = 0.2 in favour of using the folder structure:
data = ImageDataBunch.from_folder(path=path_img, train=train_folder, valid=valid_folder, .....)
so you can see that when picking up images, birds and others have a good mix:

however, the train_loss and valid_loss are still becoming worse and worse.

I used lr_find() to find the best lr and applied it:

Unfortunately, the train_loss and valid_loss are still increasing.

Here is my code:

from fastai.vision import *
from fastai.metrics import error_rate
from fastai.callbacks import EarlyStoppingCallback,SaveModelCallback
from datetime import datetime as dt
from functools import partial

path_img = '/path to Datasets/BirdsNotBirds/minidata'
train_folder = 'train'
valid_folder = 'validation'

tunedTransform = partial(get_transforms, max_zoom=1.5)

data = ImageDataBunch.from_folder(path=path_img, train=train_folder, valid=valid_folder, ds_tfms=tunedTransform(),
                                  size=(299, 450), bs=40, classes=['birds', 'others'])
data = data.normalize(imagenet_stats)
data.show_batch(rows=6, figsize=(14,12))

learn = cnn_learner(data, models.resnet50, metrics=error_rate)

callbacks = [EarlyStoppingCallback(learn, monitor='error_rate', mode='min', min_delta=1e-5, patience=5)]

learn.fit_one_cycle(30, callbacks=callbacks, max_lr=slice(3e-5,5e-2))

Please help.

I have 2 classes: birds (100K images) and others (7K images).

folder structure:

  • data
    • birds (100K images)
    • others (7K images)

data = ImageDataBunch.from_folder(path=path_img, bs=40, valid_pct=0.2, ds_tfms=get_transforms(),
                                  size=(299, 450), classes=['birds', 'others'], resize_method=ResizeMethod.SQUISH)
data = data.normalize(imagenet_stats)
learn = cnn_learner(data, models.resnet50, metrics=error_rate)
callbacks = [EarlyStoppingCallback(learn, monitor='error_rate', min_delta=1e-5, patience=5)]

learn.fit_one_cycle(30, callbacks=callbacks, max_lr=slice(1e-5,8e-2))

According to Lesson 2, @jeremy's advice is:
if train_loss > valid_loss, the learning rate is too high or there are too few epochs.

Then I adjusted my LR several times, still I am facing this issue: train_loss is getting worse.

Please help~!

I don’t think that’s stated quite right.

In general, if training loss is higher than validation loss, then your model is underfitting, which usually means you need to train longer or with a higher learning rate. But you should be seeing both losses decrease in this case, not increase.

If validation loss increases, your model is probably overfitting and your learning rate is too high.

In this case since both your training loss and validation loss are increasing (cyclically increasing for valid), I’d guess the learning rate is too high. Among a few other items to consider:

Your data has a large class imbalance, so splitting the data randomly might not be creating a representative validation set. Rachel Thomas has a good article on choosing a validation set which I would recommend reading.

Error rate isn’t necessarily the most informative metric when dealing with class imbalance. AUROC would be one example of a better choice.

Using lr_find can help you select a good learning rate.
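On the metric point, here is a toy illustration in plain Python (no fastai; the numbers are made up to mirror a heavy class imbalance): a classifier that always predicts the majority class still gets a low error rate, yet its AUROC is 0.5, i.e. pure chance.

```python
# Toy illustration (plain Python, no fastai; made-up numbers mirroring
# a heavy class imbalance): "always predict the majority class" looks
# good on error rate but has zero discriminative power on AUROC.

def auroc(scores, labels):
    """Mann-Whitney formulation of AUROC for binary labels (1 = positive)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1] * 7 + [0] * 93     # 7 positives: the minority class
scores = [0.0] * 100            # model that always says "negative"

err = sum((s >= 0.5) != y for s, y in zip(scores, labels)) / len(labels)
print(err)                      # 0.07 -- looks deceptively good
print(auroc(scores, labels))    # 0.5 -- no better than coin flipping
```

A model that actually ranks positives above negatives would push AUROC toward 1.0, which error rate alone cannot distinguish from the degenerate classifier above.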


Thanks for your help.

When I change the monitor from error_rate to accuracy, I get an error saying:

only train_loss, valid_loss and error_rate are available.

So what is AUROC? And I cannot use it anyway, because it is not available for monitoring.

Also, in order to use


do I need to run


before the lr_find()?

Or can I use lr_find() immediately after
learn = cnn_learner() is created?

Also, I have updated my question to include my folder structure.

I have separate folders for birds and others, so I guess it should be fine?

Also, I watched the Lesson 2 video again and found that
at 1:10:00, @jeremy explicitly answered that an imbalanced data set always works.

So now, I am confused. Who is correct? :weary:

You can use it. You need to add it to your metrics; fastai has pretty good documentation on this.

To find a good learning rate before training, you’d run lr_find before fit_one_cycle.
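As a rough sketch of the idea behind lr_find (this is the LR range test on a toy 1-D quadratic loss in plain Python, not fastai's implementation): the learning rate is increased exponentially over a few iterations while the loss is recorded, and you pick a rate from the region where the loss is still falling, well before it blows up.

```python
# Minimal LR range test sketch (the idea behind lr_find; not fastai's code):
# sweep the learning rate exponentially on the loss f(w) = w**2 and record
# (lr, loss) pairs -- the loss falls, bottoms out, then diverges.

def lr_range_test(lr_start=1e-5, lr_end=10.0, steps=50):
    w = 1.0
    mult = (lr_end / lr_start) ** (1 / (steps - 1))
    lr, history = lr_start, []
    for _ in range(steps):
        w -= lr * 2 * w              # gradient step: d/dw of w**2 is 2w
        history.append((lr, w * w))  # (learning rate, loss after step)
        lr *= mult
    return history

losses = [loss for _, loss in lr_range_test()]
# tiny lrs barely change the loss, mid-range lrs drive it toward zero,
# and once lr > 1 the update w <- (1 - 2*lr) * w overshoots and diverges
```

The heuristic is to choose a rate somewhat below the loss minimum of this sweep, which is what reading the lr_find plot amounts to.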

Both? Nothing I wrote contradicts what Jeremy said in the course. I mentioned that your validation set might not be representative with a random split and your choice of metric isn’t necessarily the best given the class imbalance. I never wrote that training a model wouldn’t work.

Thanks @bwarner

The docs say AUROC is restricted to binary classification tasks.
In my case that should be fine, because I'm trying to use the model to tell whether there are birds or not.
However, how do I do binary classification? Currently, I believe my code does categorical classification.

For lr_find(): thanks, yes, I need to use it before training happens.

For the validation set, my understanding of setting valid_pct = 0.2 is that
the learner will pick 20% of the images from the birds folder and 20% from the others folder, and then use them to validate the trained model.
That means there will be
100K * 0.2 = 20K bird images and
7K * 0.2 = 1.4K other images

so both birds and others will be represented in the validation process.
So I do not quite get what you mean:

your validation set might not be representative with a random split
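Here is a quick sanity check of that 20% split in plain Python (assuming valid_pct samples uniformly at random over all images, which is my understanding of how the random split works):

```python
import random

random.seed(42)  # illustrative; any seed gives similar counts
labels = ['birds'] * 100_000 + ['others'] * 7_000

# a valid_pct=0.2 style split samples 20% of ALL images, not 20% per class
valid = random.sample(labels, k=int(0.2 * len(labels)))

print(valid.count('birds'), valid.count('others'))
# counts land near 20K / 1.4K on average, but the sampler never looks at
# the class labels, so the exact mix is not guaranteed (not stratified)
```

So on average both classes do show up in roughly the expected numbers, though nothing in the split enforces that per class.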

As to

your choice of metric isn’t necessarily the best given the class imbalance
then what is the best choice of metric in this case?

Once again, I appreciate your help and efforts in explaining things to me.

Btw, how did you quote my posts in your answer???

Regarding update 1: Your learning rate is still too high. You should watch (or rewatch) Lesson 3.