Validation loss keeps increasing!

Hi Everyone,
I am working on the distracted-driver classification problem using the Kaggle State Farm dataset. It has 10 classes.
I started with ResNet-50, using 20% of the data as validation, and everything was OK, until I decided to split the validation set by driver (4 drivers out of the 26). That seems a better way to match the test set, since the test set also contains unseen drivers.

After that I started to see strange losses and accuracies on the validation set!

I tried the following to reduce overfitting (a rough sketch of how I'm setting these up is after the list):

  • Data augmentation

  • Dropout in the fully-connected layers [0.5 and 0.7]

  • BatchNorm after the last layer

  • Weight Decay (0.1)

but none of them solves my problem!
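
For reference, a minimal sketch of roughly how I'm applying these in fastai v1 (the path, batch size, and learning rate below are illustrative, not my exact notebook):

from fastai.vision import *
path = Path('data/state-farm')                     # illustrative path to my sample
tfms = get_transforms(do_flip=False)               # data augmentation
data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=224, bs=32).normalize(imagenet_stats)
learn = create_cnn(data, models.resnet50,
                   ps=0.5,                         # dropout in the fully-connected head (also tried 0.7)
                   bn_final=True,                  # BatchNorm after the last layer
                   metrics=error_rate)
learn.fit_one_cycle(5, max_lr=1e-3, wd=0.1)        # weight decay of 0.1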

Any help would be appreciated :sleepy:

Your validation loss starts increasing right at the beginning, which means you overfit from the first epoch. That makes sense, because your model can't predict on new classes it has never seen in training. You should change your approach: for the new drivers in the test set, the model should classify them as unknown.

I am not using new classes; both sets have the same classes. But the drivers in the validation set are not seen in the training set (to check whether the model generalizes well to unseen data).

Would you mind uploading your notebook to GitHub or a Kaggle kernel so we can have a look at it? It is pretty hard to tell what is going on just from that screenshot, as there are so many things that could go wrong.

sure

Note: right now I am using only a sample of the data to make my experiments faster.

Since it is a high-variance problem, I would first recommend using the entire dataset and seeing if that improves the situation.

I tried that before and got the same problem.
Thanks!

Well, I know this sounds like strange advice, but I would update the fastai library to the latest version and try again.

How can I do that while I am using Paperspace Gradient?

Here is a guide
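
In short, the upgrade itself usually boils down to something like this in a notebook cell (assuming a pip-based environment; the guide covers the Gradient-specific details):

!pip install fastai --upgrade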

Thanks for the link, but nothing changed; I still have the same problem.

@jeremy can you help me figure out the problem? Thanks

Please read the etiquette section of the FAQ.

I’m not able to reproduce the issue. I downloaded the dataset and got this:

(screenshot of training output)

Code:

from fastai.vision import *
path = Path('/home/user/datasets/state-farm-distracted-driver-detection/imgs')
np.random.seed(42)
bs = 32
# the 'valid' folder is built from specific drivers (see below), so no valid_pct here
data = ImageDataBunch.from_folder(path, ds_tfms=get_transforms(do_flip=False), size=224, bs=bs, num_workers=4).normalize(imagenet_stats)
learn = create_cnn(data, models.resnet34, metrics=error_rate, pretrained=True)
learn.lr_find()            # pick a learning rate from the plot
learn.recorder.plot()
learn.fit_one_cycle(5, max_lr=1e-2)

And I used the following to create the validation set:

import pandas as pd
driver_list = pd.read_csv('/home/user/datasets/state-farm-distracted-driver-detection/driver_imgs_list.csv', delimiter=',')
basepath = Path('/home/user/datasets/state-farm-distracted-driver-detection/imgs')

# move every image belonging to driver p022 from train/ into valid/
for index, row in driver_list.loc[driver_list['subject'] == 'p022'].iterrows():
    old_path = basepath/"train"/row['classname']/row['img']
    new_path = basepath/"valid"/row['classname']/row['img']
    new_path.parent.mkdir(parents=True, exist_ok=True)  # create valid/<class>/ if needed
    old_path.rename(new_path)

Just ran that a few times, replacing ‘p022’ with different person ids, until I had a validation set of ~4000 images.
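
If you prefer, the same thing can be done in one pass over several drivers; the driver ids below are just example picks, not a recommendation:

valid_drivers = ['p022', 'p014', 'p041', 'p049']     # example ids, choose your own
for index, row in driver_list[driver_list['subject'].isin(valid_drivers)].iterrows():
    old_path = basepath/"train"/row['classname']/row['img']
    new_path = basepath/"valid"/row['classname']/row['img']
    new_path.parent.mkdir(parents=True, exist_ok=True)
    old_path.rename(new_path)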

I’m not sure what exactly may be different about your setup. One thing I did notice is that the notebook you shared shows a training set of only ~2k images. Mine ended up at ~18k. Unless you intended to use a subset, I’d double check that.

(Note: I didn’t tune any hyperparameters; the loss I achieved isn’t very good. For example I’d reduce max_zoom on the transforms since people’s faces are occasionally getting cropped. But this quick experiment at least shows that the validation loss is decreasing on my setup.)
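
For example, that transform change would just be the following (max_zoom defaults to 1.1 in fastai v1; 1.0 disables random zooming):

data = ImageDataBunch.from_folder(path, ds_tfms=get_transforms(do_flip=False, max_zoom=1.0), size=224, bs=bs, num_workers=4).normalize(imagenet_stats)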

Thanks so much for your help. Everything is OK now after cleaning up the training set.