Validation loss keeps increasing, and the model performs really badly on the test dataset

I am training on the chest X-ray dataset from Kaggle, using the provided training, test and validation sets to train and evaluate a ResNet-50. I am loading the data as follows:

data = ImageDataBunch.from_folder(path, train="train", valid='test',test='val', bs=bs, size=224, num_workers=4).normalize(imagenet_stats)

followed by:

learn = cnn_learner(data, models.resnet50, metrics=accuracy, model_dir="/tmp/model/")



epoch train_loss valid_loss accuracy time
0 0.267473 0.492386 0.826923 03:39
1 0.142290 0.628095 0.802885 03:37
2 0.085903 0.762643 0.810897 03:37
3 0.050223 0.735109 0.834936 03:38
4 0.030469 0.805609 0.833333 03:37
5 0.019916 0.912937 0.826923 03:37
6 0.010563 0.932671 0.820513 03:38
7 0.006739 0.862673 0.833333 03:37
learn.fit_one_cycle(6, max_lr=slice(1e-6,1e-4))
epoch train_loss valid_loss accuracy time
0 0.006664 0.932899 0.833333 03:38
1 0.009007 1.020591 0.822115 03:38
2 0.004957 1.058624 0.810897 03:40
3 0.003255 1.058656 0.814103 03:37
4 0.002129 1.068035 0.814103 03:38
5 0.001938 1.024862 0.825321 03:39

I then run validation on the test set as follows:

which outputs:
[5.0541553, tensor(0.3750)]

How do I improve or debug this? Training on ResNet-34 gave 91% accuracy on the validation set and 50% on the test set. Also, is this the right way to load and evaluate the test dataset?

Note: The validation set seemed too small, so I have swapped the test and validation sets.
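
When the provided validation folder is too small, another option is to hold out part of the training data instead of swapping folders. Below is a minimal sketch of a per-class (stratified) hold-out over file paths. The function name `stratified_holdout`, the class names and the file names are all hypothetical and only for illustration. fastai v1's `ImageDataBunch.from_folder` also appears to accept a `valid_pct` argument for a random (not per-class) split, but treat that as an assumption and check the docs for your version.

```python
import random
from collections import defaultdict

def stratified_holdout(items, valid_pct=0.2, seed=42):
    """Split (path, label) pairs into train/valid lists, keeping class ratios."""
    by_label = defaultdict(list)
    for path, label in items:
        by_label[label].append(path)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, valid = [], []
    for label, paths in sorted(by_label.items()):
        paths = sorted(paths)
        rng.shuffle(paths)
        # Hold out the same fraction of every class (at least one image).
        n_valid = max(1, round(len(paths) * valid_pct))
        valid += [(p, label) for p in paths[:n_valid]]
        train += [(p, label) for p in paths[n_valid:]]
    return train, valid

# Hypothetical file names; real code would list the class folders under train/.
items = [(f"NORMAL/img_{i}.jpeg", "NORMAL") for i in range(100)]
items += [(f"PNEUMONIA/img_{i}.jpeg", "PNEUMONIA") for i in range(300)]
train, valid = stratified_holdout(items, valid_pct=0.2)
print(len(train), len(valid))
```

Splitting per class keeps the class ratio of the held-out set close to that of the training set, which matters when one class is much larger than the other.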

Hi @jibinmathew69 it looks like you may be overfitting (decreasing training loss, increasing/high validation loss).
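
To make that concrete, here is a small sketch that takes the train and validation losses from the first run posted above and reports the epoch with the lowest validation loss, plus the gap between the two losses per epoch. The helper name `overfitting_report` is made up for this example; it is just a way of reading the log, not a fastai function.

```python
# (train_loss, valid_loss) per epoch, copied from the first run in the question.
history = [
    (0.267473, 0.492386),
    (0.142290, 0.628095),
    (0.085903, 0.762643),
    (0.050223, 0.735109),
    (0.030469, 0.805609),
    (0.019916, 0.912937),
    (0.010563, 0.932671),
    (0.006739, 0.862673),
]

def overfitting_report(history):
    """Return the epoch with the lowest validation loss and the per-epoch gap."""
    valid = [v for _, v in history]
    best_epoch = min(range(len(valid)), key=valid.__getitem__)
    gaps = [v - t for t, v in history]  # valid_loss minus train_loss
    return best_epoch, gaps

best_epoch, gaps = overfitting_report(history)
print(f"best epoch by valid_loss: {best_epoch}")
for epoch, gap in enumerate(gaps):
    print(f"epoch {epoch}: valid - train = {gap:.3f}")
```

A gap that keeps widening while the best validation loss sits at the very first epoch is the pattern described above.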

Maybe start with resnet34 and see how the model performs with a less complex base architecture.
If you continue to overfit, maybe try some other regularisation such as data augmentation, or increasing dropout?

I have already experimented with ResNet-34: the validation accuracy went up to 91%, but on the test set it shows 50%. And considering the nature of the images, rotation and zoom augmentation didn’t make sense, since they are X-rays and have the same format throughout.

Ahh great. Maybe try increasing dropout and weight decay to see if that improves things?

I also noticed some other things.

In your first fit_one_cycle run, you could choose a value from the learning rate graph to pass into that function. You would pick the highest learning rate from the steepest point before the loss starts to rapidly increase.

So, for example, 1e-02 could be a suitable value.

learn.fit_one_cycle(8, slice(1e-02))

I would also run lr_find again after you unfreeze. Then, from that graph, you can choose an appropriate learning rate range to pass into your fine-tuning fit_one_cycle run.
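
As a rough illustration of "steepest point", here is a sketch that picks the learning rate where the loss falls fastest per step in log10(lr). The curve is synthetic and the function name `steepest_lr` is made up for this example. With a real run, the values would come from the recorder after lr_find (I believe fastai v1 exposes them as `learn.recorder.lrs` and `learn.recorder.losses`, but treat that as an assumption), and they are usually noisy enough to need smoothing first.

```python
import math

def steepest_lr(lrs, losses):
    """Return the learning rate where the loss drops fastest per log10(lr) step."""
    best_lr, best_slope = None, 0.0
    for i in range(1, len(lrs)):
        step = math.log10(lrs[i]) - math.log10(lrs[i - 1])
        slope = (losses[i] - losses[i - 1]) / step
        if slope < best_slope:  # more negative means a steeper descent
            best_lr, best_slope = lrs[i], slope
    return best_lr

# Synthetic curve for illustration only: flat, then falling, then diverging.
lrs = [10 ** (-6 + 0.5 * i) for i in range(13)]  # 1e-6 up to 1e0
losses = [1.00, 1.00, 0.99, 0.97, 0.93, 0.85, 0.70,
          0.50, 0.40, 0.38, 0.60, 1.50, 4.00]

suggested = steepest_lr(lrs, losses)
print(f"steepest descent at lr = {suggested:.2e}")
```

This is only a heuristic for reading the plot. The value it returns sits before the loss minimum, which is the region the advice above points at.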

How do I apply dropout to a pre-trained model, and also weight decay? Also, could you verify whether the test set evaluation is done right?


In the docs for cnn_learner you can see a parameter called ps. This defaults to 0.5 and is the value that controls dropout.

For weight decay, look for the wd param.

For the test set I normally use TTA to run a test evaluation. Check here in the docs:


It defaults to running on the validation set. But if you want to run it on the test set you write something like this:


Here are the relevant docs for that:


I tried a dropout of 0.75 and weight decay of 0.1 but nothing much improved.

epoch train_loss valid_loss accuracy time
0 0.003090 0.961501 0.836538 03:13
1 0.003340 1.014350 0.833333 03:13
2 0.003588 0.984097 0.836538 03:09
3 0.004335 0.963037 0.838141 03:12
4 0.008236 0.971218 0.836538 03:09
5 0.003931 0.998021 0.834936 03:11
6 0.003813 1.057713 0.831731 03:12
7 0.002577 1.012105 0.833333 03:09

This doesn’t seem to work.

Hey, I don’t know what you are doing wrong but I have 2 kernels on this dataset if you want to check them out. Links to the kernels:
Pneumonia detection
Mixed precision on pneumonia detection

Did you figure out the reason?

From experience, when the training set is not tiny (but even more so, if it’s huge) and validation loss increases monotonically starting at the very first epoch, increasing the learning rate tends to help lower the validation loss - at least in those initial epochs.