How to improve accuracy on a classification problem

Hi.

I am building a classifier on the Food-101 dataset. The dataset has predefined training and test sets, both labeled, with a total of 101,000 images. I'm trying to build a model with >=90% top-1 accuracy; I'm currently sitting at 75%. I built this model using what I've learned from Mr. Howard, so I owe him big thanks. The training set was provided unclean (per the dataset's description, the training images were not manually cleaned and contain some noise). Now I would like to know some of the ways I can improve my model, and what I'm doing wrong.

I've partitioned the train and test images into their respective folders. Here I hold out 20% of the training set as a validation set and train for 5 epochs.

from fastai.vision import *  # fastai v1

np.random.seed(42)  # fix the random validation split for reproducibility
data = (ImageList.from_folder(path)
        .split_by_rand_pct(valid_pct=0.2)
        .label_from_re(pat=file_parse)  # file_parse: regex (defined earlier) that extracts the label from the path
        .transform(size=224)
        .databunch())

top_1 = partial(top_k_accuracy, k=1)  # with k=1 this equals plain accuracy, hence the duplicated columns below
learn = cnn_learner(data, models.resnet50, metrics=[accuracy, top_1], callback_fns=ShowGraph)
learn.fit_one_cycle(5)
epoch train_loss valid_loss accuracy top_k_accuracy time
0 2.153797 1.710803 0.563498 0.563498 19:26
1 1.677590 1.388702 0.637096 0.637096 18:29
2 1.385577 1.227448 0.678746 0.678746 18:36
3 1.154080 1.141590 0.700924 0.700924 18:34
4 1.003366 1.124750 0.707063 0.707063 18:25


And here I'm trying to find the learning rate, pretty much the standard way it was done in the lectures:

learn.lr_find()
learn.recorder.plot(suggestion=True)

LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
Min numerical gradient: 1.32E-06
Min loss divided by 10: 6.31E-08


Using a learning rate of 1e-6 to run another 5 epochs, and saving the result as stage-2:

learn.fit_one_cycle(5, max_lr=slice(1.e-06))
learn.save('stage-2')
epoch train_loss valid_loss accuracy top_k_accuracy time
0 0.940980 1.124032 0.705809 0.705809 18:18
1 0.989123 1.122873 0.706337 0.706337 18:24
2 0.963596 1.121615 0.706733 0.706733 18:38
3 0.975916 1.121084 0.707195 0.707195 18:27
4 0.978523 1.123260 0.706403 0.706403 17:04


Previously I ran 3 stages altogether, but the model wasn't improving beyond 0.706403, so I didn't repeat that here. Below is my confusion matrix. I apologize for the terrible resolution; it's Colab's doing. I will post the link to the full notebook.
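For reference, a confusion matrix like this can be produced with fastai's interpretation API, roughly as follows (the figure size is illustrative; 101 classes need a big canvas):

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(20, 20), dpi=60)  # 101x101 grid, hence the resolution trouble
interp.most_confused(min_val=10)                        # class pairs the model mixes up at least 10 times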

Since I'd carved an additional validation set out of the training data, I decided to use the actual test set to validate the saved stage-2 model and see how well it performs:

path = '/content/food-101/images'
data_test = (ImageList.from_folder(path)
             .split_by_folder(train='train', valid='test')  # use the provided test set as the validation split
             .label_from_re(file_parse)
             .transform(size=224)
             .databunch())

learn.load('stage-2')
learn.validate(data_test.valid_dl)

This is the result ([valid_loss, accuracy, top_1]):

[0.87199837, tensor(0.7584), tensor(0.7584)]

This is the link to the full notebook.


Hey! I'm not sure whether it would work, but why don't you try unfreezing the model after the first training stage and training again? A rough sketch is below.
Also, perhaps a heavier ResNet would help.
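Something like this, roughly (the learning rates are illustrative, not tuned for your setup):

learn.unfreeze()  # make the whole backbone trainable, not just the head
learn.lr_find()   # re-check the lr landscape after unfreezing
learn.recorder.plot(suggestion=True)
learn.fit_one_cycle(5, max_lr=slice(1e-5, 1e-4))  # lower lrs for the earlier layers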

I can try that but I can’t go above ResNet50

As in, your RAM doesn't support a higher ResNet, or you aren't getting better results with a higher ResNet?

The project specification demands ResNet50 or lower

I recommend you:

  • Train for more epochs.
  • Use label smoothing.
  • Use lr_find before the first fit to find a good lr. For the unfreeze, use the previous lr divided by 5-15; it depends.
  • Use MixUp (typically you need to train for more epochs for it to be useful). A rough sketch of the label smoothing + MixUp setup is below.

Finally, if you haven't already, you may want to watch the first part of the fastai course.
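For reference, a sketch of the label smoothing + MixUp setup in fastai v1 (LabelSmoothingCrossEntropy ships with the library; the epoch count is just an example):

learn = cnn_learner(data, models.resnet50,
                    metrics=[accuracy],
                    loss_func=LabelSmoothingCrossEntropy()  # label smoothing
                   ).mixup()                                # attach the MixUp callback
learn.fit_one_cycle(10)  # MixUp typically needs more epochs to pay off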

Hi Victor.

  1. What do you mean by “training during more epochs”? You mean increase the number of epochs? If that's what you're suggesting, it won't help, because the model doesn't go above 0.76 after a certain point.

  2. How can I apply label smoothing to this problem? #Already figured this out!

  3. I've added an lr_find before my first fit. But when you say to use lr/5 to lr/15 for the unfreeze, what exactly do you mean by that?

  4. Isn’t label smoothing a part of MixUp? So wouldn’t this be a part of #2?

Hi @oo92.

  1. As you said, I mean increase the number of epochs.
  2. If you find lr=1e-3 to train the head, use an lr between 1e-3/5 and 1e-3/15 to train the whole net. In that case, remember to use slice(lr) so the lowest layers train with a lower lr; by default, fastai trains those with an lr 10x lower (see the sketch after this list).
  3. They are different techniques. MixUp, CutMix, etc. are data augmentations that help regularize your network. Label smoothing changes the loss function to be more robust to incorrect labels in the training dataset, plus some other nice properties. This paper explains why you should employ it.

Finally, remember to increase your epochs if you use MixUp. The network needs more time to reach the same accuracy as without it.
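A minimal sketch of what I mean, with made-up numbers:

lr = 1e-3                          # say lr_find suggested this for the frozen head
learn.fit_one_cycle(5, max_lr=lr)  # stage 1: train the head only

learn.unfreeze()                   # stage 2: train the whole net
# use roughly lr/10 (somewhere between lr/5 and lr/15); with slice(x),
# fastai gives the last layer group x and the earlier groups x/10
learn.fit_one_cycle(5, max_lr=slice(lr/10))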

Hi, I have the same problem: my accuracy is always around 0.5.
I'm using ResNet50 and the dataset has 2 classes (0 and 1).
I have tried loss_func = LabelSmoothingCrossEntropy(), but it did not work.
I have also used lr_find.
Has anyone found a solution?

What do you think of Optimal Transforms Augmentation and TTA?

I don't know what you mean by “Optimal Transforms Augmentation”. Maybe RandAugment? If you have enough data, using the default fastai transforms + MixUp gives you almost-SOTA results. You may squeeze out more with more advanced image transform techniques. Also, you could improve results by fine-tuning hyperparameters (watch the fastai lessons).

Finally, TTA gives a boost at the cost of inference time; see the docs. For example, the sketch below.
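A quick illustration in fastai v1, evaluating on the validation set:

preds, ys = learn.TTA(ds_type=DatasetType.Valid)  # average predictions over augmented versions of each image
print(accuracy(preds, ys))                        # usually a small bump over plain validation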

The author mentions Optimal Transforms Augmentation in this blog:

https://platform.ai/blog/page/3/new-food-101-sota-with-fastai-and-platform-ais-fast-augmentation-search/

This is the result after label smoothing:

[1.5405526, tensor(0.8143), tensor(0.8143)]

I jumped from 75% to 81%, definitely a great improvement. Now I'm going to work on applying the fastai transforms + TTA. Unfortunately I can't use MixUp, because 5 epochs alone take me a day for a full run: I'm on Colab, and my local Jupyter is throwing weird issues that prevent me from using my own GPU.

Adding fastai's get_transforms() didn't improve the model:

tfms = get_transforms(do_flip=True, flip_vert=True)

data = (ImageList.from_folder(path)
        .split_by_rand_pct(valid_pct=0.2)
        .label_from_re(pat=file_parse)
        .transform(tfms, size=224)
        .databunch())