Classifying a large number of classes (5000)

iyersathya · February 23, 2019, 7:45pm

I am trying to classify images into around 5000 classes, using fastai v1 and ResNet-50.
Training loss is going down, but validation loss is all over the place and the error rate is 99-100%.

Is ResNet-50, or even ResNet-101/ResNet-152, the right architecture for this kind of classification?
As I understand it, ResNets are pre-trained on 1000 classes, and I need 5000 outputs in the last layer.
Do I need to add more FC layers, or is there an alternative network I should consider? (A Google search did not help much.)

bjack913 · February 25, 2019, 7:07am

What’s the size of your training and validation sets? How many examples are in each class?

iyersathya · February 25, 2019, 2:41pm

@bjack913, thank you for your reply.
I have split the data 80/20 into training/validation. I have around 40 images per class, for a total of around 200k images.

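    from fastai.vision import *  # fastai v1: get_transforms, ImageDataBunch, create_cnn, models, ...

    # Augmentation; p_affine=1. and p_lighting=1. apply the affine/lighting transforms to every image
    tfms = get_transforms(max_rotate=20, max_zoom=1.3, max_lighting=0.4,
                          max_warp=0.4, p_affine=1., p_lighting=1.)
    # file_names and labels are parallel lists; 20% of the images are held out at random for validation
    data = ImageDataBunch.from_lists(model_path, fnames=file_names, labels=labels, valid_pct=0.2,
                                     ds_tfms=tfms, size=224,
                                     num_workers=8).normalize(imagenet_stats)
    # Pretrained ResNet-50 body with a new head sized to the number of classes (data.c)
    learn = create_cnn(data, models.resnet50, metrics=error_rate)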
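For scale: with about 40 images per class, an 80/20 split leaves on average 32 training and 8 validation images per class, and a purely random split such as `valid_pct=0.2` spreads those validation images unevenly across classes. The standard-library sketch below only mimics a uniform random split (it does not call fastai); the helper `per_class_valid_counts` is hypothetical, and the class and image counts are the ones from this post.

```python
import random
from collections import Counter

def per_class_valid_counts(n_classes, per_class, valid_pct, seed=0):
    """Simulate a uniform random train/valid split and count validation images per class.

    Hypothetical helper: it mimics a random valid_pct split, it does not call fastai.
    """
    labels = [c for c in range(n_classes) for _ in range(per_class)]
    rng = random.Random(seed)
    rng.shuffle(labels)
    n_valid = int(valid_pct * len(labels))
    counts = Counter(labels[:n_valid])
    # Classes that received no validation images are missing from the Counter, hence .get(c, 0)
    return [counts.get(c, 0) for c in range(n_classes)]

counts = per_class_valid_counts(n_classes=5000, per_class=40, valid_pct=0.2)
print("valid images:", sum(counts))                  # valid images: 40000
print("mean per class:", sum(counts) / len(counts))  # mean per class: 8.0
print("min/max per class:", min(counts), max(counts))
```

Comparing the min and max shows how far individual classes stray from the average of 8 validation images, which matters when each class has so few examples to begin with.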

bjack913 · February 26, 2019, 1:36am

Thanks for the information. How many epochs have you trained for? Keep in mind that if you randomly guessed a class out of 5000 classes, you'd expect an accuracy of 0.02% (an error rate of 99.98%). If your model's error rate is below 99.98%, it is in fact learning something and may need to be trained for longer.
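The chance-level numbers above are quick to verify. The sketch also compares them with the lowest error rate in the training table later in this thread (0.999050); being a few times better than chance is a sign of some learning, but still far from a useful classifier.

```python
def chance_level(n_classes):
    """Accuracy and error rate of uniformly random guessing over n_classes."""
    acc = 1.0 / n_classes
    return acc, 1.0 - acc

acc, err = chance_level(5000)
print(f"chance accuracy: {acc:.4%}, chance error rate: {err:.4%}")
# chance accuracy: 0.0200%, chance error rate: 99.9800%

# Lowest error_rate from the 20-epoch table posted in this thread
observed_error = 0.999050
observed_acc = 1.0 - observed_error
print(f"observed accuracy is about {observed_acc / acc:.2f}x chance")
```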

iyersathya · February 26, 2019, 3:05am

@bjack913, thank you for your continued support and help.
Here's what I am doing. As you pointed out, even though the training loss is going down, the validation loss is really high and the error rate stays around 99.9%. The training loss keeps going down from one cycle to the next.

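    # Resume from the weights saved after the earlier unfrozen stage
    learn.load(f'{model_path}/stage-2_unfreezed');
    learn.unfreeze()
    # 10 rounds of 20-epoch one-cycle training (200 epochs in total), with
    # discriminative learning rates from 1e-6 (early layers) to 1e-3 (head)
    for i in range(10):
        learn.fit_one_cycle(20, max_lr=slice(1e-6, 1e-3))
        path = f'{model_path}/stage-3-epoch_{i}_unfreezed'
        learn.save(path)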

epoch	train_loss	valid_loss	error_rate
1	2.887602	13.034577	0.999333
2	2.922368	13.079821	0.999333
3	3.048307	13.346167	0.999281
4	3.147970	13.503422	0.999256
5	3.242375	13.832283	0.999384
6	3.228010	13.872521	0.999230
7	3.215050	13.705756	0.999102
8	3.147623	13.871474	0.999307
9	3.179110	13.856788	0.999281
10	3.086861	13.711730	0.999050
11	3.123843	13.350389	0.999230
12	3.106609	13.591689	0.999204
13	2.992076	13.637679	0.999358
14	2.918751	13.508119	0.999230
15	2.928656	13.523503	0.999256
16	2.845991	13.388854	0.999102
17	2.854365	13.312586	0.999153
18	2.867545	13.408681	0.999179
19	2.828079	13.210934	0.999204
20	2.767628	13.331071	0.999230
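One more diagnostic hidden in this table: with cross-entropy loss, a model that predicts a uniform distribution over K classes scores exactly ln K. For K = 5000 that is about 8.52, so a valid_loss around 13 is worse than uniform guessing, which typically means the model is confidently wrong on the validation images (overfitting, or a mismatch between training and validation data). A small check, assuming the loss is the standard cross-entropy that fastai v1 uses for single-label classification:

```python
import math

def uniform_cross_entropy(n_classes):
    """Cross-entropy of predicting probability 1/n_classes for the true class."""
    return -math.log(1.0 / n_classes)  # equals math.log(n_classes)

baseline = uniform_cross_entropy(5000)
print(f"uniform-guess loss: {baseline:.3f}")  # uniform-guess loss: 8.517

# valid_loss from the first and last rows of the table above
for valid_loss in (13.034577, 13.331071):
    verdict = "worse than uniform" if valid_loss > baseline else "better than uniform"
    print(valid_loss, verdict)
```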

yeldarb · February 26, 2019, 4:14am

Where are the weights you're loading coming from?

If you comment out that line and train again, frozen, starting from the pretrained ImageNet weights, what does the output look like?

———————————

Have you tried inspecting your batches to make sure your data is getting loaded and decoded properly?
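Besides looking at images with fastai's `data.show_batch()`, it can help to sanity-check the raw lists passed to `ImageDataBunch.from_lists` before they reach fastai: if the file list and label list are misaligned (for example, one was sorted and the other was not), a network can still memorize the training set while validation accuracy stays near chance. A standard-library sketch; `check_label_lists` is a hypothetical helper and the `min_per_class` threshold is arbitrary.

```python
from collections import Counter

def check_label_lists(file_names, labels, min_per_class=5):
    """Basic sanity checks on parallel lists of image paths and labels.

    Hypothetical helper: returns a list of human-readable problems (empty if none found).
    """
    problems = []
    if len(file_names) != len(labels):
        problems.append(f"length mismatch: {len(file_names)} files vs {len(labels)} labels")
    n_dupes = sum(1 for n in Counter(file_names).values() if n > 1)
    if n_dupes:
        problems.append(f"{n_dupes} file(s) listed more than once")
    n_rare = sum(1 for n in Counter(labels).values() if n < min_per_class)
    if n_rare:
        problems.append(f"{n_rare} class(es) with fewer than {min_per_class} images")
    return problems

print(check_label_lists(["img/a/1.jpg", "img/a/2.jpg", "img/b/3.jpg"], ["a", "a"], min_per_class=2))
# ['length mismatch: 3 files vs 2 labels']
```

These checks cannot catch a label that is valid but wrong, so eyeballing a few (file, label) pairs is still worthwhile.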

Imagine a hypothetical scenario where you tried to train a network on random noise — there’s nothing for it to learn to classify the static because there’s no information there. The results might look a lot like what you posted above where the model is basically just randomly guessing.
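The random-noise thought experiment can be made concrete. With random labels there is nothing to generalize, so a model that simply memorizes the training set reaches 100% training accuracy while validation accuracy stays near 1/K. The toy below uses a 1-nearest-neighbour "model" on random vectors as a stand-in for a network that memorizes; the sizes (50 classes, 5 features) are arbitrary, and it illustrates the symptom rather than modelling what a ResNet does.

```python
import random

def nearest_label(train, x):
    """1-nearest-neighbour 'model': return the label of the closest training vector."""
    closest = min(train, key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)))
    return closest[1]

rng = random.Random(0)
N_CLASSES, DIM = 50, 5

def noise_example():
    # Features and label are drawn independently, so there is nothing to learn.
    return [rng.random() for _ in range(DIM)], rng.randrange(N_CLASSES)

train = [noise_example() for _ in range(500)]
valid = [noise_example() for _ in range(200)]

train_acc = sum(nearest_label(train, x) == y for x, y in train) / len(train)
valid_acc = sum(nearest_label(train, x) == y for x, y in valid) / len(valid)
print(f"train acc {train_acc:.2f} | valid acc {valid_acc:.3f} | chance {1 / N_CLASSES:.3f}")
```

Perfect training accuracy next to chance-level validation accuracy is the same pattern as a falling train_loss next to a flat error_rate.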

What do your classes represent and where did the labels come from?

iyersathya · February 26, 2019, 4:42am

Hello @yeldarb, yes. I initially trained with all layers frozen except the last one, then unfroze all the layers and looked for an optimal LR using lr_find.
The code above is what I ran after those steps. The labels and classes were prepared and verified before I started the training process.

yeldarb · February 26, 2019, 4:55am

Yes, and what were the results?

hanman · February 26, 2019, 8:36am

5000 classes… this seems like a facial recognition problem. I've previously seen someone use Siamese networks to solve this kind of problem. I'm not sure whether they are implemented in fastai.
https://www.quora.com/What-are-Siamese-neural-networks-what-applications-are-they-good-for-and-why

In my own experience, Siamese networks may offer three distinct advantages over traditional classification.

These advantages hold to some extent for any kind of data, not just for images (where Siamese networks are currently most popular):

They can be more robust to extreme class imbalance.

They can be good to ensemble with a classifier.

They can yield better embeddings.
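For readers unfamiliar with how a Siamese network is trained: two inputs go through the same network to produce embeddings, and a pair loss pulls same-class pairs together while pushing different-class pairs at least a margin apart. A minimal sketch of the common contrastive loss on plain Python lists; the margin of 1.0 is an arbitrary assumption, and this is not a fastai API.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Contrastive loss for one pair of embeddings.

    Same-class pairs are penalised by their squared distance; different-class
    pairs are penalised only if they are closer than `margin`.
    """
    d = euclidean(emb_a, emb_b)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2

print(contrastive_loss([0.0, 0.0], [0.0, 0.0], same_class=True))   # 0.0
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same_class=True))   # 25.0
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same_class=False))  # 0.0
print(contrastive_loss([0.0, 0.0], [0.0, 0.5], same_class=False))  # 0.25
```

In a real setup the embeddings come from the shared network and the loss is averaged over many sampled pairs.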

Let’s say we want to learn to predict what animal is in a given image.

Case 1: if there are just 2 animal classes to predict (cats vs. dogs) and we have millions of images of each class, one could train a deep CNN classifier. Easy!

Case 2: but what if we have tens of thousands of animal classes, and for most of them we only have a few dozen example images? Learning each animal as a class with a deep CNN seems much less feasible now. Such a classifier can perform poorly on rarely seen training classes, e.g. if there were only 4 training images of 'eels'.
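Case 2 is where an embedding approach can help: instead of a 5000-way output layer, you embed every image and label a query by its nearest class representative, so a class with only a handful of examples (the "eels") still gets a usable prototype, and in principle new classes can be added without retraining. A sketch of nearest-centroid classification, assuming some trained network has already produced the embeddings; the toy 2-D vectors below are made up for illustration.

```python
import math
from collections import defaultdict

def class_centroids(embeddings, labels):
    """Average the embeddings of each class into one prototype vector."""
    sums, counts = {}, defaultdict(int)
    for emb, lab in zip(embeddings, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(emb)
        sums[lab] = [s + e for s, e in zip(sums[lab], emb)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}

def predict(centroids, query):
    """Return the label whose prototype is closest (Euclidean) to the query embedding."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], query))

embs = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 4.8], [0.0, 9.0]]
labs = ["cat", "cat", "dog", "dog", "eel"]  # only one "eel" example
cents = class_centroids(embs, labs)
print(predict(cents, [0.3, 8.5]))  # eel
print(predict(cents, [4.9, 5.1]))  # dog
```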

digitalspecialists · February 26, 2019, 9:05am

I found a large FC layer helped on a 1500-class unbalanced task. I would try running experiments on fewer classes (500, 1000, 2000, etc.) and see whether performance falls off a cliff somewhere. Another strategy is to run a coarse classifier followed by fine-grained ones, if your classes fall into broad enough groups. E.g., imagine separate 'car', 'van' and 'motorcycle' classifiers rather than a single 'vehicle models' classifier.
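The "train on fewer classes" experiment is easy to set up by filtering the lists you pass to `ImageDataBunch.from_lists`. A sketch, assuming parallel `file_names`/`labels` lists as in the original code; `subset_by_classes` is a hypothetical helper and the toy file names are made up.

```python
import random

def subset_by_classes(file_names, labels, n_classes, seed=0):
    """Keep only the images belonging to `n_classes` randomly chosen classes."""
    classes = sorted(set(labels))
    keep = set(random.Random(seed).sample(classes, n_classes))
    pairs = [(f, lab) for f, lab in zip(file_names, labels) if lab in keep]
    return [f for f, _ in pairs], [lab for _, lab in pairs]

# Toy data: 10 classes with 3 images each
files = [f"img_{c}_{i}.jpg" for c in range(10) for i in range(3)]
labs = [f"class_{c}" for c in range(10) for i in range(3)]
for n in (2, 5):
    sub_files, sub_labels = subset_by_classes(files, labs, n)
    print(n, len(set(sub_labels)), len(sub_files))  # prints "2 2 6", then "5 5 15"
```

Keeping everything else fixed between runs means the number of classes is the only thing that changes.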

iyersathya · February 26, 2019, 4:15pm

@yeldarb, the results before unfreezing were similar to the ones I posted above:
2.767628 13.331071 0.999230

iyersathya · February 26, 2019, 4:17pm

@hanman, thank you for your input; let me see if a Siamese network can help.
And yes, this problem does fall into Case 2 that you mentioned.

yeldarb · February 26, 2019, 5:58pm

So what I was trying to get at is that if you accidentally trained with too high a learning rate and your model diverged, it could be in an unrecoverable state, and no matter how much more training you do, it will never come back to reasonable weights.

That’s why I suggested trying to start again from the beginning.

the results before unfreeze were similar to the one I posted above.
2.767628 13.331071 0.999230

If these numbers are from the end of your training run, they're not helpful in diagnosing the issue. You'd want to see whether your error_rate and loss were getting worse and worse from the beginning.

For instance, this is what happens if you first train lesson1-pets with a learning rate of 1, then unfreeze, run the learning rate finder, and try to bring things back to sanity (screenshot not reproduced here).

With 5000 classes I imagine it would be even harder for it to start finding its way back than with only 37.
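The divergence effect can be seen even in one dimension. For gradient descent on f(w) = w², each step multiplies w by (1 − 2·lr), so any learning rate above 1.0 makes |w| grow without bound instead of shrinking. This toy is only an analogy for what a learning rate of 1 does to a deep network, not a simulation of it.

```python
def gradient_descent(lr, w=1.0, steps=20):
    """Minimise f(w) = w**2 with plain gradient descent; return the trajectory."""
    history = [w]
    for _ in range(steps):
        grad = 2 * w       # derivative of w**2
        w = w - lr * grad  # equivalent to w *= (1 - 2 * lr)
        history.append(w)
    return history

for lr in (0.1, 0.9, 1.1):
    print(f"lr={lr}: final |w| = {abs(gradient_descent(lr)[-1]):.3g}")
```

With lr=0.9 the weight flips sign on every step but still shrinks; at 1.1 the oscillation grows on every step. Once a real network's weights have been pushed that far, continued training at a small learning rate may not bring them back, which is why restarting from the pretrained weights is the cleaner test.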

iyersathya · February 27, 2019, 12:10am

@yeldarb, thank you for your continued input.
Here's what I did:

    learn.fit_one_cycle(10)
    ## training loss went down on each epoch as expected, but valid_loss and error_rate stayed at ~12 and ~99.99%
    learn.recorder.plot_losses()   # training vs. validation loss curves
    learn.lr_find()                # learning rate finder
    learn.recorder.plot()          # loss vs. learning rate
    learn.fit_one_cycle(10, max_lr=slice(1e-6, 1e-4))

I am running the last line in a loop. As I pointed out before, after each call to fit_one_cycle the training loss consistently goes down, but valid_loss and error_rate remain high.

epoch	train_loss	valid_loss	error_rate
20	2.767628	13.331071	0.999230

After a few cycles, the training loss is at 2.1, but valid_loss and error_rate remain unchanged.

raghavab1992 · November 20, 2019, 8:11am

@iyersathya Were you able to solve this? Can you share what you learned here?