I am trying to classify images that has around 5000 classes. I am using fastai v1, resnet 50.
Training errors are going down but validation error are all over the place and total error is 99-100%.
Is resnet-50, even resnet-101,resnet150 a right architecture for this kind of classification.
As i understand resnet’s are pre-trained for 1000 classes. I am going for 5000 in a last layer.
Do I need to add more FC layers or Is there a alternative network I should consider (Google search did not help much)
@bjack913 thank you for your reply,
I have divided training/validation into 80/20, I have around 40 images each class with total of around 200k images.
Thanks for the information. How many epochs have you trained for? Keep in mind that if you randomly guessed a class out of 5000 classes, then you’d expect an accuracy of 0.02% (or an error rate of 99.98%). If your model has an error rate of <99.98% then it is in fact learning and may need to be trained for longer.
@bjack913 thank you for your continued support and help.
Here’s what I am doing, but as you pointed even though training error is going down but validation errors are really high and error rate is >99.98%. Training error is going down from one cycle to next.
learn.load(f'{model_path}/stage-2_unfreezed');
learn.unfreeze()
for i in range(0,10):
learn.fit_one_cycle(20, max_lr=slice(1e-6,1e-3))
path = f'{model_path}/stage-3-epoch_{i}_unfreezed'
learn.save(path);
If you try commenting out that line and training again while frozen from the pretrained imagenet weights what does the output look like?
———————————
Have you tried inspecting your batches to make sure your data is getting loaded and decoded properly?
Imagine a hypothetical scenario where you tried to train a network on random noise — there’s nothing for it to learn to classify the static because there’s no information there. The results might look a lot like what you posted above where the model is basically just randomly guessing.
What do your classes represent and where did the labels come from?
Hello @yeldarb, yes I have initially trained it with all the layers frozen except last layer, then unfreeze all the layers and tried finding optimum LR, using find_lr,
The code above is after I have already gone through above steps. Labels and classes were prepared and verified even before I started training process.
In my own experience, Siamese Networks may offer 3 distinct advantages over Traditional CLASSIFICATION!
These advantages are somewhat true for any kind of data, and not just for Images (where these are currently most popularly used).
CAN BE MORE ROBUST TO EXTREME CLASS IMBALANCE.
CAN BE GOOD TO ENSEMBLE WITH A CLASSIFIER.
CAN YIELD BETTER EMBEDDINGS.
Let’s say we want to learn to predict what animal is in a given image.
Case 1 : if it is just 2 animal classes to predict from (Cat vs Dogs) and given millions of images of each class, one could train a deep CNN Classifier. Easy!
Case 2 : but what if we have tens of thousands of animal classes and for most of these, we only have a few dozens of image examples? Trying to learn each animal as a Class using deep CNN seems less feasible now. Such a classifier can perform poorly for rarely seen training class e.g. let’s say there were only 4 training images of ‘eels’
I found a large FC layer helped on a 1500 class unbalanced task. I would try experiments running on fewer (500,1000,2000,etc) classes and see if performance falls off a cliff somewhere. Another strategy is to run a coarse then fine grained classifier if your class distinction is broad enough. Eg, imagine separate ‘car’, ‘van’, ‘motorcycle’ classifiers not just ‘vehicle models’.
So the thing I was trying to get at was that if you accidentally trained on too high of a learning rate and had your model diverge it could be in an unrecoverable state and no matter how much training you do it will never come back to reasonable weights.
That’s why I suggested trying to start again from the beginning.
the results before unfreeze were similar to the one I posted above.
2.767628 13.331071 0.999230
If these numbers are from the end of your training run they’re not helpful in diagnosing the issue. You’d want to see if your error_rate and loss were getting worse and worse from the beginning.
For instance, this is what happens if you initially train lesson1-pets with a learning rate of 1 and then unfreeze, do the learning rate finder, and try to bring things back to sanity:
@yeldarb, thank you for your continuous inputs.
Here’s what I did
learn.fit_one_cycle(10)
## training error went down on each epoch as expected, but validation and error remained at 12 and 99.99 respectively
learn.recorder.plot_losses()
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(10,max_lr=slice(1e-6,1e-4)
I am running the last line in a loop , As I pointed before after each call to fit_one_cycle, training is consistently going down but valid_loss and error_rate remains high.
epoch
train_loss
valid_loss
error_rate
20
2.767628
13.331071
0.999230
after few cycles, training error is at 2.1 but valid_loss/error loss remains unchanged.