Hello everyone,
I’m trying to solve a classification problem that has more than 1000 different classes.
I’m treating it like a cats/dogs or sentiment analysis problem, but instead of two classes I have 1000.
data.class = “Class 1” … “Class 1000”
Is this the correct way to do it?
It is taking a lot of time to go above 50% accuracy.
Basically, yes it’s the same. Here are some pointers.
Re 50%: we often use a top-3 or top-5 metric rather than top-1, depending on the real-life use case. You will often pick up roughly another 10% in the next few results.
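To make the top-k idea concrete, here is a minimal sketch in plain Python. The function name `top_k_accuracy` and the toy scores are made up for illustration; in practice your framework probably has its own metric for this.

```python
def top_k_accuracy(scores, targets, k=5):
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    hits = 0
    for row, target in zip(scores, targets):
        # Indices of the k largest scores in this row.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += target in top_k
    return hits / len(targets)


# Toy example: 3 samples, 4 classes (made-up scores, not real model output).
scores = [
    [0.10, 0.60, 0.20, 0.10],  # true class 1: correct at top-1
    [0.40, 0.10, 0.35, 0.15],  # true class 2: wrong at top-1, correct at top-2
    [0.50, 0.30, 0.15, 0.05],  # true class 3: wrong even at top-2
]
targets = [1, 2, 3]

top1 = top_k_accuracy(scores, targets, k=1)
top2 = top_k_accuracy(scores, targets, k=2)
print(f"top-1: {top1:.2f}  top-2: {top2:.2f}")
```

With 1000 classes, comparing top-1 against top-5 like this tells you whether the model is nearly right (true class close to the top) or genuinely lost.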
What network architecture are you using? You probably need to try something big like dn201 (DenseNet-201), res 152 (ResNet-152) or wrn (Wide ResNet). Try adding an xtra_fc=[x] where x is higher than your number of categories, or as large as the last model layer. And if your use case allows, ensemble multiple models and use cross-validation.
Be very careful with your train/val/test stratification. With many categories, some classes can end up poorly represented after splitting. You need to address the class imbalance in some way, e.g. under/over-sampling, data augmentation, weighting the loss function, or post-training probability adjustment.
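As a rough sketch of two of these ideas (a stratified split and inverse-frequency loss weights), assuming your labels are just a Python list. The helper names `stratified_split` and `inverse_frequency_weights` are hypothetical, and the weighting formula is only one common choice.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx) with roughly val_fraction of each class in validation."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Keep at least one validation sample per class when the class has 2+ items.
        n_val = max(1, round(len(idxs) * val_fraction)) if len(idxs) > 1 else 0
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return sorted(train_idx), sorted(val_idx)


def inverse_frequency_weights(labels):
    """Per-class weights proportional to 1 / count; a perfectly balanced set gives 1.0."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


# Toy imbalanced label list (made up for illustration).
labels = ["a"] * 80 + ["b"] * 15 + ["c"] * 5
train_idx, val_idx = stratified_split(labels)
weights = inverse_frequency_weights([labels[i] for i in train_idx])
print(Counter(labels[i] for i in val_idx), weights)
```

The weights would then be passed to whatever weighted loss your framework offers; rare classes get larger weights so their errors count for more.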
If it is taking a long time, start with a much smaller sample. This is a quick way to equalise the sampling, too.
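One way to do that, sketched under the assumption that labels are a plain list, is to cap the number of samples per class. The name `balanced_subsample` and the cap of 20 are arbitrary choices for illustration.

```python
import random
from collections import Counter, defaultdict


def balanced_subsample(labels, per_class=50, seed=0):
    """Return sorted indices keeping at most per_class randomly chosen samples per class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    keep = []
    for idxs in by_class.values():
        # Classes smaller than the cap are kept in full.
        keep.extend(rng.sample(idxs, min(per_class, len(idxs))))
    return sorted(keep)


# Toy imbalanced label list (made up for illustration).
labels = ["a"] * 500 + ["b"] * 120 + ["c"] * 8
subset = balanced_subsample(labels, per_class=20)
print(Counter(labels[i] for i in subset))
```

Training on the subset is faster and the big classes no longer dominate, though very rare classes stay rare.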
Also take a close look at your confusion matrix; you’ll see where the model is struggling. Sometimes it is worth running a separate model to preclassify troublesome categories. Some people go so far as to classify into coarse-grained categories (e.g. water animals, land animals, air animals) before running more fine-grained models, but it’s never worked for me. Use cases matter a lot with fine-grained problems.
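With 1000 classes the full confusion matrix is too big to eyeball, so it can help to list only the most frequent off-diagonal pairs. A minimal sketch, where `most_confused` is a hypothetical helper and the labels are toy data:

```python
from collections import Counter


def most_confused(y_true, y_pred, top_n=3):
    """Return the top_n most frequent (true, predicted) pairs where the two differ."""
    pairs = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return pairs.most_common(top_n)


# Toy labels and predictions (made up for illustration).
y_true = ["cat", "cat", "cat", "dog", "dog", "fox", "fox", "fox"]
y_pred = ["cat", "dog", "dog", "dog", "cat", "dog", "dog", "dog"]

confused = most_confused(y_true, y_pred)
for (true_label, pred_label), count in confused:
    print(f"{true_label} predicted as {pred_label}: {count}")
```

Pairs that show up at the top of this list are the candidates for a separate preclassifying model.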
Taken together, these approaches can often reduce your error by half or more.
The model with 1000 classes was unable to get above 43% accuracy, and the loss was high on both validation and training.
I decided to move to fewer categories (50) and got to 75% accuracy, but validation and training still behave erratically.
Not a single prediction on the 50 categories came out above 50%.
When I tried to predict, I got the same results in the same order every time.
I think the reason is that the model we have is built for sentiment analysis (0, 1).