I'm trying to identify 3000 tabletop games from images. I have an
ImageDataBunch containing 3000 games and around 20 images per game.
Using ResNet50 (with only 100 different games) I can't get an error rate lower than 60%. I tried different transformations like flipping, rotation, lighting, … without success.
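For anyone wondering what the flip/rotation augmentations do, here is a minimal sketch of the geometry on an image represented as a nested list. Real pipelines (e.g. fastai's transforms) do this on tensors with random parameters per batch; this toy version just shows the operations themselves.

```python
# Toy versions of two augmentations mentioned above, on a 2D list of
# pixel values instead of a tensor.

def hflip(img):
    """Mirror each row (horizontal flip)."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate 90 degrees clockwise: reverse the rows, then transpose."""
    return [list(row) for row in zip(*img[::-1])]

img = [[1, 2],
       [3, 4]]
print(hflip(img))  # [[2, 1], [4, 3]]
print(rot90(img))  # [[3, 1], [4, 2]]
```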
Are CNNs with ResNetXX a good pick for identifying 3000 different items?
All the classification examples I've seen use fewer than 10 classes.
(I saw lessons 1-4 so far)
Wow, 3000 classes is indeed a lot. But keep in mind that ImageNet also contains 1000 object classes.
An error rate of 60% sounds bad by itself, but consider that a random guess would only have a 1/3000 ≈ 0.00033 chance of success, so an accuracy of 40% is actually around 1200 times better than chance. So I wouldn't say the classifier is that bad. It's simply a very hard classification problem, given only very few examples to learn from.
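The back-of-envelope numbers above can be checked directly:

```python
# Random-guess baseline for 3000 classes vs. the reported 40% accuracy
# (i.e. the 60% error rate from the original post).
n_classes = 3000
random_acc = 1 / n_classes          # chance of a random guess being right
model_acc = 0.40                    # 1 - 0.60 error rate
improvement = model_acc / random_acc

print(round(random_acc, 5))   # 0.00033
print(round(improvement))     # 1200
```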
So what I would do is decrease the number of classes, e.g. to 1000 at first, see how the error rate changes, and then try to add more training data.
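One hedged sketch of the "start with fewer classes" idea: keep only the N games with the most images, so the first experiments train on the best-covered classes. The `samples` list of `(image_path, game_label)` pairs here is hypothetical, just to show the filtering step.

```python
# Keep only the n best-represented classes from a labelled sample list.
from collections import Counter

def keep_top_classes(samples, n):
    """samples: list of (path, label) pairs; returns the subset whose
    labels are among the n most frequent classes."""
    counts = Counter(label for _, label in samples)
    keep = {label for label, _ in counts.most_common(n)}
    return [s for s in samples if s[1] in keep]

samples = [("a.jpg", "catan"), ("b.jpg", "catan"),
           ("c.jpg", "carcassonne"), ("d.jpg", "azul")]
subset = keep_top_classes(samples, 1)
print(subset)  # [('a.jpg', 'catan'), ('b.jpg', 'catan')]
```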
So it should be possible to identify, say, 100 games and then scale up. But I need to get more training data.
I wasn't sure whether it's worth spending time on this, but now I know it is :D.
Thanks a lot.
Well, no guarantees that it'll work. I have no experience with training classifiers with such a high number of classes. I only wanted to offer a more optimistic perspective on your error rate and suggest slowly and iteratively increasing the training set size. But in the end only you can decide a) how good your model needs to be for it to be considered a success and b) if it's worth spending your time on it.
Good luck and keep us updated!
You may want to check the Kaggle Landmark Recognition and Retrieval competitions [link]. They deal with a huge number of classes (https://www.kaggle.com/c/landmark-recognition-2019/discussion/95176). Related paper:
The code isn't working properly, but it should be in https://github.com/PaddlePaddle/