I am trying to build a transfer learning model, but the number of classes is around 5000.
Since all the pre-trained models are trained on ImageNet, which has 1000 classes, is it better to add custom dense layers on top of the pre-trained model, or to directly add a single dense layer with as many units as there are classes, followed by a softmax?
My doubts are:
Since the number of classes is around 5000, I might have to add dense layers with around 8192 nodes. That would increase the size of the model and, because the number of images is around 1.2 million, the training time as well.
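To put rough numbers on that concern, here is a back-of-the-envelope sketch. It assumes the pre-trained model outputs a 2048-dimensional feature vector (true of ResNet-50 after global pooling, but the post does not say which model is used), and `dense_params` is just a helper name made up for this illustration.

```python
def dense_params(n_in, n_out):
    """Parameter count of one fully connected layer: weights plus biases."""
    return n_in * n_out + n_out


FEATURES = 2048  # assumption: size of the pooled feature vector from the backbone
CLASSES = 5000   # from the question
HIDDEN = 8192    # hidden layer width considered in the question

# Option A: a single dense layer straight to the class scores.
single_head = dense_params(FEATURES, CLASSES)

# Option B: an 8192-unit hidden dense layer, then the output layer.
hidden_head = dense_params(FEATURES, HIDDEN) + dense_params(HIDDEN, CLASSES)

print(f"single dense layer:       {single_head:,} parameters")
print(f"hidden layer plus output: {hidden_head:,} parameters")
print(f"ratio: {hidden_head / single_head:.1f}x")
```

Under these assumptions the head with the 8192-unit hidden layer has about 57.8 million parameters against about 10.2 million for the single layer, so the size worry is real. Whether the extra capacity is worth it is something only a validation set can answer.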
Gosh, that’s actually a very deep question. And it turns out the answer is probably “yes”, but there are better ways to do it - a paper discussed this just a couple of weeks ago! I hope to cover this in part 2. If you want to skip ahead, here’s a great summary: http://smerity.com/articles/2017/mixture_of_softmaxes.html
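For anyone who wants a feel for what a mixture of softmaxes computes before reading the summary, here is a minimal NumPy sketch of the forward pass only. It is an illustration of the general idea, not the paper's code: the function and argument names are made up, the weights are random rather than trained, and the sizes (64 features, 3 mixture components) are arbitrary assumptions. Each component applies its own projection to the features, shares one output matrix, and the per-component softmaxes are averaged with input-dependent mixture weights.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def mixture_of_softmaxes(h, w_prior, w_proj, w_out):
    """Forward pass of a mixture-of-softmaxes output layer (illustrative names).

    h:       (batch, d)  features from the pre-trained model
    w_prior: (d, K)      produces the K mixture weights per example
    w_proj:  (K, d, d)   one projection per mixture component
    w_out:   (d, C)      output matrix shared by all components
    """
    pi = softmax(h @ w_prior)                            # (batch, K)
    hk = np.tanh(np.einsum("bd,kde->bke", h, w_proj))    # (batch, K, d)
    comp = softmax(hk @ w_out)                           # (batch, K, C)
    # Weighted average of the K class distributions.
    return np.einsum("bk,bkc->bc", pi, comp)             # (batch, C)


# Toy usage with random weights; all sizes except C are assumptions.
rng = np.random.default_rng(0)
batch, d, K, C = 4, 64, 3, 5000
h = rng.normal(size=(batch, d))
probs = mixture_of_softmaxes(
    h,
    0.1 * rng.normal(size=(d, K)),
    0.1 * rng.normal(size=(K, d, d)),
    0.1 * rng.normal(size=(d, C)),
)
print(probs.shape)
```

Because the result is a convex combination of softmax outputs, each row is still a valid probability distribution over the 5000 classes.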