CNN with a large number of categories

Have you considered using a Siamese network approach?
It should be useful in a situation like this, where there are many classes but only a few samples per class.
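
To make the idea concrete, here is a minimal sketch of the training signal a Siamese setup relies on. It is not taken from any particular library or from the starter pack; `make_pairs` and `contrastive_loss` are hypothetical helper names, the embeddings are hand-written stand-ins for what a shared CNN encoder would output, and the loss is one common form of the contrastive loss (some variants add a 0.5 factor).

```python
import itertools
import math


def make_pairs(labels):
    """Return (i, j, same) for every unordered pair of sample indices.

    `same` is True when both samples carry the same class label.
    """
    return [
        (i, j, labels[i] == labels[j])
        for i, j in itertools.combinations(range(len(labels)), 2)
    ]


def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Contrastive loss for one pair of embeddings (assumed form).

    Same-class pairs are pulled together (squared distance);
    different-class pairs are pushed apart until they are at least
    `margin` away from each other.
    """
    dist = math.dist(emb_a, emb_b)  # Euclidean distance, Python 3.8+
    if same:
        return dist ** 2
    return max(0.0, margin - dist) ** 2


# Five labelled samples already give ten training pairs.
labels = ["cat", "cat", "dog", "dog", "bird"]
pairs = make_pairs(labels)
positive_pairs = [p for p in pairs if p[2]]
```

The point of the sketch is that the network learns from pairs rather than from single images, so even a class with a single sample (like `"bird"` here) still contributes negative pairs. At prediction time you would compare the embedding of a new image against the embeddings of known samples instead of using a softmax over all categories.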

For reference, take a look at @radek's great starter pack: