Basically, yes it’s the same. Here are some pointers.
Re the 50%: we often use a top-3 or top-5 accuracy metric rather than top-1, depending on the real-life use case. You'll often find another 10% of correct answers in the next few results.
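To make that concrete, here is a minimal top-k accuracy sketch in plain NumPy. It assumes you already have a matrix of per-class probabilities from your model (the variable names here are just illustrative):

```python
import numpy as np

def top_k_accuracy(probs, labels, k=3):
    """Fraction of samples whose true label is among the k highest-probability predictions."""
    # sort each row descending by probability, keep the first k class indices
    top_k = np.argsort(-probs, axis=1)[:, :k]
    hits = (top_k == labels[:, None]).any(axis=1)
    return hits.mean()

# toy example: 4 samples, 5 classes
probs = np.array([
    [0.60, 0.20, 0.10, 0.05, 0.05],  # true label 0 -> top-1 hit
    [0.10, 0.50, 0.30, 0.05, 0.05],  # true label 2 -> only a top-2 hit
    [0.30, 0.30, 0.20, 0.10, 0.10],  # true label 4 -> miss even at top-3
    [0.25, 0.25, 0.20, 0.20, 0.10],  # true label 1 -> only a top-2 hit
])
labels = np.array([0, 2, 4, 1])
print(top_k_accuracy(probs, labels, k=1))  # 0.25
print(top_k_accuracy(probs, labels, k=3))  # 0.75
```

Note how top-1 and top-3 can diverge sharply on the same predictions; that gap is exactly the "extra 10%" effect.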
What network architecture are you using? You probably need to try something big like dn201, resnet152, or wrn. Try adding xtra_fc=[x] where x is larger than your number of categories, or as large as the last model layer. And if your use case allows, ensemble multiple models and use cross-validation.
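One common way to ensemble is simply to average the class-probability matrices from each model and take the argmax. A minimal sketch, assuming you've already collected each model's predicted probabilities into arrays:

```python
import numpy as np

def ensemble_predict(prob_list):
    """Average the class-probability matrices from several models, then take the argmax."""
    avg = np.mean(prob_list, axis=0)
    return avg.argmax(axis=1)

# two toy models: they disagree on sample 1, and averaging resolves it
model_a = np.array([[0.8, 0.2], [0.4, 0.6]])
model_b = np.array([[0.7, 0.3], [0.7, 0.3]])
print(ensemble_predict([model_a, model_b]))  # [0 0]
```

Averaging probabilities (rather than hard votes) tends to be more stable when models are well calibrated, and it works the same way for the fold models from cross-validation.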
Be very careful with your train/val/test stratification. With many categories you can end up with poor representation of the rare ones after splitting. You need to address the class imbalance in some way, e.g. under/over-sampling, data augmentation, weighting the loss function, or post-training probability adjustment.
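For the loss-weighting option, a simple recipe is inverse-frequency class weights. This is a sketch of one reasonable scheme, not the only one; the resulting vector is the kind of thing you could pass to, say, PyTorch's `nn.CrossEntropyLoss(weight=...)`:

```python
import numpy as np

def inverse_freq_weights(labels, n_classes):
    """Per-class weights inversely proportional to class frequency,
    normalised so the mean weight is 1 (convenient for a weighted loss)."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    counts[counts == 0] = 1.0  # guard against classes absent from this split
    w = 1.0 / counts
    return w / w.mean()

# imbalanced toy labels: class 0 appears 6x, class 1 3x, class 2 once
labels = np.array([0] * 6 + [1] * 3 + [2] * 1)
print(inverse_freq_weights(labels, 3))  # rare class 2 gets the largest weight
```

The normalisation keeps the overall loss scale roughly unchanged, so you don't have to retune the learning rate just because you added weights.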
If training is taking a long time, start with a much smaller sample. This is also a quick way to equalise the sampling.
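Drawing that smaller, equalised sample can be as simple as taking at most N examples per class. A minimal sketch (the cap of 5 per class is arbitrary, pick whatever fits your data):

```python
import numpy as np

def balanced_sample(labels, per_class, seed=0):
    """Return indices of at most `per_class` randomly chosen examples from each class."""
    rng = np.random.default_rng(seed)
    idx = []
    for c in np.unique(labels):
        cls_idx = np.where(labels == c)[0]
        take = min(per_class, len(cls_idx))
        idx.append(rng.choice(cls_idx, size=take, replace=False))
    return np.concatenate(idx)

# heavily imbalanced toy labels
labels = np.array([0] * 100 + [1] * 10 + [2] * 5)
sample = balanced_sample(labels, per_class=5)
print(len(sample))  # 15: five indices from each of the three classes
```

You'd then index your file list or dataframe with `sample` to build the quick prototype dataset.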
Also take a close look at your confusion matrix; you'll see where the model is struggling. Sometimes it is worth running a separate model to pre-classify troublesome categories. Some people go so far as to classify into coarse-grained categories (e.g. water animals, land animals, air animals) before running more fine-grained models, but that has never worked for me. Use cases matter a lot in fine-grained problems.
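A quick way to mine the confusion matrix for the troublesome categories is to look at its largest off-diagonal cell. A minimal NumPy sketch (with a real model you'd feed in your actual validation labels and predictions):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] = number of samples whose true class is i, predicted as j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # unbuffered increment handles repeats
    return cm

def most_confused(cm):
    """Off-diagonal cell with the most errors: (true class, predicted class, count)."""
    off = cm.copy()
    np.fill_diagonal(off, 0)  # ignore correct predictions
    i, j = np.unravel_index(off.argmax(), off.shape)
    return int(i), int(j), int(off[i, j])

# toy example: class 0 keeps getting predicted as class 2
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 2, 2, 1, 1, 2, 2, 0])
cm = confusion_matrix(y_true, y_pred, 3)
print(most_confused(cm))  # (0, 2, 2): class 0 mistaken for class 2 twice
```

The worst-confused pairs are the natural candidates for a separate pre-classifier, or at least for a closer look at the training images.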
Together, these approaches can often cut your error rate in half or more.