(Pre-)Training for Classification Tasks: Few versus many classes

In general, is it desirable to have many, more detailed classes when training a ConvNet on a classification task? For example, say I want to train a ConvNet to distinguish between different animals: would it be desirable to have a class for every dog breed, even if in the end I am only interested in distinguishing dogs from cats, wolves, foxes, etc.?
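One practical detail worth noting: if you do train on fine-grained labels, you can still evaluate on the coarse task by collapsing the fine predictions. A minimal sketch (the class names and mapping here are hypothetical, just to illustrate the idea) sums the predicted probabilities of all fine classes belonging to the same coarse class:

```python
import numpy as np

# Hypothetical fine-grained label set: several dog breeds plus other animals.
fine_classes = ["beagle", "poodle", "husky", "cat", "wolf", "fox"]

# Map each fine class to the coarse class we ultimately care about.
fine_to_coarse = {"beagle": "dog", "poodle": "dog", "husky": "dog",
                  "cat": "cat", "wolf": "wolf", "fox": "fox"}
coarse_classes = sorted(set(fine_to_coarse.values()))

def coarse_probs(fine_probs):
    """Sum fine-class probabilities within each coarse group."""
    out = np.zeros(len(coarse_classes))
    for i, name in enumerate(fine_classes):
        out[coarse_classes.index(fine_to_coarse[name])] += fine_probs[i]
    return out

# Example: a fine-grained classifier that is unsure which breed it sees,
# but whose probability mass is concentrated on the dog breeds overall.
p_fine = np.array([0.3, 0.25, 0.25, 0.1, 0.05, 0.05])
p_coarse = coarse_probs(p_fine)
print(coarse_classes[int(np.argmax(p_coarse))])  # → dog
```

So choosing fine-grained training labels does not lock you out of the coarse task; the question is only whether the finer supervision helps the network learn better features.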

Does a ConvNet learn better features if it is trained on many fine-grained classes? My guess would be that it does, since we use ImageNet, with its many fine-grained classes, for pre-training. But I wonder whether that has been studied systematically.