The images aren’t as distinctive as faces, so increasing the number of images could help. I would first try with the same number of images but a bigger ResNet model and see how that works, then increase the number of images, so you can see which change makes the most difference.
I had a similar issue before and increasing the model size helped a lot.
Also make sure you have enough images in each class. Currently it looks like the model thinks the majority of images are three.
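A quick way to confirm that the model is leaning on one class is to look at how its predictions are spread across the classes. This is only a rough sketch: the `prediction_share` helper and the `preds` list are made up for illustration, so swap in the labels your own model predicts on the validation set.

```python
from collections import Counter


def prediction_share(predicted_labels):
    """Return the fraction of predictions assigned to each class."""
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical predictions on a validation set (assumption, not real output).
preds = ["three", "three", "seven", "three", "three", "one", "three", "three"]

shares = prediction_share(preds)
dominant = max(shares, key=shares.get)  # class the model predicts most often
print(dominant, shares[dominant])
```

If one class takes a much larger share of the predictions than it does of the true labels, that points to class imbalance or to a model that has not learned to separate the classes.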
I normally use a minimum of 200-600 images per class depending on how different the classes are from each other.
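To check this on your own data, something like the sketch below might help. It assumes one subfolder per class (for example `data/train/three/`), which may not match your layout; the function names, the suffix list and the example counts are all just illustrative, and the default of 200 is simply the low end of my rule of thumb.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def images_per_class(root):
    """Count image files in each class subfolder of `root`."""
    counts = {}
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def underfilled(counts, minimum=200):
    """Return the classes that have fewer than `minimum` images."""
    return {name: n for name, n in counts.items() if n < minimum}


# Hypothetical counts for illustration; in practice use
# counts = images_per_class("path/to/your/train/folder")
counts = {"one": 350, "three": 900, "seven": 120}
print(underfilled(counts))
```

Any class that shows up in the result is a candidate for collecting more images before changing anything else.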