Coral Snake vs Scarlet King Snake

How can I find out which features a neural network primarily learned to use for classification while training on the training set? For example, did the network learn to distinguish the classes by pattern, color, shape, etc.? On a similar note, is there a way to find out on what basis the network classified a particular test image?

I am building a classifier using resnet34 to distinguish scarlet king snakes from coral snakes. The two snakes look alike, but one color pattern, listed below with a snake poem as a mnemonic ;), distinguishes them.

  • Coral Snake --> Red Touches Yellow (Kills a fellow)
  • Scarlet King Snake --> Red Touches Black (Friend for Jack)
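To make the mnemonic concrete, here is a minimal sketch of the rule itself (not of what the network does). It assumes a snake is described as a head-to-tail list of band colors; the function name and the example band lists are hypothetical and only for illustration.

```python
def red_touches_yellow(bands):
    """Return True if any red band sits directly next to a yellow band."""
    # Compare each band with its immediate neighbor.
    for a, b in zip(bands, bands[1:]):
        if {a, b} == {"red", "yellow"}:
            return True
    return False


# Hypothetical band sequences, written out by hand for illustration.
coral_like = ["red", "yellow", "black", "yellow", "red"]
king_like = ["red", "black", "yellow", "black", "red"]

print(red_touches_yellow(coral_like))  # True  -> "kills a fellow"
print(red_touches_yellow(king_like))   # False -> "friend for Jack"
```

The rule depends only on which colors are adjacent, not on which colors are present, which is why I would like to know whether the network picked up on the band order or on something else.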

[Image: snake]

My input image set has 83 images of scarlet_king snakes and 52 images of coral snakes (split with valid_pct=0.2 into training and validation sets). My classifier has an error rate of 7%: it classified two coral_snake images as scarlet_king. Is this due to the imbalance in image counts between the two classes?
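As a sanity check on those numbers, here is a rough sketch of the arithmetic. It assumes the validation set is simply 20% of all 135 images and that the two misclassified coral snakes are the only validation errors; the variable names are my own.

```python
n_scarlet_king = 83
n_coral = 52
valid_pct = 0.2
n_errors = 2  # assumption: the two coral_snake mistakes are the only errors

n_total = n_scarlet_king + n_coral
# Assumed validation set size: 20% of all images.
n_valid = round(n_total * valid_pct)

error_rate = n_errors / n_valid
coral_share = n_coral / n_total

print(f"total images:      {n_total}")
print(f"validation images: {n_valid}")
print(f"error rate:        {error_rate:.1%}")  # about 7.4%
print(f"coral share:       {coral_share:.1%}")
```

With only 27 validation images, a single misclassification moves the error rate by almost 4 percentage points, so the 7% figure rests on very few examples.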