So really you don’t trust the label of A and want to verify it.
I would try an unsupervised learner such as a CNN autoencoder and use a bottleneck layer as the feature vector for each image during inference.
First you will need to train it on A to E categories of images to produce a set of filters that can represent your dataset and then you can run inference on each image to obtain the feature vector.
Store these feature vectors in an approximate nearest neighbour search package like Annoy with the label.
Then given a new image you can run inference on it to extract the feature vector then search the AKNN for the K nearest images and obtain their labels and distances. From here you can rank which label is most likely given K nearest images and distances.
How to deal with new image categories?
For new categories you will want at least 3 images in the category. Run the images from the new category through the network using inference only to extract the feature vectors. Then add the feature vectors to the AKNN. Then for each image in the new category search for the K nearest images. If your algorithm ranks the 2 other images in the category as highest match you should be good to use the current network without fine-tuning. If there is a mismatch then you can fine-tune the autoencoder and recompute the feature vectors for the existing images.
There are probably more complicated things you could do to avoid having to fine-tune one network and recompute feature vectors such as have a seperate autoencoder per category.