Building type classification - How to?

Background:

I’m interested in classifying buildings from images. For instance, I want to determine whether a building is a sports, residential, religious, commercial, hotel or city hall building. It looks simple: I have 6 classes of buildings for a start. I have about 6500 images for training, 500 for validation and 300 for testing.

The Problem:
I’m well aware of the potential performance of DL models (70-100% accuracy). However, I get disappointing results (20% / 40% accuracy) using VGG, VGG + fine-tuning, VGG + image generators, Inception, etc. I don’t think my code is the problem (for cats and dogs I got 96% validation accuracy). I suspect the issue is rather the suitability of ImageNet pre-trained models and features for this task. I used SVM classification and got 90% accuracy.
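
For context, here is a small sketch comparing those accuracies with random guessing. It assumes the six classes are roughly balanced, which is an assumption and not something stated above; the function names are illustrative only.

```python
def chance_accuracy(num_classes):
    """Expected accuracy of uniform random guessing over balanced classes."""
    return 1.0 / num_classes


def lift_over_chance(accuracy, num_classes):
    """How many times better than random guessing a given accuracy is."""
    return accuracy / chance_accuracy(num_classes)


# sport, residential, religious, commercial, hotel, city hall
NUM_CLASSES = 6

# Accuracies mentioned in the post (assumed balanced classes).
for reported in (0.20, 0.40, 0.90):
    lift = lift_over_chance(reported, NUM_CLASSES)
    print(f"{reported:.0%} accuracy is {lift:.1f}x the chance level")
```

Under that assumption, the chance level is about 16.7%, so 20% accuracy is only slightly better than random guessing.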

Sample Training Data

Question:

  • How should I deal with this type of problem? Any suggestions are welcome.
  • Are any reference papers or code available?

PS: I think the problem is close to the Dstl Satellite Imagery Feature Detection type of problem. I can provide some images as well as more details if needed.