Suggested approach for image categorization starting from a 3D model


I'm just trying to find my way into this interesting world.
I'm trying to prototype this idea: start from a 3D model of a new object (a car, a building, etc.) and, once it is actually being built, determine from a photo whether it is being built as planned or you are behind schedule.
To have an example using public data, I was looking at this [image not available]

or this [image not available]

to be compared with this [image not available]

I tried creating different custom categories (such as 30%, 60% and 100%) on a commercial image categorization API and trained the network on the first photo, generating about 20 examples per category by cutting out the different floors to provide relevant examples, but this didn't work. I suppose the problem is that these systems just evaluate the whole image and don't focus on a specific element.
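For reference, the "cutting out the different floors" step can be sketched with NumPy roughly as follows. This is a minimal sketch, assuming the render is already loaded as an array and that the rows where the building starts and ends are known; `simulate_progress` is a hypothetical helper, not part of any categorization API.

```python
import numpy as np

def simulate_progress(image, top_row, bottom_row, fraction, fill=0):
    """Return a copy of `image` where only the lowest `fraction` of the
    building (rows top_row up to, but not including, bottom_row) is kept.
    Rows above the cut are overwritten with `fill` (a stand-in for sky)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    out = image.copy()
    height = bottom_row - top_row
    # A building grows floor by floor, so keep the bottom part and blank the top.
    cut = bottom_row - int(round(height * fraction))
    out[top_row:cut] = fill
    return out

# Toy 10x4 grayscale "render": sky in rows 0-1, building in rows 2-9.
render = np.full((10, 4), 255, dtype=np.uint8)
render[:2] = 0

# One synthetic example per progress category (30%, 60%, 100%).
stages = {f"{round(p * 100)}%": simulate_progress(render, 2, 10, p)
          for p in (0.3, 0.6, 1.0)}

for label, img in stages.items():
    visible = int((img[2:] == 255).all(axis=1).sum())
    print(label, visible, "of 8 building rows visible")
```

With these toy numbers the three stages keep 2, 5 and 8 of the 8 building rows. Real photos would also differ in viewpoint and lighting from the render, which this sketch does not model.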
For the next step, would it be better to (a) provide just one positive category (e.g. 50%) and one negative category (everything else), training with relevant enriched examples to build a bigger training set; (b) cut off the background and train using only the relevant part (the building, in this case); or (c) start defining a custom model using something such as …? Are there pre-defined models that could help?
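Options (a) and (b) could look roughly like this. It is a sketch under stated assumptions: a boolean mask marking the building is assumed to exist (hand-drawn or produced by a segmentation tool), `crop_to_mask` and `augment` are hypothetical helpers, and random flips plus brightness scaling are just two common augmentations, not a setup tuned for construction photos.

```python
import numpy as np

def crop_to_mask(image, mask):
    """Crop `image` to the bounding box of the True pixels in `mask`
    (assumed: a boolean array, same height/width, marking the building)."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        raise ValueError("mask does not mark any pixel")
    return image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

def augment(image, rng, n=5):
    """Return `n` randomly perturbed copies of `image` (random horizontal
    flip plus brightness scaling) to enlarge a small training set."""
    copies = []
    for _ in range(n):
        out = image.astype(np.float32)
        if rng.random() < 0.5:
            out = out[:, ::-1]  # mirror left/right
        out = out * rng.uniform(0.8, 1.2)  # scale brightness by up to +/-20%
        copies.append(np.clip(out, 0, 255).astype(np.uint8))
    return copies

# Toy data: a random 20x30 RGB "photo" with the building in rows 5-14, cols 10-24.
rng = np.random.default_rng(0)
photo = rng.integers(0, 256, size=(20, 30, 3), dtype=np.uint8)
mask = np.zeros((20, 30), dtype=bool)
mask[5:15, 10:25] = True

building = crop_to_mask(photo, mask)
samples = augment(building, rng, n=10)
# Binary setup: every augmented copy gets the single positive label;
# "positive_50" is a hypothetical class name.
labels = ["positive_50"] * len(samples)
print(building.shape, len(samples))
```

Cropping before training means the classifier should see mostly the building rather than sky and surroundings, which targets the concern that these APIs evaluate the whole image.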
I have very little Python knowledge and a scientific background from a distant past.