So I first stuck to your course example, using a fully convolutional network.
My dataset consists of images of racing cars (all the same model, but they can be decorated differently). With 800 pictures across 17 classes and your fully convolutional network, I easily reached 94% classification accuracy. And the localization trick works neatly!
But for now, when there are multiple cars in a picture, it doesn't look for a car in general but for the car of the predicted class. It is... too precise. So I've thought of 3 solutions:
Training a fully convolutional network on a binary task: picture with car(s) vs. picture without a car. With the heatmap trick, that network would act as a car localizer (and could thus handle multiple cars).
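To make the binary "car localizer" idea concrete, here is a minimal sketch of how the heatmap could be thresholded into car regions. It assumes the heatmap is a weighted sum of the last conv layer's feature maps (CAM-style); if the final conv layer already has one filter per class, the weights are just a one-hot vector. The function names, shapes and the 0.5 threshold are made up for illustration, and the features are toy values, not real network outputs.

```python
import numpy as np

def class_activation_map(features, class_weights):
    """Weighted sum of feature maps, rescaled to [0, 1].

    features: (H, W, C) activations of the last conv layer (assumed layout).
    class_weights: (C,) weights linking each channel to the "car" output.
    """
    cam = features @ class_weights      # (H, W)
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam

def car_mask(cam, threshold=0.5):
    """Boolean mask of cells whose activation reaches the (arbitrary) threshold."""
    return cam >= threshold

# Toy example: two strong responses in channel 0 stand in for two cars.
rng = np.random.default_rng(0)
features = rng.random((7, 7, 4)) * 0.1   # low background noise
features[1, 1, 0] = 5.0                  # first "car"
features[5, 4, 0] = 4.0                  # second "car"
weights = np.array([1.0, 0.0, 0.0, 0.0]) # only channel 0 votes for "car"

cam = class_activation_map(features, weights)
mask = car_mask(cam)
print(np.argwhere(mask))                 # grid cells flagged as "car"
```

Because the binary network is not tied to one of the 17 classes, every car in the picture should be able to light up the same map, which is the whole point of this option.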
Maybe I could check the activations of shallower convolutional layers; one of them might already be doing simple car detection...
I actually checked that on a picture with 2 cars, but it doesn't seem to work:
Convo Layer n-1
Convo Layer n-2
Maybe averaging the outputs of the last convolutional layer's filters over all classes could give me something like "this looks like a combination of all the already known cars, although it's not one of them".
I tried this too. It doesn't work all the time, but there are cases where it does:
Here it doesn't find the one behind...
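For reference, a minimal sketch of the averaging idea, assuming one heatmap per class is available as an (H, W, K) array. The blob counting is a plain flood fill added only to show how many separate cars survive a given threshold; the names, shapes and thresholds are hypothetical and the values are toy data.

```python
import numpy as np

def mean_class_heatmap(class_maps):
    """Average the per-class heatmaps (H, W, K) into one map scaled to [0, 1]."""
    avg = class_maps.mean(axis=-1)
    avg = avg - avg.min()
    peak = avg.max()
    return avg / peak if peak > 0 else avg

def count_blobs(mask):
    """Count 4-connected regions of True cells with a simple flood fill."""
    mask = mask.copy()
    h, w = mask.shape
    blobs = 0
    for i in range(h):
        for j in range(w):
            if not mask[i, j]:
                continue
            blobs += 1
            mask[i, j] = False
            stack = [(i, j)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx]:
                        mask[ny, nx] = False
                        stack.append((ny, nx))
    return blobs

# Toy example: 3 classes, two cars, the one behind gives a weaker response.
class_maps = np.zeros((6, 6, 3))
class_maps[1, 1, 0] = 3.0   # front car, resembles class 0
class_maps[1, 2, 0] = 3.0
class_maps[4, 4, 1] = 1.5   # car behind, resembles class 1, weaker

heat = mean_class_heatmap(class_maps)
print(count_blobs(heat >= 0.3))  # permissive threshold
print(count_blobs(heat >= 0.6))  # strict threshold
```

In this toy setup the weaker car only shows up with the permissive threshold, which is the same kind of failure as the missed car behind.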
Supposing the localization works, I'd crop the cars out of that same training data. Do you think some unsupervised classification (clustering) could work at that point, in order to find the car classes "by itself"? Something like PCA + k-means, I don't know.
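A minimal sketch of what "PCA + k-means" could look like on feature vectors, written with plain NumPy so it stays self-contained. Everything here is an assumption for illustration: the features are random toy blobs rather than real conv features, and `pca_project` and `kmeans` are hypothetical helper names (a library implementation would normally be preferable).

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project rows of X onto the top principal directions (via SVD)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, sorted by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, mean, components

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy "features": two well-separated groups of 16-dimensional vectors.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 16)),
    rng.normal(5.0, 0.1, size=(20, 16)),
])

Z, mean, components = pca_project(X, n_components=2)
labels, centroids = kmeans(Z, k=2)
print(labels)
```

The obvious weak point is that k has to be chosen up front, which is awkward when the goal is to discover classes that were never labeled.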
I actually tried some dimension reduction on the final convolution features of full pictures (I haven't tried with crops yet) and it works pretty well... The question is how to handle a new point. If it's right in a cluster, assign the existing class; if it's a little too far, say it's a new class... Hmmm. I expect it would start to make sense once I crop the pictures and train again. The clusters might become stronger and stronger.
t-SNE (intended for visualization)
LDA = Linear Discriminant Analysis (supervised dimension reduction)
PCA (most popular but rarely the best in my experience)
Now we can see that for a new image (with a new class, i.e. a car that has never been seen before), projecting it onto our LDA space gives:
Now, supposing the clusters are stronger, maybe there will be ways to confidently assume that a point which is not right in a cluster belongs to a new class. And so on, and from there it becomes unsupervised.
In the end, that would make it possible to automatically "tag" a new picture from a new class, switching from supervised to unsupervised.
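One simple way to express the "in a cluster vs. too far" rule is a distance threshold per cluster. The sketch below is only an illustration under assumptions: points are already projected (e.g. into the LDA space), each cluster's radius is taken as the largest distance of a training point to its centroid, and the `slack` factor of 1.5 is arbitrary. The helper names and the toy coordinates are hypothetical.

```python
import numpy as np

def fit_clusters(Z, labels):
    """Per class: centroid and radius (max training distance to the centroid)."""
    stats = {}
    for c in np.unique(labels):
        pts = Z[labels == c]
        centroid = pts.mean(axis=0)
        radius = np.linalg.norm(pts - centroid, axis=1).max()
        stats[int(c)] = (centroid, radius)
    return stats

def assign_or_flag(z, stats, slack=1.5):
    """Return the nearest class, or None if z lies beyond slack * radius of it."""
    best = min(stats, key=lambda c: np.linalg.norm(z - stats[c][0]))
    centroid, radius = stats[best]
    if np.linalg.norm(z - centroid) <= slack * radius:
        return best
    return None  # candidate new class

# Toy projected points for two known classes.
Z = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
              [5.0, 5.0], [5.2, 5.0], [5.0, 5.2]])
labels = np.array([0, 0, 0, 1, 1, 1])
stats = fit_clusters(Z, labels)

known = assign_or_flag(np.array([0.1, 0.1]), stats)    # inside cluster 0
unknown = assign_or_flag(np.array([2.5, 2.5]), stats)  # between both clusters
print(known, unknown)
```

How well this works depends entirely on how tight the clusters are, so the threshold would need tuning on held-out pictures of known classes before trusting any "new class" flag.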
Thanks for the feedback