I have had some success with TensorFlow’s Object Detection API, and I’m getting some pretty good results training the SSD Inception V2 architecture on our own dataset, with no fine-tuning/checkpoints.
I have so far managed to make some modifications to the API and the respective pipeline/config (decoder, exporter, feature extractor and so on) to accept our specific data (single-channel, very small images, <= 30x90).
However, I’m now struggling to work out how (or whether it’s even possible) to add metadata inputs to the API to improve accuracy. Due to the huge variety of data within a particular class, and sometimes the similarity between the classes themselves in our dataset, there is some ambiguity. Adding metadata (for example specific IDs/tags) should give the network extra context and help reduce this ambiguity, improving overall performance.
To that end, I have some specific questions/ideas:
Is it possible to modify the current input, and add a second one-hot encoded input tensor alongside image_tensor? If so, where should I begin to look?
If not, would it perhaps be possible to modify the image_tensor input itself, and merge the one-hot encoded data, perhaps as a separate pseudo channel?
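To make the pseudo-channel idea concrete, here is a minimal sketch (in NumPy rather than TensorFlow, purely to illustrate the shapes involved; the function name and the 4-way ID are my own hypothetical choices, not anything from the API): each element of the one-hot vector is broadcast into a constant H x W plane and concatenated with the image along the channel axis.

```python
import numpy as np

def merge_onehot_as_channels(image, onehot):
    """Broadcast a one-hot metadata vector into constant pseudo-channels
    and concatenate them with the image along the channel axis.

    image:  (H, W, C) array
    onehot: (K,) array
    returns (H, W, C + K) array
    """
    h, w, _ = image.shape
    # Each one-hot element becomes a constant H x W plane.
    meta_planes = np.broadcast_to(onehot, (h, w, onehot.shape[0]))
    return np.concatenate([image, meta_planes], axis=-1)

# Hypothetical example: a 30x90 single-channel image with a 4-way ID/tag.
img = np.zeros((30, 90, 1), dtype=np.float32)
tag = np.array([0, 1, 0, 0], dtype=np.float32)
merged = merge_onehot_as_channels(img, tag)
print(merged.shape)  # (30, 90, 5)
```

In TensorFlow the same thing would be `tf.tile`/`tf.broadcast_to` followed by `tf.concat` on the channel axis, but note this also changes the expected input depth of the first convolution, which is presumably one of the places the feature extractor would need modifying.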
Would it perhaps be better to build a standalone network from the ground up, and if so, which parts would I need to “extract” from the API code in order to replicate the SSD Inception V2 model?
The input tensor modification would be required for both training and inference, and no output tensor modifications/additions are required.
In the interest of openness, I have also asked this question on SO: https://stackoverflow.com/questions/47908222/tensorflow-object-detection-add-metadata-input-tensor-e-g-one-hot-encoded-id
Thank you so much for any help/pointers in the right direction.