Hi all,
I have been implementing an algorithm for object detection and tracking.
I wanted to enable object re-identification, so I have read some papers. While watching the lecture in which Jeremy explained the principle behind style transfer, I started to wonder whether an object could be uniquely identified just by observing the outputs of a certain group of kernels or dense layers, without additional mechanisms such as path prediction, explicit pose estimation, or measuring the distance between objects in consecutive frames.
For example, I watched a video in which a girl walked around a store, picked out some shoes, and then sat down to put them on. At some point while she was sitting, she was classified as a dog (probably because, from that camera angle, her long hair covered a major part of her body). When she was classified as a person again, the simple people-counting algorithm I currently use incremented its counter. Intuitively the problem seems trivial, yet the papers I have read so far suggest adding complex extra mechanisms. Perhaps some values before the final layer contain information that is not much affected by pose in this case (e.g. hair color, skin color, dress type) and could serve to re-identify the object across a frame sequence or across multiple cameras.
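To make the idea more concrete, here is a rough sketch of what I have in mind, not a tested solution. It assumes that each detection already comes with a feature vector taken from some layer before the final one; how to extract it is exactly what I am asking about. The names `cosine_similarity` and `IdentityGallery` and the 0.8 threshold are made up for illustration, and the vectors below are toy numbers, not real activations.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


class IdentityGallery:
    """Hypothetical store of one feature vector per identity seen so far."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed value, would need tuning
        self.features = []          # index in this list is the identity ID

    def identify(self, feature):
        """Return the ID of the closest stored identity, or register a new one."""
        best_id, best_sim = None, -1.0
        for identity, stored in enumerate(self.features):
            sim = cosine_similarity(feature, stored)
            if sim > best_sim:
                best_id, best_sim = identity, sim
        if best_id is not None and best_sim >= self.threshold:
            return best_id
        self.features.append(list(feature))
        return len(self.features) - 1

    def count(self):
        """Number of distinct identities, i.e. the people counter."""
        return len(self.features)


# Toy vectors standing in for layer activations (invented numbers).
gallery = IdentityGallery(threshold=0.8)
standing = [0.9, 0.1, 0.4, 0.2]   # person when first detected
sitting = [0.8, 0.2, 0.5, 0.2]    # same person, different pose
stranger = [0.1, 0.9, 0.1, 0.7]   # a different person

ids = [gallery.identify(v) for v in (standing, sitting, stranger)]
print(ids, gallery.count())
```

With features like these, the person would keep the same ID after being lost and detected again, so the counter would only grow when a detection matches nobody in the gallery. Whether real activations are stable enough across poses for a fixed threshold to work is the open question.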
Has anyone implemented anything relevant to object identification based on the outputs of a particular layer, or come across any interesting articles on the topic?
Thank you.