Are there one-shot object detectors and recognizers trained on ImageNet?

I have mostly seen one-shot detection on the Omniglot dataset, but not on ImageNet. FaceNet works really well for face recognition, so why is there no pretrained network that encodes an image into a vector we can use to recognize the object later? We can use the last-layer vector of VGG, Inception, or ResNet to do something similar, but I haven't found any of these networks trained with a triplet loss to produce a vector specifically for encoding the object. And why can't we have a YOLO- or SSD-like network that detects all objects and outputs each bounding box together with its vector in one pass? We would still need another step to recognize the actual object, but a network like this would let us detect any object, right?
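To be concrete about what I mean by a triplet loss, here is a minimal sketch in plain Python. It assumes the FaceNet-style formulation with squared L2 distances; the function names and the margin value of 0.2 are my own placeholders, not taken from any particular library.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on three embedding vectors.

    The loss is zero once the anchor is closer to the positive (same object)
    than to the negative (different object) by at least `margin`.
    The margin value is an assumed placeholder.
    """
    return max(
        0.0,
        squared_l2(anchor, positive) - squared_l2(anchor, negative) + margin,
    )
```

In real training the three vectors would come from the same network applied to three images, and the loss would be averaged over many triplets.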

I think this would be useful because we would have a general object encoder: we could use a couple of images to index an object and then recognize it later using a distance measure. FaceNet working so well also suggests it is feasible. But whenever I think of something like this, there is usually a good reason why it isn't out there, and that reason is what I am asking for. Is it too difficult to train, too noisy, or too broad a domain compared with faces and symbols? Or did I just not search well enough to find them?
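The index-then-recognize step I have in mind could look roughly like the sketch below. It assumes the embeddings already exist as non-zero lists of floats (produced by whatever encoder network); `build_index`, `recognize`, and the distance threshold of 0.8 are hypothetical names and values I made up for illustration.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def build_index(examples):
    """Map each label to one prototype: the normalized mean of its embeddings.

    `examples` is a dict of label -> list of embedding vectors
    (a couple of images per object is the intended use).
    """
    index = {}
    for label, vectors in examples.items():
        normalized = [l2_normalize(v) for v in vectors]
        dim = len(normalized[0])
        mean = [sum(v[i] for v in normalized) / len(normalized) for i in range(dim)]
        index[label] = l2_normalize(mean)
    return index


def recognize(index, query, threshold=0.8):
    """Return the label of the nearest prototype, or None if nothing is close.

    The threshold is an assumed value; in practice it would be tuned on
    held-out data.
    """
    q = l2_normalize(query)
    best_label, best_dist = None, float("inf")
    for label, prototype in index.items():
        dist = math.dist(q, prototype)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= threshold else None
```

For example, after `index = build_index({"mug": [[1.0, 0.0], [0.9, 0.1]], "shoe": [[0.0, 1.0]]})`, calling `recognize(index, [0.8, 0.2])` would pick the nearest prototype, and a query far from every prototype would come back as `None`, meaning an unknown object.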

Thank you