I tried object localization on my own data, where each image contains a single object and there are no class labels. I only need to predict the object's bounding box, without classifying it, because the objects are very different from one another. Most of them are grocery items on a white background.

I tried adding a simple regression head on top of ResNet and MobileNet, but the results aren't good. I also tried SSD, but it isn't working because the objects in my training data have very different scales and aspect ratios.

One more thing I haven't been able to tackle: some of my test images contain two adjacent objects, such as a pair of shoes, whereas my training data only has a single shoe per image.

Please help in this regard.
This is an example of a training image:
This is a test image:
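
To make the single-box regression setup described above concrete, here is a minimal sketch of how the four regression targets and an IoU check on the predictions could be computed. It assumes boxes are stored as (x_min, y_min, x_max, y_max) in pixels and that the head regresses a normalized (cx, cy, w, h) target. The function names and the encoding are illustrative assumptions, not necessarily what my pipeline uses.

```python
# Sketch only: box format and target encoding are assumptions for illustration.

def encode_box(box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to normalized (cx, cy, w, h)."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return (cx, cy, w, h)


def decode_box(target, img_w, img_h):
    """Convert normalized (cx, cy, w, h) back to a pixel box (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = target
    x_min = (cx - w / 2) * img_w
    y_min = (cy - h / 2) * img_h
    x_max = (cx + w / 2) * img_w
    y_max = (cy + h / 2) * img_h
    return (x_min, y_min, x_max, y_max)


def iou(box_a, box_b):
    """Intersection over union of two pixel boxes; 0.0 when they do not overlap."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Normalizing by image size keeps the targets in [0, 1] regardless of resolution, and IoU between the decoded prediction and the ground-truth box gives a number to report instead of just "the results aren't good".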