Object detection - Understand Learner.predict return values

Marctrix · May 2, 2022, 6:00pm

Hello everyone,

I ran inference with a saved, trained object detection model. I used Learner.predict() to predict the bounding box coordinates for each image of another dataset, but I don’t understand the returned values.

For example, Learner.predict() can return:

predict[0] = TensorBBox([[  7.9246,  13.2136,  74.8705, 100.0172], [ 78.8447,   8.7797, 146.1627,  99.2884]])
predict[1] = TensorBase([-0.9009, -0.7798, -0.0641,  0.6670, -0.0144, -0.8537,  0.8270,  0.6548])
predict[2] = TensorBase([-0.9009, -0.7798, -0.0641,  0.6670, -0.0144, -0.8537,  0.8270,  0.6548])

I thought the accurate predicted coordinates were the ones in the TensorBBox, but when I plotted the predictions I noticed that some coordinates fell outside the bounds of the corresponding image.
Then I tried plotting the bboxes using the coordinates from a returned TensorBase (predict[2]), and the bboxes were displayed correctly. To do that, I first had to apply ((predict[2]+1)/2).numpy() to make all coordinates positive.
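
To make the rescaling concrete, here is a minimal sketch in plain Python using the values printed above. It assumes the eight TensorBase values are two boxes in (x1, y1, x2, y2) order scaled to [-1, 1]. The width and height (160 and 120) are a guess inferred from the numbers, not something read from the model, and `to_pixels` is a hypothetical helper, not a library function.

```python
# Values copied from the output above.
decoded_boxes = [
    [7.9246, 13.2136, 74.8705, 100.0172],
    [78.8447, 8.7797, 146.1627, 99.2884],
]
raw = [-0.9009, -0.7798, -0.0641, 0.6670, -0.0144, -0.8537, 0.8270, 0.6548]

# Assumption: image size inferred from the numbers, not read from the model.
WIDTH, HEIGHT = 160, 120


def to_pixels(flat, width, height):
    """Map flat [-1, 1] coordinates to (x1, y1, x2, y2) boxes in pixels."""
    sizes = (width, height, width, height)
    boxes = []
    for start in range(0, len(flat), 4):
        coords = flat[start:start + 4]
        # (c + 1) / 2 maps [-1, 1] to [0, 1]; multiplying by the size gives pixels.
        boxes.append([(c + 1) / 2 * s for c, s in zip(coords, sizes)])
    return boxes


rescaled = to_pixels(raw, WIDTH, HEIGHT)

# Largest absolute gap between the rescaled values and the TensorBBox values.
max_diff = max(
    abs(a - b)
    for box_a, box_b in zip(rescaled, decoded_boxes)
    for a, b in zip(box_a, box_b)
)
print(rescaled)
print(f"largest difference from the TensorBBox values: {max_diff:.4f}")
```

Under those assumptions the rescaled values agree with the TensorBBox values to within rounding. This suggests that the TensorBBox may be expressed in pixels of a 160x120 image rather than in pixels of the original image, which could explain boxes falling outside a smaller image.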

My questions are:

What does each value returned by Learner.predict represent? (I read the documentation of Learner.predict, but for object detection I don’t understand the returned values.) In classification, I know that the first returned value is the predicted label, the second is the label id, and the third is the probabilities of each label.
Why do both returned TensorBase values (predict[1] and predict[2]) have the same values?

Thank you.