Object detection - Understand Learner.predict return values

Hello everyone,

I ran inference with a saved, trained object detection model. I used Learner.predict() to predict the bounding box coordinates for each image of another dataset, but I don't understand the returned values.

For example, Learner.predict() can return:

predict[0] = TensorBBox([[  7.9246,  13.2136,  74.8705, 100.0172], [ 78.8447,   8.7797, 146.1627,  99.2884]])
predict[1] = TensorBase([-0.9009, -0.7798, -0.0641,  0.6670, -0.0144, -0.8537,  0.8270,  0.6548])
predict[2] = TensorBase([-0.9009, -0.7798, -0.0641,  0.6670, -0.0144, -0.8537,  0.8270,  0.6548])
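The TensorBBox holds two boxes of four numbers each, while each TensorBase holds eight numbers, so one plausible reading is that the flat tensor contains the same two boxes laid end to end. A minimal NumPy sketch of that reshaping, assuming four values per box in (x_min, y_min, x_max, y_max) order like the TensorBBox rows (the ordering is an assumption, not something stated in the output itself):

```python
import numpy as np

# Values copied from predict[1] above (predict[2] prints the same numbers)
flat_pred = np.array([-0.9009, -0.7798, -0.0641, 0.6670,
                      -0.0144, -0.8537, 0.8270, 0.6548])

# Assumption: every 4 consecutive values are one box (x_min, y_min, x_max, y_max)
boxes = flat_pred.reshape(-1, 4)
print(boxes.shape)  # (2, 4), the same layout as the TensorBBox in predict[0]

# All values lie in [-1, 1], unlike the pixel-sized numbers in predict[0]
print(bool(np.all(np.abs(boxes) <= 1)))  # True
```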

I assumed the predicted coordinates were the ones in the TensorBBox, but when I plotted the results I noticed that some coordinates fell outside the bounds of the corresponding image.
Then I tried plotting the bboxes using the coordinates from the first returned TensorBase (predict[1]), and the bboxes were displayed correctly. To get that result, I first had to compute ((predict[1]+1)/2).numpy(), which maps the coordinates from [-1, 1] to [0, 1] so that they are all positive.
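The (x + 1) / 2 step maps a value from [-1, 1] onto [0, 1]; multiplying by the image width (for x) or height (for y) then gives pixel coordinates. On the numbers posted above, doing this with a width of 160 and a height of 120 reproduces predict[0] to within rounding, which would be consistent with the TensorBBox being expressed in the pixel space of a 160 × 120 image (possibly the size the images are resized to before entering the model) rather than in the original image's size. The 160 × 120 size is inferred from these values only, and `to_pixels`, `out_of_bounds`, and the 100 × 100 image are hypothetical, for illustration:

```python
import numpy as np

# Values copied from the output above
flat_pred = np.array([-0.9009, -0.7798, -0.0641, 0.6670,
                      -0.0144, -0.8537, 0.8270, 0.6548])       # predict[1]
bbox_pred = np.array([[7.9246, 13.2136, 74.8705, 100.0172],
                      [78.8447, 8.7797, 146.1627, 99.2884]])   # predict[0]

def to_pixels(flat, width, height):
    """Hypothetical helper: map [-1, 1] box coordinates to pixel coordinates.

    Assumes 4 values per box in (x_min, y_min, x_max, y_max) order.
    """
    unit = (np.asarray(flat, dtype=float) + 1) / 2              # [-1, 1] -> [0, 1]
    scale = np.array([width, height, width, height], dtype=float)
    return unit.reshape(-1, 4) * scale                           # [0, 1] -> pixels

def out_of_bounds(boxes, width, height):
    """Hypothetical helper: True for each box with a coordinate outside the image."""
    xs, ys = boxes[:, [0, 2]], boxes[:, [1, 3]]
    return ((xs < 0) | (xs > width)).any(axis=1) | ((ys < 0) | (ys > height)).any(axis=1)

# Width 160 / height 120 are inferred from the posted numbers, not read from the model
print(np.allclose(to_pixels(flat_pred, 160, 120), bbox_pred, atol=0.01))  # True

# The same pixel boxes fit a 160 x 120 image but not a hypothetical 100 x 100 one
print(out_of_bounds(bbox_pred, 160, 120))  # [False False]
print(out_of_bounds(bbox_pred, 100, 100))  # [ True  True]
```

If this reading holds, plotting predict[0] on an image of a different size would place some boxes partly outside it, while the normalized values from predict[1] can be rescaled to whatever size is being displayed.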

My questions are:

  • What does each value returned by Learner.predict represent? (I read the Learner.predict documentation, but I don't understand the returned values in the object detection case.) In classification, I know that the first returned value is the predicted label, the second is the label index, and the third is the probability of each label.

  • Why do both returned TensorBase tensors (predict[1] and predict[2]) contain the same values?

Thank you.