Hello everyone,
I ran inference with a saved, trained object detection model, using Learner.predict() to predict the bounding box coordinates for each image in another dataset. I don't understand the returned values.
For example, Learner.predict() can return:
predict[0] = TensorBBox([[ 7.9246, 13.2136, 74.8705, 100.0172], [ 78.8447, 8.7797, 146.1627, 99.2884]])
predict[1] = TensorBase([-0.9009, -0.7798, -0.0641, 0.6670, -0.0144, -0.8537, 0.8270, 0.6548])
predict[2] = TensorBase([-0.9009, -0.7798, -0.0641, 0.6670, -0.0144, -0.8537, 0.8270, 0.6548])
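A quick arithmetic check on the numbers above hints at how these outputs relate. The sketch below assumes the 8 values in each TensorBase are two boxes in [x1, y1, x2, y2] order scaled to [-1, 1], and that the image fed to the model was 160 pixels wide and 120 pixels high. Both assumptions are inferred from the numbers, not stated anywhere, and the helper names are hypothetical. Under those assumptions, mapping each coordinate with (v + 1) / 2 * size reproduces the TensorBBox values to within the 4-decimal rounding.

```python
# Values copied from the post (displayed to 4 decimals).
raw = [-0.9009, -0.7798, -0.0641, 0.6670, -0.0144, -0.8537, 0.8270, 0.6548]
tensor_bbox = [[7.9246, 13.2136, 74.8705, 100.0172],
               [78.8447, 8.7797, 146.1627, 99.2884]]

# Assumption: image size seen by the model (inferred, not confirmed).
WIDTH, HEIGHT = 160, 120


def to_boxes(flat):
    """Group a flat list of coordinates into [x1, y1, x2, y2] boxes."""
    return [flat[i:i + 4] for i in range(0, len(flat), 4)]


def unscale_box(box, width, height):
    """Map one box from [-1, 1] coordinates to pixel coordinates."""
    x1, y1, x2, y2 = box
    return [(x1 + 1) / 2 * width, (y1 + 1) / 2 * height,
            (x2 + 1) / 2 * width, (y2 + 1) / 2 * height]


decoded = [unscale_box(b, WIDTH, HEIGHT) for b in to_boxes(raw)]
for mine, theirs in zip(decoded, tensor_bbox):
    # Small differences are expected from the 4-decimal rounding of raw.
    print([round(m - t, 3) for m, t in zip(mine, theirs)])
```

If the assumption holds, the TensorBBox coordinates are expressed relative to a 160x120 image, so drawing them on an image of a different size would place them incorrectly.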
I assumed the TensorBBox held the final predicted coordinates, but when I plotted the predictions I noticed that some coordinates fell outside the bounds of the corresponding image.
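One way to see which predicted boxes fall outside an image is to compare each coordinate with the image width and height. This is a minimal sketch assuming [x1, y1, x2, y2] pixel boxes; `out_of_bounds`, `clip_box`, and the 120x100 image size in the example are hypothetical, chosen only for illustration.

```python
def out_of_bounds(boxes, width, height):
    """Return the indices of boxes with any coordinate outside the image."""
    bad = []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        inside = (0 <= x1 <= width and 0 <= x2 <= width
                  and 0 <= y1 <= height and 0 <= y2 <= height)
        if not inside:
            bad.append(i)
    return bad


def clip_box(box, width, height):
    """Clamp each coordinate of one box to the image borders."""
    x1, y1, x2, y2 = box
    return [min(max(x1, 0), width), min(max(y1, 0), height),
            min(max(x2, 0), width), min(max(y2, 0), height)]


# TensorBBox values from the post; the image size is hypothetical.
boxes = [[7.9246, 13.2136, 74.8705, 100.0172],
         [78.8447, 8.7797, 146.1627, 99.2884]]
print(out_of_bounds(boxes, 120, 100))
print([clip_box(b, 120, 100) for b in boxes])
```

Clipping only hides out-of-range values; it does not explain where they come from.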
I then plotted bboxes using the coordinates from the TensorBase in predict[2], and these were displayed correctly. To get them to display correctly, I first had to compute ((predict[2]+1)/2).numpy() to make all coordinates positive (this maps values in [-1, 1] to [0, 1]).
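The ((predict[2]+1)/2) step turns each coordinate in [-1, 1] into a fraction of the image size in [0, 1]; multiplying those fractions by an image's width and height then gives pixel boxes for that image. Here is a minimal NumPy sketch, assuming the flat tensor holds [x1, y1, x2, y2] boxes back to back; `to_pixel_boxes` and the 640x480 size are hypothetical.

```python
import numpy as np

# Flat output copied from the post: assumed to be 2 boxes x 4 coordinates.
raw = np.array([-0.9009, -0.7798, -0.0641, 0.6670,
                -0.0144, -0.8537, 0.8270, 0.6548])


def to_pixel_boxes(flat, width, height):
    """Convert [-1, 1] coordinates to pixel boxes of shape (n_boxes, 4)."""
    rel = (np.asarray(flat).reshape(-1, 4) + 1) / 2  # fractions in [0, 1]
    scale = np.array([width, height, width, height])  # x1, y1, x2, y2
    return rel * scale


# Hypothetical original image size (width, height).
print(to_pixel_boxes(raw, 640, 480))
```

Keeping the width/height order explicit in `scale` matters because x coordinates scale with the width and y coordinates with the height.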
My questions are:
- What does each value returned by Learner.predict represent? I read the documentation of Learner.predict, but I don't understand what the returned values mean for object detection. In classification, I know that the first returned value is the predicted label, the second is the label id, and the third is the probability of each label.
- Why do both returned TensorBase values (predict[1] and predict[2]) contain the same numbers?
Thank you.