How to interpret inference results (Lesson 3 Head Pose)

#1

Hi! I’m unsure how to interpret the inference results of learn.predict(img) for ImagePoints problems such as the Head Pose problem of Lesson 3. My goal is to collect the predicted coordinates and use them for further processing later. What confuses me is the direct output of learn.predict(img), which doesn’t make sense to me.

When printed out, learn.predict(img) outputs a tuple where the first item is the original image shape (120x160 pixels), while the 2nd and 3rd items seem to hold the same values:

(ImagePoints (120, 160),
tensor([[-1.2336e-03, -2.0357e+00]]),
tensor([-1.2336e-03, -2.0357e+00]))

The predicted coordinates don’t seem to be in the output, but the correct point is still displayed with img.show(y=learn.predict(img)[0]), just like in the Regression example of https://docs.fast.ai/tutorial.inference.html. This is what I don’t get. Even though img.show() is apparently given only the original image shape as y=ImagePoints (120, 160), it still shows the predicted coordinate correctly on the image. What am I missing here? How can I find and collect the predicted point or coordinates when using learn.predict(img)? Any hints to put me on the right track are highly appreciated!

Thanks a lot for the help, folks!


#2

After going through a bunch of different predictions, it looks like the coordinates are actually in the 2nd and 3rd items of the output, given relative to the center of the image. For example, tensor([[-0.5004, -0.1986]]) seems to mean:
y = -0.5004, i.e. 50.04% of the way from the image center toward the top edge
x = -0.1986, i.e. 19.86% of the way from the image center toward the left edge

What I still don’t understand is how the img.show() method can display the coordinates correctly when it is only given the image shape and not the coordinates.


(Zachary Mueller) #3

My guess is that it’s because you can mathematically convert those percentages into coordinates based on the image size.
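A minimal sketch of that conversion, in case it helps. It assumes the predicted values are (y, x) pairs scaled to [-1, 1], with -1 at the top/left edge, 0 at the center and 1 at the bottom/right edge. The helper name to_pixels is made up for illustration and is not part of fastai.

```python
def to_pixels(point, size):
    """Map a (y, x) point scaled to [-1, 1] onto pixel coordinates.

    Assumption: -1 is the top/left edge, 0 the center and +1 the
    bottom/right edge of an image with size = (height, width).
    """
    y, x = point
    height, width = size
    row = (y + 1) * height / 2  # 0 at the top edge, height at the bottom
    col = (x + 1) * width / 2   # 0 at the left edge, width at the right
    return row, col


# The center of a 120x160 image maps to the middle pixel.
print(to_pixels((0.0, 0.0), (120, 160)))  # (60.0, 80.0)

# The example prediction from post #2.
print(to_pixels((-0.5004, -0.1986), (120, 160)))
```

Values outside [-1, 1] would land outside the image, so it may be worth checking for that before using the result.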


#4

img.show(y=learn.predict(img)[0]) is never given the percentages (items [1] and [2]), only the size (item [0]). That’s what I’m baffled about. The percentages are the 2nd and 3rd items of the predict method’s output, but the show method is only given the 1st item. Or else I’m just missing something.


(Zachary Mueller) #5

https://docs.fast.ai/vision.image.html#ImagePoints

See ImagePoints in the docs. It looks like the object already carries the points along with the image size, and show() overlays those points onto the image.
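To illustrate why the printed output can be misleading, here is a small sketch with a made-up stand-in class (PointsLike is hypothetical, not fastai code). Its repr prints only the class name and size, while the points are still stored on the object. If fastai v1 behaves the same way, the coordinates should be reachable through learn.predict(img)[0].data, but that is worth verifying against the docs linked above.

```python
class PointsLike:
    """Hypothetical stand-in for an object like ImagePoints (not fastai code)."""

    def __init__(self, size, points):
        self.size = tuple(size)                  # image size as (height, width)
        self.data = [tuple(p) for p in points]   # the stored (y, x) points

    def __repr__(self):
        # Only the class name and size are printed, not the points.
        return f"{type(self).__name__} {self.size}"


pred = PointsLike((120, 160), [(-0.5004, -0.1986)])
print(pred)       # the repr mentions only the size
print(pred.data)  # the points are still available on the object
```

So printing the first item of the tuple shows only the size, even though whatever is needed to draw the point travels with the object.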
