Share your work here ✅

You can use your Google Drive to store the dataset and then access it from your Colab notebook using the built-in API:

from google.colab import drive
drive.mount('/content/drive')
Paste this in a cell and run it. It will ask you for an authentication code, which you can get by clicking the link provided there. Once mounted, you can use it like a local drive.

1 Like

I recently participated in “the first image-based structural damage recognition competition, namely PEER Hub ImageNet (PHI) Challenge” organised by the Pacific Earthquake Engineering Research Center. There were 8 detection tasks such as damage level and material type. This can aid disaster relief efforts through rapid classification.

It was open not only to earthquake/engineering research teams from academia and industry, but anyone who wanted to compete, so I joined from the geologically safe environs of my kitchen table in London and with no knowledge of anything to do with earthquakes.

I used fastai and came 1st in 4 of the 8 tasks, and 2nd overall, just pipped to 1st by a team of researchers from Nanyang Technological University, Microsoft Research, the Shenzhen Institute of Technology, and UC Berkeley.

To me this challenge underlines that the power of deep learning is that it democratises finding solutions to problems. You don't need to be an expert in a field to add value if you have tools like fastai. The more open datasets become, the better.

https://apps.peer.berkeley.edu/phichallenge/winner/

8 Likes

Truly amazed at how easy fastai makes working on image recognition. With just 80 images, the model recognizes lions and tigers without error.

State-of-the-Art Results
Jeremy continually says that you can get world-class results in a few lines of code. I was admittedly skeptical, but after working through the first couple of lessons and applying the example code to a dataset of North American birds (NABirds), I was able to improve upon the current state-of-the-art accuracy (89.5% versus 87.9%). I have no illusions that this would stand if people with real experience worked on it, and people probably have better unpublished results, but it is still a huge confidence boost and I am excited to continue the journey. Take a look at my results on GitHub.
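For anyone wondering what those few lines look like, here is a rough sketch of the lesson-style workflow I mean (the paths and hyperparameters are illustrative rather than my exact settings, and it assumes the images are arranged one folder per species):

from fastai.vision import *

path = Path('data/nabirds')  # assumed layout: one folder per species
data = ImageDataBunch.from_folder(path, valid_pct=0.2, size=224,
                                  ds_tfms=get_transforms())
learn = cnn_learner(data, models.resnet50, metrics=accuracy)
learn.lr_find()                                     # pick a learning rate from the plot
learn.fit_one_cycle(5)                              # train the new head
learn.unfreeze()
learn.fit_one_cycle(5, max_lr=slice(1e-5, 1e-3))    # fine-tune the whole network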


5 Likes

Hello everyone,

I wrote a routine to classify the most popular landmarks in Istanbul.

Landmark Classification with Convolutional Neural Networks

I downloaded publicly available Instagram photos by hashtag with a script using the instalooter library.
I manually removed images unsuitable for training.
The dataset contains over 1500 images with 5 different labels (Maiden's Tower, Galata Tower, Hagia Sophia, Ortakoy Mosque, Valens Aqueduct).
I imported a resnet50 model with ImageNet pre-trained weights.
I trained the model for 10 + 5 (fine-tuning) epochs.
The final losses are shown below.

train_loss valid_loss error_rate
0.023614 0.072020 0.023605
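In fastai terms, the steps above correspond roughly to the following sketch (the folder layout and learning rates are assumptions for illustration, not copied from my notebook):

from fastai.vision import *

path = Path('data/istanbul_landmarks')  # assumed: one folder per landmark
data = ImageDataBunch.from_folder(path, valid_pct=0.2, size=224,
                                  ds_tfms=get_transforms()).normalize(imagenet_stats)

learn = cnn_learner(data, models.resnet50, metrics=error_rate)
learn.fit_one_cycle(10)                             # 10 epochs with the backbone frozen
learn.unfreeze()
learn.fit_one_cycle(5, max_lr=slice(1e-5, 1e-4))    # 5 fine-tuning epochs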
1 Like

Hi everyone,

I've noticed that in the Lesson 3 image regression task (head coordinates), a slightly more complex procedure is required to convert the 3D head position to screen coordinates. More specifically, besides the intrinsic camera matrix multiplication (which is done in the original solution), there are also rotation and translation matrices that define the RGB camera's position relative to the depth camera on the Kinect (the ground-truth head positions are given in depth-camera coordinates).

Here's the relevant part of the notebook, which uses matrix multiplication for the 3D-to-2D coordinate conversion:

def convert_biwi(coords, cal):
    # Project a 3D point into 2D pixel coordinates using the combined calibration matrix
    pt = cal @ np.append(coords, 1)

    return tensor([pt[1]/pt[2], pt[0]/pt[2]])

def get_ctr(f):
    # Ground-truth head centre, given in depth-camera coordinates
    ctr = np.genfromtxt(img2txt_name(f), skip_header=3)

    fcal = img2cal_name(f)

    # Intrinsic matrix of the RGB camera
    cal_i = np.genfromtxt(fcal, skip_footer=6)
    # 3x4 projection matrix that drops the homogeneous row
    cal_p = np.eye(3, 4)
    # Rotation of the RGB camera relative to the depth camera, padded to 4x4
    cal_rot = np.genfromtxt(fcal, skip_header=5, skip_footer=2)
    cal_rot = np.vstack([np.c_[cal_rot, np.array([0, 0, 0])], [0, 0, 0, 1]])
    # Translation of the RGB camera relative to the depth camera, as a 4x4 matrix
    cal_t_vec = np.genfromtxt(fcal, skip_header=9, skip_footer=1)
    cal_t = np.identity(4)
    cal_t[0, 3] = cal_t_vec[0]
    cal_t[1, 3] = cal_t_vec[1]
    cal_t[2, 3] = cal_t_vec[2]
    # Full transform: intrinsics @ projection @ rotation @ translation
    cal = cal_i @ cal_p @ cal_rot @ cal_t

    return convert_biwi(ctr, cal)

(What I didn't get is why I had to swap the x and y coordinates in the tensor([pt[1]/pt[2], pt[0]/pt[2]]) expression. Any advice?)

With that change, the validation error is more than 2x lower than before: 0.000971.

And after training it a bit more with half the original learning rate, the validation error decreased by another 10x: 0.000100!

The results seem to be insanely accurate:

Wow, that feels like magic.

5 Likes

That's great. And as for the monkey part, you're not alone.

Brazilian jiu-jitsu or judo? I thought it would be interesting to try a classification challenge that most humans would find very difficult. Practitioners of both sports/martial arts wear similar clothing (the gi) and are usually grappling in the photos. My hope was that the classifier might pick up on some very subtle differences: for example, judo practitioners spend more time standing and often throw from an upright position, whereas BJJ takes place in larger part on the ground. Please see the notebook here. I'd love to read any suggestions for image augmentation or further improvements.
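To make the augmentation question concrete, this is the kind of fastai transform setup I have in mind (the values are placeholders rather than anything from the notebook; horizontal flips seem safe for grappling photos, vertical flips don't):

from fastai.vision import *

# Placeholder augmentation values: keep horizontal flips, avoid vertical ones,
# and allow modest rotation, zoom and lighting changes.
tfms = get_transforms(do_flip=True, flip_vert=False,
                      max_rotate=15, max_zoom=1.2,
                      max_lighting=0.3, max_warp=0.1)
data = ImageDataBunch.from_folder(Path('data/bjj_vs_judo'),  # assumed folder layout
                                  valid_pct=0.2, ds_tfms=tfms,
                                  size=224).normalize(imagenet_stats)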

2 Likes

Your LR at cell 17 appears too high - see how your validation results get very unstable? Try 10x lower for both LR numbers there.
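In case it helps, "both LR numbers" means the two ends of the slice passed to fit_one_cycle; with purely hypothetical values, dropping both by 10x looks like this:

# Hypothetical values only; divide whatever is currently in cell 17 by 10.
learn.fit_one_cycle(5, max_lr=slice(1e-5, 1e-4))  # was slice(1e-4, 1e-3)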

Last week I got over 95% accuracy on the CamVid-Tiramisu dataset, and that was during the first training run. From a pretrained resnet-34 to 95% accurate in a little over an hour!


1 Like

Can't tell if I'm getting mediocre results or expecting too much. I'm trying to determine if a Louis Vuitton product is counterfeit. Since the bags come in so many shapes and sizes, I focused on the classic brown monogram style. I pulled almost all the images from Instagram and cropped out the surrounding area to focus on the product. There are 70 "real" bags/wallets and 51 "fakes". I selected 25% of the images for the validation set. After tweaking, the best result I can consistently get is a 37% error rate.

Here’s the notebook

3 Likes

This is my first post, so please let me know if I did anything wrong.

Here’s a quick summary of what I did:
After watching lesson 1, I put together a model to tell apart pictures of Australian Prime Ministers (they change very quickly these days, so it seemed like it could be useful).

My dataset included 50-70 images each of the last six PMs. I used resnet34, and the model got to an error rate of 12.5% after 4 epochs. My code is available on GitHub.

It was really cool to be able to do this after just one lesson! (Especially since I have limited coding experience; I'm currently a management consultant, so most of my technical skills are specific to PowerPoint.)

Here's some more detail on what I did:
Getting the data:

  • To download the images, I used a Firefox add-on from a forum post.
  • I downloaded ~60-80 pictures of each prime minister from Google, then went through them and manually deleted any that looked problematic (e.g. pictures of the wrong person, or pictures with text in them).

Uploading the data into Crestle:

  • I had a bit of trouble with this, since it wasn’t obvious to me how to get a large number of files onto my Crestle server, but I was able to find some help in a forum post.
  • I ended up installing an unzip module for Python, uploading a zip file onto my Crestle server, and then writing some code to unzip it (a minimal sketch of that step is below).
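For anyone doing the same thing, here is a minimal sketch of the unzip step using Python's built-in zipfile module (the archive name and target folder are made up):

import zipfile

# Extract the uploaded archive; paths are examples only.
with zipfile.ZipFile('pm_images.zip', 'r') as zf:
    zf.extractall('data/australian_pms')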

Building, running, and fine-tuning the model:

  • I used resnet34 for the model (I also tried resnet50, but ran into memory issues even with a small batch size).
  • After four epochs, the model got to a 12.5% error rate
  • I was pretty impressed with the error rate, especially since a lot of the prime ministers look very similar (many of them are men with glasses and short white/grey hair). The combination the model had the most issues with was Kevin Rudd and Scott Morrison.
  • I also tried some unfreezing, but the results seemed pretty bad (error rate of >80%). I'm guessing I'll learn more about how to do this in the coming lessons.
2 Likes

Here's what I did after the week 1 video and notebook.

  • Set up Kaggle on Colab.
  • Downloaded the dog-breed-identification dataset, which contains about 10,000 images of dogs belonging to 120 different classes.
  • Trained a resnet-50 model on the unzipped dataset.
    I had to figure out a way to organise the data into folders based on class names (a sketch of one way to do this is below).
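The folder step in rough outline, assuming Kaggle's labels.csv with id and breed columns (paths are examples, not my exact ones):

import shutil
from pathlib import Path
import pandas as pd

src = Path('dog-breed-identification/train')            # unzipped Kaggle images
dst = Path('dog-breed-identification/train_by_class')   # one folder per breed

labels = pd.read_csv('dog-breed-identification/labels.csv')
for row in labels.itertuples():
    breed_dir = dst/row.breed
    breed_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(src/f'{row.id}.jpg', breed_dir/f'{row.id}.jpg')

(If I remember correctly, fastai's ImageDataBunch.from_csv can also read the labels file directly and skip the folder shuffling entirely.)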

Observations:

  • Got 81% accuracy after the first epoch.
  • Got to 89% accuracy after 8 epochs.
  • Got really close to 90% accuracy by training further after learn.unfreeze().

Here are my top-losses:
The first photo contains both a briard and a whippet. The mistakes look acceptable.

I had tried training on the same dataset a month ago using transfer learning with a resnet-50 model in Keras. At that time the maximum accuracy I could get was about 80% after training for 10 epochs. This clearly shows the power of setting the training parameters correctly. If this result is achievable with mostly default parameters, I can't wait to learn more and retry by tweaking the parameters a bit.

Resources:
@raimanu-ds wrote this amazing blog on how to train on Colab using fast.ai and datasets from Kaggle.
If you get stuck importing Kaggle datasets onto Colab, get help here.

Here's the link to my notebook on Colab. Do share your feedback and suggestions.
https://colab.research.google.com/drive/1vzEb3KVt5V31GkTBLiupNLH9DmD-qkyv

2 Likes

Using the image similarity approach from an earlier post in this thread, I tried to implement Google's X Degrees of Separation, which is essentially about finding a shortest path between two images. I was not very successful, but the results are overall OK, as in this path (the path runs from left to right, with the image on the left as the source and the image on the right as the target).
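The core of the approach is a k-nearest-neighbour graph over the CNN embeddings with a shortest-path search on top. A stripped-down sketch (embeddings, src_idx and tgt_idx are assumed to exist already; this is an illustration, not the exact code from the blog post):

import networkx as nx
from sklearn.neighbors import NearestNeighbors

# embeddings: (n_images, d) array of CNN features; src_idx/tgt_idx: image indices.
k = 10
nn = NearestNeighbors(n_neighbors=k + 1).fit(embeddings)
dists, idxs = nn.kneighbors(embeddings)

g = nx.Graph()
for i in range(len(embeddings)):
    for dist, j in zip(dists[i][1:], idxs[i][1:]):   # skip the self-match
        g.add_edge(i, int(j), weight=float(dist))

path = nx.shortest_path(g, source=src_idx, target=tgt_idx, weight='weight')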

Looking at the t-SNE of those images, they seem not to be forming clusters; maybe this is the problem.

I tried to describe my approach in this quick blog post.

3 Likes

Having some fun with 'leela-zero' running on GCP and playing against 'leela-eleven'.
Both Go engines are available here:
https://www.sjeng.org/leela.html
The front end is the Sabaki GUI:

Deep learning in production, sort of! :slight_smile:

https://drive.google.com/file/d/1FXGgBjXL3JnxT6_O6-62HHp1v4vQM7Xa/view?usp=sharing

Thank you Jeremy! The validation results are much improved:

1 Like

Nice work!
A couple of questions for my own learning:
a) How did you select the image size?
b) Why were the images not normalized? (.normalize(imagenet_stats))

But one gotcha when comparing to prior art: the NABirds dataset has the train and test sets explicitly marked, yet from the linked Jupyter notebook it doesn't look like this split is being honored. For a fair comparison, it would be good to train/validate on the training samples and then report the final accuracy on the test set.

I also pointed this out, though on Twitter; I probably should have done it on this forum instead. JH insisted this is OK, provided it is not done with too much tuning. But honestly, I'm not convinced. You can overfit to the validation set simply by adjusting the LR, the number of epochs you train, and who knows what else, all probably unintentionally. I backed down a bit, but maybe I shouldn't have. I think I have to call it out, and thanks for pointing this out as well.

To be fair to the researchers of that paper, @JCastle probably wants to report a proper test-set accuracy. I don't mean to downplay anybody's hard work here.

This is interesting. A friend of mine told me about a startup doing exactly this, and it is getting global business. This is a legitimate practical problem. Since forgers are very good at faking products, your 37% error rate may even beat a lot of humans, including me, so 99% is probably not the benchmark to beat for now. I suspect you will have to get more data, try to get multiple shots of the same bag from different angles, emphasising different parts, and average the predictions.
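To make the last suggestion concrete, averaging the class probabilities over several photos of the same bag could look like this (a sketch assuming a fastai v1 learner called learn and example file names; none of this is from the original notebook):

import torch
from fastai.vision import open_image

def predict_bag(learn, img_paths):
    # Average the predicted class probabilities over several photos of one bag
    probs = [learn.predict(open_image(p))[2] for p in img_paths]
    avg = torch.stack(probs).mean(dim=0)
    return learn.data.classes[int(avg.argmax())], avg

label, probs = predict_bag(learn, ['bag_front.jpg', 'bag_clasp.jpg', 'bag_corner.jpg'])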

1 Like

I am working on a project that involves counting, though it has nothing to do with a fastai lesson. I am a bit confused, since this looks to be an image recognition problem, so I wonder if you are labelling your images as "49", "50", "51"? That would limit the range of objects you can count. In my case, I have to use an object detection model (which I think will be covered in fastai part 2). You may have seen videos of YOLO, SSD, etc. drawing bounding boxes around objects in self-driving tech promos; a side product is that you can count the detected objects. That way you are not limited in the number of things you can count, up to a certain point.
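To give a flavour of the detection route, here is a sketch using torchvision's pretrained Faster R-CNN as a stand-in for YOLO/SSD (the image path, class id and score threshold are made up for illustration):

import torch, torchvision
from torchvision import transforms as T
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True).eval()

img = T.ToTensor()(Image.open('shelf.jpg').convert('RGB'))
with torch.no_grad():
    pred = model([img])[0]               # dict with 'boxes', 'labels', 'scores'

# Count detections of one class above a confidence threshold
target_class, threshold = 1, 0.5         # COCO class 1 is 'person', as an example
count = int(((pred['labels'] == target_class) & (pred['scores'] > threshold)).sum())
print(f'Counted {count} objects')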