Keypoint Detection

I am trying to use the reg_head for a resnet34 model, as included below, where pictures of 384 by 288 pixels were used, but I do not understand where the input size (64 * 12 * 9) and output size (6144) come from.

In this example the number of keypoints was 12; how do I reflect this in my model if I am going to detect only two keypoints, resizing my images to 224 x 224 pixels?

Link to the example:

head_reg = nn.Sequential(
    nn.Linear(64 * 12 * 9, 6144),
    nn.Linear(6144, 24),
)
learn = create_cnn(data, arch, metrics=[my_acc, my_accHD],
                   loss_func=F.l1_loss, custom_head=head_reg)
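To check my understanding of where these sizes might come from, I wrote out the arithmetic, assuming (and this is only my assumption, which I would like confirmed) that the 12 x 9 spatial size comes from dividing the 384 x 288 input by the network's overall stride of 32:

```python
# Assumed: the final feature map is the input downsampled by a stride of 32
h, w = 384 // 32, 288 // 32   # 12, 9

in_features = 64 * h * w      # 64 * 12 * 9 = 6912, the head's input size
hidden = 6144                 # the hidden width; note 6144 = 64 * 96
out_features = 12 * 2         # 12 keypoints, an (x, y) pair each = 24

print(h, w, in_features, hidden, out_features)
```

This is where my confusion comes from: 64 * 12 * 9 = 6912, which is not 6144, so the 6144 does not seem to follow directly from the feature-map size.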

As I understood it, a spatial map of 12 x 9 was assumed. The output size was assumed to be the product 64 * (k = 96). I am not sure, but the number 96 may be the depth of the convolutional volume assumed here.

Could the value k = 96 be treated as a default that applies to other cases, such as mine, where only 2 keypoints are to be detected? Or should different values be assumed?
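For comparison, this is the head I would naively write for my own case, assuming the feature map for a 224 x 224 input is 64 x 7 x 7 (since 224 // 32 = 7) and keeping the hidden width 6144 unchanged, both of which are assumptions I am unsure about:

```python
import torch
from torch import nn

# My naive adaptation for 224 x 224 images and 2 keypoints.
# Assumptions (unverified): final feature map is 64 x 7 x 7,
# and the hidden width 6144 can be kept as-is.
head_reg_2kpts = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 6144),
    nn.Linear(6144, 2 * 2),   # 2 keypoints x (x, y) = 4 outputs
)

x = torch.randn(1, 64, 7, 7)  # dummy feature map with the assumed shape
out = head_reg_2kpts(x)
print(out.shape)              # torch.Size([1, 4])
```

Is this the right way to adapt the head, or should the 6144 also be scaled when the input resolution and keypoint count change?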