Flipping and image regression with keypoints

Hi,

I want to augment my data with flipping. But I’ve realized that can have some issues when the keypoints have semantic meaning or an order to them. For instance, look at this picture of a cat. All the keypoints for ears are in the same direction: [left ear, right ear]. But after flipping, they are now opposite. So when the model makes a prediction, it will get a big error even though it was correct!

To make it even more clear:
I have two points for the ears: [(100, 150), (200, 150)]
Flipping the image around x=150 gives the targets Y: [(200, 150), (100, 150)]
Predictions from model: [(98, 149), (201, 151)]
This gives a huge error, since the model predicts left to right, but the target points are now in the opposite order.
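To put numbers on it, here is a small sketch that plugs the values above into a plain mean squared error. The helper name `mse` is just for illustration, and it averages over all four coordinates, which is what I'd expect a default MSE loss to do.

```python
# Targets after the flip, in the order given above: the original left ear now sits at x=200.
flipped_targets = [(200, 150), (100, 150)]
# The model still predicts in left-to-right image order.
predictions = [(98, 149), (201, 151)]


def mse(pred, target):
    """Mean squared error over every coordinate of every keypoint."""
    squared = [
        (p - t) ** 2
        for pred_point, target_point in zip(pred, target)
        for p, t in zip(pred_point, target_point)
    ]
    return sum(squared) / len(squared)


# Loss against the flipped targets as they are stored.
loss_as_given = mse(predictions, flipped_targets)         # 5151.75
# Loss if the targets are put back into left-to-right order.
loss_reordered = mse(predictions, flipped_targets[::-1])  # 1.75
print(loss_as_given, loss_reordered)
```

So the same prediction goes from a loss of 1.75 to 5151.75 purely because of the ordering.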

How is this best solved? Fixing my MSELoss to first sort the points from Y-truth back to a left-to-right order? Or do it in some augmentation step to make sure they’re all in the same order?

I actually think this is the “bug” mentioned here, with flip and tensorpoints making it only predict in the middle: Flip with TensorPoint causes a bug · Issue #100 · fastai/fastai2 · GitHub
Of course it will, when the flips give it a huge error, so it regresses towards the middle.

Also might be the issue @muellerzr is seeing in his GH issue here? Major bug regarding Augmentation and TensorPoint · Issue #2628 · fastai/fastai · GitHub

Any thoughts on this?

How can I handle flipping augmentation with multiple keypoints when the loss is calculated looking at the order of the points?

The obvious answer is to fix it: modify your flipping augmentation to relabel the points as well. If the goal is to predict specifically the location of the left ear and the right ear in an ordered fashion, this is the only real way.

For more general problems, you don’t usually expect spatial ordering to matter. In that case, do a minimum cost assignment between predicted points and possible targets, and then backprop only through the best assignment.

Thanks for your ideas!

Any ideas on where to inject the code to do any of those things?

For the first one, I’m not sure how I could modify my list when fastai augments the image.

And for the second idea, I’m not sure where I would do it either…

(I can figure out the logic myself, just don’t see how/where I can apply it)

For the assignment solution, you’d write a custom loss function. Compute a pairwise assignment cost between all predictions and all ground truths, use something like the min-cost assignment to pair up your predictions and GTs, then compute your actual losses. Look at the DETR paper for an idea on how to do this in theory, and it can all be wrapped up inside your loss function.
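A rough sketch of what such a loss could look like in PyTorch, assuming predictions and targets are both shaped (batch, keypoints, 2). The name `matched_mse_loss` is made up for this example, and `scipy.optimize.linear_sum_assignment` is used here as the min-cost assignment solver. Treat it as a starting point rather than a tuned implementation.

```python
import torch
from scipy.optimize import linear_sum_assignment


def matched_mse_loss(pred, target):
    """MSE after a min-cost one-to-one matching of predictions to targets.

    pred, target: float tensors of shape (B, K, 2).
    """
    # cost[b, i, j] = mean squared error between prediction i and target j.
    diff = pred[:, :, None, :] - target[:, None, :, :]
    cost = (diff ** 2).mean(dim=-1)  # shape (B, K, K)

    losses = []
    for b in range(pred.shape[0]):
        # The matching is a discrete choice, so solve it on detached costs.
        rows, cols = linear_sum_assignment(cost[b].detach().cpu().numpy())
        rows = torch.as_tensor(rows, device=pred.device)
        cols = torch.as_tensor(cols, device=pred.device)
        # Gradients flow only through the matched pairs.
        losses.append(cost[b, rows, cols].mean())
    return torch.stack(losses).mean()


# The ear example from the question: targets are stored in flipped order.
pred = torch.tensor([[[98.0, 149.0], [201.0, 151.0]]], requires_grad=True)
target = torch.tensor([[[200.0, 150.0], [100.0, 150.0]]])
loss = matched_mse_loss(pred, target)
loss.backward()
print(loss.item())
```

Keep in mind that with this loss the outputs stop meaning "left ear" and "right ear" specifically. The model is only asked to find both ears, in any order.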

If you’re relabelling points, then that involves writing a new flip augmentation that relabels them. I don’t know where your existing flip augmentation comes from, but you should remove it and replace it with your own. Fastai only augments the image where you tell it to, so just don’t tell it to flip with its built-in functions, and tell it to use yours instead.
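For the flip itself, the logic could look something like the sketch below. It is written against plain NumPy arrays and does not use any fastai API, so wiring it into a fastai pipeline is left out. The function name `hflip_with_relabel` and the `swap_pairs` argument are made up for this example, and it assumes keypoints are (x, y) pixel indices. If your points are normalized to [-1, 1] instead, the mirror step is just x → -x.

```python
import numpy as np


def hflip_with_relabel(image, keypoints, swap_pairs):
    """Horizontally flip an image and its keypoints, then swap left/right labels.

    image: array of shape (H, W) or (H, W, C)
    keypoints: (K, 2) array-like of (x, y) pixel coordinates
    swap_pairs: list of (i, j) index pairs whose meaning swaps under a flip,
                e.g. [(0, 1)] for [left ear, right ear]
    """
    width = image.shape[1]
    flipped_image = image[:, ::-1].copy()

    flipped_kps = np.asarray(keypoints, dtype=float).copy()
    # Mirror the x coordinate: pixel column x moves to column (W - 1 - x).
    flipped_kps[:, 0] = (width - 1) - flipped_kps[:, 0]

    # Relabel: what was the left ear is the right ear in the mirrored image.
    for i, j in swap_pairs:
        flipped_kps[[i, j]] = flipped_kps[[j, i]]
    return flipped_image, flipped_kps


# The ear example from the question: a 301-pixel-wide image mirrors around x=150.
image = np.zeros((300, 301))
ears = [(100, 150), (200, 150)]  # [left ear, right ear]
flipped_image, flipped_kps = hflip_with_relabel(image, ears, swap_pairs=[(0, 1)])
print(flipped_kps)
```

After the swap, the first slot again holds the ear on the left side of the picture, so the targets stay consistent with what the model is learning to predict.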