I’m struggling to set up a DataLoader correctly and hope you can help me.
I’m working on a hand pose estimation problem and want to predict 21 key points on the hand (basically the joint positions), together with each point's distance either to the camera or to the first joint of the middle finger. So I have images, 2D points and one distance value per point. My model should learn to predict the 2D key points and the distances from an image as input.
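To make the data layout concrete, one sample could be described roughly like this. This is only a plain-Python sketch, not fastai code; the class name `HandSample`, its field names and the file name are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

NUM_KEYPOINTS = 21  # one entry per hand joint, as described above


@dataclass
class HandSample:
    """Hypothetical container for one training sample."""
    image_path: str                        # path to the input image
    keypoints: List[Tuple[float, float]]   # 21 (x, y) positions in the image
    distances: List[float]                 # 21 distance values, one per key point

    def __post_init__(self):
        # Reject samples that do not have exactly one distance per key point.
        if len(self.keypoints) != NUM_KEYPOINTS:
            raise ValueError(f"expected {NUM_KEYPOINTS} key points, got {len(self.keypoints)}")
        if len(self.distances) != NUM_KEYPOINTS:
            raise ValueError(f"expected {NUM_KEYPOINTS} distances, got {len(self.distances)}")


# Dummy values only; "hand_0001.png" is a made-up file name.
sample = HandSample(
    image_path="hand_0001.png",
    keypoints=[(0.0, 0.0)] * NUM_KEYPOINTS,
    distances=[0.0] * NUM_KEYPOINTS,
)
```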
I am struggling to set up a data loader so that show(), show_batch(), show_results() and augmentations all work. I don’t think I can use the DataBlock API, as there is no block for 2D + distance data. There is a PointBlock, but it seems to work only for 2D data. So I followed the Custom Transforms and Siamese tutorials here: https://docs.fast.ai/tutorial.albumentations.html to create my own Transform and data type. You can see that in this Colab notebook: https://colab.research.google.com/drive/1k8-udaIayLcfrLE4d97_0r0u748TrpOc?usp=sharing
I am not sure what the type should actually be: a fastuple of (TensorImage, TensorPoint for the key points, Tensor for the distances), or a fastuple of (TensorImage, a single Tensor holding both the key points and the distances)?
How can I make sure that augmentations still work? For example, rotations and shifts should move the image and the 2D points, but leave the distances unchanged.
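To illustrate the behaviour I am after, here is a minimal plain-Python sketch (not fastai code, and not how fastai implements its augmentations). The function name `rotate_sample` and the example values are hypothetical. It rotates the 2D key points about a given centre and passes the distances through untouched.

```python
import math


def rotate_sample(points, distances, angle_deg, center):
    """Rotate 2D key points about `center`; return the distances unchanged."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    rotated = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        # Standard 2D rotation matrix applied to the offset from the centre.
        rotated.append((cx + cos_t * dx - sin_t * dy,
                        cy + sin_t * dx + cos_t * dy))
    # The distances are not spatial coordinates, so they are only copied.
    return rotated, list(distances)


# Two made-up key points with made-up distance values.
points = [(10.0, 0.0), (0.0, 5.0)]
distances = [0.42, 0.37]
new_points, new_distances = rotate_sample(points, distances, 90, center=(0.0, 0.0))
```

The same idea would apply to shifts: the image and the key points move together, while the distance values stay exactly as they were.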
I am thankful for any tips and hints. In the meantime I will dive deeper into the fast.ai API.