Working with 2.5D data


I’m struggling with setting up a Dataloader correctly and hope you can help me.

I’m working on a hand pose estimation problem and want to predict 21 key points on the hand (basically the joint positions), together with either their distance to the camera or their distance to the first joint of the middle finger. So I have images, 2D points, and a distance value for each point. My model should learn to predict the 2D key points and the distances from an image as input.

I am struggling to set up a data loader so that show(), show_batch(), show_results() and augmentations all work. I don’t think I can use the DataBlock API, as there is no block for 2D + distance data. There is a PointBlock, but it seems to work only for 2D data. So I followed the Custom Transforms and Siamese tutorials to create my own Transform and data type. You can see that in this Colab notebook:

I am not sure what the type should actually be. A fastuple of (TensorImage, TensorPoint for the key points, Tensor for the distances) or (TensorImage, Tensor for the key points and distances)?

How can I make sure that augmentations still work? For example, rotations and shifts should move the image and the 2D points, but not the distances.
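To make the requirement concrete, here is a minimal, library-free sketch (plain Python, not fastai code) of what a rotation should do to one sample: the 2D key points rotate about the image centre, while the distances pass through unchanged. The helper names `rotate_points` and `rotate_sample`, and the convention that points are normalised with the image centre at the origin, are assumptions made only for illustration.

```python
import math

def rotate_points(points, degrees):
    """Rotate (x, y) pairs about the origin (counter-clockwise when y points up)."""
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [(x * cos_t - y * sin_t, x * sin_t + y * cos_t) for x, y in points]

def rotate_sample(keypoints, distances, degrees):
    """Hypothetical helper: rotate the key points, leave the distances untouched."""
    return rotate_points(keypoints, degrees), list(distances)

keypoints = [(1.0, 0.0), (0.0, 0.5)]   # two example key points
distances = [0.42, 0.57]               # one distance per key point
new_points, new_distances = rotate_sample(keypoints, distances, 90)
```

Any augmentation pipeline that handles this data correctly should behave like `rotate_sample`: geometric transforms change the first element and ignore the second.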

I am thankful for any tips and hints. In the meantime I will dive deeper into the API.

I made some progress:

Following the book, I am using TfmdLists for x and y and feeding them to Datasets to create the dataloader. For x (the images) I can just use standard transforms. For y (the 2D points and distances) I created a type KeyPointsDistance that inherits from fastuple and contains the 2D points as a TensorPoint and the distances as a Tensor. I also created a Transform that encodes a file path to this type.

The problem I am facing right now is that dls.one_batch only returns one example (one tuple) for y, but a batch for x. Must y be a single tensor for one_batch to work?

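from fastai.vision.all import *  # provides fastuple, TensorPoint, Transform, Datasets, torch, ...

class KeyPointsDistance(fastuple):
    def show(self):
        keypoints, distances = self
        # Concatenate along dim 2; this assumes keypoints has shape (21, 1, 2)
        return torch.cat([keypoints, distances.view(21,1,1)], 2)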

class GaneratedHands21_2DKeyPointsAndDistanceTransform(Transform):    
    def encodes(self, image_file):
        two_d_points = self._load_2d_points(image_file)  # imagine this function exists :)
        distances = self._load_distances(image_file)  # imagine this function exists :)
        return KeyPointsDistance(TensorPoint(two_d_points), Tensor(distances))

splits = RandomSplitter()(files)

xtfms = [PILImage.create,ToTensor]
ytfms = [GaneratedHands21_2DKeyPointsAndDistanceTransform, ToTensor]

dsets = Datasets(files, [xtfms, ytfms], splits=splits)

dls = dsets.dataloaders()
batch = dls.one_batch()
x,y = batch
len(x), len(y)
# (64, 2)
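
One possible reading of the `(64, 2)` output, offered as a guess rather than a confirmed diagnosis: when a sample's target is a tuple, a collate function typically batches each element of the tuple separately, so `y` comes back as a tuple of two batched items and `len(y)` counts the tuple's elements, not the batch size. The plain-Python sketch below mimics that behaviour with lists standing in for tensors. `collate` here is a hypothetical stand-in, not fastai's actual collate function.

```python
def collate(samples):
    """Hypothetical collate: recurse into tuples, gather everything else into a list."""
    first = samples[0]
    if isinstance(first, tuple):
        # Regroup by position, then collate each position on its own
        return tuple(collate(list(group)) for group in zip(*samples))
    return list(samples)

# Four fake samples shaped like (image, (keypoints, distances))
samples = [(f"img{i}", ([(0.0, 0.0)] * 21, [0.1] * 21)) for i in range(4)]

x, y = collate(samples)
print(len(x), len(y))  # batch size of x vs. number of elements in the y tuple

keypoints_batch, distances_batch = y
print(len(keypoints_batch), len(distances_batch))  # each element is batched
```

If the same thing happens in the real pipeline, `y[0].shape` and `y[1].shape` should both start with the batch size, which would mean batching already works and only the `len(y)` check is misleading.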

To make show_batch work I created a typedispatched function and it gets called (but with the batching problem). So if the batching problem is solved, then this should also be solved.

I still don’t know if I can make augmentations work. Does anyone have hints on this? From reading the book, it seems to me that if x or y is a tuple and fastai knows the types of its elements, then it should apply the correct augmentation to each of them.
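
The idea described above, a transform that looks at the type of each tuple element and only touches the types it knows, can be illustrated without fastai using `functools.singledispatch`. The classes `Points` and `Distances` and the functions `flip_horizontal` and `apply_to_tuple` are made up for this sketch; fastai's own dispatch machinery differs in detail.

```python
from functools import singledispatch

class Points(list):
    """Made-up marker type for 2D key points with the image centre at x = 0."""

class Distances(list):
    """Made-up marker type for per-point distances."""

@singledispatch
def flip_horizontal(item):
    # Default: types without a registered implementation pass through unchanged
    return item

@flip_horizontal.register
def _(item: Points):
    # Mirror the x coordinate, keep y, and preserve the type
    return Points((-x, y) for x, y in item)

def apply_to_tuple(transform, sample):
    """Apply `transform` to each element; dispatch decides what actually changes."""
    return tuple(transform(element) for element in sample)

sample = (Points([(0.25, 0.5), (-1.0, 0.0)]), Distances([0.42, 0.57]))
flipped_points, flipped_distances = apply_to_tuple(flip_horizontal, sample)
```

In this picture, keeping the distances in a type that geometric transforms do not recognise is what protects them from rotations and shifts.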

I am still having issues here. Another approach would be to create a Datasets with three pipelines: one for the images, one for the key points, and one for the distances.

imgtfms = [PILImage.create,ToTensor]
kptsTfm = [KeyPointsTransform]
disttfm = [DistanceTransform]

splits = RandomSplitter()(files)
dsets = Datasets(files, [imgtfms, kptsTfm, disttfm], splits=splits)

But I am not sure how to make show_batch work then.
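
For the three-pipeline variant, the sketch below shows in plain Python how several pipelines applied to the same file yield one tuple per item, and how a single function could then format such a tuple for display. `MiniDatasets`, `describe_sample`, and the lambda loaders are all hypothetical stand-ins; this is not fastai's `Datasets` class. It may also be worth checking the `n_inp` argument in the real code: as far as I know, `Datasets` treats all but the last pipeline as inputs by default, so `n_inp=1` may be needed to keep only the image as x.

```python
class MiniDatasets:
    """Hypothetical stand-in: run every pipeline on the same raw item."""
    def __init__(self, items, pipelines):
        self.items, self.pipelines = items, pipelines

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        item = self.items[index]
        return tuple(self._run(pipeline, item) for pipeline in self.pipelines)

    @staticmethod
    def _run(pipeline, item):
        # Apply the transforms of one pipeline in order
        for transform in pipeline:
            item = transform(item)
        return item

def describe_sample(sample):
    """Format an (image, keypoints, distances) tuple as one line per key point."""
    image, keypoints, distances = sample
    lines = [f"image: {image}"]
    lines += [f"  point {i}: ({x:.2f}, {y:.2f}) d={d:.2f}"
              for i, ((x, y), d) in enumerate(zip(keypoints, distances))]
    return "\n".join(lines)

# Fake loaders standing in for the image, key-point and distance transforms
img_tfms = [lambda path: f"<image {path}>"]
kpts_tfms = [lambda path: [(0.1, 0.2), (0.3, 0.4)]]
dist_tfms = [lambda path: [0.5, 0.6]]

dsets = MiniDatasets(["hand_000.png"], [img_tfms, kpts_tfms, dist_tfms])
print(describe_sample(dsets[0]))
```

A real show_batch would draw the image and scatter the points instead of returning text, but the unpacking step is the same: each sample arrives as a flat 3-tuple, and the display function has to pair the key points with their distances itself.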