3D Transforms

I’m trying to incorporate transformations for 3D data into my fastai pipeline. Currently the transforms in vision are specialized for 2D data. I wrote a couple of transform functions that follow the same format as in fastai (e.g., zoom3d, rotate3d, etc.) but with the proper dimensions (the transformation matrix is 4x4 for 3D instead of 3x3 for 2D). However, when I substitute my functions into a tfms list like:

tfms = [zoom3d(...)]

and create a databunch, I get the following (abbreviated) error message when I get an item from that databunch:

self.affine_mat = self.affine_mat @ m
RuntimeError: size mismatch, m1: [3 x 3], m2: [4 x 4]  ...
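
To make the shape problem concrete, here is a minimal NumPy sketch of it outside fastai. It is only an illustration: zoom3d_matrix is a hypothetical helper standing in for the matrix my zoom3d builds, and NumPy is used instead of torch for brevity, so the failure shows up as a ValueError rather than the RuntimeError above.

```python
import numpy as np

def zoom3d_matrix(scale):
    """Hypothetical helper: homogeneous 4x4 matrix that scales x, y, z by `scale`."""
    m = np.eye(4)
    m[0, 0] = m[1, 1] = m[2, 2] = scale
    return m

m = zoom3d_matrix(2.0)

# A 3D point in homogeneous coordinates (x, y, z, 1) is scaled as expected.
point = np.array([1.0, 2.0, 3.0, 1.0])
zoomed = m @ point

# Composing with a 3x3 identity (what a 2D image starts with) cannot work,
# because a (3, 3) matrix cannot be multiplied by a (4, 4) matrix.
try:
    np.eye(3) @ m
    composed = True
except ValueError:
    composed = False
```

So any 3x3 matrix that is still part of the chain when my 4x4 matrix is multiplied in will trigger the mismatch.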

Currently, when a fastai image is initialized, self.affine_mat is set to torch.eye(3). I tried setting x.affine_mat = torch.eye(4) after wrapping my image file in a fastai Image (I am using a custom ImageItemList that has a new open method). However, the error above persists. How and where should I set affine_mat to get 3D transforms to work? Or will I have to make additional changes for this to work? I can go into more detail as necessary.

Also, if this is doable, should I submit a PR with the 3D transforms?

Thanks for the help

The inner pipeline of the fastai library only works for 2D images, and the PyTorch functions we use (mainly grid_sample) can only be used with 2D images as far as I know, so there’s no way to adapt it.
You can however:

  • write functions that take a 3D image and return a 3D image
  • implement the apply_tfms method of your Image3D class (should be a subclass of ItemBase) to code how they are applied
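
A rough sketch of what those two suggestions could look like, under stated assumptions: the Image3D class here is a plain stand-in so the snippet runs without fastai (in a real pipeline it would subclass ItemBase), flip3d and rot90_3d are hypothetical example transforms, and NumPy arrays are used in place of tensors.

```python
import numpy as np

def flip3d(vol, axis=0):
    """Return a copy of the volume mirrored along one spatial axis."""
    return np.flip(vol, axis=axis).copy()

def rot90_3d(vol, k=1, axes=(1, 2)):
    """Rotate the volume by k * 90 degrees in the plane spanned by `axes`."""
    return np.rot90(vol, k=k, axes=axes).copy()

class Image3D:
    """Hypothetical stand-in; in fastai this would subclass ItemBase."""

    def __init__(self, data):
        self.data = data

    def apply_tfms(self, tfms, **kwargs):
        # Apply each volume -> volume function in order and wrap the result.
        out = self.data
        for tfm in tfms:
            out = tfm(out)
        return Image3D(out)

vol = np.arange(20 * 40 * 15, dtype=np.float32).reshape(20, 40, 15)
img = Image3D(vol).apply_tfms([flip3d, rot90_3d])
```

Because each transform is just a function from volume to volume, no affine matrix has to be composed, which sidesteps the 3x3 versus 4x4 issue entirely.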

I have a similar problem creating a data bunch. PIL.Image cannot accept my 3D image, and even if it did, it would treat the last dimension as the channels. Say I have an image of size (20, 40, 15): I cannot use the data bunch creation or any of the image transforms and processing in fastai, because they are built for 2D images. Could you please help me find a solution or an alternative way to implement my CNN image regression? These are not MR images; they are 3D arrays saved with tif.imsave.
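
One possible direction, sketched under assumptions rather than as a fastai recipe: skip PIL entirely, keep each volume as an array, and add an explicit channel axis so that none of the three spatial dimensions is read as channels. VolumeDataset and add_channel_axis are hypothetical names, and random arrays stand in for the files, which could presumably be read back with tifffile.imread.

```python
import numpy as np

def add_channel_axis(vol):
    """Turn a (D, H, W) volume into (1, D, H, W) so no spatial axis is read as channels."""
    if vol.ndim != 3:
        raise ValueError(f"expected a 3D array, got shape {vol.shape}")
    return vol[np.newaxis, ...].astype(np.float32)

class VolumeDataset:
    """Hypothetical minimal dataset: indexable, returns (volume, target) pairs."""

    def __init__(self, volumes, targets):
        assert len(volumes) == len(targets)
        self.volumes, self.targets = volumes, targets

    def __len__(self):
        return len(self.volumes)

    def __getitem__(self, i):
        return add_channel_axis(self.volumes[i]), np.float32(self.targets[i])

# Random arrays stand in for volumes read from disk.
rng = np.random.default_rng(0)
volumes = [rng.random((20, 40, 15)) for _ in range(4)]
ds = VolumeDataset(volumes, targets=[0.1, 0.5, 0.9, 0.3])
x, y = ds[0]
```

An object with __len__ and __getitem__ like this can be passed to torch.utils.data.DataLoader, and the resulting batches have the (N, C, D, H, W) layout that nn.Conv3d expects for a regression CNN.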