Using unet_learner with BW 1 channel Images

I would like to use the unet_learner with BW (1 channel) images for the KeyPoint-Regression task.

Currently, I’m encountering this error:

RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[32, 1, 512, 512] to have 3 channels, but got 1 channel instead

as the network still expects RGB images.

Could you tell me the easiest way to solve this problem?

P.S.: I’ve read this post, but I would prefer to avoid implementing the Unet in PyTorch if possible.

unet_learner can take kwargs which are passed to create_unet_model, which takes an n_in argument. So adding n_in=1 to the arguments you call unet_learner with should do the trick. (Can’t test it myself right now, but that’s what the code suggests…)
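For illustration (I can't run fastai here either), the call would look like `unet_learner(dls, resnet34, n_in=1)` where `dls` is your DataLoaders. The plain-PyTorch sketch below shows why the default 3-channel stem fails on grayscale batches, and why rebuilding the first conv for 1 channel (which is what `n_in=1` does under the hood) fixes it; sizes are shrunk for illustration:

```python
import torch
import torch.nn as nn

# A ResNet-style stem as built for RGB: weight of size [64, 3, 7, 7]
stem_rgb = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
x_bw = torch.randn(2, 1, 64, 64)  # a grayscale mini-batch

try:
    stem_rgb(x_bw)
except RuntimeError as e:
    print(e)  # the "expected input ... to have 3 channels" error from above

# Analogous to passing n_in=1: a stem that accepts 1 input channel
stem_bw = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3)
out = stem_bw(x_bw)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```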


Thanks @benkarr

I have tried to set n_in=1 and now I get:

UserWarning: Using a target size (torch.Size([4, 2, 2])) that is different to the input size (torch.Size([4, 4, 512, 512]))

I suppose I have to add 1 dimension to the target (from [4, 2, 2] to [4, 1, 2, 2]) by torch.unsqueeze().

I was wondering where to insert this within the unet_learner().

Hmm, the error mentions that the target size doesn’t match the input size…
my guess would be that something is off with the dataloader. Can you share a link to a notebook or at least the (complete) code of how you create the dataloader?


Thanks again @benkarr!!

Here is the link to the notebook on Kaggle:

I tried the same dataloader with both the ResNet architecture and the EfficientNet, and it works correctly.
Even with the PyTorch format dataloader, it worked fine with the aforementioned architectures.

I’m starting to think that part of the problem may be due to the preprocessing of the images from 16 to 8 bits.

So your DataBlock uses a PointBlock as the target, but the unet model predicts segmentation masks of shape img_height x img_width, which breaks when calculating the loss because the shapes don’t match (this also fits the error from above). The PointBlock works fine for plain regression with a ResNet/EfficientNet, but not for the unet, which requires something like a MaskBlock or an extra transform.
To get from a point to a segmentation mask one would usually create a “one-hot”-image (1 at the position of the keypoint, 0 else) or a “gaussian heatmap”.
The latter is described here which shows the way with the PointBlock + transform and also how to retrieve the points from the predicted masks.
I guess a simpler way would be to do some preprocessing once: loop through the images/keypoints, create the masks, add them to your dataset, and save the paths to the dataframe, so you can use the standard segmentation DataBlock with a MaskBlock and the ColReader…
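To make the heatmap idea concrete, here's a minimal numpy sketch (function name and `sigma` are illustrative, not from fastai): one gaussian channel is rendered per keypoint, and a point can be read back from a predicted mask with an argmax.

```python
import numpy as np

def keypoint_to_heatmap(x, y, height, width, sigma=4.0):
    """Render one keypoint as a 2-D gaussian heatmap with peak 1.0 at (x, y)."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))

# One channel per keypoint -> a target of shape [n_keypoints, H, W]
target = np.stack([keypoint_to_heatmap(100, 200, 512, 512),
                   keypoint_to_heatmap(300, 50, 512, 512)])
print(target.shape)  # (2, 512, 512)

# Retrieving a point back from a (predicted) heatmap is just an argmax:
y_pred, x_pred = np.unravel_index(target[0].argmax(), target[0].shape)
print(x_pred, y_pred)  # 100 200
```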
I can’t demonstrate that with your notebook since the dataset is private, but if you make the dataset public I’ll have a go.


Thank you again for the valuable advice!!

I have made the dataset public.
As soon as possible, I will try to implement what you have suggested.


Hi @benkarr,
I have tried to adapt the heatmap mechanism you have suggested here to the Unet architecture.
This is the new notebook

Now I get this error:
RuntimeError: The size of tensor a (4) must match the size of tensor b (2) at non-singleton dimension 1

It’s because the images mini-batch has shape [8, 1, 512, 512] while the heatmap mini-batch has shape [8, 2, 512, 512].
I suppose the images mini-batch shape is incorrect; it should match the heatmap mini-batch (i.e. 2 channels instead of 1).
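This kind of error can be reproduced with plain PyTorch whenever the two tensors handed to the loss disagree in the channel dimension (dim 1); sizes below are illustrative, not taken from the notebook:

```python
import torch
import torch.nn.functional as F

pred = torch.randn(8, 4, 16, 16)    # e.g. a model emitting 4 output channels
target = torch.randn(8, 2, 16, 16)  # e.g. a 2-keypoint heatmap target

try:
    F.mse_loss(pred, target)
except RuntimeError as e:
    print(e)  # size of tensor a (4) must match size of tensor b (2) at dim 1
```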

The PointBlock dataloader implementation is quite straightforward and seems correct.
I really don’t know why I have this error :frowning: