Using unet_learner with BW 1 channel Images

I would like to use unet_learner with BW (1-channel) images for a keypoint-regression task.

Currently, I’m encountering this error:

RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[32, 1, 512, 512] to have 3 channels, but got 1 channel instead

as the network still expects RGB images.

Could you tell me the easiest way to solve this problem?

P.S.: I’ve read this post, but I would prefer to avoid implementing the Unet in PyTorch if possible.

Hey,
unet_learner can take kwargs which are passed to create_unet_model, which takes an n_in argument. So adding n_in=1 to the arguments you call unet_learner with should do the trick. (Can’t test it myself right now, but that’s what the code suggests…)
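Roughly something like this (untested; dls stands for your own DataLoaders):

```python
from fastai.vision.all import *

# extra kwargs are forwarded to create_unet_model,
# so n_in=1 makes the first conv layer expect 1-channel input
learn = unet_learner(dls, resnet34, n_in=1)
```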


Thanks @benkarr

I have tried to set n_in=1 and now I get:

UserWarning: Using a target size (torch.Size([4, 2, 2])) that is different to the input size (torch.Size([4, 4, 512, 512]))

I suppose I have to add a dimension to the target (from [4, 2, 2] to [4, 1, 2, 2]) with torch.unsqueeze().

I was wondering where to insert this within unet_learner().
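Just to illustrate the reshaping I mean (a toy tensor, not my actual code):

```python
import torch

target = torch.zeros(4, 2, 2)   # current target shape [4, 2, 2]
target = target.unsqueeze(1)    # -> torch.Size([4, 1, 2, 2])
```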

Hmm, the error mentions that the target size doesn’t match the input size…
my guess would be that something is off with the dataloader. Can you share a link to a notebook or at least the (complete) code of how you create the dataloader?


Thanks again @benkarr!!

Here is the link to the notebook on Kaggle:
https://www.kaggle.com/code/cortomalt/regression-seflattionunet/

I tried the same dataloader with both the ResNet and the EfficientNet architectures, and it works correctly.
Even with a plain PyTorch dataloader, it worked fine with the aforementioned architectures.

I’m starting to think that maybe part of the problem is due to the preprocessing of the images from 16 to 8 bits.

Ok,
so your DataBlock uses a PointBlock as the target, but the unet model predicts segmentation masks of shape img_height x img_width, which would break when calculating the loss, as the shapes don’t match (this also fits the error from above). The PointBlock works fine for a plain regression with a ResNet/EfficientNet, but not for the unet, which requires something like a MaskBlock or an extra transform.
To get from a point to a segmentation mask one would usually create a “one-hot” image (1 at the position of the keypoint, 0 elsewhere) or a “gaussian heatmap”.
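For the gaussian variant the idea is roughly this (a quick sketch; sigma and the shapes are placeholders):

```python
import torch

def keypoint_to_heatmap(x, y, height, width, sigma=5.0):
    "Gaussian that peaks at the keypoint (x, y) and falls towards 0 elsewhere."
    ys = torch.arange(height, dtype=torch.float32).view(-1, 1)  # row indices
    xs = torch.arange(width, dtype=torch.float32).view(1, -1)   # column indices
    d2 = (xs - x) ** 2 + (ys - y) ** 2   # squared distance to the keypoint
    return torch.exp(-d2 / (2 * sigma ** 2))

# two keypoints -> one channel each, e.g. a [2, 512, 512] target
target = torch.stack([keypoint_to_heatmap(100, 200, 512, 512),
                      keypoint_to_heatmap(300, 400, 512, 512)])
```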
The latter is described here, which shows the way with the PointBlock + transform and also how to retrieve the points from the predicted masks.
I guess a simpler way would be to do some preprocessing once: loop through the images/keypoints, create the masks, add them to your dataset, and save the paths to the dataframe, so you can use the standard segmentation DataBlock with a MaskBlock and the ColReader…
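Something along these lines (the column names image_path/mask_path are made up, adapt them to your dataframe):

```python
from fastai.vision.all import *

dblock = DataBlock(
    blocks=(ImageBlock(cls=PILImageBW), MaskBlock()),  # 1-channel input, mask target
    get_x=ColReader('image_path'),  # hypothetical column with the image paths
    get_y=ColReader('mask_path'),   # hypothetical column with the saved mask paths
    item_tfms=Resize(512),
)
dls = dblock.dataloaders(df, bs=8)
```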
I can’t demonstrate that with your notebook since the dataset is private, but if you make it public I’ll have a go.


Thank you again for the valuable advice!!

I have made the dataset public.
As soon as possible, I will try to implement what you have suggested.


Hi @benkarr,
I have tried to adapt the heatmap mechanism you suggested here to the Unet architecture.
This is the new notebook.

Now I get this error:
RuntimeError: The size of tensor a (4) must match the size of tensor b (2) at non-singleton dimension 1

It’s because the images mini-batch has shape [8, 1, 512, 512] while the heatmap mini-batch has shape [8, 2, 512, 512].
I suppose the images mini-batch shape is incorrect; it should be like the heatmap mini-batch’s (i.e. with 2 coordinates instead of 1).
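This is how I checked the batch shapes (with dls being the DataLoaders from the notebook):

```python
xb, yb = dls.one_batch()
print(xb.shape)  # torch.Size([8, 1, 512, 512])
print(yb.shape)  # torch.Size([8, 2, 512, 512])
```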

The PointBlock dataloader implementation is quite straightforward and seems correct.
I really don’t know why I have this error :frowning: