UNet: Size error for a custom dataset

Nice, I know this error! There is a mismatch between the size of the input's class dimension (which is 3 here) and the maximum value of the target: with cross-entropy, target values must be at most n_classes - 1. Can you check input.shape and target.max() (and target.shape as a sanity check)?

Absolutely. In the dim == 2 case, we have:
Input shape: torch.Size([100352, 3])
Target shape: torch.Size([100352])
Target max: 3
Should the input shape be transposed? Are tensors used to initialize the Image class supposed to have a shape of [H, W, C]?

No, they are supposed to be in [C, H, W] format, which seems to be what you are doing here. But somehow the target's max is 3, while it should be at most 2. Do you have 3 classes including background, or 3 classes plus background? If it's the latter, you should have self.c = 4 in your dataset class.
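
To make the class-count logic concrete, here is a tiny sanity check in plain PyTorch (the mask values are made up for illustration):

import torch

mask = torch.randint(0, 4, (224, 224))  # hypothetical target mask with values 0..3
n_classes = int(mask.max()) + 1         # class ids start at 0, so a max of 3 means 4 classes
print(n_classes)                        # 4 -> set self.c = 4 (3 classes + background)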

Yes, indeed, I am passing the images in the usual pytorch style, [C, H, W].
Ha! I think that was it! I tried with 4 classes and it seems to have worked its way through the pipeline without raising any errors.
So to recap:

  • Pass the loss function directly to the learner
  • Ensure the number of classes is correct :wink:

Thanks a lot, my friend, you really saved my fastai membership. :wink: I might have drifted back to pytorch without your help!
Next stop: a working segmentation model !


Nice! No problem mate, always happy to help :slight_smile: If you plan on continuing to use fastai, you should still dig a bit deeper into the DataBlock API at some point. It feels pretty unnatural at the beginning when you are used to classic pytorch, but it soon gets very handy and integrates better with the rest of fastai (for instance, the loss function can come attached to the dataset :stuck_out_tongue: ). Anyway, good luck with your future work with fastai!
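
For context, here is a minimal sketch of what that looks like for segmentation with the v2 DataBlock API (the paths and codes are made up, not from this thread):

from fastai.vision.all import *

path = Path('data')  # hypothetical dataset root
codes = ['background', 'class1', 'class2', 'class3']  # 4 classes, matching self.c = 4 above

block = DataBlock(blocks=(ImageBlock, MaskBlock(codes=codes)),
                  get_items=get_image_files,
                  get_y=lambda o: path/'masks'/f'{o.stem}.png',  # hypothetical mask location
                  splitter=RandomSplitter(valid_pct=0.2))
dls = block.dataloaders(path/'images', bs=8)

# Because the targets are masks, the learner can infer a suitable loss
# (CrossEntropyLossFlat(axis=1)) rather than it being passed by hand.
learn = unet_learner(dls, resnet18)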


That solved the issue for me with this error:
“only batches of spatial targets supported (3D tensors) but got targets of dimension: 4”

The code:
learn = unet_learner(data, models.resnet18, metrics=dice, loss_func=CrossEntropyFlat(axis=1))

I had thought that unet_learner handled this out of the box.

Hi All

I’m participating in the Kaggle pulmonary embolism competition and facing a challenge writing the test pipeline. Kaggle requires a separate inference notebook for submission that has no access to the training data or an internet connection… in case this is useful.
The idea is to train a model, save the model, load the model in a different file, and run the inference.
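
Spelled out as a rough sketch (the file name matches the save call in the training code below; test_dl stands in for a dataloader built with dls.test_dl):

# Training notebook:
learn.save('resnet34_10epochs')         # writes models/resnet34_10epochs.pth

# Separate offline inference notebook: rebuild an identical Learner, then:
learn.load('resnet34_10epochs')         # restore the trained weights
preds, _ = learn.get_preds(dl=test_dl)  # run inference on the test dataloader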

Following is my training pipeline:
df is a dataframe with the filename components and the target variables (encoded).

get_x = lambda x:f'{source}/train/{x.StudyInstanceUID}/{x.SeriesInstanceUID}/{x.SOPInstanceUID}.dcm'

vocab = ['pe_present_on_image', 'negative_exam_for_pe', 'indeterminate', 
         'rv_lv_ratio_gte_1', 'rv_lv_ratio_lt_1', # Only one label should be true at a time
         'chronic_pe', 'acute_and_chronic_pe', # Only one label can be true at a time
         'leftsided_pe', 'central_pe', 'rightsided_pe', # More than one label can be true at a time
         'qa_motion', 'qa_contrast', 'flow_artifact', 'true_filling_defect_not_pe'] # These are only informational. Maybe use it for study level inferences

get_y = ColReader(vocab) 

block = DataBlock(blocks=(ImageBlock(cls=PILDicom), MultiCategoryBlock(vocab=vocab, encoded=True)), 
                  get_x=get_x,
                  get_y=get_y,
                  batch_tfms=aug_transforms(size=224))

dls = block.dataloaders(df, bs=8, num_workers=0)

head = create_head(nf=1024, n_out=14, lin_ftrs=[256, 64], concat_pool=True)
config = cnn_config(custom_head=head)

learn = cnn_learner(dls, resnet34, config=config)
learn.fit_one_cycle(3, lr_max=0.05)
learn.save(file='resnet34_10epochs')

Now, the inference pipeline. In this case, df is a dataframe with filename columns only, no target variable columns. Also, loss_func in the training learner was auto-configured as BCEWithLogitsLoss()… so I used the same here as well.

get_x = lambda x:f'{source}/test/{x.StudyInstanceUID}/{x.SeriesInstanceUID}/{x.SOPInstanceUID}.dcm'

vocab = ['pe_present_on_image', 'negative_exam_for_pe', 'indeterminate', 
         'rv_lv_ratio_gte_1', 'rv_lv_ratio_lt_1', # Only one label should be true at a time
         'chronic_pe', 'acute_and_chronic_pe', # Only one label can be true at a time
         'leftsided_pe', 'central_pe', 'rightsided_pe', # More than one label can be true at a time
         'qa_motion', 'qa_contrast', 'flow_artifact', 'true_filling_defect_not_pe'] # These are only informational. Maybe use it for study level inferences

block = DataBlock(blocks=(ImageBlock(cls=PILDicom)), # , MultiCategoryBlock(vocab=vocab, encoded=True)
                  get_x=get_x,
                  batch_tfms=aug_transforms(size=224))
dls = block.dataloaders(df[:1000], bs=64, num_workers=0)
head = create_head(nf=1024, n_out=14, lin_ftrs=[256, 64], concat_pool=True)
config = cnn_config(custom_head=head)

learn = cnn_learner(dls, resnet34, config=config, n_out=14, pretrained=False, loss_func=nn.BCEWithLogitsLoss())
test_data = dls.test_dl(df)
preds = learn.get_preds(dl=test_data)

On the get_preds, I’m getting the following error:
RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[64, 1, 224, 224] to have 3 channels, but got 1 channels instead

I can interpret the issue: it is related to having only 1 channel instead of 3. But this is exactly the same DataBlock and dataloader setup as in the training process. How come this issue did not come up during training but does during inference?
Need some help urgently!
Thanks in anticipation.


Adding to the above post:
If it is what it is and I’m not doing anything incorrectly, then I suppose I’ll have to convert 1 channel to 3 channels somewhere in the process.
From the datablock summary (shared below), it seems that this needs to be done at the batch_tfms level. But I’m clueless about how to do that: I could not find any transform in the list that modifies the shape of the tensor.

Would be great if someone can help!

Setting-up type transforms pipelines
Collecting items from      StudyInstanceUID SeriesInstanceUID SOPInstanceUID
2668     16997c0ce2d9      1dab76457905   4f43708a0732
2669     16997c0ce2d9      1dab76457905   adaaeef49da2
2670     16997c0ce2d9      1dab76457905   f1ab8e058c53
2671     16997c0ce2d9      1dab76457905   fa5228210e24
2672     16997c0ce2d9      1dab76457905   28f6a41eb73f
...               ...               ...            ...
4958     018dbbbea69e      7c5b3db841d6   4f6f9b32ee00
4959     018dbbbea69e      7c5b3db841d6   f06ca93806ef
4960     018dbbbea69e      7c5b3db841d6   a39e84b22dcf
4961     018dbbbea69e      7c5b3db841d6   79c0b89bb9ce
4962     018dbbbea69e      7c5b3db841d6   d80591f64848

[306 rows x 3 columns]
Found 306 items
2 datasets of sizes 245,61
Setting up Pipeline: <lambda> -> PILDicom.create

Building one sample
  Pipeline: <lambda> -> PILDicom.create
    starting from
      StudyInstanceUID     16997c0ce2d9
SeriesInstanceUID    1dab76457905
SOPInstanceUID       be10058789c1
Name: 2696, dtype: object
    applying <lambda> gives
      ../input/rsna-str-pulmonary-embolism-detection/test/16997c0ce2d9/1dab76457905/be10058789c1.dcm
    applying PILDicom.create gives
      PILDicom mode=I;16 size=512x512

Final sample: (PILDicom mode=I;16 size=512x512,)


Setting up after_item: Pipeline: ToTensor
Setting up before_batch: Pipeline: 
Setting up after_batch: Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} -> Flip -- {'size': 224, 'mode': 'bilinear', 'pad_mode': 'reflection', 'mode_mask': 'nearest', 'align_corners': True, 'p': 0.5} -> Brightness -- {'max_lighting': 0.2, 'p': 1.0, 'draw': None, 'batch': False}

Building one batch
Applying item_tfms to the first sample:
  Pipeline: ToTensor
    starting from
      (PILDicom mode=I;16 size=512x512)
    applying ToTensor gives
      (TensorDicom of size 1x512x512)

Adding the next 3 samples

No before_batch transform to apply

Collating items in a batch

Applying batch_tfms to the batch built
  Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} -> Flip -- {'size': 224, 'mode': 'bilinear', 'pad_mode': 'reflection', 'mode_mask': 'nearest', 'align_corners': True, 'p': 0.5} -> Brightness -- {'max_lighting': 0.2, 'p': 1.0, 'draw': None, 'batch': False}
    starting from
      (TensorDicom of size 4x1x512x512)
    applying IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} gives
      (TensorDicom of size 4x1x512x512)
    applying Flip -- {'size': 224, 'mode': 'bilinear', 'pad_mode': 'reflection', 'mode_mask': 'nearest', 'align_corners': True, 'p': 0.5} gives
      (TensorDicom of size 4x1x224x224)
    applying Brightness -- {'max_lighting': 0.2, 'p': 1.0, 'draw': None, 'batch': False} gives
      (TensorDicom of size 4x1x224x224)

Hi @imnishantg

That’s an inconsistency in the batch shape.

When fastai downloads a pretrained model, it expects 3 channels as input (RGB). Besides that, it adds to the dataloader an after_batch Normalize transformation with the ImageNet stats.
That’s not the case when we load a saved model or create one with pretrained=False (as you did in the inference).
A dataloader with the after_batch normalization outputs 3 channels (for example [bs, 3, 224, 224]) because the 1-channel batch broadcasts against the 3-channel ImageNet stats, and that fits the model. Your original dataloader outputs just 1 channel.
I wrote a detailed answer with an example in the Kaggle notebook's comments:
https://www.kaggle.com/cordmaur/fastai2-medical-simple-training/comments
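
For illustration, a minimal sketch of adding that normalization to the inference DataBlock (it reuses get_x and df from the post above; this is an assumption, not the exact code from the linked notebook):

from fastai.vision.all import *
from fastai.medical.imaging import *  # provides PILDicom

block = DataBlock(blocks=(ImageBlock(cls=PILDicom),),
                  get_x=get_x,
                  batch_tfms=[*aug_transforms(size=224),
                              Normalize.from_stats(*imagenet_stats)])  # ImageNet (3-channel) stats
dls = block.dataloaders(df, bs=64, num_workers=0)
# The 1-channel DICOM batch broadcasts against the 3-channel stats,
# so batches come out as [bs, 3, 224, 224] and fit the pretrained stem.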

Hope that helps.


Thank you @cordmaur, the solution you detailed in the Kaggle link helped me after a couple of days of looking for a fix.