UNet: Size error for a custom dataset

OK, so I guess we need to check what out.shape and yb.shape are in

  File "/home/mehdi/miniconda3/envs/fai/lib/python3.6/site-packages/fastai/basic_train.py", line 39, in loss_batch
    loss = loss_func(out, *yb)

Can you do this? It is highly possible the problem comes from out, which would mean that your model outputs 5D tensors (with batch as the first dimension) for some reason. By the way, you can check learn.model(learn.data.one_batch()[0]).shape, which should output the same thing as out.shape.

For the following modification:

    out = model(*xb)
    for yy in yb: 
        print(yy.shape)
    print('Out shape before handler: {}'.format(out.shape))
    out = cb_handler.on_loss_begin(out)
    print('Out shape after handler: {}'.format(out.shape))

I get:

torch.Size([2, 1, 224, 224])
Out shape before handler: torch.Size([2, 3, 224, 224])
Out shape after handler: torch.Size([2, 3, 224, 224])

Also, running learn.model(learn.data.one_batch()[0]).shape returns the following shape: torch.Size([2, 3, 224, 224]), which seems consistent.

Indeed, it looks like it is completely working as intended. At this point I guess we’ll need to check what’s happening in PyTorch. Can you check what input.shape and target.shape are in nn.functional.cross_entropy and nn.functional.nll_loss? That will let us see whether the problem comes from fastai (probably in FlattenedLoss for some reason) or from somewhere in PyTorch.

What’s very strange is that inputs and targets are supposed to be flattened by fastai, and therefore have at most 2 dimensions. So there is no particular reason it shouldn’t work imo (I’m using a unet_learner with this loss myself and it works perfectly).
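For reference, here is a minimal sketch of the flattening I mean (not fastai’s actual code, just the shape manipulation it is expected to perform before calling cross_entropy):

    import torch
    import torch.nn.functional as F

    out = torch.randn(2, 3, 224, 224)                 # model output: [bs, n_classes, H, W]
    target = torch.randint(0, 3, (2, 1, 224, 224))    # mask: [bs, 1, H, W] with class ids 0..2

    # Move the class axis last, then flatten: input -> [bs*H*W, n_classes], target -> [bs*H*W]
    inp_flat = out.permute(0, 2, 3, 1).contiguous().view(-1, out.shape[1])
    tgt_flat = target.view(-1)

    loss = F.cross_entropy(inp_flat, tgt_flat)
    print(inp_flat.shape, tgt_flat.shape)             # torch.Size([100352, 3]) torch.Size([100352])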

That might be it!
Running print('Input shape: {} target shape: {}'.format(input.shape, target.shape)) in nn.functional.nll_loss returns Input shape: torch.Size([2, 3, 224, 224]) target shape: torch.Size([2, 1, 224, 224]). The target shape should have 3 channels, right?

Hmm, indeed. I’m used to PyTorch and I’m still getting used to fastai’s internals, which is why I’m getting lost in the pipeline. :wink:

No, the target is supposed to have 1 channel; in your case it should in theory contain integer values between 0 and 2 (inclusive). Is that the case?
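For example, a quick way to check (assuming fastai v1’s learn.data.one_batch()):

    # Sanity-check the target values on one batch (`learn` is the existing learner).
    x, y = learn.data.one_batch()
    print(y.shape, y.dtype)                  # expect torch.Size([bs, 1, 224, 224]), torch.int64
    print(y.min().item(), y.max().item())    # expect 0 and 2 for 3 classes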

It is indeed the case. For the max, min and dtype I get: Max target: 2, Min target: 0, Target dtype: torch.int64.

Looks fine to me. I’ll check some things in my notebook to see if there’s something different on my end.

Thanks a lot! Meanwhile, I’ll try to trace the error again and see if I missed something!

So after checking: for me, input and target are already flattened when they reach nll_loss, which is not the case for you, so there is probably something going wrong somewhere in fastai, as I suspected from the stack trace. There is indeed no call between loss = loss_func(out, *yb) and ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index), which is strange; there should be at least 3 intermediate calls. I don’t know why it happens though, so we’ll need to go a bit blindly. Can you try passing loss_func=CrossEntropyFlat(axis=1) to unet_learner? Maybe the one given in the dataset is not taken into account for some reason.
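Something along these lines (a sketch only; the data object and the resnet34 backbone are just placeholders for whatever you are using):

    from fastai.vision import *  # unet_learner, models, CrossEntropyFlat

    # Pass the loss explicitly instead of relying on the one defined in the dataset.
    learn = unet_learner(data, models.resnet34, loss_func=CrossEntropyFlat(axis=1))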

Nice! Indeed, it seems the loss function given through the dataset wasn’t picked up by the learner.

print(learn.loss_func)  # nll_loss
learn.loss_func = CrossEntropyFlat(axis=1)
print(learn.loss_func)  # FlattenedLoss of CrossEntropyLoss()

The LR finder seems to finish the first batch before returning the following error:

  File "/home/mehdi/miniconda3/envs/fai/lib/python3.6/site-packages/torch/nn/functional.py", line 1824, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
  RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /opt/conda/conda-bld/pytorch_1565287148058/work/aten/src/THNN/generic/ClassNLLCriterion.c:94

Tracing the error indicates that this time the tensors have 2 dimensions, which is progress. :wink:


Nice, this error I know! There is a mismatch between the size of input’s class dimension (which should be 3) and the maximum value of target. Can you check input.shape and target.max() (and target.shape as a sanity check)?
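For illustration, this error can be reproduced in plain PyTorch, independently of the rest of the pipeline:

    import torch
    import torch.nn.functional as F

    inp = torch.randn(4, 3)                    # 3 classes -> valid target values are 0, 1, 2
    bad_target = torch.tensor([0, 1, 2, 3])    # 3 is outside [0, n_classes) and triggers the assertion
    F.cross_entropy(inp, bad_target)           # fails: cur_target >= 0 && cur_target < n_classes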

Absolutely. In the dim == 2 branch, we have: Input shape: torch.Size([100352, 3]), Target shape: torch.Size([100352]), Target max: 3.
Should the input shape be transposed? Are tensors used to initialize the Image class supposed to have a shape of [H, W, C]?

No, they are supposed to be in [C, H, W] format, which seems to be what you are doing here. But somehow the target’s max is 3, while it’s supposed to be 2. Do you have 3 classes including background, or 3 classes + background? If it’s the latter, you should have self.c = 4 in your dataset class.
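As an aside, if a tensor ever arrives as [H, W, C], a simple permute gives the expected layout (generic PyTorch, unrelated to your dataset code):

    import torch

    hwc = torch.rand(224, 224, 3)        # [H, W, C] layout, e.g. straight from a numpy image
    chw = hwc.permute(2, 0, 1)           # -> [C, H, W], the layout fastai's Image / PyTorch expect
    print(chw.shape)                     # torch.Size([3, 224, 224])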

Yes, indeed, I am passing the images in the usual PyTorch style, [C, H, W].
Ha! I think that was it! I tried with 4 classes and it seems to have worked its way through the pipeline without raising any errors.
So to recap:

  • Pass the loss function directly to the learner
  • Ensure the number of classes is correct :wink:

Thanks a lot, my friend, you really saved my fastai membership. :wink: I might have drifted back to PyTorch without your help!
Next stop: a working segmentation model !


Nice! No problem mate, always happy to help :slight_smile: If you plan on continuing to use fastai, you should still at some point dig a bit deeper into the DataBlock API. It feels pretty unnatural at first when you are used to plain PyTorch, but it soon gets very handy and integrates better with the rest of fastai (for instance, you can pass the loss function to the dataset :stuck_out_tongue: ). Anyway, good luck with your future work with fastai!


That solved the issue for me on the error:
“only batches of spatial targets supported (3D tensors) but got targets of dimension: 4”

The code:
learn = unet_learner(data, models.resnet18, metrics=dice, loss_func=CrossEntropyFlat(axis = 1))

I had thought that unet_learner handled this out of the box.

Hi All

I’m participating in the Kaggle pulmonary embolism competition and facing a challenge writing the test pipeline. Kaggle requires a separate inference notebook for submission that has no train data and no internet connection… in case this is useful.
The idea is to train a model – save the model – load the model in a different file – do the inference.

The following is my training pipeline.
df is a dataframe with the filename columns and the (encoded) target variable columns.

get_x = lambda x:f'{source}/train/{x.StudyInstanceUID}/{x.SeriesInstanceUID}/{x.SOPInstanceUID}.dcm'

vocab = ['pe_present_on_image', 'negative_exam_for_pe', 'indeterminate', 
         'rv_lv_ratio_gte_1', 'rv_lv_ratio_lt_1', # Only one label should be true at a time
         'chronic_pe', 'acute_and_chronic_pe', # Only one label can be true at a time
         'leftsided_pe', 'central_pe', 'rightsided_pe', # More than one label can be true at a time
         'qa_motion', 'qa_contrast', 'flow_artifact', 'true_filling_defect_not_pe'] # These are only informational. Maybe use it for study level inferences

get_y = ColReader(vocab) 

block = DataBlock(blocks=(ImageBlock(cls=PILDicom), MultiCategoryBlock(vocab=vocab, encoded=True)), 
                  get_x=get_x,
                  get_y=get_y,
                  batch_tfms=aug_transforms(size=224))

dls = block.dataloaders(df, bs=8, num_workers=0)

head = create_head(nf=1024, n_out=14, lin_ftrs=[256, 64], concat_pool=True)
config = cnn_config(custom_head=head)

learn = cnn_learner(dls, resnet34, config=config)
learn.fit_one_cycle(3, lr_max=0.05)
learn.save(file='resnet34_10epochs')

Now for the inference pipeline. In this case, df is a dataframe with filename columns only, no target variable columns. Also, loss_func in the training learner was automatically set to BCEWithLogitsLoss()… so I used the same here as well.

get_x = lambda x:f'{source}/test/{x.StudyInstanceUID}/{x.SeriesInstanceUID}/{x.SOPInstanceUID}.dcm'

vocab = ['pe_present_on_image', 'negative_exam_for_pe', 'indeterminate', 
         'rv_lv_ratio_gte_1', 'rv_lv_ratio_lt_1', # Only one label should be true at a time
         'chronic_pe', 'acute_and_chronic_pe', # Only one label can be true at a time
         'leftsided_pe', 'central_pe', 'rightsided_pe', # More than one label can be true at a time
         'qa_motion', 'qa_contrast', 'flow_artifact', 'true_filling_defect_not_pe'] # These are only informational. Maybe use it for study level inferences

block = DataBlock(blocks=(ImageBlock(cls=PILDicom)), # , MultiCategoryBlock(vocab=vocab, encoded=True)
                  get_x=get_x,
                  batch_tfms=aug_transforms(size=224))
dls = block.dataloaders(df[:1000], bs=64, num_workers=0)
head = create_head(nf=1024, n_out=14, lin_ftrs=[256, 64], concat_pool=True)
config = cnn_config(custom_head=head)

learn = cnn_learner(dls, resnet34, config=config, n_out=14, pretrained=False, loss_func=nn.BCEWithLogitsLoss())
test_data = dls.test_dl(df)
preds = learn.get_preds(dl=test_data)
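(As an aside, the “load the model” step from the plan above isn’t shown here; assuming fastai v2 and that the .pth saved during training sits under the learner’s models directory, it would be roughly:)

    # Hypothetical: load the weights saved earlier with learn.save(file='resnet34_10epochs')
    learn = learn.load('resnet34_10epochs')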

On the get_preds, I’m getting the following error:
RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[64, 1, 224, 224] to have 3 channels, but got 1 channels instead

I can interpret the issue: it is related to having only 1 channel instead of 3 channels. But this is exactly the same DataBlock and dataloader setup as in the training process. How come this issue did not come up during training but does come up during inference?
I need some help urgently!
Thanks in anticipation.


Adding to the above post:
If it is what it is and I’m not doing anything incorrectly, then I suppose I’ll have to convert 1 channel to 3 channels somewhere in the process.
From looking at the DataBlock summary (shared below), it seems that it needs to be done at the batch_tfms level. But I’m clueless about how to do that. I could not find any transform in the list that modifies the structure of the tensor.

Would be great if someone can help!

Setting-up type transforms pipelines
Collecting items from      StudyInstanceUID SeriesInstanceUID SOPInstanceUID
2668     16997c0ce2d9      1dab76457905   4f43708a0732
2669     16997c0ce2d9      1dab76457905   adaaeef49da2
2670     16997c0ce2d9      1dab76457905   f1ab8e058c53
2671     16997c0ce2d9      1dab76457905   fa5228210e24
2672     16997c0ce2d9      1dab76457905   28f6a41eb73f
...               ...               ...            ...
4958     018dbbbea69e      7c5b3db841d6   4f6f9b32ee00
4959     018dbbbea69e      7c5b3db841d6   f06ca93806ef
4960     018dbbbea69e      7c5b3db841d6   a39e84b22dcf
4961     018dbbbea69e      7c5b3db841d6   79c0b89bb9ce
4962     018dbbbea69e      7c5b3db841d6   d80591f64848

[306 rows x 3 columns]
Found 306 items
2 datasets of sizes 245,61
Setting up Pipeline: <lambda> -> PILDicom.create

Building one sample
  Pipeline: <lambda> -> PILDicom.create
    starting from
      StudyInstanceUID     16997c0ce2d9
SeriesInstanceUID    1dab76457905
SOPInstanceUID       be10058789c1
Name: 2696, dtype: object
    applying <lambda> gives
      ../input/rsna-str-pulmonary-embolism-detection/test/16997c0ce2d9/1dab76457905/be10058789c1.dcm
    applying PILDicom.create gives
      PILDicom mode=I;16 size=512x512

Final sample: (PILDicom mode=I;16 size=512x512,)


Setting up after_item: Pipeline: ToTensor
Setting up before_batch: Pipeline: 
Setting up after_batch: Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} -> Flip -- {'size': 224, 'mode': 'bilinear', 'pad_mode': 'reflection', 'mode_mask': 'nearest', 'align_corners': True, 'p': 0.5} -> Brightness -- {'max_lighting': 0.2, 'p': 1.0, 'draw': None, 'batch': False}

Building one batch
Applying item_tfms to the first sample:
  Pipeline: ToTensor
    starting from
      (PILDicom mode=I;16 size=512x512)
    applying ToTensor gives
      (TensorDicom of size 1x512x512)

Adding the next 3 samples

No before_batch transform to apply

Collating items in a batch

Applying batch_tfms to the batch built
  Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} -> Flip -- {'size': 224, 'mode': 'bilinear', 'pad_mode': 'reflection', 'mode_mask': 'nearest', 'align_corners': True, 'p': 0.5} -> Brightness -- {'max_lighting': 0.2, 'p': 1.0, 'draw': None, 'batch': False}
    starting from
      (TensorDicom of size 4x1x512x512)
    applying IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} gives
      (TensorDicom of size 4x1x512x512)
    applying Flip -- {'size': 224, 'mode': 'bilinear', 'pad_mode': 'reflection', 'mode_mask': 'nearest', 'align_corners': True, 'p': 0.5} gives
      (TensorDicom of size 4x1x224x224)
    applying Brightness -- {'max_lighting': 0.2, 'p': 1.0, 'draw': None, 'batch': False} gives
      (TensorDicom of size 4x1x224x224)

Hi @imnishantg

That’s an inconsistency with the batch shape.

When fastai downloads a pretrained model, it expects 3 channels as input (RGB). Besides that, it adds an after_batch Normalize transformation with the ImageNet stats to the DataLoader.
That’s not true when you load a saved model or create one with pretrained=False (as you did in the inference).
The dataloader with the after_batch normalization outputs 3 channels (for example [bs, 3, 224, 224]), which fits the model. Your original dataloader outputs just 1 channel.
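Just to illustrate the idea, a rough sketch (not necessarily what the linked notebook does) of adding that normalization to your inference DataBlock, reusing the names from your code (get_x, df, PILDicom):

    from fastai.vision.all import *
    from fastai.medical.imaging import *   # for PILDicom

    # Add the ImageNet normalization that cnn_learner(pretrained=True) would normally append
    # automatically; broadcasting against the 3-channel stats yields [bs, 3, 224, 224] batches.
    block = DataBlock(blocks=(ImageBlock(cls=PILDicom),),
                      get_x=get_x,
                      batch_tfms=aug_transforms(size=224) + [Normalize.from_stats(*imagenet_stats)])
    dls = block.dataloaders(df, bs=64, num_workers=0)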
I wrote a detailed answer with an example in the Kaggle notebook’s comments:
https://www.kaggle.com/cordmaur/fastai2-medical-simple-training/comments

Hope that helps.


Thank you @cordmaur, the solution you detailed in the Kaggle link helped me after a couple of days of trying to solve this.