How to use SegmentationDataLoaders correctly?

Hello, I’m new to fastai and I’m experimenting with it for a semantic segmentation application. From the tutorials, I understand that the suggested dataloader is SegmentationDataLoaders. It runs fine for loading and showing a batch:

dls = SegmentationDataLoaders.from_label_func(
path, bs=1, fnames=fnames, label_func=label_func2, codes=['Bkgd', 'Red'], shuffle_train=True)
dls.show_batch(max_n = 6, figsize=(12,12))

but then it fails when I try to fine-tune a model:

learn = unet_learner(dls, arch=resnet34)

which throws the following error:

RuntimeError: CUDA error: device-side assert triggered

[Full traceback omitted for brevity]
Snooping around the forum I found several references to similar (at least I guess) issues as in [1], [2], [3] but I still don’t get how to fix it. My understanding is that the problem is caused by the loader that reads the binary masks in [0, 255] format instead of [0, 1]. What I tried was:

  • pre-process the masks (divide by 255 as I only have 1 class)
  • save them with the correct [0, 1] format

but when I read them with the dataloader they are still in [0, 255] format.
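If the masks still come back as [0, 255], it can help to look at the raw pixel values directly before involving the dataloader. The sketch below uses a small synthetic NumPy array in place of a real mask file; the array, the `binarize_mask` helper, and the threshold of 127 are illustrative assumptions, not part of fastai. It shows one way to check the stored values and map them to class indices. Saving masks in a lossy format such as JPEG can reintroduce intermediate values, so a lossless format like PNG is the safer choice.

```python
import numpy as np


def binarize_mask(mask: np.ndarray, threshold: int = 127) -> np.ndarray:
    """Map a 0/255 mask to class indices 0/1 (hypothetical helper)."""
    return (mask > threshold).astype(np.uint8)


# Synthetic stand-in for a mask loaded from disk: background 0, object 255
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 255

raw_values = np.unique(mask)       # values as stored
binary = binarize_mask(mask)
binary_values = np.unique(binary)  # values after conversion

print(raw_values, binary_values)
```

Running `np.unique` on a handful of real masks, both before and after saving, is a cheap way to confirm which step leaves the values at 255.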

I also went through other suggested hacks but I didn’t manage to make them work (maybe they were outdated or I’m simply not good enough at coding :sweat_smile: ).

So my question is: as of today, what is the correct way to build a data loader for a segmentation task?

Thanks in advance and sorry for the long post!

You can use this transformation


You can check my blog, hope it helps.

Thanks for sharing! :slight_smile:
Unfortunately though I’m still not able to make it work. IntToFloatTensor(div_mask=255) works indeed and I end up with masks in [0, 1] format, but the learner still fails with the same error.
Unlike in part II of your blog, I created the learner without specifying the loss, i.e.:

learn = unet_learner(dls, arch=resnet34, n_out=1)
as opposed to:

learn = unet_learner(dls,resnet34,loss_func=lovasz_hinge,metrics=[meanapv1],n_out=1)

So my guess now is that either:

  • the problem was somehow caused by the default loss function
  • in my dataset there are some “empty” images for which the mask contains only 0s (which is the same format I get if I don’t convert using IntToFloatTensor(div_mask=255)). So it may be something related to that… (?)

Do you have any suggestion? If you wouldn’t mind sharing also the lovasz_hinge implementation you’re using I could try to see if that solves the problem…


The lovasz_hinge loss is available in the same repo. Can you provide a notebook that reproduces the error? That would help in debugging.

@VishnuSubramanian I tried with lovasz_hinge and now it seems to work (both using [0, 1] coming from the DataBlock and [0, 255] masks coming from SegmentationDataLoaders).

If I understood correctly, when I call unet_learner it tries to infer the loss from the dataloader, which in my case happens to end up as FlattenedLoss of CrossEntropyLoss(). At this point, I would be curious to understand why it doesn’t work with that, but I didn’t manage to debug further.

In case you may want to give it a try here’s the code:

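### Note: use fastai conda env: fastai==2.8.1, torch==1.8.1
from fastai.vision.all import *  # provides DataBlock, MaskBlock, unet_learner, resnet34, ...

IMG_PATH = Path().cwd().parent / 'dataset/red/v1.0/crops_512/images' #custom_path

# Batch transforms: div_mask=255 maps mask values from {0, 255} to {0, 1}
tfms = [IntToFloatTensor(div_mask=255), Flip(),
#         Brightness(0.1, p=0.25), Zoom(max_zoom=1.1, p=0.25), Normalize.from_stats(*imagenet_stats)
       ]

# Masks live in a sibling 'masks' folder with the same file names
def label_func(fname:Path): return str(fname).replace('images','masks')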

db = DataBlock(blocks=(ImageBlock(), MaskBlock()),
#                item_tfms=[Resize(size, pad_mode=PadMode.Border)],
               batch_tfms=tfms,
               get_items=get_image_files, get_y=label_func)

dls = db.dataloaders(source=IMG_PATH, bs=2)

learn = unet_learner(dls, arch=resnet34, n_out=1)
#FlattenedLoss of CrossEntropyLoss()


Thanks :slight_smile:

You can try out a small experiment to understand it. Take the output of the model and label tensor from the data loader and pass it to the loss function. You will understand what is going wrong.

Try with the default loss function, and then try with Binary cross entropy. I hope this experiment will help you understand what is happening.

Thanks a lot for the suggestion! I tried to do it, but I got stuck at some point. Basically, the default loss inferred is CrossEntropyLossFlat, which in turn falls back to torch.nn.CrossEntropyLoss(). So I tried what you said, and the error seems to happen somewhere in torch.masked_select(). Here’s a reproducible example and the full stack trace:

import torch
from fastai.vision.all import CrossEntropyLossFlat, nn

i = torch.randn(1, 1, 128, 128).random_(256).to('cuda')
t = torch.empty(1, 128, 128, dtype=torch.long).random_(2).to('cuda')

i.shape, t.shape

# loss = a
loss = nn.CrossEntropyLoss()
loss(i, t)

RuntimeError                              Traceback (most recent call last)
~/.local/lib/python3.7/site-packages/IPython/core/ in __call__(self, obj)
    700                 type_pprinters=self.type_printers,
    701                 deferred_pprinters=self.deferred_printers)
--> 702             printer.pretty(obj)
    703             printer.flush()
    704             return stream.getvalue()

~/.local/lib/python3.7/site-packages/IPython/lib/ in pretty(self, obj)
    400                         if cls is not object \
    401                                 and callable(cls.__dict__.get('__repr__')):
--> 402                             return _repr_pprint(obj, self, cycle)
    404             return _default_pprint(obj, self, cycle)

~/.local/lib/python3.7/site-packages/IPython/lib/ in _repr_pprint(obj, p, cycle)
    695     """A pprint that just redirects to the normal repr function."""
    696     # Find newlines and replace them with p.break_()
--> 697     output = repr(obj)
    698     for idx,output_line in enumerate(output.splitlines()):
    699         if idx:

~/anaconda3/envs/fastai/lib/python3.7/site-packages/torch/ in __repr__(self)
    191             return handle_torch_function(Tensor.__repr__, (self,), self)
    192         # All strings are unicode in Python 3.
--> 193         return torch._tensor_str._str(self)
    195     def backward(self, gradient=None, retain_graph=None, create_graph=False, inputs=None):

~/anaconda3/envs/fastai/lib/python3.7/site-packages/torch/ in _str(self)
    381 def _str(self):
    382     with torch.no_grad():
--> 383         return _str_intern(self)

~/anaconda3/envs/fastai/lib/python3.7/site-packages/torch/ in _str_intern(inp)
    356                     tensor_str = _tensor_str(self.to_dense(), indent)
    357                 else:
--> 358                     tensor_str = _tensor_str(self, indent)
    360     if self.layout != torch.strided:

~/anaconda3/envs/fastai/lib/python3.7/site-packages/torch/ in _tensor_str(self, indent)
    240         return _tensor_str_with_formatter(self, indent, summarize, real_formatter, imag_formatter)
    241     else:
--> 242         formatter = _Formatter(get_summarized_data(self) if summarize else self)
    243         return _tensor_str_with_formatter(self, indent, summarize, formatter)

~/anaconda3/envs/fastai/lib/python3.7/site-packages/torch/ in __init__(self, tensor)
     89         else:
---> 90             nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) &
     92             if nonzero_finite_vals.numel() == 0:

RuntimeError: CUDA error: device-side assert triggered

Do you have any ideas?

Ideally, it should be like this.

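i = torch.randn(8, 2, 128, 128, dtype=torch.float32).to('cuda')  # model output: one channel per class
t = torch.empty(8, 128, 128, dtype=torch.int64).random_(2).to('cuda')  # target: class indices 0 or 1
loss = nn.CrossEntropyLoss()
loss(i, t)

The model output cannot have a single channel with CrossEntropyLoss: it needs one channel per class, so 2 channels here.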
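To see why a single output channel fails, the experiment can be repeated on the CPU, where an out-of-range target raises a readable Python exception instead of a device-side assert. CUDA errors are reported asynchronously, which is probably why the traceback above points at the tensor repr and not at the loss call. This is a minimal sketch with random tensors standing in for the model output; the shapes and variable names are illustrative, and the exact exception type and message depend on the PyTorch version.

```python
import torch
from torch import nn

torch.manual_seed(0)
loss = nn.CrossEntropyLoss()

# Target with class indices 0 (background) and 1 (object)
target = torch.zeros(1, 8, 8, dtype=torch.long)
target[0, 2:5, 2:5] = 1

one_channel = torch.randn(1, 1, 8, 8)   # stands in for a model built with n_out=1
two_channels = torch.randn(1, 2, 8, 8)  # stands in for a model built with n_out=2

# With one channel the only valid class index is 0, so target value 1 is out of range
failed = False
try:
    loss(one_channel, target)
except (IndexError, RuntimeError) as e:
    failed = True
    print(type(e).__name__, e)

# With two channels every target value has a matching channel
value = loss(two_channels, target)
print(value.item())
```

Moving a failing batch to the CPU like this is often the quickest way to turn an opaque device-side assert into an actionable message.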

Thanks a lot! :smiley:

So the whole point was to change the model’s output channel size to 2 instead of 1. In fact, if I use learn = unet_learner(dls, arch=resnet34, n_out=2), then it also works.
Just to be sure I fully understood: that 2 comes from the fact that I have binary classification (0 - background, 1 - object), right?

Thank you very much, you’ve been extremely helpful :slight_smile: :slight_smile:

Yup, that’s right. You can also use Binary cross-entropy if you want to keep the n_out as 1.
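As a rough illustration of the binary cross-entropy option, the sketch below applies plain PyTorch `nn.BCEWithLogitsLoss` to a single-channel output. The tensors are random stand-ins, and the names are illustrative. The target has to be a float tensor with the same shape as the output, hence the `unsqueeze(1).float()`. The last lines check that this is numerically the same as cross-entropy over two channels whose first channel is fixed at zero, since sigmoid(z) equals the softmax of [0, z] for class 1.

```python
import torch
from torch import nn

torch.manual_seed(0)

logits = torch.randn(2, 1, 8, 8)         # stands in for a model built with n_out=1
target = torch.randint(0, 2, (2, 8, 8))  # class indices 0 or 1

# BCE expects a float target shaped like the logits: (N, 1, H, W)
bce = nn.BCEWithLogitsLoss()
bce_value = bce(logits, target.unsqueeze(1).float())

# Equivalent two-channel formulation: channel 0 is all zeros, channel 1 is the logit
two_channel = torch.cat([torch.zeros_like(logits), logits], dim=1)
ce_value = nn.CrossEntropyLoss()(two_channel, target)

print(bce_value.item(), ce_value.item())
```

fastai also provides BCEWithLogitsLossFlat, which could be passed as loss_func to unet_learner; whether the target needs extra reshaping there is something to verify on a real batch.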

Great, so to sum up the solution to the initial post:

The key points are basically 2:

  • masks must be in [0, 1, …, K-1], where K is the number of categories → so in case of [0, 255] format you can add IntToFloatTensor(div_mask=255) to the loader transformations
  • to make it trainable with the default CrossEntropyLossFlat() you must specify unet_learner(..., n_out=K)
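Both points can be checked before training with a small range test on the mask values. The sketch below is an assumption-laden illustration: `mask_is_valid` is a hypothetical helper, the array is synthetic, and the integer division only emulates what IntToFloatTensor(div_mask=255) is understood to do to masks (divide and keep integer class indices).

```python
import numpy as np


def mask_is_valid(mask: np.ndarray, n_classes: int) -> bool:
    """Return True if every mask value is a class index in [0, n_classes - 1]."""
    values = np.unique(mask)
    return bool(values.min() >= 0 and values.max() < n_classes)


K = 2  # number of categories, which is also the value to pass as n_out

raw = np.array([[0, 255], [255, 0]], dtype=np.uint8)  # mask as stored on disk
scaled = raw // 255                                   # emulates div_mask=255

raw_ok = mask_is_valid(raw, K)
scaled_ok = mask_is_valid(scaled, K)
print(raw_ok, scaled_ok)
```

With more than two classes stored as arbitrary gray levels, a single division is not enough and an explicit value-to-index mapping would be needed.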


from fastai.vision.all import *

tfms = [ IntToFloatTensor(div_mask=255) ]

dls = SegmentationDataLoaders.from_label_func(
    path, bs=2, fnames=get_image_files(path / 'images'), label_func=label_func,
    batch_tfms=tfms)

learn = unet_learner(dls, arch=resnet34, n_out=2)
learn.fine_tune(1, 1e-4)

The same result can be achieved with the DataBlock:

from fastai.vision.all import *
db = DataBlock(blocks=(ImageBlock(), MaskBlock()),
               batch_tfms=tfms,
               get_items=get_image_files, get_y=label_func)

dls = db.dataloaders(source=IMG_PATH, bs=1)

A still open point is how to include the codes argument in both approaches. For example, in my case adding labels for the 0 and 1 pixels causes show_batch not to display the overlaid masks:

dls = SegmentationDataLoaders.from_label_func(
    IMG_PATH.parent, bs=2, fnames=fnames, label_func=label_func,
    codes=['Bkgd', 'Cell'], batch_tfms=tfms)
# OR:
# db = DataBlock(blocks=(ImageBlock(), MaskBlock(codes=['Bkgd', 'Cell'])),
#                batch_tfms=tfms,
#                get_items=get_image_files, get_y=label_func)
# dls = db.dataloaders(source=IMG_PATH, bs=2)



Kudos to @VishnuSubramanian for helping!! :smiley:



In fact, using both codes and IntToFloatTensor works! You just need to specify the vmin and vmax parameters correctly in show_batch:

dls.show_batch(vmin=0, vmax=1)