Building an Image Data Loader with transforms on the input but not the target

These two posts helped me get closer to the right answer, but I’m still running into an error:


My goal:
Randomly mask a portion of one side of the input image and leave the target image unchanged, as in the screenshot below from data.summary(main_path, bs=8, show_batch=True). The output looks right, but it isn’t working with the learner:

My Code:

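import random
from PIL import ImageDraw
from fastai.vision.all import *

# Subclasses exist only so that type dispatch can tell input from target
class PILImageInput(PILImage): pass
class PILImageOutput(PILImage): pass

class crappifier(Transform):
    def encodes(self, img:PILImageInput):
        # Only PILImageInput is matched, so the target image is left untouched
        w,h = img.size
        coords = self.get_random_direction(w,h)
        ImageDraw.Draw(img).rectangle(coords, fill='green')
        return img

    def get_random_direction(self, w,h):
        # Pick one of the four sides and cover 20-50% of the image from it
        direction = random.randint(1, 4)
        perc = random.uniform(.2, .5)
        if direction == 1:  # top
            return [(0, 0), (w, h*perc)]
        if direction == 2:  # right
            return [(w-w*perc, 0), (w, h)]
        if direction == 3:  # bottom (top-left corner first; recent Pillow rejects y1 < y0)
            return [(0, h-h*perc), (w, h)]
        if direction == 4:  # left
            return [(0, 0), (w*perc, h)]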

data = DataBlock(blocks=(ImageBlock(cls=PILImageInput), ImageBlock()), 
                 get_items=get_image_files,
                 get_y=lambda x: main_path/x.name,
                 n_inp=1,
                 splitter=RandomSplitter(),
                 item_tfms=[Resize((400,300)), crappifier()]
                )

learn_gen = unet_learner(data.dataloaders(main_path), arch, wd=wd, norm_type=NormType.Weight,
                         self_attention=True, y_range=y_range, loss_func=loss_gen)

The error:
AssertionError: ‘n_out’ is not defined, and could not be inferred from data, set ‘dls.c’ or pass ‘n_out’

I’ve tried setting n_out to no avail, and I fear I am misunderstanding a fundamental component of DataBlocks or DataLoaders.

Any help would be much appreciated!

Progress… I set
n_out = 400 * 300

learn_gen = unet_learner(data.dataloaders(main_path), arch, n_out=n_out, wd=wd, norm_type=NormType.Weight, self_attention=True, y_range=y_range, loss_func=loss_gen)

and I now get a learner, but I hit a new error when I call learn_gen.fit_one_cycle(2, 1e-2).
The error is

... /opt/conda/lib/python3.7/site-packages/fastai/learner.py in add_cb(self, cb)
    107         cb.learn = self
    108         setattr(self, cb.name, cb)
--> 109         self.cbs.append(cb)
    110         return self
    111 

AttributeError: 'NoneType' object has no attribute 'append'

I saw someone else post this error, and it appeared to be a bug they corrected… but I’m hitting it again. Any help is much appreciated!

Thank you.

As mentioned in the thread you linked, for your first question you would do something like this (I see you’ve already figured it out):

class PILImageInput(PILImage):
    pass


def transform(img):
    if isinstance(img, PILImageInput):
        img = mask(img)
    return img


dblock = DataBlock(blocks=(ImageBlock(PILImageInput), ImageBlock),
                   item_tfms=[transform])
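
The isinstance check above is what keeps the target untouched. Here is a minimal plain-Python sketch of that idea without fastai. The Image, InputImage, and mask names are stand-ins made up for illustration, not fastai classes:

```python
class Image:
    """Stand-in for an image; holds a flat list of pixel values."""
    def __init__(self, pixels):
        self.pixels = list(pixels)

class InputImage(Image):
    """Marker subclass: same data, but identifies the model input."""

def mask(img):
    # Hypothetical masking step: zero out the first half of the pixels
    half = len(img.pixels) // 2
    return type(img)([0] * half + img.pixels[half:])

def transform(img):
    # Only the marker subclass is altered; every other type passes through
    if isinstance(img, InputImage):
        img = mask(img)
    return img

x, y = InputImage([1, 2, 3, 4]), Image([1, 2, 3, 4])
x_out, y_out = transform(x), transform(y)
print(x_out.pixels)  # [0, 0, 3, 4]
print(y_out.pixels)  # [1, 2, 3, 4]
```

Because the input and target differ only in their class, the same transform can be handed both items and still change just one of them.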

Regarding n_out, it loosely refers to the number of outputs the model produces. In a digit recognizer it would be 10 (zero through nine), in single-value regression it would be one, and so on. For the U-Net architecture it is the number of output channels, which is most likely three (RGB) in your case, rather than the 400 * 300 pixel count.
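
To make the difference concrete, here is a small plain-Python sketch. It assumes the target batch is laid out as (batch, channels, height, width), the usual PyTorch convention, and n_out_from_target is a hypothetical helper, not a fastai function:

```python
def n_out_from_target(target_shape):
    """Return the channel count of a (batch, channels, height, width) shape."""
    if len(target_shape) != 4:
        raise ValueError("expected (batch, channels, height, width)")
    return target_shape[1]

# A batch of 8 RGB targets; which of 400 and 300 is the height does not matter here
target_shape = (8, 3, 400, 300)

n_out = n_out_from_target(target_shape)
pixels_per_channel = target_shape[2] * target_shape[3]

print(n_out)               # 3
print(pixels_per_channel)  # 120000, the pixel count, which is not what n_out expects
```

Under that assumption, the learner call from the question would pass n_out=3 instead of 400 * 300.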

Have a nice one!

Thank you @BobMcDear!
Do you know why it wasn’t able to discern n_out from my data?
When I run into questions like these in the future, how would you recommend I figure out what n_out does and means?

When I look at the docs for unet_learner it just lists the parameters and the default values, but not what each parameter means. Clicking on the source code doesn’t help me much either. Is this just a matter of getting more accustomed to navigating source code?

I’m traveling this weekend, but I’ll mark your answer a solution when I return! Thank you again for the insight.

No problem!

Generally speaking, fastai is able to infer n_out just fine when the nature of your task is simple, like classification or regression.

However, as you found out, for more complex problems you need to nudge it in the right direction, because there are too many possible combinations for it to account for. What if you have more than one dependent variable? How about Siamese networks? Etc.

About the documentation, it says,

The model is built from arch using the number of final filters inferred from dls if possible (otherwise pass a value to n_out).

Filters is jargon for channels, so it is saying if your DataLoaders doesn’t know the number of final channels, you need to pass in n_out manually.
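
That fallback can be pictured roughly as follows. This is a simplified sketch, not fastai's actual source: resolve_n_out and FakeDataLoaders are hypothetical names, and the only thing taken from the thread is the idea that an explicit n_out wins, dls.c is tried next, and otherwise an assertion like the one in the question fires.

```python
class FakeDataLoaders:
    """Stand-in for a DataLoaders object; `c` is None when nothing can be inferred."""
    def __init__(self, c=None):
        self.c = c

def resolve_n_out(dls, n_out=None):
    # Prefer the explicit value, then whatever the data object reports
    if n_out is None:
        n_out = getattr(dls, "c", None)
    assert n_out, "n_out is not defined and could not be inferred; set dls.c or pass n_out"
    return n_out

print(resolve_n_out(FakeDataLoaders(c=10)))       # 10, inferred (e.g. a digit classifier)
print(resolve_n_out(FakeDataLoaders(), n_out=3))  # 3, passed by hand (e.g. RGB output)

try:
    resolve_n_out(FakeDataLoaders())
except AssertionError as err:
    print("failed:", err)
```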

And don’t be concerned about not fully understanding the source code. Most mainstream packages have helpful docs and an active community you can take your questions to, so you don’t need to figure everything out by yourself. But as time goes by, you will be able to do so if need be 🙂

Have an awesome weekend!