Create DataBlock with Two Input Images

I’m playing around with DataBlocks to understand how to use them. In my example I’m feeding in the same image twice, just to get two inputs working before moving on. I think this should work, but it doesn’t.

def get_x(r): 
    return path/r['image_name'], path/r["image_name"]

def get_y(r): 
    return r["class"]

dblock = DataBlock(blocks=([ImageBlock, ImageBlock], CategoryBlock),
                    get_x=get_x, 
                    get_y=get_y,
                    item_tfms=[Resize(224)])

dls = dblock.dataloaders(df, bs=24)

However, I get this message:

Could not do one pass in your dataloader, there is something wrong in it

Then when I try to run a network created from this dataloader, I eventually see this in the stack trace:

TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found <class 'pathlib.PosixPath'>

Which is what I am passing in with the get_x method, but shouldn’t that work?

You need to define the number of inputs too, i.e.:

dblock = DataBlock(blocks=(ImageBlock, ImageBlock, CategoryBlock),
                    get_x=get_x, 
                    get_y=get_y,
                    item_tfms=[Resize(224)],
                    n_inp=2)

(By the way, don’t pass the blocks in as two separate lists; use n_inp instead.) The blocks are treated as one flat list, so n_inp=2 says the first two are inputs and the rest are outputs.
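
To make the n_inp idea concrete, here is a rough sketch in plain Python of how a flat tuple of per-block items can be split into inputs and targets. This is not fastai’s actual implementation; the helper name split_inputs_targets and the sample values are made up for illustration.

```python
def split_inputs_targets(items, n_inp):
    """Split a flat tuple of items into (inputs, targets).

    Hypothetical helper for illustration only: it mimics the idea
    behind n_inp, not fastai's real code.
    """
    if not 0 < n_inp <= len(items):
        raise ValueError("n_inp must be between 1 and len(items)")
    return tuple(items[:n_inp]), tuple(items[n_inp:])


# One item per block, in the order (ImageBlock, ImageBlock, CategoryBlock).
sample = ("img_a", "img_b", "Normal")
inputs, targets = split_inputs_targets(sample, n_inp=2)
```

With n_inp=2 the first two items end up as inputs and the class label is the only target.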

But now I get:

FileNotFoundError: [Errno 2] No such file or directory: 'Normal'

Which is one of my class names.

Note that for more than one input/target, you can’t use get_x/get_y. You have to pass a list of getters instead (which should have three things: the getter for the first image, the getter for the second image, and then the getter for your target).
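
As a rough sketch of what "one getter per block" means, the snippet below uses plain functions and a plain dict standing in for a DataFrame row. The folder name, file name and function names are placeholders, not anything from the original data.

```python
from pathlib import Path

path = Path("data")  # placeholder root folder (an assumption)


def get_image_1(r):
    # Getter for the first ImageBlock
    return path / r["image_name"]


def get_image_2(r):
    # Getter for the second ImageBlock (same image, as in the question)
    return path / r["image_name"]


def get_label(r):
    # Getter for the CategoryBlock
    return r["class"]


# One getter per block, in the same order as the blocks
getters = [get_image_1, get_image_2, get_label]

row = {"image_name": "scan_001.png", "class": "Normal"}  # stand-in for a DataFrame row
sample = tuple(g(row) for g in getters)
```

A list like this would be passed as getters=getters to the DataBlock in place of get_x and get_y.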

You wouldn’t happen to know of an example of this off the top of your head, would you? I found the example in the documentation with the bounding boxes, but it isn’t clear to me how to adapt it to this kind of scenario.

When in doubt, and it doesn’t look like fastai has it, check my repo, walk with fastai2. The Bengali.AI notebook there is an example using 4 getters (3 labels and an input image) via ColReaders.

Ok, I think I’m getting close here…

getters = [
           ColReader('image_name', pref=path),
           ColReader('image_name', pref=path),
           ColReader('class'),
]

dblock = DataBlock(blocks=(ImageBlock, ImageBlock, CategoryBlock),
                    getters=getters, 
                    item_tfms=[Resize(224)], n_inp=2)

dls = dblock.dataloaders(df, bs=24)

However, now I get this error:

TypeError: forward() takes 2 positional arguments but 3 were given

The example you gave definitely got me a lot farther, but unfortunately it only uses one input and I have two, which is where I seem to be stumbling. Also, I’ll have to go through those notebooks, they look really great!

The issue now is your model. You’ll need to make a model that accepts 2 separate images. (The best way is to grab one batch of data, look at it, and play with your model.) What does your model look like?
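
For what it’s worth, the error can be reproduced without fastai at all. As far as I understand, the inputs of a batch are unpacked into the model call, so a forward that only takes one input receives self plus two images. The classes below are plain-Python stand-ins (hypothetical names, strings in place of tensors):

```python
class OneInputModel:
    """Stand-in for a model whose forward takes a single input."""

    def forward(self, x):
        return x


class TwoInputModel:
    """Stand-in for a model whose forward takes two inputs."""

    def forward(self, x1, x2):
        return (x1, x2)


# Placeholders for the two image tensors that make up the inputs of a batch
xb = ("image_batch_1", "image_batch_2")

try:
    OneInputModel().forward(*xb)  # self + 2 inputs = 3 positional arguments
    error_message = None
except TypeError as e:
    error_message = str(e)

result = TwoInputModel().forward(*xb)  # works: forward accepts both inputs
```

So the fix is on the model side: its forward needs one parameter per input.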

ooohhhh… yeah. LOL I’m just using the xresnet pretrained model. Is it possible to use pretrained models in this scenario do you think? I wonder if it’s possible to use two separate xresnet models? One for each image. Anyhow, I’ll have to think about this. Let me know if you have any thoughts on this.

You need a model that accepts two inputs. It can be built from a pretrained model; you then pass those two inputs to it.

I think I was overcomplicating the problem. Here is the model that works:

class SiameseModel(Module):
    def __init__(self, encoder, head):
        self.encoder, self.head = encoder, head

    def forward(self, x1, x2, x3, x4):
        ftrs = torch.cat([self.encoder(x1), self.encoder(x2), self.encoder(x3), self.encoder(x4)], dim=1)
        return self.head(ftrs)


def loss_func(out, targ):
    return CrossEntropyLossFlat()(out, targ.long())


def siamese_splitter(model):
    return [params(model.encoder), params(model.head)]

encoder = create_body(densenet201, cut=-1)
head = create_head(15360, 2, ps=0.3)
model = SiameseModel(encoder, head)

learn = Learner(dls, model, loss_func=loss_func,
                splitter=siamese_splitter, metrics=accuracy).to_fp16()

learn.freeze()
learn.fit_one_cycle(5, 1e-3)
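
A note on the 15360 passed to create_head, since it looks like a magic number. The arithmetic below is my reading of it, not something stated in the post: it assumes the densenet201 body outputs 1920 feature channels, that the four encoded images are concatenated on the channel axis, and that concat pooling (average pool plus max pool) doubles that count. Whether create_head expects the doubled number or doubles it internally depends on the fastai version, so check yours.

```python
# Assumed values, not taken from the post:
encoder_channels = 1920   # feature channels of the densenet201 body
n_images = 4              # forward() takes x1..x4 and concatenates on dim=1
concat_pool_factor = 2    # concat pooling = average pool + max pool

# Number of features the head's first linear layer would see
head_inputs = encoder_channels * n_images * concat_pool_factor
```

With only two input images, as in the original question, the same reasoning would give half that number.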

This is awesome.

I wonder what limitations it has. More precisely: I have a segmentation problem with 480x480 px images and three (!) binary masks to predict, one for each of 3 different classes.

So far I have addressed this by creating 3 different segmentation models, but that, of course, does not take possible interdependencies between my 3 classes into consideration.

On the other hand, having batches with 3 masks instead of 1 in memory should greatly reduce the maximum batch size, possibly down to bs=1.

Has anyone tried this before? Other comments appreciated, too.

I guess the three masks can overlap each other? If not, you could merge them into one mask with 3 classes.
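
A minimal sketch of the merging idea, using nested lists instead of real image arrays so it runs anywhere. The function name merge_masks is made up, and it only applies when no pixel belongs to more than one class (it raises otherwise):

```python
def merge_masks(masks):
    """Merge binary masks (lists of rows of 0/1) into one label mask.

    Pixel value 0 is background, k means class k (1-based position in masks).
    Raises ValueError if any pixel is set in more than one mask.
    """
    height, width = len(masks[0]), len(masks[0][0])
    merged = [[0] * width for _ in range(height)]
    for label, mask in enumerate(masks, start=1):
        for i in range(height):
            for j in range(width):
                if mask[i][j]:
                    if merged[i][j]:
                        raise ValueError(f"overlap at pixel ({i}, {j})")
                    merged[i][j] = label
    return merged


# Three tiny 2x2 binary masks that do not overlap
m1 = [[1, 0], [0, 0]]
m2 = [[0, 1], [0, 0]]
m3 = [[0, 0], [1, 0]]
merged = merge_masks([m1, m2, m3])
```

If the masks do overlap, this approach loses information and the masks have to stay separate.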

In my experience one model with multiple heads (I’ve only used it for classification, not for segmentation) works as well as separate models. But I’d avoid super small batch sizes.

I’d just run some experiments with a subset of the data - different models / heads / batch sizes / resolutions to get a feeling what will or won’t work.

Thanks Florian for the advice!

Yes, I do have pixels that belong to 2 classes. I think I’ll just try it out.

Hello,
I work with a set of grayscale images (CT scans). I would like to concatenate 3 successive images into one tensor of shape 3x256x256 and put it in an ImageBlock. Do you have any ideas? Sorry, I’m a newbie!

One simple option is to preprocess the data: run a Python script that converts all the given images to the format you want and stores the results.
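
The grouping part of such a script could look roughly like this. It only builds the runs of 3 successive slices; loading each file and stacking the three slices along the channel axis is left to whatever image library you use. The function name and file names are hypothetical.

```python
def successive_triplets(slices):
    """Return every run of 3 consecutive slices as a tuple."""
    return [tuple(slices[i:i + 3]) for i in range(len(slices) - 2)]


# Hypothetical slice files, already sorted by position in the scan
names = ["s0.png", "s1.png", "s2.png", "s3.png"]
triplets = successive_triplets(names)
```

Each triplet would become one 3-channel image on disk, which a regular ImageBlock can then read.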

Hey Zach, this is a really interesting thread. Thanks for your awesome input. Say we actually want to have 2 inputs: the first is an image, so an ImageBlock works here, and the 2nd is just text. For simplicity, let’s assume it’s a single word that describes the image, so I’m assuming we could use a CategoryBlock with n_inp=2. I guess my silly question is: is there an architecture that accepts both images and text as input? If not, what are the alternatives here? Many many thanks!
