Detectron2 anyone?

There may be some issues I’m not aware of but since you’re rewriting create_batch and show_batch, why not put all the special behavior there?

Can you elaborate on “put all special behavior there”? Specifically, what would create_batch return in this case?

The problem I was having is that if create_batch returns a List/Tuple, the rest of the pipeline fails to work.

Another problem I’m currently facing (and this happens even with the Buckets) is that the type of my batch is Bucket, Bucket, which makes type dispatch hard. I’m trying two solutions here: either type dispatching based on the type of the first element of the Bucket, or dynamically creating custom Bucket types, like “ImageTensorBucket”. The second solution worked well, but since the types are created dynamically, pickling them does not work out of the box (and pickling is needed for dataloader multiprocessing).
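
The first option, dispatching on the type of the first element, could look roughly like the sketch below. It uses functools.singledispatch from the standard library rather than fastai’s own type dispatch, and Bucket, describe and _describe_first are hypothetical names, with Bucket being only a stand-in for the real container.

```python
from functools import singledispatch


class Bucket(list):
    """Hypothetical stand-in for the Bucket container (a list of same-typed items)."""


@singledispatch
def _describe_first(first, bucket):
    # Fallback used when no implementation is registered for type(first)
    return f"generic bucket with {len(bucket)} items"


@_describe_first.register(str)
def _(first, bucket):
    return f"text bucket with {len(bucket)} items"


@_describe_first.register(int)
def _(first, bucket):
    return f"label bucket with {len(bucket)} items"


def describe(bucket):
    # Dispatch on the first element's type, since every batch part is a Bucket
    return _describe_first(bucket[0], bucket)


print(describe(Bucket(["a", "b"])))
print(describe(Bucket([1, 2, 3])))
print(describe(Bucket([1.5])))
```

One caveat of this approach is that an empty Bucket has no first element to dispatch on, so that case would need separate handling.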

I don’t understand that part, create_batch is the thing that returns the batch, so there is nothing else after. That you would need a special type for handling batch transforms, that I get, but you shouldn’t have to monkey-patch other functions.

As for introducing custom new types, they should be exported in the module if you want pickling to work.
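
A minimal sketch of why exporting matters, using only the standard library: pickle stores a class by its module and name and looks it up again on load, so a type built at runtime has to be reachable as a module attribute. Bucket and make_bucket_type are hypothetical names used for illustration.

```python
import pickle
import sys


class Bucket(list):
    """Hypothetical stand-in for the Bucket container."""


def make_bucket_type(name):
    # Build a Bucket subclass at runtime
    cls = type(name, (Bucket,), {"__module__": __name__})
    # Export it as a module attribute: pickle stores only the module and class
    # name, and looks the class up again by that name when loading
    setattr(sys.modules[__name__], name, cls)
    return cls


ImageTensorBucket = make_bucket_type("ImageTensorBucket")

restored = pickle.loads(pickle.dumps(ImageTensorBucket([1, 2, 3])))
print(type(restored).__name__, list(restored))
```

With the spawn start method each worker process imports the module again, so the types would presumably have to be created at import time for the lookup to succeed there as well.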

You’re right, the training process almost works flawlessly if we simply return a Tuple. The only minor issue is that Recorder calls find_bs on the tuple when it accumulates validation losses, but we can patch find_bs directly for that, so let’s forget about it.

The problems start to appear in the “predict” methods. I think it’s going to be more useful if I just show you the problem instead of trying to explain it, so here is the first one, which happens in learn.get_preds:

dl = self.dls.test_dl([item], num_workers=0)
inp,preds,_,dec_preds = self.get_preds(dl=dl, with_input=True, with_decoded=True)
TypeError                                 Traceback (most recent call last)
<ipython-input-20-4149556023ad> in <module>
      1 dl = self.dls.test_dl([item], num_workers=0)
----> 2 inp,preds,_,dec_preds = self.get_preds(dl=dl, with_input=True, with_decoded=True)

~/git/fastai2/fastai2/ in get_preds(self, ds_idx, dl, with_input, with_decoded, with_loss, act, inner, reorder, **kwargs)
    235                 res[pred_i] = act(res[pred_i])
    236                 if with_decoded: res.insert(pred_i+2, getattr(self.loss_func, 'decodes', noop)(res[pred_i]))
--> 237             if reorder and hasattr(dl, 'get_idxs'): res = nested_reorder(res, tensor(idxs).argsort())
    238             return tuple(res)

~/git/fastai2/fastai2/ in nested_reorder(t, idxs)
    613     "Reorder all tensors in `t` using `idxs`"
    614     if isinstance(t, (Tensor,L)): return t[idxs]
--> 615     elif is_listy(t): return type(t)(nested_reorder(t_, idxs) for t_ in t)
    616     if t is None: return t
    617     raise TypeError(f"Expected tensor, tuple, list or L but got {type(t)}")

~/git/fastai2/fastai2/ in <genexpr>(.0)
    613     "Reorder all tensors in `t` using `idxs`"
    614     if isinstance(t, (Tensor,L)): return t[idxs]
--> 615     elif is_listy(t): return type(t)(nested_reorder(t_, idxs) for t_ in t)
    616     if t is None: return t
    617     raise TypeError(f"Expected tensor, tuple, list or L but got {type(t)}")

~/git/mantisshrimp/mantisshrimp/data/ in nested_reorder2(t, idxs)
     83     if isinstance(t, Bucket):
     84         return t[idxs]
---> 85     return _old_nested_reorder(t, idxs)
     86 fastai2.torch_core.nested_reorder = nested_reorder2

~/git/fastai2/fastai2/ in nested_reorder(t, idxs)
    615     elif is_listy(t): return type(t)(nested_reorder(t_, idxs) for t_ in t)
    616     if t is None: return t
--> 617     raise TypeError(f"Expected tensor, tuple, list or L but got {type(t)}")
    619 # Cell

TypeError: Expected tensor, tuple, list or L but got <class 'dict'>

The main culprit of the problem here is is_listy.

After this problem is solved, a further problem is going to be encountered inside learn.predict, here:

dec = self.dls.decode_batch(inp + tuplify(dec_preds))[0]

tuplify ultimately relies on is_iter, so our dec_preds will not end up wrapped in a tuple, and wrapping them is what we need.
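
To illustrate the wrapping issue, here is a pure-Python sketch. tuplify_sketch is a simplified stand-in that only mimics the behaviour described above (iterables are expanded, everything else is wrapped); it is not the fastcore implementation. wrap_pred is a hypothetical helper, and the dict of predictions is an assumed example.

```python
def tuplify_sketch(o):
    # Simplified stand-in: strings are wrapped, any other iterable is expanded
    if isinstance(o, (str, bytes)):
        return (o,)
    try:
        return tuple(o)
    except TypeError:
        # Not iterable, so wrap the single object
        return (o,)


def wrap_pred(o):
    # Hypothetical helper: treat a dict as one prediction and wrap it explicitly
    return (o,) if isinstance(o, dict) else tuplify_sketch(o)


dec_preds = {"boxes": [[0, 0, 10, 10]], "labels": [1]}

print(tuplify_sketch(dec_preds))  # the dict is expanded into its keys
print(wrap_pred(dec_preds))       # the dict stays intact inside a 1-tuple
```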

This is what led me to this crazy Bucket solution, which required a lot of monkey-patching… I would love a simpler solution if you think it’s possible =)

The problem is that in that solution we’re creating types at runtime =/

But forget about this for now, not important compared to the other problems

My DataBlock is easier, it is just a dict.

I am using it just for the target and keeping ImageBlock for the input.

It works in a pretty straightforward way. I am not using transforms like resize and normalize; that is done in the first layer of torchvision.models.detection.maskrcnn_resnet50_fpn.

Can you share your implementation?

Does your data block allow for variable-sized images? (Without the need to collate them with the help of Resize)

It is working for variable image sizes. All the work is done in a subclass of Learner.

Ow, I’m so sorry, I forgot to answer this.

I don’t have telegram, but I really like the forum system we have here, if you like, we can create a thread only for discussing this project, with the added benefit of being open to anyone that wants to collaborate =)

In the previous answer you have a topic that was created for that purpose. I have been posting my progress there.

So I believe this is your current datablock?

maskrccnnDataBlock = DataBlock(
    blocks=(ImageBlock, MaskRCNNBlock),

Does it work if you remove the resize from item_tfms?

If the problem is only in predict, I’d create a subclass of Learner and deal with predict (or get_preds if the problem comes from there) by overriding the method, like we do for the language model (predict) or for tabularLearner (get_preds).

While your Bucket solution works, I feel it’s a lot of magic :wink:

There is, I also don’t like it

Alright then! I’ll try this solution today then and report how it goes, thank you a lot!!

Yes, it works if you remove it. I also suggest you change the MaskRCNN class name to solve the num_workers issue.

The only work left is finding the place where values are passed to the metrics, so that I can extract the mask from the dict and use the segmentation metrics.

I was given some intuition here:

However, I don’t understand where I should make the changes that he suggests.

I think he suggests that I create new metrics that make this change.

Are your images already of the same size? I’m failing to see where you modified the collate function of your dataloader, and without changing that it cannot collate images of different sizes

No @lgvaz, my images are of different sizes.

In my learner subclass I append all TensorImages to a list and feed into the model. I didn’t override or modify collate function!

I see that, but the dataloader should not even work if the images are of different sizes, before the learner gets to say anything. This is what I’m failing to understand.

Can you explain how, in your case, a batch is collated when, let’s say, you have an image of size (224, 224) and another of size (512, 512)?

My bad, I looked at the dataloader line and batch_size is set to 1. This was a problem that I had and forgot about.

The PyTorch Forums list some collate functions:

The most common solution is the following:

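import torch

def my_collate(batch):
    # Keep the inputs in a plain list so that differently sized images are not stacked
    data = [item[0] for item in batch]
    # Targets are converted to one tensor; this assumes integer labels
    target = [item[1] for item in batch]
    target = torch.LongTensor(target)
    return [data, target]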

We need to override the batch collate function that the fastai library is using.

Makes sense! =)

A simple zip collate function like the one I described before works great:

def _zip_collate(t): return Tuple(zip(*t))
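
As a plain-Python illustration of what the zip collate does, the sketch below uses the built-in tuple instead of fastai’s Tuple, and nested lists as stand-ins for image tensors of different sizes. zip_collate and the sample values are made up for the example.

```python
def zip_collate(samples):
    # Transpose a list of (input, target) samples into (inputs, targets)
    # without stacking, so each input can keep its own size
    return tuple(zip(*samples))


# Two "images" of different sizes (2x2 and 1x3), each with a dict target
samples = [
    ([[0, 1], [2, 3]], {"labels": [1]}),
    ([[4, 5, 6]], {"labels": [2, 3]}),
]

images, targets = zip_collate(samples)
print(len(images), len(targets))
print(targets[1]["labels"])
```

Because nothing is stacked, whatever consumes the batch has to accept a sequence of images rather than a single tensor.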

I’m working on that rn

Where do you pass that function??