Instance segmentation

To get Torchvision's Mask R-CNN working with the Learner class, I think the proper way is to subclass Learner.

I have created a topic explaining the concerns and problems I am struggling with.

Did you have time to look at it? If not, no problem. Thank you for all your help. :smiley:

I am trying to create a new block for Mask R-CNN, and it's working.

However, the masks are not getting resized. You can take a look here.

UPDATE with my progress.

Getting near to a solution!

The DataLoader is working:

class MaskRCNN(dict):
    "A dict-based target type holding the boxes, labels and masks for Mask R-CNN."

    @classmethod
    def create(cls, dictionary):
        return cls(dictionary)

    def show(self, ctx=None, **kwargs):
        # Use the masks as the visual representation of the target.
        return show_image(self["masks"], ctx=ctx, **kwargs)

def MaskRCNNBlock():
    "A TransformBlock that builds the target dict and converts int masks to floats."
    return TransformBlock(type_tfms=MaskRCNN.create, batch_tfms=IntToFloatTensor)

def get_bbox(o):
    # Derive a bounding box from the mask: the extreme non-zero pixel positions.
    label_path = get_y_fn(o)
    mask = PILMask.create(label_path)
    pos = np.where(mask)
    xmin, xmax = np.min(pos[1]), np.max(pos[1])
    ymin, ymax = np.min(pos[0]), np.max(pos[0])
    return TensorBBox.create([xmin, ymin, xmax, ymax])

def get_bbox_label(o):
    # Binary segmentation: every object belongs to class 1 (0 is background).
    return TensorCategory([1])

def get_mask(o):
    label_path = get_y_fn(o)
    mask = PILMask.create(label_path)
    return TensorMask(image2tensor(mask))

def get_dict(o):
    # Build the per-image target dict that torchvision's Mask R-CNN expects.
    return {"boxes": get_bbox(o), "labels": get_bbox_label(o), "masks": get_mask(o)}
    

# x is the image file itself; y is the Mask R-CNN target dict.
getters = [lambda o: o, get_dict]

maskrccnnDataBlock = DataBlock(
    blocks=(ImageBlock, MaskRCNNBlock),
    get_items=partial(get_image_files,folders=[manual_name]),
    getters=getters,
    splitter=RandomSplitter(valid_pct=0.1,seed=2020),
    item_tfms=Resize((size,size)),
    batch_tfms=Normalize.from_stats(*imagenet_stats)
)
maskrccnnDataBlock.summary(path_images)
dls = maskrccnnDataBlock.dataloaders(path_images,bs=bs)

Testing whether the data works with the model:

b = dls.one_batch()

from torchvision.models.detection.mask_rcnn import *
model=maskrcnn_resnet50_fpn(num_classes=2,min_size=1002,max_size=1002)
model.train()
model = model.to("cuda")

# torchvision detection models expect a list of image tensors and a list
# of per-image target dicts.
image, target = b
images = [aux for aux in image]
targets = []
for i in range(len(target["masks"])):
    targets.append({"boxes": target["boxes"][i], "labels": target["labels"][i], "masks": target["masks"][i]})
# In train mode the model returns a dict of losses.
output = model(images, targets)
output

# In eval mode it returns predictions instead.
model.eval()
output = model(images)
output

This works, so I decided to create a subclass of Learner to make it compatible with the rest of the fastai library:

class Mask_RCNN_Learner(Learner):
    def __init__(self, dls, model, loss_func=None, opt_func=Adam, lr=defaults.lr, splitter=trainable_params, cbs=None,
                 metrics=None, path=None, model_dir='models', wd=None, wd_bn_bias=False, train_bn=True,
                 moms=(0.95,0.85,0.95)):
        super().__init__(dls, model, loss_func, opt_func, lr, splitter, cbs,
                 metrics, path, model_dir, wd, wd_bn_bias, train_bn,
                 moms)
      
    def all_batches(self):
        self.n_iter = len(self.dl)
        for o in enumerate(self.dl): self.one_batch(*o)

    def one_batch(self, i, b):
        self.iter = i
        try:
            self._split(b);                                  self('begin_batch')
            images = []
            for aux in self.xb:
                images.append(aux)
            targets = []
            # Index into the target dicts held in self.yb per image.
            for i in range(len(self.yb["masks"])):
                targets.append({"boxes": self.yb["boxes"][i], "labels": self.yb["labels"][i], "masks": self.yb["masks"][i]})
            loss_dict = self.model(images,targets);       self('after_pred')
            if len(self.yb) == 0: return
            loss = sum(loss for loss in loss_dict.values())
            self.loss = loss;                                self('after_loss')
            if not self.training: return
            self.loss.backward();                            self('after_backward')
            self.opt.step();                                 self('after_step')
            self.opt.zero_grad()
        except CancelBatchException:                         self('after_cancel_batch')
        finally:                                             self('after_batch')

    def _do_begin_fit(self, n_epoch):
        self.n_epoch,self.loss = n_epoch,tensor(0.);         self('begin_fit')

    def _do_epoch_train(self):
        try:
            self.dl = self.dls.train;                        self('begin_train')
            self.all_batches()
        except CancelTrainException:                         self('after_cancel_train')
        finally:                                             self('after_train')

    def _do_epoch_validate(self, ds_idx=1, dl=None):
        if dl is None: dl = self.dls[ds_idx]
        try:
            self.dl = dl;                                    self('begin_validate')
            with torch.no_grad(): self.all_batches()
        except CancelValidException:                         self('after_cancel_validate')
        finally:                                             self('after_validate')                                              
    
    @log_args(but='cbs')
    def fit(self, n_epoch, lr=None, wd=None, cbs=None, reset_opt=False):
        with self.added_cbs(cbs):
            if reset_opt or not self.opt: self.create_opt()
            if wd is None: wd = self.wd
            if wd is not None: self.opt.set_hypers(wd=wd)
            self.opt.set_hypers(lr=self.lr if lr is None else lr)

            try:
                self._do_begin_fit(n_epoch)
                for epoch in range(n_epoch):
                    try:
                        self.epoch=epoch;          self('begin_epoch')
                        self._do_epoch_train()
                        self._do_epoch_validate()
                    except CancelEpochException:   self('after_cancel_epoch')
                    finally:                       self('after_epoch')

            except CancelFitException:             self('after_cancel_fit')
            finally:                               self('after_fit')  
                
    def validate(self, ds_idx=1, dl=None, cbs=None):
        if dl is None: dl = self.dls[ds_idx]
        with self.added_cbs(cbs), self.no_logging(), self.no_mbar():
            self(_before_epoch)
            self._do_epoch_validate(ds_idx, dl)
            self(_after_epoch)
        return getattr(self, 'final_record', None)

Just to highlight the changes:

 self._split(b);                                  self('begin_batch')
            images = []
            for aux in self.xb:
                images.append(aux)
            targets = []
            for i in range(len(self.yb["masks"])):
                targets.append({"boxes": self.yb["boxes"][i], "labels": self.yb["labels"][i], "masks": self.yb["masks"][i]})
            loss_dict = self.model(images, targets);      self('after_pred')
            if len(self.yb) == 0: return
            loss = sum(loss for loss in loss_dict.values())

The learner construction:

from torchvision.models.detection.mask_rcnn import *
model=maskrcnn_resnet50_fpn(num_classes=2,min_size=1002,max_size=1002)
model.train()
model = model.to("cuda")
learn = Mask_RCNN_Learner(dls=dls, model=model,loss_func=nn.L1Loss(),
                wd=1e-1).to_fp16()
learn.fit_one_cycle(5, 1e-3)

Gives this error:

Traceback (most recent call last):
  File "/home/david/anaconda3/envs/seg/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/home/david/anaconda3/envs/seg/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class '__main__.MaskRCNN'>: it's not the same object as __main__.MaskRCNN

(The same traceback is repeated by each DataLoader worker.)

So, that's where I am stuck right now. If this works, I just need to figure out how to modify the metrics computation.

@WaterKnight
Such an error is related to pickling. A pickling error usually happens when pickle gets confused about the class you are trying to dump (e.g. you import a class, make a change to it, and then dump an instance of it, or your class has a conflicting name).

My best guess is that the error comes from a name conflict with your data class MaskRCNN.

I see you import the Mask R-CNN module from torchvision (i.e. from torchvision.models.detection.mask_rcnn import *). In this module there is also another class called MaskRCNN (see the source code), whose name conflicts with your data class MaskRCNN.

Try changing the class name to something else and see if the error goes away.
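
For illustration, a minimal sketch of the rename (the name MaskRCNNDict is just an example; anything that does not clash with torchvision's MaskRCNN class works):

# Rename the data class so it no longer shadows torchvision's MaskRCNN.
class MaskRCNNDict(dict):
    @classmethod
    def create(cls, dictionary):
        return cls(dictionary)

    def show(self, ctx=None, **kwargs):
        return show_image(self["masks"], ctx=ctx, **kwargs)

# Importing only the constructor (instead of `import *`) also keeps
# torchvision's MaskRCNN class out of the module namespace.
from torchvision.models.detection.mask_rcnn import maskrcnn_resnet50_fpn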

Thank you very much. It solved the issue.

MaskRCNN is working now in FastAI2! :smiley:

The remaining work is:

  • Adjust the input for the metrics
  • Solve issues with FP16. Some losses of the model are NaN in FP16 and not in FP32.
  • Improve show_result

For the metrics I was given some intuition:

However, I don't understand it.

You need to write a function that passes your yb["masks"] to the fastai metric function you want to use.
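
For example, a minimal sketch of such a wrapper (the names dict_metric and _get_masks are hypothetical, and it assumes predictions and targets arrive as the dicts built earlier in this thread):

def _get_masks(x):
    # Pull the mask tensor out of a Mask R-CNN target/prediction dict.
    return x["masks"] if isinstance(x, dict) else x

def dict_metric(metric_fn):
    # Wrap a standard fastai metric so it only ever sees the mask tensors.
    def _inner(preds, targs):
        return metric_fn(_get_masks(preds), _get_masks(targs))
    return _inner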

Ahhh, okay. Could you link me to the default function, please? So I can look at its code and override it.

I don't know which functions I should redefine, or which line of code calls the metric computation.

In that case all fastai metrics would work.


EDIT
@sgugger I am editing this post with a better explanation.

The Learner class can receive an array of metrics as a parameter. Somewhere in the fastai library there should be a function that passes the data (self.pred and self.yb) to each metric. I have been looking for this code inside learner.py and in the callbacks.

However, I didn't manage to find it. Could you post the function here, or a link that points to it, please?

I discovered what you were referring to.

I see two options to solve this issue:

  • Create different metrics that get the mask from the dict. However, that would mean duplicating code in each metric.
  • Override Recorder.after_batch, at line 417: for met in mets: met.accumulate(self.learn). Then there is no need for code duplication; see the sketch below.

Do you think it would be better to pass a deep copy of self.learn, or some other object whose pred and yb contain just the masks?

Which option looks better to you?
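
For what it's worth, a rough sketch of the second option (heavily hedged: it assumes fastai v2's Recorder and the usual tuple layout of self.learn.yb, and the helper _maskify is hypothetical):

from fastai.learner import Recorder

# Hypothetical helper: strip a Mask R-CNN dict down to its mask tensor.
def _maskify(x):
    return x["masks"] if isinstance(x, dict) else x

class MaskRecorder(Recorder):
    def after_batch(self):
        # Temporarily swap pred/yb for mask-only versions so that the stock
        # fastai metrics accumulate over plain tensors, then restore them.
        old_pred, old_yb = self.learn.pred, self.learn.yb
        self.learn.pred = _maskify(old_pred)
        self.learn.yb = tuple(_maskify(y) for y in old_yb)
        try:
            super().after_batch()
        finally:
            self.learn.pred, self.learn.yb = old_pred, old_yb

You would then swap this in for the stock Recorder on the Learner (e.g. by removing the default one and adding MaskRecorder via learn.add_cb).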

Thank you very much for the previous help! :smiley:

I don't understand. Could you explain it a little more, please?

Hi @WaterKnight, do you have any GitHub repo of your work that is shareable? It seems very interesting.

Hey, I am training Mask R-CNN right now for binary segmentation. It is fully working with a variable batch size. The only thing not working is mixed precision, which is due to an error in PyTorch.

I’ll be sharing the repo in a month! This is my final degree project.

no worries – best of luck!

I hope that by then PyTorch will have solved the problem with torch.cuda.amp.autocast and Mask R-CNN.
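
In the meantime, a possible workaround (just a sketch, assuming PyTorch's native AMP API) is to force the Mask R-CNN forward pass back to FP32 inside the autocast region:

import torch

# Run the Mask R-CNN forward pass in FP32 even inside an autocast region,
# to avoid the NaN losses seen under mixed precision.
with torch.cuda.amp.autocast(enabled=False):
    loss_dict = model(images, targets)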

In this repo you will find semantic segmentation models too!

I guess what I am interested to see is whether you figured out a way to use an error metric. Also, does lr_find work well for Mask R-CNN?

Yes, I managed to get Dice, Jaccard coefficient and other segmentation metrics working. lr_find is working; freezing and unfreezing too.

The only remaining issue is mixed precision.

cool – so the code snippet you pasted above actually works? Like, if we just change the name of the data class MaskRCNN, as mentioned in the comments, will it actually work, or are there other changes?
Thanks for your inputs :slight_smile:

I needed to create a new TfmdLists and some new transforms.

I see, so adding to the aug_transforms in the maskrccnnDataBlock?

@pinaki If you mean augmentation transforms, I am using the ones from the Albumentations library!
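
For reference, a minimal sketch of wiring Albumentations into a fastai pipeline (this follows the common ItemTransform pattern and assumes plain image/mask pairs rather than the full target dicts):

import numpy as np
import albumentations as A
from fastai.vision.all import ItemTransform, PILImage, PILMask

class AlbumentationsTransform(ItemTransform):
    split_idx = 0  # apply on the training set only
    def __init__(self, aug): self.aug = aug
    def encodes(self, x):
        img, mask = x
        out = self.aug(image=np.array(img), mask=np.array(mask))
        return PILImage.create(out["image"]), PILMask.create(out["mask"])

# Example: a couple of common augmentations applied to image and mask together.
atfms = AlbumentationsTransform(A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
]))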

Hello. I wanted to know if you managed to make progress on the Mask R-CNN implementation.
