To get Torchvision's Mask R-CNN working with the Learner class, I think the proper way is to subclass Learner.
I have opened a topic where I explain my concerns and the problems I am struggling with.
Have you had time to look at it? If not, no problem. Thank you for all your help.
I am trying to create a new block for MaskRCNN. It's working; however, the masks are not getting resized. You can take a look here.
UPDATE with my progress.
Getting close to a solution!
The DataLoader is working:
# Assumes the fastai v2 vision namespace is imported (it provides np, partial, PILMask, TensorBBox, ...)
# and that get_y_fn, manual_name, size, bs and path_images are defined earlier in the notebook.

class MaskRCNN(dict):
    "Target for Mask R-CNN: a dict with 'boxes', 'labels' and 'masks'."
    @classmethod
    def create(cls, dictionary):
        return cls({k: v for k, v in dictionary.items()})

    def show(self, ctx=None, **kwargs):
        # Only the mask is displayed; boxes and labels are not drawn
        return show_image(self["masks"], ctx=ctx, **kwargs)

def MaskRCNNBlock():
    return TransformBlock(type_tfms=MaskRCNN.create, batch_tfms=IntToFloatTensor)

def get_bbox(o):
    # Bounding box [xmin, ymin, xmax, ymax] of the non-zero pixels of the label mask
    mask = PILMask.create(get_y_fn(o))
    pos = np.where(mask)  # pos[0] are rows (y), pos[1] are columns (x)
    xmin, xmax = np.min(pos[1]), np.max(pos[1])
    ymin, ymax = np.min(pos[0]), np.max(pos[0])
    return TensorBBox.create([xmin, ymin, xmax, ymax])

def get_bbox_label(o):
    # Single foreground class (label 1); torchvision reserves 0 for background
    return TensorCategory([1])

def get_mask(o):
    mask = PILMask.create(get_y_fn(o))
    return TensorMask(image2tensor(mask))

def get_dict(o):
    return {"boxes": get_bbox(o), "labels": get_bbox_label(o), "masks": get_mask(o)}

# The image file itself is the input; the target dict is built from its label mask
getters = [lambda o: o, get_dict]

maskrccnnDataBlock = DataBlock(
    blocks=(ImageBlock, MaskRCNNBlock),
    get_items=partial(get_image_files, folders=[manual_name]),
    getters=getters,
    splitter=RandomSplitter(valid_pct=0.1, seed=2020),
    item_tfms=Resize((size, size)),
    batch_tfms=Normalize.from_stats(*imagenet_stats)
)
maskrccnnDataBlock.summary(path_images)
dls = maskrccnnDataBlock.dataloaders(path_images, bs=bs)
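get_bbox above takes the extremes of the non-zero pixel coordinates of the label mask. A standalone NumPy sketch of the same idea, assuming one object per mask (binary segmentation), makes it easy to check in isolation. One difference, which is my own addition: it returns None for an empty mask, because np.min raises on an empty array; how such images should be handled in the DataBlock is left open. bbox_from_mask is a hypothetical helper name.

```python
import numpy as np


def bbox_from_mask(mask):
    """Return [xmin, ymin, xmax, ymax] of the non-zero pixels, or None if empty.

    Sketch only: assumes a single object per mask, as get_bbox does.
    Array rows are y coordinates, columns are x coordinates.
    """
    mask = np.asarray(mask)
    ys, xs = np.where(mask)
    if ys.size == 0:
        # np.min on an empty array raises ValueError, so report "no box"
        return None
    return [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())]


demo = np.zeros((5, 6), dtype=np.uint8)
demo[1:4, 2:5] = 1  # object covers rows 1..3 and columns 2..4
print(bbox_from_mask(demo))              # [2, 1, 4, 3]
print(bbox_from_mask(np.zeros((3, 3))))  # None
```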
Testing whether the data works with the model:
b = dls.one_batch()
from torchvision.models.detection.mask_rcnn import *
model = maskrcnn_resnet50_fpn(num_classes=2, min_size=1002, max_size=1002)
model.train()
model = model.to("cuda")

image, target = b
# torchvision expects a list of images and a list of per-image target dicts
images = list(image)
targets = [{"boxes": target["boxes"][i], "labels": target["labels"][i], "masks": target["masks"][i]}
           for i in range(len(target["masks"]))]

output = model(images, targets)  # training mode: dict of losses
output

model.eval()
output = model(images)  # eval mode: list of prediction dicts, one per image
output
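The conversion above follows torchvision's detection API: the model takes a list of image tensors and, in training mode, a list of per-image target dicts, and returns a dict of losses; in eval mode it takes only the images and returns a list of prediction dicts. A generic sketch of the slicing step, under the assumption that every value in the collated target dict has the batch dimension first (batch_to_detection_inputs is a hypothetical name). It only uses plain indexing, so NumPy arrays stand in for tensors here.

```python
import numpy as np


def batch_to_detection_inputs(images, target):
    """Split a collated batch into the list-based format torchvision expects.

    images: array/tensor of shape [B, C, H, W]
    target: dict whose values all have the batch dimension first
    Returns (list of B images, list of B target dicts).
    """
    n = len(images)
    image_list = [images[i] for i in range(n)]
    # One dict per image, keeping every key (boxes, labels, masks, ...)
    target_list = [{k: v[i] for k, v in target.items()} for i in range(n)]
    return image_list, target_list


# NumPy stand-ins for a batch of 2 images, one object each
imgs = np.zeros((2, 3, 8, 8), dtype=np.float32)
tgt = {
    "boxes": np.array([[[1, 1, 4, 4]], [[0, 2, 5, 6]]], dtype=np.float32),  # [B, 1, 4]
    "labels": np.ones((2, 1), dtype=np.int64),                              # [B, 1]
    "masks": np.zeros((2, 1, 8, 8), dtype=np.uint8),                        # [B, 1, H, W]
}
image_list, target_list = batch_to_detection_inputs(imgs, tgt)
print(len(image_list), image_list[0].shape)  # 2 (3, 8, 8)
print(target_list[1]["boxes"])               # [[0. 2. 5. 6.]]
```

Building the dicts from target.items() rather than naming the three keys means any extra key added to the target later is carried along automatically.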
This works, so I decided to create a subclass of Learner to make it compatible with the rest of the fastai library:
class Mask_RCNN_Learner(Learner):
    def __init__(self, dls, model, loss_func=None, opt_func=Adam, lr=defaults.lr, splitter=trainable_params, cbs=None,
                 metrics=None, path=None, model_dir='models', wd=None, wd_bn_bias=False, train_bn=True,
                 moms=(0.95,0.85,0.95)):
        super().__init__(dls, model, loss_func, opt_func, lr, splitter, cbs,
                         metrics, path, model_dir, wd, wd_bn_bias, train_bn,
                         moms)

    def all_batches(self):
        self.n_iter = len(self.dl)
        for o in enumerate(self.dl): self.one_batch(*o)

    def one_batch(self, i, b):
        self.iter = i
        try:
            self._split(b); self('begin_batch')
            # self.xb and self.yb are tuples: xb[0] is the image batch, yb[0] the target dict
            images = list(self.xb[0])
            target = self.yb[0]
            # torchvision expects one target dict per image
            targets = [{"boxes": target["boxes"][j], "labels": target["labels"][j], "masks": target["masks"][j]}
                       for j in range(len(target["masks"]))]
            # In training mode the model returns a dict of losses instead of predictions
            loss_dict = self.model(images, targets); self('after_pred')
            if len(self.yb) == 0: return
            loss = sum(loss for loss in loss_dict.values())
            self.loss = loss; self('after_loss')
            if not self.training: return
            self.loss.backward(); self('after_backward')
            self.opt.step(); self('after_step')
            self.opt.zero_grad()
        except CancelBatchException: self('after_cancel_batch')
        finally: self('after_batch')

    def _do_begin_fit(self, n_epoch):
        self.n_epoch,self.loss = n_epoch,tensor(0.); self('begin_fit')

    def _do_epoch_train(self):
        try:
            self.dl = self.dls.train; self('begin_train')
            self.all_batches()
        except CancelTrainException: self('after_cancel_train')
        finally: self('after_train')

    def _do_epoch_validate(self, ds_idx=1, dl=None):
        if dl is None: dl = self.dls[ds_idx]
        try:
            self.dl = dl; self('begin_validate')
            with torch.no_grad(): self.all_batches()
        except CancelValidException: self('after_cancel_validate')
        finally: self('after_validate')

    @log_args(but='cbs')
    def fit(self, n_epoch, lr=None, wd=None, cbs=None, reset_opt=False):
        with self.added_cbs(cbs):
            if reset_opt or not self.opt: self.create_opt()
            if wd is None: wd = self.wd
            if wd is not None: self.opt.set_hypers(wd=wd)
            self.opt.set_hypers(lr=self.lr if lr is None else lr)
            try:
                self._do_begin_fit(n_epoch)
                for epoch in range(n_epoch):
                    try:
                        self.epoch=epoch; self('begin_epoch')
                        self._do_epoch_train()
                        self._do_epoch_validate()
                    except CancelEpochException: self('after_cancel_epoch')
                    finally: self('after_epoch')
            except CancelFitException: self('after_cancel_fit')
            finally: self('after_fit')

    def validate(self, ds_idx=1, dl=None, cbs=None):
        if dl is None: dl = self.dls[ds_idx]
        with self.added_cbs(cbs), self.no_logging(), self.no_mbar():
            self(_before_epoch)
            self._do_epoch_validate(ds_idx, dl)
            self(_after_epoch)
        return getattr(self, 'final_record', None)
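The heart of the one_batch override can be exercised without a GPU or a real dataset. The sketch below mirrors it under two assumptions: fastai's Learner._split puts the first n_inp elements of the batch into xb and the rest into yb (both tuples), and the model, in training mode, returns a dict of named losses that are summed into one scalar. split_batch, detection_loss and fake_model are hypothetical names; the loss keys are the ones torchvision's Mask R-CNN reports.

```python
def split_batch(b, n_inp=1):
    """Mimic fastai's Learner._split: inputs first, targets after (both tuples)."""
    return b[:n_inp], b[n_inp:]


def detection_loss(model, b):
    """Sketch of the one_batch body: split, reshape for torchvision, sum losses."""
    xb, yb = split_batch(b)
    images = list(xb[0])  # xb is a tuple; xb[0] is the image batch
    target = yb[0]        # yb is a tuple; yb[0] is the target dict
    targets = [{k: v[j] for k, v in target.items()} for j in range(len(images))]
    loss_dict = model(images, targets)
    # Mask R-CNN reports several losses; training minimises their sum
    return sum(loss_dict.values()), loss_dict


def fake_model(images, targets):
    """Hypothetical stand-in returning torchvision-style loss names."""
    assert len(images) == len(targets)
    return {
        "loss_classifier": 0.5,
        "loss_box_reg": 0.25,
        "loss_mask": 1.0,
        "loss_objectness": 0.125,
        "loss_rpn_box_reg": 0.125,
    }


batch = ([[0, 0], [1, 1]], {"labels": [[1], [1]], "masks": [["m0"], ["m1"]]})
total, parts = detection_loss(fake_model, batch)
print(total)  # 2.0
```

Because the model computes its own losses, the loss_func passed to the Learner is never used by this override.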
To summarize, these are the changes (in one_batch):
self._split(b); self('begin_batch')
images = list(self.xb[0])
target = self.yb[0]
targets = [{"boxes": target["boxes"][j], "labels": target["labels"][j], "masks": target["masks"][j]}
           for j in range(len(target["masks"]))]
loss_dict = self.model(images, targets); self('after_pred')
if len(self.yb) == 0: return
loss = sum(loss for loss in loss_dict.values())
The learner construction:
from torchvision.models.detection.mask_rcnn import *
model=maskrcnn_resnet50_fpn(num_classes=2,min_size=1002,max_size=1002)
model.train()
model = model.to("cuda")
learn = Mask_RCNN_Learner(dls=dls, model=model,loss_func=nn.L1Loss(),
wd=1e-1).to_fp16()
learn.fit_one_cycle(5, 1e-3)
Gives this error (the same traceback is printed, interleaved, by each DataLoader worker process):
Traceback (most recent call last):
  File "/home/david/anaconda3/envs/seg/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/home/david/anaconda3/envs/seg/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class '__main__.MaskRCNN'>: it's not the same object as __main__.MaskRCNN
So that's where I am stuck right now. If this works, I just need to figure out how to modify the metrics computation.
@WaterKnight
Such an error is related to pickling. A pickling error usually happens when pickle gets confused about the class you are trying to dump (e.g. you import a class, make a change to it, and then dump an instance of it, or your class has a conflicting name).
My best guess is that the error comes from a name conflict with your data class MaskRCNN.
I see you import the Mask R-CNN module from torchvision (i.e. from torchvision.models.detection.mask_rcnn import *). That module also defines a class called MaskRCNN (see the source code), which conflicts with your data class MaskRCNN.
Try renaming your class to something else and see if the error goes away.
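The failure can be reproduced without fastai or torchvision. Pickle stores a class by its module and name, and when dumping it looks that name up again and checks that it gets the very same class object; a star import that rebinds the name breaks the check. In this sketch a throwaway module nb_demo stands in for the notebook's __main__, and TorchvisionMaskRCNN is a hypothetical placeholder for the class the star import brings in.

```python
import pickle
import sys
import types

# Throwaway module standing in for the notebook's __main__ namespace
nb = types.ModuleType("nb_demo")
sys.modules["nb_demo"] = nb


def register(cls):
    """Make cls look as if it were defined at the top level of nb_demo."""
    cls.__module__ = "nb_demo"
    cls.__qualname__ = cls.__name__
    setattr(nb, cls.__name__, cls)
    return cls


@register
class MaskRCNN(dict):
    """The data class from the post (same name as torchvision's model class)."""


@register
class MaskRCNNTarget(dict):
    """The same data class under a non-conflicting name."""


def can_pickle(obj):
    try:
        pickle.dumps(obj)
        return True
    except pickle.PicklingError:
        return False


old_style = MaskRCNN(labels=[1])
renamed = MaskRCNNTarget(labels=[1])
before = can_pickle(old_style)  # nb_demo.MaskRCNN is still this class


class TorchvisionMaskRCNN:
    """Hypothetical placeholder for torchvision's MaskRCNN model class."""


# Effect of `from torchvision.models.detection.mask_rcnn import *`:
# the name MaskRCNN in the namespace now points at a different class.
nb.MaskRCNN = TorchvisionMaskRCNN

after = can_pickle(old_style)     # "it's not the same object as nb_demo.MaskRCNN"
renamed_ok = can_pickle(renamed)  # the renamed class is untouched
print(before, after, renamed_ok)  # True False True
```

An alternative to renaming is importing only what is needed (for example just maskrcnn_resnet50_fpn) instead of using a star import.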
Thank you very much. It solved the issue.
MaskRCNN is working now in FastAI2!
The remaining work is:
For the metrics, I was given a hint:
However, I don't understand it.
You need to write a function that passes your yb["masks"] to the fastai metric function you want to use.
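One way to read this advice, as a sketch: wrap an ordinary mask metric so that it only ever sees masks. The code assumes the target is the dict built by get_dict (masks of shape [1, H, W] per image), that predictions come from torchvision's eval mode (a list of dicts whose "masks" entry holds soft masks in the 0-1 range with shape [N, 1, H, W]), and that each image contains one object. The names masks_only, merge_instances and jaccard are hypothetical, the 0.5 threshold is an arbitrary choice, and NumPy arrays stand in for tensors.

```python
import numpy as np


def jaccard(pred_mask, true_mask):
    """IoU of two binary masks of the same shape."""
    pred_mask = pred_mask.astype(bool)
    true_mask = true_mask.astype(bool)
    inter = np.logical_and(pred_mask, true_mask).sum()
    union = np.logical_or(pred_mask, true_mask).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(inter) / float(union)


def merge_instances(soft_masks, shape, thr=0.5):
    """Collapse [N, 1, H, W] soft instance masks into one binary [H, W] mask."""
    if len(soft_masks) == 0:
        return np.zeros(shape, dtype=bool)  # no detections: empty prediction
    return soft_masks.max(axis=0)[0] > thr


def masks_only(metric):
    """Adapt a (pred_mask, true_mask) metric to Mask R-CNN preds and dict targets."""
    def _inner(preds, target):
        scores = []
        for i, p in enumerate(preds):
            true = target["masks"][i][0]  # [1, H, W] -> [H, W]
            pred = merge_instances(p["masks"], true.shape)
            scores.append(metric(pred, true))
        return float(np.mean(scores))
    return _inner


true = np.zeros((2, 1, 4, 4), dtype=np.uint8)
true[:, 0, :2, :2] = 1  # each image: object in the top-left 2x2 corner
preds = [
    {"masks": true[0][None].astype(np.float32)},          # perfect prediction
    {"masks": np.zeros((0, 1, 4, 4), dtype=np.float32)},  # no detections
]
mask_iou = masks_only(jaccard)
print(mask_iou(preds, {"masks": true}))  # 0.5
```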
Ahh, okay. Could you link me to the default function, please? Then I can look at its code and override it.
I don't know which functions I should redefine, and I don't know which line of code calls the metrics computation.
In that case all fastai metrics would work.
EDIT
@sgugger I am editing this post with a better explanation.
The Learner class can receive a list of metrics as a parameter. Somewhere in the fastai library there must be a function that passes the data (self.yb and self.pred) to each metric. I have been looking for this code inside learner.py and in the callbacks.
However, I didn't manage to find it. Could you post the function here, or a link that points to it, please?
I discovered what you were referring to.
I have seen two options to solve this issue:
Recorder.after_batch, in line 417:
for met in mets: met.accumulate(self.learn)
Then there is no need for code duplication. Do you think it is a better option to pass a deepcopy of self.learn, or some other object where pred and yb contain just the masks?
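Since Recorder calls met.accumulate(learn) for every metric after each batch, a metric object can pick the masks out of learn.pred and learn.yb itself, so neither a copy of the Learner nor a change to Recorder is needed. A sketch of such a metric, modelled on the reset/accumulate/value interface of fastai's Metric class but written without importing fastai. Like fastai's own Dice metric, it accumulates the intersection and total area over the whole epoch instead of averaging per batch. It assumes that during validation learn.pred holds torchvision's eval-mode predictions and learn.yb[0] the target dict; MaskDice is a hypothetical name, and learn is faked with SimpleNamespace.

```python
from types import SimpleNamespace

import numpy as np


class MaskDice:
    """Epoch-level Dice for Mask R-CNN outputs (sketch, fastai-Metric-like)."""

    def __init__(self, thr=0.5):
        self.thr = thr
        self.reset()

    def reset(self):
        # Called at the start of each validation epoch
        self.inter, self.total = 0.0, 0.0

    def accumulate(self, learn):
        target = learn.yb[0]  # yb is a tuple; first element is the target dict
        for i, p in enumerate(learn.pred):
            true = np.asarray(target["masks"][i][0]).astype(bool)  # [H, W]
            soft = np.asarray(p["masks"])  # [N, 1, H, W] soft masks
            if len(soft) == 0:
                pred = np.zeros_like(true)  # no detections
            else:
                pred = soft.max(axis=0)[0] > self.thr
            self.inter += float(np.logical_and(pred, true).sum())
            self.total += float(pred.sum() + true.sum())

    @property
    def value(self):
        return 2.0 * self.inter / self.total if self.total > 0 else None

    @property
    def name(self):
        return "mask_dice"


true = np.zeros((1, 1, 4, 4), dtype=np.uint8)
true[0, 0, :2, :2] = 1   # 4 foreground pixels
soft = np.zeros((1, 1, 4, 4), dtype=np.float32)
soft[0, 0, :2, :] = 0.9  # predicts 8 pixels, 4 of them correct
learn = SimpleNamespace(pred=[{"masks": soft}], yb=({"masks": true},))
dice = MaskDice()
dice.accumulate(learn)
print(dice.value)  # 2 * 4 / (8 + 4)
```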
Which option looks better for you?
Thank you very much for the previous help!
I don’t understand. Could you explain it a little more please?
Hey, I am training Mask R-CNN right now for binary segmentation. It is fully working with a variable batch size. The only thing not working is mixed precision, owing to an error in PyTorch.
I’ll be sharing the repo in a month! This is my final degree project.
no worries – best of luck!
I hope that by then PyTorch will have solved the problem with torch.cuda.amp.autocast and Mask R-CNN.
In this repo you will find semantic segmentation models too!
I guess what I am interested in seeing is whether you figured out a way to use an error metric. Also, does lr_find work well for Mask R-CNN?
Yes, I managed to get Dice, the Jaccard coefficient and other segmentation metrics working. lr_find is working, and so are freezing and unfreezing.
The only remaining issue is mixed precision.
Cool, so does the code snippet you pasted above actually work? If we just change the name of the data class MaskRCNN, as mentioned in the comments, will it actually work, or are there other changes?
Thanks for your inputs
I needed to create a new TfmdLists and some new transforms.
I see, so you added them to the aug_transforms in maskrccnnDataBlock?
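Possibly the masks were not resized at first because fastai's type-dispatched transforms such as Resize act on typed elements (TensorMask, TensorBBox, ...) and do not look inside a dict target. Whatever form the new transforms take, they must resize every entry of the target consistently with the image. A NumPy sketch of that bookkeeping, with a hypothetical helper resize_target: boxes are scaled by the same x and y factors as the image, and masks use nearest-neighbour sampling so they stay binary.

```python
import numpy as np


def resize_target(target, old_hw, new_hw):
    """Resize boxes and masks of one target dict from old_hw to new_hw.

    Sketch: boxes are [N, 4] as [xmin, ymin, xmax, ymax] in pixels,
    masks are [N, H, W]; nearest-neighbour sampling keeps masks binary.
    """
    (oh, ow), (nh, nw) = old_hw, new_hw
    sx, sy = nw / ow, nh / oh
    boxes = np.asarray(target["boxes"], dtype=np.float32)
    boxes = boxes * np.array([sx, sy, sx, sy], dtype=np.float32)
    masks = np.asarray(target["masks"])
    # Source row/column index for every output pixel (nearest neighbour)
    rows = np.arange(nh) * oh // nh
    cols = np.arange(nw) * ow // nw
    masks = masks[..., rows[:, None], cols[None, :]]
    # Labels (and any other keys) are unaffected by resizing
    return {**target, "boxes": boxes, "masks": masks}


mask = np.zeros((1, 4, 4), dtype=np.uint8)
mask[0, 1:3, 1:3] = 1
t = {"boxes": [[1, 1, 2, 2]], "labels": [1], "masks": mask}
out = resize_target(t, (4, 4), (8, 8))
print(out["masks"].shape, int(out["masks"].sum()))  # (1, 8, 8) 16
print(out["boxes"])                                 # [[2. 2. 4. 4.]]
```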
Hello. I wanted to know whether you managed to make progress on the Mask R-CNN implementation.