@WaterKnight try making the Mask block the first block. The order matters here
@muellerzr With the following order:
manual = DataBlock(blocks=(ImageBlock,MaskBlock(codes),BBoxBlockSegmentation, BBoxLblBlock),
get_items=partial(get_image_files,folders=[dataset1]),
getters=getters,
splitter=RandomSplitter(valid_pct=0.1,seed=2020),
item_tfms=Resize((size,size)),
batch_tfms=Normalize.from_stats(*imagenet_stats),
n_inp=1
)
manual.summary(path_images)
dls = manual.dataloaders(path_images,bs=bs)
dls.one_batch()
I get a different error:
Setting-up type transforms pipelines
Collecting items from ../datasets/Images
Found 621 items
2 datasets of sizes 559,62
Setting up Pipeline: <lambda> -> PILBase.create
Setting up Pipeline: get_mask -> PILBase.create
Setting up Pipeline: get_bbox -> TensorBBox.create
Setting up Pipeline: get_bbox_label -> MultiCategorize
Building one sample
Pipeline: <lambda> -> PILBase.create
starting from
../datasets/Images/manual/165.png
applying <lambda> gives
../datasets/Images/manual/165.png
applying PILBase.create gives
PILImage mode=RGB size=1002x1004
Pipeline: get_mask -> PILBase.create
starting from
../datasets/Images/manual/165.png
applying get_mask gives
../datasets/Labels/manual/165.png
applying PILBase.create gives
PILMask mode=L size=1002x1004
Pipeline: get_bbox -> TensorBBox.create
starting from
../datasets/Images/manual/165.png
applying get_bbox gives
[[425, 387, 641, 591]]
applying TensorBBox.create gives
TensorBBox of size 1x4
Pipeline: get_bbox_label -> MultiCategorize
starting from
../datasets/Images/manual/165.png
applying get_bbox_label gives
[Class1]
applying MultiCategorize gives
TensorMultiCategory([1])
Final sample: (PILImage mode=RGB size=1002x1004, PILMask mode=L size=1002x1004, TensorBBox([[425., 387., 641., 591.]]), TensorMultiCategory([1]))
Setting up after_item: Pipeline: AddMaskCodes -> BBoxLabeler -> PointScaler -> Resize -> ToTensor
Setting up before_batch: Pipeline: mybb_pad
Setting up after_batch: Pipeline: IntToFloatTensor -> Normalize
Could not do one pass in your dataloader, there is something wrong in it
Building one batch
Applying item_tfms to the first sample:
Pipeline: AddMaskCodes -> BBoxLabeler -> PointScaler -> Resize -> ToTensor
starting from
(PILImage mode=RGB size=1002x1004, PILMask mode=L size=1002x1004, TensorBBox of size 1x4, TensorMultiCategory([1]))
applying AddMaskCodes gives
(PILImage mode=RGB size=1002x1004, PILMask mode=L size=1002x1004, TensorBBox of size 1x4, TensorMultiCategory([1]))
applying BBoxLabeler gives
(PILImage mode=RGB size=1002x1004, PILMask mode=L size=1002x1004, TensorBBox of size 1x4, TensorMultiCategory([1]))
applying PointScaler gives
(PILImage mode=RGB size=1002x1004, PILMask mode=L size=1002x1004, TensorBBox of size 1x4, TensorMultiCategory([1]))
applying Resize gives
(PILImage mode=RGB size=1002x1002, PILMask mode=L size=1002x1002, TensorBBox of size 1x4, TensorMultiCategory([1]))
applying ToTensor gives
(TensorImage of size 3x1002x1002, TensorMask of size 1002x1002, TensorBBox of size 1x4, TensorMultiCategory([1]))
Adding the next 3 samples
Applying before_batch to the list of samples
Pipeline: mybb_pad
starting from
[(TensorImage of size 3x1002x1002, TensorMask of size 1002x1002, TensorBBox of size 1x4, TensorMultiCategory([1])), (TensorImage of size 3x1002x1002, TensorMask of size 1002x1002, TensorBBox of size 1x4, TensorMultiCategory([1])), (TensorImage of size 3x1002x1002, TensorMask of size 1002x1002, TensorBBox of size 1x4, TensorMultiCategory([1])), (TensorImage of size 3x1002x1002, TensorMask of size 1002x1002, TensorBBox of size 1x4, TensorMultiCategory([1]))]
applying mybb_pad failed.
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-22-e40df62d36e3> in <module>
7 n_inp=1
8 )
----> 9 manual.summary(path_images)
10 dls = manual.dataloaders(path_images,bs=bs)
11 dls.one_batch()
~/anaconda3/envs/seg/lib/python3.7/site-packages/fastai2/data/block.py in summary(self, source, bs, show_batch, **kwargs)
171 if len([f for f in dls.train.before_batch.fs if f.name != 'noop'])!=0:
172 print("\nApplying before_batch to the list of samples")
--> 173 s = _apply_pipeline(dls.train.before_batch, s)
174 else: print("\nNo before_batch transform to apply")
175
~/anaconda3/envs/seg/lib/python3.7/site-packages/fastai2/data/block.py in _apply_pipeline(p, x)
131 except Exception as e:
132 print(f" applying {name} failed.")
--> 133 raise e
134 return x
135
~/anaconda3/envs/seg/lib/python3.7/site-packages/fastai2/data/block.py in _apply_pipeline(p, x)
127 name = f.name
128 try:
--> 129 x = f(x)
130 if name != "noop": print(f" applying {name} gives\n {_short_repr(x)}")
131 except Exception as e:
~/anaconda3/envs/seg/lib/python3.7/site-packages/fastcore/transform.py in __call__(self, x, **kwargs)
70 @property
71 def name(self): return getattr(self, '_name', _get_name(self))
---> 72 def __call__(self, x, **kwargs): return self._call('encodes', x, **kwargs)
73 def decode (self, x, **kwargs): return self._call('decodes', x, **kwargs)
74 def __repr__(self): return f'{self.name}: {self.encodes} {self.decodes}'
~/anaconda3/envs/seg/lib/python3.7/site-packages/fastcore/transform.py in _call(self, fn, x, split_idx, **kwargs)
80 def _call(self, fn, x, split_idx=None, **kwargs):
81 if split_idx!=self.split_idx and self.split_idx is not None: return x
---> 82 return self._do_call(getattr(self, fn), x, **kwargs)
83
84 def _do_call(self, f, x, **kwargs):
~/anaconda3/envs/seg/lib/python3.7/site-packages/fastcore/transform.py in _do_call(self, f, x, **kwargs)
84 def _do_call(self, f, x, **kwargs):
85 if not _is_tuple(x):
---> 86 return x if f is None else retain_type(f(x, **kwargs), x, f.returns_none(x))
87 res = tuple(self._do_call(f, x_, **kwargs) for x_ in x)
88 return retain_type(res, x)
~/anaconda3/envs/seg/lib/python3.7/site-packages/fastcore/dispatch.py in __call__(self, *args, **kwargs)
96 if not f: return args[0]
97 if self.inst is not None: f = MethodType(f, self.inst)
---> 98 return f(*args, **kwargs)
99
100 def __get__(self, inst, owner):
<ipython-input-13-ef644dab5e4a> in mybb_pad(samples, pad_idx)
2 "Function that collect `samples` of labelled bboxes and adds padding with `pad_idx`."
3 if len(samples[0]) > 3:
----> 4 samples = [(s[0], *clip_remove_empty(*s[1:3])) for s in samples]
5 else:
6 samples = [(s[0], *clip_remove_empty(*s[1:])) for s in samples]
<ipython-input-13-ef644dab5e4a> in <listcomp>(.0)
2 "Function that collect `samples` of labelled bboxes and adds padding with `pad_idx`."
3 if len(samples[0]) > 3:
----> 4 samples = [(s[0], *clip_remove_empty(*s[1:3])) for s in samples]
5 else:
6 samples = [(s[0], *clip_remove_empty(*s[1:])) for s in samples]
~/anaconda3/envs/seg/lib/python3.7/site-packages/fastai2/vision/data.py in clip_remove_empty(bbox, label)
26 bbox = torch.clamp(bbox, -1, 1)
27 empty = ((bbox[...,2] - bbox[...,0])*(bbox[...,3] - bbox[...,1]) < 0.)
---> 28 return (bbox[~empty], label[~empty])
29
30 # Cell
IndexError: The shape of the mask [1002] at index 0 does not match the shape of the indexed tensor [1, 4] at index 0
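One plausible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: with the mask inserted as the second block, the slice s[1:3] in mybb_pad no longer selects the bounding boxes and labels but the mask and the bounding boxes, so clip_remove_empty receives a 1002x1002 mask where it expects boxes. The sketch below uses placeholder strings instead of tensors to show what the slice picks up. The tail slice at the end is a hypothetical alternative, not the fix adopted in this thread.

```python
# Placeholder strings stand in for the real tensors of one sample.
sample_without_mask = ("image", "bbox", "label")
sample_with_mask = ("image", "mask", "bbox", "label")

# Branch taken by mybb_pad when a sample has more than three elements.
picked = sample_with_mask[1:3]
print(picked)  # ('mask', 'bbox')

# Hypothetical alternative: always take the last two elements,
# which are the boxes and labels in both layouts.
tail_with_mask = sample_with_mask[-2:]
tail_without_mask = sample_without_mask[-2:]
print(tail_with_mask, tail_without_mask)
```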
I’ll have to take a look at this later tonight and get back with you
Nice!!
Thank you very much for your help.
I am going to take a deep look into the Mask-RCNN model that is available in torchvision. In training mode, this model returns the losses instead of the predictions.
I am going to adapt the torchvision Mask-RCNN code so that the model returns the predictions, which lets us train as usual with fastai. We would then only need to update the loss_func.
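As a minimal sketch of what "returns the losses" means: in training mode the torchvision model returns a dict of named loss terms, and a single scalar for backpropagation can be obtained by summing them. The values below are made-up floats standing in for loss tensors; the key names are the ones torchvision's Mask R-CNN uses.

```python
# Made-up numbers standing in for the loss tensors returned in training mode.
loss_dict = {
    "loss_classifier": 0.5,
    "loss_box_reg": 0.25,
    "loss_mask": 1.0,
    "loss_objectness": 0.125,
    "loss_rpn_box_reg": 0.125,
}

# One scalar to call backward() on: the sum of all loss terms.
total_loss = sum(loss_dict.values())
print(total_loss)  # 2.0
```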
@muellerzr did you find a solution?
To get torchvision's Mask-RCNN working with the Learner class, I think the proper way is to subclass Learner.
I have added a topic where I explain the concerns and problems I am struggling with.
Did you have time to look at it? If not, no problem. Thank you for all your help.
I am trying to create a new block for MaskRCNN. It’s working.
However, the masks are not getting resized. You can take a look here.
UPDATE with the progress.
Getting closer to a solution!
The dataloader is working:
class MaskRCNN(dict):
    @classmethod
    def create(cls, dictionary):
        return cls(dict({x:dictionary[x] for x in dictionary.keys()}))

    def show(self, ctx=None, **kwargs):
        dictionary = self
        boxes = dictionary["boxes"]
        labels = dictionary["labels"]
        masks = dictionary["masks"]
        result = masks
        return show_image(result, ctx=ctx, **kwargs)

def MaskRCNNBlock():
    return TransformBlock(type_tfms=MaskRCNN.create, batch_tfms=IntToFloatTensor)

def get_bbox(o):
    label_path = get_y_fn(o)
    mask=PILMask.create(label_path)
    pos = np.where(mask)
    xmin = np.min(pos[1])
    xmax = np.max(pos[1])
    ymin = np.min(pos[0])
    ymax = np.max(pos[0])
    return TensorBBox.create([xmin, ymin, xmax, ymax])

def get_bbox_label(o):
    return TensorCategory([1])

def get_mask(o):
    label_path = get_y_fn(o)
    mask=PILMask.create(label_path)
    mask=image2tensor(mask)
    return TensorMask(mask)

def get_dict(o):
    return {"boxes": get_bbox(o), "labels": get_bbox_label(o),"masks": get_mask(o)}

getters = [lambda o: o, get_dict]

maskrccnnDataBlock = DataBlock(
    blocks=(ImageBlock, MaskRCNNBlock),
    get_items=partial(get_image_files,folders=[manual_name]),
    getters=getters,
    splitter=RandomSplitter(valid_pct=0.1,seed=2020),
    item_tfms=Resize((size,size)),
    batch_tfms=Normalize.from_stats(*imagenet_stats)
)
maskrccnnDataBlock.summary(path_images)
dls = maskrccnnDataBlock.dataloaders(path_images,bs=bs)
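The bounding-box step in get_bbox above takes the smallest and largest row and column indices of the non-zero mask pixels. The following sketch shows the same idea on a nested list, without NumPy or fastai; bbox_from_mask is a hypothetical helper name. It raises an error for an empty mask, a case the original getter does not handle explicitly.

```python
def bbox_from_mask(mask):
    """Return [xmin, ymin, xmax, ymax] of the non-zero pixels of a 2D mask."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, value in enumerate(row) if value]
    if not rows:
        raise ValueError("mask has no foreground pixels")
    return [min(cols), min(rows), max(cols), max(rows)]

mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
box = bbox_from_mask(mask)
print(box)  # [1, 1, 3, 2]
```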
Testing if data works with model:
b = dls.one_batch()
from torchvision.models.detection.mask_rcnn import *
model=maskrcnn_resnet50_fpn(num_classes=2,min_size=1002,max_size=1002)
model.train()
model = model.to("cuda")
image,target=b
images=[]
for aux in image:
    images.append(aux)
targets= []
for i in range(len(target["masks"])):
    targets.append({"boxes": target["boxes"][i], "labels": target["labels"][i],"masks": target["masks"][i]})
output=model(images,targets)
output
model.eval()
output=model(images)
output
This works. So I decided to create a subclass of Learner to make it compatible with the rest of the fastai library:
class Mask_RCNN_Learner(Learner):
    def __init__(self, dls, model, loss_func=None, opt_func=Adam, lr=defaults.lr, splitter=trainable_params, cbs=None,
                 metrics=None, path=None, model_dir='models', wd=None, wd_bn_bias=False, train_bn=True,
                 moms=(0.95,0.85,0.95)):
        super().__init__(dls, model, loss_func, opt_func, lr, splitter, cbs,
                         metrics, path, model_dir, wd, wd_bn_bias, train_bn,
                         moms)

    def all_batches(self):
        self.n_iter = len(self.dl)
        for o in enumerate(self.dl): self.one_batch(*o)

    def one_batch(self, i, b):
        self.iter = i
        try:
            self._split(b); self('begin_batch')
            images =[]
            for aux in self.xb:
                images.append(aux)
            targets= []
            for i in range(len(self.yb["masks"])):
                targets.append({"boxes": self.yb["boxes"][i], "labels": self.yb["labels"][i],"masks": self.yb["masks"][i]})
            loss_dict = self.model(images,targets); self('after_pred')
            if len(self.yb) == 0: return
            loss = sum(loss for loss in loss_dict.values())
            self.loss = loss; self('after_loss')
            if not self.training: return
            self.loss.backward(); self('after_backward')
            self.opt.step(); self('after_step')
            self.opt.zero_grad()
        except CancelBatchException: self('after_cancel_batch')
        finally: self('after_batch')

    def _do_begin_fit(self, n_epoch):
        self.n_epoch,self.loss = n_epoch,tensor(0.); self('begin_fit')

    def _do_epoch_train(self):
        try:
            self.dl = self.dls.train; self('begin_train')
            self.all_batches()
        except CancelTrainException: self('after_cancel_train')
        finally: self('after_train')

    def _do_epoch_validate(self, ds_idx=1, dl=None):
        if dl is None: dl = self.dls[ds_idx]
        try:
            self.dl = dl; self('begin_validate')
            with torch.no_grad(): self.all_batches()
        except CancelValidException: self('after_cancel_validate')
        finally: self('after_validate')

    @log_args(but='cbs')
    def fit(self, n_epoch, lr=None, wd=None, cbs=None, reset_opt=False):
        with self.added_cbs(cbs):
            if reset_opt or not self.opt: self.create_opt()
            if wd is None: wd = self.wd
            if wd is not None: self.opt.set_hypers(wd=wd)
            self.opt.set_hypers(lr=self.lr if lr is None else lr)
            try:
                self._do_begin_fit(n_epoch)
                for epoch in range(n_epoch):
                    try:
                        self.epoch=epoch; self('begin_epoch')
                        self._do_epoch_train()
                        self._do_epoch_validate()
                    except CancelEpochException: self('after_cancel_epoch')
                    finally: self('after_epoch')
            except CancelFitException: self('after_cancel_fit')
            finally: self('after_fit')

    def validate(self, ds_idx=1, dl=None, cbs=None):
        if dl is None: dl = self.dls[ds_idx]
        with self.added_cbs(cbs), self.no_logging(), self.no_mbar():
            self(_before_epoch)
            self._do_epoch_validate(ds_idx, dl)
            self(_after_epoch)
        return getattr(self, 'final_record', None)
To highlight, these are the changes, all inside one_batch:
self._split(b); self('begin_batch')
images =[]
for aux in self.xb:
    images.append(aux)
targets= []
for i in range(len(self.yb["masks"])):
    targets.append({"boxes": self.yb["boxes"][i], "labels": self.yb["labels"][i],"masks": self.yb["masks"][i]})
loss_dict = self.model(images,targets); self('after_pred')
if len(self.yb) == 0: return
loss = sum(loss for loss in loss_dict.values())
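The core of the change above is converting one dict of batched values into a list with one dict per image, which is the target format the torchvision detection models expect. A minimal sketch of that conversion, independent of fastai and using plain lists as stand-ins for tensors (split_targets is a hypothetical helper name):

```python
def split_targets(batch_target):
    """Turn one dict of batched values into a list of per-sample dicts."""
    n = len(batch_target["masks"])
    return [{key: value[i] for key, value in batch_target.items()} for i in range(n)]

# Plain lists and strings stand in for the batched tensors.
batch_target = {
    "boxes": [[425, 387, 641, 591], [10, 20, 30, 40]],
    "labels": [1, 1],
    "masks": ["mask_0", "mask_1"],
}
targets = split_targets(batch_target)
print(targets[1])
```

Indexing along the first dimension works the same way for tensors, so the helper should carry over as long as every value in the dict is batched along that dimension.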
The learner construction:
from torchvision.models.detection.mask_rcnn import *
model=maskrcnn_resnet50_fpn(num_classes=2,min_size=1002,max_size=1002)
model.train()
model = model.to("cuda")
learn = Mask_RCNN_Learner(dls=dls, model=model,loss_func=nn.L1Loss(),
wd=1e-1).to_fp16()
learn.fit_one_cycle(5, 1e-3)
This gives the following error:
Traceback (most recent call last):
  File "/home/david/anaconda3/envs/seg/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/home/david/anaconda3/envs/seg/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class '__main__.MaskRCNN'>: it's not the same object as __main__.MaskRCNN
So, that's where I am stuck right now. If this works, I just need to figure out how to modify the metrics computation.
@WaterKnight
This error is related to pickling. A pickling error usually happens when pickle cannot unambiguously resolve the class of the object you are trying to dump (e.g. you import a class, change it, and then dump an instance of it, or your class has a conflicting name).
My best guess is that the error comes from a name conflict with your data class MaskRCNN.
I see you import the Mask RCNN module from torchvision (i.e. from torchvision.models.detection.mask_rcnn import *). That module also contains a class called MaskRCNN (see the source code), which conflicts in name with your data class MaskRCNN.
Try changing the class name to something else and see if the error goes away.
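The effect can be reproduced without torchvision. In this sketch, a second class definition rebinds the name MaskRCNN after an instance of the first class has been created, which is assumed to be what the star import does in the notebook. Pickle looks the class up by name, finds a different object, and refuses to serialize the instance.

```python
import pickle

class MaskRCNN(dict):
    """Stand-in for the data class defined in the notebook."""

item = MaskRCNN(labels=[1])

class MaskRCNN:  # the name now points to a different class, as after the star import
    """Stand-in for the model class that shadows the data class."""

error = None
try:
    pickle.dumps(item)
except pickle.PicklingError as exc:
    error = exc

print(type(error).__name__, error)
```

Giving the data class a name that the import does not reuse avoids the clash.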
Thank you very much. It solved the issue.
MaskRCNN is working now in FastAI2!
The remaining work is:
- Adjust the input for the metrics
- Solve issues with FP16. Some losses of the model are NaN in FP16 and not in FP32.
- Improve show_result
For the metrics I was given some intuition:
However, I don’t understand.
You need to write a function that passes your yb["masks"] to the metric function (in fastai) you want to use.
Ahh, okay. Could you link me to the default function, please? Then I can look at its code and override it.
I don't know which functions I should redefine, or which line of code calls the metrics computation.
In that case all fastai metrics would work.
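A minimal sketch of such a function, assuming a metric with the signature metric(pred, target) and a target dict with a "masks" key as built by get_dict earlier in the thread. The names on_masks and pixel_accuracy are hypothetical, and plain lists stand in for tensors.

```python
from functools import wraps

def on_masks(metric, key="masks"):
    """Wrap `metric(pred, target)` so it receives only the `key` entry of dict inputs."""
    def pick(value):
        return value[key] if isinstance(value, dict) else value

    @wraps(metric)
    def wrapped(pred, target):
        return metric(pick(pred), pick(target))
    return wrapped

def pixel_accuracy(pred, target):
    """Toy metric: fraction of equal entries in two flat sequences."""
    return sum(p == t for p, t in zip(pred, target)) / len(target)

mask_accuracy = on_masks(pixel_accuracy)
target = {"boxes": [[0, 0, 2, 2]], "labels": [1], "masks": [1, 0, 0, 1]}
score = mask_accuracy([1, 0, 1, 1], target)
print(score)  # 0.75
```

Because the wrapper keeps the two-argument signature, the wrapped function could be used wherever a plain metric function is accepted.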
EDIT
@sgugger I am editing this post with a better explanation.
The Learner class can receive an array of metrics as a parameter. Somewhere in the fastai library there should be a function that passes the data (self.yb and self.pred) to each metric. I have been looking for this code in learner.py and in the callbacks.
However, I didn't manage to find it. Could you post the function here, or a link to it, please?
I discovered what you were referring to.
I see two options for solving this issue:
- Create different metrics that get the mask from the dict. However, this would mean duplicating code in each class.
- Override Recorder.after_batch at line 417 (for met in mets: met.accumulate(self.learn)). Then there is no need for code duplication.
Do you think it is a better option to pass a deep copy of self.learn, or another object where pred and yb contain just the masks?
Which option looks better to you?
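For the second option, a deep copy may not be needed. A sketch under the assumption that the metrics only read the pred and yb attributes of the object they receive: a lightweight view exposing just the masks could be passed to accumulate instead of self.learn. The name masks_view and the stand-in learner below are hypothetical, and metrics that read other attributes would need those added to the view.

```python
from types import SimpleNamespace

def masks_view(learn, key="masks"):
    """Return a lightweight object exposing only the mask part of `pred` and `yb`."""
    def pick(value):
        return value[key] if isinstance(value, dict) else value
    return SimpleNamespace(pred=pick(learn.pred), yb=tuple(pick(y) for y in learn.yb))

# Stand-in for a Learner after a forward pass (hypothetical values).
fake_learn = SimpleNamespace(
    pred={"masks": [1, 0, 1]},
    yb=({"boxes": [[0, 0, 1, 1]], "labels": [1], "masks": [1, 1, 1]},),
)
view = masks_view(fake_learn)
print(view.pred, view.yb)
```

The view references the existing values rather than copying them, and the original object is left untouched.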
Thank you very much for the previous help!
I don’t understand. Could you explain it a little more please?
Hey, I am training MaskRCNN right now for binary segmentation. It is fully working with variable batch size. The only thing not working is mixed precision, which is due to an error in PyTorch.
I’ll be sharing the repo in a month! This is my final degree project.
no worries – best of luck!
I hope that by then PyTorch will have solved the problem with torch.cuda.amp.autocast and maskrcnn.
In this repo you will find semantic segmentation models too!
I guess what I am interested in is whether you figured out a way to use an error metric. Also, does lr_finder work well for MaskRCNN?
Yes, I managed to get Dice, the Jaccard coefficient and other segmentation metrics working. lr_find is working. Freezing and unfreezing work too.
The only remaining issue is mixed precision.
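For reference, the two segmentation metrics mentioned above can be computed from the overlap of two binary masks: Dice is twice the intersection divided by the sum of both mask sizes, and Jaccard is the intersection divided by the union. The sketch below works on flat Python lists rather than tensors. Returning 1.0 when both masks are empty is an assumption of this sketch, not necessarily what fastai does.

```python
def dice_and_jaccard(pred, target):
    """Dice and Jaccard coefficients of two flat binary masks of equal length."""
    intersection = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    union = total - intersection
    dice = 2 * intersection / total if total else 1.0      # assumed convention for empty masks
    jaccard = intersection / union if union else 1.0       # assumed convention for empty masks
    return dice, jaccard

dice, jaccard = dice_and_jaccard([1, 1, 0, 0], [1, 0, 1, 0])
print(dice, jaccard)
```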