One of the core torchvision contributors has said he’s working on Faster R-CNN / Mask R-CNN.
Right now, there isn’t an official RoI Align layer or non-max suppression (NMS) in PyTorch, which means you have to either build one of the CUDA extensions floating around or use a very slow naive version. Those problems, plus messy anchor box assignment code, seem to be making most of the current PyTorch attempts hard to work with. Also, I “think” RoI Align and NMS are coming in the next version of torchvision.
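For anyone wondering what a “very slow naive version” of NMS looks like, here’s a rough sketch of greedy NMS in plain Python. This is only my illustration, not code from torchvision or from any of the existing repos. The names `iou` and `naive_nms`, the (x1, y1, x2, y2) box format, and the 0.5 threshold are all assumptions I picked for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def naive_nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first.

    Hypothetical helper for illustration. It is O(n^2) in pure Python,
    which is why people reach for CUDA extensions instead.
    """
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(naive_nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it gets suppressed; the third doesn’t overlap anything and survives. The nested comparison is what makes this painfully slow once you have thousands of anchors per image.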
Also also, I wonder why nobody ever mentions the Light-Head R-CNN paper (https://arxiv.org/pdf/1711.07264v1.pdf). I’d be interested in throwing a mask head on that and seeing how well it does at instance segmentation.
Edit: Also also also, @jeremy, I saw on Twitter that you were looking for a single-pass instance segmentation network. Did you ever find anything, or are you just going to do object detection for part 2? I was actually trying (and failing) to hack a mask head onto RetinaNet when I saw that tweet. Perhaps you could point me in the right direction? The only thing I’ve been able to find is this class-agnostic one, https://arxiv.org/abs/1711.07618, and the code isn’t available yet.