Is DL overkill for my use case?

Hi @cjmills, I was actually trying to bring the student-ID example over to fastai, though I’m not sure that’s even possible. I tried adding a resize transform to that tutorial’s code, but the number of custom classes and helpers made it a debugging nightmare. I sketched something like this:

```python
from fastai.vision.all import *

# label_func: my own getter for the targets (not shown)
documents = DataBlock(blocks=(ImageBlock, MaskBlock(['document']), BBoxBlock),
                      get_items=get_image_files,
                      get_y=label_func,
                      splitter=RandomSplitter(),
                      batch_tfms=aug_transforms(size=(120, 160)))
```

but I’m not sure how to work with more than two blocks, especially when it comes to getting the labels, or how to use a PyTorch model with fastai. Is it as simple as passing torchvision.models.detection.mask_rcnn.MaskRCNNPredictor as Learner.model?
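For reference, here is a plain-Python sketch of how, as far as I understand it, fastai’s DataBlock splits its blocks into inputs and targets: the first n_inp blocks are inputs and the rest are targets, and when n_inp is not given it defaults to max(1, len(blocks) - 1). The helper split_blocks is hypothetical and only mimics that rule; it is not fastai code.

```python
def split_blocks(blocks, n_inp=None):
    """Hypothetical helper mimicking how a DataBlock splits blocks into (inputs, targets).

    Assumption: when n_inp is None the default is max(1, len(blocks) - 1),
    i.e. every block except the last one is treated as an input.
    """
    if n_inp is None:
        n_inp = max(1, len(blocks) - 1)
    return tuple(blocks[:n_inp]), tuple(blocks[n_inp:])


blocks = ("ImageBlock", "MaskBlock", "BBoxBlock")

# Default: with three blocks, two of them end up as inputs.
print(split_blocks(blocks))           # (('ImageBlock', 'MaskBlock'), ('BBoxBlock',))
# Explicit n_inp=1: one input (the image) and two targets (mask and boxes).
print(split_blocks(blocks, n_inp=1))  # (('ImageBlock',), ('MaskBlock', 'BBoxBlock'))
```

So with three blocks and one input, n_inp=1 has to be passed explicitly, and get_y then takes a list with one getter per target block.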

I’m curious about your enthusiasm for IceVision. Do you think it will keep being updated? I confess the documentation and examples being out of sync with the actual code (renamed functions, etc.) threw me off a bit, but I would consider it if you think it’s worth it. How does it compare to fastai? I see it ‘supports’ fastai, but aren’t both supposed to be higher-level libraries that make working with PyTorch easier?

I see now that IceVision seems to be maintained. For some reason, the last time I checked it looked like there had been no commits for over a year; maybe I was on some odd branch.

So, IceVision’s big selling point (for me) is the ease of working with training data for tasks like object detection and image segmentation. That can include data-curation/cleaning and built-in transforms for data augmentation. It also provides interfaces for using pretrained models from different open-source resources. You can then train these models with fastai’s training loop.

As for using fastai for object detection tasks, I would wait for the new Part 2 course, which starts in about a week and a half. Jeremy plans to cover object detection again, so I can update you on the approach when we get to that material.

Also, I might have accidentally sent you a link to an outdated branch earlier in this thread, so that might be my bad.

The setup process for IceVision is a pain though, given all its dependencies. Fortunately, that is not a recurring hassle.

Creating custom DataBlocks is generally only recommended for people building reusable application libraries for others. Otherwise, for novel applications that aren’t already supported out of the box in fastai, it’s best to use the mid-level API.

Hi Jeremy, thanks for chipping in! I understand there are already BBox, Mask and Image blocks (which should be the ones I need for Mask R-CNN), no?

I was having trouble understanding how to work with one input and two targets, but found this example that does just that 🙂 Now, if I understand correctly, I can use any PyTorch model with Learner, right? So I could do something like

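```python
from fastai.vision.all import *
import torchvision

# img2bbox and coco_source as in the fastai COCO example; opt_func defined elsewhere (not shown)
coco = DataBlock(blocks=(ImageBlock, BBoxBlock, MaskBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(),
                 # one getter per target block; in the COCO example the second getter returns
                 # bbox labels (for BBoxLblBlock), so with MaskBlock it would need to return a mask
                 get_y=[lambda o: img2bbox[o.name][0], lambda o: img2bbox[o.name][1]],
                 item_tfms=Resize(128),
                 batch_tfms=aug_transforms(),
                 n_inp=1)

dls = coco.dataloaders(coco_source)

# Note: MaskRCNNPredictor is only the mask head (and a class, not a model instance);
# a full model would be e.g. torchvision.models.detection.maskrcnn_resnet50_fpn(),
# which returns its own loss dict in training mode rather than using nll_loss/accuracy.
learn = Learner(
    dls,
    torchvision.models.detection.mask_rcnn.MaskRCNNPredictor,
    loss_func=F.nll_loss,
    opt_func=opt_func,
    metrics=accuracy,
)
```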

?

I’d suggest watching the live coding videos; we go into that process there.

Hi, I’m getting back to this project (with pure PyTorch this time), but now with considerably larger images (14,000x10,000 px). I understand Mask R-CNN resizes images internally to 1024x1024 with zero padding, but I was advised to annotate the original images for increased precision.
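As far as I know, torchvision’s Mask R-CNN (assuming that is the model in use, e.g. maskrcnn_resnet50_fpn) does not pad to a fixed 1024x1024 square by default: its GeneralizedRCNNTransform scales each image by min(min_size / shorter side, max_size / longer side), with defaults min_size=800 and max_size=1333, and then zero-pads the batch so each side is a multiple of 32. The helpers below are hypothetical; this plain-Python sketch only reproduces that arithmetic, using rounding rather than torchvision’s exact interpolation.

```python
def rcnn_resized_shape(height, width, min_size=800, max_size=1333):
    """Sketch of the scale rule assumed for torchvision's GeneralizedRCNNTransform.

    The shorter side is scaled to min_size, unless that would push the
    longer side past max_size, in which case the longer side is capped instead.
    """
    scale = min(min_size / min(height, width), max_size / max(height, width))
    return round(height * scale), round(width * scale), scale


def pad_to_multiple(n, divisor=32):
    """Round a side length up to the next multiple of divisor (batch padding)."""
    return -(-n // divisor) * divisor


# Assuming 14,000x10,000 means width x height.
h, w, scale = rcnn_resized_shape(10_000, 14_000)
print(h, w, round(scale, 4))                   # 800 1120 0.08
print(pad_to_multiple(h), pad_to_multiple(w))  # 800 1120
```

Under those defaults the network would see a 14,000x10,000 image at roughly 1120x800, which is worth keeping in mind when choosing a resolution to annotate and train at.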

The thing is, I’m running into RAM issues in my __getitem__ method. An image with 24 object instances creates a [14000, 10000, 24]-shaped tensor that free Colab can’t handle. Annotating in CVAT is also super slow, since I have to upload the images to their server and request them back.
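A quick back-of-the-envelope check of why this blows up, assuming one byte per element for bool/uint8 masks and four for float32; the downscaled shape is just an illustrative choice.

```python
import math


def mask_stack_bytes(dims, bytes_per_element=1):
    """Bytes needed for a dense mask tensor of the given shape."""
    return math.prod(dims) * bytes_per_element


full_u8 = mask_stack_bytes((14_000, 10_000, 24))                       # bool/uint8 masks
full_f32 = mask_stack_bytes((14_000, 10_000, 24), bytes_per_element=4)  # float32 masks
small_u8 = mask_stack_bytes((1_120, 800, 24))                            # e.g. downscaled to 1,120 x 800

print(f"{full_u8 / 1e9:.2f} GB")   # 3.36 GB
print(f"{full_f32 / 1e9:.2f} GB")  # 13.44 GB
print(f"{small_u8 / 1e6:.1f} MB")  # 21.5 MB
```

The dense stack grows with H x W x N, so halving each side cuts it by 4x. Since the objects are rectangles, storing them as box coordinates and rasterizing masks only at the training resolution would avoid building the full-size stack at all.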

My objects are again quite simple (rectangular), so I don’t think this resolution is needed. Is there a sweet spot I could resize to in order to alleviate these issues? I still intend to predict on full-size images, if that makes a difference…
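One rough way to look for that sweet spot, sketched under assumptions: the limiting factor is how many pixels the smallest rectangle still spans after downscaling, annotations are axis-aligned [x1, y1, x2, y2] boxes, and the box values and candidate sizes below are made up for illustration. Inverting the same scale factor maps boxes predicted on a downscaled copy back to full-resolution coordinates.

```python
def resize_factor(width, height, target_long_side):
    """Aspect-preserving scale factor that makes the longer side target_long_side."""
    return target_long_side / max(width, height)


def scale_boxes(boxes, factor):
    """Scale [x1, y1, x2, y2] boxes by a single factor."""
    return [[c * factor for c in box] for box in boxes]


def min_box_side(boxes):
    """Shortest box edge (in pixels) across all boxes."""
    return min(min(x2 - x1, y2 - y1) for x1, y1, x2, y2 in boxes)


# Made-up full-resolution annotations on a 14,000 x 10,000 (width x height) image.
boxes_full = [[1400.0, 2000.0, 4200.0, 3000.0], [8000.0, 6000.0, 9500.0, 9000.0]]

for side in (1024, 2048, 4096):
    f = resize_factor(14_000, 10_000, side)
    small = scale_boxes(boxes_full, f)
    # Training image height at this size, and the smallest object's shortest edge.
    print(side, round(10_000 * f), f"{min_box_side(small):.1f}px")

# Mapping boxes from a downscaled copy back to full-resolution coordinates.
f = resize_factor(14_000, 10_000, 2048)
boxes_back = scale_boxes(scale_boxes(boxes_full, f), 1 / f)
```

If predictions are made by passing the full-size image straight to torchvision’s Mask R-CNN, my understanding is that its postprocessing already maps boxes and masks back to the original image size, so no manual rescaling is needed there, but the full-size per-instance masks it returns can bring back the same memory problem.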