Fastai v2 models in production

What’s the recommended way to deploy fastai models in production while avoiding all the dataloader and dataset overhead?

So far this has been pretty easy:

conts = torch.from_numpy(learn.dls.valid_ds[cont_names].values).float().cuda()
preds1 = learn.model.eval()(None, conts).detach().cpu().numpy()  # None = no categorical inputs

But I’m not sure how to get the trained procs in order to apply the same preprocessing to new data?

I can’t find a good way to go from a numpy array to making predictions for a dataset, then finding those same indices in the array and merging the predictions with the original data (not just the training inputs). If we go the dataloader route and do preds, targs = learn.get_preds(ds), there are no indices there.

Thanks!


You should use the test DataLoader with your items, i.e.:

dl = learn.dls.test_dl(items)

And then get predictions.
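For the tabular case in your first post, that would look roughly like this (a sketch; df_new is just a placeholder for whatever new DataFrame you want predictions on):

dl = learn.dls.test_dl(df_new)      # applies the same procs (Categorify, Normalize, ...) used in training
preds, _ = learn.get_preds(dl=dl)   # predictions come back in the order of df_new's rows

Since the order is preserved, you can merge the predictions straight back onto df_new by position.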


Hi @feribg,

In case it helps, I just deployed a fastai2-based web API that derotates pictures (detects whether a picture is rotated and, if so, sets it back straight). I don’t think it is super-useful, but it may be interesting as a deployment template because:

  • It uses FastAPI behind the scenes, which provides automated doc generation and many other features that I quite frankly don’t really understand :laughing: (a rough sketch of what such an endpoint looks like is shown after this list)
  • It provides both img2class and img2img APIs
  • There is a notebook in the repo which shows how to call the API in various contexts (I strangely struggled quite a lot with this part).
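To give an idea of the pattern (a minimal sketch, not the actual code from the repo; the learner path, route name, and response fields are placeholders):

from fastapi import FastAPI, File, UploadFile
from fastai.vision.all import load_learner

app = FastAPI()
learn = load_learner("export.pkl")   # placeholder path to the exported Learner

@app.post("/predict")
def predict(file: UploadFile = File(...)):
    img_bytes = file.file.read()               # raw bytes from the upload
    pred, _, probs = learn.predict(img_bytes)  # predict applies the item transforms itself
    return {"class": str(pred), "confidence": float(probs.max())}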

There is also a web UI, but it currently only does classification (it does not derotate the pictures). I started looking into how to do an img2img web app, but I struggled with it since I have no experience with HTML and JS. If anyone has a good way to do it, please let me know!

Hope you will find it helpful :).
Seb


Thanks @sebderhy, but I don’t actually see how you apply the dataset transformation steps. I see the main piece here, but it just gets a raw byte array from the file upload, right?

learn.predict applies any augmentation done to the validation set.

Got it, thanks @muellerzr. So how does it all work when the input to predict doesn’t seem to have a batch dimension:

    img_bytes = (file.file.read())
    pred = learn.predict(img_bytes)

Does it also add a unit axis to the data passed in?

Go explore predict’s source code :wink:

It makes a batch of 1
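Roughly, calling predict is equivalent to doing this yourself (a simplified sketch; item stands for whatever you’d pass to predict, and the real source also handles decoding the inputs and a few edge cases):

# what learn.predict(item) does, roughly: wrap the item in a test
# DataLoader (so it becomes a batch of 1) and run get_preds on it
dl = learn.dls.test_dl([item], num_workers=0)
preds, _, dec_preds = learn.get_preds(dl=dl, with_decoded=True)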

Thanks, yep, I think I’ll need to get the source and dive deeper; there are a lot of indirections.

I have a doubt: why are unets so heavy?
I have to segment some images that are 3000x1500, and I can only feed one at a time (on my 8GB card).
Is that heavy? I suppose that down the U we have something like 512x3000x1500/2**4 activations, so it is quite heavy. Any idea how to “lighten this up” for production?
Also, any tip for removing the Resize transform from the pipeline more elegantly than doing:
learn.dls.after_item.fs.pop(1)
where 1 is the position of Resize?
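One alternative (an untested sketch; assumes fastai.vision.all is imported so L and Resize are in scope) would be to filter the pipeline by type instead of hard-coding the index:

# keep every item transform except Resize, wherever it sits in the pipeline
learn.dls.after_item.fs = L(t for t in learn.dls.after_item.fs if not isinstance(t, Resize))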

I don’t think so

Perhaps break the image up into 4 or 5 quadrants (or more), run them as a batch, then bring it back together?

Don’t know. One image eats up 7.9GB for an xresnet34-based unet.

Yeah, it does have a very large memory usage. I’d try the break-up method with tiles as small as you can reasonably get. Since no augmentations besides normalization should be applied, I think you’d be okay.
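Something like this rough sketch of the tiling idea (assumes the input is already a normalized CxHxW tensor whose height and width divide evenly by the grid, the model is on the GPU, and the raw output is a per-pixel map; tile borders may show seams):

import torch

def predict_tiled(learn, img, rows=2, cols=2):
    # split a CxHxW tensor into a grid of tiles, run them as one batch,
    # then stitch the per-tile outputs back into a full-size prediction
    c, h, w = img.shape
    th, tw = h // rows, w // cols
    tiles = [img[:, i*th:(i+1)*th, j*tw:(j+1)*tw]
             for i in range(rows) for j in range(cols)]
    with torch.no_grad():
        out = learn.model.eval()(torch.stack(tiles).cuda()).cpu()
    # out has shape (rows*cols, n_classes, th, tw); reassemble row by row
    stitched_rows = [torch.cat(list(out[i*cols:(i+1)*cols]), dim=-1)
                     for i in range(rows)]
    return torch.cat(stitched_rows, dim=-2)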

Probably the solution comes from something like this: https://devblogs.nvidia.com/speeding-up-deep-learning-inference-using-tensorrt/


Hello Zach! Thanks for your responses on the forums; I have found them very helpful. I am new to fastai and got stuck with get_preds. Could you offer some help please?

I have built a model classifying pictures; my end goal is to label all 20K pictures I have for further learning.
At the moment I have a list of image paths (images), and I iterate over it to label everything, as follows:

labels = []
for idx, image in enumerate(images):
    label, _, _ = learn.predict(image)   # predict returns (decoded class, class index, probabilities)
    labels.append(label)

This has been very slow. I want to use get_preds, but I couldn’t work out how to go from the list of image paths to predictions.
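My best guess so far, based on the test_dl suggestion earlier in this thread, is something like the following (untested):

dl = learn.dls.test_dl(images)                          # images is the list of paths from above
preds, _, decoded = learn.get_preds(dl=dl, with_decoded=True)
labels = [learn.dls.vocab[int(i)] for i in decoded]     # map predicted indices back to class names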

TIA!

Lu

@feribg @muellerzr Given the large image sizes and the complexity of img2img problems, why can’t we try creating a web app that performs GPU inference? I have usually seen web apps doing only CPU inference for img2img problems. I am personally trying to create a web app using uvicorn + starlette for GPU inference, but I am indeed facing a lot of challenges with that.