Just wondering: what would be the most efficient way to decode bounding boxes that fastai has transformed to the (-1, 1) scale back to normal pixel coordinates?
Here is what I meant:
Create a datablock for bboxes. If you call `dls.one_batch()`, you get a 3-item tuple (image, bbox, label): `b = dls.one_batch()  # a tuple: (image, bbox, bbox label)`
If you look at `b[1]`, its shape is `(bs, num_of_bbox, 4)`.
Here is an example of the output:
As you can see, the bbox coordinates have been transformed to (-1, 1) for data augmentation; in v1, I believe this was done by the FlowField.
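For reference, that (-1, 1) mapping is the usual linear rescaling (the same convention PyTorch's `grid_sample` uses). A minimal sketch of the forward direction, not fastai's actual code:

```python
import torch

def scale_to_unit(coords, size):
    # Map pixel coordinates in [0, size] linearly onto (-1, 1)
    return coords * 2 / size - 1

# a 100x100 box in the top-left corner of a 512x512 image, xyxy format
box = torch.tensor([0., 0., 100., 100.])
print(scale_to_unit(box, 512))  # tensor([-1.0000, -1.0000, -0.6094, -0.6094])
```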
In fastai v2, as I tried it, the dispatch system is very powerful, but it has to decode the entire tuple (image, bbox, label): `dls.decode_batch(b)`
Here is an example: grab a bunch of bboxes from one image in the batch.
It works, but I assume it would be slow if I call it on every batch before using the provided anchor-box matching function (I'm trying to port EfficientDet into fastai), which takes [yxyx] in Pascal format. (I guess I could write my own, but that would take a lot of time…)
Well, ideally you’d use what you just found, `decode_batch`, with as large a batch as you possibly can. Otherwise you’d run it manually through the `decodes` of `TensorPoint` and `BBoxPoint` (IIRC; I haven’t played with bboxes, but I’ll gladly go down this rabbit hole with you and learn)
The other option is to use `decode` instead. I believe it uses type annotations, so you might be able to just pass in your bbox?
(If that doesn’t help some, I’ll go down the rabbit hole a bit later tonight)
How’s it going — thanks for the quick reply, and awesome work on those YouTube videos!
That’s what I am thinking: to re-scale back you just have to know the formula they used to convert to (-1, 1). I was wondering whether fastai has a hidden function that can transform the coordinates back.
The problem with `dls.decode_batch` is that when you’re fitting images of size 512+, I assume it will be super inefficient to call decode on every batch with bs=4.
But sure, I will keep digging to see if I can find a better way. Thanks a lot!
It actually really isn’t, I found. I did a lot of work trying to speed things up, and of all the things I found I could reduce time on, the decode batch wasn’t one of them! (That’s why I said the biggest batch you can: it all fits into memory super easily.)
In the meantime (before I go investigate the time differences in `decode_batch` etc. myself)… that formula is in `PointScaler`, specifically `_unscale_pnts`
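If you want to apply the inverse by hand, it’s just the reverse linear map — a standalone sketch of the idea (not the library function itself; check `_unscale_pnts` in `fastai.vision.core` for the real thing):

```python
import torch

def unscale_points(pnts, size):
    # Invert the (-1, 1) scaling back to pixel coordinates.
    # `size` is the image size the points were scaled against.
    return (pnts + 1) * size / 2

scaled = torch.tensor([[-1.0, -1.0], [-0.609375, -0.609375]])
print(unscale_points(scaled, 512))  # back to [[0, 0], [100, 100]]
```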
and for the BBox it’s specifically here:
(I included the above since this one’s `decodes` calls and extends the prior.)
Should give you a nice starting place! Tell us what you find out (and most importantly, what are those times? )
Here is the code for anyone who wants to do the same (or for my future self researching this topic…):
```python
# assume you have formed a dataloader: dls
b = dls.one_batch()
dls.decode_batch(b)[0][1]  # this gives you the first item's bbox in the batch

from fastai.vision.core import _unscale_pnts
_unscale_pnts(b[1][0], img_size)  # this gives you the first item's bbox in the batch
```
The only thing to note here is that the first one returns a `TensorBBox` while the second returns a `TensorPoint`. It won’t have much effect on other systems, but later on, if you want to call `show`, you probably want to cast it back to `TensorBBox` (because that is what `show` expects).
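And if per-item decoding ever becomes the bottleneck, the same unscaling can be vectorised over the whole `(bs, num_of_bbox, 4)` batch in one op — a hedged sketch assuming `xyxy` ordering (swap the h/w placement for `yxyx`):

```python
import torch

def unscale_bbox_batch(bboxes, h, w):
    # Unscale a (bs, num_boxes, 4) tensor of (-1, 1) boxes at once.
    # Assumes (x1, y1, x2, y2) ordering; use [h, w, h, w] for yxyx.
    scale = torch.tensor([w, h, w, h], dtype=bboxes.dtype)
    return (bboxes + 1) * scale / 2

b = torch.full((4, 3, 4), -1.0)  # 4 images, 3 boxes each, all at the origin
print(unscale_bbox_batch(b, 512, 512).shape)  # torch.Size([4, 3, 4])
```

In fastai you’d then cast the result back (e.g. `TensorBBox(...)`) before calling `show`, as noted above.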