[Solved] Bb_pad removing bound box data from dataloader

At least, that’s what it looks like it’s doing, but I’m not sure.

My code:

from fastai.vision.all import *

pages_db = DataBlock(
    blocks=(ImageBlock, BBoxBlock, BBoxLblBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(),
    get_y=[
        # first element of the row's 'label' value: the list of bounding boxes
        lambda o: trn_df_bb_trim.loc[trn_df_bb_trim['name']==f'{o.stem}{o.suffix}','label'].values[0][0],
        # second element: the list of per-box labels
        lambda o: trn_df_bb_trim.loc[trn_df_bb_trim['name']==f'{o.stem}{o.suffix}','label'].values[0][1]
    ],
    item_tfms=Resize(512),
    n_inp=1
)


#pages_dl = pages_db.dataloaders(TRN_DATA_PTH)
pages_db.summary(TRN_DATA_PTH, before_batch=None)

Fairly straightforward fastai code. I’ve verified that the lambdas grab the boxes and the labels correctly from my pandas table.

Here’s the lead. First, proof that the dataloader knows the bboxes aren’t empty:

Final sample: (PILImage mode=RGB size=1776x2808, TensorBBox([[ 860., 1805.,  874., 1779.],
        [ 813., 1781.,  883., 1748.],
        [1380., 2487., 1460., 2454.],
        [ 813., 2569.,  861., 2532.],
        [ 280., 2560.,  316., 2521.],
        ....
TensorMultiCategory([ 195,  160,   89,   80,  120,   80,   80,   37,  120,   80,   73,   45,
          80,   44,  120,  108,   44,  149,   40,   89,   79,  108,   48,   73,
          45,   44,  108,   79,   43,   98,   45,  873,  114,   53,   79,   89,
          ....

Then, the before_batch

Applying before_batch to the list of samples
  Pipeline: bb_pad
    starting from
      [(TensorImage of size 3x512x512, TensorBBox of size 183x4, TensorMultiCategory of size 183), (TensorImage of size 3x512x512, TensorBBox of size 96x4, TensorMultiCategory of size 96), (TensorImage of size 3x512x512, TensorBBox of size 207x4, TensorMultiCategory of size 207), (TensorImage of size 3x512x512, TensorBBox of size 285x4, TensorMultiCategory of size 285)]
    applying bb_pad gives
      [(TensorImage of size 3x512x512, TensorBBox of size 0x4, TensorMultiCategory([], dtype=torch.int64)), (TensorImage of size 3x512x512, TensorBBox of size 0x4, TensorMultiCategory([], dtype=torch.int64)), (TensorImage of size 3x512x512, TensorBBox of size 0x4, TensorMultiCategory([], dtype=torch.int64)), (TensorImage of size 3x512x512, TensorBBox of size 0x4, TensorMultiCategory([], dtype=torch.int64))]

The before_batch look at the samples fills me with confidence, since each TensorBBox and TensorMultiCategory pair matches in non-zero size. However, bb_pad appears to wreak havoc, emptying every one of them.

Things to note:

  1. Both summary and dataloaders execute with no problems.
  2. I can call show_batch and get an image back, but with zero bounding boxes or labels.

I know that bb_pad is added as a before_batch operation thanks to BBoxBlock, but I’m unsure what the heck it’s doing to nullify all my labels. Is the item transform not being applied to the targets in a way that’s making all the bounding boxes empty?

Meanwhile, I’m investigating how bb_pad operates on its parameters …
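Reading around, bb_pad apparently calls a helper named clip_remove_empty on each sample first. Here’s a rough plain-Python sketch of what that helper does, as I understand it (the real fastai version operates on TensorBBox tensors, not lists):

```python
def clip_remove_empty(boxes, labels):
    """Plain-Python sketch of fastai's clip_remove_empty: clamp every
    coordinate into [-1, 1], then drop boxes with non-positive area."""
    kept_boxes, kept_labels = [], []
    for (x1, y1, x2, y2), lbl in zip(boxes, labels):
        # clamp into the [-1, 1] range that PointScaler is supposed to produce
        x1, y1, x2, y2 = (max(-1.0, min(1.0, c)) for c in (x1, y1, x2, y2))
        if (x2 - x1) * (y2 - y1) > 0:  # "empty" boxes are removed
            kept_boxes.append((x1, y1, x2, y2))
            kept_labels.append(lbl)
    return kept_boxes, kept_labels

# Unscaled pixel coordinates like mine all clamp to (1, 1, 1, 1), which
# has zero area, so every box (and its label) vanishes:
print(clip_remove_empty([(860., 1805., 874., 1779.)], [195]))  # ([], [])
```

That would explain the 0x4 TensorBBox output above: if the coordinates reaching bb_pad are still raw pixels, clamping flattens every box to a point.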

Okay, so it’s clearly because my TensorBBox is not between -1 and 1 when it hits bb_pad (really, when clip_remove_empty is applied). Why not? Rather, what DOES scale the points? The example in the fastai docs doesn’t do anything extra, and the bounding boxes there are definitely outside the [-1, 1] clamp range.

I’m thinking out loud at this point, as it slowly becomes clearer. It comes down to my understanding of this paragraph from the fastai docs:

Unless specifically mentioned, all the following transforms can be used as single-item transforms (in one of the list in the tfms you pass to a TfmdDS or a Datasource) or tuple transforms (in the tuple_tfms you pass to a TfmdDS or a Datasource). The safest way that will work across applications is to always use them as tuple_tfms. For instance, if you have points or bounding boxes as targets and use Resize as a single-item transform, when you get to PointScaler (which is a tuple transform) you won’t have the correct size of the image to properly scale your points.
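The key, if I read that right, is that PointScaler maps pixel coordinates into [-1, 1] using the image size it sees at that moment. A toy version of the mapping (my own function for illustration, not fastai’s API):

```python
def scale_points(pts, img_w, img_h):
    """Toy version of the coordinate mapping PointScaler performs:
    pixel coords -> [-1, 1], relative to the current image size."""
    return [(x / img_w * 2 - 1, y / img_h * 2 - 1) for x, y in pts]

# Against the full 1776x2808 page, a centered point lands at (0, 0):
print(scale_points([(888, 1404)], 1776, 2808))  # [(0.0, 0.0)]
```

So if a single-item Resize changes the image before the scaler runs, the size the scaler uses no longer matches the coordinates, and everything downstream (like the [-1, 1] clamp in bb_pad) misbehaves.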

I’m struggling to work out how to apply this. Should I specify explicit after_item transforms, such as a PointScaler, before the before_batch hook? I suppose I’ll try it, but I still need some kind of resize to make sense of the images …

So, I got something working:

pages_db = DataBlock(
    blocks=(ImageBlock, BBoxBlock, BBoxLblBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(),
    get_y=[
        lambda o: trn_df_bb_trim.loc[trn_df_bb_trim['name']==f'{o.stem}{o.suffix}','label'].values[0][0],
        lambda o: trn_df_bb_trim.loc[trn_df_bb_trim['name']==f'{o.stem}{o.suffix}','label'].values[0][1]
    ],
    n_inp=1
)

pages_ds = pages_db.datasets(TRN_DATA_PTH)

pages_dl = TfmdDL(pages_ds, bs=1, after_item=[BBoxLabeler(), PointScaler(), ToTensor()])

This produces a good image when I call show_batch on it. However, I’m still not satisfied, particularly because:

  1. Why does the example for the coco dataset not need to do this?
  2. Why can’t I put [BBoxLabeler(), PointScaler(), ToTensor()] into the item_tfms field of the DataBlock and call dataloaders instead of datasets? As far as I can tell this should work, since a DataLoader’s after_item is set to the passed-in item_tfms.
  3. I can’t set a batch size > 1 because the collate fails. However, no matter where I put a Resize tfm, it doesn’t seem to work or do anything. I continually get:

RuntimeError: stack expects each tensor to be equal size, but got [3, 3062, 1854] at entry 0 and [3, 4400, 2920] at entry 1
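That error is the default collate step refusing to stack images of different shapes into one batch tensor; it only goes away once every image in a batch has the same size. The constraint, sketched in plain Python (not fastai code):

```python
def can_collate(shapes):
    """Sketch of the constraint behind the RuntimeError: the default
    collate stacks per-sample tensors, so all shapes must be identical."""
    return len(set(shapes)) == 1

print(can_collate([(3, 3062, 1854), (3, 4400, 2920)]))  # False -> stack fails
print(can_collate([(3, 512, 512)] * 4))                 # True, once a resize actually runs
```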

I’ll take all the help I can get.

So I must be the pinnacle of stUpid. My problem: the Y coordinate wasn’t correct for the bottom-right corner. Look at the sample above: [860., 1805., 874., 1779.] has a bottom-right y smaller than the top-left y, so clip_remove_empty computes a non-positive area for every box and drops them all. Solution: fixed the coordinates so each box is (xmin, ymin, xmax, ymax). I was SURE I had the coordinates right, since without bb_pad, show_batch with bs=1 was working.
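In hindsight, a quick sanity check on the raw boxes would have caught this before any fastai spelunking. A hypothetical helper (check_boxes is my own name, not a library function):

```python
def check_boxes(boxes):
    """Return the boxes that violate the (xmin, ymin, xmax, ymax)
    convention; any hit will count as 'empty' after clipping."""
    return [b for b in boxes if b[2] <= b[0] or b[3] <= b[1]]

# Every sample box from the summary output above fails: ymax < ymin.
print(check_boxes([(860., 1805., 874., 1779.), (813., 1781., 883., 1748.)]))
```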

Now, bs=4 works flawlessly. I love myself. If anyone comes across this thread, I hope you find it entertaining.

Final solution:

pages_db = DataBlock(
    blocks=(ImageBlock, BBoxBlock, BBoxLblBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(),
    get_y=[
        lambda o: trn_df_bb_trim.loc[trn_df_bb_trim['name']==o.name, 'label'].values[0][0],
        lambda o: trn_df_bb_trim.loc[trn_df_bb_trim['name']==o.name, 'label'].values[0][1]
    ],
    item_tfms=Resize(512, method='squish'),  # one uniform size so the default collate can stack the batch
    n_inp=1
)
pages_dl = pages_db.dataloaders(TRN_DATA_PTH, bs=4)