Custom data augmentation requiring access to inputs and targets

Hi, I’m writing some image augmentations where the transformations of the inputs and the targets depend on each other.
As far as I understand, the “normal” transformations take either the inputs or the targets and apply the respective transformation to them.
I found that using .add_tfm() on the databunch I can add a transformation which receives both inputs and targets.
However, this (again, as far as I understand) is applied after all the other transformations have happened.
Is there a way to add such a transformation before the other transformations?
My main problem is that the resizing of the images is already applied once I get to my custom augmentation.
The data bunch is created like this (for brevity I omitted the parameters):

data = (
ObjectItemList.from_df()
.split_by_rand_pct()
.label_from_func()
.transform()
.databunch()
)
data.add_tfm()
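
From what I can tell, a transform added this way receives the whole collated batch, so it roughly has this shape (just a sketch with a made-up name; y holds the targets, for object detection the bbox and label tensors):

def joint_batch_tfm(batch):
    # batch is the already-collated (and already-resized) (x, y) pair:
    # x: image tensor of shape [bs, 3, h, w], y: the targets.
    x, y = batch
    # ... modify x and y together here ...
    return x, y

data.add_tfm(joint_batch_tfm)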

Tricky. Can you be more specific about what you want to do? An example would be nice.

Yes of course: I’m trying to implement the original augmentations of the SSD paper:

Sample a patch so that the minimum jaccard overlap with the objects is 0.1, 0.3,
0.5, 0.7, or 0.9

This requires computing candidate crop ROIs and then checking them against the available bounding boxes using the Jaccard overlap.
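
In essence the check is something like this (a simplified sketch, not my actual implementation; the scale range and helper names are made up, and boxes are assumed to be [N, 4] tensors in (x1, y1, x2, y2) format):

import random
import torch

def jaccard(crop, boxes):
    # IoU between one crop [4] and N boxes [N, 4].
    tl = torch.max(crop[:2], boxes[:, :2])
    br = torch.min(crop[2:], boxes[:, 2:])
    inter = (br - tl).clamp(min=0).prod(dim=1)
    area_crop = (crop[2:] - crop[:2]).prod()
    area_boxes = (boxes[:, 2:] - boxes[:, :2]).prod(dim=1)
    return inter / (area_crop + area_boxes - inter)

def sample_crop(boxes, w, h, max_trials=50):
    # Draw random crops until one reaches the sampled minimum overlap
    # with at least one ground truth box; give up after max_trials.
    min_iou = random.choice([0.1, 0.3, 0.5, 0.7, 0.9])
    for _ in range(max_trials):
        cw, ch = random.uniform(0.3, 1.0) * w, random.uniform(0.3, 1.0) * h
        cx, cy = random.uniform(0, w - cw), random.uniform(0, h - ch)
        crop = torch.tensor([cx, cy, cx + cw, cy + ch])
        if jaccard(crop, boxes).max() >= min_iou:
            return crop
    return None  # fall back to keeping the full image
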
As I’m writing this, I have a rough version running using .add_tfm(). In my opinion this is suboptimal, as I have to squish the newly generated patch into a square input size again, which in my current implementation is extremely slow due to going from the GPU back to the CPU, to numpy, then to OpenCV for the resize, and all the way back to CUDA… and, as the notebook is running right now, it is infeasible even on a V100 :confused:

small update: the slowness is also due to the principle of that augmentation, as it rejects a lot of candidates before it uses one. The runtime is also extremely non-deterministic, as the number of candidates being checked may vary between one and a few thousand…
which I’m limiting now.

next update: I optimized it a bit and now I’m down to 1 min per epoch, which is about double the time without this augmentation. This is ok for the moment, but I still think it would be way more efficient and result in better input data if it were done before the resize in transform().
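
(One way to avoid the GPU → CPU → OpenCV → GPU roundtrip I complained about above is to do the resize directly on the GPU — a sketch, not necessarily what I ended up with:)

import torch.nn.functional as F

def resize_on_gpu(patch, size):
    # patch: a [3, h, w] float tensor already on the GPU;
    # interpolate expects a batch dimension, hence the [None] / [0].
    return F.interpolate(patch[None], size=size, mode='bilinear',
                         align_corners=False)[0]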

I’m a bit confused. Isn’t this done with the output of the model to determine which box predictions to keep?

yes, a similar step is done in the loss function to select the positive samples, but there the IoU between the anchor boxes and the ground truth boxes is computed.
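
(That matching step is essentially the pairwise version of the same computation — an IoU matrix between all anchors and all ground truth boxes. A rough sketch, with both inputs assumed as (x1, y1, x2, y2) tensors:)

import torch

def pairwise_iou(anchors, gt):
    # anchors: [A, 4], gt: [G, 4] -> IoU matrix of shape [A, G].
    tl = torch.max(anchors[:, None, :2], gt[None, :, :2])  # intersection top-left
    br = torch.min(anchors[:, None, 2:], gt[None, :, 2:])  # intersection bottom-right
    inter = (br - tl).clamp(min=0).prod(dim=2)
    area_a = (anchors[:, 2:] - anchors[:, :2]).prod(dim=1)
    area_g = (gt[:, 2:] - gt[:, :2]).prod(dim=1)
    return inter / (area_a[:, None] + area_g[None, :] - inter)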

However, one of the augmentations the SSD authors mention is what I described above: creating good crops which make sure that objects are contained and that the majority of their “body” is part of the crop.

This augmentation looks really promising, I might even say amazing :slight_smile:
The loss is currently going down way more than ever before in my experiments. It’s at 75 percent of my former minimal loss and keeps falling…

If it is of general interest, I’m going to supply the code after making it a bit nicer next week.
The weekend is approaching fast and I’m squeezing in as many epochs as possible before leaving…

okay, I was too euphoric. I missed that a transformation added using .add_tfm() on the databunch is obviously applied to both training and validation.
What I actually want is this: data.train_dl.add_tfm().
Now the improvement of the validation loss is almost completely gone and needs further investigation.

I’m back at it. It took me some time to get mAP running so I could finally measure the performance and play around with some things.
I’m seeing a difference in mAP of about 5 points between the default augmentations from get_transforms() and my custom transformation. I have the feeling that this could be improved if the transformation could be applied earlier in the chain. @sgugger, you said this would be tricky. Is it possible at all? Is implementing a custom databunch the right way to do that?

This isn’t supported directly for the time being. The workaround I can think of is creating a custom ItemList that groups your image and the bboxes, for which you could define a custom apply_tfms method; otherwise the pipeline applies the transforms separately to inputs and targets.
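
Very roughly, the item such a custom ItemList would return could look like this (an untested sketch; the names are placeholders, and joint_ssd_crop would be your custom crop logic):

from fastai.vision import *

class ImageWithBBoxes(ItemBase):
    "Placeholder item bundling an image and its bounding boxes."
    def __init__(self, img, bboxes):
        self.img, self.bboxes = img, bboxes
        self.obj, self.data = (img, bboxes), (img.data, bboxes.data)

    def apply_tfms(self, tfms, **kwargs):
        # Apply the joint crop first, at full resolution,
        # then run the regular transform pipeline on the image.
        self.img, self.bboxes = joint_ssd_crop(self.img, self.bboxes)
        self.img = self.img.apply_tfms(tfms, **kwargs)
        self.data = (self.img.data, self.bboxes.data)
        return self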

ok, thanks! I will look into this.