Project: GoogleNet data augmentation

Here’s a fun little project if anyone is interested in checking it out today: can you implement the GoogleNetResize transformation from this repo? I’ve noticed that a lot of recent top results on ImageNet use this, so it would be nice if we had it available! :slight_smile:

For extra points, try to implement the other transformations in fbresnet_augmentor in that same file. We already have brightness and contrast transformations, but I didn’t do them very carefully and I think it’s worth thinking about doing it “properly”; i.e. doing it in HSV color space or something like that. I’m guessing that’s what imgaug does but I didn’t check.

BTW the approach they’re using in fbresnet_augmentor is probably slow - doing all the lightness/color transformations as separate steps. I also suspect it may lead to cropping problems. That’s why I do brightness/contrast in one function. If you look into these transformations you may want to think about this issue (and I may well be wrong!)
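
To make the one-function idea concrete, here is a minimal sketch in plain Python. It assumes pixel values are floats in [0, 1], and it assumes the concern with separate steps is that values get clipped between them. The function names are made up for illustration and are not fastai or tensorpack APIs.

```python
def lighting_one_pass(pixels, b, c):
    """Apply contrast c and brightness b together, clipping once to [0, 1]."""
    mu = sum(pixels) / len(pixels)
    return [min(1.0, max(0.0, (p - mu) * c + mu + b)) for p in pixels]


def lighting_two_pass(pixels, b, c):
    """Apply brightness, clip, then contrast, clip again (for comparison)."""
    bright = [min(1.0, max(0.0, p + b)) for p in pixels]
    mu = sum(bright) / len(bright)
    return [min(1.0, max(0.0, (p - mu) * c + mu)) for p in bright]


pixels = [0.2, 0.8, 0.95]
one = lighting_one_pass(pixels, b=0.2, c=0.5)
two = lighting_two_pass(pixels, b=0.2, c=0.5)
print(one)
print(two)
```

In the two-pass version the two brightest pixels are both clipped to 1.0 after the brightness step, so they end up identical; the one-pass version keeps them distinct because it clips only at the end.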


Hi Jeremy,

I was curious to understand what exactly this transformation is, and I ended up diving deep into it.

For this input,

the output looks like this:

To my understanding, it is a mix of several zooms and aspect ratios. I went through the repo and extracted just the relevant piece of code to get the transform working as a standalone class here: Would this class suffice as-is to be part of fastai, with just some function renaming? Also, I think I might have to inherit from the CoordTransform parent class, if I am not mistaken.

Would you suggest any other improvements?

EDIT 1: Ah, there’s a mistake. Will update this.

Some more examination here


Exactly. Check out the paper the code references, if you haven’t already - there’s a short section there on augmentation that explains the basic idea, in fairly simple language.

Your implementation is looking good! I think to incorporate it into fastai some useful steps would be:

  • See if you can inherit from CoordTransform so as to get y-value augmentation working too (and then try testing on masks and coordinates as y values)
  • Add a new enum member CropType.GOOGLENET
  • Change tfms_from_stats so it says:
    val_crop = CropType.CENTER if crop_type in (CropType.RANDOM,CropType.GOOGLENET) else crop_type
  • in image_gen don’t set scale for that crop_type (since we both scale and crop in one function)
  • in Transforms.__init__ you can then set crop_tfm to GoogleNetResize for that crop_type

…or something like that. How does that sound? (I’ll be in USF a bit later today but can chat about it if you’re around.)
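
The enum and validation-crop steps could look roughly like the sketch below. The `CropType` class here is a stand-in defined only for this example (the numeric values are assumptions), and `validation_crop` is a hypothetical helper that wraps the one-line rule from the list above.

```python
from enum import IntEnum


class CropType(IntEnum):
    # Stand-in for fastai's CropType; the numeric values are assumptions.
    RANDOM = 1
    CENTER = 2
    NO = 3
    GOOGLENET = 4


def validation_crop(crop_type):
    """Random-style crops fall back to a center crop at validation time."""
    if crop_type in (CropType.RANDOM, CropType.GOOGLENET):
        return CropType.CENTER
    return crop_type


print(validation_crop(CropType.GOOGLENET))
print(validation_crop(CropType.NO))
```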

Also, you should probably test what happens when the input image is not much bigger than the output size. In that case, the default crop area fraction of 0.08 is obviously far too low. Does the code seem to handle this OK? (we wouldn’t want it to zoom in further than to 1-pixel resolution).
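
A quick way to reason about this is to compute the smallest crop the sampler could pick and how much it would have to be upscaled. This is a back-of-the-envelope sketch: the helper names are hypothetical, it ignores integer rounding, and it assumes the aspect ratio is limited to a maximum of 4/3.

```python
import math


def smallest_crop_side(width, height, min_area_frac=0.08, max_aspect_ratio=4 / 3):
    """Shortest side (in pixels) of the smallest crop that could be sampled."""
    area = min_area_frac * width * height
    # For a fixed area, the shortest side occurs at the most extreme aspect ratio.
    return math.sqrt(area / max_aspect_ratio)


def max_upscale(width, height, target, **kwargs):
    """Worst-case zoom factor when the smallest crop is resized to `target`."""
    return target / smallest_crop_side(width, height, **kwargs)


def safe_min_area_frac(width, height, target, max_aspect_ratio=4 / 3):
    """Smallest area fraction that keeps every crop side >= `target` pixels."""
    return min(1.0, target * target * max_aspect_ratio / (width * height))


print(max_upscale(256, 256, 224))          # worst-case zoom for a small image
print(safe_min_area_frac(500, 375, 224))   # fraction that avoids upscaling
```

For a 256x256 input and a 224 target, the default fraction of 0.08 allows zooming in by more than 3x, which is well past 1-pixel resolution.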


If I understood correctly, it repeatedly generates a candidate crop height and width and checks whether both are less than or equal to the image’s actual height and width. If the condition is satisfied within n trials (10 in the repo), it crops that region and resizes it to the target size; otherwise it scales the shortest side to the target size (the scale_min method in fastai.transforms) and center-crops the image. Please correct me if I am wrong.
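
The procedure described above can be sketched in plain Python as follows. This is written from the description in this thread rather than copied from the repo, so details such as the rounding and the `swap_prob` parameter (the chance of swapping width and height) are assumptions, and both function names are hypothetical. It only computes crop boxes; the actual cropping and resizing is left out.

```python
import math
import random


def sample_googlenet_crop(width, height, min_area_frac=0.08,
                          min_aspect_ratio=0.75, max_aspect_ratio=4 / 3,
                          swap_prob=0.5, n_trials=10, rng=random):
    """Return a random crop box (x, y, w, h), or None if no trial fits."""
    area = width * height
    for _ in range(n_trials):
        target_area = rng.uniform(min_area_frac, 1.0) * area
        aspect = rng.uniform(min_aspect_ratio, max_aspect_ratio)
        w = int(math.sqrt(target_area * aspect) + 0.5)
        h = int(math.sqrt(target_area / aspect) + 0.5)
        if rng.random() < swap_prob:
            w, h = h, w
        # Accept the candidate only if it fits inside the image.
        if 0 < w <= width and 0 < h <= height:
            x = rng.randint(0, width - w)
            y = rng.randint(0, height - h)
            return x, y, w, h
    return None


def fallback_center_crop(width, height, target):
    """Scale the shortest side to `target`, then center-crop a square.

    Returns the scaled (width, height) and the crop's top-left corner.
    """
    scale = target / min(width, height)
    sw = max(target, round(width * scale))
    sh = max(target, round(height * scale))
    return (sw, sh), ((sw - target) // 2, (sh - target) // 2)


box = sample_googlenet_crop(500, 375, rng=random.Random(0))
print(box if box is not None else fallback_center_crop(500, 375, 224))
```

Returning None from the sampler keeps the fallback decision with the caller, which makes it easy to test the two paths separately.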

I managed to write a class that inherits CoordTransform here:

I have also included other transformations in fastai.transforms to compare the outputs and also to check my understanding.


As you’ve made significant progress on GoogleNetResize, maybe continue with it, and I’ll look at the other two transformations that we discussed some time back: Lighting and CutOut. Let me know in case you’d like some help with GoogleNetResize.

Thank you @binga

I have made the changes mentioned by Jeremy here:

I have one question:
Currently, the parameters to googlenet_resize, such as min_area_frac=0.08, min_aspect_ratio=0.75, max_aspect_ratio=1.333, and p=0.5, are set to the defaults used in the repo. Should they be configurable?

I was able to run the Pascal Lesson-8 notebook using CropType.GOOGLENET here:
and the final results are as follows:

Using CropType.NO:

Using CropType.GOOGLENET:


  1. Detection accuracies were more or less the same in both cases. However, detection l1 was better with CropType.NO; with CropType.GOOGLENET it was close to 35% worse.
  2. For the same number of epochs, the train loss is very close to the val loss with CropType.GOOGLENET, unlike with CropType.NO.

Bounding boxes with tfm_y=TfmType.COORD and CropType.GOOGLENET are as follows:

Since the bounding boxes (and sometimes the crops themselves) come out wrong with CropType.GOOGLENET, I think that explains the poor l1 in object detection. What do you think?

I will try CropType.GOOGLENET on another dataset and share the results.

Please suggest any improvements required. Thanks again.

Great job @ziron!

I wouldn’t bother testing googlenet aug with bounding boxes, since it hasn’t been used for that in the academic literature and it’ll take a lot of work to find suitable hyperparams. Instead, test it with classification datasets.

And yes please do make the settings into params. Could you go ahead and submit a PR so we can try this out?
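
One possible shape for the configurable settings is a small parameter object with the defaults quoted earlier in the thread. This is only a sketch: `GoogleNetResizeParams` is a hypothetical name, not something in fastai, and the exact role of `p` is not spelled out in the thread, so it is only range-checked here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoogleNetResizeParams:
    """Settings for the crop sampler, with the defaults quoted in the thread."""
    min_area_frac: float = 0.08
    min_aspect_ratio: float = 0.75
    max_aspect_ratio: float = 1.333
    p: float = 0.5  # probability parameter; its role is assumed, not confirmed

    def __post_init__(self):
        if not 0.0 < self.min_area_frac <= 1.0:
            raise ValueError("min_area_frac must be in (0, 1]")
        if not 0.0 < self.min_aspect_ratio <= self.max_aspect_ratio:
            raise ValueError("need 0 < min_aspect_ratio <= max_aspect_ratio")
        if not 0.0 <= self.p <= 1.0:
            raise ValueError("p must be in [0, 1]")


defaults = GoogleNetResizeParams()
gentler = GoogleNetResizeParams(min_area_frac=0.35)
print(defaults)
print(gentler)
```

Validating at construction time means a bad setting fails before any image is processed, which seems preferable to silently falling back to the center crop on every image.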