Lesson 2 In-Class Discussion

Here is an example from the fish.ipynb notebook (inside the dl1 directory). In this Kaggle competition the important part of each image is a fish. The fish were often not in the middle of the image, so the cropped image missed the most important information.

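sz = 350
# CropType.NO turns off cropping, so an off-center fish is not cut out of the image
tfms = tfms_from_model(resnet34, sz, crop_type=CropType.NO)
# PATH, csv_fname, bs and val_idxs must already be defined at this point
data = ImageClassifierData.from_csv(PATH, "images", csv_fname, bs, tfms, val_idxs)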

Thanks @Moody for bringing this up.
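To make the point about off-center subjects concrete, here is a toy sketch in plain Python. It is not fastai's implementation: `center_crop` and `squash` are hypothetical helpers, and nested lists stand in for image arrays.

```python
# Toy illustration only: nested lists stand in for image arrays.

def center_crop(img, size):
    """Keep the central size x size window of a 2-D image (list of rows)."""
    h, w = len(img), len(img[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]

def squash(img, size):
    """Resize the whole image to size x size by nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

# A 4 x 12 landscape "photo": the fish (the 1s) sits at the far left.
photo = [[1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] for _ in range(4)]

cropped = center_crop(photo, 4)   # keeps only the middle columns
squashed = squash(photo, 4)       # samples columns across the full width

fish_in_crop = sum(map(sum, cropped))      # fish pixels left after cropping
fish_in_squash = sum(map(sum, squashed))   # fish pixels left after squashing
print(fish_in_crop, fish_in_squash)
```

The center crop drops the fish entirely, while squashing the full frame keeps part of it at the cost of distorting the aspect ratio.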

@yinterian could you clarify whether, when we set sz, the default resize is a center crop? And to avoid this, can we just follow your code and pass in crop_type=CropType.NO? Or is this only for augmentation/transforms?

I have a related question on the sz parameter to tfms_from_model. Can we provide a height x width size like (400 x 250), or does it need to be square? Digging into the code a little, it looks like it expects only one value for sz and resizes to a square image.

Is there any way we could accept the size input as either an int or a tuple of (h, w)? Or was that found to be not that useful?


Based on what @moody mentioned: when cropping is used to convert the images to a square shape and data augmentation is also used, does 1) cropping happen first and then augmentation, or 2) augmentation of the original image happen first and then cropping? It seems like option 2 may retain more data, but option 1 is more computationally efficient.


It needs to be square. It’s a limitation of how GPUs are programmed nowadays, but I suspect at some point people will handle rectangular inputs too. For now, no library can do this (AFAIK).

Data augmentation happens before cropping, for the reason you mention (retains more data).
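A toy sketch of why the order matters, using a 2-pixel shift as the augmentation. The helpers `shift_right` and `center_crop` are hypothetical and only illustrate the idea; they are not taken from the library.

```python
def shift_right(img, k):
    """Shift every row k pixels to the right, zero-padding the left edge."""
    return [[0] * k + row[:len(row) - k] for row in img]

def center_crop(img, size):
    """Keep the central size x size window of a 2-D image (list of rows)."""
    h, w = len(img), len(img[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]

# A 4 x 8 image in which every pixel is real data (1); padding shows up as 0.
photo = [[1] * 8 for _ in range(4)]

crop_then_shift = shift_right(center_crop(photo, 4), 2)   # option 1
shift_then_crop = center_crop(shift_right(photo, 2), 4)   # option 2

real_pixels_crop_first = sum(map(sum, crop_then_shift))
real_pixels_augment_first = sum(map(sum, shift_then_crop))
print(real_pixels_crop_first, real_pixels_augment_first)
```

Cropping first leaves padding inside the final window, whereas augmenting first lets real pixels from outside the crop window move into it.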


OK… The reason I ask is that my images are stock photos with an aspect ratio of 1.5 to 1, height to width. I will try the square image. But the torchvision transforms do seem to accept rectangular sizes: http://pytorch.org/docs/master/torchvision/transforms.html#torchvision.transforms.Resize

I can create a pull request if you think it might be a useful feature to add.

I’m also interested in having this feature as long as it’s technically feasible… in my case I’m working with images that have a 2:1 ratio.

fwiw, I know some people in the recently completed Carvana Kaggle competition seemed to be using varying rectangular shaped images since that dataset contained images that were all ratio 1.5:1

For Data Augmentation how do we know the number of new objects being created?

Or are we just transforming the images?

No new objects are created. But on each epoch, the data loader applies transforms with random parameters (like zoom, shear, shift, etc.) to each image to create a slightly modified version of the input image, so that the network doesn’t overfit to the input images.


It’s easy enough to do it in the transforms - the issue is that the model itself needs to have a consistent input size. I guess it doesn’t really have to be square - just consistent. But in practice generally we have a mix of landscape and portrait orientation, which means square is the best compromise.

If you have a dataset that’s consistently of a particular orientation, then perhaps it does indeed make sense to use a rectangular input - in which case feel free to provide a PR which allows that (i.e. sz everywhere it’s used would assume square if it’s an int, or rectangle if a tuple).
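A minimal sketch of the int-or-tuple convention described above. `normalize_size` is a hypothetical helper name used for illustration, not an existing fastai function.

```python
def normalize_size(sz):
    """Return (height, width): a square if sz is an int, else the given pair."""
    if isinstance(sz, int):
        return (sz, sz)
    h, w = sz  # raises if sz is not a 2-item sequence
    return (int(h), int(w))

square = normalize_size(224)            # int means a square input
rectangle = normalize_size((400, 250))  # tuple means (height, width)
print(square, rectangle)
```

Every place that currently reads sz could call such a helper once and then work with an explicit (height, width) pair.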


But should we be creating new images? i.e. keep the original and create a new transformed version of it as well. I thought that’s data augmentation.

I guess if we have a sufficiently large dataset, doing an in-place transformation might be okay. But if we start with less data, it might be better to add. What’s the guidance here?

The new image is always created dynamically, on-the-fly, and then never reused. It’s not being stored anywhere, so the original is always kept.
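As a rough sketch of what "on-the-fly" means here: the class below is a hypothetical stand-in for a dataset, with a random horizontal flip as the only augmentation. It is not fastai's data loader, just an illustration of the behaviour described.

```python
import random

class AugmentedDataset:
    """Returns a freshly transformed copy on every access; stores nothing new."""

    def __init__(self, images, seed=0):
        self.images = images
        self.rng = random.Random(seed)

    def __len__(self):
        # Augmentation does not change the number of items.
        return len(self.images)

    def __getitem__(self, i):
        img = self.images[i]
        if self.rng.random() < 0.5:
            return [row[::-1] for row in img]  # flipped copy
        return [row[:] for row in img]         # plain copy

originals = [[[1, 2, 3], [4, 5, 6]]]
ds = AugmentedDataset(originals)

# Reading the same item once per "epoch" can give a different variant each time.
seen = [ds[0] for _ in range(20)]
print(len(ds), originals[0])
```

The dataset length stays the same and the stored original is never modified; each variant exists only for the batch that uses it.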


training becomes very slow, so it can have a similar impact to being in a local minima.

In practice, can it actually result in local minima?

So, is this the equation?

Our architecture ~= Predefined architecture (No change) + second last layer (Calculating activations for this layer for the provided data?) + last layer (Output layer)


Pretty close. We add a few layers to the end, but in practice what you describe is close enough :)


Questions - original Q and Jeremy’s reply:

This is what we do:

  • We find an optimal ‘lr’ and feed it into ‘Adam’.
  • ‘Adam’, an adaptive optimizer, will start with our ‘lr’ and reduce the overall loss. Are we using ‘Adam’ because it has an adaptive ‘lr’ technique?
  • If statement 2 is correct, then why are we explicitly supplying 3 learning rates (when we unfreeze) in the code?
  • If statement 2 is incorrect, then why not use SGD? What makes ‘Adam’ special apart from its adaptive ‘lr’? I guess it’ll be covered in future lectures?

We specify 3 learning rates for different layer groups, not for a single layer. Different layer groups need different amounts of fine-tuning and hence different learning rates. Before unfreezing, we were only training the last layer, so we only needed to supply one learning rate. After unfreezing, if we supply only one learning rate, the fastai library will use the same learning rate for all the layer groups, and this may not be ideal.
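A plain-Python sketch of the idea, assuming three layer groups and a basic SGD update. The helper names (`expand_lrs`, `sgd_step`) and the rates 1e-4, 1e-3, 1e-2 are illustrative assumptions, not values or functions taken from the library.

```python
def expand_lrs(lr, n_groups):
    """Return one learning rate per layer group."""
    if isinstance(lr, (int, float)):
        return [float(lr)] * n_groups  # a single rate is reused for every group
    lrs = [float(x) for x in lr]
    if len(lrs) != n_groups:
        raise ValueError("need exactly one learning rate per layer group")
    return lrs

def sgd_step(groups, grads, lrs):
    """Apply w <- w - lr * grad, using each group's own learning rate."""
    return [[w - lr * g for w, g in zip(ws, gs)]
            for ws, gs, lr in zip(groups, grads, lrs)]

# Three layer groups with one weight each, all receiving the same gradient.
groups = [[1.0], [1.0], [1.0]]
grads = [[1.0], [1.0], [1.0]]

same_everywhere = expand_lrs(0.01, 3)
per_group = expand_lrs([1e-4, 1e-3, 1e-2], 3)  # smaller steps for earlier groups
updated = sgd_step(groups, grads, per_group)
print(updated)
```

With per-group rates, the early pretrained layers move only slightly while the newly added layers take larger steps.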

So ‘Adam’, when used without unfreezing, will adapt the learning rate over time for the last layer?
How does Adam’s adaptive nature help just for the last layer?

I’m trying to run the "Dogs v Cats super-charged!" notebook and getting a weights-not-found exception. Where can I get these weights? Are there any prerequisites for running these notebooks?

FileNotFoundError: [Errno 2] No such file or directory: ‘…/fastai/courses/dl1/fastai/weights/resnext_50_32x4d.pth’


Download the weights from http://files.fast.ai/models/weights.tgz and extract the archive into that ‘fastai/courses/dl1/fastai/weights/’ folder.
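If you prefer to do this from Python, a minimal sketch with the standard library could look like the following. The function names are hypothetical, and the destination path is only the folder named in the post; adjust it to your own checkout. The layout inside the archive is not stated here, so check the listing that `extract` returns and move files if they land in a nested folder.

```python
import os
import tarfile
import urllib.request

WEIGHTS_URL = "http://files.fast.ai/models/weights.tgz"  # URL from the post above

def download(url, archive_path):
    """Fetch the archive to a local file and return its path."""
    urllib.request.urlretrieve(url, archive_path)
    return archive_path

def extract(archive_path, dest_dir):
    """Unpack a .tgz archive into dest_dir and list what ended up there."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
    return sorted(os.listdir(dest_dir))

# Example usage (not run here):
# archive = download(WEIGHTS_URL, "weights.tgz")
# print(extract(archive, "fastai/courses/dl1/fastai/weights"))
```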