Lesson 2 In-Class Discussion

For Data Augmentation how do we know the number of new objects being created?

Or are we just transforming the images?

No new objects are created. But on each epoch, the data loader applies transforms with random parameters (like zoom, shear, shift, etc.) to each image, producing a slightly modified version of the input so that the network doesn't overfit to the original images.
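A minimal sketch of this idea using plain torchvision transforms (not the fastai API from the lesson; the specific transforms, parameters, and the `data/train` path are illustrative assumptions):

```python
import torchvision.transforms as T
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

# Random transforms are re-sampled every time an image is fetched,
# so each epoch sees a slightly different version of every image.
train_tfms = T.Compose([
    T.RandomResizedCrop(224, scale=(0.8, 1.0)),                 # random zoom/crop
    T.RandomAffine(degrees=0, translate=(0.1, 0.1), shear=10),  # shift + shear
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

# 'data/train' is a hypothetical folder with one subfolder per class.
train_ds = ImageFolder('data/train', transform=train_tfms)
train_dl = DataLoader(train_ds, batch_size=64, shuffle=True)
```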


It’s easy enough to do it in the transforms - the issue is that the model itself needs a consistent input size. I guess it doesn’t really have to be square, just consistent. But in practice we generally have a mix of landscape and portrait orientations, which means square is the best compromise.

If you have a dataset that’s consistently of a particular orientation, then perhaps it does indeed make sense to use a rectangular input - in which case feel free to provide a PR which allows that (i.e. sz, everywhere it’s used, would assume a square if it’s an int, or a rectangle if it’s a tuple).
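A sketch of how such a PR might normalize sz (a hypothetical helper, not actual fastai code):

```python
def resolve_size(sz):
    """Interpret sz as (height, width): an int means a square,
    a 2-tuple means a rectangle. Hypothetical helper, not fastai code."""
    if isinstance(sz, int):
        return (sz, sz)
    h, w = sz  # raises if sz isn't a 2-element sequence
    return (h, w)

assert resolve_size(224) == (224, 224)
assert resolve_size((256, 320)) == (256, 320)
```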


But should we be creating new images? i.e. keep the original and also create a new, transformed version of it. I thought that’s what data augmentation is.

I guess if we have a sufficiently large dataset, an in-place transformation might be okay. But if we start with less data, it might be better to add images. What’s the guidance here?


The new image is always created dynamically, on-the-fly, and then never reused. It’s not being stored anywhere, so the original is always kept.
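You can see this by fetching the same dataset item twice; with a random transform in place, the two results differ (continuing the hypothetical torchvision setup above):

```python
import torch

# Same index, two fetches: the random transform is re-sampled each time,
# so the tensors (almost surely) differ. Nothing is written back to disk.
img_a, _ = train_ds[0]
img_b, _ = train_ds[0]
print(torch.equal(img_a, img_b))  # -> False (with overwhelming probability)
```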


training becomes very slow, so it can have a similar impact to being in a local minimum.

In practice, can it actually result in a local minimum?

So, is this the equation?

Our architecture ≈ predefined architecture (no change) + second-to-last layer (calculating activations for this layer from the provided data?) + last layer (output layer)


Pretty close. We add a few layers to the end, but in practice what you describe is close enough :)
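A minimal PyTorch sketch of that idea: keep a pretrained backbone frozen and train only a small new head (the model choice and layer sizes here are illustrative assumptions, not the exact layers fastai adds):

```python
import torch.nn as nn
from torchvision.models import resnet34

model = resnet34(pretrained=True)

# Freeze the pretrained layers: their activations become fixed features.
for p in model.parameters():
    p.requires_grad = False

# Replace the final layer with a small new head for a 2-class problem.
model.fc = nn.Sequential(
    nn.Linear(model.fc.in_features, 512),  # "second-to-last" layer
    nn.ReLU(),
    nn.Linear(512, 2),                     # output layer
)
# Only model.fc's parameters require gradients, so only the head trains.
```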


Questions (original question and Jeremy’s reply):

This is what we do:

  • We find an optimal learning rate (lr) and feed it into Adam.
  • Adam, an adaptive optimizer, will start with our lr and reduce the overall loss. Is the reason we use Adam that it has an adaptive learning-rate technique?
  • If statement 2 is correct, then why are we explicitly supplying three learning rates (when we unfreeze) in the code?
  • If statement 2 is incorrect, then why not use SGD? What makes Adam special apart from the adaptive lr? I guess it’ll be covered in future lectures?

We specify three learning rates for different layer groups, not for one layer. Different layer groups need different amounts of fine-tuning and hence different learning rates. Before unfreezing, we were only training the last layer, so we only needed to supply one learning rate. After unfreezing, if we supply only one learning rate, the fastai library will use the same learning rate for all the layer groups, and this may not be ideal.
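In plain PyTorch, the equivalent mechanism is optimizer parameter groups, each with its own learning rate (a sketch continuing the hypothetical resnet34 above; the grouping and values are illustrative):

```python
import torch.optim as optim

# After unfreezing, give each layer group its own learning rate:
# earlier (more general) layers get smaller steps than the new head.
for p in model.parameters():
    p.requires_grad = True  # unfreeze everything

optimizer = optim.Adam([
    {'params': model.layer1.parameters(), 'lr': 1e-4},  # early layers
    {'params': model.layer2.parameters(), 'lr': 1e-4},
    {'params': model.layer3.parameters(), 'lr': 1e-3},  # middle layers
    {'params': model.layer4.parameters(), 'lr': 1e-3},
    {'params': model.fc.parameters(),     'lr': 1e-2},  # new head
])  # (stem layers omitted for brevity)
```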

So, when Adam is used without unfreezing, will it adapt the learning rate over time for the last layer?
How does Adam’s adaptive nature help when training just the last layer?

Trying to test the “Dogs v Cats super-charged!” notebook, I’m getting a weights-not-found exception. Where can I get these weights? Is there any prerequisite to running this notebook?

FileNotFoundError: [Errno 2] No such file or directory: '…/fastai/courses/dl1/fastai/weights/resnext_50_32x4d.pth'


Download the weights from http://files.fast.ai/models/weights.tgz and extract the archive into the ‘fastai/courses/dl1/fastai/weights/’ folder.
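For example, from Python (a sketch; adjust the destination path to wherever your fastai checkout lives, and check the archive’s internal layout):

```python
import tarfile
import urllib.request

# Download the pretrained-weights archive and extract it in place.
url = 'http://files.fast.ai/models/weights.tgz'
dest = 'fastai/courses/dl1/fastai/weights/'  # adjust to your checkout
archive, _ = urllib.request.urlretrieve(url)
with tarfile.open(archive, 'r:gz') as tar:
    tar.extractall(path=dest)  # should place the .pth files under dest
```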


Thanks a lot!

Adam Optimization vs SGDR

Adam optimization includes a momentum parameter that controls how quickly the optimizer moves toward a global/local minimum.

The momentum parameter is in the range 0 to 1; values close to 1 represent high momentum.

The higher the momentum, the more likely the optimizer is to overshoot the minimum.
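For reference, Adam’s update rule (from the Adam paper) shows where momentum enters: $\beta_1$ is the momentum-like decay on the running gradient average, while the $\sqrt{\hat{v}_t}$ denominator is what makes the effective step size adaptive:

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t} \\
\theta_t &= \theta_{t-1} - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$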

Question: Can you simulate SGDR by manipulating the momentum parameter in Adam optimization (at specific times in the iteration cycle), without varying the actual learning rate?
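For what it’s worth, in current PyTorch the momentum-like parameter is Adam’s first beta, and SGDR-style warm restarts exist as a learning-rate scheduler; whether varying $\beta_1$ alone can mimic restarts is the open question here, since $\beta_1$ smooths the gradient direction rather than rescaling the step the way a restart does. A sketch, reusing the hypothetical model from above:

```python
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

# betas[0] is the momentum-like parameter discussed above.
optimizer = Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

# SGDR proper: cosine-anneal the lr, then restart every T_0 epochs.
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10)

for epoch in range(30):
    # ... train one epoch ...
    scheduler.step()
```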

I am among the few unfortunate people who do not have the free $500 of AWS credits, so I am trying to test everything on my local machine. When I try to run lesson1-sgd.ipynb, I get the following error. I know this error occurs because the model was trained on a GPU and we are trying to run it on a CPU, but I do not know how to solve it. Any ideas?

AssertionError: Torch not compiled with CUDA enabled

I am getting:
FileNotFoundError: [Errno 2] No such file or directory: 'wgts/resnext_50_32x4d.pth'
Am I missing something?
I did a git pull before I started.

Please check this post

You probably have installed PyTorch with CUDA. If you pip uninstall torch and then reinstall the non-CUDA version, it should not give that error. To install the non-CUDA version, see the Getting Started section at http://pytorch.org/
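If the error appears only when loading GPU-trained weights on a CPU-only machine, mapping the checkpoint to the CPU at load time can also help (a general PyTorch sketch, not fastai-specific; the filename is the one from the error above):

```python
import torch

# map_location remaps CUDA tensors in the checkpoint onto the CPU,
# so GPU-trained weights can be loaded on a CPU-only install.
state = torch.load('wgts/resnext_50_32x4d.pth', map_location='cpu')
```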

But please note that you may not be able to run things to completion on your local (non-GPU) machine unless you wait a very long time. The best course of action might be using Crestle or an AMI. If you need financial help, please post in “Request or share AWS credits here” and someone might be able to set up an AWS box using their credits and give you access for limited hours per week.

Thanks, Rashna!