Wiki: Lesson 2


#191

Hi GregFet: Could you share the link to download the satellite dataset on Kaggle? Thanks a lot.


#192

So in the lesson, Jeremy mentioned that the validation images are center-cropped to be square. But are the training images also cropped that way? It seems to me they would be, but that's not clear.


(Kofi Asiedu Brempong) #193

Hi all, I tried building an image classifier based on the steps outlined in the lesson 1 notebook.
I wrote my first Medium post based on the results I had; please check it out and let me know your thoughts.


(Navin Kumar) #194

I was doing the dog breeds identification on AWS. I was getting NaN in the training and validation loss while training with differential learning rates.
I've attached the image.

I was using a learning rate of 0.2.

May I know why this happens?


(Rami) #195

I have the same question. How is the model handling variable input sizes?


(Navin Kumar) #196

My notebook is similar to Jeremy's dog breeds notebook shown in the video, except for the following:

  1. Learning rate: mine 0.2, Jeremy's 1e-2

  2. Arch: mine resnet34, Jeremy's resnet50

Later, when I changed the learning rate to 1e-2, the NaNs in the training/validation loss disappeared.

Is it that NaNs appear when the learning rate is much higher, as in this case 0.2 or 0.1?
The lr_finder was showing that an lr of 0.1 would be a good choice…
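For intuition: with too large a step size, gradient descent overshoots the minimum, the weights oscillate with growing magnitude until they overflow to inf, and the next update computes inf - inf = nan, which then propagates into the loss. A minimal, self-contained illustration on a toy quadratic loss (plain Python, not the fastai code):

# Gradient descent on L(w) = w**2, whose gradient is 2*w.
# A small step converges; a large one makes |w| grow until it
# overflows to inf, after which inf - inf yields nan.
def final_loss(lr, steps=2000, w=1.0):
    for _ in range(steps):
        w = w - lr * 2 * w
    return w * w

print(final_loss(0.01))  # ~0: converges
print(final_loss(1.5))   # nan: diverged

This also matches how lr_find is meant to be read: the rate is picked just before the loss starts shooting up, and 0.1 may already sit past that point for this architecture.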


#197

Since no one answered, I'd add what I've found in case it's helpful to someone: the code is buried deep in transforms.py of the fast.ai library. By that code, the cropping of the validation images depends on the cropping of the training images: if the training images are cropped in the "random" or "googlenet" way, then the validation images are center-cropped; otherwise, the validation images are cropped the same way as the training images.
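The rule is compact enough to restate. A paraphrase of the behaviour described above (the names mirror fastai 0.7's CropType, but this is not the library source verbatim):

from enum import Enum

class CropType(Enum):
    RANDOM = 1
    CENTER = 2
    NO = 3
    GOOGLENET = 4

def val_crop_for(trn_crop):
    # "random" and "googlenet" training crops become a center crop at
    # validation time; any other crop type is reused unchanged.
    if trn_crop in (CropType.RANDOM, CropType.GOOGLENET):
        return CropType.CENTER
    return trn_crop

assert val_crop_for(CropType.RANDOM) == CropType.CENTER
assert val_crop_for(CropType.NO) == CropType.NO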


(Abhishek Awasthi) #198

Can anyone tell me where to find all the model weights Jeremy trained in the lectures? I don't have sufficient computational resources to train them myself, which is demotivating, as I'm not able to reproduce the results of the models we are building on my PC.


(Tiago Caruso) #199

This is a non-answer


(Gerald) #201

How did you resolve this?


(Benjamin Trautman) #202

Having some issues with overfitting on the dog breeds challenge. I walked through the code as Jeremy explained his process, but as soon as the first learn.fit(1e-2, 5), my results diverge from his.

epoch  trn_loss  val_loss  accuracy
0      0.536105  0.319523  0.902153
1      0.350298  0.297175  0.902642
2      0.254736  0.266593  0.912427
3      0.198150  0.265514  0.916830
4      0.166361  0.253992  0.918297

As you can see, my trn_loss rapidly decreases below the level of val_loss. As I move through the exercise, my trn_loss continues to drop while val_loss stays at ~0.23-0.25.

I believe I'm overfitting, but I'm confused how this could happen if I'm running ostensibly the same code as Jeremy. The only change I had to make was to the batch size (decreased to 20) because I was running into a CUDA out-of-memory error. Could this be causing the issue?


(Gerald) #203

Hi everyone! Please help me with an error in the lesson 2 notebook, "Multi-label classification".

I am using Paperspace; when I run the cell, I get the error below.

Thanks in advance.


(Murali Manohar) #204

Hi @cqfd, I have a doubt regarding precompute. It is mentioned that we pass the dataset through once (without any data augmentations) to get the activation vectors for each image. But when we go to the code:

  1. Initially, precompute=True is set, and then in the next step we do learn.fit(0.01, 1). Doesn't this mean that the weights of the resnet34 arch are getting updated?

  2. For data augmentation, we set precompute=False so that activation vectors are not precomputed for the augmented images, and do .fit again. My question is: how do you get an activation vector for a rotated image, which is different from the actual image? Do we use the activation vectors of the original image? Or are we passing the rotated image through the resnet arch? If yes, that means we are computing activation vectors, which goes against precompute=False?
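If it helps, the distinction can be sketched conceptually: with precompute=True the frozen backbone runs over each image once, its activation vectors are cached, and learn.fit trains only the new head against that cache, so resnet34's weights are not updated; and because augmented images can't reuse the cache, augmentation needs precompute=False, i.e. a fresh forward pass through the still-frozen backbone for every batch. A minimal PyTorch sketch of the idea (stand-in layers, not the fastai source):

import torch
import torch.nn as nn

# `backbone` stands in for the frozen resnet34 body, `head` for the new classifier.
backbone = nn.Sequential(nn.Linear(784, 512), nn.ReLU())
head = nn.Linear(512, 10)
for p in backbone.parameters():
    p.requires_grad = False  # frozen in both regimes

batch = torch.randn(32, 784)

# precompute=True: run the frozen backbone once, cache the activations,
# and train only the head on the cache. Augmentation has no effect here,
# because every epoch reuses the same cached vectors.
with torch.no_grad():
    cached = backbone(batch)
logits = head(cached)

# precompute=False: every (possibly augmented) batch goes through the
# frozen backbone again, so a rotated image gets its own activations.
augmented = torch.randn(32, 784)  # stands in for an augmented batch
logits = head(backbone(augmented))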

#205

I’m slightly confused about points 5.-7.:

Review: easy steps to train a world-class image classifier

  1. precompute=True
  2. Use lr_find() to find highest learning rate where loss is still clearly improving
  3. Train last layer from precomputed activations for 1-2 epochs
  4. Train last layer with data augmentation (i.e. precompute=False) for 2-3 epochs with cycle_len=1
  5. Unfreeze all layers
  6. Set earlier layers to 3x-10x lower learning rate than next higher layer
  7. Use lr_find() again
  8. Train full network with cycle_mult=2 until over-fitting
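For concreteness, here is roughly how those eight steps map onto fastai 0.7 calls in the lesson notebooks (a sketch with placeholder values; arch and data come from the notebook setup):

learn = ConvLearner.pretrained(arch, data, precompute=True)  # 1. precompute=True
learn.lr_find(); learn.sched.plot()                          # 2. pick the highest lr still clearly improving
learn.fit(1e-2, 2)                                           # 3. last layer on precomputed activations
learn.precompute = False
learn.fit(1e-2, 3, cycle_len=1)                              # 4. last layer with augmentation
learn.unfreeze()                                             # 5. unfreeze all layers
lrs = np.array([1e-4, 1e-3, 1e-2])                           # 6. 10x lower rates for earlier groups
learn.lr_find(); learn.sched.plot()                          # 7. re-run the lr finder
learn.fit(lrs, 3, cycle_len=1, cycle_mult=2)                 # 8. train the full network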

Let’s say that for step 4., I do:

learn.precompute=False
learn.fit(lrs=1e-2, n_cycle=3, cycle_len=1)

If after that, I do

learn.unfreeze()
lrf = learn.lr_find()
learn.sched.plot()

then the learning rate finder will show me the loss as a function of learning rate, assuming that I use the same learning rate for all layers. However, what I'll actually do is use differential learning rates, like lrs = np.array([1e-4, 1e-3, 1e-2]).

So my question is: can I tell the learning rate finder to show me the loss as a function of the late-layer learning rate, assuming that the early/middle-layer learning rates are 100x/10x lower than the late-layer learning rate?
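One thing that may work, though I haven't verified it in every fastai 0.7 build: since lr_find builds its layer optimizer the same way fit does, it may accept an array of starting rates, which would hold the 100x/10x/1x ratio fixed throughout the sweep so the plot can be read as a function of the late-layer rate:

# Assumption to verify: lr_find, like fit, accepts per-layer-group rates.
start_lrs = np.array([1e-6, 1e-5, 1e-4])  # early / middle / late groups
learn.lr_find(start_lrs)
learn.sched.plot()  # x-axis then tracks the late group's rate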


#206

I’m unable to run any of the other architectures other than resnet34. I’m getting the following errors when I run the notebook.

Could somebody help with this issue? I’m running it on my local machine (windows 10) which has Nvidia gpu.