My thinking here is that a higher learning rate will push the weights around more - and therefore will give less credence to the pretrained weights. So in lesson 2 the earlier layers have higher relative learning rates, which means we’re pushing the weights further from the pretrained network.
Having said that, I haven’t done rigorous experiments, and I haven’t seen this technique used elsewhere, so take my theory with a grain of salt. @sebastianruder is helping run some experiments at the moment that may give some initial data on this, but it might be a few months before we have anything really concrete to show.
Hello, I am a student with a pretty basic laptop with a 2 GB 940MX GPU. I am currently stuck at the part where we unfreeze all the layers of the network and train them; it fails with an ‘OUT OF MEMORY’ error. I don’t have access to any cloud GPUs either. The only way forward I can see is to get the file ‘224_all.h5’, located in the models folder under the dogs-vs-cats dataset folder, from anyone who has trained the model. I would really appreciate it if anyone could provide me with the file. Thank you.
Thank you for replying, sir. I tried that, but in vain: it still gives an out-of-memory error. The only solution I can see now is for someone to provide me with the weights of the fully trained resnet34. Thank you.
Hi,
In the loss vs. iterations plot generated by learn.sched.plot_loss(), is the loss computed from a mini-batch or over the entire training dataset?
In other words, after every iteration, is the loss of the entire training set computed for this plot, or is it the loss of the mini-batch that was used for that iteration?
Hi guys, I could not find the file/folder that contains label information for cats/dogs classification in the dataset. How does the model learn without labeling?
Edit: I found that all cat images were put in one folder while dogs in the other one, probably they were labeled in this way, which I am not quite familiar with.
Would you share your code for the Kaggle dog breeds competition using the fastai library? I found that that code used label data while the code for dogscats did not. Thank you.
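For anyone else puzzled by this, the folder-per-class convention mentioned in the edit above can be sketched roughly as follows. This is only an illustration of the idea, not the fastai implementation: the helper name labels_from_folders and the file names are made up, and a temporary directory stands in for the real dataset.

```python
import tempfile
from pathlib import Path

def labels_from_folders(root):
    """Return (file name, label) pairs, taking each label from the parent folder name.

    Hypothetical helper for illustration; not part of fastai.
    """
    root = Path(root)
    pairs = []
    # Each sub-folder of `root` is treated as one class.
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for img in sorted(class_dir.glob("*.jpg")):
            pairs.append((img.name, class_dir.name))
    return pairs

# Build a tiny throwaway tree (made-up file names) that mimics train/cats and train/dogs.
with tempfile.TemporaryDirectory() as tmp:
    layout = {"cats": ["cat.0.jpg", "cat.1.jpg"], "dogs": ["dog.0.jpg"]}
    for cls, names in layout.items():
        class_dir = Path(tmp) / "train" / cls
        class_dir.mkdir(parents=True)
        for name in names:
            (class_dir / name).touch()
    pairs = labels_from_folders(Path(tmp) / "train")

print(pairs)
```

So no separate label file is needed in that layout: the folder name is the label. A dataset that ships a CSV of labels instead, as the post above says the dog breeds code does, has to be read from that file.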
For the instructors: I wish there were a half-completed notebook with tasks that I could fill out for this lesson. I am not very clear on what to do after watching the lesson. Right now I am playing with the notebooks, but without much direction.
Edit: I started watching lesson 3 and I see there’s a task to classify dog breeds using the Kaggle dataset, so I will do that.
This seems like a basic question, so please point me to the right place if I’ve missed it somewhere else.
We cover multi-label classification in lesson three, but how do you handle multi-class classification when one class is unknown? For example, for image classification, what would the setup be for “apple”, “banana”, and “anything that’s not an apple or a banana”?
I see. From what I’m seeing on Stack Overflow, the solution is to create a “garbage” class that you fill with everything else. Seems like a lot of extra work for something that must be a common problem, but I’ll try it and see.
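In case it helps anyone trying the same thing, here is a minimal sketch of the “garbage” class idea, assuming the raw labels are plain strings. The class name "other" and the example labels are made up for illustration.

```python
# Classes we actually care about, plus a catch-all bucket.
KNOWN = ["apple", "banana"]
OTHER = "other"  # hypothetical name for the "garbage" class

def to_training_label(raw_label):
    """Map any label outside KNOWN to the catch-all class."""
    return raw_label if raw_label in KNOWN else OTHER

# Made-up raw labels for a handful of images.
raw = ["apple", "kiwi", "banana", "car", "apple"]
labels = [to_training_label(r) for r in raw]

# The classifier is then trained on three classes instead of two.
classes = KNOWN + [OTHER]
class_idx = [classes.index(label) for label in labels]

print(labels)
print(class_idx)
```

The extra work is mostly in collecting enough varied "other" images that the catch-all class is representative.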
In lesson 1 @jeremy used the following differential learning rates with the preceding layers trained 10x slower than the following set of layers:
lr=np.array([1e-4,1e-3,1e-2])
In lesson 2 in Understanding Amazon from Space:
lr = 0.2
lrs = np.array([lr/9,lr/3,lr])
While I understand the rationale behind having varying learning rates, I can’t seem to understand why lr/9 and lr/3 are used in this case (9 and 3 seem quite arbitrary; why not lr/25 or lr/16 or any other numbers?).
We don’t want to alter the weights of the earlier layers too much, since they capture fundamental low-level patterns (gradients etc.) in the images. So why are we training them at a much higher learning rate in the lesson 2 notebook than in the dogs-cats problem in lesson 1? The only reason I can think of is that the model was pretrained on data similar to the dogs-and-cats data and not on satellite images, so we might want to nudge the weights in the earlier layers a little more. But then how do I know how large a learning rate to set for the earlier layers?
@henrywjr I think you’re thinking about things the right way. I can’t remember where in the lecture Jeremy covers this, but he says pretty much exactly what you describe.
Picking 3 and 3^2 is definitely a bit of a judgement call, but the thinking is just that we want to give the weights more wiggle room. We were using 10 and 10^2 before, and 3 < 10, so the earlier layers now get a larger share of the final learning rate. Ultimately this is a hyperparameter. You either try a bunch of different approaches and see how they do on your validation set, or you can just follow your heart.
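To make the comparison concrete, here is a small NumPy sketch that rebuilds the two sets of learning rates quoted above and prints each one relative to the rate of the last layer group. The helper layer_group_lrs is made up for illustration and is not a fastai function.

```python
import numpy as np

def layer_group_lrs(lr, factor, n_groups=3):
    """Learning rates for n_groups layer groups, earliest group first.

    Each group's rate is `factor` times smaller than the next group's.
    Hypothetical helper for illustration; not part of fastai.
    """
    return np.array([lr / factor ** i for i in range(n_groups - 1, -1, -1)])

lesson1 = layer_group_lrs(1e-2, 10)  # same values as np.array([1e-4, 1e-3, 1e-2])
lesson2 = layer_group_lrs(0.2, 3)    # same values as np.array([lr/9, lr/3, lr]) with lr = 0.2

# Rates relative to the last layer group: this is the "wiggle room" the earlier layers get.
print(lesson1 / lesson1[-1])
print(lesson2 / lesson2[-1])
```

With a factor of 10 the earliest group trains at 1/100 of the last group's rate; with a factor of 3 it trains at 1/9, so the pretrained early weights are allowed to move further.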
This is regarding Multi-label Classification (lesson2-image_models.ipynb):
x,y = next(iter(data.val_dl))
It looks like y is a 64 x 17 matrix. It would have made more sense to me for it to be a vector of length equal to the number of classes, i.e. 1 x 17 or 17 x 1. Why is y a 64 x 17 matrix, and what does it represent?
Thanks!
Edit: This was answered in Lesson 3. Basically, y holds the labels for a whole mini-batch, one row of 17 class indicators per image, and in this case the mini-batch size is 64.
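For anyone who wants to see the shape without loading the dataset, here is a tiny NumPy sketch with randomly generated stand-in labels (not the real Amazon labels), assuming a batch size of 64 and 17 classes as in the question.

```python
import numpy as np

batch_size, n_classes = 64, 17

# Fake multi-hot labels: each row is one image, each column is one class (1.0 = label present).
rng = np.random.default_rng(0)
y = (rng.random((batch_size, n_classes)) < 0.2).astype(np.float32)

print(y.shape)     # (64, 17): one row per image in the mini-batch
print(y[0].shape)  # (17,): the label vector for a single image
```

So the length-17 vector expected in the question is a single row of y; the data loader just stacks 64 of them.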