That’s a very serious allegation. I hope you have something very specific and detailed to back it up, or that you make an immediate full retraction.
We have published the full code for both approaches and presented the comparison at the PyTorch Developer Conference. They are as close as possible to exactly the same.
Your claim is also a straw man. I specifically stated that keras is the “gold standard”. fastai has the benefit of years of additional research. The keras devs made an explicit choice to freeze their API a long time ago.
Awesome lesson! I was playing around with the notebook and can’t quite figure out where the models are being saved when you call learn.save(). The docs say it is in self.model_dir. Is this somewhere in a hidden directory in the Jupyter notebook?
Also, I was able to get an error rate under 4% (~3.8%) by decreasing the earlier learning rate by a factor of 1e-3 and the later one by a factor of 1e-1: learn.fit_one_cycle(4, max_lr=slice(1e-9,1e-5))
However, given the traffic this thread sees during the lecture, it takes a while just to scroll down to the bottom of the ever-growing thread. A quicker way to post a new reply (instead of replying to any post in particular) is to click the Reply button that is always available in the scroll bar on the right side of the page.
I think Jeremy will explain that in later classes, but the idea behind lr_find is to keep increasing the learning rate until it becomes too big. When it becomes too big, fastai stops looping because we already got everything we wanted.
They get stored here: ~/.fastai/data/oxford-iiit-pet/images/models/
So all data gets downloaded into the data folder of this “new” .fastai dir in your home directory, and the saved model states get stored there within the respective dataset subdirs.
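If you want to confirm this from inside the notebook, here’s a quick sketch using the fastai v1 Learner attributes (the checkpoint name 'stage-1' is just an example):

```python
# Inspect where fastai will write checkpoints (fastai v1 Learner)
print(learn.path)       # dataset root, e.g. the oxford-iiit-pet images folder
print(learn.model_dir)  # subfolder used for checkpoints, defaults to 'models'

# Save and reload a checkpoint by name; the file ends up at
# learn.path / learn.model_dir / 'stage-1.pth'
learn.save('stage-1')
learn.load('stage-1')
```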
When the error becomes too high (because of the increased learning rate), lr_find() stops. You can control the limits at which it stops through the function’s parameters (it used to be a clip value, but check the documentation).
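As a rough sketch of what that looks like (fastai v1 API; treat the exact keyword names as assumptions and check the docs for your version):

```python
# LR range test: the learning rate grows each mini-batch from start_lr toward
# end_lr, and training stops early once the loss diverges (stop_div=True)
learn.lr_find(start_lr=1e-7, end_lr=10, num_it=100, stop_div=True)

# Plot loss vs. learning rate to pick a value just before the loss blows up
learn.recorder.plot()
```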
There has been a lot of hand-wringing over the fact that Keras also updates BatchNorm parameters even when trainable is set to False, i.e. the layer is frozen. In fact, if I remember right, in one of @Jeremy’s tweets he pointed to this exact issue as a problem with Keras. Any intuition as to why performance improves with the BN layers unfrozen? It seems counter-intuitive, especially since the following layers cannot adapt to changes in BatchNorm while they are frozen.
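To make the setup concrete, here’s a minimal PyTorch sketch of what “frozen except BatchNorm” looks like; this is my own illustration of the idea, not the exact fastai implementation:

```python
import torch.nn as nn

def freeze_except_bn(model: nn.Module):
    """Freeze all parameters except those belonging to BatchNorm layers."""
    for module in model.modules():
        is_bn = isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d))
        # Only the BN affine parameters (weight/bias) keep requires_grad=True
        for p in module.parameters(recurse=False):
            p.requires_grad = is_bn
        if is_bn:
            module.train()  # keep updating the running mean/variance as well
```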
But then again, what is intuitive and counter-intuitive seems all warped. Jeremy recently tweeted about an approach where the head’s weights are frozen at random initialization and the base model’s weights are updated instead. Completely topsy-turvy.
1e-6 is the learning rate for the very first layers (the layers Jeremy showed that detect edges and gradients), and then we have the learning rate 1e-4 for the later layers. We have actually unfrozen the complete net (meaning we are now training the weights of the net starting from the very first layers).
So the basic concept behind this is that we don’t want to change the weights of the earliest layers, which are detecting edges, gradients, etc., because they are already really good at it; hence a very small learning rate, which means we won’t be changing their weights much. The layers after that, where the net recognizes patterns, eyeballs, etc., are where we do want to change the weights, because that is where we want the net to pick up patterns specific to our dataset; hence a greater learning rate than for the earlier layers. This greater learning rate gives those layers a better chance to adapt to our dataset and change their weights accordingly.
The Python slice(start, stop) object gives you the ability to spread these learning rates across the layer groups, beginning with the small learning rate for the first layers and increasing it as we move deeper into the network.
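Putting that together, the pattern from the lesson looks roughly like this (fastai v1; the learning-rate values are just the ones discussed above):

```python
# Unfreeze the whole network so every layer group gets trained
learn.unfreeze()

# slice(1e-6, 1e-4): the earliest layer group trains at 1e-6, the last at 1e-4,
# with the groups in between spread across that range
learn.fit_one_cycle(4, max_lr=slice(1e-6, 1e-4))
```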
ImageDataBunch creation from a folder, a CSV, or a list has already been discussed, but what can we do if we have a folder of multiple CSVs, like the Quick, Draw! doodle recognition competition on Kaggle right now? Up till now I have been working with generators.
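One possible approach, just a sketch, would be to concatenate the CSVs with pandas first and then build a single DataBunch from the combined DataFrame. This assumes fastai v1’s ImageDataBunch.from_df, that each CSV has a filename column followed by a label column, and the folder path here is hypothetical:

```python
from pathlib import Path
import pandas as pd
from fastai.vision import ImageDataBunch

csv_dir = Path('data/quickdraw')  # hypothetical folder holding the CSV files

# Combine all CSVs into one DataFrame (first column: filename, second: label)
df = pd.concat((pd.read_csv(f) for f in csv_dir.glob('*.csv')), ignore_index=True)

# Build the DataBunch from the combined DataFrame
data = ImageDataBunch.from_df(csv_dir, df, folder='images', size=224)
```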