Are the precomputed values similar to TF's bottlenecks? After we are done with the work at hand, do we save them?
@jeremy what should be the value of the sz variable? Should it be the size of the input data, or is it fixed for a given architecture?
Some architectures have it fixed; ResNet works with any size.
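As a minimal sketch of where sz fits in the old fastai (v0.7) API; the PATH directory layout and the choice of arch are assumptions, not from the thread:

```python
from fastai.conv_learner import *  # old fastai (v0.7) imports

PATH = 'data/dogscats/'  # assumed folder with train/ and valid/ subfolders
arch = resnet34          # pretrained architecture
sz = 224                 # images are resized/cropped to sz x sz before training

tfms = tfms_from_model(arch, sz)  # transforms matched to the architecture
data = ImageClassifierData.from_paths(PATH, tfms=tfms)
learn = ConvLearner.pretrained(arch, data, precompute=True)
```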
Is it possible to unfreeze only specific layers and not all of them?
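As far as I can tell, the old fastai library exposes a freeze_to method on learners that freezes everything before a given layer group; a hedged sketch, assuming the learn object from above:

```python
learn.unfreeze()    # make all layer groups trainable
learn.freeze_to(2)  # re-freeze groups before index 2, leaving only later groups trainable
```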
So, in the future, should researchers cite the notebook if they are using differential learning rates?
So, I can set sz to the same size as my input image?
Why do we want to give learning rates to the initial layers of an already trained model?
How are the learning rate groups defined? Where in the library code do we assign different layers to the different groups? Can there be more than 3 groups?
How do we use the cyclical learning rate scheduler with these differential learning rates?
The LR for them is very small, so they won't be updated much. Totally freezing them is a good idea, but I guess if your classes are a bit different from the classes the pre-trained weights were fitted on, you would want to adjust these early layers a bit.
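Tying the last two questions together: in the lesson notebook the differential learning rates are passed as an array with one rate per layer group, and the same fit call drives the cyclical (SGDR) schedule. A sketch, assuming the learn object from above and the default 3 layer groups:

```python
import numpy as np

# one learning rate per layer group: early layers get the smallest rate
lrs = np.array([1e-4, 1e-3, 1e-2])

learn.unfreeze()
# 3 cycles of SGDR (cosine annealing with restarts), one epoch each
learn.fit(lrs, 3, cycle_len=1)
```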
When you run learn.fit with the differential LRs, it does 7 epochs, but previously it was 3 epochs. How did it change? Did we specify anywhere to change the number of epochs?
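This likely comes from cycle_mult rather than an explicit epoch count; assuming the notebook call looks like the one below, 3 cycles with cycle_mult=2 run for 1 + 2 + 4 = 7 epochs:

```python
# each successive cycle is cycle_mult times longer than the previous one,
# so the cycle lengths are 1, 2, and 4 epochs: 7 epochs in total
learn.fit(lrs, 3, cycle_len=1, cycle_mult=2)
```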
I guess yes, you can define your own groups, but you might have to tweak the library. Jeremy might have researched it and found that 3 is a good number, though.
Defined by Jeremy for every architecture.
Do cosine learning rate scheduling and the freezing technique work in deeper networks too, where the search space is more complex?
Does precompute = True mean we are training only the last layers as needed, and using pre-trained weights for the earlier ones in the model?
Very close, and you are also not using augmentation: the activations are computed once from the un-augmented images, so augmented variants never flow through the network.
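A sketch of that workflow in the old fastai (v0.7) API, assuming data was built with augmentation transforms (e.g. tfms_from_model(arch, sz, aug_tfms=transforms_side_on)): train the new head on precomputed activations first, then switch precompute off so augmentation takes effect.

```python
# 1. with precompute=True, frozen-layer activations are cached, so only
#    the new head trains; augmentation has no effect at this stage
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(1e-2, 3)

# 2. switch to computing activations on the fly so augmented images
#    actually pass through the network
learn.precompute = False
learn.fit(1e-2, 3, cycle_len=1)
```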