Lesson 2 In-Class Discussion

Are the precomputed values similar to TF’s bottlenecks?

@ecdrid I thought Jeremy said we “save it from time to time”.

Yes. Based on their definition of bottlenecks, it’s exactly the same.

After we are done with the work at hand, do we save them?

@jeremy what should the value of the sz variable be? Should it be the size of the input data, or is it fixed for a given architecture?

Some architectures have it fixed. ResNet works with any size.
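
For context, this is roughly how sz is used in the course notebooks (old fastai 0.7 API; the path, architecture and size below are just placeholders):

```python
from fastai.conv_learner import *

PATH = 'data/dogscats/'   # placeholder dataset path
arch = resnet34           # ResNet accepts arbitrary input sizes
sz = 224                  # side length the images are resized/cropped to

# sz is passed to the transforms, which resize every image to sz x sz
tfms = tfms_from_model(arch, sz)
data = ImageClassifierData.from_paths(PATH, tfms=tfms)
learn = ConvLearner.pretrained(arch, data, precompute=True)
```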

Is it possible to unfreeze only specific layers and not all of them?

So, in the future, should researchers cite the notebook if they are using differential learning rates? :slight_smile:

So I can set sz to the same size as my input image?

Why do we want to assign learning rates to the initial layers of an already trained model?

How are the learning rate groups defined? Where in library code do we assign different layers to the different groups? Can there be more than 3 groups?

How do we use the cyclical learning rate scheduler with these differential learning rates?

Yes. See the freeze_to function (defined in learner.py).
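
A rough sketch of how that is used (learn is the usual ConvLearner; the group index is illustrative):

```python
# Layer groups are indexed from the input end, so freeze_to(n)
# freezes all layer groups before group n.
learn.unfreeze()     # make every layer group trainable
learn.freeze_to(1)   # re-freeze only the first layer group (illustrative index)
learn.freeze()       # freeze everything except the final group (the head)
```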

The LR for them is very small, so they won't be updated much. Totally freezing them is a good idea, but I guess if your classes are a bit different from the classes of the pre-trained weights, you would want to adjust these early layers a bit.
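
A minimal sketch of the idea from the notebook: you pass an array of learning rates, one per layer group, so the early groups get much smaller updates (the exact values are illustrative). The cyclical (SGDR-style) schedule and the differential rates combine in the same call: cycle_len/cycle_mult drive the restarts, while lrs sets each group's rate.

```python
import numpy as np

learn.unfreeze()
# one learning rate per layer group: early conv layers, middle layers, head
lrs = np.array([1e-4, 1e-3, 1e-2])
# cyclical schedule with restarts, applied within each group's learning rate
learn.fit(lrs, 3, cycle_len=1, cycle_mult=2)
```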

When you run learn.fit with the differential LRs, it does 7 epochs, but previously it was 3 epochs. How did that change? Did we specify anywhere to change the number of epochs?
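
(Assuming a call like the learn.fit(lrs, 3, cycle_len=1, cycle_mult=2) sketched above, the 7 likely comes from cycle_mult=2: the 3 cycles run for 1, 2 and 4 epochs, and 1 + 2 + 4 = 7 epochs in total.)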

I guess yes, you can define your own groups, but you might have to tweak the library. Jeremy might have researched and found that 3 is a good number, though.

Defined by Jeremy for every architecture.

Do cosine learning rate scheduling and the freezing technique also work in deeper networks, where the search space is more complex?

Does precompute = True mean we are training only the last layers as needed, and using pre-trained weights for the previous layers in the model?

Very close. And note that with precompute=True, data augmentation is not being used.
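
A minimal sketch of that workflow, assuming the arch and data objects built earlier (old fastai 0.7 API):

```python
# precompute=True: activations of the frozen pre-trained layers are computed
# once and cached, so only the newly added head is trained on them.
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(1e-2, 3)

# Augmentation only takes effect once precompute is switched off, because
# the cached activations always come from the original, un-augmented images.
learn.precompute = False
learn.fit(1e-2, 3, cycle_len=1)
```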