Are the precomputed values similar to TF's bottlenecks? After we are done with the work at hand, do we save them?
@jeremy what should be the value of the sz variable? Should it be the size of the input data, or is it fixed for a given architecture?
Some architectures have it fixed; ResNet works with any size.
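As a minimal sketch of where sz fits in the old fastai (v0.7) API; the PATH directory layout and the choice of arch are assumptions, not from the thread:

```python
from fastai.conv_learner import *  # old fastai (v0.7) imports

PATH = 'data/dogscats/'  # assumed folder with train/ and valid/ subfolders
arch = resnet34          # pretrained architecture
sz = 224                 # images are resized/cropped to sz x sz before training

tfms = tfms_from_model(arch, sz)  # transforms matched to the architecture
data = ImageClassifierData.from_paths(PATH, tfms=tfms)
learn = ConvLearner.pretrained(arch, data, precompute=True)
```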
Is it possible to unfreeze only specific layers and not all of them?
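As far as I can tell, the old fastai library exposes a freeze_to method on learners that freezes everything before a given layer group; a hedged sketch, assuming the learn object from above:

```python
learn.unfreeze()    # make all layer groups trainable
learn.freeze_to(2)  # re-freeze groups before index 2, leaving only later groups trainable
```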
So, in the future, should researchers cite the notebook if they are using differential learning rates?
So, I can set sz to the same size as my input image?
Why do we want to give learning rates to the initial layers of an already trained model?
How are the learning rate groups defined? Where in the library code do we assign different layers to the different groups? Can there be more than 3 groups?
How do we use the cyclical learning rate scheduler with these differential learning rates?
The LR for them is very small, so they won't be updated much. Totally freezing them is a good idea, but I guess if your classes are a bit different from the classes the pre-trained weights were fitted on, you would want to adjust these early layers a bit.
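Tying the last two questions together: in the lesson notebook the differential learning rates are passed as an array with one rate per layer group, and the same fit call drives the cyclical (SGDR) schedule. A sketch, assuming the learn object from above and the default 3 layer groups:

```python
import numpy as np

# one learning rate per layer group: early layers get the smallest rate
lrs = np.array([1e-4, 1e-3, 1e-2])

learn.unfreeze()
# 3 cycles of SGDR (cosine annealing with restarts), one epoch each
learn.fit(lrs, 3, cycle_len=1)
```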
When you run learn.fit with the differential LRs, it does 7 epochs, but previously it was 3 epochs. How did it change? Did we specify anywhere to change the number of epochs?
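This likely comes from cycle_mult rather than an explicit epoch count; assuming the notebook call looks like the one below, 3 cycles with cycle_mult=2 run for 1 + 2 + 4 = 7 epochs:

```python
# each successive cycle is cycle_mult times longer than the previous one,
# so the cycle lengths are 1, 2, and 4 epochs: 7 epochs in total
learn.fit(lrs, 3, cycle_len=1, cycle_mult=2)
```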
I guess yes, you can define your own groups, but you might have to tweak the library. Jeremy might have researched it and found that 3 is a good number, though.
Defined by Jeremy for every architecture.
Do cosine learning rate scheduling and the freezing technique work in deeper networks too, where the search space is more complex?
Does precompute = True mean we are training only the last layers as needed, and using pre-trained weights for the earlier ones in the model?
Very close, and you are also not using augmentation: the activations are computed once from the un-augmented images, so augmented variants never flow through the network.
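A sketch of that workflow in the old fastai (v0.7) API, assuming data was built with augmentation transforms (e.g. tfms_from_model(arch, sz, aug_tfms=transforms_side_on)): train the new head on precomputed activations first, then switch precompute off so augmentation takes effect.

```python
# 1. with precompute=True, frozen-layer activations are cached, so only
#    the new head trains; augmentation has no effect at this stage
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(1e-2, 3)

# 2. switch to computing activations on the fly so augmented images
#    actually pass through the network
learn.precompute = False
learn.fit(1e-2, 3, cycle_len=1)
```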