Differential learning rate applied to hidden layers

I am new to deep learning. My question is about applying different learning rates with the learner. One of the code lines sets the learning rates as `lr=np.array([1e-4,1e-3,1e-2])`:

  1. Are we applying separate learning rates to the hidden layers? I read about this in one of the other answers; it said something about groups of layers, but I could not get the hang of it.

  2. Does applying learning rates to all the layers mean we are re-training the entire CNN? If so, won't it reduce the accuracy of the model?

  1. Suppose we have 99 layers and `lr=np.array([1e-4,1e-3,1e-2])`. Then we are assigning 1e-4 as the learning rate for the first 33 layers, 1e-3 for the next 33, and 1e-2 for the last 33.

  2. It is important to understand that we use differential learning rates in transfer learning. Here we want the initial layers to train less, since they mostly represent low-level features such as edges and corners (and even if we let them learn more, they would eventually end up learning the same edges and corners), so we give them a lower learning rate. Instead of starting at a random point, we start at a better point. Wiki: Lesson 2 would help.
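The 33/33/33 split described above can be sketched in plain Python. This is a toy illustration of the idea, not the fastai internals (fastai actually splits the network into layer groups and hands each group its own learning rate); the helper `lr_for_layer` and the scalar weights are made up here purely to show the mapping:

```python
# Differential learning rates: each group of layers gets its own step size.
lrs = [1e-4, 1e-3, 1e-2]           # one lr per group of layers
n_layers = 99
group_size = n_layers // len(lrs)  # 33 layers per group

def lr_for_layer(i):
    """Learning rate for layer i (0-based), per the grouping above."""
    return lrs[min(i // group_size, len(lrs) - 1)]

# One gradient-descent update, using one scalar weight per layer as a stand-in:
# early layers (small lr) barely move, later layers (large lr) move the most.
weights = [1.0] * n_layers
grads = [0.5] * n_layers
weights = [w - lr_for_layer(i) * g
           for i, (w, g) in enumerate(zip(weights, grads))]
```

After this update, layer 0 moves by only 1e-4 * 0.5 while layer 98 moves by 1e-2 * 0.5, which is exactly the "train the early layers less" behaviour described above.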


@SHAR1 Thanks! That clears up a lot of my doubts.