Question on how differential learning rates are applied to model

radek · November 3, 2017, 8:44pm

The proof is in the pudding, AKA the code.

I do not know how this works, but in Learner#fit we call Learner#get_layer_opt. This is on line #95 in learner.py.

From there, we instantiate LayerOptimizer, and this whole class deals with layer_groups. I do not have much of a clue how this works, but it seems that if the learning rate passed in is a single number rather than an array, it gets turned into an array that repeats that number, and I think the array has the same length as layer_groups.
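To make that guess concrete, here is a minimal sketch of what I think the expansion step does. This is not the actual fastai code: the function name expand_lrs and its arguments are made up for illustration, and the length check is my own addition.

```python
from collections.abc import Iterable


def expand_lrs(lrs, n_groups):
    """Return one learning rate per layer group (hypothetical helper)."""
    # A bare number is wrapped in a list so both cases look the same below.
    if not isinstance(lrs, Iterable):
        lrs = [lrs]
    lrs = list(lrs)
    # A single rate is repeated once per layer group.
    if len(lrs) == 1:
        lrs = lrs * n_groups
    # Assumption: anything else must already match the number of groups.
    if len(lrs) != n_groups:
        raise ValueError(f"expected {n_groups} learning rates, got {len(lrs)}")
    return lrs


print(expand_lrs(0.01, 3))                # same rate for every group
print(expand_lrs([1e-4, 1e-3, 1e-2], 3))  # one rate per group, left as given
```

So passing one number would mean every group trains at the same rate, and passing an array would be how you get differential rates.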

Without knowing more, I infer that this model probably has 3 layer_groups and that the layers in each group get assigned their own learning rate.
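If that inference is right, the pairing could look roughly like the sketch below. Everything here is an assumption for illustration: build_param_groups is a hypothetical name, the strings stand in for real parameter tensors, and the three groups are invented. The list-of-dicts shape with "params" and "lr" keys mirrors the per-parameter-group options that PyTorch optimizers accept, but I have not checked that fastai builds it this way.

```python
def build_param_groups(layer_groups, lrs):
    """Pair each layer group with its own learning rate (hypothetical helper)."""
    if len(layer_groups) != len(lrs):
        raise ValueError("need exactly one learning rate per layer group")
    # One dict per group: every parameter in the group shares the same lr.
    return [
        {"params": list(group), "lr": lr}
        for group, lr in zip(layer_groups, lrs)
    ]


# Placeholder names instead of tensors; earlier layers get smaller rates.
layer_groups = [
    ["conv1.weight", "conv2.weight"],
    ["conv3.weight"],
    ["fc.weight", "fc.bias"],
]
groups = build_param_groups(layer_groups, [1e-4, 1e-3, 1e-2])

for g in groups:
    print(g["lr"], g["params"])
```

With real tensors in place of the strings, a list like this could be handed to an optimizer so that each group is updated with its own rate.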

But I am not sure. Those are, however, the steps that would need to be taken to start figuring this out. To go further, one would need to read more code, and there might be Python features one would need to become familiar with.

I think if we want to start tackling such questions, we really have to get into the habit of reading the source code. There are more questions that can be asked about this than we could possibly address on these forums.

So there we go. We all have a chance to become better Python programmers. And if someone figures this out (I am tempted to start figuring such things out starting next week), then maybe we can start writing docstrings so that our colleagues can have an easier time figuring such things out.