From what I understood through reading the docs, there are two behaviours of slice:
- When you pass only one argument (like the learning rate `lr` in your example), the last layer group's learning rate is `lr` and all other groups get `lr/10`.
- When you pass two arguments (like `1e-5, 5e-2`), the first layer group gets `1e-5` as its learning rate and the last group gets `5e-2`. All groups in between get learning rates geometrically spaced between those two values, i.e. each group's rate is the previous one multiplied by a constant factor.
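To make the two cases concrete, here is a rough sketch in plain Python of how I read that behaviour. It is not the library's actual code: `lr_range_sketch` and `n_groups` are names I made up for illustration, and it only handles `slice` inputs.

```python
def lr_range_sketch(lr: slice, n_groups: int) -> list:
    """Hypothetical helper (not the library's code): one learning rate per layer group."""
    if lr.start is None:
        # slice(lr): the last group gets lr, every earlier group gets lr / 10
        return [lr.stop / 10] * (n_groups - 1) + [lr.stop]
    if n_groups == 1:
        # Assumption: with a single group there is nothing to space out
        return [lr.stop]
    # slice(start, stop): multiply by a constant ratio from one group to the next
    ratio = (lr.stop / lr.start) ** (1 / (n_groups - 1))
    return [lr.start * ratio ** i for i in range(n_groups)]


if __name__ == "__main__":
    print(lr_range_sketch(slice(1e-3), 3))        # one argument
    print(lr_range_sketch(slice(1e-5, 5e-2), 3))  # two arguments
```

With three layer groups, the middle group in the second call lands at the geometric mean of `1e-5` and `5e-2` (roughly `7e-4`), not at the arithmetic midpoint.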
You can look it up in the docs: see the section on discriminative layer training, then scroll down a bit to `lr_range`.
Hope this helps 