From what I understood from reading the docs, `slice` has two behaviours:
- When you pass only one argument (like the learning rate `lr` in your example), the last layer group's learning rate is `lr` and all other groups get `lr/10`.
- When you pass two arguments (like `1e-5, 5e-2`), the first layer group gets a learning rate of `1e-5` and the last group gets `5e-2`. All groups in between get learning rates evenly spaced on a geometric (log) scale between those two values.
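To make the two cases concrete, here is a minimal sketch in plain Python of the mapping described above. It is not fastai's own code: `geometric_lrs`, `lrs_from_slice` and the `n_groups` parameter are hypothetical names for illustration, and the sketch only handles `slice` inputs.

```python
from typing import List


def geometric_lrs(start: float, stop: float, n: int) -> List[float]:
    """Return n learning rates from start to stop, each a constant multiple of the one before."""
    if n == 1:
        return [stop]
    ratio = (stop / start) ** (1 / (n - 1))  # common ratio between neighbouring groups
    return [start * ratio**i for i in range(n)]


def lrs_from_slice(lr: slice, n_groups: int) -> List[float]:
    """Hypothetical helper: per-layer-group learning rates for slice(stop) or slice(start, stop)."""
    if lr.start is not None:
        # Two arguments: first group gets start, last gets stop, geometric steps in between.
        return geometric_lrs(lr.start, lr.stop, n_groups)
    # One argument: slice(x) sets only stop, so the last group gets stop and every other group stop / 10.
    return [lr.stop / 10] * (n_groups - 1) + [lr.stop]
```

With three layer groups, `lrs_from_slice(slice(1e-3), 3)` gives roughly `1e-4, 1e-4, 1e-3`, and `lrs_from_slice(slice(1e-5, 5e-2), 3)` gives roughly `1e-5, 7.1e-4, 5e-2`: the middle value is the geometric mean of the two ends, not the arithmetic one.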
You can look it up in the docs under "discriminative layer training"; scroll down a bit to `lr_range`.
Hope this helps