Why is the groups variable being overwritten after the for loop (it seems like an error)?
I feel like something is wrong with this function because the groups variable is being overwritten; however, I'm also confused about what this function is supposed to accomplish in the first place. Any insight or help would be much appreciated.
From my understanding, what they are trying to do is apply a different learning rate to different parts of the model. Each part might have one or more layers. These different parts are the groups:

groups = [part1, part2, part3, ...]

Each part is an nn.Sequential object, and for each part they want to apply a different learning rate.
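As a rough sketch of that idea (the toy model and names here are mine, not the ones from the notebook):

from torch import nn

# toy model split into two "parts" to be trained with different learning rates
part1 = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 20))
part2 = nn.Sequential(nn.ReLU(), nn.Linear(20, 2))
model = nn.Sequential(part1, part2)   # the parts together make up the model

groups = [part1, part2]               # groups = [part1, part2, ...]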
1) The output of the splitter is a list of lists of parameters:

output = [list(part.parameters()) for part in groups]

So output is basically a list of parameter groups. When applying different learning rates, we loop over this output and apply the same learning rate to all the parameters in each part.
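In plain PyTorch terms, this is the same mechanism as passing parameter groups with their own lr to an optimizer (the learning-rate values below are just placeholders):

import torch
from torch import nn

# two toy parameter groups standing in for two parts of a model
output = [list(nn.Linear(10, 20).parameters()),
          list(nn.Linear(20, 2).parameters())]

lrs = [1e-4, 1e-3]  # placeholder values: smaller lr for the earlier part
opt = torch.optim.Adam([{'params': params, 'lr': lr}
                        for params, lr in zip(output, lrs)])

for g in opt.param_groups:
    print(len(g['params']), g['lr'])  # every parameter inside a group shares its lr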
2) The groups variable being overwritten seems like an error to me as well; that line should append to the existing groups variable.
The great thing about notebooks is that you can delve into the workings easily by splitting and adding cells to enable discovery of inputs and outputs.
I concur with your bug description :-
from torch import nn

def lm_splitter(m):
    groups = []
    # one group per RNN layer, paired with its hidden-state dropout
    for i in range(len(m[0].rnns)):
        groups.append(nn.Sequential(m[0].rnns[i], m[0].hidden_dps[i]))
    # groups = [nn.Sequential(m[0].encoder, m[0].input_dp, m[1])]  # buggy: overwrites the loop's work
    groups.append(nn.Sequential(m[0].encoder, m[0].input_dp, m[1]))  # fix: append instead
    return [list(o.parameters()) for o in groups]
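With the append in place, the function returns one parameter list per RNN layer plus a final list for the encoder/input-dropout/decoder part, so each of those groups can then be given its own learning rate as described above.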