Hello! I'm new to the forum and registered because I had a question. I'm a junior developer and I was wondering about the functionality of match_embeds.
So one of its arguments is old_wgts, which the function changes in place, and at the same time it returns that same modified old_wgts. What's the point of changing a variable in place and also returning it? The original variable has already been modified anyway, so it seems a bit redundant.
Just wondering if that was intended and if it's considered good practice in software development. To me it seems a bit redundant: either you change a variable in place, or you return a new variable without modifying the original one.
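For anyone following along, here's a minimal illustration of the pattern being asked about (a hypothetical helper, not fastai code): the function mutates its argument and returns it, so the return value is literally the same object as the input.

```python
def scale_in_place(weights, factor):
    # Mutate the dict that was passed in, then return that same dict.
    for key in weights:
        weights[key] *= factor
    return weights

w = {"a": 1.0, "b": 2.0}
out = scale_in_place(w, 2.0)
assert out is w                      # same object: nothing new was created
assert w == {"a": 2.0, "b": 4.0}    # the original was modified in place
```

So returning the value adds no new information; it only lets callers write `out = scale_in_place(w, 2.0)` for readability.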
The reason this is done is that we may use a pretrained WikiText model (or your own that you're fine-tuning on top of). We need its embeddings (i.e. its vocab) to match our new vocab. If the two vocabs line up exactly, we're good to go. If not, we need to adjust that embedding matrix (in place, so we keep the old weights) to make room for the new tokens.
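To make the idea concrete, here is a rough sketch of the matching step (this is my own simplified version, not the actual fastai implementation): for each token in the new vocab, copy the old embedding row if the token existed in the old vocab, otherwise fall back to the mean of the old rows.

```python
def match_embeds_sketch(old_wgts, old_vocab, new_vocab):
    # Mean of all old rows, used to initialise tokens the old model never saw.
    mean_row = [sum(col) / len(old_wgts) for col in zip(*old_wgts)]
    old_idx = {tok: i for i, tok in enumerate(old_vocab)}
    new_wgts = []
    for tok in new_vocab:
        if tok in old_idx:
            # Token known to the pretrained model: reuse its embedding.
            new_wgts.append(list(old_wgts[old_idx[tok]]))
        else:
            # Unknown token: start from the average embedding.
            new_wgts.append(list(mean_row))
    return new_wgts

old = [[1.0, 1.0], [3.0, 3.0]]   # toy embeddings for ["the", "cat"]
new = match_embeds_sketch(old, ["the", "cat"], ["the", "dog"])
# "the" keeps its old row; "dog" gets the mean row [2.0, 2.0]
```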
Edit: Technically you are right. Since it's passed by reference, we shouldn't also be returning the value; a PR would be welcome if you'd like to try one.
@arnau well, thinking about it, the answer is a bit more complex. In some cases you would return the value, in some you wouldn't; specifically, you would when returning it increases the readability of the function. In practice it does the same thing with no added complexity, so the question is whether it aids understanding. Personally, I believe it does, because it shows that we're adjusting the old weights and that those are what's then being used (rather than new weights), and I have a feeling Jeremy and Sylvain had this in mind too.
And later on he stops using old_wgts and just uses wgts, so in my opinion it's counterintuitive to change the old_wgts variable if you're not going to use it anymore. What's the point of changing a variable in place if you won't use it again?
If I had to guess, that's for readability. From a coding perspective, sure, we know what's going on. But from a course perspective, if we just kept everything as old_wgts, we'd ask "where did we integrate the new weights?"
Coding for a course != coding for a library. You can see that in the tests we do a .copy(), so it is no longer in place either:
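A quick illustration of why the tests call the function on a .copy() (toy data and a hypothetical mutating function, just to show the pattern):

```python
wgts = [[1.0, 1.0], [3.0, 3.0]]

def zero_first_row(w):
    # Mutates its argument in place and also returns it.
    w[0] = [0.0, 0.0]
    return w

# Passing a copy means the mutation never touches the original weights.
out = zero_first_row([row[:] for row in wgts])
assert wgts[0] == [1.0, 1.0]    # original preserved
assert out[0] == [0.0, 0.0]     # returned copy was modified
```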
And in the code itself, the naming doesn’t change in the related function:
wgts = match_embeds(wgts, old_vocab, new_vocab) (in load_pretrained).