In the lecture, Jeremy explains that linear interpolation is taking a chunk of one number and combining it with a chunk of another number, something like:
a * (1 - alpha) + b * (alpha)
Is my understanding correct?
I understand how it relates to learning algorithms and building up momentum, but what is this idea about more generally? Is it literally combining two numbers using scaling terms that sum to 1? Is that all there is to it, or what is it that I am missing?
Yup that’s pretty much it. It’s just a weighted average of two (or more) things.
Just because it’s a formula, we might think there is more to it. Plug in some numbers and it becomes simple:
a = 5, b = 7, alpha = 0.1 => output = 5.2
a = 5, b = 7, alpha = 0.5 => output = 6
a = 5, b = 7, alpha = 0.9 => output = 6.8
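The examples above can be sketched in a few lines of Python (the function name `lerp` is my own choice here):

```python
def lerp(a, b, alpha):
    """Linear interpolation: just a weighted average of a and b.
    alpha = 0 gives back a, alpha = 1 gives back b."""
    return a * (1 - alpha) + b * alpha

# the examples above
print(lerp(5, 7, 0.1))  # close to a
print(lerp(5, 7, 0.5))  # plain average
print(lerp(5, 7, 0.9))  # close to b
```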
So the output always lies between the min (5) and the max (7)? Is that it? So what is the purpose of it?
If you get a new measurement, e.g. b, but you still want to account for the previous value a, you take a value between them - a simple average if alpha = 0.5, closer to b for larger alpha. If alpha = 1, the previous value is not used at all.
You take a little bit of this and a little bit of that. That’s all it does, it seems.
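Applied repeatedly to a stream of measurements, this little-bit-of-each idea becomes a smoother. A minimal sketch (my own toy function, not from the lecture):

```python
def smooth(measurements, alpha=0.5):
    """Smooth a noisy series by repeatedly interpolating between
    the running value and each new measurement."""
    current = measurements[0]
    out = [current]
    for m in measurements[1:]:
        # keep (1 - alpha) of the previous value, blend in alpha of the new one
        current = current * (1 - alpha) + m * alpha
        out.append(current)
    return out
```

A sudden jump in the measurements only moves the smoothed value partway, which is exactly the "no rapid changes" behavior discussed below in the thread.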
I like the explanation by @krasin btw.
I thought there was some deeper meaning to it and spent a bit of time investigating, but haven’t found any. I guess the fancy name would imply more, but alas, that doesn’t seem to be the case.
If fancy names meant something more, then a rectified linear unit wouldn’t just be max(0, x)
Hehe, yes, indeed
I have a related and sad story to tell. I spent a couple of hours this weekend trying to understand what a TDNN (Time Delay NN) is. Turns out - as best as I could understand - it is just a convolution over temporal data, with maybe some inputs duplicated and shifted (not sure on the last point).
Still, nothing is as impenetrable as the papers on LSTMs. I still can’t believe what happened to me after lecture #6 and spending some time on the notebook for it.
(BTW, beam search is such a simple and nice idea - there is a very nice lecture on search algorithms if anyone cares. The reason I bring this up is that it was Jeremy’s suggestion, and it can give your RNN’s text generation ability an extra oomph.)
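To show just how simple beam search is, here is a rough sketch. The interface is hypothetical: I assume a `next_probs(seq)` function that returns a dict of `{token: probability}` for the next token, which is roughly what a trained RNN would give you.

```python
import heapq
import math

def beam_search(next_probs, start, steps, beam_width=3):
    """Keep only the beam_width highest-scoring sequences at each step,
    scoring by summed log-probability."""
    beams = [(0.0, [start])]  # (log-prob, sequence)
    for _ in range(steps):
        candidates = []
        for score, seq in beams:
            for tok, p in next_probs(seq).items():
                candidates.append((score + math.log(p), seq + [tok]))
        # prune back down to the best beam_width candidates
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return beams
```

Greedy decoding is the special case `beam_width=1`; a wider beam just keeps a few runners-up alive in case they pay off later.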
I would like to understand hierarchical RNNs and the Clockwork RNN but haven’t found a good PyTorch implementation. I suspect that the trick is very simple - you have parts of the network that see the input only every couple of time steps, and somehow we just keep reusing their outputs and combining them with the parts of the network that ‘run’ faster. I could probably even implement it, but who knows if it would do what is outlined in those papers.
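The suspected trick could be sketched like this. To be clear, this is my guess at the mechanism, not the actual Clockwork RNN algorithm from the paper; `fast_step` and `slow_step` stand in for whatever recurrent cells you'd actually use:

```python
def run_two_speed(inputs, fast_step, slow_step, period=4):
    """A 'slow' module only updates its state every `period` steps;
    in between, the 'fast' module keeps reusing its last output."""
    fast_h, slow_h = 0.0, 0.0
    outputs = []
    for t, x in enumerate(inputs):
        if t % period == 0:
            slow_h = slow_step(slow_h, x)      # slow part: updates rarely
        fast_h = fast_step(fast_h, x, slow_h)  # fast part: runs every step
        outputs.append(fast_h)
    return outputs
```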
Still, this is probably one of those things I think about that have zero impact on anything important in my life, so time to move on to more important things. Like lecture #7, for instance!
Is that the one which @jeremy showed as part of the SGD with momentum implementation in Excel? In that part, he used the previous value as part of the current value.
Yup, a fancy name creates a lot of space and questions in my mind!
Yes, it is! No rapid changes are allowed; random variations of the gradient get smoothed out. A simple recursive filter - another fancy name.
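In the interpolation form discussed in this thread, the momentum idea is just the same weighted average applied recursively to the gradients. A toy 1-D sketch (my own, not the lecture's spreadsheet):

```python
def momentum_updates(gradients, beta=0.9):
    """Running average of gradients via repeated linear interpolation:
    keep beta of the old average, blend in (1 - beta) of the new gradient."""
    avg = 0.0
    history = []
    for g in gradients:
        avg = avg * beta + g * (1 - beta)
        history.append(avg)
    return history
```

Because each step only moves the average part of the way toward the latest gradient, a single noisy gradient can't yank the update around - that's the "recursive filter" behavior.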
Thanks, I need to revisit the video to understand this better. Programming jargon is understandable to me, so DL practitioners should offer programmatic explanations for approaches like this - surely ‘linear interpolation’ sounds scarier than ‘recursive filter’.