Why is yhat a result of mse? I thought yhat was the output of lin2, and mse was only a way of measuring how good our predictor is.
and those propagate back to outside the function? (so, classes are passed by reference?)
What exactly is .clone() doing?
That's Python for you: most objects are mutable.
I've run into this elsewhere in Python (I think it was scikit-learn, but I can't remember specifically now) and was totally confused by the transposition. I still don't understand it, so if someone can explain, that'd be awesome.
The model has the layers referenced in its constructor. If we mutate a referenced object, we can access the value we've added to it (as .g) from within the main class (the model).
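A tiny illustration of that by-reference behaviour (toy names like attach_grad are just for illustration, not the lesson's code): an attribute attached to a tensor inside a function is visible outside it, because both names refer to the same object.

```python
import torch

def attach_grad(t):
    # mutate the tensor object we were handed by adding an attribute
    t.g = torch.ones_like(t)

x = torch.randn(4)
attach_grad(x)
print(hasattr(x, 'g'))  # True: the change made inside the function is visible outside
```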
From what I read, it looks like the idea is to head toward a language that integrates deep learning / numerical computation / differentiable programming. So the idea is that Swift will grow out of the "… for TF" part, I think. Anyone who knows more, please chime in.
It's copying your tensor during assignment. Otherwise it would just copy the reference to that tensor and not the actual object.
Just to confirm, the ".clamp_min(0.)-0.5" is part of the tweaking?
I believe so, yes
I hadn't noticed this fixup initialization paper until Jeremy mentioned it today and it looks very interesting. I haven't fully grasped that paper yet, but does anyone know if some of these ideas or related ideas could apply to RNNs as well? Are there any other ways to improve over the LSTM/GRU (or antisymmetric RNN)?
Are __call__ and __init__ completely independent? For example, if I set a variable in __init__, would it be available in __call__?
Yes, regular ReLU doesn't have the -0.5
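For anyone comparing, a minimal sketch of the two activations (the function names here are just illustrative, not from the notebook):

```python
import torch

def relu(x):
    # plain ReLU: clamp negatives to zero
    return x.clamp_min(0.)

def shifted_relu(x):
    # the tweaked version discussed above: shift down by 0.5
    # so the mean of the activations stays closer to zero
    return x.clamp_min(0.) - 0.5

x = torch.randn(5)
print(relu(x), shifted_relu(x))
```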
So it's like copy.deepcopy()?
But for PyTorch's tensors.
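A quick sketch of the difference between aliasing and .clone() (toy example, not from the lesson):

```python
import torch

w = torch.randn(3)

alias = w          # same object, just another name
copy  = w.clone()  # new tensor with its own storage

w.add_(1.)                    # in-place modification of w
print(alias is w)             # True  -> the alias follows the change
print(torch.equal(copy, w))   # False -> the clone kept the old values
```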
What's the intuition behind it?
Rewatch the video tomorrow, Jeremy explained it
__init__ is called when you instantiate the object (which you would have to do before calling it), so anything you set on self there is available later in __call__.
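A minimal example of that relationship (the Scaler class is made up for illustration):

```python
class Scaler():
    def __init__(self, factor):
        # runs once, when the instance is created
        self.factor = factor

    def __call__(self, x):
        # runs each time the instance is called like a function;
        # attributes set in __init__ are reachable through self
        return x * self.factor

scale = Scaler(2.)   # __init__ runs here
print(scale(3.))     # __call__ runs here -> 6.0
```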
Module should have bwd(), shouldn't it?
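For context, a sketch along the lines of the lesson's Module refactor, written from memory, so treat the exact names and details as approximate: __call__ runs forward() and stashes the inputs and output, and backward() dispatches to the subclass's bwd(), which fills in the .g gradients.

```python
import torch

class Module():
    def __call__(self, *args):
        self.args = args
        self.out = self.forward(*args)
        return self.out

    def forward(self, *args):
        raise NotImplementedError

    def backward(self):
        # each subclass implements bwd(), which writes the .g gradients
        self.bwd(self.out, *self.args)

class Relu(Module):
    def forward(self, inp):
        return inp.clamp_min(0.) - 0.5

    def bwd(self, out, inp):
        inp.g = (inp > 0.).float() * out.g
```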
Shouldn't einsum be written as: ib,bj->ij (not bi,bj->ij)? Was that a typo?
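For reference, a small einsum matmul check (generic index letters, not necessarily the ones used in the notebook): the repeated letter is the one summed over, and it has to pair the second axis of the first operand with the first axis of the second.

```python
import torch

a = torch.randn(5, 3)
b = torch.randn(3, 4)

# matrix multiply: the shared index 'k' is summed over,
# pairing the inner dimensions of the two operands
c = torch.einsum('ik,kj->ij', a, b)

print(torch.allclose(c, a @ b))  # True
```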