It might be linked to some kind of optimization, or maybe it just made more sense to them when they wrote that layer.
Why couldn’t you just set a cutoff for the initialization instead of implementing all this math? I.e., all w1 must be between 0.001 and 0.01 (and then randomly draw from that range with a generic Gaussian distribution).
Because if you use too tiny a std, your activations will vanish across the network and all become zeros. It is a tricky business: the std can be neither too big nor too small.
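A rough sketch of why the scale matters, in plain Python with no framework. The width of 64, the depth of 20, and the three std values are arbitrary choices for illustration, and the layers here are purely linear (no ReLU), so this is only an approximation of the setup discussed in the lesson:

```python
import math
import random

def final_rms(std, n=64, depth=20, seed=0):
    """Push a random vector through `depth` linear layers whose weights
    are drawn from N(0, std^2), and return the RMS of the final output."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(depth):
        w = [[rng.gauss(0, std) for _ in range(n)] for _ in range(n)]
        x = [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
    return math.sqrt(sum(v * v for v in x) / n)

n = 64
too_small = final_rms(0.01)           # shrinks by roughly 0.01 * sqrt(n) per layer
too_big = final_rms(1.0)              # grows by roughly sqrt(n) per layer
scaled = final_rms(1 / math.sqrt(n))  # std picked so the scale is roughly preserved

print(f"std=0.01        -> final RMS {too_small:.3e}")
print(f"std=1.0         -> final RMS {too_big:.3e}")
print(f"std=1/sqrt(n)   -> final RMS {scaled:.3e}")
```

A fixed cutoff like "between 0.001 and 0.01" ignores the layer width, which is why the per-layer shrink or growth compounds with depth. Tying the std to the number of inputs is what keeps the output scale roughly stable in this sketch.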
Does Terence know LaTeX, or is all that LaTeX copy-pasted from Wikipedia?
I think you just gave me the solution to a problem I’ve been having with one of my networks…!
I am using a single embedding layer, but for 16 inputs. I’m guessing the scales aren’t working out well because of it.
@rachel to help people get started…
I can confirm that the docker works well. https://forums.fast.ai/t/docker-image-of-fastai-jupyter-cuda10-py3-7/40081
To update, log in via bash, and then git pull, reinstall, then git clone the part3 code. I have provided the commands there.
He made bookish, a tool to convert md to LaTeX.
Would be cleaner if Jeremy used y=f(x) and y=g(x), just to avoid any confusion about composing functions.
He didn’t use f and f afterward, though.
I love the course! I really hope that you will expand this course into some advanced topics like randomized linear algebra and the link to convex/non-convex optimization.
Sorry, why do his function def’s not have “return ____” at the end?
Is it a Python default that functions return the result of the last operation?
It’s storing the gradients, that’s all we need.
it’s assigned to a layer via a mutation (class.g = whatever)
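A minimal sketch of the two points above, using a made-up `Value` class and `square_grad` function rather than the lesson's actual code. A Python function with no `return` statement returns `None`; it does not return the result of its last operation. The gradient survives only because it was stored on the object that was passed in:

```python
class Value:
    """Hypothetical stand-in for a tensor; any ordinary Python object works."""
    def __init__(self, value):
        self.value = value

def square_grad(inp, out_grad):
    # Gradient of value**2 is 2 * value, chained with the upstream gradient.
    # No `return` statement: the only effect is the attribute stored on `inp`.
    inp.g = 2 * inp.value * out_grad

x = Value(3.0)
result = square_grad(x, 1.0)

print(result)  # None
print(x.g)     # 6.0
```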
Why is yhat a result of mse? I thought yhat was the output of lin2, and mse was only a way of measuring how good our predictor is.
and those propagate back to outside the function? (so, classes are passed by reference?)
What exactly is .clone() doing?
That’s Python for you, most objects are mutable.
I’ve run into this elsewhere in python (think it was scikit-learn, but can’t remember specifically now) and was totally confused by the transposition. I still don’t understand it, so if someone can explain, that’d be awesome.
The model has layers referenced in its constructor. If we mutate a referenced layer object, we can access the value we’ve added to it (as .g) from within the main class (model).
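Here is a small sketch of that, with hypothetical `Lin` and `Model` classes and a placeholder gradient value, not the notebook's real ones. It also covers the "passed by reference?" question: Python passes references to objects, so mutating an object inside a function is visible outside, while rebinding the parameter name is not:

```python
class Lin:
    """Hypothetical layer with nothing on it until a gradient is attached."""
    pass

class Model:
    def __init__(self, layers):
        self.layers = layers  # keeps references to the same objects, not copies

def store_grad(layer):
    layer.g = 0.5             # mutation: visible to every holder of this object

def rebind(layer):
    layer = Lin()             # rebinding: only the local name changes
    layer.g = 99.0

lin = Lin()
model = Model([lin])

store_grad(lin)
rebind(lin)

same_object = model.layers[0] is lin
print(same_object)            # True
print(model.layers[0].g)      # 0.5
```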
From what I read, looks like the idea is to head toward a language that integrates deep learning / numerical computation / differentiable programming. So the idea is that Swift will grow out of the “… for TF” part. I think. Anyone chime in who knows more.