Lesson 8 (2019) discussion & wiki

sgugger · March 19, 2019, 3:38am

It might be linked to some kind of optimization, or maybe it just made more sense to them when they wrote that layer.

alando · March 19, 2019, 3:39am

Why couldn’t you just set a cutoff for the initialization instead of implementing all this math? I.e., all of w1 must be between 0.001 and 0.01 (randomly drawn from that range with a generic Gaussian distribution).

sgugger · March 19, 2019, 3:40am

Because if you use too small a std, your activations will shrink layer after layer across the network until they are all effectively zero. It is a tricky business: the std has to be neither too big nor too small.
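To make the point concrete, here is a minimal sketch in plain Python (not the lesson's notebook code). It pushes a random vector through a stack of purely linear layers, with no ReLU, and reports the standard deviation of what comes out. The function name `output_std`, the width of 64 and the depth of 20 are illustrative assumptions.

```python
import math
import random

def output_std(weight_std, n=64, depth=20, seed=0):
    """Std of the activations after `depth` random linear layers of width `n`."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(depth):
        # Fresh weight matrix per layer, entries drawn from N(0, weight_std^2).
        w = [[rng.gauss(0, weight_std) for _ in range(n)] for _ in range(n)]
        x = [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
    mean = sum(x) / n
    return math.sqrt(sum((v - mean) ** 2 for v in x) / n)

# Each layer rescales the activations by roughly sqrt(n) * weight_std.
tiny = output_std(0.01)                    # factor ~0.08 per layer: vanishes
large = output_std(1.0)                    # factor ~8 per layer: explodes
balanced = output_std(1 / math.sqrt(64))   # factor ~1 per layer: stays stable

print(tiny, large, balanced)
```

A fixed range such as 0.001 to 0.01 ignores the layer width, which is why the usable std depends on the number of inputs. With ReLU layers the factor changes (Kaiming init uses sqrt(2/n)), but the same reasoning applies.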

PierreO · March 19, 2019, 3:41am

Does Terence know LaTeX, or is all that LaTeX copy-pasted from Wikipedia?

pl3 · March 19, 2019, 3:42am

I think you just gave me the solution to a problem I’ve been having with one of my networks…!

I am using a single embedding layer, but for 16 inputs. I’m guessing the scales aren’t working out well because of it.

username_not_found · March 19, 2019, 3:42am

@rachel to help people get started…

I can confirm that the Docker image works well. https://forums.fast.ai/t/docker-image-of-fastai-jupyter-cuda10-py3-7/40081

To update, log in via bash, and then git pull, reinstall, then git clone the part3 code. I have provided the commands there.

swagman · March 19, 2019, 3:44am

He made Bookish, a tool to convert Markdown to LaTeX.

paul · March 19, 2019, 3:44am

Would be cleaner if Jeremy used y=f(x) and y=g(x), just to avoid any confusion about composing functions.

sgugger · March 19, 2019, 3:45am

He didn’t use f and g afterward, though.

knguyen · March 19, 2019, 3:47am

I love the course! I really hope that you will expand this course into some advanced topics like randomized linear algebra and the link to convex/non-convex optimization.

drscotthawley · March 19, 2019, 3:49am

Sorry, why do his function def’s not have “return ____” at the end?
Is it a Python default that functions return the result of the last operation?

sgugger · March 19, 2019, 3:49am

It’s storing the gradients, that’s all we need.

zachcaceres · March 19, 2019, 3:49am

it’s assigned to a layer via a mutation (class.g = whatever)
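On the `return` question: a Python function with no `return` statement returns `None`; it does not return the result of the last expression. A backward-style function is still useful because it writes the gradient onto the object it was given. A minimal sketch, where `Box` and `mse_grad_sketch` are hypothetical names for illustration and plain lists stand in for tensors:

```python
class Box:
    """Tiny stand-in for a tensor: holds data and, later, a gradient in `.g`."""
    def __init__(self, data):
        self.data = data
        self.g = None

def mse_grad_sketch(pred, targ):
    # Gradient of mean((pred - targ)^2) with respect to pred.
    n = len(pred.data)
    pred.g = [2.0 * (p - t) / n for p, t in zip(pred.data, targ)]
    # No return statement: the result lives on `pred.g`.

pred = Box([1.0, 2.0, 3.0])
result = mse_grad_sketch(pred, [1.0, 1.0, 1.0])

print(result)   # the function itself returns None
print(pred.g)   # the gradient was stored on the object
```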

jcd · March 19, 2019, 3:50am

Why is yhat a result of mse? I thought yhat was the output of lin2, and mse was only a way of measuring how good our predictor is.

drscotthawley · March 19, 2019, 3:50am

and those propagate back to outside the function? (so, classes are passed by reference?)

ymittal23 · March 19, 2019, 3:51am

What is .clone() exactly doing?

sgugger · March 19, 2019, 3:51am

That’s python for you, most objects are mutable.
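Strictly speaking, Python passes references to objects by value. Mutating the object inside a function is visible to the caller, while rebinding the parameter name is not. A small sketch with hypothetical names (`Layer`, `mutate`, `rebind`):

```python
class Layer:
    pass

def mutate(layer):
    layer.g = 1.0      # changes the object the caller also holds

def rebind(layer):
    layer = Layer()    # only changes what the local name points to
    layer.g = 2.0      # written on the new object, invisible to the caller

lin = Layer()
mutate(lin)
after_mutate = lin.g
rebind(lin)
after_rebind = lin.g

print(after_mutate, after_rebind)
```

Both values are 1.0: the attribute set in `mutate` sticks, and the assignment in `rebind` never reaches the caller's object.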

magiclantern · March 19, 2019, 3:52am

I’ve run into this elsewhere in python (think it was scikit-learn, but can’t remember specifically now) and was totally confused by the transposition. I still don’t understand it, so if someone can explain, that’d be awesome.

zachcaceres · March 19, 2019, 3:52am

the model has layers referenced in its constructor. if we mutate the referenced class, we can access the value we’ve added to it (as .g) from within the main class (model)
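A sketch of that structure, with hypothetical `Lin` and `Model` classes and scalars in place of tensors (this is an illustration, not the notebook's code): the model stores references to the same layer objects, so a value written onto a layer during the backward pass is readable through the model.

```python
class Lin:
    def __init__(self, w):
        self.w = w

    def __call__(self, x):
        self.inp = x                  # remember the input for the backward pass
        return self.w * x

    def backward(self, out_g):
        self.w_g = out_g * self.inp   # gradient with respect to the weight
        self.g = out_g * self.w       # gradient with respect to the input

class Model:
    def __init__(self, layers):
        self.layers = layers          # references to the layers, not copies

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

    def backward(self, out_g=1.0):
        # Walk the layers in reverse, feeding each layer's .g to the previous one.
        for layer in reversed(self.layers):
            layer.backward(out_g)
            out_g = layer.g

lin1, lin2 = Lin(2.0), Lin(3.0)
model = Model([lin1, lin2])
y = model(5.0)        # 3 * (2 * 5)
model.backward()

print(model.layers[0] is lin1, lin1.g, lin1.w_g, lin2.w_g)
```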

Borz · March 19, 2019, 3:52am

From what I read, it looks like the idea is to head toward a language that integrates deep learning / numerical computation / differentiable programming. So the idea is that Swift will grow out of the “… for TF” part. I think. Anyone who knows more, please chime in.