Lesson 8 (2019) discussion & wiki

Why is yhat a result of mse? I thought yhat was the output of lin2, and mse was only a way of measuring how good our predictor is.

And those propagate back outside the function? (So, class instances are passed by reference?)

What exactly is .clone() doing?


That’s Python for you; most objects are mutable.


I’ve run into this elsewhere in Python (I think it was scikit-learn, but I can’t remember specifically now) and was totally confused by the transposition. I still don’t understand it, so if someone can explain, that’d be awesome.

The model has its layers referenced in its constructor. If we mutate a referenced layer object, we can access the value we’ve added to it (as .g) from within the main class (the model).


From what I read, the idea seems to be to head toward a language that integrates deep learning / numerical computation / differentiable programming. So Swift would eventually grow out of the “… for TF” part. I think. Anyone who knows more, please chime in.

It’s copying your tensor during assignment. Otherwise it would just copy the reference to that tensor and not the actual object.


Just to confirm, the “.clamp_min(0.)-0.5” is part of the tweaking?


I believe so, yes

I hadn’t noticed the Fixup initialization paper until Jeremy mentioned it today, and it looks very interesting. I haven’t fully grasped it yet, but does anyone know whether some of these ideas, or related ones, could apply to RNNs as well? Are there any other ways to improve over the LSTM/GRU (or the antisymmetric RNN)?

Are __call__ and __init__ completely independent? For example, if I define a variable in __init__, would it be available in __call__?

Yes, regular ReLU doesn’t have the -0.5


So it’s like copy.deepcopy()?

But for PyTorch’s tensors.

What’s the intuition behind it?

Rewatch the video tomorrow, Jeremy explained it :wink:


__init__ is called when you instantiate the object (which you would have to do before you can call it), so anything you assign to self in __init__ is available in __call__.


Module should have bwd(), shouldn’t it?

Shouldn’t the einsum be written as “ib,bj->ij” (not “bi,bj->ij”)? Was that a typo?