Lesson 5 In-Class Discussion ✅

I’m not fully understanding it myself, but from what I did understand:
The parameters of the NN layers are matrices of weights and biases stored as PyTorch tensors.
When a tensor is created there is a boolean parameter called ‘requires_grad’.

Here comes the blurry part for me, so take it with a grain of salt (I still have to dig into the source code of autograd):
If ‘requires_grad’ is set to True, the tensor is created with an extra “grads matrix” of the same size, initially empty.
Then, when you call “loss.backward()”, PyTorch is somehow able to walk back through the formula that produced ‘loss’ and find the tensors involved for which “requires_grad=True”.
PyTorch then computes the partial derivative for each entry in the tensor and stores the result in the extra “grads matrix”.

So when you call ‘a.grad’ you are indeed only looking at this “grads matrix” attached to ‘a’.
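
A tiny toy example of what I mean (not the real layer code, just to illustrate):

import torch

# a "parameter" tensor that asks autograd to track gradients for it
a = torch.tensor([2.0, 3.0], requires_grad=True)

# some computation that ends in a scalar loss
loss = (a * a).sum()

# walks back through the operations that produced 'loss'
# and fills in the "grads matrix" of every tensor with requires_grad=True
loss.backward()

print(a.grad)   # tensor([4., 6.]) -- the partial derivative of loss w.r.t. each entry of a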
Is that somehow clear?:smile:


This is the formula to update the parameters of the neural net with the SGD method.

Maybe you’ll understand it better like this:
new_parameter = old_parameter - learning_rate * parameter.grad

parameter: the matrix of weights or biases stored in the layer

learning_rate: just a scaling value so the model will not move the weights too fast. (Jeremy talks about this one at great length in all the lessons.)

parameter.grad: the ‘partial derivative’ of the loss with respect to each separate weight, i.e. how much the loss changes when you move that weight just a little bit.

So for each weight: it takes its partial derivative of the loss (what we call the ‘grad’), scales it by the learning_rate, and subtracts all this from the present weight to get the new weight.
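
As a rough sketch in code (assuming a model whose loss.backward() has just been called; the names are only illustrative):

lr = 0.01                                  # learning_rate
with torch.no_grad():                      # don't track the update itself in autograd
    for p in model.parameters():           # every weight/bias matrix of the layers
        p -= lr * p.grad                   # new_parameter = old_parameter - learning_rate * grad
        p.grad.zero_()                     # clear the gradients for the next batch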


At the 1:48:55 mark of the Lesson 5 video, how are the J3 and K3 cell values chosen for the first time? Here they are -18.33 and 98.246, but what were their values initially?


Also, any resource (article/visualization) for getting a better grasp on momentum, RMSProp, and RMSProp + momentum (Adam)?

Thanks.


That it is, thank you!

Hi, I have a question on optimisers. Specifically, I don’t get where the exponentially weighted average’s initial value comes from… for the other cells you go back to the preceding value, multiply it by 0.9 (the momentum constant), and add the gradient of the current time step. (Correct me if I am wrong on this one…)
But the initial value, as I pointed out in the Excel cell, does it just appear randomly, or how is it chosen?
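
Here is the recurrence I am talking about, as a rough sketch (the 0.9 and the zero start are just my assumptions, which is exactly what I am asking about):

beta = 0.9                      # momentum constant
gradients = [2.0, 1.5, 1.0]     # made-up gradients, one per time step
avg = 0.0                       # <- this initial value is my question
for g in gradients:
    avg = beta * avg + g        # previous average times 0.9 plus the current gradient
    print(avg)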


Please explain.

I have the same doubt… any answers yet…?

not yet

If anybody is having trouble running custom networks, make sure you are passing a data loader (e.g. data.train_dl) to your update function. If you mistakenly pass a dataset (e.g. data.train_ds) you might get an error such as:

RuntimeError: Expected object of backend CPU but got backend CUDA for argument #2 ‘mat2’

This happened to me a couple of times and I solved it by checking the lesson’s notes.
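
For reference, a rough sketch of the difference (using the names from the lesson notebook, where update is the training-step function defined there):

# correct: the DataLoader yields batches (and in fastai also puts them on the right device)
for x, y in data.train_dl:
    update(x, y, lr)

# wrong: data.train_ds is the underlying Dataset of individual items,
# so shapes and devices are not what update() expects -> the backend CPU/CUDA error above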

Thanks for these details. If you are using jupyter, you should use

mv ml-100k.zip /home/jupyter/.fastai/data

Thank you for the suggestion!

So I was trying to implement nn.Linear on my own, but I get different results than the built-in one from PyTorch.

  • my own code (screenshot)

  • losses starting from as high as 5 using Mnist_Logistic (screenshot)

  • compared to what Jeremy got (screenshot)

  • and using Mnist_NN (screenshot)

  • and this is what I got with the built-in nn.Linear (screenshot)

So is there something wrong with my code?


I kind of know that PyTorch doesn’t initialize the weights the same random way I did, but is that what is causing this issue?

I was trying to implement the collaborative filtering notebook in Google Colab with the original movielens-100k dataset. But whenever I try to run this line
“movie_bias=learn.bias(top_movies,is_item=True)”
I get an error:
You’re trying to access an item that isn’t in the training data. If it was in your original data, it may have been split such that it’s only in the validation set now.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
in ()
----> 1 movie_bias=learn.bias(top_movies,is_item=True)
      2 # is_item set to True says I want the items, False to say I want the users

3 frames

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1722     # remove once script supports set_grad_enabled
   1723     no_grad_embedding_renorm(weight, input, max_norm, norm_type)
-> 1724     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   1725
   1726

TypeError: embedding(): argument ‘indices’ (position 2) must be Tensor, not NoneType
Can someone please help me figure out where I am going wrong and how to implement it correctly?
Thanks in advance.

Check the path where you have uploaded your dataset. Then copy the path accordingly.

I similarly ran into this issue and it looks like it comes down to the weight/bias initialization. This blog post goes into it in more detail and explains the PyTorch implementation.

It looks like PyTorch is using a more complex initialization pattern (Kaiming initialization), but based on the PyTorch docs I was able to approximate the same scale and shape by initializing like this:

# assumes: import math, torch, and torch.nn as nn; this runs inside the
# custom layer's __init__(self, in_features, out_features, bias=True)
k = 1 / math.sqrt(in_features)            # the bound the PyTorch docs give for nn.Linear
self.weights = nn.Parameter(torch.empty(in_features, out_features).uniform_(-k, k))
self.bias = bias                          # remember whether a bias term was requested
if self.bias:
  self.biases = nn.Parameter(torch.empty(out_features).uniform_(-k, k))
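
One quick way to sanity-check the scale (my own ad-hoc check, not from the blog post) is to compare the spread of these weights with what nn.Linear produces by default:

import math
import torch
import torch.nn as nn

in_features, out_features = 784, 10
k = 1 / math.sqrt(in_features)

mine = torch.empty(in_features, out_features).uniform_(-k, k)
theirs = nn.Linear(in_features, out_features).weight   # PyTorch's default (Kaiming uniform)

print(mine.std().item(), theirs.std().item())          # both should be roughly k / sqrt(3)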

During the estimation of the gradient, why is the 0.01 added to the intercept instead of being added to the input prior to multiplying by the slope, i.e. (f((x + 0.01)*a + b) - f(x*a + b)) / 0.01?
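
For concreteness, here are the two quotients I am comparing (just a toy numerical sketch with made-up values):

a, b, x, eps = 2.0, 1.0, 3.0, 0.01

def f(z):                                        # stand-in for the loss in the spreadsheet
    return z ** 2

# what the spreadsheet does: nudge the intercept b
wrt_intercept = (f(x * a + (b + eps)) - f(x * a + b)) / eps

# what I am asking about: nudge the input x before multiplying by the slope
wrt_input = (f((x + eps) * a + b) - f(x * a + b)) / eps

print(wrt_intercept, wrt_input)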

Hi vthommeret, hope all is well!
I read your post; it was informative and concise.
I added this line, from pathlib import Path, to avoid a Config error on Google Colab.
I amended this line to self.lin = nn.Linear(784, 10, bias=True).cuda() to avoid this error:
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_addmm
Great post!
Cheers mrfabulous1 :smiley: :smiley: