Lesson 5 In-Class Discussion ✅

As of Jul. 28th, 2019, it probably is not provided (sorry if I am wrong).

My workflow to use 100k data is the following:

  • current environment:
    – using Crestle.ai
    – fastai ver. 1.0.55
    – just did a git pull in courses/fast-ai/course-v3/
  1. I download the data from:
    http://files.grouplens.org/datasets/movielens/ml-100k.zip
  2. Upload the zip from Jupyter notebook’s UI
    – You can find the Upload button at the upper right of the screen
  3. Open terminal from New -> Terminal
  4. Change directory to the place where you uploaded the file (probably /home/crestle/fastai)
  5. Move the ml-100k.zip file to /home/crestle/.fastai/data (note the dot before ‘fastai’)
    – use the Linux mv command: https://www.rapidtables.com/code/linux/mv.html
  6. Navigate to /home/crestle/.fastai/data with the cd command
  7. Unzip the zip file with unzip ml-100k.zip

This let me run all the code in lesson4-collab.ipynb.
(I am a complete beginner with Linux, so there is probably a more efficient way; see the Python sketch below for a notebook-only alternative.)
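
If you prefer to stay inside the notebook, steps 5 to 7 can also be done in Python. A minimal sketch, assuming the Crestle paths above (adjust for your own setup), run from the directory where the zip was uploaded:

from pathlib import Path
import shutil, zipfile

data_path = Path.home() / '.fastai' / 'data'                # e.g. /home/crestle/.fastai/data
data_path.mkdir(parents=True, exist_ok=True)
shutil.move('ml-100k.zip', str(data_path / 'ml-100k.zip'))  # step 5: move the uploaded zip
with zipfile.ZipFile(data_path / 'ml-100k.zip') as z:       # steps 6 and 7: unzip in place
    z.extractall(data_path)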


I’m looking at this snippet from the class notes for lesson 2, since I needed to review for lesson 5.

def update():
    y_hat = x@a
    loss = mse(y, y_hat)
    if t % 10 == 0: print(loss)
    loss.backward()
    with torch.no_grad():
        a.sub_(lr * a.grad)  # this line!!!
        a.grad.zero_()

What I’m not understanding is how a.grad gets populated. a is not passed into loss.backward(), and I don’t see how it could be referenced. If anyone has a suggestion for understanding this line, it would be appreciated.

Hey guys,

As Jeremy asked in Lesson 5,
I just re-created the nn.Linear class and the Adam optimizer from scratch.
The only blurry part is the first weight update.
Since Adam relies on having previous update vectors to compute the new updates, I used plain SGD for the first update.
But how is this normally done?
Of course, feel free to criticize my code and the way I made it work.

Here’s the notebook:
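
For reference (this is not taken from the notebook above), the textbook Adam recipe avoids a special first step: both running averages start at zero and a bias-correction term compensates for that zero start. A minimal sketch of a single Adam step, with illustrative names and defaults:

import torch

def adam_step(p, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # p: parameter tensor with p.grad already populated
    # m, v: running averages, created as zero tensors; t: step count starting at 1
    g = p.grad
    m.mul_(beta1).add_(g, alpha=1 - beta1)           # exponentially weighted average of gradients
    v.mul_(beta2).addcmul_(g, g, value=1 - beta2)    # exponentially weighted average of squared gradients
    m_hat = m / (1 - beta1 ** t)                     # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    with torch.no_grad():
        p.sub_(lr * m_hat / (v_hat.sqrt() + eps))

With the bias-corrected averages the very first update already behaves sensibly, so no SGD warm-up step is needed.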

I’m not fully understanding it myself, but from what I understood:
The parameters of the NN layers are matrices of weights and biases stored as PyTorch tensors.
When a tensor is created there is a boolean parameter called requires_grad.

Here comes the blurry part for me, so take it with a grain of salt (I still have to dig into the source code of autograd):
If requires_grad is set to True, the tensor is created with an extra “grads matrix” of the same size, initially empty.
Then, when you call loss.backward(), PyTorch is somehow able to go back through the formula that produced loss and find the tensors involved for which requires_grad=True.
PyTorch then computes the partial derivative for each entry in those tensors and stores the result in the extra “grads matrix”.

So when you call a.grad you are really just looking at this “grads matrix” attached to a.
Is that somewhat clear? :smile:
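
A tiny runnable illustration of that behaviour (plain PyTorch, made-up numbers):

import torch

a = torch.ones(3, requires_grad=True)   # leaf tensor that autograd will track
x = torch.tensor([1., 2., 3.])
loss = (x @ a) ** 2                     # loss is built from a, so the dependency is recorded
print(a.grad)                           # None: nothing has been computed yet
loss.backward()                         # walks the recorded graph back to a
print(a.grad)                           # now holds d(loss)/d(a) for each entry of a

So backward() never needs a to be passed in explicitly; it just follows the graph that was recorded while loss was being computed.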


This is the formula to update the parameters of the neural net with the SGD method.

Maybe you’ll understand it better like this:
new_parameter = old_parameter - learning_rate * parameter.grad

parameter: the matrix of weights or biases stored in the layer

learning_rate: is just a scaling value so the model will not move the weights too fast. (Jeremy talks about this one at great length in all the lessons)

parameter.grad: the ‘partial derivative’ of the loss when you move each separate weight just a little bit.

So for each weight: it takes its partial derivative of the loss (what we call the ‘grad’), scales it by the learning_rate, and subtracts the result from the current weight to get the new weight.
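
As a made-up numeric example: if a weight is currently 0.5, its gradient is 2.0, and the learning rate is 0.1, the new weight is 0.5 - 0.1 * 2.0 = 0.3.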


At the 1:48:55 mark of the Lesson 5 video, how are the J3 and K3 cell values chosen initially, the very first time? Here they are -18.33 and 98.246, but what were their values initially?


Also, any resource (article/visualization) for getting a better grasp of momentum, RMSProp, and RMSProp + momentum (Adam)?

Thanks.


It is, thank you!

Hi, I have a question on optimisers. Specifically, I don’t get where the exponentially weighted average’s initial value comes from… The other ones go back to the preceding value, multiply it by 0.9 (the momentum constant), and add the gradient of the current time step (correct me if I am wrong on this one…).
But the initial value, as I pointed out in the Excel cell, does it appear randomly, or how is it chosen?


Pls explain

I have the same doubt… any answers yet…?

not yet
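
For what it’s worth, there doesn’t seem to be a single convention. One common choice (the one the published Adam paper uses) is to start the running average at zero, avg_0 = 0, and then compute avg_t = 0.9 * avg_(t-1) + 0.1 * grad_t, optionally dividing by (1 - 0.9^t) as a bias correction to compensate for the zero start. Another common choice (e.g. PyTorch’s SGD with momentum) is to seed the average with the first gradient, avg_1 = grad_1. Which convention a given spreadsheet or library uses is an implementation choice, so the first cell’s value depends on that choice.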

If anybody is having trouble running custom networks, make sure you are passing a data loader (e.g. data.train_dl) to your update function. If you mistakenly pass a dataset (e.g. data.train_ds) you might get an error such as:

RuntimeError: Expected object of backend CPU but got backend CUDA for argument #2 ‘mat2’

This happened to me a couple of times and I solved it by checking the lesson’s notes.
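
A minimal sketch of the difference, assuming a fastai v1 DataBunch named data and an update(x, y, lr) training function like the one in the lesson (these names are illustrative):

# data.train_ds yields single (x, y) samples, still on the CPU
# data.train_dl is a DeviceDataLoader: it yields collated batches already moved to the model's device
for xb, yb in data.train_dl:   # correct: batched tensors on the same device as the model
    loss = update(xb, yb, lr)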

Thanks for these details. If your home directory is /home/jupyter (rather than /home/crestle), you should use

mv ml-100k.zip /home/jupyter/.fastai/data

Thank you for the suggestion!

So I was trying to implement nn.Linear on my own, but I get different results than the built-in one from PyTorch.

  • my own code

  • losses starting from as high as 5 using Mnist_Logistic
    [screenshot]

  • compared to what Jeremy got
    [screenshot]

and using Mnist_NN

[screenshot]

  • and this is what I got with the built in nn.Linear
    [screenshot]

  • So is there something wrong with my code?


I kind of know that PyTorch doesn’t initialize the weights randomly the way I did, but is that what’s causing this issue?

I was trying to implement the collaborative filtering notebook in Google Colab, with the original MovieLens 100k dataset. But whenever I try to run this line:
movie_bias = learn.bias(top_movies, is_item=True)
I get this warning, followed by an error:
You’re trying to access an item that isn’t in the training data. If it was in your original data, it may have been split such that it’s only in the validation set now.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
in ()
----> 1 movie_bias=learn.bias(top_movies,is_item=True)
      2 #is item set to True says I want the items, False to say I want the users

3 frames

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1722     # remove once script supports set_grad_enabled
   1723     no_grad_embedding_renorm(weight, input, max_norm, norm_type)
-> 1724     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   1725
   1726

TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not NoneType
Can someone please help me figure out where I am going wrong and how to implement it correctly?
Thanks in advance.

Check the path where you have uploaded your dataset. Then copy the path accordingly.

I similarly ran into this issue and it looks like it’s based on the weight / bias initialization. This blog post goes into more detail and explains the PyTorch implementation:

It looks like it’s using a more complex initialization pattern (Kaiming initialization) but based on the PyTorch docs:

I was able to approximate the same scale and shape by initializing like this:

# uniform init in [-1/sqrt(in_features), 1/sqrt(in_features)], the same scale as PyTorch's default
k = 1 / math.sqrt(in_features)
self.weights = nn.Parameter(torch.empty(in_features, out_features).uniform_(-k, k))
self.bias = bias
if self.bias:
    self.biases = nn.Parameter(torch.empty(out_features).uniform_(-k, k))
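
For comparison, PyTorch’s own nn.Linear (in the 1.x versions) initializes its weight with Kaiming uniform init, roughly like this:

nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))   # PyTorch's default weight init for nn.Linear

Note that PyTorch stores the weight as (out_features, in_features), the transpose of the layout above, so the fan-in it computes corresponds to in_features.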

During the estimation of the gradient, why is the 0.01 added to the intercept instead of being added to the input before multiplying by the slope, i.e. (f((x+0.01)*a + b) - f(x*a + b)) / 0.01?
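
For what it’s worth, a finite difference estimates the derivative with respect to whatever quantity you perturb. Perturbing the intercept gives (f(x*a + (b + 0.01)) - f(x*a + b)) / 0.01, an estimate of dLoss/db, which is what gradient descent needs in order to update b; perturbing x instead would estimate how sensitive the loss is to the input data, which is not a quantity we update.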

Hi vthommeret, hope all is well!
I read your post; it was informative and concise.
I added the line from pathlib import Path to avoid a Config error on Google Colab.
I amended the line to self.lin = nn.Linear(784, 10, bias=True).cuda() to avoid this error:
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_addmm
Great post!
Cheers mrfabulous1 :smiley: :smiley: