Lesson 5 In-Class Discussion ✅

As of Jul. 28th, 2019, it probably is not provided (sorry if I am wrong).

My workflow to use 100k data is the following:

  • current environment:
    – using Crestle.ai
    – fastai ver. 1.0.55
    – just did a git pull in courses/fast-ai/course-v3/
  1. I download the data from:
    http://files.grouplens.org/datasets/movielens/ml-100k.zip
  2. Upload the zip from Jupyter notebook’s UI
    – You can find the Upload button at the upper right of the screen
  3. Open terminal from New -> Terminal
  4. Change directory to the place where you uploaded the file (probably /home/crestle/fastai)
  5. Move the ml-100k.zip file to /home/crestle/.fastai/data (note the dot before ‘fastai’)
    – use the Linux mv command: https://www.rapidtables.com/code/linux/mv.html
  6. Navigate to /home/crestle/.fastai/data with the cd command
  7. Unzip the zip file with unzip ml-100k.zip

This let me run all the code in lesson4-collab.ipynb.
(I am a complete beginner with Linux, so there is probably a more efficient way; see the Python sketch below for a notebook-only alternative.)
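
If you prefer to stay inside the notebook, steps 5 to 7 can also be done in Python. A minimal sketch, assuming the Crestle paths above (adjust for your own setup), run from the directory where the zip was uploaded:

from pathlib import Path
import shutil, zipfile

data_path = Path.home() / '.fastai' / 'data'                # e.g. /home/crestle/.fastai/data
data_path.mkdir(parents=True, exist_ok=True)
shutil.move('ml-100k.zip', str(data_path / 'ml-100k.zip'))  # step 5: move the uploaded zip
with zipfile.ZipFile(data_path / 'ml-100k.zip') as z:       # steps 6 and 7: unzip in place
    z.extractall(data_path)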


I’m looking at this snippet from the class notes for lesson 2, since I needed to review for lesson 5.

def update():
    y_hat = x@a
    loss = mse(y, y_hat)
    if t % 10 == 0: print(loss)
    loss.backward()
    with torch.no_grad():
        a.sub_(lr * a.grad)  # this line!!!
        a.grad.zero_()

What I’m not understanding is how a.grad gets populated. a is not passed into loss.backward(), and I don’t see how it could be referenced. If anyone has a suggestion for understanding this line, it would be appreciated.

Hey guys,

As Jeremy asked in Lesson 5,
I just re-created the nn.Linear class and the Adam optimizer from scratch.
The only blurry part is the first weight update.
Since Adam relies on having previous update vectors to compute the new updates, I used plain SGD for the first update.
But how is this normally done?
Of course, feel free to criticize my code and the way I made it work.

Here’s the notebook:
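
For reference (this is not taken from the notebook above), the textbook Adam recipe avoids a special first step: both running averages start at zero and a bias-correction term compensates for that zero start. A minimal sketch of a single Adam step, with illustrative names and defaults:

import torch

def adam_step(p, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # p: parameter tensor with p.grad already populated
    # m, v: running averages, created as zero tensors; t: step count starting at 1
    g = p.grad
    m.mul_(beta1).add_(g, alpha=1 - beta1)           # exponentially weighted average of gradients
    v.mul_(beta2).addcmul_(g, g, value=1 - beta2)    # exponentially weighted average of squared gradients
    m_hat = m / (1 - beta1 ** t)                     # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    with torch.no_grad():
        p.sub_(lr * m_hat / (v_hat.sqrt() + eps))

With the bias-corrected averages the very first update already behaves sensibly, so no SGD warm-up step is needed.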

I’m not fully understanding it myself, but from what I understood:
The parameters of the NN layers are matrices of weights and biases stored as PyTorch tensors.
When a tensor is created there is a boolean parameter called requires_grad.

Here comes the blurry part for me, so take it with a grain of salt (I still have to dig into the source code of autograd):
If requires_grad is set to True, the tensor is created with an extra “grads matrix” of the same size, initially empty.
Then, when you call loss.backward(), PyTorch is somehow able to go back through the formula that produced loss and find the tensors involved for which requires_grad=True.
PyTorch then computes the partial derivative for each entry in those tensors and stores the result in the extra “grads matrix”.

So when you call a.grad you are really just looking at this “grads matrix” attached to a.
Is that somewhat clear? :smile:
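
A tiny runnable illustration of that behaviour (plain PyTorch, made-up numbers):

import torch

a = torch.ones(3, requires_grad=True)   # leaf tensor that autograd will track
x = torch.tensor([1., 2., 3.])
loss = (x @ a) ** 2                     # loss is built from a, so the dependency is recorded
print(a.grad)                           # None: nothing has been computed yet
loss.backward()                         # walks the recorded graph back to a
print(a.grad)                           # now holds d(loss)/d(a) for each entry of a

So backward() never needs a to be passed in explicitly; it just follows the graph that was recorded while loss was being computed.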


This is the formula to update the parameters of the neural net with the SGD method.

Maybe you’ll understand it better like this:
new_parameter = old_parameter - learning_rate * parameter.grad

parameter: the matrix of weights or biases stored in the layer

learning_rate: is just a scaling value so the model will not move the weights too fast. (Jeremy talks about this one at great length in all the lessons)

parameter.grad: the ‘partial derivative’ of the loss when you move each separate weight just a little bit.

So for each weight: it takes its partial derivative of the loss (what we call the ‘grad’), scales it by the learning_rate, and subtracts the result from the current weight to get the new weight.
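
As a made-up numeric example: if a weight is currently 0.5, its gradient is 2.0, and the learning rate is 0.1, the new weight is 0.5 - 0.1 * 2.0 = 0.3.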


At the 1:48:55 mark of the Lesson 5 video, how are the J3 and K3 cell values chosen initially, the very first time? Here they are -18.33 and 98.246, but what were their values initially?


Also, any resource (article/visualization) for getting a better grasp of momentum, RMSProp, and RMSProp + momentum (Adam)?

Thanks.


It is, thank you!

Hi, I have a question on optimisers. Specifically, I don’t get where the exponentially weighted average’s initial value comes from… The other ones go back to the preceding value, multiply it by 0.9 (the momentum constant), and add the gradient of the current time step (correct me if I am wrong on this one…).
But the initial value, as I pointed out in the Excel cell, does it appear randomly, or how is it chosen?


Pls explain

I have the same doubt… any answers yet…?

not yet
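
For what it’s worth, there doesn’t seem to be a single convention. One common choice (the one the published Adam paper uses) is to start the running average at zero, avg_0 = 0, and then compute avg_t = 0.9 * avg_(t-1) + 0.1 * grad_t, optionally dividing by (1 - 0.9^t) as a bias correction to compensate for the zero start. Another common choice (e.g. PyTorch’s SGD with momentum) is to seed the average with the first gradient, avg_1 = grad_1. Which convention a given spreadsheet or library uses is an implementation choice, so the first cell’s value depends on that choice.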

If anybody is having trouble running custom networks, make sure you are passing a data loader (e.g. data.train_dl) to your update function. If you mistakenly pass a dataset (e.g. data.train_ds) you might get an error such as:

RuntimeError: Expected object of backend CPU but got backend CUDA for argument #2 ‘mat2’

This happened to me a couple of times and I solved it by checking the lesson’s notes.
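
A minimal sketch of the difference, assuming a fastai v1 DataBunch named data and an update(x, y, lr) training function like the one in the lesson (these names are illustrative):

# data.train_ds yields single (x, y) samples, still on the CPU
# data.train_dl is a DeviceDataLoader: it yields collated batches already moved to the model's device
for xb, yb in data.train_dl:   # correct: batched tensors on the same device as the model
    loss = update(xb, yb, lr)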

Thanks for these details. If your home directory is /home/jupyter (rather than /home/crestle), you should use

mv ml-100k.zip /home/jupyter/.fastai/data

Thank you for the suggestion!

So I was trying to implement nn.Linear on my own, but I get different results than the built-in one from PyTorch.

  • my own code

  • losses starting from as high as 5 using Mnist_Logistic
    [screenshot]

  • compared to what Jeremy got
    [screenshot]

and using Mnist_NN

[screenshot]

  • and this is what I got with the built in nn.Linear
    [screenshot]

  • So is there something wrong with my code?


I kind of know that PyTorch doesn’t initialize the weights randomly the way I did, but is that what’s causing this issue?

I was trying to implement the collaborative filtering notebook in Google Colab, with the original MovieLens 100k dataset. But whenever I try to run this line:
movie_bias = learn.bias(top_movies, is_item=True)
I get this warning, followed by an error:
You’re trying to access an item that isn’t in the training data. If it was in your original data, it may have been split such that it’s only in the validation set now.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
in ()
----> 1 movie_bias=learn.bias(top_movies,is_item=True)
      2 #is item set to True says I want the items, False to say I want the users

3 frames

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1722     # remove once script supports set_grad_enabled
   1723     no_grad_embedding_renorm(weight, input, max_norm, norm_type)
-> 1724     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   1725
   1726

TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not NoneType
Can someone please help me figure out where I am going wrong and how to implement it correctly?
Thanks in advance.

Check the path where you have uploaded your dataset. Then copy the path accordingly.

I similarly ran into this issue and it looks like it’s based on the weight / bias initialization. This blog post goes into more detail and explains the PyTorch implementation:

It looks like it’s using a more complex initialization pattern (Kaiming initialization) but based on the PyTorch docs:

I was able to approximate the same scale and shape by initializing like this:

# uniform init in [-1/sqrt(in_features), 1/sqrt(in_features)], the same scale as PyTorch's default
k = 1 / math.sqrt(in_features)
self.weights = nn.Parameter(torch.empty(in_features, out_features).uniform_(-k, k))
self.bias = bias
if self.bias:
    self.biases = nn.Parameter(torch.empty(out_features).uniform_(-k, k))
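
For comparison, PyTorch’s own nn.Linear (in the 1.x versions) initializes its weight with Kaiming uniform init, roughly like this:

nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))   # PyTorch's default weight init for nn.Linear

Note that PyTorch stores the weight as (out_features, in_features), the transpose of the layout above, so the fan-in it computes corresponds to in_features.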

During the estimation of the gradient, why is the 0.01 added to the intercept instead of being added to the input before multiplying by the slope, i.e. (f((x+0.01)*a + b) - f(x*a + b)) / 0.01?
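
For what it’s worth, a finite difference estimates the derivative with respect to whatever quantity you perturb. Perturbing the intercept gives (f(x*a + (b + 0.01)) - f(x*a + b)) / 0.01, an estimate of dLoss/db, which is what gradient descent needs in order to update b; perturbing x instead would estimate how sensitive the loss is to the input data, which is not a quantity we update.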

Hi vthommeret, hope all is well!
I read your post; it was informative and concise.
I added the line from pathlib import Path to avoid a Config error on Google Colab.
I amended the line to self.lin = nn.Linear(784, 10, bias=True).cuda() to avoid this error:
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_addmm
Great post!
Cheers mrfabulous1 :smiley: :smiley: