Lesson 2: further discussion ✅

I have a question regarding the update() function defined in lesson 2:

def update():
    y_hat = x@a
    loss = mse(y, y_hat)
    if t % 10 == 0: print(loss)
    loss.backward()
    with torch.no_grad():
        a.sub_(lr * a.grad)
        a.grad.zero_()

Having seen the update rule for linear regression a couple of times from a mathematical perspective, I am a little surprised by how this works, especially the loss.backward() call. The update rule looks something like this:
(theta are the params, J is the loss function and h_theta(x) is the y_hat)

If you did this mathematically, you would generally first compute the gradient (the derivatives) of the loss function with respect to the parameters. Once you have its functional form, you can plug in your y (labels), a (parameter estimates) and x (feature) values.

Reading this code, I see that it first computes the loss, which is mathematically just a scalar, and from that scalar it is still able to compute the gradient…

I guess it has something to do with the fact that what is returned from mse() is actually not a scalar but a rank 1 tensor which apparently seems to store all the stuff that actually went into it (e.g. y, a and x) and is somehow still able to compute the derivatives with respect to a correctly.

Nonetheless this seems quite “magical”, would be really grateful if somebody could shed some light on this! Would be also great to understand a little better how PyTorch is computing gradients. Is it doing that analytically?

So for each label (multi-label) you have 4 images, right? How about training 4 separate models and combining the results, e.g. merging the predictions of the four models into a single (final) prediction, perhaps by majority voting?

It would also be interesting to check whether a certain channel is better at predicting certain labels than others.

EDIT: just realized that ResNet naturally runs on 3-channel images (RGB)… so training 4 models on 4 different images wouldn't do. Anyway, here is a kernel of someone doing something similar: https://www.kaggle.com/iafoss/pretrained-resnet34-with-rgby-0-460-public-lb
It reads:

   # we initialize this conv to take in 4 channels instead of 3
   # we keeping corresponding weights and initializing new weights with zeros
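
A minimal sketch of what those two comments describe, assuming a ResNet-style first layer. A freshly created nn.Conv2d stands in for the pretrained one here, and the names old_conv and new_conv are illustrative, not taken from the kernel:

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained 3-channel first conv (assumption: ResNet34-like shape).
old_conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

# Same layer, but accepting 4 input channels (e.g. RGBY).
new_conv = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)

with torch.no_grad():
    new_conv.weight.zero_()                   # new 4th-channel weights start at zero
    new_conv.weight[:, :3] = old_conv.weight  # keep the existing RGB weights

x = torch.randn(2, 4, 64, 64)  # a batch of two 4-channel images
out = new_conv(x)
print(out.shape)
```

Because the extra channel starts with zero weights, the new layer initially gives the same output as the old one applied to the first three channels, and the fourth channel only starts to contribute as training updates its weights.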

At each update, the derivative is computed at a single point: the current value of the parameters. The technique used by PyTorch and other DL frameworks is called automatic differentiation. Basically, the gradient is computed automatically, step by step, using the chain rule, so it is fast and as accurate as if you had an analytical expression.
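
To make the "step by step using the chain rule" part concrete, here is a toy reverse-mode sketch in plain Python. It is not how PyTorch is implemented; the Value class and its fields are made up for illustration. Each result remembers its inputs and the local derivatives at the current values, and backward() multiplies them back through the graph:

```python
class Value:
    """A scalar that remembers how it was computed (illustrative, not PyTorch's API)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents          # Values this one was computed from
        self.local_grads = local_grads  # d(self)/d(parent) at the current values

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __sub__(self, other):
        return Value(self.data - other.data, (self, other), (1.0, -1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Order the nodes so that each one is handled after everything that uses it.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node.parents, node.local_grads):
                parent.grad += local * node.grad  # chain rule


# Squared error of a one-point line fit: loss = (a*x + b - y)^2
a, b = Value(-1.0), Value(1.0)
x, y = Value(2.0), Value(3.0)
err = a * x + b - y
loss = err * err
loss.backward()
print(loss.data, a.grad, b.grad)  # 16.0 -16.0 -8.0
```

The loss is a single number, but it also carries references to the operations that produced it, which is why calling backward() on it is enough to reach a and b.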

If you want to understand more about what is going on under the hood, you can read the first three sections of this PyTorch tutorial:

https://pytorch.org/tutorials/beginner/pytorch_with_examples.html

Thanks! It seems the magic that I am referring to is basically accomplished by the computational graph that is defined during the forward pass. Super interesting to see how this works and how one can implement new operators to work within this paradigm!

Hey, I've got an issue. The JavaScript code that extracts image links from Google Images is not working, or rather the option to save the .txt file is just not appearing. Any suggestions?

I used it about two days ago and it worked for me. Can you give any more information on exactly what you’re trying, what browser you’re using, etc?

Please, someone help. I just downloaded images for 5 categories from Google Images, and when I tried to run the verify function, it did not verify all the categories.

I just got my problem solved :sweat_smile:

Way to go! Feels great to get stumped and then work your way out of it.

You might want to share what was wrong and what you did, so that others will benefit from your work if they have a similar problem.

Not exactly right. This is true only for binary classification, where a random choice has a 50% chance of being right. For a 10-class classification problem, the probability of getting the right class at random is 10%, so 59% would be a significant improvement.

What issues cause this learning rate behavior?

Hi there! DL beginner and non-computer scientist here so forgive my ignorance. I had the same question as lucasvw, but I had a hard time following the PyTorch examples you linked. If possible, could you clarify your statement regarding computing the derivative for the current value of parameters within the context of the SGD example of lesson 2?

In other words, for a given update t, our line is defined by:

y(x_1,x_2) = a_1 x_1 + a_2 x_2

where x_2 is 1 but that doesn’t really matter.
If we define the loss function J as MSE for a total of n points:

J(\hat{y}, y) = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2

Is the chain rule we are trying to apply this (for parameter x_1)?

\frac{\partial J}{\partial x_1} = \frac{\partial J}{\partial y} \frac{\partial y}{\partial x_1} + \frac{\partial J}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial x_1}

I may be off track here big time. I’ve used the debugger to see what the values are at t=0 for the Lesson 2 example. For the following:

a= [-1,1]
J = 6.6121
a.grad = [-2.3325, -2.5381]

No matter what I do, I can’t seem to reproduce the values in a.grad. Any further help would be very appreciated. Thanks! :slight_smile:
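
A sketch that may help with reproducing a.grad by hand. It assumes mse is the mean of the squared differences, and that the derivative is taken with respect to the parameters a_1, a_2 (the x values are fixed data, and y does not depend on a). Under those assumptions the chain rule gives:

\frac{\partial J}{\partial a_j} = \frac{2}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i) x_{ij}

The data below is made up for illustration and is not the lesson's data, so the numbers will not match the values quoted above. The finite-difference check only confirms that the formula agrees with a numerical derivative of the loss:

```python
def predict(a, row):
    # y_hat for one data point: a_1 * x_1 + a_2 * x_2
    return a[0] * row[0] + a[1] * row[1]


def mse(a, xs, ys):
    n = len(xs)
    return sum((predict(a, row) - y) ** 2 for row, y in zip(xs, ys)) / n


def mse_grad(a, xs, ys):
    # dJ/da_j = (2/n) * sum_i (y_hat_i - y_i) * x_ij
    n = len(xs)
    grad = [0.0, 0.0]
    for row, y in zip(xs, ys):
        err = predict(a, row) - y
        for j in range(2):
            grad[j] += 2.0 * err * row[j] / n
    return grad


def numeric_grad(a, xs, ys, eps=1e-6):
    # Central finite differences, used only as an independent check.
    grad = []
    for j in range(2):
        up, down = list(a), list(a)
        up[j] += eps
        down[j] -= eps
        grad.append((mse(up, xs, ys) - mse(down, xs, ys)) / (2 * eps))
    return grad


# Hypothetical data; the second column is the constant 1.
xs = [[0.5, 1.0], [-0.2, 1.0], [0.9, 1.0], [-0.7, 1.0]]
ys = [3.5, 1.4, 4.7, -0.1]
a = [-1.0, 1.0]

print(mse_grad(a, xs, ys))
print(numeric_grad(a, xs, ys))
```

Running mse_grad on the lesson's actual x and y with a = [-1, 1] should be the way to compare against the a.grad values from the debugger.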

I am getting the error below when importing from fastai.widgets. It ran fine once, but I am not sure what is causing the issue now.

from fastai.widgets import *


NameError Traceback (most recent call last)
in ()
----> 1 from fastai.widgets import *

~/.anaconda3/lib/python3.7/site-packages/fastai/widgets/__init__.py in ()
----> 1 from .image_cleaner import *

~/.anaconda3/lib/python3.7/site-packages/fastai/widgets/image_cleaner.py in ()
12 __all__ = ['DatasetFormatter', 'ImageCleaner']
13
---> 14 class DatasetFormatter():
15 @classmethod
16 def from_toplosses(cls, learn, n_imgs=None, ds_type:DatasetType=DatasetType.Valid, **kwargs):

~/.anaconda3/lib/python3.7/site-packages/fastai/widgets/image_cleaner.py in DatasetFormatter()
14 class DatasetFormatter():
15 @classmethod
---> 16 def from_toplosses(cls, learn, n_imgs=None, ds_type:DatasetType=DatasetType.Valid, **kwargs):
17 "Formats images with padding for top losses from `learn`, using `ds_type` dataset."
18 dl = learn.dl(ds_type)

NameError: name 'DatasetType' is not defined

How can I look at the particular images that were predicted wrong?

Q: I have no idea how to use an Nvidia GPU to accelerate my training. I can access my graphics card using torch.cuda.current_device() and torch.cuda.set_device(0).
I've also tried defaults.device = torch.device('cuda'), but it's still slow when I run learn.fit_one_cycle(4) to train on my dataset. Can anyone tell me whether fastai detects and uses the GPU automatically? If not, what extra code do I need to use my GPU? Thanks a lot.
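
One way to check where the work is actually happening, sketched in plain PyTorch rather than fastai. The small Linear model is a hypothetical stand-in for a real one; the point is only that the model's parameters and the batch should both report a cuda device if the GPU is in use:

```python
import torch

# Pick the GPU if PyTorch can see one, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('CUDA available:', torch.cuda.is_available())

# Hypothetical stand-in for a real model and a batch of inputs.
model = torch.nn.Linear(10, 2).to(device)
batch = torch.randn(4, 10, device=device)

# Both of these should report the same device as `device`.
print(next(model.parameters()).device)
print(model(batch).device)
```

With a fastai Learner, the same next(...parameters()).device check can be applied to learn.model. If that already reports cuda, the slowness probably comes from somewhere else, for example data loading.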

Hello guys! I was just curious whether widgets even work on Google Colab.
I am running the code, but it always gets stuck at this point: the previous statement runs, but whenever it reaches ImageCleaner it gets stuck.

If anyone has a solution that works on Google Colab, please share.
Thanks

I am not sure it is a Google Colab issue… my run also gets stuck at this step, with the errors below.

In [126]:

ImageCleaner(ds, idxs, path)

TypeError Traceback (most recent call last)
in ()
----> 1 ImageCleaner(ds, idxs, path)

~/.anaconda3/lib/python3.7/site-packages/fastai/widgets/image_cleaner.py in __init__(self, dataset, fns_idxs, batch_size, duplicates, start, end)
92 self._deleted_fns = []
93 self._skipped = 0
---> 94 self.render()
95
96 @classmethod

~/.anaconda3/lib/python3.7/site-packages/fastai/widgets/image_cleaner.py in render(self)
220 self._skipped += 1
221 else:
--> 222 display(self.make_horizontal_box(self.get_widgets(self._duplicates)))
223 display(self.make_button_widget('Next Batch', handler=self.next_batch, style="primary"))

~/.anaconda3/lib/python3.7/site-packages/fastai/widgets/image_cleaner.py in get_widgets(self, duplicates)
180 "Create and format widget set."
181 widgets = []
--> 182 for (img,fp,human_readable_label) in self._all_images[:self._batch_size]:
183 img_widget = self.make_img_widget(img, layout=Layout(height='250px', width='300px'))
184 dropdown = self.make_dropdown_widget(description='', options=self._labels, value=human_readable_label,

TypeError: slice indices must be integers or None or have an __index__ method

I think I found the issue: the definition of ImageCleaner has changed. If you look at the API, it is now
ImageCleaner(dataset, fns_idxs, batch_size: int = 5, duplicates=False, start=0, end=40)

The third argument is batch_size, not path, so path ends up being used as the batch size, which is why the slice fails. In your use case, just change the call to ImageCleaner as below:

ImageCleaner(ds, idxs)

The loss increases (Jeremy covers this in later lessons) simply because, with a higher learning rate, the weights are updated by larger amounts, so they tend to overshoot the minimum and keep oscillating around it instead of approaching it. With a lower learning rate, the weight updates made after backpropagation are smaller and generally move steadily toward the minimum.
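
A toy illustration of that overshooting, using gradient descent on f(w) = w**2 rather than a real network. The function and the two learning rates are chosen only to make the effect visible:

```python
def descend(lr, steps=10, w=1.0):
    """Gradient descent on f(w) = w**2, whose gradient is 2*w."""
    losses = []
    for _ in range(steps):
        w = w - lr * 2 * w      # one update step
        losses.append(w ** 2)   # loss after the step
    return losses


small = descend(lr=0.1)  # each step multiplies w by 0.8: steady approach to 0
large = descend(lr=1.1)  # each step multiplies w by -1.2: overshoots and flips sign

print(small[0], small[-1])
print(large[0], large[-1])
```

With the small learning rate the loss shrinks at every step, while with the large one each step jumps past the minimum to the other side and lands further away, so the loss grows.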

com_d
It's not working.
