Lesson 9 Discussion & Wiki (2019)

1 Like

Saw some weird behavior and couldn’t understand it. I have two similar pieces of code doing a parameter update; I see an error in one and not in the other. Can anyone help me understand why we observe this behavior?

Can you try to replace the following line:

l.weight = l.weight - l.weight.grad*lr

with

l.weight.sub_(l.weight.grad*lr)

It might help.

1 Like

As shown in the screenshots, even
l.weight -= l.weight.grad*lr
seems to work. However, I’m confused about why that would work and not
l.weight = l.weight - l.weight.grad*lr

It doesn’t block me or anything, but it is just weird :slight_smile:
Thanks for the response.

1 Like

weight is an nn.Parameter, so you can only assign an nn.Parameter to it. To handle this properly, either use the data attribute (which is a plain tensor) or do an in-place mutation.
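
Here’s a minimal sketch of the three options (my own illustration, not from the lesson notebooks):

import torch
from torch import nn

l = nn.Linear(4, 2)
l(torch.randn(8, 4)).pow(2).mean().backward()  # populate l.weight.grad
lr = 0.1

with torch.no_grad():
    # 1) in-place mutation keeps the nn.Parameter wrapper intact
    l.weight.sub_(l.weight.grad * lr)
    # 2) .data is a plain tensor, so reassigning it works too
    l.weight.data = l.weight.data - l.weight.grad * lr
    # 3) plain reassignment fails: the right-hand side is a Tensor, not an
    #    nn.Parameter, so nn.Module rejects it with a TypeError
    # l.weight = l.weight - l.weight.grad * lr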

4 Likes

@jeremy, if you’d like to replace the hardcoded cells in each dev_nb, such as:

!./notebook2script.py 02_fully_connected.ipynb

with just:

nb_auto_export()

add this to the first nb export, so that it’s imported everywhere:

from IPython.display import display, Javascript

def nb_auto_export():
    # asks the notebook frontend to run notebook2script.py on the current notebook
    display(Javascript("if (IPython.notebook.kernel) { IPython.notebook.kernel.execute('!./notebook2script.py ' + IPython.notebook.notebook_name) }"))

or, w/o needing to import anything, just use this js magic cell:

%%javascript 
if (IPython.notebook.kernel) {
    IPython.notebook.kernel.execute('!./notebook2script.py ' + IPython.notebook.notebook_name)
}

More details are here.

edit: had to add if (IPython.notebook.kernel) {} or it’d fail on nb load.

1 Like

The simplest real-life analogy to a callback is an emergency phone number. If something happens, the emergency phone number is used to notify someone who cares about the situation or can do something about it. There might be different phone numbers for different situations: if there is a fire, we call a firefighter; if there is a water leak, we call the plumber’s number. The phone number enables notification about the situation, but in any case someone has to make the call and someone has to answer it. This way the caller can make things happen without knowing the details of what will happen as a result of the call.

In our case, think of a function’s name as a phone number. When we start training, the Fast.ai library makes a courtesy call to a function named begin_fit. The boilerplate begin_fit function implemented in the library does nothing at all. If you want to do something special, though, you can implement your own begin_fit and put into its body all the custom code you want executed before fitting begins. Fast.ai makes many ‘courtesy calls’ at different points in the training loop. These courtesy calls enable flexibility and customization of the training loop.
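
In code, the idea looks roughly like this (a minimal sketch; the names mirror the course, but this is not the actual fastai API):

class Callback():
    def begin_fit(self):
        pass  # the boilerplate "courtesy call" does nothing by default

class PrintingCallback(Callback):
    def begin_fit(self):
        print('about to start fitting!')  # custom code to run before fitting

def fit(epochs, cb):
    cb.begin_fit()  # the courtesy call: fit() doesn't know what will happen
    for epoch in range(epochs):
        pass  # training loop body would go here

fit(2, PrintingCallback())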

4 Likes

I will do that! :slight_smile:

2 Likes

I have annotated the notebooks discussed in Lesson 9 with hyperlinks to the corresponding time spots in the YouTube video. They are available in my GitHub repo.

Meanwhile I am trying to create a pull request for the annotated notebooks against the fastai_docs repository. I get an error while syncing with the original repository (see below).

Could someone help me get past this error?

serge@mybox:~/annotations/fastai_docs-fork$ git remote -v

origin	https://github.com/sergeman/fastai_docs.git (fetch)
origin	https://github.com/sergeman/fastai_docs.git (push)
upstream	git@github.com:fastai/fastai_docs.git (fetch)
upstream	git@github.com:fastai/fastai_docs.git (push)

serge@mybox:~/annotations/fastai_docs-fork$ git fetch upstream
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
3 Likes

I get an error while syncing with the original repository (see below).

Please follow this guide: https://docs.fast.ai/dev/git.html#how-to-make-a-pull-request-pr

Most likely it’s because your protocols don’t match: one of them is https, the other ssh (git@). Normally via ssh it looks like:

git remote -v
origin  git@github.com:stas00/pytorch.git (fetch)
origin  git@github.com:stas00/pytorch.git (push)
upstream        git@github.com:pytorch/pytorch.git (fetch)
upstream        git@github.com:pytorch/pytorch.git (push)

I used a pytorch example as I don’t have fastai_docs handy, but you can see the difference.
Or perhaps make both of them https (e.g. via git remote set-url); I haven’t tried it that way.

Is the Jupyter notebook from the lectures available for lesson 9?

1 Like

Thanks @gietema

Please remember, the first post of each lesson thread is always a Wiki, so if you see people asking resource questions, please kindly consider editing the first post and adding a link to the resource. Thank you.

p.s. added a link to the lesson notebooks.

2 Likes

This comment is in the context of “good init” and normalization:

I have just stumbled upon a good article introducing Scaled Exponential Linear Units (SELUs).

Quoting:

The activation function needs both positive and negative values for y to shift the mean. Both options are given here. It is also the reason why ReLU is not a candidate for a self-normalizing activation function since it can not output negative values.

The gradients can be used to adjust the variance. The activation function needs a region with a gradient larger than one to increase it. … A gradient very close to zero can be used to decrease the variance.

The reason I mention it is that we haven’t quite discussed how to fix the mean with ReLU (only positive values) and it appears SELU doesn’t have this issue.

Are we sticking to ReLU because it is still proven to be superior to SELU, or is SELU more difficult to teach with?
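
For a quick empirical check (my own sketch, not from the article), you can push unit-Gaussian data through 50 linear layers with LeCun-style init and watch the activation statistics under ReLU vs. SELU:

import torch

x = torch.randn(512, 256)
for act in (torch.relu, torch.selu):
    h = x.clone()
    for _ in range(50):
        w = torch.randn(256, 256) / 256**0.5  # lecun_normal-style init, which SELU expects
        h = act(h @ w)
    # SELU keeps mean near 0 and std near 1; ReLU's statistics drift layer by layer
    print(act.__name__, h.mean().item(), h.std().item())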

4 Likes

My take on it is that while the ReLU output (say x) will have a positive mean, when the next layer has weights symmetric around 0 (so the same probability density for +w and -w), the activations x*w (or some sum of them) will be zero-mean again. As such, having a mean of 0 isn’t paramount as long as the mean isn’t so large that the variance of x*w explodes (because the variance of the next layer will have a term mean(x)**2 * var(w)).
So as long as you control the mean, having it be exactly zero might not be an end in itself.
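
To spell out where that term comes from (my own derivation, assuming x and w are independent and mean(w) = 0):

var(x*w) = E[(x*w)**2] - E[x*w]**2
         = E[x**2] * E[w**2] - (E[x] * E[w])**2   # independence
         = (var(x) + mean(x)**2) * var(w)         # since mean(w) = 0

so a nonzero mean(x) inflates the next layer’s variance only through the mean(x)**2 term.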

Best regards

Thomas

3 Likes

We did briefly mention it in lesson 1, when we did this:

We’ll discuss it more in the next lesson.

5 Likes

I did not understand what the order of callbacks actually does and how it works. What do we actually order: the classes that inherit from the callback class, or the methods?
And why did we not specify an order in “AvgStatsCallback”? I understand that it automatically gets a 0 (from the parent). What am I misunderstanding?
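
For what it’s worth, here is the mechanism as I understand it from the course notebooks (a sketch from memory, not the exact code): what gets ordered are the callback instances, via a class-level _order attribute, and for each event the runner calls them in that sorted sequence.

class Callback():
    _order = 0  # default priority; AvgStatsCallback simply inherits this 0
    def begin_fit(self): print(f'{type(self).__name__} begin_fit')

class Recorder(Callback):
    _order = 1  # a larger _order means this callback is called later

# sorting instances by their class attribute decides dispatch order per event
for cb in sorted([Recorder(), Callback()], key=lambda cb: cb._order):
    cb.begin_fit()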

I briefly discussed the problems with stuff like SELU and Fixup in lesson 2 - they are extremely sensitive to tiny changes in architecture. I’ll be discussing this a bit more in the next lesson.

3 Likes

Stas, I have been following the guide, but the guide seems to apply only to the fastai, course-v3, and fastprogress repos. I am trying to create a pull request against fastai_docs. Does it matter?