Lesson 6 In-Class Discussion

I think it’s in torch.nn.functional http://pytorch.org/docs/master/nn.html#torch.nn.functional.softmax

Should we be worried about that being overfitted?

4 Likes

As the operation matrix (yellow line and green line) is reused multiple times, does the “grad” accumulate multiple times during backpropagation?
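For example, a quick self-contained check along these lines (plain PyTorch rather than the lesson’s V() wrapper):

```python
import torch

# one weight matrix reused at two "timesteps"
W = torch.randn(3, 3, requires_grad=True)
x1, x2 = torch.randn(3), torch.randn(3)

h = W @ x1            # first use of W
h = W @ (h + x2)      # second use of W
h.sum().backward()

# W.grad holds the sum of the gradient contributions from both uses:
# autograd accumulates into .grad once per reuse during backprop.
print(W.grad)
```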

2 Likes

@yinterian why use tanh instead of sigmoid?

These are similar functions.
https://brenocon.com/blog/2013/10/tanh-is-a-rescaled-logistic-sigmoid-function/
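A quick numerical check of the identity from that post, tanh(x) = 2*sigmoid(2x) - 1 (plain PyTorch, just for illustration):

```python
import torch

x = torch.linspace(-5, 5, steps=11)
lhs = torch.tanh(x)
rhs = 2 * torch.sigmoid(2 * x) - 1
print(torch.allclose(lhs, rhs, atol=1e-6))  # True: tanh is a rescaled/shifted sigmoid
```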

1 Like

This is well described here: http://cs231n.github.io/neural-networks-1/
Short answer: the tanh output is zero-centered, which makes gradient descent converge more easily.
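A tiny illustration of the zero-centering point (assuming roughly zero-mean inputs):

```python
import torch

x = torch.randn(10_000)            # roughly zero-mean inputs
print(torch.sigmoid(x).mean())     # ~0.5 -> outputs are all positive, not centered
print(torch.tanh(x).mean())        # ~0.0 -> outputs are zero-centered
```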

7 Likes

I guess it depends on the output range one wants: tanh gives you (-1, 1) and sigmoid gives you (0, 1).

2 Likes

@jeremy / @yinterian: Is this PyTorch [-1] indexing to get the last element of the sequence the same as Keras return_sequences=False?

I wonder if there is a fastai equivalent shortcut?
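Roughly what I mean, as a sketch (using nn.RNN rather than the lesson’s manual loop):

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 10, 8)     # batch of 4 sequences, 10 timesteps each

outs, h = rnn(x)              # outs: (4, 10, 16), one output per timestep
last = outs[:, -1]            # (4, 16), only the final timestep --
                              # what Keras return_sequences=False would give you
```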

What do the asterisks mean, e.g., *cs or *V?

They’re used to unpack a tuple/list.
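For example (plain Python, made-up names):

```python
def show(a, b, c):
    print(a, b, c)

cs = [1, 2, 3]
show(*cs)   # the * unpacks the list: equivalent to show(1, 2, 3)
```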

2 Likes

thx, @A_TF57

@jeremy,
Do you think that declaring the h0 variable as self.h0 in the constructor would let the model work on the CPU as well?

Edited:
Hi @jeremy,
First, I want to thank you for your clear explanation of RNNs. This is THE best explanation I have ever heard. The chart diagram is well designed and illuminating.

In class CharSeqRnn, the forward() function uses h = V(torch.zeros(1, bs, n_hidden)), with or without .cuda().
I think a better approach might be to declare self.h0 = V(torch.zeros(1, bs, n_hidden)) in the constructor __init__(), so the model registers the variable and can move its data to the GPU via model.cuda(). In the forward() method, we can then initialize the variable as h = self.h0.clone() for the loop.
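Something like this, for example (a rough sketch of the idea; note that in current PyTorch the tensor needs to be registered, e.g. with register_buffer, for model.cuda() to actually move it, and the layer structure here is just assumed from the notebook):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharSeqRnn(nn.Module):
    def __init__(self, vocab_size, n_fac, n_hidden, bs):
        super().__init__()
        self.e = nn.Embedding(vocab_size, n_fac)
        self.rnn = nn.RNN(n_fac, n_hidden)
        self.l_out = nn.Linear(n_hidden, vocab_size)
        # registered buffer: moved automatically by model.cuda() / model.cpu()
        self.register_buffer('h0', torch.zeros(1, bs, n_hidden))

    def forward(self, *cs):
        inp = self.e(torch.stack(cs))   # (seq_len, bs, n_fac)
        h = self.h0.clone()             # fresh copy of the initial hidden state
        outp, h = self.rnn(inp, h)
        return F.log_softmax(self.l_out(outp), dim=-1)
```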

1 Like

@yinterian are the weights for the hidden states not shared, like with the character input? It sounded like there is a stack of them.

Can you explain further?

1 Like

They are shared. Read the code and you’ll see it.
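For example, the loop model applies the same self.l_in and self.l_hidden layers at every timestep, roughly like this (a simplified sketch, not the exact notebook code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharLoopModel(nn.Module):
    def __init__(self, vocab_size, n_fac, n_hidden):
        super().__init__()
        self.e = nn.Embedding(vocab_size, n_fac)
        self.l_in = nn.Linear(n_fac, n_hidden)          # one input->hidden layer
        self.l_hidden = nn.Linear(n_hidden, n_hidden)   # one hidden->hidden layer
        self.l_out = nn.Linear(n_hidden, vocab_size)

    def forward(self, *cs):
        bs = cs[0].size(0)
        h = torch.zeros(bs, self.l_hidden.in_features, device=cs[0].device)
        for c in cs:
            inp = torch.relu(self.l_in(self.e(c)))
            h = torch.tanh(self.l_hidden(h + inp))      # the same l_hidden (and l_in)
                                                        # weights are reused every step
        return F.log_softmax(self.l_out(h), dim=-1)
```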

Ok, thanks. I’ll spend some time with the code.

this should help:

6 Likes

It’s for variable arguments in Python; you can read more about it here:
https://www.saltycrane.com/blog/2008/01/how-to-use-args-and-kwargs-in-python/
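In short (plain Python, hypothetical example):

```python
def forward_like(*cs, **kwargs):
    # *cs collects the positional arguments into a tuple,
    # **kwargs collects the keyword arguments into a dict
    print(cs, kwargs)

forward_like(1, 2, 3, bs=64)   # -> (1, 2, 3) {'bs': 64}
```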

2 Likes

As Jeremy talks about all the characters, I’m starting to think about revisiting my cryptography courses. ^^

1 Like

Hey, does anyone know where the Nietzsche data is? I couldn’t find it at files.fast.ai.

Thanks