Lesson 6 In-Class Discussion

(Kevin Bird) #82

There it is. Thank you.

(yinterian) #83

Some loops may be unavoidable. (-: But yes, you want to avoid loops in Python.

(Ezequiel) #84

I think it’s in torch.nn.functional http://pytorch.org/docs/master/nn.html#torch.nn.functional.softmax

(Hiromi Suenaga) #85

Should we be worried about that being overfitted?

(Yihui Ray Ren) #86

Since the operation matrix (yellow line and green line) is reused multiple times, does the “grad” accumulate multiple times during backpropagation?
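For reference, PyTorch does accumulate gradients when a weight is reused: each use contributes a term that autograd sums during the backward pass. A minimal sketch (toy values, not the lesson's model):

```python
import torch

# Toy example (values made up): one scalar weight reused twice,
# like a weight matrix applied at every timestep of an RNN.
w = torch.tensor(1.0, requires_grad=True)
loss = w * 2.0 + w * 3.0   # the same w appears twice in the graph
loss.backward()
# Autograd sums the contribution from each use: dloss/dw = 2 + 3
print(w.grad.item())       # 5.0
```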

(Aymen Ibrahim) #87

@yinterian why use tanh instead of sigmoid?

(yinterian) #88

These are similar functions.

Lesson 7 In-Class Discussion
(Pavel Surmenok) #89

This is well described here: http://cs231n.github.io/neural-networks-1/
Short answer: tanh output is zero-centered, which makes gradient descent converge more easily.
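The two are related by a simple rescaling, tanh(x) = 2*sigmoid(2x) - 1, so they squash inputs the same way but tanh is centered at 0. A quick check in plain Python (sigmoid defined by hand):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# tanh is a rescaled sigmoid: tanh(x) = 2*sigmoid(2x) - 1
for x in (-2.0, 0.0, 2.0):
    assert abs(math.tanh(x) - (2 * sigmoid(2 * x) - 1)) < 1e-12

print(math.tanh(0.0), sigmoid(0.0))   # tanh is centered at 0, sigmoid at 0.5
```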

(Yihui Ray Ren) #90

I guess it depends on the output range you want. tanh gives you (-1, 1) and sigmoid gives you (0, 1).

(Arvind Nagaraj) #91

@jeremy / @yinterian: Is this PyTorch [-1] to get the last piece of the sequence list the same as Keras return_sequences = False?

I wonder if there is a fastai equivalent shortcut?
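Roughly, yes: nn.RNN returns the output for every timestep, so indexing with [-1] keeps only the last one, which matches what Keras return_sequences=False does. A small sketch with assumed sizes:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=3, hidden_size=5)   # sizes assumed for illustration
x = torch.randn(7, 2, 3)                    # (seq_len, batch, features)
outputs, h = rnn(x)                         # outputs has one vector per timestep
last = outputs[-1]                          # like Keras return_sequences=False
print(outputs.shape, last.shape)
```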

(Maureen Metzger) #92

What do the asterisks mean, e.g., *cs or *V?

(Ankit Goila) #93

They're used to unpack a tuple/list.
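A quick illustration of the star operator (toy function, not from the lesson):

```python
def weighted(a, b, c):
    return a + 2 * b + 3 * c

args = [10, 20, 30]
# *args unpacks the list into positional arguments: weighted(10, 20, 30)
print(weighted(*args))   # 140
```

In the lesson code, *cs works the same way: each element of cs becomes a separate positional argument.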

(Maureen Metzger) #94

thx, @A_TF57

(Yihui Ray Ren) #95

Do you think that by declaring the h0 variable as self.h0 in the constructor, the model could work on CPU as well?

Hi @jeremy,
First, I want to thank you for your clear explanation of RNNs. This is THE best explanation I have ever heard. The chart diagram is well designed and illuminating.

In class CharSeqRnn, the forward() function has h = V(torch.zeros(1, bs, n_hidden)).cuda(), with or without .cuda().
I think a better approach might be to declare self.h0 = V(torch.zeros(1, bs, n_hidden)) in the constructor __init__(), so the model registers the variable and can move its data to the GPU via model.cuda(). In the forward() method, we can then initialize the variable with h = self.h0.clone() for the loop.
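A sketch of this idea in modern PyTorch (names and sizes assumed, not the lesson's exact class). One caveat: a plain self.h0 attribute is not moved by model.cuda(); registering it as a buffer makes it part of the module's state, so it follows the model between devices.

```python
import torch
import torch.nn as nn

class CharSeqRnn(nn.Module):
    def __init__(self, n_hidden, bs):
        super().__init__()
        # register_buffer (rather than a plain attribute) makes h0 part of
        # the module's state, so model.cuda() / model.cpu() move it too.
        self.register_buffer('h0', torch.zeros(1, bs, n_hidden))
        self.rnn = nn.RNN(n_hidden, n_hidden)

    def forward(self, x):
        h = self.h0.clone()      # fresh copy each call; device follows the model
        outputs, h = self.rnn(x, h)
        return outputs[-1]       # last timestep only

m = CharSeqRnn(n_hidden=4, bs=2)
out = m(torch.randn(5, 2, 4))    # input: (seq_len, bs, n_hidden)
print(out.shape)                 # torch.Size([2, 4])
```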

(Ken) #96

@yinterian are the weights for the hidden states not shared like with the character input? It sounded like there are a stack of them.

(asaia) #97

Can you explain further?

(yinterian) #98

They are shared. Read the code and you will see it.

(Ken) #99

Ok, thanks. I’ll spend some time with the code.

(Ben Eacrett) #100

This should help:

(Ezequiel) #101

It’s for variable arguments in Python; you can read more about it in: