Lesson 6 In-Class Discussion


(Kevin Bird) #82

There it is. Thank you.


(yinterian) #83

Some loops may be unavoidable. (-: But yes, you want to avoid loops in Python.


(Ezequiel) #84

I think it’s in torch.nn.functional http://pytorch.org/docs/master/nn.html#torch.nn.functional.softmax
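For example, a minimal usage sketch (the dim argument follows the current PyTorch docs, not necessarily the lesson notebook):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)        # batch of 4 examples, 10 classes
probs = F.softmax(logits, dim=1)   # normalize across the class dimension
print(probs.sum(dim=1))            # each row now sums to 1
```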


(Hiromi Suenaga) #85

Should we be worried about that being overfitted?


(Yihui Ray Ren) #86

As the operation matrix (yellow line and green line) is reused multiple times, does the “grad” accumulate multiple times during backpropagation?
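A tiny self-contained sketch of the situation I mean (not the lesson code), with one weight matrix appearing twice in the graph:

```python
import torch

W = torch.ones(2, 2, requires_grad=True)   # the reused "operation matrix"
x = torch.ones(2)

h1 = W @ x        # first use of W
h2 = W @ h1       # second use of the same W
h2.sum().backward()

print(W.grad)     # gradient after backward, with W used twice in the graph
```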


(Aymen Ibrahim) #87

@yinterian why use tanh instead of sigmoid?


(yinterian) #88

These are similar functions.
https://brenocon.com/blog/2013/10/tanh-is-a-rescaled-logistic-sigmoid-function/
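A quick numeric check of that identity, tanh(x) = 2 * sigmoid(2x) - 1 (just an illustration, not lesson code):

```python
import torch

x = torch.linspace(-3, 3, steps=7)
lhs = torch.tanh(x)
rhs = 2 * torch.sigmoid(2 * x) - 1
print(torch.allclose(lhs, rhs))   # True: tanh is a rescaled, shifted sigmoid
```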


(Pavel Surmenok) #89

This is well described here: http://cs231n.github.io/neural-networks-1/
Short answer: tanh output is zero-centered, which makes gradient descent converge more easily.


(Yihui Ray Ren) #90

I guess it depends on the output range one wants: tanh gives you (-1, 1) and sigmoid gives you (0, 1).


(Arvind Nagaraj) #91

@jeremy / @yinterian: Is this PyTorch [-1] to get the last piece of the sequence list the same as Keras return_sequences = False?

I wonder if there is a fastai equivalent shortcut?
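To illustrate what I mean with plain PyTorch (names assumed, not the lesson code): nn.RNN returns an output for every timestep, so indexing with [-1] keeps only the last one, roughly like Keras return_sequences=False.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16)   # plain PyTorch RNN
x = torch.randn(5, 4, 8)                     # (seq_len, batch, features)

outputs, h = rnn(x)                          # outputs: (5, 4, 16), one per timestep
last = outputs[-1]                           # (4, 16) -- analogous to return_sequences=False
```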


(Maureen Metzger) #92

What do the asterisks mean, e.g. *cs or *V?


(Ankit Goila) #93

They’re used to unpack a tuple/list.
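For example (a generic Python illustration, not the notebook’s exact code):

```python
def demo(*cs):
    # *cs in a definition collects positional arguments into a tuple
    return len(cs)

chars = [1, 2, 3]
print(demo(*chars))   # * in a call unpacks the list into separate arguments -> 3
```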


(Maureen Metzger) #94

thx, @A_TF57


(Yihui Ray Ren) #95

@jeremy,
Do you think that by declaring the h0 variable as self.h0 in the constructor, the model could work on CPU as well?

Edited:
Hi @jeremy,
First, I want to thank you for your clear explanation of RNN. This is THE best explanation I ever heard. The chart diagram is well-designed and illuminating.

In class CharSeqRnn, forward() uses h = V(torch.zeros(1, bs, n_hidden)).cuda() (or without .cuda()).
I think a better approach might be to declare self.h0 = V(torch.zeros(1, bs, n_hidden)) in the constructor __init__(), so the model registers the variable and can move its data to the GPU via model.cuda(). In the forward() method, we can then initialize the variable with h = self.h0.clone() for the loop.
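A minimal sketch of that idea in plain PyTorch (class and argument names are assumed, and register_buffer stands in for the fastai V() wrapper), so that model.cuda() / model.cpu() moves h0 along with the weights:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharSeqRnnSketch(nn.Module):
    def __init__(self, vocab_size, n_fac, n_hidden, bs):
        super().__init__()
        self.e = nn.Embedding(vocab_size, n_fac)
        self.rnn = nn.RNN(n_fac, n_hidden)
        self.l_out = nn.Linear(n_hidden, vocab_size)
        # Registered as a buffer: model.cuda()/model.cpu() moves it with the model,
        # so forward() needs no explicit .cuda() call.
        self.register_buffer('h0', torch.zeros(1, bs, n_hidden))

    def forward(self, *cs):
        inp = self.e(torch.stack(cs))    # (seq_len, bs, n_fac)
        h = self.h0.clone()              # fresh copy of the initial hidden state
        outp, h = self.rnn(inp, h)
        return F.log_softmax(self.l_out(outp), dim=-1)
```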


(Ken) #96

@yinterian are the weights for the hidden states not shared like with the character input? It sounded like there is a stack of them.


(asaia) #97

Can you explain further?


(yinterian) #98

They are shared. Read the code so that you can see it.
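To illustrate what “shared” means here, a toy sketch (not the lesson’s exact code): the same hidden-to-hidden layer is applied at every step of the loop, so there is one set of weights, not one per timestep.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_fac, n_hidden = 42, 256
l_in = nn.Linear(n_fac, n_hidden)          # input -> hidden
l_hidden = nn.Linear(n_hidden, n_hidden)   # hidden -> hidden, reused every step

def step_through(char_embeddings):
    bs = char_embeddings[0].size(0)
    h = torch.zeros(bs, n_hidden)
    for c in char_embeddings:
        # The same l_in / l_hidden weights are used at every iteration.
        h = torch.tanh(l_hidden(h) + F.relu(l_in(c)))
    return h
```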


(Ken) #99

Ok, thanks. I’ll spend some time with the code.


(Ben Eacrett) #100

this should help:


(Ezequiel) #101

It’s for variable arguments in Python; you can read more about it here:
https://www.saltycrane.com/blog/2008/01/how-to-use-args-and-kwargs-in-python/