Lesson 6 In-Class Discussion


(Kevin Bird) #82

There it is. Thank you.


(yinterian) #83

Some loops may be unavoidable. (-: But yes, you want to avoid loops in Python.


(Ezequiel) #84

I think it’s in torch.nn.functional http://pytorch.org/docs/master/nn.html#torch.nn.functional.softmax
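For example, a minimal usage sketch (the dim argument follows the current PyTorch docs, not necessarily the lesson notebook):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)        # batch of 4 examples, 10 classes
probs = F.softmax(logits, dim=1)   # normalize across the class dimension
print(probs.sum(dim=1))            # each row now sums to 1
```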


(Hiromi Suenaga) #85

Should we be worried about that being overfitted?


(Yihui Ray Ren) #86

As the operation matrix (yellow line and green line) is reused multiple times, does the “grad” accumulate multiple times during backpropagation?
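A tiny self-contained sketch of the situation I mean (not the lesson code), with one weight matrix appearing twice in the graph:

```python
import torch

W = torch.ones(2, 2, requires_grad=True)   # the reused "operation matrix"
x = torch.ones(2)

h1 = W @ x        # first use of W
h2 = W @ h1       # second use of the same W
h2.sum().backward()

print(W.grad)     # gradient after backward, with W used twice in the graph
```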


(Aymen Ibrahim) #87

@yinterian why use tanh instead of sigmoid?


(yinterian) #88

These are similar functions.
https://brenocon.com/blog/2013/10/tanh-is-a-rescaled-logistic-sigmoid-function/
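A quick numeric check of that identity, tanh(x) = 2 * sigmoid(2x) - 1 (just an illustration, not lesson code):

```python
import torch

x = torch.linspace(-3, 3, steps=7)
lhs = torch.tanh(x)
rhs = 2 * torch.sigmoid(2 * x) - 1
print(torch.allclose(lhs, rhs))   # True: tanh is a rescaled, shifted sigmoid
```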


(Pavel Surmenok) #89

This is well described here: http://cs231n.github.io/neural-networks-1/
Short answer: tanh output is zero-centered, which makes gradient descent converge more easily.


(Yihui Ray Ren) #90

I guess it depends on the output range one wants: tanh gives you (-1, 1) and sigmoid gives you (0, 1).


(Arvind Nagaraj) #91

@jeremy / @yinterian: Is this PyTorch [-1] to get the last piece of the sequence list the same as Keras return_sequences = False?

I wonder if there is a fastai equivalent shortcut?
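To illustrate what I mean with plain PyTorch (names assumed, not the lesson code): nn.RNN returns an output for every timestep, so indexing with [-1] keeps only the last one, roughly like Keras return_sequences=False.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16)   # plain PyTorch RNN
x = torch.randn(5, 4, 8)                     # (seq_len, batch, features)

outputs, h = rnn(x)                          # outputs: (5, 4, 16), one per timestep
last = outputs[-1]                           # (4, 16) -- analogous to return_sequences=False
```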


(Maureen Metzger) #92

What do the asterisks mean, e.g. *cs or *V?


(Ankit Goila) #93

They’re used to unpack a tuple/list.
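For example (a generic Python illustration, not the notebook’s exact code):

```python
def demo(*cs):
    # *cs in a definition collects positional arguments into a tuple
    return len(cs)

chars = [1, 2, 3]
print(demo(*chars))   # * in a call unpacks the list into separate arguments -> 3
```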


(Maureen Metzger) #94

thx, @A_TF57


(Yihui Ray Ren) #95

@jeremy,
Do you think that by declaring the h0 variable as self.h0 in the constructor, the model could work on CPU as well?

Edited:
Hi @jeremy,
First, I want to thank you for your clear explanation of RNN. This is THE best explanation I ever heard. The chart diagram is well-designed and illuminating.

In class CharSeqRnn, forward() uses h = V(torch.zeros(1, bs, n_hidden)).cuda() (or without .cuda()).
I think a better approach might be to declare self.h0 = V(torch.zeros(1, bs, n_hidden)) in the constructor __init__(), so the model registers the variable and can move its data to the GPU via model.cuda(). In the forward() method, we can then initialize the variable with h = self.h0.clone() for the loop.
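A minimal sketch of that idea in plain PyTorch (class and argument names are assumed, and register_buffer stands in for the fastai V() wrapper), so that model.cuda() / model.cpu() moves h0 along with the weights:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharSeqRnnSketch(nn.Module):
    def __init__(self, vocab_size, n_fac, n_hidden, bs):
        super().__init__()
        self.e = nn.Embedding(vocab_size, n_fac)
        self.rnn = nn.RNN(n_fac, n_hidden)
        self.l_out = nn.Linear(n_hidden, vocab_size)
        # Registered as a buffer: model.cuda()/model.cpu() moves it with the model,
        # so forward() needs no explicit .cuda() call.
        self.register_buffer('h0', torch.zeros(1, bs, n_hidden))

    def forward(self, *cs):
        inp = self.e(torch.stack(cs))    # (seq_len, bs, n_fac)
        h = self.h0.clone()              # fresh copy of the initial hidden state
        outp, h = self.rnn(inp, h)
        return F.log_softmax(self.l_out(outp), dim=-1)
```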


(Ken) #96

@yinterian are the weights for the hidden states not shared like with the character input? It sounded like there is a stack of them.


(asaia) #97

Can you explain further?


(yinterian) #98

They are shared. Read the code so that you can see it.
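To illustrate what “shared” means here, a toy sketch (not the lesson’s exact code): the same hidden-to-hidden layer is applied at every step of the loop, so there is one set of weights, not one per timestep.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_fac, n_hidden = 42, 256
l_in = nn.Linear(n_fac, n_hidden)          # input -> hidden
l_hidden = nn.Linear(n_hidden, n_hidden)   # hidden -> hidden, reused every step

def step_through(char_embeddings):
    bs = char_embeddings[0].size(0)
    h = torch.zeros(bs, n_hidden)
    for c in char_embeddings:
        # The same l_in / l_hidden weights are used at every iteration.
        h = torch.tanh(l_hidden(h) + F.relu(l_in(c)))
    return h
```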


(Ken) #99

Ok, thanks. I’ll spend some time with the code.


(Ben Eacrett) #100

this should help:


(Ezequiel) #101

It’s for variable arguments in Python; you can read more about it here:
https://www.saltycrane.com/blog/2008/01/how-to-use-args-and-kwargs-in-python/