There it is. Thank you.
Some loops may be unavoidable. (-: But yes, you generally want to avoid loops in Python.
I think it’s in torch.nn.functional
http://pytorch.org/docs/master/nn.html#torch.nn.functional.softmax
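For reference, a minimal usage sketch (the sizes and names here are just illustrative):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(2, 5)          # e.g. 2 examples, 5-way output
probs = F.softmax(logits, dim=1)    # softmax across the class dimension
print(probs.sum(dim=1))             # each row sums to 1
```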
As the operation matrix (yellow line and green line) is reused multiple times, does the "grad" accumulate multiple times during backpropagation?
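For anyone who wants to check this themselves, here is a tiny sketch (not from the lesson): autograd sums the gradient contributions of a parameter that is reused at several steps.

```python
import torch
import torch.nn as nn

# One Linear layer reused at two "timesteps"
lin = nn.Linear(3, 3, bias=False)
x1, x2 = torch.randn(1, 3), torch.randn(1, 3)

# Gradient from each use computed separately
lin(x1).sum().backward()
g1 = lin.weight.grad.clone()
lin.weight.grad.zero_()
lin(x2).sum().backward()
g2 = lin.weight.grad.clone()
lin.weight.grad.zero_()

# Backward through both uses in one graph: .grad is the sum of the two
(lin(x1).sum() + lin(x2).sum()).backward()
print(torch.allclose(lin.weight.grad, g1 + g2))  # True
```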
These are similar functions.
https://brenocon.com/blog/2013/10/tanh-is-a-rescaled-logistic-sigmoid-function/
This is well described here: http://cs231n.github.io/neural-networks-1/
Short answer: tanh output is zero-centered, which makes gradient descent converge more easily.
I guess it depends on the output range one wants: tanh gives you (-1, 1) and sigmoid gives you (0, 1).
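A quick check of the relationship from the linked post, i.e. that tanh is just a rescaled, shifted sigmoid:

```python
import torch

x = torch.linspace(-5, 5, steps=11)

# tanh(x) = 2 * sigmoid(2x) - 1, so tanh squashes to (-1, 1) and sigmoid to (0, 1)
print(torch.allclose(torch.tanh(x), 2 * torch.sigmoid(2 * x) - 1))  # True
print(torch.tanh(x).min(), torch.tanh(x).max())        # roughly -1 .. 1
print(torch.sigmoid(x).min(), torch.sigmoid(x).max())  # roughly  0 .. 1
```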
@jeremy / @yinterian: Is this PyTorch [-1] indexing to get the last piece of the sequence list the same as Keras return_sequences=False?
I wonder if there is a fastai equivalent shortcut?
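For context, here is a minimal sketch of what that [-1] indexing is doing with nn.RNN (shapes and sizes are illustrative); Keras with return_sequences=False hands you only that last timestep directly:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=20)   # illustrative sizes
seq = torch.randn(5, 3, 10)                   # (seq_len, batch, features)

outs, h = rnn(seq)   # outs holds one output per timestep: shape (5, 3, 20)
last = outs[-1]      # just the final timestep: shape (3, 20)
```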
What do the asterisks mean, e.g. *cs or *V?
They're used to unpack a tuple/list.
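A toy example (not the lesson code) of packing in a signature vs. unpacking at a call site:

```python
def show(*cs):        # cs becomes a tuple of whatever positional args were passed
    print(cs)

chars = ['a', 'b', 'c']
show(*chars)          # unpacks the list -> show('a', 'b', 'c') -> ('a', 'b', 'c')
show(chars)           # no unpacking -> (['a', 'b', 'c'],)
```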
@jeremy,
Do you think that by declaring the h0 variable as self.h0 in the constructor, the model could work on CPU as well?
Edited:
Hi @jeremy,
First, I want to thank you for your clear explanation of RNN. This is THE best explanation I ever heard. The chart diagram is well-designed and illuminating.
In class CharSeqRnn's forward() function, we have h = V(torch.zeros(1, bs, n_hidden)).cuda(), or without .cuda().
I think a better approach might be to declare self.h0 = V(torch.zeros(1, bs, n_hidden)) in the constructor __init__(), so the model registers the variable and can move its data to the GPU via model.cuda(). In the forward() method, we can then initialize the variable with h = self.h0.clone() for the loop.
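A rough sketch of that idea in plain PyTorch (not the notebook's exact code, and fastai's V() wrapper is dropped here): registering the tensor as a buffer is one way to make model.cuda() and CPU-only use both work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharSeqRnnSketch(nn.Module):          # illustrative, not the notebook's CharSeqRnn
    def __init__(self, vocab_size, n_fac, n_hidden, bs):
        super().__init__()
        self.e = nn.Embedding(vocab_size, n_fac)
        self.rnn = nn.RNN(n_fac, n_hidden)
        self.l_out = nn.Linear(n_hidden, vocab_size)
        # register_buffer makes h0 follow model.cuda() / model.cpu() automatically
        self.register_buffer('h0', torch.zeros(1, bs, n_hidden))

    def forward(self, cs):                  # cs: (seq_len, bs) of character indices
        h = self.h0.clone()                 # fresh copy each call, on the right device
        inp = self.e(cs)
        outp, h = self.rnn(inp, h)
        return F.log_softmax(self.l_out(outp[-1]), dim=-1)
```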
@yinterian, are the weights for the hidden states not shared, like with the character input? It sounded like there is a stack of them.
Can you explain further?
They are shared. Read through the code and you'll see it.
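One quick way to see it (a sketch, not the notebook itself): nn.RNN keeps a single weight_hh for the hidden state and reuses it at every timestep, no matter how long the sequence is.

```python
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=20)
# One set of hidden-state weights for all timesteps
print([name for name, _ in rnn.named_parameters()])
# ['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0']
```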
Ok, thanks. I’ll spend some time with the code.
This should help:
It's for variable arguments in Python; you can read more about it here:
https://www.saltycrane.com/blog/2008/01/how-to-use-args-and-kwargs-in-python/
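A small illustration of the idea from that post: * collects extra positional arguments and ** collects extra keyword arguments.

```python
def report(*args, **kwargs):
    print('positional:', args)    # a tuple
    print('keyword:   ', kwargs)  # a dict

report(1, 2, lr=0.01, bs=64)
# positional: (1, 2)
# keyword:    {'lr': 0.01, 'bs': 64}
```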