This thread contains the in-class discussion from Lesson 7. The wiki links have been moved to a new thread; please ask any questions about Lesson 7 in that wiki thread.
One last time!
- Class Activation Maps.
- Something on Dropout and PyTorch.
- Jeremy’s presentation (link)
The official start date for DL part2 v2 will be March 19.
How does the enrollment of part 2 for international fellows happen?
Same as for part1 v2.
Emailing as we did for this one…
Any idea when Part1v2 will be available to the rest of the world?
Why tanh instead of ReLU?
This was about sigmoid vs. tanh.
tanh vs. ReLU is still not clear.
The second link discusses why tanh works best in this case; it’s a complex question.
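For some intuition, here is a small numpy sketch (my own, not from the lesson) contrasting the two: tanh’s gradient vanishes for large inputs, while ReLU’s stays at 1 for any positive input; but tanh is bounded in (-1, 1), which keeps an RNN’s hidden state from growing without limit.

```python
import numpy as np

# Derivative of tanh vs. derivative of ReLU at a few points.
x = np.array([-4.0, -1.0, 0.5, 4.0])

tanh_grad = 1 - np.tanh(x) ** 2       # d/dx tanh(x): near zero at +/-4 (saturation)
relu_grad = (x > 0).astype(float)     # d/dx relu(x): exactly 1 for all positive x

print(tanh_grad)
print(relu_grad)

# tanh keeps a repeatedly-updated hidden state bounded,
# even with a recurrent weight larger than 1:
h = 1.0
for _ in range(50):
    h = np.tanh(2.5 * h)
print(abs(h) <= 1.0)  # True
```

Replacing the tanh with `2.5 * h` alone would diverge, which is one reason a bounded activation is the safer default inside a recurrence.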
Practical question: what type of visa have you international people used to attend the course? I’ve got to get involved in person for part 2!
I thought the hidden layers were supposed to be initialized with identity matrices, not 0’s… Wasn’t that the conclusion of the Hinton paper brought up last class?
I think in the RNN case it makes sense to initialise the hidden state with zeros, because there is no input before the first step.
Found one paper that mentions IRNN (an RNN with ReLU as the activation function). They say it works almost as well as an LSTM. https://arxiv.org/pdf/1603.09420.pdf
This is the paper I’m referring to:
https://arxiv.org/abs/1504.00941
“Key to our solution is the use of the identity matrix or its scaled version to initialize the recurrent weight matrix.”
Is the recurrent weight matrix equivalent to the hidden layer?
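They’re different objects: the recurrent weight matrix (often written W_hh) is the matrix that maps the previous hidden state to the next one; the hidden state is the activation vector it multiplies. A tiny numpy sketch of the identity-init idea from the paper (my reading of it, not official code): with W_hh = I, zero bias, and ReLU, a non-negative hidden state is simply copied forward, so it neither shrinks nor explodes over many steps.

```python
import numpy as np

hidden = 4
W_hh = np.eye(hidden)          # identity initialization of the recurrent weights
b = np.zeros(hidden)           # zero bias

def relu(v):
    return np.maximum(v, 0)

h = np.array([0.5, 1.0, 0.0, 2.0])   # some non-negative hidden state
h0 = h.copy()

for _ in range(100):           # 100 time steps with no input
    h = relu(W_hh @ h + b)

print(np.allclose(h, h0))  # True: the state is preserved across time
```

With a random init instead of the identity, the same loop would typically decay to zero or blow up, which is the problem the paper’s trick addresses.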
That explains why when I set bptt to 140 everything blew up… It is all coming together.
So what are the cons of setting it lower? Slower training times?
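Mostly, yes, plus the model can’t assign credit across dependencies longer than the truncation window. A toy numpy illustration (my own, not from the course) of why a longer bptt can blow up: backprop through T steps multiplies the gradient by the recurrent matrix T times, so any eigenvalue above 1 grows exponentially with T, and any below 1 vanishes.

```python
import numpy as np

W = np.array([[1.1, 0.0],
              [0.0, 0.9]])    # eigenvalues slightly above and below 1

for T in (10, 70, 140):
    grad = np.ones(2)
    for _ in range(T):        # T backprop-through-time steps
        grad = W @ grad
    print(T, grad)            # first component explodes, second vanishes
```

Truncating bptt caps T, trading longer-range learning for numerical stability; gradient clipping is the other common mitigation.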
Hope you’re referencing the greatest musical of all time ;p