Lesson 7 In-Class Discussion

This thread contains the in-class discussion from Lesson 7. The wiki links have been moved to a separate thread; please ask any questions about Lesson 7 in the new wiki thread.

8 Likes

One last time!

1 Like
  • Class Activation Maps.
  • Something on Dropout and PyTorch.
  • Jeremy’s presentation (link)
1 Like

The official start date for DL part2 v2 will be March 19.

14 Likes

How does enrollment in Part 2 work for international fellows?

Same as for Part 1 v2.

1 Like

By emailing, as we did for this one…

1 Like

Any idea when Part1v2 will be available to the rest of the world?

1 Like

Why tanh instead of ReLU?

1 Like

This was about sigmoid vs. tanh.
tanh vs. ReLU is still not clear.

https://algorithmsdatascience.quora.com/ReLu-compared-against-Sigmoid-Softmax-Tanh

The second link is about why tanh works best in this case; it’s a complex question.
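
For context, a quick sketch of the three activations being compared here, in plain PyTorch (illustrative only, not from the lesson notebook): sigmoid squashes to (0, 1), tanh squashes to (-1, 1) and is zero-centered, while ReLU is unbounded above, which matters when the same function is applied at every recurrent step.

```python
import torch

x = torch.linspace(-3, 3, 7)

# sigmoid: output in (0, 1), not zero-centered
print(torch.sigmoid(x))

# tanh: output in (-1, 1), zero-centered and bounded, so repeated
# application inside an RNN keeps the hidden state in a fixed range
print(torch.tanh(x))

# ReLU: output in [0, inf); repeated application across recurrent
# steps can let activations grow without bound
print(torch.relu(x))
```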

Practical question: what type of visa have you international people used to attend the course? Got to get involved in person for part 2 :)

I thought the hidden layers were supposed to be initialized with identity matrices, not 0’s… Wasn’t that the conclusion of the Hinton paper brought up last class?

I think in the RNN case it makes sense to initialise the hidden state with zeros, because there is no input before the first step.

Found one paper that mentions IRNN, an RNN with ReLU as the activation function. They say it works almost as well as an LSTM. https://arxiv.org/pdf/1603.09420.pdf

1 Like

This is the paper I’m referring to:
https://arxiv.org/abs/1504.00941

Key to our solution is the use of the identity matrix or its scaled version to initialize the recurrent weight matrix.

Is the recurrent weight matrix equivalent to the hidden layer?
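
They’re related but not the same thing: the hidden state is the activation vector carried from one time step to the next (that’s what starts at zero for each sequence), while the recurrent weight matrix is the learned hidden-to-hidden parameter, and that’s what the paper initialises to the identity. A minimal PyTorch sketch of the distinction, assuming a single-layer ReLU RNN with made-up sizes (64-dim inputs, 256 hidden units):

```python
import torch
from torch import nn

n_hidden = 256

# A plain RNN with ReLU activations, roughly the IRNN setup from the paper
rnn = nn.RNN(input_size=64, hidden_size=n_hidden,
             nonlinearity='relu', batch_first=True)

# The recurrent (hidden-to-hidden) weight matrix is a learned parameter;
# the paper initialises it to the identity (or a scaled identity).
with torch.no_grad():
    rnn.weight_hh_l0.copy_(torch.eye(n_hidden))
    rnn.bias_hh_l0.zero_()

# The hidden *state* is not a parameter: it is the activation carried
# between time steps, and it is what starts at zero for each sequence.
x = torch.randn(32, 10, 64)           # 32 sequences, 10 steps, 64 features
h0 = torch.zeros(1, 32, n_hidden)     # initial hidden state, all zeros
out, hn = rnn(x, h0)
```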

That explains why when I set bptt to 140 everything blew up… It is all coming together.

So what are the cons of setting it lower? Slower training times?
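
Roughly, yes, plus a modelling cost: bptt is how many time steps each backward pass unrolls through. A lower value gives cheaper, more stable updates, but gradients can’t reach dependencies longer than that window; a higher value uses more memory and, with an unbounded activation, makes it easier for things to blow up over many steps. A rough sketch of the truncated-BPTT pattern in plain PyTorch (hypothetical sizes and a placeholder loss, not the lesson’s code):

```python
import torch
from torch import nn

bptt = 70                      # length of each training chunk
rnn = nn.RNN(input_size=64, hidden_size=256, batch_first=True)
opt = torch.optim.SGD(rnn.parameters(), lr=1e-3)

def train_chunks(data, h):
    """data: (batch, long_seq, 64); gradients flow at most `bptt` steps back."""
    for i in range(0, data.size(1) - bptt + 1, bptt):
        out, h = rnn(data[:, i:i + bptt], h)
        loss = out.pow(2).mean()    # placeholder loss, just for the sketch
        opt.zero_grad()
        loss.backward()
        opt.step()
        # Detach so the next chunk reuses the hidden state's value
        # without backpropagating through earlier chunks.
        h = h.detach()
    return h

h = train_chunks(torch.randn(32, 210, 64), torch.zeros(1, 32, 256))
```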

Hope you’re referencing the greatest musical of all time ;p