Lesson 7 In-Class Discussion

This thread contains the in-class discussion from Lesson 7. The wiki links have been moved to a separate thread; please ask any questions about Lesson 7 in the new wiki thread.

8 Likes

One last time!

1 Like
  • Class Activation Maps.
  • Something on Dropout and PyTorch.
  • Jeremy’s presentation (link)
1 Like

The official start date for DL part2 v2 will be March 19.

14 Likes

How does enrollment in Part 2 work for international fellows?

Same as for Part 1 v2.

1 Like

By emailing, as we did for this one…

1 Like

Any idea when Part1v2 will be available to the rest of the world?

1 Like

Why tanh instead of ReLU?

1 Like

This was about sigmoid vs. tanh.
tanh vs. ReLU is still not clear.

https://algorithmsdatascience.quora.com/ReLu-compared-against-Sigmoid-Softmax-Tanh

The second link is about why tanh works best in this case; it’s a complex question.
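
For context, a quick sketch of the three activations being compared here, in plain PyTorch (illustrative only, not from the lesson notebook): sigmoid squashes to (0, 1), tanh squashes to (-1, 1) and is zero-centered, while ReLU is unbounded above, which matters when the same function is applied at every recurrent step.

```python
import torch

x = torch.linspace(-3, 3, 7)

# sigmoid: output in (0, 1), not zero-centered
print(torch.sigmoid(x))

# tanh: output in (-1, 1), zero-centered and bounded, so repeated
# application inside an RNN keeps the hidden state in a fixed range
print(torch.tanh(x))

# ReLU: output in [0, inf); repeated application across recurrent
# steps can let activations grow without bound
print(torch.relu(x))
```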

Practical question: what type of visa have you international people used to attend the course? Got to get involved in person for part 2 :)

I thought the hidden layers were supposed to be initialized with identity matrices, not 0’s… Wasn’t that the conclusion of the Hinton paper brought up last class?

I think in the RNN case it makes sense to initialise the hidden state with zeros, because there is no input before the first step.

Found one paper that mentions IRNN, an RNN with ReLU as the activation function. They say it works almost as well as an LSTM. https://arxiv.org/pdf/1603.09420.pdf

1 Like

This is the paper I’m referring to:
https://arxiv.org/abs/1504.00941

Key to our solution is the use of the identity matrix or its scaled version to initialize the recurrent weight matrix.

Is the recurrent weight matrix equivalent to the hidden layer?
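
They’re related but not the same thing: the hidden state is the activation vector carried from one time step to the next (that’s what starts at zero for each sequence), while the recurrent weight matrix is the learned hidden-to-hidden parameter, and that’s what the paper initialises to the identity. A minimal PyTorch sketch of the distinction, assuming a single-layer ReLU RNN with made-up sizes (64-dim inputs, 256 hidden units):

```python
import torch
from torch import nn

n_hidden = 256

# A plain RNN with ReLU activations, roughly the IRNN setup from the paper
rnn = nn.RNN(input_size=64, hidden_size=n_hidden,
             nonlinearity='relu', batch_first=True)

# The recurrent (hidden-to-hidden) weight matrix is a learned parameter;
# the paper initialises it to the identity (or a scaled identity).
with torch.no_grad():
    rnn.weight_hh_l0.copy_(torch.eye(n_hidden))
    rnn.bias_hh_l0.zero_()

# The hidden *state* is not a parameter: it is the activation carried
# between time steps, and it is what starts at zero for each sequence.
x = torch.randn(32, 10, 64)           # 32 sequences, 10 steps, 64 features
h0 = torch.zeros(1, 32, n_hidden)     # initial hidden state, all zeros
out, hn = rnn(x, h0)
```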

That explains why when I set bptt to 140 everything blew up… It is all coming together.

So what are the cons of setting it lower? Slower training times?
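
Roughly, yes, plus a modelling cost: bptt is how many time steps each backward pass unrolls through. A lower value gives cheaper, more stable updates, but gradients can’t reach dependencies longer than that window; a higher value uses more memory and, with an unbounded activation, makes it easier for things to blow up over many steps. A rough sketch of the truncated-BPTT pattern in plain PyTorch (hypothetical sizes and a placeholder loss, not the lesson’s code):

```python
import torch
from torch import nn

bptt = 70                      # length of each training chunk
rnn = nn.RNN(input_size=64, hidden_size=256, batch_first=True)
opt = torch.optim.SGD(rnn.parameters(), lr=1e-3)

def train_chunks(data, h):
    """data: (batch, long_seq, 64); gradients flow at most `bptt` steps back."""
    for i in range(0, data.size(1) - bptt + 1, bptt):
        out, h = rnn(data[:, i:i + bptt], h)
        loss = out.pow(2).mean()    # placeholder loss, just for the sketch
        opt.zero_grad()
        loss.backward()
        opt.step()
        # Detach so the next chunk reuses the hidden state's value
        # without backpropagating through earlier chunks.
        h = h.detach()
    return h

h = train_chunks(torch.randn(32, 210, 64), torch.zeros(1, 32, 256))
```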

Hope you’re referencing the greatest musical of all time ;p