Lesson 7 In-Class Discussion


(Jeremy Howard (Admin)) #1

This thread contains the in-class discussion from Lesson 7. The Wiki links have been moved to this new thread. Please ask any questions about lesson 7 in the new wiki thread.


(Vikrant Behal) #2

One last time!


(ecdrid) #3
  • Class Activation Maps.
  • Something on Dropout and PyTorch.
  • Jeremy’s presentation (link)

(Vikrant Behal) #4

The official start date for DL part2 v2 will be March 19.


(Nafiz Hamid) #5

How does the enrollment of part 2 for international fellows happen?


(Vikrant Behal) #6

Same as for part 1 v2.


(ecdrid) #7

Emailing as we did for this one…


(Kevin Bird) #8

Any idea when Part1v2 will be available to the rest of the world?


(Gerardo Garcia) #9

Why tanh instead of ReLU?


(Pete Condon) #10

(Pavel Surmenok) #11

That was about sigmoid vs. tanh.
tanh vs. ReLU is still not clear.


(ecdrid) #12

https://algorithmsdatascience.quora.com/ReLu-compared-against-Sigmoid-Softmax-Tanh


(Pete Condon) #13

The second link is about why tanh works best in this case; it’s a complex question.
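One intuition that comes up in these comparisons: tanh squashes activations into [-1, 1], so a hidden state that is fed back through it repeatedly stays bounded, while ReLU passes large positive values through unchanged. A toy NumPy sketch (hypothetical recurrence, not the course code) makes the difference visible:

```python
import numpy as np

def run_recurrence(activation, steps=50, gain=1.5):
    """Apply h <- activation(W @ h) repeatedly, like an unrolled RNN."""
    h = np.ones(4)           # toy hidden state
    W = gain * np.eye(4)     # recurrent weights with gain > 1
    for _ in range(steps):
        h = activation(W @ h)
    return h

tanh_h = run_recurrence(np.tanh)
relu_h = run_recurrence(lambda z: np.maximum(z, 0.0))

print(np.abs(tanh_h).max())  # bounded: stays <= 1
print(np.abs(relu_h).max())  # unbounded: grows as gain**steps
```

This boundedness is one reason tanh is a common default for RNN hidden states, even though ReLU often trains faster in feed-forward nets.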


(Sam Lloyd) #14

Practical question: what type of visa have you international people used to attend the course? Got to get involved in person for part 2 :slight_smile:


(Even Oldridge) #15

I thought the hidden layers were supposed to be initialized with identity matrices, not zeros… Wasn’t that the conclusion of the Hinton paper brought up last class?


(Pete Condon) #16

I think in the RNN case it makes sense to initialise the hidden state with zeros, because at the first step there is no prior input.


(Pavel Surmenok) #17

Found one paper that mentions IRNN - RNN with ReLU as an activation function. They say it works almost as well as LSTM. https://arxiv.org/pdf/1603.09420.pdf


(Even Oldridge) #18

This is the paper I’m referring to:
https://arxiv.org/abs/1504.00941

Key to our solution is the use of the identity matrix or its scaled version to initialize the recurrent weight matrix.

Is the recurrent weight matrix equivalent to the hidden layer?
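They’re not the same thing: the recurrent weight matrix is the parameter that multiplies the previous hidden state, while the hidden state itself is the activation vector that changes at every time step. A minimal NumPy sketch of an IRNN-style step (per the paper above; shapes and names are illustrative, not the course code) shows both, with the identity init applied to the weights and zeros to the state:

```python
import numpy as np

hidden_size, input_size = 8, 3
rng = np.random.default_rng(0)

W_hh = np.eye(hidden_size)                             # recurrent weight matrix: identity init (IRNN)
W_xh = rng.normal(0, 0.01, (hidden_size, input_size))  # input-to-hidden weights: small random init
b = np.zeros(hidden_size)
h = np.zeros(hidden_size)                              # hidden *state*: starts at zeros

def irnn_step(h, x):
    # ReLU activation, as in the IRNN paper
    return np.maximum(W_hh @ h + W_xh @ x + b, 0.0)

for t in range(5):
    h = irnn_step(h, rng.normal(size=input_size))
```

With W_hh at the identity and ReLU activations, the step initially behaves like h + (input term), so gradients pass through time steps without shrinking — that is the paper’s point about the identity init.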


(Kevin Bird) #19

That explains why when I set bptt to 140 everything blew up… It is all coming together.

So what are the cons of setting it lower? Slower training times?
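The blow-up itself can be sketched numerically: backprop through time multiplies the gradient by the recurrent Jacobian once per step, so its magnitude scales roughly like ||W||^bptt when that norm exceeds 1. This toy NumPy example (assumed setup, not the course code) shows why bptt=140 can explode where a short bptt stays tame; the cost of a short bptt is that credit is never propagated across more than bptt steps:

```python
import numpy as np

def grad_norm_after(bptt, gain=1.05, size=4):
    """Norm of a gradient after backpropagating through `bptt` time steps."""
    W = gain * np.eye(size)   # recurrent Jacobian with norm slightly > 1
    g = np.ones(size)         # gradient arriving at the final step
    for _ in range(bptt):
        g = W.T @ g           # one backprop-through-time step
    return np.linalg.norm(g)

print(grad_norm_after(10))    # modest
print(grad_norm_after(140))   # far larger: each component grows like 1.05**140
```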


#20

Hope you’re referencing the greatest musical of all time ;p