Lesson 7 In-Class Discussion

Our links and access to the forums will remain open even though the class is ending, right?

What's gradient explosion?

Are you able to find another corpus to augment the data being supplied to the language model?

Nope. Pretty naive with NLP.

https://www.quora.com/Why-is-it-a-problem-to-have-exploding-gradients-in-a-neural-net-especially-in-an-RNN

2 Likes

Jeremy said we use tanh instead of ReLU to avoid gradient explosion, but with tanh don't we face the exact opposite problem (vanishing gradients)? Basically, if the activation is too high or too low, the gradient of tanh becomes very small and we get stuck in that region.

I suspect that your ratio of unique words to total words is too low for the model to understand the structure of the language in your corpus, so it might be better to use a pretrained embedding (e.g. word2vec) or download a large corpus to train on (e.g. wikipedia).

1 Like
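If you go the pretrained-embedding route, here is a rough sketch of wiring word2vec vectors into a PyTorch embedding layer. The vocabulary list `itos` and the vectors file name are placeholders for your own setup, not anything from the lesson:

```python
import torch
import torch.nn as nn
from gensim.models import KeyedVectors

# load pretrained word2vec vectors (file name is just an example)
wv = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

itos = ['the', 'cat', 'sat']                  # your corpus vocabulary, index -> string
emb = nn.Embedding(len(itos), wv.vector_size)  # randomly initialised embedding layer

with torch.no_grad():
    for i, word in enumerate(itos):
        if word in wv:                         # keep random init for out-of-vocab words
            emb.weight[i] = torch.tensor(wv[word])
```

You can then drop `emb` into the language model, and either fine-tune it or freeze it with `emb.weight.requires_grad_(False)`.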

With tanh, the largest positive value an activation can reach is 1. With ReLU, it could be max(0, a large number), which can lead to an exploding gradient.
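A quick way to see why that boundedness matters, as a rough NumPy sketch (the weight scale, hidden size, and step count are made-up illustration values): repeatedly applying the same recurrent weight matrix with ReLU lets the hidden state grow without limit, while tanh keeps every unit inside (-1, 1).

```python
import numpy as np

np.random.seed(0)
W = np.random.randn(64, 64) * 0.3          # shared "recurrent" weight matrix (illustrative scale)
h_tanh = h_relu = np.random.randn(64)      # same starting hidden state for both

for step in range(30):                     # unroll 30 time steps
    h_tanh = np.tanh(W @ h_tanh)           # tanh squashes every unit into (-1, 1)
    h_relu = np.maximum(0, W @ h_relu)     # ReLU leaves large positive values unbounded

print("tanh hidden norm:", np.linalg.norm(h_tanh))   # stays bounded (each unit is in (-1, 1))
print("ReLU hidden norm:", np.linalg.norm(h_relu))   # typically grows by orders of magnitude
```

The gradients flowing back through those unrolled steps behave the same way, which is the exploding-gradient problem the Quora link above describes.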

Handy blog covering what Jeremy is presenting now: https://distill.pub/
cc: @narvind2003

https://colah.github.io/posts/2015-08-Understanding-LSTMs/

2 Likes

The GRU Cell blogpost:

1 Like

Any tips on fixing a generative model that repeats itself over and over and over…?

3 Likes

This is great! Thanks for the share

Another interesting blog post on char-RNNs by Andrej Karpathy:

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

1 Like

An LSTM or GRU should learn not to repeat itself, compared to a standard RNN cell.

3 Likes

@Moody! FYI

Anyone else having issues with Crestle not loading the new notebooks for this lesson after a git pull?

Output dimension after every convolution layer = (W - K + 2P) / S + 1, where W = input height/width, K = filter (kernel) size, P = padding (P = (K-1)/2 for 'same' padding), S = stride.
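To sanity-check that formula, you can compare it against an actual PyTorch conv layer. The sizes here (W=32, K=3, P=1, S=2) are just example values:

```python
import torch
import torch.nn as nn

W, K, P, S = 32, 3, 1, 2                      # example input size, kernel, padding, stride
out_formula = (W - K + 2 * P) // S + 1        # (W - K + 2P)/S + 1

conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=K, padding=P, stride=S)
x = torch.randn(1, 3, W, W)                   # batch of one 3-channel WxW image
out_actual = conv(x).shape[-1]

print(out_formula, out_actual)                # both print 16
```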

@yinterian What about the division-by-zero problem? Can we add a small number to the denominator?

1 Like

Jeremy asked (last week) to be reminded to talk more about how Dropout is used during training vs. testing in PyTorch.
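In the meantime, a minimal sketch of the behaviour in question: `nn.Dropout` only zeroes activations while the module is in training mode, and because PyTorch uses inverted dropout (survivors are scaled by 1/(1-p) at training time) it becomes a plain identity at test time.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()              # training mode: roughly half the activations are zeroed,
print(drop(x))            # survivors are scaled by 1/(1-p) = 2 (inverted dropout)

drop.eval()               # evaluation mode: dropout is a no-op
print(drop(x))            # prints the input unchanged
```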

Also, could Jeremy give some input (towards the end of the class, maybe) on how to use the model we created inside an application built around it, like a mobile or web app?

1 Like
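On that deployment question, which wasn't covered in the lesson: a common pattern is to save the trained weights and load them behind a small web endpoint that a mobile or web app can call. A rough sketch only; the `nn.Linear` stand-in model, the `model.pt` filename, and the JSON input format are all assumptions to swap for your real architecture and preprocessing.

```python
import torch
import torch.nn as nn
from flask import Flask, request, jsonify

# after training, save the weights once:
#   torch.save(model.state_dict(), 'model.pt')

model = nn.Linear(10, 2)                       # stand-in for your real trained architecture
model.load_state_dict(torch.load('model.pt'))  # reload the trained weights
model.eval()                                   # inference mode: dropout/batchnorm frozen

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    # assume the client POSTs {"features": [ten floats]}
    x = torch.tensor(request.get_json()['features']).float().unsqueeze(0)
    with torch.no_grad():
        y = model(x)
    return jsonify(prediction=y.squeeze(0).tolist())

# start with app.run() (or `flask run`), then have the mobile/web client POST to /predict
```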