Lesson 7 In-Class Discussion

Our links and access to the forums will remain open even though the class is ending, right?

What's gradient explosion?

Are you able to find another corpus to augment the data being supplied to the language model?

Nope. Pretty naive with NLP.

https://www.quora.com/Why-is-it-a-problem-to-have-exploding-gradients-in-a-neural-net-especially-in-an-RNN

2 Likes

Jeremy said we use tanh instead of ReLU to avoid gradient explosion, but with tanh don't we face the exact opposite problem (vanishing gradients)? Basically, if the activation is too high or too low, the gradient of tanh becomes very small and we get stuck in that region.

I suspect that your ratio of unique words to total words is too low for the model to understand the structure of the language in your corpus, so it might be better to use a pretrained embedding (e.g. word2vec) or download a large corpus to train on (e.g. wikipedia).

1 Like
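If you go the pretrained-embedding route, here is a rough sketch of wiring word2vec vectors into a PyTorch embedding layer. The vocabulary list `itos` and the vectors file name are placeholders for your own setup, not anything from the lesson:

```python
import torch
import torch.nn as nn
from gensim.models import KeyedVectors

# load pretrained word2vec vectors (file name is just an example)
wv = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

itos = ['the', 'cat', 'sat']                  # your corpus vocabulary, index -> string
emb = nn.Embedding(len(itos), wv.vector_size)  # randomly initialised embedding layer

with torch.no_grad():
    for i, word in enumerate(itos):
        if word in wv:                         # keep random init for out-of-vocab words
            emb.weight[i] = torch.tensor(wv[word])
```

You can then drop `emb` into the language model, and either fine-tune it or freeze it with `emb.weight.requires_grad_(False)`.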

With tanh, the largest positive value an activation can reach is 1. With ReLU, it could be max(0, a large number), which can lead to an exploding gradient.
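A quick way to see why that boundedness matters, as a rough NumPy sketch (the weight scale, hidden size, and step count are made-up illustration values): repeatedly applying the same recurrent weight matrix with ReLU lets the hidden state grow without limit, while tanh keeps every unit inside (-1, 1).

```python
import numpy as np

np.random.seed(0)
W = np.random.randn(64, 64) * 0.3          # shared "recurrent" weight matrix (illustrative scale)
h_tanh = h_relu = np.random.randn(64)      # same starting hidden state for both

for step in range(30):                     # unroll 30 time steps
    h_tanh = np.tanh(W @ h_tanh)           # tanh squashes every unit into (-1, 1)
    h_relu = np.maximum(0, W @ h_relu)     # ReLU leaves large positive values unbounded

print("tanh hidden norm:", np.linalg.norm(h_tanh))   # stays bounded (each unit is in (-1, 1))
print("ReLU hidden norm:", np.linalg.norm(h_relu))   # typically grows by orders of magnitude
```

The gradients flowing back through those unrolled steps behave the same way, which is the exploding-gradient problem the Quora link above describes.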

Handy blog covering what Jeremy is presenting now: https://distill.pub/
cc: @narvind2003

https://colah.github.io/posts/2015-08-Understanding-LSTMs/

2 Likes

The GRU Cell blogpost:

1 Like

Any tips on fixing a generative model that repeats itself over and over and over…?

3 Likes

This is great! Thanks for the share

Another interesting blog post on char-RNNs by Andrej Karpathy:

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

1 Like

An LSTM or GRU should learn not to repeat itself, compared to a standard RNN cell.

3 Likes

@Moody! FYI

Anyone else having issues with Crestle not loading the new notebooks for this lesson after a git pull?

Output dimension after every convolution layer = (W - K + 2P) / S + 1, where W = input height/width, K = filter (kernel) size, P = padding (P = (K-1)/2 for 'same' padding), S = stride.
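To sanity-check that formula, you can compare it against an actual PyTorch conv layer. The sizes here (W=32, K=3, P=1, S=2) are just example values:

```python
import torch
import torch.nn as nn

W, K, P, S = 32, 3, 1, 2                      # example input size, kernel, padding, stride
out_formula = (W - K + 2 * P) // S + 1        # (W - K + 2P)/S + 1

conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=K, padding=P, stride=S)
x = torch.randn(1, 3, W, W)                   # batch of one 3-channel WxW image
out_actual = conv(x).shape[-1]

print(out_formula, out_actual)                # both print 16
```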

@yinterian What about the division-by-zero problem? Can we add a small number to the denominator?

1 Like

Jeremy asked (last week) to be reminded to talk more about how Dropout is used during training vs. testing in PyTorch.
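In the meantime, a minimal sketch of the behaviour in question: `nn.Dropout` only zeroes activations while the module is in training mode, and because PyTorch uses inverted dropout (survivors are scaled by 1/(1-p) at training time) it becomes a plain identity at test time.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()              # training mode: roughly half the activations are zeroed,
print(drop(x))            # survivors are scaled by 1/(1-p) = 2 (inverted dropout)

drop.eval()               # evaluation mode: dropout is a no-op
print(drop(x))            # prints the input unchanged
```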

Also, could Jeremy give some input (towards the end of the class, maybe) on how to use the model we created inside an application built around it, like a mobile or web app?

1 Like
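On that deployment question, which wasn't covered in the lesson: a common pattern is to save the trained weights and load them behind a small web endpoint that a mobile or web app can call. A rough sketch only; the `nn.Linear` stand-in model, the `model.pt` filename, and the JSON input format are all assumptions to swap for your real architecture and preprocessing.

```python
import torch
import torch.nn as nn
from flask import Flask, request, jsonify

# after training, save the weights once:
#   torch.save(model.state_dict(), 'model.pt')

model = nn.Linear(10, 2)                       # stand-in for your real trained architecture
model.load_state_dict(torch.load('model.pt'))  # reload the trained weights
model.eval()                                   # inference mode: dropout/batchnorm frozen

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    # assume the client POSTs {"features": [ten floats]}
    x = torch.tensor(request.get_json()['features']).float().unsqueeze(0)
    with torch.no_grad():
        y = model(x)
    return jsonify(prediction=y.squeeze(0).tolist())

# start with app.run() (or `flask run`), then have the mobile/web client POST to /predict
```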