Our links and access to the forums will remain open even though the class is ending, right?
What's gradient explosion?
Are you able to find another corpus to augment the data being supplied to the language model?
Nope. Pretty naive with NLP.
Jeremy said we use tanh instead of relu to avoid gradient explosion, but with tanh don't we face the exact opposite problem (vanishing gradients)? Basically, if the activation is too high or too low, the gradient of tanh becomes very small and we get stuck in that region.
I suspect that your ratio of unique words to total words is too low for the model to understand the structure of the language in your corpus, so it might be better to use a pretrained embedding (e.g. word2vec) or download a large corpus to train on (e.g. wikipedia).
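If it helps, here's a rough sketch of copying pretrained word2vec vectors into a PyTorch embedding layer. The GoogleNews file path and the tiny vocab are just placeholders; swap in your own corpus vocabulary:

```python
import torch
import torch.nn as nn
from gensim.models import KeyedVectors

# Placeholder path and vocab, just to show the pattern.
w2v = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin',
                                        binary=True)
vocab = ['the', 'cat', 'sat']

emb = nn.Embedding(len(vocab), w2v.vector_size)
with torch.no_grad():
    for i, word in enumerate(vocab):
        if word in w2v:                       # keep random init for OOV words
            emb.weight[i] = torch.tensor(w2v[word])
emb.weight.requires_grad = False              # optionally freeze the vectors
```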
With tanh, the largest value you can multiply by is 1, since the output is bounded. With relu, the output can be max(0, a large number), and repeatedly multiplying by large numbers leads to an exploding gradient.
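A toy illustration of why the bound matters (not from the lecture; the 1.5 weight scale is arbitrary, just to push the dynamics):

```python
import torch

torch.manual_seed(0)
W = torch.randn(64, 64) * 1.5          # arbitrary scale, for illustration only
x_tanh = torch.randn(64)
x_relu = x_tanh.clone()

# Repeatedly apply the same linear map with each nonlinearity,
# as an RNN does across time steps.
for _ in range(50):
    x_tanh = torch.tanh(W @ x_tanh)    # outputs stay bounded in (-1, 1)
    x_relu = torch.relu(W @ x_relu)    # outputs are unbounded above

print(x_tanh.abs().max())              # always <= 1
print(x_relu.abs().max())              # typically huge (or collapsed to 0)
```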
The GRU Cell blogpost:
Any tips on fixing a generative model that repeats itself over and over and over…?
This is great! Thanks for the share
Another interesting blog post on char-RNNs by Andrej Karpathy:
An LSTM or GRU should learn not to repeat itself, compared to a standard RNN cell.
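For what it's worth, in PyTorch the swap is usually just the recurrent class; a minimal sketch (the sizes are hypothetical placeholders):

```python
import torch.nn as nn

vocab_size, emb_size, hidden_size = 100, 50, 256   # hypothetical sizes

class CharModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_size)
        # self.rnn = nn.RNN(emb_size, hidden_size, batch_first=True)  # plain RNN
        self.rnn = nn.GRU(emb_size, hidden_size, batch_first=True)    # or nn.LSTM
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, h=None):
        x, h = self.rnn(self.emb(x), h)
        return self.out(x), h
```

Sampling from the softmax with a temperature, rather than always taking the argmax, also tends to cut down on repetition.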
Anyone else having issues with Crestle not loading the new notebooks for this lesson after a git pull?
Output dimension after every convolution layer = (W - K + 2P)/S + 1, where W = input height/width, K = filter (kernel) size, P = padding (P = (K-1)/2 keeps the size unchanged), S = stride.
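As a quick sanity check, that formula in code:

```python
def conv_output_size(W, K, P, S):
    """Spatial size after a conv layer: (W - K + 2P) // S + 1."""
    return (W - K + 2 * P) // S + 1

print(conv_output_size(28, 3, 1, 1))  # 28: 3x3 kernel with P=(K-1)/2 keeps size
print(conv_output_size(28, 3, 1, 2))  # 14: stride 2 halves the spatial size
```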
@yinterian What about the division-by-zero problem? Can we add a small number to the denominator?
Jeremy asked us (last week) to remind him to talk more about using dropout during training and testing in PyTorch.
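Until he covers it, here's a minimal sketch of what nn.Dropout does in train vs. eval mode:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 10)

drop.train()      # training mode: zeroes ~half the activations at random
print(drop(x))    # survivors are scaled by 1/(1-p) = 2 (inverted dropout)

drop.eval()       # evaluation mode: dropout becomes a no-op
print(drop(x))    # all ones
```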
Also, could Jeremy give some pointers (towards the end of the class, maybe) on how to use the model we created in an application we want to build around it, like a mobile or web app?
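While we wait for that, a bare-bones sketch of wrapping a trained model in a web endpoint with Flask (the stand-in model, the 'model.pt' path, and the input format are all hypothetical placeholders):

```python
import torch
import torch.nn as nn
from flask import Flask, request, jsonify

app = Flask(__name__)

model = nn.Linear(10, 2)                       # stand-in for your architecture
model.load_state_dict(torch.load('model.pt'))  # hypothetical saved checkpoint
model.eval()                                   # disable dropout etc.

@app.route('/predict', methods=['POST'])
def predict():
    # Expects a JSON body like {"input": [0.1, 0.2, ...]}
    x = torch.tensor(request.get_json()['input'], dtype=torch.float32)
    with torch.no_grad():
        y = model(x)
    return jsonify(prediction=y.tolist())

if __name__ == '__main__':
    app.run()
```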