You're not doing backprop at inference time, so you can run a larger batch size.
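A minimal PyTorch sketch of why that's true: under `torch.no_grad()` no activations are saved for the backward pass, so memory per example drops and a bigger batch fits (the model and sizes here are just placeholders).

```python
import torch
import torch.nn as nn

model = nn.Linear(100, 10)
x = torch.randn(512, 100)  # a batch this large may not fit if we also had to backprop

# No gradient bookkeeping: activations are not retained for a backward pass
with torch.no_grad():
    out = model(x)
```

The output tensor has `requires_grad=False`, confirming no graph was built.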
For non-editable installs, the project is built locally in a temp dir and then installed normally. Note that if a satisfactory version of the package is already installed, the VCS source will not overwrite it without an --upgrade flag. VCS requirements pin the package version (specified in the setup.py file) of the target commit, not necessarily the commit itself.
https://pip.pypa.io/en/stable/reference/pip_install/#vcs-support
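For illustration, a typical VCS requirement looks like this (the repo URL and tag here are just examples, not from the thread):

```shell
# Install from a VCS source, pinned to a tag or commit;
# the #egg= fragment names the package being installed
pip install "git+https://github.com/fastai/fastai.git@1.0.61#egg=fastai"
```

Per the note above, if a satisfactory version is already installed, re-running this does nothing unless you pass `--upgrade`.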
Nothing prevents us from using different embedding sizes for encoder and decoder. It’s not a constraint of the architecture.
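A quick sketch of that point, with made-up vocabulary and hidden sizes: the encoder and decoder embeddings only feed their own RNNs, so their dimensions are independent.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: nothing ties the two embedding dims together
enc_emb = nn.Embedding(10000, 300)  # e.g. fastText-sized vectors for the source
dec_emb = nn.Embedding(12000, 256)  # a different size for the target is fine

# Each RNN just has to match its own embedding's output dim
enc_rnn = nn.GRU(300, 512, batch_first=True)
dec_rnn = nn.GRU(256, 512, batch_first=True)

src = torch.randint(0, 10000, (2, 7))
_, h = enc_rnn(enc_emb(src))       # encoder's final hidden state: (1, 2, 512)
tgt = torch.randint(0, 12000, (2, 5))
out, _ = dec_rnn(dec_emb(tgt), h)  # decoder starts from the encoder's state
```

Only the hidden state crosses from encoder to decoder, so only the hidden sizes need to agree.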
Where is em_sz declared?
It's declared above; it's the fastText embedding size.
Do we just keep the embedding vectors seen during training? Why don't we keep all word embeddings, in case there are new words in the test set?
At about 7:25, Jeremy pointed out l'… I don't want to type it, but that apostrophe looks like it's actually the acute accent mark. If you add a substitution from the acute accent mark to ', you'll get better results.
Same with curly quotes.
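Something like this hypothetical cleanup step would cover both cases, mapping acute accents and curly quotes to plain ASCII before tokenization (the function name and mapping are mine, not from the lesson):

```python
# Map acute-accent-as-apostrophe and curly quotes to ASCII equivalents
QUOTE_FIXES = {
    "\u00b4": "'",   # acute accent
    "\u2018": "'",   # left single curly quote
    "\u2019": "'",   # right single curly quote
    "\u201c": '"',   # left double curly quote
    "\u201d": '"',   # right double curly quote
}

def normalize_quotes(text: str) -> str:
    return text.translate(str.maketrans(QUOTE_FIXES))

print(normalize_quotes("l\u00b4amour"))  # → l'amour
```

Running this over both the training and test text keeps the token for l' consistent with the pretrained embeddings.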
Would AWD-LSTM work here too, in place of the GRU?
Can we save the “bottleneck states”? Is that useful to us in some way?
Any particular reason for not using dropout in decoder embedding layer?
Could we potentially start with an autoencoder on the English sentences to “pre-train” the network, then switch to predicting French?
AWD LSTM is just LSTM with dropouts. So change GRU to LSTM and add those dropouts.
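A rough sketch of that idea, with invented sizes and dropout probabilities: an LSTM wrapped in extra dropouts on the embeddings and outputs. (The real AWD-LSTM also applies DropConnect to the recurrent weights, which this simplified version omits.)

```python
import torch
import torch.nn as nn

class DropoutLSTM(nn.Module):
    """Simplified AWD-style module: LSTM plus dropout on embeddings and outputs.
    Real AWD-LSTM additionally drops the hidden-to-hidden weights (DropConnect)."""
    def __init__(self, vocab_sz, emb_sz, hid_sz, p_emb=0.1, p_out=0.3):
        super().__init__()
        self.emb = nn.Embedding(vocab_sz, emb_sz)
        self.emb_drop = nn.Dropout(p_emb)   # dropout on embedded inputs
        self.lstm = nn.LSTM(emb_sz, hid_sz, batch_first=True)
        self.out_drop = nn.Dropout(p_out)   # dropout on LSTM outputs

    def forward(self, x, hx=None):
        out, hx = self.lstm(self.emb_drop(self.emb(x)), hx)
        return self.out_drop(out), hx

m = DropoutLSTM(vocab_sz=100, emb_sz=16, hid_sz=32)
out, hx = m(torch.randint(0, 100, (2, 4)))
```

Swapping `nn.GRU` for this in the decoder is the change being described.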
Thanks. I was wondering why it wasn't used here, since it trains faster than regular LSTMs/GRUs.
If our network can learn to output the end-of-sentence token, why do we need a limit on the length of the output for-loop?
I didn’t see any bptt here, so maybe it’s to limit the backprop?
It would learn it eventually, but it wouldn't learn it immediately. Infinite loops are bad.
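A toy sketch of the capped decoding loop being discussed (the decoder interface, token indices, and cap of 90 are illustrative assumptions): the `max_len` bound guarantees termination even if the model never emits EOS.

```python
import torch

def greedy_decode(decoder, h, sos_idx=1, eos_idx=2, max_len=90):
    """Greedy decoding, capped at max_len steps in case EOS never appears."""
    tok = torch.full((1,), sos_idx, dtype=torch.long)
    result = []
    for _ in range(max_len):
        logits, h = decoder(tok, h)
        tok = logits.argmax(dim=-1)   # pick the most likely next token
        if tok.item() == eos_idx:
            break                     # model ended the sentence itself
        result.append(tok.item())
    return result

class ToyDecoder:
    """Hypothetical stand-in model: emits token 3 twice, then EOS."""
    def __init__(self): self.t = 0
    def __call__(self, tok, h):
        self.t += 1
        logits = torch.zeros(1, 5)
        logits[0, 2 if self.t >= 3 else 3] = 1.0
        return logits, h

print(greedy_decode(ToyDecoder(), None))  # → [3, 3]
```

Until the model has learned EOS, the cap is what keeps early-training inference from looping forever.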
Or is it possible that AWD-LSTM is optimized for language models?
We're running the decoder RNN for 90 steps. I would imagine the vast majority of answers are far shorter than 90. Does the length of this loop matter much for overall accuracy?
Because some of the input sentences were longer than a limit he put in, and so the sentence got truncated – which means no EOS token.
You have to read the whole en and fr sequences. Unlike LMs, you cannot arbitrarily pick lengths to translate.