Part 2 lesson 11 wiki

You’re not doing the backprop, so you can run a larger batch size.

2 Likes
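For reference, a minimal sketch of what that looks like in PyTorch (the model and batch here are made-up stand-ins): with no graph being built, activations aren’t kept around for backprop, so a bigger batch fits in memory at inference time.

```python
import torch
import torch.nn as nn

model = nn.Linear(100, 10)        # stand-in for the real seq2seq model
xb = torch.randn(1024, 100)       # a batch this large might not fit with gradients enabled

model.eval()
with torch.no_grad():             # forward pass only: no graph, no stored activations
    preds = model(xb)
```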

For non-editable installs, the project is built locally in a temp dir and then installed normally. Note that if a satisfactory version of the package is already installed, the VCS source will not overwrite it without an --upgrade flag. VCS requirements pin the package version (specified in the setup.py file) of the target commit, not necessarily the commit itself.

https://pip.pypa.io/en/stable/reference/pip_install/#vcs-support
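For example, something like `pip install git+https://github.com/<user>/<repo>.git@<ref>` (placeholders, not a real repo) builds the package from that repo and installs it; add `--upgrade` if a satisfactory version is already installed and you want the VCS source to replace it.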

Nothing prevents us from using different embedding sizes for encoder and decoder. It’s not a constraint of the architecture.

4 Likes
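A rough sketch of what that means (sizes and vocab lengths invented here): the only thing the two RNNs have to agree on is the hidden state size they pass between them, not the embedding size.

```python
import torch.nn as nn

em_sz_enc, em_sz_dec, nh = 300, 256, 512   # encoder/decoder embedding sizes can differ

enc_emb = nn.Embedding(40000, em_sz_enc)   # e.g. English vocab -> 300-d vectors
dec_emb = nn.Embedding(30000, em_sz_dec)   # e.g. French vocab  -> 256-d vectors

enc_rnn = nn.GRU(em_sz_enc, nh)            # each RNN only has to match its own embedding size...
dec_rnn = nn.GRU(em_sz_dec, nh)            # ...while the hidden state size nh is what they share
```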

Where is em_sz declared?

It’s set above: it’s the fastText embedding size.

Do we just keep the embedding vectors from training? Why don’t we keep all the word embeddings, in case there are new words in the test set?

5 Likes

At about 7:25, Jeremy pointed out l'… I don’t want to type it, but that apostrophe looks like it’s actually the acute accent mark. If you add a step that changes the acute accent mark to ', you’ll get better results.

Same with curly quotes.

5 Likes
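Something like this (just a sketch of the kind of clean-up being suggested), run over the text before tokenization:

```python
def normalize_quotes(s):
    # acute accent used as an apostrophe, and curly single quotes -> plain '
    for ch in ('\u00b4', '\u2018', '\u2019'):
        s = s.replace(ch, "'")
    # curly double quotes -> plain "
    for ch in ('\u201c', '\u201d'):
        s = s.replace(ch, '"')
    return s

print(normalize_quotes('l\u00b4avion, \u201cbonjour\u201d'))  # -> l'avion, "bonjour"
```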

Would AWD-LSTM work in place of the GRU too?

1 Like

Can we save the “bottleneck states”? Would that be useful to us in some way?

1 Like

Any particular reason for not using dropout in the decoder embedding layer?

1 Like

Could we potentially start with an autoencoder on the English sentences to “pre-train” the network, then switch to predicting French?

1 Like

AWD-LSTM is just an LSTM with dropouts. So change the GRU to an LSTM and add those dropouts.

5 Likes
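Roughly, something like the sketch below (sizes are invented, and it only covers some of the dropouts). One practical wrinkle: `nn.LSTM` returns `(output, (h, c))` rather than `(output, h)` like `nn.GRU`, so the hidden-state handling in the seq2seq forward pass needs a small tweak as well.

```python
import torch.nn as nn

em_sz, nh = 300, 256                        # illustrative sizes

emb_drop = nn.Dropout(0.15)                 # dropout on the embedding outputs
rnn = nn.LSTM(em_sz, nh, num_layers=2,
              dropout=0.25)                 # dropout between the stacked LSTM layers
out_drop = nn.Dropout(0.35)                 # dropout before the output linear layer

# The full AWD-LSTM also uses weight-drop (DropConnect on the hidden-to-hidden
# weights) and variational "locked" dropout, which need a bit more code than this.
```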

Thanks. I was wondering why it wasn’t used here, since it trains faster than regular LSTMs/GRUs.

If our network can learn to output the end-of-sentence token, why do we need a limit on the length of the output for-loop?

4 Likes

I didn’t see any bptt here, so maybe it’s to limit the backprop?

2 Likes

It would learn it eventually, but it wouldn’t learn it immediately. Infinite loops be bad

3 Likes
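As a toy sketch of the decode loop (names and sizes made up), both pieces show up: a hard cap on the number of steps, plus an early break once EOS is produced.

```python
import torch
import torch.nn as nn

# toy stand-ins, just to show the shape of the loop
dec_emb, dec_rnn, out_lin = nn.Embedding(100, 16), nn.GRU(16, 32), nn.Linear(32, 100)
BOS, EOS, MAX_LEN = 1, 2, 90

dec_inp = torch.tensor([BOS])
h = torch.zeros(1, 1, 32)                   # would come from the encoder
res = []
for i in range(MAX_LEN):                    # hard cap: a half-trained model can't loop forever
    outp, h = dec_rnn(dec_emb(dec_inp).unsqueeze(0), h)
    dec_inp = out_lin(outp.squeeze(0)).argmax(dim=1)   # greedy pick of the next token
    res.append(dec_inp)
    if (dec_inp == EOS).all():              # ...but stop early once EOS is emitted
        break
```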

Or is it possible that AWD-LSTM is optimized for language models?

1 Like

We’re running the decoder RNN for 90 steps. I would imagine the vast majority of answers are far shorter than 90 tokens. Does the length of this loop matter much for overall accuracy?

Because some of the input sentences were longer than a limit he put in, and so the sentence got truncated – which means no EOS token.

You have to read the whole English and French sequences… unlike LMs, you cannot arbitrarily pick lengths to translate.

2 Likes