Part 2 lesson 11 wiki

You’re not doing the backprop, so you can run a larger batch size.

2 Likes
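For reference, a minimal sketch of what that looks like in PyTorch (the model and batch here are made-up stand-ins): with no graph being built, activations aren’t kept around for backprop, so a bigger batch fits in memory at inference time.

```python
import torch
import torch.nn as nn

model = nn.Linear(100, 10)        # stand-in for the real seq2seq model
xb = torch.randn(1024, 100)       # a batch this large might not fit with gradients enabled

model.eval()
with torch.no_grad():             # forward pass only: no graph, no stored activations
    preds = model(xb)
```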

For non-editable installs, the project is built locally in a temp dir and then installed normally. Note that if a satisfactory version of the package is already installed, the VCS source will not overwrite it without an --upgrade flag. VCS requirements pin the package version (specified in the setup.py file) of the target commit, not necessarily the commit itself.

https://pip.pypa.io/en/stable/reference/pip_install/#vcs-support
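For example, something like `pip install git+https://github.com/<user>/<repo>.git@<ref>` (placeholders, not a real repo) builds the package from that repo and installs it; add `--upgrade` if a satisfactory version is already installed and you want the VCS source to replace it.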

Nothing prevents us from using different embedding sizes for encoder and decoder. It’s not a constraint of the architecture.

4 Likes
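A rough sketch of what that means (sizes and vocab lengths invented here): the only thing the two RNNs have to agree on is the hidden state size they pass between them, not the embedding size.

```python
import torch.nn as nn

em_sz_enc, em_sz_dec, nh = 300, 256, 512   # encoder/decoder embedding sizes can differ

enc_emb = nn.Embedding(40000, em_sz_enc)   # e.g. English vocab -> 300-d vectors
dec_emb = nn.Embedding(30000, em_sz_dec)   # e.g. French vocab  -> 256-d vectors

enc_rnn = nn.GRU(em_sz_enc, nh)            # each RNN only has to match its own embedding size...
dec_rnn = nn.GRU(em_sz_dec, nh)            # ...while the hidden state size nh is what they share
```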

Where is em_sz declared?

It’s set above: it’s the fastText embedding size.

Do we just keep the embedding vectors from training? Why don’t we keep all the word embeddings, in case there are new words in the test set?

5 Likes

At about 7:25, Jeremy pointed out l'… I don’t want to type it, but that apostrophe looks like it’s actually the acute accent mark. If you add a step that changes the acute accent mark to ', you’ll get better results.

Same with curly quotes.

5 Likes
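Something like this (just a sketch of the kind of clean-up being suggested), run over the text before tokenization:

```python
def normalize_quotes(s):
    # acute accent used as an apostrophe, and curly single quotes -> plain '
    for ch in ('\u00b4', '\u2018', '\u2019'):
        s = s.replace(ch, "'")
    # curly double quotes -> plain "
    for ch in ('\u201c', '\u201d'):
        s = s.replace(ch, '"')
    return s

print(normalize_quotes('l\u00b4avion, \u201cbonjour\u201d'))  # -> l'avion, "bonjour"
```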

Would AWD-LSTM work in place of the GRU too?

1 Like

Can we save the “bottleneck states”? Would that be useful to us in some way?

1 Like

Any particular reason for not using dropout in the decoder embedding layer?

1 Like

Could we potentially start with an autoencoder on the English sentences to “pre-train” the network, then switch to predicting French?

1 Like

AWD-LSTM is just an LSTM with dropouts. So change the GRU to an LSTM and add those dropouts.

5 Likes
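Roughly, something like the sketch below (sizes are invented, and it only covers some of the dropouts). One practical wrinkle: `nn.LSTM` returns `(output, (h, c))` rather than `(output, h)` like `nn.GRU`, so the hidden-state handling in the seq2seq forward pass needs a small tweak as well.

```python
import torch.nn as nn

em_sz, nh = 300, 256                        # illustrative sizes

emb_drop = nn.Dropout(0.15)                 # dropout on the embedding outputs
rnn = nn.LSTM(em_sz, nh, num_layers=2,
              dropout=0.25)                 # dropout between the stacked LSTM layers
out_drop = nn.Dropout(0.35)                 # dropout before the output linear layer

# The full AWD-LSTM also uses weight-drop (DropConnect on the hidden-to-hidden
# weights) and variational "locked" dropout, which need a bit more code than this.
```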

Thanks. I was wondering why it wasn’t used here, since it trains faster than regular LSTMs/GRUs.

If our network can learn to output the end-of-sentence token, why do we need a limit on the length of the output for-loop?

4 Likes

I didn’t see any bptt here, so maybe it’s to limit the backprop?

2 Likes

It would learn it eventually, but it wouldn’t learn it immediately. Infinite loops be bad

3 Likes
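As a toy sketch of the decode loop (names and sizes made up), both pieces show up: a hard cap on the number of steps, plus an early break once EOS is produced.

```python
import torch
import torch.nn as nn

# toy stand-ins, just to show the shape of the loop
dec_emb, dec_rnn, out_lin = nn.Embedding(100, 16), nn.GRU(16, 32), nn.Linear(32, 100)
BOS, EOS, MAX_LEN = 1, 2, 90

dec_inp = torch.tensor([BOS])
h = torch.zeros(1, 1, 32)                   # would come from the encoder
res = []
for i in range(MAX_LEN):                    # hard cap: a half-trained model can't loop forever
    outp, h = dec_rnn(dec_emb(dec_inp).unsqueeze(0), h)
    dec_inp = out_lin(outp.squeeze(0)).argmax(dim=1)   # greedy pick of the next token
    res.append(dec_inp)
    if (dec_inp == EOS).all():              # ...but stop early once EOS is emitted
        break
```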

Or is it possible that AWD-LSTM is optimized for language models?

1 Like

We’re running the decoder RNN for 90 steps. I would imagine the vast majority of answers are far shorter than 90 tokens. Does the length of this loop matter much for overall accuracy?

Because some of the input sentences were longer than a limit he put in, and so the sentence got truncated – which means no EOS token.

You have to read the whole English and French sequences… unlike LMs, you cannot arbitrarily pick lengths to translate.

2 Likes