Part 2 lesson 11 wiki

(This is a wiki - feel free to edit.)

<<< Wiki: Lesson 10 | Wiki: Lesson 12 >>>

Lesson resources

Lesson papers

Timeline (incomplete)

  • (0:00:00) 1-cycle policy blog post
  • (0:03:58) Google demo of seq to seq
  • (0:05:40) seq to seq models - machine translation
  • (0:07:20) Why are we learning about seq to seq?
  • (0:08:40) Four big wins of neural machine translation
  • (0:09:20) Bidirectional GRU with attention
  • (0:09:55) Introducing the problem
  • (0:13:15) Think about language modeling vs neural translation
  • (0:13:35) neural translation from seq to seq model
  • (0:14:22) concat pooling
  • (0:18:00) seq to general purpose seq
  • (0:18:20) Prerequisite: Lesson 6
  • (0:19:50) Char Loop concat model
  • (0:21:40) Stacking one RNN on top of another
  • (0:22:46) Translation start
  • (0:23:25) French to English questions instead of language translation
  • (0:42:40) Separate training and validation set
  • (0:43:30) Creating DataLoaders, and a sampler trick that sorts the sentences into similarly sized batches
  • (0:47:06) First encoder-decoder architecture. Uses a GRU RNN.
  • (0:50:28) A PyTorch module has a weight attribute; the weight attribute is a Variable that has a data attribute; finally, the data attribute is a tensor
  • (0:54:28) Question: If we keep all embeddings for training, why don’t we keep all word embeddings in case we have new words in the test set?
  • (0:55:35) Using a vocabulary bigger than 40,000 words
  • (1:00:50) Explaining the decoder architecture
  • (1:11:00) Results of the first architecture
  • (1:13:09) PAUSE
  • (1:14:00) Question about regularization techniques on seq2seq models and the AWD-LSTM architecture
  • (1:16:40) Bidirectional LSTMs architecture
  • (1:21:00) Question: Why do you have to have an end to the loop?
  • (1:22:39) Teacher forcing architecture
  • (1:31:03) Attentional model
  • (1:40:11) Second explanation of attention in an RNN
  • (1:55:51) DeViSE
  • (2:11:48) nmslib: super-fast library for finding nearest neighbors in high-dimensional spaces
  • (2:13:03) Searching WordNet noun classes in ImageNet
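The attribute chain mentioned at (0:50:28) — module, then weight, then data — can be checked directly in PyTorch. The nn.Embedding below is just a stand-in for any module with weights; note that in the PyTorch version used in the lesson the weight was a Variable, while in modern PyTorch it is a Parameter (which subclasses Tensor):

```python
import torch
import torch.nn as nn

# Any module with weights will do; nn.Embedding is an arbitrary example.
emb = nn.Embedding(10, 4)

# module -> weight (a Parameter) -> data (a plain tensor)
print(type(emb.weight))       # <class 'torch.nn.parameter.Parameter'>
print(type(emb.weight.data))  # <class 'torch.Tensor'>
print(emb.weight.data.shape)  # torch.Size([10, 4])
```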

Other resources

Helpful stuff

Additional papers

Other libraries

  • Tensor2Tensor - a Google deep-learning mini-library with many datasets and tutorials for various seq2seq tasks

Useful snippet to transform a PyTorch nn.Module into a fastai Learner:

# wrap a plain PyTorch nn.Module (here Seq2SeqRNN) in a fastai Learner
rnn = Seq2SeqRNN(fr_vecd, fr_itos, dim_fr_vec, en_vecd, en_itos, dim_en_vec, nh, enlen_90)
learn = RNN_Learner(md, SingleModel(to_gpu(rnn)), opt_fn=opt_fn)

For the acronym-obsessed like myself: BiLingual Evaluation Understudy (BLEU)
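For a quick feel for what BLEU measures, here is a minimal sketch using NLTK's implementation; the reference/candidate sentences are made up, and the smoothing function just avoids zero scores when a higher-order n-gram has no match:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical reference translation(s) and model output, tokenized into words.
reference = [["the", "cat", "sat", "on", "the", "mat"]]
candidate = ["the", "cat", "is", "on", "the", "mat"]

# method1 smoothing keeps the score non-zero despite missing 3/4-gram matches.
smooth = SmoothingFunction().method1
score = sentence_bleu(reference, candidate, smoothing_function=smooth)
print(round(score, 3))  # a value between 0 and 1; higher is closer to the reference
```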


How long are the sequences used to train translation models? Sentences?


There are two cool papers from Oct of last year that show how to do neural machine translation w/o parallel sentences!


Was an attention layer tried in the language model? Do you think it would be a good idea to try to add one and see what happens?


How would we start with pre-trained models of French and English and then fine-tune in this case (i.e., use the “Jeremy special” method)?


Google’s neural machine translation system.

Here is the link to my notes from lesson 6 if anyone is interested in a refresher.


Just heard him mention that we divide num_cpus by 2 because with hyperthreading we don’t get a speedup using all the hyperthreaded cores. Is that just based on practical experience, or is there some underlying reason why we wouldn’t get additional speedup from hyperthreading?


Why are we not starting with language models of English and French and then training the translation from those, rather than from scratch?


How should we tokenize audio or video files?


Why didn’t he add a bos token?


How would you keep the pip-installed git version of fastai up to date? Would you just rerun the pip install command, or would you do some git pull command?


Do dimensions of both language embeddings have to be equal?
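On the question above: they don't have to be equal, as long as something maps between the two spaces. A minimal sketch, with hypothetical sizes (300-d source vectors, 200-d target vectors) and a linear layer bridging the gap:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: the two embeddings need not share a dimension.
emb_src = nn.Embedding(40000, 300)  # source-language embedding
emb_trg = nn.Embedding(40000, 200)  # target-language embedding, different dim

# A linear layer maps the encoder's 300-d space into the decoder's 200-d space.
bridge = nn.Linear(300, 200)

tokens = torch.tensor([3, 14, 15, 92, 6, 5, 35])  # a toy 7-token source sentence
h = emb_src(tokens)                               # shape (7, 300)
h = bridge(h)                                     # shape (7, 200)
print(h.shape)
```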


Perhaps pip install [library name here] --upgrade?


The official PyTorch examples are not as good as the official Keras examples. Because most PyTorch users are researchers, I think best practices were not emphasized there, on the assumption that people already know them.


So the install would be: ! pip install git+
and the update would be: ! pip install git+ --upgrade


I’m not sure – I was just guessing, but that may work.

Why bs = bs * 1.6?
