Part 2 lesson 11 wiki

(Jeremy Howard (Admin)) #1

(This is a wiki - feel free to edit.)

<<< Wiki: Lesson 10 | Wiki: Lesson 12 >>>

Lesson resources

Lesson papers

Timeline (incomplete)

  • (0:00:00) 1 cycle policy blog
  • (0:03:58) Google demo of seq to seq
  • (0:05:40) seq to seq models - machine translation
  • (0:07:20) Why are we learning about seq to seq?
  • (0:08:40) Four big wins of neural Machine translation
  • (0:09:20) BiDirectional GRU with attention
  • (0:09:55) Introducing the problem
  • (0:13:15) Think about language modeling vs neural translation
  • (0:13:35) neural translation from seq to seq model
  • (0:14:22) concat pooling
  • (0:18:00) seq to general purpose seq
  • (0:18:20) Pre-requisite Lesson 6
  • (0:19:50) Char Loop concat model
  • (0:21:40) Stacking one RNN on top of another
  • (0:22:46) Translation start
  • (0:23:25) French to English questions instead of language translation
  • (0:42:40) Separate training and validation set
  • (0:43:30) Creating DataLoaders, and the Sampler trick of sorting sentences into similarly sized batches
  • (0:47:06) First encoder-decoder architecture. Uses a GRU RNN.
  • (0:50:28) A PyTorch module has a weight attribute. The weight attribute is a variable that has a data attribute. Finally, the data attribute is a tensor
  • (0:54:28) Question: If we just keep all embeddings for training, why don’t we keep all words embedding in case we have new words on the test set?
  • (0:55:35) Using vocabulary bigger than 40 thousand words
  • (1:00:50) Explaining the decoder architecture
  • (1:11:00) Results of the first architecture
  • (1:13:09) PAUSE
  • (1:14:00) Question about regularization techniques on seq2seq models and the AWD-LSTM architecture
  • (1:16:40) Bidirectional LSTMs architecture
  • (1:21:00) Question: Why do you have to have an end to the loop?
  • (1:22:39) Teacher forcing architecture
  • (1:31:03) Attentional model
  • (1:40:11) Second explanation of attention in an RNN
  • (1:55:51) Devise
  • (2:11:48) nmslib: Super fast library for finding nearest neighbors on high-dimensional spaces
  • (2:13:03) Searching wordnet noun classes on imagenet
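The attentional model covered at (1:31:03) comes down to scoring each encoder output against the current decoder state, softmaxing the scores into weights, and taking a weighted sum as the context vector. A minimal pure-Python sketch of that weighting step (the function names and dot-product scoring here are illustrative, not taken from the lesson notebook, which uses learned attention weights):

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_outputs):
    """Dot-product attention: score each encoder output against the
    decoder state, softmax the scores, and return the weighted sum
    (the context vector) plus the weights themselves."""
    scores = [sum(d * e for d, e in zip(decoder_state, enc))
              for enc in encoder_outputs]
    weights = softmax(scores)
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_outputs))
               for i in range(len(decoder_state))]
    return context, weights
```

Encoder outputs most similar to the decoder state get the largest weights, which is the intuition behind the second attention explanation at (1:40:11).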

Other resources

Helpful stuff

Additional papers

Other libraries

  • Tensor2Tensor - A google DL mini-library with many datasets and tutorials for various Seq2Seq tasks

Useful function to transform Pytorch nn.module Class to fastai Learner Class

rnn = Seq2SeqRNN(fr_vecd, fr_itos, dim_fr_vec, en_vecd, en_itos, dim_en_vec, nh, enlen_90)
learn = RNN_Learner(md, SingleModel(to_gpu(rnn)), opt_fn=opt_fn)

Lesson Index
About the Part 2 & Alumni (2018) category
(Jeremy Howard (Admin)) pinned #2

(Brian Holland) #17

For the acronym-obsessed like myself: BiLingual Evaluation Understudy (BLEU)
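For context, BLEU scores a candidate translation by clipped n-gram precision against a reference. A toy sketch of the unigram case only (real BLEU also uses higher-order n-grams, a geometric mean, and a brevity penalty, none of which are shown here):

```python
from collections import Counter

def modified_unigram_precision(candidate, reference):
    """Clipped unigram precision: each candidate word is credited only
    up to the number of times it occurs in the reference."""
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    clipped = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    return clipped / max(1, len(candidate))
```

The clipping is what stops a degenerate output like "the the the" from scoring perfectly against a reference containing "the".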

(yinterian) #19

How long are the sequences used to train translation models? Sentences?

(Aza Raskin) #20

There are two cool papers from Oct of last year that show how to do neural machine translation w/o parallel sentences!

(Hamel Husain) #21

Was an attention layer tried in the language model? Do you think it would be a good idea to try to add one and see what happens?

(Aza Raskin) #22

How would we start with pre-trained models of French and English and then fine-tune in this case (i.e., use the “Jeremy special” method)?

(Sharwon Pius) #23

Google’s neural machine translation system.

(Amrit ) #24

Here is the link to my notes from lesson 6 if anyone is interested in a refresher.

(William Horton) #25

Just heard him mention that we divide num_cpus by 2 because with hyperthreading we don’t get a speedup using all the hyperthreaded cores. Is that just based on practical experience, or is there some underlying reason why we wouldn’t get additional speedup from hyperthreading?
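The num_cpus-divided-by-two heuristic mentioned above can be written as a small helper. This is a sketch of the rule of thumb as stated in the lecture (halve the logical core count because hyperthreaded siblings often add little throughput), not an API from fastai; the function name is made up:

```python
import os

def default_workers(cpu_count=None):
    """Rule of thumb: use half the logical cores (hyperthreaded
    siblings rarely double throughput), but always at least 1."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    return max(1, cpu_count // 2)
```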

(rachana) #26

Why are we not starting with language models of English and French and then training the translation from scratch?

(rachana) #27

How should we tokenize audio or video files?

(Divyansh Jha) #28

Why didn’t he add a BOS token?

(unknown) #29

How would you keep the pip-installed git version of fastai up-to-date? Would you just rerun the pip install command, or would you do some git pull command?

(Emil) #30

Do dimensions of both language embeddings have to be equal?

(Daniel Hunter) #31

Perhaps pip install [library name here] --upgrade?

(Nafiz Hamid) #32

The official PyTorch examples are not as good as the official Keras examples. Because most PyTorch users are researchers, I think best practices weren’t emphasized there, on the assumption that people already know them.

(unknown) #33

So the install would be ! pip install git+
and the update would be ! pip install git+ --upgrade

(Daniel Hunter) #34

I’m not sure – I was just guessing, but that may work.

(Bart Fish) #35

Why bs = bs * 1.6?