Lesson 4 Advanced Discussion ✅


(Rachel Thomas) #1

This is a place to talk about more advanced or tangential topics related to the Lesson 4 lecture. This will not be monitored during class, but we will read it afterwards.

Feel free to discuss anything you like, as long as it’s at least somewhat related to what’s happening in class.


Lesson 4 official resources and updates ✅
Lesson 4 In-Class Discussion ✅
FAQ, resources, and official course updates ✅
📝 Deep Learning Lesson 4 Notes
Fast.ai v3 2019 Course Notes (Chinese edition)
(Charin) #5
  1. In the imdb scripts, when we use a QRNN instead of an LSTM, we set the dropouts to 0 and increase the number of hidden nodes from 1150 to 1550. What's the rationale behind this?
  2. Can you explain a little more about how gradient clipping affects training? For instance, for classification on imdb it was set to 25, while for language modeling on wikitext it was 0.12.
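For context on what those clip values mean: norm-based gradient clipping rescales the gradient whenever its L2 norm exceeds the threshold, so a clip of 0.12 constrains each update far more aggressively than a clip of 25. A minimal sketch of the idea (the function name and arrays are illustrative only, not fastai's internals):

```python
import numpy as np

def clip_grad_norm(grad, max_norm):
    """Rescale grad so its L2 norm does not exceed max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([3.0, 4.0])            # L2 norm = 5.0
tight = clip_grad_norm(g, 0.12)     # rescaled down to norm 0.12
loose = clip_grad_norm(g, 25.0)     # untouched: 5.0 < 25.0
```

With a tight threshold like 0.12, virtually every step gets rescaled, which keeps the recurrent updates small and stable; with 25, clipping only kicks in on rare exploding-gradient steps.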

#16

Hey guys, Jeremy just mentioned that someone improved the score for CamVid. Does anybody know what the fix was to calculate the accuracy properly?


(Fred Guth) #17

Nothing in git log?


(Nissan Dookeran) #18

I am assuming they reduced the dataset to match the subset used in the paper, so the comparison was fairer.


(Fred Guth) #19

Spacy is by far the biggest library dependency in fastai… around 1 GB. For comparison, torch is about 250 MB.
It seems that we use it basically for training. Is it possible to somehow avoid loading it when we only want/need to predict?


#21

So, I am assuming a bit here that Jeremy is going to discuss classics like RNNs and LSTMs. I've wondered what Jeremy thinks of GCNNs by FAIR. Since they are fully convolutional, they seem to be very useful for training and doing transfer learning.


(Fred Guth) #22

TextLMDataBunch does not let us set bs nor max_vocab anymore. How do we set that?


(Karl) #25

Language models are typically trained using some form of cross entropy loss. This loss function only evaluates whether a word is right or wrong. Is there a way to reward the model for predicting an incorrect word that is very similar to the correct word?

Thinking back to the devise lesson in last year’s part 2, could you train a language model by having it output a word vector, evaluating loss by comparing the cosine similarity to the correct word vector?
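A minimal sketch of the loss this question describes, assuming a model that outputs a word vector rather than a softmax over the vocab (the function and vectors here are illustrative, not an existing fastai API):

```python
import numpy as np

def cosine_loss(pred_vec, target_vec):
    """1 - cosine similarity: 0 when the prediction points the
    same way as the target embedding, up to 2 when opposite."""
    cos = np.dot(pred_vec, target_vec) / (
        np.linalg.norm(pred_vec) * np.linalg.norm(target_vec))
    return 1.0 - cos

target = np.array([1.0, 0.0, 0.0])       # embedding of the correct word
near   = np.array([0.9, 0.1, 0.0])       # a near-synonym: small loss
far    = np.array([0.0, 1.0, 0.0])       # unrelated word: larger loss
```

The appeal is exactly what the post suggests: a semantically close but "wrong" word incurs only a small penalty, unlike cross entropy, which treats all wrong words equally.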


(Fred Guth) #26

I guess we should use the DataBlock API… how do you set bs and max_vocab with this new API?


(Gregory PAKOSZ) #27

Do people use CNNs for NLP? Or are people using (Bi)-LSTMs?


(Pavithra Solai Jawahar) #28

The moms parameter in fit_one_cycle? I didn't get it… Can you please explain it again?


(Nitin Pasumarthy) #29

How do we expand the vocab to cover medical records when transfer learning from wikitext? I'm assuming the vocab only contains high-frequency English words from Wikipedia.


#30

I think Jeremy will explain them later.


(Nitin Pasumarthy) #31

How do we expand the vocab once the embeddings are learned?


#32

fastai will do that automatically for you: when you load your pretrained weights, it adapts them to your model. For words that were present in the pretrained model's vocab, it keeps the learned embeddings; for new words, it uses the mean of all the learned embeddings.
When you then train your model, the first stage, with the body frozen, is what makes those new embeddings learn something better.
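A minimal numpy sketch of that adaptation step (the vocabs, shapes, and words are made up for illustration; fastai's actual implementation differs in detail):

```python
import numpy as np

# Hypothetical pretrained vocab and 3-dim embedding matrix.
pretrained_vocab = {"the": 0, "cat": 1, "sat": 2, "mat": 3}
pretrained_emb = np.random.randn(4, 3)

# The fine-tuning corpus introduces an unseen word ("dosage").
new_vocab = ["the", "cat", "dosage"]

mean_emb = pretrained_emb.mean(axis=0)
new_emb = np.stack([
    pretrained_emb[pretrained_vocab[w]] if w in pretrained_vocab else mean_emb
    for w in new_vocab
])
```

Known words keep their pretrained rows; unseen words start from the mean embedding, which the frozen-body training stage then refines.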


(benedikt herudek) #33

How can we use language models for language-to-language translation? So far, I am aware of the approach with word vectors (last version of the fast.ai course, part 2).


(Sudarshan) #34

If tabular data contains other types of data, say text or an image in one of its columns, how would we use that information in our model? How would we do that using the fastai library?


(Alena Harley) #35

Are fully connected networks good for very 'wide' tabular datasets with real values? Can we train an attention/autoencoder component while training the network at the same time, or would a fully connected network + regularization generally work better?


(Ramesh Kumar Singh) #36

Is there any good resource which explains how to interpret a neural net, i.e. feature interactions, multi-collinearity?