This is a place to talk about more advanced or tangential topics related to the Lesson 4 lecture. This will not be monitored during class, but we will read it afterwards.
Feel free to discuss anything you like, as long as it’s at least somewhat related to what’s happening in class.
In the IMDB scripts, when we use a QRNN instead of an LSTM, we set the dropouts to 0 and increase the number of hidden nodes from 1150 to 1550. What's the rationale behind this?
Can you explain a little more about how gradient clipping affects training? For instance, in the IMDB classification script it was set to 25, while in the wikitext language-modeling script it was 0.12.
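For context, here is a minimal sketch of where gradient-norm clipping typically sits in a PyTorch training step (this is illustrative only, not the actual fastai training loop; the model, optimizer, and data are made-up placeholders):

```python
import torch
import torch.nn as nn

# Hypothetical setup, purely to show where clipping fits in a training step.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

def training_step(x, y, clip=0.12):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    # Rescale all gradients so their combined norm is at most `clip`.
    # A tight value such as 0.12 limits every update; a loose one such as 25
    # only intervenes on rare exploding-gradient batches.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=clip)
    optimizer.step()
    return loss.item()

loss = training_step(torch.randn(32, 10), torch.randn(32, 1))
```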
spaCy is by far the biggest library dependency in fastai… around 1 GB. For comparison, torch is about 250 MB.
It seems we use it basically for training; is it possible to somehow avoid loading it when we only want/need to predict?
So, I'm assuming a bit here that Jeremy is going to discuss classics like RNNs and LSTMs. I've wondered what Jeremy thinks of GCNNs (gated convolutional networks) from FAIR. Since they are fully convolutional, they seem very well suited to training and transfer learning.
Language models are typically trained using some form of cross entropy loss. This loss function only evaluates whether a word is right or wrong. Is there a way to reward the model for predicting an incorrect word that is very similar to the correct word?
Thinking back to the DeVISE lesson in last year's part 2, could you train a language model by having it output a word vector and evaluating the loss by comparing cosine similarity to the correct word vector?
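A rough sketch of what such a loss could look like, assuming the model emits a dense vector per position and we have a fixed embedding matrix for the target vocabulary (all names here are illustrative, not fastai API):

```python
import torch
import torch.nn.functional as F

def cosine_embedding_lm_loss(predicted_vecs, target_ids, embedding_matrix):
    """Loss that rewards predictions close (in cosine distance) to the target word's vector.

    predicted_vecs:   (batch, emb_dim) vectors emitted by the language model
    target_ids:       (batch,) indices of the correct next words
    embedding_matrix: (vocab, emb_dim) fixed word vectors for the vocabulary
    """
    target_vecs = embedding_matrix[target_ids]                   # look up the "correct" vectors
    cos_sim = F.cosine_similarity(predicted_vecs, target_vecs, dim=-1)
    return (1 - cos_sim).mean()                                  # 0 when prediction aligns perfectly

# Tiny illustrative call with random numbers.
emb = torch.randn(100, 300)           # vocab of 100 words, 300-d vectors
preds = torch.randn(8, 300)           # model outputs for a batch of 8 positions
targets = torch.randint(0, 100, (8,))
loss = cosine_embedding_lm_loss(preds, targets, emb)
```

With a loss like this, a near-synonym of the target would be penalized less than an unrelated word, unlike plain cross entropy.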
How do you expand the vocab from wikitext to medical records when using transfer learning? I'm assuming the vocab only contains high-frequency English words from Wikipedia.
fastai will do that automatically for you: when you load your pretrained weights, it adapts them to the model. For words that were present in the pretrained model's vocab, it uses the learned embeddings, and for new words, it uses the mean of all the learned embeddings.
When you train your model, the first stage, with the body frozen, is there to make those new embeddings learn something better.
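Roughly, that adaptation amounts to something like the following minimal sketch (not the actual fastai implementation; the function and variable names are made up for illustration):

```python
import torch

def adapt_embeddings(pretrained_emb, old_itos, new_itos):
    """Build an embedding matrix for a new vocab from a pretrained one.

    pretrained_emb: (old_vocab, emb_dim) weights learned on e.g. wikitext
    old_itos:       list mapping old indices -> tokens
    new_itos:       list mapping new indices -> tokens (e.g. medical vocab)
    """
    old_stoi = {tok: i for i, tok in enumerate(old_itos)}
    mean_emb = pretrained_emb.mean(dim=0)     # fallback for words not seen in pretraining
    new_emb = pretrained_emb.new_zeros(len(new_itos), pretrained_emb.size(1))
    for i, tok in enumerate(new_itos):
        j = old_stoi.get(tok)
        # Copy the learned vector when the word was in the pretrained vocab,
        # otherwise initialise it to the mean of all learned embeddings.
        new_emb[i] = pretrained_emb[j] if j is not None else mean_emb
    return new_emb
```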
How can we use language models for language-to-language translation? So far, I'm only aware of the approach with word vectors (from part 2 of the last version of the fast.ai course).
If columnar data contains other types of data, say text or an image in one of its columns, how would we use that information in our model? How would we do that using the fastai library?
Are fully connected networks good for very 'wide' tabular datasets with real values? Can we train an attention mechanism or autoencoder at the same time as the network, or would a fully connected network plus regularization generally work better?