Lesson 4 Advanced Discussion ✅

This is a place to talk about more advanced or tangential topics related to the Lesson 4 lecture. This will not be monitored during class, but we will read it afterwards.

Feel free to discuss anything you like, as long as it’s at least somewhat related to what’s happening in class.

1 Like
  1. In the IMDb scripts, when we use a QRNN instead of an LSTM, we set the dropouts to 0 and increase the number of hidden nodes from 1150 to 1550. What’s the rationale behind this?
  2. Can you explain a little more about how gradient clipping affects training? For instance, for classification on IMDb it was set to 25, while for language modeling on WikiText it was 0.12. (See the sketch after this post.)
8 Likes
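
On the gradient-clipping question above: clipping rescales the gradients before the optimizer step, so a single batch with an exploding gradient cannot blow up the weights; a small clip value like 0.12 constrains each update far more aggressively than a large one like 25. A minimal PyTorch sketch of where clipping sits in a training loop (the model, sizes, and clip value are placeholders, not the course’s actual settings):

```python
import torch
import torch.nn as nn

model = nn.LSTM(input_size=400, hidden_size=1150, num_layers=3)  # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
clip = 0.12  # smaller value -> more aggressive clipping

def training_step(x, y, criterion):
    optimizer.zero_grad()
    output, _ = model(x)
    loss = criterion(output, y)
    loss.backward()
    # Rescale all gradients so their global norm is at most `clip`
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
    optimizer.step()
    return loss.item()
```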

Hey guys, Jeremy just mentioned that someone improved the score for CamVid; does anybody know what the fix was to calculate the accuracy properly?

Nothing in git log?

I am assuming they reduced the dataset to match the subset used in the paper, so it was a fairer comparison.

2 Likes
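
For reference, one common adjustment when scoring CamVid-style segmentation is to exclude the “Void” label from the accuracy, so unlabeled pixels don’t distort the score. Whether that is the exact change Jeremy referred to isn’t confirmed here; this is only a hedged sketch of the idea, with `void_code` standing in for whatever index the Void class has in your label list:

```python
import torch

void_code = 30  # hypothetical index of the 'Void' class in the CamVid label list

def acc_no_void(pred, target):
    """Pixel accuracy that ignores pixels labeled as Void."""
    target = target.squeeze(1)    # (bs, 1, H, W) -> (bs, H, W)
    mask = target != void_code    # keep only labeled pixels
    return (pred.argmax(dim=1)[mask] == target[mask]).float().mean()
```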

spaCy is by far the biggest library dependency in fastai… around 1 GB. For comparison, torch is about 250 MB.
It seems that we basically only use it for training; is it possible to somehow prevent loading it when we only want/need to predict?

6 Likes
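
There may not be a supported switch for this in the library itself, but a general Python pattern for avoiding a heavy dependency at inference time is to defer the import until the code path that needs it actually runs. A hedged sketch of the idea (this is not fastai’s internal structure):

```python
def tokenize_for_training(texts):
    # spaCy is only imported when training-time tokenization is requested,
    # so a prediction-only process never pays the import cost.
    import spacy
    nlp = spacy.blank("en")
    return [[tok.text for tok in nlp(t)] for t in texts]
```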

So, I am assuming a bit here that Jeremy is going to discuss classics like RNNs and LSTMs. I’ve wondered what Jeremy thinks of GCNNs by FAIR. Since they are fully convolutional, they seem to be very useful for training and for doing transfer learning.

4 Likes

TextLMDataBunch no longer lets us set bs or max_vocab. How do we set those?

2 Likes

Language models are typically trained using some form of cross entropy loss. This loss function only evaluates whether a word is right or wrong. Is there a way to reward the model for predicting an incorrect word that is very similar to the correct word?

Thinking back to the DeViSE lesson in last year’s part 2, could you train a language model by having it output a word vector, evaluating loss by comparing the cosine similarity to the correct word vector?

11 Likes
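
A hedged sketch of the idea in the question above, along the lines of DeViSE: instead of a softmax over the vocabulary, the model predicts a dense vector and the loss is one minus the cosine similarity to the pretrained embedding of the correct word (the embedding matrix here is a placeholder, e.g. frozen word2vec or fastText vectors):

```python
import torch
import torch.nn.functional as F

def cosine_loss(pred_vec, target_ids, emb_matrix):
    """pred_vec:    (bs, emb_dim) predicted word vectors
       target_ids:  (bs,) indices of the correct next words
       emb_matrix:  (vocab_size, emb_dim) frozen pretrained embeddings"""
    target_vec = emb_matrix[target_ids]  # look up the "gold" vectors
    return (1 - F.cosine_similarity(pred_vec, target_vec, dim=1)).mean()
```

A near-miss prediction (a vector close to a synonym) then gets partial credit instead of the full penalty cross entropy would assign.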

I guess we should use the data block API… how do you set bs and max_vocab with this new API?

1 Like
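
(This also covers the TextLMDataBunch question further up.) With the fastai v1 data block API, `bs` goes to `.databunch()`, and the vocabulary size is controlled through the processors. A hedged sketch, assuming a CSV with a text column; the path and file name are placeholders:

```python
from pathlib import Path
from fastai.text import TextList, TokenizeProcessor, NumericalizeProcessor

path = Path('data')  # placeholder

# max_vocab is set on the numericalization processor
processors = [TokenizeProcessor(), NumericalizeProcessor(max_vocab=30000)]

data_lm = (TextList.from_csv(path, 'texts.csv', cols='text', processor=processors)
           .split_by_rand_pct(0.1)
           .label_for_lm()
           .databunch(bs=48))
```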

Do people use CNNs for NLP? Or are people using (Bi)-LSTMs?

1 Like

What is the moms parameter in learn.fit_one_cycle? I didn’t get it… Can you please explain it again?

3 Likes
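
For reference, `moms` is the pair of momentum values used by the one-cycle schedule: momentum is annealed from the first value down to the second while the learning rate rises, then back up as the learning rate falls. A hedged usage sketch (the numbers are illustrative, `data_lm` comes from the data block sketch above, and the learner factory signature may differ between fastai versions):

```python
from fastai.text import language_model_learner, AWD_LSTM

learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
# momentum goes 0.8 -> 0.7 -> 0.8 over the cycle, inverse to the learning rate
learn.fit_one_cycle(4, 1e-3, moms=(0.8, 0.7))
```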

How do we expand the vocab to cover medical records when transfer learning from WikiText, assuming the vocab only contains high-frequency English words from Wikipedia?

1 Like

I think Jeremy will explain them later.

2 Likes

How to expand the vocab once the embeddings are learned?

1 Like

fastai will do that automatically for you: when you load your pretrained weights, it adapts them to the model. For words that were present in the pretrained model’s vocab, it uses the learned embeddings, and for new words, it uses the mean of all the learned embeddings.
When you train your model, the first stage, with the body frozen, is there to make those new embeddings learn something better.

8 Likes
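
A hedged sketch of what that adaptation looks like conceptually (not fastai’s actual internal function): embeddings for words already in the pretrained vocab are copied over, and every new word starts from the mean of the pretrained embeddings.

```python
import torch

def adapt_embeddings(old_wgts, old_itos, new_itos):
    """old_wgts: (old_vocab, emb_dim) pretrained embedding matrix
       old_itos / new_itos: index->word lists for the old and new vocabs"""
    old_stoi = {w: i for i, w in enumerate(old_itos)}
    mean_emb = old_wgts.mean(dim=0)
    new_wgts = old_wgts.new_zeros(len(new_itos), old_wgts.size(1))
    for i, w in enumerate(new_itos):
        idx = old_stoi.get(w)
        # known word: reuse its pretrained vector; new word: start from the mean
        new_wgts[i] = old_wgts[idx] if idx is not None else mean_emb
    return new_wgts
```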

How can we use language models for language-to-language translation? So far, I am only aware of the approach with word vectors (last version of the fast.ai course, part 2).

1 Like

If columnar data contains other types of data, say text or an image in one of its columns, how would we use that information in our model? How would we do that using the fastai library?

3 Likes
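
One common (and hedged, not fastai-specific) pattern for the question above is to give each modality its own encoder and concatenate the resulting feature vectors before a fully connected head; a minimal PyTorch sketch with made-up sizes, where the image and text features are assumed to come from pretrained encoders:

```python
import torch
import torch.nn as nn

class MixedInputModel(nn.Module):
    def __init__(self, n_tab_feats=20, img_feat_dim=512, txt_feat_dim=400, n_classes=2):
        super().__init__()
        self.tab_net = nn.Sequential(nn.Linear(n_tab_feats, 64), nn.ReLU())
        # img_feats / txt_feats would come from a pretrained CNN body and a
        # pretrained language-model encoder; here they arrive precomputed.
        self.head = nn.Sequential(
            nn.Linear(64 + img_feat_dim + txt_feat_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, tab, img_feats, txt_feats):
        x = torch.cat([self.tab_net(tab), img_feats, txt_feats], dim=1)
        return self.head(x)
```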

Are fully connected networks good for very ‘wide’ tabular datasets with real values? Can we train an attention mechanism or autoencoder at the same time as the network, or would a fully connected network plus regularization generally work better?

2 Likes

Is there any good resource that explains how to interpret a neural net, i.e. feature interactions, multicollinearity?

1 Like