This is a place to talk about more advanced or tangential topics related to the Lesson 4 lecture. This will not be monitored during class, but we will read it afterwards.
Feel free to discuss anything you like, as long as it’s at least somewhat related to what’s happening in class.
In the IMDB scripts, when we use a QRNN instead of an LSTM, we set the dropouts to 0 and increase the number of hidden nodes from 1150 to 1550. What's the rationale behind this?
Can you explain a little more about how gradient clipping affects training? For instance, in the IMDB classification script it was set to 25, while in the wikitext language-modeling script it was 0.12.
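For context, here is a minimal sketch of where gradient-norm clipping typically sits in a PyTorch training step (this is illustrative only, not the actual fastai training loop; the model, optimizer, and data are made-up placeholders):

```python
import torch
import torch.nn as nn

# Hypothetical setup, purely to show where clipping fits in a training step.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

def training_step(x, y, clip=0.12):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    # Rescale all gradients so their combined norm is at most `clip`.
    # A tight value such as 0.12 limits every update; a loose one such as 25
    # only intervenes on rare exploding-gradient batches.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=clip)
    optimizer.step()
    return loss.item()

loss = training_step(torch.randn(32, 10), torch.randn(32, 1))
```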
spaCy is by far the biggest library dependency in fastai… around 1 GB. For comparison, torch is about 250 MB.
It seems we use it basically for training; is it possible to somehow avoid loading it when we only want/need to predict?
So, I'm assuming a bit here that Jeremy is going to discuss classics like RNNs and LSTMs. I've wondered what Jeremy thinks of GCNNs (gated convolutional networks) from FAIR. Since they are fully convolutional, they seem very well suited to training and transfer learning.
Language models are typically trained using some form of cross entropy loss. This loss function only evaluates whether a word is right or wrong. Is there a way to reward the model for predicting an incorrect word that is very similar to the correct word?
Thinking back to the DeVISE lesson in last year's part 2, could you train a language model by having it output a word vector and evaluating the loss by comparing cosine similarity to the correct word vector?
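A rough sketch of what such a loss could look like, assuming the model emits a dense vector per position and we have a fixed embedding matrix for the target vocabulary (all names here are illustrative, not fastai API):

```python
import torch
import torch.nn.functional as F

def cosine_embedding_lm_loss(predicted_vecs, target_ids, embedding_matrix):
    """Loss that rewards predictions close (in cosine distance) to the target word's vector.

    predicted_vecs:   (batch, emb_dim) vectors emitted by the language model
    target_ids:       (batch,) indices of the correct next words
    embedding_matrix: (vocab, emb_dim) fixed word vectors for the vocabulary
    """
    target_vecs = embedding_matrix[target_ids]                   # look up the "correct" vectors
    cos_sim = F.cosine_similarity(predicted_vecs, target_vecs, dim=-1)
    return (1 - cos_sim).mean()                                  # 0 when prediction aligns perfectly

# Tiny illustrative call with random numbers.
emb = torch.randn(100, 300)           # vocab of 100 words, 300-d vectors
preds = torch.randn(8, 300)           # model outputs for a batch of 8 positions
targets = torch.randint(0, 100, (8,))
loss = cosine_embedding_lm_loss(preds, targets, emb)
```

With a loss like this, a near-synonym of the target would be penalized less than an unrelated word, unlike plain cross entropy.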
How do you expand the vocab from wikitext to medical records when using transfer learning? I'm assuming the vocab only contains high-frequency English words from Wikipedia.
fastai will do that automatically for you: when you load your pretrained weights, it adapts them to the model. For words that were present in the pretrained model's vocab, it uses the learned embeddings, and for new words, it uses the mean of all the learned embeddings.
When you train your model, the first stage, with the body frozen, is there to make those new embeddings learn something better.
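Roughly, that adaptation amounts to something like the following minimal sketch (not the actual fastai implementation; the function and variable names are made up for illustration):

```python
import torch

def adapt_embeddings(pretrained_emb, old_itos, new_itos):
    """Build an embedding matrix for a new vocab from a pretrained one.

    pretrained_emb: (old_vocab, emb_dim) weights learned on e.g. wikitext
    old_itos:       list mapping old indices -> tokens
    new_itos:       list mapping new indices -> tokens (e.g. medical vocab)
    """
    old_stoi = {tok: i for i, tok in enumerate(old_itos)}
    mean_emb = pretrained_emb.mean(dim=0)     # fallback for words not seen in pretraining
    new_emb = pretrained_emb.new_zeros(len(new_itos), pretrained_emb.size(1))
    for i, tok in enumerate(new_itos):
        j = old_stoi.get(tok)
        # Copy the learned vector when the word was in the pretrained vocab,
        # otherwise initialise it to the mean of all learned embeddings.
        new_emb[i] = pretrained_emb[j] if j is not None else mean_emb
    return new_emb
```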
How can we use language models for language-to-language translation? So far, I'm only aware of the approach with word vectors (from part 2 of the last version of the fast.ai course).
If columnar data contains other types of data, say text or an image in one of its columns, how would we use that information in our model? How would we do that using the fastai library?
Are fully connected networks good for very 'wide' tabular datasets with real values? Can we train an attention mechanism or autoencoder at the same time as the network, or would a fully connected network plus regularization generally work better?