The language model we use is publicly available, though. No?
Yes, correct, but I was thinking of using the lib instead of writing a custom training loop and models =) Also, they provide some embeddings already.
Why not also train the language model on the unsupervised entries in the IMDB dataset?
If we don’t hold out a validation set, does that mean there is no possibility of overfitting when building the language model?
We do!
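For reference, a minimal sketch with the fastai v1 data block API that pulls in the unsup folder when building the language-model data (folder names follow the standard IMDB layout; bs=48 is just illustrative):

```python
from fastai.text import *

path = untar_data(URLs.IMDB)

# Build language-model data from train, test, AND the unlabeled reviews;
# LM training ignores labels, so every review helps.
data_lm = (TextList.from_folder(path)
           .filter_by_folder(include=['train', 'test', 'unsup'])
           .split_by_rand_pct(0.1)   # hold out a validation split
           .label_for_lm()           # target = the next word
           .databunch(bs=48))
```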
How do you expand the vocab from WikiText to medical records when using transfer learning? I'm assuming the vocab only covers high-frequency English words from Wikipedia.
A validation set is held out, just a smaller portion (10k reviews instead of 25k).
Is there a backwards pre-trained wiki103 model?
The new model we fine-tune will have new words in its vocab. That’s fine; it will learn their meanings during fine-tuning.
You create the vocab from your own dataset.
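If it helps, a sketch of how this looks in fastai v1 (drop_mult=0.3 is illustrative; data_lm is assumed to be built from your own corpus, e.g. the medical records):

```python
from fastai.text import *

# data_lm.vocab was created from YOUR dataset, not from Wikipedia.
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)

# The pretrained WikiText-103 weights are remapped onto the new vocab;
# tokens that never appeared in Wikipedia start from a generic
# initialization and pick up their meaning during fine-tuning.
learn.fit_one_cycle(1, 1e-2)
```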
Is there an analogue of language models for images? Unsupervised learning on an image corpus? E.g. masking part of the image and trying to predict it.
Does that also work for titlecase and mixed case?
TextLMDataBunch no longer lets us set bs or max_vocab. How do we set those now? I guess we should use the new DataBlock API, but how do we set them there?
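In case it's useful, a sketch with the fastai v1 data block API: max_vocab moves into NumericalizeProcessor and bs into databunch() (path, max_vocab=30000, and bs=32 are placeholders):

```python
from fastai.text import *

processors = [TokenizeProcessor(tokenizer=Tokenizer()),
              NumericalizeProcessor(max_vocab=30000)]  # caps the vocab size

data_lm = (TextList.from_folder(path, processor=processors)
           .split_by_rand_pct(0.1)
           .label_for_lm()
           .databunch(bs=32))  # batch size goes here now
```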
Can anyone find a source/citation for what Jeremy was talking about: SwiftKey(?) and the generated LaTeX proofs?
When fitting on Wikipedia there is no risk of overfitting, because that is not the task we are going to test the model on. With IMDB, as Sylvain said, there is a validation set to avoid overfitting.
what is moms?
If we use another language, where do I set lang='pt', for example? And do I need to tell it to use spaCy?
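If I understand the fastai v1 API correctly, you set it on the Tokenizer, which wraps spaCy by default, roughly:

```python
from fastai.text import *

# Tokenizer defaults to tok_func=SpacyTokenizer, so lang='pt' switches
# the spaCy language used for tokenization.
pt_proc = [TokenizeProcessor(tokenizer=Tokenizer(lang='pt')),
           NumericalizeProcessor()]
# then pass processor=pt_proc to TextList.from_folder, as in the snippet above
```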
Momentums
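i.e. the momentum range cycled by fit_one_cycle. A quick sketch (the (0.8, 0.7) pair is just an example; learn as above):

```python
# Momentum starts at 0.8, drops to 0.7 while the learning rate rises,
# then climbs back to 0.8 as the learning rate anneals.
learn.fit_one_cycle(1, 1e-2, moms=(0.8, 0.7))
```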
Where are the English language punctuation rules defined?
This competition is particularly strict: they’re limiting external data to a pre-selected set of embeddings (see the discussion at https://www.kaggle.com/c/quora-insincere-questions-classification/discussion/70978#418095, for example). But the untrained models and training techniques would still be of use, as @devforfu noted.