Please post your questions about the lesson here. This is a wiki post. Please add any resources or tips that are likely to be helpful to other students.
<<< Wiki: Lesson 9 | Wiki: Lesson 11 >>>
Lesson resources
- Lesson video
- Notebook Link
- PowerPoint slides
- IMDB data
- Lesson notes from @hiromi
- Lesson notes from @timlee
Links
- CS224n: Natural Language Processing With Deep Learning
- Wikipedia: List of mathematical symbols
- Wikipedia: Greek Letters Used in Mathematics, Science and Engineering
- VNC natively available on macOS - Screen Sharing
- Information and Entropy (course from MIT OpenCourseWare)
Papers
- Regularizing and Optimizing LSTM Language Models - on dropout and regularization in LSTM language models, by Stephen Merity et al.
- Universal Language Model Fine-tuning for Text Classification (ULMFiT or FitLam) - transfer learning for NLP by Jeremy Howard and Sebastian Ruder
- A disciplined approach to neural network hyper-parameters by Leslie N. Smith
- Learning non-maximum suppression - an end-to-end convolutional network to replace manual NMS, by Jan Hosang et al.
Code Snippets
Downloading the data:
curl -OL http://files.fast.ai/data/aclImdb.tgz
tar -xzf aclImdb.tgz
Timeline
Review
IMDB
- (0:20:30) IMDB with fastai.text
- (0:23:10) The standard format of a text classification dataset
- (0:28:08) Difference between tokens and words 1 - spaCy
- (0:29:59) Pandas chunksize to deal with a large corpus
- (0:32:38) {BOS} (beginning of sentence/stream) and {FLD} (field) tokens
- (0:33:57) Run spaCy on multiple cores with proc_all_mp()
- (0:35:40) Difference between tokens and words 2 - capturing the semantics of letter case and other details
- (0:38:05) Numericalise tokens with Python's Counter() class (see the sketch after this section)
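A minimal sketch of the chunked tokenize-and-numericalise pipeline covered in this section. The file name `train.csv`, the single `text` column, and the vocabulary cut-offs are illustrative assumptions, not the notebook's exact values, and a plain spaCy tokenizer stands in for the lesson's multi-process proc_all_mp() setup:

```python
from collections import Counter, defaultdict

import pandas as pd
import spacy

nlp = spacy.blank('en')  # lightweight English tokenizer, no model download needed

def tokenize(texts):
    """Tokenize an iterable of strings with spaCy, lower-casing everything."""
    return [[t.text.lower() for t in nlp(text)] for text in texts]

# Read the corpus in chunks so a large CSV never has to fit in memory at once.
tokens = []
for chunk in pd.read_csv('train.csv', chunksize=24000):
    tokens += tokenize(chunk['text'])

# Build the vocabulary: keep the most frequent tokens, drop very rare ones.
freq = Counter(tok for doc in tokens for tok in doc)
itos = [w for w, c in freq.most_common(60000) if c > 2]   # int -> string
itos.insert(0, '_pad_')
itos.insert(0, '_unk_')
stoi = defaultdict(lambda: 0, {w: i for i, w in enumerate(itos)})  # string -> int, unknown -> 0

# Numericalise every document into a list of token ids.
ids = [[stoi[tok] for tok in doc] for doc in tokens]
```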
Pre-trained Language Model - PreTraining
- (0:42:16) Pre-trained language model
- (0:47:13) Map IMDB vocabulary indices to wikitext indices
- (0:53:09) fastai documentation project
- (0:58:24) Difference between pre-trained LM and embeddings 1 - word2vec
- (1:01:25) The idea behind using the mean of the pre-trained embeddings for tokens with no wikitext counterpart (sketch after this section)
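A hedged numpy sketch of the vocabulary re-mapping just described: copy each pre-trained embedding row into the slot its token occupies in the IMDB vocabulary, and fall back to the mean embedding for tokens the wikitext model never saw. The argument names are illustrative, not the notebook's exact names:

```python
import numpy as np

def remap_embeddings(enc_wgts, itos_imdb, stoi_wt):
    """Build an embedding matrix for the IMDB vocabulary from pre-trained weights.

    enc_wgts  : (wikitext_vocab_size, emb_size) pre-trained embedding matrix
    itos_imdb : list mapping IMDB token id -> token string
    stoi_wt   : dict mapping token string -> wikitext id (missing tokens absent)
    """
    emb_size = enc_wgts.shape[1]
    row_mean = enc_wgts.mean(axis=0)          # average of all pre-trained embeddings
    new_wgts = np.zeros((len(itos_imdb), emb_size), dtype=np.float32)
    for i, tok in enumerate(itos_imdb):
        j = stoi_wt.get(tok, -1)
        # Unknown to the pre-trained model? Start from the mean embedding instead.
        new_wgts[i] = enc_wgts[j] if j >= 0 else row_mean
    return new_wgts

# Usage (names assumed): new_emb = remap_embeddings(enc_wgts, itos, stoi_wikitext)
```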
Pre-trained Language Model - Training
- (1:02:34) Dive into the source code of LanguageModelLoader() (simplified sketch after this section)
- (1:09:55) Create custom Learner and ModelData classes
- (1:20:35) Guidance on tuning dropout in the LM
- (1:21:43) The reason to report accuracy rather than cross-entropy loss for the LM
- (1:25:23) Guidance on reading papers vs. coding
- (1:28:10) Tips to vary dropout for each layer
- (1:28:44) Difference between pre-trained LM and embeddings 2 - Comparison of NLP and CV
- (1:31:21) Accuracy vs cross entropy as a loss function
- (1:33:37) Shuffle documents; sort-ish by length to save computation
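A heavily simplified sketch of what a language-model loader such as LanguageModelLoader() does: concatenate all documents into one token stream, split it into bs parallel columns, and walk through it in chunks whose length varies around bptt so batch boundaries shift between epochs. This is an assumption-laden illustration, not fastai's actual class, which also halves bptt occasionally and handles edge cases this sketch ignores:

```python
import numpy as np
import torch

class SimpleLMLoader:
    """Reduced sketch of a language-model data loader (not fastai's real implementation)."""

    def __init__(self, ids, bs=64, bptt=70):
        # One long stream of token ids from all documents.
        stream = np.concatenate([np.array(doc) for doc in ids])
        n = (len(stream) // bs) * bs
        # Shape (n // bs, bs): each column is a contiguous slice of the stream.
        self.data = torch.tensor(stream[:n], dtype=torch.long).view(bs, -1).t().contiguous()
        self.bptt = bptt

    def __iter__(self):
        i = 0
        while i < self.data.size(0) - 2:
            # Vary the sequence length so batch boundaries move between epochs.
            seq_len = max(5, int(np.random.normal(self.bptt, 5)))
            seq_len = min(seq_len, self.data.size(0) - 1 - i)
            x = self.data[i:i + seq_len]            # inputs, shape (seq_len, bs)
            y = self.data[i + 1:i + 1 + seq_len]    # targets: the next token at each position
            yield x, y
            i += seq_len
```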
Paper ULMFiT (FitLam)
- (1:44:00) Paper: ULMFiT - pre-trained LM
- (1:49:09) New version of Cyclical Learning Rate
- (1:51:34) Concat pooling (sketch at the end of this section)
- (1:52:44) RNN encoder and MultiBatchRNN encoder - BPTT for text classification (BPT3C)
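Concat pooling, as described in the ULMFiT paper: the classifier head is fed the final hidden state concatenated with the max-pooled and mean-pooled hidden states over all time steps. A small PyTorch sketch, assuming hidden states laid out as (seq_len, batch, hidden); the sizes in the example are illustrative:

```python
import torch

def concat_pool(outputs):
    """outputs: RNN hidden states of shape (seq_len, batch, hidden).
    Returns (batch, 3 * hidden): last state ++ max-pool ++ mean-pool over time."""
    last = outputs[-1]                   # hidden state at the final time step
    max_pool = outputs.max(dim=0)[0]     # element-wise max over time
    avg_pool = outputs.mean(dim=0)       # element-wise mean over time
    return torch.cat([last, max_pool, avg_pool], dim=1)

# Example: 70 time steps, batch of 4, hidden size 400
h = torch.randn(70, 4, 400)
print(concat_pool(h).shape)              # torch.Size([4, 1200])
```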
Tricks to conduct ablation studies