Please post your questions about the lesson here. This is a wiki post. Please add any resources or tips that are likely to be helpful to other students.
<<< Wiki: Lesson 9 | Wiki: Lesson 11 >>>
Lesson resources
- Lesson video
- Notebook Link
- PowerPoint slides
- IMDB data
- Lesson notes from @hiromi
- Lesson notes from @timlee
Links
- CS224n: Natural Language Processing With Deep Learning
- Wikipedia: List of mathematical symbols
- Wikipedia: Greek Letters Used in Mathematics, Science and Engineering
- VNC natively available on macOS - Screen Sharing
- Information and Entropy (course from MIT OpenCourseWare)
Papers
- Regularizing and Optimizing LSTM Language Models - on dropout and regularization in LSTM language models, by Stephen Merity et al.
- Universal Language Model Fine-tuning for Text Classification (ULMFiT or FitLam) - transfer learning for NLP by Jeremy Howard and Sebastian Ruder
- A disciplined approach to neural network hyper-parameters by Leslie N. Smith
- Learning non-maximum suppression - an end-to-end convolutional network to replace manual NMS, by Jan Hosang et al.
Code Snippets
Downloading the data:
curl -OL http://files.fast.ai/data/aclImdb.tgz
tar -xzf aclImdb.tgz
Timeline
Review
IMDB
- (0:20:30) IMDB with fastai.text
- (0:23:10) The standard format of a text classification dataset
- (0:28:08) Difference between tokens and words 1 - spaCy
- (0:29:59) Pandas chunksize to deal with a large corpus
- (0:32:38) {BOS} (beginning of sentence/stream) and {FLD} (field) tokens
- (0:33:57) Run spaCy on multiple cores with proc_all_mp()
- (0:35:40) Difference between tokens and words 2 - capturing the semantics of letter case and other details
- (0:38:05) Numericalise tokens with Python's Counter() class (see the sketch after this section)
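A minimal sketch of the chunked tokenize-and-numericalise pipeline covered in this section. The file name `train.csv`, the single `text` column, and the vocabulary cut-offs are illustrative assumptions, not the notebook's exact values, and a plain spaCy tokenizer stands in for the lesson's multi-process proc_all_mp() setup:

```python
from collections import Counter, defaultdict

import pandas as pd
import spacy

nlp = spacy.blank('en')  # lightweight English tokenizer, no model download needed

def tokenize(texts):
    """Tokenize an iterable of strings with spaCy, lower-casing everything."""
    return [[t.text.lower() for t in nlp(text)] for text in texts]

# Read the corpus in chunks so a large CSV never has to fit in memory at once.
tokens = []
for chunk in pd.read_csv('train.csv', chunksize=24000):
    tokens += tokenize(chunk['text'])

# Build the vocabulary: keep the most frequent tokens, drop very rare ones.
freq = Counter(tok for doc in tokens for tok in doc)
itos = [w for w, c in freq.most_common(60000) if c > 2]   # int -> string
itos.insert(0, '_pad_')
itos.insert(0, '_unk_')
stoi = defaultdict(lambda: 0, {w: i for i, w in enumerate(itos)})  # string -> int, unknown -> 0

# Numericalise every document into a list of token ids.
ids = [[stoi[tok] for tok in doc] for doc in tokens]
```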
Pre-trained Language Model - PreTraining
- (0:42:16) Pre-trained language model
- (0:47:13) Map IMDB vocabulary indices to wikitext indices
- (0:53:09) fastai documentation project
- (0:58:24) Difference between pre-trained LM and embeddings 1 - word2vec
- (1:01:25) The idea behind using the mean of the pre-trained embeddings for tokens with no wikitext counterpart (sketch after this section)
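A hedged numpy sketch of the vocabulary re-mapping just described: copy each pre-trained embedding row into the slot its token occupies in the IMDB vocabulary, and fall back to the mean embedding for tokens the wikitext model never saw. The argument names are illustrative, not the notebook's exact names:

```python
import numpy as np

def remap_embeddings(enc_wgts, itos_imdb, stoi_wt):
    """Build an embedding matrix for the IMDB vocabulary from pre-trained weights.

    enc_wgts  : (wikitext_vocab_size, emb_size) pre-trained embedding matrix
    itos_imdb : list mapping IMDB token id -> token string
    stoi_wt   : dict mapping token string -> wikitext id (missing tokens absent)
    """
    emb_size = enc_wgts.shape[1]
    row_mean = enc_wgts.mean(axis=0)          # average of all pre-trained embeddings
    new_wgts = np.zeros((len(itos_imdb), emb_size), dtype=np.float32)
    for i, tok in enumerate(itos_imdb):
        j = stoi_wt.get(tok, -1)
        # Unknown to the pre-trained model? Start from the mean embedding instead.
        new_wgts[i] = enc_wgts[j] if j >= 0 else row_mean
    return new_wgts

# Usage (names assumed): new_emb = remap_embeddings(enc_wgts, itos, stoi_wikitext)
```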
Pre-trained Language Model - Training
- (1:02:34) Dive into the source code of LanguageModelLoader() (simplified sketch after this section)
- (1:09:55) Create custom Learner and ModelData classes
- (1:20:35) Guidance on tuning dropout in the LM
- (1:21:43) The reason to report accuracy rather than cross-entropy loss for the LM
- (1:25:23) Guidance on reading papers vs. coding
- (1:28:10) Tips to vary dropout for each layer
- (1:28:44) Difference between pre-trained LM and embeddings 2 - Comparison of NLP and CV
- (1:31:21) Accuracy vs cross entropy as a loss function
- (1:33:37) Shuffle documents; sort-ish by length to save computation
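A heavily simplified sketch of what a language-model loader such as LanguageModelLoader() does: concatenate all documents into one token stream, split it into bs parallel columns, and walk through it in chunks whose length varies around bptt so batch boundaries shift between epochs. This is an assumption-laden illustration, not fastai's actual class, which also halves bptt occasionally and handles edge cases this sketch ignores:

```python
import numpy as np
import torch

class SimpleLMLoader:
    """Reduced sketch of a language-model data loader (not fastai's real implementation)."""

    def __init__(self, ids, bs=64, bptt=70):
        # One long stream of token ids from all documents.
        stream = np.concatenate([np.array(doc) for doc in ids])
        n = (len(stream) // bs) * bs
        # Shape (n // bs, bs): each column is a contiguous slice of the stream.
        self.data = torch.tensor(stream[:n], dtype=torch.long).view(bs, -1).t().contiguous()
        self.bptt = bptt

    def __iter__(self):
        i = 0
        while i < self.data.size(0) - 2:
            # Vary the sequence length so batch boundaries move between epochs.
            seq_len = max(5, int(np.random.normal(self.bptt, 5)))
            seq_len = min(seq_len, self.data.size(0) - 1 - i)
            x = self.data[i:i + seq_len]            # inputs, shape (seq_len, bs)
            y = self.data[i + 1:i + 1 + seq_len]    # targets: the next token at each position
            yield x, y
            i += seq_len
```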
Paper ULMFiT (FitLam)
- (1:44:00) Paper: ULMFiT - pre-trained LM
- (1:49:09) New version of Cyclical Learning Rate
- (1:51:34) Concat pooling (sketch at the end of this section)
- (1:52:44) RNN encoder and MultiBatchRNN encoder - BPTT for text classification (BPT3C)
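Concat pooling, as described in the ULMFiT paper: the classifier head is fed the final hidden state concatenated with the max-pooled and mean-pooled hidden states over all time steps. A small PyTorch sketch, assuming hidden states laid out as (seq_len, batch, hidden); the sizes in the example are illustrative:

```python
import torch

def concat_pool(outputs):
    """outputs: RNN hidden states of shape (seq_len, batch, hidden).
    Returns (batch, 3 * hidden): last state ++ max-pool ++ mean-pool over time."""
    last = outputs[-1]                   # hidden state at the final time step
    max_pool = outputs.max(dim=0)[0]     # element-wise max over time
    avg_pool = outputs.mean(dim=0)       # element-wise mean over time
    return torch.cat([last, max_pool, avg_pool], dim=1)

# Example: 70 time steps, batch of 4, hidden size 400
h = torch.randn(70, 4, 400)
print(concat_pool(h).shape)              # torch.Size([4, 1200])
```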
Tricks to conduct ablation studies