Fast.ai v3 2019 Course Notes (Chinese Edition)

NLP, Tabular Data, Recsys

Lesson plan and looking forward
::keywords::
classification, image regression, localization, tabular data, collaborative filtering, NLP transfer learning, U-turn, math
::key questions::
* What did we learn before lesson 4?
* What is our focus in lesson 4 (NLP transfer learning and collaborative filtering)?
* What is the math behind collaborative filtering?
* How do we take a U-turn to dive into the previously learnt applications behind the scenes?

Fastai model beats the state of the art on the CamVid dataset
::keywords::
The One Hundred Layers Tiramisu (paper), CamVid, state of the art, smaller subset of classes, 94% vs 91%, default settings
::key questions::
* How good is the fastai model on the CamVid dataset?
* What makes a fair comparison between different models on the CamVid dataset?
* How much can a default fastai model achieve these days?

NLP problems and the neural nets approach
::keywords::
NLP transfer learning, IMDB dataset, legal text classifier, Wikitext dataset
::key questions::
* What are the applications of NLP?
* Why is it difficult to apply neural nets to NLP classification?
* Why and how do we say there isn't enough information to learn from?
* What is the nature or core of neural nets and deep learning?
* Why is transfer learning always the trick to try?
* How come Jeremy thought of trying it and could actually try it out, as if no one else had thought of it and tried it before? (I thought of it, but I didn't know how to try it out.)

How to do NLP transfer learning?
::keywords::
Wikitext, language model, IMDB, classifier, fine-tune, target corpus
::key questions::
* What is a language model? What can it do? (see the toy example after this list)
* What is the difference between a language model trained on Wikitext and one trained on IMDB?
* How is it that to train a movie review classifier we first train on Wikitext, then fine-tune on the IMDB dataset, and finally train the classifier on positive/negative labels?
* Can a language model learn abbreviations? Think of a language model generating math-paper-like output.
* What is the SwiftKey language model on your phone?
* What exactly has a language model learnt when trained on the Wikipedia dataset?
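
"Language model" here means a model that predicts the next word of a text, which is why any corpus is self-labelling. A toy illustration (plain Python, illustrative only):

tokens = ['the', 'movie', 'was', 'great']
# the label for each word is simply the word that follows it
pairs = list(zip(tokens[:-1], tokens[1:]))
print(pairs)  # [('the', 'movie'), ('movie', 'was'), ('was', 'great')]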

Experiment with the IMDB sample and the basic NLP procedure
time: 14:00-19:44
::key questions::
* How to experiment on the IMDB sample from a csv file?
* What are tokens and what is numericalization?
* How to access the vocab?
* What is the default vocabulary size?
* What is the threshold number of appearances below which a word is dropped from the vocab?
* How to turn a dataset in a csv file into a DataBunch with the data block API?
* But how to put the full IMDB dataset into a DataBunch? (it is no longer in a csv file)
%reload_ext autoreload
%autoreload 2
%matplotlib inline

from fastai.text import *

# download and extract the IMDB sample (a small csv of labelled reviews)
path = untar_data(URLs.IMDB_SAMPLE)
path.ls()

# peek at the raw data: each row holds a label and the review text
df = pd.read_csv(path/'texts.csv')
df.head()
df['text'][1]

# tokenize and numericalize the texts, then save the preprocessed result
# so it does not have to be recomputed next time
data_lm = TextDataBunch.from_csv(path, 'texts.csv')
data_lm.save()
data = TextDataBunch.load(path)
data.show_batch()

# vocab.itos maps ids back to tokens; each train_ds item is (text, label)
data.vocab.itos[:10]
data.train_ds[0][0]
data.train_ds[0][0].data[:10]

# the same DataBunch built step by step with the data block API
data = (TextList.from_csv(path, 'texts.csv', cols='text')
                .split_from_df(col=2)
                .label_from_df(cols=0)
                .databunch())

How to train the IMDB language model?
::key questions::
* What if you have a huge medical dataset no smaller than the Wikitext dataset?
* Why can we use the test set to train our language model? (its labels are just the texts themselves)
* What does labelling data for a language model mean?
* How to create a language model learner with an RNN? (see the sketch after this list)
* What is dropout in terms of regularization?
* What is moms in fit_one_cycle?
* What does the model's predict do, and how to use it?
* What does the encoder do, and how to save just the encoder as the model?
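
A minimal sketch of the language-model stage, assuming the fastai v1 API used in the lesson notebooks (exact names changed across releases; earlier versions passed pretrained_model=URLs.WT103 rather than the AWD_LSTM architecture):

from fastai.text import *

# build a learner initialised from pretrained Wikitext-103 weights
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn.lr_find()                                  # pick a learning rate
learn.fit_one_cycle(1, 1e-2, moms=(0.8, 0.7))    # train the new head first
learn.unfreeze()
learn.fit_one_cycle(10, 1e-3, moms=(0.8, 0.7))   # then fine-tune the whole model
learn.predict("I liked this movie because", 40, temperature=0.75)
learn.save_encoder('fine_tuned_enc')             # keep only the encoder part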

How to train a language model for classification
time: 27:13-33:12
::key questions::
* How to create the DataBunch to train the text classifier? (see the sketch after this list)
* Why pass in the language model's vocab?
* How to manage the batch size given your GPU memory?
* What does the time spent look like on the language model versus the many classifier models?
* How to freeze all but a specific number of layers?
* What is the moms (momentum) parameter for?
* How exactly did Jeremy figure out the best hyper-parameter values, such as moms, so that they could be automated?
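
A minimal sketch of the classifier stage, again assuming the fastai v1 API from the lesson notebooks. The language model's vocab is reused so tokens map to the same ids, and bs can be lowered if GPU memory runs out:

from fastai.text import *

path = untar_data(URLs.IMDB)   # the full dataset, organised in folders, not csv
data_clas = (TextList.from_folder(path, vocab=data_lm.vocab)
                     .split_by_folder(valid='test')
                     .label_from_folder(classes=['neg', 'pos'])
                     .databunch(bs=48))

learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn.load_encoder('fine_tuned_enc')   # reuse the fine-tuned encoder
learn.fit_one_cycle(1, 2e-2, moms=(0.8, 0.7))
learn.freeze_to(-2)                    # unfreeze just the last two layer groups
learn.fit_one_cycle(1, slice(1e-2/(2.6**4), 1e-2), moms=(0.8, 0.7))
learn.unfreeze()
learn.fit_one_cycle(2, slice(1e-3/(2.6**4), 1e-3), moms=(0.8, 0.7))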

How to find the best learning rate value using a random forest
time: 33:12-36:47
* Where does 2.6**4 come from?
* How to use a random forest to search for the best hyper-parameter value? (see the sketch below)
* What is auto-ML all about? (building models that learn how to train your models)
* But we are fond of building models to better understand how your hyper-parameters work.
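
A hypothetical sketch of the idea: log many training runs, fit a random forest that predicts validation accuracy from the hyper-parameters, and query it for promising settings. All column names and numbers below are made up for illustration, not the lesson's actual experiments:

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# assumed log of past runs: hyper-parameters and the score they achieved
runs = pd.DataFrame({
    'lr_divisor': [2.0, 2.6, 3.0, 4.0, 2.6],
    'dropout':    [0.3, 0.5, 0.5, 0.7, 0.3],
    'val_acc':    [0.91, 0.94, 0.93, 0.90, 0.94],
})
rf = RandomForestRegressor(n_estimators=100)
rf.fit(runs[['lr_divisor', 'dropout']], runs['val_acc'])

# score a grid of candidate settings and pick the predicted best one
grid = pd.DataFrame([(d, p) for d in np.linspace(2.0, 4.0, 21)
                            for p in (0.3, 0.5, 0.7)],
                    columns=['lr_divisor', 'dropout'])
print(grid.iloc[rf.predict(grid).argmax()])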

How to do tabular data with deep learning
time: 36:31-53:09
* What kinds of problems involve tabular data?
* How did people first react to deep learning on tabular data problems?
* How has that wrong reaction changed?
* Why and how (feature engineering, the Pinterest conference talk) did deep learning become powerful and useful for tabular data?
* What are Jeremy's top options for tabular data problems? (DL, RF, GBoost?)
* What are the reasons DL for tabular data is not widely used? (libraries)
* Why does fastai use pandas a lot for tabular data?
* What are the 10% of cases in which DL is not the default approach?
* Why do we use the URLs.ADULT_SAMPLE dataset?
* How to make a tabular DataBunch from a dataframe? (see the sketch after this list)
* What are dep_var, cat_names, cont_names and procs?
* How to deal with categorical variables in tabular data in DL models? (embeddings) How about continuous variables?
* What is the difference between a processor and a transform? (run once ahead of time vs. every time a batch is sent to the model)
* What do FillMissing, Categorify and Normalize do?
* Why do we split the validation set with split_by_idx, so that it is a contiguous block of rows?
* How to build a tabular model with get_tabular_learner? What does the parameter layers=[200,100] do?
* How to combine NLP data with metadata (tabular data) and apply DL to both?
* Will sklearn and XGBoost become outdated?
* What does metrics do?
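
A minimal sketch of the tabular workflow, assuming the fastai v1 API. The lesson calls the learner get_tabular_learner; later v1 releases renamed it tabular_learner, which is what this sketch uses:

from fastai.tabular import *

path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')

dep_var = 'salary'                               # the column to predict
cat_names = ['workclass', 'education', 'marital-status',
             'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [FillMissing, Categorify, Normalize]     # processors run once, up front

data = (TabularList.from_df(df, path=path, cat_names=cat_names,
                            cont_names=cont_names, procs=procs)
                   .split_by_idx(list(range(800, 1000)))  # contiguous validation rows
                   .label_from_df(cols=dep_var)
                   .databunch())

# layers=[200,100]: two hidden layers with 200 and 100 activations
learn = tabular_learner(data, layers=[200, 100], metrics=accuracy)
learn.fit(1, 1e-2)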

How to apply DL to collaborative filtering
time: 53:09-67:24
* What kinds of problems do we apply collaborative filtering to?
* What does the data structure look like? (user, movie, rating; two styles of representation)
* What are the pros and cons of the sparse matrix style?
* What if you want to learn how to handle large sparse matrix storage? (Rachel's Computational Linear Algebra course)
* What is the GroupLens dataset about?
* How to experiment with the dataset using collaborative filtering? (see the sketch after this list)
* How to create a collaborative filtering model?
* Why was using collaborative filtering difficult?
* What is the cold start problem?
* How does Netflix fix the cold start problem?
* What is the other solution (a predictive model) to the cold start problem?
* How to make a language model learn to use emojis?
* How to deal with time series in tabular data with DL? (extract and add more columns, rather than use an RNN)
* Is there a source to learn more about the cold start problem?
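
A minimal sketch of the collaborative filtering experiment, assuming the fastai v1 API and the small MovieLens-style sample used in the lesson:

from fastai.collab import *

path = untar_data(URLs.ML_SAMPLE)
ratings = pd.read_csv(path/'ratings.csv')   # rows of userId, movieId, rating

data = CollabDataBunch.from_df(ratings, seed=42)
# n_factors: size of each user/movie embedding; y_range squashes the
# prediction into the rating scale with a sigmoid
learn = collab_learner(data, n_factors=50, y_range=(0, 5.5))
learn.fit_one_cycle(3, 5e-3)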

How to understand the dataset and models with Excel
time: 67:23-77:11
* How to visualize the collaborative filtering process in Excel?
* How to create weight vectors for users and weight vectors for movies?
* How to do gradient descent with Excel's Solver? (see the sketch after this list)
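
A tiny numpy re-creation of the spreadsheet: each user and each movie gets a small weight vector, the predicted rating is their dot product, and the mean squared error is the quantity Excel's Solver minimizes. The numbers are random stand-ins, not the lesson's values:

import numpy as np

n_users, n_movies, n_factors = 15, 15, 5
rng = np.random.default_rng(0)
user_w = rng.normal(size=(n_users, n_factors))    # one weight row per user
movie_w = rng.normal(size=(n_movies, n_factors))  # one weight row per movie

pred = user_w @ movie_w.T                         # predicted rating matrix
ratings = rng.integers(1, 6, size=(n_users, n_movies)).astype(float)
mse = ((pred - ratings) ** 2).mean()              # the loss Solver minimizes
print(mse)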

Explore the collab embedding source code with Vim
time: 77:07-92:28
resources: Code Browsing - YouTube; VIM Adventures; Timesavers: Bash kernel for Jupyter notebooks & ctag Vim navigation
* How to use Vim to explore source code quickly?
* What is an embedding and how is it created? (see the sketch below)
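
A minimal sketch of what an embedding is: indexing into a weight matrix, which is mathematically the same as multiplying a one-hot vector by that matrix, just faster. Plain PyTorch, not fastai's internal code:

import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=10, embedding_dim=4)  # 10 items, 4 factors
idx = torch.tensor([3])
by_lookup = emb(idx)                 # embedding as an array lookup

one_hot = torch.zeros(1, 10)
one_hot[0, 3] = 1.0
by_matmul = one_hot @ emb.weight     # same result via one-hot matrix multiply
print(torch.allclose(by_lookup, by_matmul))  # True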

Summary: explain the deep learning process up to the output layer
time: 92:11-end
* What is the deep learning workflow?
* What are the input layer, hidden layers, and output layer?
* What are parameters and weights?
* What are activations? (see the toy forward pass after this list)
* How much linear algebra do we need to do deep learning?
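
A toy forward pass to make the vocabulary concrete: weights (parameters) are the matrices we learn, activations are the numbers each layer computes. Plain PyTorch, illustrative only:

import torch

x = torch.randn(1, 3)           # input layer: one example with 3 features
w1 = torch.randn(3, 4)          # weights of the hidden layer (parameters)
w2 = torch.randn(4, 2)          # weights of the output layer (parameters)

hidden = torch.relu(x @ w1)     # activations of the hidden layer
output = hidden @ w2            # activations of the output layer
print(hidden, output)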