Fast.ai v3 2019 Course Notes (Chinese edition)

A deeper dive into CNNs and data science ethics

How to use platform.ai to label images

0:00-9:46
- How to combine human skills with platform.ai to group or label images efficiently?

Getting started with the Rossmann Store Sales dataset

9:46-16:59
* How to understand the tabular_learner source code?
* How to sort out the Rossmann Store Sales dataset?
* Why is the test set closer to the most recent dates?
* What is the loss function specified by the Kaggle competition?
* Where to learn how to join and manipulate data tables?
* Which notebook cleans the Rossmann data and generates the pickle file?
* What does add_datepart do? (see the sketch after this list)
* Why is it useful to turn a date into metadata columns?
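
Since add_datepart comes up here, a minimal sketch of it in fastai v1 (the toy DataFrame is made up for illustration): it expands a date column into metadata columns such as Year, Month, Week, Day, Dayofweek, Is_month_end, and an Elapsed count, then drops the original column.

```python
from fastai.tabular import add_datepart
import pandas as pd

# a made-up two-row frame standing in for the Rossmann training table
df = pd.DataFrame({'Date': pd.to_datetime(['2015-07-31', '2015-08-01']),
                   'Sales': [5263, 6064]})

add_datepart(df, 'Date', drop=True)   # modifies df in place (fastai v1)
print(df.columns)                     # Sales, Year, Month, Week, Day, ...
```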

How to preprocess the dataset before creating a DataBunch

16:54-22:26
- What is a transform, and what are transforms good at?
- What is a preprocessor, and how to use one?
- How to grab a small subset of the data to experiment with?
- How to categorify a tabular dataset?
- What does Categorify do, and what does its output look like?
- What do NaN and -1 mean?
- Why can't we use -1 directly, and how do we deal with it here?
- How to use FillMissing to fill missing values with the median?
- How does fastai run all of this preprocessing automatically and together? (see the sketch below)
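
A minimal sketch of how fastai v1 bundles these preprocessors: you pass the classes once, each one is fit on the training set, and the same state (categories, medians, statistics) is replayed on the validation and test sets.

```python
from fastai.tabular import FillMissing, Categorify, Normalize

# FillMissing: fills missing continuous values with the median and adds a
#              boolean *_na column so the model can see which were missing
# Categorify: turns categorical columns into pandas categories (pandas codes
#             missing values as -1 behind the scenes)
# Normalize:  normalizes continuous columns with the training mean and sd
procs = [FillMissing, Categorify, Normalize]
```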

How to create a DataBunch for the Rossmann dataset

22:26-27:14
- how to list the column names of the categorical and continuous variables?
- how to pick a validation set that mirrors Kaggle's test set (the most recent dates)?
- why make sure label_cls is FloatList, not IntList?
- why pass log=True to FloatList, so that RMSE on the logs matches RMSPE? (see the sketch below)
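
Roughly the lesson's data block in fastai v1; the names df, test_df, path, cat_vars, cont_vars, procs, dep_var, and valid_idx are assumed from earlier cells of the notebook:

```python
from fastai.tabular import TabularList, FloatList

data = (TabularList.from_df(df, path=path, cat_names=cat_vars,
                            cont_names=cont_vars, procs=procs)
        .split_by_idx(valid_idx)   # the most recent rows, like Kaggle's test set
        .label_from_df(cols=dep_var, label_cls=FloatList, log=True)
        .add_test(TabularList.from_df(test_df, path=path,
                                      cat_names=cat_vars, cont_names=cont_vars))
        .databunch())
```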

How to build a tabular model for the Rossmann dataset

27:14-30:04
- What to set y_range to for tabular_learner, and why?
- What kind of NN structure suits a tabular dataset? (a simple fully connected net)
- What does layers=[1000, 500] specify?
- How to use ps and emb_drop to prevent overfitting? (see the sketch below)
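
Roughly the lesson's learner setup; train_df and data are assumed from earlier. Because we predict log(Sales), y_range is set slightly above the maximum log-sales, since the final sigmoid never quite reaches its top:

```python
import numpy as np
import torch
from fastai.tabular import tabular_learner, defaults
from fastai.metrics import exp_rmspe

max_log_y = np.log(np.max(train_df['Sales']) * 1.2)
y_range = torch.tensor([0, max_log_y], device=defaults.device)

learn = tabular_learner(data, layers=[1000, 500], ps=[0.001, 0.01],
                        emb_drop=0.04, y_range=y_range, metrics=exp_rmspe)
```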

How to understand and use dropout

30:03-39:14
* How to understand the basic idea of dropout from the paper?
  * not only hidden activations but sometimes also inputs get thrown away
* Why is it useful?
  * it forces the parameters to learn general patterns rather than memorize specific images
* Where do brilliant new ideas usually come from?
  * not from the math
  * but from life and intuition
* How to choose the dropout probability for each layer?
  * a single p
  * or a list of p values
* What happens at training time versus test time?
  * do we throw away activations at test time too?
  * if not, how do we compensate for the fraction dropped during training? (see the sketch after this list)
- How to understand the dropout C source code?
- How to use ps and emb_drop in code?
- How to understand the use of emb_drop?
  - embedding outputs are just another layer's activations, dropped with some probability
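
A minimal sketch (plain PyTorch, not the lesson's code) of the training/test behavior discussed above: PyTorch uses "inverted" dropout, scaling the surviving activations by 1/(1-p) during training, so nothing needs to be dropped or rescaled at test time.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(2, 8)

drop.train()
print(drop(x))   # about half the values zeroed, survivors scaled to 2.0

drop.eval()
print(drop(x))   # identity: no activations are dropped at inference
```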

How to understand the embedding layers in the tabular learner

39:14-42:25
How to experiment to find the best hyperparameter values?
- e.g., the process of arriving at emb_drop=0.04
How to understand embeddings? (see the sketch below)
- how do embedding layers correspond to the categorical input variables?
- how to read and set the embedding sizes?
- continuous input variables go through batch-norm layers, not embedding layers
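
A minimal sketch of the idea in plain PyTorch (the 7-level day-of-week variable is illustrative): an embedding layer is just a lookup table with one trainable row per category level.

```python
import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=7, embedding_dim=4)  # 7 levels -> 4-d vectors
idx = torch.tensor([0, 3, 6])   # day-of-week codes for three rows
print(emb(idx).shape)           # torch.Size([3, 4])
```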

What is batch normalization

42:00-48:28
* What is batch normalization in one sentence?
* What is interesting about how the dropout paper was first received by the major journals?
* Why was the batch-norm paper accepted so quickly?
* Why should you read papers for understanding and not make a big deal of the math jargon?
* What is the real reason BN is so powerful?
  * the loss surface over the parameters is less bumpy, so the model can converge with a higher learning rate
- What does the BN algorithm actually do? (see the sketch after this list)
  - BN is a layer that produces activations
  - take the mean and sd of a mini-batch, and normalize the batch with them
  - (image: the batch-norm algorithm from the paper)
  - then scale (a learnable coefficient parameter) and shift (a learnable bias parameter) the distribution of the batch; this is the most important part
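
A minimal sketch of that algorithm in plain PyTorch (not fastai's code): normalize each feature with the mini-batch mean and sd, then apply the learnable scale and shift (gamma and beta in the paper).

```python
import torch

def batchnorm(x, gamma, beta, eps=1e-5):
    mean = x.mean(dim=0)                 # per-feature batch mean
    var = x.var(dim=0, unbiased=False)   # per-feature batch variance
    x_hat = (x - mean) / torch.sqrt(var + eps)
    return gamma * x_hat + beta          # learnable scale and shift

x = torch.randn(32, 10) * 5 + 3          # a mini-batch of activations
y = batchnorm(x, gamma=torch.ones(10), beta=torch.zeros(10))
print(y.mean(0).abs().max(), y.std(0))   # ~0 mean, ~1 sd per feature
```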

Why BN's scale and shift make a difference

48:25-52:00
* What is the problem behind the scenes?
  * the targets range from 1 to 5
  * but the predictions range from -1 to 1
  * this mismatch makes training difficult
* scaling with the multiplicative parameter and shifting with the additive one deals with it
  * they transform [-1, 1] toward [1, 5] directly (see the sketch below)
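
A tiny numeric illustration of that point (made-up values):

```python
import torch

x_hat = torch.tensor([-1.0, 0.0, 1.0])   # normalized activations in [-1, 1]
gamma, beta = 2.0, 3.0                   # learned scale and shift
print(gamma * x_hat + beta)              # tensor([1., 3., 5.]): now spans [1, 5]
```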

How to use BN in code

51:50-54:56
- what does momentum=0.1 mean for BatchNorm1d? (see the sketch below)
  - a low value: the running mean and sd vary less between mini-batches, so less regularization
  - a high value: they vary more, so more regularization
- training is much faster now
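
A minimal sketch of that argument: in PyTorch this momentum controls the exponentially weighted moving average of the running statistics, running_stat = (1 - momentum) * running_stat + momentum * batch_stat.

```python
import torch.nn as nn

# smaller momentum -> running stats change less from batch to batch
# (less regularization); larger momentum -> they jump around more
bn = nn.BatchNorm1d(num_features=500, momentum=0.1)
```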

How to pick between BN, data augmentation, dropout, weight decay, and the L2 norm

54:56-56:46
- L2 regularization is the same thing as weight decay; use weight decay
- always use BN and data augmentation
- experiment with the combinations of dropout and weight decay

How to do data augmentation

56:45-65:24
- Why is it the least well studied yet most exciting regularization?
  - it costs nothing
  - it does not lengthen training
  - it does not cause underfitting
- how to explore all the data transforms through the docs (see the sketch after this list)
- how to pick appropriate values for brightness
- how to pick values for dihedral
- what about flip
- what are the pad modes (there is a fastai write-up about them)
- what does symmetric warp do
- how to turn a single dog picture into many "different looking" images
- why is data augmentation such a big opportunity?
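
Roughly the shape of the fastai v1 call explored in the docs; the values here are illustrative, not the lesson's exact choices:

```python
from fastai.vision import get_transforms

# returns (train_tfms, valid_tfms); each argument caps one kind of change
tfms = get_transforms(do_flip=True, flip_vert=False, max_rotate=10.,
                      max_zoom=1.1, max_lighting=0.2, max_warp=0.2,
                      p_affine=0.75, p_lighting=0.75)
```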

How to create a CNN step by step

65:12-109:08

Overview: why understand CNNs by creating a heat map at the end

65:12-67:30
* how to quickly create, train, and save a CNN with fastai
* understand CNNs by creating a heat map from scratch

How to understand kernels with Setosa's web app

67:27-75:05
why study how CNNs work at the end of the course?
- not strictly necessary for just using them
- but to do things slightly differently, we need to know what CNNs do behind the scenes
- convolution: a special kind of matrix multiplication
How to understand a CNN kernel (image kernel) with Setosa's web app?
- how does the kernel transform an image?
- why is there a black outer edge on the output image? (see the sketch after this list)
- why are the head's edge areas turned into white cells while flat face areas become black cells?
- How to define a convolution with this example?
- How to relate this to the channel visualizations in the paper?
- Why does such a kernel help find top edges?
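
A minimal sketch of the demo's idea in plain PyTorch (the image is a random stand-in): a 3x3 top-edge kernel slides over a single-channel image, and because the kernel cannot center on border pixels, the output loses a one-pixel outer edge.

```python
import torch
import torch.nn.functional as F

kernel = torch.tensor([[-1., -1., -1.],
                       [ 0.,  0.,  0.],
                       [ 1.,  1.,  1.]]).view(1, 1, 3, 3)
img = torch.rand(1, 1, 28, 28)   # stand-in for the face image in the demo
out = F.conv2d(img, kernel)
print(out.shape)                 # torch.Size([1, 1, 26, 26])
```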

How to understand convolution differently, and what is padding for

75:05-80:00
* How to view convolution as a standard matrix multiplication? (see the sketch below)
  * turn the kernel's sliding movement into one larger matrix doing a plain matrix multiplication with the (flattened) input
* How to understand padding?
  * it keeps the output feature map the same size as the input feature map
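
A minimal sketch of the convolution-as-matmul view in plain PyTorch: unfold pulls out every 3x3 patch of the input as a column, so sliding the kernel collapses into a single matrix multiplication.

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 1, 4, 4)
w = torch.rand(1, 1, 3, 3)

cols = F.unfold(x, kernel_size=3)               # (1, 9, 4): four 3x3 patches
out = (w.view(1, -1) @ cols).view(1, 1, 2, 2)   # one matmul, reshaped back
assert torch.allclose(out, F.conv2d(x, w), atol=1e-6)
```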

How kernels, stride, and padding work in a real CNN

79:55-89:39
What does a 3-channel kernel look like, and how does it work?
How do we find more features by adding more 3-channel kernels?
- e.g., add 16 kernels to find 16 different features

Why and how do we shrink the feature map but double the kernels? (see the sketch after this list)
- keep memory under control by having the kernel skip over one or more pixels (stride)
- the feature map shrinks, but we can afford more kernels

Let's experiment on an image with kernels, stride, and padding
- create a CNN over an image
- check out its model summary, in particular how the feature map size halves while the kernel count doubles
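
A minimal sketch of the halve-and-double pattern in plain PyTorch (sizes are illustrative):

```python
import torch
import torch.nn as nn

# stride 2 skips every other pixel, halving each spatial dimension,
# while doubling out_channels keeps the layer finding more features
conv = nn.Conv2d(in_channels=16, out_channels=32,
                 kernel_size=3, stride=2, padding=1)
x = torch.rand(1, 16, 28, 28)
print(conv(x).shape)             # torch.Size([1, 32, 14, 14])
```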

How to do your own manual CNN

89:30-93:49
- how to create your own 3-channel kernel, as a 4D tensor, that shows bottom-right edges (see the sketch after this list)
- how to get a single image
- how to create the kernel as a 4D tensor
- how to create a mini-batch of size 1
- how to apply the kernel to the image
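
Roughly the lesson's manual kernel; the image tensor here is a random stand-in for the one loaded in the notebook:

```python
import torch
import torch.nn.functional as F

# a hand-made "bottom-right edge" detector, expanded over 3 input channels
# into the 4D shape (out_channels, in_channels, height, width) conv2d expects
k = torch.tensor([[0., -5/3, 1.],
                  [-5/3, -5/3, 1.],
                  [1., 1., 1.]]).expand(1, 3, 3, 3) / 6

img = torch.rand(3, 352, 352)    # stand-in for the lesson's image tensor
batch = img.unsqueeze(0)         # mini-batch of size 1: (1, 3, 352, 352)
edges = F.conv2d(batch, k)
print(edges.shape)               # torch.Size([1, 1, 350, 350])
```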

How to create the heat map

93:46-109:00
- how to turn a 512x11x11 tensor into a vector of 37 values
  - average-pool 2d with output size 1
  - then a linear layer of shape (512, 37)
- what does the final feature map of shape (512, 1, 1) tell us?
- what does the (512, 11, 11) tensor tell us?
- what does it mean to average across the same cell position over all 512 channels, rather than over the 11x11 grid of a single channel?
- how to use a hook to grab the 512x11x11 feature map? (see the sketch after this list)
- how to run the model on a single example
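
Roughly the lesson's hook code (fastai v1); learn and the one-image mini-batch xb are assumed from earlier in the notebook:

```python
from fastai.callbacks.hooks import hook_output

m = learn.model.eval()
with hook_output(m[0]) as hook_a:   # m[0] is the convolutional body
    preds = m(xb)
acts = hook_a.stored[0].cpu()       # the (512, 11, 11) feature map
avg_acts = acts.mean(0)             # (11, 11): one value per grid cell, the heat map
```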

Ethics and data science

109:08-end
- what are generative models?
- what are the ethics issues of data science?
- what gender and skin-type bias shows up in the facial classifiers of major DL companies?
- why? what causes such bias? (look at where the data comes from)
- how does biased DL-powered surveillance lead to mass arrests?
- the best way to get publicity is to do something like "Amazon's face recognition falsely matched 28 members of Congress with mugshots"
- Google's machine translation still does not seem to fix its gender bias
- machine bias is pervasive in public policy and the judicial system
- Facebook and the Myanmar genocide
- how should a DL engineer face ethical issues?
