Haha, let's try to get the Chinese community going; I'm sure they would be glad to see it. It's up to all of us to make it happen.
Nvidia Tesla V100.
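Alibaba Cloud is no cheaper than GCP. I glanced at it last month: the bidding price for a P100 in the US West region was about $2 per hour (exchange rate roughly 6, so around 12 RMB/hour with bidding).
Installing fastai and pytorch is not particularly difficult… as things stand,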
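Thanks for sharing! Could you explain what a quota is, and why the quota application issues need to be sorted out?
According to the fast.ai guide, crestle seems to be the easiest, since it needs no installation. But I haven't actually compared them in use. Do you have hands-on experience with both that you could describe in detail? Thanks!
In terms of compute, how does Colab roughly compare with Kaggle and GCP? Thanks!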
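I set up my environment 2-3 months ago, so I'm not sure whether the exact procedure has changed. Also, I have not used Colab.
By default, a new account's limit should be 0.
As for GPUs, it comes down to floating-point performance. Plenty of articles discuss this; I basically followed the setup the course recommends. Kaggle seems to use a Tesla K80? The GCP setup that fastai recommends uses a P4.
Also, if you don't want to pay: I'm not a user behind the firewall, so I'm not sure whether Kaggle can be accessed directly.
You can run simple models on your own computer; of course, segmentation problems will take a long time.
For the setup you can use conda. First install
Anaconda3
then run Anaconda Prompt (Windows)
and then run `conda install -c pytorch -c fastai fastai`
Of course, I personally don't recommend this approach. Because of GPU memory limits, I suggest using Kaggle or GCP if you can. Also, AMD GPUs don't support CUDA, so without an NVIDIA GPU you can basically only run on the CPU.
To be clear, my own computer has no NVIDIA GPU, so I can only use it for processing tasks,
such as testing the datablock API, custom item lists, hooks and some pytorch features. So I'm not sure whether CUDA works when installed this way.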
OK, thanks for sharing your experience!
Hi @PoonamV , could you show me how to lift the word limit in a single post? Thanks!
I could not find anything useful online to deal with it.
Totally agree! In fact, if you use a Preemptible Instance on GCP, the GPU prices are absurdly cheap. Here is the latest price list:
GPU | Standard (Prices vary by location) | Previous Preemptible (All Locations) | New Preemptible (All Locations) |
---|---|---|---|
V100 | $2.48 | $1.24 | $0.74 |
P100 | $1.46 | $0.73 | $0.43 |
K80 | $0.45 | $0.22 | $0.135 |
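The price of the base instance is basically negligible compared with the GPU. The type we chose is n1-highmem-8 at $0.12/h, which already comes with 52 GB of memory and 8 vCPUs.
So a V100 on GCP costs only ¥5.78/hour, and a P100 just ¥3.69/hour. These days even an internet café in a third-tier city charges schoolkids more than that.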
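As a rough check of the figures above, the hourly cost can be recomputed from the preemptible GPU prices plus the n1-highmem-8 instance price. The exchange rate of 6.72 RMB per USD is an assumption inferred from the quoted totals; the post does not state which rate was used.

```python
# Hourly cost of a GCP preemptible GPU plus the base instance, in RMB.
# GPU prices come from the "New Preemptible" column of the table above.
GPU_USD_PER_HOUR = {"V100": 0.74, "P100": 0.43, "K80": 0.135}
INSTANCE_USD_PER_HOUR = 0.12   # n1-highmem-8, as quoted in the post
ASSUMED_RMB_PER_USD = 6.72     # hypothetical rate, not stated in the post


def hourly_cost_rmb(gpu: str) -> float:
    """Return the GPU + instance cost per hour, converted to RMB."""
    usd = GPU_USD_PER_HOUR[gpu] + INSTANCE_USD_PER_HOUR
    return usd * ASSUMED_RMB_PER_USD


for gpu in GPU_USD_PER_HOUR:
    print(f"{gpu}: about {hourly_cost_rmb(gpu):.2f} RMB/hour")
```

With a different exchange rate the totals shift proportionally, so treat the RMB values as approximate.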
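Below is the price list I found for Alibaba Cloud in Zhangjiakou, which should be one of the relatively cheaper regions:
Frankly, a V100 on GCP is cheaper than a P4 on Alibaba Cloud; a P100 has roughly twice the compute of a P4, and a V100 roughly twice that of a P100. If you use mixed-precision training correctly on a V100 you can nearly double the speed again, so reaching 6-8 times the training speed of a P4 is easy. On top of that, GCP has a faster connection to the outside internet, so downloading datasets is quicker too. In my view, there is no reason to choose Alibaba Cloud.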
Neither could I. I had to truncate my long post to a limited number of words. Maybe something can be found in the Discourse settings.
0:00-0:47
Lesson 0 How to get GPU running
0:30-3:16
What else do you need to get started
3:13-4:29
How to make the most out of these lesson videos and notebooks
4:29-5:28
What to expect from the fastai course
5:26-6:47
why should we learn from Jeremy Howard
6:40-7:26
How to make DL accessible to everyone to do useful things
7:25-8:51
How much to invest and What I get out
8:51-10:23
Prerequisites and False assumptions and claims on DL
10:22-11:04
What will you be able to do at the end of lesson 1
11:04-12:31
What is fastai learning philosophy
12:30-13:17 *
How to use Jupyter notebook as a pro
13:17-14:00
What are Jupyter magics
14:00-17:56
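What is the fastai library and how to use it
- fast.ai: name of the course and the organization
- fastai: name of the library
- pytorch: is easier and more powerful than tensorflow*
- all usage supports from fastai import * (personal comment: in practice it does almost nothing)
- from fastai.vision import * brings in the tools you need (personal comment: exploring the code shows that from fastai import * has no real use)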
17:56-21:08
Academic vs Kaggle Datasets, CatsDogs vs Pets dataset
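Two main sources of datasets
- academia and Kaggle
What characterizes academic datasets?
- academics spend a lot of time and effort collecting and processing them
- they are used to tackle challenging problems
- they let people compare how different methods or models perform, highlighting breakthroughs by new methods
- they keep pushing and publishing the academic state of the art
Why are these datasets helpful?
- they provide strong baselines for comparison
- leaderboards and the academic state of the art
- so you can tell how good or bad your model is
Remember to cite the dataset and the paper
- and learn about the background and method behind the dataset's creation
How hard is your Pets dataset problem?
- Dogs vs. Cats is a very easy problem by comparison
- it is binary classification, so guessing "dog" every time already gives 50% accuracy
- cats and dogs also differ a lot, so the features are fairly simple
- when the Dogs vs. Cats competition started, 80% was state of the art
- today our models make almost no mistakes
- the Pets dataset requires recognizing 37 different breeds of cats and dogs
- guessing a single breed gives only 1/37 accuracy
- differences between cat breeds and between dog breeds are small, so the features are harder
- what we are doing is fine-grained classification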
21:07-23:56
How to download dataset with fastai
AWS hosts the datasets fast.ai needs in the cloud for free, high-speed download
See the Kernel for how untar_data is used
23:56-25:49
How to access image folders and check filenames inside
See the Kernel for how Path, ls and get_image_files are used
25:40-27:53
How to get the labels of dataset
How to understand extracting labels with a regular expression: see the notes
Run the code to understand how it is used: see the kernel
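To make the regular-expression step concrete, here is a minimal sketch using only Python's `re` module. The pattern follows the spirit of the one in the lesson notebook (with the dot escaped); the helper name `label_from_filename` and the example paths are made up for illustration.

```python
import re

# Capture everything between the last "/" and the trailing "_<digits>.jpg".
pat = re.compile(r'/([^/]+)_\d+\.jpg$')


def label_from_filename(fname: str) -> str:
    """Extract the breed label from a path like '.../images/Persian_144.jpg'."""
    match = pat.search(fname)
    if match is None:
        raise ValueError(f"no label found in {fname!r}")
    return match.group(1)


# Hypothetical example paths (the directory prefix is made up).
examples = [
    "data/images/saint_bernard_188.jpg",
    "data/images/Persian_144.jpg",
]
labels = [label_from_filename(f) for f in examples]
print(labels)
```

The group `([^/]+)` cannot cross a slash, so only the file name is considered, and the greedy match backtracks until the `_<digits>.jpg` suffix fits.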
27:48-29:15
Why and how to pick the image size for DataBunch
size=224, for now
29:15-29:56
What is a DataBunch
What does DataBunch contain
- training Dataset
- images and labels
- texts and labels
- tabular data and labels
- etc
- validation Dataset
- testing Dataset (optional)
29:56-30:19
What does normalize do to DataBunch
to make data about the same size with same mean and std
30:19 - 31:50
What to do if size is not 224
* the get_transforms function will make the size so
- data looks zoomed
- center-cropping
- resizing
- padding
- these techniques will be used in data augmentation
31:50-33:01
What does it mean to normalize images
* all pixel values range from 0 to 255
* but some channels are very bright and others are not, so they vary a lot
* if the channels don’t all have mean 0 and std 1
* models may be hard to train well
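The idea of giving each channel mean 0 and std 1 can be sketched with the standard library alone. This only illustrates the arithmetic: fastai's `.normalize(imagenet_stats)` uses precomputed per-channel statistics rather than computing them from each image, and the pixel values below are invented.

```python
from statistics import mean, pstdev


def normalize_channel(pixels):
    """Shift and scale one channel so it has mean 0 and std 1."""
    mu = mean(pixels)
    sigma = pstdev(pixels)
    return [(p - mu) / sigma for p in pixels]


# Hypothetical pixel values (0-255) for one bright and one dark channel.
bright = [200, 220, 240, 250, 255]
dark = [5, 10, 20, 30, 40]

for name, channel in [("bright", bright), ("dark", dark)]:
    normed = normalize_channel(channel)
    print(name, round(mean(normed), 6), round(pstdev(normed), 6))
```

After normalization both channels live on the same scale, even though their raw values were very different.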
33:01- 33:34
Why 224 not 256 as power of 2
- because final layer of model is 7x7
- so 224 is better than 256
- more in later
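The 7x7 figure can be checked with the usual output-size formula for convolution and pooling layers. The sketch below assumes the standard ResNet34 layout (a 7x7 stride-2 stem, a 3x3 stride-2 max-pool, then one stride-2 3x3 convolution at the start of each of three later stages); those layer parameters are an assumption here, not something stated in the notes.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output width of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) of each downsampling layer, assumed from the
# standard ResNet34 layout: stem conv, max-pool, then stages 2 to 4.
DOWNSAMPLING_LAYERS = [(7, 2, 3), (3, 2, 1), (3, 2, 1), (3, 2, 1), (3, 2, 1)]


def final_feature_map(size: int) -> int:
    """Width of the last feature map for a square input of the given size."""
    for kernel, stride, padding in DOWNSAMPLING_LAYERS:
        size = conv_out(size, kernel, stride, padding)
    return size


for size in (224, 256):
    print(size, "->", final_feature_map(size))
```

Each of the five layers halves the width, so the overall reduction is a factor of 32, which is where 224 / 32 = 7 comes from.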
33:34-35:06
How to check the real images and labels
* to be a really good practitioner is to look at your data
* how to look at your images
* data.show_batch(rows=3, figsize=(7,6))
* how to look at your labels
* print(data.classes)
- what is data.c of DataBunch
* number of classes for classification problem
* not for regression and other problems
35:06-37:25
How to build a CNN learner/model
what is a Learner?
- something that can learn to fit the data/model
what is ConvLearner?
- it creates a convolutional NN
- ConvLearner is replaced by create_cnn *
what is needed to make such a model?
- required
- DataBunch
- Model: resnet34 or resnet50
- metrics = is from kwargs *
How to pick between resnet34 and resnet50?
- always start with smaller one
- then see whether bigger is better
What is metrics
- things to print out during training
- e.g., error_rate
37:25-40:03
Why use a pretrained model (framework and parameters) for your CNN? in other words, What is transfer learning?
* the model resnet34 will be automatically downloaded if not already so
* what exactly is downloaded
* pretrained model with weights trained with ImageNet dataset
* why a pretrained model is useful?
* such model can recognize 1000 categories
* not the 37 cats and dogs,
* but it knows quite a lot about cats and dogs
* what is transfer learning?
* take a model which already can do something very well (1000 objects)
* make it do your thing well (37 cats dogs species)
* it also needs thousands of times less data to train your model
40:03-41:40
what is overfitting? why wouldn’t the model be cheating?
How do we know the model is not cheating?
- cheating means not learning the patterns that tell cricket from baseball
- but only memorizing the specific objects in the images
How to avoid cheating
- use a validation set which your model doesn’t see when training
what is the validation set used for?
- use the validation set to compute metrics and check fairly how good the model is
Where is validation set?
- automatically and directly baked into the DataBunch
- to enforce the best practice, so it is impossible to not use it
41:40-44:33
How to train the model with the best technique
we can use the function fit, but it is always better to use fit_one_cycle
What is the big deal of fit_one_cycle?
- a paper released in 2018
- more accurate and faster than any previous approach
- fastai incorporates the best current techniques *
keyboard shortcut for functions
* tab to use autocomplete for possible functions
* shift + tab to display all args for the function
how to pick the best number of epochs for training?
- learn how to tune epochs (4) in later lessons
- I don’t remember it had been discussed in later 6 lessons *
- not too many, otherwise easy overfit
44:28 - 46:42
How to find out how good is your model
46:47-48:41
How to get the most out of this course
What is the most common mistake or regret?
- spending too much time on concepts and theory
- spending too little time on notebooks and code
What your most important skill is about
- understanding what goes in
- and what comes out
48:41-52:53
The popularity of fastai library
* Why we say fastai library becomes very popular and important
* major cloud support fastai
* many researchers start to use fastai
* what is the best way to understand the fastai software well?
* docs.fast.ai
* How does fastai compare with keras
* the code is much shorter
* keras needs 31 lines in which you have to make a lot of decisions
* fastai needs 5 lines which make the decisions for you
* accuracy is much higher
* training time is much less
* cutting-edge research uses fastai to build models
* “the ImageNet moment” for NLP done with fastai
* github: “towards Natural Language Semantic code search”
* Where on the forum people talking about papers?
* Deep Learning section
52:45-65:51
What students achieved with fastai and this course
Sara Hooker
- student of the first course, economics (no background in coding)
- Delta Analytics: detecting chainsaws to prevent rainforest deforestation
- Google Brain researcher who publishes papers
- went to Africa to set up the first DL research center
- dug deep into the course and the Deep Learning book
Christine McLeavey Payne
- 2018 student
- OpenAI
- Clara: a neural net music generator
- background: math and …. too much to mention
- pick one project, do it really well and make it fantastic
Alexandre Cadrin
- could tell that MIT’s chest X-ray model was overfitting
- bring deep learning into your industry and expertise
Melissa Fabros
- English literature degree, became a Kiva engineer
- helps Kiva (micro-lending) recognize faces to reduce gender and racial bias
Karthik and envision
- after the course, started a startup named envision
- helps blind people use a phone to see what is ahead of them
Jeremy helped a small student team
- to beat the Google team in an ImageNet competition
Helena Saren? @glagoli…?
- combines her own artistic skills with an image generator
- style transfer
a student working as a Splunk engineer, to detect fraud
Francisco and the Language Model Zoo on the forum
- use NLP on different languages with different students
Don’t feel intimidated: ask for help and contribute
66:00 - 67:57
Why use Resnet rather than Inception
* DAWNBench on ImageNet classification
* “Resnet is good enough” for top 5 places
* edge computing
* but the most flexible way is to let your model in the cloud talk to your mobile app
* inception is memory intensive and not resilient
67:57-68:43
How to save a trained model
What is inside the trained model?
- updated weights
why do we need to save model?
- keep working and updating the previous weights
how to save a model?
- learn.save("stage-1")
where will the model be located?
- in the same folder as the data
68:43-73:22
how to plot top losses examples/images
How to create model interpreter?
- interp = ClassificationInterpretation.from_learner(learn)
Why plot the high losses?
- to find where our high-probability predictions are wrong
- they are the defects of our model
How to plot top losses using the interpreter?
- interp.plot_top_losses(9, figsize=(15,11))
How to read the output of the plotting and numbers?
- doc(interp.plot_top_losses) -> doc & source
What do those numbers on the plot mean?
- prediction, actual, loss, prob of actual (not prediction)
Why is the fastai source code very easy to read?
- that was the intention when writing it
- don’t be afraid to read the source
Why is it useful to see the top-loss images?
- figure out where the weak spots are
- error analysis * (Ng)
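The four numbers shown for each top-loss image (prediction, actual, loss, probability of the actual class) can be reproduced in a small standalone sketch. It assumes the loss is cross-entropy, i.e. the negative log of the probability assigned to the actual class; the image names and probabilities are invented, and this is not fastai's implementation.

```python
import math

# Hypothetical predictions: (image name, actual class, predicted probabilities).
predictions = [
    ("img_a", "beagle", {"beagle": 0.90, "pug": 0.10}),
    ("img_b", "pug",    {"beagle": 0.95, "pug": 0.05}),
    ("img_c", "pug",    {"beagle": 0.40, "pug": 0.60}),
]


def top_losses(preds, k):
    """Return the k rows with the highest cross-entropy loss."""
    rows = []
    for name, actual, probs in preds:
        predicted = max(probs, key=probs.get)
        prob_actual = probs[actual]
        loss = -math.log(prob_actual)  # cross-entropy for a single example
        rows.append((name, predicted, actual, loss, prob_actual))
    return sorted(rows, key=lambda r: r[3], reverse=True)[:k]


for name, predicted, actual, loss, prob in top_losses(predictions, 2):
    print(f"{name}: {predicted} / {actual} / {loss:.2f} / {prob:.2f}")
```

A confident wrong prediction gives the actual class a tiny probability, so its loss is large and it floats to the top of the list.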
73:20 -74:39
How to find out the most confused images of our model
* why do we need the confusion matrix to interpret the model?
* when not to use interp.plot_confusion_matrix(figsize=(12,12), dpi=60)?
* when to use interp.most_confused(min_val=2)?
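With many classes a full confusion matrix is hard to read, which is why a list of the most frequently confused pairs is handy. The sketch below shows the idea with a `Counter`; it is an illustration only, not fastai's code, and both the labels and the exact meaning of `min_val` (keep pairs confused at least that many times) are assumptions.

```python
from collections import Counter


def most_confused(actuals, predicted, min_val=1):
    """List (actual, predicted, count) for wrong predictions, most frequent first."""
    errors = Counter((a, p) for a, p in zip(actuals, predicted) if a != p)
    return [(a, p, n) for (a, p), n in errors.most_common() if n >= min_val]


# Hypothetical validation labels and model predictions.
actuals   = ["pug", "pug", "pug", "beagle", "beagle", "boxer"]
predicted = ["boxer", "boxer", "pug", "beagle", "pug", "boxer"]

print(most_confused(actuals, predicted, min_val=2))
```

Only the off-diagonal entries of the confusion matrix are kept, so correct predictions never appear in the result.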
74:37 - 76:26
How to improve our model
What is the default way of training?
- add a few layers at the end
- only train or update weights for the last few layers
What is the benefit of the default way?
- less likely to overfit
- much faster
How to train the whole model?
- unfreeze the model learn.unfreeze()
- train the entire model learn.fit_one_cycle(1)
why is it easy to ruin the model by training the whole model?
- the learning rate is more likely to be set too large for earlier layers *
- to understand it please see the next question
76:24 - 82:32
what is CNN actually learning and why previous full model training didn’t work
what is the plot of layer 1?
- coefficients? weights? filters
- finding some basic shapes
what are the plots of layer 2?
- 16 filters
- each filter is good at finding one type of pattern
What are the plots of layer3?
- 12 filters
- each is more complex patterns
What are inside plots of layer4 and layer 5?
- filters to find out even more complex patterns using previous layer patterns
Which layer’s filter pattern can be improved?
- less likely for layer 1
- maybe not layer 4-5
- probably much later layers should be changed to some extent
Why the previous full model train won’t work?
- the same learning rate is applied to earlier and later layers
82:32- 86:55
How to train the whole model in the right way
How do we go back to the model from before full training broke it?
- load the backup model
- learn.load('stage-1');
How to find the best learning rate?
- to find the fastest learning rate value
- learn.lr_find()
How to plot the result of learning rate finding?
- learn.recorder.plot()
how to read the learning rate plot?
- learn.unfreeze()
- learn.fit_one_cycle(2, max_lr=slice(1e-6,1e-4))
- how to find the lowest/fastest learning rate?
- find the lr value before the loss gets worse
- how to find the highest/slowest learning rate?
- 10x smaller than original learning rate
- how to give learning rate value to middle layers?
- distribute values equally to other middle layers
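One way to picture how a range such as slice(1e-6,1e-4) could be spread over the layer groups is shown below. This is a sketch under two assumptions: that the spacing is multiplicative (a constant ratio between neighbouring groups) and that there are three layer groups. The function name is made up, and it is worth checking the fastai source for the exact rule.

```python
def spread_learning_rates(lowest: float, highest: float, n_groups: int):
    """Spread learning rates across layer groups with a constant ratio."""
    if n_groups == 1:
        return [highest]
    step = (highest / lowest) ** (1 / (n_groups - 1))
    return [lowest * step ** i for i in range(n_groups)]


# Three layer groups is an assumption for illustration.
rates = spread_learning_rates(1e-6, 1e-4, 3)
for group, lr in enumerate(rates):
    print(f"layer group {group}: lr = {lr:.1e}")
```

The earliest layers get the smallest rate and the last layers the largest, which matches the point that early filters need the least change.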
Why you can’t win Kaggle easily?
- many fastai alumni compete on Kaggle
- this is the first thing they will try out
86:55-91:00
How to improve model with more layers
to use ResNet50 instead of ResNet34
- data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=299, bs=bs//2).normalize(imagenet_stats)
- learn = create_cnn(data, models.resnet50, metrics=error_rate)
what to do when GPU memory is tight?
- the model is too large and takes too much GPU memory
- with less than 8 GB of GPU memory you can’t run ResNet50
How to fix it?
- shrink the batch_size when creating the DataBunch
How good is 4% error rate for Pets dataset?
- compare to CatsDogs 3% error rate
- 4% for 37 similar looking species is extraordinary
Why ResNet50 still use the same lr range from ResNet34? *
- the lr plot looks different from that of ResNet34
- but why we still use the following code
- learn.fit_one_cycle(3, max_lr=slice(1e-6,1e-4))
- question asked on the forum
How to use most confused images to demonstrate model is already quite good?
- check out the most confused images online
- see whether humans can’t tell the difference either
- if so, then model is doing good enough
- it teaches you to become a domain expert
91:35-95:10
Different ways to put your data into DataBunch
How to use MNIST sample dataset?
* path = untar_data(URLs.MNIST_SAMPLE); path
How to create a DataBunch when the labels are in folder names?
- data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=26)
How to check the images and labels?
How to read from CSV?
- df = pd.read_csv(path/'labels.csv')
How to create a DataBunch when the labels are in a CSV file?
- data = ImageDataBunch.from_csv(path, ds_tfms=tfms, size=28)
How to create a DataBunch when the labels are in a dataframe?
- data = ImageDataBunch.from_df(path, df, ds_tfms=tfms, size=24)
How to create a DataBunch when the labels are in the filename?
- data = ImageDataBunch.from_name_re(path, fn_paths, pat=pat, ds_tfms=tfms, size=24)
How to create a DataBunch when the labels are in the filename, using a function?
- data = ImageDataBunch.from_name_func(path, fn_paths, ds_tfms=tfms, size=24, label_func = lambda x: '3' if '/3/' in str(x) else '7')
How to create a DataBunch when the labels are in a list?
- labels = [('3' if '/3/' in str(x) else '7') for x in fn_paths]
- data = ImageDataBunch.from_lists(path, fn_paths, labels=labels, ds_tfms=tfms, size=24)
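The labelling logic used by the last two variants can be tried on its own, without fastai. The file paths below are hypothetical, laid out like the MNIST sample with one folder per digit.

```python
from pathlib import PurePosixPath


def label_func(path) -> str:
    """Label an image '3' if it sits in a '3' folder, otherwise '7'."""
    return '3' if '/3/' in str(path) else '7'


# Hypothetical file paths in the layout train/<label>/<file>.
fn_paths = [
    PurePosixPath("mnist_sample/train/3/7463.png"),
    PurePosixPath("mnist_sample/train/7/21102.png"),
]
labels = [label_func(p) for p in fn_paths]
print(labels)
```

The same function can be passed as `label_func`, or applied up front to build the list of labels, which is the only difference between the two approaches.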
95:10-97:28
How to make the most out of documents
- Experiment with the doc notebooks
- How do I do better?
97:28-98:09
QA on fastai with multi-GPU, 3D data
98:09-end
An interesting and inspiring project
how to transform mouse movement into images
then train it with CNN
start - 4:26
how to use forum and contribute to fastai
resources
* How to contribute to fastai - Part 1 (2019)
* Doc Maintenance | fastai
where is the most important information at forum?
- official updates and resources
- start from here
how not to be intimidated by the overwhelming forum
- click summary button
- with [kaggle](https://course.fast.ai/update_kaggle.html)
- click into kernels
- with your local workplace
- `git pull`
- `conda update conda` outside conda environment
- `conda install -c fastai fastai`
4:26-12:56
What students have done after the first week
- use NN to clean up images downloaded from WhatsApp
- use NN to beat the state of the art on recognizing background noise
- new state-of-the-art performance on DHCD language recognition
- turn point mutations of tumors into images and beat the state of the art
- automatically playing science communication games with transfer learning and fastai
- James Delinger: do useful things without reading math equations (greek)
- Daniel R. Armstrong: want to contribute to the library, step by step, you will get there
- project to classify zucchinis (39 images) and cucumbers (47 images)
- use PCA to create a hairless classifier for dogs and cats
- classifier for new and old special buses
- models classify 110 cities from satellite images
- models to classify complete and incomplete construction sites
12:56 - 16:20
What is the course structure and teaching philosophy
* recursive learning in curriculum
* Perkins’s theory (chinese version)
* code first
* whole game with videos
* concepts not details
* keep moving forward
16:20-23:47
How to create your own dataset for classifier
inspired by PyImageSearch, great resources
project to classify teddy bear, grizzly bear and black bear
search “teddy bear” in google image
- ctrl+shift+j or cmd+opt+j
paste the code and save the image urls into a file in your directory
how to create three sets of folders experimentally
- create variables for a folder and url.txt
- create the folder path
- download the images into the folder
- do it three times for three kinds of bears
How to verify images that are problematic with `verify_images`?
23:47-25:42
How to create DataBunch from a single folder of images
- how to set the training set from the single folder
- how to split into a validation set from the single folder
- why set random seed before creating DataBunch?
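The reason for fixing the seed is that the validation set is chosen at random from the single folder, and it should be the same images on every run so that results are comparable. A minimal sketch of that idea, using only the standard library (the function name, file names and the 20% split are illustrative assumptions, not fastai's code):

```python
import random


def split_train_valid(items, valid_pct=0.2, seed=42):
    """Shuffle with a fixed seed, then hold out valid_pct of the items."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    return shuffled[n_valid:], shuffled[:n_valid]


# Hypothetical file names standing in for the downloaded bear images.
files = [f"bear_{i}.jpg" for i in range(10)]
train_a, valid_a = split_train_valid(files)
train_b, valid_b = split_train_valid(files)
print(valid_a == valid_b)
```

Without a fixed seed, each run would validate on different images, and images held out in one run could leak into training in the next.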
25:42-26:49
How to check images, labels, and sizes of train and validation set
* How to display images from a batch
* How to check labels and classes
* How to count the size of train_ds and valid_ds?
26:49-27:41
How to train and save the model
- how to create a CNN model with ResNet34 and plot error-rate
- how to train the model for 4 epochs
- how to save the trained model
27:36-29:39
Video segment
29:39-29:57
How to interpret the model
how to read most confused matrix?
29:57-31:31
Noisy data and model output
What does noisy data mean?
- such as mislabelled data
What problem could noisy data cause for the model?
- mislabelled data are unlikely to be predicted correctly with high confidence
- so the data with the highest losses are likely to be mislabelled
Solution approach
- combine a domain expert with machine automation
31:31-35:32
How to clean up noisy data with widget
How to work with widget to clean mislabelled data manually?
35:12-37:37
How to build an ipywidget for your notebook
how to read the source code of the widget?
how to build a tool for notebook experimenter?
Exciting to create tools for fellow practitioners
encouraged to dig into the ipywidget docs
not a production web app
37:35-38:32
What is biased noise?
* most of the time, after removing mislabelled data, the model improves only a little
* this is normal, as the model can handle some level of noise itself
* what is toxic is biased noise, not randomly noisy data
38:32-45:50
How to put model into production web app
* why to run production on CPU not GPU?
* the time difference between CPU web app vs GPU server is 0.2 vs 0.01s
* how to prepare your model for production use?
* it is very easy and free to use with some instruction on course wiki
* try to make all your classifier into web apps
46:05-53:09
99% of the time, what we need to fine-tune for CV is lr and epochs
experiment with what happens when lr is very high
- no way to undo it, have to recreate the model
experiment with what happens when lr is too low
- loss goes down very slowly
validation loss is lower than training loss
- lr is too low
- too few epochs
too many epochs
- overfitting - to learn specific images of teddy bears
- signal - loss goes down but goes up again
- but it is difficult to make our model to overfit
53:09-62:15
what is the math behind an image and its classification
what is behind learn.predict source
what does np.argmax do
what is error_rate source code?
what is behind accuracy function?
which dataset does metric apply to?
doc is not just a nice printing of ?, because it may have examples
why use the 3 of 3e-5 often?
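The questions above about np.argmax, accuracy and error_rate fit together in a few lines. This is a plain-Python sketch of the idea rather than the fastai source, and the probabilities and targets are invented.

```python
def argmax(values):
    """Index of the largest value, like np.argmax on a 1-D array."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(prob_rows, targets):
    """Fraction of rows whose most probable class equals the target."""
    correct = sum(argmax(row) == t for row, t in zip(prob_rows, targets))
    return correct / len(targets)


def error_rate(prob_rows, targets):
    """Error rate is simply 1 - accuracy."""
    return 1 - accuracy(prob_rows, targets)


# Hypothetical predicted probabilities for 4 images over 3 classes.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.6, 0.3, 0.1]]
targets = [0, 1, 2, 1]
print(accuracy(probs, targets), error_rate(probs, targets))
```

The model outputs one probability per class, argmax turns that into a predicted class index, and the metric compares those indices with the labels.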
62:15- 68:23
what is linear function, and how matrix multiplication fit in
* KhanAcademy for basics and advanced math
* to replace b with a_2*x_2
* there are lots of examples (x1, y1), (x2, y2), …
* Rachel’s best linear algebra course
* vectorization, dot product, matrix product to avoid loop and speed up
* matrix multiplication in visualization
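The trick of replacing b with a_2*x_2 (where x_2 is always 1) is what lets the whole linear function be written as one matrix product. A small sketch in plain Python, with made-up coefficients and inputs:

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(X, a):
    """Matrix-vector product: one dot product per row of X."""
    return [dot(row, a) for row in X]


# y = a_1 * x_1 + a_2 * x_2 with x_2 fixed at 1, so a_2 plays the role of b.
a = [3.0, 2.0]
X = [[x1, 1.0] for x1 in (-1.0, 0.0, 2.0)]
print(matvec(X, a))
```

In PyTorch or NumPy the same computation is written `X @ a`, which avoids the Python loop and runs much faster.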
68:32-74:14
QA on data size, unbalanced data, model framework and weights
How do we know we don’t have enough data
* the lr is already good and can’t go a little higher or lower
* if training for a few more epochs makes the validation loss worse
* then we may need to get more data
* most time you need less data than you think
How do you deal with unbalanced data?
- do nothing, it always works
What is ResNet34 as function?
- function framework without number or weights
- pretrained model with weights
74:14-101:10
How to create the simplest NN (tensor, rank)
what is the simplest architecture?
what is SGD?
how to generate some data for a simple linear function?
how to use matrix product @ to create the linear architecture?
what is a tensor?
- array
what is a rank?
- rank 1 tensor is a vector
how to create the X features?
how to create the coefficients or the weights?
how to plot the x and y (ignoring x_2 as it is just 1)?
what about matplotlib?
how to create MSE function?
how to do scatter plot?
how to do Gradient Descent?
how to calculate derivative with Pytorch?
101:10-105:47
why do we need learning rate at all
- derivative tells us direction and how much
- but it may not best reduce the loss
- we need learning rate to help get loss down appropriately
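The role of the learning rate can be seen by running plain gradient descent on a tiny linear problem with three different rates. The data, starting point and rates below are invented for illustration; the derivative is written out by hand instead of using PyTorch autograd.

```python
# Made-up data on the line y = 3x + 2.
XS = [i / 10 for i in range(-10, 11)]
YS = [3 * x + 2 for x in XS]


def mse(a, xs, ys):
    """Mean squared error of the line y = a[0] * x + a[1]."""
    return sum((a[0] * x + a[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(a, xs, ys):
    """Derivative of the MSE with respect to a[0] and a[1]."""
    n = len(xs)
    errors = [a[0] * x + a[1] - y for x, y in zip(xs, ys)]
    return [2 * sum(e * x for e, x in zip(errors, xs)) / n,
            2 * sum(errors) / n]


def train(lr, steps=200):
    """Run gradient descent from a = [0, 0] and return the final loss."""
    a = [0.0, 0.0]
    for _ in range(steps):
        g = gradient(a, XS, YS)
        a = [a[0] - lr * g[0], a[1] - lr * g[1]]
    return mse(a, XS, YS)


for lr in (0.001, 0.1, 1.5):
    print(f"lr={lr}: final loss {train(lr):.3g}")
```

The gradient gives the direction, and the learning rate decides the step size: too small and the loss barely moves in 200 steps, too large and every step overshoots so the loss grows instead of shrinking.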
108:09-109:49
why mini-batches makes training more efficient
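With mini-batches the weights are updated after every small batch instead of once per pass over the whole dataset, so one epoch yields many updates. A minimal sketch on the same kind of made-up linear data (function names, batch size and learning rate are illustrative assumptions):

```python
import random


def minibatches(data, batch_size, seed):
    """Yield shuffled mini-batches that together cover the data once."""
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]


def sgd_epoch(a, data, lr, batch_size, seed):
    """One epoch of SGD on y = a[0] * x + a[1]: one update per mini-batch."""
    for batch in minibatches(data, batch_size, seed):
        errors = [(a[0] * x + a[1] - y, x) for x, y in batch]
        g0 = 2 * sum(e * x for e, x in errors) / len(batch)
        g1 = 2 * sum(e for e, _ in errors) / len(batch)
        a = [a[0] - lr * g0, a[1] - lr * g1]
    return a


# Made-up data on the line y = 3x + 2.
data = [(i / 10, 3 * (i / 10) + 2) for i in range(-10, 11)]
a = [0.0, 0.0]
for epoch in range(100):
    a = sgd_epoch(a, data, lr=0.1, batch_size=5, seed=epoch)
print([round(v, 3) for v in a])
```

Each update only looks at 5 points, so it is cheap, yet the estimates still head toward the coefficients that generated the data.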
109:49-111:43
What new vocabulary was learnt?
Learning rate
epoch: too many epochs, easily overfit
mini batch: more efficient than full batch training
SGD : GD with mini-batch
Model/Architecture: y = x@a, Resnet34, matrix product
parameters: weights
loss function
114:43-end
what is overfitting and regularization and validation set
- what is training dataset on the graph?
- which model/graph is underfitting the training set?
- doing bad, having worse loss
- which model/graph is overfitting the training set?
- doing good, having low loss
- both are different from the right model
- both have bad loss on new/validation dataset
- false assumption
- more parameters -> overfitting
- less parameters -> underfitting
- truth
- overfitting and underfitting -> nothing to do with parameter number
- boss and org
- the training set can tell underfitting from overfitting and OK models
- the validation set can tell an overfitting model from an OK model
- use a validation set to avoid being sold snake oil
- further study
- Rachel’s blog post
- Rachel’s courses
In this lesson we will build our first image classifier from scratch, and see if we can achieve world-class results. Let’s dive in!
Every notebook starts with the following three lines; they ensure that any edits to libraries you make are reloaded here automatically, and also that any charts or images displayed are shown in this notebook.
%reload_ext autoreload
%autoreload 2
%matplotlib inline
We import all the necessary packages. We are going to work with the fastai V1 library which sits on top of Pytorch 1.0. The fastai library provides many useful functions that enable us to quickly and easily build neural networks and train our models.
from fastai.vision import *
from fastai.metrics import error_rate
If you’re using a computer with an unusually small GPU, you may get an out of memory error when running this notebook. If this happens, click Kernel->Restart, uncomment the 2nd line below to use a smaller batch size (you’ll learn all about what this means during the course), and try again.
bs = 64
# bs = 16 # uncomment this line if you run out of memory even after clicking Kernel->Restart
We are going to use the Oxford-IIIT Pet Dataset by O. M. Parkhi et al., 2012 which features 12 cat breeds and 25 dog breeds. Our model will need to learn to differentiate between these 37 distinct categories. According to their paper, the best accuracy they could get in 2012 was 59.21%, using a complex model that was specific to pet detection, with separate “Image”, “Head”, and “Body” models for the pet photos. Let’s see how accurate we can be using deep learning!
We are going to use the untar_data function to which we must pass a URL as an argument and which will download and extract the data.
help(untar_data)
Help on function untar_data in module fastai.datasets:
untar_data(url: str, fname: Union[pathlib.Path, str] = None, dest: Union[pathlib.Path, str] = None, data=True, force_download=False) -> pathlib.Path
Download `url` to `fname` if it doesn't exist, and un-tgz to folder `dest`.
path = untar_data(URLs.PETS); path
PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet')
path.ls()
[PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images'),
PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/annotations')]
path_anno = path/'annotations'
path_img = path/'images'
The first thing we do when we approach a problem is to take a look at the data. We always need to understand very well what the problem is and what the data looks like before we can figure out how to solve it. Taking a look at the data means understanding how the data directories are structured, what the labels are and what some sample images look like.
The main difference between the handling of image classification datasets is the way labels are stored. In this particular dataset, labels are stored in the filenames themselves. We will need to extract them to be able to classify the images into the correct categories. Fortunately, the fastai library has a handy function made exactly for this: ImageDataBunch.from_name_re gets the labels from the filenames using a regular expression.
fnames = get_image_files(path_img)
fnames[:5]
[PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/saint_bernard_188.jpg'),
PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/staffordshire_bull_terrier_114.jpg'),
PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/Persian_144.jpg'),
PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/Maine_Coon_268.jpg'),
PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/newfoundland_95.jpg')]
np.random.seed(2)
pat = r'/([^/]+)_\d+.jpg$'
data = ImageDataBunch.from_name_re(path_img,
fnames,
pat,
ds_tfms=get_transforms(),
size=224,
bs=bs
).normalize(imagenet_stats)
data.show_batch(rows=3, figsize=(7,6))
print(data.classes)
len(data.classes),data.c
['Abyssinian', 'Bengal', 'Birman', 'Bombay', 'British_Shorthair', 'Egyptian_Mau', 'Maine_Coon', 'Persian', 'Ragdoll', 'Russian_Blue', 'Siamese', 'Sphynx', 'american_bulldog', 'american_pit_bull_terrier', 'basset_hound', 'beagle', 'boxer', 'chihuahua', 'english_cocker_spaniel', 'english_setter', 'german_shorthaired', 'great_pyrenees', 'havanese', 'japanese_chin', 'keeshond', 'leonberger', 'miniature_pinscher', 'newfoundland', 'pomeranian', 'pug', 'saint_bernard', 'samoyed', 'scottish_terrier', 'shiba_inu', 'staffordshire_bull_terrier', 'wheaten_terrier', 'yorkshire_terrier']
(37, 37)
Now we will start training our model. We will use a convolutional neural network backbone and a fully connected head with a single hidden layer as a classifier. Don’t know what these things mean? Not to worry, we will dive deeper in the coming lessons. For the moment you need to know that we are building a model which will take images as input and will output the predicted probability for each of the categories (in this case, it will have 37 outputs).
We will train for 4 epochs (4 cycles through all our data).
learn = create_cnn(data, models.resnet34, metrics=error_rate)
learn.model
Sequential(
(0): Sequential(
(0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace)
(3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
(4): Sequential(
(0): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(1): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(5): Sequential(
(0): BasicBlock(
(conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(6): Sequential(
(0): BasicBlock(
(conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(4): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(5): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(7): Sequential(
(0): BasicBlock(
(conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
(1): Sequential(
(0): AdaptiveConcatPool2d(
(ap): AdaptiveAvgPool2d(output_size=1)
(mp): AdaptiveMaxPool2d(output_size=1)
)
(1): Flatten()
(2): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): Dropout(p=0.25)
(4): Linear(in_features=1024, out_features=512, bias=True)
(5): ReLU(inplace)
(6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): Dropout(p=0.5)
(8): Linear(in_features=512, out_features=37, bias=True)
)
)
learn.fit_one_cycle(4)
Total time: 01:46
epoch | train_loss | valid_loss | error_rate |
---|---|---|---|
1 | 1.409939 | 0.357608 | 0.102165 |
2 | 0.539408 | 0.242496 | 0.073072 |
3 | 0.340212 | 0.221338 | 0.066306 |
4 | 0.261859 | 0.216619 | 0.071042 |
learn.save('stage-1')
Let’s see what results we have got.
We will first see which were the categories that the model most confused with one another. We will try to see if what the model predicted was reasonable or not. In this case the mistakes look reasonable (none of the mistakes seems obviously naive). This is an indicator that our classifier is working correctly.
Furthermore, when we plot the confusion matrix, we can see that the distribution is heavily skewed: the model makes the same mistakes over and over again but it rarely confuses other categories. This suggests that it just finds it difficult to distinguish some specific categories between each other; this is normal behaviour.
interp = ClassificationInterpretation.from_learner(learn)
losses,idxs = interp.top_losses()
len(data.valid_ds)==len(losses)==len(idxs)
True
doc(interp.plot_top_losses)
interp.plot_confusion_matrix(figsize=(12,12), dpi=60)
interp.most_confused(min_val=2)
[('British_Shorthair', 'Russian_Blue', 5),
('Ragdoll', 'Birman', 5),
('staffordshire_bull_terrier', 'american_pit_bull_terrier', 5),
('Birman', 'Ragdoll', 3),
('Birman', 'Siamese', 3),
('Persian', 'Maine_Coon', 3),
('Persian', 'Ragdoll', 3),
('Russian_Blue', 'British_Shorthair', 3),
('american_bulldog', 'american_pit_bull_terrier', 3),
('american_pit_bull_terrier', 'staffordshire_bull_terrier', 3),
('chihuahua', 'miniature_pinscher', 3)]
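To make the `most_confused` output above easier to read, here is a rough sketch of the idea: count the (actual, predicted) pairs among the wrong predictions and sort them by frequency. This is not fastai's implementation; the helper name `most_confused_pairs` and the toy labels are made up for illustration.

```python
from collections import Counter

def most_confused_pairs(actual, predicted, min_val=1):
    """Return (actual, predicted, count) for wrong predictions, most frequent first."""
    # Count only the pairs where the prediction disagrees with the label.
    errors = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
    pairs = [(a, p, n) for (a, p), n in errors.items() if n >= min_val]
    # Sort by count (descending), then by names so the order is stable.
    return sorted(pairs, key=lambda t: (-t[2], t[0], t[1]))

# Toy labels (hypothetical, not taken from the pets run above).
actual    = ["Ragdoll", "Ragdoll", "Birman", "Persian", "Ragdoll", "Birman"]
predicted = ["Birman", "Birman", "Ragdoll", "Persian", "Ragdoll", "Siamese"]

result = most_confused_pairs(actual, predicted, min_val=1)
print(result)
```

Each tuple reads as (true class, predicted class, number of times), which is how the list above should be read as well.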
Since our model is working as we expect it to, we will unfreeze our model and train some more.
learn.unfreeze()
learn.fit_one_cycle(1)
Total time: 00:26
epoch | train_loss | valid_loss | error_rate |
---|---|---|---|
1 | 0.558166 | 0.314579 | 0.101489 |
learn.load('stage-1');
learn.lr_find()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
learn.recorder.plot()
learn.unfreeze()
learn.fit_one_cycle(2, max_lr=slice(1e-6,1e-4))
Total time: 00:53
epoch | train_loss | valid_loss | error_rate |
---|---|---|---|
1 | 0.242544 | 0.208489 | 0.067659 |
2 | 0.206940 | 0.204482 | 0.062246 |
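The `max_lr=slice(1e-6,1e-4)` argument gives the earliest layers the smallest learning rate and the last layers the largest. As far as I understand fastai v1, the values in between are spread multiplicatively across the layer groups; the sketch below shows that idea only. The function name `even_mults` and the choice of 3 layer groups are assumptions made for illustration.

```python
def even_mults(start, stop, n):
    """Return n values from start to stop, evenly spaced on a log scale."""
    if n == 1:
        return [stop]
    step = (stop / start) ** (1 / (n - 1))
    return [start * step ** i for i in range(n)]

# Assumption: the model is split into 3 layer groups (early, middle, head).
lrs = even_mults(1e-6, 1e-4, 3)
print(lrs)
```

With these numbers the middle group lands near 1e-5, ten times the first group and a tenth of the last.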
That’s a pretty accurate model!
Now we will train in the same way as before but with one caveat: instead of using resnet34 as our backbone we will use resnet50. (resnet34 is a 34-layer residual network while resnet50 has 50 layers. This will be explained later in the course, and you can learn the details in the resnet paper.)
Basically, resnet50 usually performs better because it is a deeper network with more parameters. Let’s see if we can achieve higher performance here. To help it along, let’s use larger images too, since that way the network can see more detail. We reduce the batch size a bit since otherwise this larger network will require more GPU memory.
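A back-of-the-envelope check of why the batch size is halved, assuming activation memory grows roughly with batch size times image height times width. The baseline of 224-pixel images with `bs=64` is an assumption for illustration, and the sketch ignores the extra memory the deeper resnet50 needs on its own.

```python
def relative_activation_cost(size, bs, base_size=224, base_bs=64):
    """Activation memory relative to the baseline, assuming it scales with bs * H * W."""
    return (bs * size * size) / (base_bs * base_size * base_size)

same_bs = round(relative_activation_cost(299, 64), 2)   # larger images, same batch size
half_bs = round(relative_activation_cost(299, 32), 2)   # larger images, half the batch size
print(same_bs, half_bs)
```

Under that assumption, going to 299-pixel images alone costs noticeably more memory than the baseline, while halving the batch size brings the estimate back below it.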
data = ImageDataBunch.from_name_re(path_img,
fnames,
pat,
ds_tfms=get_transforms(),
size=299,
bs=bs//2).normalize(imagenet_stats)
learn = create_cnn(data, models.resnet50, metrics=error_rate)
learn.lr_find()
learn.recorder.plot()
LR Finder complete, type {learner_name}.recorder.plot() to see the graph.
learn.fit_one_cycle(8)
Total time: 06:59
epoch train_loss valid_loss error_rate
1 0.548006 0.268912 0.076455 (00:57)
2 0.365533 0.193667 0.064953 (00:51)
3 0.336032 0.211020 0.073072 (00:51)
4 0.263173 0.212025 0.060893 (00:51)
5 0.217016 0.183195 0.063599 (00:51)
6 0.161002 0.167274 0.048038 (00:51)
7 0.086668 0.143490 0.044655 (00:51)
8 0.082288 0.154927 0.046008 (00:51)
learn.save('stage-1-50')
It’s astonishing that it’s possible to recognize pet breeds so accurately! Let’s see if full fine-tuning helps:
learn.unfreeze()
learn.fit_one_cycle(3, max_lr=slice(1e-6,1e-4))
Total time: 03:27
epoch train_loss valid_loss error_rate
1 0.097319 0.155017 0.048038 (01:10)
2 0.074885 0.144853 0.044655 (01:08)
3 0.063509 0.144917 0.043978 (01:08)
learn.load('stage-1-50');
interp = ClassificationInterpretation.from_learner(learn)
interp.most_confused(min_val=2)
[('american_pit_bull_terrier', 'staffordshire_bull_terrier', 6),
('Bengal', 'Egyptian_Mau', 5),
('Bengal', 'Abyssinian', 4),
('boxer', 'american_bulldog', 4),
('Ragdoll', 'Birman', 4),
('Egyptian_Mau', 'Bengal', 3)]
path = untar_data(URLs.MNIST_SAMPLE); path
PosixPath('/home/ubuntu/course-v3/nbs/dl1/data/mnist_sample')
tfms = get_transforms(do_flip=False)
data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=26)
data.show_batch(rows=3, figsize=(5,5))
learn = create_cnn(data, models.resnet18, metrics=accuracy)
learn.fit(2)
Total time: 00:23
epoch train_loss valid_loss accuracy
1 0.116117 0.029745 0.991168 (00:12)
2 0.056860 0.015974 0.994603 (00:10)
df = pd.read_csv(path/'labels.csv')
df.head()
 | name | label |
---|---|---|
0 | train/3/7463.png | 0 |
1 | train/3/21102.png | 0 |
2 | train/3/31559.png | 0 |
3 | train/3/46882.png | 0 |
4 | train/3/26209.png | 0 |
data = ImageDataBunch.from_csv(path, ds_tfms=tfms, size=28)
data.show_batch(rows=3, figsize=(5,5))
data.classes
[0, 1]
data = ImageDataBunch.from_df(path, df, ds_tfms=tfms, size=24)
data.classes
[0, 1]
fn_paths = [path/name for name in df['name']]; fn_paths[:2]
[PosixPath('/home/ubuntu/course-v3/nbs/dl1/data/mnist_sample/train/3/7463.png'),
PosixPath('/home/ubuntu/course-v3/nbs/dl1/data/mnist_sample/train/3/21102.png')]
pat = r"/(\d)/\d+\.png$"
data = ImageDataBunch.from_name_re(path, fn_paths, pat=pat, ds_tfms=tfms, size=24)
data.classes
['3', '7']
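The pattern passed to `from_name_re` captures the digit folder name and uses it as the label. A minimal sketch with the standard `re` module shows what the capture group returns; the helper `label_from_path` and the second file name are hypothetical.

```python
import re

pat = r"/(\d)/\d+\.png$"

def label_from_path(path, pattern=pat):
    """Return the first capture group of `pattern` found in `path`."""
    match = re.search(pattern, str(path))
    if match is None:
        raise ValueError(f"pattern did not match: {path}")
    return match.group(1)

paths = [
    "/home/ubuntu/course-v3/nbs/dl1/data/mnist_sample/train/3/7463.png",
    "/home/ubuntu/course-v3/nbs/dl1/data/mnist_sample/train/7/1234.png",  # hypothetical file name
]
labels = [label_from_path(p) for p in paths]
print(labels)
```

The labels come out as strings, which matches the `['3', '7']` classes shown above rather than the integer `[0, 1]` classes read from the csv.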
data = ImageDataBunch.from_name_func(path, fn_paths, ds_tfms=tfms, size=24,
label_func = lambda x: '3' if '/3/' in str(x) else '7')
data.classes
['3', '7']
labels = [('3' if '/3/' in str(x) else '7') for x in fn_paths]
labels[:5]
['3', '3', '3', '3', '3']
data = ImageDataBunch.from_lists(path, fn_paths, labels=labels, ds_tfms=tfms, size=24)
data.classes
['3', '7']
by: Francisco Ingham and Jeremy Howard. Inspired by Adrian Rosebrock
In this tutorial we will see how to easily create an image dataset through Google Images. Note: You will have to repeat these steps for any new category you want to Google (e.g. once for dogs and once for cats).
from fastai.vision import *
Go to Google Images and search for the images you are interested in. The more specific you are in your Google Search, the better the results and the less manual pruning you will have to do.
Scroll down until you’ve seen all the images you want to download, or until you see a button that says ‘Show more results’. All the images you scrolled past are now available to download. To get more, click on the button, and continue scrolling. The maximum number of images Google Images shows is 700.
It is a good idea to put things you want to exclude into the search query, for instance if you are searching for the Eurasian wolf, “canis lupus lupus”, it might be a good idea to exclude other variants:
"canis lupus lupus" -dog -arctos -familiaris -baileyi -occidentalis
You can also limit your results to show only photos by clicking on Tools and selecting Photos from the Type dropdown.
Now you must run some JavaScript code in your browser which will save the URLs of all the images you want for your dataset.
Press Ctrl+Shift+J on Windows/Linux or Cmd+Opt+J on Mac, and a small window, the JavaScript ‘Console’, will appear. That is where you will paste the JavaScript commands.
You will need to get the urls of each of the images. You can do this by running the following commands:
urls = Array.from(document.querySelectorAll('.rg_di .rg_meta')).map(el=>JSON.parse(el.textContent).ou);
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));
Choose an appropriate name for your labeled images. You can run these steps multiple times to create different labels.
Use one folder and one text file of URLs per category.
folder = 'black'
file = 'urls_black.txt'
folder = 'teddys'
file = 'urls_teddys.txt'
folder = 'grizzly'
file = 'urls_grizzly.txt'
You will need to run this cell once for each category.
path = Path('data/bears')
dest = path/folder
dest.mkdir(parents=True, exist_ok=True)
path.ls()
[PosixPath('data/bears/urls_teddy.txt'),
PosixPath('data/bears/black'),
PosixPath('data/bears/urls_grizzly.txt'),
PosixPath('data/bears/urls_black.txt')]
Finally, upload your urls file. You just need to press ‘Upload’ in your working directory and select your file, then click ‘Upload’ for each of the displayed files. On a cloud-hosted notebook, use the notebook’s own ‘Upload’ button.
Now you will need to download your images from their respective urls.
fast.ai has a function that allows you to do just that. You just have to specify the urls filename as well as the destination folder and this function will download and save all images that can be opened. If they have some problem in being opened, they will not be saved.
Let’s download our images! Notice you can choose a maximum number of images to be downloaded. In this case we will not download all the urls.
You will need to run this line once for every category.
classes = ['teddys','grizzly','black']
download_images(path/file, dest, max_pics=200)
# If you have problems downloading, try with `max_workers=0` to see exceptions:
download_images(path/file, dest, max_pics=20, max_workers=0)
Then we can remove any images that can’t be opened:
for c in classes:
    print(c)
    verify_images(path/c, delete=True, max_size=500)
np.random.seed(42)
data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2,
ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)
# If you already cleaned your data, run this cell instead of the one before
# np.random.seed(42)
# data = ImageDataBunch.from_csv(".", folder=".", valid_pct=0.2, csv_labels='cleaned.csv',
# ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)
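Setting the random seed before asking for `valid_pct=0.2` means the same images end up in the validation set on every run, so results stay comparable. The sketch below shows the idea with the standard library only; it is not how fastai performs the split internally, and the file names are hypothetical.

```python
import random

def random_split(items, valid_pct=0.2, seed=42):
    """Shuffle items with a fixed seed and hold out valid_pct of them for validation."""
    rng = random.Random(seed)          # fixed seed -> same split on every run
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    return shuffled[n_valid:], shuffled[:n_valid]

files = [f"img_{i:04d}.jpg" for i in range(559)]   # hypothetical file names
train, valid = random_split(files)
print(len(train), len(valid))
```

Calling `random_split(files)` a second time returns exactly the same two lists, which is the property the seed is there to guarantee.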
Good! Let’s take a look at some of our pictures then.
data.classes
['black', 'grizzly', 'teddys']
data.classes, data.c, len(data.train_ds), len(data.valid_ds)
(['black', 'grizzly', 'teddys'], 3, 448, 111)
learn = create_cnn(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(4)
learn.save('stage-1')
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(2, max_lr=slice(3e-5,3e-4))
learn.save('stage-2')
learn.load('stage-2');
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
Some of our top losses aren’t due to bad performance by our model. There are images in our data set that shouldn’t be there, and some of the high losses are caused by mislabeled images.
Using the `ImageCleaner` widget from `fastai.widgets` we can prune our top losses, removing photos that don’t belong.
from fastai.widgets import *
First we need to get the file paths from our top_losses. We can do this with `.from_toplosses`. We then feed the top losses indexes and corresponding dataset to `ImageCleaner`.
Notice that the widget will not delete images directly from disk; instead it creates a new csv file, `cleaned.csv`, from which you can create a new ImageDataBunch with the corrected labels to continue training your model.
ds, idxs = DatasetFormatter().from_toplosses(learn, ds_type=DatasetType.Valid)
`ImageCleaner` displays these images so they can be pruned:
ImageCleaner(ds, idxs, path)
'No images to show :)'
Flag photos for deletion by clicking ‘Delete’. Then click ‘Next Batch’ to delete flagged photos and keep the rest in that row. `ImageCleaner` will show you a new row of images until there are no more to show. In this case, the widget will show you images until there are none left from `top_losses.ImageCleaner(ds, idxs)`.
You can also find duplicates in your dataset and delete them! To do this, you need to run `.from_similars` to get the potential duplicates’ ids and then run `ImageCleaner` with `duplicates=True`. The API works in a similar way as with misclassified images: just choose the ones you want to delete and click ‘Next Batch’ until there are no more images left.
ds, idxs = DatasetFormatter().from_similars(learn, ds_type=DatasetType.Valid)
ImageCleaner(ds, idxs, path, duplicates=True)
'No images to show :)'
Remember to recreate your ImageDataBunch from your `cleaned.csv` to include the changes you made in your data!
First things first, let’s export the content of our `Learner` object for production:
learn.export()
This will create a file named ‘export.pkl’ in the directory where we were working that contains everything we need to deploy our model (the model, the weights but also some metadata like the classes or the transforms/normalization used).
You probably want to use CPU for inference, except at massive scale (and you almost certainly don’t need to train in real-time). If you don’t have a GPU that happens automatically. You can test your model on CPU like so:
defaults.device = torch.device('cpu')
img = open_image(path/'black'/'00000021.jpg')
img
We create our `Learner` in the production environment like this; just make sure that `path` contains the file ‘export.pkl’ from before.
learn = load_learner(path)
pred_class,pred_idx,outputs = learn.predict(img)
pred_class
Category black
So you might create a route something like this (thanks to Simon Willison for the structure of this code):
@app.route("/classify-url", methods=["GET"])
async def classify_url(request):
    # Download the image bytes from the URL given in the query string
    img_bytes = await get_bytes(request.query_params["url"])
    img = open_image(BytesIO(img_bytes))
    # The third value returned by predict holds one score per class
    _, _, losses = learner.predict(img)
    return JSONResponse({
        "predictions": sorted(
            zip(learner.data.classes, map(float, losses)),
            key=lambda p: p[1],
            reverse=True
        )
    })
(This example is for the Starlette web app toolkit.)
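The part of the route that builds the response can be tried on its own without a web server. The sketch below pairs each class with its score and sorts from most to least likely; the helper name `rank_predictions` and the scores are hypothetical.

```python
def rank_predictions(classes, scores):
    """Pair each class with its score and sort from most to least likely."""
    return sorted(zip(classes, map(float, scores)), key=lambda p: p[1], reverse=True)

classes = ["black", "grizzly", "teddys"]
scores = [0.07, 0.91, 0.02]            # hypothetical model outputs
ranked = rank_predictions(classes, scores)
print(ranked)
```

Converting the scores with `float` matters in the real route because tensor values cannot be serialised to JSON directly.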
learn = create_cnn(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(1, max_lr=0.5)
Total time: 00:13
epoch train_loss valid_loss error_rate
1 12.220007 1144188288.000000 0.765957 (00:13)
learn = create_cnn(data, models.resnet34, metrics=error_rate)
Previously we had this result:
Total time: 00:57
epoch train_loss valid_loss error_rate
1 1.030236 0.179226 0.028369 (00:14)
2 0.561508 0.055464 0.014184 (00:13)
3 0.396103 0.053801 0.014184 (00:13)
4 0.316883 0.050197 0.021277 (00:15)
learn.fit_one_cycle(5, max_lr=1e-5)
Total time: 01:07
epoch train_loss valid_loss error_rate
1 1.349151 1.062807 0.609929 (00:13)
2 1.373262 1.045115 0.546099 (00:13)
3 1.346169 1.006288 0.468085 (00:13)
4 1.334486 0.978713 0.453901 (00:13)
5 1.320978 0.978108 0.446809 (00:13)
learn.recorder.plot_losses()
As well as taking a really long time, it’s getting too many looks at each image, so may overfit.
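The two failure modes above (a learning rate that is too high and one that is too low) can be reproduced on a toy problem. This sketch minimises f(w) = w² with plain gradient descent; it is only an analogy for what happens in the real training runs, and the three learning rates are chosen for illustration.

```python
def gradient_descent(lr, steps=20, w=1.0):
    """Minimise f(w) = w**2 with plain gradient descent and return the final w."""
    for _ in range(steps):
        grad = 2 * w          # derivative of w**2
        w = w - lr * grad
    return w

too_high = abs(gradient_descent(1.5))
reasonable = abs(gradient_descent(0.3))
too_low = abs(gradient_descent(0.001))
print(too_high, reasonable, too_low)
```

With the largest rate each step overshoots and the value blows up; with the smallest rate the value has barely moved from its starting point after 20 steps.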
learn = create_cnn(data, models.resnet34, metrics=error_rate, pretrained=False)
learn.fit_one_cycle(1)
Total time: 00:14
epoch train_loss valid_loss error_rate
1 0.602823 0.119616 0.049645 (00:14)
np.random.seed(42)
data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.9, bs=32,
ds_tfms=get_transforms(do_flip=False, max_rotate=0, max_zoom=1, max_lighting=0, max_warp=0
),size=224, num_workers=4).normalize(imagenet_stats)
learn = create_cnn(data, models.resnet50, metrics=error_rate, ps=0, wd=0)
learn.unfreeze()
learn.fit_one_cycle(40, slice(1e-6,1e-4))
Total time: 06:39
epoch train_loss valid_loss error_rate
1 1.513021 1.041628 0.507326 (00:13)
2 1.290093 0.994758 0.443223 (00:09)
3 1.185764 0.936145 0.410256 (00:09)
4 1.117229 0.838402 0.322344 (00:09)
5 1.022635 0.734872 0.252747 (00:09)
6 0.951374 0.627288 0.192308 (00:10)
7 0.916111 0.558621 0.184982 (00:09)
8 0.839068 0.503755 0.177656 (00:09)
9 0.749610 0.433475 0.144689 (00:09)
10 0.678583 0.367560 0.124542 (00:09)
11 0.615280 0.327029 0.100733 (00:10)
12 0.558776 0.298989 0.095238 (00:09)
13 0.518109 0.266998 0.084249 (00:09)
14 0.476290 0.257858 0.084249 (00:09)
15 0.436865 0.227299 0.067766 (00:09)
16 0.457189 0.236593 0.078755 (00:10)
17 0.420905 0.240185 0.080586 (00:10)
18 0.395686 0.255465 0.082418 (00:09)
19 0.373232 0.263469 0.080586 (00:09)
20 0.348988 0.258300 0.080586 (00:10)
21 0.324616 0.261346 0.080586 (00:09)
22 0.311310 0.236431 0.071429 (00:09)
23 0.328342 0.245841 0.069597 (00:10)
24 0.306411 0.235111 0.064103 (00:10)
25 0.289134 0.227465 0.069597 (00:09)
26 0.284814 0.226022 0.064103 (00:09)
27 0.268398 0.222791 0.067766 (00:09)
28 0.255431 0.227751 0.073260 (00:10)
29 0.240742 0.235949 0.071429 (00:09)
30 0.227140 0.225221 0.075092 (00:09)
31 0.213877 0.214789 0.069597 (00:09)
32 0.201631 0.209382 0.062271 (00:10)
33 0.189988 0.210684 0.065934 (00:09)
34 0.181293 0.214666 0.073260 (00:09)
35 0.184095 0.222575 0.073260 (00:09)
36 0.194615 0.229198 0.076923 (00:10)
37 0.186165 0.218206 0.075092 (00:09)
38 0.176623 0.207198 0.062271 (00:10)
39 0.166854 0.207256 0.065934 (00:10)
40 0.162692 0.206044 0.062271 (00:09)
Intro Andrew Ng and Fastai ML courses
- what special about fastai ML course?
- why should take both Ng and fastai ML courses?
Deploy your model on Zeit
- just a page instruction
- free and easy
3:30-9:20
Student Projects deployed online 170-418
- what car is that by Edward Ross
- build the app help understand the model better
- no need to use mobile NN api
- Healthy or Not!
- Trinidad and Tobago Hummingbird classifier
- Check your mushroom
- cousin classifier
- emotion detector and classifier
- sign language detector
- your city detector
- time series classification
- facial emotion classification
- tumor sequencing
- [Face Expression Recognition with fastai v1 – Pierre Guillou – Medium](https://medium.com/@pierre_guillou/face-expression-recognition-with-fastai-v1-dc4cf6b141a3)
Introduction to Satellite Imaging dataset 9:20-11:02 418-500
9:20-11:02
- check out the image examples and labels
- what is multi-label classification?
11:00-14:56
How to download dataset from Kaggle
- What is Kaggle and why it is good?
- How to download dataset from Kaggle?
- How to comment, reverse comment?
- use the notebook to guide the process of downloading
- how to download with 7zip format?
- how to unzip 7zip file?
14:56-18:28
Introduction to data block API
- Note the dataset is images with multiple labels
- how to read csv with pandas?
- what data object we use for modeling?
- previously what was the trickiest step of deep learning?
- how to create more flexible ways to create your DataBunch, instead of factory method?
- What is data block api and how does it work in general?
18:30-24:00
Introduction to Dataset, DataLoader, DataBunch
- What is Dataset class?
- what do `__getitem__` and `__len__` do?
- How to use `DataLoader` to handle mini-batches?
- How to validate a model with `DataBunch`?
- How to link all the above together through data block api?
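A minimal sketch of the two ideas in these notes: a dataset is anything that answers `__getitem__` and `__len__`, and a loader walks over it in mini-batches. This is plain Python rather than the PyTorch classes; `SquaresDataset` and `batches` are hypothetical names, and the real `DataLoader` also shuffles and collates items into tensors.

```python
class SquaresDataset:
    """Toy dataset: item i is the pair (i, i*i)."""
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        if not 0 <= i < self.n:
            raise IndexError(i)
        return i, i * i

def batches(dataset, bs):
    """Yield lists of up to bs items, in order (no shuffling, no collation)."""
    for start in range(0, len(dataset), bs):
        yield [dataset[i] for i in range(start, min(start + bs, len(dataset)))]

ds = SquaresDataset(5)
print(len(ds), ds[3])
print(list(batches(ds, bs=2)))
```

A `DataBunch` then simply groups a training loader and a validation loader so that both are available to the learner.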
Explore data block api notebook
24:00-30:20
- use data block api notebook to play with the functions and tiny version of dataset
- a lot of skills to dig into (a lot of questions can be further created)
30:11-33:50
How to do transforms
- when and how to pick `flip_vert`?
- how to experiment to find out the best values for other parameters?
- what does `max_warp` do?
- when and how to use it for different datasets?
- building a model procedure is the same as usual
How to create your own version of metrics
30:11-41:08
* what do we use metrics for?
* how to create the accuracy required by Kaggle?
* how does `accuracy` work in the fastai source code?
* what does `data.c` mean?
* why is a threshold needed for satellite dataset accuracy?
* how to create a special version of an accuracy function with specific arg values using `partial`?
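For the last two questions, here is a small sketch of a thresholded multi-label accuracy and of fixing its threshold with `functools.partial`. It is written with plain lists instead of tensors and is not the fastai source; the function body, the logits and the targets are illustrative assumptions.

```python
import math
from functools import partial

def accuracy_thresh(logits, targets, thresh=0.5):
    """Fraction of (sample, label) entries where sigmoid(logit) > thresh matches the target."""
    correct = total = 0
    for row_logits, row_targets in zip(logits, targets):
        for z, t in zip(row_logits, row_targets):
            pred = 1 / (1 + math.exp(-z)) > thresh
            correct += int(pred == bool(t))
            total += 1
    return correct / total

# Same function, but with the threshold argument fixed at 0.2.
acc_02 = partial(accuracy_thresh, thresh=0.2)

logits = [[2.0, -1.0, -3.0], [-2.0, 0.0, 1.0]]   # hypothetical outputs: 2 images, 3 labels
targets = [[1, 1, 0], [0, 1, 1]]
print(accuracy_thresh(logits, targets), acc_02(logits, targets))
```

Because an image can carry several labels at once, there is no single "winning" class to pick with argmax; every label whose probability clears the threshold counts as predicted, which is why the threshold changes the score.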
QA on corrected data, data api style, video frames
42-48:56
* should we record the error from app?
* how to do finetuning with the corrected dataset?
* how we set the learning rate with the corrected dataset?
* should data block api be in certain order?
* where does the idea of data block come from?
* how to dig into the details of data block source code?
* what software to pull frames? (web api, opencv)
48:37-50:27
How to pick learning rate carefully
* How to read the lr for fine-tuning?
* How to read the lr for full training?
* What is discriminative learning rate?
How to further improve CamVid model performance
50:27-56:32
why use smaller images than Kaggle provided for training?
why can training again with larger images avoid overfitting and improve the model?
How do we make use of the larger images and train the model?
- how to change the data with large images?
- how to put the new data into the learner previously trained?
- how to freeze most of the layers of the model and only train the last few layers?
- how to find the best learning rate?
- how to train the model 5 times?
- eventually we move up to the top 10%
56:24-60:25
Introducing Camvid dataset
* what kind of problem is segmentation?
* What kind of dataset needed for segmentation?
* What industries have such segmentation problems?
* How to cite the datasets to get them credits?
60:23-63:06
QA How to find a specific lr number or range
- still a bit more artisanal than expected
- require certain experiment
- bottom point value not good
- try numbers x10 smaller and a few more around
- maybe someone will create an auto learning rate finder
63:06-69:50
How to do segmentation modeling
- How to get data?
- How to take a look at the data?
- how to extract labels for the data?
- How to open image and segmentation image?
- how to create DataBunch and how to set validation dataset?
- how to pick and use `classes` names?
- how to do transformation for Camvid dataset?
- how to choose batch size?
- how convenient to do show batch for Camvid dataset?
- How to create a learner
- how to find the learning rate
- how to start training, unfreeze and train more
69:55-72:32
QA unsupervised learning and different sized dataset training
* can we use unsupervised learning to do segmentation?
* cons of unsupervised learning for segmentation
* should we make a smaller size dataset to do training?
* great idea and great trick to improve your model
72:35-75:03
what kind of accuracy do we use for pixel segmentation
why do we use `acc_camvid` rather than `accuracy`?
what are `void` pixels?
what are the basic skills you need to create such metrics?
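A sketch of the idea behind a metric like `acc_camvid`: pixels labelled as void are left out before accuracy is computed, so the model is not penalised on pixels that have no meaningful label. This version uses flat lists instead of tensors; the function name and the void code of 9 are hypothetical.

```python
def accuracy_ignoring_void(pred, target, void_code):
    """Pixel accuracy computed only over pixels whose target is not void_code."""
    kept = [(p, t) for p, t in zip(pred, target) if t != void_code]
    if not kept:
        raise ValueError("every pixel is void")
    return sum(p == t for p, t in kept) / len(kept)

VOID = 9   # hypothetical class index for 'Void'
target = [0, 1, 1, VOID, 2, VOID]
pred = [0, 1, 2, 3, 2, 0]
score = accuracy_ignoring_void(pred, target, VOID)
print(score)
```

The same masking idea carries over to tensors: build a boolean mask from the target, then compare predictions and targets only where the mask is true.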
75:03-76:21
QA what to do when training loss higher than validation loss
- what to do?
- training longer
- train the last bit with lower learning rate
- decrease regularization
- data augmentation
- what to learn about regularization in coming weeks?
76:21-78:52
Why and how to create a U-net for segmentation
* what does a U-net look like?
* how a medical DL paper contribute to DL in general?
* U-net is the best option for segmentation so far
* What is the usual factory procedure for building and training segmentation model?
78:52-86:22
What is the trick of one cycle learning
* How to plot the loss during training and validation?
* Why loss go up a bit and then go down?
* why it is a good idea to start low and push lr up and push it down?
* how to explain it through visualization?
* too small or large is bad
* you want lr to decrease as it is converging
* what does the loss space look like?
* what is learning rate annealing?
* what is Leslie Smith’s contribution in one cycle learning?
* How does Smith see the loss space in general?
* What is Smith’s learning rate strategy to explore the space much more quickly and efficiently?
* How to get yourself to be an experimentalist?
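To visualise the "start low, push the learning rate up, then bring it down" idea, here is a simplified schedule with a cosine ramp up followed by a cosine decay. The exact shape and defaults used by `fit_one_cycle` may differ; `pct_start`, `div` and `final_div` are illustrative parameter names and values, not confirmed library defaults.

```python
import math

def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3, div=25.0, final_div=1e4):
    """Learning rate at `step`: cosine ramp up to max_lr, then cosine decay."""
    warm = int(total_steps * pct_start)

    def cos_interp(a, b, frac):
        # Moves smoothly from a (frac=0) to b (frac=1).
        return b + (a - b) * (1 + math.cos(math.pi * frac)) / 2

    if step < warm:
        return cos_interp(max_lr / div, max_lr, step / warm)
    return cos_interp(max_lr, max_lr / final_div, (step - warm) / (total_steps - warm - 1))

schedule = [one_cycle_lr(s, 100, 1e-3) for s in range(100)]
print(schedule[0], max(schedule), schedule[-1])
```

Plotting `schedule` against the step number gives the familiar one-hump curve: a low start, a peak at `max_lr` after the warm-up, then a long anneal as training converges.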
86:22-90:56
how to get full image size to train the previous trained model
* how to get full image size to train the previous trained model?
* how to show result for segmentation model?
* how good is our model compared to the state of art?
* how big deal with U-net and one-cycle-learning from this example?
90:56-94:01
What is mixed precision training for memory limit
* what does `to_fp16` mean, and how to use it?
* what the hardware requirement is?
* why it is so fast and easy with GPU?
* how good is its performance?
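Half precision saves memory because each number takes 16 bits instead of 32, at the cost of range and precision. The sketch below uses Python's `struct` module (format code `"e"` is IEEE 754 half precision) to show what is lost; the helper name `to_half` is made up. This loss is the usual reason mixed precision training keeps some values in 32 bits.

```python
import struct

def to_half(x):
    """Round-trip a Python float through IEEE 754 half precision (fp16)."""
    return struct.unpack("e", struct.pack("e", x))[0]

print(to_half(0.1))      # close to 0.1, but only about 3 decimal digits survive
print(to_half(2049.0))   # integers above 2048 are no longer all representable
print(to_half(1e-8))     # very small values underflow to zero
```

Small gradients hitting that underflow is one of the practical problems a mixed precision setup has to work around.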
94:01-97:20
Introduction to BIWI dataset
What does this dataset problem do?
How to use the provided functions and methods to calculate the coordinates for dot on the face?
what is the interesting point about the `ImagePoints`?
what kind of problem is it? (classification or regression)
97:20-99:14
how to prepare BIWI data into `DataBunch`?
* how to pick validation set?
* how to set the labels?
* how to do transformation?
99:14-101:11
How to train CNN for regression
* how to choose a model
* how to choose a loss
* how to find the lr?
* how to fit and save model
* how to check the result
101:08-106:52
Introduction to IMDB dataset
* How to classify document
* which submodule of fastai to use
* introduction to IMDB dataset
* how to create DataBunch quickly?
* how to create it step by step?
- what is tokenization?
- what is numericalization?
- how to do all these steps with data block api?
- What is a language model?
- How to train a learner and save the model
- there is a fastai paper on this model
108:40-116:45
How simple DL theory is
* why NN is all about matrix multiplication?
* what is nonlinear activation?
* what were the popular activations?
* what is ReLU, as the current popular activation?
* how come matrix multiplication + nonlinear + stack together = deep NN?
* what is universal approximation theorem?
* BP is used to find parameter values
* people can’t accept DL theory is so simple
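The "matrix multiplication + nonlinearity, stacked" recipe from these notes fits in a few lines. This is a plain-Python sketch with made-up weights, meant only to show the structure of a two-layer network, not an efficient or trainable implementation.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    """Replace every negative entry with zero."""
    return [[max(0.0, v) for v in row] for row in m]

x = [[1.0, 2.0]]                      # one input with two features
w1 = [[1.0, -1.0], [0.5, -2.0]]       # hypothetical weights, layer 1
w2 = [[2.0], [1.0]]                   # hypothetical weights, layer 2

hidden = relu(matmul(x, w1))
output = matmul(hidden, w2)
print(hidden, output)
```

Without the `relu` in the middle, the two matrix multiplications would collapse into a single linear map; the nonlinearity is what makes stacking layers worthwhile.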
116:45-119:10
QA how tokenization work on things like San Francisco
* what did the old way do (Bigram, N-gram)?
* what does NN do with tokens? (how words work together)
119:08-121:23
how to deal with images with four channels
* plan to incorporate into fastai library
* what if you only got 2 channels?
* what if you have 4 channels?
121:23-end
- a single workflow to solve different kinds of problems
- data block api is not big and you are free to add your own
summary of lesson 3
lesson plan and forward
::keywords::
classification, image regression, localization, tabular data, collaborative filtering, NLP transfer learning, U-turn, math
::key questions::
* What we learnt before lesson 4?
* What’s our focus (NLP transfer learning, and collaborative filtering) in lesson 4?
* What’s the math behind collaborative filtering?
* How to take a U-turn to dive into previous learnt applications behind the scene?
fastai model beat state of art in camvid dataset
::keywords::
The one hundred layers tiramisu (paper), camvid, state of art, smaller subset of classes, 94% > 91%, default setting
::key questions::
* How good is fastai model on camvid dataset?
* What is the fair comparison between different models on camvid dataset?
* How much can a default fastai model do these days?
NLP problems and neural nets approach
::Key words::
NLP transfer learning, IMDB dataset, Legal text classifier, Wikitext dataset,
::Key questions::
- What are the applications of NLP?
- Why it is difficult to use neuralnet to NLP classification?
- Why and how we say there isn’t enough information to learn?
- What is the nature or core of neural nets or deep learning?
- Why transfer learning is always the trick to go?
- How come Jeremy think of trying it then he can actually try it out, as if no one else thought of it and tried it? (I thought of it, but I didn’t know how to try it out)
How to do NLP transfer learning?
::keywords::
Wikitext, language model, IMDB, classifier, finetune, target corpus,
::key questions::
* What is a language model? what can it do?
* What is the difference between language model from Wikitext and IMDB?
* Why is a movie review classifier trained by first training on wikitext, then finetuning on the IMDB dataset, and finally training the classifier on the positive/negative dataset?
* Can language model learn some abbreviation expressions? think of language model generate math papers like output
* What is swiftkey’s language model in your phone?
* what exactly has been learnt from a language model trained with wikipedia dataset?
Experiment the IMDB sample and NLP basic procedure 14:00-19:44
::key questions::
* How to experiment on the IMDB sample from csv file?
* What is token, numericalization?
* How to access the `vocab`?
* What is the default number of vocabulary?
* What is the threshold number of appearance to keep/throw the word?
* How to turn dataset from csv file into a `DataBunch` with data block api?
* But how to put the original IMDB dataset into `DataBunch`? (it is not in csv file anymore)
%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.text import *
path = untar_data(URLs.IMDB_SAMPLE)
path.ls()
df = pd.read_csv(path/'texts.csv')
df.head()
df['text'][1]
data_lm = TextDataBunch.from_csv(path, 'texts.csv')
data_lm.save()
data = TextDataBunch.load(path)
data.show_batch()
data.vocab.itos[:10]
data.train_ds[0][0]
data.train_ds[0][0].data[:10]
data = (TextList.from_csv(path, 'texts.csv', cols='text')
.split_from_df(col=2)
.label_from_df(cols=0)
.databunch())
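A minimal sketch of what tokenization and numericalization mean, in plain Python rather than fastai's real tokenizer. The helper names `build_vocab` and `numericalize`, the whitespace tokenizer and the toy texts are assumptions for illustration; `max_vocab=60000` and `min_freq=2` mirror what I believe the fastai v1 defaults are, so treat them as assumptions too.

```python
from collections import Counter

def build_vocab(texts, max_vocab=60000, min_freq=2):
    """Return (itos, stoi): index-to-token list and token-to-index dict."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    # Keep the most frequent tokens that appear at least min_freq times.
    kept = [tok for tok, c in counts.most_common(max_vocab) if c >= min_freq]
    itos = ["xxunk"] + kept  # index 0 is reserved for unknown tokens
    stoi = {tok: i for i, tok in enumerate(itos)}
    return itos, stoi

def numericalize(text, stoi):
    """Replace each token by its vocabulary index (0 for unknown tokens)."""
    return [stoi.get(tok, 0) for tok in text.lower().split()]

texts = ["the movie was great", "the movie was bad", "great acting"]
itos, stoi = build_vocab(texts)
ids = numericalize("the acting was great", stoi)
print(itos)
print(ids)
```

Words seen only once ("bad", "acting") fall below the threshold and map to the unknown token, which is the keep/throw rule asked about above.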
How to train IMDB language model?
::key questions::
* what if you got a huge medical dataset no smaller than wikitext dataset?
* why we can use test set to train our language model?
* what does label language model mean?
* how to create a language model learner with RNN?
* what is dropout in terms of regularization?
* what is `moms` in `fit-one-cycle`?
* what does the model predict do? and how to do it?
* what does encoder do? and how to just save encoder as the model?
How to train a language model for classification
time 27:13-33:12
::key questions::
* How to create the `DataBunch` to train the language classifier?
* why use `vocab`?
* How to manage the `batch_size` given the size of GPU memory?
* What does the time spent look like on the second model and many classifier models?
* How to freeze up to specific number of layers?
* What is `moms` or `momentum` parameter for?
* How exactly does Jeremy figure out the best hyper-parameter value such as `moms` to automate?
How to find the best parameter value for learning rate using random forest
time: 33:12-36:47
- Where does `2.6**4` come from?
* How to use random forest search for the best hyper-parameter value?
* What is `auto-ML` all about? (building models that learn how to train your model)
* but we are fond of building models to better understand how your hyper-parameters work
How to do tabular data deep learning
time: 36:31 - 53:09
* What are the problems with tabular data?
* How people first reacted to deep learning in tabular data problem?
* How such wrong reaction has been changed?
* Why and how (feature engineering and Pinterest conference) deep learning become powerful and useful in dealing with tabular data?
* What is Jeremy’s top options for tabular data problem? (DL, RF, GBoost?)
* What are the reasons why DL for tabular data not widely used? (library)
* Why fastai use pandas a lot for tabular data?
* What are the 10% cases in which DL is not the default approach?
* Why do we use `URLs.ADULT_SAMPLE` dataset?
* How to make tabular `DataBunch` from `dataframe`?
* What are `dep_var`, `cat_names`, `cont_names` and `procs`?
* How to deal with categorical variables inside tabular data in DL models? (embeddings) How about continuous variables?
* What are the differences between `processor` and `transform`? (once ahead of time vs every time sending in model)
* What does `FillMissing`, `Categorify`, `Normalize` do?
* Why do we split valid with `split_by_idx` to have connected sub dataset for validation?
* How to build tabular model with `get_tabular_learner`? what does parameter `layers=[200,100]` do?
* How to combine NLP data with metadata (tabular data) and let DL to apply to them?
* Will sklearn and XGBoost go outdated?
* What does `metrics` do?
How to apply DL to collaborative filtering
53:09-67:24
* What kind of problems do we apply Collab filtering?
* What is the data structure like? (user, movie, rating two styles representing)
* What are the pros and cons of the sparse matrix style?
* What if you want to learn to deal with large sparse matrix storage problem? (Rachel’s computation LA course)
* What is the GroupLens dataset about?
* How to experiment with the dataset using Collab filtering?
* How to create a collaborative filtering model?
* Why using Collab filtering was difficult?
* What is the cold start problem?
* How does Netflix fix the cold start problem?
* What is the other solution (predictive model) to the cold start problem?
* How to make language model learn to use emojis?
* How to deal with time series tabular data with DL? (extract and add more columns, not use RNN)
* Is there a source to learn more about the cold start problem?
How to understand dataset and models with excel
time: 67:23-77:11
* How to visualize collaborative filtering process with excel?
* How to create weights for users and weights for movies?
* How to do gradient descent with solver?
Code Browsing - YouTube
Explore collab embedding with VIM
VIM Adventures
Timesavers: Bash kernel for Jupyter notebooks & ctag Vim navigation
77:07-92:28
How to use VIM to explore source code quickly?
What is embedding and how it is created?
Explain deep learning process up to output layer
92:11 - end
* what is the deep learning workflow?
* what is input layer, hidden layers, output?
* what are parameters, weights?
* what are activations?
* How much linear algebra we need to do deep learning?
lesson 5 outline
0:00-3:30
* downhill into details behind the scene
* why start with machine vision?
* why we finish with tabular data and collaborative filtering?
* how this lesson is structured by starting with the latest notebook?
* regularization is the key in this lesson and will help improve your models
3:20-8:30
review the workflow of deep learning and backprop
* How to understand the Layers of parameters and activations
* how to update parameters
* how to get activations with parameters
* Inputs are special kind of activations
* original inputs
* inputs created by element-wise function with activations, Relu
* Relu works all the time
* What is Universal Approximation theorem
* parameter matrix product with input
* activations with relu to get input features
* stack many and large enough of weight matrices together to solve any function to any level of accuracy
* This is all the trick you need about DL for CS
* What is bp?
* name sounds impressive, but
* in fact = prediction + target -> loss -> gradient -> update parameters by - lr*gradient
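The loop "prediction + target -> loss -> gradient -> update parameters by - lr*gradient" can be sketched in plain Python on a toy linear model. The data, the learning rate and the hand-derived MSE gradients are assumptions chosen for illustration; a real network gets its gradients from autograd instead.

```python
# Fit y = w*x + b to toy data with plain gradient descent.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated by y = 2x + 1
w, b, lr = 0.0, 0.0, 0.05

def mse(w, b):
    """Mean squared error between predictions and targets."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

for step in range(2000):
    # prediction + target -> gradient of the loss (derived by hand for MSE)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    # update parameters by - lr * gradient
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b, mse(w, b))
```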
8:30-19:51
How to understand fine-tuning with ImageNet on new classifiers
what exactly does Resnet classifier do behind the scene?
how to replace ResNet’s final weight matrix (1000 ImageNet categories) to suit the classes you need?
What are the other/remaining hidden layers good at?
- as layers go up, features become more complex
- you want earlier layers weights stay where they are
- so we want them frozen
What does freeze do to the model?
- don’t backpropagate those frozen layers
- model run faster
- earlier layers’ weights stay the same
After a while, we want to train the rest of network, how to do that?
- unfreeze the layers
- earlier layers need almost no update, refer to very small learning rate
- middle layers need slight higher learning rate to have a little more update
- later layers need larger learning rate to update even more
- this process is called “discriminative learning rate”
How to do discriminative learning rate with fastai?
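- `fit(1, 1e-3)`: every layer group uses the same learning rate
- `fit(1, slice(1e-3))`: last layers get 1e-3, the other layers get that rate/3
- `fit(1, slice(1e-5, 1e-3))`: first group gets 1e-5, last group gets 1e-3, the groups in between are spread reasonably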
- different learning rate for each layer group
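A rough sketch of how one learning-rate setting could be expanded into one rate per layer group. `layer_group_lrs` is a hypothetical helper, not the fastai function; the `divisor=3.0` default follows the "rate/3" note above and the multiplicative (geometric) spread is my assumption about what "spread reasonably" means.

```python
def layer_group_lrs(lr, n_groups, divisor=3.0):
    """Hypothetical helper: expand one lr setting into one lr per layer group."""
    if isinstance(lr, slice):
        if lr.start is None:
            # slice(1e-3): last group gets 1e-3, earlier groups a reduced rate
            return [lr.stop / divisor] * (n_groups - 1) + [lr.stop]
        if n_groups == 1:
            return [lr.stop]
        # slice(1e-5, 1e-3): spread multiplicatively from start to stop
        ratio = (lr.stop / lr.start) ** (1 / (n_groups - 1))
        return [lr.start * ratio ** i for i in range(n_groups)]
    # plain number: same rate everywhere
    return [lr] * n_groups

print(layer_group_lrs(1e-3, 3))
print(layer_group_lrs(slice(1e-3), 3))
print(layer_group_lrs(slice(1e-5, 1e-3), 3))
```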
19:50-36:00
why is N-embedding better than one-hot encoding
What is Affine function?
- sort of matrix multiplications
- CNN: weights are tied, so “affine function” is a more accurate term than matrix multiplication
- most common in DL
How to use one-hot encoding as input
- 15 users one-hot encoding as user input
- 15 movie one-hot encoding as movie input
How to understand one-hot encoding vs N-embedding
- conventionally, user weights(embedding) do matrix multiplication with user input (one-hot encoding)
- output is activation, which actually is the same to user weights (embedding)
- activations are actually the same to user embedding
- or equal to user embedding with array lookup (computationally easy)
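To see why the one-hot matrix product and the array lookup give the same activations, here is a tiny sketch in plain Python. The 4x3 embedding values and the helper names `one_hot` and `vec_mat` are made up for illustration.

```python
# 4 users x 3 latent factors (toy numbers)
emb = [[0.1, 0.2, 0.3],
       [0.4, 0.5, 0.6],
       [0.7, 0.8, 0.9],
       [1.0, 1.1, 1.2]]

def one_hot(idx, n):
    """Vector of zeros with a single 1.0 at position idx."""
    return [1.0 if i == idx else 0.0 for i in range(n)]

def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[r] * m[r][c] for r in range(len(m))) for c in range(len(m[0]))]

user = 2
via_matmul = vec_mat(one_hot(user, len(emb)), emb)  # conventional way
via_lookup = emb[user]                              # computationally easy way
print(via_matmul, via_lookup)
```

The matrix product does a lot of multiplying by zero only to pick out one row, which is why an embedding layer is implemented as an index lookup.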
What does it mean that user embeddings corresponds to user idx (same to movie)?
- when movie and user embedding matrix multiplication output (activation) is high, it means
- user embedding features corresponds to movie embedding features
- they are referring to the same features in their own corresponding embedding values
- user’s features = personal tastes, corresponds to/matches to movie’s features
- these underlying features are latent factor or latent features
How to deal with bad movies even though there are good features inside?
- solution is to add bias (for both user and movie)
- use user bias to represent user rating behavior
- use movie bias to represent movie overall quality
- so, bias is important, this is why by default all NN add bias to train
questions on epoch and affine function
36:06-38:21
* When we load a pretrained model, can we reload activations to say what they are good at?
* yes
* What is Affine function?
* linear function
* multiplication + add up = affine function
* an affine function applied to an affine function is still an affine function
* affine + relu + affine + relu …. = deep neural net
run full movie lens dataset with collaborative filtering
38:21- 48:02
- Introduction to Movielens dataset and how to pick the dataset
- How to open and check the dataset with pandas
- What does `encoding="latin-1"` do?
- nowadays standards is unicode
- old style is latin-1
- what about the genre of movie in the dataset?
- how to use merge function to bring title into the table?
- how to create a `CollabDataBunch` and set specific column to be the movie or item column?
- what about the trick of setting the y-range to improve performance?
- first use sigmoid to move values into [0, 1]
- better, to move [0,1] to [0, 5.5] make sure the largest rate can be 5
- What are `n_factors` (latent factors)?
- matrix factorization
- the width of embedding matrix
- tried a number of values, 40 works the best so far
- How to pick the lr from lr plotting?
- paper author’s method
- find the lowest value
- then go back by x10
- Jeremy’s method
- find the sharpest decline
- go back and forward by x10
- see which one works better
- go to LibRec to compare your result with the benchmarks
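The y-range trick mentioned above (sigmoid into [0, 1], then stretch to [0, 5.5]) can be sketched on top of the dot-product-plus-bias prediction. `predict_rating` and its arguments are hypothetical names for illustration, not the fastai model code.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def predict_rating(user_vec, movie_vec, user_bias, movie_bias, y_range=(0.0, 5.5)):
    """Dot product of latent factors plus biases, squashed into y_range."""
    dot = sum(u * m for u, m in zip(user_vec, movie_vec))
    lo, hi = y_range
    # sigmoid moves the value into [0, 1]; scaling moves it into [lo, hi]
    return sigmoid(dot + user_bias + movie_bias) * (hi - lo) + lo

print(predict_rating([0.5, -0.2], [1.0, 0.3], 0.1, 0.4))
```

Because a sigmoid only approaches 1 asymptotically, the upper bound is set a bit above 5 so that a rating of 5 is actually reachable.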
48:00-61:00
How to interpret the weights or parameters of users and movies
- What is interesting inside the latent factors (embeddings)?
- How to deal with Anime fans who just love anime and rate very high, so you see many episodes of anime stay the top 100 over the top 1000 movies?
- this kind of features are captured by bias
- it is interesting to check out bias vector of all movies
- how to use pandas to find the movies being rated the most?
- to find movies we might have seen hopefully
- How to access model’s item/movie bias?
- it is a vector of course
- how to group `mean_ratings`, `bias` and movie title together?
- then, how to sort them by bias values
- to compare bias value and rating numbers
- How to squish the 40 latent factors into 3 factors?
- how to use `pca`?
- Rachel teaches PCA from a different course
- How to use PCA to compare image similarities?
- How to group PCA latent factors with movie titles
- and sort by different factors
- how to interpret
- How to plot the movie by factors
How to read source code of collaborative learner
- just watch Jeremy explore it a lot more times
- dive deeper with vim
- dive deeper with ipdb
61:00-66:57
66:57-72:27
Interpreting embeddings
What is the big deal about entity embeddings?
- a Kaggle dataset with 2016 paper with entity embeddings
- how it work with different models?
What is the interesting finding from plotting the embedding?
- embedding projection discovered geography
- clear path between weekday and month of a year
72:20-79:35
what is weight decay
- weight decay is a kind of regularization
- how to understand regularization with Ng’s graph?
- what is under and over fitting?
- what is the lie taught in social statistics courses?
- too many parameters cause overfitting
- complexity depend on number parameters
- How to balance complexity and parameters number?
- real life is full of complexity, curve lines, more nonlinearity, many parameters
- but no more curvy than necessary
- but how to avoid overfitting at the same time?
- how to penalize complexity?
- use a lot of parameters but penalize complexity
- one way: sum up value of parameters(not really), sum up the square of parameter values
- add such value to loss
- what problem can it be?
- good loss require sum of squared parameters to be zero
- solution: multiply wd to the sum of squares of parameters
- generally, wd = 0.01, but here we use 0.1
- wd too small, model is easy to overfit, so can’t train too long
- what is the difference on using `kwarg` between `collab_learner` and `learner`?
- how to pass on or add additional `args`, such as `wd`
79:35-102:50
How to write SGD with weight decay from scratch
- how to implement SGD from scratch? (review)
- what is loss from scratch, such as MSE?
- how to move from loss to loss with weight decay?
- How to use MNIST dataset pickle file for experiment
- How to use Pytorch to create DataLoader, loss and Affine function?
- How to access a batch of dataset? (why always use DataBunch)
- Make sure you can do subModule
- What does `nn.Linear` do? do the same to `Mnist_logistic`
- how to create a model out of the submodule you create?
- what does `model.parameters()` do?
- why use crossEntropy rather than MSE?
- How to implement `update` with `weight_decay`?
- how to write `w2`, the sum of squared parameters?
- how to access all the parameters for update?
- how to write loss with weight decay?
- what does `loss.item()` mean?
- why we need reduce learning rate as we train the model further?
- see the plotting of loss
- why we call `w2*wd` weight decay?
- taking the gradient of `w2*wd` leaves a term proportional to `wd*w`, which shrinks each weight at every update
- without this transformation, we call it L2 regularization
- later they become different somehow
- How to further refactor the code Mnist_NN
- How to do all the above with less code using Pytorch functions?
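A small sketch of why adding `wd * w2` to the loss and decaying the weights directly are the same thing for plain SGD. The two helper functions and the toy numbers are assumptions for illustration; the factor 2 comes from differentiating the square and is often folded into `wd`.

```python
def sgd_step_l2(w, grad, lr, wd):
    """L2 regularization: loss gets + wd*sum(w^2), whose gradient is 2*wd*w."""
    return [wi - lr * (gi + 2 * wd * wi) for wi, gi in zip(w, grad)]

def sgd_step_decay(w, grad, lr, wd):
    """Weight decay: shrink each weight, then take the usual gradient step."""
    return [wi * (1 - 2 * lr * wd) - lr * gi for wi, gi in zip(w, grad)]

w = [1.0, -2.0]
grad = [0.5, 0.1]  # gradient of the unregularized loss (toy values)
a = sgd_step_l2(w, grad, lr=0.1, wd=0.01)
b = sgd_step_decay(w, grad, lr=0.1, wd=0.01)
print(a, b)
```

With momentum or Adam the two formulations stop being equivalent, which is presumably what "later they become different" refers to.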
102:50-120:00
What is Adam optimization
* what is SGD in excel?
* what is momentum in excel?
* v = v_{t-1}*0.9 + g*0.1
* w = w_{t-1} - lr*v
* so momentum is gradients momentum, not weights
* what is the intuition of momentum on graph
* weighted gradient
* ::one more step by inertia, maybe we see the world better::
* how to do SGD with momentum in pytorch
* what is RMSProp in excel?
* where is the first citation of this method
* v = v_{t-1}*0.9 + 0.1*g^2
->
* if gradient is consistently small, v will be small
* if gradient is volatile, v will be large
* if gradient is consistently large, v will be large
* w = w_{t-1} - g_{t-1}*lr/sqrt(v_{t-1})
->
* if previous momentum of gradient is very small, let’s update weight in bigger steps
* vice versa
* ::make a change the previous lasting state, maybe we see the world better::
* learning rate is still necessary
* what is Adam in excel?
* adding momentum and RMSProp together
* w = w_{t-1} - lr*v_momentum/sqrt(v_rmsp)
* Deep dive into the excel sheet
* Deep Dive: An overview of gradient descent optimization algorithms
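The three update rules above, written out as a sketch in plain Python for a single weight. Function names, default values and the small `eps` term are assumptions; this version uses the freshly updated averages and leaves out Adam's bias correction, so it follows the spreadsheet formulas rather than any library implementation.

```python
import math

def sgd_momentum(w, v, g, lr=0.1, beta=0.9):
    """v = v_{t-1}*0.9 + g*0.1, then w = w_{t-1} - lr*v."""
    v = beta * v + (1 - beta) * g
    return w - lr * v, v

def rmsprop(w, s, g, lr=0.1, beta=0.9, eps=1e-8):
    """Moving average of squared gradients divides the step."""
    s = beta * s + (1 - beta) * g * g
    return w - lr * g / (math.sqrt(s) + eps), s

def adam(w, v, s, g, lr=0.1, beta1=0.9, beta2=0.9, eps=1e-8):
    """Momentum on top, RMSProp underneath (no bias correction here)."""
    v = beta1 * v + (1 - beta1) * g
    s = beta2 * s + (1 - beta2) * g * g
    return w - lr * v / (math.sqrt(s) + eps), v, s

# Usage: minimise f(w) = (w - 3)^2 with momentum; the gradient is 2*(w - 3).
w, v = 0.0, 0.0
for _ in range(500):
    w, v = sgd_momentum(w, v, 2 * (w - 3))
print(w)
```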
120:00-123:30
What is fit-one-cycle
- fastai takes care of the optimization details for us
- what does `fit-one-cycle` do?
- make learning rate start low
- as we know very little of the world
- go up about half of the time
- knows the world better, direction is correct
- then go down about half of the time
- finetune to get closer to convergence
- right : momentum graph
- when steps are small and momentum is accumulating, suggest we can just make big steps
- when steps are big, momentum is small, suggesting we change flexibly
- when steps are getting smaller, the momentum is increasing, we can make big steps again
- make it super fast convergence
- an inspiring story
123:30-end
What is cross-entropy loss
Introduction to the toy dataset
what is the intuition of cross-entropy loss in excel
what is intuition of softmax in excel
- single-label multi-class classification
How does pytorch do them both for us?
- `nn.CrossEntropyLoss` combines both of the above (softmax and cross-entropy loss)
What does a pytorch multi-class classification model return to us?
- pytorch default is different from fastai default
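Softmax and cross-entropy loss for a single-label example can be sketched in a few lines of plain Python. The function names are mine and the max-subtraction is a standard numerical-stability trick, not something taken from the spreadsheet.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(softmax(logits)[target])

logits = [2.0, 0.5, -1.0]
print(softmax(logits))
print(cross_entropy(logits, 0), cross_entropy(logits, 2))
```

Being confident and right gives a small loss; being confident and wrong gives a large one.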
0:00-9:46
How to use platform.ai to label images
- How to use combine human skills with platform.ai to group or label images efficiently?
Getting Started with Rossmann Store Sales dataset
9:46-16:59
* How to understand Tabular learner source code?
* How to sort out Rossmann Store Sales dataset?
* Why is the test set closer to the most recent time period?
* What is the loss function offered by Kaggle competition?
* Where to learn how to join and manipulate data tables?
* Which notebook to clean rossman_data to generate pickle file?
* What does `add_datepart` do?
* Why it is useful to turn date into metadata columns?
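A rough idea of what turning a date into metadata columns looks like, using only the standard library. `date_parts` is a hypothetical helper and the column names are only loosely modelled on fastai's `add_datepart`, so do not read it as that function's exact output.

```python
from datetime import date, timedelta

def date_parts(d):
    """Expand one date into several columns a model can use."""
    return {
        "Year": d.year,
        "Month": d.month,
        "Day": d.day,
        "Dayofweek": d.weekday(),  # Monday = 0
        "Dayofyear": d.timetuple().tm_yday,
        "Is_month_start": d.day == 1,
        "Is_month_end": (d + timedelta(days=1)).month != d.month,
    }

parts = date_parts(date(2015, 7, 31))
print(parts)
```

A model cannot discover "it is a Friday" or "it is the end of the month" from a raw date value, but it can use these columns directly as categorical inputs.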
How to preprocess the dataset
16:54-22:26
- What is transform and what transforms are good at?
- What is preprocess and how to use it?
- How to grab a small subset of data to experiment with?
- How to categorify for tabular dataset?
- what does categorify mean and what the output look like?
- What does `NaN` and `-1` mean?
- Why we can’t use `-1`? and How we deal with `-1` here?
- How to use `FillMissing` to fill the missing value with median values?
- How does fastai do all these preprocessing automatically or easily together?
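A plain-Python sketch of the two preprocessing ideas above. `categorify` and `fill_missing` are hypothetical helpers, and using `None` as the missing marker is an assumption; the point is that missing categories get code -1, which is shifted by +1 so it can index an embedding row, and missing numbers get the median plus a flag column.

```python
from statistics import median

def categorify(values):
    """Map categories to integer codes; missing (None) becomes -1, then +1 -> 0."""
    cats = sorted({v for v in values if v is not None})
    code = {c: i for i, c in enumerate(cats)}
    # -1 cannot index an embedding matrix, so shift every code up by one
    return cats, [code.get(v, -1) + 1 for v in values]

def fill_missing(values):
    """Replace None with the median and return a was-missing flag per row."""
    med = median(v for v in values if v is not None)
    filled = [med if v is None else v for v in values]
    flags = [v is None for v in values]
    return filled, flags

cats, codes = categorify(["a", None, "c", "a"])
filled, flags = fill_missing([1.0, None, 3.0, 10.0])
print(cats, codes)
print(filled, flags)
```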
22:26-27:14
How to create DataBunch for Rossmann dataset
- how to provide column names for all variables?
- how to determine validation set using test set from Kaggle?
- why make sure `label_cls` is `FloatList`, not `IntList`?
- Why use `log=True` for `FloatList` to use `RMSPE`?
How to build tabular model for Rossmann dataset
27:14-30:04
- What to do about `y_range` for `tabular_learner` and why?
- What kind of NN structure for tabular dataset? (simple fully connected)
- What does `layers=[1000, 500]` suggest?
- How to use `ps` and `emb_drop` to prevent overfitting?
30:03-39:14
How to understand and use dropout
* How to understand the basic idea of dropout from the paper?
* not only hidden activations but also inputs (sometimes) to be thrown away
* why it is useful?
* to make parameters to remember patterns rather than the specific images
* where usually do new brilliant ideas come from
* not from math
* but from life and intuition
* How to choose the probability to drop out for each layer?
* a single p
* or list of p
* What to do in training and testing time?
* do we throw activations too?
* if not, how to balance with training throwing percentage?
- How to understand dropout C source code?
- How to use `ps` and `emb_drop` in coding?
- How to understand the use of `emb_drop`?
- just another layer’s output/activations to be dropped with probability
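A minimal sketch of dropout on a list of activations, answering the train/test balance question above. The function and its `rng` argument are my own; it uses the "inverted" form, where the kept activations are rescaled by 1/(1-p) during training so that nothing needs to change at test time.

```python
import random

def dropout(acts, p, training=True, rng=random):
    """Zero each activation with probability p; rescale the survivors."""
    if not training or p == 0:
        return list(acts)  # test time: keep everything, no rescaling needed
    keep = 1 - p
    return [a / keep if rng.random() < keep else 0.0 for a in acts]

rng = random.Random(0)
acts = [1.0] * 10000
dropped = dropout(acts, p=0.5, rng=rng)
print(sum(dropped) / len(dropped))  # stays close to the original mean of 1.0
```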
39:14-42:25
How to understand embedding layers here
How to experiment things out to find the best hyper parameter values?
- such as the process of finding emb_drop=0.04
How to understand embeddings?
- how embedding layers corresponds to categorical input variables?
- how to read and set the embedding sizes?
- continuous input variables work not with embedding layers, but batch norm layers
42:00-48:28
What is Batch normalization
* What is Batch normalization in a sentence?
* What is interesting about the recognition story of dropout paper by major journals?
* Why batch norm paper was accepted quickly?
* Why you should understand papers and give no big deal to math jargons?
* What is the real reason why BN is so powerful?
* loss vs parameters is not as bumpy, therefore model can converge with higher lr
- what does BN algorithm actually do?
- BN is a layer to produce activations
- get mean and sd from a batch, and normalize the batch with them
- scale (coefficient param) and shift (bias param) the distribution of the batch (most important)
48:25-52:00
Why BN (scale and shift) make a difference
* what is the problem behind the scene?
* target range 1 to 5
* prediction range -1 to 1
* it causes difficulties
* scale with alpha and shift with beta can deal with it
* to transform [-1,1] closer to [1, 5]
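A sketch of the batch norm computation on one batch of scalar activations. The names `scale` and `shift` stand for the two learned parameters described above; the numbers are toy values picked by hand, whereas in a real layer they are learned by gradient descent.

```python
import math

def batch_norm(batch, scale=1.0, shift=0.0, eps=1e-5):
    """Normalize with the batch mean and sd, then scale and shift."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [scale * (x - mean) / math.sqrt(var + eps) + shift for x in batch]

acts = [-1.0, 0.0, 1.0]
plain = batch_norm(acts)
moved = batch_norm(acts, scale=2.0, shift=3.0)
print(plain)
print(moved)  # activations now centred on 3 instead of 0
```

Two extra parameters let the network move the whole output range in one step, instead of having to adjust every weight in the layers before it.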
51:50-54:56
How to use BN in code
- what does `momentum=0.1` mean for `BatchNorm1d`?
- value low = mean and sd vary less between mini-batches = less regularization
- value high = otherwise = high regularization
- it trains much faster now
54:56-56:46
How to pick between BN, data augmentation, dropout, weight decay and L2 norm
- L2 = weight decay, use weight decay
- always use BN, data augmentation
- experiment to see the combination options for dropout and weight decay
56:45-65:24
How to do data augmentation
- Why it is Least well studied and most exciting regularization?
- no cost
- no longer to train
- no underfitting
- how to find out all about data transformation through docs
- how to pick appropriate values for brightness
- how to pick for dihedral
- how about flip
- pad mode (there is a fastai paper about it)
- what is `symmetric warp` doing
- how to transform a single dog picture into many “different looking” images
- why data augmentation is such a big potential opportunity?
65:12-67:30
Overview and Why to understand CNN by creating a heat map in the end
* how to quickly create, train and save a CNN with fastai?
* To understand CNN by creating a heat map from scratch
67:27-75:05
How to understand kernels with Setosa’s web app
why study how CNN work in the end of a course?
- not useful in terms of just using them
- but if want to do it slightly differently, we need to know CNN behind the scenes
- convolution: a special matrix multiplication
How to understand CNN kernel or image kernel with Setosa’s web app?
- how the kernel transform an image?
- why there is black outer edge of the output image?
- why head area is transformed into white cells but face areas into black cells?
- How to define a convolution with this example?
- How to relate this to channel visualization with the paper?
- Why such kernel is to help find up-edges?
75:05-80:00
How to understand convolution differently and what is padding for
* How to view convolution as standard matrix multiplication?
* transform kernel convolution movement into a single larger matrix kernel doing simple matrix multiplication with input matrix
* How to understand padding?
* to keep the output feature map the same size as input feature map
79:55-89:39
How kernels, stride, padding work in a real CNN
what does a 3-channel kernel look like? and how does it work?
how do we find more features by adding more 3-channels kernels?
- e.g., add 16 kernels to find 16 different features
Why and How to shrink the feature map but double the kernels?
- avoid memory goes out of control by kernel skipping over one or several pixels
- feature map size shrinks, but we can add more kernels
Let’s Experiment an image with kernels, stride and padding
- create a CNN over an image
- check out its model summary, particularly its feature map size half and kernels double
89:30-93:49
How to do your manual CNN
- how to create your own 3 channel kernel with 4D to show bottom right edge
- how to get a single image
- how to create a kernel as a 4D tensor
- how to create a mini-batch of size 1
- how to apply the kernel to an image
93:46-109:00
How to create the heat map
- how to turn a 512x11x11 tensor into a vector of 37 values
- average pool 2d with output size 1
- linear layer with (512, 37)
- what does the final feature map (512, 1, 1) tell us?
- what does (512, 11, 11) tell us?
- what does it mean to average across the 512 channels at the same cell position, rather than over the 11x11 grid of a single channel?
- how to use hook to get the feature map 512x11x11?
- how to run model on a single example
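The two ways of averaging a feature map, sketched on a tiny 2x2x2 example instead of 512x11x11. Both helper names are made up; `global_avg_pool` gives one value per channel (what feeds the linear layer), while `channel_mean` gives one value per cell (the heat map).

```python
def global_avg_pool(fmap):
    """Average each channel over its grid: C x H x W -> C values."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]

def channel_mean(fmap):
    """Average over channels at each cell: C x H x W -> H x W heat map."""
    n_ch, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[sum(fmap[c][i][j] for c in range(n_ch)) / n_ch for j in range(w)]
            for i in range(h)]

fmap = [[[1.0, 2.0], [3.0, 4.0]],   # channel 0
        [[3.0, 4.0], [5.0, 6.0]]]   # channel 1
print(global_avg_pool(fmap))
print(channel_mean(fmap))
```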
Ethics and data science
109:08-end
- what are generative models?
- what are the ethics issues of data science
- what is the gender bias found in facial classifiers from major DL companies?
- why? what caused such bias? (where is the data source)
- how does biased DL surveillance cause massive arrests?
- the best way to get publicity is to do something like “Amazon Face Recognition falsely matched 28 members of Congress with mugshots”
- google machine translation doesn’t seem to fix the gender bias
- machine bias is overwhelming in public policy and the judicial system
- Facebook and the Myanmar genocide
- how should a DL engineer face ethical issues
A warning of lesson 7 and a student role model to checkout
0:00-1:59
- Food classifier with fastai on Android and iOS app
- help docs, tutorials, community organizing
1:59-11:02
How to do `data block` api step by step
- how to extract images with gray scale with `ImageItemList.from_folder` and `convert_mode`?
- how to access each item from the folder path object?
- how to set the default color map for fastai?
- why fastai make each image into a rank 3 tensor rather than a 2D matrix?
- how to access an image item as file path and item as image?
- how to split training and validation sets by two folders “training” and “testing”?
- the images inside “testing” folder do have labels, so they are not real testing data without labels
- How to check what included inside training set folder?
- inside training set folder, there is a folder for each class
- how to provide labels for your training and validation sets?
- then check to see the difference from previous step
- how to access a single image example from training set with both x and y?
- how to do transforms for small image dataset recognition?
- how to do it with tuple setting for transforms?
- how to create `DataBunch` with `normalize`?
- how to access data example from `DataBunch.train_ds`?
- how to plot an image from `data.train_ds`?
- how to plot this image with different transformations?
- how to get a batch of x and y from `DataBunch`?
- how to show a batch of data?
How to create a CNN model from scratch
11:00-16:54
- How to refactor `nn.Conv2d` for usual use?
- leaving two function inputs:
- `ni`: number of input channels
- `nf`: number of output channels
- `#14`: the size of feature map 14 x 14
- How does down-sampling go from `7x7` to `4x4`?
- 7/2 = 3.5, rounded up (ceiling) = 4
- How to build a CNN model with `conv` and `BatchNorm2d` and `Relu`?
- How to use a single batch of data `xb` to double check on `model` built above?
- How to refactor code further into `conv2`?
- use fastai `conv_layer` which include `conv2d`, `BatchNorm` and `Relu`
- refactor `conv_layer` into `conv2` with `stride=2`
16:10-31:02
How to create a Resnet from scratch
How to make the CNN deeper without shrink feature map size?
what caused a deeper CNN model to perform worse than a shallow one?
what does really good researcher do in front this kind of problem?
What is the key insight that ResNet or ResNet block offers to us? (identity/skip connection)
What the real reason for why ResNet block work so well?
How to write the ResNet block?
How to build the ResNet model?
26:18-31:36
How to use `SequentialEx` and `MergeLayer` to create a ResNet block and a Dense block
What are the pros and cons of DenseNet?
What kind of tasks or problems is DenseNet good at solving?
How does it link to U-net?
29:50-48:38
How to build upgraded U-net from scratch
- how much better skip/dense connection help to strengthen U-net on segmentation?
- 29:50-32:12
- What does it mean by upgrading U-net with ResNet and Deconvolution
- 32:00-36:10
* What is the wasteful way of doing deconvolution and what is the better way (nearest neighbor interpolation)?
* 36:08-41:20
* how to enlarge feature maps not shrink anymore?
* part2 : pixel shuffle > NN interpolation
- How to implement U-net?
- 41:20-48:31
- why simple convolution (down-sampling) + deconvolution(up-sampling) won’t work
- What kind of special skip connection does U-net have?
- skip long distance and not add but concat
- How such special skip connection enable U-net to better segmentation?
- 41:20-43:35
- What is the U-net implementation?
- 43:30-47:24
- what does a UnetBlock do and how to implement it?
- How exactly does U-net train? ::Not explained in the course::
49:50-52:02
Why concat before and How to keep dense concat without shrinking
48:29-97:13
How to make low resolution image with high resolution image
48:29-49:50
What is image restoration
What are those specific applications of image restoration?
52:04-55:12
How to crappify image as low resolution
Why to crappify image?
How did Jeremy do it? (low res and text written)
* How to open image file
* How to resize and bilinear interpolation
* How to write a text on the image
* How to save image with random quality level
Why not always see the text or number?
How to speed up with parallel?
How to come up with your own crappification?
- this is how to make something interesting or original
Why crappify is important to models to learn?
55:12-58:11
How to train a model to remove watermarks
Why use a U-net to train?
How to create the DataBunch?
Why need transfer learning to get rid of the text in the image?
What is a generator learner?
What does `MSELossFlat` do here?
What is frozen for the U-net?
58:10-64:07
How to use GAN to upgrade image resolution
Why is the MSE loss to blame for the current model failing to upgrade the resolution?
How GAN solves the upgrade problem with another loss function by calling another model?
How to understand all the concepts and the workflow in the diagram below?
(how the loss changes between different stages)
How the generator training and critic training do the ping pong game?
why it is a pain to train GAN (very slow to train especially at the beginning)?
How can pretraining both the generator and the discriminator solve this problem?
64:07-70:10
How to build and train a discriminator to tell `images` and `image_gen` apart
What folders of images do we need for discriminator training?
How to generate and save prediction images into a folder?
- how to create and remove directories (trees)
- how to get all the image file names
- how to access each batch of files at a time
- how to save images with specific names in a directory
- [image:EDA6CF1C-83E8-4351-A8FF-0F144B9431B5-76996-0002F6B41F32AE5B/7A5230C9-302E-4B45-9DEC-8FE5394AAAA1.png]
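The file-handling steps in the list above (create and remove directory trees, collect file names, walk them one batch at a time, save under specific names) can be sketched with the standard library alone. This is only an illustration under stated assumptions: the folder names reuse `images` and `image_gen` from the notes, the helper `batches` is hypothetical, and dummy bytes stand in for the generator's predictions.

```python
import shutil
import tempfile
from pathlib import Path


def batches(items, batch_size):
    """Yield successive slices of `items`, each with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


root = Path(tempfile.mkdtemp())          # scratch directory for the demo
src = root / "images"
dst = root / "image_gen"
src.mkdir()
for i in range(5):                       # stand-ins for real image files
    (src / f"img_{i}.png").write_bytes(b"fake image bytes")

# Start from an empty output directory: remove the tree if present, then recreate it.
shutil.rmtree(dst, ignore_errors=True)
dst.mkdir(parents=True, exist_ok=True)

names = sorted(p.name for p in src.glob("*.png"))   # all image file names
saved = []
for batch in batches(names, batch_size=2):          # one batch of names at a time
    for name in batch:
        # A real pipeline would save the model's prediction here; we copy the bytes instead.
        (dst / name).write_bytes((src / name).read_bytes())
        saved.append(name)

print(saved)
shutil.rmtree(root)                      # clean up the scratch directory
```

Keeping the output file names identical to the input names makes it easy to pair each generated image with its source later.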
Why should we start learning to write our own code/functions?
How to use GPU memory efficiently to avoid restarting notebook?
How to create a `DataBunch` for the discriminator with two classes of images, `images` and `image_gen`?
- [image:ECD5A2F8-A1D2-4CF7-8A89-9C3495D82150-76996-0002F6AC6B981613/381EAF2C-0163-4BF1-9B31-744143C9E2D5.png]
Why do we use a specific `gan_critic` model architecture rather than ResNet to build discriminator?
- what kind of loss do we use here?
- what is spectral normalization to make GAN work?
- maybe we could make a ResNet with spectral normalization to replace gan_critic
- how to train the critic
- [image:C82EBF3E-0DAD-487B-96D6-0DF587286F76-76996-0002F6C426E9DFFA/D2675782-151B-4BE5-8A91-170D7C868D5F.png]
70:08- 73:05
How to use GAN to do pingpong with generator and discriminator training
How to use `GANLearner.from_learners` to train the ping pong process?
How does `weights_gen` combine both losses (`pixel MSE` and `binaryEntropyLoss`) to balance the generator and the critic?
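One way to picture how two losses get blended for the generator is a plain weighted sum of a pixel term and a "fool the critic" term. The sketch below is a generic illustration, not the fastai implementation: the function names and the default weights of 1 and 50 are assumptions chosen for the example.

```python
import math


def pixel_mse(pred, target):
    """Mean squared error between two flat lists of pixel values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def fool_critic_loss(critic_prob_real):
    """Binary cross-entropy against the label 'real': small when the critic is fooled."""
    return -math.log(critic_prob_real)


def generator_loss(pred, target, critic_prob_real, critic_weight=1.0, pixel_weight=50.0):
    """Weighted sum of the adversarial term and the pixel term (weights are assumed values)."""
    return (critic_weight * fool_critic_loss(critic_prob_real)
            + pixel_weight * pixel_mse(pred, target))


target = [0.2, 0.5, 0.9]
pred = [0.25, 0.45, 0.8]
fooled = generator_loss(pred, target, critic_prob_real=0.9)   # critic thinks the image is real
caught = generator_loss(pred, target, critic_prob_real=0.1)   # critic spots the fake
print(fooled < caught)
```

The weights exist because the two terms live on different scales; without rescaling, one of them would dominate the gradient.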
Why and how to downplay `momentum` when using `Adam`?
- [image:7321EE22-B189-47AE-B7CA-EB65AEF00CED-76996-0002F93E289E4CA9/0A83E2DD-B404-41E5-A5D8-B1FE3E637856.png]
How to understand the `gen_loss` and `disc_loss` during the training? (one gets better and the other gets worse, and vice versa)
How to show the result of gan training?
- [image:C16059DC-7061-4621-ADB9-68F9CB0ADB33-76996-0002F99336D4EBFC/778D82F1-55F6-46B9-8A1B-A2FDFC96EB68.png]
74:47-75:56
When to use a U-net and when not to
- segmentation and high resolution
- for classification it makes no sense
75:56-78:37
What WGAN can do
What does WGAN aim to do?
What does the generator aim to turn a noise image into?
Does WGAN use any pretrained model? NO
After hours, some bedroomish images can be produced eventually
why cat’s eye features can’t be upgraded with current GAN?
73:05-74:47
How to implement feature loss paper to help
78:37-97:06
How to use the perceptual loss paper idea to solve the problem
78:37-83:49
- How exactly do the style loss and content loss help to capture eye features?
- are we still really doing a GAN at this point? (seems not)
How to implement the perceptual loss paper as a `FeatureLoss` function?
83:52-88:54
* how to do crappification for this model?
* which loss or base loss to pick? which loss does Jeremy like better?
* How to grab all the feature layers of pretrained VGG model?
* how to get all the layers for generating features losses?
* how to create the feature loss or perceptual loss?
How to train and test on our U-net with feature loss model?
88:54-93:25
- how to train a U-net with feature loss
- how to refactor `do_fit` to make the process a little easier
- how to train to improve performance
- test the model with larger images
How can we be creative with the U-net + GAN + feature_loss notebook?
93:00-97
[image:3F83E9B6-CCF5-4BB7-91D3-008E50AFF678-76996-000305FCE24409DE/0AC75F18-A7C0-42EA-8410-FD9EEC5D54DB.png]
* What is Jason's crappification approach?
* What is deOldify doing?
* what should we learn and do about crappification and deOldify?
97:00-98:41
What we have learnt in part 1
How to link all the concepts below to paint a brief picture of the deep learning workflow?
[image:7BE0220B-A7E5-471D-AAEF-B571901C66B6-76996-0002CD2B80C48B99/96AB3B59-287C-4127-83A6-E138EA601D64.png]
- people usually have to watch the lesson three times to get all the details and feel comfortable with those key concepts
98:34-100:22
How to represent basic NN with single hidden layer with diagram
[image:54EB7701-8A48-4AD7-8165-97B9ECAA8B39-76996-0002CF0A4228A58B/AC0913D7-E16D-477F-B4C1-57387524EA8C.png]
- make sure you are comfortable with how to calculate the shape of the input, activations, and output
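A quick way to check the shape bookkeeping for a network with one hidden layer is to push a small random batch through it and print the shapes. The sketch below uses plain Python lists; the sizes (batch of 4, 3 inputs, 5 hidden units, 2 outputs) are assumptions, and biases are left out to keep it short.

```python
import random


def matmul(a, b):
    """Multiply an (m x k) matrix by a (k x n) matrix, both given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in m]


def shape(m):
    return (len(m), len(m[0]))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
batch_size, n_in, n_hidden, n_out = 4, 3, 5, 2   # assumed sizes

x = rand_matrix(batch_size, n_in, rng)     # input:          (batch_size, n_in)
w1 = rand_matrix(n_in, n_hidden, rng)      # first weights:  (n_in, n_hidden)
w2 = rand_matrix(n_hidden, n_out, rng)     # second weights: (n_hidden, n_out)

hidden = relu(matmul(x, w1))               # activations:    (batch_size, n_hidden)
out = matmul(hidden, w2)                   # output:         (batch_size, n_out)
print(shape(x), shape(hidden), shape(out))
```

The rule to remember is that the inner dimensions of each matrix product must match, and the batch dimension is carried through unchanged.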
100:02-103:12
How to predict 3rd or 4th word with fully connected NN diagram
- How do fully connected NN use two words to predict the third?
- How then to predict the fourth word?
- why should parameters of the same color be the same set of parameters?
[image:A3DB17E3-AB76-4818-A1BF-D767C7B646DE-76996-0002D85254D25DE9/BECCCC16-C799-41D3-943B-EBC449BEDA64.png] [image:2EBD0725-7D53-4292-9E85-CCD47C02242F-76996-0002D857FC4B16B9/B4551F3D-468F-4721-9A70-BCA8BF561B96.png]
103:09-109:11
Toy example - how to create the training and validation sets
- how to access the number of tokens in validation set?
- how to distinguish `bs=batch_size`, `bptt=backpropagation through time`, `num_batches`?
- How to get 3 batches of data from validation set one by one?
- How to count the number of elements within a batch of x or y?
- Why `bptt=70` but first batch has 95 elements and second batch has 69 elements?
- How `x1` and `y1` differ from each other?
- How to `textify` numbers into words?
- How mini-batches of x join up with each other?
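The relationship between `bs`, `bptt` and the number of batches, and the way `x` and `y` differ by one token, can be sketched in plain Python. This is a simplified illustration with a fixed `bptt`, not the fastai data loader; the helper names `batchify` and `iter_batches` are hypothetical.

```python
def batchify(tokens, bs):
    """Split one long token stream into `bs` parallel rows of equal length."""
    row_len = len(tokens) // bs
    return [tokens[r * row_len:(r + 1) * row_len] for r in range(bs)]


def iter_batches(rows, bptt):
    """Yield (x, y) pairs of at most `bptt` tokens per row; y is x shifted one token ahead."""
    row_len = len(rows[0])
    for start in range(0, row_len - 1, bptt):
        seq_len = min(bptt, row_len - 1 - start)
        x = [row[start:start + seq_len] for row in rows]
        y = [row[start + 1:start + 1 + seq_len] for row in rows]
        yield x, y


tokens = list(range(26))                 # stand-in for numericalized text
rows = batchify(tokens, bs=2)            # 2 rows of 13 tokens each
all_batches = list(iter_batches(rows, bptt=5))

for x, y in all_batches:
    print("x:", x, "y:", y)
print("num_batches:", len(all_batches))
```

Each row of the next batch starts exactly where the same row of the previous batch stopped, which is how consecutive mini-batches join up.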
[image:AE5A0EB9-4737-49D4-A96D-5D7F639C6795-76996-0002D54E2D48B6B0/22BDBA2A-B966-420D-8A35-C0FD52358568.png] [image:18BC726D-3B3A-425F-A1FD-056D96B27B76-76996-0002D55B8C6BCD68/A55BC1CA-08A8-4410-A760-E044A17C636E.png]
[image:F0C85252-1AFF-453A-82C2-16D1F13119F5-76996-0002D564FBB255BD/384FE658-537D-45BC-AF0D-850578C3D6D0.png] [image:90F9A20B-F518-490F-9589-E642B14FEA94-76996-0002D56F65054CFA/DB6FB9FD-D207-470E-9069-631F54B9992A.png] [image:8617D414-3D73-472F-8A61-0D7A74FD76B1-76996-0002D57EE9EF1C9F/280A1DD4-0F34-4ED1-8063-951DF0C4258E.png]
108:57-112:08
How to build the predicting n-th from n-1 words model based on the diagram above
* How to implement the diagram into a NN?
* When is it appropriate to refactor code?
* Why can an RNN be seen as just the NN after refactoring?
* How to create a tensor container for `h`, the activation?
* Why should the shape of `h`, the activation, be a fixed size? (just assumed to be in the video)
[image:3559F672-B354-4F89-9B4C-3BAF03458589-76996-0002D67E54EA29E7/392E6C24-BDAD-499F-8179-C84AFFA17A3B.png] [image:FEBC5FE9-71E0-4759-BBAC-DB98CB7FD8D2-76996-0002D6BE424D17C4/76DF1742-C01A-4065-8994-3FC590FF12FF.png]
112:08-115:00
How to build the NN to predict N-th word with N-1th word
- How can such model’s loss function make the most out of words input (compared to previous model)?
- [image:115D0079-23B3-41B6-A52C-1F49EAAAFE22-76996-0002D8B1CA8E7878/EDE758BD-BEB8-4B1C-97C3-8A5821C2C0D0.png]
* What do the diagram and the NN look like now?
* [image:AD60460F-79F2-4AB9-A62B-7CEA27DEE907-76996-0002D8CF71CAF5D2/AF0DF661-4D14-4855-AA30-B80D8CE0325D.png] [image:05A5B6D4-9590-40E1-8E68-A7F5A0C0C163-76996-0002D8DAE81AD37C/0B17DF53-3B3B-46E0-8038-765B09D66A52.png]
- Why does this new model perform worse?
- [image:B58A024E-E6D2-40F4-BC31-7CCEAAD717F1-76996-0002D8EC63A82D9B/9770EC6E-5046-417C-80FD-09132DBB0B24.png]
- How to solve this problem?
- [image:602586BF-3EC6-4B5C-9CD6-598BC7BB0E66-76996-0002D909E79C817B/1E5F7C06-85C4-4307-BBC8-E7B0454BC1EB.png][image:0D53D28C-C4CA-4315-A73D-D0D6FC0553B4-76996-0002D920A624A226/606BD09D-1FD6-493C-BFCF-4457923C4DCF.png]
* So, what is RNN?
* just fully connected NN with refactor of loops
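The claim that an RNN is a fully connected network refactored into a loop can be checked on a toy example: write the three steps out by hand, then write the same thing as a loop, and compare. The sketch uses scalar weights and a `tanh` step purely for brevity; a real model would use an embedding and weight matrices, and the names `step`, `unrolled` and `looped` are made up for this illustration.

```python
import math


def step(h, x, w_ih, w_hh):
    """One recurrent step: mix the current input into the hidden state."""
    return math.tanh(w_ih * x + w_hh * h)


def unrolled(x1, x2, x3, w_ih, w_hh):
    """Three inputs handled by three copies of the same layer, written out by hand."""
    h = 0.0
    h = step(h, x1, w_ih, w_hh)
    h = step(h, x2, w_ih, w_hh)
    h = step(h, x3, w_ih, w_hh)
    return h


def looped(xs, w_ih, w_hh):
    """The same computation refactored into a loop; works for any sequence length."""
    h = 0.0
    for x in xs:
        h = step(h, x, w_ih, w_hh)
    return h


xs = [0.5, -1.0, 2.0]
print(unrolled(*xs, w_ih=0.7, w_hh=0.3) == looped(xs, w_ih=0.7, w_hh=0.3))
```

Because the same weights are reused at every step, the loop version needs no extra parameters when the sequence gets longer.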
115:00-119:00
How to construct multi-layer RNN
- How to refactor the code with `nn.RNN`?
- [image:FBD17C9F-E448-4C70-9667-7518665CE381-76996-0002DBFE40EF2337/A2496C4A-EB7C-4FE3-8A7B-7183F6F24527.png]
- How to construct a 2-layer RNN?
- What is GRU or LSTM
- [image:98B2640C-1011-4FBC-B3CF-7A9D7F02D8BC-76996-0002DC3F7C828DAD/3E17705E-BFA2-4FC7-8CBF-FAC8D15AFC9E.png] [image:23733A06-1A7D-42C7-935E-2BC9BFAE7045-76996-0002DC47DC4EE864/F2D891A4-8012-461A-8814-6BEC49DEC1B1.png]
* What are sequence labeling tasks?
* What and how to do NLP classification?
1:58:59-end
How to learn by heart
- watching a second time always helps you pick up parts of the lesson you did not understand before and implement code you could not write before
- make sure you code something on your own
- people can confirm what you did right and where to improve and learn more
- more papers to cover in part 2
- just focus on practical sections such as “why we are solving this problem” and “what are the results”
- put into words on what you learnt
- not for DL academic professionals
- but to help people like you 6 months ago
- to get help from others
- to help others
- to share your successful stories
- social learning is very helpful
- we can do book clubs, meetups, study groups
- make the world a slightly better place
- or, make the people you love a little more delighted
- just finish something, build something, such as a model that can generate tweets that sound like Elon Musk
- people on forum can help even guide you to do so
- you can build an app, create a project, help with library
[image:9D096B6E-14A0-40E2-AB84-41EB42F1ABC5-76996-0002CB13B44C88BA/FD9C3647-BCE9-47D8-BB36-8EF64F8D9356.png]
- it may seem boring from outside
- help docs, texts require deep understanding of the implementation of codes
- curators can send you papers and materials to figure out why they wrote code this way
- eventually you are going to write the docs and texts to explain it clearly
[image:E6B8B10E-1561-469E-B457-9EB422A37EA0-76996-0002CB1789E76F94/391148FC-2619-4F2E-AF71-F7E7C064E926.png]
- go on to forum and find your timezone
- get a google sheet to sign up
- to create projects and wiki together
- see how the fastai codebase was built from stage to stage
- talking about software development in terms of fastai
- to learn the process of doing research and reading papers
- how to turn math into codes
- many more advanced architectures
[image:48B3388F-1A6C-4D5F-B31D-BFF092F6F724-76996-0002CC48AABA35C9/18DAEBA0-62B4-4E4A-84A0-DF8DC9AC6ECD.png]
- people shocked to see me disorganized and incompetent
- have a good time without a specific plan, just want to finish it
- DL is not like web app with regular feedback and specific milestones, therefore you must be able to have fun in DL to keep you going
- No meetings, phone calls, coffee, TV, PC games, but a lot of time coding, reading, exercising and with family
- make sure to get something finished properly, and even get a group to do it together
- RL is overly complex and less useful to normal people in day to day work
- Transfer learning has always been underappreciated and under-researched; I helped change NLP with transfer learning. I am excited to get transfer learning to work better and faster in many areas
- just coding and code all the time
- make sure you know all the tiny coding skills we covered
- rebuild all the notebooks from scratch but with fastai lib
- it makes you a top-tier student or practitioner
- become a software to use without coding
- get rid of course and code and do useful stuff easily and nicely
%matplotlib inline
from fastai.basics import *
In this part of the lecture we explain Stochastic Gradient Descent (SGD) which is an optimization method commonly used in neural networks. We will illustrate the concepts with concrete examples.
The goal of linear regression is to fit a line to a set of points.
n=100
x = torch.ones(n,2) # the second feature is always 1, to keep the problem simple
x[:,0].uniform_(-1.,1)
x[:5]
tensor([[-0.1957, 1.0000],
[ 0.1826, 1.0000],
[-0.1008, 1.0000],
[-0.1449, 1.0000],
[ 0.7091, 1.0000]])
a = tensor(3.,2); a
tensor([3., 2.])
y = x@a + torch.rand(n)
plt.scatter(x[:,0], y);
You want to find parameters (weights) `a` such that you minimize the error between the points and the line `x@a`. Note that here `a` is unknown. For a regression problem the most common error function or loss function is the mean squared error.
def mse(y_hat, y): return ((y_hat-y)**2).mean()
Suppose we believe `a = (-1.0,1.0)`; then we can compute `y_hat`, which is our prediction, and then compute our error.
a = tensor(-1.,1)
y_hat = x@a
mse(y_hat, y)
tensor(7.9356)
plt.scatter(x[:,0],y)
plt.scatter(x[:,0],y_hat);
So far we have specified the model (linear regression) and the evaluation criteria (or loss function). Now we need to handle optimization; that is, how do we find the best values for `a`? How do we find the best fitting linear regression?
We would like to find the values of `a` that minimize `mse_loss`.
Gradient descent is an algorithm that minimizes functions. Given a function defined by a set of parameters, gradient descent starts with an initial set of parameter values and iteratively moves toward a set of parameter values that minimize the function. This iterative minimization is achieved by taking steps in the negative direction of the function gradient.
Here is gradient descent implemented in PyTorch.
a = nn.Parameter(a); a
Parameter containing:
tensor([-1., 1.], requires_grad=True)
def update():
    y_hat = x@a
    loss = mse(y, y_hat)
    if t % 10 == 0: print(loss)
    loss.backward()
    with torch.no_grad():
        a.sub_(lr * a.grad)
        a.grad.zero_()
lr = 1e-1
for t in range(100): update()
tensor(7.9356, grad_fn=<MeanBackward1>)
tensor(1.4609, grad_fn=<MeanBackward1>)
tensor(0.4824, grad_fn=<MeanBackward1>)
tensor(0.1995, grad_fn=<MeanBackward1>)
tensor(0.1147, grad_fn=<MeanBackward1>)
tensor(0.0893, grad_fn=<MeanBackward1>)
tensor(0.0816, grad_fn=<MeanBackward1>)
tensor(0.0793, grad_fn=<MeanBackward1>)
tensor(0.0786, grad_fn=<MeanBackward1>)
tensor(0.0784, grad_fn=<MeanBackward1>)
plt.scatter(x[:,0],y)
plt.scatter(x[:,0],x@a);
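For this small model the gradient that `loss.backward()` computes can also be written by hand: for `y_hat = a0*x + a1`, the derivative of the mean squared error is `2*mean(err*x)` with respect to `a0` and `2*mean(err)` with respect to `a1`. The following plain-Python sketch repeats the same loop without autograd; it regenerates its own toy data with a fixed seed, so its numbers will not match the outputs printed above.

```python
import random

rng = random.Random(0)
n = 100
xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
ys = [3.0 * x + 2.0 + rng.random() for x in xs]   # same recipe: slope 3, intercept 2, plus noise


def mse_and_grad(a0, a1):
    """Return the MSE of y_hat = a0*x + a1 and its gradient with respect to (a0, a1)."""
    errs = [a0 * x + a1 - y for x, y in zip(xs, ys)]
    loss = sum(e * e for e in errs) / n
    g0 = 2.0 * sum(e * x for e, x in zip(errs, xs)) / n
    g1 = 2.0 * sum(errs) / n
    return loss, (g0, g1)


a0, a1 = -1.0, 1.0          # same starting guess as in the text
lr = 0.1
losses = []
for t in range(100):
    loss, (g0, g1) = mse_and_grad(a0, a1)
    losses.append(loss)
    a0 -= lr * g0           # step against the gradient
    a1 -= lr * g1
print(round(a0, 2), round(a1, 2), round(losses[-1], 4))
```

The intercept settles near 2.5 rather than 2 because the added noise is uniform on [0, 1) and therefore has mean 0.5.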
from matplotlib import animation, rc
rc('animation', html='jshtml')
a = nn.Parameter(tensor(-1.,1))
fig = plt.figure()
plt.scatter(x[:,0], y, c='orange')
line, = plt.plot(x[:,0], x@a)
plt.close()
def animate(i):
    update()
    line.set_ydata(x@a)
    return line,
animation.FuncAnimation(fig, animate, np.arange(0, 100), interval=20)
In practice, we don’t calculate on the whole file at once, but we use mini-batches.
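A minimal sketch of what mini-batch training looks like for the same kind of toy regression: shuffle the row indices every epoch and take one gradient step per small batch instead of one step per pass over all the data. The batch size of 10, the 20 epochs and the helper names are assumptions made for this example.

```python
import random

rng = random.Random(0)
n = 100
xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
ys = [3.0 * x + 2.0 + rng.random() for x in xs]   # toy data: slope 3, intercept 2, plus noise


def mse(a0, a1, idxs):
    """Mean squared error of y_hat = a0*x + a1 over the rows listed in `idxs`."""
    return sum((a0 * xs[i] + a1 - ys[i]) ** 2 for i in idxs) / len(idxs)


def grad(a0, a1, idxs):
    """Gradient of that MSE with respect to (a0, a1), computed on `idxs` only."""
    errs = [(a0 * xs[i] + a1 - ys[i], xs[i]) for i in idxs]
    g0 = 2.0 * sum(e * x for e, x in errs) / len(idxs)
    g1 = 2.0 * sum(e for e, _ in errs) / len(idxs)
    return g0, g1


a0, a1, lr, batch_size = -1.0, 1.0, 0.1, 10
everything = list(range(n))
start_loss = mse(a0, a1, everything)
for epoch in range(20):
    order = everything[:]
    rng.shuffle(order)                               # new random order every epoch
    for start in range(0, n, batch_size):
        batch = order[start:start + batch_size]      # one mini-batch of row indices
        g0, g1 = grad(a0, a1, batch)
        a0 -= lr * g0
        a1 -= lr * g1
end_loss = mse(a0, a1, everything)
print(round(start_loss, 3), round(end_loss, 3))
```

Each step uses a noisier gradient estimate than the full-data version, but there are ten updates per epoch instead of one.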
For classification problems, we use cross entropy loss, also known as negative log likelihood loss. This penalizes incorrect confident predictions, and correct unconfident predictions.
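To see how that penalty behaves, it helps to compute the loss for a few probabilities. For a single example, the negative log likelihood is minus the log of the probability the model assigned to the correct class. The probabilities below are made-up values chosen to illustrate the four cases.

```python
import math


def nll(prob_of_correct_class):
    """Negative log likelihood for one example."""
    return -math.log(prob_of_correct_class)


cases = {
    "confident and correct": 0.99,
    "unconfident but correct": 0.55,
    "unconfident and wrong": 0.45,
    "confident and wrong": 0.01,
}
for name, p in cases.items():
    print(f"{name:24s} p(correct)={p:.2f}  loss={nll(p):.3f}")
```

The loss grows slowly while the model is roughly right and very steeply as the probability given to the correct class approaches zero.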