Check your understanding of the lesson 3
<<< Check your understanding of the lesson 2 | Check your understanding of the lesson 4 >>>
Hi guys,
I watched the video of lesson 3 (part 1) again to get the whole picture, and I took notes on the vocabulary used by @jeremy.
Let’s play! OK?
Can you give a definition, a URL, or an explanation for each of the following terms and expressions?
If yes, you are done with the 3rd lesson!
PS: you do not want to test yourself, or you want to check your answers? Go to the blog post “Deep Learning 2: Part 1 Lesson 3” by @hiromi: great work!!!
- try to teach what you learned by posting in a blog
- wiki thread in the Fastai forum
- AWS fastai AMI
- Github
- Tmux (Ubuntu, macOS)
- understand why some validation images are misclassified
- learning rate
- why is a low learning rate safer but slower for training a NN?
- why can a high learning rate increase the value of the loss function?
- learn.lr_find(); learn.sched.plot()
- batch size
- SGDR
- fastai vs pytorch
- CNN, or Convolutional Neural Network
- Resnet
- Beginner Fastai forum
- Kaggle site
- How to download data from Kaggle: the kaggle-cli script
- pip install kaggle-cli
- accept the competition rules on the Kaggle site
- kg download -u user -p 'password' -c competition
- How to download images from any site
- CurlWget as a Google Chrome extension
- symlinks
- ls -l in a terminal
- Quick DogsCats
- fastai.conv_learner
- tfms, data transformation
- data object
- shift + tab
- test_name="test"
- learn object
- precompute=True
- learn.unfreeze()
- learn.bn_freeze(True) for deeper NNs (resnet50 and above) when your dataset is very similar to the Imagenet dataset (ours is dogs and cats): it stops the batch normalization layers from being updated
- batch normalization
- use TTA to get validation predictions
- tensorflow, keras // pytorch, fastai
- mobile applications
- create a submission file
- individual prediction
- http://setosa.io/ev/image-kernels/
- difference between an element-wise product and a matrix product?
- Video by Otavio Good: “A visual and intuitive understanding of deep learning”
- convolutional kernel / filter with a shape of 3 x 3
- search for edges (left and top)
- feature maps
- non linearity, relu
- max pooling
- fastai/courses/dl1/excel
- MNIST data base
- filter that detects top edges
- we get an activation by taking the element-wise product of the convolutional filter with a patch of the input and summing the result
- an activation is a number that is calculated
- ReLU means max(0, value)
- pytorch stores convolutional filters as a tensor
- a tensor is an array with more dimensions (additional axes)
- the number of feature maps in each hidden layer of a CNN is the number of convolutional filters applied to the previous layer
- a convolutional kernel has 3 dimensions, and the third one is the number of feature maps in its input layer
- max pooling: shrink the feature maps by sub-sampling (keep the max of each window) without overlapping
- fully connected layer (linear matrix product)
- but a big CNN has a big number of weights in the fully connected layers: risk of overfitting!
- VGG (16 layers): 138 million weights
- VGG (19 layers): more than 143 million weights
- in these CNNs, the number of weights in the convolutional filters is roughly 15 to 20 million: the majority of the weights come from the fully connected layers
- Resnet and ResNext do not use large fully connected layers
- the 50-layer ResNet network has about 26 million weight parameters and computes ~16 million activations in the forward pass (https://www.graphcore.ai/posts/why-is-so-much-memory-needed-for-deep-neural-networks)
- the fully connected layers do a classic matrix product
- last layer: there is no ReLU (so we can have negative values)
- softmax is an activation function that outputs probabilities
- softmax tends to pick one thing out over the others (ie, one probability clearly higher than the other ones): its “personality” is to pick a single thing (so it is perfect for a single-label classifier)
- sigmoid is an activation function used for multi-label classifiers because it gives a number between 0 and 1 (which looks like a probability) for each label
- ReLU is an activation function too, but it does not output probabilities
- an activation function is a function applied to activations
- in Deep Learning, an activation function adds a non-linearity
- we must know log, exp
- activation functions have a personality
- we cannot use softmax for multi-label classification
- if your objective is to classify multi-label images, you cannot use ImageClassifierData.from_paths because an image cannot be in more than one folder. Instead, you need to use ImageClassifierData.from_csv
- Good news: the Fastai library will recognize from your csv file whether an image has more than one label (multi-label classification)
- data.val_ds (ds as in data set in pytorch): gives you a single image (or object) back
- data.val_dl (dl as in data loader in pytorch): gives you a transformed mini batch
- in pytorch, to get the next mini batch, we use a generator (iterator): next(iter(data.val_dl))
- if you know python, you will learn pytorch naturally
- zip takes 2 lists and combines them: list(zip(data.classes,y[0]))
- one-hot encoded vector
- CatsDogs and DogsBreed were single-label classifications
- images from the Planet competition are not like the ones used in the Imagenet competition
- you can change the input image size during training for NNs that have an adaptive pooling layer before the first fully connected layer, like Resnet (but not VGG): learn.set_data(get_data(sz))
- resize the images before passing them to the data object thanks to data.resize(int(sz*1.3), 'tmp'): speed-up! (faster than resizing directly in the tfms)
- after dogsbreed, try to run the Planet jupyter notebook
- metrics used instead of accuracy: metrics = [f2] (f2 uses fbeta_score), passed to the learn object: learn = ConvLearner.pretrained(arch, data, metrics=metrics)
- in the Fastai library, everything can be changed
- sigmoid function is used for logistic regression
- fastai automatically chooses the softmax or sigmoid activation function
- when you use a pretrained CNN, the weights of the first layers of your new model are not random, but the weights of the last fully connected layers you added are random. So you need to train these last layers first, before you unfreeze and train the whole network. If not, the random weights of the last layers will destroy the weights of the first layers (from the pretrained model)
- the GPU takes a center crop of size sz on each input image. That’s why it is important to do Data Augmentation beforehand on the input dataset
- in the fastai library, there is a concept of layer groups
- learn.summary()
- tables of data : structured data
- audio, images, natural language: unstructured data
- Grocery Sales Forecasting competition in Kaggle
- Rossmann data
- from fastai.structured import *
- from fastai.column_data import *
- pandas (book: Python for Data Analysis)
- test = pd.read_csv(f'{PATH}test.csv', parse_dates=['Date'])
- there is a difference with the DogsCats dataset: we do a lot of preprocessing on this structured data
- enter Kaggle and do competitions!
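
For the two learning rate questions in the list, a toy example may help. This is only a sketch and not code from the lesson: the function name gradient_descent and the loss f(x) = x**2 are my own choices for illustration. It runs plain gradient descent with three learning rates and records the loss after each step.

```python
def gradient_descent(lr, steps=10, x0=1.0):
    """Minimise f(x) = x**2 and return the loss before and after each step."""
    x = x0
    losses = [x ** 2]
    for _ in range(steps):
        grad = 2 * x        # derivative of x**2
        x = x - lr * grad   # one gradient descent update
        losses.append(x ** 2)
    return losses

low = gradient_descent(lr=0.01)   # safe, but the loss goes down slowly
good = gradient_descent(lr=0.1)   # the loss goes down much faster
high = gradient_descent(lr=1.1)   # every step overshoots the minimum

print("lr=0.01:", round(low[-1], 4))
print("lr=0.1 :", round(good[-1], 4))
print("lr=1.1 :", round(high[-1], 4))
```

With the low learning rate the loss always decreases, but only a little per step. With the high one each update jumps over the minimum and lands further away, so the loss grows.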
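
The items about the 3 x 3 filter, ReLU and max pooling can be sketched in plain Python. This is my own minimal version, not the spreadsheet in fastai/courses/dl1/excel: the 6 x 6 image, the kernel values and the function names conv2d, relu and max_pool_2x2 are assumptions for illustration. Note that each activation comes from an element-wise product followed by a sum, not from a matrix product.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            # element-wise product of the kernel with the patch, then a sum
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

def relu(fmap):
    """Replace every negative activation by 0."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool_2x2(fmap):
    """Keep the max of each non-overlapping 2 x 2 window."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

# A bright horizontal band (rows 2 and 3) on a dark background.
image = [[0] * 6, [0] * 6, [1] * 6, [1] * 6, [0] * 6, [0] * 6]

# Illustrative kernel: positive where it is dark above and bright below,
# which is the top edge of the band.
top_edge = [[-1, -1, -1],
            [ 0,  0,  0],
            [ 1,  1,  1]]

activations = conv2d(image, top_edge)   # 4 x 4 feature map
rectified = relu(activations)           # negatives (bottom edge) become 0
pooled = max_pool_2x2(rectified)        # 2 x 2 after sub-sampling

print(activations)
print(rectified)
print(pooled)
```

The top edge of the band gives positive activations, the bottom edge gives negative ones that ReLU sets to 0, and max pooling halves the height and the width of the feature map.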
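
The claim that most VGG weights sit in the fully connected layers can be checked with a little arithmetic. The sketch below assumes the standard VGG16 layout for a 224 x 224 input (a last feature map of 7 x 7 x 512, then fully connected layers of 4096, 4096 and 1000 units) and the commonly cited total of 138,357,544 parameters. The helper dense_params is a name I made up.

```python
# Sizes of the VGG16 fully connected part, assuming a 224 x 224 input.
fc_sizes = [7 * 7 * 512, 4096, 4096, 1000]

def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

fc_params = sum(dense_params(a, b) for a, b in zip(fc_sizes, fc_sizes[1:]))
total_params = 138_357_544          # commonly cited VGG16 total (assumption)
conv_params = total_params - fc_params

print("fully connected:", fc_params)
print("convolutional  :", conv_params)
print("share in FC    :", round(fc_params / total_params, 3))
```

Under these assumptions the first fully connected layer alone holds more than 100 million weights, and the convolutional filters of VGG16 account for about 14.7 million, the same order of magnitude as the figure in the list.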
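
The different “personalities” of softmax and sigmoid are easier to see with numbers. Here is a small sketch using only the standard library. The activation values are made up, and the functions are my own minimal versions, not the ones fastai or pytorch use internally.

```python
import math

def softmax(xs):
    """exp of each activation divided by the sum of all the exps."""
    m = max(xs)                              # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Squash one activation into a number between 0 and 1."""
    return 1 / (1 + math.exp(-x))

# Made-up activations of a last layer (no ReLU, so negatives are allowed).
activations = [2.0, 1.0, -1.0, 3.0]

soft = softmax(activations)
sig = [sigmoid(a) for a in activations]

print("softmax:", [round(p, 3) for p in soft], "sum =", round(sum(soft), 3))
print("sigmoid:", [round(p, 3) for p in sig], "sum =", round(sum(sig), 3))
```

The softmax outputs always add up to 1, so the labels compete and the largest activation takes most of the probability. Each sigmoid output is computed independently, so several labels can be close to 1 at the same time, which is what a multi-label classifier needs.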
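
The items on next(iter(...)), zip and one-hot encoded vectors fit together in a few lines of plain Python. This sketch does not use fastai or pytorch: the class names, the labels and the batches generator are hypothetical stand-ins for data.classes, y and data.val_dl.

```python
# Hypothetical class names, standing in for data.classes.
classes = ["agriculture", "clear", "cloudy", "primary", "water"]

# One-hot encoded labels for three images (several 1s means multi-label).
labels = [
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
]

def batches(items, bs):
    """A generator that yields mini batches, standing in for a data loader."""
    for i in range(0, len(items), bs):
        yield items[i:i + bs]

y = next(iter(batches(labels, bs=2)))   # first mini batch of labels
pairs = list(zip(classes, y[0]))        # combine the 2 lists item by item
present = [name for name, flag in pairs if flag == 1]

print(pairs)
print(present)
```

zip pairs each class name with the 0 or 1 at the same position, which makes a one-hot encoded vector much easier to read.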