(Will be updated when the video has been edited)
- Welcome to lesson 5. We started with computer vision (the most mature area), then text, then tabular and collaborative filtering
- We are trying to understand every line of code. After this lesson you will understand things like regularisation
- Starting where we left off: a picture of what a deep neural net looks like. Two types of layers: ones with parameters that your model learns, and ones with activations (numbers that are calculated)
- Activation functions are element-wise functions; ReLU is the main one, and it doesn't matter that much which one you pick
- Universal approximation theorem
- Backpropagation definition
- Fine tuning
- Visualizing the filters and the layers
- We need to train the new weights; we freeze the earlier layers because we do not need to train those (as much: in transfer learning, there is no point changing weights that are already better than nothing)
- Now we unfreeze and train the rest of the network, with more training for the later layers than the earlier ones; we split our model into sections and give different parts of the model different learning rates => discriminative learning rates
- Discriminative learning rates in fastai
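A minimal sketch of how this looks with the fastai v1 API used in the course (the learner name and the learning-rate values are assumptions, not the lesson's exact numbers):

```python
# After training just the new head with the body frozen, unfreeze and pass a
# slice of learning rates so earlier layer groups train more gently than the head.
learn.fit_one_cycle(4)                              # head only, body frozen
learn.unfreeze()                                    # make all layer groups trainable
learn.fit_one_cycle(4, max_lr=slice(1e-5, 1e-3))    # early layers ~1e-5, head ~1e-3
```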
- Collaborative filtering recap (also, affine functions)
- Excel solver
- Another worksheet: one-hot encoding
- Third version of the spreadsheet: array lookup, what an embedding is
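A small sketch (plain PyTorch, names made up) of the point that an embedding is just an array lookup, equivalent to multiplying by a one-hot vector:

```python
import torch

n_users, n_factors = 5, 3
weights = torch.randn(n_users, n_factors)    # the embedding matrix

user_idx = 2
one_hot = torch.zeros(n_users)
one_hot[user_idx] = 1.0

via_matmul = one_hot @ weights               # matrix multiply by a one-hot vector
via_lookup = weights[user_idx]               # plain array lookup (what an embedding does)
assert torch.allclose(via_matmul, via_lookup)
```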
- Bias (rewind a bit for the reason why it is needed)
- Q: when we load a pretrained model, can we explore the activation grid to see what they are good at recognising?
- Q: the first number passed to fit is the number of epochs
- Q: what is an affine function -> a function where you multiply things and add them up; it is linear
- Collaborative filtering notebook (MovieLens)
- Encoding problems with non-Unicode files
- N-hot encoding
- y_range
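A sketch of where y_range goes in the fastai v1 collab API (argument values recalled from the lesson, so treat them as assumptions). The upper bound sits a bit above 5 because a sigmoid never quite reaches its maximum, which is the anime point a couple of bullets down:

```python
# collab_learner squashes its prediction through a sigmoid scaled to y_range
learn = collab_learner(data, n_factors=40, y_range=(0, 5.5))
learn.fit_one_cycle(5, 5e-3)
```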
- (-10 sec) Factors
- Dealing with the fact that people who like anime really like anime
- Looking at the code, top_movies
- Asking for bias
- PCA - check out Rachel's course on Linear Algebra
- Can also be used for image similarity
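A hedged sketch of running PCA on the learned movie embeddings (here `movie_factors` is an assumed tensor of shape [n_movies, n_factors] pulled out of the trained model):

```python
from sklearn.decomposition import PCA

# Reduce the n_factors-dimensional embeddings to 3 components, then look at
# which movies sit at the extremes of each component.
movie_pca = PCA(n_components=3).fit_transform(movie_factors.detach().cpu().numpy())
print(movie_pca.shape)   # (n_movies, 3)
```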
- Q: why negative loss
- collab_learner
- Code: EmbeddingDotBias.forward
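A simplified sketch of the idea behind EmbeddingDotBias.forward (attribute names shortened, not the library's exact code): dot product of the user and item embeddings, plus both biases, squashed into y_range with a sigmoid:

```python
import torch
from torch import nn

class DotBias(nn.Module):
    def __init__(self, n_users, n_items, n_factors, y_range=(0, 5.5)):
        super().__init__()
        self.u_w = nn.Embedding(n_users, n_factors)   # user factors
        self.i_w = nn.Embedding(n_items, n_factors)   # item factors
        self.u_b = nn.Embedding(n_users, 1)           # user bias
        self.i_b = nn.Embedding(n_items, 1)           # item bias
        self.y_range = y_range

    def forward(self, users, items):
        dot = (self.u_w(users) * self.i_w(items)).sum(dim=1)
        res = dot + self.u_b(users).squeeze(1) + self.i_b(items).squeeze(1)
        lo, hi = self.y_range
        return torch.sigmoid(res) * (hi - lo) + lo    # scale the sigmoid into y_range
```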
- Break
- End of break, interpreting embeddings
- Paper: Entity Embeddings of Categorical Variables (interesting preprocessing; from 2016, when neural nets on tabular data were new)
- 2D projection of the embedding space discovers geography, distance, etc.
- Continuing with notebook - Weight decay - Regularisation
- More parameters = more non-linearities (for motivation, rewind a bit)
- Set wd to 0.1 instead of the default 0.01
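A sketch of setting wd in fastai v1 (the layer sizes here are assumptions, not the lesson's exact values):

```python
# Bump weight decay from the library default of 0.01 to 0.1, either on the
# Learner or per fit call.
learn = tabular_learner(data, layers=[200, 100], wd=0.1)
learn.fit_one_cycle(5, 1e-2)             # uses the learner's wd
# learn.fit_one_cycle(5, 1e-2, wd=0.1)   # or override it just for this call
```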
- SGD - rewatch this part of lesson 2
- Loss
- MNIST as a standard fully connected net
- TensorDataset
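A sketch (fastai v1; the tensor names are assumed) of wrapping the MNIST tensors in plain PyTorch TensorDatasets and then a fastai DataBunch:

```python
from torch.utils.data import TensorDataset
from fastai.basics import DataBunch

train_ds = TensorDataset(x_train, y_train)           # pairs up images and labels
valid_ds = TensorDataset(x_valid, y_valid)
data = DataBunch.create(train_ds, valid_ds, bs=64)   # fastai wrapper around the DataLoaders
```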
- Our own nn.Module -> also learn how to do that yourself; remember to call __init__
- Try to make your own nn.Linear
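A minimal sketch covering the two bullets above: a hand-rolled nn.Linear (weights and bias registered as nn.Parameter) used inside our own nn.Module, with the super().__init__() call that is easy to forget:

```python
import math
import torch
from torch import nn

class MyLinear(nn.Module):
    "Hand-rolled version of nn.Linear."
    def __init__(self, n_in, n_out):
        super().__init__()                 # remember __init__!
        self.weight = nn.Parameter(torch.randn(n_out, n_in) / math.sqrt(n_in))
        self.bias = nn.Parameter(torch.zeros(n_out))

    def forward(self, x):
        return x @ self.weight.t() + self.bias

class MnistLogistic(nn.Module):
    "One linear layer from 784 pixels to 10 classes."
    def __init__(self):
        super().__init__()
        self.lin = MyLinear(784, 10)

    def forward(self, xb):
        return self.lin(xb)
```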
- model.parameters()
- Can't use MSE to predict MNIST number classes; use cross-entropy loss instead (more on wd in the code)
- .item() -> converts a tensor to a normal Python number
- Math (L2 regularization, weight decay)
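The math in one place (standard L2 regularization / weight decay, written out from the definitions used in the lesson):

```latex
% L2 regularization: add the sum of squared weights to the loss
\mathcal{L}(w) = \mathcal{L}_{\text{data}}(w) + wd \sum_i w_i^2
% Its gradient contributes 2 * wd * w, so the SGD update shrinks ("decays")
% the weights a little on every step:
w \leftarrow w - lr \left( \frac{\partial \mathcal{L}_{\text{data}}}{\partial w} + 2\, wd\, w \right)
```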
- Small dataset, so avoid overfitting by adding weight decay
- mnist_NN
- (-20 sec) We have now made a neural net from scratch
- opt = optim.Adam or optim.SGD -> opt.step()
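A sketch of the hand-written training loop around opt.step() (the model and dataloader names are assumptions carried over from the MNIST sketch above):

```python
import torch.nn.functional as F
from torch import optim

model = MnistLogistic()
opt = optim.SGD(model.parameters(), lr=2e-2)    # or optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(5):
    for xb, yb in train_dl:
        loss = F.cross_entropy(model(xb), yb)   # cross-entropy, not MSE, for 10 classes
        loss.backward()                         # compute gradients
        opt.step()                              # update the parameters
        opt.zero_grad()                         # reset gradients for the next batch
    print(epoch, loss.item())                   # .item() -> plain Python number
```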
- SGD vs Adam (-30 sec) -> diverged -> start again -> better
- Gradient descent in Excel to demonstrate Adam
- Macro
- Momentum: 10% is the derivative, 90% is the direction I went last time -> steps get bigger and bigger -> if you go too far, the average is somewhere in between
- Exponential moving average formula
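Written out (an alpha of around 0.1 gives the "10% current gradient, 90% previous direction" behaviour described above):

```latex
% Exponentially weighted moving average of the gradients g_t:
S_t = \alpha\, g_t + (1 - \alpha)\, S_{t-1}
% SGD with momentum then steps in the direction of S_t instead of g_t.
```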
- SGD with momentum
- Assignment suggestion
- RMSProp
- What Adam does: momentum + RMSProp (dynamic learning rates)
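A sketch of one Adam-style update on a single tensor, just to show the two moving averages working together (the bias correction in the real optimizer is left out to keep the idea visible):

```python
import torch

def adam_step(p, grad, state, lr=1e-3, beta1=0.9, beta2=0.99, eps=1e-8):
    # momentum: moving average of the gradient
    state['avg_grad'] = beta1 * state['avg_grad'] + (1 - beta1) * grad
    # RMSProp: moving average of the squared gradient
    state['avg_sq'] = beta2 * state['avg_sq'] + (1 - beta2) * grad * grad
    # dividing by the root of avg_sq means parameters whose gradients are
    # consistently tiny effectively get a larger learning rate, and vice versa
    with torch.no_grad():
        p -= lr * state['avg_grad'] / (state['avg_sq'].sqrt() + eps)
```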
- What fit_one_cycle does => avoids a bumpy start, and momentum is smaller when the learning rate is high and vice versa => 10x faster
- Plotting losses in fastai uses an exponentially weighted moving average
- Understand all tabular code
- Cross-entropy loss
- The correct activation function (softmax in this case) and cross-entropy as the loss for single-label classification
- If you see big numbers in your output you might need to add softmax yourself, especially if you are using a custom loss function, since then fastai might not add softmax automatically
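A small check you can do by hand: softmax turns the raw activations into probabilities, and cross-entropy is minus the log of the probability of the correct class; PyTorch's F.cross_entropy applies the (log-)softmax internally:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])        # raw activations for 3 classes
probs = torch.softmax(logits, dim=1)             # sums to 1 across the classes
target = torch.tensor([0])

loss_manual = -torch.log(probs[0, target])       # cross-entropy by hand
loss_builtin = F.cross_entropy(logits, target)   # softmax + negative log likelihood
assert torch.allclose(loss_manual.squeeze(), loss_builtin)
```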
- Next week: forward in tabular, dropout, batch norm, data augmentation, convolutions, new architectures for computer vision
- End of lesson