Lecture 5
Build a tabular model from scratch
00:00 Tabular model from scratch
00:42 Review Titanic dataset and the two models in Excel
01:30 From Excel to Python
Linear model and neural net from scratch
01:57 Clean version of notebook
- What does a clean version of the notebook look like?
02:38 Get comfortable in Paperspace Gradient
- How to work with JupyterLab mode instead of the default mode?
- How to switch between JupyterLab mode and Jupyter notebook mode?
- Learn some useful keyboard shortcuts
04:30 Things to do with clean notebook
05:17 Same notebook runs on Kaggle and everywhere
06:42 Libraries and format setup
07:24 Read train.csv as Dataframe
- How to read and display a CSV file as a pandas DataFrame? #data-cleaning (sketch below)
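A minimal sketch of the pandas calls (the train.csv path is an assumption):

```python
import pandas as pd

df = pd.read_csv('train.csv')  # read the Titanic training data as a DataFrame
df                             # a bare name on the last line displays it in Jupyter
```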
07:47 Find and count missing data with pandas
09:32 Choose the mode value for the missing data
- What is the most common choice for replacing missing data, regardless of whether the column is categorical or continuous? mode #data-cleaning
- How to select the first mode value if there are two modes available for a column?
10:43 Be proactively curious
12:22 Replace missing data with mode values
- How to fill in the missing data with the mode values of those columns, with or without creating a new DataFrame? #data-cleaning (sketch below)
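A sketch of the missing-data steps above, following the lecture notebook's approach:

```python
df.isna().sum()                 # count missing values per column

modes = df.mode().iloc[0]       # iloc[0] keeps the first mode when a column has two

df.fillna(modes, inplace=True)  # inplace=True avoids creating a new DataFrame
```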
13:14 Keep things simple where we can
- Why use the world's simplest way of filling missing data?
- Does this simplest way work most of the time?
- Do we always know that a more complicated way would help?
13:54 Don't throw out rows or columns
14:53 Describe your data or columns
15:52 See your columns as histograms
- What to do with interesting columns? #data-describing
- What can you find out with a histogram?
- What is a long-tailed distribution of a column? What does it look like?
16:26 Log transformation on long-tailed columns
- Which models don't like long-tailed distributions in the data? #best-practice #data-describing
- What is the easiest way to turn a long-tailed distribution into a centered one?
- Where to find more about the log and the log curve?
- What does log do, in one sentence? 17:11
- How to avoid the problem of log(0)? adding 1
- What does the column's histogram look like after the log transformation? (see the sketch after this list)
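A sketch of the transformation; LogFare as the new column name follows the course notebook, if I recall correctly:

```python
import numpy as np

# add 1 first so log(0) can never happen
df['LogFare'] = np.log(df['Fare'] + 1)
df['LogFare'].hist()  # the histogram should now look much more centered
```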
17:53 Most likely long-tailed data
- What kinds of data are most likely to be long-tailed and need a log transformation? #best-practice
18:10 Describe non-numerical columns
- How to describe seemingly numerical but actually categorical columns? #data-describing
- How to describe all non-numeric columns at once? (sketch below)
- What does this description look like? (how does it differ from that of numeric data)
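A one-line sketch using pandas:

```python
# count, unique, top (most frequent) and freq for every non-numeric column
df.describe(include=[object])
```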
18:55 Apply coefficients on categorical columns
- How to apply coefficients to categorical columns? #data-cleaning
- What does it mean to apply dummy variables to categorical columns?
- What are the two ways of getting dummy variables, and what's Jeremy's view on them?
- What does the dummy-variable transformation of categorical variables look like? (sketch below)
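A sketch of the approach Jeremy prefers (pd.get_dummies rather than manual encoding); the column list matches the Titanic dataset's categorical columns:

```python
# one 0/1 column per category level, e.g. Sex_male, Sex_female
df = pd.get_dummies(df, columns=['Sex', 'Pclass', 'Embarked'])
```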
21:13 The secret power of the name column
- Can a model built only on the name column score No. 1 in the Titanic competition? #surprise
- Where to find more about it?
- This technique is not covered in this lecture
23:01 Tensor
- Why focus on PyTorch rather than NumPy?
- What data format does PyTorch require? How to do this data format conversion? #data-cleaning
- What is a tensor? Where did it come from?
- How to turn all independent columns into a single large tensor?
- What number type do tensors need? float
- How to check the shape of a tensor? (number of rows and columns)
- How to check the rank/dimensions/axes of a tensor? What is rank?
- What is the rank of a vector, a table/matrix, or a scalar? (see the sketch after this list)
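A sketch of the conversion; indep_cols is a hypothetical list of the independent column names kept after the steps above:

```python
import torch

t_dep = torch.tensor(df['Survived'].values)  # dependent column as a tensor

# stack all independent columns into one large float tensor
t_indep = torch.tensor(df[indep_cols].values, dtype=torch.float)

t_indep.shape       # e.g. torch.Size([891, 12]): rows and columns
len(t_indep.shape)  # the rank: 2 for a matrix, 1 for a vector, 0 for a scalar
```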
26:46 Create random coefficients
- Why don't we need a constant here as in Excel? #question
- How many coefficients do we need? How do we figure it out?
- How to create a vector of randomized numbers for the coefficients?
- How to center the coefficient values? Why is this important? #question (answered later)
27:54 Reproducibility of coefficients
- How to create the same set of random numbers for your coefficients each time the cell runs? (sketch below)
- When should you, and shouldn't you, use a random seed to make your results reproducible?
- How can not using a random seed help you understand your model intuitively?
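A sketch of the initialization; the seed value is arbitrary, and you can skip it when you want different random coefficients on each run:

```python
torch.manual_seed(442)              # optional: makes the cell reproducible

n_coeff = t_indep.shape[1]          # one coefficient per independent column
coeffs = torch.rand(n_coeff) - 0.5  # rand gives [0, 1); subtracting 0.5 centers on 0
```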
29:30 Broadcasting: data * coefficients operation on GPU
- What is broadcasting? Isn't it just matrix and vector multiplication?
- Where did it come from?
- What are the benefits of using broadcasting? (see the sketch after this list)
- simple code vs lots of boilerplate
- coded and optimized in C for GPU computation
- What's the rule of broadcasting, and where to find more about it?
- a great blog post helps in understanding broadcasting
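A sketch of the broadcast operation discussed here:

```python
# coeffs (a vector) is broadcast across every row of t_indep (a matrix):
# no Python loop or boilerplate, and it runs in optimized C/GPU code
t_indep * coeffs
```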
33:43 Normalization: make the same range of values for each column
- What would happen if the values of one column were much larger than the values of the other columns?
- Why make every data column have the same range of values?
- How to achieve the same range for all column values?
- What are the two major ways of doing normalization?
- Does Jeremy favor one over the other? (see the sketch after this list)
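A sketch of the max-based normalization used in the lecture notebook (dividing by the standard deviation is the other common way):

```python
vals, indices = t_indep.max(dim=0)  # per-column maximums
t_indep = t_indep / vals            # every column now lies in the 0-1 range
```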
37:08 Sum up to get predictions
- How to sum up the multiplication of each row with the coefficients, and do it for all rows?
- Is the summed-up number the prediction for each person/row of data?
37:32 A default choice for loss function
- How to make the model better? Gradient descent
- What is needed to do gradient descent? A loss function
- What does a loss function do? Measure the performance of the coefficients
- What is Jeremy's default/favorite choice of loss function?
- Why does Jeremy always write the loss function out manually when experimenting? (see the sketch after this list)
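A sketch of the prediction and of the simple default loss, mean absolute error:

```python
# multiply each row by the coefficients and sum across columns: one number per person
preds = (t_indep * coeffs).sum(axis=1)

# mean absolute error: how far the predictions are from the dependent values
loss = torch.abs(preds - t_dep).mean()
```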
38:24 Make notebook readable/understandable in the future
39:39 Update coefficients with gradient descent in PyTorch
- How to ask PyTorch to compute gradients on the coefficients?
- How to ask PyTorch to update values on the same coefficients tensor (not create a new one)?
- What does the loss function do besides giving us a loss value? What does it store?
- What function to run on the loss to calculate gradients for the coefficients?
- How to access the gradients of the coefficients, and how to interpret them?
- How to decide whether to subtract or add the gradients to the coefficients? #question
- How to choose the learning rate? #question
- How to calculate the updated loss with the renewed coefficients? (see the sketch after this list)
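A minimal sketch of the full gradient-descent step, assuming the t_indep, t_dep and coeffs tensors from above (the 0.1 learning rate matches the outline):

```python
coeffs.requires_grad_()  # trailing _ means in place: track gradients on this tensor

loss = torch.abs((t_indep * coeffs).sum(axis=1) - t_dep).mean()
loss.backward()          # fills in coeffs.grad

with torch.no_grad():
    coeffs.sub_(coeffs.grad * 0.1)  # update the same tensor in place (lr = 0.1)
    coeffs.grad.zero_()             # gradients accumulate, so reset them
    print(torch.abs((t_indep * coeffs).sum(axis=1) - t_dep).mean())  # updated loss
```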
41:54 Split the dataset
- Why did Jeremy randomly split the training and validation sets for the Titanic dataset? #data-splitting
- Why use fastai's random splitter function?
- How to create the training and validation sets with the splitter function? (sketch below)
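A sketch using fastai's splitter, which returns two lists of row indices:

```python
from fastai.data.transforms import RandomSplitter

trn_split, val_split = RandomSplitter(seed=42)(df)  # seed makes the split repeatable
trn_indep, val_indep = t_indep[trn_split], t_indep[val_split]
trn_dep, val_dep = t_dep[trn_split], t_dep[val_split]
```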
43:41 Encapsulate functions for model training
- How does Jeremy create functions like init_coeffs, update_coeffs, one_epoch, and train_model from the exploratory steps above? (see the sketch after this list)
- How to use the train_model function to see how well the model works?
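A sketch of how the exploratory steps might be wrapped up; the function names follow the outline, while the bodies are assumptions built from the steps already covered:

```python
def init_coeffs():
    return (torch.rand(n_coeff) - 0.5).requires_grad_()

def calc_preds(coeffs, indep):
    return (indep * coeffs).sum(axis=1)

def calc_loss(coeffs, indep, dep):
    return torch.abs(calc_preds(coeffs, indep) - dep).mean()

def update_coeffs(coeffs, lr):
    coeffs.sub_(coeffs.grad * lr)
    coeffs.grad.zero_()

def one_epoch(coeffs, lr):
    loss = calc_loss(coeffs, trn_indep, trn_dep)
    loss.backward()
    with torch.no_grad():
        update_coeffs(coeffs, lr)
    print(f'{loss:.3f}', end='; ')  # watch the loss fall epoch by epoch

def train_model(epochs=30, lr=0.1):
    torch.manual_seed(442)
    coeffs = init_coeffs()
    for _ in range(epochs):
        one_epoch(coeffs, lr)
    return coeffs
```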
44:50 Titanic dataset is a good playground
45:24 Display coefficients
- How to display the final coefficients?
- How to interpret them? #question
- Can we make some sense of the values inside?
46:01 Accuracy as a metric
- Why not use accuracy as the loss function? #question
- What can we use an accuracy function for?
- What threshold did Jeremy use for survival?
- How to calculate accuracy and put it into a function? (sketch below)
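A sketch, reusing calc_preds and the validation tensors from above:

```python
def acc(coeffs):
    # a prediction above 0.5 counts as 'survived'; compare against the truth
    return (val_dep.bool() == (calc_preds(coeffs, val_indep) > 0.5)).float().mean()
```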
48:07 Sigmoid function: ease coefficient optimization
- What do you see in the predictions that makes you think of using a sigmoid function? #surprise
- Why can the sigmoid function really make optimization easier for the model?
- Why is the plateau at each end of the function good for optimization? (it tolerates very large and very small predictions rather than forcing every prediction to get close to 1 or 0)
- Why is the nearly straight middle part of the function's plot also what we want? 48:58
- How to plot any function with just one line of code? What library is this? sympy
- How to update the calc_preds function with the sigmoid function easily in Jupyter? 50:52 (see the sketch after this list)
- Why center predictions on 0 before the sigmoid function? (a reply by Jeremy)
- Do you remember what Jeremy did to make predictions center on 0? (see how the initial coefficients are defined; a cell link on Kaggle)
- Why does allowing predictions to be large or small make weight optimization easier? (Jeremy's reply)
- How do Python and Jupyter make exploratory work so easy?
- How come the learning rate jumps from 0.1 before the sigmoid to 2 after using the sigmoid? #question 51:57
- When, or how often (as a rule), should you apply a sigmoid function to your predictions? 52:23
- Does the HF library specify whether it uses a sigmoid or not? (probably the others don't either)
- For optimization, what you need to watch is the input and the output, not the middle, for now. Why is that? 53:13
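A sketch of the two pieces discussed: the one-line sympy plot, and calc_preds updated to pass the sum through a sigmoid:

```python
import sympy
sympy.plot("1 / (1 + exp(-x))", xlim=(-5, 5))  # plot the sigmoid in one line

def calc_preds(coeffs, indep):
    # sigmoid squashes any sum into (0, 1), so very large or small sums are fine
    return torch.sigmoid((indep * coeffs).sum(axis=1))
```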
54:17 What if the test dataset has extra columns?
55:58 Submit to Kaggle
- How and why did Jeremy replace a missing Fare value with 0?
- How to apply the above data-cleaning steps to the test set?
- How to prepare the output column expected by Kaggle?
- How to create the submission file expected by Kaggle? (see the sketch after this list)
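A sketch of the submission steps, assuming the cleaning steps from earlier are repeated on the test set to produce a tst_indep tensor:

```python
tst_df = pd.read_csv('test.csv')
tst_df['Fare'] = tst_df['Fare'].fillna(0)  # the one missing Fare becomes 0
# ...then the same mode-filling, log, dummies, tensor and normalization steps...

tst_df['Survived'] = (calc_preds(coeffs, tst_indep) > 0.5).int()  # 0/1 as Kaggle expects
sub_df = tst_df[['PassengerId', 'Survived']]
sub_df.to_csv('sub.csv', index=False)
```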
Key steps from linear model to neural net
58:24 val_indep * coeffs vs val_indep @ coeffs
- What does val_indep * coeffs mean? Is it element-wise? Is it matrix-vector multiplication?
- What do we know about val_indep @ coeffs? Is it matrix-matrix multiplication?
- Is (val_indep * coeffs).sum(axis=1) equal to val_indep @ coeffs?
- What should we know about them to distinguish them properly? #question
- In val_indep @ coeffs, when coeffs is a matrix, do we need to specify its shape? 59:50
- How to initialize the coefficients as a one-column matrix rather than a vector? (sketch below)
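A sketch of the comparison, assuming coeffs starts as a vector as before:

```python
# with coeffs as a vector, these produce the same numbers:
(val_indep * coeffs).sum(axis=1)  # broadcast element-wise multiply, then sum
val_indep @ coeffs                # @ is Python's matrix-multiplication operator

# to use matrix rules throughout, make coeffs an (n_coeff, 1) matrix;
# val_indep @ coeffs then returns a (rows, 1) matrix instead of a vector
coeffs = (torch.rand(n_coeff, 1) - 0.5).requires_grad_()
```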
1:01:26 Transform existing vectors into matrices
- How to turn both trn_dep and val_dep from existing vectors into matrices that correspond to the coeffs matrix? (sketch below)
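A sketch of the reshaping:

```python
# None inserts a new trailing axis: shape (n,) becomes (n, 1),
# matching the (rows, 1) output of val_indep @ coeffs
trn_dep = trn_dep[:, None]
val_dep = val_dep[:, None]
```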
Building a neural net
1:03:31 Keep linear model and neural net comparable in output
- How to create a hidden layer with multiple neurons inside (i.e., a matrix of coefficients rather than a vector of coefficients)?
- Why divide the hidden layer's coefficients by the number of hidden activations (the matrix's columns)?
- Is it to make sure the outputs of the neural net and the previous linear model are comparable?
1:05:31 Build the output layer
1:06:10 Trying to get the training started
- Why does Jeremy subtract 0.3 from the coefficients of the output layer?
- What does it mean that this minus 0.3 gets the training started? #question
- (I guess Jeremy may have tried -0.5 first; experiment to find out)
1:06:26 Adding a constant or not
- Why don't we need a constant for layer 1 (think of the constant in the linear model)?
- Why must we have a constant for layer 2?
- Do the coefficients of layer 1, layer 2, and the constant all need their own gradients initiated?
1:07:05 Building the model
- What is a tuple, and how is it used to group and separate the three sets of coefficients?
- How to construct the prediction function by sending data through layer 1 and layer 2 and finally adding the constant? #model-building (see the sketch after this list)
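A sketch close to the lecture notebook's neural net, assuming n_coeff from earlier; the hidden width of 20 and the exact init constants are assumptions:

```python
import torch.nn.functional as F

n_hidden = 20  # assumed hidden width

def init_coeffs():
    layer1 = (torch.rand(n_coeff, n_hidden) - 0.5) / n_hidden  # keep sums comparable
    layer2 = torch.rand(n_hidden, 1) - 0.3                     # the -0.3 that gets training going
    const = torch.rand(1)[0]                                   # layer 2's constant
    return layer1.requires_grad_(), layer2.requires_grad_(), const.requires_grad_()

def calc_preds(coeffs, indep):
    l1, l2, const = coeffs         # the tuple groups the three pieces
    res = F.relu(indep @ l1)       # hidden layer, then ReLU activation
    res = res @ l2 + const         # output layer plus its constant
    return torch.sigmoid(res)
```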
1:08:08 A neural net done, but super fiddly
- How to update all three sets of coefficients in a loop?
- Did you notice that the learning rate changed again? (1.4; last time it was 2, earlier 0.1)
- Why did Jeremy say getting this model to work was super fiddly?
From a neural net (1 hidden layer) to deep learning (2 hidden layers)
1:09:09 Initialize coefficients of all hidden layers and constants
- How to initialize the coefficients of 2 hidden layers plus 1 output layer and the constants, and get gradients ready for all of them, in one compact function? #model-building
- What is the shape of each coefficient matrix?
1:10:36 Building the 2 hidden layer model
- What are activation functions?
- What are the activation functions for the 2 hidden layers?
- What is the activation function for the output layer?
- What is the most common mistake in applying an activation function to the final layer? (see the sketch after this list)
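A sketch of the deep model in the spirit of the lecture notebook; the hidden sizes and the init scaling constants are the fiddly numbers mentioned below, and the exact values here are assumptions:

```python
def init_coeffs(hiddens=[10, 10]):  # two hidden layers
    sizes = [n_coeff] + hiddens + [1]
    # one coefficient matrix per layer; the scaling constants are the fiddly part
    layers = [(torch.rand(sizes[i], sizes[i + 1]) - 0.3) / sizes[i + 1] * 4
              for i in range(len(sizes) - 1)]
    consts = [(torch.rand(1)[0] - 0.5) * 0.1 for i in range(len(sizes) - 1)]
    for l in layers + consts:
        l.requires_grad_()
    return layers, consts

def calc_preds(coeffs, indep):
    layers, consts = coeffs
    res = indep
    for i, l in enumerate(layers):
        res = res @ l + consts[i]
        if i != len(layers) - 1:
            res = F.relu(res)      # ReLU between layers...
    return torch.sigmoid(res)      # ...but only a sigmoid on the final output
```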
1:11:46 Train the model
- Don't forget to update the gradients
- Which numbers did Jeremy still have to fiddle with to get right?
- Did this deep learning model improve the loss and accuracy?
1:12:12 Dissect and experiment with large functions
- How to experiment with a large function like init_coeffs by breaking it into small pieces and running them?
1:13:16 Tabular datasets: where deep learning does not shine
- How should we think about the fact that neither the neural net nor the deep learning model did better?
- What does it mean that a carefully designed algorithm beat all deep learning models in the Titanic competition?
- On what kinds of datasets does deep learning generally perform better?
Framework: DL without the super fiddling (notebook)
1:15:28 Why use a framework rather than build from scratch
- Why should you use a framework in real life rather than building it yourself as above?
- When to do it from scratch?
- What can a framework do for us?
- Can it automate the obvious things, like initialization, learning rate, dummy variables, normalization, etc.?
- Can I still make choices on the not-so-obvious things?
1:16:34 Feature engineering with pandas
1:18:25 Automate the obvious when preparing the dataset
- How does the framework make categorifying data, filling missing data, and normalization automatic?
- How to specify that the dependent column is a category?
1:19:37 Build multiple hidden layers with one line of code
1:19:56 Automate the search for the best learning rate (see the sketch below)
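A sketch of the framework version, assuming fastai's tabular API as used in the course notebook; the column lists are illustrative, and LogFare assumes the earlier feature engineering:

```python
from fastai.tabular.all import *

# Categorify, FillMissing and Normalize automate the cleaning done by hand above
splits = RandomSplitter(seed=42)(df)
dls = TabularPandas(
    df, splits=splits,
    procs=[Categorify, FillMissing, Normalize],
    cat_names=['Sex', 'Pclass', 'Embarked'],      # illustrative column lists
    cont_names=['Age', 'SibSp', 'Parch', 'LogFare'],
    y_names='Survived', y_block=CategoryBlock(),  # dependent column as a category
).dataloaders(path='.')

learn = tabular_learner(dls, metrics=accuracy, layers=[10, 10])  # 2 hidden layers
learn.lr_find()  # plots loss vs learning rate and suggests one
```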
Predict and Submit with ease
1:21:29 Automate transformation of the test set in one line of code
- How to automatically apply all the transformations done to the training and validation sets to the test set? (sketch below)
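A sketch, assuming the learner above and the same feature engineering on the test DataFrame:

```python
tst_df = pd.read_csv('test.csv')
tst_df['Fare'] = tst_df['Fare'].fillna(0)
tst_df['LogFare'] = np.log(tst_df['Fare'] + 1)

# test_dl applies every preprocessing step learned from the training data
tst_dl = learn.dls.test_dl(tst_df)
preds, _ = learn.get_preds(dl=tst_dl)
```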
Experiment with Ensembling
1:23:03 Ensembling is easy with fastai
1:23:36 The simplest ensemble
- What does a simple ensemble look like?
- How to build, run, and predict with 5 models with ease?
- How different are those 5 models? (only their initial coefficients differ)
- How to combine their predictions?
- How much improvement does this simple ensemble get us? (see the sketch after this list)
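A sketch of the simplest ensemble, reusing the dls and tst_dl assumed above; the fit settings are illustrative:

```python
def one_model():
    learn = tabular_learner(dls, metrics=accuracy, layers=[10, 10])
    with learn.no_bar(), learn.no_logging():
        learn.fit(16, lr=0.03)        # illustrative epochs and learning rate
    return learn.get_preds(dl=tst_dl)[0]

# five models that differ only in their random initial coefficients
all_preds = [one_model() for _ in range(5)]
ens_preds = torch.stack(all_preds).mean(0)  # combine by averaging
```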
1:25:20 Ways to combine the predictions
- Why use the mean rather than the mode?
- What are the 3 ways of combining the predictions?
- Is one better than the others?
- What's Jeremy's suggestion?
How Random Forests really work
- Is this a good place to also learn pandas and numpy?
1:26:38 Why Random Forest
- What is the history between random forests and Jeremy?
- What does Jeremy think of random forests?
- Why are random forests so much easier and better?
- Why is the seemingly simple logistic regression so easy to get wrong?
1:28:34 Pandas categorical function
- How to import all the libraries you need at once?
- How to do fillna with pandas and log with NumPy?
- What does the pandas categorical function do for us?
- What is the friendly display after the function is applied?
- What is the actual data transformation under the hood? (see the sketch after this list)
- Key points: no dummy variables needed, and Pclass no longer needs to be treated as a category
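A sketch of the pandas call in question (column names from the Titanic dataset):

```python
# a categorical column displays friendly labels but stores integer codes
df['Sex'] = pd.Categorical(df.Sex)
df['Embarked'] = pd.Categorical(df.Embarked)

df.Sex.head()     # shows 'male'/'female' plus the category list
df.Sex.cat.codes  # the integers actually stored under the hood
```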
1:30:54 Binary splits: the basis of random forests
1:31:14 A binary split on gender
1:32:32 Build a binary split model on gender with sklearn
1:33:59 Build a binary split model on ticket price with sklearn
1:35:38 Build a score machine for binary splits, regardless of categorical or continuous columns
- What is a good split?
- Is it good that within each group the dependent values are similar?
- How to measure the similarity of values within a group? std
- How to compare standard deviations between two groups appropriately? (multiply by the group size)
- How to calculate the score for evaluating a split from the combined std of the two groups? (see the sketch after this list)
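A sketch of such a scoring function, along the lines of the lecture notebook (lower is better; the names are assumptions):

```python
def side_score(side, y):
    tot = side.sum()
    if tot <= 1:
        return 0
    # std measures how similar the dependent values are within the group;
    # weight by group size so tiny groups don't look artificially good
    return y[side].std() * tot

def score(col, y, split):
    lhs = col <= split  # boolean mask for the left-hand group
    return (side_score(lhs, y) + side_score(~lhs, y)) / len(y)
```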
1:39:01 Automate the score machine on all columns
- How to find the best binary split by trying out all possible split points of a column? (sketch below)
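A sketch building on the score function above, assuming numeric columns (categories via their integer codes); min_col is a hypothetical name:

```python
import numpy as np

def min_col(df, nm, dep='Survived'):
    col, y = df[nm], df[dep]
    unq = col.dropna().unique()
    # try every distinct value as a split point and keep the lowest score
    scores = np.array([score(col, y, v) for v in unq if not np.isnan(v)])
    idx = scores.argmin()
    return unq[idx], scores[idx]
```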
1:41:26 1R model as the baseline
- What is a random forest?
- What is a 1R model?
- How good was it in the ML world of the 90s?
- Should we always go for complicated models?
- Should we always start with a 1R model as the baseline for our problem?