0:00:00 - Changes to previous lesson
0:07:50 - Trying to get 90% accuracy on Fashion-MNIST
0:11:58 - Jupyter notebooks and GPU memory
0:14:59 - Autoencoder or Classifier
0:16:05 - Why do we need a mean of 0 and standard deviation of 1?
0:21:21 - What exactly do we mean by variance?
0:25:56 - Covariance
0:29:33 - Xavier Glorot initialization
0:35:27 - ReLU and Kaiming He initialization
0:36:52 - Applying an init function
0:38:59 - Learning rate finder and MomentumLearner
0:40:10 - What’s happening in each stride-2 convolution?
0:42:32 - Normalizing input matrix
0:46:09 - 85% accuracy
0:47:30 - Using with_transform to modify input data
0:48:18 - ReLU and 0 mean
0:52:06 - Changing the activation function
0:55:09 - 87% accuracy and nice looking training graphs
0:57:16 - “All You Need Is a Good Init”: Layer-wise Sequential Unit Variance
1:03:55 - Batch Normalization, Intro
1:06:39 - Layer Normalization
1:15:47 - Batch Normalization
1:23:28 - Batch Norm, Layer Norm, Instance Norm and Group Norm
1:26:11 - Putting it all together: Towards 90%
1:28:42 - Accelerated SGD
1:33:32 - Regularization
1:37:37 - Momentum
1:45:32 - Batch size
1:46:37 - RMSProp
1:51:27 - Adam: RMSProp plus Momentum
0:00:00 - Accelerated SGD done in Excel
0:01:35 - Basic SGD
0:10:56 - Momentum
0:15:37 - RMSProp
0:16:35 - Adam
0:20:11 - Adam with annealing tab
0:23:02 - Learning Rate Annealing in PyTorch
0:26:34 - How do PyTorch’s optimizers work?
0:32:44 - How do schedulers work?
0:34:32 - Plotting learning rates from a scheduler
0:36:36 - Creating a scheduler callback
0:40:03 - Training with Cosine Annealing
0:42:18 - 1-Cycle learning rate
0:48:26 - HasLearnCB - passing learn as a parameter
0:51:01 - Changes from last week, /compare in GitHub
0:52:40 - fastcore’s patch to the Learner with lr_find
0:55:11 - New fit() parameters
0:56:38 - ResNets
1:17:44 - Training the ResNet
1:21:17 - ResNets from timm
1:23:48 - Going wider
1:26:02 - Pooling
1:31:15 - Reducing the number of parameters and megaFLOPS
1:35:34 - Training for longer
1:38:06 - Data Augmentation
1:45:56 - Test Time Augmentation
1:49:22 - Random Erasing
1:55:55 - Random Copying
1:58:52 - Ensembling
2:00:54 - Wrap-up and homework
0:00:00 - Introduction and quick update from last lesson
0:02:08 - Dropout
0:12:07 - DDPM from scratch - Paper and math
0:40:17 - DDPM - The code
0:41:16 - U-Net Neural Network
0:43:41 - Training process
0:56:07 - Inheriting from miniai TrainCB
1:00:22 - Using the trained model: denoising with the “sample” method
1:09:09 - Inference: generating some images
1:14:56 - Notebook 17: Jeremy’s exploration of Tanishq’s notebook
1:24:09 - Make it faster: Initialization
1:27:41 - Make it faster: Mixed Precision
1:29:40 - Change of plans: Mixed Precision goes to Lesson 20
Lesson 19 Transcriptions
I had some doubts about the transcriptions, so I made comments in the Docs:
01:24:47 → Whisper transcribed “Kat Crowley” as the author of k-diffusion. I also hear “Crowley” or something similar, but googling I found that Katherine Crowson is the author of k-diffusion. Which should be used in the transcription?
Google Docs Comment → link
01:26:03 → Whisper transcribed “Darrow while Google paper”. Please help with this part as well.
Google Docs Comment → link
0:00:00 - noisify inside a collation function
0:02:56 - MixedPrecision callback
0:05:59 - Getting the benefits from MixedPrecision
0:07:27 - HuggingFace Accelerator
0:13:57 - Sneaky trick: keep GPUs busy with MultDL
0:16:53 - Homework and experiment ideas
0:20:33 - Style Transfer notebook
0:24:19 - Optimizing an image
0:30:07 - Loss function and Learner
0:32:33 - Viewing progress: ImageLogCB
0:35:04 - Extracting features from a pre-trained network, VGG16
0:40:36 - Normalizing the image
0:44:21 - Intermediate representations, features
0:46:21 - (Hooks homework)
0:47:20 - Optimizing an image with Content Loss
0:56:05 - Style Loss with Gram Matrix
0:59:21 - “A Neural Algorithm of Artistic Style” paper
1:05:59 - Optimizing to get the final result
1:07:42 - Possible experiments and miniai
1:14:26 - Neural Cellular Automata (NCA) notebook
1:19:37 - Alexander Mordvintsev’s NCA simulation
1:21:44 - Setting up a Neural Network
1:27:16 - Getting into code
1:37:51 - Training
1:42:50 - Preview of what’s possible
The transcription has an unintelligible word or name that may be worth correcting: 01:17:19.740 JEREMY: I watched a really cool ***** video the other day about ants and I didn’t know this before,
0:00:00 - A super cool demo with miniai and CIFAR-10
0:02:55 - The notebook
0:07:12 - Experiment tracking and W&B callback
0:16:09 - Fitting
0:17:15 - Comments on experiment tracking
0:20:50 - FID and KID, metrics for generated images
0:23:35 - FID notebook (18_fid.ipynb)
0:31:07 - Get the FID from an existing model
0:37:22 - Covariance matrix
0:42:21 - Matrix square root
0:46:17 - Why it is called Fréchet Inception Distance (FID)
0:47:54 - Some FID caveats
0:50:13 - KID: Kernel Inception Distance
0:55:30 - FID and KID plots
0:57:09 - Real FID - The Inception network
1:01:16 - Fixing (?) UNet feeding - DDPM_v3
1:08:49 - Schedule experiments
1:14:52 - Train DDPM_v3 and testing with FID
1:19:01 - Denoising Diffusion Implicit Models - DDIM
1:26:12 - How does DDIM work?
1:30:15 - Notation in Papers
1:32:21 - DDIM paper
1:53:49 - Wrapping up
I’ve created a little script for downloading YouTube audio and creating a Whisper transcription, in case this is helpful to anyone. (cc @fmussari)
import sys, whisper, yt_dlp as yt

# The YouTube video ID is passed as the first command-line argument
vid = sys.argv[1]
video = f"https://www.youtube.com/watch?v={vid}"

# Download the best available audio and extract it to mp3 with ffmpeg
ydl_opts = {
    'format': 'bestaudio/best',
    'outtmpl': vid + '.%(ext)s',
    'postprocessors': [{'key': 'FFmpegExtractAudio', 'preferredcodec': 'mp3', 'preferredquality': '192'}],
}
with yt.YoutubeDL(ydl_opts) as ydl: ydl.download([video])

# Transcribe with Whisper; the initial prompt nudges it towards domain vocabulary
model = whisper.load_model("base")
ip = "This is a discussion of 'fastai', 'fast.ai', 'Tanishq', 'Johno', 'Karras', 'DDIM', 'DDPM', 'Imagenet', 'MNIST', and various other deep learning things. "
text = model.transcribe(vid + ".mp3", verbose=False, initial_prompt=ip)
with open(f"{vid}.txt", "w") as f: f.write(text['text'])
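To use it, save the script under any name you like (say, transcribe.py — the filename is arbitrary) and pass the YouTube video ID as the only argument:

python transcribe.py <video_id>

It downloads the audio as <video_id>.mp3 and writes the transcript to <video_id>.txt.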
(I used an “initial_prompt” to try to get some less common words recognised automatically, but it only helped a little.)
I’ve used this to create a rough transcript for the last 3 videos, since I wanted to be able to create summaries for them, but help cleaning them up would be much appreciated!
I have been using this Colab notebook: Generate .vtt from Youtube.ipynb, but the pytube library just broke today when I tried to use it, so I had to apply a sort of patch.
For the cleaning, I have already started on Lesson 22; anyone can contribute to it, or to Lessons 23 or 24:
BTW it’s slightly easier for me if the timestamps aren’t in the file. I have something semi-automated to remove them, so if it’s easier for you to keep them in, that’s fine. But if it’s easier to remove them, then do that.
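For anyone who'd rather strip the timestamps before sharing, a minimal sketch of such a stripper (an illustrative stand-in, not the actual semi-automated script mentioned above) could look like this:

import re, sys

# Strip a leading "h:mm:ss" or "hh:mm:ss.mmm" marker (plus any following dash) from each line
pat = re.compile(r'^\d{1,2}:\d{2}:\d{2}(\.\d+)?\s*(-\s*)?')
for line in sys.stdin:
    sys.stdout.write(pat.sub('', line))

Run it as python strip_timestamps.py < transcript.txt > cleaned.txt (the script name is hypothetical).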
FYI I replaced lesson 24 with a new video, but it’s only some text that’s been added as an overlay - no change to the audio or timings, so it shouldn’t impact anything.