0:00:00 - Changes to previous lesson
0:07:50 - Trying to get 90% accuracy on Fashion-MNIST
0:11:58 - Jupyter notebooks and GPU memory
0:14:59 - Autoencoder or Classifier
0:16:05 - Why do we need a mean of 0 and standard deviation of 1?
0:21:21 - What exactly do we mean by variance?
0:25:56 - Covariance
0:29:33 - Xavier Glorot initialization
0:35:27 - ReLU and Kaiming He initialization
0:36:52 - Applying an init function
0:38:59 - Learning rate finder and MomentumLearner
0:40:10 - What’s happening in each stride-2 convolution?
0:42:32 - Normalizing input matrix
0:46:09 - 85% accuracy
0:47:30 - Using with_transform to modify input data
0:48:18 - ReLU and 0 mean
0:52:06 - Changing the activation function
0:55:09 - 87% accuracy and nice looking training graphs
0:57:16 - “All You Need Is a Good Init”: Layer-wise Sequential Unit Variance
1:03:55 - Batch Normalization, Intro
1:06:39 - Layer Normalization
1:15:47 - Batch Normalization
1:23:28 - Batch Norm, Layer Norm, Instance Norm and Group Norm
1:26:11 - Putting it all together: Towards 90%
1:28:42 - Accelerated SGD
1:33:32 - Regularization
1:37:37 - Momentum
1:45:32 - Batch size
1:46:37 - RMSProp
1:51:27 - Adam: RMSProp plus Momentum
0:00:00 - Accelerated SGD done in Excel
0:01:35 - Basic SGD
0:10:56 - Momentum
0:15:37 - RMSProp
0:16:35 - Adam
0:20:11 - Adam with annealing tab
0:23:02 - Learning Rate Annealing in PyTorch
0:26:34 - How do PyTorch’s optimizers work?
0:32:44 - How do schedulers work?
0:34:32 - Plotting learning rates from a scheduler
0:36:36 - Creating a scheduler callback
0:40:03 - Training with Cosine Annealing
0:42:18 - 1-Cycle learning rate
0:48:26 - HasLearnCB - passing learn as a parameter
0:51:01 - Changes from last week, /compare in GitHub
0:52:40 - fastcore’s patch to the Learner with lr_find
0:55:11 - New fit() parameters
0:56:38 - ResNets
1:17:44 - Training the ResNet
1:21:17 - ResNets from timm
1:23:48 - Going wider
1:26:02 - Pooling
1:31:15 - Reducing the number of parameters and megaFLOPS
1:35:34 - Training for longer
1:38:06 - Data Augmentation
1:45:56 - Test Time Augmentation
1:49:22 - Random Erasing
1:55:55 - Random Copying
1:58:52 - Ensembling
2:00:54 - Wrap-up and homework
0:00:00 - Introduction and quick update from last lesson
0:02:08 - Dropout
0:12:07 - DDPM from scratch - Paper and math
0:40:17 - DDPM - The code
0:41:16 - U-Net Neural Network
0:43:41 - Training process
0:56:07 - Inheriting from miniai TrainCB
1:00:22 - Using the trained model: denoising with the “sample” method
1:09:09 - Inference: generating some images
1:14:56 - Notebook 17: Jeremy’s exploration of Tanishq’s notebook
1:24:09 - Make it faster: Initialization
1:27:41 - Make it faster: Mixed Precision
1:29:40 - Change of plans: Mixed Precision goes to Lesson 20
Lesson 19 Transcriptions
I had some doubts about the transcriptions, so I made comments in the Docs:
01:24:47 → Whisper transcribed “Kat Crowley” as the author of k-diffusion. I also hear “Crowley” or something similar, but googling I found that Katherine Crowson is the author of k-diffusion. Which should be used in the transcription?
Google Docs Comment → link
01:26:03 → Whisper transcribed “Darrow while Google paper”. Please help with this part as well.
Google Docs Comment → link
0:00:00 - noisify inside a collation function
0:02:56 - MixedPrecision callback
0:05:59 - Getting the benefits from MixedPrecision
0:07:27 - HuggingFace Accelerator
0:13:57 - Sneaky trick: keep GPUs busy with MultDL
0:16:53 - Homework and experiment ideas
0:20:33 - Style Transfer notebook
0:24:19 - Optimizing an image
0:30:07 - Loss function and Learner
0:32:33 - Viewing progress: ImageLogCB
0:35:04 - Extracting features from a pre-trained network, VGG16
0:40:36 - Normalizing the image
0:44:21 - Intermediate representations, features
0:46:21 - (Hooks homework)
0:47:20 - Optimizing an image with Content Loss
0:56:05 - Style Loss with Gram Matrix
0:59:21 - “A Neural Algorithm of Artistic Style” paper
1:05:59 - Optimizing to get the final result
1:07:42 - Possible experiments and miniai
1:14:26 - Neural Cellular Automata (NCA) notebook
1:19:37 - Alexander Mordvintsev’s NCA simulation
1:21:44 - Setting up a Neural Network
1:27:16 - Getting into code
1:37:51 - Training
1:42:50 - Preview of what’s possible
The transcription has an unintelligible word or name that may be worth correcting: 01:17:19.740 JEREMY: I watched a really cool ***** video the other day about ants and I didn’t know this before,
0:00:00 - A super cool demo with miniai and CIFAR-10
0:02:55 - The notebook
0:07:12 - Experiment tracking and W&B callback
0:16:09 - Fitting
0:17:15 - Comments on experiment tracking
0:20:50 - FID and KID, metrics for generated images
0:23:35 - FID notebook (18_fid.ipynb)
0:31:07 - Get the FID from an existing model
0:37:22 - Covariance matrix
0:42:21 - Matrix square root
0:46:17 - Why it is called Fréchet Inception Distance (FID)
0:47:54 - Some FID caveats
0:50:13 - KID: Kernel Inception Distance
0:55:30 - FID and KID plots
0:57:09 - Real FID - The Inception network
1:01:16 - Fixing (?) UNet feeding - DDPM_v3
1:08:49 - Schedule experiments
1:14:52 - Train DDPM_v3 and testing with FID
1:19:01 - Denoising Diffusion Implicit Models - DDIM
1:26:12 - How does DDIM work?
1:30:15 - Notation in Papers
1:32:21 - DDIM paper
1:53:49 - Wrapping up
I’ve created a little script for downloading YouTube audio and creating a Whisper transcription, in case this is helpful to anyone. (cc @fmussari)
import sys, whisper, yt_dlp as yt

# The YouTube video ID is passed as the first command-line argument
vid = sys.argv[1]
video = f"https://www.youtube.com/watch?v={vid}"

# Download the best available audio and extract it to mp3 with ffmpeg
ydl_opts = {
    'format': 'bestaudio/best',
    'outtmpl': vid + '.%(ext)s',
    'postprocessors': [{'key': 'FFmpegExtractAudio', 'preferredcodec': 'mp3', 'preferredquality': '192'}],
}
with yt.YoutubeDL(ydl_opts) as ydl: ydl.download([video])

# Transcribe with Whisper; the initial prompt nudges it towards domain vocabulary
model = whisper.load_model("base")
ip = "This is a discussion of 'fastai', 'fast.ai', 'Tanishq', 'Johno', 'Karras', 'DDIM', 'DDPM', 'Imagenet', 'MNIST', and various other deep learning things. "
text = model.transcribe(vid + ".mp3", verbose=False, initial_prompt=ip)
with open(f"{vid}.txt", "w") as f: f.write(text['text'])
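To use it, save the script under any name you like (say, transcribe.py — the filename is arbitrary) and pass the YouTube video ID as the only argument:

python transcribe.py <video_id>

It downloads the audio as <video_id>.mp3 and writes the transcript to <video_id>.txt.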
(I used an “initial_prompt” to try to get some less common words recognised automatically, but it only helped a little.)
I’ve used this to create a rough transcript for the last 3 videos, since I wanted to be able to create summaries for them, but help cleaning them up would be much appreciated!
I have been using this Colab notebook: Generate .vtt from Youtube.ipynb, but the pytube library just broke today when I tried to use it, so I had to apply a sort of patch.
For the cleaning, I have already started on Lesson 22; anyone can contribute to it, or to Lessons 23 or 24:
BTW it’s slightly easier for me if the timestamps aren’t in the file. I have something semi-automated to remove them, so if it’s easier for you to keep them in, that’s fine. But if it’s easier to remove them, then do that.
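For anyone who'd rather strip the timestamps before sharing, a minimal sketch of such a stripper (an illustrative stand-in, not the actual semi-automated script mentioned above) could look like this:

import re, sys

# Strip a leading "h:mm:ss" or "hh:mm:ss.mmm" marker (plus any following dash) from each line
pat = re.compile(r'^\d{1,2}:\d{2}:\d{2}(\.\d+)?\s*(-\s*)?')
for line in sys.stdin:
    sys.stdout.write(pat.sub('', line))

Run it as python strip_timestamps.py < transcript.txt > cleaned.txt (the script name is hypothetical).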
FYI I replaced lesson 24 with a new video, but it’s only some text that’s been added as an overlay - no change to the audio or timings, so it shouldn’t impact anything.