YouTube Chapter Markers & Transcriptions

Thanks for the amazing work on this @fmussari ! I’ve added this to the video now. :smiley:

I’m going to try using Whisper for lesson 10 now to see if that helps…

2 Likes

Great!

If a review is still needed after Whisper, just let us/me know. I find it useful to study by doing that.

If it helps in any way, I transcribed Lesson 10 up to 1:04:13.

Lesson 10: Deep Learning Foundations to Stable Diffusion, 2022

2 Likes

Lesson 9B chapter markers done. @seem I’m not sure which paper you were first talking about by Jascha Sohl-Dickstein. Is it this one? [1503.03585] Deep Unsupervised Learning using Nonequilibrium Thermodynamics

Done

Lesson 11
0:00 - Introduction
0:20 - Showing student’s work
13:03 - Workflow on reading an academic paper
16:20 - Read DiffEdit paper
26:27 - Understanding the equations in the “Background” section
46:10 - 3 steps of DiffEdit
51:42 - Experiments and benchmarks
59:15 - Matrix multiplication from scratch
1:08:47 - Speed improvement with Numba library
1:19:25 - Frobenius norm
1:25:54 - Broadcasting with scalars and matrices
1:39:22 - Broadcasting rules
1:42:10 - Matrix multiplication with broadcasting

Additional Links:
DiffEdit: Diffusion-based semantic image editing with mask guidance - https://arxiv.org/abs/2210.11427
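For anyone scripting chapter lists like the one above, the timestamp format YouTube expects (`M:SS` under an hour, `H:MM:SS` above) can be produced with a small helper. This is just a sketch; `fmt_ts` is a hypothetical name, not anything from the course materials:

```python
def fmt_ts(seconds: int) -> str:
    """Format a number of seconds as a YouTube chapter timestamp,
    e.g. 0 -> '0:00', 4127 -> '1:08:47'."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    if h:
        return f"{h}:{m:02d}:{s:02d}"
    return f"{m}:{s:02d}"
```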

3 Likes

Yep, thank you so much!

@jeremy Lesson 9B and 11 are done

2 Likes

Thanks so much @Raymond-Wu !

In case anyone has use for a transcription with corresponding YouTube timestamps, I used OpenAI’s Whisper to create them for lessons 9 and 10.

Lesson 9 transcribed with timestamps
Lesson 10 transcribed with timestamps

3 Likes

Finished manually transcribing Lesson 10, with help from Whisper’s transcription by @cfalholt.

Lesson 10: Deep Learning Foundations to Stable Diffusion, 2022

2 Likes

How accurate would you say those Whisper transcriptions were?

I mainly used the Whisper transcription when there were words I couldn’t understand and I thought YouTube’s CC wasn’t giving the correct word. Some of them weren’t picked up by Whisper either.

I found the punctuation really good overall.

To get the best result, I think the process should be to manually review and correct Whisper’s transcription instead of YouTube’s CC. But I don’t know if the timestamps are necessary.

I started transcribing Lesson 11 from a Whisper-generated .vtt file. And yeah, the process is much faster.
I’m just tweaking a little, mainly names of people or variables. The punctuation is really good.
The model even skips the bits where Jeremy says something incomplete and then corrects himself, which is probably fine.

The Colab notebook that generated the .vtt is this one:
Generate .vtt from Youtube.ipynb
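For reference, here’s a minimal sketch of how Whisper segments can be written out as WebVTT. The segment dicts with `start`/`end`/`text` keys match the shape of what `model.transcribe()` returns, though the notebook above may do this differently (e.g. via Whisper’s own writer utilities):

```python
def sec_to_vtt(t: float) -> str:
    """Format seconds as a WebVTT timestamp, e.g. 2.5 -> '00:00:02.500'."""
    h, rem = divmod(t, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

def segments_to_vtt(segments) -> str:
    """Build a WebVTT document from Whisper-style segments
    (dicts with 'start', 'end', and 'text' keys)."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{sec_to_vtt(seg['start'])} --> {sec_to_vtt(seg['end'])}")
        lines.append(seg["text"].strip())
        lines.append("")
    return "\n".join(lines)
```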

And here is the Google Doc where I’m using suggestions instead of direct edits, to keep track of the changes made to the Whisper output:

Lesson 11: Deep Learning Foundations to Stable Diffusion, 2022

1 Like

Wow this is great - thanks so much.

If anyone is interested in a project, my thought is that there might be a way to auto-download the YouTube-generated captions, get the captions from Whisper, and then put both in a diff tool to show where they differ. That would help identify any possible issues, perhaps? Maybe include a simple cleanup script in the process as well that fixes known issues (e.g. removing “umm” etc)?
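A rough sketch of that diff-plus-cleanup idea, assuming both transcripts are already plain text with one caption per line (the filler-word list and function names are just illustrative):

```python
import difflib
import re

# Known filler words to strip before comparing (extend as needed).
FILLERS = re.compile(r"\b(?:umm?|uhh?|er)\b", flags=re.IGNORECASE)

def clean(text: str) -> str:
    """Strip filler words and collapse whitespace before comparing."""
    return " ".join(FILLERS.sub("", text).split())

def diff_transcripts(youtube_text: str, whisper_text: str):
    """Return unified-diff lines showing where the two transcripts disagree."""
    a = [clean(line) for line in youtube_text.splitlines()]
    b = [clean(line) for line in whisper_text.splitlines()]
    return list(difflib.unified_diff(a, b, fromfile="youtube", tofile="whisper", lineterm=""))
```

Fetching the two inputs (e.g. with yt-dlp for the auto-captions and Whisper for the audio) would be separate steps; this only covers the comparison.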

Another idea - you could download lots of auto-transcripts with associated manual transcripts from YouTube videos, and train a seq2seq model to automatically fix problems (including lack of punctuation) in the YouTube auto-transcripts! Also, you could run Whisper on the audio for those videos, and train another seq2seq model to fix problems in Whisper transcripts (this would be especially helpful if trained on previous fast.ai videos, since it’ll learn the fast.ai-specific vocab).

Finally, you could try to fine-tune Whisper on all my videos that have manual transcripts (which I think is nearly all fast.ai lessons). Here are two examples I’ve found of Whisper fine-tuning:

What do you think of these ideas, @mcleavey?

1 Like

Any clue what sort of tech I could use to speed up my process? I’d like to spend less time doing these because I’ve been wanting to try my hand at implementing a paper.

Done

Lesson 12
0:00 - Introduction
0:15 - CLIP Interrogator & how it works
10:52 - Matrix multiplication refresher
11:59 - Einstein summation
18:34 - Matrix multiplication put on to the GPU
33:31 - Clustering (Meanshift)
37:05 - Create Synthetic Centroids
41:47 - Mean shift algorithm
47:37 - Plotting Gaussian kernels
53:33 - Calculating distances between points
57:42 - Calculating distances between points (illustrated)
1:04:25 - Getting the weights and weighted average of all the points
1:11:53 - Matplotlib animations
1:15:34 - Accelerating our work by putting it on the GPU
1:37:33 - Calculus refresher

Contributors:
Raymond-Wu, laith.zumot

1 Like

@andreilys has kindly agreed to help :slight_smile:

1 Like

Happy to help, thanks Jeremy.

@Raymond-Wu here’s a great tool to help speed things up. I’ve just been added to the course so am a bit behind but once I’m caught up to Lesson 12 I plan to use summarize.tech and GPT-3 to simplify this process, with a final sanity check once I get a chance to actually watch the video.

https://www.summarize.tech/www.youtube.com/watch?v=_xIzPbCgutY

4 Likes

Great to have your help! This looks more like it’s for summarizing a video, whereas I’m trying to create YouTube chapter markers to denote the beginning of each new topic being discussed.

Yep, the summary would just be a starting point for simplifying the workflow; the entries would then need to be manually updated (shortened to a few words, as you’ve done above).

I’ll be able to help here once I get up to lesson 12

Got it. If I finish ch 12 by the time you catch up, you’re welcome to do ch 13.