Help wanted: transcriptions

Something that the community has been amazing at for each course is creating accurate transcriptions of each lesson. I’d love your help doing that for this year’s course too! If you have a few moments to spare, could you transcribe some of a lesson? The way you do it is to watch the lesson on YouTube, open the auto-generated transcript (linked below), and correct it so that it matches what you hear and is reasonably well punctuated.

You can do this directly in these shared Google docs I’ve created:

The title of each doc is a link to the video. The sections that are in italics are auto-generated. So once you fix a paragraph, remove italics from it (select it and press ctrl-i). So I can credit you in the YouTube description, also add your forum username above it. You’ll see I’ve done the first few paragraphs of the first few lessons, to give you a sense of what it should look like.

In general, change the wording as little as possible – the transcript should say what I say in the video. Google syncs the transcript to the video automatically, so if the text is changed, it won’t be able to sync it.

I’ll add credits to the video description on YouTube for everyone that helps.


Is it okay to adjust the paragraphing, without altering the order of the words?
e.g. remove paragraph split after “pixby”?


A suggestion…
This will require you to manually remove the forum-username from the document.
An alternative is to highlight the text, then click Add comment, and add your forum-username there. Then you can use the final text directly, without further editing. The other advantage is the yellow highlight makes it really clear what is complete. I find italic-versus-normal-font doesn’t have enough contrast when quickly skimming whole documents.

e.g. Adding comment…

How it looks…

Also, clicking anywhere emphasises which text was edited by which person.

Until you advise otherwise, I will do it as above, since it is easy to revert, and you can see it in practice.

Another suggestion…

When someone finishes a part, to help the next person jump to the right video location…
right click on the video, then select Copy video URL at current time


then paste that into a comment…
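If you want to turn a pasted link back into a timestamp you can compare with the transcript, here is a rough sketch. It assumes the copied URL carries the offset as a `t` query parameter in whole seconds (optionally with a trailing `s`); `VIDEO_ID` is a placeholder, not a real video.

```python
from urllib.parse import urlparse, parse_qs


def start_seconds(url: str) -> int:
    """Return the offset from the URL's `t` query parameter, or 0 if absent."""
    values = parse_qs(urlparse(url).query).get("t", ["0"])
    # Assumption: the value is whole seconds, e.g. "754" or "754s".
    return int(values[0].rstrip("s"))


def as_timestamp(seconds: int) -> str:
    """Format a number of seconds as H:MM:SS."""
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Hypothetical link of the kind pasted into a comment.
print(as_timestamp(start_seconds("https://youtu.be/VIDEO_ID?t=754")))
```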


It might also be good to clean up small bridging words at the start of sentences, especially “And”, as in this example.

That’s fine too - you’re welcome to do whatever works for you! :slight_smile:

If you’re ever unsure whether you’re looking at auto-generated or manual transcription, then look for capital letters. Google’s auto-transcribe doesn’t use capitals.
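For anyone who wants to check a pasted document in bulk, the capital-letter rule of thumb could be sketched like this. It only applies the heuristic described above (no capitals means probably auto-generated); the function name is made up for illustration, and it will misjudge any edited paragraph that happens to contain no capitals.

```python
def looks_auto_generated(paragraph: str) -> bool:
    """Guess whether a paragraph is still raw auto-transcription.

    Heuristic only: the paragraph contains letters, and none are uppercase.
    """
    letters = [ch for ch in paragraph if ch.isalpha()]
    return bool(letters) and not any(ch.isupper() for ch in letters)


paragraphs = [
    "and for any kind of model you can always call show_results",
    "And for any kind of model you can always call show_results().",
]
for p in paragraphs:
    status = "auto-generated?" if looks_auto_generated(p) else "edited"
    print(f"{status}: {p}")
```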

I wonder if we could keep both auto-generated and manually edited versions to compare, or maybe fine-tune the recognition software to improve the automated results? :thinking:

Not sure if it makes much sense though, I just had some thoughts about “residuals” and boosted learning.
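As a rough illustration of what comparing the two versions could look like, here is a word-level diff using Python's standard `difflib`. This is only a sketch of the comparison step, not anything to do with fine-tuning a recognition model; the function name and the sample sentences are made up for the example.

```python
import difflib


def word_corrections(auto: str, manual: str) -> list[tuple[str, str]]:
    """Return (auto_text, manual_text) pairs for each span that was changed."""
    a, b = auto.split(), manual.split()
    matcher = difflib.SequenceMatcher(None, a, b)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Deleted spans have an empty manual side; inserted spans an empty auto side.
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


# Made-up example in the spirit of the proper-noun errors seen in the docs.
for before, after in word_corrections("so pie torch is great", "So PyTorch is great"):
    print(f"{before!r} -> {after!r}")
```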


Ah, ok, so these subtitles were provided by YouTube/Google. So I guess there is not much to do about the model.

Thanks all for your help so far with the transcripts! :slight_smile:

I’ve just added the lesson 3 auto-generated transcript to the top post of this thread.

PS: Where possible, it’s best to finish the lesson 1 transcript before moving on to lesson 2, and so forth. That way we get finished transcripts uploaded and usable ASAP!

Would it be possible to add the link to the video here as well so people can jump to it instead of going back out and looking for the lesson thread?

The title of each doc is a link to the video. I’ll add a mention of that in the top post.

Thank you! I totally missed that!

How should we deal with repeated utterances?

and for any kind of model you can

and for any kind of model you can always call show_results()

My instinct here says, remove the second “and for any kind of model you can” to make it flow like this:

and for any kind of model you can always call show_results()

But that wouldn’t be true to the actual spoken repetition in the lecture, which sounds fine in the speech version of a lecture.

I would suggest using an ellipsis: “and for any kind of model you can… and for any kind of model you can always call show_results()”.

That way the text matches what is spoken.


Thanks gang - we’re getting close to having the lesson 1 transcript finished! Let’s get it over the line! :smiley:

Lesson One is now done. :metal:

I am quite surprised that the autogenerated transcripts do generally pretty well. Except for proper nouns. So, pie torch, caggle etc. Interestingly it got tensorflow correct. Google may have been sneaky there.


Lesson Two is complete.
Great work @mike.moloch, @amr.malik, @gagan, @fmussari, @kurianbenoy, @heylara


I’m starting with the last paragraph of Lesson 3. I didn’t see this mentioned but it’s much easier than copying and pasting timestamps to find where one person stopped and another picked up. There’s a “Show transcript” option that’ll allow you to search for text. Hope this helps us all work in parallel!

Another tip for those of you with 2 monitors is to have the doc on one screen and the video on the other. Makes editing a lot quicker!


I’ve been staking out pages with
[@mike.moloch WIP
… para
[/@mike.moloch ]

I’m going to start adding timestamps from the transcript as well (which I’ve been doing but not all the time tbh)

The WIP I think is a good idea. I’m not entirely sure what adding the timestamps in the docs does over the tip in my previous post?

That’s more for my own record keeping. It’s easier to find where I started/stopped if I don’t finish a block in one sitting (when I’ve blocked off 2-3 pages at a time.) I personally find it a faster way to go back to the video location where I may have left off. At the end I take all of these out of course and submit the whole block as one “change”.
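If taking the markers out by hand gets tedious, something like the following could do it on a copy of the text. It is a sketch that assumes markers sit on their own lines and look like the `[@username WIP` and `[/@username ]` example earlier in the thread; other marker styles would need a different pattern.

```python
import re

# Matches whole lines such as "[@mike.moloch WIP" and "[/@mike.moloch ]".
# The marker shape is an assumption based on the example in this thread.
MARKER = re.compile(r"^\s*\[/?@[\w.-]+(\s+WIP)?\s*\]?\s*$")


def strip_wip_markers(text: str) -> str:
    """Drop marker-only lines and keep every other line unchanged."""
    kept = [line for line in text.splitlines() if not MARKER.match(line)]
    return "\n".join(kept)


block = "[@mike.moloch WIP\nSo PyTorch is great.\n[/@mike.moloch ]"
print(strip_wip_markers(block))
```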