Lesson transcriptions (2020) - help wanted

hiromi · April 23, 2020, 1:28pm

Some of us were wondering if we could have the captions.sbv files somewhere in a fastai GitHub repo. It is difficult to get them all right the first time, and I noticed typos here and there as I re-watched the videos (some are minor, some are pretty major).

I tried to fix it via YouTube, but for English it is locked saying “The video owner already provided subtitles/CC”:

If they were on GitHub, we could create PRs and continuously improve the overall quality. How does that sound?

AndreaPi · April 23, 2020, 2:15pm

Having them on GitHub would be nice, so that we can improve! If we see corrections to our captions, we could do better next time.

quantum · April 23, 2020, 5:42pm

How about this: if we see mistakes in the Google doc (and some entries have many) we go in and fix them, and add “edited by” and our name?

AndreaPi · April 23, 2020, 6:02pm

For me it’s fine either way: it’s also fine if people just fix the mistakes, and don’t add any “edited by”

lin.crampton · April 24, 2020, 10:03pm

Having trouble understanding one part of Lesson 5 (https://www.youtube.com/watch?v=krIVOb23EH8&feature=youtu.be):

Around 1:40:04, Jeremy mentions “Patti Hendricks has trained a language model of me.” … looking for the correct spelling of the name Patty/Patti Hendrix/Hendric

hiromi · April 24, 2020, 10:16pm

It’s his username pattyhendrix

lin.crampton · April 24, 2020, 10:18pm

Thanks, again, Hiromi.

hiromi · April 26, 2020, 1:25am

@jeremy, lesson 5 transcription is done thanks to these amazing volunteers!!

@barnacl 1
@SOVIETIC-BOSS88 9
@lin.crampton 9
@pnvijay 1
@morgan 1
@AndreaPi 1
@jona 1
@Jess 1
@gautam_e 1

Here are splits for lesson 6. Looks like @Albertotono is already hard at work

gautam_e · April 26, 2020, 8:05am

At around 57:04, where Jeremy is answering a question about Dask, there’s a part where he’s talking about non-indexable datasets which was quite inaudible (probably because a part of it got cut off from the original video?). Perhaps someone could help with what he says there:

If it’s not indexable, like it’s a, it’s a network stream or something like that, then um the data loaders datasets api’s directly which we’ll learn about either in this course or the next one

Thanks, in advance!

SOVIETIC-BOSS88 · April 26, 2020, 8:22am

Hi, I assume you meant Lesson 6. From what it seems, Jeremy is saying that if the dataset is non-indexable, you can’t use the data loaders datasets APIs directly. So it’s possible it would be something like:

… you can’t use um the data loaders datasets api’s directly which we’ll learn about either in this course or the next one.

Hope this helps.

RogerS49 · April 28, 2020, 1:18pm

Did one for lesson 6, the last on the list.

jeremy · April 28, 2020, 7:55pm

This is fantastic!

jeremy · April 28, 2020, 7:56pm

Probably easiest, if possible, would be if we can figure out how to “unlock” the captions so that corrections can be contributed?

hiromi · April 29, 2020, 12:19am

Sounds good! I will look into it and see if there is a way

jeremy · April 30, 2020, 5:17pm

@transcribe-1v4 - thank you to those of you who have stuck with this project so well! And for those of you that haven’t been able to - that’s totally understandable! Now’s a great time to get involved again and help out. I’ve just published the lesson 7 auto-generated transcript:

Pick a paragraph, pop your name above it, along with “status: in progress”, and get transcribing!

Albertotono · April 30, 2020, 8:53pm

Great work with Perl, Jeremy, thank you. It is a pleasure to help you, and it is amazing to see such great teamwork, divide and conquer. So upset that this is the second-to-last lesson. I don’t know if we can wait till September for part 2. You have really helped us a lot during this difficult time.

hiromi · May 1, 2020, 2:46am

It looks like as long as the owner is not the one who submitted the transcripts, we can continue to improve. For example, I can still edit the Tamil translation that is already published:

BUT! Something only the owner can do is upload a transcription with no timestamps. So maybe for part 2, we can just download the .sbv file from YouTube and people can update that (leaving the timestamps as they are). Once that’s done, we can submit it to YouTube and Jeremy can publish it. And because we submitted it, it will continue to be editable.
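To illustrate the "update the text, leave the timestamps" idea, here is a minimal Python sketch. It assumes the usual .sbv layout (a `start,end` timestamp line, then the caption text, then a blank line). The function names, the sample cues, and their timestamps are made up for illustration, and the correction shown just reuses the spelling question from earlier in this thread.

```python
import re

# An .sbv cue starts with a "start,end" line such as 0:01:02.345,0:01:05.000
TIMESTAMP = re.compile(r"^\d+:\d{2}:\d{2}\.\d{3},\d+:\d{2}:\d{2}\.\d{3}$")

def parse_sbv(text):
    """Split .sbv content into (timestamp_line, caption_text) pairs."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if lines and TIMESTAMP.match(lines[0].strip()):
            cues.append((lines[0].strip(), "\n".join(lines[1:])))
    return cues

def fix_captions(text, corrections):
    """Apply corrections to caption text only; timestamp lines are copied unchanged."""
    out = []
    for stamp, caption in parse_sbv(text):
        for wrong, right in corrections.items():
            caption = caption.replace(wrong, right)
        out.append(f"{stamp}\n{caption}")
    return "\n\n".join(out) + "\n"

# Hypothetical sample cues (timestamps invented for the example)
sample = (
    "0:00:00.000,0:00:03.500\n"
    "Patti Hendricks has trained\n"
    "\n"
    "0:00:03.500,0:00:06.000\n"
    "a language model of me.\n"
)

fixed = fix_captions(sample, {"Patti Hendricks": "pattyhendrix"})
print(fixed)
```

Because only the caption text is touched, the corrected file should still line up with the video when it is uploaded again.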

It’s probably too late to change the logistics now, so I think what we can do is once all the lessons are done and submitted to YouTube:

1. I can download the sbv files that are processed and published.
2. Jeremy deletes the published English CC.
3. I will upload the downloaded sbv file.
4. Jeremy will publish them.
5. Hooray, we can still edit them!

imrandude · May 1, 2020, 7:43am

Hope you are not actually editing the Tamil text

jeremy · May 1, 2020, 3:47pm

Would that be a bit harder for the folks doing the transcriptions? I wouldn’t want to do anything to make things more difficult!

imrandude · May 1, 2020, 4:32pm

One cool trick for people doing translations: you can load the entire English subtitle file into a Google Sheet and use the GOOGLETRANSLATE function to auto-translate everything to your native language. Once you have corrected the text, we can modify the timings to suit active/passive voice issues.

For instance here, I’ve translated from English to Tamil.
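
One way to set up such a sheet is sketched below in Python: it writes a CSV with the English caption in column A and a GOOGLETRANSLATE formula in column B. The helper name and the sample captions are hypothetical, and this assumes the CSV is imported into Google Sheets with the option that converts text into formulas turned on. "ta" is the language code for Tamil.

```python
import csv
import io

def build_translation_sheet(captions, source="en", target="ta"):
    """Return CSV text: column A holds the caption, column B a GOOGLETRANSLATE formula."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["english", "translation"])
    # Row 1 is the header, so the first caption lands in row 2
    for row, caption in enumerate(captions, start=2):
        formula = f'=GOOGLETRANSLATE(A{row},"{source}","{target}")'
        writer.writerow([caption, formula])
    return buf.getvalue()

# Hypothetical caption lines, one per subtitle cue
sheet = build_translation_sheet([
    "Welcome to lesson 7.",
    "Today we look at tabular data.",
])
print(sheet)
```

The machine translation is only a first draft; the text in column B still needs to be corrected by hand before it goes back into the subtitle file.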