Crowdsourcing lecture transcriptions

pramod.srinivasan · October 23, 2017, 1:04am

I am exploring the idea of creating a fast, reliable, crowd-powered method for accurately transcribing lectures. This has been well explored in the past, and fortunately Google’s automatic transcription gives us a fairly easy way to get a good first-draft transcription. The draft for yesterday’s workshop is here.

For starters, one powerful use case is the search functionality this document provides. For instance, I can quickly find the places where @jeremy talks about list comprehensions. However, the document is far from perfect. It has transcription errors: some are harmless, and a few may interrupt the reader’s train of thought. Here is a sample:

10:03, your laptop probably doesn’t have they
10:06, deep learning compatible GPU in it this
10:09, is something and you know most people I
10:11, know including most of the most serious
10:12, researchers in breakfast
10:13, used AWS for most of their work back so
10:18, that’ll make you you know it’s a kind of
10:21, getting familiar with AWS is something
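The search use case can be sketched in a few lines of Python. This is a minimal sketch, assuming the captions are exported as `MM:SS, text` lines like the sample above. The function names `parse_captions` and `search` are hypothetical. The caption text is joined into one string so that a phrase split across two caption lines (as "they / deep learning" is above) is still found, and each hit is mapped back to the start time of its line.

```python
import bisect
import re

# Assumed caption format: "MM:SS, text" per line (as in the sample above).
LINE_RE = re.compile(r"^(\d+):(\d{2}),\s*(.*)$")

def parse_captions(raw):
    """Return a list of (start_seconds, text) pairs from 'MM:SS, text' lines."""
    entries = []
    for line in raw.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            minutes, seconds, text = m.groups()
            entries.append((int(minutes) * 60 + int(seconds), text))
    return entries

def search(entries, phrase):
    """Return the start times (seconds) of the caption lines where `phrase` begins.

    All caption text is lowercased and joined with single spaces, so a phrase
    that straddles two caption lines is still matched."""
    offsets, parts, pos = [], [], 0
    for _, text in entries:
        part = text.lower()
        offsets.append(pos)        # character offset where this line starts
        parts.append(part)
        pos += len(part) + 1       # +1 for the joining space
    full = " ".join(parts)
    phrase = phrase.lower()
    hits = []
    start = full.find(phrase)
    while start != -1:
        # Map the character offset back to the caption line containing it.
        line_idx = bisect.bisect_right(offsets, start) - 1
        hits.append(entries[line_idx][0])
        start = full.find(phrase, start + 1)
    return hits
```

For example, `search(parse_captions(text), "aws")` on the sample above would return `[613, 621]`, the 10:13 and 10:21 lines. Longer lectures whose timestamps use `H:MM:SS` would need a wider regex.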

Looking for suggestions on systematic ways to clean this. Ideally, it would be great to come up with a method which can work for all future lectures. We can probably add the improved version to the wiki thread. Happy learning!
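One possible systematic approach, offered only as a sketch: volunteers maintain a shared table of recurring mis-hearings, and a script applies it to every new lecture before humans proofread the rest. The entries in `CORRECTIONS` below are hypothetical placeholders, not errors taken from the actual workshop transcript. Longer phrases are tried first, and matching is case-insensitive and limited to word boundaries.

```python
import re

# Hypothetical crowd-maintained table: mis-heard phrase -> correction.
# Illustrative placeholders only, not real errors from the lecture.
CORRECTIONS = {
    "pie torch": "PyTorch",
    "jupiter notebook": "Jupyter notebook",
}

def apply_corrections(text, corrections):
    """Replace every whole-word, case-insensitive match of a table key."""
    if not corrections:
        return text
    # Longest keys first so a longer phrase wins over a shorter overlapping one.
    keys = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b", re.IGNORECASE
    )
    lookup = {k.lower(): v for k, v in corrections.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)
```

A table like this only catches errors that recur across lectures. One-off errors, like "breakfast" in the sample, still need a human reader.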

satish860 · October 23, 2017, 1:42am

The same feature is available in Azure as the Video Indexer API. But it would be nice if we could build something similar ourselves as an outcome of learning at fast.ai.

A_TF57 · October 23, 2017, 1:56am

A friend of mine is working on https://meetscribe.io, a tool for transcribing meetings. I think it could be used to transcribe the lectures, although the product is at a very early stage right now.

I tried it on the workshop video and the process was very slow.

jeremy · October 23, 2017, 2:04am

A transcription of each lesson would be much appreciated. They don’t need to be time-coded at all, since YouTube does that automatically. In previous courses @lin.crampton was kind enough to transcribe the whole lot! Some kind of more crowd-sourced approach would be cool. It’s important for students whose first language isn’t English, and of course for those with hearing difficulties.
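Because time codes aren’t needed, a first cleanup step could be to strip them from an exported caption file and join the fragments into running text for volunteers to punctuate and correct. This is a minimal sketch, assuming each line starts with an `MM:SS,` or `H:MM:SS,` timestamp as in the sample earlier in the thread. `captions_to_prose` is a hypothetical name.

```python
import re

# Assumed prefix: "MM:SS, " or "H:MM:SS, " at the start of each caption line.
TIMESTAMP_RE = re.compile(r"^\s*\d+(?::\d{2}){1,2},\s*")

def captions_to_prose(raw):
    """Drop leading timestamps and join the caption fragments with spaces."""
    fragments = []
    for line in raw.splitlines():
        text = TIMESTAMP_RE.sub("", line).strip()
        if text:  # skip blank lines
            fragments.append(text)
    return " ".join(fragments)
```

The output has no sentence breaks or punctuation, so it is a starting point for human editing rather than a finished transcript.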

Another thing that’s helpful is to create a timeline: e.g. http://wiki.fast.ai/index.php/Lesson_1_Timeline . If you do this, you can put it straight into the wiki lesson post.
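If contributors collect (timestamp, topic) pairs while watching, a small script could turn them into timeline entries that link straight to that point in the video. This is a sketch under assumptions: `VIDEO_ID` and the topic text are hypothetical placeholders, and the output is plain text, since the exact markup used on the wiki timeline page isn’t specified here. It relies on YouTube’s `t` URL parameter, which starts playback at a given number of seconds.

```python
def to_seconds(stamp):
    """Convert 'MM:SS' or 'H:MM:SS' to a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

def timeline(video_id, entries):
    """Format (timestamp, topic) pairs as lines with deep links into the video."""
    lines = []
    for stamp, topic in entries:
        url = f"https://www.youtube.com/watch?v={video_id}&t={to_seconds(stamp)}s"
        lines.append(f"{stamp} {topic} ({url})")
    return "\n".join(lines)
```

For example, `timeline("VIDEO_ID", [("10:03", "Why use AWS")])` would produce a single line linking to 603 seconds into the video.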

Thanks to all of you who are thinking about how to help your fellow students! Let me know if you need anything from me to make things easier.

A_TF57 · October 23, 2017, 2:10am

Thanks @jeremy!

How did you create the timeline? Did you use any specific tool or does YouTube do that for you?

[Edit]: I think I got the gist from the View Source tab on that wiki link you shared.

jeremy · October 23, 2017, 2:43am

Wasn’t me! It was participants in the course.