Thanks - done!
Fantastic. Thank you, Hiromi and Jeremy!
Now that the English transcript is pretty much done, we can work from it to get more accurate translations.
I'm sorry, I had a mini COVID burnout and needed to take a long break. I made a copy of the document where the transcription is being done and started translating it. I have only finished one of the 25 parts, though. I will keep working on it and can share the document.
The manual English transcription is now available as YouTube captions - hope this helps with your translations!
Haven’t had as much time as I expected to work on this, so I’ve retracted my name from the list.
I have a newfound respect for translators. I figured I should share what I found out as I make my first attempt at this.
What worked well so far:
- DeepL produces translations that differ from Google Translate's and sound much more natural (one downside is that it sometimes omits a whole sentence for the sake of flow).
- YouTube caption editor (here are some tips including keyboard shortcuts)
What did not work:
- Creating a Google Doc of the translation: unless the text is in the video's original language, YouTube will not synchronize the timing for you. I even tried converting a text file into the caption file format by putting each sentence on its own line and giving each one an evenly distributed time span, but that just meant a lot more manual timestamp adjusting.
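For anyone curious, the even-spacing conversion from the last bullet might look roughly like the sketch below. It is an illustration, not a recommended workflow: `even_sbv` is a hypothetical helper name, it assumes one sentence per line and the `.sbv` caption layout (a `H:MM:SS.mmm,H:MM:SS.mmm` line followed by the caption text), and it needs only a POSIX shell and `awk`. Because speech is not evenly paced, the cues it produces drift away from the audio, which is exactly the manual-adjustment problem described above.

```shell
# Hypothetical helper: even_sbv SENTENCES.txt TOTAL_SECONDS
# Gives every non-empty line the same share of the total duration and
# prints the result as SBV cues separated by blank lines.
even_sbv() {
  awk -v total="$2" '
    function ts(ms) {                        # milliseconds -> H:MM:SS.mmm
      return sprintf("%d:%02d:%02d.%03d", int(ms / 3600000),
                     int(ms / 60000) % 60, int(ms / 1000) % 60, ms % 1000)
    }
    { sub(/\r$/, "") }                       # tolerate Windows line endings
    NF > 0 { line[++n] = $0 }                # one sentence per non-empty line
    END {
      if (n == 0) exit
      step = int(total * 1000 / n)           # equal share per sentence, in ms
      for (i = 1; i <= n; i++) {
        if (i > 1) print ""
        print ts((i - 1) * step) "," ts(i * step)
        print line[i]
      }
    }
  ' "$1"
}
```

For example, `even_sbv sentences.txt 600 > draft.sbv` would spread the sentences evenly over a 10-minute video.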
Still a long road ahead of me and I would appreciate any tips you guys may have!
I have found the auto-translation for Swahili. However, it is a one-to-one mapping: it just replaces each English word with a Swahili word, so there are flow issues. Some sentences are spot on, but others are just bad. Just an observation.
What I am doing now is taking the result and editing it to make sense and be grammatical.
Perhaps you could download the timed-text subtitle or translation text files from YouTube and edit them directly? (I haven't tried this myself.)
Yes, that is similar to what I am doing now. The translation quality is much higher if I run the English transcript that people created through the translator. So now it is a lot of copying and pasting into the corresponding time slots in the downloaded subtitle file.
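One way to cut down the copying and pasting, sketched under assumptions: translate the caption text so that each output line corresponds to one cue of the downloaded `.sbv` file, then let `awk` put each line back under the original timestamps. `fill_sbv` is a hypothetical helper name, and the cue layout (a `start,end` timestamp line, then one or more caption lines, cues optionally separated by blank lines) is assumed from the caption excerpt later in this thread.

```shell
# Hypothetical helper: fill_sbv ORIGINAL.sbv TRANSLATED_LINES.txt
# Keeps every timestamp line of the original file and replaces that cue's
# text with the next line of the translation file (one line per cue).
fill_sbv() {
  awk '
    { sub(/\r$/, "") }                         # tolerate Windows line endings
    FILENAME == ARGV[1] { tr[++n] = $0; next } # translation: one line per cue
    /^[0-9]+:[0-9][0-9]:[0-9][0-9]\.[0-9]+,[0-9]+:[0-9][0-9]:[0-9][0-9]\.[0-9]+$/ {
      if (c > 0) print ""                      # blank line between cues
      print                                    # keep the original timestamps
      c++
      print (c <= n ? tr[c] : "")              # empty text if lines run out
    }
    # Any other line (the original caption text) is not printed.
  ' "$2" "$1"
}
```

Usage: `fill_sbv english.sbv translated_lines.txt > translated.sbv`. If the translation has fewer lines than there are cues, the remaining cues come out empty, so comparing the counts first is worthwhile.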
YouTube has that as its transcription now - so no need to copy/paste!
The main problem is that YouTube chops sentences into pieces, like so:
```
0:11:19.790,0:11:26.899
Which really, to a strong degree happened
because an MIT professor named Marvin Minsky
0:11:26.899,0:11:31.810
and Papert wrote a book called perceptrons
about Rosenblatt's invention in which they
0:11:31.810,0:11:39.620
pointed out that a single layer of these artificial
neuron devices, actually couldn't learn some
0:11:39.620,0:11:40.620
critical things.
```
And we get a much better translation from the Google Doc than from the caption file, even though they technically contain the same transcription:
Which really, to a strong degree happened because an MIT professor named Marvin Minsky and Papert wrote a book called perceptrons about Rosenblatt's invention, in which they pointed out that a single layer of these artificial neuron devices actually couldn't learn some critical things.
So now we end up with:
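マービン・ミンスキーというMITの教授とパパートが ローゼンブラットの発明について パーセプトロンという本を書いたからです その中で彼らは、これらの人工ニューロン装置の単層では 重要なことを学習できないと指摘しています。
(In English, roughly: "It is because an MIT professor named Marvin Minsky and Papert wrote a book called Perceptrons about Rosenblatt's invention. In it, they point out that a single layer of these artificial neuron devices cannot learn important things.")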
which we then need to chop up and put back into the caption file.
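A rough way to automate the chopping, assuming the translated paragraph puts spaces between words or phrases: give each cue a share of the translated words proportional to its word count in the original captions. `chop_sbv` is a hypothetical helper, and proportional splitting is a heuristic chosen for this sketch, not something YouTube does. The breaks will not follow the meaning exactly, so the output is only a draft to fix up in the caption editor; for text like the Japanese above, which has only a few spaces, it can only break at those spaces.

```shell
# Hypothetical helper: chop_sbv ORIGINAL.sbv TRANSLATED_PARAGRAPH.txt
# Spreads the translated words over the cues of the original SBV file,
# in proportion to how many words each cue had in the original.
chop_sbv() {
  awk '
    { sub(/\r$/, "") }                         # tolerate Windows line endings
    FILENAME == ARGV[1] {                      # first file: original captions
      if ($0 ~ /^[0-9]+:[0-9][0-9]:[0-9][0-9]\.[0-9]+,[0-9]+:[0-9][0-9]:[0-9][0-9]\.[0-9]+$/) {
        ts[++c] = $0                           # a timestamp line starts a cue
      } else if (c > 0) {
        wc[c] += NF                            # words in the current cue
        total += NF
      }
      next
    }
    { for (i = 1; i <= NF; i++) w[++t] = $i }  # second file: translated words
    END {
      done = 0; cum = 0
      for (k = 1; k <= c; k++) {
        cum += wc[k]
        # Last translated word for this cue (rounded proportional share);
        # the final cue always takes whatever is left.
        last = (k == c || total == 0) ? t : int(cum * t / total + 0.5)
        text = ""; sep = ""
        for (j = done + 1; j <= last; j++) { text = text sep w[j]; sep = " " }
        if (k > 1) print ""
        print ts[k]
        print text
        if (last > done) done = last
      }
    }
  ' "$1" "$2"
}
```

Usage: `chop_sbv english.sbv translated_paragraph.txt > translated.sbv`.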
We probably just need to train a language model for caption files
It feels like there might be an opportunity to build some kind of little GUI to make this easier…
YouTube’s GUI is okay. A little too small, but it’s side by side with a video player.
Can someone provide an export of the new English transcript? I need it for the Bulgarian translation. The timing of my current version needs to be fixed.
Hey @krasin
Edit: Here is the exported subtitle file
If you're curious about how to grab them, here is one method:
Warning: I was going through a bash refresher today, so this might be overkill:
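- Install the (latest) `youtube-dl` version:
```shell
# Download the latest youtube-dl release and make it executable for all users
sudo wget https://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl
# Clear bash's cached command locations so the new binary is found
hash -r
```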
After that, you can download the subtitle files from your terminal with:
```shell
# --all-subs fetches every available subtitle track; --skip-download skips the video itself
youtube-dl --all-subs --skip-download <URL for Lecture here>
```
Note: I'm pretty sure there are third-party tools that make these easier to download; if this looks like too much of a hassle, please ignore it.
It's a LOT of work, I agree.
I am using the YouTube editor with the auto-generated, auto-translated subtitles, and then I correct them based on the audio. Some people suggested just starting from scratch, since correcting them may take more effort than typing from scratch. I will try that next.
I was also thinking along the same lines as Hiromi: the translations are kind of broken because the sentences are split up by the timestamps. So I like the idea of removing the timestamps so that the sentences flow for translation.
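Removing the timestamps is easy to script. A minimal sketch, assuming the downloaded captions follow the `.sbv` layout quoted earlier in the thread (a `start,end` line, then caption lines); `join_sbv` is a hypothetical helper name and only standard `awk` is used. It prints all of the caption text as one paragraph, ready to paste into DeepL or Google Translate.

```shell
# Hypothetical helper: join_sbv CAPTIONS.sbv
# Drops timestamp lines and blank separators, then joins the remaining
# caption lines with single spaces into one paragraph.
join_sbv() {
  awk '
    { sub(/\r$/, "") }                       # tolerate Windows line endings
    /^[0-9]+:[0-9][0-9]:[0-9][0-9]\.[0-9]+,[0-9]+:[0-9][0-9]:[0-9][0-9]\.[0-9]+$/ { next }
    NF == 0 { next }                         # skip blank separator lines
    { printf "%s%s", (n++ ? " " : ""), $0 }  # space before all but the first line
    END { print "" }
  ' "$1"
}
```

Usage: `join_sbv english.sbv > english.txt`.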
Oh, and today I noticed the original subtitles are not in sync anymore. I guess for the next video I will wait for the corrected English subtitles.
My understanding is that we should aim for the translations to be available for the MOOC release, which is in July?
I can show you the hacky way I did this (and sorry I’m doing this in a rush before work - let me know if it’s confusing):
If you open the published "English" one, there is no longer an Action button.
So I did the following:
- “Canadian English”
- Copy from the original (the uploaded caption)
- Download that
- Delete the draft
Stuck translating this part:
what it is you can do if only you have a way to parameterize a model and you have an update procedure which can update the weights to make you better at your loss function.
@jeremy, I know this is a big ask, but from the next lessons on, could you please use shorter sentences if possible? I have been at this for an hour now and am literally tearing my hair out trying to translate this into something meaningful in the vernacular language.
For the translation, I have rephrased it as:
What you can do, if you can only parameterize the model and use an update procedure to the weights to improve the loss function?
Can anyone confirm whether it conveys the same meaning?