Lesson 4 recorded voice issue in video

mgorecki · February 9, 2019, 8:29pm

It’s ok for me on Windows.
I’ve started lesson 4 today and the sound is slightly different than the previous lessons, but it’s loud and clear.

gnchen · February 9, 2019, 8:52pm

Thanks for the confirmation. I tried a few speakers, and it turns out the default speaker is not good for this lesson. It is odd.

Eratudo · February 9, 2019, 9:03pm

I had the same problem; switching speakers also worked for me. (I also had problems with the last lesson.)

amqdn · February 9, 2019, 11:28pm

Yes. I’ve had this same issue. I could not hear much of anything when I played the video on mobile (on two different devices), but when I played it back on desktop, it was fine.

The sound, however, sounds weird. I haven’t looked into it, but I’m guessing that there is something wrong with the audio channels. I think that because my phones may be playing audio back in mono, the summing of the audio channels down to mono creates phase cancellation, destroying the audio. Something like that.
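To make the guess above concrete, here is a minimal sketch of what a mono downmix does when one channel is a polarity-inverted copy of the other. The 440 Hz tone, the 8 kHz sample rate, and the helper names `mono_sum` and `rms` are illustrative assumptions, not anything taken from the lesson files.

```python
import math

def mono_sum(left, right):
    """Downmix two channels to mono by averaging each sample pair."""
    return [(l + r) / 2 for l, r in zip(left, right)]

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# Hypothetical test signal: a 440 Hz tone, 1000 samples at 8 kHz.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(1000)]
inverted = [-s for s in tone]

in_phase = mono_sum(tone, tone)       # identical channels: level is preserved
cancelled = mono_sum(tone, inverted)  # opposite polarity: every pair sums to zero

print(rms(in_phase), rms(cancelled))
```

With a real recording the two channels are rarely exact mirror images, so the cancellation would be partial rather than total, which would fit "could not hear much of anything" rather than complete silence.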

It should be an easy audio fix on their end. Not sure who to bring it up to.

jeremy · February 10, 2019, 4:06am

This is interesting - these are the two lessons that the YouTube auto-transcript also failed on. If anyone figures out what’s going on, please let me know!

yoongkang · February 10, 2019, 2:58pm

It’s completely unlistenable on my mobile device. On my Mac it’s fine and clear, but it kind of seems like the sound is coming from only one side.

amqdn · February 10, 2019, 5:49pm

I’ll take a look in the next day or so.

marcmuc · February 10, 2019, 7:59pm

Here is a quick comparison of the audio using Audacity. Comparison of lesson 4 (top) and lesson 3 (bottom) (lesson 7 is similar to lesson 4, lesson 6 is similar to lesson 3):

I am not an expert in this, but it clearly points to different recording equipment or settings being used for the recordings of lessons 4 and 7:

- use of noise gates / noise canceling (as can be seen from the clear gaps in the speech in lesson 4, where the levels are essentially reduced to zero, compared to lesson 3, where the gaps always have some noise)
- use of compression technology (audio compression, meaning quiet levels were raised and loud levels were capped/dampened), as can be seen from the much “redder” lesson 4, where a lot of middle frequencies were “boosted”, and from picture 2, where the signal is much less dynamic for lesson 4 and much “louder”
- and/or use of an equalizer/filter of some sort (pic 3 shows the spectrum), which could be from a “voice understandability boosting” filter or something like that
- although the signal is compressed, the overall level is lower than for lesson 3 (less “loud”)

This would explain why the audio transcription doesn’t work well (compressed, noise-canceled audio is not representative of the training data that the transcription model had?!) or why some people report noise on their speakers (could be from distortion by “overpowering” the speakers due to the compressed signal if played too loud).

marcmuc · February 10, 2019, 8:32pm

Ah, and this might be the worst: The signals of left and right channels seem to be out of sync = phase-shifted. (I am lacking the right software for this phase comparison with X/Y plots etc. but looking at the waveform seems enough in this case)

This might mean that on devices with speakers very close to each other (or a single speaker where both channels are combined for mono), the two channels basically cancel each other out. So it might be worth trying to just use the left or right channel of the video for transcription instead of both?!
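Without X/Y plotting software, a rough numeric stand-in for the phase comparison is the correlation between the two channels. The sketch below is only an illustration: `channel_correlation` is a hypothetical helper, and the sample lists are made up. A value near +1 suggests the channels move together, while a value near -1 suggests one is a polarity-inverted copy of the other. A small time offset between channels would show up differently, so this is a hint rather than a diagnosis.

```python
import math

def channel_correlation(left, right):
    """Pearson correlation between two equally long sample lists."""
    n = len(left)
    mean_l = sum(left) / n
    mean_r = sum(right) / n
    cov = sum((l - mean_l) * (r - mean_r) for l, r in zip(left, right))
    var_l = sum((l - mean_l) ** 2 for l in left)
    var_r = sum((r - mean_r) ** 2 for r in right)
    if var_l == 0 or var_r == 0:
        return 0.0  # a silent (constant) channel has no defined correlation
    return cov / math.sqrt(var_l * var_r)

# Made-up samples standing in for a short stretch of each channel.
left = [0.0, 0.5, 0.9, 0.4, -0.3, -0.8, -0.2]
right_ok = list(left)              # same polarity
right_flipped = [-s for s in left] # inverted polarity

print(channel_correlation(left, right_ok))
print(channel_correlation(left, right_flipped))
```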

amqdn · February 11, 2019, 7:58am

Hey, Jeremy – I’ve figured it out. I suspect the lessons were recorded in mono, and somehow Lessons 4 and 7 have recordings whose right channels’ polarities ended up reversed (see below). This resulted in phase cancellation that mangles the audio when systems try to play the audio by summing the channels. This probably also messed up the auto-transcription.

It’s an easy fix; I just reversed the polarities of one of the channels and played it back. I tested snippets on both of my mobile phones and the playback is fine now. If I provide fixed audio files, are you able to re-render the videos (caveat: the raw WAVs are about 1GB each)? Let me know and we can figure something out.
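For anyone who wants to try the same fix without an audio editor, here is a minimal sketch using only Python's standard `wave` and `array` modules. It assumes a 16-bit stereo PCM WAV and reads the whole file into memory, so a roughly 1GB file needs that much RAM. The function name `invert_right_channel` and the choice of the right channel are illustrative; flipping either channel (but only one) has the same effect on a mono sum.

```python
import array
import wave

def invert_right_channel(src_path, dst_path):
    """Copy a 16-bit stereo PCM WAV, flipping the polarity of the right channel."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.nchannels != 2 or params.sampwidth != 2:
            raise ValueError("this sketch expects 16-bit stereo PCM")
        frames = src.readframes(params.nframes)

    # 'h' is a signed 16-bit integer on common platforms.
    samples = array.array("h")
    samples.frombytes(frames)

    # Samples are interleaved as L0, R0, L1, R1, ... so odd indices are right.
    for i in range(1, len(samples), 2):
        # Clamp, because negating -32768 would overflow a signed 16-bit value.
        samples[i] = min(32767, -samples[i])

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(samples.tobytes())
```

Calling it would look like `invert_right_channel("lesson4.wav", "lesson4_fixed.wav")`, where both file names are placeholders.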

marcmuc · February 11, 2019, 8:31am

Why don’t you encode them as .aac, which is what they are inside the YouTube videos anyway? Then they are only 100-130MB. That would make handling easier and faster…

amqdn · February 11, 2019, 8:43am

They aren’t:

Theoretically, I can re-encode them back into WEBM or encode them into whatever else, but the point is to make it drag-and-drop simple to re-render the lesson videos without changing anything about the original files except polarity.

We’ll see.

marcmuc · February 11, 2019, 8:55am

Hmm, those are the options youtube-dl gives you for audio-only download. But inside the mp4 video containers, afaik, they are encoded as AAC, which is what YouTube/Google officially recommends.

https://support.google.com/youtube/answer/1722171?hl=en

amqdn · February 11, 2019, 8:46pm

They’re actually encoded with Opus; you can see that on the right.

Regardless, I think we have a lot of hammers and no nail. Can we wait for Jeremy to respond?

If file size is an issue, we can address it then.

snosrap · February 14, 2019, 8:31pm

Hi Gen,
I ran into this too. The speech is almost imperceptible if you have your Mac settings set to “play stereo audio as mono”. Not sure why I had this setting checked (it’s not the default). Unchecking that box makes the audio much better, but it still sounds strange and out-of-phase for the reasons described elsewhere in this thread.
My solution was to download the video with youtube-dl, open the mp4 in Quicktime 7 (!!), and change the audio tracks to: L->center, R->unused.
Another solution would be to reverse the polarity of one of your speaker wires if you’re connected to a stereo (ha!)
Ford

Cain · February 17, 2019, 1:45am

I uploaded a subtitle file generated by Google for Lesson 4, but it is not displayed when I open the video. Does an administrator need to review it first?

marcmuc · February 17, 2019, 9:00pm

That’s great, @Cain. If it is a YouTube transcript with timestamps, it would be very useful to include it in the new video player. Have a look at this thread; it is best to make a pull request to Zach’s repo or contact him about it (@zachcaceres). It would be awesome if you could do the same for lesson 7!

jeremy · February 19, 2019, 10:53pm

Thanks @amqdn - I rather suspected that’s what happened. I can fix that easily enough - I assume Audacity has some filter that can do it…

Yup looks like it’s this one: https://manual.audacityteam.org/man/invert.html

amqdn · February 20, 2019, 12:01am

Great!

Yes, that should do it, as long as it’s applied to only one of the channels.

elimike · August 23, 2022, 3:51am

Hi, my name is Elie. I’ve been following the course for the past month, and as a first exercise I was able to build an object detector for the logos of the Fortune 1000 companies with ~80% accuracy.

I’ve started lesson 4, and I’m interested in building a semantic search engine prototype as part of the course. Would you have any recommendations?

Thank you