I have a video (or audio) file and I need to align already existing subtitles to the audio exactly. This isn’t a “crop” function that only adjusts the overall start and end; each individual subtitle needs to be timed accurately.
I’m building a Japanese learning tool for students and I have a DVD video with subs available, but their timing is not accurate enough.
Since I’m an absolute beginner, I’ll just try to follow the tutorial and post updates on my progress. If anyone’s got tips, I’d be happy to listen. In any case, hopefully this topic will be of help to other people.
To wrap up, here are a few more useful resources:
- Aeneas Forced Alignment - I’ll have to try this one too
- Google’s WebRTC Voice Activity Detector - I’m not sure whether this is just a volume detector or whether background noise is accounted for
- DeepSpeech for Torch - There’s a lot of talk about Baidu’s DeepSpeech research, but that’s speech-to-text, and using it for Japanese… I wouldn’t expect much in the way of results.
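
To make "each subtitle needs to be accurate" a bit more concrete, here is a rough sketch of the idea in plain Python. It is not how any of the tools above work internally. It assumes that some voice activity detector has already produced a list of speech segments as (start, end) pairs in seconds, and it simply moves each subtitle onto the speech segment it overlaps the most. The names `Cue`, `snap_cues` and `max_shift` are made up for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds, assumed to come from a VAD


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def snap_cues(cues: List[Cue], speech: List[Segment], max_shift: float = 1.0) -> List[Cue]:
    """Move each cue onto the speech segment it overlaps most.

    A cue is left untouched if it overlaps no segment, or if either boundary
    would have to move by more than max_shift seconds (hypothetical safety limit).
    """
    aligned = []
    for cue in cues:
        best: Optional[Segment] = max(
            speech,
            key=lambda seg: overlap(cue.start, cue.end, seg[0], seg[1]),
            default=None,
        )
        if (
            best is not None
            and overlap(cue.start, cue.end, best[0], best[1]) > 0
            and abs(best[0] - cue.start) <= max_shift
            and abs(best[1] - cue.end) <= max_shift
        ):
            aligned.append(Cue(best[0], best[1], cue.text))
        else:
            aligned.append(cue)  # no trustworthy match, keep the original timing
    return aligned


# Made-up example data: two slightly early cues and three detected speech segments.
cues = [Cue(1.0, 2.5, "こんにちは"), Cue(5.0, 6.0, "ありがとう")]
speech = [(1.4, 2.9), (5.2, 6.3), (10.0, 11.0)]
aligned = snap_cues(cues, speech)
# aligned[0] now spans 1.4 to 2.9 and aligned[1] spans 5.2 to 6.3
```

Note that each cue gets its own correction here (0.4 s for the first, 0.2 s for the second), which is the difference from shifting the whole subtitle file by one constant offset. A real forced aligner would also use the subtitle text itself, which this sketch ignores.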