Speech-to-text and Speech Synthesis

@jeremy, is there any room for ‘requests’ for the remaining portion of the curriculum? Along with the wonderful world of NLP and sentiment analysis, I think ‘speech-to-text’ and ‘speech synthesis’ would bring the coverage of language full circle. I know LyreBird and WaveNet are both very impressive results obtained with neural networks. Is that already planned, or could it fit into the remaining lessons for Part 2?

I’m very interested to learn some of the tips and tricks in this area. If anybody’s done anything so far, I’m all ears.

7 Likes

Even if these don’t make it into the official curriculum, this would be interesting to discuss here. I have been pondering the idea of a speech-to-speech model that could take speech in one language and turn it into a machine-generated version of your own voice speaking the other language.

1 Like

I’m afraid not - it takes me months of research to get to a point where I have useful things to show in a lesson, so the main pieces have to be done well ahead of time. I would love to spend time on audio in the future, however, and would be interested to hear from any students who take a look at this area.

8 Likes

I started reading more about recent text-to-speech systems. In my limited exploration, I think the easiest to understand and implement is Voiceloop from Facebook: https://github.com/facebookresearch/loop. The repo is written for PyTorch 0.1, but I was able to run it on v0.4. Right now I am trying to understand how they did the preprocessing.

A few good things about Voiceloop:

  • Simple architecture
  • Ability to generate voices for different speakers
  • Ability to learn from noisy data
  • Working code

Speech data is much more complicated to work with than image or text data. A lot of preprocessing and feature engineering is required before neural nets can be applied.
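
To give a flavour of what that feature engineering can look like, here is a minimal sketch using librosa to turn a raw waveform into log-mel-spectrogram frames. This is not necessarily the exact Voiceloop pipeline (they rely on their own vocoder-feature extraction), and the file name and parameter values below are just placeholders I picked for illustration.

```python
# Minimal sketch of typical speech preprocessing: waveform -> log-mel-spectrogram frames.
# Not the exact Voiceloop pipeline; parameters and "sample.wav" are illustrative only.
import librosa
import numpy as np

# Load audio, resampled to 16 kHz mono
wav, sr = librosa.load("sample.wav", sr=16000, mono=True)

# Short-time Fourier transform -> mel filterbank -> log compression
mel = librosa.feature.melspectrogram(
    y=wav,
    sr=sr,
    n_fft=1024,        # ~64 ms analysis window at 16 kHz
    hop_length=256,    # ~16 ms frame shift
    n_mels=80,         # number of mel bands, a common choice for TTS models
)
log_mel = np.log(np.clip(mel, a_min=1e-5, a_max=None))

# Transpose to (time steps, features), the layout most sequence models expect
features = log_mel.T
print(features.shape)
```

Even this simple version involves several choices (sample rate, window size, hop length, number of mel bands) that can change the results noticeably, which is part of why speech feels harder to get started with than images or text.
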

2 Likes

Hi Saurabh,
thanks, the Loop sounds very promising; I was not aware of it!

Kind regards
Ernst