Are there any models or lines of research that take input from multiple types of data to train a model? For example, a language model that also uses pictures of the words being read, to give the model a second layer of understanding. Or text plus audio, to build speech output that doesn't sound robotic. Another would be generating a person's lip movements as they talk, given the transcript of what is being said and visuals of what their lips look like. Just curious whether these are being combined currently or whether they're pretty siloed at the moment.
You might want to read this: https://research.googleblog.com/2018/04/looking-to-listen-audio-visual-speech.html — the folks there combined data from different modalities (audio plus video of speakers' faces).
That's exactly the type of thing I was looking for. Thanks for sharing. Here's the paper behind that post, with more technical details: https://arxiv.org/pdf/1804.03619.pdf
Pretty much all the model families we've looked at can be combined in this way. Give it a try!
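To make the "combining modalities" idea concrete, here's a minimal sketch of late fusion, one common pattern in multimodal models: each modality gets its own encoder, the resulting embeddings are concatenated, and a shared layer maps the joint vector to whatever the task needs. The encoders below are toy stand-ins (random token embeddings for text, FFT magnitudes for audio), not any real model from the linked paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_text(tokens, dim=16):
    # Toy text encoder: average random per-token embeddings.
    # A real system would use a trained language model here.
    table = rng.normal(size=(1000, dim))
    return table[np.asarray(tokens) % 1000].mean(axis=0)

def encode_audio(waveform, dim=16):
    # Toy audio encoder: low-frequency FFT magnitudes as features.
    # A real system would use a trained speech encoder here.
    mags = np.abs(np.fft.rfft(waveform))[:dim]
    out = np.zeros(dim)
    out[:len(mags)] = mags
    return out

def fuse(text_vec, audio_vec, weights):
    # Late fusion: concatenate the two modality embeddings,
    # then apply one shared linear projection.
    joint = np.concatenate([text_vec, audio_vec])
    return weights @ joint

text_vec = encode_text([5, 42, 7])
audio_vec = encode_audio(np.sin(np.linspace(0, 10, 256)))
weights = rng.normal(size=(8, text_vec.size + audio_vec.size))
joint_repr = fuse(text_vec, audio_vec, weights)
print(joint_repr.shape)  # (8,)
```

The same skeleton covers your examples: text + audio encoders for natural-sounding speech, or transcript + face-video encoders for lip movement, with the fusion layer trained end to end on paired data.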