Counting unique individuals in an audio file

hiromi · January 23, 2018, 7:59pm

I am looking for a way to count how many unique voices are in a short audio file. I am not interested in what they are saying, only in identifying the unique individuals (e.g. "there are 2 unique voices in this audio"). Could anybody point me to good resources?

Thank you!

krasin · January 27, 2018, 10:15pm

I do not know the answer. It is worth noting, though, that sound is usually studied in the frequency domain (or the time-frequency domain), where each voice has its own characteristic timbre. So the input to a neural network model could be the Fourier transform (FFT or STFT) of the signal.
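As a rough illustration of the time-frequency input mentioned above, here is a minimal NumPy sketch of a magnitude STFT: the signal is cut into overlapping Hann-windowed frames and each frame is passed through an FFT. The frame length of 400 samples (25 ms at 16 kHz), the hop of 160 samples (10 ms), the helper name `stft_magnitude`, and the synthetic two-tone signal standing in for a voice are all assumptions for illustration, not values from this thread. In practice a library routine such as `scipy.signal.stft` or `librosa.stft` would do the same job.

```python
import numpy as np

def stft_magnitude(signal, frame_len=400, hop=160):
    """Magnitude STFT (hypothetical helper): split the signal into
    overlapping Hann-windowed frames and take the FFT of each one."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequencies: frame_len // 2 + 1 bins
    return np.abs(np.fft.rfft(frames, axis=1))

# Synthetic stand-in for one second of 16 kHz audio: a 200 Hz tone
# plus a weaker 400 Hz harmonic (not a real voice recording)
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
spec = stft_magnitude(x)
print(spec.shape)  # -> (98, 201): 98 time frames x 201 frequency bins
```

Each row of `spec` is one short time slice and each column one frequency bin (40 Hz apart with these settings), so the 200 Hz fundamental shows up as a strong column at bin 5. A 2-D array like this is what a convolutional or recurrent model would typically consume.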

hiromi · January 29, 2018, 3:05pm

Thanks for the input! I will report back on what I find.

jeremy · January 29, 2018, 4:52pm

Here’s a paper that may be helpful: http://research.baidu.com/deep-speaker-end-end-system-large-scale-speaker-recognition/
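The Deep Speaker paper learns a fixed-length embedding per utterance, compared with cosine similarity. One way such embeddings could be turned into a voice count (my assumption, not something the paper prescribes) is to embed short segments of the file, cluster the embeddings, and count the clusters. The sketch below uses average-linkage agglomerative clustering on cosine similarity. The helper `count_speakers`, the 0.7 threshold, the 64-dimensional size, and the random "embeddings" are all hypothetical stand-ins for what a trained model would produce on real audio.

```python
import numpy as np

def count_speakers(embeddings, threshold=0.7):
    """Hypothetical helper: average-linkage agglomerative clustering on
    cosine similarity. Repeatedly merges the two most similar clusters
    until no pair is above `threshold`, then returns the cluster count."""
    # L2-normalise so that dot products are cosine similarities
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = emb @ emb.T
    clusters = [[i] for i in range(len(emb))]
    while len(clusters) > 1:
        best, best_pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average similarity over every cross-cluster pair of segments
                s = sim[np.ix_(clusters[a], clusters[b])].mean()
                if s > best:
                    best, best_pair = s, (a, b)
        if best < threshold:
            break  # remaining clusters are too dissimilar: treat as distinct voices
        a, b = best_pair
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return len(clusters)

# Fake data: 2 "speakers", 5 noisy segment embeddings each
rng = np.random.default_rng(0)
centres = rng.normal(size=(2, 64))
segments = np.concatenate([c + 0.1 * rng.normal(size=(5, 64)) for c in centres])
print(count_speakers(segments))  # -> 2
```

With real embeddings the threshold would need tuning on labelled recordings, and segments with overlapping speech or silence would muddy the clusters, so treat this as a starting point rather than a finished method.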

hiromi · January 29, 2018, 5:46pm

Wonderful! Thank you so much! I will study the paper right away

jeremy · January 29, 2018, 8:09pm

I also came across this today: https://www.youtube.com/watch?v=13NVgk3N6Uo
This is the paper: https://www.merl.com/publications/docs/TR2016-003.pdf

hiromi · January 29, 2018, 8:21pm

Thank you for keeping an eye out for me! I added them to my list

krasin · January 30, 2018, 6:52am

What about fine-tuning an existing audio model like WaveNet https://arxiv.org/abs/1609.03499 (https://deepmind.com/blog/wavenet-generative-model-raw-audio/)? Another topic for research. Just brainstorming.

hiromi · January 30, 2018, 2:30pm

This is great! I am so happy to have all the resources and the community!!