Counting unique individuals in an audio file

I am looking for a way to count how many unique voices are in a short audio file. I am not interested in what they are saying, only in identifying unique individuals (e.g. "there are 2 unique voices in this audio"). Could anybody point me to good resources?

Thank you!

I do not know the answer, but it is worth noting that sound is usually studied in the frequency domain (or the time-frequency domain), since each voice has a specific timbre. So the input to the neural network model can be a Fourier transform of the signal (FFT or STFT).
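
To make the time-frequency idea concrete, here is a minimal NumPy sketch of a magnitude STFT. The function name `stft_magnitude`, the frame length, the hop size, and the synthetic 440 Hz tone standing in for real speech are all my own assumptions for illustration, not something from a particular library or paper.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Return a (n_frames, frame_len // 2 + 1) magnitude spectrogram."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and taper each with a Hann window.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One real-input FFT per frame; keep only the magnitudes.
    return np.abs(np.fft.rfft(frames, axis=1))

# Synthetic stand-in for audio: a 440 Hz tone, one second at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

spec = stft_magnitude(tone)
# Frequency bin with the most energy, averaged over all frames.
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 256
```

Each row of `spec` is the spectrum of one short slice of the signal, so the whole array can be fed to a model as a 2D "image" of the audio. For real recordings you would load the waveform from the file instead of generating a tone.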

Thanks for the input! I will report back on what I find :slight_smile:

Here’s a paper that may be helpful http://research.baidu.com/deep-speaker-end-end-system-large-scale-speaker-recognition/
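
If a speaker recognition model like that gives you one embedding vector per short audio segment, one possible way to turn embeddings into a count is to cluster them. The sketch below is only an illustration under that assumption: `count_speakers`, the 0.7 cosine threshold, and the synthetic 16-dimensional "embeddings" are hypothetical choices of mine and are not taken from the paper.

```python
import numpy as np

def count_speakers(embeddings, threshold=0.7):
    """Greedy clustering: a segment joins the most similar existing
    speaker if the cosine similarity is at least `threshold`,
    otherwise it starts a new speaker."""
    centroids = []  # running sum of unit-length embeddings per speaker
    for emb in embeddings:
        emb = emb / np.linalg.norm(emb)
        sims = [float(c @ emb) / np.linalg.norm(c) for c in centroids]
        if sims and max(sims) >= threshold:
            centroids[int(np.argmax(sims))] += emb
        else:
            centroids.append(emb.copy())
    return len(centroids)

# Synthetic stand-ins for real embeddings: two "voices" plus small noise.
rng = np.random.default_rng(0)
voice_a = np.eye(16)[0]
voice_b = np.eye(16)[1]
segments = [
    v + 0.05 * rng.normal(size=16)
    for v in (voice_a, voice_b, voice_a, voice_b, voice_a)
]
n_speakers = count_speakers(segments)
```

With real embeddings the threshold would need tuning on labelled data, and overlapping speech would need separate handling.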

Wonderful! Thank you so much! I will study the paper right away :slight_smile:

I also came across this today: https://www.youtube.com/watch?v=13NVgk3N6Uo
This is the paper: https://www.merl.com/publications/docs/TR2016-003.pdf

Thank you for keeping an eye out for me! I added them to my list :grinning:

What about fine-tuning an existing audio model like WaveNet https://arxiv.org/abs/1609.03499 (https://deepmind.com/blog/wavenet-generative-model-raw-audio/)? Another topic for research :slight_smile: Just brainstorming.

This is great! I am so happy to have all the resources and the community!!