Counting unique individuals in an audio file


(Hiromi Suenaga) #1

I am looking for a way to count how many unique voices are in a short audio file. I am not interested in what they are saying but just identify unique individuals (e.g. there are 2 unique voices in this audio). Could anybody point me to good resources?

Thank you!


#2

I do not know the answer, but it is worth noting that sound is usually studied in the frequency domain (or the time-frequency domain), since each voice has a specific timbre. So the input to a neural network model could be the Fourier transform (FFT, or the short-time Fourier transform, STFT) of the signal.
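As a rough illustration of that kind of input, here is a minimal NumPy sketch of a magnitude STFT (spectrogram). It cuts the signal into overlapping frames, applies a Hann window, and takes the FFT of each frame. The frame length of 400 samples and hop of 160 samples (25 ms / 10 ms at an assumed 16 kHz sample rate) are common choices, not values taken from this thread. The test signal is a synthetic 440 Hz tone, not real speech.

```python
import numpy as np

def stft_magnitude(signal, frame_len=400, hop=160):
    """Magnitude STFT: Hann-windowed overlapping frames -> |FFT| per frame.

    Returns an array of shape (num_frames, frame_len // 2 + 1).
    frame_len and hop are assumed defaults (25 ms / 10 ms at 16 kHz).
    """
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        # Zero-pad very short signals so at least one frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    num_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(num_frames)]
    )
    # rfft keeps only the non-negative frequency bins of a real signal.
    return np.abs(np.fft.rfft(frames, axis=1))

# Synthetic example: one second of a 440 Hz sine at an assumed 16 kHz rate.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
spec = stft_magnitude(tone)
print(spec.shape)  # (98, 201)
```

Each row is one time frame and each column one frequency bin (bin spacing is sr / frame_len = 40 Hz here, so the 440 Hz tone peaks in bin 11). A 2-D array like this is what a convolutional or recurrent speaker model would typically consume.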


(Hiromi Suenaga) #3

Thanks for the input! I will report back on what I find :slight_smile:


(Jeremy Howard) #4

Here’s a paper that may be helpful http://research.baidu.com/deep-speaker-end-end-system-large-scale-speaker-recognition/
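The Deep Speaker approach maps each utterance to a fixed-length speaker embedding, where embeddings from the same speaker have high cosine similarity. Assuming you have such a model and can produce one embedding per short segment of the file, one simple way to estimate the number of unique voices is to cluster those embeddings and count the clusters. The sketch below uses a basic greedy "leader" clustering with a cosine-similarity threshold. The `count_speakers` function, the 0.75 threshold, and the fake random embeddings are all hypothetical placeholders for illustration, not part of the paper.

```python
import numpy as np

def count_speakers(embeddings, threshold=0.75):
    """Estimate speaker count by greedy cosine-similarity clustering.

    Each L2-normalised embedding joins the most similar existing cluster
    if its cosine similarity to that cluster's centroid is >= threshold;
    otherwise it starts a new cluster. threshold is a hypothetical value
    that would need tuning on real data.
    """
    centroids = []  # running sums of the unit-length member embeddings
    for e in np.asarray(embeddings, dtype=np.float64):
        e = e / np.linalg.norm(e)
        if centroids:
            sims = [float(c @ e) / np.linalg.norm(c) for c in centroids]
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                centroids[best] += e
                continue
        centroids.append(e.copy())
    return len(centroids)

# Fake data: five segment embeddings drawn around two random "speaker" vectors.
rng = np.random.default_rng(0)
speakers = rng.normal(size=(2, 64))
segments = speakers[[0, 1, 0, 0, 1]] + 0.05 * rng.normal(size=(5, 64))
print(count_speakers(segments))  # 2
```

Greedy clustering depends on segment order and on the threshold; agglomerative or spectral clustering over all pairwise similarities is a more robust alternative if the simple version proves unreliable.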


(Hiromi Suenaga) #5

Wonderful! Thank you so much! I will study the paper right away :slight_smile:


(Jeremy Howard) #6

I also came across this today: https://www.youtube.com/watch?v=13NVgk3N6Uo
This is the paper: https://www.merl.com/publications/docs/TR2016-003.pdf


(Hiromi Suenaga) #7

Thank you for keeping an eye out for me! I added them to my list :grinning:


#8

What about fine-tuning an existing audio model like WaveNet (paper: https://arxiv.org/abs/1609.03499, blog post: https://deepmind.com/blog/wavenet-generative-model-raw-audio/)? Another topic for research :slight_smile: Just brainstorming.


(Hiromi Suenaga) #9

This is great! I am so happy to have all the resources and the community!!