Counting unique individuals in an audio file

I am looking for a way to count how many unique voices are in a short audio file. I am not interested in what they are saying, only in identifying unique individuals (e.g. "there are 2 unique voices in this audio"). Could anybody point me to good resources?

Thank you!


I do not know the answer. It is worth noting that sound is usually studied in the frequency domain (or the time-frequency domain), since each voice has a specific timbre. So the input to the neural network model can be a Fourier transform (FFT, STFT) of the signal.
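
To make the STFT idea a bit more concrete, here is a rough sketch of turning a waveform into a magnitude spectrogram with plain NumPy. The function name `stft_magnitude`, the frame length, the hop size, and the synthetic 440 Hz test tone are all my own assumptions for illustration. A real recording would have to be loaded from a file first, and this only produces features; it does not count speakers by itself.

```python
import numpy as np

def stft_magnitude(signal, frame_len=512, hop=128):
    """Return a (num_frames, frame_len // 2 + 1) magnitude spectrogram.

    Hypothetical helper: frame_len and hop are illustrative defaults,
    not values recommended anywhere in this thread.
    """
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    num_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    # Real-input FFT of every frame; keep only the magnitudes.
    return np.abs(np.fft.rfft(frames, axis=1))

# Synthetic stand-in for a recording: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440.0 * t)

spec = stft_magnitude(signal)

# Frequency (in Hz) of each spectrogram column, and the strongest one on average.
freqs = np.fft.rfftfreq(512, d=1.0 / sample_rate)
peak_hz = freqs[spec.mean(axis=0).argmax()]
```

Each row of `spec` is one short time slice and each column is one frequency bin, so an array like this (or a log-scaled version of it) is the kind of input a model could be trained on.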


Thanks for the input! I will report back on what I find :slight_smile:

Here’s a paper that may be helpful


Wonderful! Thank you so much! I will study the paper right away :slight_smile:

I also came across this today:
This is the paper:


Thank you for keeping an eye out for me! I added them to my list :grinning:

What about fine-tuning an existing audio model like WaveNet? Another topic for research :slight_smile: Just brainstorming.

This is great! I am so happy to have all the resources and the community!!