Feature extraction from audio signals

pbprateek · February 12, 2019, 6:56am

I don’t know where to start, as I have never used audio in any of my projects. I need to finish a project that requires feature extraction from audio signals. The features to be extracted are:

duration: Total duration of the audio
energy: Energy of the audio signal
power: Power of the audio signal
min_pitch: Minimum Pitch
max_pitch: Maximum Pitch
mean_pitch: Mean Pitch
IntensityMin: Minimum Intensity of the audio signal
intensityMax: Maximum Intensity of the audio signal
intensityMean: Mean Intensity of the audio signal
jitter: Jitter is the variation in the periodicity of a signal
shimmer: Shimmer is used as one of the measures for the micro-instability of vocal cord vibrations
jitterRap: Jitter Relative Average Perturbation
numVoiceBreaks: number of voice breaks
PercentBreaks: percentage of voice breaks
speakRate: Rate of speaking
numPause: Number of Pauses
maxDurPause: Maximum Duration of Pauses
avgDurPause: Average Duration of Pauses
TotDurPause: Total Duration of Pauses
MaxRising: Rising of Voice
MaxFalling: Falling of voice
AvgToRise : Average Rise
AvgToFall: Average Fall
numRising: Number of Rises
numFall: Number of Falls

Could you guide me a little? How can I proceed with this? Thanks.

aminecherif94 · April 4, 2019, 1:51pm

You can use an RNN with LSTM or GRU cells to extract all the features you need without performing any manual feature engineering. Take the audio files as they are and build a spectrogram from each one. Then apply 1D convolution to the spectrograms to extract features, and feed those features to an RNN with either LSTM or GRU cells. The model will train faster if you have access to a GPU. I hope this helps. If you have further questions, please don’t hesitate to ask.
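
To make the spectrogram step concrete, here is a minimal sketch that uses only the Python standard library and a synthetic 1 kHz tone in place of a real recording. The function names, the frame length of 64 samples and the hop of 32 samples are illustrative choices of mine, not anything fixed by the approach. In practice you would more likely call a library routine for this, and the 1D convolution and LSTM/GRU layers would come from a deep learning framework, which is not shown here.

```python
import cmath
import math

SAMPLE_RATE = 8000   # assumed sample rate in Hz (illustrative)
FRAME_LEN = 64       # samples per analysis frame (illustrative)
HOP = 32             # samples between frame starts (illustrative)


def hann(n):
    """Hann window of length n; tapers frame edges to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(samples, frame_len=FRAME_LEN, hop=HOP):
    """Magnitude spectrogram: one list of bin magnitudes per frame.

    Uses a direct O(N^2) DFT per frame, which is fine for a short demo
    but far too slow for real recordings (use an FFT routine there).
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1  # keep only non-negative frequencies
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(n_bins):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Synthetic input: 0.1 s of a 1 kHz sine wave standing in for an audio file.
samples = [math.sin(2 * math.pi * 1000.0 * t / SAMPLE_RATE) for t in range(800)]

spec = spectrogram(samples)

# The strongest bin in the first frame should sit at the tone's frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
print(len(spec), "frames x", len(spec[0]), "bins; peak at", peak_hz, "Hz")
```

Each row of `spec` is one time step, so the list of rows is the sequence you would pass on to the convolution and recurrent layers.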