Feature extraction from audio signals

I don’t know where to start; I have never used audio in any of my projects. I need to finish a project that requires feature extraction from audio signals. The features to be extracted are:

  • duration: Total duration of the audio
  • energy: Energy of the audio signal
  • power: Power of the audio signal
  • min_pitch: Minimum Pitch
  • max_pitch: Maximum Pitch
  • mean_pitch: Mean Pitch
  • intensityMin: Minimum Intensity of the audio signal
  • intensityMax: Maximum Intensity of the audio signal
  • intensityMean: Mean Intensity of the audio signal
  • jitter: Jitter is the variation in the periodicity of a signal
  • shimmer: Shimmer is used as one of the measures for the micro-instability of vocal cord vibrations
  • jitterRap: Jitter Relative Average Perturbation
  • numVoiceBreaks: number of voice breaks
  • PercentBreaks: percentage of voice breaks
  • speakRate: Rate of speaking
  • numPause: Number of Pauses
  • maxDurPause: Maximum Duration of Pauses
  • avgDurPause: Average Duration of Pauses
  • TotDurPause: Total Duration of Pauses
  • MaxRising: Rising of voice
  • MaxFalling: Falling of voice
  • AvgToRise: Average Rise
  • AvgToFall: Average Fall
  • numRising: Number of Rises
  • numFalling: Number of Falls
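
For orientation, a few of the simpler features in this list can be computed directly from the raw samples. The following is only a minimal sketch using the Python standard library, under stated assumptions: energy is taken as the sum of squared samples and power as the mean squared sample (other tools scale these differently), and pauses are detected with a simple frame-RMS threshold. The function names and the values of `frame_s`, `threshold` and `min_pause_s` are illustrative choices, not part of the original question. Pitch, jitter and shimmer need a proper pitch tracker and are not covered here.

```python
import math


def basic_features(samples, sample_rate):
    """Duration (seconds), energy and average power of a mono signal."""
    n = len(samples)
    # Assumed definition: discrete-time energy = sum of squared samples.
    energy = sum(x * x for x in samples)
    return {
        "duration": n / sample_rate,
        "energy": energy,
        # Assumed definition: power = mean squared amplitude.
        "power": energy / n if n else 0.0,
    }


def pause_features(samples, sample_rate, frame_s=0.025,
                   threshold=0.01, min_pause_s=0.2):
    """Pause statistics from frame-level RMS (thresholds are illustrative)."""
    frame_len = max(1, int(frame_s * sample_rate))
    pauses = []  # duration in seconds of each detected pause
    run = 0      # length in samples of the current quiet stretch
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms < threshold:
            run += len(frame)
        else:
            if run / sample_rate >= min_pause_s:
                pauses.append(run / sample_rate)
            run = 0
    # Close a quiet stretch that lasts until the end of the signal.
    if run / sample_rate >= min_pause_s:
        pauses.append(run / sample_rate)
    total = sum(pauses)
    return {
        "numPause": len(pauses),
        "maxDurPause": max(pauses, default=0.0),
        "avgDurPause": total / len(pauses) if pauses else 0.0,
        "TotDurPause": total,
    }


# Synthetic test signal: 1 s tone, 0.5 s silence, 1 s tone at 8 kHz.
sr = 8000
tone = [0.5 * math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]
signal = tone + [0.0] * (sr // 2) + tone

basic = basic_features(signal, sr)
pauses = pause_features(signal, sr)
print(basic)
print(pauses)
```

With a real recording, `samples` would be the decoded waveform scaled to the range -1 to 1, and the threshold would have to be tuned to the recording's noise floor.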

Please guide me a little. How can I proceed with this? Thanks.

You can use an RNN with LSTM or GRU cells to learn the features you need without any manual feature engineering. Take the audio files as they are and compute a spectrogram for each one. Then apply 1D convolutions over the spectrograms to extract features, and feed those features to an RNN with either LSTM or GRU cells. Access to a GPU will make training much faster. I hope this helps. If you have further questions, please don’t hesitate to ask.
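
The pipeline described above (spectrogram, then 1D convolution, then a recurrent layer) could look roughly like this in PyTorch. This is a minimal sketch, not a tuned model: the class name `SpectrogramGRU`, the layer sizes, and the final linear head with `n_outputs` values are all assumptions made for illustration, since the thread does not say what the network should predict.

```python
import torch
import torch.nn as nn


class SpectrogramGRU(nn.Module):
    """Hypothetical model: log-spectrogram -> Conv1d over time -> GRU -> linear head."""

    def __init__(self, n_fft=256, hop_length=128, conv_channels=32,
                 hidden_size=64, n_outputs=4):
        super().__init__()
        self.n_fft = n_fft
        self.hop_length = hop_length
        # Analysis window stored as a buffer so it moves with the model to the GPU.
        self.register_buffer("window", torch.hann_window(n_fft))
        n_bins = n_fft // 2 + 1
        # Frequency bins act as input channels; the convolution slides along time.
        self.conv = nn.Conv1d(n_bins, conv_channels, kernel_size=5, padding=2)
        self.gru = nn.GRU(conv_channels, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, n_outputs)

    def forward(self, waveform):
        # waveform: (batch, samples)
        spec = torch.stft(waveform, self.n_fft, hop_length=self.hop_length,
                          window=self.window, return_complex=True)
        log_mag = torch.log1p(spec.abs())           # (batch, n_bins, frames)
        feats = torch.relu(self.conv(log_mag))      # (batch, channels, frames)
        _, last_hidden = self.gru(feats.transpose(1, 2))
        return self.head(last_hidden[-1])           # (batch, n_outputs)


# One second of random noise at 16 kHz stands in for two real recordings.
model = SpectrogramGRU()
batch = torch.randn(2, 16000)
with torch.no_grad():
    out = model(batch)
print(out.shape)
```

Swapping `nn.GRU` for `nn.LSTM` only changes how the final state is unpacked, because the LSTM returns a `(hidden, cell)` tuple. Calling `model.to("cuda")` and moving the batch to the same device uses a GPU when one is available.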