Deep Learning with Audio Thread

Roughly: subclass AudioList and override the “open” method to support the numpy file format.
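A rough, untested sketch of that idea, assuming AudioList and AudioItem are the classes exported from the fastai-audio notebooks and follow the usual fastai v1 ItemList pattern (check the real AudioItem constructor before relying on this):

import numpy as np
import torch
from exp.nb_DataBlock import AudioList, AudioItem  # assumed location of the classes

class NumpyAudioList(AudioList):
    "AudioList that can also open signals saved as .npy files."
    def open(self, fn):
        if str(fn).endswith('.npy'):
            sig = torch.from_numpy(np.load(fn)).float()  # raw signal stored as a numpy array
            return AudioItem(sig)   # build the item; pass sample rate etc. as the class actually requires
        return super().open(fn)     # fall back to the normal wav/mp3 handling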

2 Likes

You can also pass numpy arrays into many librosa functions; you just need to provide the sample rate as well. So you could compute spectrograms on your array, then use those with the other code used for audio here. You most likely don’t want a mel spectrogram, though. The mel scaling is based on the properties of human hearing, and there is no reason to think those are useful in your case. But you should be able to apply many of the techniques here to non-mel-weighted spectrograms, and librosa can produce those, although you may then have a much larger amount of frequency data to deal with. You’ll thus probably want to set the minimum and maximum frequencies of your spectrograms to something appropriate to ECG.
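For example (a minimal, untested sketch; the sample rate and the 0.5 to 40 Hz band are placeholders, pick values appropriate to your ECG data):

import numpy as np
import librosa

sr = 500                          # placeholder: 500 Hz ECG sampling rate
sig = np.random.randn(10 * sr)    # stand-in for one channel of your recording

stft = librosa.stft(sig, n_fft=256, hop_length=64)            # plain (non-mel) complex STFT
spec_db = librosa.amplitude_to_db(np.abs(stft), ref=np.max)   # log-magnitude spectrogram

freqs = librosa.fft_frequencies(sr=sr, n_fft=256)             # centre frequency of each bin
band = (freqs >= 0.5) & (freqs <= 40.0)                       # keep only the band of interest
spec_db = spec_db[band]                                       # rows are frequency bins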
Another issue you are likely to find is the problem of dealing with the various channels of ECG data. That’s quite different from audio, where generally only one channel is used, or at most the two channels of stereo files.
So you may find that much of the stuff here doesn’t necessarily help. I’d imagine you can still use the basic idea of converting the time-series data to the frequency domain (spectrograms), then creating an image from that and using existing image models. Perhaps create a spectrogram per channel, then concatenate the resulting arrays along the height axis to create a single image: something like your sample image, but with those time-domain line plots replaced by spectrograms. You could also use each ECG channel as a separate channel in the image, but pre-trained image models want 3 image channels, so you’d have to deal with that. I think Jeremy talked about this in one of the Part 1 lectures when considering a dataset with 4 channels, maybe satellite data. He suggested some things you might apply to extend pre-trained 3-channel image models to the 12 you have (or however many; there are 12 in that image).
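Roughly something like this (untested sketch; recording is assumed to be shaped (channels, samples) and every parameter is a placeholder):

import numpy as np
import librosa

def channels_to_image(recording, n_fft=256, hop_length=64):
    specs = []
    for ch in recording:  # one spectrogram per ECG channel
        s = np.abs(librosa.stft(ch, n_fft=n_fft, hop_length=hop_length))
        specs.append(librosa.amplitude_to_db(s, ref=np.max))
    img = np.concatenate(specs, axis=0)   # stack along the height (frequency) axis
    return np.stack([img, img, img])      # naive 3-channel copy so pretrained image models accept it

fake = np.random.randn(12, 10 * 500)      # 12 leads, 10 s at 500 Hz (made-up numbers)
image = channels_to_image(fake)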
I’d also be aware that the multi-channel nature might make a lot of the processing here not applicable. I’d imagine, for instance, that you would want to take the multiple channels into account when normalising your data. Similarly, applying augmentations without taking the multi-channel nature into account may not work well, though on the other hand it might be fine.

3 Likes

Please check out https://github.com/StatisticDean/Crystal_Clear, thanks to StatisticDean. I think this is an implementation of the idea from lesson 7.

1 Like

Wiki overhaul completed!

1 Like

Hey, sorry for the delay in replying to this. I find your article really fascinating. I think the best topics to write on are the ones you’re interested in and working on, so I’d say just write up whatever you think is coolest and share it! But one thing that is currently missing is a thorough intro to audio. It’s a bit overwhelming coming in and seeing spectrograms, mel spectrograms and log-mel spectrograms, not to mention all the various feature extractions like MFCC, and all the params they take: min/max frequency (different for speech vs. music/sound), hop_length, n_fft, number of mel bins, etc. Fine-tuning those parameters for your application and clip length does appear to matter, and it’s possibly intractable to just provide all “audio” users with good defaults. One option is to instead organise defaults around the application, so passing in “speech” would use a log-mel spectrogram with the frequency range of human speech, and so on.
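Something like this toy sketch of what I mean by defaults keyed on application (the numbers are purely illustrative, not tuned recommendations):

SPEC_DEFAULTS = {
    "speech": dict(n_fft=512, hop_length=128, n_mels=64, fmin=50, fmax=8000),
    "music": dict(n_fft=2048, hop_length=512, n_mels=128, fmin=20, fmax=16000),
    "bioacoustics": dict(n_fft=1024, hop_length=256, n_mels=128, fmin=100, fmax=20000),
}

def spectrogram_params(application="speech", **overrides):
    "Return the preset for an application, with any user overrides applied on top."
    params = dict(SPEC_DEFAULTS[application])
    params.update(overrides)
    return params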

Anyways I’m getting off track, but I think a very comprehensive intro to audio could be really useful. Something in the fastai style, top-down and ready to use on several problems, with lots of detail and stuff a new user won’t understand on the first pass, but will gradually come to understand with time. This is something I’d like to work on but could definitely see it being a collaborative effort between various regular posters in the thread. You seem to have more domain expertise so let me know if it’s something you’d be interested in helping with (even if that means just reading over it and correcting mistakes/misconceptions we have). Cheers.

4 Likes

Love the idea of “defaults by application” like speech, music, etc.

Btw: Fast.ai audio should be part of version 1.2!

https://forums.fast.ai/t/fastai-v1-2-plans-based-on-part2-v3-notebooks/45367/4?u=ste

3 Likes

Hi all,

I’ve been using fast.ai for language classification from audio signals using mel-spectrograms as my inputs. It works well as long as all of my audio data from different languages is from the same dataset (therefore in the same format). When I train on language A from dataset 1, and validate on language A from dataset 2, the accuracy for that language drops significantly, and instead the network guesses that a lot of the signals from language A are a different language from dataset 2.

This leads me to believe the network is first picking up on features that determine which dataset the audio came from, and then classifying the language as a secondary feature.

I tried reformatting the data from each dataset so that it would contain the same amount of information. Dataset 1 (VoxForge) consisted of wav files and dataset 2 (Mozilla Common Voice) consisted of mp3 data, so I converted dataset 1 to mp3 with the same sampling rate and bits per sample as dataset 2.

I’m thinking there must be some other encoding artifact from the datasets that is throwing my network off. Has anyone else had issues with “normalizing” data across datasets, or tips on converting encodings so that the encoded information is of equivalent quality?
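For anyone trying something similar, here is a rough (untested) sketch of forcing everything onto one sample rate and standardising each example independently, so loudness and encoding-level differences are harder for the network to exploit; the target rate is a placeholder:

import numpy as np
import librosa

TARGET_SR = 16000  # placeholder target sample rate applied to both datasets

def load_and_normalise(path):
    sig, _ = librosa.load(path, sr=TARGET_SR, mono=True)        # resample + downmix on load
    spec = librosa.feature.melspectrogram(y=sig, sr=TARGET_SR)
    spec_db = librosa.power_to_db(spec, ref=np.max)
    return (spec_db - spec_db.mean()) / (spec_db.std() + 1e-6)  # per-example standardisation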

2 Likes

bioacoustics defaults too, please!

P.S. What does the forum post say? I cannot see it :slightly_frowning_face:

2 Likes

New conference presentation on using CNNs for bioacoustic audio classification. The authors reported the augmentations they used, which is good info to consider for fastai audio (a rough code sketch of these augmentations follows the results below):

  1. " changed the amplitude of the spectrograms in range −6 dB to +3 dB"

Result: weak calls detected better.

  1. The pitch was changed by a factor in the range of [0.5, 1.5]
    and the length of the signal was stretch by a factor between
    [0.5, 2].

Result: better generalization with new data

  1. added characteristic noise

Result: ???

Overall result on the call classification task: mean test accuracy of 87% after training for 72 epochs.
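A rough, untested sketch of what those augmentations might look like on a raw waveform with librosa (the ranges follow the paper; mapping the pitch factor to semitones and using white noise as a stand-in for “characteristic noise” are my own guesses):

import numpy as np
import librosa

def augment(y, sr):
    # 1. amplitude change between -6 dB and +3 dB
    gain_db = np.random.uniform(-6.0, 3.0)
    y = y * (10.0 ** (gain_db / 20.0))

    # 2. pitch change by a factor in [0.5, 1.5] (converted to semitones)
    #    and time stretch by a factor in [0.5, 2]
    factor = np.random.uniform(0.5, 1.5)
    y = librosa.effects.pitch_shift(y=y, sr=sr, n_steps=12 * np.log2(factor))
    y = librosa.effects.time_stretch(y=y, rate=np.random.uniform(0.5, 2.0))

    # 3. additive noise (plain white noise here as a simple stand-in)
    return y + 0.005 * np.random.randn(len(y))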

1 Like

The post just says that Jeremy wants to include the audio library in the next fastai library release :slight_smile: In the next month or two.

5 Likes

Sorry if it’s a dumb question, but has anybody faced issues with _torch_sox?

I am not able to import _torch_sox.

Are you using the fastai audio module or trying to import directly? Can you post a full error message?

Trying to use the fastai audio module; maybe the error below is the problem.

Running on RHEL 3.10, torch version 1.0.

sh-4.2$ python setup.py install
running install
running bdist_egg
running egg_info
writing torchaudio.egg-info/PKG-INFO
writing dependency_links to torchaudio.egg-info/dependency_links.txt
writing requirements to torchaudio.egg-info/requires.txt
writing top-level names to torchaudio.egg-info/top_level.txt
reading manifest file ‘torchaudio.egg-info/SOURCES.txt’
writing manifest file ‘torchaudio.egg-info/SOURCES.txt’
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
error: [Errno 2] No such file or directory: ‘which’: ‘which’

Please suggest if you have seen this before.

I’m wondering if the behavior related to the unknown class is a function of using softmax.

During Part 2 it was mentioned that softmax always wants to elevate one class. In a dataset that doesn’t always conform to its labels, or where more than one object might be present, a binomial loss function might be better.

Just a guess
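A tiny PyTorch illustration of the difference (toy numbers, just to contrast the two losses):

import torch
import torch.nn as nn

logits = torch.tensor([[0.1, 0.2, 0.15]])   # weak evidence for every class

softmax_probs = logits.softmax(dim=1)       # forced to sum to 1, so one class gets elevated
sigmoid_probs = torch.sigmoid(logits)       # each class judged independently, all can stay low

ce_loss = nn.CrossEntropyLoss()(logits, torch.tensor([0]))     # needs exactly one "true" class
bce_loss = nn.BCEWithLogitsLoss()(logits, torch.zeros(1, 3))   # allows none (or several) present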

I’ve made a post related to Siamese Networks but I’m using it for Audio related stuff.

3 Likes

Are you running the install.sh file?

Can anybody please help me set up the fastai audio module in Google Colab? I am new to this field and have already tried !wget to get the files from GitHub and also running install.sh, but it is throwing an error.

Can you attach the error log?

1 Like

I’ve made a colab setting up the audio module:

https://colab.research.google.com/drive/1s0Ouw5PxvrmHdm_gBU0qiA6piOf3VSWO

Might want to pin this for others @MadeUpMasters

3 Likes

@baz I am getting this error when running in Colab.

ImportError Traceback (most recent call last)

<ipython-input-5-310b7e31ad9a> in <module>()
      1 from exp.nb_AudioCommon import *
----> 2 from exp.nb_DataBlock import *
      3 import matplotlib.pyplot as plt
      4 import torch
      5 from fastai import *

/content/fastai-audio/exp/nb_DataBlock.py in <module>()
     15 from IPython.display import Audio
     16 import torchaudio
---> 17 from torchaudio import transforms
     18
     19 class AudioItem(ItemBase):

ImportError: cannot import name ‘transforms’