Deep Learning with Audio Thread

Hey! Can you invite me to this one? :slight_smile:

Hey, has anyone faced problems importing librosa on GCP?

I don’t know why I’m unable to import it. I reinstalled it with pip install librosa, but that doesn’t help.


ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input> in <module>
      1 from fastai.text import *
----> 2 from audio import *

~/audio/__init__.py in <module>
      1 from .audio import *
----> 2 from .data import *
      3 from .learner import *
      4 from .transform import *

~/audio/data.py in <module>
      1 from .audio import *
----> 2 from .transform import *
      3 from pathlib import Path as PosixPath
      4 from IPython.core.debugger import set_trace
      5 import os

~/audio/transform.py in <module>
      9 import torch
     10 import torch.nn.functional as F
---> 11 import librosa
     12 import torchaudio
     13 from librosa.effects import split

ModuleNotFoundError: No module named 'librosa'

2 Likes

Is this v1 or v2? We are trying to straighten out our environments in v2 right now. If it’s v1, I’ve had no reported problems and I use GCP as well.

2 Likes

v1 - working now. I randomly added ‘sudo’ before chmod. Don’t know why this should work, though.
!git clone https://github.com/mogwai/fastai_audio > /dev/null 2>&1
!cd fastai_audio/ && sudo chmod +x install.sh > /dev/null 2>&1

Don’t we have any way to save and load an AudioList databunch?
I tried load_data, and it throws a warning:

/usr/local/lib/python3.6/dist-packages/fastai/basic_data.py:262: UserWarning: There seems to be something wrong with your dataset, for example, in the first batch can't access any element of self.train_ds. Tried: 504,23312,27618,28118,21732...

and an error when fetching the learner.
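
For reference, this is the generic fastai v1 save/load pattern I was trying (the file name and batch size here are just examples); whether it round-trips the AudioList transforms correctly is exactly the open question:

```python
from fastai.basic_data import load_data

# `data` is the AudioList databunch built earlier in the notebook
data.save('audio_databunch.pkl')                           # writes to data.path
data = load_data(data.path, 'audio_databunch.pkl', bs=64)  # reload it later
```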

This looks very interesting for DL with audio data:

5 Likes

@MicPie WOW AWESOME!

cheers mrfabulous1 :smiley::smiley::smiley:

I’d love to work differentiable signal processing into fastai audio, but unfortunately this is built on TensorFlow and I doubt we’ll see a PyTorch implementation in time to use it. We may implement some related things, such as learned filterbanks (as opposed to fixed frequency ranges in a linear or mel spectrogram).
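
Roughly, the idea would be something like this; a minimal sketch of a trainable filterbank, not anything from the library, and all names here are illustrative:

```python
import torch
import torch.nn as nn

class LearnedFilterbank(nn.Module):
    """Replace a fixed mel filterbank with a trainable projection of FFT bins."""
    def __init__(self, n_fft=1024, n_filters=128):
        super().__init__()
        # one weight per (filter, FFT bin), like a mel matrix but learned
        self.fb = nn.Linear(n_fft // 2 + 1, n_filters, bias=False)

    def forward(self, spec):                  # spec: (batch, freq_bins, time)
        out = self.fb(spec.transpose(1, 2))   # mix FFT bins into learned bands
        return out.transpose(1, 2).clamp(min=1e-10).log()  # log-compress

fb = LearnedFilterbank()
power_spec = torch.rand(4, 513, 100)          # e.g. |STFT|^2 with n_fft=1024
features = fb(power_spec)                     # -> (4, 128, 100)
```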

I’ve sorted out the Colab notebook for people interested in a quick way to get started with v1 of the library. We will create one of these for fastai2_audio as well.

https://colab.research.google.com/drive/1HUVI1CZ-CThHUBO8l2lp6hySjrbs0SY-

3 Likes

I was able to successfully convert my fast.ai model to Caffe2 for deployment following this tutorial https://pytorch.org/tutorials/advanced/super_resolution_with_caffe2.html.

I now realize I need to convert my preprocessing steps as well. Currently I’m generating spectrograms on the fly as part of an Audio Databunch.

Has anyone had experience with stripping the fast.ai dependencies to be able to generate their spectrograms for inference in “pure” python (outside of fast.ai) in the same way that they are done in the databunch?
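
What I have in mind is something like this, assuming plain librosa can reproduce the databunch’s settings; the parameters below are guesses and would need to match the AudioConfig/SpectrogramConfig used at training time:

```python
import librosa

def make_spectrogram(path, sr=16000, n_fft=1024, hop_length=512,
                     n_mels=128, f_min=20.0, f_max=8000.0, top_db=80):
    y, sr = librosa.load(path, sr=sr)          # resample to the training rate
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length, n_mels=n_mels,
                                         fmin=f_min, fmax=f_max)
    return librosa.power_to_db(mel, top_db=top_db)  # log scale, clipped at top_db
```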

A deep visual-semantic embedding model identifies visual objects using both labeled image data and semantic information gleaned from unannotated text. Has anybody played around with audio-semantic embeddings?

Hello, I tried to run the 03_Environmental_Sound_Classification.ipynb notebook on Colab, but I got these errors:

“from audio import *” returned “ModuleNotFoundError: No module named ‘audio’”, and “pip install audio” returned another error: “ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.”

Then with “sg_cfg = SpectrogramConfig(hop_length=512, n_mels=128, n_fft=1024, top_db=80, f_min=20.0, f_max=22050.0)”, I got the error “NameError: name ‘SpectrogramConfig’ is not defined”.

Any leads to keep on testing the notebook? Thx

Since yesterday, I have been unable to import audio:

----> 2 from audio import *

/usr/local/lib/python3.6/dist-packages/torchvision/models/inception.py in Inception3()
    180         return x, aux
    181
--> 182     @torch.jit.unused
    183     def eager_outputs(self, x, aux):
    184         # type: (Tensor, Optional[Tensor]) -> InceptionOutputs

AttributeError: module 'torch.jit' has no attribute 'unused'

Anyone else facing a similar issue?

This looks like a torch versioning issue. Have you tried resetting and reinstalling your Python environment?

1 Like

Did you base your notebook off this one:

https://colab.research.google.com/drive/1HUVI1CZ-CThHUBO8l2lp6hySjrbs0SY-

1 Like

Yes, this is a torch version issue. I am on Colab and tried installing torch 1.4 etc., but I think torchaudio is pinned to a specific version and only works with a matching version of torch.

(Sorry, I forgot which versions were compatible.)

I followed this notebook to install audio. It’s working fine now!

Great, I’m glad it worked for you. Yes, currently the latest torchaudio requires torch 1.4, which is a bit annoying, so make sure you’re installing torchaudio < 0.4.0.
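
If it helps anyone on Colab, pinning is a one-liner (the exact compatible pair may differ depending on the image’s preinstalled torch):

```
!pip install "torchaudio<0.4.0"
```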

Hello! Over the past few weeks I have been developing a bird sound classifier, using the new fastai v2 library!

You can look at my notebook here: https://github.com/aquietlife/fastai2_audio/blob/birdclef/nbs/bird_classifier.ipynb

I wanted to incorporate some of the fastai v2 audio library into this, but I wasn’t sure how best to do it.

The dataset I’m using is the LifeCLEF 2018 Bird dataset, and I re-implemented the BirdCLEF baseline system in Jupyter notebooks, with some refactoring done along the way with the fastai v2 library.

The basic idea of what I did was:

Take the dataset and use the baseline system’s methodology of extracting spectrograms to get a large number of spectrograms for each of the 1500 classes of bird species.

The interesting bit about extracting the spectrograms can be found here: https://github.com/kahst/BirdCLEF-Baseline#spectrogram-extraction

From there, I used the classic transfer learning technique: training on the spectrogram images with a ResNet model pretrained on ImageNet. I got down to about a 27% error rate!
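
The training step itself was nothing exotic; roughly the following, with paths and parameters illustrative rather than my exact setup:

```python
from fastai2.vision.all import *

path = Path('spectrograms')                   # one subfolder per bird species
dls = ImageDataLoaders.from_folder(path, valid_pct=0.2, item_tfms=Resize(224))
learn = cnn_learner(dls, resnet34, metrics=error_rate)  # ImageNet-pretrained
learn.fit_one_cycle(5)
```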

I just wanted to post this now as I begin to tie it up to see if anyone had any feedback or questions. I’m going to be presenting my work at Localhost, a talk in NYC on February 25th if anyone is around! I’ll be presenting fastai v2 and the audio library to a big audience, so hopefully it will get more people interested in the library :slight_smile:

I wasn’t able to work out how to use the audio library for this because of the weak-labeling problem of finding where in the audio signal the bird sounds are. Much of my notebook re-implements the baseline system’s approach, which goes through all of the recordings, takes 1-second chunks, creates a spectrogram for each, and uses a signal-to-noise heuristic to determine whether that section has a bird call inside it.
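
As a rough sketch of that filtering step (the scoring function below is my stand-in, not the baseline’s exact heuristic):

```python
import numpy as np
import librosa

def signal_chunks(path, sr=22050, chunk_s=1.0, snr_thresh=2.0):
    """Yield spectrograms of 1 s chunks that likely contain a bird call."""
    y, sr = librosa.load(path, sr=sr)
    step = int(chunk_s * sr)
    for start in range(0, len(y) - step + 1, step):
        chunk = y[start:start + step]
        spec = np.abs(librosa.stft(chunk, n_fft=1024, hop_length=512))
        # crude signal-to-noise proxy: peak energy vs. median background
        snr = spec.max() / (np.median(spec) + 1e-10)
        if snr > snr_thresh:
            yield spec
```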

I’d love to help implement some kind of approach like that within the audio library so I could use it for the entirety of my pipeline. I know that other datasets have the same issue as well, so it’s going to be something to think about.

Two papers that I came across that deal with this are:

and

Anyways, I wanted to share my progress with others to see if you had any feedback, questions, or suggestions on moving forward. My next main goals are to keep training (always be training), run inference on the test data set, and then train on a smaller dataset that I’m interested in (birds from around my area) and do some inference testing on that.

7 Likes

Congratulations on your success! I’m glad that you found the library useful. It seems like you did a lot of extra pre-processing, which is interesting and something we might need to think about when extending the functionality of the library.

So as I understand it, you’re detecting whether a signal contains a bird call so that you can crop to that particular area before training, and thereby reduce noise in the data set. Depending on how well that heuristic does its job, could it potentially also add noise to your data set?

Even though the notebook is in a fork, it looks like you’re not actually using the library in the notebook you’ve shared: you generated the spectrograms on your own and then used core fastai2 to train. Were there any problems you ran into specifically?

I’d see if you could incorporate the code from fastai2_audio, which could help boost your accuracy. SpecAugment in particular might allow you to train longer without overfitting, and pre-processing such as silence removal could be useful as well.
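
The core of SpecAugment is simple enough to sketch; something like the following, with illustrative mask sizes (fastai2_audio ships its own transforms, this is just the idea):

```python
import torch

def spec_augment(spec, freq_mask=16, time_mask=20):
    """spec: (n_mels, n_frames) tensor; returns a copy with random bands zeroed."""
    spec = spec.clone()
    n_mels, n_frames = spec.shape
    f0 = torch.randint(0, n_mels - freq_mask, (1,)).item()
    t0 = torch.randint(0, n_frames - time_mask, (1,)).item()
    spec[f0:f0 + freq_mask, :] = 0.0   # frequency mask
    spec[:, t0:t0 + time_mask] = 0.0   # time mask
    return spec
```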