Learning audio spectral mapping


(Ezequiel) #1

Hello. I am finishing the first part of the DL course. Very excited about the things that I am learning. I come from the audio/acoustics world, so I’m eager to try some of the things shown in the lessons within my field of work. I have already applied a neural network to a fairly simple classification task on audio files (the Audio Cats and Dogs dataset on Kaggle), preprocessing the audio with the STFT (short-time Fourier transform).
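In case anyone wants to see what that preprocessing looks like, here is a minimal sketch using scipy; the sample rate, FFT size, and hop are illustrative choices of mine, and a synthetic sine stands in for a loaded audio file:

```python
import numpy as np
from scipy.signal import stft

# Illustrative parameters; window and hop sizes are a design choice.
fs = 16000           # sample rate in Hz
n_fft = 512          # FFT length -> n_fft // 2 + 1 = 257 frequency bins
hop = 128            # hop size in samples

# A one-second synthetic tone stands in for a loaded audio file.
t = np.arange(fs) / fs
audio = np.sin(2 * np.pi * 440 * t)

# Complex STFT; the (often log-compressed) magnitude is a common network input.
f, frames, Z = stft(audio, fs=fs, nperseg=n_fft, noverlap=n_fft - hop)
log_mag = np.log1p(np.abs(Z))    # shape: (freq_bins, time_frames)
print(log_mag.shape)
```

The complex phase is usually set aside during training and reused from the input when resynthesizing audio from the predicted magnitude.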
My next project is much more challenging. I want to work on de-reverberation, which is one of the toughest problems in acoustics and audio DSP (noise reduction already works pretty well). The idea is to feed the network an audio file of a voice with some degree of reverberation and have it output the same voice with less reverberation.

I have built a database of 1444 anechoic voices and 8664 reverberant voices, so each anechoic recording “gave birth” to 6 reverberant versions of it with different degrees of reverberation and frequency characteristics. The plan is to preprocess everything with the STFT and let the DNN learn the spectral mapping from each reverberant version to its anechoic counterpart.

I have many doubts at the moment, but the first one is how to frame the problem. I don’t know whether I should treat it as a classification task, where each anechoic audio file is a label, or use some other approach. Any thoughts on this are welcome. If anyone is interested in audio and neural networks and wants to join forces, just PM me.
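To make the spectral-mapping idea concrete, here is a toy numpy sketch of how the input/target pairs line up as a regression problem: reverberant magnitude frames in, anechoic magnitude frames as targets, with a mean-squared error between them. A closed-form least-squares map stands in for the DNN here; all shapes and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for STFT magnitudes: 100 time frames x 257 frequency bins.
anechoic = rng.random((100, 257))                       # regression *targets*
reverberant = anechoic + 0.3 * rng.random((100, 257))   # degraded *inputs*

# A linear spectral mapping fit by least squares plays the role of the
# network: it maps each reverberant frame to an estimate of the clean frame.
W, *_ = np.linalg.lstsq(reverberant, anechoic, rcond=None)
estimate = reverberant @ W

# The training objective is the mean squared error between predicted and
# true clean spectrograms, not a classification loss over file labels.
mse = np.mean((estimate - anechoic) ** 2)
print(mse)
```

Framed this way, each anechoic file is a continuous target rather than a class label, so the model can generalize to voices it has never seen.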
Regards


(Cahya) #2

Hi elece, I am also interested in audio processing with deep learning. I read the challenge you described, and I think it could perhaps be done with an autoencoder, since autoencoders are also used, for example, for denoising and reconstructing images. Here is a good article about building them in Keras: https://blog.keras.io/building-autoencoders-in-keras.html
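To illustrate the idea without pulling in a framework, here is a small numpy sketch of a denoising autoencoder trained on toy data; the sizes, learning rate, and data are all illustrative, and in practice you would build this out of `Dense` layers in Keras as the article shows:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy low-rank "clean" signals (think spectrogram patches) plus noise:
# the denoising autoencoder learns to map noisy inputs back to clean ones.
latent = rng.random((200, 4))
mix = rng.standard_normal((4, 16))
clean = latent @ mix                       # 200 samples, 16 features
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

d_in, d_hid = 16, 8                        # 8-unit bottleneck
W1 = 0.1 * rng.standard_normal((d_in, d_hid)); b1 = np.zeros(d_hid)
W2 = 0.1 * rng.standard_normal((d_hid, d_in)); b2 = np.zeros(d_in)
lr = 0.01

def forward(x):
    h = np.tanh(x @ W1 + b1)               # encoder
    return h, h @ W2 + b2                  # linear decoder

def mse(a, b):
    return float(np.mean((a - b) ** 2))

_, y0 = forward(noisy)
mse_init = mse(y0, clean)

for _ in range(2000):                      # plain gradient descent on MSE
    h, y = forward(noisy)
    err = (y - clean) / len(noisy)
    gW2, gb2 = h.T @ err, err.sum(axis=0)
    gh = (err @ W2.T) * (1 - h ** 2)       # backprop through tanh
    gW1, gb1 = noisy.T @ gh, gh.sum(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, denoised = forward(noisy)
print(mse_init, mse(denoised, clean))
```

The key point is that the inputs and targets differ (noisy in, clean out), which is the same pairing elece has with reverberant and anechoic spectrograms.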


(John Hartquist) #3

You may be interested in a blog post I put up earlier today that goes over some examples of loading and classifying audio using a custom AudioItemList and a few other classes. It takes advantage of many of the new changes in fastai v1.

All the code is available, and there are lots of examples in the notebooks section of the repo.


(Haider Alwasiti) #4

I think a GAN could be useful for problems like this; maybe you can try it. In part 2, or in the new fastai course v3 part 1, Jeremy showed how to use GANs (it is coming in lesson 7, to be precise).
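For reference, this is the standard GAN objective (from Goodfellow et al.), with the notation adapted here to this setting: the generator G maps a reverberant spectrogram x to a clean estimate, and the discriminator D tries to distinguish real anechoic spectrograms y from G’s outputs:

```latex
\min_G \max_D \;
\mathbb{E}_{y \sim p_{\mathrm{anechoic}}}\big[\log D(y)\big]
+ \mathbb{E}_{x \sim p_{\mathrm{reverb}}}\big[\log\big(1 - D(G(x))\big)\big]
```

Since the dataset is paired (each reverberant file has a known anechoic counterpart), a common variant also adds a reconstruction term such as \(\lVert G(x) - y \rVert_1\) to the generator loss so its output stays close to the known clean target.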