Forming Post Course Project Groups

ranakj · May 4, 2018, 5:19pm

I saw this competition as well, interested to go for it?

johnhartquist · May 4, 2018, 5:50pm

I think WaveNet may be a little overkill for the general audio classification task, but would definitely be worth looking into for generative models. Spectrograms are still being used quite a bit, along with MFCCs which attempt to map frequencies to something more representative of human perception.
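For anyone who hasn't met the perceptual mapping behind MFCCs: they are built on the mel scale, which compresses high frequencies roughly the way human hearing does. Below is a minimal sketch of that mapping, assuming the common HTK-style formula (other variants exist, and libraries may default to a different one). The function names `hz_to_mel` and `mel_to_hz` are just illustrative, not taken from any particular library.

```python
import numpy as np

def hz_to_mel(freq_hz):
    """Convert frequency in Hz to mels (HTK-style formula, one common variant)."""
    return 2595.0 * np.log10(1.0 + np.asarray(freq_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

# The same 100 Hz step covers far more mels at low frequencies than at high
# ones, which is the "closer to human perception" part.
freqs = np.array([100.0, 200.0, 4000.0, 4100.0])
mels = hz_to_mel(freqs)
low_gap = mels[1] - mels[0]    # 100 Hz -> 200 Hz
high_gap = mels[3] - mels[2]   # 4000 Hz -> 4100 Hz
print(f"low gap: {low_gap:.1f} mel, high gap: {high_gap:.1f} mel")
```

A mel spectrogram or MFCC pipeline spaces its filters evenly on this scale, so low frequencies get finer resolution than high ones.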

I’m really interested in multi-resolution spectral features in addition to the raw waveform itself, somehow combining RNNs and CNNs.

bhollan · May 4, 2018, 7:17pm

I really appreciate the input. Honestly, after I posted this I had kind of resolved to make a post about my project, come what may. Finishing it is much more important than profiting off it. But I think you’re definitely right. I think somebody stealing my idea is much more remote a possibility than I’m probably worried about. Thanks for the inspiration!

travis · May 5, 2018, 11:49am

Do you all have torchaudio installed? If so, what’s the best way to do it? I’m using Paperspace, and it was not included as part of the Fastai environment. For torchvision, you can do a conda install as described here by Jeremy. No such luck for torchaudio.

andrewyip · May 5, 2018, 1:16pm

I’m interested in music generation models. I’m a musician myself and would love to follow up along this line.

blakewest · May 5, 2018, 3:46pm

I’m definitely interested in music generation models. I think it could be really cool to try the progressive GAN approach for music (on either audio, or MIDI)

jsonm · May 6, 2018, 12:30am

This should work for you https://github.com/pytorch/audio

But it doesn’t support Windows… not sure what to do here.

travis · May 6, 2018, 11:26am

Thanks. I gave that a shot, but when I tried importing torchaudio into a notebook, I got the following error:

narvind2003 · May 6, 2018, 1:10pm

Anywhere near Atlanta, if I may ask?

travis · May 6, 2018, 2:27pm

No, I’m in Savannah.

blakewest · May 6, 2018, 11:38pm

@andrewyip what’s your email? We should keep in touch about music generation models.

msmedes · May 23, 2018, 3:09pm

Has anyone taken a crack at the Freesound competition, or processed audio at all using the fast.ai library? It seems most people have generated STFT/FFT/MFCC spectrograms and run a CNN on the resulting JPEGs, but I wanted to try running a model on the resulting arrays themselves. Has anyone tried this yet?

blakewest · May 24, 2018, 1:35am

Yes. 5 of us have a team doing it right now. I’m not quite sure what you mean by “run on the resulting arrays themselves” vs. a jpg. Everything is just tensor arrays by the time it gets to the model. If you want to join our team, I can invite you to the slack channel we have. We’re currently getting a little above the baseline. But we have loads of ideas, and I feel confident we can create a solid submission.

msmedes · May 24, 2018, 1:13pm

Sorry, I should have been clearer. I’ve seen approaches that involve generating a spectrogram plot (with whatever window length), converting that to an image file, and then training a CNN on those images, the idea being that the spectrograms will represent the content the same way a picture of a dog or cat would. The other way to use that data would be to take the numpy array generated by a call to, say, librosa.feature.mfcc and use that directly as the input to a neural net, which sounds like your approach.
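To make the distinction concrete, here is a rough numpy-only sketch (not the team's code, and not librosa's implementation). It builds a log-magnitude spectrogram as a float array, then mimics the "save it as an image" route by squashing it to 8-bit pixels. The helper names `log_spectrogram` and `to_uint8_image`, the test tone, and the window and hop sizes are all assumptions chosen for illustration.

```python
import numpy as np

def log_spectrogram(signal, n_fft=256, hop=128):
    """Log-magnitude STFT as a float array of shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).T

def to_uint8_image(spec):
    """Rescale to 0..255 and round, roughly what exporting to an image does."""
    lo, hi = spec.min(), spec.max()
    scaled = (spec - lo) / (hi - lo + 1e-12)
    return np.round(scaled * 255).astype(np.uint8)

# One second of a 440 Hz tone at 8 kHz (synthetic stand-in for a real clip).
sr = 8000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440 * t)

spec = log_spectrogram(signal)   # float array, usable directly as model input
img = to_uint8_image(spec)       # 8-bit version, as it would look in an image

print(spec.shape, spec.dtype, img.dtype)
print("distinct values:", np.unique(spec).size, "vs", np.unique(img).size)
```

Both end up as arrays by the time a model sees them. The image route just adds a quantization step (plus whatever colormap, resizing, or JPEG compression the export applies), while the array route keeps the original float values and scale.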

Even · May 25, 2018, 5:51pm

Given the interest in deep learning applied to music I thought I’d share this paper as it seems likely to be of interest and relevant:

A Universal Music Translation Network
Noam Mor, Lior Wolf, Adam Polyak, Yaniv Taigman
(Submitted on 21 May 2018 (v1), last revised 23 May 2018 (this version, v2))
We present a method for translating music across musical instruments, genres, and styles. This method is based on a multi-domain wavenet autoencoder, with a shared encoder and a disentangled latent space that is trained end-to-end on waveforms. Employing a diverse training dataset and large net capacity, the domain-independent encoder allows us to translate even from musical domains that were not seen during training. The method is unsupervised and does not rely on supervision in the form of matched samples between domains or musical transcriptions. We evaluate our method on NSynth, as well as on a dataset collected from professional musicians, and achieve convincing translations, even when translating from whistling, potentially enabling the creation of instrumental music by untrained humans.

blakewest · May 25, 2018, 8:20pm

Yeah, we’re doing MFCCs, mel filterbanks, and just raw audio. And we’re trying them on various architectures, including CNNs and RNNs, and soon I’m hoping to do an all-attention model like a Transformer network.