Forming Post Course Project Groups

I’m very interested in forming a partnership for a project I’m working on. I’m planning on selling it later, so I can’t really post too much publicly. I have no idea how profitable it might be, but I’m open to connecting with anybody interested. Feel free to comment here or DM/email me. As far as the techniques go, it’s not terribly complicated, but I’m having some trouble with it because I’m having to make my own training/test set.

Hey gdc. Love to have you on the team. Reply with your email, and I’ll bring you into our thread. We’ll deal with timezones as they become an issue.

@bhollan Hope you find some people! To aid in that, it’s usually better to just tell people what you’re working on. The idea of someone “stealing” your idea and running away to make millions is a total myth. It’s hard enough to get people to join your team when you’re offering them money and making them a co-founder! Ideas, in and of themselves, are generally considered “worthless”. Execution is what matters, and execution is what’s hard. And often, successful entrepreneurs have a keen insight or special contacts in their market. If any old person can just take your idea and do it as well as you, then it’s probably not a very defensible business.
If you really have some black magic secret sauce, then sure, you don’t have to share that. But you gotta get people excited enough to contact you. Just my two cents after having played the startup game for a number of years.


Great! My email is [removed as forum will soon be public]. Thanks!

I saw this competition as well. Is anyone interested in going for it?

I think WaveNet may be a little overkill for the general audio classification task, but it would definitely be worth looking into for generative models. Spectrograms are still being used quite a bit, along with MFCCs, which attempt to map frequencies to something more representative of human perception.
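For anyone new to the spectrogram front-end being discussed here, a minimal numpy sketch of what a magnitude spectrogram computes (a framed, windowed FFT, roughly what `librosa.stft` does under the hood; the window size and hop length below are illustrative defaults, not librosa’s):

```python
import numpy as np

def spectrogram(signal, n_fft=512, hop=128):
    """Magnitude spectrogram: slice the signal into overlapping
    Hann-windowed frames and take the FFT of each frame."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequencies: n_fft // 2 + 1 bins
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (freq_bins, n_frames)

# 1 second of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (257, 122)
```

The resulting 2-D array is what gets treated “like an image” for a CNN; an MFCC front-end additionally warps the frequency axis onto a mel scale and applies a DCT.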

I’m really interested in multi-resolution spectral features in addition to the raw waveform itself, somehow combining RNNs and CNNs.


I really appreciate the input. Honestly, after I posted this I had kind of resolved to make a post about my project, come what may. Finishing it is much more important than profiting off it. But I think you’re definitely right. Somebody stealing my idea is a much more remote possibility than I’ve been treating it as. Thanks for the inspiration!

Do you all have torchaudio installed? If so, what’s the best way to install it? I’m using Paperspace, and it was not included as part of the fastai environment. For torchvision, you can do a conda install as described here by Jeremy. No such luck for torchaudio.
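For what it’s worth, one route that worked for others at the time was building torchaudio from source, since there was no conda package. This is a sketch, not a guaranteed recipe; check the torchaudio README for your platform, and note it needs the sox libraries as a backend:

```shell
# Install the sox audio backend that torchaudio builds against
sudo apt-get install -y sox libsox-dev libsox-fmt-all

# Build and install torchaudio from source (no conda package available)
pip install git+https://github.com/pytorch/audio
```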

I’m interested in music generation models. I’m a musician myself and would love to follow up along this line.

I’m definitely interested in music generation models. I think it could be really cool to try the progressive GAN approach for music (on either audio or MIDI).

This should work for you

But it doesn’t support Windows… not sure what to do here. :frowning:

Thanks. I gave that a shot, but when I tried importing torchaudio into a notebook, I got the following error:

Anywhere near Atlanta, if I may ask?

No, I’m in Savannah.


@andrewyip what’s your email? We should keep in touch about music generation models.

Has anyone taken a crack at the Freesound competition? Or processing audio at all using the library? It seems most people have generated STFT/FFT/MFCC spectrograms and run a CNN on the resulting jpgs, but I wanted to try to run a model on the resulting arrays themselves. Has anyone tried this yet?

Yes. 5 of us have a team doing it right now. I’m not quite sure what you mean by “run on the resulting arrays themselves” vs. a jpg. Everything is just tensor arrays by the time it gets to the model. If you want to join our team, I can invite you to the slack channel we have. We’re currently getting a little above the baseline. But we have loads of ideas, and I feel confident we can create a solid submission.

Sorry, I should have been clearer. I’ve seen approaches that involve generating a spectrogram plot (of whatever window length), converting that to an image file, and then training a CNN on those images, the idea being the spectrograms will represent the content the same way a picture of a dog or cat would. The other way to use that data would be to take the numpy array returned by a call to, say, librosa.feature.mfcc and use that directly as the input to a neural net, which sounds like your approach.
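The distinction between the two routes can be sketched with a toy array. The `mfcc` matrix below is random placeholder data standing in for the output of `librosa.feature.mfcc` (shape `n_mfcc x n_frames`), just to show what each route does to the values:

```python
import numpy as np

# Toy stand-in for an MFCC matrix (n_mfcc=20, n_frames=100);
# the values are random placeholders, not real features.
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(20, 100)).astype(np.float32)

# Route 1: render to an 8-bit "image" the way a saved spectrogram
# plot would be -- quantizing to 256 levels discards dynamic range.
lo, hi = mfcc.min(), mfcc.max()
as_image = np.round(255 * (mfcc - lo) / (hi - lo)).astype(np.uint8)

# Route 2: feed the float array directly; just add a channel axis
# so a 2-D CNN can consume it. No quantization loss.
as_tensor = mfcc[np.newaxis, :, :]  # (1, 20, 100)

print(as_image.dtype, as_tensor.shape)
```

Route 2 also skips whatever colormap and axis decoration a saved plot would bake into the pixels, which is another reason to prefer the raw arrays.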

Given the interest in deep learning applied to music, I thought I’d share this paper, as it seems relevant:

A Universal Music Translation Network
Noam Mor, Lior Wolf, Adam Polyak, Yaniv Taigman
(Submitted on 21 May 2018 (v1), last revised 23 May 2018 (this version, v2))
We present a method for translating music across musical instruments, genres, and styles. This method is based on a multi-domain wavenet autoencoder, with a shared encoder and a disentangled latent space that is trained end-to-end on waveforms. Employing a diverse training dataset and large net capacity, the domain-independent encoder allows us to translate even from musical domains that were not seen during training. The method is unsupervised and does not rely on supervision in the form of matched samples between domains or musical transcriptions. We evaluate our method on NSynth, as well as on a dataset collected from professional musicians, and achieve convincing translations, even when translating from whistling, potentially enabling the creation of instrumental music by untrained humans.


Yeah, we’re doing MFCCs, mel filterbanks, and just raw audio. And we’re trying them on various architectures, including CNNs and RNNs, and soon I’m hoping to do an all-attention model like a Transformer network.