Forming Post-Course Project Groups

Hey y’all!
In an effort to actually absorb the class material, I’m interested in applying all this DL to a totally new project, and trying to produce something worthy of a sweet blog post, or maybe even a publication. I imagine others are too, so this is a thread to start forming those groups. I don’t know what others want to do, but personally, I’m a musician, and would love to work with audio data. I think it’s understudied compared to vision/NLP, and my love of music and audio would give me a bit of special knowledge here. So if anyone is interested in any of the following project ideas, just respond and let me know!

  • Making awesome music generation models
    –> CycleGANs for transforming songs from one genre to another (e.g., create a reggae version of any song)
    –> Something to generate lifelike piano pieces from sketches or from nothing
    –> Build a text-to-singing model. Can we make it sound like Adele sang a Beatles song?

  • Beat benchmarks for the FMA dataset, polyphonic piano transcription, or Google’s AudioSet

  • Automatically single out instrument parts in a song
    –> I’d love to be able to just get the vocals from any tune. Or just the guitar part, etc. DJs would love this!

If you have your own project ideas, you should also post those here.


There was an interesting talk today at ICLR by Kristen Grauman about sound source separation using video.


I also saw a new Kaggle competition from CERN today.

Anyone interested?


I also just noticed this Freesound General Purpose Audio Classification challenge on Kaggle. It’s kind of like ImageNet for audio. It looks cool, and still has 3 months to go. Could be a fun one! Anyone down?


I’m interested. I’ve done some work in audiobook narration, and I think there are a number of practical uses for deep learning in audiobooks. For example, there are some very time-consuming tasks audio editors must do that probably could be handled rather easily with deep learning. This kind of thing could prove quite marketable.

Sounds interesting - what would be examples of those tasks?

The main example I had in mind was this: Every audiobook has to go through a quality control check before it’s published. A human listener has to check that each spoken word matches the word on the page. The books are 8, 12, or 20 hours long, or longer. On the low end, the QC folks are charging $50-60/hour to do the quality check.

Perhaps I’m wrong, but it seems like it would be fairly easy to use voice recognition to do the check much more quickly and cheaply.
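
To make the idea concrete, here is a minimal sketch of the comparison step only, assuming you already have a transcript from some speech recognizer. The `manuscript` and `transcript` strings and the helper names are hypothetical, made up for illustration; no particular speech API is implied. It aligns the two word sequences with Python's standard `difflib` and reports only the spans that differ, which is what a human would then spot-check.

```python
import difflib
import re


def normalize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def find_mismatches(manuscript, transcript):
    """Return (tag, page_words, spoken_words) for every span that differs."""
    page = normalize(manuscript)
    spoken = normalize(transcript)
    matcher = difflib.SequenceMatcher(a=page, b=spoken, autojunk=False)
    return [
        (tag, page[i1:i2], spoken[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical example: the narrator (or the recognizer) dropped a final "s".
manuscript = "It was the best of times, it was the worst of times."
transcript = "it was the best of times it was the worst of time"

for tag, expected, heard in find_mismatches(manuscript, transcript):
    print(tag, expected, heard)
```

Note that a flagged span could be a narration mistake or a recognition mistake; the diff alone can't tell which, so this would narrow down what a person listens to rather than replace the listener.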


@travis Interesting. If speech recognition were all it took, then that’s already available via Google’s speech recognition API. Is the audiobook world too far behind to know this even exists? Or perhaps these APIs are not as good as we think. My limited experience with YouTube’s automated transcription is that it’s good, but misses plenty of stuff in real-world contexts. Do you still know anyone in that audiobook world? I’d be curious to know if they’ve tried the various voice recognition APIs. It would be great to know where they fall down.

Also, do any other examples come to mind?

@travis You’d want to benchmark against as well.


I’d be interested in teaming up to tackle some audio work with deep learning. I’m ultimately trying to work on polyphonic music transcription (geared toward guitar), but I’m just now starting to try to apply the stuff from this course to raw audio.

I’ve been trying to wrap my head around WaveNet (and some of the more recent modifications), and was going to try to build it up using the fastai library. The Freesound Kaggle competition looks like a great opportunity to start applying some of this stuff.

At the very least it would be cool to get a discussion group around these topics.


Well, I had the same thought as you, “Surely somebody is doing this already?” But it definitely was not a thing that was readily available when I was doing it about a year ago, and a quick search of Facebook groups that I used to haunt reveals many people still offering QC/Proofing services ranging from $25-60/hour.

I can’t speak for the major publishing houses. I don’t know how they do it. And I didn’t have super high-end software. However, what I had was pretty standard for narrators working out of a home studio. It claimed to use machine learning for noise reduction and audio repair (and was pretty good at it), but it didn’t have the functionality that I mentioned.

I presume you’re right: since it is not readily available, it must be harder to do than it seems. If so, that sounds like a problem worth looking into.

As for other examples, they all have to do with making it easier to achieve excellent sound quality and speeding up the processing time. You have to produce audiobooks in pretty narrow decibel ranges, and you have to have a low noise floor. This can be a challenge for those with less than ideal sound proofing and audio-capture equipment, both of which can be quite expensive. Also, there are a million different sounds that can detract from the quality, everything from train and airplane noises that make it into the background to mouth clicks and pops.

Many companies are already using deep learning to address these. I just think it would be interesting to see if we can improve on it.
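
For anyone unfamiliar with the "decibel range" and "noise floor" requirements mentioned above, here is a rough sketch of how a level is usually measured: the RMS of the samples, expressed in dB relative to full scale. It assumes float samples in the range -1 to 1, and the function name is made up for illustration. The thread doesn't give specific target values, so none are assumed here.

```python
import math


def rms_dbfs(samples):
    """RMS level in dB relative to full scale (1.0), for float samples in [-1, 1]."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence has no finite level
    return 10 * math.log10(mean_square)


# Hypothetical clips: a quiet stretch of "room tone" and a louder passage.
room_tone = [0.001, -0.001] * 100
speech = [0.1, -0.1] * 100

print(round(rms_dbfs(room_tone), 1))  # level of the quiet stretch (the noise floor)
print(round(rms_dbfs(speech), 1))     # level of the louder passage
```

Measuring a silent stretch between sentences gives an estimate of the noise floor; measuring the whole file gives the overall level.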


@travis @johnhartquist OK, so it sounds like we have a little group of people interested in audio. I agree that the Freesound Kaggle competition seems like a good place to get our feet wet. Wanna form a team on that? In my eyes, it doesn’t mean a 3-month commitment. We can try to create a few decent submissions, and see how it goes. If we love it, we can keep hacking on it. Or we can take what we learn and move on to a new project, like polyphonic music transcription, voice recognition, or music generation.

Are you guys in SF? I’d love to be able to meet up in person. But we could also do some Skype calls or emails to talk through things, and get the ball rolling. My email is [removed]. What about yours?

I’m down, that sounds good to me. I’m in the East Bay but I can meet in SF or Oakland anywhere near a BART station. My email is [removed].

Sounds good to me. I’m not in SF, but all the way across the country in Georgia.

By the way, I see the flaw in my thinking about voice recognition. Even at 99% accuracy, that is still about 550 incorrect words for a typical novel, which would be unacceptable.
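
The arithmetic behind that figure, as a quick sketch: 550 errors at 99% word accuracy implies a book of about 55,000 words. That word count is inferred from the numbers in the post, not a measured figure, and the function name is just for illustration.

```python
def expected_errors(word_count, word_accuracy):
    """Expected number of misrecognized words at a given per-word accuracy."""
    return round(word_count * (1 - word_accuracy))


NOVEL_WORDS = 55_000  # assumption: the length implied by "550 errors at 99%"

for accuracy in (0.99, 0.999):
    errors = expected_errors(NOVEL_WORDS, accuracy)
    print(f"{accuracy:.1%} accurate: about {errors} wrong words")
```

Even a tenfold improvement in accuracy would still leave dozens of false alarms or missed errors per book, so the recognizer would need a human in the loop either way.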

I’m also interested in trying a CycleGAN to change music styles/genres, audio segmentation to separate instruments (no idea how!), and music transcription to guitar tabs. Also, I’d like to try audio improvement from a low-quality MP3 (there was a thread about this some time ago, and I read a paper about it, but then the course contents became a bit overwhelming so I focused on the lessons. I’m still catching up, btw).

I’m going to have a look at the Freesound competition. I’d love to join your team, but I am not at all in SF (France, GMT+2 at the moment), so I understand if this is too much of a hassle!

I’m very interested in forming a partnership for a project I’m working on. I’m planning on selling it later, so I can’t really post too much publicly. I have no idea how profitable it might be, but I’m open to connecting with anybody interested. Feel free to comment here or DM/email me. As far as the techniques used, it’s not terribly complicated, but I’m having some trouble with it because I’m having to make my own training/test set.

Hey gdc. Love to have you on the team. Reply with your email, and I’ll bring you into our thread. We’ll deal with timezones as they become an issue.

@bhollan Hope you find some people! To aid in that, it’s usually better to just tell people what you’re working on. The idea of someone “stealing” your idea and running away to make millions is a total myth. It’s hard enough to get people to join your team when you’re offering them money and making them a co-founder! Ideas, in and of themselves, are generally considered “worthless”. Execution is what matters, and execution is what’s hard. And often, successful entrepreneurs have a keen insight or special contacts in their market. If any old person can just take your idea and do it as well as you, then it’s probably not a very defensible business.
If you really have some black magic secret sauce, then sure, you don’t have to share that. But you gotta get people excited enough to contact you. Just my two cents after having played the startup game for a number of years.


Great! My email is [removed as forum will soon be public]. Thanks!