I’ve found very little audio content on the forums, so I thought I’d start a thread for all things audio where we can post resources, find people working on similar projects, and help each other out. Maybe we could get a separate study group or Slack/Telegram chat going as well. Note: I am early in fast.ai and have only studied the audio->image->CNN route. If anyone has experience using RNNs for audio, please contribute some resources.
Introduction to Sound:
Jack Schaedler - Compact Primer on Digital Signal Processing - This is a fantastic ~30 page tutorial built around interactive diagrams that introduces sound, signal processing, Fourier transforms, and lots of other concepts you’ll see in the sound processing world.
Mel-Frequency Cepstral Coefficient Tutorial - This article on MFCCs by James Lyons is a thorough, detailed, and at times difficult description of MFCCs, which are an important part of speech processing. Luckily for you, they are implemented in one line in Librosa via librosa.feature.mfcc, so this isn’t 100% required reading.
Coursera course on audio signal processing. Don’t get sucked into this one unless you really need it. It appears to be a great resource for people who want a truly deep dive on audio, but it is absolutely not necessary for building working audio projects with fast.ai.
TensorFlow Speech Recognition Challenge - (non-active) competition to recognize one of 30 one-word voice commands, with 65,000 samples.
Davids1992 - Speech Representation Kernel - An excellent kernel that will show you how to do essential speech processing.
DARPA-TIMIT Acoustic Phonetic Speech Corpus - An extremely detailed and well-curated speech dataset with 6300 sentences from 630 speakers from 8 major dialect regions of the United States. Set includes accurate English and IPA (International Phonetic Alphabet) transcripts. Note: The files in this dataset appear to be .wav but are actually a special format you’ll need to convert. StackOverflow - Reading TIMIT wav files
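On the TIMIT note above: as far as I know, the "special format" is NIST SPHERE, whose files begin with the ASCII tag NIST_1A, while ordinary WAV files begin with RIFF....WAVE. Here is a rough sketch for checking which kind of file you have before trying to load it. The function name sniff_audio_container is made up for this example, and the demo files are fake headers, not real audio.

```python
import os
import tempfile


def sniff_audio_container(path):
    """Return 'sphere', 'riff', or 'unknown' based on the file's first bytes."""
    with open(path, "rb") as f:
        head = f.read(12)
    if head.startswith(b"NIST_1A"):
        return "sphere"  # NIST SPHERE header (assumed to be what TIMIT ships)
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "riff"    # ordinary WAV container
    return "unknown"


# Demo with two fake files that contain only header bytes.
tmpdir = tempfile.mkdtemp()
sphere_path = os.path.join(tmpdir, "fake_timit.wav")
riff_path = os.path.join(tmpdir, "fake_normal.wav")

with open(sphere_path, "wb") as f:
    f.write(b"NIST_1A\n   1024\n")
with open(riff_path, "wb") as f:
    f.write(b"RIFF\x00\x00\x00\x00WAVEfmt ")

print(sniff_audio_container(sphere_path))
print(sniff_audio_container(riff_path))
```

Files reported as 'sphere' need converting (see the StackOverflow link) before most WAV readers will accept them.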
General Audio Processing
Kaggle Freesound Audio Tagging - (non-active) competition to classify 41 categories of general sound like “applause”, “cough”, “meow”, “scissors”, “tearing”, etc.
Zafar Beginner Audio Kernel - Another awesome kernel to get you started with sound processing.
Bachir - FreeSound Competition using Fast.ai libraries - Bachir went back and redid the Freesound competition using the fast.ai libraries. Very useful for fast.ai and audio beginners!
Google Magenta Nsynth Dataset - A large-scale and high-quality dataset of annotated musical notes. Over 300,000 4-second clips of various instruments and notes.
John Hartquist - Audio Classification using FastAI and On-the-Fly Frequency Transforms - John is trying to build audio processing directly into the fast.ai libraries and gives an excellent report of how he did it and what work remains. Current open problem: how can we transform audio files to make more robust data sets in the same way we do with image transforms? Who here can help? He also includes several notebooks showing both basic audio processing and how to do it directly in fast.ai using a custom DataBunch object via the data_block API. John Hartquist - Fastai Audio Github Repo
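To make the open question about audio transforms a bit more concrete, here is a minimal sketch of two generic waveform augmentations: a random time shift and additive noise. These are not John's method and not part of the fast.ai API. The function names, the 16 kHz sample rate, and the parameter values are all assumptions for illustration, and it only needs numpy.

```python
import numpy as np


def random_time_shift(signal, max_shift, rng):
    """Circularly shift the waveform by a random number of samples."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(signal, shift)


def add_gaussian_noise(signal, noise_std, rng):
    """Add zero-mean Gaussian noise with the given standard deviation."""
    noise = rng.normal(0.0, noise_std, size=signal.shape)
    return (signal + noise).astype(signal.dtype)


# Synthetic one-second clip standing in for a real training example.
rng = np.random.default_rng(0)
sr = 16000  # assumed sample rate in Hz
t = np.arange(sr) / sr
clip = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)

# Shift by up to 0.1 s, then add a small amount of noise.
shifted = random_time_shift(clip, max_shift=sr // 10, rng=rng)
augmented = add_gaussian_noise(shifted, noise_std=0.005, rng=rng)

print(clip.shape, augmented.shape)
```

Both functions keep the clip length and dtype unchanged, so they could in principle be applied on the fly before the spectrogram step. Whether they actually help on a given dataset is something you would have to test.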
Post here and share what you’re working on and what techniques you’ve found helpful!