Hey everyone! I just completed my first blog post.
It’s a quick intro to classifying audio with image classifiers using the fastai framework. I’ve been porting some of my work from the Freesound Kaggle competition over to the updated v1 library.
Over the past week or so I wrote an experimental audio module that reads in raw audio and computes spectrograms on the fly during training (straight from the DataLoader), so you don’t need to create image files beforehand. It can then take pretrained image models, modify them to accept a single channel of 2D input, and fine-tune from there. I was able to run 3 epochs over 100k 4-second audio files from the NSynth dataset, classifying their instrument families to around 80% accuracy, in under 3 minutes.
I’m planning to keep working on this over the next few weeks, in hopes of adding data augmentation and maybe even support for fine-tuning models pretrained on audio data (as opposed to ImageNet models).
It’s still a work in progress, and already a version behind fastai (currently tested against 1.0.28). You can view all the code and notebooks here: