Processing Audio Data

Hey y’all,

Hope everyone is having a blast with this series and learning a lot from the community and the teachers here. I want to ask what a good way might be to build a “Note” classifier that detects which musical note is being played in a “.wav” file.

At first, as a learner, I wanted to write code with the least amount of variance, so I converted the .wav files to spectrogram images and then trained a vision classifier on them. But the results are abysmal (and rightfully so).
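For context, here is a rough sketch of the kind of .wav-to-spectrogram conversion I mean. It is not the exact code from my notebook. It assumes NumPy, uses a synthetic 440 Hz sine as a stand-in for a real .wav file, and the `spectrogram` helper, frame size, and hop length are illustrative choices only.

```python
import numpy as np

def spectrogram(samples, sample_rate, frame_size=2048, hop=512):
    """Short-time Fourier transform magnitudes.

    Returns (freqs, magnitudes), where magnitudes has shape
    (n_frames, frame_size // 2 + 1).
    """
    window = np.hanning(frame_size)  # taper each frame to reduce spectral leakage
    n_frames = 1 + (len(samples) - frame_size) // hop
    frames = np.stack([
        samples[i * hop : i * hop + frame_size] * window
        for i in range(n_frames)
    ])
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_size, d=1.0 / sample_rate)
    return freqs, magnitudes

# Synthetic stand-in for a .wav file: one second of a 440 Hz sine (the note A4).
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
samples = np.sin(2 * np.pi * 440.0 * t)

freqs, magnitudes = spectrogram(samples, sample_rate)

# Log scaling compresses the dynamic range before saving the array as an image.
log_spec = np.log1p(magnitudes)

# Frequency of the strongest bin, averaged over all frames.
peak_freq = freqs[np.argmax(magnitudes.mean(axis=0))]
```

The `log_spec` array is what would get saved as an image for the vision classifier. With these settings each frequency bin is roughly 10.8 Hz wide, which is coarser than the spacing between low notes, so that may be one reason image-based results suffer.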

I think I should just use the .wav data directly with PyTorch or fastai. I wanted to ask if some folks in the community have worked on something similar or want to pair up and tackle this problem together.
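In case it helps anyone who wants to jump in, here is a minimal, non-learned baseline that works on the raw samples: find the strongest frequency with an FFT and map it to a note name using the standard MIDI relation (A4 = 440 Hz = MIDI 69). This is only a sketch under assumptions: it uses NumPy, a synthetic sine instead of a real recording, and the helper names `dominant_frequency` and `frequency_to_note` are made up for illustration.

```python
import math
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest non-DC bin in the spectrum."""
    windowed = samples * np.hanning(len(samples))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    # Skip bin 0 (DC offset) when looking for the peak.
    return float(freqs[np.argmax(spectrum[1:]) + 1])

def frequency_to_note(freq_hz):
    """Map a frequency to the nearest equal-tempered note name, e.g. 'A4'."""
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"

# Synthetic stand-in for a recording: one second of a 440 Hz sine.
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
samples = np.sin(2 * np.pi * 440.0 * t)

freq = dominant_frequency(samples, sample_rate)
note = frequency_to_note(freq)
```

On real instruments the loudest peak is often a harmonic rather than the fundamental, so this would probably mislabel some notes. It is meant as a sanity check to compare a trained model against, not as a replacement for one.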

Here is a Kaggle notebook for reference: Experimenting with Classifying Notes | Kaggle

Happy coding!