A Guide to Converting Audio to Images


I’ve written up a to-the-point guide on how one can create spectrogram images from audio using PyTorch’s torchaudio library.

You can read the guide with the link below.

If you have any comments, questions, suggestions, feedback, criticisms, or corrections, please do let me know!


Nice work! I recently implemented this in a project of my own, and I wish this had been written before I started! I ended up going down the route of using Librosa, which is a bit more verbose. That said, it does allow for nicer visualisations out of the box.

I initially used Librosa too, but I had problems converting spectrograms to images since it took too long to save generated matplotlib graphs as images. After many hours of trying to figure out how to overcome this, and while using torchaudio, I figured out (thanks to ChatGPT) I could directly save an array/tensor as an image using PIL, and that was a facepalm moment. :smile:
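
For anyone landing here later, here is a minimal sketch of the direct-save idea above. It assumes the spectrogram is already a 2D NumPy array (a CPU torchaudio tensor can usually be converted with `.squeeze().numpy()`). The function name `spectrogram_to_image`, the random stand-in data, and the output file name are made up for illustration and are not taken from the guide.

```python
import os
import tempfile

import numpy as np
from PIL import Image


def spectrogram_to_image(spec: np.ndarray) -> Image.Image:
    """Scale a 2D array of spectrogram values to 0-255 and wrap it as a grayscale image."""
    spec = np.asarray(spec, dtype=np.float32)
    lo, hi = float(spec.min()), float(spec.max())
    # Guard against a constant array so we never divide by zero.
    scale = (hi - lo) if hi > lo else 1.0
    pixels = ((spec - lo) / scale * 255.0).astype(np.uint8)
    # Flip vertically so low frequencies end up at the bottom of the picture.
    return Image.fromarray(pixels[::-1].copy())


# Stand-in for a real (n_mels, time_frames) spectrogram.
rng = np.random.default_rng(0)
fake_spec = rng.random((128, 63))

img = spectrogram_to_image(fake_spec)
out_path = os.path.join(tempfile.gettempdir(), "spectrogram.png")  # hypothetical file name
img.save(out_path)  # no matplotlib figure involved
```

Skipping matplotlib means no axes, padding, or colour bar get rendered, which is likely where most of the saving time goes. The min-max scaling is just one possible choice; a fixed dB range would keep brightness comparable across clips.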

As far as I know, torchaudio isn’t built on top of Librosa, but its tensors convert easily to NumPy arrays, so you could potentially further extend what’s generated by torchaudio with Librosa.

Yeah, that makes sense. I’ve only been visualising them rather than actually saving them out as images, since I need to convert back from image to audio at some point and keeping them as a 2D array feels easiest.

Yeah, in that case it would be easier to keep them as arrays instead. The arrays can also still be saved to disk, which can save time and provide more flexibility (like making images at will).
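
A rough sketch of what saving the arrays to disk could look like with NumPy's `np.save` and `np.load`. The random array and the file name `clip_0001.npy` are placeholders, not anything from the guide.

```python
import os
import tempfile

import numpy as np

# Stand-in for a real (n_mels, time_frames) spectrogram.
rng = np.random.default_rng(0)
spec = rng.random((128, 63)).astype(np.float32)

path = os.path.join(tempfile.gettempdir(), "clip_0001.npy")  # hypothetical file name
np.save(path, spec)       # stores the values exactly, with no 8-bit quantisation
restored = np.load(path)  # same shape and dtype come back
```

Because the values round-trip exactly, this suits the image-to-audio case better than a PNG, where scaling to 0-255 throws information away. Images can still be generated from the saved arrays whenever they are needed.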

Hi guys, I tried to set up a pipeline with fastai for the BirdCLEF 2023 competition. I took great inspiration from your code, @ForBo7, to create the images, but used Librosa instead of torchaudio, borrowing some other code from a great public notebook.
I then implemented the training and inference phases, but for some reason I get a submission error.
Here is the link to the discussion on Kaggle: BirdCLEF 2023 | Kaggle
If some of the fastai folks can help me out with this, it would be much appreciated.


I don’t think I can provide much input; I didn’t do much for this competition, but I do remember the submission file being really finicky. I’ve looked at your sample submission file and I don’t seem to see anything incorrect. Perhaps there are some missing birds in the columns?

Hopefully somebody else can provide some input.

Thanks for the response. I think I am using the columns directly from the sample CSV, but there might be something related to missing birds, and I might be predicting NaN values somewhere in the CSV. I’ll have a look at it.
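
One way to run the two checks mentioned above (missing bird columns and NaN predictions) is sketched below using only the standard library. The column names, row ids, and the assumption that the first column is a row id are invented for the example; the real competition files may be laid out differently.

```python
import csv
import io
import math


def find_submission_problems(sample_csv: str, submission_csv: str) -> list[str]:
    """Compare a submission against the sample file and report column and NaN issues."""
    expected = next(csv.reader(io.StringIO(sample_csv)))
    rows = list(csv.reader(io.StringIO(submission_csv)))
    header, body = rows[0], rows[1:]
    problems = []
    if header != expected:
        missing = [c for c in expected if c not in header]
        problems.append(f"header mismatch, missing columns: {missing}")
    for line_no, row in enumerate(body, start=2):
        # The first column is assumed to be the row id, so skip it.
        for name, value in zip(header[1:], row[1:]):
            try:
                bad = math.isnan(float(value))
            except ValueError:
                bad = True  # empty or non-numeric cell
            if bad:
                problems.append(f"line {line_no}, column {name}: bad value {value!r}")
    return problems


# Tiny made-up files: the submission lacks birdB and contains one NaN.
sample = "row_id,birdA,birdB\nclip_5,0.5,0.5\n"
submission = "row_id,birdA\nclip_5,nan\nclip_10,0.3\n"
problems = find_submission_problems(sample, submission)
for p in problems:
    print(p)
```

With real files, the two strings would come from reading the sample and generated CSVs from disk. An empty result only means these two checks passed, not that the submission is otherwise valid.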

@ForBo7 thanks for the input. I think I was not properly handling the loading of all the test files, and I think I also fixed a minor seconds mismatch on the test images. Now I can submit and I have an end-to-end pipeline with fastai. Thanks again!

Very nice! :smile: