Hacking an MNIST tutorial to produce audio


I’ve written a bit about my one-month ride learning Machine Learning and FastAI.

I’m really interested in audio applications; I’m a signal/DSP hacker. But I found it hard to get started with audio in neural nets at first. This blog post is about how I hacked a tutorial on images to create sound without going through a spectrogram representation.

The TL;DR: you can train a VAE on raw audio samples reshaped into an image and get good-sounding results.
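To make the "audio formatted into an image" idea concrete, here's a minimal sketch of one way it could work. This is my illustration, not the post's actual code: it assumes fixed-length mono clips and a hypothetical 128x128 target size (16384 samples, about one second at 16 kHz), mapping the waveform into pixel-style intensities and back.

```python
import numpy as np

# Hypothetical setup: 128x128 "image" = 16384 raw samples per clip
# (~1 second of mono audio at 16 kHz). Sizes are illustrative.
SIDE = 128
N_SAMPLES = SIDE * SIDE

def audio_to_image(clip: np.ndarray) -> np.ndarray:
    """Reshape a 1-D float waveform in [-1, 1] into a (SIDE, SIDE)
    array in [0, 1], like pixel intensities an MNIST model expects."""
    clip = clip[:N_SAMPLES]
    clip = np.pad(clip, (0, N_SAMPLES - len(clip)))  # zero-pad short clips
    return (clip.reshape(SIDE, SIDE) + 1.0) / 2.0

def image_to_audio(img: np.ndarray) -> np.ndarray:
    """Invert the mapping: flatten the model's output back to a
    1-D waveform in [-1, 1]."""
    return img.reshape(-1) * 2.0 - 1.0

# Round-trip check on a 440 Hz sine wave.
t = np.arange(N_SAMPLES) / 16000.0
wave = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)
img = audio_to_image(wave)
recovered = image_to_audio(img)
```

With a transform like this, each clip becomes a single-channel image, so an off-the-shelf image VAE tutorial can train on it unchanged; decoding is just the inverse reshape.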

This is a pretty informal blog post, not a Towards Data Science entry.

I’d be interested in pushing the approach further, using a VQ-VAE, or trying diffusion with this method, since there’s a natural mapping between the approaches.

Pretty silly idea, but it works for some reason.