I’ve written a bit about my one-month ride with learning Machine Learning and FastAI.
I’m really interested in audio applications since I’m a signal/DSP hacker, but I found it hard to get started with audio in neural nets at first. This blog post is about how I hacked a tutorial on images to create sound without going through a spectrogram representation.
The TL;DR: you can train a VAE on audio samples formatted into an image and get quality-sounding results.
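To make the idea concrete, here’s a minimal sketch of what “formatting audio into an image” could look like: reshaping a fixed-length raw waveform into a 2D array so an image VAE can consume it directly, with no spectrogram in between. The clip length, the 128×128 side, and the [0, 1] normalization are my assumptions for illustration, not details from the post.

```python
import numpy as np

# Hypothetical sketch: treat a 16384-sample mono clip (128 * 128)
# as a square grayscale "image" for an off-the-shelf image VAE.
n_samples = 128 * 128
audio = np.random.uniform(-1, 1, n_samples).astype(np.float32)  # stand-in waveform

# 1D samples -> 2D "image", then map [-1, 1] -> [0, 1] like pixel values
img = audio.reshape(128, 128)
img = (img + 1.0) / 2.0

# inverse: after the VAE reconstructs the image, flatten back to audio
recon = img * 2.0 - 1.0
waveform = recon.reshape(-1)
```

The inverse mapping matters: because the transform is a plain reshape plus rescale, the VAE’s output image converts back into a playable waveform with no phase-reconstruction step.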
This is a pretty informal blog post, not a Towards Data Science entry.
I’d be interested in pushing the approach further, using a VQ-VAE, or trying diffusion with this method, since there’s a natural mapping between the approaches.
Pretty silly idea, but it works for some reason.