ResNet Equivalent for Speech Recognition


I have a project that is based around getting transcripts out of audio. I have no idea where to start, but I assumed that since pretrained networks like ResNet-50 exist, there must be something equivalent for audio.

I appreciate any hint or help on where to start.


I don’t think this is a ResNet equivalent for speech-to-text (STT), but it is a great project.

Thanks a lot. Do you know of any other open source projects?

More info here:

Thanks a lot, much appreciated.

Here are a few forum threads that may have some ideas that could help you.

This isn’t quite in a usable state yet, but it should give you a good step in the right direction!

