Fastai v2 audio

I don’t know if it’s correctly implemented - you’ll need to create tests to convince yourself of that.

RandTransform looks fine. Not sure why you’re passing as_item=True. Do you need that?

I don’t know exactly what you’re asking about TypeDispatch. What do you want to do? What have you tried? What happened when you tried? Please provide code for anything that didn’t work the way you hoped, as appropriate.

To make GPU transforms work on items or batches you need to use broadcasting carefully. There’s nothing fastai-specific about that. Using ellipses, e.g. t[...,c,x,y], can help. Otherwise, just create separate versions for each with different names.
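As a minimal sketch of the ellipsis idea, the functions below index and reduce over the trailing axes only, so the same code accepts a single item or a batch. NumPy is used here as a stand-in so the example runs without a GPU; PyTorch tensors accept the same `...` indexing. The names mask_freq_band and standardize, and the (channel, freq, time) layout, are assumptions for illustration and are not taken from fastai2_audio.

```python
import numpy as np

def mask_freq_band(spec, start, width):
    """Zero a band of frequency bins for one item or a whole batch.

    Assumes the last two axes are (freq, time); the ellipsis matches
    any leading axes (channel only, or batch and channel).
    """
    out = spec.copy()
    out[..., start:start + width, :] = 0.0
    return out

def standardize(spec, eps=1e-8):
    """Standardize each (freq, time) plane; keepdims lets the result broadcast."""
    mean = spec.mean(axis=(-2, -1), keepdims=True)
    std = spec.std(axis=(-2, -1), keepdims=True)
    return (spec - mean) / (std + eps)

item = np.arange(80, dtype=float).reshape(1, 8, 10)  # (channel, freq, time)
batch = np.stack([item, 2 * item])                   # (batch, channel, freq, time)

masked_item = mask_freq_band(item, start=2, width=3)
masked_batch = mask_freq_band(batch, start=2, width=3)

print(masked_item.shape, masked_batch.shape)
print(standardize(item).shape, standardize(batch).shape)
```

Because nothing in either function refers to a batch axis explicitly, there is no need for separate item and batch versions here.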

You should use a profiler to see where the time is being spent.
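As one way to act on the profiling advice, here is a minimal sketch using Python’s built-in cProfile and pstats modules. The functions slow_transform and pipeline are hypothetical stand-ins for real transform code, not anything from fastai2_audio. Note that this measures CPU-side time only; CUDA kernels launch asynchronously, so GPU transforms may need a GPU-aware profiler for an accurate picture.

```python
import cProfile
import io
import pstats

def slow_transform(n):
    # Hypothetical stand-in for an expensive transform step.
    return sum(i * i for i in range(n))

def pipeline():
    # Hypothetical stand-in for applying the transform to several items.
    return [slow_transform(50_000) for _ in range(20)]

profiler = cProfile.Profile()
profiler.enable()
result = pipeline()
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The functions at the top of the cumulative-time listing are the ones worth optimizing first.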

Thanks for the feedback, and I’m sorry, I should have made this post clearer. I haven’t had many eyes on the code, especially from people who are very comfortable with the v2 codebase and GPU transforms, so I was hoping for a quick look over the code to see if anything jumps out as either a bad practice or something that could be done better another way. I didn’t mean to ask “will this code work?” (we do have tests in place), but something more along the lines of “before we copy this pattern out for all the transforms we have, does everything look more or less okay?” I’ll try to be more explicit in the future about what would be helpful, and I’ll do my best to make sure all the needed info is there without overwhelming the post.

Sorry, on rereading, my question about TypeDispatch was extremely unclear. I was going to ask if combining separate signal and spectrogram transforms into one, when they do the same type of operation, was a good idea, but I’m sure it is and it’s a key reason you implemented it in Python in the first place, so I’ve got my answer :slight_smile:

Yes I think you could say def encodes(self, item:(AudioItem,AudioSpectrogram)) and then cover both with basically the same code. I just try refactorings and see whether what comes out is (to me) clearer, or less clear, than what I started with. And only keep it if I think it’s an improvement.
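To illustrate the tuple-annotation idea, here is a toy sketch in plain Python. It is not fastcore’s TypeDispatch implementation: AudioItem and AudioSpectrogram are hypothetical stand-ins (simple list subclasses), and ToyTransform and Reverse are made-up names. The pass-through for unmatched types is an assumption about the intended behaviour.

```python
import inspect

class AudioItem(list):
    """Hypothetical stand-in for a raw signal type."""

class AudioSpectrogram(list):
    """Hypothetical stand-in for a spectrogram type."""

class ToyTransform:
    """Toy dispatcher: runs `encodes` only on the types it is annotated with."""
    def __call__(self, item):
        ann = inspect.signature(self.encodes).parameters["item"].annotation
        if ann is inspect.Parameter.empty or isinstance(item, ann):
            return self.encodes(item)
        return item  # assumption: unmatched types pass through unchanged

class Reverse(ToyTransform):
    # One body covers both types because the operation is the same for each.
    def encodes(self, item: (AudioItem, AudioSpectrogram)):
        return type(item)(reversed(item))

tfm = Reverse()
sig = tfm(AudioItem([1, 2, 3]))
spec = tfm(AudioSpectrogram([4, 5, 6]))
other = tfm([7, 8, 9])  # plain list, not one of the annotated types
print(type(sig).__name__, sig)
print(type(spec).__name__, spec)
print(type(other).__name__, other)
```

If the two types later need different handling, the single method can be split back into two separately annotated ones.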

Nothing jumps out at me as obviously problematic with your code. But if there are design decisions that are going to end up appearing in lots of different transforms, you might want to think about ways to factor them out anyway, so you only have to change things in one place later if needed.

Personally, my approach to development is very iterative. Once I’ve built 3 transforms, for instance, I’ll go back and look for ways to share code and simplify them as a group. And I’ll keep doing that as I add more. I’m not very good at looking at a single piece of code and knowing whether it’ll end up a good pattern in the long run.


I’m working on a new audio project (bird calls) and hoping to use v2 audio for it. However, this thread appears to have gone quiet: is v2 still progressing in terms of audio? Or would it be better to go PyTorch native at this point?

Also, I followed the setup directions on Rbracco’s GitHub but see nothing about audio in the resulting folders. Are there updated config directions?

Thanks for any updates.


Hello! I’d love to get involved in this. Please let me know how/where I can help!

@MadeUpMasters can you add me to the Telegram chat for Audio ML and working on the library?

So happy to see this happening!

Hey Less, sorry, we are in a bit of a transition state, and that plus the holidays has made it appear quiet, but we are very much working on it.

V2 Updates are posted here

After fastai_dev became fastai2 (and a host of other repos), our fork moved to a new repo, fastai2_audio.

If you can tell me a bit more about your project (will it be for production? research? just you?) I’d be happy to give you an honest opinion about the best option for you. Each one has its drawbacks:

  • PyTorch only: you’ll have to do all your own preprocessing, spectrogram generation/cropping, and transforms.
  • Fastai audio v2: it will be changing a lot over the next 3 months, so I wouldn’t use it for anything you need to count on right now.
  • Fastai audio v1: it has nice features and documentation, and is stable, but some things don’t work well (inference and model export) and we aren’t doing a ton to support them at the moment.

Hope this rundown helps, let me know if there’s any way I can help you get started.



+1 for the Telegram chat. My username on Telegram is @madhavajay. Would it be best to start with audio v1 to do exploration on audio classification to determine the data quality and problem approach, or does v2 provide superior results and easier tooling?

Hi Rob,
Awesome thanks a ton for the feedback and update. Glad to see that audio work is still underway!

Re: project - it’s a prototype right now, with them planning to roll it out into commercial use next year if it goes well (for environmental monitoring, basically).

I did look at 1.0 and it looked good, but some of the stuff in the notebooks is broken (e.g. librosa has been updated and changed), so I was assuming 2.0 would likely be the way to go.

Thanks for the link to the v2 audio repo - now I can see the v2 audio work there, so that’s a big help.

Their timeline is somewhat flexible as they are still gathering field data to build out the datasets we’ll need, so my preference would be to work with 2.0 and ideally help contribute to it as v2 grows and this project grows.

I’ll set up with the v2 that’s there tomorrow though and try to get up to speed on it as it is for now, so thanks again!

Best regards,


Given those conditions I would recommend v1 for the time being.

Thanks, let us know how it goes. Things are quite messy at the moment (show batch is broken, and there’s a bug in autocompleting arguments to the AudioSpectrogram constructor where it doesn’t show all the available kwargs, which is annoying as there are a lot of them). These should be relatively easy fixes, but I won’t be back to working on this until Monday, Jan 6. We have a great group now and would love any feedback/contribution.


Hi Folks,

Not a code level question but one about the direction of the library.
Is the main goal of the library to classify discrete audio, e.g. single words or snippets (sound classification)? Or is it a more generalised ASR like Kaldi, where longer audio is processed?

Best regards,

Hey JP, we would like to do both, but discrete audio classification (acoustic scene recognition, voice recognition, command/word recognition) is the easier of the two and is already working. We aim to add support for full ASR using CTC-loss but haven’t actually integrated it yet.

As for Kaldi, torchaudio (the PyTorch audio library that we use in fastai audio) wraps Kaldi, so I think we can pretty easily access its functionality for things like audio alignment, but full grapheme/phoneme level ASR is a bigger leap. Hope that answers your question.


Hi folks. I’ve been away for a long time doing my own ML-audio work. I saw this thread come up near the top and decided to try fastai2_audio…
…but I’m getting an error from the tutorial notebook:

from fastprogress import progress_bar as pb

produces the error

ImportError                               Traceback (most recent call last)
<ipython-input-4-383e3a70bf7e> in <module>
----> 1 from fastprogress import progress_bar as pb

ImportError: cannot import name 'progress_bar' from 'fastprogress' (/home/shawley/anaconda3/envs/fastai2/lib/python3.7/site-packages/fastprogress/

I’m able to run the main fastai2 notebooks and see progress bars. It’s just this part of 02_tutorial.ipynb in fastai2_audio that’s producing this error.
Any suggestions?

Update: Seems that installing fastai2_audio broke my working fastai2 environment. I’d assumed fastai2_audio was an add-on to fastai2, but it seems to have grabbed different versions of packages that I already had, and replaced them, e.g. with pytorch=1.4 instead of the 1.3 I had.
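A quick way to see what an install left behind is to list the versions of the packages involved before and after. The sketch below uses the standard library’s importlib.metadata, which needs Python 3.8 or later; on the Python 3.7 environment shown in the traceback, `pip list` gives the same information. The package list is taken from names mentioned in this thread and is only an assumption about what is relevant.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names mentioned in this thread; adjust to your environment.
PACKAGES = ["torch", "torchvision", "torchaudio", "fastai2", "fastprogress"]

def installed_versions(names):
    """Return {name: version string, or None if the package is not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

versions = installed_versions(PACKAGES)
for name, ver in versions.items():
    print(f"{name:12s} {ver or 'not installed'}")
```

Comparing the output from before and after installing fastai2_audio shows exactly which packages were replaced.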

It’s done by the kind folks here (not me, I’ve just been following it) :slight_smile: If I had to guess, try updating fastprogress.

Hey Scott, sorry that happened and thanks for reporting it. I will look into why the two aren’t compatible and see how we can fix it, and if we can’t in the short run then I’ll at least notify people in the repo of the temporary incompatibility. (Sometimes we are beholden to certain versions of PyTorch due to our current reliance on torchaudio.) Thanks again.

@MadeUpMasters thanks. I’ll keep trying; it can easily just be user error.
My collaborator has a new idea for an alternative architecture to our SignalTrain model and I’m thinking of trying to put the new one in ‘Fastai’ form…both to make it more accessible to others (like you all) and so I can benefit from the collaborative tool-making of the FastAI community.
So in the coming weeks/months I may have a lot of questions about writing custom DataLoaders!

Hi Robert. Thank you for the response! Answers my questions perfectly! :slight_smile:


I installed fastai2 using

!git clone
!cd fastai2 && pip install fastai2


!git clone
!pip install packaging
!pip install -e

When I install torchaudio

!pip install torchaudio>=0.3

I get the following errors:

ERROR: torchvision 0.4.2 has requirement torch==1.3.1, but you’ll have torch 1.4.0 which is incompatible.
ERROR: fastai2 0.0.7 has requirement torch<1.4.0,>=1.2.0, but you’ll have torch 1.4.0 which is incompatible.

Should I install the libraries some other way?

[EDIT] I installed fastai2 in the following way, as recommended here

import os
!pip install -q fastai2 fastcore feather-format kornia pyarrow wandb nbdev fastprogress --upgrade
!pip install torchvision===0.4.2
!pip install Pillow==6.2.1 --upgrade
!pip install torch==1.3.1

Now I am getting this error

ERROR: torchaudio 0.4.0 has requirement torch==1.4.0, but you’ll have torch 1.3.1 which is incompatible
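The three pip messages above describe requirements that no single torch version can meet. The sketch below checks this with plain tuple comparison; it is a simplified illustration, not a full implementation of pip’s version rules, and the helper names (parse, unsatisfied) are made up for this example. The requirement table is copied from the error messages.

```python
import operator

# Only the comparison operators that appear in the error messages.
OPS = {"==": operator.eq, ">=": operator.ge, "<": operator.lt}

def parse(v):
    """Turn '1.3.1' into (1, 3, 1) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))

# torch requirements copied from the pip error messages.
REQUIREMENTS = {
    "torchvision 0.4.2": [("==", "1.3.1")],
    "fastai2 0.0.7": [(">=", "1.2.0"), ("<", "1.4.0")],
    "torchaudio 0.4.0": [("==", "1.4.0")],
}

def unsatisfied(torch_version):
    """List the packages whose torch requirement this version breaks."""
    tv = parse(torch_version)
    return [pkg for pkg, specs in REQUIREMENTS.items()
            if not all(OPS[op](tv, parse(v)) for op, v in specs)]

for candidate in ("1.3.1", "1.4.0"):
    print(candidate, "breaks:", unsatisfied(candidate))
```

Either choice of torch leaves at least one package unsatisfied, so with these exact releases the conflict can only be resolved by changing one of the other packages’ versions.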

Hi shruti_01, hope you are having a fun day! I am currently following the course “A walk with fastai2 - Study Group and Online Lectures Megathread”, run by muellerzr.

If you look at the start of the notebooks we do some uninstalling of libraries so that we can import the fastai2 modules successfully. It looks like you could be having a similar issue.

Maybe this image can help you resolve your issue.

Cheers mrfabulous1 :smiley: :smiley:

We’re working on a fix to this at the moment.

For now I believe there is a solution on the b-env-fix branch.

If you could share your colab notebook, I might be able to base a solution for you on what we have on this branch.

Thanks @baz

For now I am directly putting the fastai2_audio code in the colab notebook.