Fastai v2 audio

It’s done by the kind folks here (not me, I’ve just been following it) :slight_smile: If I had to guess, update fastprogress.

Hey Scott, sorry that happened and thanks for reporting it. I will look into why we don’t have compatibility and see how we can fix it, and if we can’t in the short run then I’ll at least notify people in the repo of the temporary incompatibility. (sometimes we are beholden to certain versions of pytorch due to our current reliance on torchaudio). Thanks again.

@MadeUpMasters thanks. I’ll keep trying; it can easily just be user error.
My collaborator has a new idea for an alternative architecture to our SignalTrain model, and I’m thinking of trying to put the new one in ‘Fastai’ form, both to make it more accessible to others (like you all) and so I can benefit from the collaborative tool-making of the fastai community.
So in the coming weeks/months I may have a lot of questions about writing custom DataLoaders!

Hi Robert. Thank you for the response! Answers my questions perfectly! :slight_smile:


I installed fastai2 using

!git clone
!cd fastai2 && pip install fastai2


!git clone
!pip install packaging
!pip install -e

When I install torchaudio

!pip install "torchaudio>=0.3"

I get the following errors:

ERROR: torchvision 0.4.2 has requirement torch==1.3.1, but you’ll have torch 1.4.0 which is incompatible.
ERROR: fastai2 0.0.7 has requirement torch<1.4.0,>=1.2.0, but you’ll have torch 1.4.0 which is incompatible.

Should I install the libraries some other way?
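When pip reports conflicts like these, it can help to first check which versions are actually installed in the runtime. A minimal sketch, assuming Python 3.8+ (where `importlib.metadata` is in the standard library); `installed_versions` is a hypothetical helper and the package list is just the one from this thread:

```python
# Sketch: report installed versions of the packages involved in the conflict.
# Assumes Python 3.8+; `installed_versions` is an illustrative helper.
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


packages = ["torch", "torchvision", "torchaudio", "fastai2", "fastcore"]
for name, ver in installed_versions(packages).items():
    print(f"{name:12} {ver or 'not installed'}")
```

Running it before and after each `pip install` makes it easy to see which command changed the torch version.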

[EDIT] I installed fastai2 in the following way, as recommended here:

import os
!pip install -q fastai2 fastcore feather-format kornia pyarrow wandb nbdev fastprogress --upgrade
!pip install torchvision==0.4.2
!pip install Pillow==6.2.1 --upgrade
!pip install torch==1.3.1

Now I am getting this error:

ERROR: torchaudio 0.4.0 has requirement torch==1.4.0, but you’ll have torch 1.3.1 which is incompatible
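The error messages in this post can be read as constraints on the torch version. The sketch below is a simplified checker that only handles plain X.Y.Z versions (pip itself implements the full PEP 440 rules through the `packaging` library); `parse` and `satisfies` are illustrative helpers. Feeding it the three requirements quoted above shows that no single torch version satisfies all of them:

```python
# Simplified sketch of pip-style requirement checking for plain X.Y.Z versions.
# Not pip's real logic: pre-releases and local labels (e.g. 1.5.0a0+abc) are unsupported.
import operator

OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def parse(version):
    """Turn '1.3.1' into (1, 3, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, spec):
    """Check a version against a comma-separated spec like '<1.4.0,>=1.2.0'."""
    for clause in spec.split(","):
        clause = clause.strip()
        # Try two-character operators first so '>=' is not read as '>'.
        for op in sorted(OPS, key=len, reverse=True):
            if clause.startswith(op):
                if not OPS[op](parse(version), parse(clause[len(op):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# Requirements on torch, copied from the error messages in this post.
requirements = {
    "torchvision 0.4.2": "==1.3.1",
    "fastai2 0.0.7": "<1.4.0,>=1.2.0",
    "torchaudio 0.4.0": "==1.4.0",
}
candidates = ["1.2.0", "1.3.1", "1.4.0"]

ok = [v for v in candidates
      if all(satisfies(v, spec) for spec in requirements.values())]
print(ok)  # []

without_ta = [v for v in candidates
              if all(satisfies(v, spec) for pkg, spec in requirements.items()
                     if pkg != "torchaudio 0.4.0")]
print(without_ta)  # ['1.3.1']
```

Dropping the torchaudio 0.4.0 requirement leaves torch 1.3.1 as a valid choice, which is consistent with the conflict coming from torchaudio 0.4.0's exact `torch==1.4.0` pin.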

Hi shruti_01, hope you are having a fun day! I am currently following the course A walk with fastai2 - Study Group and Online Lectures Megathread, run by muellerzr.

At the start of the notebooks we uninstall some libraries so that we can import the fastai2 modules successfully. It looks like you could be having a similar issue.

Maybe this image can help you resolve your issue.

Cheers mrfabulous1 :smiley: :smiley:

We’re working on a fix to this at the moment.

For now I believe there is a solution on the b-env-fix branch.

If you could share your Colab notebook, I might be able to base a solution for you on what we have on this branch.

Thanks @baz

For now, I am putting the fastai2_audio code directly in the Colab notebook.

I just updated the library so that it’s compatible with the latest fastai2 and Python 3.6, so it’s now possible to install it directly via pip:

pip install packaging
pip install git+

Colab is also supported, and I prepared an example notebook here.


Well done on all the work here, folks! I just watched @muellerzr’s run-through, and this lib looks super useful!

I’m hoping to use it in Kaggle’s deepfake competition, as some of the videos also have fake audio. Does anyone have suggestions on the easiest way to extract audio from mp4 files? And is there a preferred format I should save it in?

ffmpeg is a great tool for manipulating video and audio from the command line on Linux. The usage may look scary at first, but it’s very powerful. To extract the audio from a single video:

ffmpeg -i video.mp4 -vn -acodec pcm_s16le -ac 1 -ar 16000 out.wav

Here, we have:

  • -i video.mp4 is the input file;
  • -vn means no video output;
  • -acodec pcm_s16le is the audio codec (16-bit signed little-endian PCM);
  • -ac 1 sets a single channel (mono audio);
  • -ar 16000 is the sampling rate in Hz;
  • out.wav, the last argument, is the output file.
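The flag list maps directly onto an argument list if you want to drive ffmpeg from Python instead of the shell. A minimal sketch; `ffmpeg_extract_cmd` is a hypothetical helper, and actually running the command assumes ffmpeg is installed and on the PATH:

```python
# Sketch: build the single-file ffmpeg command as an argument list, one flag per line.
# `ffmpeg_extract_cmd` is an illustrative helper, not part of ffmpeg or fastai.


def ffmpeg_extract_cmd(video, out, sample_rate=16000, channels=1):
    return [
        "ffmpeg",
        "-i", str(video),         # input file
        "-vn",                    # no video output
        "-acodec", "pcm_s16le",   # 16-bit signed little-endian PCM audio
        "-ac", str(channels),     # number of audio channels (1 = mono)
        "-ar", str(sample_rate),  # sampling rate in Hz
        str(out),                 # output file
    ]


cmd = ffmpeg_extract_cmd("video.mp4", "out.wav")
print(" ".join(cmd))  # ffmpeg -i video.mp4 -vn -acodec pcm_s16le -ac 1 -ar 16000 out.wav
# To actually run it (requires ffmpeg on the PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing a list rather than a single string also avoids shell quoting problems with file names that contain spaces.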

If you search online, you’ll find posts listing the many different ways you can use ffmpeg, like this one. To process multiple files, it’s just a matter of using a bash loop:

for vid in *.mp4; do ffmpeg -i "$vid" -vn -acodec pcm_s16le -ac 1 -ar 16000 "${vid%.mp4}.wav"; done
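If you’d rather stay in Python (for example inside a notebook), the same loop can be written with `pathlib` and `subprocess`. This is a sketch, not part of any library: `extract_folder` and its `dry_run` flag are hypothetical, and converting files for real requires ffmpeg on the PATH:

```python
# Sketch: Python version of the bash loop, converting every .mp4 in a folder to .wav.
# `extract_folder` and `dry_run` are illustrative names, not a library API.
import subprocess
from pathlib import Path


def extract_folder(folder, sample_rate=16000, channels=1, dry_run=False):
    """Build (and unless dry_run, run) one ffmpeg command per .mp4 file."""
    commands = []
    for vid in sorted(Path(folder).glob("*.mp4")):
        wav = vid.with_suffix(".wav")  # same as "${vid%.mp4}.wav" in bash
        cmd = ["ffmpeg", "-i", str(vid), "-vn", "-acodec", "pcm_s16le",
               "-ac", str(channels), "-ar", str(sample_rate), str(wav)]
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise on the first ffmpeg failure
    return commands
```

Calling `extract_folder("videos", dry_run=True)` returns the commands without running them, which is a cheap way to check the file list before converting a large dataset.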

About the format, .wav with this codec is a common choice for audio data. The only parameters you may want to change are the number of channels (-ac 2 if you want stereo audio) and the sampling rate. For pure voice audio, 8 kHz (-ar 8000) should be enough, but if you have other sources of sound besides voice you may want to use 16 kHz (-ar 16000) or even 44.1 kHz (-ar 44100). These rates follow from the Nyquist theorem: a sampling rate of f_s can only represent frequencies up to f_s/2, so 8 kHz captures content up to 4 kHz, 16 kHz up to 8 kHz, and 44.1 kHz up to 22.05 kHz. Pick a rate at least twice the highest frequency present in your audio that you care about.
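To make the Nyquist point concrete, here is a small standard-library sketch; `nyquist` and `apparent_frequency` are illustrative helpers. It prints the highest representable frequency for each rate mentioned above and shows that, when sampled at 8 kHz, a 5 kHz sine produces exactly the negated samples of a 3 kHz sine, i.e. the 5 kHz content "folds back" (aliases) to 3 kHz:

```python
# Sketch of the Nyquist limit and aliasing, using only the standard library.
import math


def nyquist(fs):
    """Highest frequency (Hz) representable at sampling rate fs."""
    return fs / 2


def apparent_frequency(f, fs):
    """Frequency a pure tone at f Hz appears at after sampling at fs Hz."""
    return abs(f - fs * round(f / fs))


for rate in (8000, 16000, 44100):
    print(f"-ar {rate}: content up to {nyquist(rate):.0f} Hz")

fs = 8000
samples_5k = [math.sin(2 * math.pi * 5000 * n / fs) for n in range(16)]
samples_3k = [math.sin(2 * math.pi * 3000 * n / fs) for n in range(16)]
print(apparent_frequency(5000, fs))  # 3000
# Same samples up to a sign flip, so the two tones cannot be told apart in magnitude.
print(all(math.isclose(a, -b, abs_tol=1e-9) for a, b in zip(samples_5k, samples_3k)))  # True
```

This is why high-frequency artefacts are lost (or misplaced) at a low sampling rate, and why a higher rate is worth considering when those frequencies matter.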


Amazing, appreciate it! It’s only voice, although maybe I’ll use 16 kHz, because the goal is to identify fake/manipulated voice and some crazy artefacts might show up at frequencies an 8 kHz rate would miss… thanks again!

Hi all,

I thought I’d introduce myself after lurking for long enough! My background is in acoustic consultancy/engineering, and I’m making a career change towards ML. I’m currently doing the Udacity ML Engineer Nanodegree and will (hopefully) be going to Georgia Tech to start the OMSCS ML specialization later in the year.

First of all, I absolutely love the work you all have done. Machine listening is such a fascinating area, and I would love to contribute however I can. I also have a personal project on bird sound recognition for an area next to a national park in Colombia, near where I’m lucky enough to live (Bogotá), so I will have a play around with V2 and give feedback in due course. I used V1 late last year and it worked pretty well with mel-spectrograms on a dataset of xeno-canto recordings of 134 bird species, with quality ranging from excellent to pretty dodgy, so I’m excited to see how V2 does.

I would like to use the library for my Udacity Capstone project, would you recommend I stick with V1 for now or go ahead with V2?



Hi all, I’m having some trouble running my code on a Google TPU in a Colab notebook. I thought you might have more experience in this area, so I’m asking here.

I’m trying to run a PyTorch script that uses torchaudio on a Google TPU. To do this I’m using PyTorch/XLA, following this notebook; more specifically, I’m using this code cell to set up XLA:

!pip install torchaudio
import os
assert os.environ['COLAB_TPU_ADDR'], 'Make sure to select TPU from Edit > Notebook settings > Hardware accelerator'

VERSION = "20200220"  #@param ["20200220","nightly", "xrt==1.15.0"]
!curl -o
!python --version $VERSION

import torch

import torchaudio

import torch_xla

However, this is incompatible with the version of torchaudio that I need:

ERROR: torchaudio 0.4.0 has requirement torch==1.4.0, but you'll have torch 1.5.0a0+e95282a which is incompatible.

I couldn’t find anywhere how to install torch 1.4.0 together with PyTorch/XLA.

I tried using the nightly version of torchaudio, but that gives the following error:

!pip install torchaudio_nightly -f

import os
assert os.environ['COLAB_TPU_ADDR'], 'Make sure to select TPU from Edit > Notebook settings > Hardware accelerator'

VERSION = "20200220"  #@param ["20200220","nightly", "xrt==1.15.0"]
!curl -o
!python --version $VERSION

import torch
import torchaudio

import torch_xla
ImportError                               Traceback (most recent call last)
<ipython-input-2-968e9d93c06f> in <module>()
     10 import torch
---> 11 import torchaudio
     13 import torch_xla

/usr/local/lib/python3.6/dist-packages/torchaudio/ in <module>()
      4 import torch
----> 5 import _torch_sox
      7 from .version import __version__, git_version

ImportError: /usr/local/lib/python3.6/dist-packages/ undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_29E


So how would I go about loading the stable (1.4.0) version of PyTorch with XLA, or is there any other workaround for this situation?

Thanks a lot for your help!

Hi everyone,

I just came across this audio extension for fastai and I was amazed. I’m trying to write a naive app to distinguish between 2 data sources. The model trains well, thanks to the notebook provided on GitHub.

I’m trying to load a single wav file and get predictions but I’m doing something wrong here.

I created this single-file batch to get predictions, and the types of learner.x and my single input are the same.

But I’m getting the error below.

I don’t understand why the learner expects an AudioTensor, and why it can’t process the test file when I simply pass its path. I’m sure I’m missing a key part of the Data Block API here; please help.

Notebook here

Following up: I was able to create my sample as an AudioTensor, but the predict method still doesn’t work.

@muellerzr Maybe you can help; I took the AudioTensor creation part from your video tutorial. My apologies, I usually don’t at-mention anyone, but I’ve been fighting this for the last 6 hours and am going crazy. I also just found a similar thread, so I’m not sure whether it’s a fastai v2 issue.

Not a problem. Try upgrading fastcore, or try doing the dev installs:


pip install git+

(And repeat for fastcore)

Thanks for the quick response. I tried updating everything, but I still get exactly the same issue:

!pip install git+
!pip install git+
!pip install packaging
!pip install git+
Awesome, thank you for investigating :slight_smile: I’ll look into it tonight (as it seems like a fastai issue as a whole).

@PranY I looked into it, and I can predict normally with regular fastai data, so I think something was updated that needs to be adjusted in the audio library :slight_smile:

Thanks again. In case you want a reproducible check for the audio library, the notebook has all the components. I’ll keep looking on the audio side now.

Update: I checked the last 8 commits to fastcore and the error is not related to them. I’ll look back further; I think Sylvain will know how to fix this.