Fastai v2 audio

MuhammadAli · May 26, 2021, 3:01am

Hi, kindly use the following for installs:

!pip install torch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2

!pip install --upgrade git+https://github.com/fastaudio/fastaudio.git

And then restart the kernel.

glyphy · May 30, 2021, 2:17pm

Hi, is there a way to use different transformations for the training and validation sets with audio, similar to what can be done for images?
I tried to use Pipeline to combine two transformations, but I got a ValueError:

ValueError: too many values to unpack (expected 3)

class AlbumentationsTransform(RandTransform):
    "A transform handler for multiple `Albumentation` transforms"
    split_idx, order = None, 2

    def __init__(self, train_aug, valid_aug):
        store_attr()

    def before_call(self, b, split_idx):
        self.idx = split_idx

    def encodes(self, sig):
        if self.idx == 0:
            aug_audio = self.train_aug(sig)
        else:
            aug_audio = self.valid_aug(sig)
        return aug_audio

def get_train_aug():
    return Pipeline([a2s, MaskFreq(), MaskTime()])

def get_valid_aug():
    return Pipeline([a2s])

clck10 · June 1, 2021, 6:06pm

Semi-related, but should the SpecAugment Transforms in fastaudio be RandTransforms, instead of Transforms? It seems they are currently applied on the validation set as well. It has been some time since I looked into split_idx and Transform mapping, so fastai may be correctly handling this under the hood. Just want to be sure! Thanks

^ The same reasoning could apply to other Augments that are currently Transforms and not RandTransforms. But it is not as clear which should be one vs. the other and it is likely problem-specific, so maybe the flexibility is best kept.
Looks like someone recently brought this up as well:

https://github.com/fastaudio/fastaudio/issues/101
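For readers unfamiliar with the mechanism being asked about, here is a rough pure-Python sketch of the idea behind `split_idx` gating: a transform whose `split_idx` is 0 is skipped whenever it is called for another split, while one with `split_idx = None` runs everywhere. The classes `SplitAwareTransform`, `ToyMask` and `TrainOnlyMask` are hypothetical stand-ins written for this illustration. They are not fastai's or fastaudio's classes, and the real libraries do considerably more.

```python
import random


class SplitAwareTransform:
    """Toy model of a transform gated by split_idx (not fastai's class)."""

    # None: apply to every split; 0: training only; 1: validation only.
    split_idx = None

    def __call__(self, x, split_idx=None):
        # Skip the transform when it is bound to a different split.
        if self.split_idx is not None and split_idx != self.split_idx:
            return x
        return self.encodes(x)

    def encodes(self, x):
        raise NotImplementedError


class ToyMask(SplitAwareTransform):
    """Zeroes one random position; stands in for a SpecAugment-style mask."""

    def encodes(self, x):
        out = list(x)
        out[random.randrange(len(out))] = 0
        return out


class TrainOnlyMask(ToyMask):
    """Same mask, but restricted to the training split."""

    split_idx = 0


signal = [1, 2, 3, 4]
train_out = TrainOnlyMask()(signal, split_idx=0)     # masked
valid_out = TrainOnlyMask()(signal, split_idx=1)     # returned unchanged
valid_out_always = ToyMask()(signal, split_idx=1)    # masked on validation too
```

In this toy model, the behaviour described in the question (augmentation also hitting the validation set) corresponds to `ToyMask`, and the train-only behaviour corresponds to `TrainOnlyMask`.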

victor.velo · October 4, 2021, 1:45pm

Hello Robert,

I would like to experiment with the wav2vec features of the latest version of TorchAudio (0.9.x) from within FastAudio.

Could you please give some direction on how to upgrade FastAudio to support TorchAudio 0.9.x?

I’m pretty sure there is a reason why FastAudio is pinned to TorchAudio 0.8.0.
Thanks,
Victor

MadeUpMasters · October 5, 2021, 11:20am

Hey Victor,

Can you tell me more about what you’re trying to do? FastAudio is built for classification but not for speech recognition tasks. If you are working with speech I would recommend using thunder-speech which supports wav2vec2, or using torchaudio and pytorch directly.

victor.velo · October 5, 2021, 12:24pm

Hello Robert,

Yes, I’m doing audio classification with Fastaudio and it works very well. However, I also need to do some kind of “liftering” of the audio before classification, to validate that the audio actually meets some criteria. The task is just to classify 3 types of sound; everything else should be ignored.
In the test I did, if the sound is saturated, it is wrongly classified. The sounds are short (1 s).

Intuitively, I think wav2vec2 will help. (I have little experience in audio manipulation.)

But in general, is it difficult to update fastaudio to support torchaudio 0.9.x on Linux?

Thanks,

MadeUpMasters · October 6, 2021, 1:39pm

I haven’t touched the code base in a while, but if I remember correctly we pinned to 0.8.0 both to avoid future breaking changes and because that version of fastai insisted on a specific version of pytorch that wasn’t compatible with later torchaudio versions.

As for how difficult the update will be, I would recommend creating a new environment, stepping through, and seeing what breaks. Currently fastai supports pytorch 1.7+, and for torchaudio 0.9 you only need pytorch 1.4+. If you try this, I expect some fastai stuff will break, which would be a pain to debug/upgrade, and there may be some small fixes needed for torchaudio as well. Release notes help, and fastai has a Discord with an audio channel; I think you’re more likely to get responses there than here.

Good luck and happy to answer any questions you may have.
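When stepping through a fresh environment like this, it can help to first record which versions actually ended up installed. Below is a minimal sketch using only the standard library (`importlib.metadata`, so Python 3.8 or newer). The helper name `installed_versions` and the package list are assumptions made for illustration and are not part of fastaudio.

```python
from importlib import metadata

# Packages discussed in this thread; adjust the list to your environment.
PACKAGES = ["torch", "torchaudio", "torchvision", "fastai", "fastcore", "fastaudio"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(PACKAGES)
for name, version in report.items():
    print(f"{name:12s} {version or 'not installed'}")
```

Running this before and after each change makes it easier to tell which upgrade introduced a breakage.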

victor.velo · October 7, 2021, 12:38am

Hello Robert,

Thanks for the tips. I rebuilt fastaudio against the following versions, and the only thing I had to do was recreate the model.
fastaudio 1.0.2.post0.dev1+g3d6c0a0.dirty (edited setup.cfg and rebuilt)

install_requires =
    fastai>=2.5.0
    torchaudio>=0.9
    librosa==0.8
    colorednoise>=1.1
    IPython #Temporary remove the bound on IPython
    fastcore>=1.3.20

fastai 2.5.2
fastbook 0.0.18
fastcore 1.3.26
fastdownload 0.0.5
fastprogress 1.0.0
fastrelease 0.1.12

torch 1.9.1
torchaudio 0.9.1
torchvision 0.10.1

Thanks,

victor.velo · October 10, 2021, 3:26pm

Hello Robert,

One error I got after the upgrade is this:
dls.show_batch(max_n=3)
…
~/projects/torch_1.9.1/lib/python3.7/site-packages/fastaudio/core/spectrogram.py in __getattr__(self, name)
70 return self._settings[name]
71 raise AttributeError(
---> 72 f"{self.__class__.__name__} object has no attribute {name}"
73 )
74

AttributeError: AudioSpectrogram object has no attribute _settings

Thanks,
Victor

MadeUpMasters · October 12, 2021, 1:06am

Hey Victor,

Sorry, it’s been a long time since I’ve looked at the code and I’m having trouble following it. The error is occurring in this block of code:

def __getattr__(self, name):
    if name == "settings":
        return self._settings
    if not name.startswith("_"):
        return self._settings[name]
    raise AttributeError(
        f"{self.__class__.__name__} object has no attribute {name}"
    )

If you’re still stuck, can you share the full stack trace? Thanks
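For reference, the message itself can be reproduced outside fastaudio. Python only calls `__getattr__` when normal attribute lookup fails, so if an instance never had `_settings` set, reading `self._settings` inside the method re-enters `__getattr__` with the name `_settings` and ends at the final `raise`. The sketch below uses a hypothetical plain class, `SettingsHolder`, rather than fastaudio's tensor subclass. It only shows how the message arises and does not establish why `_settings` would be missing after the upgrade.

```python
class SettingsHolder:
    """Hypothetical plain class reusing the __getattr__ logic quoted above."""

    def __init__(self, settings=None):
        # Only store _settings when given, to mimic an instance that lacks it.
        if settings is not None:
            self._settings = settings

    def __getattr__(self, name):
        # Reached only when normal attribute lookup has already failed.
        if name == "settings":
            return self._settings
        if not name.startswith("_"):
            return self._settings[name]
        raise AttributeError(
            f"{self.__class__.__name__} object has no attribute {name}"
        )


ok = SettingsHolder({"sr": 16000})
sample_rate = ok.sr  # falls through to __getattr__, which reads _settings["sr"]

broken = SettingsHolder()
try:
    broken.sr  # __getattr__("sr") reads self._settings, which is missing
except AttributeError as err:
    message = str(err)

print(message)
```

Note that the reported attribute is `_settings` even though the attribute requested was `sr`, which matches the shape of the error in the traceback excerpt.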

victor.velo · October 12, 2021, 9:08pm

Hello Robert,

Here is the full stack trace. Would be nice to have it resolved.

========

AttributeError Traceback (most recent call last)
/tmp/ipykernel_2594/1652635938.py in <module>
----> 1 dls.show_batch(max_n=3)

~/projects/torch_1.9.1/lib/python3.7/site-packages/fastai/data/core.py in show_batch(self, b, max_n, ctxs, show, unique, **kwargs)
100 if b is None: b = self.one_batch()
101 if not show: return self._pre_show_batch(b, max_n=max_n)
--> 102 show_batch(*self._pre_show_batch(b, max_n=max_n), ctxs=ctxs, max_n=max_n, **kwargs)
103 if unique: self.get_idxs = old_get_idxs
104

~/projects/torch_1.9.1/lib/python3.7/site-packages/fastcore/dispatch.py in __call__(self, *args, **kwargs)
116 elif self.inst is not None: f = MethodType(f, self.inst)
117 elif self.owner is not None: f = MethodType(f, self.owner)
--> 118 return f(*args, **kwargs)
119
120 def __get__(self, inst, owner):

~/projects/fast-src/fastaudio/src/fastaudio/core/spectrogram.py in show_batch(x, y, samples, ctxs, max_n, nrows, ncols, figsize, **kwargs)
116 min(len(samples), max_n), nrows=nrows, ncols=ncols, figsize=figsize
117 )
--> 118 ctxs = show_batch[object](x, y, samples, ctxs=ctxs, max_n=max_n, **kwargs)
119 return ctxs
120

~/projects/torch_1.9.1/lib/python3.7/site-packages/fastai/data/core.py in show_batch(x, y, samples, ctxs, max_n, **kwargs)
16 else:
17 for i in range_of(samples[0]):
---> 18 ctxs = [b.show(ctx=c, **kwargs) for b,c,_ in zip(samples.itemgot(i),ctxs,range(max_n))]
19 return ctxs
20

~/projects/torch_1.9.1/lib/python3.7/site-packages/fastai/data/core.py in <listcomp>(.0)
16 else:
17 for i in range_of(samples[0]):
---> 18 ctxs = [b.show(ctx=c, **kwargs) for b,c,_ in zip(samples.itemgot(i),ctxs,range(max_n))]
19 return ctxs
20

~/projects/fast-src/fastaudio/src/fastaudio/core/spectrogram.py in show(self, ctx, ax, title, **kwargs)
75 def show(self, ctx=None, ax=None, title="", **kwargs):
76 "Show spectrogram using librosa"
---> 77 return show_spectrogram(self, ctx=ctx, ax=ax, title=title, **kwargs)
78
79

~/projects/fast-src/fastaudio/src/fastaudio/core/spectrogram.py in show_spectrogram(sg, title, ax, ctx, **kwargs)
87 ia = ax.inset_axes((i / sg.nchannels, 0.2, 1 / sg.nchannels, 0.7))
88 z = specshow(
---> 89 channel.cpu().numpy(), ax=ia, **sg._all_show_args(show_y=i == 0), **kwargs
90 )
91 ia.set_title(f"Channel {i}")

~/projects/fast-src/fastaudio/src/fastaudio/core/spectrogram.py in _all_show_args(self, show_y)
50 def _all_show_args(self, show_y: bool = True):
51 proper_kwargs = get_usable_kwargs(
---> 52 specshow, self._settings, exclude=["ax", "kwargs", "data"]
53 )
54 if "mel" not in self._settings or not show_y:

~/projects/fast-src/fastaudio/src/fastaudio/core/spectrogram.py in __getattr__(self, name)
70 return self._settings[name]
71 raise AttributeError(
---> 72 f"{self.__class__.__name__} object has no attribute {name}"
73 )
74

AttributeError: AudioSpectrogram object has no attribute _settings

========

Thanks,
Victor

MadeUpMasters · October 13, 2021, 9:09pm

Hey Victor, I’m sorry but I looked this over and I’m still not sure how the bug is getting triggered. I’m on my way out of town, but you may want to try asking in the Discord Group. Good luck and sorry I couldn’t be of more help.

victor.velo · October 13, 2021, 11:25pm

Thanks Robert.

gullabi · January 9, 2022, 1:34pm

Hi all! I am a data scientist/cofounder, and a deep learning practitioner. I mostly work on speech technologies, training models, pushing them to production and publishing whenever I can. But in order to really learn about deep learning fundamentals, I recently started getting into fastai.

Since I work a lot with audio and speech, I was both happy and excited to see that the audio part of the project is community driven. I was wondering what the development plans for fastai audio are and whether there are maintenance issues. I checked the issues, but they are mostly enhancement requests, and there have not been many commits in recent months.

Thanks!

MadeUpMasters · January 22, 2022, 10:07pm

Hey @gullabi, sorry for the delayed response. Fastaudio is not currently under active development. The original developers switched to mainly doing speech-to-text and text-to-speech, while fastaudio is focused on classification and isn’t suited for ASR/TTS. One of the original developers, @scart97, maintains a simple but awesome ASR library (GitHub - scart97/thunder-speech: A Hackable speech recognition library.). We also still have an audio machine learning Telegram group where there is very little chatter, but if you ask a question someone usually answers. Let me know if you’d like to join.

Right now I’d recommend maybe contributing to torchaudio. When we started fastaudio, audio ML was a pain and you had to do lots of stuff manually, so we tried to build that stuff so you wouldn’t have to be an audio expert to do ML in the domain, but torchaudio came around and built a lot of the same stuff (but more of it and a lot better). Hope this helps, take care.

baz · April 18, 2022, 8:28pm

Hi I’m Harry, I’m one of the creators of fastaudio (classification library) and have contributed to other audio libraries like pyannote-audio (speaker diarization).

Currently I work at a TTS company (sonantic.io) as a Research Engineer.

There are several audio chats / communities for different audio problems, and they are all quite small and quiet. For people who are interested in various audio applications, it’s also annoying to have to log in to so many chats.

I’ve created a Discord channel for all types of audio problems, to try to merge the communities a bit, since many of the things we work on are shared or similar when doing ML with audio.

Feel free to join

Machine Learning with Audio

sbavery · November 16, 2022, 4:50am

Hi all, I was interested in better understanding audio spectrogram vision learning, so I recreated some useful notebooks I found into a simple notebook using fastai v2:

https://github.com/sbavery/ml-examples/blob/main/nbs/02_classification_sound.ipynb

I hope this may be useful as a simple starting point or learning tool! It’s not currently using fastaudio, but I may look into adding that as well.

Maimonator · April 22, 2023, 6:57pm

Hey guys, it’s been a while since this library was maintained.
I wanted to try it out with the latest libraries, so I opened a PR to update the dependencies.
Not sure who is maintaining this library today, so I’m also writing here in case my PR reaches no one.

For those that are interested here’s a link

github.com/fastaudio/fastaudio

Update to fastai 2.7.12

fastaudio:master ← Maimonator:update-to-latest

opened 06:49PM - 22 Apr 23 UTC

Maimonator

+24 -14

Updated the following libraries:

```
fastai==2.7.12
torchaudio>=2.0.0
fastcore==1.5.29
librosa==0.10.0
```

I also ran the tests on my environment, which contains python 3.11.3 and the libraries listed below.
Note that I had to run `pip install --pre numba` to make `librosa==0.10.0` work with python 3.11.3.

```
alabaster==0.7.13 ansiwrap==0.8.4 anyio==3.6.2 argon2-cffi==21.3.0 argon2-cffi-bindings==21.2.0 arrow==1.2.3 asttokens==2.2.1 attrs==23.1.0 audioread==3.0.0 Babel==2.12.1 backcall==0.2.0 beautifulsoup4==4.12.2 black==23.3.0 bleach==6.0.0 blis==0.7.9 catalogue==2.0.8 certifi==2022.12.7 cffi==1.15.1 cfgv==3.3.1 charset-normalizer==3.1.0 click==8.1.3 cmake==3.26.3 colorama==0.4.6 colorednoise==2.1.0 comm==0.1.3 commonmark==0.9.1 confection==0.0.4 contourpy==1.0.7 coverage==7.2.3 cycler==0.11.0 cymem==2.0.7 debugpy==1.6.7 decorator==5.1.1 defusedxml==0.7.1 distlib==0.3.6 docutils==0.19 entrypoints==0.4 executing==1.2.0 fastai==2.7.12 fastcore==1.5.29 fastdownload==0.0.7 fastjsonschema==2.16.3 fastprogress==1.0.3 filelock==3.12.0 fonttools==4.39.3 fqdn==1.5.1 ghp-import==2.1.0 gitdb==4.0.10 GitPython==3.1.31 identify==2.5.22 idna==3.4 imagesize==1.4.1 iniconfig==2.0.0 ipykernel==6.22.0 ipython==8.12.0 ipython-genutils==0.2.0 ipywidgets==8.0.6 isoduration==20.11.0 jedi==0.18.2 Jinja2==3.1.2 joblib==1.2.0 jsonpointer==2.3 jsonschema==4.17.3 jupyter==1.0.0 jupyter-console==6.6.3 jupyter-events==0.6.3 jupyter_client==8.2.0 jupyter_core==5.3.0 jupyter_server==2.5.0 jupyter_server_terminals==0.4.4 jupyterlab-pygments==0.2.2 jupyterlab-widgets==3.0.7 kiwisolver==1.4.4 langcodes==3.3.0 lazy_loader==0.2 librosa==0.10.0 lit==16.0.1 llvmlite==0.40.0rc1 Markdown==3.3.7 MarkupSafe==2.1.2 matplotlib==3.7.1 matplotlib-inline==0.1.6 mergedeep==1.3.4 mistune==2.0.5 mkautodoc==0.2.0 mkdocs==1.4.2 mkdocs-material==9.1.7 mkdocs-material-extensions==1.1.1 mknotebooks==0.6.1 mpmath==1.3.0 msgpack==1.0.5 murmurhash==1.0.9 mypy-extensions==1.0.0 nbclassic==0.5.5 nbclient==0.7.3
nbconvert==7.3.1 nbformat==5.8.0 nest-asyncio==1.5.6 networkx==3.1 nodeenv==1.7.0 notebook==6.5.4 notebook_shim==0.2.2 numba==0.57.0rc1 numpy==1.24.2 nvidia-cublas-cu11==11.10.3.66 nvidia-cuda-cupti-cu11==11.7.101 nvidia-cuda-nvrtc-cu11==11.7.99 nvidia-cuda-runtime-cu11==11.7.99 nvidia-cudnn-cu11==8.5.0.96 nvidia-cufft-cu11==10.9.0.58 nvidia-curand-cu11==10.2.10.91 nvidia-cusolver-cu11==11.4.0.1 nvidia-cusparse-cu11==11.7.4.91 nvidia-nccl-cu11==2.14.3 nvidia-nvtx-cu11==11.7.91 packaging==23.1 pandas==2.0.0 pandocfilters==1.5.0 papermill==2.4.0 parso==0.8.3 pathspec==0.11.1 pathy==0.10.1 pexpect==4.8.0 pickleshare==0.7.5 Pillow==9.5.0 platformdirs==3.2.0 pluggy==1.0.0 pooch==1.7.0 pre-commit==3.2.2 preshed==3.0.8 prometheus-client==0.16.0 prompt-toolkit==3.0.38 psutil==5.9.5 ptyprocess==0.7.0 pure-eval==0.2.2 pycparser==2.21 pydantic==1.10.7 Pygments==2.15.1 pymdown-extensions==9.11 pyparsing==3.0.9 pyrsistent==0.19.3 pytest==7.3.1 pytest-cov==4.0.0 python-dateutil==2.8.2 python-json-logger==2.0.7 pytz==2023.3 PyYAML==6.0 pyyaml_env_tag==0.1 pyzmq==25.0.2 qtconsole==5.4.2 QtPy==2.3.1 recommonmark==0.7.1 regex==2023.3.23 requests==2.28.2 rfc3339-validator==0.1.4 rfc3986-validator==0.1.1 scikit-learn==1.2.2 scipy==1.10.1 Send2Trash==1.8.0 six==1.16.0 smart-open==6.3.0 smmap==5.0.0 sniffio==1.3.0 snowballstemmer==2.2.0 soundfile==0.12.1 soupsieve==2.4.1 soxr==0.3.5 spacy==3.5.2 spacy-legacy==3.0.12 spacy-loggers==1.0.4 Sphinx==6.1.3 sphinxcontrib-applehelp==1.0.4 sphinxcontrib-devhelp==1.0.2 sphinxcontrib-htmlhelp==2.0.1 sphinxcontrib-jsmath==1.0.1 sphinxcontrib-qthelp==1.0.3 sphinxcontrib-serializinghtml==1.1.5 srsly==2.4.6 stack-data==0.6.2 sympy==1.11.1 tenacity==8.2.2 terminado==0.17.1 textwrap3==0.9.2 thinc==8.1.9 threadpoolctl==3.1.0 tinycss2==1.2.1 torch==2.0.0 torchaudio==2.0.1 torchvision==0.15.1 tornado==6.3.1 tqdm==4.65.0 traitlets==5.9.0 triton==2.0.0 typer==0.7.0 typing_extensions==4.5.0 tzdata==2023.3 uri-template==1.2.0 urllib3==1.26.15
virtualenv==20.22.0 wasabi==1.1.1 watchdog==3.0.0 wcwidth==0.2.6 webcolors==1.13 webencodings==0.5.1 websocket-client==1.5.1 widgetsnbextension==4.0.7
```

UmerAdil · June 23, 2023, 8:55am

Hey Tal, thanks for that!

But wouldn’t it be better not to pin dependency versions, and to consistently use minimum versions instead? I.e., fastai>=2.7.0 instead of fastai==2.7.12.

Otherwise, the error you describe is bound to happen again.
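To make the difference concrete, here is a toy comparison of how a pinned requirement and a minimum-version requirement react to the next patch release. The helpers `parse` and `satisfies` are hypothetical, handle only plain numeric `x.y.z` versions with `==` or `>=`, and the version numbers other than those in this thread are made up for illustration. Real tooling should rely on pip's own resolver or the `packaging` library rather than anything like this.

```python
def parse(version):
    """Turn a plain numeric version such as '2.7.12' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def satisfies(installed, requirement):
    """Check an installed version against 'name==x.y.z' or 'name>=x.y.z'."""
    for op in ("==", ">="):
        if op in requirement:
            _, wanted = requirement.split(op)
            if op == "==":
                return parse(installed) == parse(wanted)
            return parse(installed) >= parse(wanted)
    raise ValueError(f"unsupported requirement: {requirement}")


# A hypothetical newer fastai release arriving after the pin was written.
newer_release = "2.7.13"
pinned_ok = satisfies(newer_release, "fastai==2.7.12")    # False: pin rejects it
minimum_ok = satisfies(newer_release, "fastai>=2.7.0")    # True: floor accepts it
print(pinned_ok, minimum_ok)
```

The pin protects against an upstream release breaking the library but forces a new release for every upstream bump, while the minimum version does the opposite.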

Maimonator · June 23, 2023, 12:15pm

Hey!
Generally I agree, but I just wanted this PR to be accepted, so I went by the old rules.
Also, I remember seeing a previous comment somewhere (maybe in a previous PR) saying it’s done that way so that fastai doesn’t occasionally break fastaudio.
But as you can see, my PR hasn’t been accepted yet, and it’s been a while, sadly.