Fastai v2 audio

Hi, kindly use the following for installs:

!pip install torch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2

!pip install --upgrade git+https://github.com/fastaudio/fastaudio.git

And then restart the kernel.

Hi, is there a way to apply different transformations to audio for training and validation, similar to what is done for images?
I tried to use Pipeline to compose two transformations, but I got a ValueError:

ValueError: too many values to unpack (expected 3)

class AlbumentationsTransform(RandTransform):
    "A transform handler for multiple `Albumentation` transforms"
    split_idx, order = None, 2

    def __init__(self, train_aug, valid_aug):
        store_attr()

    def before_call(self, b, split_idx):
        self.idx = split_idx

    def encodes(self, sig):
        if self.idx == 0:
            aug_audio = self.train_aug(sig)
        else:
            aug_audio = self.valid_aug(sig)
        return aug_audio


def get_train_aug():
    return Pipeline([a2s, MaskFreq(), MaskTime()])


def get_valid_aug():
    return Pipeline([a2s])

Semi-related, but should the SpecAugment Transforms in fastaudio be RandTransforms, instead of Transforms? It seems they are currently applied on the validation set as well. It has been some time since I looked into split_idx and Transform mapping, so fastai may be correctly handling this under the hood. Just want to be sure! Thanks

^ The same reasoning could apply to other Augments that are currently Transforms and not RandTransforms. But it is not as clear which should be one vs. the other and it is likely problem-specific, so maybe the flexibility is best kept.
Looks like someone recently brought this up as well:

https://github.com/fastaudio/fastaudio/issues/101
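
For anyone following along, here is a plain-Python sketch of the `split_idx` gating being discussed. It does not use fastai or fastaudio: `TrainOnlyMask` and its parameters are hypothetical names, and the rule it implements (skip the transform when the caller's `split_idx` differs from the transform's own, unless the latter is `None`) is my understanding of how fastcore behaves, so treat it as an assumption rather than the library's code.

```python
import random


class TrainOnlyMask:
    """Hypothetical stand-in for a split-aware augmentation (not fastaudio code)."""

    # Assumed convention: 0 = training set, 1 = validation set, None = both.
    split_idx = 0

    def __init__(self, width, seed=None):
        self.width = width
        self.rng = random.Random(seed)

    def __call__(self, frames, split_idx=None):
        out = list(frames)
        # Skip the augmentation when called for a different split.
        if self.split_idx is not None and split_idx != self.split_idx:
            return out
        # Zero out one random contiguous block, like a crude time mask.
        start = self.rng.randrange(max(1, len(out) - self.width + 1))
        n = min(self.width, len(out) - start)
        out[start:start + n] = [0.0] * n
        return out


signal = [1.0] * 10
mask = TrainOnlyMask(width=3, seed=0)
train_out = mask(signal, split_idx=0)  # masked
valid_out = mask(signal, split_idx=1)  # returned unchanged
```

Setting the class attribute to `None` in this sketch would apply the mask to both splits, which is the behaviour the question above is worried about.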

Hello Robert,

I would like to experiment with the wav2vec support in the latest version of torchaudio (0.9.x) from within FastAudio.

Could you please give some direction on how to upgrade FastAudio to support TorchAudio 0.9.x in order to achieve this?

I’m pretty sure there is a reason why FastAudio is pinned to TorchAudio 0.8.0.
Thanks,
Victor

Hey Victor,

Can you tell me more about what you’re trying to do? FastAudio is built for classification but not for speech recognition tasks. If you are working with speech I would recommend using thunder-speech which supports wav2vec2, or using torchaudio and pytorch directly.

Hello Robert,

Yes, I’m doing audio classification with Fastaudio and it works very well. But I also need to do some kind of “liftering” of the audio before the classification in order to validate that the audio meets some criteria. The task is just to classify 3 types of sound; everything else should be ignored.
In the test I did, if the sound is saturated it is wrongly classified. The sounds are short (1 s).

Intuitively, I think wav2vec2 will help. (I have little experience with audio manipulation.)

But in general, is it difficult to update fastaudio to support torchaudio 0.9.x on Linux?

Thanks,


I haven’t touched the code base in a while but if I remember correctly we pinned to 0.8.0 both to avoid future breaking changes, and because that version of fastai insisted on using a specific version of pytorch that wasn’t compatible with future torchaudio versions.

As for how difficult it will be to update, I would recommend creating a new environment, stepping through, and seeing what breaks. Currently fastai supports pytorch 1.7+, and for torchaudio 0.9 you only need pytorch 1.4+. If you try this I expect some fastai stuff will break, which would be a pain to debug/upgrade, and there may be some small fixes needed for torchaudio as well. Release notes help, and fastai has a Discord with an audio channel; I think you’re more likely to get responses there than here.
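
One cheap way to start "seeing what breaks" in a fresh environment is an import smoke test. The sketch below uses only the standard library; `try_imports` is a hypothetical helper name, and the module list at the bottom is just an example, not a statement of what fastaudio actually needs.

```python
import importlib


def try_imports(module_names):
    """Import each module and record 'ok' or the error that was raised."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = "ok"
        except Exception as exc:  # broad on purpose: any import-time failure counts
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # Example module list (an assumption); edit to match your environment.
    for name, status in try_imports(["torch", "torchaudio", "fastai", "fastaudio"]).items():
        print(f"{name:12s} {status}")
```

Anything that does not report `ok` is a natural first place to look in the release notes.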

Good luck and happy to answer any questions you may have.

Hello Robert,

Thanks for the tips. I rebuilt fastaudio against the following versions, and the only thing I had to do was recreate the model.
fastaudio 1.0.2.post0.dev1+g3d6c0a0.dirty (edited setup.cfg and rebuilt)

install_requires =
    fastai>=2.5.0
    torchaudio>=0.9
    librosa==0.8
    colorednoise>=1.1
    IPython  # Temporarily remove the bound on IPython
    fastcore>=1.3.20

fastai 2.5.2
fastbook 0.0.18
fastcore 1.3.26
fastdownload 0.0.5
fastprogress 1.0.0
fastrelease 0.1.12

torch 1.9.1
torchaudio 0.9.1
torchvision 0.10.1
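
A listing like the one above can be produced from inside the notebook rather than copied from `pip list`. This is a minimal sketch using `importlib.metadata` from the standard library (Python 3.8+, so an assumption for anyone still on 3.7); `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    packages = ["fastaudio", "fastai", "fastcore", "torch", "torchaudio", "torchvision"]
    for name, version in installed_versions(packages).items():
        print(name, version if version is not None else "(not installed)")
```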

Thanks,


Hello Robert,

One error I got after the upgrade is this:
dls.show_batch(max_n=3)
…
~/projects/torch_1.9.1/lib/python3.7/site-packages/fastaudio/core/spectrogram.py in __getattr__(self, name)
70 return self._settings[name]
71 raise AttributeError(
---> 72 f"{self.__class__.__name__} object has no attribute {name}"
73 )
74

AttributeError: AudioSpectrogram object has no attribute _settings

Thanks,
Victor

Hey Victor,

Sorry it’s been a long time since I’ve looked at the code and I’m having trouble following it. The error is occurring in this block of code

def __getattr__(self, name):
    if name == "settings":
        return self._settings
    if not name.startswith("_"):
        return self._settings[name]
    raise AttributeError(
        f"{self.__class__.__name__} object has no attribute {name}"
    )

If you’re still stuck, can you share the full stack trace? Thanks
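
For what it is worth, this exact message can be reproduced in plain Python whenever an instance exists without `_settings` ever having been set on it. The sketch below is not fastaudio code: `SettingsHolder` is a hypothetical class that copies only the `__getattr__` logic quoted above, and skipping `__init__` via `__new__` is just one assumed way such an instance could come about (whether that is what happens after the upgrade is not established here).

```python
class SettingsHolder:
    """Hypothetical class reusing the __getattr__ logic quoted above."""

    def __init__(self, **settings):
        self._settings = settings

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name == "settings":
            return self._settings
        if not name.startswith("_"):
            return self._settings[name]
        raise AttributeError(
            f"{self.__class__.__name__} object has no attribute {name}"
        )


def read_sr(obj):
    """Return obj.sr, or the error message if the lookup fails."""
    try:
        return obj.sr
    except AttributeError as exc:
        return str(exc)


normal = SettingsHolder(sr=16000)
bare = SettingsHolder.__new__(SettingsHolder)  # __init__ never ran, so no _settings

normal_result = read_sr(normal)
bare_result = read_sr(bare)
```

On the bare instance, looking up `sr` falls through to `__getattr__`, which reads `self._settings`; that lookup also fails, re-enters `__getattr__` with `_settings`, and hits the final `raise`. So the error names `_settings` even though the caller asked for something else.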

Hello Robert,

Here is the full stack trace. Would be nice to have it resolved.

========

AttributeError Traceback (most recent call last)
/tmp/ipykernel_2594/1652635938.py in <module>
----> 1 dls.show_batch(max_n=3)

~/projects/torch_1.9.1/lib/python3.7/site-packages/fastai/data/core.py in show_batch(self, b, max_n, ctxs, show, unique, **kwargs)
100 if b is None: b = self.one_batch()
101 if not show: return self._pre_show_batch(b, max_n=max_n)
--> 102 show_batch(*self._pre_show_batch(b, max_n=max_n), ctxs=ctxs, max_n=max_n, **kwargs)
103 if unique: self.get_idxs = old_get_idxs
104

~/projects/torch_1.9.1/lib/python3.7/site-packages/fastcore/dispatch.py in __call__(self, *args, **kwargs)
116 elif self.inst is not None: f = MethodType(f, self.inst)
117 elif self.owner is not None: f = MethodType(f, self.owner)
--> 118 return f(*args, **kwargs)
119
120 def __get__(self, inst, owner):

~/projects/fast-src/fastaudio/src/fastaudio/core/spectrogram.py in show_batch(x, y, samples, ctxs, max_n, nrows, ncols, figsize, **kwargs)
116 min(len(samples), max_n), nrows=nrows, ncols=ncols, figsize=figsize
117 )
--> 118 ctxs = show_batch[object](x, y, samples, ctxs=ctxs, max_n=max_n, **kwargs)
119 return ctxs
120

~/projects/torch_1.9.1/lib/python3.7/site-packages/fastai/data/core.py in show_batch(x, y, samples, ctxs, max_n, **kwargs)
16 else:
17 for i in range_of(samples[0]):
---> 18 ctxs = [b.show(ctx=c, **kwargs) for b,c,_ in zip(samples.itemgot(i),ctxs,range(max_n))]
19 return ctxs
20

~/projects/torch_1.9.1/lib/python3.7/site-packages/fastai/data/core.py in <listcomp>(.0)
16 else:
17 for i in range_of(samples[0]):
---> 18 ctxs = [b.show(ctx=c, **kwargs) for b,c,_ in zip(samples.itemgot(i),ctxs,range(max_n))]
19 return ctxs
20

~/projects/fast-src/fastaudio/src/fastaudio/core/spectrogram.py in show(self, ctx, ax, title, **kwargs)
75 def show(self, ctx=None, ax=None, title="", **kwargs):
76 "Show spectrogram using librosa"
---> 77 return show_spectrogram(self, ctx=ctx, ax=ax, title=title, **kwargs)
78
79

~/projects/fast-src/fastaudio/src/fastaudio/core/spectrogram.py in show_spectrogram(sg, title, ax, ctx, **kwargs)
87 ia = ax.inset_axes((i / sg.nchannels, 0.2, 1 / sg.nchannels, 0.7))
88 z = specshow(
---> 89 channel.cpu().numpy(), ax=ia, **sg._all_show_args(show_y=i == 0), **kwargs
90 )
91 ia.set_title(f"Channel {i}")

~/projects/fast-src/fastaudio/src/fastaudio/core/spectrogram.py in _all_show_args(self, show_y)
50 def _all_show_args(self, show_y: bool = True):
51 proper_kwargs = get_usable_kwargs(
---> 52 specshow, self._settings, exclude=["ax", "kwargs", "data"]
53 )
54 if "mel" not in self._settings or not show_y:

~/projects/fast-src/fastaudio/src/fastaudio/core/spectrogram.py in __getattr__(self, name)
70 return self._settings[name]
71 raise AttributeError(
---> 72 f"{self.__class__.__name__} object has no attribute {name}"
73 )
74

AttributeError: AudioSpectrogram object has no attribute _settings

========

Thanks,
Victor


Hey Victor, I’m sorry but I looked this over and I’m still not sure how the bug is getting triggered. I’m on my way out of town, but you may want to try asking in the Discord Group. Good luck and sorry I couldn’t be of more help.

Thanks Robert.

Hi all! I am a data scientist/cofounder, and a deep learning practitioner. I mostly work on speech technologies, training models, pushing them to production and publishing whenever I can. But in order to really learn about deep learning fundamentals, I recently started getting into fastai.

Since I work a lot with audio and speech, I was both happy and excited to see that the audio part of the project is community driven. I was wondering what the development plans for fastai audio are and whether there are maintenance issues. I checked the issues, but they are mostly enhancements, and there have not been many commits in recent months.

Thanks!

Hey @gullabi, sorry for the delayed response. Fastaudio is not currently under active development. The original developers switched to mainly doing speech-to-text and text-to-speech, while fastaudio is focused on classification and isn’t suited for ASR/TTS. One of the original developers, @scart97, maintains a simple but awesome ASR library, thunder-speech (scart97/thunder-speech on GitHub, "A Hackable speech recognition library"). We also still have an audio machine learning Telegram group where there is very little chatter, but if you ask a question someone usually answers. Let me know if you’d like to join.

Right now I’d recommend maybe contributing to torchaudio. When we started fastaudio, audio ML was a pain and you had to do lots of stuff manually, so we tried to build that stuff so you wouldn’t have to be an audio expert to do ML in the domain, but torchaudio came around and built a lot of the same stuff (but more of it and a lot better). Hope this helps, take care.


Hi I’m Harry, I’m one of the creators of fastaudio (classification library) and have contributed to other audio libraries like pyannote-audio (speaker diarization).

Currently I work at a TTS company (sonantic.io) as a Research Engineer.

There are several audio chats / communities for different audio problems and they are all quite small and quiet. For people who are interested in various audio applications it’s also annoying to have to log in to so many chats 🙂

I’ve created a Discord channel for all types of audio problems, to try to merge the communities a bit, since many of the things we work on are shared / similar when doing ML with audio.

Feel free to join 🙂

Machine Learning with Audio


Hi all, I was interested in better understanding audio spectrogram vision learning, so I recreated some useful notebooks I found into a simple notebook using fastai v2:

I hope this may be useful as a simple starting point or learning tool! It’s not currently using fastaudio, but I may look into adding that as well.


Hey guys, it’s been a while since this library was maintained.
I wanted to try it out with the latest libraries, so I opened a PR to update the dependencies.
Not sure who is maintaining this library today, so I’m also writing here in case my PR reaches no one.

For those that are interested here’s a link


Hey Tal, thanks for that!

But wouldn’t it be better not to pin dependency versions, and to consistently use minimum versions instead? I.e. fastai>=2.7.0 instead of fastai==2.7.12.

Otherwise, the error you describe is bound to happen again.
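
To make the difference concrete, here is a deliberately simplified sketch of how `==` and `>=` specifiers accept or reject an installed version. `parse` and `satisfies` are hypothetical helpers that only handle plain dotted integers; real installers follow PEP 440, which covers many more cases (pre-releases, local versions, other operators).

```python
def parse(version):
    """Turn '2.7.12' into (2, 7, 12); only plain dotted integers are supported."""
    return tuple(int(part) for part in version.split("."))


def satisfies(installed, spec):
    """Check an installed version against a '==X' or '>=X' specifier."""
    op, wanted = spec[:2], spec[2:]
    a, b = parse(installed), parse(wanted)
    # Pad with zeros so that 2.7 and 2.7.0 compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    if op == "==":
        return a == b
    if op == ">=":
        return a >= b
    raise ValueError(f"unsupported specifier: {spec}")


pinned_ok = satisfies("2.7.13", "==2.7.12")   # a newer patch release fails the pin
minimum_ok = satisfies("2.7.13", ">=2.7.0")   # but passes the minimum bound
```

The trade-off raised in the reply below still applies: a minimum bound accepts future releases, including ones that might break fastaudio.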


Hey!
Generally I agree, but I just wanted this PR to be accepted, so I went by the old rules.
Also, I remember seeing a previous comment somewhere (maybe in a previous PR) saying it’s done that way so that a new fastai release doesn’t unexpectedly break fastaudio.
But as you can see, my PR hasn’t been accepted yet, and it’s been a while, sadly 🙁
