Deep Learning with Audio Thread

Can anybody please help me set up the fastai audio module in Google Colab? I am new to this field and have already tried using !wget to get the files from GitHub, and also running install.sh, but it is throwing an error.

Can you attach the error log?

1 Like

I’ve made a colab setting up the audio module:

https://colab.research.google.com/drive/1s0Ouw5PxvrmHdm_gBU0qiA6piOf3VSWO

Might want to pin this for others @MadeUpMasters

3 Likes

@baz I am getting this error when running in Colab:

ImportError                               Traceback (most recent call last)
<ipython-input-5-310b7e31ad9a> in <module>()
      1 from exp.nb_AudioCommon import *
----> 2 from exp.nb_DataBlock import *
      3 import matplotlib.pyplot as plt
      4 import torch
      5 from fastai import *

/content/fastai-audio/exp/nb_DataBlock.py in <module>()
     15 from IPython.display import Audio
     16 import torchaudio
---> 17 from torchaudio import transforms
     18
     19 class AudioItem(ItemBase):

ImportError: cannot import name 'transforms'

Thanks i got what i need from @baz

I’m not getting any problems, and it should be the same on both machines. What is the output when you run the first cell?

I’ve been meaning to create a web app that runs on people’s phones and points to a server of the user’s choosing to send the recording to, but I was finding it hard to find code that worked on iOS. However, the search is over: https://kaliatech.github.io/web-audio-recording-tests/dist/#/

I’m going to fork this and create something simple that people can use on their phones/computers to test their models. Would people please do me a massive favour and confirm this works on their Android phones as I don’t have one :frowning:

Here is the initial very basic version. Working on iOS 12, Chrome on Mac.

Any suggestions for features?

I’m thinking:

  • Diagnostic for browser support
  • Remembers server url
  • Remembers results
  • Clear results

There is also an example of a simple Flask server that you can run with the app to predict results. However, beware of SSL problems: for this app to work on a phone, you’ll need your prediction server to have a trusted SSL certificate. Does anyone know a good way to do this locally?
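One option for local HTTPS is a tool like mkcert, which generates a certificate/key pair and installs a locally trusted CA. As a hedged sketch (the file names and `make_tls_context` helper below are assumptions, not part of the repo's server.py), the server could then be pointed at those files via a standard-library TLS context:

```python
# Sketch only: assumes you've generated a cert/key locally,
# e.g. with `mkcert localhost` (file names are whatever it produced).
import ssl

def make_tls_context(certfile="localhost.pem", keyfile="localhost-key.pem"):
    """Build a server-side TLS context from local cert files;
    raises FileNotFoundError immediately if the files aren't there."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile, keyfile)
    return ctx

# Flask's app.run accepts an SSLContext, e.g.:
# app.run(host="0.0.0.0", port=5000, ssl_context=make_tls_context())
```

Browsers will still reject a plain self-signed cert, which is why a locally installed CA (mkcert's approach) tends to be easier for phone testing than `ssl_context='adhoc'`.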

5 Likes

The first cell works fine, but when I run the code below

from exp.nb_DataBlock import *

I get the error mentioned above. The problem is that it is not able to import transforms from torchaudio. I even updated torchaudio but faced the same problem.
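For what it’s worth, a quick way to tell whether the installed torchaudio actually exposes a given submodule (rather than being a stale or partially built copy) is a small stdlib check like the one below; `module_attr_available` is a made-up helper name, nothing fastai-specific:

```python
import importlib
import importlib.util

def module_attr_available(mod_name, attr):
    """True iff `mod_name` imports cleanly and exposes `attr`."""
    if importlib.util.find_spec(mod_name) is None:
        return False  # not installed at all
    mod = importlib.import_module(mod_name)
    return hasattr(mod, attr)

# On a working install this should be True (assuming torchaudio's
# __init__ imports its transforms submodule, as recent builds do):
# module_attr_available("torchaudio", "transforms")
```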

I checked the functionality of all the blue buttons; they work on my Android Galaxy S7.

Thanks :slight_smile: Did you check to see whether the simple version I built works too?

Ok, I just checked. I “start” and “stop” recording, then I can play the recording back. But underneath the playback progress bar, a “network error” message appears, for some reason.

Exciting news, though. I also checked simple version on my laptop, which has been acting as though the microphone is not working. Your app worked, as before (including the network error message), but it finally verifies for me that my laptop’s internal mic is working!

That’s because the wav file isn’t being sent anywhere. You have to specify the server that is hosting your model. If you’re using the app from harryblum.co.uk, you’ll need to set up SSL too, otherwise most browsers will block the request.

I’ve created a Flask server in the repo (server.py), but I’m having problems loading learners with torchaudio. torchaudio seems to depend on a version of torch that doesn’t work with load_learner.

Has anyone managed to export an audio-related model, re-import it with load_learner, and then use it to make a prediction after wrapping the audio in an AudioItem?

Yep, I did this with basically no change from the part 1 lesson 2 example, it worked just fine. When I’m back at a computer I’ll make a gist and link it here. Have you tried it and had trouble?

Very cool that you’re working on this by the way, I always feel like actually using the model never sees enough love :slight_smile:

3 Likes

I was trying most of yesterday to get it to work. What version of pytorch and fastai were you using?

I’m afraid I’m not exactly sure anymore… it was an early “version” of fastai_audio, but I don’t believe the AudioItem has changed much.

Here’s the gist of the notebook where I used inference on an AudioItem with a Learner which used an AudioList as data:

What trouble were you having exactly?

2 Likes

Amazing thanks @ThomM. Just weird pytorch errors. I’ll try again today and post if I have any problems.
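Since version mismatches keep coming up in this thread, a small helper to report what’s actually installed might save a round trip; `installed_versions` is a made-up name and the default package list is just the ones discussed above:

```python
import importlib

def installed_versions(names=("torch", "torchaudio", "fastai")):
    """Map each package name to its __version__, 'unknown' if it
    defines none, or None if it isn't importable at all."""
    out = {}
    for name in names:
        try:
            mod = importlib.import_module(name)
            out[name] = getattr(mod, "__version__", "unknown")
        except ImportError:
            out[name] = None
    return out

# e.g. print(installed_versions()) in the first cell of the notebook
```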

I was literally just wondering whether it was possible to post notebooks as gists. I guess you don’t get to share the code that goes along with it, though.

Is this the preferred way to share notebooks?

I’ve managed to get a method to remove silence down to this speed on a 10s audio signal at 16K

It’s still a training speed bottleneck though.

Any suggestions on how to speed it up?

def chop_silence(signal, rate, thresholds=(-.3, .3), pad_ms=200):
    actual = signal.squeeze().numpy()
    padding = int(pad_ms / 1000 * rate)
    in_band = lambda l, u: (actual < u) & (actual > l)

    # True where the sample falls inside the silence band.
    if isinstance(thresholds, float):
        t = abs(thresholds)
        silent = in_band(-t, t)
    else:
        silent = in_band(*thresholds)

    # Boolean activity mask: True where the signal is non-silent
    # (np.bool is deprecated, so work with a plain bool array).
    a = ~silent

    # Dilate the active regions by roughly `padding` samples each side.
    for i in range(1, padding):
        a[:-i] |= a[i:]
        a[i:] |= a[:-i]

    # XOR with a shifted copy marks the edges of each active region;
    # np.split cuts the signal at those edges, and [1::2] keeps every
    # other chunk (the active ones when the signal starts silent).
    z = a.copy()
    z[1:] ^= z[:-1]
    ret = np.split(actual, np.where(z)[0])[1::2]
    return torch.tensor(ret[0])

1 Like

Yeah it’s pretty convenient. There’s a “gist-it” Jupyter extension which makes it 1-click. Sadly that extension isn’t working for me so I had to download the notebook, make a new gist manually, use .ipynb as the file extension, and copy-paste the contents of the file into the gist. The extension used to work and is much easier… once it’s there you can also (usually) pretty easily embed them into blog posts or whatever. AFAICT it should pretty-render in Discourse (this forum platform), I’m not sure why it’s not working.

If it’s already in a proper github repo it’s probably just as easy to share the link to the ipynb file in the repo. There’s also https://nbviewer.jupyter.org which is usually faster and prettier than github; you can paste a github file URL or even a gist ID and get a decent preview, eg.

But it looks like that still doesn’t embed nicely in Discourse, hmm, it’s a shame they don’t play well.

What exactly are you doing here, trying to trim silence from the start and end of a signal (but not the middle)?

Is there a reason you’re casting it back and forth from numpy? I’m not sure, and I’m not in front of a proper computer to test it out, but I would’ve thought it would probably be faster if you used pytorch ops; it spares you the conversion, and it could leverage the GPU if you give it a CUDA Tensor. It doesn’t look like you’re doing anything numpy-specific (could be wrong there). I don’t think your mask needs to be a function, you should be able to compare directly. If you don’t mind losing some silence in the middle, maybe you could use pytorch’s nonzero op (np has this too) which would let you “extract” the good parts.

I’m writing this on an iPad so this almost certainly won’t work but maybe you could do something like:

def tfm_random_cutout(signal, thresholds=(-.3, .3)):
    l, u = thresholds
    # Chained comparisons (l < signal < u) don't work on tensors, and we
    # want 1 *outside* the silence band, so build the mask explicitly:
    mask = ((signal <= l) | (signal >= u)).float()
    masked = signal * mask                     # zero out the silent areas
    return signal[masked.nonzero().squeeze()]  # keep only the non-zeros

And then pass the result of that to the PadTrim transform?

If you can’t lose the parts in the middle then it gets a bit trickier!

Also it turns out this problem (trimming leading & trailing silence) is called “alignment” in ASP, here’s a method posted earlier that might do the trick, then again pass the result to PadTrim to finalise? This library looks fairly full-featured, too:

1 Like