You should look into those videos
Thank you @pavlos. Even though Seth’s implementation is in keras/ tensorflow, it’s still useful, especially as I’m struggling with the Dataset/Dataloader part in pytorch.
Hi, can anybody help me out with Fastai Audio model testing on Gradio.
Currently I am facing the following error:
AssertionError: Expected an input of type in
- <class ‘pandas.core.series.Series’>
- <class ‘pathlib.PosixPath’>
- <class ‘fastaudio.core.signal.AudioTensor’>
but got <class ‘torch.Tensor’>
Thanks
Can you create a small sample block of code to test out? I haven’t used gradio before so I might not be helpful, but I’m willing to look at it if you have an easy to run chunk of code to tinker with.
Yes sure, thanks for taking time.
labels = learner_res34.dls.vocab

def predict_audio(audio):
    pred, pred_idx, probs = learner_res34.predict(audio[1])
    return {labels[i]: float(probs[i]) for i in range(len(labels))}

gr_interface = gr.Interface(
    fn=predict_audio,
    inputs=gr.inputs.Audio(source="upload", type="numpy"),
    # outputs=gr.outputs.Label(num_top_classes=len(labels)),
    outputs=gr.outputs.Label(num_top_classes=5),
    title="Audio Classification:",
    description='{}'.format(labels),
    # examples=[[examples_20_dir + x] for i, x in enumerate(os.listdir(examples_20_dir)) if i < 2],
    # examples_per_page=5,
    # embedding='default',
    interpretation="default")
gr_interface.launch(debug=True)
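The dictionary that predict_audio returns can be sanity-checked in isolation. A minimal sketch, with dummy stand-ins for learner_res34's vocab and probabilities (the label names and values here are made up for illustration):

```python
labels = ["cat", "dog", "bird"]   # stand-in for learner_res34.dls.vocab
probs = [0.7, 0.2, 0.1]           # stand-in for the learner's probability tensor

# Same comprehension as in predict_audio: map each vocab label to its probability
result = {labels[i]: float(probs[i]) for i in range(len(labels))}
print(result)  # {'cat': 0.7, 'dog': 0.2, 'bird': 0.1}
```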
What is being passed into audio? I think seeing where you build your dls would be useful as well
Actually, the learner can handle the preprocessing from a path.
But when I use "file" mode in Gradio, it gives me this error:
AssertionError: Expected an input of type in
- <class ‘pandas.core.series.Series’>
- <class ‘pathlib.PosixPath’>
- <class ‘fastaudio.core.signal.AudioTensor’>
but got <class ‘tempfile._TemporaryFileWrapper’>
AudioTensor.create expects a file. Is there any way to pass a numpy array to it?
That's what "numpy" mode returns.
Here are the Gradio Audio input docs:
If you look at the create function that AudioTensor uses, it looks like this:

def create(cls, fn, cache_folder=None, **kwargs):
    "Creates audio tensor from file"
    if cache_folder is not None:
        fn = cache_folder / fn.name
    sig, sr = torchaudio.load(fn, **kwargs)
    return cls(sig, sr=sr)
where cls in this case is going to be AudioTensor.
So you can get a sig and sr from the file and then pass it into AudioTensor like this:
AudioTensor(sig, sr=sr)
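If you stick with "file" mode instead, one workaround (a sketch, not tested against your setup) is to read the on-disk path off the temporary file object before loading, since torchaudio.load wants a path rather than the wrapper itself:

```python
import tempfile

# Gradio's "file" mode passes a tempfile._TemporaryFileWrapper, not a path.
# The wrapper's .name attribute holds the actual path of the temporary file.
tmp = tempfile.NamedTemporaryFile(suffix=".wav")
path = tmp.name  # a plain string path such as "/tmp/tmpXXXXXX.wav"

# Then (assuming torchaudio is installed):
#     sig, sr = torchaudio.load(path)
#     audio = AudioTensor(sig, sr=sr)
print(isinstance(path, str) and path.endswith(".wav"))  # True
```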
And after that, transforms like:

aud2spec = AudioToSpec.from_cfg(cfg)
item_tfms2 = [ResizeSignal(2000), aud2spec]

Will the learner be able to apply them?
I am not able to create AudioTensor this way
Sorry for taking up so much of your time, but I have been stuck on this for a long while now and just want to get past it.
AudioTensor doesn’t accept an argument called sig. If you remove sig=,
that should work.
Thanks, AudioTensor worked.
RuntimeError: Error opening <tempfile._TemporaryFileWrapper object at 0x7f136d49d8d0>: File contains data in an unknown format.
But I am still not able to load the file, with either torch.load or librosa.load.
Have you looked at their demos? Maybe there is some help there. I don’t know that you want to be using gr.inputs.Audio, but since I’m not familiar with the tool, I can’t be sure.
Yes, I have gone through them.
Their docs say:
- type (str) - Type of value to be returned by component. “numpy” returns a 2-set tuple with an integer sample_rate and the data numpy.array of shape (samples, 2), “file” returns a temporary file object whose path can be retrieved by file_obj.name, “mfcc” returns the mfcc coefficients of the input audio.
Now, "numpy" mode returns an "array of shape (samples, 2)", and I do not know how to handle that.
Once I have the numpy array handled, the rest of the pipeline works fine.
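For reference, a sketch of the conversion being assumed here: reshaping Gradio's (sample_rate, data) tuple into the (channels, samples) float layout that torchaudio-style code expects. The int16 dtype and 32768 scale factor are assumptions about Gradio's output, and the AudioTensor wrapping step is left as a comment:

```python
import numpy as np

def gradio_audio_to_sig(audio):
    """Convert a Gradio "numpy" audio value, a (sample_rate, data) tuple with
    data shaped (samples, channels), into a float32 array shaped
    (channels, samples), ready to wrap in an AudioTensor."""
    sr, data = audio
    if data.ndim == 1:               # mono input: add a channel axis
        data = data[:, None]
    sig = data.T.astype(np.float32)  # (samples, channels) -> (channels, samples)
    if data.dtype == np.int16:       # assumed: Gradio typically yields int16
        sig /= 32768.0               # rescale to [-1, 1]
    return sr, sig

# Example: 1 second of stereo silence at 16 kHz
sr, sig = gradio_audio_to_sig((16000, np.zeros((16000, 2), dtype=np.int16)))
print(sig.shape)  # (2, 16000)
# Then: audio = AudioTensor(torch.from_numpy(sig), sr=sr)
```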
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got -2)
Now I am able to parse the input, but the learner is not able to predict.
I’m not sure what’s causing that and would need more information to help debug it