You should look into those videos
Thank you @pavlos. Even though Seth’s implementation is in keras/ tensorflow, it’s still useful, especially as I’m struggling with the Dataset/Dataloader part in pytorch.
Hi, can anybody help me out with Fastai Audio model testing on Gradio.
Currently I am facing the following error:
AssertionError: Expected an input of type in
- <class ‘pandas.core.series.Series’>
- <class ‘pathlib.PosixPath’>
- <class ‘fastaudio.core.signal.AudioTensor’>
but got <class ‘torch.Tensor’>
Thanks
Can you create a small sample block of code to test out? I haven’t used gradio before so I might not be helpful, but I’m willing to look at it if you have an easy to run chunk of code to tinker with.
Yes sure, thanks for taking time.
labels = learner_res34.dls.vocab

def predict_audio(audio):
    pred, pred_idx, probs = learner_res34.predict(audio[1])
    return {labels[i]: float(probs[i]) for i in range(len(labels))}

gr_interface = gr.Interface(
    fn=predict_audio,
    inputs=gr.inputs.Audio(source="upload", type="numpy"),
    # outputs=gr.outputs.Label(num_top_classes=len(labels)),
    outputs=gr.outputs.Label(num_top_classes=5),
    title="Audio Classification:",
    description='{}'.format(labels),
    # examples=[[examples_20_dir + x] for i, x in enumerate(os.listdir(examples_20_dir)) if i < 2],
    # examples_per_page=5,
    # embedding='default',
    interpretation="default")
gr_interface.launch(debug=True)
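The dictionary that predict_audio returns can be sanity-checked in isolation. A minimal sketch, with dummy stand-ins for learner_res34's vocab and probabilities (the label names and values here are made up for illustration):

```python
labels = ["cat", "dog", "bird"]   # stand-in for learner_res34.dls.vocab
probs = [0.7, 0.2, 0.1]           # stand-in for the learner's probability tensor

# Same comprehension as in predict_audio: map each vocab label to its probability
result = {labels[i]: float(probs[i]) for i in range(len(labels))}
print(result)  # {'cat': 0.7, 'dog': 0.2, 'bird': 0.1}
```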
What is being passed into audio? I think seeing where you build your dls would be useful as well
Actually, the learner can handle the preprocessing from a path.
But when I use "file" mode in Gradio, it gives me this error:
AssertionError: Expected an input of type in
- <class ‘pandas.core.series.Series’>
- <class ‘pathlib.PosixPath’>
- <class ‘fastaudio.core.signal.AudioTensor’>
but got <class ‘tempfile._TemporaryFileWrapper’>
AudioTensor.create expects a file. Is there any way to pass a numpy array to it?
That's what "numpy" mode returns.
Here are the Gradio Audio input docs:
If you look at the create function that AudioTensor uses, it looks like this:

def create(cls, fn, cache_folder=None, **kwargs):
    "Creates audio tensor from file"
    if cache_folder is not None:
        fn = cache_folder / fn.name
    sig, sr = torchaudio.load(fn, **kwargs)
    return cls(sig, sr=sr)
where cls in this case is going to be AudioTensor.
So you can get a sig and sr from the file and then pass it into AudioTensor like this:
AudioTensor(sig, sr=sr)
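If you stick with "file" mode instead, one workaround (a sketch, not tested against your setup) is to read the on-disk path off the temporary file object before loading, since torchaudio.load wants a path rather than the wrapper itself:

```python
import tempfile

# Gradio's "file" mode passes a tempfile._TemporaryFileWrapper, not a path.
# The wrapper's .name attribute holds the actual path of the temporary file.
tmp = tempfile.NamedTemporaryFile(suffix=".wav")
path = tmp.name  # a plain string path such as "/tmp/tmpXXXXXX.wav"

# Then (assuming torchaudio is installed):
#     sig, sr = torchaudio.load(path)
#     audio = AudioTensor(sig, sr=sr)
print(isinstance(path, str) and path.endswith(".wav"))  # True
```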
And after that, transforms like:

aud2spec = AudioToSpec.from_cfg(cfg)
item_tfms2 = [ResizeSignal(2000), aud2spec]

Will the learner be able to apply them?
I am not able to create AudioTensor this way
Sorry for taking up so much of your time, but I have been stuck on this for a long while now and just want to get past it.
AudioTensor doesn’t accept an argument called sig. If you remove sig=,
that should work.
Thanks, AudioTensor worked.
RuntimeError: Error opening <tempfile._TemporaryFileWrapper object at 0x7f136d49d8d0>: File contains data in an unknown format.
But I am still not able to load the file, with either torch.load or librosa.load.
Have you looked at their demos? Maybe there is some help there. I don’t know that you want to be using gr.inputs.Audio, but since I’m not familiar with the tool, I can’t be sure.
Yes, I have gone through them.
Their docs say:
- type (str) - Type of value to be returned by component. “numpy” returns a 2-set tuple with an integer sample_rate and the data numpy.array of shape (samples, 2), “file” returns a temporary file object whose path can be retrieved by file_obj.name, “mfcc” returns the mfcc coefficients of the input audio.
Now, "numpy" mode returns an "array of shape (samples, 2)", and I do not know how to handle that.
Once I have the numpy array handled, the rest of the pipeline works fine.
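For reference, a sketch of the conversion being assumed here: reshaping Gradio's (sample_rate, data) tuple into the (channels, samples) float layout that torchaudio-style code expects. The int16 dtype and 32768 scale factor are assumptions about Gradio's output, and the AudioTensor wrapping step is left as a comment:

```python
import numpy as np

def gradio_audio_to_sig(audio):
    """Convert a Gradio "numpy" audio value, a (sample_rate, data) tuple with
    data shaped (samples, channels), into a float32 array shaped
    (channels, samples), ready to wrap in an AudioTensor."""
    sr, data = audio
    if data.ndim == 1:               # mono input: add a channel axis
        data = data[:, None]
    sig = data.T.astype(np.float32)  # (samples, channels) -> (channels, samples)
    if data.dtype == np.int16:       # assumed: Gradio typically yields int16
        sig /= 32768.0               # rescale to [-1, 1]
    return sr, sig

# Example: 1 second of stereo silence at 16 kHz
sr, sig = gradio_audio_to_sig((16000, np.zeros((16000, 2), dtype=np.int16)))
print(sig.shape)  # (2, 16000)
# Then: audio = AudioTensor(torch.from_numpy(sig), sr=sr)
```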
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got -2)
Now I am able to parse the input, but the learner is not able to predict.
I’m not sure what’s causing that and would need more information to help debug it