Hello, I started my first lesson of DL for Coders this week. I'm working on a multilabel classification problem, but I've been having a really hard time opening my dataset with the `from_` methods. I'm using the fastai_audio library; the paths are correct, but it doesn't load any items. I'm probably missing something. Thank you.
@baz Thanks for the response. I changed my folder structure and it works correctly now, although it should have been working before as well. Anyway, I'd like to know how you deal with real songs, where each audio clip is longer than a few seconds. Preprocessing and training are taking a long time even with a small dataset. What do you recommend? I got better performance using max_to_pad, but I'm not sure it's the best option. Thank you.
It depends on where the script is running. Printing the items in `AudioList.__init__` might help.
Update to my last response: after running some tests I realized it didn't take that long; it was only the first epoch. I'm currently using only the duration parameter. Thank you.
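For reference, the effect of fixing every clip to the same length (what a max_to_pad or duration setting achieves) can be sketched in plain NumPy. The function name here is my own illustration, not part of fastai_audio:

```python
import numpy as np

def pad_or_truncate(signal: np.ndarray, target_len: int) -> np.ndarray:
    """Force a 1-D audio signal to a fixed length by zero-padding or truncating."""
    if len(signal) >= target_len:
        return signal[:target_len]
    padded = np.zeros(target_len, dtype=signal.dtype)
    padded[:len(signal)] = signal
    return padded

# Example: clip everything to 4 seconds at 16 kHz
sr, seconds = 16000, 4
clip = np.random.randn(sr * 10)              # a 10-second clip
fixed = pad_or_truncate(clip, sr * seconds)  # truncated to 64000 samples
print(fixed.shape)
```

Truncating keeps training fast but discards the rest of the song; padding wastes compute on silence, which is why fixed lengths suit short clips better than full tracks.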
Hello! I am training a CNN model to extract vocals from audio. I used the MUSDB18 dataset for training. The problem is that when I try to evaluate the results with museval, I get negative SDR, ISR, and SAR scores. I thought the problem was how I invert the signal with the ISTFT (I used the librosa and SciPy libraries), but I couldn't figure it out. It's really strange because the results sound very good just by listening to them. Does anyone have any idea? Thanks!
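One sanity check worth running: verify that your STFT/ISTFT pair reconstructs a known signal exactly. Mismatched window, hop, or length parameters silently shift or scale the output, and SDR-style metrics are very sensitive to misalignment even when the audio sounds fine. A minimal round-trip test with SciPy (the parameter values here are just an example; use whatever your model pipeline uses):

```python
import numpy as np
from scipy.signal import stft, istft

# A one-second 440 Hz test tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)

# Forward and inverse transforms must use identical parameters
f, frames, Z = stft(x, fs=sr, nperseg=1024)
_, x_rec = istft(Z, fs=sr, nperseg=1024)

# Trim to the original length and measure reconstruction error
x_rec = x_rec[:len(x)]
err = np.max(np.abs(x - x_rec))
print(err)  # near-zero if the transform pair is consistent
```

If the round trip is clean but museval still returns negative scores, check that the estimate and reference arrays are the same length, the same sample rate, and time-aligned before evaluation.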
Hey all, sorry I haven’t been contributing to this thread much. I’ve still been working on audio, specifically speech recognition, and have learned a lot of great tricks for training models. I plan to put some work into this thread updating the wiki with more resources and sharing what I’ve learned. Here’s some of what I hope to add:
- Sorting your first epoch by audio length (SortaGrad) and how to implement it
- How to use a SortishSampler (trademark Jeremy Howard) to group audio clips by length for efficient GPU utilization, without having the same items together in every batch
- Training with CTC loss, how it plateaus, and why it sometimes fails to converge
- Experiments on warming up speech models on short labels until they converge, then training them on full datasets.
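As a rough illustration of the sortish idea (my own stdlib sketch, not fastai's actual `SortishSampler`): shuffle first, then sort by length only within large chunks, so each batch holds similar-length clips while batch membership still varies from epoch to epoch.

```python
import random

def sortish_batches(lengths, batch_size, chunk_mult=50, seed=0):
    """Yield index batches where items in a batch have similar lengths,
    but batch membership changes with the seed (e.g. the epoch number)."""
    rng = random.Random(seed)
    idxs = list(range(len(lengths)))
    rng.shuffle(idxs)                       # randomize first...
    chunk = batch_size * chunk_mult
    batches = []
    for start in range(0, len(idxs), chunk):
        # ...then sort by length only within each large chunk
        part = sorted(idxs[start:start + chunk], key=lambda i: lengths[i])
        batches += [part[j:j + batch_size] for j in range(0, len(part), batch_size)]
    rng.shuffle(batches)                    # avoid always feeding shortest first
    return batches

lengths = [random.randrange(1, 500) for _ in range(1000)]
batches = sortish_batches(lengths, batch_size=32)
print(len(batches))  # ceil(1000 / 32) = 32 batches
```

Grouping by length means far less padding per batch, which is where the GPU-utilization win comes from; the chunked shuffle is what keeps the batches from being identical every epoch.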
I'm a newbie in deep learning, let alone audio… but I have a small question.
I'm trying to make sense of using an MLP for speech feature extraction, and I'm a bit confused by the names of the parameters.
Could you explain the difference between ‘control parameters’, ‘trainable parameters’, and ‘learning parameters’?
Hey @ahammami0, can you share where you saw these terms? I'm not completely familiar with them, but I can take a look and hopefully explain afterwards. In deep learning we generally differentiate between 'trainable parameters' and 'untrainable parameters', the latter more commonly called 'hyperparameters'.
Trainable parameters are just what they sound like: parameters that are updated during training to lower the loss and move closer to a good solution. Hyperparameters are chosen in advance and aren't learned by the model through backpropagation, such as the learning rate, the number of mel bins in your mel spectrogram, etc.
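To make the distinction concrete, here is a tiny NumPy sketch (my own toy illustration): `w` is a trainable parameter updated by gradient descent, while the learning rate `lr` is a hyperparameter fixed in advance and never touched by training.

```python
import numpy as np

# Toy data for the problem y = 3x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

w = 0.0      # trainable parameter: updated by the training loop
lr = 0.01    # hyperparameter: chosen in advance, never updated by the model

for _ in range(500):
    pred = w * x
    grad = 2 * np.mean((pred - y) * x)  # d(MSE)/dw
    w -= lr * grad                      # gradient descent step

print(round(w, 3))  # converges towards 3.0
```

Changing `lr` changes how training behaves (too large and `w` diverges, too small and it crawls), but nothing in the loop ever updates `lr` itself; that is the practical difference between the two kinds of parameters.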
Anyway, welcome to audio DL, and please reach out if you have any questions.
Hey @MadeUpMasters, thanks a lot for the support. I actually cleared up my own confusion: 'control parameters' is the term used for the acoustic parameters of the speech signal during the speech feature extraction phase.
Much appreciated.
I'm a software engineer but a newbie in the ML/DL world. I have a non-English STT project that I'm currently tinkering with.
I'm trying to use the fastai DataBlock API to prepare a learner with an ImageBlock (spectrograms) and a TextBlock, using the model from deepspeech.pytorch. Unfortunately, I haven't really been able to get things going yet.
Do you folks recommend that I try the fastai-audio library instead?
Dear audio peeps,
I'm diving into pathological speech processing and trying to wrap my head around feature extraction and audio pre-processing. For this, I'd like to know what you believe are the best tools, methods, and approaches, and the best practices applied in this domain.