Making an audio dataset: slicing and labeling

I have created several model projects with fastai to implement topics 2 and 3 of the course. I obtained the image datasets with fastai's ddg_search_images function. They were small projects, but they were useful for learning. Now I want to do something more ambitious: an application that can recognize, from an audio recording, the type of scale it uses, its instrumentation, its most used melodic patterns and its tonality, among other things. For this I want to create a well-labeled dataset so that the model can be trained reliably.
The general steps I plan to follow to build a dataset for identifying the musical scale of a piece of music are:

Data collection:

Find pieces of music in a variety of scales and genres, using existing music files or my own recordings.
Make sure each piece is correctly labeled with the musical scale it is in.

Splitting audio tracks:

Divide each piece of music into fragments of several seconds. Audio editing software can be used for this. Label each fragment with the corresponding scale.
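As a rough sketch of how the splitting step could be scripted instead of done by hand, the snippet below cuts one WAV file into fixed-length fragments using only Python's standard wave module. The function name split_wav, the 5-second default and the file naming scheme are my own assumptions, and it assumes the whole piece carries a single scale label.

```python
import wave
from pathlib import Path


def split_wav(src, out_dir, label, seconds=5.0):
    """Cut `src` into consecutive fragments of `seconds` length.

    Each fragment is written to `out_dir` as <label>_<stem>_<index>.wav, so
    the label travels with the file name. A trailing remainder shorter than
    `seconds` is dropped.
    """
    src, out_dir = Path(src), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames_per_fragment = int(seconds * reader.getframerate())
        n_fragments = reader.getnframes() // frames_per_fragment
        for index in range(n_fragments):
            frames = reader.readframes(frames_per_fragment)
            target = out_dir / f"{label}_{src.stem}_{index:03d}.wav"
            with wave.open(str(target), "wb") as writer:
                writer.setparams(params)  # same channels, width and rate
                writer.writeframes(frames)  # frame count is fixed on close
            written.append(target)
    return written
```

Calling split_wav("piece.wav", "fragments", "c_major") would then produce fragments/c_major_piece_000.wav, fragments/c_major_piece_001.wav and so on. Pieces that change scale part-way through would need per-region labels rather than one label per file.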

Conversion to digital format:

Save the audio fragments in one consistent digital format, such as WAV or MP3, so that they can be processed by machine learning algorithms.

Feature extraction:

I started by collecting the files from a YouTube playlist. With a very simple Python script I downloaded 37 good-quality recordings of Mozart pieces.
So far so good. Then I opened the first of the 37 files in Audacity, divided it into 6 labeled pieces, and exported them in MP3 format to a folder on a hard disk. My question is whether there is a more automatic and fluid way to cut and tag than going piece by piece and file by file. With the little time I have, cutting, tagging and preparing all the audio this way will take me more than a month.
I would very much welcome any suggestions or advice from any of you.
Best regards!!!
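
One possible way to speed up the cutting and tagging described above: Audacity can export a label track as a tab-separated text file with one "start, end, label" line per region (times in seconds). A script can then do the cutting and naming for every file. The sketch below assumes the audio has been converted to WAV first and that the labels were exported with a dot as the decimal separator. The function names and the file naming scheme are hypothetical.

```python
import wave
from pathlib import Path


def read_labels(label_file):
    """Parse an Audacity label export: 'start<TAB>end<TAB>label' per line."""
    regions = []
    for line in Path(label_file).read_text(encoding="utf-8").splitlines():
        parts = line.split("\t")
        if len(parts) < 3 or line.startswith("\\"):
            continue  # skip blank lines and extra frequency-range lines
        regions.append((float(parts[0]), float(parts[1]), parts[2].strip()))
    return regions


def cut_by_labels(wav_file, label_file, out_dir):
    """Write one WAV fragment per labelled region, named after its label."""
    wav_file, out_dir = Path(wav_file), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(wav_file), "rb") as reader:
        params = reader.getparams()
        rate = reader.getframerate()
        for index, (start, end, label) in enumerate(read_labels(label_file)):
            reader.setpos(int(start * rate))  # jump to the region start
            frames = reader.readframes(int((end - start) * rate))
            safe = label.replace(" ", "_")
            target = out_dir / f"{safe}_{wav_file.stem}_{index:03d}.wav"
            with wave.open(str(target), "wb") as writer:
                writer.setparams(params)
                writer.writeframes(frames)
            written.append(target)
    return written
```

With this, the manual work per file shrinks to marking the regions in a label track and exporting it once. The marking itself still has to be done by ear.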

Hi Silvino

A totally different idea to start with. There are MIDI files of music available on the Internet, and from the MIDI bytes you know the instruments, the notes and the speeds. These files are effectively “labelled”, so you could train your deep learning model on them. You should be able to guess the scale from collections of MIDI notes and “somehow” detect when the key changes.
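
To make the "guess the scale" part concrete, here is a deliberately simple sketch. It assumes the MIDI note numbers have already been read from the file with some MIDI library (that part is not shown), and it only considers the 12 major keys, picking the one whose scale covers the most notes. The function name guess_major_key is made up for illustration. This is not a full key-finding algorithm.

```python
from collections import Counter

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = {0, 2, 4, 5, 7, 9, 11}  # semitone offsets of a major scale


def guess_major_key(note_numbers):
    """Return the root of the major key whose scale covers the most notes.

    `note_numbers` are MIDI note numbers (60 is middle C). Octaves are
    folded away, so only the 12 pitch classes matter. Ties go to the
    first root in NOTE_NAMES order.
    """
    counts = Counter(number % 12 for number in note_numbers)

    def coverage(root):
        # How many played notes belong to the major scale built on `root`.
        return sum(
            count
            for pitch_class, count in counts.items()
            if (pitch_class - root) % 12 in MAJOR_STEPS
        )

    return NOTE_NAMES[max(range(12), key=coverage)]
```

A relative minor key uses the same notes as its major, so this sketch cannot tell C major from A minor. Running it over a sliding window of notes would be one way to look for the points where the key changes.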

Regards Conwyn