Hi @drlauren, sorry for the delay in responding to this. I dumped a ton of info in here, so please reply if you get stuck somewhere and I will help in a much more timely manner. If you're able to share a notebook or repo with your code, that would be even better. We also have some tutorials with basic DataBlock examples.
I reformatted your code below. You can share code easily here by surrounding it with three backticks ``` (on US keyboards the backtick key is above the Tab key and to the left of the 1 key).
```python
import librosa
import numpy as np
import pretty_midi as pm

# load the audio data
sig, rate = librosa.load(audiofile)

# create the VQT, representing the distribution of sound energy across frequency
# (hop_length has to be an integer number of samples)
vqts = librosa.power_to_db(
    np.abs(librosa.vqt(sig, sr=rate, hop_length=int(rate * 0.01), fmin=27,
                       n_bins=84, gamma=1.5, bins_per_octave=12).T))

# normalize the VQTs
X = vqts - np.mean(vqts)
X /= np.std(X)

# load the midi data
midi_data = pm.PrettyMIDI(midifile)

# turn midi data into one-hot chromagrams
y = midi_data.get_chroma(fs=20).T
y = y.astype(bool).astype(np.uint8)
```
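If you want to sanity check the normalization and one-hot steps without loading any audio, here is a minimal sketch on toy arrays. The helper names `standardize` and `binarize` and the toy values are made up for illustration; they are not part of librosa or pretty_midi. It assumes the normalization is global (one mean and one standard deviation for the whole array), which is how I read your snippet.

```python
import numpy as np

def standardize(x):
    """Shift to zero mean and scale to unit standard deviation (over all values)."""
    x = x - np.mean(x)
    return x / np.std(x)

def binarize(chroma):
    """Map every non-zero chroma value to 1 and every zero to 0, stored as uint8."""
    return chroma.astype(bool).astype(np.uint8)

# Toy stand-ins (assumed shapes): a (frames, bins) spectrogram and a chromagram
spec = np.array([[1.0, 2.0], [3.0, 4.0]])
chroma = np.array([[0.0, 90.0], [64.0, 0.0]])

X = standardize(spec)   # mean ~0, std ~1
y = binarize(chroma)    # only 0s and 1s
```

One thing to watch: if a clip is completely silent or constant, `np.std` is 0 and the division will produce NaNs or infs, so you may want to guard against that case.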
Assuming your output is some type of classification of the audio (e.g. a label representing the genre), you would use a CategoryBlock. Since your input is audio, the blocks would be passed in as follows:
```python
blocks = (AudioBlock, CategoryBlock)
```
No. If you are using a CSV file, you would download the audio in advance and then do something like this:
```python
import librosa
import numpy as np
from fastai.data.all import *
from fastaudio.core.all import *
from fastaudio.augment.all import *

# note you can only have one sample rate for all your audio, so if you have varying
# sample rates you will need to resample all audios to one sample rate. Replace
# all references to "rate" below with your actual sample rate e.g. 16000
def vqt_func(sig):
    # sig may arrive as a tensor rather than a NumPy array; convert it first if librosa complains
    return librosa.power_to_db(
        np.abs(librosa.vqt(sig, sr=rate, hop_length=int(rate * 0.01), fmin=27,
                           n_bins=84, gamma=1.5, bins_per_octave=12).T))

# ResizeSignal crops or pads all audio signals to the same length in milliseconds. It is necessary
# to have inputs of equal size in order to use the gpu. 5000 in the example = 5000ms = 5s but can
# be changed to whatever.
# Pass the function itself (vqt_func), not the result of calling it.
item_tfms = [ResizeSignal(5000), vqt_func]

blocks = DataBlock(blocks=(AudioBlock, CategoryBlock),
    # this reads the column of the csv that has the name of the audio file. If the column holds
    # just the file name (not the full path), you need to add the argument
    # `pref=str(audio_path.resolve()) + '/'` where audio_path is a pathlib object representing
    # where your audio is stored (pref is prepended as a plain string, hence the trailing slash).
    # This is really confusing as I write it so if you have any questions please ask.
    get_x=ColReader('<name of column with your audio filenames>'),
    get_y=ColReader('<name of column with your labels>'),
    item_tfms=item_tfms,
    splitter=RandomSplitter(valid_pct=0.2, seed=42)
)
```
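Since the audio has to be downloaded before the DataBlock can read it, it can save some head-scratching to check up front that every file named in the CSV actually exists on disk. Here is a rough sketch using only the standard library. The function name `missing_audio_files` and its arguments are hypothetical, and it assumes the CSV column holds bare file names relative to one audio folder.

```python
import csv
from pathlib import Path

def missing_audio_files(csv_path, filename_col, audio_dir):
    """Return the file names listed in the CSV that are not present in audio_dir."""
    audio_dir = Path(audio_dir)
    missing = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            name = row[filename_col]
            # Join folder and file name the same way the loader will need to find it
            if not (audio_dir / name).is_file():
                missing.append(name)
    return missing
```

You would call it with your own values, e.g. `missing_audio_files('labels.csv', '<name of column with your audio filenames>', audio_path)`. An empty list means every row points at a real file.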