Deep Learning with Audio Thread

You can also pass NumPy arrays into many librosa functions; you just need to provide the sample rate as well. So you could compute spectrograms on your array, then use those with other code written for audio. You most likely don't want a mel spectrogram, though. The mel scaling is based on the properties of human hearing, and there's no reason to think those are useful in your case. But you should be able to apply many of the techniques here to non-mel-weighted spectrograms, and librosa can produce those. You may then have a much larger amount of frequency data to deal with, so you'll probably want to set the minimum and maximum frequencies of your spectrograms to something appropriate for ECG.
Another issue you're likely to run into is dealing with the various channels of ECG data. That's quite different to audio, where generally only one channel is used, or at most the two of a stereo file.
So you may find that much of the stuff here doesn't necessarily help. I'd imagine you can still use the basic idea of converting the time-series data to the frequency domain (spectrograms), then creating an image from that and using existing image models. Perhaps create a spectrogram per channel, then concatenate the resulting arrays along the height axis to make a single image — something like your sample image, but replacing those time-domain line plots with spectrograms. You could also use each ECG channel as a separate channel in the image, but pre-trained image models want 3 image channels, so you'd have to deal with that. I think Jeremy talked about this in one of the Part 1 lectures when considering a dataset with 4 channels, maybe satellite data. He suggested some things you might apply to extend pre-trained 3-channel image models to the 12 you have (or however many — there are 12 in that image).
I'd also be aware that the multi-channel nature might make a lot of the processing here inapplicable. For instance, I'd imagine you'd want to take the multiple channels into account when normalising your data. Similarly, applying augmentations without considering the multi-channel nature may not work well. Though on the other hand it might be fine.
