Making a custom data transform

First post; I'm new to deep learning and just finished the book. For my first big project, I have data saved across several thousand .npy files. Each file loads as an array of shape (3, 4096) and is essentially a waveform.

I want to convert each file to a CQT spectrogram, with the three channels acting as 'RGB', and then use a CNN for binary classification. When it came to developing the DataBlock, I had this solution:

First Approach
For blocks = (ImageBlock, CategoryBlock) I had get_x = cqt_tfm, with code below:

import numpy as np
import librosa

def scale_minmax(X, min=0.0, max=1.0):
    # Linearly rescale X into the range [min, max]
    X_std = (X - X.min()) / (X.max() - X.min())
    X_scaled = X_std * (max - min) + min
    return X_scaled

def cqt_image(arr):
    # Constant-Q transform of one channel, rescaled to 8-bit pixel values
    cqt = np.abs(librosa.cqt(arr/np.max(arr), sr=2048, fmin=8, hop_length=64,
                             filter_scale=0.8, bins_per_octave=12))
    img = scale_minmax(cqt, 0, 255).astype(np.uint8)
    return img

def cqt_tfm(fname):
    # Apply the CQT per channel, then move channels last: (3, 4096) -> (H, W, 3)
    return np.apply_along_axis(cqt_image, axis=1, arr=np.load(fname)).transpose(1, 2, 0)
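To make the scaling concrete: `scale_minmax` maps an array linearly onto `[min, max]`, and the `uint8` cast then truncates any fractional values.

```python
import numpy as np

def scale_minmax(X, min=0.0, max=1.0):
    # Linearly rescale X into the range [min, max]
    X_std = (X - X.min()) / (X.max() - X.min())
    return X_std * (max - min) + min

x = np.array([0.0, 5.0, 10.0])
print(scale_minmax(x, 0, 255).astype(np.uint8))  # [  0 127 255]
```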

And this works, but it's slow: all of that processing has to run for several hundred thousand files. I also tried moving the CQT code out of the get_x function and into a transform passed to item_tfms, since that seems like the "correct" way to do it (new files would then be transformed through the same structure). But when I run the DataBlock summary, shown below, the pipeline calls my getx function and then PILBase.create, interpreting the (3, 4096) array as an image. What I actually want is getx → CQTTransform → PILBase.create. The current ordering scales the data in odd ways, and I'd rather not have to undo PILBase.create's changes only to call it again.

Setting up Pipeline: getx -> PILBase.create
Setting up Pipeline: label_func -> Categorize -- {'vocab': None, 'sort': True, 'add_na': False}
Setting up after_item: Pipeline: CQTTransform -> ToTensor
Setting up before_batch: Pipeline: 
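For reference, the ordering I'm after would look something like a custom block whose type_tfms run the CQT before image creation. A rough sketch (`cqt_tfm` is the function above; `get_npy_files` and `label_func` are placeholders for the real item getter and labeller):

```python
from fastai.vision.all import *

# Sketch: type_tfms run in order at item-creation time, so
# PILImage.create receives the CQT image, not the raw (3, 4096) array.
def CQTBlock():
    return TransformBlock(type_tfms=[cqt_tfm, PILImage.create],
                          batch_tfms=IntToFloatTensor)

dblock = DataBlock(blocks=(CQTBlock(), CategoryBlock),
                   get_items=get_npy_files,   # placeholder item getter
                   get_y=label_func)          # placeholder labeller
```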

Is there a good/fast way of going about this? Currently, I'm running a separate script to convert the files into CQT format in parallel for speed.
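The separate conversion script is essentially this precompute-and-cache pattern. A minimal sketch, with a cheap placeholder transform standing in for the real CQT so it runs anywhere:

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

import numpy as np

def fake_cqt(arr):
    # Placeholder for the real CQT: per-channel magnitude spectrum.
    return np.abs(np.fft.rfft(arr, axis=1))

def precompute(src, dst_dir):
    # Load one waveform file, transform it once, and cache the result to disk.
    arr = np.load(src)
    out = fake_cqt(arr / np.max(arr))
    dst = Path(dst_dir) / (Path(src).stem + "_cqt.npy")
    np.save(dst, out)
    return dst

# Demo on a few synthetic (3, 4096) waveforms.
tmp = Path(tempfile.mkdtemp())
srcs = []
for i in range(4):
    f = tmp / f"wave{i}.npy"
    np.save(f, np.random.randn(3, 4096))
    srcs.append(f)

# numpy's FFT releases the GIL, so threads help here; for a heavy
# CPU-bound transform like librosa's CQT, swap in ProcessPoolExecutor.
with ThreadPoolExecutor(max_workers=4) as ex:
    outs = list(ex.map(lambda s: precompute(s, tmp), srcs))

print(len(outs), np.load(outs[0]).shape)  # 4 (3, 2049)
```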

Hi @QuantumAbyss. Is this data from the Kaggle G2Net Gravitational Wave Detection competition? I made a custom transform for the CQT transform, but I'm having the same problem, i.e. the transform is slow. I'm using nnAudio to do the transform, which is quite a bit faster than other methods, but it's still slow as it does everything on the CPU.

I can do the transform on GPU as well but I am not there yet.

I have a similar post here.

Also, you can refer to my notebook here. My transform there does close to what you want, i.e. getx --> cqt transform --> image. You may find some insight into how to integrate your pipeline into your transform.

If you find a way to speed up the transformation from npy file to CQT, do post it here. I'm trying to achieve something similar in terms of speed.

@sapal6 Yes it is! Thanks for the recommendation of nnAudio; it definitely helped speed things up. I ended up approaching the problem slightly differently, using the mid-level API to build a custom dataset instead of creating a DataBlock; see the code below. Training ended up being much faster, though I'm still working with only a portion of the data.

I also notice that during training, if I run with too many epochs, my machine completely crashes (requires force reboot), but that’s a separate issue.

def idmap(myid, is_test):
    # Files are sharded into directories by the first three characters of the id
    a, b, c = myid[0], myid[1], myid[2]
    if is_test: return path/'test'/f'{a}/{b}/{c}/{myid}.npy'
    return path/'train'/f'{a}/{b}/{c}/{myid}.npy'
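With a hypothetical dataset root, the sharded layout `idmap` assumes looks like this (`path = Path("data")` is made up for illustration):

```python
from pathlib import Path

path = Path("data")  # hypothetical dataset root

def idmap(myid, is_test):
    # ids are sharded into directories by their first three characters
    a, b, c = myid[0], myid[1], myid[2]
    sub = 'test' if is_test else 'train'
    return path/sub/f'{a}/{b}/{c}/{myid}.npy'

print(idmap('00a1b2c3', False))  # data/train/0/0/a/00a1b2c3.npy
```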

import numpy as np
import torch
from nnAudio.Spectrogram import CQT1992v2

class GravDataset(torch.utils.data.Dataset):
    def __init__(self, labels, is_test=False):
        self.labels = labels
        self.is_test = is_test
        self.q_transform = CQT1992v2(sr=2048, fmin=20, fmax=1024, hop_length=32)

    def __getitem__(self, i):
        currid = self.labels['id'].loc[i]

        # Load the waveform, scale it, and apply the CQT
        arr = np.load(idmap(currid, self.is_test))
        waves = arr / np.max(arr)
        waves = torch.from_numpy(waves).float()
        image = self.q_transform(waves)

        # The test set has no labels, so a dummy target is used there
        return (torch.nn.functional.normalize(image),
                torch.tensor((0 if i < 10 else 1) if self.is_test else self.labels['target'].loc[i], dtype=torch.long))

    def __len__(self): return len(self.labels)
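(For reference, torch.nn.functional.normalize above defaults to L2 normalization along dim=1. A numpy equivalent of that operation, with a hypothetical helper name:)

```python
import numpy as np

def l2_normalize(x, axis=1, eps=1e-12):
    # Divide each slice along `axis` by its Euclidean norm (clamped to eps)
    norm = np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)
    return x / norm

print(l2_normalize(np.array([[3.0, 4.0]])))  # [[0.6 0.8]]
```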

Yes, I started with a PyTorch dataset (similar to yours) first, but then I wanted to learn how to use the low-level API and how I could use the Siamese tutorial for this task. So I was trying to play around with Transforms for my code.

Currently I'm finding it very hard to make time for this competition, so I haven't been able to make much progress on my code. Do share your code once you're done with the competition; it would be a good learning opportunity for me.