How to create a custom Transform which would work during inference

I am trying to create a custom Transform for the following kaggle competition.

The inputs are actually signals from detectors, and the transform I have created generates spectrograms for these signals using nnAudio.

Here is my code for the same →

class NNAudioTransform(Transform):
    """Custom Transform which uses nnAudio transforms
    to extract spectogram on the fly"""
    def __init__(self, df, x_col:str, y_col: str):
        self.df = df
        self.x_col = x_col
        self.y_col = y_col

    def encodes(self, img_path):
        label = self.df[self.df[self.x_col] == img_path][self.y_col].values[0]
        img = qtfms(img_path)
        #img = img.squeeze().numpy()
        return NNAudioImage(img, label)

Here is the notebook describing what I have done and the support functions (show method etc.) which I have created.

My transform is working well and I can successfully train the model on the data. However, what I can’t figure out is how to use this same transform during inference. In its current state the transform outputs a tuple of the transformed image and the label, but during inference I won’t have a label for the transformed (test) image.

How should I modify my transform so that it can be used during training as well as during inference?

One way I can think of is to refactor my transform so that it returns only the image and not the label, but then how would I pass the corresponding labels for the transformed images at training time?

Hello, this may help. It is an area I have recently been playing in for image segmentation transforms (augmentations), and you should be able to translate this idea to your code.

class SegmentationAlbumentationsTransform(ItemTransform):
#    split_idx=0
    def __init__(self, aug, **kwargs): 
        super().__init__(**kwargs)
        self.aug = aug
        
    def encodes(self, x: tuple):
        # this encodes is called during the learn.fit_one_cycle phase
        ...

    def encodes(self, img: TensorImage):
        # this encodes is called during the learn.predict phase
        ...

The discussion for this was here.
You would probably do this in your “class NNAudioTransform(Transform):” class.
Which encodes gets called depends on the type of the encodes input parameter, so you will need to add another encodes function.
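
For your case that might look something like the rough, untested sketch below, just applying the same two-encodes idea to your class. I am assuming here that your training items are built as (path, label) tuples and that NNAudioImage can be constructed without a real label at inference time; adjust to whatever your notebook actually does. I used ItemTransform so the (path, label) pair comes through as one item, like in my segmentation class.

class NNAudioTransform(ItemTransform):
    """Sketch: same nnAudio spectrogram transform for training and inference,
    dispatching on whether a label comes along with the path"""
    def __init__(self, tfm): self.tfm = tfm

    def encodes(self, x: tuple):
        # training/validation: the item arrives as (path, label)
        path, label = x
        img = self.tfm(path)
        return NNAudioImage(img, label)

    def encodes(self, path: Path):
        # inference: only the path is available, so no label
        img = self.tfm(path)
        return NNAudioImage(img, None)  # assumes NNAudioImage tolerates a missing label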


This looks close to my problem. I will need to play around with my code a bit in the way you have suggested.

I will post here in a while once I know whether this resolves my issue.

One more thing. For this transform, how do you create your dataset? I tried to refactor my transform this way, but when I create my dataset it fails to apply the transform.

class NNAudioTransform(Transform):
    """Custom Transform which uses nnAudio transforms
    to extract spectogram on the fly"""
    def __init__(self, tfm): self.tfm = tfm
    def encodes(self, x: tuple):
        img = self.tfm(x)
        img = img.squeeze().numpy()
        label = get_label(x)
        return (PILImage.create(img),label)

and below is my dataset–>

dsets = Datasets(sample_subset, [[NNAudioTransform(qtfms)]], splits=splits)

When doing dsets[0] I get this →

(Path('../input/g2net-gravitational-wave-detection/train/7/7/7/777d746e90.npy'),)

I think the reason is that my transform expects a tuple but the items passed to the dataset are not tuples.

Any idea if I am missing something here?
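
One alternative I am wondering about is keeping the label out of the transform entirely and giving Datasets two pipelines, one for x and one for y, roughly like this (untested sketch, reusing qtfms and my get_label helper):

class NNAudioSpectrogram(Transform):
    """Sketch: x-only transform that turns a signal path into a spectrogram image"""
    def __init__(self, tfm): self.tfm = tfm
    def encodes(self, path):
        img = self.tfm(path)                       # nnAudio transform on the signal
        return PILImage.create(img.squeeze().numpy())

dsets = Datasets(sample_subset,
                 [[NNAudioSpectrogram(qtfms)],     # x pipeline: path -> spectrogram
                  [get_label, Categorize]],        # y pipeline: path -> label
                 splits=splits)

That way the test set would only need the x pipeline, but I haven’t tried it yet.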

For image segmentation, the ‘fit’ transform expects the image and the label/mask image to arrive combined in a tuple. That’s what it is doing here: pulling them apart so the same transform can be applied to each of them.

   def encodes(self, x: tuple):   #<== add the word 'tuple'
        img,mask = x

My code for the DataBlock loading looks like this:

class SegmentationAlbumentationsTransform(ItemTransform):
#    split_idx=0
    def __init__(self, aug, **kwargs): 
        super().__init__(**kwargs)
        self.aug = aug
        
    def encodes(self, x: tuple):
#        print(type(x))
        img,mask = x
        aug = self.aug(image=np.array(img.permute(1,2,0)), mask=np.array(mask))
        the_ret = TensorImage(aug['image'].transpose(2,0,1)), TensorMask(aug['mask'])        
        return the_ret

    def encodes(self, img: TensorImage):
        #For albumentations to work correctly, the channels must be at the last dimension. (Permute)
        aug_img = self.aug(image=np.array(img.permute(1,2,0)))
        return TensorImage(aug_img['image'].transpose(2,0,1))   
    
aug_pipe = A.Compose([A.ShiftScaleRotate(p=.9),
                      A.HorizontalFlip(),
                      A.RandomBrightnessContrast(contrast_limit=0.0, p=1., brightness_by_max=False)
                ])
aug = SegmentationAlbumentationsTransform(aug_pipe)

db = DataBlock(blocks=(TransformBlock(open_img), MaskBlock(codes = np.loadtxt(path/'labels.txt', dtype=str))),               
    get_items=get_image_files,
    get_y=lambda o: path/'labels'/f'{o.stem}{o.suffix}',
    splitter=RandomSplitter(valid_pct=0.1, seed=42),
    item_tfms=aug)
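
At predict time only a single image goes through the item transforms, which is why the second encodes runs. Roughly, with a test DataLoader (the test folder name is just a placeholder, I am assuming a learner already trained on these dls, and that open_img gives back a TensorImage):

test_files = get_image_files(path/'test')      # placeholder location for unlabelled images
test_dl = learn.dls.test_dl(test_files)        # items are single images, no masks
preds, _ = learn.get_preds(dl=test_dl)         # encodes(img: TensorImage) is the one applied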

To see what sort of ‘type’ is required in the ‘encodes’: keep only one encodes function in the transform, remove the tuple annotation, and put the print type statement in. Run the ‘fit’; this should tell you the ‘type’ of x.

    def encodes(self, x):
        print(type(x))
        return ...  # the rest of the transform goes here

Cool. Let me try this out.

By the way, your previous answer indirectly helped me fix the ‘show’ method in my code. Thank you for that.