Different size for item_tfms in every batch

ElisonSherton · February 15, 2022, 12:32pm

Hi guys,

I am trying to run an experiment to check how the input size affects training at the batch level.

I have a fully convolutional neural network that solves a multi-class classification problem. I am grouping images with similar aspect ratios into a batch, and then what I want to do is:

1. From all the samples in that batch, figure out the smallest image resolution (wmin, hmin).
2. Resize all images to (wdash, hmin), where wdash is computed from the corresponding image's aspect ratio.
3. Resize the batch to a square dimension of (hmin, hmin).
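
To make the steps above concrete, here is a minimal sketch of the size arithmetic only, with no image library involved. The helper name `batch_target_sizes` and the rounding rule `wdash = round(w * hmin / h)` are assumptions for illustration and are not part of the original post. Only hmin is used, since wmin does not appear in the later steps.

```python
from typing import List, Tuple

Size = Tuple[int, int]  # (width, height)

def batch_target_sizes(sizes: List[Size]) -> Tuple[List[Size], Size]:
    """Hypothetical helper: given the (width, height) of every image in a batch,
    return the per-image intermediate size (wdash, hmin) and the final square size."""
    # Step 1: smallest height in the batch
    hmin = min(h for _, h in sizes)
    # Step 2: scale each width so the image keeps its aspect ratio at height hmin
    intermediate = [(max(1, round(w * hmin / h)), hmin) for w, h in sizes]
    # Step 3: every image ends up as an (hmin, hmin) square
    return intermediate, (hmin, hmin)

sizes = [(640, 480), (800, 600), (320, 240)]
intermediate, square = batch_target_sizes(sizes)
print(intermediate)  # [(320, 240), (320, 240), (320, 240)]
print(square)        # (240, 240)
```

Because the batch is grouped by aspect ratio, the intermediate widths come out close to each other, so the final squash to a square distorts every image in the batch by roughly the same amount.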

With this experiment I want to check the effect of aspect ratio and image resolution on training. How can I build a dataloader that lets me apply custom item_tfms for every batch of an epoch?

Any suggestion is welcome and if there’s any research you’ve come across which talks about this, kindly let me know in the replies.

Thanks!

ElisonSherton · February 16, 2022, 8:59am

I was able to do this with the before_batch callback of the dataloaders, by defining a custom transform as follows:

from fastai.vision.all import *

class lowest_resize(ItemTransform):
    "Resize every image in a batch to the smallest width found in that batch"
    def encodes(self, samples):
        # Figure out the minimum width in the batch (image tensors are C x H x W)
        widths = []
        for item in samples:
            ip, op = item
            _, _, w = ip.shape
            widths.append(w)
        min_w = min(widths)

        # Define a resize function for a (min_w, min_w) square, padding with zeros
        rsz_func = Resize(min_w, method=ResizeMethod.Pad, pad_mode=PadMode.Zeros)

        # Using the resize function above, shrink every image to the common size;
        # the samples are collated into a batch afterwards
        final_samples = []
        for item in samples:
            ip_image, target_label = item

            # Here's where the transformation happens
            pilimage = to_image(ip_image)
            resized_image = rsz_func(pilimage)
            # The PIL image converts to H x W x C; rearrange it back to C x H x W
            tensor_image = TensorImage(resized_image).unsqueeze(0).transpose(0,-1).squeeze(-1)
            final_samples.append((tensor_image, target_label))
        return final_samples

    def decodes(self, o):
        return o
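
For intuition about what resizing with padding to a square of side min_w involves, here is a small sketch of the geometry: scale the image so that it fits inside the square, then pad the remaining space. This is a generic letterbox calculation, not fastai's implementation. The function name `letterbox_geometry` is hypothetical, and the exact rounding and pad placement inside Resize may differ.

```python
from typing import Tuple

def letterbox_geometry(w: int, h: int, side: int) -> Tuple[Tuple[int, int], Tuple[int, int, int, int]]:
    """Hypothetical helper: fit a (w, h) image inside a side x side square,
    keeping the aspect ratio. Returns the scaled (width, height) and the
    padding as (left, top, right, bottom)."""
    longest = max(w, h)
    # Scale so that the longer edge becomes exactly `side`
    new_w = max(1, round(w * side / longest))
    new_h = max(1, round(h * side / longest))
    # Whatever is left over on the shorter edge is filled with padding,
    # split as evenly as possible between the two sides
    pad_w, pad_h = side - new_w, side - new_h
    left, top = pad_w // 2, pad_h // 2
    return (new_w, new_h), (left, top, pad_w - left, pad_h - top)

print(letterbox_geometry(640, 480, 224))  # ((224, 168), (0, 28, 0, 28))
```

One thing this makes visible: with zero padding, a wide image in a batch ends up with black bands above and below, so part of every square input carries no information.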

I used this link as a template; it is straight out of the fastai source code:

https://github.com/fastai/fastai/blob/master/fastai/text/data.py#L124

      
        
class Pad_Input(ItemTransform):
    def encodes(self,samples, pad_idx=1, pad_fields=0, pad_first=False, backwards=False):
        "Function that collect `samples` and adds padding"
        self.pad_idx = pad_idx
        pad_fields = L(pad_fields)
        max_len_l = pad_fields.map(lambda f: max([len(s[f]) for s in samples]))
        if backwards: pad_first = not pad_first
        def _f(field_idx, x):
            if field_idx not in pad_fields: return x
            idx = pad_fields.items.index(field_idx) #TODO: remove items if L.index is fixed
            sl = slice(-len(x), sys.maxsize) if pad_first else slice(0, len(x))

Thanks!