Unable to decode items

I’m trying to create a Transform for my y-block in the data loader. The encodes function works fine and is called correctly, but the decodes function is never called.

It decodes correctly when I call it separately, but not when it’s used with the data loader.

from fastai.vision.all import *
from fastai.text.all import *  # provides TensorText

class CustomText(str):
    """Helper class to be able to show the label of the data"""
    def show(self, ctx=None, **kwargs): 
        return show_title(self, ctx=ctx, **kwargs)


class CustomTokenizer(Transform):
    """Converts characters to numbers and vice versa"""
    def __init__(self, df, char_limit=100, str_max_len=12):
        self.df = df
        self.str_max_len = str_max_len+2 # need to add start and stop
        self.tokenstats = dict()   
        df['text'].apply(self.count_letters)

        # keep only characters that occur more than char_limit times
        chars = [c for c in sorted(self.tokenstats.keys()) if self.tokenstats[c] > char_limit]
        all_chars = ['#pad#','#stop#','#unk#'] + chars
        self.o2i = {c:i for i, c in enumerate(all_chars)}
        self.vocab = {self.o2i[c]:c for c in self.o2i.keys()}

    def count_letters(self, st):
        for c in st:
            n = self.tokenstats.get(c, 0) +1
            self.tokenstats[c]=n
            
    def encodes(self, x:CustomText):
        print('encodes')
        tokens = np.array([self.o2i.get(c, self.o2i['#unk#']) for c in x])
        # one leading slot for the start position, then pad up to the fixed length
        tokens = np.pad(tokens, pad_width=(1, self.str_max_len-len(tokens)-1), constant_values=self.o2i['#pad#'])
        tokens[len(x)+1] = self.o2i['#stop#']
        # how to add endchar?
        return TensorText(tokens)

    def decodes(self, x):
        print('decodes')
        encoded = [self.vocab.get(n, '#unk#') for n in x.cpu().detach().numpy() if n != self.o2i['#pad#'] and n!= self.o2i["#stop#"]]
        return CustomText(''.join(encoded))

class Yblock(): 
    def __init__(self,df):
        self.df= df
    def __call__(self):   
        ltok = CustomTokenizer(self.df)
        return TransformBlock(item_tfms=[ltok])

yblock = Yblock(df)
data = DataBlock(blocks=(ImageBlock, yblock),
                 get_items=get_items,
                 get_x=get_x, get_y=get_y,
                 item_tfms=[Resize((80,224), method='pad', pad_mode='border')],
                 splitter=RandomSplitter())
dls = data.dataloaders(df,bs=4)
batch = dls.train.one_batch()
decoded = dls.train.decode(batch)

The decoded batch is of type TensorText, but I’m expecting it to be of type “str”, and the decodes function never prints out “decodes”.
The data.summary() function does not throw any errors.

How do I get decodes to be called correctly, or does anyone have ideas for how to debug this problem?
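One way to narrow this down is to check the encode/decode round trip on its own, away from the data loader. Below is a minimal sketch of the same padding scheme in plain Python, with no fastai involved. The names build_vocab, encode and decode are made up for this illustration, and raising on over-long strings is an assumption (the code above does not handle that case).

```python
# Hypothetical standalone version of the tokenizer logic (not fastai code).
PAD, STOP, UNK = '#pad#', '#stop#', '#unk#'

def build_vocab(texts, char_limit=0):
    """Count characters and keep those seen more than char_limit times."""
    counts = {}
    for text in texts:
        for c in text:
            counts[c] = counts.get(c, 0) + 1
    chars = [c for c in sorted(counts) if counts[c] > char_limit]
    o2i = {c: i for i, c in enumerate([PAD, STOP, UNK] + chars)}
    i2o = {i: c for c, i in o2i.items()}
    return o2i, i2o

def encode(text, o2i, str_max_len=12):
    """Layout: one leading pad slot, the characters, a stop token, trailing pads."""
    if len(text) > str_max_len:
        raise ValueError('text longer than str_max_len')  # assumption: fail loudly
    total = str_max_len + 2  # room for the start slot and the stop token
    ids = [o2i[PAD]] + [o2i.get(c, o2i[UNK]) for c in text] + [o2i[STOP]]
    return ids + [o2i[PAD]] * (total - len(ids))

def decode(ids, o2i, i2o):
    """Drop pad and stop tokens and join the remaining characters."""
    skip = {o2i[PAD], o2i[STOP]}
    return ''.join(i2o.get(n, UNK) for n in ids if n not in skip)

o2i, i2o = build_vocab(['ABC123', 'XYZ789'])
ids = encode('ABC789', o2i)
text = decode(ids, o2i, i2o)
print(ids)
print(text)
```

If the round trip behaves here, the tokenizer logic itself is probably fine and the problem is more likely in how the data loader calls decode.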

To decode a batch, you should use decode_batch.

Its format is:

learn.dls.decode_batch((*tuplify(batch[0]), *tuplify(batch[1])))

(where batch[0] is input and batch[1] is output)

Great, it works now.
One more question came up, to help me understand this better.
If I’m using Categorize as the y-block, no type annotations seem to be used in either the encodes or decodes functions, and only the y-part is being encoded/decoded. But in my case I need to use type annotations in both the encoder and the decoder, or it will also try to decode the images. My question is: how do I know when to use type annotations and when not to? Is it possible to solve this problem without the helper class of TensorText?

If I’m not mistaken, the CustomText and TensorText you added are to add the .show method to your output. IIRC encodes expects inputs with a show method. Out of curiosity, does using the TensorText type in the encodes declaration instead of CustomText work?

Yes, it works perfectly.

Awesome…it shows that encodes needs a type with a show method
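
On the earlier question of when type annotations are needed: as far as I understand it, item_tfms from all blocks are applied to every element of the (x, y) tuple, and the annotation on encodes/decodes decides which elements a transform actually touches, while elements of other types pass through unchanged. Here is a rough analogy using functools.singledispatch from the standard library. This is not fastai's own dispatch code, and Image and TokenIds are hypothetical stand-ins for the real types.

```python
from functools import singledispatch

class Image:
    """Hypothetical stand-in for an image type."""

class TokenIds(list):
    """Hypothetical stand-in for TensorText."""

@singledispatch
def decode(x):
    # No implementation registered for this type: return it untouched
    return x

@decode.register
def _(x: TokenIds):
    # Only runs for TokenIds, selected by the type annotation
    return ''.join(chr(n) for n in x)

sample = (Image(), TokenIds([72, 105]))
decoded = tuple(decode(o) for o in sample)
print(type(decoded[0]).__name__, decoded[1])
```

The image element comes back as the same object, and only the TokenIds element is turned into a string. Without the annotation, the same function would be tried on the image too, which matches what was described above.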