Trouble Understanding the Mid-Level Data API

OtterL2718 · October 12, 2023, 7:42am

Hello, I recently went through chapter 11 of the Fastbook (Data Munging with fastai’s Mid-Level API) and am now trying to use that API to create a dataloader for an image classification task. Unfortunately, there are some things I don’t seem to grasp.

I am looking to create a custom type that I could use for showing images along with their class as a title.
Below are the classes and functions I have been creating for that purpose.

from fastai.vision.all import *

def resized_image(fn:Path, sz=460):
    x = Image.open(fn).convert('RGB').resize((sz,sz))
    # convert the HxWxC uint8 image to a CxHxW float tensor scaled to [0, 1]
    return tensor(array(x)).permute(2,0,1).float()/255.

class CustomType(Tuple):
    def show(self, ctx=None, **kwargs):
        img, title = self
        return show_image(img, title=title, ctx=ctx)

class CustomTransform(Transform):
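    def setups(self, files):
        # label = file name without the trailing "_<number>.png"
        self.labeller = using_attr(RegexLabeller(pat=r'(.*)_\d+\.png$'), 'name')
        labels = list(map(self.labeller, files))
        # unique labels in first-seen order, plus the label -> index lookup
        self.vocab = list(dict.fromkeys(labels))
        self.o2i = {label:idx for idx,label in enumerate(self.vocab)}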
        
    def encodes(self,o): return (resized_image(o), self.o2i[self.labeller(o)])
    def decodes(self,x): return CustomType((x[0], self.vocab[x[1]]))

The encoding part works fine: it outputs a tuple containing the image tensor and the class index.
When I try to decode that result, I get the following error:

TypeError: only integer tensors of a single element can be converted to an index

Should I be encoding my data differently? I don’t understand what is going wrong here.

CustomType itself works fine if I build it by hand like this:

img = resized_image(files[0])
CustomType((img, 'test_title')).show()

OtterL2718 · October 14, 2023, 8:30am

For anyone who might be facing a similar issue:

The problem was that I had not properly understood what “transforms dispatch over tuples” means in practice.
The simplest fix was to make CustomTransform inherit from ItemTransform instead of the regular Transform. A regular Transform that receives a tuple applies decodes to each element separately, so in my decodes x was first the image tensor on its own and then the label on its own, never the (image, label) pair. That explains the error: when x was the image tensor, x[1] was a slice of that tensor rather than a class index, so self.vocab[x[1]] failed. ItemTransform treats the tuple as a single item, so decodes receives the whole pair, and x[0] and x[1] are the image and the label index as intended.
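
A minimal sketch of that difference, written in plain Python so it runs without fastai. This is a toy model of the behaviour described above, not fastcore's actual implementation: ToyTransform, ToyItemTransform, PerElement and WholeItem are hypothetical names, and a string stands in for the image tensor.

```python
# Toy model of "dispatch over tuples"; NOT fastcore's real code.

class ToyTransform:
    """Applies decodes to each element separately when given a tuple."""
    def decode(self, x):
        if isinstance(x, tuple):
            return tuple(self.decodes(el) for el in x)
        return self.decodes(x)

class ToyItemTransform(ToyTransform):
    """Hands the whole tuple to decodes as a single item."""
    def decode(self, x):
        return self.decodes(x)

vocab = ["cat", "dog"]

class PerElement(ToyTransform):
    def decodes(self, x):
        # Called once per element: x is never the (image, label) pair
        return f"decodes saw {x!r}"

class WholeItem(ToyItemTransform):
    def decodes(self, x):
        # Called once with the pair, so unpacking works
        img, label_idx = x
        return (img, vocab[label_idx])

encoded = ("<image tensor>", 1)  # stand-in for (tensor, class index)
per_element = PerElement().decode(encoded)
whole_item = WholeItem().decode(encoded)
print(per_element)
print(whole_item)
```

In the per-element case decodes only ever sees one element at a time, which is why indexing x[0] and x[1] inside it does not do what the original CustomTransform expected.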

Alternatively, encodes could return a list instead of a tuple. Dispatch only happens for tuples, so decodes would then receive the list as a whole. I do suspect, though, that returning a list from encodes would cause issues once it is passed on to later transforms.
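
A rough plain-Python illustration of why a list might cause trouble further down the pipeline. It assumes only the rule stated above (per-element application happens for tuples and nothing else); apply and describe are hypothetical helpers, not fastai functions, and real fastai transforms may behave differently in the details.

```python
# Toy model only: per-element application happens for tuples, not lists.

def apply(fn, x):
    """Call fn on each element of a tuple, or on x itself otherwise."""
    if isinstance(x, tuple):
        return tuple(apply(fn, el) for el in x)
    return fn(x)

def describe(x):
    # Stand-in for a later transform that expects to see single elements
    return type(x).__name__

as_tuple = apply(describe, ("img", 3))  # fn runs on "img" and on 3
as_list = apply(describe, ["img", 3])   # fn runs once, on the whole list
print(as_tuple)
print(as_list)
```

With the tuple, the later step sees each element; with the list, it sees one list object, so any step written for individual images or labels would get something it does not expect.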