How to build a dataloader using the mid-level API when the data comes from a dataframe

In the tutorial notebook 10_tutorial.pets, a transform is created that gets the data needed to build a data loader for the PETS dataset as follows:

class PetTfm(ItemTransform):
    def setups(self, items):
        self.labeller = using_attr(RegexLabeller(pat = r'^(.*)_\d+.jpg$'), 'name')
        vals = map(self.labeller, items)
        self.vocab,self.o2i = uniqueify(vals, sort=True, bidir=True)

    def encodes(self, o): return (PILImage.create(o), self.o2i[self.labeller(o)])
    def decodes(self, x): return TitledImage(x[0],self.vocab[x[1]])

tls = TfmdLists(items, [Resize(224), PetTfm(), FlipItem(p=0.5), ToTensor()])
dls = tls.dataloaders(bs=64)

I’m trying to write the labeller, encodes and decodes methods for a similar transform that gets the x’s and y’s from a dataframe, but so far without success. Any ideas on how to go about this?

I think instead of the RegexLabeller you should just use ColReader, and then work with its output. (If I’m understanding this correctly?)
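To show the idea without fastai at all, here is a rough sketch of the label side of such a transform. Plain dicts stand in for dataframe rows, and `rows`, `read_col`, `encode`, `decode` and all the values are made up for illustration; `read_col` only imitates the "read one column from a row" behaviour and is not fastai's ColReader.

```python
# Stand-in for dataframe rows: each row supports row['col'] lookup,
# the same way a pandas row (df.iloc[i]) does. All values are made up.
rows = [
    {"name": "jpg/a.jpg", "label": "daisy"},
    {"name": "jpg/b.jpg", "label": "rose"},
    {"name": "jpg/c.jpg", "label": "daisy"},
]


def read_col(col):
    """Hypothetical helper: return a function that reads one column from a row."""
    return lambda row: row[col]


labeller = read_col("label")

# Same idea as setups(): collect every label, then build vocab and o2i.
vocab = sorted(set(labeller(r) for r in rows))
o2i = {label: i for i, label in enumerate(vocab)}


def encode(row):
    # Same idea as encodes(): return (input, label index). The input stays a
    # file name here; the real transform would open it as an image.
    return (read_col("name")(row), o2i[labeller(row)])


def decode(item):
    # Same idea as decodes(): map the label index back to its string.
    return (item[0], vocab[item[1]])
```

With these rows, `vocab` is `['daisy', 'rose']` and `encode(rows[1])` gives `('jpg/b.jpg', 1)`. In the real transform the column reader would be ColReader and the file name would be passed to PILImage.create.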

@muellerzr I did that, and the vocab and o2i seem alright, but I get the error AttributeError: 'Series' object has no attribute 'read' when building the TfmdLists as follows:

class Tfm(ItemTransform):
    def setups(self, items):
        self.labeller = ColReader('label')
        vals = list(map(self.labeller, items))
        self.vocab,self.o2i = uniqueify(vals, sort=True, bidir=True)

    def encodes(self, o): 
        return (PILImage.create(ColReader('name', pref=path)(o)), self.o2i[self.labeller(o)])
    def decodes(self, x): 
        return TitledImage(x[0],self.vocab[x[1]])

tls = TfmdLists(df, [Resize(224), Tfm(), ToTensor()])

I guess the error is in my usage of ColReader or in my dataframe.

[Image: Screenshot from 2020-07-05 17-10-16]

It’s retrieving the right name for the label, but the image path isn’t being retrieved correctly when using ColReader…

Looks like you have to fix the dataframe and check the value of the path variable.

The value of df['name'][0] is 0<space>jpg/image_03860.jpg, i.e. the leading 0 is part of the string and not the index, as the screenshot shows.
And print(path) gives /home/harish3110/.fastai/data/oxford-102-flowers.
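A quick way to catch this kind of problem before building the TfmdLists is to scan the file names for anything that looks off. This is only a rough sketch: `suspicious_names` is a made-up helper, the second file name is invented for the example, and the optional `root` check assumes the names are relative to the dataset folder.

```python
from pathlib import Path


def suspicious_names(names, root=None):
    """Return (position, value) pairs for file names that look malformed.

    A name is flagged if it contains whitespace or, when `root` is given,
    if root/name does not exist on disk.
    """
    bad = []
    for i, name in enumerate(names):
        text = str(name)
        has_space = any(ch.isspace() for ch in text)
        missing = root is not None and not (Path(root) / text).exists()
        if has_space or missing:
            bad.append((i, text))
    return bad


# First value as reported in the thread; the second is invented.
names = ["0 jpg/image_03860.jpg", "jpg/image_00001.jpg"]
print(suspicious_names(names))
```

This prints `[(0, '0 jpg/image_03860.jpg')]`. Any iterable of strings works, so with a dataframe you could pass `df['name']`, and pass `root=path` as well to check that each file actually exists.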

@imrandude Yes, I guess I did a df.reset_index() along the way without realizing it, and that messed up the dataframe. Thanks!