How to iterate using Data API without having to reload dataset

I’m trying to understand how to best work with fast.ai’s data API. In my understanding, the purpose of subclassing ItemBase is so that one can, for example, implement a plotting method for said item in that subclass. Similarly, subclassing ItemList allows implementing show_xys() and show_xyzs(). That’s very handy when using functions such as data.show_batch().

Unfortunately, that also means that I always have to recreate my dataset when I iterate on these methods. It would be much more convenient to have my data in one place, and my plotting functions in a separate class. Then I could substitute my plotting functions without having to reload my data.

I vaguely remember Jeremy saying that fast.ai v2 would make a lot more use of delegation. I guess this would be one such case, where I could simply substitute the delegate of a plotting function, for example.

Is my understanding of the limitation of the v1 API correct, or does it sound like I’m using it wrongly? If I have correctly described a limitation of the v1 data API, is this indeed something that you are trying to address in v2?

Edit: I have worked around this limitation with the following coding style:

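import matplotlib.pyplot as plt
from fastai.core import ItemBase  # fastai v1

class MyItemBase(ItemBase):
    #...
    # _plot is looked up at call time, so redefining it affects existing instances
    def plot(self, *args, **kwargs): return _plot(self.data, *args, **kwargs)

def _plot(data, *args, **kwargs):  # accept the extra arguments that plot() forwards
    # ...
    plt.show()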

That way, I can just re-execute the cell, which redefines _plot, and the new version gets picked up by my existing MyItemBase instances. The same works for show_xys and so on. I do wonder, though, whether there is a more elegant way.
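To see why this works without fastai, here is a minimal sketch in plain Python. The class name Item and the string return values are made up for illustration; a real _plot would draw with matplotlib. The method body looks up the name _plot in the module globals each time it is called, so an object created before the redefinition still uses the newest function.

```python
class Item:
    """Hypothetical stand-in for an ItemBase subclass; it only stores data."""
    def __init__(self, data):
        self.data = data

    def plot(self, *args, **kwargs):
        # The global name _plot is resolved when plot() runs,
        # not when the class is defined.
        return _plot(self.data, *args, **kwargs)


def _plot(data, *args, **kwargs):
    # First version of the plotting helper (returns a string instead of drawing).
    return f"v1: {data}"


item = Item([1, 2, 3])       # the "expensive" object is created once
first = item.plot()


def _plot(data, *args, **kwargs):
    # Simulates re-executing the notebook cell with a changed helper.
    return f"v2: sum={sum(data)}"


second = item.plot()         # same instance, new behaviour
```

The cost of this pattern is that every method you want to iterate on needs its own module-level helper, and the helper must accept whatever arguments the method forwards.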

You are correct on all counts. With the dispatch system in v2, you implement a version of show_batch or show_results for your new type, so you don’t have to recreate your dataset at each new iteration (as long as it returns batches of that new type).
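The idea of dispatching on the batch type can be sketched with Python's standard functools.singledispatch. This is only a stand-in to show the principle, not fastai v2's own dispatch mechanism; SpectrogramBatch and the string return values are hypothetical.

```python
from functools import singledispatch


class SpectrogramBatch(list):
    """Hypothetical batch type; stands in for a custom data type."""


@singledispatch
def show_batch(batch):
    # Fallback used for any type without a registered implementation.
    return f"generic: {len(batch)} items"


@show_batch.register(SpectrogramBatch)
def _show_spectrograms(batch):
    # Display logic lives outside the data class.
    return f"spectrograms: {len(batch)} items"


batch = SpectrogramBatch([[0.1, 0.2], [0.3, 0.4]])   # data is built once
before = show_batch(batch)


@show_batch.register(SpectrogramBatch)
def _show_spectrograms_v2(batch):
    # Registering again for the same type replaces the earlier implementation.
    return f"spectrograms v2: max={max(max(row) for row in batch)}"


after = show_batch(batch)    # same data, new display function
```

Because the display function is registered against the type instead of being defined on it, iterating on the display code never touches the objects that hold the data.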


Thank you very much for clarifying.