DataBlock summary is amazing in v2

This is simply to highlight that the summary function of the new DataBlock API is amazingly useful for understanding what is going on…

Taken from 50_datablock_examples.ipynb

I have been playing with the library a lot and I really like it so far!

I completely agree :slight_smile: If I see anything broken, it’s my go-to now (even before trying a DataLoader!)

Props to @sgugger for building that :slight_smile:

Just wanted to give my :+1: to @sgugger for building the summary function, as I use it a lot.

Q: Would it be helpful to lift it into the DataLoaders instead of the DataBlock?

  • Most often you have a DataLoaders instance: the library often hides the DataBlock, e.g. TextDataLoaders and ImageDataLoaders do this.
  • If you are not using the DataBlock API, you still have a DataLoaders instance.
  • DataBlock summary really is ‘just’ directing to DataLoaders anyway.

Anyway, fantastic to have, and I have my own version with patch - ++fastcore.
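
For anyone unfamiliar with the patching idea mentioned above, here is a minimal plain-Python sketch of it. It does not use fastcore or fastai: `attach_to`, `ToyLoader`, `ToyLoaders`, and the body of `summary` are all hypothetical stand-ins made up for illustration, so treat it as the general pattern rather than how the library does it.

```python
def attach_to(cls):
    """Return a decorator that installs a function as a method on `cls`."""
    def decorator(fn):
        setattr(cls, fn.__name__, fn)
        return fn
    return decorator


class ToyLoader:
    """Hypothetical stand-in for a single loader: items plus a batch size."""
    def __init__(self, items, bs):
        self.items, self.bs = items, bs


class ToyLoaders:
    """Hypothetical stand-in for a train/valid pair of loaders."""
    def __init__(self, train, valid):
        self.train, self.valid = train, valid


@attach_to(ToyLoaders)
def summary(self):
    """Describe each loader without needing the object that built it."""
    lines = []
    for name in ("train", "valid"):
        dl = getattr(self, name)
        lines.append(f"{name}: {len(dl.items)} items, batch size {dl.bs}")
    return lines


dls = ToyLoaders(ToyLoader(list(range(8)), bs=4), ToyLoader(list(range(2)), bs=2))
for line in dls.summary():
    print(line)
```

The point is only that the method lives on the loaders object, so it is available no matter how that object was built.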

It’s less obvious how to do a detailed summary of a DataLoader, as it does not know all the steps used to build the batches the way the data block does, but yes, we could have a different summary for DataLoader.

Perhaps just change the naming?

I.e., instead of saying batch_tfms, simply use after_batch and after_item. I think that should solve most of the issues right there?

(we may then want something separate again for the Datasets too)

I mean it wouldn’t be able to show the steps of the dataset (the type_tfms), the split, or the way the data is gathered. It would just try to get items, apply the item transforms, collate them, and apply the batch transforms.
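
To make those four steps concrete, here is a rough plain-Python sketch of what a loader-level summary could record. This is not fastai code: `summarize_batch` and the toy transforms are hypothetical, and only the `after_item` / `after_batch` names are borrowed from the discussion above.

```python
def summarize_batch(items, after_item, collate, after_batch, bs=4):
    """Build one batch from `items`, recording the result of every step."""
    log = []
    samples = list(items[:bs])
    log.append(("items", samples))
    # Item transforms run on each sample independently.
    for tfm in after_item:
        samples = [tfm(s) for s in samples]
        log.append((f"after_item: {tfm.__name__}", samples))
    # Collation turns the list of samples into a single batch object.
    batch = collate(samples)
    log.append(("collate", batch))
    # Batch transforms run once on the whole batch.
    for tfm in after_batch:
        batch = tfm(batch)
        log.append((f"after_batch: {tfm.__name__}", batch))
    return batch, log


# Toy transforms, purely for illustration.
def double(x):
    return x * 2

def as_tuple(samples):
    return tuple(samples)

def add_one(batch):
    return tuple(v + 1 for v in batch)


batch, log = summarize_batch([1, 2, 3, 4, 5], [double], as_tuple, [add_one], bs=4)
for step, value in log:
    print(f"{step:>22} -> {value}")
```

Nothing here knows how the items were gathered or split, which is exactly the limitation described above.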

So then we have two?

  • DataLoader level
  • Dataset level

This way both get in, and people who may not want to use the DataBlock (for whatever strange reason :wink:) can still get the useful debugging tools.

And it’s up to them whether they debug with both (if they so choose).
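
As a rough illustration of the Dataset-level half, a summary there could report the split and trace the type transforms on one item. This is a plain-Python sketch under my own assumptions: `summarize_dataset`, the splitter, and the two transforms are hypothetical and not part of fastai.

```python
def summarize_dataset(items, splitter, type_tfms):
    """Report the split sizes and trace the type transforms on one item."""
    train_idx, valid_idx = splitter(items)
    report = {
        "n_items": len(items),
        "n_train": len(train_idx),
        "n_valid": len(valid_idx),
        "trace": [],
    }
    # Follow the first training item through every transform in order.
    value = items[train_idx[0]]
    report["trace"].append(("raw item", value))
    for tfm in type_tfms:
        value = tfm(value)
        report["trace"].append((tfm.__name__, value))
    return report


def last_item_valid(items):
    """Hypothetical splitter: hold out the final item for validation."""
    idx = list(range(len(items)))
    return idx[:-1], idx[-1:]

def strip_ext(name):
    return name.rsplit(".", 1)[0]

def label_of(stem):
    return stem.split("_")[0]


files = ["cat_1.jpg", "dog_2.jpg", "cat_3.jpg"]
report = summarize_dataset(files, last_item_valid, [strip_ext, label_of])
print(report["n_train"], "train /", report["n_valid"], "valid")
for step, value in report["trace"]:
    print(f"{step:>10} -> {value}")
```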

My initial thought was DataLoaders. But I like the idea of being able to debug or inspect, as far as possible, at all levels, including the individual DataLoader.
