Understanding the data block API and how to build your own custom ItemBase/ItemLists

Just posted this on my twitter but hopefully this will be helpful to folks on here as the data block API is a popular subject around these parts:

Let me know what y’all think.



This is super useful and cleared up a lot of doubts! Can you link to how a model is created for these datasets, so that it would be an end-to-end example?

Hi Ravi … that is something I’m working on.

I think some folks are already working on this and have done it elsewhere. Either way, I’ll have something out in the next month or so if there is interest.

Wade, thanks, and I will keep an eye out for such models. I am still learning how to create models for complex tasks. Am I correct in assuming that you are trying to build some seq2seq encoder for text and feed it, along with the rest of the embeddings, into a single model?

I wasn’t thinking that myself but I’m sure there are many uses for something like this, including seq2seq problems.

If you get something working using this lmk.

Thanks a ton @wgpubs, this was super helpful.
I implemented your code and read the docs concurrently, and it helped me a lot in succinctly understanding the basic inner workings of the library in the context of creating a databunch (which I could not get from the docs alone).
Please do share if you do the same for the learner as well. I’ll try it myself too.

Hi all,

It would be very helpful to know how to define the item list and label list for ranking problems, where we have to sample among queries rather than independent rows/indices. Each row in the data frame is a query-item pair. I could change get() and from_df() in the ItemList to sample queries and pull all of a query's item rows to pass on via the ItemBase. But if we want an analogous query-item-specific label/target value, how do we specify a custom _item_cls?
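
To make the question concrete, here is a minimal, library-agnostic sketch of the grouping step in plain Python. It is not fastai code and does not answer how _item_cls should be wired up. It only shows rows being regrouped so that one index refers to one query in both the inputs and the labels. The field names (query_id, item_id, feature, relevance) and the helper group_by_query are hypothetical, made up for illustration.

```python
from collections import defaultdict
import random

# Hypothetical rows: one (query, item) pair per row, as in the data frame described above.
rows = [
    {"query_id": "q1", "item_id": "a", "feature": 0.9, "relevance": 2},
    {"query_id": "q1", "item_id": "b", "feature": 0.1, "relevance": 0},
    {"query_id": "q2", "item_id": "c", "feature": 0.5, "relevance": 1},
    {"query_id": "q2", "item_id": "d", "feature": 0.7, "relevance": 2},
    {"query_id": "q2", "item_id": "e", "feature": 0.2, "relevance": 0},
]

def group_by_query(rows):
    """Return parallel lists: query ids, per-query item features, per-query labels."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["query_id"]].append(row)
    query_ids = sorted(grouped)  # fixed order so index i means the same query everywhere
    xs = [[r["feature"] for r in grouped[q]] for q in query_ids]
    ys = [[r["relevance"] for r in grouped[q]] for q in query_ids]
    return query_ids, xs, ys

query_ids, xs, ys = group_by_query(rows)

# Sampling now happens over queries, not over individual rows.
rng = random.Random(0)
i = rng.randrange(len(query_ids))
sampled_query, sampled_items, sampled_labels = query_ids[i], xs[i], ys[i]
```

With the data arranged this way, a custom item list could index into xs and a custom label list into ys using the same query index, so every sampled query comes with the labels for all of its items. How that maps onto get() and _item_cls in the library is still the open question.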