Using Tab and Shift+Tab with the data block API

KevinB · November 15, 2018, 10:37pm

Is there a way to use tab complete and shift+tab with the new data block API? I really like being able to look at all of the arguments available in Jupyter Notebook, but I haven’t found a way to look at the arguments available on any of the functions besides the first one:

data = (TextList.from_csv(path, "texts.csv", col='text')  # <-- I can shift+tab this one
        .split_from_df(cols=2)  # <-- shift+tab doesn't work here or on any of the others
        .label_from_df(cols=0)
        .databunch())

This may just be a limitation of Jupyter with no way around it, but I’m really hoping it’s doable.

wgpubs · November 16, 2018, 2:09am

Unfortunately, the only way you’re going to get the IntelliSense-style help is to do things piecemeal.

Usually I break the data block API code down line by line, and once it does what I want, I roll it all into a single chained expression like the one you have above.

KevinB · November 16, 2018, 2:59pm

I’ve done something similar in the past too, and I want to make sure we’re on the same page. What I do is set

test = TextList.from_csv(path, "texts.csv", col='text')

Then I can see what I can do with test by typing test. and hitting Tab. That pulls up all of the available attributes and methods, so on this one I have .split_from_df and .label_from_df, and I can do all the manipulations I want with those. I’m not sure how effective this is, but that’s my hacky workaround since I can’t Tab to see what’s available.
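The same inspection can also be done in code, which helps when Tab and Shift+Tab are not cooperating. This is only a minimal sketch: FakeTextList is a hypothetical stand-in invented so the snippet runs on its own (it is not fastai, and its signatures simply mirror the calls used in this thread), while dir and inspect.signature are standard Python.

```python
import inspect


class FakeTextList:
    """Hypothetical stand-in for an intermediate data block object (not fastai)."""

    def split_from_df(self, cols=2):
        return self

    def label_from_df(self, cols=0):
        return self


def public_methods(obj):
    """Return {method name: signature string} for every public callable on obj."""
    found = {}
    for name in dir(obj):
        if name.startswith("_"):
            continue  # skip private and dunder attributes
        attr = getattr(obj, name)
        if callable(attr):
            found[name] = str(inspect.signature(attr))
    return found


test = FakeTextList()
methods = public_methods(test)
for name, sig in methods.items():
    print(f"{name}{sig}")
```

With a real object in place of FakeTextList, the same helper should list whatever that object exposes directly, though it would share the limits of dir.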

sgugger · November 16, 2018, 5:06pm

Jupyter isn’t great for that, and you will miss all the label methods at the second step, because they’re not direct children of ItemLists.
It’s a bit frustrating, but you may have to type ?ItemList.label_xxx each time you want to see the args.
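One way this can happen in plain Python, shown as a toy sketch rather than fastai’s actual code: if a wrapper forwards unknown attributes to an inner object through __getattr__, the call works but dir() does not list the forwarded names, so completion has nothing to show. Child and Container below are hypothetical classes, and whether ItemLists is implemented exactly this way is an assumption here.

```python
class Child:
    """Hypothetical inner object that owns the labelling method."""

    def label_from_df(self, cols=0):
        return f"labelled with cols={cols}"


class Container:
    """Hypothetical wrapper that forwards unknown attributes to its child."""

    def __init__(self):
        self.train = Child()

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so names
        # resolved here are never reported by dir().
        return getattr(self.train, name)


c = Container()
listed = "label_from_df" in dir(c)   # is the forwarded name visible to dir()?
result = c.label_from_df(cols=0)     # the call itself still goes through
print(listed, result)
```

Asking for help on the class that actually defines the method, as with ?ItemList.label_xxx, sidesteps the problem because the lookup no longer depends on forwarding.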

KevinB · November 16, 2018, 5:11pm

Is there a way to see what order the data blocks are supposed to go in? Is that just something you have to learn by looking at examples, or is there a better way to figure out the ordering? To give an example, if I want to do

(TextList.from_csv(path, "texts.csv", col='text')
 .databunch()
 .label_from_df(cols=0))

How do I know if that is a valid order of blocks or not? My guess is that split_from_df and label_from_df could probably be swapped without any issue, but databunch has to be at the end and from_csv has to be at the beginning, but I have no idea how to verify that without testing different combinations of blocks.

sgugger · November 16, 2018, 5:32pm

There’s only one valid order: Input -> Split -> Label -> DataBunch (as explained in the docs).
Even exchanging splitting and labelling won’t work (you’ll basically lose your labels), as it’s not designed to be used that way.
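The order above can also be written down as a small check. This is a hypothetical helper, not part of fastai: it guesses each step’s stage from the method-name prefixes seen in this thread (from_, split_, label_, databunch), which is a naming-convention assumption, and it ignores any optional steps not mentioned here.

```python
# The four stages, in the only order described as valid in this thread.
STAGES = ["input", "split", "label", "databunch"]


def stage_of(method_name):
    """Guess the stage from the method name (a naming-convention assumption)."""
    if method_name.startswith("from_"):
        return "input"
    if method_name.startswith("split_"):
        return "split"
    if method_name.startswith("label_"):
        return "label"
    if method_name == "databunch":
        return "databunch"
    raise ValueError(f"unknown step: {method_name}")


def is_valid_order(method_names):
    """True only if the steps are exactly Input -> Split -> Label -> DataBunch."""
    return [stage_of(m) for m in method_names] == STAGES


good = is_valid_order(["from_csv", "split_from_df", "label_from_df", "databunch"])
bad = is_valid_order(["from_csv", "databunch", "label_from_df"])
swapped = is_valid_order(["from_csv", "label_from_df", "split_from_df", "databunch"])
print(good, bad, swapped)
```

Only the first chain passes; the chain that calls databunch early and the one that swaps split and label are both rejected, matching the rule stated above.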