Developer chat

stas · November 15, 2018, 6:00am

I implemented the jupyter notebook experiments module to maximize memory utilization, as discussed here.

Please have a look: https://github.com/stas00/ipyexperiments

Your feedback is sought after and if you do have some, please send it to this thread.

Thank you.

Rvbens · November 15, 2018, 8:07am

Okey, I only have to change the file in docs_src and you take care of the conversion right?

fredguth · November 15, 2018, 12:44pm

This is great!

sgugger · November 15, 2018, 2:27pm

Yes indeed.

sgugger · November 15, 2018, 4:48pm

Small breaking changes:

removed TextFilesList to replace it by TextList like in everywhere else.
put col everywhere there was a col or cols argument in the data block API.

stas · November 15, 2018, 5:45pm

While you’re changing the API, perhaps these could be normalized?

def language_model_learner(data:DataBunch, bptt:int=70, emb_sz:int=400, nh:int=1150, nl:int=3, pad_token:int=1,
def text_classifier_learner(data:DataBunch, bptt:int=70, max_len:int=70*20, emb_sz:int=400, nh:int=1150, nl:int=3,
def get_tabular_learner(data:DataBunch, layers:Collection[int], emb_szs:Dict[str,int]=None, metrics=None,
def get_collab_learner(ratings:DataFrame, n_factors:int, pct_val:float=0.2, user_name:Optional[str]=None,

have get_ everywhere, or nowhere?

Also the first two could have their argument positions synced. text_classifier_learner injects max_len before other arguments - could probably go after, to stay similar.

and then we have:

def create_cnn(data:DataBunch, arch:Callable, cut:Union[int,Callable]=None, pretrained:bool=True,

it also returns a learner object, but the name is completely different. get_cnn_learner?

And this one has no action - get/create in the name:

def simple_cnn(actns:Collection[int], kernel_szs:Collection[int]=None,

and we use ‘get_’ in:

def get_embedding(ni:int,nf:int) -> nn.Module:

Rvbens · November 15, 2018, 6:27pm

Nice, that was my first PR ever

stas · November 15, 2018, 6:46pm

Here is another questionable API:

def series2cat(df:DataFrame, *col_names):

it does in place edit, returns nothing - should it be series2cat_ instead?

sgugger · November 15, 2018, 6:53pm

That one will very likely disappear once I’ve refactored collab as it’s only used there.

stas · November 15, 2018, 7:03pm

awesome. and when you do also can you replace *col_names with normal list argument, so one could pass a list and not needing to expand it with *cols. Thank you.

sgugger · November 15, 2018, 9:21pm

After discussing with Jeremy, I changed again the names of the arguments from col to cols in the data block API when you can pass one or more columns. If you can pass only one, the name is col, if you can pass several, the name is cols. Example:

data_clas = (TextList.from_csv(imdb, 'texts.csv', cols='text')
                     .split_from_df(col='is_valid')
                     .label_from_df(cols='label'))

In the first and last function, you can pass multiple columns (if you have multiple text fields or multiple labels), but one the second one, you can only pass one column.

313V · November 16, 2018, 1:27pm

a few datablock api questions:
i really like it in general but am not sure how to do a couple things…

it’s not clear to me through the datablock api how one would change the sampler or set shuffle=False
how do you add an unlabeled test set? in general, inference on outside data is confusing to me

apologies if there are examples or docs i missed.

sgugger · November 16, 2018, 2:02pm

All the arguments for dataloader have to be passed to the call databunch(). It delegates to DataBunch.create which doesn’t take shuffle or sampler for now. My advice would be to write a customized subclass of DataBunch for your needs (like the TextLMDataBunch or TextClasDataBunch).
You add the test set with the add_test method, after the labelling. It can either take an array of items or an ItemList.

313V · November 16, 2018, 2:54pm

Thanks

sgugger · November 16, 2018, 3:59pm

So to continue on the renamings:

get_tabular_learner is now tabular_learner
get_collab_learner is now collab_learner
get_embedding is now embedding

Args of text_classifier_learner and language_model_learner are aligned.

pgollakota · November 16, 2018, 4:03pm

Hello everyone,

I added a new feature to explore a given ImageDataBunch using PAIR’s Facets Dive exploration. You may have seen the excellent visualization of the Quick, Draw! dataset in Facets Dive. I find it really useful to be able to rapidly evaluate which images are doing worse or better and get an intuitive sense of the images, and Facets Dive helps immensely in that regard.

Here is the diff and the doc notebook for starters. I still need to add tests, so it’s a WIP. I just wanted to announce it to get some feedback about whether you would consider adding this to FastAI or if it is considered out of scope.

Facets Dive GIF

A big caveat is that Facets Dive has still not been ported to work with Jupyter Lab. It only works with Jupyter Notebook. There is an outstanding issue in the Facets repo to fix the Jupyter Lab incompatibility.

Any feedback appreciated.

stas · November 16, 2018, 5:30pm

perfect!

and create_cnn => cnn_learner ?

sgugger · November 16, 2018, 5:33pm

No this one stays as it is.

stas · November 16, 2018, 5:34pm

despite it returning a learner object, just like the others?

piotr.czapla · November 16, 2018, 9:40pm

if you want to keep compatibility, with the recorded videos why not to simply add deprecation warning to create_cnn and pointing ppl to new cnn_learner ?