That’s certainly possible, and a very cool idea. You just need to be careful about how you handle cases where you’re doing some form of learning other than classification; how would this work for regression, for example?
I wonder what happens with unique in that example: it shows the same picture/text/input with various data augmentations and the float target.
Looking at GradientAccumulation again, I wonder if there should be the following updates:
- define it by the number of batches to run (n_batches) instead of the minimum number of items to accumulate (n_acc)
- scale the loss prior to calculating the gradients (and scale it back for logging purposes) so that we don’t have to adjust the learning rate
Do you think it would make it more intuitive to use? Roughly what I mean is sketched below.
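A hedged sketch of that idea, assuming fastai2’s callback events (begin_fit, after_loss, after_backward) and a hypothetical n_batches parameter; this is not the library’s implementation:

from fastai2.basics import *

class ScaledGradientAccumulation(Callback):
    "Accumulate gradients over `n_batches` batches, scaling the loss to match."
    def __init__(self, n_batches=4): self.n_batches = n_batches
    def begin_fit(self): self.count = 0
    def after_loss(self):
        # Scale the loss so the accumulated gradient matches one big batch;
        # the unscaled value (loss * n_batches) could still be kept for logging.
        self.learn.loss = self.learn.loss / self.n_batches
    def after_backward(self):
        self.count += 1
        if self.count % self.n_batches != 0:
            # Skip opt.step/zero_grad for this batch; gradients keep accumulating
            raise CancelBatchException()

You would then pass it like any other callback, e.g. learn.fit(1, cbs=ScaledGradientAccumulation(n_batches=4)).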
Hi! Is there a way to work with imbalanced classes in a tabular dataset in fastai2? I can see references to weighted_dataloaders in the code, but I am not sure how to use it.
Thank you!
This is the correct one to use!
The idea is to pass a weight for each item of your dataset. So, for example, you could define weights so that the sums of the weights for each class are all equal, i.e. 1/n_class_samples for each item of a specific class.
This would give each class the same chance of being drawn. The risk is that if a class has very few samples you could overfit on that class, so you may want to check your per-class accuracy at the end of training.
You can see an example here: https://github.com/fastai/fastai2/blob/master/nbs/14a_callback.data.ipynb
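For instance, here is a minimal sketch (mine, not from the notebook) that builds 1/n_class_samples weights, assuming dsets is a Datasets object whose targets are integer category labels as the second element of each item:

import numpy as np

# Class index of each training item
labels = np.array([int(dsets.train[i][1]) for i in range(len(dsets.train))])
counts = np.bincount(labels)   # number of samples per class
wgts = (1. / counts)[labels]   # 1/n_class_samples for each item
dls = dsets.weighted_dataloaders(wgts, bs=64)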
Thank you! For some reason, of the two methods in that notebook, the only one I can use from a TabularPandas object is partial_dataloaders; weighted_dataloaders does not seem to be recognized. Any idea why? If you think this is more suitable for the tabular forum, I will move it there.
Thanks!
Would anyone know if there is anything similar but for text? And… for multi-label text?
I’m trying to run fastai2 training with tracking in Weights & Biases (wandb). I’m using the DataBlock API for image classification, and instantiating the dataloaders from a pandas DataFrame. I got the following error:
/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
187 try:
--> 188 self._do_begin_fit(n_epoch)
189 for epoch in range(n_epoch):
/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in _do_begin_fit(self, n_epoch)
159 def _do_begin_fit(self, n_epoch):
--> 160 self.n_epoch,self.loss = n_epoch,tensor(0.); self('begin_fit')
161
/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in __call__(self, event_name)
123
--> 124 def __call__(self, event_name): L(event_name).map(self._call_one)
125 def _call_one(self, event_name):
/usr/local/lib/python3.6/dist-packages/fastcore/foundation.py in map(self, f, *args, **kwargs)
371 else f.__getitem__)
--> 372 return self._new(map(g, self))
373
/usr/local/lib/python3.6/dist-packages/fastcore/foundation.py in _new(self, items, *args, **kwargs)
322 def _xtra(self): return None
--> 323 def _new(self, items, *args, **kwargs): return type(self)(items, *args, use_list=None, **kwargs)
324 def __getitem__(self, idx): return self._get(idx) if is_indexer(idx) else L(self._get(idx), use_list=None)
/usr/local/lib/python3.6/dist-packages/fastcore/foundation.py in __call__(cls, x, *args, **kwargs)
40
---> 41 res = super().__call__(*((x,) + args), **kwargs)
42 res._newchk = 0
/usr/local/lib/python3.6/dist-packages/fastcore/foundation.py in __init__(self, items, use_list, match, *rest)
313 if (use_list is not None) or not _is_array(items):
--> 314 items = list(items) if use_list else _listify(items)
315 if match is not None:
/usr/local/lib/python3.6/dist-packages/fastcore/foundation.py in _listify(o)
249 if isinstance(o, str) or _is_array(o): return [o]
--> 250 if is_iter(o): return list(o)
251 return [o]
/usr/local/lib/python3.6/dist-packages/fastcore/foundation.py in __call__(self, *args, **kwargs)
215 fargs = [args[x.i] if isinstance(x, _Arg) else x for x in self.pargs] + args[self.maxi+1:]
--> 216 return self.fn(*fargs, **kwargs)
217
/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in _call_one(self, event_name)
126 assert hasattr(event, event_name)
--> 127 [cb(event_name) for cb in sort_by_run(self.cbs)]
128
/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in <listcomp>(.0)
126 assert hasattr(event, event_name)
--> 127 [cb(event_name) for cb in sort_by_run(self.cbs)]
128
/usr/local/lib/python3.6/dist-packages/fastai2/callback/core.py in __call__(self, event_name)
23 (self.run_valid and not getattr(self, 'training', False)))
---> 24 if self.run and _run: getattr(self, event_name, noop)()
25 if event_name=='after_fit': self.run=True #Reset self.run to True at each end of fit
/usr/local/lib/python3.6/dist-packages/fastai2/callback/wandb.py in begin_fit(self)
55 idxs = wandbRandom.sample(range(len(self.dls.valid_ds)), self.n_preds)
---> 56 test_items = [self.dls.valid_ds.items[i] for i in idxs]
57 self.valid_dl = self.dls.test_dl(test_items, with_labels=True)
/usr/local/lib/python3.6/dist-packages/fastai2/callback/wandb.py in <listcomp>(.0)
55 idxs = wandbRandom.sample(range(len(self.dls.valid_ds)), self.n_preds)
---> 56 test_items = [self.dls.valid_ds.items[i] for i in idxs]
57 self.valid_dl = self.dls.test_dl(test_items, with_labels=True)
/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in __getitem__(self, key)
2799 return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key)
2801 if is_integer(indexer):
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2647 except KeyError:
-> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key))
2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 3412
I was able to reproduce it by running:
wandbRandom = random.Random(0) # For repeatability
idxs = wandbRandom.sample(range(len(dls.valid_ds)), 36)
test_items = [dls.valid_ds.items[i] for i in idxs]
I think it has to do with indexing into the DataFrame: valid_ds.items is a pandas DataFrame here, and items[i] with an integer does a column lookup rather than a row lookup, hence the KeyError.
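A toy illustration of the pitfall (made-up DataFrame, not the notebook’s data):

import pandas as pd

df = pd.DataFrame({'fname': ['a.jpg', 'b.jpg'], 'label': [0, 1]})
# df[0]           # KeyError: 0 -- [] with an integer is a *column* lookup
row = df.iloc[0]  # positional row access is what the callback needs here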
@vrodriguezf Can you try to add this cell in your notebook and let me know if it works for your project?
@patch
@delegates(Datasets.dataloaders)
def weighted_dataloaders(self:FilteredBase, wgts, bs=64, **kwargs):
    xtra_kwargs = [{}] * (self.n_subsets-1)
    return self.dataloaders(bs=bs, dl_type=WeightedDL, dl_kwargs=({'wgts':wgts}, *xtra_kwargs), **kwargs)
If that’s good I’ll send a PR.
@Pablo those should work with any type of data. You just need to define the weights you want to use.
@zlapp Do you have a full example I could test?
This will make it easier to propose a fix.
In the meantime you can use log_preds=False in the callback.
I’ll propose a PR so that the callback does not completely fail if there is an error in logging predictions.
Thanks @boris, I will try with log_preds=False.
Attached is a standalone notebook I put together; I hope it’s clear enough. I artificially created a df from dogs vs. cats using MultiCategoryBlock (since that is my use case), but I think the issue persists with CategoryBlock as well, and I was able to reproduce the error.
A possible fix for that line in the callback would be to fall back to positional indexing when items is a DataFrame:
test_items = [getattr(self.dls.valid_ds.items, 'iloc', self.dls.valid_ds.items)[i] for i in idxs]
I ran into a similar indexing problem when DistributedDL “wraps around” an underlying TabularPandas-based dataloader: an integer index reference, e.g. dataset[i], will break.
Perhaps it’s worthwhile to do a blanket search for patterns like *.items\[ and .dataset\[ (and others), and consider a fix at the Datasets level…
Yeah, I guess the challenge is how to define those weights for multi-label… I am actually trying something like that at the moment; let’s see how it goes. One possible scheme is sketched below.
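One possible (untested) scheme, not from this thread: weight each item by the inverse frequency of its rarest label, so items carrying rare labels are drawn more often. With made-up targets:

import numpy as np

labels = [[0, 2], [1], [0], [2], [1, 2]]   # hypothetical multi-label targets
counts = np.bincount([l for ls in labels for l in ls], minlength=3)
wgts = [float(max(1. / counts[ls])) for ls in labels]  # rarest label dominates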
That code worked; now I can call weighted_dataloaders from a TabularPandas object, thanks!
However, I am getting this warning in the call:
wdls = to.weighted_dataloaders(wgts=range(len(to.train)), bs=16)
Could not do one pass in your dataloader, there is something wrong in it
And then an error when I call wdls.show_batch(). I am assuming that the weights of a weighted_dataloader must have the same length as the training dataset, is that correct?
Does it work if you use a regular dataloader? It’s best to share a small reproducible example.
Apologies for cross-posting this from another forum. I am seeking advice on the best fastai-based solution for working with images where the input annotations are bounding boxes (not segmentation masks). New models seem to be cropping up very fast:
- Fast R-CNN (Girshick 2015),
- Faster R-CNN (Ren et al. 2016),
- Feature Pyramid Networks (Lin et al 2017),
- Mask R-CNN (He et al. 2017),
- Mask scoring R-CNN (Huang et al. 2019),
- Detectron1 and now Detectron2 in PyTorch (Wu et al. 2019), etc.
Does anyone know what the state of the art would be in terms of models that are already implemented in either fastai v1 or fastai v2, for learning from bounding box annotations? The classes covered one approach in late 2018 but I’m wondering if there are better approaches now.
I think none of them is implemented; only U-Net for semantic segmentation is. However, you could just wrap a torchvision model in a Learner and get all the advantages of the fastai2 framework! See the sketch below.
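For example, a minimal sketch of the wrapping idea, assuming an existing dls (plain classifier shown; detection models need extra glue, since their forward takes targets during training):

from fastai2.vision.all import *
import torchvision

model = torchvision.models.resnet34(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, dls.c)  # resize the head to your classes
learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(), metrics=accuracy)
learn.fit_one_cycle(1)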
I did see one implementation of Mask R-CNN here, but the architecture is quite complicated and it doesn’t look like you could easily wrap it in a Learner. The author called it “mainly a personal learning exercise” and the repo doesn’t appear to have been very active, but I don’t understand why more people haven’t picked it up (is it hard to use? Slow to train? Are there better approaches?). I also noticed that Detectron2 is a PyTorch rewrite of Detectron1, but I don’t know whether anyone has ported it to fastai yet.