That’s certainly possible, and a very cool idea. You just need to be careful about how you handle cases where you’re doing some form of learning other than classification; how would this work for regression, for example?
I wonder what happens with unique in that example: it shows the same picture/text/input with various data augmentations and the float target.
Looking at GradientAccumulation again, I wonder if there should be the following updates:
- define it by the number of batches to run (n_batches) instead of the minimum number of items to accumulate (n_acc)
- scale the loss prior to calculating the gradients (and scale it back for logging purposes) so that we don’t have to adjust the learning rate
Do you think it would make it more intuitive to use? Roughly what I mean is sketched below.
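A hedged sketch of that idea, assuming fastai2’s callback events (begin_fit, after_loss, after_backward) and a hypothetical n_batches parameter; this is not the library’s implementation:

from fastai2.basics import *

class ScaledGradientAccumulation(Callback):
    "Accumulate gradients over `n_batches` batches, scaling the loss to match."
    def __init__(self, n_batches=4): self.n_batches = n_batches
    def begin_fit(self): self.count = 0
    def after_loss(self):
        # Scale the loss so the accumulated gradient matches one big batch;
        # the unscaled value (loss * n_batches) could still be kept for logging.
        self.learn.loss = self.learn.loss / self.n_batches
    def after_backward(self):
        self.count += 1
        if self.count % self.n_batches != 0:
            # Skip opt.step/zero_grad for this batch; gradients keep accumulating
            raise CancelBatchException()

You would then pass it like any other callback, e.g. learn.fit(1, cbs=ScaledGradientAccumulation(n_batches=4)).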
Hi! Is there a way to work with imbalanced classes in a tabular dataset in fastai2? I can see references to weighted_dataloaders in the code, but I am not sure how to use it.
Thank you!
This is the correct one to use!
The idea is to pass a weight for each item of your dataset. So, for example, you could define weights so that the sums of the weights for each class are all equal, i.e. 1/n_class_samples for each item of a specific class.
This would give each class the same chance of being drawn. The risk is that if a class has very few samples you could overfit on that class, so you may want to check your per-class accuracy at the end of training.
You can see an example here: https://github.com/fastai/fastai2/blob/master/nbs/14a_callback.data.ipynb
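For instance, here is a minimal sketch (mine, not from the notebook) that builds 1/n_class_samples weights, assuming dsets is a Datasets object whose targets are integer category labels as the second element of each item:

import numpy as np

# Class index of each training item
labels = np.array([int(dsets.train[i][1]) for i in range(len(dsets.train))])
counts = np.bincount(labels)   # number of samples per class
wgts = (1. / counts)[labels]   # 1/n_class_samples for each item
dls = dsets.weighted_dataloaders(wgts, bs=64)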
Thank you! For some reason, of the two methods in that notebook, the only one I can use from a TabularPandas object is partial_dataloaders; weighted_dataloaders does not seem to be recognized. Any idea why? If you think this is more suitable for the tabular forum, I will move it there.
Thanks!
Would anyone know if there is anything similar but for text? And… for multi-label text?
I’m trying to run fastai2 training with tracking in Weights & Biases (wandb). I’m using the DataBlock API for image classification, and instantiating the dataloaders from a pandas DataFrame. I got the following error:
/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
187 try:
--> 188 self._do_begin_fit(n_epoch)
189 for epoch in range(n_epoch):
/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in _do_begin_fit(self, n_epoch)
159 def _do_begin_fit(self, n_epoch):
--> 160 self.n_epoch,self.loss = n_epoch,tensor(0.); self('begin_fit')
161
/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in __call__(self, event_name)
123
--> 124 def __call__(self, event_name): L(event_name).map(self._call_one)
125 def _call_one(self, event_name):
/usr/local/lib/python3.6/dist-packages/fastcore/foundation.py in map(self, f, *args, **kwargs)
371 else f.__getitem__)
--> 372 return self._new(map(g, self))
373
/usr/local/lib/python3.6/dist-packages/fastcore/foundation.py in _new(self, items, *args, **kwargs)
322 def _xtra(self): return None
--> 323 def _new(self, items, *args, **kwargs): return type(self)(items, *args, use_list=None, **kwargs)
324 def __getitem__(self, idx): return self._get(idx) if is_indexer(idx) else L(self._get(idx), use_list=None)
/usr/local/lib/python3.6/dist-packages/fastcore/foundation.py in __call__(cls, x, *args, **kwargs)
40
---> 41 res = super().__call__(*((x,) + args), **kwargs)
42 res._newchk = 0
/usr/local/lib/python3.6/dist-packages/fastcore/foundation.py in __init__(self, items, use_list, match, *rest)
313 if (use_list is not None) or not _is_array(items):
--> 314 items = list(items) if use_list else _listify(items)
315 if match is not None:
/usr/local/lib/python3.6/dist-packages/fastcore/foundation.py in _listify(o)
249 if isinstance(o, str) or _is_array(o): return [o]
--> 250 if is_iter(o): return list(o)
251 return [o]
/usr/local/lib/python3.6/dist-packages/fastcore/foundation.py in __call__(self, *args, **kwargs)
215 fargs = [args[x.i] if isinstance(x, _Arg) else x for x in self.pargs] + args[self.maxi+1:]
--> 216 return self.fn(*fargs, **kwargs)
217
/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in _call_one(self, event_name)
126 assert hasattr(event, event_name)
--> 127 [cb(event_name) for cb in sort_by_run(self.cbs)]
128
/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in <listcomp>(.0)
126 assert hasattr(event, event_name)
--> 127 [cb(event_name) for cb in sort_by_run(self.cbs)]
128
/usr/local/lib/python3.6/dist-packages/fastai2/callback/core.py in __call__(self, event_name)
23 (self.run_valid and not getattr(self, 'training', False)))
---> 24 if self.run and _run: getattr(self, event_name, noop)()
25 if event_name=='after_fit': self.run=True #Reset self.run to True at each end of fit
/usr/local/lib/python3.6/dist-packages/fastai2/callback/wandb.py in begin_fit(self)
55 idxs = wandbRandom.sample(range(len(self.dls.valid_ds)), self.n_preds)
---> 56 test_items = [self.dls.valid_ds.items[i] for i in idxs]
57 self.valid_dl = self.dls.test_dl(test_items, with_labels=True)
/usr/local/lib/python3.6/dist-packages/fastai2/callback/wandb.py in <listcomp>(.0)
55 idxs = wandbRandom.sample(range(len(self.dls.valid_ds)), self.n_preds)
---> 56 test_items = [self.dls.valid_ds.items[i] for i in idxs]
57 self.valid_dl = self.dls.test_dl(test_items, with_labels=True)
/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in __getitem__(self, key)
2799 return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key)
2801 if is_integer(indexer):
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2647 except KeyError:
-> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key))
2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 3412
I was able to reproduce it by running:
wandbRandom = random.Random(0) # For repeatability
idxs = wandbRandom.sample(range(len(dls.valid_ds)), 36)
test_items = [dls.valid_ds.items[i] for i in idxs]
I think it has to do with indexing into the DataFrame: valid_ds.items is a pandas DataFrame here, and items[i] with an integer does a column lookup rather than a row lookup, hence the KeyError.
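A toy illustration of the pitfall (made-up DataFrame, not the notebook’s data):

import pandas as pd

df = pd.DataFrame({'fname': ['a.jpg', 'b.jpg'], 'label': [0, 1]})
# df[0]           # KeyError: 0 -- [] with an integer is a *column* lookup
row = df.iloc[0]  # positional row access is what the callback needs here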
@vrodriguezf Can you try to add this cell in your notebook and let me know if it works for your project?
@patch
@delegates(Datasets.dataloaders)
def weighted_dataloaders(self:FilteredBase, wgts, bs=64, **kwargs):
    xtra_kwargs = [{}] * (self.n_subsets-1)
    return self.dataloaders(bs=bs, dl_type=WeightedDL, dl_kwargs=({'wgts':wgts}, *xtra_kwargs), **kwargs)
If that’s good I’ll send a PR.
@Pablo those should work with any type of data. You just need to define the weights you want to use.
@zlapp Do you have a full example I could test?
This will make it easier to propose a fix.
In the meantime you can use log_preds=False in the callback.
I’ll propose a PR so that the callback does not completely fail if there is an error in logging predictions.
Thanks @boris, I will try with log_preds=False.
Attached is a standalone notebook I put together; I hope it’s clear enough. I artificially created a df from dogs vs. cats using MultiCategoryBlock (since that is my use case), but I think the issue persists with CategoryBlock as well, and I was able to reproduce the error.
A possible fix for that line in the callback would be to fall back to positional indexing when items is a DataFrame:
test_items = [getattr(self.dls.valid_ds.items, 'iloc', self.dls.valid_ds.items)[i] for i in idxs]
I ran into a similar indexing problem when DistributedDL “wraps around” an underlying TabularPandas-based dataloader: an integer index reference, e.g. dataset[i], will break.
Perhaps it’s worthwhile to do a blanket search for patterns like *.items\[ and .dataset\[ (and others), and consider a fix at the Datasets level…
Yeah, I guess the challenge is how to define those weights for multi-label… I am actually trying something like that at the moment; let’s see how it goes. One possible scheme is sketched below.
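One possible (untested) scheme, not from this thread: weight each item by the inverse frequency of its rarest label, so items carrying rare labels are drawn more often. With made-up targets:

import numpy as np

labels = [[0, 2], [1], [0], [2], [1, 2]]   # hypothetical multi-label targets
counts = np.bincount([l for ls in labels for l in ls], minlength=3)
wgts = [float(max(1. / counts[ls])) for ls in labels]  # rarest label dominates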
That code worked; now I can call weighted_dataloaders from a TabularPandas object, thanks!
However, I am getting this warning in the call:
wdls = to.weighted_dataloaders(wgts=range(len(to.train)), bs=16)
Could not do one pass in your dataloader, there is something wrong in it
And then an error when I call wdls.show_batch(). I am assuming that the weights of a weighted_dataloader must have the same length as the training dataset, is that correct?
Does it work if you use a regular dataloader? It’s best to share a small reproducible example.
Apologies for cross-posting this from another forum. I am seeking advice on the best fastai-based solution for working with images where the input annotations are bounding boxes (not segmentation masks). New models seem to be cropping up very fast:
- Fast R-CNN (Girshick 2015),
- Faster R-CNN (Ren et al. 2016),
- Feature Pyramid Networks (Lin et al 2017),
- Mask R-CNN (He et al. 2017),
- Mask scoring R-CNN (Huang et al. 2019),
- Detectron1 and now Detectron2 in PyTorch (Wu et al. 2019), etc.
Does anyone know what the state of the art would be in terms of models that are already implemented in either fastai v1 or fastai v2, for learning from bounding box annotations? The classes covered one approach in late 2018 but I’m wondering if there are better approaches now.
I think none of them is implemented; only U-Net for semantic segmentation is. However, you could just wrap a torchvision model in a Learner and get all the advantages of the fastai2 framework! See the sketch below.
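For example, a minimal sketch of the wrapping idea, assuming an existing dls (plain classifier shown; detection models need extra glue, since their forward takes targets during training):

from fastai2.vision.all import *
import torchvision

model = torchvision.models.resnet34(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, dls.c)  # resize the head to your classes
learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(), metrics=accuracy)
learn.fit_one_cycle(1)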
I did see one implementation of Mask R-CNN here, but the architecture is quite complicated and it doesn’t look like you could easily wrap it in a Learner. The author called it “mainly a personal learning exercise” and the repo doesn’t appear to have been very active, but I don’t understand why more people haven’t picked it up (is it hard to use? Slow to train? Are there better approaches?). I also noticed that Detectron2 is a PyTorch rewrite of Detectron1, but I don’t know whether anyone has ported it to fastai yet.