Fastai v2 chat

Well, that’s embarrassing :eyes:

We’re all still learning new things :wink: Thanks for the explanation!

2 Likes

:open_mouth: I knew you could pass callbacks both at init and at fit, but I wasn’t aware of the difference either! I expected the difference in behaviour to be built into each callback.
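
For reference, a minimal sketch of the two ways of passing them (dls is assumed to already exist, and MixUp/SaveModelCallback are just example callbacks); as I understand it, callbacks passed at init stay attached to the Learner, while those passed to fit only apply to that call:

from fastai2.vision.all import *

learn = cnn_learner(dls, resnet34, cbs=MixUp())    # stays attached to the Learner
learn.fit_one_cycle(1, cbs=SaveModelCallback())    # only active during this fit call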

1 Like

This is a bit of a general question. I understand that when we use the high-level DataBlock, ToTensor is applied automatically, but if I look at what it is, it’s simply:

class ToTensor(Transform):
    "Convert item to appropriate tensor class"
    order = 5

I then went and looked at Transform to see if I could find where the conversion is explicitly done, but I couldn’t figure out where the magic happens. Any hints as to what I’m missing?

Take a look at this, where ToTensor is defined for images.

Aha! That’s what I was missing. It simply subclasses Transform, and the encodes implementations are assigned in the submodules. That makes perfect sense. Thanks @scart97 :slight_smile:

(Along with the tensor classes being assigned to each Pillow type)
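
For anyone else hunting for it, the pattern looks roughly like this (a sketch of the mechanism, not the exact library source):

from fastai2.vision.all import *

# A tensor class is attached to each Pillow type...
PILImage._tensor_cls = TensorImage

# ...and decorating a function named `encodes` with the Transform subclass
# registers it on ToTensor's type-dispatch table, keyed by the annotated type.
@ToTensor
def encodes(self, o:PILImage): return o._tensor_cls(image2tensor(o))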

I want to clarify something before I make a PR: does passing get_x and/or get_y to DataBlock only work when n_inp=1?

I’m asking this because of the following lines of code:

if self.get_x: self.getters[0] = self.get_x
if self.get_y: self.getters[1] = self.get_y

Now, let’s say I have 2 inputs and 1 output. If I pass get_y it will be wrongly assigned to the second input.

I think we should do something like:

if self.get_x: self.getters[:self.n_inp] = self.get_x
if self.get_y: self.getters[self.n_inp:] = self.get_y

We would also need to check that get_x is a list of length 2 (i.e. of length n_inp) in this case.

5 Likes

Yes, for now, get_x/get_y only work for one input and one target. In other cases, you are supposed to provide getters.
But I like your approach! Feel free to suggest a PR with it!
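
In the meantime, providing getters directly looks roughly like this (a hedged sketch; the blocks and column names are made-up placeholders):

from fastai2.vision.all import *

# Hypothetical DataBlock with 2 inputs and 1 target: one getter per block
dblock = DataBlock(blocks=(ImageBlock, ImageBlock, CategoryBlock),
                   n_inp=2,
                   getters=[ColReader('img1'), ColReader('img2'), ColReader('label')],
                   item_tfms=Resize(224))
dls = dblock.dataloaders(df)  # df: a DataFrame with img1/img2/label columns (assumed)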

1 Like

Me too.

1 Like

Hi,

Is there already a loss function that can be used for multi-target classification? In my DataLoaders I end up with a collated tensor of one-hot-encoded vectors, but after that the Learner fails on the loss calculation (I tried LabelSmoothingCrossEntropy and CrossEntropyLossFlat). I went through the notebooks but couldn’t find any example that covers this (only the planet dataset, and that focuses on how to prepare the data). Thanks for your help!

That’s because if you choose MultiCategoryBlock, the proper loss function is already assigned (if using cnn_learner), which is BCEWithLogitsLossFlat.
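
A minimal sketch of that setup (path, df and the column names are assumptions, not from your post):

from fastai2.vision.all import *

dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   get_x=ColReader('fname', pref=path/'train'),
                   get_y=ColReader('labels', label_delim=' '),
                   item_tfms=Resize(224))
dls = dblock.dataloaders(df)
learn = cnn_learner(dls, resnet34, metrics=accuracy_multi)
learn.loss_func  # -> BCEWithLogitsLossFlat(), picked automatically for multi-label targets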

2 Likes

Ah ok, thank you. I need to dig deeper into that bit.

I’m not sure what PointScaler does with a TensorBBox.

@PointScaler
def encodes(self, x:TensorBBox):
    pnts = self.encodes(cast(x.view(-1,2), TensorPoint))
    return cast(pnts.view(-1, 4), TensorBBox)

@PointScaler
def decodes(self, x:TensorBBox):
    pnts = self.decodes(cast(x.view(-1,2), TensorPoint))
    return cast(pnts.view(-1, 4), TensorBBox)

I see the method calling encodes on self, but there’s no encodes implementation in TensorBBox or TensorPoint.

EDIT: I found scale_pnts being called for TensorPoint, but I don’t get what it’s doing. Is there any reference to learn about this scaling?

I’m trying to understand what it does to a TensorBBox; the code works with Pipeline but not with Datasets:

# PointScaler expects `img_size` in _meta
class AddImsize(Transform):
  def __init__(self,sz=128): self.sz=sz
  def encodes(self, x:TensorBBox): 
    x._meta = {'img_size': self.sz}
    return x    
# This works
p = Pipeline([img2bbox.__getitem__, TensorBBox.create, AddImsize, PointScaler]); p(imgs[0])

# But this does not
itfms = [lambda o: path/'train'/o, PILImage.create]
bbtfms = [img2bbox.__getitem__, TensorBBox.create, AddImsize(), PointScaler]
ds = Datasets(imgs,[itfms,bbtfms]) 

Causes AttributeError: do_item

Here’s the complete stack trace

<ipython-input-80-178eb5fcaa07> in <module>()
      7 itfms = [lambda o: path/'train'/o, PILImage.create]
      8 bbtfms = [img2bbox.__getitem__, TensorBBox.create, AddImsize(), PointScaler]
----> 9 tds = Datasets(imgs,[itfms,bbtfms])
     10 # p = Pipeline([img2bbox.__getitem__, TensorBBox.create, AddImsize, PointScaler]); p(imgs[0])
     11 # s = p(imgs[0])

12 frames

/usr/local/lib/python3.6/dist-packages/fastai2/data/core.py in __init__(self, items, tfms, tls, n_inp, dl_type, **kwargs)
    272     def __init__(self, items=None, tfms=None, tls=None, n_inp=None, dl_type=None, **kwargs):
    273         super().__init__(dl_type=dl_type)
--> 274         self.tls = L(tls if tls else [TfmdLists(items, t, **kwargs) for t in L(ifnone(tfms,[None]))])
    275         self.n_inp = (1 if len(self.tls)==1 else len(self.tls)-1) if n_inp is None else n_inp
    276 

/usr/local/lib/python3.6/dist-packages/fastai2/data/core.py in <listcomp>(.0)
    272     def __init__(self, items=None, tfms=None, tls=None, n_inp=None, dl_type=None, **kwargs):
    273         super().__init__(dl_type=dl_type)
--> 274         self.tls = L(tls if tls else [TfmdLists(items, t, **kwargs) for t in L(ifnone(tfms,[None]))])
    275         self.n_inp = (1 if len(self.tls)==1 else len(self.tls)-1) if n_inp is None else n_inp
    276 

/usr/local/lib/python3.6/dist-packages/fastcore/foundation.py in __call__(cls, x, *args, **kwargs)
     39             return x
     40 
---> 41         res = super().__call__(*((x,) + args), **kwargs)
     42         res._newchk = 0
     43         return res

/usr/local/lib/python3.6/dist-packages/fastai2/data/core.py in __init__(self, items, tfms, use_list, do_setup, split_idx, train_setup, splits, types, verbose)
    212         if do_setup:
    213             pv(f"Setting up {self.tfms}", verbose)
--> 214             self.setup(train_setup=train_setup)
    215 
    216     def _new(self, items, **kwargs): return super()._new(items, tfms=self.tfms, do_setup=False, types=self.types, **kwargs)

/usr/local/lib/python3.6/dist-packages/fastai2/data/core.py in setup(self, train_setup)
    226 
    227     def setup(self, train_setup=True):
--> 228         self.tfms.setup(self, train_setup)
    229         if len(self) != 0:
    230             x = super().__getitem__(0) if self.splits is None else super().__getitem__(self.splits[0])[0]

/usr/local/lib/python3.6/dist-packages/fastcore/transform.py in setup(self, items, train_setup)
    177         tfms = self.fs[:]
    178         self.fs.clear()
--> 179         for t in tfms: self.add(t,items, train_setup)
    180 
    181     def add(self,t, items=None, train_setup=False):

/usr/local/lib/python3.6/dist-packages/fastcore/transform.py in add(self, t, items, train_setup)
    180 
    181     def add(self,t, items=None, train_setup=False):
--> 182         t.setup(items, train_setup)
    183         self.fs.append(t)
    184 

/usr/local/lib/python3.6/dist-packages/fastcore/transform.py in setup(self, items, train_setup)
     76     def setup(self, items=None, train_setup=False):
     77         train_setup = train_setup if self.train_setup is None else self.train_setup
---> 78         return self.setups(getattr(items, 'train', items) if train_setup else items)
     79 
     80     def _call(self, fn, x, split_idx=None, **kwargs):

/usr/local/lib/python3.6/dist-packages/fastcore/dispatch.py in __call__(self, *args, **kwargs)
     96         if not f: return args[0]
     97         if self.inst is not None: f = MethodType(f, self.inst)
---> 98         return f(*args, **kwargs)
     99 
    100     def __get__(self, inst, owner):

/usr/local/lib/python3.6/dist-packages/fastai2/vision/core.py in setups(self, dl)
    235 
    236     def setups(self, dl):
--> 237         its = dl.do_item(0)
    238         for t in its:
    239             if isinstance(t, TensorPoint): self.c = t.numel()

/usr/local/lib/python3.6/dist-packages/fastcore/foundation.py in __getattr__(self, k)
    228         if self._component_attr_filter(k):
    229             attr = getattr(self,self._default,None)
--> 230             if attr is not None: return getattr(attr,k)
    231         raise AttributeError(k)
    232     def __dir__(self): return custom_dir(self,self._dir())

/usr/local/lib/python3.6/dist-packages/fastcore/transform.py in __getattr__(self, k)
    187     def __getitem__(self,i): return self.fs[i]
    188     def __setstate__(self,data): self.__dict__.update(data)
--> 189     def __getattr__(self,k): return gather_attrs(self, k, 'fs')
    190     def __dir__(self): return super().__dir__() + gather_attr_names(self, 'fs')
    191 

/usr/local/lib/python3.6/dist-packages/fastcore/transform.py in gather_attrs(o, k, nm)
    151     att = getattr(o,nm)
    152     res = [t for t in att.attrgot(k) if t is not None]
--> 153     if not res: raise AttributeError(k)
    154     return res[0] if len(res)==1 else L(res)
    155 

AttributeError: do_item

Interesting note: when I pass in Pipelines of transforms, it surprisingly works:

p1 = Pipeline(itfms)
p2 = Pipeline(bbtfms)
ds = Datasets(imgs,[p1,p2])

This transform is not meant to be used on its own in a Datasets, but at the DataLoader level as after_item. It needs the tuple (image, points) to work.
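
A sketch of the kind of setup meant here, reusing the names from the snippet above (exact kwargs may differ):

# PointScaler needs the whole (image, bbox) tuple to grab the image size,
# so apply it in the DataLoader via after_item rather than inside a Datasets pipeline.
itfms  = [lambda o: path/'train'/o, PILImage.create]
bbtfms = [img2bbox.__getitem__, TensorBBox.create]
ds  = Datasets(imgs, [itfms, bbtfms])
dls = ds.dataloaders(bs=8, after_item=[PointScaler(), ToTensor()])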

Hi,
I’m trying to create a TfmdDL subclass, but I need to be able to pass a new collate_fn. Is there any way to do that in fastai2 similar to how it was done in fastai?

If your collate function is just an operation on samples (like padding), you should pass it to before_batch (it expects a list of samples and should return the modified list).
If it’s a very custom collation in itself, it’s the function create_batch you want to modify (it defaults to fa_collate or fa_convert, depending on whether your data loader has a batch size or not).
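
For example (a rough sketch; pad_to_max and my_collate are hypothetical helpers, and dset is assumed to exist):

from fastai2.data.core import TfmdDL

# Option 1: a sample-level tweak (e.g. padding) passed as before_batch
dl = TfmdDL(dset, bs=16, before_batch=pad_to_max)  # list of samples in, modified list out

# Option 2: fully custom collation by overriding create_batch in a subclass
class MyDL(TfmdDL):
    def create_batch(self, b):
        # `b` is the list of samples for one batch; return the collated batch
        return my_collate(b)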

1 Like

Thanks. I’ve opened a PR for this here

I’m trying to run TTA on the test set, but it is using the validation set instead.
get_preds works fine though.
I’m using current fastai2 github version.
Has anyone experienced this?

tst_dl = dls.test_dl([path.ls()[0]/x for x in testdf.Image.values])

preds=learn.tta(dl=tst_dl)
preds[0].shape #==> torch.Size([600, 4])

preds=learn.get_preds(dl=tst_dl)
preds[0].shape #==> torch.Size([3219, 4])

Ah yes, it was using the ds_idx internally and not the dl. Should be fixed now.

2 Likes

I’ve noticed that get_image_files() from data.transforms doesn’t find all the image files.
Investigation showed that image_extensions is not initialized with all possible mimetypes.

Fix: add mimetypes.init()

mimetypes.init()
image_extensions = set(k for k,v in mimetypes.types_map.items() if v.startswith('image/'))

See what difference this makes.
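
For example, you can check the difference yourself (the extra extensions depend on the system’s mime.types files, so this is just illustrative):

import mimetypes

before = {k for k, v in mimetypes.types_map.items() if v.startswith('image/')}
mimetypes.init()  # also loads the platform's mime.types files
after = {k for k, v in mimetypes.types_map.items() if v.startswith('image/')}
print(len(before), len(after), sorted(after - before))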

6 Likes