Fastai v2 vision

I think it’s broken; it was working last week… I had the same issue with classification with 4 classes of images…

Yes, Jeremy broke this during the weekend while adding some other functionality. Will fix this morning.

Edit: Should be fixed now.

1 Like

So I have a question on the DataBlock API. I’m going through object detection at the moment and I notice we have three blocks here: ImageBlock, BBoxBlock, BBoxLblBlock. My understanding of the blocks was that we should wrap our input blocks and our output blocks each in their own tuple, so that blocks gets a tuple of 2, e.g. blocks = ((ImageBlock), (BBoxBlock, BBoxLblBlock)).

Do we just, in general, list all the blocks we will be using? Or is that why we declare n_inp=1, so it will assume the outputs are the other two blocks?

Last question: why is noop necessary? (Check me if I’m wrong here.) Can we assume we should have a getter in place for each block?

Code I am talking about:

coco = DataBlock(blocks=(ImageBlock, BBoxBlock, BBoxLblBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(),
                 getters=[noop, lambda o: img2bbox[o.name][0], lambda o: img2bbox[o.name][1]], n_inp=1)

Also, any thoughts on adding RetinaNet to the library?

My guess is that the way to go is the latter, using (ImageBlock, BBoxBlock, BBoxLblBlock) with n_inp. This ensures that at any stage in the data processing pipeline, there is a Tuple that is just one level deep.

((ImageBlock), (BBoxBlock, BBoxLblBlock)) is a nested tuple, and I don’t think Transform acts on elements that are inside a nested tuple. So, if you try to apply a rotation transform, it will just rotate the image, but not the bounding boxes or their labels.
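As a toy illustration of that point (a sketch only, not fastai’s actual dispatch code): a tuple-level transform maps over the elements of the sample and picks what to apply based on each element’s type, so a nested inner tuple doesn’t match any registered type and passes through untouched.

# Toy sketch of type-based dispatch over a sample tuple (not the real fastai code)
def apply_tfm(sample, tfms_by_type):
    # apply the function registered for each element's type, else leave the element alone
    return tuple(tfms_by_type.get(type(o), lambda x: x)(o) for o in sample)

tfms = {str: str.upper, int: lambda n: n + 1}
print(apply_tfm(('img', 3, 7), tfms))    # every element is transformed: ('IMG', 4, 8)
print(apply_tfm(('img', (3, 7)), tfms))  # the nested tuple isn't matched: ('IMG', (3, 7))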

1 Like

Makes sense. The latter was in the DataBlock notebook; I’m just trying to understand how we determine inputs and outputs. So it’s safe to assume it will take the first n_inp blocks as our inputs, and the rest will be outputs?

I’ve tried using n_inp=1 with DataSource for a tuple of length 5 (so 1 input + 4 outputs) and everything seems to work, up until, and including, calling Learner.fit.

I’ve sort of managed to train such a model, but am running into an error when using Learner's predict and show_results methods. For example, when using show_results:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-91-9ca34d9564c5> in <module>
----> 1 learn.show_results(max_n=4)

~/fastai_dev/dev/local/learner.py in show_results(self, ds_idx, dl, max_n, **kwargs)
    316         if dl is None: dl = self.dbunch.dls[ds_idx]
    317         b = dl.one_batch()
--> 318         _,_,preds = self.get_preds(dl=[b], with_decoded=True)
    319         self.dbunch.show_results(b, preds, max_n=max_n, **kwargs)
    320 

~/fastai_dev/dev/local/learner.py in get_preds(self, ds_idx, dl, with_input, with_loss, with_decoded, act, save_preds, save_targs)
    297             res = []
    298             if len(cb.preds):
--> 299                 preds = act(torch.cat(cb.preds))
    300                 res.append(preds)
    301                 if with_decoded: res.append(getattr(self.loss_func, 'decodes', noop)(preds))

TypeError: expected Tensor as element 0 in argument 0, but got tuple

As far as I can tell, it’s because the model’s output is a tuple of tensors, instead of just a tensor. The exact dimensions of the model output and targets in this instance are:

image

The images and masks are 210 px in width and 140 px in height. The batch size is 9. The target is a tuple of 4 masks, and the model outputs a 2x140x210 tensor for each of these masks.

Are there some ways that get_preds can be generalized to handle such situations?

Is the torch.cat in preds = act(torch.cat(cb.preds)) to concatenate things from different batches?
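If so, maybe get_preds could concatenate each element of a tuple output separately across batches. A rough sketch of what I mean (cat_tuple_preds is a hypothetical helper, not existing fastai API):

import torch

def cat_tuple_preds(batch_preds):
    # batch_preds: a list with one entry per batch, each entry a tuple of tensors
    # zip(*batch_preds) groups the i-th output of every batch together,
    # then each group is concatenated along the batch dimension
    return tuple(torch.cat(ps) for ps in zip(*batch_preds))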

What is the current formal way of obtaining predictions on a test set? I see from this Kaggle starter notebook from @radek that, to do this, a separate DataBunch with dummy targets is created, then it is assigned as Learner.dbunch, and then finally Learner.get_preds is called to get the predictions, along with the targets.

There exists a test_dl method for obtaining a dataloader with your test items. It can be found in data/core.py:

@delegates(TfmdDL.__init__)
def test_dl(dbunch, test_items, rm_type_tfms=0, **kwargs):
    "Create a test dataloader from `test_items` using validation transforms of `dbunch`"
    test_ds = test_set(dbunch.valid_ds, test_items, rm_tfms=rm_type_tfms) if isinstance(dbunch.valid_ds, DataSource) else test_items
    return dbunch.valid_dl.new(test_ds, **kwargs)
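For example, a hypothetical usage sketch (the test folder path and the return value of get_preds are assumptions, check the actual signatures):

test_items = get_image_files(path/'test')   # hypothetical test folder
test_dloader = test_dl(dbunch, test_items)  # uses the validation transforms, no labels needed
preds = learn.get_preds(dl=test_dloader)    # then pass that dataloader straight to get_preds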
3 Likes

For now get_preds only works with outputs that are a single tensor, not tuples. Will fix that when I have time.

1 Like

I’m trying to get a labeled ImagePoint setup. I will have a notebook showing how to do it and why I believe it should be added (it’s very similar to LabeledBBox), but I am currently running into some trouble. I’m working on showing my labeled points, and when I do:

tpnts = LabeledPnts(TensorPoint(pnts), lbls)
ctx = pnts_img.show(figsize=(3,3))
tpnts.show(ctx=ctx)

I get AttributeError: 'PathCollection' object has no property 'text'

Assume pnts is a list of points, and LabeledPnts is as follows:

class LabeledPnts(Tuple):
    "Basic type for a list of points in an image"
    def show(self, ctx=None, **kwargs):
        for b,l in zip(self.pnts, self.lbl):
            if l != '#na#': ctx = retain_type(b, self.pnts).show(ctx=ctx, text=l)
        return ctx

    pnts,lbl = add_props(lambda i,self: self[i])

It is practically the same as BBox, so I’m unsure why this is happening. Thoughts?

It’s way too little code and way too little of the error message for us to understand what’s going on.

So the goal is labelled keypoints similar to how Bounding Boxes get labels. This is useful for cases such as identifying the wrist vs the elbow, etc. I have my Block, Labeler, and points as such:

def PntLblBlock(vocab=None, add_na=True):
  return TransformBlock(type_tfms=MultiCategorize(vocab=vocab, add_na=add_na), item_tfms=PntsLabeler)

class PntsLabeler(Transform):
    def setup(self, dl): self.vocab = dl.vocab
    def before_call(self): self.pnts,self.lbls = None,None
        
    def decode(self, x, **kwargs):
        self.pnts,self.lbls = None,None
        return self._call('decodes', x, **kwargs)
    
    def decodes(self, x:TensorMultiCategory): 
        self.lbls = [self.vocab[a] for a in x]
        return x if self.pnts is None else LabeledPnts(self.pnts, self.lbls)
        
    def decodes(self, x:TensorPoint): 
        self.pnts = x
        return self.pnts if self.lbls is None else LabeledPnts(self.pnts, self.lbls)

class LabeledPnts(Tuple):
    "Basic type for a list of points in an image"
    def show(self, ctx=None, **kwargs):
        for b,l in zip(self.pnts, self.lbl):
            if l != '#na#': ctx = retain_type(b, self.pnts).show(ctx=ctx)
            ctx.annotate(l, b)
        return ctx

    pnts,lbl = add_props(lambda i,self: self[i])

My test input is 9 labels corresponding to 9 points. To ensure I visualized it right, I did the following:

img = PILImage.create(fname)
pnts_img = TensorImage(img)

lbbox = LabeledPnts(TensorPoint(pnts), lbls)
ctx = pnts_img.show(figsize=(3,3))
lbbox.show(ctx=ctx)

Which brought up the image with its points and labels shown properly. From here I made two private functions to help with the points and labels in a DataSource transform:

def _pnt_pnts(x): return TensorPoint.create(pnts)
def _pnt_lbl(x): return lbls

From here, I then created the DataSource like so (where fname points to a single image):

pnt_tds = DataSource([fname], [PILImage.create, [_pnt_pnts], [_pnt_lbl, MultiCategorize(add_na=True)]], n_inp=1)

pnt_tds outputs what I want: a tuple of a PILImage, my points, and my encoded targets.

(#1) [(<fastai2.vision.core.PILImage image mode=RGB size=1024x768 at 0x7F3F99772128>, tensor([[371., 429.],
        [598., 426.],
        [481., 633.],
        [240., 310.],
        [ 68.,  41.],
        [378., 247.],
        [557., 243.],
        [755., -11.],
        [767., 303.]]), tensor([4, 9, 5, 1, 2, 3, 6, 7, 8]))]

I then make a DataLoader from this, with my after_item transforms being my PntsLabeler, PointScaler, and ToTensor:
pnt_tdl = TfmdDL(pnt_tds, bs=1, after_item=[PntsLabeler(), PointScaler(), ToTensor()])

pnt_tdl.after_item.c brought up the right number of total outputs (there should be 18 in total, an x and a y for each of the 9 points).

I then grab a batch with x, y, z = pnt_tdl.one_batch(), and decoding said batch needs a decodes added to MultiCategorize:

@MultiCategorize
def decodes(self, x:LabeledPnts): return x

Now onto the show issue. show wouldn’t accept a text argument like it does with bounding boxes, most likely because a point isn’t drawn as a shape/patch. Instead I annotated the label at the point with ctx.annotate(l, b). Is there a better way to do this inside of the show method?
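For reference, at the matplotlib level either of these will place a string at a point on an Axes (a generic sketch; x, y and l are placeholders and ctx is the Axes):

ctx.annotate(l, xy=(x, y), color='white')  # anchors the text at the given point
ctx.text(x, y, l, color='white')           # plain text label at (x, y)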

Here is the end result I was after (the last goal is to deal with points going out of bounds):

image

I believe I need to do that in a before_batch function, similar to how the bounding boxes are padded.

Okay, I’ve revisited it this morning and need some advice on how to deal with a particular issue. The goal is that any points that get moved off the image by a transform (RandomResizedCrop, etc.) get a label of 0 and are reset to (0,0), so we ignore them. I currently have the following helper functions:

def clip_remove_empty_ips(ips, lbl):
  ips = torch.clamp(ips, -1, 1)
  empty = ((ips[...,1]) * (ips[...,0]) < 0.)
  return (ips[~empty], lbl[~empty])

def pts_pad(samples, pad_idx=0):
  samples = [(s[0], *clip_remove_empty_ips(s[1], s[2])) for s in samples]
  max_len = max([len(s[2]) for s in samples])
  def _f(img, pts, lbl):
    pts = torch.cat([pts, pts.new_zeros(max_len-pts.shape[0], 2)])
    lbl = torch.cat([lbl, lbl.new_zeros(max_len-lbl.shape[0]) + pad_idx])
    return img, pts, lbl
  return [_f(*s) for s in samples]

I can successfully make a databunch; however, when I show a batch, I get an output like the following:

image

As you can see, some of the points are properly labeled after the transform, but a few are not, and in the left image’s case none are even there (even though they have labels). Did I forget to check something in my padding? Here are my getters and DataBlock:

PointsBlock = TransformBlock(type_tfms=TensorPoint.create, item_tfms=PointScaler, 
                             dbunch_kwargs = {'before_batch' : pts_pad})

getters = [noop, lambda o: get_y(o.name), lambda o: lbls]
dblock = DataBlock(blocks=(ImageBlock, PointsBlock, PntLblBlock(vocab=lbls)),
                   get_items=get_image_files,
                   splitter=RandomSplitter(),
                   getters=getters,
                   n_inp=1)

dbunch = dblock.databunch(path, item_tfms=RandomResizedCrop(460), bs=32,
                batch_tfms=[Normalize(*imagenet_stats)])

Thoughts? I’m almost there :slight_smile: just the last little bit left.

It looks like it has to do with my pts_pad

1 Like

Thinking about this, I realized I may be going about it wrong. Would it be possible to return a dictionary attached to each point, so we know which ones are there (instead of looking at the raw model output to see which ones actually show up)? Also, during augmentation in 1.0, for keypoints we could pass in an argument to drop any points that fell outside the augmented area. Do we have this capability in 2.0 yet?

No, it’s not implemented. Note that you can easily add a transform to do this, but you will also likely need padding, since your tensors won’t all have the same dimensions afterwards, so it’s probably in a custom before_batch function that you should add it.
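A minimal sketch of what that custom before_batch function could look like for the goal described above (keep every point, but reset out-of-range ones to (0,0) and give them the pad label); this assumes the points have already been scaled to [-1,1] by PointScaler and that index 0 is the #na#/pad label:

def zero_oob_points(samples, pad_idx=0):
    # sketch only, under the assumptions above; not fastai's implementation
    def _f(img, pts, lbl):
        oob = (pts < -1).any(-1) | (pts > 1).any(-1)  # points pushed outside the image
        pts, lbl = pts.clone(), lbl.clone()
        pts[oob] = 0.                                  # reset them to (0,0)
        lbl[oob] = pad_idx                             # and mark them with the pad/ignore label
        return img, pts, lbl
    return [_f(*s) for s in samples]

Since every sample keeps all of its points, the tensors stay the same size and no extra length padding is needed; it could be passed via dbunch_kwargs={'before_batch': zero_oob_points}, as with the PointsBlock in the earlier post.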

1 Like

Got it. I tried to do padding in the post above but it didn’t quite work. I’ll revisit it. Thanks! :slight_smile:

I noticed a strange behaviour in 09_vision_augment.ipynb, involving RandTransform and the cell

tflip = FlipItem(p=1.)
test_eq(tflip(bbox,split_idx=0), tensor([[1.,0., 2.,1]]) -1)

The test in this cell passes the first time round, but after adding a print(self.do) in RandTransform.before_call, it appears to fail.
Steps to reproduce the observed behaviour:

  1. Shutdown ‘09_vision_augment.ipynb’.
  2. Open ‘09_vision_augment.ipynb’.
  3. Execute notebook up until In [14].
  4. Go to the definition of RandTransform.before_call (In [6]).
  5. Add print(self.do) at the end of this function. Run cell.
  6. Go back down to In [14].
  7. Either re-execute this cell, or copy the code to a new cell and execute it.

The raised exception is:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-16-60f334b49970> in <module>
      1 tflip = FlipItem(p=1.)
----> 2 test_eq(tflip(bbox,split_idx=0), tensor([[1.,0., 2.,1]]) -1)

~/git_repos/fastai_dev/dev/local/test.py in test_eq(a, b)
     30 def test_eq(a,b):
     31     "`test` that `a==b`"
---> 32     test(a,b,equals, '==')
     33 
     34 #Cell

~/git_repos/fastai_dev/dev/local/test.py in test(a, b, cmp, cname)
     20     "`assert` that `cmp(a,b)`; display inputs and `cname or cmp.__name__` if it fails"
     21     if cname is None: cname=cmp.__name__
---> 22     assert cmp(a,b),f"{cname}:\n{a}\n{b}"
     23 
     24 #Cell

AssertionError: ==:
tensor([[-0., -1., -1.,  0.]])
tensor([[ 0., -1.,  1.,  0.]])

After the change, RandTransform.before_call is:

def before_call(self, b, split_idx):
    "before_call the state for input `b`"
    self.do = random.random() < self.p
    print(self.do)

fastai_dev commit number: aab5d463

I cannot get FlipItem to work on image-related types. For example:

  1. img.show() and tflip(img).show() display the same thing, though these really are PILImages rather than plain PIL Image.Image objects.
  2. test_eq(tflip(TensorImage(img)), TensorImage(img)) passes.

Should there also be a patch to flip TensorMasks left-right, like:

@patch
def flip_lr(x:TensorMask): return x.flip(-1)

or is this case covered by the patch for TensorImage?

It should be TensorImageBase, since that is what TensorImage and TensorMask both subclass. Will fix. As for your other bug, this is because tflip changes the bbox in place. Will look to fix that too.
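A toy illustration of that in-place issue (not the actual FlipItem/TensorBBox code): if the transform mutates its input tensor, re-running the test cell flips the already-flipped bbox, so the comparison against the single-flip expected value fails the second time.

import torch

bbox = torch.tensor([[0., -1., 1., 0.]])

def flip_lr_(b):
    # toy in-place left-right flip in [-1,1] coordinates: negate and swap the x coords
    b[..., [0, 2]] = -b[..., [2, 0]]
    return b

flip_lr_(bbox)  # first run: bbox now holds the flipped value the test expects
flip_lr_(bbox)  # re-running mutates the same tensor again, so the test no longer matches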

2 Likes

Hi everyone,

I’m doing image single-label classification with 3 categories, and I want to use the AUC score as a metric, but it is restricted to binary classification, so I used this in fastai v1, modified from the work of someone on the v1 forum (thank you, but I can’t find the original post to link to):

class AUC(Callback):
    def __init__(self, main_class=0):
        super().__init__()
        self.main_class = main_class
        self.name = f'{data.classes[self.main_class]} auc'

    def on_epoch_begin(self, **kwargs):
        self.targs, self.preds = LongTensor([]), Tensor([])

    def on_batch_end(self, last_output:Tensor, last_target:Tensor, **kwargs):
        self.preds = torch.cat((self.preds, last_output.cpu()))
        self.targs = torch.cat((self.targs, last_target.cpu().long()))
    
    def on_epoch_end(self, last_metrics, **kwargs):
        idx = (self.targs==self.main_class)
        targs = torch.zeros(self.targs.size())
        targs[idx] = 1
        preds = F.softmax(self.preds, dim=1)[:, self.main_class]

        return add_metrics(last_metrics, auc_roc_score(preds, targs))

and then metrics = [AUC(0), AUC(1), ...].

But when I bring it to fastai2 (naively), I get a ‘can’t set attribute’ error at the self.name line. My question is: what’s the proper way to define a new metric like that in fastai2?
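My current best guess is something like the sketch below, assuming the v2 Metric base class in learner.py (reset / accumulate / value, with name as a read-only property, which may be where my ‘can’t set attribute’ error comes from) and that learn.pred and learn.yb hold the last batch’s predictions and targets, but I’m not sure this is right:

import torch
from sklearn.metrics import roc_auc_score

class AUC(Metric):
    # rough, untested sketch: assumes fastai2's Metric base class and that
    # learn.pred / learn.yb hold the last batch's predictions and targets
    def __init__(self, main_class=0): self.main_class = main_class
    def reset(self): self.preds,self.targs = [],[]
    def accumulate(self, learn):
        self.preds.append(learn.pred.detach().softmax(dim=1)[:, self.main_class].cpu())
        self.targs.append((learn.yb[0] == self.main_class).long().cpu())
    @property
    def value(self): return roc_auc_score(torch.cat(self.targs), torch.cat(self.preds))
    @property
    def name(self): return f'auc_{self.main_class}'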