Fastai v2 vision

leaf · January 27, 2020, 3:32pm

Hi I was just trying the new release version 0.0.6 of fastai last couple of days and I noticed this behavior. Not actually a problem for me, (I am stuck on other issues), but I thought it’d point it out in case it is helpful:

dir(DataLoaders) has device listed twice and dir(DataLoader) has device and dataset listed twice.

I am investigating this, because my cnn_learner errors out with AttributeError: 'DataLoader' object has no attribute 'after_item', but that happens here: if getattr(dls.train_dl.after_item, 'c', False): return dls.train_dl.after_item.c so they only may be related.

sgugger · January 27, 2020, 4:50pm

Will add more safeguard here, thanks for reporting. It’s going to error anyway because it can’t find the number of output channels in your data, but with a useful error message (pass it with n_out=...).

boris · January 27, 2020, 5:07pm

I’m using U-net with different number of input channels (current model only accepts 3).

Here is my change:

added arg new_arch_in_channels to unet_learner (could find a better name)
added function change_body_in_channels and call it in unet_learner
rest is the same

Let me know if you are interested in a PR (or even require any change before)?

Here is the change in vision/learner.py

def change_body_in_channels(body, in_channels, pretrained):
    "Change first layer to match `in_channels`"
    assert not(pretrained), 'Change of input channels does not support pretrained models'
    assert body[0].__class__.__name__ == 'Conv2d', f'Change of input channels only supported with Conv2d, found {body[0].__class__.__name__}'
    prev_layer = body[0]
    # get init parameters
    params = {attr:getattr(prev_layer, attr) for attr in 'out_channels,kernel_size,stride,padding,dilation,groups,padding_mode'.split(',')}
    params['bias'] = getattr(prev_layer, 'bias') is not None
    # set number of input channels
    params['in_channels'] = in_channels
    body[0] = nn.Conv2d(**params)
    return(body)    

@delegates(Learner.__init__)
def unet_learner(dls, arch, new_arch_in_channels=None, loss_func=None, pretrained=True, cut=None, splitter=None, config=None, **kwargs):
    "Build a unet learner from `dls` and `arch`"
    if config is None: config = unet_config()
    meta = model_meta.get(arch, _default_meta)
    body = create_body(arch, pretrained, ifnone(cut, meta['cut']))
    if new_arch_in_channels:
        change_body_in_channels(body, new_arch_in_channels, pretrained)
    size = dls.one_batch()[0].shape[-2:]
    model = models.unet.DynamicUnet(body, get_c(dls), size, **config)
    learn = Learner(dls, model, loss_func=loss_func, splitter=ifnone(splitter, meta['split']), **kwargs)
    if pretrained: learn.freeze()
    return learn

sgugger · January 27, 2020, 6:01pm

This could be useful for cnn_learner as well. I’d make the first function private (it can also have a shorter name ) and use ch_in for the argument instead of new_arch_in_channels.
The only problem is that this new conv will be frozen if you use pretrained=True so we should not freeze the Learner in that case.

boris · January 27, 2020, 6:03pm

Sounds good. Yes I have some assert statements at the start of the function.

I’ll implement the changes and will do some tests, probably taking only one dimension from MNIST.

jeremy · January 27, 2020, 7:56pm

@boris I suggest you use the param n_in for the number of input channels. I used to have some code to do this for pretrained models, but now embarassingly I can’t find it. Perhaps you can add this logic to your code - I’ve found it works well in practice:

For n_in==1: take the sum of the pretrained weights to create the unit axis (e.g. going from color->b&w this works great)
For n_in>3: add all-zero slices for the additional channels, and leave the existing weights as-is (e.g. adding an alpha channel to an RGB pretrained model; fine-tuning will only change the weights much from zero if it’s actually useful)
For n_in==2: Delete the 3rd channel and increase the weights of the other channels by 50%

boris · January 27, 2020, 8:34pm

Thanks, at the moment I was not loading any pretrained weights but your suggestion will make it more useful. I’ll do some tests and propose a change.

muellerzr · January 28, 2020, 2:38am

@sgugger just to make you aware of a bug someone showed me, the resize bug seems to also be present on the DataBlock level too (same transforms as before):

pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(),
                 get_y=RegexLabeller(pat = r'/([^/]+)_\d+.*'))
dbunch = pets.dataloaders(path_im, item_tfms=item_tfms, batch_tfms=batch_tfms, bs=bs)
dbunch.show_batch(max_n=9, figsize=(6,7))

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 456 and 375 in dimension 2 at /pytorch/aten/src/TH/generic/THTensor.cpp:689

(Full trace if you need it):

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-12-5a9f6693bc49> in <module>()
----> 1 dbunch.show_batch(max_n=9, figsize=(6,7))

12 frames
/usr/local/lib/python3.6/dist-packages/fastai2/data/core.py in show_batch(self, b, max_n, ctxs, show, **kwargs)
     87 
     88     def show_batch(self, b=None, max_n=9, ctxs=None, show=True, **kwargs):
---> 89         if b is None: b = self.one_batch()
     90         if not show: return self._pre_show_batch(b, max_n=max_n)
     91         show_batch(*self._pre_show_batch(b, max_n=max_n), ctxs=ctxs, max_n=max_n, **kwargs)

/usr/local/lib/python3.6/dist-packages/fastai2/data/load.py in one_batch(self)
    128     def one_batch(self):
    129         if self.n is not None and len(self)==0: raise ValueError(f'This DataLoader does not contain any batches')
--> 130         with self.fake_l.no_multiproc(): res = first(self)
    131         if hasattr(self, 'it'): delattr(self, 'it')
    132         return res

/usr/local/lib/python3.6/dist-packages/fastcore/utils.py in first(x)
    172 def first(x):
    173     "First element of `x`, or None if missing"
--> 174     try: return next(iter(x))
    175     except StopIteration: return None
    176 

/usr/local/lib/python3.6/dist-packages/fastai2/data/load.py in __iter__(self)
     95         self.randomize()
     96         self.before_iter()
---> 97         for b in _loaders[self.fake_l.num_workers==0](self.fake_l):
     98             if self.device is not None: b = to_device(b, self.device)
     99             yield self.after_batch(b)

/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py in __next__(self)
    344     def __next__(self):
    345         index = self._next_index()  # may raise StopIteration
--> 346         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    347         if self._pin_memory:
    348             data = _utils.pin_memory.pin_memory(data)

/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
     32                 raise StopIteration
     33         else:
---> 34             data = next(self.dataset_iter)
     35         return self.collate_fn(data)
     36 

/usr/local/lib/python3.6/dist-packages/fastai2/data/load.py in create_batches(self, samps)
    104         self.it = iter(self.dataset) if self.dataset is not None else None
    105         res = filter(lambda o:o is not None, map(self.do_item, samps))
--> 106         yield from map(self.do_batch, self.chunkify(res))
    107 
    108     def new(self, dataset=None, cls=None, **kwargs):

/usr/local/lib/python3.6/dist-packages/fastai2/data/load.py in do_batch(self, b)
    125     def create_item(self, s):  return next(self.it) if s is None else self.dataset[s]
    126     def create_batch(self, b): return (fa_collate,fa_convert)[self.prebatched](b)
--> 127     def do_batch(self, b): return self.retain(self.create_batch(self.before_batch(b)), b)
    128     def one_batch(self):
    129         if self.n is not None and len(self)==0: raise ValueError(f'This DataLoader does not contain any batches')

/usr/local/lib/python3.6/dist-packages/fastai2/data/load.py in create_batch(self, b)
    124     def retain(self, res, b):  return retain_types(res, b[0] if is_listy(b) else b)
    125     def create_item(self, s):  return next(self.it) if s is None else self.dataset[s]
--> 126     def create_batch(self, b): return (fa_collate,fa_convert)[self.prebatched](b)
    127     def do_batch(self, b): return self.retain(self.create_batch(self.before_batch(b)), b)
    128     def one_batch(self):

/usr/local/lib/python3.6/dist-packages/fastai2/data/load.py in fa_collate(t)
     44     b = t[0]
     45     return (default_collate(t) if isinstance(b, _collate_types)
---> 46             else type(t[0])([fa_collate(s) for s in zip(*t)]) if isinstance(b, Sequence)
     47             else default_collate(t))
     48 

/usr/local/lib/python3.6/dist-packages/fastai2/data/load.py in <listcomp>(.0)
     44     b = t[0]
     45     return (default_collate(t) if isinstance(b, _collate_types)
---> 46             else type(t[0])([fa_collate(s) for s in zip(*t)]) if isinstance(b, Sequence)
     47             else default_collate(t))
     48 

/usr/local/lib/python3.6/dist-packages/fastai2/data/load.py in fa_collate(t)
     43 def fa_collate(t):
     44     b = t[0]
---> 45     return (default_collate(t) if isinstance(b, _collate_types)
     46             else type(t[0])([fa_collate(s) for s in zip(*t)]) if isinstance(b, Sequence)
     47             else default_collate(t))

/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/collate.py in default_collate(batch)
     53             storage = elem.storage()._new_shared(numel)
     54             out = elem.new(storage)
---> 55         return torch.stack(batch, 0, out=out)
     56     elif elem_type.__module__ == 'numpy' and elem_type.__name__ != 'str_' \
     57             and elem_type.__name__ != 'string_':

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 456 and 375 in dimension 2 at /pytorch/aten/src/TH/generic/THTensor.cpp:689

It is also present on the most recent pip

sgugger · January 28, 2020, 4:05am

Yes you pass the transforms to the DataBlock now, not your call to dataloaders.

muellerzr · January 28, 2020, 4:08am

Ah missed that. Thank you! (Duh, makes sense given DataBlock.summary())

leaf · January 28, 2020, 4:06pm

Ha, that’s amazing. This is like reading from the future. Just ran into the same thing.

But now I am facing even bigger issue. I got my model to start training and after some fights with CUDA, I managed to get it to this stage:

st = DataBlock(blocks=(ImageBlock, ImageBlock(cls=PILMask)),
              splitter=RandomSplitter(),
              get_items=get_image_files,
              item_tfms=RandomResizedCrop(256),
              get_y=lambda o: str(o).replace(
                  '_standard_','_coco_').replace('standard','label').replace('jpg','png')) 

dls = st.dataloaders(path, bs=4,
                     batch_tfms=[*aug_transforms(size=256,
                                                       max_warp=0), 
                                 Normalize.from_stats(*imagenet_stats)])
lrnr = unet_learner(dls, resnet50, config=unet_config(self_attention=True), 
                    n_out=6, loss_func=seg_accuracy)

Where:

lrnr.y.shape
torch.Size([4, 256, 256])
lrnr.x.shape
torch.Size([4, 3, 256, 256])

But then I get this issue:

lrnr.fit_one_cycle(10,3e-4)

epoch	train_loss	valid_loss	time
0	0.176048	00:06
 0.54% [1/185 01:39<5:06:20 0.1760]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-15-ac44c0b0efe6> in <module>
----> 1 lrnr.fit_one_cycle(10,3e-4)

~/daisy-gan/venv/lib/python3.6/site-packages/fastai2/callback/schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt)
     88     scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
     89               'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
---> 90     self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)
     91 
     92 # Cell

~/daisy-gan/venv/lib/python3.6/site-packages/fastai2/learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
    287                     try:
    288                         self.epoch=epoch;          self('begin_epoch')
--> 289                         self._do_epoch_train()
    290                         self._do_epoch_validate()
    291                     except CancelEpochException:   self('after_cancel_epoch')

~/daisy-gan/venv/lib/python3.6/site-packages/fastai2/learner.py in _do_epoch_train(self)
    262         try:
    263             self.dl = self.dls.train;                  self('begin_train')
--> 264             self.all_batches()
    265         except CancelTrainException:                         self('after_cancel_train')
    266         finally:                                             self('after_train')

~/daisy-gan/venv/lib/python3.6/site-packages/fastai2/learner.py in all_batches(self)
    240     def all_batches(self):
    241         self.n_iter = len(self.dl)
--> 242         for o in enumerate(self.dl): self.one_batch(*o)
    243 
    244     def one_batch(self, i, b):

~/daisy-gan/venv/lib/python3.6/site-packages/fastai2/learner.py in one_batch(self, i, b)
    250             self.loss = self.loss_func(self.pred, *self.yb); self('after_loss')
    251             if not self.training: return
--> 252             self.loss.backward();                            self('after_backward')
    253             self.opt.step();                                 self('after_step')
    254             self.opt.zero_grad()

~/daisy-gan/venv/lib/python3.6/site-packages/fastai2/torch_core.py in _f(self, *args, **kwargs)
    270         def _f(self, *args, **kwargs):
    271             cls = self.__class__
--> 272             res = getattr(super(TensorBase, self), fn)(*args, **kwargs)
    273             return retain_type(res, self)
    274         return _f

~/daisy-gan/venv/lib/python3.6/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    148                 products. Defaults to ``False``.
    149         """
--> 150         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    151 
    152     def register_hook(self, hook):

~/daisy-gan/venv/lib/python3.6/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     97     Variable._execution_engine.run_backward(
     98         tensors, grad_tensors, retain_graph, create_graph,
---> 99         allow_unreachable=True)  # allow_unreachable flag
    100 
    101 

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

I suspect that has something to do with the custom loss_func, but I included it to get around the CUDA issue. Though the CAMVID test case runs without an issue, so I am still investigating what the difference is.

leaf · January 28, 2020, 6:58pm

I am just so confused. I have made my example follow so closely the only thing they differ on now is that they are loading different images. But I can now make the CAMVID example error out.

Can anyone replicate the example below? You only need to change the CAMVID path to fit your system.


import torchvision as tv
import torch; torch.__version__, torch.__file__

from src.dataset_builder import load_dataset_from_address
from utils.misc_utils import load_h5py
from PIL import Image
import numpy as np
import matplotlib.pylab as plt
# %pylab inline

from fastai2.basics import *
from fastai2.callback.all import *
from fastai2.vision.all import *


def seg_accuracy(input, target):
    target = target.squeeze(1)
    return (input.argmax(dim=1)==target).float().mean()


path = '/home/jakub/.fastai/data/camvid/images/'

dls = SegmentationDataLoaders.from_label_func(path, bs=1,
    fnames = get_image_files(path), 
    item_tfms=RandomResizedCrop(256),
#     label_func = lambda o: str(o).replace(
#                   '_standard_','_coco_').replace('standard','label').replace('jpg','png'),
    label_func = lambda o: str(o).replace('images','labels').replace('.png','_P.png'),
    codes = np.loadtxt('/home/jakub/.fastai/data/camvid/codes.txt', dtype=str),                         
    batch_tfms=[*aug_transforms(size=(360,480)), Normalize.from_stats(*imagenet_stats)])

# +
# codes = np.loadtxt('st_codes.txt', dtype=str)
codes = np.loadtxt('/home/jakub/.fastai/data/camvid/codes.txt', dtype=str)
dls.vocab = codes
name2id = {v:k for k,v in enumerate(codes)}

void_code = name2id['Void']

def acc_camvid(input, target):
    target = target.squeeze(1)
    mask = target != void_code
    return (input.argmax(dim=1)[mask]==target[mask]).float().mean()


# -

dls.show_batch(max_n=2, rows=1, vmin=1, vmax=30, figsize=(20, 7))

# +
opt_func = partial(Adam, lr=3e-3, wd=0.01)#, eps=1e-8)

learn = unet_learner(dls, resnet34, loss_func=CrossEntropyLossFlat(axis=1), opt_func=opt_func, path=path, 
                     metrics=acc_camvid, n_out=6, 
                     config = unet_config(norm_type=None, self_attention=True), wd_bn_bias=True)

# +
# lrnr = unet_learner(dls, resnet50, config=unet_config(self_attention=True), 
#                     n_out=6, loss_func=seg_accuracy)
# -

learn.fit_one_cycle(10,3e-4)

This yields the following:

epoch	train_loss	valid_loss	acc_camvid	time
0	0.000000	00:02
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
~/daisy-gan/venv/lib/python3.6/site-packages/fastai2/learner.py in one_batch(self, i, b)
    251             if not self.training: return
--> 252             self.loss.backward();                            self('after_backward')
    253             self.opt.step();                                 self('after_step')

~/daisy-gan/venv/lib/python3.6/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    149         """
--> 150         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    151 

~/daisy-gan/venv/lib/python3.6/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     98         tensors, grad_tensors, retain_graph, create_graph,
---> 99         allow_unreachable=True)  # allow_unreachable flag
    100 

RuntimeError: cuda runtime error (710) : device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorMath.cu:26

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<ipython-input-12-474116ee9487> in <module>
----> 1 learn.fit_one_cycle(10,3e-4)

~/daisy-gan/venv/lib/python3.6/site-packages/fastai2/callback/schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt)
     88     scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
     89               'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
---> 90     self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)
     91 
     92 # Cell

~/daisy-gan/venv/lib/python3.6/site-packages/fastai2/learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
    287                     try:
    288                         self.epoch=epoch;          self('begin_epoch')
--> 289                         self._do_epoch_train()
    290                         self._do_epoch_validate()
    291                     except CancelEpochException:   self('after_cancel_epoch')

~/daisy-gan/venv/lib/python3.6/site-packages/fastai2/learner.py in _do_epoch_train(self)
    262         try:
    263             self.dl = self.dls.train;                  self('begin_train')
--> 264             self.all_batches()
    265         except CancelTrainException:                         self('after_cancel_train')
    266         finally:                                             self('after_train')

~/daisy-gan/venv/lib/python3.6/site-packages/fastai2/learner.py in all_batches(self)
    240     def all_batches(self):
    241         self.n_iter = len(self.dl)
--> 242         for o in enumerate(self.dl): self.one_batch(*o)
    243 
    244     def one_batch(self, i, b):

~/daisy-gan/venv/lib/python3.6/site-packages/fastai2/learner.py in one_batch(self, i, b)
    254             self.opt.zero_grad()
    255         except CancelBatchException:                         self('after_cancel_batch')
--> 256         finally:                                             self('after_batch')
    257 
    258     def _do_begin_fit(self, n_epoch):

~/daisy-gan/venv/lib/python3.6/site-packages/fastai2/learner.py in __call__(self, event_name)
    221     def ordered_cbs(self, cb_func:str): return [cb for cb in sort_by_run(self.cbs) if hasattr(cb, cb_func)]
    222 
--> 223     def __call__(self, event_name): L(event_name).map(self._call_one)
    224     def _call_one(self, event_name):
    225         assert hasattr(event, event_name)

~/daisy-gan/venv/lib/python3.6/site-packages/fastcore/foundation.py in map(self, f, *args, **kwargs)
    360              else f.format if isinstance(f,str)
    361              else f.__getitem__)
--> 362         return self._new(map(g, self))
    363 
    364     def filter(self, f, negate=False, **kwargs):

~/daisy-gan/venv/lib/python3.6/site-packages/fastcore/foundation.py in _new(self, items, *args, **kwargs)
    313     @property
    314     def _xtra(self): return None
--> 315     def _new(self, items, *args, **kwargs): return type(self)(items, *args, use_list=None, **kwargs)
    316     def __getitem__(self, idx): return self._get(idx) if is_indexer(idx) else L(self._get(idx), use_list=None)
    317     def copy(self): return self._new(self.items.copy())

~/daisy-gan/venv/lib/python3.6/site-packages/fastcore/foundation.py in __call__(cls, x, *args, **kwargs)
     39             return x
     40 
---> 41         res = super().__call__(*((x,) + args), **kwargs)
     42         res._newchk = 0
     43         return res

~/daisy-gan/venv/lib/python3.6/site-packages/fastcore/foundation.py in __init__(self, items, use_list, match, *rest)
    304         if items is None: items = []
    305         if (use_list is not None) or not _is_array(items):
--> 306             items = list(items) if use_list else _listify(items)
    307         if match is not None:
    308             if is_coll(match): match = len(match)

~/daisy-gan/venv/lib/python3.6/site-packages/fastcore/foundation.py in _listify(o)
    240     if isinstance(o, list): return o
    241     if isinstance(o, str) or _is_array(o): return [o]
--> 242     if is_iter(o): return list(o)
    243     return [o]
    244 

~/daisy-gan/venv/lib/python3.6/site-packages/fastcore/foundation.py in __call__(self, *args, **kwargs)
    206             if isinstance(v,_Arg): kwargs[k] = args.pop(v.i)
    207         fargs = [args[x.i] if isinstance(x, _Arg) else x for x in self.pargs] + args[self.maxi+1:]
--> 208         return self.fn(*fargs, **kwargs)
    209 
    210 # Cell

~/daisy-gan/venv/lib/python3.6/site-packages/fastai2/learner.py in _call_one(self, event_name)
    224     def _call_one(self, event_name):
    225         assert hasattr(event, event_name)
--> 226         [cb(event_name) for cb in sort_by_run(self.cbs)]
    227 
    228     def _bn_bias_state(self, with_bias): return bn_bias_params(self.model, with_bias).map(self.opt.state)

~/daisy-gan/venv/lib/python3.6/site-packages/fastai2/learner.py in <listcomp>(.0)
    224     def _call_one(self, event_name):
    225         assert hasattr(event, event_name)
--> 226         [cb(event_name) for cb in sort_by_run(self.cbs)]
    227 
    228     def _bn_bias_state(self, with_bias): return bn_bias_params(self.model, with_bias).map(self.opt.state)

~/daisy-gan/venv/lib/python3.6/site-packages/fastai2/learner.py in __call__(self, event_name)
     23         _run = (event_name not in _inner_loop or (self.run_train and getattr(self, 'training', True)) or
     24                (self.run_valid and not getattr(self, 'training', False)))
---> 25         if self.run and _run: getattr(self, event_name, noop)()
     26 
     27     @property

~/daisy-gan/venv/lib/python3.6/site-packages/fastai2/learner.py in after_batch(self)
    495         if len(self.yb) == 0: return
    496         mets = self._train_mets if self.training else self._valid_mets
--> 497         for met in mets: met.accumulate(self.learn)
    498         if not self.training: return
    499         self.lrs.append(self.opt.hypers[-1]['lr'])

~/daisy-gan/venv/lib/python3.6/site-packages/fastai2/learner.py in accumulate(self, learn)
    458     def accumulate(self, learn):
    459         self.count += 1
--> 460         self.val = torch.lerp(to_detach(learn.loss.mean(), gather=False), self.val, self.beta)
    461     @property
    462     def value(self): return self.val/(1-self.beta**self.count)

RuntimeError: CUDA error: device-side assert triggered

muellerzr · January 28, 2020, 6:59pm

That’s due to your mask labels not being equal to all the possible pixel classes present, not an API issue. Are you sure your codes align with the dataset present? If so, add one more category for ‘other’

leaf · January 28, 2020, 7:24pm

Wow amazing! Thanks I knew it must have been something simple. Really appreciate your help!

muellerzr · January 28, 2020, 8:37pm

@sgugger I actually have a question about that, how would someone go about putting a test there to raise an issue if this is a thing (during dbunch generation) as I know this is a very common issue with segmentation. Perhaps it could be done on your DataBlock.summary()? (I’m unsure if it checks for this right now)

sgugger · January 28, 2020, 9:36pm

I can look at this when I have time (not for a bit though )

muellerzr · January 30, 2020, 7:24pm

I know you may not have gotten to it yet, but show_results is currently broken for an object detection model:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/fastai2/torch_core.py in to_concat(xs, dim)
    216     #   in this case we return a big list
--> 217     try:    return retain_type(torch.cat(xs, dim=dim), xs[0])
    218     except: return sum([L(retain_type(o_.index_select(dim, tensor(i)).squeeze(dim), xs[0])

TypeError: expected Tensor as element 0 in argument 0, but got int

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
22 frames
<ipython-input-32-c3b657dcc9ae> in <module>()
----> 1 learn.show_results()

/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in show_results(self, ds_idx, dl, max_n, shuffle, **kwargs)
    332         if dl is None: dl = self.dls[ds_idx].new(shuffle=shuffle)
    333         b = dl.one_batch()
--> 334         _,_,preds = self.get_preds(dl=[b], with_decoded=True)
    335         self.dls.show_results(b, preds, max_n=max_n, **kwargs)
    336 

/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in get_preds(self, ds_idx, dl, with_input, with_decoded, with_loss, act, **kwargs)
    313             self(_before_epoch)
    314             self._do_epoch_validate(ds_idx, dl)
--> 315             self(_after_epoch)
    316             if act is None: act = getattr(self.loss_func, 'activation', noop)
    317             res = cb.all_tensors()

/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in __call__(self, event_name)
    221     def ordered_cbs(self, cb_func:str): return [cb for cb in sort_by_run(self.cbs) if hasattr(cb, cb_func)]
    222 
--> 223     def __call__(self, event_name): L(event_name).map(self._call_one)
    224     def _call_one(self, event_name):
    225         assert hasattr(event, event_name)

/usr/local/lib/python3.6/dist-packages/fastcore/foundation.py in map(self, f, *args, **kwargs)
    360              else f.format if isinstance(f,str)
    361              else f.__getitem__)
--> 362         return self._new(map(g, self))
    363 
    364     def filter(self, f, negate=False, **kwargs):

/usr/local/lib/python3.6/dist-packages/fastcore/foundation.py in _new(self, items, *args, **kwargs)
    313     @property
    314     def _xtra(self): return None
--> 315     def _new(self, items, *args, **kwargs): return type(self)(items, *args, use_list=None, **kwargs)
    316     def __getitem__(self, idx): return self._get(idx) if is_indexer(idx) else L(self._get(idx), use_list=None)
    317     def copy(self): return self._new(self.items.copy())

/usr/local/lib/python3.6/dist-packages/fastcore/foundation.py in __call__(cls, x, *args, **kwargs)
     39             return x
     40 
---> 41         res = super().__call__(*((x,) + args), **kwargs)
     42         res._newchk = 0
     43         return res

/usr/local/lib/python3.6/dist-packages/fastcore/foundation.py in __init__(self, items, use_list, match, *rest)
    304         if items is None: items = []
    305         if (use_list is not None) or not _is_array(items):
--> 306             items = list(items) if use_list else _listify(items)
    307         if match is not None:
    308             if is_coll(match): match = len(match)

/usr/local/lib/python3.6/dist-packages/fastcore/foundation.py in _listify(o)
    240     if isinstance(o, list): return o
    241     if isinstance(o, str) or _is_array(o): return [o]
--> 242     if is_iter(o): return list(o)
    243     return [o]
    244 

/usr/local/lib/python3.6/dist-packages/fastcore/foundation.py in __call__(self, *args, **kwargs)
    206             if isinstance(v,_Arg): kwargs[k] = args.pop(v.i)
    207         fargs = [args[x.i] if isinstance(x, _Arg) else x for x in self.pargs] + args[self.maxi+1:]
--> 208         return self.fn(*fargs, **kwargs)
    209 
    210 # Cell

/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in _call_one(self, event_name)
    224     def _call_one(self, event_name):
    225         assert hasattr(event, event_name)
--> 226         [cb(event_name) for cb in sort_by_run(self.cbs)]
    227 
    228     def _bn_bias_state(self, with_bias): return bn_bias_params(self.model, with_bias).map(self.opt.state)

/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in <listcomp>(.0)
    224     def _call_one(self, event_name):
    225         assert hasattr(event, event_name)
--> 226         [cb(event_name) for cb in sort_by_run(self.cbs)]
    227 
    228     def _bn_bias_state(self, with_bias): return bn_bias_params(self.model, with_bias).map(self.opt.state)

/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in __call__(self, event_name)
     23         _run = (event_name not in _inner_loop or (self.run_train and getattr(self, 'training', True)) or
     24                (self.run_valid and not getattr(self, 'training', False)))
---> 25         if self.run and _run: getattr(self, event_name, noop)()
     26 
     27     @property

/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in after_fit(self)
     86         "Concatenate all recorded tensors"
     87         if self.with_input:     self.inputs  = detuplify(to_concat(self.inputs, dim=self.concat_dim))
---> 88         if not self.save_preds: self.preds   = detuplify(to_concat(self.preds, dim=self.concat_dim))
     89         if not self.save_targs: self.targets = detuplify(to_concat(self.targets, dim=self.concat_dim))
     90         if self.with_loss:      self.losses  = to_concat(self.losses)

/usr/local/lib/python3.6/dist-packages/fastai2/torch_core.py in to_concat(xs, dim)
    211 def to_concat(xs, dim=0):
    212     "Concat the element in `xs` (recursively if they are tuples/lists of tensors)"
--> 213     if is_listy(xs[0]): return type(xs[0])([to_concat([x[i] for x in xs], dim=dim) for i in range_of(xs[0])])
    214     if isinstance(xs[0],dict):  return {k: to_concat([x[k] for x in xs], dim=dim) for k in xs.keys()}
    215     #We may receives xs that are not concatenatable (inputs of a text classifier for instance),

/usr/local/lib/python3.6/dist-packages/fastai2/torch_core.py in <listcomp>(.0)
    211 def to_concat(xs, dim=0):
    212     "Concat the element in `xs` (recursively if they are tuples/lists of tensors)"
--> 213     if is_listy(xs[0]): return type(xs[0])([to_concat([x[i] for x in xs], dim=dim) for i in range_of(xs[0])])
    214     if isinstance(xs[0],dict):  return {k: to_concat([x[k] for x in xs], dim=dim) for k in xs.keys()}
    215     #We may receives xs that are not concatenatable (inputs of a text classifier for instance),

/usr/local/lib/python3.6/dist-packages/fastai2/torch_core.py in to_concat(xs, dim)
    211 def to_concat(xs, dim=0):
    212     "Concat the element in `xs` (recursively if they are tuples/lists of tensors)"
--> 213     if is_listy(xs[0]): return type(xs[0])([to_concat([x[i] for x in xs], dim=dim) for i in range_of(xs[0])])
    214     if isinstance(xs[0],dict):  return {k: to_concat([x[k] for x in xs], dim=dim) for k in xs.keys()}
    215     #We may receives xs that are not concatenatable (inputs of a text classifier for instance),

/usr/local/lib/python3.6/dist-packages/fastai2/torch_core.py in <listcomp>(.0)
    211 def to_concat(xs, dim=0):
    212     "Concat the element in `xs` (recursively if they are tuples/lists of tensors)"
--> 213     if is_listy(xs[0]): return type(xs[0])([to_concat([x[i] for x in xs], dim=dim) for i in range_of(xs[0])])
    214     if isinstance(xs[0],dict):  return {k: to_concat([x[k] for x in xs], dim=dim) for k in xs.keys()}
    215     #We may receives xs that are not concatenatable (inputs of a text classifier for instance),

/usr/local/lib/python3.6/dist-packages/fastai2/torch_core.py in to_concat(xs, dim)
    211 def to_concat(xs, dim=0):
    212     "Concat the element in `xs` (recursively if they are tuples/lists of tensors)"
--> 213     if is_listy(xs[0]): return type(xs[0])([to_concat([x[i] for x in xs], dim=dim) for i in range_of(xs[0])])
    214     if isinstance(xs[0],dict):  return {k: to_concat([x[k] for x in xs], dim=dim) for k in xs.keys()}
    215     #We may receives xs that are not concatenatable (inputs of a text classifier for instance),

/usr/local/lib/python3.6/dist-packages/fastai2/torch_core.py in <listcomp>(.0)
    211 def to_concat(xs, dim=0):
    212     "Concat the element in `xs` (recursively if they are tuples/lists of tensors)"
--> 213     if is_listy(xs[0]): return type(xs[0])([to_concat([x[i] for x in xs], dim=dim) for i in range_of(xs[0])])
    214     if isinstance(xs[0],dict):  return {k: to_concat([x[k] for x in xs], dim=dim) for k in xs.keys()}
    215     #We may receives xs that are not concatenatable (inputs of a text classifier for instance),

/usr/local/lib/python3.6/dist-packages/fastai2/torch_core.py in to_concat(xs, dim)
    217     try:    return retain_type(torch.cat(xs, dim=dim), xs[0])
    218     except: return sum([L(retain_type(o_.index_select(dim, tensor(i)).squeeze(dim), xs[0])
--> 219                           for i in range_of(o_)) for o_ in xs], L())
    220 
    221 # Cell

/usr/local/lib/python3.6/dist-packages/fastai2/torch_core.py in <listcomp>(.0)
    217     try:    return retain_type(torch.cat(xs, dim=dim), xs[0])
    218     except: return sum([L(retain_type(o_.index_select(dim, tensor(i)).squeeze(dim), xs[0])
--> 219                           for i in range_of(o_)) for o_ in xs], L())
    220 
    221 # Cell

/usr/local/lib/python3.6/dist-packages/fastcore/utils.py in range_of(x)
    160 def range_of(x):
    161     "All indices of collection `x` (i.e. `list(range(len(x)))`)"
--> 162     return list(range(len(x)))
    163 
    164 # Cell

TypeError: object of type 'int' has no len()

sgugger · January 30, 2020, 7:34pm

It strongly depends on how your model returns its output too. I don;t have any examples of show_results with multi-target, so there might be something broken in fastai2.

muellerzr · January 30, 2020, 7:37pm

Got it. I’ll take a look at that and see. It’s the RetinaNet architecture used in previous lectures. IIRC I had a seperate function that’s as used to show the results. I’ll get that working then Maybe we can get some inspiration on how to fit it in

boris · January 31, 2020, 3:15am

I implemented the “non-pretrained” version and am now working on the pretrained version.
I just wanted to check if for n_in>3, additional weights should be 0 as I understand they won’t ever learn anything.