Segmentation in V1

I think the 2 should be -2. You’re cutting your model at the second layer instead of the second-to-last, which then doesn’t work with the split.
This split will be made easier later; for now it is basically the list of layers at which to separate the model into three groups for differential learning rates, as sketched below.
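
For example, something along these lines (a sketch that reuses the calls appearing later in this thread; it assumes you already have a data DataBunch, and the exact split indices depend on your model):

import torchvision.models as tvm
from fastai.vision import *                     # assumed to pull in Learner, create_body, CrossEntropyFlat in fastai v1
from fastai.vision.models.unet import DynamicUnet

body = create_body(tvm.resnet34(True), -2)      # keep the conv trunk, drop the pooling + fc head
model = DynamicUnet(body, n_classes=2).cuda()

learn = Learner(data, model, loss_fn=CrossEntropyFlat())
# Two split points give three layer groups, so you can later pass three
# differential learning rates to fit.
learn.split([model[0][6], model[1]])
learn.freeze()                                  # start by training only the last group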

2 Likes

I am not sure if I should start an extra thread for this, but I am currently trying to debug this strange behavior when I iterate through a DataBunch built from ObjectDetectDatasets based on PNG images with bounding boxes (so kind of segmentation):

Test code for a dummy dataset of 100 entries each for train and valid:

# Create ObjectDetectDatasets
train_ds = get_datasets(PATH_train)
valid_ds = get_datasets(PATH_valid)
size = 128
bs = 4 # bs=1 is working!

# Create DataBunch
def get_data(bs, size):
    return DataBunch.create(train_ds, valid_ds, bs=bs, size=size, ds_tfms=None, path=PATH)

data = get_data(bs, size)

# Test DataBunch DataLoader
for i in range(100):
    print(i, end=', ')
    next(iter(data.train_dl.dl))

Output:
bs = 1

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,

For bs > 1 it stops the loop after an unreproducible number of steps (I guess due to random shuffling) with this error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-132-d0590afb83a6> in <module>()
      1 for i in range(100):
      2     print(i, end=', ')
----> 3     next(iter(data.train_dl.dl))

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
    351                 self.reorder_dict[idx] = batch
    352                 continue
--> 353             return self._process_next_batch(batch)
    354 
    355     next = __next__  # Python 2 compatibility

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _process_next_batch(self, batch)
    372         self._put_indices()
    373         if isinstance(batch, ExceptionWrapper):
--> 374             raise batch.exc_type(batch.exc_msg)
    375         return batch
    376 

RuntimeError: Traceback (most recent call last):
  File "/home/paperspace/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 114, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/paperspace/fastai/fastai/torch_core.py", line 86, in data_collate
    return torch.utils.data.dataloader.default_collate(to_data(batch))
  File "/home/paperspace/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 198, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/home/paperspace/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 198, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/home/paperspace/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 198, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/home/paperspace/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 198, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/home/paperspace/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 175, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Tensors must have same number of dimensions: got 2 and 3 at /opt/conda/conda-bld/pytorch-nightly_1538165619353/work/aten/src/TH/generic/THTensorMoreMath.cpp:1308

Sometimes it says “got 3 and 2” instead.

The error also occurs with the show_image_batch() method.

I found this thread on the PyTorch forum, which points in the direction of PNG files with different channel numbers: https://discuss.pytorch.org/t/runtimeerror-invalid-argument-0/17919/5
However, in the fastai library the open_image() function uses .convert('RGB'), and when I debug the tensor shapes I always find the same shape for each element: 3 channels x width x height.
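
A sketch of such a shape check (assuming train_ds[i] returns an (image, target) pair and that the image object exposes its tensor via .data, as fastai Images do):

x0, _ = train_ds[0]
for i in range(len(train_ds)):
    x, _ = train_ds[i]
    # Report any image whose tensor shape deviates from the first one.
    if x.data.shape != x0.data.shape:
        print(i, x.data.shape)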

What I don’t get is why it works with bs = 1.

Maybe somebody has a tip?
Maybe I am using parts of the library which are currently under development?

Thank you & best regards
Michael

PS: When I try to visualize the images with show_image_batch() and bs = 1 I get this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-68-7d54941a820f> in <module>
      1 # http://docs.fast.ai/vision.data.html
----> 2 show_image_batch(data.train_dl, data.train_ds.classes, rows=3, figsize=(5,5))

~/fastai/fastai/vision/data.py in show_image_batch(dl, classes, rows, figsize, denorm)
     44     x = x[:rows*rows].cpu()
     45     if denorm: x = denorm(x)
---> 46     show_images(x,y[:rows*rows].cpu(),rows, classes, figsize)
     47 
     48 def show_images(x:Collection[Image],y:int,rows:int, classes:Collection[str], figsize:Tuple[int,int]=(9,9))->None:

AttributeError: 'list' object has no attribute 'cpu'

When I define a custom show_image_batch() function without the .cpu() in the Jupyter notebook, I get this error:
NameError: name 'show_image' is not defined

@Tcapelle If you build the body from tvm.resnet34() and print it, you can see where the create_body function cuts the model.

For example, with 2 instead of -2 you only get the input stage: the 7x7 convolution and the subsequent batchnorm layer:

Sequential(
  (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
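
With -2 instead, only the pooling and fully-connected head of the resnet are dropped and the whole convolutional trunk is kept. A quick way to check (a sketch; torchvision’s resnet34 has 10 top-level children):

import torchvision.models as tvm
from fastai.vision import *   # assumed to provide create_body in fastai v1

children = list(tvm.resnet34().children())
print(len(children))          # 10: conv1, bn1, relu, maxpool, layer1-4, pooling, fc
print(children[-2:])          # the pooling layer and the Linear classifier, i.e. what cut=-2 removes

body = create_body(tvm.resnet34(True), -2)
print(body[-1])               # the body now ends with layer4, the last residual stage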

Thanks, I had already figured it out and changed this to -2.
Anyway,

lr_find(learn)
>> ---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-98-dcd2a06c9caf> in <module>()
----> 1 lr_find(learn)
      2 # learn.recorder.plot()

/usr/local/lib/python3.6/dist-packages/fastai/train.py in lr_find(learn, start_lr, end_lr, num_it, **kwargs)
     24     cb = LRFinder(learn, start_lr, end_lr, num_it)
     25     a = int(np.ceil(num_it/len(learn.data.train_dl)))
---> 26     learn.fit(a, start_lr, callbacks=[cb], **kwargs)
     27 
     28 def to_fp16(learn:Learner, loss_scale:float=512., flat_master:bool=False)->Learner:

/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    135         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
    136         fit(epochs, self.model, self.loss_fn, opt=self.opt, data=self.data, metrics=self.metrics,
--> 137             callbacks=self.callbacks+callbacks)
    138 
    139     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(epochs, model, loss_fn, opt, data, callbacks, metrics)
     88     except Exception as e:
     89         exception = e
---> 90         raise e
     91     finally: cb_handler.on_train_end(exception)
     92 

/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(epochs, model, loss_fn, opt, data, callbacks, metrics)
     78             for xb,yb in progress_bar(data.train_dl, parent=pbar):
     79                 xb, yb = cb_handler.on_batch_begin(xb, yb)
---> 80                 loss = loss_batch(model, xb, yb, loss_fn, opt, cb_handler)
     81                 if cb_handler.on_batch_end(loss): break
     82 

/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in loss_batch(model, xb, yb, loss_fn, opt, cb_handler, metrics)
     16     if not is_listy(xb): xb = [xb]
     17     if not is_listy(yb): yb = [yb]
---> 18     out = model(*xb)
     19     out = cb_handler.on_loss_begin(out)
     20 

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    475             result = self._slow_forward(*input, **kwargs)
    476         else:
--> 477             result = self.forward(*input, **kwargs)
    478         for hook in self._forward_hooks.values():
    479             hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py in forward(self, input)
     90     def forward(self, input):
     91         for module in self._modules.values():
---> 92             input = module(input)
     93         return input
     94 

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    475             result = self._slow_forward(*input, **kwargs)
    476         else:
--> 477             result = self.forward(*input, **kwargs)
    478         for hook in self._forward_hooks.values():
    479             hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/fastai/vision/models/unet.py in forward(self, up_in)
     28         up_out = self.upconv(up_in)
     29         cat_x = torch.cat([up_out, self.hook.stored], dim=1)
---> 30         x = F.relu(self.conv1(cat_x))
     31         x = F.relu(self.conv2(x))
     32         return self.bn(x)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    475             result = self._slow_forward(*input, **kwargs)
    476         else:
--> 477             result = self.forward(*input, **kwargs)
    478         for hook in self._forward_hooks.values():
    479             hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py in forward(self, input)
    311     def forward(self, input):
    312         return F.conv2d(input, self.weight, self.bias, self.stride,
--> 313                         self.padding, self.dilation, self.groups)
    314 
    315 

/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py in handler(signum, frame)
    271         # This following call uses `waitid` with WNOHANG from C side. Therefore,
    272         # Python can still get and update the process status successfully.
--> 273         _error_if_any_worker_fails()
    274         if previous_handler is not None:
    275             previous_handler(signum, frame)

RuntimeError: DataLoader worker (pid 2631) is killed by signal: Bus error. 

This is the call I would like to understand: learn.split([model[0][6], model[1]])

I was pretty good using fastai v0.7 and I am having a hard time with this…

You didn’t specify any transforms, so I’m guessing you don’t have images of the same size. The error message indicates PyTorch isn’t able to group them into a batch; see the sketch below.
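
For illustration (a standalone PyTorch sketch, not the exact code path inside the worker): the default collate function has to torch.stack the samples of a batch, and that fails as soon as two samples disagree on size or dimensionality, which is also why bs = 1 always works.

import torch
from torch.utils.data.dataloader import default_collate  # lives in torch.utils.data in newer versions

a = torch.zeros(3, 128, 128)   # one "sample"
b = torch.zeros(3, 96, 96)     # another sample with a different shape
try:
    default_collate([a, b])    # internally calls torch.stack(batch, 0)
except RuntimeError as e:
    print(e)                   # complains that the tensors cannot be stacked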

1 Like

This error message has nothing to do with the model being split. A quick search led me there; I don’t know if it is applicable to you or not.

learn.model.state_dict() is not working either.


I solved this by reducing the bs. I never had this problem in v0.7 with exactly the same dataset and params.

We need more info to help you: the stack trace, the exact code, and the error message at least.

Thank you for your fast reply!
You guys are great! :smiley:

I have now added ds_tfms and tfms (see code below).
I have to specify the ds_tfms as a list, because otherwise I get an error that it cannot be indexed.

Calling data.train_ds.tfms, data.valid_ds.tfms, data.train_dl.tfms, and data.valid_dl.tfms returns the information on the transformations, and it seems to look OK.

However, I get this error with show_image_batch():

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-31-7d54941a820f> in <module>
----> 1 show_image_batch(data.train_dl, data.train_ds.classes, rows=3, figsize=(5,5))

~/fastai/fastai/vision/data.py in show_image_batch(dl, classes, rows, figsize, denorm)
     40                      denorm:Callable=None) -> None:
     41     "Show a few images from a batch."
---> 42     x,y = next(iter(dl))
     43     if rows is None: rows = int(math.sqrt(len(x)))
     44     x = x[:rows*rows].cpu()

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
    602                 self.reorder_dict[idx] = batch
    603                 continue
--> 604             return self._process_next_batch(batch)
    605 
    606     next = __next__  # Python 2 compatibility

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _process_next_batch(self, batch)
    623         self._put_indices()
    624         if isinstance(batch, ExceptionWrapper):
--> 625             raise batch.exc_type(batch.exc_msg)
    626         return batch
    627 

AttributeError: Traceback (most recent call last):
  File "/home/paperspace/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 137, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/paperspace/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 137, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/paperspace/fastai/fastai/vision/data.py", line 190, in __getitem__
    x = apply_tfms(self.tfms, x, **self.kwargs)
  File "/home/paperspace/fastai/fastai/vision/image.py", line 422, in apply_tfms
    tfms = sorted(listify(tfms), key=lambda o: o.tfm.order)
  File "/home/paperspace/fastai/fastai/vision/image.py", line 422, in <lambda>
    tfms = sorted(listify(tfms), key=lambda o: o.tfm.order)
AttributeError: 'functools.partial' object has no attribute 'tfm'

This is my code:

paths = [path1, path2]

stats = ([0.4914, 0.4914, 0.4914], [0.2492, 0.2492, 0.2492])
norm, denorm = normalize_funcs(*stats)

tfms = get_transforms()

def get_tfms_datasets(size, paths, tfms):
    datasets = get_datasets(paths)
    return transform_datasets(*datasets, test_ds=None, tfms=tfms, size=size)

def get_data(bs, size, paths):
    return DataBunch.create(*get_tfms_datasets(size, tfms=tfms, paths=paths), bs=bs, size=size, ds_tfms=[norm, norm], tfms=tfms)

data = get_data(bs, size, paths)

I also upgraded to the latest pytorch-nightly.
There must still be an issue with how I apply the tfms to the data…?

Best regards
Michael

Yes, ds_tfms must be a list of two lists of transforms (one for the training set, one for the validation set), as explained in the docs.
Then in your last DataBunch you’re mixing the arguments: tfms are the transforms that will be applied to the batches, so it should be [norm], and ds_tfms should be your tfms variable (see the sketch below).
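
Applied literally to the get_data from the previous post, that swap looks like this (a sketch; whether you still need the separate transform_datasets call at all is discussed further down the thread):

def get_data(bs, size, paths):
    # ds_tfms: the per-item data augmentation; tfms: the batch-level normalization.
    return DataBunch.create(*get_tfms_datasets(size, tfms=tfms, paths=paths),
                            bs=bs, size=size, ds_tfms=tfms, tfms=[norm])

data = get_data(bs, size, paths)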

1 Like

I think the error comes from loading the weights of resnet34.

from fastai.vision.models.unet import *
body = create_body(tvm.resnet34(True), -2) #/root/.torch/models/
model = DynamicUnet(body, n_classes=2).cuda()

learn = Learner(data, model, metrics=metrics,
                loss_fn=CrossEntropyFlat())
learn.split([model[0][7], model[1]])
learn.freeze()
lr_find(learn)

>>RuntimeError                              Traceback (most recent call last)
<ipython-input-78-dcd2a06c9caf> in <module>()
----> 1 lr_find(learn)
      2 # learn.recorder.plot()

/usr/local/lib/python3.6/dist-packages/fastai/train.py in lr_find(learn, start_lr, end_lr, num_it, **kwargs)
     24     cb = LRFinder(learn, start_lr, end_lr, num_it)
     25     a = int(np.ceil(num_it/len(learn.data.train_dl)))
---> 26     learn.fit(a, start_lr, callbacks=[cb], **kwargs)
     27 
     28 def to_fp16(learn:Learner, loss_scale:float=512., flat_master:bool=False)->Learner:

/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    136         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
    137         fit(epochs, self.model, self.loss_fn, opt=self.opt, data=self.data, metrics=self.metrics,
--> 138             callbacks=self.callbacks+callbacks)
    139 
    140     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(epochs, model, loss_fn, opt, data, callbacks, metrics)
     69     cb_handler = CallbackHandler(callbacks)
     70     pbar = master_bar(range(epochs))
---> 71     cb_handler.on_train_begin(epochs, pbar=pbar, metrics=metrics)
     72 
     73     exception=False

/usr/local/lib/python3.6/dist-packages/fastai/callback.py in on_train_begin(self, epochs, pbar, metrics)
    186         self.state_dict = _get_init_state()
    187         self.state_dict['n_epochs'],self.state_dict['pbar'],self.state_dict['metrics'] = epochs,pbar,metrics
--> 188         self('train_begin')
    189 
    190     def on_epoch_begin(self)->None:

/usr/local/lib/python3.6/dist-packages/fastai/callback.py in __call__(self, cb_name, **kwargs)
    180     def __call__(self, cb_name, **kwargs)->None:
    181         "Call through to all of the `CallbakHandler` functions."
--> 182         return [getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs) for cb in self.callbacks]
    183 
    184     def on_train_begin(self, epochs:int, pbar:PBar, metrics:MetricFuncList)->None:

/usr/local/lib/python3.6/dist-packages/fastai/callback.py in <listcomp>(.0)
    180     def __call__(self, cb_name, **kwargs)->None:
    181         "Call through to all of the `CallbakHandler` functions."
--> 182         return [getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs) for cb in self.callbacks]
    183 
    184     def on_train_begin(self, epochs:int, pbar:PBar, metrics:MetricFuncList)->None:

/usr/local/lib/python3.6/dist-packages/fastai/callbacks/lr_finder.py in on_train_begin(self, **kwargs)
     22     def on_train_begin(self, **kwargs:Any)->None:
     23         "Initialize optimizer and learner hyperparameters."
---> 24         self.learn.save('tmp')
     25         self.opt = self.learn.opt
     26         self.opt.lr = self.sched.start

/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in save(self, name)
    167     def save(self, name:PathOrStr):
    168         "Save model with `name` to `self.model_dir`."
--> 169         torch.save(self.model.state_dict(), self.path/self.model_dir/f'{name}.pth')
    170 
    171     def load(self, name:PathOrStr):

/usr/local/lib/python3.6/dist-packages/torch/serialization.py in save(obj, f, pickle_module, pickle_protocol)
    207         >>> torch.save(x, buffer)
    208     """
--> 209     return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
    210 
    211 

/usr/local/lib/python3.6/dist-packages/torch/serialization.py in _with_file_like(f, mode, body)
    132         f = open(f, mode)
    133     try:
--> 134         return body(f)
    135     finally:
    136         if new_fd:

/usr/local/lib/python3.6/dist-packages/torch/serialization.py in <lambda>(f)
    207         >>> torch.save(x, buffer)
    208     """
--> 209     return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
    210 
    211 

/usr/local/lib/python3.6/dist-packages/torch/serialization.py in _save(obj, f, pickle_module, pickle_protocol)
    286     f.flush()
    287     for key in serialized_storage_keys:
--> 288         serialized_storages[key]._write_file(f, _should_read_directly(f))
    289 
    290 

RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/torch/csrc/generic/serialization.cpp:15

I will add that when this error occurs, I am forced to restart the kernel; even a model that worked before (a few lines earlier) then triggers:

Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/reductions.py", line 240, in reduce_storage
    fd, size = storage._share_fd_()
RuntimeError: unable to write to file </torch_352_228954310>

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-87-aec71e564917> in <module>()
----> 1 x,y = next(iter(md.train_dl))

/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py in __next__(self)
    596         while True:
    597             assert (not self.shutdown and self.batches_outstanding > 0)
--> 598             idx, batch = self._get_batch()
    599             self.batches_outstanding -= 1
    600             if idx != self.rcvd_idx:

/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py in _get_batch(self)
    575             # need to call `.task_done()` because we don't use `.join()`.
    576         else:
--> 577             return self.data_queue.get()
    578 
    579     def __next__(self):

/usr/lib/python3.6/multiprocessing/queues.py in get(self, block, timeout)
     92         if block and timeout is None:
     93             with self._rlock:
---> 94                 res = self._recv_bytes()
     95             self._sem.release()
     96         else:

/usr/lib/python3.6/multiprocessing/connection.py in recv_bytes(self, maxlength)
    214         if maxlength is not None and maxlength < 0:
    215             raise ValueError("negative maxlength")
--> 216         buf = self._recv_bytes(maxlength)
    217         if buf is None:
    218             self._bad_message_length()

/usr/lib/python3.6/multiprocessing/connection.py in _recv_bytes(self, maxsize)
    405 
    406     def _recv_bytes(self, maxsize=None):
--> 407         buf = self._recv(4)
    408         size, = struct.unpack("!i", buf.getvalue())
    409         if maxsize is not None and size > maxsize:

/usr/lib/python3.6/multiprocessing/connection.py in _recv(self, size, read)
    377         remaining = size
    378         while remaining > 0:
--> 379             chunk = read(handle, remaining)
    380             n = len(chunk)
    381             if n == 0:

/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py in handler(signum, frame)
    271         # This following call uses `waitid` with WNOHANG from C side. Therefore,
    272         # Python can still get and update the process status successfully.
--> 273         _error_if_any_worker_fails()
    274         if previous_handler is not None:
    275             previous_handler(signum, frame)

RuntimeError: DataLoader worker (pid 351) is killed by signal: Bus error. 

The last one is linked to PyTorch. What environment are you running this on?

https://colab.research.google.com/gist/tcapelle/4083ffd865fabc8703175515d521a7f2/tgs-fastai-v1.ipynb

OK, so on Colab there is an issue with PyTorch, as reported here. You need to increase the amount of shared memory you have (or see the workaround sketched below).
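
If you can’t raise the shared memory on Colab, a common workaround (an assumption on my side, not something tested in this notebook, and it assumes your DataBunch.create accepts num_workers) is to load the data single-process, trading speed for stability; train_ds and valid_ds stand for whatever datasets you built above:

# num_workers=0 keeps batch collation in the main process, so nothing goes
# through /dev/shm and the "DataLoader worker killed by signal: Bus error"
# crash from the tiny shared memory segment is avoided.
data = DataBunch.create(train_ds, valid_ds, bs=bs, size=size, num_workers=0)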

Do you know why the first model then runs just fine and the segmentation one does not?

Thank you for your help and patience. :slight_smile:

But I am not sure I understood it correctly.
I set it up like this (and I also tried the other combination, see below):

def get_tfms_datasets(size, paths, tfms):
    datasets = get_datasets(paths)
    #print('get_tfms_datasets tfms:\n', tfms)
    return transform_datasets(*datasets, test_ds=None, tfms=tfms, tfm_y=True, size=size)

def get_data(bs, size, paths):
    return DataBunch.create(*get_tfms_datasets(size=size, paths=paths, tfms=[get_transforms(), get_transforms()]), bs=bs, size=size, tfms=rsna_norm)

data = get_data(bs, size, paths)

These are the results:

data.train_dl.tfms & data.valid_dl.tfms show the norm function:

[functools.partial(<function _normalize_batch at 0x7f0162126510>, mean=tensor([0.4914, 0.4914, 0.4914]), std=tensor([0.2492, 0.2492, 0.2492]))]

data.train_ds.tfms & data.valid_ds.tfms show the data augmentation functions:

([RandTransform(tfm=TfmCrop (crop_pad), kwargs={'row_pct': (0, 1), 'col_pct': (0, 1)}, p=1.0, resolved={}, do_run=True, is_random=True),
  RandTransform(tfm=TfmPixel (flip_lr), kwargs={}, p=0.5, resolved={}, do_run=True, is_random=True),
  RandTransform(tfm=TfmCoord (symmetric_warp), kwargs={'magnitude': (-0.2, 0.2)}, p=0.75, resolved={}, do_run=True, is_random=True),
  RandTransform(tfm=TfmAffine (rotate), kwargs={'degrees': (-10.0, 10.0)}, p=0.75, resolved={}, do_run=True, is_random=True),
  RandTransform(tfm=TfmAffine (zoom), kwargs={'row_pct': (0, 1), 'col_pct': (0, 1), 'scale': (1.0, 1.1)}, p=0.75, resolved={}, do_run=True, is_random=True),
  RandTransform(tfm=TfmLighting (brightness), kwargs={'change': (0.4, 0.6)}, p=0.75, resolved={}, do_run=True, is_random=True),
  RandTransform(tfm=TfmLighting (contrast), kwargs={'scale': (0.8, 1.25)}, p=0.75, resolved={}, do_run=True, is_random=True)],
 [RandTransform(tfm=TfmCrop (crop_pad), kwargs={}, p=1.0, resolved={}, do_run=True, is_random=True)])

But with:

show_image_batch(data.train_dl, data.train_ds.classes, rows=3, figsize=(5,5))

I still get the same AttributeError: 'list' object has no attribute 'tfm' from above.

When I debug the error, the dl in line 42 shows the norm function:

/home/paperspace/fastai/fastai/vision/data.py(42)show_image_batch()
     40                      denorm:Callable=None) -> None:
     41     "Show a few images from a batch."
---> 42     x,y = next(iter(dl))
     43     if rows is None: rows = int(math.sqrt(len(x)))
     44     x = x[:rows*rows].cpu()

ipdb> dl
DeviceDataLoader(dl=<torch.utils.data.dataloader.DataLoader object at 0x7f016054b780>, device=device(type='cuda'), tfms=[functools.partial(<function _normalize_batch at 0x7f0162126510>, mean=tensor([0.4914, 0.4914, 0.4914]), std=tensor([0.2492, 0.2492, 0.2492]))], collate_fn=<function data_collate at 0x7f016922a840>)

This also happens when I exchange the two tfms arguments from above with each other:

def get_data(bs, size, paths):
    return DataBunch.create(*get_tfms_datasets(size=size, paths=paths, tfms=[rsna_norm, rsna_norm]), bs=bs, size=size, tfms=get_transforms())

With that setup I still get the AttributeError: 'functools.partial' object has no attribute 'tfm', even though while debugging I can see the tfm attribute in the dl that generates the error:

DeviceDataLoader(dl=<torch.utils.data.dataloader.DataLoader object at 0x7f01605384a8>, device=device(type='cuda'), tfms=[[RandTransform(tfm=TfmCrop (crop_pad), kwargs={'row_pct': (0, 1), 'col_pct': (0, 1)}, p=1.0, resolved={}, do_run=True, is_random=True), RandTransform(tfm=TfmPixel (flip_lr), kwargs={}, p=0.5, resolved={}, do_run=True, is_random=True), RandTransform(tfm=TfmCoord (symmetric_warp), kwargs={'magnitude': (-0.2, 0.2)}, p=0.75, resolved={}, do_run=True, is_random=True), RandTransform(tfm=TfmAffine (rotate), kwargs={'degrees': (-10.0, 10.0)}, p=0.75, resolved={}, do_run=True, is_random=True), RandTransform(tfm=TfmAffine (zoom), kwargs={'row_pct': (0, 1), 'col_pct': (0, 1), 'scale': (1.0, 1.1)}, p=0.75, resolved={}, do_run=True, is_random=True), RandTransform(tfm=TfmLighting (brightness), kwargs={'change': (0.4, 0.6)}, p=0.75, resolved={}, do_run=True, is_random=True), RandTransform(tfm=TfmLighting (contrast), kwargs={'scale': (0.8, 1.25)}, p=0.75, resolved={}, do_run=True, is_random=True)], [RandTransform(tfm=TfmCrop (crop_pad), kwargs={}, p=1.0, resolved={}, do_run=True, is_random=True)]], collate_fn=<function data_collate at 0x7f016922a840>)

I also checked the docs for ObjectDetectDataset and transform_datasets.

I am not sure what I am messing up or in which direction I should debug further.

If I understand v1 myself (a stretch!), the list error comes from tfms=[get_transforms(), get_transforms()], which should be ds_tfms=get_transforms(). I am also not sure that you need get_tfms_datasets, as the DataBunch creator already calls it; just pass in your datasets (see the sketch below).
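
A sketch of that simplification, assuming get_datasets(paths) returns the train/valid datasets as in the earlier posts and that rsna_norm is the batch normalization function from normalize_funcs:

def get_data(bs, size, paths):
    # ds_tfms: the (train, valid) augmentation lists from get_transforms();
    # tfms: the batch-level normalization.
    return DataBunch.create(*get_datasets(paths), bs=bs, size=size,
                            ds_tfms=get_transforms(), tfms=rsna_norm)

data = get_data(bs, size, paths)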

1 Like

I think it’s due to our inefficient implementation of bounding boxes for data augmentation. It will change soon, once we commit the transformations for points to the main fastai library.

1 Like

Nothing is working now =(. resnet34 is not there anymore, Darknet does not work, sniff…
What would you recommend to be able to help with dev, a Paperspace instance?
Colab is free, most people will try the library there first, and the K80 is not bad.
Kaggle is not working either.