Create DataBunch from PyTorch DataLoader

Hi. I don’t know if this is still applicable, but I want to ask how to train on my custom dataset. The thing is, my images are stored as .npz files since they contain negative values, so I need to load them with NumPy before feeding them to the CNN. Hence, I have created my own data generator (as shown below):

import numpy as np
import cv2
from torch.utils.data import Dataset

class NumbersDataset(Dataset):
    def __init__(self, inputs, labels):
        self.X = inputs   # list of paths to the input .npz files
        self.y = labels   # list of paths to the target .npz files

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        # each .npz file stores the image under the key 'x'
        img_train = np.load(self.X[idx])['x']
        img_mask = np.load(self.y[idx])['x']
        img_train = cv2.resize(img_train, (224, 224), interpolation=cv2.INTER_LANCZOS4)
        img_mask = cv2.resize(img_mask, (224, 224), interpolation=cv2.INTER_LANCZOS4)
        return img_train, img_mask

I create PyTorch DataLoaders from this dataset and wrap them in a DataBunch so fastai can feed it to the U-Net, like this:

datas = DataBunch(train_dl = dataloader_train, valid_dl = dataloader_valid)
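For reference, dataloader_train and dataloader_valid are plain PyTorch DataLoaders built from the dataset above, roughly like this (the split variables and batch size are placeholders, not my actual values):

from torch.utils.data import DataLoader

# placeholder split: in reality the file lists come from my own train/valid split
train_ds = NumbersDataset(train_files, train_masks)
valid_ds = NumbersDataset(valid_files, valid_masks)

dataloader_train = DataLoader(train_ds, batch_size=8, shuffle=True)
dataloader_valid = DataLoader(valid_ds, batch_size=8, shuffle=False)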

I want to train a ResNet-based U-Net from scratch, and for that I used the following code:

learner = unet_learner(data=datas, arch=models.resnet34, pretrained=False)

But I get the following error:

AttributeError: 'NumbersDataset' object has no attribute 'c'

which I figured out refers to the number of classes (basically for classification). But I want to use the model for regression. How do I go about it then?

Just put data.c = the number of channels of the final layer of the unet.


Hi,

Thank you very much for the reply. The last layer is a convolution layer. I want to train the network to recreate the input image, so it’s not a classification but a regression problem.

What do you suggest I should do in this case?

Like I said, data.c = the number of channels of the final layer of the unet. If you want an image, it’s probably 3 channels.


Hello,

Thanks, and I apologize for the confusion on my side. Instead of channels, I understood it as the number of classes.

However, can you tell me what the data object is? Where should I set data.c?

It’d be datas.c before you call unet_learner

So datas.c = 3
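Putting it together with the names from this thread, something like this (assuming the target is a 3-channel image):

datas = DataBunch(train_dl=dataloader_train, valid_dl=dataloader_valid)
datas.c = 3  # channels of the final U-Net output; 3 if the target is an RGB image
learner = unet_learner(data=datas, arch=models.resnet34, pretrained=False)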


@muellerzr @sgugger

Along with the above issue, I am also working on a segmentation problem where the pixels are labeled as person or not person. Just two classes.

So, in the __init__ method I have added self.c = 2, as suggested here for binary classification (a simplified sketch is at the end of this post). I have also attached my code in case you want to have a look at it. When I run the code, I get the following error:

AttributeError: 'dict' object has no attribute 'shape'

And when I try to run datas.show_batch(), I get this:

AttributeError: 'NumbersDataset' object has no attribute 'x'

I also tried returning the torch tensors as an array instead of a dictionary, but I still got the above error.
When I use DataBunch.create instead of DataBunch, I get this:

TypeError: new() argument after * must be an iterable, not builtin_function_or_method

In the end, I also added datas.c = 1 (the final layer is a binary image), but I am still getting the above error.
What should I do?
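To be concrete, the change I described amounts to something like this simplified sketch (this is not my attached code; the class name, the mask interpolation and the tensor dtypes here are just illustrative):

import numpy as np
import cv2
import torch
from torch.utils.data import Dataset

class PersonSegDataset(Dataset):
    def __init__(self, inputs, labels):
        self.X = inputs   # paths to input .npz files
        self.y = labels   # paths to mask .npz files
        self.c = 2        # two classes: person / not person

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        img = cv2.resize(np.load(self.X[idx])['x'], (224, 224),
                         interpolation=cv2.INTER_LANCZOS4)
        mask = cv2.resize(np.load(self.y[idx])['x'], (224, 224),
                          interpolation=cv2.INTER_NEAREST)
        # return plain tensors rather than a dict
        return torch.from_numpy(img).float(), torch.from_numpy(mask).long()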

Hi @muellerzr and @sgugger, can you please help me with the issue I am facing?

Hi Sarvagya!
Did you find a solution to your problem?
I am also doing image regression, so it would be helpful if you could share the solution you followed for your task.
Thanks!

@sgugger

I have run into another issue with this when creating a custom DataLoader:

----> 1 learn.fit_one_cycle(3)

/usr/local/lib/python3.6/dist-packages/fastai/train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, final_div, wd, callbacks, tot_epochs, start_epoch)
     20     callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor, pct_start=pct_start,
     21                                        final_div=final_div, tot_epochs=tot_epochs, start_epoch=start_epoch))
---> 22     learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
     23 
     24 def lr_find(learn:Learner, start_lr:Floats=1e-7, end_lr:Floats=10, num_it:int=100, stop_div:bool=True, wd:float=None):

/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    194         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
    195         if defaults.extra_callbacks is not None: callbacks += defaults.extra_callbacks
--> 196         fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
    197 
    198     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(epochs, learn, callbacks, metrics)
     96             cb_handler.set_dl(learn.data.train_dl)
     97             cb_handler.on_epoch_begin()
---> 98             for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):
     99                 xb, yb = cb_handler.on_batch_begin(xb, yb)
    100                 loss = loss_batch(learn.model, xb, yb, learn.loss_func, learn.opt, cb_handler)

/usr/local/lib/python3.6/dist-packages/fastprogress/fastprogress.py in __iter__(self)
     45         except Exception as e:
     46             self.on_interrupt()
---> 47             raise e
     48 
     49     def update(self, val):

/usr/local/lib/python3.6/dist-packages/fastprogress/fastprogress.py in __iter__(self)
     39         if self.total != 0: self.update(0)
     40         try:
---> 41             for i,o in enumerate(self.gen):
     42                 if i >= self.total: break
     43                 yield o

/usr/local/lib/python3.6/dist-packages/fastai/basic_data.py in __iter__(self)
     73     def __iter__(self):
     74         "Process and returns items from `DataLoader`."
---> 75         for b in self.dl: yield self.proc_batch(b)
     76 
     77     @classmethod

/usr/local/lib/python3.6/dist-packages/fastai/basic_data.py in proc_batch(self, b)
     67     def proc_batch(self,b:Tensor)->Tensor:
     68         "Process batch `b` of `TensorImage`."
---> 69         b = to_device(b, self.device)
     70         for f in listify(self.tfms): b = f(b)
     71         return b

/usr/local/lib/python3.6/dist-packages/fastai/torch_core.py in to_device(b, device)
    118     "Recursively put `b` on `device`."
    119     device = ifnone(device, defaults.device)
--> 120     if is_listy(b): return [to_device(o, device) for o in b]
    121     if is_dict(b): return {k: to_device(v, device) for k, v in b.items()}
    122     return b.to(device, non_blocking=True)

/usr/local/lib/python3.6/dist-packages/fastai/torch_core.py in <listcomp>(.0)
    118     "Recursively put `b` on `device`."
    119     device = ifnone(device, defaults.device)
--> 120     if is_listy(b): return [to_device(o, device) for o in b]
    121     if is_dict(b): return {k: to_device(v, device) for k, v in b.items()}
    122     return b.to(device, non_blocking=True)

/usr/local/lib/python3.6/dist-packages/fastai/torch_core.py in to_device(b, device)
    118     "Recursively put `b` on `device`."
    119     device = ifnone(device, defaults.device)
--> 120     if is_listy(b): return [to_device(o, device) for o in b]
    121     if is_dict(b): return {k: to_device(v, device) for k, v in b.items()}
    122     return b.to(device, non_blocking=True)

/usr/local/lib/python3.6/dist-packages/fastai/torch_core.py in <listcomp>(.0)
    118     "Recursively put `b` on `device`."
    119     device = ifnone(device, defaults.device)
--> 120     if is_listy(b): return [to_device(o, device) for o in b]
    121     if is_dict(b): return {k: to_device(v, device) for k, v in b.items()}
    122     return b.to(device, non_blocking=True)

/usr/local/lib/python3.6/dist-packages/fastai/torch_core.py in to_device(b, device)
    120     if is_listy(b): return [to_device(o, device) for o in b]
    121     if is_dict(b): return {k: to_device(v, device) for k, v in b.items()}
--> 122     return b.to(device, non_blocking=True)
    123 
    124 def data_collate(batch:ItemsList)->Tensor:

AttributeError: 'str' object has no attribute 'to' 

I checked the PyTorch DataLoader by iterating over it:

for idx, sample_batch in enumerate(train_loader):
    image, label = sample_batch
    if idx <= 100:
        print(type(image), label)
    else:
        break

Everything comes back as a tensor plus its label. Some people suggested that the image might actually be a string, but the __getitem__ of my dataset always returns a Tensor.
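For what it's worth, the to_device function in the traceback recurses through the batch and eventually calls .to(device) on every element, so a single stray string anywhere in the batch reproduces this error exactly. A tiny illustration (the batch contents here are made up):

import torch

batch = (torch.zeros(2, 3, 224, 224), 'some_label')  # hypothetical batch containing a string
for o in batch:
    o.to('cpu')  # raises AttributeError: 'str' object has no attribute 'to'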

Do you have any suggestions?

BTW, in the documentation I see this:

Warning: You can pass regular pytorch Dataset here, but they’ll require more attributes than the basic ones to work with the library. See below for more details.

Functions that really won’t work

To make those last functions work, you really need to use the data block API and maybe write your own custom ItemList.

What does that mean?

Thanks