Create DataBunch from PyTorch DataLoader

This is the custom dataset template from PyTorch.
If no transform is applied, it returns the PIL image, but normally transforms.ToTensor() is applied.

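from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class MyCustomDataset(Dataset):
    # image_paths and labels are placeholder arguments: replace them with whatever describes your data
    def __init__(self, image_paths, labels, transforms=None):
        self.image_paths = image_paths
        self.labels = labels
        self.transforms = transforms

    def __getitem__(self, index):
        # Read one sample, here an image file opened as a PIL image
        img = Image.open(self.image_paths[index]).convert("RGB")
        label = self.labels[index]
        # If a transform was passed (e.g. transforms.Compose([...])),
        # its operations are applied in the order they were composed
        if self.transforms is not None:
            img = self.transforms(img)
        return (img, label)

    def __len__(self):
        # Number of samples (images) in the dataset
        return len(self.image_paths)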

Yes, and since you don't apply it here, you pass a PIL Image directly to a PyTorch DataLoader, which isn't possible: the DataLoader can't batch PIL images.

I changed my code:

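  1. Use the transform when creating the dataloader:
from torch.utils.data import DataLoader
from torchvision import transforms

# the transform list was cut off in the post; it needs to include a tensor conversion such as transforms.ToTensor()
transform = transforms.Compose([...])
dataset = ImdbWikiDataset(transform=transform)  # the poster's own dataset class (not shown)
dataloader = DataLoader(dataset, batch_size=8, shuffle=True, num_workers=1)

The dataloader works fine.

  2. Create the DataBunch from the PyTorch dataloader:
from fastai.vision import DataBunch, get_transforms

tfms_train, tfms_val = get_transforms()
test_db = DataBunch(dataloader, dataloader, tfms=tfms_train)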

The error message says: AttributeError: 'list' object has no attribute 'pixel'

AttributeError                            Traceback (most recent call last)
<ipython-input-7-15cf372150e2> in <module>()
----> 1 test_db.one_batch()

~/anaconda3/envs/pytorch_v1/lib/python3.6/site-packages/fastai/ in one_batch(self, ds_type, detach, denorm)
    132         w = self.num_workers
    133         self.num_workers = 0
--> 134         try:     x,y = next(iter(dl))
    135         finally: self.num_workers = w
    136         if detach: x,y = to_detach(x),to_detach(y)

~/anaconda3/envs/pytorch_v1/lib/python3.6/site-packages/fastai/ in __iter__(self)
     68         for b in self.dl:
     69             y = b[1][0] if is_listy(b[1]) else b[1]
---> 70             if not self.skip_size1 or y.size(0) != 1: yield self.proc_batch(b)
     72     @classmethod

~/anaconda3/envs/pytorch_v1/lib/python3.6/site-packages/fastai/ in proc_batch(self, b)
     60         "Proces batch `b` of `TensorImage`."
     61         b = to_device(b, self.device)
---> 62         for f in listify(self.tfms): b = f(b)
     63         return b

~/anaconda3/envs/pytorch_v1/lib/python3.6/site-packages/fastai/vision/ in __call__(self, x, *args, **kwargs)
    495     def __call__(self, x:Image, *args, **kwargs)->Image:
    496         "Randomly execute our tfm on `x`."
--> 497         return self.tfm(x, *args, **{**self.resolved, **kwargs}) if self.do_run else x
    499 def _resolve_tfms(tfms:TfmList):

~/anaconda3/envs/pytorch_v1/lib/python3.6/site-packages/fastai/vision/ in __call__(self, p, is_random, *args, **kwargs)
    442     def __call__(self, *args:Any, p:float=1., is_random:bool=True, **kwargs:Any)->Image:
    443         "Calc now if `args` passed; else create a transform called prob `p` if `random`."
--> 444         if args: return self.calc(*args, **kwargs)
    445         else: return RandTransform(self, kwargs=kwargs, is_random=is_random, p=p)

~/anaconda3/envs/pytorch_v1/lib/python3.6/site-packages/fastai/vision/ in calc(self, x, *args, **kwargs)
    447     def calc(self, x:Image, *args:Any, **kwargs:Any)->Image:
    448         "Apply to image `x`, wrapping it if necessary."
--> 449         if self._wrap: return getattr(x, self._wrap)(self.func, *args, **kwargs)
    450         else:          return self.func(x, *args, **kwargs)

AttributeError: 'list' object has no attribute 'pixel'

Yes: since you're not using fastai datasets, you can't expect the fastai functions to work properly, as they rely on different behaviors.
You can put your DataBunch in a Learner object with your custom model to use fastai to train it, but all the helper functions around your data will need fastai datasets.
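To see where 'list' object has no attribute 'pixel' comes from: proc_batch applies every transform in tfms to the whole batch b, and a plain PyTorch DataLoader yields each batch as a Python list [xb, yb]. fastai's image transforms expect a fastai Image and call a method on it (the getattr(x, self._wrap) line, where _wrap is 'pixel' here). A PyTorch-only sketch with toy tensors standing in for the real loader:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for the ImdbWikiDataset loader: 8 fake RGB images with integer labels.
ds = TensorDataset(torch.rand(8, 3, 32, 32), torch.arange(8))
dl = DataLoader(ds, batch_size=4)

b = next(iter(dl))
print(type(b))               # the default collate function returns a plain list [xb, yb]
print(hasattr(b, "pixel"))   # a list has none of fastai's Image methods such as pixel
```

One option, then, is to build the DataBunch from the PyTorch dataloaders without fastai's tfms (DataBunch(dataloader, dataloader)) and keep any augmentation in the torchvision transforms.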

Thanks sgugger, I will switch to using fastai datasets.

The fastai docs say we can use the […] or […] when constructing the DataBunch, but nowhere show how…


I am getting an error:

samples = collate_fn([dataset[i] for i in batch_indices])
TypeError: ‘DataLoader’ object does not support indexing

What am I doing wrong?
Why am I not able to create and train a DataBunch from PyTorch dataloaders?

Can you please help me with the following error message?

As indicated by the docs, DataBunch.create takes datasets. It's the regular __init__ that takes DataLoaders.
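The TypeError above is what happens when DataLoaders are handed to something that expects datasets: the outer loader calls dataset[i] on a DataLoader, which can only be iterated. A PyTorch-only sketch of the difference, with hypothetical toy data:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 10 samples with 3 features and an integer label each.
ds = TensorDataset(torch.randn(10, 3), torch.arange(10))
dl = DataLoader(ds, batch_size=4)

# A Dataset supports indexing and returns a single (x, y) sample...
x0, y0 = ds[0]

# ...a DataLoader does not; it only produces batches when iterated.
try:
    dl[0]
    dl_indexable = True
except TypeError:
    dl_indexable = False

# Passing a DataLoader where a Dataset is expected reproduces the error:
# the outer loader's fetch step calls dataset[i] on the inner DataLoader.
nested = DataLoader(dl, batch_size=2)
try:
    next(iter(nested))
    nested_failed = False
except TypeError:
    nested_failed = True
```

In fastai v1 terms, that means passing the Dataset objects to DataBunch.create, or passing already-built DataLoaders to the DataBunch constructor.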

Oh okay.
This solved it.

Thank you.

This solved the problem, but I am not getting the expected results.
Can you please take a look at my problem?
The link is given below.

It’s hard to say why a model doesn’t want to train. Did you try a higher learning rate?

Yeah, I did.
When I run the same model with Keras, it trains perfectly.

But I want to use fastai now.

You should check the initialization. There is a bug in PyTorch's default initialization for conv layers, which might explain the difference from Keras.
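A common way to sidestep PyTorch's default conv initialization is to re-initialize the weights explicitly, for example with Kaiming (He) normal init applied through model.apply. This is a sketch on a hypothetical small CNN, not the poster's model (which isn't shown); Kaiming normal with zero biases is an assumed choice, not the only fix.

```python
import torch
import torch.nn as nn

def init_weights(m):
    # Re-initialize conv and linear layers with Kaiming normal init (suited to ReLU) and zero biases.
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
        if m.bias is not None:
            nn.init.zeros_(m.bias)

torch.manual_seed(0)
# Hypothetical small CNN standing in for the model from the thread.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),
)
model.apply(init_weights)   # visits every submodule recursively

out = model(torch.randn(2, 3, 8, 8))
print(out.shape)
```

Because model.apply recurses into children, the same function also covers layers nested inside blocks.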

I tried that and the accuracy is still 14%.
It's weird: no matter what the validation loss is, the accuracy stays at ~14%.

I have had val_loss=9.5 with accuracy ~14%, and also
val_loss=1.93 with accuracy still ~14%.

As you can see,
all my predictions are exactly the same.
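Constant accuracy with a changing loss is what you'd expect when the model predicts the same class for every input: accuracy only depends on the argmax, while cross-entropy also depends on how confident the outputs are. A sketch with made-up logits (700 samples and 7 classes are arbitrary toy sizes):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_samples, n_classes = 700, 7                 # arbitrary toy sizes
targets = torch.randint(0, n_classes, (n_samples,))

def constant_logits(score):
    # Every sample gets the same output: class 0 scores `score`, all other classes 0.
    logits = torch.zeros(n_samples, n_classes)
    logits[:, 0] = score
    return logits

results = {}
for name, score in [("confident", 10.0), ("hesitant", 0.5)]:
    logits = constant_logits(score)
    preds = logits.argmax(dim=1)
    results[name] = {
        "loss": F.cross_entropy(logits, targets).item(),
        "acc": (preds == targets).float().mean().item(),
        "distinct_preds": preds.unique().numel(),
    }
    print(name, results[name])
```

Both runs get identical accuracy (the fraction of targets equal to class 0) but very different losses. On a real model, preds.unique() over the validation predictions returning a single value confirms this kind of collapse.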

So I searched for how to check the gradients of the different layers in the model.
It turns out all my gradients are zero.

Can you tell me a possible reason for this, or a solution?
Below is the code that I am now using to initialize my weights.

If you don't get gradients, that's the whole reason your model doesn't train. How did you check them? Note that they are zeroed in the training loop after each step, so finding them all at zero after fitting for 1 epoch doesn't mean they were zero during training.

You should manually check with

x,y = next(iter(data.train_dl))
z = model(x)
loss = criterion(z,y)
loss.backward()  # populates .grad on the model's parameters

and see whether you then get gradients, for instance in model.layer1.weight.grad.
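A self-contained version of that check, with a hypothetical two-layer model and random data standing in for data.train_dl, model and criterion. The loss.backward() call is what fills in .grad; before it, the gradients are None.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical stand-ins for one batch of data.train_dl, the model and the criterion.
x = torch.randn(8, 20)
y = torch.randint(0, 5, (8,))
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 5))
criterion = nn.CrossEntropyLoss()

model.zero_grad()
loss = criterion(model(x), y)
loss.backward()   # gradients are only populated after backward()

# Norm of the gradient of every parameter, keyed by parameter name.
grad_norms = {name: p.grad.norm().item() for name, p in model.named_parameters()}
for name, g in grad_norms.items():
    print(f"{name:10s} grad norm = {g:.4f}")
```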


So when I try this, my gradients are not zero.

So I dug deeper into the problem, and as you can see in the snippet below,
my parameters before and after the update step are not the same, i.e. they do get updated.

I'm not sure if I'm on the right track, but everything seems fine.
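A sketch of such a before/after comparison that also reports the relative size of each update (update norm divided by weight norm); very small ratios suggest the optimizer steps barely move the weights even though they technically change. The model, optimizer settings and batch are hypothetical.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 5))  # hypothetical model
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 20), torch.randint(0, 5, (8,))                 # hypothetical batch

before = copy.deepcopy(model.state_dict())   # snapshot of all parameters
opt.zero_grad()
F.cross_entropy(model(x), y).backward()
opt.step()
after = model.state_dict()

report = {}
for name in before:
    changed = not torch.equal(before[name], after[name])
    ratio = ((after[name] - before[name]).norm() / before[name].norm()).item()
    report[name] = (changed, ratio)
    print(f"{name:10s} changed={changed} relative update={ratio:.2e}")
```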

Have you tried creating a DataBunch from PyTorch datasets using DataBunch.create()?

Hi. I don't know if this is still relevant, but I want to ask how to train on my custom dataset. My images are stored as .npz files because they have negative values, so I need to load them with NumPy before passing them to the CNN. Hence, I have created my own data generator (shown below):

import cv2
import numpy as np

class NumbersDataset():
    def __init__(self, inputs, labels):
        self.X = inputs   # paths to the input .npz files
        self.y = labels   # paths to the target (mask) .npz files

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        # each .npz file stores its array under the key 'x'
        tmp = np.load(self.X[idx])
        img_train = tmp['x']
        tmp = np.load(self.y[idx])
        img_mask = tmp['x']
        # resize input and mask to 224x224 with Lanczos interpolation
        img_train = cv2.resize(img_train, (224,224), interpolation = cv2.INTER_LANCZOS4)
        img_mask = cv2.resize(img_mask, (224,224), interpolation = cv2.INTER_LANCZOS4)
        return img_train, img_mask
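A sketch of the same idea that returns float32 tensors with a channel axis, the layout PyTorch conv layers expect. It swaps cv2's Lanczos resize for bilinear torch.nn.functional.interpolate, purely to avoid the OpenCV dependency, and assumes (as the code above does) that every .npz file stores a 2-D array under the key 'x'. The class name NpzPairDataset is hypothetical.

```python
import numpy as np
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset

class NpzPairDataset(Dataset):
    """Hypothetical dataset: pairs of .npz files, each holding a 2-D array under the key 'x'."""
    def __init__(self, input_paths, target_paths, size=(224, 224)):
        self.input_paths = input_paths
        self.target_paths = target_paths
        self.size = size

    def _load(self, path):
        with np.load(path) as f:
            arr = f["x"].astype(np.float32)     # stays float, so negative values survive
        t = torch.from_numpy(arr)[None, None]   # (1, 1, H, W), the shape interpolate expects
        t = F.interpolate(t, size=self.size, mode="bilinear", align_corners=False)
        return t[0]                             # (1, H, W): a single channel

    def __len__(self):
        return len(self.input_paths)

    def __getitem__(self, idx):
        return self._load(self.input_paths[idx]), self._load(self.target_paths[idx])
```

Wrapped in a DataLoader, this yields batches of shape (batch, 1, 224, 224) for both inputs and targets.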

I create DataLoaders and wrap them in a DataBunch so fastai can feed them to the UNet, like this:

datas = DataBunch(train_dl = dataloader_train, valid_dl = dataloader_valid)

I want to train a ResNet-based UNet from scratch, and for that I used the following code:

learner = unet_learner(data = datas, arch = models.resnet34, pretrained=False)

But I get the following error:

AttributeError: ‘NumbersDataset’ object has no attribute ‘c’

which, as I figured out, is the number of classes (i.e. it's meant for classification). But I want to use the model for regression. How do I go about it?

Just set data.c to the number of channels of the UNet's final layer.
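In plain PyTorch terms, c is how many channels the UNet's final layer outputs; for a single-channel regression target that means c = 1, paired with a regression loss such as mean squared error instead of cross-entropy. A sketch with a hypothetical 1x1 conv head on random feature maps (not fastai's actual UNet head; the 32 input channels are arbitrary):

```python
import torch
import torch.nn as nn

c = 1                                    # output channels = values to regress per pixel
head = nn.Conv2d(32, c, kernel_size=1)   # hypothetical final layer of the network
features = torch.randn(2, 32, 224, 224)  # stand-in for the decoder's last feature map
target = torch.randn(2, 224, 224)        # a 2-D target per image, e.g. a resized mask

pred = head(features)                            # shape (2, c, 224, 224)
loss = nn.MSELoss()(pred, target.unsqueeze(1))   # add a channel axis so shapes match
loss.backward()
print(pred.shape, loss.item())
```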
