Create Databunch from pytorch dataloader

I tried to create a DataBunch from a PyTorch DataLoader but failed.
I need to build a network for age and gender estimation, so my dataset has to return the image, age, and gender info.
My fastai version is 1.0.33; my code and the error screenshot are below:

This is unrelated to fastai: you can’t put a PIL Image directly in a PyTorch DataLoader.

This is the custom dataset template from PyTorch.
If no transform is applied, it returns the PIL image, but normally transforms.ToTensor() is applied.

from torch.utils.data import Dataset
from torchvision import transforms

class MyCustomDataset(Dataset):
    def __init__(self, samples, transforms=None):
        # `samples` is a placeholder: any sequence of (data, label) pairs,
        # e.g. images read from files together with their labels
        self.samples = samples
        self.transforms = transforms

    def __getitem__(self, index):
        # Fetch one item (some data read from a file or image) and its label
        data, label = self.samples[index]
        # If transforms were given, apply them in the order they were composed
        if self.transforms is not None:
            data = self.transforms(data)
        return (data, label)

    def __len__(self):
        # Number of items (images) in the dataset
        return len(self.samples)

Yes, and since you don’t apply it here, you are passing a PIL Image directly to a PyTorch DataLoader, which again isn’t possible.
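
To make the point concrete, here is a minimal sketch of a dataset whose items can be batched by the default collate function, because everything it returns is already a tensor. The class name AgeGenderDataset, the random tensors standing in for decoded images, and the choice to pack age and gender into one target tensor are all assumptions for illustration, not the original poster's code.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class AgeGenderDataset(Dataset):
    """Hypothetical stand-in for a dataset that yields image, age and gender."""

    def __init__(self, n_items=16, transform=None):
        # Random tensors stand in for decoded images (3 x 32 x 32).
        self.images = torch.rand(n_items, 3, 32, 32)
        self.ages = torch.randint(18, 80, (n_items,))
        self.genders = torch.randint(0, 2, (n_items,))
        self.transform = transform

    def __getitem__(self, index):
        img = self.images[index]
        if self.transform is not None:
            img = self.transform(img)
        # Pack both labels into one tensor so each batch is a plain (x, y) pair.
        target = torch.stack([self.ages[index], self.genders[index]]).float()
        return img, target

    def __len__(self):
        return len(self.images)

dataset = AgeGenderDataset()
loader = DataLoader(dataset, batch_size=8, shuffle=True)
x, y = next(iter(loader))
print(x.shape, y.shape)
```

With real images, the transform would typically end in transforms.ToTensor() so that `img` is a tensor by the time it leaves `__getitem__`.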

I changed my code:

  1. Use a transform when creating the dataloader:
transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ToTensor(),
])
dataset = ImdbWikiDataset(transform=transform)
dataloader = DataLoader(dataset, batch_size=8, shuffle=True, num_workers=1)

The dataloader works fine.

  2. Create the DataBunch from the PyTorch dataloader:
tfms_train, tfms_val = get_transforms()
test_db = DataBunch(dataloader, dataloader, tfms=tfms_train)
test_db.one_batch()

The error message says: AttributeError: 'list' object has no attribute 'pixel'

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-15cf372150e2> in <module>()
----> 1 test_db.one_batch()


~/anaconda3/envs/pytorch_v1/lib/python3.6/site-packages/fastai/basic_data.py in one_batch(self, ds_type, detach, denorm)
    132         w = self.num_workers
    133         self.num_workers = 0
--> 134         try:     x,y = next(iter(dl))
    135         finally: self.num_workers = w
    136         if detach: x,y = to_detach(x),to_detach(y)


~/anaconda3/envs/pytorch_v1/lib/python3.6/site-packages/fastai/basic_data.py in __iter__(self)
     68         for b in self.dl:
     69             y = b[1][0] if is_listy(b[1]) else b[1]
---> 70             if not self.skip_size1 or y.size(0) != 1: yield self.proc_batch(b)
     71 
     72     @classmethod


~/anaconda3/envs/pytorch_v1/lib/python3.6/site-packages/fastai/basic_data.py in proc_batch(self, b)
     60         "Proces batch `b` of `TensorImage`."
     61         b = to_device(b, self.device)
---> 62         for f in listify(self.tfms): b = f(b)
     63         return b
     64 


~/anaconda3/envs/pytorch_v1/lib/python3.6/site-packages/fastai/vision/image.py in __call__(self, x, *args, **kwargs)
    495     def __call__(self, x:Image, *args, **kwargs)->Image:
    496         "Randomly execute our tfm on `x`."
--> 497         return self.tfm(x, *args, **{**self.resolved, **kwargs}) if self.do_run else x
    498 
    499 def _resolve_tfms(tfms:TfmList):


~/anaconda3/envs/pytorch_v1/lib/python3.6/site-packages/fastai/vision/image.py in __call__(self, p, is_random, *args, **kwargs)
    442     def __call__(self, *args:Any, p:float=1., is_random:bool=True, **kwargs:Any)->Image:
    443         "Calc now if `args` passed; else create a transform called prob `p` if `random`."
--> 444         if args: return self.calc(*args, **kwargs)
    445         else: return RandTransform(self, kwargs=kwargs, is_random=is_random, p=p)
    446 


~/anaconda3/envs/pytorch_v1/lib/python3.6/site-packages/fastai/vision/image.py in calc(self, x, *args, **kwargs)
    447     def calc(self, x:Image, *args:Any, **kwargs:Any)->Image:
    448         "Apply to image `x`, wrapping it if necessary."
--> 449         if self._wrap: return getattr(x, self._wrap)(self.func, *args, **kwargs)
    450         else:          return self.func(x, *args, **kwargs)
    451 


AttributeError: 'list' object has no attribute 'pixel'

Yes, since you’re not using fastai datasets, you can’t expect the fastai functions to work properly, as they rely on different behaviors.
You can put your DataBunch in a Learner object with your custom model to use fastai to train it, but all the helper functions around your data will need fastai datasets.

Thanks Sugugger, I will change to use the fastai datasets.

The fastai docs say we can use a torch.utils.data.DataLoader or torch.utils.data.Dataset when constructing the DataBunch, but I can’t find anywhere that shows how…

I am getting an error:

samples = collate_fn([dataset[i] for i in batch_indices])
TypeError: 'DataLoader' object does not support indexing

What am I doing wrong?
Why am I not able to create and train a DataBunch from PyTorch dataloaders?

Hey,
Can you please help me with the following error message?

As indicated by the docs, DataBunch.create takes datasets. It’s the regular init that takes DataLoaders.
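
The indexing error itself can be reproduced with plain PyTorch, which may help show why passing loaders to a function that expects datasets fails. This is only a sketch with a toy TensorDataset; it assumes, as the traceback line above suggests, that the loaders passed in were wrapped in a second DataLoader, which then tried to index them.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.rand(8, 3), torch.zeros(8))
dl = DataLoader(ds, batch_size=4)

# Wrapping a Dataset works: the loader indexes into it to build each batch.
xb, yb = next(iter(dl))

# Wrapping a DataLoader does not: a DataLoader cannot be indexed like a Dataset.
nested = DataLoader(dl, batch_size=4)
try:
    next(iter(nested))
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__, error)
```

So if the loaders are already built, DataBunch(train_dl, valid_dl) is the call to use; DataBunch.create is for the datasets themselves.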

Oh okay.
This solved it.

Thank You.

Hey,
This solved the problem but I am not getting the expected results.
Can you please take a look at my problem?
The link is given below.

It’s hard to say why a model doesn’t want to train. Did you try a higher learning rate?

Yeah I did.
When I run the same model with keras it trains perfectly.

But I want to use fastai now.

You should check the initialization. There is a bug in the default initialization of PyTorch for conv layers; that might be the difference from Keras.
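
One way to rule initialization out is to re-initialize the layers explicitly. The sketch below uses Glorot (Xavier) uniform, which is the Keras default for conv and dense kernels; the tiny model and the helper name init_weights are made up for illustration, and Kaiming initialization would be an equally reasonable choice for ReLU networks.

```python
import torch.nn as nn

def init_weights(m):
    # Re-initialize conv and linear layers: Glorot-uniform weights, zero biases.
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

# Hypothetical small model with 7 output classes, only for illustration.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 7),
)

# apply() calls init_weights on every submodule recursively.
model.apply(init_weights)
```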

Hey,
I tried that and the accuracy is still 14%.
It is weird: no matter what validation loss I have, the accuracy is ~14%.

I have had val_loss=9.5 with accuracy ~14%, and also val_loss=1.93 with accuracy still ~14%.

As you can see, all my predictions are exactly the same.

Hey,
So I searched for how to check the gradients of different layers in the model.
It turns out all my gradients are zero.

Can you tell me a possible reason or solution for this?
Below is the code that I am now using to initialize my weights.

If you don’t get gradients, that’s the whole reason your model doesn’t train. How did you check them? Note that they are zeroed in the training loop after each step, so just looking after a fit of 1 epoch doesn’t mean they were all zeros.

You should manually check with

model.train()
x,y = next(iter(data.train_dl))
z = model(x)
loss = criterion(z,y)
loss.backward()

and see if you can then see gradients, for instance in model.layer1.weight.grad
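
To look at every layer at once instead of one attribute at a time, something like the following sketch could be used. The model, the loss, and the random batch are placeholders (the real check would use the actual model and a batch from data.train_dl); only the loop over named_parameters() is the point.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder model and batch; swap in the real model and a batch from the loader.
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 7))
criterion = nn.CrossEntropyLoss()
x = torch.randn(8, 10)
y = torch.randint(0, 7, (8,))

model.train()
loss = criterion(model(x), y)
loss.backward()

# Gradient norm per parameter, read right after backward() and before any step.
grad_norms = {name: p.grad.norm().item() for name, p in model.named_parameters()}
for name, norm in grad_norms.items():
    print(f"{name}: {norm:.6f}")
```

A norm of exactly zero for an early layer, while later layers have non-zero norms, would point at something blocking the gradient in between.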

Hey,
so when I try this my gradients are not zero.


So I tried to go deeper into the problem. As you can see in the snippet below, my parameters before and after the update step are not the same, i.e. they get updated.

I am not sure if I am on the right track, but everything seems fine.
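
For anyone wanting to repeat this kind of check, a before/after comparison could look roughly like the sketch below. It is not the original snippet: the linear model, SGD optimizer, learning rate, and random batch are all assumptions chosen to keep the example short.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder model, optimizer and batch, only for illustration.
model = nn.Linear(10, 7)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 10), torch.randint(0, 7, (8,))

# Copy the parameters before the update so they can be compared afterwards.
before = [p.detach().clone() for p in model.parameters()]

loss = nn.functional.cross_entropy(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()

# True for each parameter tensor that changed during the step.
changed = [not torch.equal(b, p.detach()) for b, p in zip(before, model.parameters())]
print(changed)
```

If every entry is True but the predictions stay identical, the optimizer is working and the problem is more likely in the data, the labels, or the learning rate.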

Have you tried creating a DataBunch from a PyTorch dataset using DataBunch.create()?