Create Databunch from pytorch dataloader

nep7une · December 6, 2018, 3:47pm

I tried to create a DataBunch from a PyTorch DataLoader but failed.
I need to build a network for age and gender estimation, so my dataset has to return the image, age, and gender.
fastai version: 1.0.33. Below are my code and a screenshot of the error:

sgugger · December 6, 2018, 9:53pm

This is unrelated to fastai: you can't put a PIL Image directly in a PyTorch DataLoader.

nep7une · December 7, 2018, 12:39am

This is the custom dataset template from PyTorch.
If no transform is applied, it returns the PIL image, but normally transforms.ToTensor() is applied.

from PIL import Image
from torch.utils.data.dataset import Dataset
from torchvision import transforms

class MyCustomDataset(Dataset):
    def __init__(self, image_paths, labels, transforms=None):
        # image_paths and labels are placeholder names for whatever the dataset stores
        self.image_paths = image_paths
        self.labels = labels
        self.transforms = transforms

    def __getitem__(self, index):
        # Read one image from disk; without a transform it stays a PIL image
        img = Image.open(self.image_paths[index]).convert('RGB')
        label = self.labels[index]
        # If transforms were given, they are applied in the order
        # in which they were composed.
        if self.transforms is not None:
            img = self.transforms(img)
        return (img, label)

    def __len__(self):
        return len(self.image_paths)  # number of samples (images) you have

sgugger · December 7, 2018, 2:33am

Yes, and since you don't apply it here, you are passing an Image directly to a PyTorch DataLoader, which again isn't possible.

nep7une · December 7, 2018, 3:17am

I changed my code to use a transform when creating the dataloader:

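from torch.utils.data import DataLoader
from torchvision import transforms

# Random crop resized to 224x224, then convert the PIL image to a tensor
transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ToTensor(),
])
dataset = ImdbWikiDataset(transform=transform)
dataloader = DataLoader(dataset, batch_size=8, shuffle=True, num_workers=1)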

The dataloader works fine.

Then I create the DataBunch from the PyTorch dataloader:

tfms_train, tfms_val = get_transforms()
test_db = DataBunch(dataloader, dataloader, tfms=tfms_train)
test_db.one_batch()

The error message says: AttributeError: 'list' object has no attribute 'pixel'

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-15cf372150e2> in <module>()
----> 1 test_db.one_batch()


~/anaconda3/envs/pytorch_v1/lib/python3.6/site-packages/fastai/basic_data.py in one_batch(self, ds_type, detach, denorm)
    132         w = self.num_workers
    133         self.num_workers = 0
--> 134         try:     x,y = next(iter(dl))
    135         finally: self.num_workers = w
    136         if detach: x,y = to_detach(x),to_detach(y)


~/anaconda3/envs/pytorch_v1/lib/python3.6/site-packages/fastai/basic_data.py in __iter__(self)
     68         for b in self.dl:
     69             y = b[1][0] if is_listy(b[1]) else b[1]
---> 70             if not self.skip_size1 or y.size(0) != 1: yield self.proc_batch(b)
     71 
     72     @classmethod


~/anaconda3/envs/pytorch_v1/lib/python3.6/site-packages/fastai/basic_data.py in proc_batch(self, b)
     60         "Proces batch `b` of `TensorImage`."
     61         b = to_device(b, self.device)
---> 62         for f in listify(self.tfms): b = f(b)
     63         return b
     64 


~/anaconda3/envs/pytorch_v1/lib/python3.6/site-packages/fastai/vision/image.py in __call__(self, x, *args, **kwargs)
    495     def __call__(self, x:Image, *args, **kwargs)->Image:
    496         "Randomly execute our tfm on `x`."
--> 497         return self.tfm(x, *args, **{**self.resolved, **kwargs}) if self.do_run else x
    498 
    499 def _resolve_tfms(tfms:TfmList):


~/anaconda3/envs/pytorch_v1/lib/python3.6/site-packages/fastai/vision/image.py in __call__(self, p, is_random, *args, **kwargs)
    442     def __call__(self, *args:Any, p:float=1., is_random:bool=True, **kwargs:Any)->Image:
    443         "Calc now if `args` passed; else create a transform called prob `p` if `random`."
--> 444         if args: return self.calc(*args, **kwargs)
    445         else: return RandTransform(self, kwargs=kwargs, is_random=is_random, p=p)
    446 


~/anaconda3/envs/pytorch_v1/lib/python3.6/site-packages/fastai/vision/image.py in calc(self, x, *args, **kwargs)
    447     def calc(self, x:Image, *args:Any, **kwargs:Any)->Image:
    448         "Apply to image `x`, wrapping it if necessary."
--> 449         if self._wrap: return getattr(x, self._wrap)(self.func, *args, **kwargs)
    450         else:          return self.func(x, *args, **kwargs)
    451 


AttributeError: 'list' object has no attribute 'pixel'

sgugger · December 7, 2018, 3:22am

Yes, since you're not using fastai datasets, you can't expect the fastai functions to work properly, as they rely on different behaviors.
You can put your DataBunch in a Learner object with your custom model and train it with fastai, but all the helper functions around your data need fastai datasets.
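The traceback above is consistent with this: `proc_batch` calls each entry of `tfms` on the whole batch `b`, while the transforms returned by `get_transforms()` expect a single image object and look up its `pixel` method. A minimal plain-PyTorch sketch of what a function applied at that point would have to look like follows; `normalize_batch` and the statistics are hypothetical and only illustrate the batch-in, batch-out shape.

```python
import torch

# Hypothetical per-channel statistics; real values depend on the dataset.
MEAN = torch.tensor([0.5, 0.5, 0.5]).view(1, 3, 1, 1)
STD = torch.tensor([0.25, 0.25, 0.25]).view(1, 3, 1, 1)


def normalize_batch(b):
    """Batch-level transform: takes the whole (x, y) batch and returns a new one."""
    x, y = b
    return (x - MEAN) / STD, y


# A fake batch shaped like the output of a DataLoader: images plus labels.
batch = [torch.full((8, 3, 224, 224), 0.5), torch.zeros(8, dtype=torch.long)]

# An image-level transform fails here, because a list is not an image object.
try:
    getattr(batch, "pixel")
except AttributeError as err:
    error_message = str(err)

x, y = normalize_batch(batch)
print(error_message)
print(x.shape, x.abs().max().item())
```

Image-level augmentation would then have to happen inside the dataset, as the torchvision transforms in the post above already do.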

nep7une · December 7, 2018, 3:37am

Thanks sgugger, I will switch to the fastai datasets.

The fastai docs say we can use torch.utils.data.DataLoader or torch.utils.data.Dataset when constructing the DataBunch, but I can't find anywhere that shows how…

shivamchandhok · March 25, 2019, 12:12pm

I am getting an error:

samples = collate_fn([dataset[i] for i in batch_indices])
TypeError: 'DataLoader' object does not support indexing

What am I doing wrong?
Why am I not able to create and train a DataBunch from PyTorch dataloaders?

shivamchandhok · March 25, 2019, 12:13pm

Hey,
can you please help me with the following error message?

sgugger · March 25, 2019, 1:08pm

As indicated by the docs, DataBunch.create takes datasets. It's the regular init that takes DataLoaders.
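The same error can be reproduced in plain PyTorch, which may help show where it comes from: anything that builds a loader for you will index what it is given, and a DataLoader cannot be indexed. The sketch below uses hypothetical toy data and only torch; it does not call fastai.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 6 samples with 3 features each.
dataset = TensorDataset(torch.randn(6, 3), torch.arange(6))
loader = DataLoader(dataset, batch_size=2)

# Mistake: handing a DataLoader to code that expects a Dataset.
# The outer loader tries to index the inner one and fails on iteration.
nested = DataLoader(loader, batch_size=2)
try:
    next(iter(nested))
    indexing_failed = False
except TypeError:
    indexing_failed = True

# Fix: give the Dataset to whatever builds loaders for you, and give
# ready-made DataLoaders only to code that expects loaders.
x, y = next(iter(DataLoader(dataset, batch_size=2)))
print(indexing_failed, x.shape, y)
```

Applied to the answer above, that means datasets go to DataBunch.create and existing loaders go to the DataBunch constructor.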

shivamchandhok · March 25, 2019, 7:42pm

Oh okay.
This solved it.

Thank You.

shivamchandhok · March 28, 2019, 11:52am

Hey,
This solved the problem, but I am not getting the expected results.
Can you please take a look at my problem?
The link is given below.

sgugger · March 28, 2019, 2:30pm

It’s hard to say why a model doesn’t want to train. Did you try a higher learning rate?

shivamchandhok · March 28, 2019, 2:37pm

Yeah, I did.
When I run the same model with Keras it trains perfectly.

But I want to use fastai now.

sgugger · March 28, 2019, 2:50pm

You should check the initialization. There is a bug in the default initialization of PyTorch for conv layers, that might be the difference with Keras.
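One way to rule out initialization as the difference is to set it explicitly. The sketch below is an assumption about what such a check could look like, not the fix from the thread: it uses Glorot (Xavier) uniform weights and zero biases, which are the Keras defaults for Conv2D and Dense layers. The model and the name `init_weights` are hypothetical.

```python
import torch
from torch import nn


def init_weights(module):
    """Re-initialise conv and linear layers: Glorot-uniform weights, zero biases."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)


# Hypothetical small network, only there to show the call.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 7),
)
model.apply(init_weights)  # apply() visits every submodule recursively
print(model[0].bias.abs().sum().item(), model[0].weight.std().item())
```

For ReLU networks `nn.init.kaiming_normal_` is another common choice; which one helps here is not something the thread settles.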

shivamchandhok · March 28, 2019, 3:26pm

Hey,
I tried that and the accuracy is still 14%.
It is weird: no matter what validation loss I get, the accuracy is ~14%.

I have had val_loss=9.5 with accuracy ~14%, and also
val_loss=1.93 with accuracy still ~14%.

As you can see,
all my predictions are exactly the same.
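A constant accuracy with a changing loss is what you would expect when every sample gets the same predicted class: accuracy then equals the share of that class in the validation set, whatever the loss is. A value near 14% would fit roughly seven balanced classes, although the thread does not say how many classes there are. A minimal sketch for counting predictions per class, with hypothetical logits and targets:

```python
import torch

# Hypothetical logits for 6 samples and 7 classes where class 3 always wins.
logits = torch.zeros(6, 7)
logits[:, 3] = 5.0
targets = torch.tensor([0, 1, 2, 3, 4, 5])

preds = logits.argmax(dim=1)
# Number of predictions per class; one non-zero entry means a collapsed model.
counts = torch.bincount(preds, minlength=logits.shape[1])
accuracy = (preds == targets).float().mean().item()

print(counts.tolist())
print(round(accuracy, 3))
```

With real data, `logits` would come from running the model over the validation loader.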

shivamchandhok · March 28, 2019, 6:11pm

Hey,
I searched for how to check the gradients of the different layers in the model.
It turns out all my gradients are zero.

Can you tell me a possible reason or solution for this?
Below is the code that I am now using to initialize my weights.

sgugger · March 28, 2019, 7:13pm

If you don’t get gradients, that’s the whole reason your model doesn’t train. How did you check them? Note that they are zeroed in the training loop after each step, so just looking after a fit of 1 epoch doesn’t mean they were all zeros.

You should manually check with

model.train()
x,y = next(iter(data.train_dl))
z = model(x)
loss = criterion(z,y)
loss.backward()

and see if you can then see gradients, for instance in model.layer1.weight.grad
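A self-contained version of that check, which loops over every parameter instead of inspecting one layer by hand, could look like the sketch below. The model, batch and criterion are hypothetical stand-ins for the ones in the thread.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins for the thread's model, batch and criterion.
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 7))
x, y = torch.randn(8, 10), torch.randint(0, 7, (8,))
criterion = nn.CrossEntropyLoss()

model.train()
model.zero_grad()
loss = criterion(model(x), y)
loss.backward()

# Gradient norm of every parameter; None means no gradient reached it.
grad_norms = {
    name: (None if p.grad is None else p.grad.norm().item())
    for name, p in model.named_parameters()
}
for name, norm in grad_norms.items():
    print(name, norm)
```

A `None` entry points to a parameter that is disconnected from the loss, while an exact zero points to something upstream killing the signal.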

shivamchandhok · March 30, 2019, 7:45am

Hey,
when I try this, my gradients are not zero.

So I tried to go deeper into the problem, as you can see in the snippet below.
My parameters before and after the update step are not the same, i.e. they do get updated.

I am not sure if I am on the right track, but everything seems fine.
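For reference, a before-and-after comparison of this kind can be sketched as follows. The model, data and optimizer are hypothetical; the important detail is cloning the parameters before the step, since a plain reference would change along with them.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical model, data and optimizer; swap in the real ones.
model = nn.Linear(10, 7)
x, y = torch.randn(8, 10), torch.randint(0, 7, (8,))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Snapshot the parameters before the step (clone, otherwise the copy changes too).
before = {name: p.detach().clone() for name, p in model.named_parameters()}

optimizer.zero_grad()
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()

# Largest absolute change per parameter tensor after one step.
max_change = {
    name: (p.detach() - before[name]).abs().max().item()
    for name, p in model.named_parameters()
}
print(max_change)
```

If the changes are non-zero but tiny relative to the weights, the learning rate or the scale of the inputs is the next thing to look at.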

Iron4dam · June 24, 2019, 3:07pm

Have you tried creating a DataBunch from a PyTorch dataset using DataBunch.create()?