I tried to create a DataBunch from a PyTorch DataLoader but failed.
I need to build a network for age and gender estimation, so my dataset has to return the image, age, and gender.
fastai version: 1.0.33. My code and a screenshot of the error are below:
This is unrelated to fastai: you can’t put a PIL Image directly in a PyTorch DataLoader.
This is the custom dataset template from PyTorch:
If no transform is applied, it returns the PIL image, but normally transforms.ToTensor() is applied.
from torch.utils.data.dataset import Dataset
from torchvision import transforms

class MyCustomDataset(Dataset):
    def __init__(self, ..., transforms=None):
        # stuff
        ...
        self.transforms = transforms

    def __getitem__(self, index):
        # stuff
        ...
        data = ...  # some data read from a file or image
        # If a transform was given, apply its operations
        # in the order in which they were composed.
        if self.transforms is not None:
            data = self.transforms(data)
        return (data, label)

    def __len__(self):
        return count  # number of items (images) in the dataset
Yes, and since you don’t have it here, you pass an Image directly to a PyTorch DataLoader, which again isn’t possible.
I changed my code:
- use a transform when creating the DataLoader
transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ToTensor(),
])
dataset = ImdbWikiDataset(transform=transform)
dataloader = DataLoader(dataset, batch_size=8, shuffle=True, num_workers=1)
The DataLoader works fine:
- create the DataBunch from the PyTorch DataLoader:
tfms_train, tfms_val = get_transforms()
test_db = DataBunch(dataloader, dataloader, tfms=tfms_train)
test_db.one_batch()
The error message says: AttributeError: 'list' object has no attribute 'pixel'
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-7-15cf372150e2> in <module>()
----> 1 test_db.one_batch()
~/anaconda3/envs/pytorch_v1/lib/python3.6/site-packages/fastai/basic_data.py in one_batch(self, ds_type, detach, denorm)
132 w = self.num_workers
133 self.num_workers = 0
--> 134 try: x,y = next(iter(dl))
135 finally: self.num_workers = w
136 if detach: x,y = to_detach(x),to_detach(y)
~/anaconda3/envs/pytorch_v1/lib/python3.6/site-packages/fastai/basic_data.py in __iter__(self)
68 for b in self.dl:
69 y = b[1][0] if is_listy(b[1]) else b[1]
---> 70 if not self.skip_size1 or y.size(0) != 1: yield self.proc_batch(b)
71
72 @classmethod
~/anaconda3/envs/pytorch_v1/lib/python3.6/site-packages/fastai/basic_data.py in proc_batch(self, b)
60 "Proces batch `b` of `TensorImage`."
61 b = to_device(b, self.device)
---> 62 for f in listify(self.tfms): b = f(b)
63 return b
64
~/anaconda3/envs/pytorch_v1/lib/python3.6/site-packages/fastai/vision/image.py in __call__(self, x, *args, **kwargs)
495 def __call__(self, x:Image, *args, **kwargs)->Image:
496 "Randomly execute our tfm on `x`."
--> 497 return self.tfm(x, *args, **{**self.resolved, **kwargs}) if self.do_run else x
498
499 def _resolve_tfms(tfms:TfmList):
~/anaconda3/envs/pytorch_v1/lib/python3.6/site-packages/fastai/vision/image.py in __call__(self, p, is_random, *args, **kwargs)
442 def __call__(self, *args:Any, p:float=1., is_random:bool=True, **kwargs:Any)->Image:
443 "Calc now if `args` passed; else create a transform called prob `p` if `random`."
--> 444 if args: return self.calc(*args, **kwargs)
445 else: return RandTransform(self, kwargs=kwargs, is_random=is_random, p=p)
446
~/anaconda3/envs/pytorch_v1/lib/python3.6/site-packages/fastai/vision/image.py in calc(self, x, *args, **kwargs)
447 def calc(self, x:Image, *args:Any, **kwargs:Any)->Image:
448 "Apply to image `x`, wrapping it if necessary."
--> 449 if self._wrap: return getattr(x, self._wrap)(self.func, *args, **kwargs)
450 else: return self.func(x, *args, **kwargs)
451
AttributeError: 'list' object has no attribute 'pixel'
Yes, since you’re not using fastai datasets, you can’t expect the fastai functions to work properly, as they rely on different behaviors.
You can put your DataBunch in a Learner object with your custom model to use fastai to train it, but all the helper functions around your data will need fastai datasets.
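The traceback above shows `proc_batch` calling every entry of `tfms` on the whole batch `b`, while the functions returned by `get_transforms()` are written for a single fastai image. That is how a list ends up being asked for `.pixel`. As a rough sketch in plain PyTorch, a batch-level transform takes the `(x, y)` pair and returns a new pair. The helper name `make_batch_normalizer` and the mean/std values are made up for illustration.

```python
import torch


def make_batch_normalizer(mean, std):
    """Return a function that normalizes the inputs of an (x, y) batch."""
    # Reshape to (1, C, 1, 1) so the stats broadcast over a (N, C, H, W) batch.
    mean_t = torch.tensor(mean).view(1, -1, 1, 1)
    std_t = torch.tensor(std).view(1, -1, 1, 1)

    def _normalize(b):
        x, y = b                      # the whole batch, not a single image
        return (x - mean_t) / std_t, y

    return _normalize


# A fake batch shaped like the DataLoader output in this thread.
batch = [torch.ones(8, 3, 224, 224), torch.zeros(8)]
normalize = make_batch_normalizer([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
x, y = normalize(batch)
print(x.shape, y.shape)
```

Per-image augmentation, by contrast, belongs in the dataset's `__getitem__` (for example through the torchvision `transform` already used here).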
Thanks Sugugger, I will switch to the fastai datasets.
The fastai docs say we can use a torch.utils.data.DataLoader or torch.utils.data.Dataset when constructing the DataBunch, but nowhere do they show how…
I am getting an error:
samples = collate_fn([dataset[i] for i in batch_indices])
TypeError: 'DataLoader' object does not support indexing
What am I doing wrong?
Why am I not able to create and train a DataBunch from PyTorch DataLoaders?
Hey,
Can you please help me with the following error message?
As indicated by the docs, DataBunch.create takes datasets. It’s the regular init that takes DataLoaders.
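The TypeError quoted earlier is what PyTorch raises when a DataLoader is used where a dataset is expected, because a DataLoader cannot be indexed. Here is a small sketch in plain PyTorch, with a synthetic `TensorDataset` standing in for the real data:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data: 32 tiny "images" with binary labels.
train_ds = TensorDataset(torch.randn(32, 3, 8, 8), torch.randint(0, 2, (32,)))

# A loader built from a dataset works: the loader indexes the dataset.
train_dl = DataLoader(train_ds, batch_size=8)
xb, yb = next(iter(train_dl))

# A loader built from another loader fails: the outer loader tries
# train_dl[i], and a DataLoader does not support indexing.
try:
    next(iter(DataLoader(train_dl, batch_size=8)))
    wrapped_ok = True
except TypeError as err:
    wrapped_ok = False
    print("TypeError:", err)
```

With fastai, that means handing the Dataset objects to DataBunch.create and keeping ready-made DataLoaders for the plain constructor.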
Oh okay.
This solved it.
Thank You.
Hey,
This solved the problem but I am not getting the expected results.
Can you please take a look at my problem?
The link is given below.
It’s hard to say why a model doesn’t want to train. Did you try a higher learning rate?
Yeah, I did.
When I run the same model with Keras, it trains perfectly.
But I want to use fastai now.
You should check the initialization. There is a bug in the default initialization of PyTorch for conv layers, that might be the difference with Keras.
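One way to rule out initialization is to re-initialize the conv layers explicitly. The sketch below assumes `nn.Conv2d` layers followed by ReLU. The toy model and the helper name `init_conv_kaiming` are hypothetical, not the poster's code.

```python
import torch.nn as nn


def init_conv_kaiming(module):
    """Apply Kaiming-normal init to conv weights and zero the biases."""
    if isinstance(module, nn.Conv2d):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)


# Toy model standing in for the real network.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3),
)

# Module.apply visits every submodule, so nested blocks are covered too.
model.apply(init_conv_kaiming)
```

If training behaves the same after this, initialization is probably not the cause and the data pipeline or loss is the next thing to inspect.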
Hey,
I tried that and the accuracy is still 14%.
It is weird: no matter what validation loss I have, the accuracy is ~14%.
I have had val_loss=9.5 with accuracy ~14%, and also val_loss=1.93 with accuracy still ~14%.
As you can see, all my predictions are exactly the same.
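A constant accuracy is what identical predictions produce: if every sample gets the same class, accuracy equals that class's share of the validation set, whatever the loss is. Here is a small worked example with made-up labels (the class counts are hypothetical and not taken from this dataset):

```python
def constant_prediction_accuracy(labels, predicted_class):
    """Accuracy obtained when every sample is assigned `predicted_class`."""
    hits = sum(1 for y in labels if y == predicted_class)
    return hits / len(labels)


# Hypothetical validation labels: 10 of class 0, 30 of class 1, 60 of class 2.
labels = [0] * 10 + [1] * 30 + [2] * 60

# Predicting class 2 everywhere scores exactly that class's share.
acc = constant_prediction_accuracy(labels, predicted_class=2)
print(acc)
```

The loss can still move between runs because it depends on the predicted probabilities, while accuracy only depends on which class wins.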
Hey,
So I searched for how to check the gradients of different layers in the model.
It turns out all my gradients are zero.
Can you tell me a possible reason or solution for this?
Below is the code that I am using to initialize my weights now.
If you don’t get gradients, that’s the whole reason your model doesn’t train. How did you check them? Note that they are zeroed in the training loop after each step, so just looking after a fit of 1 epoch doesn’t mean they were all zeros.
You should manually check with
model.train()
x,y = next(iter(data.train_dl))
z = model(x)
loss = criterion(z,y)
loss.backward()
and see if you can then see gradients, for instance in model.layer1.weight.grad
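The same manual check can be extended to every parameter at once. This is only a sketch: the toy model, the random batch, and the 7-class output are assumptions standing in for `model`, `data.train_dl`, and `criterion` from the steps above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins for the real model, loss, and one training batch.
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 7))
criterion = nn.CrossEntropyLoss()
x = torch.randn(8, 10)
y = torch.randint(0, 7, (8,))

model.train()
model.zero_grad()                 # start from clean gradients
loss = criterion(model(x), y)
loss.backward()                   # populate .grad, with no optimizer step

# Collect the gradient norm of each parameter right after backward().
grad_norms = {
    name: p.grad.norm().item()
    for name, p in model.named_parameters()
    if p.grad is not None
}
for name, norm in grad_norms.items():
    print(f"{name}: {norm:.6f}")
```

Reading the norms directly after `backward()` avoids the trap mentioned above, where the training loop has already zeroed the gradients by the time you look.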
Hey,
When I try this, my gradients are not zero.
So I tried to go deeper into the problem, as you can see in the snippet below.
My parameters before and after the update step are not the same, i.e. they get updated.
I am not sure if I am on the right track, but everything seems fine.
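For reference, a before/after comparison of the parameters around one optimizer step can be sketched like this. The linear model, squared-output loss, and SGD settings are placeholders rather than the setup used in this thread.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder model and optimizer.
model = nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Snapshot the parameters; clone() is needed because step() updates in place.
before = [p.detach().clone() for p in model.parameters()]

loss = model(torch.randn(8, 4)).pow(2).mean()
opt.zero_grad()
loss.backward()
opt.step()

# True for each parameter tensor whose values changed during the step.
changed = [
    not torch.equal(old, new.detach())
    for old, new in zip(before, model.parameters())
]
print(changed)
```

Without the `clone()`, the snapshot would share storage with the live parameters and the comparison would always report no change.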
Have you tried creating a DataBunch from a PyTorch dataset using DataBunch.create()?