Fix_dl in databunch

I am using the basic_train module to create custom datasets. Three dataloaders need to be passed: train_dl, valid_dl and fix_dl. What is fix_dl? It is not optional, it is not documented, and I can’t find any trace of it in the notebooks.

Could you please explain, @sgugger?

fix_dl is optional now; it was a bug to have it be a non-default arg.
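For anyone still wondering what fix_dl is: judging from the `create` method quoted further down this thread, the third dataloader is built with shuffle=False and drop_last=False, and its dataset has the same length as the training set. So it looks like a fixed-order view of the training data. Here is a minimal pure-Python sketch of that difference; `batches` is a hypothetical stand-in, not a fastai or PyTorch function.

```python
import random

def batches(items, bs, shuffle, drop_last, seed=0):
    """Hypothetical stand-in for a dataloader: yield lists of up to `bs` items."""
    idxs = list(range(len(items)))
    if shuffle:
        random.Random(seed).shuffle(idxs)
    for start in range(0, len(idxs), bs):
        chunk = [items[i] for i in idxs[start:start + bs]]
        if drop_last and len(chunk) < bs:
            continue  # discard the incomplete final batch
        yield chunk

train_items = list(range(10))

# Training-style loader: shuffled order, incomplete last batch dropped.
train_batches = list(batches(train_items, bs=4, shuffle=True, drop_last=True))

# fix-style loader: same items, original order, nothing dropped.
fix_batches = list(batches(train_items, bs=4, shuffle=False, drop_last=False))

print(train_batches)
print(fix_batches)
```

The fix-style loader visits every training item exactly once and in a stable order, which is handy when you want predictions or statistics over the whole training set.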

Thanks for telling me. Also, I think there is an issue with learn.summary() in 1.0.39.

Hello. My question is similar to @prajjwal1’s question about fix_dl in DataBunch, so I am posting it here.

I do not understand the dl_tfms argument of the ImageDataBunch class in the fastai v1 documentation. What is the purpose of dl_tfms, which also appears in the DataBunch class? Thanks.

Note: the vision.data page of the fastai v1 documentation uses ds_tfms, but I have never seen dl_tfms used there.


Hi Sylvain,
I also have a problem with fix_dl when I try to create a DataBunch with a test set. I’m working on Colab using fastai version 1.0.42. This is the code that I use to create the DataBunch …

data = ImageDataBunch.from_folder(path=train_path, valid_pct=0.2, test=test_path,
                                  ds_tfms=tfms, size=100, bs=32)
data.normalize(imagenet_stats)

That’s the output I get …

ImageDataBunch;

Train: LabelList
y: CategoryList (39124 items)
[Category Avocado ripe, Category Avocado ripe, Category Avocado ripe, Category Avocado ripe, Category Avocado ripe]...
Path: data/fruits/fruits-360/Training
x: ImageItemList (39124 items)
[Image (3, 100, 100), Image (3, 100, 100), Image (3, 100, 100), Image (3, 100, 100), Image (3, 100, 100)]...
Path: data/fruits/fruits-360/Training;

Valid: LabelList
y: CategoryList (9781 items)
[Category Papaya, Category Plum 3, Category Walnut, Category Tomato Maroon, Category Cherry Wax Red]...
Path: data/fruits/fruits-360/Training
x: ImageItemList (9781 items)
[Image (3, 100, 100), Image (3, 100, 100), Image (3, 100, 100), Image (3, 100, 100), Image (3, 100, 100)]...
Path: data/fruits/fruits-360/Training;

Test: LabelList
y: EmptyLabelList (0 items) 
[]...
Path: .
x: ImageItemList (0 items)
[]...
Path: data/fruits/fruits-360/Training

As you can see, the Test LabelList is empty. I think the problem is in the create method of DataBunch …

    @classmethod
    def create(cls, train_ds:Dataset, valid_ds:Dataset, test_ds:Optional[Dataset]=None, path:PathOrStr='.', bs:int=64,
               num_workers:int=defaults.cpus, dl_tfms:Optional[Collection[Callable]]=None, device:torch.device=None,
               collate_fn:Callable=data_collate, no_check:bool=False)->'DataBunch':
        "Create a `DataBunch` from `train_ds`, `valid_ds` and maybe `test_ds` with a batch size of `bs`."
        datasets = cls._init_ds(train_ds, valid_ds, test_ds)
        val_bs = bs
        dls = [DataLoader(d, b, shuffle=s, drop_last=s, num_workers=num_workers) for d,b,s in
               zip(datasets, (bs,val_bs,val_bs,val_bs), (True,False,False,False)) if d is not None]
        return cls(*dls, path=path, device=device, dl_tfms=dl_tfms, collate_fn=collate_fn, no_check=no_check)

… specifically where the constructor is called with the dataloaders unpacked …

return cls(*dls, path=path, device=device, dl_tfms=dl_tfms, collate_fn=collate_fn, no_check=no_check)

… because in the DataBunch constructor the third parameter is fix_dl, not test_dl. As evidence, when I execute this command …

len(data.valid_dl.dataset), len(data.fix_dl.dataset), len(data.test_dl.dataset)

I get the following results…

(9781, 39124, 1)

Let me know if I’m right and this is a bug, or if I missed something.

Thank you!

That is a bug if it happens, but I can’t reproduce it. Your code gives me the fix_ds and the test_ds with the right number of elements. Can you double-check your version of fastai and then try your code on master?
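To illustrate why the unpacking can line up correctly, here is a minimal sketch. `FakeBunch` and `make` are hypothetical stand-ins, not fastai code, and the sketch assumes the dataset list is ordered train, valid, fix, test, which is what the four-element tuples zipped in the quoted `create` method suggest.

```python
class FakeBunch:
    """Hypothetical stand-in: parameter order as described in this thread."""
    def __init__(self, train_dl, valid_dl, fix_dl=None, test_dl=None):
        self.train_dl, self.valid_dl = train_dl, valid_dl
        self.fix_dl, self.test_dl = fix_dl, test_dl

def make(train, valid, test=None):
    # Assumed order: train, valid, fix (a fixed-order copy of train), test.
    datasets = [train, valid, train, test]
    loaders = [d for d in datasets if d is not None]  # same filter as in create
    return FakeBunch(*loaders)  # positional unpacking, as in create

with_test = make("train", "valid", "test")
without_test = make("train", "valid")

print(with_test.fix_dl, with_test.test_dl)
print(without_test.fix_dl, without_test.test_dl)
```

Under that assumption the third positional slot is always filled by the fix loader, so a test loader, when present, lands in the fourth slot. That would also be consistent with the reported len(data.fix_dl.dataset) of 39124, which matches the training set size.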

Yes, sure. I will check on master and let you know. In the meantime, you can reproduce the error using my notebook on Colab. Here is the public link:

https://colab.research.google.com/drive/1Irwe5wwvW25ZCsfAYeOXkmp01vibme7-

Andrea


Use print(learn.summary())

Hi,
I think dl_tfms stands for data loader transformations, in case that helps someone.
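As far as I understand it, the difference from ds_tfms is the level at which the functions run: ds_tfms act on individual items, while dl_tfms act on whole batches as they come out of the dataloader. A minimal pure-Python sketch of the batch-level idea follows; `apply_dl_tfms` and `scale` are hypothetical names, not fastai functions.

```python
def apply_dl_tfms(batches, dl_tfms):
    """Hypothetical wrapper: run each batch-level function on every batch."""
    for batch in batches:
        for tfm in dl_tfms:
            batch = tfm(batch)
        yield batch

def scale(batch):
    """Example batch transform: rescale all inputs of a batch, keep the labels."""
    xs, ys = batch
    return [x / 255 for x in xs], ys

# Two toy batches of (inputs, labels).
raw = [([0, 255], ["a", "b"]), ([51], ["c"])]
out = list(apply_dl_tfms(raw, [scale]))
print(out)
```

Each function in the list receives the whole (inputs, labels) batch and returns a new one, which is the kind of place where a batch-wide step such as normalization would fit.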
