ImageDataLoaders split_from_df() attribute error

g.s · July 1, 2022, 4:39pm

Hello,

I am updating a script from fastai v1 to fastai v2. The following line:
src = ImageList.from_df(df, path).split_from_df().label_from_df()
gives me the following error:
NameError: name 'ImageList' is not defined

I have seen on this link that ImageList was replaced by ImageDataLoaders.

So I imported ImageDataLoaders:
from fastai.vision.data import ImageDataLoaders
and changed the line mentioned above to:
src = ImageDataLoaders.from_df(df, path).split_from_df().label_from_df()

Now I am getting
AttributeError: split_from_df

I searched several pages but could not find an explanation. Could you please tell me how to rewrite the line to get the same behavior as before? Thank you!

KevinB · July 1, 2022, 7:14pm

I think what you are looking for are now arguments to from_df. There are two of them, label_col and valid_col, so assuming you want to replicate the old behavior, I think you would do:

dls = ImageDataLoaders.from_df(df, path, label_col=1, valid_col=2)

label_col is already 1 by default, so you may not need to pass it. Validation splitting, however, defaults to the valid_pct argument, which randomly puts 20% of the rows into the validation set, so valid_col has to be passed explicitly to split on a column. Let me know if this doesn’t work and what error you get after making the changes.

I also believe these accept a column name as well as a column number, if that makes anything easier.
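
To make the difference between the two splitting modes concrete, here is a plain-Python sketch. It is not fastai's actual code, and the helper names split_by_flag and split_randomly are made up for illustration. The first mirrors what a validation column does, the second what a random valid_pct split does.

```python
import random

def split_by_flag(flags):
    """Rows whose flag is truthy go to validation, all others to training."""
    train = [i for i, flag in enumerate(flags) if not flag]
    valid = [i for i, flag in enumerate(flags) if flag]
    return train, valid

def split_randomly(n, valid_pct=0.2, seed=None):
    """Shuffle the row indices and hold out the first valid_pct of them."""
    rng = random.Random(seed)
    idxs = list(range(n))
    rng.shuffle(idxs)
    cut = int(valid_pct * n)
    return sorted(idxs[cut:]), sorted(idxs[:cut])

# Column-based split: the validation rows are exactly the flagged ones.
train_idx, valid_idx = split_by_flag([False, True, False, True, False])

# Random split: 2 of 10 rows are held out, and which ones depends on the seed.
rand_train, rand_valid = split_randomly(10, valid_pct=0.2, seed=0)
```

With the column-based split the validation set is the same on every run, which is what the old split_from_df() call gave you. With the random split it only stays fixed if a seed is set.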

g.s · July 4, 2022, 1:11pm

Thanks, it’s working now… but I stumbled across another error later in the script:

data = src.transform(transforms, size=args.imgsize).databunch(bs=args.bs, num_workers=16).normalize(imagenet_stats)
AttributeError: 'list' object has no attribute 'databunch'

Could you please give me the equivalent of the transform / normalize methods in fastai 2?

Thank you!

g.s · July 11, 2022, 7:34am

In the end, I found a solution using a DataBlock:

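from fastai.vision.all import *

# df, path and bs are defined earlier in the script.
# path is assumed to be a pathlib.Path (a plain str prefix would need a trailing separator).

# item_tfms run per image, batch_tfms run on whole batches
item_tfms = Resize(224)
batch_tfms = [*aug_transforms(size=224, do_flip=True, max_rotate=20, max_zoom=1.1),
              Normalize.from_stats(*imagenet_stats)]

blk = DataBlock(blocks=(ImageBlock, CategoryBlock),
                get_x=ColReader('image_name', pref=path),  # file name column, prefixed with path
                get_y=ColReader('class'),                  # label column
                splitter=ColSplitter(col='is_valid'),      # True rows go to the validation set
                item_tfms=item_tfms,
                batch_tfms=batch_tfms)

dls = blk.dataloaders(df, bs=bs, num_workers=16)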
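
For anyone who would rather stay with ImageDataLoaders, the same pipeline can probably be written as a single from_df call, since it accepts item_tfms and batch_tfms and forwards extra keyword arguments such as bs and num_workers to the DataLoaders. This is an untested sketch, not a verified drop-in replacement. The helper name dls_from_df and its default values are made up for illustration, and the column names are the ones used in the DataBlock above.

```python
def dls_from_df(df, path, img_size=224, bs=64, num_workers=16):
    """Build the DataLoaders in one call (hypothetical helper, defaults are assumptions)."""
    # Imported lazily so that defining this helper does not itself require fastai.
    from fastai.vision.all import (ImageDataLoaders, Normalize, Resize,
                                   aug_transforms, imagenet_stats)

    return ImageDataLoaders.from_df(
        df, path,
        fn_col='image_name',    # column holding the file names, relative to path
        label_col='class',      # column holding the labels
        valid_col='is_valid',   # boolean column marking the validation rows
        item_tfms=Resize(img_size),
        batch_tfms=[*aug_transforms(size=img_size, max_rotate=20, max_zoom=1.1),
                    Normalize.from_stats(*imagenet_stats)],
        bs=bs, num_workers=num_workers)
```

Calling dls_from_df(df, path, bs=bs) should then play the role of the v1 transform(...).databunch(...).normalize(...) chain, with Resize and aug_transforms replacing transform and Normalize.from_stats replacing normalize.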