ImageDataLoaders split_from_df() attribute error


I am updating a script from fastai v1 to fastai v2. The following line:
src = ImageList.from_df(df, path).split_from_df().label_from_df()
gives me the following error:
NameError: name 'ImageList' is not defined

I have seen on this link that ImageList was replaced by ImageDataLoaders.

So I imported ImageDataLoaders
from fastai.vision.all import ImageDataLoaders
and I modified the line mentioned above as
src = ImageDataLoaders.from_df(df, path).split_from_df().label_from_df()

Now I am getting
AttributeError: split_from_df

I searched several pages but could not find an explanation. Could you please tell me how to rewrite the line to get the same behavior as before? Thank you!

I think what you are looking for are now arguments to from_df. There are two relevant ones, label_col and valid_col, so to replicate the old behavior I think you would do:

dls = ImageDataLoaders.from_df(df, path, label_col=1, valid_col=2)

label_col already defaults to 1, so you may not need to pass it. Without valid_col, however, the split is controlled by the valid_pct argument, which by default randomly assigns 20% of the rows to the validation set. Let me know if this doesn’t work and what error you get after making the changes.

I also believe these arguments accept a column name instead of a column index, if that makes things easier.
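To make the difference between the two splitting modes concrete, here is a minimal plain-Python sketch. It is not fastai's implementation, only an illustration of the behavior described above: a column-based split driven by a boolean flag (what valid_col selects) versus a random hold-out (what valid_pct controls). The helper names and the sample rows are made up for the example.

```python
import random


def split_by_column(rows, valid_col="is_valid"):
    """Return (train_idx, valid_idx) using a boolean flag stored in each row."""
    train_idx = [i for i, row in enumerate(rows) if not row[valid_col]]
    valid_idx = [i for i, row in enumerate(rows) if row[valid_col]]
    return train_idx, valid_idx


def split_randomly(n_rows, valid_pct=0.2, seed=None):
    """Return (train_idx, valid_idx) with a random valid_pct share held out."""
    rng = random.Random(seed)
    idx = list(range(n_rows))
    rng.shuffle(idx)
    cut = int(valid_pct * n_rows)  # number of rows sent to validation
    return sorted(idx[cut:]), sorted(idx[:cut])


# Hypothetical rows standing in for the DataFrame from the question.
rows = [
    {"image_name": "a.jpg", "label": "cat", "is_valid": False},
    {"image_name": "b.jpg", "label": "dog", "is_valid": True},
    {"image_name": "c.jpg", "label": "cat", "is_valid": False},
    {"image_name": "d.jpg", "label": "dog", "is_valid": False},
    {"image_name": "e.jpg", "label": "cat", "is_valid": True},
]

# Column-based split: rows flagged True always land in validation.
print(split_by_column(rows))

# Random split: which rows are held out depends on the seed.
print(split_randomly(len(rows), valid_pct=0.2, seed=0))
```

The practical point is that the column-based split is reproducible and under your control, which matches what split_from_df() did in fastai v1, while the random split changes with the seed.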

Thanks, it’s working now… but I stumbled across another error later in the script:

data = src.transform(transforms, size=args.imgsize).databunch(, num_workers=16).normalize(imagenet_stats)
AttributeError: 'list' object has no attribute 'databunch'

Could you please give me the equivalent of the transform / normalize methods in fastai 2?

Thank you!

In the end, I found a solution using a DataBlock:

batch_tfms = [*aug_transforms(size=224, do_flip=True, max_rotate=20, max_zoom=1.1), Normalize.from_stats(*imagenet_stats)]

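blk = DataBlock(blocks=(ImageBlock, CategoryBlock),
                    get_x = ColReader("image_name", pref=path),
                    get_y = ColReader("label"),  # assumed label column name; not shown in the original post
                    splitter = ColSplitter(col='is_valid'),
                    batch_tfms = batch_tfms)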

dls = blk.dataloaders(df, bs=bs, num_workers=16)
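For comparison, the transform and normalize steps from fastai v1 can probably also be expressed through the from_df factory discussed earlier in the thread, using its item_tfms and batch_tfms arguments. The following is a sketch under assumptions, not a verified drop-in replacement: the column names "label" and "is_valid", the batch size of 64, and the Resize(224) item transform are all guesses made for illustration, and build_dls is a hypothetical helper name.

```python
def build_dls(df, path, bs=64):
    """Build DataLoaders from a DataFrame with resizing, augmentation and normalization."""
    # Imported inside the function so the sketch can be loaded without fastai installed.
    from fastai.vision.all import (
        ImageDataLoaders, Normalize, Resize, aug_transforms, imagenet_stats,
    )

    return ImageDataLoaders.from_df(
        df,
        path,
        fn_col="image_name",    # column holding the file names
        label_col="label",      # assumed label column name
        valid_col="is_valid",   # boolean column marking validation rows
        item_tfms=Resize(224),  # assumed: bring every image to one size before batching
        batch_tfms=[
            *aug_transforms(do_flip=True, max_rotate=20, max_zoom=1.1),
            Normalize.from_stats(*imagenet_stats),
        ],
        bs=bs,
        num_workers=16,
    )
```

Calling dls = build_dls(df, path) should then give an object comparable to the one produced by blk.dataloaders(...). The DataBlock version is more flexible, while the factory method keeps everything in one call.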