Filter_by_func on ImageList

For a project, I want to apply a filter to the image data.
The filter should select only the images that have certain labels in the accompanying .csv file.
How can I achieve this?

Before filtering, I loaded the data like this:

```python
from fastai.vision import *  # fastai v1; PATH and tfms (e.g. get_transforms()) are defined elsewhere

data = (ImageList.from_csv(PATH, folder='train', csv_name='labels.csv', cols='image_id')
        .use_partial_data(sample_pct=.1, seed=31)
        .random_split_by_pct(valid_pct=0.25, seed=29)
        .label_from_df(cols='dx')
        .transform(tfms, size=224)
        .databunch(bs=64)).normalize(imagenet_stats)
```

My current idea is to list all the files I need with pandas, then check for each item in data.items whether it appears in that list. I suspect there is an easier way using either filter_by_func or label_cls. Any tips?
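A minimal sketch of the filter_by_func route, assuming fastai v1, where ImageList.from_csv turns each `image_id` value into a path-like string such as `PATH/train/<image_id>`. The helper `make_id_filter`, its `keep_labels` argument, and the assumption that `image_id` values carry no file extension are hypothetical, not from the thread. The set of wanted ids is built once with pandas, so each per-item check is a set lookup rather than a scan of a list.

```python
from pathlib import Path

import pandas as pd


def make_id_filter(df, keep_labels, id_col="image_id", label_col="dx"):
    """Return a predicate that is True for items whose image id has a wanted label.

    Hypothetical helper: assumes the id column holds bare ids without a file
    extension, so the stem of each item path can be compared directly.
    """
    # Collect the ids of all rows whose label is one we want to keep
    wanted = set(df.loc[df[label_col].isin(keep_labels), id_col].astype(str))

    def keep(item):
        # Items are assumed to be path-like strings such as 'PATH/train/<image_id>'
        return Path(str(item)).stem in wanted

    return keep


if __name__ == "__main__":
    df = pd.DataFrame({"image_id": ["a1", "b2", "c3"], "dx": ["mel", "nv", "bkl"]})
    keep = make_id_filter(df, keep_labels=["mel", "bkl"])
    print([keep(p) for p in ["data/train/a1", "data/train/b2", "data/train/c3.jpg"]])
    # In fastai v1 the predicate would then be used as:
    # ImageList.from_csv(...).filter_by_func(keep)
```

One caveat: filter_by_func filters only the item list, so it is worth checking on your fastai version that label_from_df still lines up with the filtered items. Filtering the DataFrame before building the ImageList, as the replies suggest, avoids that question.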

Did you get a solution?

ImageList.from_csv uses ImageList.from_df behind the scenes.

So my suggestion is to read your data into a DataFrame, filter it with standard pandas code, and then pass the result to ImageList.from_df.
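A minimal sketch of that suggestion, assuming fastai v1 and reusing the column names (`image_id`, `dx`), split settings and batch size from the question. `filter_labels`, `build_databunch` and `keep_labels` are hypothetical names. The fastai import sits inside `build_databunch`, so the pandas filtering step can be run and checked on its own.

```python
from pathlib import Path

import pandas as pd


def filter_labels(df, keep_labels, label_col="dx"):
    """Keep only rows whose label is in keep_labels, with a fresh 0..n-1 index."""
    return df[df[label_col].isin(keep_labels)].reset_index(drop=True)


def build_databunch(path, keep_labels, tfms, csv_name="labels.csv"):
    """Hypothetical helper: the question's pipeline, fed a pre-filtered DataFrame."""
    from fastai.vision import ImageList, imagenet_stats  # fastai v1

    df = filter_labels(pd.read_csv(Path(path) / csv_name), keep_labels)
    return (ImageList.from_df(df, path, folder="train", cols="image_id")
            .use_partial_data(sample_pct=0.1, seed=31)
            # later fastai v1 releases rename this to split_by_rand_pct
            .random_split_by_pct(valid_pct=0.25, seed=29)
            .label_from_df(cols="dx")
            .transform(tfms, size=224)
            .databunch(bs=64)
            .normalize(imagenet_stats))
```

Because the DataFrame is filtered before the ImageList exists, the items and the labels read by label_from_df come from the same rows.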


Hey, if I understand the question correctly, why don't you subset your df with pandas to select the labels you want? You can store the result in a new df or a new CSV and then use that to create a DataBunch. Suppose you have 5 classes and you don't want the classes numbered 2 and 4; you can drop them as follows:

```python
df_sub = df[(df['label'] != 2) & (df['label'] != 4)]
```
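The same exclusion can be written with `isin`, which avoids chaining one `!=` test per class when more classes need dropping. The toy `label` values below are made up, only to check that the forms select the same rows.

```python
import pandas as pd

# Toy data with five classes, 0-4 (hypothetical values for illustration)
df = pd.DataFrame({"image_id": [f"img{i}" for i in range(10)],
                   "label": [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]})

# Original form: one != condition per unwanted class
df_sub = df[(df["label"] != 2) & (df["label"] != 4)]

# Equivalent form: negate isin() over the list of unwanted classes
df_sub_isin = df[~df["label"].isin([2, 4])]

# To list the wanted classes instead, use isin() without the negation
df_keep = df[df["label"].isin([0, 1, 3])]

print(df_sub.equals(df_sub_isin), df_sub.equals(df_keep))
```

The filtered frame can then be saved with `df_sub.to_csv('labels_sub.csv', index=False)` (file name hypothetical) or passed straight to `ImageList.from_df`.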

Hope this helps

Thanks for the idea, Stephano!
