Query related to ImageDataBunch

I have been working on a model to detect damage on cars. I have been given a single image folder containing pictures of all kinds of damage, e.g. crashes, dents, scratches, tears, etc. However, of these categories I only have to model two, ‘scratches’ and ‘dents’, because the other categories have very few instances.

So, should I remove the pictures of the unwanted damage types from the image folder, since they might get picked up by the ImageDataBunch in a batch and reduce accuracy?
Also, the labels for the pictures are provided in a CSV file.

If each row of your CSV file has a file path and its corresponding label, you can just read the CSV into a dataframe and filter that dataframe down to the labels you want.

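import pandas as pd
from fastai.vision import ImageDataBunch  # fastai v1

df = pd.read_csv('path_to_your_csv_file')
# keep only the rows labelled 'scratches' or 'dents'
df = df.loc[df['label_column_name'].isin(['scratches', 'dents'])]

# from_df takes the image folder path first, then the dataframe
data = ImageDataBunch.from_df('path_to_your_image_folder', df)
# ... (other arguments as needed)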
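
If you want to try the filtering step on its own first, here is a minimal sketch on a small made-up table. The file names and the column names 'name' and 'label' are assumptions for illustration, not taken from your CSV. As far as I know, from_df only loads the files listed in the dataframe, so the unwanted pictures can stay in the folder.

```python
import io
import pandas as pd

# Hypothetical CSV contents; the file names and the column names
# 'name' and 'label' are made up for this example.
csv_text = """name,label
img_001.jpg,scratches
img_002.jpg,dents
img_003.jpg,crashes
img_004.jpg,tear
img_005.jpg,scratches
"""

df = pd.read_csv(io.StringIO(csv_text))

# Check how many images each category has before dropping anything.
print(df['label'].value_counts())

# Keep only the two categories to model.
keep = ['scratches', 'dents']
filtered = df.loc[df['label'].isin(keep)].reset_index(drop=True)

# Only these rows (and so only these file names) remain in the dataframe.
print(filtered)
```

Looking at value_counts() first also shows exactly how few instances the other categories have before you drop them.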

Thanks, it worked!