Kaggle Competitions Data Grabbing

Hi, I've noticed that most Kaggle competition datasets come with a CSV file that contains the image file names and the labels in separate columns. How do you usually read these files into an ImageDataBunch so that the images and labels end up in a data bunch?

This is the competition that I'm referring to. It's a multi-classification problem.

Can't you use ImageDataBunch.from_csv?


Hi, I tried using ImageItemList.from_csv(). The problem is that each sample has 4 different files, with suffixes such as ‘_blue’, ‘_yellow’, ‘_green’, and ‘_red’, for each ID in train.csv. However, the id column in train.csv does not include these suffixes. Do you know how to add them when calling ImageItemList.from_csv() so that it can find the images in the train folder correctly?

Each sample might have 2-4 images, each with a suffix of ‘_blue’, ‘_yellow’, ‘_green’, or ‘_red’.
Below are the IDs as they appear in the id column:

However, if we look at one of the IDs (the first one) in the train folder (the folder with all of the images), the files for that sample have these suffixes at the end of their names, followed by .png. Do you know how to add these suffixes dynamically so that ImageItemList can find the images?
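
To make the naming problem concrete, here is a minimal sketch in plain Python (no fastai) of how the IDs in train.csv could be expanded into per-channel file paths. The column names `Id` and `Target`, the sample IDs, and the helper names `read_ids` and `channel_paths` are assumptions made up for illustration. Only the four suffixes, the train folder, and the .png extension come from the posts above.

```python
import csv
import io
from pathlib import Path

# Channel suffixes mentioned in this thread.
CHANNELS = ("red", "green", "blue", "yellow")


def read_ids(csv_file, id_col="Id"):
    """Return the values of the id column from an open CSV file object.

    The column name "Id" is an assumption; adjust it to match train.csv.
    """
    return [row[id_col] for row in csv.DictReader(csv_file)]


def channel_paths(image_id, folder="train", channels=CHANNELS, ext=".png"):
    """Map each channel name to the expected path <folder>/<id>_<channel><ext>."""
    return {c: Path(folder) / f"{image_id}_{c}{ext}" for c in channels}


# Tiny in-memory stand-in for train.csv (hypothetical IDs and labels).
sample_csv = io.StringIO("Id,Target\nabc123,16 0\ndef456,7 1 2\n")

ids = read_ids(sample_csv)
paths = channel_paths(ids[0])

# Print the four candidate file paths for the first sample.
for channel, path in paths.items():
    print(channel, path.as_posix())
```

Since a sample may have only 2-4 of these files, the candidate paths could be filtered with `Path.exists()` before loading. If a single channel is enough to get started, I believe the fastai v1 `from_csv` methods accept a `suffix` argument (for example `suffix='_green.png'`), but please check that against the version you have installed.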

You can start with
https://forums.fast.ai/t/human-protein-atlas-competition-starter-code/


Thank you!