ImageItemList label_from_df trying to find non-existent images

mcsquared · February 16, 2019, 12:07am

Hi all

For my Lesson 3 assignment, I am trying to make multi-label predictions using the Yelp Kaggle data (https://www.kaggle.com/c/yelp-restaurant-photo-classification).

I ran into the following issue and would really appreciate some hints on this.

I am trying to create and label my data from a CSV using the following code, which is very similar to the in-class satellite image exercise.

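from fastai.vision import *  # fastai v1; also brings in numpy as np

np.random.seed(3)
src = (ImageItemList.from_csv(path=path, csv_name='key.csv',
                              folder='train_photos', suffix='.jpg')
       .random_split_by_pct(0.2)        # hold out 20% for validation
       .label_from_df(label_delim=' ')  # tags are space-separated
      )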

My CSV file, ‘key.csv’, which is in the ‘path’ directory, contains two columns: image_name and tags, where tags holds numeric, space-delimited labels.
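For anyone new to multi-label data: with label_delim=' ', each tags string is split on spaces into a list of labels. The sketch below mimics that splitting and a multi-hot encoding in plain Python. It is only an illustration, not fastai's implementation, and the sample rows are made up.

```python
# Made-up rows in the same shape as key.csv: (image_name, tags).
rows = [
    ("204149", "1 2 3"),
    ("52779", "0 4"),
    ("278973", "3 8"),
]

# Split each tags string on the delimiter, as label_delim=' ' asks for.
labels = {name: tags.split(" ") for name, tags in rows}

# The class vocabulary: every label seen, sorted (as strings).
classes = sorted({tag for tags in labels.values() for tag in tags})


def multi_hot(tags, classes):
    """Return a 0/1 vector with a 1 for every class present in tags."""
    return [1 if c in tags else 0 for c in classes]


encoded = {name: multi_hot(tags, classes) for name, tags in labels.items()}
print(classes)           # ['0', '1', '2', '3', '4', '8']
print(encoded["52779"])  # [1, 0, 0, 0, 1, 0]
```

Because the labels are compared as strings here, a two-digit label such as '10' would sort before '2'; that does not matter for the encoding itself.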

Each image is stored at path/train_photos/{image_name}.jpg.

When I run the above code and call src I see the following error:

FileNotFoundError: [Errno 2] No such file or directory: '/home/jupyter/hw/data/yelp/train/train_photos/0.jpg'

The issue is that I know 0.jpg doesn’t exist: ‘0’ is not a value in the image_name column of my key.csv, so I am not sure why it is looking for that file.
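A quick way to narrow this kind of error down is to check which of the expected image files actually exist before building the ItemList. The sketch below uses a hypothetical helper, missing_images (not a fastai function), and assumes the layout described above: each image lives at {folder}/{value}{suffix}, with the value taken from the CSV's first column.

```python
import tempfile
from pathlib import Path

import pandas as pd


def missing_images(csv_file, img_dir, suffix=".jpg", col=0):
    """Return the file names built from column `col` of csv_file that are not on disk.

    Hypothetical helper: assumes each image lives at img_dir / f"{value}{suffix}".
    """
    # dtype=str keeps IDs as text, so e.g. leading zeros survive.
    names = pd.read_csv(csv_file, dtype=str).iloc[:, col].astype(str)
    return [n + suffix for n in names if not (Path(img_dir) / (n + suffix)).exists()]


# Demo in a throwaway directory with made-up photo IDs.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "train_photos").mkdir()
    (root / "train_photos" / "204149.jpg").touch()  # only one of the two images exists

    pd.DataFrame({"image_name": ["204149", "52779"],
                  "tags": ["1 2", "0"]}).to_csv(root / "key.csv", index=False)

    missing = missing_images(root / "key.csv", root / "train_photos")

print(missing)  # ['52779.jpg']
```

On the real data, calling it with the dataset's key.csv and train_photos folder shows which names are being looked up; if the list starts with 0.jpg, 1.jpg, ..., the first column is not the one you expect.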

Any ideas?

Thank you!

Tom2718 · February 16, 2019, 10:47am

I presume you created a new csv that mapped the business IDs to the image IDs and called it key.csv? Maybe there’s a mistake in the mapping process of business_id to photo_id when you created key.csv? Perhaps a missing entry that got evaluated as 0?
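One way to test this hypothesis is to build key.csv with a left join and pandas' indicator=True, which flags photos whose business found no match. The frame and column names below (photo_to_biz, biz_labels, photo_id, business_id, labels) are hypothetical stand-ins, not a confirmed description of the Kaggle files.

```python
import pandas as pd

# Made-up stand-ins: photo -> business mapping, and business -> labels.
photo_to_biz = pd.DataFrame({"photo_id": [204149, 52779, 278973],
                             "business_id": [1000, 1000, 2000]})
biz_labels = pd.DataFrame({"business_id": [1000], "labels": ["1 2 3"]})

# Left-join so every photo is kept; indicator=True adds a _merge column
# recording whether each row found a match in biz_labels.
key = photo_to_biz.merge(biz_labels, on="business_id", how="left", indicator=True)
unmatched = key.loc[key["_merge"] == "left_only", "photo_id"].tolist()
print(unmatched)  # [278973]
```

Unmatched rows end up with NaN labels, so they are worth dropping or fixing before the CSV is saved.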

mcsquared · February 16, 2019, 7:00pm

Thanks so much for the reply!

The solution was actually even simpler: the key.csv I was saving recorded the DataFrame index as its first column. Since from_csv reads the file names from the first column by default, it was looking for 0.jpg, 1.jpg, etc., which are not necessarily file names in my directory.
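The pandas side of this is easy to reproduce. By default DataFrame.to_csv writes the index as an extra, unnamed first column, so any reader that takes file names from column 0 sees 0, 1, 2, ... instead of the photo IDs. The sketch below round-trips a made-up frame through an in-memory CSV; the data is illustrative only.

```python
import io

import pandas as pd

# Made-up stand-in for the key table.
key = pd.DataFrame({
    "image_name": ["204149", "52779", "278973"],
    "tags": ["1 2 3", "0 4", "5 6 8"],
})

# Default: the RangeIndex is written as an unnamed first column.
with_index = pd.read_csv(io.StringIO(key.to_csv()))
print(list(with_index.columns))         # ['Unnamed: 0', 'image_name', 'tags']
print(with_index.iloc[:, 0].tolist())   # [0, 1, 2]

# With index=False, image_name is the first column again.
without_index = pd.read_csv(io.StringIO(key.to_csv(index=False)))
print(list(without_index.columns))      # ['image_name', 'tags']
```

With the extra column present, every other column also shifts one place to the right, so a reader expecting labels in column 1 would get image names there instead.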

Saving the CSV with key.to_csv(path + 'key2.csv', index=False) fixed it, and everything worked perfectly.
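If a key.csv has already been written with the index, it can be repaired without rebuilding it: read the stray first column back as the index (index_col=0) and save again with index=False. The sketch below assumes path is a pathlib.Path, as fastai paths usually are; in that case the file name is joined with path / 'key2.csv', because path + 'key2.csv' raises a TypeError for Path objects (with a plain string path, + only works if the string ends in a separator). The data and temporary directory are made up for the demo.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp)  # stand-in for the dataset directory

    # Simulate a key.csv that was saved with its index (the problem above).
    pd.DataFrame({"image_name": ["204149", "52779"],
                  "tags": ["1 2", "0"]}).to_csv(path / "key.csv")

    # Read the stray first column back as the index, then save without it.
    key = pd.read_csv(path / "key.csv", index_col=0, dtype=str)
    key.to_csv(path / "key2.csv", index=False)

    fixed_header = (path / "key2.csv").read_text().splitlines()[0]

print(fixed_header)  # image_name,tags
```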

Thank you again!