ImageItemList label_from_df trying to find non-existent images

mcsquared · February 15, 2019, 3:41am

Hi all

For my lesson 3 assignment I am trying to do multi-label prediction on the Yelp Kaggle data (https://www.kaggle.com/c/yelp-restaurant-photo-classification).

I ran into the following issue and would really appreciate some hints.

I am trying to load and label my data from a CSV file using the following code, which is very similar to the in-class satellite image exercise.

np.random.seed(3)
src = (ImageItemList.from_csv(path=path, csv_name='key.csv',
                              folder='train_photos', suffix='.jpg')
       .random_split_by_pct(0.2)
       .label_from_df(label_delim=' ')
      )

My CSV file, ‘key.csv’, which is in the ‘path’ directory, contains two columns: image_name and tags, where tags holds numeric, space-delimited labels.

Each image is stored at path/train_photos/{image_name}.jpg
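
To make the expected lookup concrete, here is a minimal sketch of how I assume the file path is put together from the arguments above. The helper name expected_file is hypothetical (it is not part of fastai), and the base directory is taken from the error message further down.

```python
from pathlib import PurePosixPath

# Values mirroring the from_csv call; the base directory comes from the
# error message in this post.
path = PurePosixPath('/home/jupyter/hw/data/yelp/train')
folder, suffix = 'train_photos', '.jpg'

def expected_file(image_name):
    # ASSUMPTION: the loader builds path / folder / (image_name + suffix).
    # This is a hypothetical helper for illustration, not a fastai function.
    return path / folder / f'{image_name}{suffix}'

print(expected_file(0))
```

So whichever value is read as the image name ends up directly in the file name.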

When I run the above code and call src I see the following error:

FileNotFoundError: [Errno 2] No such file or directory: '/home/jupyter/hw/data/yelp/train/train_photos/0.jpg'

The issue is that I know 0.jpg doesn’t exist: ‘0’ is not a value in the image_name column of my key.csv, so I am not sure why it is being looked up.
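
In case it helps with diagnosing this, below is a rough sketch of a check on the raw file, using only the standard library. The sample rows are made up for illustration. The idea that the file might carry an extra unnamed index column in front of image_name, and that the loader reads file names from the first column, is an unconfirmed assumption on my part.

```python
import csv
import io

# Made-up sample standing in for key.csv. ASSUMPTION: the file was saved
# with an unnamed index column in front of image_name and tags.
sample = ",image_name,tags\n0,204149,1 2 5\n1,52779,0 3\n"

rows = list(csv.reader(io.StringIO(sample)))
header, data = rows[0], rows[1:]

# Values in the first column versus values in the image_name column.
first_column = [row[0] for row in data]
image_names = [row[header.index('image_name')] for row in data]

print(header)        # column names exactly as stored in the file
print(first_column)  # what a loader reading column 0 would treat as names
print(image_names)   # what I expect to be used as names
```

If the first column of the real file turns out to differ from image_name in the same way, that would at least explain where ‘0’ comes from.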

Any ideas?

Thank you!