Lesson 3 - File names existing in csv but not in folder when using ImageList.from_csv

JanM · December 14, 2019, 6:18pm

Hi,

I am trying to create an ImageList using ImageList.from_csv.

My csv contains 40108 rows.
My folder with pictures (I am only using a sample of the total) contains 997 files. The file names correspond to the imdbId column in the csv, with a .jpg suffix added.

Picture of the csv: [image: csv_posters2]

When I run the following code I get the following error:

np.random.seed(42)
data_csv = (ImageList.from_csv(path, 'MovieGenre.csv', folder='SampleMoviePosters', suffix='.jpg')
            .split_by_rand_pct(0.2)
            .label_from_df(label_delim=' ')
            .transform(get_transforms(), size=128)
            .databunch()
           )

FileNotFoundError: [Errno 2] No such file or directory: 'moviegenre/SampleMoviePosters/114709.jpg'

Since this file, 114709.jpg, doesn't exist in my sample image folder (the folder covers only 997 of the 40108 rows in the csv), this is understandable.

Is there any good way to solve this? Maybe some way to drop all rows in the csv where there is no file with the specified name in the folder?

//Jan
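
One possible way to drop the rows without a matching file is to write a filtered copy of the csv before building the ImageList. The sketch below is only an illustration and is not necessarily the approach from any answer referred to in this thread. It uses only the Python standard library. The function name filter_csv_to_existing_images, the output file name MovieGenre_sample.csv, and the utf-8 default encoding are assumptions made for this example; the real MovieGenre.csv may need a different encoding.

```python
import csv
from pathlib import Path


def filter_csv_to_existing_images(csv_path, image_dir, out_path,
                                  id_col="imdbId", suffix=".jpg",
                                  encoding="utf-8"):
    """Copy to out_path only the rows of csv_path whose image file exists.

    A row is kept when image_dir contains a file named <id_col value><suffix>.
    Returns a (kept, total) tuple of row counts.
    Hypothetical helper for illustration; the encoding default is an assumption.
    """
    image_dir = Path(image_dir)
    # List the folder once so each row is a set lookup, not a disk access.
    available = {p.name for p in image_dir.iterdir() if p.is_file()}

    kept = total = 0
    with open(csv_path, newline="", encoding=encoding) as src, \
         open(out_path, "w", newline="", encoding=encoding) as dst:
        reader = csv.DictReader(src)
        # Reuse the original header so the column order stays unchanged.
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            total += 1
            if f"{row[id_col].strip()}{suffix}" in available:
                writer.writerow(row)
                kept += 1
    return kept, total


# Possible usage with the names from the post above (not run here;
# 'MovieGenre_sample.csv' is a made-up output name):
#
# kept, total = filter_csv_to_existing_images(
#     path/'MovieGenre.csv', path/'SampleMoviePosters', path/'MovieGenre_sample.csv')
# data_csv = ImageList.from_csv(path, 'MovieGenre_sample.csv',
#                               folder='SampleMoviePosters', suffix='.jpg')
```

Writing a separate filtered file leaves the original csv untouched, so the same script can be rerun if more posters are added to the folder later. With the 997-file sample described above, kept should come out at no more than 997.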

JanM · December 14, 2019, 10:30pm

If anyone has a similar problem in the future, I asked a similar question on Stack Overflow and got a great answer: