Accessing filenames while batch processing images


(Gleb Esman) #1

I ingest images in batches. The image names contain important pieces of info that I want to add as a second input for better classification:
train_gen = datagen.flow_from_directory('/my/images/', …)

Although Keras does not document this, I can access all the filenames like this:
train_gen.filenames

I can create an infinite generator over the above list and construct “batches” of image names in parallel with the incoming batches of image data.

Question: is the order of the image names guaranteed to match the order of the image data in this case?


(David Gutman) #2

If you set shuffle=False and only run through it one time, yes.
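
As a rough sketch of that single ordered pass: assuming the generator yields images in the same order as `train_gen.filenames` and that the last batch may be shorter, the matching name batches are just consecutive slices. The helper name `filename_batches` and the file list below are made up for illustration.

```python
def filename_batches(filenames, batch_size):
    """Yield consecutive slices of `filenames`, one slice per image batch.

    Covers a single pass only; the last slice is shorter when the number
    of files is not a multiple of batch_size.
    """
    for start in range(0, len(filenames), batch_size):
        yield filenames[start:start + batch_size]


# Hypothetical file list standing in for train_gen.filenames.
filenames = ["a/img0.png", "a/img1.png", "b/img2.png", "b/img3.png", "b/img4.png"]
batches = list(filename_batches(filenames, batch_size=2))
```

This only stays aligned if `batch_size` here equals the batch size passed to the image generator.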


(Gleb Esman) #3

Sounds good.
However, I was wondering about shuffle=True. Would it shuffle the data after the image names are constructed, so that the two would no longer match?


(David Gutman) #4

IIRC shuffle just changes the indices; I don’t think it changes the order of the filenames attribute.


(Jeremy Howard (Admin)) #5

I think it would be better to put each bit of info into a separate array, and then generate the batches yourself. The filenames won’t be shuffled correctly across iterations in the approach you’ve proposed.
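
A minimal sketch of that idea, using plain Python lists rather than any Keras API: keep the images and the names in separate arrays of equal length, draw one index permutation per epoch, and apply it to both. The function name `paired_batches` and the placeholder data are assumptions for illustration only.

```python
import random


def paired_batches(images, names, batch_size, shuffle=True, seed=None):
    """Yield (image_batch, name_batch) pairs forever, reshuffling each epoch."""
    if len(images) != len(names):
        raise ValueError("images and names must have the same length")
    rng = random.Random(seed)
    indices = list(range(len(images)))
    while True:
        if shuffle:
            # One permutation drives both arrays, so pairs never drift apart.
            rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch_idx = indices[start:start + batch_size]
            yield [images[i] for i in batch_idx], [names[i] for i in batch_idx]


# Placeholder data: each "image" is tagged with the name it belongs to.
names = ["img%d.png" % i for i in range(5)]
images = ["pixels-of-" + n for n in names]
gen = paired_batches(images, names, batch_size=2, seed=0)
first_epoch = [next(gen) for _ in range(3)]
```

Because the same indices select from both arrays, the names stay matched to the image data in every epoch, whatever the shuffle order.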


(Gleb Esman) #6

Hi Jeremy,
do you mean fully replacing flow_from_directory() with custom code?

Gleb


(Jeremy Howard (Admin)) #7

I do indeed. Or you could make a copy of flow_from_directory and edit it to also return the file names with each batch, or to return the file names instead of the labels…


(Gleb Esman) #8

After looking at the Keras source code and getting scared, I managed to modify it to return the image filenames together with the rest of the jazz.
The significance of that is that I can also shuffle the input and still get a list of filenames that matches each batch of binary data.
And it took me 15 minutes!

This also allowed me to create a multi-input model that parses the filenames for metadata (the filenames have embedded info).
This propelled my model from 80% to 99% validation accuracy in my experiment matching a person’s identity to mouse movements on a very small dataset.
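
For illustration only, here is a minimal sketch of turning a batch of filenames into numeric metadata for a second model input. The naming scheme `s<session>_t<seconds>_<frame>.png` is made up, since the actual filename format is not shown in this thread, and `parse_metadata` is a hypothetical helper.

```python
import os
import re

# Hypothetical naming scheme: s<session>_t<seconds>_<frame>.png
PATTERN = re.compile(r"s(?P<session>\d+)_t(?P<seconds>\d+)_(?P<frame>\d+)\.png$")


def parse_metadata(path):
    """Return [session, seconds, frame] as ints parsed from the file name."""
    match = PATTERN.search(os.path.basename(path))
    if match is None:
        raise ValueError("unexpected file name: %r" % path)
    return [int(match.group(key)) for key in ("session", "seconds", "frame")]


# One metadata row per filename in the batch, in the same order.
name_batch = ["a/s2_t45_0013.png", "b/s1_t7_0001.png"]
meta = [parse_metadata(p) for p in name_batch]
```

Each row of `meta` lines up with the image at the same position in the batch, so the two can be fed to the model as parallel inputs.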

Yay! Thanks Jeremy. Weekend well spent.


(Jeremy Howard (Admin)) #9

That’s awesome! I hope you’ll share your code and a post describing the approach - I’m sure a lot of people would find it really helpful :slight_smile:


(Gleb Esman) #10

I’m definitely looking forward to making a write-up on this.
Will send you a link as soon as it’s done.

Gleb


(Kent) #11

Hi Gleb, what you wrote more than a year ago is exactly what I am looking for now. Did you have a chance to share a post or the code somewhere?