I ingest images in batches. The image names contain important pieces of information that I want to add as a second input for better classification:
train_gen = datagen.flow_from_directory('/my/images/', …)
Although Keras does not document this, I can access all the filenames like this:
train_gen.filenames
I can create an infinite generator over the above list and construct "batches" of image names in parallel with the batches of image data that are coming in.
Question: is the order of the image names guaranteed to match the order of the image data in this case?
I think it would be better to put each bit of info into a separate array, and then generate the batches yourself. The filenames won’t be shuffled correctly across iterations in the approach you’ve proposed.
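A minimal sketch of that idea, assuming the images, the filename-derived metadata, and the labels are already loaded into NumPy arrays of equal length. The function name `paired_batches` and the argument names are hypothetical. A single permutation per epoch indexes all three arrays, so the rows stay aligned even when shuffled:

```python
import numpy as np

def paired_batches(images, meta, labels, batch_size, shuffle=True, seed=None):
    """Yield ([image_batch, meta_batch], label_batch) forever, keeping rows aligned."""
    n = len(images)
    assert len(meta) == n and len(labels) == n, "arrays must have equal length"
    rng = np.random.RandomState(seed)
    while True:
        # One index order per epoch, applied to every array alike.
        order = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield [images[idx], meta[idx]], labels[idx]
```

Each yielded item puts the two inputs first and the labels second, which is the layout a two-input model would typically expect from a generator.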
I do indeed. Alternatively, you could make a copy of flow_from_directory and edit it to also return the file names with each batch, or to return the file names instead of the labels…
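A rough sketch of a related route that wraps the iterator instead of editing a copy of the Keras code. It leans on internals that are not a public API: it assumes the iterator exposes `lock`, `index_generator` (yielding one array of sample indices per batch), `_get_batches_of_transformed_samples`, and `filenames`. As far as I know this matches the iterator in `keras_preprocessing`, but older or newer versions may differ, so check your installed source first. The name `batches_with_filenames` is hypothetical.

```python
def batches_with_filenames(iterator):
    """Yield (batch, filenames) pairs built from the same index array."""
    while True:
        # Draw the indices under the iterator's lock, as its own next() is
        # assumed to do, so shuffling is handled by the iterator itself.
        with iterator.lock:
            index_array = next(iterator.index_generator)
        # ASSUMPTION: private method; name and signature vary by Keras version.
        batch = iterator._get_batches_of_transformed_samples(index_array)
        names = [iterator.filenames[i] for i in index_array]
        yield batch, names
```

Because the filenames are looked up with the very indices used to build the batch, they stay matched to the image data whether or not shuffling is enabled.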
After looking at the Keras source code and getting scared, I managed to modify it to return the image filenames together with the rest of the jazz.
The significance of this is that I can also shuffle the input and still get a list of filenames that matches my batch of binary data.
And it took me 15 minutes!
This also allowed me to create a multi-input model that parses the filenames for metadata (the filenames have info embedded in them).
This propelled my model from 80% to 99% validation accuracy in my experiment to match a person's identity to mouse movements on a very small dataset.
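For illustration, the filename-parsing step of such a multi-input setup might look like the sketch below. The naming scheme (`<width>x<height>_<hour>_<id>.png`) is entirely made up, since the actual format is not given here, and the helper names `parse_metadata` and `metadata_batch` are hypothetical.

```python
import os
import numpy as np

def parse_metadata(filename):
    """Turn 'cat/1280x720_14_0001.png' into [1280.0, 720.0, 14.0] (made-up scheme)."""
    # Drop the class directory and the extension, keeping only the stem.
    stem = os.path.splitext(os.path.basename(filename))[0]
    resolution, hour, _sample_id = stem.split('_')
    width, height = resolution.split('x')
    return [float(width), float(height), float(hour)]

def metadata_batch(filenames):
    """Stack per-file metadata into a (batch_size, 3) array for the second input."""
    return np.array([parse_metadata(f) for f in filenames], dtype='float32')
```

Given the list of filenames that accompanies a batch of images, `metadata_batch` produces the matching second input, row for row.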