Train and Test from a different folder structure without changing it?

anshaj · April 22, 2017, 12:09pm

I am trying to do face recognition on the AT&T face database. Its folder structure is quite different: 40 folders with 10 photos each. I want to use 8 images from each folder for training and the other 2 as test images, but I don’t want to label all the data again and put it into a different folder structure. How can I achieve this?

Here is the [database](http://www.cl.cam.ac.uk/Research/DTG/attarchive:pub/data/att_faces.zip).

Thanks

shawn · April 23, 2017, 12:57am

Hi Anshaj,

I’m not sure exactly what your objective is. Is the goal to classify a new photo as being one of the 40 subjects in the training set? If so, then you are starting with a directory structure exactly like we saw in Lesson 1: cats in one folder, dogs in another. Only in this case, there are 40 classes instead of two.
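Because each subject's folder name can serve as its class label, the 8/2 split can be done on file paths alone, without moving or relabeling anything. This is a minimal sketch, assuming the archive is unpacked to a folder such as `att_faces/` with one subdirectory per subject (`s1` … `s40`). `split_per_class` is a hypothetical helper name, and choosing the test images with a seeded shuffle is just one option.

```python
import os
import random


def split_per_class(root, n_train=8, seed=0):
    """Split each class folder under `root` into train/test lists of (path, label).

    Hypothetical helper: assumes one subdirectory per subject (e.g. s1 ... s40);
    the folder name is used as the class label. No files are moved or copied.
    """
    rng = random.Random(seed)  # seeded, so the split is reproducible
    train, test = [], []
    for label in sorted(os.listdir(root)):
        class_dir = os.path.join(root, label)
        if not os.path.isdir(class_dir):
            continue  # skip stray top-level files such as a README
        files = sorted(
            os.path.join(class_dir, name)
            for name in os.listdir(class_dir)
            if os.path.isfile(os.path.join(class_dir, name))
        )
        rng.shuffle(files)  # randomly choose which images are held out
        train += [(path, label) for path in files[:n_train]]
        test += [(path, label) for path in files[n_train:]]
    return train, test
```

With 40 folders of 10 images each, `train, test = split_per_class("att_faces")` yields 320 training pairs and 80 test pairs. These could then be fed to your own generator instead of a per-split folder layout.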

If you have a different objective, you would have to reorganize your data to suit that objective. For example, if the 10 photos of each subject are from specific angles or lighting conditions and you want to predict the angle or lighting condition, then you’d need to restructure your images so that all of the photos (regardless of subject) are grouped by angle or lighting.

You said that you don’t want to put your images in a different folder structure. Is there a reason why? The ImageDataGenerator class in Keras comes with a handy flow_from_directory() function that works nicely for classification tasks where the examples are organized into per-class subdirectories. You don’t have to use it; you could write your own generator to do this differently. But it’s not too hard to move files around in Python, so I’m not sure that you’ll be saving yourself any time or effort.
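To illustrate how little work the "move files around" route takes, here is a sketch that reads the original tree and copies each subject's images into `train/<subject>` and `valid/<subject>` folders under a new root. The original download is left untouched. `copy_split` and the output folder name are hypothetical; it assumes the same one-folder-per-subject layout as the AT&T archive.

```python
import os
import random
import shutil


def copy_split(src_root, dst_root, n_train=8, seed=0):
    """Copy src_root/<class>/* into dst_root/train/<class> and dst_root/valid/<class>.

    Hypothetical helper: the source tree is only read, never modified.
    """
    rng = random.Random(seed)  # seeded, so the same images land in valid each run
    for label in sorted(os.listdir(src_root)):
        class_dir = os.path.join(src_root, label)
        if not os.path.isdir(class_dir):
            continue  # ignore top-level files such as a README
        names = sorted(
            name for name in os.listdir(class_dir)
            if os.path.isfile(os.path.join(class_dir, name))
        )
        rng.shuffle(names)
        for subset, chosen in (("train", names[:n_train]), ("valid", names[n_train:])):
            out_dir = os.path.join(dst_root, subset, label)
            os.makedirs(out_dir, exist_ok=True)  # one subdirectory per class
            for name in chosen:
                shutil.copy2(os.path.join(class_dir, name), os.path.join(out_dir, name))
```

After `copy_split("att_faces", "att_faces_split")`, `flow_from_directory()` can be pointed at `att_faces_split/train` and `att_faces_split/valid`. One caveat: `flow_from_directory()` only picks up a fixed set of image extensions, so it is worth checking whether your Keras version accepts `.pgm`. If it does not, the images would need to be converted (for example to PNG) first.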

In general, data analysis tasks will usually include a significant amount of “data wrangling,” so it’s a good idea to get comfortable doing whatever wrangling needs to be done on a particular task.

Good luck!