Vgg.get_batches is returning more classes than I expect


After successfully submitting to dogs_vs_cats redux by working through lesson 1 and the dogs vs cats redux notebook, I thought it would be a good learning experience to do it all over with my own small data set (and it really has been!).

The issue I am having is that vgg.get_batches() is returning one more class than it should: I only have two classes, but it tells me I have 3.

I have mimicked the directory setup

  • /train/
  •    class1/
  •       class1.1.jpg
  •       class1.2.jpg
  •       class1.n.jpg
  •    class2/
  •       class2.1.jpg
  •       class2.2.jpg
  •       class2.n.jpg

When I run…

#Fine tune the model
batches = vgg.get_batches(train_path, batch_size = batch_size)
val_batches = vgg.get_batches(valid_path, batch_size = batch_size*2)

vgg.finetune(batches)
vgg.model.optimizer.lr = 0.01

I get…

Found 34 images belonging to 3 classes.
Found 10 images belonging to 2 classes.

My “valid” directory and my “train” directory have identical folders in them.

Here is the path I am defining…

#Change the directory

#create relative path names
path = DATA_HOME_DIR + '/'
test_path = path + '/test/'
results_path = path + '/results/'
train_path = path + '/train/'
valid_path = path + '/valid/'

So I guess my question is: does vgg.get_batches determine the number of classes based on the number of folders in the directory? I looked at get_batches and tried to understand it, and from there I went to Keras to try to track down my issue, but after a couple of hours of troubleshooting I thought I would reach out here.

Thanks for any help

Yes, get_batches returns classes based on the number of directories, so checking that might help. I have not really noticed it returning extra classes for me when I get my directory setup right. In this case train should ideally contain only the two class folders (class1/ and class2/), and the same for valid. For test data you can put all images in an ‘unknown’ folder.
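
To make the "one class per directory" rule concrete, here is a minimal sketch. It assumes get_batches hands the path to Keras, and that Keras treats every subdirectory of that path as a class, hidden ones included. list_class_dirs is a hypothetical helper written for this post, not part of vgg16.py or Keras.

```python
import os

def list_class_dirs(path):
    """Return every subdirectory of `path`, sorted by name.

    Hypothetical helper: it mirrors the assumed "one class per
    subdirectory" rule and does NOT skip dot-prefixed folders.
    """
    return sorted(
        name for name in os.listdir(path)
        if os.path.isdir(os.path.join(path, name))
    )
```

Calling list_class_dirs(train_path) and list_class_dirs(valid_path) and comparing the two lists should show which folder name accounts for the extra class.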

Thanks @karthik_k314 - this will keep me headed in the right direction!

I know that if I add a directory (even an empty one), vgg.get_batches shows an extra class. It is almost like I have a hidden directory in my train/ directory that get_batches sees, but I don’t. I have used the terminal to list all the directories in the train folder, and it only shows two. I’ll figure it out yet :)
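
One thing worth checking: a plain ls does not show names that start with a dot (ls -a does), so a hidden folder could be present without appearing in the listing. Below is a small sketch for spotting such folders from a notebook cell. find_hidden_dirs is a hypothetical helper name, and whether a hidden folder is really the cause here is an assumption.

```python
import os

def find_hidden_dirs(path):
    """Return names of directories directly under `path` that start with a dot.

    Hypothetical helper: only looks one level deep, which is where
    class folders are expected to live.
    """
    return sorted(
        name for name in os.listdir(path)
        if name.startswith(".") and os.path.isdir(os.path.join(path, name))
    )
```

If find_hidden_dirs(train_path) returns anything, that folder would be a likely candidate for the third class.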

Thanks again!

Well - I found out the issue - or at least I tried something different and it worked.

I am writing this here so that hopefully if someone has the same issue in the future, it can help them.

When I created my /train directory, I actually created it in the web browser, using the file browser of the jupyter “environment”.

I did this because I am not too great at bash commands (yet), and I was having trouble loading a zip file of my personal data set from my computer to the AWS server via bash. I was able to make directories fine from within a jupyter notebook, but for the above /train folder I used the jupyter “environment” (I don’t think I am using the right words here, but I hope you know what I mean).

So what I did was manually create the /train folder in jupyter (like I said above) and then manually upload the photos using the upload feature. It all seemed fine, until vgg.get_batches seemed to find 1 more class than I had in the train/ directory.

I ended up creating a new directory using bash and copying the files over (also via bash), and now everything is working as expected.

So I guess what I am saying is, that if you use the jupyter web “environment” (I am not referring to commands inside a jupyter notebook), then vgg.get_batches might find a “hidden folder” which it deems a class.

I could be totally off here - but I thought I would throw it out there in case it might help someone in the future.


Had the same problem. Only solution was to completely delete the directories and recreate in Unix. Jupyter notebook must have a bug.
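
A possible alternative to deleting and recreating everything, sketched under an assumption this thread does not confirm: that the extra class is a hidden folder such as .ipynb_checkpoints, which Jupyter can create inside directories it manages. remove_hidden_dirs is a hypothetical helper. It deletes folders permanently, so look at what it would remove before running it on real data.

```python
import os
import shutil

def remove_hidden_dirs(path):
    """Delete dot-prefixed directories directly under `path`.

    Hypothetical helper, assuming the hidden folders (for example
    .ipynb_checkpoints) hold nothing you need. Returns the names removed.
    """
    removed = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if name.startswith(".") and os.path.isdir(full):
            shutil.rmtree(full)  # permanent: removes the folder and its contents
            removed.append(name)
    return removed
```

After running it on the train and valid folders, calling vgg.get_batches again should report the class count based only on the remaining folders.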

