How to modify the get_batches generator to provide labels for Regression

(Satish Kottapalli) #1

Continuing the discussion from Lesson 2 discussion:


I am currently working on a regression problem where the model needs to count the number of connectors in a pic. I have only 120 labeled data points split into 100 training and 20 validation samples. The directory names are numbers indicating the count of connectors.

I have been able to modify the labels from get_data() and do a on the resultant data. Given the limited samples, I have to do heavy data augmentation, so I would like to train the model using fit_generator. However, I am running into problems with fit_generator.
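For reference, a minimal sketch of the relabeling step described above. It assumes the labels come back one-hot encoded (class_mode='categorical') and that every directory name parses as a number. The helper name onehot_to_counts and the sample class_indices dict are made up for illustration. As far as I know, Keras assigns class indices in sorted string order of the directory names, so '10' sorts before '2', which is why the mapping is inverted explicitly instead of using the index as the count.

```python
import numpy as np

def onehot_to_counts(onehot, class_indices):
    """Turn one-hot class labels into numeric regression targets."""
    # Invert the {dir_name: index} mapping into an array indexed by class index.
    index_to_count = np.zeros(len(class_indices), dtype=np.float32)
    for name, idx in class_indices.items():
        index_to_count[idx] = float(name)
    # argmax recovers the class index of each one-hot row.
    return index_to_count[np.argmax(onehot, axis=1)]

# Hypothetical mapping for directories named '1', '2' and '10'.
class_indices = {'1': 0, '10': 1, '2': 2}
onehot = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=np.float32)
counts = onehot_to_counts(onehot, class_indices)
```

The resulting array has one float per sample and can be used directly as the target for an 'mse' loss.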

my code is as follows

batches= gen.flow_from_directory(dirname,target_size=target_size,class_mode='categorical', shuffle=True, batch_size=batch_size)
for x in range(batch_size):
    yield (imgs,re_label)

However, I get the following error when I run the above code.

File "C:\Users\satish\Anaconda\lib\site-packages\keras\engine\", line 436, in data_generator_task
generator_output = next(generator)
TypeError: function object is not an iterator

Exception Traceback (most recent call last)
in ()
1 model_v.compile(optimizer=Adam(1e-6),
2 loss='mse')
----> 3 model_v.fit_generator(get_batches_t,nb_epoch=1,samples_per_epoch=500,validation_data=(val1,val2),nb_val_samples=100,verbose=1)

C:\Users\satish\Anaconda\lib\site-packages\keras\models.pyc in fit_generator(self, generator, samples_per_epoch, nb_epoch, verbose, callbacks, validation_data, nb_val_samples, class_weight, max_q_size, nb_worker, pickle_safe, **kwargs)
895 max_q_size=max_q_size,
896 nb_worker=nb_worker,
--> 897 pickle_safe=pickle_safe)
899 def evaluate_generator(self, generator, val_samples,

C:\Users\satish\Anaconda\lib\site-packages\keras\engine\training.pyc in fit_generator(self, generator, samples_per_epoch, nb_epoch, verbose, callbacks, validation_data, nb_val_samples, class_weight, max_q_size, nb_worker, pickle_safe, initial_epoch)
1447 raise Exception('output of generator should be a tuple '
1448 '(x, y, sample_weight) '
-> 1449 'or (x, y). Found: ' + str(generator_output))
1450 if len(generator_output) == 2:
1451 x, y = generator_output

Exception: output of generator should be a tuple (x, y, sample_weight) or (x, y). Found: None

Not sure why it says generator output is not a tuple.

(Jeremy Howard) #2

I suggest you use ImageDataGenerator.flow(), instead of ImageDataGenerator.flow_from_directory. You’ll need to load your images into a numpy array, and your labels into another array, then you can take advantage of the data augmentation in the class.
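A rough sketch of what this suggestion could look like, not the code that was actually used in the thread. The random arrays stand in for the loaded images and counts, the augmentation settings are arbitrary, and the helper name make_regression_flow is invented here. The arrays are laid out channels-last; with Theano-style ordering they would be (N, 3, 32, 32) instead.

```python
import numpy as np

def make_regression_flow(images, counts, batch_size=8):
    """Return an augmenting iterator that yields (image_batch, count_batch)."""
    # Imported here so the array preparation below runs without Keras installed.
    from keras.preprocessing.image import ImageDataGenerator
    gen = ImageDataGenerator(rotation_range=15, width_shift_range=0.1,
                             height_shift_range=0.1, horizontal_flip=True)
    # flow() returns each augmented image together with its label unchanged,
    # so the labels can be plain floats rather than one-hot vectors.
    return gen.flow(images, counts, batch_size=batch_size, shuffle=True)

# Placeholder data standing in for 100 training images and their counts.
images = np.random.rand(100, 32, 32, 3).astype(np.float32)
counts = np.random.randint(0, 10, size=100).astype(np.float32)

# With the Keras 1.x API used in this thread, training would look roughly like:
# model_v.fit_generator(make_regression_flow(images, counts),
#                       samples_per_epoch=500, nb_epoch=1)
```

Because the labels are passed in as an array, no directory-name parsing is needed at training time.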

(Satish Kottapalli) #3

First of all, thanks not only for creating such a fantastic course but, more importantly, for opening it up to the general public, and for taking the trouble to respond to queries.

Great suggestion, it did the trick, though at this point it has become more of a puzzle as to why the earlier error was occurring. I will have to park that for now.
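A likely explanation for the earlier error, judging only from the traceback: fit_generator was handed the function get_batches_t itself rather than the generator object it returns when called. next() then fails on the function in the worker thread, and Keras presumably ends up with None instead of an (x, y) tuple. The sketch below reproduces the first error in plain Python. fake_batches and relabel_batches are hypothetical stand-ins, not the original code.

```python
import numpy as np

def relabel_batches(batches, index_to_count):
    """Wrap an (imgs, one-hot) batch iterator and yield (imgs, counts) forever."""
    while True:
        imgs, onehot = next(batches)
        yield imgs, index_to_count[np.argmax(onehot, axis=1)]

def fake_batches():
    # Stand-in for gen.flow_from_directory(...): two images per batch.
    while True:
        yield np.zeros((2, 4, 4, 3)), np.array([[1, 0], [0, 1]])

# Hypothetical lookup from class index to connector count.
index_to_count = np.array([3.0, 7.0])

# Passing the function itself fails, because a function is not an iterator.
try:
    next(relabel_batches)
    error = None
except TypeError as exc:
    error = str(exc)

# Calling the function creates a generator object, which next() accepts.
gen_obj = relabel_batches(fake_batches(), index_to_count)
imgs, counts = next(gen_obj)
```

If this is indeed the cause, passing get_batches_t() with parentheses to fit_generator should avoid both errors.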

(Pietz) #4

i hope i may pickup this thread with an ongoing question.

i want to use flow_from_directory() to save the images inside a numpy array. i wanted to write my own function based on Jeremy's lesson 2 notebook, but i can't get it to work. i tried this:

itr = gen.flow_from_directory('data/train/', batch_size=1, target_size=(32,32))
imgs = np.concatenate([itr.next() for i in range(itr.nb_sample)])

which (to me) looks like the same logic thats used in the get_batches() and get_data() functions. however it results in:

ValueError: could not broadcast input array from shape (32,32,3) into shape (1)

any ideas whats happening?


I remember I saw the same error but I'm not sure exactly how I fixed it and can’t test atm…

IIRC itr.next() returns a tuple of (class_label, img_array) or something like that. You can just do itr.next()[0] or itr.next()[1] depending on the position where the img data in the returned tuple resides.

I don’t remember, but either way I think the shape of whatever itr.next() returns might be key. Please let me know if the above solution works, or check what itr.next() returns and we’ll figure this out.

(Pietz) #6

thanks for giving me the insight that you still remember. i’ll try a bit more and give feedback if i solve it.

(Pietz) #7

holy smokes, that did the trick! so in summary:

if you want to save a directory of images inside a numpy array using the ImageDataGenerator, you can do the following:

  1. Create your ImageDataGenerator with the constructor parameters you want
  2. Assign the return of the flow_from_directory() to a variable
    itr = gen.flow_from_directory('data/train/', batch_size=1, target_size=(32,32))
  3. Loop over itr.next()[0] like so
    imgs = np.concatenate([itr.next()[0] for i in range(itr.nb_sample)])

this will give you a numpy array in the shape of (# of samples, height, width, channels)
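The steps above can be sketched without Keras as follows. fake_iterator is a made-up stand-in for the object returned by flow_from_directory, and collect is a hypothetical helper. It unpacks each batch tuple once and keeps both parts, so the images and labels stay paired even when the iterator shuffles.

```python
import numpy as np

def collect(itr, n_batches):
    """Pull n_batches (x, y) tuples from an iterator and stack each part."""
    xs, ys = [], []
    for _ in range(n_batches):
        x, y = next(itr)  # one call per batch keeps x and y aligned
        xs.append(x)
        ys.append(y)
    return np.concatenate(xs), np.concatenate(ys)

def fake_iterator(batch_size=1):
    # Yields (images, one-hot labels), mimicking a directory iterator.
    while True:
        yield np.ones((batch_size, 32, 32, 3)), np.array([[1.0, 0.0]] * batch_size)

imgs, labels = collect(fake_iterator(), 5)
```

With batch_size=1 each call contributes one sample, so five calls produce arrays whose first dimension is 5.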

thanks again radek!

(Pietz) #8

ok one question remains.

with these 3 steps i can save the x-data to an array and by changing the 0 to a 1, i can do the same for the y-data. however as of right now i'd like to have it all in one numpy array and i'm having trouble concatenating two arrays of shape (a,b,c,d) and (a,e) to (a,b,c,d,e). also i think there must be a better way to read it all into one array altogether.

(Kay) #9


did you solve your problem?
I think i need the same procedure to solve a problem of mine. I would like to load my two classes with different augmentation gens and concatenate them to my training data.

(Pietz) #10

i believe what i wanted to do is simply not possible because of the 2 different shapes.
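The shape mismatch can be checked directly in NumPy. The sketch below uses zero-filled arrays with invented sizes. It also shows one possible workaround, which is only an illustration and not something proposed in the thread: flatten each image and stack the labels next to it, giving a 2-D array of shape (a, b*c*d + e) that can be split again later.

```python
import numpy as np

x = np.zeros((5, 32, 32, 3))  # images, shape (a, b, c, d)
y = np.zeros((5, 2))          # labels, shape (a, e)

# Arrays with different numbers of dimensions cannot be concatenated.
try:
    np.concatenate([x, y], axis=1)
    failed = False
except ValueError:
    failed = True

# Workaround: flatten each image, then append its label columns.
flat = np.hstack([x.reshape(len(x), -1), y])

# The two parts can be recovered by slicing and reshaping.
n_label_cols = y.shape[1]
x_back = flat[:, :-n_label_cols].reshape(x.shape)
y_back = flat[:, -n_label_cols:]
```

Keeping the images and labels as a tuple of two arrays is usually simpler, since that is also the form fit and fit_generator expect.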

(sathya) #11

I am using Keras 2.0 and acquiring vgg16 from keras directly. In the VGG16 model there is no function named get_batches.

(Satish Kottapalli) #12

It is not in the VGG16 model but in utils, which you can download from the course site. You need to do the following:
> import utils; reload(utils)
> from utils import get_batches

(sathya) #13

its not in utils but called using vgg16 object. I found that included in github has been edited to include new functions but i was using keras vgg16 hence the confusion. It was solved by using github