How to modify the get_batches generator to provide labels for Regression


(Satish Kottapalli) #1

Continuing the discussion from Lesson 2 discussion:

Hi,

I am currently working on a regression problem where the model needs to count the number of connectors in a picture. I have only 120 labeled data points, split into 100 training and 20 validation samples. The directory names are numbers indicating the count of connectors.

I have been able to modify the labels from get_data() and do a model.fit on the resultant data. Given the limited samples, I have to do heavy data augmentation, so I would like to train the model using fit_generator. However, I am running into problems with fit_generator.

My code is as follows (the snippet sits inside a generator function, since it ends in a yield):

def get_batches_t():
    batches = gen.flow_from_directory(dirname, target_size=target_size,
                                      class_mode='categorical', shuffle=True,
                                      batch_size=batch_size)
    while True:
        imgs, labels = next(batches)
        # map the one-hot class vector back to a numeric count via indices/lookup
        re_label = np.dot(labels, indices)
        for x in range(batch_size):
            re_label[x] = lookup[int(re_label[x])]
        yield (imgs, re_label)

However, I get the following error when I run the code above.

File "C:\Users\satish\Anaconda\lib\site-packages\keras\engine\training.py", line 436, in data_generator_task
generator_output = next(generator)
TypeError: function object is not an iterator


Exception Traceback (most recent call last)
in ()
1 model_v.compile(optimizer=Adam(1e-6),
2 loss='mse')
----> 3 model_v.fit_generator(get_batches_t,nb_epoch=1,samples_per_epoch=500,validation_data=(val1,val2),nb_val_samples=100,verbose=1)

C:\Users\satish\Anaconda\lib\site-packages\keras\models.pyc in fit_generator(self, generator, samples_per_epoch, nb_epoch, verbose, callbacks, validation_data, nb_val_samples, class_weight, max_q_size, nb_worker, pickle_safe, **kwargs)
895 max_q_size=max_q_size,
896 nb_worker=nb_worker,
--> 897 pickle_safe=pickle_safe)
898
899 def evaluate_generator(self, generator, val_samples,

C:\Users\satish\Anaconda\lib\site-packages\keras\engine\training.pyc in fit_generator(self, generator, samples_per_epoch, nb_epoch, verbose, callbacks, validation_data, nb_val_samples, class_weight, max_q_size, nb_worker, pickle_safe, initial_epoch)
1447 raise Exception('output of generator should be a tuple '
1448 '(x, y, sample_weight) '
-> 1449 'or (x, y). Found: ' + str(generator_output))
1450 if len(generator_output) == 2:
1451 x, y = generator_output

Exception: output of generator should be a tuple (x, y, sample_weight) or (x, y). Found: None

Not sure why it says the generator output is not a tuple.


(Jeremy Howard) #2

I suggest you use ImageDataGenerator.flow(), instead of ImageDataGenerator.flow_from_directory. You’ll need to load your images into a numpy array, and your labels into another array, then you can take advantage of the data augmentation in the class.
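A minimal sketch of that approach. The array shapes, counts, and augmentation parameters below are made up for illustration; in practice the images would come from loading the 120 photos (e.g. with get_data()) and the counts from the directory names:

```python
import numpy as np

# Hypothetical stand-ins for the real data: 100 training photos and their
# connector counts, each count used directly as a regression target.
imgs = np.zeros((100, 224, 224, 3), dtype='float32')       # (samples, h, w, channels)
counts = np.array([3.0] * 100, dtype='float32')            # one count per image

# With Keras available, augmentation plus training would then look like:
# gen = ImageDataGenerator(rotation_range=10, width_shift_range=0.1,
#                          height_shift_range=0.1, horizontal_flip=True)
# model_v.fit_generator(gen.flow(imgs, counts, batch_size=8),
#                       samples_per_epoch=500, nb_epoch=1)
```

The key point is that flow() accepts any numpy label array, so a float count works as a regression target with no relabeling step inside the generator.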


(Satish Kottapalli) #3

At the outset, thanks not only for creating such a fantastic course but, more importantly, for opening it up to the general public as well, and for taking the trouble to respond to queries.

Great suggestion, and it did the trick. Though at this point it has become more of a puzzle as to why the earlier error was occurring; I will have to park that for now.
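For anyone who hits the same TypeError later: judging from the traceback above, fit_generator was given the function get_batches_t itself rather than the generator object you get by calling it. A minimal sketch of the distinction (the dummy batches iterator here is made up for illustration):

```python
import itertools

def get_batches_t(batches):
    # A generator function: *calling* it returns an iterator;
    # the function object itself is not one.
    while True:
        imgs, labels = next(batches)
        yield (imgs, labels)

# dummy stand-in for a real Keras batch iterator
batches = itertools.cycle([("imgs", "labels")])

gen_obj = get_batches_t(batches)   # note the call: this is what fit_generator needs
first = next(gen_obj)              # works: ("imgs", "labels")

# next(get_batches_t) by contrast raises:
# TypeError: 'function' object is not an iterator
```

So `model_v.fit_generator(get_batches_t(), ...)` rather than `model_v.fit_generator(get_batches_t, ...)` would likely have resolved the original error.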


(Pietz) #4

I hope I may pick up this thread with a follow-up question.

I want to use flow_from_directory() to save the images inside a numpy array. I wanted to write my own function based on Jeremy's lesson 2 notebook, but I can't get it to work. I tried this:

itr = gen.flow_from_directory('data/train/', batch_size=1, target_size=(32,32))
imgs = np.concatenate([itr.next() for i in range(itr.nb_sample)])

which (to me) looks like the same logic that's used in the get_batches() and get_data() functions. However, it results in:

ValueError: could not broadcast input array from shape (32,32,3) into shape (1)

Any ideas what's happening?


#5

I remember I saw the same error, but I'm not sure exactly how I fixed it and can't test at the moment...

IIRC itr.next() returns a tuple of (class_label, img_array) or something like that. You can just do itr.next()[0] or itr.next()[1], depending on the position where the image data resides in the returned tuple.

I don't remember exactly, but either way I think the shape of whatever itr.next() returns might be key. Please let me know if the above solution works, or check what itr.next() returns and we'll figure this out.


(Pietz) #6

Thanks for sharing what you still remember. I'll try a bit more and give feedback if I solve it.


(Pietz) #7

Holy smokes, that did the trick! So in summary:

If you want to save a directory of images inside a numpy array using the ImageDataGenerator, you can do the following:

  1. Create your ImageDataGenerator with the constructor parameters you want
  2. Assign the return value of flow_from_directory() to a variable
    itr = gen.flow_from_directory('data/train/', batch_size=1, target_size=(32,32))
  3. Loop over itr.next()[0] like so
    imgs = np.concatenate([itr.next()[0] for i in range(itr.nb_sample)])

This will give you a numpy array of shape (# of samples, height, width, channels).
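The steps above can be exercised end to end; the FakeIterator below is a made-up stand-in for gen.flow_from_directory(...), so the concatenation logic runs without any image files:

```python
import numpy as np

class FakeIterator:
    """Hypothetical stand-in for gen.flow_from_directory(...): yields
    (images, one-hot labels) tuples, one image per batch (batch_size=1)."""
    def __init__(self, nb_sample, size=(32, 32)):
        self.nb_sample = nb_sample
        self.size = size

    def next(self):
        return (np.zeros((1,) + self.size + (3,)),  # images: (1, h, w, channels)
                np.zeros((1, 2)))                   # labels: (1, nb_classes)

itr = FakeIterator(5)
# index [0] picks the image array out of each (images, labels) tuple
imgs = np.concatenate([itr.next()[0] for i in range(itr.nb_sample)])
# imgs now has shape (5, 32, 32, 3): (# of samples, height, width, channels)
```

The original broadcast error came from concatenating the whole tuples instead of just element [0].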

thanks again radek!


(Pietz) #8

OK, one question remains.

With these 3 steps I can save the x-data to an array, and by changing the 0 to a 1 I can do the same for the y-data. However, as of right now I'd like to have it all in one numpy array, and I'm having trouble concatenating two arrays of shape (a,b,c,d) and (a,e) into (a,b,c,d,e). Also, I think there must be a better way to read it into one array altogether.
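For what it's worth, a common workaround is to draw each batch exactly once and keep images and labels as two parallel, index-aligned arrays rather than forcing them into one (the FakeIterator here is a made-up stand-in for the flow_from_directory iterator):

```python
import numpy as np

class FakeIterator:
    """Hypothetical stand-in for gen.flow_from_directory(...) with batch_size=1."""
    def __init__(self, nb_sample):
        self.nb_sample = nb_sample

    def next(self):
        return (np.zeros((1, 32, 32, 3)), np.zeros((1, 2)))  # (images, labels)

itr = FakeIterator(5)

# draw each batch exactly once so x and y stay aligned, then split the tuples
pairs = [itr.next() for i in range(itr.nb_sample)]
imgs = np.concatenate([p[0] for p in pairs])     # shape (5, 32, 32, 3)
labels = np.concatenate([p[1] for p in pairs])   # shape (5, 2)
```

Since the two arrays have different ranks, keeping them separate (as model.fit expects anyway) avoids the shape problem entirely.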


(Kay) #9

@pietz

Did you solve your problem?
I think I need the same procedure to solve a problem of mine. I would like to load my two classes with different augmentation generators and concatenate them into my training data.


(Pietz) #10

I believe what I wanted to do is simply not possible because of the two different shapes.


(sathya) #11

I am using Keras 2.0 and acquiring VGG16 from Keras directly. In the VGG16 model there is no function with the name get_batches.


(Satish Kottapalli) #12

It is not in the VGG16 model but in utils, which you can download from the course site. You need to do the following:
> import utils; reload(utils)
> from utils import get_batches


(sathya) #13

It's not in utils but is called using the vgg16 object. I found that the vgg16.py included in the course GitHub has been edited to include new functions, but I was using the Keras VGG16, hence the confusion. It was solved by using the GitHub vgg16.py.