Type error in lesson 2, get_data method


(sai kiran) #1

I am getting the following error when I run the get_data method. Other threads say the code needs to be updated, but I do not know where. I did not find an updated version on platform.ai, and the code on GitHub does not appear to have been updated either.

TypeError: coercing to Unicode: need string or buffer, DirectoryIterator found

Following is the traceback.

TypeError                                 Traceback (most recent call last)
in ()
----> 1 val_data = get_data(val_batches)
      2

/home/ubuntu/nbs/utils.pyc in get_data(path, target_size)
    134
    135 def get_data(path, target_size=(224,224)):
--> 136     batches = get_batches(path, shuffle=False, batch_size=1, class_mode=None, target_size=target_size)
    137     return np.concatenate([batches.next() for i in range(batches.nb_sample)])
    138

/home/ubuntu/nbs/utils.pyc in get_batches(dirname, gen, shuffle, batch_size, class_mode, target_size)
     90                 target_size=(224,224)):
     91     return gen.flow_from_directory(dirname, target_size=target_size,
---> 92             class_mode=class_mode, shuffle=shuffle, batch_size=batch_size)
     93
     94

/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/preprocessing/image.pyc in flow_from_directory(self, directory, target_size, color_mode, classes, class_mode, batch_size, shuffle, seed, save_to_dir, save_prefix, save_format, follow_links)
    444             save_prefix=save_prefix,
    445             save_format=save_format,
--> 446             follow_links=follow_links)
    447
    448     def standardize(self, x):

/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/preprocessing/image.pyc in __init__(self, directory, image_data_generator, target_size, color_mode, dim_ordering, classes, class_mode, batch_size, shuffle, seed, save_to_dir, save_prefix, save_format, follow_links)
    775         if not classes:
    776             classes = []
--> 777             for subdir in sorted(os.listdir(directory)):
    778                 if os.path.isdir(os.path.join(directory, subdir)):
    779                     classes.append(subdir)

TypeError: coercing to Unicode: need string or buffer, DirectoryIterator found


(Yohei Sazanami) #2

I hit the same error, but I got past it by following the discussion below.
It seems the get_data method should be called as get_data(path+'train').
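To see why the original call fails: the traceback above ends in os.listdir(directory), which needs a string path, but val_batches is the DirectoryIterator object returned by get_batches. A minimal sketch of the distinction (the "data/dogscats/" root is a hypothetical example, not from the course code):

```python
import os

# get_data must receive a directory path (a string), not the
# DirectoryIterator that get_batches returns. os.listdir() inside
# Keras cannot coerce an iterator object to a path string.
path = "data/dogscats/"              # hypothetical dataset root

train_dir = os.path.join(path, "train")  # a plain string path
print(train_dir)  # -> data/dogscats/train
```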


(sai kiran) #3

I'm sorry, but I don't follow. Should I pass it like val_data = get_data(val_batches+'train')?

Or should I change the method itself? Like,
def get_data(path, target_size=(224,224)):
    batches = get_batches(path, shuffle=False, batch_size=1, class_mode=None, target_size=target_size)
    return np.concatenate([batches.next() for i in range(batches.nb_sample)])


(sai kiran) #4

And also it doesn't make any sense to me. val_batches has the images, but not the complete path of those images. So why add path+'train'?


(carlos roberto) #5

Hey @sakiran, it was confusing for me too, but look at the code of get_data and you will see that it calls get_batches, which receives a path as an argument. So just pass the paths to the train and valid directories to get_data and it will work.


(Sepehr Akhavan) #6

As @carlosdeep mentioned, simply change

trn_data = get_data(batches) 

to:

trn_data = get_data(path+'train')

similarly for valid, do:

val_data = get_data(path+'valid')

(Roy) #7

Hi All, I think I am missing something here…please advise.

I am using
trn = get_data(path+'train')

But I am still getting the same error.

Found 23000 images belonging to 2 classes.
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
in ()
----> 1 trn = get_data(path+'train')
      2 val = get_data(path+'valid')

/home/ubuntu/courses/deeplearning1/nbs/utils.pyc in get_data(path, target_size)
    135 def get_data(path, target_size=(224,224)):
    136     batches = get_batches(path, shuffle=False, batch_size=1, class_mode=None, target_size=target_size)
--> 137     return np.concatenate([batches.next() for i in range(batches.nb_sample)])
    138 
    139 

MemoryError:

(sai kiran) #8

Use get_batches instead of get_data; you are getting a memory error.
Are you working on your personal computer or an AWS instance?
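The reason behind this advice: get_data concatenates every image into one huge array, while get_batches yields one small batch at a time. A minimal sketch of the difference, using a fake batch generator as a stand-in for Keras's DirectoryIterator (not a real Keras object):

```python
import numpy as np

# fake_batches is a hypothetical stand-in for a Keras DirectoryIterator:
# it yields one batch of 224x224 RGB "images" at a time.
def fake_batches(n_samples, batch_size=4):
    for start in range(0, n_samples, batch_size):
        size = min(batch_size, n_samples - start)
        yield np.zeros((size, 224, 224, 3), dtype=np.float32)

# Processing batch-by-batch keeps only one batch in memory at a time;
# np.concatenate over all batches (what get_data does) would need RAM
# for the entire dataset at once.
total = 0
for batch in fake_batches(10):
    total += len(batch)  # process, then discard the batch
print(total)  # -> 10
```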


(Roy) #9

Thanks, it worked!


(Roy) #10

However, save_array after that chokes…

save_array(model_path+'train_data.bc', trn)
save_array(model_path+'valid_data.bc', val)

TypeErrorTraceback (most recent call last)
<ipython-input-18-85e96f8c655b> in <module>()
      1 from utils import *
----> 2 save_array(model_path+'train_data.bc', trn)
      3 save_array(model_path+'valid_data.bc', val)

/home/ubuntu/courses/deeplearning1/nbs/utils.pyc in save_array(fname, arr)
    165 
    166 def save_array(fname, arr):
--> 167     c=bcolz.carray(arr, rootdir=fname, mode='w')
    168     c.flush()
    169 
/home/ubuntu/anaconda2/lib/python2.7/pickle.pyc in save(self, obj)
    304             reduce = getattr(obj, "__reduce_ex__", None)
    305             if reduce:
--> 306                 rv = reduce(self.proto)
    307             else:
    308                 reduce = getattr(obj, "__reduce__", None)

TypeError: can't pickle generator objects
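What this traceback suggests: save_array wraps bcolz.carray, which needs array-like data, but after switching to get_batches, trn is an iterator rather than a NumPy array, and generators cannot be pickled. A minimal sketch of the distinction (plain Python stand-ins, no bcolz required):

```python
import numpy as np

# A generator (like the DirectoryIterator from get_batches) cannot be
# pickled or saved directly -- hence "can't pickle generator objects".
gen = (i * i for i in range(4))   # stand-in for an iterator result

# Materializing it into a real ndarray gives save_array something
# array-like that it can serialize.
arr = np.array(list(gen))
print(arr.tolist())  # -> [0, 1, 4, 9]
```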

(Anil Pandey) #11

I am still getting the error AttributeError: 'DirectoryIterator' object has no attribute 'nb_sample'

I have looked at all the solutions for the get_data error, but it is still not working.

val_data = get_data(path+'valid')

Found 2000 images belonging to 2 classes.


AttributeError                            Traceback (most recent call last)
in ()
      3 #print('val_batches.nb_class ',batches.nb_class, 'val_batches.nb_sample ', batches.nb_sample)
      4
----> 5 val_data = get_data(path+'valid')

/mnt/data/ssd000/dsb2017/anil/numpy/utils.py in get_data(path, target_size)
    147 def get_data(path, target_size=(224,224)):
    148     batches = get_batches(path, shuffle=False, batch_size=1, class_mode=None, target_size=target_size)
--> 149     return np.concatenate([batches.next() for i in range(batches.nb_sample)])
    150
    151

AttributeError: 'DirectoryIterator' object has no attribute 'nb_sample'


(Cristian) #12

I don't know if you solved this, but since I went through the same error I want to share the solution I found.
This works if you are using Keras 2.

In utils.py, go to the get_data function and change the last line:

(new) return np.concatenate([batches.next() for i in range(batches.samples)])     #keras 2
(old) return np.concatenate([batches.next() for i in range(batches.nb_sample)])

It would be nice if anybody knew the reason behind this :slight_smile:
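The reason is simply a rename: Keras 2 changed the iterator attribute nb_sample to samples. If you want utils.py to run under either version, a getattr fallback is one option. A sketch, using a fake object in place of a real Keras DirectoryIterator:

```python
# FakeBatches is a hypothetical stand-in for a Keras 2 DirectoryIterator,
# which exposes `samples` where Keras 1 exposed `nb_sample`.
class FakeBatches:
    samples = 23000  # Keras 2 attribute name

batches = FakeBatches()

# Try the Keras 2 name first, then fall back to the Keras 1 name.
n = getattr(batches, "samples", None)
if n is None:
    n = getattr(batches, "nb_sample")
print(n)  # -> 23000
```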


(Evan DeFilippis) #13

Did you ever figure out how to resolve this issue? I’m having the same problem.


(Sherif) #14

Going through lesson 2, I had the same problem. The following code worked for me on Keras 2:

  • return np.concatenate([batches.next()[0] for i in range(batches.samples)])

Does anyone know where I can find the documentation for the batches attributes? I looked at the Keras ImageDataGenerator docs https://keras.io/preprocessing/image/ but couldn't find a reference to the samples property.
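One possible reason the [0] index helps, sketched below with plain Python stand-ins (no Keras): when an iterator is created with a class_mode other than None, next() returns an (images, labels) tuple rather than a bare image array, so [0] extracts just the images. This is an assumption about Sherif's setup, not confirmed from the thread.

```python
# Hypothetical stand-in for next() on an iterator created with a
# non-None class_mode: it returns an (images, labels) tuple.
def next_batch():
    return ("images", "labels")

# Indexing [0] keeps only the image part of the tuple.
images = next_batch()[0]
print(images)  # -> images
```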