Issues with get_data, save_array, and load_array

MPJ · February 10, 2017, 10:30pm

Hi there,

I was struggling with roughly the same issue and found a comment tucked away on Kaggle about an iterator for bcolz carrays. So now, much like with get_batches, you can feed bcolz output to fit_generator and similar methods.
See https://github.com/MPJansen/courses/commit/77acd076e9fd57899617a61af2890c1081622015
If there is interest, I will open a pull request in the course branch.

Usage is like so:

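import bcolz

# BcolzArrayIterator is defined in the commit linked above.
# path and batch_size are assumed to be set already; multiplying by
# X.chunklen keeps each batch a whole multiple of the bcolz chunk length.
X = bcolz.open(path + 'train_convlayer_features.bc', mode='r')
y = bcolz.open(path + 'train_labels.bc', mode='r')
trn_batches = BcolzArrayIterator(X, y, batch_size=X.chunklen * batch_size, shuffle=True)

model.fit_generator(generator=trn_batches, samples_per_epoch=trn_batches.N, nb_epoch=1)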

Thanks to R4mon: http://pastebin.com/y0DskiK6

This way you avoid having to load the entire dataset into memory when you want to train or evaluate the dense layers on precomputed convolutional-layer output.
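For anyone curious about the idea, here is a rough sketch of the batching pattern. This is not R4mon's code or the BcolzArrayIterator from the commit. It is a simplified stand-in, and the name chunked_batches is made up for illustration. It assumes X and y support len() and slicing, which holds for bcolz carrays, and it only shuffles the order of the batches, not the samples inside each one.

```python
import random


def chunked_batches(X, y, batch_size, shuffle=True, seed=None):
    """Yield (X_batch, y_batch) forever, one contiguous slice at a time.

    Only the slice being yielded is read, so a disk-backed array never
    has to be loaded in full. (Hypothetical helper, for illustration.)
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    rng = random.Random(seed)
    # Start offset of every batch; the last batch may be shorter.
    starts = list(range(0, len(X), batch_size))
    while True:  # loop forever, as fit_generator expects
        if shuffle:
            rng.shuffle(starts)  # new batch order on every pass
        for start in starts:
            stop = min(start + batch_size, len(X))
            yield X[start:stop], y[start:stop]
```

Picking batch_size as a multiple of the array's chunklen, as in the usage above, should mean each slice lines up with whole chunks on disk.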

Hope this helps