Bcolz-based generator

catblue88 · June 5, 2017, 5:15pm

Hello,
and Help!
I am stuck in Lesson 3 trying to train fc_model.
Since the combined size of the trn and val _features arrays exceeds 10 GB, my computer can’t cope with it.
Besides, it is preposterous to clog memory with such huge arrays when they are only read sequentially… that’s what a disk is for! I think…
I tried to write a small generator that reads the arrays I previously saved with bcolz (train_convlayer_features.bc and valid_convlayer_features.bc) so that I can use fit_generator to train fc_model.

Here’s the code. Maybe someone with a better understanding of Python generators and bcolz can help me find out why the bloody thingy is yielding one and the same sample?

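import bcolz
import numpy as np

def flow_from_bcoltz(path, output_labels, batch_size):
    i = 0
    n = len(output_labels)
    b = bcolz.open(path)  # on-disk carray, read slice by slice
    while True:
        j = i + batch_size
        if j > n: j = n
        res1, res2 = b[i:j], output_labels[i:j]
        if j - i < batch_size:
            # short batch at the end: top it up from the start of the data
            j = batch_size - (n - i)
            res1, res2 = np.vstack((res1, b[0:j])), np.vstack((res2, output_labels[0:j]))
            j = 0  # next batch starts again from index 0
        i = j
        yield res1, res2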

I spent quite a lot of time being surprised by huge overfitting before I noticed… :))

Peter

iNLyze · June 6, 2017, 10:00pm

Check out Part 2. Jeremy wrote (or modified) a nice iterator class which does what you need. Combined with a fast SSD, that works really well. I convert all my image data into bcolz arrays.
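
For anyone following along, here is a rough sketch of a wrap-around batch generator of the kind asked about above. This is not the Part 2 iterator class, just a minimal illustration. The names batch_generator, features and labels are made up for this example. It assumes batch_size is no larger than the number of samples, and that features supports len() and slicing (a NumPy array, or a carray returned by bcolz.open). It uses np.concatenate rather than np.vstack so that 1-D label arrays also work.

```python
import numpy as np


def batch_generator(features, labels, batch_size):
    """Yield (features, labels) batches forever, wrapping around at the end.

    Hypothetical helper: assumes batch_size <= len(labels) and that both
    inputs support slicing (NumPy array or an on-disk array-like).
    """
    n = len(labels)
    start = 0
    while True:
        stop = start + batch_size
        if stop <= n:
            # The whole batch fits before the end of the data.
            x, y = features[start:stop], labels[start:stop]
        else:
            # Take the tail, then top it up from the beginning of the data.
            stop -= n
            x = np.concatenate((features[start:], features[:stop]))
            y = np.concatenate((labels[start:], labels[:stop]))
        # Continue from where this batch ended instead of restarting at 0.
        start = stop % n
        yield x, y
```

With the files from the first post, something like batch_generator(bcolz.open(path), output_labels, batch_size) could then be handed to fit_generator. Only one batch is held in memory at a time, since each slice is read from disk when it is requested.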