I'm looking at the function Jeremy used, but it seems you need to load the entire dataset into memory first:
```python
def save_array(fname, arr):
    c = bcolz.carray(arr, rootdir=fname, mode='w')
    c.flush()
```
When I run the code below, it eats up all of my RAM and crashes my machine:
```python
train_data = np.concatenate(
    [next(x) for _ in range(int(np.ceil(total_samples / r_batch_size)))]
)
save_array(os.path.join(save_dir, "train_data"), train_data)
```
Any tips on how to save the data without loading everything into memory at once?
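For reference, here's a minimal sketch of the kind of incremental write I'm imagining, using plain NumPy `tofile`/`fromfile` instead of bcolz. The batch generator, shapes, and `r_batch_size` here are just stand-ins for my real pipeline:

```python
import os
import tempfile

import numpy as np

# Stand-in for my real batch generator `x` (hypothetical shapes/values).
def batches(total_samples, batch_size, n_features=4):
    for start in range(0, total_samples, batch_size):
        n = min(batch_size, total_samples - start)
        yield np.ones((n, n_features), dtype=np.float32)

total_samples, r_batch_size = 10, 3
save_path = os.path.join(tempfile.mkdtemp(), "train_data.raw")

# Append each batch to disk as it is produced, so only one batch
# is ever held in memory at a time (instead of np.concatenate on all of them).
with open(save_path, "ab") as f:
    for batch in batches(total_samples, r_batch_size):
        batch.astype(np.float32).tofile(f)

# Read back and reshape; dtype and row width must be known out-of-band.
train_data = np.fromfile(save_path, dtype=np.float32).reshape(-1, 4)
print(train_data.shape)  # (10, 4)
```

If sticking with bcolz, I believe a `carray` created with `mode='w'` can also be grown batch-by-batch via `c.append(batch)` followed by `c.flush()`, which would avoid the one big concatenate, but I haven't verified that.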