I am seeing a bizarre issue with the get_data function that we use before calling save_array to persist the pre-processed numpy array to disk.
Here’s the notebook that I am using - https://github.com/thejaswiraya/misc/blob/master/vgg16_save_array_issue.ipynb
In cell 13, where I call get_data, the memory usage on my P2 instance increases from almost 0 to almost 55GB! The training data on disk is only 550MB — 100 times smaller than the memory occupied. After this step, 8 times out of 10 I end up with
OSError: [Errno 12] Cannot allocate memory, as you can see in cell 19.
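For what it's worth, here's a rough back-of-the-envelope sketch of why the in-memory array can dwarf the on-disk JPEGs: the files on disk are compressed, but get_data decodes them into a dense float array. The image count and the 224x224x3 shape below are assumptions (VGG16-sized inputs), not measured from my dataset — and even float64 doesn't fully explain 55GB, which is part of why I'm confused:

```python
import numpy as np

# Back-of-the-envelope estimate of decoded-image memory cost.
# n_images and the 224x224x3 shape are assumptions (VGG16-sized inputs),
# not measured from my actual dataset.
n_images = 23_000
h, w, c = 224, 224, 3

bytes_f32 = n_images * h * w * c * np.dtype(np.float32).itemsize
bytes_f64 = n_images * h * w * c * np.dtype(np.float64).itemsize

print(f"float32: {bytes_f32 / 2**30:.1f} GiB")
print(f"float64: {bytes_f64 / 2**30:.1f} GiB")
```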
The remaining times, when I don’t get the memory error, the fit_generator call takes around 600s to run, which is the same amount of time it would take if I skipped using bcolz. I remember that when @jeremy uses load_array to load the preprocessed arrays and then calls fit_generator, it takes him only 300s. Why is the universe being unkind to me?
I have tried calling only load_array, to load the preprocessed arrays from disk, but I still don’t see any improvement. In fact, using get_data and load_array via bcolz seems to be making things worse by occupying way more memory than it needs to.
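In case it helps others debug: if load_array materializes the whole carray into RAM with a trailing `[:]` (which I believe the course utils do), then one workaround I'm considering is a memory-mapped load, where batches are paged in from disk on demand. This is just a sketch with a small hypothetical stand-in array, not the course's actual helpers:

```python
import os
import tempfile
import numpy as np

# Workaround sketch (NOT the course's load_array): save the preprocessed
# array with np.save, then reopen it memory-mapped so slices are paged in
# from disk on demand instead of living entirely in RAM.
# The array below is a small hypothetical stand-in for the real data.
arr = np.random.rand(100, 224, 224, 3).astype(np.float32)
fname = os.path.join(tempfile.mkdtemp(), 'trn_data.npy')
np.save(fname, arr)

mm = np.load(fname, mmap_mode='r')   # lazy: no full read into memory
batch = np.asarray(mm[0:32])         # materialize just one batch
```

A generator that yields such slices could then feed fit_generator without ever holding the full array in memory.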
Anybody else facing the same issue?