Memory issue at the beginning of lesson 3, needs 8 GB of memory?

EDIT :
For some strange reason, after a few minutes it started to work.
I would’ve deleted this post, but I can’t figure out how.

ORIGINAL POST :

Hi guys,

I’m struggling with a memory issue in lesson 3.
I’m running on my local PC (16 GB RAM) with a GTX 1060 6GB.

When I run the following code:
> trn_features = conv_model.predict_generator(batches, batches.nb_sample)

I get:

    MemoryError                               Traceback (most recent call last)
    <ipython-input-...> in <module>()
    ----> 1 trn_features = conv_model.predict_generator(batches, batches.nb_sample)

    D:\Anaconda2\lib\site-packages\keras\models.pyc in predict_generator(self, generator, val_samples, max_q_size, nb_worker, pickle_safe)
       1010                 max_q_size=max_q_size,
       1011                 nb_worker=nb_worker,
    -> 1012                 pickle_safe=pickle_safe)
       1013
       1014     def get_config(self):

    D:\Anaconda2\lib\site-packages\keras\engine\training.pyc in predict_generator(self, generator, val_samples, max_q_size, nb_worker, pickle_safe)
       1776                 for out in outs:
       1777                     shape = (val_samples,) + out.shape[1:]
    -> 1778                     all_outs.append(np.zeros(shape, dtype=K.floatx()))
       1779
       1780             for i, out in enumerate(outs):

    MemoryError:

  • Theoretically, if it were able to run, the output would have shape (23000, 512, 14, 14).
  • Each number in the array is a float32 (4 bytes).
  • That means this array needs about 8804.68 MB of memory (quick check below).
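
For reference, here is the back-of-the-envelope calculation behind that number (plain Python, nothing notebook-specific):

    # Size of the predicted feature array: 23000 samples of 512 x 14 x 14 float32
    n_floats = 23000 * 512 * 14 * 14      # number of elements in the array
    size_mb = n_floats * 4 / 1024.0 ** 2  # 4 bytes per float32
    print(size_mb)                        # ~8804.69 MB, i.e. roughly 8.6 GB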

I’m not sure where Theano puts this array:

  • If it stores it in RAM, I should have enough space.
  • But if it tries to store it in my GPU memory, there won’t be enough space.

Any suggestions? I assume many of you have encountered this issue.

This feels like it should work. How big are your batches?

To diagnose whether it’s main memory or GPU memory that’s running out, you could run nvidia-smi while the program is running.
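
For example, something along these lines (just a rough sketch, assuming nvidia-smi is on your PATH; run it in a second terminal or notebook while the prediction is going):

    import subprocess, time

    # Print GPU memory usage every few seconds while predict_generator runs
    for _ in range(12):
        out = subprocess.check_output(
            ['nvidia-smi', '--query-gpu=memory.used,memory.total', '--format=csv'])
        print(out.decode().strip())
        time.sleep(5)

If GPU memory stays flat while system memory climbs, it’s the numpy array in RAM that’s the problem.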

Generator functions don’t load the entire test or training set into GPU memory when they are invoked, so I doubt this is a GPU issue.

I believe you’re running out of system memory, not GPU memory.

Looking at predict_generator in training.py:

        if len(all_outs) == 0:
            for out in outs:
                shape = (val_samples,) + out.shape[1:]
                all_outs.append(np.zeros(shape, dtype=K.floatx()))

        for i, out in enumerate(outs):
            all_outs[i][processed_samples:(processed_samples + nb_samples)] = out
        processed_samples += nb_samples

all_outs is a list of numpy arrays pre-allocated to hold all val_samples samples. As you mentioned, to store all of these in memory you need about 8 GB of system memory.

You could run predict_generator twice and store the results to disk. I don’t know whether running fit twice with two datasets that are each half the size of the original dataset would be equivalent to running the entire dataset through for one epoch. It might be.
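
A rough sketch of the first part (this assumes Keras 1.x, that batches was created with shuffle=False, and that half the sample count is a multiple of your batch size, since the generator just keeps yielding batches from wherever it left off):

    import numpy as np

    # Predict the conv features in two halves so the full (23000, 512, 14, 14)
    # array never has to sit in RAM all at once; save each half to disk.
    half = batches.nb_sample // 2

    part1 = conv_model.predict_generator(batches, half)
    np.save('trn_features_part1.npy', part1)
    del part1  # free system memory before predicting the second half

    part2 = conv_model.predict_generator(batches, batches.nb_sample - half)
    np.save('trn_features_part2.npy', part2)
    del part2

You could then load the halves back with np.load (or memory-map them with mmap_mode='r') when fitting the dense layers.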

Hi, I’m having a similar issue with the beginning of Lesson 3. When I create the trn_features matrix, it’s so large that it takes up almost all of my 16 GB of system memory. When I then run the fc_model.fit block of code, I get an insufficient-memory error. Is there a way to batch the code for creating trn_features, or to compress the output so it doesn’t take up so much space before model.fit? I would’ve thought 16 GB would be plenty of memory, but apparently it’s not (at least the way the course Jupyter notebooks are set up). Thanks.
