Lesson 3 - Most GPU memory used up before fitting the model


(Manu) #1

Hi,

I’m trying to run Lesson 3 on my desktop computer, but I’m running into memory problems even before fitting the model. Is this expected? I was assuming that not much is sent to the GPU (maybe some initialization stuff) until you fit the model.

When I open the notebook, 236 MiB are already in use:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750 Ti  Off  | 00000000:01:00.0 Off |                  N/A |
| 34%   41C    P0     2W /  38W |    236MiB /  1997MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      8103      C   /home/manu/anaconda3/envs/py2/bin/python      55MiB |
+-----------------------------------------------------------------------------+

which is fine, I guess (note that some unrelated processes have been cut from the output). However, right before training the model with

fc_model.fit(trn_features, trn_labels, nb_epoch=8, 
             batch_size=batch_size, validation_data=(val_features, val_labels))

1263 MiB are already taken:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750 Ti  Off  | 00000000:01:00.0 Off |                  N/A |
| 34%   40C    P0     2W /  38W |   1263MiB /  1997MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      8103      C   /home/manu/anaconda3/envs/py2/bin/python      55MiB |
|    0      8759      C   /home/manu/anaconda3/envs/py2/bin/python    1027MiB |
+-----------------------------------------------------------------------------+

and when I try to fit the model I get

MemoryError: Error allocating 411041792 bytes of device memory (out of memory).

The funny thing is that I always get the same number of bytes in the error (392 MiB), no matter which batch size I choose (currently 8).
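A quick sanity check on that number, in case it helps whoever reads this. The sketch below only does arithmetic on the size in the error message. The layer shape in it is an assumption, not something read from the notebook: it is the shape of the first fully connected layer in VGG16 (7*7*512 inputs, 4096 outputs), stored as float32. If the model has a layer like that, the failed allocation would be a whole weight-sized array rather than a batch of data, which would explain why the number does not change with the batch size.

```python
# Size reported in the MemoryError above.
requested_bytes = 411041792

MIB = 1024 ** 2
requested_mib = float(requested_bytes) / MIB

# ASSUMPTION (not taken from the notebook): a dense layer with
# 7*7*512 = 25088 inputs and 4096 outputs, as in VGG16's first
# fully connected layer, stored as float32 (4 bytes per value).
n_inputs = 7 * 7 * 512
n_outputs = 4096
bytes_per_float32 = 4
dense_weight_bytes = n_inputs * n_outputs * bytes_per_float32

print(requested_mib)
print(dense_weight_bytes == requested_bytes)
```

Nothing in this calculation involves the batch size, so under that assumption a smaller batch would not shrink the request.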

Any clues?

Cheers.


#2

Hi Manu,

I am having a very similar issue. Did you come to any resolution on this problem?
Cheers!
Shahin


(Manu) #3

Hi there,

No, I didn’t manage to solve it. My conclusion was that it’s taking up a lot of memory before fitting because I’m loading the pre-trained weights from huge models like ResNet (or whatever, I don’t recall exactly which model we were using there). In the end, I had to skip that lesson (or that part of it) on my own computer.
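To put a rough number on that guess, here is a minimal sketch that estimates how much memory a set of float32 weight matrices needs. The helper `param_mib` and the layer shapes are hypothetical and chosen only for illustration; they are not read from the lesson notebook. The `copies=3` case assumes that training keeps, for each weight array, a gradient and one optimizer accumulator of the same size, which depends on the optimizer actually used.

```python
def param_mib(shapes, bytes_per_value=4, copies=1):
    """Return the MiB needed to hold `copies` arrays of each given shape."""
    total_values = 0
    for shape in shapes:
        n = 1
        for dim in shape:
            n *= dim
        total_values += n
    return float(total_values * bytes_per_value * copies) / (1024 ** 2)


# HYPOTHETICAL dense stack (weight matrices only, biases ignored).
dense_shapes = [(25088, 4096), (4096, 4096), (4096, 2)]

# Memory for the weights alone, as loaded before fitting.
weights_only = param_mib(dense_shapes)

# ASSUMPTION: weights + gradients + one optimizer accumulator per weight.
with_training_state = param_mib(dense_shapes, copies=3)

print(weights_only)
print(with_training_state)
```

On a 2 GB card, comparing those two numbers against the free memory reported by nvidia-smi gives a rough idea of whether the model can be trained at all, regardless of batch size.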

Cheers.