Memory Error issue

nima · November 22, 2016, 2:48am

I keep getting a MemoryError on sample data with a small batch size. I’m not sure why this is happening (this is a model I’m trying to train from scratch).

Here’s the full code: https://gist.github.com/nemo/a35e53343ea0d113b4ff87dae5a9a1af

Here’s the error:

MemoryError: ('Error allocating 7803502592 bytes of device memory (out of memory).', "you might consider using 'theano.shared(…, borrow=True)'")

Does anyone have thoughts on how to even debug this issue? Everything I’ve read points to batch_size, which I’ve reduced to 8, and it’s still crashing…
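One way to start debugging is to turn the byte count in the error into something comparable with layer sizes. The sketch below assumes float32 values (4 bytes each); the helper name `dense_weight_bytes` and the layer shapes are hypothetical and are not taken from the gist. They only show that a single large dense weight matrix can account for a request of this size regardless of batch size.

```python
def dense_weight_bytes(n_inputs, n_units, bytes_per_value=4):
    """Bytes needed for one dense weight matrix (float32 assumed by default)."""
    return n_inputs * n_units * bytes_per_value


requested = 7803502592  # byte count copied from the error message

gib = requested / 2**30
n_float32 = requested // 4
print(round(gib, 2))  # size of the request in GiB
print(n_float32)      # number of float32 values that would fit in it

# Hypothetical shapes: a flattened 128 x 61 x 61 feature map feeding 4096 units.
flattened = 128 * 61 * 61
matches = dense_weight_bytes(flattened, 4096) == requested
print(matches)
```

If a weight matrix like this is the cause, reducing batch_size would not help, because weight memory does not depend on the batch. Comparing the per-layer parameter counts of the model against the requested size is a reasonable next check.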

vshets · November 22, 2016, 5:44am

Is there any chance restarting the kernel would fix it? Also, are you running on an AWS instance or locally? I also noticed that your batch normalization does not have axis=1. Apparently it is important for n-channel input.
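To illustrate what the axis choice means, here is a NumPy-only sketch for channels-first input of shape (batch, channels, height, width). It is not Keras’s implementation, and the helper `batchnorm_stats` and the input shape are made up for the example. It only shows that with axis=1 the statistics are computed per channel, while with the last axis they are computed per column of the image.

```python
import numpy as np


def batchnorm_stats(x, axis):
    """Mean and variance per entry of `axis`, reduced over all other axes."""
    feature_axis = axis % x.ndim  # normalise negative axes such as -1
    reduce_axes = tuple(i for i in range(x.ndim) if i != feature_axis)
    return x.mean(axis=reduce_axes), x.var(axis=reduce_axes)


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3, 32, 32))  # hypothetical batch: 8 RGB images, channels first

mean_c, var_c = batchnorm_stats(x, axis=1)   # one statistic per channel
mean_w, var_w = batchnorm_stats(x, axis=-1)  # one statistic per image column

print(mean_c.shape)  # (3,)
print(mean_w.shape)  # (32,)
```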

nima · November 22, 2016, 6:43am

It was the axis=1 – thanks!

maciej · February 11, 2017, 1:20am

Similar thing here, just in lesson 1.

tiagomarques · February 26, 2017, 3:51pm

@maciej were you able to solve the problem? I am facing a similar memory issue.

maciej · February 26, 2017, 9:44pm

I had to switch to a bigger GPU; it looks like 2GB is the minimum for it.

erlapi · May 8, 2017, 6:58pm

Similar issue using a T2 instance.
Is there a way to actually run the code using a T2?

erlapi · May 8, 2017, 8:11pm

The answer is no with a t2.micro.

Just found this: "The free t2.micro instance type recommended in the lesson does not have enough RAM to use the VGG16 model."

Here: http://wiki.fast.ai/index.php/Lesson_1

rrwiyatn · July 6, 2017, 2:21pm

Hi,

I am trying to play around with other datasets and it seems that I have run into a memory problem. I run my code on my own PC, which has 32GB of RAM. When I open the system monitor after I run, let’s say,

x = 10*np.ones((50000,3,224,224))

I can see that it takes almost half of my RAM and won’t free it. The same thing happens when I open my dataset using bcolz, so when I open the rest of my dataset, I get the memory error message.
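As a rough check, the size of such an array can be estimated without allocating it. The sketch below assumes the shape from the line above; np.ones defaults to float64, and the other dtypes are shown only for comparison. The helper name `array_bytes` is made up for the example.

```python
import numpy as np


def array_bytes(shape, dtype):
    """Bytes a dense array of this shape and dtype would need (nothing is allocated)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * np.dtype(dtype).itemsize


shape = (50000, 3, 224, 224)
sizes = {name: array_bytes(shape, name) for name in ("float64", "float32", "uint8")}

for name, n_bytes in sizes.items():
    print(name, n_bytes / 1e9, "GB")
```

By this estimate the float64 array is about 60 GB, float32 about 30 GB and uint8 about 7.5 GB, so a smaller dtype or keeping the data on disk and loading it in chunks are the usual ways to stay within 32GB.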

Any suggestions?

Thanks!

marcodb · August 6, 2017, 6:23am

The lesson suggests switching to t2.large. Alternatively, you can create a swap file on t2.micro and use swap: https://stackoverflow.com/questions/17173972/how-do-you-add-swap-to-an-ec2-instance

For my t2.micro instance (Ubuntu 16.04 AMI), the minimum swap file size to make lesson 1 work is 2GB (with batch_size=1). With 1GB, as in the Stack Overflow example, you’ll still run out of memory.

Note that some comments on Stack Overflow talk about AWS charges for EBS I/O, so use at your own risk (I’ve only just done this today, so I don’t know yet whether I’ll be charged).
Personally, I created a script to add the swap partition on demand (without persisting it), and I reboot after I’m done using the swap (as swapoff won’t work)… just in case.
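For reference, an on-demand swap file along these lines can be sketched as follows. This is not the script mentioned above: the helper `swap_commands`, the /swapfile path and the 2048 MB default are assumptions for illustration. It only prints the usual Linux commands (dd, chmod, mkswap, swapon) so they can be reviewed before being run by hand. Nothing is added to /etc/fstab, so the swap does not persist across a reboot.

```python
def swap_commands(path="/swapfile", size_mb=2048):
    """Return the shell commands that create and enable a swap file (nothing is run)."""
    return [
        f"sudo dd if=/dev/zero of={path} bs=1M count={size_mb}",  # allocate the file
        f"sudo chmod 600 {path}",                                 # restrict access to root
        f"sudo mkswap {path}",                                    # format it as swap
        f"sudo swapon {path}",                                    # enable it until reboot
    ]


commands = swap_commands()
for cmd in commands:
    print(cmd)
```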

Simmo · August 6, 2017, 11:54pm

If anyone wants to try out their environment using t2.micro, I found you can run the Mushroom Classification training without running out of memory: https://www.kaggle.com/giuseppemerendino/deep-mushroom-keras-t-sne/notebook. It’s a very small data set, but good for validating that your environment is set up correctly.