Get_data function running out of memory on training data in lesson 2

Hello all, I’m going through this MOOC on a laptop with a GTX 960 GPU and 16GB of CPU RAM. I was able to get through lesson 1 by setting the batch size down to 10, but hit a wall in lesson 2 when running the get_data helper function on the training set. I could run get_data on the validation set, but it takes up about 5GB of RAM. I went ahead and used bcolz to save the validation data and deleted the variable to free up some memory, but the training data still uses up all 16GB of RAM plus 8GB of swap before my computer hangs and I have to REISUB and restart.
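(The bcolz round-trip I used looks roughly like this, with the save_array/load_array helpers from the course’s utils.py - the path here is just an example:)

val_data = get_data(path+'valid')
save_array(path+'results/val_data.bc', val_data)   # bcolz-backed save from utils.py
del val_data                                        # free the ~5GB numpy array
# later, reload it without recomputing:
val_data = load_array(path+'results/val_data.bc')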

Looking at htop, I see that only one of the four cores in my computer is working at 100% when running the get_data function. Based on this thread on Numpy core affinity, I thought it might be an issue with importing Numpy and Scipy messing with core affinity, but that issue was supposedly resolved in newer versions, and I have the newest versions of Scipy and Numpy installed.

16GB isn’t enough to use get_data() for this data set, so just use fit_generator on the batches instead (get_data is simply a little time-saver for pre-calculating the resized images and keeping them in RAM).
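(For context, a get_data-style helper does roughly the following - it streams the images one at a time and concatenates them into a single big numpy array, so the whole resized training set has to fit in RAM at once. This is a sketch assuming the notebook’s get_batches and numpy imports, not the exact utils.py source.)

def get_data_sketch(path, target_size=(224, 224)):
    # one image per batch, no labels, resized to target_size
    batches = get_batches(path, shuffle=False, batch_size=1,
                          class_mode=None, target_size=target_size)
    # stack everything into one (num_images, 3, 224, 224) float32 array in RAM
    return np.concatenate([next(batches) for i in range(len(batches.filenames))])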

3 Likes

Thanks for your reply. I was able to fit the model using just the batches; however I ran into an issue when evaluating the model on validation data.

This line model.evaluate(val_batches, val_labels)
gives me the error
Exception: Error when checking model input: data should be a Numpy array, or list/dict of Numpy arrays. Found: <keras.preprocessing.image.DirectoryIterator object at 0x7fba7d106390>.

It looks like get_batches() returns an iterator, unlike load_array() which returns a numpy array.

1 Like

@clu2033, you could try using fit_generator then
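With fit_generator you can pass the validation iterator straight in as validation_data, so no numpy array is ever needed; and if you only want the validation score, Keras (the 1.x API used in the course) also has evaluate_generator, which accepts the iterator directly. A minimal sketch, assuming batches and val_batches come from get_batches as in the notebook:

# train straight from the iterators - no get_data() arrays needed
model.fit_generator(batches, samples_per_epoch=len(batches.filenames), nb_epoch=1,
                    validation_data=val_batches, nb_val_samples=len(val_batches.filenames))

# or evaluate only, if your Keras version has evaluate_generator
results = model.evaluate_generator(val_batches, val_samples=len(val_batches.filenames))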

4 Likes

@mattobrien415 This was very helpful for me. I thank you and Jeremy both for the help.

1 Like

I think this is similar to the error I’m having in notebook 2, so I’m a little confused about where to implement fit_generator. Is it going to be directly on trn_data from this line of code:
trn_data = get_batches(path+'train', shuffle=False, batch_size=1, class_mode=None, target_size=(224,224))
If so, what are the parameters we need to pass in? I appreciate your help.

1 Like

Search the github repo for ‘fit_generator’ for examples of how we use it in the lessons.

2 Likes

@jeremy Hi Jeremy
I saw your suggestion regarding the fit_generator function.
I’ve tried it, but I think I am doing something wrong - something is totally missing in my understanding.
Now I am getting dimension mismatch errors (the errors and notebook are here).
I am running it on a p2.xlarge on AWS.
Does everyone have the memory issue on it, or is something wrong with my server setup?

To my limited understanding (correct me if I’m wrong):

fit_generator is a replacement for the model’s fit function. Some of the arguments have been moved around or replaced because it gets its data from a generator instead of being handed the data directly.

So you can use it something like this:

BATCH_SIZE = 64
CATEGORIES = ['cat', 'dog']
TARGET_SIZE = (224, 224)

# a plain generator (no augmentation); T_DIR and V_DIR are the train/valid folders
gen = ImageDataGenerator()

t_batches = gen.flow_from_directory(T_DIR, target_size=TARGET_SIZE, batch_size=BATCH_SIZE)
v_batches = gen.flow_from_directory(V_DIR, target_size=TARGET_SIZE, batch_size=BATCH_SIZE)

# samples_per_epoch / nb_val_samples tell Keras how many images make up one pass
model.fit_generator(t_batches, samples_per_epoch=t_batches.n, nb_epoch=5,
                    validation_data=v_batches, nb_val_samples=v_batches.n)

Then you should get your normal output:

Found 23000 images belonging to 2 classes.
Found 2000 images belonging to 2 classes.
Epoch 1/5
23000/23000 [==============================] - 246s - loss: 0.5111 - acc: 0.8000 - val_loss: 0.2962 - val_acc: 0.8810

etc.

Thank you @telesphore, I have tried that but I am running into the same error.
So there must be something fundamentally wrong with how I pass this data - still trying to figure it out :frowning:

@jeremy
Hi Jeremy,
To be honest I am still struggling with this and could not figure it out :frowning:
Even though I tried what @telesphore suggested above, I am still running into the same error:

Exception: Error when checking model input: expected dense_input_2 to have 2 dimensions, but got array with shape (1, 3, 224, 224)

I’d love to figure it out, but looks like I need some help with it. Much appreciated!

@katya The linear model is supposed to go from a vector of length 1000 (the predictions of the vgg imagenet model) to a vector of length 2 (the cat and dog categories).
But instead you are feeding it the “batches” from the image preprocessor, so you are giving the lm model an input of shape (1, 3, 224, 224), which is in fact the input of the vgg16 model.
Basically, what you are expected to do is:
IMAGES (3,224,224) --pretrained vgg16--> vector of 1000 probabilities, one per imagenet category
then, once you have these predictions, use a linear model to go:
1000-vector --lm.fit--> vector of 2 probabilities, one for cat and one for dog

but what you are incorrectly doing is:
IMAGES (3,224,224) --lm--> vector of 2 probabilities for cat and dog
Since lm expects a 1000-vector (as defined in your line of code: lm = Sequential([Dense(2, activation='softmax', input_shape=(1000,))])), you get the error message.
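A minimal sketch of that two-step pipeline (assuming the lesson 2 notebook’s setup: model is the full 1000-way vgg16 network, get_batches and the keras imports are in scope, and trn_labels holds the one-hot training labels):

# step 1: run the pretrained vgg16 model over the training images to get
# a (num_images, 1000) array of imagenet probabilities
batches = get_batches(path+'train', shuffle=False, batch_size=batch_size, class_mode=None)
trn_features = model.predict_generator(batches, val_samples=len(batches.filenames))

# step 2: fit the small linear model that maps 1000 imagenet probabilities
# to the 2 target classes
lm = Sequential([Dense(2, activation='softmax', input_shape=(1000,))])
lm.compile(optimizer=RMSprop(lr=0.1), loss='categorical_crossentropy', metrics=['accuracy'])
lm.fit(trn_features, trn_labels, batch_size=batch_size, nb_epoch=3)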

2 Likes

Thank you so much for this explanation @Gelu74

I also have 16GB and had an issue with get_data(), but I didn’t want to deviate too much from the lesson’s notebook by using fit_generator. So instead, I built up the predictions on the training images with a for loop. Here is my code:

# first batch initializes the features array (note: this loop assumes batches
# was created with batch_size=1, otherwise it will over-iterate)
imgs,_ = next(batches)
trn_features = model.predict(imgs, batch_size=batch_size)

# stack the predictions for the remaining images
for i in range(len(batches.filenames) - 1):
    imgs,_ = next(batches)
    pred_new = model.predict(imgs, batch_size=batch_size)
    trn_features = np.vstack((trn_features, pred_new))

3 Likes

Thanks for the tip - this definitely helped, but it took about an hour to build the trn_features array. I think something weird might be going on, because trn_features.shape returned (184000, 1000) instead of (23000, 1000). So when I run lm.fit, there’s a mismatch between the inputs and targets. Anyone else run into this issue?

I think you have a mismatch between the batch size used to create batches and the batch size the loop assumes, causing you to execute the for loop 8 times more than you should (184,000 results instead of 23,000).
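In other words, the loop should run once per batch rather than once per filename. A sketch of a fix keeping the same variable names (it assumes batches exposes batch_size, as the Keras DirectoryIterator does, and was created with shuffle=False so features line up with labels):

import math

# number of batches needed to cover every file exactly once
n_batches = int(math.ceil(len(batches.filenames) / float(batches.batch_size)))
preds = []
for i in range(n_batches):
    imgs, _ = next(batches)
    preds.append(model.predict(imgs, batch_size=batch_size))
trn_features = np.concatenate(preds)    # shape: (23000, 1000)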

Thanks - I noticed this perfect 8x multiplication also. It took a little bit of digging, but I fixed it! I probably should’ve changed one thing at a time so I’d know where the problem was, but I didn’t want to wait, so I changed any batch_size references (within reason) from 8 to 1. Thanks for the help. It still took a while to load all the images, but once loaded, the model trained insanely quickly! This lesson has had me stuck for a while, but at least I’m learning and getting more familiar with the programs as I go along.

You’re welcome! Good luck with the rest of the course - this is the best MOOC I’ve ever taken.

How can I know how much memory I need for this dataset? Are there methods for computing this?
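A rough way to estimate it: the array that get_data() builds needs about num_images × channels × height × width × 4 bytes (float32). For the 23,000 dogs-and-cats training images at 3×224×224 that is already close to 13GB before any temporary copies, which is presumably why 16GB plus swap wasn’t enough:

num_images = 23000
bytes_needed = num_images * 3 * 224 * 224 * 4   # float32 = 4 bytes per value
print(bytes_needed / 2.0**30)                   # ~12.9 GiB for the raw array alone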

Jeremy suggested using fit_generator and get_batches to replace get_data when people have the RAM issue. Can fit_generator and get_batches be used with read_csv for multi-label classification? Is there an example of that? (I did search the github; most of the examples I see are not multi-label.)