I am training models with k-fold cross validation (5 folds), using TensorFlow as the backend.
Every time the program starts to train the last model, Keras complains that it is running out of memory. I call gc.collect() after every model is trained. Any idea how to release the GPU memory occupied by Keras?
for i, (train, validate) in enumerate(skf):
    model, im_dim = mc.generate_model(parsed_json["keras_model"],
                                      parsed_json["custom_model"],
                                      parsed_json["top_model_index"],
                                      parsed_json["learning_rate"])
    training_data = _generate_train_data(im_dim, parsed_json["train_dir"],
                                         int(parsed_json["train_batch_size"]))
    testing_data = _generate_train_data(im_dim, parsed_json["val_dir"],
                                        int(parsed_json["test_batch_size"]))
    checkpoint = ModelCheckpoint(parsed_json["trained_model_name"] + "cross" + str(index) + ".h5",
                                 monitor='val_loss', verbose=0, save_best_only=True,
                                 save_weights_only=False, mode='auto', period=1)
    index = index + 1
    callback_list = [checkpoint]
    start_time = time.time()
    training_history = model.fit_generator(training_data,
                                           samples_per_epoch=len(training_data.classes),
                                           nb_epoch=int(parsed_json["nb_epoch"]),
                                           validation_data=testing_data,
                                           nb_val_samples=parsed_json["nb_val_samples"],
                                           callbacks=callback_list)
    gc.collect()  # cross my fingers and hope gc can help me release memory
Same model, nothing else changed. To help gc release it, I added one more line before calling gc.collect():
model = None
gc.collect()
Is there any recommended way to force Python to release the memory? I searched on Google; some posts say gc.collect() will force Python's garbage collector to release unreferenced memory, but it does not work in this case. I wonder what is happening. Does Keras allocate memory globally?
I was having exactly the same issue. I solved it by also deleting the training_history variable. Here is my piece of code that does not leak memory. It looks like training_history keeps a pointer to GPU memory.
for fold, (train_clusters_index, valid_clusters_index) in enumerate(kf):
    model = create_model(TRAINABLE_VGG, include_top=INCLUDE_TOP)
    train_fold_id = folds.index[folds.isin(clusters[train_clusters_index])]
    valid_fold_id = folds.index[folds.isin(clusters[valid_clusters_index])]
    train_index = np.where(np.in1d(train_id, train_fold_id))[0]
    valid_index = np.where(np.in1d(train_id, valid_fold_id))[0]
    if SAMPLE < 1:
        train_index = subsample(train_index, SAMPLE)
        valid_index = subsample(valid_index, SAMPLE)
    X_train = train_data[train_index]
    Y_train = train_target[train_index]
    X_valid = train_data[valid_index]
    Y_valid = train_target[valid_index]

    base_model = Y_train.mean(axis=0)
    train_baseline = log_loss(Y_train, np.tile(base_model, len(Y_train)).reshape(-1, len(base_model)))
    valid_baseline = log_loss(Y_valid, np.tile(base_model, len(Y_valid)).reshape(-1, len(base_model)))

    num_fold += 1
    print('Start KFold number {} from {}'.format(num_fold, num_folds))
    print('Split train:', len(X_train), 'Baseline: %7.5f' % train_baseline)
    print('Split valid:', len(X_valid), 'Baseline: %7.5f' % valid_baseline)

    model_file = join(MODEL_FOLDER, 'nn_fold_%02d.model' % fold)
    callbacks = [
        EarlyStopping(monitor='val_loss', patience=5, verbose=0),
        ModelCheckpoint(model_file, save_best_only=True, save_weights_only=True),
    ]

    # Fit the model on batches with real-time data augmentation
    history = model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size),
                                  samples_per_epoch=len(X_train), nb_epoch=nb_epoch,
                                  verbose=2,
                                  validation_data=(X_valid, Y_valid),
                                  callbacks=callbacks)
    all_history.append(history)

    # Load best epoch
    model.load_weights(model_file)

    predictions_valid = model.predict(X_valid, batch_size=batch_size, verbose=2)
    score = log_loss(Y_valid, predictions_valid)
    print('Score log_loss: ', score)
    sum_score += score * len(valid_index)

    # Store valid predictions
    for i in range(len(valid_index)):
        yfull_train[valid_index[i]] = predictions_valid[i]

    print('Start test KFold number {} from {}'.format(num_fold, num_folds))
    test_prediction = model.predict(test_data, batch_size=batch_size, verbose=2)
    yfull_test.append(test_prediction)

    # Deleting the history (and the model) is what releases the GPU memory
    del history
    del model
    gc.collect()

score = sum_score / len(train_data)
print("Log_loss train independent avg: ", score)
I had a similar issue with Theano. I am not sure how it relates to TensorFlow, but it could be something similar.
I had the problem when using the libgpuarray backend. When I changed the device configuration in .theanorc from cuda to gpu, Keras and Theano released memory when I called gc.collect().
Someone said elsewhere that Theano can be slow to release the references and recommended calling gc.collect() a couple of times.
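For reference, a minimal sketch of that device change, assuming you set it through the THEANO_FLAGS environment variable rather than editing .theanorc (the flag values are standard Theano settings, not taken from the post above):

import os

# Select the old 'gpu' CUDA backend instead of the libgpuarray 'cuda'
# backend; the flags must be set before Theano (and Keras) are imported.
os.environ["THEANO_FLAGS"] = "device=gpu,floatX=float32"

import theano  # imported after the flags are set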
I ran into similar problems. I found that deleting the model variable and calling gc.collect() helped. The behaviour of GPU memory getting freed still seems somewhat random. At times I had to call gc.collect() 12-15 times to free the memory.
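A rough sketch of that pattern, assuming model is the Keras model from the previous fold (the loop bound of 15 is arbitrary, taken from the observation above):

import gc

del model  # drop the last reference to the previous fold's Keras model
for _ in range(15):
    # gc.collect() returns the number of objects it collected;
    # stop as soon as a pass finds nothing left to free.
    if gc.collect() == 0:
        break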
Yadu, first don't add any K.clear_session() call; just train your model. Then, when your training process is finished and you are going to build a new model, run K.clear_session() to clear the memory. Alternatively, you can give each model its own graph:
import tensorflow as tf

graph1 = tf.Graph()
with graph1.as_default():
    model1 = ...  # load/define the first model inside graph1

graph2 = tf.Graph()
with graph2.as_default():
    model2 = ...  # load/define the second model inside graph2

with graph1.as_default():
    model1.predict(X)  # or training

with graph2.as_default():
    model2.predict(X)  # or training
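For the cross-validation loop in the original question, the simpler variant of this advice is to clear the session between folds. A minimal sketch, using a hypothetical tiny stand-in model and random data rather than the poster's actual setup:

import gc
import numpy as np
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

X = np.random.rand(256, 20)
y = np.random.randint(0, 2, size=(256, 1))

for fold in range(5):
    # Hypothetical tiny model standing in for the real per-fold model.
    model = Sequential([Dense(32, activation='relu', input_dim=20),
                        Dense(1, activation='sigmoid')])
    model.compile(optimizer='adam', loss='binary_crossentropy')
    model.fit(X, y, epochs=1, batch_size=32, verbose=0)
    # ... evaluate / save weights for this fold here ...
    del model
    K.clear_session()  # drop the TF graph Keras built for this fold
    gc.collect()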
This code helped me get past the problem of GPU memory not being released after the process is over. Run it at the start of your program. Thanks.
I had the same problem. I created this function, which works every time now. It releases the GPU memory. I call it before each run.
import gc
import tensorflow
from keras.backend.tensorflow_backend import set_session
from keras.backend.tensorflow_backend import clear_session
from keras.backend.tensorflow_backend import get_session

# Reset the Keras session and release GPU memory
def reset_keras():
    sess = get_session()
    clear_session()
    sess.close()
    sess = get_session()

    try:
        del classifier  # this is from global space - change this as you need
    except NameError:
        pass

    print(gc.collect())  # if it's done something you should see a number being printed

    # use the same config as you used to create the session
    config = tensorflow.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 1
    config.gpu_options.visible_device_list = "0"
    set_session(tensorflow.Session(config=config))
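A usage sketch, with a hypothetical tiny classifier and random data standing in for the real model: call reset_keras() before each run so the previous session is closed and a fresh one is created first.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

X = np.random.rand(128, 10)
y = np.random.randint(0, 2, size=(128, 1))

for run in range(3):
    reset_keras()  # closes the previous session, collects garbage, opens a fresh one
    # 'classifier' is the global name referenced inside reset_keras()
    classifier = Sequential([Dense(16, activation='relu', input_dim=10),
                             Dense(1, activation='sigmoid')])
    classifier.compile(optimizer='adam', loss='binary_crossentropy')
    classifier.fit(X, y, epochs=1, batch_size=16, verbose=0)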
I had the same issue. For me, the solution was to del the training input array, the test input array, and the model, then clear the Keras session and call gc.collect().
import gc
from keras import backend as bek

del X      # training input array
del XT     # test input array
del model  # the trained Keras model

bek.clear_session()
gc.collect()