Low performance on GPU

Hi,

I am trying to run my biLSTM model on a DGX Station. I installed the latest Miniconda (4.3.30) on my user account and created a new conda environment as follows:

name: tfGPU
dependencies:

  • python=3.5
  • keras=2.1.2
  • tensorflow-gpu=1.3.0
  • mkl=11.3.3
  • pip:
    • flask==0.12.2
    • requests==2.18.4
    • numpy==1.13.3
    • pandas==0.20.3
    • scikit-learn==0.19.0
    • nltk==3.2.4
    • fuzzywuzzy==0.11.0
    • pymongo==3.5.1
    • gensim==3.0.0
    • h5py==2.7.1
    • pyhdb==0.3.3
    • bs4==4.4.0
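
Before running the model, it may be worth confirming that TensorFlow inside the tfGPU environment actually sees the DGX GPUs. Below is a minimal check, assuming the standard TF 1.x device-listing API (the file name check_gpu.py is just a placeholder):

# check_gpu.py -- list the devices TensorFlow can use inside tfGPU
import tensorflow as tf
from tensorflow.python.client import device_lib

# GPUs appear with device_type 'GPU'; if only CPU devices show up,
# tensorflow-gpu is not picking up CUDA/cuDNN in this environment
for d in device_lib.list_local_devices():
    print(d.device_type, d.name)

# Optionally log where each op is placed at runtime
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))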

I tested the simple mnist_cnn.py example (available online) to make sure everything is OK, and it runs fine on the GPU.
However, when I try my own model, which is a simple BiLSTM, it does not converge and does not reproduce the results I get on the CPU. It is very weird, and it seems some libraries are not working correctly.

# BiLSTM model (Keras 2.x functional API)
from keras.layers import Bidirectional, LSTM, Dense
from keras.models import Model
from keras.optimizers import Adam

# Bidirectional LSTM over the embedded input sequence
l_lstm = Bidirectional(LSTM(EMBEDDING_DIM))(embedded_sequences)
# One softmax unit per label
preds = Dense(labels.shape[1], activation='softmax')(l_lstm)
model = Model(sequence_input, preds)
adam = Adam(lr=0.0005, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
model.compile(loss='categorical_crossentropy',
              optimizer=adam,
              metrics=['acc'])

print("model fitting - Bidirectional LSTM")
model.summary()
# Keras 2 renamed `nb_epoch` to `epochs`
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=n_epoch, batch_size=batch_s)
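
To check whether the divergence is specific to the GPU build, one option is to run the identical script with the GPU hidden, inside the same tfGPU environment. This is only a sketch; hiding devices through CUDA_VISIBLE_DEVICES is an assumption about how to compare, not something from the original run:

# Hide all GPUs *before* TensorFlow/Keras are imported, so the same
# code falls back to CPU kernels inside the tfGPU environment
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

import tensorflow as tf  # imported after the variable is set
# ... build and fit the BiLSTM exactly as above; if it converges here
# but not with the GPU visible, the problem is in the GPU/cuDNN stack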

I would like to know if anyone has encountered this issue while running code on the GPU.

Best,
Nazanin.

Have you tried monitoring GPU utilization while your code is running?
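
For example, polling nvidia-smi from a small script while training runs shows whether the GPU is actually busy (a rough sketch; the query fields are standard nvidia-smi options, and monitor_gpu.py is just a placeholder name):

# monitor_gpu.py -- print GPU utilization and memory use once per second
import subprocess
import time

while True:
    out = subprocess.check_output(
        ['nvidia-smi',
         '--query-gpu=index,utilization.gpu,memory.used',
         '--format=csv,noheader'])
    print(out.decode().strip())
    time.sleep(1)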


Yeah I checked that for sure.

The issue was solved after downgrading some libraries; it was a library incompatibility issue, as I guessed. The relevant packages were:
mkl=11.3.3
python3.5-gdbm