Low performance on GPU

Hi,

I am trying to run my biLSTM model on a DGX Station. I installed the latest Miniconda (4.3.30) on my user account and created a new conda environment as follows:

name: tfGPU
dependencies:

  • python=3.5
  • keras=2.1.2
  • tensorflow-gpu=1.3.0
  • mkl=11.3.3
  • pip:
    • flask==0.12.2
    • requests==2.18.4
    • numpy==1.13.3
    • pandas==0.20.3
    • scikit-learn==0.19.0
    • nltk==3.2.4
    • fuzzywuzzy==0.11.0
    • pymongo==3.5.1
    • gensim==3.0.0
    • h5py==2.7.1
    • pyhdb==0.3.3
    • bs4==4.4.0
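
Before running the model, it may be worth confirming that TensorFlow inside the tfGPU environment actually sees the DGX GPUs. Below is a minimal check, assuming the standard TF 1.x device-listing API (the file name check_gpu.py is just a placeholder):

# check_gpu.py -- list the devices TensorFlow can use inside tfGPU
import tensorflow as tf
from tensorflow.python.client import device_lib

# GPUs appear with device_type 'GPU'; if only CPU devices show up,
# tensorflow-gpu is not picking up CUDA/cuDNN in this environment
for d in device_lib.list_local_devices():
    print(d.device_type, d.name)

# Optionally log where each op is placed at runtime
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))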

I tested the simple mnist_cnn.py example (available online) to make sure everything is OK, and it runs fine on the GPU.
However, when I try my own model, which is a simple BiLSTM, it does not converge and does not reproduce the results I get on the CPU. It is very weird, and it seems some libraries are not working correctly.

# BiLSTM model (Keras 2.x functional API)
from keras.layers import Bidirectional, LSTM, Dense
from keras.models import Model
from keras.optimizers import Adam

# Bidirectional LSTM over the embedded input sequence
l_lstm = Bidirectional(LSTM(EMBEDDING_DIM))(embedded_sequences)
# One softmax unit per label
preds = Dense(labels.shape[1], activation='softmax')(l_lstm)
model = Model(sequence_input, preds)
adam = Adam(lr=0.0005, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
model.compile(loss='categorical_crossentropy',
              optimizer=adam,
              metrics=['acc'])

print("model fitting - Bidirectional LSTM")
model.summary()
# Keras 2 renamed `nb_epoch` to `epochs`
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=n_epoch, batch_size=batch_s)
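
To check whether the divergence is specific to the GPU build, one option is to run the identical script with the GPU hidden, inside the same tfGPU environment. This is only a sketch; hiding devices through CUDA_VISIBLE_DEVICES is an assumption about how to compare, not something from the original run:

# Hide all GPUs *before* TensorFlow/Keras are imported, so the same
# code falls back to CPU kernels inside the tfGPU environment
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

import tensorflow as tf  # imported after the variable is set
# ... build and fit the BiLSTM exactly as above; if it converges here
# but not with the GPU visible, the problem is in the GPU/cuDNN stack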

I would like to know if anyone has encountered this issue while running code on the GPU.

Best,
Nazanin.

Have you tried monitoring GPU utilization while your code is running?
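
For example, polling nvidia-smi from a small script while training runs shows whether the GPU is actually busy (a rough sketch; the query fields are standard nvidia-smi options, and monitor_gpu.py is just a placeholder name):

# monitor_gpu.py -- print GPU utilization and memory use once per second
import subprocess
import time

while True:
    out = subprocess.check_output(
        ['nvidia-smi',
         '--query-gpu=index,utilization.gpu,memory.used',
         '--format=csv,noheader'])
    print(out.decode().strip())
    time.sleep(1)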


Yeah I checked that for sure.

The issue was solved after downgrading some libraries; it was a library incompatibility issue, as I guessed. The relevant packages were:
mkl=11.3.3
python3.5-gdbm