Choosing the Best Model Using Random Hyperparameter Optimization

I am using LSTM models on time series data. Essentially, I am trying to predict the next bit in a data stream, where the input is the packet payload (up to 64 bits).
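
Roughly, each model is built like this (a simplified tf.keras sketch, not my exact code; build_model is a placeholder name, and the layer counts, sizes, and activations come from the sampled config dict shown further down):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

def build_model(c):
    model = Sequential()
    # Stacked LSTM layers over windows of shape (num_steps, input_dim)
    for i in range(c['num_rnn_layer']):
        kwargs = {'input_shape': (c['num_steps'], c['input_dim'])} if i == 0 else {}
        model.add(LSTM(c['num_rnn_unit'],
                       activation=c['rnn_activation'],
                       return_sequences=(i < c['num_rnn_layer'] - 1),
                       **kwargs))
    # Dense hidden layers on top of the RNN output
    for _ in range(c['num_hidden_layer']):
        model.add(Dense(c['num_hidden_unit'], activation=c['hidden_activation']))
        model.add(Dropout(c['dropout_hidden']))
    # One sigmoid unit for the predicted next bit
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer=c['optimizer'], loss=c['loss'], metrics=['accuracy'])
    return model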

To find a good set of hyperparameters, I use a function that randomly generates the parameters (see the function below) and just build/train around 20 models, roughly as in the sketch that follows. Some of the parameters aren't relevant; I basically wanted to determine how many hidden layers and hidden units to use, which loss function, which activation, etc.
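
The search loop itself is basically just this (simplified; x_train / y_train / x_val / y_val stand in for my data loading, and build_model is the sketch above):

from tensorflow.keras.callbacks import EarlyStopping

results = []
for i in range(20):
    c = generate_random_parameters(pid=i)
    model = build_model(c)
    # Train with early stopping on validation loss, per the sampled config
    history = model.fit(x_train, y_train,
                        validation_data=(x_val, y_val),
                        epochs=c['num_epochs'],
                        batch_size=c['batch_size'],
                        callbacks=[EarlyStopping(patience=c['patience'])],
                        verbose=0)
    results.append((c, model, history.history))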

But now that I am done training these 20 models, I'm not 100% sure how to pick the best one.
I plotted the training/validation losses for each model to see which one performed best, and I found one that looks really strong: its validation loss is 0.10 and its training loss is 0.09, the lowest of any of the models. However, when I dug deeper, that model had a training accuracy of 0.05 and a validation accuracy of 0.03. Is this model still a good pick?
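
One thing I'm wondering: since the models were trained with different loss functions (binary_crossentropy vs. MAE vs. MSE), their raw loss values probably aren't directly comparable, so maybe I should score all of them with the same metric on the validation set. Something like this (a rough sketch, reusing the results list from the loop above):

import numpy as np

def bit_accuracy(model, x, y):
    # Fraction of bits predicted correctly; assumes y has the same
    # shape as the model output, e.g. (n, 1) of 0/1 labels.
    preds = (model.predict(x) > 0.5).astype(np.int32)
    return float(np.mean(preds == y))

# Rank all 20 models by the same validation metric, regardless of
# which loss each one was trained with
scores = sorted(((c['pid'], bit_accuracy(m, x_val, y_val)) for c, m, _ in results),
                key=lambda t: t[1], reverse=True)
print(scores[:5])  # top five configs by validation bit accuracy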

I am kind of stuck at this point and not sure how to proceed. Any suggestions or advice? Thanks!

import numpy as np


def generate_random_parameters(pid=1):
    """
    Produces a set of randomized parameters, within the constraints I've
    defined for this problem.
    :return: dict of configuration parameters for a single model
    """
    num_hidden_layer_range = [1, 2, 3, 4]
    num_hidden_units_range = [16, 32, 64, 128]
    hidden_activation_range = ['relu', 'sigmoid']

    rnn_type_range = ['lstm']
    num_rnn_layer_range = [1, 2, 3, 4]
    num_rnn_units_range = [16, 32, 64, 128]
    rnn_activation_range = ['relu', 'sigmoid']

    num_steps_range = [10, 20]
    batch_size_range = [64]
    dropout_range = np.array([0.2, 0.5], dtype=np.float32)

    loss_types = ['binary_crossentropy', 'mean_absolute_error', 'mean_squared_error']

    optimizer_range = ['adam']

    c = {'version': 1.0,
         'base_dir': '../models/rnn{}'.format(pid),
         'input_dim': 64,
         'pid': pid,  # changed from 0 to 5 for testing purposes
         'remove_const_ids': True,  # Original: True  # np.random.choice(remove_constant_ids_range),

         'num_hidden_layer': np.random.choice(num_hidden_layer_range),
         'num_hidden_unit': np.random.choice(num_hidden_units_range),
         'hidden_activation': np.random.choice(hidden_activation_range),

         'rnn_type': np.random.choice(rnn_type_range),
         'num_rnn_layer': np.random.choice(num_rnn_layer_range),
         'num_rnn_unit': np.random.choice(num_rnn_units_range),
         'rnn_activation': np.random.choice(rnn_activation_range),
         'rnn_implementation': 0,

         # Training parameters
         'num_steps': np.random.choice(num_steps_range),
         'batch_size': np.random.choice(batch_size_range),
         'dropout_input': np.random.choice(dropout_range),
         'dropout_hidden': np.random.choice(dropout_range),
         'dropout_rnn_input': np.random.choice(dropout_range),
         'dropout_rnn_recurrent': np.random.choice(dropout_range),
         'optimizer': np.random.choice(optimizer_range),
         'loss': np.random.choice(loss_types),
         'num_epochs': 100,  # Original: 1000
         'train_file_idx': range(0, 3),
         'val_file_idx': range(3, 5),
         'patience': 3,
         'shuffle_epochs': True,
         'consume_less': 'gpu',
         'batch_normalization': True,
         'high_rate_data_only': True,
         }
    return c
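
For reference, this is how I call it (seeding NumPy first when I want reproducible draws; the seed value here is arbitrary):

np.random.seed(1234)  # make the sampled configs reproducible across runs
config = generate_random_parameters(pid=1)
print(config['num_rnn_layer'], config['num_rnn_unit'],
      config['hidden_activation'], config['loss'])

Note that the four dropout entries are each sampled independently from dropout_range, so a single config can mix 0.2 and 0.5 rates across the input, hidden, and recurrent connections.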