I am using LSTM models on time series data - essentially, I am trying to predict the next bit in a data stream where the input is the packet payload (up to 64 bits).
To search for a good set of hyperparameters, I use a function that randomly generates the parameters (see the function below) and then build/train around 20 models. Some of the parameters aren't relevant, but I basically want to determine how many hidden layers, how many hidden units, which loss function, which activation, etc.
But now that I have finished training these 20 models, I'm not 100% sure how to pick the best one.
I plotted the training/validation losses for each model to see which one is best, and I found one that looked really strong: its validation loss is 0.10 and its training loss is 0.09, the lowest of any of the models. However, when I dug deeper, that same model had a training accuracy of 0.05 and a validation accuracy of 0.03. Is this model still a good pick?
I am kind of stuck at this point and not sure how to proceed. Any suggestions or advice? Thanks!
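One thing I noticed while writing this up: since the loss function is itself one of the randomized parameters, the raw loss values may not be comparable across models (a 0.10 mean_absolute_error is not the same thing as a 0.10 binary_crossentropy). And for a binary target, accuracy can look bad even when the loss is small. Here is a tiny NumPy sketch of that (`bit_accuracy` is my own helper for illustration, not the Keras metric):

```python
import numpy as np

def bit_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of bits predicted correctly after thresholding sigmoid outputs."""
    return float(np.mean((y_pred >= threshold).astype(int) == y_true))

# Toy example: 5 target bits and a model's sigmoid outputs for them.
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0.1, 0.8, 0.6, 0.3, 0.9])

print(bit_accuracy(y_true, y_pred))             # every bit lands on the right side of 0.5 -> 1.0
print(float(np.mean(np.abs(y_true - y_pred))))  # mean_absolute_error, about 0.22
```

So a small MAE and a high (or low) accuracy measure quite different things, which makes me doubt that comparing the 20 models purely on their own loss values is meaningful.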
import numpy as np


def generate_random_parameters(pid=1):
    """
    Produces a set of randomized parameters, within the constraints I've
    defined for this problem.
    :return: dict of model/training parameters for a single run
    """
    num_hidden_layer_range = [1, 2, 3, 4]
    num_hidden_units_range = [16, 32, 64, 128]
    hidden_activation_range = ['relu', 'sigmoid']
    rnn_type_range = ['lstm']
    num_rnn_layer_range = [1, 2, 3, 4]
    num_rnn_units_range = [16, 32, 64, 128]
    rnn_activation_range = ['relu', 'sigmoid']
    num_steps_range = [10, 20]
    batch_size_range = [64]
    dropout_range = np.array([0.2, 0.5], dtype=np.float32)
    loss_types = ['binary_crossentropy', 'mean_absolute_error', 'mean_squared_error']
    optimizer_range = ['adam']
    c = {'version': 1.0,
         'base_dir': '../models/rnn{}'.format(pid),
         'input_dim': 64,
         'pid': pid,  # changed from 0 to 5 for testing purposes
         'remove_const_ids': True,  # Original: True  # np.random.choice(remove_constant_ids_range),
         'num_hidden_layer': np.random.choice(num_hidden_layer_range),
         'num_hidden_unit': np.random.choice(num_hidden_units_range),
         'hidden_activation': np.random.choice(hidden_activation_range),
         'rnn_type': np.random.choice(rnn_type_range),
         'num_rnn_layer': np.random.choice(num_rnn_layer_range),
         'num_rnn_unit': np.random.choice(num_rnn_units_range),
         'rnn_activation': np.random.choice(rnn_activation_range),
         'rnn_implementation': 0,
         # Training parameters
         'num_steps': np.random.choice(num_steps_range),
         'batch_size': np.random.choice(batch_size_range),
         'dropout_input': np.random.choice(dropout_range),
         'dropout_hidden': np.random.choice(dropout_range),
         'dropout_rnn_input': np.random.choice(dropout_range),
         'dropout_rnn_recurrent': np.random.choice(dropout_range),
         'optimizer': np.random.choice(optimizer_range),
         'loss': np.random.choice(loss_types),
         'num_epochs': 100,  # Original: 1000
         'train_file_idx': range(0, 3),
         'val_file_idx': range(3, 5),
         'patience': 3,
         'shuffle_epochs': True,
         'consume_less': 'gpu',
         'batch_normalization': True,
         'high_rate_data_only': True,
         }
    return c
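To make the question concrete: right now I pick by validation loss alone. Should I instead rank all 20 models on a shared metric evaluated on the same held-out set? Something like this sketch (the `results` dict here is made-up numbers for illustration; in practice they would come from each model's evaluation):

```python
# Hypothetical per-model results: final validation loss (under that model's own
# randomly chosen loss function) and validation bit accuracy on the same data.
results = {
    'rnn0': {'val_loss': 0.10, 'loss_type': 'mean_absolute_error', 'val_acc': 0.03},
    'rnn1': {'val_loss': 0.45, 'loss_type': 'binary_crossentropy', 'val_acc': 0.78},
    'rnn2': {'val_loss': 0.52, 'loss_type': 'binary_crossentropy', 'val_acc': 0.71},
}

# Ranking by val_loss mixes values from different loss functions; ranking by
# the shared accuracy metric compares every model on the same scale.
best_by_loss = min(results, key=lambda m: results[m]['val_loss'])
best_by_acc = max(results, key=lambda m: results[m]['val_acc'])
print(best_by_loss)  # 'rnn0' -- the apparent winner by loss
print(best_by_acc)   # 'rnn1' -- the winner on the shared metric
```

Is ranking on a shared metric like this the right way to compare models trained with different loss functions, or is there a better-established approach?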