This might be a really stupid question, but when you pass in a metric, such as RMSE, how is it being calculated during training?
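One thing I considered (and I'm not certain this is what the library actually does, so treat this as a guess): many training loops report the mean of per-batch metric values rather than the metric computed once over the whole validation set, and for RMSE those two are not the same thing. A minimal sketch with made-up data:

```python
import numpy as np

def rmse(p, t):
    return np.sqrt(np.mean((p - t) ** 2))

rng = np.random.default_rng(0)
true = rng.normal(size=1000)
pred = true + rng.normal(scale=0.5, size=1000)

# RMSE over the whole validation set (what .predict() + a manual calc gives)
full = rmse(pred, true)

# mean of per-batch RMSEs (what a training loop may report instead)
bs = 128
batchwise = np.mean([rmse(pred[i:i + bs], true[i:i + bs])
                     for i in range(0, len(true), bs)])

# the two are close here, but they are not the same quantity,
# and with skewed errors or a partial last batch they can diverge
```

That alone probably wouldn't explain a huge gap, but it shows the reported number and the manual number aren't guaranteed to match even with identical data.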
During training I am getting pretty good results, but when I call .predict() on the model (loaded via best_save_name) and manually calculate the RMSE with the same function, I get a completely different number (and I'm using the same validation indices in both cases).
Also, when validating at the end of each epoch during training, it iterates through only 1236 validation samples (with a batch size of 128), which is just 17.5% of my data, i.e. only half of my validation split of 35%.
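To spell out that arithmetic (using the numbers from my run):

```python
# numbers observed during training (mine)
val_seen = 1236           # validation samples actually iterated
frac_of_data = 0.175      # fraction of the full dataset that represents

total_rows = val_seen / frac_of_data   # ~7063 rows in the dataset
expected_val = 0.35 * total_rows       # ~2472 rows in a 35% split

ratio = val_seen / expected_val        # ~0.5, i.e. half the split
```

So exactly half of the validation set seems to be skipped during the per-epoch evaluation.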
Here is a screenshot of the min loss and rmse shown during training vs the rmse calculated with the best save:
As you can see, there isn’t even a relationship between the best performance…
So I really have no idea how this is being calculated during training! I really hope someone can clarify what is probably a simple misunderstanding.
Edit: and just to make sure there is no bug in how I’m calculating the loss of the saved models, here is the function:
```python
def load_model_get_val_rmse(saved_weights, loc_val_idx):
    # init model objects
    md = ColumnarModelData.from_data_frame(
        path='models', val_idxs=loc_val_idx, df=df,
        y=yl.astype(np.float32), cat_flds=cat_vars,
        bs=128, test_df=df_test)
    m = md.get_learner(
        emb_szs=emb_szs, n_cont=len(df.columns) - len(cat_vars),
        emb_drop=0.04, out_sz=1, szs=arch,
        drops=dropout, y_range=y_range)
    # load saved weights
    m.load(saved_weights)
    # calc rmse
    yl_true = deepcopy(yl[loc_val_idx])
    yl_pred = deepcopy(m.predict().reshape(-1,))
    error = rmse(yl_pred, yl_true)
    return error
```
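And in case anyone asks, my rmse helper is the standard definition, something like:

```python
import numpy as np

def rmse(pred, true):
    """Standard root-mean-squared error."""
    pred, true = np.asarray(pred), np.asarray(true)
    return np.sqrt(np.mean((pred - true) ** 2))
```

It returns 0 for identical inputs, so I don't think the helper itself is the problem.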