I need to adjust a few things but that’s the gist. If we wanted a list we’d adjust accordingly which I will do as well.
For reproducibility, there is a small snippet here on the forum that gives exact reproducibility, but we want that variance, do we not? Variance is real and can sometimes help.
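For readers who do want repeatable runs, here is a minimal sketch of what such a seeding helper usually looks like. It is not the snippet referred to above (that one is not shown in this thread); the name `seed_everything` and the default seed are made up for illustration, and the NumPy and PyTorch parts only run if those libraries are installed.

```python
import os
import random


def seed_everything(seed: int = 42) -> None:
    """Seed the common sources of randomness (hypothetical helper)."""
    random.seed(seed)
    # Only affects Python processes started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
        # Trade some speed for deterministic cuDNN kernels.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass


seed_everything(0)
first = random.random()
seed_everything(0)
second = random.random()  # same value as `first`
```

Calling it once before building the DataLoaders and the model should be enough for most setups, although some GPU operations can stay non-deterministic even with these flags.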
@muellerzr do you think this could be included in fastai v2? I was just revisiting this thread (and your fantastic code) today and thought it may also be useful to many others.
@mgloria I’m not the biggest fan of having EarlyStopping (training can vary, and I want that variance). This thread exists for people to find if they want to do so. If there’s more demand for it, I’ll make a notebook with it.
Sure! I meant mostly the K-folds approach. I think it is a great thing to have in fastai. I believe that the default split methods are not stratified (correct me if I am wrong), which can be a problem for imbalanced datasets with many classes…
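To make the stratification point concrete, here is a small sketch using scikit-learn's `StratifiedKFold` on made-up data. The arrays and the 90/10 class balance are assumptions for illustration only, not anything from this thread.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy imbalanced labels: 90 samples of class 0, 10 of class 1 (made up).
y = np.array([0] * 90 + [1] * 10)
X = np.arange(len(y)).reshape(-1, 1)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

minority_per_fold = []
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    # Each validation fold keeps roughly the overall class ratio.
    minority_per_fold.append(int((y[val_idx] == 1).sum()))
    print(f"fold {fold}: val size={len(val_idx)}, minority in val={minority_per_fold[-1]}")
```

The resulting `train_idx` / `val_idx` arrays could then be handed to whatever splitter the data pipeline accepts, one fold at a time.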
Hi, everyone! Thank you for the discussion @mgloria and @muellerzr, it was extremely useful.
I am implementing time series cross-validation and would like to know if you have any advice for me.
I am currently using the TimeSeriesSplit from scikit-learn and the following code:
import torch
from torch.utils.data import TensorDataset
from sklearn.model_selection import TimeSeriesSplit
from fastai.basics import *
from fastai.callback.all import *

model = model_builder(model_args=MODEL_ARGS)
model, device = set_device_to_train(model)
tscv = TimeSeriesSplit(n_splits=5)
for train_index, val_index in tscv.split(X_train_val):
    # data split based on time series split
    X_train, X_val = X_train_val[train_index], X_train_val[val_index]
    y_train, y_val = y_train_val[train_index], y_train_val[val_index]
    # make data ready for the model
    train_ds = TensorDataset(torch.tensor(X_train).float(), torch.tensor(y_train).unsqueeze(1).float())
    valid_ds = TensorDataset(torch.tensor(X_val).float(), torch.tensor(y_val).unsqueeze(1).float())
    # The above is the same as doing this - maybe we should decrease the training size
    dls = DataLoaders.from_dsets(train_ds, valid_ds, bs=MODEL_ARGS['batch_size'])
    CBS_ = [
        EarlyStoppingCallback(patience=PATIENCE),
        SaveModelCallback(fname=f'{TARGET}_model'),
    ]
    # Train the model
    learn = Learner(dls,
                    model,
                    loss_func=loss_function_builder(MODEL_ARGS['loss_function']),
                    opt_func=optimizer_builder(MODEL_ARGS['optimizer']),
                    metrics=metrics_builder(MODEL_ARGS['metrics']),
                    # Later we can check for more callbacks if needed
                    cbs=CBS_
                    )
    learn.fit_one_cycle(MODEL_ARGS['epochs'])  # should we run lr_find() first?
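On the "maybe we should decrease the training size" comment: by default `TimeSeriesSplit` uses an expanding training window, and its `max_train_size` parameter caps it. A small sketch on a made-up index array of 12 samples (the helper name `show_folds` is hypothetical) shows the difference:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X_toy = np.arange(12).reshape(-1, 1)  # 12 time-ordered samples (made up)


def show_folds(splitter, X):
    """Return (train indices, validation indices) for every fold as plain lists."""
    return [(tr.tolist(), va.tolist()) for tr, va in splitter.split(X)]


# Default: the training window grows with every fold.
expanding = show_folds(TimeSeriesSplit(n_splits=3), X_toy)

# Capped: each fold trains on at most the 3 most recent samples.
capped = show_folds(TimeSeriesSplit(n_splits=3, max_train_size=3), X_toy)

for (tr_e, va), (tr_c, _) in zip(expanding, capped):
    print(f"val={va}  expanding train={tr_e}  capped train={tr_c}")
```

In both variants every training index comes before every validation index, which is what keeps future data out of training.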
I would like to know how I can adapt this code to get the best model based on the average validation loss and save it with the callback that I have defined in the CBS_ variable.
Is there any straightforward solution for it?
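One possible way to think about this, offered as a sketch rather than a definitive answer: an average validation loss over folds belongs to a training configuration, not to any single fitted model, because every fold produces its own weights. So a common pattern is to score each configuration by its mean fold loss, pick the best one, and then retrain once with that configuration and save the result. The example below uses a closed-form ridge regression as a stand-in for the fastai training loop; `fit_ridge`, `mse`, `mean_cv_loss`, the toy data and the candidate `alpha` values are all assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit


def fit_ridge(X, y, alpha):
    """Stand-in for training: closed-form ridge regression weights."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def mse(w, X, y):
    """Stand-in for the validation loss."""
    return float(np.mean((X @ w - y) ** 2))


def mean_cv_loss(X, y, alpha, n_splits=5):
    """Average validation loss over time-ordered folds, fresh model per fold."""
    losses = []
    for train_idx, val_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        w = fit_ridge(X[train_idx], y[train_idx], alpha)  # nothing carried over
        losses.append(mse(w, X[val_idx], y[val_idx]))
    return float(np.mean(losses)), losses


# Made-up regression data standing in for X_train_val / y_train_val.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=120)

candidates = [0.01, 1.0, 100.0]  # hypothetical hyperparameter settings
scores = {alpha: mean_cv_loss(X, y, alpha)[0] for alpha in candidates}
best_alpha = min(scores, key=scores.get)

# Retrain once on all of the data with the winning setting; this is the model to save.
final_w = fit_ridge(X, y, best_alpha)
```

Mapped back to the loop in the question: the model is built once outside the `for` loop, so each fold continues from the previous fold's weights. Building it inside the loop gives independent folds. `SaveModelCallback` tracks a single training run, so giving it a fold-specific `fname` (for example `f'{TARGET}_model_fold{i}'`, a hypothetical naming scheme) would keep one checkpoint per fold instead of overwriting the same file.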