Tabular model is worse after Bayesian optimization

Hi guys,

I trained a tabular model, first using just fit_one_cycle.

learn = tabular_learner(dls, metrics=[accuracy])
learn.fit_one_cycle(2, cbs=[EarlyStoppingCallback(monitor='accuracy', min_delta=0.01, patience=2)])

Then I ran Bayesian optimization as follows:

def fit_with(lr:float, wd:float, dp:float, n_layers:float, layer_1:float, layer_2:float, layer_3:float):
  print(lr, wd, dp)
  if int(n_layers) == 2:
    layers = [int(layer_1), int(layer_2)]
  elif int(n_layers) == 3:
    layers = [int(layer_1), int(layer_2), int(layer_3)]
  else:
    layers = [int(layer_1)]
  config = tabular_config(embed_p=float(dp),
                          ps=float(wd))
  learn = tabular_learner(dls, layers=layers, metrics=accuracy, config = config)

  with learn.no_bar(), learn.no_logging():
    learn.fit(5, lr=float(lr))

  acc = float(learn.validate()[1])

  return acc

hps = {'lr': (1e-15, 1e-01),
       'wd': (4e-4, 0.4),
       'dp': (0.01, 0.5),
       'n_layers': (1, 3),
       'layer_1': (50, 200),
       'layer_2': (100, 1000),
       'layer_3': (200, 2000)}

optim = BayesianOptimization(
    f = fit_with, # our fit function
    pbounds = hps, # our hyper parameters to tune
    verbose = 2, # 2 prints every iteration, 1 only when a new maximum is found, 0 is silent
    random_state=RANDOM_SEED
)
optim.maximize(n_iter=10, init_points=5)

opt = optim.max["params"]
layers = [int(opt["layer_1"]), int(opt["layer_2"]), int(opt["layer_3"])]
learn = tabular_learner(dls, layers=layers, metrics=[accuracy])

learn.fit(2, lr=float(opt["lr"]), cbs=[EarlyStoppingCallback(monitor='accuracy', min_delta=0.01, patience=2)])

However, the results are much worse than the first attempt with just fit_one_cycle.
What could be the problem?

Thanks a lot!

I think I figured it out:
The problem is that the Bayesian optimization gives me an optimal size for all three layers, even though the optimal number of layers is n_layers = 1.9. Therefore I have to use two layers if I round n_layers (

round(n_layers)

), or one if I truncate it

int(n_layers)

(I truncated n_layers because I got better results with just one layer, although I'm not sure if that makes sense.)

I implemented:

layers = []
for i in range(int(opt["n_layers"])):  # opt = optim.max["params"] from above
    selector = "layer_" + str(i + 1)
    layers.append(int(opt[selector]))

to grab the optimal n_layers dynamically.
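With a toy opt dict standing in for optim.max["params"] (the numbers below are made up for illustration), the loop picks up only the first int(n_layers) layer sizes:

```python
# Hypothetical optimum, standing in for optim.max["params"]
# (these numbers are made up for illustration).
opt = {"n_layers": 1.9, "layer_1": 120.7, "layer_2": 512.3, "layer_3": 900.0}

layers = []
for i in range(int(opt["n_layers"])):  # int(1.9) == 1 -> just one layer
    selector = "layer_" + str(i + 1)
    layers.append(int(opt[selector]))

print(layers)  # [120]
```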

Hi Daniel!

Interesting work. And does the optimized model now give better results than the default settings?

You should also truncate it, because the algorithm truncates it as well (the three if statements at the beginning of fit_with), so you reproduce exactly what was evaluated. By "cut" I mean cast to an integer.
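To make the difference concrete for the reported optimum of n_layers = 1.9 (the value from the post above):

```python
n_layers = 1.9
print(int(n_layers))    # truncates toward zero -> 1 (what fit_with does)
print(round(n_layers))  # rounds to nearest     -> 2
```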


ah yeah, you are totally right, thanks for pointing that out!

Also one more observation: you're optimizing for 5 epochs but attempting to reproduce with 2 epochs. You should use 5 epochs there as well. (You're also not using the EarlyStoppingCallback during optimization.)

Yeah, but only a bit:
accuracy: 64.10% => 64.25%
AUC: 63.88% => 63.96%

I guess I still do something wrong in training after optimization

True. I will retrain.
You mean I have to use early stopping as well in the Bayesian Optimization?
How can I do that?

Or you just don't use EarlyStopping when you reproduce it. The fit_with function is all fastai code, so you can add the callback to the call to fit inside that function. If you adjust anything inside the Bayesian fit function when you try to reproduce the result, then you are no longer reproducing what you did. I would highly recommend watching lesson 3 from roughly 48 minutes in:
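For example, the fit call inside fit_with would become something like this fragment (same callback settings as in the reproduction step above; it's only a sketch, since it needs the surrounding fastai setup to run):

```python
# Inside fit_with: train with the same EarlyStoppingCallback that
# the reproduction run uses, so both runs match.
with learn.no_bar(), learn.no_logging():
    learn.fit(5, lr=float(lr),
              cbs=[EarlyStoppingCallback(monitor='accuracy',
                                         min_delta=0.01, patience=2)])
```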


Okay, thank you!