Tabular model is worse after Bayesian optimization

Hi guys,

I trained a tabular model, first using just fit_one_cycle.

learn = tabular_learner(dls, metrics=[accuracy])
learn.fit_one_cycle(2, cbs=[EarlyStoppingCallback(monitor='accuracy', min_delta=0.01, patience=2)])

Then I ran Bayesian optimization as follows:

def fit_with(lr:float, wd:float, dp:float, n_layers:float, layer_1:float, layer_2:float, layer_3:float):
  print(lr, wd, dp)
  if int(n_layers) == 2:
    layers = [int(layer_1), int(layer_2)]
  elif int(n_layers) == 3:
    layers = [int(layer_1), int(layer_2), int(layer_3)]
  else:
    layers = [int(layer_1)]
  config = tabular_config(embed_p=float(dp),
                          ps=float(wd))
  learn = tabular_learner(dls, layers=layers, metrics=accuracy, config = config)

  with learn.no_bar(), learn.no_logging():
    learn.fit(5, lr=float(lr))

  acc = float(learn.validate()[1])

  return acc

hps = {'lr': (1e-15, 1e-01),
       'wd': (4e-4, 0.4),
       'dp': (0.01, 0.5),
       'n_layers': (1, 3),
       'layer_1': (50, 200),
       'layer_2': (100, 1000),
       'layer_3': (200, 2000)}

optim = BayesianOptimization(
    f = fit_with, # our fit function
    pbounds = hps, # our hyper parameters to tune
    verbose = 2, # 2 prints every iteration, 1 only when a new maximum is found, 0 is silent
    random_state=RANDOM_SEED
)
optim.maximize(n_iter=10, init_points=5)

opt = optim.max["params"]
layers = [int(opt["layer_1"]), int(opt["layer_2"]), int(opt["layer_3"])]
learn = tabular_learner(dls, layers=layers, metrics=[accuracy])

learn.fit(2, lr=float(opt["lr"]), cbs=[EarlyStoppingCallback(monitor='accuracy', min_delta=0.01, patience=2)])

However, the results are much worse than the first attempt with just fit_one_cycle.
What could be the problem?

Thanks a lot!

I think I figured it out:
The problem is that the Bayesian optimization gives me an optimal size for all three layers, even though the optimal number of layers is n_layers = 1.9. Therefore I have to use two layers if I round n_layers (

round(n_layers)

), or one if I truncate it

int(n_layers)

(I truncated n_layers because I got better results with just one layer, although I'm not sure if that makes sense.)

I implemented:

layers = []
for i in range(int(opt["n_layers"])):  # opt = optim.max["params"] from above
    selector = "layer_" + str(i + 1)
    layers.append(int(opt[selector]))

to grab the optimal n_layers dynamically.
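With a toy opt dict standing in for optim.max["params"] (the numbers below are made up for illustration), the loop picks up only the first int(n_layers) layer sizes:

```python
# Hypothetical optimum, standing in for optim.max["params"]
# (these numbers are made up for illustration).
opt = {"n_layers": 1.9, "layer_1": 120.7, "layer_2": 512.3, "layer_3": 900.0}

layers = []
for i in range(int(opt["n_layers"])):  # int(1.9) == 1 -> just one layer
    selector = "layer_" + str(i + 1)
    layers.append(int(opt[selector]))

print(layers)  # [120]
```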

Hi Daniel!

Interesting work. And does the optimized model now give better results than the default settings?

You should also truncate it, because the algorithm truncates it as well (the three if statements at the beginning of fit_with), so you reproduce exactly what was evaluated. By "cut" I mean cast to an integer.
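To make the difference concrete for the reported optimum of n_layers = 1.9 (the value from the post above):

```python
n_layers = 1.9
print(int(n_layers))    # truncates toward zero -> 1 (what fit_with does)
print(round(n_layers))  # rounds to nearest     -> 2
```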


ah yeah, you are totally right, thanks for pointing that out!

Also one more observation: you're optimizing for 5 epochs but attempting to reproduce with 2 epochs. You should use 5 epochs there as well. (You're also not using the EarlyStoppingCallback during optimization.)

Yeah, but only a bit:
accuracy: 64.10% => 64.25%
AUC: 63.88% => 63.96%

I guess I still do something wrong in training after optimization

True. I will retrain.
You mean I have to use early stopping as well in the Bayesian Optimization?
How can I do that?

Or you just don't use EarlyStopping when you reproduce it. The fit_with function is all fastai code, so you can add the callback to the call to fit inside that function. If you adjust anything inside the Bayesian fit function when you try to reproduce the result, then you are no longer reproducing what you did. I would highly recommend watching lesson 3 from roughly 48 minutes in:
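For example, the fit call inside fit_with would become something like this fragment (same callback settings as in the reproduction step above; it's only a sketch, since it needs the surrounding fastai setup to run):

```python
# Inside fit_with: train with the same EarlyStoppingCallback that
# the reproduction run uses, so both runs match.
with learn.no_bar(), learn.no_logging():
    learn.fit(5, lr=float(lr),
              cbs=[EarlyStoppingCallback(monitor='accuracy',
                                         min_delta=0.01, patience=2)])
```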


Okay, thank you!