Dropout understanding

offirinbar · April 29, 2019, 8:38am

Hello

I am confused about how to use dropout:

1) What is the difference between ps=[] and emb_drop?
2) Can I use them both? Which variables do they act on?
3) When I use dropout, does it affect my training loss, my validation loss, or both?

thank you very much!

sheepish · April 30, 2019, 9:01pm

1) What is the difference between ps=[] and emb_drop?
ps is the dropout on the hidden layers. emb_drop is the dropout on the embedding input.
2) Can I use them both? Which variables do they act on?
Yes, you can use them both.
3) When I use dropout, does it affect my training loss, my validation loss, or both?
Dropout is only applied during training. You don’t want to apply dropout on your evaluation data.
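
To make the training-versus-evaluation distinction concrete, here is a minimal pure-Python sketch of inverted dropout. It is an illustration only, not fastai's or PyTorch's implementation; the function name `dropout` and its `training` flag are assumptions made for this example.

```python
import random

def dropout(values, p, training, rng=random):
    """Inverted dropout on a list of floats (illustrative sketch only)."""
    if not training or p == 0.0:
        # Evaluation mode: activations pass through unchanged.
        return list(values)
    keep = 1.0 - p
    # Training mode: zero each value with probability p and rescale the
    # survivors by 1/keep so the expected activation stays the same.
    return [v / keep if rng.random() >= p else 0.0 for v in values]

acts = [1.0] * 8
train_out = dropout(acts, p=0.5, training=True)   # mix of 0.0 and 2.0
eval_out = dropout(acts, p=0.5, training=False)   # identical to acts
```

In this sketch the validation pass never drops anything, so any change in validation loss comes from the weights that were learned under dropout, not from dropout being applied at evaluation time.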

offirinbar · May 1, 2019, 10:05am

@sheepish thank you very much. After this answer, I have some more questions; it would be great if someone could help.

I saw in learn.model that emb_drop appears only once. Why is that?
Is there a good strategy for tuning these parameters?
In Jeremy's notebook he set ps=[0.001,0.01]. Do you know why he chose those values?
So when I use dropout, should I expect to see a difference only in train_loss? Is the validation loss not affected at all?
Something else: in tabular_learner we have to set layers=[]. Does this parameter define the number of nodes in each layer (“vertical”) or the number of hidden layers?
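
A note on the `layers` question: in fastai v1, `layers` is a list with one entry per hidden layer, and each entry is that layer's number of units, so it sets both the depth and the width. The sketch below shows how such a list would translate into (input size, output size) pairs for the linear layers. The names `linear_shapes`, `n_in` and `n_out`, and the sizes 20 and 1, are hypothetical and chosen for illustration; they are not part of the fastai API.

```python
def linear_shapes(n_in, layers, n_out):
    """Return (in_features, out_features) for each linear layer (sketch)."""
    sizes = [n_in] + list(layers) + [n_out]
    # Pair each size with the next one: input -> hidden ... -> output.
    return list(zip(sizes[:-1], sizes[1:]))

# Three hidden layers with 500, 500 and 100 units respectively.
shapes = linear_shapes(n_in=20, layers=[500, 500, 100], n_out=1)
```

With `layers=[500, 500, 100]` there are three hidden layers, and `ps` would then take one dropout probability per hidden layer.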

digitalspecialists · May 1, 2019, 8:13pm

As it happens, I read this survey of dropout history today. Nice summary… https://arxiv.org/abs/1904.13310

rpcoelho · June 26, 2019, 4:25pm

Hi @sheepish,

Do you know if there is a way to set some sort of random seed for the dropout? I’m running a model twice, with the same parameters, and each time I run I get a different MAE. The only reason I can think of is that the randomness of the dropout causes this. Wouldn’t you agree?

dangraf · June 26, 2019, 4:44pm

Hello.
The validation loss is affected by adding dropout, although indirectly. Dropout removes information during training, which forces the model to draw on all of its inputs instead of relying on a few, and that should give a more general model. Hopefully you get a lower loss on the validation set. But as mentioned earlier, dropout itself is only applied during training.

Yes, it’s possible to set a random seed to make the dropout deterministic. This function works if you are running on CUDA:

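import random

import numpy as np
import torch

def set_seed(seed=8):
    # Seed the Python, NumPy and PyTorch CPU generators
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

    # Seed the GPU generators and force deterministic cuDNN kernels
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # gpu vars
    torch.backends.cudnn.deterministic = True  # needed
    torch.backends.cudnn.benchmark = False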

You might also need to set num_workers to 1 in your databunch.
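
The effect of seeding can be checked without a GPU. The following is a small pure-Python sketch, not the PyTorch mechanism itself: seeding the generator with the same value before sampling gives the same dropout mask each time. `dropout_mask` is a hypothetical helper written for this example.

```python
import random

def dropout_mask(n, p, seed):
    """Sample a 0/1 keep-mask of length n with drop probability p (sketch)."""
    rng = random.Random(seed)  # private generator, seeded for repeatability
    return [0 if rng.random() < p else 1 for _ in range(n)]

mask_a = dropout_mask(10, p=0.5, seed=8)
mask_b = dropout_mask(10, p=0.5, seed=8)  # same seed, so same mask as mask_a
```

The same idea applies to a full training run: every source of randomness has to be reseeded before each run for the results to match.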

rpcoelho · June 26, 2019, 7:59pm

Thank you for your reply Daniel. I don’t know if I’m setting num_workers in the right place because I’m still getting different results for each run. I’m loading the datasets already split so this can’t be causing the problems. Here is how I’m creating the databunch, setting the seeds and running the learner:

import random

import numpy as np
import torch

def random_seed(seed_value):
    random.seed(seed_value)  # Python
    np.random.seed(seed_value)  # cpu vars
    torch.manual_seed(seed_value)  # cpu vars

    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed_value)
        torch.cuda.manual_seed_all(seed_value)  # gpu vars
        torch.backends.cudnn.deterministic = True  # needed
        torch.backends.cudnn.benchmark = False

random_seed(0)

dep_var = 'NumberOfSales'
df = train_df[cat_vars + cont_vars + [dep_var]].copy()

path="c:/Benchmarking/testBench.csv"
data = (TabularList.from_df(df, cat_names=cat_vars, cont_names=cont_vars, procs=procs,)
                .split_by_idx(valid_idx)
                .label_from_df(cols=dep_var, label_cls=FloatList, log=False)
                .add_test(TabularList.from_df(test_df, path=path, cat_names=cat_vars, cont_names=cont_vars))
                .databunch(num_workers=1))

# x = best.x  # I'm using scikit opt to find the best parameters but then can't reproduce the results.
x = [500, 500, 100, 0.0005, 0.4, 8]
print(x)
learn3 = tabular_learner(data, layers=[x[0], x[1], x[2]], ps=[0.09, 0.5, 0.5], emb_drop=0.04,
                         y_range=y_range, metrics=mae)
learn3.fit_one_cycle(1, x[3], wd=x[4], div_factor=x[5])