What should happen to categorical crossentropy after increasing / reducing dropout?

radek · February 23, 2017, 3:23pm

If I have a model with Dropout layers with p = 0.5, and I then recreate the model with p = 0, divide the weights of the Dense layers by 2, and run evaluate on it, should I get exactly the same categorical cross-entropy loss as with the original model?
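One way to compare two models directly is to compute the loss by hand from their `predict` outputs. Below is a minimal plain NumPy sketch of categorical cross-entropy (the mean over samples of -Σ y·log ŷ). The function name and the `eps` clipping value are assumptions for illustration, not Keras's exact implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross-entropy for one-hot targets and predicted probabilities."""
    # Clip so log() never sees exactly 0 or 1 (eps is an assumed value)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # Per-sample loss -sum_k y_k * log(p_k), averaged over the batch
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

# Two samples, three classes (made-up numbers)
y_true = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)
preds_a = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
preds_b = preds_a.copy()  # stand-in for the second model's predictions

print(categorical_crossentropy(y_true, preds_a))
print(categorical_crossentropy(y_true, preds_a) == categorical_crossentropy(y_true, preds_b))
```

If two models really are equivalent at test time, their predictions agree up to floating-point tolerance, and so does this loss.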

This is not what my code produces, so I wanted to check whether my understanding is correct. My code does something like the gist here

radek · February 23, 2017, 4:05pm

I guess the sanity check for the dropout change would be the following? Assuming I changed the dropout properly, this should give me True (I'm getting False at the moment with code like in the gist above):

import numpy as np

# Predictions of the original model (p = 0.5) and of the p = 0 copy
fc = model_fc.predict(val_X)
fc_zero = model_fc_p_zero.predict(val_X)

np.allclose(fc, fc_zero)

The weights, though, relate as expected (at test time the weights of the model with p = 0.5 should effectively be halved):

a = model_fc.get_weights()
b = model_fc_p_zero.get_weights()

np.allclose(a[0], 2 * b[0]) # => True
np.allclose(a[1], 2 * b[1]) # => True
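The halving rule can be checked numerically for *classic* (non-inverted) dropout. In that scheme each input to a Dense layer is zeroed with probability p during training and kept units are not rescaled. Averaged over masks, the layer output is (1 - p)·x·W + b, so scaling the kernel W by (1 - p) at test time reproduces the expectation. The bias is never multiplied by the mask. Below is a minimal Monte Carlo sketch with made-up shapes and a fixed seed.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                      # dropout probability
x = rng.normal(size=5)       # one input vector (made-up data)
W = rng.normal(size=(5, 3))  # Dense kernel
b = rng.normal(size=3)       # Dense bias

# Classic dropout: zero each input with probability p, kept units are NOT rescaled
n = 200_000
masks = rng.random((n, 5)) >= p
outputs = (masks * x) @ W + b    # shape (n, 3), one row per sampled mask
mc_mean = outputs.mean(axis=0)   # Monte Carlo estimate of the expected output

# Test-time equivalent: no dropout, kernel scaled by (1 - p), bias unchanged
test_time = x @ (W * (1 - p)) + b

print(np.allclose(mc_mean, test_time, atol=0.05))
```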

Even · February 23, 2017, 6:07pm

I’m not sure about the use of the Lambda function there. It should be fine, but here’s what I did:

def proc_wgts(layer, dropout):
    # Multiply every weight array of the layer (kernel and bias) by 0.5 / (1 - dropout)
    return [o*(0.5/(1-dropout)) for o in layer.get_weights()]

Which I then call using

    if dropout != 0.5:
        # Index of the last Convolution2D layer; only the layers after it get rescaled
        last_conv_idx = [index for index, layer in enumerate(model.layers)
                         if type(layer) is Convolution2D][-1]
        print('Updating vgg model layers after index ', last_conv_idx)
        # Rescale each layer's weights in place (layers without weights return [])
        for layer in model.layers[last_conv_idx+1:]:
            layer.set_weights(proc_wgts(layer, dropout))
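`proc_wgts` multiplies by 0.5 / (1 - dropout), which equals (1 - 0.5) / (1 - dropout). That is the old keep probability divided by the new one. The minimal sketch below makes the old rate an explicit parameter and works on plain NumPy arrays, such as those returned by `layer.get_weights()`. The names `rescale_factor` and `rescale_weights` are hypothetical.

```python
import numpy as np

def rescale_factor(old_p, new_p):
    """Ratio of keep probabilities: (1 - old_p) / (1 - new_p)."""
    return (1.0 - old_p) / (1.0 - new_p)

def rescale_weights(weights, old_p, new_p):
    """Scale every array in a get_weights()-style list by rescale_factor(old_p, new_p)."""
    f = rescale_factor(old_p, new_p)
    return [w * f for w in weights]

# old_p = 0.5 reproduces proc_wgts' factor 0.5 / (1 - dropout)
for new_p in (0.0, 0.25, 0.5):
    print(new_p, rescale_factor(0.5, new_p))

weights = [np.ones((2, 2)), np.ones(2)]  # stand-ins for a Dense kernel and bias
halved = rescale_weights(weights, old_p=0.5, new_p=0.0)
```

With new_p = 0 this gives the factor 0.5, i.e. the halving discussed earlier in the thread.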

davecg · February 23, 2017, 8:34pm

Pretty sure that when you run evaluate on a model with dropout, Keras automatically turns dropout off and handles the scaling for every layer appropriately. (I haven't actually checked, but that was my understanding of how dropout layers work.) You can see this happening: training loss goes up with higher dropout but validation loss doesn't (it may even go down thanks to better generalization), because the model switches dropout on and off for you in training and testing modes.

So the value of p shouldn't matter for evaluation, only for training.
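Keras's `Dropout` layer implements *inverted* dropout. During training it zeroes each unit with probability p and divides the survivors by (1 - p). At inference it passes its input through unchanged. Because the scaling happens at training time, the expected training-mode output already equals the inference-mode output. Below is a minimal NumPy sketch of that scheme; the function name and data are hypothetical.

```python
import numpy as np

def inverted_dropout(x, p, training, rng):
    """Inverted dropout: in training, zero units with probability p and scale survivors by 1/(1-p)."""
    if not training:
        return x  # inference: no mask and no scaling
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)

rng = np.random.default_rng(0)
p = 0.5
x = rng.normal(size=4)  # made-up activations

# Average many independent training-mode passes (one per row) ...
n = 100_000
train_mean = inverted_dropout(np.tile(x, (n, 1)), p, True, rng).mean(axis=0)
# ... and compare them with a single inference-mode pass
eval_out = inverted_dropout(x, p, False, rng)

print(np.allclose(train_mean, eval_out, atol=0.05))
```

Under this scheme the evaluation-mode network needs no extra weight scaling. Multiplying weights by (1 - p) at test time is the adjustment that matches classic, non-inverted dropout.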

radek · February 24, 2017, 8:56am

Thank you very much for your help :). Figured out what was going on and posted the reasoning here: Bugged entropy loss calculation in Keras?