Dropout and Batch Normalization

Hi all,

Is there a consensus on using dropout and batch norm together? I was looking over the original ResNet paper, and the authors used batch norm without dropout; I believe they referenced a paper suggesting dropout shouldn't be combined with batch norm. But when I reviewed newer architectures, DenseNet and EfficientNet both use batch norm and dropout. If the trend is to use both, are there rules of thumb for how much of each to use?
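For context, here's a minimal framework-free sketch of what the two ops do at train time (all names here are mine, not from any library): batch norm standardizes activations using the batch mean and variance, while inverted dropout zeroes each unit with probability p and rescales the survivors by 1/(1-p) so the expected activation matches eval time.

```python
import random
import statistics

def batch_norm(batch, eps=1e-5):
    # Standardize a 1-D batch of activations using batch statistics.
    # (Learnable scale/shift parameters omitted for brevity.)
    mu = statistics.fmean(batch)
    var = statistics.pvariance(batch, mu)
    return [(x - mu) / (var + eps) ** 0.5 for x in batch]

def dropout(batch, p=0.5, training=True):
    # Inverted dropout: zero units with probability p and scale
    # survivors by 1/(1-p); at eval time it's the identity.
    if not training or p == 0.0:
        return list(batch)
    return [0.0 if random.random() < p else x / (1.0 - p) for x in batch]
```

The interaction between the two comes from dropout changing the variance of the activations that a downstream batch norm layer normalizes, which differs between train and eval mode.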



Good question! Also looking for more information on this.