DropBlock: A new regularization method

The authors primarily compare against dropout, arguing that it is ineffective at preventing overfitting in convolutional layers: neighbouring activations in a feature map are spatially correlated, so information from a dropped unit still seeps through from the units around it. Instead, DropBlock shuts off a contiguous n×n region of the feature map, so the feature in that region cannot be picked up by any neuron and the network is forced to learn from other features. Fig. 6 is interesting: the activation maps show that DropBlock looks at more distinct regions of an image when making its predictions.
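A minimal single-channel sketch of the block-dropping idea in plain Python, for my own reference. Assumptions not in the notes above: the feature map is a square list of lists, block centres are only sampled where the whole block fits inside the map, kept activations are rescaled by (total units / kept units) so the expected sum is preserved, and the seed rate `gamma` uses the formula I recall from the paper for turning `keep_prob` into a per-position seed probability (worth checking against the paper). The names `dropblock` and `compute_gamma` are hypothetical.

```python
import random


def compute_gamma(keep_prob, block_size, feat_size):
    """Per-position seed probability so that roughly (1 - keep_prob) of units get dropped.

    Assumed formula (recalled from the paper); requires block_size <= feat_size.
    """
    return ((1.0 - keep_prob) / block_size ** 2) * (
        feat_size ** 2 / (feat_size - block_size + 1) ** 2
    )


def dropblock(feature_map, block_size, keep_prob, training=True, rng=random):
    """Zero out contiguous block_size x block_size squares in a square 2-D feature map."""
    if not training or keep_prob >= 1.0:
        # Inference (or nothing to drop): return an unchanged copy.
        return [row[:] for row in feature_map]

    n = len(feature_map)  # assumption: the map is n x n
    gamma = compute_gamma(keep_prob, block_size, n)
    mask = [[1] * n for _ in range(n)]

    # Block covers offsets -half .. block_size - 1 - half around its centre,
    # so centres are only sampled where the whole block fits inside the map.
    half = block_size // 2
    last = n - (block_size - 1 - half)
    for i in range(half, last):
        for j in range(half, last):
            if rng.random() < gamma:
                # Zero the whole square around the seed, not just one unit.
                for di in range(-half, block_size - half):
                    for dj in range(-half, block_size - half):
                        mask[i + di][j + dj] = 0

    kept = sum(map(sum, mask))
    if kept == 0:
        return [[0.0] * n for _ in range(n)]
    scale = n * n / kept  # rescale survivors so the expected activation sum is preserved
    return [
        [feature_map[i][j] * mask[i][j] * scale for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    fmap = [[1.0] * 8 for _ in range(8)]
    out = dropblock(fmap, block_size=3, keep_prob=0.9, rng=random.Random(0))
    for row in out:
        print(" ".join(f"{v:.2f}" for v in row))
```

Dropping whole squares rather than single units is the point: a lone zeroed unit can be filled in from its correlated neighbours, a zeroed 3×3 patch cannot. With `block_size=1` this sketch behaves like ordinary dropout on the map (gamma becomes `1 - keep_prob`), except that the rescaling uses the actual kept count.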

This looks like it could be a handy trick. The claimed results seem good.